A successful data warehouse requires a solid data architecture. Data architecture is a broad topic, and this chapter has focused on the data architect and data developer responsibilities and deliverables, including database architecture, platform architecture, and data models.
First, the database architecture depends on the data warehouse implementation pattern, which is typically a centralized EDW, a federated data mart, or a hub-and-spoke approach. In all cases, the data warehouse team is faced with a series of challenges, including scope (maintaining a single version of the truth throughout the implementation), scale (handling huge data volumes), and quality (delivering results that business users trust).
Providing a single version of the truth presents the data warehouse team with significant technical challenges. However, there are larger business and process issues that have resulted in the emergence of data governance as a key corporate activity.
Master data and MDM are crucial to arriving at a single version of the truth and are old problems in data warehousing that have spawned an emerging class of software products. The decision about whether to purchase and use an MDM software product is not a forgone conclusion and depends on a variety of factors.
Once you’ve selected an appropriate data warehouse implementation pattern, a robust platform architecture is required to support the data warehouse volumes, including loading data into the data warehouse and obtaining results from the data warehouse. The SQL Server platform provides customers with a variety of options, including reference hardware solutions (Fast Track Data Warehouse), data warehouse appliances (SQL Server 2008 R2 PDW), and data virtualization (SQL Server 2008 R2 Enterprise Edition or Data Center).
As stated above, physical best practices and guidance within this chapter are for the symmetric multi-processing (SMP) versions of SQL Server 2008 R2 due to some differences in functionality between the SQL Server 2008 R2 SMP release and the initial release of PDW.
The next key deliverable is developing the correct data models for the different databases within the data warehouse. The level of denormalization your database development team chooses depends on whether the database’s user community is business consumers (denormalized) or data integration developers and data stewards (more normalized).
As we can see in this chapter, the data architecture deliverables provide the foundation for the core data warehouse activities, which we cover in the remaining chapters of this toolkit:
Loading data into the data warehouse (Chapter 3 – Data Integration)
Managing the data warehouse (Chapter 4 – Database Administration)
Retrieving results from the data warehouse (Chapter 5 – Querying, Monitoring, and Performance)
To learn more about data warehouse data architecture considerations and best practices, see the following links:
Best Practices for Data Warehousing with SQL Server 2008
Clustered Indexes and Heaps
Data Compression: Strategy, Capacity Planning and Best Practices
Data Governance Institute’s Web site:
The Data Governance & Stewardship Community of Practice
The Data Loading Performance Guide
Data Warehousing 2.0 and SQL Server: Architecture and Vision
Data Warehouse Design Considerations
Fast Track Data Warehouse 2.0 Architecture
Hub-And-Spoke: Building an EDW with SQL Server and Strategies of Implementation
Introduction to New Data Warehouse Scalability Features in SQL Server 2008
An Introduction to Fast Track Data Warehouse Architectures
Introduction to the Unified Dimensional Model (UDM)
Kimball University: Data Stewardship 101: First Step to Quality and Consistency
Partitioned Tables and Indexes in SQL Server 2005
Partitioned Table and Index Strategies Using SQL Server 2008
Scaling Up Your Data Warehouse with SQL Server 2008
Storage Top 10 Best Practices
Strategies for Partitioning Relational Data Warehouses in Microsoft SQL Server
Thinking Global BI: Data-Warehouse Principles for Supporting Enterprise-Enabled Business-Intelligence Applications
Top 10 SQL Server 2005 Performance Issues for Data Warehouse and Reporting Applications
Using SQL Server to Build a Hub-and-Spoke Enterprise Data Warehouse Architecture
Microsoft Corporation Copyright © 2010