Microsoft EDW Architecture, Guidance and Deployment Best Practices
Chapter 5: Querying, Monitoring, and Performance Tuning
By Microsoft Corporation
Contributing writers from Solid Quality Mentors: James Miller
Technical reviewers from Microsoft: Benjamin Wright-Jones, Steve Young, Ross LoForte
Technical reviewers from Solid Quality Mentors: Gianluca Hotz, Larry Barnes
Contributing editors from Solid Quality Mentors: Kathy Blomstrom
Applies to: SQL Server 2008 R2
Chapter 5: Querying, Monitoring, and Performance Tuning 1
Query Optimization Overview 3
Developing Queries with Attention to Both ‘What’ and ‘How’ 4
Query Execution Plans 5
Sample Database Overview 5
Benefits of De-normalization from a Data Warehouse Query Perspective 7
Partitioned Tables 7
Queries Based on the Partitioned Column 8
Queries Indirectly Based on a Partitioned Column 11
Querying with Partitioned Indexes 13
Queries on a Partitioned Heap 17
Covering Indexes 18
Filtered Indexes 19
Indexed Views 20
Routine Aggregation Using Summary Tables in the Data Warehouse 21
Key Query Performance Metrics 23
Options for Monitoring T-SQL Code 24
Setting Up the Management Data Warehouse and Data Collection Components 25
Standard Management Data Warehouse Reporting 31
Querying the SQL Trace Collection in the Management Data Warehouse 36
Performance Tuning 39
Targeted Performance Tuning 39
Query Tuning Based on Disk I/O 40
Working with Execution Plans 45
Join Operators 47
Isolation Level Considerations 52
Best Practices Summary 54
Conclusion and Resources 54
Additional Resources 55
Data warehouses are read-optimized versions of line-of-business (LOB) systems and other source feeds. The sheer volume of data in a data warehouse can result in queries taking hours to complete while consuming large amounts of server resources. Worst case, queries may never complete within the required time, while negatively impacting other data warehouse workloads. Successful data warehouses require development team members to have a deeper understanding of query dynamics, as well as of best practices for monitoring, and performance tuning.
This chapter’s audience is the data warehouse team members responsible for developing and optimizing queries and monitoring data warehouse query workloads on the Microsoft SQL Server platform.
This chapter is organized into following sections:
SQL Server best practices
Conclusion and resources
Query Optimization Overview
Figure 5-1 shows how querying, monitoring, and tuning are related.
Figure 5-1: Query optimization cycle
Database professionals responsible for building large data warehouse applications need to be equipped with knowledge of SQL Server techniques such as table partitioning, indexing strategies, query isolation levels, execution plans, SQL Profiler, ETL techniques, and a variety of performance-related methods for optimization.
However, this is only a first step. The data warehouse also needs to be systematically monitored for excess resource utilization and anomalous behavior. Metrics collected from monitoring processes should enable query and server base lining, expose performance trending, and help with long-term capacity planning.
A key deliverable of a monitoring solution should be the ability to pinpoint reoccurring queries that carry an exceptionally high database cost. This information should then form a “to-do” list for subsequent performance tuning efforts. This approach should provide the highest return on investment for the time devoted to optimizing data warehouse queries.
Developing Queries with Attention to Both ‘What’ and ‘How’
There are times when it is enough to simply know the answer—the “what.” For example, when your spouse asks, “What time do you plan to be home from work?” you can generally give an ETA without delving into the method of calculation based of tasks, promises, deadlines, and commuting hazards that stand in your way.
Contrast this to the “what and how” of a common database query. Suppose your business consumers need to determine the number of orders placed for a widget during the previous day. Of paramount importance is the “what” (i.e., the correct number of orders), which you can produce by properly incorporating business rules and database logic. However, you may also need to examine how the results are derived if the query runs too long.
Now consider a large data warehouse query. In this example, let’s say you need to determine the changes to sales volume based on a correlation to Internet banner ad impressions. This scenario typically requires the analysis of large tables and filtering based on a variety of criteria. You, of course, still need accuracy, but the query plan may become critical, especially if the execution time expands from minutes to hours, and beyond. Bottom line: You will want to become very familiar with estimated and actual query plans when working in a data warehouse—or in other words, the “how” of your SQL queries.
Note: The term query in this chapter refers to all SELECT, INSERT, UPDATE, and DELETE statements that are often batched together in scripts, stored procedures, and ad hoc queries.