SQL: Returning the Top X row for each group (SQL Spackle) (good one)

Reference: http://www.sqlservercentral.com/articles/T-SQL/71571/

Returning the Top X row for each group (SQL Spackle)

By Dave Ballantyne, 2012/11/23 (first published: 2010/12/06)

In our table we have a list of runners, the time the course was completed in and the runner’s age. Our task is to retrieve the fastest X runners in each age range.

CREATE TABLE #Runners
(
Runner integer NOT NULL,
Time integer NOT NULL,
Age integer NOT NULL
)
INSERT INTO #Runners
SELECT 1 , 10 , 20 UNION ALL
SELECT 2 , 15 , 20 UNION ALL
SELECT 3 , 11 , 20 UNION ALL
SELECT 4 , 12 , 30 UNION ALL
SELECT 5 , 18 , 30 UNION ALL
SELECT 6 , 9 , 40 UNION ALL
SELECT 7 , 16 , 40 UNION ALL
SELECT 8 , 13 , 30

By far the simplest way is to use a combination of the ranking functions ( Specifically row_number () ) and CTE ‘s (Common Table Expressions). This may not always be the most efficient, but I’ll come back to that point later.

Our first task is to ascertain the order that the runners completed the course in. If we were interested in only the overall, age range aside, order we could simply

Select * 
from #Runners 
order by Time 

and then by adding then TOP clause we can filter the Top X (assuming 2 here)

Select top(2) * 
from #Runners 
order by Time 

However, we do need the TOP X for each age category. As mentioned above we will be using the row_number() ranking function to help us do this. If we were to execute

select * ,row_number() over (order by Time ) as RowN 
from #Runners 
order by Rown 
Runner Time Age RowN

SQL: Understanding SQL Server Statistics (good one)

Reference: http://sqlserverperformance.idera.com/tsql-optimization/understanding-sql-server-statistics/

Understanding SQL Server Statistics

Donabel Santos (twitter (@sqlbelle) | blog) – April 25, 2011

“Statistics provides tools that you need in order to react intelligently to information you hear or read” – David Lane, 2003

If there’s an upcoming election and you are running for office and getting ready to go from town to town city to city with your flyers, you will want to know approximately how many flyers you’re going to bring.

If you’re the coach of a sports team, you will want to know your players’ stats before you decide who to play when, and against who. You will often play a matchup game, even if you have 20 players, you might be allowed to play just 5 at a time, and you will want to know which of your players will best match up to the other team’s roster. And you don’t want to interview them one by one at game time (table scan), you want to know, based on their statistics, who your best bets are.

Just like the election candidate or the sports coach, SQL Server tries to use statistics to “react intelligently” in its query optimization. Knowing number of records, density of pages, histogram, or available indexes help the SQL Server optimizer “guess” more accurately how it can best retrieve data. A common misnomer is that if you have indexes, SQL Server will use those indexes to retrieve records in your query. Not necessarily. If you create, let’s say, an index to a column City and <90% of the values are ‘Vancouver’, SQL Server will most likely opt for a table scan instead of using the index if it knows these stats.

For the most part, there *may* be minimal we need to do to keep our statistics up-to-date (depending on your configurations), but understanding statistics a little bit better is in order to help us understand SQL Server optimization a little bit more.

How are statistics created?

Statistics can be created different ways
– Statistics are automatically created for each index key you create.

– If the database setting autocreate stats is on, then SQL Server will automatically create statistics for non-indexed columns that are used in queries.

– CREATE STATISTICS

What do statistics look like?

If you’re curious, there’s a couple ways you can peek at what statistics look like.

Option 1 – you can go to your Statistics node in your SSMS, right click > Properties, then go to Details. Below is a sample of the stats and histogram that’s collected for one of the tables in my database

Option 2 – you can use DBCC SHOW_STATISTICS WITH HISTOGRAM

The histograms are a great way to visualize the data distribution in your table.

How are statistics updated?

The default settings in SQL Server are to autocreate and autoupdate statistics.

Notice that there are two (2) options with the Auto Update statistics.
Auto Update Statistics basically means, if there is an incoming query but statistics are stale, SQL Server will update statistics first before it generates an execution plan.
Auto Update Statistics Asynchronously on the other hand means, if there is an incoming query but statistics are stale, SQL Server uses the stale statistics to generate the execution plan, then updates the statistics afterwards.

However, if you want to manually update statistics, you can use either sp_updatestats or UPDATE STATISTICS <statistics name>

How do we know statistics are being used?

One good check you can do is when you generate execution plans for your queries:

check out your “Actual Number of Rows” and “Estimated Number of Rows”.

If these numbers are (consistently) fairly close, then most likely your statistics are up-to-date and used by the optimizer for the query. If not, time for you to re-check your statistics create/update frequency.

What configuration settings should we set?

There may be cases when you may want to disable statistics update temporarily while you’re doing massive updates on a table, and you don’t want it to be slowed down by the autoupdate.

However, for the most part, you will want to keep the SQL Server settings:
– auto create statistics
– auto update statistics

References:

Rob Carrol. http://blogs.technet.com/b/rob/archive/2008/05/16/sql-server-statistics.aspx

Elisabeth Redei has an excellent 3-part series on SQL Server Statistics:
http://sqlblog.com/blogs/elisabeth_redei/archive/2009/03/01/lies-damned-lies-and-statistics-part-i.aspx
http://sqlblog.com/blogs/elisabeth_redei/archive/2009/08/10/lies-damned-lies-and-statistics-part-ii.aspx
http://sqlblog.com/blogs/elisabeth_redei/archive/2009/12/17/lies-damned-lies-and-statistics-part-iii-sql-server-2008.aspx

Excellent Books that touch on statistics
– Apress. Grant Fritchey & Sajal Dam. SQL Server 2008 Query Performance Tuning Distilled.
RedGate. Holger Schmeling. SQL Server Statistics

SQL: Performance Tuning Re-indexing and Update Statistics – A Case Study (good one)

http://www.sql-server-performance.com/2012/performance-tuning-re-indexing-update-statistics/

Performance Tuning Re-indexing and Update Statistics – A Case Study

By satnam singh

Recently we started experiencing a very strange issue in our production reporting environment where the Re-indexing and Update Statistics operation suddenly began taking more than 2 days to complete and was thus causing blockage in the database which in turn caused impairment in application performance.

Reporting Server Configuration Details:
SQL Server Version: SQL Server 2005 Enterprise Edition
Service Pack: 3
Windows Version: Windows Server 2003 R2 Enterprise Edition
Number of CPU’s: 12
RAM: 16 GB

To troubleshoot this issue, I first separated the Re-indexing and Update Statistics operations.

Index Re-building:

Originally the Re-indexing operation was performed using the DBCC DBReindex command. When this command was executed against the database, it drops and re-creates every index present in the tables of the particular database which will take a considerable amount of time as well as cause heavy usage of the Server Hardware resources.

As per best practice recommended by Microsoft there is no need for us to perform the re-indexing of each and every index in the tables of the database. If the fragmentation level of a particular index is less than 30% then we can simply re-organize it, if it is greater than 30% then we should elect for the re-building of the index.

First we need to understand the difference between the re-build and re-organize operations. Index rebuild will drop the existing index and re-create it again whereas index re-organize will physically re-organize the leaf node of the index. One thing to remember here is that the rebuildng operation is a very high resource utilization task.

Based upon this, I decided to modify the Re-indexing task as follows: