Into SQL: Avoid TempDB contention

A great discussion on msdn blogs, that any DBA should read when feels not certain about tempdb configuration, it summaries what should be done, and speaks about some myths of worker writes, it is worth reading

http://blogs.msdn.com/b/psssql/archive/2007/02/21/sql-server-urban-legends-discussed.aspx

SQL Server Uses One Thread Per Data File

The Legend grew from the following SQL Server behavior - “When SQL Server is creating a database file (.mdf, .ndf, .ldf) it writes zero’s to all bytes in the file and stamps the appropriate allocation structures. SQL Server can use a thread per file when creating a database.” This is a true statement but leaves room for interpretation so allow me to clarify with the following examples.

The true behavior is that for each UNIQUE disk drive specified a thread is used to initialize the database files. The number of threads is capped by a reasonable pool limit around one half of the total worker thread limit.

create database …. name = c:\MyFile1.mdf, name=c:\MyFile1.ndf

The SQL Server would use a single worker to create this database initializing MyFile1.mdf and then MyFile2.ndf.

create database …. name = c:\MyFile1.mdf, name=D:\MyFile1.ndf

The SQL Server would use two workers to create this database one thread initializing MyFile1.mdf and the other MyFile2.ndf because different drives have been specified. SQL Server makes the assumption that different drives have the possibility of unique I/O paths.

This information has been taken out of context and propagated incorrectly as SQL Server has a thread per database file so adding more files to a database can increase I/O performance. Adding more files and properly aligning them with storage can increase the I/O performance for a variety of reasons but achieving new I/O threads per file is NOT one of them. SQL Server I/O is all about I/O response time. You can have a variety of I/O configurations as long as the response times are acceptable.

The SQL Server 2005 – Instant File Initialization – feature significantly reduces the impact of database creation because zero’s don’t have to be stamped in all bytes of a database file, only the log files. This reduces the gain from using multiple threads during database creation.

SQL Server 2005 and Newer

Actions such as create and drop database assign a work unit number based on the volume identity.   This means each file specified in the create/drop database statement gets assigned a work unit number.
The APIs GetVolumePathName and GetVolumeNameForMountPoint are used to obtain the identity of the volume. Only when these APIs return an error do we revert to the drive identifier behavior, making the assumption that different drive letters are associated with unique volumes. A background worker is created to handle each unique work unit.
The following illustration the drives C and D are using Volume A and E used Volume B.   C and D will be assigned to Work Unit 1 and processed by Worker 1 while E will be assigned to Work Unit 2 and be processed by Worker 2.

The Truth

SQL Server uses asynchronous I/O allowing any worker to issue an I/O requests regardless of the number and size of the database files or what scheduler is involved. In fact, some may remember SQL Server 6.5 and previous versions had the sp_configure option ‘max async I/O’ that was commonly set to 255 and was removed when SQL Server 7.0 released.

SQL Server controls the async I/O depth with a set of FCBIoRequest structures. These structures hold the I/O request information, including the OVERLAPPED structure. There is a single pool of these structures for the entire instance of SQL Server. For SQL Server 6.5 the maximum number of these structures was controlled with the configuration option.

Starting with SQL Server 7.0 the number of these structures is only limited by available memory. It is actually a bit more complicated than just can a new FCBIoRequest be allocated. SQL Server has basic throttles for lazy writer, checkpoint, read ahead and other activities. This means that the practical limit is ((# of workers) * (action practical limit)). For example, the practical limit for checkpoint is generically no more than 100 outstanding I/O requests a any point in time and less if the target response time is not being honored.

Tempdb

The immediate question I get is – “What about all the KB articles on tempdb, a file for scheduler and such?” This is again a misinterpretation of the legend as tempdb tuning is all about the allocation activity, not a number of threads per file (no such thing). Workers use tempdb like any other database, any worker can issue I/O to and from tempdb as needed.

Tempdb is the database with the highest level of create and drop actions and under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks. SQL Server 2005 reduces contention with the ‘cached temp table’ feature and allocation contention skip ahead actions.

When multiple workers are attempting to create or drop objects in tempdb you can decrease the bottleneck by having

· As many files in tempdb as schedulers

· All files sizes are equal

· Uniform allocations enabled (-T1118)

By configuring in this way each new allocation occurs in round-robin fashion across the files and only on uniform extents. SQL Server only allows a single worker to be active on each scheduler at any point in time so if every ‘running’ worker is attempting tempdb allocations they will logically partition to a separate file in tempdb, reducing the allocation contentions.

I/O Affinity

I/O affinity is the ability to dedicate a scheduler to the processing of I/O requests. This muddied the water as you can imagine adding to the legend.

I/O affinity does not understand database or database specifics. It is only a queuing mechanism for issuing and completing I/O. A worker determines it needs to perform an I/O and does all the basic work. The I/O affinity hook is just before the worker issues the actual ReadFile or WriteFile call. Instead of issuing the I/O the worker puts the I/O request on a queue. The queue is serviced by the dedicated scheduler instead.

A Disk Queue Length greater than 2 indicates an I/O bottleneck

A statement was published many years ago that sustained disk queue length greater than 2 is an indication of an I/O bottleneck. This statement is still true if the application is not designed to handle the situation. SQL Server is designed to push disk queue lengths above 2 when it is appropriate.

SQL Server uses async I/O to help maximize resource usage. SQL Server understands that it can hand off an I/O request to the I/O subsystem and continue with other activity. Let’s look an example of this.

SQL Server checkpoint posts up to 100 I/O requests and monitors the I/O response time in order to properly throttle checkpoint impact. When the I/O response time exceeds the target the number of I/Os is throttled. The disk queue length can easily exceed 2 and not be an indication of a subsystem problem. SQL Server is attempting to maximize the I/O channel.

SQL Server does read ahead or tempdb spooling actions. The workers seldom directly wait for the I/O to complete. Instead the workers continue on with other actives taking advantage of CPU processing power instead of waiting for the I/O response. You would not want SQL Server to read a page, add it to a sort, and write the sort page during a large sort. Instead SQL Server will post the sort I/O in motion and return to the next read page and continue sorting. The sort operations track the outstanding I/Os and check completion status at later intervals.

The next time someone looks at the disk counters on the system and states the disk queue length is above 2 and this is a problem take a minute to look at other data points. The key for SQL Server is the I/O response times. Start by looking at the Average Disk Seconds Per Transfer for the same I/O path. Then look at the SQL Server sysprocesses and waitstats for buffer based I/O wait information. You want to see evidence from SQL Server that the disk queue length is related to an I/O bottleneck. Don’t just take the depth of 2 at face value.

- Bob Dorr

Into SQL

Tuesday, September 25, 2012

Avoid TempDB contention

No comments:

Post a Comment