How to Identify I/O Bottlenecks in MS SQL Server

Problem

\r\nWe experience regular slowdowns on our SQL Server databases. After analyzing the memory and CPU usage we would like to continue the root cause investigation by examining I/O bottlenecks. What is your recommendation to recognize I/O related bottlenecks in SQL Server?\r\n

Solution

\r\nThe I/O subsystem is a key factor when it comes to SQL Server performance since database pages are constantly moved in and out of the buffer pool. Also the transaction logs and tempDB generate significant I/O traffic. Therefore you have to ensure that your I/O subsystem performs as expected, otherwise you will be a victim of increased response times and frequent time-outs. In this tip I will describe some of the ways to identify I/O related bottlenecks using the built-in tools and provide some disk configuration ideas.\r\n\r\n


\r\n\r\n

Performance Monitor

\r\nYou can use Performance Monitor to check the load on your I/O subsystem. The following performance counters can be setup to check disk performance.\r\n\r\nThe PhysicalDisk Object: Avg. Disk Queue Length counter shows you the average number of read and write requests that were queued on the selected physical disk. The higher the number the more disk operations are waiting. It requires attention if this value frequently exceeds a value of 2 during peak usage of SQL Server.  If you have multiple drives you should take this number and divide by the number of drives in the array to see if the number is above 2.  For example, you have 4 drives and a disk queue length of 10, this would be 10/4 = 2.5, so this would be the value you want to use not 10.\r\n\r\nAvg. Disk Sec/Read and Avg. Disk Sec/Write shows the average time of a data reads and writes from/to the disk. It is good up to 10 ms, but it is still acceptable if less than 20 ms. Any higher value needs further investigation.\r\n\r\nPhysical Disk: %Disk Time is the ratio of elapsed time when the disk drive was busy with read or write requests. The rule of thumb for this value is that it should be below 50 percent.\r\n\r\nThe counters Disk Reads/Sec and Disk Writes/Sec show you the rate of read/write operations on the disk. It should be less than 85 percent of the disk capacity since the disk access time increases exponentially beyond this value.\r\n\r\nYou can determine the disk capacity by gradually increasing the load on the system. One way to do this is to use SQLIO.  You should look for the point where the throughput is constant, but the latency increases.\r\n\r\nYou can use the counters for RAID configurations with the following calculations:\r\nRaid 0: I/O per disk = (reads + writes) / number of disks\r\nRaid 1: I/O per disk = [reads + (writes*2)] / 2\r\nRaid 5: I/O per disk = [reads + (writes*4)] / number of disks\r\nRaid 10: I/O per disk = [reads + (writes*2)] / number of disks\r\n\r\nHere is an example of your I/O per disk for RAID 1, if we get these values from the counters:\r\n\r\nDisk Reads/sec = 90\r\nDisk Writes/sec = 75\r\nThe formula for I/O on a RAID-1 array is [reads + (writes*2)] / 2 or [90 + (75*2)] / 2 = 120 I/Os per disk\r\n\r\n


\r\n\r\n

Dynamic Management Views

\r\nThere are some useful Dynamic Management Views (DMVs) to check I/O bottlenecks.\r\n\r\nAn I/O latch wait occurs when a page is accessed for reading or writing but the page is not available in the buffer pool. It causes waits on PAGEIOLATCH_EX or PAGEIOLATCH_SH, depending upon the type of request. These wait types can indicate an I/O bottleneck. You can query the sys.dm_os_wait_stats DMV to find latch wait statistics. You can identify I/O problems if you save query outputs of waiting_task_counts and wait_time_ms values from a normal working state of your SQL Server and compare these values when performance is degraded.\r\n

\r\n

select *\r\nfrom sys.dm_os_wait_stats\r\nwhere wait_type like 'PAGEIOLATCH%'\r\norder by wait_type asc

\r\n

\r\nPending I/O requests can be found by querying the following DMVs and can be used to identify which disk is responsible for the bottleneck.\r\n

\r\n

select database_id,\r\n       file_id,\r\n       io_stall,\r\n       io_pending_ms_ticks,\r\n       scheduler_address\r\nfrom sys.dm_io_virtual_file_stats(NULL, NULL) iovfs,\r\n     sys.dm_io_pending_io_requests as iopior\r\nwhere iovfs.file_handle = iopior.io_handle

\r\n

\r\n \r\n\r\n


\r\n\r\n

Disk Fragmentation

\r\nI would recommend that you check the disk fragmentation and the configuration of your disks used by the SQL Server instance.\r\n\r\nFragmentation of files on NTFS can cause significant reductions in performance. Disks should be defragmented regularly and a defragmentation policy and plan should be put in place.  Research shows that in some cases a SAN can actually perform worse with defragmentation enabled thus SANs need to be treated on a case-by-case basis.\r\n\r\nFragmentation of indexes can also cause high I/O utilization on NTFS, but this does not have the same affect on SANs which perform better on random I/Os.\r\n\r\n


\r\n\r\n

Disk Configuration / Best Practices

\r\nAs a general rule, you should have log files on a physical disk that is separate from the data files for better performance.  The I/O profile for a heavily used database data files (including tempDB) are random. The I/O profile for all database log files are sequential except when a transaction needs to be rolled back.\r\n\r\nThe internal disks should only be used for database log files, because they excel at sequential I/O, but perform poorly on random I/Os.\r\n\r\nThe database data and log files should each be placed on their own dedicated disk packs. To ensure optimal performance, I recommend that the database log file be placed on two internal disks configured as RAID 1.  The database data file should reside on a SAN system which is only accessed by SQL Server and has controlled querying and reporting.  Ad hoc access should be disallowed.\r\n\r\nWrite caching should be enabled where possible and you should make sure the cache is protected from power failures and other possible failures.\r\n\r\nTo limit possible I/O bottlenecks for your OLTP system you should not mix OLAP and OLTP environments.   Also, make sure your code is tuned and create indexes where necessary to avoid unnecessary I/Os.\r\n\r\n \r\n

Next Steps

\r\n

    \r\n

  • Collect and compare performance counters
  • \r\n

  • Analyze DMV information
  • \r\n

  • Run SQL Server Profiler to find high Read and Write queries that can be tuned
  • \r\n

\r\n \r\n\r\n \r\n\r\nReference: Tybor Nagi, sql t!p\r\n\r\n 

5 thoughts on “How to Identify I/O Bottlenecks in MS SQL Server”

  1. Google Chrome 11.0.696.68 Google Chrome 11.0.696.68 Windows XP Windows XP
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.68 Safari/534.24

    Hi All,\r\n\r\nHow do I determine “Disk Reads/Sec” is less than 85% Percent ?\r\n\r\nThanks in Advance\r\nJayant dass

  2. Google Chrome 14.0.835.202 Google Chrome 14.0.835.202 Windows 7 Windows 7
    Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1

    20% of Available Memory is way beyond too high for a huge server. Let us, 20% of 20 GB is 4 GB which is a lot to me. If a machine has 4 GB Free memory then, it is doing good !!

    1. Google Chrome 14.0.835.202 Google Chrome 14.0.835.202 Windows 7 Windows 7
      Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1

      Google Books articles on “Pro SQL Server 2008 Administration”: \r\nhttp://books.google.com/books?id=GQdDmkqA2TAC&pg=PA471&lpg=PA471&dq=how+to+determine+%E2%80%9CDisk+Reads/Sec%E2%80%9D+is+less+than+85%25&source=bl&ots=nnCv7bV-z5&sig=8nU3GwtNTNpJjt4wzEt5ESqvCFI&hl=en&ei=L3moTseYIcWg4gS3tIHuDw&sa=X&oi=book_result&ct=result&resnum=4&ved=0CDUQ6AEwAw#v=onepage&q=how%20to%20determine%20%E2%80%9CDisk%20Reads%2FSec%E2%80%9D%20is%20less%20than%2085%25&f=false

Comments are closed.