{"id":206,"date":"2016-05-10T08:49:30","date_gmt":"2016-05-10T07:49:30","guid":{"rendered":"https:\/\/sqldoubleg.live-website.com\/?p=206"},"modified":"2016-10-19T09:40:17","modified_gmt":"2016-10-19T08:40:17","slug":"successful-anti-patterns-key-lookup-vs-rid-lookup","status":"publish","type":"post","link":"https:\/\/www.sqldoubleg.com\/es\/2016\/05\/10\/successful-anti-patterns-key-lookup-vs-rid-lookup\/","title":{"rendered":"Successful Anti-Patterns, Key Lookup VS RID Lookup"},"content":{"rendered":"<p>This post wants to show some performance improvements we may have by doing things the way we&#8217;ve been told not to&nbsp;When it comes to technical subjects like databases, we must always look for proven solutions and best practices, which usually take us from A to B in the most efficient way. Usually.<\/p>\n<p>But what happens if one of those approaches considered best practices is not as efficient as the one supposed to be the baddie?<br \/>\n&nbsp;<br \/>\n<strong style=\"color:red\">DISCLAIMER: This is a demonstration which shows a fact or series of facts, not pretending to set or contradict any best practice.<\/strong><br \/>\n&nbsp;<br \/>\nWith that said, let&#8217;s start with the topic in title.<\/p>\n<p>In SQL Server, there are a number of reasons why you want to have most of (if not all) your tables created as clustered indexes and not heaps. There are also lots of information about how we should carefully choose on which column[s] from our tables we would create them as this choice will help our performance in general. My favorite expert for this topic is Kimberly Tripp (<a href=\"http:\/\/www.sqlskills.com\/blogs\/kimberly\/\" target=\"_blank\">b<\/a>|<a href=\"https:\/\/twitter.com\/KimberlyLTripp\" target=\"_blank\">t<\/a>), so I strongly recommend reading <a href=\"http:\/\/www.sqlskills.com\/blogs\/kimberly\/ever-increasing-clustering-key-the-clustered-index-debate-again\/\" target=\"_blank\">this<\/a>, <a href=\"http:\/\/www.sqlskills.com\/blogs\/kimberly\/the-clustered-index-debate-continues\/\" target=\"_blank\">this<\/a> and <a href=\"http:\/\/www.sqlskills.com\/blogs\/kimberly\/more-considerations-for-the-clustering-key-the-clustered-index-debate-continues\/\" target=\"_blank\">this<\/a>. Just checking the dates on the posts, you can picture how long this has been around.<\/p>\n<p>But sometimes we can find some behaviours that contradicts all we&#8217;ve learned and get unexpected benefits from doing things <em>\u00abwrong\u00bb<\/em>.<\/p>\n<p>Let&#8217;s set up a small demo where to see this in action, starting by restoring a copy of [AdventureWorksXXXX] on our server, in my case the 2014 version of it<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\n--===============================================================================================\r\n-- Set up\r\n--===============================================================================================\r\n\r\nUSE master\r\nGO\r\nIF DB_ID('AdventureWorks2014') IS NOT NULL BEGIN\r\n\tALTER DATABASE AdventureWorks2014 SET SINGLE_USER WITH ROLLBACK IMMEDIATE\r\n\tDROP DATABASE AdventureWorks2014\r\nEND\r\n\r\nRESTORE FILELISTONLY FROM DISK = '.\\AdventureWorks2014.bak'\r\n\r\nRESTORE DATABASE AdventureWorks2014\r\nFROM DISK = '.\\AdventureWorks2014.bak'\r\nWITH RECOVERY\r\n, MOVE 'AdventureWorks2014_Data' TO 'C:\\Program Files\\Microsoft SQL Server\\MSSQL12.MSSQL2014\\MSSQL\\DATA\\AdventureWorks2014_Data.mdf'\r\n, MOVE 'AdventureWorks2014_Log' TO 'C:\\Program Files\\Microsoft SQL Server\\MSSQL12.MSSQL2014\\MSSQL\\DATA\\AdventureWorks2014_Log.ldf'\r\nGO\r\n\r\n<\/pre>\n<p>&nbsp;<br \/>\nAnd then we create a HEAP table by simply SELECTing one INTO another<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nUSE AdventureWorks2014\r\nGO \r\n \r\nSELECT * \r\nINTO Person.Address_HEAP\r\nFROM Person.Address\r\n\r\n<\/pre>\n<p>&nbsp;<br \/>\nThe first [unexpected] benefit is that the HEAP is smaller than the Clustered Index.<br \/>\nSince there is no B-Tree to navigate to our records, those sort of pages are not required in a HEAP, so the result is a slightly smaller table, not enough to say it&#8217;s and advantage but there it is.<\/p>\n<p><a href=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/05\/01_data_size.png\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-207\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/05\/01_data_size.png\" alt=\"01_data_size\" width=\"1228\" height=\"626\" \/><\/a><br \/>\n&nbsp;<br \/>\nSo, to the point, let&#8217;s create a nonclustered index on our HEAP to match an existing one on the clustered table.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nUSE AdventureWorks2014\r\nGO \r\n\r\nCREATE NONCLUSTERED INDEX [IX_Address_StateProvinceID] ON [Person].[Address_HEAP]\r\n(\r\n\t[StateProvinceID] ASC\r\n)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]\r\nGO\r\n<\/pre>\n<p>&nbsp;<br \/>\nNow we need a bit of imagination but you&#8217;ll see it makes sense (hopefully)<\/p>\n<p>We have our 2 tables with a non clustered index on the column [StateProvinceID], so the query optimizer is likely to choose the combination of Non Clustered Index Seek + Lookup if we query the table for a specific value that do not exceed the <a href=\"http:\/\/www.sqlskills.com\/blogs\/kimberly\/the-tipping-point-query-answers\/\" target=\"_blank\">tipping point<\/a>.<\/p>\n<p>In this case I will force the optimizer to choose that plan like we are in a SP that suffers from <a href=\"https:\/\/blogs.technet.microsoft.com\/mdegre\/2011\/11\/06\/what-is-parameter-sniffing\/\" target=\"_blank\">parameter sniffing<\/a>, so take note because that is a common behaviour.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nUSE AdventureWorks2014\r\nGO \r\n\r\nSET STATISTICS IO, TIME ON\r\n\r\nDECLARE @StateProvinceID INT = 9\r\n\r\nSELECT * FROM Person.Address\r\nWHERE StateProvinceID = @StateProvinceID\r\nOPTION (OPTIMIZE FOR (@StateProvinceID = 1))\r\n\r\nGO\r\n\r\nDECLARE @StateProvinceID INT = 9\r\n\r\nSELECT * FROM Person.Address_HEAP\r\nWHERE StateProvinceID = @StateProvinceID\r\nOPTION (OPTIMIZE FOR (@StateProvinceID = 1))\r\n<\/pre>\n<p>&nbsp;<br \/>\nAs expected, on both queries the plan is identical with the exception of the <strong>Key Lookup<\/strong> versus the <strong>RID Lookup<\/strong><\/p>\n<p><a href=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/05\/02_Query_Plan.png\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-209\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/05\/02_Query_Plan.png\" alt=\"02_Query_Plan\" width=\"1013\" height=\"525\" srcset=\"https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/05\/02_Query_Plan.png 1013w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/05\/02_Query_Plan-300x155.png 300w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/05\/02_Query_Plan-150x78.png 150w\" sizes=\"(max-width: 1013px) 100vw, 1013px\" \/><\/a><br \/>\n&nbsp;<br \/>\nBoth plans are the same but let&#8217;s get a better look to STATISTICS IO to see what was going on under the hood.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\n\/*\r\n(4564 row(s) affected)\r\nTable 'Address'. Scan count 1, logical reads 9139, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.\r\n\r\n(1 row(s) affected)\r\n\r\n SQL Server Execution Times:\r\n   CPU time = 15 ms,  elapsed time = 176 ms.\r\nSQL Server parse and compile time: \r\n   CPU time = 0 ms, elapsed time = 0 ms.\r\n\r\n SQL Server Execution Times:\r\n   CPU time = 0 ms,  elapsed time = 0 ms.\r\n\r\n(4564 row(s) affected)\r\nTable 'Address_HEAP'. Scan count 1, logical reads 4577, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.\r\n\r\n(1 row(s) affected)\r\n\r\n SQL Server Execution Times:\r\n   CPU time = 0 ms,  elapsed time = 95 ms.\r\nSQL Server parse and compile time: \r\n   CPU time = 0 ms, elapsed time = 0 ms.\r\n\r\n SQL Server Execution Times:\r\n   CPU time = 0 ms,  elapsed time = 0 ms.\r\n\r\n*\/\r\n<\/pre>\n<p>&nbsp;<br \/>\nYou can see we need almost double the reads (and time) to finish with the lookups in the clustered index versus the HEAP.<\/p>\n<p>The explanation is that for every lookup, you must locate the row, and that is done in different ways<br \/>\n&#8211; If there is a clustered index, the non clustered index references a row by the clustering key, so we must check the upper levels of the B-tree to locate the row at the leaf levels.<br \/>\n&#8211; However in the HEAP, rows are uniquely identified by a RID which points exactly to a specific file, page and slot in our database, and that&#8217;s the only needed by the storage engine to go fetch that row, so the ratio of reads per fetched row looks more like 1:1 as opposed to the 2:1 in the clustered index in this example.<br \/>\n&nbsp;<br \/>\n<strong>Conclusion<\/strong><br \/>\n&nbsp;<br \/>\nMost of the time following best practices will result in a more robust a fast performing database, but as always there are some behaviors that might contradict them.<\/p>\n<p>The takeaway of this post is that you do not have to assume best practices are always right for each and every case, it is more important to learn how things work and have the knowledge to decide when those best practices really apply to your environment or workload, because every question may be answered with a big \u00abIt Depends\u00bb.<\/p>\n<p>As I said in the disclaimer, I&#8217;m not saying now you should go drop all your clustered indexes because of this example. There are certainly more benefits doing things right and I only pretend you feel more curious about how SQL Server works.<\/p>\n<p>If you have experience with other anti-patterns which can have a use case, please feel free to share them and thanks for reading.<br \/>\n&nbsp;<br \/>\n&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post wants to show some performance improvements we may have by doing things the way we&#8217;ve been told not to&nbsp;When it comes to technical subjects like databases, we must always look&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[35,36,5,12,25,49],"_links":{"self":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts\/206"}],"collection":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/comments?post=206"}],"version-history":[{"count":1,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts\/206\/revisions"}],"predecessor-version":[{"id":212,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts\/206\/revisions\/212"}],"wp:attachment":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/media?parent=206"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/categories?post=206"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/tags?post=206"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}