{"id":316,"date":"2016-09-21T14:34:46","date_gmt":"2016-09-21T13:34:46","guid":{"rendered":"https:\/\/sqldoubleg.live-website.com\/?p=316"},"modified":"2016-09-21T14:49:40","modified_gmt":"2016-09-21T13:49:40","slug":"database-design-matters-choosing-the-clustering-key","status":"publish","type":"post","link":"https:\/\/www.sqldoubleg.com\/es\/2016\/09\/21\/database-design-matters-choosing-the-clustering-key\/","title":{"rendered":"Database Design Matters, choosing the Clustering Key"},"content":{"rendered":"<p>There has been a permanent debate about which is the right clustering key for our tables since the beginning of the time, this time I&#8217;ll get a bit deeper to show you the possible consequences of that choice&nbsp;There has been a permanent debate about which is the right clustering key for our tables since the beginning of the time, well the SQL Server time :), and repeating the same mantras are not going to add anything new, so I will not do it.<\/p>\n<p>This post is more intended to show what happens when you choose a random value as the key for your clustered index from a much deeper angle. <\/p>\n<p>Hopefully you will find very fascinating how the theoretical knowledge about SQL Server internals finally makes total sense.<\/p>\n<p>&nbsp;<br \/>\n<strong>Background<\/strong><\/p>\n<p>This far I assume, in general terms, everyone agrees that using a UNIQUEIDENTIFIER column as our clustered index key is bad, we&#8217;ve heard many times it will produce fragmentation and that&#8217;s really bad, so we tend not to although there might be some use cases (one of these days I&#8217;ll try write something).<\/p>\n<p>But how does this happen and how bad is it? Well, that&#8217;s the fun part, so let&#8217;s get to it.<\/p>\n<p>&nbsp;<br \/>\n<strong>Preparing the playground<\/strong><\/p>\n<p>We&#8217;d need a new database and a new table with our bad practice ready to hit us.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nCREATE DATABASE bad_cluster_key\r\nGO\r\n\r\nUSE bad_cluster_key\r\nGO\r\n\r\nCREATE TABLE dbo.BadClusterKey(\r\nID UNIQUEIDENTIFIER NOT NULL CONSTRAINT PK_BadClusterKey PRIMARY KEY DEFAULT NEWID()\r\n, Col1 CHAR(100) DEFAULT REPLICATE('A', 100)\r\n, Col2 CHAR(100) DEFAULT REPLICATE('B', 100)\r\n)\r\nGO \r\n\r\nINSERT INTO dbo.BadClusterKey DEFAULT VALUES\r\nGO 1000\r\n<\/pre>\n<p>I&#8217;ve just created the table and done 1000 single INSERT, since the clustered key has a pretty random value returned by NEWID(), inserts have been occurring in random places all over our table (clustered index).<\/p>\n<p>In order to remove the fragmentation we have, I&#8217;m going to rebuild it.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nSELECT * FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.BadClusterKey'), NULL, NULL, 'DETAILED')\r\n-- 41 pages\r\n\r\nALTER INDEX PK_BadClusterKey ON dbo.BadClusterKey REBUILD\r\n\r\nSELECT * FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.BadClusterKey'), NULL, NULL, 'DETAILED')\r\n-- 29 pages\r\nGO\r\n<\/pre>\n<p><a href=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/09\/01_physical_stats.png\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/09\/01_physical_stats.png\" alt=\"01_physical_stats\" width=\"1323\" height=\"247\" class=\"aligncenter size-full wp-image-317\" srcset=\"https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/01_physical_stats.png 1323w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/01_physical_stats-300x56.png 300w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/01_physical_stats-1024x191.png 1024w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/01_physical_stats-150x28.png 150w\" sizes=\"(max-width: 1323px) 100vw, 1323px\" \/><\/a><\/p>\n<p>You can see how the size of the index is now smaller because we got rid of the fragmentation. <\/p>\n<p>So far I haven&#8217;t show you the real deal, because with a new table is difficult but since we now have a number of rows and virtually no fragmentation, we can see the effect of inserting new rows.<\/p>\n<p>Before doing so I want to be sure all my transaction rows will stay in the log, so time for a backup.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nBACKUP DATABASE bad_cluster_key TO DISK = 'NUL'\r\nGO\r\nBACKUP LOG bad_cluster_key TO DISK = 'NUL'\r\nGO\r\nCHECKPOINT\r\nGO\r\n<\/pre>\n<p>Now we are ready to do another 10 (random) INSERT and see what has been going on there.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nINSERT INTO dbo.BadClusterKey DEFAULT VALUES\r\nGO 10\r\n\r\nSELECT * FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.BadClusterKey'), NULL, NULL, 'DETAILED')\r\nGO\r\n-- 38\r\n<\/pre>\n<p><a href=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/09\/02_physical_stats_after_inserts.png\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/09\/02_physical_stats_after_inserts.png\" alt=\"02_physical_stats_after_inserts\" width=\"1315\" height=\"145\" class=\"aligncenter size-full wp-image-318\" srcset=\"https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/02_physical_stats_after_inserts.png 1315w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/02_physical_stats_after_inserts-300x33.png 300w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/02_physical_stats_after_inserts-1024x113.png 1024w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/02_physical_stats_after_inserts-150x17.png 150w\" sizes=\"(max-width: 1315px) 100vw, 1315px\" \/><\/a><\/p>\n<p>You can see how we have grown 9 pages, almost 1 page per row we have inserted&#8230; wow! that&#8217;s huge!<\/p>\n<p>First and most noticeable consequence is fragmentation, but that&#8217;s not deep enough. I was performing a full plus transaction log backups to have less activity in our transaction log so I can find the 10 row I&#8217;ve just inserted easily.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nSELECT [Begin Time]\r\n\t\t, [Transaction ID]\r\n\t\t, [Transaction Name]\r\n\t\t, [AllocUnitName]\r\n\t\t, [Page ID]\r\n\t\t, [Slot ID]\r\n\t\t, [Parent Transaction ID]\r\n\t\t, SUSER_SNAME([Transaction SID]) AS [Login Name]\r\n\t\t, [New Split Page]\r\n\t\t, [Rows Deleted]\r\n\t\t, [Description]\r\n\t\t--, * \r\n\tFROM fn_dblog(NULL, NULL)\r\nGO\r\n<\/pre>\n<p><a href=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/09\/03_fn_dblog.png\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/09\/03_fn_dblog.png\" alt=\"03_fn_dblog\" width=\"1744\" height=\"558\" class=\"aligncenter size-full wp-image-319\" srcset=\"https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/03_fn_dblog.png 1744w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/03_fn_dblog-300x96.png 300w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/03_fn_dblog-1024x328.png 1024w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/03_fn_dblog-150x48.png 150w\" sizes=\"(max-width: 1744px) 100vw, 1744px\" \/><\/a><br \/>\nThat&#8217;s still hell of a lot of information, so I&#8217;m going to focus in a single INSERT (transaction) and to show you what I&#8217;m interested in:<\/p>\n<ol>\n<li>&#8211; (Row 49, transaction ID ~0743) We have a row to be inserted which happens that <strong>must<\/strong> be placed in a specific page, because the clustered index enforces how the data is stored and that is sorted by the key.<\/li>\n<li>&#8211; (Row 51-70, transaction ID ~0744) The storage engine has detected there is not enough free space in that page to fit another row, so it has to make room by splitting the page.<br \/>\nThere are many things to get done, hence the amount of records in the transaction log.<\/li>\n<li>&#8211; (Row 62) Half(~ish) the rows are moved from one page to a new page, description can&#8217;t be clearer, <em>Moved 19 row(s) at slot 16 from page 0001:00000158 to page 0001:00000179<\/em><\/li>\n<li>&#8211; (Row 71-72) Now that there is enough free space, transaction ~0743 can complete and the new row is finally inserted in its natural place, in this case <em>page 0001:00000158, slot 13<\/em><\/li>\n<\/ol>\n<p>&nbsp;<br \/>\nAll this trouble just to get one new row inserted. If this was happening on regular basis, imagine what a waste when almost each INSERT results in a page split and then your maintenance task have to go and remove the fragmentation by rebuilding the clustered index (which is not online unless Expensive Edition) and all the log to be sent to your HA\/DR replicas&#8230; <\/p>\n<p>&nbsp;<br \/>\n<strong>The physical row<\/strong><\/p>\n<p>Hopefully so far everything makes sense, so let&#8217;s check those 2 pages to see how the data is stored, to see inside any specific page we use DBCC PAGE(). In a new Query window we run:<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nUSE bad_cluster_key\r\nGO\r\n-- Moved 19 row(s) at slot 16 from page 0001:00000158 to page 0001:00000179\r\n-- We need to convert from Hexadecimal to decimal\r\nSELECT CONVERT(INT, 0x0158), CONVERT(INT, 0x0179)\r\n-- 344\t377\r\nGO\r\nDBCC TRACEON(3604)\r\nGO\r\nSELECT DB_ID()\r\n-- 12\r\nDBCC PAGE(12, 1, 344, 1)\r\n\/*\r\nOFFSET TABLE:\r\n\r\nRow - Offset                        \r\n16 (0x10) - 3441 (0xd71)            \r\n15 (0xf) - 3218 (0xc92)             \r\n14 (0xe) - 2995 (0xbb3)             \r\n13 (0xd) - 7901 (0x1edd)   &lt;----------- This is our row          \r\n12 (0xc) - 2772 (0xad4)             \r\n11 (0xb) - 2549 (0x9f5)             \r\n10 (0xa) - 2326 (0x916)             \r\n9 (0x9) - 2103 (0x837)              \r\n8 (0x8) - 1880 (0x758)              \r\n7 (0x7) - 1657 (0x679)              \r\n6 (0x6) - 1434 (0x59a)              \r\n5 (0x5) - 1211 (0x4bb)              \r\n4 (0x4) - 988 (0x3dc)               \r\n3 (0x3) - 765 (0x2fd)               \r\n2 (0x2) - 542 (0x21e)               \r\n1 (0x1) - 319 (0x13f)               \r\n0 (0x0) - 96 (0x60)                      \r\n*\/\r\n<\/pre>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nDBCC PAGE(12, 1, 344, 3)\r\n\/*\r\nSlot 13 Column 1 Offset 0x4 Length 16 Length (physical) 16\r\n\r\nID = e75e82eb-74d6-4502-bdc1-507fe87323fb                                \r\n\r\nSlot 13 Column 2 Offset 0x14 Length 100 Length (physical) 100\r\n\r\nCol1 = AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\r\n\r\nSlot 13 Column 3 Offset 0x78 Length 100 Length (physical) 100\r\n\r\nCol2 = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\r\n\r\nSlot 13 Offset 0x0 Length 0 Length (physical) 0\r\n*\/\r\n<\/pre>\n<p>We&#8217;ve seen the message &#8216;<em>Moved 19 row(s) at slot 16 from page 0001:00000158 to page 0001:00000179<\/em>&#8216;.<br \/>\nSo, after moving 19 rows, 16 were left in this page (Slots 0-15) and then the row was inserted (slot 13) hence we have now 17 rows (slots 0-16).<\/p>\n<p>See how row (slot) 13 is at the end of the page (offset 7901), not immediately between 12 and 14, but that&#8217;s not important as far as it is sorted correctly in the slot array, where obviously it is.<\/p>\n<p>If we check the new page generated by the split page operation we can see how for sure we have the 19 rows moved from page 344.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nDBCC PAGE(12, 1, 377, 1)\r\n\/*\r\nOFFSET TABLE:\r\n\r\nRow - Offset                        \r\n19 (0x13) - 4110 (0x100e)           \r\n18 (0x12) - 3887 (0xf2f)            \r\n17 (0x11) - 3664 (0xe50)            \r\n16 (0x10) - 3441 (0xd71)            \r\n15 (0xf) - 3218 (0xc92)             \r\n14 (0xe) - 2995 (0xbb3)             \r\n13 (0xd) - 2772 (0xad4)             \r\n12 (0xc) - 2549 (0x9f5)             \r\n11 (0xb) - 2326 (0x916)             \r\n10 (0xa) - 2103 (0x837)             \r\n9 (0x9) - 1880 (0x758)              \r\n8 (0x8) - 1657 (0x679)              \r\n7 (0x7) - 1434 (0x59a)              \r\n6 (0x6) - 1211 (0x4bb)              \r\n5 (0x5) - 4333 (0x10ed)             \r\n4 (0x4) - 988 (0x3dc)               \r\n3 (0x3) - 765 (0x2fd)               \r\n2 (0x2) - 542 (0x21e)               \r\n1 (0x1) - 319 (0x13f)               \r\n0 (0x0) - 96 (0x60)                 \r\n               \r\n*\/\r\n<\/pre>\n<p>See how there are 19&#8230; errrrr 20? OMG! hahaha&#8230; don&#8217;t loose your faith just yet! there is one extra row because since our 10 INSERT are random, probably another row happened to be inserted on this very page, see how slot 5 is also at the end of the page, so I can be certain that insert can be found later in the transaction log.<\/p>\n<p><a href=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/09\/04_fn_dblog_insert_no_split.png\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/09\/04_fn_dblog_insert_no_split.png\" alt=\"04_fn_dblog_insert_no_split\" width=\"1738\" height=\"297\" class=\"aligncenter size-full wp-image-320\" srcset=\"https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/04_fn_dblog_insert_no_split.png 1738w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/04_fn_dblog_insert_no_split-300x51.png 300w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/04_fn_dblog_insert_no_split-1024x175.png 1024w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/09\/04_fn_dblog_insert_no_split-150x26.png 150w\" sizes=\"(max-width: 1738px) 100vw, 1738px\" \/><\/a><\/p>\n<p>There you go! Another INSERT in the same page (x0179 &rarr; 377) slot 5. Boom!<\/p>\n<p>&nbsp;<br \/>\n<strong>Concepts\/Take away&#8217;s<\/strong><\/p>\n<p>Writing this demo has been pretty awesome for me and I have to say I needed to do this process at work to find the latest inserted rows in a table where the clustered key was not an IDENTITY value, so all these geekin&#8217; around has some real value, internals are so cool!<\/p>\n<p>And just to wrap up some of the concepts I have used or referenced in this post.<\/p>\n<p>&#8211; <a href=\"http:\/\/www.sqlskills.com\/blogs\/kimberly\/the-clustered-index-debate-continues\/\" target=\"_blank\">The Clustered Index debate by Kimberly Tripp<\/a><br \/>\n&#8211; <a href=\"http:\/\/www.sqlskills.com\/blogs\/paul\/category\/fragmentation\/\" target=\"_blank\">Index Fragmentation by Paul Randal<\/a><br \/>\n&#8211; <a href=\"http:\/\/sqlinthewild.co.za\/index.php\/2009\/08\/31\/backing-up-to-nul-vs-backup-with-truncate-only\/\" target=\"_blank\">Backup to NUL by Gail Shaw<\/a><br \/>\n&#8211; <a href=\"http:\/\/www.sqlskills.com\/blogs\/paul\/using-fn_dblog-fn_dump_dblog-and-restoring-with-stopbeforemark-to-an-lsn\/\" target=\"_blank\">fn_dblog() by Paul Randal<\/a><br \/>\n&#8211; <a href=\"https:\/\/blogs.msdn.microsoft.com\/sqlserverstorageengine\/2006\/12\/13\/more-undocumented-fun-dbcc-ind-dbcc-page-and-off-row-columns\/\" target=\"_blank\">DBCC PAGE by Paul Randal<\/a><\/p>\n<p>Thanks for reading!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There has been a permanent debate about which is the right clustering key for our tables since the beginning of the time, this time I&#8217;ll get a bit deeper to show you&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[35,40,5,12,25],"_links":{"self":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts\/316"}],"collection":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/comments?post=316"}],"version-history":[{"count":1,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts\/316\/revisions"}],"predecessor-version":[{"id":321,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts\/316\/revisions\/321"}],"wp:attachment":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/media?parent=316"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/categories?post=316"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/tags?post=316"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}