{"id":74,"date":"2015-10-17T14:39:14","date_gmt":"2015-10-17T13:39:14","guid":{"rendered":"https:\/\/sqldoubleg.live-website.com\/?p=74"},"modified":"2015-10-28T21:02:56","modified_gmt":"2015-10-28T21:02:56","slug":"divide-and-conquer","status":"publish","type":"post","link":"https:\/\/www.sqldoubleg.com\/es\/2015\/10\/17\/divide-and-conquer\/","title":{"rendered":"Divide and Conquer Date Search Hack"},"content":{"rendered":"<p>Divide and Conquer Date Search Hack (because binary search sounds boring)&nbsp;<strong>Divide and Conquer Date Search Hack<\/strong> <em>(because binary search sounds boring)<\/em><\/p>\n<p>This week has been an amazing experience, not only I have attended to one of the most valuable SQL Server training you can find (IEPTO2 by <a href=\"http:\/\/www.sqlskills.com\" target=\"_blank\">@SqlSkills<\/a> team), I\u2019ve also met really nice people and brains, all of them with something in common, a big passion for technology and specially SQL Server.<\/p>\n<p>Speaking of crazy ideas and \u2018hacks\u2019 we\u2019ve done through the years to make things to work, one of them got my attention for its simplicity, the idea of hacking a Date search by hitting a numeric key.<\/p>\n<p>The idea of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Binary_search_algorithm\" target=\"_blank\">binary search<\/a> has been there for a while, but taking advantage of the algorithm and applying it to SQL Server how it&#8217;s described in a blog post by Rainer Unwin (<a href=\"http:\/\/rmunwin.com\/t-sql-binary-search\/\" target=\"_blank\">See original post<\/a>) was so great that I couldn\u2019t stop myself for writing my own version, so I got to it.<\/p>\n<p>The approach is such that having a numeric clustered key, with an ever increasing pattern (Identity fulfils that perfectly), by doing individual seeks for particular keys and checking the date values for those rows, you can narrow down the range to end up pointing the desired date, understanding that the date we\u2019re looking for, should be also ever increasing as the clustered key is.<\/p>\n<p>Let&#8217;s say this is the whole range of possible values for the clustered key within our table<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2015\/10\/divide_01.png\" alt=\"divide_01\" width=\"560\" height=\"99\" class=\"aligncenter size-full wp-image-75\" srcset=\"https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2015\/10\/divide_01.png 560w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2015\/10\/divide_01-300x53.png 300w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2015\/10\/divide_01-150x27.png 150w\" sizes=\"(max-width: 560px) 100vw, 560px\" \/><\/p>\n<p>If the value we\u2019re looking for is neither of those, we divide the range and ask in which side our value can be, so we can eliminate half of the values at once.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2015\/10\/divide_02.png\" alt=\"divide_01\" width=\"560\" height=\"99\" class=\"aligncenter size-full wp-image-75\" \/><\/p>\n<p>Doing consecutive divisions of the range, we will narrow down the subset to the value we are looking for (if exists) and then we can return the key for that value<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2015\/10\/divide_03.png\" alt=\"divide_01\" width=\"560\" height=\"99\" class=\"aligncenter size-full wp-image-75\" \/><\/p>\n<p>This can be extremely useful if you have a medium or big size table without a nonclustered index on the date column you want to search on. The performance of individual seeks on the clustered primary key comparing to a clustered index scan can go from several minutes to less than a second.<\/p>\n<p>An that scales like a treat, just for a second imagine the biggest number you can&#8230; Not that one&#8230; BIGGER!!!<\/p>\n<p>Take it to the extreme&#8230; the limit of BIGINT in SQL Server is 9,223,372,036,854,775,807&#8230; 9 quintillions.<\/p>\n<p>To be able to insert that amount of records in a table, you would need to insert <strong>1 million rows per second during 292471 YEARS!!!!<\/strong><\/p>\n<p>And even though, the loop would only need 63 cycles&#8230; yes sixty three cycles.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nDECLARE @n\t\tBIGINT = 9223372036854775807\r\nDECLARE @count\tINT = 0\r\n\r\nWHILE @n \/ 2. &gt; 1 BEGIN\r\n\tSET @n = @n \/ 2.\r\n\tSET @cycles += 1\r\nEND\r\n\r\nSELECT @cycles\r\n<\/pre>\n<p>Or also can be calculated as follows<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nSELECT FLOOR(LOG(9223372036854775807)\/LOG(2))\r\n<\/pre>\n<p>First I\u2019ll show you a specific example for a table and then I\u2019ll provide you with a stored procedure which will take some parameters and dynamically will look for absolutely any date within any table on any database that might exist in your server<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nUSE AdventureWorks2014\r\n\r\nDECLARE @maxID\t\t\tINT \r\n\t\t, @minID\t\tINT\r\n\t\t, @cutoff\t\tINT\r\n\t\t, @dateToFind\tDATETIME = '2011-10-14 00:00:00.000'\r\n\r\nSELECT @minID = MIN(SalesOrderID), @maxID = MAX(SalesOrderID) FROM [Sales].[SalesOrderHeader]\r\n\r\nWHILE ISNULL((SELECT 1 FROM [Sales].[SalesOrderHeader] WHERE SalesOrderID IN (@minID, @maxID) AND OrderDate = @dateToFind),0) &lt;&gt; 1 BEGIN\r\n\r\n\t-- To avoid infinite loop if not found\r\n\tIF @maxID = (SELECT MIN(SalesOrderID) FROM [Sales].[SalesOrderHeader] WHERE SalesOrderID &gt; @minID)  BREAK\r\n\t\r\n\tSET @cutoff = @maxID - (@maxID - @minID) \/ 2\r\n\r\n\tIF @dateToFind &lt; (SELECT OrderDate FROM [Sales].[SalesOrderHeader] WHERE SalesOrderID = @cutoff) \r\n\t\tSET @maxID = @cutoff\r\n\tELSE\r\n\t\tSET @minID = @cutoff\r\n\r\nEND\r\n\r\nSELECT * FROM [Sales].[SalesOrderHeader] WHERE SalesOrderID IN (@minID, @maxID) AND (OrderDate = @dateToFind)\r\n<\/pre>\n<p>In this example we have 31465 rows and the loop does 14 cycles to find a matching row, which can become 14*3 Selects, (one of them to 2 key values so 4), 56 index seeks (plus the bookmark lookups when we check for the date, don&#8217;t take this as true as it&#8217;s a pretty rough calculation). Still nothing if we compare it to a Clustered index scan.<\/p>\n<p>Please see the <a href=\"http:\/\/rmunwin.com\/t-sql-binary-search\/\" target=\"_blank\">original post<\/a> for more info about IO<\/p>\n<p>And you&#8217;ve seen than the bigger the number of rows, the better performance you get out of it, just imagine that table in the billions of rows.<\/p>\n<p>The stored procedure would be something as follows<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nUSE master\r\nGO\r\nSET ANSI_NULLS ON\r\nGO\r\nSET QUOTED_IDENTIFIER ON\r\nGO\r\nIF OBJECT_ID('dbo.sqlg_Divide_And_Conquer') IS NULL EXECUTE sp_executesql N'CREATE PROCEDURE dbo.sqlg_Divide_And_Conquer AS RETURN'\r\nGO\r\n-- =============================================\r\n-- Author:\t\tRaul Gonzalez (@SQLDoubleG)\r\n-- Create date: 15\/10\/2015\r\n-- Description:\tReturns the row(s) within the given table which match the date searching criteria\r\n--\r\n-- Parameters:\r\n--\t\t\t\t@DatabaseName\t, Optional, if not specified will search on the current database\t\r\n--\t\t\t\t@SchemaName\t\t, Optional\r\n--\t\t\t\t@TableName\t\t, Table where we want to search\r\n--\t\t\t\t@KeyColumnName\t, Name of the Clustered key column\r\n--\t\t\t\t@DateColumnName\t, Name of the Date(\/time) column \r\n--\t\t\t\t@DateToFind\t\t, Date(\/time) we are looking for\r\n--\t\t\t\t@Aprox\t\t\t, Will return the closest values in case no there is no exact matching\r\n--\r\n-- Assumptions:\t\r\n--\t\t\t\tThe Clustered Key column provided is a number ever increasing.\r\n--\t\t\t\t\tIn case of Clustered composite keys, use the leftmost key column if that follows the desired pattern.\r\n--\t\t\t\tThe Date to find follows the same pattern, is ever increasing, hence correlated to the Clustered Key value\r\n--\r\n-- Log History:\t\r\n--\t\t\t\t15\/10\/2015\tRAG\tCreated\r\n--\r\n-- Copyright:\t(C) 2015 Raul Gonzalez (@SQLDoubleG https:\/\/sqldoubleg.live-website.com)\r\n--\r\n--\t\t\t\tTHIS CODE AND INFORMATION ARE PROVIDED &quot;AS IS&quot; WITHOUT WARRANTY OF \r\n--\t\t\t\tANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED \r\n--\t\t\t\tTO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND\/OR FITNESS FOR A\r\n--\t\t\t\tPARTICULAR PURPOSE.\r\n--\r\n--\t\t\t\tTHE AUTHOR SHALL NOT BE LIABLE TO YOU OR ANY THIRD PARTY FOR ANY INDIRECT, \r\n--\t\t\t\tSPECIAL, INCIDENTAL, PUNITIVE, COVER, OR CONSEQUENTIAL DAMAGES OF ANY KIND\r\n--\r\n--\t\t\t\tYOU MAY ALTER THIS CODE FOR YOUR OWN *NON-COMMERCIAL* PURPOSES. YOU MAY\r\n--\t\t\t\tREPUBLISH ALTERED CODE AS LONG AS YOU INCLUDE THIS COPYRIGHT AND GIVE DUE CREDIT. \r\n--\r\n-- =============================================\r\nALTER PROCEDURE dbo.sqlg_Divide_And_Conquer \r\n\t\t@DatabaseName\t\tSYSNAME = NULL\r\n\t\t, @SchemaName\t\tSYSNAME = NULL\r\n\t\t, @TableName\t\tSYSNAME \r\n\t\t, @KeyColumnName\tSYSNAME \r\n\t\t, @DateColumnName\tSYSNAME \r\n\t\t, @DateToFind\t\tDATETIME\r\n\t\t, @Aprox\t\t\tBIT = 0\r\nAS \r\nBEGIN\r\n\r\nSET NOCOUNT ON\r\n\r\nDECLARE @sql NVARCHAR(4000)\r\nDECLARE @FullObjectName NVARCHAR(386)\r\n\r\n-- sanitize the inputs\r\nSET @DatabaseName\t\t= QUOTENAME(REPLACE(REPLACE(@DatabaseName\t, '[', ''), ']', ''))\r\nSET @SchemaName\t\t\t= QUOTENAME(REPLACE(REPLACE(@SchemaName\t\t, '[', ''), ']', ''))\r\nSET @TableName\t\t\t= QUOTENAME(REPLACE(REPLACE(@TableName\t\t, '[', ''), ']', ''))\r\nSET @KeyColumnName\t\t= QUOTENAME(REPLACE(REPLACE(@KeyColumnName\t, '[', ''), ']', ''))\r\nSET @DateColumnName\t\t= QUOTENAME(REPLACE(REPLACE(@DateColumnName\t, '[', ''), ']', ''))\r\n\r\nSET @FullObjectName\t\t= ISNULL(@DatabaseName + '.', '') + ISNULL(@SchemaName + '.', '') + @TableName\r\n\r\nSET @sql = '\r\n\r\nDECLARE  @max\t\t\tINT \r\n\t\t, @min\t\t\tINT\r\n\t\t, @cutoff\t\tINT\r\n\r\n\tSELECT @min=MIN(' + @KeyColumnName +') , @max=MAX(' + @KeyColumnName +') FROM ' + @FullObjectName + '\r\n\r\n\tWHILE ISNULL((SELECT TOP 1 1 FROM ' + @FullObjectName + ' WHERE ' + @KeyColumnName +' IN (@min, @max) AND ' + @DateColumnName + ' = @DateToFind),0) &lt;&gt; 1 BEGIN\r\n\r\n\t\t-- To avoid infinite loop if not found\r\n\t\tIF @max = (SELECT MIN(' + @KeyColumnName +') FROM ' + @FullObjectName + ' WHERE ' + @KeyColumnName +' &gt; @min)  BREAK\r\n\t\r\n\t\t--PRINT ''min =&gt; ''  + CONVERT(VARCHAR,@min) + '', max =&gt; ''+ CONVERT(VARCHAR,@max)\r\n\r\n\t\tSET @cutoff = @max - (@max - @min) \/ 2\r\n\r\n\t\tIF @DateToFind &lt; (SELECT MAX(' + @DateColumnName + ') FROM ' + @FullObjectName + ' WHERE ' + @KeyColumnName +' = @cutoff) \r\n\t\t\tSET @max = @cutoff\r\n\t\tELSE\r\n\t\t\tSET @min = @cutoff\r\n\r\n\tEND\r\n\r\n\tSELECT * FROM ' + @FullObjectName + ' WHERE ' + @KeyColumnName +' IN (@min, @max) AND (' + @DateColumnName + ' = @DateToFind OR @Aprox=1)\r\n'\r\n--PRINT @sql\r\n\r\nEXECUTE sp_executesql @sql, @params = N'@DateToFind DATETIME, @Aprox BIT', @DateToFind = @DateToFind, @Aprox = @Aprox\r\n\r\nEND\r\nGO\r\n<\/pre>\n<p>&nbsp;<br \/>\nAnd you can test it against the [AdventureWorks2014] database<br \/>\n&nbsp;<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nEXECUTE dbo.sqlg_Divide_And_Conquer 'AdventureWorks2014', '[Sales]', '[SalesOrderHeader]', '[SalesOrderID]', '[OrderDate]', '2013-05-30 00:00:00.000', 0\r\n<\/pre>\n<p>&nbsp;<br \/>\nThe SP needs a bit more of code and doesn&#8217;t look at neat and tidy as the script, but it does what is supposed to do, so for me this is \u00abMission Accomplished\u00bb<\/p>\n<p>Hope you guys like it and throw me any question you might have (about the script, not in general \ud83d\ude42 )<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Divide and Conquer Date Search Hack (because binary search sounds boring)&nbsp;Divide and Conquer Date Search Hack (because binary search sounds boring) This week has been an amazing experience, not only I have&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,14,19],"tags":[16,5,15],"_links":{"self":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts\/74"}],"collection":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/comments?post=74"}],"version-history":[{"count":0,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts\/74\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/media?parent=74"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/categories?post=74"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/tags?post=74"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}