Divide and Conquer Date Search Hack (because binary search sounds boring) Divide and Conquer Date Search Hack (because binary search sounds boring)
This week has been an amazing experience, not only I have attended to one of the most valuable SQL Server training you can find (IEPTO2 by @SqlSkills team), I’ve also met really nice people and brains, all of them with something in common, a big passion for technology and specially SQL Server.
Speaking of crazy ideas and ‘hacks’ we’ve done through the years to make things to work, one of them got my attention for its simplicity, the idea of hacking a Date search by hitting a numeric key.
The idea of binary search has been there for a while, but taking advantage of the algorithm and applying it to SQL Server how it’s described in a blog post by Rainer Unwin (See original post) was so great that I couldn’t stop myself for writing my own version, so I got to it.
The approach is such that having a numeric clustered key, with an ever increasing pattern (Identity fulfils that perfectly), by doing individual seeks for particular keys and checking the date values for those rows, you can narrow down the range to end up pointing the desired date, understanding that the date we’re looking for, should be also ever increasing as the clustered key is.
Let’s say this is the whole range of possible values for the clustered key within our table
If the value we’re looking for is neither of those, we divide the range and ask in which side our value can be, so we can eliminate half of the values at once.
Doing consecutive divisions of the range, we will narrow down the subset to the value we are looking for (if exists) and then we can return the key for that value
This can be extremely useful if you have a medium or big size table without a nonclustered index on the date column you want to search on. The performance of individual seeks on the clustered primary key comparing to a clustered index scan can go from several minutes to less than a second.
An that scales like a treat, just for a second imagine the biggest number you can… Not that one… BIGGER!!!
Take it to the extreme… the limit of BIGINT in SQL Server is 9,223,372,036,854,775,807… 9 quintillions.
To be able to insert that amount of records in a table, you would need to insert 1 million rows per second during 292471 YEARS!!!!
And even though, the loop would only need 63 cycles… yes sixty three cycles.
DECLARE @n BIGINT = 9223372036854775807 DECLARE @count INT = 0 WHILE @n / 2. > 1 BEGIN SET @n = @n / 2. SET @cycles += 1 END SELECT @cycles
Or also can be calculated as follows
First I’ll show you a specific example for a table and then I’ll provide you with a stored procedure which will take some parameters and dynamically will look for absolutely any date within any table on any database that might exist in your server
USE AdventureWorks2014 DECLARE @maxID INT , @minID INT , @cutoff INT , @dateToFind DATETIME = '2011-10-14 00:00:00.000' SELECT @minID = MIN(SalesOrderID), @maxID = MAX(SalesOrderID) FROM [Sales].[SalesOrderHeader] WHILE ISNULL((SELECT 1 FROM [Sales].[SalesOrderHeader] WHERE SalesOrderID IN (@minID, @maxID) AND OrderDate = @dateToFind),0) <> 1 BEGIN -- To avoid infinite loop if not found IF @maxID = (SELECT MIN(SalesOrderID) FROM [Sales].[SalesOrderHeader] WHERE SalesOrderID > @minID) BREAK SET @cutoff = @maxID - (@maxID - @minID) / 2 IF @dateToFind < (SELECT OrderDate FROM [Sales].[SalesOrderHeader] WHERE SalesOrderID = @cutoff) SET @maxID = @cutoff ELSE SET @minID = @cutoff END SELECT * FROM [Sales].[SalesOrderHeader] WHERE SalesOrderID IN (@minID, @maxID) AND (OrderDate = @dateToFind)
In this example we have 31465 rows and the loop does 14 cycles to find a matching row, which can become 14*3 Selects, (one of them to 2 key values so 4), 56 index seeks (plus the bookmark lookups when we check for the date, don’t take this as true as it’s a pretty rough calculation). Still nothing if we compare it to a Clustered index scan.
Please see the original post for more info about IO
And you’ve seen than the bigger the number of rows, the better performance you get out of it, just imagine that table in the billions of rows.
The stored procedure would be something as follows
USE master GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO IF OBJECT_ID('dbo.sqlg_Divide_And_Conquer') IS NULL EXECUTE sp_executesql N'CREATE PROCEDURE dbo.sqlg_Divide_And_Conquer AS RETURN' GO -- ============================================= -- Author: Raul Gonzalez (@SQLDoubleG) -- Create date: 15/10/2015 -- Description: Returns the row(s) within the given table which match the date searching criteria -- -- Parameters: -- @DatabaseName , Optional, if not specified will search on the current database -- @SchemaName , Optional -- @TableName , Table where we want to search -- @KeyColumnName , Name of the Clustered key column -- @DateColumnName , Name of the Date(/time) column -- @DateToFind , Date(/time) we are looking for -- @Aprox , Will return the closest values in case no there is no exact matching -- -- Assumptions: -- The Clustered Key column provided is a number ever increasing. -- In case of Clustered composite keys, use the leftmost key column if that follows the desired pattern. -- The Date to find follows the same pattern, is ever increasing, hence correlated to the Clustered Key value -- -- Log History: -- 15/10/2015 RAG Created -- -- Copyright: (C) 2015 Raul Gonzalez (@SQLDoubleG https://www.sqldoubleg.com) -- -- THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF -- ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED -- TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A -- PARTICULAR PURPOSE. -- -- THE AUTHOR SHALL NOT BE LIABLE TO YOU OR ANY THIRD PARTY FOR ANY INDIRECT, -- SPECIAL, INCIDENTAL, PUNITIVE, COVER, OR CONSEQUENTIAL DAMAGES OF ANY KIND -- -- YOU MAY ALTER THIS CODE FOR YOUR OWN *NON-COMMERCIAL* PURPOSES. YOU MAY -- REPUBLISH ALTERED CODE AS LONG AS YOU INCLUDE THIS COPYRIGHT AND GIVE DUE CREDIT. -- -- ============================================= ALTER PROCEDURE dbo.sqlg_Divide_And_Conquer @DatabaseName SYSNAME = NULL , @SchemaName SYSNAME = NULL , @TableName SYSNAME , @KeyColumnName SYSNAME , @DateColumnName SYSNAME , @DateToFind DATETIME , @Aprox BIT = 0 AS BEGIN SET NOCOUNT ON DECLARE @sql NVARCHAR(4000) DECLARE @FullObjectName NVARCHAR(386) -- sanitize the inputs SET @DatabaseName = QUOTENAME(REPLACE(REPLACE(@DatabaseName , '[', ''), ']', '')) SET @SchemaName = QUOTENAME(REPLACE(REPLACE(@SchemaName , '[', ''), ']', '')) SET @TableName = QUOTENAME(REPLACE(REPLACE(@TableName , '[', ''), ']', '')) SET @KeyColumnName = QUOTENAME(REPLACE(REPLACE(@KeyColumnName , '[', ''), ']', '')) SET @DateColumnName = QUOTENAME(REPLACE(REPLACE(@DateColumnName , '[', ''), ']', '')) SET @FullObjectName = ISNULL(@DatabaseName + '.', '') + ISNULL(@SchemaName + '.', '') + @TableName SET @sql = ' DECLARE @max INT , @min INT , @cutoff INT SELECT @min=MIN(' + @KeyColumnName +') , @max=MAX(' + @KeyColumnName +') FROM ' + @FullObjectName + ' WHILE ISNULL((SELECT TOP 1 1 FROM ' + @FullObjectName + ' WHERE ' + @KeyColumnName +' IN (@min, @max) AND ' + @DateColumnName + ' = @DateToFind),0) <> 1 BEGIN -- To avoid infinite loop if not found IF @max = (SELECT MIN(' + @KeyColumnName +') FROM ' + @FullObjectName + ' WHERE ' + @KeyColumnName +' > @min) BREAK --PRINT ''min => '' + CONVERT(VARCHAR,@min) + '', max => ''+ CONVERT(VARCHAR,@max) SET @cutoff = @max - (@max - @min) / 2 IF @DateToFind < (SELECT MAX(' + @DateColumnName + ') FROM ' + @FullObjectName + ' WHERE ' + @KeyColumnName +' = @cutoff) SET @max = @cutoff ELSE SET @min = @cutoff END SELECT * FROM ' + @FullObjectName + ' WHERE ' + @KeyColumnName +' IN (@min, @max) AND (' + @DateColumnName + ' = @DateToFind OR @Aprox=1) ' --PRINT @sql EXECUTE sp_executesql @sql, @params = N'@DateToFind DATETIME, @Aprox BIT', @DateToFind = @DateToFind, @Aprox = @Aprox END GO
And you can test it against the [AdventureWorks2014] database
EXECUTE dbo.sqlg_Divide_And_Conquer 'AdventureWorks2014', '[Sales]', '[SalesOrderHeader]', '[SalesOrderID]', '[OrderDate]', '2013-05-30 00:00:00.000', 0
The SP needs a bit more of code and doesn’t look at neat and tidy as the script, but it does what is supposed to do, so for me this is «Mission Accomplished»
Hope you guys like it and throw me any question you might have (about the script, not in general 🙂 )