Working with databases is always fun and you never know what can be knocking on your door, even when you think everything is under control.
As the title of this post states, today I will explain in detail how I managed to compare thousands of databases schemas at once, just by following the popular and well-known saying “How to Eat an Elephant“
Background
When it comes to choosing the right architecture for your data, there is not “one size fits all” solution, it depends on multiple factors and nowadays with more and more offerings from cloud providers, the number of possibilities has grown dramatically.
This wasn’t the case not so long ago, when people used to put everything in the database, just because … just because.
In those good old times, there were still choices to make, like for instance, to have one big database for your application or multiple databases sharing the same schema.
In multi-tenant environments, having one database per customer was is very common, because it brings some benefits from the security and sometimes from the performance point of view.
But also it comes with the challenge of keeping all databases schemas in sync.
Schema Comparison Tools
There are tools that can compare two databases in great detail, but the caveat I found, it is that they can only compare two databases, so while they’ll be fine to compare DEV<=>PROD or QA<=>PROD, they’re not a good fit to compare and find the differences in, literally, thousands of databases
There it’s where I was, so after some time trying to postpone the inevitable, it was time to roll up my sleeves and get the job done.
My Approach
My approach for a task like this is very simple, just like the saying
“How to Eat an Elephant“, One Byte at a Time
But in my case, one [database] object at time 🙂
First thing was to write the script that generates each object’s definition to be able to compare each other. And for simplicity, I also generate a hash for each object to compare, rather than the text.
In my case (yours might be different) I wanted to compare all SQL modules (SP’s, UDF’s, TVF’s, triggers, views), all tables and user defined type tables, that gives me a good starting point.
Thankfully I have written most parts of it at some point in the last 10 years, so I headed to my GitHub repo and started to grab pieces from here and there.
For the purpose of this post, I restored the lightest copy of AdventureWorks 100 times on 2 instances, plus one more that will be the source of truth.
They are named [AdventureWorksLT_001] to [AdventureWorksLT_200] and the master just [AdventureWorksLT]
The TSQL script to generate all the objects I’m interested on, it looks like this:
SET NOCOUNT ON; DECLARE @object_name nvarchar(257) = '' DECLARE @sqlstring nvarchar(MAX) DECLARE @dbname sysname = 'AdventureWorksLT' SET @dbname = NULLIF(@dbname, '') SET @object_name = NULLIF(@object_name, '') IF OBJECT_ID('tempdb..#output') IS NOT NULL DROP TABLE #output CREATE TABLE #output ( database_name sysname NOT NULL, object_type nvarchar(60) NOT NULL, object_name nvarchar(257) NOT NULL, object_definition nvarchar(MAX) NOT NULL ) DECLARE dbs CURSOR FORWARD_ONLY READ_ONLY FAST_FORWARD LOCAL FOR SELECT name FROM sys.databases WHERE database_id > 4 AND state = 0 AND (name = @dbname OR (@dbname IS NULL AND name LIKE 'AdventureWorksLT_%')) OPEN dbs FETCH NEXT FROM dbs INTO @dbname WHILE @@FETCH_STATUS = 0 BEGIN SET @sqlstring = N'USE ' + QUOTENAME(@dbname) + N' SET NOCOUNT ON; SELECT DB_NAME() COLLATE DATABASE_DEFAULT AS [database_name] , o.type_desc COLLATE DATABASE_DEFAULT AS [object_type] , OBJECT_SCHEMA_NAME(o.object_id) + ''.'' + OBJECT_NAME(o.object_id) COLLATE DATABASE_DEFAULT AS [object_name] , LTRIM(RTRIM(OBJECT_DEFINITION(o.OBJECT_ID))) AS [object_definition] FROM sys.objects AS o WHERE o.type IN (''TR'',''FN'',''P'',''TF'',''V'') AND (o.object_id = OBJECT_ID(@object_name) OR @object_name IS NULL) UNION ALL SELECT DB_NAME() COLLATE DATABASE_DEFAULT AS [database_name] , type_desc COLLATE DATABASE_DEFAULT AS [type_desc] , OBJECT_SCHEMA_NAME(object_id) + ''.'' + name COLLATE DATABASE_DEFAULT AS object_name , ''CREATE TABLE '' + QUOTENAME(OBJECT_SCHEMA_NAME(object_id)) + ''.'' + QUOTENAME(name) + ''('' + CHAR(10) + (STUFF( (SELECT '', '' + QUOTENAME(c.name) + '' '' + t.name + CASE WHEN t.name IN (''char'', ''nchar'', ''binay'', ''varchar'', ''nvarchar'', ''varbinay'') THEN ''('' + CONVERT(SYSNAME, c.max_length) + '')'' ELSE '''' END + CASE WHEN c.is_identity = 1 THEN '' IDENTITY'' ELSE '''' END + ISNULL('' COLLATE '' + c.collation_name, '''') + CASE WHEN c.is_nullable = 0 THEN '' NOT'' ELSE '''' END + '' NULL'' + CHAR(10) FROM sys.columns AS c INNER JOIN sys.types AS t ON t.user_type_id = c.user_type_id WHERE c.object_id = o.object_id ORDER BY column_id FOR XML PATH('''')), 1, 2, '''')) + '')'' COLLATE DATABASE_DEFAULT AS object_definition FROM sys.objects AS o WHERE type IN (''U'') AND OBJECTPROPERTY(object_id, ''IsMsShipped'') = 0 AND (object_id = OBJECT_ID(@object_name) OR @object_name IS NULL) UNION ALL SELECT DB_NAME() COLLATE DATABASE_DEFAULT AS database_name , o.type_desc COLLATE DATABASE_DEFAULT , SCHEMA_NAME(tt.schema_id) + ''.'' + tt.name COLLATE DATABASE_DEFAULT AS object_name , ''CREATE TYPE '' + QUOTENAME(SCHEMA_NAME(tt.schema_id)) + ''.'' + QUOTENAME(tt.name) + '' AS TABLE ('' + CHAR(10) + (STUFF( (SELECT '', '' + QUOTENAME(c.name) + '' '' + t.name + CASE WHEN t.name IN (''char'', ''nchar'', ''binay'', ''varchar'', ''nvarchar'', ''varbinay'') THEN ''('' + CONVERT(SYSNAME, c.max_length) + '')'' ELSE '''' END + CASE WHEN c.is_identity = 1 THEN '' IDENTITY'' ELSE '''' END + ISNULL('' COLLATE '' + c.collation_name, '''') + CASE WHEN c.is_nullable = 0 THEN '' NOT'' ELSE '''' END + '' NULL'' + CHAR(10) FROM sys.columns AS c INNER JOIN sys.types AS t ON t.user_type_id = c.user_type_id WHERE c.object_id = tt.type_table_object_id ORDER BY column_id FOR XML PATH('''')), 1, 2, '''')) + '')'' COLLATE DATABASE_DEFAULT AS object_definition FROM sys.objects AS o INNER JOIN sys.table_types AS tt ON tt.type_table_object_id = o.object_id WHERE (tt.type_table_object_id = OBJECT_ID(@object_name) OR @object_name IS NULL) ' INSERT INTO #output (database_name, object_type, object_name, object_definition) EXECUTE sys.sp_executesql @stmt = @sqlstring , @params = N'@object_name NVARCHAR(257)' , @object_name = @object_name; FETCH NEXT FROM dbs INTO @dbname END CLOSE dbs; DEALLOCATE dbs; SELECT @@SERVERNAME AS server_name , database_name , object_type , object_name , object_definition , CONVERT(varchar(100), HASHBYTES('md5', LTRIM(RTRIM(object_definition))),1) AS object_hash FROM #output ORDER BY object_type , object_name
Executed in my local instance, the result looks like this:
AS mentioned earlier, each object has got a hash, which later will help to compare them all.
Time to Eat the Elephant
So far so good, I can use this query to target one or multiple databases at once in the same server, but I do have two instances and still row by row comparison will be a killer, so we need something extra, some sort of super power to go through all of them in on go… or just use PowerShell!
The approach again is One Object at a Time, why? because once we have compared one object against all databases, we have the whole picture for that object and we know if it has drifted, where has drifted and the most important, we can start figuring out what to do with it.
I would need:
- A master copy with all objects to be the reference
- The list of servers I want to target
- The list of databases I want to target
And in return, I will get one folder for each object which will contain
- One file with the list of databases where the object is different than the reference
- One file per each different copy, named after the hash of the definition of the module
I know it might look a bit rudimentary, but there are multiple free tools for text comparison you can use and since you are doing one object at a time and hopefully there are not that many different versions of the objects, I found it simple enough to do it this way.
The Hack
Time to put all together, the process will:
- Get all servers we want to target
- Get all object definitions from the master copy
- Create the folder structure
- Get one object and loop through each server/database and write down when it’s different in the right folder.
#Set-executionPolicy -ExecutionPolicy Unrestricted Import-Module sqlps; Clear-Host; $StartDate=(Get-Date) # Config values $schema = "AdventureWorksLT" $control_srv = 'localhost\mssql2019'; $control_db = "AdventureWorksLT"; $server_query = "SELECT 'localhost\MSSQL2019' AS ServerUrl UNION SELECT 'localhost\MSSQL2019B'" ############################################ # DO NOT TOUCH BELOW THIS LINE ############################################ $server_list = Invoke-Sqlcmd -ServerInstance $control_srv ` -Database $control_db ` -DisableVariables ` -Query $server_query ` -ConnectionTimeout 0 ` -MaxCharLength 64000; $sql = "SET NOCOUNT ON; DECLARE @object_name nvarchar(257) = '[@object_name]' DECLARE @sqlstring nvarchar(MAX) DECLARE @dbname sysname = '[@dbname]' SET @dbname = NULLIF(@dbname, '') SET @object_name = NULLIF(@object_name, '') IF OBJECT_ID('tempdb..#output') IS NOT NULL DROP TABLE #output CREATE TABLE #output ( database_name sysname NOT NULL, object_type nvarchar(60) NOT NULL, object_name nvarchar(257) NOT NULL, object_definition nvarchar(MAX) NOT NULL ) DECLARE dbs CURSOR FORWARD_ONLY READ_ONLY FAST_FORWARD LOCAL FOR SELECT name FROM sys.databases WHERE database_id > 4 AND state = 0 AND (name = @dbname OR (@dbname IS NULL AND name LIKE 'AdventureWorksLT_%')) OPEN dbs FETCH NEXT FROM dbs INTO @dbname WHILE @@FETCH_STATUS = 0 BEGIN SET @sqlstring = N'USE ' + QUOTENAME(@dbname) + N' SET NOCOUNT ON; SELECT DB_NAME() COLLATE DATABASE_DEFAULT AS [database_name] , o.type_desc COLLATE DATABASE_DEFAULT AS [object_type] , OBJECT_SCHEMA_NAME(o.object_id) + ''.'' + OBJECT_NAME(o.object_id) COLLATE DATABASE_DEFAULT AS [object_name] , LTRIM(RTRIM(OBJECT_DEFINITION(o.OBJECT_ID))) AS [object_definition] FROM sys.objects AS o WHERE o.type IN (''TR'',''FN'',''P'',''TF'',''V'') AND (o.object_id = OBJECT_ID(@object_name) OR @object_name IS NULL) UNION ALL SELECT DB_NAME() COLLATE DATABASE_DEFAULT AS [database_name] , type_desc COLLATE DATABASE_DEFAULT AS [type_desc] , OBJECT_SCHEMA_NAME(object_id) + ''.'' + name COLLATE DATABASE_DEFAULT AS object_name , ''CREATE TABLE '' + QUOTENAME(OBJECT_SCHEMA_NAME(object_id)) + ''.'' + QUOTENAME(name) + ''('' + CHAR(10) + (STUFF( (SELECT '', '' + QUOTENAME(c.name) + '' '' + t.name + CASE WHEN t.name IN (''char'', ''nchar'', ''binay'', ''varchar'', ''nvarchar'', ''varbinay'') THEN ''('' + CONVERT(SYSNAME, c.max_length) + '')'' ELSE '''' END + CASE WHEN c.is_identity = 1 THEN '' IDENTITY'' ELSE '''' END + ISNULL('' COLLATE '' + c.collation_name, '''') + CASE WHEN c.is_nullable = 0 THEN '' NOT'' ELSE '''' END + '' NULL'' + CHAR(10) FROM sys.columns AS c INNER JOIN sys.types AS t ON t.user_type_id = c.user_type_id WHERE c.object_id = o.object_id ORDER BY column_id FOR XML PATH('''')), 1, 2, '''')) + '')'' COLLATE DATABASE_DEFAULT AS object_definition FROM sys.objects AS o WHERE type IN (''U'') AND OBJECTPROPERTY(object_id, ''IsMsShipped'') = 0 AND (object_id = OBJECT_ID(@object_name) OR @object_name IS NULL) UNION ALL SELECT DB_NAME() COLLATE DATABASE_DEFAULT AS database_name , o.type_desc COLLATE DATABASE_DEFAULT , SCHEMA_NAME(tt.schema_id) + ''.'' + tt.name COLLATE DATABASE_DEFAULT AS object_name , ''CREATE TYPE '' + QUOTENAME(SCHEMA_NAME(tt.schema_id)) + ''.'' + QUOTENAME(tt.name) + '' AS TABLE ('' + CHAR(10) + (STUFF( (SELECT '', '' + QUOTENAME(c.name) + '' '' + t.name + CASE WHEN t.name IN (''char'', ''nchar'', ''binay'', ''varchar'', ''nvarchar'', ''varbinay'') THEN ''('' + CONVERT(SYSNAME, c.max_length) + '')'' ELSE '''' END + CASE WHEN c.is_identity = 1 THEN '' IDENTITY'' ELSE '''' END + ISNULL('' COLLATE '' + c.collation_name, '''') + CASE WHEN c.is_nullable = 0 THEN '' NOT'' ELSE '''' END + '' NULL'' + CHAR(10) FROM sys.columns AS c INNER JOIN sys.types AS t ON t.user_type_id = c.user_type_id WHERE c.object_id = tt.type_table_object_id ORDER BY column_id FOR XML PATH('''')), 1, 2, '''')) + '')'' COLLATE DATABASE_DEFAULT AS object_definition FROM sys.objects AS o INNER JOIN sys.table_types AS tt ON tt.type_table_object_id = o.object_id WHERE (tt.type_table_object_id = OBJECT_ID(@object_name) OR @object_name IS NULL) ' INSERT INTO #output (database_name, object_type, object_name, object_definition) EXECUTE sys.sp_executesql @stmt = @sqlstring , @params = N'@object_name NVARCHAR(257)' , @object_name = @object_name; FETCH NEXT FROM dbs INTO @dbname END CLOSE dbs; DEALLOCATE dbs; SELECT @@SERVERNAME AS server_name , database_name , object_type , object_name , object_definition , CONVERT(varchar(100), HASHBYTES('md5', LTRIM(RTRIM(object_definition))),1) AS object_hash FROM #output ORDER BY object_type , object_name"; $master_copy = Invoke-Sqlcmd -ServerInstance $master_srv ` -Database $master_db ` -DisableVariables ` -Query $sql.Replace("[@object_name]", "").Replace("[@dbname]","$($master_db)") ` -ConnectionTimeout 0 ` -MaxCharLength 64000; $object_types = $master_copy | Select-Object object_type -Unique # to generate a single path per run $path = "c:/temp/database_schemas/$($schema)/$(Get-Date -Format "yyyyMMddHHmmss")/"; if ( (Test-Path -path $path) -eq $false){ #write-output "path does not exist" New-Item -path $path -ItemType Directory | Out-Null; } # to generate a single folder per object_type foreach ($type in $object_types){ if ( (Test-Path -Path "$($path)/$($type.object_type)" ) -eq $false){ #write-output "path does not exist" New-Item -Path "$($path)/$($type.object_type)" -ItemType Directory | Out-Null; } } foreach ($db_object in $master_copy){ # to cummulate all copies of the object from all servers $local_copy = @(); $errors = @(); Write-Output "Processing $($db_object.object_type) =&amp;amp;amp;gt; $($db_object.object_name)"; $db_object_path = "$($path)/$($db_object.object_type)/$($db_object.object_name)"; if ( (Test-Path -path $db_object_path) -eq $false){ New-Item -path $db_object_path -ItemType Directory | Out-Null; } $outputfile = "$($db_object_path)/all_diffs.txt"; $db_object | Select-Object database_name, object_name, object_hash | Out-File $outputfile -Append; foreach ($srv in $server_list){ Write-Output "`tProcessing server $($srv.ServerUrl)"; $objDataTable = Invoke-Sqlcmd -ServerInstance "$($srv.ServerUrl)" ` -Database "master" ` -DisableVariables ` -Query $sql.Replace("[@object_name]", $db_object.object_name).Replace("[@dbname]","") ` -ConnectionTimeout 0 ` -MaxCharLength 64000; if ($null -ne $objDataTable){ $local_copy += $objDataTable; } else{ $errors += [pscustomobject]@{server_name=$db.ServerUrl;database_name=$db.database_name;object_name=$db_object.object_name;object_hash=0x0}; } } # Drifts are any row that does not match the object_hash # $local_copy | Select-Object database_name, objecT_hash | Where-Object -Property object_hash -ne $db_object.object_hash $drifts = $local_copy | Select-Object | Where-Object -Property object_hash -ne $db_object.object_hash if ($drifts.Count -eq 0){ #if no drifts for this object, remove the noise Remove-Item -Path $db_object_path -Recurse; } else{ # if there was any drift, get unique versions from all servers $diff_versions = $drifts | Select-Object object_hash, object_definition -Unique; # add each database that drifted to the list $errors | Select-Object server_name, database_name, object_name, object_hash | Out-File $outputfile -Append; $drifts | Select-Object server_name, database_name, object_name, object_hash | Sort-Object -Property object_hash | Out-File $outputfile -Append; # put the definition from the master object $db_object.object_definition | Out-File "$($db_object_path)/$($db_object.object_name).sql" foreach ($diff in $($diff_versions)){ $diff.object_definition | Out-File "$($db_object_path)/$($diff.object_hash).sql" } } } $EndDate=(Get-Date); New-TimeSpan –Start $StartDate –End $EndDate | Format-Table -AutoSize;
Once the script completes a run, you can go to your local dir c:\temp
There is one folder per [object_type] that contains one folder for each object that has drifted. If the object is the same in all databases, the process will take care of it and remove the noise, so you can focus on what is important.
On each folder that represents an object, you can see different files:
- The master copy of the object, named after it.
- all_diffs.txt, which will contain a list of databases where the object has drifted.
This is the most important file, because it identifies which database has which version of the object. - One file per different version named after their hashes, you should be fine to compare this in a text editor.
Using the free tool WinMerge, you can compare three files at a time, which is quite handy.
Conclusion
Keeping databases schemas in sync is a big thing for me now, so trust me when I say doing this was bugging me for a while until I decided (and got the time) to get it done.
I developed this script/tool based on my requirements and yours might be different, but I am pretty sure you still can get some good ideas and draw a plan that fits your needs.
I really enjoy this kind of challenges and it is amazing how much we can do with a little bit of TSQL and a little bit of PowerShell.
You can find the scripts in GitHub, I hope you find it useful and please let me know in the comments what you think.
Thanks for reading.