Aug
31
2021

How I Compared 1000’s of database’s schemas at once

Working with databases is always fun and you never know what can be knocking on your door, even when you think everything is under control.

As the title of this post states, today I will explain in detail how I managed to compare thousands of databases schemas at once, just by following the popular and well-known saying “How to Eat an Elephant

Background

When it comes to choosing the right architecture for your data, there is not “one size fits all” solution, it depends on multiple factors and nowadays with more and more offerings from cloud providers, the number of possibilities has grown dramatically.

This wasn’t the case not so long ago, when people used to put everything in the database, just because … just because.

In those good old times, there were still choices to make, like for instance, to have one big database for your application or multiple databases sharing the same schema.

In multi-tenant environments, having one database per customer was is very common, because it brings some benefits from the security and sometimes from the performance point of view.

But also it comes with the challenge of keeping all databases schemas in sync.

Schema Comparison Tools

There are tools that can compare two databases in great detail, but the caveat I found, it is that they can only compare two databases, so while they’ll be fine to compare DEV<=>PROD or QA<=>PROD, they’re not a good fit to compare and find the differences in, literally, thousands of databases

There it’s where I was, so after some time trying to postpone the inevitable, it was time to roll up my sleeves and get the job done.

My Approach

My approach for a task like this is very simple, just like the saying

How to Eat an Elephant“, One Byte at a Time

But in my case, one [database] object at time 🙂

First thing was to write the script that generates each object’s definition to be able to compare each other. And for simplicity, I also generate a hash for each object to compare, rather than the text.

In my case (yours might be different) I wanted to compare all SQL modules (SP’s, UDF’s, TVF’s, triggers, views), all tables and user defined type tables, that gives me a good starting point.

Thankfully I have written most parts of it at some point in the last 10 years, so I headed to my GitHub repo and started to grab pieces from here and there.

For the purpose of this post, I restored the lightest copy of AdventureWorks 100 times on 2 instances, plus one more that will be the source of truth.
They are named [AdventureWorksLT_001] to [AdventureWorksLT_200] and the master just [AdventureWorksLT]

The TSQL script to generate all the objects I’m interested on, it looks like this:

SET NOCOUNT ON;
DECLARE @object_name	nvarchar(257) = '' 
DECLARE @sqlstring		nvarchar(MAX)
DECLARE @dbname			sysname = 'AdventureWorksLT'

SET @dbname = NULLIF(@dbname, '')
SET @object_name = NULLIF(@object_name, '')

IF OBJECT_ID('tempdb..#output') IS NOT NULL DROP TABLE #output

CREATE TABLE #output  (
    database_name sysname			NOT NULL, 
    object_type nvarchar(60)		NOT NULL, 
    object_name nvarchar(257)		NOT NULL, 
    object_definition nvarchar(MAX) NOT NULL
)

DECLARE dbs CURSOR FORWARD_ONLY READ_ONLY FAST_FORWARD LOCAL
	FOR SELECT name 
            FROM sys.databases 
 		    WHERE database_id &gt; 4 
		        AND state = 0
		        AND (name = @dbname OR (@dbname IS NULL AND name LIKE 'AdventureWorksLT_%'))

OPEN dbs 
FETCH NEXT FROM dbs INTO @dbname 

WHILE @@FETCH_STATUS = 0 BEGIN 
	SET @sqlstring	= N'USE ' + QUOTENAME(@dbname) + N'

SET NOCOUNT ON;

SELECT DB_NAME() COLLATE DATABASE_DEFAULT AS [database_name] 
		, o.type_desc COLLATE DATABASE_DEFAULT AS [object_type]
		, OBJECT_SCHEMA_NAME(o.object_id) + ''.'' + OBJECT_NAME(o.object_id) COLLATE DATABASE_DEFAULT AS [object_name] 
		, LTRIM(RTRIM(OBJECT_DEFINITION(o.OBJECT_ID))) AS [object_definition]
	FROM sys.objects AS o
	WHERE o.type IN (''TR'',''FN'',''P'',''TF'',''V'')
        AND (o.object_id = OBJECT_ID(@object_name) OR @object_name IS NULL)

UNION ALL

SELECT DB_NAME() COLLATE DATABASE_DEFAULT AS [database_name]
		, type_desc COLLATE DATABASE_DEFAULT AS [type_desc] 
		, OBJECT_SCHEMA_NAME(object_id) + ''.'' + name COLLATE DATABASE_DEFAULT AS object_name
		, ''CREATE TABLE '' + QUOTENAME(OBJECT_SCHEMA_NAME(object_id)) + ''.'' + QUOTENAME(name) + ''('' + CHAR(10) +
			(STUFF(
					(SELECT '', '' + QUOTENAME(c.name) + '' '' + t.name + 
							CASE WHEN t.name IN (''char'', ''nchar'', ''binay'', ''varchar'', ''nvarchar'', ''varbinay'')
								THEN ''('' + CONVERT(SYSNAME, c.max_length) + '')'' 
								ELSE ''''
							END +
							CASE WHEN c.is_identity = 1 THEN '' IDENTITY'' ELSE '''' END +
							ISNULL('' COLLATE '' + c.collation_name, '''') +
							CASE WHEN c.is_nullable = 0 THEN '' NOT'' ELSE '''' END + '' NULL'' + CHAR(10)
						FROM sys.columns AS c
							INNER JOIN sys.types AS t
								ON t.user_type_id = c.user_type_id
						WHERE c.object_id = o.object_id
						ORDER BY column_id
						FOR XML PATH('''')), 1, 2, '''')) + '')'' COLLATE DATABASE_DEFAULT AS object_definition
	FROM sys.objects AS o
	WHERE type IN (''U'')
		AND OBJECTPROPERTY(object_id, ''IsMsShipped'') = 0
		AND (object_id = OBJECT_ID(@object_name) OR @object_name IS NULL)

UNION ALL

SELECT DB_NAME() COLLATE DATABASE_DEFAULT AS database_name 
		, o.type_desc COLLATE DATABASE_DEFAULT
		, SCHEMA_NAME(tt.schema_id) + ''.'' + tt.name COLLATE DATABASE_DEFAULT AS object_name
		, ''CREATE TYPE '' + QUOTENAME(SCHEMA_NAME(tt.schema_id)) + ''.'' + QUOTENAME(tt.name) + '' AS TABLE ('' + CHAR(10) +
			(STUFF(
					(SELECT '', '' + QUOTENAME(c.name) + '' '' + t.name + 
							CASE WHEN t.name IN (''char'', ''nchar'', ''binay'', ''varchar'', ''nvarchar'', ''varbinay'')
								THEN ''('' + CONVERT(SYSNAME, c.max_length) + '')'' 
								ELSE ''''
							END +
							CASE WHEN c.is_identity = 1 THEN '' IDENTITY'' ELSE '''' END +
							ISNULL('' COLLATE '' + c.collation_name, '''') +
							CASE WHEN c.is_nullable = 0 THEN '' NOT'' ELSE '''' END + '' NULL'' + CHAR(10)
						FROM sys.columns AS c
							INNER JOIN sys.types AS t
								ON t.user_type_id = c.user_type_id
						WHERE c.object_id = tt.type_table_object_id
						ORDER BY column_id
						FOR XML PATH('''')), 1, 2, '''')) + '')'' COLLATE DATABASE_DEFAULT AS object_definition
	FROM sys.objects AS o
		INNER JOIN sys.table_types AS tt
			ON tt.type_table_object_id = o.object_id
	WHERE (tt.type_table_object_id = OBJECT_ID(@object_name) OR @object_name IS NULL)
' 
	
	INSERT INTO #output (database_name, object_type, object_name, object_definition)
	EXECUTE sys.sp_executesql 
        @stmt = @sqlstring
        , @params = N'@object_name NVARCHAR(257)'
        , @object_name = @object_name;

	FETCH NEXT FROM dbs INTO @dbname 
END 
CLOSE dbs; 
DEALLOCATE dbs; 

SELECT @@SERVERNAME AS server_name
		, database_name
        , object_type
		, object_name
		, object_definition
        , CONVERT(varchar(100), HASHBYTES('md5', LTRIM(RTRIM(object_definition))),1) AS object_hash 		    
	FROM #output
    ORDER BY object_type
        , object_name

Executed in my local instance, the result looks like this:

AS mentioned earlier, each object has got a hash, which later will help to compare them all.

Time to Eat the Elephant

So far so good, I can use this query to target one or multiple databases at once in the same server, but I do have two instances and still row by row comparison will be a killer, so we need something extra, some sort of super power to go through all of them in on go… or just use PowerShell!

The approach again is One Object at a Time, why? because once we have compared one object against all databases, we have the whole picture for that object and we know if it has drifted, where has drifted and the most important, we can start figuring out what to do with it.

I would need:

  • A master copy with all objects to be the reference
  • The list of servers I want to target
  • The list of databases I want to target

 

And in return, I will get one folder for each object which will contain

  • One file with the list of databases where the object is different than the reference
  • One file per each different copy, named after the hash of the definition of the module

 

I know it might look a bit rudimentary, but there are multiple free tools for text comparison you can use and since you are doing one object at a time and hopefully there are not that many different versions of the objects, I found it simple enough to do it this way.

The Hack

Time to put all together, the process will:

  • Get all servers we want to target
  • Get all object definitions from the master copy
  • Create the folder structure
  • Get one object and loop through each server/database and write down when it’s different in the right folder.

#Set-executionPolicy -ExecutionPolicy Unrestricted
Import-Module sqlps;
Clear-Host;
$StartDate=(Get-Date)

# Config values
$schema = "AdventureWorksLT"
$control_srv = 'localhost\mssql2019';
$control_db = "AdventureWorksLT";
$server_query = "SELECT 'localhost\MSSQL2019' AS ServerUrl UNION SELECT 'localhost\MSSQL2019B'"

############################################
# DO NOT TOUCH BELOW THIS LINE
############################################

$server_list =  Invoke-Sqlcmd -ServerInstance $control_srv `
                        -Database $control_db `
                        -DisableVariables `
                        -Query $server_query `
                        -ConnectionTimeout 0 `
                        -MaxCharLength 64000;

$sql = "SET NOCOUNT ON;
DECLARE @object_name	nvarchar(257) = '[@object_name]' 
DECLARE @sqlstring		nvarchar(MAX)
DECLARE @dbname			sysname = '[@dbname]'

SET @dbname = NULLIF(@dbname, '')
SET @object_name = NULLIF(@object_name, '')

IF OBJECT_ID('tempdb..#output') IS NOT NULL DROP TABLE #output

CREATE TABLE #output  (
    database_name sysname			NOT NULL, 
    object_type nvarchar(60)		NOT NULL, 
    object_name nvarchar(257)		NOT NULL, 
    object_definition nvarchar(MAX) NOT NULL
)

DECLARE dbs CURSOR FORWARD_ONLY READ_ONLY FAST_FORWARD LOCAL
	FOR SELECT name 
            FROM sys.databases 
 		    WHERE database_id &gt; 4 
		        AND state = 0
		        AND (name = @dbname OR (@dbname IS NULL AND name LIKE 'AdventureWorksLT_%'))

OPEN dbs 
FETCH NEXT FROM dbs INTO @dbname 

WHILE @@FETCH_STATUS = 0 BEGIN 
	SET @sqlstring	= N'USE ' + QUOTENAME(@dbname) + N'

SET NOCOUNT ON;

SELECT DB_NAME() COLLATE DATABASE_DEFAULT AS [database_name] 
		, o.type_desc COLLATE DATABASE_DEFAULT AS [object_type]
		, OBJECT_SCHEMA_NAME(o.object_id) + ''.'' + OBJECT_NAME(o.object_id) COLLATE DATABASE_DEFAULT AS [object_name] 
		, LTRIM(RTRIM(OBJECT_DEFINITION(o.OBJECT_ID))) AS [object_definition]
	FROM sys.objects AS o
	WHERE o.type IN (''TR'',''FN'',''P'',''TF'',''V'')
        AND (o.object_id = OBJECT_ID(@object_name) OR @object_name IS NULL)

UNION ALL

SELECT DB_NAME() COLLATE DATABASE_DEFAULT AS [database_name]
		, type_desc COLLATE DATABASE_DEFAULT AS [type_desc] 
		, OBJECT_SCHEMA_NAME(object_id) + ''.'' + name COLLATE DATABASE_DEFAULT AS object_name
		, ''CREATE TABLE '' + QUOTENAME(OBJECT_SCHEMA_NAME(object_id)) + ''.'' + QUOTENAME(name) + ''('' + CHAR(10) +
			(STUFF(
					(SELECT '', '' + QUOTENAME(c.name) + '' '' + t.name + 
							CASE WHEN t.name IN (''char'', ''nchar'', ''binay'', ''varchar'', ''nvarchar'', ''varbinay'')
								THEN ''('' + CONVERT(SYSNAME, c.max_length) + '')'' 
								ELSE ''''
							END +
							CASE WHEN c.is_identity = 1 THEN '' IDENTITY'' ELSE '''' END +
							ISNULL('' COLLATE '' + c.collation_name, '''') +
							CASE WHEN c.is_nullable = 0 THEN '' NOT'' ELSE '''' END + '' NULL'' + CHAR(10)
						FROM sys.columns AS c
							INNER JOIN sys.types AS t
								ON t.user_type_id = c.user_type_id
						WHERE c.object_id = o.object_id
						ORDER BY column_id
						FOR XML PATH('''')), 1, 2, '''')) + '')'' COLLATE DATABASE_DEFAULT AS object_definition
	FROM sys.objects AS o
	WHERE type IN (''U'')
		AND OBJECTPROPERTY(object_id, ''IsMsShipped'') = 0
		AND (object_id = OBJECT_ID(@object_name) OR @object_name IS NULL)

UNION ALL

SELECT DB_NAME() COLLATE DATABASE_DEFAULT AS database_name 
		, o.type_desc COLLATE DATABASE_DEFAULT
		, SCHEMA_NAME(tt.schema_id) + ''.'' + tt.name COLLATE DATABASE_DEFAULT AS object_name
		, ''CREATE TYPE '' + QUOTENAME(SCHEMA_NAME(tt.schema_id)) + ''.'' + QUOTENAME(tt.name) + '' AS TABLE ('' + CHAR(10) +
			(STUFF(
					(SELECT '', '' + QUOTENAME(c.name) + '' '' + t.name + 
							CASE WHEN t.name IN (''char'', ''nchar'', ''binay'', ''varchar'', ''nvarchar'', ''varbinay'')
								THEN ''('' + CONVERT(SYSNAME, c.max_length) + '')'' 
								ELSE ''''
							END +
							CASE WHEN c.is_identity = 1 THEN '' IDENTITY'' ELSE '''' END +
							ISNULL('' COLLATE '' + c.collation_name, '''') +
							CASE WHEN c.is_nullable = 0 THEN '' NOT'' ELSE '''' END + '' NULL'' + CHAR(10)
						FROM sys.columns AS c
							INNER JOIN sys.types AS t
								ON t.user_type_id = c.user_type_id
						WHERE c.object_id = tt.type_table_object_id
						ORDER BY column_id
						FOR XML PATH('''')), 1, 2, '''')) + '')'' COLLATE DATABASE_DEFAULT AS object_definition
	FROM sys.objects AS o
		INNER JOIN sys.table_types AS tt
			ON tt.type_table_object_id = o.object_id
	WHERE (tt.type_table_object_id = OBJECT_ID(@object_name) OR @object_name IS NULL)
' 
	
	INSERT INTO #output (database_name, object_type, object_name, object_definition)
	EXECUTE sys.sp_executesql 
        @stmt = @sqlstring
        , @params = N'@object_name NVARCHAR(257)'
        , @object_name = @object_name;

	FETCH NEXT FROM dbs INTO @dbname 
END 
CLOSE dbs; 
DEALLOCATE dbs; 

SELECT @@SERVERNAME AS server_name
		, database_name
        , object_type
		, object_name
		, object_definition
        , CONVERT(varchar(100), HASHBYTES('md5', LTRIM(RTRIM(object_definition))),1) AS object_hash 		    
	FROM #output
    ORDER BY object_type
        , object_name";

$master_copy =  Invoke-Sqlcmd -ServerInstance $master_srv `
                                -Database $master_db `
                                -DisableVariables `
                                -Query $sql.Replace("[@object_name]", "").Replace("[@dbname]","$($master_db)") `
                                -ConnectionTimeout 0 `
                                -MaxCharLength 64000;

$object_types = $master_copy | Select-Object object_type -Unique

# to generate a single path per run
$path = "c:/temp/database_schemas/$($schema)/$(Get-Date -Format "yyyyMMddHHmmss")/";
if ( (Test-Path -path $path) -eq $false){
    #write-output "path does not exist"
    New-Item -path $path -ItemType Directory | Out-Null;
}

# to generate a single folder per object_type
foreach ($type in $object_types){
    if ( (Test-Path -Path "$($path)/$($type.object_type)" ) -eq $false){
        #write-output "path does not exist"
        New-Item -Path "$($path)/$($type.object_type)" -ItemType Directory | Out-Null;
    }    
}

foreach ($db_object in $master_copy){

    # to cummulate all copies of the object from all servers
    $local_copy = @();
    $errors = @();

    Write-Output "Processing $($db_object.object_type) =&amp;amp;amp;amp;gt; $($db_object.object_name)";

    $db_object_path = "$($path)/$($db_object.object_type)/$($db_object.object_name)";
        if ( (Test-Path -path $db_object_path) -eq $false){
            New-Item -path $db_object_path -ItemType Directory | Out-Null;
    }

    $outputfile = "$($db_object_path)/all_diffs.txt";
    $db_object | Select-Object database_name, object_name, object_hash | Out-File $outputfile -Append;

    foreach ($srv in $server_list){
    
        Write-Output "`tProcessing server $($srv.ServerUrl)";

        $objDataTable = Invoke-Sqlcmd -ServerInstance "$($srv.ServerUrl)" `
                                -Database "master" `
                                -DisableVariables `
                                -Query $sql.Replace("[@object_name]", $db_object.object_name).Replace("[@dbname]","") `
                                -ConnectionTimeout 0 `
                                -MaxCharLength 64000;                                

        if ($null -ne $objDataTable){
            $local_copy += $objDataTable;
        }
        else{
            $errors += [pscustomobject]@{server_name=$db.ServerUrl;database_name=$db.database_name;object_name=$db_object.object_name;object_hash=0x0};
        }
    }
    
    # Drifts are any row that does not match the object_hash
    # $local_copy | Select-Object database_name, objecT_hash | Where-Object -Property object_hash -ne $db_object.object_hash
    $drifts = $local_copy | Select-Object | Where-Object -Property object_hash -ne $db_object.object_hash

    if ($drifts.Count -eq 0){
    #if no drifts for this object, remove the noise
        Remove-Item -Path $db_object_path -Recurse;
    }
    else{
        # if there was any drift, get unique versions from all servers
        $diff_versions = $drifts | Select-Object object_hash, object_definition -Unique;

        # add each database that drifted to the list
        $errors | Select-Object server_name, database_name, object_name, object_hash | Out-File $outputfile -Append;
        $drifts | Select-Object server_name, database_name, object_name, object_hash | Sort-Object -Property object_hash | Out-File $outputfile -Append;
        
        # put the definition from the master object 
        $db_object.object_definition | Out-File "$($db_object_path)/$($db_object.object_name).sql" 

        foreach ($diff in $($diff_versions)){
            $diff.object_definition | Out-File "$($db_object_path)/$($diff.object_hash).sql" 
        }
    }
}

$EndDate=(Get-Date);

New-TimeSpan –Start $StartDate –End $EndDate | Format-Table -AutoSize;

Once the script completes a run, you can go to your local dir c:\temp

There is one folder per [object_type] that contains one folder for each object that has drifted. If the object is the same in all databases, the process will take care of it and remove the noise, so you can focus on what is important.

On each folder that represents an object, you can see different files:

  • The master copy of the object, named after it.
  • all_diffs.txt, which will contain a list of databases where the object has drifted.
    This is the most important file, because it identifies which database has which version of the object.
  • One file per different version named after their hashes, you should be fine to compare this in a text editor.

Using the free tool WinMerge, you can compare three files at a time, which is quite handy.

Conclusion

Keeping databases schemas in sync is a big thing for me now, so trust me when I say doing this was bugging me for a while until I decided (and got the time) to get it done.

I developed this script/tool based on my requirements and yours might be different, but I am pretty sure you still can get some good ideas and draw a plan that fits your needs.

I really enjoy this kind of challenges and it is amazing how much we can do with a little bit of TSQL and a little bit of PowerShell.

You can find the scripts in GitHub, I hope you find it useful and please let me know in the comments what you think.

Thanks for reading.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.