Spring cleanup, LOB considerations

For DBA’s it is important to have the ability to recover databases quickly in case of disaster. In order to maximize the potential we need to have our databases ready but sometimes that can be tricky So spring time is here and probably you feel like cleaning up your databases, and that’s good.

In previous posts I have explained how having dedicated filegroups for user data can improve our RTO by, in case of disaster, recovering critical data first and then the rest.

The thing is when you deal with databases which were not created this way, you need to move the data from one filegroup to another before you can apply this kind of techniques.

Here is where this post can show you one of the gotchas you can find during this process.

Backuground

According to Books Online

You can use the DROP_EXISTING clause to rebuild the index, add or drop columns, modify options, modify column sort order, or change the partition scheme or filegroup

But let’s say you have old legacy databases that were created back in the day without planning about future growth with only one filegroup and one data file and you that is now causing you some problems in the form of contention or very long recovery time.

Also let’s say some of your tables use LOB data types like MAX or even the old BLOBS (TEXT, NTEXT, IMAGE).

To picture this better, let’s create a simple database the old style and one table

USE master

CREATE DATABASE LOB_allocations
GO

USE LOB_allocations
GO

-- Create table with NVARCHAR(MAX) column (LOB)
CREATE TABLE dbo.blobs (
Id	INT IDENTITY NOT NULL CONSTRAINT PK_blobs PRIMARY KEY
, Column1 VARCHAR(50) DEFAULT 'a'
, blob_column NVARCHAR(MAX) DEFAULT REPLICATE(CONVERT(NVARCHAR(MAX), N'b'), 10000)
)
GO

-- populate the table
INSERT INTO dbo.blobs ( Column1, blob_column )
DEFAULT VALUES
GO 20

That give us the scenario to start with, now let me show you the distribution of our data per allocation unit type.

-- Check the different allocation units
SELECT OBJECT_NAME(ix.object_id)
		, ix.name
		, FILEGROUP_NAME(au.data_space_id) AS [Filegroup_name]
		, au.*
		, p.rows
	FROM sys.indexes AS ix
		INNER JOIN sys.partitions AS p
			ON p.object_id = ix.object_id 
				AND p.index_id = ix.index_id
		INNER JOIN sys.allocation_units AS au
			ON au.container_id = p.hobt_id	
WHERE ix.object_id IN (OBJECT_ID('dbo.blobs'))

So we can see the different allocation unit types along to the filegroup they belong to.

And because we’re in the spring cleaning mood, we want to move our table to a new filegroup and leave PRIMARY alone for other matters. We need to create the new filegroup first.

-- Add new filegroup and file 
ALTER DATABASE LOB_allocations ADD FILEGROUP [USER_DATA]
GO
ALTER DATABASE LOB_allocations 
	ADD FILE (NAME='LOB_allocations_user_data'
			, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQL2016\MSSQL\DATA\LOB_allocations_user_data.ndf'
			, SIZE= 16MB
			, FILEGROWTH=16MB) 
	TO FILEGROUP [USER_DATA]

And now we can follow the instructions and move our table from one filegroup to another by rebuilding the clustered index WITH DROP_EXISTING, good stuff.

-- Move the clustered index to the new filegroup
CREATE UNIQUE CLUSTERED INDEX PK_blobs ON dbo.blobs (Id) WITH (DROP_EXISTING = ON) ON [USER_DATA]

So now our table lives in a different filegroup, nice. Let’s see

-- check again the allocation units.
SELECT OBJECT_NAME(ix.object_id)
		, ix.name
		, FILEGROUP_NAME(au.data_space_id) AS [Filegroup_name]
		, au.*
		, p.rows
	FROM sys.indexes AS ix
		INNER JOIN sys.partitions AS p
			ON p.object_id = ix.object_id 
				AND p.index_id = ix.index_id
		INNER JOIN sys.allocation_units AS au
			ON au.container_id = p.hobt_id	
WHERE ix.object_id IN (OBJECT_ID('dbo.blobs'))

Hm, that’s not nice… Seems like rebuilding the clustered index does not affect the LOB allocation units, so to move completely our data, we’d need to create a new table in the new filegroup and dump the data, with the implied hassle.

Conclusion

Since it’s a great idea to separate logically and physically our data using files and filegroups, it’s always best if we plan ahead for this so we wouldn’t find some gotchas when doing it at later stages.

And once again we can see the importance of a bit of knowledge about internals and how the data is distributed along the different types of allocation units the SQL Server uses to store data.

Hope you find this post helpful and as always please use the comments below for any question you may have.

Thanks for reading!

Leave a Reply Cancel reply