{"id":119,"date":"2016-02-15T15:44:41","date_gmt":"2016-02-15T15:44:41","guid":{"rendered":"https:\/\/sqldoubleg.live-website.com\/?p=119"},"modified":"2016-02-22T21:44:09","modified_gmt":"2016-02-22T21:44:09","slug":"row-compression-a-bit-of-internals","status":"publish","type":"post","link":"https:\/\/www.sqldoubleg.com\/es\/2016\/02\/15\/row-compression-a-bit-of-internals\/","title":{"rendered":"Row compression, a bit of internals"},"content":{"rendered":"<p>Compression in SQL Server has been around for a while now, come explore how it works internally&nbsp;Compression in SQL Server has been around for a while now, to be precise since SQL Server 2008, and comes in two different (complementary) flavors, ROW and PAGE compression. Unfortunately it&#8217;s only available in the Enterprise Edition (and Developer per instance).<\/p>\n<p>This article is not a substitute for the in-depth knowledge you can get by reading the <a href=\"https:\/\/technet.microsoft.com\/en-us\/library\/dd894051(v=sql.100).aspx\" target=\"_blank\">whitepaper<\/a> and other reputable sources, consider it just a teaser for you to go back and play a bit with this interesting feature.<\/p>\n<p>The compression I want to focus this writing is ROW compression, which is the first step in SQL Server native compression, where all columns in a table where is applied behave as if all datatypes were variable length (if possible).<\/p>\n<p>That is totally cool, since you know that a variable length column will use only as much storage as required up to the maximum length defined for the column (see VARCHAR as example).<br \/>\nThis same behavior will apply for all those data types which are fixed length and may result in a good deal of storage savings at the end of the day. (If the data stored in them do not require all storage, of course)<\/p>\n<p>ROW compression reduces also the metadata overhead for each row, so we&#8217;d be able to save a few bytes there too.<\/p>\n<p>To see how differently the data is physically stored in the pages, I&#8217;ve created this demo<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nUSE master\r\nGO\r\nIF DB_ID('compressiontestdb') IS NOT NULL BEGIN\r\n\tALTER DATABASE [compressiontestdb] SET SINGLE_USER WITH ROLLBACK IMMEDIATE\r\n\tDROP DATABASE [compressiontestdb]\r\nEND\r\nGO\r\n\t\r\nCREATE DATABASE [compressiontestdb]\r\nGO\r\n\r\nUSE compressiontestdb\r\nGO\r\n\r\nIF OBJECT_ID('dbo.no_compressed_data')\tIS NOT NULL DROP TABLE dbo.no_compressed_data\r\nIF OBJECT_ID('dbo.compressed_data')\t\tIS NOT NULL DROP TABLE dbo.compressed_data\r\nGO\r\n\r\nCREATE TABLE dbo.no_compressed_data(\r\n\tId\t\tBIGINT IDENTITY(1,1) NOT NULL,\r\n\tCol2\tINT NOT NULL ,\r\n\tCol3\tINT NOT NULL ,\r\n\tCol4\tSMALLINT NOT NULL,\r\n\tPRIMARY KEY (Id) \r\n)\r\nGO\r\n\r\nCREATE TABLE dbo.compressed_data(\r\n\tId\t\tBIGINT IDENTITY(1,1) NOT NULL,\r\n\tCol2\tINT NOT NULL ,\r\n\tCol3\tINT NOT NULL ,\r\n\tCol4\tSMALLINT NOT NULL,\r\n\tPRIMARY KEY (Id) WITH (DATA_COMPRESSION=ROW)\r\n)\r\nGO\r\n\r\n\r\nINSERT INTO dbo.no_compressed_data\r\nSELECT 2,3,4\r\nFROM sys.objects AS o1\r\nCROSS APPLY sys.objects AS o2\r\n\r\nINSERT INTO dbo.compressed_data\r\nSELECT Col2\r\n\t   , Col3\r\n\t   , Col4\r\n\tFROM dbo.no_compressed_data\r\n\r\n\r\n-- See list of pages allocated to the cluestered index of each table\r\nDBCC IND('compressiontestdb', 'dbo.no_compressed_data', 1)\r\nDBCC IND('compressiontestdb', 'dbo.compressed_data', 1)\r\nGO\r\n-- First glance, Compressed table has half of the pages \/ hence the size, good gain\r\n\r\nDBCC TRACEON(3604,1)\r\n\r\n-- No Compression\r\n-- Use first page with PageType = 1\r\nDBCC PAGE('compressiontestdb', 1, 119, 3)\r\n\r\n-- Row Compression\r\n-- Use first page with PageType = 1\r\nDBCC PAGE('compressiontestdb', 1, 147, 3)\r\n<\/pre>\n<p>&nbsp;<br \/>\nThe first impression is good, DBCC IND returned half the pages for the compressed version of the table, so it&#8217;s half the size&#8230; that&#8217;s what we talkin&#8217; about!<\/p>\n<p>If we go deeper and see the header of the first page of each clustered index, we can see how many rows fit in a single page with and without compression<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/02\/slot_count.png\" alt=\"slot_count\" width=\"898\" height=\"156\" class=\"aligncenter size-full wp-image-120\" \/><\/p>\n<p>We fit 736 compressed rows versus 299 rows without compression, not bad at all. Obviously those ratios may vary depending on the data we store, the smaller the data comparing to the fixed length data type, the more gain we get.<\/p>\n<p>If we compare row to row we can see how different they got to the data pages.<\/p>\n<p><a href=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/02\/row_comparison.png\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/02\/row_comparison.png\" alt=\"row_comparison\" width=\"1323\" height=\"558\" class=\"aligncenter size-full wp-image-121\" \/><\/a><\/p>\n<p>On the left side, we can see the data as is, where each value take as much space as its data type is supposed to, but on the right we see that the physical storage is the minimum to hold the value, so if it is a number which fit in one byte, it physically takes one byte, regardless it&#8217;s a SMALLINT(2 bytes), INT (4 bytes) or BIGINT(8 bytes)<\/p>\n<p>But even if we push the data to the boundaries of the data type, we still might get some benefit, not much, but some.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nUSE master\r\nGO\r\nIF DB_ID('compressiontestdb') IS NOT NULL BEGIN\r\n\tALTER DATABASE [compressiontestdb] SET SINGLE_USER WITH ROLLBACK IMMEDIATE\r\n\tDROP DATABASE [compressiontestdb]\r\nEND\r\nGO\r\n\t\r\nCREATE DATABASE [compressiontestdb]\r\nGO\r\n\r\nUSE compressiontestdb\r\nGO\r\n\r\nIF OBJECT_ID('dbo.no_compressed_data')\tIS NOT NULL DROP TABLE dbo.no_compressed_data\r\nIF OBJECT_ID('dbo.compressed_data')\t\tIS NOT NULL DROP TABLE dbo.compressed_data\r\nGO\r\n\r\nCREATE TABLE dbo.no_compressed_data(\r\n\tId\t\tINT IDENTITY(1000000000,1) NOT NULL,\r\n\tCol2\tINT NOT NULL ,\r\n\tCol3\tINT NOT NULL ,\r\n\tCol4\tSMALLINT NOT NULL,\r\n\tPRIMARY KEY (Id) \r\n)\r\nGO\r\n\r\nCREATE TABLE dbo.compressed_data(\r\n\tId\t\tINT IDENTITY(1000000000,1) NOT NULL,\r\n\tCol2\tINT NOT NULL ,\r\n\tCol3\tINT NOT NULL ,\r\n\tCol4\tSMALLINT NOT NULL,\r\n\tPRIMARY KEY (Id) WITH (DATA_COMPRESSION=ROW)\r\n)\r\nGO\r\n\r\n\r\nINSERT INTO dbo.no_compressed_data\r\nSELECT 1000000000,1000000000,32000\r\nFROM sys.objects AS o1\r\nCROSS APPLY sys.objects AS o2\r\n\r\nINSERT INTO dbo.compressed_data\r\nSELECT Col2\r\n\t   , Col3\r\n\t   , Col4\r\n\tFROM dbo.no_compressed_data\r\n\r\n\r\n-- See list of pages allocated to the cluestered index of each table\r\nDBCC IND('compressiontestdb', 'dbo.no_compressed_data', 1)\r\nDBCC IND('compressiontestdb', 'dbo.compressed_data', 1)\r\nGO\r\n\r\nDBCC TRACEON(3604,1)\r\n\r\n-- No Compression\r\nDBCC PAGE('compressiontestdb', 1, 178, 3)\r\n\r\n-- Row Compression\r\nDBCC PAGE('compressiontestdb', 1, 191, 3)\r\n<\/pre>\n<p>&nbsp;<br \/>\n24 pages for not compressed data versus 21, you can see by yourselves the different way of storing the data. Different row size because of the metadata.<br \/>\n<a href=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/02\/row_comparison_2.png\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/02\/row_comparison_2.png\" alt=\"row_comparison_2\" width=\"1389\" height=\"451\" class=\"aligncenter size-full wp-image-122\" \/><\/a><\/p>\n<p>&nbsp;<br \/>\n<strong>UNICODE compression<\/strong><br \/>\n&nbsp;<\/p>\n<p>Another important gain will be in case you need to store a mix of UNICODE and not UNICODE values, and therefore you have defined a column as NVARCHAR(NOT MAX). In that case, if you apply compression, the non UNICODE characters will take 1 byte and not 2 as UNICODE does, so that overhead will be use in case is strictly necessary.<\/p>\n<p>This point is not very clear when you read Books Online, as depending on the version you read, is correct or not.<\/p>\n<p>Correct<br \/>\n<a href=\"https:\/\/technet.microsoft.com\/en-us\/library\/cc280576(v=sql.105).aspx\" target=\"_blank\">https:\/\/technet.microsoft.com\/en-us\/library\/cc280576(v=sql.105).aspx<\/a><br \/>\nWrong<br \/>\n<a href=\"https:\/\/technet.microsoft.com\/en-us\/library\/cc280576(v=sql.110).aspx\" target=\"_blank\">https:\/\/technet.microsoft.com\/en-us\/library\/cc280576(v=sql.110).aspx<\/a><\/p>\n<p>You can see that this was an improvement that started to apply in SQL Server 2008R2, not in 2008 as compression itself.<\/p>\n<p><a href=\"https:\/\/blogs.msdn.microsoft.com\/sqlserverstorageengine\/2009\/08\/16\/unicode-compression-in-sql-server-2008r2\/\" target=\"_blank\">https:\/\/blogs.msdn.microsoft.com\/sqlserverstorageengine\/2009\/08\/16\/unicode-compression-in-sql-server-2008r2\/<\/a><\/p>\n<p>You can see how UNICODE compression applies in the following example<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nUSE master\r\nGO\r\nIF DB_ID('compressiontestdb') IS NOT NULL BEGIN\r\n\tALTER DATABASE [compressiontestdb] SET SINGLE_USER WITH ROLLBACK IMMEDIATE\r\n\tDROP DATABASE [compressiontestdb]\r\nEND\r\nGO\r\n\t\r\nCREATE DATABASE [compressiontestdb]\r\nGO\r\n\r\nUSE compressiontestdb\r\nGO\r\n\r\nIF OBJECT_ID('dbo.no_compressed_data')\tIS NOT NULL DROP TABLE dbo.no_compressed_data\r\nIF OBJECT_ID('dbo.compressed_data')\t\tIS NOT NULL DROP TABLE dbo.compressed_data\r\nGO\r\n\r\nCREATE TABLE dbo.no_compressed_data(\r\n\tId\t\tINT IDENTITY(1000000000,1) NOT NULL,\r\n\tCol2\tINT NOT NULL ,\r\n\tCol3\tINT NOT NULL ,\r\n\tCol4\tNVARCHAR(200) NULL,\r\n\tPRIMARY KEY (Id) \r\n)\r\nGO\r\n\r\nCREATE TABLE dbo.compressed_data(\r\n\tId\t\tINT IDENTITY(1000000000,1) NOT NULL,\r\n\tCol2\tINT NOT NULL ,\r\n\tCol3\tINT NOT NULL ,\r\n\tCol4\tNVARCHAR(200) NULL,\r\n\tPRIMARY KEY (Id) WITH (DATA_COMPRESSION=ROW)\r\n)\r\nGO\r\n\r\n\r\nINSERT INTO dbo.no_compressed_data\r\nSELECT 1000000000,32000, REPLICATE(N'A', 200)\r\nFROM sys.objects AS o1\r\nCROSS APPLY sys.objects AS o2\r\n\r\nINSERT INTO dbo.compressed_data\r\nSELECT Col2\r\n\t   , Col3\r\n\t   , Col4\r\n\tFROM dbo.no_compressed_data\r\n\r\n\r\n-- See list of pages allocated to the cluestered index of each table\r\nDBCC IND('compressiontestdb', 'dbo.no_compressed_data', 1)\r\nDBCC IND('compressiontestdb', 'dbo.compressed_data', 1)\r\nGO\r\n\r\nDBCC TRACEON(3604,1)\r\n\r\n-- No Compression\r\nDBCC PAGE('compressiontestdb', 1, 119, 3)\r\n\r\n-- Row Compression\r\nDBCC PAGE('compressiontestdb', 1, 147, 3)\r\n<\/pre>\n<p>&nbsp;<br \/>\nSee how compression do the job here<br \/>\n<a href=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/02\/unicode_compression.png\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/02\/unicode_compression.png\" alt=\"unicode_compression\" width=\"1288\" height=\"789\" class=\"aligncenter size-full wp-image-132\" srcset=\"https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/02\/unicode_compression.png 1288w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/02\/unicode_compression-300x184.png 300w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/02\/unicode_compression-1024x627.png 1024w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/02\/unicode_compression-150x92.png 150w\" sizes=\"(max-width: 1288px) 100vw, 1288px\" \/><\/a><\/p>\n<p>And remember UNICODE compression only applies for non LOB columns (that excludes NVARCHAR(MAX) too). <\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nUSE master\r\nGO\r\nIF DB_ID('compressiontestdb') IS NOT NULL BEGIN\r\n\tALTER DATABASE [compressiontestdb] SET SINGLE_USER WITH ROLLBACK IMMEDIATE\r\n\tDROP DATABASE [compressiontestdb]\r\nEND\r\nGO\r\n\t\r\nCREATE DATABASE [compressiontestdb]\r\nGO\r\n\r\nUSE compressiontestdb\r\nGO\r\n\r\nIF OBJECT_ID('dbo.no_compressed_data')\tIS NOT NULL DROP TABLE dbo.no_compressed_data\r\nIF OBJECT_ID('dbo.compressed_data')\t\tIS NOT NULL DROP TABLE dbo.compressed_data\r\nGO\r\n\r\nCREATE TABLE dbo.no_compressed_data(\r\n\tId\t\tINT IDENTITY(1000000000,1) NOT NULL,\r\n\tCol2\tINT NOT NULL ,\r\n\tCol3\tINT NOT NULL ,\r\n\tCol4\tNVARCHAR(MAX) NULL,\r\n\tPRIMARY KEY (Id) \r\n)\r\nGO\r\n\r\nCREATE TABLE dbo.compressed_data(\r\n\tId\t\tINT IDENTITY(1000000000,1) NOT NULL,\r\n\tCol2\tINT NOT NULL ,\r\n\tCol3\tINT NOT NULL ,\r\n\tCol4\tNVARCHAR(MAX) NULL,\r\n\tPRIMARY KEY (Id) WITH (DATA_COMPRESSION=ROW)\r\n)\r\nGO\r\n\r\n\r\nINSERT INTO dbo.no_compressed_data\r\nSELECT 1000000000,32000, REPLICATE(N'A', 200)\r\nFROM sys.objects AS o1\r\nCROSS APPLY sys.objects AS o2\r\n\r\nINSERT INTO dbo.compressed_data\r\nSELECT Col2\r\n\t   , Col3\r\n\t   , Col4\r\n\tFROM dbo.no_compressed_data\r\n\r\n\r\n-- See list of pages allocated to the cluestered index of each table\r\nDBCC IND('compressiontestdb', 'dbo.no_compressed_data', 1)\r\nDBCC IND('compressiontestdb', 'dbo.compressed_data', 1)\r\nGO\r\n\r\nDBCC TRACEON(3604,1)\r\n\r\n-- No Compression\r\nDBCC PAGE('compressiontestdb', 1, 119, 3)\r\n\r\n-- Row Compression\r\nDBCC PAGE('compressiontestdb', 1, 147, 3)\r\n<\/pre>\n<p>&nbsp;<br \/>\nIn this case the result is that no compression can apply.<br \/>\n<a href=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/02\/unicode_no_compression.png\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/02\/unicode_no_compression.png\" alt=\"unicode_no_compression\" width=\"1284\" height=\"947\" class=\"aligncenter size-full wp-image-133\" srcset=\"https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/02\/unicode_no_compression.png 1284w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/02\/unicode_no_compression-300x221.png 300w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/02\/unicode_no_compression-1024x755.png 1024w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/02\/unicode_no_compression-150x111.png 150w\" sizes=\"(max-width: 1284px) 100vw, 1284px\" \/><\/a><\/p>\n<p>Hope you enjoyed the reading and now feel more curious about all these internals.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Compression in SQL Server has been around for a while now, come explore how it works internally&nbsp;Compression in SQL Server has been around for a while now, to be precise since SQL&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[24,12],"_links":{"self":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts\/119"}],"collection":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/comments?post=119"}],"version-history":[{"count":0,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts\/119\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/media?parent=119"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/categories?post=119"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/tags?post=119"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}