{"id":181,"date":"2016-03-24T11:28:24","date_gmt":"2016-03-24T11:28:24","guid":{"rendered":"https:\/\/sqldoubleg.live-website.com\/?p=181"},"modified":"2016-03-24T13:24:05","modified_gmt":"2016-03-24T13:24:05","slug":"sql-server-2016-compress-and-decompress-performance","status":"publish","type":"post","link":"https:\/\/www.sqldoubleg.com\/es\/2016\/03\/24\/sql-server-2016-compress-and-decompress-performance\/","title":{"rendered":"SQL Server 2016 COMPRESS and DECOMPRESS performance"},"content":{"rendered":"<p>More new functionality keeps coming along with SQL Server 2016; this time I&#8217;ve been playing with COMPRESS and DECOMPRESS. One of the most exciting things for me in this new version of SQL Server is the inclusion of native gzip compression through these built-in functions.<\/p>\n<p>I know you can implement your own CLR version and so on, but for many people that is not a real possibility due to security concerns, lack of resources or knowledge, or some other reason; that&#8217;s why for many this will be really good news.<\/p>\n<p>I couldn&#8217;t find any reference to which SQL Server editions will have it, but I want to be optimistic and think it will be available at least in Standard Edition, as opposed to ROW and PAGE compression (Enterprise only).<\/p>\n<p>One of the many use cases might be compressing big text columns, either VARCHAR(MAX) or NVARCHAR(MAX), since ROW and PAGE compression don&#8217;t work well on these data types.<\/p>\n<p>Let&#8217;s see how much space we can save using a table from the sample database [AdventureWorks2016CTP3] on my shiny new SQL Server 2016 RC1 instance. 
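<\/p>\n<p>As a quick sanity check before measuring anything, here is a minimal round trip. Note that COMPRESS returns VARBINARY(MAX), so the output of DECOMPRESS must be CAST back to the original type:<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\n-- Minimal COMPRESS \/ DECOMPRESS round trip\r\nSELECT CAST(DECOMPRESS(COMPRESS(N'Hello, SQL Server 2016')) AS NVARCHAR(MAX)) AS [RoundTrip]\r\n-- DECOMPRESS returns VARBINARY(MAX), hence the CAST back to NVARCHAR(MAX)\r\n<\/pre>\n<p>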
I&#8217;ve chosen the table [Person].[Person] as base, since it&#8217;s one of the \u00abbig boys\u00bb (hm hm), but I have to convert from XML to NVARCHAR(MAX).<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nUSE [AdventureWorks2016CTP3]\r\nGO\r\nDROP TABLE IF EXISTS [Person].[Person_nvarchar]\r\nGO\r\n-- Create table as NVARCHAR(MAX) from the originally XML columns\r\nSELECT [BusinessEntityID]\r\n      ,[PersonType]\r\n      ,[NameStyle]\r\n      ,[Title]\r\n      ,[FirstName]\r\n      ,[MiddleName]\r\n      ,[LastName]\r\n      ,[Suffix]\r\n      ,[EmailPromotion]\r\n      ,(CONVERT(NVARCHAR(MAX), [AdditionalContactInfo])) AS [AdditionalContactInfo]\r\n      ,(CONVERT(NVARCHAR(MAX), [Demographics])) AS [Demographics]\r\n      ,[rowguid]\r\n      ,[ModifiedDate]\r\n\tINTO   [Person].[Person_nvarchar]\r\n\tFROM [Person].[Person]\r\nGO\r\nCREATE UNIQUE CLUSTERED INDEX CIX_Person_nvarchar ON [Person].[Person_nvarchar] ([BusinessEntityID])\r\nGO\r\n\r\nUSE [AdventureWorks2016CTP3]\r\nGO\r\nDROP TABLE IF EXISTS [Person].[Person_gzip]\r\nGO\r\n-- Create table with gzip compression on the originally XML columns\r\nSELECT [BusinessEntityID]\r\n      ,[PersonType]\r\n      ,[NameStyle]\r\n      ,[Title]\r\n      ,[FirstName]\r\n      ,[MiddleName]\r\n      ,[LastName]\r\n      ,[Suffix]\r\n      ,[EmailPromotion]\r\n      ,COMPRESS(CONVERT(NVARCHAR(MAX), [AdditionalContactInfo])) AS [AdditionalContactInfo]\r\n      ,COMPRESS(CONVERT(NVARCHAR(MAX), [Demographics])) AS [Demographics]\r\n      ,[rowguid]\r\n      ,[ModifiedDate]\r\n\tINTO   [Person].[Person_gzip]\r\n\tFROM [Person].[Person]\r\nGO\r\nCREATE UNIQUE CLUSTERED INDEX CIX_Person_gzip ON [Person].[Person_gzip] ([BusinessEntityID])\r\nGO\r\n<\/pre>\n<p>Now we can see some numbers.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\n-- See numbers\r\nSELECT OBJECT_SCHEMA_NAME(ix.object_id) AS [schema]\r\n\t\t, OBJECT_NAME(ix.object_id)\t\tAS [object_name]\r\n\t\t, 
p.rows\t\t\t\t\t\tAS [row_count]\r\n\t\t, SUM(ps.used_page_count)\t\tAS [used_page_count]\r\n\tFROM sys.indexes AS ix\r\n\t\tINNER JOIN sys.partitions AS p\r\n\t\t\tON p.object_id = ix.object_id\r\n\t\t\t\tAND p.index_id = ix.index_id\r\n\t\tINNER JOIN sys.dm_db_partition_stats AS ps\r\n\t\t\tON ps.partition_id = p.partition_id\r\n\tWHERE ix.object_id IN (OBJECT_ID('Person.Person'), OBJECT_ID('Person.Person_gzip'), OBJECT_ID('Person.Person_nvarchar'))\r\nGROUP BY ix.object_id, p.rows\r\nORDER BY [schema], [object_name]\r\n<\/pre>\n<p><a href=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/03\/table_sizes.png\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqldoubleg.live-website.com\/wp-content\/uploads\/2016\/03\/table_sizes.png\" alt=\"table_sizes\" width=\"365\" height=\"110\" class=\"aligncenter size-full wp-image-182\" srcset=\"https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/03\/table_sizes.png 365w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/03\/table_sizes-300x90.png 300w, https:\/\/www.sqldoubleg.com\/wp-content\/uploads\/2016\/03\/table_sizes-150x45.png 150w\" sizes=\"(max-width: 365px) 100vw, 365px\" \/><\/a><\/p>\n<p>We can observe that the original table with the <strong>XML<\/strong> data types is the biggest, followed by <strong>NVARCHAR(MAX)<\/strong>, and the winner is, obviously, the <strong>COMPRESSED<\/strong> one. So far it does what it says on the tin \ud83d\ude42<\/p>\n<p>But now you&#8217;re probably ready to ask me the question I&#8217;ve been asked the most: what about performance?<\/p>\n<p>&nbsp;<br \/>\n<strong>The performance<\/strong><\/p>\n<p>Almost nothing in this life comes for free, so what is the penalty we have to pay for keeping our SAN admin happy? Let&#8217;s find out with some queries. 
<\/p>\n<p>First let&#8217;s go for a full <del>table<\/del> clustered index scan.<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\nUSE [AdventureWorks2016CTP3]\r\nGO\r\nSET STATISTICS IO, TIME ON\r\nGO\r\n\r\nSELECT * FROM Person.Person\r\n--(19972 row(s) affected)\r\n--Table 'Person'. Scan count 1, logical reads 3838, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.\r\n\r\n-- SQL Server Execution Times:\r\n--   CPU time = 31 ms,  elapsed time = 1318 ms.\r\n\r\nSELECT * FROM Person.Person_gzip\r\n--(19972 row(s) affected)\r\n--Table 'Person_gzip'. Scan count 1, logical reads 1290, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.\r\n\r\n-- SQL Server Execution Times:\r\n--   CPU time = 47 ms,  elapsed time = 430 ms.\r\n\r\nSELECT * FROM Person.Person_nvarchar\r\n--(19972 row(s) affected)\r\n--Table 'Person_nvarchar'. Scan count 1, logical reads 3173, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.\r\n\r\n-- SQL Server Execution Times:\r\n--   CPU time = 46 ms,  elapsed time = 600 ms.\r\n\r\n<\/pre>\n<p>And once again, when doing a full scan, the victory goes to the smallest table, so the ranking stays the same: XML, then NVARCHAR(MAX), then COMPRESSED. 
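<\/p>\n<p>The gap tracks the per-row payload; DATALENGTH shows how much smaller the compressed values really are (a quick sketch against the two tables created above):<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\n-- Per-row size of the [Demographics] column, plain vs compressed\r\nSELECT TOP (5) n.[BusinessEntityID]\r\n      ,DATALENGTH(n.[Demographics]) AS [nvarchar_bytes]\r\n      ,DATALENGTH(g.[Demographics]) AS [gzip_bytes]\r\n\tFROM [Person].[Person_nvarchar] AS n\r\n\t\tINNER JOIN [Person].[Person_gzip] AS g\r\n\t\t\tON g.[BusinessEntityID] = n.[BusinessEntityID]\r\n\tORDER BY n.[BusinessEntityID]\r\n<\/pre>\n<p>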
That is reasonable: the less you read, the faster you finish.<\/p>\n<p>But there must be some caveats around the corner; it can&#8217;t be all benefit.<\/p>\n<p>For this query I&#8217;ll leave the original table out of the equation, since XML cannot be queried as a normal string and I&#8217;m feeling too lazy to write XQuery right now \ud83d\ude42<\/p>\n<p>I&#8217;m going to try to find all married people, and I&#8217;ve seen there is an attribute in the XML that contains that info. I&#8217;ll query it as a string just for demonstration purposes; do not take this as a best practice. If you need to query XML, be sure to use the XML data type and XQuery (combined with XML indexes for best performance).<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\n-- if we look for a value within the compressed column\r\nSELECT CAST(DECOMPRESS(Demographics) AS NVARCHAR(MAX)), * \r\n\tFROM Person.Person_gzip\r\n\tWHERE CAST(DECOMPRESS(Demographics) AS NVARCHAR(MAX)) LIKE '%&lt;MaritalStatus&gt;M%' \r\n\r\n-- same search within the nvarchar column\r\nSELECT * \r\n\tFROM Person.Person_nvarchar\r\n\tWHERE Demographics LIKE '%&lt;MaritalStatus&gt;M%' \r\n<\/pre>\n<p>And here is the penalty:<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\n\/*\r\nSQL Server parse and compile time: \r\n   CPU time = 0 ms, elapsed time = 0 ms.\r\n\r\n(10011 row(s) affected)\r\nTable 'Person_gzip'. Scan count 1, logical reads 1289, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.\r\n\r\n SQL Server Execution Times:\r\n   CPU time = 1812 ms,  elapsed time = 2958 ms.\r\n\r\n(10011 row(s) affected)\r\nTable 'Person_nvarchar'. Scan count 1, logical reads 3173, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.\r\n\r\n SQL Server Execution Times:\r\n   CPU time = 282 ms,  elapsed time = 992 ms.\r\n*\/\r\n<\/pre>\n<p>Decompressing each value to find something is not that cheap anymore. 
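<\/p>\n<p>One way to soften this (a hypothetical sketch, not something from the tables as created above) is to extract the values you filter on into their own column when loading the data, so the predicate never has to touch DECOMPRESS:<\/p>\n<pre class=\"brush: tsql; title: ; notranslate\" title=\"\">\r\n-- Hypothetical mitigation: persist frequently searched values outside the blob\r\nALTER TABLE [Person].[Person_gzip] ADD [MaritalStatus] NCHAR(1) NULL\r\nGO\r\nUPDATE [Person].[Person_gzip]\r\n\tSET [MaritalStatus] = CASE WHEN CAST(DECOMPRESS([Demographics]) AS NVARCHAR(MAX)) LIKE '%&lt;MaritalStatus&gt;M%'\r\n\t\t\tTHEN N'M' ELSE N'S' END -- assumes only M\/S values, for illustration\r\nGO\r\n-- Searches now pay decompression once at load time, not on every query\r\nSELECT * FROM [Person].[Person_gzip] WHERE [MaritalStatus] = N'M'\r\n<\/pre>\n<p>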
After many runs I can see that it consistently takes about 3 times the total time and about 5 times the CPU time compared to when you don&#8217;t have to decompress it.<br \/>\n&nbsp;<\/p>\n<p><strong>Conclusion<\/strong><\/p>\n<p>COMPRESS is a nice feature, but be sure it fits your use case, because trying to save on storage can impact overall performance very badly.<br \/>\n&nbsp;<br \/>\n&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>More new functionality keeps coming along with SQL Server 2016; this time I&#8217;ve been playing with COMPRESS and DECOMPRESS. One of the most exciting things for me in this new version of SQL Server&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,21],"tags":[30,23,25],"_links":{"self":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts\/181"}],"collection":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/comments?post=181"}],"version-history":[{"count":0,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/posts\/181\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/media?parent=181"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/categories?post=181"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqldoubleg.com\/es\/wp-json\/wp\/v2\/tags?post=181"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}