November 30, 2016

In-Memory logging, much cooler than disk-based

Last week we started exploring some of the cool features that the new In-Memory engine (codename Hekaton) brings into play, looking at the differences between non-clustered indexes when they are memory-optimized versus disk-based. That was cool.

This week I want to look at what is, in my opinion, one of the most important parts that allow In-Memory to outperform and scale better than the traditional engine: the transaction log usage.

 
Background

SQL Server, in order to guarantee Durability (that's the D in ACID), uses the transaction log in such a way that every operation has to be written there first, before the commit acknowledgment is sent to the client.

This is called Write-Ahead Logging, and by its very nature it brings some obvious scalability problems.

Think about it: if each record we insert, update or delete has to be written to the transaction log first, and there is only one transaction log file per database, that can become a real bottleneck for highly transactional systems, where multiple processes compete with each other to make those writes so they can keep processing data.
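
One quick way to gauge that pressure on any system is the WRITELOG wait type, which accumulates while sessions wait for their log records to be hardened to disk. Just a small illustrative query using a standard DMV:

-- Time spent by sessions waiting for transaction log writes to complete
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'WRITELOG'
GO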

 
In-Memory Engine

According to the original whitepaper, one of the biggest challenges when developing the new engine was scalability and, as we've just seen, that would be difficult to achieve if things kept being done the same way.

On the other hand, when working with datasets exclusively in memory, should any failure happen, recovery might be impossible, hence there is no other way than hardening that data to disk to guarantee the Durability of each committed transaction.
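
As a reminder, durability is chosen per table at creation time. A minimal sketch with a hypothetical table, where SCHEMA_AND_DATA (the default) gives full durability and SCHEMA_ONLY skips hardening the data entirely:

-- Hypothetical memory-optimized table; DURABILITY decides what survives a restart
CREATE TABLE dbo.DurabilityDemo
(
	DemoID INT NOT NULL PRIMARY KEY NONCLUSTERED,
	Payload NVARCHAR(100) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO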

What is cool is how this is achieved, but as always, let me write some queries to show you.

I'll be using the [WideWorldImporters] sample database, in case you want to follow along.

USE WideWorldImporters
GO
SELECT VehicleTemperatureID, VehicleRegistration, ChillerSensorNumber, RecordedWhen, Temperature, FullSensorData, IsCompressed, CompressedSensorData
INTO [Warehouse].[VehicleTemperatures_DiskBased] 
FROM [Warehouse].[VehicleTemperatures]
GO

That creates an identical (~ish) copy of one of our In-Memory tables, but disk-based, so we can see the difference when we DELETE a number of rows.
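
A quick sanity check to confirm which copy is which, since sys.tables exposes both the memory-optimized flag and the durability setting:

SELECT name, is_memory_optimized, durability_desc
FROM sys.tables
WHERE name LIKE 'VehicleTemperatures%'
GO

With that confirmed, let's delete 100 rows from the disk-based copy and have a look at the log.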

CHECKPOINT
GO

BEGIN TRANSACTION DEL_100_DISK_BASED
 
DELETE TOP(100) t
FROM [Warehouse].[VehicleTemperatures_DiskBased] AS t
 
COMMIT
GO
 
SELECT *
INTO #t
FROM sys.fn_dblog(NULL, NULL)
 
SELECT [Current LSN]
		, Operation
		, Context
		, [Transaction ID]
		, [Transaction Name]
		, AllocUnitId
		, [Page ID]
		, [Slot ID]
		, [Lock Information]
		, Description
FROM #t
WHERE [Transaction ID] = (SELECT [Transaction ID] FROM #t WHERE [Transaction Name] = 'DEL_100_DISK_BASED')
ORDER BY [Current LSN] ASC

[Figure: fn_dblog output for the committed disk-based DELETE of 100 rows]

See how, for each of the records we have deleted, a record in the transaction log is also required. Not very efficient, but that's how it is.
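
In case the screenshot is hard to read, a quick aggregation over the same temp table makes the point: expect roughly one LOP_DELETE_ROWS record per deleted row, plus a few bookkeeping records for the transaction itself.

SELECT Operation, Context, COUNT(*) AS LogRecords
FROM #t
WHERE [Transaction ID] = (SELECT [Transaction ID] FROM #t WHERE [Transaction Name] = 'DEL_100_DISK_BASED')
GROUP BY Operation, Context
ORDER BY LogRecords DESC
GO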

Now we can see that it gets even worse if we ROLLBACK our transaction.

CHECKPOINT
GO

BEGIN TRANSACTION DEL_100_DISK_BASED_ROLLBACK
 
DELETE TOP(100) t
FROM [Warehouse].[VehicleTemperatures_DiskBased] AS t
 
ROLLBACK
GO
 
SELECT *
INTO #t2
FROM sys.fn_dblog(NULL, NULL)
 
SELECT [Current LSN]
		, Operation
		, Context
		, [Transaction ID]
		, [Transaction Name]
		, AllocUnitId
		, [Page ID]
		, [Slot ID]
		, [Lock Information]
		, Description
FROM #t2
WHERE [Transaction ID] = (SELECT [Transaction ID] FROM #t2 WHERE [Transaction Name] = 'DEL_100_DISK_BASED_ROLLBACK')
ORDER BY [Current LSN] ASC

[Figure: fn_dblog output for the rolled-back disk-based DELETE of 100 rows]

Now we need double the number of rows, because for each row we've told SQL Server was deleted, we also have to tell it that it was not actually deleted (a COMPENSATION record written due to the ROLLBACK) in case of recovery (crash recovery or restoring from backup). That's so bad.
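
Those compensation records are visible in the output we already captured: fn_dblog flags them in the Description column, so a simple filter is enough to count them (hedging a bit here, the exact text can vary between versions):

SELECT COUNT(*) AS CompensationRecords
FROM #t2
WHERE [Transaction ID] = (SELECT [Transaction ID] FROM #t2 WHERE [Transaction Name] = 'DEL_100_DISK_BASED_ROLLBACK')
AND [Description] LIKE '%COMPENSATION%'
GO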

But not everything is lost yet 🙂 let's check how the In-Memory engine deals with this problem.

CHECKPOINT
GO

BEGIN TRANSACTION DEL_100_IN_MEMORY
 
DELETE TOP(100) t
FROM [Warehouse].[VehicleTemperatures] AS t
 
COMMIT
GO
 
SELECT *
INTO #t3
FROM sys.fn_dblog(NULL, NULL)
 
SELECT [Current LSN]
		, Operation
		, Context
		, [Transaction ID]
		, [Transaction Name]
		, AllocUnitId
		, [Page ID]
		, [Slot ID]
		, [Lock Information]
		, Description
FROM #t3
WHERE [Transaction ID] = (SELECT [Transaction ID] FROM #t3 WHERE [Transaction Name] = 'DEL_100_IN_MEMORY')
ORDER BY [Current LSN] ASC

[Figure: fn_dblog output for the committed In-Memory DELETE of 100 rows]

That's better! According to the whitepaper, Hekaton's transaction log is designed for high efficiency and scale: each transaction is logged in a single, potentially large, log record.
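
Those In-Memory log records are easy to spot because fn_dblog shows them with their own operation codes, prefixed LOP_HK. Reusing the temp table from above, the [Log Record Length] column gives an idea of how large that single record is:

SELECT [Current LSN], Operation, [Log Record Length]
FROM #t3
WHERE Operation LIKE 'LOP_HK%'
GO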

So SQL Server writes one single log record for the whole transaction, once it has been committed. But how does it work when we ROLLBACK our transaction?

CHECKPOINT
GO

BEGIN TRANSACTION DEL_100_IN_MEMORY_ROLLBACK
 
DELETE TOP(100) t
FROM [Warehouse].[VehicleTemperatures] AS t
 
ROLLBACK
GO
 
SELECT *
INTO #t4
FROM sys.fn_dblog(NULL, NULL)
 
SELECT [Current LSN]
		, Operation
		, Context
		, [Transaction ID]
		, [Transaction Name]
		, AllocUnitId
		, [Page ID]
		, [Slot ID]
		, [Lock Information]
		, Description
FROM #t4
WHERE [Transaction ID] = (SELECT [Transaction ID] FROM #t4 WHERE [Transaction Name] = 'DEL_100_IN_MEMORY_ROLLBACK')
ORDER BY [Current LSN] ASC

[Figure: fn_dblog output for the rolled-back In-Memory DELETE of 100 rows]

Nothing, nada, rien, nichts, ничего. That’s even better!

And that is because, in this case, the transaction log contains only enough information to redo committed transactions; a rolled-back transaction simply leaves no trace. The whole set of changes is hardened through CHECKPOINT files, which are out of the scope of this post, but I'll talk about them in the future for sure, since they are key to this whole process.
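
If you are curious in the meantime, you can already peek at those files through a DMV. A hedged sketch (column names as of SQL Server 2016, they have changed between versions):

SELECT file_type_desc, state_desc, COUNT(*) AS FilesCount, SUM(file_size_in_bytes) / 1024 / 1024 AS SizeMB
FROM sys.dm_db_xtp_checkpoint_files
GROUP BY file_type_desc, state_desc
GO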

 
Conclusion

The changes made to the way SQL Server provides Durability through the In-Memory engine are really impressive, and they can remove virtually all the contention created by having just one transaction log file by reducing its usage to the minimum.

There is still a lot to explore on this topic, so stay tuned for future posts!

Thanks for reading.

 
