It is important to study the underwater landscape while fishing. There are always nuances to every location, and by finding these “spot on the spot” locations, you can know where to focus your efforts. For example, you might be fishing a large weedy bay that spans several acres. Before diving in and fishing the middle, you’d want to fish the weed edges, where the predators hide just inside the weeds, ready to ambush any bait fish that foolishly wanders into open water. An example of a spot on a spot would be a clump of weeds that extends out of the main weedbed and into the open water. This clump could conceal a large predator and at the same time give the predator a way to watch a lot of open water. Another spot might be a small hole in the main weedbed. A predator can wait for a baitfish to swim out into the hole where there is no cover, and then rush out and strike. If you aren’t paying close attention to the underwater environment while you are fishing, it can be easy to miss these spots.
We recently upgraded one of our third party applications (our biggest one, in fact), and we ran into some problems that involved a very close study of SQL Server to resolve. After the upgrade I was told by some of the application’s support staff that other customers who had upgraded were having problems (not a good sign). The main problem was TempDB filling up and not releasing it’s used space, and the only way to fix was to reboot SQL Server or recycle all of the application’s services. This was a problem that the application’s developers had not been able to figure out, so of course I was lured by the challenge of figuring out a tough problem.
The first problem was that the vendor could not see TempDB filling up. They only saw the files growing. They confirmed that they were monitoring the tempdb space used by clicking on the properties in the SSMS object explorer and looking at the space available in the General tab.
I found that this uses the sp_spaceused system stored procedure to find these values.
In this MSDN article, Microsoft explains that the unallocated_space considers only data pages. We can see that it does not consider internal objects by looking at the sys.dm_db_file_space_usage dynamic management view. This view is specific to TempDB and shows how much space is being used by each different TempDB object – Version Store, User Object, and Internal Object.
By looking at this screenshot, you can already see that far more than 1 GB of space is being used. Each page hold 8KB of data and there are 520,440 pages being used by internal objects – that’s 4.16 GB being used. This explains why the support staff was not able to see TempDB filling up. I had already set up monitoring on our system to track this as well, and I was able to pull up the following graph:
So I was able to establish that TempDB was filling up with Internal Objects – now the question was why. Microsoft explains here that the Internal Objects are created by the database engine and are used for, among other things, storing intermediate results for spools or sorting, as well as cursor results. Tempdb usually deallocates these objects after they are finished being used, but that is often triggered by a connection to the database being closed. So let’s look at the connections to SQL Server by the third party application. The app uses a connection pool which aims to reduce connection overhead by leaving connections open for a certain period of time. I created a job to look at our connections once every five minutes to see how long they’d been open and how much internal object had been allocated and deallocated. I did this through the use of the stored procedure sp_who2 joined to the dmv’s sys.dm_exec_sessions, sys.dm_db_task_space_usage, and sys.dm_db_session_space_usage. Here is the SQL statement that I put together:
CREATE TABLE #sp_who2 ( SPID INT , Status VARCHAR(1000) NULL , Login SYSNAME NULL , HostName SYSNAME NULL , BlkBy SYSNAME NULL , DBName SYSNAME NULL , Command VARCHAR(1000) NULL , CPUTime INT NULL , DiskIO INT NULL , LastBatch VARCHAR(1000) NULL , ProgramName VARCHAR(1000) NULL , SPID2 INT , RequestID INT NULL , rundate DATETIME DEFAULT ( GETDATE() ) ) INSERT INTO #sp_who2 ( SPID , Status , Login , HostName , BlkBy , DBName , Command , CPUTime , DiskIO , LastBatch , ProgramName , SPID2 , RequestID ) EXEC sp_who2 CREATE TABLE #DBCCOutput ( EventType NVARCHAR(100) , Parameters INT , EventInfo VARCHAR(MAX) ) CREATE TABLE #SQL ( spid INT , EventType NVARCHAR(100) , Parameters INT , EventInfo VARCHAR(MAX) ) DECLARE @max INT; DECLARE @i INT; SELECT @max = MAX(session_id) FROM sys.dm_exec_sessions SET @i = 51 WHILE @i <= @max BEGIN IF EXISTS ( SELECT session_id FROM sys.dm_exec_sessions WHERE session_id = @i ) INSERT #DBCCOutput EXEC ( 'DBCC INPUTBUFFER (' + @i + ')' ) INSERT INTO #SQL ( spid , EventType , Parameters , EventInfo ) SELECT @i , EventType , Parameters , EventInfo FROM #DBCCOutput; TRUNCATE TABLE #DBCCOutput; SET @i = @i + 1 END; WITH all_task_usage AS ( SELECT session_id , SUM(internal_objects_alloc_page_count) AS task_internal_objects_alloc_page_count , SUM(internal_objects_dealloc_page_count) AS task_internal_objects_dealloc_page_count FROM sys.dm_db_task_space_usage GROUP BY session_id ), InternalObjectPages AS ( SELECT R1.session_id , R1.internal_objects_alloc_page_count + R2.task_internal_objects_alloc_page_count AS session_internal_objects_alloc_page_count , R1.internal_objects_dealloc_page_count + R2.task_internal_objects_dealloc_page_count AS session_internal_objects_dealloc_page_count FROM sys.dm_db_session_space_usage AS R1 INNER JOIN all_task_usage AS R2 ON R1.session_id = R2.session_id WHERE R1.internal_objects_alloc_page_count + R2.task_internal_objects_alloc_page_count <> 0 OR R1.internal_objects_dealloc_page_count + R2.task_internal_objects_dealloc_page_count <> 0 ) SELECT sp_sql.spid , ses.login_time , ses.last_request_start_time , ses.last_request_end_time , sp_sql.EventInfo , spwho2.Login , spwho2.HostName , spwho2.DBName , InternalObjectPages.session_internal_objects_alloc_page_count , InternalObjjectPages.session_internal_objects_dealloc_page_count , ses.host_process_id, GETDATE() AS CaptureDate FROM InternalObjectPages INNER JOIN #sp_who2 spwho2 ON spwho2.SPID = InternalObjectPages.session_id INNER JOIN #SQL sp_sql ON sp_sql.spid = spwho2.SPID INNER JOIN sys.dm_exec_sessions ses ON ses.session_id = InternalObjectPages.session_id; DROP TABLE #SQL DROP TABLE #DBCCOutput DROP TABLE #sp_who2
And here are the results:
Note that these are all connections from one of the application servers to the database using the app’s SQL login. The EventInfo field is the last SQL that was run. If you compare the login time with the last request start and end times, you can see some connections have been open a long time. In the case of spid 182, the connection had been open for a day and a half but it had not be doing anything for the last 14 hours. You can also see that there were far more pages overall that had been allocated than had been deallocated. Pages can be deallocated directly by the code, or when the connection is closed. It appears the application’s code was not properly deallocating the internal object, and since the connections were being reused, the internal object was stuck indefinitely in a state of allocation, even though it was not really being used anymore.
I was able to prove this by doing a reboot of the Host Application2, then running the SQL statment again:
When comparing this screenshot to the one directly above, you can see that all the internal objects that were allocated by Application2 (primarily spids 182,74,242,234…) were now gone. The connections closed and the internal objects were immediately deallocated.
Here’s my overall TempDB space usage after the reboot:
At this point I had proved that these connections were causing TempDB to blow up, but what exactly was sitting in TempDB taking up space? Look out for Part2 to see how I looked in even more detail to figure out what was really happening.