Steve Stedman’s Corruption Challenge #3

This Saturday is the fishing opener here in Wisconsin.  The DNR designates some time in spring when the only fishing allowed on most inland waters is panfishing.  This rule was created to protect the fish, since most species spawn in the spring.  I can’t wait to get out there on Saturday and try to catch some opening day fish!  This is my first opener owning my own boat, so I can certainly say I’ve never been looking forward to a fishing opener this much.  Hopefully my next post will have a picture or two of a beautiful 2015 muskie.

SQL Server blogger Steve Stedman recently started a weekly database corruption challenge. The challenges each involve the simulation of some sort of corruption or database failure, and we are tasked with recovering without losing any data. I missed the first two weeks, but decided to spend a bit of time last Saturday morning working on the third challenge. Here is the scenario Steve set up:

Here is the timeline as things occurred

At 2:53pm a full backup was performed. After that full backup, users continued to do work and the database continued to grow.

At 2:54pm a transaction log backup was performed. Followed by more changes to the database.

At 3:01pm another transaction log backup was performed. Followed by more changes to the database.

At 3:12 another transaction log backup was performed. Followed by more changes to the database.

At 3:19 a catastrophic failure occurred causing the operating system running SQL Server to reboot. Let’s blame it on a power surge, or maybe it was just a DBA who accidently bumped the power cord. I like the power surge idea, so that I don’t get blamed for kicking the power cord.

After the server rebooted, several other databases on this SQL Server recovered just fine, however the CorruptionChallenge3 database would not come on line. After examining the SQL Server data directory, it was discovered that the single .mdf file for this database was gone, missing, just not there. The only thing that remained on this SQL Server was the .ldf file for this database.

[Image: 20150429 corruption challenge db recovering]

Steve included five files needed to recover all the data – the last full backup, the three transaction log backups that were taken between the last full backup and the disaster event, and the log file.

[Image: 20150429 corruption challenge files given]

After thinking about the problem briefly, I formulated my recovery plan. I would restore the full backup and each of the transaction log backups, which would bring the database current through the time of the last log backup. Then I would detach the database, swap in the log file Steve provided, and take another log backup. I could then use this new log backup to bring the database online with all transactions accounted for. The first part went exactly as planned:

USE [master]
RESTORE DATABASE [CorruptionChallenge3] FROM DISK = N'\\server\Gugg\CorruptionChallenge3_Full.bak'
    WITH FILE = 1, MOVE N'CorruptionChallenge3' TO N'X:\SQLData\CorruptionChallenge3.mdf',
    MOVE N'CorruptionChallenge3_log' TO N'Y:\SQLLogs\CorruptionChallenge3_log.LDF',
    NORECOVERY, NOUNLOAD, STATS = 5;
RESTORE LOG [CorruptionChallenge3] FROM DISK = N'\\server\Gugg\TransLog_CorruptionChallenge30.trn'
    WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 5;
RESTORE LOG [CorruptionChallenge3] FROM DISK = N'\\server\Gugg\TransLog_CorruptionChallenge31.trn'
    WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 5;
RESTORE LOG [CorruptionChallenge3] FROM DISK = N'\\server\Gugg\TransLog_CorruptionChallenge32.trn'
    WITH FILE = 1, RECOVERY, NOUNLOAD, STATS = 5;
EXEC master.dbo.sp_detach_db @dbname = N'CorruptionChallenge3';
GO

However, when I tried to re-attach the database after replacing the log file, I got the following message:
One or more files do not match the primary file of the database. If you are attempting to attach a database, retry the operation with the correct files. If this is an existing database, the file may be corrupted and should be restored from a backup.
Log file ‘R:\SQLLogs\CorruptionChallenge3_log.LDF’ does not match the primary file. It may be from a different database or the log may have been rebuilt previously.
At this point I remembered some Oracle training I’d done regarding Log Sequence Numbers (LSNs). When attaching a database, the LSN in the log has to match the LSN in the data file. I was stuck: I could not think of any way to get the data out of the log file without taking a backup of it, and I couldn’t take a backup without first attaching it. I thought about it for another 15 minutes and then turned my attention to other tasks I had to get done that Saturday.
When Steve revealed the answer on Tuesday, I was disappointed to see how close I had come; I had missed one important detail. In order to take a tail-log backup without a valid data file, you need to create a “dummy” version of the database, set it offline, delete its data file, swap in the real log file, and then set it back online. The database will come up in an error state, but you will still be able to take a tail-log backup. (See this post by Paul Randal for details.) So I should have done this:

USE [master]
GO
--Create a "dummy" database with the same name and logical file names as the lost database
CREATE DATABASE [CorruptionChallenge3] ON 
( NAME = 'CorruptionChallenge3', FILENAME = N'S:\SQLData\CorruptionChallenge3.mdf' ),
( NAME = 'CorruptionChallenge3_log', FILENAME = N'R:\SQLLogs\CorruptionChallenge3_log.LDF' )
GO

ALTER DATABASE CorruptionChallenge3 SET OFFLINE;

--Delete the new mdf file and swap in the ldf file from the crashed server

ALTER DATABASE CorruptionChallenge3 SET ONLINE;

[Image: 20150429 corruption challenge files inaccessible]

BACKUP LOG CorruptionChallenge3 TO DISK = N'\\server\Gugg\Final_Log_Backup.trn' WITH INIT, NO_TRUNCATE;
GO

The key here is to use the NO_TRUNCATE option. With this I am able to take a log backup, and now have one full backup and four log backups, enough to restore the database with no data loss!
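
From there the finish is the same restore sequence as before, but ending with the tail-log backup instead of recovering after the third log backup. A sketch, reusing the file names from above:

--Restore the full backup (WITH REPLACE, NORECOVERY) and the three log backups
--WITH NORECOVERY as earlier, then finish with the tail-log backup:
RESTORE LOG [CorruptionChallenge3]
    FROM DISK = N'\\server\Gugg\Final_Log_Backup.trn'
    WITH RECOVERY;  --brings the database online with all transactions intact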

I have a busy weekend coming up since it’s the fishing opener on Saturday, but I hope to be able to work on Corruption Challenge #4, coming up on Friday.

Profiler and the SQL Server Trace

The use of sonar, or “fish finders”, has had a dramatic impact on the sport of fishing. In the past the area below the surface of the water was generally a big mystery. You could tell the depth in a specific spot by dropping an anchor or letting a jig run down to the bottom, but they gave you little idea of the depth around you. Is the bottom flat or sloping downward? Is it muck, gravel, rock, or weeds? You can even see congregations of bait-fish and the thermocline on most modern day sonar units.

I remember the first time I fished from a boat with a sonar unit. I assumed it’d be incredibly easy to start pulling in fish. The sonar would point them out, I’d drop a bait in front of their faces, and ten minutes later I’d be home with a live-well full of fish. In reality, it doesn’t work like that. While the sonar does give you a much better perspective on the environment below the surface of the water, it still takes a lot of skill and a little luck to read that environment and present some bait near some hungry fish to entice a bite.

In SQL Server, we can use a trace to look at what is going on “beneath the surface”. Up through SQL Server 2008 R2, a trace is one of the best ways to identify long-running transactions, find deadlocks, measure stored procedures’ performance, and figure out what that third-party software application is doing to your database. After 2008 R2, extended events became the preferred method for these tasks, but this post will focus on the trace.

A SQL trace is a collection of events and data gathered through a set of internal SQL Server stored procedures. The events are specific things that happen within SQL Server, and the data is the set of details associated with those events. Through the use of filters we are able to zoom in on only the events we want to see, and only the details about those events we want to record. It is important to use the filters wisely, because a trace that records many events and columns can significantly slow down the system. Care should always be taken when running a trace in a Production environment!
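
Before starting yet another trace on a busy server, it can be worth checking what is already being traced. Server-side traces, including the default trace, are listed in sys.traces; here is a quick look at the columns I find useful:

--List the server-side traces currently defined on the instance
SELECT  id ,
        status ,               --1 = running, 0 = stopped
        is_default ,           --1 for the default trace
        path ,                 --NULL for rowset (Profiler) traces
        start_time ,
        event_count ,
        dropped_event_count
FROM    sys.traces;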

To get into Profiler, choose the option from the Tools menu in SQL Server Management Studio (SSMS):
[Image: 20150422 Trace Profiler in SSMS]
After choosing which Server Instance to run the trace on, you can choose which events and columns (the details) you want to trace. Use the “Show all events” and “Show all columns” checkboxes to display all the events and data you may want to monitor.
[Image: 20150422 Trace Profiler Trace Definition]
Clicking on the Column Filters… button will bring up a dialog box to choose the filters. You can filter on any of the columns that are selected for the trace.

[Image: 20150422 Trace Filters]
After clicking Run, the Trace will begin. One great option for traces that you will be running often is to save your columns, filters, and events as a trace template. After starting the trace, go to File -> Save As -> Trace Template… to save this template. This allows you to simply choose the template next time you go into Profiler, instead of having to once again set up all the columns, filters, and events.
[Image: 20150422 Trace Save Template]
One template I like to set up is for performance tuning a stored procedure. For this template I set a filter so the only login it grabs is mine. Then I choose the Stored Procedures RPC:Completed and TSQL SQL:BatchCompleted events, and make sure I have the CPU, Reads, Writes, and Duration columns selected.
[Image: 20150422 Trace Performance Tuning]
With this setup, I can execute a stored procedure or SQL statement, see the amount of CPU, Reads, Writes, and Duration, then make a change to the SQL and execute it again. This lets me see whether the change resulted in better performance, worse performance, or no change at all.

One last thing I strongly suggest is setting up a lightweight trace that runs every day to capture certain events you may want to look back on for troubleshooting. This may include deadlocks, long blocks, stored procedures that take a long time to run and may need to be tuned, and sysadmin logins. If you define this trace in Profiler, you can get the SQL needed to create it by going to File -> Export -> Script Trace Definition -> For SQL Server 20XX…
[Image: 20150422 Trace Save Definition]
This will give you the SQL needed to set up the trace directly in T-SQL:

/****************************************************/
/* Created by: SQL Server 2012  Profiler          */
/* Date: 04/22/2015  09:32:19 AM         */
/****************************************************/


-- Create a Queue
DECLARE @rc INT
DECLARE @TraceID INT
DECLARE @maxfilesize bigint
SET @maxfilesize = 5 

-- Please replace the text InsertFileNameHere, with an appropriate
-- filename prefixed by a path, e.g., c:\MyFolder\MyTrace. The .trc extension
-- will be appended to the filename automatically. If you are writing from
-- remote server to local drive, please use UNC path and make sure server has
-- write access to your network share

EXEC @rc = sp_trace_create @TraceID output, 0, N'InsertFileNameHere', @maxfilesize, NULL 
IF (@rc != 0) GOTO error

-- Client side File and Table cannot be scripted

-- Set the events (event class 10 = RPC:Completed, 12 = SQL:BatchCompleted)
DECLARE @on bit
SET @on = 1
EXEC sp_trace_setevent @TraceID, 10, 1, @on
EXEC sp_trace_setevent @TraceID, 10, 9, @on
EXEC sp_trace_setevent @TraceID, 10, 2, @on
EXEC sp_trace_setevent @TraceID, 10, 10, @on
EXEC sp_trace_setevent @TraceID, 10, 6, @on
EXEC sp_trace_setevent @TraceID, 10, 8, @on
EXEC sp_trace_setevent @TraceID, 10, 11, @on
EXEC sp_trace_setevent @TraceID, 10, 12, @on
EXEC sp_trace_setevent @TraceID, 10, 13, @on
EXEC sp_trace_setevent @TraceID, 10, 14, @on
EXEC sp_trace_setevent @TraceID, 10, 15, @on
EXEC sp_trace_setevent @TraceID, 10, 16, @on
EXEC sp_trace_setevent @TraceID, 10, 17, @on
EXEC sp_trace_setevent @TraceID, 10, 18, @on
EXEC sp_trace_setevent @TraceID, 10, 35, @on
EXEC sp_trace_setevent @TraceID, 12, 1, @on
EXEC sp_trace_setevent @TraceID, 12, 9, @on
EXEC sp_trace_setevent @TraceID, 12, 11, @on
EXEC sp_trace_setevent @TraceID, 12, 6, @on
EXEC sp_trace_setevent @TraceID, 12, 8, @on
EXEC sp_trace_setevent @TraceID, 12, 10, @on
EXEC sp_trace_setevent @TraceID, 12, 12, @on
EXEC sp_trace_setevent @TraceID, 12, 13, @on
EXEC sp_trace_setevent @TraceID, 12, 14, @on
EXEC sp_trace_setevent @TraceID, 12, 15, @on
EXEC sp_trace_setevent @TraceID, 12, 16, @on
EXEC sp_trace_setevent @TraceID, 12, 17, @on
EXEC sp_trace_setevent @TraceID, 12, 18, @on
EXEC sp_trace_setevent @TraceID, 12, 35, @on


-- Set the Filters (limit to the Gugg NT user and exclude Profiler's own connections)
DECLARE @intfilter INT
DECLARE @bigintfilter bigint

EXEC sp_trace_setfilter @TraceID, 6, 0, 6, N'Gugg'
EXEC sp_trace_setfilter @TraceID, 6, 0, 1, NULL
EXEC sp_trace_setfilter @TraceID, 10, 0, 7, N'SQL Server Profiler - c3623d60-e4fd-44d6-a327-6dbb97563197'
EXEC sp_trace_setfilter @TraceID, 10, 0, 7, N'SQL Server Profiler - a2b0f7dd-70a4-45e0-a918-01abd7447cf5'
-- Set the trace status to start
EXEC sp_trace_setstatus @TraceID, 1

-- display trace id for future references
SELECT TraceID=@TraceID
GOTO finish

error: 
SELECT ErrorCode=@rc

finish: 
GO

Now you can set up a SQL Agent job to start the trace each morning, stop the trace each night, save the trace to a file, and save off any relevant results to specific tables. I like to save deadlocks in their own table, long-running blocks in their own table, and long-running stored procedures in their own table.
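The nightly stop step only takes a couple of calls to sp_trace_setstatus. Here is a minimal sketch; the path filter assumes the trace file naming convention I use below and would need to match yours.

--Find the server-side trace by its output path and shut it down for the night
DECLARE @TraceID INT;

SELECT  @TraceID = id
FROM    sys.traces
WHERE   path LIKE 'C:\SQL\Traces\Trace%';

IF @TraceID IS NOT NULL
BEGIN
    EXEC sp_trace_setstatus @TraceID, 0;    --stop the trace
    EXEC sp_trace_setstatus @TraceID, 2;    --close it and delete its definition from the server
END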
Using this method I also get a log of trace files, so if anyone wants to know if we had any database issues 8 days ago at 11:50 AM, I can pull open the trace file from that day and take a look.
Lastly, I can set up a script to look at the current trace file and look for any unusual results. If I see a stored procedure is taking an average of seven seconds to run while historically it took just two seconds, I can ask myself what has changed with this procedure or the way it is being called as a way to troubleshoot poor performance with the entire system. Here is the script I use. It first provides a summary of the day. It then works with a table where I keep historical results called dbo.HistoricalPerformance to find stored procedures that are performing poorly compared to past performance. It assumes a trace naming convention that includes the date in the file name.

DECLARE @TraceFile NVARCHAR(255);

SELECT  @TraceFile = 'C:\SQL\Traces\Trace_' + CONVERT(CHAR(8), GETDATE(), 112)
        + 'A.trc';


IF ( SELECT OBJECT_ID('tempdb..#trace')
   ) IS NOT NULL
    DROP TABLE #trace;


IF ( SELECT OBJECT_ID('tempdb..#Results')
   ) IS NOT NULL
    DROP TABLE #Results;


SELECT  IDENTITY( INT,1,1 ) AS TraceId ,
        te.name AS EventName ,
        CAST(TEXTData AS VARCHAR(MAX)) AS TEXTData ,
        CPU ,
        Reads ,
        CAST(Duration / 1000000.0 AS DEC(7, 1)) AS Duration ,
        StartTime ,
        ISNULL(DatabaseName, d.name) AS DatabaseName ,
        RowCounts ,
        EndTime ,
        Writes ,
        ClientProcessID ,
        EventClass ,
        ApplicationName ,
        NTUserName ,
        SPID ,
        LoginName ,
        HostName ,
        t.DatabaseID
INTO    #trace
FROM    fn_trace_gettable(@TraceFile, 1) t
        JOIN sys.trace_events te ON t.EventClass = te.trace_event_id
        LEFT JOIN sys.databases d ON t.DatabaseID = d.database_id;


-- show activity --
SELECT  COUNT(*) AS [Total Trace Rows Since 7AM] ,
        [Rows per minute] = CAST(CAST(COUNT(*) AS FLOAT) / DATEDIFF(MINUTE, DATEADD(HOUR, 7, DATEADD(DAY,DATEDIFF(DAY, 0, GETDATE()), 0)), GETDATE()) AS DECIMAL(7, 1)) ,
        [Rows per minute last 5 minutes] = CAST(( SELECT    COUNT(*)
                                                  FROM      #trace
                                                  WHERE     EventName IN ( 'SQL:BatchCompleted','RPC:Completed' )
                                                            AND EndTime > DATEADD(MINUTE,-5, ( SELECT MAX(EndTime) FROM #trace))
                                                ) / 5.0 AS DECIMAL(7, 1)) ,
        [Rows in 60 seconds] = ( SELECT COUNT(*)
                                 FROM   #trace
                                 WHERE  EventName IN ( 'SQL:BatchCompleted',
                                                       'RPC:Completed' )
                                        AND EndTime > DATEADD(SECOND, -60, ( SELECT MAX(EndTime) FROM #trace))
                               ) ,
        [Query duration > 29.9] = SUM(CASE WHEN Duration >= 29.9
                                                AND TEXTData NOT LIKE 'exec sp_execute 1'
                                                AND DatabaseName NOT IN (
                                                'msdb', 'master' ) THEN 1
                                           ELSE 0
                                      END) ,
        [Blocks] = ( SELECT COUNT(*)
                     FROM   #trace
                     WHERE  EventClass = 137
                            AND EndTime > DATEADD(HOUR, 7, DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0))
                   ) ,
        [Deadlocks] = ( SELECT  COUNT(*)
                        FROM    #trace
                        WHERE   EventClass = 148
                      ) ,
        [CPU / Duration] = ( SELECT CAST (SUM(CASE WHEN CPU / 1000. > Duration
                                                   THEN Duration
                                                   ELSE CPU / 1000.
                                              END) / SUM(Duration) AS DECIMAL(5, 2)) 
                             FROM   #trace
                             WHERE  EndTime > DATEADD(HOUR, 7, DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0))
                                    AND CPU > 1000
                                    AND DatabaseName NOT IN ( 'Master', 'msdb' )
                                    AND EventName NOT IN ( 'Exception',
                                                           'Blocked process report' )
                           )
FROM    #trace
WHERE   EndTime > DATEADD(HOUR, 7,
                          DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0))
        AND EventName NOT IN ( 'Exception', 'Blocked process report' )


--Get Procs Slower than normal
;
WITH    Exec1
          AS ( SELECT   DatabaseName ,
                        Duration ,
                        CPU ,
                        Reads ,
                        TEXTData ,
                        PATINDEX('%exec%', TEXTData) AS Loc
               FROM     #trace
               WHERE    ( EventName = 'RPC:Completed'
                          OR ( EventName = 'SQL:BatchCompleted'
                               AND TEXTData LIKE '%exec%'
                             )
                        )
                        AND DatabaseName NOT IN ( 'Distribution', 'msdb' )
                        AND TEXTData NOT LIKE '%sp_executesql%'
                        AND TEXTData NOT LIKE '%sp_prepare%'
                        AND ApplicationName = '.Net SqlClient Data Provider'
             ),
        Exec2
          AS ( SELECT   DatabaseName ,
                        Duration ,
                        CPU ,
                        Reads ,
                        SUBSTRING(TEXTData, Loc + 5, 100) AS TEXTData
               FROM     Exec1
             ),
        Exec3
          AS ( SELECT   DatabaseName ,
                        Duration ,
                        CPU ,
                        Reads ,
                        TEXTData ,
                        PATINDEX('% %', TEXTData) AS Loc
               FROM     Exec2
             ),
        Exec4
          AS ( SELECT   DatabaseName ,
                        Duration ,
                        CPU ,
                        Reads ,
                        CASE WHEN Loc > 0 THEN LEFT(TEXTData, Loc - 1)
                             ELSE TEXTData
                        END AS Procname
               FROM     Exec3
             )
    SELECT  t.DatabaseName ,
            t.Procname AS SlowerThanNormalProc ,
            COUNT(*) AS Cnt ,
            CAST(AVG(t.Duration) AS DECIMAL(5, 1)) AS AvgDuration ,
            MAX(h.AvgDuration) AS HistoricalAvgDuration ,
            CAST(CAST(AVG(t.Duration) AS FLOAT)
            / NULLIF(CAST(MAX(h.AvgDuration) AS FLOAT), 0) AS DECIMAL(8, 2)) AS DurationRatio ,
            AVG(t.CPU) AS AvgCpu ,
            MAX(h.AvgCPU) AS HistoricalAvgCpu ,
            CAST(CAST(AVG(t.CPU) AS FLOAT)
            / NULLIF(CAST(MAX(h.AvgCPU) AS FLOAT), 0) AS DECIMAL(8, 2)) AS CpuRatio ,
            AVG(t.Reads) AS AvgReads ,
            MAX(h.AvgReads) AS HistoricalAvgReads ,
            CAST(CAST(AVG(t.Reads) AS FLOAT)
            / NULLIF(CAST(MAX(h.AvgReads) AS FLOAT), 0) AS DECIMAL(8, 2)) AS ReadsRatio
    INTO    #Results
    FROM    Exec4 t
            LEFT JOIN dbo.HistoricalPerformance h ON t.DatabaseName = h.DatabaseName
                                                     AND t.Procname = h.ProcName
    GROUP BY t.DatabaseName ,
            t.Procname;

SELECT  * ,
        Cnt * AvgCpu AS SumCpu
FROM    #Results
ORDER BY 3 DESC
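
The comparison above relies on a dbo.HistoricalPerformance table that I keep loaded with baseline numbers for each procedure. A minimal sketch of its shape (only the column names are dictated by the query above; the data types are my own choice):

CREATE TABLE dbo.HistoricalPerformance
    (
      DatabaseName VARCHAR(128) NOT NULL ,
      ProcName VARCHAR(255) NOT NULL ,
      AvgDuration DECIMAL(7, 1) NOT NULL ,  --seconds, to match the trace query above
      AvgCPU BIGINT NOT NULL ,              --milliseconds
      AvgReads BIGINT NOT NULL ,
      CONSTRAINT PK_HistoricalPerformance PRIMARY KEY ( DatabaseName, ProcName )
    );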

I hope this information has given you some ideas of ways to use SQL Trace to improve the health of your database.

SQL Server Patching Concerns

A wise man once said “Lures aren’t made to catch fish, they’re made to catch fishermen.” That wise man was my Pop, though I doubt he was the first one to say it. The fact is fishermen, not fish, are the ones buying the lures. This means if you are a lure manufacturer, you want something that fishermen will look at and think “wow, that HAS to work!” I love trying new lures, and think that giving the fish something they haven’t seen before is a good idea. However, you have to be careful that the lure was designed with the intention of catching fish, not just the eye of a naive fisherman. Lures don’t come with a guarantee, and if you buy a dud, too bad so sad.
You would hope an expensive relational database management system would be different, but that isn’t always the case. Last week Microsoft released Service Pack 1 for SQL Server 2014. Eleven hours later, the download was disabled and this message appeared on the Microsoft SQL Server Blog:

Dear SQL Customers,

We have chosen to remove SQL Server 2014 Service Pack 1 (SP1) from the Microsoft Download Center due to a significant installation issue discovered shortly after release. We sincerely apologize for the inconvenience this has caused you.

We will fix the installation issue and release the Service Pack for production use in a few weeks. Although operationally equivalent, only the newly released SP1 will be serviced (security updates, cumulative updates, hotfixes). Therefore, it will be important that you replace the current version of SP1 (12.0.4050.0) with the upcoming one.

In the meantime, you can continue to test and use build 12.0.4050.0. If you are experiencing the issue below, please implement the steps described in the workaround section.

We will communicate further as we approach the release date and we appreciate your patience.

The issue is described like this:

Any SQL Server instance with SSISDB catalog enabled runs into the following installation error during SQL Server 2014 SP1 upgrade. The instance is then in an invalid and unusable state.

Error: 912, Severity: 21, State: 2.

Script level upgrade for database 'master' failed because upgrade step 'SSIS_hotfix_install.sql' encountered error 3602, state 251, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.

Error: 3417, Severity: 21, State: 3.

Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online. SQL Server shutdown has been initiated

Ouch! These errors are not what a DBA wants to see when applying a Service Pack to a SQL Server instance. Microsoft also released a workaround to get the instance back up and running, but with a major release like a Service Pack, which only comes around every year or two, I would have liked to see more extensive testing before the release.
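If you are not sure whether an instance picked up the pulled build, a quick check of SERVERPROPERTY will tell you exactly what it is running (12.0.4050.0 is the version to look for):

--Report the build, patch level, and edition of the current instance
SELECT  SERVERPROPERTY('ProductVersion') AS ProductVersion ,  --e.g. 12.0.4050.0
        SERVERPROPERTY('ProductLevel') AS ProductLevel ,      --RTM, SP1, etc.
        SERVERPROPERTY('Edition') AS Edition;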
The real lesson here is that changes should ALWAYS be made to test instances before they are rolled out to an environment that is actually being used. Secondly, and this is just my opinion, unless there is a significant reason to apply updates immediately, wait a little while. Broken updates tend to be discovered quickly, and waiting a day or a week before applying noncritical patches can save time and stress in the long run, as I’m sure anyone who tried to install SQL Server 2014 SP1 would agree.

Refreshing Data in Test From Prod

Before my Pop and I purchased our current boat last year, the only boat I had access to was a 12 foot fiberglass boat with a trolling motor and oars. Fiberglass boats are rather heavy, and the trolling motor would only push you around for a little while before it became necessary to row around with the oars. I’d spend hours rowing around our 750 acre lake trying to get to the good spots so I could catch fish. Often the wind would be blowing and I’d spend far more time rowing than fishing. It was a lot of work, but I did it happily because I love the sport.

As I gained experience, I realized it was wise to save the trolling motor for when I really needed it. The battery was a finite supply of power, and once I used it all up I was stuck rowing for the rest of the day.

I have an interesting situation at my current workplace where we have test data that needs to be refreshed periodically, either in our test environment or in our production environment. Often refreshing test data means taking a backup of a production database and restoring it over a test version, but in this case I actually need to delete the old test data from every table and insert new test data based on production data. Each table has a key field with an identifier defining which set of data it belongs to – Production, Test in Production, or Test Environment. My process needs to take the Production data and replace the Test in Production or Test Environment data with it. The real challenge is that the massive number of deletes and inserts causes the transaction log to fill up.

When I arrived at the workplace there was an existing process created by the software provider that was able to complete the job, but I observed it was not doing it very efficiently. They had a cursor that looped through each table, built four dynamic statements for each, and then executed those statements. The first statement deleted the Test data from the table. The second statement captured the Production data in a temporary staging table. The third statement updated the temporary staging table so the identifier indicated it was Test data. The fourth statement inserted the Prod data from the temporary staging table as Test data. The cursor would then move on to the next table. This process did a good job of running on its own, but it had to be babysat so the transaction log would not blow up.

The first thing I noticed was that there was an extra set of reads and writes, plus an update (which is itself a delete and a write), for each table. There is no reason to put the Prod data into a temp table, update that table, and then read from the temp table to put the data back into the original table. I rewrote the process so it grabs the prod data and inserts it into the table as test data, substituting the Test identifier in a single step. This cut out the temporary staging table and the extra update.

I’m also not a fan of cursors. SQL is a set-based language, and while looping through the tables is a legitimate reason to use a cursor, I decided to change the process so it would print out the list of DELETE and INSERT statements. I would then take the output, paste it into a new SSMS window, and run it. Doing this gave me greater control and flexibility over which statements are run when, and it also made it easier to debug problems. Since arriving at this workplace we have moved the Test environment onto its own server. This server had a rather small drive for transaction logs, which is usually fine since we don’t run much on the Test environment, but it was not sufficient for this process of reloading test data. I got to a specific table that was so large I couldn’t delete all the data in one statement; I needed to do it in chunks. I was able to tell which table was causing problems right away because I was running the delete statements in blocks of 50 tables at a time. If I had been using the original cursor method, I would have had a much harder time determining which table was the culprit.
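
For that oversized table I ended up deleting the test rows in batches. Here is a minimal sketch of the chunked approach; the table and column names are made up for illustration, and the batch size is something you would tune to your log size.

--Delete test data in batches so each batch commits and the log can be reused between them
WHILE 1 = 1
BEGIN
    DELETE TOP ( 100000 )
    FROM    DatabaseName_Test.cwi.BigDetailTable  --hypothetical large table
    WHERE   BDCOMP = '125';                       --test data identifier

    IF @@ROWCOUNT = 0
        BREAK;

    CHECKPOINT;  --with SIMPLE recovery this lets log space be reclaimed between batches
END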

One other solution I tried was to change the DELETE statements so they only deleted rows that were different, and the INSERT statements so they only inserted rows that were missing. I figured this would significantly reduce the amount of writes to the transaction log. However, I found it slowed down the overall processing far too much to be effective.

Here is the SQL I used to build my INSERT and DELETE statements:

--This script was created by Gugg to more efficiently copy data, for example to refresh
--test data from Prod.  Before running, change your SSMS output to text, and make sure the
--maximum number of characters displayed per column is set to 8192, the max allowed.  It will
--also help to turn off headers and change the output to tab delimited.

--Company Copy
USE DBName

DECLARE @ToCo VARCHAR(3) = '125' --Test Data Identifier
DECLARE @FCo VARCHAR(3) = '820' --Production Data Identifier
DECLARE @ToDB VARCHAR(25) = 'DatabaseName_Test' --Destination Database Name
DECLARE @FromDB VARCHAR(25) = 'DatabaseName' --Source Database Name
DECLARE @ToServer VARCHAR(25) = 'TestServer' --Destination Server
DECLARE @FromServer VARCHAR(25) = 'TestServer' --Source Server, if different the servers must be Linked

--Don't need rowcount
SET NOCOUNT ON

--Make sure ToCo is not 820 - we will never want to overwrite production data
IF @ToCo = '820'
   RAISERROR ('Production data 820 should NOT be overridden',18,1)

--List of all tables and fields with a company column on it.
DECLARE @TablesAndFields TABLE (TID INT IDENTITY(1,1), tablename VARCHAR(255), compcolumnname VARCHAR(255), allcolumnnames VARCHAR(8000))

INSERT INTO @TablesAndFields
   (tablename, compcolumnname, allcolumnnames)
SELECT DISTINCT columns.TABLE_NAME ,columns.column_name, dbo.fnTableColumnsImproved(COLUMNS.TABLE_NAME,columns.TABLE_SCHEMA)
FROM information_schema.COLUMNS columns
INNER JOIN sys.objects tables ON columns.TABLE_NAME = tables.name
WHERE columns.table_schema = 'aaa'  --Only grab from TPS schema
AND columns.column_name <> 'id' --Don't need identity columns
AND columns.COLUMN_NAME LIKE '__COMP' --This is the Identifier naming convention
AND tables.[type] = ( 'U' ) --User tables only 
AND tables.schema_id = 5 --Redundant, only grab from TPS schema
AND tables.[name] NOT IN ( 'SCUSER', 'OEZPCS', 'PRPROC', 'SCUCLS', 'OEITXX', 'DWTRAN' ) --Exclude certain tables

DECLARE @r INT;

SET @r = 1;

--Loop until there are no more tables to go through
WHILE @r >0
BEGIN
   --Set up delete statements
   SELECT TOP 50 'DELETE ToCo FROM ' + @ToServer + '.' + @ToDB + '.cwi.' + tablename + ' ToCo WHERE ToCo.' + compcolumnname + ' = ' + @ToCo + ';' AS SQLDeleteStatement FROM @TablesAndFields ORDER BY tablename;

   --Set up insert statements
   SELECT TOP 50 'INSERT INTO '  + @ToServer + '.' + @ToDB + '.cwi.' + tablename + ' (' + allcolumnnames + ') SELECT ' + REPLACE(allcolumnnames,compcolumnname,@ToCo + ' AS ' + compcolumnname) + ' FROM '  + @FROMServer + '.' + @FROMDB + '.cwi.' + tablename + ' ToCo WHERE ' + compcolumnname + ' = ' + @FCo + ';' AS SQLInsertStatement FROM @TablesAndFields ORDER BY tablename;

   --delete out tables that have been processed
   ;WITH cte AS (SELECT TOP 50 * FROM @TablesAndFields ORDER BY tablename)
   DELETE 
   FROM cte 

   SET @r = @@ROWCOUNT

END

GO

Here is the function that is used to get the columns:

CREATE FUNCTION [dbo].[fnTableColumnsImproved]
   (@tablename VARCHAR(255),@schemaname VARCHAR(255) )

RETURNS VARCHAR(5000)
AS
BEGIN

DECLARE @output VARCHAR(5000)
    SELECT @output = COALESCE(@output + ', ', '') + column_name
    FROM information_schema.COLUMNS
    WHERE TABLE_NAME = @tablename
   AND TABLE_SCHEMA = @schemaname
   AND COLUMN_NAME <> 'ID'

    RETURN @output
END

GO
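
The output of the script is just pairs of statements, one DELETE and one INSERT per table, which end up looking roughly like this (the table and column names below are invented just to show the shape of the generated SQL):

--Example of one generated pair (illustrative object names only)
DELETE ToCo FROM TestServer.DatabaseName_Test.cwi.OEHEAD ToCo WHERE ToCo.OHCOMP = 125;
INSERT INTO TestServer.DatabaseName_Test.cwi.OEHEAD (OHCOMP, OHORDN, OHCUST)
SELECT 125 AS OHCOMP, OHORDN, OHCUST
FROM TestServer.DatabaseName.cwi.OEHEAD ToCo WHERE OHCOMP = 820;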

Using this script I was able to significantly improve a rather cumbersome process that has to be run two to four times per year.

Query Completion Time Estimate

Thunderstorms can be a dangerous temptation to a fisherman. Like King David gazing upon Bathsheba as she bathed, a fisherman looks toward the ominous, incoming clouds and sees opportunity. When a storm rolls into an area, it causes a big drop in barometric pressure. This is the weight of the atmosphere pushing down on everything, including the water. When that weight drops quickly, it causes a huge rise in fish activity. I’ve caught some great fish right before a storm strikes. I can remember being right outside a bay on a lake up north as a storm rolled in. We hadn’t caught much all day, but a wall of thick black clouds was approaching quickly. I managed to hook into a nice sized walleye using a Lindy Rig, but lost it right before I could get it in the net. After that we had to get off the water. The key is to be smart about the situation. The lightning that accompanies these storms can be fatal, and out in the middle of a lake is not a good place to be. Fishing right before a storm is a balancing act of staying long enough to hook into some fish but leaving before conditions become dangerous.
[Image: Storm Northern]
Similarly, as Database Administrators we are often forced to perform tasks within some tight windows. Software updates can involve both schema and data changes that need to be deployed (and possibly rolled back, if the software update is unsuccessful) within a short timeframe. Moving databases from one location to another is another task that may need to be done within a small window of time. One very interesting challenge that I encountered during my first few months as a DBA was a significant change to a very large production table. This table held over a billion rows of detail records and had an integer for the primary key/identity field. With a little less than a month to spare, we noticed we were running out of integer values for that field. We needed to change the field to a bigint, but the change required a significant outage.
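
As an aside, keeping an eye on how close an identity column is to the top of the INT range is a cheap check to schedule. A sketch (the table name is a stand-in for our detail table):

--How much headroom is left before the INT identity value runs out?
SELECT  IDENT_CURRENT('dbo.OrderDetail') AS CurrentIdentityValue ,
        2147483647 - CAST(IDENT_CURRENT('dbo.OrderDetail') AS BIGINT) AS ValuesRemaining;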

It is helpful to have a way to measure progress when performing changes that take a long time. I found this query which has been helpful in estimating the time required to do some longer running tasks. I’ve found it works well for database backups and restores.

SELECT  percent_complete ,
        CONVERT(NUMERIC(10, 2), estimated_completion_time / 1000.0 / 60) AS Estimated_Minutes_Remaining ,
        start_time ,
        session_id ,
        status ,
        command ,
        sql_handle ,
        database_id ,
        blocking_session_id
FROM    sys.dm_exec_requests
WHERE   command <> 'TASK MANAGER'
        AND status <> 'background'
ORDER BY 1 DESC

The SQL Server Procedure Cache

There’s a lot to think about while fishing, and if you’re not giving it your full attention, you’re going to miss some fish. Muskie fishing in particular adds some additional things to focus on that you may not have to think about while fishing for other species of fish. The first is the “figure eight”. At the end of each retrieve you should smoothly transition into the figure eight by reeling until the beginning of your leader is almost to the end of your rod, then start pulling the lure in circles next to the boat. This gives any muskies that are following another chance to decide to bite. You want to make sure you transition into the figure eight in a smooth motion so the movement of the lure appears natural to the following fish. Many muskies are caught on figure eights each year. Another thing to focus on is lure speed. It’s easy to get in a “cruise control” mode and just crank away at a comfortable pace, but lure speed is a key component. Oftentimes the muskie are lethargic and you need to slow down your presentation to elicit a strike. At other times you’ll want to really burn that lure through the water to try and provoke a reaction strike from the fish. In addition to these two details there are dozens of other things that you should be concentrating on if you want to be the most successful fisherman that you can be.

SQL Server has a lot of things to concentrate on too. One way that it tries to work more efficiently is through the use of the Procedure Cache. This is an area in memory where SQL Server holds query execution plans. This prevents SQL Server from having to build a new execution plan every time a query is run. Think of stored procedures that are run several times a minute – not having to create a new execution plan each time saves SQL Server a LOT of overhead.
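
A quick way to get a feel for how much plan reuse is actually happening is to summarize the cache by object type; usecounts shows how many times each plan has been reused:

--Summarize the plan cache: plan counts, total reuse, and memory per object type
SELECT  objtype ,
        COUNT(*) AS PlanCount ,
        SUM(usecounts) AS TotalUseCount ,
        SUM(CAST(size_in_bytes AS BIGINT)) / 1024 / 1024 AS SizeMB
FROM    sys.dm_exec_cached_plans
GROUP BY objtype
ORDER BY SizeMB DESC;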

There are two parts to an execution plan – the query plan and the execution context. The query plan is the data structure used by the query; it contains the instructions and references to the objects being used in the query, such as tables, views, etc. There can be up to two copies of a query plan – one for serial executions and one for parallel executions. No matter the degree of parallelism, there is only one parallel execution plan. The execution context stores the variables or parameters used in the execution of a query.

Aside from the Procedure Cache, the other main component being stored in memory is the Data Buffer. SQL Server pulls data from the disk into memory for faster read and write operations. Knowing this, let’s think about the lifespan of these execution plans. We want to make sure SQL Server has memory available for incoming data requests and incoming query plans. This means there needs to be a mechanism to remove plans that no longer need to be kept in the Procedure Cache. SQL Server has an interesting way to keep more expensive and often used plans in the Procedure Cache while pushing out plans that are cheap or infrequently used.

When an execution plan is generated and inserted into the Procedure Cache, it is assigned a cost based on the cost to compile the plan. Every time the execution plan is used, its cost in the Procedure Cache is reset to that original compilation cost. If SQL Server runs out of free memory, it realizes it needs to push some of the execution plans out of the Procedure Cache. To do this it examines each plan one at a time. If the cost of the plan is greater than zero, the cost is decremented. If the cost of the plan reaches zero, it is eliminated from the Procedure Cache. Once enough plans have been pushed out of the cache, the process stops and the new execution plan is inserted into the Procedure Cache. Additionally, any invalid plans are removed from the cache. I’ll describe some different ways a plan can become invalid a bit later. One exception to this process is ad-hoc plans. No matter the cost to compile an ad-hoc plan, it is always inserted into the Procedure Cache with a cost of zero. This means ad-hoc plans are always the first to be eliminated from the Procedure Cache, and they will usually not be reused.
We can use this query to view the plans and details about them in the Procedure Cache:

SELECT TOP 100
        PC.cacheobjtype ,
        PC.objtype ,
        MCE.original_cost ,
        MCE.current_cost ,
        QS.creation_time ,
        QS.last_execution_time ,
        QS.execution_count ,
        DB_NAME(ST.dbid) AS DatabaseName ,
        ST.TEXT ,
        QP.query_plan
FROM    sys.dm_exec_cached_plans PC
        INNER JOIN sys.dm_os_memory_cache_entries MCE ON MCE.memory_object_address = PC.memory_object_address
        INNER JOIN sys.dm_exec_query_stats QS ON QS.plan_handle = PC.plan_handle
        CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) ST
        CROSS APPLY sys.dm_exec_query_plan(QS.plan_handle) QP

As a warning, the Procedure Cache can contain a LOT of records. It is a good idea to limit your results with the TOP operator.
[Image: 20150331 Proc Cache Query View]
From this screenshot you can see the original cost and current cost are the same. Because there is very little activity on my sandbox environment, there is no memory pressure.
To demonstrate some of the ways an execution plan can become invalid, let’s create some examples. First I’ll build a little table and fill it with some data:

CREATE TABLE dbo.FavoriteLakes
    (
      LakeName VARCHAR(50) ,
      Acreage DECIMAL(8, 2) ,
      Rating INT ,
      PrimarySpecies VARCHAR(50) ,
      SecondarySpecies VARCHAR(50)
    )
INSERT  INTO dbo.FavoriteLakes
        ( LakeName, Acreage, Rating, PrimarySpecies, SecondarySpecies )
VALUES  ( 'Lake Tomahawk', 3462, 4, 'Walleye', 'Smallmouth Bass' ),
        ( 'Big Arbor Vitae', 1070, 3, 'Walleye', 'Muskie' ),
        ( 'Pewaukee', 2437, 3, 'Muskie', 'Largemouth Bass' ),
        ( 'Okauchee', 1210, 4, 'Largemouth Bass', 'Muskie' ),
        ( 'Random Lake', 212, 1, 'Largemouth Bass', 'Muskie' ),
        ( 'Two Sisters Lake', 719, 3, 'Walleye', 'Muskie' )

Next I’ll create a stored procedure to get data from that table.

CREATE PROCEDURE GetMyFavoriteLakes
    @FishSought VARCHAR(50)
AS
    BEGIN
        SELECT  *
        FROM    dbo.FavoriteLakes
        WHERE   PrimarySpecies = @FishSought
                OR SecondarySpecies = @FishSought
        ORDER BY Rating DESC
    END
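
Then I’ll execute it a handful of times with different parameter values so a plan gets cached and reused (the specific values are just examples):

EXEC dbo.GetMyFavoriteLakes @FishSought = 'Muskie';
EXEC dbo.GetMyFavoriteLakes @FishSought = 'Walleye';
EXEC dbo.GetMyFavoriteLakes @FishSought = 'Largemouth Bass';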

I’ll also run the SELECT statement outside of the stored procedure to create an Ad-Hoc plan.

SELECT  *
FROM    dbo.FavoriteLakes
WHERE   PrimarySpecies = 'Muskie'
        OR SecondarySpecies = 'Muskie'
ORDER BY Rating DESC

After executing a few times with a few different values, here’s what I see in my plan cache:
[Image: 20150331 Proc Cache SP versus adhoc]
To see the stored procedure get invalidated, I’m going to use Profiler and choose only the SQL:StmtRecompile event.
[Image: 20150331 Proc Cache Profiler Event]
The event subclass will contain a number that corresponds to these recompile reasons in this chart from Microsoft:
[Image: 20150331 Proc Cache recompile reasons]
I ran a few statements to cause the stored procedure to recompile:

--Add column to change the Schema
ALTER TABLE dbo.FavoriteLakes ADD ID INT IDENTITY(1,1)
EXEC dbo.GetMyFavoriteLakes @FishSought = 'Largemouth Bass'

--Add index to change the schema
CREATE INDEX IX_Fish ON dbo.FavoriteLakes (PrimarySpecies ASC, SecondarySpecies ASC)
EXEC dbo.GetMyFavoriteLakes @FishSought = 'Muskie'

--Update the Statistics on the table
INSERT INTO dbo.FavoriteLakes
        ( LakeName ,
          Acreage ,
          Rating ,
          PrimarySpecies ,
          SecondarySpecies
        )
VALUES  ( 'Bearskin', 403, 3, 'Muskie', 'Smallmouth Bass' ),
        ( 'Boom Lake', 437, 4, 'Muskie', 'Walleye' ),
        ( 'Crescent Lake', 626, 3, 'Muskie', 'Largemouth Bass' ),
        ( 'Moen Lake', 460, 4, 'Muskie', 'Largemouth Bass' ),
        ( 'Hancock Lake', 259, 3, 'Muskie', 'Yellow Perch' ),
        ( 'Lake Thompson', 382, 4, 'Walleye', 'Muskie' )
UPDATE STATISTICS dbo.FavoriteLakes 
EXEC dbo.GetMyFavoriteLakes @FishSought = 'Smallmouth Bass'

I was able to see the recompilations in Profiler:
[Image: 20150331 Proc Cache Recompiles in Profiler]
That is an overview of the Procedure Cache in SQL Server.