Reading Transaction Log Backups

Like in business, where advances in technology fuel greater efficiencies, advances in fishing technology make for greater fishing opportunities. Just look at all the different ways to rig live bait. My preferred method for fishing close to, but not right on, the bottom is a Lindy Rig. This rig is great when there isn’t much vegetation – it has a small bobber between a weight and the hook which lifts the bait off the bottom. The weight is usually a walking slip sinker, which has a large hole for the line to flow through so it doesn’t get hung up. The rig’s made so the fish can grab the bait and swim away with it without feeling any resistance. Then when the fish stops to eat the bait, you set the hook. It’s great fun to sit there with your bail open and a finger on the line, trying to feel the fish make some initial nibble on the bait before grabbing it and running. Next comes the excitement as line peels off your reel while the fish runs. Lastly, once it stops you reel in steadily until you feel the weight of the fish on the end, then set the hook sharply. Using a Lindy Rig is a fun and exciting way to fish with live bait.
Last week we ran into a situation where a user had changed a large amount of data in our OMS, but we didn’t know who or why. We weren’t even sure whether the changes had been made through the OMS UI or via a direct database insert. Obviously, having change tracking turned on would have made the investigation easier, but that doesn’t help you after the fact. The change had taken place the previous day, so it was my job to try to figure out who made it. For this, my only option was to read the transaction log backups.

There is an undocumented system table-valued function in SQL Server called fn_dump_dblog which allows you to get some basic information from a transaction log backup. Calling the function is a bit interesting. The first two parameters are the starting and ending LSN, which can usually just be left NULL to read the whole log backup. The third parameter is either DISK or TAPE, indicating where your backup is stored. The fourth parameter is the backup number within the file; for most of us this will be 1, since we’re not using multi-backup media sets. The fifth parameter is the backup file’s path and name. After that there are 63 more parameters that cannot be omitted, so each must be passed explicitly as DEFAULT! So an initial call will look like this:

SELECT *
FROM fn_dump_dblog (NULL, NULL, N'DISK', 1, N'\\NASServer\SQLBackup\INSTANCEServer\Database\LOG\InstanceServer_database_LOG_20180306_120001.trn',
                    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
                    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
                    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
                    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
                    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT)

For my particular issue, I needed to find all INSERTs into a specific table. I used the fn_dump_dblog function in a common table expression, filtering on the LOP_INSERT_ROWS operation and on the PartitionId of the table I was interested in. Then I joined that CTE to another SELECT from the function on the Transaction ID, this time filtering on the LOP_BEGIN_XACT operation to get when each transaction began. I was able to use the SUSER_SNAME function on the Transaction SID to reveal the login that had made the INSERTs:

WITH CTE
AS
       (SELECT [Transaction ID], COUNT(*) AS InsertedRows
       FROM fn_dump_dblog (NULL, NULL, N'DISK', 1, N'\\NASServer\SQLBackup\InstanceServer\DatabaseName\LOG\InstanceServer_DatabaseName_LOG_20180306_120001.trn',
       DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
       DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
       DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
       DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
       DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT)
       WHERE Operation = ('LOP_INSERT_ROWS')
       AND [PartitionId] IN (SELECT sp.partition_id
                            FROM DatabaseName.sys.objects so
                            INNER JOIN DatabaseName.sys.partitions sp ON so.OBJECT_ID = sp.OBJECT_ID
                            WHERE name = 'InsertedOnTable'
                           AND sp.index_id = 1)
       GROUP BY [Transaction ID]
       )
SELECT [Current LSN], a.[Transaction ID], [Transaction Name], [Operation], [Begin Time], SUSER_SNAME([TRANSACTION SID]) AS LoginName, CTE.InsertedRows
FROM fn_dump_dblog (NULL, NULL, N'DISK', 1, N'\\NASServer\SQLBackup\InstanceServer\DatabaseName\LOG\InstanceServer_DatabaseName_LOG_20180306_120001.trn',
   DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
   DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
   DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
   DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
   DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT) AS a
INNER JOIN cte ON a.[Transaction ID] = cte.[Transaction ID]
WHERE Operation = ('LOP_BEGIN_XACT')

[Screenshot: results of the query against the transaction log backup]
Change tracking would give more detail and be easier to query, but at least for now I was able to see that the user making the change was the login the application uses to connect to the database, confirming the change was indeed made through the UI.


Documentation – The ‘Hit-by-a-Bus’ Handbook

A huge part of success in fishing is preparation. Without careful preparation fishing will be slow, you can lose or break equipment, and you can even put yourself in danger. Several of my previous posts have detailed how preparing by looking at weather, depths, fishing reports, etc. can improve your fishing “luck”. Another part of preparation is inspecting your fishing equipment before use. I can remember at least three times where I’ve cast my lure right off the end of my line. It’s great to see a bomb cast really sail out there, but it’s not so great the moment you realize it’s no longer attached to the end of your line. Lastly, let’s not forget trying to unhook a deep hookset on a northern pike when you neglected to make sure your pliers were in your tackle box.
I was considering preparation this week after I read this great post by Monica Rathbun on how to survive as a Lone DBA. One of the great tips she has is to create a ‘Hit-by-a-Bus’ handbook. This would have all the information you need to pass on to the next DBA if you were to suddenly leave the job. With this documentation, the next DBA should be able to step right in without missing a beat. As I considered my current environment, it became clear to me that, were I to suddenly leave my position, the next DBA would have a lot of questions that would take deep, time-consuming investigation to answer. As such, I’ve decided to make a major commitment to improving my documentation. From a DBA perspective, this will include detailed documentation on:

  • Daily/Weekly/Monthly/Ad-Hoc tasks. How to do them and why they are done.
  • SQL Agent Jobs. What are they doing and how to troubleshoot if they break. How urgently do they need to be fixed?
  • SSIS packages. What are they doing at a quick glance? Stepping through an SSIS package to figure out what it is doing can be all kinds of frustrating.
  • Backups. What is our strategy? Where are backups going and how would I address various restore situations?

I believe this documentation will bring value both for myself (because my memory is not great) and for my successor, whenever that day may come.

db_recovery_file_dest_size of N bytes is X% used and has R remaining bytes available

The boat launch can be a tough place, especially if you are a beginner on a busy lake.  People are anxious to get onto and off of the lake.  You don’t want people to be sitting there waiting on you, so the temptation is to go as fast as you can.  This can be a huge mistake for many though.  Even experienced boaters can make rookie mistakes that are downright embarrassing in front of all the other boaters.  Personally, I’ve made two mistakes that were somewhat embarrassing.  My first wasn’t a big deal – I forgot to put in the boat plug.  I’d guess most boaters have made this mistake at one time or another.  Water started gushing in the boat.  Once I saw it I quickly jammed in the plug and started the bilge pump, which drained out the water in about five minutes.  No real harm done.  My other big mistake was while I was pulling off a river.  The launch was gravel and had no dock.  I walked out on the tongue of the trailer, but while I was adjusting the boat I lost my balance.  Unfortunately my shoe got stuck on a bracket so instead of stepping down and getting my leg wet up to my knee, I fell back first and ended up totally submerged in 1 1/2 feet of water.  I’m the kind of guy who is able to laugh at my mistakes, so even though I ended up soaking wet and embarrassed, I still had a good laugh at my own expense.

A DBA can certainly see the lesson in this.  If you go too fast you make mistakes.  You need to be very careful and pay close attention to details.  One way to cover yourself is to maintain backups.  I recently ran into a warning in Oracle Enterprise Manager that said I was using 97% of the recovery area free space.  A quick search found that if the used space got up to 100%, the database would cease to process transactions.

[Screenshot: OEM warning that 97% of the recovery area free space is used]

I wanted to see within the database what my db_recovery_file_dest_size parameter was set to.  I ran a quick query and found that it was set to 350 GB.
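A query along these lines does the trick:

-- Check the configured fast recovery area size
SELECT name, value
FROM   v$parameter
WHERE  name = 'db_recovery_file_dest_size';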

[Screenshot: db_recovery_file_dest_size parameter showing 350 GB]

Next I queried the v$recovery_file_dest view to see the total space, used space, and the directory where I was storing the backupsets.
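Something like this shows the limit, usage, and destination in one row:

-- Fast recovery area limit, usage, and location
SELECT name,
       space_limit / 1024 / 1024 / 1024 AS space_limit_gb,
       space_used / 1024 / 1024 / 1024 AS space_used_gb,
       space_reclaimable / 1024 / 1024 / 1024 AS space_reclaimable_gb,
       number_of_files
FROM   v$recovery_file_dest;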

[Screenshot: v$recovery_file_dest showing the space limit, space used, and destination directory]

I checked the directory and found it quite full of backup files.

[Screenshot: the recovery area directory full of backup files]

My next step was to attempt to delete any expired archivelogs.  Unfortunately, this didn’t do anything:
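In RMAN the attempt was essentially this (a sketch; DELETE EXPIRED only removes entries that a CROSSCHECK has already flagged as missing from disk):

CROSSCHECK ARCHIVELOG ALL;
DELETE EXPIRED ARCHIVELOG ALL;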

[Screenshot: DELETE EXPIRED output showing nothing to delete]

I needed to first back up the archivelogs before I could delete any of them.  So I ran the following command in RMAN:
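# Back up every archived log, then delete the originals to free recovery area space
BACKUP ARCHIVELOG ALL DELETE INPUT;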

[Screenshot: RMAN output from BACKUP ARCHIVELOG ALL DELETE INPUT]

This ran a VERY long time, but eventually succeeded.  This fixed my problem.  After running this command I checked my free space and found I’d gained plenty:

[Screenshot: recovery area usage back down to a comfortable level]

Taking backups is not enough – you need to be aware of what is happening to the backups.  Testing recoverability is also a great idea!
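RMAN can even do a dry run for you, verifying that the backups needed for a restore are intact without actually restoring anything:

# Validate that the current backups could support a full restore
RESTORE DATABASE VALIDATE;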

Challenges with image-based backups

I haven’t done a lot of muskie fishing in the past month.  It’s been a busy summer (as always), and I’ve been sticking to short fishing trips for “lesser” species.  I made a couple trips to Little Cedar this last weekend and both were fun and fairly successful.  The first was with a friend from church and his young son.  They caught several bluegills and the boy had a blast, although he probably enjoyed the snacks more than the fishing.  I caught a couple northern pike, the biggest of which was 19″.  It was almost big enough to keep, but I ended up letting it go.

The next night I took my five-year-old out for an evening of fishing.  We got some leeches and were targeting walleye, but we only ended up with some largemouth bass.  I think we were getting bites from walleye, but we didn’t manage to hook any.  Below is a picture of a nice largemouth bass my son caught on a Skitter Pop.  The fish had a couple spectacular jumps before making it to the boat.

[Photo: Nate’s largemouth bass, August 2016]

Yesterday was one of the most challenging days I’ve had with my current employer.  We’re in the process of changing managed service providers for IT infrastructure, and changes like these are never easy.  The problems yesterday stemmed from the backup solution changes.  Both our old and new MSPs use a disk image-based backup solution, but they are different brands.  In a previous post, I’ve made it clear that I am not a fan of image-based backups for databases.  Most relational database management systems have their own backup mechanisms, and I understand and trust these backups far more than the image backups.  However, different backups can serve different purposes and in this case, I have no choice in the matter.

The problem occurred over the course of the morning.  I got a report that our Oracle database was not available.  Upon checking, I found I could not access the database.  I used Remote Desktop to get onto the server to check the service, and found connectivity spotty.  I recycled the Oracle database service, but that did not fix the issue.  Since the connectivity was slow and spotty, I suspected either a network or disk issue.  The new MSP was able to add some diagnostics to the server and we found no hardware issues with the disks.  There also didn’t appear to be any network connectivity issues.

At this point we attempted a server reboot.  This didn’t help.  Since Oracle database is pretty much the only thing running on the server, we figured the problem either had to be Oracle or the antivirus software.  We disabled both and rebooted the server again.  This time it appeared to help.  We brought Oracle back up but left the antivirus down.  Everything was working fine for about 45 minutes, so we figured there must’ve been some sort of conflict between Oracle and the antivirus software.  To win the worst timing of the day award, I sent out an all-clear email to the users about one minute before the server started breaking again.  We saw the same symptoms as before where we had spotty connectivity and extreme slowness.

After yet another reboot we left both Oracle and the antivirus off.  When slowness occurred yet again, we knew it had to be something else that was running on the server.  A quick check of the Programs list showed that the image-based backups for the previous MSP were still running (as they should have been).  However, that was enough of a clue for a lightbulb moment.  In preparation for the new backup solution, a disk defragmentation had been run the previous night.  The current backup solution did a single full, then continuous incrementals afterward.  The defragmentation had scrambled up all the bytes on the disk, resulting in an incremental with lots of changes to process.  In looking at the backup manager, we found the next incremental would take a full two days to process.  We also discovered that each time we’d had problems, there was an incremental backup scheduled to run.

Fairly confident we’d found our culprit, we disabled the backup and turned Oracle and the antivirus software back on.  As expected, there were no issues for the remainder of the day.

In the end, the combined pressure of the database and the backup solution trying to use the disks proved to be too much for the server to handle.  Through the use of methodical testing, we found the problem and fixed it.

SQL Server Upgrade/Migration

Last week I was feeling burnt out. We’d been having numerous issues pop up at work and it was taking its toll on me. I felt constantly mentally fatigued and knew I needed some time to myself. My wife could sense I needed some relaxation, so she took the kids to her aunt’s house and let me have some solo time. I considered going down to Pewaukee to try and catch some muskies, but the water temp is peaking over 80 degrees, which is the maximum at which you should fish for muskies. Any warmer than that and they are unable to properly recover from the fight and can end up dying, even if it isn’t until a couple hours after they are released. Besides, I needed to just relax, and while muskie fishing is an awesome sport it really isn’t very relaxing. So I headed over to Little Cedar Lake to shoot for some walleye, largemouth bass, or northern pike. While I only managed one medium-sized bass, it was great to just sit back and sip a few beers while enjoying the lake air and beautiful scenery. Since that night I have felt much more refreshed and ready to get back to work.


[Photo: an evening of fishing on Little Cedar Lake]
We have been doing some server upgrades/migrations the last several weeks. We purchased a new server and needed to move our production instance to it. We don’t have any kind of high availability, and were able to tolerate a system outage of four hours or less. The first thing I did was practice extensively using our Test environment. I did the cutover in Test several times, and was confident going into the upgrade that I would be able to complete it successfully.

This upgrade was slightly more complicated in that it involved moving some databases to different drives during the upgrade process.  Not only was I going to move the system files from the C: drive to the data file and log file drives, but I also needed to move tempdb from the G: to the F: drive and move data and log files into differently named folders on the D: and E: drives.
I used the following procedure list to perform the upgrade:

  1. Stop the application services.  This will close existing connections to the user databases.
  2. Stop the data warehouse transfer process.  We use a third party tool to move data from our Production SQL Server instance to our Oracle data warehouse.
  3. Take full copy-only backups of each user database.  After taking each backup, take the database offline to prevent new connections and data changes.
  4. Take full backups of all system databases.  It is important to do them after the user databases so that msdb has records of the latest user database backups.
  5. Move all system databases using the SQL script below.
  6. Create and restore all user databases using the SQL script demonstrated in another blog post (a minimal sketch follows this list).
  7. Shut down the new and old servers.  The system admin renames the old server to a throwaway name, then renames the new server to what the old server used to be.  He also updates the IP of the new server to match the old one.  This way we don’t have to change any application settings.
  8. Bring up Reporting Services on new server.  I will explain the intricacies of this step in a later post.
  9. Begin testing everything – application, SSIS, SSRS, and SQL Agent jobs.
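For step 6, the full script lives in that other post; as a stand-in, here is a minimal sketch of steps 3 and 6 for a single user database (database, file, and path names are hypothetical placeholders):

-- Old server: copy-only full backup, then take the database offline (step 3)
BACKUP DATABASE [UserDB]
TO DISK = N'\\BackupLocation\OldServer SQL Backups\UserDB_FULL_COPYONLY.bak'
WITH COPY_ONLY, INIT, STATS = 5;
ALTER DATABASE [UserDB] SET OFFLINE WITH ROLLBACK IMMEDIATE;

-- New server: restore the backup, relocating the files (step 6)
RESTORE DATABASE [UserDB]
FROM DISK = N'\\BackupLocation\OldServer SQL Backups\UserDB_FULL_COPYONLY.bak'
WITH MOVE N'UserDB' TO N'D:\DataFiles\UserDB.mdf',
MOVE N'UserDB_log' TO N'E:\LogFiles\UserDB_log.ldf',
STATS = 5;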

Here are the SQL scripts used to move the system databases:

USE [master]
RESTORE DATABASE [model] FROM  DISK = N'\\BackupLocation\OldServer SQL Backups\SQLBackupFiles\OldServer\model\FULL\OldServer_model_FULL_20150724_201857.bak' 
WITH  FILE = 1,  
--MOVE N'modeldev' TO N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\model.mdf',  
--MOVE N'modellog' TO N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Log\modellog.ldf',  
NOUNLOAD,  REPLACE,  STATS = 5

GO

ALTER DATABASE model MODIFY FILE (NAME = 'modeldev', FILENAME = 'D:\DataFiles\model.mdf')
ALTER DATABASE model MODIFY FILE (NAME = 'modellog', FILENAME = 'E:\LogFiles\modellog.ldf')

GO

--Stop SQL Server Agent
USE [master]
RESTORE DATABASE [msdb] FROM  DISK = N'\\BackupLocation\OldServer SQL Backups\SQLBackupFiles\OldServer\msdb\FULL\OldServer_msdb_FULL_20150724_201858.bak' 
WITH  FILE = 1,  
--MOVE N'MSDBData' TO N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\MSDBData.mdf',  
--MOVE N'MSDBLog' TO N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Log\MSDBLog.ldf',  
NOUNLOAD,  REPLACE,  STATS = 5

GO

ALTER DATABASE [msdb] MODIFY FILE (NAME = 'MSDBData', FILENAME = 'D:\DataFiles\MSDBData.mdf')
ALTER DATABASE [msdb] MODIFY FILE (NAME = 'MSDBLog', FILENAME = 'E:\LogFiles\MSDBLog.ldf')

GO
--Restart the SQL Service

--Stop the service
--Add -m to SQL Server startup parameters to start in single user mode
--Start the service
--Open a command prompt and connect with sqlcmd, then run:
RESTORE DATABASE [master] 
FROM DISK = N'\\BackupLocation\OldServer SQL Backups\SQLBackupFiles\OldServer\master\FULL\OldServer_master_FULL_20150724_201856.bak' WITH REPLACE;

--Remove -m from SQL Server startup parameters
--Open command prompt
--Go to the SQL Server directory (on this machine, C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn)
--run SQLServr.exe -T3608 -T3609
--		This will just open Master database.
--Run SQLCMD from a different command prompt and run these commands
ALTER DATABASE model MODIFY FILE (NAME = 'modeldev', FILENAME = 'D:\DataFiles\model.mdf')
ALTER DATABASE model MODIFY FILE (NAME = 'modellog', FILENAME = 'E:\LogFiles\modellog.ldf')
ALTER DATABASE msdb MODIFY FILE (NAME = 'MSDBData', FILENAME = 'D:\DataFiles\MSDBData.mdf')
ALTER DATABASE msdb MODIFY FILE (NAME = 'MSDBLog', FILENAME = 'E:\LogFiles\MSDBLog.ldf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev01', FILENAME = 'F:\TempDB\tempdb.mdf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev02', FILENAME = 'F:\TempDB\tempdb2.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev03', FILENAME = 'F:\TempDB\tempdb3.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev04', FILENAME = 'F:\TempDB\tempdb4.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev05', FILENAME = 'F:\TempDB\tempdb5.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev06', FILENAME = 'F:\TempDB\tempdb6.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev07', FILENAME = 'F:\TempDB\tempdb7.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev08', FILENAME = 'F:\TempDB\tempdb8.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev09', FILENAME = 'F:\TempDB\tempdb9.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev10', FILENAME = 'F:\TempDB\tempdb10.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev11', FILENAME = 'F:\TempDB\tempdb11.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev12', FILENAME = 'F:\TempDB\tempdb12.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev13', FILENAME = 'F:\TempDB\tempdb13.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev14', FILENAME = 'F:\TempDB\tempdb14.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev15', FILENAME = 'F:\TempDB\tempdb15.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev16', FILENAME = 'F:\TempDB\tempdb16.ndf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'templog', FILENAME = 'F:\TempDB\templog.ldf')

--End SQL Service, then start the service normally.
--At this point all the system databases should be transferred to the new server.

As I said, the migration went pretty smoothly.  I did have a few hiccups that should have been tested weeks before the migration, but I missed them.  I was able to get them working within a day of the migration, and next time I’ll get them right away.  The first was the database mail, which took me an extra hour to get set up right after the migration.  The second was a linked server.  This was a bit tougher because I had to install the Oracle ODBC driver, and this required a SQL Server reboot.

Steve Stedman’s Corruption Challenge #3

This Saturday is the fishing opener here in Wisconsin.  The DNR designates some time in spring when the only fishing allowed on most inland waters is panfishing.  This rule was created to protect the fish, since most species spawn in the spring.  I can’t wait to get out there on Saturday and try to catch some opening day fish!  This is my first opener owning my own boat, so I can certainly say I’ve never been looking forward to a fishing opener this much.  Hopefully my next post will have a picture or two of a beautiful 2015 muskie.

SQL Server blogger Steve Stedman recently started a weekly database corruption challenge. The challenges each involve the simulation of some sort of corruption or database failure, and we are tasked with recovering without losing any data. I missed the first two weeks, but decided to spend a bit of time last Saturday morning working on the third challenge. Here is the scenario Steve set up:

Here is the timeline as things occurred

At 2:53pm a full backup was performed. After that full backup, users continued to do work and the database continued to grow.

At 2:54pm a transaction log backup was performed. Followed by more changes to the database.

At 3:01pm another transaction log backup was performed. Followed by more changes to the database.

At 3:12 another transaction log backup was performed. Followed by more changes to the database.

At 3:19 a catastrophic failure occurred causing the operating system running SQL Server to reboot. Let’s blame it on a power surge, or maybe it was just a DBA who accidentally bumped the power cord. I like the power surge idea, so that I don’t get blamed for kicking the power cord.

After the server rebooted, several other databases on this SQL Server recovered just fine; however, the CorruptionChallenge3 database would not come online. After examining the SQL Server data directory, it was discovered that the single .mdf file for this database was gone, missing, just not there. The only thing that remained on this SQL Server was the .ldf file for this database.

[Screenshot: the CorruptionChallenge3 database stuck in a recovering state]

Steve included five files needed to recover all the data – the last full backup, the three transaction log backups that were taken between the last full backup and the disaster event, and the log file.

[Screenshot: the five files provided for the challenge]

After thinking about the problem briefly, I formulated my recovery plan. I would restore the full backup and each of the transaction log backups, bringing the database current through the time of the last log backup. Then I would detach the database, swap out the log file with the one provided by Steve, and take another log backup. I could then use this new log backup to bring the database online with all transactions accounted for. The first part went according to plan:

USE [master]
RESTORE DATABASE [CorruptionChallenge3] FROM  DISK = N'\\server\Gugg\CorruptionChallenge3_Full.bak' WITH  FILE = 1,  MOVE N'CorruptionChallenge3' TO N'X:\SQLData\CorruptionChallenge3.mdf',  MOVE N'CorruptionChallenge3_log' TO N'Y:\SQLLogs\CorruptionChallenge3_log.LDF',  NORECOVERY,  NOUNLOAD,  STATS = 5
RESTORE LOG [CorruptionChallenge3] FROM  DISK = N'\\server\Gugg\TransLog_CorruptionChallenge30.trn' WITH  FILE = 1,  NORECOVERY,  NOUNLOAD,  STATS = 5
RESTORE LOG [CorruptionChallenge3] FROM  DISK = N'\\server\Gugg\TransLog_CorruptionChallenge31.trn' WITH  FILE = 1,  NORECOVERY,  NOUNLOAD,  STATS = 5
RESTORE LOG [CorruptionChallenge3] FROM  DISK = N'\\server\Gugg\TransLog_CorruptionChallenge32.trn' WITH  FILE = 1,  RECOVERY,  NOUNLOAD,  STATS = 5
EXEC MASTER.dbo.sp_detach_db @dbname = N'CorruptionChallenge3'
GO

However, when I tried to re-attach the database after replacing the log file, I got the following message:
One or more files do not match the primary file of the database. If you are attempting to attach a database, retry the operation with the correct files. If this is an existing database, the file may be corrupted and should be restored from a backup.
Log file ‘R:\SQLLogs\CorruptionChallenge3_log.LDF’ does not match the primary file. It may be from a different database or the log may have been rebuilt previously.
At this point I remembered some Oracle training I’d done regarding Log Sequence Numbers (LSNs). When attaching a database, the LSN in the log has to match the LSN in the data file. I could not think of any way to get the data out of the log file without taking a backup of it, and I couldn’t take a backup without first attaching it. I thought about it for another 15 minutes and then turned my attention to other tasks I had to get done that Saturday.
When Steve revealed the answer on Tuesday, I was disappointed by how close I had come; I had missed an important detail. To take a tail-log backup without a valid data file, you create a “dummy” version of the database, set it offline, delete the data file, replace the log file, and then when you bring it back online you’ll get an error message, but you will still be able to take a tail-log backup. (See this post by Paul Randal for details.) So I should have done this:

USE [master]
GO
CREATE DATABASE [CorruptionChallenge3] ON 
( NAME = 'CorruptionChallenge3', FILENAME = N'S:\SQLData\CorruptionChallenge3.mdf' ),
( NAME = 'CorruptionChallenge3_log', FILENAME = N'R:\SQLLogs\CorruptionChallenge3_log.LDF' )
GO

ALTER DATABASE CorruptionChallenge3 SET OFFLINE;

--Delete the mdf file and replace the ldf file

ALTER DATABASE CorruptionChallenge3 SET ONLINE

[Screenshot: error showing the database files are inaccessible after setting the database online]

BACKUP LOG CorruptionChallenge3 TO DISK = N'\\server\Gugg\Final_Log_Backup.trn' WITH INIT, NO_TRUNCATE;
GO

The key here is to use the NO_TRUNCATE option. With this I am able to take a log backup, and now have one full backup and four log backups, enough to restore the database with no data loss!

I have a busy weekend coming up since it’s the fishing opener on Saturday, but I hope to be able to work on Corruption Challenge #4, coming up on Friday.

Stopping MSDB bloat due to frequent backups

I’ve always wanted to catch a truly fat fish. In fall many species of fish feed heavily to put on fat for the winter, and when catching them you can see the stomach pushing the belly of the fish out and making them look quite bloated. This is the reason that fishing in the cold wind and rain of late fall is worth the pain and sacrifice.
I recently ran into an issue where I noticed the msdb database on an instance was growing quite bloated. The msdb database contains data related to SQL Agent jobs, Database Mail, SSIS packages, backups, and maintenance plans. In most cases it should be fairly small, but I observed it at close to 4 GB.
[Screenshot: initial msdb size near 4 GB]
To find what was taking all the space I ran this query:

USE msdb

SELECT  OBJECT_NAME(i.OBJECT_ID) AS objectName ,
        i.[name] AS indexName ,
        SUM(a.total_pages) AS totalPages ,
        SUM(a.used_pages) AS usedPages ,
        SUM(a.data_pages) AS dataPages ,
        ( SUM(a.total_pages) * 8 ) / 1024 AS totalSpaceMB ,
        ( SUM(a.used_pages) * 8 ) / 1024 AS usedSpaceMB ,
        ( SUM(a.data_pages) * 8 ) / 1024 AS dataSpaceMB
FROM    sys.indexes i
        INNER JOIN sys.partitions p ON i.OBJECT_ID = p.OBJECT_ID
                                       AND i.index_id = p.index_id
        INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
GROUP BY i.OBJECT_ID ,
        i.index_id ,
        i.[name]
ORDER BY SUM(a.total_pages) DESC ,
        OBJECT_NAME(i.OBJECT_ID)
GO

[Screenshot: the largest space consumers in msdb]
This showed the backupfile table to be taking over half the database space, and some other associated backup tables taking up quite a bit of the rest. Looking in MSDN, I was able to find that the backupset table contains a row for each backup set. A backup set contains the backup from a single, successful backup operation. At the bottom of the MSDN page I found this: “To reduce the number of rows in this table and in other backup and history tables, execute the sp_delete_backuphistory stored procedure.” I followed the hyperlink to the MSDN page describing this stored procedure. The page describes the stored procedure as follows: “Reduces the size of the backup and restore history tables by deleting the entries for backup sets older than the specified date. Additional rows are added to the backup and restore history tables after each backup or restore operation is performed; therefore, we recommend that you periodically execute sp_delete_backuphistory.”
I decided to get a baseline of how many rows are currently in the backupset table, then start deleting rows using this stored procedure. For this instance we never have to look at backups older than a month at the very most, but to be safe I would prefer not to delete backups that are too recent. I decided I wanted to reduce the record count by about 75%, and then monitor growth to see how much data I should actually keep.
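Getting that baseline is a one-liner:

-- Baseline: how many backup sets does msdb currently track?
SELECT COUNT(*) AS backupset_rows
FROM msdb.dbo.backupset;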
Here is the initial rowcount:
[Screenshot: initial backupset rowcount]
Next I ran the stored procedure to delete everything older than January 1st, 2011:

 EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = '1/1/2011'

Then I checked the rowcount again:
[Screenshot: backupset rowcount after deleting history older than 2011]
I ran this several more times, keeping less history each time:

 EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = '1/1/2012'

[Screenshot: backupset rowcount after deleting history older than 2012]

 EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = '1/1/2013'

[Screenshot: backupset rowcount after deleting history older than 2013]
I quickly realized much of this bloat was due to a change in the backup configuration we made late last year. In order to facilitate disaster recovery we had to abandon our native SQL backups (NO!) and go with a third-party image backup system. I didn’t like this because I prefer to be in charge of the backups instead of leaving them to an outside vendor, but our cloud backup/disaster recovery solution was only able to handle image backups, so I had no real option. Since we are busiest late in the year, we ended up having to change the transaction log backups to run every minute. This is the only way the image backup would work without slowing the system, since it requires an I/O freeze when it takes the backup, and each backup captures all changes since the last one. Smaller intervals mean fewer changes and a shorter I/O freeze time.
After trimming the backup history to about three months:
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = '1/1/2015'
[Screenshot: backupset rowcount after trimming to about three months of history]
I was able to trim the rowcount enough that there is significant empty space in the database. Next I set up a SQL job to delete all records older than three months and run it once a month. Now I don’t have to worry about msdb continuing to grow.
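The monthly job step boils down to something like this (a sketch of my cleanup job):

-- Monthly SQL Agent job step: purge backup/restore history older than three months
DECLARE @oldest_date datetime = DATEADD(MONTH, -3, GETDATE());
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @oldest_date;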