There is a rather rough rule that fishermen follow when looking for trophy fish. The rule goes like this: the bigger the lake, the bigger the fish. As I stated, this is a rough rule, and there are lots of exceptions, but in many cases it proves true. There is a very small lake here in southeastern Wisconsin called Random Lake. This lake holds muskie, but very few, if any, above the 40″ mark. Contrast that with Green Bay, a huge body of water where many 50″+ muskie are caught every year. Much of this is the result of the available forage and the density of the target fish, but either way, if I'm looking to catch a true trophy, I'm heading to large water.
At work I came across a situation that made me wonder about backup sizes. I had inherited a ~100 GB database, and after some initial benchmarking of the system, I decided to compress some of the largest tables. I ended up reducing the used space by almost 53 GB, which freed disk space and improved performance (less I/O). However, I was left with large data files (this database has 10 of them) that are about two-thirds empty. I wondered whether having large files, even if they are mostly empty, increases the backup or restore time of the database. I decided to run a quick test on my Sandbox machine to see if there was any real difference.
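For reference, compressing a table looks something like the following. This is a minimal sketch; the table name dbo.BigTable is hypothetical, and you would run the estimate first to see whether compression is worthwhile for your data.

```sql
-- Estimate the space savings before committing to compression.
-- dbo.BigTable is a hypothetical table name; substitute your own.
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'BigTable',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';

-- Rebuild the table with page compression.
ALTER TABLE dbo.BigTable REBUILD WITH (DATA_COMPRESSION = PAGE);
```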
The database started with an initial size of 0.4 GB.
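If you want to check a database's file sizes and free space yourself, a query like this against sys.database_files will do it (sizes are stored in 8 KB pages, hence the division by 128 to get MB):

```sql
-- Report each file's total size and free space in MB.
SELECT name,
       size / 128.0 AS size_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) / 128.0 AS free_mb
FROM sys.database_files;
```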
I used this script to do the backup. The script backs up the database and also runs RESTORE VERIFYONLY to ensure the backup is usable. This is an important step in any backup strategy.
BACKUP DATABASE [Sandbox]
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL12.SQLSANDBOX\MSSQL\Backup\Sandbox.bak'
WITH NOFORMAT, NOINIT, NAME = N'Sandbox-Full Database Backup',
     SKIP, NOREWIND, NOUNLOAD, STATS = 10;
GO

DECLARE @backupSetId AS INT;

SELECT @backupSetId = position
FROM msdb..backupset
WHERE database_name = N'Sandbox'
  AND backup_set_id = (SELECT MAX(backup_set_id)
                       FROM msdb..backupset
                       WHERE database_name = N'Sandbox');

IF @backupSetId IS NULL
BEGIN
    RAISERROR(N'Verify failed. Backup information for database ''Sandbox'' not found.', 16, 1);
END

RESTORE VERIFYONLY
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL12.SQLSANDBOX\MSSQL\Backup\Sandbox.bak'
WITH FILE = @backupSetId, NOUNLOAD, NOREWIND;
GO
The backup took 16 seconds. Here is the backup in the directory. You can see the backup size is 373 MB.
To finish the baseline, I'll do a restore and check how long that takes. You can see in the screenshot below it took 23 seconds.
USE [master];
RESTORE DATABASE [Sandbox]
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL12.SQLSANDBOX\MSSQL\Backup\Sandbox.bak'
WITH FILE = 1, NOUNLOAD, REPLACE, STATS = 5;
GO
Next I grow the database file. Since I'm using SQL Server Express, I'll grow it to the edition's maximum database size, 10 GB.
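Growing the file can be done with ALTER DATABASE. This is a sketch; the logical file name 'Sandbox' is an assumption here, and you can confirm the actual name in sys.database_files.

```sql
USE [master];
GO
-- Grow the primary data file to 10 GB (the per-database cap in SQL Server Express).
-- The logical file name 'Sandbox' is assumed; verify it in sys.database_files.
ALTER DATABASE [Sandbox]
MODIFY FILE (NAME = N'Sandbox', SIZE = 10240MB);
GO
```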
I use the same backup script, and it now takes 17 seconds. That is a very slight difference considering the database file grew by 2400%. We'll call it negligible.
The size of the backup has also grown by a very small amount, up to 383 MB.
The last thing to check is the restore time. This time the restore took 23 seconds, in line with what we saw when restoring over the small database file.
So this experiment suggests that data file size makes little, if any, difference in backup and restore times or in backup size. This makes sense: a backup copies only the allocated pages in the data files, not the empty space. And since shrinking data files almost always causes heavy index fragmentation, unless you absolutely need the disk space back on the drive, there is really no reason to shrink database files, even if they contain a lot of free space.
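If you want to verify the backup sizes from the server rather than the file system, the backup history in msdb records them. A minimal sketch, assuming the database is named Sandbox:

```sql
-- List recent backups of Sandbox with their sizes in MB.
SELECT TOP (10)
       bs.backup_start_date,
       bs.backup_size / 1048576.0 AS backup_size_mb,
       bs.compressed_backup_size / 1048576.0 AS compressed_backup_size_mb
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = N'Sandbox'
ORDER BY bs.backup_start_date DESC;
```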