There is one week each year that is absolutely terrible for walleye fishing, and that is the week the mayflies hatch. Mayflies are a type of fly similar to a dragonfly. They live in the immature stage for a full year underwater, then burrow into the ground underneath the water to change into the final fly stage. They usually hatch all at once over a one- to two-day span, and emerge from the water for a day or two before reproducing and then dying. When the hatch takes place, walleye feed heavily on the mayflies, and catching a walleye during this period is almost impossible because so much food is available to them. The swarms can be so dense that they sometimes show up on weather radar.
The density of data on a SQL Server page can be increased through the use of compression. Compression in SQL Server is a powerful tool. Not only does it reduce the amount of disk space being used, saving money on disks, but because reading from disk is the slowest part of retrieving data from a database, it will usually speed up your queries as well (fewer reads = less time). The one downside is that compression increases CPU usage by a small amount (I’ve seen about a 3% increase), so if you’re strapped for CPU you’re going to have some decisions to make.
Compression in SQL Server comes in two different flavors: row compression and page compression. Row compression works by storing each fixed-length value in the smallest number of bytes that will accurately hold it. For example, an INT column takes 4 bytes. If I’m storing the number 23 in an INT column, I’m taking up a lot of space by storing zeros. Row compression will store that value in a single byte, as if it were a TINYINT, thereby saving 3 bytes. Most of the savings come from fixed-length columns, so if your table is mostly variable length it may not provide much space savings.
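To see the effect of row compression on a small value stored in an INT column, a minimal sketch like the one below can be run in a scratch database. The two demo table names and the row count are hypothetical, made up for illustration, and the exact sizes reported will vary by version and settings.

```sql
-- Hypothetical demo tables: same column, different compression settings.
CREATE TABLE dbo.RowCompDemo_None (Qty INT NOT NULL) WITH (DATA_COMPRESSION = NONE);
CREATE TABLE dbo.RowCompDemo_Row  (Qty INT NOT NULL) WITH (DATA_COMPRESSION = ROW);

-- Load 100,000 rows holding the value 23 into each table.
-- The cross join of a system view is only used as a convenient row source.
INSERT INTO dbo.RowCompDemo_None (Qty)
SELECT TOP (100000) 23
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

INSERT INTO dbo.RowCompDemo_Row (Qty)
SELECT TOP (100000) 23
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- Compare the "data" column of the two result sets.
EXEC sp_spaceused N'dbo.RowCompDemo_None';
EXEC sp_spaceused N'dbo.RowCompDemo_Row';

-- Clean up the demo tables.
DROP TABLE dbo.RowCompDemo_None;
DROP TABLE dbo.RowCompDemo_Row;
```

The row-compressed table should report a smaller data size, since each 23 no longer needs a full 4 bytes.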
Page compression gives you row compression, but it also adds a compression information structure to each data page. There are two parts to the compression information structure: a column prefix section and a data dictionary section. The column prefix section holds values that make up the beginning of a field and are repeated within a column. For example, if a text field is storing muskie lure manufacturers and it has “Muskie Innovations”, “Muskie Mania”, and “Muskie Mayhem”, the prefix section can hold the word “Muskie ” and then each record just needs to hold a pointer to the prefix and the remainder of the string. The dictionary section acts the same way, but it only stores complete matches. So if I am storing addresses (city, state, zip) and most of them are in Wisconsin, the data dictionary would store “Wisconsin” and each record with that state would just include a pointer to that entry in the data dictionary.
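The muskie manufacturer example can be tried out with a rough sketch along these lines. The table name and row counts are hypothetical. The table is rebuilt after loading because, as far as I know, rows inserted into a heap with ordinary INSERT statements are not page compressed until the heap is rebuilt.

```sql
-- Hypothetical demo table; the name and row counts are illustrative only.
CREATE TABLE dbo.LureMakerDemo (Manufacturer VARCHAR(50) NOT NULL);

-- Load 30,000 copies of each of the three manufacturer names.
WITH n AS (
    SELECT TOP (30000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b
)
INSERT INTO dbo.LureMakerDemo (Manufacturer)
SELECT v.Manufacturer
FROM n
CROSS JOIN (VALUES ('Muskie Innovations'),
                   ('Muskie Mania'),
                   ('Muskie Mayhem')) AS v (Manufacturer);

-- Size with row compression only.
ALTER TABLE dbo.LureMakerDemo REBUILD WITH (DATA_COMPRESSION = ROW);
EXEC sp_spaceused N'dbo.LureMakerDemo';

-- Size with page compression (row + prefix + dictionary).
ALTER TABLE dbo.LureMakerDemo REBUILD WITH (DATA_COMPRESSION = PAGE);
EXEC sp_spaceused N'dbo.LureMakerDemo';

-- Clean up the demo table.
DROP TABLE dbo.LureMakerDemo;
```

Because every value shares the “Muskie ” prefix and there are only three distinct strings, this is close to a best case for page compression; real tables will usually see smaller gains.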
So that’s a pretty good summary of how compression works in SQL Server. Now let’s look at how to set it up. You can set up compression when you are first creating a table with this syntax:
CREATE TABLE [dbo].[BaitPurchases](
    [ID] [INT] IDENTITY(1,1) NOT NULL,
    [Baitname] [VARCHAR](255) NOT NULL,
    [Purchasedate] [DATETIME2](7) NULL,
    [Price] [MONEY] NULL
) ON [PRIMARY]
WITH (DATA_COMPRESSION = PAGE) --OR ROW
GO
You can also add compression to an existing table like this:
ALTER TABLE dbo.FishCaught REBUILD WITH (DATA_COMPRESSION = PAGE) --OR ROW
Microsoft also includes a handy stored procedure for estimating how much of a space savings you will gain by compressing a table. Here is the syntax for that SP:
EXEC sp_estimate_data_compression_savings 'dbo','LiveBait',NULL,NULL,'page'
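The same call can be written with named parameters, which makes it easier to compare ROW and PAGE estimates side by side. This is just a sketch reusing the dbo.LiveBait table from the call above; the procedure works from a sample of the data, so treat the numbers as estimates.

```sql
-- Estimate savings for row compression on every index and partition.
EXEC sys.sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'LiveBait',
     @index_id         = NULL,   -- NULL = all indexes (and the heap, if any)
     @partition_number = NULL,   -- NULL = all partitions
     @data_compression = N'ROW';

-- Same estimate for page compression.
EXEC sys.sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'LiveBait',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = N'PAGE';
```

In each result set, compare the size_with_current_compression_setting(KB) and size_with_requested_compression_setting(KB) columns to see the estimated change per index.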
It is also important to note that nonclustered index compression is not tied to table (clustered index or heap) compression. This means you will need to compress the table and each nonclustered index separately. I’ve created the following SQL to generate the statements needed to compress all of a table’s nonclustered indexes (the table itself still needs the ALTER TABLE ... REBUILD statement shown above):
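-- SYSNAME is wide enough for any object name; VARCHAR(6) would truncate it to 'Averag'.
DECLARE @tablename SYSNAME = 'AverageFishLength';

-- Generates one ALTER INDEX statement per nonclustered index on the table.
-- ONLINE = ON needs an edition that supports online index rebuilds; remove it otherwise.
SELECT 'ALTER INDEX ' + QUOTENAME(ind.name)
     + ' ON ' + QUOTENAME(OBJECT_SCHEMA_NAME(ind.object_id))
     + '.' + QUOTENAME(OBJECT_NAME(ind.object_id))
     + ' REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);'
FROM sys.indexes AS ind
WHERE OBJECT_NAME(ind.object_id) = @tablename
  AND ind.type_desc = 'NONCLUSTERED';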
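After running the generated statements, it can help to confirm what actually got compressed. Here is a minimal sketch that lists the current compression setting for the table and each of its indexes; it assumes the same AverageFishLength table name used above.

```sql
DECLARE @tablename SYSNAME = 'AverageFishLength';

-- One row per partition of the heap/clustered index and each nonclustered index.
SELECT OBJECT_SCHEMA_NAME(p.object_id) AS schema_name,
       OBJECT_NAME(p.object_id)        AS table_name,
       i.name                          AS index_name,   -- NULL for a heap
       i.type_desc,
       p.partition_number,
       p.data_compression_desc,                         -- NONE, ROW, PAGE, ...
       p.rows
FROM sys.partitions AS p
JOIN sys.indexes AS i
  ON i.object_id = p.object_id
 AND i.index_id  = p.index_id
WHERE OBJECT_NAME(p.object_id) = @tablename
ORDER BY i.index_id, p.partition_number;
```

Any row still showing NONE in data_compression_desc is a heap, clustered index, or nonclustered index that has not been rebuilt with compression yet.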
Lastly, I’ll share my results when compressing some key tables for one of our largest databases here. The total database used space went from 84.3 GB to 33.2 GB, a space savings of almost 61%!
Here is a screenshot of the spreadsheet where I tracked my results. I ended up compressing the 48 largest tables in a database with over 500 tables.