Evaluating the use of Sparse Columns

A few summers ago I was fishing a very clear lake up in Oneida county, WI for muskie. Oneida county and the county just to the north of it, Vilas county, offer some of the best muskie fishing in the Midwest. It was a hot, calm day with not a cloud in the sky. These types of conditions are usually not very good for muskie – they are ambush predators and prefer low light conditions or stained water so they can close in on their prey without being noticed until the last second. I was fishing a deep weedline with a deep-running crankbait. As I was reeling in I felt the dull tug of weed getting caught in the hooks of the lure. I gave a sharp tug with the rod to try and free the lure, but the weeds would not let go. As my lure limped back to the boat, I was surprised to see a good sized muskie following the lure even though it was surrounded by the weeds it was dragging. One fun thing about muskie is their willingness to attack lures right next to the boat, and for this reason the seasoned muskie angler will swim the bait in a circle a few times at the end of each retrieve (commonly called a figure-eight). Although I was sure the fish wouldn’t hit the fouled lure, I still went into my standard figure-eight. I was surprised and completely unprepared as the muskie shot forward and grabbed the lure, still full of weeds, about a quarter way through my first circle. I had a split second decision on which way to try the hookset, and I made the wrong one. I tried setting the hood back against the direction the fish was moving, but I ended up striking the side of the boat with my rod, and the fish dropped the lure and swam off. I had made a wrong decision, and had to live with losing that fish. Fortunately when deciding whether or not to use Sparse columns in SQL Server, we don’t have a split second decision – we can experiment to see whether they are a good fit for our particular tables.
Previously I had posted on the amount of space used by fixed length fields, especially when storing the NULL value. One way to get around this “limitation” of these data types is to use Sparse columns. Microsoft’s Developer Network defines Sparse columns as ordinary columns that have an optimized storage for NULL values. Sparse columns reduce the space needed for NULL values, but the tradeoff is the space used and cost to retrieve the non-NULL values. Microsoft suggests using sparse columns when the space saved is at least 20 to 40 percent. There are also a couple of important restrictions on Sparse columns:

  • Obviously, the column has to be nullable (no identity or primary key)
  • These data types are not allowed to use sparse columns: text, ntext, image, timestamp, user-defined, geometry, or geography
  • The columns can’t have a default value or be a computed column
  • Sparse columns are incompatible with compression. Sparse columns can’t be added to compressed tables and tables with existing sparse columns can’t be compressed

That last point is pretty important because you can often get significant space savings with compression. The main downside of compression is the extra CPU usage required when working with these tables, but that’s another discussion altogether.
When SQL Server is changing a column from sparse to nonsparse or vice versa, that table is unavailable (locked). This is because SQL Server follows this procedure to make the change:

  1. Adds a new column to the table with the new storage size and format.
  2. Copies the data from the old row to the new
  3. Removes the old column
  4. Rebuilds the table or clustered index to reclaim the space used by the old column.

It’s important to remember there is a row size limit in SQL Server that is 8060 bytes for tables without Sparse columns and 8018 bytes for tables with Sparse columns. If changing a column from nonsparse to Sparse violates these limits, the change will fail.
From a storage perspective, Sparse columns work by not storing any data for columns that have been marked sparse and are NULL. However, in order to store values in Sparse columns that aren’t NULL, SQL server has to not only store the value, but also what column is holding that value. This causes the size required for each non-NULL value to increase by 4 to 8 bytes (for example, non-NULL integers increase from 4 to 8 bytes).

Now that I’ve hopefully provided a little background on what Sparse columns are and how they work, let’s look at an example. I have created a table with 1 million records in an integer field.


CREATE TABLE dbo.SparseTest
    (
      ID INT IDENTITY(1, 1) ,
      IntegerField INT ,
      CONSTRAINT PK_SparseTest_ID PRIMARY KEY CLUSTERED ( ID ASC )
        WITH ( PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
               IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
               ALLOW_PAGE_LOCKS = ON ) ON [PRIMARY]
    )
ON  [PRIMARY]
DECLARE @int INT = 1
WHILE @int <= 1000000
    BEGIN
        INSERT  INTO dbo.SparseTest
                ( IntegerField )
        VALUES  ( CHECKSUM(NEWID()) )
        SELECT  @int = @int + 1
    END

This table is full of values – none are NULL:

20141223 No Sparse No Null

The initial size of this table is about 16 MB:

20141223 No Sparse Size

Next I update the table definition to make IntegerField a Sparse column:

ALTER TABLE dbo.SparseTest ALTER COLUMN IntegerField INT SPARSE
ALTER TABLE dbo.SparseTest REBUILD

I found I did have to rebuild the table before I was able to see the change in used space.  Now our table with the Sparse column with no NULL values has ballooned to 28 MB!

20141223 Sparse Size

I change 25% of the values in the Sparse column to NULL with this statement:

UPDATE  dbo.SparseTest
SET     IntegerField = NULL
WHERE   ID % 4 = 0

20141223 Sparse Size 25 Percent NULL

The size has shrunk to approximately 4 MB, to 24 MB.  Next I change another 25% of the values to NULL.  Now half of the values in the table are NULL.

UPDATE  dbo.SparseTest
SET     IntegerField = NULL
WHERE   ID % 4 = 1

The size has shrunk another 4 MB; we are now down to about 20 MB for the entire table.

20141223 Sparse Size 50 Percent NULL

I update another 25% to NULL so now 75% of the values in the Sparse column are NULL.

UPDATE  dbo.SparseTest
SET     IntegerField = NULL
WHERE   ID % 4 = 2

20141223 Sparse Size 75 Percent NULL

We now see the table size is down to 16 MB, the same size as when the column was nonSparse.  One last update puts the table down to 12.5 MB.

UPDATE  dbo.SparseTest
SET     IntegerField = NULL

20141223 Sparse Size 100 Percent NULL

So this demonstrates the size differences of the entire table based on the amount of NULLs in a Sparse column.  The real lesson here is to experiment with Sparse columns on your tables if you have many NULL values, but understand that you need to check to make sure making columns Sparse does indeed reduce the amount of space used.  And don’t forget, this isn’t a one time decision.  The percentage of total values in a table that are NULL can change often; don’t be afraid to track the NULL percentage over time to make sure you are keeping your table optimized.

Advertisements

Fixed Length Fields and the NULL Value

Before each muskie fishing trip I go down into my basement and choose which lures I want to bring with.  You see, muksy lures are BIG, and I have many more lures than fit inside my tackle box.  I need to plan which lures will probably be most effective depending on where I will be fishing and under what conditions (weather, season, water clarity, etc.).  I may choose to use every slot in my tackle box, or to just grab a few proven lures and leave the rest of the slots empty.  However, whether I choose to completely fill my tackle box, or just bring a few lures that I really trust, my tackle box is still going to take up the same amount of space in my boat.  Fixed length fields in SQL Server work the same way.  No matter what value (or lack of value) is in them, they take up the same amount of space. Mostly Empty Tackly Box

This topic is a fairly simple one but it resonates with me because it was one of the first situations I ran into as an early DBA where I had to disagree with one of the “more seasoned” DBAs.  I put more seasoned in quotes because the guy (I’ll call him Larry) actually had only been a DBA for about ten months more than me.  In discussing table design, Larry made the point that NULL values don’t take up much space, usually a single bit that indicates the field is NULL.  I tried to point out that this was only true for variable length fields – fixed length fields always take the same amount of space regardless of what value they hold.  Larry disagreed and shot me down abruptly.  I tried to argue my point one more time, then I let it go.  Afterward I set up an experiment to prove to myself that I was correct.  To set this up, I created a table with both fixed length and variable length fields:

CREATE TABLE [dbo].[NullsUsedSpace]
    (
      [ID] [INT] NOT NULL IDENTITY(1, 1) ,
      [IntegerField] [INT] NULL ,
      [VarcharField] [VARCHAR](36) NULL ,
      [CharField] [CHAR](36) NULL
        CONSTRAINT [PK_NullsUsedSpace]
        PRIMARY KEY CLUSTERED ( [ID] ASC )
        WITH ( PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
               IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
               ALLOW_PAGE_LOCKS = ON ) ON [PRIMARY]
    )
ON  [PRIMARY]

From Microsoft here and here, we know that Integer and Char are fixed length fields while Varchar is a variable length field.  Next I fill this table with a set of various values for each data type:

SELECT TOP 100000
        IDENTITY ( INT,1,1 ) IntegerField ,
        NEWID() VARCHARField ,
        NEWID() CHARField
INTO    #temp
FROM    MASTER.dbo.syscolumns sc1 ,
        MASTER.dbo.syscolumns sc2

INSERT  INTO dbo.NullsUsedSpace
        ( IntegerField ,
          VARCHARField ,
          CHARField
        )
        SELECT  IntegerField ,
                VARCHARField ,
                CHARField
        FROM    #temp;

DROP TABLE #temp;

Here is a screenshot of the data returned:

SELECT  *
FROM    dbo.NullsUsedSpace;

20141222 Nullable Table

I check out the size of the table and find it at almost 9 MB.

20141222 Initial Space

Next we update the table and set all CharField and IntegerField to NULL.

UPDATE  dbo.NullsUsedSpace
SET     CHARField = NULL ,
        IntegerField = NULL;

Depending on the size of the table, you may need to rebuild the index to see size changes:

ALTER INDEX PK_NullsUsedSpace ON dbo.NullsUsedSPace
REBUILD;

If fixed length fields didn’t need the same amount of space for NULLs as non-NULL values, I’d expect the space used to go down significantly.  Instead, we see it hasn’t changed:

20141222 Nullable Fixed

However, if we set the variable length fields to NULL, we do see the space used go down to just over 5 MB:

UPDATE  dbo.NullsUsedSpace
SET     VARCHARField = NULL;

ALTER INDEX PK_NullsUsedSpace ON dbo.NullsUsedSPace
REBUILD;

20141222 Nullable Variable

For one last double check, let’s add the integer and char values back into the table to verify the size is still the same.

UPDATE  dbo.NullsUsedSpace
SET     IntegerField = RAND() * 100000 ,
        CHARField = 'Fishing Rocks!';

ALTER INDEX PK_NullsUsedSpace ON dbo.NullsUsedSPace
REBUILD;

20141222 Nullable Variable 2

So there you have it!  Again, this was a pretty simple concept.  This demonstration proved that NULLs take just as much space in a fixed length field as any other value does, just like an empty tackle box takes up just as much space in a boat as a full one.