How Index Fill Factor affects new rows

Fishing is not what it used to be. I recently began reading the Wisconsin Outdoor News regularly. It has helped me to understand just how pervasive fish stocking is. Without it, the population of fishermen in Wisconsin, which includes me, would quickly overcome the population of fish, leaving our lakes and rivers largely empty. Stocking fish isn’t a bad thing; it allows more people to enjoy fishing that much more. It allows for bigger bag limits and greater participation. But I do believe we should not trick ourselves into thinking that fishing is the same now as it used to be before we settled and tamed our great state. Many of the lakes we enjoy fishing have species of fish that are not native to that body of water. So yes, it’s great that my local lake has a nice population of walleye (most of which have been stocked), but the natural balance between the species of fish in the lake has been artificially disrupted for the purposes of catering to fishermen who prize walleye above all else. Is it fair to fisherman who enjoy fishing other species, or even to the ecosystem of the lake, to maintain this stocking? That’s a tough question to ask. We just don’t know the long-term affects of maintain such an unnatural lack of balance will be.

Fish Stocking.jpg
One thing I like about SQL Server is you can test almost anything, so you can be sure what the long term affects of a change will be. I recently answered a trivia question regarding Index Fill Factor incorrectly, so I decided to test out the question in SQL. The question asked whether fill factor affects newly inserted rows on the table. The Fill Factor of an index controls how much empty space the database engine will attempt to leave on each page for the purposes of minimizing page splits during updates or inserts.

For example, if I have a page that is 100% full and someone updates a varchar(1000) field to go from a value that is using 10 bytes to a value using 250 bytes, SQL Server’s got some work to do. It can’t just fill in the empty space on the page because their is none. It’s going to have to take some rows of data on the page, move it to another page, create a pointer on the original page to the new page, and then perform the data update. This is a lot of extra work and it increases index fragmentation. So if you have a table where you expect many updates, you might intentionally leave some blank space on the page. That way updates that increase used space in a row can simply push the extra data down into the free space without doing a bad page split.
Now, I assumed that when SQL Server inserts new rows onto the end of a table, it will insert those rows with a fill factor equal to the existing fill factor on the index of the table. However, when I submitted that answer to the trivia question, it was marked incorrect. So I decided to set up this test. I create an index with a low fill factor, then insert a bunch of rows on the end. If the overall used space as seen in the sys.dm_db_index_physical_stats view goes up, then we can be confident the new rows are being inserted at close to 100%. If it stays about the same, then we know the new rows are being inserted with the fill factor defined on the index.

First I create a new table for the test:

CREATE TABLE [dbo].[GuggTest_20180411](
   [id] [int] IDENTITY(1,1) NOT NULL,
   [somevarchardata] [varchar](100) NULL,
   [somenumericdata] [decimal](10, 2) NULL,
 CONSTRAINT [PK_GuggTest_20180411] PRIMARY KEY CLUSTERED 
(
   [id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 20) ON [PRIMARY]
) ON [PRIMARY]
GO

Next I’m going to fill it in with some data:

DECLARE @i INT = 0;

WHILE @i < 100000
BEGIN
    INSERT INTO dbo.GuggTest_20180411
    (
        somevarchardata
      , somenumericdata
    )
    SELECT CONVERT(VARCHAR(100), NEWID())
         , ROUND(RAND(CHECKSUM(NEWID())) * (100000000), 2);

    SELECT @i = @i + 1;
END;

Now I’ll run the system dmv to see the existing page space usage:

20180411 Index Usage Before Rebuild

The leaf level is where we expect to see the page space used around 20%. To see the leaf level we look for the lowest index level in the DMV. So we can see even now the average space used is close to 100% despite having created the index with a fill factor of 20%.  To get it down to 20 %, I’ll rebuild the index:

ALTER INDEX [PK_GuggTest_20180411] ON [dbo].[GuggTest_20180411] REBUILD PARTITION = ALL WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 20)

Now I’ll rerun the DMV to see the space used, which should be closer to 20%.

20180411 Index Usage After Rebuild.png

So we can see the average space used is now just over 20%.  (As an aside, we can also see the page count has ballooned to 3704 from 770.  This shows the danger of reducing the fill factor.  Selecting all rows from this table just became 481% more expensive!)

I’ll now insert another 10,000 rows into this table.  If the new rows go in with a fill factor of 20%, the average page space used metric should remain around 20%.  If they go in at 100%, I expect the average space used will go up significantly.

DECLARE @i INT = 0;

WHILE @i < 100000
BEGIN
    INSERT INTO dbo.GuggTest_20180411
    (
        somevarchardata
      , somenumericdata
    )
    SELECT CONVERT(VARCHAR(100), NEWID())
         , ROUND(RAND(CHECKSUM(NEWID())) * (100000000), 2);

    SELECT @i = @i + 1;
END;

And now I check the DMV one last time:

20180411 Index Usage After Insert.png

We can see the space used increased to 34%, indicating the rows are not inserted with the specified fill factor as I had assumed.

Advertisements