Index a Computed Column to Improve Query Performance

This summer started out with a below-average fishing volume. The birth of my fourth child in June used up much of the time I would normally spend fishing. As the summer grew late, I was able to get out a bit more, but it’s mostly been a small lake near my house. I’ve only gotten down to Pewaukee to muskie fish once, though I am happy to say I did have some luck that outing. The fish came in hot behind my lure, but veered off when it got to the side of the boat. As I began the figure eight, I turned to my brother-in-law to let him know I saw a fish and to estimate the size. While I was looking at him the fish came back and struck. After a brief but spirited fight, we netted the fish, a 40.5″ muskie. She was very healthy and I feel blessed to have gotten one my first time out this summer.

20180728 Pewaukee Musky

Doing a figure-eight while musky fishing is something you should do on every cast, just like making sure every query you’re executing is using an index as expected.  In some cases though, indexes can be hard to use.  I wrote an earlier post about SARGability – making sure your indexes can be used.  As I explained in that post, some things, such as functions on the columns being searched, can prevent SQL Server from doing a seek on the index.

I recently was facing a situation like that while loading a data warehouse table.  These loads use a large number of row and can be very time consuming, so it is important to make sure the performance is as efficient as possible.  In this case, I was matching customers in one table to customers in another by comparing their addresses and cities.  The relevant part of the join was:

REPLACE(LEFT(Table1.StreetAddress, 10) + LEFT(Table1.City, 5), ' ', '') = REPLACE(LEFT(Table2.StreetAddress, 10) + LEFT(Table2.City, 5), ' ', '')

Both the REPLACE and LEFT functions prevent the index from being seeked.  An initial test run of the join on the two big tables came in at about 3 hours, much longer than I needed to make it.

To solve the problem, I created a view that added a computed column to hold the match code:

CREATE VIEW [cwi].[Customer Sold To Indexed Address Match]
WITH SCHEMABINDING

AS
SELECT dncomp AS [CMP Company]
     , dnjynb AS [CST Customer #]
    , [...] AS [Many other columns]
    , REPLACE(LEFT(dnl1tx, 10) + LEFT(dnl5tx, 5), ' ', '') AS AddressMatch
FROM vendorschema.vendortable;
GO

The important thing to note here is the SCHEMABINDING.  This tells SQL Server that the view is dependent upon the object (in this case, vendorschema.vendortable) staying the same.  If we were to go into vendorschema.vendortable and attempt to drop the dnl1tx column, SQL Server would give me an error informing me that the cwi.[Customer Sold To Indexed Address Match] view depends upon that object and the drop would fail.

Once I have the computed column set up, I can build an index on it just as I would any other column:

CREATE UNIQUE CLUSTERED INDEX [IX_lwt_CustSoldToIndexAddressMatch_AddressMatch] ON [cwi].[Customer Sold To Indexed Address Match]
(
   [AddressMatch] ASC,
   [CMP Company] ASC,
   [CST Customer #] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Now when I run the same query that was before taking about 3 hours, the results are returned in a minute and a half. That’s an improvement any DBA can be proud of!

Advertisements

Sargability – make sure your indexes are being used!

My favorite time of year to fish is fall. The fish are bulking up for the winter, so they are fatter than usual. They also will often binge feed, providing some great fishing. However, fall can be rather uncomfortable here in Wisconsin. We get freezing temperatures, blustery days, and snow or rainfail for extended periods in the fall. Just this last October I was fishing cold conditions, with a pretty strong wind and light snow that turned heavy every so often. I remember looking around, seeing no one else on the lake, and thinking to myself “This is living!”. I’ve come to enjoy the sting of the weather because I associate it with the joy of fishing and being in the outdoors.
One thing that can really sting database end users is a slow query. Web pages that take seconds to load instead of miliseconds and BI queries that take minutes instead of seconds can make people awfully grumpy toward the DBA. One way to make sure you can aviod this sting is to ensure your queries are sargable. Sarbable is a made up term that comes from the expression Search ARGument ABLE. Basically, you want to make sure any fields that are referenced in the WHERE, ORDER BY, GROUP BY, and HAVING clauses are able to use an index. The main way to do this is to avoid functions in your predicate. Let’s take a look at a quick example. I’ll create a table with a clustered primary key and a some nonclustered indexes and fill it with some data:

--Create table with primary key clustered
CREATE TABLE dbo.BaitPurchases (ID INT IDENTITY(1,1), Baitname VARCHAR(255) NOT NULL, Purchasedate DATETIME2 NULL, Price MONEY NULL
PRIMARY KEY CLUSTERED 
([ID] ASC)
) ON [PRIMARY]
GO

--Create nonclustered index on PurchaseDate
CREATE NONCLUSTERED INDEX IX_BaitPurchases_PurchaseDate ON [dbo].BaitPurchases
(
   Purchasedate ASC
)INCLUDE (Baitname)
GO

--Create nonclustered index on Baitname
CREATE NONCLUSTERED INDEX IX_BaitPurchases_Baitname ON [dbo].BaitPurchases
(
   Baitname ASC
)INCLUDE (Price)
GO

--Create nonclustered index on Price
CREATE NONCLUSTERED INDEX IX_BaitPurchases_Price ON [dbo].BaitPurchases
(
   Price ASC
)INCLUDE (Baitname)
GO


--Throw in some data
INSERT INTO dbo.BaitPurchases
        ( Baitname, Purchasedate, Price )
VALUES  ( 'Bulldawg','2010-04-29',15.99),
( 'Double Cowgirl','2010-04-30',16.99),
( 'Shallow Raider','2010-05-29',15.49),
( 'Depth Raider','2010-05-30',18.99),
( 'Reef Runner','2011-05-29',14.99),
( 'Mepps Muskie Killer','2011-04-29',12.49),
( 'Suick Thriller','2011-04-30',16.09),
( 'Softtail Phantom','2011-05-30',26.99),
( 'Shallow Invader','2012-04-29',18.29),
( 'Medussa','2012-04-30',17.99),
( 'Grandma','2012-05-29',15.39),
( 'Jake','2012-05-30',15.39),
( 'Believer','2013-04-29',16.59),
( 'Harasser','2013-04-30',13.49),
( 'Doubledawg','2013-05-29',17.99),
( 'Buchertail 700','2013-05-30',13.89),
( 'Hawg Wobbler','2014-04-29',21.99),
( 'Pacemaker','2014-04-30',20.99),
( 'Top Raider','2014-05-29',18.50),
( 'Swim Whizz','2014-05-30',16.98),
( 'Creek Chub Pike Minnow','2015-01-12',15.00)

Now we’ll look for all items that were purchased in 2012, attempting to utilize the nonclustered index that was created on the Purchasedate field:

SELECT Baitname, Purchasedate
FROM dbo.BaitPurchases
WHERE Purchasedate BETWEEN '2013-01-01' AND '2013-12-31 23:59:59.999'

20150112 Sargability Date Seek

You can see from the execution plan that the database engine is doing an index seek on our nonclustered index, just as we were expecting.  However, your junior DBA comes across this code and thinks it’ll be much more readable if he re-writes it with a function to look at the purchase year:

SELECT  Baitname ,
        Purchasedate
FROM    dbo.BaitPurchases
WHERE   YEAR(Purchasedate) = 2013

This does look more readable to me, but if we take a look at the execution plan, we find a very bad negative consequence of this change:

20150112 Sargability Date Scan

We are now doing a nonclustered index scan, meaning the entirety of the index was read. This doesn’t make a big difference in our current tiny table, but with a table with millions of rows you could be seeing the query execution time going from less than a second to minutes.

Let’s take a look at one more common filtering technique that causes index scans instead of seeks. Let’s say you can’t quite remember the name of that lure, but you’re sure it started with an ‘S’. The obvious way to search for this would be:

SELECT  Baitname
FROM    dbo.BaitPurchases
WHERE   Baitname LIKE 'S%'

Looking at the execution plan, we can see we got a nonclustered index seek, so we’re happy with our results.
20150112 Sargability String Seek

However, now we remembered the name didn’t start with an ‘S’, but it definitely had ‘Raider’ in it somewhere. Now we change up our query like this:

SELECT  Baitname
FROM    dbo.BaitPurchases
WHERE   Baitname LIKE '%Raider%'

Now we’re getting a nonclustered index scan:
20150112 Sargability String Scan

Before, the database engine could look at the beginning of each baitname to find the results we were looking for. With the index, the engine jumped right to the section of the index that had the ‘S’s and returned the results from there. Now it has to look in every single bait name and see if it contains the string ‘Raider’. Searches like the second one are usually going to be slow, but SQL Server does provide one tool for getting that data a bit faster, and that is Full-Text indexes. I’ll discuss those in a later blog post.

So take the sting out of database queries by making sure your indexes are being used.