Index a Computed Column to Improve Query Performance

This summer started out with a below-average fishing volume. The birth of my fourth child in June used up much of the time I would normally spend fishing. As the summer grew late, I was able to get out a bit more, but it’s mostly been a small lake near my house. I’ve only gotten down to Pewaukee to muskie fish once, though I am happy to say I did have some luck that outing. The fish came in hot behind my lure, but veered off when it got to the side of the boat. As I began the figure eight, I turned to my brother-in-law to let him know I saw a fish and to estimate the size. While I was looking at him the fish came back and struck. After a brief but spirited fight, we netted the fish, a 40.5″ muskie. She was very healthy and I feel blessed to have gotten one my first time out this summer.

[Photo: Pewaukee muskie, July 28, 2018]

Doing a figure eight while muskie fishing is something you should do on every cast, just like making sure every query you execute is using an index as expected. In some cases, though, indexes can be hard to use. I wrote an earlier post about SARGability, that is, making sure your indexes can actually be used. As I explained in that post, certain constructs, such as functions applied to the columns being searched, can prevent SQL Server from doing a seek on the index.
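As a contrived sketch (not from the data warehouse scenario below, and assuming a hypothetical dbo.Customers table with an index on LastName), compare these two predicates:

-- Non-SARGable: wrapping the column in a function hides it from the index,
-- so SQL Server has to scan every row to evaluate the expression.
SELECT CustomerID
FROM dbo.Customers
WHERE LEFT(LastName, 3) = 'Smi';

-- SARGable: the bare column can be compared against the index directly,
-- so SQL Server can seek straight to the matching range.
SELECT CustomerID
FROM dbo.Customers
WHERE LastName LIKE 'Smi%';

Both queries return the same rows, but only the second one lets the optimizer use an index seek.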

I recently faced a situation like that while loading a data warehouse table.  These loads process a large number of rows and can be very time-consuming, so it is important to make the performance as efficient as possible.  In this case, I was matching customers in one table to customers in another by comparing their addresses and cities.  The relevant part of the join was:

REPLACE(LEFT(Table1.StreetAddress, 10) + LEFT(Table1.City, 5), ' ', '') = REPLACE(LEFT(Table2.StreetAddress, 10) + LEFT(Table2.City, 5), ' ', '')

Both the REPLACE and LEFT functions prevent SQL Server from seeking on an index.  An initial test run of the join between the two big tables came in at about 3 hours, far longer than I could afford.

To solve the problem, I created a view that added a computed column to hold the match code:

CREATE VIEW [cwi].[Customer Sold To Indexed Address Match]
WITH SCHEMABINDING
AS
SELECT dncomp AS [CMP Company]
     , dnjynb AS [CST Customer #]
     , [...] AS [Many other columns]
     , REPLACE(LEFT(dnl1tx, 10) + LEFT(dnl5tx, 5), ' ', '') AS AddressMatch
FROM vendorschema.vendortable;
GO

The important thing to note here is the SCHEMABINDING.  This tells SQL Server that the view depends on the referenced object (in this case, vendorschema.vendortable) staying the same.  If I were to go into vendorschema.vendortable and attempt to drop the dnl1tx column, SQL Server would give me an error informing me that the cwi.[Customer Sold To Indexed Address Match] view depends upon that column, and the drop would fail.  Schema binding isn't optional here, either: SQL Server will not let you create an index on a view unless the view was created WITH SCHEMABINDING.
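For example, an attempt like the following (assuming you even have rights to alter the vendor table) would be rejected:

-- Hypothetical attempt to drop a column the schemabound view references;
-- SQL Server refuses and reports that the view depends on the column.
ALTER TABLE vendorschema.vendortable DROP COLUMN dnl1tx;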

Once I have the computed column set up, I can build an index on it just as I would on any other column.  Note that the first index created on a view must be a unique clustered index:

CREATE UNIQUE CLUSTERED INDEX [IX_lwt_CustSoldToIndexAddressMatch_AddressMatch]
ON [cwi].[Customer Sold To Indexed Address Match]
(
    [AddressMatch] ASC,
    [CMP Company] ASC,
    [CST Customer #] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY];
GO

Now the same query that previously took about 3 hours returns its results in a minute and a half. That’s an improvement any DBA can be proud of!
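For reference, here is a sketch of what the rewritten join looks like, assuming the second table got a matching schemabound view built the same way (the [cwi].[Prospect Indexed Address Match] name is made up for illustration). The NOEXPAND hint tells the optimizer to use the view's index directly, which is required on editions other than Enterprise:

-- Join the two indexed views on the precomputed match code;
-- both sides can now be resolved with index seeks instead of scans.
SELECT c.[CST Customer #] AS CustomerNumber
     , p.[CST Customer #] AS ProspectNumber
FROM [cwi].[Customer Sold To Indexed Address Match] AS c WITH (NOEXPAND)
INNER JOIN [cwi].[Prospect Indexed Address Match] AS p WITH (NOEXPAND)
    ON c.AddressMatch = p.AddressMatch;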
