Tally Tables

Occasionally I like to take a break from Muskie fishing and spend time catching some easier species. This is especially true when I’m taking friends out fishing. Not many people like to cast for hours with only a few follows to show for it. Last month I took my brother Steve out onto a smaller lake about five minutes from my house. This lake is overrun with invasive weeds, and I tend to think of it as a garbage lake. However, we had a great time catching fish. My brother caught several bass and a bonus walleye, while I managed this fat 30″ pike. The pike took a good 5 minutes to get in the boat since I was using fairly light tackle and we had no net.

Little Cedar Northern Pike.jpg

SQL is a set based language. It is built with the idea that the engine will handle any looping in the background, without the author needing to specify the best way to loop. There are a few rare exceptions, but if you are creating a loop in SQL, you are usually doing something wrong or much less efficiently. One great way to get around loops is to create a Tally Table. Originally defined by SQL Server legend Jeff Moden in 2008, the Tally Table is simply a table with a single column of very well indexed sequential numbers.
If you’re a programmer or developer, you’re probably going to think of something like this to build a Tally Table:

--Create the Tally Table
CREATE TABLE #Tally
(
    N INT
  , CONSTRAINT PK_Tally_N
        PRIMARY KEY CLUSTERED (N)
);

--Set up a increment counter
DECLARE @TallyCounter INT;
SET @TallyCounter = 1;

--Fill the Tally Table with a Loop
WHILE @TallyCounter <= 11000
BEGIN
    INSERT INTO #Tally
    (
        N
    )
    VALUES (@TallyCounter);

    SET @TallyCounter = @TallyCounter + 1;
END;

Running on my server, this code took an avergage of 432 milisecond while requiring 22,426 reads and 407 CPU. A more efficient way to generate the table will be like this:

--Create and populate table
SELECT TOP 11000
    IDENTITY(INT, 1, 1) AS N
INTO #Tally
FROM MASTER.sys.syscolumns sc1
   , MASTER.sys.syscolumns sc2;

--Add Primary Key Clustered
ALTER TABLE #Tally
ADD CONSTRAINT PK_Tally_N 
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100;

This took me only 73 miliseconds to run, and required only 885 reads and 78 CPU.
In Oracle this is even easier to create:

CREATE TABLE tempTally AS
SELECT LEVEL AS N
FROM DUAL
CONNECT BY LEVEL <= 11000
ORDER BY LEVEL;

So now we’ve got a table full of sequential numbers from 1 to 11,000. What can we use this for?
From a programmer or developer perspective, loops are often used with strings. Let’s say we want to step through and display each character in a string. With a loop, you’d do something like this:

DECLARE @StepThroughMe VARCHAR(100), @i INT;
SELECT @StepThroughMe = 'Looping through this string is a waste of time.', @i = 1;

WHILE @i <= LEN(@StepThroughMe)
BEGIN
    SELECT @i, SUBSTRING(@StepThroughMe, @i, 1);
   SELECT @i = @i+1
END;

Using a Tally table, you can do it in a way that is simpler to write and runs in less than a tenth of the time:

DECLARE @TallyThroughMe VARCHAR(100);
SELECT @TallyThroughMe = 'Using a Tally Table is an efficient use of time.'

SELECT t.N, SUBSTRING(@TallyThroughMe, t.N, 1)
FROM #Tally AS t
WHERE t.N <= LEN(@TallyThroughMe);

One other way I used this was to create my Date table in my date warehouse.

WITH cte
AS (SELECT DATEADD(DAY, N - 1, '2000-01-01') AS Date
    FROM #Tally
   )
SELECT YEAR(cte.Date) * 10000 + MONTH(cte.Date) * 100 + DAY(cte.Date) AS DateKey
     , cte.Date
     , YEAR(cte.Date) AS YEAR
     , DATEPART(QUARTER, cte.Date) AS Quarter
     , MONTH(cte.Date) AS MONTH
     , RIGHT('0' + CAST(MONTH(cte.Date) AS VARCHAR(2)), 2) + '. ' + DATENAME(MONTH, cte.Date) AS MonthName
     , DATEPART(ww, cte.Date) + 1 - DATEPART(ww, CAST(DATEPART(mm, cte.Date) AS VARCHAR) + '/1/' + CAST(DATEPART(yy, cte.Date) AS VARCHAR)) AS WeekOfMonth
     , DATEPART(wk, cte.Date) AS WeekOfYear
     , DATEPART(dw, cte.Date) AS DayOfWeek
     , RIGHT('0' + DATEPART(dw, cte.Date), 2) + '. ' + DATENAME(dw, cte.Date) AS DayOfWeekName
     , DAY(cte.Date) AS DayOfMonth
     , DATEPART(DAYOFYEAR, cte.Date) AS DayOfYear
     , CASE
           WHEN DATEPART(QUARTER, cte.Date) IN ( 1, 2 ) THEN
               'Spring'
           ELSE
               'Fall'
       END AS RetailSeason
FROM cte;

This worked for loading my permanent table, but you could also use it to load a temp table or table variable that could be joined to a data set to get a full range of dates even when your data set is missing data on some of the dates.

Tally tables can be used to improve performance in a number of different scenarios. Next time you’re not sure whether you may need a loop, stop and consider whether your situation may benefit from a Tally Table.

Advertisements