Bad Database Design

Fall is quickly approaching here in southeast Wisconsin. Football has begun, the nights are cooler, and the leaves are beginning to change colors. I love fall because the weather is much more tolerable than the normal Wisconsin summer, which is muggy, hot and humid, and the normal Wisconsin winter, which is freezing cold and full of snow. I also love fall because it is time for the big muskies to come up shallow and binge feed for the winter. I had a meeting to check out a datacenter near Pewaukee Lake on Wednesday of this week.  The meeting was at 8 AM, so I got up at 4:30 AM and got out on the lake for a while first.  I didn’t even end up seeing a fish (except for the jumping carp), but it was still a great morning to be out on the water.  70 degree weather before dawn in September in Wisconsin is rare, so you have to take advantage of it when it happens.
One thing I don’t love is bad database design. In fact, I really, really dislike bad database design. This is especially true when the database belongs to a third party software and I can’t fix it. Being the type of person who sees inefficiencies and wants to correct them, looking at terrible database structure day after day can be quite frustrating.
As a warning, this post is about to turn into a bit of a complaint session. We’ve been having quite a few problems with our PLM software. I won’t mention the product by name, but if you’re evaluating PLM software and come across this post and want to know the owner of this atrocity, send me a comment and I’ll email you the name. Because the fact is, no one should ever buy this product again. Unfortunately it was already selected and in the process of implementation when I started my employment here (and a year and a half later its still in the process of implementation). Back to the problems – we started out with very slow page loading times. If you continue reading and see the table definition below, you’ll quickly see some reasons for it. In some cases it was taking over ten seconds for a single page to load. We asked for a fix which was delivered a month and a half later. Unfortunately, this fix ended up deleting large portions of data – measurements that were attached to design styles. A fix for that came a couple of weeks later (in the meantime, guess who’s restoring all the deleted data?). That fix ended up deleting an additional portion of data – the description of certain points of measure. This product was so clunky that even the company that built it could not make effective changes. Their response to the problems they caused was also disappointing.

Let’s take a look at the definition of one of the tables.  This table is used to store information attached to a design style:

CREATE TABLE [dbo].[descoder](
   [season] [char](15) NOT NULL,
   [style] [char](30) NOT NULL,
   [designmemo] [text] NOT NULL,
   [designimg] [char](50) NOT NULL,
   [id_style] [int] IDENTITY(1,1) NOT NULL,
   [expdate] [datetime] NOT NULL,
   [origexpdate] [datetime] NOT NULL,
   [status] [char](50) NOT NULL,
   [cycledays] [int] NOT NULL,
   [laststep] [char](50) NOT NULL,
   [dayslate] [int] NOT NULL,
   [inidate] [datetime] NOT NULL,
   [stgscode] [char](50) NOT NULL,
   [completed] [bit] NOT NULL,
   [datecompleted] [datetime] NOT NULL,
   [Revision] [char](10) NOT NULL,
   [Desce] [char](50) NOT NULL,
   [Publish] [bit] NOT NULL,
   [DesnImage] [text] NOT NULL,
   [DistSpecs] [bit] NOT NULL,
   [LastDate] [datetime] NOT NULL,
   [WasRevised] [bit] NOT NULL,
   [misc1] [char](30) NOT NULL,
   [misc2] [char](30) NOT NULL,
   [misc3] [char](30) NOT NULL,
   [misc4] [char](30) NOT NULL,
   [misc5] [char](30) NOT NULL,
   [misc6] [char](30) NOT NULL,
   [misc7] [char](30) NOT NULL,
   [misc8] [char](30) NOT NULL,
   [misc9] [char](30) NOT NULL,
   [misc10] [char](30) NOT NULL,
   [misc11] [char](30) NOT NULL,
   [misc12] [char](30) NOT NULL,
   [misc13] [char](30) NOT NULL,
   [misc14] [char](30) NOT NULL,
   [misc15] [char](30) NOT NULL,
   [misc16] [char](30) NOT NULL,
   [misc17] [char](30) NOT NULL,
   [misc18] [char](30) NOT NULL,
   [misc19] [char](30) NOT NULL,
   [misc20] [char](30) NOT NULL,
   [prototype] [bit] NOT NULL,
   [startdate] [datetime] NULL,
   [wholecost] [numeric](14, 4) NOT NULL,
   [WholePrice] [numeric](14, 4) NOT NULL,
   [mainimage] [char](10) NOT NULL,
   [scale] [char](15) NOT NULL,
   [ManufPrice] [numeric](14, 4) NOT NULL,
   [RetailPrice] [numeric](14, 4) NOT NULL,
   [WholeMargin] [numeric](14, 4) NOT NULL,
   [RetailMargin] [numeric](14, 4) NOT NULL,
   [manufcost] [numeric](14, 4) NOT NULL,
   [manufmargin] [numeric](14, 4) NOT NULL,
   [retailcost] [numeric](14, 4) NOT NULL,
   [Forecast] [numeric](10, 0) NOT NULL,
   [unit] [char](4) NOT NULL,
   [gartype] [char](5) NOT NULL,
   [PDMisc1] [char](30) NOT NULL,
   [PDMisc2] [char](30) NOT NULL,
   [PDMisc3] [char](30) NOT NULL,
   [stylestatus] [char](15) NOT NULL,
   [hasColorWay] [bit] NOT NULL,
   [chartcode] [char](10) NOT NULL,
   [chartDescription] [char](50) NOT NULL,
   [LabPacknotes] [text] NULL,
   [Constnotes] [text] NULL,
   [colwnotes] [text] NULL,
   [labnotes] [text] NULL,
   [ProtoSeason] [char](15) NOT NULL,
   [ProtoStyle] [char](30) NOT NULL,
   [groupName] [char](20) NOT NULL,
   [sizespecs] [text] NULL,
   [colorspecs] [text] NULL,
   [cost] [numeric](8, 2) NOT NULL,
   [saleprice] [numeric](8, 2) NOT NULL,
   [ldpcost] [numeric](10, 4) NOT NULL,
   [retailconfirmed] [bit] NOT NULL,
   [costconfirmed] [bit] NOT NULL,
   [plannedunits] [numeric](11, 0) NOT NULL,
   [id_calendar] [int] NOT NULL,
   [morefields] [text] NULL,
   [wboardMaster] [numeric](1, 0) NOT NULL,
   [optLocked] [numeric](1, 0) NOT NULL,
   [RHReady] [bit] NOT NULL,
   [lastexported] [datetime] NOT NULL,
   [lastedited] [datetime] NOT NULL,
   [imgfront] [char](10) NOT NULL,
   [imgBack] [char](10) NOT NULL,
   [imgSide] [char](10) NOT NULL,
   [imgDetail] [char](10) NOT NULL,
   [Frontxt] [char](30) NOT NULL,
   [backtxt] [char](30) NOT NULL,
   [sidetxt] [char](30) NOT NULL,
   [detailtxt] [char](30) NOT NULL,
   [misc21] [char](30) NOT NULL,
   [misc22] [char](30) NOT NULL,
   [misc23] [char](30) NOT NULL,
   [misc24] [char](30) NOT NULL,
   [misc25] [char](30) NOT NULL,
   [misc26] [char](30) NOT NULL,
   [misc27] [char](30) NOT NULL,
   [misc28] [char](30) NOT NULL,
   [misc29] [char](30) NOT NULL,
   [misc30] [char](30) NOT NULL,
   [misc31] [char](30) NOT NULL,
   [misc32] [char](30) NOT NULL,
   [misc33] [char](30) NOT NULL,
   [misc34] [char](30) NOT NULL,
   [misc35] [char](30) NOT NULL,
   [misc36] [char](30) NOT NULL,
   [misc37] [char](30) NOT NULL,
   [misc38] [char](30) NOT NULL,
   [misc39] [char](30) NOT NULL,
   [misc40] [char](30) NOT NULL,
   [descrev] [char](10) NOT NULL,
   [batchNo] [char](10) NOT NULL,
   [collab_issues] [int] NOT NULL,
   [prrob] [char](250) NOT NULL,
   [Attachments] [int] NOT NULL,
   [CreatedDate] [datetime] NOT NULL,
   [Createdby] [char](25) NOT NULL,
   [Editedby] [char](25) NOT NULL,
   [LastTabEdited] [char](25) NOT NULL,
   [CreatedOn] [datetime] NOT NULL,
   [ModifiedBy] [char](25) NOT NULL,
   [ModifiedOn] [datetime] NOT NULL,
   [RowVersion] [int] NOT NULL,
   [hts] [char](20) NOT NULL,
   [TargetFOB] [money] NOT NULL,
   [QuotedFOB] [money] NOT NULL,
   [ActualFOB] [money] NOT NULL,
   [CurrentFOB] [money] NOT NULL,
   [LandedCost] [money] NOT NULL,
   [xTargetFOB] [money] NOT NULL,
   [xQuotedFOB] [money] NOT NULL,
   [xActualFOB] [money] NOT NULL,
   [xCurrentFOB] [money] NOT NULL,
   [xLandedCost] [money] NOT NULL,
   [xRetailPrice] [money] NOT NULL,
   [xWholePrice] [money] NOT NULL,
   [xWholeMargin] [money] NOT NULL,
   [isset] [bit] NOT NULL,
   [sidemeasurement] [bit] NOT NULL,
   [LastSeason] [char](15) NULL,
   [LastStyle] [char](30) NULL,
   [lineplanorder] [numeric](6, 0) NOT NULL,
 CONSTRAINT [PK_descode] PRIMARY KEY CLUSTERED 
(
   [id_style] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY],
 CONSTRAINT [Design Styles: Style/Season Duplicated] UNIQUE NONCLUSTERED 
(
   [style] ASC,
   [season] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

A few things that jump out right away:

  • The use of fixed length fields.  Lots of CHAR() fields instead of VARCHAR.  Fixed length fields like CHAR are useful when you are know you are always going to be storing an exact amount of characters.  In this application, many of these fields are usually blank.  The Miscfields, for example, are company defined fields (CDFs) that may or may not even be set up.  Every single row that doesn’t use them is storing 30 characters of blank spaces for each cdf that is not used.  There are several modules to this software and on some we are not using any of the CDFs.  We are storing a massive amount of blank spaces on disk.
  • A natural primary key exists across the style and season fields.  However, this table was built with the primary key on the identity field and a unique index on style + season.  There is a nonclustered index on season + style, but every query that uses that index now requires a key lookup to get any non-included columns.  The data is also physically sorted by the id_style column, which contains an arbitrary identity number.  Want to get all the styles in one season?  Prepare to do a clustered index scan instead of a seek.
  • The table is not normal.  Edgar F Codd would be rolling in his grave if he knew this kind of design was being implemented in a relational database.  Take the CDFs for example.  The maximum number of CDFs is 40 and you have to store info for all 40 along with the main record.  A normal design would put the CDFs in a table like this.
CREATE TABLE dbo.StyleCDFs
    (
      id_style INT PRIMARY KEY,
      CDFID INT ,
      CDFValue VARCHAR(30)
    );

This reduces the amount of data stored and makes updating data much easier. Joining to it could hurt performance a bit, which is why some designs need to be denormalized to a degree, but attaching all the CDFs to the main record in dbo.descoder is a bad idea.

  • The table width is out of control.  Using sys.partitions and sys.allocation_units to view data from this table, I can see there are 4803 rows in the table and 2611 data pages being used for the clustered index.  That means each page is storing only two rows, even with a 90% fill factor.  This results in a lot of extra pages being read.

This is just one example of a bad table.  There are several others, and they all include those same problems and many others, including:

  • We are missing foreign keys from most of the table in the database.
  • A trace of the database shows a mix of parameterized statements and dynamic SQL with XML.  I prefer to use stored procedures to get data (this is often an argued preference among developers) but since they clearly don’t have a development DBA this wasn’t even an option.
  • Many tables contain ambiguous id columns.  For example, one table has an id_style, an id_column, and an id_header.  Two are foreign keys and one is not a key at all.  How long does it take to figure out which is which?
  • Another problem which is only indirectly related to the database involves a screen with several subtabs.  Each subtab has different information related to the parent record.  After tracing the database to find the answer to a different problem I discovered that each time you click a different subtab, data from each subtab is pulled from the database.  Its clear there is a base set of code that is firing every time a subtab is clicked and it doesn’t depend on what tab is clicked.  This results in unnecessary calls to the database and much longer page load times.

So, I hope reading this has given you some ideas on how not to build a database.

Advertisements