A deeper dive into XML in SQL Server

When I was younger, I remember my family taking a camping trip near a lake with trout. As best I can remember, the lake was stocked with rainbow and brown trout. When we visited the lake to fish there was a crowd of people along one of the shores. There were a mass of small trout in the water, and everyone was trying to get one to bite. Very few of the people were succeeding. Seeing the fish there, I joined in the group, desparate to try to get one of the 8″ – 10″ fish to bite. Suddenly from a shoreline halfway across the lake, I heard a triumphant cry. Looking up, I saw a man holding an enormous trout in the air. He’d caught the fish by avoiding the crowd of people and small fish, and exploring a new shoreline. I learned two lessons that day. First, if you target smaller fish because you know where they are, you’ll never catch the big ones. Second, if you never explore unfamiliar water you’ll miss out on some good fishing.
One of our developers was recently pulling data from SQL Server to create a catalog feed for a website. This feed included our products, which have a parent child hierarchy between a base product and the skus beneath that product. His plan was to create two stored procedures – one for the base product and one for the skus. Then in his code he was going to take the results of those two stored procedures and merge them into a single XML document.
I had recently taken a SQL quiz and ran into the hierarchy data type, and thought this might be a good way to experiment with using it. After doing some initial work, I found that the hierchary id was overkill for this situation. The Hierarchy data type seems to work best when you have several levels you want be able to delve in to. Instead, I decided to attempt the feed directly in SQL using the FOR XML clause. I have used this clause a few times before, but never to build something as complex as this document. Here are some things I did or observed that may be helpful:
1. Concatenating XML
The final output document had several sections in it that I was forced to create with separate SELECT statements. I couldn’t get these together into the same XML document by unioning or concatenating them together. Instead I had to place them each into variables, then SELECT the variables together in a separate statement that used the FOR XML clause.

DECLARE @xmlheader XML, @xmlnonskuitem AS XML, @xmlsizeonlyitem AS XML, @xmlcoloronlyitem AS XML, @xmlcolorandsizeitem AS XML, @xmlskus AS XML;

SELECT @xmlheader =  ...;
SELECT @xmlnonskuitem = ...;
SELECT @xmlcoloronlyitem = ...;
SELECT @xmlsizeonlyitem = ...;
SELECT @xmlcolorandsizeitem = ...;
SELECT @xmlskus = ...;

SELECT @xmlheader, @xmlnonskuitem, @xmlcoloronlyitem, @xmlsizeonlyitem, @xmlcolorandsizeitem, @xmlskus
--End root

2. Building depth into the XML tree
The xml we were creating had several child elements, some of which needed to be repeated for each variation, and some of which were used to encapsulate all variants for a particular base product. To get this layering, I had to create some of the elements with varchar characters that could be converted to XML. Complicating this, I needed to create XML inside of those text characters, and join them together. The general pattern for this is to convert the inner most XML to varchar, then concatenate those together with the hard coded elements, then convert the whole thing back to XML.

SELECT @xmlsizeonlyitem
    SELECT RTRIM(LTRIM(i.[Field2])) AS "@product-id"
         , CONVERT(
                    , '<attributes><variation-attribute attribute-id="size" variation-attribute-id="size"><display-name>Size</display-name><variation-attribute-values>'
                      + CONVERT(VARCHAR(MAX)
                              , (
                                    SELECT LTRIM(RTRIM(s.size)) AS [@value]
                                         , RTRIM(LTRIM(s.size)) AS [display-value]
                                    FROM db.SCHEMA.tblSku AS s
                                    WHERE s.Field1 = 125
                                          AND s.Field2 = i.Field2
                                    FOR XML PATH('variation-attribute-value'), TYPE
                               ) + '</variation-attribute-values></variation-attribute></attributes><variants>'
                      + CONVERT(VARCHAR(MAX)
                              , (
                                    SELECT RTRIM(LTRIM(s.Field2)) + ' ' + RTRIM(LTRIM(s.Field3)) AS [@product-id]
                                    FROM db.SCHEMA.tblSku AS s
                                    WHERE s.Field1 = 125
                                          AND s.Field2 = i.Field2
                                    FOR XML PATH('variant'), TYPE
                               ) + '</variants>'
    FROM db.SCHEMA.tblBaseItem i
    WHERE i.Field1 = 125
          AND i.Field2 = 'BaseItemId'
    GROUP BY i.Field1
           , i.Description
    FOR XML PATH('product')

3. Elements versus attributes
It’s very easy to change the field you are returning from the database to either an element or an attribute. For attributes you put the @ sign in the front of the name. For elements, you just leave the name as is.

SELECT @xmlcolorandsizeitem = (SELECT RTRIM(LTRIM(i.[Field2])) AS "@product-id"
    , RTRIM(LTRIM(i.Description)) AS [display-name]
FROM db.SCHEMA.baseitemtable i
    WHERE i.field1 = 125
          AND i.field2 = '7972'
         GROUP BY i.field1, i.Description
FOR XML PATH('product')) ;

20180216 XML Elements vs Attributes
4. Renaming ROOT and ROW
You can rename the ROOT element to something else by putting that in the parenthesis after ROOT.
You can also rename the first element, usually defaulted as ROW, by putting the name in the parenthesis after RAW or PATH. This does not work in EXPLICIT or AUTO modes, which don’t have a ROW element.

FOR XML RAW ('catalog') ROOT ('catalogs');

20180216 XML Root and Raw.PNG
5. Adding namespace
To add a namespace to the XML, you can use the WITH XMLNAMESPACES clause. This only works for RAW, AUTO, or PATH modes.

WITH XMLNAMESPACES('http://www.company.com/xml/impex/catalog/2106-1-31' AS Co)

20180216 XML Namespace

I hope these tips give you a better understanding of how to work with XML in SQL.


Import XML into SQL Server

One thing any responsible fisherman needs is a pair of pliers.  You may need different sizes for different types of fish.  For panfish, something small and skinny is going to help get in their small mouths to extract deep hooks.  For muskie, you’re going to want an extra-long pair to get to the hooks while keeping your hands away from those viscous teeth.  I’ve found that Northern Pike are particularily good at swallowing hooks into the back of their throats.  The last thing to remember is that a cut in the gills is almost always a death sentence for the fish.  Since the gills are what re-oxygenates the blood, their is a high volume of blood flow there, and even a small cut can cause a fish to bleed out in minutes.  If you aren’t confident you can remove a deep hook without damaging the fish, it’s usually best to just cut the line as close to the hook as possible.  Studies have shown the fish are adept at shaking hooks off once they are free of the fisherman.


I recently was provided a file that I needed to extract a bunch of data from.  The problem was that the file was xml, not the easiest to import into SQL Server.  I took a two step approach to solve this problem.  The first step was to import the data into SQL Server as a blob:

CREATE TABLE dbo.MarketplaceDecemberXML
, LoadedDate DATETIME

INSERT  INTO dbo.MarketplaceDecemberXML
( XMLData
, LoadedDate
SELECT  CONVERT(XML , BulkColumn) AS BulkColumn
FROM    OPENROWSET(BULK '\\ServerName\DataImport\All Orders.txt' , SINGLE_BLOB) AS oro;

The next step is to pull out any data that is needed.  There are a few ways to do this, but the easiest for me was to use the OPENXML function.  This function required me to load the xml data into an xml variable, then I could read different data based on the depth to which I did the initial opening.

, @hDoc AS INT

SELECT  @XML = adx.XMLData
FROM    dbo.MarketplaceDecemberXML adx;
EXEC sys.sp_xml_preparedocument @hDoc OUTPUT , @XML;

SELECT  MarketplaceOrderID
, FulfillmentChannel
, PurchaseDate
, CustState
, OrderStatus
FROM    OPENXML(@hDoc, 'VendorEnvelope/Message/Order/FulfillmentData/Address')
VendorOrderID VARCHAR(100) '../../VendorOrderID',
OrderStatus VARCHAR(100) '../../OrderStatus',
FulfillmentChannel VARCHAR(100) '../FulfillmentChannel',
PurchaseDate DATETIME '../../PurchaseDate',
CustState VARCHAR(100) 'State'

EXEC sys.sp_xml_removedocument @hDoc;

Here is a screenshot of some of the data I was working with:


The important thing to note is that I used the ../ to move up in the hierarchy.  Using this method I was able to extract all the data I needed and place it in a table where I could query it at will.