A couple years ago I took a fishing trip up to Canada with my Pop, brother-in-law, and his Pop. We went to Lost Bay Resort on Cliff Lake in Northwestern Ontario. It was an outstanding experience, staying in a cabin, spending time with other men, and doing a whole lot of fishing. I really fell in love with Muskie fishing while I was up there. Between the four of us (two to a boat) we probably saw over 100 of the toothy predators, though we managed to boat only three of them. I caught my current personal best, this 46″ beast:
Let’s say we knew of ten must-fish locations on the lake, and we wanted to make sure we fished each of them in a single day. If we had one boat, we would move from one place to another until we’d hit all ten locations. If the locations are big and each takes a while to thoroughly fish, we’re going to spend quite a bit of time. A more efficient way to cover all ten locations would be to use multiple boats. With two boats, each needs to cover just five of the locations. With five boats, each can focus on just two. From a fishing perspective this makes sense: the more water that gets covered, the better the chance of catching fish.
From a SQL Server perspective, this concept, called Parallelism, also makes a lot of sense. If we can split a task among multiple workers, or threads, the task will finish in a much shorter time. To facilitate this process, SQL Server has a couple of settings that can be adjusted. The first is called Max Degree of Parallelism, or MAXDOP. This setting limits the number of processors used in parallel plan execution. The default is 0, which will use all available processors. The maximum number of processors that can be used for a single parallel plan is 64, though the setting itself accepts values up to 32,767. The second setting is called Cost Threshold for Parallelism. This controls how expensive a plan must be before the query optimizer considers a parallel plan. This is an important part of the equation because threads that finish their part of the query have to sit and wait for the other threads to finish. Using our fishing analogy: if you have five boats fishing ten locations, and one boat draws two very large locations that each take a half day to fish while all the others have small locations that take a mere couple hours, you’re going to have four boats sitting around doing nothing while that last boat fishes for the rest of the day.
So you can see that Max Degree of Parallelism and Cost Threshold for Parallelism must be managed together to maximize system efficiency. The only way to know the appropriate values for each is to know your system! You’ll see lots of sources around the net saying to turn off Parallelism for systems that are primarily OLTP. This is, in my opinion, bad advice! Use the Cost Threshold for Parallelism to make sure simple queries aren’t using parallelism; don’t turn it off completely. This is like saying you’re usually going to be moving around the lake with a trolling motor, so don’t bother to bring along an outboard. Even if you are using the trolling motor 99% of the time, when it does come time to move all the way across the lake, you’re going to be glad you have that outboard available.
First let’s take a look at where these server properties are shown in SSMS.
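If you prefer T-SQL over the SSMS properties dialog, the same two settings can be read and changed with sp_configure. Both are advanced options, so ‘show advanced options’ must be enabled first. A sketch (the value of 50 below is purely illustrative, not a recommendation):

```sql
-- Both settings are advanced options, so expose them first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- View the current values
EXEC sp_configure 'cost threshold for parallelism';
EXEC sp_configure 'max degree of parallelism';

-- Example only: raise the threshold so trivial queries stay serial
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```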
On this sandbox server, I have the Cost Threshold for Parallelism set to 1 and the Max Degree of Parallelism set to 8. This means that for any query whose estimated cost is over 1, the query optimizer will consider a parallel plan. Well, how much is 1? To answer that, we need to look at a query’s execution plan. Here’s a simple query using a couple tables from previous blog posts:
SELECT *
FROM Sandbox.dbo.AverageFishLength afl
INNER JOIN Sandbox.dbo.FishCaught fc
    ON fc.FishType = afl.FishType
WHERE afl.AvgLength > 12;
I press Ctrl+M before executing so I can see the actual execution plan, and then run the query. To see the cost that relates to Cost Threshold for Parallelism, I hover the mouse over the left-most operator, the SELECT icon. A small tooltip pops up that includes the Estimated Subtree Cost:
Since my Subtree Cost here is 0.0084115, which is less than 1, the Query Optimizer will not attempt to use parallelism on this query. In order to get a complex plan, I’m going to take a few tables that have been created in previous posts and do a cross join:
SELECT *
FROM Sandbox.dbo.ClusteredIndex CI
CROSS JOIN Sandbox.dbo.NonClusteredIndex NCI
WHERE CI.ID <= 15;
Looking at the actual execution plan for this query, I can see the estimated subtree cost is now far higher; once that cost crosses the Cost Threshold for Parallelism, the Query Optimizer will consider a parallel plan for the query.
So that’s how to see the cost associated with specific queries. However, to set the Cost Threshold for Parallelism to an appropriate number, we need a general idea of the cost of all our queries. To accomplish this, I’ve created the following table:
CREATE TABLE [dbo].[PlanCacheStatementCost]
(
    [PCSCID]               [INT] IDENTITY(1,1) NOT NULL,
    [StatementText]        [VARCHAR](8000)     NOT NULL,
    [StatementSubTreeCost] [FLOAT]             NOT NULL,
    [UseCounts]            [INT]               NOT NULL,
    [StatementEstRows]     [INT]               NOT NULL,
    [Avg_Worker_Time]      [INT]               NULL,
    [Avg_Logical_Reads]    [INT]               NULL,
    [Avg_Elapsed_Time]     [INT]               NULL,
    [CaptureDateTime]      [DATETIME]          NOT NULL
) ON [PRIMARY];
And this stored procedure to fill the table:
CREATE PROCEDURE [dbo].[CapturePlanCacheStatementCost]
AS
BEGIN
    DECLARE @CaptureDate DATETIME;

    -- Round the capture time to the nearest minute
    SELECT @CaptureDate = DATEADD(minute, DATEDIFF(minute, 0,
        DATEADD(second, 30 - DATEPART(second, GETDATE() + '00:00:30.000'), GETDATE())), 0);

    CREATE TABLE #StatementSubTreeCost
    (
        StatementSubTreeCost FLOAT,
        StatementId          INT,
        UseCounts            BIGINT,
        plan_handle          VARBINARY(64)
    );

    -- Pull statements with an estimated subtree cost over 5 from the plan cache
    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    INSERT INTO #StatementSubTreeCost (StatementSubTreeCost, StatementId, UseCounts, plan_handle)
    SELECT c.value('@StatementSubTreeCost', 'float') AS StatementSubTreeCost,
           c.value('@StatementId', 'float') AS StatementId,
           cp.UseCounts,
           cp.plan_handle
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    CROSS APPLY qp.query_plan.nodes('//StmtSimple') t(c)
    WHERE cp.cacheobjtype = 'Compiled Plan'
      AND qp.query_plan.exist('//StmtSimple') = 1
      AND c.value('@StatementSubTreeCost', 'float') > 5
    ORDER BY c.value('@StatementSubTreeCost', 'float') DESC;

    -- Join in average runtime stats and persist the results
    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
    cQueryStats AS
    (
        SELECT query_hash,
               SUM(total_worker_time / NULLIF(qs.execution_count, 0)) AS avg_worker_time,
               SUM(total_logical_reads / NULLIF(qs.execution_count, 0)) AS avg_logical_reads,
               SUM(total_elapsed_time / NULLIF(qs.execution_count, 0)) AS avg_elapsed_time
        FROM sys.dm_exec_query_stats qs
        GROUP BY query_hash
    )
    INSERT INTO dbo.PlanCacheStatementCost
    (
        StatementText,
        StatementSubTreeCost,
        UseCounts,
        StatementEstRows,
        Avg_Worker_Time,
        Avg_Logical_Reads,
        Avg_Elapsed_Time,
        CaptureDateTime
    )
    SELECT LEFT(c.value('@StatementText', 'varchar(max)'), 8000) AS StatementText,
           s.StatementSubTreeCost,
           s.UseCounts,
           CAST(c.value('@StatementEstRows', 'float') AS BIGINT) AS StatementEstRows,
           qs.avg_worker_time,
           qs.avg_logical_reads,
           qs.avg_elapsed_time,
           @CaptureDate
    FROM #StatementSubTreeCost s
    CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) AS qp
    CROSS APPLY qp.query_plan.nodes('//StmtSimple') t(c)
    LEFT OUTER JOIN cQueryStats qs
        ON c.value('xs:hexBinary(substring(@QueryHash,3))', 'binary(8)') = qs.query_hash
    WHERE c.value('@StatementId', 'float') = s.StatementId
    ORDER BY c.value('@StatementSubTreeCost', 'float') DESC,
             s.StatementSubTreeCost DESC;

    DROP TABLE #StatementSubTreeCost;
END
This stored procedure will grab every execution plan in the plan cache that has an estimated subtree cost over five and put it into the table. That threshold of five will have to be adjusted based on your system; running it a couple of times will give you a good idea of what the number should be. You want to see the expensive plans, but you don’t want to be inundated. Note that this is a very expensive query and should only be run in a Production environment with great care! I then set up a SQL Agent job to run this stored procedure on a regular basis. Now I can look through the “expensive” queries and decide whether I’d want them to run in parallel or not, and adjust the Cost Threshold for Parallelism accordingly.
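Once the table has accumulated a few captures, a summary query like the following (a sketch against the table defined above, not part of the original capture job) can help you see where the costs cluster before picking a threshold:

```sql
-- Review captured statements from most to least expensive;
-- look for a natural break between queries that benefit from
-- parallelism and those that don't.
SELECT StatementSubTreeCost,
       UseCounts,
       Avg_Elapsed_Time,
       LEFT(StatementText, 100) AS StatementStart
FROM dbo.PlanCacheStatementCost
ORDER BY StatementSubTreeCost DESC;
```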
One last thing to note is the CXPACKET wait type. This measures the amount of time that worker threads spend waiting for the other threads assigned to the same task to finish. So in our example of the five boats where four of them are waiting half a day for the last to finish fishing, CXPACKET would measure the time those four boats sit waiting. CXPACKET can indicate that your Cost Threshold for Parallelism is set too low, but you have to know how to interpret it. Just because CXPACKET is high doesn’t mean you have a problem. Indeed, by turning off parallelism (setting Max Degree of Parallelism to 1) you can eliminate the CXPACKET wait completely, but all your complex queries will run more slowly than they could. You want to set the Cost Threshold for Parallelism to a number where complex queries get the parallelism they need, but you don’t have a bunch of worker threads sitting around waiting for others to finish. You can see the CXPACKET wait with this query:
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'CXPACKET';
The numbers returned are cumulative, so you need to track these numbers over time to see how fast the waits are increasing.
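One simple way to track the rate of increase is to snapshot the numbers on a schedule and compare consecutive rows. A sketch (the CXPacketHistory table name is my own invention, not from the post):

```sql
-- Hypothetical history table for tracking CXPACKET growth over time
CREATE TABLE dbo.CXPacketHistory
(
    CaptureTime         DATETIME NOT NULL DEFAULT GETDATE(),
    wait_time_ms        BIGINT   NOT NULL,
    signal_wait_time_ms BIGINT   NOT NULL
);

-- Run on a schedule (e.g., via a SQL Agent job); the difference
-- between consecutive rows shows how fast the wait is accruing.
INSERT INTO dbo.CXPacketHistory (wait_time_ms, signal_wait_time_ms)
SELECT wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'CXPACKET';
```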
I hope this post has helped you understand how parallelism can help make queries run faster, and that using the correct settings means knowing your system and the queries that are routinely being run.