SQL Server Query Performance

Photograph of a SSD Drive Array

Database systems such as SQL/Server are getting an immense set of features these days, to the point where its impossible to be an expert in all areas. However a lot of the basics are similar to what they were 20 years ago. These basics are an area where a lot of software companies get it wrong, even in 2015. The sheer number of poor database and query designs that affect sql server query performance and application design that I have seen over the years from different software houses that really should know how to do so much better continues to amaze me, but the one I’m going to talk about in this post really takes the biscuit.

Designing a database is mostly science but part art as well, in the sense that you can’t just design everything to sixth normal form. You have to design a database that works in the real world, so this generally means on making tables that reflect real world business objects and then developing the design from there. Just putting in six normal form generic, abstract objects that don’t mean anything, so to store a business object in five tables just because you can, is not good practice in the real world. If you did this because you are being anal about performance, did you check performance first and see what you are saving by your approach? An approach that probably makes the code more complicated and less readable? No I thought not.

Just that one tip alone could save a lot of heartache for many people, developers and users. Don’t optimise anything without checking first, because computer hardware relatively speaking is cheap. Finding the wrong balance, or just simply getting it wrong, can be expensive.

The other tip I would give is make sure you get the level of table normal form optimisation right and all naming of objects and columns right. You should use table and column names that your users would understand where possible – again doing that is a really good start that can prevent many mistakes when designing databases.

Anyway, in this post I’d like to focus on one query that I came across the other year. Its long enough ago now, and I’ve changed all the names, so it will not be possible for you to work out who this is. All I will say its the supplier of a package to one of my customers of the past few years.

I don’t know how many people actually read this blog, but I know my brothers dog does (he is half poodle half border terrier), and I’m sure even he will agree how poor this SQL query is I’m about to show you.

I was running a query to try to find out why a database was slow. If users are complaining about speed, and its not your queries, or your package, or your database for that matter, then it can be hard to find out what the worst thing is. I run the following query and came across probably the worse SQL I have ever come across in my career, and that really is saying something, because my SQL career goes back to 1988:

This query was used to check for the longest running queries:

SELECT TOP 100
    total_worker_time/execution_count AS Avg_CPU_Time
        ,execution_count
        ,total_elapsed_time/execution_count as AVG_Run_Time
        ,(SELECT
              SUBSTRING(text,statement_start_offset/2,(CASE
                         WHEN statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), text)) * 2 
                         ELSE statement_end_offset 
                         END -statement_start_offset)/2
                       ) FROM sys.dm_exec_sql_text(sql_handle)
         ) AS query_text 
FROM sys.dm_exec_query_stats 
--pick your criteria
ORDER BY Avg_CPU_Time DESC
--ORDER BY AVG_Run_Time DESC
--ORDER BY execution_count DESC

This produced the following query in the top 5, which was taking a long long time, 11 minutes plus, to run. It was also being run quite regularly by the third party application (reminder: all names have been changed for confidentiality reasons):

ALTER VIEW [dbo].[v_Customer_Address]
AS
SELECT
	Customer.CustomerID,
	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryAddressID  ELSE Address.CustomerAddressID END PrimaryAddressID,
	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryAddress1  ELSE Address.Address1 END PrimaryAddress1,
	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryAddress2  ELSE Address.Address2 END PrimaryAddress2,
	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryAddress3  ELSE Address.Address3 END PrimaryAddress3,
	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryAddress4  ELSE Address.Address4 END PrimaryAddress4,
	CASE WHEN  Address.CustomerAddressID IS NULL THEN PrimaryAddress.PrimaryTel  ELSE Address.Tel END PrimaryTel,
	Email.Email EmailAddress,
	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryPostcodeOut  ELSE Address.PostcodeOut END PrimaryPostcodeOut,
	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryPostcodeIn  ELSE Address.PostcodeIn END PrimaryPostcodeIn,
	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryCountry  ELSE Address.Country END PrimaryCountry,
	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN dbo.GetPostcode (PrimaryAddress.PrimaryPostcodeOut ,PrimaryAddress.PrimaryPostcodeIn,'')  ELSE dbo.GetPostcode (Address.PostcodeOut , Address.PostcodeIn,'') END PrimaryPostcode,
	AddressType.AddressTypeID AS PrimaryAddressTypeID
FROM	Customer WITH (NOLOCK)
		CROSS JOIN
			AddressType WITH (NOLOCK)		
		LEFT JOIN
			(		SELECT CustomerAddress.* 
					FROM 	CustomerAddress WITH (NOLOCK) 
					WHERE 	CustomerAddress.DateTo IS NULL
			) Address
			ON
			Customer.CustomerID = Address.CustomerID AND
			AddressType.AddressTypeID = Address.AddressTypeID
		LEFT JOIN
			v_Customer_PrimaryAddresss PrimaryAddress WITH (NOLOCK)
			ON Customer.CustomerID = PrimaryAddress.CustomerID
		LEFT JOIN
			v_Customer_Email Email WITH (NOLOCK)
			ON Customer.CustomerID = Email.CustomerID

The functions and views that the above view called were trivial and not worth repeating, but one must question why such a view has to call other views in the way it was. At one time this would have confused the optimiser, although thankfully its less of a problem in recent years.

I strongly feel that NOLOCK is an anti pattern that does a poor job of doing READ UNCOMMITTED in SQL/Server (the database default isolation level was READ COMMITTED), it does not do what it says, using NOLOCK still generates locks that can lead to lock promotion. There are many posts and a little bit of controversy about NOLOCK if you google for it so I won’t repeat any here apart from to say LOCK PROMOTION and if it is occurring understanding why it is occurring is the answer, not putting NOLOCK on all your queries. DOH! Not only that, I can imagine this software company interviewing developers and rejecting anybody who doesn’t use NOLOCK as being stupid, so the real result being that they only hire inexperienced developers and/or developers who don’t bother to understand fully what they are doing. Double DOH!

I think its fair to say that the above code generated the largest number of WTF’s per line from me from any query, anywhere, anytime ever. Why 11 minutes? The WHERE clause was in the third party application, thats why, so it was doing at least one full table scan without a where clause of any kind and the CROSS and LEFT JOINs would ensure that it did quite a bit more inefficient reading.

The solution was to rewrite the query so as not to use the CROSS JOIN and read all those records:

SELECT        dbo.Customer.CustomerID, 
				CASE WHEN pa.CustomerAddressID IS NOT NULL THEN pa.CustomerAddressID ELSE sa.CustomerAddressID END AS PrimaryAddressID,
				CASE WHEN pa.CustomerAddressID IS NOT NULL THEN pa.Address1 ELSE sa.Address1 END AS PrimaryAddress1,
				CASE WHEN pa.CustomerAddressID IS NOT NULL THEN pa.Address2 ELSE sa.Address2 END AS PrimaryAddress2,
				CASE WHEN pa.CustomerAddressID IS NOT NULL THEN pa.Address3 ELSE sa.Address3 END AS PrimaryAddress3,
				CASE WHEN pa.CustomerAddressID IS NOT NULL THEN pa.Address4 ELSE sa.Address4 END AS PrimaryAddress4,
				CASE WHEN pa.CustomerAddressID IS NOT NULL THEN pa.Tel ELSE sa.Tel END AS Tel

	 ...
FROM            dbo.Customer
CROSS APPLY ( 
 SELECT CustomerAddressID, Address1, Address2, Address3, Address4, PostcodeIn + ' ' + PostcodeOut AS PostCode, Tel
 FROM dbo.CustomerAddress 
 WHERE CustomerAddress.CustomerID = dbo.Customer.CustomerID
 AND CustomerAddress.DateTo IS NULL
) sa
OUTER APPLY (
 SELECT CustomerAddressID, Address1, Address2, Address3, Address4, Postcode, Tel
 FROM dbo.CustomerAddress 
 WHERE CustomerAddress.CustomerID = dbo.Customer.CustomerID
 AND CustomerAddress.AddressTypeID = ISNULL((SELECT CAST(Setting AS Integer) FROM SystemSetting WHERE Code = 'PrimaryAddressType'), 1)
 AND CustomerAddress.DateTo IS NULL
) pa

When I ran my new query with explain plan, it suggested that I index the DateTo column which I did. This was not a recommendation for the old query because its very likely it couldn’t use indexing efficiently no matter what indexes were on the tables.

The new query took around a second to return 250,000 rows, and if a customerId where clause was added, the time was 20ms. I couldn’t get the original to run in less than 11 minutes although I ran it a few times.

The main thing to take away from this post is that if you read many rows, its better doing it in a query (in SQL) rather than in IL (e.g. C# or VB.NET code behind), but its very bad all the same and you should minimize the number of rows being retrieved by good database and query design as far as possible. I managed to make such improvements just by altering the query but sometimes you have to alter the design of the tables just to reduce the sheer complexity of some of the SQL required to get at the data. Like your code, the database design should reflect, and show the user requirements wherever possible. You can always change it if user requirements change and this way you end up with good simple, reliable code in your business layer rather than trying to sort the mess out there (at a possible impact on performance and readability of more of your code).

As an example, if look at one of your most common queries – e.g. searching for customers, and you don’t use SKIP/TAKE just to return the page of information that you are reading, then consider doing so. This alone might remove a potential lock promotion problem and make everything run faster. See this link for more details – http://stackoverflow.com/questions/548475/efficient-way-to-implement-paging. I might write a post on SKIP/TAKE at some point in the future because its a good technique that’s well worth knowing about.

So Boo, when you read this post, don’t forget that when other humans are around you must always pronounce WTF as WOOF. Thanks mate!

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *