Tag: sql/server

  • SQL Server Query Performance

    SQL Server Query Performance

    Database systems such as SQL/Server have gained an immense set of features these days, to the point where it's impossible to be an expert in all areas. However, a lot of the basics are much as they were 20 years ago, and those basics are still an area where a lot of software companies get it wrong, even in 2015. The sheer number of poor database and query designs that hurt SQL Server query performance, produced over the years by software houses that really should know better, continues to amaze me, but the one I'm going to talk about in this post really takes the biscuit.

    Designing a database is mostly science but partly art as well, in the sense that you can't just design everything to sixth normal form. You have to design a database that works in the real world, which generally means making tables that reflect real-world business objects and then developing the design from there. Filling a schema with generic, abstract, highly normalised objects that don't mean anything, so that a single business object ends up spread across five tables just because it can be, is not good practice in the real world. If you did this because you are being anal about performance, did you measure first and see what your approach actually saves? An approach that probably makes the code more complicated and less readable? No, I thought not.

    Just that one tip alone could save a lot of heartache for many people, developers and users alike. Don't optimise anything without measuring first, because computer hardware is, relatively speaking, cheap. Striking the wrong balance, or simply getting it wrong, can be expensive.

    The other tip I would give is to get the level of normalisation right and the naming of objects and columns right. Where possible, use table and column names that your users would understand – doing that alone is a really good start that can prevent many mistakes when designing databases.

    Anyway, in this post I'd like to focus on one query that I came across the other year. It's long enough ago now, and I've changed all the names, so it will not be possible for you to work out who this is. All I will say is that it's the supplier of a package to one of my customers of the past few years.

    I don't know how many people actually read this blog, but I know my brother's dog does (he is half poodle, half border terrier), and I'm sure even he will agree how poor the SQL query I'm about to show you is.

    I was running a query to try to find out why a database was slow. If users are complaining about speed, and it's not your queries, or your package, or your database for that matter, then it can be hard to find out what the worst offender is. I ran the following query and came across probably the worst SQL I have ever seen in my career, and that really is saying something, because my SQL career goes back to 1988.

    This query was used to check for the longest running queries:

    SELECT TOP 100
        total_worker_time/execution_count AS Avg_CPU_Time
            ,execution_count
            ,total_elapsed_time/execution_count AS AVG_Run_Time
            -- the offsets are 0-based byte positions into the unicode
            -- batch text, hence the /2 and +1 conversions below
            ,(SELECT
                  SUBSTRING(text, statement_start_offset/2 + 1, (CASE
                             WHEN statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), text)) * 2
                             ELSE statement_end_offset
                             END - statement_start_offset)/2 + 1
                           ) FROM sys.dm_exec_sql_text(sql_handle)
             ) AS query_text
    FROM sys.dm_exec_query_stats
    --pick your criteria
    ORDER BY Avg_CPU_Time DESC
    --ORDER BY AVG_Run_Time DESC
    --ORDER BY execution_count DESC
    

    This turned up, in the top five, queries against the following view, which was taking a long, long time – 11 minutes plus – to run. It was also being run quite regularly by the third-party application (reminder: all names have been changed for confidentiality reasons):

    ALTER VIEW [dbo].[v_Customer_Address]
    AS
    SELECT
    	Customer.CustomerID,
    	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryAddressID  ELSE Address.CustomerAddressID END PrimaryAddressID,
    	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryAddress1  ELSE Address.Address1 END PrimaryAddress1,
    	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryAddress2  ELSE Address.Address2 END PrimaryAddress2,
    	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryAddress3  ELSE Address.Address3 END PrimaryAddress3,
    	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryAddress4  ELSE Address.Address4 END PrimaryAddress4,
    	CASE WHEN  Address.CustomerAddressID IS NULL THEN PrimaryAddress.PrimaryTel  ELSE Address.Tel END PrimaryTel,
    	Email.Email EmailAddress,
    	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryPostcodeOut  ELSE Address.PostcodeOut END PrimaryPostcodeOut,
    	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryPostcodeIn  ELSE Address.PostcodeIn END PrimaryPostcodeIn,
    	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN PrimaryAddress.PrimaryCountry  ELSE Address.Country END PrimaryCountry,
    	CASE WHEN  dbo.f_CustomerHasMailingAddress(Address.Address1,Address.Address2,Address.Address3,Address.Address4) = 0 THEN dbo.GetPostcode (PrimaryAddress.PrimaryPostcodeOut ,PrimaryAddress.PrimaryPostcodeIn,'')  ELSE dbo.GetPostcode (Address.PostcodeOut , Address.PostcodeIn,'') END PrimaryPostcode,
    	AddressType.AddressTypeID AS PrimaryAddressTypeID
    FROM	Customer WITH (NOLOCK)
    		CROSS JOIN
    			AddressType WITH (NOLOCK)		
    		LEFT JOIN
    			(		SELECT CustomerAddress.* 
    					FROM 	CustomerAddress WITH (NOLOCK) 
    					WHERE 	CustomerAddress.DateTo IS NULL
    			) Address
    			ON
    			Customer.CustomerID = Address.CustomerID AND
    			AddressType.AddressTypeID = Address.AddressTypeID
    		LEFT JOIN
    			v_Customer_PrimaryAddresss PrimaryAddress WITH (NOLOCK)
    			ON Customer.CustomerID = PrimaryAddress.CustomerID
    		LEFT JOIN
    			v_Customer_Email Email WITH (NOLOCK)
    			ON Customer.CustomerID = Email.CustomerID
    

    The functions and views that the above view called were trivial and not worth repeating, but one must question why such a view has to call other views in the way it did. At one time this would have confused the optimiser, although thankfully it's less of a problem in recent years.

    I strongly feel that NOLOCK is an anti-pattern: it is a poor way of getting READ UNCOMMITTED behaviour in SQL/Server (the database default isolation level here was READ COMMITTED), and it does not do what its name says – queries using NOLOCK still take locks (schema stability locks at the very least), and it does nothing to address the underlying contention that leads to lock escalation. There are many posts, and a little controversy, about NOLOCK if you google for it, so I won't repeat them here, apart from saying this: if lock escalation is occurring, understanding why it is occurring is the answer, not putting NOLOCK on all your queries. DOH! Not only that, I can imagine this software company interviewing developers and rejecting anybody who doesn't use NOLOCK as stupid, with the real result that they only hire inexperienced developers and/or developers who don't bother to fully understand what they are doing. Double DOH!
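
    If readers blocking writers really is the problem, a less dangerous alternative to scattering NOLOCK hints is to turn on row versioning at the database level. A minimal sketch, assuming a database called MyDatabase (an illustrative name):

    -- Readers see the last committed version of each row rather than
    -- blocking on writers, while keeping READ COMMITTED semantics.
    -- 'MyDatabase' is a placeholder; the statement waits until no
    -- other connections are active in the database before completing.
    ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;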

    I think it's fair to say that the above code generated the largest number of WTFs per line from me of any query, anywhere, any time, ever. Why 11 minutes? The WHERE clause was in the third-party application, that's why, so the view was doing at least one full table scan without a WHERE clause of any kind, and the CROSS and LEFT JOINs ensured it did quite a bit more inefficient reading on top of that.

    The solution was to rewrite the query so that it did not use the CROSS JOIN and read all those records:

    SELECT  dbo.Customer.CustomerID,
            -- prefer the designated primary address (pa); otherwise
            -- fall back to the customer's current address (sa)
            CASE WHEN pa.CustomerAddressID IS NOT NULL THEN pa.CustomerAddressID ELSE sa.CustomerAddressID END AS PrimaryAddressID,
            CASE WHEN pa.CustomerAddressID IS NOT NULL THEN pa.Address1 ELSE sa.Address1 END AS PrimaryAddress1,
            CASE WHEN pa.CustomerAddressID IS NOT NULL THEN pa.Address2 ELSE sa.Address2 END AS PrimaryAddress2,
            CASE WHEN pa.CustomerAddressID IS NOT NULL THEN pa.Address3 ELSE sa.Address3 END AS PrimaryAddress3,
            CASE WHEN pa.CustomerAddressID IS NOT NULL THEN pa.Address4 ELSE sa.Address4 END AS PrimaryAddress4,
            CASE WHEN pa.CustomerAddressID IS NOT NULL THEN pa.Tel ELSE sa.Tel END AS Tel

         ...
    FROM dbo.Customer
    -- sa: the customer's current address. Note that CROSS APPLY drops
    -- customers with no current address; the original view's LEFT JOIN
    -- kept them, so use OUTER APPLY here if that behaviour is needed.
    CROSS APPLY (
        SELECT CustomerAddressID, Address1, Address2, Address3, Address4,
               PostcodeOut + ' ' + PostcodeIn AS Postcode, Tel
        FROM dbo.CustomerAddress
        WHERE CustomerAddress.CustomerID = dbo.Customer.CustomerID
          AND CustomerAddress.DateTo IS NULL
    ) sa
    -- pa: the current address of the configured primary address type,
    -- which may not exist, hence OUTER APPLY
    OUTER APPLY (
        SELECT CustomerAddressID, Address1, Address2, Address3, Address4,
               PostcodeOut + ' ' + PostcodeIn AS Postcode, Tel
        FROM dbo.CustomerAddress
        WHERE CustomerAddress.CustomerID = dbo.Customer.CustomerID
          AND CustomerAddress.AddressTypeID = ISNULL((SELECT CAST(Setting AS Integer) FROM SystemSetting WHERE Code = 'PrimaryAddressType'), 1)
          AND CustomerAddress.DateTo IS NULL
    ) pa
    

    When I ran my new query with the execution plan displayed, it suggested that I index the DateTo column, which I did. No such recommendation had appeared for the old query, most likely because its shape meant it couldn't use an index efficiently no matter what indexes were on the tables.
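
    A minimal sketch of such an index (the name is illustrative); since the queries always filter on DateTo IS NULL, a filtered index is one reasonable shape for it:

    -- Supports the "current address" predicate used by both APPLY
    -- blocks; the WHERE clause keeps only current (open-ended) rows.
    CREATE NONCLUSTERED INDEX IX_CustomerAddress_Current
        ON dbo.CustomerAddress (CustomerID, AddressTypeID)
        WHERE DateTo IS NULL;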

    The new query took around a second to return 250,000 rows, and when a CustomerID WHERE clause was added, the time dropped to 20ms. I couldn't get the original to run in less than 11 minutes, although I ran it a few times.

    The main thing to take away from this post is that if you must read many rows, it's better to do it in a query (in SQL) rather than in IL (e.g. C# or VB.NET code-behind), but reading many rows is bad all the same, and you should minimise the number of rows retrieved through good database and query design as far as possible. I managed to make these improvements just by altering the query, but sometimes you have to alter the design of the tables themselves to reduce the sheer complexity of the SQL required to get at the data. Like your code, the database design should reflect the user requirements wherever possible. You can always change it if the requirements change, and this way you end up with simple, reliable code in your business layer rather than trying to sort the mess out there (at a possible cost to the performance and readability of more of your code).

    As an example, look at one of your most common queries – say, searching for customers – and if you don't use SKIP/TAKE-style paging to return just the page of information the user is reading, then consider doing so. This alone might remove a potential lock escalation problem and make everything run faster. See this link for more details – http://stackoverflow.com/questions/548475/efficient-way-to-implement-paging. I might write a post on SKIP/TAKE at some point in the future because it's a good technique that's well worth knowing about.
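
    On the database side, SKIP/TAKE maps onto OFFSET/FETCH, available from SQL Server 2012 onwards. A minimal sketch, with illustrative table, column and parameter names:

    DECLARE @PageNumber int = 3, @PageSize int = 50;

    -- Return just one page of customers. ORDER BY is mandatory with
    -- OFFSET/FETCH and should be deterministic, hence the tie-breaker.
    SELECT CustomerID, CustomerName
    FROM dbo.Customer
    ORDER BY CustomerName, CustomerID
    OFFSET (@PageNumber - 1) * @PageSize ROWS
    FETCH NEXT @PageSize ROWS ONLY;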

    So Boo, when you read this post, don’t forget that when other humans are around you must always pronounce WTF as WOOF. Thanks mate!

  • Sql Server Data Tools (SSDT)

    Sql Server Data Tools (SSDT)

    Quite a few years ago stored procedures were very common. With the advent of ORMs (object-relational mappers) and TDD (test-driven development), they are less common, though what might be termed "misuse" of views and stored procedures still is. I have worked on some systems where the data access logic and business logic lived partly in stored procedures and partly in the app. These systems are very difficult to test, no matter what methods are used (and can become difficult to change if, for example, a requirement appears to store some of the data on a different database). I could go on but I won't…

    However, even though a thinner database layer is both common and usually a good thing, in my opinion it is still a good idea to keep the scripts used to create the database under source control. This way the database code can be versioned alongside the app, and the history of changes and comments – along with all the other benefits of the app's source control system – applies to the database files as well. Even though we don't have big stored procedures so much any more, many systems still contain a lot of views, and of course tables.

    Prior to using the tools I am talking about in this blog post, keeping full track of changes was hard. A few customers I worked for had systems where object create scripts (table, view, stored procedure) and change scripts were maintained in a set of folders, so that when you changed the database you changed the folder structure and contents to match. A little time-consuming, but these systems tended to work very well.

    The first tools I came across, a few years ago, were SQL Compare and SQL Data Compare from Redgate. These are powerful, useful tools that saved me time: when I made a change, I updated the release notes but didn't have to write a change script by hand. Both tools can very easily generate change scripts at release time, provided you have a copy of, or access to, the test database at that point (you are using test and dev databases, aren't you? Please don't tell me you develop on live…).

    However, what really got my attention was Sql Server Data Tools (SSDT) from Microsoft, which is now an integral part of Visual Studio. Installing this add-on makes available a new project type, the SQL Server Database Project, as shown below.

    [Screenshot: the Visual Studio New Project dialog – Sql Server Data Tools (SSDT) New Project]
    Once the project is created, the following page appears, from which you can import an existing database, or create the SQL files for tables, views etc. using full Visual Studio IntelliSense.
    [Screenshot: an empty SSDT project in Visual Studio – Sql Server Data Tools (SSDT) Empty Project]

    The immediate thing I like about Sql Server Data Tools (SSDT) is that it stores your schema as CREATE scripts, but when publishing it emits CREATE or ALTER as appropriate – just a detail, but a surprisingly useful one. SSDT compiles the Visual Studio solution just like other solutions, and has compare tools built in (I still like Redgate's tools in production environments, but SSDT's are good for devs). It keeps an empty copy of your dev database in LocalDB storage, so you can either update the dev database and compare with SSDT, or update SSDT and publish your change to your dev database. It can also be used for updating test and live databases, although I tend to use Redgate's tools for that at the moment.
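
    To show what the declarative model looks like, below is the kind of script an SSDT project holds (an illustrative table, not one from a real project); you only ever write CREATE, and the publish step works out whether a CREATE or an ALTER is needed against the target:

    -- A table definition as it lives in an SSDT project. No ALTER
    -- scripts are kept; publishing diffs this against the target.
    CREATE TABLE [dbo].[Customer]
    (
        [CustomerID]   INT           NOT NULL PRIMARY KEY,
        [CustomerName] NVARCHAR(100) NOT NULL
    );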

    The only problem I have found personally with Sql Server Data Tools (SSDT) is that, because it compiles the solution, some complex production databases – particularly ones whose views and stored procedures reference other databases – do not compile properly. In these situations I have used Redgate's SQL Source Control, which is easier to set up: you have a development database as before, linked in SSMS to the source control system, and no solution compilation is involved, so it works fine in even the most complex of situations. This is a slightly different philosophy to Microsoft's, who target Visual Studio at devs and SSMS at admins, but as a dev I have used SSMS as well, so this is no problem at all for me.

    A good tutorial on Sql Server Data Tools (SSDT) can be found in this Mssql Tips article on Sql Server Data Tools (SSDT). Hope this helps. I am surprised that a lot of developers haven't heard of these tools; they are well worth checking out.

  • SQL Server Performance Tuning with SSD drives

    SQL Server Performance Tuning with SSD drives

    Recently, when explaining to a student the basics of database design and the need for primary keys, foreign keys and indexes generally, I fired up SQL/Server 2012 and typed in a few simple examples on my laptop.

    A very brief summary: you need a primary key to uniquely identify the rows in a table, and you use foreign keys to link to primary keys in other tables. Additionally, an index should be created on each foreign key if you use it as an access path rather than just for referential integrity. You can also specify further indexes, unique or otherwise, as you see fit, taking into account your access paths and being careful not to get too carried away with indexing, due to insert performance and space considerations.
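
    A minimal sketch of such a pair of tables, with illustrative names. Note that SQL Server does not create an index on a foreign key automatically, hence the final statement:

    CREATE TABLE dbo.Customer (
        CustomerID   int IDENTITY(1,1) PRIMARY KEY,
        CustomerName nvarchar(100) NOT NULL
    );

    CREATE TABLE dbo.CustomerOrder (
        OrderID    int IDENTITY(1,1) PRIMARY KEY,
        CustomerID int NOT NULL
            REFERENCES dbo.Customer (CustomerID), -- foreign key
        OrderDate  datetime2 NOT NULL
    );

    -- Index the foreign key, since orders will be fetched by customer.
    CREATE NONCLUSTERED INDEX IX_CustomerOrder_CustomerID
        ON dbo.CustomerOrder (CustomerID);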

    So I set up a simple example with a couple of tables, customers and orders. I then used Redgate's SQL Data Generator 2 to generate a million rows, because the first time I tried, my laptop was taking less than a millisecond to return the data and the times were coming out as zero.

    So, to repeat the tests with more data: I started off with a table scan, then tried access via a unique id (without a primary key defined) and compared the timings. So far so good.

    When I tried to access all customers whose names begin with "A", about 70,000 rows came back in a quarter of a second or so, via a table scan. When I added an index on customer name, the times increased by about 25%. The only explanation I have is that because my laptop uses dual Samsung 256GB SSDs, data access is so quick anyway that going via the index results in more IO and hence slower access.
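
    For reference, a minimal sketch of that test, with illustrative names. One thing worth checking in the plan: if the index does not cover every column the query reads, each matching row costs a key lookup, and at around 70,000 matches that overhead can outweigh a scan regardless of the drive:

    CREATE NONCLUSTERED INDEX IX_Customer_CustomerName
        ON dbo.Customer (CustomerName);

    -- ~70,000 of 1,000,000 rows match. If the query selects columns
    -- beyond the index key (and the clustering key), each match
    -- requires a key lookup back into the base table.
    SELECT *
    FROM dbo.Customer
    WHERE CustomerName LIKE 'A%';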

    I need to do some more work to understand what's happening here. Expect future blog posts on this subject.