Thursday, October 18, 2007

Quick Tip: Deleting all Data from a Table... FAST!!

There are times when you need to “blank out” a table. That is to say, remove all data from the table, but leave the table to be repopulated.

Many people opt for:

DELETE FROM MYTABLE

This certainly works. However, if you have a lot of records (and especially if there are a lot of indexes as well) this approach can be time consuming.

The reason is simple. SQL Server tracks all changes made in a database so they can be rolled back (Using Begin Tran/Rollback/Commit commands). Also, SQL Server is tracking these commands for disaster recovery. If the server goes down later, these changes are stored in a way that SQL Server can reconstruct the database when it boots back up and recovers. So, for every row you delete, SQL Server is writing to a log about that delete action. As we all know, writing data takes some time.
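You can see this logging at work for yourself. A minimal sketch (using the MYTABLE placeholder from above) wrapped in an explicit transaction:

```sql
BEGIN TRAN

DELETE FROM MYTABLE  -- every deleted row is written to the transaction log

ROLLBACK TRAN        -- ...which is exactly why SQL Server can bring them all back
```

All of that per-row log writing is the time you are waiting on when the table is large.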

A quicker option is to use the following:

TRUNCATE TABLE MYTABLE

This is nearly instantaneous. Instead of logging each deleted row, SQL Server deallocates the table's data pages and logs only those deallocations. So, if you remove two million records, only a handful of log entries are made. SQL Server simply changes some internal pointers, and *poof* it's done. Be aware of the trade-offs, though: inside an explicit transaction, TRUNCATE TABLE can actually be rolled back, but outside of one the rows are simply gone. It also requires more permissions than DELETE and cannot be used on a table referenced by a foreign key. That is the price you pay for speed!!

One final note: if you have an Identity field in your table (otherwise known as an auto-increment field for those of you coming from an Access background), then Truncate will reset your numbering back to the seed value (1, or whatever you defined as your starting number). This can cause problems in apps that depend on the field continuing to increment. If you need the field to continue to increment, there is a technique for that. If you're interested in that technique, just drop us a comment and we'll post it for you.
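For the curious, one common approach (a sketch only, and not necessarily the technique teased above) is to note the last identity value before truncating and then reseed with DBCC CHECKIDENT:

```sql
-- note the last identity value handed out
SELECT IDENT_CURRENT('MYTABLE')

TRUNCATE TABLE MYTABLE

-- restart the numbering above the old high-water mark
-- (here assuming the value you noted above was 42)
DBCC CHECKIDENT ('MYTABLE', RESEED, 42)
```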

Thursday, October 11, 2007

Search Stored Procedures with SysComments - The Inner Workings of Your Database Revealed!

Hello and welcome back to SQL Coach!!

Two days ago I shared some information with you about the SysObjects table. Tonight I want to talk about SysComments.

In case you missed the Tuesday evening post, SysComments is one of SQL Server's System Tables. Do not attempt to DELETE, INSERT or UPDATE the data in this table. That is SQL Server's job, and we need to leave it up to SQL Server to decide what to update and when.

Have you ever been in the situation where you needed to create a stored procedure to accomplish something that you know you have already written, or at least seen in a stored procedure that performed a task similar to what you need to accomplish?

And have you ever been frustrated because, no matter how hard you tried, you simply could not recall which stored procedure that code is in?

If you only have a handful of simple stored procedures it would be a simple task to open each and perform a search for some part of the code that you remember.

But, wouldn't it be so much simpler if you could simply write a query and have SQL Server do the work for you and return a listing of any stored procedures that are good candidates?

I am about to show you how to accomplish this... are you ready? Great, let's get started!

Remember that SysComments stores the actual text (code) for your stored procedures and functions. The table also contains an ID field that maps back to the id field in SysObjects.

The column in SysComments that stores the text of your stored procedures and functions is called, simply enough, text.

Here is a very simple example of how to search for all stored procedures that contain an Insert operation:

SELECT so.name
FROM dbo.sysobjects so
INNER JOIN dbo.SysComments sc
ON sc.id = so.id
WHERE so.xtype = 'P'
AND sc.text LIKE '%INSERT%'


Now, one thing you may notice: if you have a stored procedure with a lot of text and several INSERT statements spread throughout the code, the above query will return that stored procedure's name more than once.

The reason for this is that the text column in the SysComments table holds at most 4,000 characters per row (it is an nvarchar(4000) column, so 8,000 bytes). If the text in the procedure is longer than that, SQL Server creates additional rows. Each new row's text column contains the next chunk of up to 4,000 characters (or however many characters are left) until all of the procedure's text is accounted for.
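You can see this chunking for yourself. The colid column in SysComments numbers each chunk of a procedure's text in order, so a sketch like this returns one row per chunk:

```sql
SELECT so.name, sc.colid, DATALENGTH(sc.text) AS chunk_bytes
FROM dbo.sysobjects so
INNER JOIN dbo.syscomments sc
ON sc.id = so.id
WHERE so.xtype = 'P'
ORDER BY so.name, sc.colid
```

Any procedure whose name appears with more than one colid value is one of the long ones described above.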

No worries, though; there is a simple solution to the names being returned multiple times: simply add a DISTINCT to your SELECT statement so it looks like this:

SELECT DISTINCT so.name
FROM dbo.sysobjects so
INNER JOIN dbo.SysComments sc
ON sc.id = so.id
WHERE so.xtype = 'P'
AND sc.text LIKE '%INSERT%'


Now you have the ability to search the text (code and comments) of your stored procedures. And, believe me, this can be a real timesaver once your database grows to more than a dozen or so long procedures. Beyond the time savings, searching like this is fun. Make SQL Server do the work for you!

Let me share one more trick with you before I end this post.

You do not even need to link back to the SysObjects table as long as you do not care what type of objects are returned by your search.

For example, if you want all stored procedures, scalar functions, etc. that contain the word "password" returned, you can accomplish this with the following statement:

SELECT DISTINCT OBJECT_NAME(id)
FROM dbo.SysComments
WHERE text LIKE '%password%'

Here is how it works: SQL Server provides the OBJECT_NAME function. It accepts an ID value (from the sysobjects table) and returns the name. The SysComments table contains an ID column that links back to the ID column in the SysObjects table.

Okay, that's it for this post. Remember to come back often and bring your friends!

Tuesday, October 9, 2007

The Power of SysObjects - The Inner Workings of Your Database Revealed!

Hello and welcome to SQL Coach!!

Tonight I want to share some information with you that I find extremely useful in my daily work.

As you may already know... inside the Tables section (in your Object Explorer window) of your database SQL Server contains what are called System Tables.

SQL Server uses these System Tables to store information describing your database objects and statistics about the use of your objects.

I feel the need to stop right here and say DO NOT INSERT, UPDATE or DELETE any data in the System Tables. Like I said, these are for SQL Server to maintain, not you (or me).

However, there is no problem in SELECTing the data contained in these tables. =)

The tables I want you to learn about are called SysObjects and SysComments.

There are many columns in each table and they can be intimidating for a beginner. So, let's do something easy to show you their "bark is worse than their bite."

First a super quick summary of these tables is in order.

SysObjects stores basic information about all objects inside your database. It's useful for you to know because it tells us the name of each object and the type of the object.

SysComments stores the actual text (code) for your stored procedures and functions. It contains an ID field that maps back to the id field in SysObjects.

Now, let's do some quick queries to help you find your way around these two.

SELECT name
FROM dbo.sysobjects


Executing the statement above will list all objects associated with your database. Probably not that useful, but maybe kind of interesting if you want an overall feel for the size of the database you are working in.

Now, we'll make use of the xtype field found inside SysObjects:

SELECT name
FROM dbo.sysobjects
WHERE xtype = 'U'

Executing the statement above will list all User tables inside your database.

Let's have a look at the Procedures inside your database:

SELECT name
FROM dbo.sysobjects
WHERE xtype = 'P'


Executing the statement above will list all Procedures inside your database (if you want to look at your functions use 'FN' for scalar functions and 'TF' for table functions).
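If you would like a quick inventory of everything at once, here is a small sketch that groups SysObjects by type:

```sql
-- count the objects of each type in the current database
SELECT xtype, COUNT(*) AS object_count
FROM dbo.sysobjects
GROUP BY xtype
ORDER BY xtype
```

Look for 'U' (user tables), 'P' (procedures), 'V' (views), and 'FN'/'TF' (functions) in the results.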

Now, I bet you have a good idea how the Object Explorer window knows what to show, huh?

That's it for now... join us later this week and we'll take a good look at the SysComments table.

Thursday, October 4, 2007

Quick Tip: Left vs. Inner Joins

I have run into a few people now who don’t fully understand the differences between a left join and an inner join. In these cases, the people simply use Left Joins all the time to “cover their bases”. If this is you, please read on…

As far as performance goes, a left join is generally less efficient than an inner join: the server must return every row from the left-hand table, which limits how freely the optimizer can filter rows and reorder the joins. Therefore, it is definitely worth your time to investigate them and understand them. I am not going to re-invent the wheel here, as there are about 1.2 ga-billion articles on this topic, and Books Online even has some pretty decent info. I just want to emphasize that this is a topic worth the time to learn. The query it could save may be your own!
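To see the difference in behavior (not just performance), consider this sketch with hypothetical Customers and Orders tables (the names are mine, purely for illustration):

```sql
-- INNER JOIN: only customers who have at least one order
SELECT c.Name, o.OrderID
FROM Customers c
INNER JOIN Orders o
ON o.CustomerID = c.CustomerID

-- LEFT JOIN: every customer, with OrderID coming back NULL
-- for customers who have no orders at all
SELECT c.Name, o.OrderID
FROM Customers c
LEFT JOIN Orders o
ON o.CustomerID = c.CustomerID
```

If you never check for those NULLs, you probably wanted an inner join in the first place.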

Wednesday, September 26, 2007

Implicit Data Conversion: The “Hidden” Bottleneck

Have you ever written a query that just seems to take forever? You have looked it over time and time again, and it seems to be perfect! It’s straightforward, has an appropriate index, and still takes 5 – 10 seconds to come back.

Well, there can be a number of causes. Today, we are going to discuss “Implicit Conversion.” I ran into this again today helping a colleague out. I have seen this literally bring a system to its knees by causing deadlocks during high load. In short, we can blame this for the fall of the Roman Empire.

Consider the following stored procedure:

Create Procedure WidgetInfoGet @WidgetID Int
As
Select WidgetID, Descr, CurrentPrice, Qty
From WidgetMaster
Where WidgetID = @WidgetID

That looks innocent enough, right? However, a statement like this has the potential to make demons and children cry if that table has several hundred thousand rows in it.

Here’s the deal. Have you ever seen a system where in some tables an ID is stored as a character string, but in others it’s stored as an integer? Sure, we all have. Developers and even DBAs get confused and write a procedure with the wrong data type all the time in that scenario. Say, for example, that our table actually has WidgetID defined as a VarChar(20), but the developer was confused and wrote the procedure as if the WidgetID column were an Int.

Now, every time SQL Server compares a row against @WidgetID, it has to convert the WidgetID column from a VarChar to an Int (Int wins because it has the higher data type precedence). This is an implicit conversion of data type. Internally, SQL Server wraps the column in a convert function, and once the column is buried inside a function, the neatly ordered values in its index no longer line up with the search value, so the index cannot be used for a seek. Therefore, SQL Server has to convert the value in each and every row, scanning the entire table looking for matches. This takes time and, under default locking modes, takes out share locks that prevent other processes from updating records while the scan is taking place.

The fix? Simple! Strangle the person who started mixing data types in the system. Once they are dead, alter your procedure to convert the parameter to the column’s type first:

Create Procedure WidgetInfoGet @WidgetID Int
As
Declare @chrWidgetID VarChar(20)
Set @chrWidgetID = Convert(VarChar(20), @WidgetID)

Select WidgetID, Descr, CurrentPrice, Qty
From WidgetMaster
Where WidgetID = @chrWidgetID

…and you will now retire a millionaire because you single-handedly rescued the world.

Friday, August 31, 2007

Building a Weekend Table - SQL Dates and Times Series

Hello and welcome (back) to the SQL Coach blog!

Tonight I am going to show you how to build a table populated with the dates of all weekend days within a certain date range.

In this project we will be covering:

  • The SmallDateTime SQL Server data type
  • How to create a Variable Temporary Table in SQL Server
  • How to loop through a date range in SQL Server T-SQL
  • How to use DatePart to determine if a date refers to a Saturday or Sunday
Open a new query window and copy and paste the following code into it...

-- Get rid of Row(s) Inserted tracking and messaging
SET NOCOUNT ON

-- Create a temporary variable table
DECLARE @WeekendDays TABLE
(
dDate SmallDateTime,
sType VarChar(3)
)

-- Define the date range we will be using
DECLARE @dDate SmallDateTime,
@dEndDate SmallDateTime

SET @dDate = '19900101'
SET @dEndDate = '20251231'

-- Populate the temporary table with dates
-- of all Saturdays and Sundays that fall
-- within the date range.
-- NOTE: DATEPART(WEEKDAY, ...) depends on your SET DATEFIRST
-- setting; with the U.S. default (DATEFIRST 7), 1 = Sunday
-- and 7 = Saturday, which is what the checks below assume.
WHILE @dDate < @dEndDate

BEGIN
If DATEPART(WEEKDAY, @dDate) = 1
INSERT INTO @WeekendDays Values(@dDate, 'SUN')

Else If DATEPART(WEEKDAY, @dDate) = 7
INSERT INTO @WeekendDays Values(@dDate, 'SAT')

SET @dDate = DATEADD(DAY, 1, @dDate)
END

This code will create and populate a temporary variable table holding the dates of all Saturdays and Sundays from January 1, 1990 through December 31, 2025.

For the remainder of this post I will refer to this code as the base code.

Now, let's have some fun and check out the data in this table! ;)

Suppose you want to know how many weekend days are in December 2007.

You can find out this information by adding the following query below the base code and executing it (it being all code, the whole enchilada):

SELECT COUNT(*) AS NUM_WEEKEND_DAYS_IN_DEC_2007
FROM @WeekendDays
WHERE dDate BETWEEN '20071201' AND '20071231'


Excellent! Now we know December 2007 has 10 weekend days.

Now, let's see the dates of these weekend days.

You can find the dates by adding the following query below the base code and executing it:

SELECT dDate
FROM @WeekendDays
WHERE dDate BETWEEN '20071201' AND '20071231'

You should see 10 rows that begin with this data...

2007-12-01 00:00:00
2007-12-02 00:00:00

Now, let's get rid of the time data because we really do not need it here.

SELECT Convert(Char(8), dDate, 112)
FROM @WeekendDays
WHERE dDate BETWEEN '20071201' AND '20071231'


Now, the data will look like this...

20071201
20071202

Now, let's Americanize this format.


SELECT Cast(Month(dDate) as VarChar)
+ '/' + Cast(Day(dDate) as VarChar)
+ '/' + Cast(Year(dDate) as VarChar)

FROM @WeekendDays
WHERE dDate BETWEEN '20071201' AND '20071231'

As you see, the data now looks like this...

12/1/2007
12/2/2007


I hope you now have a much better understanding of how to work with dates in SQL Server. And I imagine you have some ideas you just can't wait to try out!

I apologize for the lack of indenting in the code. I actually wrote the code using indenting to make it easier to follow but once I published this post the formatting was removed. Ah well, these things happen. I think the code is clear enough you can easily follow it.

Remember to check back soon for more great SQL information. And please help us to spread the word about our blog. Tell your friends. Tell your co-workers. Heck, tell your family and even your pets!

Thanks!

Wednesday, August 29, 2007

Difference between DateTime and SmallDateTime - SQL Dates and Times Series

Greetings and welcome to the SQL Coach blog!

It seems that many people are a little confused about working with time and dates in SQL databases.

Are you one of the people who are wondering about working with time in SQL?

Do you just want to know more about the sql server date time types?

If you answered "YES!" to either question you will want to check this blog daily because I will be posting a thorough and easy-to-follow description of the many options available when working with dates / time in SQL Server.

Assuming all goes to plan I will post this information by Friday night!

In the meantime, here is some information to get you started...

What is the difference between SMALLDATETIME and DATETIME?

"What's that", you ask? You never heard of this SmallDateTime data type?

You are not alone! Based on the databases I have worked in while helping my clients I think it is safe to say many people do not know about the SmallDateTime data type.

That does not change the fact that it does indeed exist. And yes, even your SQL Server 2000 has it!

Here are the main differences you should know about these two datetime types:

1. Range of Dates

A DateTime can range from January 1, 1753 to December 31, 9999.
A SmallDateTime can range from January 1, 1900 to June 6, 2079.


2. Accuracy

DateTime is accurate to one three-hundredth of a second (values are rounded to increments of .000, .003, or .007 seconds).
SmallDateTime is accurate to one minute.


3. Size

DateTime takes up 8 bytes of storage space.
SmallDateTime takes up 4 bytes of storage space.


Armed with this knowledge, you may want to use SmallDateTime instead of DateTime if you only need to represent dates from January 1, 1900 to June 6, 2079 and you do not need accuracy below 1 minute. Why? Simple! Using SmallDateTime will reduce the amount of data your queries are pulling back. The size of each row will be a bit smaller.
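A quick way to see the accuracy difference is to assign the same value to both types (the date and time here are arbitrary):

```sql
DECLARE @dt DateTime,
        @sdt SmallDateTime

SET @dt = '20071018 14:35:29.997'
SET @sdt = @dt  -- implicit conversion to SmallDateTime

-- @dt keeps the seconds and milliseconds;
-- @sdt is rounded to the nearest minute (14:35:00)
SELECT @dt AS dt_value, @sdt AS sdt_value
```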


That's it for now... be sure to check back soon for more information on using datetime in SQL Server.

Monday, August 20, 2007

Joins / Wheres and Indexing

Another common error I have seen is anticipating what indexes will be needed on a table based on what other tables you are going to join to. For example, say you have a Puppies table and a Kid table. Most likely, you’ll have a Kid_ID column in the Puppies table pointing back to Kid; no problem there. Nearly always, you will see an index on Kid_ID. In many systems, this index is rarely used, if ever. Also, when searching for the kids who own a small Labrador with white spots on its back whose owner’s last name is ‘Johnson’, the query is not running as efficiently as it could, even though there is clearly an index on the Characteristics column of the Puppies table.

Why?

The Kid_ID index is merely too simple for SQL Server to use effectively. Think back to “The” Analogy (http://sqlcoach.blogspot.com/2007/08/analogy.html). If you’re the super secret agent, do you want to drive out to the warehouse, grab the index book that lists all dogs on the planet, then grab the reference for all kids on the planet and start cross-referencing them? No!!! You’d have to cross-reference several billion (if not more) entries before starting to weed out entries based on their type. In other words, you’d be cross-referencing hound dogs, shepherds, etc., which is a lot of work, especially since in the end you don’t care about those breeds right now.

It makes more sense to grab the index reference on dog characteristics (i.e. the Characteristics Index we mentioned above.) So, SQL Server ignores the Kid_ID index and uses the Characteristics index instead.

So, back to our analogy. The agent uses the Characteristics cross reference to look for all small Labs w/white spots. He finds a mere 36,000. Now he runs into the warehouse to look through all 36,000 files looking for kids with the last name of “Johnson.” Good news… that will only take about 32 days….

What the super secret agent REALLY needs is a cross-reference that has Characteristics AND a Kid_ID reference. Then, if he had a reference that had Kid_ID and Last_Name, he could cross-reference the two of them without ever leaving the reference room. The super-secret agent just LOVES saving time by not going into the warehouse.

So, the moral of the story…

When creating an index, create a compound key (an index with multiple fields in it). The first field(s) should be those commonly used in WHERE clauses (in this case, Characteristics). Next, place fields that commonly participate in Join clauses (such as Kid_ID in this example). Now your DB is ready to Rock!!
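In the puppies-and-kids example above, such compound indexes might look like this (the index names are my own invention):

```sql
-- WHERE-clause column first, then the joining column
CREATE INDEX IX_Puppies_Characteristics_KidID
ON Puppies (Characteristics, Kid_ID)

-- the matching cross-reference on the other side of the join
CREATE INDEX IX_Kid_LastName_KidID
ON Kid (Last_Name, Kid_ID)
```

With these two in place, the super secret agent can answer the "Johnson" question without ever leaving the reference room.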

Some of you may be asking: “How do I know which columns are being used in WHERE clauses the most?” Well, there are two choices there: (1) Get tools that profile your database and give you quantifiable information about that very topic. These are rare and expensive. If you’re a smaller shop, then (2) Start analyzing! Ask the project lead, ask the users, use common sense, etc. Creating indexes can be an art in this respect. Go buy a smock!!

Friday, August 17, 2007

Order of Where Clause

Disclaimer: For those of you looking for "Where's the Order By clause?", try googling again. You messed up.

Now on to the post....

Many people, once they start getting comfortable writing SQL, begin asking the same questions. One such common question is “Does it matter the order of stuff that I put on the where clause?” This question usually means different things to different people, so let’s dig in a bit.

The short answer, with Microsoft SQL Server, is “No.” SQL Server will internally re-order where clause items to properly match indexes or other time saving devices. The only exception to this is if items are grouped with parentheses.

That being said, other Database Management Systems (DBMS) may be impacted by where clause order. I used to work with Sybase SQL Anywhere and Pervasive databases where it did make a difference. If you’re not working with SQL Server, take all this with a grain of salt and do your own testing.

I said earlier that this question means different things to different people. The following examples will highlight what I mean:

These two statements are equivalent in Microsoft SQL Server:

SELECT CUSTOMER_ANNOYANCE_FACTOR
FROM CUSTOMERS
WHERE HAVE_CUSSED_ME_BEFORE = 1
AND STOLE_MY_GIRLFRIEND = 0

And

SELECT CUSTOMER_ANNOYANCE_FACTOR
FROM CUSTOMERS
WHERE STOLE_MY_GIRLFRIEND = 0
AND HAVE_CUSSED_ME_BEFORE = 1

Notice that the first statement begins by filtering on “Have Cussed Me Before” and in the second statement, these are reversed.

As well, others may actually be asking about the following example. Again, these two are equivalent:

SELECT GEEK_LEVEL
FROM GEEK_RATINGS
WHERE GEEK_LEVEL = 25

And

SELECT GEEK_LEVEL
FROM GEEK_RATINGS
WHERE 25 = GEEK_LEVEL

Well, there ya have it. Yet another fine advanced SQL tip (OK, maybe not so advanced this time) from the Acme Advanced SQL Tuning squad.

Happy Tuning!

Monday, August 13, 2007

Poor Little Misunderstood Views

Some people love ‘em, some people hate ‘em. But one thing I have found to be nearly universal is that most people misunderstand how Views are used in SQL Server.

The Backdrop
A View is merely a pre-defined query that can be treated in many ways as a table. This allows DBAs or Database Developers to pull data from several tables and expose it as a single “virtual table”. This has many advantages: the consumer of the view doesn’t have to have a detailed understanding of the database table layouts (schema); and a single addition or fix to the view then ripples out to all the consuming queries in one fell swoop.

The Problem
However, many developers struggle with the performance of Views. Most note that they operate slower than simply joining in the information they need from the base tables in every query, throwing out the advantages of the views. I know many DBAs and Database Developers who live by the “no views allowed on my server” rule.

The Solution, or, How to Make Life Better for Everyone
Views can be a friend or a foe. The latter scenario is generally born of a misunderstanding of how Views actually work. Once you get to know how SQL Server actually makes use of a view, it will make much more sense, and you can make more appropriate choices which will allow you to benefit from Views without impacting performance.

The Misconceptions
When views were first explained to me, they were explained incorrectly. I have, since then, heard others regurgitate the same falsehood countless times. I operated under this false knowledge for years until recently, working with Query Analyzer and actually breaking down the query plans, I saw “the light.”

Most of us were taught that Views are slower because the database has to calculate them BEFORE they are used to join to other tables and BEFORE the where clauses are applied. If there are a lot of tables in the View, then this process slows everything down. This explanation seems to make sense on the surface, and is therefore easily accepted. However, NOTHING IS FURTHER FROM THE TRUTH on SQL Server!

The fact of the matter is that when a query is being broken down by SQL Server’s optimizer, it looks at the fields within the select to see which of them are needed by the consuming query. If it needs a given field, then it extracts it from the view definition, along with its table from the FROM clause, and any restrictions it needs from the WHERE clause or other clauses (GROUP BY, HAVING, etc.). These extracted elements are then merged into the consuming query and are generally treated as a sub-query. The optimizer then joins the data together along indexes as best it can, just as it does with non-view elements, and then the entire query is run. The view is NOT pre-calculated just because it came from a view definition.
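To make this concrete, here is a sketch using hypothetical Customers and Orders tables (all names are mine, purely for illustration):

```sql
CREATE VIEW dbo.vwOrderSummary
AS
SELECT c.CustomerID, c.Name, o.OrderID, o.OrderDate
FROM dbo.Customers c
INNER JOIN dbo.Orders o
ON o.CustomerID = c.CustomerID
GO

-- A consuming query:
SELECT Name, OrderDate
FROM dbo.vwOrderSummary
WHERE CustomerID = 42

-- The optimizer does not run the view first; it expands the
-- definition and optimizes something equivalent to:
SELECT c.Name, o.OrderDate
FROM dbo.Customers c
INNER JOIN dbo.Orders o
ON o.CustomerID = c.CustomerID
WHERE c.CustomerID = 42
```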

So, why does it often run slower? Three reasons:

Reason 1 - Sort Order: Sub-queries often suffer from not being sequenced in an order that can easily be merged into the main query. This causes the server to do extra work to sort the data returned by the sub-query before merging it. In this circumstance, the data is pre-calculated so it can be sorted. However, if the index used by the sub-query already returns the data in the needed order, no sorting is necessary.

Fix 1 - Watch your query plans. If an appropriate index exists that will return data in the same order that is needed for the join, then it will be pulled in without having to sort it, thus avoiding the need to pre-fetch and pre-calculate.

Reason 2 – Inner Joins: When the view is broken down to see what fields on the SELECT are needed, and then the corresponding table from the FROM clause, it has to go one step further. It must consider anything in the WHERE clause that may throw out data. As well, Inner Joins from the table in the FROM clause can also throw out data if the joined in table does not have a matching row. Since the optimizer doesn’t know whether or not the Inner Join was used as a filtering device, it has to include it. Very often, tables are joined in to show data that the consuming query doesn’t need, not just as a filter. In these cases, the Inner Join only causes SQL Server to do more work for no good reason.

Fix 2 - Whenever possible, limit the number of inner joins in the View definition. Try to only use Inner Joins when you are certain most or all consuming queries will need data from the joined table. If there are many cases where the consuming query will not, consider multiple Views to service the various cases.

Side note: Left Joins are not used as filters. If a View left joins in a table, but there are no fields used in that table, it will be eliminated when the view is pulled in.

Reason 3 – Redundant Table Calls: When you create a view, you can actually use another view as a source of data. This practice can be nested practically without limit. Since each of these views will have its query definition pulled in as a sub-query, it’s very possible that the same base table will participate in the query multiple times. This is, generally, just a waste. Why go to the same place multiple times?

Fix 3 - Try to limit yourself to only using one view in a query. Also, try to avoid using Views as a base table within another view. If you find yourself needing info from multiple views, consider breaking form and joining in the base tables rather than pulling in the views. In some cases, doing this has allowed me to call data from a table once rather than THREE OR FOUR times, cutting the response time to a tiny fraction of what it was (and I’m not kidding). Sometimes you have to ignore the advantages of the views to gain performance.

Conclusion
Views do have their place, but knowing how they are used will save you grief and performance time. Knowing that Views are NOT pre-calculated, but rather are broken down into their parts with the “useful bits” (and the inner-joined fluff) pulled into the main query, can explain a lot of the performance impacts of these objects.

Happy tuning!

Wednesday, August 8, 2007

"THE" Analogy

INTRODUCTION
Databases, such as Microsoft SQL Server, Oracle, DB2, etc, can sometimes seem to be marvelous shiny “Magic Happens Here” boxes where you simply ask it for information and “poof” the answer is dropped into your lap. At times it can remind you of your favorite sci-fi show. “My, those engineers who build these databases are really really really smart!!”

Well, yeah, they are really smart. And the databases do some pretty incredible things. However, always remember that in the end, it’s just a computer that has to follow the laws of time and space just like everything else. Given its mortal status, the wise developer will always bear in mind that there is more than one way to skin the proverbial cat, and some ways are much faster than others.

Here’s a big statement

WHEN WRITING DATABASE QUERIES, 95% OF THE TIME THE BOTTLENECK IS IN THE I/O, not the CPU, and not the memory. If you can cut down on the I/O operations, your query will run faster. In fact, other queries will too because they don’t have to compete with your query as much!

Now, this statement is huge, so let’s repeat it. In fact, let’s chant it under our breath as we write queries (everyone already thinks we’re strange, so there is no damage done here):

WHEN WRITING DATABASE QUERIES, 95% OF THE TIME THE BOTTLENECK IS IN THE I/O, not the CPU, and not the memory.

(Copy and Paste is great, eh?) The enormity of this statement just BEGS for an analogy, so here we go…

THE ANALOGY
You work at a super secret government agency whose initials no one has ever heard of. You have been given an empty desk and the instruction that when your boss asks for information, you need to provide it to him as quickly as possible. The information you need to find is in the super secret warehouse on the other side of the city.

The boss comes in. He’s big, mean looking, and schooled in the arts of killing and maiming without remorse. He orders you to give him the total number of suspected skateboarders in the US who currently own blue GEO Metros. This is top priority; the security of our nation depends on it. “Right away,” you reply as you’re grabbing your car keys and heading out the door with no other thought on your mind but to satisfy this dangerous man.

You jump in your car and head for the highway. After about 25 minutes of bumper-to-bumper traffic, you finally navigate your way to the underground fortress that houses all the information. A brief three-minute stop to let the guards check that your name is on the list, plus a quick strip search, and you’re on your way to the belly of the repository buried underneath a mountain.

With vim and vigor you jump out, grab the first box, and begin digging through the “A’s” looking for any skateboarders. As you find them, you also check to see if they are driving a blue car. If so, you also double-check that it’s a GEO Metro. The first box down, you grab the next. Two and one-half years later, you have completed looking through all the boxes and gratefully return to the office with the answer in hand!!

The boss is somewhat surprised to see you! You proudly announce, “63!! The total number is 63!!” He sneers a thanks at you, muttering something about how long it took.

“OK,” he says, “Now I need to know how many of them are over the age of 60.” You are crest-fallen. You do not relish the idea of going through all that again.

The smarter super secret agent:
Work smarter, not harder, right? Super-Secret Agent #2 (SSA2) is in the same boat. The same desk, same job description, same distance to the warehouse. When his big, mean-looking boss asks for the same information, he hops in his car, crosses city traffic, and passes the security checkpoint. However, he’s smart enough to go to the “Cross-Reference” room. Once in this room, he pulls out the “Skate or Die Index” book and the “You Are What You Drive Index” book and begins to cross-reference the two. About three hours later, he has his list of 63 candidates. Now all that remains is cross-referencing this list with the “Index for All Ages” book and crossing off all those who are not yet 60. Just in case, SSA2 makes a copy of each of these index books, as he suspects the boss may ask him additional questions along this line. He’ll just keep the indexes on his desk to avoid city traffic next time.

With the answer in hand (a somewhat surprising 52), he starts back for the office. SSA2 worked much smarter and produced the answer in only about three hours, where the first agent took two and a half years.

Breakdown of the Thinly Veiled Analogy:
OK, rapid fire, here we go:
The Boss: A Database User
The Super Secret Agents (i.e. You): Queries!!!
Cross-town traffic: The I/O bus, on the way to storage!
Security Guard: Database Authentication
Warehouse: Data storage (Tables)
Cross Reference Index Books: Er, Indexes
One last translation: Making a copy of the Cross-Reference Index books (and/or files of information) and bringing them back to your desk is “Caching.” You can see how important caching is as it completely eliminates the need to go across town if you already have a copy of the file. Your desk, in this case, would be your RAM. The more RAM you have access to, the more desk space you have to keep your copied (cached) information.

We will be using this analogy to discuss many ideas on query performance tuning. This will come up over and over as most tuning revolves around eliminating I/O, or (per this analogy) keeping the Agents from having to cross the city and look through the warehouse for information.

So, now with the analogy under your belt, let’s move on to the tuning!!!

Tuesday, July 31, 2007

Thinking Outside the Box - Help Writing Faster Queries - Part 3 of 3

Hello and welcome to our blog!

This is part three of our series to help you write faster queries.

In part one you learned about the power of indexes, when an index should be used and how to create an index.

In part two you learned about the performance hits associated with using function calls in your Where clauses.

Now, in part three, let’s think outside the box a bit. I know I said I was going to give tips on how to make your query faster… and I am. However, this may be a bit off the beaten path. But doesn’t that make for a great tip? Something you haven’t already heard a hundred times?


Help Writing Faster SQL Queries
Part 3 : Thinking Outside the Box!


Here’s the backdrop: The app is in the can, working great, and the rollout begins. Within a few days, you start hearing murmurs that parts of the app are slow. These murmurings soon become help tickets. These help tickets soon become meetings with the boss and the users. These meetings often turn into alcoholism…. You get the point.

This unfortunate situation arises all too frequently, and there is a relatively simple solution to it.

You see, everything always performs well in your development environment because you have relatively few records in your development database! Sure, if I only had 30 records in my main table, my application would work wonderfully too! Once it gets into the real world, and a few hundred thousand or a few million records are input or imported, the need for performance tuning becomes critically apparent.

The solution is simple: create the same number of records in your development environment as you’d expect to have in the first year of operation. If you have importable data (say, from legacy systems or third-party feeds) then pump it in. If you do not, there are many open source utilities for creating mock data that closely conforms to your actual data. If you are the creative type, you may write your own code to populate the tables with "dummy data". Regardless of how you fill the tables, the point is that now, as you’re first writing your queries, you’ll have a really good idea whether you need to concentrate on a particular query to avoid future headaches and counseling bills.
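If you decide to roll your own, a simple T-SQL loop does the trick. Here is a rough sketch using the Person_Data table from our other examples; the column names and row count are just assumptions, so adjust them to your actual schema:

```sql
-- Sketch: pump 100,000 dummy rows into Person_Data for testing.
-- Column names follow this blog's earlier examples; adjust to your schema.
Declare @i Int
Set @i = 1
While @i <= 100000
Begin
    Insert Into Person_Data (Name_First, Name_Last, Next_Contact_Date)
    Values (
        'First' + Cast(@i As VarChar(10)),
        'Last' + Cast(@i % 500 As VarChar(10)), -- repeat last names so searches return sets
        DateAdd(day, @i % 365, GetDate())
    )
    Set @i = @i + 1
End
```

One tip: committing in batches (a Begin Tran/Commit around every few thousand inserts) will speed the load up considerably, since the log gets flushed far less often.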

Another thing you may wish to do is model the effect of concurrent users in your database system. One easy way to do this is to create a script that simply performs searches on a table (or across multiple tables joined together). In addition, you can create another script to insert records in a table and a third script to update records.

Schedule these scripts to run every minute throughout the day and night. Doing this will result in 60 "dummy records" being inserted each hour (1,440 records added each day). Of course, you should tweak the time based on the amount of transactions you expect your system to have "in the real world". You may want to add one record per hour or you may want to add 100 records every minute.
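As a concrete sketch, the three scripts might look something like the following (scheduled, say, as SQL Server Agent jobs; the table and column names are the hypothetical ones from our examples):

```sql
-- Script 1 (search): run every minute to simulate read load
Select Person_Data_ID, Name_First, Name_Last
From Person_Data
Where Name_Last = 'Smith'

-- Script 2 (insert): run every minute to simulate write load
Insert Into Person_Data (Name_First, Name_Last, Next_Contact_Date)
Values ('Load', 'Test', GetDate())

-- Script 3 (update): run every minute to simulate modification load
Update Person_Data
Set Next_Contact_Date = DateAdd(day, 1, Next_Contact_Date)
Where Person_Data_ID = (Select Max(Person_Data_ID) From Person_Data)
```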

Now, when developing your database storage and retrieval systems you will not be "alone in the box". These automated processes will simulate some of the load you can expect to deal with on your Production environment when users are searching for information, entering information and updating information. This can go a long way to help you identify areas vulnerable to contention.

This concludes our series providing help writing faster SQL queries.

However, this certainly is not the end of the information we intend to share!

Please bookmark this blog now so you can easily come back tomorrow and learn more great SQL programming tips and tricks.

The SQL Coach Team

Sunday, July 29, 2007

Function Calls are Evil Incarnate - Help Writing Faster SQL Queries - Part 2 of 3

Hello again and welcome to our blog.

This is part two of our three part series to help you write faster queries.

In part one you learned about the power of indexes, when an index should be used and how to create an index.

Let's get on to part two, shall we?


Help Writing Faster SQL Queries
Part 2 : Function Calls are Evil Incarnate!


The following queries look extremely innocent. However, they are all application killers once the table grows to more than a couple thousand rows:

Select Person_Data_ID, Name_First, Name_Last
From Person_Data
Where Next_Contact_Date > GetDate()

Select Count(*)
From Person_Data
Where Upper(Name_Last) = 'SMITH'

Select Avg(Collar_Size)
From Person_Data
Where dbo.MyFunc(Collar_Size) > dbo.MyOtherFunc(22)

For each of these statements, a table scan will be used to try and fetch data.

A table scan is simply an operation where the database looks at each and every record in the table to see which ones meet the criteria in the Where clause.

We would much rather the database use an index and fetch only the 20 records it needs than read 200,000 records in a table scan to locate those same 20 records. However, the use of functions (GetDate(), Upper(), MyFunc(), MyOtherFunc(), etc.) keeps it from doing that.

Why do functions keep the database from using indexes? It’s simply because the database does not know the result of the function.

Each time GetDate() is called, a different answer could come back. Each time MyFunc() is called it could come back with a different result since the function could potentially be based on the input value, the time of day, or data that is being modified elsewhere in the database. The only way the database can know for certain the answer to the function call is to make it and see what the answer is. Since it doesn’t know ahead of time what the answer is, it cannot use the index to go to the corresponding entry.

So, whenever possible, avoid the use of functions in your Where clauses. Great solutions to these problems are (in order):

Select Person_Data_ID, Name_First, Name_Last
From Person_Data
Where Next_Contact_Date > GetDate()

Becomes

Declare @DT DateTime
Set @DT = GetDate()
Select Person_Data_ID, Name_First, Name_Last
From Person_Data
Where Next_Contact_Date > @DT

Voila, the database optimizer gets the value of GetDate() ahead of time. Since the value of @DT will not change while the query is running, it can use an appropriate index.

Select Count(*)
From Person_Data
Where Upper(Name_Last) = 'SMITH'

Becomes

Select Count(*)
From Person_Data
Where Name_Last = 'SMITH'

“Whoa?” I hear you cry. All you did was take the Upper() function out. What about case sensitivity?!? Actually, I’m cheating a bit here. With Microsoft SQL Server, the default index collations are case insensitive. In plain terms, that means the searches are case insensitive by default. With Oracle and other database engines, you can choose to make your indexes case insensitive, so why not? For a simple name search, it’s a good idea. And, it avoids the dreaded Table Scan.
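If you want to be explicit about it (or your server happens to have a case-sensitive default), you can declare the column with a case-insensitive collation when you build the table. This is just a sketch; the collation name shown is one common SQL Server default, so check what your server uses:

```sql
-- Sketch: a CI (case-insensitive) collation on Name_Last means plain
-- equality matches 'Smith', 'SMITH', and 'smith' alike -- and the
-- index can still be used, no Upper() required.
Create Table Person_Data (
    Person_Data_ID Int Identity(1,1) Primary Key,
    Name_First VarChar(50),
    Name_Last VarChar(50) Collate SQL_Latin1_General_CP1_CI_AS
)

Create Index IX_PersonData_NameLast On Person_Data (Name_Last)
```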

The third example is a bit tougher. Essentially, if your function is extremely complex, you may not have a choice but to use it and let the user know to expect poor performance (a please wait screen goes a long way here, guys). However, if the function is merely a simple multiplication (i.e. Unit_Price * Qty) then simply do the math in the statement and avoid the function.
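For instance, assuming a hypothetical Order_Line table where a dbo.LineTotal() function merely multiplies price by quantity, you can skip the function entirely:

```sql
-- Instead of this (the function call forces a table scan):
-- Select Order_ID From Order_Line Where dbo.LineTotal(Unit_Price, Qty) > 100

-- ...do the simple math inline:
Select Order_ID
From Order_Line
Where Unit_Price * Qty > 100
```

To be fair, Unit_Price * Qty in the Where clause still isn’t index-friendly by itself, but you have at least saved the per-row function-call overhead; an indexed computed column can take it the rest of the way.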

Please note: Many developers have it beaten into them to make reusable code so changes can be made in one central place. The fact of the matter here is that a SQL statement is not the same as a C program!! You should adjust accordingly. One of the adjustments is to give up reusable code in this context in favor of better performance.

We hope you have enjoyed the information presented in Part 2 of our series. Were you paying attention? Let's do a quick test and find out.

Is the following statement true or false?

You should always strive to use function calls in your Where clause to maximize code reuse and eliminate redundant code from your database.

We really hope you said the answer to the above question is false!

Part three will be ready in a day or so. We recommend that you bookmark this blog now so you can easily come back tomorrow and learn more great SQL Server tuning practices.

The SQL Coach Team

Friday, July 27, 2007

SQL Server Performance Tuning Video Lesson #1 Now Available!

Just letting you know the first video is now available.

This video shows you how to add an index to a table.

Watch as I add the index IX_PersonData_NameLastNameFirst on the Person_Data table!! When you see how easy this is you will wonder why on earth you have not done this before.

You can watch this video on YouTube.

Just click on the following link: Learn How to Add an Index to a Table!

I did not realize the video would be shown so small at YouTube. Don't worry though, lesson learned.

Future videos will be zoomed in closer to the action. If enough people request it I will rebuild this video so you can better see what is happening.


The SQL Coach Team

Thursday, July 26, 2007

The Power of Database Indexes - Help Writing Faster Queries - Part 1 of 3

Good evening! Well, it's evening here in Missouri at the time I am writing this. ;)

Before I go any further let me say we will be making some videos to show you how to do this stuff. So, if you are one of those people who learn better / faster by watching rather than reading check back soon. The video lesson for Part 1 will be available within a day or two at most. I actually plan on doing this first video tomorrow after work.

Okay, now are you ready to pick up some quick help writing faster sql queries? I hope so because we are now just moments away from Part 1 of this little series.

Let's go!


Help Writing Faster SQL Queries
Part 1 : The Power of Database Indexes

Many developers ignore indexes altogether. Many do not understand them.

An index is merely a Table of Contents to the data in the database table. While developing an application, print out the schema of the database, including indexes, so you know what they are while writing your queries.

If an appropriate index is missing, have someone experienced with indexes create it (whether that be you, the developer on the other side of the wall, or a DBA).

Here’s the impact. Say you are writing a contact management system. You would probably have a table with first name, last name, etc. For simplicity, we’ll keep this all one big table and obvious field names.

Want an example? Okay, you’d probably write a query that looks something like:

Select Person_Data_ID, Name_First, Name_Last
From Person_Data
Where Name_Last = @NameLast
And Name_First = @NameFirst

With 100 rows (i.e., records), this query will return instantly. However, as your database grows to several thousand rows, this query will slow down, eventually hitting the seconds range. If you merely add an index on the columns Name_Last and Name_First, you will have basically instant return times for the life of the application.


I bet indexes don't seem so complicated now, do they?

Just remember to look for the columns (Name_Last and Name_First in our example above) that appear in the Where clause. These are the columns that need an index.


Creating the index

There are different ways to create the actual index. If you are more comfortable with GUIs then use Enterprise Manager or the appropriate tool for your database.

If you are comfortable working in the query window you can use the command format to build your index.

create index INDEX_NAME_HERE on TABLE_NAME_HERE (column_name, column_name, ...)

This is how you create the index needed for the query above:

create index IX_PersonData_NameLastNameFirst on Person_Data (Name_Last, Name_First)

Hope you have enjoyed this first tip and now have a better understanding of indexes.

If you have any questions please post them on this blog!

Just use the link immediately below this post (the one that says comments).


Happy indexing!

The SQL Coach Team

Wednesday, July 25, 2007

New Three Part Series on Writing Faster Queries Starts Soon!

Be sure to bookmark this page and come back tomorrow when we begin our Three-Part Series on Writing Faster SQL Queries.

We'll cover three things you can do immediately to get your queries out of the database faster.

Whether you are writing batch processes, a database-driven website or a database-centric desktop application you can put these techniques into use.

Each will be explained in simple ordinary language and any "Average Joe or Jane" programmer can put these into practice.

How cool is that? =)


Cya tomorrow!

The SQL Coach Team

Monday, July 23, 2007

SQL Coach Welcomes You...

Hello and welcome to the SQL Coach blog.

This blog has been created by two professional SQL Developers. Our mission is to create a fantastic online resource filled with advanced SQL Server information.

Here is a list describing only a portion of the things you will learn in the coming weeks:

  • Discover a very special, often overlooked, Temporary Table that can be used to greatly accelerate your sql queries.

  • Determine which columns and tables to index, when to add indexes... and when not to.

  • What is the "Intra-query parallelism caused your server command to deadlock" error... and how you can protect yourself against it!

  • One of the most common causes of "latch wait" ...and how to greatly reduce latch wait.

  • How a simple trick of rearranging the order of criteria in a where clause can dramatically accelerate the speed of a select statement.

  • ... and much more!

Thanks for stopping by the SQL Coach Blog. We look forward to you returning soon.

This blog was just launched today so please be patient and we will fill this blog with good solid hard-to-find SQL Server information as soon as possible.

While you are waiting for the information to arrive... please help us out by spreading the word. Tell all your co-workers and friends about our new blog, the SQL Coach!

Best regards,

B. Vegiard
G. Benjamin