Friday, August 31, 2007

Building a Weekend Table - SQL Dates and Times Series

Hello and welcome (back) to the SQL Coach blog!

Tonight I am going to show you how to build a table populated with the dates of all weekend days within a certain date range.

In this project we will be covering:

  • The SmallDateTime SQL Server data type
  • How to create a Variable Temporary Table in SQL Server
  • How to loop through a date range in SQL Server T-SQL
  • How to use DatePart to determine if a date refers to a Saturday or Sunday
Open a new query window and copy and paste the following code into it...

-- Get rid of Row(s) Inserted tracking and messaging
SET NOCOUNT ON

-- Create a temporary variable table
DECLARE @WeekendDays TABLE
(
    dDate SmallDateTime,
    sType VarChar(3)
)

-- Define the date range we will be using
DECLARE @dDate SmallDateTime,
        @dEndDate SmallDateTime

SET @dDate = '19900101'
SET @dEndDate = '20251231'

-- Populate the temporary table with the dates
-- of all Saturdays and Sundays that fall
-- within the date range (end date included)
WHILE @dDate <= @dEndDate
BEGIN
    -- With the default DATEFIRST setting, 1 = Sunday and 7 = Saturday
    IF DATEPART(WEEKDAY, @dDate) = 1
        INSERT INTO @WeekendDays VALUES (@dDate, 'SUN')
    ELSE IF DATEPART(WEEKDAY, @dDate) = 7
        INSERT INTO @WeekendDays VALUES (@dDate, 'SAT')

    SET @dDate = DATEADD(DAY, 1, @dDate)
END

This code will create and populate a temporary variable table holding the dates of all Saturdays and Sundays from January 1, 1990 through December 31, 2025.

For the remainder of this post I will refer to this code as the base code.
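
One caveat worth knowing: the value DATEPART(WEEKDAY, ...) returns depends on SQL Server's DATEFIRST setting (7, meaning the week starts on Sunday, is the default for US English logins, and is what the base code assumes). If your server is configured differently, you can pin the setting explicitly at the top of the script. Here is a minimal sketch of that idea:

SET DATEFIRST 7

-- You can check the current setting at any time
SELECT @@DATEFIRST AS CURRENT_DATEFIRST

With DATEFIRST set to 7, Sunday is always weekday 1 and Saturday is always weekday 7, no matter what language your login uses.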

Now, let's have some fun and check out the data in this table! ;)

Suppose you want to know how many weekend days there are in December 2007.

You can find out by adding the following query below the base code and executing all of it (the base code plus the query, the whole enchilada):

SELECT COUNT(*) AS NUM_WEEKEND_DAYS_IN_DEC_2007
FROM @WeekendDays
WHERE dDate BETWEEN '20071201' AND '20071231'


Excellent! Now we know December 2007 has 10 weekend days.
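
The sType column also lets you count just one kind of weekend day. For example, to count only the Saturdays in December 2007, a small variation on the query above does the trick:

SELECT COUNT(*) AS NUM_SATURDAYS_IN_DEC_2007
FROM @WeekendDays
WHERE sType = 'SAT'
AND dDate BETWEEN '20071201' AND '20071231'

December 2007 begins on a Saturday, so you should see 5.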

Now, let's see the dates of these weekend days.

You can find the dates by adding the following query below the base code and executing it:

SELECT dDate
FROM @WeekendDays
WHERE dDate BETWEEN '20071201' AND '20071231'

You should see 10 rows that begin with this data...

2007-12-01 00:00:00
2007-12-02 00:00:00

Now, let's get rid of the time data because we really do not need it here.

SELECT Convert(Char(8), dDate, 112)
FROM @WeekendDays
WHERE dDate BETWEEN '20071201' AND '20071231'


Now, the data will look like this...

20071201
20071202

Now, let's Americanize this format.


SELECT Cast(Month(dDate) as VarChar)
+ '/' + Cast(Day(dDate) as VarChar)
+ '/' + Cast(Year(dDate) as VarChar)

FROM @WeekendDays
WHERE dDate BETWEEN '20071201' AND '20071231'

As you see, the data now looks like this...

12/1/2007
12/2/2007
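
If you prefer a fixed-width, zero-padded version of the same American format (12/01/2007 instead of 12/1/2007), SQL Server's CONVERT style 101 gets you there in a single call:

SELECT CONVERT(Char(10), dDate, 101) AS AmericanDate
FROM @WeekendDays
WHERE dDate BETWEEN '20071201' AND '20071231'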


I hope you now have a much better understanding of how to work with dates in SQL Server. And I imagine you have some ideas you just can't wait to try out!


Remember to check back soon for more great SQL information. And please help us to spread the word about our blog. Tell your friends. Tell your co-workers. Heck, tell your family and even your pets!

Thanks!

Wednesday, August 29, 2007

Difference between DateTime and SmallDateTime - SQL Dates and Times Series

Greetings and welcome to the SQL Coach blog!

It seems that many people are a little confused about working with time and dates in SQL databases.

Are you one of the people who are wondering about working with time in SQL?

Do you just want to know more about the sql server date time types?

If you answered "YES!" to either question, you will want to check this blog daily, because I will be posting a thorough and easy-to-follow description of the many options available when working with dates and times in SQL Server.

Assuming all goes to plan I will post this information by Friday night!

In the meantime, here is some information to get you started...

What is the difference between SMALLDATETIME and DATETIME?

"What's that", you ask? You never heard of this SmallDateTime data type?

You are not alone! Based on the databases I have worked in while helping my clients I think it is safe to say many people do not know about the SmallDateTime data type.

That does not change the fact that it does indeed exist. And yes, even your SQL Server 2000 has it!

Here are the main differences you should know about these two datetime types:

1. Range of Dates

A DateTime can range from January 1, 1753 to December 31, 9999.
A SmallDateTime can range from January 1, 1900 to June 6, 2079.


2. Accuracy

DateTime is accurate to one three-hundredth of a second (about 3.33 milliseconds).
SmallDateTime is accurate to one minute.


3. Size

DateTime takes up 8 bytes of storage space.
SmallDateTime takes up 4 bytes of storage space.
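
A quick way to see the accuracy difference (item 2 above) for yourself is to assign the same value to both types and compare the results. This is a minimal sketch; the literal value is just an example:

DECLARE @dt DateTime,
@sdt SmallDateTime

SET @dt = '20070829 14:25:37.337'
SET @sdt = @dt

-- DateTime keeps the seconds and fractional seconds (rounded to a 1/300-second tick);
-- SmallDateTime rounds to the nearest minute
SELECT @dt AS DateTimeValue, @sdt AS SmallDateTimeValue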


Armed with this knowledge, you may want to use SmallDateTime instead of DateTime if you only need to represent dates from January 1, 1900 to June 6, 2079 and you do not need accuracy finer than one minute. Why? Simple! SmallDateTime uses half the storage, so each row is a bit smaller and your queries pull back less data.


That's it for now... be sure to check back soon for more information on using datetime in SQL Server.

Monday, August 20, 2007

Joins / Wheres and Indexing

Another common error I have seen is anticipating which indexes a table will need based purely on which other tables you are going to join it to. For example, say you have a Puppies table and a Kid table. Most likely, you'll have a Kid_ID column in the Puppies table pointing back to Kid; no problem there. Nearly always, you will see an index on Kid_ID. In many systems, this index is rarely used, if ever. Also, when searching for the kids who own a small Labrador with white spots on its back whose owner's last name is 'Johnson', the query is not running as efficiently as it could, even though there is clearly an index on the Characteristics column of the Puppies table.

Why?

The Kid_ID index is simply too limited for SQL Server to use effectively here. Think back to "The" Analogy (http://sqlcoach.blogspot.com/2007/08/analogy.html). If you're the super secret agent, do you want to drive out to the warehouse, grab the index book that lists all dogs on the planet, then grab the reference for all kids on the planet and start cross-referencing them? No!!! You'd have to cross-reference several billion (if not more) entries before starting to weed out entries based on their type. In other words, you'd be cross-referencing hound dogs, shepherds, etc., which is a lot of work, especially since in the end you don't care about those breeds right now.

It makes more sense to grab the index reference on dog characteristics (i.e. the Characteristics Index we mentioned above.) So, SQL Server ignores the Kid_ID index and uses the Characteristics index instead.

So, back to our analogy. The agent uses the Characteristics cross reference to look for all small Labs w/white spots. He finds a mere 36,000. Now he runs into the warehouse to look through all 36,000 files looking for kids with the last name of “Johnson.” Good news… that will only take about 32 days….

What the super secret agent REALLY needs is a cross-reference that covers both Characteristics AND Kid_ID. Then, if he also had a reference with Kid_ID and Last_Name, he could cross-reference the two of them without ever leaving the reference room. The super-secret agent just LOVES saving time by not going into the warehouse.

So, the moral of the story…

When creating an index, create a compound key (an index with multiple fields in it). The first field(s) should be those commonly used in WHERE clauses (in this case, Characteristics). Next, place fields that commonly participate in JOIN clauses (such as Kid_ID in this example). Now your DB is ready to Rock!!
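
In T-SQL terms, the moral might look something like this. It is a hypothetical sketch using the Puppies table from the example; your real table and column names will differ:

-- Compound (multi-column) index: WHERE-clause columns first, join columns after
CREATE NONCLUSTERED INDEX IX_Puppies_Characteristics_KidID
ON Puppies (Characteristics, Kid_ID)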

Some of you may be asking: "How do I know which columns are being used in WHERE clauses the most?" Well, there are two choices there: (1) Get tools that profile your database and give you quantifiable information about that very topic. These are rare and expensive. If you're a smaller shop, then (2) Start analyzing! Ask the project lead, ask the users, use common sense, etc. Creating indexes can be an art in this respect. Go buy a smock!!

Friday, August 17, 2007

Order of Where Clause

Disclaimer: If you came here looking for "Where's the ORDER BY clause?", try Googling again. You messed up.

Now on to the post....

Many people, once they start getting comfortable writing SQL, begin asking the same questions. One common question is "Does the order of the stuff I put in the WHERE clause matter?" This question usually means different things to different people, so let's dig in a bit.

The short answer, with Microsoft SQL Server, is “No.” SQL Server will internally re-order where clause items to properly match indexes or other time saving devices. The only exception to this is if items are grouped with parentheses.
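
To see why parentheses are the exception, remember that they change the logical grouping, not just the order: AND binds more tightly than OR, so the two filters below return different rows. This example reuses the CUSTOMERS table from further down, plus a hypothetical KEYED_MY_CAR column invented for illustration:

SELECT CUSTOMER_ANNOYANCE_FACTOR
FROM CUSTOMERS
-- Matches customers who cussed me before AND did one of the two things in the parentheses
WHERE HAVE_CUSSED_ME_BEFORE = 1
AND (STOLE_MY_GIRLFRIEND = 1 OR KEYED_MY_CAR = 1)

SELECT CUSTOMER_ANNOYANCE_FACTOR
FROM CUSTOMERS
-- Without the parentheses, AND binds first, so this also matches ANY customer
-- who keyed my car, whether or not they ever cussed me
WHERE HAVE_CUSSED_ME_BEFORE = 1
AND STOLE_MY_GIRLFRIEND = 1 OR KEYED_MY_CAR = 1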

That being said, other Database Management Systems (DBMS) may be impacted by where clause order. I used to work with Sybase SQL Anywhere and Pervasive databases where it did make a difference. If you're not working with SQL Server, take all this with a grain of salt and do your own testing.

I said earlier that this question means different things to different people. The following examples will highlight what I mean:

These two statements are equivalent in Microsoft SQL Server:

SELECT CUSTOMER_ANNOYANCE_FACTOR
FROM CUSTOMERS
WHERE HAVE_CUSSED_ME_BEFORE = 1
AND STOLE_MY_GIRLFRIEND = 0

And

SELECT CUSTOMER_ANNOYANCE_FACTOR
FROM CUSTOMERS
WHERE STOLE_MY_GIRLFRIEND = 0
AND HAVE_CUSSED_ME_BEFORE = 1

Notice that the first statement begins by filtering on “Have Cussed Me Before” and in the second statement, these are reversed.

As well, others may actually be asking about the following example. Again, these two are equivalent:

SELECT GEEK_LEVEL
FROM GEEK_RATINGS
WHERE GEEK_LEVEL = 25

And

SELECT GEEK_LEVEL
FROM GEEK_RATINGS
WHERE 25 = GEEK_LEVEL

Well, there ya have it. Yet another fine advanced SQL tip (OK, maybe not so advanced this time) from the Acme Advanced SQL Tuning squad.

Happy Tuning!

Monday, August 13, 2007

Poor Little Misunderstood Views

Some people love 'em, some people hate 'em. But one thing I have found to be nearly universal is that most people misunderstand how Views are used in SQL Server.

The Backdrop
A View is merely a pre-defined query that can be treated in many ways as a table. This allows DBAs or Database Developers to pull data from several tables and expose it as a single “virtual table”. This has many advantages: the consumer of the view doesn’t have to have a detailed understanding of the database table layouts (schema); and a single addition or fix to the view then ripples out to all the consuming queries in one fell swoop.

The Problem
However, many developers struggle with the performance of Views. Most note that they operate slower than simply joining in the information they need from the base tables in every query, throwing out the advantages of the views. I know many DBAs and Database Developers who live by the “There’s no views allowed on my server” rule.

The Solution, or, How to Make Life Better for Everyone
Views can be a friend or a foe. The latter scenario is generally born of a misunderstanding of how Views actually work. Once you get to know how SQL Server actually makes use of a view, it will make much more sense, and you can make more appropriate choices that will allow you to benefit from Views without hurting performance.

The Misconceptions
When views were first explained to me, they were explained incorrectly. I have, since then, heard others regurgitate the same falsehood countless times. I operated under this false knowledge for years until recently, when, while working with Query Analyzer and actually breaking down the query plans, I saw "the light."

Most of us were taught that Views are slower because the database has to calculate them BEFORE they are used to join to other tables and BEFORE the where clauses are applied. If there are a lot of tables in the View, then this process slows everything down. This explanation seems to make sense on the surface, and is therefore easily accepted. However, NOTHING IS FURTHER FROM THE TRUTH on SQL Server!

The fact of the matter is that when a query is being broken down by SQL Server's optimizer, it looks at the fields within the view's SELECT to see which of them are needed by the consuming query. If it needs a given field, then it extracts it from the view definition, along with its table from the FROM clause, and any restrictions it needs from the WHERE clause or other clauses (GROUP BY, HAVING, etc.). These extracted elements are then merged into the consuming query and are generally treated as a sub-query. The optimizer then joins the data together along indexes as best it can, just as it does with non-view elements, and then the entire query is run. The view is NOT pre-calculated just because it came from a view definition.
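
To make that concrete, here is a hypothetical view and a consuming query (the object names are invented for illustration). SQL Server does not run vCustomerOrders first and then filter; it folds the view's definition into the outer query and optimizes the whole thing as one statement:

CREATE VIEW vCustomerOrders
AS
SELECT c.Customer_ID,
c.Last_Name,
o.Order_ID,
o.Order_Total
FROM Customers c
INNER JOIN Orders o ON o.Customer_ID = c.Customer_ID
GO

-- The optimizer expands the view, keeps only the columns the outer query needs
-- (Last_Name and Order_Total), and joins along indexes just as if you had
-- written the join yourself
SELECT Last_Name, Order_Total
FROM vCustomerOrders
WHERE Order_Total > 1000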

So, why does it often run slower? Three reasons:

Reason 1 - Sort Order: Sub-queries often suffer from not being sequenced in an order that can easily be merged into the main query. This forces the server to do extra work to sort the data returned by the sub-query before merging it, and in that circumstance the sub-query's data is pre-calculated so it can be sorted. However, if the index used by the sub-query already returns the data in the order the join needs, that extra sort never happens.

Fix 1 - Watch your query plans. If an appropriate index exists that will return data in the same order that is needed for the join, then it will be pulled in without having to sort it, thus avoiding the need to pre-fetch and pre-calculate.

Reason 2 – Inner Joins: When the view is broken down to see what fields on the SELECT are needed, and then the corresponding table from the FROM clause, it has to go one step further. It must consider anything in the WHERE clause that may throw out data. As well, Inner Joins from the table in the FROM clause can also throw out data if the joined in table does not have a matching row. Since the optimizer doesn’t know whether or not the Inner Join was used as a filtering device, it has to include it. Very often, tables are joined in to show data that the consuming query doesn’t need, not just as a filter. In these cases, the Inner Join only causes SQL Server to do more work for no good reason.

Fix 2 - Whenever possible, limit the number of inner joins in the View definition. Try to only use Inner Joins when you are certain most or all consuming queries will need data from the joined table. If there are many cases where the consuming query will not, consider multiple Views to service the various cases.

Side note: Left Joins are not used as filters. If a View left joins in a table, but there are no fields used in that table, it will be eliminated when the view is pulled in.

Reason 3 – Redundant Table Calls: When you create a view, you can actually use another view as a source of data. This practice can be nested practically without limit. Since each of these views will have its query definition pulled in as a sub-query, it's very possible that the same base table will participate in the query multiple times. This is, generally, just a waste. Why go to the same place multiple times?

Fix 3 - Try to limit yourself to using only one view in a query. Also, try to avoid using Views as a base table within another view. If you find yourself needing info from multiple views, consider breaking form and joining in the base tables rather than pulling in the views. In some cases, doing this has allowed me to call data from a table once rather than THREE OR FOUR times, cutting the response time by orders of magnitude (and I'm not kidding). Sometimes you have to ignore the advantages of views to gain performance.
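
As a sketch of Reason 3, imagine the hypothetical vCustomerOrders view from earlier being reused inside a second view (again, the names are invented). Any query that joins both views ends up visiting the same base tables twice:

CREATE VIEW vCustomerOrderCounts
AS
SELECT Customer_ID, COUNT(*) AS Order_Count
FROM vCustomerOrders          -- a view built on top of another view
GROUP BY Customer_ID
GO

-- Both views expand to the same base tables, so Customers and Orders
-- are each read twice in this one query
SELECT co.Last_Name, oc.Order_Count
FROM vCustomerOrders co
INNER JOIN vCustomerOrderCounts oc ON oc.Customer_ID = co.Customer_ID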

Conclusion
Views do have their place, but knowing how they are used will save you grief and performance time. Knowing that Views are NOT pre-calculated, but rather are broken down into their parts with the "useful bits" (and any inner-joined fluff) pulled into the main query, explains a lot of the performance impact of these objects.

Happy tuning!

Wednesday, August 8, 2007

"THE" Analogy

INTRODUCTION
Databases, such as Microsoft SQL Server, Oracle, DB2, etc, can sometimes seem to be marvelous shiny “Magic Happens Here” boxes where you simply ask it for information and “poof” the answer is dropped into your lap. At times it can remind you of your favorite sci-fi show. “My, those engineers who build these databases are really really really smart!!”

Well, yeah, they are really smart. And the databases do some pretty incredible things. However, always remember that in the end, it's just a computer that has to follow the laws of time and space just like everything else. Given its mortal status, the wise developer will always bear in mind that there is more than one way to skin the proverbial cat, and some ways are much faster than others.

Here’s a big statement

WHEN WRITING DATABASE QUERIES, 95% OF THE TIME THE BOTTLENECK IS IN THE I/O, not the CPU, and not the memory. If you can cut down on the I/O operations, your query will run faster. In fact, other queries will too because they don’t have to compete with your query as much!

Now, this statement is huge, so let’s repeat it. In fact, let’s chant it under our breath as we write queries (everyone already thinks we’re strange, so there is no damage done here):

WHEN WRITING DATABASE QUERIES, 95% OF THE TIME THE BOTTLENECK IS IN THE I/O, not the CPU, and not the memory.

(Copy and Paste is great, eh?) The enormity of this statement just BEGS for an analogy, so here we go…

THE ANALOGY
You work at a super secret government agency whose initials no one has ever heard of. You have been given an empty desk and the instructions that when your boss asks for information, you need to provide it to him as quickly as possible. The information you need to find is in the super secret warehouse on the other side of the city.

The boss comes in. He's big, mean looking, and schooled in the arts of killing and maiming without remorse. He orders you to give him the total number of suspected skateboarders in the US who currently own blue GEO Metros. This is top priority; the security of our nation depends on it. "Right away," you reply as you're grabbing your car keys and heading out the door with no other thought on your mind but to satisfy this dangerous man.

You jump in your car and head for the highway. After about 25 minutes of bumper-to-bumper traffic, you finally navigate your way to the underground fortress that houses all the information. After a brief three-minute stop so the guards can check that your name is on the list, and a quick strip search, you're on your way to the belly of the repository that is buried underneath a mountain.

With vim and vigor you jump out, grab the first box, and begin digging through the “A’s” looking for any skateboarders. As you find them, you also check to see if they are driving a blue car. If so, you also double-check that it’s a GEO Metro. The first box down, you grab the next. Two and one-half years later, you have completed looking through all the boxes and gratefully return to the office with the answer in hand!!

The boss is somewhat surprised to see you! You proudly announce, "63!! The total number is 63!!" He sneers a thanks at you, muttering something about how long it took.

“OK,” he says, “Now I need to know how many of them are over the age of 60.” You are crest-fallen. You do not relish the idea of going through all that again.

The smarter super secret agent:
Work smarter, not harder, right? Super-Secret Agent #2 (SSA2) is in the same boat. The same desk, same job description, same distance to the warehouse. When his big, mean-looking boss asks for the same information, he hops in his car, crosses city traffic, and passes the security checkpoint. However, he's smart enough to go to the "Cross-Reference" room. Once in this room, he pulls out the "Skate or Die Index" book and the "You Are What You Drive Index" book and begins to cross-reference the two. About three hours later, he has his list of 63 candidates. Now all that remains is cross-referencing this list with the "Index for All Ages" book and crossing off all those who are not yet 60. Just in case, SSA2 makes a copy of each of these index books, as he suspects the boss may ask him additional questions along this line. He'll just keep the indexes on his desk to avoid city traffic next time.

With the answer in hand (a somewhat surprising 52), he starts back for the office. SSA2 worked much smarter and produced the answer in only about three hours, where the first agent took two and a half years.

Breakdown of the Thinly Veiled Analogy:
OK, rapid fire, here we go:
The Boss: A Database User
The Super Secret Agents (i.e. You): Queries!!!
Cross town traffic: The I/O bus, on the way to storage!
Security Guard: Database Authentication
Warehouse: Data storage (Tables)
Cross Reference Index Books: Er, Indexes
One last translation: Making a copy of the Cross-Reference Index books (and/or files of information) and bringing them back to your desk is “Caching.” You can see how important caching is as it completely eliminates the need to go across town if you already have a copy of the file. Your desk, in this case, would be your RAM. The more RAM you have access to, the more desk space you have to keep your copied (cached) information.

We will be using this analogy to discuss many ideas on query performance tuning. This will come up over and over as most tuning revolves around eliminating I/O, or (per this analogy) keeping the Agents from having to cross the city and look through the warehouse for information.

So, now with the analogy under your belt, let’s move on to the tuning!!!