Monday, August 20, 2007

Joins / Wheres and Indexing

Another common error I have seen is anticipating what indexes will be needed on a table based on what other tables you are going to join to. For example, say you have a Puppies table and a Kid table. Most likely, you’ll have a Kid_ID column in the Puppies table pointing back to Kid; no problem there. Nearly always, you will see an index on Kid_ID. In many systems, this index is rarely used, if ever. Also, when searching for the kids who own a small Labrador with white spots on it’s back who’s owner’s last name is ‘Johnson’, the query is not running as efficiently as it could, even though there is clearly an index on the Characteristics column of the Puppies table.

Why?

The Kid_ID index is merely too simple for SQL Server to use effectively. Think back to “The” Analogy (http://sqlcoach.blogspot.com/2007/08/analogy.html). If you’re the super secret agent, do you want to drive out to the warehouse, grab the index book that lists all dogs on the planet, then grab the reference for all kids on the planet and start cross-referencing them? No!!! You’d have to cross reference several billion (if not more) entries before starting to weed out entries based on their type. In other words, you’d be cross-referencing hound ogs, shepard’s, etc. which is a lot of work, especially since in the end you don’t care about those breeds right now.

It makes more sense to grab the index reference on dog characteristics (i.e. the Characteristics Index we mentioned above.) So, SQL Server ignores the Kid_ID index and uses the Characteristics index instead.

So, back to our analogy. The agent uses the Characteristics cross reference to look for all small Labs w/white spots. He finds a mere 36,000. Now he runs into the warehouse to look through all 36,000 files looking for kids with the last name of “Johnson.” Good news… that will only take about 32 days….

What the super secret agent REALLY needs is a cross-refference that has characteristics AND Kid_ID reference. Then, if he had a reference that had Kid_ID and Last_Name, he could cross-refference the two of them without every leaving the reference room. The super-secret agent just LOVES saving time by not going into the warehouse.

So, the moral of the story…

When creating an index, create a compound key (an index with multiple fields in it). The first field(s) should be those commonly used in WHERE clauses (in this case, Characteristics). Next, place fields that commonly participate in Join clauses (such as Kid_ID in this example.) Now you’re DB is ready to Rock!!

Some of you may be asking: “How do I know which columns are being used in WHERE clauses the most?” Well, three’s two choices there: (1) Get tools that profile your database and give you quantifiable information about that very topic. These are rare and expensive. If you’re a smaller shop, then (2) Start Analyzing! Ask the project lead, ask the users, use common sense, etc. Creating indexes can be an art in this respect. Go buy a smock!!

No comments: