Wednesday, August 8, 2007

"THE" Analogy

INTRODUCTION
Databases, such as Microsoft SQL Server, Oracle, DB2, etc, can sometimes seem to be marvelous shiny “Magic Happens Here” boxes where you simply ask it for information and “poof” the answer is dropped into your lap. At times it can remind you of your favorite sci-fi show. “My, those engineers who build these databases are really really really smart!!”

Well, yeah, they are really smart. And the databases do some pretty incredible things. However, always remember that in the end, it’s just a computer that has to follow the laws of time and space just like everything else. Given it’s mortal status, the wise developer will always bear in mind that there is more than one way to skin the preverbal cat, and some ways are much faster than others.

Here’s a big statement

WHEN WRITING DATABASE QUERIES, 95% OF THE TIME THE BOTTLENECK IS IN THE I/O, not the CPU, and not the memory. If you can cut down on the I/O operations, your query will run faster. In fact, other queries will too because they don’t have to compete with your query as much!

Now, this statement is huge, so let’s repeat it. In fact, let’s chant it under our breath as we write queries (everyone already thinks we’re strange, so there is no damage done here):

WHEN WRITING DATABASE QUERIES, 95% OF THE TIME THE BOTTLENECK IS IN THE I/O, not the CPU, and not the memory.

(Copy and Paste is great, eh?) The enormity of this statement just BEGS for an analogy, so here we go…

THE ANALOGY
You work at a super secret government agency whose initials no one has every heard of. You have been given an empty desk and the instructions that when your boss asks for information, you need to provide it to him as quickly as possible. The information you need to find is in the super secret warehouse on the other side of the city.

The boss comes in. He’s big, mean looking, and schooled in the arts of killing and maiming without remorse. He orders you to give him the total number of suspected skateboarders in the US who currently own blue GEO Metros. This is top priority; the security of our nation depends on it. “Right away” you reply as your grabbing your car keys and heading out the door with no other thought on your mind but to satisfy this dangerous man.

You jump in your car and head for the highway. After about 25 minutes of bumper-to-bumper traffic, you finally navigate your way to the underground fortress that houses all the information. A brief three-minute stop to allow the guards to check your name is on the list, and a quick strip search, you’re on your way to the belly of the repository that is buried underneath a mountain.

With vim and vigor you jump out, grab the first box, and begin digging through the “A’s” looking for any skateboarders. As you find them, you also check to see if they are driving a blue car. If so, you also double-check that it’s a GEO Metro. The first box down, you grab the next. Two and one-half years later, you have completed looking through all the boxes and gratefully return to the office with the answer in hand!!

The boss is somewhat surprised to see you! You proudly arrive state “63!! The total number is 63!!” He sneers a thanks at you, muttering something about how long it took.

“OK,” he says, “Now I need to know how many of them are over the age of 60.” You are crest-fallen. You do not relish the idea of going through all that again.

The smarter super secret agent:
Work smarter, not harder, right? Super-Secret Agent #2 (SSA2) is in the same boat. The same desk, same job description, same distance to the warehouse. When his big, mean looking boos asks for the same information, he hops in his car, crosses city traffic, and passes the security checkpoint. However, he’s smart enough to go to the “Cross-Reference” room. Once in this room, he pulls out the “Skate or Die Index” book and “You are What You Drive Index” book and begins to cross reference the two. About three hours later, he has his list of 63 candidates. Now all that remains is cross-referencing this list with the “Index for All Ages” book and cross off all those who are not yet 60. Just in case, SSA2 makes a copy of each of these index books as he suspects the boss may ask him additional questions along this line. H’ell just keep the indexes on his desk to avoid city traffic next time.

With answer in hand (a somewhat surprising 52) he starts back for the office. The SSA2 worked much smarter and produced the answer in only about 3 hours where the first agent took 2 and half years.

Breakdown of the Thinly Veiled Analogy:
OK, rapid fire, here we go:
The Boss: A Database User
The Super Secret Agents (i.e. You): Queries!!!
Cross town traffic: I/O buss, on the way to storage!
Security Guard: Database Authentication
Warehouse: Data storage (Tables)
Cross Reference Index Books: Er, Indexes
One last translation: Making a copy of the Cross-Reference Index books (and/or files of information) and bringing them back to your desk is “Caching.” You can see how important caching is as it completely eliminates the need to go across town if you already have a copy of the file. Your desk, in this case, would be your RAM. The more RAM you have access to, the more desk space you have to keep your copied (cached) information.

We will be using this analogy to discuss many ideas on query performance tuning. This will come up over and over as most tuning revolves around eliminating I/O, or (per this analogy) keeping the Agents from having to cross the city and look through the warehouse for information.

So, now with the analogy under your belt, let’s move on to the tuning!!!

No comments: