Monday, August 13, 2007

Poor Little Misunderstood Views

Some people love ’em, some people hate ’em. But one thing I have found to be nearly universal is that most people misunderstand how Views are used in SQL Server.

The Backdrop
A View is merely a pre-defined query that can be treated in many ways as a table. This allows DBAs and Database Developers to pull data from several tables and expose it as a single “virtual table”. This has real advantages: the consumer of the view doesn’t need a detailed understanding of the database’s table layouts (schema), and a single addition or fix to the view ripples out to all the consuming queries in one fell swoop.

The Problem
However, many developers struggle with the performance of Views. Most find that queries against a view run slower than queries that simply join in the needed information from the base tables, so they go back to joining the base tables in every query, throwing out the advantages of the views. I know many DBAs and Database Developers who live by the “There’s no views allowed on my server” rule.

The Solution, or, How to Make Life Better for Everyone
Views can be a friend or a foe. The latter scenario generally stems from not knowing, or misunderstanding, the rules of Views. Once you know how SQL Server actually makes use of a view, its behavior will make much more sense, and you can make choices that let you benefit from Views without hurting performance.

The Misconceptions
When views were first explained to me, they were explained incorrectly. Since then, I have heard others regurgitate the same falsehood countless times. I operated under this false knowledge for years. Then recently, while working in Query Analyzer and actually breaking down the query plans, I saw “the light.”

Most of us were taught that Views are slower because the database has to calculate them BEFORE they are used to join to other tables and BEFORE the WHERE clauses are applied. If there are a lot of tables in the View, then this process slows everything down. This explanation seems to make sense on the surface, and is therefore easily accepted. However, NOTHING IS FURTHER FROM THE TRUTH on SQL Server!

The fact of the matter is that when SQL Server’s optimizer breaks down a query, it looks at the fields in the view’s SELECT list to see which of them the consuming query needs. For each field it needs, it extracts that field from the view definition, along with its table from the FROM clause and any restrictions it needs from the WHERE clause or other clauses (GROUP BY, HAVING, etc.). These extracted elements are then merged into the consuming query, generally as a sub-query. The optimizer then joins the data together along indexes as best it can, just as it does with non-view elements, and then the entire query is run. The view is NOT pre-calculated just because it came from a view definition.
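To make the idea concrete, here is a toy Python model of view expansion. It is only an illustration, not how SQL Server is implemented: the `Query` class, the `inline_views` function, and all table and view names are hypothetical, and it skips column pruning and alias mapping for brevity. What it does show is the key point: a view reference is replaced by the view’s own tables and filters, so what remains is one query over base tables, with nothing computed ahead of time.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical, heavily simplified model of a SELECT statement."""
    columns: list                               # fields in the SELECT list
    sources: list                               # base tables or view names
    where: list = field(default_factory=list)   # WHERE conditions, ANDed

    def to_sql(self):
        sql = "SELECT " + ", ".join(self.columns) + " FROM " + ", ".join(self.sources)
        if self.where:
            sql += " WHERE " + " AND ".join(self.where)
        return sql


def inline_views(query, views):
    """Replace each view reference with the view's own sources and filters.

    Nothing is executed here: the view definition is merged into the outer
    query, leaving a single query over base tables for the optimizer.
    """
    sources, where = [], []
    for source in query.sources:
        if source in views:
            inner = inline_views(views[source], views)  # views can nest
            sources.extend(inner.sources)
            where.extend(inner.where)
        else:
            sources.append(source)
    return Query(query.columns, sources, where + query.where)


# Hypothetical view and consuming query; for brevity the outer query names
# the view's base columns directly instead of going through a view alias.
views = {
    "ActiveCustomers": Query(["CustomerID", "Name"], ["Customers"], ["Customers.IsActive = 1"]),
}
outer = Query(
    ["Name", "Total"],
    ["ActiveCustomers", "Orders"],
    ["Orders.CustomerID = Customers.CustomerID", "Orders.Total > 100"],
)
print(inline_views(outer, views).to_sql())
# prints one line (wrapped here):
# SELECT Name, Total FROM Customers, Orders WHERE Customers.IsActive = 1
#   AND Orders.CustomerID = Customers.CustomerID AND Orders.Total > 100
```

The view’s filter simply becomes one more condition in the outer WHERE clause, where the optimizer is free to combine it with everything else.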

So, why do views often run slower? Three reasons:

Reason 1 - Sort Order: Well, sub-queries often aren’t sequenced in an order that can easily be merged into the main query. This forces the server to do extra work to sort the data returned by the sub-query before merging it. In this circumstance, the data IS pre-calculated, so that it can be sorted. However, if the index used by the sub-query already orders the data the way the join needs it, no sort is required.

Fix 1 - Watch your query plans. If an appropriate index exists that will return data in the same order that is needed for the join, then it will be pulled in without having to sort it, thus avoiding the need to pre-fetch and pre-calculate.
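Why does ordering matter so much? A merge join walks both inputs in join-key order, so an input that arrives unordered has to be read in full and sorted before the join can start. The toy Python merge join below is just an illustration, not SQL Server’s operator; the `merge_join` function and the sample `customers`/`orders` rows are hypothetical. It counts how many inputs needed that extra sort, and a pre-sorted list stands in for data read through an index that is already in join order.

```python
def merge_join(left, right, key):
    """Toy merge join over two lists of dict rows (not SQL Server's code).

    A merge join walks both inputs in key order. Any input that is not
    already ordered must be read in full and sorted first; that is the
    extra, pre-calculated step. Returns (joined_rows, number_of_sorts).
    """
    sorts = 0

    def in_key_order(rows):
        nonlocal sorts
        if all(a[key] <= b[key] for a, b in zip(rows, rows[1:])):
            return rows                 # already ordered, e.g. read via a matching index
        sorts += 1
        return sorted(rows, key=lambda r: r[key])  # blocking: needs every row first

    left, right = in_key_order(left), in_key_order(right)
    joined, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            start = j
            while j < len(right) and right[j][key] == lk:
                joined.append({**left[i], **right[j]})
                j += 1
            i += 1
            if i < len(left) and left[i][key] == lk:
                j = start               # next left row has the same key: replay matches
    return joined, sorts


customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bob"}, {"cid": 3, "name": "Cy"}]
orders = [{"cid": 3, "total": 50}, {"cid": 1, "total": 20}, {"cid": 1, "total": 70}]

rows, sorts = merge_join(customers, orders, "cid")
print(len(rows), sorts)  # 3 1  (orders had to be sorted first)

# Pretend the orders come back through an index on cid: no sort needed.
rows, sorts = merge_join(customers, sorted(orders, key=lambda r: r["cid"]), "cid")
print(len(rows), sorts)  # 3 0
```

The join result is identical either way; the only difference is whether one side had to be gathered and sorted up front, which is exactly what to look for in the query plan.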

Reason 2 – Inner Joins: When the view is broken down to see which fields in the SELECT are needed, and then which tables in the FROM clause supply them, the optimizer has to go one step further. It must consider anything in the WHERE clause that may throw out data. Inner Joins from the table in the FROM clause can also throw out data, if the joined-in table does not have a matching row. Unless something such as a trusted foreign key constraint guarantees a match, the optimizer can’t know whether the Inner Join was meant as a filter, so it has to include it. Very often, though, tables are joined in only to supply columns that a given consuming query doesn’t need, not as a filter at all. In these cases, the Inner Join only causes SQL Server to do more work for no good reason.

Fix 2 - Whenever possible, limit the number of Inner Joins in the View definition. Try to use Inner Joins only when you are certain that most or all consuming queries will need data from the joined table. If there are many cases where the consuming queries will not, consider creating multiple Views to serve the various cases.

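Side note: Left Joins never throw out rows from the tables before them, so they are not used as filters. If a View left joins in a table on a unique key (such as its primary key), and the consuming query uses no fields from that table, the optimizer can eliminate it when the view is pulled in.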
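The keep-or-drop reasoning for joined-in tables can be sketched as a toy rule in Python. This is a simplification, not SQL Server’s actual join-elimination logic: the `Join` class, the `tables_kept` function, and the table names are hypothetical, and it ignores constraints (such as trusted foreign keys) that can let the optimizer drop an Inner Join as well.

```python
from dataclasses import dataclass


@dataclass
class Join:
    """One joined-in table of a hypothetical view definition."""
    table: str
    kind: str                # "INNER" or "LEFT"
    unique_key: bool = True  # True if the join matches at most one row of this table


def tables_kept(joins, used_tables):
    """Toy rule: which joined-in tables must stay after the view is expanded.

    - A table whose columns the consuming query uses always stays.
    - An INNER join may throw out rows, so it stays even if unused.
    - A LEFT join on a unique key can neither remove nor duplicate rows,
      so it is dropped when unused; on a non-unique key it could still
      multiply rows, so it stays.
    """
    kept = []
    for j in joins:
        if j.table in used_tables or j.kind == "INNER" or not j.unique_key:
            kept.append(j.table)
    return kept


# Hypothetical view: Orders INNER JOIN Customers INNER JOIN Regions LEFT JOIN SalesReps
view_joins = [
    Join("Customers", "INNER"),
    Join("Regions", "INNER"),
    Join("SalesReps", "LEFT"),
]
# The consuming query only reads columns from Orders and Customers.
print(tables_kept(view_joins, used_tables={"Customers"}))  # ['Customers', 'Regions']
```

Regions survives only because it is inner joined, even though this consuming query never looks at it; that is the wasted work described in Reason 2.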

Reason 3 – Redundant Table Calls: When you create a view, you can actually use another view as a source of data, and this nesting can go on practically without limit. Since each of these views has its query definition pulled in as a sub-query, it’s very possible that the same base table will participate in the query multiple times. This is, generally, just a waste. Why go to the same place multiple times?
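To see how quickly nesting multiplies table reads, here is a toy Python counter that expands nested view definitions down to their base tables. The view and table names are hypothetical, each view is modeled only as the list of sources in its FROM clause, and it counts references in the expanded query rather than predicting what the optimizer will finally do with them.

```python
from collections import Counter


def base_table_refs(sources, views):
    """Count how many times each base table is read once all views are expanded.

    `views` maps a view name to the sources in its FROM clause; any source
    that is not a view name is treated as a base table.
    """
    counts = Counter()
    for source in sources:
        if source in views:
            counts.update(base_table_refs(views[source], views))  # expand nested view
        else:
            counts[source] += 1
    return counts


# Hypothetical nested views.
views = {
    "vOrderTotals": ["Orders", "OrderLines"],
    "vCustomerOrders": ["Customers", "vOrderTotals"],
    "vCustomerSummary": ["vCustomerOrders", "Orders"],
}
# A consuming query that joins two of the views.
refs = base_table_refs(["vCustomerSummary", "vOrderTotals"], views)
print(refs["Orders"], refs["OrderLines"], refs["Customers"])  # 3 2 1
```

Only two views appear in the consuming query, yet Orders is referenced three times after expansion. Writing the same query against the base tables could read each table only once.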

Fix 3 - Try to limit yourself to using only one view in a query. Also, try to avoid using a View as a source within another view. If you find yourself needing info from multiple views, consider breaking form and joining in the base tables rather than pulling in the views. In some cases, doing this has allowed me to read data from a table once rather than THREE OR FOUR times, speeding up the response by thousands of percent (and I’m not kidding). Sometimes you have to give up the advantages of views to gain performance.

Conclusion
Views do have their place, and knowing how they are used will save you grief and tuning time. Knowing that views are NOT pre-calculated, but are instead broken down into their parts, with the “useful bits” (and any inner-joined fluff) pulled into the main query, explains a lot of the performance impact of these objects.

Happy tuning!
