I read an interesting blog entry titled “The Sin of Band-Aids” by the Oracle Alchemist. It started like this…..
Let’s take a trip to the past…we weren’t actually there, but we can pretend. Let’s look into a conversation between a DBA, developers, and management at some time ‘way back when.’
Management: Our system has been performing poorly for months now.
Developers: We didn’t change anything.
DBA: I didn’t change anything.
Management: We need a fix.
Developers: We don’t have time to go through all our code and tune it.
DBA: *frantically types: alter system set optimizer_index_caching = 35; alter system set optimizer_index_cost_adj = 25;* Fixed!
God does that sound familiar. How many times as a DBA, have we made an adjustment to the database as a Band-Aid fix to an application problem. We tell management it’s a band-aid, that eventually wont work, but thed don’t here that part.
The blog entry goes on to say…
Queries are not performing as expected. Sometimes a query will run fine, sometimes it runs horribly. Sometimes a combination of queries and users results in massive buffer busy waits or ‘latch: cache buffers chains’ waits. Nothing is working and so finally, a consultant is called.
You’ve probably guessed that I was the consultant. Move to the head of the class! In particular, there were three or four main offenders; queries that would sometimes perform well, then all of a sudden ‘turn horrible.’ The core of the problem was truly unstable execution plans.
Using ASH I was able to go through different times of day and compare reports from times where the system performed well with times the system performed horribly. Doing so allowed me to compare SQL_IDs between different times of day to see which queries seemed to be going through dramatic changes. Once a SQL_ID was found that experienced such a dramatic change, I used DBMS_XPLAN.DISPLAY_AWR to find all the different iterations of the query.
In case you’ve never used it, DBMS_XPLAN.DISPLAY_AWR is very useful. It allows you to input only a SQL_ID, and Oracle will show you the explain plans for that SQL_ID as recorded in the AWR. For instance:
select * from TABLE(dbms_xplan.display_awr(’95t7cmj84u4jz’));
The next step was to look through the results to find any inconsistencies large enough to result in huge differences in performance. In this situation, I found that ‘good’ explain plans were using decent indexes, NESTED LOOPs, and low overall bytes. The ‘bad’ explain plans were using index joins, where an index fast full scan would join with an index range scan into a view (e.g. index$_join$_003). These plans resulted in many more block touches and much more CPU used.
Using Grid Control, I looked at the timeline for each of the hash values of the SQL_ID and saw that the poorly performing plans lined up with their bad performance periods.
Of course, now the burning question: what can be done about this?
Well it’s the Oracle Alchmist blog entry not mine, only right that you go to the source to see the answer to the question and also his conclusions.
Posted by Michael Corey