381. Does the AWR report provide any PQ stats/usage?
Yes. It reports PX wait events so you can see time spent waiting for them. This can help you assess whether that time is reasonable or too long. There are also statistics displayed under "Instance Activity Stats" related to Parallel Query, for example "PX local messages recv'd", "PX local messages sent" and "Parallel operations downgraded to ..."
382. How accurate are the advisory sections in the AWR?
The advisory sections are based upon data collected in the particular snapshot period that the report was run against. As such they provide good advice based upon actual collected evidence.
If you choose to act upon the advice then, as with any changes, verification, testing and common sense should be used.
383. What are the reference times/values to consider in the AWR when checking for a problem?
There is not really a hard limit that applies to all cases. It is better to look at what the top waits are overall on the system and then examine these to see what is causing them. By looking at the top waits you are looking at a combination of wait time and frequency. A small number of longer waits may not be an issue overall, while a larger number of medium waits may have a far greater impact on the system.
384. If
we have a deadlock, can we find out in an ADDM, AWR or ASH report?
No. These reports are not useful for that purpose. If deadlocks are occurring, you may see more waits for enqueues than normal, but if only a couple of sessions are involved this would likely be insignificant in the context of the whole system. If you are encountering deadlocks please refer to the following article:
Document 60.1 Troubleshooting
Assistant: Oracle Database ORA-00060 Errors on Single Instance (Non-RAC)
Diagnosing Using Deadlock Graphs in ORA-00060 Trace Files
385. Is
it possible to report the activity/access for a certain table using
AWR/ASH/ADDM?
AWR records database-wide information, so other than showing the top SQL generally, which may happen to be against the table you are interested in, it does not specifically report on individual tables.
ASH data reports on the
session history so you may be able to retrieve the information that you want
from there. If you use the Enterprise Manager interface to the AWR data then
this provides more flexibility than can be provided by the AWR report itself.
386. How
can you obtain the CPU consumption by a user for each hour from AWR?
The AWR report itself is
showing you information from the whole database at once. It does not provide
you with information at the individual user level. You could see session
information from an ASH report. If you use the Enterprise Manager interface to
the AWR data then this provides more flexibility than can be provided by the
AWR report itself.
387. Is there a way to find all the memory used by the database, including all the user processes, within a certain time frame?
Not with AWR, no. The memory
statistics reported by AWR are a comparison between the start and the end of a
period.
It is possible for memory to
have been allocated and de-allocated within a period and for the statistics to
remain the same making it look as though nothing has changed in the period.
Additionally, for a user
process, the AWR report only reports PGA statistics and that process may be
taking up additional server memory at the operating system level that is not
covered or reported. This means that you might not get the total usage for the
database from AWR. You can get a full picture of memory usage at the O/S level
using OSWatcher.
388. How is ADDM/ASH related to AWR?
The AWR is the repository
where workload information is automatically recorded. This workload information
contains information about the database as a whole and also about individual sessions.
It can be reported on in different fashions.
The AWR report generates
output showing the base statistics from the AWR for a given period for the
whole database.
The ADDM report uses AWR information for a given period for the whole database in order to present findings in the form of more readable advice.
The ASH report generates output based upon the Active Session History rather than the whole database.
389. I
have found the addmrpt.sql goes directly to the problem. But it seems like
awrrpt.sql gives more detail, but I have found addmrpt.sql more useful. Any
comments on this?
AWR contains a detailed set
of information that ADDM analyses and reports on. ADDM uses the information
from AWR to formulate actions. ADDM does not necessarily use all the
information
in every case since AWR is
comprehensive in its coverage and the majority of information in AWR is not
applicable all the time for performance improvement.
390. In terms of ADDM and SGA sizing, ADDM always seems to suggest the SGA is undersized. How reliable is this as an indicator? Is there a better way to determine the optimal SGA size?
The ADDM report is working
from the basis that you are looking at it because you have a performance
problem. This means it is somewhat more strict than it might be if it sees that
there is insufficient memory to store everything. There is a misconception that
it "always suggests the SGA is undersized". It does not. If you are
encountering memory issues then this information would be useful. If you are
not encountering memory issues but have other problems then this advice may be
less important. It is difficult to differentiate between cases where the SGA is
making a difference or not based on automated information so a shortage is
reported if there is one and then this can be dealt with as necessary.
The SGA Target Advisory in
the AWR report provides further information regarding the difference that
changes to the SGA size is predicted to make.
If you consistently feel that you want ADDM to report this less frequently, there is an example of how to achieve this with DBMS_ADDM.INSERT_FINDING_DIRECTIVE.
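For example, the following hypothetical directive (the directive name and thresholds are illustrative, not taken from this note) would suppress the 'Undersized SGA' finding unless at least 2 sessions are active and the impact is at least 10%:
BEGIN
  DBMS_ADDM.INSERT_FINDING_DIRECTIVE(
    task_name           => NULL,  -- NULL applies to subsequently created ADDM tasks
    dir_name            => 'Limit SGA sizing advice',
    finding_name        => 'Undersized SGA',
    min_active_sessions => 2,
    min_perc_impact     => 10);
END;
/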
391. How can I confirm that my database has enough memory allocated to it?
The Advisory Statistics
section of the AWR report is a good place to start. But, generally, if you do
not have much in the way of waits for buffers (data blocks in memory) or
reloads (of statement text), you have enough memory.
ADDM will also provide
advice for a related AWR report period.
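As a quick supplementary check, the buffer cache advisory can also be queried directly from the instance (an illustrative sketch; the sizes and estimates returned depend on your system):
SELECT size_for_estimate, size_factor, estd_physical_reads
  FROM v$db_cache_advice
 WHERE name = 'DEFAULT';
A size_factor of 1 is the current cache size; rows with larger size factors show the predicted reduction (if any) in physical reads from a bigger cache.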
392. Can
I reduce cache waits by adding RAM to the database?
There is no direct answer to that; you should investigate *why* there are cache waits first. If a shortage of RAM is not the problem then adding more will make no difference.
393. What
is the best way to determine that the hardware is not enough and more needs to
be added?
The ADDM report will provide
useful information in this area. You can also use OSWatcher to get information
about how the operating system is performing.
394. Would
adding new HW make all these events go away, for example adding cpu or memory?
Not necessarily, no. Adding
hardware would only help if the hardware was constraining the activity that you
are trying to perform. If you are contending with other processes for a
particular resource then adding more unrelated hardware will not help.
395. What
does swapping mean?
Swapping is when memory is
written to swap space on disk (either by pages or whole processes memory
space).
396. Are you more likely to encounter swapping if you do not have sufficient RAM, especially in the case of multiple instances on the same machine?
Not necessarily - some operating systems will actively swap pages of memory. This is usually not a problem. Only when whole (key) processes are swapped out does it become an issue.
397. Should
I get worried if I see that the system is swapping, does that indicate a need
to add more physical RAM?
It may mean that. This is usually the case, but not always. Your memory capacity should be planned for and swapping should not happen, but because you probably do not have infinite capacity on your system you will occasionally encounter swapping in a normal environment.
398. What
is the performance impact of running ASH 4 times in an hour?
The concept of "running" ASH is a little different from AWR. AWR captures "pictures" (snapshots) of the system at set intervals, whereas ASH just flushes some memory information to disk. Thus ASH is a much more lightweight activity and does not place any considerable load on the system.
399. Can
an ASH report generate the same level of detail as a 10046 trace?
No - that is not possible. ASH is insufficiently granular and would be misleading (it does not sample enough data). Remember that the ASH report is sampling data from the outside, whereas trace writes everything that happens as the session progresses through the code.
400. At
what time interval is ASH captured from memory?
The data stored in memory is sampled every second. When the data is stored in DBA_HIST_ACTIVE_SESS_HISTORY, one in ten of these samples is recorded (i.e. a sample is recorded every 10 seconds by default).
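For example, the in-memory one-second samples can be queried directly (an illustrative query; use of this view requires the Diagnostics Pack license):
SELECT sample_time, session_id, sql_id, event
  FROM v$active_session_history
 WHERE sample_time > SYSDATE - 10/1440;   -- last 10 minutes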
401. How
can you reduce the 10 second time for ASH samples stored in
dba_hist_active_sess_history ?
There is a hidden parameter that you can change to alter the default of 10 (one in 10, i.e. every 10 seconds) to another value. The parameter is _ash_disk_filter_ratio. Note that all underscore parameter changes should be cleared through Support first, so make sure that you have good reason to modify this parameter.
For example setting this to
5 would collect information every 5 seconds. Please be aware that changing this
parameter will affect the amount of session history data that you are storing
and so you may need to alter your storage accordingly.
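A sketch of such a change (again, clear any underscore parameter change with Support before applying it):
ALTER SYSTEM SET "_ash_disk_filter_ratio" = 5 SCOPE = SPFILE;
-- the new ratio takes effect on the next instance restart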
402. What is the minimum interval possible for an AWR snapshot?
There is no minimum, but you need to make a call on how scalable and useful collecting at a given frequency is, based upon your application and storage limitations.
403. The
interval for AWR reports are normally generated using the snapshots created
automatically (eg. every 30 mins). Is there a way to manually create snapshots
so I can generate AWR with any interval?
Yes. You can create manual snapshots whenever you like. Please look into the DBMS_WORKLOAD_REPOSITORY package.
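For example, to take a manual snapshot immediately, and optionally adjust the automatic interval and retention (both in minutes; the figures below are illustrative):
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
EXEC DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(interval => 15, retention => 20160);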
404. If
we had 4 AWR reports of 15 minutes each, then does that mean we do not need ASH
reports?
No. ASH and AWR reports cover different areas. AWR reports deal with the database as a whole. ASH focuses on individual sessions.
405. Can we use a SQL tuning task on AWR history information?
Yes, you can supply a SQL_ID and snap_ids to DBMS_SQLTUNE.CREATE_TUNING_TASK().
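A minimal sketch (the snapshot IDs, SQL_ID substitution variable and task name are placeholders for your own values):
DECLARE
  l_task VARCHAR2(64);
BEGIN
  l_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(
              begin_snap => 100,
              end_snap   => 105,
              sql_id     => '&your_sql_id',
              task_name  => 'awr_sql_tune_1');
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => 'awr_sql_tune_1');
END;
/
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('awr_sql_tune_1') FROM dual;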
406. Can
the AWR run timeframe be modified to run for periods in the past?
You can only report on
periods where you have already collected information (in the form of
snapshots). So you can report on historic snapshots but you cannot change the
times of them.
407. Is
there any way to generate multiple hourly AWR reports at once instead of
generating them one at a time interactively?
There is no direct routine to generate reports for every snapshot taken in a range of snapshots. You could however create a PL/SQL loop and run the report for each consecutive snapshot pair, i.e. in pseudo code:
for i in begin_day_snap .. end_day_snap - 1
loop
  select output from table(
    dbms_workload_repository.awr_report_text(<dbid>, <inst_num>, i, i + 1));
end loop;
408. Is there a way we can automatically send old snapshots of all production DBs to a remote DB before they get purged based on retention?
There is no automated mechanism like this available at present.
410. If it is not automatic, is there a way to manually transfer snapshots to a remote DB which can store the snapshots of all production DBs? How can we implement it?
To archive AWR data in an archive DB, one option could be to replicate the relevant AWR tables in the AWR repository from the main database to the archive DB (via Streams / GoldenGate). Remember that you would need to filter out delete operations if you were purging from the source database, so as not to also purge the target.
Alternatively, you can export AWR data from one database and import it into another database. One usage of this would be to allow you to do comparison reporting. If these suggestions do not help you then you probably will need to ask in the Community or file a Service Request.
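The export and import are performed with scripts shipped under the Oracle home (run each as a DBA from sqlplus and follow the prompts; the exact workflow depends on your version):
-- on the source database:
@?/rdbms/admin/awrextr.sql
-- on the target (archive) database, after transferring the dump file:
@?/rdbms/admin/awrload.sql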
411. If a database has been created as a copy of another database it may contain 'cloned' AWR information. Is it possible to clear the old cloned instance AWR/ASH report history from the database?
To clear up space, have a
look at articles like :
Document 552880.1 General
Guidelines for SYSAUX Space Issues
Document 1292724.1
Suggestions if your SYSAUX Tablespace grows rapidly or too large
If these suggestions do not
help you then you probably will need to ask in the Community or file a Service
Request.
412. Is
there a best way to cleanup the AWR data after DBID changes?
Document 1251795.1 How To
remove Workload Repository information related to an old DBID from workload
repository
413. What
products/feature packs need to be licensed to use AWR/ASH/ADDM ?
AWR/ASH and ADDM reports are all based upon data collected in the Automatic Workload Repository. Any use or interpretation of that data needs a Diagnostics Pack license. See: Document 1490798.1 AWR Reporting - Licensing Requirements Clarification
414. Can
AWR/ASH/ADDM be used on Oracle Standard Edition?
No. Automatic Workload
Repository (AWR) is an Enterprise Edition feature. To find out about this and
other features see:
Document 1361401.1 Where to
Find Information About Performance Related Features
415. Do
we need to have a separate license to use AWR/ASH/ADDM from OEM ?
No. The Diagnostics Pack license is required regardless of how the data is accessed.
416. Do we have any healthcheck reports to check overall DB
performance issue besides running ADDM/AWR reports for bad period time?
There
is a general database health monitor to check the database. Details can be
found in the following article:
Document
466920.1 Health monitor
There
is no specific performance health check report. An ADDM report should provide
good recommendations for this and AWR is the best way to get a baseline for
good performance.
417. Are there any special considerations for the Automatic Workload Repository (AWR) when migrating databases to 11g?
No. There are no AWR-specific considerations.
418. AWR does not give recommendations on things found in the
report. How do I get recommendations?
Use
the ADDM report to assist you to interpret the information. This report does
numerous comparisons to determine improvement actions.
419. Is it possible to baseline AWR reports and is this
worthwhile?
Yes, you can use any meaningful set of snapshots as a baseline and then use this to compare with other reports. You can use a "good" period from your system to compare with the "bad" one; in this way you have a reliable expectation about what your healthy system looks like. The easiest way to do this is via Enterprise Manager. Basic commands are also mentioned in the following article:
Document 748642.1 How to Generate an AWR Report and Create Baselines
A good period could be defined as one when the performance is considered "good enough" to meet business requirements.
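For example, a baseline over a known good pair of snapshots can be created with DBMS_WORKLOAD_REPOSITORY (the snapshot IDs and baseline name below are illustrative):
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(start_snap_id => 100, end_snap_id => 104, baseline_name => 'good_period');
Snapshots covered by a baseline are exempt from the automatic purge, so the good period remains available for later comparison.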
420. Why does generating an AWR report in HTML format take much more time than AWR in TXT format?
Generating an HTML report is more complex since more formatting is required to produce the tags and internal links. If it takes MUCH longer than a TXT report then please log a Service Request.
421. Where can we get more information on specific events and
system calls ?
The
most common wait events are documented in the reference manual and in MOS you
can find documents with additional details. For example, you can find
information about 'library cache load lock' wait event in detail in :
Document
444560.1 Troubleshooting Library Cache: Lock, Pin and Load Lock
Note
that there are no specific "bad" wait events. Whatever your top
consumer is, it needs investigation even if it is just CPU (for example, you
may be wasting CPU cycles with poorly tuned SQLs)
422. What is
the difference between latch and lock?
A
latch is an internal mechanism used by Oracle to protect internal structures
from modification during concurrent activity.
A
lock is a mechanism provided by Oracle to protect user data from concurrent
activity while it is being changed.
The goal of both is to make sure that only a single process can modify a resource at a time. Both are serialisation mechanisms. See: Document 22908.1 FAQ: What are Latches and What Causes Latch Contention
423. Is oradebug still being used?
Yes.
The ORADEBUG utility is used primarily by Oracle Support to diagnose problems
that may arise with an Oracle database. You may use it for generating various
different types of trace.
The
AWR records performance data across the whole database by taking intermittent
snapshots that can then be analysed and compared with other snapshots.
424. Is there a "performance" parameter in Oracle?
No.
You need to look at the issue that is causing the performance issue and address
that directly.
425. What is the difference between TKProf and AWR reports?
TKProf
produces a report from a single trace file generated by a single session.
AWR
produces a report based upon the whole database activity as recorded in the AWR
repository.
Document
29012.1 QREF: TKPROF Usage - Quick Reference
Document
760786.1 TKProf Interpretation (9i and above)
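A typical TKProf invocation, sorting statements by elapsed time and excluding recursive SYS statements (the trace file name below is illustrative):
tkprof orcl_ora_12345.trc tkprof_out.txt sort=prsela,exeela,fchela sys=no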
426. How do I gather Performance Related Diagnostics?
AWR reports/Statspack reports
1363422.1 AWR Reports - Information Center
AWR/Statspack reports provide a method for evaluating the relative performance of a database. Assuming you have the appropriate licenses for AWR, please gather Automatic Workload Repository (AWR) reports for the system. Without the specific licenses, please gather Statspack snapshots.
Oracle 10g+: From 10g onwards, to check for general performance issues, use the Automatic Workload Repository (AWR) and specifically the Automatic Database Diagnostic Monitor (ADDM) tool for assistance.
276103.1 PERFORMANCE TUNING USING
10g ADVISORS AND MANAGEABILITY FEATURES
Prior to 10g: For 9i and 8i, Statspack reports, rather than AWR, should be gathered. To gather a Statspack report, please refer to: Document 94224.1 FAQ- Statspack Complete Reference.
10046 Trace (SQL_TRACE) 10046 trace
gathers tracing information about a session.
Example usage in a session is as follows:
alter session set timed_statistics = true;
alter session set statistics_level=all;
alter session set max_dump_file_size = unlimited;
alter session set events '10046 trace name context forever,level 12';
-- run the statement(s) to be traced --
select * from dual;
exit;
These
settings ensure that all the relevant information for tracing is collected and
that the cursor involved is successfully closed.
376442.1
Recommended Method for Obtaining 10046 trace for Tuning.
Querying Waits from V$Session_wait
The view V$Session_wait can show useful information about what a
session is waiting for.
Multiple selects from this view can indicate if a session is moving or not. When wait_time=0 the session is waiting; any other value indicates CPU activity. The 'state' column also helps to confirm the wait state of the session.
The following is an example formatted select:
set lines 132 pages 999
column event format a30
SELECT sid,state,event,seq#,p1,p2,p3,wait_time FROM V$session_wait WHERE sid = &&SID;
SELECT sid,state,event,seq#,p1,p2,p3,wait_time FROM V$session_wait WHERE sid = &&SID;
SELECT sid,state,event,seq#,p1,p2,p3,wait_time FROM V$session_wait WHERE sid = &&SID;
Document
43718.1 VIEW "V$SESSION_WAIT" Reference
** IMPORTANT ** v$session_wait
is often misinterpreted.
Apart
from the wait_time=0 confusion, users will often assume a session is waiting
because an event is shown in v$session_wait and the seconds_in_wait column is
rising. It should be remembered that seconds_in_wait only applies to a current
wait if wait_time =0 , otherwise it is actually "seconds since the last
wait completed". So if the wait has completed (i.e. non-zero wait_time)
then this would be expected to change. The "state" column can be used
to clear up this misinterpretation: If the session is waiting it says
"WAITING" and "WAITED%" if we are no longer waiting.
Finding Current Session ID - This select is useful for finding the current session information for tracing later:
SELECT p.pid, p.spid, s.sid FROM v$process p, v$session s WHERE s.paddr = p.addr AND s.audsid = userenv('SESSIONID');
System State Dumps - If the database is hung then we need to gather systemstate dumps to try to determine what is happening. At least 3 dumps should be taken as follows:
Login to sqlplus as the internal user:
10g and Later:
(a) Non-RAC
sqlplus "/ as sysdba"
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
wait 90 seconds
oradebug dump systemstate 266
wait 90 seconds
oradebug dump systemstate 266
quit
(b) RAC
$ sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug setinst all
oradebug -g all hanganalyze 4
oradebug -g all dump systemstate 266
quit
Prior to 10g:
sqlplus "/ as sysdba"
rem -- set trace file size to unlimited:
alter session set max_dump_file_size = unlimited;
alter session set events '10998 trace name context forever, level 1';
alter session set events 'immediate trace name systemstate level 10';
alter session set events 'immediate trace name systemstate level 10';
alter session set events 'immediate trace name systemstate level 10';
If no connection is possible at all then please refer to the
following article which describes how to collect systemstates in that
situation:
Document
121779.1 Taking a SYSTEMSTATE dump when you cannot CONNECT to Oracle
Errorstack Errorstack
traces are Oracle Call Stack dumps that can be used to gather stack information
for a process. They are useful for determination of what a process is doing at
a particular time. Attach to the process and gather at least 3 errorstacks. For
details see: Document 1364257.1 How to Collect Errorstacks for use in
Diagnosing Performance Issues.
PSTACK - Pstack is an operating system tool that can be used to gather stack information on some unix platforms. Attach to the process and gather about 10 pstacks while the job is running.
% script pstacks.txt
% /usr/proc/bin/pstack pid
--repeat 10x
% /usr/proc/bin/pstack pid
% exit
The
PID is the o/s process id of the process to be traced. Repeat the pstack
command about 10 times to capture possible stack changes. Further details of
pstack are in :
Document
1386530.1 Using truss and pstack to Debug Hang Type Problems
Document
70609.1 How To Display Information About Processes on SUN Solaris
PL/SQL Profiler - The PL/SQL profiler provides information about PL/SQL code with regard to CPU usage and other resource usage information. See Document 243755.1 Implementing and Using the PL/SQL Profiler.
Hanganalyze Hanganalyze
is often gathered for hang situations. The following describes how to gather
hanganalyze dumps:
Document
175006.1 Steps to generate HANGANALYZE trace files.
OS Watcher - OS Watcher (OSW) is a collection of UNIX shell scripts intended to collect and archive operating system and network metrics to aid support in diagnosing performance issues. Document 301137.1 OS Watcher User Guide
LTOM - The Lite Onboard Monitor (LTOM) is a java program designed as a proactive, real-time diagnostic platform and provides real-time automatic problem detection and data collection.
Document
352363.1 LTOM - The On-Board Monitor User Guide
Document
461050.1 The LTOM Graph (LTOMg) User Guide
Document
461228.1 The LTOM Graph FAQ
Document
461052.1 LTOM System Profiler - Sample Output
Document
977975.1 Diagnose LTOM Connection Problems
427. How do I Interpret the Results of Various Traces
Statspack Reports - Look at the Top 5 waiters section and work to reduce the time spent in the top waiter first. Once that has been addressed, regather a statspack report and see what effect the changes have made. The following assumptions hold true:
Top waiter is IO/CPU -> Main issue is likely to be SQL tuning
Top waiter is any other event -> Database-wide performance issue
For
more information on interpretation of statspacks output refer to:
http://www.oracle.com/technology/deploy/performance/pdf/statspack_tuning_otn_new.pdf
10046 traces
In order to find what is causing a session to run slowly with a
10046 trace of that session, do the following:
Run the 10046 trace through TKProf and look at the total time spent in SQL. Search back through the TKProf report looking for a SQL statement which takes up the largest proportion of the total.
Look at the breakdown of time and wait events for that SQL.
Always remember that the 'number of executions' is important, as although the time for a statement may be high this may be accompanied by an equally high execution count. Assume the following:
If most of the time is spent in parsing there may be a parsing issue.
If the number of physical IOs is high then look at changing the access path of the query to do less work, or increasing the buffer cache to get buffers from memory rather than blocks from disk.
If the wait events are enqueue related then generally this is an application design issue. Determine the enqueue which is being waited for and address appropriately.
Document
21154.1 EVENT 10046 "enable SQL statement tracing (including
binds/waits)"
Document
39817.1 Interpreting Raw SQL_TRACE and DBMS_SUPPORT.START_TRACE output
Document
199081.1 SQL_TRACE (10046), TKProf and Explain Plan - Overview Reference
Systemstates These
should be sent to Oracle Support Services to interpret.
Hanganalyze These
should be sent to Oracle Support Services to interpret.
Errorstacks - These should be sent to Oracle Support Services to interpret (some of the calls on the stack are generic and a result of how an errorstack works, so, if searched for on MyOracleSupport, they can lead to incorrect analysis).
428. Top
Database Performance Issues/Problems and How To Resolve Them
Library Cache/Shared Pool Latch waits - Typically these waits are a contention problem caused by unshared SQL (in the case of the library cache latch) or exhaustion of space in the shared pool (for the shared pool latch).
With the shared pool latch specifically, while new space allocations will require the latch, it is typically the freeing AND allocation of space in a shared pool that is too small which causes problems. See: Document 62143.1 Understanding and Tuning the Shared Pool
High Version Counts - High version counts occur when there are multiple copies of the 'same' statement in the shared pool, but some factor prevents them from being shared. This wastes space and causes latch contention.
Document 296377.1 Handling and resolving unshared cursors/large version_counts
Document 438755.1 Formatted V$SQL_SHARED_CURSOR Report by SQLID or Hash Value
Log File Sync waits Log file sync waits occur when
sessions wait for redo data to be written to disk. Typically this is caused by
slow writes or committing too frequently in the application.
Document
34592.1 WAITEVENT: "log file sync" Reference Note.
Document
857576.1 How to Tune Log File Sync?
Document
1064487.1 Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql)
Buffer Busy waits - Buffer busy waits occur when a session wants to access a database block in the buffer cache but cannot because the buffer is "busy".
Typical solutions are: Look for SQL that accesses the blocks in question and determine if the repeated reads are necessary. Check for suboptimal SQL (this is the most common cause of these events) - look at the execution plan for the SQL being run and try to reduce the gets per execution, which will minimise the number of blocks being accessed and therefore reduce the chances of multiple sessions contending for the same block.
Document
34405.1 WAITEVENT: "buffer busy waits" Reference Note
Document
155971.1 Resolving Intense and "Random" Buffer Busy Wait Performance
Problems:
Document
163424.1 How To Identify a Hot Block Within The Database Buffer Cache.
Cache Buffers Chains Latch waits
Document 1342917.1 Troubleshooting 'latch: cache buffers chains' Wait Contention
WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! - This issue occurs when the database detects that a waiter has waited for a resource for longer than a particular threshold. The message "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" appears in the alert log, and trace files and systemstates are dumped. Typically this is caused by two (or more) incompatible operations being run simultaneously.
Document
278316.1 Potential reasons for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE
LOCK! "
ORA-60 DEADLOCK DETECTED/enqueue hash chains latch - ORA-60 is an application error which usually occurs because a consistent locking strategy has not been followed throughout an application.
The reason 'enqueue hash chains latch' waits are referenced here is that, typically, during deadlock detection (i.e. the routine Oracle uses to determine if a deadlock actually exists), there is a heavy need for this latch, which can cause issues for other sessions. If there is a problem with this latch, check if a trace file is generated for the ORA-60 and resolve that issue.
TX
- Document 62354.1 TX Transaction locks - Example wait scenarios
TM
- Document 33453.1 REFERENTIAL INTEGRITY AND LOCKING
Procwatcher/Cluster Health Monitor These tools
can be used to monitor various aspects of database health. Refer to:
Document
459694.1 Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware
Processes
Document
736752.1 Introducing Cluster Health Monitor
429. What If I cannot Find an Answer to my Issue?
Performance
issues come in all sorts of guises, but you might find that someone in the
community at large has seen something similar and can help. A community has
been set up for My Oracle Support (MOS) users moderated by Oracle. The goal of
this community is to exchange database related Tuning knowledge and concepts.
The community can be found here: Database Tuning
430. Pro-Active
Problem Avoidance and Diagnostic Collection
Although some problems may be unforeseen, in many cases problems may be avoidable if signs are detected early enough. Additionally, if an issue does occur, it is of little use to start collecting information about that issue only after the event. AWR reports are one of the Support-recommended methods for collecting such diagnostics. For information on suggested uses, other proactive preparations and diagnostics, see: Document 1482811.1 Best Practices: Proactively Avoiding Database and Query Performance Issues
Document
1477599.1 Best Practices Around Data Collection For Performance Issues
AWR
reports are an extremely useful diagnostic tool for the determination of the
potential cause of database wide performance issues. Typically when a
performance issue is detected you would collect an AWR report covering the
period of the poor performance. It is best to use a reporting period no
longer than 1 hour as otherwise specifics can be lost.
It is also prudent to gather AWR reports during times when performance is acceptable to provide baselines for comparison when there is a problem. Ensure that the baseline snapshot duration is the same as the problem duration to facilitate a like-with-like comparison.
Document 1363422.1 Automatic Workload Repository (AWR) Reports - Start
Point
NOTE: It is often prudent to use a matched ADDM report initially to give a pointer to the main issues. Reading the corresponding ADDM report as a first step to tuning can save a lot of time because it immediately points at the main consumer, as compared to trying to understand what an AWR report is presenting.
Interpretation
Since we are looking at a performance issue, our primary concern is what the database is waiting for. When processes wait, they are being prevented from doing an activity because of some other factor. The largest waits offer the highest potential benefit when their wait times are reduced, and as such are a good focus. The Top Wait information provides exactly that and allows us to focus on the main problem areas without wasting time investigating areas that are not causing significant delay.
Top 5 Timed Events
As mentioned, the Top waits section is the most important single section in the
whole report, since it quantifies and allows comparison of the primary
diagnostic: what each session is waiting for. An example output is provided
below:
Top 5 Timed Events                                                Avg  %Total
~~~~~~~~~~~~~~~~~~                                               wait    Call
Event                                 Waits    Time (s)  (ms)    Time  Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
db file scattered read           10,152,564      81,327      8   29.6  User I/O
db file sequential read          10,327,231      75,878      7   27.6  User I/O
CPU time                                         56,207          20.5
read by other session             4,397,330      33,455      8   12.2  User I/O
PX Deq Credit: send blkd             31,398      26,576    846    9.7  Other
          -------------------------------------------------------------
The Top 5 Waits section reports on a number of useful topics related to events.
It records the number of waits encountered in the period and the total time
spent waiting, together with the average time waited for each event. The
section is ordered by the percentage of the total call time that each event is
responsible for.
Dependent on what is seen in this section, other report sections may need to be
referenced in order to quantify or check the findings. For example, the wait
count for a particular event needs to be assessed against the duration of the
reporting period and the number of users on the database at the time: 10
million waits in 10 minutes is far more significant than 10 million in 10
hours, and the same count spread across 10,000 users is far less significant
than when shared among only 10.
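The normalisation described above is easy to script. The sketch below is plain Python arithmetic over numbers copied from the example Top 5 section; nothing here is an Oracle API, and the figures are only those quoted in this document.

```python
# Put a raw AWR wait count into context by normalising it to the snapshot
# interval. Figures below are copied from the example Top 5 section.

def waits_per_second(wait_count: int, interval_minutes: float) -> float:
    """Raw waits divided by the reporting interval in seconds."""
    return wait_count / (interval_minutes * 60)

def avg_wait_ms(total_wait_s: float, wait_count: int) -> float:
    """Average time per wait in milliseconds."""
    return total_wait_s / wait_count * 1000

# 10 million waits look very different over 10 minutes vs. 10 hours:
print(round(waits_per_second(10_000_000, 10)))    # rate for a 10-minute report
print(round(waits_per_second(10_000_000, 600)))   # rate for a 10-hour report

# 'db file scattered read' in the example: 81,327 s over 10,152,564 waits
print(round(avg_wait_ms(81_327, 10_152_564), 1))  # ~8.0 ms, matching Avg wait (ms)
```

The same per-second view is what makes a baseline comparison meaningful: two reports of different durations can only be compared after this normalisation.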
In this example report, almost 60% of the time is spent waiting for I/O-related
reads. The event 'db file scattered read' is typically seen when fetching
blocks for a full table scan or index fast full scan, and performs multiblock
I/O. The event 'db file sequential read' is a single-block read, typically seen
for any activity where multiblock I/O is unavailable (for example, index
reads).
Another 20% of the time is spent waiting for or using CPU. High CPU usage is
often a symptom of poorly tuned SQL (or at least SQL that has the potential to
consume fewer resources), of which excessive I/O can also be a symptom.
Based on this, we would investigate whether these waits indicate a problem or
not. If so, resolve the problem; if not, move on to the next wait to determine
whether that is a potential cause. There are two main reasons why I/O-related
waits would be at the top of the waits:
◦ The database is doing lots of reads
◦ The individual reads are slow
The Top 5 events show us information that helps with both questions:
◦ Is the database doing lots of reads? The section shows more than 10 million
reads for each of these events in the period. Whether this is a lot depends on
whether the report duration is 1 hour or 1 minute, so check the report duration
to assess this. If the reads do seem excessive, then why would the database do
so many reads? The database only reads data because the execution of SQL
statements has instructed it to do so.
◦ Are the individual reads slow? The section shows waits of <= 8 ms for the two
I/O-related events. Whether this is fast or slow depends on the hardware
underlying the I/O subsystem, but typically anything under 20 ms is acceptable.
If the I/O were slow, you could get further information from the 'Tablespace IO
Stats' section:
Tablespace IO Stats                 DB/Inst: VMWREP/VMWREP  Snaps: 1-15
-> ordered by IOs (Reads + Writes) desc

Tablespace
------------------------------
                 Av      Av     Av                       Av     Buffer  Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits  Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
TS_TX_DATA
    14,246,367     283    7.6     4.6  145,263,880    2,883  3,844,161    8.3
USER
       204,834       4   10.7     1.0   17,849,021      354     15,249    9.8
UNDOTS1
        19,725       0    3.0     1.0   10,064,086      200      1,964    4.9
AE_TS
     4,287,567      85    5.4     6.7          932        0    465,793    3.7
TEMP
     2,022,883      40    0.0     5.8      878,049       17          0    0.0
UNDOTS3
     1,310,493      26    4.6     1.0      941,675       19         43    0.0
TS_TX_IDX
     1,884,478      37    7.3     1.0       23,695        0     73,703    8.3
SYSAUX
       346,094       7    5.6     3.9      112,744        2          0    0.0
SYSTEM
       101,771       2    7.9     3.5       25,098        0        653    2.7
Specifically, look at the timing under Rd(ms). If it is higher than 20
milliseconds per read and reads are high, then you may want to start
investigating a potential I/O bottleneck at the operating system level.
NOTE: You should ignore relatively idle tablespaces/files, as you can get high
values due to disk spin-up and similar effects that are not relevant. If you
have an issue with 10 million reads being slow, it is unlikely that a
tablespace/file with 10 reads has caused the problem!
Note:223117.1
Troubleshooting I/O-related waits
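The two checks above (read latency beyond roughly 20 ms, but only on tablespaces busy enough for the number to mean anything) can be sketched as a small filter. This is illustrative Python over figures copied from the example section; the 20 ms threshold is the rule of thumb quoted above, and the minimum-reads cut-off is an assumption for demonstration, not an Oracle limit.

```python
# Flag tablespaces whose average read time is suspicious, ignoring
# near-idle ones where a high Rd(ms) is meaningless (disk spin-up etc.).

SLOW_RD_MS = 20.0     # acceptability threshold quoted in the text
MIN_READS = 10_000    # assumed cut-off for "busy enough to matter"

# (tablespace, reads, avg Rd(ms)) from the example Tablespace IO Stats
stats = [
    ("TS_TX_DATA", 14_246_367,  7.6),
    ("USER",           204_834, 10.7),
    ("UNDOTS1",         19_725,  3.0),
    ("AE_TS",        4_287_567,  5.4),
]

suspects = [name for name, reads, rd_ms in stats
            if reads >= MIN_READS and rd_ms > SLOW_RD_MS]
print(suspects)   # [] -- no tablespace in this example breaches 20 ms
```

In the example data no tablespace is flagged, which matches the conclusion in the text that the individual reads are not slow.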
Although high waits for 'db file scattered read' and 'db file sequential read'
can be I/O related, it is actually more common to find that these waits are
relatively 'normal' given the SQL the database is being asked to run. In fact,
on a well-tuned database you would want these events to be at the top of the
waits, since that would mean that no 'problem' events were there instead!
The trick is assessing whether the high waits indicate that some SQL statements
are not using optimal paths (as mentioned earlier). High waits for 'db file
scattered read' suggest that SQL may not be using optimal access paths and is
tending to do full table scans as opposed to index access (or there may be
missing or sub-optimal indexes). Similarly, high waits for 'db file sequential
read' may indicate SQL statements using unselective indexes, and therefore
reading more index blocks than necessary, or using the wrong indexes. Either
way, these waits may point to poor execution plans for SQL statements.
In either case, the next step is to check the top resource-consuming SQL
statements in the AWR report to determine whether they look excessive and
whether improvements can be made. To do this, look at the SQL Statistics
section. As mentioned, 20% of the time is spent waiting for or using CPU; this
should also be considered when looking at the SQL statistics.
Remember that the next step to take following the Top 5 waits depends upon the
findings within that section. In the example above, three of the waits point
towards potentially sub-optimal SQL, so that is the section to investigate
next. Equally, if you do not see any latch waits, then latches are not causing
a significant problem on your instance and you do not need to investigate them
further.
Generally, if the database is slow and the Top 5 timed events include
"CPU", "db file sequential read" and "db file scattered read" in any order,
then it is usually worth jumping to the Top SQL (by logical and physical reads)
section of the AWR report and calling the SQL Tuning Advisor on the statements
found there (or tuning them manually) to make sure they are running
efficiently.
• SQL Statistics
AWR reports show a number of different SQL statistics. The SQL statistic
sub-sections to examine should be chosen based upon the top wait events seen in
the Top 5 section.
In our example, the top waits were 'db file scattered read', 'db file
sequential read' and CPU. For these we are most interested in SQL ordered by
CPU Time, Gets and Reads. These sections duplicate some information, each
adding specifics appropriate to its topic. 'SQL ordered by Gets' is often a
convenient starting point, as statements with high buffer gets are usually good
candidates for tuning:
SQL ordered by Gets
-> Resources reported for PL/SQL code includes the resources used by all SQL
   statements called by the code.
-> Total Buffer Gets: 4,745,943,815
-> Captured SQL account for 122.2% of Total

                                Gets              CPU     Elapsed
  Buffer Gets   Executions    per Exec  %Total Time (s)  Time (s)    SQL Id
-------------- ------------ ------------ ------ -------- --------- -------------
 1,228,753,877          168  7,314,011.2   25.9  8022.46   8404.73 5t1y1nvmwp2
SELECT "ADDRESSID",CURRENT$."ADDRESSTYPEID",CURRENT$."ADDRESS3",
CURRENT$."CITY",CURRENT$."ZIP",CURRENT$."STATE",CURRENT$."PHONECOUNTRYCODE",
CURRENT$."PHONENUMBER",CURRENT$."PHONEEXTENSION",CURRENT$."FAXCOU

 1,039,875,759   62,959,363         16.5   21.9  5320.27   5618.96 grr4mg7ms81
Module: DBMS_SCHEDULER
INSERT INTO "ADDRESS_RDONLY" ("ADDRESSID","ADDRESSTYPEID","CUSTOMERID","
ADDRESS1","ADDRESS2","ADDRESS3","CITY","ZIP","STATE","PHONECOUNTRYCODE","PHONENU

   854,035,223          168  5,083,543.0   18.0  5713.50   7458.95 4at7cbx8hnz
SELECT "CUSTOMERID",CURRENT$."ISACTIVE",CURRENT$."FIRSTNAME",CURRENT$."LASTNAME",CU
RRENT$."ORGANIZATION",CURRENT$."DATEREGISTERED",CURRENT$."CUSTOMERSTATUSID",CURR
ENT$."LASTMODIFIEDDATE",CURRENT$."SOURCE",CURRENT$."EMPLOYEEDEPT",CURRENT$.
Tuning can be performed either manually or by calling the SQL Tuning Advisor:
Document 271196.1 Automatic SQL Tuning - SQL Profiles
Document 262687.1 How to use the Sql Tuning Advisor
Document 276103.1 PERFORMANCE TUNING USING ADVISORS AND MANAGEABILITY FEATURES:
AWR, ASH, and ADDM and Sql Tuning Advisor
Analysis:
◦ Total Buffer Gets: 4,745,943,815. On the assumption that this is an hour-long
report, this is a significant number of gets, and as such it confirms that it
is worth investigating the top SQL statements to make sure they are taking
optimal paths.
◦ Individual Buffer Gets: The buffer gets for the individual statements shown
are very high, the lowest being over 850 million. These three statements
actually point towards two different reasons for the large number of buffer
gets:
◦ Excessive buffer gets per execution: SQL_IDs '5t1y1nvmwp2' and '4at7cbx8hnz'
are executed only 168 times each, but each execution reads over 5 million
buffers. These statements are prime candidates for tuning, since the number of
buffers read in each execution is so high.
◦ Excessive executions: On the other hand, SQL_ID 'grr4mg7ms81' reads only
around 16 buffers per execution, so tuning the statement itself may not reduce
that significantly. The issue with this statement is the number of times it is
executed: almost 63 million. Changing the way in which the statement is called
is likely to have the largest impact here; it is probably called in a loop,
once per record, and if it could be called so as to process multiple records at
once there would be potential for significant economies of scale.
Remember that these numbers may be 'normal' for this environment (some
environments are very busy). By comparing this report against a baseline, you
can see whether these SQL statements also read this much data when the database
performs well. If they do, then they are not the cause of the issue and can be
ignored (although there may still be general benefit in improving them).
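The triage in the bullets above reduces to a simple classification: high gets per execution means tune the statement; high execution count means change how it is called. The cut-off values in this Python sketch are illustrative assumptions, not Oracle-defined limits, but the figures fed to it come from the example report.

```python
# Classify a top SQL from 'SQL ordered by Gets' by where its cost comes from.

def classify(buffer_gets: int, executions: int) -> str:
    per_exec = buffer_gets / executions
    if per_exec > 100_000:          # assumed threshold: heavy per execution
        return "tune statement (high gets per execution)"
    if executions > 1_000_000:      # assumed threshold: called too often
        return "reduce call frequency (high execution count)"
    return "probably not the main problem"

# Figures from the example report:
print(classify(1_228_753_877, 168))          # 5t1y1nvmwp2: tune the statement
print(classify(1_039_875_759, 62_959_363))   # grr4mg7ms81: reduce call frequency
```

Either outcome still needs verifying against a baseline, as the text notes: a statement that behaves the same way when performance is good is unlikely to be the cause.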
Other SQL Statistic Sections
As mentioned previously, there are a number of report sections that help with
specific causes. If the particular cause is not present, there is likely to be
little benefit in looking at the corresponding section. The following outlines
some potential causes and uses:
◦ Waits for 'cursor: mutex/pin'
If there are mutex waits such as 'cursor: pin S wait on X' or 'cursor: mutex
X', these are indicative of parsing issues. On this basis, look for statements
with high parse counts or high version counts under 'SQL ordered by Parse
Calls' and 'SQL ordered by Version Count', as these are the most likely causes
of the problem. The following notes can assist further:
Document
1356828.1 FAQ: 'cursor: mutex ..' / 'cursor: pin ..' / 'library cache: mutex
..' Type Wait Events
Note:1349387.1
Troubleshooting 'cursor: pin S wait on X' waits.
• Load Profile
Dependent on the waits, the load profile section provides either useful general
background information or specific details related to potential issues.
Load Profile
~~~~~~~~~~~~                        Per Second    Per Transaction
                                 -------------    ---------------
Redo size:                        4,585,414.80       3,165,883.14
Logical reads:                       94,185.63          65,028.07
Block changes:                       40,028.57          27,636.71
Physical reads:                       2,206.12           1,523.16
Physical writes:                      3,939.97           2,720.25
User calls:                              50.08              34.58
Parses:                                  26.96              18.61
Hard parses:                              1.49               1.03
Sorts:                                   18.36              12.68
Logons:                                   0.13               0.09
Executes:                             4,925.89           3,400.96
Transactions:                             1.45

% Blocks changed per Read:  42.50    Recursive Call %:  99.19
Rollback per transaction %: 59.69    Rows per Sort:   1922.64
In this example, the waits section shows potential issues with the execution of
SQL, so the load profile can be checked for details in that area, although it
is not the primary source of such information. If you were looking at the AWR
report for general tuning, you might note that the load section shows
relatively high redo activity with high physical writes: there are more writes
than reads in this load, with 42% of blocks changed per read.
Furthermore, there is little hard parsing compared with soft parsing. If a
mutex wait such as 'library cache: mutex X' were the top wait, then statistics
such as the overall parse rate would be more relevant. Again, comparing against
a baseline provides the best information, for example checking whether the load
has changed by comparing redo size, user calls, and parsing.
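Baseline comparison of the Load Profile amounts to computing the percentage change of each per-second figure. A sketch, where the problem-period numbers come from the example above but the baseline numbers are invented purely for illustration:

```python
# Compare problem-period Load Profile rates against a baseline report.
# 'problem' figures are from the example; 'baseline' is hypothetical.

problem  = {"redo size": 4_585_414.80, "user calls": 50.08, "parses": 26.96}
baseline = {"redo size": 4_100_000.00, "user calls": 48.00, "parses": 25.00}

for metric in problem:
    change = (problem[metric] - baseline[metric]) / baseline[metric] * 100
    print(f"{metric}: {change:+.1f}% vs baseline")
```

A metric that has jumped well beyond its baseline value is worth examining even when no single wait event stands out.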
• Instance Efficiency
Again, instance efficiency statistics are more useful for general tuning than
for addressing specific issues (unless the waits point at them).
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.91       Redo NoWait %:  100.00
            Buffer  Hit   %:   98.14    In-memory Sort %:   99.98
            Library Hit   %:   99.91        Soft Parse %:   94.48
         Execute to Parse %:   99.45         Latch Hit %:   99.97
Parse CPU to Parse Elapsd %:   71.23     % Non-Parse CPU:   99.00
From the point of view of our example, the most important statistic presented
here is '% Non-Parse CPU', because it indicates that almost all of the CPU time
seen in the Top Waits section is attributable to execution rather than parsing,
which means that tuning the SQL may help to improve it.
If we were doing general tuning, the 94.48% soft parse rate would show a small
proportion of hard parsing, which is desirable. The high execute-to-parse
percentage indicates good cursor usage. Generally, we want these statistics
close to 100%, but remember that a few percent may not be relevant depending on
the application; for example, in a data warehouse environment hard parsing may
be higher due to the use of materialized views and/or histograms. So again,
comparison against a baseline report from when performance was good is
important.
• Latch Activity
In the example we are not seeing significant waits for latches, so this section
could be ignored. However, if latch waits were significant, we would look for
high latch sleeps under 'Latch Sleep Breakdown' to explain 'latch free' waits:
Latch Sleep Breakdown
* ordered by misses desc

Latch Name
----------------------------------------
  Get Requests      Misses      Sleeps  Spin Gets   Sleep1   Sleep2   Sleep3
-------------- ----------- ----------- ---------- -------- -------- --------
cache buffers chains
 2,881,936,948   3,070,271      41,336  3,031,456        0        0        0
row cache objects
   941,375,571   1,215,395         852  1,214,606        0        0        0
object queue header operation
   763,607,977     949,376      30,484    919,782        0        0        0
cache buffers lru chain
   376,874,990     705,162       3,192    702,090        0        0        0
Here the top latch is 'cache buffers chains'. Cache buffers chains latches
protect the buffers in the buffer cache that hold data retrieved from disk, so
this latch is perfectly normal to see when data is being read. When it becomes
stressed, the sleeps figure tends to rise as sessions start to wait for the
buffers they require. Contention can be caused by poorly tuned SQL repeatedly
reading the same buffers. In our example, although the gets are high at 2.8
billion, the 41,336 sleeps are low, so the average number of sleeps per miss
(Avg Slps/Miss) is small. The server is able to deal with this volume of data,
and there is no significant contention on cache buffers chains latches at this
point.
Note:413942.1
How to Identify Which Latch is Associated with a "latch free" wait
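The "low contention" conclusion above can be checked numerically: Avg Slps/Miss is simply sleeps divided by misses, using figures from the Latch Sleep Breakdown example.

```python
# A low sleeps-per-miss ratio means sessions almost never have to sleep
# after missing the latch, i.e. no significant contention.

def avg_sleeps_per_miss(sleeps: int, misses: int) -> float:
    return sleeps / misses

# cache buffers chains in the example: 41,336 sleeps over 3,070,271 misses
ratio = avg_sleeps_per_miss(41_336, 3_070_271)
print(round(ratio, 3))   # ~0.013 -- far below one sleep per miss
```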
Notable timed and wait events:
• CPU time events
CPU appearing as the top timed event in AWR does not necessarily indicate a
problem. However, if performance is slow with high CPU usage, then start
investigating there. First, check whether a SQL statement is taking most of the
CPU, under 'SQL ordered by CPU Time' in the AWR report:
SQL ordered by CPU Time
-> Resources reported for PL/SQL code includes the resources used by all SQL
   statements called by the code.
-> % Total is the CPU Time divided into the Total CPU Time times 100
-> Total CPU Time (s): 56,207
-> Captured SQL account for 114.6% of Total

                                         CPU per          % Total
CPU Time (s) Elapsed Time (s) Executions Exec (s) % Total DB Time SQL Id
------------ ---------------- ---------- -------- ------- ------- -------------
      20,349           24,884        168   121.12    36.2     9.1 7bbhgqykv3cm9
Module: DBMS_SCHEDULER
DECLARE job BINARY_INTEGER := :job; next_date TIMESTAMP WITH TIME ZONE := :myda
te; broken BOOLEAN := FALSE; job_name VARCHAR2(30) := :job_name; job_subname
VARCHAR2(30) := :job_subname; job_owner VARCHAR2(30) := :job_owner; job_start
TIMESTAMP WITH TIME ZONE := :job_start; job_scheduled_start TIMESTAMP WITH TIME
Analysis:
◦ Total CPU Time (s): 56,207. This represents roughly 15.6 hours of CPU time in
total; whether this is significant depends on the report duration and the
number of CPUs available.
◦ The top CPU-using SQL consumes 20,349 seconds (around 5.7 hours), which is
36.2% of the total CPU time.
◦ The proportion of total DB time this represents is 9.1%.
◦ Executions is 168; since this execution count matches two of the three SQL
statements identified earlier, they may be related, and this task may well be
the scheduler job that runs those statements.
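The arithmetic in that analysis is worth making explicit: converting the CPU seconds to hours and recomputing the % Total figure, in plain Python over the numbers copied from the example.

```python
# Sanity-check the CPU figures from 'SQL ordered by CPU Time'.

def cpu_hours(cpu_seconds: float) -> float:
    return cpu_seconds / 3600

total_cpu_s   = 56_207   # Total CPU Time (s) in the example report
top_sql_cpu_s = 20_349   # top statement's CPU time

print(round(cpu_hours(total_cpu_s), 1))             # ~15.6 CPU-hours overall
print(round(cpu_hours(top_sql_cpu_s), 1))           # ~5.7 CPU-hours, top SQL
print(round(top_sql_cpu_s / total_cpu_s * 100, 1))  # 36.2 -- matches % Total
```

Note that CPU-hours can legitimately exceed wall-clock hours on a multi-CPU host, which is why the report duration and CPU count both matter when judging these figures.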
Other potential CPU-related issues:
◦ Check whether other waits accompany the high CPU timed event. For example,
'cursor: pin S' waits may cause high CPU, as in the following known issue:
Note:6904068.8 Bug 6904068 - High CPU usage when there are "cursor: pin S"
waits
◦ High external CPU usage
If a process outside of the database is taking high CPU, it may be preventing
database processes from getting the CPU they require and so affecting database
performance. In this case, run OSWatcher or another OS diagnostic tool to find
which process is taking the CPU. Note:433472.1 OS Watcher For Windows (OSWFW)
User Guide
◦ Troubleshooting CPU usage
The
following note outlines how to further diagnose high CPU usage:
Note:164768.1
Troubleshooting: High CPU Utilization
• 'Log file sync' waits
When a user session commits or rolls back, the log writer flushes the redo from
the log buffer to the redo logs. AWR reports are very useful for determining
whether this is a problem and whether its cause is I/O or something else. The
following articles deal specifically with this symptom:
Document 1376916.1 Troubleshooting: "Log File Sync" Waits
Note:34592.1 WAITEVENT: "log file sync"
• Buffer busy waits
This is the event waited on when a session is trying to get a buffer from the
buffer cache but the buffer is busy: either being read by another session, or
held by another session in an incompatible mode. To find which block is busy
and why, use the following notes:
Document 155971.1 Resolving Intense and "Random" Buffer Busy Wait Performance
Problems
Note:34405.1 WAITEVENT: "buffer busy waits"
Troubleshooting Other Issues
Use of ADDM Reports alongside AWR
ADDM reports can be reviewed alongside AWR to assist in diagnosis, since they
provide specific recommendations that can point at potential problems. The
following is a sample ADDM report taken from:
Note:250655.1 How to use the Automatic Database Diagnostic Monitor: Example
Output:
DETAILED ADDM REPORT FOR TASK 'SCOTT_ADDM' WITH ID 5
----------------------------------------------------
Analysis Period: 17-NOV-2003 from 09:50:21 to 10:35:47
Database ID/Instance: 494687018/1
Snapshot Range: from 1 to 3
Database Time: 4215 seconds
Average Database Load: 1.5 active sessions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FINDING 1: 65% impact (2734 seconds)
------------------------------------
PL/SQL execution consumed significant database time.
RECOMMENDATION 1: SQL Tuning, 65% benefit (2734 seconds)
  ACTION: Tune the PL/SQL block with SQL_ID fjxa1vp3yhtmr. Refer to
    the "Tuning PL/SQL Applications" chapter of Oracle's "PL/SQL
    User's Guide and Reference"
  RELEVANT OBJECT: SQL statement with SQL_ID fjxa1vp3yhtmr
    BEGIN EMD_NOTIFICATION.QUEUE_READY(:1, :2, :3); END;

FINDING 2: 35% impact (1456 seconds)
------------------------------------
SQL statements consuming significant database time were found.
RECOMMENDATION 1: SQL Tuning, 35% benefit (1456 seconds)
  ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID
    gt9ahqgd5fmm2.
  RELEVANT OBJECT: SQL statement with SQL_ID gt9ahqgd5fmm2 and
    PLAN_HASH 547793521
    UPDATE bigemp SET empno = ROWNUM

FINDING 3: 20% impact (836 seconds)
-----------------------------------
The throughput of the I/O subsystem was significantly lower than expected.
RECOMMENDATION 1: Host Configuration, 20% benefit (836 seconds)
  ACTION: Consider increasing the throughput of the I/O subsystem.
    Oracle's recommended solution is to stripe all data files using
    the SAME methodology. You might also need to increase the
    number of disks for better performance.
RECOMMENDATION 2: Host Configuration, 14% benefit (584 seconds)
  ACTION: The performance of file D:\ORACLE\ORADATA\V1010\UNDOTBS01.DBF
    was significantly worse than other files. If striping all files
    using the SAME methodology is not possible, consider striping
    this file over multiple disks.
  RELEVANT OBJECT: database file
    "D:\ORACLE\ORADATA\V1010\UNDOTBS01.DBF"
SYMPTOMS THAT LED TO THE FINDING:
  Wait class "User I/O" was consuming significant database time.
  (34% impact [1450 seconds])

FINDING 4: 11% impact (447 seconds)
-----------------------------------
Undo I/O was a significant portion (33%) of the total database I/O.
NO RECOMMENDATIONS AVAILABLE
SYMPTOMS THAT LED TO THE FINDING:
  The throughput of the I/O subsystem was significantly lower than
  expected. (20% impact [836 seconds])
  Wait class "User I/O" was consuming significant database time.
  (34% impact [1450 seconds])

FINDING 5: 9.9% impact (416 seconds)
------------------------------------
Buffer cache writes due to small log files were consuming significant
database time.
RECOMMENDATION 1: DB Configuration, 9.9% benefit (416 seconds)
  ACTION: Increase the size of the log files to 796 M to hold at
    least 20 minutes of redo information.
The ADDM report presents possible recommendations in a more readable format
than AWR. However, ADDM findings should be interpreted alongside AWR statistics
for accurate diagnosis.
Document
786554.1 How to Read PGA Memory Advisory Section in AWR and Statspack Reports
Document
754639.1 How to Read Buffer Cache Advisory Section in AWR and Statspack Reports
Document
1301503.1 Troubleshooting: AWR Snapshot Collection issues
Document
1363422.1 Automatic Workload Repository (AWR) Reports - Start Point
Statspack
AWR reports supersede legacy reports such as Statspack and BSTAT/ESTAT. For
reference, the following is a link to an article outlining how to read
Statspack reports:
http://www.oracle.com/technetwork/database/focus-areas/performance/statspack-opm4-134117.pdf
Document
94224.1 FAQ- Statspack Complete Reference
Document 394937.1 Statistics
Package (STATSPACK) Guide
431. Why and when should one
tune?
One of the
biggest responsibilities of a DBA is to ensure that the Oracle database is
tuned properly. The Oracle RDBMS is highly tunable and allows the database to
be monitored and adjusted to increase its performance.
One should do performance tuning for the following reasons:
The speed of computing might be wasting valuable human time (users waiting for response);
Enable your system to keep up with the speed at which business is conducted; and
Optimize hardware usage to save money (companies are spending millions on hardware).
432. What
database aspects should be monitored?
One should
implement a monitoring system to constantly monitor the following aspects of a
database. Writing custom scripts, implementing Oracle’s Enterprise Manager, or
buying a third-party monitoring product can achieve this. If an alarm is
triggered, the system should automatically notify the DBA (e-mail, page, etc.)
to take appropriate action.
Infrastructure
availability:
• Is the database up and responding to requests?
• Are the listeners up and responding to requests?
• Are the Oracle Names and LDAP Servers up and responding to requests?
• Are the Web Listeners up and responding to requests?
Things
that can cause service outages:
• Is the archive log destination filling up?
• Objects getting close to their max extents
• Tablespaces running low on free space / objects that would not be able to extend
• User and process limits reached
433. Where should the tuning effort be
directed?
Consider the following areas for tuning. The order in which the steps are
listed should be maintained to prevent tuning side effects; for example, there
is no point increasing the buffer cache if you can reduce I/O by rewriting a
SQL statement.
Database Design (if it’s not too late):
Poor system performance usually results from a poor database design. One should generally normalize to the 3NF. Selective denormalization can provide valuable performance improvements. When designing, always keep the “data access path” in mind. Also look at proper data partitioning, data replication, aggregation tables for decision support systems, etc.
Application Tuning:
Experience showed that approximately 80% of all Oracle system performance problems are resolved by coding optimal SQL. Also consider proper scheduling of batch tasks after peak working hours.
Memory Tuning:
Properly size your database buffers (shared pool, buffer cache, log buffer, etc) by looking at your buffer hit ratios. Pin large objects into memory to prevent frequent reloads.
Disk I/O Tuning:
Database files need to be properly sized and placed to provide maximum disk subsystem throughput. Also look for frequent disk sorts, full table scans, missing indexes, row chaining, data fragmentation, etc.
Eliminate Database Contention:
Study database locks, latches and wait events carefully and eliminate where possible.
Tune the Operating System:
Monitor and tune operating system CPU, I/O and memory utilization. For more information, read the related Oracle FAQ dealing with your specific operating system.
434. What
tuning indicators can one use?
The
following high-level tuning indicators can be used to establish if a database
is performing optimally or not:
• Buffer Cache Hit Ratio
Formula: Hit Ratio = (Logical Reads – Physical Reads) / Logical Reads
Action: Increase DB_CACHE_SIZE (DB_BLOCK_BUFFERS prior to 9i) to increase hit ratio
• Library Cache Hit Ratio
Action: Increase the SHARED_POOL_SIZE to increase hit ratio
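The buffer cache hit ratio formula above is simple arithmetic, sketched here in Python. Note that hit ratios are only coarse indicators and can be misleading on their own; confirm against wait events (as in the AWR discussion earlier) before resizing caches. The sample figures are the per-second logical and physical reads from the earlier Load Profile example, rounded to whole numbers:

```python
# Buffer Cache Hit Ratio = (Logical Reads - Physical Reads) / Logical Reads

def buffer_cache_hit_ratio(logical_reads: float, physical_reads: float) -> float:
    """Return the hit ratio as a percentage."""
    return (logical_reads - physical_reads) / logical_reads * 100

# e.g. ~94,186 logical and ~2,206 physical reads per second
print(round(buffer_cache_hit_ratio(94_186, 2_206), 2))   # ~97.66 %
```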
435. What tools/utilities does Oracle
provide to assist with performance tuning?
Oracle
provide the following tools/ utilities to assist with performance monitoring
and tuning:
• TKProf
• UTLBSTAT.SQL and UTLESTAT.SQL – Begin and end stats monitoring
• Statspack
• Oracle Enterprise Manager – Tuning Pack
436. What
is STATSPACK and how does one use it?
Statspack
is a set of performance monitoring and reporting utilities provided by Oracle from
Oracle8i and above. Statspack provides improved BSTAT/ESTAT functionality,
though the old BSTAT/ESTAT scripts are still available. For more information
about STATSPACK, read the documentation in file
$ORACLE_HOME/rdbms/admin/spdoc.txt.
Install Statspack:
cd $ORACLE_HOME/rdbms/admin
sqlplus “/ as sysdba” @spdrop.sql -- Drop any existing Statspack installation first
sqlplus “/ as sysdba” @spcreate.sql -- Enter tablespace names when prompted
Use Statspack:
sqlplus perfstat/perfstat
exec statspack.snap; -- Take a performance snapshot
exec statspack.snap; -- Take a second snapshot later to report against
• Get a list of snapshots
select SNAP_ID, SNAP_TIME from STATS$SNAPSHOT;
@spreport.sql — Enter two snapshot id’s for difference report
Other Statspack Scripts:
• sppurge.sql – Purge a range of Snapshot Id’s between the specified begin and end Snap Id’s
• spauto.sql – Schedule a dbms_job to automate the collection of STATSPACK statistics
• spcreate.sql – Installs the STATSPACK user, tables and package on a database (Run as SYS).
• spdrop.sql – Deinstall STATSPACK from database (Run as SYS)
• sppurge.sql – Delete a range of Snapshot Id’s from the database
• spreport.sql – Report on differences between values recorded in two snapshots
• sptrunc.sql – Truncates all data in Statspack tables
437. When
is cost based optimization triggered?
It’s important to have statistics on all tables for the CBO (Cost Based
Optimizer) to work correctly. If one table involved in a statement does not
have statistics, Oracle reverts to rule-based optimization for that statement.
So you really want all tables to have statistics right away; it won’t help much
to have only the larger tables analyzed.
Generally, the CBO can change the execution plan when you:
1. Change statistics of objects by doing an ANALYZE;
2. Change some initialization parameters (for example: hash_join_enabled, sort_area_size, db_file_multiblock_read_count).
438. How
can one optimize %XYZ% queries?
It is possible to improve %XYZ% queries by forcing the optimizer to scan all
the entries of the index instead of the table. This can be done by specifying
hints. If the index is physically smaller than the table (which is usually the
case), it will take less time to scan the entire index than to scan the entire
table.
439. Where
can one find I/O statistics per table?
The UTLESTAT report shows I/O per tablespace, but one cannot see which tables
in the tablespace have the most I/O.
The $ORACLE_HOME/rdbms/admin/catio.sql script creates a sample_io procedure and table to gather the required information. After executing the procedure, one can do a simple SELECT * FROM io_per_object; to extract the required information.
For more details, look at the header comments in the $ORACLE_HOME/rdbms/admin/catio.sql script.