Database slow; Log file sync shown in AWR
Our database becomes slow occasionally, and when I looked at the AWR report it shows 'log file sync' quite often. Any idea why this is happening?
Garry,
1. Please get the SA to check the I/O subsystem for any I/O queues.
2. Check the redo log switching sequence in the alert log during the time of this slowness and see if there is anything unusual.
3. Generate an ADDM report and see if it has any redo-related recommendations.
4. Do you have any batch jobs running during this slowness? Check whether any unwanted frequent commits (like COMMITs within a loop) are happening.
Edited by: Manu Alphonse on Dec 4, 2009 8:58 PM
-
Log file sync top event during performance test - avg 36 ms
Hi,
During the performance test of our product before deployment into production, I see "log file sync" at the top with an average wait of 36 ms, which I feel is too high.
                                                     Avg
                                                    wait   % DB
Event                           Waits     Time(s)   (ms)   time Wait Class
log file sync                 208,327       7,406     36   46.6 Commit
direct path write             646,833       3,604      6   22.7 User I/O
DB CPU                                      1,599          10.1
direct path read temp       1,321,596         619      0    3.9 User I/O
log buffer space                4,161         558    134    3.5 Configurat
Although the testers are not complaining about the performance of the application, we DBAs are expected to be proactive about any bad signals from the DB.
I am not able to figure out why "log file sync" has such a slow response.
Below is the snapshot from the load profile.
              Snap Id      Snap Time     Sessions Curs/Sess
Begin Snap:    108127 16-May-13 20:15:22      105       6.5
  End Snap:    108140 16-May-13 23:30:29      156       8.9
   Elapsed:               195.11 (mins)
   DB Time:               265.09 (mins)
Cache Sizes                       Begin        End
~~~~~~~~~~~                  ---------- ----------
               Buffer Cache:     1,168M     1,136M  Std Block Size:         8K
           Shared Pool Size:     1,120M     1,168M      Log Buffer:    16,640K
Load Profile              Per Second   Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~         ---------------   --------------- ---------- ----------
      DB Time(s):                1.4               0.1       0.02       0.01
       DB CPU(s):                0.1               0.0       0.00       0.00
       Redo size:          607,512.1          33,092.1
   Logical reads:            3,900.4             212.5
   Block changes:            1,381.4              75.3
  Physical reads:              134.5               7.3
 Physical writes:              134.0               7.3
      User calls:              145.5               7.9
          Parses:               24.6               1.3
     Hard parses:                7.9               0.4
W/A MB processed:          915,418.7          49,864.2
          Logons:                0.1               0.0
        Executes:               85.2               4.6
       Rollbacks:                0.0               0.0
    Transactions:               18.4
Some of the top background wait events:
Background Wait Events DB/Inst: Snaps: 108127-108140
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0
                                                         Avg
                                    %Time Total Wait    wait  Waits   % bg
Event                         Waits -outs   Time (s)    (ms)   /txn   time
log file parallel write     208,563     0      2,528      12    1.0   66.4
db file parallel write        4,264     0        785     184    0.0   20.6
Backup: sbtbackup                 1     0        516  516177    0.0   13.6
control file parallel writ    4,436     0         97      22    0.0    2.6
log file sequential read      6,922     0         95      14    0.0    2.5
Log archive I/O               6,820     0         48       7    0.0    1.3
os thread startup               432     0         26      60    0.0     .7
Backup: sbtclose2                 1     0         10   10094    0.0     .3
db file sequential read       2,585     0          8       3    0.0     .2
db file single write            560     0          3       6    0.0     .1
log file sync                    28     0          1      53    0.0     .0
control file sequential re   36,326     0          1       0    0.2     .0
log file switch completion        4     0          1     207    0.0     .0
buffer busy waits                 5     0          1     116    0.0     .0
LGWR wait for redo copy         924     0          1       1    0.0     .0
log file single write            56     0          1       9    0.0     .0
Backup: sbtinfo2                  1     0          1     500    0.0     .0
During a previous perf test, things didn't look this bad for "log file sync". A few sections from the comparison report (awrddrpt.sql):
{code}
Workload Comparison
~~~~~~~~~~~~~~~~~~~
                   1st Per Sec   2nd Per Sec    %Diff   1st Per Txn   2nd Per Txn    %Diff
        DB time:          0.78          1.36    74.36          0.02          0.07   250.00
       CPU time:          0.18          0.14   -22.22          0.00          0.01   100.00
      Redo size:    573,678.11    607,512.05     5.90     15,101.84     33,092.08   119.13
  Logical reads:      4,374.04      3,900.38   -10.83        115.14        212.46    84.52
  Block changes:      1,593.38      1,381.41   -13.30         41.95         75.25    79.38
 Physical reads:         76.44        134.54    76.01          2.01          7.33   264.68
Physical writes:        110.43        134.00    21.34          2.91          7.30   150.86
     User calls:        197.62        145.46   -26.39          5.20          7.92    52.31
         Parses:          7.28         24.55   237.23          0.19          1.34   605.26
    Hard parses:          0.00          7.88   100.00          0.00          0.43   100.00
          Sorts:          3.88          4.90    26.29          0.10          0.27   170.00
         Logons:          0.09          0.08   -11.11          0.00          0.00     0.00
       Executes:        126.69         85.19   -32.76          3.34          4.64    38.92
   Transactions:         37.99         18.36   -51.67

1st:
Event                          Wait Class      Waits   Time(s)  Avg Time(ms)  %DB time
SQL*Net more data from client  Network     2,133,486   1,270.7           0.6     61.24
CPU time                       N/A                       487.1           N/A     23.48
log file sync                  Commit         99,459     129.5           1.3      6.24
log file parallel write        System I/O    100,732     126.6           1.3      6.10
SQL*Net more data to client    Network       451,810     103.1           0.2      4.97
-direct path write             User I/O      121,044      52.5           0.4      2.53
-db file parallel write        System I/O        986      22.8          23.1      1.10

2nd:
Event                          Wait Class      Waits   Time(s)  Avg Time(ms)  %DB time
log file sync                  Commit        208,355   7,407.6          35.6     46.57
direct path write              User I/O      646,849   3,604.7           5.6     22.66
log file parallel write        System I/O    208,564   2,528.4          12.1     15.90
CPU time                       N/A                     1,599.3           N/A     10.06
db file parallel write         System I/O      4,264     784.7         184.0      4.93
-SQL*Net more data from client Network     7,407,435     279.7           0.0      1.76
-SQL*Net more data to client   Network     2,714,916      64.6           0.0      0.41
{code}
*To sum it up:
1. Why is the I/O response taking such a hit during the new perf test? Please suggest.*
2. Does the number of DB writers impact the "log file sync" wait event? We have only one DB writer, as the host has only 4 CPUs.
{code}
select *from v$version;
BANNER
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
PL/SQL Release 11.1.0.7.0 - Production
CORE 11.1.0.7.0 Production
TNS for HPUX: Version 11.1.0.7.0 - Production
NLSRTL Version 11.1.0.7.0 - Production
{code}
Please let me know if you would like to see any other stats.
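A quick cross-check that uses only the figures already posted: divide each event's total wait time by its wait count to compare the foreground "log file sync" average with LGWR's own "log file parallel write" average, for the earlier test and the new one. This is a sketch of the arithmetic, not a diagnosis; a common rule of thumb (not established in this thread) is that a large gap between the two points to time spent outside the physical redo write, such as CPU scheduling or waiting behind other writes.

```python
def avg_ms(total_seconds, waits):
    """Average wait in milliseconds from an AWR total wait time (s) and wait count."""
    return total_seconds / waits * 1000.0


# First (earlier) perf test, from the comparison report.
first_sync = avg_ms(129.5, 99_459)      # log file sync
first_lfpw = avg_ms(126.6, 100_732)     # log file parallel write

# Second (new) perf test, 3-hour AWR report.
second_sync = avg_ms(7_406, 208_327)    # log file sync
second_lfpw = avg_ms(2_528, 208_563)    # log file parallel write

for label, sync, lfpw in (("1st test", first_sync, first_lfpw),
                          ("2nd test", second_sync, second_lfpw)):
    # Time per commit wait that LGWR's own write time does not account for.
    print(f"{label}: sync {sync:5.1f} ms, LGWR write {lfpw:5.1f} ms, "
          f"gap {sync - lfpw:5.1f} ms")
```

In the earlier test the two averages are almost identical (about 1.3 ms each); in the new test the commit wait is roughly three times the LGWR write time, so both the write itself and the time around it got slower.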
Edited by: Kunwar on May 18, 2013 2:20 PM
1. A snapshot interval of 3 hours always generates meaningless results.
Below are some details from the 1-hour interval AWR report.
Platform                         CPUs Cores Sockets Memory(GB)
HP-UX IA (64-bit)                   4     4       3      31.95
              Snap Id      Snap Time     Sessions Curs/Sess
Begin Snap:    108129 16-May-13 20:45:32      140       8.0
  End Snap:    108133 16-May-13 21:45:53      150       8.8
   Elapsed:                60.35 (mins)
   DB Time:               140.49 (mins)
Cache Sizes                       Begin        End
~~~~~~~~~~~                  ---------- ----------
               Buffer Cache:     1,168M     1,168M  Std Block Size:         8K
           Shared Pool Size:     1,120M     1,120M      Log Buffer:    16,640K
Load Profile              Per Second   Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~         ---------------   --------------- ---------- ----------
      DB Time(s):                2.3               0.1       0.03       0.01
       DB CPU(s):                0.1               0.0       0.00       0.00
       Redo size:          719,553.5          34,374.6
   Logical reads:            4,017.4             191.9
   Block changes:            1,521.1              72.7
  Physical reads:              136.9               6.5
 Physical writes:              158.3               7.6
      User calls:              167.0               8.0
          Parses:               25.8               1.2
     Hard parses:                8.9               0.4
W/A MB processed:          406,220.0          19,406.0
          Logons:                0.1               0.0
        Executes:               88.4               4.2
       Rollbacks:                0.0               0.0
    Transactions:               20.9
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                     Avg
                                                    wait   % DB
Event                           Waits     Time(s)   (ms)   time Wait Class
log file sync                  73,761       6,740     91   80.0 Commit
log buffer space                3,581         541    151    6.4 Configurat
DB CPU                                        348           4.1
direct path write             238,962         241      1    2.9 User I/O
direct path read temp         487,874         174      0    2.1 User I/O
Background Wait Events DB/Inst: Snaps: 108129-108133
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0
                                                         Avg
                                    %Time Total Wait    wait  Waits   % bg
Event                         Waits -outs   Time (s)    (ms)   /txn   time
log file parallel write      61,049     0      1,891      31    0.8   87.8
db file parallel write        1,590     0        251     158    0.0   11.6
control file parallel writ    1,372     0         56      41    0.0    2.6
log file sequential read      2,473     0         50      20    0.0    2.3
Log archive I/O               2,436     0         20       8    0.0     .9
os thread startup               135     0          8      60    0.0     .4
db file sequential read         668     0          4       6    0.0     .2
db file single write            200     0          2       9    0.0     .1
log file sync                     8     0          1     152    0.0     .1
log file single write            20     0          0      21    0.0     .0
control file sequential re   11,218     0          0       0    0.1     .0
buffer busy waits                 2     0          0     161    0.0     .0
direct path write                 6     0          0      37    0.0     .0
LGWR wait for redo copy         380     0          0       0    0.0     .0
log buffer space                  1     0          0      89    0.0     .0
latch: cache buffers lru c        3     0          0       1    0.0     .0
2. The log file sync is a result of commit --> you are committing too often, maybe even every individual record.
Thanks for the explanation. +Actually my question is WHY is it so slow (avg wait of 91 ms).+
3. Your I/O subsystem hosting the online redo log files can be a limiting factor.
We don't know anything about your online redo log configuration.
Below is my redo log configuration.
    GROUP# STATUS  TYPE    MEMBER                             IS_
         1         ONLINE  /oradata/fs01/PERFDB1/redo_1a.log  NO
         1         ONLINE  /oradata/fs02/PERFDB1/redo_1b.log  NO
         2         ONLINE  /oradata/fs01/PERFDB1/redo_2a.log  NO
         2         ONLINE  /oradata/fs02/PERFDB1/redo_2b.log  NO
         3         ONLINE  /oradata/fs01/PERFDB1/redo_3a.log  NO
         3         ONLINE  /oradata/fs02/PERFDB1/redo_3b.log  NO
6 rows selected.
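The v$log listing that follows shows 500 MB groups (BYTES = 524288000). Combined with the average redo rate of about 719,553.5 bytes/s from the 1-hour load profile above, a rough log switch interval can be estimated. This sketch assumes a steady redo rate, which an hourly average hides.

```python
LOG_BYTES = 524_288_000      # BYTES per group in the v$log output (500 MB)
REDO_PER_SEC = 719_553.5     # "Redo size" per second, 1-hour AWR load profile


def seconds_to_fill(log_bytes, redo_bytes_per_sec):
    """Seconds for one online log to fill at a constant redo rate."""
    return log_bytes / redo_bytes_per_sec


secs = seconds_to_fill(LOG_BYTES, REDO_PER_SEC)
print(f"~{secs:.0f} s (~{secs / 60:.1f} min) per log, ~{3600 / secs:.1f} switches/hour")
```

That works out to roughly one switch every 12 minutes on average; at the peaks the interval will be shorter, and the log switch times in the alert log remain the authoritative source.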
04:13:14 perf_monitor@PERFDB1> col FIRST_CHANGE# for 999999999999999999
04:13:26 perf_monitor@PERFDB1> select * from v$log;
    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS                FIRST_CHANGE# FIRST_TIME
         1          1      40689  524288000          2 YES INACTIVE             13026185905545 18-MAY-13 01:00
         2          1      40690  524288000          2 YES INACTIVE             13026185931010 18-MAY-13 03:32
         3          1      40691  524288000          2 NO  CURRENT              13026185933550 18-MAY-13 04:00
Edited by: Kunwar on May 18, 2013 2:46 PM -
10.2.0.2, AIX 5.3 64-bit, archivelog mode
I'm going to attempt to describe the system first and then outline the issue. The database is about 1 GB in size, of which only about 400 MB is application data. Only one table in the schema is very active: every transaction inserts or updates a row in it to log user activity. The rest of the tables are used primarily for reads by the users and are periodically updated by the application administrator with application code. About 1.2 GB of archive logs are generated per day, from three 50 MB redo logs, all on the same filesystem.
The problem: users are randomly kicked out of the application or hung up for a period of time. The application is used at a remote site, and many times we can attribute the users' issues to network delays or to problems with a terminal server they log into. Today, however, they called and I noticed an abnormally high amount of 'log file sync' waits.
I asked the application admin whether there could have been more activity, and more frequent commits than normal, during that time frame, but he says there was not. My next thought was that there might be an issue with the I/O subsystem that the logs are on, so I asked our AIX admin about the activity of that filesystem during that time frame. She had an nmon report showing that peak activity on the RAID-1 disk group during that time was only 10%.
I then took two AWR reports and compared some of the metrics to see whether the activity really was the same, and the load does look the same. With the same amount of activity and commits in both periods, wouldn't that point to time spent waiting on writes to the disk that holds the redo logs? If so, why wouldn't the nmon report show a higher percentage of disk activity?
I can provide more values from the AWR reports if needed.
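The redo statistics posted below already contain enough to compute average times, assuming (as in 10g) that "redo synch time" and "redo write time" are recorded in centiseconds. This sketch divides each time by its count for both AWR periods; the dictionary keys are just labels chosen here.

```python
def avg_ms_from_cs(time_cs, count):
    """Average in milliseconds from a cumulative time statistic kept in centiseconds."""
    return time_cs * 10.0 / count


# Total-column figures from the two sets of redo statistics.
periods = {
    "first":  {"synch_time": 47_910, "synch_writes": 64_433,
               "write_time": 27_642, "writes": 48_507},
    "second": {"synch_time": 12_977, "synch_writes": 66_985,
               "write_time": 11_358, "writes": 52_521},
}

results = {}
for name, s in periods.items():
    sync_ms = avg_ms_from_cs(s["synch_time"], s["synch_writes"])  # per redo synch write
    write_ms = avg_ms_from_cs(s["write_time"], s["writes"])       # per LGWR redo write
    results[name] = (sync_ms, write_ms)
    print(f"{name}: ~{sync_ms:.1f} ms per redo synch write, ~{write_ms:.1f} ms per redo write")
```

On these figures the average LGWR write is roughly 2.6 times slower in the first period (about 5.7 ms against 2.2 ms) while the commit counts are similar, which fits the suspicion that the write path, rather than the commit rate, changed.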
First AWR report:
                     Per Sec     Per Trx
      Redo size:   31,226.81    2,334.25
  Logical reads:      646.11       48.30
  Block changes:      190.80       14.26
 Physical reads:        0.65        0.05
Physical writes:        3.19        0.24
     User calls:       69.61        5.20
         Parses:       34.34        2.57
    Hard parses:       19.45        1.45
          Sorts:       14.36        1.07
         Logons:        0.01        0.00
       Executes:       36.49        2.73
   Transactions:       13.38
Second AWR report:
                     Per Sec     Per Trx
      Redo size:   33,639.71    2,347.93
  Logical reads:      697.58       48.69
  Block changes:      215.83       15.06
 Physical reads:        0.86        0.06
Physical writes:        3.26        0.23
     User calls:       71.06        4.96
         Parses:       36.78        2.57
    Hard parses:       21.03        1.47
          Sorts:       15.85        1.11
         Logons:        0.01        0.00
       Executes:       39.53        2.76
   Transactions:       14.33
First AWR report:
                                        Total    Per Sec   Per Trx
redo blocks written                   252,046      70.52      5.27
redo buffer allocation retries              7       0.00      0.00
redo entries                          167,349      46.82      3.50
redo log space requests                     7       0.00      0.00
redo log space wait time                   49       0.01      0.00
redo ordering marks                     2,765       0.77      0.06
redo size                         111,612,156  31,226.81  2,334.25
redo subscn max counts                  5,443       1.52      0.11
redo synch time                        47,910      13.40      1.00
redo synch writes                      64,433      18.03      1.35
redo wastage                       13,535,756   3,787.03    283.09
redo write time                        27,642       7.73      0.58
redo writer latching time                   2       0.00      0.00
redo writes                            48,507      13.57      1.01
user commits                           47,815      13.38      1.00
user rollbacks                              0       0.00      0.00
Second AWR report:
                                        Total    Per Sec   Per Trx
redo blocks written                   273,363      76.17      5.32
redo buffer allocation retries              6       0.00      0.00
redo entries                          179,992      50.15      3.50
redo log space requests                     6       0.00      0.00
redo log space wait time                   18       0.01      0.00
redo ordering marks                     2,997       0.84      0.06
redo size                         120,725,932  33,639.71  2,347.93
redo subscn max counts                  5,816       1.62      0.11
redo synch time                        12,977       3.62      0.25
redo synch writes                      66,985      18.67      1.30
redo wastage                       14,665,132   4,086.37    285.21
redo write time                        11,358       3.16      0.22
redo writer latching time                   6       0.00      0.00
redo writes                            52,521      14.63      1.02
user commits                           51,418      14.33      1.00
user rollbacks                              0       0.00      0.00
Edited by: PktAces on Oct 1, 2008 1:45 PM
Mr Lewis,
Here are the results from the histogram query; the two sets of values were gathered about 15 minutes apart, during a slower-than-normal activity period.
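Because the histogram values are cumulative since instance startup, the interesting part is the difference between the two samples. This sketch assumes the columns are EVENT#, EVENT, WAIT_TIME_MILLI and WAIT_COUNT from v$event_histogram (the query itself is not shown), where bucket N counts waits shorter than N ms.

```python
# Cumulative WAIT_COUNT per bucket (bucket = WAIT_TIME_MILLI upper bound),
# copied from the two samples, taken about 15 minutes apart.
first = {1: 714394, 2: 289538, 4: 279550, 8: 58805, 16: 28132, 32: 10851,
         64: 3833, 128: 1126, 256: 316, 512: 192, 1024: 78, 2048: 49,
         4096: 31, 8192: 35, 16384: 41, 32768: 9, 65536: 1}
second = {1: 722787, 2: 295607, 4: 284524, 8: 59671, 16: 28412, 32: 10976,
          64: 3850, 128: 1131, 256: 316, 512: 192, 1024: 78, 2048: 49,
          4096: 31, 8192: 35, 16384: 41, 32768: 9, 65536: 1}


def interval_histogram(before, after):
    """Number of waits that landed in each bucket between two cumulative samples."""
    return {bucket: after[bucket] - before[bucket] for bucket in before}


delta = interval_histogram(first, second)
total = sum(delta.values())
# Bucket N holds waits shorter than N ms, so buckets 1..8 are the sub-8 ms writes.
under_8ms = sum(n for bucket, n in delta.items() if bucket <= 8)

for bucket, n in delta.items():
    if n:
        print(f"< {bucket:>5} ms: {n}")
print(f"{total} writes in the interval, {under_8ms / total:.1%} under 8 ms")
```

On these numbers none of the roughly 20,700 writes in that 15-minute window took 128 ms or longer; the long tail visible in the cumulative figures comes from earlier, which is why a single cumulative snapshot can be misleading.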
First sample:
105 log file parallel write      1    714394
105 log file parallel write      2    289538
105 log file parallel write      4    279550
105 log file parallel write      8     58805
105 log file parallel write     16     28132
105 log file parallel write     32     10851
105 log file parallel write     64      3833
105 log file parallel write    128      1126
105 log file parallel write    256       316
105 log file parallel write    512       192
105 log file parallel write   1024        78
105 log file parallel write   2048        49
105 log file parallel write   4096        31
105 log file parallel write   8192        35
105 log file parallel write  16384        41
105 log file parallel write  32768         9
105 log file parallel write  65536         1
Second sample (about 15 minutes later):
105 log file parallel write      1    722787
105 log file parallel write      2    295607
105 log file parallel write      4    284524
105 log file parallel write      8     59671
105 log file parallel write     16     28412
105 log file parallel write     32     10976
105 log file parallel write     64      3850
105 log file parallel write    128      1131
105 log file parallel write    256       316
105 log file parallel write    512       192
105 log file parallel write   1024        78
105 log file parallel write   2048        49
105 log file parallel write   4096        31
105 log file parallel write   8192        35
105 log file parallel write  16384        41
105 log file parallel write  32768         9
105 log file parallel write  65536         1 -
High "log file sync" waits in a Data Guard environment
Hi All,
We are experiencing a high number of "log file sync" waits on the primary after upgrading AIX to 6.1 and changing the primary's storage from EMC DMX to EMC VMAX. The SAs say EMC checked the storage and it shows faster response times than the old DMX storage.
We have not made any application changes, and the system had been running fine for years on 10.2.0.3 right up to the upgrade.
Research:
When a user session commits on the primary, it waits on the "log file sync" event until LGWR posts it back to confirm that all of its redo changes are safely on disk. However, when the primary also has a standby database and log shipping uses "LGWR SYNC AFFIRM", user sessions have to wait not only for the local write to the online redo logs (ORLs) but also for the write to the standby redo logs (SRLs) on the standby. Every delay in getting a complete write response from the standby is therefore seen on the primary as a "log file sync" wait, even if the local write to the ORL has already completed.
After reviewing the AWR reports from the primary (DB1ABRN_regy_AWR_20110802_1400_3.html), there is not much to pinpoint except:
i) Data Guard is configured to use 'LGWR SYNC AFFIRM' (the customer can't switch to "ARCH SYNC NOAFFIRM" because of critical application support requirements).
ii) The OS was upgraded to 6.1 and the storage on the primary was changed; the standby database is still on the old, slower storage.
In Data Guard there are several components, such as the transport network and I/O on the standby, that can feed back into the "log file sync" waits seen on the primary.
Our question is:
What is the best way to trace or query information on the standby side to identify its contribution to the high "log file sync" waits on the primary?
There is an Oracle Support note on this:
WAITEVENT: "log file sync" Reference Note [ID 34592.1]
While it does not address tracing, it does have a Data Guard section.
This Oracle Support note may also help:
Troubleshooting I/O-related waits [ID 223117.1]
Best Regards
mseberg -
Performance Issue: Wait event "log file sync" and "Execute to Parse %"
In one of our test environments users are complaining about slow response.
In the Statspack report, the following are the top 5 wait events:
                                              Wait   % Total
Event                            Waits   Time (cs)   Wt Time
log file parallel write          1,046         988     37.71
log file sync                      775         774     29.54
db file scattered read           4,946         248      9.47
db file parallel write              66         248      9.47
control file parallel write        188         152      5.80
After running the same application 4 times, we are getting Execute to Parse % = 0.10. Cursor sharing is forced and query rewrite is enabled.
When I view v$sql, the following statement is parsed on every execution (PARSE_CALLS = EXECUTIONS):
EXECUTIONS PARSE_CALLS
SQL_TEXT
     93380       93380
select SEQ_ORDO_PRC.nextval from DUAL
Please suggest how I should troubleshoot this, and whether I need to check any more information.
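Execute to Parse % in Statspack is 100 × (1 - parses / executions), so a statement with one parse call per execution, like the sequence query above, pulls the ratio towards zero. The helper below only evaluates that formula; it is a sketch for checking report figures by hand.

```python
def execute_to_parse_pct(parses, executions):
    """Statspack/AWR 'Execute to Parse %' = 100 * (1 - parses / executions)."""
    return 100.0 * (1.0 - parses / executions)


# The sequence query from v$sql above: one parse call per execution.
print(execute_to_parse_pct(parses=93_380, executions=93_380))   # 0.0
```

The same formula reproduces the 63.39% shown in a Statspack report further down this page (210.25 parses/s against 574.32 executes/s).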
Regards,
Sudhanshu Bhandari
Well, of course, you probably can't eliminate this sort of thing entirely: a setup such as yours is inevitably a compromise. What you can do is make sure your log buffer is a good size (say 10MB or so); that your redo logs are large (at least 100MB each, and preferably large enough to hold one hour or so of redo produced at the busiest time for your database without filling up); and finally set ARCHIVE_LAG_TARGET to something like 1800 seconds or more to ensure a regular, routine, predictable log switch.
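The sizing rule in this reply can be written as simple arithmetic: one hour of redo at the busiest rate, but never less than 100 MB. This sketch treats 100 MB as 100 × 1024 × 1024 bytes, and the example rates are hypothetical; the real peak rate would come from the "Redo size" per second in a load profile taken at the busiest time.

```python
MIN_LOG_BYTES = 100 * 1024 * 1024   # "at least 100MB each"


def suggested_log_bytes(peak_redo_bytes_per_sec, hours=1.0):
    """Online log size that holds `hours` of redo at the peak rate, with a 100 MB floor."""
    return max(MIN_LOG_BYTES, int(peak_redo_bytes_per_sec * 3600 * hours))


# Hypothetical peak redo rates (bytes/s), for illustration only.
for rate in (10_000, 250_000, 700_000):
    size = suggested_log_bytes(rate)
    print(f"{rate:>9,} bytes/s -> {size / 1024**2:,.0f} MB per log")
```

With ARCHIVE_LAG_TARGET set to 1800 seconds as suggested, a log sized this way would normally be switched by the lag target after about 30 minutes rather than by filling up, which is what keeps the switches regular and predictable.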
It won't cure every ill, but that sort of setup often means the redo subsystem ceases to be a regular driver of foreground waits. -
Wait Events "log file parallel write" / "log file sync" during CREATE INDEX
Hello guys,
On my current project I am running some performance tests for Oracle Data Guard. The question is: "How does LGWR SYNC transport influence system performance?"
To get baseline values that I can compare against, I first built a normal Oracle database.
Now I am performing different tests, like creating "large" indexes, massive parallel inserts/commits, etc., to get the benchmark.
My database is Oracle 10.2.0.4 with multiplexed redo log files on AIX.
I am creating an index on a "normal" table. I execute "dbms_workload_repository.create_snapshot()" before and after the CREATE INDEX to get an equivalent time frame for the AWR report.
After the index is built (roughly 9 GB), I run awrrpt.sql to get the AWR report.
Now take a look at these values from the AWR report:
                                                          Avg
                                     %Time Total Wait    wait     Waits
Event                        Waits   -outs   Time (s)    (ms)      /txn
log file parallel write     10,019      .0        132      13      33.5
log file sync                  293      .7          4      15       1.0
......
How can this be possible?
According to the documentation:
-> log file sync: http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevents003.htm#sthref3120
Wait Time: The wait time includes the writing of the log buffer and the post.
-> log file parallel write: http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevents003.htm#sthref3104
Wait Time: Time it takes for the I/Os to complete. Even though redo records are written in parallel, the parallel write is not complete until the last I/O is on disk.
This was also my understanding: the "log file sync" wait time should be higher than the "log file parallel write" wait time, because it includes the I/O and the response time to the user session.
I could accept it if the values were close to each other (maybe around 1 second apart in total), but the difference between 132 seconds and 4 seconds is too noticeable.
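Comparing the totals mixes two very different wait counts. Dividing each total by its number of waits, as in this sketch, shows that the per-wait times are close (about 13 ms each); the 15 ms AWR displays for log file sync differs from the recomputed value because the 4 s total shown is itself rounded.

```python
def avg_ms(total_seconds, waits):
    """Average wait in ms, the same division behind AWR's 'Avg wait (ms)' column."""
    return total_seconds / waits * 1000.0


lfpw_ms = avg_ms(132, 10_019)   # log file parallel write: LGWR, every redo write
sync_ms = avg_ms(4, 293)        # log file sync: foreground sessions, commits only

print(f"per LGWR write : {lfpw_ms:.1f} ms")
print(f"per commit wait: {sync_ms:.1f} ms")
print(f"LGWR writes per commit wait: {10_019 / 293:.1f}")
```

So the difference in totals comes from LGWR writing about 34 times for every commit wait, not from individual commit waits being faster than the writes.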
Is the behavior of log file sync/write different when performing DDL like CREATE INDEX (maybe asynchronous, as you can influence it with the initialization parameter COMMIT_WRITE)?
Do you have any idea how these values come about?
Any thoughts/ideas are welcome.
Thanks and Regards
Surachart Opun (HunterX) wrote:
Thank you for the nice idea.
In this case, how can we reduce the "log file parallel write" and "log file sync" wait time?
CREATE INDEX with NOLOGGING
A NOLOGGING can help, can't it?
Yes - if you create the index NOLOGGING then you wouldn't be generating that 10GB of redo log, so the waits would disappear.
Two points on nologging, though:
<ul>
it's "only" an index, so you could always rebuild it in the event of media corruption, but if you had lots of indexes created nologging this might cause an unreasonable delay before the system was usable again - so you should decide on a fallback option, such as taking a new backup of the tablespace as soon as all the nologging operations had completed.
If the database, or that tablespace, is in +"force logging"+ mode, the nologging will not work.
</ul>
Don't get too alarmed by the waits, though. My guess is that the +"log file sync"+ waits are mostly from other sessions, and since there aren't many of them the other sessions are probably not seeing a performance issue. The +"log file parallel write"+ waits are caused by your create index, but they are happening to lgwr in the background, which is running concurrently with your session - so your session is not (directly) affected by them, so may not be seeing a performance issue.
The other sessions are seeing relatively high sync times because their log file syncs have to wait for one of the large writes that you have triggered to complete, and then the logwriter includes their (little) writes with your next (large) write.
There may be a performance impact, though, from the pure volume of I/O. Apart from the I/O to write the index, you have LGWR writing (N copies of) the redo for the index, and ARCH reading and writing the completed log files caused by the index build. So the 9GB of index could easily be responsible for vastly more I/O than the initial 9GB.
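The I/O amplification described above can be roughed out with simple arithmetic. This is a back-of-the-envelope sketch under assumptions that are not stated in the thread: about 10 GB of redo (the figure quoted in the reply), two members per log group, one archive destination, and ARCH reading each filled log once.

```python
def redo_related_io_gb(segment_gb, redo_gb, log_members, archive_dests):
    """Rough I/O volume (GB) for a fully logged bulk operation.

    segment_gb    -- data written to the new segment itself (the index)
    redo_gb       -- redo generated by the operation
    log_members   -- members per online log group (LGWR writes each one)
    archive_dests -- archived copies written by ARCH
    Assumes ARCH reads each filled online log exactly once.
    """
    lgwr_writes = redo_gb * log_members
    arch_reads = redo_gb
    arch_writes = redo_gb * archive_dests
    return segment_gb + lgwr_writes + arch_reads + arch_writes


# Hypothetical setup: 2 log members, 1 archive destination.
print(redo_related_io_gb(segment_gb=9, redo_gb=10, log_members=2, archive_dests=1))
```

Under those assumptions the 9 GB index build costs about 49 GB of I/O, more than five times the size of the index.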
Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
http://www.jlcomp.demon.co.uk
To post code, statspack/AWR report, execution plans or trace files, start and end the section with the tag {noformat}{noformat} (lowercase, curly brackets, no spaces) so that the text appears in fixed format.
"Science is more than a body of knowledge; it is a way of thinking"
Carl Sagan -
Log file sync wait event advise?
Due to business needs, the application has been designed to commit every single transaction. On the infrastructure side, the datafiles and redo logs are on faster (FC) disks and the archive logs are on slower (SATA) disks. We are seeing the log file sync wait event in the top events, and as I understand it the usual causes of this wait event are either slow disks or frequent commits. In my case I'm 99% sure this wait event is happening because of frequent commits. Can I assume that the slower archive log disk is not the root cause (my understanding is that this wait event relates to writing the online redo logs, not to writing the archive logs)? Please confirm.
user530956 wrote:
We are seeing the log file sync wait event in the top events and symptoms for this waitevent is either disk speed is slow or doing frequent commit.
As Hemant has pointed out, this could also be due to CPU overload.
I note you say the event is IN the top events - this tells us virtually nothing; an event might be IN the top 5 while being responsible for less than 1% of the total recorded wait time; it could be IN the top 5 but explained as a side effect of something that appeared above it in the Top 5. Why not just show us a typical Top 5 (along with a typical Load Profile if you want to be really helpful)?
Regards
Jonathan Lewis
-
When I look at OEM DB Console, I see that waits on "log file sync" have a 98% impact on my database. DB Console says:
Finding: Waits on event "log file sync" while performing COMMIT and ROLLBACK operations were consuming significant database time.
Action: Investigate the possibility of improving the performance of I/O to the online redo log files.
What can be done about this error?
This is not an error; it is a wait event.
When a user performs a commit or rollback, the information in the log buffer is flushed to the redo log file by the LGWR process, and the user session waits until that write has completed.
Try the following actions:
log file sync
When a user session commits (or rolls back), the session's redo information must be flushed to the redo logfile by LGWR. The server process performing the COMMIT or ROLLBACK waits under this event for the write to the redo log to complete.
Actions
If this event's waits constitute a significant wait on the system or a significant amount of time waited by a user experiencing response time issues or on a system, then examine the average time waited.
If the average time waited is low, but the number of waits are high, then the application might be committing after every INSERT, rather than batching COMMITs. Applications can reduce the wait by committing after 50 rows, rather than every row.
If the average time waited is high, then examine the session waits for the log writer and see what it is spending most of its time doing and waiting for. If the waits are because of slow I/O, then try the following:
* Reduce other I/O activity on the disks containing the redo logs, or use dedicated disks.
* Alternate redo logs on different disks to minimize the effect of the archiver on the log writer.
* Move the redo logs to faster disks or a faster I/O subsystem (for example, switch from RAID 5 to RAID 1).
* Consider using raw devices (or simulated raw devices provided by disk vendors) to speed up the writes.
* Depending on the type of application, it might be possible to batch COMMITs by committing every N rows, rather than every row, so that fewer log file syncs are needed.
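Batching commits, as suggested above, can be sketched as follows. This uses Python's built-in sqlite3 module purely as a stand-in database so the example runs anywhere; the table activity_log is hypothetical. On Oracle, each COMMIT that the loop issues is what makes the session wait on "log file sync", so the returned commit count is a rough proxy for the number of those waits.

```python
import sqlite3


def load_rows(conn, rows, batch_size=50):
    """Insert rows, committing once every `batch_size` rows instead of after each row.

    Returns the number of COMMITs issued.
    """
    cur = conn.cursor()
    commits = 0
    pending = 0
    for row in rows:
        cur.execute("INSERT INTO activity_log (id, note) VALUES (?, ?)", row)
        pending += 1
        if pending == batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:  # commit the final, partial batch
        conn.commit()
        commits += 1
    return commits


def fresh_conn():
    """In-memory stand-in database with a hypothetical activity_log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity_log (id INTEGER PRIMARY KEY, note TEXT)")
    return conn


rows = [(i, f"event {i}") for i in range(1, 1001)]
per_row = load_rows(fresh_conn(), rows, batch_size=1)   # commit after every row
batched = load_rows(fresh_conn(), rows, batch_size=50)  # commit every 50 rows
print(per_row, batched)  # 1000 20
```

Batching trades fewer commit waits for larger units of work: if the session fails mid-batch, up to batch_size - 1 rows are rolled back, which may not be acceptable where business rules require a commit per transaction.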
kuljeet -
Statspack: High log file sync timeouts and waits
Hi all,
Please see an extract from our Statspack report:
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                     % Total
Event                              Waits    Time (s)  Ela Time
log file sync                    349,713     215,674     74.13
db file sequential read       16,955,622      31,342     10.77
CPU time                                      21,787      7.49
direct path read (lob)            92,762       8,910      3.06
db file scattered read         4,335,034       4,439      1.53
                                                                Avg
                                                Total Wait     wait    Waits
Event                          Waits   Timeouts   Time (s)     (ms)     /txn
log file sync                349,713    150,785    215,674      617      1.8
db file sequential read   16,955,622          0     31,342        2     85.9
I hope the above is readable. I'm concerned about the very high number of waits and timeouts, particularly for the log file sync event. From reading around, I suspect that the disk our redo log sits on isn't fast enough.
1) Is this conclusion correct? Are these timeouts excessively high (70% seems high...)?
2) I see high waits on almost every other event (but not timeouts). Is this pointing towards an incorrect database setup, given our very high load of 160 executes per second?
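The percentages in question 1 can be checked directly against the log file sync row above. This sketch only does the division: timeouts as a share of waits, and the average wait time.

```python
waits, timeouts, total_wait_s = 349_713, 150_785, 215_674   # log file sync row above

timeout_share = timeouts / waits
avg_wait_ms = total_wait_s / waits * 1000.0

print(f"timeouts: {timeout_share:.1%} of waits")
print(f"average wait: {avg_wait_ms:.0f} ms")
```

About 43% of the waits timed out and the average wait is about 617 ms; the 74.13% figure in the Top 5 is the share of total elapsed wait time, which is a different measure.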
Any help would be much appreciated.
Jonathan
What's the time frame of this report on?
It looks like your disk storage can't keep up with the volume of I/O requests from your database.
The first thing to look at is which SQL statements in your database are I/O intensive. Are these statements doing unnecessary full table scans?
Find out the hot blocks and the objects they belong to.
Check the v$session_wait view.
Is there any other suspicious activity going on on your server, such as programs other than Oracle doing heavy I/O? Are any core dumps being generated? -
45 min long session of log file sync waits between 5000 and 20000 ms
I am encountering a rather unusual performance issue. Once every 4 hours, Spotlight on Oracle reports a 45-minute-long period of "log file sync" waits. For the first 30 minutes the wait is approximately 5000 ms; it then increases to around 20000 ms for the next 15 minutes before rapidly dropping off, and normal operation continues for the next 3 hours and 15 minutes before the cycle repeats. The issue keeps its schedule regardless of database restarts. Statspack reports do not show an increase in commits or executions, or any new SQL running, while the issue is occurring. We have two production environments running identical applications with similar usage, and we do not see the issue on the other system. I am leaning towards this being a hardware issue, but the 4-hour interval regardless of load on the database has me baffled: if it were a disk or controller cache issue, one would expect the interval to change with database load.
I cycle my redo logs and archive them just fine, with log file switches every 15-20 minutes. Even during this unusually long and high period of log file sync waits, I can see that the redo log files are still switching and being archived.
The redo logs are on a RAID 10, we have 4 redo logs at 1 GB each.
I've run statspack reports on hourly intervals around this event:
Top 5 Wait Events
~~~~~~~~~~~~~~~~~                             Wait   % Total
Event                            Waits   Time (cs)   Wt Time
log file sync                  756,729   2,538,034     88.47
db file sequential read        208,851     153,276      5.34
log file parallel write        636,648     129,981      4.53
enqueue                            810      21,423       .75
log file sequential read        65,540      14,480       .50
And here is a sample while not encountering the issue:
Top 5 Wait Events
~~~~~~~~~~~~~~~~~                             Wait   % Total
Event                            Waits   Time (cs)   Wt Time
log file sync                  953,037     195,513     53.43
log file parallel write        875,783      83,119     22.72
db file sequential read        221,815      63,944     17.48
log file sequential read        98,310      18,848      5.15
db file scattered read          67,584       2,427       .66
Yes, I know I am already tight on I/O for my redo even during normal operations, yet my redo and archiving work just fine for 3 hours and 15 minutes (11 to 15 log file switches). These normal switches result in a log file sync wait of about 5000 ms for about 45 seconds while the 1 GB redo log is being written and then archived.
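The Statspack "Time (cs)" column is in centiseconds, so multiplying by 10 and dividing by the wait count gives an average in milliseconds. This sketch applies that to the two Top 5 samples above. Each sample is an hourly average that blends normal and problem minutes, so treat the results as rough.

```python
def avg_ms_from_cs(time_cs, waits):
    """Statspack 'Time (cs)' is in centiseconds: 1 cs = 10 ms."""
    return time_cs * 10.0 / waits


samples = {
    "during issue": {"log file sync": (2_538_034, 756_729),
                     "log file parallel write": (129_981, 636_648)},
    "normal":       {"log file sync": (195_513, 953_037),
                     "log file parallel write": (83_119, 875_783)},
}

averages = {}
for period, events in samples.items():
    for event, (time_cs, waits) in events.items():
        averages[(period, event)] = avg_ms_from_cs(time_cs, waits)
        print(f"{period:>12}  {event:<24} {averages[(period, event)]:6.2f} ms")
```

On these averages the commit wait during the problem hour is about 16 times its normal value, while the average LGWR write only about doubled.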
I welcome any and all feedback.
Message was edited by:
acyoung1
Lee,
log_buffer = 1048576: we use a standard 1 MB log buffer and have not altered the setting. It is my understanding that Oracle typically recommends that you not exceed 1 MB for the log_buffer, stating that a larger buffer normally does not increase performance.
I would agree that tuning the log_buffer parameter may be a place to consider; however, this issue lasts for ~45 minutes once every 4 hours regardless of database load. So for 3 hours and 15 minutes, during both peak and low usage, the log buffer, redo log and archival processes run just fine.
A bit more information from the Statspack reports:
Here is a sample while the issue is occurring.
Snap Id Snap Time Sessions
Begin Snap: 661 24-Mar-06 12:45:08 87
End Snap: 671 24-Mar-06 13:41:29 87
Elapsed: 56.35 (mins)
Cache Sizes
~~~~~~~~~~~
db_block_buffers: 196608 log_buffer: 1048576
db_block_size: 8192 shared_pool_size: 67108864
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
Redo size: 615,141.44 2,780.83
Logical reads: 13,241.59 59.86
Block changes: 2,255.51 10.20
Physical reads: 144.56 0.65
Physical writes: 61.56 0.28
User calls: 1,318.50 5.96
Parses: 210.25 0.95
Hard parses: 8.31 0.04
Sorts: 16.97 0.08
Logons: 0.14 0.00
Executes: 574.32 2.60
Transactions: 221.21
% Blocks changed per Read: 17.03 Recursive Call %: 26.09
Rollback per transaction %: 0.03 Rows per Sort: 46.87
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.99 Redo NoWait %: 100.00
Buffer Hit %: 98.91 In-memory Sort %: 100.00
Library Hit %: 98.89 Soft Parse %: 96.05
Execute to Parse %: 63.39 Latch Hit %: 99.87
Parse CPU to Parse Elapsd %: 90.05 % Non-Parse CPU: 85.05
Shared Pool Statistics Begin End
Memory Usage %: 89.96 92.20
% SQL with executions>1: 76.39 67.76
% Memory for SQL w/exec>1: 72.53 63.71
Top 5 Wait Events
~~~~~~~~~~~~~~~~~
                                          Wait     % Total
Event                        Waits   Time (cs)     Wt Time
log file sync              756,729   2,538,034       88.47
db file sequential read    208,851     153,276        5.34
log file parallel write    636,648     129,981        4.53
enqueue                        810      21,423         .75
log file sequential read    65,540      14,480         .50
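Statspack reports the Top 5 wait times in centiseconds (cs), so dividing Time (cs) by Waits and multiplying by 10 gives the average wait per event in milliseconds. A small sketch comparing the issue-window figures above with the normal-window figures from the earlier Top 5 table (the helper name is illustrative, not part of statspack):

```python
# Statspack "Time (cs)" is in centiseconds; 1 cs = 10 ms.
CS_TO_MS = 10.0


def avg_wait_ms(waits, time_cs):
    """Average wait per event in milliseconds from statspack Top 5 columns."""
    return time_cs * CS_TO_MS / waits


# (Waits, Time (cs)) copied from the two Top 5 Wait Events tables in this thread
issue = {
    "log file sync": (756_729, 2_538_034),
    "log file parallel write": (636_648, 129_981),
}
normal = {
    "log file sync": (953_037, 195_513),
    "log file parallel write": (875_783, 83_119),
}

for label, sample in (("issue", issue), ("normal", normal)):
    for event, (waits, time_cs) in sample.items():
        print(f"{label:6} {event:24} {avg_wait_ms(waits, time_cs):6.2f} ms")
```

On these figures, log file sync averages about 33.5 ms in the problem window versus about 2 ms normally, while log file parallel write only rises from about 0.95 ms to about 2 ms. That gap suggests most of the extra commit time is not spent in LGWR's own write, although statspack alone cannot show where it goes.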
And this is a sample during "normal" operation.
Snap Id Snap Time Sessions
Begin Snap: 671 24-Mar-06 13:41:29 88
End Snap: 681 24-Mar-06 14:42:57 88
Elapsed: 61.47 (mins)
Cache Sizes
~~~~~~~~~~~
db_block_buffers: 196608 log_buffer: 1048576
db_block_size: 8192 shared_pool_size: 67108864
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
Redo size: 716,776.44 2,787.81
Logical reads: 13,154.06 51.16
Block changes: 2,627.16 10.22
Physical reads: 129.47 0.50
Physical writes: 67.97 0.26
User calls: 1,493.74 5.81
Parses: 243.45 0.95
Hard parses: 9.23 0.04
Sorts: 18.27 0.07
Logons: 0.16 0.00
Executes: 664.05 2.58
Transactions: 257.11
% Blocks changed per Read: 19.97 Recursive Call %: 25.87
Rollback per transaction %: 0.02 Rows per Sort: 46.85
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.99 Redo NoWait %: 100.00
Buffer Hit %: 99.02 In-memory Sort %: 100.00
Library Hit %: 98.95 Soft Parse %: 96.21
Execute to Parse %: 63.34 Latch Hit %: 99.90
Parse CPU to Parse Elapsd %: 96.60 % Non-Parse CPU: 84.06
Shared Pool Statistics Begin End
Memory Usage %: 92.20 88.73
% SQL with executions>1: 67.76 75.40
% Memory for SQL w/exec>1: 63.71 68.28
Top 5 Wait Events
~~~~~~~~~~~~~~~~~
                                          Wait     % Total
Event                        Waits   Time (cs)     Wt Time
log file sync              953,037     195,513       53.43
log file parallel write    875,783      83,119       22.72
db file sequential read    221,815      63,944       17.48
log file sequential read    98,310      18,848        5.15
db file scattered read      67,584       2,427         .66
-
'log file sync' versus 'log file parallel write'
I have been asked to run an artificial test that performs a large number of small insert-only transactions with a high degree (200) of parallelism. The COMMITs were not inside a PL/SQL loop, so a 'log file sync' (LFS) event occurred on each COMMIT. I measured the average 'log file parallel write' (LFPW) time by running the following PL/SQL queries at the beginning and end of a 10-second period:
SELECT time_waited,
total_waits
INTO wait_start_lgwr,
wait_start_lgwr_c
FROM v$system_event e
WHERE event LIKE 'log%parallel%';
SELECT time_waited,
total_waits
INTO wait_end_lgwr,
wait_end_lgwr_c
FROM v$system_event e
WHERE event LIKE 'log%parallel%';
I took the difference in TIME_WAITED and divided it by the difference in TOTAL_WAITS.
I did the same thing for LFS.
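The delta calculation described above can be sketched as a small helper. It assumes the values were read from the TIME_WAITED and TOTAL_WAITS columns of v$system_event, where TIME_WAITED is reported in centiseconds (TIME_WAITED_MICRO gives microseconds). The function name and the snapshot numbers in the example are hypothetical:

```python
def interval_avg_wait_ms(start_cs, start_waits, end_cs, end_waits):
    """Average wait in ms between two v$system_event snapshots.

    start_cs/end_cs: TIME_WAITED values (centiseconds).
    start_waits/end_waits: TOTAL_WAITS values.
    Returns None when no waits occurred in the interval.
    """
    delta_waits = end_waits - start_waits
    if delta_waits <= 0:
        return None
    # 1 centisecond = 10 ms
    return (end_cs - start_cs) * 10.0 / delta_waits


# Hypothetical snapshot values, for illustration only
print(interval_avg_wait_ms(start_cs=1_000, start_waits=5_000,
                           end_cs=1_656, end_waits=6_000))
```

With centisecond resolution and a 10-second window, the result is only as precise as the number of waits allows; TIME_WAITED_MICRO avoids that rounding.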
What I expected was that the LFS time would be just over 50% more than the LFPW time: when a session commits, it has to wait for the previous LFPW to complete (on average it is halfway through) and then for its own.
Now, I know there is a lot of CPU-related work going on in LGWR, but I 'reniced' it to a higher priority and observed that it then spent 90% of its time in LFPW, 10% on CPU and no time idle. Total system CPU utilization averaged only 25% on this 64-'processor' machine.
What I saw was that the LFS time was substantially more than the LFPW time. For example, on one test LFS was 18.07ms and LFPW was 6.56ms.
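The expectation above amounts to a simple model: on average a committing session waits for half of the write already in flight, then for one full write of its own, so LFS is about 1.5 x LFPW. A sketch of that reasoning with the measured figures plugged in (it encodes the poster's model, not how LGWR actually schedules writes):

```python
def expected_lfs_ms(lfpw_ms):
    """Poster's model: half of the in-flight write on average, then one full write."""
    return 0.5 * lfpw_ms + lfpw_ms


lfpw_ms = 6.56   # measured average LFPW from the test
lfs_ms = 18.07   # measured average LFS from the test

print(f"expected LFS ~ {expected_lfs_ms(lfpw_ms):.2f} ms")  # 9.84
print(f"observed LFS / LFPW = {lfs_ms / lfpw_ms:.2f}")       # 2.75
```

The model predicts about 9.84 ms, whereas the observed LFS is about 2.75 times the LFPW time, which is the discrepancy being asked about.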
When I divided the number of bytes written each time by the average 'commit size' it seems that LGWR is writing out data for only about one third of the average number of transactions in LFS state (rather than the two thirds that I would have expected). When the COMMIT was changed to COMMIT WORK NOWAIT the size of each LFPW increased substantially.
These observations are at odds with my understanding of how LGWR works. My understanding is that when LGWR completes one LFPW it begins a new one with the entire contents of the log buffer at that time.
Can anybody tell me what I am missing?
P.S. Same results in database versions 10.2 Sun M5000 and 11.2 HP G7s.
-
Log file sync waits with null sql_ids
10.2.0.3
I am querying V$ACTIVE_SESSION_HISTORY to drill into log file sync waits.
select sql_id,sum(time_waited)
from v$active_session_history
where sample_time > sysdate - 1/24
group by sql_id
order by 2 desc
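The query above sums TIME_WAITED per sql_id. Two details matter when reading its result: ASH reports TIME_WAITED in microseconds, and GROUP BY collects every NULL sql_id into a single row, so many unrelated commit waits can pile up under NULL. A Python sketch of that aggregation on hypothetical samples (the sql_id value is made up), which also skips zero TIME_WAITED values and converts to seconds:

```python
from collections import defaultdict


def wait_seconds_by_sql_id(samples):
    """Sum ASH-style wait samples per sql_id, returning seconds, largest first.

    samples: iterable of (sql_id, time_waited_us) pairs; sql_id may be None,
    playing the role of a NULL sql_id (GROUP BY puts all NULLs in one group).
    """
    totals = defaultdict(int)
    for sql_id, time_waited_us in samples:
        # ASH leaves TIME_WAITED at 0 on samples where the wait's final
        # duration is not recorded, so those rows add nothing
        if time_waited_us > 0:
            totals[sql_id] += time_waited_us
    rows = [(sql_id, us / 1_000_000) for sql_id, us in totals.items()]
    # Equivalent of ORDER BY 2 DESC
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical samples: two NULL-sql_id waits and waits for a made-up sql_id
samples = [(None, 3_000_000), ("abc123", 500_000), (None, 1_500_000), ("abc123", 0)]
print(wait_seconds_by_sql_id(samples))
```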
All of my top sessions for this event have null sql_ids. I did some Google searches, and these are the explanations I found for null sql_ids. There are some other sessions where the sql_id is not null, but they are nowhere near the top.
1. The session could be running PL/SQL. OK, but I would still need to run DML and issue a commit for this event to fire.
2. No SQL is running. Does this mean the insert finished and I am then waiting on the commit?
I want to track these SQL statements down so I can trace them back to the application. I want to get the developers to reduce their commit frequency and use batch (array-based) DML. How do I track this down?
Also, is there any way to figure out how often different users are committing? I want to track down the worst offenders. It could be that some parts of the application commit periodically and others do not, but log file syncs could slow down everyone.
You are either bored or suffer from Compulsive Tuning Disorder.
It can be a challenge to solve a problem that only exists between your ears.
Post the results from the SQL below:
SELECT sql_id,
SUM(time_waited) / 1000000
FROM v$active_session_history
WHERE sample_time > SYSDATE - 1 / 24
AND time_waited > 0
GROUP BY sql_id
ORDER BY 2 DESC
-
We have just deployed a 4-node RAC cluster on 10gR2. We force a log switch every 5 minutes to keep our Data Guard standby site relatively up to date; we use ARCH to ship logs. We are running on a very fast HP XP 12000 with massive amounts of write cache, so we never actually write straight to disk. However, every time we do a log switch and archive the log, we see a massive spike in the log file sync event. This is a real-time billing system, so we monitor transaction response times in ms. Our response time for a transaction can go from 8 ms to around 500 ms.
I can't understand why this is happening: not only are our disks fast, but we are also using async I/O and ASM. Surely with async I/O you should never wait for a write to complete.
The 'log file sync' event occurs when a client waits for LGWR to finish writing to the log file after the client has issued a COMMIT. The ways to reduce 'log file sync' waits are to make the LGWR process faster or to commit less often.
You've described your disk system as very fast. What is the amount of data you write on every log switch? How does the performance of this write relate to your disk system tests? What block size did you use when testing the disk system? As far as I remember, LGWR writes using the OS block size rather than the DB block size. Try experimenting on your test system: put your log files on a virtual disk created in RAM and run the test case. Do you see the delays?
With such restrictions on transaction time, you may want to look at the Oracle TimesTen database (http://www.oracle.com/database/timesten.html).
Since you've mentioned 10gR2, you could probably use the new asynchronous commit feature; in this case your transaction will not wait for the LGWR process. Be aware that using NOWAIT commit opens a small possibility of data loss; the documentation describes this quite clearly.
http://download-east.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns_sqlproc.htm#CIHEDGBF
Mike
-
Hi,
Maybe someone can help me on this.
We have a RAC database in production on which some applications need a response time of 0.5 seconds. In general that is working.
Outside production hours we make a weekly full backup and a daily incremental backup, so those do not bother us. However, as soon as we make an archive log backup or a control file backup during production hours, we have a problem: the application has to wait more than 0.5 seconds for a response, caused by the event "log file sync" with wait class "Commit".
I already adjusted the RMAN script so that we have only 1 file per set and use only one channel. However, that didn't help.
Increasing the log buffer was also not a success.
Increasing Large pool is in our case not an option.
We have 8 redo log groups, each with 2 members (250 MB each), and an average of 12 log switches per hour during the day, which is not very alarming. Even during the backup the I/O doesn't show very high activity. The increase in I/O at that moment is minor but (maybe) apparently enough to cause the "log file sync".
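The switch figures translate into a rough redo rate and a group-reuse interval (how long the archiver has before a group is needed again). This sketch assumes all 12 switches per hour are full 250 MB logs; forced or partial switches would lower the real volume. The helper name is hypothetical:

```python
def redo_stats(groups, log_size_mb, switches_per_hour):
    """Rough redo volume and log-group reuse interval from switch frequency.

    Members of a group are mirror copies, so each switch represents
    log_size_mb of redo (written once to each member).
    """
    redo_mb_per_hour = switches_per_hour * log_size_mb
    redo_mb_per_sec = redo_mb_per_hour / 3600
    # Minutes before the same group is reused (it must be archived by then)
    reuse_minutes = groups * 60 / switches_per_hour
    return redo_mb_per_hour, redo_mb_per_sec, reuse_minutes


print(redo_stats(groups=8, log_size_mb=250, switches_per_hour=12))
```

Under that assumption the database generates about 3,000 MB of redo per hour (roughly 0.83 MB/s), and each group comes round again every 40 minutes.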
Oracle has no documentation that gives me more possible causes.
The strange thing is that we didn't have this problem before the first of October, and no changes were made.
Has anyone an idea where to look further or did anyone experience a thing like this and was able to solve it?
Kind regards
The only possible contention I can see is between the log writer and the archiver. 'BACKUP ARCHIVELOG' in RMAN implicitly means 'ALTER SYSTEM ARCHIVE LOG CURRENT' (a log switch plus archiving of the online log).
You should alternate redo logs on different disks to minimize the effect of the archiver on the log writer.
Werner
-
Log file sync during RMAN archive backup
Hi,
I have a small question. I hope someone can answer it.
Our database (cluster) needs to respond within 0.5 seconds. Most of the time it does, except when the RMAN backup is running.
Each week we run one full backup, an incremental backup every weekday, a control file backup every hour and an archive log backup every 15 minutes.
During a backup, the response time can be much longer than 0.5 seconds.
Below is a typical example of the response time:
EVENT: log file sync
WAIT_CLASS: Commit
TIME_WAITED: 10.774
It is obvious that it takes very long to get a commit. The value is in seconds, so as you can see, this is long. It is clearly related to the RMAN backup, since this kind of response time comes up when the backup is running.
I would like to ask why response times are so high, even when I only back up the archived log files. We didn't have this problem before, but it started suddenly 2 weeks ago and I can't find the cause.
- We use an 11.2 RAC database on ASM. Redo logs and database files are on the same disks.
- Autobackup of controlfile is off.
- Dataguard: LogXptMode = 'arch'
Greetings,
Hi,
Thank you. I am new here, so I was wondering how I can put things into the right category. It is obvious that I am in the wrong one, so I thank the people who are still responding.
- Actually, the example I gave is one of many hundreds a day. The response times during the archive log backup are mostly between 2 and 11 seconds. When we back up the control file along with it, we are sure to see these response times.
- Autobackup of the control file is turned off since we already back up the control file every hour. As we back up the archived logs every 15 minutes, it is not necessary to also back up the control file every 15 minutes, especially if that causes even more delay. The control file is a lifeline, but if you have properly backed up your archived logs, a full restore with at most 15 minutes of data loss is still possible. We turned autobackup off since it is severely hurting performance at the moment.
As already mentioned, for specific applications the DB has to respond within 0.5 seconds. When it doesn't, an entry is written to a table used by that application, so I can compare the time of each failure with whatever else was happening. The times of the archive log backups and the failures match in 95% of the cases. It also shows that log file sync is part of this performance issue at those moments. I built a script for myself to determine, from the application side, what is causing the problem:
select ASH.INST_ID INST,
ASH.EVENT EVENT,
ASH.P2TEXT,
ASH.WAIT_CLASS,
DE.OWNER OWNER,
DE.OBJECT_NAME OBJECT_NAME,
DE.OBJECT_TYPE OBJECT_TYPE,
ASH.TIJD,
ASH.TIME_WAITED TIME_WAITED
from (SELECT INST_ID,
EVENT,
CURRENT_OBJ#,
-- ASH TIME_WAITED is in microseconds; convert to seconds
ROUND(TIME_WAITED / 1000000,3) TIME_WAITED,
-- year-first format so that sorting on TIJD is chronological across days
TO_CHAR(SAMPLE_TIME, 'YYYY-MM-DD HH24:MI:SS') TIJD,
WAIT_CLASS,
P2TEXT
FROM gv$active_session_history
WHERE PROGRAM IN ('yyyyy', 'xxxxx')) ASH,
(SELECT OWNER, OBJECT_NAME, OBJECT_TYPE, OBJECT_ID FROM DBA_OBJECTS) DE
-- outer join: keep waits such as log file sync whose CURRENT_OBJ# matches no object (e.g. -1)
WHERE DE.OBJECT_ID (+) = ASH.CURRENT_OBJ#
AND ASH.TIME_WAITED > 2 -- seconds
ORDER BY ASH.TIJD, DE.OBJECT_NAME
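The unit handling in this script is easy to get wrong: ASH records TIME_WAITED in microseconds, so ROUND(TIME_WAITED / 1000000,3) yields seconds, and the filter on TIME_WAITED > 2 then keeps waits longer than 2 seconds. A Python sketch of the same conversion and filter on hypothetical rows (the function name, times and values are made up for illustration):

```python
def slow_waits(samples, threshold_s=2.0):
    """Mimic the script's unit handling on ASH-style rows.

    samples: iterable of (sample_time, event, time_waited_us) tuples, where
    sample_time is a fixed-width string such as 'HH24:MI:SS'.
    """
    result = []
    for sample_time, event, time_waited_us in samples:
        # Like ROUND(TIME_WAITED / 1000000, 3): microseconds -> seconds
        seconds = round(time_waited_us / 1_000_000, 3)
        if seconds > threshold_s:  # like TIME_WAITED > 2 after conversion
            result.append((sample_time, event, seconds))
    # Fixed-width timestamps sort chronologically as plain strings
    return sorted(result)


# Hypothetical rows, for illustration only
rows = [
    ("10:15:03", "log file sync", 10_774_000),
    ("10:15:01", "log file sync", 150_000),
    ("10:14:58", "log file sync", 2_400_000),
]
for row in slow_waits(rows):
    print(row)
```

For example, a raw TIME_WAITED of 10,774,000 microseconds comes out as 10.774 seconds.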
- Our logfiles are 250M and we have 8 groups of 2 members.
- Large pool is not set since we use memory_max_target and memory_target. I know that Oracle may not use memory well with these parameters, so it is truly something I should look into.
- I checked the size of the log buffer. Our log buffer is actually 28 MB, which in my opinion is very large, so maybe I should make it smaller. It is quite possible that the log buffer is causing this problem. Thank you for the tip.
- I will also definitely look into the I/O. Even though we use ASM on RAID 10, I don't think it is wise to put redo logs and datafiles on the same disks. Then again, I did not install it. So, you are right, I have to investigate.
Thank you all very much for still responding even if I put this in the totally wrong category.
Greetings,
Maybe you are looking for
-
Where do I find MPN on my MBP?
I'm trying to find the MPN for my MBP (to register iDVD/iLife), but don't see anything under Utilities/System Profiler. Any advice? Also tried looking on my package but can't see any M#####... number. Thanks- MacBook Pro Mac OS X (10.4.7)
-
Sender SOAP AXIS message protocol
Hello Experts, i am working on SOAP to Proxy scenario, i need to pull XML data from URL using AXIS message protocol. i have developed the scenario and in rwb channel show active with out error and its not pulling XML data from URL. when i open the UR
-
Why is my mac mail program freezing? I use an imac running os 10.7.2. My freezes occur when I'm not using mail but have it open. After a while I realize no new mail has been downloaded. If I try to close mac mail, nothing happens, so I do a "forc
-
Dynamically Decaling Properties in PS CS3's plug in
Now Im developing a plug-in for Photoshop CS3, and I want to dynamically change its properties other than statically define them in PiPL. But Photoshop never send message "SP Properties" to my plug-in, so I have no chance to handle SPPropertiesMessag
-
I have been experimenting with the opensparc code a little bit to try and synthesize with xst. I do not have access to the high dollar tools. The problem I ran into is even with one core enabled the build rapidly chews threw about 2.2G of memory and