Alert log monitoring problem
Hi guys!
I have found a good, simple script for alert log monitoring, BUT it doesn't show the date and time. How can I format it so that the date and time also appear in the log output?
#!/bin/sh
# GLOBS
ALERT="ORA-"
LOGFILE=/path/to/my/alert_X.log
MYFILE=/home/user/test.log
LASTFILE=/home/user/test.last
##
# make sure the history file exists on the first run
[ -f "$LASTFILE" ] || touch "$LASTFILE"
# get the errors out of the logfile
grep -h "$ALERT" "$LOGFILE" >> "$MYFILE"
# count the number of lines in myfile
VAR=`wc -l < "$MYFILE"`
#echo $VAR
# size is the number of errors found the last time the script ran
size=`wc -l < "$LASTFILE"`
echo $size
# if the error count has grown since the last run, mail the file; if not, echo all clear
if [ "$VAR" -gt "$size" ]; then
    mail -vs "oralert" [email protected] < "$MYFILE"
else
    echo "All clear.."
fi
rm "$LASTFILE"
mv "$MYFILE" "$LASTFILE"
touch "$MYFILE"
The alert log is known to first write the timestamp of a message, and then the ORA- error message on the next line.
So if you find an ORA- error here, there is little problem in going into the file and looking up at what time it occurred.
Otherwise you would have to modify the script so that it captures more than the matching line, for example by using grep's -n option, which prefixes each match with its line number.
A while loop in the script could then walk through these line numbers and pull the surrounding lines out of the alert log.
But, again, if an error pops up, you do want to go into the alert file anyway, looking up all the given info, e.g. about the object(s) involved, the users, whether a trace file was generated, etc.
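Since the timestamp sits on the line directly above the ORA- message, another option is grep's -B (leading context) flag. A minimal sketch, assuming GNU grep; the file path and log contents below are made up:

```shell
# Build a tiny fake alert-log fragment: a timestamp line followed by an ORA- line
printf '%s\n' 'Mon Jan 06 10:15:32 2014' 'ORA-01555: snapshot too old' > /tmp/alert_demo.log

# -B1 prints one line of leading context before each match,
# so every ORA- error appears together with its timestamp
grep -B1 "ORA-" /tmp/alert_demo.log
```

In the posted script you could then use grep -B1 "$ALERT" "$LOGFILE" so that the mailed file carries the timestamps as well.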
FJFranken
Similar Messages
-
Alert log monitoring in dbconsole 11.2.0.2.0
Hullo
Is there a way to tune out certain ORA- alerts from alerting via dbconsole? At the moment we are getting all ORA-600s, and some of them we don't need to be woken up for!
Thanks.
11.2.0.3
RHEL 5.5
Do you have Metalink support? This is the doc you want to check into... yes, it is possible.
Database Alert log monitoring in 12c explained (Doc ID 1538482.1)
From this doc...
You need to get fancy with the regexp..
Setting thresholds for alert log metrics
Setting thresholds is a very complex and wide area, and the values depend on the requirements, the DBA's experience, and the environments being monitored. This can be done by going to the Database target menu -> Monitoring -> Metrics and Collection Settings -> edit the "Generic Alert Log Error" metric via the three blue pencils icon on the right.
By default, the "Generic Alert Log Error" Warning threshold is set to "ORA-0*(600?|7445|4[0-9][0-9][0-9])[^0-9]", which means:
ORA- : the literal string "ORA-"
0* : followed by zero or more zeroes
600? : followed by 600 or 60 (the ? operator means "there is zero or one of the preceding element")
7445 : or it can be 7445
4[0-9][0-9][0-9] : or it can be anything between 4000 and 4999
[^0-9] : followed by anything but a digit
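The default pattern can be sanity-checked from a shell with extended-regexp grep; the sample ORA codes below are made up:

```shell
pattern='ORA-0*(600?|7445|4[0-9][0-9][0-9])[^0-9]'

# 600, 7445 and the 4000-4999 range should match; ORA-01555 should not
printf '%s\n' \
  'ORA-00600: internal error' \
  'ORA-07445: exception encountered' \
  'ORA-04031: unable to allocate' \
  'ORA-01555: snapshot too old' \
  | grep -cE "$pattern"   # → 3
```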
For setting the thresholds, a regular expression (RegExp) like the one above needs to be used.
-
Hello,
I am very new to shell scripting. Our DB is 10g on AIX, and I wanted to set up something that will monitor my alert log and send me an e-mail. I found the script below online, but I have very little knowledge of cron jobs. I can set one up, but the script doesn't say what goes where, so if anyone could tell me, I would be thankful. It does say to put check_alert.awk someplace; is that where cron comes into place, i.e. do I schedule check_alert.awk in my crontab? I just want to know which parts go where and how to set this up the right way so I get an e-mail alert for my alert log. A step-by-step process would be good. Thanks
UNIX shell script to monitor and email errors found in the alert log. It is run as the oracle OS owner. Make sure you change the "emailaddresshere" entries to the email address you want, and put check_alert.awk someplace. I have chosen $HOME for this example; in real life I put it on a mounted directory on the NAS.
if test $# -lt 1
then
echo You must pass a SID
exit
fi
# ensure environment variables set
#set your environment here
export ORACLE_SID=$1
export ORACLE_HOME=/home/oracle/orahome
export MACHINE=`hostname`
export PATH=$ORACLE_HOME/bin:$PATH
# check if the database is running, if not exit
ckdb ${ORACLE_SID} -s
if [ "$?" -ne 0 ]
then
echo " $ORACLE_SID is not running!!!"
echo "${ORACLE_SID} is not running!" | mailx -m -s "Oracle sid ${ORACLE_SID} is not running!" "
|emailaddresshere|"
exit 1
fi;
#Search the alert log, and email all of the errors
#move the alert_log to a backup copy
#cat the existing alert_log onto the backup copy
#oracle 8 or higher DB's only.
sqlplus '/ as sysdba' << EOF > /tmp/${ORACLE_SID}_monitor_temp.txt
column xxxx format a10
column value format a80
set lines 132
SELECT 'xxxx', value FROM v\$parameter WHERE name = 'background_dump_dest';
exit
EOF
cat /tmp/${ORACLE_SID}_monitor_temp.txt | awk '$1 ~ /xxxx/ {print $2}' > /tmp/${ORACLE_SID}_monitor_location.txt
read ALERT_DIR < /tmp/${ORACLE_SID}_monitor_location.txt
ORIG_ALERT_LOG=${ALERT_DIR}/alert_${ORACLE_SID}.log
NEW_ALERT_LOG=${ORIG_ALERT_LOG}.monitored
TEMP_ALERT_LOG=${ORIG_ALERT_LOG}.temp
cat ${ORIG_ALERT_LOG} | awk -f $HOME/check_alert.awk > /tmp/${ORACLE_SID}_check_monitor_log.log
rm /tmp/${ORACLE_SID}_monitor_temp.txt 2>/dev/null
if [ -s /tmp/${ORACLE_SID}_check_monitor_log.log ]
then
echo "Found errors in sid ${ORACLE_SID}, mailed errors"
echo "The following errors were found in the alert log for ${ORACLE_SID}" > /tmp/${ORACLE_SID}_check_monitor_log.mail
echo "Alert log was copied into ${NEW_ALERT_LOG}" >> /tmp/${ORACLE_SID}_check_monitor_log.mail
echo " " >> /tmp/${ORACLE_SID}_check_monitor_log.mail
date >> /tmp/${ORACLE_SID}_check_monitor_log.mail
echo "--------------------------------------------------------------">>/tmp/${ORACLE_SID}_check_monitor_log.mail
echo " " >> /tmp/${ORACLE_SID}_check_monitor_log.mail
echo " " >> /tmp/${ORACLE_SID}_check_monitor_log.mail
echo " " >> /tmp/${ORACLE_SID}_check_monitor_log.mail
cat /tmp/${ORACLE_SID}_check_monitor_log.log >> /tmp/${ORACLE_SID}_check_monitor_log.mail
cat /tmp/${ORACLE_SID}_check_monitor_log.mail | mailx -m -s "on ${MACHINE}, MONITOR of Alert Log for ${ORACLE_SID} found errors" "
|emailaddresshere|"
mv ${ORIG_ALERT_LOG} ${TEMP_ALERT_LOG}
cat ${TEMP_ALERT_LOG} >> ${NEW_ALERT_LOG}
touch ${ORIG_ALERT_LOG}
rm /tmp/${ORACLE_SID}_monitor_temp.txt 2> /dev/null
rm /tmp/${ORACLE_SID}_check_monitor_log.log
rm /tmp/${ORACLE_SID}_check_monitor_log.mail
exit
fi;
rm /tmp/${ORACLE_SID}_check_monitor_log.log > /dev/null
rm /tmp/${ORACLE_SID}_monitor_location.txt > /dev/null
The referenced awk script (check_alert.awk) follows. You can modify it as needed to add or remove things you wish to look for. The ERROR_AUDIT entry is a custom one that a trigger on DB errors writes in our environment.
$0 ~ /Errors in file/ {print $0}
$0 ~ /PMON: terminating instance due to error 600/ {print $0}
$0 ~ /Started recovery/{print $0}
$0 ~ /Archival required/{print $0}
$0 ~ /Instance terminated/ {print $0}
$0 ~ /Checkpoint not complete/ {print $0}
$1 ~ /ORA-/ { print $0; flag=1 }
$0 !~ /ORA-/ {if (flag==1){print $0; flag=0;print " "} }
$0 ~ /ERROR_AUDIT/ {print $0}
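As a quick sanity check, the ORA- flag logic above can be exercised on a few made-up lines; note how the line following an ORA- match is printed too, then a blank separator:

```shell
# Three sample lines: an ORA- error, its continuation, and a normal message
printf '%s\n' 'ORA-00600: internal error' 'additional detail line' 'normal message' \
  | awk '$1 ~ /ORA-/ {print; flag=1}
         $0 !~ /ORA-/ {if (flag==1) {print; flag=0; print " "}}'
# prints the ORA- line, the detail line, then a blank separator line
```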
I simply put this script into cron to run every 5 minutes, passing the SID of the DB I want to monitor.
I have a PERL script that I wrote that does exactly what you want, and I'll be glad to share it with you along with the cron entries.
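For the cron part of the question: a crontab entry for the oracle OS user along the lines below would run the monitor every 5 minutes (the script path and the SID ORCL are placeholders, not from the original post):

```
# run the alert-log monitor every 5 minutes for SID ORCL
*/5 * * * * /home/oracle/scripts/alert_monitor.sh ORCL >> /home/oracle/alert_monitor.out 2>&1
```

Add it with crontab -e as the oracle user; check_alert.awk itself is not scheduled, it is only read by the script via awk -f.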
The script opens the current alert_log and searches for key phrases, sending e-mail if it finds anything. It then sleeps for 60 seconds, wakes up, reads from where it left off to the bottom of the file, searches again, and sleeps again. The only downside is that it keeps a file handle open on the alert_log, so you have to kill the process if you want to rename or delete the alert_log.
My email in my profile is not hidden.
Tom
-
Hi guys, is there any kind of free utility or script which could send me the piece of the last alert log error produced by a remote Oracle database?
I need something similar to OraSentry utility....
Just to say, my 10g database runs on a Windows platform.
Thank you!
Here's a script I use for our 9i databases.
# Set environment
. $HOME/.profile
# get a list of all Oracle DB's running and check their alert log for errors
# The 'end of monitoring' string is added to the end of the alert log after each run
# and only the messages after that are processed. Used so we don't double-check
# the same messages.
for SID in `ps -ef | grep pmon | grep -v grep | awk '{print $NF}' | sed 's/_/ /g' | awk '{print $3}'`
do
export ALERT_LOG=$ORACLE_BASE/admin/$SID/bdump/alert_$SID.log
export LASTLINE=`tail -1 $ALERT_LOG | cut -c1-5`
if [ "$LASTLINE" != "#####" ]; then
sed '/##### end of monitoring #####/,$ !d' $ALERT_LOG > /tmp/work.tmp
grep ORA- /tmp/work.tmp
if [ $? -eq 0 ]; then
mailx -s "Errors found in $SID alert log" [email protected] < /tmp/work.tmp
fi
sed '/##### end of monitoring #####/d' $ALERT_LOG > $ALERT_LOG.work
mv $ALERT_LOG.work $ALERT_LOG
echo "##### end of monitoring #####" >> $ALERT_LOG
fi
done
rm /tmp/work.tmp
Start off by inserting a comment at the end of the alert log. I use ##### end of monitoring #####
Next the script will check to see if that is the last line in the alert log. If it is then nothing has been written to the alert log since the last time the script was run...
If there is something, the script grabs everything after the '##### end of monitoring #####' and checks if it includes errors.
If it does it emails them to the dba. If not (ie: just informational messages) then it appends the '##### end of monitoring #####' line to the end of the alert log for the next run.
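The marker logic is easy to exercise on a throwaway file; a minimal sketch (the file name and messages are made up):

```shell
MARK='##### end of monitoring #####'
LOG=/tmp/alert_marker_demo.log

# Simulate an alert log: old marker from the previous run, then new messages
printf '%s\n' "$MARK" 'informational message' 'ORA-00600: internal error' > "$LOG"

# Keep only the lines from the marker to the end of the file, then count errors
sed "/$MARK/,\$ !d" "$LOG" | grep -c ORA-   # → 1
```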
Edited by: Jamie CC on Jun 4, 2010 6:43 AM
-
Cloud Control 12c monitoring Oracle 11g Standard Edition alert.log
Hi guys.
I just installed Cloud Control 12c R3 and added my cluster database. I have read many papers, tech docs, and tech discussions, and now I have huge confusion about packs, licensing, and other fruits. Please help...
I have four Oracle 11g Standard Edition databases (on different hosts), and I want to monitor the alert.log and be notified of errors. For example, if the alert.log says "ORA-01438, blablabla" I want an email notification.
I read about packs, and it is said that the "Diagnostic Pack" is needed for alert.log monitoring in Cloud Control 12c. But my database edition is Standard Edition without packs, so I CAN'T MONITOR THE ALERT LOG!!!
Question:
Do I really need Diagnostic Pack for monitoring alert.log with Cloud Control?
If the Diagnostic Pack is not necessary, how can I monitor the alert.log?
Thanks
I do not think you require any pack for alert.log content monitoring, but you might want to check with your Oracle rep. I am saying so because if you go to the "Alert log content" page and check the management packs for that page, Grid will display a message that the page does not require any pack.
Go to the Alert log content by "oracle database -> logs -> alert log contents"
then
go to SETUP -> management packs -> packs for this page
You will see the message displayed: "This page does not require any pack".
Can you also provide the doc where it says that this page needs the Diagnostic Pack?
-
Unix Log Monitoring regular expression not picking up alerts
Hi,
We are moving our unix monitoring to SCOM 2012 SP1 rollup 4.
What I have got working is individual alerting of Unix log alerts, by exporting the MP, changing the <IndividualAlerts> value to true, removing the suppression xml section, and then reimporting the MP.
What I am trying to do is use the regular expression to perform suppression of specific events (such as event codes).
The expression is:
((?i:warning)(?!(.*1222)|(.*1001)))
i.e. search the log for "warning" (not case sensitive), then check whether events 1222 or 1001 are present; if so return no match, and if they are absent return true.
I use the built-in test function in SCOM when creating the rule and the tests come back as expected, but when I inject test lines into the unix log, no alerts get generated.
I suspect it could be the syntax not being accepted on the system (it's running RedHat 6).
I have tested this with regex tools and it works.
When I try to test it on the server I get:
[root@bld02 ~]# grep ((?i:Warning)(?!(.*1222)|(.*1001))) /var/log/messages
-bash: !: event not found
[root@bld02 ~]# tail /var/log/messages
Nov 13 15:07:26 bld02 root: SCOM Test Warning Event ID 1001 Round 18
Nov 13 15:07:29 bld02 root: SCOM Test Warning Event ID 1000 Round 18
Nov 13 15:07:35 bld02 root: SCOM Test Warning Event ID 1002 Round 18
So I am expecting 2 alerts to be generated.
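As an aside, the -bash: !: event not found error above is just bash history expansion firing on the unquoted "!"; it is unrelated to SCOM. To test the pattern from a shell, single-quote it and use a PCRE-capable grep (-P on GNU grep), since basic grep has no lookahead support:

```shell
# Recreate the three injected test lines from the post
printf '%s\n' \
  'Nov 13 15:07:26 bld02 root: SCOM Test Warning Event ID 1001 Round 18' \
  'Nov 13 15:07:29 bld02 root: SCOM Test Warning Event ID 1000 Round 18' \
  'Nov 13 15:07:35 bld02 root: SCOM Test Warning Event ID 1002 Round 18' > /tmp/messages.test

# Single quotes stop history expansion; -P enables the (?i:...) and (?!...) syntax
grep -cP '((?i:Warning)(?!(.*1222)|(.*1001)))' /tmp/messages.test   # → 2
```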
SCOM tests to show expression working:
Test 1 Matching
Test 2 to exclude
Need some help with this. Thank you in advance :)
Hello,
Here's an example of modifying the MP to exclude particular events. Firstly, I created a log file rule using the MP template that is fairly inclusive - matching the string Warning (with either a lower or upper case W).
I then exported the MP and modified the rule. I set IndividualAlerts = true and removed the AlertSuppression element, so that every matched line fires a unique alert. You don't have to remove the AlertSuppression, but you should use individual alerts so that the exclusion logic doesn't exclude concurrent events that you actually want to match.
Implementing the exclusion logic involves the addition of a System.ExpressionFilter definition in the rule. This will use a conditional evaluation of the //row element of the data item. Here's an example of a dataitem matching an individual row:
<DataItem type="System.Event.Data" time="2013-11-15T10:33:14.8839662-08:00" sourceHealthServiceId="667FF365-70DD-6607-5B66-F9F95253B29F">
<EventOriginId>{86AB962D-2F44-29FD-A909-B99FF6FEB2C5}</EventOriginId>
<PublisherId>{EC7EA4B1-0EA5-7E8E-701F-82FEF3367BC4}</PublisherId>
<PublisherName>WSManEventProvider</PublisherName>
<EventSourceName>WSManEventProvider</EventSourceName>
<Channel>WSManEventProvider</Channel>
<LoggingComputer/>
<EventNumber>0</EventNumber>
<EventCategory>3</EventCategory>
<EventLevel>0</EventLevel>
<UserName/>
<RawDescription>Detected Entry: warning 1002</RawDescription>
<CollectDescription Type="Boolean">true</CollectDescription>
<EventData>
<DataItem type="SCXLogProviderDataSourceData" time="2013-11-15T10:33:14.8839662-08:00" sourceHealthServiceId="667FF365-70DD-6607-5B66-F9F95253B29F">
<SCXLogProviderDataSourceData>
<row>warning 1002</row>
</SCXLogProviderDataSourceData>
</DataItem>
</EventData>
<EventDisplayNumber>0</EventDisplayNumber>
<EventDescription>Detected Entry: warning 1002</EventDescription>
</DataItem>
Here is the rule in the MP XML. The <ConditionDetection>...</ConditionDetection> content was what I added to do the exclusion filtering:
<Rule ID="LogFileTemplate_66b86eaded094c309ffd2631b8367a32.Alert" Enabled="false" Target="Unix!Microsoft.Unix.Computer" ConfirmDelivery="false" Remotable="true" Priority="Normal" DiscardLevel="100">
<Category>EventCollection</Category>
<DataSources>
<DataSource ID="EventDS" TypeID="Unix!Microsoft.Unix.SCXLog.VarPriv.DataSource">
<Host>$Target/Property[Type="Unix!Microsoft.Unix.Computer"]/PrincipalName$</Host>
<LogFile>/tmp/test</LogFile>
<UserName>$RunAs[Name="Unix!Microsoft.Unix.ActionAccount"]/UserName$</UserName>
<Password>$RunAs[Name="Unix!Microsoft.Unix.ActionAccount"]/Password$</Password>
<RegExpFilter>warning</RegExpFilter>
<IndividualAlerts>true</IndividualAlerts>
</DataSource>
</DataSources>
<ConditionDetection TypeID="System!System.ExpressionFilter" ID="Filter">
<Expression>
<RegExExpression>
<ValueExpression>
<XPathQuery Type="String">//row</XPathQuery>
</ValueExpression>
<Operator>DoesNotContainSubstring</Operator>
<Pattern>1001</Pattern>
</RegExExpression>
</Expression>
</ConditionDetection>
<WriteActions>
<WriteAction ID="GenerateAlert" TypeID="Health!System.Health.GenerateAlert">
<Priority>1</Priority>
<Severity>2</Severity>
<AlertName>Log File Alert: ExclusionExample</AlertName>
<AlertDescription>$Data/EventDescription$</AlertDescription>
</WriteAction>
</WriteActions>
</Rule>
I traced this with the Workflow Analyzer as I tested, which shows the logic being applied. Here is the exclusion happening:
Here's more info on the definition of an ExpressionFilter:
http://msdn.microsoft.com/en-us/library/ee692979.aspx
And more information on Regular Expressions in MPs:
http://support.microsoft.com/kb/2702651/en-us
You can also have multiple Expressions in the ExpressionFilter joined by OR or AND operators.
Also, if you are comfortable with the MP authoring, you can just skip the step of creating the rules in the MP template and just author your own MP with the VSAE tool:
http://social.technet.microsoft.com/wiki/contents/articles/18085.scom-2012-authoring-unixlinux-log-file-monitoring-rules.aspx
www.operatingquadrant.com
-
Best Way to monitor standby, primary databases, including alert logs, etc.
Hi guys, I finally cut over to the new environment on the new Red Hat Linux, and everything is working great so far (the primary/standby).
Now I would like to setup monitoring scripts to monitor it automatically so I can let it run by itself.
What is the best way?
I talked to another DBA friend outside the company, and he told me his shop does not use any cron jobs to monitor; they use Grid Control.
We have no Grid Control. I would like to see what the best option is here; should we set up Grid Control?
And also for the meantime, I would appreciate any good ideas of any cronjob scripts.
Thanks
Hello;
I came up with this, which I run on the primary daily. Since it's SQL, you can add any extras you need.
SPOOL OFF
CLEAR SCREEN
SPOOL /tmp/quickaudit.lst
PROMPT
PROMPT -----------------------------------------------------------------------|
PROMPT
SET TERMOUT ON
SET VERIFY OFF
SET FEEDBACK ON
PROMPT
PROMPT Checking database name and archive mode
PROMPT
column NAME format A9
column LOG_MODE format A12
SELECT NAME,CREATED, LOG_MODE FROM V$DATABASE;
PROMPT
PROMPT -----------------------------------------------------------------------|
PROMPT
PROMPT
PROMPT Checking Tablespace name and status
PROMPT
column TABLESPACE_NAME format a30
column STATUS format a10
set pagesize 400
SELECT TABLESPACE_NAME, STATUS FROM DBA_TABLESPACES;
PROMPT
PROMPT ------------------------------------------------------------------------|
PROMPT
PROMPT
PROMPT Checking free space in tablespaces
PROMPT
column tablespace_name format a30
SELECT tablespace_name ,sum(bytes)/1024/1024 "MB Free" FROM dba_free_space WHERE
tablespace_name <>'TEMP' GROUP BY tablespace_name;
PROMPT
PROMPT ------------------------------------------------------------------------|
PROMPT
PROMPT
PROMPT Checking freespace by tablespace
PROMPT
column dummy noprint
column pct_used format 999.9 heading "%|Used"
column name format a16 heading "Tablespace Name"
column bytes format 9,999,999,999,999 heading "Total Bytes"
column used format 99,999,999,999 heading "Used"
column free format 999,999,999,999 heading "Free"
break on report
compute sum of bytes on report
compute sum of free on report
compute sum of used on report
set linesize 132
set termout off
select a.tablespace_name name,
b.tablespace_name dummy,
sum(b.bytes)/count( distinct a.file_id||'.'||a.block_id ) bytes,
sum(b.bytes)/count( distinct a.file_id||'.'||a.block_id ) -
sum(a.bytes)/count( distinct b.file_id ) used,
sum(a.bytes)/count( distinct b.file_id ) free,
100 * ( (sum(b.bytes)/count( distinct a.file_id||'.'||a.block_id )) -
(sum(a.bytes)/count( distinct b.file_id ) )) /
(sum(b.bytes)/count( distinct a.file_id||'.'||a.block_id )) pct_used
from sys.dba_free_space a, sys.dba_data_files b
where a.tablespace_name = b.tablespace_name
group by a.tablespace_name, b.tablespace_name;
PROMPT
PROMPT ------------------------------------------------------------------------|
PROMPT
PROMPT
PROMPT Checking Size and usage in GB of Flash Recovery Area
PROMPT
SELECT
ROUND((A.SPACE_LIMIT / 1024 / 1024 / 1024), 2) AS FLASH_IN_GB,
ROUND((A.SPACE_USED / 1024 / 1024 / 1024), 2) AS FLASH_USED_IN_GB,
ROUND((A.SPACE_RECLAIMABLE / 1024 / 1024 / 1024), 2) AS FLASH_RECLAIMABLE_GB,
SUM(B.PERCENT_SPACE_USED) AS PERCENT_OF_SPACE_USED
FROM
V$RECOVERY_FILE_DEST A,
V$FLASH_RECOVERY_AREA_USAGE B
GROUP BY
SPACE_LIMIT,
SPACE_USED ,
SPACE_RECLAIMABLE ;
PROMPT
PROMPT ------------------------------------------------------------------------|
PROMPT
PROMPT
PROMPT Checking free space In Flash Recovery Area
PROMPT
column FILE_TYPE format a20
select * from v$flash_recovery_area_usage;
PROMPT
PROMPT ------------------------------------------------------------------------|
PROMPT
PROMPT
PROMPT ------------------------------------------------------------------------|
PROMPT
PROMPT
PROMPT Checking last sequence in v$archived_log
PROMPT
clear screen
set linesize 100
column STANDBY format a20
column applied format a10
--select max(sequence#), applied from v$archived_log where applied = 'YES' group by applied;
SELECT name as STANDBY, SEQUENCE#, applied, completion_time from v$archived_log WHERE DEST_ID = 2 AND NEXT_TIME > SYSDATE -1;
prompt
prompt----------------Last log on Primary--------------------------------------|
prompt
select max(sequence#) from v$archived_log where NEXT_TIME > sysdate -1;
PROMPT
PROMPT ------------------------------------------------------------------------|
PROMPT
PROMPT
PROMPT Checking switchover status
PROMPT
select switchover_status from v$database;
I run it from a shell script and email myself quickaudit.lst.
Alert logs are great source of information when you have an issue or just want to check something.
Best Regards
mseberg
-
Alert Log File Monitoring of 8i and 9i Databases with EM Grid Control 10g
Is it possible to monitor alert log errors in Oracle 8i/9i Databases with EM Grid Control 10g and EM 10g agents? If yes, is it possible to get some kind of notification?
I know that in 10g Database, it is possible to use server generated alerts, but what about 8i/9i?
Regards,
Martin
Hello
I am interested in a very specific feature: is it possible to get notified when alerts occur in the alert logs of an 8i/9i database, when using Grid Control and the 10g agent on the 8i/9i systems?
Moreover, the 10g agent should be able to get Performance Data using the v$ views or direct sga access without using statspack, right?
Do you know where I can find documentation about the supported features when using Grid Control with 8i/9i databases?
-
Hi,
I am trying to modify the metric for monitoring the alert log.
When I try to add a new line, it asks for the field "Time/Line Number"; does anyone know what the value for that should be?
The description on that page, "The table lists all Time/Line Number objects monitored for this metric. You can specify different threshold settings for each Time/Line Number object.", doesn't make much sense to me.
I gave a dummy value of 50, but it doesn't seem to work...
Any help is highly appreciated... thanks...
For monitoring pre-11g databases, note 976982.1 may be of use to you: https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=BULLETIN&id=976982.1
For monitoring 11g db's see note 949858.1 (Monitoring 11g Database Alert Log Errors in Enterprise Manager)
In your case you may want to consider creating a UDM; see Monitor Non Critical 11g Database Alert Log Errors Using a SQL UDM [ID 961682.1]
Eric
-
Need to find the way to get the actual error message in the alert log.
Hi,
I have configured OEM 11g, and the monitored target versions range from 9i to 11g. My problem is that I have defined the metrics for monitoring the alert log contents, and OEM sends an alert if there is any error in the alert log, but it does not show the actual error message; it just shows the following.
============================
Target Name=IDMPRD
Target type=Database Instance
Host=oidmprd01.ho.abc.com
Occurred At=Dec 21, 2011 12:05:21 AM GMT+03:00
Message=1 distinct types of ORA- errors have been found in the alert log.
Metric=Generic Alert Log Error Status
Metric value=1
Severity=Warning
Acknowledged=No
Notification Rule Name=RULE_4_PROD_DATABASES
Notification Rule Owner=SYSMAN
============================
Is there any way to get the complete error details in the OEM alert itself.
Regards
DBA.
You need to look at the Alert Log error messages, not the "status" messages. See doc http://docs.oracle.com/cd/E11857_01/em.111/e16285/oracle_database.htm#autoId2
-
Not clearable : 1 distinct types of ORA- errors have been found in the alert log.
( Correct me if I'm wrong.)
What could be the reason for the fact that the event "1 distinct types of ORA- errors have been found in the alert log." is not clearable?
Even if the problem is solved, the ORA- error will still be in the alert.log.
Therefore the event would eventually only disappear after 7 days.
Doesn't sound very logical to me.
- The event page shows a warning event for the "Generic Alert Log Error Status" metric,
- "Generic Alert Log Error Status" is a stateful metric,
- On each evaluation of the metric defined as 'Statefull', the Cloud Control Agent recalculates the severity,
- This means the EM Agent scans the alert log for ORA- errors, and "Generic Alert Log Error Status" returns the number of ORA- errors found,
- Whenever the returned number is greater than the assigned threshold of "Generic Alert Log Error Status", an alert is raised on the EM console,
- On the next scan of alert log file, this alert will be cleared ONLY if the collected value is less than the assigned threshold,
- By checking your system, the warning threshold of "Generic Alert Log Error Status" is 0, so whenever a single ORA error is found in alert log, a warning event will be raised and you will not be able to clear it manually as this is a stateful metric.
There are three cases where this alert can be cleared:
1. The agent found 0 ORA errors in Alert log, then the alert will be cleared automatically
2. Manual clear: By disabling/enabling the metric
3. Manual clear and further not receiving similar alerts frequently: By assigning higher thresholds values
So I suggest disabling the metric and then enabling it again after some time, as follows:
- Go to the problematic database home page >> open the 'Oracle Database' drop-down menu >> 'Monitoring' >> 'Metric and Collection Settings',
- Choose 'All metrics' from the 'View' drop-down list >> search for the 'Generic Alert Log Error Status' metric,
- Click the pencil icon under 'Edit' column opposite to the above metric >> remove the Warning Threshold (make it empty) >> 'Continue' >> 'OK'.
- Wait for the next collection schedule in order for the warning events to be cleared, then, enable the metric again with the same steps (setting Warning Threshold=0).
References:
Note 604385.1 Receiving "Clear" Notifications Unexpectedly for 'Generic Alert Log Error Status' Metric
Note 733784.1 What are Statefull and Stateless Metrics in Enterprise Manager - Explanation and Example
HTH
Mani
-
DG Observer triggering SIGSEGV Address not mapped to object errors in alert log
Hi,
I've got a Data Guard configuration using two 11.2.0.3 single instance databases. The configuration has been configured for automatic failover and I have an observer running on a separate box.
This fast-start failover configuration has been in place for about a month, and in the last week numerous SIGSEGV (address not mapped to object) errors have been reported in the alert log. This is happening quite frequently (every 4-5 minutes or so).
The corresponding trace files show the process triggering the error coming from the observer.
Has anyone experienced this problem? I'm at my wits end trying to figure out how to fix the configuration to eliminate this error.
I must also note that even though this error is occurring a lot, it doesn't seem to be affecting any of the database functionality.
Help?
Thanks in advance.
Beth
Hi. The following is the alert log message, the trace file generated, and the current values of the Data Guard configuration. In addition, as part of my research, I attempted to apply patch 12615660, which did not take care of the issue. I also set the inbound_connection_timeout parameter to 0, and that didn't help either. I'm still researching, but any pointer in the right direction is very much appreciated.
Error in Alert Log
Thu Apr 09 10:28:59 2015
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x9] [PC:0x85CE503, nstimexp()+71] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/<db_unq_name>/<SID>/trace/<SID>_ora_29902.trc (incident=69298):
ORA-07445: exception encountered: core dump [nstimexp()+71] [SIGSEGV] [ADDR:0x9] [PC:0x85CE503] [Address not mapped to object] []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Apr 09 10:29:02 2015
Sweep [inc][69298]: completed
Trace file:
Trace file /u01/app/oracle/diag/rdbms/<db_unq_name>/<SID>/trace/<SID>_ora_29902.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning and Oracle Label Security options
ORACLE_HOME = /u01/app/oracle/product/11.2.0.3/dbhome_1
System name: Linux
Node name: <host name>
Release: 2.6.32-431.17.1.el6.x86_64
Version: #1 SMP Wed May 7 14:14:17 CDT 2014
Machine: x86_64
Instance name: <SID>
Redo thread mounted by this instance: 1
Oracle process number: 19
Unix process pid: 29902, image: oracle@<host name>
*** 2015-04-09 10:28:59.966
*** SESSION ID:(416.127) 2015-04-09 10:28:59.966
*** CLIENT ID:() 2015-04-09 10:28:59.966
*** SERVICE NAME:(<db_unq_name>) 2015-04-09 10:28:59.966
*** MODULE NAME:(dgmgrl@<observer host> (TNS V1-V3)) 2015-04-09 10:28:59.966
*** ACTION NAME:() 2015-04-09 10:28:59.966
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x9] [PC:0x85CE503, nstimexp()+71] [flags: 0x0, count: 1]
DDE: Problem Key 'ORA 7445 [nstimexp()+71]' was flood controlled (0x6) (incident: 69298)
ORA-07445: exception encountered: core dump [nstimexp()+71] [SIGSEGV] [ADDR:0x9] [PC:0x85CE503] [Address not mapped to object] []
ssexhd: crashing the process...
Shadow_Core_Dump = PARTIAL
ksdbgcra: writing core file to directory '/u01/app/oracle/diag/rdbms/<db_unq_name>/<SID>/cdump'
Data Guard Configuration
DGMGRL> show configuration verbose;
Configuration - dg_config
Protection Mode: MaxPerformance
Databases:
dbprim - Primary database
dbstby - (*) Physical standby database
(*) Fast-Start Failover target
Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
FastStartFailoverLagLimit = '180'
CommunicationTimeout = '180'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
Fast-Start Failover: ENABLED
Threshold: 30 seconds
Target: dbstby
Observer: observer_host
Lag Limit: 180 seconds
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Configuration Status:
SUCCESS
DGMGRL> show database verbose dbprim
Database - dbprim
Role: PRIMARY
Intended State: TRANSPORT-ON
Instance(s):
DG_CONFIG
Properties:
DGConnectIdentifier = 'dbprim'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'MANUAL'
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = ''
FastStartFailoverTarget = 'dbstby'
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
SidName = '<sid>'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<db host name>)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=<service_name>)(INSTANCE_NAME=<sid>)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = '%t_%s_%r.dbf'
TopWaitEvents = '(monitor)'
Database Status:
SUCCESS
DGMGRL> show database verbose dbstby
Database - dbstby
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds
Apply Lag: 0 seconds
Real Time Query: ON
Instance(s):
DG_CONFIG
Properties:
DGConnectIdentifier = 'dbstby'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'AUTO'
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = ''
FastStartFailoverTarget = 'dbprim'
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
SidName = '<sid>'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<db host name>)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=<service_name>)(INSTANCE_NAME=<sid>)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = '%t_%s_%r.dbf'
TopWaitEvents = '(monitor)'
Database Status:
SUCCESS -
Database Generating Errors in Alert Log
Hi,
my db is generating errors in the alert log
Errors in file /export/home/app/oracle/diag/rdbms/ORACLE_SID/ORACLE_SID/trace/ORACLE_SID_j000_15845.trc (incident=44144):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
ORA-00001: unique constraint (SYSMAN.PK_MGMT_JOB_EXECUTION) violated
DDE: Problem Key 'ORA 600 [13011]' was completely flood controlled (0x4)
Further messages for this problem key will be suppressed for up to 10 minutes
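Because flood control suppresses repeats, a plain grep of the alert log understates how often a problem actually fired. A small sketch that tallies flood-controlled problem keys; the sample lines are stand-ins written to a temp file so the pipeline is self-contained:

```shell
#!/bin/sh
# Placeholder path; point this at the real alert log in practice.
FCLOG=${FCLOG:-/tmp/alert_fc.log}
cat > "$FCLOG" <<'EOF'
DDE: Problem Key 'ORA 600 [13011]' was completely flood controlled (0x4)
DDE: Problem Key 'ORA 600 [13011]' was completely flood controlled (0x4)
DDE: Problem Key 'ORA 7445 [nstimexp()+71]' was flood controlled (0x6) (incident: 69298)
EOF

# Extract the quoted problem key from each flood-control line
# and count how many times each key was suppressed.
grep "flood controlled" "$FCLOG" \
  | sed "s/.*Problem Key '\([^']*\)'.*/\1/" \
  | sort | uniq -c
```

The underlying incidents can also be listed with `adrci` (`show incident`), which survives flood control.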
looking forward to your assistance
Mike
Tue May 22 12:55:56 2012
Adjusting the default value of parameter parallel_max_servers
from 960 to 285 due to the value of parameter processes (300)
Starting ORACLE instance (normal)
Tue May 22 13:00:16 2012
Adjusting the default value of parameter parallel_max_servers
from 960 to 285 due to the value of parameter processes (300)
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Shared memory segment for instance monitoring created
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /export/home/app/oracle/product/11.2.0/dbhome_1/dbs/arch
Autotune of undo retention is turned on.
IMODE=BR
ILAT =52
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
ORACLE_HOME = /export/home/app/oracle/product/11.2.0/dbhome_1
System name: SunOS
Node name: server_1
Release: 5.10
Version: Generic_141445-09
Machine: i86pc -
We have recently migrated our database from Solaris to Linux (RHEL5), and since the migration we have been seeing weird errors related to archive log shipping to the remote standby site, with a corresponding ORA-600 on the standby.
What's interesting is that everything gets resolved by itself. It always seems to happen during heavy database load, when log switches are frequent (~3 min). Initially the archival to the remote site fails with the following error in the primary alert log:
Errors in file /app/oracle/admin/UIIP01/bdump/uiip01_arc1_9772.trc:
ORA-00272: error writing archive log
Mon Jul 14 10:57:36 2008
FAL[server, ARC1]: FAL archive failed, see trace file.
Mon Jul 14 10:57:36 2008
Errors in file /app/oracle/admin/UIIP01/bdump/uiip01_arc1_9772.trc:
ORA-16055: FAL request rejected
ARCH: FAL archive failed. Archiver continuing
Mon Jul 14 10:57:36 2008
ORACLE Instance UIIP01 - Archival Error. Archiver continuing.
Then we see an ORA-600 on the standby database related to this, complaining about redo block corruption:
Mon Jul 14 09:57:32 2008
Errors in file /app/oracle/admin/UIIP01/udump/uiip01_rfs_12775.trc:
ORA-00600: internal error code, arguments: [kcrrrfswda.11], [4], [368], [], [], [], [], []
Mon Jul 14 09:57:36 2008
And the trace file has this wonderful block corruption error:
Corrupt redo block 424432 detected: bad checksum
Flag: 0x1 Format: 0x22 Block: 0x000679f0 Seq: 0x000006ef Beg: 0x150 Cks:0xa2e5
----- Dump of Corrupt Redo Buffer -----
*** 2008-07-14 09:57:32.550
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcrrrfswda.11], [4], [368], [], [], [], [], []
So ARC tries to resend this redo log and succeeds. At the end of the day all we have is a bunch of these ORA- errors in our alert logs triggering our monitors, and the errors resolve themselves without any manual intervention. We opened a tar with Oracle Support, but since this is not affecting our primary database they are in no hurry to prioritize it, and they are reluctant to accept that it's a bug that resolves itself.
Just wanted to get it out here to see if anyone experienced a similar problem, let me know if you need any more details.
As I said earlier, this behaviour happens only during peak loads, especially when full 500M redo logs are being archived.
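Until support confirms a fix, one pragmatic workaround for the monitors is to filter out the errors known to resolve on retry before alerting. A hedged sketch; the transient-error list (ORA-00272, ORA-16055) is an assumption based on the symptoms above, and the sample log is inlined so the snippet is self-contained:

```shell
#!/bin/sh
# Placeholder path; in practice this would be the primary's alert log.
ARCHLOG=${ARCHLOG:-/tmp/alert_arch.log}
cat > "$ARCHLOG" <<'EOF'
ORA-00272: error writing archive log
ORA-16055: FAL request rejected
ORA-00600: internal error code, arguments: [kcrrrfswda.11]
EOF

# Errors observed to resolve on retry; adjust the list to your environment.
TRANSIENT='ORA-00272|ORA-16055'

# Only the errors that survive the filter would trigger an alert.
grep "ORA-" "$ARCHLOG" | grep -Ev "$TRANSIENT"
```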
Thanks in Advance.
Thanks Madrid!
I scoured through these Metalink notes looking for possible solutions; almost all of them were closed citing a customer problem related to the OS, firewall, network, etc., or closed saying the requested data was not provided.
Looks as if they were never successfully closed with a resolution.
I just want to assure myself that the redo corruption the standby is reporting will not haunt me later when I am doing a recovery, or even a crash recovery, using the redo logs.
I have multiplexed my logs just in case, and have all the block checking parameters enabled on both the primary and standby databases.
Thanks,
Ramki -
EM Grid Monitoring problem.
Hi all,
1. I have configured EM Grid Control 11g with all my targets and hosts, but when I try to set the Generic Alert Log metrics I can see only a few instances.
2. I am trying to send a test error with exec dbms_system.ksdwrt(2, 'ORA-00600: Testing monitoring tool');. The error is logged in EM Grid but never arrives at the email address I configured.
I have sent test mails both from the hosts and from EM Grid, and they succeed, but the problem remains for ORA- errors.
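When the email never arrives, it helps to test each stage separately: the injection into the alert log, the metric's pattern match, and the notification rule. A hedged sketch of the middle stage; the injection is stubbed with echo (a real run would use exec dbms_system.ksdwrt via SQL*Plus), and the regular expression shown is the commonly cited EM default warning pattern, so treat it as an assumption and compare it against your own metric settings:

```shell
#!/bin/sh
# Placeholder path; in practice this would be the monitored alert log.
EMLOG=${EMLOG:-/tmp/alert_em.log}
: > "$EMLOG"

# Stage 1: inject a test error (dbms_system.ksdwrt via sqlplus in a real run).
echo "ORA-00600: Testing monitoring tool" >> "$EMLOG"

# Stage 2: confirm the metric's warning expression would match the line.
PATTERN='ORA-0*(600?|7445|4[0-9][0-9][0-9])[^0-9]'
if grep -Eq "$PATTERN" "$EMLOG"; then
    echo "metric would fire"
else
    echo "metric would not fire"
fi
```

If the pattern matches but no mail arrives, the problem is in the notification rule or mail setup rather than the metric itself.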
Please help me in this regard.
thanks in advance
Edited by: user7280060 on Jan 23, 2012 6:26 PM
Hi;
I suggest you close your issue here as answered and move it to Forum Home » Enterprise Manager » Enterprise Manager Grid Control, where you can get a quicker response.
Regards
Helios