(RAC) node 2 doesn't start automatically
Hi gurus,
We have installed a RAC (2 nodes) on database version 11.2.0.3.
When we reboot node 1, the instance restarts automatically, but when we reboot node 2 we have to start the instance manually using:
srvctl start instance -d DB_name -i NODE_2
How can we check whether autostart is configured on node 2?
I ran:
[oracle@sifnode2 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
If HAS is online, does that mean autostart is on?
Hi,
You should know there is an important change in 11.2 CRS...
With Oracle 11.2, the database auto-start policy in the clusterware is restore, which means the clusterware remembers the last state of the database. If the database was stopped normally, it won't be started on the next restart of the clusterware. If the server crashed or the OS was rebooted for some reason, the clusterware will start the database because its last state was ONLINE (running).
If you are running on Linux there is no longer any need to write your own scripts for automatic startup/shutdown, but you do need to configure something.
Here is what it is:
By default in Oracle 11.2, several important resources have AUTO_START=restore in their profile: the Oracle Database resource, the Oracle ASM resource, and the CRS resource type for the Listener. This means that if the database server is restarted for some reason, the clusterware restores the last state of each resource.
It is good practice to change this default behaviour, and it is the first thing I usually do after a new installation: I set AUTO_START=always for the resources listed above.
Coming back to your issue...
Please check the following to find the auto-start policy:
crsctl config crs
crs_stat -p <resource name>
crsctl status resource <resource name> -p | grep AUTO_START
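To make the last check concrete, here is a minimal sketch that pulls the AUTO_START value out of a resource profile. It assumes the profile is printed as NAME=VALUE lines; the helper name auto_start_of and the resource name ora.DB_name.db are placeholders of mine, not names taken from your cluster.

```shell
#!/bin/sh
# Print the AUTO_START value from a resource profile read on stdin.
# The input is expected to be NAME=VALUE lines, as printed by
#   crsctl status resource <resource name> -p
auto_start_of() {
    awk -F= '$1 == "AUTO_START" { print $2 }'
}

# Hypothetical usage on a cluster node (the resource name is an assumption):
#   crsctl status resource ora.DB_name.db -p | auto_start_of
# Changing the policy as described above should look like this:
#   crsctl modify resource ora.DB_name.db -attr "AUTO_START=always"
```

If this prints restore, the instance on node 2 only comes back when its last recorded state was ONLINE, which would match what you are seeing after a clean stop.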
Similar Messages
-
After upgrading to iTunes 10.1 on my MBP (Mac OS 10.6.5) I have two problems:
First, iTunes doesn't start automatically anymore when I connect my iPhone 3GS (iOS 4.2.1). If I start iTunes manually, the iPhone sync starts and works well. The "don't sync automatically" option in the iTunes setup is not set!
Second, iTunes 10.1 doesn't play video podcasts anymore; only the sound is OK. The podcasts themselves are OK and viewable with other tools, and after syncing the iPhone shows them, too.
Any tips?
Greetings to all,
A. Gehrmann
Mine did something very similar - when I tried to search, I got a message saying that it couldn't contact the iTunes Store. I uninstalled it and reinstalled it, and afterwards it worked, but the fix might have had nothing to do with that uninstall/reinstall. I wasn't able to contact the iTunes Store through my iPod either, and that didn't work for a while even after the reinstall. Then eventually it ran correctly as well.
I'm inclined to believe that there was something wrong at iTunes, not on my end. Have you tried again recently? Hopefully it'll be working again, like mine is. -
Both RAC instances are starting automatically
Earlier we used to start both RAC instances manually, but for the last week
both instances have been starting automatically when we start the Unix machine.
DB Name - Kumaondb
instance1=Kumaondb1
instance2=Kumaondb2
I want to start them manually.
Can someone help me?
Thanks!
Deep
If you are not using ASM with your database, then simply change the Y to N in the /etc/oratab file.
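A quick way to see which entries currently have the flag set is sketched below. It assumes the usual SID:ORACLE_HOME:Y|N layout of oratab, where the third field is what the dbstart/dbshut scripts read; the function name oratab_flags is my own.

```shell
#!/bin/sh
# List each SID with its autostart flag (third field) from
# oratab-format input on stdin. Comment and blank lines are skipped.
oratab_flags() {
    awk -F: '!/^[[:space:]]*#/ && NF >= 3 { print $1, $3 }'
}

# Usage on the server:
#   oratab_flags < /etc/oratab
```

An entry shown with Y is the one to change to N if you want to start that instance by hand.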
-
Why don't RAC services start automatically?
Hi All,
I have a 2-node 10gR2 RAC installed on RHEL4. Whenever I restart the server, the RAC-related services do not start automatically, except the VIP-related services. I have to start the services manually using the crs_stop and crs_start commands; after that everything works fine. There were no problems or errors during installation.
Whenever I restart a node, the VIP addresses, which are in DNS, take about 10-15 minutes to show up in the ifconfig -a output. Is this related to the services not starting automatically?
Can anybody tell me what the problem might be?
Thanks,
Praveen.
When a failure occurs, the cluster puts things back to the state they were in at the time of failure. So if your database is running and the node crashes or is rebooted, the services should start because the target state is ONLINE. If you do a srvctl stop database, this is assumed to be a planned shutdown, so we stop the database and the services (dependent resources, as Erik says). After you have successfully stopped the database, the target state of the service is OFFLINE. If you reboot at this point, the database and service should not start.
All databases should work the same if they were created the same. There is not enough information in your post to know why they did not start. If it continues to be a problem, contact Oracle Support and they can help figure out the differences. -
The CSSD does not start automatically on Non-RAC - AIX 5L
Hi all,
Database: Oracle 10.2.0.4
O.S AIX 5.3 TL 10
I am facing a problem with CSSD on a non-RAC/Clusterware installation.
When we restarted the server, the CSSD service did not start automatically, and there are no relevant messages in the CSSD log files.
Running "localconfig reset $ORACLE_HOME" executes successfully but does not start CSSD.
Running "/etc/init.cssd start" does not work; nothing happens when we run it.
I manually ran "/etc/init.cssd run" and the CSSD service (ocssd.bin) started without errors.
There are no errors in errpt on AIX.
I've opened an SR but I am still waiting for a response from Oracle.
Hi,
I believe this problem is not with Oracle but with AIX. It may be that one process started at boot is holding up the others. You need to identify which process/application it is.
Check which inittab entries are already active and which are inactive but should be active.
One clue: it is very common for a process started in wait mode to hold up the server boot.
wait
When the init command enters the run level that matches the entry's run level,
start the process and wait for its termination.
All subsequent reads of the /etc/inittab file while the init
command is in the same run level will cause the init command to ignore this entry.
http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.files/doc/aixfiles/inittab.htm
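To find the wait-mode entries mentioned above, a small sketch like this could help. It assumes the Identifier:RunLevel:Action:Command layout of /etc/inittab and skips lines starting with # or : as comments; the function name inittab_wait_entries is mine.

```shell
#!/bin/sh
# Print the identifier and run levels of inittab entries whose
# action field is "wait". Input is read from stdin.
inittab_wait_entries() {
    awk -F: '!/^[[:space:]]*[#:]/ && $3 == "wait" { print $1, $2 }'
}

# Usage on the server:
#   inittab_wait_entries < /etc/inittab
```

Any entry listed before the CSSD line in the file is a candidate for the process that never finishes and so keeps init from reaching init.cssd.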
Regards,
Levi Pereira -
RAC hangs when starting or stopping 2nd instance of 2 node RAC
Has anyone seen all transactions and/or logins hang when starting or stopping the 2nd node of a 2-node RAC database? When I shut down the 2nd instance using srvctl, I occasionally get errors and long delays when connecting. On our larger database, where each instance has a very large SGA of about 30GB, connection attempts sometimes fail with ORA-12537 or ORA-01033, and users experience a very long hang before getting the errors.
In the log of the first instance we see messages like the following for quite a while.
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon Jan 8 06:59:08 2007
LMS 1: 0 GCS shadows cancelled, 0 closed
Mon Jan 8 06:59:08 2007
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Mon Jan 8 06:59:23 2007
LMS 0: 20740 GCS shadows traversed, 4001 replayed
Mon Jan 8 06:59:23 2007
LMS 1: 20744 GCS shadows traversed, 4001 replayed
Mon Jan 8 06:59:23 2007
LMS 0: 20882 GCS shadows traversed, 4001 replayed
Mon Jan 8 06:59:23 2007
LMS 1: 20627 GCS shadows traversed, 4001 replayed
Mon Jan 8 06:59:24 2007
LMS 0: 20781 GCS shadows traversed, 4001 replayed
Mon Jan 8 06:59:24 2007
Thanks in advance.
I have tested with only one node, and starting and stopping is much faster without any cache fusion traffic. The application is not RAC aware; it was written without regard for RAC. As you say, there may be some access to the same blocks from all nodes, causing RAC to remaster blocks when a node is shut down.
My concern is the length of time users are affected when I start a node that has been offline for a while. With the large SGA I have, it appears users are adversely affected for several minutes, in effect causing an outage, which is what we are trying to avoid by using RAC. -
Hi,
I got a few free apps a while ago but did not download them completely, and I don't want to.
But they stay in my download list and start automatically whenever I connect to the iTunes Store, which is driving me crazy.
How can I remove them from my account for good?
Please help me.
Thank you
Try here > https://discussions.apple.com/thread/4074945?tstart=0
-
Dbconsole failed to start on one RAC node
Hi
I have 2 RAC nodes (RHEL 4) running 10.2.0.1. On one node dbconsole is running, and on the other I get the following. Earlier, dbconsole
ran perfectly fine on both nodes. I would appreciate any suggestions to rectify this problem.
Regards
oracle@rac01<18>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> emctl start dbconsole
TZ set to Canada/Newfoundland
Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
http://rac01:1158/em/console/aboutApplication
Agent Version : 10.1.0.4.1
OMS Version : Unknown
Protocol Version : 10.1.0.2.0
Agent Home : /u01/app/oracle/product/10.2/db_1/rac01_RACDB1
Agent binaries : /u01/app/oracle/product/10.2/db_1
Agent Process ID : 23329
Parent Process ID : 21132
Agent URL : http://rac01:3938/emd/main
Started at : 2007-07-25 11:37:32
Started by user : oracle
Last Reload : 2007-07-25 11:37:32
Last successful upload : (none)
Last attempted upload : (none)
Total Megabytes of XML files uploaded so far : 0.00
Number of XML files pending upload : 371
Size of XML files pending upload(MB) : 7.66
Available disk space on upload filesystem : 44.78%
Agent is already started. Will restart the agent
Stopping agent ... stopped.
Starting Oracle Enterprise Manager 10g Database Control ............................................................................................. failed.
Logs are generated in directory /u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log
oracle@rac01<19>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log>
ON OTHER NODE:
oracle@rac02<2>:/u01/app/oracle> emctl start dbconsole
TZ set to Canada/Newfoundland
Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
http://rac01:1158/em/console/aboutApplication
Starting Oracle Enterprise Manager 10g Database Control .................................... started.
Logs are generated in directory /u01/app/oracle/product/10.2/db_1/rac02_RACDB2/sysman/log
oracle@rac02<3>:/u01/app/oracle>
Thanks for your time and reply.
Well, here is what I got; I couldn't make anything out of it.
Regards
oracle@rac01<19>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> ls -lart
total 13500
drwxr----- 7 oracle dba 4096 Jul 14 10:48 ..
-rw-r----- 1 oracle dba 0 Jul 14 10:48 emdctl.log
drwxrwx--- 2 oracle dba 4096 Jul 14 10:54 nmcRACDB11521
-rw-r----- 1 oracle dba 4655792 Jul 24 23:01 emoms.trc
-rw-r----- 1 oracle dba 4655792 Jul 24 23:01 emoms.log
drwxr----- 3 oracle dba 4096 Jul 25 11:35 .
-rw-r----- 1 oracle dba 4096 Jul 25 12:05 emdb.nohup.lr
-rw-r----- 1 oracle dba 1074 Jul 25 12:05 emagent_perl.trc
-rw-r----- 1 oracle dba 1731 Jul 25 12:06 emagent.log
-rw-r----- 1 oracle dba 1080 Jul 25 12:07 emagentfetchlet.trc
-rw-r----- 1 oracle dba 1080 Jul 25 12:07 emagentfetchlet.log
-rw-r----- 1 oracle dba 81089 Jul 25 13:28 emdctl.trc
-rw-r----- 1 oracle dba 3309143 Jul 25 13:28 emdb.nohup
-rw-r----- 1 oracle dba 1044518 Jul 25 13:28 emagent.trc
oracle@rac01<20>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> cat emagent.log
2007-07-14 10:50:44 Thread-3086936288 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
2007-07-14 10:51:16 Thread-3086936288 EMAgent started successfully (00702)
2007-07-14 14:38:21 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
2007-07-14 14:39:00 Thread-3086935744 EMAgent started successfully (00702)
2007-07-24 07:05:06 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
2007-07-24 07:07:11 Thread-3086935744 target {+ASM1_rac01, osm_instance} is broken: cannot compute dynamic properties in time. (00155)
2007-07-24 07:07:14 Thread-3086935744 EMAgent started successfully (00702)
2007-07-24 12:06:27 Thread-3086935744 EMAgent normal shutdown (00703)
2007-07-24 12:08:26 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
2007-07-24 12:08:51 Thread-3086935744 EMAgent started successfully (00702)
2007-07-25 11:35:35 Thread-3086935744 EMAgent normal shutdown (00703)
2007-07-25 11:37:32 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
2007-07-25 11:39:29 Thread-3086935744 target {+ASM1_rac01, osm_instance} is broken: cannot compute dynamic properties in time. (00155)
2007-07-25 11:39:30 Thread-3086935744 EMAgent started successfully (00702)
2007-07-25 12:03:36 Thread-3086935744 EMAgent normal shutdown (00703)
2007-07-25 12:05:15 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
2007-07-25 12:06:23 Thread-3086935744 target {+ASM1_rac01, osm_instance} is broken: cannot compute dynamic properties in time. (00155)
2007-07-25 12:06:24 Thread-3086935744 EMAgent started successfully (00702)
oracle@rac01<21>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> cat emagentfetchlet.log
2007-07-14 11:01:44,208 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-14 14:40:29,096 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-24 07:10:44,123 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-24 12:12:48,187 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-25 11:41:25,628 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-25 12:07:30,335 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
oracle@rac01<22>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log>
oracle@rac01<22>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> tail -40 emagentfetchlet.trc
2007-07-14 11:01:44,208 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-14 14:40:29,096 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-24 07:10:44,123 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-24 12:12:48,187 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-25 11:41:25,628 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-25 12:07:30,335 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
oracle@rac01<25>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> tail -10 emdctl.trc
2007-07-25 13:01:02 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:04:41 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:07:12 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:10:50 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:14:32 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:18:09 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:20:40 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:24:27 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:28:06 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:31:43 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
oracle@rac01<28>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> tail -10 emagent.trc
2007-07-25 13:31:44 Thread-43162528 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:31:44 Thread-43162528 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
2007-07-25 13:32:14 Thread-74791840 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:32:14 Thread-74791840 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
2007-07-25 13:32:14 Thread-74791840 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:32:14 Thread-74791840 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
2007-07-25 13:32:44 Thread-74791840 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:32:44 Thread-74791840 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
2007-07-25 13:32:44 Thread-74791840 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:32:44 Thread-74791840 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
Message was edited by:
Singh -
Cluster and ASM services don't start automatically
Hi,
I have configured a 2-node 10g RAC environment (rac, rac2) on RHEL 4 in VMware.
However, the cluster services and ASM services do not come up automatically after a server reboot. I have to bring them up manually using the ./srvctl command.
[root@rac2 bin]# ./crs_stat -t
Name Type Target State Host
ora....SM1.asm application ONLINE UNKNOWN rac
ora....AC.lsnr application ONLINE UNKNOWN rac
ora.rac.gsd application ONLINE UNKNOWN rac
ora.rac.ons application ONLINE UNKNOWN rac
ora.rac.vip application ONLINE ONLINE rac
ora....SM2.asm application ONLINE UNKNOWN rac2
ora....C2.lsnr application ONLINE UNKNOWN rac2
ora.rac2.gsd application ONLINE UNKNOWN rac2
ora.rac2.ons application ONLINE UNKNOWN rac2
ora.rac2.vip application ONLINE ONLINE rac2
Could someone please guide me as to how these services can be brought up automatically on every reboot?
Normally all services start automatically; I am not sure how you installed and configured your RAC.
Here are all the options.
Option1:-
Re: CRS auto-start
Option2:-
See the below links.
http://jaffardba.blogspot.com/2009/03/how-to-startup-rac-database-services.html
http://www.dannorris.com/2009/03/12/start-database-services-automatically-after-instance-startup/
Option3:-
If you simply want to start the service on server reboot, put the srvctl
command in the rc.local script under the /etc directory.
Hope this solves your issue.
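Before choosing an option, it may help to list exactly which resources did not come up. The sketch below filters a crs_stat -t listing like the one in the question for resources whose target is ONLINE but whose state is something else. It assumes the five whitespace-separated columns shown there; the function name not_online is mine.

```shell
#!/bin/sh
# From "crs_stat -t" style output on stdin, print the name and state of
# every resource whose Target is ONLINE but whose State is not ONLINE.
not_online() {
    awk '$3 == "ONLINE" && $4 != "ONLINE" { print $1, $4 }'
}

# Usage from the CRS bin directory:
#   ./crs_stat -t | not_online
```

An empty result after a reboot means everything that was meant to be up is up.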
Regards
Click here to [createdisk, deletedisk and querydisk in ASM|http://www.oracleracexpert.com/2009/09/createdisk-deletedisk-and-querydisk-in.html]
Click here to see [ RAC database Instance hang/restart due to node eviction and Solution.|http://www.oracleracexpert.com/2009/09/ora-29740-evicted-by-member-0-group.html]
Click here for [Cross platform Transportable tablespace using RMAN|http://www.oracleracexpert.com/2009/10/cross-platform-transportable-tablespace.html]
http://www.oracleracexpert.com -
RAC does not start after trying to move storage
Hello,
We have a two-node Oracle 10g RAC with ASM disk groups defined on one external storage array.
RAC does not start after we reformatted the storage to RAID 10.
What we did:
1) deleted the database
2) reformatted the external storage to RAID 10 and defined the needed volumes for OCR, voting disk and ASM
3) on both nodes
- recreated the voting disk
node1:root$ /u01/oracle/product/10gr2/crs/bin/crsctl add css votedisk /dev/rdsk/c0t600A0B8000347E7D00000EE74A5AF386d0s3 -force
Now formatting voting disk: /dev/rdsk/c0t600A0B8000347E7D00000EE74A5AF386d0s3
successful addition of votedisk /dev/rdsk/c0t600A0B8000347E7D00000EE74A5AF386d0s3.
node2:root$ /u01/oracle/product/10gr2/crs/bin/crsctl add css votedisk /dev/rdsk/c0t600A0B8000347E7D00000EE74A5AF386d0s3 -force
Now formatting voting disk: /dev/rdsk/c0t600A0B8000347E7D00000EE74A5AF386d0s3
successful addition of votedisk /dev/rdsk/c0t600A0B8000347E7D00000EE74A5AF386d0s3.
- tried to move the OCR, but this succeeded on only one node:
/u01/oracle/product/10gr2/crs/bin/ocrconfig -replace ocrmirror /dev/rdsk/c7t30d0s3
/u01/oracle/product/10gr2/crs/bin/ocrconfig -replace ocr /dev/rdsk/c7t29d0s3
/u01/oracle/product/10gr2/crs/bin/ocrconfig -replace ocr
These return ok for both node1 and node2:
node1:oracle$ cluvfy stage -post hwos -n node1
node1:oracle$ cluvfy stage -post crsinst -n node1,node2 -verbose
Performing post-checks for cluster services setup
Checking node reachability...
Check: Node reachability from node "node1"
Destination Node Reachable?
node1 yes
node2 yes
Result: Node reachability check passed from node "node1".
Checking user equivalence...
Check: User equivalence for user "oracle"
Node Name Comment
node2 passed
node1 passed
Result: User equivalence check passed for user "oracle".
Checking Cluster manager integrity...
Checking CSS daemon...
Node Name Status
node2 not running
node1 not running
Result: Daemon status check failed for "CSS daemon".
Cluster manager integrity check failed.
Checking cluster integrity...
Cluster is divided into 2 partitions.
Partition 1 consists of the following members:
Node Name
node1
Partition 2 consists of the following members:
Node Name
node1
node2
Cluster integrity check failed. Cluster is divided into 2 partition(s).
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
WARNING:
CSS is probably working with a non-clustered, local-only configuration on nodes:
node1
Verification will proceed with nodes:
node2
Uniqueness check for OCR device passed.
Checking the version of OCR...
OCR of correct Version "2" exists.
Checking data integrity of OCR...
Data integrity check for OCR passed.
OCR integrity check failed.
Checking CRS integrity...
Checking daemon liveness...
Check: Liveness for "CRS daemon"
Node Name Running
node2 no
node1 yes
Result: Liveness check failed for "CRS daemon".
Checking daemon liveness...
Check: Liveness for "CSS daemon"
Node Name Running
node2 no
node1 no
Result: Liveness check failed for "CSS daemon".
Checking daemon liveness...
Check: Liveness for "EVM daemon"
Node Name Running
node2 no
node1 yes
Result: Liveness check failed for "EVM daemon".
Liveness of all the daemons
Node Name CRS daemon CSS daemon EVM daemon
node2 no no no
node1 yes no yes
Checking CRS health...
Check: Health of CRS
Node Name CRS OK?
node1 unknown
Result: CRS health check failed.
CRS integrity check failed.
Post-check for cluster services setup was unsuccessful on all the nodes.
Please help.
Thanks for the answer.
The problem is that the RAC is not healthy. I know how to start it, but now it doesn't start and I cannot find helpful logs.
You can see from the cluvfy output that the cluster is not healthy.
When I tried to start CRS with crsctl start crs:
- on node2 none of the daemons start (CSS, CRS or EVM)
- on node1 the CSS and EVM daemons start but CRS does not, and I cannot find helpful logs
Problem 2: Cannot change OCR location
node2:root$ cat /var/opt/oracle/ocr.loc
#Device/file /dev/rdsk/c7t30d0s3 getting replaced by device /dev/rdsk/c7t30d0s3
ocrconfig_loc=/dev/rdsk/c7t30d0s3
local_only=false
node2:root$ ocrconfig -replace ocr /dev/rdsk/c0t600A0B8000347EAD00000F314A5AF5B8d0s3
PROT-1: Failed to initialize ocrconfig
node1:root$ cat /var/opt/oracle/ocr.loc
ocrconfig_loc=/u01/oracle/product/10gr2/db/cdata/localhost/local.ocr
local_only=TRUE
node1:root$ ocrconfig -replace ocr /dev/rdsk/c0t600A0B8000347EAD00000F314A5AF5B8d0s3
PROT-1: Failed to initialize ocrconfig
Problem 3: On node 1 I issued this command by mistake
node1:root$ /u01/oracle/product/10gr2/db/bin/localconfig reset
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Configuration for local CSS has been initialized
Adding to inittab
Startup will be queued to init within 30 seconds.
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.
Giving up: Oracle CSS stack appears NOT to be running.
Oracle CSS service would not start as installed
Automatic Storage Management(ASM) cannot be used until Oracle CSS service is started -
Hi,
One of our RAC environments keeps restarting.
I've disabled init.cssd, init.crs and init.evmd in /etc/inittab in order to check the logs.
This is the situation:
crsd.log:
2009-02-04 00:09:00.118: [ COMMCRS][9]clsc_connect: (8000000100318640) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
2009-02-04 00:09:00.132: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2009-02-04 00:09:00.134: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2009-02-04 00:09:08.016: [ CRSD][1]32Daemon Version: 10.2.0.2.0 Active Version: 10.2.0.2.0
2009-02-04 00:09:08.016: [ CRSD][1]32Active Version and Software Version are same
2009-02-04 00:09:08.017: [ CRSMAIN][1]32Initializing OCR
2009-02-04 00:09:08.037: [ OCRRAW][1]proprioo: for disk 0 (/dev/rdsk/ora_ocr_raw), id match (1), my id set (752560621,1028247821) total id sets (1), 1st set
(752560621,1028247821), 2nd set (0,0) my votes (2), total votes (2)
2009-02-04 00:09:08.140: [ CSSCLNT][24]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
ocssd.log:
[ CSSD]2009-02-03 21:52:08.651 [9] >USER: clssnmHandleUpdate: NODE 1 (node1l) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2009-02-03 21:52:08.651 [9] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmDoSyncUpdate: Sync Complete!
[ CSSD]2009-02-03 21:52:08.722 [1] >USER: NMEVENT_SUSPEND [00][00][00][00]
[ CSSD]2009-02-03 21:52:08.724 [17] >TRACE: clssgmReconfigThread: started for reconfig (1)
[ CSSD]2009-02-03 21:52:08.749 [17] >USER: NMEVENT_RECONFIG [00][00][00][02]
[ CSSD]2009-02-03 21:52:08.749 [17] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 1
[ CSSD]2009-02-03 21:52:08.751 [13] >TRACE: clssgmPeerListener: connects done (1/1)
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmEstablishMasterNode: MASTER for 1 is node(1) birth(1)
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes
[ CSSD]CLSS-3001: local node number 1, master node number 1
[ CSSD]2009-02-03 21:52:08.753 [17] >TRACE: clssgmReconfigThread: completed for reconfig(1), with status(1)
[ CSSD]2009-02-03 21:52:08.863 [10] >TRACE: clssgmClientConnectMsg: Connect from con(80000001008fd2a0) proc(8000000100ae26a8) pid() proto(10:2:1:1)
[ CSSD]2009-02-03 21:52:08.864 [10] >TRACE: clssgmClientConnectMsg: Connect from con(8000000100ae0128) proc(8000000100ae2a10) pid() proto(10:2:1:1) from con(8000000100aa32c0) proc(8000000100aa5b90) pid() proto(10:2:1:1)
alertlog:
[cssd(2535)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 .
2009-02-03 23:55:20.821
[cssd(2575)]CRS-1605:CSSD voting file is online: /dev/rdsk/ora_voting_raw. Detai ls in /work/crs/product/10.2/crs/log/lourmel/cssd/ocssd.log.
2009-02-03 23:55:28.376
evmd.log:
Oracle Database 10g CRS Release 10.2.0.2.0 Production Copyright 1996, 2004, Oracle. All rights reserved
2009-02-04 00:08:58.331: [ EVMD][1]32EVMD waiting for CSS to be ready err = 3
2009-02-04 00:08:59.939: [ COMMCRS][9]clsc_connect: (800000010007d658) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
2009-02-04 00:08:59.946: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2009-02-04 00:08:59.948: [ EVMD][1]32EVMD waiting for CSS to be ready err = 3
2009-02-04 00:09:07.596: [ CSSCLNT][1]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
syslog:
Feb 4 00:08:41 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
Feb 4 00:08:45 lourmel sfd[2153]: starting the daemon.
Feb 4 00:08:45 lourmel su: + tty?? root-orac
Feb 4 00:08:45 lourmel krsd[2152]: Delay time is 300 seconds
Feb 4 00:08:43 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
Feb 4 00:08:52 lourmel above message repeats 2 times
Feb 4 00:08:52 lourmel syslog: Cluster Ready Services completed waiting on dependencies.
Feb 4 00:08:53 lourmel syslog: Running CRSD with TZ =
When I ran crs_stat (before the restart) I got the message:
CRS-0184: Cannot communicate with CRS
crsctl check crs gives us:
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
As I said before, the machine keeps restarting.
Does anyone have an idea? Please.
Dear All,
I recently upgraded a few RAC setups to Oracle 10g Patchset 3 (10.2.0.4) on Linux servers.
In one of the RAC setups, I found the servers were rebooting daily. The same setup had been working fine, and the problem started only after applying the patchset. I checked all the logs and found nothing relevant.
Then I checked what had been added with this patchset.
The most interesting finding: Oracle added a new daemon, oprocd.
# ps -efl | grep oprocd
4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -hsi 5:10:50:75:90 -f
Interesting points about the line above:
1. The process runs as the root user
2. It runs with the highest priority (-40)
3. It probes every second (-t 1000)
4. It waits 500 milliseconds for a CPU response (-m 500 means the margin time is 500 milliseconds)
5. It runs in fatal mode (-f)
From these points I conclude: this daemon probes the CPU every second and waits up to 500 milliseconds for a response. If it gets no response from the CPU within 500 milliseconds, it assumes the CPU is hung and reboots the machine. The operating system does not get enough time to write the system logs before the server reboots.
So the solution is to increase the margin time from 500 milliseconds to 10 seconds.
These are the steps to increase the margin time.
Please remember: the modification needs downtime, and you need to stop the cluster services on all member nodes.
1. Stop the CRS process
#crsctl stop crs
#<CRS_HOME>/bin/oprocd stop
2. Ensure that Clusterware stack is down and not running
#ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
This should return no processes.
3. From one node of the cluster, change the value of the "diagwait" parameter to 13 by issuing the command as root:
#crsctl set css diagwait 13 -force
4. Check if diagwait is successfully set.
#crsctl get css diagwait
5. Restart the Oracle Clusterware on all the nodes by executing:
#crsctl start crs
(Note- If facing any problem to restarting the CRS services, ASM and Database, You can reboot the Nodes.The Cluster and Database will come automatically due to init startup scripts.)
6. The oprocd daemon process will show with -m 10000
# ps -efl| grep oprocd
# 4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f
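One practical detail about step 2: the egrep command itself can show up in its own ps listing, so "no processes" really means "no processes other than the search". A rough sketch of that check follows. The helper name and the sample ps output are invented for illustration; only the process names come from the step above.

```python
import re

# Process names taken from step 2 of the procedure.
STACK_PATTERN = re.compile(r"crsd\.bin|ocssd\.bin|evmd\.bin|oprocd")


def running_stack_processes(ps_output):
    """Return the ps lines that belong to the Clusterware stack,
    skipping the grep/egrep process that performs the search."""
    hits = []
    for line in ps_output.splitlines():
        if not STACK_PATTERN.search(line):
            continue
        if re.search(r"\be?grep\b", line):
            continue  # the search command matching its own pattern
        hits.append(line)
    return hits


# Hypothetical ps -ef output, for illustration only.
sample = """\
root  6440  6063  0 Mar03 ?     00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -f
root  7001  6990  0 10:15 pts/0 00:00:00 egrep crsd.bin|ocssd.bin|evmd.bin|oprocd
"""
still_running = running_stack_processes(sample)
print(len(still_running), "stack process(es) still running")
```

An empty result is the condition to wait for before changing diagwait.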
Rollback procedure:
If you need to undo the change for any reason:
# crsctl unset css diagwait
I am confident that this workaround will solve the abnormal RAC node restart problem.
Regards,
Sumit
Bangalore,India -
MSExchangeSA and IS are not starting automatically at reboot after upgrading the server from SP1 to SP2
I am running a 2-node DAG with separate Hub and CAS servers.
I have already upgraded the Hub and CAS servers, and they are fine.
Exchange is running on Windows 2008 R2 SP1.
All servers are on VMware ESXi 5.0.
After I upgraded the mailbox server to SP2, the SA and IS services do not start automatically after a reboot.
If I run the "net time /set" command and then restart the AD Topology service, the IS and SA start fine.
But after the next reboot they again do not start.
I also get the event below in the application log when the IS and SA fail to start:
Log Name: Application
Source: MSExchangeIS
Date: 05/22/2012 5:15:03 AM
Event ID: 5003
Task Category: General
Level: Error
Keywords: Classic
User: N/A
Computer: server.domain.com
Description:
Unable to initialize the Information Store service because the clocks on the client and server are skewed. This may be caused by a time change either on the client or on the server, and may require a restart of that computer. Verify that your domain is correctly configured and is currently online.
When I check the registry for the time source, the server is taking its time from the DC.
I manually compared the time between the DC and Exchange, and it matches.
Since the Exchange mailbox server is virtual, I checked the time sync option in VMware Tools, and it is unchecked.
I checked for group policies and did not find any specific to time sync.
I confirmed that the Windows Time service is running.
Suspecting an operating system issue, I tried upgrading the other DAG node.
That node shows the same symptoms and issues.
I tried installing RU1 on both nodes, and that did not fix the issue either.
I checked with a VMware engineer, who said that everything on the VMware side is fine.
Has anyone come across this issue before?
Exchange 2010: Unable to initialize the Information Store
Problem:
Unable to initialize the Information Store service because the clocks on the client and server are skewed. This may be caused by a time change either in the client or the server, and may require a reboot of that computer. Verify that your domain is properly configured and is currently online.
Solution:
Open a command prompt on the Exchange server as administrator (right-click the Command Prompt shortcut and select “Run as Administrator”).
Run this command:
Net time \\ADServerName /Set
Restart the Microsoft Exchange Active Directory Topology service.
*Important! You have to restart the Microsoft Exchange Active Directory Topology service or this fix will not work. I tried just starting the Information Store service after running the command, and it still failed.
This worked for me, and all is well on my Exchange server.
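For anyone who wants to put a number on "skewed" before and after running the command, the comparison is plain timestamp arithmetic. The sketch below is an assumption-heavy illustration: the 5-minute tolerance is the default Kerberos clock-skew limit in an Active Directory domain, and whether that is the exact threshold behind event 5003 is not confirmed by the posts here. The sample readings are invented.

```python
from datetime import datetime, timedelta

# Assumption: default Kerberos clock-skew tolerance in an AD domain.
# The threshold that actually triggers event 5003 is not stated in this thread.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(dc_time, server_time):
    """Absolute difference between the DC clock and the server clock."""
    return abs(server_time - dc_time)


def is_skewed(dc_time, server_time, tolerance=MAX_SKEW):
    """True when the two clocks differ by more than the tolerance."""
    return clock_skew(dc_time, server_time) > tolerance


# Hypothetical readings taken at the same moment on both machines.
dc = datetime(2012, 5, 22, 5, 15, 3)
exchange = datetime(2012, 5, 22, 5, 22, 10)
print(clock_skew(dc, exchange), is_skewed(dc, exchange))
```

Reading both clocks right after boot, before the services try to start, is the interesting moment, since a manual comparison later in the day can look fine.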
- See more at: http://www.mountainvistatech.com/2012/06/04/exchange-2010-unable-to-initialize-the-information-store/#sthash.SwQpgQEM.dpuf -
Windows listener and service do not start automatically even though set to Automatic
Hi
When I restart the database server, neither the listener nor the database service starts automatically.
Both are set to start automatically.
This was working before, and the problem appeared all of a sudden.
I don't remember changing any setting on the database side, but I am not sure whether anything was changed on the Windows server side.
The error we received:
============
The OracleServiceGEOOAP service failed to start due to the following error: %%1053 Service error on GEPGPO01-V
The OracleOraDb11g_home1TNSListener service failed to start due to the following error: %%1053 Service error on GEPGPO01-V
alert log
=======
Fatal NI connect error 12638, connecting to:
(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
VERSION INFORMATION:
TNS for 32-bit Windows: Version 11.1.0.7.0 - Production
Oracle Bequeath NT Protocol Adapter for 32-bit Windows: Version 11.1.0.7.0 - Production
Time: 11-MAR-2011 15:44:26
Tracing not turned on.
Tns error struct:
ns main err code: 12638
TNS-12638: Credential retrieval failed
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
listener.log
========
Fri Mar 11 15:44:23 2011
11-MAR-2011 15:44:23 * service_update * geop * 0
11-MAR-2011 15:44:29 * service_died * geooap * 12547
TNS-12547: TNS:lost contact
11-MAR-2011 15:44:30 * service_died * geop * 12547
TNS-12547: TNS:lost contact
Fri Mar 11 15:44:37 2011
11-MAR-2011 15:44:37 * (CONNECT_DATA=(SID=GEOOAP)(CID=(PROGRAM=C:\Program Files\Quest Software\Toad for Oracle\toad.exe)(HOST=D00661)(USER=app_nathv))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.4.136.230)(PORT=65288)) * establish * GEOOAP * 12505
TNS-12505: TNS:listener does not currently know of SID given in connect descriptor
11-MAR-2011 15:44:41 * (CONNECT_DATA=(SID=GEOOAP)(CID=(PROGRAM=C:\Program Files\Quest Software\Toad for Oracle\toad.exe)(HOST=D00661)(USER=app_nathv))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.4.136.230)(PORT=65289)) * establish * GEOOAP * 12505
TNS-12505: TNS:listener does not currently know of SID given in connect descriptor
11-MAR-2011 15:44:43 * (CONNECT_DATA=(SID=GEOOAP)(CID=(PROGRAM=C:\Program Files\Quest Software\Toad for Oracle\toad.exe)(HOST=D00661)(USER=app_nathv))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.4.136.230)(PORT=65290)) * establish * GEOOAP * 12505
TNS-12505: TNS:listener does not currently know of SID given in connect descriptor
Fri Mar 11 15:53:46 2011
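The listener.log entries above can be tallied by event and return code to see at a glance which failures dominate. This is a small sketch that assumes every event line uses " * " as the field separator with the return code as the last field, as in the excerpt; the function name is hypothetical.

```python
from collections import Counter


def tally_listener_codes(log_text):
    """Count (event, return_code) pairs in listener.log text.
    Lines without the ' * ' separated layout (timestamps, TNS- messages)
    are ignored."""
    counts = Counter()
    for line in log_text.splitlines():
        fields = [f.strip() for f in line.split(" * ")]
        # Event lines end with: event * service-or-SID * return code
        if len(fields) < 4 or not fields[-1].isdigit():
            continue
        counts[(fields[-3], int(fields[-1]))] += 1
    return counts


# Raw string because the log contains Windows paths with backslashes.
sample = r"""
11-MAR-2011 15:44:23 * service_update * geop * 0
11-MAR-2011 15:44:29 * service_died * geooap * 12547
TNS-12547: TNS:lost contact
11-MAR-2011 15:44:30 * service_died * geop * 12547
TNS-12547: TNS:lost contact
11-MAR-2011 15:44:37 * (CONNECT_DATA=(SID=GEOOAP)(CID=(PROGRAM=C:\Program Files\Quest Software\Toad for Oracle\toad.exe)(HOST=D00661)(USER=app_nathv))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.4.136.230)(PORT=65288)) * establish * GEOOAP * 12505
"""
counts = tally_listener_codes(sample)
for (event, code), n in sorted(counts.items()):
    print(event, code, n)
```

In this excerpt the 12505 connection failures only appear after the two service_died events, which fits the instance being down rather than a listener fault.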
So now we have to manually restart the listener and the service for the database to open. Could you point out which areas we need to check to solve this issue?
Thanks.
Hi Krithi,
Can you confirm my understanding here:
* The automatic startup of the listener and the Oracle instance has been failing for a short time.
* You can manually start the listener and database instance without any problem, and then everything runs as expected.
So far so good?
What is your procedure to start the listener and DB instances?
Are you performing the startups with your own user account, with SYSTEM, or with a dedicated Oracle account?
Are the Windows services running under SYSTEM, a dedicated Oracle user, or something else?
Could it be that the system administrator installed something (like a new Oracle client) that changed some settings?
Can you compare the environment variables (ORACLE_HOME, ORACLE_SID, PATH) between the account you use for manual startup and the account used to (auto)start the services?
Maybe you can plan a reboot after you have set the problematic services to run under your profile?
Well, these are a few points to investigate if not yet considered.
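The environment comparison suggested above can be done mechanically once the variables of both accounts have been dumped. The following is a minimal sketch: how the two dictionaries get collected is left open, and every value shown is invented for illustration.

```python
def diff_environment(manual_env, service_env,
                     keys=("ORACLE_HOME", "ORACLE_SID", "PATH")):
    """Return {name: (manual_value, service_value)} for each variable
    that differs between the two accounts. A missing variable shows as None."""
    differences = {}
    for key in keys:
        manual_value = manual_env.get(key)
        service_value = service_env.get(key)
        if manual_value != service_value:
            differences[key] = (manual_value, service_value)
    return differences


# Hypothetical values, not taken from the poster's system.
manual = {
    "ORACLE_HOME": r"C:\app\oracle\product\11.1.0\db_1",
    "ORACLE_SID": "GEOOAP",
    "PATH": r"C:\app\oracle\product\11.1.0\db_1\bin;C:\Windows\system32",
}
service = {
    "ORACLE_HOME": r"C:\app\oracle\product\11.1.0\client_1",
    "PATH": r"C:\app\oracle\product\11.1.0\client_1\bin;C:\Windows\system32",
}
report = diff_environment(manual, service)
for name, (mine, theirs) in report.items():
    print(name, "| manual:", mine, "| service:", theirs)
```

Any variable that shows up in the report is a candidate explanation for why manual startup works while automatic startup does not.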
HTH,
Thierry -
Hello everyone,
I have run into a problem: our RAC node restarts automatically with the messages below.
#/u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/alert_odsdb1.log
Fri Jun 07 12:23:42 2013
Thread 1 cannot allocate new log, sequence 58363
Checkpoint not complete
Current log# 2 seq# 58362 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
Current log# 2 seq# 58362 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
Fri Jun 07 12:23:42 2013
NOTE: ASMB terminating
Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
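ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2047 Serial number: 5
Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2047 Serial number: 5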
ASMB (ospid: 32641): terminating the instance due to error 15064
Fri Jun 07 12:23:44 2013
ORA-1092 : opitsk aborting process
Fri Jun 07 12:23:46 2013
ORA-1092 : opitsk aborting process
Instance terminated by ASMB, pid = 32641
Fri Jun 07 12:25:02 2013
Starting ORACLE instance (normal)
Fri Jun 07 12:25:23 2013
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
[name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth0' configured from GPnP for use as a public interface.
[name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
Public Interface 'eth0:1' configured from GPnP for use as a public interface.
[name='eth0:1', type=1, ip=135.33.2.13, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/11.2.0/dbhome_2/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
System name: Linux
Node name: odsdb1
Release: 2.6.18-308.el5
Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
Machine: x86_64
Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/dbhome_2/dbs/initodsdb1.ora
System parameters with non-default values:
processes = 4500
sessions = 6784
event = ""
spfile = "+DATA/odsdb/spfileodsdb.ora"
nls_language = "SIMPLIFIED CHINESE"
nls_territory = "CHINA"
memory_target = 170G
control_files = "+DATA/odsdb/controlfile/current.262.812288837"
control_files = "+DATA/odsdb/controlfile/current.261.812288837"
db_block_size = 8192
compatible = "11.2.0.0.0"
db_files = 4096
cluster_database = TRUE
db_create_file_dest = "+DATA"
db_recovery_file_dest = ""
db_recovery_file_dest_size= 38820M
thread = 1
undo_tablespace = "UNDOTBS1"
instance_number = 1
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
dispatchers = "(PROTOCOL=TCP) (SERVICE=odsdbXDB)"
remote_listener = "odsdb-cluster-scan:1521"
job_queue_processes = 1000
audit_file_dest = "/u01/app/oracle/admin/odsdb/adump"
audit_trail = "DB"
db_name = "odsdb"
open_cursors = 300
diagnostic_dest = "/u01/app/oracle"
Cluster communication is configured to use the following interface(s) for this instance
169.254.37.103
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Fri Jun 07 12:25:33 2013
PMON started with pid=2, OS id=22959
Fri Jun 07 12:25:33 2013
PSP0 started with pid=3, OS id=22962
Fri Jun 07 12:25:34 2013
VKTM started with pid=4, OS id=22971 at elevated priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Fri Jun 07 12:25:34 2013
GEN0 started with pid=5, OS id=22977
Fri Jun 07 12:25:34 2013
DIAG started with pid=6, OS id=22979
Fri Jun 07 12:25:35 2013
DBRM started with pid=7, OS id=22981
Fri Jun 07 12:25:35 2013
PING started with pid=8, OS id=22983
Fri Jun 07 12:25:35 2013
ACMS started with pid=9, OS id=22985
Fri Jun 07 12:25:35 2013
DIA0 started with pid=10, OS id=22987
Fri Jun 07 12:25:35 2013
LMON started with pid=11, OS id=22989
Fri Jun 07 12:25:35 2013
LMD0 started with pid=12, OS id=22991
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [61440 - 81920]
Fri Jun 07 12:25:35 2013
LMS0 started with pid=13, OS id=22994 at elevated priority
Fri Jun 07 12:25:35 2013
LMS1 started with pid=14, OS id=22998 at elevated priority
Fri Jun 07 12:25:35 2013
LMS2 started with pid=15, OS id=23002 at elevated priority
Fri Jun 07 12:25:35 2013
LMS3 started with pid=16, OS id=23006 at elevated priority
Fri Jun 07 12:25:35 2013
RMS0 started with pid=17, OS id=23010
Fri Jun 07 12:25:35 2013
LMHB started with pid=18, OS id=23013
Fri Jun 07 12:25:35 2013
MMAN started with pid=19, OS id=23015
Fri Jun 07 12:25:35 2013
DBW0 started with pid=20, OS id=23017
Fri Jun 07 12:25:35 2013
DBW1 started with pid=21, OS id=23019
Fri Jun 07 12:25:35 2013
DBW2 started with pid=22, OS id=23022
Fri Jun 07 12:25:35 2013
DBW3 started with pid=23, OS id=23024
Fri Jun 07 12:25:35 2013
DBW4 started with pid=24, OS id=23026
Fri Jun 07 12:25:35 2013
DBW5 started with pid=25, OS id=23028
Fri Jun 07 12:25:35 2013
DBW6 started with pid=26, OS id=23031
Fri Jun 07 12:25:35 2013
DBW7 started with pid=27, OS id=23033
Fri Jun 07 12:25:35 2013
LGWR started with pid=28, OS id=23035
Fri Jun 07 12:25:35 2013
CKPT started with pid=29, OS id=23037
Fri Jun 07 12:25:35 2013
SMON started with pid=30, OS id=23039
Fri Jun 07 12:25:35 2013
RECO started with pid=31, OS id=23041
Fri Jun 07 12:25:35 2013
RBAL started with pid=32, OS id=23043
Fri Jun 07 12:25:35 2013
ASMB started with pid=33, OS id=23045
Fri Jun 07 12:25:35 2013
MMON started with pid=34, OS id=23048
Fri Jun 07 12:25:35 2013
MMNL started with pid=35, OS id=23052
Fri Jun 07 12:25:35 2013
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
NOTE: initiating MARK startup
starting up 1 shared server(s) ...
Starting background process MARK
Fri Jun 07 12:25:35 2013
MARK started with pid=37, OS id=23056
NOTE: MARK has subscribed
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 119)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid according to instance 2
* domain 0 valid = 1 according to instance 2
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration started (old inc 119, new inc 121)
List of instances:
1 2 (myinst: 1)
Nested reconfiguration detected.
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Fri Jun 07 12:25:45 2013
Submitted all GCS remote-cache requests
Fri Jun 07 12:26:08 2013
Fix write in gcs resources
Reconfiguration complete
Fri Jun 07 12:26:10 2013
LCK0 started with pid=40, OS id=23632
Fri Jun 07 12:26:10 2013
Starting background process RSMN
Fri Jun 07 12:26:10 2013
RSMN started with pid=41, OS id=23646
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /u01/app/oracle
Fri Jun 07 12:26:11 2013
ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=135.33.2.13)(PORT=1521))))' SCOPE=MEMORY SID='odsdb1';
ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
Fri Jun 07 12:26:11 2013
NOTE: Loaded library: System
Fri Jun 07 12:26:11 2013
SUCCESS: diskgroup DATA was mounted
Fri Jun 07 12:26:11 2013
NOTE: dependency between database odsdb and diskgroup resource ora.DATA.dg is established
Fri Jun 07 12:26:16 2013
Successful mount of redo thread 1, with mount id 3452000551
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
Picked broadcast on commit scheme to generate SCNs
Thread 1 advanced to log sequence 58364 (thread open)
Thread 1 opened at log sequence 58364
Current log# 2 seq# 58364 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
Current log# 2 seq# 58364 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Jun 07 12:26:21 2013
SMON: enabling cache recovery
Fri Jun 07 12:26:23 2013
minact-scn: Inst 1 is a slave inc#:121 mmon proc-id:23048 status:0x2
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Fri Jun 07 12:26:34 2013
[23651] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:2061372614 end:2061384964 diff:12350 (123 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
Fri Jun 07 12:26:34 2013
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Starting background process GTX0
Fri Jun 07 12:26:35 2013
GTX0 started with pid=45, OS id=23931
Starting background process RCBG
Fri Jun 07 12:26:35 2013
RCBG started with pid=46, OS id=23933
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Fri Jun 07 12:26:35 2013
QMNC started with pid=48, OS id=23940
Completed: ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
Fri Jun 07 12:26:38 2013
Starting background process CJQ0
Fri Jun 07 12:26:38 2013
CJQ0 started with pid=55, OS id=23977
Fri Jun 07 12:27:56 2013
Thread 1 advanced to log sequence 58365 (LGWR switch)
Current log# 1 seq# 58365 mem# 0: +DATA/odsdb/onlinelog/group_1.263.812288839
Current log# 1 seq# 58365 mem# 1: +DATA/odsdb/onlinelog/group_1.264.812288839
Fri Jun 07 12:28:18 2013
Starting background process SMCO
Fri Jun 07 12:28:18 2013
SMCO started with pid=70, OS id=25166
Fri Jun 07 12:29:01 2013
Thread 1 cannot allocate new log, sequence 58366
Trace file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
System name: Linux
Node name: odsdb1
Release: 2.6.18-308.el5
Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
Machine: x86_64
Instance name: odsdb1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 33
Unix process pid: 32641, image: oracle@odsdb1 (ASMB)
*** 2013-05-14 15:37:08.705
*** SESSION ID:(3499.1) 2013-05-14 15:37:08.705
*** CLIENT ID:() 2013-05-14 15:37:08.705
*** SERVICE NAME:() 2013-05-14 15:37:08.705
*** MODULE NAME:() 2013-05-14 15:37:08.705
*** ACTION NAME:() 2013-05-14 15:37:08.705
NOTE: initiating MARK startup
*** 2013-05-14 15:37:16.835
instance health monitoring reports instance shutting down
*** 2013-06-07 12:23:42.700
NOTE: ASMB terminating
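ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2047 Serial number: 5
error 15064 detected in background process
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2047 Serial number: 5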
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53<-ksuitm()+1332<-ksbrdp()+3344<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----
*** 2013-06-07 12:23:42.783
ASMB (ospid: 32641): terminating the instance due to error 15064
/u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
NOTE: ASMB process exiting, either shutdown is in progress
NOTE: or foreground connected to ASMB was killed.
Fri Jun 07 12:23:42 2013
NOTE: client exited [14808]
Fri Jun 07 12:23:44 2013
Received an instance abort message from instance 2
Please check instance 2 alert and LMON trace files for detail.
Fri Jun 07 12:23:44 2013
Received an instance abort message from instance 2
Please check instance 2 alert and LMON trace files for detail.
LMD0 (ospid: 31201): terminating the instance due to error 481
Instance terminated by LMD0, pid = 31201
Fri Jun 07 12:24:30 2013
* instance_number obtained from CSS = 1, checking for the existence of node 0...
* node 0 does not exist. instance_number = 1
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
[name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth0' configured from GPnP for use as a public interface.
[name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/11.2.0.2/grid/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
[grid@odsdb1 cssd]$ file core.30481
core.30481: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'ocssd.bin'
[grid@odsdb1 cssd]$ gdb
gdb gdbserver gdbtui
[grid@odsdb1 cssd]$ gdb ocssd.bin core.30481
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /u01/app/11.2.0.2/grid/bin/ocssd.bin...(no debugging symbols found)...done.
[New Thread 30486]
[New Thread 30530]
[New Thread 30526]
[New Thread 30525]
[New Thread 30523]
[New Thread 30522]
[New Thread 30521]
[New Thread 30520]
[New Thread 30519]
[New Thread 30504]
[New Thread 30503]
[New Thread 30495]
[New Thread 30485]
[New Thread 30484]
[New Thread 30483]
[New Thread 30481]
Reading symbols from /u01/app/11.2.0.2/grid/lib/libhasgen11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libhasgen11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libocr11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocr11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrb11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrb11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrutl11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrutl11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1
Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxn2.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxn2.so
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libcell11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libcell11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxp11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxp11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libnnz11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnnz11.so
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /u01/app/11.2.0.2/grid/lib/libnque11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnque11.so
Reading symbols from /opt/oracle/extapi/64/asm/orcl/1/libasm.so...(no debugging symbols found)...done.
Loaded symbols for /opt/oracle/extapi/64/asm/orcl/1/libasm.so
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff505fd000
Core was generated by `/u01/app/11.2.0.2/grid/bin/ocssd.bin '.
Program terminated with signal 6, Aborted.
#0 0x000000369ea30265 in raise () from /lib64/libc.so.6
(gdb) where
#0 0x000000369ea30265 in raise () from /lib64/libc.so.6
#1 0x000000369ea31d10 in abort () from /lib64/libc.so.6
#2 0x00002afc67f9aeda in scls_abort (flags=0) at scls.c:7088
#3 0x000000000040babd in clssscExit (thrd=0x10d325a0, status=clssscreasonSHUTNORM) at clsssc.c:2155
#4 0x0000000000446221 in clssgmClientShutdown (thrd=0x10d325a0, cmInfo=0x10b40090) at clssgmc.c:6415
#5 0x0000000000436707 in clssgmProcClientReqs (thrd=0x10d325a0, clctx=0x10b40630) at clssgmc.c:704
#6 0x0000000000436405 in clssgmclientlsnr (thrd=0x10d325a0) at clssgmc.c:644
#7 0x000000000040ac2f in clssscthrdmain (thrd=0x10d325a0) at clsssc.c:1716
#8 0x000000369fa0677d in start_thread () from /lib64/libpthread.so.0
#9 0x000000369ead49ad in clone () from /lib64/libc.so.6
(gdb)
2013-06-07 12:19:37.377: [ CSSD][1085888832]clssscSelect: cookie accept request 0x10b40630
2013-06-07 12:19:37.377: [ CSSD][1085888832]clssgmAllocProc: (0x2aaab0133ea0) allocated
2013-06-07 12:19:37.379: [ CSSD][1085888832]clssgmClientConnectMsg: properties of cmProc 0x2aaab0133ea0 - 1,2,3,4,5
2013-06-07 12:19:37.379: [ CSSD][1085888832]clssgmClientConnectMsg: Connect from con(0x6ae44fa) proc(0x2aaab0133ea0) pid(14139/14139) version 11:2:1:4, properties: 1,2,3,4,5
2013-06-07 12:19:37.379: [ CSSD][1085888832]clssgmClientConnectMsg: msg flags 0x0000
2013-06-07 12:19:37.384: [ CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
2013-06-07 12:19:37.384: [ CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
2013-06-07 12:19:37.384: [ CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(1/0x2aaab010c5c0)
2013-06-07 12:19:37.385: [ CSSD][1085888832]clssgmRegisterShared: grp DBODSDB, mbr 0, type 1
2013-06-07 12:19:37.385: [ CSSD][1085888832]clssgmQueueShare: (0x2aaab0085790) target global grock DBODSDB member 0 type 1 queued from client (0x2aaab010c5c0), global grock DBODSDB, refcount 23
2013-06-07 12:19:37.385: [ CSSD][1085888832]clssgmRegisterShared: global grock DBODSDB member 0 share type 1, refcount 23
2013-06-07 12:19:37.391: [ CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
2013-06-07 12:19:37.391: [ CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
2013-06-07 12:19:37.391: [ CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(2/0x2aaab0061f10)
What is the problem?
Edited by: 徐振富 on 2013-6-7 6:38 PM
Edited by: 徐振富 on 2013-6-7 6:45 PM
Is your ASM instance up?
If not, try bringing up the ASM instance by itself and see whether it throws any error.
Post the output of crsctl status cluster -all -
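When working through alert logs as long as the ones in the question, it can help to first pull out only the lines that say which background process terminated an instance and with which error. Below is a small sketch, assuming the message layout seen in the excerpts ("<process> (ospid: N): terminating the instance due to error M"); the function name is hypothetical.

```python
import re

TERMINATION = re.compile(
    r"^(?P<process>\w+) \(ospid: (?P<ospid>\d+)\): "
    r"terminating the instance due to error (?P<error>\d+)"
)


def find_terminations(alert_text):
    """Return (process, ospid, error) tuples for every
    'terminating the instance' line in alert log text."""
    found = []
    for line in alert_text.splitlines():
        match = TERMINATION.match(line.strip())
        if match:
            found.append((match.group("process"),
                          int(match.group("ospid")),
                          int(match.group("error"))))
    return found


# Lines copied from the database and ASM alert logs in the question.
sample = """\
Fri Jun 07 12:23:42 2013
NOTE: ASMB terminating
ASMB (ospid: 32641): terminating the instance due to error 15064
Instance terminated by ASMB, pid = 32641
LMD0 (ospid: 31201): terminating the instance due to error 481
"""
events = find_terminations(sample)
print(events)
```

Here it yields ASMB with error 15064 on the database side and LMD0 with error 481 on the ASM side, which is a compact starting point for the check of instance 2 that the ASM log asks for.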
Errors found in the CSSD logs of a RAC node
We found the errors below in the CSSD logs on one of the RAC nodes from 5:15 to 5:18 PM; after that the errors disappeared. Does anyone have an idea what could cause them?
Also, we didn't find any errors in the alert log at that time.
[ CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization failed (112bd2a70), timed out, start 17:13:51.041, duration 120009
[ CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization prepare time: 2 ms
[ CSSD]2009-07-19 17:15:51.233 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b67930) proc(112b680b0) pid(1049540) proto(10:2:1:1)
[ CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization failed (112bd4a10), timed out, start 17:13:51.268, duration 120003
[ CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization prepare time: 3 ms
[ CSSD]2009-07-19 17:15:52.544 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b67930) proc(112b680b0) pid(786918) proto(10:2:1:1)
[ CSSD]2009-07-19 17:15:53.297 [3600] >TRACE: Authorization failed (112c38af0), timed out, start 17:13:53.290, duration 120009
[ CSSD]2009-07-19 17:15:53.297 [3600] >TRACE: Authorization prepare time: 3 ms
[ CSSD]2009-07-19 17:15:53.317 [3600] >TRACE: Authorization failed (112d356f0), timed out, start 17:13:53.320, duration 120000
[ CSSD]2009-07-19 17:15:53.317 [3600] >TRACE: Authorization prepare time: 2 ms
[ CSSD]2009-07-19 17:16:02.342 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b932b0) proc(112b67d10) pid(1336252) proto(10:2:1:1)
[ CSSD]2009-07-19 17:16:02.977 [3600] >TRACE: Authorization failed (112d04f70), timed out, start 17:14:02.978, duration 120001
[ CSSD]2009-07-19 17:16:02.977 [3600] >TRACE: Authorization prepare time: 2 ms
[ CSSD]2009-07-19 17:16:03.007 [3600] >TRACE: Authorization failed (112d38210), timed out, start 17:14:03.006, duration 120002
[ CSSD]2009-07-19 17:16:03.007 [3600] >TRACE: Authorization prepare time: 2 ms
[ CSSD]2009-07-19 17:16:10.447 [3600] >TRACE: Authorization failed (112bd7e30), timed out, start 17:14:10.441, duration 120007
[ CSSD]2009-07-19 17:16:10.447 [3600] >TRACE: Authorization prepare time: 2 ms
[ CSSD]2009-07-19 17:16:10.847 [3600] >TRACE: Authorization failed (112d3ee70), timed out, start 17:14:10.840, duration 120008
[ CSSD]2009-07-19 17:16:10.847 [3600] >TRACE: Authorization prepare time: 2 ms
Thanks,
Mahi
Check the Metalink note:
6996694-OCSSD.BIN CONSUMING 100% CPU AND ASM/DB HANGING
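As a side check on the log itself, the "Authorization failed ... timed out" lines can be summarized to see whether every failure hit the same timeout. This is a rough sketch: the regular expression assumes the exact line layout pasted in the question, the duration unit is presumably milliseconds (the start and log timestamps are about 120 seconds apart), and the function name is made up.

```python
import re

AUTH_TIMEOUT = re.compile(
    r"(?P<logged>\d{2}:\d{2}:\d{2}\.\d{3}) \[\d+\] >TRACE:\s+"
    r"Authorization failed \((?P<handle>\w+)\), timed out, "
    r"start (?P<start>\d{2}:\d{2}:\d{2}\.\d{3}), duration (?P<duration>\d+)"
)


def auth_timeout_durations(cssd_text):
    """Return the duration value of every authorization timeout line."""
    durations = []
    for line in cssd_text.splitlines():
        match = AUTH_TIMEOUT.search(line)
        if match:
            durations.append(int(match.group("duration")))
    return durations


# Lines copied from the CSSD log in the question.
sample = """\
[ CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization failed (112bd2a70), timed out, start 17:13:51.041, duration 120009
[ CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization prepare time: 2 ms
[ CSSD]2009-07-19 17:15:51.233 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b67930) proc(112b680b0) pid(1049540) proto(10:2:1:1)
[ CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization failed (112bd4a10), timed out, start 17:13:51.268, duration 120003
"""
durations = auth_timeout_durations(sample)
print(len(durations), min(durations), max(durations))
```

All failures in the posted excerpt report a duration within a few units of 120000, so they look like clients hitting one fixed timeout rather than a spread of unrelated delays.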