RAC node setup
Hi,
Can I use LVM on the nodes?
I am implementing a 2-node RAC setup using Openfiler 2.3.
On nodes 1 & 2 I have three SCSI hard disks of 70 GB each. Can I use LVM to combine the three hard disks and use them for a RAC environment?
charan.
Hi,
UPDATE ---
I am using 3 Dell servers for the RAC configuration.
On server 1, I am installing Openfiler 2.3.
On servers 2 & 3, I am installing Oracle Enterprise Linux 4 Update 4 for the nodes.
Server 1 : Dell PowerEdge SC430 ( two SCSI HDDs of 160 GB each ) with 2 NICs - eth0 & eth1
Server 2 : Dell PowerEdge 600SC ( three SCSI HDDs of 70 GB each ) with 2 NICs - eth0 & eth1
Server 3 : Dell PowerEdge SC430 ( two SCSI HDDs of 160 GB each ) with 2 NICs - eth0 & eth1
For storage, I am using the SCSI HDDs in server 1; there is no external storage.
I am going to use 10gR2.
Question 1: Do servers 2 & 3 need to have the same hardware configuration, chip model and so on,
( or )
can servers 2 & 3 have different hardware architectures, with different SCSI hard disk sizes?
regards,
Charan
Similar Messages
-
Exception while failing over to 2nd RAC Node
We are using WebLogic 10.3.4. Our setup is a Web application (a Tapestry front-end Web UI) with an EJB 2.1 back end talking to the Oracle database. The EJBs are CMP. Our product was always stand-alone, and it wasn't until this release that we needed to make it work with RAC. To get this to work we followed the model of having a multi data source with data sources pointing to our RAC nodes. We use two types of data sources, persistent and non-persistent, and we are using the Oracle thin driver (non-XA) for RAC Service Instances, supporting global transactions.
When we fail over to the 2nd node we get a nasty exception in our GUI, but after logging out and logging back in we are fine.
My question: I assumed I shouldn't have to restart our Web application and that it should have stayed up. Or is there something wrong with our setup?
Thanks,
Ian
Showing us the exception and/or the error messages at the server might help...
Note that failing over does not save any ongoing connection or transaction that
had been to the dead RAC node... Does your web-app get-use-close JDBC
connections on a per-user-invoke basis, or does it hold onto connections?
Joe -
Multiple Standby Databases on same RAC nodes.
We have a 3 node Oracle 10gR2 RAC production environment on site A and a 3 node Oracle 10g RAC standby environment on site B. Both use like HW and OS - HP BL45p with RHEL AS 4.x.
Can we have heterogeneous standby databases(logical and physical) running on the same RAC nodes?
Can the two apply processes (MRP, the managed recovery process, and LSP, the logical standby process) coexist on the same set of nodes in a cluster at the same time? Are there any conflicts or limitations?
Is there any documentation that supports this?
Would Active Data Guard give you the best of both worlds?
The caveat might be your SID_LIST_LISTENER setup.
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0)
      (PROGRAM = extproc)
    )
    (SID_DESC =
      (global_dbname = <database1>_DGMGRL.yourdomain)
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0)
      (sid_name = <database1>)
    )
    (SID_DESC =
      (global_dbname = <database2>_DGMGRL.yourdomain)
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0)
      (sid_name = <database2>)
    )
  )
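With many standbys on one server, the static listener entries follow the same pattern for every database, so they can be generated rather than typed by hand. The following is only a sketch in Python, not something from the original reply: the function names `sid_desc` and `sid_list_listener` and the database names `stby1` and `stby2` are hypothetical, and the Oracle home and domain are simply the placeholders used above.

```python
def sid_desc(db_name: str, domain: str, oracle_home: str) -> str:
    """Render one static SID_DESC entry in the style shown above."""
    return (
        "    (SID_DESC =\n"
        f"      (GLOBAL_DBNAME = {db_name}_DGMGRL.{domain})\n"
        f"      (ORACLE_HOME = {oracle_home})\n"
        f"      (SID_NAME = {db_name})\n"
        "    )\n"
    )


def sid_list_listener(db_names, domain: str, oracle_home: str) -> str:
    """Render a SID_LIST_LISTENER block with one SID_DESC per database."""
    body = "".join(sid_desc(name, domain, oracle_home) for name in db_names)
    return "SID_LIST_LISTENER =\n  (SID_LIST =\n" + body + "  )\n"


if __name__ == "__main__":
    # Hypothetical standby names; replace with the real SIDs.
    print(sid_list_listener(["stby1", "stby2"],
                            "yourdomain",
                            "/u01/app/oracle/product/11.2.0"))
```

Generating the block keeps every opening parenthesis paired with a closing one, which is easy to get wrong when copying entries by hand.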
I have a server with ten standbys on it currently and no issues.
As long as the version is the same you should be good. -
Running Oracle database 10g and 11g on same 5 RAC nodes
Hello Gurus,
Could anybody shed some light on whether I can install and successfully run Oracle Database 10g and 11g on the same Oracle RAC installation? My setup is below:
Number of nodes-5
OS- windows 2003 or RHEL5
storage- DELL EMC SAN
Clusterware- oracle version11g
File system-Automatic storage management(ASM)
After I successfully set up the clusterware and ASM on the nodes, I want to install the 11g database on all 5 nodes,
then install the 10g database on only 3 of the nodes using the same clusterware.
What are your views on this?
Also, FYI: as per MetaLink note 220970.1 (RAC: Frequently Asked Questions), one can do such a setup.
What I am looking for is practical experience: has anyone implemented this in a production system and, if so, what issues did you face and how hard is it to support?
Thanks,
Imtiyaz
You could run an 11g database and a 10g database on the same cluster as long as you use Clusterware 11g.
The administration aspect will drastically change according to the platform you run on. As of now, it appears you don't know whether it will be Linux or Windows.
It would be practical to support the same database release. -
Pointing existing RAC nodes to a fresh Shared Storage discarding old one
Hi,
I have a RAC Setup with the Primary Database on Oracle 10gR2.
For this setup, there is a Physical Standby Database Setup (using DataGuard configuration) also with 30min delay.
Assume that the "Shared Storage" of the Primary DB fails completely.
In the above scenario, my plan is to populate a fresh shared storage device from the physical standby database and then point the RAC nodes to the new shared storage.
Is this possible?
Simply put, how can I refresh the Primary database using the Standby Database?
Please help with the utilities (RMAN, Data Guard, other non-Oracle products etc.) that can be used to do this.
Regards
Is the following shared device configuration fine for 10g RAC on Windows 2003?
• 1 SCSI drive
• Two PCI network adapters on each node in the cluster.
• Storage cables to attach the shared storage device to all computers.
regard. -
hi
One of our RAC environments keeps restarting.
I've disabled init.cssd, init.crs and init.evmd in /etc/inittab in order to check the logs.
this is the situation:
crsd.log:
2009-02-04 00:09:00.118: [ COMMCRS][9]clsc_connect: (8000000100318640) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
2009-02-04 00:09:00.132: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2009-02-04 00:09:00.134: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2009-02-04 00:09:08.016: [ CRSD][1]32Daemon Version: 10.2.0.2.0 Active Version: 10.2.0.2.0
2009-02-04 00:09:08.016: [ CRSD][1]32Active Version and Software Version are same
2009-02-04 00:09:08.017: [ CRSMAIN][1]32Initializing OCR
2009-02-04 00:09:08.037: [ OCRRAW][1]proprioo: for disk 0 (/dev/rdsk/ora_ocr_raw), id match (1), my id set (752560621,1028247821) total id sets (1), 1st set
(752560621,1028247821), 2nd set (0,0) my votes (2), total votes (2)
2009-02-04 00:09:08.140: [ CSSCLNT][24]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
ocssd.log:
[ CSSD]2009-02-03 21:52:08.651 [9] >USER: clssnmHandleUpdate: NODE 1 (node1l) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2009-02-03 21:52:08.651 [9] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmDoSyncUpdate: Sync Complete!
[ CSSD]2009-02-03 21:52:08.722 [1] >USER: NMEVENT_SUSPEND [00][00][00][00]
[ CSSD]2009-02-03 21:52:08.724 [17] >TRACE: clssgmReconfigThread: started for reconfig (1)
[ CSSD]2009-02-03 21:52:08.749 [17] >USER: NMEVENT_RECONFIG [00][00][00][02]
[ CSSD]2009-02-03 21:52:08.749 [17] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 1
[ CSSD]2009-02-03 21:52:08.751 [13] >TRACE: clssgmPeerListener: connects done (1/1)
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmEstablishMasterNode: MASTER for 1 is node(1) birth(1)
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes
[ CSSD]CLSS-3001: local node number 1, master node number 1
[ CSSD]2009-02-03 21:52:08.753 [17] >TRACE: clssgmReconfigThread: completed for reconfig(1), with status(1)
[ CSSD]2009-02-03 21:52:08.863 [10] >TRACE: clssgmClientConnectMsg: Connect from con(80000001008fd2a0) proc(8000000100ae26a8) pid() proto(10:2:1:1)
[ CSSD]2009-02-03 21:52:08.864 [10] >TRACE: clssgmClientConnectMsg: Connect from con(8000000100ae0128) proc(8000000100ae2a10) pid() proto(10:2:1:1) from con(8000000100aa32c0) proc(8000000100aa5b90) pid() proto(10:2:1:1)
alertlog:
[cssd(2535)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 .
2009-02-03 23:55:20.821
[cssd(2575)]CRS-1605:CSSD voting file is online: /dev/rdsk/ora_voting_raw. Details in /work/crs/product/10.2/crs/log/lourmel/cssd/ocssd.log.
2009-02-03 23:55:28.376
evmd.log:
Oracle Database 10g CRS Release 10.2.0.2.0 Production Copyright 1996, 2004, Oracle. All rights reserved
2009-02-04 00:08:58.331: [ EVMD][1]32EVMD waiting for CSS to be ready err = 3
2009-02-04 00:08:59.939: [ COMMCRS][9]clsc_connect: (800000010007d658) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
2009-02-04 00:08:59.946: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2009-02-04 00:08:59.948: [ EVMD][1]32EVMD waiting for CSS to be ready err = 3
2009-02-04 00:09:07.596: [ CSSCLNT][1]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
syslog:
Feb 4 00:08:41 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
Feb 4 00:08:45 lourmel sfd[2153]: starting the daemon.
Feb 4 00:08:45 lourmel su: + tty?? root-orac
Feb 4 00:08:45 lourmel krsd[2152]: Delay time is 300 seconds
Feb 4 00:08:43 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
Feb 4 00:08:52 lourmel above message repeats 2 times
Feb 4 00:08:52 lourmel syslog: Cluster Ready Services completed waiting on dependencies.
Feb 4 00:08:53 lourmel syslog: Running CRSD with TZ =
When I ran crs_stat (before the restart), I got the message:
ORA-0184: Cannot communicate with CRS
crsctl check crs gives us:
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
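In the logs above, both CRSD and EVMD report that they are waiting for CSS, so CSS is the daemon to look at first. As a rough sketch (Python, not part of the original post), the `crsctl check crs` output can be classified as below. The function name `unhealthy_daemons` is hypothetical, and the "appears healthy" wording is assumed to be what a healthy 10.2 stack prints for each daemon.

```python
def unhealthy_daemons(check_output: str) -> list:
    """Return the daemons (CSS, CRS, EVM) that `crsctl check crs` does not report as healthy."""
    healthy = set()
    for line in check_output.splitlines():
        words = line.split()
        # Assumption: a healthy daemon is reported as "<NAME> appears healthy".
        if len(words) == 3 and words[1:] == ["appears", "healthy"]:
            healthy.add(words[0])
    # Anything not explicitly reported healthy is treated as a problem.
    return [name for name in ("CSS", "CRS", "EVM") if name not in healthy]


if __name__ == "__main__":
    failing = (
        "Failure 1 contacting CSS daemon\n"
        "Cannot communicate with CRS\n"
        "Cannot communicate with EVM\n"
    )
    print(unhealthy_daemons(failing))
```

When CSS is in the returned list, the CRS and EVM failures are most likely consequences of it rather than separate problems.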
As I said before, the machine keeps restarting.
Does anyone have an idea? Please.
Dear All,
I recently upgraded a few RAC setups to Oracle 10g Patchset 3 (10.2.0.4) on Linux servers.
In one of the RAC setups, I found the servers rebooting daily. The same setup had been working fine, and the problem started only after applying the patchset. I checked all the logs and found nothing relevant.
Then I checked what had been added with this patchset.
The most interesting finding: Oracle added a new daemon, oprocd.
# ps -efl | grep oprocd
4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -hsi 5:10:50:75:90 -f
These are the interesting points about the line above:
1. The process runs as the root user.
2. It runs with the highest priority (-40).
3. It probes every second (-t 1000).
4. It waits 500 milliseconds for the CPU to respond (-m 500 means the margin time is 500 milliseconds).
5. It runs in fatal mode (-f).
My conclusion from these points: the daemon probes the CPU every second and waits for a response within 500 milliseconds. If it gets no response from the CPU within 500 milliseconds, it assumes the CPU is hung and reboots the machine. The operating system does not get enough time to write the system logs before the server reboots.
So the solution is to increase the margin time from 500 milliseconds to 10 seconds.
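To check the interval and margin on each node without reading the `ps` line by eye, the two flags can be pulled out of the oprocd command line. This is only a sketch in Python; the function name `oprocd_timing` is hypothetical, and it assumes the `-t` and `-m` flags appear exactly as in the `ps` output above.

```python
import re


def oprocd_timing(ps_line: str) -> dict:
    """Extract the -t (interval) and -m (margin) values, in milliseconds, from an oprocd command line."""
    values = {}
    for flag, name in (("-t", "interval_ms"), ("-m", "margin_ms")):
        # Match the flag as a separate word followed by a whole number.
        match = re.search(rf"(?:^|\s){re.escape(flag)}\s+(\d+)(?=\s|$)", ps_line)
        if match is None:
            raise ValueError(f"flag {flag} not found in: {ps_line!r}")
        values[name] = int(match.group(1))
    return values


if __name__ == "__main__":
    line = ("4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 "
            "/opt/oracle/product/10.2.0/crs/bin/oprocd.bin run "
            "-t 1000 -m 500 -hsi 5:10:50:75:90 -f")
    print(oprocd_timing(line))
```

A margin of 500 in the result means the node is still running with the default; after the change described here it should read 10000.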
These are the steps to increase the margin time.
Please remember: the modification requires downtime, and you need to stop the cluster service on all member nodes.
1. Stop The CRS Process
#crsctl stop crs
#<CRS_HOME>/bin/oprocd stop
2. Ensure that Clusterware stack is down and not running
#ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
This should return no processes.
3. From one node of the cluster, change the value of the "diagwait" parameter to 13 by issuing the command as root:
#crsctl set css diagwait 13 -force
4. Check if diagwait is successfully set.
#crsctl get css diagwait
5. Restart the Oracle Clusterware on all the nodes by executing:
#crsctl start crs
(Note: if you have any problem restarting the CRS services, ASM and the database, you can reboot the nodes. The cluster and database will come up automatically via the init startup scripts.)
6. The oprocd daemon process will show with -m 10000
# ps -efl| grep oprocd
# 4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f
Rollback procedure:
If you need to unset the diagwait value for any reason:
#crsctl unset css diagwait
I am confident the abnormal RAC node restart problem will be solved by this workaround.
Regards,
Sumit
Bangalore,India -
I have done all the steps to remove one RAC node but got stuck at the step of running the rootdelete.sh file from the $CRS_HOME/install directory, as I don't have this file in the Windows environment.
What is the equivalent of rootdelete.sh on the Windows platform? I want to run it to remove the node info from the clusterware entry.
Is there a good document that explains how to remove a node on the Windows platform?
Hello,
You need to run the following steps to remove a node from a RAC cluster on Windows platform:
Perform the following steps on a node other than the node you want to delete:
1. Run the Database Configuration Assistant (DBCA) utility to delete the instance.
2. Then run the Net Configuration Assistant (NetCA) to delete the listener.
3. If the node that you are deleting has ASM instance, then delete the ASM instance using the srvctl stop asm and srvctl remove asm commands.
4. Run the command srvctl stop nodeapps -n nodename of the node to be deleted to stop the node applications.
5. Run the command srvctl remove nodeapps -n nodename of the node to be deleted to remove the node applications.
6. Stop isqlplus if it is running.
7. Run the command setup.exe -updateNodeList ORACLE_HOME=Oracle_home ORACLE_HOME_NAME=Oracle_home_name CLUSTER_NODES=remaining nodes where remaining nodes is a list of the nodes that are to remain part of the cluster.
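The -updateNodeList call in step 7 is easy to mistype, so here is a small Python sketch that assembles its arguments from a list of remaining nodes. The function name `update_node_list_args`, the home path, the home name and the node names are all hypothetical, and joining the node names with commas is an assumption; check the exact CLUSTER_NODES syntax against the installation guide for your release before running anything.

```python
def update_node_list_args(oracle_home: str, oracle_home_name: str, remaining_nodes) -> list:
    """Build the argument list for the setup.exe -updateNodeList call from step 7."""
    nodes = [node.strip() for node in remaining_nodes if node.strip()]
    if not nodes:
        raise ValueError("at least one remaining node is required")
    return [
        "setup.exe",
        "-updateNodeList",
        f"ORACLE_HOME={oracle_home}",
        f"ORACLE_HOME_NAME={oracle_home_name}",
        # Assumption: node names are passed as a comma-separated list.
        "CLUSTER_NODES=" + ",".join(nodes),
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    args = update_node_list_args(r"C:\oracle\product\10.1.0\db_1", "OraDb10g_home1",
                                 ["node1", "node2"])
    print(" ".join(args))
```

The sketch only prints the command; running it is left to the administrator once the values have been checked.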
Perform the following steps on the deleted RAC node:
1. Run the command setup.exe -updateNodeList -local -noClusterEnabled ORACLE_HOME=Oracle_home ORACLE_HOME_NAME=Oracle_home_name CLUSTER_NODES="".
Note that you do not need a value for "" after the CLUSTER_NODES= entry in this command. If you delete more than one node, then you must run this command on every deleted node to remove the Oracle home if you have a non-shared Oracle home (non-cluster file system) installation.
2. On the same node, delete the Windows Registry entries and ASM services using Oradim.
3. From the deleted RAC node, run the command Oracle_home\oui\bin\setup.exe to start the Oracle Universal Installer (OUI). Select Deinstall Products and select the Oracle home that you want to de-install.
4. Then to delete the CRS node, from a remaining node run the command crssetup del -nn node_name of the deleted node, node number
5. Then run the command setup.exe -updateNodeList ORACLE_HOME=CRS home ORACLE_HOME_NAME=CRS home name CLUSTER_NODES=remaining nodes where remaining nodes is a list of the nodes that are to remain in the cluster.
6. Then on the deleted CRS node, run the command setup.exe -updateNodeList -local -noClusterEnabled ORACLE_HOME=CRS home ORACLE_HOME_NAME=CRS home name CLUSTER_NODES=""
7. Remove the Oracle home manually from the new node if the home is not shared and then manually remove the HKLM/software/Oracle registry keys and the Oracle services.
8. After adding or deleting nodes from your Oracle Database 10g with RAC environment, and after you are sure that your system is functioning properly, make a backup of the contents of the voting disk using the dd.exe utility. The dd.exe utility is part of the MKS toolkit.
ASM Instance Cleanup Procedures after Node Deletion on Windows-Based Platforms
The delete node procedure requires the following additional steps on Windows-based systems to remove the ASM instances:
1. If this is the Oracle home from which the node-specific listener named LISTENER_nodename runs, then use NetCA to remove this listener and its CRS resources. If necessary, re-create this listener in another home.
2. If this is the Oracle home from which the ASM instance runs, then remove the ASM configuration by running the following command for all nodes on which this Oracle home exists:
srvctl stop asm -n node
Then run the following command for the nodes that you are removing:
srvctl remove asm -n node
3. If you are using a cluster file system for your ASM Oracle home, then run the following commands on the local node:
rd -s -q %ORACLE_BASE%\admin\+ASM
delete %ORACLE_HOME%\database\*ASM*
4. If you are not using a cluster file system for your ASM Oracle home, then run the delete command mentioned in the previous step on each node on which the Oracle home exists.
5. Run the following command on each node that has an ASM instance:
oradim -delete -asmsid +ASMnode_number
Source:
Oracle® Real Application Clusters Administrator's Guide
10g Release 1 (10.1)
Part Number B10765-02
Chapter 5: Adding and Deleting Nodes and Instances
Hope this helps,
Ben Prusinski, Oracle 10g OCP
http://oracle-magician.blogspot.com -
Have lost a RAC node (10gr2) some years ago. We recovered the node via a bit of a hack - pulling a mirrored root disk from another cluster node and changing the config of that root disk, after boot, to that of the lost node (including recreating local node log directories and so on). But that was done as a result of a crisis... ;-)
Have lost a 11gr2 RAC node (3 node RAC, 1st node) this weekend during scheduled maintenance (was told that the root disks crashed badly when server was restarted). O/s has been reinstalled in the meantime. I've been looking for an official support note or section in an Oracle manual that describes the most painless way to get a lost node working again. Have not found anything.
Is the recommended approach to remove the lost node from the cluster and then add it as a brand new node? Or did I miss an alternative or even recommended Oracle method in my googling and looking through the docs and Metalink notes?
"Is the recommended approach to remove the lost node from the cluster and then add it as a brand new node?"
I think so, and this doc may help in this case, as the new node has no information about the existing node setup.
Steps to Remove Node from Cluster When the Node Crashes Due to OS/Hardware Failure and cannot boot up [ID 466975.1] -
What is best use of 1400 gb SGA (2 rac nodes 768gb each)
Currently using 11.2.0.3.0 on a Sun Unix server with 2 RAC nodes, each with 8 UltraSPARC-T1 CPUs (came out in 2005) with four threads each, so Oracle sees 32 CPUs, very slow (1.2 gb). The database is 4 TB in size on a regular SAN (10k speed).
8gb SGA.
Our new boss wants to upgrade the system to the max to get the best performance possible. Money is a concern of course, but the budget is pretty high. Our use case is 12-16 users at the same time, running reports, some small and others very large (returning a single row or 10000s of rows). Reports take 5 sec to 5 minutes. Our job is to get the fastest system possible. We have a total of 8 licenses available, so we can have 16 cores. We are also getting a 6 TB all-flash SSD array for the database. We can get any CPU we want, but we can't use parallel query servers due to all kinds of issues we have experienced (too many slaves, RAC interconnect saturation etc., whack-a-mole). SPARC has too many threads, and without parallel query Oracle runs a query in a single thread.
We have specced out the following system for each RAC node:
HP ProLiant DL380p Gen8 8 SFF server
2 Intel Xeon E5-2637v2 3.5GHz/4-core cpus
768 gb ram
2 HP 300GB 6G SAS 15K drives for database software
This will give us a total of 4 Xeon E5-2637v2 CPUs with 16 cores in total (0.5 factor for 8 licenses) and 1536 GB of RAM (leaving ~1400 GB for SGA). This will guarantee an available core for each user. We intend to create a very large keep pool, around 300 GB on each node, that will hold all our dimension tables. This, we hope, will reduce reads from the SSD to just data from the fact tables.
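The sizing arithmetic in the paragraph above can be written out as a quick check. This is a sketch in Python using only the figures from the post; the function name `sizing` is hypothetical, and splitting the ~1400 GB SGA evenly across the two nodes is an assumption.

```python
def sizing(nodes=2, sockets_per_node=2, cores_per_socket=4, core_factor=0.5,
           ram_per_node_gb=768, sga_total_gb=1400, max_users=16):
    """Recompute the core, license and memory figures quoted in the post."""
    total_cores = nodes * sockets_per_node * cores_per_socket
    sga_per_node_gb = sga_total_gb / nodes  # assumption: SGA split evenly per node
    return {
        "total_cores": total_cores,
        "licenses_needed": total_cores * core_factor,
        "total_ram_gb": nodes * ram_per_node_gb,
        "sga_per_node_gb": sga_per_node_gb,
        # What is left on each node for the OS, PGA and everything else.
        "non_sga_per_node_gb": ram_per_node_gb - sga_per_node_gb,
        "cores_per_user": total_cores / max_users,
    }


if __name__ == "__main__":
    for key, value in sizing().items():
        print(f"{key}: {value}")
```

With these inputs, 16 cores at a 0.5 factor use exactly the 8 available licenses, and roughly 68 GB per node remains outside the SGA, which is worth keeping in mind for PGA-heavy report queries.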
Are we doing massive overkill here? The budget for this was way less than what our boss expected. Will an SGA that big be wasted, and would, say, 256 GB be fine? Or will Oracle take advantage of it and be able to keep most blocks in there?
Will an SGA that big cause Oracle problems due to the overhead of handling that much RAM?
Current System:
===========
a. Version : 11.2.0.3
b. Unix Sun
c. CPU - 8 cpus with 4 threads => 32 logical cpus or cores
d. database 4TB
e. SAN - 10k speed disk drives
f. 8gb SGA
g. 1.2 gb ??
h. Users --> 12-16 concurrent and run reports varying size
i. reports elapsed time 5 sec to 5 mins
j. cpu license -->8
Target System
===========
a. Version: 11.2.0.3
b. HP ProLiant DL380p Gen8 8 SFF server
c. RAM --> 768 GB
d. 2 HP 300GB 6G SAS 15K drives for database software
e. large keep pool -->90 gb to hold all dimension tables.
f. SSD to just data from fact tables
g. SGA -->256gb
A reassessment of the performance issues of the current system appears to be required. A good performance tuning expert is needed to look into the tuning issues of the current application by analyzing AWR performance metrics. If an 8 GB SGA is not enough, the likely reason is that the queries running in the system do not have good access paths and select more data than needed, which flushes recently used buffers of the tables involved out of the cache. Until those issues are identified, the performance problem will not go away wherever you go: as table sizes increase in the future, it will reappear. Even if the queries do run mostly full scans, re-platforming to Exadata might be the right decision, as Exadata has Smart Scan and cell offloading, which work faster and might be the right direction for best performance and the best investment for the future. Compression (COMPRESS FOR OLTP) could be another feature to exploit for further efficiency, since fewer blocks are read in less time.
Investment in infrastructure will solve a few issues in the short term, but the long-term issues will arise again.
Investing in identifying the performance issues of the current system would be the best investment in the current scenario. -
Hello everyone,
I have hit an error: our RAC node restarts automatically with the messages below.
#/u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/alert_odsdb1.log
Fri Jun 07 12:23:42 2013
Thread 1 cannot allocate new log, sequence 58363
Checkpoint not complete
Current log# 2 seq# 58362 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
Current log# 2 seq# 58362 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
Fri Jun 07 12:23:42 2013
NOTE: ASMB terminating
Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
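ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2047 Serial number: 5
Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2047 Serial number: 5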
ASMB (ospid: 32641): terminating the instance due to error 15064
Fri Jun 07 12:23:44 2013
ORA-1092 : opitsk aborting process
Fri Jun 07 12:23:46 2013
ORA-1092 : opitsk aborting process
Instance terminated by ASMB, pid = 32641
Fri Jun 07 12:25:02 2013
Starting ORACLE instance (normal)
Fri Jun 07 12:25:23 2013
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
[name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth0' configured from GPnP for use as a public interface.
[name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
Public Interface 'eth0:1' configured from GPnP for use as a public interface.
[name='eth0:1', type=1, ip=135.33.2.13, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/11.2.0/dbhome_2/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
System name: Linux
Node name: odsdb1
Release: 2.6.18-308.el5
Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
Machine: x86_64
Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/dbhome_2/dbs/initodsdb1.ora
System parameters with non-default values:
processes = 4500
sessions = 6784
event = ""
spfile = "+DATA/odsdb/spfileodsdb.ora"
nls_language = "SIMPLIFIED CHINESE"
nls_territory = "CHINA"
memory_target = 170G
control_files = "+DATA/odsdb/controlfile/current.262.812288837"
control_files = "+DATA/odsdb/controlfile/current.261.812288837"
db_block_size = 8192
compatible = "11.2.0.0.0"
db_files = 4096
cluster_database = TRUE
db_create_file_dest = "+DATA"
db_recovery_file_dest = ""
db_recovery_file_dest_size= 38820M
thread = 1
undo_tablespace = "UNDOTBS1"
instance_number = 1
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
dispatchers = "(PROTOCOL=TCP) (SERVICE=odsdbXDB)"
remote_listener = "odsdb-cluster-scan:1521"
job_queue_processes = 1000
audit_file_dest = "/u01/app/oracle/admin/odsdb/adump"
audit_trail = "DB"
db_name = "odsdb"
open_cursors = 300
diagnostic_dest = "/u01/app/oracle"
Cluster communication is configured to use the following interface(s) for this instance
169.254.37.103
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Fri Jun 07 12:25:33 2013
PMON started with pid=2, OS id=22959
Fri Jun 07 12:25:33 2013
PSP0 started with pid=3, OS id=22962
Fri Jun 07 12:25:34 2013
VKTM started with pid=4, OS id=22971 at elevated priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Fri Jun 07 12:25:34 2013
GEN0 started with pid=5, OS id=22977
Fri Jun 07 12:25:34 2013
DIAG started with pid=6, OS id=22979
Fri Jun 07 12:25:35 2013
DBRM started with pid=7, OS id=22981
Fri Jun 07 12:25:35 2013
PING started with pid=8, OS id=22983
Fri Jun 07 12:25:35 2013
ACMS started with pid=9, OS id=22985
Fri Jun 07 12:25:35 2013
DIA0 started with pid=10, OS id=22987
Fri Jun 07 12:25:35 2013
LMON started with pid=11, OS id=22989
Fri Jun 07 12:25:35 2013
LMD0 started with pid=12, OS id=22991
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [61440 - 81920]
Fri Jun 07 12:25:35 2013
LMS0 started with pid=13, OS id=22994 at elevated priority
Fri Jun 07 12:25:35 2013
LMS1 started with pid=14, OS id=22998 at elevated priority
Fri Jun 07 12:25:35 2013
LMS2 started with pid=15, OS id=23002 at elevated priority
Fri Jun 07 12:25:35 2013
LMS3 started with pid=16, OS id=23006 at elevated priority
Fri Jun 07 12:25:35 2013
RMS0 started with pid=17, OS id=23010
Fri Jun 07 12:25:35 2013
LMHB started with pid=18, OS id=23013
Fri Jun 07 12:25:35 2013
MMAN started with pid=19, OS id=23015
Fri Jun 07 12:25:35 2013
DBW0 started with pid=20, OS id=23017
Fri Jun 07 12:25:35 2013
DBW1 started with pid=21, OS id=23019
Fri Jun 07 12:25:35 2013
DBW2 started with pid=22, OS id=23022
Fri Jun 07 12:25:35 2013
DBW3 started with pid=23, OS id=23024
Fri Jun 07 12:25:35 2013
DBW4 started with pid=24, OS id=23026
Fri Jun 07 12:25:35 2013
DBW5 started with pid=25, OS id=23028
Fri Jun 07 12:25:35 2013
DBW6 started with pid=26, OS id=23031
Fri Jun 07 12:25:35 2013
DBW7 started with pid=27, OS id=23033
Fri Jun 07 12:25:35 2013
LGWR started with pid=28, OS id=23035
Fri Jun 07 12:25:35 2013
CKPT started with pid=29, OS id=23037
Fri Jun 07 12:25:35 2013
SMON started with pid=30, OS id=23039
Fri Jun 07 12:25:35 2013
RECO started with pid=31, OS id=23041
Fri Jun 07 12:25:35 2013
RBAL started with pid=32, OS id=23043
Fri Jun 07 12:25:35 2013
ASMB started with pid=33, OS id=23045
Fri Jun 07 12:25:35 2013
MMON started with pid=34, OS id=23048
Fri Jun 07 12:25:35 2013
MMNL started with pid=35, OS id=23052
Fri Jun 07 12:25:35 2013
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
NOTE: initiating MARK startup
starting up 1 shared server(s) ...
Starting background process MARK
Fri Jun 07 12:25:35 2013
MARK started with pid=37, OS id=23056
NOTE: MARK has subscribed
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 119)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid according to instance 2
* domain 0 valid = 1 according to instance 2
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration started (old inc 119, new inc 121)
List of instances:
1 2 (myinst: 1)
Nested reconfiguration detected.
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Fri Jun 07 12:25:45 2013
Submitted all GCS remote-cache requests
Fri Jun 07 12:26:08 2013
Fix write in gcs resources
Reconfiguration complete
Fri Jun 07 12:26:10 2013
LCK0 started with pid=40, OS id=23632
Fri Jun 07 12:26:10 2013
Starting background process RSMN
Fri Jun 07 12:26:10 2013
RSMN started with pid=41, OS id=23646
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /u01/app/oracle
Fri Jun 07 12:26:11 2013
ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=135.33.2.13)(PORT=1521))))' SCOPE=MEMORY SID='odsdb1';
ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
Fri Jun 07 12:26:11 2013
NOTE: Loaded library: System
Fri Jun 07 12:26:11 2013
SUCCESS: diskgroup DATA was mounted
Fri Jun 07 12:26:11 2013
NOTE: dependency between database odsdb and diskgroup resource ora.DATA.dg is established
Fri Jun 07 12:26:16 2013
Successful mount of redo thread 1, with mount id 3452000551
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
Picked broadcast on commit scheme to generate SCNs
Thread 1 advanced to log sequence 58364 (thread open)
Thread 1 opened at log sequence 58364
Current log# 2 seq# 58364 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
Current log# 2 seq# 58364 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Jun 07 12:26:21 2013
SMON: enabling cache recovery
Fri Jun 07 12:26:23 2013
minact-scn: Inst 1 is a slave inc#:121 mmon proc-id:23048 status:0x2
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Fri Jun 07 12:26:34 2013
[23651] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:2061372614 end:2061384964 diff:12350 (123 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
Fri Jun 07 12:26:34 2013
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Starting background process GTX0
Fri Jun 07 12:26:35 2013
GTX0 started with pid=45, OS id=23931
Starting background process RCBG
Fri Jun 07 12:26:35 2013
RCBG started with pid=46, OS id=23933
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Fri Jun 07 12:26:35 2013
QMNC started with pid=48, OS id=23940
Completed: ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
Fri Jun 07 12:26:38 2013
Starting background process CJQ0
Fri Jun 07 12:26:38 2013
CJQ0 started with pid=55, OS id=23977
Fri Jun 07 12:27:56 2013
Thread 1 advanced to log sequence 58365 (LGWR switch)
Current log# 1 seq# 58365 mem# 0: +DATA/odsdb/onlinelog/group_1.263.812288839
Current log# 1 seq# 58365 mem# 1: +DATA/odsdb/onlinelog/group_1.264.812288839
Fri Jun 07 12:28:18 2013
Starting background process SMCO
Fri Jun 07 12:28:18 2013
SMCO started with pid=70, OS id=25166
Fri Jun 07 12:29:01 2013
Thread 1 cannot allocate new log, sequence 58366
Trace file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
System name: Linux
Node name: odsdb1
Release: 2.6.18-308.el5
Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
Machine: x86_64
Instance name: odsdb1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 33
Unix process pid: 32641, image: oracle@odsdb1 (ASMB)
*** 2013-05-14 15:37:08.705
*** SESSION ID:(3499.1) 2013-05-14 15:37:08.705
*** CLIENT ID:() 2013-05-14 15:37:08.705
*** SERVICE NAME:() 2013-05-14 15:37:08.705
*** MODULE NAME:() 2013-05-14 15:37:08.705
*** ACTION NAME:() 2013-05-14 15:37:08.705
NOTE: initiating MARK startup
*** 2013-05-14 15:37:16.835
instance health monitoring reports instance shutting down
*** 2013-06-07 12:23:42.700
NOTE: ASMB terminating
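ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2047 Serial number: 5
error 15064 detected in background process
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2047 Serial number: 5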
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53<-ksuitm()+1332<-ksbrdp()+3344<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----
*** 2013-06-07 12:23:42.783
ASMB (ospid: 32641): terminating the instance due to error 15064
/u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
NOTE: ASMB process exiting, either shutdown is in progress
NOTE: or foreground connected to ASMB was killed.
Fri Jun 07 12:23:42 2013
NOTE: client exited [14808]
Fri Jun 07 12:23:44 2013
Received an instance abort message from instance 2
Please check instance 2 alert and LMON trace files for detail.
Fri Jun 07 12:23:44 2013
Received an instance abort message from instance 2
Please check instance 2 alert and LMON trace files for detail.
LMD0 (ospid: 31201): terminating the instance due to error 481
Instance terminated by LMD0, pid = 31201
Fri Jun 07 12:24:30 2013
* instance_number obtained from CSS = 1, checking for the existence of node 0...
* node 0 does not exist. instance_number = 1
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
[name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth0' configured from GPnP for use as a public interface.
[name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/11.2.0.2/grid/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
[grid@odsdb1 cssd]$ file core.30481
core.30481: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'ocssd.bin'
[grid@odsdb1 cssd]$ gdb
gdb gdbserver gdbtui
[grid@odsdb1 cssd]$ gdb ocssd.bin core.30481
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /u01/app/11.2.0.2/grid/bin/ocssd.bin...(no debugging symbols found)...done.
[New Thread 30486]
[New Thread 30530]
[New Thread 30526]
[New Thread 30525]
[New Thread 30523]
[New Thread 30522]
[New Thread 30521]
[New Thread 30520]
[New Thread 30519]
[New Thread 30504]
[New Thread 30503]
[New Thread 30495]
[New Thread 30485]
[New Thread 30484]
[New Thread 30483]
[New Thread 30481]
Reading symbols from /u01/app/11.2.0.2/grid/lib/libhasgen11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libhasgen11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libocr11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocr11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrb11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrb11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrutl11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrutl11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1
Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxn2.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxn2.so
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libcell11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libcell11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxp11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxp11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libnnz11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnnz11.so
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /u01/app/11.2.0.2/grid/lib/libnque11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnque11.so
Reading symbols from /opt/oracle/extapi/64/asm/orcl/1/libasm.so...(no debugging symbols found)...done.
Loaded symbols for /opt/oracle/extapi/64/asm/orcl/1/libasm.so
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff505fd000
Core was generated by `/u01/app/11.2.0.2/grid/bin/ocssd.bin '.
Program terminated with signal 6, Aborted.
#0 0x000000369ea30265 in raise () from /lib64/libc.so.6
(gdb) where
#0 0x000000369ea30265 in raise () from /lib64/libc.so.6
#1 0x000000369ea31d10 in abort () from /lib64/libc.so.6
#2 0x00002afc67f9aeda in scls_abort (flags=0) at scls.c:7088
#3 0x000000000040babd in clssscExit (thrd=0x10d325a0, status=clssscreasonSHUTNORM) at clsssc.c:2155
#4 0x0000000000446221 in clssgmClientShutdown (thrd=0x10d325a0, cmInfo=0x10b40090) at clssgmc.c:6415
#5 0x0000000000436707 in clssgmProcClientReqs (thrd=0x10d325a0, clctx=0x10b40630) at clssgmc.c:704
#6 0x0000000000436405 in clssgmclientlsnr (thrd=0x10d325a0) at clssgmc.c:644
#7 0x000000000040ac2f in clssscthrdmain (thrd=0x10d325a0) at clsssc.c:1716
#8 0x000000369fa0677d in start_thread () from /lib64/libpthread.so.0
#9 0x000000369ead49ad in clone () from /lib64/libc.so.6
(gdb)
2013-06-07 12:19:37.377: [ CSSD][1085888832]clssscSelect: cookie accept request 0x10b40630
2013-06-07 12:19:37.377: [ CSSD][1085888832]clssgmAllocProc: (0x2aaab0133ea0) allocated
2013-06-07 12:19:37.379: [ CSSD][1085888832]clssgmClientConnectMsg: properties of cmProc 0x2aaab0133ea0 - 1,2,3,4,5
2013-06-07 12:19:37.379: [ CSSD][1085888832]clssgmClientConnectMsg: Connect from con(0x6ae44fa) proc(0x2aaab0133ea0) pid(14139/14139) version 11:2:1:4, properties: 1,2,3,4,5
2013-06-07 12:19:37.379: [ CSSD][1085888832]clssgmClientConnectMsg: msg flags 0x0000
2013-06-07 12:19:37.384: [ CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
2013-06-07 12:19:37.384: [ CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
2013-06-07 12:19:37.384: [ CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(1/0x2aaab010c5c0)
2013-06-07 12:19:37.385: [ CSSD][1085888832]clssgmRegisterShared: grp DBODSDB, mbr 0, type 1
2013-06-07 12:19:37.385: [ CSSD][1085888832]clssgmQueueShare: (0x2aaab0085790) target global grock DBODSDB member 0 type 1 queued from client (0x2aaab010c5c0), global grock DBODSDB, refcount 23
2013-06-07 12:19:37.385: [ CSSD][1085888832]clssgmRegisterShared: global grock DBODSDB member 0 share type 1, refcount 23
2013-06-07 12:19:37.391: [ CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
2013-06-07 12:19:37.391: [ CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
2013-06-07 12:19:37.391: [ CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(2/0x2aaab0061f10)
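For anyone triaging similar output, a short script can pull the instance-termination events out of an alert log or trace file. This is only a minimal Python sketch, assuming the line format seen above (`<process> (ospid: <n>): terminating the instance due to error <n>`); the function name and the sample text are illustrative, not part of any Oracle tool.

```python
import re

# Matches lines such as:
#   "ASMB (ospid: 32641): terminating the instance due to error 15064"
TERMINATION = re.compile(
    r"^(?P<process>\w+) \(ospid: (?P<ospid>\d+)\): "
    r"terminating the instance due to error (?P<error>\d+)"
)


def find_terminations(log_text):
    """Return (process, ospid, error) for each instance-termination line."""
    events = []
    for line in log_text.splitlines():
        match = TERMINATION.match(line.strip())
        if match:
            events.append(
                (match["process"], int(match["ospid"]), int(match["error"]))
            )
    return events


# Sample lines copied from the logs in this post.
sample = """\
ASMB (ospid: 32641): terminating the instance due to error 15064
NOTE: client exited [14808]
LMD0 (ospid: 31201): terminating the instance due to error 481
Instance terminated by LMD0, pid = 31201
"""

for process, ospid, error in find_terminations(sample):
    print(f"{process} (ospid {ospid}) terminated the instance, error {error}")
```

Run against the full files, this lists which background process brought each instance down and with which error number, which is usually the first thing to establish.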
What is the problem?
Edited by: 徐振富 on 2013-6-7 6:38 PM
Edited by: 徐振富 on 2013-6-7 6:45 PM
Is your ASM instance up?
If not, try bringing up the ASM instance by itself and see whether it throws any errors.
Post the status from crsctl status cluster -all
-
RAC node outage causes SOA Suite 10.1.3.4 BPEL failure
We are using WebLogic 9.2 and SOA Suite 10.1.3.4. We use a 10g Oracle RAC (2 nodes); the WebLogic cluster has a multi data source of 2 pools, each pool pointing to a single node in the RAC, each pool deployed to the cluster, and the multi data source in load-balancing mode.
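To make the expected behaviour concrete, here is a toy Python model of a load-balancing multi data source with two member pools. It is a sketch of the idea only, not WebLogic's implementation: the `Pool` and `MultiDataSource` classes are hypothetical, and the pool names are borrowed from the log output in this post.

```python
class Pool:
    """Stand-in for one connection pool bound to a single RAC node."""

    def __init__(self, name):
        self.name = name
        self.enabled = True


class MultiDataSource:
    """Round-robin over member pools, skipping any that are disabled."""

    def __init__(self, pools):
        self.pools = list(pools)
        self._next = 0  # index of the pool to try first on the next request

    def reserve(self):
        for offset in range(len(self.pools)):
            index = (self._next + offset) % len(self.pools)
            pool = self.pools[index]
            if pool.enabled:
                self._next = (index + 1) % len(self.pools)
                return pool.name
        raise RuntimeError("no enabled pool available")


pool1, pool2 = Pool("esbaqds1"), Pool("esbaqds2")
mds = MultiDataSource([pool1, pool2])
print([mds.reserve() for _ in range(4)])  # alternates between the two pools
pool2.enabled = False                     # simulate the failed RAC node
print([mds.reserve() for _ in range(3)])  # only esbaqds1 is handed out
```

In this model, losing one node only removes one pool from the rotation; requests fail outright only when every member pool is disabled.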
So the other night, one of the db nodes had a hardware failure ( ironically, with a remote monitoring / management card ). Annoying, but it should not have caused the BPEL servers to be in "FAILED NOT RESTARTABLE" status the next morning.
<Jun 9, 2009 12:10:07 AM EDT> <Warning> <JDBC> <BEA-001129> <Received exception while creating connection for pool "esbaqds2": Io exception: The Network Adapter could not establish the connection>
SEVERE: Destroying JMSDequeuer failed
oracle.jms.AQjmsException: Connection has been administratively destroyed. Reconnect.
at oracle.jms.AQjmsSession.preClose(AQjmsSession.java:980)
at oracle.jms.AQjmsObject.close(AQjmsObject.java:409)
at oracle.jms.AQjmsSession.close(AQjmsSession.java:1020)
at oracle.tip.esb.server.dispatch.JMSDequeuer.destroy(JMSDequeuer.java:419)
at oracle.tip.esb.server.dispatch.JMSDequeuer.destroyWithoutUnsubscribing(JMSDequeuer.java:395)
at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:175)
at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
This was followed by a 2 GB log file containing 1.3 million iterations of the following within the next 10 minutes, before the managed servers failed.
java.lang.NullPointerException
at java.lang.String.<init>(String.java:144)
at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:168)
at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
Both managed instances of the BPEL cluster failed, even though the 1st node of the Oracle RAC was still available.
Our 10.3 cluster, also using multi data sources to the same RAC for the OSB components, simply went on about its business using the remaining RAC node pool.
Seems to be a single point of failure...
We haven't changed the JDBC connection string yet, but we did run a test in the same environment while Oracle support considers the situation.
For the test, we simply shut down one node of the RAC and watched to see what happened. Within the space of a minute, the JDBC "Failed Reserve Request Count" was increasing by thousands on every refresh of the screen. We restarted the RAC node after 5 minutes, by which time the "Failed Reserve Request Count" was over 190,000.
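To put those counters in perspective, the figures quoted in this thread can be turned into average rates. This is a rough back-of-the-envelope calculation in Python, assuming the events were spread evenly over each interval; 190,000 is a lower bound, since the counter was "over" that value.

```python
def per_second(count, minutes):
    """Average events per second over the given number of minutes."""
    return count / (minutes * 60)


# Figures quoted in the post.
failed_reserves = per_second(190_000, 5)    # test: 190,000 failures in 5 minutes
npe_iterations = per_second(1_300_000, 10)  # outage: 1.3 million iterations in 10 minutes

print(f"failed reserve requests: about {failed_reserves:.0f} per second")
print(f"logged exceptions:       about {npe_iterations:.0f} per second")
```

Rates in the hundreds to thousands per second suggest the caller is retrying in a tight loop with no delay between attempts.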
The 2 BPEL managed servers remained in Running status and each created a 660 MB log file within that 5 minutes. In the original outage, the nodes were down for about 15 minutes. Most of the logging is being generated from within the oracle.tip.esb classes, not by the WebLogic classes. It looks like once the pool pointing to the downed RAC node becomes disabled, the Oracle BPEL code still tries to use it, even though the multi data source JNDI name is the published lookup:
INFO: JMSDequeuer::createConnection - AQ Topics
java.sql.SQLException: weblogic.common.ResourceException:
esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
at weblogic.jdbc.common.internal.JDBCUtil.wrapAndThrowResourceException(JDBCUtil.java:250)
at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:348)
at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:364)
at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:559)
at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
Caused by: weblogic.common.ResourceException:
esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
at weblogic.jdbc.common.internal.MultiPool.searchLoadBalance(MultiPool.java:331)
at weblogic.jdbc.common.internal.MultiPool.findPool(MultiPool.java:202)
at weblogic.jdbc.common.internal.ConnectionPoolManager.reserve(ConnectionPoolManager.java:77)
at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:346)
... 11 more
SEVERE: Failed to process deferred message
oracle.tip.esb.server.dispatch.QueueHandlerException: Error creating "weblogic.common.ResourceException: No good connections available."
at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:661)
at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
-
Private Interconnect: Should any nodes other than RAC nodes have one?
The contractors who set up our four-node production 10g RAC (and a standalone development server) also assigned private interconnect addresses to two Apache/ApEx servers and a standalone development database server.
There are service names in the tnsnames.ora on all servers in our infrastructure referencing these private interconnects, even on the servers that are not RAC members. The NICs on these servers are not bound for failover with the NICs bound to the public/VIP addresses. These NICs are isolated on their own switch.
Could this configuration be related to lost heartbeats or voting disk errors? We experience RAC node expulsions and even arbitrary bounces (reboots!) of all the RAC nodes.
I do not have access to the contractors. I can only look at what they have left behind and try to figure out their intention.
I am reading the Ault/Tumma book Oracle 10g Grid and Real Application Clusters, looking through our own settings and config files, and learning srvctl and crsctl commands from their examples. I am also googling and searching OTN through the library full of documentation.
I have yet to figure out whether the private interconnect mentioned so frequently in cluster configuration documents is the binding to the set of node.vip addresses specified in the tnsnames.ora (bound to the first eth adaptor along with the public IP addresses for the nodes), or the binding on the second eth adaptor to the node.prv addresses, which are not found in the local pfile, the tnsnames.ora, or the listener.ora (but are found at the operating system level in the ifconfig output). If the node.prv addresses are not the private interconnect, can anyone tell me what they are for?
-
Found the errors in CSSD logs of RAC node
We found the error below in the CSSD logs on one of our RAC nodes from 5:15 to 5:18 PM; after that the error disappeared. Does anyone have an idea what could cause this error?
Also, at that time we didn't find any errors in the alert log.
[ CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization failed (112bd2a70), timed out, start 17:13:51.041, duration 120009
[ CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization prepare time: 2 ms
[ CSSD]2009-07-19 17:15:51.233 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b67930) proc(112b680b0) pid(1049540) proto(10:2:1:1)
[ CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization failed (112bd4a10), timed out, start 17:13:51.268, duration 120003
[ CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization prepare time: 3 ms
[ CSSD]2009-07-19 17:15:52.544 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b67930) proc(112b680b0) pid(786918) proto(10:2:1:1)
[ CSSD]2009-07-19 17:15:53.297 [3600] >TRACE: Authorization failed (112c38af0), timed out, start 17:13:53.290, duration 120009
[ CSSD]2009-07-19 17:15:53.297 [3600] >TRACE: Authorization prepare time: 3 ms
[ CSSD]2009-07-19 17:15:53.317 [3600] >TRACE: Authorization failed (112d356f0), timed out, start 17:13:53.320, duration 120000
[ CSSD]2009-07-19 17:15:53.317 [3600] >TRACE: Authorization prepare time: 2 ms
[ CSSD]2009-07-19 17:16:02.342 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b932b0) proc(112b67d10) pid(1336252) proto(10:2:1:1)
[ CSSD]2009-07-19 17:16:02.977 [3600] >TRACE: Authorization failed (112d04f70), timed out, start 17:14:02.978, duration 120001
[ CSSD]2009-07-19 17:16:02.977 [3600] >TRACE: Authorization prepare time: 2 ms
[ CSSD]2009-07-19 17:16:03.007 [3600] >TRACE: Authorization failed (112d38210), timed out, start 17:14:03.006, duration 120002
[ CSSD]2009-07-19 17:16:03.007 [3600] >TRACE: Authorization prepare time: 2 ms
[ CSSD]2009-07-19 17:16:10.447 [3600] >TRACE: Authorization failed (112bd7e30), timed out, start 17:14:10.441, duration 120007
[ CSSD]2009-07-19 17:16:10.447 [3600] >TRACE: Authorization prepare time: 2 ms
[ CSSD]2009-07-19 17:16:10.847 [3600] >TRACE: Authorization failed (112d3ee70), timed out, start 17:14:10.840, duration 120008
[ CSSD]2009-07-19 17:16:10.847 [3600] >TRACE: Authorization prepare time: 2 ms
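One quick check on a trace like this is whether every failure hit the same timeout. The following is a minimal Python sketch, assuming the line format shown above and assuming the `duration` field is in milliseconds (the start and log timestamps are about two minutes apart, which is consistent with that); the function name is made up for illustration.

```python
import re

# Matches: "Authorization failed (<handle>), timed out, start <time>, duration <n>"
AUTH_TIMEOUT = re.compile(
    r"Authorization failed \((?P<handle>[0-9a-f]+)\), timed out, "
    r"start (?P<start>[\d:.]+), duration (?P<duration_ms>\d+)"
)


def timeout_durations(log_text):
    """Return the duration (assumed milliseconds) of each timed-out authorization."""
    return [int(m["duration_ms"]) for m in AUTH_TIMEOUT.finditer(log_text)]


# Sample lines copied from the trace above.
sample = """\
[ CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization failed (112bd2a70), timed out, start 17:13:51.041, duration 120009
[ CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization prepare time: 2 ms
[ CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization failed (112bd4a10), timed out, start 17:13:51.268, duration 120003
"""

durations = timeout_durations(sample)
print(durations)
print(all(119_000 <= d <= 121_000 for d in durations))
```

If every duration clusters around 120,000, as it does in the lines above, the failures look like a fixed two-minute timeout expiring rather than a series of unrelated errors.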
Thanks,
Mahi
Check the Metalink note:
6996694-OCSSD.BIN CONSUMING 100% CPU AND ASM/DB HANGING
-
Question on Rebooting RAC Nodes
Hi, I heard that when rebooting all RAC nodes, one has to wait at least 5 minutes between each node reboot. So you would reboot node 1 at time 0, node 2 five minutes later, etc.
However, I could not find any documentation on this. Can someone please point me to the right place to look? Thanks.
I have not heard that before. I generally use srvctl to stop/start the databases, and I do not think it waits 5 minutes between starting nodes, as it does not take 15 minutes to stop/start the databases. As for rebooting the hosts, the only interval between machines was the time it took to send the reboot command to each host.
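As an illustration of the srvctl approach described in the reply, here is a small Python wrapper that stops and then starts a database with no artificial delay in between. It is a sketch under assumptions: it uses the 10g/11g-style `srvctl stop database -d <name>` and `srvctl start database -d <name>` syntax, `orcl` is a hypothetical database name, and `bounce_database` is an invented helper rather than anything shipped by Oracle.

```python
import subprocess


def srvctl_command(action, db_name):
    """Build the srvctl argument list for stopping or starting a database."""
    if action not in ("stop", "start"):
        raise ValueError(f"unsupported action: {action}")
    return ["srvctl", action, "database", "-d", db_name]


def bounce_database(db_name, run=subprocess.run):
    """Stop, then start, the database; srvctl handles the instances on each node."""
    for action in ("stop", "start"):
        # check=True raises CalledProcessError if srvctl exits non-zero
        run(srvctl_command(action, db_name), check=True)


if __name__ == "__main__":
    # "orcl" is a hypothetical name; printing keeps this example harmless to run.
    print(" ".join(srvctl_command("stop", "orcl")))
    print(" ".join(srvctl_command("start", "orcl")))
```

Calling `bounce_database("orcl")` on a cluster node would run both commands back to back; the `run` parameter exists only so the sequence can be exercised without a real cluster.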
-
Document Management Node Setup
Should Document Management Node setup be available in 11.5.7, as shown in the Workflow User Guide (pg. 2-31 to 2-33, 4-5 to 4-6, 10-32 to 10-35)? The example in the manual is too generic for my DBAs or Internet people to understand it fully.
If we are using embedded 11i workflows, can we attach Word documents or Excel spreadsheets to workflow notifications in an HP UNIX environment, or do we have to be using a formal document image management system?
How do we specifically complete the Document Management Nodes screen for a Document Management system's images?
How do we specifically complete the Document Management Nodes screen for a document created by a PL/SQL procedure?
If we can attach Microsoft office documents to notifications, how do we complete the Document Management Node screen?
We are not using e-mail notifications or e-mail summaries. Users are accessing notification screens only.
Dave Petrie
Valspar Corporation
David,
The Document Management node setup is reserved for future use.
It is possible to integrate Workflow with Oracle Internet File System and more information can be found at:
http://otn.oracle.com/products/ifs/htdocs/workflow/workflow.html