RAC node down

Hi All,
Oracle 10.2.0
Windows Server 2003
My question: in an Oracle RAC cluster, suppose applications hold connections to both node1 and node2.
If node1 goes down, will the existing connections to node1 be redirected to node2?
Please advise.
TIA,

If you configured Transparent Application Failover (TAF) in the clients' tnsnames.ora, yes.
You can read about TAF in the Net Services Administrator's Guide at http://tahiti.oracle.com
Sybrand Bakker
Senior Oracle DBA
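
As a rough illustration of what a TAF-enabled entry in tnsnames.ora can look like, here is a minimal Python sketch that generates one. The alias, host names and service name (RACDB_TAF, node1-vip, node2-vip, racdb) are hypothetical placeholders, and the RETRIES/DELAY values are arbitrary. Keep in mind that TAF is a feature of OCI-based clients, and that a failed-over session loses its uncommitted transactions; with TYPE=SELECT only open queries can be resumed.

```python
def taf_tns_entry(alias, hosts, service_name, port=1521,
                  failover_type="SELECT", method="BASIC",
                  retries=20, delay=3):
    """Return a tnsnames.ora entry whose CONNECT_DATA carries a FAILOVER_MODE clause."""
    # One ADDRESS per node, so the client has somewhere else to go.
    address_lines = [
        f"    (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))"
        for host in hosts
    ]
    lines = [
        f"{alias} =",
        "  (DESCRIPTION =",
        "    (LOAD_BALANCE = ON)",
        "    (FAILOVER = ON)",
        *address_lines,
        "    (CONNECT_DATA =",
        f"      (SERVICE_NAME = {service_name})",
        # FAILOVER_MODE is the part that enables TAF for existing sessions.
        f"      (FAILOVER_MODE = (TYPE = {failover_type})(METHOD = {method})"
        f"(RETRIES = {retries})(DELAY = {delay}))",
        "    )",
        "  )",
    ]
    return "\n".join(lines)


# Hypothetical names, for illustration only.
print(taf_tns_entry("RACDB_TAF", ["node1-vip", "node2-vip"], "racdb"))
```

The generated text would be pasted into the tnsnames.ora on each client, and the application would then connect through the new alias.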

Similar Messages

  • RAC Node down and ORA-12514

    I have a two-node RAC setup. One node went down because of hardware issues, and it seems that I cannot connect from a client (JDBC) when the SCAN name resolves to one particular IP.
    I receive: ORA-12514, TNS:listener does not currently know of service requested in connect descriptor. If DNS returns the correct IP, everything works fine.
    connection string:
    jdbc:oracle:thin:@(DESCRIPTION= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=testracscan.internal.int)(PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=testdb.internal.int)))
    Interfaces show that VIPS and SCANS are assigned correctly on Node 1:
    vlan65 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
    inet addr:192.168.2.10 Bcast:192.168.2.255 Mask:255.255.255.0
    inet6 addr: fe80::2e76:8aff:fe4f:b5cc/64 Scope:Link
    UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
    RX packets:937195 errors:0 dropped:0 overruns:0 frame:0
    TX packets:852745 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:0
    RX bytes:186434457 (177.7 MiB) TX bytes:141217705 (134.6 MiB)
    vlan65:1 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
    inet addr:192.168.2.25 Bcast:192.168.2.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
    vlan65:2 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
    inet addr:192.168.2.35 Bcast:192.168.2.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
    vlan65:3 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
    inet addr:192.168.2.30 Bcast:192.168.2.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
    vlan65:4 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
    inet addr:192.168.2.110 Bcast:192.168.2.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
    vlan65:5 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
    inet addr:192.168.2.115 Bcast:192.168.2.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
    [oracle@srvtestdb1 ~]$ lsnrctl status
    LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 03-SEP-2012 15:35:05
    Copyright (c) 1991, 2011, Oracle. All rights reserved.
    Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
    STATUS of the LISTENER
    Alias LISTENER
    Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
    Start Date 29-AUG-2012 15:52:57
    Uptime 4 days 23 hr. 42 min. 7 sec
    Trace Level off
    Security ON: Local OS Authentication
    SNMP OFF
    Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
    Listener Log File /u01/app/grid/diag/tnslsnr/srvtestdb1/listener/alert/log.xml
    Listening Endpoints Summary...
    (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.10)(PORT=1521)))
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.110)(PORT=1521)))
    Services Summary...
    Service "+ASM" has 1 instance(s).
    Instance "+ASM1", status READY, has 1 handler(s) for this service...
    Service "testdb.internal.int" has 1 instance(s).
    Instance "testdb1", status READY, has 1 handler(s) for this service...
    Service "testdbXDB.internal.int" has 1 instance(s).
    Instance "testdb1", status READY, has 1 handler(s) for this service...
    Service "testdbsvc.internal.int" has 1 instance(s).
    Instance "testdb1", status READY, has 1 handler(s) for this service...
    The command completed successfully
    [oracle@srvtestdb1 ~]$
    SQL> show parameter listener
    NAME TYPE VALUE
    listener_networks string
    local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.2.110)(PORT=1521))))
    remote_listener string testracscan.internal.int:1521
    nslookup testracscan.internal.int
    Server: 192.168.0.18
    Address: 192.168.0.18#53
    Name: testracscan.internal.int
    Address: 192.168.2.30
    Name: testracscan.internal.int
    Address: 192.168.2.25
    Name: testracscan.internal.int
    Address: 192.168.2.35
    Problems arise when the client resolves the name to 192.168.2.35: I get ORA-12514.
    When the IP resolves to 192.168.2.110, the connection simply sits and waits for a moment and then begins to work, and netstat shows:
    tcp 0 0 ::ffff:1 192.168.2.5:51685 ::ffff:192.168.2.110:1521 ESTABLISHED
    What might be causing this?

    [grid@srvtestdb1 ~]$ ps -ef|grep tns
    root 65 2 0 Aug29 ? 00:00:00 [netns]
    grid 4449 1 0 Aug29 ? 00:00:25 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN2 -inherit
    grid 4454 1 0 Aug29 ? 00:00:23 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN3 -inherit
    grid 4481 1 0 Aug29 ? 00:00:33 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
    grid 37028 1 0 09:38 ? 00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
    grid 37901 36372 0 09:45 pts/0 00:00:00 grep tns
    [grid@srvtestdb1 ~]$
    [grid@srvtestdb1 ~]$ srvctl config scan_listener
    SCAN Listener LISTENER_SCAN1 exists. Port: TCP:1521
    SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1521
    SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1521
    [grid@srvtestdb1 ~]$
    [grid@srvtestdb1 ~]$ srvctl status scan_listener
    SCAN Listener LISTENER_SCAN1 is enabled
    SCAN listener LISTENER_SCAN1 is running on node srvtestdb1
    SCAN Listener LISTENER_SCAN2 is enabled
    SCAN listener LISTENER_SCAN2 is running on node srvtestdb1
    SCAN Listener LISTENER_SCAN3 is enabled
    SCAN listener LISTENER_SCAN3 is running on node srvtestdb1
    [grid@srvtestdb1 ~]$ srvctl status scan
    SCAN VIP scan1 is enabled
    SCAN VIP scan1 is running on node srvtestdb1
    SCAN VIP scan2 is enabled
    SCAN VIP scan2 is running on node srvtestdb1
    SCAN VIP scan3 is enabled
    SCAN VIP scan3 is running on node srvtestdb1
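
To narrow down which SCAN address misbehaves, one option is to resolve the SCAN name to all of its addresses and probe each one separately. The sketch below does only that. The function names are made up for this example, and a successful TCP connect only shows that some listener accepts connections on that address. It does not show that the service is registered there, which is what ORA-12514 complains about, so the result is best read alongside `lsnrctl status LISTENER_SCAN1` (and the same for SCAN2 and SCAN3) on the server.

```python
import socket


def resolve_all(hostname, port=1521, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses that hostname resolves to."""
    infos = resolver(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def probe(ip, port=1521, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_scan(hostname, port=1521, resolver=socket.getaddrinfo, prober=probe):
    """Map every address behind hostname to the result of a TCP probe."""
    return {ip: prober(ip, port) for ip in resolve_all(hostname, port, resolver)}
```

For example, `check_scan("testracscan.internal.int")` run from the client machine would return one entry per SCAN address. An address that accepts the connection but still yields ORA-12514 points at service registration on that particular SCAN listener rather than at the network.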

  • 10g RAC Node down Enterprise Manager

    Hello
    We have a 2-node Oracle 10g RAC Release 2 setup on Linux. Enterprise Manager was first installed on node1, and we access it using http://node1:5500/em.
    This node has a hardware failure and is out of commission at the moment. When I try to connect to http://node2:5500/em it does not work.
    I see the dbconsole process is running on node2.
    How can I use the Enterprise Manager if the node1 is down?
    Thanks

    Can you try to deconfigure and configure it again on node 2?
    emctl stop dbconsole
    emca -deconfig dbcontrol db
    emca -config dbcontrol db
    Salman

  • What happens when one RAC node's public NIC is down?

    Dear all,
    There's a two-node RAC in my office. From what I have observed, when one RAC node's public NIC is down, the "crs_stat -t" command doesn't show any resource as OFFLINE. So at that moment, if I use sqlplus to connect to my RAC database, it will still choose the node1 or node2 instance at random, right? And if I am assigned to the instance on the node whose public NIC is down, my sqlplus connection will fail?
    So, can we say that Oracle RAC can't provide high availability if one node's public NIC is down? Or is there another way to solve this issue? Any suggestions would be appreciated.

    If a node's public NIC is down, only the other node can take the load; in your case that is node 2. It may happen that CRS fails to register the change in status, and in that case connections routed to the affected node may fail.

  • JDBC read stuck if RAC node goes down

    We ran several tests with Java applications against our RAC DB, and the application hangs if we power off the RAC node that is executing the current long-running query.
    We can see that the application receives HA-events via UCP:
    2015-01-22 13:02:11 | r-thread-1 | WARN  | o.ucp.jdbc.oracle.ONSDatabaseFailoverEvent    | NO timezone in HA event
    However, the application had started a query before that, and the query is not aborted with an exception. A thread dump taken after about 7 minutes shows that the application is hanging in a socket read call:
    "pool-1-thread-1" #32 prio=5 os_prio=0 tid=0x00007fedf45b2000 nid=0xbc4 runnable [0x00007fee00cd3000]
       java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at java.net.SocketInputStream.read(SocketInputStream.java:121)
        at oracle.net.ns.Packet.receive(Packet.java:283)
        at oracle.net.ns.DataPacket.receive(DataPacket.java:103)
        at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:230)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:175)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:100)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:85)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
        at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1122)
        at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1099)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:288)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)
        at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
        at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:863)
        at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1153)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1275)
        at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3576)
        at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3620)
        - locked <0x00000000c0ddcb20> (a oracle.jdbc.driver.T4CConnection)
        at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)
        at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:703)
    The expected behaviour would be that a running query is aborted with an exception. (BTW: this is what happens if the service is taken down with "shutdown immediate"; that case works fine.)
    We are considering implementing custom ONS listeners [1], but we actually expect that UCP would handle such situations or let us register strategies/callbacks for certain events.
    Our config:
    Oracle Enterprise 11.2.0.4.0 with RAC
    ons.jar 12.1.0.1
    ojdbc6.jar 11.2.0.2
    ucp.jar 12.1.0.1
    Server JRE 1.8.0_25
    Any hints appreciated.
    [1] http://docs.oracle.com/cd/E11882_01/java.112/e16548/apxracfan.htm#JJDBC28945

    Your concept isn't right:
    http://docs.oracle.com/cd/E11882_01/server.112/e25494/restart.htm#ADMIN13178
    Overview of Fast Application Notification
    FAN is a notification mechanism that Oracle Restart can use to notify other processes about configuration changes that include service status changes, such as UP or DOWN events. FAN provides the ability to immediately terminate inflight transaction when an instance or server fails. Integrated Oracle clients receive the events and respond. Applications can respond either by propagating the error to the user or by resubmitting the transactions and masking the error from the application user. When a DOWN event occurs, integrated clients immediately clean up connections to the terminated database. When an UP event occurs, the clients create new connections to the new primary database instance.
    Also, take a look at these docs: http://docs.oracle.com/cd/E11882_01/java.112/e12265/rac.htm#JJUCP08100 ; and https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=890204623685515&id=566573.1&_afrWindowMode=0&_adf.ctrl-s…
    Also run a test: execute a query that takes about 1 minute, and once it is running, power down the node where it is executing to see whether it still retrieves the results.
    Regards.
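
A note on why the thread in the dump above can sit in socketRead0 for minutes: a node that is powered off sends no FIN or RST, so a blocking read without a timeout has nothing to wake it up. The Python sketch below illustrates only that general socket behaviour, using a local socket pair in place of the database connection. It is an analogy, not the JDBC driver's code, and `read_with_timeout` is a name made up for this example. In the JDBC thin driver the comparable knob is, as far as I know, the `oracle.jdbc.ReadTimeout` connection property (in milliseconds); please verify it against the documentation for your driver version.

```python
import socket


def read_with_timeout(sock, nbytes, timeout_s):
    """Read up to nbytes from sock, or return None if nothing arrives in timeout_s seconds."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # Without the timeout, recv() would block here indefinitely,
        # just like the socket read in the thread dump.
        return None


# The peer stays silent, like a node that lost power mid-query.
client, silent_peer = socket.socketpair()
print(read_with_timeout(client, 1, 0.2))   # None: the read gave up instead of hanging
client.close()
silent_peer.close()
```

With a read timeout in place, the application gets an error it can handle (for example by closing the connection and retrying on the surviving instance) instead of waiting for the operating system's TCP timers.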

  • RAC node restart

    Hello everyone,
    I have run into a problem: our RAC node restarts automatically, with the messages below.
    #/u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/alert_odsdb1.log
    Fri Jun 07 12:23:42 2013
    Thread 1 cannot allocate new log, sequence 58363
    Checkpoint not complete
    Current log# 2 seq# 58362 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
    Current log# 2 seq# 58362 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
    Fri Jun 07 12:23:42 2013
    NOTE: ASMB terminating
    Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
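    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 2047 Serial number: 5
    Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 2047 Serial number: 5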
    ASMB (ospid: 32641): terminating the instance due to error 15064
    Fri Jun 07 12:23:44 2013
    ORA-1092 : opitsk aborting process
    Fri Jun 07 12:23:46 2013
    ORA-1092 : opitsk aborting process
    Instance terminated by ASMB, pid = 32641
    Fri Jun 07 12:25:02 2013
    Starting ORACLE instance (normal)
    Fri Jun 07 12:25:23 2013
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
    [name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
    Public Interface 'eth0' configured from GPnP for use as a public interface.
    [name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
    Public Interface 'eth0:1' configured from GPnP for use as a public interface.
    [name='eth0:1', type=1, ip=135.33.2.13, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
    Picked latch-free SCN scheme 3
    Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/11.2.0/dbhome_2/dbs/arch
    Autotune of undo retention is turned on.
    LICENSE_MAX_USERS = 0
    SYS auditing is disabled
    Starting up:
    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options.
    ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
    System name:     Linux
    Node name:     odsdb1
    Release:     2.6.18-308.el5
    Version:     #1 SMP Fri Jan 27 17:17:51 EST 2012
    Machine:     x86_64
    Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/dbhome_2/dbs/initodsdb1.ora
    System parameters with non-default values:
    processes = 4500
    sessions = 6784
    event = ""
    spfile = "+DATA/odsdb/spfileodsdb.ora"
    nls_language = "SIMPLIFIED CHINESE"
    nls_territory = "CHINA"
    memory_target = 170G
    control_files = "+DATA/odsdb/controlfile/current.262.812288837"
    control_files = "+DATA/odsdb/controlfile/current.261.812288837"
    db_block_size = 8192
    compatible = "11.2.0.0.0"
    db_files = 4096
    cluster_database = TRUE
    db_create_file_dest = "+DATA"
    db_recovery_file_dest = ""
    db_recovery_file_dest_size= 38820M
    thread = 1
    undo_tablespace = "UNDOTBS1"
    instance_number = 1
    remote_login_passwordfile= "EXCLUSIVE"
    db_domain = ""
    dispatchers = "(PROTOCOL=TCP) (SERVICE=odsdbXDB)"
    remote_listener = "odsdb-cluster-scan:1521"
    job_queue_processes = 1000
    audit_file_dest = "/u01/app/oracle/admin/odsdb/adump"
    audit_trail = "DB"
    db_name = "odsdb"
    open_cursors = 300
    diagnostic_dest = "/u01/app/oracle"
    Cluster communication is configured to use the following interface(s) for this instance
    169.254.37.103
    cluster interconnect IPC version:Oracle UDP/IP (generic)
    IPC Vendor 1 proto 2
    Fri Jun 07 12:25:33 2013
    PMON started with pid=2, OS id=22959
    Fri Jun 07 12:25:33 2013
    PSP0 started with pid=3, OS id=22962
    Fri Jun 07 12:25:34 2013
    VKTM started with pid=4, OS id=22971 at elevated priority
    VKTM running at (1)millisec precision with DBRM quantum (100)ms
    Fri Jun 07 12:25:34 2013
    GEN0 started with pid=5, OS id=22977
    Fri Jun 07 12:25:34 2013
    DIAG started with pid=6, OS id=22979
    Fri Jun 07 12:25:35 2013
    DBRM started with pid=7, OS id=22981
    Fri Jun 07 12:25:35 2013
    PING started with pid=8, OS id=22983
    Fri Jun 07 12:25:35 2013
    ACMS started with pid=9, OS id=22985
    Fri Jun 07 12:25:35 2013
    DIA0 started with pid=10, OS id=22987
    Fri Jun 07 12:25:35 2013
    LMON started with pid=11, OS id=22989
    Fri Jun 07 12:25:35 2013
    LMD0 started with pid=12, OS id=22991
    * Load Monitor used for high load check
    * New Low - High Load Threshold Range = [61440 - 81920]
    Fri Jun 07 12:25:35 2013
    LMS0 started with pid=13, OS id=22994 at elevated priority
    Fri Jun 07 12:25:35 2013
    LMS1 started with pid=14, OS id=22998 at elevated priority
    Fri Jun 07 12:25:35 2013
    LMS2 started with pid=15, OS id=23002 at elevated priority
    Fri Jun 07 12:25:35 2013
    LMS3 started with pid=16, OS id=23006 at elevated priority
    Fri Jun 07 12:25:35 2013
    RMS0 started with pid=17, OS id=23010
    Fri Jun 07 12:25:35 2013
    LMHB started with pid=18, OS id=23013
    Fri Jun 07 12:25:35 2013
    MMAN started with pid=19, OS id=23015
    Fri Jun 07 12:25:35 2013
    DBW0 started with pid=20, OS id=23017
    Fri Jun 07 12:25:35 2013
    DBW1 started with pid=21, OS id=23019
    Fri Jun 07 12:25:35 2013
    DBW2 started with pid=22, OS id=23022
    Fri Jun 07 12:25:35 2013
    DBW3 started with pid=23, OS id=23024
    Fri Jun 07 12:25:35 2013
    DBW4 started with pid=24, OS id=23026
    Fri Jun 07 12:25:35 2013
    DBW5 started with pid=25, OS id=23028
    Fri Jun 07 12:25:35 2013
    DBW6 started with pid=26, OS id=23031
    Fri Jun 07 12:25:35 2013
    DBW7 started with pid=27, OS id=23033
    Fri Jun 07 12:25:35 2013
    LGWR started with pid=28, OS id=23035
    Fri Jun 07 12:25:35 2013
    CKPT started with pid=29, OS id=23037
    Fri Jun 07 12:25:35 2013
    SMON started with pid=30, OS id=23039
    Fri Jun 07 12:25:35 2013
    RECO started with pid=31, OS id=23041
    Fri Jun 07 12:25:35 2013
    RBAL started with pid=32, OS id=23043
    Fri Jun 07 12:25:35 2013
    ASMB started with pid=33, OS id=23045
    Fri Jun 07 12:25:35 2013
    MMON started with pid=34, OS id=23048
    Fri Jun 07 12:25:35 2013
    MMNL started with pid=35, OS id=23052
    Fri Jun 07 12:25:35 2013
    starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
    NOTE: initiating MARK startup
    starting up 1 shared server(s) ...
    Starting background process MARK
    Fri Jun 07 12:25:35 2013
    MARK started with pid=37, OS id=23056
    NOTE: MARK has subscribed
    lmon registered with NM - instance number 1 (internal mem no 0)
    Reconfiguration started (old inc 0, new inc 119)
    List of instances:
    1 2 (myinst: 1)
    Global Resource Directory frozen
    * allocate domain 0, invalid = TRUE
    Communication channels reestablished
    * domain 0 valid according to instance 2
    * domain 0 valid = 1 according to instance 2
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Submitted all GCS remote-cache requests
    Fix write in gcs resources
    Reconfiguration started (old inc 119, new inc 121)
    List of instances:
    1 2 (myinst: 1)
    Nested reconfiguration detected.
    Global Resource Directory frozen
    Communication channels reestablished
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Fri Jun 07 12:25:45 2013
    Submitted all GCS remote-cache requests
    Fri Jun 07 12:26:08 2013
    Fix write in gcs resources
    Reconfiguration complete
    Fri Jun 07 12:26:10 2013
    LCK0 started with pid=40, OS id=23632
    Fri Jun 07 12:26:10 2013
    Starting background process RSMN
    Fri Jun 07 12:26:10 2013
    RSMN started with pid=41, OS id=23646
    ORACLE_BASE not set in environment. It is recommended
    that ORACLE_BASE be set in the environment
    Reusing ORACLE_BASE from an earlier startup = /u01/app/oracle
    Fri Jun 07 12:26:11 2013
    ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=135.33.2.13)(PORT=1521))))' SCOPE=MEMORY SID='odsdb1';
    ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
    Fri Jun 07 12:26:11 2013
    NOTE: Loaded library: System
    Fri Jun 07 12:26:11 2013
    SUCCESS: diskgroup DATA was mounted
    Fri Jun 07 12:26:11 2013
    NOTE: dependency between database odsdb and diskgroup resource ora.DATA.dg is established
    Fri Jun 07 12:26:16 2013
    Successful mount of redo thread 1, with mount id 3452000551
    Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
    Lost write protection disabled
    Completed: ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
    ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
    Picked broadcast on commit scheme to generate SCNs
    Thread 1 advanced to log sequence 58364 (thread open)
    Thread 1 opened at log sequence 58364
    Current log# 2 seq# 58364 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
    Current log# 2 seq# 58364 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
    Successful open of redo thread 1
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    Fri Jun 07 12:26:21 2013
    SMON: enabling cache recovery
    Fri Jun 07 12:26:23 2013
    minact-scn: Inst 1 is a slave inc#:121 mmon proc-id:23048 status:0x2
    minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
    Fri Jun 07 12:26:34 2013
    [23651] Successfully onlined Undo Tablespace 2.
    Undo initialization finished serial:0 start:2061372614 end:2061384964 diff:12350 (123 seconds)
    Verifying file header compatibility for 11g tablespace encryption..
    Verifying 11g file header compatibility for tablespace encryption completed
    Fri Jun 07 12:26:34 2013
    SMON: enabling tx recovery
    Database Characterset is ZHS16GBK
    No Resource Manager plan active
    Starting background process GTX0
    Fri Jun 07 12:26:35 2013
    GTX0 started with pid=45, OS id=23931
    Starting background process RCBG
    Fri Jun 07 12:26:35 2013
    RCBG started with pid=46, OS id=23933
    replication_dependency_tracking turned off (no async multimaster replication found)
    Starting background process QMNC
    Fri Jun 07 12:26:35 2013
    QMNC started with pid=48, OS id=23940
    Completed: ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
    Fri Jun 07 12:26:38 2013
    Starting background process CJQ0
    Fri Jun 07 12:26:38 2013
    CJQ0 started with pid=55, OS id=23977
    Fri Jun 07 12:27:56 2013
    Thread 1 advanced to log sequence 58365 (LGWR switch)
    Current log# 1 seq# 58365 mem# 0: +DATA/odsdb/onlinelog/group_1.263.812288839
    Current log# 1 seq# 58365 mem# 1: +DATA/odsdb/onlinelog/group_1.264.812288839
    Fri Jun 07 12:28:18 2013
    Starting background process SMCO
    Fri Jun 07 12:28:18 2013
    SMCO started with pid=70, OS id=25166
    Fri Jun 07 12:29:01 2013
    Thread 1 cannot allocate new log, sequence 58366
    Trace file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc
    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options
    ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
    System name: Linux
    Node name: odsdb1
    Release: 2.6.18-308.el5
    Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
    Machine: x86_64
    Instance name: odsdb1
    Redo thread mounted by this instance: 0 <none>
    Oracle process number: 33
    Unix process pid: 32641, image: oracle@odsdb1 (ASMB)
    *** 2013-05-14 15:37:08.705
    *** SESSION ID:(3499.1) 2013-05-14 15:37:08.705
    *** CLIENT ID:() 2013-05-14 15:37:08.705
    *** SERVICE NAME:() 2013-05-14 15:37:08.705
    *** MODULE NAME:() 2013-05-14 15:37:08.705
    *** ACTION NAME:() 2013-05-14 15:37:08.705
    NOTE: initiating MARK startup
    *** 2013-05-14 15:37:16.835
    instance health monitoring reports instance shutting down
    *** 2013-06-07 12:23:42.700
    NOTE: ASMB terminating
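    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 2047 Serial number: 5
    error 15064 detected in background process
    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 2047 Serial number: 5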
    kjzduptcctx: Notifying DIAG for crash event
    ----- Abridged Call Stack Trace -----
    ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53<-ksuitm()+1332<-ksbrdp()+3344<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
    ----- End of Abridged Call Stack Trace -----
    *** 2013-06-07 12:23:42.783
    ASMB (ospid: 32641): terminating the instance due to error 15064
    /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
    NOTE: ASMB process exiting, either shutdown is in progress
    NOTE: or foreground connected to ASMB was killed.
    Fri Jun 07 12:23:42 2013
    NOTE: client exited [14808]
    Fri Jun 07 12:23:44 2013
    Received an instance abort message from instance 2
    Please check instance 2 alert and LMON trace files for detail.
    Fri Jun 07 12:23:44 2013
    Received an instance abort message from instance 2
    Please check instance 2 alert and LMON trace files for detail.
    LMD0 (ospid: 31201): terminating the instance due to error 481
    Instance terminated by LMD0, pid = 31201
    Fri Jun 07 12:24:30 2013
    * instance_number obtained from CSS = 1, checking for the existence of node 0...
    * node 0 does not exist. instance_number = 1
    Starting ORACLE instance (normal)
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
    [name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
    Public Interface 'eth0' configured from GPnP for use as a public interface.
    [name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
    Picked latch-free SCN scheme 3
    Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/11.2.0.2/grid/dbs/arch
    Autotune of undo retention is turned on.
    LICENSE_MAX_USERS = 0
    [grid@odsdb1 cssd]$ file core.30481
    core.30481: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'ocssd.bin'
    [grid@odsdb1 cssd]$ gdb
    gdb gdbserver gdbtui
    [grid@odsdb1 cssd]$ gdb ocssd.bin core.30481
    GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5)
    Copyright (C) 2009 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /u01/app/11.2.0.2/grid/bin/ocssd.bin...(no debugging symbols found)...done.
    [New Thread 30486]
    [New Thread 30530]
    [New Thread 30526]
    [New Thread 30525]
    [New Thread 30523]
    [New Thread 30522]
    [New Thread 30521]
    [New Thread 30520]
    [New Thread 30519]
    [New Thread 30504]
    [New Thread 30503]
    [New Thread 30495]
    [New Thread 30485]
    [New Thread 30484]
    [New Thread 30483]
    [New Thread 30481]
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libhasgen11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libhasgen11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libocr11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocr11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrb11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrb11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrutl11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrutl11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxn2.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxn2.so
    Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libdl.so.2
    Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libm.so.6
    Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
    [Thread debugging using libthread_db enabled]
    Loaded symbols for /lib64/libpthread.so.0
    Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libnsl.so.1
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libcell11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libcell11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxp11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxp11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libnnz11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnnz11.so
    Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libc.so.6
    Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib64/libaio.so.1
    Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib64/ld-linux-x86-64.so.2
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libnque11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnque11.so
    Reading symbols from /opt/oracle/extapi/64/asm/orcl/1/libasm.so...(no debugging symbols found)...done.
    Loaded symbols for /opt/oracle/extapi/64/asm/orcl/1/libasm.so
    warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff505fd000
    Core was generated by `/u01/app/11.2.0.2/grid/bin/ocssd.bin '.
    Program terminated with signal 6, Aborted.
    #0 0x000000369ea30265 in raise () from /lib64/libc.so.6
    (gdb) where
    #0 0x000000369ea30265 in raise () from /lib64/libc.so.6
    #1 0x000000369ea31d10 in abort () from /lib64/libc.so.6
    #2 0x00002afc67f9aeda in scls_abort (flags=0) at scls.c:7088
    #3 0x000000000040babd in clssscExit (thrd=0x10d325a0, status=clssscreasonSHUTNORM) at clsssc.c:2155
    #4 0x0000000000446221 in clssgmClientShutdown (thrd=0x10d325a0, cmInfo=0x10b40090) at clssgmc.c:6415
    #5 0x0000000000436707 in clssgmProcClientReqs (thrd=0x10d325a0, clctx=0x10b40630) at clssgmc.c:704
    #6 0x0000000000436405 in clssgmclientlsnr (thrd=0x10d325a0) at clssgmc.c:644
    #7 0x000000000040ac2f in clssscthrdmain (thrd=0x10d325a0) at clsssc.c:1716
    #8 0x000000369fa0677d in start_thread () from /lib64/libpthread.so.0
    #9 0x000000369ead49ad in clone () from /lib64/libc.so.6
    (gdb)
    2013-06-07 12:19:37.377: [    CSSD][1085888832]clssscSelect: cookie accept request 0x10b40630
    2013-06-07 12:19:37.377: [    CSSD][1085888832]clssgmAllocProc: (0x2aaab0133ea0) allocated
    2013-06-07 12:19:37.379: [    CSSD][1085888832]clssgmClientConnectMsg: properties of cmProc 0x2aaab0133ea0 - 1,2,3,4,5
    2013-06-07 12:19:37.379: [    CSSD][1085888832]clssgmClientConnectMsg: Connect from con(0x6ae44fa) proc(0x2aaab0133ea0) pid(14139/14139) version 11:2:1:4, properties: 1,2,3,4,5
    2013-06-07 12:19:37.379: [    CSSD][1085888832]clssgmClientConnectMsg: msg flags 0x0000
    2013-06-07 12:19:37.384: [    CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
    2013-06-07 12:19:37.384: [    CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
    2013-06-07 12:19:37.384: [    CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(1/0x2aaab010c5c0)
    2013-06-07 12:19:37.385: [    CSSD][1085888832]clssgmRegisterShared: grp DBODSDB, mbr 0, type 1
    2013-06-07 12:19:37.385: [    CSSD][1085888832]clssgmQueueShare: (0x2aaab0085790) target global grock DBODSDB member 0 type 1 queued from client (0x2aaab010c5c0), global grock DBODSDB, refcount 23
    2013-06-07 12:19:37.385: [    CSSD][1085888832]clssgmRegisterShared: global grock DBODSDB member 0 share type 1, refcount 23
    2013-06-07 12:19:37.391: [    CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
    2013-06-07 12:19:37.391: [    CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
    2013-06-07 12:19:37.391: [    CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(2/0x2aaab0061f10)
    What is the problem?
    Edited by: 徐振富 on 2013-6-7 6:38 PM
    Edited by: 徐振富 on 2013-6-7 6:45 PM

    Is your ASM instance up?
    If not, try bringing up the ASM instance by itself and see whether it throws any errors.
    Please post the output of crsctl status cluster -all

  • RAC node outage causes SOA Suite 10.1.3.4 BPEL failure

    We are using WebLogic 9.2 and SOA Suite 10.1.3.4 with a 10g Oracle RAC (2 nodes). The WebLogic cluster has a multi data source of 2 pools, each pool pointing to a single node in the RAC, each pool deployed to the cluster, and the multi data source in load-balancing mode.
    The other night, one of the DB nodes had a hardware failure (ironically, with a remote monitoring/management card). Annoying, but it should not have left the BPEL servers in "FAILED NOT RESTARTABLE" status the next morning.
    <Jun 9, 2009 12:10:07 AM EDT> <Warning> <JDBC> <BEA-001129> <Received exception while creating connection for pool "esbaqds2": Io exception: The Network Adapter could not establish the connection>
    SEVERE: Destroying JMSDequeuer failed
    oracle.jms.AQjmsException: Connection has been administratively destroyed. Reconnect.
    at oracle.jms.AQjmsSession.preClose(AQjmsSession.java:980)
    at oracle.jms.AQjmsObject.close(AQjmsObject.java:409)
    at oracle.jms.AQjmsSession.close(AQjmsSession.java:1020)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.destroy(JMSDequeuer.java:419)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.destroyWithoutUnsubscribing(JMSDequeuer.java:395)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:175)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
    This was followed by a 2 GB log file containing 1.3 million iterations of the following exception within the next 10 minutes, before the managed servers failed.
    java.lang.NullPointerException
    at java.lang.String.<init>(String.java:144)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:168)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
    Both managed instances of the BPEL cluster failed, even though the 1st node of the Oracle RAC was still available.
    Our 10.3 cluster, which also uses multi data sources to the same RAC for the OSB components, simply went on about its business using the remaining RAC node pool.
    This seems to be a single point of failure...

    We haven't changed the JDBC connection string yet, but we did run a test in the same environment while Oracle support considers the situation.
    For the test, we simply shut down one node of the RAC and watched what happened. Within a minute, the JDBC "Failed Reserve Request Count" was increasing by thousands on every refresh of the screen. We restarted the RAC node after 5 minutes, by which time the "Failed Reserve Request Count" was over 190,000.
    The 2 BPEL managed servers remained in Running status, and each created a 660 MB log file within those 5 minutes. In the original outage, the nodes were down for about 15 minutes. Most of the logging is generated from within the oracle.tip.esb classes, not by the WebLogic classes. It looks as if, once the pool pointing to the downed RAC node becomes disabled, the Oracle BPEL code still tries to use it even though the multi data source JNDI name is the published lookup:
    INFO: JMSDequeuer::createConnection - AQ Topics
    java.sql.SQLException: weblogic.common.ResourceException:
    esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
    esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
    at weblogic.jdbc.common.internal.JDBCUtil.wrapAndThrowResourceException(JDBCUtil.java:250)
    at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:348)
    at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:364)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:559)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
    Caused by: weblogic.common.ResourceException:
    esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
    esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
    at weblogic.jdbc.common.internal.MultiPool.searchLoadBalance(MultiPool.java:331)
    at weblogic.jdbc.common.internal.MultiPool.findPool(MultiPool.java:202)
    at weblogic.jdbc.common.internal.ConnectionPoolManager.reserve(ConnectionPoolManager.java:77)
    at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:346)
    ... 11 more
    SEVERE: Failed to process deferred message
    oracle.tip.esb.server.dispatch.QueueHandlerException: Error creating "weblogic.common.ResourceException: No good connections available."
    at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:661)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
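
    To put the figures quoted in this post side by side, here is a rough back-of-the-envelope calculation in Python. It only uses the numbers given above; the assumption that "2 GB" and "660 MB" mean binary units is mine, so treat the results as approximate.

```python
# Back-of-the-envelope rates from the figures quoted in this post.
# Assumption: "2 GB" and "660 MB" are binary units (GiB / MiB).

def per_second(count, minutes):
    """Average events (or bytes) per second over a window given in minutes."""
    return count / (minutes * 60)

# Original outage: 1.3 million NullPointerException iterations in 10 minutes.
npe_rate = per_second(1_300_000, 10)

# Test outage: "Failed Reserve Request Count" passed 190,000 in 5 minutes.
reserve_fail_rate = per_second(190_000, 5)

# Test outage: each managed server wrote a 660 MB log in 5 minutes.
log_bytes_per_sec = per_second(660 * 1024**2, 5)

# Average log volume per logged iteration in the original outage.
bytes_per_iteration = (2 * 1024**3) / 1_300_000

print(f"NPE iterations per second:     {npe_rate:,.0f}")
print(f"Failed reserves per second:    {reserve_fail_rate:,.0f}")
print(f"Log growth (MiB per second):   {log_bytes_per_sec / 1024**2:.1f}")
print(f"Bytes logged per iteration:    {bytes_per_iteration:,.0f}")
```

    Rates of this size suggest a retry loop with no delay between attempts, which matches the repeated JMSDequeuer.dequeue frames in the traces.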

  • RAC node restarting!

    Hi,
    One of our RAC environments keeps restarting.
    I've disabled init.cssd, init.crs and init.evmd in /etc/inittab in order to check the logs.
    This is the situation:
    crsd.log:
    2009-02-04 00:09:00.118: [ COMMCRS][9]clsc_connect: (8000000100318640) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
    2009-02-04 00:09:00.132: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
    2009-02-04 00:09:00.134: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
    2009-02-04 00:09:08.016: [    CRSD][1]32Daemon Version: 10.2.0.2.0 Active Version: 10.2.0.2.0
    2009-02-04 00:09:08.016: [    CRSD][1]32Active Version and Software Version are same
    2009-02-04 00:09:08.017: [ CRSMAIN][1]32Initializing OCR
    2009-02-04 00:09:08.037: [  OCRRAW][1]proprioo: for disk 0 (/dev/rdsk/ora_ocr_raw), id match (1), my id set (752560621,1028247821) total id sets (1), 1st set
    (752560621,1028247821), 2nd set (0,0) my votes (2), total votes (2)
    2009-02-04 00:09:08.140: [ CSSCLNT][24]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
    ocssd.log:
    [    CSSD]2009-02-03 21:52:08.651 [9] >USER: clssnmHandleUpdate: NODE 1 (node1l) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2009-02-03 21:52:08.651 [9] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
    [    CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmWaitForAcks: done, msg type(15)
    [    CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmDoSyncUpdate: Sync Complete!
    [    CSSD]2009-02-03 21:52:08.722 [1] >USER: NMEVENT_SUSPEND [00][00][00][00]
    [    CSSD]2009-02-03 21:52:08.724 [17] >TRACE: clssgmReconfigThread: started for reconfig (1)
    [    CSSD]2009-02-03 21:52:08.749 [17] >USER: NMEVENT_RECONFIG [00][00][00][02]
    [    CSSD]2009-02-03 21:52:08.749 [17] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 1
    [    CSSD]2009-02-03 21:52:08.751 [13] >TRACE: clssgmPeerListener: connects done (1/1)
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmEstablishMasterNode: MASTER for 1 is node(1) birth(1)
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
    [    CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes
    [    CSSD]CLSS-3001: local node number 1, master node number 1
    [    CSSD]2009-02-03 21:52:08.753 [17] >TRACE: clssgmReconfigThread: completed for reconfig(1), with status(1)
    [    CSSD]2009-02-03 21:52:08.863 [10] >TRACE: clssgmClientConnectMsg: Connect from con(80000001008fd2a0) proc(8000000100ae26a8) pid() proto(10:2:1:1)
    [    CSSD]2009-02-03 21:52:08.864 [10] >TRACE: clssgmClientConnectMsg: Connect from con(8000000100ae0128) proc(8000000100ae2a10) pid() proto(10:2:1:1) from con(8000000100aa32c0) proc(8000000100aa5b90) pid() proto(10:2:1:1)
    alertlog:
    [cssd(2535)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 .
    2009-02-03 23:55:20.821
    [cssd(2575)]CRS-1605:CSSD voting file is online: /dev/rdsk/ora_voting_raw. Details in /work/crs/product/10.2/crs/log/lourmel/cssd/ocssd.log.
    2009-02-03 23:55:28.376
    evmd.log:
    Oracle Database 10g CRS Release 10.2.0.2.0 Production Copyright 1996, 2004, Oracle. All rights reserved
    2009-02-04 00:08:58.331: [    EVMD][1]32EVMD waiting for CSS to be ready err = 3
    2009-02-04 00:08:59.939: [ COMMCRS][9]clsc_connect: (800000010007d658) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
    2009-02-04 00:08:59.946: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
    2009-02-04 00:08:59.948: [    EVMD][1]32EVMD waiting for CSS to be ready err = 3
    2009-02-04 00:09:07.596: [ CSSCLNT][1]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
    syslog:
    Feb 4 00:08:41 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
    Feb 4 00:08:45 lourmel sfd[2153]: starting the daemon.
    Feb 4 00:08:45 lourmel su: + tty?? root-orac
    Feb 4 00:08:45 lourmel krsd[2152]: Delay time is 300 seconds
    Feb 4 00:08:43 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
    Feb 4 00:08:52 lourmel above message repeats 2 times
    Feb 4 00:08:52 lourmel syslog: Cluster Ready Services completed waiting on dependencies.
    Feb 4 00:08:53 lourmel syslog: Running CRSD with TZ =
    When I ran crs_stat (before the restart), I got this message:
    ORA-0184: Cannot communicate with CRS
    crsctl check crs gives us:
    Failure 1 contacting CSS daemon
    Cannot communicate with CRS
    Cannot communicate with EVM
    As I said before, the machine keeps restarting.
    Does anyone have an idea? Please help.
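
    For anyone checking this repeatedly, a small Python sketch that classifies the crsctl check crs output per daemon may help. The failure lines are the ones quoted above; the healthy wording ("... appears healthy") is an assumption based on 10g releases and is not shown in this thread.

```python
# Sketch: classify `crsctl check crs` output per daemon.
# Assumption: a healthy daemon is reported as "<NAME> appears healthy";
# any other line that mentions the daemon is treated as a failure.

DAEMONS = ("CSS", "CRS", "EVM")

def daemon_status(output):
    """Return {daemon: True/False/None}; None means no line mentioned it."""
    status = {name: None for name in DAEMONS}
    for line in output.splitlines():
        words = line.split()
        for name in DAEMONS:
            if name in words:
                status[name] = "appears healthy" in line
    return status

# The output quoted in this post.
sample = """Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM"""

for name, healthy in daemon_status(sample).items():
    print(name, "OK" if healthy else "NOT reachable")
```

    Since CSS fails first and both CRS and EVM log that they are waiting for CSS, the ocssd.log is probably the place to start.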

    Dear All,
    I recently upgraded a few RAC setups to Oracle 10g Patchset 3 (10.2.0.4) on Linux servers.
    In one of the RAC setups, I found the servers rebooting daily. The same setup had been working fine, and the problem started only after applying the patchset. I checked all the logs and found nothing relevant.
    Then I checked what had been added with this patchset.
    The most interesting finding: Oracle added a new daemon, oprocd.
    # ps -efl | grep oprocd
    4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -hsi 5:10:50:75:90 -f
    These are the interesting points about the line above:
    1. The process runs as the root user.
    2. It runs with the highest priority (-40).
    3. It probes every second (-t 1000).
    4. It waits 500 milliseconds for a CPU response (-m 500 means the margin time is 500 milliseconds).
    5. It runs in fatal mode (-f).
    From these points I conclude the following: this daemon probes the CPU every second and waits for a response within 500 milliseconds. If it gets no response from the CPU within 500 milliseconds, it assumes the CPU is hung and tries to reboot the machine. The operating system does not get enough time to write the system logs before the server reboots.
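
    The check described above can be sketched in Python as follows. This is only an illustration of the idea and not oprocd's actual code: it assumes the daemon sleeps for the -t interval and treats the node as hung when it wakes up more than the -m margin late. The function names and the sample timings are hypothetical.

```python
# Illustration only, NOT oprocd's real implementation.
# Assumption: the daemon sleeps for interval_ms and treats the node as hung
# if the wake-up arrives more than margin_ms late.

def is_hung(interval_ms, margin_ms, elapsed_ms):
    """True if a sleep that should last interval_ms overshot the margin."""
    return elapsed_ms - interval_ms > margin_ms

def first_fatal_tick(interval_ms, margin_ms, elapsed_samples):
    """Index of the first tick that would trigger a reboot, or None."""
    for index, elapsed in enumerate(elapsed_samples):
        if is_hung(interval_ms, margin_ms, elapsed):
            return index
    return None

# Hypothetical measured sleep times in ms; the third one is 800 ms late,
# for example because of a short scheduling stall on a busy server.
samples = [1002, 1010, 1800, 1005]

print(first_fatal_tick(1000, 500, samples))    # prints 2 (margin -m 500)
print(first_fatal_tick(1000, 10000, samples))  # prints None (margin -m 10000)
```

    With a 500 ms margin a single short stall is enough to trigger a reboot, while a 10 second margin tolerates it.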
    So the solution is to increase the margin time from 500 milliseconds to 10 seconds.
    The following steps increase the margin time.
    Please remember: the modification needs downtime, and you need to stop the cluster service on all member nodes.
    1. Stop the CRS process
    #crsctl stop crs
    #<CRS_HOME>/bin/oprocd stop
    2. Ensure that the Clusterware stack is down and not running
    #ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
    This should return no processes.
    3. From one node of the cluster, change the value of the "diagwait" parameter to 13 by issuing the command as root:
    #crsctl set css diagwait 13 -force
    4. Check if diagwait is successfully set.
    #crsctl get css diagwait
    5. Restart the Oracle Clusterware on all the nodes by executing:
    #crsctl start crs
    (Note: if you face any problem restarting the CRS services, ASM and the database, you can reboot the nodes. The cluster and database will come up automatically through the init startup scripts.)
    6. The oprocd daemon process will now show -m 10000
    # ps -efl| grep oprocd
    # 4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f
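
    To verify the change on several nodes, the -t and -m values can be pulled out of the ps line with a short Python sketch like the one below. The two sample lines are the ones shown in this post; the function name is my own.

```python
import re

def oprocd_settings(ps_line):
    """Extract the -t (interval) and -m (margin) values, in ms, from a ps line."""
    match = re.search(r"oprocd\.bin run -t (\d+) -m (\d+)", ps_line)
    if match is None:
        return None  # not an oprocd line, or an unexpected argument order
    return {"interval_ms": int(match.group(1)), "margin_ms": int(match.group(2))}

# The ps output before and after setting diagwait, as quoted in this post.
before = ("4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 "
          "/opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -hsi 5:10:50:75:90 -f")
after = ("4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 "
         "/opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f")

print(oprocd_settings(before))
print(oprocd_settings(after))
```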
    Rollback procedure:
    If you need to unset the diagwait value for any reason:
    #crsctl unset css diagwait
    I am confident the abnormal RAC node restart problem will be solved by this workaround.
    Regards,
    Sumit
    Bangalore,India

  • RAC node rebooting frequently

    Hi all,
    I am working on a two-node RAC environment. One of my RAC nodes is rebooting very frequently. I am using Oracle 10g database and clusterware (10.2.0.1).
    I have checked the OS logs (Linux AS 4) and the RAC-related logs, but I am not able to find anything. I am posting all the logs; please suggest.

    I am posting the alert log, OS log and ocssd logs.
    Clusterware alert log:
    [crsd(5649)]CRS-1201:CRSD started on node ctmisdb1.
    2012-03-21 09:50:38.188
    [cssd(7490)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 .
    2012-03-21 09:50:46.726
    [crsd(5649)]CRS-1204:Recovering CRS resources for node ctmisdb2.
    2012-03-21 09:55:21.760
    [cssd(7490)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
    2012-03-21 12:07:46.681
    [cssd(7426)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
    2012-03-21 12:07:50.432
    [cssd(7426)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
    2012-03-21 12:07:50.893
    [crsd(5549)]CRS-1012:The OCR service started on node ctmisdb1.
    2012-03-21 12:07:50.942
    [evmd(7304)]CRS-1401:EVMD started on node ctmisdb1.
    2012-03-21 12:07:52.827
    [crsd(5549)]CRS-1201:CRSD started on node ctmisdb1.
    2012-03-21 12:48:41.908
    [cssd(7448)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
    2012-03-21 12:48:45.741
    [cssd(7448)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
    2012-03-21 12:48:49.173
    [crsd(5546)]CRS-1012:The OCR service started on node ctmisdb1.
    2012-03-21 12:48:49.190
    [evmd(7328)]CRS-1401:EVMD started on node ctmisdb1.
    2012-03-21 12:48:50.818
    [crsd(5546)]CRS-1201:CRSD started on node ctmisdb1.
    2012-03-21 13:26:36.398
    [cssd(7343)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
    2012-03-21 13:26:40.492
    [cssd(7343)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
    2012-03-21 13:26:40.939
    [crsd(5542)]CRS-1012:The OCR service started on node ctmisdb1.
    2012-03-21 13:26:40.977
    [evmd(7223)]CRS-1401:EVMD started on node ctmisdb1.
    2012-03-21 13:26:42.772
    [crsd(5542)]CRS-1201:CRSD started on node ctmisdb1.
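
    One way to see how often the node comes back up is to pull the CSSD start times out of the alert log above. The Python sketch below assumes that each CRS-1605 message marks a fresh CSSD start (the cssd pid changes each time) and that the timestamp sits on the line before the message, as in this log. The sample is abridged from the lines above.

```python
from datetime import datetime

def cssd_start_times(alert_log_text):
    """Timestamps that precede each CRS-1605 (voting file online) message."""
    starts = []
    last_ts = None
    for line in alert_log_text.splitlines():
        line = line.strip()
        try:
            # Timestamp lines look like "2012-03-21 12:07:46.681".
            last_ts = datetime.strptime(line, "%Y-%m-%d %H:%M:%S.%f")
            continue
        except ValueError:
            pass  # not a timestamp line, so treat it as a message line
        if "CRS-1605" in line and last_ts is not None:
            starts.append(last_ts)
    return starts

# Abridged from the alert log in this post.
sample = """\
2012-03-21 12:07:46.681
[cssd(7426)]CRS-1605:CSSD voting file is online: /dev/raw/raw2.
2012-03-21 12:07:50.432
[cssd(7426)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
2012-03-21 12:48:41.908
[cssd(7448)]CRS-1605:CSSD voting file is online: /dev/raw/raw2.
2012-03-21 13:26:36.398
[cssd(7343)]CRS-1605:CSSD voting file is online: /dev/raw/raw2.
"""

starts = cssd_start_times(sample)
for earlier, later in zip(starts, starts[1:]):
    minutes = (later - earlier).total_seconds() / 60
    print(f"{earlier:%H:%M:%S} -> {later:%H:%M:%S}: {minutes:.1f} min")
```

    Comparing these start times with the "syslogd 1.4.1: restart." entries in the OS log should show whether every CSSD start follows a full node reboot.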
    Node OS log:
    Mar 21 12:06:35 ctmisdb1 rc: Starting readahead: succeeded
    Mar 21 12:06:35 ctmisdb1 messagebus: messagebus startup succeeded
    Mar 21 12:06:36 ctmisdb1 cups-config-daemon: cups-config-daemon startup succeeded
    Mar 21 12:06:36 ctmisdb1 haldaemon: haldaemon startup succeeded
    Mar 21 12:06:37 ctmisdb1 fstab-sync[6267]: removed all generated mount points
    Mar 21 12:06:37 ctmisdb1 fstab-sync[6378]: added mount point /media/cdrecorder for /dev/hde
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6323]: session opened for user oracle by (uid=0)
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6324]: session opened for user oracle by (uid=0)
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6229]: session opened for user oracle by (uid=0)
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6229]: session closed for user oracle
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6644]: session opened for user oracle by (uid=0)
    Mar 21 12:06:37 ctmisdb1 kernel: matroxfb: cannot set xres to 800, rounded up to 832
    Mar 21 12:06:37 ctmisdb1 last message repeated 2 times
    Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6323]: session closed for user oracle
    Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6644]: session closed for user oracle
    Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6324]: session closed for user oracle
    Mar 21 12:06:41 ctmisdb1 logger: Cluster Ready Services completed waiting on dependencies.
    Mar 21 12:06:41 ctmisdb1 last message repeated 2 times
    Mar 21 12:06:45 ctmisdb1 gdm(pam_unix)[6379]: session opened for user root by (uid=0)
    Mar 21 12:06:46 ctmisdb1 gconfd (root-7052): starting (version 2.8.1), pid 7052 user 'root'
    Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
    Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 1
    Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
    Mar 21 12:06:55 ctmisdb1 gconfd (root-7052): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 0
    Mar 21 12:07:41 ctmisdb1 su(pam_unix)[5547]: session opened for user oracle by (uid=0)
    Mar 21 12:07:41 ctmisdb1 logger: Running CRSD with TZ =
    Mar 21 12:07:43 ctmisdb1 su(pam_unix)[7399]: session opened for user oracle by (uid=0)
    Mar 21 12:12:49 ctmisdb1 sshd(pam_unix)[15323]: session opened for user root by root(uid=0)
    Mar 21 12:12:57 ctmisdb1 su(pam_unix)[15531]: session opened for user oracle by root(uid=0)
    Mar 21 12:47:05 ctmisdb1 syslogd 1.4.1: restart.
    ocssd log:
    [    CSSD]2012-03-21 11:24:41.045 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661f0c0) proc(0x8006622560) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 11:24:41.078 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660cfe0) proc(0x800662ba70) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:07:44.564 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
    [  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=ctmisdb1DBG_CSSD))
    [    CSSD]2012-03-21 12:07:44.564 >USER: CSS daemon log for node ctmisdb1, number 1, in cluster crs
    [    CSSD]2012-03-21 12:07:44.581 [28260544] >TRACE: clssscmain: local-only set to false
    [    CSSD]2012-03-21 12:07:44.603 [28260544] >TRACE: clssnmReadNodeInfo: added node 1 (ctmisdb1) to cluster
    [    CSSD]2012-03-21 12:07:44.621 [28260544] >TRACE: clssnmReadNodeInfo: added node 2 (ctmisdb2) to cluster
    [    CSSD]2012-03-21 12:07:44.627 [72925824] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
    [    CSSD]2012-03-21 12:07:44.627 [28260544] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
    [    CSSD]2012-03-21 12:07:44.641 [28260544] >TRACE: clssnmInitNMInfo: misscount set to 60
    [    CSSD]2012-03-21 12:07:44.655 [28260544] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2)
    [    CSSD]2012-03-21 12:07:46.661 [72925824] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2)
    [    CSSD]2012-03-21 12:07:46.690 [72925824] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(18) wrtcnt(7920) LATS(0) Disk lastSeqNo(7920)
    [    CSSD]2012-03-21 12:07:46.752 [28260544] >TRACE: clssnmFatalInit: fatal mode enabled
    [    CSSD]2012-03-21 12:07:46.752 [94777984] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
    [    CSSD]2012-03-21 12:07:46.753 [94777984] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
    [    CSSD]2012-03-21 12:07:46.753 [94777984] >TRACE: clssnmClusterListener: Probing node(2)
    [    CSSD]2012-03-21 12:07:46.755 [94777984] >TRACE: clssnmConnComplete: connected to node 2 (con 0x8006601040), state 3 birth 0, unique 1332303918/1332303918 prevConuni(0)
    [    CSSD]2012-03-21 12:07:46.756 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
    [    CSSD]2012-03-21 12:07:46.756 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ctmisdb1_crs))
    [    CSSD]2012-03-21 12:07:46.757 [151810688] >TRACE: clssnmPollingThread: Connection complete
    [    CSSD]2012-03-21 12:07:46.757 [162296448] >TRACE: clssnmSendingThread: Connection complete
    [    CSSD]2012-03-21 12:07:46.757 [172782208] >TRACE: clssnmRcfgMgrThread: Connection complete
    [    CSSD]2012-03-21 12:07:46.757 [172782208] >TRACE: clssnmRcfgMgrThread: Local Join
    [    CSSD]2012-03-21 12:07:46.757 [172782208] >WARNING: clssnmLocalJoinEvent: takeover aborted due to connected but inactive nodes
    [    CSSD]2012-03-21 12:07:47.339 [94777984] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[ctmisdb2] seq[5] sync[18]
    [    CSSD]2012-03-21 12:07:47.759 [172782208] >TRACE: clssnmRcfgMgrThread: lastleader(2) unique(1332311864)
    [    CSSD]2012-03-21 12:07:48.341 [94777984] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(18)
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmDeactivateNode: node 0 () left cluster
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1332311864/1332311864) prevConuni(0) birth (0/18) (old/new)
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1332303918/1332303918) prevConuni(0) birth (0/16) (old/new)
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: SYNC(18) from node(2) completed
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: NODE 1 (ctmisdb1) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: NODE 2 (ctmisdb2) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2012-03-21 12:07:50.429 [28260544] >USER: NMEVENT_SUSPEND [00][00][00][00]
    [    CSSD]2012-03-21 12:07:50.429 [183267968] >TRACE: clssgmReconfigThread: started for reconfig (18)
    [    CSSD]2012-03-21 12:07:50.429 [183267968] >USER: NMEVENT_RECONFIG [00][00][00][06]
    [    CSSD]2012-03-21 12:07:50.429 [183267968] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 18
    [    CSSD]2012-03-21 12:07:50.430 [140255872] >TRACE: clssgmInitialRecv: (0x102a0360) accepted a new connection from node 2 born at 16 active (2, 2), vers (10,3,1,2)
    [    CSSD]2012-03-21 12:07:50.430 [140255872] >TRACE: clssgmInitialRecv: conns done (2/2)
    [    CSSD]2012-03-21 12:07:50.430 [183267968] >TRACE: clssgmEstablishMasterNode: MASTER for 18 is node(2) birth(16)
    [    CSSD]2012-03-21 12:07:50.430 [183267968] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
    [    CSSD]2012-03-21 12:07:50.432 [140255872] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(72) incarn 18
    [    CSSD]CLSS-3000: reconfiguration successful, incarnation 18 with 2 nodes
    [    CSSD]CLSS-3001: local node number 1, master node number 2
    [    CSSD]2012-03-21 12:07:50.433 [183267968] >TRACE: clssgmReconfigThread: completed for reconfig(18), with status(1)
    [    CSSD]2012-03-21 12:07:50.550 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006603bb0) proc(0x8006608b00) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:07:50.551 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066066f0) proc(0x8006608d70) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:07:53.569 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660ec70) proc(0x8006611260) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:00.829 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006610990) proc(0x800660de00) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:04.698 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006613030) proc(0x8006612930) pid(8115) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:04.816 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8115) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:04.832 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8115) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:06.615 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8171) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:07.114 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006615960) proc(0x8006616350) pid(8175) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:11.373 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066192a0) proc(0x8006619470) pid(8302) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:11.669 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661ee20) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:17.135 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661ee70) pid(8458) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:17.268 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661fc00) proc(0x80066220d0) pid(8460) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:17.305 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066223e0) proc(0x8006625250) pid(8462) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:17.353 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006625560) proc(0x8006628430) pid(8464) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:24.585 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006625560) proc(0x8006628430) pid(8645) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:27.957 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006628740) proc(0x800662b610) pid(8722) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:30.931 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662cce0) proc(0x800662c860) pid(8801) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:36.400 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661c5f0) proc(0x800661eb50) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:37.863 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662f1c0) proc(0x800661eee0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:38.537 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662f1c0) proc(0x800661d500) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:39.232 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661d500) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:43.085 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006611210) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:58.971 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x80066112c0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:09:59.290 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:10:59.589 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:11:59.904 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:13:00.203 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:13:14.029 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:14:00.501 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006611210) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:15:00.809 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:16:01.117 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:17:01.447 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:01.762 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:39.841 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:42.123 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:42.316 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:42.843 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:42.963 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:43.098 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800662bd20) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:44.173 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:44.368 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:45.351 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:46.236 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:47.031 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:47.694 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:47.819 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:48.103 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:48.327 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:48.484 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006611210) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:48.758 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:49.529 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:50.509 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:51.060 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:51.558 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:48:39.836 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
    [  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=ctmisdb1DBG_CSSD))
    [    CSSD]2012-03-21 12:48:39.836 >USER: CSS daemon log for node ctmisdb1, number 1, in cluster crs
    [    CSSD]2012-03-21 12:48:39.849 [28260544] >TRACE: clssscmain: local-only set to false
    [    CSSD]2012-03-21 12:48:39.865 [28260544] >TRACE: clssnmReadNodeInfo: added node 1 (ctmisdb1) to cluster
    [    CSSD]2012-03-21 12:48:39.872 [28260544] >TRACE: clssnmReadNodeInfo: added node 2 (ctmisdb2) to cluster
    [    CSSD]2012-03-21 12:48:39.879 [72925824] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
    [    CSSD]2012-03-21 12:48:39.879 [28260544] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
    [    CSSD]2012-03-21 12:48:39.881 [28260544] >TRACE: clssnmInitNMInfo: misscount set to 60
    [    CSSD]2012-03-21 12:48:39.888 [28260544] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2)
    [    CSSD]2012-03-21 12:48:41.892 [72925824] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2)
    [    CSSD]2012-03-21 12:48:41.915 [72925824] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(20) wrtcnt(10367) LATS(0) Disk lastSeqNo(10367)
    [    CSSD]2012-03-21 12:48:41.959 [28260544] >TRACE: clssnmFatalInit: fatal mode enabled
    [    CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
    [    CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
    [    CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmClusterListener: Probing node(2)
    [    CSSD]2012-03-21 12:48:41.961 [94777984] >TRACE: clssnmConnComplete: connected to node 2 (con 0x8006702790), state 3 birth 0, unique 1332303918/1332303918 prevConuni(0)
    [    CSSD]2012-03-21 12:48:41.962 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
    [    CSSD]2012-03-21 12:48:41.962 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ctmisdb1_crs))
    [    CSSD]2012-03-21 12:48:41.963 [152330880] >TRACE: clssnmPollingThread: Connection complete
    [    CSSD]2012-03-21 12:48:41.963 [162816640] >TRACE: clssnmSendingThread: Connection complete
    [    CSSD]2012-03-21 12:48:41.963 [173302400] >TRACE: clssnmRcfgMgrThread: Connection complete
    [    CSSD]2012-03-21 12:48:41.963 [173302400] >TRACE: clssnmRcfgMgrThread: Local Join
    [    CSSD]2012-03-21 12:48:41.963 [173302400] >WARNING: clssnmLocalJoinEvent: takeover aborted due to connected but inactive nodes
    [    CSSD]2012-03-21 12:48:42.631 [94777984] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[ctmisdb2] seq[13] sync[20]
    [    CSSD]2012-03-21 12:48:42.965 [173302400] >TRACE: clssnmRcfgMgrThread: lastleader(2) unique(1332314319)
    [    CSSD]2012-03-21 12:48:43.636 [94777984] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(20)
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmDeactivateNode: node 0 () left cluster
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1332314319/1332314319) prevConuni(0) birth (0/20) (old/new)
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1332303918/1332303918) prevConuni(0) birth (0/16) (old/new)
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: SYNC(20) from node(2) completed
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: NODE 1 (ctmisdb1) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: NODE 2 (ctmisdb2) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2012-03-21 12:48:45.737 [28260544] >USER: NMEVENT_SUSPEND [00][00][00][00]
    [    CSSD]2012-03-21 12:48:45.738 [183788160] >TRACE: clssgmReconfigThread: started for reconfig (20)
    [    CSSD]2012-03-21 12:48:45.738 [183788160] >USER: NMEVENT_RECONFIG [00][00][00][06]
    [    CSSD]2012-03-21 12:48:45.738 [183788160] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 20
    [    CSSD]2012-03-21 12:48:45.739 [140776064] >TRACE: clssgmInitialRecv: (0x102a0370) accepted a new connection from node 2 born at 16 active (2, 2), vers (10,3,1,2)
    [    CSSD]2012-03-21 12:48:45.739 [140776064] >TRACE: clssgmInitialRecv: conns done (2/2)
    [    CSSD]2012-03-21 12:48:45.739 [183788160] >TRACE: clssgmEstablishMasterNode: MASTER for 20 is node(2) birth(16)
    [    CSSD]2012-03-21 12:48:45.739 [183788160] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
    [    CSSD]2012-03-21 12:48:45.741 [140776064] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(72) incarn 20
    [    CSSD]CLSS-3000: reconfiguration successful, incarnation 20 with 2 nodes
    Please check and help.

  • What steps to follow to make RAC Database down.

    hi all,
    I need to know the order to follow when shutting a RAC database down completely.
    Information regarding the database:
    OS: IBM AIX,
    ASM storage,
    2-node RAC,
    2 databases.
    By order I mean what has to be shut down first (database, ASM, cluster, etc.).
    Please also give the respective commands for reference.
    Regards,
    vamsi.

    Stopping the Oracle RAC 10g Environment
    The first step is to stop the Oracle instance. When the instance (and related services) is down, then bring down the ASM instance. Finally, shut down the node applications (Virtual IP, GSD, TNS Listener, and ONS).
    $ export ORACLE_SID=orcl1
    $ emctl stop dbconsole
    $ srvctl stop instance -d orcl -i orcl1
    $ srvctl stop asm -n linux1
    $ srvctl stop nodeapps -n linux1
    Starting the Oracle RAC 10g Environment
    The first step is to start the node applications (Virtual IP, GSD, TNS Listener, and ONS). When the node applications are successfully started, then bring up the ASM instance. Finally, bring up the Oracle instance (and related services) and the Enterprise Manager Database console.
    $ export ORACLE_SID=orcl1
    $ srvctl start nodeapps -n linux1
    $ srvctl start asm -n linux1
    $ srvctl start instance -d orcl -i orcl1
    $ emctl start dbconsole
    Start/Stop All Instances with SRVCTL
    Start/stop all the instances and their enabled services. I have included this step just for fun as a way to bring down all instances!
    $ srvctl start database -d orcl
    $ srvctl stop database -d orcl
    Reference: http://www.rampant-books.com/art_hunter_rac_start_stop_cluster.htm
    Refer to these links for more information:
    Starting and Stopping Instances and Oracle Real Application Clusters Databases
    http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/dbinstmgt.htm#BCEBGHHC
    Server Control Utility Reference
    http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/srvctladmin.htm
    answered by ssolbach
    Just a minor comment on the stop nodeapps.
    While it is fine to stop the nodeapps on the server, the drawback is that the VIP will not fail over if you stop the nodeapps; it will simply be stopped.
    Hence, if you shut down only one server, clients will fail to connect to the VIP and will have to wait for the TCP timeout.
    So if you are not going to shut down all the servers, but just want to shut down one node, you should fail over the VIP to the other node.
    See: Note 749160.1 Vip Does Not Failover When Nodeapps Stopped
    So it is sometimes better, instead of stopping the nodeapps, to simply shut down the cluster with crsctl stop crs (which will fail over the VIP).
    Sebastian
    reference:-
    Re: RAC Questions
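
    The stop order from this thread, including Sebastian's point about a single-node shutdown, can be sketched as a dry-run helper. This is only an illustration under assumptions: the function name rac_stop_plan and the "single"/"all" scope argument are made up here, the names orcl, orcl1 and linux1 are the sample values from the answer above, and the script prints the commands instead of running them. crsctl stop crs normally has to be run as root.

```shell
#!/bin/sh
# Dry-run sketch: prints the stop sequence for one node; it executes nothing.
# Arguments: database name, instance name, node name, and scope
# ("all" = the whole cluster is going down, "single" = only this node).
# rac_stop_plan and the scope argument are hypothetical names for illustration.
rac_stop_plan() {
    db=$1
    inst=$2
    node=$3
    scope=${4:-all}

    echo "emctl stop dbconsole"
    echo "srvctl stop instance -d $db -i $inst"
    echo "srvctl stop asm -n $node"
    if [ "$scope" = "single" ]; then
        # Only this node goes down: stop the clusterware stack so the
        # VIP fails over to the surviving node instead of just stopping.
        echo "crsctl stop crs"
    else
        echo "srvctl stop nodeapps -n $node"
    fi
}

# Example: plan for taking down only node linux1.
rac_stop_plan orcl orcl1 linux1 single
```

    With two databases, as in the question, the srvctl stop instance line would be repeated once per database before ASM is stopped.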

  • RAC Nodes on two different network segment, what can go wrong?

    In a two-node RAC, what are the disadvantages if the two nodes are on different segments of the network?
    Both nodes will have their public interfaces on different segments, but the interconnect will be on the same segment.
    Both nodes have the same gateway, and both network segments go to the same switch.
    It is Solaris, Oracle 10g (10.2.0.4)

    Hi,
    The public IP addresses and virtual IP addresses must be in the same subnet.
    This is necessary because the VIP of each node must be able to work on the public network of either node.
    Recently I set up an extended RAC with different networks (subnets). When all nodes are up, everything is fine. But when one node goes down, the clusterware freezes the surviving node, because it tries to bring up the VIP of the failed node (which is on a different subnet) on the surviving node.
    Regards,
    Levi Pereira
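
    The same-subnet requirement can be checked before installation with a few lines of shell. This is a minimal sketch, not an Oracle tool: the helper names ip_to_int and same_subnet are invented for this example, and the addresses in the usage line are illustrative values only.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    # Split the address on "." into the positional parameters.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) when both addresses share a network under the mask.
# Arguments: first address, second address, netmask.
same_subnet() {
    ip_a=$(ip_to_int "$1")
    ip_b=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( $ip_a & $mask )) -eq $(( $ip_b & $mask )) ]
}

# Example: a public IP and a VIP (illustrative values).
if same_subnet 192.168.2.10 192.168.2.25 255.255.255.0; then
    echo "same subnet"
else
    echo "different subnets"
fi
```

    Running the check for every public IP and VIP pair across both nodes would flag the layout described in the question before the clusterware ever tries to relocate a VIP.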

  • Solaris RAC nodes re-booting

    I have a pre-production 2-node cluster running on Solaris 10, Oracle 10.2.0.3 with the Oracle CRS, and using a NetApp filer as the shared storage.
    I also have a separate Solaris server running Grid Control 10.2.0.3, with the repository as one of the databases on the RAC (don't know if this is relevant to my problem).
    Periodically both RAC nodes reboot, with no trace of why (the GC server is fine). There is nothing logged in the Solaris logs (messages file), CRS logs, Oracle logs or the NetApp logs.
    All that is shown is the relevant service starting up following the shutdown.
    Has anyone any experience of this, or any thoughts on which component may cause such an issue?
    Thanks in advance
    Bob

    What type of Sun hardware are you using?
    Below is the Action Plan Oracle support sent me on my SR on this issue, not sure if any of this was provided to you or would be of help.
    ACTION PLAN
    ============
    1. There is nothing in the files at all that sheds any light on the issue.
    Again, 3 separate sets of clusters all losing all nodes at the same time is a very strange occurrence. Please be sure to have the admin look for
    anything in common with all clusters.
    2. Advise placing OSWatcher on the systems: Note 301137.1, Ext/Pub OS Watcher User Guide.
    If we should have another occurrence, we will want the OSWatcher logs from 1 hr before the issue through the issue.
    Also see if the Unix admin perhaps has any OS stats from this occurrence.
    3. Advise setting ntpd to run with the -x option. I do see that you are having negative time changes
    at times.
    -x will give us a slew rather than an abrupt time change.
    4. Advise setting this when you can.
    Please do the following
    set the diagwait parameter:
    crsctl set css diagwait N [-force]
    Where N is the number of seconds to wait for a filesystem sync to
    complete (after this wait the node will reboot regardless of whether the
    sync has completed). This change must be made with the clusterware
    down, which will require the '-force', or with the stack up on just 1
    node, after which the stack on that node must be restarted before the
    stack starts up on any of the other nodes.
    N should be set to 25 (25 seconds)
    5. Advise that you have pcw mlr#6 Patch 5980915 on the systems as well,
    but I do not believe that this was an Oracle bug; the reason for placing the patch on is the advanced diagnostics that are in that patch set.
    6. The two issues Sun is working on:
    Sun is working to resolve a time skew issue and a Solaris 10 kernel SIGALRM Sun#6292092 in addition to Sun#6595936.
    7. We do have a diagnostic oprocd that some sites have used, but on their test systems. It stops reboots and dumps information, but I have
    been hesitant to place it on production boxes. If you continue to have issues we may consider downloading the oprocd_skewfix_noreboot from
    Bug 6279879, but at this time I do not believe that is warranted.
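
    Item 4 of the action plan (setting diagwait with the clusterware down) can be written out as an ordered list of steps. The sketch below only prints the commands and runs nothing; the function name diagwait_plan and the node names node1 and node2 are hypothetical, and the crsctl commands would normally be run as root on the node shown in brackets.

```shell
#!/bin/sh
# Dry-run sketch: prints the diagwait change as ordered steps; executes nothing.
# Arguments: diagwait value in seconds, then the cluster node names.
# diagwait_plan is a hypothetical helper name used for illustration.
diagwait_plan() {
    seconds=$1
    shift
    first_node=$1

    # 1. The change is made with the clusterware down on every node.
    for node in "$@"; do
        echo "[$node] crsctl stop crs"
    done
    # 2. Set the value once; -force is required because the stack is down.
    echo "[$first_node] crsctl set css diagwait $seconds -force"
    # 3. Bring the stack back up on every node.
    for node in "$@"; do
        echo "[$node] crsctl start crs"
    done
}

# Example with the value recommended in the action plan (25 seconds).
diagwait_plan 25 node1 node2
```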

  • Rac node failed how do you bring it back up?

    Example: If there are 3 RAC nodes and one becomes unavailable/fails, how do you bring it back up?

    There are typically two basic reasons why a RAC node will go down.
    A cluster issue, causing the node to be evicted. This usually means the node lost access to the cluster storage (e.g. no access to voting disks), or lost access to the Interconnect.
    An o/s issue causing the node to fail. E.g. a kernel panic due to a page swap and memory not syncing, or a soft CPU lockup, etc.
    You need to determine why it went down, and determine what is needed to enable it to join the cluster successfully again.

  • How to get dbms_scheduler to run jobs on different RAC nodes

    Let's say I have 3 jobs and I want to run each on a different RAC node. How do I do this?

    Hi,
    Pierre's response shows the easiest way to do this on 10g (create a service for each instance, then a job class for each service, then assign jobs to specific job classes).
    In 11g there is a more direct method, you can just set the INSTANCE_ID attribute of a job using dbms_scheduler.set_attribute.
    Note that for PL/SQL jobs Oracle recommends using services instead of instance ids because they provide better availability if one instance goes down or has to be taken down.
    Hope this helps,
    Ravi.

  • What is best use of 1400 gb SGA (2 rac nodes 768gb each)

    Currently using 11.2.0.3.0 on a Sun Unix server with 2 RAC nodes, each with 8 UltraSPARC-T1 CPUs (came out in 2005), four threads each, so Oracle sees 32 CPUs; very slow (1.2 gb). The database is 4 TB in size on a regular SAN (10k speed).
    8 GB SGA.
    Our new boss wants to update the system to the max to get the best performance possible. Money is a concern of course, but the budget is pretty high. Our use case is 12-16 users at the same time, running reports, some small, others very large (returning a single row or 10,000s of rows). Reports take 5 sec to 5 minutes. Our job is to get the fastest system possible. We have a total of 8 licenses available, so we can have 16 cores. We are also getting a 6 TB all-flash SSD array for the database. We can get any CPU we want, but we can't use parallel query servers due to all kinds of issues we have experienced (too many slaves, RAC interconnect saturation, etc.; whack-a-mole). SPARC has too many threads, and without PS Oracle runs a query in a single thread.
    We have specced out the following system for each RAC node:
    HP ProLiant DL380p Gen8 8 SFF server
    2 Intel Xeon E5-2637v2 3.5GHz/4-core CPUs
    768 GB RAM
    2 HP 300GB 6G SAS 15K drives for database software
    This will give us a total of 4 Xeon E5-2637v2 CPUs, 16 cores total (0.5 factor for 8 licenses), and 1536 GB RAM (leaving ~1400 GB for SGA). This will guarantee an available core for each user. We intend to create a very large keep pool, around 300 GB on each node, that will hold all our dimension tables. This, we hope, will reduce reads from the SSD to just data from fact tables.
    Are we doing a massive overkill here? The budget for this was way less than what our boss expected. Will that big an SGA be wasted, and would, say, 256 GB be fine? Or will Oracle take advantage of it and be able to keep most blocks in there?
    Will an SGA that big cause Oracle problems due to the overhead of handling that much RAM?
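
    The sizing arithmetic in the post can be checked with a few lines of shell. This is just a worked calculation using the figures quoted above (2 nodes, 2 sockets per node, 4 cores per socket, 768 GB per node, a 0.5 core factor, and roughly 1400 GB planned for SGA); the function name sizing_summary is made up for this example.

```shell
#!/bin/sh
# Arguments: nodes, sockets per node, cores per socket,
#            RAM per node (GB), planned total SGA (GB).
sizing_summary() {
    nodes=$1
    sockets=$2
    cores=$3
    ram_gb=$4
    sga_gb=$5

    total_cores=$(( $nodes * $sockets * $cores ))
    # Core factor of 0.5, as stated in the post: two cores per license.
    licenses=$(( $total_cores / 2 ))
    total_ram=$(( $nodes * $ram_gb ))
    # Memory left for the OS, PGA and everything else outside the SGA.
    non_sga=$(( $total_ram - $sga_gb ))

    echo "cores=$total_cores licenses=$licenses ram_gb=$total_ram non_sga_gb=$non_sga"
}

sizing_summary 2 2 4 768 1400
# prints: cores=16 licenses=8 ram_gb=1536 non_sga_gb=136
```

    With 16 cores and at most 16 concurrent users, the one-core-per-user claim holds, and about 136 GB across both nodes remains outside the SGA.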

    Current System:
    ===========
    a. Version : 11.2.0.3
    b. Unix Sun
    c. CPU - 8 cpus with 4 threads => 32 logical cpus or cores
    d. database 4TB
    e. SAN - 10k speed disk drives
    f. 8gb SGA
    g. 1.2 gb ??
    h. Users --> 12-16 concurrent and run reports varying size
    i. reports elapsed time 5 sec to 5 mins
    j. cpu license -->8
    Target System
    ===========
    a. Version: 11.2.0.3
    b. HP ProLiant DL380p Gen8 8 SFF server
    c. RAM --> 768 GB
    d. 2 HP 300GB 6G SAS 15K drives for database software
    e. large keep pool -->90 gb to  hold all dimension tables. 
    f.  SSD to just data from fact tables
    g. SGA -->256gb
    A reassessment of the performance issues of the current system appears to be required. A good performance tuning expert is needed to look into the tuning issues of the current application by analyzing AWR performance metrics. If an 8 GB SGA is not enough, the reason is that the queries running in the system do not have good access paths that select less data, so recent buffers from the different tables involved in a query get flushed out. Until those issues are identified, the performance issue will not go away wherever you go: as table sizes increase in the future, the problem will reappear.
    If the queries really do run mostly full scans, then re-platforming to Exadata might be the right decision, as Exadata has Smart Scan and cell offloading, which work faster and might be the right direction for best performance and best investment for the future. Compression (COMPRESS FOR OLTP) could be another feature to exploit to improve efficiency further, by reading fewer blocks in less time.
    Investment in infrastructure will solve a few issues in the short term, but the long-term issues will arise again.
    Investing in identifying the performance issues of the current system would be the best investment in the current scenario.
