Nslookup scan ip not pingable in rac node

All,
I'm planning to install 11.2.0.1 rac on my laptop. As a first step I have configured dns in separate vmware and configured rac1 node as well. Both dns and rac1 public ip addresses are pingable from each others and from host machine.But the rac-scan ip is only pingable from dns server and not pingable from rac1 server.  Will this make any problem if dns server running on 32 bit and rac nodes running on 64 bit server ? Please let me know if I have anything missed here. Thanks again.
About posting on this forum. I have used [code]  [/code] to format the code previously. But this time it is not working. Also there is no option to preview the code before posting.
use spaces to separate multiple tags I'm not clear about this. I read https://forums.oracle.com/thread/865295 this article how to post the code. It says to use  \ . If you guide me how to format the code I can use that in future.
Host OS : Windows 8 64 bit
Guest OS -1 : dns 32 bit Linux
  [root@dns32 ~]# uname -a
   Linux dns32.testenv.com 2.6.18-164.el5 #1 SMP Thu Sep 3 02:16:47 EDT 2009 i686 i686 i386 GNU/Linux
Guest OS -2 : rac1 64 bit Linux
  [root@rac1 ~]# uname -a
   Linux rac1 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
Guest OS-3 : rac2 - yet to configure 64 bit Linux
@ dns server
[root@dns32 ~]# nslookup rac-scan
Server:         192.168.1.26
Address:        192.168.1.26#53
Name:   rac-scan.testenv.com
Address: 192.168.1.57
Name:   rac-scan.testenv.com
Address: 192.168.1.58
Name:   rac-scan.testenv.com
Address: 192.168.1.59
[root@dns32 ~]# cat /etc/resolv.conf
search testenv.com
nameserver 192.168.1.26
[root@dns32 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:0C:29:EF:03:D3
          inet addr:192.168.1.26  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:feef:3d3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2802 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2691 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:210115 (205.1 KiB)  TX bytes:208344 (203.4 KiB)
          Interrupt:67 Base address:0x2024
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2308 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2308 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5494207 (5.2 MiB)  TX bytes:5494207 (5.2 MiB)
sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
[root@dns32 ~]# ping 192.168.1.26
PING 192.168.1.26 (192.168.1.26) 56(84) bytes of data.
64 bytes from 192.168.1.26: icmp_seq=1 ttl=64 time=0.200 ms
--- 192.168.1.26 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.200/0.200/0.200/0.000 ms
[root@dns32 ~]# ping 192.168.1.27
PING 192.168.1.27 (192.168.1.27) 56(84) bytes of data.
64 bytes from 192.168.1.27: icmp_seq=1 ttl=64 time=0.330 ms
--- 192.168.1.27 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.330/0.330/0.330/0.000 ms
@rac1 node :
[root@rac1 ~]# cat /etc/resolv.conf
search testenv.com
nameserver 192.168.1.26
[root@rac1 ~]#  nslookup rac-scan
;; connection timed out; no servers could be reached
[root@rac1 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:0C:29:75:A9:39
          inet addr:192.168.1.27  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe75:a939/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:500 errors:0 dropped:0 overruns:0 frame:0
          TX packets:357 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:52333 (51.1 KiB)  TX bytes:39556 (38.6 KiB)
eth1      Link encap:Ethernet  HWaddr 00:0C:29:75:A9:43
          inet addr:192.168.2.37  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe75:a943/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:160 errors:0 dropped:0 overruns:0 frame:0
          TX packets:50 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:20359 (19.8 KiB)  TX bytes:6518 (6.3 KiB)
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1940 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1940 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4783881 (4.5 MiB)  TX bytes:4783881 (4.5 MiB)
sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
[root@rac1 ~]# ping 192.168.1.26
PING 192.168.1.26 (192.168.1.26) 56(84) bytes of data.
64 bytes from 192.168.1.26: icmp_seq=1 ttl=64 time=0.284 ms
64 bytes from 192.168.1.26: icmp_seq=2 ttl=64 time=0.456 ms
--- 192.168.1.26 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.284/0.370/0.456/0.086 ms
[root@rac1 ~]# ping 192.168.1.27
PING 192.168.1.27 (192.168.1.27) 56(84) bytes of data.
64 bytes from 192.168.1.27: icmp_seq=1 ttl=64 time=0.032 ms
--- 192.168.1.27 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.032/0.032/0.032/0.000 ms
Thanks
Arul

Thanks Saurabh. I have configured dns using this blog " http://dnccfg.blogspot.in/2012/08/dns-configuration-on-linux.html.html
[root@dns32 etc]# cat named.conf
// named.caching-nameserver.conf
// Provided by Red Hat caching-nameserver package to configure the
// ISC BIND named(8) DNS server as a caching only nameserver
// (as a localhost DNS resolver only).
// See /usr/share/doc/bind*/sample/ for example named configuration files.
// DO NOT EDIT THIS FILE - use system-config-bind or an editor
// to create named.conf - edits to this file will be lost on
// caching-nameserver package upgrade.
options {
        listen-on port 53 { 192.168.1.26; };
        listen-on-v6 port 53 { ::1; };
        directory       "/var/named";
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named_stats.txt";
        memstatistics-file "/var/named/data/named_mem_stats.txt";
        // Those options should be used carefully because they disable port
        // randomization
        // query-source    port 53;
        // query-source-v6 port 53;
        allow-query     { any; };
        allow-query-cache { localhost; };
logging {
        channel default_debug {
                file "data/named.run";
                severity dynamic;
view localhost_resolver {
        match-clients      { any; };
        match-destinations { 192.168.1.26; };
        recursion yes;
        include "/etc/named.rfc1912.zones";
[root@dns32 etc]# cat named.rfc1912.zones
// named.rfc1912.zones:
// Provided by Red Hat caching-nameserver package
// ISC BIND named zone configuration for zones recommended by
// RFC 1912 section 4.1 : localhost TLDs and address zones
// See /usr/share/doc/bind*/sample/ for example named configuration files.
zone "." IN {
        type hint;
        file "named.ca";
zone "testenv.com" IN {
        type master;
        file "forward.zone";
        allow-update { none; };
zone "localhost" IN {
        type master;
        file "localhost.zone";
        allow-update { none; };
zone "1.168.192.in-addr.arpa" IN {
        type master;
        file "reverse.zone";
        allow-update { none; };
[root@dns32 named]# cat forward.zone
$TTL    86400
@               IN SOA  dns32.testenv.com. root.dns32.testenv.com. (
                                        42              ; serial (d. adams)
                                        3H              ; refresh
                                        15M             ; retry
                                        1W              ; expiry
                                        1D )            ; minimum
                IN NS           dns32.testenv.com.
dns32     IN A 192.168.1.26
rac1      IN A 192.168.1.27
rac2      IN A 192.168.1.28
rac1-priv  IN A 192.168.2.37
rac2-priv  IN A 192.168.2.38
rac1-vip  IN A 192.168.1.47
rac2-vip  IN A 192.168.1.48
rac-scan  IN A 192.168.1.57
rac-scan  IN A 192.168.1.58
rac-scan  IN A 192.168.1.59
[root@dns32 named]# cat reverse.zone
$TTL    86400
@       IN      SOA     dns32.testenv.com. root.dns32.testenv.com.  (
                                      1997022700 ; Serial
                                      28800      ; Refresh
                                      14400      ; Retry
                                      3600000    ; Expire
                                      86400 )    ; Minimum
        IN      NS      dns32.testenv.com.
26       IN      PTR     dns32.testenv.com.
27        IN      PTR   rac1.testenv.com.
28        IN      PTR   rac2.testenv.com.
47        IN      PTR   rac1-vip.testenv.com.
48        IN      PTR   rac2-vip.testenv.com.
57        IN      PTR   rac-scan.testenv.com.
58        IN      PTR   rac-scan.testenv.com.
59        IN      PTR   rac-scan.testenv.com.
Thanks
Arul

Similar Messages

  • Orainst.sh scripts not respond on RAC Node 2

    Hi every one,
    I am configuring RAC with 2 nodes. during cluster installation the root.sh and orainst.sh runs successfully on Node1.. root.sh scripts also runs fine on Node 2...but when running orainst.sh scripts its starts and hangs on "90 seconds" nothing happens ...
    Any one have idea..
    Thanks..
    Aziz

    Abdul Aziz wrote:
    Hi Billy, sorry for not mentioning... I have Linux rhel 4.6 and Oracle 10g Clusterware 10.2.....
    We have installed many time before but on rhel 4.3..
    Had a similar problem due to bug 4679769. As I recall, the configuration/checking of the OCR (and/or Voting) Disks fails when checked by the installer script on subsequent nodes after the 1st node formatted it.
    I hacked it originally by removing certain checks from the shell script (not recommended). Later when installing other clusters discovered that module clsfmt.bin was the problem and that patch 4679769 provides a replacement module. The patch is not opatch style - simply a manual copy to replace the existing module with the updated one.
    But this was for Linux x86-64 (AMD64/EM64T). You neglected to name the h/w platform. But I suspect the same problem would exist for Intel based platforms.
    An alternative (which I am using with current CRS installs) is to install CRS 11g and run 10gr2 RAC on top of it. It is a certified combo and hopefully will result in less work when I need to upgrade the RDBMS to 11.2 when it becomes available.

  • Scan ip not working in 11.2.0.1 RAC

    Hi,
    I have set up 11gR2 (11.2.0.1) in OEL 5.My scan ip is not working properly.Here i have shown the output of the some commands on node1 and node2.Can anyone let me know how to rectify this issue.
    [oracle@markdb bin]$ nslookup db1-scan.oracle.com
    Server:         192.168.32.142
    Address:        192.168.32.142#53
    Name:   db1-scan.oracle.com
    Address: 192.168.32.155   ---------------> ip order changes
    Name:   db1-scan.oracle.com
    Address: 192.168.32.153
    Name:   db1-scan.oracle.com
    Address: 192.168.32.154
    [oracle@markdb bin]$
    [oracle@markdb bin]$
    [oracle@markdb bin]$ nslookup db1-scan.oracle.com
    Server:         192.168.32.142
    Address:        192.168.32.142#53
    Name:   db1-scan.oracle.com
    Address: 192.168.32.153
    Name:   db1-scan.oracle.com
    Address: 192.168.32.154
    Name:   db1-scan.oracle.com
    Address: 192.168.32.155 -----------------> ip order changes
    Node1
    [oracle@markdb bin]$ ./srvctl status scan
    SCAN VIP scan1 is enabled
    SCAN VIP scan1 is running on node lukedb  -------------------> Node 2 name
    [oracle@markdb bin]$
    [oracle@markdb bin]$
    Node2
    [oracle@lukedb bin]$ ./srvctl status scan
    SCAN VIP scan1 is enabled
    SCAN VIP scan1 is running on node lukedb --------------------> Node 2 name
    [oracle@lukedb bin]$
    [oracle@lukedb bin]$
    [oracle@lukedb bin]$
    Node1
    [oracle@markdb bin]$ ./srvctl config scan
    SCAN name: db1-scan.oracle.com, Network: 1/192.168.32.0/255.255.255.0/eth0
    SCAN VIP name: scan1, IP: /db1-scan.oracle.com/192.168.32.153
    Regards,
    007

    What is the issue?  "SCAN not working" say nothing to me.

  • Listener on RAC node 2 not accepting user connections

    Hey guys i am experiencing a disturbing scenario, i have a 2 RAC node cluster . The problem is that the second listener is not registering any connections . I have verified the services of listener using lsnrctl status (the default name is LISTENER), i also have verified the local and remote listener parameters they are fine but running the fol query shows count =0 against inst_id=2;
    SQL > select count() from gv$session where username='XYZ' and inst_id=2;*
    Count
    0
    Any help is appreciated in this regard please.

    Yeah i know very well the purpose of SCAN . But web admins are sticking to the vip . Well here is the output of lsnrctl status:-
    LSNRCTL for Solaris: Version 11.2.0.1.0 - Production on 11-APR-2013 16:41:55
    Copyright (c) 1991, 2009, Oracle. All rights reserved.
    Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
    STATUS of the LISTENER
    Alias LISTENER
    Version TNSLSNR for Solaris: Version 11.2.0.1.0 - Production
    Start Date 10-MAR-2013 13:00:14
    Uptime 17 days 2 hr. 42 min. 18 sec
    Trace Level off
    Security ON: Local OS Authentication
    SNMP OFF
    Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
    Listener Log File /u01/app/oracle/diag/tnslsnr/fedb6/listener/alert/log.xml
    Listening Endpoints Summary...
    (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=5.2.21.1)(PORT=1521)))
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=5.2.21.7)(PORT=1521)))
    Services Summary...
    Service "+ASM" has 1 instance(s).
    Instance "+ASM2", status READY, has 1 handler(s) for this service...
    Service "grid1.xyz" has 1 instance(s).
    Instance "grid12", status READY, has 1 handler(s) for this service...
    Service "grid1XDB.xyz" has 1 instance(s).
    Instance "grid12", status READY, has 1 handler(s) for this service...
    Service "xyz" has 1 instance(s).
    Instance "grid12", status READY, has 1 handler(s) for this service...
    The command completed successfully
    and here is tnsnames.ora output :-
    GRID1 =
    (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = grid1-scan)(PORT = 1521))
    (CONNECT_DATA =
    (SERVER = DEDICATED)
    (SERVICE_NAME = grid1.xyz)
    PRIMARY1_SERV =
    (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 5.2.21.1)(PORT = 1521))
    (CONNECT_DATA =
    (SERVER = DEDICATED)
    (SERVICE_NAME = grid1.xyz)
    (INSTANCE_NAME= grid11)
    PRIMARY2_SERV =
    (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 5.2.21.6)(PORT = 1521))
    (CONNECT_DATA =
    (SERVER = DEDICATED)
    (SERVICE_NAME = grid1.xyz)
    (INSTANCE_NAME = grid12)
    )

  • Scan-vip running only on one RAC node

    Hi ,
    While setting up RAC11.2 on Centos 5.7 , I was getting this error during the grid installation:
    PRCR-1079 : Failed to start resource ora.scan1.vip
    CRS-5005: IP Address: 192.168.100.208 is already in use in the network
    CRS-2674: Start of 'ora.scan1.vip' on 'falcen6b' failed
    CRS-2632: There are no more servers to try to place resource 'ora.scan1.vip' on that would satisfy its placement policy
    PRCR-1079 : Failed to start resource ora.scan2.vip
    CRS-5005: IP Address: 192.168.100.209 is already in use in the network
    CRS-2674: Start of 'ora.scan2.vip' on 'falcen6b' failed
    CRS-2632: There are no more servers to try to place resource 'ora.scan2.vip' on that would satisfy its placement policy
    PRCR-1079 : Failed to start resource ora.scan3.vip
    CRS-5005: IP Address: 192.168.100.210 is already in use in the network
    CRS-2674: Start of 'ora.scan3.vip' on 'falcen6b' failed
    CRS-2632: There are no more servers to try to place resource 'ora.scan3.vip' on that would satisfy its placement policy
    I figured that the scan service is able to run only on one node at a time. When I stopped the service on rac1 and started it on rac2 the service is starting.
    But I think for the grid installation the scan service has to simultaneously run on both the nodes.
    How do I resolve it?
    Any suggestions please.
    PS - I am planning to try with the patch 11.0.2.3 but it will be a while till i get access to it.
    Till then can someone suggest a workaround?

    Hi Balazs Papp and onedbguru,
    I was able to resolve that error by running the following command on rac2, now that part of the installer passed.
    crsctl start res ora.scan1.vip
    However the cluster verification utility is failing at the end of installer.
    When I executed the below command, this is my output:
    [oracle@falcen6a grid]$ ./runcluvfy.sh stage -post crsinst -n falcen6a,falcen6b -verbose
    Performing post-checks for cluster services setup
    Checking node reachability...
    Check: Node reachability from node "falcen6a"
    Destination Node Reachable?
    falcen6a yes
    falcen6b yes
    Result: Node reachability check passed from node "falcen6a"
    Checking user equivalence...
    Check: User equivalence for user "oracle"
    Node Name Comment
    falcen6b passed
    falcen6a passed
    Result: User equivalence check passed for user "oracle"
    Checking time zone consistency...
    Time zone consistency check passed.
    Checking Cluster manager integrity...
    Checking CSS daemon...
    Node Name Status
    falcen6b running
    falcen6a running
    Oracle Cluster Synchronization Services appear to be online.
    Cluster manager integrity check passed
    UDev attributes check for OCR locations started...
    Result: UDev attributes check passed for OCR locations
    UDev attributes check for Voting Disk locations started...
    Result: UDev attributes check passed for Voting Disk locations
    Check default user file creation mask
    Node Name Available Required Comment
    falcen6b 0022 0022 passed
    falcen6a 0022 0022 passed
    Result: Default user file creation mask check passed
    Checking cluster integrity...
    Cluster is divided into 2 partitions
    Partition 1 consists of the following members:
    Node Name
    falcen6b
    Partition 2 consists of the following members:
    Node Name
    falcen6a
    Cluster integrity check failed. Cluster is divided into 2 partition(s).
    Checking OCR integrity...
    Checking the absence of a non-clustered configuration...
    All nodes free of non-clustered, local-only configurations
    ERROR:
    PRVF-4193 : Asm is not running on the following nodes. Proceeding with the remaining nodes.
    Checking OCR config file "/etc/oracle/ocr.loc"...
    OCR config file "/etc/oracle/ocr.loc" check successful
    ERROR:
    PRVF-4195 : Disk group for ocr location "+DATA" not available on the following nodes:
    Checking size of the OCR location "+DATA" ...
    Size check for OCR location "+DATA" successful...
    OCR integrity check failed
    Checking CRS integrity...
    ERROR:
    PRVF-5316 : Failed to retrieve version of CRS installed on node "falcen6b"
    The Oracle clusterware is healthy on node "falcen6b"
    The Oracle clusterware is healthy on node "falcen6a"
    CRS integrity check failed
    Checking node application existence...
    Checking existence of VIP node application
    Node Name Required Status Comment
    falcen6b yes unknown failed
    falcen6a yes unknown failed
    Result: Check failed.
    Checking existence of ONS node application
    Node Name Required Status Comment
    falcen6b no unknown ignored
    falcen6a no online passed
    Result: Check ignored.
    Checking existence of GSD node application
    Node Name Required Status Comment
    falcen6b no unknown ignored
    falcen6a no does not exist ignored
    Result: Check ignored.
    Checking existence of EONS node application
    Node Name Required Status Comment
    falcen6b no unknown ignored
    falcen6a no online passed
    Result: Check ignored.
    Checking existence of NETWORK node application
    Node Name Required Status Comment
    falcen6b no unknown ignored
    falcen6a no online passed
    Result: Check ignored.
    Checking Single Client Access Name (SCAN)...
    SCAN VIP name Node Running? ListenerName Port Running?
    falcen6-scan unknown false LISTENER 1521 false
    WARNING:
    PRVF-5056 : Scan Listener "LISTENER" not running
    Checking name resolution setup for "falcen6-scan"...
    SCAN Name IP Address Status Comment
    falcen6-scan 192.168.100.210 passed
    falcen6-scan 192.168.100.208 passed
    falcen6-scan 192.168.100.209 passed
    Verification of SCAN VIP and Listener setup failed
    OCR detected on ASM. Running ACFS Integrity checks...
    Starting check to see if ASM is running on all cluster nodes...
    PRVF-5137 : Failure while checking ASM status on node "falcen6b"
    Starting Disk Groups check to see if at least one Disk Group configured...
    Disk Group Check passed. At least one Disk Group configured
    Task ACFS Integrity check failed
    Checking Oracle Cluster Voting Disk configuration...
    Oracle Cluster Voting Disk configuration check passed
    Checking to make sure user "oracle" is not in "root" group
    Node Name Status Comment
    falcen6b does not exist passed
    falcen6a does not exist passed
    Result: User "oracle" is not part of "root" group. Check passed
    Checking if Clusterware is installed on all nodes...
    Check of Clusterware install passed
    Checking if CTSS Resource is running on all nodes...
    Check: CTSS Resource running on all nodes
    Node Name Status
    falcen6b passed
    falcen6a passed
    Result: CTSS resource check passed
    Querying CTSS for time offset on all nodes...
    Result: Query of CTSS for time offset passed
    Check CTSS state started...
    Check: CTSS state
    Node Name State
    falcen6b Observer
    falcen6a Observer
    CTSS is in Observer state. Switching over to clock synchronization checks using NTP
    Starting Clock synchronization checks using Network Time Protocol(NTP)...
    NTP Configuration file check started...
    The NTP configuration file "/etc/ntp.conf" is available on all nodes
    NTP Configuration file check passed
    Checking daemon liveness...
    Check: Liveness for "ntpd"
    Node Name Running?
    falcen6b yes
    falcen6a yes
    Result: Liveness check passed for "ntpd"
    Checking NTP daemon command line for slewing option "-x"
    Check: NTP daemon command line
    Node Name Slewing Option Set?
    falcen6b yes
    falcen6a yes
    Result:
    NTP daemon slewing option check passed
    Checking NTP daemon's boot time configuration, in file "/etc/sysconfig/ntpd", for slewing option "-x"
    Check: NTP daemon's boot time configuration
    Node Name Slewing Option Set?
    falcen6b yes
    falcen6a yes
    Result:
    NTP daemon's boot time configuration check for slewing option passed
    NTP common Time Server Check started...
    NTP Time Server "133.243.236.19" is common to all nodes on which the NTP daemon is running
    NTP Time Server "133.243.236.18" is common to all nodes on which the NTP daemon is running
    NTP Time Server "210.173.160.86" is common to all nodes on which the NTP daemon is running
    NTP Time Server ".LOCL." is common to all nodes on which the NTP daemon is running
    Check of common NTP Time Server passed
    Clock time offset check from NTP Time Server started...
    Checking on nodes "[falcen6b, falcen6a]"...
    Check: Clock time offset from NTP Time Server
    Time Server: 133.243.236.19
    Time Offset Limit: 1000.0 msecs
    Node Name Time Offset Status
    falcen6b 15.332 passed
    falcen6a -1.503 passed
    Time Server "133.243.236.19" has time offsets that are within permissible limits for nodes "[falcen6b, falcen6a]".
    Time Server: 133.243.236.18
    Time Offset Limit: 1000.0 msecs
    Node Name Time Offset Status
    falcen6b 15.115 passed
    falcen6a -1.614 passed
    Time Server "133.243.236.18" has time offsets that are within permissible limits for nodes "[falcen6b, falcen6a]".
    Time Server: 210.173.160.86
    Time Offset Limit: 1000.0 msecs
    Node Name Time Offset Status
    falcen6b 15.219 passed
    falcen6a -1.527 passed
    Time Server "210.173.160.86" has time offsets that are within permissible limits for nodes "[falcen6b, falcen6a]".
    Time Server: .LOCL.
    Time Offset Limit: 1000.0 msecs
    Node Name Time Offset Status
    falcen6b 0.0 passed
    falcen6a 0.0 passed
    Time Server ".LOCL." has time offsets that are within permissible limits for nodes "[falcen6b, falcen6a]".
    Clock time offset check passed
    Result: Clock synchronization check using Network Time Protocol(NTP) passed
    Oracle Cluster Time Synchronization Services check passed
    Post-check for cluster services setup was unsuccessful on all the nodes.
    [oracle@falcen6a grid]$
    Any suggestions?

  • SCAN in Oracle 11g R2 RAC

    Hi All,
    Actually i am not having much idea on either Linux administration nor DB administration, Actually i am Test engg and my requirement is to setup Oracle 11gR2 RAC on Linux machine.
    I have followed steps in http://www.oracle-base.com/articles/11g/OracleDB11gR2RACInstallationOnLinuxUsingNFS.php
    Actually i am not clear with SCAN. Given some IP address of the main box names of the nodes.
    Please can any one explain what should be the IP address that needs to enter as SCAN in /etc/hosts file.

    I have used the IP address of the VM host which is having 2 nodes, It could be able to see in nslookup but not pingable.
    Placed the same in /etc/hosts of both nodes. After installation of Grid infrastructure which i have issued ifconfig i could able to see the SACN IP address
    Node 1:
    eth0 : Public IP Address
    eth0:2 Virtual IP address
    eh0:3 SCAN IP address
    eth1: Private IP
    Node 2:
    eth0 : Public IP Address
    eth0:1 Virtual IP address
    eth1: Private IP
    Please let me know if the shown thing is correct or i did some mistake.

  • RAC Node down and ORA-12514

    I have a two node rac setup. One Node went down because of hardware issues. And it seems that I cannot connect from client (jdbc) when SCAN gives particular ip.
    I receive : ORA-12514, TNS:listener does not currently know of service requested in connect descriptor. If DNS returns the correct ip - everything works fine.
    connection string:
    jdbc:oracle:thin:@(DESCRIPTION= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=testracscan.internal.int)(PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=testdb.internal.int)))
    Interfaces show that VIPS and SCANS are assigned correctly on Node 1:
    vlan65 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
    inet addr:192.168.2.10 Bcast:192.168.2.255 Mask:255.255.255.0
    inet6 addr: fe80::2e76:8aff:fe4f:b5cc/64 Scope:Link
    UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
    RX packets:937195 errors:0 dropped:0 overruns:0 frame:0
    TX packets:852745 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:0
    RX bytes:186434457 (177.7 MiB) TX bytes:141217705 (134.6 MiB)
    vlan65:1 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
    inet addr:192.168.2.25 Bcast:192.168.2.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
    vlan65:2 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
    inet addr:192.168.2.35 Bcast:192.168.2.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
    vlan65:3 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
    inet addr:192.168.2.30 Bcast:192.168.2.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
    vlan65:4 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
    inet addr:192.168.2.110 Bcast:192.168.2.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
    vlan65:5 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
    inet addr:192.168.2.115 Bcast:192.168.2.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
    [oracle@srvtestdb1 ~]$ lsnrctl status
    LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 03-SEP-2012 15:35:05
    Copyright (c) 1991, 2011, Oracle. All rights reserved.
    Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
    STATUS of the LISTENER
    Alias LISTENER
    Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
    Start Date 29-AUG-2012 15:52:57
    Uptime 4 days 23 hr. 42 min. 7 sec
    Trace Level off
    Security ON: Local OS Authentication
    SNMP OFF
    Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
    Listener Log File /u01/app/grid/diag/tnslsnr/srvtestdb1/listener/alert/log.xml
    Listening Endpoints Summary...
    (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.10)(PORT=1521)))
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.110)(PORT=1521)))
    Services Summary...
    Service "+ASM" has 1 instance(s).
    Instance "+ASM1", status READY, has 1 handler(s) for this service...
    Service "testdb.internal.int" has 1 instance(s).
    Instance "testdb1", status READY, has 1 handler(s) for this service...
    Service "testdbXDB.internal.int" has 1 instance(s).
    Instance "testdb1", status READY, has 1 handler(s) for this service...
    Service "testdbsvc.internal.int" has 1 instance(s).
    Instance "testdb1", status READY, has 1 handler(s) for this service...
    The command completed successfully
    [oracle@srvtestdb1 ~]$
    SQL> show parameter listener
    NAME TYPE VALUE
    listener_networks string
    local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.2.110)(PORT=1521))))
    remote_listener string testracscan.internal.int:1521
    nslookup testracscan.internal.int
    Server: 192.168.0.18
    Address: 192.168.0.18#53
    Name: testracscan.internal.int
    Address: 192.168.2.30
    Name: testracscan.internal.int
    Address: 192.168.2.25
    Name: testracscan.internal.int
    Address: 192.168.2.35
    Problems arise when client ip is resolved to 192.168.2.35 - i get ORA12514.
    When IP is resolved to 192.168.2.110 it simply sits ant waits for a moment and then begins to work, and nestat shows:
    tcp 0 0 ::ffff:1 192.168.2.5:51685 ::ffff:192.168.2.110:1521 ESTABLISHED
    What might be causing this?

    [grid@srvtestdb1 ~]$ ps -ef|grep tns
    root 65 2 0 Aug29 ? 00:00:00 [netns]
    grid 4449 1 0 Aug29 ? 00:00:25 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN2 -inherit
    grid 4454 1 0 Aug29 ? 00:00:23 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN3 -inherit
    grid 4481 1 0 Aug29 ? 00:00:33 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
    grid 37028 1 0 09:38 ? 00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
    grid 37901 36372 0 09:45 pts/0 00:00:00 grep tns
    [grid@srvtestdb1 ~]$
    [grid@srvtestdb1 ~]$ srvctl config scan_listener
    SCAN Listener LISTENER_SCAN1 exists. Port: TCP:1521
    SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1521
    SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1521
    [grid@srvtestdb1 ~]$
    [grid@srvtestdb1 ~]$ srvctl status scan_listener
    SCAN Listener LISTENER_SCAN1 is enabled
    SCAN listener LISTENER_SCAN1 is running on node srvtestdb1
    SCAN Listener LISTENER_SCAN2 is enabled
    SCAN listener LISTENER_SCAN2 is running on node srvtestdb1
    SCAN Listener LISTENER_SCAN3 is enabled
    SCAN listener LISTENER_SCAN3 is running on node srvtestdb1
    [grid@srvtestdb1 ~]$ srvctl status scan
    SCAN VIP scan1 is enabled
    SCAN VIP scan1 is running on node srvtestdb1
    SCAN VIP scan2 is enabled
    SCAN VIP scan2 is running on node srvtestdb1
    SCAN VIP scan3 is enabled
    SCAN VIP scan3 is running on node srvtestdb1

  • What is best use of 1400 gb SGA (2 rac nodes 768gb each)

    currently using 11.2.0.3.0 on unix sun sever with 2 RAC nodes each 8 UltraSPARC-T1 cpus (came out in 2005) four threads each so oracle sees 32 CPUS very slow(1.2 gb).  Database is 4TB in size on regular SAN (10k speed).
    8gb SGA.
    New boss wants to update system to the max to get best performance possible  Money is a concern of course but budget is pretty high,  Our use case is 12-16 users at same time, running reports some small others very large (return single row or 10000s or rows).  reports take 5 sec to 5 minutes, Our job is get the fastest system possible,  We have total of 8 licenses available so we can have 16 cores.  We are also getting a 6tb all flash SSD array for database.  we can get any CPU we want but we cant use parallel query server due to all kinds of issues we have experienced (too many slaves, RAC interconnect saturation etc, whack-a-mole).  sparc has too many threads and without PS oracle runs query in single thread. 
    we have speced out the following system for each RAC node
    HP ProLiant DL380p Gen8 8 SFF server
    2 Intel Xeon E5-2637v2 3.5GHz/4-core cpus
    768 gb ram
    2 HP 300GB 6G SAS 15K drives for database software
    this will give us total of 4 Xeon E5-2637v2 cpus 16 cores total (,5 factor for 8 licenses) and 1536 ram (leaving ~1400 for sga).  this will guarantee an available core for each user.  we intend to create very very large keep pool around 300 gb for each node that will hold all our dimension tables.  this we hope will reduce reads from the SSD to just data from fact tables.,
    Are we doing a massive overkill here?  the budget for this was way less than what our boss expected.  will that big an sga be wasted will say a 256gb be fine.  or will oracle take advantage of it and be able to keep most blocks in there.
    will an sga that big cause oracle problems due to overhead of handling that much ram?

    Current System:
    ===========
    a. Version : 11.2.0.3
    b. Unix Sun
    c. CPU - 8 cpus with 4 threads => 32 logical cpus or cores
    d. database 4TB
    e. SAN - 10k speed disk drives
    f. 8gb SGA
    g. 1.2 gb ??
    h. Users --> 12-16 concurrent and run reports varying size
    i. reports elasped time 5 sec to 5 mins
    j. cpu license -->8
    Target System
    ===========
    a. Version: 11.2.0.3
    b. HP ProLiant DL380p Gen8 8 SFF server
    c. RAM --> 768 GB
    d. 2 HP 300GB 6G SAS 15K drives for database software
    e. large keep pool -->90 gb to  hold all dimension tables. 
    f.  SSD to just data from fact tables
    g. SGA -->256gb
    Reassessment of the performance issues of current system appears to be required.Good performance tuning expert is required to look into tuning issues of current application by analyzing awr performance metrics . If 8GB SGA is not enough,then reason behind so is that queries running in the system are not having good access path to select lesser data to avoid flushing out of recent buffers from different tables involved in the query. Until those issues are identified , wherever you go, performance issue wont be going away as table size increase in future , problem will reappear.Even if the queries are running with more FULL Scan , then re-platforming to Exadata might be right decision as Exadata has smart scan , cell offloading feature which works faster and might be right direction for best performance and best investment for future.Compression (compress for OLTP) could be one of the other feature to exploit to improve further efficiency while reading the lesser block in lesser read time.
    Investment in infrastructure will solve a few issue in short term but long term issue will again arise.
    Investment in identifying the performance issues of current system would be best investment in current scenario.

  • SCAN LISTENER runs from only one node at a time from /etc/hosts !

    Dear all ,
    Recently I have to configure RAC in oracle 11g(r2) in AIX 6.1 . Since in this moment it is not possible to configure DNS, so I dont use SCAN ip into the DNS/GNS, I just add the SCAN ip into the host file like :
    cat /etc/hosts
    SCAN 172.17.0.22
    Got the info from : http://www.freeoraclehelp.com/2011/12/scan-setup-for-oracle-11g-release211gr2.html#ORACLE11GR2RACINS
    After configuring all the steps of RAC , Every services are ok except SCAN_LISTENER . This listener is up only one node at a time . First time when I chek it from node1 , it shows :
    srvctl status scan_listener
    SCAN listener LISTENER_SCAN1 is enabled
    SCAN listener LISTENER_SCAN1 is running on node dcdbsvr1
    now when I relocate it from node 2 using
    "srvctl relocate scan -i 1-n DCDBSVR2" , then the output shows :
    srvctl status scan_listener
    SCAN listener LISTENER_SCAN1 is enabled
    SCAN listener LISTENER_SCAN1 is running on node dcdbsvr2
    Baring these , we have to try to relocate it from the node2 by the following way, then it shows the error :
    srvctl relocate scan -i 2 -n DCDBSVR2
    resource ora.scan2.vip does not exists
    Now my question , How can I run the SCAN and SCAN_LISTENER both of the NODES ?
    Here is my listener file (which is in the GRID home location) configuration :
    Listener File OF NODE1 AND NODE 2:
    ==================================
    ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON
    ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON
    LISTENER_SCAN1 =
    (DESCRIPTION =
    (ADDRESS = (PROTOCOL = IPC) (KEY = LISTENER_SCAN1)
    ADR_BASE_LISTENER_SCAN1 = /U01/APP/ORACLE
    2)
    Another issue , when I give the command : " ifconfig -a " , then it shows the SCAN ip either node1 or node2 . suppose if the SCAN ip is in the node1 , and then if I run the "relocate" command from node2 , the ip goes to the Node 2 . is it a correct situation ? advice plz ... ...
    thx in advance .. ...
    Edited by: shipon_97 on Jan 10, 2012 7:22 AM
    Edited by: shipon_97 on Jan 10, 2012 7:31 AM

    After configuring all the steps of RAC , Every services are ok except SCAN_LISTENER . This listener is up only one node at a time . First time when I chek it from node1 , it shows :If I am not wrong and after looking at the document you sent, you will be able to use only once scan in case you use /etc/host file and this will be up on only one node where you added this scan entry in /etc/hosts file.
    Now my question , How can I run the SCAN and SCAN_LISTENER both of the NODES ?Probably you can't in your case, you might run only one i think and on one node only
    srvctl status scan_listener
    SCAN listener LISTENER_SCAN1 is enabled
    SCAN listener LISTENER_SCAN1 is running on node dcdbsvr1
    now when I relocate it from node 2 using
    "srvctl relocate scan -i 1 -n DCDBSVR2" , then the output shows :
    srvctl status scan_listener
    SCAN listener LISTENER_SCAN1 is enabled
    SCAN listener LISTENER_SCAN1 is running on node dcdbsvr2You moved scan listener from node 1 to node 2, OK
    Baring these , we have to try to relocate it from the node2 by the following way, then it shows the error :
    srvctl relocate scan -i 2 -n DCDBSVR2
    resource ora.scan2.vip does not exists
    --------------------------------------------------------------------------------Since you have only one scan, you can't relocate "2". So ise "1" instead here also
    FYI
    http://www.oracle.com/technetwork/database/clustering/overview/scan-129069.pdf
    Salman

  • Rac node restart

    Hello everyone,
    I have met an error,that is our RAC node auto restart with below messages.
    #/u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/alert_odsdb1.log
    Fri Jun 07 12:23:42 2013
    Thread 1 cannot allocate new log, sequence 58363
    Checkpoint not complete
    Current log# 2 seq# 58362 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
    Current log# 2 seq# 58362 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
    Fri Jun 07 12:23:42 2013
    NOTE: ASMB terminating
    Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
    ORA-15064: ? ASM ??????
    ORA-03113: ?????????
    ?? ID:
    ?? ID: 2047 ???: 5
    Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
    ORA-15064: ? ASM ??????
    ORA-03113: ?????????
    ?? ID:
    ?? ID: 2047 ???: 5
    ASMB (ospid: 32641): terminating the instance due to error 15064
    Fri Jun 07 12:23:44 2013
    ORA-1092 : opitsk aborting process
    Fri Jun 07 12:23:46 2013
    ORA-1092 : opitsk aborting process
    Instance terminated by ASMB, pid = 32641
    Fri Jun 07 12:25:02 2013
    Starting ORACLE instance (normal)
    Fri Jun 07 12:25:23 2013
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
    [name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
    Public Interface 'eth0' configured from GPnP for use as a public interface.
    [name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
    Public Interface 'eth0:1' configured from GPnP for use as a public interface.
    [name='eth0:1', type=1, ip=135.33.2.13, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
    Picked latch-free SCN scheme 3
    Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/11.2.0/dbhome_2/dbs/arch
    Autotune of undo retention is turned on.
    LICENSE_MAX_USERS = 0
    SYS auditing is disabled
    Starting up:
    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options.
    ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
    System name:     Linux
    Node name:     odsdb1
    Release:     2.6.18-308.el5
    Version:     #1 SMP Fri Jan 27 17:17:51 EST 2012
    Machine:     x86_64
    Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/dbhome_2/dbs/initodsdb1.ora
    System parameters with non-default values:
    processes = 4500
    sessions = 6784
    event = ""
    spfile = "+DATA/odsdb/spfileodsdb.ora"
    nls_language = "SIMPLIFIED CHINESE"
    nls_territory = "CHINA"
    memory_target = 170G
    control_files = "+DATA/odsdb/controlfile/current.262.812288837"
    control_files = "+DATA/odsdb/controlfile/current.261.812288837"
    db_block_size = 8192
    compatible = "11.2.0.0.0"
    db_files = 4096
    cluster_database = TRUE
    db_create_file_dest = "+DATA"
    db_recovery_file_dest = ""
    db_recovery_file_dest_size= 38820M
    thread = 1
    undo_tablespace = "UNDOTBS1"
    instance_number = 1
    remote_login_passwordfile= "EXCLUSIVE"
    db_domain = ""
    dispatchers = "(PROTOCOL=TCP) (SERVICE=odsdbXDB)"
    remote_listener = "odsdb-cluster-scan:1521"
    job_queue_processes = 1000
    audit_file_dest = "/u01/app/oracle/admin/odsdb/adump"
    audit_trail = "DB"
    db_name = "odsdb"
    open_cursors = 300
    diagnostic_dest = "/u01/app/oracle"
    Cluster communication is configured to use the following interface(s) for this instance
    169.254.37.103
    cluster interconnect IPC version:Oracle UDP/IP (generic)
    IPC Vendor 1 proto 2
    Fri Jun 07 12:25:33 2013
    PMON started with pid=2, OS id=22959
    Fri Jun 07 12:25:33 2013
    PSP0 started with pid=3, OS id=22962
    Fri Jun 07 12:25:34 2013
    VKTM started with pid=4, OS id=22971 at elevated priority
    VKTM running at (1)millisec precision with DBRM quantum (100)ms
    Fri Jun 07 12:25:34 2013
    GEN0 started with pid=5, OS id=22977
    Fri Jun 07 12:25:34 2013
    DIAG started with pid=6, OS id=22979
    Fri Jun 07 12:25:35 2013
    DBRM started with pid=7, OS id=22981
    Fri Jun 07 12:25:35 2013
    PING started with pid=8, OS id=22983
    Fri Jun 07 12:25:35 2013
    ACMS started with pid=9, OS id=22985
    Fri Jun 07 12:25:35 2013
    DIA0 started with pid=10, OS id=22987
    Fri Jun 07 12:25:35 2013
    LMON started with pid=11, OS id=22989
    Fri Jun 07 12:25:35 2013
    LMD0 started with pid=12, OS id=22991
    * Load Monitor used for high load check
    * New Low - High Load Threshold Range = [61440 - 81920]
    Fri Jun 07 12:25:35 2013
    LMS0 started with pid=13, OS id=22994 at elevated priority
    Fri Jun 07 12:25:35 2013
    LMS1 started with pid=14, OS id=22998 at elevated priority
    Fri Jun 07 12:25:35 2013
    LMS2 started with pid=15, OS id=23002 at elevated priority
    Fri Jun 07 12:25:35 2013
    LMS3 started with pid=16, OS id=23006 at elevated priority
    Fri Jun 07 12:25:35 2013
    RMS0 started with pid=17, OS id=23010
    Fri Jun 07 12:25:35 2013
    LMHB started with pid=18, OS id=23013
    Fri Jun 07 12:25:35 2013
    MMAN started with pid=19, OS id=23015
    Fri Jun 07 12:25:35 2013
    DBW0 started with pid=20, OS id=23017
    Fri Jun 07 12:25:35 2013
    DBW1 started with pid=21, OS id=23019
    Fri Jun 07 12:25:35 2013
    DBW2 started with pid=22, OS id=23022
    Fri Jun 07 12:25:35 2013
    DBW3 started with pid=23, OS id=23024
    Fri Jun 07 12:25:35 2013
    DBW4 started with pid=24, OS id=23026
    Fri Jun 07 12:25:35 2013
    DBW5 started with pid=25, OS id=23028
    Fri Jun 07 12:25:35 2013
    DBW6 started with pid=26, OS id=23031
    Fri Jun 07 12:25:35 2013
    DBW7 started with pid=27, OS id=23033
    Fri Jun 07 12:25:35 2013
    LGWR started with pid=28, OS id=23035
    Fri Jun 07 12:25:35 2013
    CKPT started with pid=29, OS id=23037
    Fri Jun 07 12:25:35 2013
    SMON started with pid=30, OS id=23039
    Fri Jun 07 12:25:35 2013
    RECO started with pid=31, OS id=23041
    Fri Jun 07 12:25:35 2013
    RBAL started with pid=32, OS id=23043
    Fri Jun 07 12:25:35 2013
    ASMB started with pid=33, OS id=23045
    Fri Jun 07 12:25:35 2013
    MMON started with pid=34, OS id=23048
    Fri Jun 07 12:25:35 2013
    MMNL started with pid=35, OS id=23052
    Fri Jun 07 12:25:35 2013
    starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
    NOTE: initiating MARK startup
    starting up 1 shared server(s) ...
    Starting background process MARK
    Fri Jun 07 12:25:35 2013
    MARK started with pid=37, OS id=23056
    NOTE: MARK has subscribed
    lmon registered with NM - instance number 1 (internal mem no 0)
    Reconfiguration started (old inc 0, new inc 119)
    List of instances:
    1 2 (myinst: 1)
    Global Resource Directory frozen
    * allocate domain 0, invalid = TRUE
    Communication channels reestablished
    * domain 0 valid according to instance 2
    * domain 0 valid = 1 according to instance 2
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Submitted all GCS remote-cache requests
    Fix write in gcs resources
    Reconfiguration started (old inc 119, new inc 121)
    List of instances:
    1 2 (myinst: 1)
    Nested reconfiguration detected.
    Global Resource Directory frozen
    Communication channels reestablished
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Fri Jun 07 12:25:45 2013
    Submitted all GCS remote-cache requests
    Fri Jun 07 12:26:08 2013
    Fix write in gcs resources
    Reconfiguration complete
    Fri Jun 07 12:26:10 2013
    LCK0 started with pid=40, OS id=23632
    Fri Jun 07 12:26:10 2013
    Starting background process RSMN
    Fri Jun 07 12:26:10 2013
    RSMN started with pid=41, OS id=23646
    ORACLE_BASE not set in environment. It is recommended
    that ORACLE_BASE be set in the environment
    Reusing ORACLE_BASE from an earlier startup = /u01/app/oracle
    Fri Jun 07 12:26:11 2013
    ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=135.33.2.13)(PORT=1521))))' SCOPE=MEMORY SID='odsdb1';
    ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
    Fri Jun 07 12:26:11 2013
    NOTE: Loaded library: System
    Fri Jun 07 12:26:11 2013
    SUCCESS: diskgroup DATA was mounted
    Fri Jun 07 12:26:11 2013
    NOTE: dependency between database odsdb and diskgroup resource ora.DATA.dg is established
    Fri Jun 07 12:26:16 2013
    Successful mount of redo thread 1, with mount id 3452000551
    Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
    Lost write protection disabled
    Completed: ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
    ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
    Picked broadcast on commit scheme to generate SCNs
    Thread 1 advanced to log sequence 58364 (thread open)
    Thread 1 opened at log sequence 58364
    Current log# 2 seq# 58364 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
    Current log# 2 seq# 58364 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
    Successful open of redo thread 1
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    Fri Jun 07 12:26:21 2013
    SMON: enabling cache recovery
    Fri Jun 07 12:26:23 2013
    minact-scn: Inst 1 is a slave inc#:121 mmon proc-id:23048 status:0x2
    minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
    Fri Jun 07 12:26:34 2013
    [23651] Successfully onlined Undo Tablespace 2.
    Undo initialization finished serial:0 start:2061372614 end:2061384964 diff:12350 (123 seconds)
    Verifying file header compatibility for 11g tablespace encryption..
    Verifying 11g file header compatibility for tablespace encryption completed
    Fri Jun 07 12:26:34 2013
    SMON: enabling tx recovery
    Database Characterset is ZHS16GBK
    No Resource Manager plan active
    Starting background process GTX0
    Fri Jun 07 12:26:35 2013
    GTX0 started with pid=45, OS id=23931
    Starting background process RCBG
    Fri Jun 07 12:26:35 2013
    RCBG started with pid=46, OS id=23933
    replication_dependency_tracking turned off (no async multimaster replication found)
    Starting background process QMNC
    Fri Jun 07 12:26:35 2013
    QMNC started with pid=48, OS id=23940
    Completed: ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
    Fri Jun 07 12:26:38 2013
    Starting background process CJQ0
    Fri Jun 07 12:26:38 2013
    CJQ0 started with pid=55, OS id=23977
    Fri Jun 07 12:27:56 2013
    Thread 1 advanced to log sequence 58365 (LGWR switch)
    Current log# 1 seq# 58365 mem# 0: +DATA/odsdb/onlinelog/group_1.263.812288839
    Current log# 1 seq# 58365 mem# 1: +DATA/odsdb/onlinelog/group_1.264.812288839
    Fri Jun 07 12:28:18 2013
    Starting background process SMCO
    Fri Jun 07 12:28:18 2013
    SMCO started with pid=70, OS id=25166
    Fri Jun 07 12:29:01 2013
    Thread 1 cannot allocate new log, sequence 58366
    Trace file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc
    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options
    ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
    System name: Linux
    Node name: odsdb1
    Release: 2.6.18-308.el5
    Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
    Machine: x86_64
    Instance name: odsdb1
    Redo thread mounted by this instance: 0 <none>
    Oracle process number: 33
    Unix process pid: 32641, image: oracle@odsdb1 (ASMB)
    *** 2013-05-14 15:37:08.705
    *** SESSION ID:(3499.1) 2013-05-14 15:37:08.705
    *** CLIENT ID:() 2013-05-14 15:37:08.705
    *** SERVICE NAME:() 2013-05-14 15:37:08.705
    *** MODULE NAME:() 2013-05-14 15:37:08.705
    *** ACTION NAME:() 2013-05-14 15:37:08.705
    NOTE: initiating MARK startup
    *** 2013-05-14 15:37:16.835
    instance health monitoring reports instance shutting down
    *** 2013-06-07 12:23:42.700
    NOTE: ASMB terminating
    ORA-15064: ? ASM ??????
    ORA-03113: ?????????
    ?? ID:
    ?? ID: 2047 ???: 5
    error 15064 detected in background process
    ORA-15064: ? ASM ??????
    ORA-03113: ?????????
    ?? ID:
    ?? ID: 2047 ???: 5
    kjzduptcctx: Notifying DIAG for crash event
    ----- Abridged Call Stack Trace -----
    ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53<-ksuitm()+1332<-ksbrdp()+3344<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
    ----- End of Abridged Call Stack Trace -----
    *** 2013-06-07 12:23:42.783
    ASMB (ospid: 32641): terminating the instance due to error 15064
    /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
    NOTE: ASMB process exiting, either shutdown is in progress
    NOTE: or foreground connected to ASMB was killed.
    Fri Jun 07 12:23:42 2013
    NOTE: client exited [14808]
    Fri Jun 07 12:23:44 2013
    Received an instance abort message from instance 2
    Please check instance 2 alert and LMON trace files for detail.
    Fri Jun 07 12:23:44 2013
    Received an instance abort message from instance 2
    Please check instance 2 alert and LMON trace files for detail.
    LMD0 (ospid: 31201): terminating the instance due to error 481
    Instance terminated by LMD0, pid = 31201
    Fri Jun 07 12:24:30 2013
    * instance_number obtained from CSS = 1, checking for the existence of node 0...
    * node 0 does not exist. instance_number = 1
    Starting ORACLE instance (normal)
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
    [name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
    Public Interface 'eth0' configured from GPnP for use as a public interface.
    [name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
    Picked latch-free SCN scheme 3
    Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/11.2.0.2/grid/dbs/arch
    Autotune of undo retention is turned on.
    LICENSE_MAX_USERS = 0
    [grid@odsdb1 cssd]$ file core.30481
    core.30481: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'ocssd.bin'
    [grid@odsdb1 cssd]$ gdb
    gdb gdbserver gdbtui
    [grid@odsdb1 cssd]$ gdb ocssd.bin core.30481
    GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5)
    Copyright (C) 2009 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /u01/app/11.2.0.2/grid/bin/ocssd.bin...(no debugging symbols found)...done.
    [New Thread 30486]
    [New Thread 30530]
    [New Thread 30526]
    [New Thread 30525]
    [New Thread 30523]
    [New Thread 30522]
    [New Thread 30521]
    [New Thread 30520]
    [New Thread 30519]
    [New Thread 30504]
    [New Thread 30503]
    [New Thread 30495]
    [New Thread 30485]
    [New Thread 30484]
    [New Thread 30483]
    [New Thread 30481]
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libhasgen11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libhasgen11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libocr11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocr11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrb11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrb11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrutl11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrutl11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxn2.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxn2.so
    Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libdl.so.2
    Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libm.so.6
    Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
    [Thread debugging using libthread_db enabled]
    Loaded symbols for /lib64/libpthread.so.0
    Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libnsl.so.1
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libcell11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libcell11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxp11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxp11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libnnz11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnnz11.so
    Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libc.so.6
    Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib64/libaio.so.1
    Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib64/ld-linux-x86-64.so.2
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libnque11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnque11.so
    Reading symbols from /opt/oracle/extapi/64/asm/orcl/1/libasm.so...(no debugging symbols found)...done.
    Loaded symbols for /opt/oracle/extapi/64/asm/orcl/1/libasm.so
    warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff505fd000
    Core was generated by `/u01/app/11.2.0.2/grid/bin/ocssd.bin '.
    Program terminated with signal 6, Aborted.
    #0 0x000000369ea30265 in raise () from /lib64/libc.so.6
    (gdb) where
    #0 0x000000369ea30265 in raise () from /lib64/libc.so.6
    #1 0x000000369ea31d10 in abort () from /lib64/libc.so.6
    #2 0x00002afc67f9aeda in scls_abort (flags=0) at scls.c:7088
    #3 0x000000000040babd in clssscExit (thrd=0x10d325a0, status=clssscreasonSHUTNORM) at clsssc.c:2155
    #4 0x0000000000446221 in clssgmClientShutdown (thrd=0x10d325a0, cmInfo=0x10b40090) at clssgmc.c:6415
    #5 0x0000000000436707 in clssgmProcClientReqs (thrd=0x10d325a0, clctx=0x10b40630) at clssgmc.c:704
    #6 0x0000000000436405 in clssgmclientlsnr (thrd=0x10d325a0) at clssgmc.c:644
    #7 0x000000000040ac2f in clssscthrdmain (thrd=0x10d325a0) at clsssc.c:1716
    #8 0x000000369fa0677d in start_thread () from /lib64/libpthread.so.0
    #9 0x000000369ead49ad in clone () from /lib64/libc.so.6
    (gdb)
    2013-06-07 12:19:37.377: [    CSSD][1085888832]clssscSelect: cookie accept request 0x10b40630
    2013-06-07 12:19:37.377: [    CSSD][1085888832]clssgmAllocProc: (0x2aaab0133ea0) allocated
    2013-06-07 12:19:37.379: [    CSSD][1085888832]clssgmClientConnectMsg: properties of cmProc 0x2aaab0133ea0 - 1,2,3,4,5
    2013-06-07 12:19:37.379: [    CSSD][1085888832]clssgmClientConnectMsg: Connect from con(0x6ae44fa) proc(0x2aaab0133ea0) pid(14139/14139) version 11:2:1:4, properties: 1,2,3,4,5
    2013-06-07 12:19:37.379: [    CSSD][1085888832]clssgmClientConnectMsg: msg flags 0x0000
    2013-06-07 12:19:37.384: [    CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
    2013-06-07 12:19:37.384: [    CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
    2013-06-07 12:19:37.384: [    CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(1/0x2aaab010c5c0)
    2013-06-07 12:19:37.385: [    CSSD][1085888832]clssgmRegisterShared: grp DBODSDB, mbr 0, type 1
    2013-06-07 12:19:37.385: [    CSSD][1085888832]clssgmQueueShare: (0x2aaab0085790) target global grock DBODSDB member 0 type 1 queued from client (0x2aaab010c5c0), global grock DBODSDB, refcount 23
    2013-06-07 12:19:37.385: [    CSSD][1085888832]clssgmRegisterShared: global grock DBODSDB member 0 share type 1, refcount 23
    2013-06-07 12:19:37.391: [    CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
    2013-06-07 12:19:37.391: [    CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
    2013-06-07 12:19:37.391: [    CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(2/0x2aaab0061f10)
    what is the problem
    Edited by: 徐振富 on 2013-6-7 下午6:38
    Edited by: 徐振富 on 2013-6-7 下午6:45

    is your ASM instance up?
    If not, trying bring up ASM instance up just by itself and see if it throws any error?
    Post status of crsctl status cluster -all

  • RAC node outage causes SOA Suite 10.1.3.4 BPEL  failure

    Using weblogic 9.2 and the SOA Suite 10.1.3.4. We use a 10g Oracle RAC ( 2 nodes ); the WL cluster has a multi data source of 2 pools, each pool pointing to a single node in the rac, each pool deployed to the cluster, and the multi data source in load-balancing mode.
    So the other night, one of the db nodes had a hardware failure ( ironically, with a remote monitoring / management card ). Annoying, but it should not have caused the BPEL servers to be in "FAILED NOT RESTARTABLE" status the next morning.
    Jun 9, 2009 12:10:07 AM EDT> <Warning> <JDBC> <BEA-001129> <Received exception while creating connection for pool "esbaqds2": Io exception: The Network Adapter could not establish the connection>
    SEVERE: Destroying JMSDequeuer failed
    oracle.jms.AQjmsException: Connection has been administratively destroyed. Reconnect.
    at oracle.jms.AQjmsSession.preClose(AQjmsSession.java:980)
    at oracle.jms.AQjmsObject.close(AQjmsObject.java:409)
    at oracle.jms.AQjmsSession.close(AQjmsSession.java:1020)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.destroy(JMSDequeuer.java:419)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.destroyWithoutUnsubscribing(JMSDequeuer.java:395)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:175)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
    followed by a 2 GB log file containing 1.3 million iterations of the following within the next 10 minutes before the managed servers failed.
    java.lang.NullPointerException
    at java.lang.String.<init>(String.java:144)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:168)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
    Both managed instances of the BPEL cluster failed, even though the 1st node of the Oracle RAC was still available.
    Our 10.3 cluster, also using multi data sources to the same RAC for the OSB components, simply went on about its business using the remaining rac node pool.
    Seems to be a single point of failure...

    We haven't changed the JDBC connection string yet, but we did run a test in the same environment while Oracle support considers the situation.
    For the test, we simply shutdown one node of the RAC and watched to see what happens. Within the space of a minute, the JDBC "Failed Reserve Request Count" was increasing by thousands on every refresh of the screen. We restarted the RAC node after 5 minutes, by which time the "Failed Reserve Request Count" was over 190,000
    The 2 BPEL managed servers remained in Running status and each created a 660 MB log file within that 5 minutes. In the original outage, the nodes were down for about 15 minutes. Most of the logging is being generated from within the oracle.tip.esb classes, not by the weblogic classes. It looks like that once the pool pointing to the downed RAC node becomes disabled, the Oracle BPEL code is still trying to use it even though the multi-source JNDI is the published lookup:
    INFO: JMSDequeuer::createConnection - AQ Topics
    java.sql.SQLException: weblogic.common.ResourceException:
    esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
    esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
    at weblogic.jdbc.common.internal.JDBCUtil.wrapAndThrowResourceException(JDBCUtil.java:250)
    at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:348)
    at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:364)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:559)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
    Caused by: weblogic.common.ResourceException:
    esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
    esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
    at weblogic.jdbc.common.internal.MultiPool.searchLoadBalance(MultiPool.java:331)
    at weblogic.jdbc.common.internal.MultiPool.findPool(MultiPool.java:202)
    at weblogic.jdbc.common.internal.ConnectionPoolManager.reserve(ConnectionPoolManager.java:77)
    at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:346)
    ... 11 more
    SEVERE: Failed to process deferred message
    oracle.tip.esb.server.dispatch.QueueHandlerException: Error creating "weblogic.common.ResourceException: No good connections available."
    at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:661)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)

  • Exception while failing over to 2nd RAC Node

    We are using Weblogic 10.3.4. Our setup is that we have a Web Application (A tapestry front end Web UI) and EJb 2.1 back-end talking to the Oracle database. The EJB’s are CMP. Our product always was just stand alone and it wasn’t until this release we needed to make it work with RAC. To get this to work we followed the model of having a Multidatasource with datasources pointing to our RAC nodes. We have two types of datasources that we use persistent and non-persistent. And we are using the Oracle thin driver – non-XA for RAC Service Instances, supporting global transactions.
    When we do failover to the 2nd node we get a nasty exception in our GUI but after logging out and logging back it we are fine.
    My question is that I assumed I shouldn't have to restart our web-application and it should have stayed up ?? Or is there something wrong with our setup ?
    Thanks,
    Ian

    Showing us the exception and/or the error messages at the server might help...
    Note that failing over does not save any ongoing connection or transaction that
    had been to the dead RAC node... Does your web-app get-use-close JDBC
    connections on a per-user-invoke basis, or does it hold onto connections?
    Joe

  • Private Interconnect: Should any nodes other than RAC nodes have one?

    The contractors that set up our four-node production 10g RAC (and a standalone development server) also assigned private interconnect addresses to 2 Apache/ApEx servers and a standalone development database server.
    There are service names in the tnsnames.ora on all servers in our infrastructure referencing these private interconnects- even the non-rac member servers. The nics on these servers are not bound for failover with the nics bound to the public/VIP addresses. These nics are isolated on their own switch.
    Could this configuration be related to lost heartbeats or voting disk errors? We experience rac node expulsions and even arbitrary bounces (reboots!) of all the rac nodes.

    I do not have access to the contractors. . . .can only look at what they have left behind and try to figure out their intention. . .
    I am reading the Ault/Tumha book Oracle 10g Grid and Real Application Clusters and looking through our own settings and config files and learning srvctl and crsctl commands from their examples. Also googling and OTN searching through the library full of documentation. . .
    I still have yet to figure out if the private interconnect spoken about so frequently in cluster configuration documents are the binding to the set of node.vip address specifications in the tnsnames.ora (bound the the first eth adaptor along with the public ip addresses for the nodes) or the binding on the second eth adaptor to the node.prv addresses not found in the local pfile, in the tnsnames.ora, or the listener.ora (but found at the operating system level in the ifconfig). If the node.prv addresses are not the private interconnect then can anyone tell me that they are for?

  • Found the errors in CSSD logs of RAC node

    Found the below error in CSSD logs in One of RAC nodes from 5:15 to 5:18 PM, after this the error got disappeared. Could anyone please have an idea what could be the reason of this error.
    Also, at that time we didn't find any errors in the alert log.
    [    CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization failed (112bd2a70), timed out, start 17:13:51.041, duration 120009
    [    CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization prepare time: 2 ms
    [    CSSD]2009-07-19 17:15:51.233 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b67930) proc(112b680b0) pid(1049540) proto(10:2:1:1)
    [    CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization failed (112bd4a10), timed out, start 17:13:51.268, duration 120003
    [    CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization prepare time: 3 ms
    [    CSSD]2009-07-19 17:15:52.544 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b67930) proc(112b680b0) pid(786918) proto(10:2:1:1)
    [    CSSD]2009-07-19 17:15:53.297 [3600] >TRACE: Authorization failed (112c38af0), timed out, start 17:13:53.290, duration 120009
    [    CSSD]2009-07-19 17:15:53.297 [3600] >TRACE: Authorization prepare time: 3 ms
    [    CSSD]2009-07-19 17:15:53.317 [3600] >TRACE: Authorization failed (112d356f0), timed out, start 17:13:53.320, duration 120000
    [    CSSD]2009-07-19 17:15:53.317 [3600] >TRACE: Authorization prepare time: 2 ms
    [    CSSD]2009-07-19 17:16:02.342 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b932b0) proc(112b67d10) pid(1336252) proto(10:2:1:1)
    [    CSSD]2009-07-19 17:16:02.977 [3600] >TRACE: Authorization failed (112d04f70), timed out, start 17:14:02.978, duration 120001
    [    CSSD]2009-07-19 17:16:02.977 [3600] >TRACE: Authorization prepare time: 2 ms
    [    CSSD]2009-07-19 17:16:03.007 [3600] >TRACE: Authorization failed (112d38210), timed out, start 17:14:03.006, duration 120002
    [    CSSD]2009-07-19 17:16:03.007 [3600] >TRACE: Authorization prepare time: 2 ms
    [    CSSD]2009-07-19 17:16:10.447 [3600] >TRACE: Authorization failed (112bd7e30), timed out, start 17:14:10.441, duration 120007
    [    CSSD]2009-07-19 17:16:10.447 [3600] >TRACE: Authorization prepare time: 2 ms
    [    CSSD]2009-07-19 17:16:10.847 [3600] >TRACE: Authorization failed (112d3ee70), timed out, start 17:14:10.840, duration 120008
    [    CSSD]2009-07-19 17:16:10.847 [3600] >TRACE: Authorization prepare time: 2 ms
    Thanks,
    Mahi

    Check the metalink note:
    6996694-OCSSD.BIN CONSUMING 100% CPU AND ASM/DB HANGING

  • Question on Rebooting RAC Nodes

    Hi, I heard that when rebooting all RAC nodes, one has to wait at least 5 minutes between each node reboot. So you would reboot node 1 at 0 time, node 2 5 minutes later etc.
    However, I could not find any documentation on this, can someone please point me to the right place to look? Thanks.

    I have not heard that before. I generally use srvctl to stop/start the databases, I do not think it waits 5 minutes between starting nodes as it does not take 15 minutes to stop/start the databases. As far as rebooting the hosts, the only time interval between machines was the time it took to send the reboot command to the host.

Maybe you are looking for