X86 cluster 3.2

Hi
I have a 2-node cluster with one RG configured.
Currently the RG and one of its resources show as Faulted:
# clrg status
=== Cluster Resource Groups ===
Group Name   Node Name       Suspended   Status
oracle-rg    cert1.cin.com   No          Online_faulted
             cert2.cin.com   No          Offline
# clrs status
=== Cluster Resources ===
Resource Name        Node Name       State     Status Message
oracle-storage-res   cert1.cin.com   Online    Faulted - I/O timed out on path /dev/md/ora-data/rdsk/d9
                     cert2.cin.com   Offline   Offline
There are 5 disks in the ora-data diskset.
The output of the "cldevice show" command on cert2.cin.com shows the physical path of the d9 disk above as accessible, but running the cluster status command on cert2.cin.com hangs midway, after the resource status output.
I am not able to log in to cert1.cin.com, where the RG is active. It does not reach the login prompt, but the server responds to ping.
I am not sure why the resource shows as Online while its status message says Faulted, and the node is also hung.
Please help me clear the Faulted state and recover from this condition.
Can I switch the RG to cert2.cin.com safely?
Appreciate any help. TIA

Hi ra*326096*ul,
it seems there was a timeout on device /dev/md/ora-data/rdsk/d9 on node cert1. The /var/adm/messages file may give more information about the cause of this timeout. Either way, you need to log in to cert1 to check this. If you cannot log in over the network, perhaps you can reach cert1 via its console port and check the status of the network and of the SVM d9 device. Also, which cluster 'did' devices underlie the SVM d9 device? You can use metastat to find out if you can reach cert1 via the console port.
It is not certain that you can switch the RG safely to cert2, because the 'cluster status' command does not finish successfully: after the resource output, it goes on to check the cluster 'did' devices. So if you know which cluster 'did' devices underlie the SVM d9 device, you can check whether those 'did' devices are OK. The 'cldevice show' command shows the cluster 'did' devices, not the SVM d9 device, so cluster 'did' device d9 is not necessarily part of SVM device d9. Once you know the 'did' devices that make up SVM d9, check whether their physical devices are OK. 'scdidadm -L' may also help, since it shows a summary of the cluster 'did' devices across all nodes. When you have confirmed that the physical devices of SVM d9 are OK on cert2, I believe you can switch the RG successfully. Does 'scstat -D' work on cert2?
Hth,
  Juergen
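Juergen's suggestion comes down to two lookups: which DID devices back the SVM metadevice (from metastat on the diskset), and which node:path entries reach each of those DID devices (from scdidadm -L). A minimal Python sketch of the second step, assuming the usual three-column `scdidadm -L` layout (instance number, node:/dev/rdsk path, /dev/did/rdsk path). The sample lines and the DID instance d5 in the usage are invented for illustration, not taken from this cluster.

```python
import sys
from collections import defaultdict
from typing import Dict, List


def parse_scdidadm_L(text: str) -> Dict[str, List[str]]:
    """Map each DID device path to the node:physical-path entries that reach it."""
    paths: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        tokens = line.split()
        # Expected layout (assumed): <instance> <node>:<physical path> <DID path>
        if len(tokens) != 3 or not tokens[0].isdigit():
            continue
        _instance, node_path, did_path = tokens
        paths[did_path].append(node_path)
    return dict(paths)


def nodes_reaching(did_map: Dict[str, List[str]], did_path: str) -> List[str]:
    """Sorted node names that list a physical path to the given DID device."""
    return sorted({entry.split(":", 1)[0] for entry in did_map.get(did_path, [])})


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): python3 <script> saved_scdidadm_L.txt /dev/did/rdsk/d5 [...]
    with open(sys.argv[1]) as fh:
        did_map = parse_scdidadm_L(fh.read())
    for did in sys.argv[2:]:
        print(did, did_map.get(did, []), nodes_reaching(did_map, did))
```

This only shows which paths are configured; whether each disk actually responds on cert2 still has to be checked on that node before switching the RG.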

Similar Messages

  • Linux/x86 cluster working with Xgrid

    Hi all! Does anyone know how I can connect my old Linux/x86 cluster (running Red Hat Fedora Core 1) and a new one (running Mandriva Clustering) as agents in Xgrid? Thanks!

    At sourceforge.net you can find a Java-based Xgrid client for Windows/Linux/UNIX. Maybe it can help you.

  • Zpool wont import: /var/cluster/run/HAStoragePlus/zfs not a valid directory

    I have a two-node Solaris 10 x86 cluster that refuses to fail the HASP zpool over to the second node.
    On node 2, the log complains that /var/cluster/run/HAStoragePlus/zfs is not a valid directory:
    Aug 17 22:23:28 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 148650 daemon.notice] Started searching for devices in '/dev/dsk' to find the importable pools.
    Aug 17 22:23:35 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 547433 daemon.notice] Completed searching the devices in '/dev/dsk' to find the importable pools.
    Aug 17 22:23:35 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 471757 daemon.error] cannot import pool 'qnap' : '/var/cluster/run/HAStoragePlus/zfs' is not a valid directory
    Aug 17 22:23:35 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 117328 daemon.error] The pool 'qnap' failed to import and populate cachefile.
    Aug 17 22:23:35 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 292307 daemon.error] Failed to import:qnap
    In fact, on node 2 this folder is missing; the /var/cluster/run/HAStoragePlus folder isn't there.
    If I create that folder, and the zfs folder inside it, then switchover works flawlessly both ways, but if I reboot node 2, the folder gets cleared and the problem comes back. It looks like a bug to me, but I searched SunSolve without finding anything.
    This is Sun Cluster 3.2 update 2 with latest patchset, running on a Solaris 10 x86 update 7 (latest).
    I already tried removing and recreating the HASP resource, with no luck.
    Any hint will be greatly appreciated.
    Thanks
    Rick
    Edited by: leopardus2 on Aug 18, 2009 2:46 AM

    After many sleepless nights looking at this problem, I ended up solving it myself just 30 minutes after posting here!
    The problem was that the ZFS plugin was not configured in /etc/cluster/eventlog/eventlog.conf.
    I think this happened because I installed HASP on node 1 before node 2 joined the cluster for the first time.
    This should be marked as a bug, IMHO...
    Thanks
    Rick

  • Error when creating zone cluster

    Hello,
    I have the following setup: Solaris 11.2 x86, cluster 4.2. I have already configured the cluster and it's up and running. I am trying to create a zone cluster, but I am getting the following error:
    >>> Result of the Creation for the Zone cluster(ztestcluster) <<<
        The zone cluster is being configured with the following configuration
            /usr/cluster/bin/clzonecluster configure ztestcluster
            create
            set zonepath=/zclusterpool/znode
            set brand=cluster
            set ip-type=shared
            set enable_priv_net=true
            add sysid
            set  root_password=********
            end
            add node
            set physical-host=node2
            set hostname=zclnode2
            add net
            set address=192.168.10.52
            set physical=net1
            end
            end
            add node
            set physical-host=node1
            set hostname=zclnode1
            add net
            set address=192.168.10.51
            set physical=net1
            end
            end
            add net
            set address=192.168.10.55
            end
    java.lang.NullPointerException
            at java.util.regex.Matcher.getTextLength(Matcher.java:1234)
            at java.util.regex.Matcher.reset(Matcher.java:308)
            at java.util.regex.Matcher.<init>(Matcher.java:228)
            at java.util.regex.Pattern.matcher(Pattern.java:1088)
            at com.sun.cluster.zcwizards.zonecluster.ZCWizardResultPanel.consoleInteraction(ZCWizardResultPanel.java:181)
            at com.sun.cluster.dswizards.clisdk.core.IteratorLayout.cliConsoleInteraction(IteratorLayout.java:563)
            at com.sun.cluster.dswizards.clisdk.core.IteratorLayout.displayPanel(IteratorLayout.java:623)
            at com.sun.cluster.dswizards.clisdk.core.IteratorLayout.run(IteratorLayout.java:607)
            at java.lang.Thread.run(Thread.java:745)
                 ERROR: System configuration error
                 As a result of a change to the system configuration, a resource that this
                 wizard will create is now invalid. Review any changes that were made to the
                 system after you started this wizard to determine which changes might have
                 caused this error. Then quit and restart this wizard.
        Press RETURN to close the wizard
    No errors in /var/adm/messages.
    Any ideas?
    Thank you!

    I must be making some obvious, stupid mistake, because I still get that "not enough space" error:
    root@node1:~# clzonecluster show ztestcluster
    === Zone Clusters ===
    Zone Cluster Name:                              ztestcluster
      zonename:                                        ztestcluster
      zonepath:                                        /zcluster/znode
      autoboot:                                        TRUE
      brand:                                           solaris
      bootargs:                                        <NULL>
      pool:                                            <NULL>
      limitpriv:                                       <NULL>
      scheduling-class:                                <NULL>
      ip-type:                                         shared
      enable_priv_net:                                 TRUE
      resource_security:                               SECURE
      --- Solaris Resources for ztestcluster ---
      Resource Name:                                net
        address:                                       192.168.10.55
        physical:                                      auto
      --- Zone Cluster Nodes for ztestcluster ---
      Node Name:                                    node2
        physical-host:                                 node2
        hostname:                                      zclnode2
        --- Solaris Resources for node2 ---
      Node Name:                                    node1
        physical-host:                                 node1
        hostname:                                      zclnode1
        --- Solaris Resources for node1 ---
    root@node1:~# clzonecluster install ztestcluster
    Waiting for zone install commands to complete on all the nodes of the zone cluster "ztestcluster"...
    clzonecluster:  (C801046) Command execution failed on node node2. Please refer to the console for more information
    clzonecluster:  (C801046) Command execution failed on node node1. Please refer to the console for more information
    But I have enough file system space. I increased the virtual HDD to 25 GB on each node. After the global cluster installation, I still have 16 GB free on each node. During the install I constantly check the free space, and it should be enough (only about 500 MB is consumed by downloaded packages, which leaves about 15.5 GB free). And every time, the installation fails at the "apply-sysconfig checkpoint"...

  • I/O tuning on Linux X86

    As part of our due diligence, we are researching possible issues with moving a data warehouse database from a Sun 12K to RAC on an x86 Linux cluster from HP or Dell. One of the issues is the I/O bus speed of the 12K compared with the HP or Dell cluster: the throughput for PCI-e on the Sun 12K is about 8 GB/s, compared with about 800 MB/s for a decent 64-bit/100 MHz PCI bus on a Lintel box.
    Our SAN storage would still be on EMC, FC-attached, most likely using a CFS. Does anyone think this is a real concern? If all else remains the same and we move the database from the Sun 12K to a Linux x86 cluster, will I/O, specifically bus speed, become a bottleneck?
    Thanks
    Praveen
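The PCI figure in the question follows from peak bandwidth = bus width in bytes × clock rate × transfers per clock. A small Python check of that arithmetic; the 8 GB/s figure for the 12K is taken from the post as given rather than verified, and both numbers are theoretical peaks, not sustained throughput.

```python
def peak_bus_bandwidth(width_bits: int, clock_hz: int, transfers_per_clock: int = 1) -> int:
    """Theoretical peak bandwidth of a parallel bus, in bytes per second."""
    return (width_bits // 8) * clock_hz * transfers_per_clock


pci_64_100 = peak_bus_bandwidth(64, 100_000_000)  # 64-bit PCI at 100 MHz
sun_12k_quoted = 8 * 10**9                        # 8 GB/s, as quoted in the post

print(f"PCI 64-bit/100 MHz: {pci_64_100 / 10**6:.0f} MB/s")
print(f"Quoted 12K figure is {sun_12k_quoted / pci_64_100:.0f}x that peak")
```

Whether this becomes the bottleneck depends on how many HBAs and separate buses the x86 nodes have, and on what the EMC array and FC links can actually deliver.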

    By default in a Linux environment (2.6 kernel), large I/O operations are broken into 512K chunks, splitting system I/O into smaller sizes. On Sun you can perform large 1 MB I/Os, so when you move to Linux you should expect some 'degradation'.
    To allow Oracle to perform large I/O operations on Linux, you should adjust some kernel parameters; check Metalink for the parameters aio-max-size, aio-nr, aio-max-nr, etc.
    regards,
    goran
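To make the 512K versus 1 MB point concrete: the maximum I/O size sets how many separate requests a large sequential read is split into. A Python sketch of that arithmetic, using binary units and a hypothetical 1 GiB scan; it counts requests only and says nothing about per-request overhead, which is what actually determines the cost.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def requests_needed(total_bytes: int, max_io_bytes: int) -> int:
    """Number of I/O requests when a transfer is split at max_io_bytes (ceiling division)."""
    return -(-total_bytes // max_io_bytes)


scan = 1 * GIB  # hypothetical 1 GiB sequential scan
for label, size in (("512 KiB", 512 * KIB), ("1 MiB", 1 * MIB)):
    print(f"{label} max I/O: {requests_needed(scan, size)} requests")
```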

  • Solaris 8 x86 patch cluster installation

    Hi. I recently installed Solaris 8 x86 on my Intel P4 machine. The installation went fine, but I am now running into problems installing the 8 x86 Recommended Patch Cluster.
    Whenever I run the install_cluster script, it says every patch failed to install with return code 1. I even tried installing the patches manually via patchadd, but I get a message saying the patch directory is not valid.
    I downloaded the 8 x86 Recommended Patch Cluster zip file on a Windows 2000 machine, unzipped it, and burned the patches onto a CD. I then copied the patches from the CD onto my Solaris machine and tried to install them from there. So far this doesn't work, and I don't know what I'm doing wrong.
    Does anyone know how to fix this problem? Thanks.

    Hi again. I just fixed my problem. Apparently, when I unzipped the file and then burned it onto a CD, the data got corrupted. I fixed it by copying the zip file itself onto the CD and then extracting the patches on the Solaris machine.
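The fix above amounts to: move the archive itself, then check and extract it on the target machine. A minimal Python sketch that tests every member's CRC with `zipfile.ZipFile.testzip()` before extracting, so a damaged copy is caught before patchadd ever sees it; the archive and directory names in the usage line are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def verify_and_extract(zip_path: str, dest_dir: str) -> List[str]:
    """Check every member's CRC, then extract; raise if the archive is damaged."""
    with zipfile.ZipFile(zip_path) as zf:  # raises zipfile.BadZipFile if not a zip at all
        bad_member = zf.testzip()          # name of the first corrupt member, or None
        if bad_member is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad_member}")
        Path(dest_dir).mkdir(parents=True, exist_ok=True)
        zf.extractall(dest_dir)
        return zf.namelist()
```

Usage (hypothetical paths): `verify_and_extract("8_x86_Recommended.zip", "/var/tmp/patches")`. Checking before extracting means nothing is written when the copy is bad.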

  • X86 sc3.1-0805 sol10-0606 - Doesn't boot in cluster mode

    Hi,
    I'm having my first experience with Sun Cluster on x86.
    I've already tried at home with two P4 whiteboxes, and now I'm repeating the experiment here at work with a similar configuration. (I happily run 6 V490s in 3 clustered pairs with 3510 FC arrays, and a test clustered pair of 2 U10s with a MultiPack.)
    No matter what I try, I always end up with the same result:
    the nodes boot up outside of the cluster. The interconnects don't start and (maybe because of that) global devices don't get initialized.
    I have already tried many reinstalls, and I tried adding etc/cluster/nodeid to filelist.ramdisk, updating the boot archive and reconfiguring, as described in an infodoc as a workaround for a well-known problem, but nothing changed.
    This is the situation when I start either of the nodes:
    mordor-nodo2 # svcs -x
    svc:/system/cluster/mountgfsys:default (Suncluster mountgfsys service)
    State: maintenance since Tue Aug 08 15:37:51 2006
    Reason: Restarter svc:/system/svc/restarter:default gave no explanation.
    See: http://sun.com/msg/SMF-8000-9C
    See: /var/svc/log/system-cluster-mountgfsys:default.log
    Impact: 13 dependent services are not running. (Use -v for list.)
    svc:/system/cluster/gdevsync:default (Suncluster gdevsync service)
    State: maintenance since Tue Aug 08 15:37:51 2006
    Reason: Restarter svc:/system/svc/restarter:default gave no explanation.
    See: http://sun.com/msg/SMF-8000-9C
    See: /var/svc/log/system-cluster-gdevsync:default.log
    Impact: 13 dependent services are not running. (Use -v for list.)
    svc:/network/multipath:cluster (Network Monitor Daemon)
    State: maintenance since Tue Aug 08 15:37:39 2006
    Reason: Maintenance requested by an administrator.
    See: http://sun.com/msg/SMF-8000-63
    See: in.mpathd(1M)
    See: /etc/svc/volatile/network-multipath:cluster.log
    Impact: This service is not running.
    Only the public interface is up, in the sc_ipmp0 group.
    The cluster interconnects are 3Com elxl interfaces on both nodes, connected with crossover cables, elxl0 -> elxl0 and elxl1 -> elxl1 (verified that they work).
    I removed the switches and put in crossover cables while troubleshooting, to have a simpler setup.
    /etc/vfstab (every FS is mirrored; the metadbs are on s7, globalfs on s3):
    /dev/md/dsk/d0 - - swap - no -
    /dev/md/dsk/d10 /dev/md/rdsk/d10 / ufs 1 no -
    /devices - /devices devfs - no -
    ctfs - /system/contract ctfs - no -
    objfs - /system/object objfs - no -
    swap - /tmp tmpfs - yes -
    #/dev/md/dsk/d20 /dev/md/rdsk/d20 /globaldevices ufs 2 yes -
    /dev/md/dsk/d20 /dev/md/rdsk/d20 /global/.devices/node@1 ufs 2 no global
    I'm planning to add a MultiPack for the multihost disks, but I'd like to solve this problem first.
    Nothing useful appears in the logs.
    I only get the initial
    Not booting in cluster mode
    and nothing more.
    Am I missing something about the versions of the software/hardware I am using, such that for some reason they can't work together?
    Or is some fix needed?
    Any hint would be appreciated.
    I'm happy to provide any kind of info.
    Thanks

    Hi,
    You should elaborate a bit on your hardware.
    Is it x64 or 32-bit x86, and what is your shared storage?
    The "not booting in cluster mode" message appears, for example, if you try to install SC 3.1 or SC 3.2 on 32-bit x86 hardware. If that is your goal, you should start with Solaris Express and Solaris Cluster Express.
    Kind regards
    Detlef

  • SUn CLuster 3.2 install - scinstall on x86 32 bit Solaris 5.10

    OK, I have 2 machines with 32-bit x86 Solaris 5.10. I installed the Sun Cluster 3.2 software, but every time I run scinstall it says it is rebooting the other node, and the other node never brings Sun Cluster up.
    Questions:
    1. The private interface does not come up on reboot. Should it? And should I have an entry for cluster-priv1 in /etc/hosts?
    2. I know I am supposed to run svcs -xv. I then try to enable the cluster service, but everything says disabled.
    I have tried this 5 times with no luck. I have lots of cluster experience and can get Oracle CRS working fine.
    Any thoughts?

    Yeah, guys!!!
    I was trying to set up a two-node cluster using VirtualBox + Solaris x86 + Sun Cluster 3.2. The node where I ran scinstall to configure the cluster rebooted the other node at the end of the configuration process, but it then hung at the "Rebooting node01..." message, simply because it could not establish the cluster.
    After seeing your comments, I switched from Solaris x86 to Solaris Express Community Edition and from Sun Cluster to Cluster Express, and now everything is working fine!
    Thanks!
    Jansen Sena

  • OracleAS R2 - Cluster Mixed Solaris SPARC/x86?

    Is a mixed Solaris SPARC/x86 active-active cluster environment supported for OracleAS R2? What I mean is: can I put together a supported environment where one Identity Management node (node 1) runs on SPARC and a second Identity Management node (node 2) runs Solaris x86?
    Both OSes would be configured as identically as possible (same OS version and patch levels).
    Cheers, Brad

    Metalink note 429995.1
    Says
    Goal
    Is it supported to install Application Server Oracle Homes on different operating systems or different versions of the same operating system?
    Example 1: AS Infrastructure is installed on a Solaris 8 server and a Business Intelligence and Forms Middle Tier is installed on a Solaris 10 server.
    Example 2: AS Infrastructure is installed on a Red Hat linux server and Business Intelligence and Forms Middle Tiers are installed on Windows 2003 servers.
    Solution
    It is completely supported to install each Application Server oracle home onto a different operating system or onto different versions of the same operating system.
    Both of the above example scenarios are supported.
    The only restriction is that members of a Middle-tier DCM-Managed OracleAS Cluster must be on the same operating system 'flavour'. As per the High Availability Guide:
    All Oracle Application Server instances that are to be members of a DCM-Managed OracleAS Cluster must be installed on the same flavour operating system. For example, different variants of UNIX are clusterable together, but they are not clusterable with Windows systems.
    Greetings

  • Sun cluster patch for solaris 10 x86

    I have Solaris 10 6/06 installed on an X4100 box, in a 2-node cluster using Sun Cluster 3.1 8/05. I would like to know whether there are any recent OS patches that prevent cluster-related bugs, and if so, which ones. My kernel patch is 118855-19.
    If you need any other input, let me know.

    Well, I would run the S10 Update Manager and get the latest patches that way.
    Tim
    ---

  • X86-64 GNU/LINUX cluster failover alert

    Hi..
    I would like to know how to set up an email alert so that I receive an email when the GNU/Linux cluster fails over to another node. Please help.
    Thanks in advance ...

    Check the documentation of the clusterware product you are using. It will explain how to run commands on failover.
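As the reply says, the clusterware has to be told to run a command on failover; how that hook is configured, and what arguments it receives, depends on the product, so the command-line arguments below are assumptions. A minimal Python sketch of such a command, built on the standard `email` and `smtplib` modules and assuming an SMTP relay on localhost and hypothetical sender/recipient addresses.

```python
import smtplib
import socket
import sys
from email.message import EmailMessage


def build_failover_alert(resource: str, new_node: str, sender: str, recipient: str) -> EmailMessage:
    """Compose a plain-text alert saying a resource is now running on new_node."""
    msg = EmailMessage()
    msg["Subject"] = f"Cluster failover: {resource} now on {new_node}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Resource {resource} was started on node {new_node}.\n"
        f"Reported by host {socket.gethostname()}.\n"
    )
    return msg


def send_alert(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Hand the message to an SMTP relay (assumed to be listening on smtp_host)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical hook invocation: <script> <resource-name> <node-name>
    alert = build_failover_alert(sys.argv[1], sys.argv[2],
                                 "cluster@example.com", "oncall@example.com")
    send_alert(alert)
```

Building the message separately from sending it keeps the text testable without a mail server.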

  • Error while checking the status of Oracle Cluster ware

    Hi
    I was trying to create the database using dbca after setting up the Grid Infrastructure and database software on a Linux x86-64 RHEL 5.7 machine. The database software version is 11.2.0.3. It threw an error about connectivity to the clusterware, so I checked the status of the clusterware.
    -bash-3.2$ ./crsctl stat res -t
    CRS-4535: Cannot communicate with Cluster Ready Services
    CRS-4000: Command Status failed, or completed with errors.
    -bash-3.2$
    But when I ran the following:
    -bash-3.2$ ./crsctl stat res -t -init
    NAME           TARGET  STATE        SERVER   STATE_DETAILS
    Cluster Resources
    ora.asm
          1        ONLINE  ONLINE       sfv9699  Started
    ora.cluster_interconnect.haip
          1        ONLINE  ONLINE       sfv9699
    ora.crf
          1        ONLINE  ONLINE       sfv9699
    ora.crsd
          1        ONLINE  OFFLINE
    ora.cssd
          1        ONLINE  ONLINE       sfv9699
    ora.cssdmonitor
          1        ONLINE  ONLINE       sfv9699
    ora.ctssd
          1        ONLINE  ONLINE       sfv9699  OBSERVER
    ora.diskmon
          1        OFFLINE OFFLINE
    ora.drivers.acfs
          1        ONLINE  ONLINE       sfv9699
    ora.evmd
          1        ONLINE  INTERMEDIATE sfv9699
    ora.gipcd
          1        ONLINE  ONLINE       sfv9699
    ora.gpnpd
          1        ONLINE  ONLINE       sfv9699
    ora.mdnsd
          1        ONLINE  ONLINE       sfv9699
    So I saw that crsd has an issue. I checked the alert log and the crsd log; the output is below.
    Alert <server_name>.log
    2012-10-20 15:37:51.408
    [ohasd(3694)]CRS-2765:Resource 'ora.crsd' has failed on server 'sfv9699'.
    2012-10-20 15:37:52.968
    [crsd(5188)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /oracle2/app/11.2.0/grid/log/sfv9699/crsd/crsd.log.
    2012-10-20 15:37:52.984
    [crsd(5188)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
    ORA-27140: attach to post/wait facility failed
    ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
    ORA-27301: OS failure message: Operation not permitted
    ORA-27302: failure occurred at: skgpwinit6
    ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)
    ]. Details at (:CRSD00111:) in /oracle2/app/11.2.0/grid/log/sfv9699/crsd/crsd.log.
    2012-10-20 15:37:53.471
    [ohasd(3694)]CRS-2765:Resource 'ora.crsd' has failed on server 'sfv9699'.
    2012-10-20 15:37:53.472
    [ohasd(3694)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
    CRSD.log
    2012-10-20 15:37:52.456: [ CRSMAIN][3563381328] Checking the OCR device
    2012-10-20 15:37:52.457: [ CRSMAIN][3563381328] Sync-up with OCR
    2012-10-20 15:37:52.457: [ CRSMAIN][3563381328] Connecting to the CSS Daemon
    2012-10-20 15:37:52.457: [ CRSMAIN][3563381328] Getting local node number
    2012-10-20 15:37:52.459: [ CRSMAIN][3563381328] Initializing OCR
    [   CLWAL][3563381328]clsw_Initialize: OLR initlevel [70000]
    2012-10-20 15:37:52.897: [  OCRASM][3563381328]proprasmo: Error in open/create file in dg [DATA]
    [  OCRASM][3563381328]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=27140, loc=kgfokge
    2012-10-20 15:37:52.898: [  OCRASM][3563381328]ASM Error Stack : ORA-27140: attach to post/wait facility failed
    ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
    ORA-27301: OS failure message: Operation not permitted
    ORA-27302: failure occurred at: skgpwinit6
    ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)
    2012-10-20 15:37:52.967: [  OCRASM][3563381328]proprasmo: kgfoCheckMount returned [7]
    2012-10-20 15:37:52.967: [  OCRASM][3563381328]proprasmo: The ASM instance is down
    2012-10-20 15:37:52.968: [  OCRRAW][3563381328]proprioo: Failed to open [+DATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
    2012-10-20 15:37:52.968: [  OCRRAW][3563381328]proprioo: No OCR/OLR devices are usable
    2012-10-20 15:37:52.968: [  OCRASM][3563381328]proprasmcl: asmhandle is NULL
    2012-10-20 15:37:52.969: [    GIPC][3563381328] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5326]
    2012-10-20 15:37:52.975: [ default][3563381328]clsvactversion:4: Retrieving Active Version from local storage.
    2012-10-20 15:37:52.978: [ CSSCLNT][3563381328]clssgsgrppubdata: group (ocr_SFV9699-cluster) not found
    2012-10-20 15:37:52.978: [  OCRRAW][3563381328]proprio_repairconf: Failed to retrieve the group public data. CSS ret code [20]
    2012-10-20 15:37:52.981: [  OCRRAW][3563381328]proprioo: Failed to auto repair the OCR configuration.
    2012-10-20 15:37:52.981: [  OCRRAW][3563381328]proprinit: Could not open raw device
    2012-10-20 15:37:52.981: [  OCRASM][3563381328]proprasmcl: asmhandle is NULL
    2012-10-20 15:37:52.983: [  OCRAPI][3563381328]a_init:16!: Backend init unsuccessful : [26]
    2012-10-20 15:37:52.984: [  CRSOCR][3563381328] OCR context init failure. Error: PROC-26: Error while accessing the physical storage
    ORA-27140: attach to post/wait facility failed
    ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
    ORA-27301: OS failure message: Operation not permitted
    ORA-27302: failure occurred at: skgpwinit6
    ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)
    2012-10-20 15:37:52.984: [ CRSMAIN][3563381328] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage
    ORA-27140: attach to post/wait facility failed
    ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
    ORA-27301: OS failure message: Operation not permitted
    ORA-27302: failure occurred at: skgpwinit6
    ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)
    2012-10-20 15:37:52.984: [    CRSD][3563381328][PANIC] CRSD exiting: Could not init OCR, code: 26
    2012-10-20 15:37:52.984: [    CRSD][3563381328] Done.
    =======================
    The log above says the ASM instance is down and that it failed to open +DATA,
    but the ASM instance is up and running:
    SQL> select instance_name,status from v$instance;
    INSTANCE_NAME STATUS
    +ASM1            STARTED
    And we haven't created any disk named DATA for this installation. We created only the two disks below:
    SQL> select name,header_status from v$asm_disk;
    NAME HEADER_STATUS
    ASM_DATA MEMBER
    FLASH_RECOVERY MEMBER
    But I see a diskgroup in v$asm_diskgroup that we haven't created:
    SQL> select name,state from v$asm_diskgroup;
    NAME STATE
    DATA MOUNTED
    Yes, this is the second installation. In the first installation we created the ASM disk as DATA, but later everything (the raw devices) was formatted, these new disks were created, and the installation was started again.
    [root@SFV9699 bin]# oracleasm listdisks
    ASM_DATA
    FLASH_RECOVERY
    It seems to be trying to read the old disk DATA.
    We have run an ASM scan too, with oracleasm scandisks, but no use.
    Where can I remove the old entry for the DATA disk?
    A quick response would be greatly appreciated.
    Thanks
    SHIYAS M

    The permissions look fine. If it were a permission issue, why would it try to read the DATA disk, which I haven't created this time at all (but did create in the first installation)?
    2012-10-20 15:37:52.459: [ CRSMAIN][3563381328] Initializing OCR
    [ CLWAL][3563381328]clsw_Initialize: OLR initlevel [70000]
    2012-10-20 15:37:52.897: [ OCRASM][3563381328]proprasmo: *Error in open/create file in dg [DATA]*
    [ OCRASM][3563381328]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=27140, loc=kgfokge
    2012-10-20 15:37:52.898: [ OCRASM][3563381328]ASM Error Stack : ORA-27140: attach to post/wait facility failed
    ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
    ORA-27301: OS failure message: Operation not permitted
    ORA-27302: failure occurred at: skgpwinit6
    ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)
    2012-10-20 15:37:52.967: [ OCRASM][3563381328]proprasmo: kgfoCheckMount returned [7]
    2012-10-20 15:37:52.967: [ OCRASM][3563381328]proprasmo: The ASM instance is down
    2012-10-20 15:37:52.968: [ OCRRAW][3563381328]proprioo: *Failed to open [+DATA].* Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
    2012-10-20 15:37:52.968: [ OCRRAW][3563381328]proprioo: No OCR/OLR devices are usable
    2012-10-20 15:37:52.968: [ OCRASM][3563381328]proprasmcl: asmhandle is NULL
    The only disks created are
    [root@SFV9699 dev]# oracleasm listdisks
    ASM_DATA
    FLASH_RECOVERY
    And these disks also show as part of that group. Not quite sure how this happened.
    What about dropping this group? Would that work?
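The ORA-27303 line itself records two different effective group IDs: it reads as the ASM instance having been started with egid 1000 (oinstall), while the process trying to attach (crsd, reading the OCR) runs with egid 10002 (dba). A small Python sketch that pulls both values out of a log so the mismatch is explicit; the regular expression is written against the message text quoted in this thread and assumes that wording. Which group the Grid and database binaries should run under is a separate question for the installation documentation.

```python
import re
from typing import NamedTuple, Optional

# Pattern written against the ORA-27303 text quoted in this thread (assumed stable).
EGID_RE = re.compile(
    r"startup egid = (\d+) \(([^)]*)\), current egid = (\d+) \(([^)]*)\)"
)


class EgidMismatch(NamedTuple):
    startup_gid: int
    startup_group: str
    current_gid: int
    current_group: str


def find_egid_mismatch(log_text: str) -> Optional[EgidMismatch]:
    """Return the first ORA-27303 egid pair whose group IDs differ, or None."""
    for m in EGID_RE.finditer(log_text):
        s_gid, s_grp, c_gid, c_grp = m.groups()
        if s_gid != c_gid:
            return EgidMismatch(int(s_gid), s_grp, int(c_gid), c_grp)
    return None
```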

  • SAPOSCOL not running in MS Cluster

    Hi, gurus:
    We have a problem with SAPOSCOL in an SAP ECC 6.0 system (SAP ECC 6.0 + NetWeaver 7.00 + Oracle 10.2 + Windows Server 2003 R2 Enterprise x64 Edition) running on an MS cluster:
    Transactions OS06/ST06 show no data, and they display an info message stating: SAPOSCOL not running ? (shared memory not available ). When we checked this issue, we noticed that, in fact, there is no saposcol.exe task running on any node.
    But when we try to start the service (using both the Microsoft Services console and cmd commands), although we can see the process running on the node that owns all the resources, SAP does not seem to notice it. The information shown in ST06 > Operating System Collector > Status is the following:
    iinterval             0             sec.
    Collector Version:
    Date/time             05.09.2008 16:55:01
    Start of Collector
    Status report
    Collector Versions                         
      running                                   COLL 20.95     700     - 20.64     NT 07/10/17
      dialog                                   COLL 20.95     700     - 20.65     NT 08/02/06
    Shared Memory                              attached
    Number of records                         575
    Active Flag                              active     (01)
    Operating System                         Windows NT     5.2.3790 SP     2 BL-SAP2 4x AMD64 Level 1
    Collector PID                              0 (00000000)
    Collector                                   not running (process ID not found).
    Start time coll.                              Thu Jan 01     01:00:00 1970
    Current     Time                              Fri Sep 05     16:55:01 2008
    Last write access                         Mon Sep 01     11:28:23 2008
    Last Read  Access                         Fri Sep 05     15:54:00 2008
    Collection Interval                         10     sec     (next delay).
    Collection Interval                         10     sec     (last ).
    Status                                   read
    Collect     Details                         required
    Refresh                                   required
    Header Extention Structure                    
    Number of x-header          Records          1
    Number of Communication     Records          60
    Number of free Com.          Records          60
    Resulting offset to     1.data rec.               61
    Trace level                                   3
    Collector in IDLE -     mode ?               NO
      become idle after     300     sec     without     read access.|
      Length of     Idle Interval                    60     sec
      Length of     norm.Interval                    10     sec
    But saposcol.exe is running with a certain PID on the same node as SAP and Oracle, under user sapservice<sid>.
    We have tried to run saposcol in several ways (as noted before: from the Microsoft Services console, from the command line using "net start saposcol", using the saposcol under C:\WINDOWS\SapCluster and the one under F:\usr\sap\PRD\sys\exe\run, from both nodes, accessing the cluster through several IPs...) and tried the commands saposcol -c and saposcol -k, but we cannot get saposcol to run. Moreover, we haven't found any log information. The only log we (and SAP) could find is the one in C:\WINDOWS\SapCluster\dev_coll.
    This log has remained frozen since September 1st:
          SAPOSCOL version  COLL 20.95 700 - 20.64 NT 07/10/17, 64 bit, multithreaded, Non-Unicode
          compiled at   Feb  3 2008
          systemid      562 (PC with Windows NT)
          relno         7000
          patch text    COLL 20.95 700 - 20.64 NT 07/10/17
          patchno       146
          intno         20050900
          running on    BL-SAP2 Windows NT 5.2 3790 Service Pack 2 4x AMD64 Level 15 (Mod 65 Step 3)
    12:04:16 01.09.2008   LOG: Profile          : no profile used
    12:04:16 01.09.2008   LOG: Saposcol Version  : [COLL 20.95 700 - 20.64 NT 07/10/17]
    12:04:16 01.09.2008   LOG: Working directory : C:\WINDOWS\SAPCLU~1
    12:04:16 01.09.2008   LOG: Allocate Counter Buffer [10000 Bytes]
    12:04:16 01.09.2008   LOG: Allocate Instance Buffer [10000 Bytes]
    12:04:17 01.09.2008   LOG: Shared Memory Size: 71898.
    12:04:17 01.09.2008   LOG: Connected to existing shared memory.
    12:04:17 01.09.2008   LOG: MaxRecords = 575 <> RecordCnt + Dta_offset = 614 + 61
    12:04:22 01.09.2008 WARNING: WaitFree: could not set new shared memory status after 5 sec
    12:04:22 01.09.2008 WARNING: Cannot create Shared Memory
    Kernel Info:
    Kernel release    700
    Compilation        NT 5.2 3790 Service Pack 1 x86 MS VC++ 14.00
    Sup.Pkg lvl.       146
    ABAP Load       1563
    CUA load           30
    Mode                opt
    Can anyone shed some light on the subject?
    Thank you very much and kind regards
    Edited by: Jose Enrique Sepulveda on Sep 6, 2008 2:10 AM
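    Reading the dev_coll excerpt above: saposcol connected to an existing shared memory segment sized for MaxRecords = 575, while the current layout needs RecordCnt + Dta_offset = 614 + 61 = 675 records. The offset of 61 matches the status output (1 x-header record + 60 communication records). The existing segment therefore looks 100 records too small, which would fit the two WARNING lines that follow it. A minimal Python sketch that pulls this mismatch out of a dev_coll file, assuming the sizing line always has exactly the format shown above (the function name `shared_memory_shortfall` is hypothetical, not an SAP tool):

```python
import re

# Assumed format of the saposcol sizing line in dev_coll, e.g.
# "MaxRecords = 575 <> RecordCnt + Dta_offset = 614 + 61"
SIZE_LINE = re.compile(
    r"MaxRecords\s*=\s*(\d+)\s*<>\s*RecordCnt\s*\+\s*Dta_offset\s*=\s*(\d+)\s*\+\s*(\d+)"
)


def shared_memory_shortfall(log_text):
    """Return (max_records, needed_records, shortfall) for the first
    sizing-mismatch line in log_text, or None if there is no such line."""
    match = SIZE_LINE.search(log_text)
    if match is None:
        return None
    max_records, record_cnt, data_offset = (int(g) for g in match.groups())
    needed = record_cnt + data_offset  # records the collector wants
    return max_records, needed, needed - max_records


if __name__ == "__main__":
    excerpt = (
        "12:04:17 01.09.2008   LOG: Connected to existing shared memory.\n"
        "12:04:17 01.09.2008   LOG: MaxRecords = 575 <> RecordCnt + Dta_offset = 614 + 61\n"
    )
    print(shared_memory_shortfall(excerpt))  # (575, 675, 100)
```

    A positive shortfall only says the old segment is smaller than the new layout expects; it does not by itself tell you what is still holding that segment.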

    Dear bhaskar:
    Thanks for your reply. We have considered balancing the system to the other node or rebooting it to free resources and re-create the shared memory, but in the past the balancing process (moving resources from one node to the other) has caused problems. Since this is a critical system, stopping (or balancing) it is not an option right now. Updating the kernel would require an ABAP stack restart plus the kernel change, and any change to the system configuration requires a longer approval/planning process than a reboot.
    Moreover, OS data collection and its display in OS06/ST06 had worked fine until now.
    Does anyone know whether a reboot has solved this kind of problem in a similar situation?
    Thanks in advance
    José Enrique

  • Solaris 9 x86 bug report - el_GR.ISO8859-7 & CDE

    I'm posting this here because I can't find any official Solaris 9 x86 bug report page, and I hope the developers will notice it. I'm using Solaris 9 x86 (12/02) with the latest 9_x86_Recommended patch cluster installed, and with Greek language support installed too.
    There seems to be a problem when viewing text files containing Greek (el_GR.ISO8859-7) characters that were created under Windows. To be more specific:
    If I log in to CDE with the el_GR.ISO8859-7 language and try to view a .txt file containing Greek characters that I created under Windows (just with a simple double-click), the screen goes black and the CDE login screen appears again (the session restarts). If I keep the Greek language or change the language to US English, I can log in to CDE again with no problems. If I try "Command Line Login", the screen goes off, just as if the computer were powered off, and I can't do anything except press the reset button. And if I run "init 6" from a terminal while logged in to CDE, the graphical desktop exits and the screen goes off again (as if the computer were powered off), but the computer finally manages to restart.
    I'm using the Sun X server, NOT the XFree86 porting kit, and I start the X server with the following entry in the /usr/dt/config/Xservers file:
    :0 Local local_uid@console nobody /usr/openwin/bin/Xsun :0 -dpsfileops
    Here is the $HOME_DIR/.dt/startlog file:
    --- ??? 23 ??? 2003 12:10:51
    --- /usr/dt/bin/Xsession starting...
    --- starting /usr/openwin/bin/speckeysd
    --- Xsession started by dtlogin
    --- starting /usr/dt/bin/dtsession_res -load -system
    --- sourcing /root/.dtprofile...
    --- sourcing /usr/dt/config/Xsession.d/0010.dtpaths...
    --- sourcing /usr/dt/config/Xsession.d/0015.sun.env...
    --- sourcing /usr/dt/config/Xsession.d/0020.dtims...
    --- sourcing /usr/dt/config/Xsession.d/0030.dttmpdir...
    --- sourcing /usr/dt/config/Xsession.d/0040.xmbind...
    --- sourcing /usr/dt/config/Xsession.d/1000.solregis...
    --- could not read /root/.profile
    --- starting /usr/dt/bin/dthello &
    --- starting /usr/dt/bin/dtsearchpath
    --- starting /usr/dt/bin/dtappgather &
    --- starting /usr/dt/bin/dsdm &
    --- session log file is /root/.dt/sessionlogs/www_DISPLAY=:0
    --- DTSOURCEPROFILE is 'true' (see /root/.dtprofile)
    --- execing /usr/dt/bin/dtsession with a /sbin/sh login shell ...
    --- starting desktop on /dev/pts/3
    Sun Microsystems Inc.     SunOS 5.9     Generic_112234-03     November 2002
    /usr/dt/bin/ttsession[337]: starting
    X connection to :0.0 broken (explicit kill or server shutdown).
    X connection to :0.0 broken (explicit kill or server shutdown).
    I don't know whether this is a bug or something else, and I'm very curious about the cause. I didn't have much time for further "experiments".
    Anyway, I hope this helps the developers solve the problem, if it really exists.
    Angelos Vasilopoulos
    Site Security Officer

  • Encountered ora-29701 during Sun Cluster for Oracle RAC 9.2.0.7 startup (UR

    Hi all,
    I need some help from everyone out there.
    In our Sun Cluster 3.1 Data Service for Oracle RAC 9.2.0.7 (Solaris 9) configuration, my team encountered ORA-29701 *Unable to connect to Cluster Manager* during startup of the Oracle RAC database instances on the Oracle RAC Server resources.
    We tried Oracle's workaround, attached below. It works the first time, but it no longer works after the server is rebooted.
    Has anyone encountered the same problem and been able to resolve it? Thanks.
    Bug No. 4262155
    Filed 25-MAR-2005 Updated 11-APR-2005
    Product Oracle Server - Enterprise Edition Product Version 9.2.0.6.0
    Platform Linux x86
    Platform Version 2.4.21-9.0.1
    Database Version 9.2.0.6.0
    Affects Platforms Port-Specific
    Severity Severe Loss of Service
    Status Not a Bug. To Filer
    Base Bug N/A
    Fixed in Product Version No Data
    Problem statement:
    ORA-29701 DURING DATABASE CREATION AFTER APPLYING 9.2.0.6 PATCHSET
    *** 03/25/05 07:32 am ***
    TAR:
    PROBLEM:
    Customer applied 9.2.0.6 patchset over 9.2.0.4 patchset.
    While creating the database, customer receives following error:
         ORA-29701: unable to connect to Cluster Manager
    However, if customer goes from 9.2.0.4 -> 9.2.0.5 -> 9.2.0.6, the problem does not occur.
    DIAGNOSTIC ANALYSIS:
    It seems that the problem is with libskgxn9.so shared library.
    For 9.2.0.4 -> 9.2.0.5 -> 9.2.0.6, the install log shows the following:
    installActions2005-03-22_03-44-42PM.log:,
    [libskgxn9.so->%ORACLE_HOME%/lib/libskgxn9.so 7933 plats=1=>[46]langs=1=> en,fr,ar,bn,pt_BR,bg,fr_CA,ca,hr,cs,da,nl,ar_EG,en_GB,et,fi,de,el,iw,hu,is,in, it,ja,ko,es,lv,lt,ms,es_MX,no,pl,pt,ro,ru,zh_CN,sk,sl,es_ES,sv,th,zh_TW, tr,uk,vi]]
    installActions2005-03-22_04-13-03PM.log:, [libcmdll.so ->%ORACLE_HOME%/lib/libskgxn9.so 64274 plats=1=>[46] langs=-554696704=>[en]]
    For 9.2.0.4 -> 9.2.0.6, install log shows:
    installActions2005-03-22_04-13-03PM.log:, [libcmdll.so ->%ORACLE_HOME%/lib/libskgxn9.so 64274 plats=1=>[46] langs=-554696704=>[en]] does not exist.
    This means that when patching from 9.2.0.4 -> 9.2.0.5, the installer copies the libcmdll.so library into libskgxn9.so, whereas patching directly from 9.2.0.4 -> 9.2.0.6 does not.
    In the customer's environment, ORACM is located in /app/oracle/ORACM, which is different from ORACLE_HOME.
    WORKAROUND:
    Customer is using the following workaround:
    cd $ORACLE_HOME/rdbms/lib
    make -f ins_rdbms.mk rac_on ioracle ipc_udp
    RELATED BUGS:
    Bug 4169291
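    The diagnostic analysis in the bug hinges on whether the installer logs record libcmdll.so being copied over libskgxn9.so. A minimal Python sketch that scans a directory of installActions*.log files for that entry, assuming the log text contains the mapping exactly as quoted in the bug (the function name and the directory argument are hypothetical; point it at wherever your installActions*.log files are kept):

```python
from pathlib import Path

# Mapping text quoted in the bug from installActions*.log: present on the
# 9.2.0.4 -> 9.2.0.5 -> 9.2.0.6 path, absent on 9.2.0.4 -> 9.2.0.6.
COPY_MARKER = "libcmdll.so ->%ORACLE_HOME%/lib/libskgxn9.so"


def logs_with_libcmdll_copy(log_dir):
    """Return the names of installActions*.log files in log_dir whose
    text contains the libcmdll.so -> libskgxn9.so copy entry."""
    hits = []
    for log in sorted(Path(log_dir).glob("installActions*.log")):
        # errors="replace" keeps odd bytes in installer logs from aborting the scan
        text = log.read_text(errors="replace")
        if COPY_MARKER in text:
            hits.append(log.name)
    return hits


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python scan_logs.py /path/to/installer/logs
    for name in logs_with_libcmdll_copy(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(name)
```

    An empty result would match the 9.2.0.4 -> 9.2.0.6 case described in the bug, but note that the bug was filed for Linux x86, so it does not prove the same cause applies to Sun Cluster on Solaris 9.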

    Check whether the following MOS note helps:
    Series of ORA-7445 Errors After Applying 9.2.0.7.0 Patchset to 9.2.0.6.0 Database (Doc ID 373375.1)
