X86 cluster 3.2

Hi,
I have a 2-node cluster with one RG configured.
Currently the RG and one of its resources show a Faulted status, as below:
# clrg status
=== Cluster Resource Groups ===
Group Name    Node Name        Suspended    Status
oracle-rg     cert1.cin.com    No           Online_faulted
              cert2.cin.com    No           Offline
# clrs status
=== Cluster Resources ===
Resource Name         Node Name        State     Status Message
oracle-storage-res    cert1.cin.com    Online    Faulted - I/O timed out on path /dev/md/ora-data/rdsk/d9
                      cert2.cin.com    Offline   Offline
There are 5 disks in the ora-data diskset.
The output of the "cldevice show" command on cert2.cin.com shows the physical path of the d9 disk above as accessible, but running the cluster status command on cert2.cin.com hangs midway, right after the resource status output.
I am not able to log in to cert1.cin.com, where the RG is active. It does not reach the login prompt, but the server responds to ping.
I am not sure why the resource shows as Online while its status message is Faulted, and the node is also hung.
Please help me clear the Faulted state and recover from this condition.
Can I switch the RG to cert2.cin.com safely?
Any help is appreciated. TIA

Hi ra*326096*ul,
it seems there was a timeout on device /dev/md/ora-data/rdsk/d9 on node cert1. The /var/adm/messages file may give more information about the cause of this timeout. You certainly need to log in to cert1 to check this. If you cannot log in over the network, maybe you can reach cert1 through its console port and check the status of the network and of the SVM d9 device? And which cluster 'did' devices underlie the SVM d9 device? You can use metastat to find out once you have access to cert1 via the console port.
It is not certain that you can switch the RG to cert2 safely, because the 'cluster status' command does not finish successfully: after the resource output it goes on to the cluster 'did' devices, and that is where it hangs. So if you know which cluster 'did' devices underlie the SVM d9 device, you can check whether those 'did' devices are OK. The 'cldevice show' command shows the cluster 'did' devices, not the SVM d9 device, so cluster 'did' device d9 is not necessarily part of SVM device d9. Once you know which 'did' devices make up SVM d9, check whether their physical devices are OK. 'scdidadm -L' may also help, since it shows a summary of the 'did' devices across all nodes. When you have checked that the physical devices of SVM d9 are OK on cert2, I believe you can switch the RG successfully. Does 'scstat -D' work on cert2?
Hth,
Juergen
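The check suggested above (which DID devices sit under SVM d9, and does cert2 have a path to each one) can be scripted against saved command output. A minimal Python sketch, assuming that the metastat output for the ora-data set names its components as DID paths such as /dev/did/rdsk/d5s0, and that `scdidadm -L` prints one `<instance> <node>:<physical path> <DID path>` line per node path. A path listed by scdidadm only shows that the device is configured on that node, not that it is healthy, so this narrows down which disks to test rather than replacing the test.

```python
import re
from collections import defaultdict

# Assumed `scdidadm -L` layout: "<instance> <node>:<physical path> <DID path>"
DID_LINE = re.compile(r"^\s*(\d+)\s+([^:\s]+):(\S+)\s+(/dev/did/rdsk/d\d+)\s*$")
# DID references as they are assumed to appear in metastat output, e.g. /dev/did/rdsk/d5s0
DID_REF = re.compile(r"/dev/did/r?dsk/(d\d+)(?:s\d+)?")


def did_devices_in_metastat(metastat_text):
    """Return the de-duplicated DID names (e.g. 'd5') referenced in metastat output, in numeric order."""
    return sorted(set(DID_REF.findall(metastat_text)), key=lambda name: int(name[1:]))


def did_paths_by_node(scdidadm_text):
    """Map DID name -> {node: physical path} from `scdidadm -L` style text."""
    paths = defaultdict(dict)
    for line in scdidadm_text.splitlines():
        match = DID_LINE.match(line)
        if match:
            _instance, node, physical, did_path = match.groups()
            paths[did_path.rsplit("/", 1)[1]][node] = physical
    return dict(paths)


def paths_for_metadevice(metastat_text, scdidadm_text, node):
    """For each DID device under the metadevice, return its physical path on `node` (None if not listed)."""
    table = did_paths_by_node(scdidadm_text)
    return {did: table.get(did, {}).get(node) for did in did_devices_in_metastat(metastat_text)}
```

Fed the saved output of `metastat -s ora-data -p` and `scdidadm -L`, `paths_for_metadevice(..., "cert2")` would return None for any DID device under d9 that has no path on cert2, which would be a reason to stop and investigate before switching.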

Similar Messages

Linux/x86 cluster working with Xgrid

Hi all! Does anyone know how I can connect my old Linux/x86 cluster (running Red Hat Fedora Core 1) and my new one (with Mandriva Clustering) to Xgrid as agents? Thanks!

At sourceforge.net you can find a Java-based Xgrid client for Windows/Linux/UNIX. Maybe it can help you.

Zpool wont import: /var/cluster/run/HAStoragePlus/zfs not a valid directory

I have this two-node Solaris 10 x86 cluster that refuses to fail the HASP zpool over to the second node.
On node 2, the log complains that /var/cluster/run/HAStoragePlus/zfs is not a valid directory:
Aug 17 22:23:28 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 148650 daemon.notice] Started searching for devices in '/dev/dsk' to find the importable pools.
Aug 17 22:23:35 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 547433 daemon.notice] Completed searching the devices in '/dev/dsk' to find the importable pools.
Aug 17 22:23:35 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 471757 daemon.error] cannot import pool 'qnap' : '/var/cluster/run/HAStoragePlus/zfs' is not a valid directory
Aug 17 22:23:35 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 117328 daemon.error] The pool 'qnap' failed to import and populate cachefile.
Aug 17 22:23:35 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 292307 daemon.error] Failed to import:qnap
In fact, on node 2 this folder is missing, i.e. the /var/cluster/run/HAStoragePlus folder isn't there.
If I create that folder, and the zfs folder inside it, switchover works flawlessly both ways, but if I reboot node 2, the folder gets cleared and the problem comes back. It looks like a bug to me, but I searched SunSolve without finding anything.
This is Sun Cluster 3.2 update 2 with the latest patchset, running on Solaris 10 x86 update 7 (latest).
I already tried removing and recreating the HASP resource, to no avail.
Any hint will be greatly appreciated.
Thanks
Rick
Edited by: leopardus2 on Aug 18, 2009 2:46 AM

After many sleepless nights looking at this problem, I ended up solving it myself just 30 minutes after posting here!!
The problem was that the ZFS plugin was not configured in /etc/cluster/eventlog/eventlog.conf!!
I think this happened because I installed HASP on node 1 before node 2 joined the cluster for the first time.
It should be marked as a bug IMHO...
Thanks
Rick

Error when creating zone cluster

Hello,
I have the following setup: Solaris 11.2 x86, cluster 4.2. I have already configured the cluster and it's up and running. I am trying to create a zone cluster, but getting the following error:
>>> Result of the Creation for the Zone cluster(ztestcluster) <<<
    The zone cluster is being configured with the following configuration
        /usr/cluster/bin/clzonecluster configure ztestcluster
        create
        set zonepath=/zclusterpool/znode
        set brand=cluster
        set ip-type=shared
        set enable_priv_net=true
        add sysid
        set root_password=********
        end
        add node
        set physical-host=node2
        set hostname=zclnode2
        add net
        set address=192.168.10.52
        set physical=net1
        end
        end
        add node
        set physical-host=node1
        set hostname=zclnode1
        add net
        set address=192.168.10.51
        set physical=net1
        end
        end
        add net
        set address=192.168.10.55
        end
java.lang.NullPointerException
        at java.util.regex.Matcher.getTextLength(Matcher.java:1234)
        at java.util.regex.Matcher.reset(Matcher.java:308)
        at java.util.regex.Matcher.<init>(Matcher.java:228)
        at java.util.regex.Pattern.matcher(Pattern.java:1088)
        at com.sun.cluster.zcwizards.zonecluster.ZCWizardResultPanel.consoleInteraction(ZCWizardResultPanel.java:181)
        at com.sun.cluster.dswizards.clisdk.core.IteratorLayout.cliConsoleInteraction(IteratorLayout.java:563)
        at com.sun.cluster.dswizards.clisdk.core.IteratorLayout.displayPanel(IteratorLayout.java:623)
        at com.sun.cluster.dswizards.clisdk.core.IteratorLayout.run(IteratorLayout.java:607)
        at java.lang.Thread.run(Thread.java:745)
             ERROR: System configuration error
             As a result of a change to the system configuration, a resource that this
             wizard will create is now invalid. Review any changes that were made to the
             system after you started this wizard to determine which changes might have
             caused this error. Then quit and restart this wizard.
    Press RETURN to close the wizard
No errors in /var/adm/messages.
Any ideas?
Thank you!

I must be making some obvious, stupid mistake, because I still get that "not enough space" error:
root@node1:~# clzonecluster show ztestcluster
=== Zone Clusters ===
Zone Cluster Name:                              ztestcluster
zonename:                                        ztestcluster
zonepath:                                        /zcluster/znode
autoboot:                                        TRUE
brand:                                           solaris
bootargs:                                        <NULL>
pool:                                            <NULL>
limitpriv:                                       <NULL>
scheduling-class:                                <NULL>
ip-type:                                         shared
enable_priv_net:                                 TRUE
resource_security:                               SECURE
--- Solaris Resources for ztestcluster ---
Resource Name:                                net
    address:                                       192.168.10.55
    physical:                                      auto
--- Zone Cluster Nodes for ztestcluster ---
Node Name:                                    node2
    physical-host:                                 node2
    hostname:                                      zclnode2
    --- Solaris Resources for node2 ---
Node Name:                                    node1
    physical-host:                                 node1
    hostname:                                      zclnode1
    --- Solaris Resources for node1 ---
root@node1:~# clzonecluster install ztestcluster
Waiting for zone install commands to complete on all the nodes of the zone cluster "ztestcluster"...
clzonecluster: (C801046) Command execution failed on node node2. Please refer to the console for more information
clzonecluster: (C801046) Command execution failed on node node1. Please refer to the console for more information
But I have enough FS space. I increased the virtual HDD to 25GB on each node. After global cluster installation, I still have 16GB free on each node. During the install I constantly check the free space and it should be enough (only about 500MB is consumed by downloaded packages, which leaves about 15.5GB free). And every time the installation fails at "apply-sysconfig checkpoint"...
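Because the failure always happens at the apply-sysconfig checkpoint, a periodic manual check can miss a short-lived drop in free space. A minimal sketch that samples free space repeatedly while `clzonecluster install` runs and reports the lowest value seen; which path to watch (for example the file system holding the zonepath) and the sampling interval are assumptions to adapt, and the path must already exist.

```python
import shutil
import time


def min_free_bytes(path, samples, interval_s):
    """Sample free space on the file system holding `path` and return the smallest value seen."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    lowest = None
    for i in range(samples):
        free = shutil.disk_usage(path).free  # bytes available on that file system
        lowest = free if lowest is None else min(lowest, free)
        if i + 1 < samples:
            time.sleep(interval_s)
    return lowest
```

Running it on each node in parallel with the install (for instance 600 samples at 1-second intervals) would show whether free space really stays near 15.5GB throughout.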

I/O tuning on Linux X86

As part of our due diligence, we are researching possible issues in moving a data warehouse database from a Sun 12K to RAC on an x86 Linux cluster from HP or Dell. One of the issues is the I/O bus speed of the 12K compared with the HP or Dell cluster: the throughput for PCI-e on the Sun 12K is about 8GB/s, compared with about 800 MB/s for a decent 64-bit/100 MHz PCI bus on a Lintel box.
Our SAN storage would still be EMC, FC-attached, most likely with a CFS. Does anyone think this is a real concern? If all else remains the same and we move the database from the Sun 12K to a Linux x86 cluster, will I/O, specifically bus speed, be a bottleneck?
Thanks
Praveen
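The roughly 800 MB/s figure is just bus width times clock rate. A small sketch of that arithmetic, assuming one transfer per clock and decimal megabytes; these are theoretical peaks, sustained throughput is lower on both platforms, and the 8 GB/s figure for the 12K is taken from the post as given.

```python
def peak_bus_mb_per_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak bandwidth of a parallel bus in decimal MB/s (1 MB = 10**6 bytes)."""
    bytes_per_transfer = width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e6


# 64-bit bus clocked at 100 MHz, one transfer per clock, as in the post
pci_64_100 = peak_bus_mb_per_s(64, 100)
# Ratio against the 8 GB/s (8000 MB/s) figure quoted for the 12K
ratio = 8000 / pci_64_100
print(f"{pci_64_100:.0f} MB/s peak, {ratio:.0f}x below 8 GB/s")
```

This prints `800 MB/s peak, 10x below 8 GB/s`. Whether that gap matters depends on how many HBAs and buses the new servers have and on what the FC links and the EMC array can actually deliver.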

By default on Linux, large I/O operations are broken into 512K chunks (on the 2.6 kernel), splitting system I/O into smaller sizes. On Sun you can perform large 1MB I/Os, so when you move to Linux you should expect some 'degradation'.
To allow Oracle to perform large I/O operations on Linux, you should adjust some kernel parameters; check MetaLink for parameters such as aio-max-size, aio-nr, aio-max-nr etc.
regards,
goran
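To put the 512K versus 1MB point in numbers, a minimal sketch that counts how many requests one large sequential transfer turns into under each maximum request size. It ignores request merging, readahead and striping, so it illustrates the per-request overhead rather than predicting throughput; the 64 MB transfer is an arbitrary example.

```python
KIB = 1024


def io_requests(transfer_bytes, max_request_bytes):
    """Requests needed when one transfer is split into pieces of at most max_request_bytes."""
    if max_request_bytes <= 0:
        raise ValueError("max_request_bytes must be positive")
    # Ceiling division on integers avoids float rounding for large sizes.
    return -(-transfer_bytes // max_request_bytes)


for label, limit in (("1 MB max I/O (Sun)", 1024 * KIB), ("512 KB max I/O (Linux 2.6)", 512 * KIB)):
    print(label, io_requests(64 * 1024 * KIB, limit))
```

This prints 64 requests for the 1 MB limit and 128 for the 512 KB limit: twice the request count for the same data.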

Solaris 8 x86 patch cluster installation

Hi. I recently installed Solaris 8 x86 on my Intel P4 machine. I got through the installation fine, but I am now running into problems installing the Solaris 8 x86 recommended patch cluster.
Whenever I run the install_cluster script, it says every patch failed to install with return code 1. I even tried installing the patches manually via patchadd, but I get a message saying the patch directory is not valid.
I downloaded the Solaris 8 x86 recommended patch cluster zip file on a Windows 2000 machine, unzipped it there and burned the patches onto a CD. I then copy the patches from the CD onto my Solaris machine and try to install them from there. So far this doesn't work, and I don't know what I'm doing wrong.
Does anyone know how to fix this problem? Thx.

Hi again. I just fixed my problem. Apparently, when I unzipped the file and then burned it onto CD, the data got corrupted. I fixed it by copying the zip file itself onto CD and then extracting the patches on the Solaris machine.
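Keeping the zip intact until it reaches Solaris also means it can be verified there, since a zip archive stores a CRC-32 for every member. A minimal sketch, assuming Python is available on the machine doing the check: it returns a SHA-256 of the archive (compute it before and after the copy and compare) and the name of the first member that fails its CRC check, or None.

```python
import hashlib
import io
import zipfile


def check_zip_bytes(data):
    """Return (sha256 hex digest, first member failing its CRC check or None) for zip data in memory."""
    digest = hashlib.sha256(data).hexdigest()
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        first_bad = archive.testzip()  # reads every member and verifies its CRC-32
    return digest, first_bad


def check_zip_file(path):
    """Same check for a zip file on disk, e.g. the patch cluster bundle after copying it from CD."""
    with open(path, "rb") as f:
        return check_zip_bytes(f.read())
```

If the archive is damaged badly enough that it is not recognised as a zip at all, `zipfile` raises `BadZipFile` instead of returning a member name.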

X86 sc3.1-0805 sol10-0606 - Doesn't boot in cluster mode

Hi,
This is my first experience with Sun Cluster on x86.
I've already tried it at home with two P4 white boxes and am now repeating the experiment here at work with a similar configuration (I happily run six V490s in three clustered pairs with 3510 FC arrays, and a test pair of two clustered U10s with a MultiPack).
No matter what I try, I always end up with the same result:
the nodes boot up outside the cluster. The interconnects don't start and (I think maybe because of that) the global devices don't get initialized.
I have already tried many reinstalls, and I tried adding etc/cluster/nodeid to filelist.ramdisk, updating the boot archive and doing a reconfiguration boot, as described in an infodoc as a workaround for a well-known problem, but nothing changed.
This is the situation as i start either one of the nodes:
mordor-nodo2 # svcs -x
svc:/system/cluster/mountgfsys:default (Suncluster mountgfsys service)
State: maintenance since Tue Aug 08 15:37:51 2006
Reason: Restarter svc:/system/svc/restarter:default gave no explanation.
See: http://sun.com/msg/SMF-8000-9C
See: /var/svc/log/system-cluster-mountgfsys:default.log
Impact: 13 dependent services are not running. (Use -v for list.)
svc:/system/cluster/gdevsync:default (Suncluster gdevsync service)
State: maintenance since Tue Aug 08 15:37:51 2006
Reason: Restarter svc:/system/svc/restarter:default gave no explanation.
See: http://sun.com/msg/SMF-8000-9C
See: /var/svc/log/system-cluster-gdevsync:default.log
Impact: 13 dependent services are not running. (Use -v for list.)
svc:/network/multipath:cluster (Network Monitor Daemon)
State: maintenance since Tue Aug 08 15:37:39 2006
Reason: Maintenance requested by an administrator.
See: http://sun.com/msg/SMF-8000-63
See: in.mpathd(1M)
See: /etc/svc/volatile/network-multipath:cluster.log
Impact: This service is not running.
Only the public interface is up, in the sc_ipmp0 group.
The cluster interconnects are 3Com elxl interfaces on both nodes and are connected with crossover cables, elxl0 -> elxl0 and elxl1 -> elxl1 (verified that they work).
I removed the switches and put in crossover cables while troubleshooting, to have a simpler setup.
/etc/vfstab (every file system is mirrored; the metadbs are on s7, the global file system on s3):
/dev/md/dsk/d0 - - swap - no -
/dev/md/dsk/d10 /dev/md/rdsk/d10 / ufs 1 no -
/devices - /devices devfs - no -
ctfs - /system/contract ctfs - no -
objfs - /system/object objfs - no -
swap - /tmp tmpfs - yes -
#/dev/md/dsk/d20 /dev/md/rdsk/d20 /globaldevices ufs 2 yes -
/dev/md/dsk/d20 /dev/md/rdsk/d20 /global/.devices/node@1 ufs 2 no global
I'm planning to add a MultiPack for the multihost disks, but I'd like to solve this problem first.
Nothing useful appears in the logs.
I only get the initial
Not booting in cluster mode
and nothing more.
Am I missing something about the versions of the software/hardware I'm using, which for some reason can't work together?
Or is some fix needed?
Any hint would be appreciated.
I'm happy to provide any further info.
Thanks

Hi,
you should elaborate a bit on your hardware.
Is it x64 or 32-bit x86, and what is your shared storage?
The "Not booting in cluster mode" message appears, for example, if you try to install SC 3.1 or SC 3.2 on 32-bit x86 hardware. If that is your goal, you should start with Solaris Express and Solaris Cluster Express.
Kind regards
Detlef

Sun Cluster 3.2 install - scinstall on x86 32-bit Solaris 5.10

OK, I have 2 machines with 32-bit x86 Solaris 5.10. I installed the Sun Cluster 3.2 software, but every time I run scinstall it says it is rebooting the other node, and the other node never brings Sun Cluster up.
Questions:
1. The private interface does not come up on reboot. Should it? And should I have an entry for cluster-priv1 in /etc/hosts?
2. I know I am supposed to run svcs -xv. I then try to enable the cluster service, but everything says disabled.
I have tried this 5 times with no luck. I have lots of cluster experience and can get Oracle CRS working fine.
Any thoughts?

Yeahh, guys!!!
I was trying to set up a two-node cluster using VirtualBox + Solaris x86 + Sun Cluster 3.2. At the end of the configuration process, the node where I ran scinstall rebooted the other node, but it hung at the "Rebooting node01..." message because it was not able to form the cluster.
After seeing your comments, I switched from Solaris x86 to Solaris Express Community Edition and from Sun Cluster to Cluster Express, and now everything is working fine!
Thanks!
Jansen Sena

OracleAS R2 - Cluster Mixed Solaris SPARC/x86?

Is a mixed Solaris SPARC/x86 active-active cluster environment supported for OracleAS R2? In other words, can I put together a supported environment where one Identity Management node (node 1) runs on SPARC and a second Identity Management node (node 2) runs on Solaris x86?
Both OSes would be configured as identically as possible (same OS version and patch levels).
Cheers, Brad

MetaLink note 429995.1 says:
Goal
Is it supported to install Application Server Oracle Homes on different operating systems or different versions of the same operating system?
Example 1: AS Infrastructure is installed on a Solaris 8 server and a Business Intelligence and Forms Middle Tier is installed on a Solaris 10 server.
Example 2: AS Infrastructure is installed on a Red Hat linux server and Business Intelligence and Forms Middle Tiers are installed on Windows 2003 servers.
Solution
It is completely supported to install each Application Server oracle home onto a different operating system or onto different versions of the same operating system.
Both of the above example scenarios are supported.
The only restriction is that members of a Middle-tier DCM-Managed OracleAS Cluster must be on the same operating system 'flavour'. As per the High Availability Guide:
All Oracle Application Server instances that are to be members of a DCM-Managed OracleAS Cluster must be installed on the same flavour operating system. For example, different variants of UNIX are clusterable together, but they are not clusterable with Windows systems.
Greetings

Sun cluster patch for solaris 10 x86

I have Solaris 10 6/06 installed on an X4100 box with 2-node clustering using Sun Cluster 3.1 8/05. I'd like to know whether there are any recent OS patches available that prevent cluster-related bugs, and if so, which ones. My kernel patch is 118855-19.
If you need any more input, let me know.

Well, I would run the S10 Update Manager and get the latest patches that way.
Tim
---

X86-64 GNU/LINUX cluster failover alert

Hi..
I would like to know how to set up an email alert so that I receive an email when the GNU/Linux cluster fails over to another node. Please help.
Thanks in advance ...

Check the documentation of the clusterware product you are using. It will explain how to run commands on failover.
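As the reply says, how a failover command gets triggered depends on the clusterware. As a hedged illustration only, assuming your clusterware can run a script on failover and pass it the resource and node names, the script body could build and send the alert with Python's standard library. The addresses, the resource/node names and the SMTP relay on localhost port 25 are all hypothetical.

```python
import smtplib
from email.message import EmailMessage


def build_failover_alert(resource, old_node, new_node, sender, recipient):
    """Build a plain-text alert saying `resource` moved from `old_node` to `new_node`."""
    msg = EmailMessage()
    msg["Subject"] = f"Failover: {resource} moved from {old_node} to {new_node}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Cluster resource {resource} failed over from {old_node} to {new_node}.\n")
    return msg


def send_alert(msg, smtp_host="localhost", smtp_port=25):
    """Hand the message to an SMTP relay; assumes one is reachable from the cluster node."""
    with smtplib.SMTP(smtp_host, smtp_port, timeout=30) as smtp:
        smtp.send_message(msg)
```

The hook itself would then call `send_alert(build_failover_alert(...))` with whatever arguments the clusterware provides. Keeping message construction separate from sending makes it easy to test the text without a mail server.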

Error while checking the status of Oracle Cluster ware

Hi
I was trying to create the database using dbca after setting up the grid and database software on a Linux x86-64 RHEL 5.7 machine. The database software version is 11.2.0.3. It threw an error about connectivity to the clusterware, so I checked the clusterware status:
-bash-3.2$ ./crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
-bash-3.2$
But when I ran the following:
-bash-3.2$ ./crsctl stat res -t -init
NAME           TARGET  STATE        SERVER       STATE_DETAILS
Cluster Resources
ora.asm
      1        ONLINE  ONLINE       sfv9699      Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       sfv9699
ora.crf
      1        ONLINE  ONLINE       sfv9699
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       sfv9699
ora.cssdmonitor
      1        ONLINE  ONLINE       sfv9699
ora.ctssd
      1        ONLINE  ONLINE       sfv9699      OBSERVER
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       sfv9699
ora.evmd
      1        ONLINE  INTERMEDIATE sfv9699
ora.gipcd
      1        ONLINE  ONLINE       sfv9699
ora.gpnpd
      1        ONLINE  ONLINE       sfv9699
ora.mdnsd
      1        ONLINE  ONLINE       sfv9699
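A quick way to spot trouble in `crsctl stat res -t -init` output like the above is to list the resources whose STATE differs from their TARGET. A minimal Python sketch, assuming the two-line layout shown here (resource name on one line, then a line with instance number, TARGET, STATE and optional SERVER/details); header and section lines are skipped.

```python
def mismatched_resources(crsctl_text):
    """Return [(resource, target, state)] for instances whose STATE differs from TARGET."""
    results = []
    current = None
    for raw in crsctl_text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0].startswith("ora."):
            current = fields[0]  # resource name line
        elif current and fields[0].isdigit() and len(fields) >= 3:
            target, state = fields[1], fields[2]  # instance line: "1 TARGET STATE ..."
            if target != state:
                results.append((current, target, state))
    return results
```

On the output above it would flag ora.crsd (ONLINE/OFFLINE) and ora.evmd (ONLINE/INTERMEDIATE); ora.diskmon is OFFLINE/OFFLINE, so it is not reported.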
So I saw that crsd has some issue. I checked the alert log and the crsd log; here is the output.
Alert <server_name>.log
2012-10-20 15:37:51.408
[ohasd(3694)]CRS-2765:Resource 'ora.crsd' has failed on server 'sfv9699'.
2012-10-20 15:37:52.968
[crsd(5188)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /oracle2/app/11.2.0/grid/log/sfv9699/crsd/crsd.log.
2012-10-20 15:37:52.984
[crsd(5188)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)
]. Details at (:CRSD00111:) in /oracle2/app/11.2.0/grid/log/sfv9699/crsd/crsd.log.
2012-10-20 15:37:53.471
[ohasd(3694)]CRS-2765:Resource 'ora.crsd' has failed on server 'sfv9699'.
2012-10-20 15:37:53.472
[ohasd(3694)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
CRSD.log
2012-10-20 15:37:52.456: [ CRSMAIN][3563381328] Checking the OCR device
2012-10-20 15:37:52.457: [ CRSMAIN][3563381328] Sync-up with OCR
2012-10-20 15:37:52.457: [ CRSMAIN][3563381328] Connecting to the CSS Daemon
2012-10-20 15:37:52.457: [ CRSMAIN][3563381328] Getting local node number
2012-10-20 15:37:52.459: [ CRSMAIN][3563381328] Initializing OCR
[   CLWAL][3563381328]clsw_Initialize: OLR initlevel [70000]
2012-10-20 15:37:52.897: [ OCRASM][3563381328]proprasmo: Error in open/create file in dg [DATA]
[ OCRASM][3563381328]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=27140, loc=kgfokge
2012-10-20 15:37:52.898: [ OCRASM][3563381328]ASM Error Stack : ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)
2012-10-20 15:37:52.967: [ OCRASM][3563381328]proprasmo: kgfoCheckMount returned [7]
2012-10-20 15:37:52.967: [ OCRASM][3563381328]proprasmo: The ASM instance is down
2012-10-20 15:37:52.968: [ OCRRAW][3563381328]proprioo: Failed to open [+DATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2012-10-20 15:37:52.968: [ OCRRAW][3563381328]proprioo: No OCR/OLR devices are usable
2012-10-20 15:37:52.968: [ OCRASM][3563381328]proprasmcl: asmhandle is NULL
2012-10-20 15:37:52.969: [    GIPC][3563381328] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5326]
2012-10-20 15:37:52.975: [ default][3563381328]clsvactversion:4: Retrieving Active Version from local storage.
2012-10-20 15:37:52.978: [ CSSCLNT][3563381328]clssgsgrppubdata: group (ocr_SFV9699-cluster) not found
2012-10-20 15:37:52.978: [ OCRRAW][3563381328]proprio_repairconf: Failed to retrieve the group public data. CSS ret code [20]
2012-10-20 15:37:52.981: [ OCRRAW][3563381328]proprioo: Failed to auto repair the OCR configuration.
2012-10-20 15:37:52.981: [ OCRRAW][3563381328]proprinit: Could not open raw device
2012-10-20 15:37:52.981: [ OCRASM][3563381328]proprasmcl: asmhandle is NULL
2012-10-20 15:37:52.983: [ OCRAPI][3563381328]a_init:16!: Backend init unsuccessful : [26]
2012-10-20 15:37:52.984: [ CRSOCR][3563381328] OCR context init failure. Error: PROC-26: Error while accessing the physical storage
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)
2012-10-20 15:37:52.984: [ CRSMAIN][3563381328] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)
2012-10-20 15:37:52.984: [    CRSD][3563381328][PANIC] CRSD exiting: Could not init OCR, code: 26
2012-10-20 15:37:52.984: [    CRSD][3563381328] Done.
=======================
The log above says the ASM instance is down and that it failed to open +DATA.
But the ASM instance is up and running:
SQL> select instance_name,status from v$instance;
INSTANCE_NAME STATUS
+ASM1            STARTED
And we haven't created any disk named DATA before the installation. We created only the two disks below:
SQL> select name,header_status from v$asm_disk;
NAME HEADER_STATUS
ASM_DATA MEMBER
FLASH_RECOVERY MEMBER
But I see a disk group in v$asm_diskgroup that we haven't created:
SQL> select name,state from v$asm_diskgroup;
NAME STATE
DATA MOUNTED
Yes, this is a second installation. In the first installation we created the ASM disk as DATA. Later, everything (the raw devices) was formatted, these new disks were created, and the installation was started again.
[root@SFV9699 bin]# oracleasm listdisks
ASM_DATA
FLASH_RECOVERY
It seems to be trying to read the old DATA disk.
We have also rescanned with oracleasm scandisks, but to no avail.
Where can I remove the old entry for the DATA disk?
A quick response would be greatly appreciated.
Thanks
SHIYAS M

The permissions look fine. If it were a permission issue, why would it be trying to read the DATA disk, which I haven't created at all this time (but did create in the first installation)?
2012-10-20 15:37:52.459: [ CRSMAIN][3563381328] Initializing OCR
[ CLWAL][3563381328]clsw_Initialize: OLR initlevel [70000]
2012-10-20 15:37:52.897: [ OCRASM][3563381328]proprasmo: *Error in open/create file in dg [DATA]*
[ OCRASM][3563381328]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=27140, loc=kgfokge
2012-10-20 15:37:52.898: [ OCRASM][3563381328]ASM Error Stack : ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 10002 (dba)
2012-10-20 15:37:52.967: [ OCRASM][3563381328]proprasmo: kgfoCheckMount returned [7]
2012-10-20 15:37:52.967: [ OCRASM][3563381328]proprasmo: The ASM instance is down
2012-10-20 15:37:52.968: [ OCRRAW][3563381328]proprioo: *Failed to open [+DATA].* Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2012-10-20 15:37:52.968: [ OCRRAW][3563381328]proprioo: No OCR/OLR devices are usable
2012-10-20 15:37:52.968: [ OCRASM][3563381328]proprasmcl: asmhandle is NULL
The only disks created are:
[root@SFV9699 dev]# oracleasm listdisks
ASM_DATA
FLASH_RECOVERY
And these disks also show up as part of that group. I'm not sure how this happened.
What about dropping this group? Would that work?
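Whatever the history of the DATA group, every ORA-27303 line in the logs above reports the same thing: the process started with effective group 1000 (oinstall) but had effective group 10002 (dba) when it tried to attach to the post/wait facility, and ORA-27300 names the failing operation as invalid_egid. A minimal sketch that pulls that pair out of a log so the two groups can be compared against the owners of the Grid binaries; it only parses the message format shown here and does not say which group is the correct one.

```python
import re

# Matches e.g. "startup egid = 1000 (oinstall), current egid = 10002 (dba)"
EGID_LINE = re.compile(r"startup egid = (\d+) \(([^)]*)\), current egid = (\d+) \(([^)]*)\)")


def egid_mismatch(log_text):
    """Return the startup/current effective groups from the first ORA-27303 line, or None if absent."""
    match = EGID_LINE.search(log_text)
    if match is None:
        return None
    start_gid, start_group, current_gid, current_group = match.groups()
    return {
        "startup": (int(start_gid), start_group),
        "current": (int(current_gid), current_group),
        "mismatch": start_gid != current_gid,
    }
```

For the crsd.log excerpts above it would return startup (1000, 'oinstall'), current (10002, 'dba') and mismatch True.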

SAPOSCOL not running in MS Cluster

Hi, gurus:
We have a problem with SAPOSCOL in a SAP ECC 6.0 system (SAP ECC 6.0 + NetWeaver 7.00 + Oracle 10.2 + Windows Server 2003 R2 Enterprise x64 Edition) running over a MS cluster:
Transactions OS06/ST06 show no data and display an info message that states: SAPOSCOL not running ? (shared memory not available ). When we checked this issue, we noticed that there is in fact no saposcol.exe task running on any node.
But when we try to start the service (using both the Microsoft services console and cmd commands), although we can see the process running on the node that owns all the resources, SAP does not seem to notice it. The information shown in ST06 > Operating System Collector > Status is the following:
iinterval             0             sec.
Collector Version:
Date/time             05.09.2008 16:55:01
Start of Collector
Status report
Collector Versions
running                                   COLL 20.95     700     - 20.64     NT 07/10/17
dialog                                   COLL 20.95     700     - 20.65     NT 08/02/06
Shared Memory                              attached
Number of records                         575
Active Flag                              active     (01)
Operating System                         Windows NT     5.2.3790 SP     2 BL-SAP2 4x AMD64 Level 1
Collector PID                              0 (00000000)
Collector                                   not running (process ID not found).
Start time coll.                              Thu Jan 01     01:00:00 1970
Current     Time                              Fri Sep 05     16:55:01 2008
Last write access                         Mon Sep 01     11:28:23 2008
Last Read Access                         Fri Sep 05     15:54:00 2008
Collection Interval                         10     sec     (next delay).
Collection Interval                         10     sec     (last ).
Status                                   read
Collect     Details                         required
Refresh                                   required
Header Extention Structure
Number of x-header          Records          1
Number of Communication     Records          60
Number of free Com.          Records          60
Resulting offset to     1.data rec.               61
Trace level                                   3
Collector in IDLE -     mode ?               NO
become idle after     300     sec     without     read access.|
Length of     Idle Interval                    60     sec
Length of     norm.Interval                    10     sec
But saposcol.exe is running with a PID on the same node as SAP and Oracle, under user sapservice<sid>.
We have tried to run saposcol in several ways (as noted before: from the Microsoft services console; from the command line using "net start saposcol"; using the saposcol under C:\WINDOWS\SapCluster and the one under F:\usr\sap\PRD\sys\exe\run; from both nodes; accessing the cluster through several IPs...) and tried the commands saposcol -c and saposcol -k, but we cannot get saposcol to run. Moreover, we haven't found any log information. The only log we (and SAP) could find is the one located in C:\WINDOWS\SapCluster\dev_coll.
This log has remained frozen since September 1st:
      SAPOSCOL version COLL 20.95 700 - 20.64 NT 07/10/17, 64 bit, multithreaded, Non-Unicode
      compiled at   Feb 3 2008
      systemid      562 (PC with Windows NT)
      relno         7000
      patch text    COLL 20.95 700 - 20.64 NT 07/10/17
      patchno       146
      intno         20050900
      running on    BL-SAP2 Windows NT 5.2 3790 Service Pack 2 4x AMD64 Level 15 (Mod 65 Step 3)
12:04:16 01.09.2008   LOG: Profile          : no profile used
12:04:16 01.09.2008   LOG: Saposcol Version : [COLL 20.95 700 - 20.64 NT 07/10/17]
12:04:16 01.09.2008   LOG: Working directory : C:\WINDOWS\SAPCLU~1
12:04:16 01.09.2008   LOG: Allocate Counter Buffer [10000 Bytes]
12:04:16 01.09.2008   LOG: Allocate Instance Buffer [10000 Bytes]
12:04:17 01.09.2008   LOG: Shared Memory Size: 71898.
12:04:17 01.09.2008   LOG: Connected to existing shared memory.
12:04:17 01.09.2008   LOG: MaxRecords = 575 <> RecordCnt + Dta_offset = 614 + 61
12:04:22 01.09.2008 WARNING: WaitFree: could not set new shared memory status after 5 sec
12:04:22 01.09.2008 WARNING: Cannot create Shared Memory
Kernel Info:
Kernel release    700
Compilation        NT 5.2 3790 Service Pack 1 x86 MS VC++ 14.00
Sup.Pkg lvl.       146
ABAP Load       1563
CUA load           30
Mode                opt
Can anyone shed some light on the subject?
Thank you very much and kind regards
Edited by: Jose Enrique Sepulveda on Sep 6, 2008 2:10 AM

Dear bhaskar:
Thanks for your reply. We have considered moving the system to the other node or rebooting it to free resources, in order to re-create the shared memory, but in the past the balancing process (moving resources from one node to the other) has caused problems. Since this is a critical system, stopping (or balancing) it is not an option right now, and updating the kernel requires an ABAP stack restart plus the kernel change: any change in system configuration requires a longer approval/planning process than a reboot.
Moreover, the OS collecting system and its display in OS06/ST06 has worked fine until now.
Does anyone knows if a reboot has solved this kind of problem in a similar situation?
Thanks in advance
José Enrique

Solaris 9 x86 bug report - el_GR.ISO8859-7 & CDE

I'm posting this article here because I can't find any official Solaris 9 x86 bug report page. I hope the developers will notice it. I'm using Solaris 9 x86 (12/02) with the latest 9_x86_Recommended patch cluster installed, and with Greek language support installed too.
There seems to be a problem when viewing text files containing Greek (el_GR.ISO8859-7) characters that were created under Windows. To be more specific:
If I log in to CDE with the el_GR.ISO8859-7 language and try to view a .txt file with Greek characters that I created under Windows (just with a simple double-click), the screen goes black and the CDE login screen appears again (the session restarts). Whether I keep the Greek language or switch to US English, I can then log in to CDE again without problems. If I try "Command Line Login", the screen goes off, just as if the computer were powered off, and I can't do anything (except press the reset button, which is the sure way). And if I run "init 6" from a terminal while logged in to CDE, the graphical desktop exits and the screen goes off again (as if the computer were powered off), but the computer eventually manages to restart.
I'm using the Sun X server, NOT the XFree86 porting kit, and I start it with the following entry in the /usr/dt/config/Xservers file:
:0 Local local_uid@console nobody /usr/openwin/bin/Xsun :0 -dpsfileops
Here is the $HOME_DIR/.dt/startlog file:
--- ??? 23 ??? 2003 12:10:51
--- /usr/dt/bin/Xsession starting...
--- starting /usr/openwin/bin/speckeysd
--- Xsession started by dtlogin
--- starting /usr/dt/bin/dtsession_res -load -system
--- sourcing /root/.dtprofile...
--- sourcing /usr/dt/config/Xsession.d/0010.dtpaths...
--- sourcing /usr/dt/config/Xsession.d/0015.sun.env...
--- sourcing /usr/dt/config/Xsession.d/0020.dtims...
--- sourcing /usr/dt/config/Xsession.d/0030.dttmpdir...
--- sourcing /usr/dt/config/Xsession.d/0040.xmbind...
--- sourcing /usr/dt/config/Xsession.d/1000.solregis...
--- could not read /root/.profile
--- starting /usr/dt/bin/dthello &
--- starting /usr/dt/bin/dtsearchpath
--- starting /usr/dt/bin/dtappgather &
--- starting /usr/dt/bin/dsdm &
--- session log file is /root/.dt/sessionlogs/www_DISPLAY=:0
--- DTSOURCEPROFILE is 'true' (see /root/.dtprofile)
--- execing /usr/dt/bin/dtsession with a /sbin/sh login shell ...
--- starting desktop on /dev/pts/3
Sun Microsystems Inc. SunOS 5.9 Generic_112234-03 November 2002
/usr/dt/bin/ttsession[337]: starting
X connection to :0.0 broken (explicit kill or server shutdown).
X connection to :0.0 broken (explicit kill or server shutdown).
I don't know whether this is a bug, and I'm very curious about the cause. I didn't have time for any further "experiments".
Anyway, I hope this helps the developers fix the problem, if it really exists.
Angelos Vasilopoulos
Site Security Officer


Encountered ora-29701 during Sun Cluster for Oracle RAC 9.2.0.7 startup (UR

Hi all,
I need some help from everyone out there.
In our Sun Cluster 3.1 Data Service for Oracle RAC 9.2.0.7 (Solaris 9) configuration, my team encountered ORA-29701 *Unable to connect to Cluster Manager* during startup of the Oracle RAC database instances on the Oracle RAC server resources.
We tried Oracle's workaround, attached below. It works the first time, but it no longer works after the server is rebooted.
Has anyone encountered the same problem and managed to resolve it? Thanks.
Bug No.: 4262155
Filed: 25-MAR-2005
Updated: 11-APR-2005
Product: Oracle Server - Enterprise Edition
Product Version: 9.2.0.6.0
Platform: Linux x86
Platform Version: 2.4.21-9.0.1
Database Version: 9.2.0.6.0
Affects Platforms: Port-Specific
Severity: Severe Loss of Service
Status: Not a Bug. To Filer
Base Bug: N/A
Fixed in Product Version: No Data
Problem statement:
ORA-29701 DURING DATABASE CREATION AFTER APPLYING 9.2.0.6 PATCHSET
*** 03/25/05 07:32 am ***
TAR:
PROBLEM:
Customer applied 9.2.0.6 patchset over 9.2.0.4 patchset.
While creating the database, customer receives following error:
ORA-29701: unable to connect to Cluster Manager
However, if customer goes from 9.2.0.4 -> 9.2.0.5 -> 9.2.0.6, the problem does not occur.
DIAGNOSTIC ANALYSIS:
It seems that the problem is with libskgxn9.so shared library.
For 9.2.0.4 -> 9.2.0.5 -> 9.2.0.6, the install log shows the following:
installActions2005-03-22_03-44-42PM.log:,
[libskgxn9.so->%ORACLE_HOME%/lib/libskgxn9.so 7933 plats=1=>[46]langs=1=> en,fr,ar,bn,pt_BR,bg,fr_CA,ca,hr,cs,da,nl,ar_EG,en_GB,et,fi,de,el,iw,hu,is,in, it,ja,ko,es,lv,lt,ms,es_MX,no,pl,pt,ro,ru,zh_CN,sk,sl,es_ES,sv,th,zh_TW, tr,uk,vi]]
installActions2005-03-22_04-13-03PM.log:, [libcmdll.so ->%ORACLE_HOME%/lib/libskgxn9.so 64274 plats=1=>[46] langs=-554696704=>[en]]
For 9.2.0.4 -> 9.2.0.6, install log shows:
installActions2005-03-22_04-13-03PM.log:, [libcmdll.so ->%ORACLE_HOME%/lib/libskgxn9.so 64274 plats=1=>[46] langs=-554696704=>[en]] does not exist.
This means that while patching from 9.2.0.4 -> 9.2.0.5, Installer copies the libcmdll.so library into libskgxn9.so, while patching from 9.2.0.4 -> 9.2.0.6 does not.
ORACM is located in /app/oracle/ORACM which is different than ORACLE_HOME in customer's environment.
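The diagnostic analysis hinges on whether libskgxn9.so in ORACLE_HOME is a copy of the cluster manager library libcmdll.so. A minimal Bourne-shell sketch of that check follows. The function name check_skgxn is hypothetical, and because the bug report does not give the exact location of libcmdll.so inside the ORACM home, its path is passed in as an argument rather than guessed. On a Sun Cluster node the library libskgxn9.so should match may differ from the Linux ORACM case in this bug, so treat a mismatch as a prompt to investigate, not as proof of the problem.

```shell
#!/bin/sh
# Sketch: report whether a libskgxn9.so is a byte-for-byte copy of libcmdll.so.
# ASSUMPTION: the libcmdll.so location is site-specific, so it is passed as $2.
# Exit status: 0 = identical, 1 = different, 2 = one of the files is missing.
check_skgxn() {
    target=$1   # e.g. "$ORACLE_HOME/lib/libskgxn9.so"
    cmlib=$2    # path to libcmdll.so (placeholder; adjust to your ORACM home)
    if [ ! -f "$target" ]; then
        echo "missing: $target"
        return 2
    fi
    if [ ! -f "$cmlib" ]; then
        echo "missing: $cmlib"
        return 2
    fi
    # cmp -s prints nothing and only sets the exit status, so it is safe on binaries
    if cmp -s "$target" "$cmlib"; then
        echo "libskgxn9.so matches libcmdll.so"
        return 0
    fi
    echo "libskgxn9.so differs from libcmdll.so"
    return 1
}
```

For example: check_skgxn "$ORACLE_HOME/lib/libskgxn9.so" /path/to/libcmdll.so, where the second path is a placeholder. Comparing contents rather than the sizes shown in the install logs avoids guessing what those numbers mean.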
WORKAROUND:
Customer is using the following workaround:
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk rac_on ioracle ipc_udp
RELATED BUGS:
Bug 4169291

Check whether the following MOS note helps:
Series of ORA-7445 Errors After Applying 9.2.0.7.0 Patchset to 9.2.0.6.0 Database (Doc ID 373375.1)
