RAC 11.2 EMGC 11.1 node did not discover ASM

Hi,
we installed a two-node RAC with ASM and Grid Infrastructure on OEL. At the system level, grid and database run fine.
#>crsctl stat res -t --> all ok.
We deployed the agents from EMGC 11.1; the installation finished successfully.
On node one the agent discovered all components of the node: host, +ASM1, LISTENER_node1, SCANs 1 and 2.
But on node two the agent did not discover +ASM2 and LISTENER_node2.
All passwords are the same on both nodes and were provided to EMGC 11.1.
In the configuration tab in EMGC 11.1 all installed homes are found:
Oracle Database 11g 11.2.0.2.0     /u01/app/oracle/product/11.2.0/dbhome_1 (OraDb11g_home1)     Jun 9, 2011 1:26:07 PM GMT
Oracle Grid Infrastructure 11.2.0.2.0     /u01/app/11.2.0/grid (Ora11g_gridinfrahome1)     Jun 9, 2011 12:23:31 PM GMT
Oracle Management Agent 11.1.0.1.0     /u01/app/oracle/product/agent11g (agent11g1)     Jun 14, 2011 1:14:26 PM GMT
node2 #>emctl status agent --> agent OK, heartbeat to OMS --> 2011-06......
Can anyone help find out why the agent on node2 does not find all components, or how to tell the agent to rediscover everything?
Thanks
*T

Have you installed the agent in cluster mode?
I can't tell you the reason why one agent has discovered the node completely and the other hasn't. Please compare the file targets.xml (.../sysman/emd) from the fully discovered node with the targets.xml of the other node. Do both contain the same number of targets?
If not, you may edit this file on the node where the targets are missing (but be careful: you have ASM1 on node #1 and ASM2 on node #2).
And you certainly have to replace the hostname of node #1 with the hostname of node #2. But before editing this file please shut down the
agent and restart it afterwards. I know that this way is not supported or recommended by Oracle - but I have done it before and it worked
fine.
If you want to figure out why the targets on one node haven't been discovered, please check the logfiles under .../sysman/log. Maybe you
find a reason. And you could try "emctl stop agent", "emctl clearstate agent" and "emctl start agent"...
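A minimal command sketch of that comparison and restart sequence, assuming the agent home shown in the configuration tab above (/u01/app/oracle/product/agent11g); the hand edit of targets.xml is, as noted, not a supported procedure:
node2 #>export AGENT_HOME=/u01/app/oracle/product/agent11g
node2 #>$AGENT_HOME/bin/emctl config agent listtargets                  # what the node2 agent currently knows
node2 #>scp node1:/u01/app/oracle/product/agent11g/sysman/emd/targets.xml /tmp/targets_node1.xml
node2 #>diff $AGENT_HOME/sysman/emd/targets.xml /tmp/targets_node1.xml  # compare with the working node
node2 #>$AGENT_HOME/bin/emctl stop agent
(edit $AGENT_HOME/sysman/emd/targets.xml: keep +ASM2 / LISTENER_node2 and node2's hostname)
node2 #>$AGENT_HOME/bin/emctl clearstate agent
node2 #>$AGENT_HOME/bin/emctl start agent
node2 #>$AGENT_HOME/bin/emctl upload agent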

Similar Messages

  • AppDiscovery did not discover all SCCM Applications

    Hello,
    I have a problem with the AppDiscovery from MDT 2013.
    I use MDT with SCCM 2012 R2 integration.
    In my UDI task sequence the AppDiscovery only discovers software that was installed with MSI.
    Scripted software installations are not discovered.
    The WMI Application History in the appdiscovery.log is empty.
    Thanks for help

    Hi..
    Follow the instructions here > OS X Lion: About Lion Recovery
    At that link see:    Restoring iLife applications after Internet Restore of OS X Lion

  • Node did not refresh, even after invalidate

    Hello Friends,
    I am calling a BAPI in wdDoModifyView, and what surprises me is that the data is not refreshed even though I call node.invalidate() after executing the BAPI.
    Any idea what can be done to refresh the node?
    Regards,
    (if it helps I can paste my code here)...

    Code of wdDoModifyView:
    //@@begin wdDoModifyView
         view.resetView();
         IWDAttributeInfo attribute_var = wdContext.getNodeInfo().getAttribute("varname");
         ISimpleTypeModifiable fileName_var = attribute_var.getModifiableSimpleType();
         Zport_Get_Variant_Input variant_input = new Zport_Get_Variant_Input();
         wdContext.nodeZport_Get_Variant_Input().bind(variant_input);
         try {
              variant_input.setClient(WDClientUser.getCurrentUser().getSAPUser().getJobTitle());
              variant_input.setPuser(WDClientUser.getCurrentUser().getSAPUser().getUniqueName().toUpperCase());
              variant_input.setRpttype(wdContext.currentContextElement().getReportType());
              variant_input.setTabname(wdContext.currentContextElement().getFileName());
              // execute the BAPI and invalidate the result nodes so they are re-read from the model
              wdContext.nodeZport_Get_Variant_Input().currentZport_Get_Variant_InputElement().modelObject().execute();
              wdContext.nodeVartab().invalidate();
              wdContext.nodeOutput_variant().invalidate();
         } catch (Exception e) {
              // note: in the code as originally posted, everything below sat inside this
              // catch block, i.e. the value set was only rebuilt when the BAPI call failed
         }
         fileName_var.setFieldLabel("varname");
         IModifiableSimpleValueSet valueSet2 = fileName_var.getSVServices().getModifiableSimpleValueSet();
    //     IWDInputField fld = (IWDInputField) view.getElement("Variant");
    //     fld.setValue("");
         for (int i = 0; i < wdContext.nodeVartab().size(); i++) {
              valueSet2.put(wdContext.nodeVartab().getVartabElementAt(i).getVarname(),
                        "Variant: " + wdContext.nodeVartab().getVartabElementAt(i).getVarname());
              //valueSet2.sort(true, true, true);
              if (firstTime) {     // firstTime: boolean flag presumably declared elsewhere in the view
                   if (wdContext.currentContextElement().getCSRflag().equalsIgnoreCase("X")) {
                        IWDInputField clt = (IWDInputField) view.getElement("Client");
                        clt.setReadOnly(false);
                   }
              }
         }
        //@@end
    Regards,

  • Node is not processing

    Hello friends,
    after years of using Logic I am trying to set up a Node, without success.
    What works: a LAN connection between my two Macs. I used automatic configuration, under TCP/IP Configured IPv4 Using DHCP.
    I can connect to the file server from / to each of my Macs. I can mount hard drives, share internet connections...
    I installed Logic Node app on my MacBook and I used the Logic Node installer located in utilities folder on my dualG5 (I am trying to use MacBook as a node and Dual G5 as a Master computer).
    I started the Node app on MacBook first, then launched the Logic app on the dualG5. Without opening a new project I opened Preferences / Audio / Nodes. Now: I can see my MacBook available (not with grey letters). No box is checked (near "Enable Logic Nodes" or near "MacBook" under Node). So I check "Enable Logic Nodes". I can see *Initializing Core Audio* is working. Then I try to check the box near my MacBook node, but I can't. The whole line turns darker grey as soon as I select the checkbox, but nothing happens. When I try to select it another time, a popup window occurs and says:
    *Logic Pro lost the connection to the Node "MacBook"*
    Please relaunch Node application or check network connection
    My network connection seems to work fine, I can log in, mount drives, share internet... The Node app seems to be connected, or at least its icon looks connected in the MacBook dock (and it says "Connected to G5"). What I found out is that if I try to quit the Node app on the MacBook, either while Logic is running on the G5 or after I quit Logic on the G5, I am never able to quit the Node app normally as it stops responding, so I always have to Force Quit it...
    Now the second way I tried: started the Node app on MacBook, launched Logic on the dualG5. Opened Preferences / Audio / Nodes. Nothing is checked AND NOW I CHECKED the checkbox near the MacBook node FIRST. It can be done (remember nothing happened when I tried to check it after I checked the "Enable Logic Nodes" checkbox). Then I checked "Enable Logic Nodes", seems to be OK. The Node on MacBook says and looks like connected...
    I opened a project and created approx. 10 audio tracks. I inserted Linear Phase EQs, Logic compressors, DeEssers and EQs on them. I also inserted a whole lot of the Waves SSL bundle on some tracks, but kept Logic native and Waves plugins separate. The system was pretty maxed out. Now the maximum I could get when assigning a track to Node processing was that the Track Node button showed a green arrow, but did not light green, which according to the manual means: Sync Pending and not Enabled/Active. I could even see some tracks Enabled/Inactive.
    After spending last 24+ hours how to figure it out, I am asking for help.
    Here are some answers I expect you might ask:
    -yes, I have same versions of Logic and Node
    -yes, I have Waves SSL bundle installed on my MacBook, same version as on DualG5
    -yes, I tried it with native Logic plugins only
    -yes, the Firewall is OFF on both machines (Preferences / Sharing)
    - I unchecked all Ports except "Built in Ethernet" on dualG5 and "Airport" and "Built in Ethernet" on MacBook in Preferences / Network Port Configurations
    -yes, I tried it with "Airport" unchecked also
    -yes, I tried to reboot many times - both machines. I put the Node app into login items on the MacBook. I tried to shut both machines off, start the MacBook first, make sure Node is "on", then started the dualG5 and launched Logic...
    -yes, I inserted so many plugins on so many tracks that my dualG5 gave me a "System Overload" message every time I tried to play back the project, but the Node did not start to process...
    -no, I am not networking genius
    -yes, I tried different LAN cable (CAT5 always, one "crossed" or "cross-patched")
    -I've got Parallels and WinXP installed on my MacBook, but they were never running when I tried the Node setup (actually, I am almost never using them)
    Any ideas appreciated, thank you.
    Message was edited by: Diamond Dog

    I'd start with something simple - one audio track, with one Logic plugin.
    Can you get a solid green node light on that? If not, there is no point trying anything more complicated until you get this fixed.
    Note - I briefly tried it with my MBP and my old Powerbook as a node, and it worked fine without any particular headaches setting it up...

  • Cluster node does not shutdown after "received shutdown"

    Hi,
    We put together an automated restart process that restarts cluster nodes across multiple servers. To shut down a node, we use the Coherence MBeanConnector and invoke stop on the object name=Management,nodeId=<member id>. This works in most cases: the member's log output shows "received shutdown" and partition transfer messages, and after the last primary partitions have been transferred the VM exits.
    For one node, however, the VM did not exit. From looking at the log file for this particular node, the primary partitions were transferred and the distributedCache thread stops showing output, but the Cluster thread continues to show activity.
    Note that this node was the last VM to stop on the given server.
    Has anyone seen this before or ideas on why this particular node did not exit after receiving the shutdown message?
    Thanks!
    Marcel.

    Hi Marcel -
    Please take a thread dump (via "kill -3" or "ctrl-break") on the VM that does not stop correctly. Coherence does not shut the VM down; it simply shuts itself down. If a non-daemon thread is running on the VM, then it may not exit. However, we won't know that until we see the thread dump.
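    A quick way to see which non-daemon thread is keeping that VM alive (a sketch; the pid is a placeholder, and the grep relies on jstack marking daemon threads in their header lines):
    # send SIGQUIT so the dump lands in the node's stdout/stderr log
    kill -3 <pid_of_stuck_node>
    # or capture it with jstack and list only the non-daemon thread headers
    jstack <pid_of_stuck_node> > /tmp/threads.txt
    grep '^"' /tmp/threads.txt | grep -v daemon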
    Peace,
    Cameron Purdy | Oracle Coherence

  • DBCA did not see the ASM disk group in NODE 2 but see in NODE 1

    Has anyone encountered problems creating a database using DBCA with ASM as the file system?
    Our issue before was that on both nodes DBCA did not see the ASM disk group.
    But after setting TNS_ADMIN on both nodes and running DBCA as administrator on Node 1, DBCA is now able to see the ASM disk group. Unfortunately, on Node 2 it didn't work out.
    So we don't know why, from Node 2, DBCA still doesn't see the ASM disk group, since both nodes are configured the same.
    Any ideas? Please advise.
    For your information, we are using Windows 64-bit, Oracle 11g R2.
    Thank you in advance to those who will respond.
    Edited by: 822505 on Dec 20, 2010 7:47 PM

    822505 wrote:
    Has anyone encountered problems creating a database using DBCA with ASM as the file system?
    Our issue before was that on both nodes DBCA did not see the ASM disk group.
    But after setting TNS_ADMIN on both nodes and running DBCA as administrator on Node 1, DBCA is now able to see the ASM disk group. Unfortunately, on Node 2 it didn't work out.
    So we don't know why, from Node 2, DBCA still doesn't see the ASM disk group, since both nodes are configured the same.
    Any ideas? Please advise.
    For your information, we are using Windows 64-bit, Oracle 11g R2.
    Thank you in advance to those who will respond.
    Are the disks given to ASM visible from Node 2?
    Aman....
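    A quick check to run on Node 2 (a sketch; the grid home path and ASM SID are illustrative, and the Unix-style export becomes set on Windows):
    export ORACLE_HOME=/u01/app/11.2.0/grid
    export ORACLE_SID=+ASM2
    export PATH=$ORACLE_HOME/bin:$PATH
    # disks the local ASM instance can see
    asmcmd lsdsk -k
    # or ask the ASM instance directly
    sqlplus / as sysasm
    SQL> select path, header_status, mount_status from v$asm_disk;
    SQL> select name, state from v$asm_diskgroup;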

  • Failover did not happen when one node went down!!! PLEASE HELP

    Hi gurus,
    Yesterday disaster struck my RAC database. We have a two-node cluster on 10.2.0.2, with the nodes located in different sites. Yesterday the power suddenly went down, one of the network switches went down and was destroyed, and node one of the RAC database was connected to that switch. But failover to node two did not happen, even though when one node goes down the other should be available for all of node one's sessions/connections.
    When I tried to ping/telnet node 1, it was not possible because the switch was down; the network guys connected the cables to the other available switch. When I connected to node 1, it was showing an "Oracle is not available" message.
    And when I tried the other node, it was the same, but I did not see any error in the alert log file. Then my TL restarted both nodes and the database was available again.
    I am very confused about how the failover did not happen and how the database went down. Please suggest how to identify what happened. Thanks & Regards

    Thanks for your reply,
    after the network switch was replaced we connected to both nodes and found that the instances were down, with no reason given in the alert log file. We just restarted both instances, then the database was up and the clients connected to both instances with equal sessions on each. I want to know whether the failover can be done on the application side or should be done on the database side, i.e., in the tnsnames.ora file with the required parameters? In our scenario there is no failover configuration in the tnsnames.ora file.
    Thanks & Regards

  • The error is " The Mapping to Node has not been completed

    Hi All,
    I am getting a strange type of error and need help immediately.
    The error is " The Mapping to Node COMPONENTCONTROLLER.1.PLANNING_ENTITY Has Not Been Completed" for the node that exists in the Parent component and is being used in all the child nodes thru reverse mapping.
    I have done mapping in all the child nodes but still the message is coming.
    Could anybody tell me the reason .
    Regards,
    Arti.

    Basically, somewhere you have defined a context node 'PLANNING_ENTITY' to be an Input-Element. At the same time you did not define (through a component usage at design time) where the input to that node is coming from. This means the mapping path to the node is not complete, and the node does not know where it is mapped to.
    Either:
    - You untick the checkbox 'Input-Element (ext.)' inside the controller context, or
    - You find the component that uses the component with the node 'PLANNING_ENTITY' and select Component_Usage->'Name of Usage'->'Add controller usage'. Inside the controller menu you see then, you can now provide a mapping to the context node.
    I realise this now sounds a little confusing, but I'm happy to provide more details should you need them.
    Cheers,
    Robin

  • Node cannot join cluster after RAC HA testing

    Dear forum,
    We are performing RAC failover tests according to the document "RAC System Test Plan Outline 11gR2, Version 2.0". In test case #14 - Interconnect network failure (11.2.0.2 and higher), we disabled the private interconnect network of node node1 (the OCR master).
    Then - as expected - node node2 was evicted. Now, after enabling the private interconnect network on node node1, I want to start CRS again on node2. However, the node does not join the cluster, with these messages:
    2012-03-15 14:12:35.138: [ CSSD][1113114944]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
    2012-03-15 14:12:35.371: [ CSSD][1109961024]clssnmvDHBValidateNCopy: node 1, node1, has a disk HB, but no network HB, DHB has rcfg 226493542, wrtcnt, 2301201, LATS 5535614, lastSeqNo 2301198, uniqueness 1331804892, timestamp 1331817153/13040714
    2012-03-15 14:12:35.479: [ CSSD][1100884288]clssnmvDHBValidateNCopy: node 1, node1, has a disk HB, but no network HB, DHB has rcfg 226493542, wrtcnt, 2301202, LATS 5535724, lastSeqNo 2301199, uniqueness 1331804892, timestamp 1331817154/13041024
    2012-03-15 14:12:35.675: [ CSSD][1080801600]clssnmvDHBValidateNCopy: node 1, node1, has a disk HB, but no network HB, DHB has rcfg 226493542, wrtcnt, 2301203, LATS 5535924, lastSeqNo 2301200, uniqueness 1331804892, timestamp 1331817154/13041364
    Rebooting node2 did not help. Node1 was online the whole time (although its private interconnect interface was unplugged for a few minutes and then plugged back in). I suppose that if we also rebooted node1 the problem would disappear, but there should be a solution that keeps the availability requirements.
    Setup:
    2 Nodes (OEL5U7, UEK)
    2 Storages
    Network bonding via Linux bonding
    GI 11.2.0.3.1
    RDBMS 11.1.0.7.10
    Any ideas?
    Regards,
    Martin

    I have found a solution myself:
    [root@node1 trace]# echo -eth3 > /sys/class/net/bond1/bonding/slaves
    [root@node1 trace]# echo -eth1 > /sys/class/net/bond1/bonding/slaves
    [root@node1 trace]# echo +eth1 > /sys/class/net/bond1/bonding/slaves
    [root@node1 trace]# echo +eth3 > /sys/class/net/bond1/bonding/slaves
    Now node2 is automatically joining the cluster.
    Regards,
    martin
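    For reference, a way to confirm the slaves are really back in the bond before starting the stack again (a sketch; bond1 and the node names are taken from the post above):
    [root@node1 trace]# cat /proc/net/bonding/bond1      # both eth1 and eth3 should show as active slaves
    [root@node2 ~]# crsctl start crs
    [root@node2 ~]# crsctl stat res -t -init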

  • Error on sender FCC: InterfaceDetermination did not yield any actual intfc

    Hi Experts,
    I am developing a File to Proxy scenario. A sender file adapter with FCC has been configured as required with the fixed field lengths and field names. On trying to execute the interface by placing the source file at the desired location, the sender communication channel throws the following exception: Error: com.sap.aii.adapter.xi.routing.RoutingException: InterfaceDetermination did not yield any actual interface. I am using the Integrated Configuration object on PI 7.3 and the ICO has been configured correctly too.
    the adapter logs look as follows:
    18.10.2011 08:20:46 Information Channel CC_BOURQUE_AccrualTransactions_FILE_Sender1: Converted complete file content to XML format
    18.10.2011 08:20:46 Information Channel CC_BOURQUE_AccrualTransactions_FILE_Sender1: Send binary file  "D:\Interfaces\ITF\07\Out\freight_export.dat", size 46538 with QoS EO
    18.10.2011 08:20:46 Information MP: processing local module localejbs/CallSapAdapter
    18.10.2011 08:20:46 Information Application attempting to send an XI message asynchronously using connection File_http://sap.com/xi/XI/System
    18.10.2011 08:20:46 Error Returning to application. Exception: com.sap.aii.adapter.xi.routing.RoutingException: InterfaceDetermination did not yield any actual interface
    18.10.2011 08:20:46 Error MP: exception caught with cause com.sap.aii.adapter.xi.routing.RoutingException: InterfaceDetermination did not yield any actual interface
    18.10.2011 08:20:46 Error Attempt to process file failed with com.sap.aii.adapter.xi.routing.RoutingException: InterfaceDetermination did not yield any actual interface
    Please guide me if I am missing anything.
    Regards,
    Elizabeth

    Hi Mark,
    I have removed the namespace from the message type; the same issue still occurs for the mapping.
    On trying to copy and use the source payload on the message mapping test tab, the source structure is shown with all the elements and nodes in red.
    Following is how my source payload looks:
    <?xml version="1.0" encoding="UTF-8"?>
    <MT_AccrualsTransactions>
    <Accruals>
         <Row>
              <BillOfLadingNumber>21776</BillOfLadingNumber>
              <CarrierCode>VB</CarrierCode>
              <ChargeTypeInterface>FO</ChargeTypeInterface>
              <CreateDate>07-01-201107:45:22:303</CreateDate>
              <Amount>28830.51</Amount>
              <Currency>U</Currency>
              <ExportDate>07-01-201107:45:24:070</ExportDate>
              <VendorCode>105106</VendorCode>
              <CustomData1>5140200-1000000</CustomData1>
              <CustomData2>6635</CustomData2>
              <CustomData3>88846</CustomData3>
              <CustomData4>5                   2011/</CustomData4>
              <CustomData5>01/31</CustomData5>
              <CustomData6>034</CustomData6>
              <CustomData7></CustomData7>
              <CustomData8></CustomData8>
              <CustomData9></CustomData9>
              <CustomData10></CustomData10>
              <ReferenceNo></ReferenceNo>
              <EquipmentInitials></EquipmentInitials>
              <EquipmentNumber>B         MTC723</EquipmentNumber>
              <ShipDateTime>01</ShipDateTime>
              <OrderNumber>-29-201107:00:00:000U88714</OrderNumber>
              <ShippedQty></ShippedQty>
              <ShippedUOM></ShippedUOM>
                      <OrganizationCode/>
                       <AccrualDate/>
                       <EOF/>
         </Row>
    </Accruals>
    </MT_AccrualsTransactions>
    What could be wrong?
    Regards,
    Elizabeth

  • Node text not found in SPRO

    Hi All,
    In the SPRO transaction, under Time Management --> Web Applications --> Leave Request (New), the node text is not found.
    I am not able to find any of the three subnodes under the node Leave Request (New).
    They are displayed as "node text not found", and whenever we execute these nodes we get the error "there are no executable transactions assigned to this node".
    These nodes are available in the development server but not on the test server. We have already activated the BC sets through SCPR20 and found the activation log successful for EA-IMG, EA-AKH and EA-MENU.
    We also activated the switch EA-HR, and ours is not an upgraded system (4.7).
    Please help as soon as possible, as this is very urgent for us.
    Points guaranteed.
    Regards,
    Jyothi.R

    Hi Julian
    Did you check whether all the upgrade programs ran successfully? Also, can you please let me know when exactly you get the message?
    Thanks
    Debraj Roy

  • An error 1069 - (The service did not start due to logon failure) occurred while performing this service operation ...

    Hi All,
    We seem to be plagued by the error below from our SQL Server Agent. It happens almost every time we restart a server that has been running for a day or two.
    Our SQL Server Agent uses a non-expiring domain credential. I understand that this problem only happens when the password of the account used by SQL Server Agent has changed. What puzzles me is that the login is OK and no changes have been made to its password.
    We always resolve this by changing the login used by SQL Server Agent to a local account and then changing it back to its original domain login. Unfortunately, we can't always do this every time something goes wrong.
    Can anyone please help shed some light on this? We're using SQL2k with SP3a. Thanks!
    Error:
    An error 1069 - (The service did not start due to logon failure) occurred while performing this service operation on the SQLServerAgent service.
    Regards,
    Joseph

    Ran into this error, and the password was correct. What the System Event Log said:
    Event Type: Error
    Event Source: Service Control Manager
    Event Category: None
    Event ID: 7041
    Date: 10/8/2008
    Time: 9:33:09 AM
    User: N/A
    Computer: ComputerName
    Description:
    The SQLSERVERAGENT service was unable to log on as DomainName\SQLAgent with the currently configured password due to the following error:
    Logon failure: the user has not been granted the requested logon type at this computer.
    Service: SQLSERVERAGENT
    Domain and account: DomainName\SQLAgent
    This service account does not have the necessary user right "Log on as a service."
    User Action
    Assign "Log on as a service" to the service account on this computer. You can use Local Security Settings (Secpol.msc) to do this. If this computer is a node in a cluster, check that this user right is assigned to the Cluster service account on all nodes in the cluster.
    If you have already assigned this user right to the service account, and the user right appears to be removed, a Group Policy object associated with this node might be removing the right. Check with your domain administrator to find out if this is happening.
    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp
    ...sure enough, it had been removed from the "Log on as a service" list. Hope this helps.
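    If you would rather re-grant the right from a script than through Secpol.msc, one option (not mentioned in the thread; ntrights.exe comes from the Windows Server 2003 Resource Kit, and the account name is the one from the event log) is:
    REM grant the "Log on as a service" right to the agent account
    ntrights +r SeServiceLogonRight -u DomainName\SQLAgent
    REM then restart the agent service
    net stop SQLSERVERAGENT
    net start SQLSERVERAGENT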

  • Error 1053: The service did not respond to the start or control ...

    Dear SapService,
    I've installed a BI7.0 (ABAP only) instance on cluster with :
    windows 2003 x64 SP2
    SQL server 2005 x64 SP2
    WAS 7.00 kernel 80
    *the service runs with domain user.
    I've upgraded to kernel 114 (latest). After the upgrade the ASCS service doesn't start. It throws error:
    "Error 1053: The service did not respond to the start or control
    request in a timely fashion"
    To solve this problem I've tried:
    - going back to kernel 80, but I still get the same error
    - starting the service manually on the second node; still the same error
    - restarting the two servers and trying again to start the service on the nodes
    Any ideas?
    Please advise,
    Dimitry Haritonov

    Hi,
    Check SAP Note 82751 - Problems with SAP Services & SAP Service Manager
    for some initial investigation.
    Regards,
    Siddhesh

  • Node does not join cluster upon reboot

    Hi Guys,
    I have two servers [Sun Fire X4170] clustered together using Solaris Cluster 3.3 for Oracle Database. They are connected to shared storage, a Dell EqualLogic [iSCSI] array. Lately I have run into a weird problem: both nodes come up fine and join the cluster if rebooted together; however, when I reboot only one of the nodes, it does not join the cluster and shows the following errors.
    This is happening on both nodes [if I reboot only one node at a time]. But if I reboot both nodes at the same time then they successfully join the cluster and everything runs fine.
    Below is the output from the node which I rebooted and which did not join the cluster and produced the following errors. The other node is running fine with all the services.
    In order to get out of this situation, I have to reboot both the nodes together.
    # dmesg output #
    Apr 23 17:37:03 srvhqon11 ixgbe: [ID 611667 kern.info] NOTICE: ixgbe2: link down
    Apr 23 17:37:12 srvhqon11 iscsi: [ID 933263 kern.notice] NOTICE: iscsi connection(5) unable to connect to target SENDTARGETS_DISCOVERY
    Apr 23 17:37:12 srvhqon11 iscsi: [ID 114404 kern.notice] NOTICE: iscsi discovery failure - SendTargets (010.010.017.104)
    Apr 23 17:37:13 srvhqon11 iscsi: [ID 240218 kern.notice] NOTICE: iscsi session(9) iqn.2001-05.com.equallogic:0-8a0906-96cf73708-ef30000005e50a1b-sblprdbk online
    Apr 23 17:37:13 srvhqon11 scsi: [ID 583861 kern.info] sd11 at scsi_vhci0: unit-address g6090a0887073cf961b0ae505000030ef: g6090a0887073cf961b0ae505000030ef
    Apr 23 17:37:13 srvhqon11 genunix: [ID 936769 kern.info] sd11 is /scsi_vhci/disk@g6090a0887073cf961b0ae505000030ef
    Apr 23 17:37:13 srvhqon11 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
    Apr 23 17:37:13 srvhqon11 /scsi_vhci/disk@g6090a0887073cf961b0ae505000030ef (sd11): Command failed to complete (3) on path iscsi0/[email protected]:0-8a0906-96cf73708-ef30000005e50a1b-sblprdbk0001,0
    Apr 23 17:46:54 srvhqon11 svc.startd[11]: [ID 122153 daemon.warning] svc:/network/iscsi/initiator:default: Method or service exit timed out. Killing contract 41.
    Apr 23 17:46:54 srvhqon11 svc.startd[11]: [ID 636263 daemon.warning] svc:/network/iscsi/initiator:default: Method "/lib/svc/method/iscsid start" failed due to signal KILL.
    Apr 23 17:46:54 srvhqon11 svc.startd[11]: [ID 748625 daemon.error] network/iscsi/initiator:default failed repeatedly: transitioned to maintenance (see 'svcs -xv' for details)
    Apr 24 14:50:16 srvhqon11 svc.startd[11]: [ID 694882 daemon.notice] instance svc:/system/console-login:default exited with status 1
    root@srvhqon11 # svcs -xv
    svc:/system/cluster/loaddid:default (Oracle Solaris Cluster loaddid)
    State: offline since Tue Apr 23 17:46:54 2013
    Reason: Start method is running.
    See: http://sun.com/msg/SMF-8000-C4
    See: /var/svc/log/system-cluster-loaddid:default.log
    Impact: 49 dependent services are not running:
    svc:/system/cluster/bootcluster:default
    svc:/system/cluster/cl_execd:default
    svc:/system/cluster/zc_cmd_log_replay:default
    svc:/system/cluster/sc_zc_member:default
    svc:/system/cluster/sc_rtreg_server:default
    svc:/system/cluster/sc_ifconfig_server:default
    svc:/system/cluster/initdid:default
    svc:/system/cluster/globaldevices:default
    svc:/system/cluster/gdevsync:default
    svc:/milestone/multi-user:default
    svc:/system/boot-config:default
    svc:/system/cluster/cl-svc-enable:default
    svc:/milestone/multi-user-server:default
    svc:/application/autoreg:default
    svc:/system/basicreg:default
    svc:/system/zones:default
    svc:/system/cluster/sc_zones:default
    svc:/system/cluster/scprivipd:default
    svc:/system/cluster/cl-svc-cluster-milestone:default
    svc:/system/cluster/sc_svtag:default
    svc:/system/cluster/sckeysync:default
    svc:/system/cluster/rpc-fed:default
    svc:/system/cluster/rgm-starter:default
    svc:/application/management/common-agent-container-1:default
    svc:/system/cluster/scsymon-srv:default
    svc:/system/cluster/sc_syncsa_server:default
    svc:/system/cluster/scslmclean:default
    svc:/system/cluster/cznetd:default
    svc:/system/cluster/scdpm:default
    svc:/system/cluster/rpc-pmf:default
    svc:/system/cluster/pnm:default
    svc:/system/cluster/sc_pnm_proxy_server:default
    svc:/system/cluster/cl-event:default
    svc:/system/cluster/cl-eventlog:default
    svc:/system/cluster/cl-ccra:default
    svc:/system/cluster/ql_upgrade:default
    svc:/system/cluster/mountgfs:default
    svc:/system/cluster/clusterdata:default
    svc:/system/cluster/ql_rgm:default
    svc:/system/cluster/scqdm:default
    svc:/application/stosreg:default
    svc:/application/sthwreg:default
    svc:/application/graphical-login/cde-login:default
    svc:/application/cde-printinfo:default
    svc:/system/cluster/scvxinstall:default
    svc:/system/cluster/sc_failfast:default
    svc:/system/cluster/clexecd:default
    svc:/system/cluster/sc_pmmd:default
    svc:/system/cluster/clevent_listenerd:default
    svc:/application/print/server:default (LP print server)
    State: disabled since Tue Apr 23 17:36:44 2013
    Reason: Disabled by an administrator.
    See: http://sun.com/msg/SMF-8000-05
    See: man -M /usr/share/man -s 1M lpsched
    Impact: 2 dependent services are not running:
    svc:/application/print/rfc1179:default
    svc:/application/print/ipp-listener:default
    svc:/network/iscsi/initiator:default (?)
    State: maintenance since Tue Apr 23 17:46:54 2013
    Reason: Restarting too quickly.
    See: http://sun.com/msg/SMF-8000-L5
    See: /var/svc/log/network-iscsi-initiator:default.log
    Impact: This service is not running.
    ######## Cluster Status from working node ############
    root@srvhqon10 # cluster status
    === Cluster Nodes ===
    --- Node Status ---
    Node Name Status
    srvhqon10 Online
    srvhqon11 Offline
    === Cluster Transport Paths ===
    Endpoint1 Endpoint2 Status
    srvhqon10:igb3 srvhqon11:igb3 faulted
    srvhqon10:igb2 srvhqon11:igb2 faulted
    === Cluster Quorum ===
    --- Quorum Votes Summary from (latest node reconfiguration) ---
    Needed Present Possible
    2 2 3
    --- Quorum Votes by Node (current status) ---
    Node Name Present Possible Status
    srvhqon10 1 1 Online
    srvhqon11 0 1 Offline
    --- Quorum Votes by Device (current status) ---
    Device Name Present Possible Status
    d2 1 1 Online
    === Cluster Device Groups ===
    --- Device Group Status ---
    Device Group Name Primary Secondary Status
    --- Spare, Inactive, and In Transition Nodes ---
    Device Group Name Spare Nodes Inactive Nodes In Transistion Nodes
    --- Multi-owner Device Group Status ---
    Device Group Name Node Name Status
    === Cluster Resource Groups ===
    Group Name Node Name Suspended State
    ora-rg srvhqon10 No Online
    srvhqon11 No Offline
    nfs-rg srvhqon10 No Online
    srvhqon11 No Offline
    backup-rg srvhqon10 No Online
    srvhqon11 No Offline
    === Cluster Resources ===
    Resource Name Node Name State Status Message
    ora-listener srvhqon10 Online Online
    srvhqon11 Offline Offline
    ora-server srvhqon10 Online Online
    srvhqon11 Offline Offline
    ora-stor srvhqon10 Online Online
    srvhqon11 Offline Offline
    ora-lh srvhqon10 Online Online - LogicalHostname online.
    srvhqon11 Offline Offline
    nfs-rs srvhqon10 Online Online - Service is online.
    srvhqon11 Offline Offline
    nfs-stor-rs srvhqon10 Online Online
    srvhqon11 Offline Offline
    nfs-lh-rs srvhqon10 Online Online - LogicalHostname online.
    srvhqon11 Offline Offline
    backup-stor srvhqon10 Online Online
    srvhqon11 Offline Offline
    cluster: (C383355) No response from daemon on node "srvhqon11".
    === Cluster DID Devices ===
    Device Instance Node Status
    /dev/did/rdsk/d1 srvhqon10 Ok
    /dev/did/rdsk/d2 srvhqon10 Ok
    srvhqon11 Unknown
    /dev/did/rdsk/d3 srvhqon10 Ok
    srvhqon11 Unknown
    /dev/did/rdsk/d4 srvhqon10 Ok
    /dev/did/rdsk/d5 srvhqon10 Fail
    srvhqon11 Unknown
    /dev/did/rdsk/d6 srvhqon11 Unknown
    /dev/did/rdsk/d7 srvhqon11 Unknown
    /dev/did/rdsk/d8 srvhqon10 Ok
    srvhqon11 Unknown
    /dev/did/rdsk/d9 srvhqon10 Ok
    srvhqon11 Unknown
    === Zone Clusters ===
    --- Zone Cluster Status ---
    Name Node Name Zone HostName Status Zone Status
    Regards.

    Check if your global devices are mounted properly:
    #cat /etc/mnttab | grep -i global
    Check if proper entries are there on both systems:
    #cat /etc/vfstab | grep -i global
    Give the output for the quorum devices:
    #scstat -q
    or
    #clquorum list -v
    Also check why your iSCSI initiator service is going into maintenance unexpectedly:
    #vi /var/svc/log/network-iscsi-initiator:default.log
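    Once the cause in that log is addressed, a sketch of clearing the maintenance state so the dependent cluster services (listed in the svcs -xv output above) can start:
    #svcadm clear svc:/network/iscsi/initiator:default
    #svcadm enable svc:/network/iscsi/initiator:default
    #svcs -xv svc:/system/cluster/loaddid:default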

  • Failover did not happen when network switch went down!!! PLEASE HELP

    Hi gurus,
    Yesterday disaster struck my RAC database. We have a two-node cluster on 10.2.0.2, with the nodes located in different sites. Yesterday the power suddenly went down, one of the network switches went down and was destroyed, and node one of the RAC database was connected to that switch. But failover to node two did not happen, even though when one node goes down the other should be available for all of node one's sessions/connections.
    When I tried to ping/telnet node 1, it was not possible because the switch was down; the network guys connected the cables to the other available switch. When I connected to node 1, it was showing an "Oracle is not available" message.
    And when I tried the other node, it was the same, but I did not see any error in the alert log file. Then my TL restarted both nodes and the database was available again.
    I am very confused about how the failover did not happen and how the database went down. Please suggest how to identify what happened. Thanks & Regards
    Edited by: user1221 on Mar 18, 2009 1:09 AM

    About Oracle RAC: you have 2 nodes, so there are 4 IPs to consider.
    I mean:
    - public IP of node1
    - public IP of node2
    - virtual IP (VIP) of node1
    - virtual IP (VIP) of node2
    When node1 is down
    you cannot ping the public IP of node1, but you should still be able to ping the VIP of node1, because it fails over to node2.
    It just no longer listens on port 1521 there.
    Idea about failover:
    you have to create a new, separate service on the RAC database,
    and on your client you have to set up the TNS entry for failover and load balancing; if you use OCI, you can use the TAF feature.
    Example:
    DB =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = db01-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = db02-vip)(PORT = 1521))
        (FAILOVER = ON)
        (LOAD_BALANCE = yes)
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = NEW_SERVICE_DB)
          (FAILOVER_MODE =
            (TYPE = SELECT)
            (METHOD = BASIC)
            (RETRIES = 180)
            (DELAY = 5)
          )
        )
      )
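    And, as a sketch of the "create a new service" step mentioned above (10.2 srvctl syntax; the database name ORCL and instance names orcl1/orcl2 are illustrative):
    srvctl add service -d ORCL -s NEW_SERVICE_DB -r orcl1,orcl2 -P BASIC
    srvctl start service -d ORCL -s NEW_SERVICE_DB
    srvctl status service -d ORCL -s NEW_SERVICE_DB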
    If you use JDBC, you cannot use TAF.
    Anyway, I suggest you read more about RAC on http://otn.oracle.com/rac and http://oracleracsig.org
    Good Luck
