Secondary Node Rebooted instead of falling to Ok prompt

Hi all,
We need to take a system backup of our clustered DB before and after our maintenance work.
We have the following configuration:
Node #1 and Node #2
Solaris 8
SunCluster 3.0
Oracle 9.3.4
VxVM 3.2
Before issuing the cluster shutdown command, I verified which node was primary:
#scstat
I issued scshutdown -y -i0 on the primary node, but the secondary node rebooted instead of halting to the {ok} prompt. (The primary server successfully fell to the ok prompt.)
When I checked the logs on the secondary node, I saw:
May 16 08:18:41 SC[SUNW.HAStoragePlus,ttmapd-rg,tmsstor-res,hastorageplus_prenet_start_private]: Global device path /dev/vx/rdsk/tms_usr_dg01/bak_redo11_vol is not recognized as a device group or a device special file.
May 16 08:18:42 SC[SUNW.LogicalHostname,ttmapd-rg,tmslhost3-res,hafoip_start]: pnm_init: RPC: Rpcbind failure - RPC: Unable to receive
May 16 08:18:42 SC[SUNW.LogicalHostname,ttmapd-rg,tmslhost0-res,hafoip_start]: pnm_init: RPC: Rpcbind failure - RPC: Unable to receive
May 16 08:18:42 SC[SUNW.LogicalHostname,ttmapd-rg,tmslhost3-res,hafoip_start]: Failed to validate NAFO group name <nafo0> nafo errorcode <5>.
May 16 08:18:42 SC[SUNW.LogicalHostname,ttmapd-rg,tmslhost0-res,hafoip_start]: Failed to validate NAFO group name <nafo1> nafo errorcode <5>.
May 16 08:18:42 SC[SUNW.LogicalHostname,ttmapd-rg,tmslhost0-res,hafoip_stop]: pnm_init: RPC: Rpcbind failure - RPC: Unable to receive
May 16 08:18:42 SC[SUNW.LogicalHostname,ttmapd-rg,tmslhost0-res,hafoip_stop]: Failed to validate NAFO group name <nafo1> nafo errorcode <5>.
May 16 08:18:42 SC[SUNW.LogicalHostname,ttmapd-rg,tmslhost3-res,hafoip_stop]: pnm_init: RPC: Rpcbind failure - RPC: Unable to receive
May 16 08:18:42 SC[SUNW.LogicalHostname,ttmapd-rg,tmslhost3-res,hafoip_stop]: Failed to validate NAFO group name <nafo0> nafo errorcode <5>.
Has anyone encountered this error before?
Thank you in advance.
Regards,
Rachele

scshutdown issues a shutdown command on both nodes. Here is the procedure.
- Fail over your resource group to node_2:
root@node_2#scswitch -z -g oracle -h node_2
You can check the status of the resource group with scstat.
- On the node you want to back up:
root@node_1#init 0
ok boot -sx
s = single-user mode
x = outside of the cluster
Once you're in single-user mode, start your backup.
If you want to avoid a flood of logging on node_2, you can disable node_1, or put it in maintenance state, from node_2 with scconf:
root@node_2#scconf -q node=node_1,maintstate
(make sure you know what you're doing here)
- To reboot node_1 and rejoin it to the cluster:
root@node_1#umount -a
root@node_1#sync
root@node_1#reboot
ok boot (if auto-boot? is set to false)
root@node_2#scconf -q node=node_1,reset
that's it
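Put together, the steps above could look like the sketch below. Note that in the real procedure the commands run on different nodes (scswitch and scconf on node_2, init 0 on node_1), so treat this as a checklist rather than something to execute on one host. The resource group name `oracle` and the node names are the placeholders from this post; the DRY_RUN guard is an addition here so the script only prints the commands by default:

```shell
#!/bin/sh
# Sketch of the single-node backup procedure for SunCluster 3.x.
# RG and node names are placeholders; adjust for your cluster.
# With DRY_RUN=1 (the default) commands are printed, not executed.
RG=oracle
TARGET=node_2        # node that keeps the resource group during the backup
BACKUP_NODE=node_1   # node taken out of the cluster and backed up
DRY_RUN=${DRY_RUN:-1}

run() {
    # In dry-run mode only print what would be executed.
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "WOULD RUN: $*"
    else
        "$@"
    fi
}

run scswitch -z -g "$RG" -h "$TARGET"           # fail the RG over to node_2
run scstat                                      # verify where the RG runs now
run scconf -q node="$BACKUP_NODE",maintstate    # optional: quiet node_2's logs
run init 0                                      # halt node_1 to the ok prompt
# At the ok prompt: boot -sx (single user, outside the cluster), then back up.
```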
cheers,
Kim

Similar Messages

  • ISE admin, PSN and monitoring node failover and fallback scenario

    Hi Experts,
    I have a question about ISE failover.
    I have two ISE appliances in two different locations. I am trying to understand the failover and fallback scenarios.
    I have gone through the documentation, but it is still not clear.
    My primary ISE server would have the primary admin role and the primary monitoring role, and the secondary ISE would have the secondary admin and secondary monitoring roles.
    In case of a primary ISE appliance failure, I will have to log in to the secondary ISE node and make its admin role primary, but what about when the primary ISE comes back? What would the scenario be?
    During the primary failure, will there be any impact on user authentication? As long as a PSN is available on the secondary, it should work, right?
    And what is the actual method to promote the secondary ISE admin node to primary? Do I even have to make the monitoring node role changes manually?
    Will I have to reboot the secondary ISE after promoting its admin role to primary?

    We have the same setup across an OTV link and have tested this scenario multiple times. You don't have to do anything if communication is broken between the primary and secondary nodes: the secondary will automatically start authenticating the devices it is in contact with. If you promote the secondary to primary after the link is broken, it will assume the primary role when the link is restored and force the former primary node to secondary.

  • Error while adding the secondary node in shared APPL_TOP.

    Hi,
    We are getting the error below when trying to add the secondary node in a shared APPL_TOP.
    We ran the command:
    perl -I <AU_TOP>/perl txkSOHM.pl
    AutoConfig is configuring the Applications environment...
    AutoConfig will consider the custom templates if present.
    Using APPL_TOP location : /u2590/oracle/oaq5appl
    Classpath : /u2590/oracle/oaq5comn/util/java/1.4/j2sdk1.4.2_04/jre/lib/rt.jar:/u2590/oracle/oaq5comn/util/java/1.4/j2sdk1.4.2_04/lib/dt.jar:/u2590/oracle/oaq5comn/util/java/1.4/j2sdk1.4.2_04/lib/tools.jar:/u2590/oracle/oaq5comn/java/appsborg2.zip:/u2590/oracle/oaq5comn/java
    Exception in thread "main" java.lang.NoClassDefFoundError: oracle/apps/ad/autoconfig/oam/CtxSynchronizerException
    at oracle.apps.ad.context.CtxValueMgt.processCtxFile(CtxValueMgt.java:1548)
    at oracle.apps.ad.context.CtxValueMgt.main(CtxValueMgt.java:709)
    ERROR: Context Value Management Failed.
    Terminate.
    The logfile for this session is located at:
    /u2590/oracle/oaq5comn/admin/log/oaq5_qn2lx793/txkSetSOHM_ac.log
    txkSOHM.pl successfully completed
    Regards

    Hi,
    Is this the complete error message?
    Any more details about the error in the log file?
    Do you have the latest AutoConfig patch applied?
    Regards,
    Hussein

  • SC 3.2 Solaris 10 x86. When one node reboots, the other one does also

    We configured a two-node cluster with an EMC CLARiiON SAN (RAID 6) holding a zpool and used as the quorum device.
    When one node goes down, the other one does also.
    There seems to be a problem with the quorum.
    I cannot understand or figure out what actually goes wrong.
    When starting up:
    Booting as part of a cluster
    NOTICE: CMM: Node cnode01 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node cnode02 (nodeid = 2) with votecount = 1 added.
    NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d1s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3.
    NOTICE: clcomm: Adapter nge3 constructed
    NOTICE: clcomm: Adapter nge2 constructed
    NOTICE: CMM: Node cnode01: attempting to join cluster.
    NOTICE: nge3: link down
    NOTICE: nge2: link down
    NOTICE: nge3: link up 1000Mbps Full-Duplex
    NOTICE: nge2: link up 1000Mbps Full-Duplex
    NOTICE: nge3: link down
    NOTICE: nge2: link down
    NOTICE: nge3: link up 1000Mbps Full-Duplex
    NOTICE: nge2: link up 1000Mbps Full-Duplex
    NOTICE: CMM: Node cnode02 (nodeid: 2, incarnation #: 1248284052) has become reachable.
    NOTICE: clcomm: Path cnode01:nge2 - cnode02:nge2 online
    NOTICE: clcomm: Path cnode01:nge3 - cnode02:nge3 online
    NOTICE: CMM: Cluster has reached quorum.
    NOTICE: CMM: Node cnode01 (nodeid = 1) is up; new incarnation number = 1248284001.
    NOTICE: CMM: Node cnode02 (nodeid = 2) is up; new incarnation number = 1248284052.
    NOTICE: CMM: Cluster members: cnode01 cnode02.
    NOTICE: CMM: node reconfiguration #1 completed.
    NOTICE: CMM: Node cnode01: joined cluster.
    ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
    /dev/rdsk/c2t0d0s5 is clean
    Reading ZFS config: done.
    obtaining access to all attached disks
    cnode01 console login:
    Then this on the second node:
    Booting as part of a cluster
    NOTICE: CMM: Node cnode01 (nodeid = 1) with votecount = 1
    NOTICE: CMM: Node cnode02 (nodeid = 2) with votecount = 1
    NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d1s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3.
    NOTICE: clcomm: Adapter nge3 constructed
    NOTICE: clcomm: Adapter nge2 constructed
    NOTICE: CMM: Node cnode02: attempting to join cluster.
    NOTICE: CMM: Node cnode01 (nodeid: 1, incarnation #: 1248284001) has become reachable.
    NOTICE: clcomm: Path cnode02:nge2 - cnode01:nge2 online
    NOTICE: clcomm: Path cnode02:nge3 - cnode01:nge3 online
    WARNING: CMM: Issuing a NULL Preempt failed on quorum device /dev/did/rdsk/d1s2 with error 2.
    NOTICE: CMM: Cluster has reached quorum.
    NOTICE: CMM: Node cnode01 (nodeid = 1) is up; new incarnation number = 1248284001.
    NOTICE: CMM: Node cnode02 (nodeid = 2) is up; new incarnation number = 1248284052.
    NOTICE: CMM: Cluster members: cnode01 cnode02.
    NOTICE: CMM: node reconfiguration #1 completed.
    NOTICE: CMM: Node cnode02: joined cluster.
    NOTICE: CCR: Waiting for repository synchronization to finish.
    WARNING: CMM: Issuing a NULL Preempt failed on quorum device /dev/did/rdsk/d1s2 with error 2.
    ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
    /dev/rdsk/c2t0d0s5 is clean
    Reading ZFS config: done.
    obtaining access to all attached disks
    cnode02 console login:
    But when the first node reboots, this message appears on the second node:
    Jul 22 19:24:48 cnode02 genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0
    Jul 22 19:30:57 cnode02 nge: [ID 812601 kern.notice] NOTICE: nge3: link down
    Jul 22 19:30:57 cnode02 nge: [ID 812601 kern.notice] NOTICE: nge2: link down
    Jul 22 19:30:59 cnode02 nge: [ID 812601 kern.notice] NOTICE: nge3: link up 1000Mbps Full-Duplex
    Jul 22 19:31:00 cnode02 nge: [ID 812601 kern.notice] NOTICE: nge2: link up 1000Mbps Full-Duplex
    Jul 22 19:31:06 cnode02 genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path cnode02:nge2 - cnode01:nge2 being drained
    Jul 22 19:31:06 cnode02 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x0
    Jul 22 19:31:06 cnode02 genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path cnode02:nge3 - cnode01:nge3 being drained
    Jul 22 19:31:11 cnode02 nge: [ID 812601 kern.notice] NOTICE: nge3: link down
    Jul 22 19:31:12 cnode02 genunix: [ID 414208 kern.warning] WARNING: QUORUM_GENERIC: quorum preempt error in CMM: Error 5 --- QUORUM_GENERIC Tkown ioctl failed on quorum device /dev/did/rdsk/d1s2.
    Jul 22 19:31:12 cnode02 cl_dlpitrans: [ID 624622 kern.notice] Notifying cluster that this node is panicking
    Jul 22 19:31:12 cnode02 unix: [ID 836849 kern.notice]
    Jul 22 19:31:12 cnode02 ^Mpanic[cpu3]/thread=ffffffff8b5c06e0:
    Jul 22 19:31:12 cnode02 genunix: [ID 265925 kern.notice] CMM: Cluster lost operational quorum; aborting.
    Jul 22 19:31:12 cnode02 unix: [ID 100000 kern.notice]
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651b40 genunix:vcmn_err+13 ()
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651b50 cl_runtime:__1cZsc_syslog_msg_log_no_args6FpviipkcpnR__va_list_element__nZsc_syslog_msg_status_enum__+24 ()
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651c30 cl_runtime:__1cCosNsc_syslog_msgDlog6MiipkcE_nZsc_syslog_msg_status_enum__+9d ()
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651e20 cl_haci:__1cOautomaton_implbAstate_machine_qcheck_state6M_nVcmm_automaton_event_t__+3bc ()
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651e60 cl_haci:__1cIcmm_implStransitions_thread6M_v_+de ()
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651e70 cl_haci:__1cIcmm_implYtransitions_thread_start6Fpv_v_+b ()
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651ed0 cl_orb:cllwpwrapper+106 ()
    Jul 22 19:31:13 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651ee0 unix:thread_start+8 ()
    Jul 22 19:31:13 cnode02 unix: [ID 100000 kern.notice]
    Jul 22 19:31:13 cnode02 genunix: [ID 672855 kern.notice] syncing file systems...
    Jul 22 19:31:13 cnode02 genunix: [ID 733762 kern.notice] 1
    Jul 22 19:31:34 cnode02 last message repeated 20 times
    Jul 22 19:31:35 cnode02 genunix: [ID 622722 kern.notice] done (not all i/o completed)
    Jul 22 19:31:36 cnode02 genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c2t0d0s1, offset 3436511232, content: kernel
    Jul 22 19:31:45 cnode02 genunix: [ID 409368 kern.notice] ^M100% done: 136950 pages dumped, compression ratio 4.77,
    Jul 22 19:31:45 cnode02 genunix: [ID 851671 kern.notice] dump succeeded
    Jul 22 19:33:18 cnode02 genunix: [ID 540533 kern.notice] ^M

    Hi,
    the problem lies in the error message around the quorum device. The SC documentation, specifically the Sun Cluster Error Messages Guide at http://docs.sun.com/app/docs/doc/820-4681 explains this as follows:
    414208 QUORUM_GENERIC: quorum preempt error in CMM: Error %d --- QUORUM_GENERIC Tkown ioctl failed on quorum device %s.
    Description:
    This node encountered an error when issuing a QUORUM_GENERIC Take Ownership operation on a quorum device. This error indicates that the node was unsuccessful in preempting keys from the quorum device, and the partition to which it belongs was preempted. If a cluster is divided into two or more disjoint subclusters, one of these must survive as the operational cluster. The surviving cluster forces the other subclusters to abort by gathering enough votes to grant it majority quorum. This action is called "preemption of the losing subclusters".
    Solution:
    Other related messages identify the quorum device where the error occurred. If an EACCES error occurs, the QUORUM_GENERIC command might have failed because of the SCSI3 keys on the quorum device. Scrub the SCSI3 keys off the quorum device and reboot the preempted nodes."
    You should try to follow this advice. I would propose choosing a different QD before trying to do this, if you have one available. Is it possible that this LUN has been in use by a different cluster?
    To scrub SCSI3 keys, use the scsi command in /usr/cluster/lib/sc: ./scsi -c inkeys -d <device> to check for the existence of keys, and ...-c scrub.. to remove any SCSI3 keys.
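    The key check could be sketched as follows. The device path is the quorum device from the logs above; the scrub invocation is commented out because it is destructive, and its exact form is assumed here to mirror the inkeys call (the post only shows "...-c scrub.."):

```shell
#!/bin/sh
# Sketch: inspect (and, commented out, scrub) SCSI-3 keys on a quorum device.
# DEVICE is the quorum device from the logs above; only scrub once you are
# sure no live cluster still uses the LUN, then reboot the preempted nodes.
DEVICE=/dev/did/rdsk/d1s2
SCSI=/usr/cluster/lib/sc/scsi

if [ -x "$SCSI" ]; then
    "$SCSI" -c inkeys -d "$DEVICE"    # list any registered SCSI-3 keys
    # "$SCSI" -c scrub -d "$DEVICE"   # remove the keys (assumed syntax)
else
    echo "not on a Sun Cluster node: $SCSI not found"
fi
```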
    Regards
    Hartmut

  • Can't shutdown or poweroff: machine reboots instead

    No matter which command I use (systemctl poweroff, shutdown -h, halt -p), my machine immediately reboots instead. Here are all the journalctl lines immediately after calling systemctl poweroff: the system says it is going to shut down, but instead it reboots. The last message I see before the reboot occurs is along the lines of "rebooting now".
    Nov 08 13:46:04 rechenschieber systemd-logind[335]: System is powering down.
    Nov 08 13:46:04 rechenschieber org.a11y.atspi.Registry[836]: g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async rea
    Nov 08 13:46:04 rechenschieber polkitd[424]: Unregistered Authentication Agent for unix-process:888:1773 (system bus name :1.13, object path /org/freedesktop/PolicyKit1/Authentic
    Nov 08 13:46:04 rechenschieber org.a11y.Bus[769]: g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-err
    Nov 08 13:46:04 rechenschieber systemd[609]: Stopping Default.
    Nov 08 13:46:04 rechenschieber systemd[609]: Stopped target Default.
    Nov 08 13:46:04 rechenschieber systemd[609]: Stopping Basic System.
    Nov 08 13:46:04 rechenschieber systemd[609]: Stopped target Basic System.
    Nov 08 13:46:04 rechenschieber systemd[609]: Stopping Paths.
    Nov 08 13:46:04 rechenschieber systemd[609]: Stopped target Paths.
    Nov 08 13:46:04 rechenschieber systemd[609]: Stopping Timers.
    Nov 08 13:46:04 rechenschieber systemd[609]: Stopped target Timers.
    Nov 08 13:46:04 rechenschieber systemd[609]: Stopping Sockets.
    Nov 08 13:46:04 rechenschieber systemd[609]: Stopped target Sockets.
    Nov 08 13:46:04 rechenschieber systemd[609]: Starting Shutdown.
    Nov 08 13:46:04 rechenschieber systemd[609]: Reached target Shutdown.
    Nov 08 13:46:04 rechenschieber systemd[609]: Starting Exit the Session...
    Nov 08 13:46:04 rechenschieber rpcbind[399]: rpcbind terminating on signal. Restart with "rpcbind -w"
    Nov 08 13:46:04 rechenschieber NetworkManager[331]: <warn> disconnected by the system bus.
    Nov 08 13:46:04 rechenschieber NetworkManager[331]: g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-e
    Nov 08 13:46:04 rechenschieber systemd[609]: Received SIGRTMIN+24 from PID 897 (kill).
    Nov 08 13:46:04 rechenschieber systemd[611]: pam_unix(systemd-user:session): session closed for user janis
    Nov 08 13:46:04 rechenschieber systemd[1]: rpcbind.service: main process exited, code=exited, status=2/INVALIDARGUMENT
    Nov 08 13:46:04 rechenschieber systemd[1]: Unit rpcbind.service entered failed state.
    Nov 08 13:46:04 rechenschieber systemd[1]: rpcbind.service failed.
    Nov 08 13:46:05 rechenschieber NetworkManager[331]: <info> caught signal 15, shutting down normally.
    Nov 08 13:46:05 rechenschieber NetworkManager[331]: <info> (wlan0): device state change: disconnected -> unmanaged (reason 'removed') [30 10 36]
    Nov 08 13:46:05 rechenschieber NetworkManager[331]: ** (NetworkManager:331): CRITICAL **: dbus_g_proxy_call_no_reply: assertion '!DBUS_G_PROXY_DESTROYED (proxy)' failed
    Nov 08 13:46:05 rechenschieber NetworkManager[331]: <info> (eth0): device state change: unavailable -> unmanaged (reason 'removed') [20 10 36]
    Nov 08 13:46:05 rechenschieber kernel: IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
    Nov 08 13:46:05 rechenschieber wpa_supplicant[536]: Successfully initialized wpa_supplicant
    Nov 08 13:46:05 rechenschieber wpa_supplicant[536]: wlan0: CTRL-EVENT-TERMINATING
    Nov 08 13:46:05 rechenschieber kernel: e1000e: eth0 NIC Link is Down
    Nov 08 13:46:05 rechenschieber kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
    Nov 08 13:46:05 rechenschieber NetworkManager[331]: (NetworkManager:331): GLib-CRITICAL **: Source ID 37 was not found when attempting to remove it
    Nov 08 13:46:05 rechenschieber NetworkManager[331]: <info> exiting (success)
    Nov 08 13:46:05 rechenschieber umount[981]: umount: /var: target is busy
    Nov 08 13:46:05 rechenschieber umount[981]: (In some cases useful info about processes that
    Nov 08 13:46:05 rechenschieber umount[981]: use the device is found by lsof(8) or fuser(1).)
    Nov 08 13:46:05 rechenschieber systemd[1]: var.mount mount process exited, code=exited status=32
    Nov 08 13:46:05 rechenschieber systemd[1]: Failed unmounting /var.
    Nov 08 13:46:05 rechenschieber systemd[1]: Shutting down.
    Nov 08 13:46:05 rechenschieber systemd-journal[141]: Journal stopped
    -- Reboot --
    I tried following up on several suggestions I found here and online: I disabled Wake-on-LAN in the BIOS, and I disabled/stopped tlp.service and tpfand.service (the only power-management-like tools I am running). But that does not help.
    Note: the system is not powering off and then rebooting. It reboots immediately, just as if I had executed `systemctl reboot`.
    FWIW, my system is up to date:
    3.17.2-1-ARCH #1
    systemd 217
    Any fixes? It's getting a bit annoying having to shut down the machine through the syslinux menu after each of these reboots.
    Last edited by Stalafin (2014-11-08 12:59:53)

    In your log we can read this:
    rpcbind.service failed
    umount: /var: target is busy
    Failed unmounting /var
    That's not normal; you could try to solve these errors, which may help solve the shutdown problem.
    Use "systemctl --failed --all" after boot to check whether everything is OK at startup.
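    The suggested checks could be sketched like this (systemd systems only; using `fuser` to find what holds /var open is an addition here, and each command is guarded so the script degrades gracefully elsewhere):

```shell
#!/bin/sh
# Boot-time diagnostics for the failed-unit / busy-/var symptoms above.
VAR_MOUNT=/var

if command -v systemctl >/dev/null 2>&1; then
    systemctl --failed --all || true      # list units that failed at boot
    systemctl status var.mount || true    # why did unmounting /var fail?
fi

# If /var is "target is busy" at shutdown, see what still holds it open:
if command -v fuser >/dev/null 2>&1; then
    fuser -vm "$VAR_MOUNT" || true
fi
```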

  • Unable to register secondary node on Cisco ISE 1.1.4

    Hello,
    I have a problem with registering the secondary node on Cisco ISE 1.1.4.
    I did everything as described in the User Guide:
    - The primary ISE is promoted to PRIMARY.
    - DNS entries are added and resolve for both ISEs.
    - The "Certificate Store" on both ISEs is populated with the self-signed certificates from both ISEs.
    During the registration process (from the primary node), when I add the IP, username and password for the secondary node, an empty popup message is displayed with only an "OK" button.
    So I cannot proceed any further and don't see an error indicating what's wrong.
    In the attachment is a screenshot of the popup message.
    I use IE 8.0.6001.
    The latest patch (1.1.4.218-7-87377) is applied on both ISEs.
    Has anybody had a similar problem?
    Thanks,
    PC

    Hello,
    In the debug log "ise-psc.log" I see:
    2013-11-11 08:43:47,534 ERROR 2013-11-11 08:43:47,534  [http-443-7][] cpm.admin.infra.action.DeploymentEditAction- An exception occurred during the registration of a deployment node: java.lang.NullPointerException
    java.lang.NullPointerException
    at com.cisco.cpm.admin.infra.action.DeploymentEditAction.registerSubmit(DeploymentEditAction.java:455)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.cisco.webui.action.common.PojoActionProxy.performExecution(PojoActionProxy.java:176)
    at com.cisco.webui.action.common.PojoActionProxy.execute(PojoActionProxy.java:89)
    at org.apache.struts.chain.commands.servlet.ExecuteAction.execute(ExecuteAction.java:58)
    at org.apache.struts.chain.commands.AbstractExecuteAction.execute(AbstractExecuteAction.java:67)
    at org.apache.struts.chain.commands.ActionCommandBase.execute(ActionCommandBase.java:51)
    at org.apache.commons.chain.impl.ChainBase.execute(ChainBase.java:191)
    at org.apache.commons.chain.generic.LookupCommand.execute(LookupCommand.java:305)
    at org.apache.commons.chain.impl.ChainBase.execute(ChainBase.java:191)
    at org.apache.struts.chain.ComposableRequestProcessor.process(ComposableRequestProcessor.java:283)
    at org.apache.struts.action.ActionServlet.process(ActionServlet.java:1913)
    at org.apache.struts.action.ActionServlet.doPost(ActionServlet.java:462)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at com.cisco.xmp.wap.dojo.servlet.filter.DojoIframeSendFilter.doFilter(DojoIframeSendFilter.java:58)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at com.cisco.cpm.admin.infra.utils.WebCleanCacheFilter.doFilter(WebCleanCacheFilter.java:35)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at com.cisco.cpm.rbacfilter.AccessCheckFilter.doFilter(AccessCheckFilter.java:71)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at com.cisco.cpm.admin.infra.utils.UserInfoFilter.doFilter(UserInfoFilter.java:110)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at com.cisco.cpm.admin.infra.utils.CsrfPreventionFilter.doFilter(CsrfPreventionFilter.java:113)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at com.cisco.cpm.admin.infra.utils.LoginCheckFilter.doFilter(LoginCheckFilter.java:188)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at com.cisco.cpm.admin.infra.utils.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:121)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:563)
    at org.apache.catalina.valves.RequestFilterValve.process(RequestFilterValve.java:316)
    at org.apache.catalina.valves.LocalAddrValve.invoke(LocalAddrValve.java:43)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.authenticator.SingleSignOn.invoke(SingleSignOn.java:394)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.valves.MethodsValve.invoke(MethodsValve.java:52)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Unknown Source)
    2013-11-11 08:44:00,226 INFO  2013-11-11 08:44:00,226  [http-443-1][] cpm.admin.infra.action.SupportBundleAction- editPreload() triggered. Selected hostname is BB1NACEASTP01
    2013-11-11 08:44:00,226 INFO  2013-11-11 08:44:00,226  [http-443-1][] cpm.admin.infra.action.SupportBundleAction- ParameterNames in load()= BB1NACEASTP01
    2013-11-11 08:44:00,226 INFO  2013-11-11 08:44:00,226  [http-443-1][] cpm.admin.infra.action.SupportBundleAction- editPreload(): userName= adminhostname= BB1NACEASTP01
    2013-11-11 08:44:01,017 INFO  2013-11-11 08:44:01,017  [http-443-1][] cpm.admin.infra.action.SupportBundleAction- ParameterNames in load()= BB1NACEASTP01
    2013-11-11 08:44:01,017 INFO  2013-11-11 08:44:01,017  [http-443-1][] cpm.admin.infra.action.SupportBundleAction- Inside load() API : hostNameBB1NACEASTP01 userName : admin
    2013-11-11 08:44:01,017 INFO  2013-11-11 08:44:01,017  [http-443-1][] cpm.admin.infra.action.SupportBundleAction- Inside fetchFile() API : hostName: BB1NACEASTP01 userName : admin
    2013-11-11 08:44:01,018 INFO  2013-11-11 08:44:01,018  [http-443-3][] cpm.admin.infra.action.SupportBundleAction- ParameterNames in sbfCreationPercentage()= BB1NACEASTP01
    2013-11-11 08:44:01,021 INFO  2013-11-11 08:44:01,021  [http-443-3][] cpm.admin.infra.action.SupportBundleAction- Got hostAlias= BB1NACEASTP01
    2013-11-11 08:44:01,021 INFO  2013-11-11 08:44:01,021  [http-443-3][] cpm.admin.infra.action.SupportBundleAction- Ping node: BB1NACEASTP01 for connectivity
    2013-11-11 08:44:01,181 INFO  2013-11-11 08:44:01,181  [http-443-3][] cpm.admin.infra.action.SupportBundleAction- Received pingNode response : Node is reachable

  • RAC node reboots from time to time

    Hi %,
    we have a problem with our RAC: it's a three-node RAC on SLES 9, 64-bit. One node reboots from time to time. We found nothing in any log file (only in /var/log/messages of node 1:
    "Feb 21 14:58:02 pmg-db1 kernel: o2net: connection to node pmg-db2 (num 1) at 192.168.0.2:7777 has been idle for 10 seconds, shutting it down."
    ). Has anyone had a similar problem? Or does anyone have an idea?
    regards
    Andreas

    Sorry, no /var/log/dmesg.
    Perhaps I should add another detail: the third node was added after the two-node RAC had run for several months. First we had the reboot problem with this third node. We found out that the interconnect was connected to a 100 Mbit module of the switch and not to a 1000 Mbit module. We changed this a few days ago, but now the second node rebooted. And it is connected at 1000 Mbit/s.
    And did I mention that we use 10.2.0.2?
    regards
    Andreas

  • SAP Failover not working on secondary node

    Hi All
    We have configured HA of SAP on DB2: ASCS, CI and SCS are running on the primary node with HACMP clustering installed with a virtual hostname, and just the DB2 client is running on the secondary node.
    Now we are receiving the following error while switching over manually to the secondary node:
    Starting SAP-Collector Daemon
    21:34:58 26.07.2010   LOG: Effective User Id is root
    This is Saposcol Version COLL 20.94 700 - AIX v10.35 5L-64 bit 070123
    Usage:  saposcol -l: Start OS Collector
            saposcol -k: Stop  OS Collector
            saposcol -d: OS Collector Dialog Mode
            saposcol -s: OS Collector Status
    The OS Collector (PID 233504) is already running .....
    saposcol already running
    Running /usr/sap/BP1/SYS/exe/run/startdb
    DB startup failed
    Below is the output of R3trans -d
    This is R3trans version 6.14 (release 700 - 15.06.07 - 15:50:00).
    unicode enabled version
    2EETW169 no connect possible: "environment variable dbms_type is not set."
    R3trans finished (0012).
    And the output of tran.log is as follows:
    4 ETW000 R3trans version 6.14 (release 700 - 15.06.07 - 15:50:00).
    4 ETW000 unicode enabled version
    4 ETW000 ===============================================
    4 ETW000
    4 ETW000 date&time   : 26.07.2010 - 21:35:52
    4 ETW000 control file: <no ctrlfile>
    4 ETW000 R3trans was called as follows: R3trans -d
    4 ETW000  trace at level 2 opened for a given file pointer
    4 ETW000  [dev trc     ,00000]  Mon Jul 26 21:35:52 2010
          51  0.000051
    4 ETW000  [dev trc     ,00000]  db_con_init called
          20  0.000071
    4 ETW000  [dev trc     ,00000]  create_con (con_name=R/3)
          33  0.000104
    4 ETW000  [dbcon.c     ,00000]  *** ERROR => Invalid profile parameter dbms/type
    (or environment variable dbms_type) = <undef>, cannot load DB library
    4 ETW000
          52  0.000156
    2EETW169 no connect possible: "environment variable dbms_type is not set."
    Kindly let us know how to go about resolving this issue.
    Thanks in Advance
    Hemant

    > Now we are receiving following error while switching over manually on the secondary node:
    > 2EETW169 no connect possible: "environment variable dbms_type is not set."
    The user <sid>adm lacks the proper environment. Is this environment variable set on both nodes?
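    A quick check on the standby node could look like this sketch (the `db6` value used in the test comment is typical for DB2 but should be copied from the working node, and the dotfile names mentioned are the usual <sid>adm environment files):

```shell
#!/bin/sh
# Minimal check for the symptom above: R3trans needs dbms_type to pick
# the DB library. Run as the <sid>adm user on the standby node.
check_dbms_type() {
    if [ -z "${dbms_type:-}" ]; then
        echo "dbms_type is NOT set"
    else
        echo "dbms_type=${dbms_type}"
    fi
}
check_dbms_type
# If it is not set, compare the <sid>adm environment files (typically
# .sapenv_<host>.sh and .dbenv_<host>.sh) between the two nodes.
```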
    Markus

  • OS 10.4.7, reboots instead of waking from sleep

    Periodically my Mac reboots instead of waking from sleep (with mouse or keyboard), once every few days. Is this a problem? How can I change that?
    Should I worry about it?

    Try updating the mainboard BIOS to the most recent:
    >>Beta BIOSes<<
    Also, what is the card's s/n, and can you upload a copy of the vbios somewhere? You can use GPUZ to copy the current vbios.

  • Scale out question: how does instance name affect primary/secondary node

    Hi:
    OBIEE 11.1.1.6.4, Windows 2008
    On a test instance the following happened:
    1. Enterprise install of OBIEE 11g had problems
    2. Reinstalled OBIEE 11g, but the installer created an instance2
    3. All was well with this install
    4. Performed all necessary shared catalog and RPD steps
    5. On a second server, installed OBIEE 11g using the scale-out option
    6. On second server, this created an instance1 directory
    7. All components start up and we can log into OBIEE using either server1 or server2
    Question:
    Does OBIEE use the instance names in a scale-out? Our primary node is instance2 and the secondary node is instance1, though on a separate server. Will this cause a problem?
    Thanks for any help.

    Hello,
    This can be run on any replica that participates in the availability group to return the primary instance:
    SELECT primary_replica
    FROM sys.dm_hadr_name_id_map nim
    INNER JOIN sys.dm_hadr_availability_group_states ags
        ON nim.ag_id = ags.group_id
    WHERE nim.ag_name = 'MyAvailabilityGroupNameHere'
    Sean Gallardy | Blog | Microsoft Certified Master

  • Operational Quorum and both nodes rebooting.

    I've experienced an issue where, when I pull out the SCSI cables to the shared storage (and the quorum device), both nodes panic and
    reboot. Is this expected behavior?
    It seems understandable that the active node reboots, because it lost its disk path and the quorum device. But should
    the stand-by node reboot too?

    No problem.
    It's running S10 update 4 w/ SC 3.2.
    3120 JBOD attached to two T2000's, two-node cluster.
    I'm wondering if the stand-by node didn't see the quorum device when the active node's SCSI cables were pulled.
    We pulled the stand-by node's SCSI cables and reconnected them prior to pulling the active node's. The difference was that the stand-by node's /var/adm/messages log filled with the expected messages about a missing disk. The cables were re-attached to the stand-by node and then yanked out of the active node. This is when both nodes panicked.

  • Connect some users on ISE Secondary node

    Is it possible to connect users on the secondary node?
    I tried it. I configured one switch to authenticate against the secondary node. A computer on that switch communicates with the secondary node and gets an IP address from DHCP, but it cannot download the dACL.

    Yes, you can point the users to the secondary server and have them authenticate. Within ISE, the primary and secondary status only applies to the admin and monitoring personas; as long as a node is running the policy service persona, each node is considered its own standalone RADIUS server.
    Please use "debug radius authentication" and also check the replication status to see whether it is in sync and completed.
    Thanks
    Tarik Admani
    *Please rate helpful posts*

  • Node reboot

    Hello
    This was my exam question last week.
    "b" and "e" are definetely correct but not sure about the last one.
    Which three actions would be helpful in determining the cause of a node reboot ?
    a-)determining the time of the node reboot by using the update command and subtracting the uptime from the current system time
    b-)looking for messages such as "ORACLE CSSD failure". Rebooting the cluster integrity in /var/log/messages
    c-)using crsctl command to view tracing information
    d-)inspecting the ocssd log for "Begin Dump" or "End Dump" messages
    e-)inspecting the database alert log for reboot messages

    Hi;
    Correct answer is ABE
    Regard
    Helios
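
    For what it's worth, option a-) can be done directly on a Linux node; a small sketch, assuming GNU date and /proc/uptime are available:

    ```shell
    # Read the system uptime in whole seconds from /proc/uptime, then subtract
    # it from the current epoch time to estimate when the node last rebooted.
    up_secs=$(cut -d. -f1 /proc/uptime)
    date -d "@$(( $(date +%s) - up_secs ))" '+%Y-%m-%d %H:%M:%S'
    ```

    On modern Linux, `uptime -s` and `who -b` print the boot time directly; the arithmetic above just makes the exam option's description explicit.
    
    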

  • ${domain_url} resolves to the secondary node's address

    Hi, we are running BPEL 10.1.3.5 on 2 separate servers (non-clustered) sharing one dehydration store.
    We have BPEL processes that call other BPEL processes, and this has been in production for a few years with no issues.
    Since the upgrade to 10.1.3.5, we are seeing those BPEL processes reference the WSDL on the secondary node, which now creates a dependency between the 2 nodes.
    In the BPEL Console, under the Descriptor tab, we see that wsdlRuntimeLocation now points to the other node, hence creating a dependency.
    Additional Info:
    In the calling BPEL process we have the following in bpel.xml
    <property name="wsdlRuntimeLocation">${domain_url}/MyCalledBpelProcess/MyCalledBpelProcess?wsdl</property>
    From my understanding the ${domain_url} gets substituted with the appropriate local host information.
    None of this has changed over the years but since the upgrade the ${domain_url} now seems to get replaced by the other node's host information rather than its own node host information.
    Hope someone can help with this.
    Thanks

    I have some additional information that may help.
    The wsdlRuntimeLocation gets changed when the Descriptor value is updated.
    Scenario:
    Process A calls process B
    Processes A and B are deployed to 2 nodes (non clustered)
    Process A has a Descriptor value that can be changed from the BPEL Console. When the value of the Descriptor is changed on one node, both nodes' wsdlRuntimeLocation point to the same WSDL rather than their own node's WSDL. Example: when the Descriptor is updated on node 1, the wsdlRuntimeLocation on node 2 gets changed to point to node 1's WSDL of process B. If node 1 goes down, this fails since it can't find the WSDL anymore. If both nodes are up, there is no problem.
    Note that this has been in production for years and only became a problem when we recently upgraded to 10.1.3.5.
    Please let me know if you need additional information.
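
    One possible workaround while this is investigated (a sketch, not a confirmed fix: the host name below is hypothetical, and hardcoding gives up the portability that ${domain_url} was meant to provide) is to pin wsdlRuntimeLocation to the node's own address in each node's bpel.xml instead of relying on the macro:

    ```xml
    <!-- bpel.xml on node 1 only; node1.example.com:7777 is a placeholder for
         that node's own host and port, so no cross-node dependency remains -->
    <property name="wsdlRuntimeLocation">http://node1.example.com:7777/orabpel/default/MyCalledBpelProcess/MyCalledBpelProcess?wsdl</property>
    ```

    Each node would need its own copy with its own address, which is exactly the duplication the macro avoided, so treat this as a stopgap.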

  • Both cluster node reboot

    There is a two-node cluster running an Oracle RAC DB. Yesterday both nodes rebooted at the same time (less than a few seconds apart). We don't know whether it was caused by Oracle CRS or the server itself.
    Here is the log:
    /var/log/messages in node 1
    Dec 8 15:14:38 dc01locs01 kernel: 493 RAIDarray.mppdcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18469446 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
    Dec 8 15:14:38 dc01locs01 kernel: 493 RAIDarray.mppdcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18469448 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
    Dec 8 15:17:20 dc01locs01 syslogd 1.4.1: restart.
    Dec 8 15:17:20 dc01locs01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
    Dec 8 15:17:20 dc01locs01 kernel: Linux version 2.6.18-128.7.1.0.1.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Mon Aug 24 14:07:09 EDT 2009
    Dec 8 15:17:20 dc01locs01 kernel: Command line: ro root=/dev/vg00/root rhgb quiet crashkernel=128M@16M
    Dec 8 15:17:20 dc01locs01 kernel: BIOS-provided physical RAM map:
    ocssd.log in node 1
    CSSD2009-12-08 15:14:33.467 1134680384 >TRACE: clssgmDispatchCMXMSG: msg type(13) src(2) dest(1) size(123) tag(00000000) incarnation(148585637)
    CSSD2009-12-08 15:14:33.468 1134680384 >TRACE: clssgmHandleDataInvalid: grock HB+ASM, member 2 node 2, birth 1
    CSSD2009-12-08 15:19:00.217 >USER: Copyright 2009, Oracle version 11.1.0.7.0
    CSSD2009-12-08 15:19:00.217 >USER: CSS daemon log for node dc01locs01, number 1, in cluster ocsprodrac
    clsdmtListening to (ADDRESS=(PROTOCOL=ipc)(KEY=dc01locs01DBG_CSSD))
    CSSD2009-12-08 15:19:00.235 1995774848 >TRACE: clssscmain: Cluster GUID is 79db6803afc7df32ffd952110f22702c
    CSSD2009-12-08 15:19:00.239 1995774848 >TRACE: clssscmain: local-only set to false
    /var/log/messages in node 2
    Dec 8 15:14:38 dc01locs02 kernel: 493 RAIDarray.mppdcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18561465 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
    Dec 8 15:14:38 dc01locs02 kernel: 493 RAIDarray.mppdcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18561463 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
    Dec 8 15:17:14 dc01locs02 syslogd 1.4.1: restart.
    Dec 8 15:17:14 dc01locs02 kernel: klogd 1.4.1, log source = /proc/kmsg started.
    Dec 8 15:17:14 dc01locs02 kernel: Linux version 2.6.18-128.7.1.0.1.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Mon Aug 24 14:07:09 EDT 2009
    Dec 8 15:17:14 dc01locs02 kernel: Command line: ro root=/dev/vg00/root rhgb quiet crashkernel=128M@16M
    Dec 8 15:17:14 dc01locs02 kernel: BIOS-provided physical RAM map:
    ocssd.log in node 2
    CSSD2009-12-08 15:14:35.450 1264081216 >TRACE: clssgmExecuteClientRequest: Received data update request from client (0x2aaaac065a00), type 1
    CSSD2009-12-08 15:14:36.909 1127713088 >TRACE: clssgmDispatchCMXMSG: msg type(13) src(1) dest(1) size(123) tag(00000000) incarnation(148585637)
    CSSD2009-12-08 15:14:36.909 1127713088 >TRACE: clssgmHandleDataInvalid: grock HB+ASM, member 1 node 1, birth 0
    CSSD2009-12-08 15:18:55.047 >USER: Copyright 2009, Oracle version 11.1.0.7.0
    clsdmtListening to (ADDRESS=(PROTOCOL=ipc)(KEY=dc01locs02DBG_CSSD))
    CSSD2009-12-08 15:18:55.047 >USER: CSS daemon log for node dc01locs02, number 2, in cluster ocsprodrac
    CSSD2009-12-08 15:18:55.071 3628915584 >TRACE: clssscmain: Cluster GUID is 79db6803afc7df32ffd952110f22702c
    CSSD2009-12-08 15:18:55.077 3628915584 >TRACE: clssscmain: local-only set to false

    Hi!
    I suppose this seems easy: you have a device at 'RAIDarray.mppdcsgswsst6140:1:0:2' (a RAID array path, perhaps?) which failed. Logically, all servers connected to this RAID went down at the same time.
    It seems to be no Oracle problem. Good luck!
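
    As a first triage step, correlating the storage-path retries with the syslog restart marker on both nodes can confirm that sequence. A small sketch, with two representative lines from the excerpts above embedded so the commands are self-contained (on a real node you would grep /var/log/messages directly):

    ```shell
    # Write sample lines from node 1's /var/log/messages to a scratch file,
    # then count the storage-retry errors plus the reboot (syslogd restart)
    # marker; seeing both within minutes of each other on both nodes points
    # at the shared storage rather than at Oracle CRS.
    cat > /tmp/messages.sample <<'EOF'
    Dec 8 15:14:38 dc01locs01 kernel: 493 RAIDarray.mppdcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18469446 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
    Dec 8 15:17:20 dc01locs01 syslogd 1.4.1: restart.
    EOF
    grep -cE 'Cmnd failed-retry|syslogd .*: restart' /tmp/messages.sample   # prints 2
    ```
    
    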
