Service relocating and failover

Hi all,
I would like to use different service names on a 2-node RAC to manually control which node clients connect to.
I set up:
- a "global service", which is running on both instances
- a "node1" service, which has instance 1 as preferred and instance 2 as available
- a "node2" service, which has instance 2 as preferred and instance 1 as available
So I can choose where to connect.
However, if I stop one instance or reboot one node, clients connected to the "global" service are failed over to the surviving node. The node-specific service is also switched to the surviving available instance, but clients connected to that switched service are not failed over.
How can I correct this situation? I would like to have all clients failing over to the surviving node.
Moreover, when a node restarts, the service is not failed back to its preferred node.
How can I force this?
A full list of commands used to manage services follows.
thanks for every answer!
andrea
-- create and start additional services
[oracle@giallo ~]$ srvctl add service -d asr -s asrgiallo -r "asr1" -a "asr2" -P BASIC
[oracle@giallo ~]$ srvctl add service -d asr -s asrrosso -r "asr2" -a "asr1" -P BASIC
[oracle@giallo admin]$ srvctl start service -d asr -s asrgiallo
[oracle@giallo admin]$ srvctl start service -d asr -s asrrosso
-- clients will connect using these tnsnames entries
ASR_giallo =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = giallo-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rosso-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = asrgiallo.noemalife.loc)
    )
  )
ASR_rosso =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = giallo-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rosso-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = asrrosso.noemalife.loc)
    )
  )
-- reboot first node
[root@giallo ~]# reboot
-- now clients connected to ASR_giallo are disconnected:
-- every query results in end-of-file on communication channel
-- check service status
[oracle@giallo ~]$ srvctl status service -d asr -s asrgiallo
Service asrgiallo is running on instance(s) asr2
[oracle@giallo ~]$
[oracle@giallo ~]$ srvctl status service -d asr -s asrrosso
Service asrrosso is running on instance(s) asr2
[oracle@giallo ~]$
-- restart the service to have it back on its preferred node
[oracle@giallo ~]$ srvctl stop service -d asr -s asrgiallo
[oracle@giallo ~]$ srvctl status service -d asr -s asrgiallo
Service asrgiallo is not running.
[oracle@giallo ~]$
[oracle@giallo ~]$ srvctl status service -d asr -s asrrosso
Service asrrosso is running on instance(s) asr2
[oracle@giallo ~]$
[oracle@giallo ~]$ srvctl status database -d asr
Instance asr1 is running on node giallo
Instance asr2 is running on node rosso
[oracle@giallo ~]$ srvctl start service -d asr -s asrgiallo
[oracle@giallo ~]$ srvctl status service -d asr -s asrgiallo
Service asrgiallo is running on instance(s) asr1
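The placement behaviour seen in the transcript above (the service fails over to the available instance, stays there when the preferred instance returns, and moves back only after a stop/start) can be sketched as a toy model. This is an illustration in Python, not Oracle code; the `Service` class and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Service:
    """Toy model of a service with one preferred and one available instance."""
    name: str
    preferred: str
    available: str
    running_on: Optional[str] = None

    def start(self, up: Set[str]) -> None:
        # On start, try the preferred instance first, then the available one.
        for inst in (self.preferred, self.available):
            if inst in up:
                self.running_on = inst
                return
        self.running_on = None

    def stop(self) -> None:
        self.running_on = None

    def instance_down(self, inst: str, up: Set[str]) -> None:
        # Failover: the service moves only if the instance hosting it went away.
        if self.running_on == inst:
            self.start(up)

    def instance_up(self, inst: str) -> None:
        # No automatic failback (as observed above): nothing changes when
        # the preferred instance comes back.
        pass


svc = Service("asrgiallo", preferred="asr1", available="asr2")
svc.start({"asr1", "asr2"})
svc.instance_down("asr1", up={"asr2"})   # reboot of node giallo
after_failover = svc.running_on
svc.instance_up("asr1")                  # node giallo is back
after_restart_of_node = svc.running_on
svc.stop()
svc.start({"asr1", "asr2"})              # manual stop/start
after_manual_restart = svc.running_on
```

In this model, as in the transcript, only the explicit stop/start returns the service to its preferred instance.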

I was too optimistic in my previous post.
If I reboot one node (using the "reboot" command from the command line), everything works as expected: services and sessions are failed over to the surviving node, and I'm really happy about it.
But if I switch off one node (typing "init 0" on the command line), things do not work.
If I do
[root@rosso ~]# init 0
the service is not failed over to node giallo:
[oracle@giallo ~]$ dmesg
ocfs2_dlm: Node 1 leaves domain 850CA09F1D7541CB9D996C55DDEB1488
ocfs2_dlm: Nodes in domain ("850CA09F1D7541CB9D996C55DDEB1488"): 0
ocfs2_dlm: Node 1 leaves domain 90A92BEA986244D78826CEFB57584910
ocfs2_dlm: Nodes in domain ("90A92BEA986244D78826CEFB57584910"): 0
o2net: no longer connected to node rosso (num 1) at 192.168.223.241:7777
tg3: eth1: Link is down.
[oracle@giallo ~]$ srvctl status database -d asr
-- hangs for some time, finally it ends up with:
PRKO-2015 : Error in checking condition of instance on node: giallo
PRKO-2015 : Error in checking condition of instance on node: rosso
-- if I try it again, it succeeds but it gives the wrong result!!!
[oracle@giallo ~]$ srvctl status database -d asr
Instance asr1 is running on node giallo
Instance asr2 is running on node rosso
So the service asrrosso is "correctly" left on the instance asr2, even if the cluster node hosting this instance is off!
If I look at ${ORA_CRS_HOME}/log/`hostname`/crsd/crsd.log I find:
2008-02-11 17:42:24.054: [ OCRMSG][2724740000]prom_rpc: CLSC send failure..ret code 6
2008-02-11 17:42:24.054: [ OCRMSG][2724740000]prom_rpc: possible OCR retry scenario
[ default][2724740000]caching_retry: Reconfig number mre than max nodes [510]
2008-02-11 17:42:24.055: [ CRSOCR][2724740000]0OCR api procr_set_value failed for key ora!asr!asrgiallo!cs.REASON. OCR error code = 23 OCR error msg: PROC-23: Error in cluster services layer Messaging error [6]
2008-02-11 17:42:24.055: [ CRSOCR][2724740000][PANIC]0Failed to set key: ora!asr!asrgiallo!cs.REASON value: system(File: caaocr.cpp, line: 229)
2008-02-11 17:42:26.594: [ default][3086931648][ENTER]0
Oracle Database 10g CRS Release 10.2.0.3.0 Production Copyright 1996, 2004, Oracle. All rights reserved
2008-02-11 17:42:26.594: [ default][3086931648]0CRS Daemon Starting
2008-02-11 17:42:26.595: [ CRSMAIN][3086931648]0Checking the OCR device
2008-02-11 17:42:26.600: [ CRSMAIN][3086931648]0Connecting to the CSS Daemon
2008-02-11 17:42:27.056: [    CRSD][3086931648]0Daemon Version: 10.2.0.3.0 Active Version: 10.2.0.3.0
2008-02-11 17:42:27.057: [    CRSD][3086931648]0Active Version and Software Version are same
2008-02-11 17:42:27.057: [ CRSMAIN][3086931648]0Initializing OCR
2008-02-11 17:42:27.069: [ OCRRAW][3086931648]proprioo: for disk 0 (/fw/oradata/ASR/OCRFile), id match (1), my id set (1215517508,1251051748) total id sets (1), 1st set (1215517508,1251051748), 2nd set (0,0) my votes (1), total votes (2)
2008-02-11 17:42:27.070: [ OCRRAW][3086931648]proprioo: for disk 1 (/fw/oradata/ASR/OCRFile_mirror), id match (1), my id set (1215517508,1251051748) total id sets (1), 1st set (1215517508,1251051748), 2nd set (0,0) my votes (1), total votes (2)
2008-02-11 17:42:27.076: [ OCRSRV][3086931648]proath_init: pudata retval [12]. Member [2] might have gone down
2008-02-11 17:42:27.076: [ OCRSRV][3086931648]proath_init: Failed to retrieve pubdata. Expect a rcfg
2008-02-11 17:47:43.556: [ OCRMAS][3041979296]th_master:12: I AM THE NEW OCR MASTER at incar 1. Node Number 1
2008-02-11 17:47:43.649: [ OCRRAW][3041979296]proprioo: for disk 0 (/fw/oradata/ASR/OCRFile), id match (1), my id set (1215517508,1251051748) total id sets (1), 1st set (1215517508,1251051748), 2nd set (0,0) my votes (1), total votes (2)
2008-02-11 17:47:43.650: [ OCRRAW][3041979296]proprioo: for disk 1 (/fw/oradata/ASR/OCRFile_mirror), id match (1), my id set (1215517508,1251051748) total id sets (1), 1st set (1215517508,1251051748), 2nd set (0,0) my votes (1), total votes (2)
2008-02-11 17:47:43.873: [ OCRMAS][3041979296]th_master: Deleted ver keys from cache (master)
2008-02-11 17:47:43.966: [    CRSD][3086931648]0ENV Logging level for Module: allcomp 0
2008-02-11 17:47:44.191: [    CRSD][3086931648]0ENV Logging level for Module: default 0
2008-02-11 17:47:44.194: [    CRSD][3086931648]0ENV Logging level for Module: COMMCRS 0
2008-02-11 17:47:44.299: [    CRSD][3086931648]0ENV Logging level for Module: COMMNS 0
2008-02-11 17:47:44.300: [    CRSD][3086931648]0ENV Logging level for Module: CRSUI 0
2008-02-11 17:47:44.302: [    CRSD][3086931648]0ENV Logging level for Module: CRSCOMM 0
2008-02-11 17:47:44.305: [    CRSD][3086931648]0ENV Logging level for Module: CRSRTI 0
2008-02-11 17:47:44.307: [    CRSD][3086931648]0ENV Logging level for Module: CRSMAIN 0
2008-02-11 17:47:44.308: [    CRSD][3086931648]0ENV Logging level for Module: CRSPLACE 0
2008-02-11 17:47:44.310: [    CRSD][3086931648]0ENV Logging level for Module: CRSAPP 0
2008-02-11 17:47:44.311: [    CRSD][3086931648]0ENV Logging level for Module: CRSRES 0
2008-02-11 17:47:44.315: [    CRSD][3086931648]0ENV Logging level for Module: CRSOCR 0
2008-02-11 17:47:44.316: [    CRSD][3086931648]0ENV Logging level for Module: CRSTIMER 0
2008-02-11 17:47:44.318: [    CRSD][3086931648]0ENV Logging level for Module: CRSEVT 0
2008-02-11 17:47:44.319: [    CRSD][3086931648]0ENV Logging level for Module: CRSD 0
2008-02-11 17:47:44.323: [    CRSD][3086931648]0ENV Logging level for Module: CLUCLS 0
2008-02-11 17:47:44.390: [    CRSD][3086931648]0ENV Logging level for Module: OCRRAW 0
2008-02-11 17:47:44.402: [    CRSD][3086931648]0ENV Logging level for Module: OCROSD 0
2008-02-11 17:47:44.404: [    CRSD][3086931648]0ENV Logging level for Module: CSSCLNT 0
2008-02-11 17:47:44.405: [    CRSD][3086931648]0ENV Logging level for Module: OCRAPI 0
2008-02-11 17:47:44.407: [    CRSD][3086931648]0ENV Logging level for Module: OCRUTL 0
2008-02-11 17:47:44.408: [    CRSD][3086931648]0ENV Logging level for Module: OCRMSG 0
2008-02-11 17:47:44.409: [    CRSD][3086931648]0ENV Logging level for Module: OCRCLI 0
2008-02-11 17:47:44.411: [    CRSD][3086931648]0ENV Logging level for Module: OCRCAC 0
2008-02-11 17:47:44.412: [    CRSD][3086931648]0ENV Logging level for Module: OCRSRV 0
2008-02-11 17:47:44.414: [    CRSD][3086931648]0ENV Logging level for Module: OCRMAS 0
2008-02-11 17:47:44.414: [ CRSMAIN][3086931648]0Filename is /opt/oracle/crs/oracle/product/10.2.0/crs/crs/init/giallo.pid
[ clsdmt][2840591264]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=gialloDBG_CRSD))
2008-02-11 17:47:44.752: [ CRSMAIN][3086931648]0Using Authorizer location: /opt/oracle/crs/oracle/product/10.2.0/crs/crs/auth/
2008-02-11 17:47:44.988: [ CRSMAIN][3086931648]0Initializing RTI
2008-02-11 17:47:45.208: [CRSTIMER][2819345312]0Timer Thread Starting.
2008-02-11 17:47:45.229: [ CRSRES][3086931648]0Parameter SECURITY = 1, running in USER Mode
2008-02-11 17:47:45.229: [ CRSMAIN][3086931648]0Initializing EVMMgr
2008-02-11 17:47:47.342: [ CRSMAIN][3086931648]0CRSD locked during state recovery, please wait.
2008-02-11 17:47:48.228: [ CRSRES][3086931648]0ora.giallo.vip check shows ONLINE
2008-02-11 17:47:59.191: [ CRSRES][3086931648]0ora.giallo.gsd check shows ONLINE
2008-02-11 17:48:12.702: [ CRSRES][3086931648]0ora.giallo.ons check shows ONLINE
2008-02-11 17:48:17.166: [ CRSRES][3086931648]0ora.giallo.LISTENER_GIALLO.lsnr check shows ONLINE
2008-02-11 17:48:17.861: [ CRSRES][3086931648]0ora.giallo.ASM1.asm check shows ONLINE
2008-02-11 17:48:22.322: [ CRSRES][3086931648]0ora.asr.db check shows ONLINE
2008-02-11 17:48:25.425: [ CRSRES][3086931648]0ora.asr.asr1.inst check shows ONLINE
2008-02-11 17:48:28.913: [ CRSRES][3086931648]0ora.asr.asrtest.asr1.srv check shows ONLINE
2008-02-11 17:48:30.669: [ CRSEVT][3086931648]0CAAMonitorHandler :: 0:Action Script for resource 'ora.asr.asrgiallo.cs' stdout redirection failed for `/opt/oracle/crs/oracle/product/10.2.0/crs/crs/log/startutKC2Q.stdout` : No such file or directory
2008-02-11 17:48:30.916: [ CRSRES][3086931648]0ora.asr.asrgiallo.cs check shows ONLINE
2008-02-11 17:48:34.592: [ CRSRES][3086931648]0ora.asr.asrgiallo.asr1.srv check shows ONLINE
2008-02-11 17:48:35.300: [ CRSMAIN][3086931648]0CRSD recovered, unlocked.
2008-02-11 17:48:35.305: [ CRSMAIN][3086931648]0QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))
2008-02-11 17:48:35.363: [ CRSMAIN][3086931648]0CRSD UI socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))
2008-02-11 17:48:35.367: [ CRSMAIN][3086931648]0E2E socket on: (ADDRESS=(PROTOCOL=tcp)(HOST=giallo-priv)(PORT=49896))
2008-02-11 17:48:35.367: [ CRSMAIN][3086931648]0Starting Threads
2008-02-11 17:48:35.367: [ CRSMAIN][3086931648]0CRS Daemon Started.
2008-02-11 17:48:35.367: [ CRSMAIN][2754481056]0Starting runCommandServer for (UI = 1, E2E = 0). 0
2008-02-11 17:48:35.367: [ CRSMAIN][2743991200]0Starting runCommandServer for (UI = 1, E2E = 0). 1
What's wrong?

Similar Messages

Distributed services with load balancing and failover?

Hullo;
What platform would you use to implement something like the following:
* easy registration of various services
* delegation of a service request to the best candidate of many, based on some measure (probably reported by the services themselves)
* quick failover and location of an alternate service in case the best candidate does not respond (real-life environment, uncertain networks and servers;)
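A minimal sketch of the last two requirements above (pick the best candidate by a reported measure, fail over quickly to the next one), written in Python for brevity. The names `call_best` and `ServiceUnavailable` are hypothetical, and "lower reported load is better" is an assumption about the measure.

```python
from typing import Callable, Dict


class ServiceUnavailable(Exception):
    """Raised when no candidate answered."""


def call_best(candidates: Dict[str, float], invoke: Callable[[str], str]) -> str:
    """Try candidates from lowest reported load to highest; return the first success."""
    for name in sorted(candidates, key=candidates.get):
        try:
            return invoke(name)
        except ConnectionError:
            continue  # candidate did not respond: fall through to the next best
    raise ServiceUnavailable("no candidate responded")


def fake_invoke(name: str) -> str:
    # Stand-in for a remote call; candidate "a" is unreachable in this example.
    if name == "a":
        raise ConnectionError("timeout")
    return name + " ok"


result = call_best({"a": 0.1, "b": 0.5, "c": 0.9}, fake_invoke)
```

Here `a` reports the lowest load but does not answer, so the request is delegated to `b`.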
RMI could be a starting point, with a custom SocketFactory to take care of the timeouts and redelegations and a good delegator service to work through. The service concept sounds a lot like JINI, but I don't see any provision for best candidate selection, and wonder whether JINI would really save any time compared to RMI in this case.
Is there anything else I should be aware of? I wouldn't mind finding a pre-built wheel. (Cougaar (http://www.cougaar.org/) is on my reading list; a quick glance gives me the impression it may be a bit too heavy on the communication level, but maybe I'm wrong.)
Thanks for your thoughts;
//ata

ata,
Before jumping to anything so bloated and limited as cougaar, take time to consider what you really need. Before grabbing at the fanciest Java features like RMI, JINI, and custom SocketFactories, focus on what you are trying to accomplish.
There are plenty of great answers right here, at this forum.
Good hunting,
John

RE: Hard Failures, KeepAlive, and Failover --Follow-up

Hi,
It's a really challenging question. However, what do you want to do after
the network crash? Fail over or just stop the service? Should we assume
that when the network is down, your name service is down too?
One idea is to use externalconnection to "listen" for your external non-Forte
alarm, and do "whatever" once you receive the alarm instead of letting the
"logical connection" time out or hang.
Regards,
Peter Sham.
-----Original Message-----
From: Michael Lee [SMTP:[email protected]]
Sent: Wednesday, June 16, 1999 12:44 AM
To: [email protected]
Subject: Hard Failures, KeepAlive, and Failover -- Follow-up
I've gotten a handful of responses to my original post, and the suggested
solutions are all variations on the same theme -- periodically ping remote
nodes/partitions and then react when the node/partition goes down. In
other circumstance this would work, but unless I'm missing something this
solution doesn't solve the problem I'm running into.
Some background...
When a connection is set up between partitions on two different nodes,
Forte is effectively establishing two connections: a "physical connection"
over TCP/IP between two ports and a "logical connection" between the two
partitions (running on top of the physical connection). Once a connection
is established between two partitions Forte assumes the logical connection
is valid until one of two things happen:
1) The logical connection is broken (by shutting down a partition from
Econsole/Escript, by killing a node manager, by terminating the ftexec,
etc.)
2) Forte detects that the physical connection is broken (via its KeepAlive
functionality).
If a physical connection is broken (via a cut cable or power-off
condition), and Forte has not yet detected the situation (via a KeepAlive
failure), the logical connection is still valid and Forte will still allow
method calls on the remote partition. In effect, Forte thinks the remote
partition is still up and running. In this situation, any method calls
made after the physical connection has been broken will simply hang. No
exceptions are generated and failover does not occur.
However, once a KeepAlive failure is detected all is made right.
Unfortunately, the lowest-bound latency of KeepAlive is greater than one
second, and we need to detect and react to hard failures in the 250-500ms
range. Using technology outside of Forte we are able to detect the hard
failures within the required times, but we haven't been able to get Forte
to react to this "outside" knowledge. Here's why:
Since Forte has not yet detected a KeepAlive failure, the logical
connection to the remote partition is still "valid". Although there are a
number of mechanisms that would allow a logical connection to be broken,
they all assume a valid physical connection -- which, of course, we don't
have!
It appears I'm in a "Catch-22" situation: In order to break a logical
connection between partitions, I need a valid physical connection. But the
reason I'm trying to break the logical connection in the first place is
that I know (but Forte doesn't yet know) that the physical connection has
been broken.
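For readers unfamiliar with the "outside" detection mentioned above, here is a minimal heartbeat-timeout sketch in Python. It only illustrates noticing a silent node within a sub-second window; it does not solve the Catch-22 of telling Forte about it. The class name and the 250 ms default are illustrative assumptions.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Flags nodes whose last heartbeat is older than timeout_s seconds."""

    def __init__(self, timeout_s: float = 0.25,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str) -> None:
        # Record the arrival time of a heartbeat from this node.
        self.last_seen[node] = self.clock()

    def dead_nodes(self) -> List[str]:
        # A node is considered dead once it has been silent longer than the timeout.
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)


fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=0.25, clock=lambda: fake_now[0])
monitor.beat("nodeA")
monitor.beat("nodeB")
fake_now[0] = 0.2
monitor.beat("nodeA")      # nodeA keeps sending heartbeats, nodeB goes silent
fake_now[0] = 0.3
dead = monitor.dead_nodes()
```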
If anyone knows a way around this Catch-22, please let me know.
Mike
To unsubscribe, email '[email protected]' with
'unsubscribe forte-users' as the body of the message.
Searchable thread archive <URL:http://pinehurst.sageit.com/listarchive/>-

Make sure you chose the right format, and as far as partitioning is concerned, you have to select at least one partition, which will be the entire drive.

Load balancing and Failover

Hello,
We are wondering how load-balancing and failover of tpcall() work with
WTC:
The scenario:
We have one WLS Domain and two Tuxedo Domains. The Tuxedo Domains offer
the same set of services.
In the bdmconfig.xml, we specify connection_policy as 'ON_STARTUP' for
both Remote Tuxedo Domains. We also Import (T_DM_IMPORT) the same
Tuxedo Service from both Tuxedo Domains.
Questions:
1. Is there any load-balancing of the tpcall between the two Domains? If
so, is it round-robin? If round-robin, what determines the order?
2. If it is ONLY Failover, what determines the order of the tpcall? And,
is the Failover automatic? Or do we need to code for retry on failure?
3. ON_DEMAND vs ON_STARTUP: Does ON_DEMAND drop the connection to the
remote domain upon tpterm? And does ON_STARTUP use a pool of
TuxedoConnection objects?
4. Are there any configuration parameters for
'max_number-of_connections? What determines how many simultaneous
connections can be made?
Thanks,
Suresh Mohan.

Hi Suresh,
The following are my answers to your questions.
Suresh Mohan wrote:
> Hello,
> We are wondering how load-balancing and failover of tpcall() work with
> WTC:
> The scenario:
> We have one WLS Domain and two Tuxedo Domains. The Tuxedo Domains offer
> the same set of services.
> In the bdmconfig.xml, we specify connection_policy as 'ON_STARTUP' for
> both Remote Tuxedo Domains. We also Import (T_DM_IMPORT) the same
> Tuxedo Service from both Tuxedo Domains.
> Questions:
> 1. Is there any load-balancing of the tpcall between the two Domains? If
> so, is it round-robin? If round-robin, what determines the order?
Yes, there is load balancing between the two remote Tuxedo TDomain gateways.
The algorithm is random, not round-robin. Over time this should give equal
opportunities to both remote TDomains.
> 2. If it is ONLY Failover, what determines the order of the tpcall? And,
> is the Failover automatic? Or do we need to code for retry on failure?
The load balancing is always there. The failover is automatic. When a
connection to a remote TDomain encounters a problem (e.g. network), the remote
domain is put on retry open connection (with ON_STARTUP) and the load
balancing will not select it until the connection is re-established.
However, the tpcall() that encountered the error will not be resent
to a different destination. It is up to the application to decide whether it
wants to resend. Any requests made after the error will not select the
failed remote TDomain.
> 3. ON_DEMAND vs ON_STARTUP: Does ON_DEMAND drop the connection to the
> remote domain upon tpterm? And does ON_STARTUP use a pool of
> TuxedoConnection objects?
tpterm() only terminates your application session to WTC. WTC still maintains
a secured T-session to the remote Tuxedo TDomain. WTC does not use a pool of
TuxedoConnection objects; the object stored in JNDI refers to WTC.
> 4. Are there any configuration parameters for
> 'max_number-of_connections? What determines how many simultaneous
> connections can be made?
No. As described in #3, there is no need to use a connection pool in WTC. WTC
uses the same session and virtual circuit design concept as Tuxedo TDOMAIN; the
logical pool is created/destroyed dynamically. That is the reason why you
can have a lot of tpacall() outstanding at the same time. (The limitation is
the availability of system resources.)
> Thanks,
> Suresh Mohan.
Regards,
Hong-Hsi :-)
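The behaviour described in answers 1 and 2 (random selection, a failed domain excluded until its connection is re-established, and no automatic resend of the failed call) can be sketched roughly as follows. This is a Python illustration, not WTC code; `RandomBalancer` and `call_with_retry` are hypothetical names, and the retry wrapper stands for the application-level decision to resend.

```python
import random
from typing import Callable, List, Optional, Set


class RandomBalancer:
    """Picks a live remote domain at random; failed domains are skipped."""

    def __init__(self, domains: List[str], rng: Optional[random.Random] = None) -> None:
        self.domains = list(domains)
        self.down: Set[str] = set()
        self.rng = rng or random.Random()

    def pick(self) -> str:
        live = [d for d in self.domains if d not in self.down]
        if not live:
            raise RuntimeError("no remote domain available")
        return self.rng.choice(live)

    def call(self, send: Callable[[str], str]) -> str:
        domain = self.pick()
        try:
            return send(domain)
        except ConnectionError:
            self.down.add(domain)  # excluded until reconnected() is called
            raise                  # the failed call itself is not resent here

    def reconnected(self, domain: str) -> None:
        self.down.discard(domain)


def call_with_retry(balancer: RandomBalancer, send: Callable[[str], str],
                    attempts: int = 2) -> str:
    """Application-level resend: the caller decides whether to try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[ConnectionError] = None
    for _ in range(attempts):
        try:
            return balancer.call(send)
        except ConnectionError as exc:
            last_error = exc
    assert last_error is not None
    raise last_error


def send(domain: str) -> str:
    # Stand-in for a remote service call; "tux1" is unreachable in this example.
    if domain == "tux1":
        raise ConnectionError("network problem")
    return "ok from " + domain


balancer = RandomBalancer(["tux1", "tux2"])
reply = call_with_retry(balancer, send, attempts=2)
```

Whichever domain is picked first, the reply comes from `tux2`: either directly, or after `tux1` fails once and is excluded.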

Network Load Balancing and failover for AFP Sharing

Dear all,
Somebody kindly taught me to use round-robin DNS for network load balancing. That works, but it does not provide failover.
I have 4 Xserves and want load balancing and failover at the same time.
I have read the IP failover document and set it up successfully, but does anyone know whether IP failover is possible for more than 2 servers?
For example, with 4 servers serving AFP at the same time, could I have 1 extra server doing IP failover for those 4 servers?
As far as I know, IP failover requires FireWire for heartbeat detection, but one Xserve has only 2 FireWire ports. Can I set up IP failover using only an Ethernet port and an IP address? Is it possible to detect a server going down and then fail over to any other server?
I believe a load balancer may be the best solution, but its cost is too high.
Thanks in advance!
Karllee

Well, you have 2 options here:
software load balancing
a request comes to foo.com -> the ws7u2 instance hosting foo.com is configured to run as a reverse proxy. This server sends each incoming request to one of the four back-end Web Server 7 instances handling your requests
hardware load balancing (this one you need to invest in)
a request comes to the hardware load balancer, which responds for foo.com -> it sends requests to the four ws7 servers hosting your application
You could try out how software load balancing works for you before you invest in hardware load balancing.
Here are more instructions on configuring ws7 + reverse proxy (software load balancing configuration):
- install ws7 on foo.com
- create a new configuration (choose port 80, disable java

Unplumb net0 in public network, the HA NFS service didn't failover

I have a two-node cluster (version 4.1). I set up the NFS service on the cluster, as below.
root@sgh28h13:~# scstat -g
-- Resource Groups and Resources --
            Group Name          Resources
 Resources: resource-group-1    sgh28cluster global_Sym_R5_1G_d110-rs nfs-global-Sym_R5_1G-d110-admin-rs
-- Resource Groups --
            Group Name          Node Name   State     Suspended
     Group: resource-group-1    sgh28h13    Online    No
     Group: resource-group-1    sgh28h17    Offline   No
-- Resources --
            Resource Name                         Node Name   State     Status Message
  Resource: sgh28cluster                          sgh28h13    Online    Online - LogicalHostname online.
  Resource: sgh28cluster                          sgh28h17    Offline   Offline
  Resource: global_Sym_R5_1G_d110-rs              sgh28h13    Online    Online
  Resource: global_Sym_R5_1G_d110-rs              sgh28h17    Offline   Offline
  Resource: nfs-global-Sym_R5_1G-d110-admin-rs    sgh28h13    Online    Online - Service is online.
  Resource: nfs-global-Sym_R5_1G-d110-admin-rs    sgh28h17    Offline   Offline
The NFS service is originally on sgh28h13. sgh28h13 has one interface, net0, in the public network for this service. So I used "ifconfig unplumb net0" to shut down net0 so that the NFS service would fail over to the other node. But the service is still online on sgh28h13 after I shut down net0 and sc_ipmp0. I don't know why the NFS service didn't fail over. Could somebody help?
root@sgh28h13:~# ipadm
NAME               CLASS/TYPE  STATE  UNDER     ADDR
clprivnet0         ip          ok     --        --
lo0                loopback    ok     --        --
   lo0/v4          static      ok     --        127.0.0.1/8
   lo0/v6          static      ok     --        ::1/128
net0               ip          ok     sc_ipmp0  --
net1               ip          ok     --        --
net2               ip          ok     --        --
sc_ipmp0           ipmp        ok     --        --
   sc_ipmp0/static1 static     ok     --        10.103.117.103/22
Post edited by user9111646

Hi.
IPMP is not triggered by a configuration change. Unplumbing the interface changes the configuration, so the IPMP configuration changes with it instead of a failure being detected.
So the cluster is not triggered by this event.
If you want to test failover on loss of network connection, take the link down from the switch side or just disconnect the cable.
Regards.

CSS 11503 adv-bal-stcky-srcip and failover question

From the documentation I have read about failover (when a service fails), several load-balancing types are listed, but advanced-balance sticky-srcip is not one of them. Is it possible to configure failover linear or failover next when using adv-bal sticky-srcip? The CSS is configured with a kal (type=tcp / port=8080), but it does not do anything when the service does not respond to the kal.

Let me add some clarity then. :) I am running PeopleSoft through the CSS (which is the reason for adv-bal sticky-srcip). I have two services in my content rule:
service pplsft-web1
ip 172.27.144.63
port 8080
kal tcp / 8080
service pplsft-web2
ip 172.27.144.63
port 8080
kal tcp / 8080
When the KAL fails and the service is marked down, what is the CSS supposed to do? You can configure the failover of a service (i.e. failover linear or failover next). In all the docs I have read, 'failover linear' and 'failover next' were for the regular load-balancing techniques (i.e. domain, url, srcip, destip, domainhash and urlhash). Can I use them if I have the advanced-balance sticky-srcip load-balancing command on the content rule?
Does that clarify any?
I did get an answer from TAC: by default, 'failover linear' is enabled. But what may be happening is that, because of the sticky config, the user IP is still in the sticky table, which overrides load balancing. I have to define the setting for "sticky server down failover" to either 'balance' or 'reject' entries in the sticky table and any requests that come in.
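To illustrate why a sticky table can override load balancing, here is a small Python model. It is not CSS configuration or CSS behaviour; the policy names 'balance' and 'reject' are taken from the post above, the round-robin picker is a stand-in for whatever balancing method is configured, and `StickyBalancer` is a hypothetical name.

```python
from typing import Dict, List, Optional, Set


class StickyBalancer:
    """Source-IP stickiness with a policy for entries that point at a dead service."""

    def __init__(self, services: List[str], on_server_down: str = "balance") -> None:
        if on_server_down not in ("balance", "reject"):
            raise ValueError("on_server_down must be 'balance' or 'reject'")
        self.services = list(services)
        self.alive: Set[str] = set(services)
        self.on_server_down = on_server_down
        self.sticky: Dict[str, str] = {}
        self._next = 0

    def _balance(self) -> Optional[str]:
        # Round-robin over live services (stand-in for the real balancing method).
        for _ in range(len(self.services)):
            svc = self.services[self._next % len(self.services)]
            self._next += 1
            if svc in self.alive:
                return svc
        return None

    def route(self, src_ip: str) -> Optional[str]:
        svc = self.sticky.get(src_ip)
        if svc is not None and svc in self.alive:
            return svc                      # the sticky entry wins over balancing
        if svc is not None and self.on_server_down == "reject":
            return None                     # stuck to a dead service: refuse
        svc = self._balance()               # new client, or 'balance' policy
        if svc is not None:
            self.sticky[src_ip] = svc
        return svc


lb = StickyBalancer(["pplsft-web1", "pplsft-web2"], on_server_down="balance")
first = lb.route("10.0.0.1")
again = lb.route("10.0.0.1")
lb.alive.discard("pplsft-web1")            # keepalive failed, service marked down
rebalanced = lb.route("10.0.0.1")
```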

DHCP and failover issues

Hello,
I am trying to implement failover in our dhcp but I don't get it right....
I have 2 SLES/OES servers both can be used as DHCP server without failover (one active, the other not).
server 1 => SLES11SP1 with OES11 (dhcp version is 3.1.3 ESV)
server 2 => SLES11SP2 with OES11SP1 (dhcp version is 4.2.4-P2)
I have defined my failover DHCP services following TID 7004294, so I have defined two services (ip_serv1 and ip_serv2). Each failover service contains:
a) failover object          FO2SERV2      FO2SERV1
   Primary server           10.7.0.248    10.7.0.248
   Primary port             647           847
   Secondary server         10.7.0.250    10.7.0.250
   Secondary port           847           647
   Failover split           128           128
   Max. Client Lead Time    3600          3600
b) subnet                   10.11.0.0     10.11.0.0
   (these subnets have the following pool)
c) pool                     pool_10_11    pool_10_11
   failover attached        FO2SERV2      FO2SERV1
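As background on the "Failover split 128" value: as I understand the ISC dhcpd documentation, the server hashes each client identifier to a value from 0 to 255, and the primary answers clients whose value falls below the split, so 128 divides clients roughly in half. The Python sketch below only illustrates that idea; the byte-sum hash is a stand-in assumption and is not the hash dhcpd actually uses, and `owner` is a hypothetical helper.

```python
def owner(mac: str, split: int = 128) -> str:
    """Return which peer would answer a client under a load-balancing split."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    # ASSUMPTION: simple byte sum as a placeholder for the real 0..255 hash.
    bucket = sum(octets) % 256
    return "primary" if bucket < split else "secondary"


# MAC address taken from the DHCPDISCOVER line in this post.
answering_peer = owner("00:23:24:07:84:53", split=128)
```

With a split of 0 every client would go to the secondary, and with the maximum split every client would go to the primary.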
I think that these parameters follow TID 7004294 precisely, but when I start the dhcpd service on the primary server I get the following messages:
I move from recover to startup
I move from startup to recover
DHCPDISCOVER from 00:23:24:07:84:53 (REFERENCE) via XXX.XXX.XXX.XXX: not responding (recovering)
and indeed I do not get any lease.
What am I doing wrong????
Any suggestions?
Thanks in advance

Originally Posted by ricard1
Finally I got it!!.
First I upgraded one of my servers to the same release level as the second.
Then I defined the Failover Objects with the same name as you suggested and using the same port (in my case 847).
That did the trick.
Thanks!
PS. It is a shame that some TIDs are as wrong as this one (TID 7004294).
Please provide TID feedback on the bottom of the page: https://www.novell.com/support/kb/doc.php?id=7004294
Thomas

I cannot receive email properly now. When I open mail, it says that is downloading about 1,700 emails. At the very end, it gives me my newest ones. But this takes a long time. I've contacted the Internet service provider and verified all the right setting

I cannot receive email properly on either my iPad or my iPhone. I have had them for over a year and they have always worked fine, until three days ago, when they both started acting up. On the iPad, when I open Mail, it says it is downloading about 1,700 emails. At the very end, which takes quite a while to get to, I finally get the most recent ones. The iPad is sending emails just fine.
On my iPhone, when I open Mail, it says it is downloading 100 emails, but it doesn't do that, and it gives me no new emails at all. The iPhone is sending email just fine.
I have already deleted the email accounts on both devices and reinstalled them. I've contacted the Internet service provider and verified all the right settings. The Outlook email on my desktop is working perfectly.

WMV is a heavily-compressed format/CODEC, and the processing time will depend on several factors:
Your CPU, which is not that powerful in your case
Your I/O sub-system, which is likely a single HDD on your laptop
The source footage. What is your source footage?
Any Effects added to that footage. Do you have any Effects?
Each of those will have an impact on the time required.
The trial has only one main limitation - the watermark. Now, there are some components, that have to be activated, but are not with the trial, but they would be evident with Import of your source footage, if it's an issue.
Good luck,
Hunt

Entering values in Web Services ID and Description, for External Catalog

Hi,
I am trying to connect to an external Vendor Catalog from ERP. Please note that we just have ERP ECC 6.0 and no SRM. I went to SPRO and followed the menu path 'Materials Management >> Purchasing >> Web Services: ID and Description'. I am not sure what to enter in the columns 1) Seq. Number, 2) Name of Parameter for Web Service and 3) Call Structure.
I have URL from Vendor, User Name and Password. I need to know what values are mandatory here so I can successfully connect to external Catalog. If someone can give me sample values, I can try.
Any help is appreciated,
Niranjan

Update on 08/23/2011.
We are able to connect to the Vendor Catalog and select items in the Cart. When I press the 'Place Order' button, I see all items populated in the SAP PO ME21N screen. We can then add other information and create the PO. This is working great. We have another question. We heard that with ERP ECC 6.0 we can connect only 1 vendor catalog, but with SRM we can connect more than 1. Is there a BAPI, or a way to change SAP code, to connect more than 1 vendor catalog from ECC 6.0?

How to synchronize PI service registry and IBM WSRR

Hello All,
In our current project we have developed one web service which resides in SAP CE and is registered in SAP PI Service Registry.
But our client has IBM WebSphere, which acts as middleware for all their services (interfaces/web services).
Now I need to understand how to synchronize the SAP PI Service Registry with IBM WSRR (WebSphere Service Registry and Repository).
What steps/configurations need to be performed at the PI end and at the IBM WSRR end?
Appreciate your help in this matter.
Thanks,
Shriram.

Hi,
Refer to the links below; I think they will be helpful.
Configuring a central Services Registry:
http://help.sap.com/saphelp_nwce711/helpdata/en/47/d391d7b8fc3c83e10000000a42189c/frameset.htm
You can also use the Wizard based configuration: http://help.sap.com/saphelp_nwce711/helpdata/en/f7/6182bd68434595ba5105a0a346efcc/frameset.htm
https://www.sdn.sap.com/irj/scn/index?rid=/library/uuid/00985388-6748-2c10-0d83-f17c3e768a8b&overridelayout=true
Regards,
Sudha S.

VirtualDisk on Windows Server 2012 R2 Storage Pool stuck in "Warning: In Service" state and all file transfers to and from is awfully slow

Greetings,
I'm having some trouble with my Windows Storage Pool and my VirtualDisk running on a Windows Server 2012 R2 installation. It consists of 8x Western Digital RE-4 2TB drives + 2x Western Digital Black Edition 2TB drives and have been configured in a single-disk
parity setup and the virtual disk is running fixed provisioning (max size) and is formatted with ReFS.
It's been running solid for months besides some awful write-speeds at times, it seems like the write performance running ReFS compared to NTFS is not that good.
I was recommended to add SSDs for journaling in order to boost write performance. Sadly I seem to have screwed up this part: you need to do this through PowerShell, and it needs to be done before creating the virtual disk. I managed to add my SSD to the Storage Pool and then remove it.
This seems to have caused some awkward issues. I'm not quite sure why, as the virtual disk is "fixed", so adding the SSD to the Storage Pool shouldn't really do anything, right? But since I did this, my virtual disk has been stuck in "Warning: In Service". It's been 4-5 days and it's still the same, and the performance is currently horrible. Moving 40GB of data off the virtual disk took me about 20 hours or so. Launching files under 1MB from the virtual disk takes several minutes, etc. It's pretty much useless.
The GUI is not providing any useful information about what's going on. What does "Warning: In Service" actually imply? How am I supposed to know how long this is supposed to take? Running Get-VirtualDisk in PowerShell does not provide any useful information either. I tried a repair through the Server Manager GUI, but it goes to about 21% within 2-3 hours and then drops back down to 10%. I have had the repair running for days, but it won't go past 21% without dropping back down again.
Running the repair through PowerShell yields the same results, but if I detach the virtual disk and then try to repair it through PowerShell (the GUI won't let me repair detached virtual disks), it just runs for a split second and then closes.
After some Googling I've seen people mention that the repair cannot finish unless the Storage Pool has at least as much free space as the capacity of the largest drive in the pool. Because I am running fixed provisioning I had used all the space in the pool, so I added a 4TB drive, but the repair is still not able to go past 21%.
As I am running "fixed provisioning", I guess adding an extra drive to the pool doesn't make much difference, as it's not available to the virtual disk? So I went ahead and deleted 3 TB of data on the virtual disk, and now I've got about 4 TB of free space on it. There should be plenty of room for Windows Server 2012 R2 to rebuild the parity or whatever it's trying to do, but it's still the same: the repair won't move past 21%, the virtual disk is still stuck in "Warning: In Service" mode, and the performance keeps being horrible, so taking a backup will take forever at these speeds...
Am I missing something here? All the drives in the pool are working fine; I have verified this using various bootable tools. So why is this happening, and what can I do to get the virtual disk back to a fully healthy state? Why doesn't the GUI give you any kind of usable information?
Best regards, Thomas Andre

Hi,
Please try running the chkdsk /f /r command on the virtual disk. In the meantime, run the following commands in PowerShell and share the output.
get-virtualdisk -friendlyname <name> | get-physicaldisk | fl
get-virtualdisk -friendlyname <name> |fl
Best Regards,
Mandy

Living in Japan and I'm an American who just signed up with Soft Bank the phone service here and spent a TON of money on an iPhone. I can't figure out how to connect my bank account at home to my app account so I can Skype my family. Please help!!!!

Living in Japan and I'm an American who just signed up with Soft Bank the phone service here and spent a TON of money on an iPhone. I can't figure out how to connect my bank account at home to my app account so I can Skype my family. Please help!!!! I don't have a credit card, and they don't give debit cards to foreigners here (or at least it's really hard), so I'm using my bank at home, where I still have a debit card. My phone number is 8 digits plus the country code, which is strange! Another thing is I was given a phone email that I was told to use for texting, but I'm not sure how that works!! It's so frustrating too because no one speaks English here and I'm not very tech savvy. God bless you if you can help :)

Whichever App Store you are connecting to, you need a credit card with an address in that country. Also, iTunes gift cards must be in the local currency too.
If you are in Japan, you need to use the Japan App Store.

WSUS service failure and uninstall error 0x80070643

Hello
I recently had a drive fail in a RAID 1 array on a Windows Server 2008 Standard SP2 domain controller. The drive was replaced and the array was successfully rebuilt. Our domain comprises a Win2k8 DC (which WSUS is installed on), a W2k3 DC, a W2k8 Storage Server and various W2k, Win XP, Vista and W7 clients.
WSUS3 SP2 is installed on this computer and had been working fine. After the drive was rebuilt WSUS stopped working. WSUS is organised as follows:
The drive is divided into 3 partitions. C: contains the program files, D: contains the WSUS database and update files. E: is a system recovery partition.
After the drive was rebuilt I had a problem connecting to the WSUS console. I am logged on using the domain administrator account. I restarted the server last night in the hope that it would solve this issue. After restarting, the problem persists. When
I start Windows Server Update Services from Administrative Tools the centre pane shows a large red X and 'Error: Connection Error'. The option to 'Reset Server Node' results in the same error. The error, available from 'Copy Error to Clipboard' is:
The WSUS administration console was unable to connect to the WSUS Server via the remote API.
Verify that the Update Services service, IIS and SQL are running on the server. If the problem persists, try restarting IIS, SQL, and the Update Services Service.
The WSUS administration console has encountered an unexpected error. This may be a transient error; try restarting the administration console. If this error persists,
Try removing the persisted preferences for the console by deleting the wsus file under %appdata%\Microsoft\MMC\.
System.IO.IOException -- The handshake failed due to an unexpected packet format.
Source
System
Stack Trace:
   at System.Net.Security.SslState.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.CheckCompletionBeforeNextReceive(ProtocolToken message, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ForceAuthentication(Boolean receiveFirst, Byte[] buffer, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ProcessAuthentication(LazyAsyncResult lazyResult)
   at System.Net.TlsStream.CallProcessAuthentication(Object state)
   at System.Threading.ExecutionContext.runTryCode(Object userData)
   at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Net.TlsStream.ProcessAuthentication(LazyAsyncResult result)
   at System.Net.TlsStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at System.Net.PooledStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at System.Net.ConnectStream.WriteHeaders(Boolean async)
** this exception was nested inside of the following exception **
System.Net.WebException -- The underlying connection was closed: An unexpected error occurred on a send.
Source
Microsoft.UpdateServices.Administration
Stack Trace:
   at Microsoft.UpdateServices.Administration.AdminProxy.CreateUpdateServer(Object[] args)
   at Microsoft.UpdateServices.Administration.AdminProxy.GetUpdateServer(String serverName, Boolean useSecureConnection, Int32 portNumber)
   at Microsoft.UpdateServices.UI.AdminApiAccess.AdminApiTools.GetUpdateServer(String serverName, Boolean useSecureConnection, Int32 portNumber)
   at Microsoft.UpdateServices.UI.SnapIn.Scope.ServerSummaryScopeNode.GetUpdateServer(PersistedServerSettings settings)
   at Microsoft.UpdateServices.UI.SnapIn.Scope.ServerSummaryScopeNode.ConnectToServer()
   at Microsoft.UpdateServices.UI.SnapIn.Scope.ServerSummaryScopeNode.ConnectToServerAndPopulateNode(Boolean connectingServerToConsole)
   at Microsoft.UpdateServices.UI.SnapIn.Scope.ServerSummaryScopeNode.OnExpandFromLoad(SyncStatus status)
The event logs show that the Update Services service started 7 minutes after the server was restarted. This is immediately followed by an event stating that the Windows Update service had started. It fails soon after and is restarted automatically twice to try to recover from the failure. 12 hours after the restart the Update Services service is still not running. It is set to Automatic (Delayed Start) and to log on as 'Network Service'. The service can be started manually, but 'Reset Server Node' again results in the same error. After 4 minutes the service stops. Event 7034 is logged:
Log Name:      System
Source:        Service Control Manager
Date:          31/08/2011 09:05:50
Event ID:      7034
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      Phobos.htlincs.local
Description:
The Update Services service terminated unexpectedly. It has done this 4 time(s).
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="Service Control Manager" Guid="{555908D1-A6D7-4695-8E1E-26931D2012F4}" EventSourceName="Service Control Manager" />
    <EventID Qualifiers="49152">7034</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2011-08-31T08:05:50.000Z" />
    <EventRecordID>230471</EventRecordID>
    <Correlation />
    <Execution ProcessID="0" ThreadID="0" />
    <Channel>System</Channel>
    <Computer>Phobos.htlincs.local</Computer>
    <Security />
</System>
<EventData>
    <Data Name="param1">Update Services</Data>
    <Data Name="param2">4</Data>
</EventData>
</Event>
IIS and SQL are running fine as far as I am aware. I also have Sophos Enterprise Console installed, which uses SQL to store its data, and that is working fine.
After a bit of searching around the 'net I thought the easiest solution would be to uninstall WSUS, leave the database and update files in place and then re-install. I ran the uninstallation procedure from the Server Manager, but because the service is not
running, and even after starting the service, uninstallation fails:
Windows Server Update Services: Removal failed
   Error: Attempt to un-install Windows Server Update Services failed with error code 0x80070643. Fatal error during installation
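For readers unfamiliar with codes like 0x80070643, here is a minimal Python sketch of how a 32-bit HRESULT splits into its standard fields. The function name decode_hresult is illustrative only and is not part of WSUS or any Microsoft tool; the bit layout (severity bit, 11-bit facility, 16-bit code) is the documented HRESULT layout.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into (failed, facility, code)."""
    failed = bool((value >> 31) & 0x1)  # severity bit: 1 means failure
    facility = (value >> 16) & 0x7FF    # 11-bit facility field
    code = value & 0xFFFF               # low 16 bits: facility-specific code
    return failed, facility, code


FACILITY_WIN32 = 7  # facility used for wrapped Win32 error codes

failed, facility, code = decode_hresult(0x80070643)
print(failed, facility == FACILITY_WIN32, code)  # True True 1603
```

Facility 7 means the low 16 bits are an ordinary Win32 error code, here 1603, the generic Windows Installer "Fatal error during installation" failure. The code by itself therefore does not say what went wrong; the installer or setup log is usually where the real cause shows up.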
I saw a reference to the online article located here:
http://blogs.technet.com/b/sus/archive/2008/11/05/how-to-manually-remove-all-of-wsus.aspx but the article states it is out of date and I did not want to make a bad situation worse by following the instructions.
Can anyone help me with this, please? Ideally, I would like to be able to keep the updates, as downloading them again will consume a substantial amount of our monthly allowance. However, if the best thing is to remove the entire installation, then I will happily go with that, too.
Thanks.
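When comparing several crash events like the Event 7034 record quoted in the post above, it can help to pull out just the key fields. The following is a rough Python sketch, assuming the event XML has been copied out of Event Viewer as text; summarize_event and the abridged EVENT_XML constant are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Event 7034 XML quoted in the post above.
EVENT_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="Service Control Manager" />
    <EventID Qualifiers="49152">7034</EventID>
    <TimeCreated SystemTime="2011-08-31T08:05:50.000Z" />
    <Computer>Phobos.htlincs.local</Computer>
</System>
<EventData>
    <Data Name="param1">Update Services</Data>
    <Data Name="param2">4</Data>
</EventData>
</Event>"""

# Every element lives in the event schema namespace, so lookups need a prefix map.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def summarize_event(xml_text):
    """Return the event ID, timestamp, service name and crash count."""
    root = ET.fromstring(xml_text)
    data = {d.get("Name"): d.text for d in root.findall("e:EventData/e:Data", NS)}
    return {
        "event_id": int(root.find("e:System/e:EventID", NS).text),
        "time": root.find("e:System/e:TimeCreated", NS).get("SystemTime"),
        "service": data["param1"],           # name of the service that terminated
        "crash_count": int(data["param2"]),  # how many times it has terminated
    }


print(summarize_event(EVENT_XML))
```

For this record the summary shows event 7034 for the "Update Services" service with a count of 4, matching the description text in the log entry.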

I'm experiencing this issue, but haven't been able to resolve it with the steps in Lawrence's post (and, another thanks from me for another detailed post from Lawrence!)
It's a 2008 box which had a relatively new WSUS install that just stopped working - not sure why. Unable to figure out why it had stopped, I tried to Remove the WSUS role, but that failed with the error above.
During the manual uninstall steps above...
(1) Using the Windows Installer Cleanup Utility only showed the Windows Internal Database, not WSUS. I removed that.
(2) Removed %ProgramFiles%\Update Services - other resources not present
Reinstall has now failed with this error:
Windows Server Update Services 3.0 SP2 could not install Windows Internal Database. For more information, see the Setup log "C:\Users\NAME~1.ADM\AppData\Local\Temp\WSUSSetup.log".
2012-10-23 20:57:53 Success MWUSSetup Detected that setup was launched through Server Manager
2012-10-23 20:57:54 Success MWUSSetup Validating pre-requisites...
2012-10-23 20:57:54 Error MWUSSetup Failed to determine if an higher version of WSUS is installed. Assuming it is not... (Error 0x80070002: The system cannot find the file specified.)
2012-10-23 20:57:54 Error MWUSSetup WSUS is outdated. But this will not block setup (Error 0x00000000: The operation completed successfully.)
2012-10-23 20:57:57 Success MWUSSetup Incompatible version of ReportViewer installed. Required ReportViewer version: 9.
2012-10-23 20:57:57 Success MWUSSetup Incompatible version of ReportViewer installed. Required ReportViewer version: 9.
2012-10-23 20:58:25 Success MWUSSetup Initializing installation details
2012-10-23 20:58:25 Success MWUSSetup Skipping Asp.Net install since not running on win2k3...
2012-10-23 20:58:25 Success MWUSSetup Installing wYukon using ocsetup
2012-10-23 20:58:25 Success MWUSSetup Installing Windows Internal database using ocsetup with command line as "ocsetup "WSSEE" /quiet /norestart"
2012-10-23 20:58:49 Error MWUSSetup The process ocsetup "WSSEE" /quiet /norestart returned error: 0x643 (Error 0x80070643: Fatal error during installation.)
2012-10-23 20:58:49 Error MWUSSetup ExecCmd failed (Error 0x80070643: Fatal error during installation.)
2012-10-23 20:58:49 Error MWUSSetup Install Windows Internal database: Failed to execute "ocsetup "WSSEE" /quiet /norestart" (Error 0x80070643: Fatal error during installation.)
2012-10-23 20:58:49 Error MWUSSetup CInstallDriver::PerformSetup: Installation of wYukon failed (Error 0x80070643: Fatal error during installation.)
2012-10-23 20:58:49 Error MWUSSetup CSetupDriver::LaunchSetup: Setup failed (Error 0x80070643: Fatal error during installation.)
Would anyone on the thread have suggestions for what I might be able to do to wipe the slate clean with the Internal Database on this box?
The WID is not used for anything else, but there is other production software installed (using SQL - not relevant, just mentioning), and reinstalling the OS is not an option.
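Not every line in the setup log above that is marked "Error" is a real failure: one of them carries 0x00000000, "The operation completed successfully". A small Python sketch for filtering a log like this down to the entries with a non-zero error code follows. It assumes the whitespace-separated "date time level source message" layout seen in the excerpt; real_failures and SAMPLE_LOG are hypothetical names, and the sample contains four lines copied from the post.

```python
import re

# date time, level, source, message (fields separated by whitespace)
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(\S+)\s+(.*)$")
# trailing "(Error 0xXXXXXXXX: text)" part of a message
HRESULT_RE = re.compile(r"\(Error (0x[0-9A-Fa-f]{8}):\s*([^)]*)\)")

SAMPLE_LOG = """\
2012-10-23 20:57:53 Success MWUSSetup Detected that setup was launched through Server Manager
2012-10-23 20:57:54 Error MWUSSetup Failed to determine if an higher version of WSUS is installed. Assuming it is not... (Error 0x80070002: The system cannot find the file specified.)
2012-10-23 20:57:54 Error MWUSSetup WSUS is outdated. But this will not block setup (Error 0x00000000: The operation completed successfully.)
2012-10-23 20:58:49 Error MWUSSetup The process ocsetup "WSSEE" /quiet /norestart returned error: 0x643 (Error 0x80070643: Fatal error during installation.)
"""


def real_failures(log_text):
    """Return (timestamp, error code, text) for Error lines with a non-zero code."""
    failures = []
    for line in log_text.splitlines():
        parsed = LINE_RE.match(line)
        if not parsed or parsed.group(2) != "Error":
            continue
        found = HRESULT_RE.search(parsed.group(4))
        if found and int(found.group(1), 16) != 0:
            failures.append((parsed.group(1), found.group(1), found.group(2)))
    return failures


for timestamp, hresult, text in real_failures(SAMPLE_LOG):
    print(timestamp, hresult, text)
```

On the sample this leaves the 0x80070002 entry from the version check and the 0x80070643 entry from the ocsetup "WSSEE" call. The log itself says setup carried on after the first ("Assuming it is not..."), so the Windows Internal Database install is the step to investigate.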

I am trying to open a service request and the site does not work.

I am trying to open a service request and the site does not work with ANY browser. I get the page https://getsupport.apple.com/GetParts.action which says "Send in for service. We just need a little more information." The continue button does not work. I've seen this problem before trying to file support requests. It's as if Apple simply does not want you to create requests.

I just went through several screens and had no problems. Try clearing your browser's cache and/or history.
