Failing over after WRITE_ERROR_TO_SERVER exception in sendRequest()

Hi
I am getting the error below in my iisproxy.log file. I wanted to see the source of the URL.cpp file to find out why it is failing, but I am not able to open the DLLs with a decompiler either.
Could anyone tell me where I can get the source code for iisproxy.dll and iisforward.dll?
This request fails only when it is routed through IIS.
================New Request: [/GLMS/index.jsp.wlforward] =================
Mon Nov 24 14:19:48 2014 <503614168189882> SSL must be used
Mon Nov 24 14:19:48 2014 <503614168189882> Initializing SSL
Mon Nov 24 14:19:48 2014 <503614168189881> INFO: Initializing SSL library
Mon Nov 24 14:19:48 2014 <503614168189881> timer thread starting
Mon Nov 24 14:19:48 2014 <503614168189881> Loaded 1 trusted CA's
Mon Nov 24 14:19:48 2014 <503614168189881> sysMkdirs() on 'C:\windows\TEMP\_wl_proxy':
Mon Nov 24 14:19:48 2014 <503614168189881> getWLFilePath: Complete File name = [C:\windows\TEMP\_wl_proxy\orbrandom.txt]
Mon Nov 24 14:19:48 2014 <503614168189881> INFO: Successfully initialized SSL
Mon Nov 24 14:19:48 2014 <503614168189882> SSL configured successfully
Mon Nov 24 14:19:48 2014 <503614168189882> resolveRequest: wlforward: /TEST/index.jsp
Mon Nov 24 14:19:48 2014 <503614168189882> URI is /GLMS/index.jsp, len=15
Mon Nov 24 14:19:48 2014 <503614168189882> Request URI = [/TEST/index.jsp]
Mon Nov 24 14:19:48 2014 <503614168189882> attempt #0 out of a max of 50
Mon Nov 24 14:19:48 2014 <503614168189882> Trying a pooled connection for 'XX.XX.XX.XX/7002/7002'
Mon Nov 24 14:19:48 2014 <503614168189882> getPooledConn: No more connections in the pool for Host[XX.XX.XX.XX] Port[7002] SecurePort[7002]
Mon Nov 24 14:19:48 2014 <503614168189882> general list: trying connect to '192.168.17.180'/7002/7002 at line 1306 for '/GLMS/index.jsp'
Mon Nov 24 14:19:48 2014 <503614168189882> New SSL URL: match = 0 oid = 22
Mon Nov 24 14:19:48 2014 <503614168189882> Connect returns -1, and error no set to 10035, msg 'Unknown error'
Mon Nov 24 14:19:48 2014 <503614168189882> EINPROGRESS in connect() - selecting
Mon Nov 24 14:19:48 2014 <503614168189882> Setting peerID for new SSL connection
Mon Nov 24 14:19:48 2014 <503614168189882> c0a8 11b4 5a1b 0000                          ....Z...
Mon Nov 24 14:19:48 2014 <503614168189882> Local Port of the socket is 57397
Mon Nov 24 14:19:48 2014 <503614168189882> Remote Host xx.xx.xx.xx Remote Port 7002
Mon Nov 24 14:19:48 2014 <503614168189882> general list: created a new connection to 'XX.XX.XX.XX'/7002 for '/GLMS/index.jsp', Local port: 57397
Mon Nov 24 14:19:48 2014 <503614168189882> WLS info in sendRequest:  XX.XX.XX.XX:7002 recycled? 0
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs from client:[Accept]=[application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs from client:[Accept-Encoding]=[gzip, deflate]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs from client:[Accept-Language]=[en-IN]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs from client:[Cookie]=[ADMINCONSOLESESSION=9fTkJypQ229r1ZHx6cQZG8cwHb0T0ssW8TkM7zyzzCVvNzjzDsf2!1779325670; JSESSIONID=GcZVJyXT8WMyv9pT8xGNzndSPCbBCcy1tfm5yRG1DSv8PhT97gv9!1779325670; _WL_AUTHCOOKIE_ADMINCONSOLESESSION=WcL9RbOJFiDqn3LiZO0g]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs from client:[Host]=[localhost]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs from client:[User-Agent]=[Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)]
Mon Nov 24 14:19:48 2014 <503614168189882> URL::sendHeaders(): meth='GET' file='/GLMS/index.jsp' protocol='HTTP/1.1'
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[Accept]=[application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[Accept-Encoding]=[gzip, deflate]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[Accept-Language]=[en-IN]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[Cookie]=[ADMINCONSOLESESSION=9fTkJypQ229r1ZHx6cQZG8cwHb0T0ssW8TkM7zyzzCVvNzjzDsf2!1779325670; JSESSIONID=GcZVJyXT8WMyv9pT8xGNzndSPCbBCcy1tfm5yRG1DSv8PhT97gv9!1779325670; _WL_AUTHCOOKIE_ADMINCONSOLESESSION=WcL9RbOJFiDqn3LiZO0g]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[Host]=[localhost]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[User-Agent]=[Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[Connection]=[Keep-Alive]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[WL-Proxy-Client-IP]=[::1]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[Proxy-Client-IP]=[::1]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[X-Forwarded-For]=[::1]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[WL-Proxy-Client-Keysize]=[128]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[X-WebLogic-KeepAliveSecs]=[30]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[X-WebLogic-Force-JVMID]=[unset]
Mon Nov 24 14:19:48 2014 <503614168189882> Hdrs to WLS:[WL-Proxy-SSL]=[true]
Mon Nov 24 14:19:48 2014 <503614168189881> WARN: GetSessionCallback: No session match found
Mon Nov 24 14:19:48 2014 <503614168189881> WARN: DeleteSessionCallback: No match found!!
Mon Nov 24 14:19:48 2014 <503614168189882> ERROR: SSLWrite failed
Mon Nov 24 14:19:48 2014 <503614168189882> SEND failed (ret=-1) at 805 of file ..\nsapi\.\URL.cpp
Mon Nov 24 14:19:48 2014 <503614168189882> *******Exception type [WRITE_ERROR_TO_SERVER] raised at line 806 of ..\nsapi\.\URL.cpp
Mon Nov 24 14:19:48 2014 <503614168189882> Marking xx.xx.xx.xx:7002 as bad
Mon Nov 24 14:19:48 2014 <503614168189882> Exception occurred for backend host 'XX.XX.XX.XX/7002/0' while sending request : 'WRITE_ERROR_TO_SERVER [os error=0,  line 806 of ..\nsapi\.\URL.cpp]: '
Mon Nov 24 14:19:48 2014 <503614168189882> got exception in sendRequest phase: WRITE_ERROR_TO_SERVER [os error=0,  line 806 of ..\nsapi\.\URL.cpp]:  at line 1019; last_error 0
Mon Nov 24 14:19:48 2014 <503614168189882> INFO: Closing SSL context
Mon Nov 24 14:19:48 2014 <503614168189882> Failing over after WRITE_ERROR_TO_SERVER exception in sendRequest()

Yes, that is right.
Essentially you should be doing one of the following on the WebLogic side:
1) Install certs on WebLogic that were obtained from a commercial CA (VeriSign, Thawte, etc.).
In this case, you will receive the root CA cert along with the other bundled certs and the private key.
These root CA certs are publicly available (your browser will already be using them).
2) Use certs signed by your own company (companies can maintain their own CA).
In this case you should have a root CA cert from your company.
3) Use the demo certs that shipped with WebLogic.
In this case, the root CA cert can be obtained from DemoTrust.jks.
This is documented at http://e-docs.bea.com/wls/docs90/plugins/isapi.html#114851 (it should be the same for any of the plug-ins).
The Apache plug-in can understand the .crt extension.
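For illustration, here is a minimal sketch of the trust-related plug-in parameters and of exporting the demo root CA with keytool. The host, port, file paths and the keystore alias are assumptions for illustration (list the keystore first to find the right alias for your release); the passphrase shown is the well-known WebLogic demo-trust default.
    # iisproxy.ini (sketch -- host, port and paths are hypothetical)
    WebLogicHost=xx.xx.xx.xx
    WebLogicPort=7002
    SecureProxy=ON
    TrustedCAFile=C:\certs\trustedca.pem
    RequireSSLHostMatch=false

    # List the demo trust store, then export the root CA in PEM form
    keytool -list -keystore DemoTrust.jks -storepass DemoTrustKeyStorePassPhrase
    keytool -exportcert -keystore DemoTrust.jks -storepass DemoTrustKeyStorePassPhrase -alias certgenca -rfc -file trustedca.pem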
-Vijay

Similar Messages

  • Go back to the failed line after an exception is caught

    I have some lines of code, each of which can throw an exception, but I would like the code to continue with the next line if an exception is thrown.
    I mean, how can I do this
    try {
        a;
    } catch (Exception e) {}
    try {
        b;
    } catch (Exception e) {}
    try {
        c;
    } catch (Exception e) {}
    without using a try...catch statement for every line of code?
    Thanks

    I don't know this very well, but there is something like a LABEL.
    The idea would look like this (note this is pseudocode; Java labels cannot actually be used to resume after a catch):
    try {
        a;
    LABEL1:
        b;
    LABEL2:
        c;
    } catch (aException e) { /* resume at LABEL1 */ }
    catch (bException e) { /* resume at LABEL2 */ }
    but in the book where I read about labels it is said they are not recommended, and it is also not efficient because all the statements must throw different exceptions.
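    A pattern that does work in standard Java is to wrap each step as a task and loop over the tasks, so every step gets its own catch without repeating the boilerplate. A minimal sketch (a(), b() and c() are placeholders for your statements):
    import java.util.Arrays;
    import java.util.List;

    public class ContinueAfterFailure {
        public static void main(String[] args) {
            // Each step runs in its own try/catch, so a failure in one
            // step does not prevent the following steps from running.
            List<Runnable> steps = Arrays.asList(
                    ContinueAfterFailure::a,
                    ContinueAfterFailure::b,
                    ContinueAfterFailure::c);
            for (Runnable step : steps) {
                try {
                    step.run();
                } catch (Exception e) {
                    e.printStackTrace(); // log and continue with the next step
                }
            }
        }
        static void a() { System.out.println("a"); }
        static void b() { throw new RuntimeException("b failed"); }
        static void c() { System.out.println("c"); }
    }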

  • Failing over Oracle connections in a pool

              Hi,
              This message is probably a bit out of context (I've already posted
              it to the JDBC group). I post here as well, since I guess it's
              the place where people have the most experience with clustering
              and HA. Original posting below...
              Could you please tell me whether, yes or no, connections to an
              Oracle database should fail over (when the database fails over
              to another machine)? I use Oracle's Transparent Application Failover
              (configured via Net8) with Weblogic 6 on Linux and Oracle 8.1.7
              on Solaris/SPARC.
              If this doesn't work in my configuration, is there any configuration
              where it should work? (Another version of Oracle, WLS, OS, ...)
              When I try TAF using the PetStore application, I get exceptions
              related to not being connected to the database.
              If TAF doesn't work with WebLogic, is there a way to work around
              the problem? Can I catch these exceptions and renew the connections
              in the pool? Or, what else is possible...?
              I'd appreciate any help. I'd like to demonstrate our HA product
              with WLS. If it doesn't work, I'll turn to iPlanet instead. Pity,
              I really like WLS!
              Thanks in advance for any help or advice!
              Regards, Frank Olsen
              

              Hi (Frank ;-)
              I got carried away a bit too fast...
              Some more testing shows that it doesn't work in all cases:
              - when someone is trying to check out the shopping cart when the
              database fails (and fails over), I get exceptions once the
              database has restarted on the backup node
              - the exceptions are related to some transactions being rolled
              back and Oracle stating that it couldn't safely replay the transactions
              - browsing the categories still works fine
              - all access to the shopping cart and sign-in/sign-out causes time-outs
              and exceptions
              Any ideas what may cause this problem, please?
              Regards,
              Frank Olsen
              "Frank Olsen" <[email protected]> wrote:
              >
              >Hi,
              >
              >TAF worked with WLS 6 on NT with the Oracle 8.1.7 client!
              >
              >Has anyone tested it on Solaris/SPARC?
              >
              >Regards,
              >Frank Olsen
              >
              >
              >
              >"Frank Olsen" <[email protected]> wrote:
              >>
              >>Hi,
              >>
              >>Most of my question below is still valid (in particular
              >>concerning
              >>whether TAF should work with WLS on some or all platforms
              >>and
              >>versions).
              >>
              >>However, when I tested TAF with the Oracle client (sqlplus)
              >>there
              >>also was no failover of the (one) connection. I then
              >checked
              >>the
              >>`V$SESSION' view and the colums related to failover showed
              >>that
              >>TAF was not correctly configured. Strange because I copied
              >>the
              >>`tnsnames.ora' parameters from the Oracle documentation
              >>for TAF.
              >>
              >>Has anyone managed to configure and use TAF, with or
              >without
              >>WLS?!
              >>
              >>Thanks in advance for your help!
              >>
              >>Regards,
              >>Frank Olsen
              >>
              >>
              >>"Frank Olsen" <[email protected]> wrote:
              >>>
              >>>Hi,
              >>>
              >>>This message is probably a bit out of context (I've
              >already
              >>>posted
              >>>it to the JDBC group). I post here as well, since I
              >guess
              >>>it's
              >>>the place where people have the most experience with
              >>clustering
              >>>and HA. Original posting below...
              >>>
              >>>----
              >>>
              >>>Could you please tell me whether, yes or no, connections
              >>>to an
              >>>Oracle database should fail over (when the database
              >fails
              >>>over
              >>>to another machine)? I use Oracle's Transparent Application
              >>>Failover
              >>>(configured via Net8) with Weblogic 6 on Linux and Oracle
              >>>8.1.7
              >>>on Solaris/SPARC.
              >>>
              >>>If this doesn't work in my configuration, is there any
              >>>configuration
              >>>where it should work? (Another version of Oracle,
              >WLS,
              >>>OS, ...)
              >>>
              >>>
              >>>When I try TAF using the PetStore application, I get
              >>exceptions
              >>>related to no being connected to the database.
              >>>
              >>>If TAF doesn't work with WebLogic, is there a way to
              >>work
              >>>around
              >>>the problem? Can I catch these exceptions and renew
              >the
              >>>connections
              >>>in the pool? Or, what else is possible...?
              >>>
              >>>I'd appreciate any help. I'd like to demonstrate our
              >>HA
              >>>product
              >>>with WLS. If it doesn't work, I'll turn to iPlanet instead.
              >>>Pity,
              >>>I really like WLS!
              >>>
              >>>Thanks in advance for any help or advice!
              >>>
              >>>Regards, Frank Olsen
              >>>
              >>
              >
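              For anyone hitting the same wall: TAF is driven entirely by the
              FAILOVER_MODE clause in tnsnames.ora, and V$SESSION shows whether
              it took effect. A minimal sketch, with hypothetical host and
              service names:
              TESTDB =
                (DESCRIPTION =
                  (ADDRESS_LIST =
                    (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
                    (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521)))
                  (CONNECT_DATA =
                    (SERVICE_NAME = testdb)
                    (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 3))))
              -- In sqlplus, these V$SESSION columns should show BASIC/SELECT
              -- (and FAILED_OVER=YES after a failover) once TAF is in effect:
              SELECT username, failover_type, failover_method, failed_over
                FROM v$session WHERE username IS NOT NULL;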
              

  • Exception while failing over to 2nd RAC Node

    We are using Weblogic 10.3.4. Our setup is that we have a Web Application (A tapestry front end Web UI) and EJb 2.1 back-end talking to the Oracle database. The EJB’s are CMP. Our product always was just stand alone and it wasn’t until this release we needed to make it work with RAC. To get this to work we followed the model of having a Multidatasource with datasources pointing to our RAC nodes. We have two types of datasources that we use persistent and non-persistent. And we are using the Oracle thin driver – non-XA for RAC Service Instances, supporting global transactions.
    When we fail over to the 2nd node we get a nasty exception in our GUI, but after logging out and logging back in we are fine.
    My question: I assumed I shouldn't have to restart our web application and it should have stayed up? Or is there something wrong with our setup?
    Thanks,
    Ian

    Showing us the exception and/or the error messages at the server might help...
    Note that failing over does not save any ongoing connection or transaction that
    was connected to the dead RAC node... Does your web-app get-use-close JDBC
    connections on a per-user-invoke basis, or does it hold onto connections?
    Joe
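    For clarity, the get-use-close pattern mentioned above looks like the following; a minimal sketch, assuming a container-provided DataSource:
    import java.sql.Connection;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    public class PerInvokeJdbc {
        // Acquire a connection per invoke and close it in finally, so after
        // a RAC failover the pool can replace dead connections with fresh ones.
        static void runPerInvoke(DataSource ds) throws SQLException {
            Connection conn = ds.getConnection();
            try {
                // ... do the per-request JDBC work here ...
            } finally {
                conn.close(); // returns the connection to the pool
            }
        }
    }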

  • After adding 2nd WiSM and failing over AP's some apps don't work

    We have a dual core made up of 2 6513's. In 6513#1 we have WiSM#1, which we have had for some time now. We have added a 2nd WiSM in 6513#2 for redundancy purposes, and we are also going to re-configure the WiSM in 6513#1 to more closely match the new WiSM in 6513#2. We installed the new WiSM and failed over the AP's from 6513#1 so we can re-configure its WiSM. The failover went great with no issues, except that a web application or two didn't function from wireless clients and users were having issues getting to some mapped drives. The only difference between the new WiSM config and the old WiSM is that on the old WiSM the AP's were in the same VLAN as the controller management interfaces. With the new WiSM, the controllers' AP mgt interface IP addresses are in a different VLAN from the AP's; we are doing this based on Cisco best practices. If we revert the AP's back to the original WiSM/controllers, where the PC's are on the same VLAN/subnet, the applications and shares that were having issues the other way work. We have placed a call with Cisco TAC and they say our configs look good; we even sent them some packet captures and they said everything looks normal. The wireless clients can ping and resolve the server hosting the application database just fine.
    Thanks

    We did create the mobility groups, and we are using DHCP opt 43. The AP's find the 2nd WiSM#2 just fine, associate to the controllers, and all the WLAN's work just fine. The only issue is that after the AP's are on the new WiSM and controllers, there is an application or two having trouble locating its database server, and some shares are not working. Again, the only difference in this new setup is that the AP's are now on a different subnet/VLAN from the controller mgt addresses, whereas before they were in the same subnet/VLAN and the applications and shares worked fine. It's almost like it is a bit of a routing issue?
    Thanks

  • How do the application servers connect the new database after failing over from primary DB to standby DB

    How do the application servers connect the new database after failing over from primary DB to standby DB?
    We have setup a DR environment with a standalone Primary server and a standalone Physical Standby server on RHEL Linux 6.4. Now our application team would like to know:
    When the primary DB server crashes, the standby DB server will take over the role of primary DB through Data Guard fast failover. Since the applications were connected via the primary DB's IP before, the physical standby now uses a different IP and listener. When this happens, they need to stop their application servers and re-configure their connections so that they connect to the new DB server; they cannot tolerate this workaround.
    Does Oracle have a better solution for this, so that the application can automatically learn of the role transition and switch to the new IP without re-configuring any connections or shutting down the application?
    Oracle support provides us the answer as following:
    ==================================================================
    Applications connected to a primary database can transparently failover to the new primary database upon an Oracle Data Guard role transition. Integration with Fast Application Notification (FAN) provides fast failover for integrated clients.
    After a failover, the broker publishes Fast Application Notification (FAN) events. These FAN events can be used in the following ways:
    Applications can use FAN without programmatic changes if they use one of these Oracle integrated database clients: Oracle Database JDBC, Oracle Database Oracle Call Interface (OCI), and Oracle Data Provider for .NET ( ODP.NET). These clients can be configured for Fast Connection Failover (FCF) to automatically connect to a new primary database after a failover.
    JAVA applications can use FAN programmatically by using the JDBC FAN application programming interface to subscribe to FAN events and to execute event handling actions upon the receipt of an event.
    FAN server-side callouts can be configured on the database tier.
    FAN events are published using Oracle Notification Services (ONS) and Oracle Streams Advanced Queuing (AQ).
    =======================================================================================
    Does anyone have experience, related documentation, or other solutions? We are not familiar with the FAN concept.
    Thanks very much in advance.

    Hi mesbeg,
    Thanks a lot.
    For example, for an application JBoss server connecting to the DB, we just added the standby IP to the datasource configuration file (besides adding a service on the DB side), like the following:
            <subsystem xmlns="urn:jboss:domain:datasources:1.0">
            <datasources>
                    <datasource jta="false" jndi-name="java:/jdbc/idserverDatasource" pool-name="IDServerDataSource" enabled="true" use-java-context="true">
                        <connection-url>jdbc:oracle:thin:@<primay DB IP>:1521:testdb</connection-url>
                        <connection-url>jdbc:oracle:thin:@<standby DB IP>:1521:testdb</connection-url>
                        <driver>oracle</driver>
                        <pool>
                            <min-pool-size>2</min-pool-size>
                            <max-pool-size>10</max-pool-size>
                            <prefill>true</prefill>
                        </pool>
                        <security>
                            <user-name>TEST_USER</user-name>
                            <password>Password1</password>
                        </security>
                        <validation>
                            <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleValidConnectionChecker"/>
                            <validate-on-match>false</validate-on-match>
                            <background-validation>false</background-validation>
                            <use-fast-fail>false</use-fast-fail>
                            <stale-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleStaleConnectionChecker"/>
                            <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleExceptionSorter"/>
                        </validation>
                    </datasource>
                    <drivers>
                        <driver name="oracle" module="com.oracle.jdbc">
                            <xa-datasource-class>oracle.jdbc.OracleDriver</xa-datasource-class>
                        </driver>
                    </drivers>
                </datasources>
            </subsystem>
    If a failover occurs, JBoss will automatically be pointed to the standby DB. No additional actions are needed.
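    One hedged caveat on the snippet above: the JBoss datasource schema expects a single <connection-url> element, so a common alternative is one URL carrying an Oracle address list that fails over at connect time. A sketch, assuming testdb is exposed as a service and the host names are placeholders:
                        <connection-url>jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=off)(FAILOVER=on)(ADDRESS=(PROTOCOL=tcp)(HOST=primary-db)(PORT=1521))(ADDRESS=(PROTOCOL=tcp)(HOST=standby-db)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=testdb)))</connection-url>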

  • Database fail over problem after we change Concurrency Strategy

    Hi, we had Concurrency Strategy: exclusive. Now we have changed that to Database for performance
    reasons. Since we changed it, when the Oracle database fails over, WebLogic
    6.1 does not detect the database fail over and needs to be rebooted.
    How can we resolve this?

    Hi,
    It is just failing on one of the application servers; the developer wrote that when installing the CI, the local hostname is written into the database and SDM. We will have to do a homogeneous system copy to change the name.
    The problem is that I used the virtual SAP group name for the CI and DI application servers; for SCS and ASCS we used virtual hostnames, which is OK according to the SAP developer.
    The start and instance profiles were checked and everything was fine; just the dispatcher from the CI is having problems when coming from Node B to Node A.
    Regards

  • Users contacts missing after failing over and then failing back pool

    We have 2 Lync enterprise pools that are paired.
    3 days ago, I failed the central management store and all users over from pool01 to pool02.
    This morning, I failed the CMS and all users back from pool02 to pool01.
    All users signed back in to Lync and no issues were reported. A user then contacted me to say that his contact list was empty.
    I had him sign out and back in to Lync, and also had him sign into Lync from a different workstation, as well as his mobile device. All of which showed his contacts list as empty.
    We have unified contacts enabled (hybrid mode with Office 365 Exchange Online, and Lync on prem). When I check the user's Outlook contacts, I can see all of his contacts listed under "Lync Contacts", along with the current presence of each user.
    If I perform an export-csuserdata for that user's userdata, the XML file contained within the ZIP file shows the contacts that he is missing.
    I've also checked the client log on the workstation too, and can see that Lync can see the contacts as it lists them in the log. They do not appear in the Lync client though.
    Environment details:
    Lync 2013 - 2 enterprise pools running the latest December 2014 CU updates.
    Lync 2013 clients - running on Windows 8.1. User who is experiencing the issue is running client version 15.0.4675.1000 (32 bit)
    I have attempted to re-import the user data using both import-csuserdata (and restarting the front end services) and update-csuserdata. Both of these have had no effect.

    Hi Eason,
    Thanks for your reply. I've doubled checked and can confirm that only one policy exists, which enables it globally.
    I believe this problem relates to issues that always seem to happen when ever our primary pool is failed over to the backup pool, and then failed back.
    What I often see is that upon pool failback, things like response group announcements don't play on inbound calls (white noise is heard, followed by the call disconnecting), and agents somehow get signed out of queues (although they appear to be signed in to
    the queue when checking their Response Group settings in their Lync client). I've also noticed that every time we fail back, a different user will come to me and report that either their entire contacts list is missing, or that half of their contacts are missing.
    I am able to restore these from backup though.
    This appears to happen regardless of if the failover to the backup pool is due to a disaster, or to simply perform pool maintenance on our primary pool.

  • Failing over from MDC02 back to MDC01 on XSan

    cross-posted from Xsan discussion...
    Looking for some advice on an XSan MetaData controller problem.  Last week, we had a failover from our primary MetaData controller (metasvr01) to our backup MetaData Controller (metasvr02).  So far, so good.
    After some investigation, it looked like metasvr01 had locked up - so we rebooted it.  It appears to have come back alive somewhat normally, except the cvadmin command is not seeing both MetaData servers (like it used to).
    Should it be "safe" to try a metadata server failover to metasvr01?
    Are there any potential problems / "gotcha's" we should be aware of?
    Here's the output from metasvr01:
    ========================================================
    metasvr01:~ metasvr01$ sudo cvadmin
    Password:
    Xsan Administrator
    Enter command(s)
    For command help, enter "help" or "?".
    List FSS
    File System Services (* indicates service is in control of FS):
    1> EditB[0]         located on metasvr01.private:49248 (pid 140)
    2> EditA[0]         located on metasvr01.private:49247 (pid 139)
    No FSSs are active.
    Select FSM "none"
    Here's the output from metasvr02:
    ========================================================
    metasvr02:~ metasvr02$ sudo cvadmin
    Password:
    Xsan Administrator
    Enter command(s)
    For command help, enter "help" or "?".
    List FSS
    File System Services (* indicates service is in control of FS):
    1>*EditB[0]         located on metasvr02.private:49200 (pid 128)
    2>*EditA[0]         located on metasvr02.private:49201 (pid 127)
    Select FSM "none"
    NOTE:
    ==========================================================
    Prior to this incident, we would see both metadata servers (on both metasvr01 and metasvr02) - with the "active" one having the asterisk indicating correctly.
    Original XSan / XRaid systems.
    Still running XSan version 1.4.x
    Here are the details regarding metasvr01:
    OS/X Server 10.5.5
    Normally serving as the primary / active metadata controller.
    Here are the details regarding metasvr02:
    OS/X Server 10.5.5
    Normally serving as the secondary / backup metadata controller,
                             secondary / backup Open Directory Server,
                             secondary / backup DNS server
    Here are the details regarding xsangw01:
    OS/X Server 10.5.5
    Normally serving as the primary / master Open Directory server,
                             primary / master DNS server,
                             smb / afp shares throughout the LAN.
    ===================================================================================
    One final set of notes regarding these servers.  Over the last couple of months, the servers have been increasingly "problematic".  So far, we've lost the following capabilities on them:
    - no more ARD access (generally)
    - no more local KB, Mouse, & Monitor access (generally)
    - no more XSan GUI / Server tools GUI access
    ==================================================================================

    It is common, when only one of the Xsan servers is restarted, for their configurations to get out of sync.
    It looks like your metasvr01 thinks it is still hosting the volumes that were failed over.
    What you can try is, from metasvr02, demote metasvr01 to a client and then promote it back to controller. That will force a configuration rewrite and bring metasvr01 back in sync. If cvadmin then shows you a correct configuration, you could safely fail back over to metasvr01, assuming you are confident that whatever issue led to the failover is solved.
    A full restart of the XSAN would probably work too (shutdown order: clients, controllers, raids, switches; power on in reverse order).
    I would also recommend regular maintenance tasks, like checking the disks and permissions on the servers and stopping the volume(s) and running cvfsck.
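    For the cvfsck step, a read-only pass first is the safe order. A sketch, assuming the volume names from the cvadmin output above and that the volume is stopped (double-check the flags against cvfsck's help for your Xsan version):
    sudo cvfsck -nv EditA   # -n: check only, make no changes; -v: verbose
    sudo cvfsck -wv EditA   # -w: actually write repairs, only after reviewing the -n run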

  • Job cancelled after system exception ERROR_MESSAGE in DB13

    Hello All,
    When I opened t-code DB13 I saw that the job "Mark tables requiring statistics update" was cancelled.
    JOB LOG:
    12.02.2011  22:00:16  Job started
    12.02.2011  22:00:16  Step 001 started (program RSDBAJOB, variant &0000000000085, user ID 80000415)
    12.02.2011  22:00:18  Job finished
    12.02.2011  22:00:18  Job started
    12.02.2011  22:00:18  Step 001 started (program RSADAUP2, variant &0000000000081, user ID 80000415)
    12.02.2011  22:01:26  Error when performing the action
    12.02.2011  22:01:26  Job cancelled after system exception ERROR_MESSAGE
    When checking the BGD job in SM37 for this job, I found the same error in the job log, with the status Cancelled.
    Job log overview for job:    DBA!PREPUPDSTAT_____@220000/6007 / 22001700
    12.02.2011 22:00:18 Job started
    12.02.2011 22:00:18 Step 001 started (program RSADAUP2, variant &0000000000081, user ID 80000415)
    12.02.2011 22:01:26 Error when performing the action
    12.02.2011 22:01:26 Job cancelled after system exception ERROR_MESSAGE
    I couldn't find any logs in SM21 for that time, and no dumps in ST22 either.
    Possible reason for this error:
    I scheduled the job Check database structure (only tables) at some other time and deleted the earlier job, which was scheduled during business hours and caused a performance problem.
    So, to avoid the performance issue, I scheduled this job at midnight by cancelling the old job that was scheduled during business hours.
    And from the next day I could see this error in DB13.
    All the other backups are running fine; the only job getting cancelled is "Mark tables requiring statistics update".
    Could anyone tell me what I should do to get rid of this error?
    Can I schedule "Mark tables requiring statistics update" again by deleting the old one?
    Thanks.
    Regards.
    Mudassir Imtiaz

    Hello Adrian,
    Thanks for your response.
    Every alternate day we used to have a performance issue at 19:00hrs.
    When I checked what was causing this problem, I discovered that there was a backup, "Check Database Structure (tables only)", scheduled at this time, and it was mentioned that this backup may cause performance issues.
    I then changed the time of the "Check Database Structure (tables only)" backup to 03:00hrs.
    The next day when I checked DB13 I found that one of the jobs had failed,
    i.e. "Mark Tables Requiring Statistics Update".
    Then I checked the log which I posted earlier, with the error: "Job cancelled after system exception ERROR_MESSAGE".
    I posted this error here, and then I tried to delete the scheduled job, i.e. "Mark Tables Requiring Statistics Update", and re-schedule it at the same time and interval.
    And then it started working fine.
    So I just got curious to know the cause of the failure of that job.
    Thanks.
    Regards,
    Mudassir.Imtiaz
    P.S. There is one more thing I would like to ask which is not related to the above issue, and I'm sorry to discuss it in this thread.
    I found a few bottlenecks in ST04 with Medium and High priority.
    Medium: Selects and fetches selectivity 0.53%: 122569 selects and fetches, 413376906 rows read, 2194738 rows qualified.
    High: 108771 primary key range accesses, selectivity 0.19%: 402696322 rows read, 763935 rows qualified.
    There are a lot of these.
    I would really appreciate it if you could tell me the cause of these bottlenecks and how to resolve them.
    Thanks a lot.

  • SQL Server 2014 Always on HA takes 8-14 seconds to fail over. Application side timeouts occur

    Hi All,
    I have a very similar post in the SQL Server 2014 forums too (https://social.technet.microsoft.com/Forums/sqlserver/en-US/adb5e338-907e-4405-aa62-d3ea93c7a98a/sql-server-2014-always-on-ha-takes-814-seconds-to-fail-over-application-side-timeouts-occur?forum=sqldisasterrecovery) -
    advice in the end was to post a question here.
    SQL Server Nodes, 2014 (12.0.2480.0)
    1 Share witness (on separate subnet)
    1 Cluster
    1 Listener
    I have been testing the response time to failovers – both manual (right-click, fail over in SSMS) and automatic (shut down the primary host). The way I am testing response is to have an SSMS query running on my desktop, connected to the listener, querying
    a small table, and hit execute.
    The Query response time, from execute to receiving the result, has been between 8 and 14 seconds based on my testing. My previous experience (in a separate environment) showed around 2 second fail over times in a very similar configuration.
    Availability DB is 200Mb and is not actively used. The nodes are synchronised.
    SQL Server Hosts: Windows 2012, 2 cpu, 8gb RAM.
    Questions:
    1: It’s a big question, but what should I expect for a ‘normal’ fail over time? Keep in mind this scenario is about as simple as it gets.
    2: As it stands, an 8 to 14 second ‘outage’ could cause some applications to time out. Or am I being unreasonable? I am seeing the very simple query in SSMS time out with this:
    Msg 983, Level 14, State 1, Line 2
    Unable to access availability database 'DATABASE' because the database replica is not in the PRIMARY or SECONDARY role. Connections to
    an availability database is permitted only when the database replica is in the PRIMARY or SECONDARY role. Try the operation again later.
    Cluster logs are long - this section accounts for 8 seconds of the 11 second outage I experienced. I can supply the full log if required. Also this log is just the 2 cluster nodes, I removed the witness share to make sure it was as simple as possible.
    00001090.00002128::2015/02/25-03:05:08.255 INFO  [GEM] Node 2: Deleting [1:65 , 1:71] (both included) as it has been ack'd by every node
    00001ee4.00002130::2015/02/25-03:05:10.107 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
    00001090.00002128::2015/02/25-03:05:11.888 INFO  [GEM] Node 2: Deleting [1:72 , 1:73] (both included) as it has been ack'd by every node
    00001090.00002698::2015/02/25-03:05:11.889 INFO  [GUM] Node 2: Processing RequestLock 2:49
    00001090.00002128::2015/02/25-03:05:11.890 INFO  [GUM] Node 2: Processing GrantLock to 2 (sent by 1 gumid: 67)
    00001090.00002698::2015/02/25-03:05:11.890 INFO  [GUM] Node 2: executing request locally, gumId:68, my action: /dm/update, # of updates: 1
    00001090.00002128::2015/02/25-03:05:12.890 INFO  [GEM] Node 2: Deleting [1:74 , 1:74] (both included) as it has been ack'd by every node
    00001ee4.00002130::2015/02/25-03:05:15.107 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
    00001090.00002128::2015/02/25-03:05:16.988 INFO  [GUM] Node 2: Processing RequestLock 1:28
    Thanks in advance.
    Keegan

    Hi Keegan,
    From these event logs, what I can see is that "Sending request Netname" is where the time went.
    Could you please tell us the network configuration of the cluster nodes?
    If I recall correctly, it is recommended to keep only the TCP/IP protocol and disable NetBIOS over TCP/IP for the "Private Network", and also not to configure DNS/WINS or a default gateway for the "Private Network":
    https://support.microsoft.com/kb/258750?wa=wsignin1.0
    After that, please test again.
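    On the application side, it is also worth checking that clients connecting to the listener enable multi-subnet failover, which attempts connections to all of the listener's registered IPs in parallel instead of waiting on a timeout. A minimal sketch of a .NET connection string (the listener and database names are placeholders):
    Server=tcp:aglistener,1433;Database=MyAgDb;Integrated Security=SSPI;MultiSubnetFailover=True;Connect Timeout=30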
    Best Regards,
    Elton JI

  • OCR and voting disks on ASM, problems in case of fail-over instances

    Hi everybody
    in case at your site you :
    - have an 11.2 fail-over cluster using Grid Infrastructure (CRS, OCR, voting disks),
    where you have yourself created additional CRS resources to handle single-node db instances,
    their listener, their disks and so on (which are started only on one node at a time,
    can fail over from that node and restart on another);
    - have put OCR and voting disks into an ASM diskgroup (as strongly suggested by Oracle);
    then you might have problems (as we had) because you might:
    - reach the max number of diskgroups handled by an ASM instance (only 63, above which you get ORA-15068);
    - experience delays (especially in case of multipath), find fake CRS resources, etc.,
    whenever you dismount disks from one node and mount to another;
    So (if both conditions are true) you might be interested in this story,
    then please keep reading on for the boring details.
    One step backward (I'll try to keep it simple).
    Oracle Grid Infrastructure is mainly used by RAC db instances,
    which means that any db you create usually has one instance started on each node,
    and all instances access read / write the same disks from each node.
    So, ASM instance on each node will mount diskgroups in Shared Mode,
    because the same diskgroups are mounted also by other ASM instances on the other nodes.
    ASM instances have a spfile parameter CLUSTER_DATABASE=true (and this parameter implies
    that every diskgroup is mounted in Shared Mode, among other things).
    In this context, it is quite obvious that Oracle strongly recommends to put OCR and voting disks
    inside ASM: this (usually called CRS_DATA) will become diskgroup number 1
    and ASM instances will mount it before CRS starts.
    Then, additional diskgroup will be added by users, for DATA, REDO, FRA etc of each RAC db,
    and will be mounted later when a RAC db instance starts on the specific node.
    In case of fail-over cluster, where instances are not RAC type and there is
    only one instance running (on one of the nodes) at any time for each db, it is different.
    All diskgroups of db instances don't need to be mounted in Shared Mode,
    because they are used by one instance only at a time
    (on the contrary, they should be mounted in Exclusive Mode).
    Yet, if you follow Oracle advice and put OCR and voting inside ASM, then:
    - at installation OUI will start ASM instance on each node with CLUSTER_DATABASE=true;
    - the first diskgroup, which contains OCR and votings, will be mounted Shared Mode;
    - all other diskgroups, used by each db instance, will be mounted Shared Mode, too,
    even if you'll take care that they'll be mounted by one ASM instance at a time.
    At our site, for our three-nodes cluster, this fact has two consequences.
    One consequence is that we hit the ORA-15068 limit (max 63 diskgroups) earlier than expected:
    - none ot the instances on this cluster are Production (only Test, Dev, etc);
    - we planned to have usually 10 instances on each node, each of them with 3 diskgroups (DATA, REDO, FRA),
    so 30 diskgroups each node, for a total of 90 diskgroups (30 instances) on the cluster;
    - in case one node failed, surviving two should get resources of the failing node,
    in the worst case: one node with 60 diskgroups (20 instances), the other one with 30 diskgroups (10 instances)
    - in case two nodes failed, the only node survived should not be able to mount additional diskgroups
    (because of limit of max 63 diskgroup mounted by an ASM instance), so all other would remain unmounted
    and their db instances stopped (they are not Production instances);
    But it didn't work, since ASM has the parameter CLUSTER_DATABASE=true, so you cannot mount 90 diskgroups;
    you can mount 62 globally (once a diskgroup is mounted on one node, it is given a number between 2 and 63,
    and other diskgroups mounted on other nodes cannot reuse that number).
    So as a matter of fact we can mount only 21 diskgroups (about 7 instances) on each node.
    The second consequence is that, every time our handmade CRS scripts dismount diskgroups
    from one node and mount it to another, there are delays in the range of seconds (especially with multipath).
    Also we found inside CRS log that, whenever we mounted diskgroups (on one node only), then
    behind the scenes were created on the fly additional fake resources
    of type ora*.dg, maybe to accomodate the fact that on other nodes those diskgroups were left unmounted
    (once again, instances are single-node here, and not RAC type).
    That's all.
    Did anyone go into similar problems?
    We opened a SR to Oracle asking about what options do we have here, and we are disappointed by their answer.
    Regards
    Oscar

    Hi Klaas-Jan
    - best practises require that also online redolog files are in a separate diskgroup, in case of ASM logical corruption (we are a little bit paranoid): in case DATA dg gets corrupted, you can restore Full backup plus Archived RedoLog plus Online Redolog (otherwise you will stop at the latest Archived).
    So we have 3 diskgroups for each db instance: DATA, REDO, FRA.
    - in case of fail-over cluster (active-passive), Oracle provide some templates of CRS scripts (in $CRS_HOME/crs/crs/public) that you edit and change at your will, also you might create additionale scripts in case of additional resources you might need (Oracle Agents, backups agent, file systems, monitoring tools, etc)
    About our problem, the only solution is to move the OCR and voting disks out of ASM and change the pfile of all ASM instances (parameter CLUSTER_DATABASE from true to false).
    Oracle's answers were a little bit odd:
    - first they told us to use Grid Standalone (without CRS, OCR, voting at all), but we told them that we needed a fail-over solution
    - then they told us to use RAC One Node, which actually has some better features; in case of a planned fail-over it might be able to migrate
    client sessions without causing a reconnect (for SELECTs only, not in case of a running transaction), but we already have a few fail-over clusters, and we cannot change them all
    So we plan to move the OCR and voting disks onto block devices (we think that the other solution, which needs a shared file system, would take longer).
    Thanks Marko for pointing us to the OCFS2 pros / cons.
    We asked Oracle for confirmation that it is supported; they said yes, but it is discouraged (and also doesn't work with OUI or ASMCA).
    Anyway that's the simplest approach; this is a non-Prod cluster, so we'll start here and if everything is fine, after a while we'll do it also on the Prod ones.
    - Note 605828.1, paragraph 5, Configuring non-raw multipath devices for Oracle Clusterware 11g (11.1.0, 11.2.0) on RHEL5/OL5
    - Note 428681.1: OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE)
    -"Grid Infrastructure Install on Linux", paragraph 3.1.6, Table 3-2
    Oscar
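    For completeness, the parameter change mentioned above is a one-liner per ASM instance; a sketch (the diskgroup name is ours, and the ASM instance must be restarted for the static parameter to take effect):
    -- In each ASM instance (static parameter, so SCOPE=SPFILE plus a restart):
    ALTER SYSTEM SET CLUSTER_DATABASE=FALSE SCOPE=SPFILE;
    -- After the restart, diskgroups mount in Exclusive Mode on one node at a time:
    ALTER DISKGROUP DATA MOUNT;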

  • I failed to Confirm Security Exception for a site confirmed to be trusted, but apparently only supported by Internet Explorer, and am now barred from the site, so how do I access it?

    I have been successfully accessing a secure site using a Citrix plugin. The security certificate for the site was renewed on 17 September and is valid, but is apparently only supported by Internet Explorer. When I accessed it today, Firefox reported it as insecure. I attempted to bypass the Firefox block, but must have failed to Confirm Security Exception, so that my access is now denied as I 'have said this is not a trusted site' (error 183). Is there any way I can now access the site?

    Thank you for your reply.
    My problem isn't removing my log-in name and password for a site.
    The problem is I already had the log-in name and password saved by FF, but I was required by the site to change the password. FF did not recognize that it had changed and refuses to prompt me to save the new password.
    After several attempts to force the prompt, I deleted the site from the Saved Password list, hoping that the next time I entered my log-in name and password on that site FF would ask if I wanted to save both. That did not happen even after exiting FF and relaunching it.
    So, right now FF will not save my log-in name and password for a site that it used to track them for. Given the complexity of the new password, I really do not want to manually enter it every time I use the site.
    Regards.

  • Which role do I need DFS or File server on fail over cluster server 2012 R2?

    What I want to achieve is to share all my user data files in a central location and have them be highly available all the time, whether it's a general share or folder redirection data. BUT I'm a bit confused; I have a fail over cluster set up
    on Server 2012, and now I would like to add DFS as a role, but then we have another role called File Server, and virtually it does the same thing as DFS? I mean, it creates a namespace share that can be accessed even if one of the nodes goes down. Now I am thinking
    that DFS does the replication between two physical locations, but fail over cluster works slightly differently, and with File Server it pretty much does the same thing except for replicating data from one drive to another. Now what do you suggest I do, or
    did I get the concept wrong like a noob?

    DFS and Failover Clustering for file shares provides a similar end result for file access, but they are significantly different implementations.
    Clustering provides high availability to files by presenting shared access to a set of files served from a cluster.  With 2012 R2 Microsoft added the ability to create a Scale-Out File Server that even allows all nodes of the cluster to serve access to
    the files for a higher level of performance and other great things.  Bottom line with Failover Clusters for files is that there is a single copy of the file presented from the cluster.
    DFS on the other hand provides high availability to files by presenting multiple copies of the file by making a copy in two or more locations and presenting a naming space that allows access to the file through any of the network paths.  DFS works very
    well for files that are primarily read-only.  When you get into a situation where there is a lot of updating of the shared files, DFS is not a very good solution.  There are ways to implement DFS for read/write files, but it generally requires a
    good knowledge of how the files are used and how you want to manage them.
    The key to answering your question comes in your first sentence "I want to share all my user data files in a central location and to be highly available all the time".  My initial reaction to this is that central location means Failover Cluster
    - there is only a single copy of the file.  However, "all the time" can be compromised by network failures to the central site.  Remote sites would not have access if they can't access the central site.  DFS provides the ability to
    have copies remotely, but then if you allow updating at multiple sites, you have to manage the merging of the changes, among other things.
    . : | : . : | : . tim
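    If you do go the Failover Cluster route, the file server role is added per cluster rather than per node; a rough PowerShell sketch (the role names, the cluster disk, and the address are placeholders):
    # Classic general-use clustered file server (one active node at a time):
    Add-ClusterFileServerRole -Name "HAFileServer" -Storage "Cluster Disk 1" -StaticAddress 10.0.0.50
    # Or the 2012 R2 Scale-Out File Server mentioned above (all nodes active):
    Add-ClusterScaleOutFileServerRole -Name "SOFS"
    # Then publish the share with continuous availability:
    New-SmbShare -Name "UserData" -Path "C:\ClusterStorage\Volume1\UserData" -ContinuouslyAvailable $true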
