Data Guard failure scenario

In my current environment the production database has one physical standby database.
The size of the database is 5 TB. Due to a network failure, 4 days' worth of archivelog files were not shipped to the standby database (and the archivelog files have since been deleted on the primary).
How can I sync the standby database without recreating it?

Hi user8934564,
If your database is 10g or above and your production is still working without any interruption, then you can take an incremental backup of the primary from the SCN of the standby. Transfer this backup to the standby and restore it there. After this, create a standby controlfile on the primary and restore it on the standby. The standby will then be in sync with the primary.
You can follow the link below:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14239/scenarios.htm#CIHIAADC
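A minimal sketch of that procedure (the SCN value and file paths here are placeholders; substitute your own):

    -- On the standby: find the SCN to roll forward from
    SQL> SELECT CURRENT_SCN FROM V$DATABASE;
    -- On the primary: take an incremental backup starting at that SCN
    RMAN> BACKUP INCREMENTAL FROM SCN 1234567 DATABASE FORMAT '/tmp/standby_incr_%U';
    -- Transfer the backup pieces to the standby, then on the standby:
    RMAN> CATALOG START WITH '/tmp/standby_incr';
    RMAN> RECOVER DATABASE NOREDO;
    -- Back on the primary: create a controlfile for the standby
    RMAN> BACKUP CURRENT CONTROLFILE FOR STANDBY FORMAT '/tmp/standby_ctl_%U';
    -- Transfer it, then on the mounted standby:
    RMAN> RESTORE STANDBY CONTROLFILE FROM '/tmp/standby_ctl_piece';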
If you still have the deleted logs in the primary's backup files, then restore them on the primary, and the same logs will start to transfer to the standby.
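For that alternative, restoring the missing logs on the primary might look like this (the sequence numbers are placeholders):

    RMAN> RESTORE ARCHIVELOG FROM SEQUENCE 100 UNTIL SEQUENCE 150;

Once the logs are back on disk, gap resolution should ship them to the standby automatically.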
Regards,
Sunand

Similar Messages

  • ORACLE RAC failure scenarios

    Hello,
    We have heard all the good points about RAC, and many of them are true, but I want real experience of cases where even a well-configured RAC failed and had unplanned downtime.
    Can anyone describe the failure scenarios as well? I understand the very basic ones, for example the interconnect failing or the SAN failing, but please share some real-life experience where even Oracle Customer Service took not hours but days to resolve the problem and simply termed it a bug.
    Thanks,
    S.Mann

    I agree with Andreas and I think it's important to point out that the issues he mentioned (networking issues as well as other communication problems) are typically more common when RAC is deployed on a platform that isn't completely familiar to the implementor. That is, if you run Oracle on Windows servers, then deploying RAC on Linux successfully will probably be difficult.
    My standard answer for "what's the best platform for RAC?" is to run RAC on the platform that you know the most about. When you're building a system to house your most critical applications, wouldn't you want to build it on the platform that you know the most about?

  • Data load - Failure Scenarios

    Can anyone please explain in detail how to handle these data load issues? I would like to understand the following scenario in BI 7.0 and R/3 4.6c:
    1) R/3 --> PSA --> DSO --> CUBE
    So, there is one InfoPackage and 2 DTPs involved.
    Let us take a scenario. Please explain to me in detail (steps) how to fix the load if it fails in R/3 --> PSA, PSA --> DSO, or DSO --> CUBE.
    I would appreciate your help in advance, and points will be rewarded, of course.
    BI Developer

    Hi,
    1) Generally a load fails in R/3 --> PSA due to an RFC connection issue. You can check the RFC connection in SM59.
    If it fails, you can set the status to red, delete the failed request from the PSA, and load it again.
    2) A load from PSA to DSO may fail for many reasons:
    a) Lock issue: in this case check the lock in SM12, wait for the lock to be released, then set the QM status to red, delete the request from the target, and repeat the load.
    It may also be locked by an attribute change run; in this case too, you have to repeat the load after the ACR completes.
    b) The load failed because the last delta failed. In this case you first have to rectify the last delta, then repeat this load.
    3) DSO --> CUBE:
    a) Here too the load may fail due to a lock issue. In this case you also have to delete the request from the target after setting the QM status to red; once the lock is released, repeat the load.
    b) While loading data from DSO to InfoCube, the DTP run failed due to a short dump. The error analysis gives the description DBIF_RSQL_SQL_ERROR.
    This error is usually seen when the table behind the ODS is being updated from more than one source system simultaneously. Here the data packets from more than one request contain one or more common records. This gives rise to a deadlock while inserting records into the ODS table, and subsequently a short dump is encountered, leading to the failure of one request. The solution is to cancel the loads and run them serially. A possible long-term solution is to load the data up to the PSA in parallel and then load the ODS table from the PSA serially. (Here change the InfoPackage to the option 'PSA only' and tick 'Update subsequent data targets'.)
    Solution
    It may be possible that the job is set up in such a way that the activation of data in the ODS takes place only after data is received from all four regions. Thus, if the failing request is deleted, the correct requests will also be deleted. Hence it is required to change the status of the failed job from red/yellow to green, activate the ODS data, and then restart the load from the (failing) source after correction of any possible errors.
    c) While loading the active data from the first ODS into a cube, it fails with the following errors:
    err #1) Data Package 1: arrived in BW; Processing: Error records written to application log
    err #2) Fiscal year variant C1 not expected
    Sol: Go to SPRO (SAP Customizing guide) and open the tree:
    SAP NetWeaver
    SAP Business Information Warehouse
    Maintain Fiscal Year Variant
    Hope this helps you.
    Regards,
    Debjani

  • HT6154 What are the DHCP timers in a failure scenario for common iOS versions?

    If the DHCP server is down, how many DHCP DISCOVERs will be sent, and what is the timeout on each DHCP DISCOVER while waiting for a DHCP OFFER?
    I'd like the answer for all common and recent iOS versions.

    That sounds more like a question for the developer forums than for the user-to-user support forums.

  • Streams with DataGuard design advice

    I have two 10gR2 RAC installs with DataGuard physical copy mode. We will call the main system A and the standby system B. I have a third 10gR2 RAC install with two-way Streams Replication to system A. We will call this RAC system C.
    When I have a failure scenario with system A, planned or unplanned, I need for system C's Streams Replication to start replicating with system B. When the system A is available again I need for system C to start replicating with system A again.
    I am sure this is possible, and I am not the only one who wants to do something like this, but how? What are the pitfalls?
    Any advice on personal experience with this would be greatly appreciated!

    Nice concept, and I can only applaud its ambition.
    +"I am sure this is possible, and I am not the only one who wants to do something like this."+
    I would like to share your confidence, but I am afraid there are so many pitfalls that success will depend on how much pain you and your hierarchy can cope with.
    Some thoughts:
    Unless your Data Guard is synchronous, at the very moment A fails there will be transactions missing in C that may already have been applied in B, as Streams is quite fast. This alone tells us that a forced switch cannot be guaranteed consistent: you will have errors, and some nasty ones, such as sequence numbers consumed
    on A (or B) just before the crash, already replicated to B (or A) but never shipped to C. Upon waking, C will
    re-emit values already known on B (duplicate key on index?).
    I hope you don't sell airplane tickets, for in such a case you could sell some seats twice.
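    As a rough illustration of how you might quantify that divergence after a crash of A (the views are standard ones; mapping them onto this A/B/C scenario is my own sketch):
        -- On B, the activated physical standby: the SCN it recovered to
        SQL> SELECT CURRENT_SCN FROM V$DATABASE;
        -- On C: how far the Streams apply of A's changes actually got
        SQL> SELECT APPLY_NAME, APPLIED_MESSAGE_NUMBER FROM DBA_APPLY_PROGRESS;
    Any difference between the two is work that one site has and the other never received, and it has to be reconciled by hand.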
    Does C have to appear as another A, or is it allowed to have a different DB_NAME? (How will you set up C in B?
    Is C another A that retakes A's name, or is C a distinct source?) If C must have the same DB_NAME, the global
    name must be the same, and your TNS configuration will have to cope with 2 identical TNS entries in your network
    referring to 2 different hosts and databases. Possible with cascaded lines at (ADDRESSES= ... , but to be tested.
    If C is another A, then C must have the same DB_NAME, as LCRs carry their origin DB name inside them.
    If C has a distinct name from A, it must have its own apply process - not a problem, it will be idle while A is alive -
    but also a capture process that captures nothing while A is alive, for it is the capture site that is supposed to send the ticks
    that advance the counters on B. Since C will be down in normal times, you will have to emulate this feature by periodically
    resetting the first_scn manually for this standby capture - you can jump archives provided you jump to another
    archive with a built-in data dictionary - or accept to create a capture on B only when C wakes up. The best would
    be to consider C as a copy of B (multi-master DML+DDL?) and re-instantiate the tables without transferring any data,
    setting the apply and capture on C and B to whatever SCN is found to be the maximum on both sites when C wakes up.
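    As a quick check before resetting anything, you can see where such an idle capture stands; the reset itself would go through DBMS_CAPTURE_ADM, with parameters depending on your setup:
        SQL> SELECT CAPTURE_NAME, FIRST_SCN, START_SCN FROM DBA_CAPTURE;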
    All this is possible, with lots of fun and hard work.
    As for the return of the Jedi - I mean A, after it has recovered from its crash - you did not tell us whether the re-sync C-->A
    is hot or cold. Cold is trivial, but if it is hot, it can be done by configuring a downstream capture from C to A with
    the initial SCN set to around the crash and re-loading all the archives produced on C. But then C and A must have different
    DB_NAMEs, or maybe setting a different TAG will be enough - to be tested. There will also be the critical switch
    from the multi-master replication C<-->B to A<-->B. This alone is a masterpiece of table re-synchronization.
    In all cases, I wish you a happy and great project, and I am eager to hear about it.
    Last, there was a PDF describing how to deal with a Data Guard switch for a database using Streams, but it is of little help,
    for it assumes that the switch is gentle: no SCN missing. I did not find it again, but maybe somebody can point to a link.
    Regards,
    Bernard Polarski

  • Exception handling in synchronous proxy - web service scenario

    Hi Gurus,
    I have a synchronous scenario in which SAP is sending a request via XI using SOAP and receiving a response back. As part of this scenario, I am consuming standard web service APIs provided by the third party.
    Since every request has to contain the connecting user id and password provided by the third party, I am sending/receiving messages without a SOAP envelope (achieved by ticking the 'Do not use SOAP envelope' checkbox in the SOAP receiver communication channel).
    For this scenario, we are including the user id and password in the request message using XSLT mapping and the request number using simple message mapping.
    The fault message of the web service is being mapped to the fault message created in XI under Fault Message Types.
    The interface mapping page has got 3 tabs, one each for Request message mapping, Response message mapping and Fault message mapping.
    When I try to test a failure scenario by giving an incorrect request number (since this is the only input parameter in the request message apart from user id and password), it throws a "MAPPING">EXCEPTION_DURING_EXECUTE error.
    Actually, for such requests I am getting a proper fault response back from the third party, which I can see in XI (in moni) as the response to my request, but when I look at the message in moni in SAP, I only see "MAPPING">EXCEPTION_DURING_EXECUTE. I can even see the exception in the trace section of my response in moni in XI.
    My feeling is that the fault message mapping is not getting executed at all.
    I also thought of doing a 2:1 multi-mapping in which the target side would contain the response message type created in XI, but the source would contain two messages, i.e. the normal response message structure provided by the third party and the fault message structure provided by the third party. However, I am not sure whether this is possible without using BPM.
    Please suggest the best way to resolve this issue.

    Is the fault message raised by the 3rd-party service structured as follows?
    HTTP/1.1 500 Internal Server Error
    Content-Type: text/xml; charset="utf-8"
    Content-Length: nnnn
    <SOAP-ENV:Envelope
      xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
       <SOAP-ENV:Body>
           <SOAP-ENV:Fault>
               <faultcode>SOAP-ENV:Server</faultcode>
               <faultstring>Server Error</faultstring>
               <detail>
                   <e:myfaultdetails xmlns:e="Some-URI">
                     <message>My application didn't work</message>
                     <errorcode>1001</errorcode>
                   </e:myfaultdetails>
               </detail>
           </SOAP-ENV:Fault>
       </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>
    Take a look here:
    /people/jin.shin/blog/2007/05/21/handling-web-service-soap-fault-responses-in-sap-netweaver-xi
    and to this standard document:
    http://help.sap.com/saphelp_nwpi711/helpdata/en/48/5946db5f693912e10000000a42189b/content.htm

  • Factory Finder failure and _tmmsgrcv() SEGV

    Hello,
    we have a rather strange problem with WLE 5.1. One type of process, QD,
    occasionally experiences a SEGV inside _tmmsgrcv(). The SEGV is usually
    preceded by a failure from the factory finder.
    QD is a process that converts /Q messages into Corba requests. TMQFORWARD is
    used to get the messages out of the /Q, and to invoke the Tuxedo service in
    QD. QD then finds a Corba object that can handle the operation and invokes
    the operation. There are multiple instances of TMQFORWARD, QD and CorbaApp
    for the same service/object.
    The functionality of QD is roughly this:
    try {
        f = find_on_factory_by_id( name);
        if (CORBA::is_nil( f)) {
            log( "no factory found");
            return RETRY;
        }
        else {
            h = Obj::_narrow( f);
            if (CORBA::is_nil( h)) {
                log( "narrow failed");
                return RETRY;
            }
        }
    }
    catch( ...) {
        log( "find factory failed");
        if (tooManyFactoryFailures)
            assert();
        return( RETRY);
    }
    try {
        h->ping(); // no-op to check that object is available
    }
    catch( INVALID_TRANSACTION) {
        log( "object is busy");
        CORBA::release( h);
        return( RETRY);
    }
    catch ( ...) {
        log( "ping failed");
        CORBA::release( h);
        return( RETRY);
    }
    try {
        h->doOp( msg);
    }
    catch( ...) {
        log( "operation failed");
        CORBA::release( h);
        return( RETRY);
    }
    CORBA::release( h);
    return SUCCESS;
    The outer function will call tpreturn indicating success, retry or abort.
    Now, most of the time all is ok and this works for days. But once in a while
    there is a problem.
    The outcome is either an assert, because most attempts to find the factory
    fail (there still are a few successes in between), or a SEGV on getting the
    next message to be handled by the QD (it's a different message, not a
    retry).
    #0 0xfbaa4274 in _tmmsgrcv () from /opt/wle/lib/libtux.so.65
    #1 0xfbad4c1c in _tmrcvrq () from /opt/wle/lib/libtux.so.65
    #2 0xfbad6b58 in _tmrunserver () from /opt/wle/lib/libtux.so.65
    #3 0xfbaba140 in _tmstartserver () from /opt/wle/lib/libtux.so.65
    #4 0x4346c in main (argc=19, argv=<incomplete type>, N=<error type>) at
    BS-6950.c:74
    The failure scenario starts with a factory failure (log "find factory
    failed", exception TRANSACTION_ROLLEDBACK). QD returns retry, and the
    message remains on the /Q (with a delayed retry).
    After that the behaviour is not consistent.
    Sometimes QD crashes (SEGV) on receiving the next message.
    Sometimes QD seems to succeed processing the next message (no errors logged,
    no exceptions thrown), but the Corba Object never gets the message.
    Sometimes QD fails again to get the factory, but the exception is either
    INTERNAL or BAD_PARAM.
    The above may be repeated in any combination until the final outcome (assert
    or SEGV).
    In one case (but only one) there was a LIBTUX_CAT:518: ERROR.
    In some cases there is a LIBTUX_CAT:489: ERROR.
    My suspicion is that there is a memory corruption somewhere. I have reviewed
    our code many times and have not been able to find anything. I have also
    searched BEA and the web, but have not found anything relevant.
    Any suggestions?
    Roger

    Hi Todd,
    thanks for your reply.
    I started to answer your individual questions, but then we had a different
    crash (in tmfmsgfree). I submitted that to BEA, and they found a known fix
    in 8.1. Hopefully they will port it to WLE 5.1.
    "Have you tried running Purify or a similar tool to see if that helps?" Yes, some time ago, to fix a memory leak. At that time I didn't look specifically for this problem, but I would have noticed anything unusual.
    "Also, have you been able to create a consistent reproducer?" We cannot reproduce it. On our performance test bed it happens about once every other day (with 132 QD processes handling 3.5M messages). At the customer site it happens two or three times a day, even though they have only about 2M messages per day.
    "BEA Support can help you." We created a case and submitted a stack trace and core dump. They did not find anything unusual.
    "...but no releases were evident for your factories." That was my mistake in presenting the simplified code.
    "I'm also puzzled by your tooManyFactoryFailures assertion." Before I started working on the project, it was noticed that once in a blue moon a QD process would enter a state where it would experience many factory finder failures. No cause was found, and the assert was added to recover from this. It was an early indication of the current problem. Since then the factory failures have become more frequent and more severe (SEGV). That is why I am looking at it again.
    "Is there a reason you are hitting the factory finder each time?" Again, it is historical. I believe it was done to better do load-sharing.
    Roger

  • How to design a grid to withstand a partial network failure

    Hi,
    We are evaluating Coherence for a mission-critical system where we want to test a partial network failure scenario. We want to run 4 physical hosts and 8 JVMs, with 2 JVMs on each host. The evaluation criterion is to connect 2 machines on either side of a router, kill one side during a load test, thereby disconnecting the 2 machines, and run with the remaining two. In order to have fail-safe behavior in this scenario, I guess we must ascertain that the backups for the objects on one side of the router are always made on the other side. Can Coherence detect such a network setup and store backups accordingly? Or is there a way to configure this by overriding the default behavior?
    Please advise.
    Thanks,
    Sairam

    Hi Sairam
    If you use scenario 1) then your test will work. As this scenario has only two machines, the primary node for a piece of data will be on one machine and Coherence will make sure the backup is on the other machine. If you then break the link between the machines or lose a machine, you will not have lost data.
    If however you have more than 2 machines and you then break the link between them, you have what is known as split-brain - which means you have effectively split your cluster in two. Both sides only know that they can no longer see the other part of the cluster, and each assumes it must be the remaining working part. In this case, though, you will have lost data from the cluster, as some of the backups for each part of the cluster will be on the other part. There is nothing you can do about this; you cannot control which machines backups are allocated to.
    Increasing the backup count to 3 does not give you any more resilience than having a backup count of 2. As far as I know, Coherence only guarantees that the placement of the first backup is on another machine.
    I am not quite sure what you are trying to test as a Coherence cluster cannot automatically survive a network failure that splits the cluster. There are things in 3.6 that you might be able to do with Quorums to mitigate the damage while you recover and there are things you can do to make recovery easier - but you will have to recover lost data.
    JK

  • Cluster point of failure

    I'm trying to set up an environment where, if my primary web server goes down, requests will be sent to the backup. I think clustering can help me here, but my fear is that I have a single point of failure on the managing server. If I have a cluster, is one machine managing all traffic? And if that machine were to go down, would my entire site be down? Any suggestion on how to handle this at the router level would be appreciated also.
    Scott

    I'm not sure I understand your question completely.
    You can certainly run multiple managed servers and/or a cluster of managed servers to give you some redundancy.
    You can run multiple physical and/or virtual machines.
    You can run multiple sites etc for disaster recovery.
    I can't recall a site I've visited in a long time that didn't do all of these.
    Was there a specific question you had about HA or failure scenarios?
    -- Rob
    WLS Blog http://dev2dev.bea.com/blog/rwoollen/

  • Authorization Failure Redirect URL in OAM

    Hi,
    From OAM policies I want to redirect a user to an Authorization Failure page by configuring the redirect URL for Authorization Failure. But the user is always redirected to the OAM operation error page (with an error message that the URL .. has been denied for the user) in case of an authorization failure. How can I redirect the user to my AuthFail.html page? I am able to redirect the user to an Authentication Failure page in case of authentication failure, but I am not able to redirect in case of authorization failure. How can I achieve this?
    Thanks & Regards,
    Srikanth

    Hi,
    I am new to OAM and facing the same error in Authz Rule. Did your issue get resolved?
    When I tested the URL with the Access Tester for the authz failure scenario, I got Authorized Inconclusive.
    I understand that if I put AuthFail.html in the redirection URL for Authz Inconclusive, the user would see the appropriate error page. But I wanted to understand the reason for the authorization ending up in the inconclusive condition. Can someone provide me clarity on this?
    Thanks!

  • Interchange Acknowledgment TA1 In Different Outbound Scenarios!

    Outbound Scenario:
    The scenario I have here is that remote trading partner TP1 doesn't need TA1s in either success or failure scenarios, while TP2 needs TA1s in both success and failure scenarios. How can this be achieved with a single document definition (e.g. 834) created under a single version (5010X220A1)?
    I have a host trading partner and multiple remote trading partners configured, with one document definition created for both trading partners. While defining the version, we have to set Interchange Acknowledgment Requested under the Interchange tab to either 0 or 1 based on the requirement, and TA1 to Always or OnError. Since the TA1 requirement is specified while configuring the document definition itself, how can this be achieved when the trading partners have different requirements?
    If each trading partner had different vendors there would be no issue, as I would create a different version and update the Interchange tab accordingly as per the requirement. So the issue is: if all the trading partners require the same version, how can this be achieved?
    Hope I am making sense; please correct me if I am wrong...
    Thanks

    It is supported out of the box. Please follow the steps below:
    1. Select the remote TP profile and navigate to the "Documents" tab
    2. Select your doc def and check the box against "Override Version Param"
    3. Now in TA1 drop-down select the desired value
    4. Delete the corresponding agreement and recreate it again
    5. Deploy it and run a test again
    This way you can have a different value for the TA1 flag for different TPs in spite of using the same doc def.
    Regards,
    Anuj

  • PG JTAPI and CM failure, how to connect with others?

    Hi All;
    The main site has UCCE side A (RGR-a, PG-a, HDS&ICM&Webview-a), CVP servers and VXML servers, a gatekeeper, two voice gateways, and a CUCM cluster (publisher and 3 subscribers).
    The disaster recovery site has UCCE side B (RGR-b, PG-b, HDS&ICM&Webview-b), CVP servers and VXML servers, a gatekeeper, two voice gateways, and 3 CUCM subscribers, while the publisher is in the main site, so the CUCMs are clustered with the main site.
    If the CUCM cluster in the main site had a failure (the publisher and the 3 subscribers), what configuration would need to be done in the main site at UCCE side A to let it communicate with the CUCM subscribers in the disaster recovery site? Can this failover be automatic, or do I have to do it manually?
    Your kind advice is highly appreciated.
    Regards
    Bilal

    Hi Bilal,
    One PG can connect to only one CUCM CTI Manager. The redundancy is between PG A and PG B.
    So, if the CTI Manager that PG A connects to fails, the PG B JGW process will become active and will communicate with the CTI Manager that PG B connects to.
    That covers the connectivity of the PG to the CTI Manager. But the CTI RPs may register to a different subscriber depending on how the CallManager group is defined on the device pool assigned to the CTI RP.
    You should set up the CallManager group for the CTI RPs such that it contains both the main-site and failover-site CallManagers.
    Here are some scenarios.
    Let's say you have
    sub1 and sub2 on the main site
    sub3 and sub4 on the recovery site
    PG A connects to the sub1 CTI Manager, and PG B connects to the sub3 CTI Manager. The CallManager group for the CTI RPs (sub1, sub2, sub3 and sub4, in order) contains all the CallManagers.
    Failure scenarios:
    1. If sub2 fails - the JGW process on PG B will be active, but the CTI RPs will be registered to sub2
    2. If sub1 fails - the JGW process will remain active on PG A, but the RPs will be registered to sub2
    3. If both sub1 and sub2 fail - the JGW process will be active on PG B and the RPs will be registered on sub3
    Hope this helps.
    Thanks
    - abu

  • SCE vlan translation with connection-mode inline on-failure bypass

    Hi guys,
    Can anyone tell me the behaviour of the SCE (1010 or 2020) in a failure scenario when configured with the following commands?
    connection-mode inline on-failure bypass
    VLAN translation increment value 5
    While the SCE is in failure mode, will the VLAN increment between the SUB and NET ports still be applied (i.e. applied in hardware while the control engine is being bypassed), or will all frames be passed from the SUB to the NET ports unaltered?
    Regards,
    Brett.

    Reading through the CLI:
    int LineCard 0
    connection-mode inline on-failure bypass
      VLAN translation ?                     
      decrement  Vlan will be decremented on network port and incremented on subscriber port
      increment  Vlan will be incremented on network port and decremented on subscriber port
    So traffic coming in on either the network side or the sub side will get its VLAN either incremented or decremented for VLAN translation.
    Is there a need to do this? What scenario are you considering using it in? If the traffic is bypassed, as shown, why do you need to massage it further?

  • Recovery scenarios

    Hi,
    I am doing testing on recovery. Can someone guide me to a link where I can find some good recovery scenarios?
    I have gone through the following one:
    http://docs.oracle.com/cd/B12037_01/server.101/b10735/recov.htm
    I appreciate your help.

    What I would do is set up a test recovery database.
    Then I would document how the failure scenario was set up, along with the recovery plan.
    I would start with something like this :
    RMAN Block Media Recovery
    RMAN Cancel Based Recovery
    RMAN Duplicate Database on new Host
    RMAN Log Sequence based recovery
    RMAN loss of all Control files No Catalog
    RMAN Loss of all Control Files
    RMAN loss of all database files including SPFILE
    RMAN Loss of data file and no Catalog
    RMAN Loss of file containing Online Undo Segment
    RMAN Loss of INACTIVE Online Redo Log Group
    RMAN Loss of Media
    RMAN Recovering Archived Logs Only
    RMAN Recovering Datafile for which no backup exist
    RMAN recovery from loss of all online redo log files
    RMAN Recovery of a Datafile to a different location
    RMAN Recovery of Databases with Read-Only Tablespaces
    RMAN Recovery of LOSS SYSTEM TABLESPACE
    RMAN Recovery of Read-Only Tablespace and Control file
    RMAN Tablespace Point in Time Recovery
    RMAN Time Based Recovery
    You can google some of these to get ideas for scenarios.
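    To give a flavor of one of these drills, here is a minimal sketch of an RMAN time-based recovery (the timestamp is hypothetical):
        RMAN> RUN {
                SET UNTIL TIME "TO_DATE('2012-01-15 14:00:00','YYYY-MM-DD HH24:MI:SS')";
                RESTORE DATABASE;
                RECOVER DATABASE;
              }
        RMAN> ALTER DATABASE OPEN RESETLOGS;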
    I would do a cold backup in case the RMAN test fails.
    Make sure you have a method of adding data before/during the "failure" so you can test the results of the recovery.
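    For example, a throwaway marker table (the name here is made up) gives you something cheap to check after each recovery:
        SQL> CREATE TABLE recovery_marker (id NUMBER, noted DATE);
        SQL> INSERT INTO recovery_marker VALUES (1, SYSDATE);
        SQL> COMMIT;
        -- after the recovery, confirm the committed row survived
        SQL> SELECT * FROM recovery_marker;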
    Also stop and think about recoveries I have not listed.
    Happy testing.
    Best Regards
    mseberg

  • Oracle DB Can't Survive ZFS SA Controller Failover

    We are running two new Sparc T4-1 servers against a ZFS SA with two heads and a single DE2-24P disk shelf. It is configured with a single pool for all the storage. Our servers are clustered with VCS as an active/passive pair, so only one server accesses storage at a time. The active server runs the Oracle Enterprise DB version 12c, using dNFS to connect to the shares. Before deployment, we are testing out various failure scenarios, and I was disheartened to see that the Oracle DB doesn't handle a controller failover very well. Here's how I tested:
    My DBA kicked off a large DB import job to provide some load.
    I logged in to the secondary head, and issued "takeover" on the "cluster" page.
    My DBA monitored the DB alert log, and reported everything looking fine.
    When the primary head was back online, I logged in to it, and issued "takeover" on the "cluster" page.
    This time things didn't go so well. We logged the following:
    Errors in file /u04/app/oracle/diag/rdbms/aasc/aasc/trace/aasc_arc2_1296.trc:
    ORA-17516: dNFS asynchronous I/O failure
    [the ORA-17516 line above repeated 24 times]
    ARCH: Archival stopped, error occurred. Will continue retrying
    Tue Aug 12 14:25:14 2014
    ORACLE Instance aasc - Archival Error
    Tue Aug 12 14:25:14 2014
    ORA-16038: log 15 sequence# 339 cannot be archived
    ORA-19510: failed to set size of  blocks for file "" (block size=)
    12-AUG-14 14:32:03.424: ORA-02374: conversion error loading table "ARCHIVED"."CR_PHOTO"
    12-AUG-14 14:32:03.424: ORA-00600: internal error code, arguments: [kpudpcs_ccs-1], [], [], [], [], [], [], [], [], [], [], []
    12-AUG-14 14:32:03.424: ORA-02372: data for row: INVID : ''
    12-AUG-14 14:32:03.513: ORA-31693: Table data object "ARCHIVED"."CR_PHOTO" failed to load/unload and is being skipped due to error:
    ORA-02354: error in exporting/importing data
    ORA-00600: internal error code, arguments: [kpudpcs_ccs-1], [], [], [], [], [], [], [], [], [], [], []
    My DBA said that this was a very risky outcome, and that we certainly wouldn't want this to happen to a live production instance.
    I would have hoped that the second controller failover would have been invisible to the Oracle instance. What am I missing?
    Thanks.
    Don

    Your FRA filled up.
    You are getting ORA-16038: log 15 sequence# 339 cannot be archived.
    This means there is no more space in the FRA.
    You need to clean up the FRA, or enlarge it, for example:
    SQL> alter system set db_recovery_file_dest_size=18G;
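    To confirm the space pressure and free some room, something like the following (only delete archived logs that really have been backed up):
        -- how full is the FRA?
        SQL> SELECT space_limit/1024/1024 AS limit_mb, space_used/1024/1024 AS used_mb
             FROM v$recovery_file_dest;
        -- remove archived logs that already have at least one backup on disk
        RMAN> DELETE ARCHIVELOG ALL BACKED UP 1 TIMES TO DEVICE TYPE DISK;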
    http://oraclenutsandbolts.net/knowledge-base/oracle-data-guard/65-oracle-dataguard-and-oracle-standby-errors
