7410 Cluster Recovery

Hi,
I'm working on a Try-and-Buy configuration of a Unified Storage 7410 Cluster.
After (intentionally) deleting the only storage pool via the GUI on the primary unit, the secondary cluster member stopped responding, so I tried a reboot. After the reboot, the serial console showed it trying to rejoin the cluster with a spinning ASCII-art dash, but it just kept on spinning and spinning. So I removed the cluster configuration on the primary unit and, via the SSH CLI, told the second unit to factory-reset. Since that took a long time (20+ minutes), I suspected an error and rebooted.
Then, after GRUB, I was shown the following:
svc.configd: smf(5) database integrity check of:
/etc/svc/repository.db
failed.
which I was able to fix by following the instructions and running
/lib/svc/bin/restore_repository
Now the unit boots again; however, it hangs at the following screen, which refreshes periodically:
Sun Storage 7410 Configuration
Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
NET-0 <=>  NET-1 <X>  NET-2 <X>  NET-3 <X>
SUNW-MSG-ID: AK-8000-2U, TYPE: Defect, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Apr 23 17:53:21 UTC 2009
PLATFORM: i86pc, CSN: 0912QAF048, HOSTNAME: head2
SOURCE: aktty, REV: 1.51
EVENT-ID: 2c8d4cd4-5e88-ca68-e871-e1ef0af6c040
DESC: The appliance experienced an unrecoverable protocol error while
attempting to join a cluster of appliances.  Refer to
http://sun.com/msg/AK-8000-2U for more information.
AUTO-RESPONSE: No automated response is possible for this defect.
IMPACT: The appliance cannot be configured into the cluster.
REC-ACTION: Reboot the system and attempt to configure the cluster again.
Contact your vendor for support.
ESC-3: Halt   ESC-4: Reboot   ESC-5: Info
For help, see http://www.sun.com/7410/
I've rebooted the box several times, with the primary unit powered on and powered off, but this message stays the same.
I would like to re-attempt the factory reset. However, this secondary unit does not have an IP address, so I cannot log in to the web interface to do it, nor can I access the CLI, as I'm stuck inside this refreshing screen.
Is there any way I can get to the CLI from this screen (or anywhere during the boot sequence) so that I can retry the factory reset?
Looking forward to your responses.
Thanks,
Frans

Ok, since there isn't very much documentation on the topic yet, I've played around a bit more.
I rebooted and, via GRUB, booted into single-user mode.
Sniffing around the filesystem, I found a number of ZFS snapshots:
# zfs list -t snapshot
NAME                                     USED  AVAIL  REFER  MOUNTPOINT
system/home@install                       18K      -  21.5K  -
system/stash@install                      16K      -    18K  -
system/svc@install                      3.94M      -  4.78M  -
system/[email protected]_1-1.1  19.0M      -  23.8M  -
system/var@install                       844K      -  31.4M  -
So I rolled back to the original (@install) versions:
# zfs rollback system/home@install
# zfs rollback system/stash@install
# zfs rollback system/svc@install
cannot rollback to 'system/svc@install': more recent snapshots exist
use '-r' to force deletion of the following snapshots:
system/[email protected]_1-1.1
# zfs rollback -r system/svc@install
svc.configd: Fatal error: /etc/svc/repository.db: db error: database is locked
# zfs rollback -r system/var@install
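The rollback sequence above can be sketched as a small Python helper that reads text in the `zfs list -t snapshot` layout and builds one `zfs rollback` command per dataset that has an `@install` snapshot, adding `-r` when that dataset carries other snapshots. This is a minimal sketch under assumptions: the helper name is hypothetical, it only builds command strings and runs nothing, and `system/svc@upgrade` in the example stands in for the post-install snapshot. As the transcript shows, rolling back a dataset that is in use (system/svc) can still fail with a locked repository.

```python
from __future__ import annotations


def rollback_commands(snapshot_listing: str, target: str = "install") -> list[str]:
    """Build `zfs rollback` commands that return each dataset to @<target>.

    `snapshot_listing` is text in the layout printed by `zfs list -t snapshot`:
    a header line, then one snapshot per line with its name in the first column.
    """
    snaps_by_dataset: dict[str, list[str]] = {}
    for line in snapshot_listing.strip().splitlines()[1:]:  # skip the NAME header
        fields = line.split()
        if not fields or "@" not in fields[0]:
            continue
        dataset, snap = fields[0].split("@", 1)
        snaps_by_dataset.setdefault(dataset, []).append(snap)

    commands = []
    for dataset, snaps in snaps_by_dataset.items():
        if target not in snaps:
            continue  # this dataset has no @<target> snapshot to return to
        # `zfs rollback -r` destroys only snapshots newer than the target and leaves
        # older ones alone, so adding it whenever other snapshots exist is harmless.
        flag = "-r " if len(snaps) > 1 else ""
        commands.append(f"zfs rollback {flag}{dataset}@{target}")
    return commands


if __name__ == "__main__":
    # Hypothetical listing; "system/svc@upgrade" stands in for the upgrade snapshot.
    listing = """NAME                  USED  AVAIL  REFER  MOUNTPOINT
system/home@install    18K      -  21.5K  -
system/svc@install   3.94M      -  4.78M  -
system/svc@upgrade   19.0M      -  23.8M  -
system/var@install    844K      -  31.4M  -
"""
    for cmd in rollback_commands(listing):
        print(cmd)
```

Generating the commands first and reviewing them before typing them at the single-user prompt keeps the destructive `-r` cases visible.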
And after a reboot, I see:
Sun Storage 7410 Version ak/SUNW,[email protected],1-1.9
Copyright 2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Configuring devices.
Checking hardware configuration ... done.
Starting appliance configuration .......... done.
Press any key to begin configuring appliance: [*]
Yay!
Now, the version string after the reboot doesn't match the 2008.11.20.0.1_1-1.1 release I was running at installation, so it seems I haven't completely restored the system to what it should be, but at least I have some control again. I'll try to upload the latest software again and see if I can get it all the way back up and running.
Frans

Similar Messages

  • Storage 7410 cluster - separating "admin" traffic from "storage" traffic

    Please help me figure out a strategy here. We have a Storage 7410 cluster running in an active/passive mode. On each node, I have cabled nge0 and nge1 each to 100Mbps ports and nxge0 and nxge1 to 10Gbps ports. I have configured nge0 to be the "admin" interface for node 1, and nge1 the same for node 2. I have aggregated nxge0 and nxge1 via LACP and it's currently owned by node 1 (fails over to node 2 nicely). Here's the basic layout:
    Node 1
    nge0 -> active "admin" interface -> ip address 172.16.158.33
    nge1 -> inactive (owned by Node 2) "admin" interface
    nxge0/nxge1 -> active LACP aggregate "aggr1" -> ip address 172.16.158.32
    Node 2
    nge0 -> inactive (owned by Node 1) "admin" interface
    nge1 -> active "admin" interface -> ip address 172.16.158.41
    nxge0/nxge1 -> inactive (owned by Node 1) LACP aggregate
    What's confusing me is routing. Right now all interfaces have IPs on the same subnet. I can define a default route for the gateway on that subnet (172.16.158.1) on the "aggr1" LACP, but only Node 1 gets routed. So, I can add two additional default routes to the same gateway, reflecting each of the other NICs (nge0, nge1). But the way I understand it, there's no guarantee that IP traffic that originated on aggr1 will return via that same interface. Or am I mistaken? Essentially, I want to segregate "storage" traffic from "admin" traffic, and I want to make sure that any host connecting to the "storage" IP address takes full advantage of the 10Gbps aggregate.
    Any ideas are welcome.
    Charles

    My assumption above was correct. At some point, traffic started favoring nge0, so my performance dropped from ~200MB/s to about 60MB/s (expected results with Windows VMs on VMware with an NFS datastore). It looks like I may have to abandon the nge ports and lose the LACP aggregate (at least until I can get a second nxge NIC in each head). Is that all I can do? Any ideas are appreciated.
    Charles

  • 7410 Cluster Admin Guide (or decent documentation)?

    I'm trying to install a clustered 7410 system. The documentation for doing this is bad or nonexistent. In a different thread (http://forums.sun.com/thread.jspa?threadID=5388662&tstart=0), the poster claimed that there's a Cluster Admin Guide available that clears up some of the questions about setting up a clustered 7410. Does anybody know if this is available online? Or is there any other good source of documentation for clustered 7410 systems?
    Thanks,
    Jorge

    I am told that the cluster guide should be in the latest OS release online help facility (2009.Q2.05.x). If it is not then it will certainly be in the next major release to be out by the end of the month (2009.Q3.x). I don't see it in the 2009.Q2 release that we are running, so it may be a couple of weeks away in 2009.Q3.

  • Weblogic Cluster recovery of servers declared dead

              Hi
              1) When the HungServerRecoverSecs time is over and the server is declared dead,
              why does the server not behave as part of the cluster even when it is brought
              back up?
              2) What steps need to be taken to make sure that the server behaves as part
              of the cluster once it is brought back up?
              3) Is there a way to restart any one of the clustered boxes
              without affecting the others' behaviour as part of the cluster, and then push
              the one we restarted back into this cluster?
              Thanks
              Mausam
              

              Mausam wrote:
              > Hi
              > 1)When the HungServerRecoverSecs time is over and the server is declared dead,
              > why does the server not behave as a part of the cluster even when it is brought
              > up?
              This is not true. When the HungServerRecoverSecs time is over, we just assume that the server
              is hung and not responding, and we fail over to the secondary server at that point. When you
              recycle such a server, it should become part of the dynamic list of servers used by the proxy
              to load balance. If this is not the behavior you are seeing, it could be a bug.
              Follow with [email protected]
              >
              > 2) What steps need to be taken to make sure that the server behaves as part
              > of the cluster once it is brought back up?
              None on your end. With the next HTTP request, we will rewrite the server-list HTTP
              header that is parsed by the proxy.
              >
              > 3)Is there a way by which we can restart any one of the boxes clustered together
              > without affecting the other ones behaviour
              Yes, you can!
              > as part of the cluster, and then push
              > the one we restarted back into this cluster?
              Yes, you can add or delete as many servers as you want in the cluster at any time.
              >
              >
              > Thanks
              > Mausam
              Viresh Garg
              Principal Developer Relations Engineer
              BEA Systems
              

  • Storage 7410 CIFS primary group for a directory tree

    We're trying to achieve what are basically "folder-level quotas", so that a particular folder used by a particular set of users belonging to a particular Active Directory group is limited to a set quota. That's possible in Windows (and QFS, for that matter). Is there any way to achieve this in our Storage 7410 cluster? I have explored setting a "primary group" for the file, but that doesn't seem to be inherited (the way the GID can be in POSIX); it's set based on the "primary group" setting assigned to the USER.
    Basically, I'm asking how to set the NTFS equivalent of GID for a folder and have that "stick" to all subsequently created files/folders within that folder.
    Thanks!
    Charles
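    For reference, the POSIX inheritance mentioned above is the setgid bit on a directory: new files and subdirectories take the directory's group instead of the creator's primary group. A minimal Python sketch of that POSIX mechanism only (it says nothing about how the 7410 maps groups onto CIFS/NTFS ACLs); the function name is hypothetical, and with a single-group account the two GIDs coincide anyway, so the effect is clearest once the directory has been chgrp'ed to a shared group.

```python
from __future__ import annotations

import os
import stat
import tempfile


def setgid_demo() -> tuple[bool, int, int]:
    """Create a setgid directory, add a file, and report (bit_set, dir_gid, file_gid)."""
    with tempfile.TemporaryDirectory() as top:
        shared = os.path.join(top, "shared")
        os.mkdir(shared)
        # 0o2775 == rwxrwsr-x: the setgid bit makes new entries take the directory's group.
        os.chmod(shared, 0o775 | stat.S_ISGID)
        path = os.path.join(shared, "report.txt")
        with open(path, "w") as f:
            f.write("data\n")
        dir_info = os.stat(shared)
        return bool(dir_info.st_mode & stat.S_ISGID), dir_info.st_gid, os.stat(path).st_gid


if __name__ == "__main__":
    bit_set, dir_gid, file_gid = setgid_demo()
    print(f"setgid on directory: {bit_set}; dir gid {dir_gid}, new file gid {file_gid}")
```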

    Gcool wrote: Try using "adduser" (interactive version) instead and see if the behaviour remains the same.
    I used
    #sudo userdel -r mel
    to remove the account
    and then used
    #sudo adduser
    and went through the prompts
    By default it wanted to add the users group as primary, and I accepted it.
    Everything went fine with no errors, but yet again
    cat /etc/group shows
    storage:x:95:john,mel
    scanner:x:96:john,mel
    power:x:98:john,mel
    nobody:x:99:
    users:x:100:john
    dbus:x:81:
    interestingly if I do
    #sudo id mel
    I get
    uid=1001(mel) gid=100(users) groups=100(users),7(lp),10(wheel),50(games),91(video),92(audio),93(optical),95(storage),96(scanner),98(power)
    I am officially freaked!
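    The output above is actually consistent. A user's primary group is the GID field of their /etc/passwd entry, and the member list in /etc/group only has to name supplementary members, so `users:x:100:john` not listing mel does not stop `id mel` from reporting gid=100(users). The lp, wheel, games and other groups in the `id` output come from /etc/group lines not shown in the excerpt. A minimal sketch of how the two sources combine, using only the posted excerpt; `groups_for` is a hypothetical helper, not a system tool.

```python
from __future__ import annotations


def groups_for(user: str, primary_gid: int, group_file: str) -> list[tuple[int, str]]:
    """Combine a primary GID (from /etc/passwd) with /etc/group membership.

    Each /etc/group line is name:password:gid:member1,member2,... and the member
    list only names *supplementary* members of that group.
    """
    entries = []
    for line in group_file.strip().splitlines():
        name, _password, gid, members = line.split(":", 3)
        entries.append((int(gid), name, members.split(",") if members else []))

    # The primary group counts even though the user is not in its member list.
    result = [(gid, name) for gid, name, _ in entries if gid == primary_gid]
    result += [
        (gid, name)
        for gid, name, members in entries
        if user in members and gid != primary_gid
    ]
    return result


if __name__ == "__main__":
    excerpt = """storage:x:95:john,mel
scanner:x:96:john,mel
power:x:98:john,mel
nobody:x:99:
users:x:100:john
dbus:x:81:"""
    # gid=100 is mel's primary group according to the `id mel` output above.
    print(groups_for("mel", 100, excerpt))
    # → [(100, 'users'), (95, 'storage'), (96, 'scanner'), (98, 'power')]
```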

  • Clustered 7410 BUI screen dump

    I'm setting up a new 7410 cluster system, and have found the documentation for it very lacking. But, eventually, I think I got something working. The problem now is that, whenever I go to the Maintenance->System area, I get a screen dump of stuff and I don't know what it means. Here's the dump, as cut and pasted from the browser:
    ----- BEGIN DUMP -----
    12:10:13.498
    Exception type: coXmlrpcFault
    Native message: no such chassis
    Mapped file: https://localhost:10215/lib/crazyolait/coError.js line 37
    Mapped stack trace:
    :0     @Error()
    Native file: https://localhost:10215/lib/crazyolait/index.js line 364
    Native stack trace:
    Error()@:0
    @https://localhost:10215/lib/crazyolait/index.js:364
    Additional native members:
    faultCode: 384
    faultString: no such chassis
    coStack: coXmlrpcProxy.unmarshallDoc(doc:<object> "[object XMLDocument]", xml:"<methodResponse><fault>
    <value><struct><member>
    <name>faultCode</name>
    <value><int>384</int></value>
    </member>
    <member>
    <name>faultString</name>
    <value><string>no such chassis</string></value>
    </member>
    </struct></value>
    </fault></methodResponse>
    coXmlrpcProxy.unmarshallResponse(response:<object> "[object Object]")
    <anonymous>(response:<object> "[object Object]", null)
    <anonymous>(r:<object> "[object Object]", e:null, cb:<function> "function (response) {\n var result = null;\n var fault = null;\n var docb = true;\n try {\n result = coXmlrpcProxy.unmarshallResponse(response);\n if (result === null) {\n return;\n }\n } catch (e) {\n try {\n e.xmlrpc_method = method;\n } catch (e1) {\n }\n if (e instanceof coXmlrpcFault) {\n fault = e;\n proxy._fillFault(fault);\n } else if (proxy._coxp_onasyncerr) {\n fault = proxy._coxp_onasyncerr(e, null, method, stack, true, retries);\n } else if (e instanceof coError) {\n throw e;\n } else {\n throw new coError("Response unmarshalling failed", e, stack);\n }\n } finally {\n response = null;\n }\n try {\n if (fault && proxy._coxp_onfault) {\n docb = proxy._coxp_onfault(method, args, fault, stack);\n }\n if (docb) {\n callback(result, fault);\n }\n } catch (e) {\n try {\n e.xmlrpc_method = method;\n } catch (e1) {\n }\n if (proxy._coxp_onacberr) {\n proxy._coxp_onacberr(e, fault, method, stack);\n } else if (e instanceof coError) {\n throw e;\n } else {\n throw new coError("Unhandled error during callback", e, stack);\n }\n } finally {\n callback = null;\n stack = null;\n args = null;\n }\n}")
    <anonymous>()
    <anonymous>(<object> "[object Event]")
    xmlrpc_method: appliance.rootStatus
    faultName: EAK_HW_NOENT
    xmlrpc_fault: Message: no such chassis
    Wrapped exception: <none>
    Stack trace:
    coXmlrpcProxy.unmarshallDoc(doc:<object> "[object XMLDocument]", xml:"<methodResponse><fault>
    <value><struct><member>
    <name>faultCode</name>
    <value><int>384</int></value>
    </member>
    <member>
    <name>faultString</name>
    <value><string>no such chassis</string></value>
    </member>
    </struct></value>
    </fault></methodResponse>
    coXmlrpcProxy.unmarshallResponse(response:<object> "[object Object]")
    <anonymous>(response:<object> "[object Object]", null)
    <anonymous>(r:<object> "[object Object]", e:null, cb:<function> "function (response) {\n var result = null;\n var fault = null;\n var docb = true;\n try {\n result = coXmlrpcProxy.unmarshallResponse(response);\n if (result === null) {\n return;\n }\n } catch (e) {\n try {\n e.xmlrpc_method = method;\n } catch (e1) {\n }\n if (e instanceof coXmlrpcFault) {\n fault = e;\n proxy._fillFault(fault);\n } else if (proxy._coxp_onasyncerr) {\n fault = proxy._coxp_onasyncerr(e, null, method, stack, true, retries);\n } else if (e instanceof coError) {\n throw e;\n } else {\n throw new coError("Response unmarshalling failed", e, stack);\n }\n } finally {\n response = null;\n }\n try {\n if (fault && proxy._coxp_onfault) {\n docb = proxy._coxp_onfault(method, args, fault, stack);\n }\n if (docb) {\n callback(result, fault);\n }\n } catch (e) {\n try {\n e.xmlrpc_method = method;\n } catch (e1) {\n }\n if (proxy._coxp_onacberr) {\n proxy._coxp_onacberr(e, fault, method, stack);\n } else if (e instanceof coError) {\n throw e;\n } else {\n throw new coError("Unhandled error during callback", e, stack);\n }\n } finally {\n callback = null;\n stack = null;\n args = null;\n }\n}")
    <anonymous>()
    <anonymous>(<object> "[object Event]")
    xmlrpc_fault_code: 384
    xmlrpc_fault_string: no such chassis
    ----- END DUMP -----
    Is anybody else seeing this problem? Does anybody know what it means?
    Thanks,
    Jorge
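    The core of the dump is an ordinary XML-RPC fault: the BUI's appliance.rootStatus call came back with faultCode 384 and faultString "no such chassis" (labelled EAK_HW_NOENT). As a sketch for reading such dumps, Python's standard `xmlrpc.client.loads` decodes the embedded response and raises `xmlrpc.client.Fault` for a fault; the `decode_fault` helper is hypothetical, and this only parses the message, it does not explain why the appliance reported a missing chassis.

```python
from __future__ import annotations

import xmlrpc.client

# The fault body copied from the BUI dump above.
FAULT_XML = """<methodResponse><fault>
<value><struct><member>
<name>faultCode</name>
<value><int>384</int></value>
</member>
<member>
<name>faultString</name>
<value><string>no such chassis</string></value>
</member>
</struct></value>
</fault></methodResponse>"""


def decode_fault(xml: str) -> tuple[int, str] | None:
    """Return (faultCode, faultString) for an XML-RPC fault, or None for a normal reply."""
    try:
        xmlrpc.client.loads(xml)  # raises xmlrpc.client.Fault for <fault> responses
    except xmlrpc.client.Fault as fault:
        return fault.faultCode, fault.faultString
    return None


if __name__ == "__main__":
    print(decode_fault(FAULT_XML))  # → (384, 'no such chassis')
```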

    I just got something very similar to this, off a 7110, after I upgraded it to the latest software release (2009.09.01.1.0,1-1.3).
    Exception type: coXmlrpcFault
    Native message: no such chassis
    Native file: <undefined> line ?
    Additional native members:
        faultCode: 387
        faultString: no such chassis
        coStack: coXmlrpcProxy.unmarshallDoc(doc:<object> "[object Document]", xml:"<methodResponse><fault>
    <value><struct><member>
    <name>faultCode</name>
    <value><int>387</int></value>
    </member>
    <member>
    <name>faultString</name>
    <value><string>no such chassis</string></value>
    </member>
    </struct></value>
    </fault></methodResponse>
    ...This system has previously had both its power supplies swapped out (field replacement) but hasn't had a hard reset/power cycle since then, so my current theory is that it's got a bit confused by this. I'm slightly concerned that nothing is in the logs indicating any problem whatsoever, but it is still serving files. Unfortunately my system isn't in a convenient place for a hard reset, so that will have to wait.

  • RMAN Automatic backup/recovery with oracle fail safe, windows cluster

    Hello,
    I have question,
    1) Is it possible to do RMAN automatic backup and recovery in the following environment?
    Environment: -
    a. Windows clustering with windows server 2003 Enterprise Edition R2 (Two Node Clustering)
    b. Shared disk ( RAID )
    c. Oracle 10g standard edition one
    d. Oracle fail safe v3.3.3 (for redundancy)
    Here we have single oracle instance operating on single database whose files are located on shared disk.
    2) If answer to above question is yes please specify if there is some good documentation to it.
    Any help regarding this will be greatly appreciated.
    Thanks in advance,
    Rahul

    You just need to make sure that the RMAN scripts are always able to connect to the target database instance, whether the instance runs on cluster node 1 or cluster node 2.
    Here, if the Oracle services (resources) shift from Node1 to Node2 (due to a media or any other failure) during an RMAN backup, the connection will break. Will that destroy my backup, or will the backup restart automatically without any harm?
    and also I want to know,
    Do we need to setup another server which will have RMAN backup script running?

  • Recovery scenario - Voting disk  does not match with the cluster guid

    Hi all,
    Suppose you cannot start your guest VMs just because their system.img root images are corrupted, and assume they use 5 physical disks (all created by the RAC template) with ASM on them.
    What is the simplest recovery scenario for the guest VMs (RAC)?
    Could the following be a feasible scenario for restoring availability? (Assume both RAC system images are corrupted and that we would rather not do a system-level recovery from backup/restore.)
    1. Create 2 RAC instances using the same networking and hostname details as the corrupted ones. Use 5 different new disks.
    2. Shut down the newly created instances. Drop the disks from the newly created instances using VM Manager.
    3. Using VM Manager, add the old ASM disks (still intact, from the VMs whose system images cannot be recovered) to the newly created instances.
    4. Open the newly created instances
    Can we expect the ASM and CRS could be initialized and be opened without a problem?
    When I try this scenario, I get the following errors from cssd/crsd:
    - Cluster guid 9112ddc0824fefd5ff2b7f9f7be8f048 found in voting disk does not match with the cluster guid a3eec66a2854ff0bffe784260856f92a obtained from the GPnP profile.
    - Found 0 configured voting files but 1 voting files are required, terminating to ensure data integrity.
    What could be the simplest way of recovery of a virtual machine that has healthy ASM disks but corrupted system image?
    Thank you

    Hi,
    You have a similar problem when trying to clone databases with 11.2.
    The problem is that a cluster is uniquely identified, and this information is held in the OCR and the voting disks. So exactly these 2 must not be cloned.
    To achieve what you want, simply set up your system so that you have a separate diskgroup for OCR and voting (and the ASM spfile), which is not restored in this kind of scenario.
    Only all database files in ASM will then be exchanged later.
    Then what you want can be achieved.
    However I am not sure that the RAC templates have the option to install OCR and Voting into a separated diskgroup.
    Regards
    Sebastian

  • Disaster Recovery server for SAP and HACMP Cluster script for SAP ECC 6.0

    Hi,
    I need documentation for a Disaster Recovery (DR) server for SAP. I have to configure the DR server; if you have such a document, please share it.
    Do you have any cluster script (HACMP cluster script for SAP ECC 6.0)?
    Thanks & Regards,

    Hello, I'm setting up an ECC6 system to do a disaster recovery test.
    During the installation, I would like to use the same <sidadm> that already exists in PRD.
    If I use the same <sidadm> during the installation, will it overwrite what already exists, or do I just enter the password of the existing <sidadm>?

  • Disaster Recovery in Windows 2003/Cluster, SQL 2000 and R3

    Hi,
    Can someone share experience/knowledge of disaster recovery scenarios with MSCS/SQL Server/SAP? One of our customers has R3/SQL Server 2000/Win 2003 (Cluster).
    We would like to evaluate best possible options for the Disaster Recovery which are supported by SAP.
    We have thought about
    1. Log shipping
    2. Standby Database
    3. Restore backup on new cluster
    4. Homogeneous System copy.
    We do not want to go for the first two and would like to explore the 3rd and 4th options.
    Any links to documents/blogs will be helpful.
    Thanks,
    Manoj

    > I am confused. Option 3 will be restoring backup
    Yes - but what will you restore? Everything? If you're running on a cluster it's unlikely that both nodes will fail at the same time so there is still one node that can and will run the software, no?
    > and 4 will be sapinst. Isn't it? Are both options supported by SAP?
    Yes.
    > Is there a SAP standard documentation for building cluster from scratch and build SAP system from backup or sapinst for DR?
    The standard installation documentation cover a cluster installation.
    > I am sure there will be installation document if it is a fresh installation. But not sure if there is one for DR.
    If you have a cluster, then you already have high availability. If a node fails, you will "just" reinstall that node and put it back into the cluster.
    What kind of DR scenario are you thinking about?
    Markus

  • Cisco Expressway Core & Edge Cluster, or Disaster Recovery Setup

    Hi All,
    Dear experts,
    I have two sites, HQ and Branch. Below are my questions about Expressway Core and Edge clusters:
    1. Can I create one cluster of 6 servers for Expressway Core and distribute 3 servers in HQ and 3 servers in Branch for HA?
    2. Can I create one cluster of 6 servers for Expressway Edge and distribute 3 servers in HQ and 3 servers in Branch for HA?
    3. How would the DNS SRV records and call flow work if the main-site Expressway Edge in HQ goes down? How does the Branch-side Expressway Edge become active for B2B calls?
    4. What is the best design for disaster recovery of Expressway Core and Expressway Edge between the sites (HQ and Branch)?
    Regards,
    Irf

    Regarding 1 and 2, as long as the round-trip delay is no more than 30 ms, you should be fine. I presume the HQ and Branch offices for the Expressway-Core have a shared internal network, since the Core goes inside the corporate network, while the Edge goes in the DMZ or outside the network for public access.
    For 3, as long as you have your SRV records set up correctly, you can set the priority of which office gets used first, and which second, in case of a failure.
    I don't have an answer for 4 unfortunately.
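    Point 3 relies on standard DNS SRV semantics (RFC 2782): a client tries the targets with the lowest priority value first and moves to the next priority only when none of those can be reached, while weight only spreads load among targets of equal priority. A minimal Python sketch of that client-side ordering and failover; the hostnames, port 5061 and the HQ=10 / Branch=20 priorities are hypothetical, and how a particular calling system implements SRV failover is not shown here.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value is tried first
    weight: int    # relative share of traffic among records with the same priority
    port: int
    target: str


def order_targets(records: list[SrvRecord], rng: random.Random | None = None) -> list[SrvRecord]:
    """Order SRV records the way RFC 2782 tells clients to try them."""
    rng = rng if rng is not None else random.Random()
    ordered: list[SrvRecord] = []
    for prio in sorted({r.priority for r in records}):
        # Within one priority, zero-weight records go first, then weighted random picks.
        pending = sorted((r for r in records if r.priority == prio), key=lambda r: r.weight != 0)
        while pending:
            pick = rng.randint(0, sum(r.weight for r in pending))  # inclusive bounds
            running = 0
            for i, r in enumerate(pending):
                running += r.weight
                if running >= pick:
                    ordered.append(pending.pop(i))
                    break
    return ordered


def first_reachable(records: list[SrvRecord], is_up: Callable[[str], bool],
                    rng: random.Random | None = None) -> SrvRecord | None:
    """Fail over through the ordered list until a reachable target is found."""
    for record in order_targets(records, rng):
        if is_up(record.target):
            return record
    return None


if __name__ == "__main__":
    # Hypothetical records: HQ Edge nodes preferred (priority 10), Branch as backup (20).
    records = [
        SrvRecord(10, 50, 5061, "edge1.hq.example.com"),
        SrvRecord(10, 50, 5061, "edge2.hq.example.com"),
        SrvRecord(20, 100, 5061, "edge1.branch.example.com"),
    ]
    # Simulate HQ being down: only the Branch node answers.
    chosen = first_reachable(records, lambda host: host.endswith("branch.example.com"))
    print(chosen.target)  # → edge1.branch.example.com
```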

  • CUC 10.0.1 cluster status stuck in Split Brain Recovery (SBR) on Primary server - HA reports fine.

    Hi,
    We have a 10.01.11900 CUC cluster and everything is working fine (no one is having issues with voicemail, etc.), but the reported cluster status is not consistent.
    DB replication is showing 2 on both servers.
    Primary unity server cluster status shows Primary/split brain recovery.
    HA Unity server cluster status shows Primary/Secondary.
    utils diagnose test - everything tests fine except the tomcat_connectors test.
    test - tomcat_connectors   : Failed - The HTTPS port is not responding to local requests.  Please collect all of the Tomcat logs for root cause analysis: file get activelog tomcat/logs/*
    We've shut down the HA server and rebooted the primary, then waited a while after the primary was back up/active before bringing the HA server back up, and it's still the same.
    We reset DB replication, and it's still the same.
    On the HA server I made the HA server the primary, and the cluster status flipped to Secondary/Primary; I then made the primary server the primary again, but the primary server's cluster status always shows Split Brain Recovery for the secondary/HA server.
    No core dumps on either server, and all services are started.
    Has anyone seen this before or have any thoughts? I have a TAC case on this, but so far we're in the same boat.
    Would the utils cuc cluster renegotiate command help? We did not replace a server, so I don't really want to overwrite data on the publisher server. The issue seems to be with the publisher, since the HA server shows fine, but I'm not sure. I don't want to lose messages, etc., so I don't really want to run these commands.
    Thanks.

    Ok, thanks.
    The SRM logs indicate the Connection Digital Networking Replication Agent service is not running, however when I start it it stops right away and the cuReplicator log states digital networking is not enabled. 
    From SRM Log:
    23:47:20.100 |17755,,,SRM,7,<svcmon> checkServiceStatus: started service monitoring
    23:47:20.100 |17755,,,SRM,7,<svcmon> Service Status: 1 service(s) not running. Service name(s):
    23:47:20.100 |17755,,,SRM,7,<svcmon> Connection Digital Networking Replication Agent
    23:47:24.674 |28471,,,SRM,11,<Timer-3> [snd] Type: Heartbeat
    From Replicator log:
    admin:file tail activelog cuc/diag_CuReplicator_00000049.uc
    23:42:59.208 HDR|09/14/2014 ,Significant
    23:42:59.208 |28914,,,CuReplicator,0,Digital Networking is not enabled. Replicator will stop now.
    There is no digital networking setup to other unity systems, and only one location. 
    Also, the Server Role Manager can't be restarted from the CLI or the GUI, so restarting it takes either root access or a server reboot.
    I compared it to another CUC cluster and deactivated the Digital Networking service and the SRM logs seem happier now, will wait a bit and see if it clears the SBR status up. 

  • Backup recovery in sun cluster production

    Hello Friends,
    Can anybody provide me a document on backup/recovery for a Sun Cluster-based SAP R/3 server?
    Thanks in Advance.

    Yes, we use 2-way replication, but we don't use cache connect. The replication is created like this on both servers:
    create replication MYDB.REPSCHEME
    element SERVER01_DS datastore
    master MYDB on "SERVER01_REP"
    transmit nondurable
    subscriber MYDB on "SERVER02_REP"
    element SERVER02_DS datastore
    master MYDB on "SERVER02_REP"
    transmit nondurable
    subscriber MYDB on "SERVER01_REP"
    store MYDB on "SERVER01_REP"
    port 16004
    failthreshold 500
    store MYDB on "SERVER02_REP"
    port 16004
    failthreshold 500
    The application runs on SERVER01 and is standby on SERVER02. If an invalid state is detected in the application, the application on SERVER01 is stopped and the application on SERVER02 is started.
    In addition to this, we want to fail over if the database on SERVER01 is in an invalid state. What should the clustering agent monitor to detect an invalid state in TT?

  • Help Educate Me on an RMAN Recovery Question

    Perhaps I am not understanding how recovery should work, so, let me lay out my situation:
    First things first:
    Oracle 11.2.0.2.5 on AIX 6.1
    Source database: POR02P on a 4 node cluster
    Auxiliary database: POR02x (single instance)
    I want to duplicate my source database to my auxiliary database as of 02-APR-2013 at 3:00PM
    Here are how my backups run:
    LEVEL0 On Sunday at 8:00 AM
    LEVEL1 Differential all other days at 8:00 AM
    Intermediate archivelog backups run at the following times daily: 00:45, 11:45, 19:45
    March 31 is my LEVEL0.
    So, the way I understand things, to do this duplicate:
    I need my LEVEL0 from Sunday
    I need my LEVEL1 from Monday
    I need my LEVEL1 from Tuesday
    I need all my archivelogs from just before the LEVEL1 on Tuesday started all the way to after 3:00PM on Tuesday:
    Restore archive backups from:
    4/2 11:45
    4/2 19:45
    All my backups include a controlfile.
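    The analysis above can be sketched in Python, working purely from wall-clock times: the newest level 0 before the target, every differential level 1 after it, then the archivelog backups from the last level 1 through the first one taken at or after the target. This is a minimal model of the reasoning in this post under stated assumptions (each archivelog backup holds the logs archived since the previous one; the `Backup` type and helper names are hypothetical). RMAN itself decides from the SCNs recorded in the control file or catalog, so the sketch cannot explain why recovery asked for an older log, which is the open question.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Backup:
    kind: str        # "L0", "L1D" (level 1 differential) or "ARCH" (archivelog backup)
    taken: datetime  # wall-clock time the backup ran


def backups_needed(backups: list[Backup], target: datetime) -> list[Backup]:
    """Pick the backups the reasoning above says are needed to reach `target`."""
    ordered = sorted(backups, key=lambda b: b.taken)
    level0 = max((b for b in ordered if b.kind == "L0" and b.taken <= target),
                 key=lambda b: b.taken)
    # A differential level 1 holds changes since the previous level 0 or level 1,
    # so every one between the level 0 and the target is required.
    level1s = [b for b in ordered
               if b.kind == "L1D" and level0.taken < b.taken <= target]
    last_incremental = level1s[-1] if level1s else level0
    # Redo is needed from the last incremental up to the target: take every
    # archivelog backup after the incremental, up to the first one at/after the target.
    archivelogs = []
    for b in ordered:
        if b.kind == "ARCH" and b.taken > last_incremental.taken:
            archivelogs.append(b)
            if b.taken >= target:
                break
    return [level0, *level1s, *archivelogs]


def schedule(first_day: datetime, days: int) -> list[Backup]:
    """The schedule described above: L0 Sunday 08:00, L1D other days, ARCH 00:45/11:45/19:45."""
    result = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        kind = "L0" if day.weekday() == 6 else "L1D"  # weekday() 6 == Sunday
        result.append(Backup(kind, day.replace(hour=8, minute=0)))
        for hour, minute in ((0, 45), (11, 45), (19, 45)):
            result.append(Backup("ARCH", day.replace(hour=hour, minute=minute)))
    return result


if __name__ == "__main__":
    # 31-MAR-2013 was the Sunday LEVEL0; the target is 02-APR-2013 15:00.
    for b in backups_needed(schedule(datetime(2013, 3, 31), 4), datetime(2013, 4, 2, 15, 0)):
        print(b.kind, b.taken.strftime("%d-%b %H:%M"))
```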
    According to my above analysis, I made sure the required files were on disk.
    After making sure my files were on disk, I went to my source database and ran the following to make sure Oracle had all the files it needs:
    RMAN> run {
    set until time = "to_date('02-APR-2013 15:00:00','DD-MON-YYYY HH24:MI:SS')";
    restore database preview summary;
    }
    This above command succeeds and I can verify all the files it mentions are on disk.
    Here is the problem: the restore of the LEVEL0 and LEVEL1 backups succeeds. When it starts recovery, it fails, asking for an archive log that both:
    1. was not listed as part of the PREVIEW command, and
    2. is from a time between my LEVEL0 and first LEVEL1.
    I was under the impression that RMAN does not need archivelogs from BETWEEN incremental backups.
    Am I wrong about this?
    Doesn't an incremental LEVEL1 differential go back to the last incremental and get all the needed blocks thereby making the "in between" archivelogs obsolete?
    If I am wrong about this, that solves part of my confusion...
    The other part of my confusion is why does the PREVIEW command not specify the archivelogs it asks for during recovery?
    If you can answer my question now, you can skip the rest.
    Else, here are all the dirty details.
    Here is my PREVIEW command as run on the source database. The output supports how I think this should work:
    RMAN> run {
    set until time = "to_date('02-APR-2013 15:00:00','DD-MON-YYYY HH24:MI:SS')";
    restore database preview;
    }
    executing command: SET until clause
    Starting restore at 23-APR-2013 13:47:30
    using channel ORA_DISK_1
    using channel ORA_DISK_2
    List of Backup Sets
    ===================
    BS Key   Type LV Size    Device Type Elapsed Time Completion Time
    11386312 Incr 0  105.12M DISK        00:00:23     31-MAR-2013 08:00:29
      BP Key: 11386324  Status: AVAILABLE  Compressed: YES  Tag: LEVEL0
      Piece Name: /backup_rman/backupset/por02p/POR02P_20130331_e6o5su46_1_1.BAK
      List of Datafiles in backup set 11386312
      File LV Type Ckp SCN  Ckp Time             Name
      1    0  Incr 15658251 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/system.258.810123295
      3    0  Incr 15658251 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/undotbs1.260.810123303
      4    0  Incr 15658251 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/undotbs2.261.810123315
      5    0  Incr 15658251 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/undotbs3.262.810123315
      6    0  Incr 15658251 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/undotbs4.263.810123317
      9    0  Incr 15658251 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_ias_orasdpm.274.810135793
      14   0  Incr 15658251 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_oim_lob.281.810135799
    BS Key   Type LV Size    Device Type Elapsed Time Completion Time
    11613131 Incr 1  122.93M DISK        00:00:21     01-APR-2013 08:00:19
      BP Key: 11613143  Status: AVAILABLE  Compressed: YES  Tag: LEVEL1D
      Piece Name: /backup_rman/backupset/por02p/POR02P_20130401_f2o5vig5_1_1.BAK
      List of Datafiles in backup set 11613131
      File LV Type Ckp SCN  Ckp Time             Name
      1    1  Incr 16624714 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/system.258.810123295
      3    1  Incr 16624714 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/undotbs1.260.810123303
      4    1  Incr 16624714 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/undotbs2.261.810123315
      5    1  Incr 16624714 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/undotbs3.262.810123315
      6    1  Incr 16624714 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/undotbs4.263.810123317
      9    1  Incr 16624714 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_ias_orasdpm.274.810135793
      14   1  Incr 16624714 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_oim_lob.281.810135799
    BS Key   Type LV Size    Device Type Elapsed Time Completion Time
    11784495 Incr 1  119.81M DISK        00:00:20     02-APR-2013 08:00:19
      BP Key: 11784507  Status: AVAILABLE  Compressed: YES  Tag: LEVEL1D
      Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_fuo626s5_1_1.BAK
      List of Datafiles in backup set 11784495
      File LV Type Ckp SCN  Ckp Time             Name
      1    1  Incr 17547965 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/system.258.810123295
      3    1  Incr 17547965 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/undotbs1.260.810123303
      4    1  Incr 17547965 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/undotbs2.261.810123315
      5    1  Incr 17547965 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/undotbs3.262.810123315
      6    1  Incr 17547965 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/undotbs4.263.810123317
      9    1  Incr 17547965 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_ias_orasdpm.274.810135793
      14   1  Incr 17547965 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_oim_lob.281.810135799
    BS Key   Type LV Size    Device Type Elapsed Time Completion Time
    11386314 Incr 0  901.41M DISK        00:03:33     31-MAR-2013 08:03:38
      BP Key: 11386326  Status: AVAILABLE  Compressed: YES  Tag: LEVEL0
      Piece Name: /backup_rman/backupset/por02p/POR02P_20130331_e5o5su45_1_1.BAK
      List of Datafiles in backup set 11386314
      File LV Type Ckp SCN  Ckp Time             Name
      2    0  Incr 15658225 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/sysaux.259.810123299
    7 0 Incr 15658225 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/users.264.810123317
    8 0 Incr 15658225 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_mds.273.810135793
    10 0 Incr 15658225 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_soainfra.275.810135795
    11 0 Incr 15658225 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_oim.276.810135797
    12 0 Incr 15658225 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_biplatform.277.810135797
    13 0 Incr 15658225 31-MAR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_ias_opss.278.810135797
    BS Key Type LV Size Device Type Elapsed Time Completion Time
    11613133 Incr 1 77.77M DISK 00:00:36 01-APR-2013 08:00:41
    BP Key: 11613145 Status: AVAILABLE Compressed: YES Tag: LEVEL1D
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130401_f1o5vig5_1_1.BAK
    List of Datafiles in backup set 11613133
    File LV Type Ckp SCN Ckp Time Name
    2 1 Incr 16624711 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/sysaux.259.810123299
    7 1 Incr 16624711 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/users.264.810123317
    8 1 Incr 16624711 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_mds.273.810135793
    10 1 Incr 16624711 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_soainfra.275.810135795
    11 1 Incr 16624711 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_oim.276.810135797
    12 1 Incr 16624711 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_biplatform.277.810135797
    13 1 Incr 16624711 01-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_ias_opss.278.810135797
    BS Key Type LV Size Device Type Elapsed Time Completion Time
    11784497 Incr 1 71.67M DISK 00:00:27 02-APR-2013 08:00:32
    BP Key: 11784509 Status: AVAILABLE Compressed: YES Tag: LEVEL1D
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_fto626s5_1_1.BAK
    List of Datafiles in backup set 11784497
    File LV Type Ckp SCN Ckp Time Name
    2 1 Incr 17547912 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/sysaux.259.810123299
    7 1 Incr 17547912 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/users.264.810123317
    8 1 Incr 17547912 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_mds.273.810135793
    10 1 Incr 17547912 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_soainfra.275.810135795
    11 1 Incr 17547912 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_oim.276.810135797
    12 1 Incr 17547912 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_biplatform.277.810135797
    13 1 Incr 17547912 02-APR-2013 08:00:06 +POR02P_DATA/por02p/datafile/lfgprod_ias_opss.278.810135797
    List of Backup Sets
    ===================
    BS Key Size Device Type Elapsed Time Completion Time
    11784501 60.89M DISK 00:00:09 02-APR-2013 08:00:53
    BP Key: 11784513 Status: AVAILABLE Compressed: YES Tag: ARCHIVE_BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_g1o626tc_1_1.BAK
    List of Archived Logs in backup set 11784501
    Thrd Seq Low SCN Low Time Next SCN Next Time
    2 147 17279219 02-APR-2013 00:45:13 17549976 02-APR-2013 08:00:35
    4 149 17279360 02-APR-2013 00:45:13 17549980 02-APR-2013 08:00:35
    1 179 17279579 02-APR-2013 00:45:15 17549968 02-APR-2013 08:00:34
    BS Key Size Device Type Elapsed Time Completion Time
    11784499 26.99M DISK 00:00:04 02-APR-2013 08:00:48
    BP Key: 11784511 Status: AVAILABLE Compressed: YES Tag: ARCHIVE_BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_g2o626tc_1_1.BAK
    List of Archived Logs in backup set 11784499
    Thrd Seq Low SCN Low Time Next SCN Next Time
    3 149 17279600 02-APR-2013 00:45:15 17549971 02-APR-2013 08:00:34
    1 180 17549968 02-APR-2013 08:00:34 17549996 02-APR-2013 08:00:40
    3 150 17549971 02-APR-2013 08:00:34 17549999 02-APR-2013 08:00:40
    2 148 17549976 02-APR-2013 08:00:35 17550003 02-APR-2013 08:00:41
    BS Key Size Device Type Elapsed Time Completion Time
    11784500 2.00K DISK 00:00:01 02-APR-2013 08:00:52
    BP Key: 11784512 Status: AVAILABLE Compressed: YES Tag: ARCHIVE_BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_g3o626tj_1_1.BAK
    List of Archived Logs in backup set 11784500
    Thrd Seq Low SCN Low Time Next SCN Next Time
    4 150 17549980 02-APR-2013 08:00:35 17550007 02-APR-2013 08:00:41
    BS Key Size Device Type Elapsed Time Completion Time
    11798434 28.77M DISK 00:00:05 02-APR-2013 11:45:28
    BP Key: 11798442 Status: AVAILABLE Compressed: YES Tag: ARCHIVE_BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_g7o62k2j_1_1.BAK
    List of Archived Logs in backup set 11798434
    Thrd Seq Low SCN Low Time Next SCN Next Time
    1 181 17549996 02-APR-2013 08:00:40 17687415 02-APR-2013 11:45:05
    3 151 17549999 02-APR-2013 08:00:40 17687418 02-APR-2013 11:45:05
    BS Key Size Device Type Elapsed Time Completion Time
    11798432 20.40M DISK 00:00:03 02-APR-2013 11:45:27
    BP Key: 11798440 Status: AVAILABLE Compressed: YES Tag: ARCHIVE_BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_g8o62k2k_1_1.BAK
    List of Archived Logs in backup set 11798432
    Thrd Seq Low SCN Low Time Next SCN Next Time
    2 149 17550003 02-APR-2013 08:00:41 17687399 02-APR-2013 11:45:03
    4 151 17550007 02-APR-2013 08:00:41 17687403 02-APR-2013 11:45:03
    2 150 17687399 02-APR-2013 11:45:03 17688884 02-APR-2013 11:45:15
    4 152 17687403 02-APR-2013 11:45:03 17688895 02-APR-2013 11:45:15
    BS Key Size Device Type Elapsed Time Completion Time
    11798433 66.00K DISK 00:00:00 02-APR-2013 11:45:27
    BP Key: 11798441 Status: AVAILABLE Compressed: YES Tag: ARCHIVE_BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_g9o62k2n_1_1.BAK
    List of Archived Logs in backup set 11798433
    Thrd Seq Low SCN Low Time Next SCN Next Time
    1 182 17687415 02-APR-2013 11:45:05 17688871 02-APR-2013 11:45:14
    3 152 17687418 02-APR-2013 11:45:05 17688875 02-APR-2013 11:45:14
    BS Key Size Device Type Elapsed Time Completion Time
    11834701 54.22M DISK 00:00:08 02-APR-2013 19:45:32
    BP Key: 11834709 Status: AVAILABLE Compressed: YES Tag: ARCHIVE_BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_gdo63g6k_1_1.BAK
    List of Archived Logs in backup set 11834701
    Thrd Seq Low SCN Low Time Next SCN Next Time
    1 183 17688871 02-APR-2013 11:45:14 17982647 02-APR-2013 19:45:04
    3 153 17688875 02-APR-2013 11:45:14 17982641 02-APR-2013 19:45:04
    BS Key Size Device Type Elapsed Time Completion Time
    11834699 44.18M DISK 00:00:07 02-APR-2013 19:45:31
    BP Key: 11834707 Status: AVAILABLE Compressed: YES Tag: ARCHIVE_BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_geo63g6k_1_1.BAK
    List of Archived Logs in backup set 11834699
    Thrd Seq Low SCN Low Time Next SCN Next Time
    2 151 17688884 02-APR-2013 11:45:15 17982650 02-APR-2013 19:45:04
    4 153 17688895 02-APR-2013 11:45:15 17982653 02-APR-2013 19:45:04
    Media recovery start SCN is 17547912
    Recovery must be done beyond SCN 17548424 to clear datafile fuzziness
    Finished restore at 23-APR-2013 13:47:32
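    One way to sanity-check the archived logs in this preview is to confirm two things per thread: the sequences form an unbroken chain (each log's Next SCN equals the following log's Low SCN), and the first log starts at or below the media recovery start SCN (17547912). The Python sketch below does that on the rows transcribed from the output above. It is a quick consistency check under those assumptions, not a reimplementation of RMAN's own logic, and it ignores the SET UNTIL TIME end point.

```python
from collections import defaultdict

# (thread, sequence, low_scn, next_scn) rows transcribed from the
# "List of Archived Logs" sections of the PREVIEW output above.
PREVIEW_LOGS = [
    (2, 147, 17279219, 17549976), (4, 149, 17279360, 17549980),
    (1, 179, 17279579, 17549968), (3, 149, 17279600, 17549971),
    (1, 180, 17549968, 17549996), (3, 150, 17549971, 17549999),
    (2, 148, 17549976, 17550003), (4, 150, 17549980, 17550007),
    (1, 181, 17549996, 17687415), (3, 151, 17549999, 17687418),
    (2, 149, 17550003, 17687399), (4, 151, 17550007, 17687403),
    (2, 150, 17687399, 17688884), (4, 152, 17687403, 17688895),
    (1, 182, 17687415, 17688871), (3, 152, 17687418, 17688875),
    (1, 183, 17688871, 17982647), (3, 153, 17688875, 17982641),
    (2, 151, 17688884, 17982650), (4, 153, 17688895, 17982653),
]

def chain_gaps(logs, start_scn):
    """Return a list of problems found; an empty list means every thread
    has a gap-free chain whose first log starts at or below start_scn."""
    by_thread = defaultdict(list)
    for thread, seq, low, nxt in logs:
        by_thread[thread].append((seq, low, nxt))
    problems = []
    for thread, entries in sorted(by_thread.items()):
        entries.sort()  # order by sequence number
        first_seq, first_low, _ = entries[0]
        if first_low > start_scn:
            problems.append(f"thread {thread}: seq {first_seq} starts at SCN {first_low}, after {start_scn}")
        for (seq_a, _, next_a), (seq_b, low_b, _) in zip(entries, entries[1:]):
            # Each log must pick up exactly where the previous one ended.
            if seq_b != seq_a + 1 or low_b != next_a:
                problems.append(f"thread {thread}: gap between seq {seq_a} and {seq_b}")
    return problems

print(chain_gaps(PREVIEW_LOGS, 17547912))  # prints []
```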
    Here is a summary of the files it lists:
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130331_e6o5su46_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130401_f2o5vig5_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_fuo626s5_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130331_e5o5su45_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130401_f1o5vig5_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_fto626s5_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_g1o626tc_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_g2o626tc_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_g3o626tj_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_g7o62k2j_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_g8o62k2k_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_g9o62k2n_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_gdo63g6k_1_1.BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130402_geo63g6k_1_1.BAK
    Here is my list of files on disk. There are actually
    more files listed here because the PREVIEW command does
    not show pieces with Control/SPFILE backups:
    -rw-r----- 1 oracle oinstall 945201152 Mar 31 08:03 POR02P_20130331_e5o5su45_1_1.BAK
    -rw-r----- 1 oracle oinstall 110231552 Mar 31 08:00 POR02P_20130331_e6o5su46_1_1.BAK
    -rw-r----- 1 oracle oinstall 1261568 Mar 31 08:00 POR02P_20130331_e7o5su4v_1_1.BAK
    -rw-r----- 1 oracle oinstall 81559552 Apr 1 08:00 POR02P_20130401_f1o5vig5_1_1.BAK
    -rw-r----- 1 oracle oinstall 128909312 Apr 1 08:00 POR02P_20130401_f2o5vig5_1_1.BAK
    -rw-r----- 1 oracle oinstall 1277952 Apr 1 08:00 POR02P_20130401_f3o5vigv_1_1.BAK
    -rw-r----- 1 oracle oinstall 75161600 Apr 2 08:00 POR02P_20130402_fto626s5_1_1.BAK
    -rw-r----- 1 oracle oinstall 125640704 Apr 2 08:00 POR02P_20130402_fuo626s5_1_1.BAK
    -rw-r----- 1 oracle oinstall 1294336 Apr 2 08:00 POR02P_20130402_fvo626su_1_1.BAK
    -rw-r----- 1 oracle oinstall 63847936 Apr 2 08:00 POR02P_20130402_g1o626tc_1_1.BAK
    -rw-r----- 1 oracle oinstall 28300800 Apr 2 08:00 POR02P_20130402_g2o626tc_1_1.BAK
    -rw-r----- 1 oracle oinstall 2560 Apr 2 08:00 POR02P_20130402_g3o626tj_1_1.BAK
    -rw-r----- 1 oracle oinstall 30164480 Apr 2 11:45 POR02P_20130402_g7o62k2j_1_1.BAK
    -rw-r----- 1 oracle oinstall 21393920 Apr 2 11:45 POR02P_20130402_g8o62k2k_1_1.BAK
    -rw-r----- 1 oracle oinstall 68096 Apr 2 11:45 POR02P_20130402_g9o62k2n_1_1.BAK
    -rw-r----- 1 oracle oinstall 56855552 Apr 2 19:45 POR02P_20130402_gdo63g6k_1_1.BAK
    -rw-r----- 1 oracle oinstall 46324224 Apr 2 19:45 POR02P_20130402_geo63g6k_1_1.BAK
    -rw-r----- 1 oracle oinstall 3072 Apr 2 19:45 POR02P_20130402_gfo63g6r_1_1.BAK
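    The comparison between the PREVIEW summary and the directory listing can also be scripted. This Python sketch pulls every "Piece Name:" path out of saved PREVIEW output and diffs the file names against a directory listing. The sample data below is transcribed from the two lists above. In real use the text would come from a spooled RMAN log and the names from os.listdir("/backup_rman/backupset/por02p"); the piece() helper is only a shorthand for this example.

```python
import re

# Matches lines such as "Piece Name: /backup_rman/.../POR02P_..._1_1.BAK"
PIECE_RE = re.compile(r"Piece Name:\s*(\S+)")

def compare_pieces(preview_text, disk_names):
    """Return (missing_on_disk, extra_on_disk) as sorted lists of file names."""
    preview = {path.rsplit("/", 1)[-1] for path in PIECE_RE.findall(preview_text)}
    disk = set(disk_names)
    return sorted(preview - disk), sorted(disk - preview)

def piece(date, tag):
    """Hypothetical helper: build a file name in the naming pattern seen here."""
    return f"POR02P_{date}_{tag}_1_1.BAK"

# Piece names from the PREVIEW summary above.
preview_names = (
    [piece("20130331", t) for t in ("e6o5su46", "e5o5su45")]
    + [piece("20130401", t) for t in ("f2o5vig5", "f1o5vig5")]
    + [piece("20130402", t) for t in ("fuo626s5", "fto626s5", "g1o626tc", "g2o626tc",
                                      "g3o626tj", "g7o62k2j", "g8o62k2k", "g9o62k2n",
                                      "gdo63g6k", "geo63g6k")]
)
preview_text = "\n".join(f"Piece Name: /backup_rman/backupset/por02p/{n}" for n in preview_names)

# File names from the ls listing above.
disk_names = preview_names + [piece("20130331", "e7o5su4v"), piece("20130401", "f3o5vigv"),
                              piece("20130402", "fvo626su"), piece("20130402", "gfo63g6r")]

missing, extra = compare_pieces(preview_text, disk_names)
print("missing on disk:", missing)
print("extra on disk:", extra)
```

    With this data the sketch reports no missing pieces and four extra ones (e7o5su4v, f3o5vigv, fvo626su, gfo63g6r). Of these, e7o5su4v is the piece the duplicate later restores the controlfile from. The piece that list backup later reports for thread 1 sequence 163 (POR02P_20130331_e9o5subf_1_1.BAK, status EXPIRED) does not appear in this listing at all.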
    Okay, so at this point, I think I have everything I need.
    Here is my RMAN command:
    # start up auxiliary in nomount
    run {
    allocate auxiliary channel a1 type disk;
    allocate auxiliary channel a2 type disk;
    allocate auxiliary channel a3 type disk;
    allocate auxiliary channel a4 type disk;
    set until time = "to_date('02-APR-2013:15:00:00','DD-MON-YYYY:HH24:MI:SS')";
    DUPLICATE DATABASE TO por02x
    BACKUP LOCATION '/backup_rman/backupset/por02p'
    NOFILENAMECHECK
    LOGFILE
    GROUP 1 ('+DB_REDO') SIZE 512M,
    GROUP 2 ('+DB_REDO') SIZE 512M,
    GROUP 3 ('+DB_REDO') SIZE 512M,
    GROUP 4 ('+DB_REDO') SIZE 512M,
    GROUP 5 ('+DB_REDO') SIZE 512M,
    GROUP 6 ('+DB_REDO') SIZE 512M,
    GROUP 7 ('+DB_REDO') SIZE 512M,
    GROUP 8 ('+DB_REDO') SIZE 512M;
    }
    For now, I won't paste the entire output of the log,
    just enough to show that it works fine through the restore
    and then fails on an archived log. If someone wants the whole
    thing, I can add it:
    RMAN> run {
    2> allocate auxiliary channel a1 type disk;
    3> allocate auxiliary channel a2 type disk;
    4> allocate auxiliary channel a3 type disk;
    5> allocate auxiliary channel a4 type disk;
    6> set until time = "to_date('02-APR-2013:15:00:00','DD-MON-YYYY:HH24:MI:SS')";
    7> DUPLICATE DATABASE TO por02x
    8> BACKUP LOCATION '/backup_rman/backupset/por02p'
    9> NOFILENAMECHECK
    10> LOGFILE
    11> GROUP 1 ('+DB_REDO') SIZE 512M,
    12> GROUP 2 ('+DB_REDO') SIZE 512M,
    13> GROUP 3 ('+DB_REDO') SIZE 512M,
    14> GROUP 4 ('+DB_REDO') SIZE 512M,
    15> GROUP 5 ('+DB_REDO') SIZE 512M,
    16> GROUP 6 ('+DB_REDO') SIZE 512M,
    17> GROUP 7 ('+DB_REDO') SIZE 512M,
    18> GROUP 8 ('+DB_REDO') SIZE 512M;
    19> }
    allocated channel: a1
    channel a1: SID=49 device type=DISK
    allocated channel: a2
    channel a2: SID=98 device type=DISK
    allocated channel: a3
    channel a3: SID=146 device type=DISK
    allocated channel: a4
    channel a4: SID=194 device type=DISK
    executing command: SET until clause
    Starting Duplicate Db at 23-APR-2013 14:22:07
    contents of Memory Script:
    sql clone "create spfile from memory";
    executing Memory Script
    sql statement: create spfile from memory
    contents of Memory Script:
    shutdown clone immediate;
    startup clone nomount;
    executing Memory Script
    Oracle instance shut down
    connected to auxiliary database (not started)
    Oracle instance started
    Total System Global Area 2137886720 bytes
    Fixed Size 2221336 bytes
    Variable Size 503319272 bytes
    Database Buffers 1610612736 bytes
    Redo Buffers 21733376 bytes
    allocated channel: a1
    channel a1: SID=98 device type=DISK
    allocated channel: a2
    channel a2: SID=146 device type=DISK
    allocated channel: a3
    channel a3: SID=194 device type=DISK
    allocated channel: a4
    channel a4: SID=242 device type=DISK
    contents of Memory Script:
    sql clone "alter system set db_name =
    ''POR02P'' comment=
    ''Modified by RMAN duplicate'' scope=spfile";
    sql clone "alter system set db_unique_name =
    ''POR02X'' comment=
    ''Modified by RMAN duplicate'' scope=spfile";
    shutdown clone immediate;
    startup clone force nomount
    restore clone primary controlfile from '/backup_rman/backupset/por02p/POR02P_20130331_e7o5su4v_1_1.BAK';
    alter clone database mount;
    executing Memory Script
    sql statement: alter system set db_name = ''POR02P'' comment= ''Modified by RMAN duplicate'' scope=spfile
    sql statement: alter system set db_unique_name = ''POR02X'' comment= ''Modified by RMAN duplicate'' scope=spfile
    Oracle instance shut down
    Oracle instance started
    Total System Global Area 2137886720 bytes
    Fixed Size 2221336 bytes
    Variable Size 503319272 bytes
    Database Buffers 1610612736 bytes
    Redo Buffers 21733376 bytes
    allocated channel: a1
    channel a1: SID=98 device type=DISK
    allocated channel: a2
    channel a2: SID=146 device type=DISK
    allocated channel: a3
    channel a3: SID=194 device type=DISK
    allocated channel: a4
    channel a4: SID=242 device type=DISK
    Starting restore at 23-APR-2013 14:23:02
    channel a2: skipped, AUTOBACKUP already found
    channel a3: skipped, AUTOBACKUP already found
    channel a4: skipped, AUTOBACKUP already found
    channel a1: restoring control file
    channel a1: restore complete, elapsed time: 00:00:10
    output file name=+POR02U_CTL/por02x/control01.ctl
    Finished restore at 23-APR-2013 14:23:12
    database mounted
    channel a1: starting datafile backup set restore
    channel a1: specifying datafile(s) to restore from backup set
    channel a1: restoring datafile 00001 to +por02u_data
    channel a1: restoring datafile 00003 to +por02u_data
    channel a1: restoring datafile 00004 to +por02u_data
    channel a1: restoring datafile 00005 to +por02u_data
    channel a1: restoring datafile 00006 to +por02u_data
    channel a1: restoring datafile 00009 to +por02u_data
    channel a1: restoring datafile 00014 to +por02u_data
    channel a1: reading from backup piece /backup_rman/backupset/por02p/POR02P_20130331_e6o5su46_1_1.BAK
    channel a2: starting datafile backup set restore
    channel a2: specifying datafile(s) to restore from backup set
    channel a2: restoring datafile 00002 to +por02u_data
    channel a2: restoring datafile 00007 to +por02u_data
    channel a2: restoring datafile 00008 to +por02u_data
    channel a2: restoring datafile 00010 to +por02u_data
    channel a2: restoring datafile 00011 to +por02u_data
    channel a2: restoring datafile 00012 to +por02u_data
    channel a2: restoring datafile 00013 to +por02u_data
    channel a2: reading from backup piece /backup_rman/backupset/por02p/POR02P_20130331_e5o5su45_1_1.BAK
    channel a1: piece handle=/backup_rman/backupset/por02p/POR02P_20130331_e6o5su46_1_1.BAK tag=LEVEL0
    channel a1: restored backup piece 1
    channel a1: restore complete, elapsed time: 00:00:55
    channel a2: piece handle=/backup_rman/backupset/por02p/POR02P_20130331_e5o5su45_1_1.BAK tag=LEVEL0
    channel a2: restored backup piece 1
    channel a2: restore complete, elapsed time: 00:05:05
    Finished restore at 23-APR-2013 14:28:24
    channel a2: reading from backup piece /backup_rman/backupset/por02p/POR02P_20130402_fto626s5_1_1.BAK
    channel a1: piece handle=/backup_rman/backupset/por02p/POR02P_20130402_fuo626s5_1_1.BAK tag=LEVEL1D
    channel a1: restored backup piece 1
    channel a1: restore complete, elapsed time: 00:00:35
    channel a2: piece handle=/backup_rman/backupset/por02p/POR02P_20130402_fto626s5_1_1.BAK tag=LEVEL1D
    channel a2: restored backup piece 1
    channel a2: restore complete, elapsed time: 00:00:35
    starting media recovery
    unable to find archived log
    archived log thread=1 sequence=163
    released channel: a1
    released channel: a2
    released channel: a3
    released channel: a4
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03002: failure of Duplicate Db command at 04/23/2013 14:29:41
    RMAN-05501: aborting duplication of target database
    RMAN-03015: error occurred in stored script Memory Script
    RMAN-06054: media recovery requesting unknown archived log for thread 1 with sequence 163 and starting SCN of 15660537
    RMAN> **end-of-file**
    So, if I go back to my source:
    RMAN> list backup of archivelog sequence 163 thread 1;
    List of Backup Sets
    ===================
    BS Key Size Device Type Elapsed Time Completion Time
    11386318 66.51M DISK 00:00:09 31-MAR-2013 08:04:08
    BP Key: 11386330 Status: EXPIRED Compressed: YES Tag: ARCHIVE_BAK
    Piece Name: /backup_rman/backupset/por02p/POR02P_20130331_e9o5subf_1_1.BAK
    List of Archived Logs in backup set 11386318
    Thrd Seq Low SCN Low Time Next SCN Next Time
    1 163 15336390 31-MAR-2013 00:45:14 15661283 31-MAR-2013 08:03:51
    The archivelog is not reported as part of the preview
    command. Also, shouldn't its changes be included in
    the next day's LEVEL1?
    Thanks for your time. I'd be happy to provide
    more info.
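    An archived log holds the changes from its Low SCN up to, but not including, its Next SCN, so the log that RMAN-06054 asks for can be found with a simple range lookup. This Python sketch applies that to the thread 1 rows shown in this post (sequence 163 from the list backup output, plus three thread 1 rows from the PREVIEW). The function name is illustrative.

```python
# (thread, sequence, low_scn, next_scn) rows copied from this post.
LOGS = [
    (1, 163, 15336390, 15661283),  # list backup of archivelog sequence 163 thread 1
    (1, 179, 17279579, 17549968),  # thread 1 rows from the PREVIEW output
    (1, 180, 17549968, 17549996),
    (1, 181, 17549996, 17687415),
]

def covering_log(logs, thread, scn):
    """Return (sequence, low_scn, next_scn) of the log whose range contains scn, else None.

    A log holds changes with low_scn <= SCN < next_scn.
    """
    for t, seq, low, nxt in logs:
        if t == thread and low <= scn < nxt:
            return seq, low, nxt
    return None

print(covering_log(LOGS, 1, 15660537))  # starting SCN from RMAN-06054 -> (163, 15336390, 15661283)
print(covering_log(LOGS, 1, 17547912))  # PREVIEW media recovery start SCN -> (179, 17279579, 17549968)
```

    For comparison, SCN 15660537 is just above the LEVEL0 checkpoint SCNs (15658251 and 15658225) and well below the 02-APR LEVEL1 checkpoint SCNs (17547965 and 17547912).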

    Levi,
    Thank you very much for your insight.
    However, I may have found something. I have not fully tested it or found a solution yet, so I have not posted an update; I want to make sure I have a complete answer. In the meantime, here are some details.
    I decided to go ahead and restore ALL my archivelogs just to see if I could get the recovery to work.
    It still did not work!
    So, I happened to look at my alert log (can't believe I did not check it before) and I saw this:
    Wed Apr 24 13:34:20 2013
    Errors with log +DB_ARCH/por02x/archivelog/2013_04_24/thread_4_seq_151.7352.813591241
    Recovery interrupted!
    Recovered data files to a consistent state at change 17574527
    Media Recovery failed with error 19755
    Errors in file /opt/app/oracle/diag/rdbms/por02x/por02x/trace/por02x_pr00_2973744.trc:
    ORA-00283: recovery session canceled due to errors
    ORA-19755: could not open change tracking file
    ORA-19750: change tracking file: '+POR02P_DATA/por02p/changetracking/ctf.272.810127579'
    ORA-17503: ksfdopn:2 Failed to open file +POR02P_DATA/por02p/changetracking/ctf.272.810127579
    ORA-15001: diskgroup "POR02P_DATA" does not exist or is not mounted
    ORA-15001: diskgroup "POR02P_DATA" does not exist or is not mounted
    Slave exiting with ORA-283 exception
    Errors in file /opt/app/oracle/diag/rdbms/por02x/por02x/trace/por02x_pr00_2973744.trc:
    ORA-00283: recovery session canceled due to errors
    ORA-19755: could not open change tracking file
    ORA-19750: change tracking file: '+POR02P_DATA/por02p/changetracking/ctf.272.810127579'
    ORA-17503: ksfdopn:2 Failed to open file +POR02P_DATA/por02p/changetracking/ctf.272.810127579
    ORA-15001: diskgroup "POR02P_DATA" does not exist or is not mounted
    ORA-15001: diskgroup "POR02P_DATA" does not exist or is not mounted
    ORA-10877 signalled during: alter database recover logfile '+DB_ARCH/por02x/archivelog/2013_04_24/thread_4_seq_151.7352.813591241'...
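    Since the real error stack only showed up in the alert log, a small filter that lists the distinct ORA- codes and the ASM diskgroups referenced can make stacks like this easier to scan. This Python sketch runs on part of the excerpt above; it only pattern-matches text and knows nothing about Oracle itself.

```python
import re

ORA_CODE = re.compile(r"\bORA-\d{5}\b")            # e.g. ORA-19755
ASM_PATH = re.compile(r"\+[A-Za-z0-9_]+/[^\s']+")  # e.g. +POR02P_DATA/por02p/...

def summarize_alert(text):
    """Return (distinct ORA- codes, distinct ASM diskgroups), both in first-seen order."""
    codes, groups = [], []
    for code in ORA_CODE.findall(text):
        if code not in codes:
            codes.append(code)
    for path in ASM_PATH.findall(text):
        group = path[1:].split("/", 1)[0]  # text between "+" and the first "/"
        if group not in groups:
            groups.append(group)
    return codes, groups

excerpt = """\
Errors with log +DB_ARCH/por02x/archivelog/2013_04_24/thread_4_seq_151.7352.813591241
ORA-00283: recovery session canceled due to errors
ORA-19755: could not open change tracking file
ORA-19750: change tracking file: '+POR02P_DATA/por02p/changetracking/ctf.272.810127579'
ORA-17503: ksfdopn:2 Failed to open file +POR02P_DATA/por02p/changetracking/ctf.272.810127579
ORA-15001: diskgroup "POR02P_DATA" does not exist or is not mounted
"""

codes, groups = summarize_alert(excerpt)
print(codes)
print(groups)
```

    On this excerpt it reports the source diskgroup POR02P_DATA alongside the auxiliary's DB_ARCH, consistent with the ORA-15001 lines: the recovery is trying to open the source database's change tracking file.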
    Then, I looked on Metalink and found this:
    Oracle Support Document 1098638.1 (Rman Duplicate fail ORA-19755, Tries Open The Block Change Tracking File of Source DB) can be found at: https://support.oracle.com/epmos/faces/DocumentDisplay?id=1098638.1
    This document is very relevant to the problem I am having. It says a few things:
    1. This is fixed in Oracle patchset 11.2.0.2 as part of Bug 7500916.
    ---I am on patchset 11.2.0.2!
    2. The workaround is to disable block change tracking before the duplicate.
    ---this is not really an option for me; we like having it turned on because it considerably speeds up our backups.
    3. Another workaround is to set DB_FILE_NAME_CONVERT in the SET clause of DUPLICATE instead of in an init.ora or spfile.
    ---tried this; it did not work.
    4. Another workaround is to create a dummy file at the location where ORA-19755 is signalled.
    ---have not tried this yet.
    Another interesting thing to note:
    My database is left in the mounted state when the duplicate fails. After seeing this document, just for fun I ran:
    SQL> alter database disable block change tracking;
    Database altered.
    And then:
    RMAN> run {
    2> set until time = "to_date('02-APR-2013:15:00:00','DD-MON-YYYY:HH24:MI:SS')";
    3> recover database;
    4> }
    executing command: SET until clause
    Starting recover at 24-APR-2013 14:20:20
    using target database control file instead of recovery catalog
    allocated channel: ORA_DISK_1
    channel ORA_DISK_1: SID=98 device type=DISK
    allocated channel: ORA_DISK_2
    channel ORA_DISK_2: SID=146 device type=DISK
    starting media recovery
    archived log for thread 1 with sequence 181 is already on disk as file +DB_ARCH/por02x/archivelog/2013_04_24/thread_1_seq_181.7864.813591241
    archived log for thread 1 with sequence 182 is already on disk as file +DB_ARCH/por02x/archivelog/2013_04_24/thread_1_seq_182.2358.813591247
    archived log for thread 1 with sequence 183 is already on disk as file +DB_ARCH/por02x/archivelog/2013_04_24/thread_1_seq_183.3637.813591249
    archived log for thread 2 with sequence 149 is already on disk as file +DB_ARCH/por02x/archivelog/2013_04_24/thread_2_seq_149.9547.813591241
    archived log for thread 2 with sequence 150 is already on disk as file +DB_ARCH/por02x/archivelog/2013_04_24/thread_2_seq_150.9570.813591241
    archived log for thread 2 with sequence 151 is already on disk as file +DB_ARCH/por02x/archivelog/2013_04_24/thread_2_seq_151.6148.813591247
    archived log for thread 3 with sequence 151 is already on disk as file +DB_ARCH/por02x/archivelog/2013_04_24/thread_3_seq_151.9630.813591239
    archived log for thread 3 with sequence 152 is already on disk as file +DB_ARCH/por02x/archivelog/2013_04_24/thread_3_seq_152.4418.813591247
    archived log for thread 3 with sequence 153 is already on disk as file +DB_ARCH/por02x/archivelog/2013_04_24/thread_3_seq_153.9782.813591247
    archived log for thread 4 with sequence 151 is already on disk as file +DB_ARCH/por02x/archivelog/2013_04_24/thread_4_seq_151.7352.813591241
    archived log for thread 4 with sequence 152 is already on disk as file +DB_ARCH/por02x/archivelog/2013_04_24/thread_4_seq_152.6936.813591241
    archived log for thread 4 with sequence 153 is already on disk as file +DB_ARCH/por02x/archivelog/2013_04_24/thread_4_seq_153.9610.813591249
    archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_4_seq_151.7352.813591241 thread=4 sequence=151
    archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_1_seq_181.7864.813591241 thread=1 sequence=181
    archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_2_seq_149.9547.813591241 thread=2 sequence=149
    archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_3_seq_151.9630.813591239 thread=3 sequence=151
    archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_2_seq_150.9570.813591241 thread=2 sequence=150
    archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_4_seq_152.6936.813591241 thread=4 sequence=152
    archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_1_seq_182.2358.813591247 thread=1 sequence=182
    archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_3_seq_152.4418.813591247 thread=3 sequence=152
    archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_1_seq_183.3637.813591249 thread=1 sequence=183
    archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_3_seq_153.9782.813591247 thread=3 sequence=153
    archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_2_seq_151.6148.813591247 thread=2 sequence=151
    archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_4_seq_153.9610.813591249 thread=4 sequence=153
    media recovery complete, elapsed time: 00:00:08
    Finished recover at 24-APR-2013 14:20:36
    So the recovery finishes. However, I have not yet tried OPEN RESETLOGS, because this was originally a duplicate, and opening the database with RESETLOGS by hand would not complete all the post-duplicate operations.
    Also, looking at the logs applied by the recover above, it now uses only logs that were listed in the original PREVIEW output.
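    The observation about which logs were applied can be checked mechanically: parse the thread=/sequence= pairs from the "archived log file name=" lines and compare them with the sequences the PREVIEW listed (thread 1: 179-183, thread 2: 147-151, threads 3 and 4: 149-153). A Python sketch using the lines from the recover output above:

```python
import re

# Matches e.g. "archived log file name=+DB_ARCH/... thread=4 sequence=151"
APPLIED_RE = re.compile(r"archived log file name=\S+ thread=(\d+) sequence=(\d+)")

def applied_logs(recover_output):
    """Return the set of (thread, sequence) pairs RMAN reports as applied."""
    return {(int(t), int(s)) for t, s in APPLIED_RE.findall(recover_output)}

# Sequences per thread that appear in the PREVIEW's "List of Archived Logs".
PREVIEW = {(t, s) for t, seqs in {1: range(179, 184), 2: range(147, 152),
                                  3: range(149, 154), 4: range(149, 154)}.items()
           for s in seqs}

recover_output = """\
archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_4_seq_151.7352.813591241 thread=4 sequence=151
archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_1_seq_181.7864.813591241 thread=1 sequence=181
archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_2_seq_149.9547.813591241 thread=2 sequence=149
archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_3_seq_151.9630.813591239 thread=3 sequence=151
archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_2_seq_150.9570.813591241 thread=2 sequence=150
archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_4_seq_152.6936.813591241 thread=4 sequence=152
archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_1_seq_182.2358.813591247 thread=1 sequence=182
archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_3_seq_152.4418.813591247 thread=3 sequence=152
archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_1_seq_183.3637.813591249 thread=1 sequence=183
archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_3_seq_153.9782.813591247 thread=3 sequence=153
archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_2_seq_151.6148.813591247 thread=2 sequence=151
archived log file name=+DB_ARCH/por02x/archivelog/2013_04_24/thread_4_seq_153.9610.813591249 thread=4 sequence=153
"""

applied = applied_logs(recover_output)
print(sorted(applied - PREVIEW))  # logs applied that the PREVIEW did not list; prints []
```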
    I am going to play around some more and I promise I will post back with whatever results I happen across.
    Chris..

  • Need suggestion on implementing JMS message error recovery

    Hi,
    Our application has a JMS topic where we publish application events. There can be scenarios where consumers cannot process a message because of infrastructure issues, and processing errors out. We need a way to reprocess those messages later. We are considering the following design for JMS message error recovery:
    1. Use a persistent topic (this would ensure guaranteed delivery).
    2. Configure an error destination on the JMS topic, e.g. a JMS queue.
    3. Have an error-handling MDB listen on the error destination. It would dequeue the failed messages from the error destination and persist them to a database "error" table.
    4. Provide a mechanism to republish those messages to the topic (e.g. a scheduler, an admin UI, or a command-line utility). Each message would be deleted from the database "error" table and published to the topic again.
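    The four steps can be modeled end to end without any middleware to see how the pieces interact. The Python sketch below is a toy: Broker stands in for the persistent topic and its error destination, sqlite3 stands in for the database "error" table, and the consumer fails while a hypothetical third_party_up flag is False. None of these names are JMS or WebLogic APIs, and the shortcut of sending a failure straight to the error destination skips the redelivery attempts a real server would make first.

```python
import sqlite3
from collections import deque

class Broker:
    """Toy stand-in for a persistent topic plus its error destination (NOT a JMS API)."""
    def __init__(self):
        self.subscribers = []
        self.error_destination = deque()

    def publish(self, payload):
        for consume in self.subscribers:
            try:
                consume(payload)
            except Exception:
                # Stand-in for "redelivery limit exhausted -> move to error destination".
                self.error_destination.append(payload)

def drain_errors(broker, db):
    """Step 3: the error-handling 'MDB' moves failed messages into the error table."""
    with db:  # commits on success, rolls back on exception
        while broker.error_destination:
            db.execute("INSERT INTO error (payload) VALUES (?)", (broker.error_destination.popleft(),))

def republish(broker, db):
    """Step 4: publish each stored message again, then delete its row."""
    rows = db.execute("SELECT id, payload FROM error ORDER BY id").fetchall()
    for row_id, payload in rows:
        broker.publish(payload)  # publish first: a crash here yields a duplicate, not a loss
        with db:
            db.execute("DELETE FROM error WHERE id = ?", (row_id,))
    return len(rows)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE error (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)")
broker = Broker()
third_party_up = False  # hypothetical infrastructure state
processed = []

def consumer(payload):
    if not third_party_up:
        raise RuntimeError("third-party system unavailable")
    processed.append(payload)

broker.subscribers.append(consumer)
broker.publish("event-1")
broker.publish("event-2")
drain_errors(broker, db)
stored = db.execute("SELECT COUNT(*) FROM error").fetchone()[0]      # 2 rows parked

third_party_up = True                                                # infrastructure recovers
republished = republish(broker, db)
drain_errors(broker, db)
remaining = db.execute("SELECT COUNT(*) FROM error").fetchone()[0]   # 0 rows left
print(stored, republished, remaining, processed)
```

    One design choice worth noting: republish() publishes before deleting the row, so a crash between the two steps produces a duplicate rather than a lost message, which means consumers should tolerate duplicates. Also, because republishing goes to the topic, every subscriber receives the message again, not only the one that failed.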
    A. Are there any issues with the above design that we need to handle?
    B. Are there any additional steps required in a cluster environment with a distributed topic and a distributed error destination? (Our error MDB will use the one-copy-per-application setting.)
    C. From a performance angle, is it OK to use a persistent topic? Or would it be better to persist the message to the DB table and then publish it as a non-persistent message? (I guess the performance would be more or less the same with both approaches.)
    D. Are there any other recommended design patterns for error recovery of JMS messages?
    Please advise.
    Regards,
    Arif

    Thanks Tom !
    We may not be able to go with the approach of delaying/pausing redelivery of the message because:
    1. Pausing the entire MDB: our MDB consumes messages generated by different producers, and it needs to keep processing messages even if messages from one producer are erroring out.
    2. Redelivery delay: this would only delay the retry of a failed message. There would still be a problem if the message fails on every retry (i.e. up to the redelivery limit), and we don't want to lose that message. In our case, a particular message may be unprocessable because a third-party system is unavailable for hours or even a day.
    Basically, I am looking for approaches to a robust and performant error recovery/retry framework for our application (see the details in my first post in this thread) that makes full use of the features provided by the middleware (WLS). Please advise.
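    The requirement in point 2 (delay retries, never drop a message that exhausts them, and keep other producers' messages flowing) can be sketched as per-message bookkeeping. This Python sketch uses exponential backoff capped at max_delay and a "parked" list for messages that run out of attempts. The class, its parameters and the backoff policy are illustrative assumptions, not WebLogic redelivery settings; the post itself only mentions a redelivery delay and limit.

```python
import heapq

class RetryScheduler:
    """Hypothetical per-message retry bookkeeping (not a WebLogic feature)."""
    def __init__(self, max_attempts=3, base_delay=60.0, max_delay=3600.0):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.pending = []  # heap of (due_time, tiebreak, message, failures)
        self.parked = []   # messages that exhausted max_attempts; kept, never dropped
        self._tiebreak = 0

    def record_failure(self, message, failures, now):
        """`failures` counts failed attempts so far (>= 1)."""
        if failures >= self.max_attempts:
            self.parked.append(message)  # e.g. hand off to a database error table
            return
        delay = min(self.base_delay * 2 ** (failures - 1), self.max_delay)
        heapq.heappush(self.pending, (now + delay, self._tiebreak, message, failures))
        self._tiebreak += 1

    def run_due(self, now, handler):
        """Retry every message whose time has come; reschedule or park failures."""
        while self.pending and self.pending[0][0] <= now:
            _, _, message, failures = heapq.heappop(self.pending)
            try:
                handler(message)
            except Exception:
                self.record_failure(message, failures + 1, now)

def handler(msg):
    producer, _body = msg
    if producer == "B":  # producer B depends on a third-party system that is down
        raise RuntimeError("third-party system unavailable")
    delivered.append(msg)

delivered = []
sched = RetryScheduler(max_attempts=3, base_delay=60)
for msg in [("A", 1), ("B", 1), ("A", 2)]:
    try:
        handler(msg)
    except Exception:
        sched.record_failure(msg, 1, now=0)

for t in (60, 180, 400):  # simulated clock ticks, in seconds
    sched.run_due(t, handler)
print(delivered, sched.parked)
```

    Here producer A's messages are delivered immediately, while producer B's message is retried at t=60 and t=180 and then parked rather than lost. Parked messages could feed the database error table described in the first post.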
    Regards,
    Arif
