Transaction Recovery

A user has been connected for the past 5 hours and has been inserting continuously, with no idle time. So far the user has inserted 2500 rows into a table but has not committed. Suddenly the user experiences a client process failure (i.e. the client machine the user was connected from has rebooted). Now the PMON process will roll back all the transactions. Is there a way to recover those 2500 transactions?
Regards
Neo

"you mean redo log file only contains dump of transaction without any ownership & session information"
No, the OS information, the Oracle username, etc. are stored in the redo logs. But once a rollback has been performed, it's too late.
"now how to recover those transactions,"
There is no recovery involved in what you describe. This is normal transaction processing: if a transaction fails, it is rolled back. Normal functioning.
If you like to play with data integrity, try LogMiner on your archived redo logs, but I warn you... this is dangerous and will sooner or later make your data inconsistent. One and only warning.
"I suppose DBA's primary motive is to protect data in almost every case of disaster"
Well... yes?
"not to train the users/developers."
There I disagree. Who better than a DBA can teach devs/users how to use a database well?
"because in almost every situation there is space for humans to commit a mistake."
Yes, and in your case, the mistake was made by the user (no commit) and by the DBA (not explaining to the user what could happen in case of a problem).
Regards,
Yoann.
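
To illustrate the point about committing, here is a minimal PL/SQL sketch. It assumes a hypothetical table orders_demo(id, note) that is not part of the original question, and the batch size of 500 is arbitrary. With intermediate commits, a client failure costs at most the rows inserted since the last commit. Whether that is acceptable depends on whether the 2500 rows must succeed or fail as a single unit.

```sql
-- Sketch only: orders_demo and the batch size of 500 are illustrative assumptions.
BEGIN
  FOR i IN 1 .. 2500 LOOP
    INSERT INTO orders_demo (id, note)
    VALUES (i, 'row ' || i);

    -- Make the work done so far permanent every 500 rows.
    IF MOD(i, 500) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;

  -- Commit any rows left over after the last full batch.
  COMMIT;
END;
/
```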

Similar Messages

Oracle 9 and XA transaction recovery

Just thought I would pass on a bit of advice if you are attempting to enable XA transaction recovery from either Application Server 7.0 or 8.1 with Oracle version 9.2.0.6 and possibly above.
In 9.2.0.6 Oracle changed the way it responds to XA recovery queries from transaction managers. For a list of XAIDs to be returned without an exception being raised, the Oracle user that the Application Server uses for recovery now needs the following privileges:
Select on DBA_PENDING_TRANSACTIONS, PENDING_TRANS$, DBA_2PC_PENDING and DBA_2PC_NEIGHBORS
and the new one execute on DBMS_SYSTEM
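
As a sketch, the privileges listed above could be granted as follows. This assumes the statements are run by a suitably privileged account (for example SYS) and that the recovery user is the sunone account used in the pool definition in this post; substitute your own user name.

```sql
-- Run as a privileged user; "sunone" is the pool user from this post.
GRANT SELECT ON sys.dba_pending_transactions TO sunone;
GRANT SELECT ON sys.pending_trans$ TO sunone;
GRANT SELECT ON sys.dba_2pc_pending TO sunone;
GRANT SELECT ON sys.dba_2pc_neighbors TO sunone;

-- The additional privilege needed from 9.2.0.6 onward, per the post.
GRANT EXECUTE ON sys.dbms_system TO sunone;
```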
Also, I discovered that although a db ping works in Application Server 8.1 with lower-case names for the user, password and url properties (JDBC pool properties), to get XA recovery to work the property names for some reason had to be written with the exact case shown below (User, URL, Password). See below:
Not working:
<jdbc-connection-pool connection-validation-method="table"
datasource-classname="oracle.jdbc.xa.client.OracleXADataSource"
fail-all-connections="false" idle-timeout-in-seconds="300"
is-connection-validation-required="true"
is-isolation-level-guaranteed="false" max-pool-size="32"
max-wait-time-in-millis="60000" name="OraclePool"
pool-resize-quantity="2" res-type="javax.sql.XADataSource"
steady-pool-size="8" validation-table-name="DUAL">
<description>Oracle Pool</description>
<property name="user" value="sunone"/>
<property name="url" value="jdbc:oracle:oci:@SS"/>
<property name="password" value="sunone"/>
</jdbc-connection-pool>
Working!!
<jdbc-connection-pool connection-validation-method="table"
datasource-classname="oracle.jdbc.xa.client.OracleXADataSource"
fail-all-connections="false" idle-timeout-in-seconds="300"
is-connection-validation-required="true"
is-isolation-level-guaranteed="false" max-pool-size="32"
max-wait-time-in-millis="60000" name="OraclePool"
pool-resize-quantity="2" res-type="javax.sql.XADataSource"
steady-pool-size="8" validation-table-name="DUAL">
<description>Oracle Pool</description>
<property name="User" value="sunone"/>
<property name="URL" value="jdbc:oracle:oci:@SS"/>
<property name="Password" value="sunone"/>
</jdbc-connection-pool>
Just thought I would share that with you!

I am able to connect and message to Oracle AQ using an XA Datasource using the getVendorConnection method on the Weblogic connection wrapper (WLConnection).
This, however, seems to be leaking memory quite badly. I've followed the WebLogic documentation (which says that you should not call close() on the vendor connection).
I've also tried closing the vendorConnection object and nulling out the reference that the WLConnection holds. This slows the leak, but it appears that the WLConnections are never being released.
Has anyone else had any success doing this?
Dom.

Parallel Transaction recovery caught exception 30319

Hi,
Database - Oracle 11.1.0.7
Server - RHEL 5.2
A load process that was meant to load 7 million records was killed after 4 hours because it had produced several errors in the alert log -
22238:ORA-04030: out of process memory when trying to allocate 10504 bytes (pga heap,kgh stack)
After the process was killed, even after 3 hours, there were several trc files being generated with the error -
Dead transaction 0x0001.04a.00005b78 recovered by 32 server(s)
SMON: Parallel transaction recovery tried
Parallel Transaction recovery caught exception 30319
and
Parallel Transaction recovery server caught exception 10388
*** 2010-10-07 20:40:01.124
*** SESSION ID:(2993.32762) 2010-10-07 20:40:01.124
*** SERVICE NAME:(SYS$BACKGROUND) 2010-10-07 20:40:01.124
Does this mean that the SMON recovery failed? If so, what can I do about it? I need to kick off the load process again.
these are some other queries I ran -
select l.message,l.sql_id,l.totalwork,l.sofar,l.*
from v$session_longops l
where time_remaining is not null and time_remaining > 0
Row#     MESSAGE     SQL_ID     TOTALWORK     SOFAR     SID     TIME_REMAINING     ELAPSED_SECONDS
1     Table Scan: prdb.prpeet: 156648 out of 843440 Blocks done     3w1cxu6vbz4jn     843440     156648     2956     8707     1986
2     Index Fast Full Scan: prdb.prpeet: 15 out of 216076 Blocks done     7cacc3d07d041     216076     15     2953     3442572     239
3     Index Fast Full Scan: prdb.prpeet: 192281 out of 206419 Blocks done     d2cu3p7tmuz0z     206419     192281     2953     37     509
4     Index Fast Full Scan: prdb.prpeet: 198899 out of 216076 Blocks done     f7whmwp813kf8     216076     198899     2953     46     531
5     Index Fast Full Scan: prdb.prpeet: 5 out of 216076 Blocks done     dwf4gghk4mq0z     216076     5     2953     604999     14
6     Index Fast Full Scan: prdb.prpeet: 7 out of 216076 Blocks done     dwf4gghk4mq0z     216076     7     2953     864276     28
7     Index Fast Full Scan: prdb.prpeet: 6 out of 216076 Blocks done     7cacc3d07d041     216076     6     2953     504163     14
8     Index Fast Full Scan: prdb.prpeet: 9 out of 216076 Blocks done     7cacc3d07d041     216076     9     2953     1128350     47
9     Index Fast Full Scan: prdb.prpeet: 13 out of 216076 Blocks done     7cacc3d07d041     216076     13     2953     1961187     118
Can someone please tell me what I need to do so that I can restart the load?
Thanks.

I do -
OPEN cur_n1_C1 ;
LOOP
  FETCH cur_n1_C1 BULK COLLECT INTO vt_C1 LIMIT 1000000;
  FOR i IN vt_C1.FIRST .. vt_C1.LAST LOOP
    INSERT /*+ APPEND */ INTO prpeet
    VALUES
  END LOOP ;
  COMMIT;
  END IF ;
END LOOP;
CLOSE cur_n1_C1 ;
EXCEPTION
So, will this commit be a problem? I am loading around 7 million records.
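
One possible contributor to the ORA-04030 reported earlier is the LIMIT of 1000000, since every fetched row is held in PGA memory until the batch has been processed. The following is a hedged sketch of the same load with a much smaller batch and FORALL instead of a row-by-row loop. The source table prpeet_stage is hypothetical (the original cursor definition was not posted) and is assumed to have the same column layout as prpeet; the LIMIT of 1000 is only an illustrative value.

```sql
-- Sketch only: prpeet_stage is a hypothetical source with the same columns as prpeet.
DECLARE
  CURSOR cur_src IS
    SELECT * FROM prpeet_stage;

  TYPE t_rows IS TABLE OF cur_src%ROWTYPE;
  vt_rows t_rows;
BEGIN
  OPEN cur_src;
  LOOP
    -- A small batch keeps PGA usage bounded.
    FETCH cur_src BULK COLLECT INTO vt_rows LIMIT 1000;

    -- Leave the loop once a fetch returns nothing. This also avoids
    -- iterating over FIRST .. LAST of an empty collection.
    EXIT WHEN vt_rows.COUNT = 0;

    -- One bulk-bound INSERT per batch instead of one INSERT per row.
    FORALL i IN 1 .. vt_rows.COUNT
      INSERT INTO prpeet VALUES vt_rows(i);

    COMMIT;
  END LOOP;
  CLOSE cur_src;
END;
/
```

Committing per batch keeps each transaction small, so less undo has to be rolled back if the load is killed again, but the load then has to be restartable from the last committed batch.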
Also,
I ran the v$session_longops query again now and this is what it says -
select l.message,l.sql_id,l.totalwork,l.sofar,l.sid, l.time_remaining, elapsed_seconds
from v$session_longops l
where time_remaining is not null and time_remaining > 0
Row#     MESSAGE     SQL_ID     TOTALWORK     SOFAR     SID     TIME_REMAINING     ELAPSED_SECONDS
1     Sort Output: : 55613 out of 122712 Blocks done          122712     55613     2987     490     406
3     Table Scan: prdb.prpeet: 160239 out of 843440 Blocks done     3w1cxu6vbz4jn     843440     160239     2956     21071     4942
Does this mean it will finish in 21071 seconds?
Edited by: user12158503 on Oct 7, 2010 9:41 PM
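
On the question of whether 21071 seconds is a finish time: the posted figures are consistent with TIME_REMAINING being a linear extrapolation from the progress so far, since 4942 * (843440 - 160239) / 160239 is approximately 21071. That makes it an estimate based on the rate observed up to now, not a guarantee. The query below is a sketch that shows the same extrapolation next to the column; est_remaining is just an illustrative alias.

```sql
-- Compare Oracle's TIME_REMAINING with a simple linear extrapolation.
SELECT l.sid,
       l.message,
       l.sofar,
       l.totalwork,
       l.elapsed_seconds,
       l.time_remaining,
       ROUND(l.elapsed_seconds * (l.totalwork - l.sofar) / l.sofar) AS est_remaining
FROM   v$session_longops l
WHERE  l.sofar > 0
AND    l.sofar < l.totalwork;
```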

Transaction Recovery Service failover

Can anyone explain what the suggested configuration is for the default persistence store? In particular, this is to ensure the proper failover / migration of the Transaction Recovery Service which is required to use the Default Persistence Store which is file based. Based on the following statement from the docs:
Preparing to Migrate the Transaction Recovery Service
To migrate the Transaction Recovery Service from a failed server in a cluster to another server (backup server) in the same cluster, the backup server must have access to the transaction log records from the failed server. Therefore, you must store default persistent store data files on persistent storage available to all potential backup servers in the cluster. Oracle recommends that you store transaction log records on a Storage Area Network (SAN) device or a dual-ported disk. Do not use an NFS file system to store transaction log records. Because of the caching scheme in NFS, files on disk may not always be current. Using transaction log records stored on an NFS device for recovery may cause data corruption.
A SAN storage device is recommended for this but my understanding is that the SAN can only be mounted by one machine at a time. Does this imply then that our failover process needs to include mounting the SAN before starting the failover server (as part of the whole server migration)? The docs here (http://download.oracle.com/docs/cd/E15523_01/core.1111/e12036/net.htm#CIHBDDAA) indicate that a NAS can be used and even give examples of configuring it using NFS mount points:
The following commands show how to share the SOA TX logs location across different nodes:
SOAHOST1> mount nasfiler:/vol/vol1/u01/app/oracle/stores/soadomain/soa_cluster/tlogs /u01/app/oracle/stores/soadomain/soa_cluster/tlogs -t nfs
SOAHOST2> mount nasfiler:/vol/vol1/u01/app/oracle/stores/soadomain/soa_cluster/tlogs /u01/app/oracle/stores/soadomain/soa_cluster/tlogs -t nfs
Can anyone describe a best-practices approach for how to configure the expected persistent storage solution that will work with proper failover of the transaction recovery service?
Thanks!
Gary

have a look at this article and see if it helps.
http://el-caro.blogspot.com/2008/11/parallel-rollback.html

URGENT:SMON: Parallel transaction recovery tried

Hi,
I got the following messages in the BDUMP file. Around this time my process was always getting blocked while updating a table. Do these messages mean anything critical?
Dump file d:\oracle\admin\usmdb\bdump\usmdb_smon_1564.trc
Sun Mar 07 03:17:27 2004
ORACLE V9.2.0.1.0 - Production vsnsta=0
vsnsql=12 vsnxtr=3
Windows 2000 Version 5.0 Service Pack 3, CPU type 586
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
Windows 2000 Version 5.0 Service Pack 3, CPU type 586
Instance name: usmdb
Redo thread mounted by this instance: 1
Oracle process number: 6
Windows thread id: 1564, image: ORACLE.EXE
*** 2004-03-07 03:17:27.000
*** SESSION ID:(5.1) 2004-03-07 03:17:27.000
*** 2004-03-07 03:17:27.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 03:58:28.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 04:38:38.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 04:39:23.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 05:20:16.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 06:01:25.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 06:42:27.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 07:23:28.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 08:04:33.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 08:45:31.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 09:26:36.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 10:07:42.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 10:48:46.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 11:29:41.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 12:10:56.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 12:51:52.000
SMON: Parallel transaction recovery tried
*** 2004-03-07 13:32:55.000
SMON: Parallel transaction recovery tried
Thanks,
Tuhin

Hello
No, there is no problem. SMON attempted the recovery in parallel, that attempt failed, and the recovery was then done in serial mode.
lajos
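
The reply does not name it, but the initialization parameter that governs whether SMON attempts parallel transaction recovery is FAST_START_PARALLEL_ROLLBACK. As a sketch, assuming that parameter is what is in play here, its current setting can be checked like this:

```sql
-- Shows whether parallel rollback is disabled (FALSE) or enabled (LOW or HIGH).
SELECT name, value
FROM   v$parameter
WHERE  name = 'fast_start_parallel_rollback';
```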

That would occur because a large transaction (or transactions) had been killed or interrupted while the instance was running (or when the instance was shut down with SHUTDOWN ABORT). SMON takes over the job of "cleanup" and may use parallel recovery. You should be able to monitor the recovery in the V$FAST_START_TRANSACTIONS view.
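
A minimal sketch of such monitoring follows. The pct_done column is an illustrative alias computed from the view's own counters.

```sql
-- Progress of transactions that SMON is recovering (or has recently recovered).
SELECT usn,
       state,
       undoblocksdone,
       undoblockstotal,
       ROUND(100 * undoblocksdone / NULLIF(undoblockstotal, 0), 1) AS pct_done
FROM   v$fast_start_transactions;
```

Running it a few times and comparing UNDOBLOCKSDONE between runs gives a rough idea of how fast the rollback is progressing.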

Transaction recovery: caught and ignored

For about a week I have noticed the following message repeated many, many times in my alert log:
Transaction recovery: caught and ignored
I suspect this is related to a big Data Pump import operation that did not finish because I cancelled it due to errors. After that, log switches began to occur very frequently (every 20 seconds, redo size 50MB).
Actually I have no idea what could be happening, so I appreciate any help you can give me.
Thanks very much.

http://serdarturgut.blogspot.com/2010/05/transaction-recovery-lock-conflict.html

Transaction recovery caught exception

I see this in my .trc file:
*** 2009-05-17 15:57:03.492
Parallel Transaction recovery caught exception 30319
Parallel Transaction recovery caught error 30319
*** 2009-05-17 16:01:04.040
Parallel Transaction recovery caught exception 30319
Parallel Transaction recovery caught error 30319
What is it, and what must I do about it?

have a look at this article and see if it helps.
http://el-caro.blogspot.com/2008/11/parallel-rollback.html

Transaction Recovery within an Oracle RAC environment

Good evening everyone.
I need some help with Oracle 11gR1 RAC transaction-level recovery issues. Here's the scenario.
We have a three (3) node RAC cluster running Oracle 11g R1. The Web UI portion of the application connects through WLS 9.2.3 with connection pooling set. We also have a command-line/SQL*Developer component that uses a TNSNAMES file that allows for both failover and load balancing. Within either the UI or the command-line portion of the application, a user can run a process that invokes one or more PL/SQL packages. The exact location of the physical connection to the database depends on which server is chosen by either the connection pool or the TNSNAMES.ORA load-balancing option.
In the normal world, the process executes and all is good. The status of the execution of this process is updated by the packages once completed. The problem we are encountering is when an Oracle instance fails. Here's where I need some help. For application-level (transaction-level) recovery, the database instances are first recovered by the database background processes, and then users must determine which processes were in flight and either re-execute them (if restart processing is part of the process) or remove any changes and restart from scratch. Given that the database instance does not record which processes are "in flight", it is the responsibility of the application to perform its own recovery processing. Is this still true?
If an instance fails, are "in flight" transactions/connections moved to other instances in the Grid/RAC environment? I don't think this is possible, but I don't remember whether this was accomplished through a combination of application and database server features that provide feedback to each other. How is the underlying application notified of the change if such an issue occurs? I remember something similar to this in older versions of Oracle but I cannot remember what it was called.
Any help or guidance would be great as our client is being extremely difficult in pressing this issue.
Thanks in advance
Stephen Karniotis
Project Architect - Compuware
(248) 408-2918

You have not indicated whether you are using TAF or FCF ... that would be the first place to start.
My recommendation would be to let Oracle roll back the database changes and have the application resubmit the most recent work.
If the application knows what it did since the last "COMMIT" then you should be fine with the possible exception of variables stored
in packages. Depending on packages retaining values is an issue best solved with PRAGMA SERIALLY_REUSABLE ... in other words
not using the retention feature.
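
As a sketch of what that looks like, the hypothetical package below (run_state_demo and its members are illustrative names, not from the thread) is marked SERIALLY_REUSABLE in both the specification and the body. Its variables are then re-initialized for each call to the server instead of persisting for the life of the session, so the application cannot come to depend on state that would be lost after a failover.

```sql
-- Hypothetical package: the pragma must appear in both the spec and the body.
CREATE OR REPLACE PACKAGE run_state_demo IS
  PRAGMA SERIALLY_REUSABLE;
  g_counter NUMBER := 0;   -- reset to 0 for every server call
  PROCEDURE bump;
END run_state_demo;
/

CREATE OR REPLACE PACKAGE BODY run_state_demo IS
  PRAGMA SERIALLY_REUSABLE;

  PROCEDURE bump IS
  BEGIN
    g_counter := g_counter + 1;
  END bump;
END run_state_demo;
/
```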

OWB Transactions & Recovery

Hi there all
I have read chapter 8 in the OWB 10gR2 user guide titled "Understanding Performance and Advanced ETL Concepts".
Unfortunately, I still have a few questions regarding transaction processing etc... in OWB.
Firstly, here are the things that I am pretty sure about but would just like some confirmation:
1. In set based, either all or none of the records are committed regardless of "Commit Frequency" or "Batch Size".
2. In Row based, if the error count reaches the "Maximum Number of Errors", then the mapping will stop and only those records that have been updated or inserted since the last commit will be rolled back.
That is, if the "commit frequency"/"bulk size" is set to 100, then at most 99 records can be rolled back when the mapping breaches the maximum number of errors.
Now for the things that have really got me stymied:
3. How do I configure an OWB mapping so that upon re-start it only processes the records that "WEREN'T" processed the last time the mapping ran (i.e. when it failed)?
That is, I don't want to process incoming dimension (or, more importantly, fact) records multiple times just because my mapping failed last time.
4. On a similar note, how do I configure a process flow to re-start where it left off after a failure.
That is, if the process flow failed half-way through the fact loading process, I want it to only run the fact mappings that didn't complete and none of the dimension mappings etc...
Phew, another lengthy post by a relative newbie.
If anyone can either answer these Qs or point me in the direction of the appropriate documentation then that would be sweet!
Thanks in advance
Jays :-)

You are right. The voting system in CMT is irreversible. If any EJB transaction fails (throws a system exception), the container will roll back everything. You are not supposed to control the transaction yourself in CMT (the preferred way).
ggu

Uncommitted transactions remain after Instance Recovery

After Instance Recovery, the database seems to contain uncommitted transactions. Please provide an explanation for the following:
create table t1
as
select *
from all_objects;
commit;
create table t2
as
select *
from all_objects;
update t2
set object_id = 1 where rownum = 1;
shutdown abort;
Neither table t2 nor the update to it was committed (right?); therefore, once the database starts up, t2 should not be there. Yet it is still there. The Oracle Database Backup and Recovery Advanced Guide 10g Release 2, page 11-9, explains how uncommitted transactions are removed (rolled back) in the Roll Backward step (transaction recovery) of the instance recovery process.
Any insight on this is highly appreciated. Thanks.

Note also following scenario run with SYSDBA privileges:
bas002> select * from v$version;
BANNER
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Prod
PL/SQL Release 10.2.0.2.0 - Production
CORE    10.2.0.2.0      Production
TNS for 32-bit Windows: Version 10.2.0.2.0 - Production
NLSRTL Version 10.2.0.2.0 - Production
bas002>
bas002> drop table t2;
drop table t2
ERROR at line 1:
ORA-00942: table or view does not exist
bas002> create table t2
2 as
3 select *
4 from all_objects;
Table created.
bas002>
bas002> update t2
2 set object_id = 1 where object_id=258;
1 row updated.
bas002>
bas002> shutdown abort;
ORACLE instance shut down.
bas002> startup
ORACLE instance started.
Total System Global Area 192937984 bytes
Fixed Size                  1288484 bytes
Variable Size             130025180 bytes
Database Buffers           54525952 bytes
Redo Buffers                7098368 bytes
Database mounted.
Database opened.
bas002>
bas002> select count(*) from t2 where object_id=1;
COUNT(*)
         0
bas002>
Message was edited by:
Pierre Forstmann
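
The transcript shows the result without spelling out the reason. CREATE TABLE ... AS SELECT is DDL, and Oracle commits implicitly around a DDL statement, so t2 and the rows loaded by the CTAS are permanent as soon as the statement completes. Only the later UPDATE was an open transaction, and that is what instance recovery rolled back. The SQL*Plus sketch below, run as SYSDBA on a test instance, separates the two effects; t3 is a hypothetical table name.

```sql
-- DDL: implicitly committed, so the empty table survives the abort.
CREATE TABLE t3 AS
SELECT * FROM all_objects WHERE 1 = 0;

-- DML: an ordinary open transaction, never committed.
INSERT INTO t3
SELECT * FROM all_objects;

SHUTDOWN ABORT
STARTUP

-- The table still exists; the uncommitted rows should have been rolled back.
SELECT COUNT(*) FROM t3;
```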

Uncommitted transactions after recovery?

Will a database have uncommitted transactions after recovery?

Do you mean Instance Recovery? or Database Recovery?
Database Recovery will perform a full rollforward/rollback. Instance Recovery will use Fast-Start On-Demand Rollback, which means Oracle performs transaction recovery only on demand, in a way similar to how consistent reads are performed; this mitigates the adverse effect of the recovery process.
Even though uncommitted transactions are present at startup, this doesn't mean you can resume dead transactions. Their changes will be completely rolled back by background processes afterwards.
Ref.
http://www.oracle.com/technology/deploy/availability/pdf/fast-start.pdf

Rollback (Tx Recovery) and Roll forward (Cache Recovery)

Hi Guys,
I have some doubts after going through these links:
Difference between redo logs and undo tablespace and Oracle DBA ADMIN Guide (E25494-02).
1) Redo logs contain committed and uncommitted data. Do they also contain before-image data, or only a vector that points to the undo segments?
The doc says: Redo entries record data that you can use to reconstruct all changes made to the database, including the undo segments.
But the forum link above says that it contains the vector only.
2) A crash of the database (abort): how does recovery happen?
I know that it first rolls forward and then rolls back. Does the undo tablespace come into the picture during this time? Are the undo tablespace segments rebuilt from the redo logs? Please help on this, I am really confused.
I tried to test a few scenarios but was not able to reach a conclusion.
First test:
1) Make a transaction (updated almost 29000 rows).
2) Abort the database.
3) Change the undo tablespace name in the init file to some dummy name.
The database fails with the following error:
SMON: enabling cache recovery
Errors in file d:\install\pracledb\diag\rdbms\dba\dba\trace\dba_ora_5592.trc:
ORA-30012: undo tablespace 'UNDOTBS11' does not exist or of wrong type
Errors in file d:\install\pracledb\diag\rdbms\dba\dba\trace\dba_ora_5592.trc:
ORA-30012: undo tablespace 'UNDOTBS11' does not exist or of wrong type
Error 30012 happened during db open, shutting down database
USER (ospid: 5592): terminating the instance due to error 30012
Second test:
1) Create a second undo tablespace.
2) Update the same number of rows.
3) Shutdown abort from another session.
4) Modify the pfile to use the new UNDOTBS2 (no spfile).
5) Start up the database; it started.
Beginning crash recovery of 1 threads
parallel recovery started with 2 processes
Started redo scan
Completed redo scan
read 7971 KB redo, 1272 data blocks need recovery
Started redo application at
Thread 1: logseq 28, block 3
Recovery of Online Redo Log: Thread 1 Group 1 Seq 28 Reading mem 0
Mem# 0: D:\INSTALL\PRACLEDB\ORADATA\DBA\REDO01.LOG
Completed redo application of 6.72MB
Completed crash recovery at
Thread 1: logseq 28, block 15945, scn 1487147
1272 data blocks read, 1272 data blocks written, 7971 redo k-bytes read
Thread 1 advanced to log sequence 29 (thread open)
Thread 1 opened at log sequence 29
Current log# 2 seq# 29 mem# 0: D:\INSTALL\PRACLEDB\ORADATA\DBA\REDO02.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 5.
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
Sun Oct 28 18:11:03 2012
SMON: enabling tx recovery
Database Characterset is WE8MSWIN1252
No Resource Manager plan active
SMON: Parallel transaction recovery tried
Third test:
1) Update the same number of rows.
2) Change the undo tablespace to the new undo tablespace.
3) Rename the old datafile.
4) Database startup fails with this error:
ALTER DATABASE OPEN
Errors in file d:\install\pracledb\diag\rdbms\dba\dba\trace\dba_dbw0_3960.trc:
ORA-01157: cannot identify/lock data file 10 - see DBWR trace file
ORA-01110: data file 10: 'D:\INSTALL\PRACLEDB\ORADATA\DBA\UNDOTBS2.DBF'
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.
Errors in file d:\install\pracledb\diag\rdbms\dba\dba\trace\dba_ora_4552.trc:
ORA-01157: cannot identify/lock data file 10 - see DBWR trace file
ORA-01110: data file 10: 'D:\INSTALL\PRACLEDB\ORADATA\DBA\UNDOTBS2.DBF'
ORA-1157 signalled during: ALTER DATABASE OPEN...
Sun Oct 28 18:20:04 2012
Checker run found 1 new persistent data failures
Please help me understand this.

Sourabh85 wrote:
Hi Hemant,
Thank you very much , really very helpful. One More Question :
"Why Oracle do the Double work ? First Populate the Undo Blocks and then do the roll backward ? Why Not directly recovered from the redo logs for uncommitted txn."
It's not doubling the work. Oracle updates the undo blocks with the old data, and since undo blocks are just like other data blocks, any change made to them is also logged as a change vector in the redo log files. This means that an undo block change vector is recorded in the redo log file before the change vector of the data block that holds the data of your statement. This is what is meant when it is said that the redo log also contains the old images. The idea is to do the recovery, when needed, in the same sequence (based on the SCNs) using everything from the redo log files. This includes recovery of the undo datafiles as well, in case they are lost too!
HTH
Aman....
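
One way to see this for yourself, offered as a sketch rather than a definitive test: run the query below in a session, perform an UPDATE in that same session, and run it again. The session statistic for undo change vectors grows along with the total redo generated, which is consistent with undo block changes being logged in the redo stream.

```sql
-- Redo generated by the current session, and the part of it
-- that consists of change vectors for undo blocks.
SELECT sn.name, ms.value
FROM   v$mystat ms
JOIN   v$statname sn ON sn.statistic# = ms.statistic#
WHERE  sn.name IN ('redo size', 'undo change vector size');
```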

ORACLE8 OPS BACKUP & RECOVERY

Product: ORACLE SERVER
Date written: 2004-08-16
ORACLE8 OPS BACKUP & RECOVERY
=============================
SCOPE
In Standard Edition, the Real Application Clusters feature is supported from 10g (10.1.0) onward.
Explanation
Database backup and recovery in OPS is similar to backup on a single instance. That is, every backup method available on a single instance is also supported in OPS.
1. Backup methods
All of the following backup methods can be used. This note describes method 2), backup using OS commands.
1) Recovery Manager (RMAN): see <Bulletin 11451>
2) Backup using OS commands
Noarchive log mode : full offline backup only
Archive log mode : full or partial, offline or online backup
3) export: see <Bulletin 10080>: ORACLE 7 BACKUP and RECOVERY methods
2. Considerations when establishing a backup policy
1) If losses caused by a disk crash, user error and so on cannot be tolerated, ARCHIVE LOG MODE must be used.
2) In most cases every instance uses automatic archiving.
3) Any data backup operation can be performed from any instance.
4) Media recovery uses the archive files of all threads.
5) Instance recovery is performed automatically by SMON of a surviving instance.
3. Noarchive log mode : Full offline backup
1) Query the following views to find the files that need to be backed up.
V$DATAFILE or DBA_DATA_FILES
V$LOGFILE
V$CONTROLFILE
2) Shut down all instances.
3) Copy the identified files to the backup destination.
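
Step 1) of the offline backup can be sketched as a single query that lists every file to copy. The file_type column is only an illustrative label added for readability.

```sql
-- All datafiles, online redo log members and control files of the database.
SELECT 'datafile' AS file_type, name FROM v$datafile
UNION ALL
SELECT 'redo log', member FROM v$logfile
UNION ALL
SELECT 'controlfile', name FROM v$controlfile;
```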
4. Archive log mode : Partial or Full Online Backup
1) Before starting the backup, run ALTER SYSTEM ARCHIVE LOG CURRENT. (This command forces a log switch of the current redo log on every node, including ones that are not currently running, and the resulting archiving, on all instances.)
2) Run ALTER TABLESPACE tablespace BEGIN BACKUP.
3) Wait until the ALTER TABLESPACE command has completed successfully.
4) Back up the datafiles that belong to the tablespace with an appropriate OS command (tar, cpio, cp, etc.).
5) Wait until the OS-level backup has finished completely.
6) Run ALTER TABLESPACE tablespace END BACKUP.
7) Back up the control file by running
ALTER DATABASE BACKUP CONTROLFILE TO filename or
ALTER DATABASE BACKUP CONTROLFILE TO TRACE.
If the archive log files are being backed up as well, run ALTER SYSTEM ARCHIVE LOG CURRENT after the END BACKUP command to secure all redo log files up to the END BACKUP point.
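
The online backup steps above can be sketched as the following statement sequence. The tablespace name users is a hypothetical example, and the OS-level copy in the middle is done outside SQL.

```sql
-- 1) Force a log switch and archive on all threads.
ALTER SYSTEM ARCHIVE LOG CURRENT;

-- 2) Put the tablespace into backup mode ("users" is an example name).
ALTER TABLESPACE users BEGIN BACKUP;

-- 3) to 5) Copy the tablespace's datafiles with an OS command
--          (tar, cpio, cp, ...) and wait for the copy to finish.

-- 6) Take the tablespace out of backup mode.
ALTER TABLESPACE users END BACKUP;

-- 7) Back up the control file.
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;

-- Archive the redo generated up to the END BACKUP point.
ALTER SYSTEM ARCHIVE LOG CURRENT;
```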
5. Important Parameters
1) Redo Log History in the control file (MAXLOGHISTORY)
By specifying MAXLOGHISTORY in the CREATE DATABASE or CREATE CONTROLFILE command, you can have the control file keep a history of the filled redo log files in a parallel server. If the database has already been created, the control file must be re-created in order to increase or decrease the log history value.
MAXLOGHISTORY specifies how much archive history can be stored in the control file; the default value is platform dependent. If it is set to a nonzero value, the LGWR process records the following information in the control file at every log switch:
thread number, log sequence number, low SCN, low SCN timestamp, next SCN (the lowest SCN of the next log)
(This information is stored in the control file when the log switch occurs, not after the redo log file has been archived.)
When log history has to be stored beyond the value specified by MAXLOGHISTORY, the oldest history is overwritten. In OPS, log history information is used during automatic media recovery to locate and reconstruct the appropriate archive log files based on SCN and thread number. In an environment where the database is used in exclusive mode with only one thread, log history information is not needed.
Log history information can be queried through V$LOG_HISTORY.
Querying V$RECOVERY_LOG in Server Manager returns information about the archive logs needed for media recovery.
For multiplexed redo log files, multiple entries are not used in the log history. Each entry holds information about the group of multiplexed log files, not about each individual file.
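
As a sketch, the log history described above can be listed per thread like this. The column names are those of V$LOG_HISTORY in Oracle8 and later; FIRST_CHANGE# and NEXT_CHANGE# correspond to the low SCN and next SCN mentioned in the text.

```sql
-- One row per log switch recorded in the control file.
SELECT thread#,
       sequence#,
       first_change#,
       first_time,
       next_change#
FROM   v$log_history
ORDER  BY thread#, sequence#;
```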
2) Parameters for Archive Log Mode
In OPS, to switch to archive log mode, mount the database in exclusive mode and then make the change.
a. LOG_ARCHIVE_FORMAT
Parameter  Description                              Example
%T         thread number, left-zero-padded          arch0000000001
%t         thread number, not padded                arch1
%S         log sequence number, left-zero-padded    arch0000000251
%s         log sequence number, not padded          arch251
Of these, %T and %t are valid only in OPS.
The format must be the same on all instances, and in an OPS environment it must include the thread number.
e.g. log_archive_format = %t_%s.arc
b. LOG_ARCHIVE_START
- Automatic archiving: if the parameter is set to TRUE and the instance is then started, the ARCH background process performs automatic archiving. For a closed thread, a running thread performs the log switch and archiving on behalf of the closed thread. This happens when a log switch is forced in order to keep the SCNs on all nodes similar.
- Manual archiving: if the parameter is FALSE, archiving stops and waits unless a command that starts it is issued explicitly. In OPS, each instance can use a different LOG_ARCHIVE_START value.
Manual archiving can be performed in the following ways.
Run the ALTER SYSTEM ARCHIVE LOG SQL command.
Run the ALTER SYSTEM ARCHIVE LOG START command to enable automatic archiving.
Manual archiving runs only on the node where the command was issued, and in that case the archiving work is not handled by the ARCH process.
c. LOG_ARCHIVE_DEST
Specifies the directory in which the archive log files are created.
e.g. log_archive_dest = /arch2/arc
6. OPS Recovery
1) Instance Failure
An instance failure occurs when an instance can no longer continue its
work, for reasons such as a software or hardware problem, a power outage,
a failed background process, a shutdown abort, or an OS crash.
In a single-instance environment, an instance failure is resolved by
restarting the instance and opening the database. In the step between
mount and open, SMON reads the online redo log files and performs instance
recovery.
In OPS, instance recovery is performed differently. Because the instances
on the other nodes can keep running when one node fails, an instance
failure does not mean that the database is unavailable.
Instance recovery is performed by the SMON process that first detects the
dead instance. While recovery is running, the following happens.
- A surviving instance reads the redo log files of the failed instance and
applies their contents to the datafiles.
- During this period the surviving nodes cannot write out the contents of
their buffer caches either.
- No DBWR disk I/O takes place.
- DML users cannot make lock requests.
a. Single-node Failure
While one instance performs recovery for another, failed instance, the
surviving instance reads the redo log entries of the failed instance and
applies the results of committed transactions to the database.
No committed data is therefore lost. Transactions that the failed instance
had not committed are rolled back, and the resources they were using are
released.
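The outcome described above (committed changes survive, uncommitted ones are undone) can be illustrated with a toy model. The record layout and the roll-forward-then-roll-back ordering are assumptions of this sketch; it does not reproduce Oracle's redo or undo formats.

```python
def recover(datafile, redo_log):
    """Toy instance recovery over a dict-based 'datafile'.

    redo_log is a list of records, assumed to look like
    {"txn": ..., "op": "update", "key": ..., "old": ..., "new": ...}
    or {"txn": ..., "op": "commit"}.
    """
    recovered = dict(datafile)
    committed = set()
    applied = []

    # Roll forward: replay every logged change in log order.
    for rec in redo_log:
        if rec["op"] == "commit":
            committed.add(rec["txn"])
        elif rec["op"] == "update":
            recovered[rec["key"]] = rec["new"]
            applied.append(rec)

    # Roll back: undo, newest first, the changes of transactions
    # that never logged a commit.
    rolled_back = set()
    for rec in reversed(applied):
        if rec["txn"] not in committed:
            recovered[rec["key"]] = rec["old"]
            rolled_back.add(rec["txn"])

    return recovered, rolled_back


failed_instance_redo = [
    {"txn": "T1", "op": "update", "key": "a", "old": 1, "new": 10},
    {"txn": "T2", "op": "update", "key": "b", "old": 2, "new": 20},
    {"txn": "T1", "op": "commit"},
]
state, undone = recover({"a": 1, "b": 2}, failed_instance_redo)
```

Here T1 committed before the failure, so its change to "a" is kept, while T2 never committed and its change to "b" is undone.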
b. Multiple-node Failure
If every instance in the OPS fails, instance recovery is performed
automatically as soon as any one instance is opened. The instance that
opens does not have to be one of the failed instances, and recovery runs
whether the database is mounted in shared mode or in exclusive mode.
Whether Oracle runs in shared or exclusive mode, the recovery procedure is
the same, except that a single instance performs the recovery for all of
the failed instances.
2) Media Failure
A media failure occurs when there is a problem with the storage media that
holds the files Oracle uses. In this situation data generally cannot be
read or written.
After a media failure, recovery must be performed in the same way as for a
single instance. In both cases transaction recovery has to be done using
the archived log files.
3) Node Failure
In an OPS environment, when an entire node fails, the instance and the
IDLM component running on that node fail as well. Before instance recovery
can take place, the IDLM has to reconfigure itself so that the locks can be
remastered.
When a node fails, the Cluster Manager or another GMS product must report
the failure and perform the reconfiguration. Only after this has been done
is communication with the LMD0 processes on the other nodes possible.
Oracle learns of the failure either when it accesses lock information held
by the failed node, or when the LMON process detects through its heartbeat
that the failed node is no longer available.
Once the IDLM has been reconfigured, instance recovery is performed.
To avoid contention for resources during recovery, instance recovery can
temporarily suspend work on the entire database.
If the FREEZE_DB_FOR_FAST_INSTANCE_RECOVERY initialization parameter is set
to TRUE, the entire database is frozen temporarily.
TRUE is the default when the datafiles use fine-grain locks.
If the parameter is set to FALSE, only the data that needs recovery is
frozen temporarily. FALSE is the default when the datafiles use hash locks.
4) IDLM Failure
If only the IDLM process on a node fails, for example because another
related process failed or because of a memory fault, LMON on the other
nodes detects the problem and starts the lock reconfiguration process.
While this is in progress, lock-related operations are suspended, and some
users wait to acquire PCM locks or other resources.
5) Interconnect Failure (GMS Failure)
When the interconnect between nodes fails, each node assumes that the IDLM
and GMS on the other node have failed. The GMS checks the state of the
system by other means, such as a quorum disk or pinging the node.
In this case both nodes on the failed connection, or one of them, are shut
down.
In the Oracle 8 recovery mechanism, when a node or instance has been forced
to fail, the IDLM or the instance cannot be started up. In some cases you
can write your own cluster validation code to check whether IDLM
communication between the nodes is available. This lets you diagnose the
problem and then perform a shutdown, something the GMS itself does not
provide.
Such code needs a routine that keeps trying to update a single data block
covered by a single PCM lock. Running this program on the two connected
nodes makes it possible to diagnose an interconnect failure.
If the cluster consists of several nodes, updating a data block covered by
a different PCM lock for each interconnect shows which node's interconnect
has the problem.
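The control flow of such validation code might look like the sketch below. It is deliberately database-agnostic: `update_block` is a hypothetical callable that you would supply to perform the update of the block assigned to one interconnect, and the timeout value is an assumption. No Oracle client API is used or implied here.

```python
import threading


def probe(update_block, timeout_s):
    """Return True if update_block() completes within timeout_s seconds."""
    done = threading.Event()

    def run():
        try:
            update_block()   # hypothetical: updates one block under one PCM lock
            done.set()
        except Exception:
            pass             # a failed update counts the same as a timeout

    # Daemon thread so that a hung update does not keep the program alive.
    threading.Thread(target=run, daemon=True).start()
    return done.wait(timeout_s)


def diagnose(probes, timeout_s=1.0):
    """probes maps a peer node name to the callable that updates its block.

    Returns the sorted names of the peers whose update did not complete.
    """
    return sorted(name for name, fn in probes.items()
                  if not probe(fn, timeout_s))


def ok_update():
    return None


def failing_update():
    raise RuntimeError("simulated lock request that cannot be served")


suspect_nodes = diagnose({"node2": ok_update, "node3": failing_update},
                         timeout_s=0.2)
```

Using one block, and therefore one PCM lock, per peer node is what lets the result point at a specific interconnect rather than at the cluster as a whole.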
7. Parallel Recovery
The goal of parallel recovery is to use compute and I/O parallelism to
reduce the time taken by crash recovery, single-instance recovery, and
media recovery.
Parallel recovery is most effective when several datafiles spread across
multiple disks are recovered at the same time.
Recovery can be parallelized in two ways.
- Setting the RECOVERY_PARALLELISM parameter
- Specifying an option on the RECOVER command
The Oracle server can have one process read the log files sequentially and
pass the redo information to several recovery processes, which apply the
changes recorded in the log files to the datafiles.
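The one-reader, many-appliers arrangement can be sketched as follows. This is a simplified model under stated assumptions: redo records are plain dicts, each datafile is a dict of blocks, and all changes for a given datafile go to the same worker so that their order is preserved. How Oracle actually distributes redo among recovery processes is not described in the text above.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_apply(redo_log, datafiles, workers):
    """Read the log sequentially, then apply changes with several workers."""
    # Assign each datafile to exactly one worker (round robin by file id),
    # so changes to the same file are never applied out of order.
    slot = {fid: i % workers for i, fid in enumerate(sorted(datafiles))}

    # Sequential reader: a single pass over the log fills one batch per worker.
    batches = [[] for _ in range(workers)]
    for rec in redo_log:
        batches[slot[rec["file"]]].append(rec)

    def apply(batch):
        for rec in batch:
            datafiles[rec["file"]][rec["block"]] = rec["value"]
        return len(batch)

    # Recovery workers: each applies its own batch concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(apply, batches))


files = {1: {}, 2: {}, 3: {}}
redo = [
    {"file": 1, "block": 7, "value": "a"},
    {"file": 2, "block": 1, "value": "b"},
    {"file": 1, "block": 7, "value": "c"},
    {"file": 3, "block": 2, "value": "d"},
]
applied_per_worker = parallel_apply(redo, files, workers=2)
```

Because each datafile belongs to one worker only, the later change to block 7 of file 1 always wins, and the gain comes from different files (ideally on different disks) being written at the same time.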
The recovery processes are started automatically by Oracle, so there is no
need to use more than one session to perform recovery.
The value of RECOVERY_PARALLELISM cannot exceed the value of the
PARALLEL_MAX_SERVERS parameter.
Reference Document
Oracle8 OPS manual

The configuration files of the Oracle Application Server can be backed up with the "Backup and Recovery Tool".
Please refer to the documentation:
http://download.oracle.com/docs/cd/B32110_01/core.1013/b32196/part5.htm#i436649
Note that the "backup to tapes" feature is not yet supported by this tool.
Thanks,
Murugesh

Gv$global_transaction locked in crash(?) recovery

Hello,
my first post on this public forum :)
We are facing system slowdowns (or hangs) caused by locks on gv$global_transaction or dba_pending_transactions, and sometimes on both views. Browsing the sessions during the
"hangs", we find up to 64 parallel processes recovering (presumably dead) transactions.
So far so good: we have 16 CPUs * 4 (fast_start_parallel_rollback is set to HIGH).
I thought the FSPR parameter was relevant only for crash recovery (is that right?), so we drilled
down into the problem and discovered that during these "hangs" one of the nodes, N1, cannot see
N2, while the reverse direction works (N2 can see N1).
I can't explain the locks, or to be more specific, the enqueues. Is it the parallel activity that
actually locks the segments underlying the views?
Why the parallel activity?
either Oracle uses FSPR for any kind of rollback activity (not only for crash recovery)
or Oracle also uses FSPR for distributed transactions
or Oracle uses FSPR when the rollback activity is estimated to be above a certain threshold
or, in my case, N1 thought N2 was down and took over N2's transaction recovery
The machines are 2 x (9.2.0.8/64 RAC on Solaris 9/64)
Thanks for your patience
g
