Recovery Mechanism in Solaris
Hi to all,
I am new to Solaris (coming from the HP-UX world) and I was wondering whether Solaris has a tool for making an exact image of the system that can later be used to restore the system to the state it was in when the image was taken.
HP-UX has such a tool, Ignite's make_tape_recovery, and it is very handy for this purpose.
Is there something like this in Solaris?
dejan.stojcevski wrote:
Thanks a lot Ivan.
This answered my question.
I will search around to learn some more about flash archives and see what they can do too.
Anyway, a little comparison with HP's make_tape_recovery:
1. make_tape_recovery creates a bootable tape. There is no need to boot from the installation CD; you boot directly from the tape. ufsdump does not do this.
2. make_tape_recovery does not require you to partition the underlying root disk; it does this automatically. ufsdump does not have this functionality.
3. make_tape_recovery is a fully automated backup/recovery mechanism, meaning that after you boot from the tape you can come back in about an hour and you will have a completely recovered system. ufsdump requires mounting/unmounting of slices.
This sounds a lot like SCO's root/boot floppy/tape restore solution.
Yet I think that this comparison is not correct, because Sun's ufsdump and HP's make_tape_recovery are two different types of software (different philosophies). Sun's ufsdump is like HP's fsbackup utility: both are tools for full file system backups. HP's make_tape_recovery <=> Sun's ??? (flash archives maybe?)
I don't think Sun has anything like this; the closest you could get would be a Flash archive or a JumpStart server. And then you would still have to do a restore after the machine has been booted.
The closest you could get to something like this in the Sun world would probably be "Bare Metal Restore" from Veritas, now Symantec.
alan
Similar Messages
-
Error Handling/Recovery Mechanism in ODI
Can you please provide some information about the error handling/recovery mechanism in ODI?
Say, for instance, a link breaks down while moving data from source to staging and/or from staging to target. What will happen? Will the records processed so far be written to the target table, or will no records be moved into the target table?
Does ODI work on an all-or-nothing basis?
I really need help on this.
There is a Restart option in the Operator. When you right-click a session and choose Restart, ODI starts again from the step that failed.
I believe that if the database is down, a restart can help you, but if the agent is down, you might need to start the session again from the beginning. The reason is that when the agent sends a SQL statement to the database, it waits until the database has processed it and returned the result. If the agent goes down during that interval, the database may have processed the records but the agent is not there to read the result, and by the time you bring the agent back up the session will have died, so you would need to start the session again.
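To make the restart-from-the-failed-step behaviour concrete, here is a minimal Python sketch. It is only a toy model: the `Session` class, its `run` and `restart` methods, and the step names used with it are hypothetical and are not ODI APIs. It also says nothing about whether partially loaded rows stay in the target table.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


class Session:
    """Toy session: runs steps in order and remembers where it failed."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps
        self.failed_at: Optional[int] = None  # index of the failed step, if any
        self.completed: List[str] = []

    def _run_from(self, start: int) -> bool:
        for i in range(start, len(self.steps)):
            name, action = self.steps[i]
            try:
                action()
            except Exception:
                self.failed_at = i  # remember the failing step for a later restart
                return False
            self.completed.append(name)
        self.failed_at = None
        return True

    def run(self) -> bool:
        """Run every step from the beginning."""
        return self._run_from(0)

    def restart(self) -> bool:
        """Resume from the failed step; earlier steps are not repeated."""
        if self.failed_at is None:
            raise RuntimeError("nothing to restart")
        return self._run_from(self.failed_at)
```

In this model, a session that dies together with its agent has lost `failed_at`, which is the situation where the whole session has to be started again.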
You can test and see if the Restart option helps you. -
Hi,
I am trying to understand the mechanism of instance recovery in an Oracle 8i/Oracle 9i database and I am a little confused, since I have received some contradictory information about it.
Please find my questions below:
1. Is an SCN assigned to all transactions or ONLY to committed transactions?
2. Is the checkpoint number the highest SCN after a checkpoint event?
3. After a checkpoint event, dirty buffers are written to the database files by DBWR. Does DBWR write only the dirty buffers of committed transactions, or does it write out all dirty buffers in the DB buffer cache?
Thanks in advance for your answers!
regards
Peter
1. The system change number (SCN) is updated whenever there is a commit.
2. If you are talking about the checkpoint_change# column in the v$database view, yes.
3. All dirty buffers are written to disk. Data files can and do contain uncommitted data.
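Point 3 is easier to see with a toy model. The sketch below is an illustration only, not Oracle's implementation: the record layout and the `recover` function are invented for the example, and the `old` value carried in each change record stands in for the undo information that Oracle keeps in rollback/undo segments. It shows a data file that contains uncommitted data being brought back to a consistent state by rolling forward all logged changes and then rolling back the uncommitted ones.

```python
from typing import Dict, List, Tuple

# A redo record is either ("change", txn, key, old_value, new_value)
# or ("commit", txn). This layout is invented for the illustration.
Record = Tuple


def recover(datafile: Dict[str, int], redo: List[Record]) -> Dict[str, int]:
    """Roll forward every logged change, then undo uncommitted transactions."""
    state = dict(datafile)
    committed = {rec[1] for rec in redo if rec[0] == "commit"}

    # Roll forward: reapply all changes, committed or not, in log order.
    for rec in redo:
        if rec[0] == "change":
            _, _txn, key, _old, new = rec
            state[key] = new

    # Roll back: undo the changes of uncommitted transactions, newest first.
    for rec in reversed(redo):
        if rec[0] == "change" and rec[1] not in committed:
            _, _txn, key, old, _new = rec
            state[key] = old
    return state
```

With a data file of `{"a": 1, "b": 99}` (where 99 was written by an uncommitted transaction) and a log in which only the change to `a` was committed, `recover` yields the committed value for `a` and the original value for `b`.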
Justin
Distributed Database Consulting, Inc.
http://www.ddbcinc.com/askDDBC -
Want info about eventing mechanism in Solaris.
Does Solaris have a facility, other than SNMP, for registering for events such as "link down"? If yes, where can I find information about it?
Thanks,
Dev
Try looking at syseventd. This is the mechanism that exposes events from the kernel/drivers to userland. I'm not sure whether network link events are currently exposed.
-
Auto-recovery mechanism(s) of Oracle database?
Hi,
I read that if the database is closed abnormally, say due to a power failure, then once the database is re-opened, the RECO foreground process will recover all transactions that were in doubt, that is, neither committed nor rolled back. How does it do that? Can anyone explain?
This is instance recovery, which SMON performs when the database is opened (RECO only resolves in-doubt distributed transactions). Do not confuse it with database recovery.
If your DB server suddenly powers off while running, in most cases only the uncommitted transactions are lost.
You should read the documentation. -
Does Linux filesystem undermine Oracle's recovery mechanism?
I've been an Oracle DBA for 10 years and have been using Oracle
on Linux for several months, but am not a Linux expert by any
means. A client told me something about the filesystem Linux uses
(ext2?) that I find hard to believe. Can anyone shed some light on
this?
The claim is that the Linux filesystem does not implement
synchronous writes correctly. The implication is that when a user
commits a transaction and Oracle flushes the redo log to disk,
Oracle may think the redo information has been successfully
written when in fact it's still sitting in a buffer somewhere
waiting to write. If a drive failure occurs, the redo might never
get written, but meanwhile the user has already been informed
their transaction has been committed.
Oracle does not flush data block buffers to disk when you commit
a transaction. Only the redo is flushed. If the instance were to
fail, Oracle reads the redo when you restart the instance and
performs instance recovery automatically.
If the Linux filesystem does not implement synchronous writes
legitimately, then the recovery mechanisms in Oracle are
compromised--indeed a successful commit is not a guarantee of
data permanence.
It's hard to believe that this could be true; I don't see how
Oracle Corporation could put so much effort into porting their
flagship products to Linux if data permanence cannot be
guaranteed.
Is my client mistaken in their understanding of the Linux
filesystem? Any insights from the Linux gurus out there would be
gratefully appreciated!
Regards,
Roger Schrag
Database Specialists, Inc.
Roger Schrag (guest) wrote:
: I've been an Oracle DBA for 10 years and have been using
: Oracle on Linux for several months, but am not a Linux expert
: by any means. A client told me something about the filesystem
: Linux uses (ext2?) that I find hard to believe. Can anyone shed
: some light on this?
: The claim is that the Linux filesystem does not implement
: synchronous writes correctly. The implication is that when a
: user commits a transaction and Oracle flushes the redo log to
: disk, Oracle may think the redo information has been
: successfully written when in fact it's still sitting in a
: buffer somewhere waiting to write. If a drive failure occurs,
: the redo might never get written, but meanwhile the user has
: already been informed their transaction has been committed.
The problem doesn't lie with Linux: fsync() and O_SYNC are
supported and, AFAIK, behave correctly.
The problem is that Oracle doesn't appear to use them. The redo
logs aren't fsync'ed on commit, nor do they appear to be opened
with O_SYNC.
Data loss will result, as you point out, if the RDBMS doesn't
tell the operating system to save the data synchronously.
Try running strace on the RDBMS background processes to confirm
this for yourself.
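For readers unfamiliar with the two calls, here is a minimal Python sketch of what "telling the operating system to save the data synchronously" looks like from an application. The function names and the idea of appending commit records to a file are assumptions made for the illustration; this is not how Oracle writes its redo logs. In an strace of a process that behaves this way you would expect to see either an fsync() call after the write or O_SYNC among the open() flags.

```python
import os


def write_all(fd: int, data: bytes) -> None:
    """Write the whole buffer, retrying on short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def append_with_fsync(path: str, data: bytes) -> None:
    """Append data, then block in fsync() until the kernel reports it flushed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        write_all(fd, data)
        os.fsync(fd)  # returns only after the file's data has been flushed
    finally:
        os.close(fd)


def append_with_o_sync(path: str, data: bytes) -> None:
    """Append data through a descriptor opened with O_SYNC."""
    # O_SYNC is not defined on every platform; where it is missing this
    # falls back to 0, i.e. an ordinary buffered write with no guarantee.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        write_all(fd, data)  # with O_SYNC, each write waits for the flush
    finally:
        os.close(fd)
```

Either variant only moves the guarantee down one layer: a drive with a volatile write cache can still acknowledge data it has not yet persisted.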
I'm not in a position to progress this with Oracle. Someone,
obviously, should, otherwise the next article on ZDNet could be
"Linux causes massive Oracle dataloss"..
-
Help required with inittab mechanism on Solaris 11
Hi
We have an application running on Solaris 11 which uses /etc/inittab entries to spawn and maintain a number of daemon processes. I know that use of /etc/inittab is deprecated in Solaris 11, but at the moment we would prefer not to have to modify our software to use SMF. Having said that, we have hit an issue which may force our hand....
We have inittab entries which look like this:
app1:34:respawn: su - appuser -c "exec /APP/myDaemon.sh >> /APP/myDaemon.log 2>&1" > /dev/null 2>&1 0<&1
When init state 3 is entered we see two processes associated with myDaemon:
# uname -a
SunOS host40054 5.11 11.0 sun4v sparc sun4v
# ps -Af |grep myDa
appuser 21383 21380 0 19:26:58 ? 0:00 -ksh -c exec /APP/myDaemon.sh >> /APP/myDaemon.log
root 21380 1 0 19:26:58 ? 0:00 su - appuser -c exec /APP/myDaemon.sh >> /APP/myDaemon.log
If I drop the OS to init state 2 only the parent process is terminated:
# /usr/sbin/init 2
# ps -Af |grep myDa
appuser 21383 21380 0 19:26:58 ? 0:00 -ksh -c exec /APP/myDaemon.sh >> /APP/myDaemon.log
leaving the daemon process still running. Returning to init state 3 we now see:
# /usr/sbin/init 3
# ps -Af |grep myDa
appuser 26997 1 0 19:30:54 ? 0:00 -ksh -c exec /APP/myDaemon.sh >> /APP/myDaemon.log
root 26994 1 0 19:30:54 ? 0:00 su - appuser -c exec /APP/myDaemon.sh >> /APP/myDaemon.log
appuser 21383 1 0 19:26:58 ? 0:00 -ksh -c exec /APP/myDaemon.sh >> /APP/myDaemon.log
So now we've ended up with two daemon processes running (not what we want).
If I compare with Solaris 10 and earlier, the inittab works very well for us. We only ever see the single daemon process running and this is dropped and respawned as required.
# uname -a
SunOS host40041 5.10 Generic_147440-04 sun4u sparc SUNW,Sun-Fire-V210
# ps -Af | grep myD
appuser 2390 27941 0 19:38:21 ? 0:00 -ksh -c exec /APP/myDaemon.sh >> /APP/myDaemon.log
# /usr/sbin/init 2
# ps -Af | grep myD
root 2665 1187 0 19:39:19 pts/1 0:00 grep myD
# /usr/sbin/init 3
# ps -Af | grep myD
appuser 2795 27941 0 19:39:30 ? 0:00 -ksh -c exec /APP/myDaemon.sh >> /APP/myDaemon.log
With Solaris 11 it's as if the su shell process has been made explicit, and now the mechanism, as far as we are concerned, is broken.
Can anyone suggest a way to get back to the Solaris 10 way of working, or some kind of workaround for this issue, before we have to dust off our XML skills and wrestle with SMF?
Many thanks
Dave.
If you use an Access Policy to trigger Provision/Revoke, it will cancel all the tasks and you won't be able to use the same instance. On each Enable you'll see a new instance of the Siebel RO, which would be wrong.
Workaround:
Create two tasks, Enable and Disable, and configure them properly, e.g. with Disable Effect and Enable Effect.
Attach the adapter that is attached to the Delete User task to the Disable task, and map it to Disabled.
Attach the adapter that is attached to the Create User task to the Enable task, and map it to Provisioned. -
Controlfile recovery (Oracle 9i Solaris 9)
Hi Guys,
In my environment I had two control files, 1 and 2. I think 1 got corrupted, so I deleted it and then made a copy of 1 to 2. But now nothing seems to work.
SQL> shutdown immediate;
ORA-00210: cannot open the specified controlfile
ORA-00202: controlfile: '/database/oradata/lldev/control02.lldev'
ORA-27041: unable to open file
SVR4 Error: 13: Permission denied
Additional information: 3
SQL>
Please assist.
Thank you all.
To summarize, did you:
1) copy the control file while the database was up?
2) copy the correct control file and not the corrupted one?
Best practice is to shut down the database, add a new location holding a copy of the correct control file, and comment out (with #) the location of the corrupt file. Once the database starts up properly, the corrupt control file can be deleted and the corresponding entry removed from the initialization parameters at the next database restart.
Could you also post the relevant alert log entries? -
Enable password recovery in cisco 2950 with AAA
Hello friends,
I need to recover the enable password on a switch on which I have also configured AAA. When I follow the procedure below, it ends with "Authorization failed". How can I recover the enable password?
Regards,
Haris
If I try to recover the password as this description says:
http://www.cisco.com/en/US/docs/switches/lan/catalyst2960/software/release/12.2_25_see/configuration/guide/swtrbl.html#wp1090048
Step 1 Connect a terminal or PC with terminal-emulation software to the switch console port.
Step 2 Set the line speed on the emulation software to 9600 baud.
Step 3 Power off the switch. Reconnect the power cord to the switch and, within 15 seconds, press the Mode button while the System LED is still flashing green.
Base ethernet MAC Address: 00:0x:xx:xx:xx:xx
Xmodem file system is available.
The password-recovery mechanism is enabled.
The system has been interrupted prior to initializing the
flash filesystem. The following commands will initialize
the flash filesystem, and finish loading the operating
system software:
flash_init
load_helper
boot
switch:
Step 4 switch: flash_init
Initializing Flash...
flashfs[0]: 600 files, 19 directories
flashfs[0]: 0 orphaned files, 0 orphaned directories
flashfs[0]: Total bytes: 32514048
flashfs[0]: Bytes used: 7713792
flashfs[0]: Bytes available: 24800256
flashfs[0]: flashfs fsck took 10 seconds.
...done Initializing Flash.
Boot Sector Filesystem (bs) installed, fsid: 3
Setting console baud rate to 9600...
Step 5 switch: load_helper
Step 6 switch: dir flash:
Directory of flash:/
2 -rwx 916 <date> vlan.dat
5 drwx 192 <date> c2960-lanbase-mz.122-25.SEE1
620 -rwx 5488 <date> config.text
621 -rwx 5 <date> private-config.text
24800256 bytes available (7713792 bytes used)
Step 7 switch: rename flash:config.text flash:config.text.old
Step 8 switch: boot
Loading "flash:c2960-lanbase-mz.122-25.SEE1/c2960-lanbase-mz.122-25.SEE1.bin"...
Initializing flashfs...
flashfs[1]: 600 files, 19 directories
flashfs[1]: 0 orphaned files, 0 orphaned directories
flashfs[1]: Total bytes: 32514048
flashfs[1]: Bytes used: 7713792
flashfs[1]: Bytes available: 24800256
flashfs[1]: flashfs fsck took 1 seconds.
flashfs[1]: Initialization complete....done Initializing flashfs.
64K bytes of flash-simulated non-volatile configuration memory.
Base ethernet MAC Address : 00:0x:xx:xx:xx:xx
Motherboard assembly number : xxxxxxxxxx
Power supply part number : xxxxxxxxxxx
Motherboard serial number : xxxxxxxxxxx
Power supply serial number : xxxxxxxxxxx
Model revision number : B0
Motherboard revision number : B0
Model number : WS-C2960G-24TC-L
System serial number : xxxxxxxxxxxx
Top Assembly Part Number : xxxxxxxxxxxx
Top Assembly Revision Number : B0
Version ID : V02
CLEI Code Number : xxxxxxxxxxxxx
Hardware Board Revision Number : 0x01
Switch Ports Model SW Version SW Image
* 1 24 WS-C2960G-24TC-L 12.2(25)SEE1 C2960-LANBASE-M
Press RETURN to get started!
Step 9 Hit <Enter>
Would you like to terminate autoinstall? [yes]: yes
Step 10
--- System Configuration Dialog ---
Would you like to enter the initial configuration dialog? [yes/no]no
Switch>
Step 11 Switch> enable
Step 12 Switch# rename flash:config.text.old flash:config.text
Destination filename [config.text]? <Enter>
Step 13 Switch# copy flash:config.text system:running-config
Destination filename [running-config]?<Enter>
5488 bytes copied in 0.940 secs (5838 bytes/sec)
Step 14 NewSwitchName# conf t
% Authorization failed.
Doesn't this procedure work any more?
The password recovery worked, but you copied your problematic config back to the switch. Skip Step 13 and paste only the working part of the config into the switch.
You can view your renamed config with "more flash:config.text.old". -
PRD system recovery - "HOLD" & "To Be Delivered" messages
Hi Gurus,
I'd like to discuss this crucial topic.
In our systems this problem sometimes occurs: messages in "HOLD" status and in "To Be Delivered" status.
HOLD status occurs in serialized (EOIO) queues: when an error occurs in one message in the queue, all the others are set to HOLD, which is OK. The question is: when we cancel the message that caused the error, do we need to restart the other messages in the queue manually, or is there some "auto-recovery" mechanism?
And what about the To Be Delivered (TBD) status? When does it occur, and what steps can be taken to solve the problem?
Thank you all,
Olian
Hi Olian,
I was experiencing the same issue the other day with an IDoc-to-SOAP scenario.
I had given a wrong URL on the target adapter, and every time I sent an IDoc it went to Waiting status and then after some time gave the error message "system error".
If a system error occurs, it blocks the subsequent messages in that queue.
Solutions:
1. If you are using IDoc on the sender side, go to Interface Determination and deselect the checkbox "Maintain Order at Runtime" (this applies because you don't have a sender communication channel for IDoc).
2. Find the message that resulted in the system error and resend it or cancel it.
3. Find the cause of the error and fix it so that it won't happen next time.
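The blocking behaviour of a serialized queue can be sketched in a few lines of Python. This is a toy model under stated assumptions: `process_queue`, `cancel_head`, and the status strings are hypothetical names for the illustration, not SAP PI APIs, and the sketch does not settle whether PI restarts held messages on its own. Here the caller has to run the queue again after cancelling the failed message.

```python
from collections import deque
from typing import Callable, Deque, Dict


def process_queue(queue: Deque[str],
                  deliver: Callable[[str], None],
                  status: Dict[str, str]) -> None:
    """Deliver messages strictly in order; stop at the first failure."""
    while queue:
        msg = queue[0]
        try:
            deliver(msg)
        except Exception:
            status[msg] = "ERROR"
            for waiting in list(queue)[1:]:
                status[waiting] = "HOLD"  # blocked behind the failed message
            return
        status[msg] = "DELIVERED"
        queue.popleft()


def cancel_head(queue: Deque[str], status: Dict[str, str]) -> None:
    """Cancel the failed message at the head so the rest can be retried."""
    msg = queue.popleft()
    status[msg] = "CANCELLED"
```

Solution 2 above corresponds to calling `cancel_head` (or fixing the cause and retrying) and then running `process_queue` again on the remaining messages.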
Regards,
Nikhil. -
Purpose of ONLINE REDO LOG FILES - Media or Instance recovery or BOTH ?
Hi
I am currently studying this topic for the 1z0-031 exam and am a little confused.
My books (from an instructor-led class) say:
- redo logs are a means of providing redo transactions in the event of a DATABASE recovery
- the redo log buffer gets flushed to the redo log files to provide a recovery mechanism in case of MEDIA FAILURE
Then they say:
- online redo log files are used in a situation such as an INSTANCE FAILURE to recover uncommitted data which has not yet been written to the data files
- online redo log files are used for RECOVERY only.
Am I misunderstanding? Or are redo log files for both MEDIA and INSTANCE recovery? Or just INSTANCE?
Confused....
Amanjit
Online redo log files are used, in a sense, for both media and instance recovery. If your database is in NOARCHIVELOG mode, you will only be able to use the redo log files for instance recovery. But if you are running in ARCHIVELOG mode, the redo log files are archived, and that allows you to recover from media failure.
-
Product : ORACLE SERVER
Date written : 2004-08-16
ORACLE8 OPS BACKUP & RECOVERY
=============================
SCOPE
In Standard Edition, the Real Application Clusters feature is supported from 10g (10.1.0) onward.
Explanation
Database backup and recovery in OPS is similar to backup for a single instance. That is, every backup method available for a single instance is also supported in OPS.
1. Backup methods
All of the following backup methods can be used. This note describes method 2), backup using OS commands.
1) Recovery Manager (RMAN) : see <Bulletin 11451>
2) Backup using OS commands
Noarchive log mode : full offline backup only
Archive log mode : full or partial, offline or online backup
3) export : see <Bulletin 10080> : ORACLE 7 backup and recovery methods
2. Considerations when establishing a backup policy
1) If you cannot tolerate losses caused by disk crashes, user errors, and the like, you must use ARCHIVE LOG MODE.
2) In most cases, all instances use automatic archiving.
3) Any data backup operation can be performed from any instance.
4) During media recovery, the archive files of all threads are used.
5) During instance recovery, recovery is performed automatically by SMON of a surviving instance.
3. Noarchive log mode : Full offline backup
1) Query the following views to identify the files that need to be backed up.
V$DATAFILE or DBA_DATA_FILES
V$LOGFILE
V$CONTROLFILE
2) Shut down all instances.
3) Copy the identified files to the backup destination.
4. Archive log mode : Partial or Full Online Backup
1) Before starting the backup, run ALTER SYSTEM ARCHIVE LOG CURRENT. (Running this command causes a log switch of the current redo log, and the resulting archiving, to be carried out for all instances on all nodes, including those whose database is not currently running.)
2) Run ALTER TABLESPACE tablespace BEGIN BACKUP.
3) Wait until the ALTER TABLESPACE command completes successfully.
4) Back up the datafiles belonging to the tablespace with a suitable OS command (tar, cpio, cp, etc.).
5) Wait until the OS-level backup has finished.
6) Run ALTER TABLESPACE tablespace END BACKUP.
7) Back up the control file by running
ALTER DATABASE BACKUP CONTROLFILE TO filename or
ALTER DATABASE BACKUP CONTROLFILE TO TRACE.
If you also back up the archived log files, run ALTER SYSTEM ARCHIVE LOG CURRENT after the END BACKUP command so that all redo log files up to the END BACKUP point are secured.
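The ordering in section 4 matters more than any single command, so here is a small Python sketch that only assembles the sequence as text; it does not connect to a database or copy anything. The function name `online_backup_plan`, the "SQL:"/"OS:" prefixes, and the use of `cp` as the copy command are assumptions for the illustration; the SQL statements themselves are the ones listed above.

```python
from typing import List


def online_backup_plan(tablespace: str, datafiles: List[str], backup_dir: str) -> List[str]:
    """Return the ordered steps for an OS-level online backup of one tablespace."""
    steps = ["SQL: ALTER SYSTEM ARCHIVE LOG CURRENT"]
    steps.append(f"SQL: ALTER TABLESPACE {tablespace} BEGIN BACKUP")
    # Datafiles are copied only between BEGIN BACKUP and END BACKUP.
    steps.extend(f"OS:  cp {path} {backup_dir}/" for path in datafiles)
    steps.append(f"SQL: ALTER TABLESPACE {tablespace} END BACKUP")
    steps.append("SQL: ALTER DATABASE BACKUP CONTROLFILE TO TRACE")
    # Switch again so that the redo generated up to END BACKUP gets archived.
    steps.append("SQL: ALTER SYSTEM ARCHIVE LOG CURRENT")
    return steps
```

Printing the returned list for one tablespace gives a checklist that can be reviewed before anything is run by hand.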
5. Important Parameters
1) Redo Log History in the Controlfile (MAXLOGHISTORY)
By specifying MAXLOGHISTORY in the CREATE DATABASE or CREATE CONTROLFILE command, you can have the control file store history for the redo log files that have been filled in parallel server. Once the database has been created, the control file must be re-created in order to increase or decrease the log history value.
MAXLOGHISTORY specifies how much archive history can be stored in the control file; its default value is platform-specific. If it is set to a non-zero value, the LGWR process records the following information in the control file at every log switch:
thread number, log sequence number, low SCN, low SCN timestamp, next SCN (the lowest SCN of the next log)
(This information is stored in the control file when the log switch occurs, not after the redo log file has been archived.)
When log history has to be stored beyond the value specified by MAXLOGHISTORY, it is stored by overwriting the oldest history. In OPS, log history information is used during automatic media recovery to locate and reconstruct the appropriate archived log files based on SCN and thread number. In an environment where the database uses only a single thread in exclusive mode, log history information is not needed.
Log history information can be queried through V$LOG_HISTORY.
Querying V$RECOVERY_LOG from Server Manager gives information about the archived logs needed for media recovery.
For multiplexed redo log files, multiple entries are not used in the log history. Each entry holds information about the group of multiplexed log files, not about the individual files.
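The overwrite-the-oldest behaviour of the log history can be pictured as a fixed-size ring buffer. The Python sketch below is a toy model, not the control file's actual layout: `LogHistoryEntry`, `make_history`, and `find_log` are hypothetical names, and the entry omits the low SCN timestamp for brevity.

```python
from collections import deque
from typing import Deque, NamedTuple, Optional


class LogHistoryEntry(NamedTuple):
    thread: int
    sequence: int
    low_scn: int
    next_scn: int  # lowest SCN of the next log


def make_history(maxloghistory: int) -> Deque[LogHistoryEntry]:
    """Create a history that keeps at most `maxloghistory` entries."""
    # A deque with maxlen drops its oldest item when a new one is
    # appended to a full deque, which mimics overwriting old history.
    return deque(maxlen=maxloghistory)


def find_log(history: Deque[LogHistoryEntry], thread: int, scn: int) -> Optional[LogHistoryEntry]:
    """Return the entry for `thread` whose SCN range contains `scn`, if still recorded."""
    for entry in history:
        if entry.thread == thread and entry.low_scn <= scn < entry.next_scn:
            return entry
    return None
```

The lookup shows why a history that is too small hurts automatic media recovery: once an entry has been overwritten, the log covering that SCN can no longer be located from the history.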
2) Parameters for Archive Log Mode
In OPS, to switch to archive log mode, mount the database in exclusive mode and then make the change.
a. LOG_ARCHIVE_FORMAT
Parameter   Description                               Example
%T          thread number, left-zero-padded           arch0000000001
%t          thread number, not padded                 arch1
%S          log sequence number, left-zero-padded     arch0000000251
%s          log sequence number, not padded           arch251
Of these, %T and %t are valid only in OPS.
The format must be the same for all instances, and in an OPS environment it must include the thread number.
Example: log_archive_format = %t_%s.arc
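As a quick check of how the four format codes combine, the following Python sketch expands a LOG_ARCHIVE_FORMAT-style string. It is an illustration, not Oracle's code: `expand_archive_format` is a hypothetical helper, and the padding width of 10 digits is an assumption read off the examples in the table above (the real width may differ by platform).

```python
PAD_WIDTH = 10  # assumption: taken from the arch0000000001 example above


def expand_archive_format(fmt: str, thread: int, sequence: int) -> str:
    """Expand %t, %T, %s and %S in an archive log file name format."""
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            code = fmt[i + 1]
            if code == "t":
                out.append(str(thread))
            elif code == "T":
                out.append(str(thread).zfill(PAD_WIDTH))
            elif code == "s":
                out.append(str(sequence))
            elif code == "S":
                out.append(str(sequence).zfill(PAD_WIDTH))
            else:
                out.append("%" + code)  # unknown code: leave it unchanged
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)
```

For thread 1 and sequence 251, the example format `%t_%s.arc` expands to `1_251.arc`, which shows why the thread number is needed: without it, two threads with the same sequence number would produce the same file name.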
b. LOG_ARCHIVE_START
- Automatic archiving : if this is set to TRUE and the instance is started, the ARCH background process performs archiving automatically. For a closed thread, a running thread performs the log switch and archiving on behalf of the closed thread. This happens when a log switch is forced in order to keep the SCNs on all nodes close to one another.
- Manual archiving : if this is FALSE, archiving stops and waits unless a command is explicitly issued telling it to start. In OPS, each instance can use a different LOG_ARCHIVE_START value.
Manual archiving can be performed in the following ways:
Run the ALTER SYSTEM ARCHIVE LOG SQL command.
Run the ALTER SYSTEM ARCHIVE LOG START command to enable automatic archiving.
Manual archiving runs only on the node where the command was issued, and in that case the archiving work is not handled by the ARCH process.
c. LOG_ARCHIVE_DEST
Specifies the directory in which the archive log files are created.
Example: log_archive_dest = /arch2/arc
6. OPS Recovery
1) Instance Failure
An instance failure can occur when the instance can no longer continue working for any of a number of reasons: a software or hardware problem, a power outage, a failure in a background process, a shutdown abort, an OS crash, and so on.
In a single-instance environment, an instance failure is resolved by restarting the instance and opening the database. Between the mount state and the open state, SMON reads the online redo log files and performs instance recovery.
In OPS, instance recovery is carried out differently when an instance failure occurs. Because the instances on other nodes can keep running even if one node fails, an instance failure in OPS does not mean that the database is unavailable.
Instance recovery is performed by the SMON process that first detects the dead instance. While recovery is in progress, the following takes place:
- A surviving instance reads the redo log files of the failed instance and applies their contents to the datafiles.
- During this period, the surviving nodes cannot write out the contents of their buffer caches either.
- DBWR disk I/O cannot take place.
- Lock requests cannot be made by DML users.
a. Single-node Failure
While one instance performs recovery for another instance that has failed, the surviving instance reads the redo log entries of the failed instance and applies the results of committed transactions to the database. Consequently no committed data is lost; transactions that the failed instance had not committed are rolled back, and the resources those transactions were using are released.
b. Multiple-node Failure
If all instances of the OPS fail, instance recovery is performed automatically as soon as any one instance is opened. The instance being opened need not be one of those that failed, and recovery takes place regardless of whether the database is mounted in shared or exclusive mode.
Whether Oracle is running in shared mode or exclusive mode, the recovery procedure is the same, except that a single instance performs recovery for all of the failed instances.
2) Media Failure
This occurs when there is a problem with the storage media holding the files that Oracle uses. In this situation, data generally cannot be read or written.
When a media failure occurs, recovery must be performed in the same way as for a single instance. In both cases, transaction recovery must be carried out using the archived log files.
3) Node Failure
In an OPS environment, when an entire node fails, the instance and the IDLM components running on that node fail as well. In this case, before instance recovery can take place, the IDLM must reconfigure itself in order to remaster the locks.
When a node fails, the Cluster Manager or another GMS product must report the failure and perform reconfiguration. Only once this has been done is communication with the LMD0 processes running on the other nodes possible.
Oracle learns that a failure has occurred either when it accesses lock information held by the failed node, or when the LMON process detects through the heartbeat that the failed node is no longer available.
Once the IDLM has been reconfigured, instance recovery is performed.
To avoid contention for resources while recovery is running, instance recovery can temporarily suspend work across the whole database.
Setting the FREEZE_DB_FOR_FAST_INSTANCE_RECOVERY initialization parameter to TRUE temporarily freezes the whole database.
The default is TRUE when the datafiles use fine-grain locks.
If it is set to FALSE, only the data that needs recovery is temporarily frozen. FALSE is the default when the datafiles use hash locks.
4) IDLM Failure
If only the IDLM process on a node fails, for example because of the failure of another related process or a memory fault, LMON on another node detects the problem and starts the lock reconfiguration process.
While this is in progress, lock-related work is suspended, and some users enter a wait state while trying to acquire PCM locks or other resources.
5) Interconnect Failure (GMS failure)
If the interconnect between nodes fails, each node assumes that the IDLM and GMS on the other node have failed. GMS checks the state of the system by other means, such as a quorum disk or pinging the node.
In this case, either both nodes or one of the nodes on the failed connection is shut down.
In the Oracle 8 recovery mechanism, when a node or instance has been forcibly failed, the IDLM or the instance cannot be started up. In some cases you can write your own cluster validation code to check whether IDLM communication between the nodes is available. This lets you diagnose the problem and then perform a shutdown, something GMS itself does not provide.
To write such code, include a routine that keeps updating a single data block covered by a single PCM lock. Running this program on two connected nodes makes it possible to diagnose a failed interconnect.
If the cluster consists of several nodes, you can find out which node's interconnect has the problem by updating, for each interconnect, a data block covered by a different PCM lock.
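7. Parallel Recovery
The goal of parallel recovery is to use compute and I/O parallelism to reduce the time taken by crash recovery, single-instance recovery, and media recovery.
Parallel recovery is most effective when several datafiles spread across multiple disks are recovered at the same time.
It can be parallelized in two ways:
- by setting the RECOVERY_PARALLELISM parameter
- by specifying an option on the RECOVER command
The Oracle server can have a single process read the log files sequentially and pass the redo information to several recovery processes, which apply the changes recorded in the log files to the datafiles.
Because the recovery processes are started automatically by Oracle, there is no need to use more than one session when performing recovery.
The value of RECOVERY_PARALLELISM cannot exceed the value specified by the PARALLEL_MAX_SERVERS parameter.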
Reference Document
Oracle8 OPS manual
Configuration files of the Oracle Application Server can be backed up with the "Backup and Recovery Tool".
Please refer to the documentation:
http://download.oracle.com/docs/cd/B32110_01/core.1013/b32196/part5.htm#i436649
Also, the "backup to tapes" feature is not yet supported by this tool.
Thanks,
Murugesh
Message was edited by:
Murugesan Appukuttty -
Hi All,
Can anyone help me find out whether the "Fail-Over Recovery" concept is available in Hyperion Essbase 11.1.1.3?
If so, please explain how it can be done.
Regards
Rajesh Kumar wrote:
Hi
I am working on a database fail-over recovery mechanism. I am working with a WebLogic 6.1 SP1 server installed on a Unix machine. We use the J2EE architecture in our application, with entity beans for database transactions.
My main objective is to allow my application to switch over to a secondary database if the primary database fails.
I have already developed a prototype which works fine for a client application's request, but I can't use it for entity beans with container-managed persistence.
So what I want to ask is this:
Is there a way to switch between databases for container-managed entity beans? If yes, how can it be implemented?
Thank you
Rajesh
Easy. Define a multipool that taps a pool to the regular database first and, when that DBMS is down,
taps a second pool to the fallback DBMS. Define a TxDataSource for the multipool, and have the beans
use that DataSource.
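The idea behind the multipool, try the primary pool first and fall back to the second, can be sketched independently of WebLogic. The Python below is only a conceptual illustration: `connect_with_failover` and the connection factories passed to it are hypothetical, and this is not how WebLogic implements multipools.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(factories: Sequence[Callable[[], T]]) -> T:
    """Try each connection factory in order; return the first that succeeds."""
    last_error: Optional[Exception] = None
    for make_connection in factories:
        try:
            return make_connection()
        except Exception as exc:
            last_error = exc  # remember why this pool failed, then try the next
    raise ConnectionError("all pools failed") from last_error
```

The appeal of doing this at the pool level, as suggested above, is that the beans keep using a single DataSource and never need to know which database actually answered.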
Joe -
Application Restart and Recovery APIs doesn't work for windows services
I am using the Application Restart and Recovery mechanism (provided in the Windows API Code Pack for Microsoft .NET Framework) to collect some information (i.e. stack information when there's an unhandled exception) before my Windows service crashes.
It works well for Windows Forms applications, but the callback method is not called if the host is a Windows service.
I have checked the article: https://msdn.microsoft.com/zh-cn/subscriptions/downloads/cc303708
But it doesn't state clearly whether it works for a Windows service. It seems that recovery is only activated when the user interacts with the Windows Error Reporting error dialog (clicking "close" on the dialog, for example).
So I am wondering whether my guess is right that the Application Restart and Recovery mechanism doesn't work for Windows services. Or is there a better way to meet my requirement?
I would suggest trying ARR if that's what you want to use. The restart portion won't work, but it doesn't need to: if you fail out of your service, the Windows service controller will handle recovery (up to and including restarting your service).
You configure those recovery actions either through code or with one of the built-in administrative tools for services, such as services.msc.
DebugDiag/ADPlus and similar tools ultimately use built-in APIs; you don't need to add anything external to collect debugging information. You do, however, have to write a good deal of code to do some things. It's pretty simple to use the unmanaged function that I pointed out before and MiniDumpWriteDump to write a minidump when you hit an unexpected error (the dbghelp.dll that comes installed with Windows has it, so you don't need anything additional installed). You can even write a basic debugger that literally debugs a process using only kernel32 functions (see https://msdn.microsoft.com/en-us/library/windows/desktop/ms679301(v=vs.85).aspx if you're interested).
WinSDK Support Team Blog: http://blogs.msdn.com/b/winsdk/ -
Old 1760 With password recovery disabled, no way to factory reset
Hi
I have an old 1760 router with password recovery functionality disabled.
I don't care about its current configuration; I need a factory reset.
I followed the well-documented procedure:
Normal boot
Self decompressing the image : #################################################
################################################################ [OK]
Smart Init is disabled. IOMEM set to: 15
PMem allocated: 57042944 bytes; IOMem allocated: 10065920 bytes
Restricted Rights Legend
Use, duplication, or disclosure by the Government is
subject to restrictions as set forth in subparagraph
(c) of the Commercial Computer Software - Restricted
Rights clause at FAR sec. 52.227-19 and subparagraph
(c) (1) (ii) of the Rights in Technical Data and Computer
Software clause at DFARS sec. 252.227-7013.
cisco Systems, Inc.
170 West Tasman Drive
San Jose, California 95134-1706
Cisco Internetwork Operating System Software
IOS (tm) C1700 Software (C1700-SV8Y7-M), Version 12.3(6d), RELEASE SOFTWARE (fc1
Copyright (c) 1986-2004 by cisco Systems, Inc.
Compiled Fri 15-Oct-04 03:46 by kellythw
Image text-base: 0x80008120, data-base: 0x81440804
Send break at this time, then:
Do you want to reset the router to factory default
configuration and proceed [y/n] ? y
Reset router configuration to factory default.
cisco 1760 (MPC860P) processor (revision 0x500) with 55706K/9830K bytes of memory.
Processor board ID FOC07450X9P (3881152211), with hardware revision 0000
MPC860P processor: part number 5, mask 2
Bridging software.
X.25 software, Version 3.0.0.
1 FastEthernet/IEEE 802.3 interface(s)
32K bytes of non-volatile configuration memory.
32768K bytes of processor board System flash (Read/Write)
WARNING:
Executing this command will disable password recovery mechanism.
Do not execute this command without another plan for
password recovery.
Are you sure you want to continue? [yes/no]: y
The router boots up normally anyway, still with the original password in place instead of a fresh factory default.
Any hints, please?
Thank you
Federico,
There is something quite strange going on, but one thing has caught my attention in particular. This is a part of your transcript:
Send break at this time, then:
Do you want to reset the router to factory default
configuration and proceed [y/n] ? y
Reset router configuration to factory default.
cisco 1760 (MPC860P) processor (revision 0x500) with 55706K/9830K bytes of memory.
Processor board ID FOC07450X9P (3881152211), with hardware revision 0000
MPC860P processor: part number 5, mask 2
Bridging software.
X.25 software, Version 3.0.0.
1 FastEthernet/IEEE 802.3 interface(s)
32K bytes of non-volatile configuration memory.
32768K bytes of processor board System flash (Read/Write)
WARNING:
Executing this command will disable password recovery mechanism.
Do not execute this command without another plan for
password recovery.
Are you sure you want to continue? [yes/no]: y
Notice that the first question is whether you want to erase the configuration; you respond with yes, and the router continues booting. The second question clearly shows that the router goes on to load the configuration file and, in particular, processes the no service password-recovery command.
What would happen if you answered n to this second question, preventing the router from accepting the stored no service password-recovery command? Could you then reload the router and try the password recovery procedure again?
Also, if this router has a removable Flash card, would you be able to enter ROMMON and set the configuration register to 0x2142 if you removed the card and tried booting the router?
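For context on the 0x2142 value: it differs from the usual 0x2102 setting only in bit 6 (0x0040), the bit that tells IOS to ignore the startup configuration in NVRAM. The Python sketch below just decodes that one bit; the constant and function names are made up for the illustration.

```python
IGNORE_NVRAM_CONFIG = 0x0040  # bit 6 of the configuration register


def ignores_startup_config(confreg: int) -> bool:
    """Return True if the register value tells IOS to skip the NVRAM config."""
    return bool(confreg & IGNORE_NVRAM_CONFIG)
```

Booting with bit 6 set would bypass the stored configuration, including its no service password-recovery line, which is why getting into ROMMON is the crux of the suggestion above.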
Best regards,
Peter