Node crashes when enabling RDS for private interconnect.

OS: oel6.3 - 2.6.39-300.17.2.el6uek.x86_64
Grid and DB: 11.2.0.3.4
This is a two node Standard Edition cluster.
The node crashes when clusterware is restarted after following the instructions from note 751343.1 (RAC Support for RDS Over Infiniband) to enable RDS.
The cluster runs fine using IPoIB for the cluster_interconnect.
1) As the ORACLE_HOME/GI_HOME owner, stop all resources (database, listener, ASM etc) that are running from the home. When stopping the database, use the NORMAL or IMMEDIATE option.
2) As root, if relinking 11gR2 Grid Infrastructure (GI) home, unlock GI home: GI_HOME/crs/install/rootcrs.pl -unlock
3) As the ORACLE_HOME/GI_HOME owner, go to ORACLE_HOME/GI_HOME and cd to rdbms/lib
4) As the ORACLE_HOME/GI_HOME owner, issue "make -f ins_rdbms.mk ipc_rds ioracle"
5) As root, if relinking 11gR2 Grid Infrastructure (GI) home, lock GI home: GI_HOME/crs/install/rootcrs.pl -patch
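To make the ordering of steps 2 to 5 explicit, here is a minimal Python sketch that only prints the commands in sequence and runs nothing. The `Step` class, the `rds_relink_plan` function and the example home path are hypothetical names made up for illustration; the command strings themselves are the ones quoted in the steps. Stopping the resources (step 1) is left out and still has to be done first.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    user: str      # account that should run the command ("root" or "owner")
    command: str   # command line, as quoted in the steps above


def rds_relink_plan(home: str, is_grid_home: bool) -> List[Step]:
    """Return the relink commands in the order the steps list them."""
    plan = []
    if is_grid_home:
        # Step 2: only a Grid Infrastructure home needs unlocking.
        plan.append(Step("root", f"{home}/crs/install/rootcrs.pl -unlock"))
    # Steps 3 and 4: relink from rdbms/lib as the home owner.
    plan.append(Step("owner", f"cd {home}/rdbms/lib && make -f ins_rdbms.mk ipc_rds ioracle"))
    if is_grid_home:
        # Step 5: lock the Grid Infrastructure home again.
        plan.append(Step("root", f"{home}/crs/install/rootcrs.pl -patch"))
    return plan


if __name__ == "__main__":
    # The path below is a made-up example, not taken from the post.
    for step in rds_relink_plan("/u01/app/11.2.0/grid", is_grid_home=True):
        print(f"[{step.user}] {step.command}")
```

Printing the plan first makes it easy to check that the unlock and the lock bracket the relink on a GI home, and that a plain database home gets only the make command.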
It appears to abend when ASM tries to start, with the message below on the console.
I have a service request open for this issue, but I am hoping someone may have seen this and has
some way around it.
Thanks
Alan
kernel BUG at net/rds/ib_send.c:547!
invalid opcode: 0000 [#1] SMP
CPU 2
Modules linked in: 8021q garp stp llc iptable_filter ip_tables nfs lockd
fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand powernow_k8
freq_table mperf rds_rdma rds_tcp rds ib_ipoib rdma_ucm ib_ucm ib_uverbs
ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa sr_mod cdrom microcode
serio_raw pcspkr ghes hed k10temp hwmon amd64_edac_mod edac_core
edac_mce_amd i2c_piix4 i2c_core sg igb dca mlx4_ib ib_mad ib_core
mlx4_en mlx4_core ext4 mbcache jbd2 usb_storage sd_mod crc_t10dif ahci
libahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
scsi_wait_scan]
Pid: 4140, comm: kworker/u:1 Not tainted 2.6.39-300.17.2.el6uek.x86_64
#1 Supermicro BHDGT/BHDGT
RIP: 0010:[<ffffffffa02db829>] [<ffffffffa02db829>]
rds_ib_xmit+0xa69/0xaf0 [rds_rdma]
RSP: 0018:ffff880fb84a3c50 EFLAGS: 00010202
RAX: ffff880fbb694000 RBX: ffff880fb3e4e600 RCX: 0000000000000000
RDX: 0000000000000030 RSI: ffff880fbb6c3a00 RDI: ffff880fb058a048
RBP: ffff880fb84a3d30 R08: 0000000000000fd0 R09: ffff880fbb6c3b90
R10: 0000000000000000 R11: 000000000000001a R12: ffff880fbb6c3a00
R13: ffff880fbb6c3a00 R14: 0000000000000000 R15: ffff880fb84a3d90
FS: 00007fd0a3a56700(0000) GS:ffff88101e240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002158ca2 CR3: 0000000001783000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:1 (pid: 4140, threadinfo ffff880fb84a2000, task
ffff880fae970180)
Stack:
0000000000012200 0000000000012200 ffff880f00000000 0000000000000000
000000000000e5b0 ffffffff8115af81 ffffffff81b8d6c0 ffffffffa02b2e12
00000001bf272240 ffffffff81267020 ffff880fbb6c3a00 0000003000000002
Call Trace:
[<ffffffff8115af81>] ? __kmalloc+0x1f1/0x200
[<ffffffffa02b2e12>] ? rds_message_alloc+0x22/0x90 [rds]
[<ffffffff81267020>] ? sg_init_table+0x30/0x50
[<ffffffffa02b2db2>] ? rds_message_alloc_sgs+0x62/0xa0 [rds]
[<ffffffffa02b31e4>] ? rds_message_map_pages+0xa4/0x110 [rds]
[<ffffffffa02b4f3b>] rds_send_xmit+0x38b/0x6e0 [rds]
[<ffffffff81089d53>] ? cwq_activate_first_delayed+0x53/0x100
[<ffffffffa02b6040>] ? rds_recv_worker+0xc0/0xc0 [rds]
[<ffffffffa02b6075>] rds_send_worker+0x35/0xc0 [rds]
[<ffffffff81089fd6>] process_one_work+0x136/0x450
[<ffffffff8108bbe0>] worker_thread+0x170/0x3c0
[<ffffffff8108ba70>] ? manage_workers+0x120/0x120
[<ffffffff810907e6>] kthread+0x96/0xa0
[<ffffffff81515544>] kernel_thread_helper+0x4/0x10
[<ffffffff81090750>] ? kthread_worker_fn+0x1a0/0x1a0
[<ffffffff81515540>] ? gs_change+0x13/0x13
Code: ff ff e9 b1 fe ff ff 48 8b 0d b4 54 4b e1 48 89 8d 70 ff ff ff e9
71 ff ff ff 83 bd 7c ff ff ff 00 0f 84 f4 f5 ff ff 0f 0b eb fe <0f> 0b
eb fe 44 8b 8d 48 ff ff ff 41 b7 01 e9 51 f6 ff ff 0f 0b
RIP [<ffffffffa02db829>] rds_ib_xmit+0xa69/0xaf0 [rds_rdma]
RSP <ffff880fb84a3c50>
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.39-300.17.2.el6uek.x86_64
([email protected]) (gcc version 4.4.6 20110731 (Red
Hat 4.4.6-3) (GCC) ) #1 SMP Wed Nov 7 17:48:36 PST 2012
Command line: ro root=UUID=5ad1a268-b813-40da-bb76-d04895215677
rd_DM_UUID=ddf1_stor rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD
SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us numa=off
console=ttyS1,115200n8 irqpoll maxcpus=1 nr_cpus=1 reset_devices
cgroup_disable=memory mce=off memmap=exactmap memmap=538K@64K
memmap=130508K@770048K elfcorehdr=900556K memmap=72K#3668608K
memmap=184K#3668680K
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 0000000000096800 (usable)
BIOS-e820: 0000000000096800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dfe90000 (usable)
BIOS-e820: 00000000dfe9e000 - 00000000dfea0000 (reserved)
BIOS-e820: 00000000dfea0000 - 00000000dfeb2000 (ACPI data)
BIOS-e820: 00000000dfeb2000 - 00000000dfee0000 (ACPI NVS)
BIOS-e820: 00000000dfee0000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
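For anyone comparing console dumps like the one above across nodes or reboots, a small Python sketch can pull out the two facts that matter most for a support case: the source location of the BUG and the function at RIP. The function name `summarize_oops` is hypothetical, and the patterns assume the layout shown in this post (a "kernel BUG at file:line!" line, and a RIP entry that ends in a module name in square brackets).

```python
import re


def summarize_oops(text):
    """Extract BUG location and faulting function from a kernel oops dump."""
    bug = re.search(r"kernel BUG at (\S+):(\d+)!", text)
    # The RIP entry may be wrapped over two lines on a serial console,
    # so let "." match newlines and take the first symbol+offset after "RIP".
    rip = re.search(
        r"RIP\b.*?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]", text, re.S
    )
    return {
        "file": bug.group(1) if bug else None,
        "line": int(bug.group(2)) if bug else None,
        "function": rip.group(1) if rip else None,
        "module": rip.group(2) if rip else None,
    }


if __name__ == "__main__":
    sample = (
        "kernel BUG at net/rds/ib_send.c:547!\n"
        "RIP: 0010:[<ffffffffa02db829>] [<ffffffffa02db829>]\n"
        "rds_ib_xmit+0xa69/0xaf0 [rds_rdma]\n"
    )
    print(summarize_oops(sample))
```

A fault inside the core kernel has no bracketed module name after the symbol, so this pattern would leave "function" and "module" empty in that case.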

I believe the OFED version is 1.5.3.3, but I am not sure if this is correct.
We have not added any third-party drivers. All that has been done to add InfiniBand to our build is
a yum groupinstall "Infiniband Support".
I have not tried rds-stress, but rds-ping works fine and rds-info seems fine.
A service request has been opened, but so far I have had a better response here.
oracle@blade1-6:~> rds-info
RDS IB Connections:
LocalAddr RemoteAddr LocalDev RemoteDev
10.10.0.116 10.10.0.119 fe80::25:90ff:ff07:df1d fe80::25:90ff:ff07:e0e5
TCP Connections:
LocalAddr LPort RemoteAddr RPort HdrRemain DataRemain SentNxt ExpectUna SeenUna
Counters:
CounterName Value
conn_reset 5
recv_drop_bad_checksum 0
recv_drop_old_seq 0
recv_drop_no_sock 1
recv_drop_dead_sock 0
recv_deliver_raced 0
recv_delivered 18
recv_queued 18
recv_immediate_retry 0
recv_delayed_retry 0
recv_ack_required 4
recv_rdma_bytes 0
recv_ping 14
send_queue_empty 18
send_queue_full 0
send_lock_contention 0
send_lock_queue_raced 0
send_immediate_retry 0
send_delayed_retry 0
send_drop_acked 0
send_ack_required 3
send_queued 32
send_rdma 0
send_rdma_bytes 0
send_pong 14
page_remainder_hit 0
page_remainder_miss 0
copy_to_user 0
copy_from_user 0
cong_update_queued 0
cong_update_received 1
cong_send_error 0
cong_send_blocked 0
ib_connect_raced 4
ib_listen_closed_stale 0
ib_tx_cq_call 6
ib_tx_cq_event 6
ib_tx_ring_full 0
ib_tx_throttle 0
ib_tx_sg_mapping_failure 0
ib_tx_stalled 16
ib_tx_credit_updates 0
ib_rx_cq_call 33
ib_rx_cq_event 38
ib_rx_ring_empty 0
ib_rx_refill_from_cq 0
ib_rx_refill_from_thread 0
ib_rx_alloc_limit 0
ib_rx_credit_updates 0
ib_ack_sent 4
ib_ack_send_failure 0
ib_ack_send_delayed 0
ib_ack_send_piggybacked 0
ib_ack_received 3
ib_rdma_mr_alloc 0
ib_rdma_mr_free 0
ib_rdma_mr_used 0
ib_rdma_mr_pool_flush 8
ib_rdma_mr_pool_wait 0
ib_rdma_mr_pool_depleted 0
ib_atomic_cswp 0
ib_atomic_fadd 0
iw_connect_raced 0
iw_listen_closed_stale 0
iw_tx_cq_call 0
iw_tx_cq_event 0
iw_tx_ring_full 0
iw_tx_throttle 0
iw_tx_sg_mapping_failure 0
iw_tx_stalled 0
iw_tx_credit_updates 0
iw_rx_cq_call 0
iw_rx_cq_event 0
iw_rx_ring_empty 0
iw_rx_refill_from_cq 0
iw_rx_refill_from_thread 0
iw_rx_alloc_limit 0
iw_rx_credit_updates 0
iw_ack_sent 0
iw_ack_send_failure 0
iw_ack_send_delayed 0
iw_ack_send_piggybacked 0
iw_ack_received 0
iw_rdma_mr_alloc 0
iw_rdma_mr_free 0
iw_rdma_mr_used 0
iw_rdma_mr_pool_flush 0
iw_rdma_mr_pool_wait 0
iw_rdma_mr_pool_depleted 0
tcp_data_ready_calls 0
tcp_write_space_calls 0
tcp_sndbuf_full 0
tcp_connect_raced 0
tcp_listen_closed_stale 0
RDS Sockets:
BoundAddr BPort ConnAddr CPort SndBuf RcvBuf Inode
0.0.0.0 0 0.0.0.0 0 131072 131072 340441
RDS Connections:
LocalAddr RemoteAddr NextTX NextRX Flg
10.10.0.116 10.10.0.119 33 38 --C
Receive Message Queue:
LocalAddr LPort RemoteAddr RPort Seq Bytes
Send Message Queue:
LocalAddr LPort RemoteAddr RPort Seq Bytes
Retransmit Message Queue:
LocalAddr LPort RemoteAddr RPort Seq Bytes
10.10.0.116 0 10.10.0.119 40549 32 0
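The counter list above is long, so here is a minimal Python sketch that reads the "Counters:" section of saved rds-info output and keeps only the non-zero counters whose names hint at trouble. The function names and the keyword list are assumptions chosen for illustration, not anything defined by rds-info, and a non-zero value is not by itself proof of a fault.

```python
def parse_counters(rds_info_text):
    """Return {counter_name: value} from the 'Counters:' section."""
    counters = {}
    in_section = False
    for line in rds_info_text.splitlines():
        parts = line.split()
        if line.strip() == "Counters:":
            in_section = True
            continue
        if not in_section or not parts:
            continue
        if parts == ["CounterName", "Value"]:
            continue  # column header
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
        else:
            break  # reached the next section, e.g. "RDS Sockets:"
    return counters


def suspicious(counters, keywords=("drop", "reset", "stalled", "failure", "full")):
    """Keep non-zero counters whose name contains one of the keywords."""
    return {
        name: value
        for name, value in counters.items()
        if value and any(word in name for word in keywords)
    }


if __name__ == "__main__":
    import sys
    # Usage (assumed): rds-info > rds-info.txt; python this_script.py rds-info.txt
    with open(sys.argv[1]) as handle:
        for name, value in suspicious(parse_counters(handle.read())).items():
            print(f"{name} {value}")
```

Run against the output in this post, that filter would leave conn_reset 5, recv_drop_no_sock 1 and ib_tx_stalled 16, which may be worth mentioning in the service request even if they turn out to be unrelated to the crash.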
oracle@blade1-6:~> cat /etc/rdma/rdma.conf
# Load IPoIB
IPOIB_LOAD=yes
# Load SRP module
SRP_LOAD=no
# Load iSER module
ISER_LOAD=no
# Load RDS network protocol
RDS_LOAD=yes
# Should we modify the system mtrr registers? We may need to do this if you
# get messages from the ib_ipath driver saying that it couldn't enable
# write combining for the PIO buffs on the card.
# Note: recent kernels should do this for us, but in case they don't, we'll
# leave this option
FIXUP_MTRR_REGS=no
# Should we enable the NFSoRDMA service?
NFSoRDMA_LOAD=yes
NFSoRDMA_PORT=2050
oracle@blade1-6:~> /etc/init.d/rdma status
Low level hardware support loaded:
     mlx4_ib
Upper layer protocol modules:
     rds_rdma ib_ipoib
User space access modules:
     rdma_ucm ib_ucm ib_uverbs ib_umad
Connection management modules:
     rdma_cm ib_cm iw_cm
Configured IPoIB interfaces: none
Currently active IPoIB interfaces: ib0

Similar Messages

Word Crashing When Discarding Checkout for Document Stored in SharePoint

We have recently noticed an issue when we have a Word file checked out from SharePoint, if we then discard the checkout from Word, Word crashes. The error details point to an issue in wwlib.dll.
Having done some further investigation into this it appears this only happens if the document has an attached template which contains a custom ribbon. We use SharePoint 2013 and Word 2010, although I have tested using Word 2013 with the same results.
This was noticed on our company templates which contain a custom ribbon tab and a number of custom buttons. I have since tested it by creating a template with a single button on a custom tab with the same results.
Has anyone else come across this issue, and is there any way to resolve it?
Thanks,
Richard

Hi Daniel,
Thanks for the response.
The issue only happens when discarding checkout. Editing and saving a document back to SharePoint does not cause an issue. I am using a dotm template. I am not storing the template
as a content type. When I say I created a custom tab, this was created by adding the XML to create the ribbon tab and button. I used the 'Custom UI Editor for Microsoft Office' to add this. The file created for this is called customUI14.xml and the
XML is as follows (in the basic test template that I created):
<customUI xmlns="http://schemas.microsoft.com/office/2009/07/customui">
  <ribbon startFromScratch="false">
    <tabs>
      <tab id="customTab" label="Custom Tab">
        <group id="customGroup" label="Custom Group">
          <button id="customButton" label="Test Ribbon Button" imageMso="HappyFace" size="large" onAction="TestRibbonCrash" />
        </group>
      </tab>
    </tabs>
  </ribbon>
</customUI>
The code this points to is as follows:
Sub TestRibbonCrash(ByRef Ctrl As IRibbonControl)
    MsgBox "Ribbon button working"
End Sub
Following your reply I have tested it by adding a custom tab and button through the GUI and the problem doesn't occur when using that method of creating the tab and button.
I will see if I can find anything useful in the ULS logs although no correlation ID is displayed when the error happens and in my experience even with a correlation ID, 99% of the time the ULS logs don't provide any useful information.
I've also observed that Word doesn't crash if I discard checkout when I have another document open at the same time which is based on the same template.
I have already posted on a MS Word forum. Is there more chance of getting some help on one of the other Word forums you have suggested?
http://answers.microsoft.com/en-us/office/forum/office_2010-word/word-crashing-when-discarding-checkout-for/95565f9e-411b-4f11-b14d-d3771e2e2ba4
Thanks,
Richard

Is anybody else experiencing crashes when installing iTunes for Windows?

Is anybody else experiencing crashes when installing iTunes for Windows?

For general advice see Troubleshooting issues with iTunes for Windows updates.
The steps in the second box (similar to those given above) are a guide to removing everything related to iTunes and then rebuilding it which is often a good starting point unless the symptoms indicate a more specific approach. Review the other boxes and the list of support documents further down the page in case one of them applies.
Your library should be unaffected by these steps but there is backup and recovery advice elsewhere in the user tip.
tt2

Flash CS5.5 crashes when exporting movieclip for Actionscript

Hello,
My Flash CS5.5 crashes when exporting movieclip for Actionscript (select movieclips -> modify -> convert to symbol -> check export for actionscript). I think this only happens when Photoshop CS5 is open.
Does anyone know how to fix this? I'm using Win7 64-bit. The only open programs are Flash, Photoshop, and Firefox.

Oddly enough, I opened up Flash again today, just to see if it was still the same, and it worked. Everything seemed normal, so I opened a new document and started working. Then, out of nowhere, it stopped responding again. I waited for it to respond, but it didn't. I closed it from the task manager, and now it's back to the original problem of crashing when I try to create a new document or open an old one.
Update: I tried the cleaner, as you'd suggested. Followed all the steps, but to no avail. Think it may have something to do with my machine, and less with the software?

NICs for Private Interconnect redundancy

DB/Grid version : 11.2.0.2
Platform : AIX 6.1
We are going to install a 2-node RAC on AIX (that thing which is almost as good as Solaris)
Our primary private interconnect is
### Primary Private Interconnect
169.21.204.1      scnuprd186-privt1.mvtrs.net scnuprd186-privt1
169.21.204.4      scnuprd187-privt1.mvtrs.net scnuprd187-privt1
For the cluster interconnect's redundancy, the Unix team has attached an extra NIC to each node, with an extra Gigabit Ethernet switch for these NICs.
### Redundant Private Interconnect attached to the server
169.21.204.2      scnuprd186-privt2.mvtrs.net scnuprd186-privt2 # Node1's newly attached redundant NIC
169.21.204.5      scnuprd187-privt2.mvtrs.net scnuprd187-privt2 # Node2's newly attached redundant NIC
Example borrowed from citizen2's post
Apparently I have 2 ways to implement the cluster interconnect's redundancy
Option1. NIC bonding at OS level
Option2. Let grid software do it
Question1. Which is better : Option 1 or 2 ?
Question2.
Regarding Option2.
From googling and OTN, I gather that during grid installation you just provide 169.21.204.0 for the cluster interconnect and grid will identify the redundant NIC and switch. And if something goes wrong with the primary interconnect setup (shown above), grid will automatically re-route interconnect traffic using the redundant NIC setup. Is this correct?
Question 3.
My colleague tells me that unless I configure some multicasting (AIX specific) for the redundant (Gigabit) switch, I could get errors during installation. He is not clear on what exactly it was. Has anyone faced a multicasting-related issue with this?
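Before handing a network address to the installer, it can help to confirm which of the private addresses actually fall inside it. Below is a minimal Python sketch using the standard ipaddress module; the function name `in_network` is hypothetical and the /24 prefix length is an assumption, since the post gives 169.21.204.0 without a netmask. It says nothing about whether a shared subnet is the right layout for either option.

```python
import ipaddress


def in_network(addresses, network):
    """Map each address to True/False depending on membership in network."""
    net = ipaddress.ip_network(network)
    return {addr: ipaddress.ip_address(addr) in net for addr in addresses}


if __name__ == "__main__":
    # Addresses taken from the hosts entries in the post.
    private_ips = ["169.21.204.1", "169.21.204.4", "169.21.204.2", "169.21.204.5"]
    # ASSUMPTION: a /24 netmask; the post does not state one.
    for addr, ok in in_network(private_ips, "169.21.204.0/24").items():
        print(addr, "inside" if ok else "OUTSIDE")
```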

Hi,
My recommendation is to use the AIX EtherChannel.
The EtherChannel of AIX is much more powerful and stable compared with HAIP.
See how to set up AIX EtherChannel on 10 Gigabit Ethernet interfaces
http://levipereira.wordpress.com/2011/01/26/setting-up-ibm-power-systems-10-gigabit-ethernet-ports-and-aix-6-1-etherchannel-for-oracle-rac-private-interconnectivity/
If you choose to use HAIP, I recommend you read this note and find all notes about HAIP bugs on AIX.
11gR2 Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip [ID 1210883.1]
ASM Crashes as HAIP Does not Failover When Two or More Private Network Fails [ID 1323995.1]
About multicasting, read this:
Grid Infrastructure 11.2.0.2 Installation or Upgrade may fail due to Multicasting Requirement [ID 1212703.1]
Regards,
Levi Pereira

Calendar app crashes when selecting save for this event only

When putting my work schedule into the calendar app, I usually put it in as a repeating event until a certain date. If I edit one of those days and hit "Save for this event only" the app will crash. It does save, but the crash is quite annoying.

I'm having a similar issue, but mine is crashing when I try to delete one occurrence of a repeating event, and when I re-open the calendar the event I'm trying to delete is still there. This only began after the 8.3 update. Eventually I just gave up and deleted it on my Macbook at home, but there's obviously a bug in the update.

Layout Editor crashes when loading icon for iconic button (11gR2 patch 1)

Using Forms 11gR2 Patch 1, 64 bit on Solaris 10, the Layout Editor crashes when reading the .GIF file to display an iconic button. (The workaround is not to set UI_ICON and UI_ICON_EXTENSION in the frmbld.sh script. In that case, the Layout Editor displays an empty iconic button, but at least it does not crash.)
The truss output from frmbld says,
5078/1:          open("/usr/local/Oracle/Middleware/Oracle_FRHome1/ohs/icons/bomb.gif", O_RDONLY) = 24
5078/1:          lseek(24, 0, SEEK_END)                    = 308
5078/1:          lseek(24, 0, SEEK_CUR)                    = 308
5078/1:          lseek(24, 0, SEEK_SET)                    = 0
5078/1:          lseek(24, 161, SEEK_SET)               = 161
5078/1:          fstat(24, 0xFFFFFD7FFFDF9440)               = 0
5078/1:          fstat(24, 0xFFFFFD7FFFDF9390)               = 0
5078/1:          ioctl(24, TCGETA, 0xFFFFFD7FFFDF9400)          Err#25 ENOTTY
5078/1:          read(24, "048F 0C9 IA7B8 X $C0BBDF".., 8192)     = 147
5078/1:          Incurred fault #6, FLTBOUNDS %pc = 0xFFFFFD7FFDF3FB42
5078/1:          siginfo: SIGSEGV SEGV_MAPERR addr=0x0000004C
5078/1:          Received signal #11, SIGSEGV [caught]
5078/1:          siginfo: SIGSEGV SEGV_MAPERR addr=0x0000004C
(These are consecutive lines from the truss output, so it would appear that read()'ing from the bomb.gif file -- or any other .GIF file for that matter -- causes the FLTBOUNDS fault.)
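When a truss log is thousands of lines long, a short Python sketch can locate the last system call seen before the first fault line, which is the observation made above by eye. The function name `last_call_before_fault` is hypothetical, and the patterns assume the "pid/lwp:" prefix and the "Incurred fault" wording shown in this output. Note that truss reports system calls only, so the faulting instruction is in user code that ran after that call returned.

```python
import re

PREFIX = re.compile(r"^\s*\d+/\d+:\s*")  # "pid/lwp:" prefix on each line
CALL = re.compile(r"^(\w+)\(")           # syscall name followed by "("


def last_call_before_fault(truss_text):
    """Return (syscall_name, fault_line) for the first fault in the log."""
    last_call = None
    for line in truss_text.splitlines():
        body = PREFIX.sub("", line).strip()
        if body.startswith("Incurred fault"):
            return last_call, body
        match = CALL.match(body)
        if match:
            last_call = match.group(1)
    return last_call, None  # no fault line found


if __name__ == "__main__":
    import sys
    # Usage (assumed): python this_script.py saved_truss_output.txt
    with open(sys.argv[1]) as handle:
        print(last_call_before_fault(handle.read()))
```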
The hs_err_pid*.log file in the Forms home directory says,
# Problematic frame:
# C [libuimotif.so.0+0x13fb42] uiimkxu_Support+0x4f42
(If it matters, Solaris is running in a VMware virtual machine on Windows 7.)
Did I find a bug?

Stebalien wrote:Segmentation faults are ALWAYS bugs.
https://bbs.archlinux.org/viewtopic.php?pid=1381173
https://bugs.launchpad.net/ubuntu/+sour … ug/1279412
So the problem is in fglrx, but in openSUSE it worked fine; maybe this happens because I use the latest beta driver.

WRT54GL crashes when i choose for a wireless encryption

Hello people
My problem:
My wireless router crashes when I am installing it.
When I configure the wireless encryption, my router hangs.
Then I have to reset it and I can configure it again, but the same thing happens again.
I am using the latest firmware, 4.30.7.
Is my router out of order?
Greetings Neppinda

Try to reflash the router's firmware and re-configure the router from scratch.

ASM on one node crashes when we start the other two nodes ASM

We completed the database build in Aug 2010
We completed PSU patching at the end of January
On Feb 4th the database crashed
We cannot start ASM on node1
ASM starts fine on node2 and node3, but node1 cannot join
If ASM is down on node2 and node3, then we can start ASM on node1
Reconfiguration started (old inc 0, new inc 6)
ASM instance
List of nodes:
0 1 2
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* allocate domain 1, invalid = TRUE
* allocate domain 2, invalid = TRUE
Mon Mar 01 16:53:00 2010
Trace dumping is performing id=[cdmp_20100301165301]
Mon Mar 01 16:53:55 2010
ERROR: LMD0 (ospid: 274638) detects an idle connection to instance 2
Mon Mar 01 16:54:44 2010
Errors in file /oradb/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_860280.trc (incident=116865):
ORA-29740: evicted by member 1, group incarnation 8
Incident details in: /oradb/oracle/diag/asm/+asm/+ASM1/incident/incdir_116865/+ASM1_lmon_860280_i116865.trc
Errors in file /oradb/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_860280.trc:
ORA-29740: evicted by member 1, group incarnation 8
LMON (ospid: 860280): terminating the instance due to error 29740
Mon Mar 01 16:54:46 2010
System state dump is made for local instance
Errors in file /oradb/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_diag_614488.trc (incident=116833):
ORA-29740: evicted by member , group incarnation
Incident details in: /oradb/oracle/diag/asm/+asm/+ASM1/incident/incdir_116833/+ASM1_diag_614488_i116833.trc
Mon Mar 01 16:54:46 2010
ORA-1092 : opitsk aborting process
Errors in file /oradb/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_diag_614488.trc:
ORA-29740: evicted by member , group incarnation
Trace dumping is performing id=[cdmp_20100301165446]
Instance terminated by LMON, pid = 860280
Another thing we found is that when we start ASM on node1, the cluster interconnect hangs when we try to ping
We did modify the cluster_interconnect parameter to try using the public interface, but the issue remained the same and we were not able to ping the public interface
The crs is fine
$ crs_stat -t
Name Type Target State Host
ora....p1.inst application ONLINE OFFLINE
ora....p2.inst application ONLINE ONLINE noden2
ora....p3.inst application ONLINE ONLINE noden3
ora....1p2.srv application ONLINE ONLINE noden2
ora....1p3.srv application ONLINE ONLINE noden3
ora.....net.cs application ONLINE ONLINE noden1
ora.appl.db application ONLINE ONLINE noden1
ora....SM1.asm application ONLINE OFFLINE
ora....N1.lsnr application ONLINE ONLINE noden1
ora....8n1.gsd application ONLINE ONLINE noden1
ora....8n1.ons application ONLINE ONLINE noden1
ora....8n1.vip application ONLINE ONLINE noden1
ora....SM2.asm application ONLINE ONLINE noden2
ora....N2.lsnr application ONLINE ONLINE noden2
ora....8n2.gsd application ONLINE ONLINE noden2
ora....8n2.ons application ONLINE ONLINE noden2
ora....8n2.vip application ONLINE ONLINE noden2
ora....SM3.asm application ONLINE ONLINE noden3
ora....N3.lsnr application ONLINE ONLINE noden3
ora....8n3.gsd application ONLINE ONLINE noden3
ora....8n3.ons application ONLINE ONLINE noden3
ora....8n3.vip application ONLINE ONLINE noden3
Any input would help
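To spot the resources that are not where they should be in a crs_stat -t listing like the one above, a minimal Python sketch can list every resource whose target is ONLINE while its state is something else. The function name `offline_resources` is hypothetical, and the parsing assumes the whitespace-separated column layout shown in this post (Name, Type, Target, State, optional Host).

```python
def offline_resources(crs_stat_text):
    """Return names of resources with Target ONLINE but State not ONLINE."""
    result = []
    for line in crs_stat_text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a resource row.
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue
        name, _type, target, state = fields[:4]
        if target == "ONLINE" and state != "ONLINE":
            result.append(name)
    return result


if __name__ == "__main__":
    import sys
    # Usage (assumed): crs_stat -t | python this_script.py
    for name in offline_resources(sys.stdin.read()):
        print(name)
```

On the listing in this post, that would report ora....p1.inst and ora....SM1.asm, i.e. the database instance and the ASM instance on node1.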

Env
3-node RAC
oracle version 11.1.0.7
Latest PSU Jan applied
OS is AIX, version 6100-02
==========
LMON trace files
==========
Trace file /oradb/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_860280.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oradb/oracle/product/11.1/asm_1
System name:     AIX
Node name:     host-node1
Release:     1
Version:     6
Machine:     00C39EA44C00
Instance name: +ASM1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 8
Unix process pid: 860280, image: oracle@host-node1 (LMON)
*** 2010-03-01 16:50:23.023
*** SESSION ID:(218.1) 2010-03-01 16:50:23.023
*** CLIENT ID:() 2010-03-01 16:50:23.023
*** SERVICE NAME:() 2010-03-01 16:50:23.023
*** MODULE NAME:() 2010-03-01 16:50:23.023
*** ACTION NAME:() 2010-03-01 16:50:23.023
GES resources 5596 pool 6
GES enqueues 7959
GES IPC: Receivers 2 Senders 2
GES IPC: Buffers Receive 1000 Send (i:1150 b:482) Reserve 402
GES IPC: Msg Size Regular 416 Batch 8192
Batching factor: enqueue replay 201, ack 224
Batching factor: cache replay 126 size per lock 64
kjxggin: CGS tickets = 1000
kjxgrdmpcpu: CPU Total 6 Core 3 Socket -1 OCPU 6
kjxgrdmpcpu: High load threshold 21504
*** 2010-03-01 16:50:23.362
kjxgmrcfg: Reconfiguration started, type 1
kjxgmcs: Setting state to 0 0.
*** 2010-03-01 16:50:23.363
Name Service frozen
kjxgmcs: Setting state to 0 1.
kjxgrdecidever: No old version members in the cluster
kjxgrssvote: reconfig bitmap chksum 0x88477268 cnt 3 master 0 ret 0
ksirValidateModuleInfo: action = 10 startup = 0
Name Service Mode: multi (0x21)
kjfcpiora: published my fusion master weight 5322
kjfcpiora: publish my flogb 9
kjfcpiora: publish my cluster_database_instances parameter=3
kjxggpoll: change poll time to 50 ms
kjxgrpropmsg: SSMEMI: inst 1 - no disk vote
kjxgrpropmsg: SSMEMI: inst 1 - no disk vote
kjxgrpropmsg: SSMEMI: inst 2 - no disk vote
SSVOTE: Master indicates no Disk Voting
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 6 2.
kjfmuin: bitmap 0 1 2
kjfmmhi: received msg from 0 (inc 6)
kjfmmhi: received msg from 1 (inc 2)
kjfmmhi: received msg from 2 (inc 4)
Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 6 3.
Name Service recovery started
Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 6 4.
Multicasted all local name entries for publish
Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 6 5.
Name Service normal
Name Service recovery done
*** 2010-03-01 16:50:23.889
*** 2010-03-01 16:50:23.958
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 6 6.
kjxggpoll: change poll time to 600 ms
2010-03-01 16:50:23.980620 :
********* kjfcrfg() called, BEGIN LMON RCFG *********
kjfcrfg: DRM window size = 0->128 (min lognb = 9)
2010-03-01 16:50:23.980811 :
Reconfiguration started (old inc 0, new inc 6)
ASM instance
Send timeout: 300 secs
Defer Queue timeout: 360 secs
Synchronization timeout: 420 sec
List of nodes:
0 1 2
*** 2010-03-01 16:50:24.023
2010-03-01 16:50:24.034432 : Global Resource Directory frozen
node 0
release 11 1 0 7
node 1
release 11 1 0 7
node 2
release 11 1 0 7
number of mastership buckets = 128
2010-03-01 16:50:24.034959 :
domain attach called for domid 0
* kjbdomalc: domain 0 invalid = TRUE
* kjbdomatt: first attach for domain 0
asby init, 0/0/x1
asby returns, 0/0/x1/false
* Domain maps before reconfiguration:
* DOMAIN 0 (valid 1): 0
* End of domain mappings
* Domain maps after recomputation:
* DOMAIN 0 (valid 1): 0 1 2
* End of domain mappings
Dead inst
Join inst 0 1 2
Exist inst
Active Sendback Threshold = 50 %
Communication channels reestablished
2010-03-01 16:50:24.152688 :
received all domreplay (6.6)
2010-03-01 16:50:24.152732 :
sent master 1 (6.6)
*** 2010-03-01 16:53:00.494
kjfmReceiverHealthCB_Check: Reciever [0] is healthy.
2010-03-01 16:52:56.921800 : Received comm error info from 2 (cnt 1)
kjxgrvalid: valid - 0.1 : (6 6) from 2
kjxgrrcfgchk: Initiating reconfig, reason=3
kjxgrrcfgchk: COMM rcfg - Disk Vote Required
2010-03-01 16:52:57.077877 : kjxgrnetchk: start 0x53001440, end 0x53019ae0
2010-03-01 16:52:57.077906 : kjxgrnetchk: Sending comm check req to 1
2010-03-01 16:52:57.078140 : kjxgrnetchk: Sending comm check req to 2
kjxgrrcfgchk: prev pstate 5 mapsz 512
kjxgrrcfgchk: new bmp: 0 1 2
kjxgrrcfgchk: work bmp: 0 1 2
kjxgrrcfgchk: rr bmp: 0 1 2
*** 2010-03-01 16:53:00.792
kjxgmrcfg: Reconfiguration started, type 3
kjxgmcs: Setting state to 6 0.
*** 2010-03-01 16:53:00.792
Name Service frozen
kjxgmcs: Setting state to 6 1.
kjxgrdecidever: No old version members in the cluster
kjxgrmsghndlr: Queue msg (0x110a21e50->0x110f09b90) type 7 for later
*** 2010-03-01 16:54:43.233
kjxgrssvote: reconfig bitmap chksum 0x88477268 cnt 3 master 2 ret 0
kjxgrrcfgchk: disable CGS timeout
kjxggpoll: change poll time to 50 ms
* kjfcchknested: CGS rcfg detected in step 7.0.0
SSVOTE: Master indicates Disk Voting required
2010-03-01 16:54:37.535518 : kjxgrmsghndlr: evict req from 1 for 0, seq (8, 8) vers 2193970751
2010-03-01 16:54:37.535587 : kjxgrdtrt: Evicted by 1, seq (8, 8)
IMR state information
Member 0, thread -1, state 0x2:c, flags 0x2c48
RR seq commit 6 cur 8
Propstate 3 prv 2 pending 0
rcfg rsn 3, rcfg time 1392514113, mem ct 3
master 2, master rcfg time 1392479783
evicted memcnt 0, starttm 0 chkcnt 0
system load 241 (normal)
Member information:
Member 0, incarn 6, version 0x82c5563f, thrd -1
prev thrd -1, status 0x1203 (JR..), err 0x0000
Member 1, incarn 6, version 0x82c1073b, thrd 2
prev thrd -1, status 0x1007 (JRM.), err 0x0002
Member 2, incarn 6, version 0x82c114ee, thrd 3
prev thrd -1, status 0x0007 (JRM.), err 0x0000
=====================================================
Group name: +ASM
Member id: 0
Cached KGXGN event: 0
Group State:
State: 6 1
Reconfig started start-tm 0x4b8c373c tmout period 0xffffffff state 0x2
Reconfig INPG type 3 inc 6 rsn 0 data 0x0
Reconfig COMP type 1 inc 6 rsn 0 data 0x0
Commited Map: 0 1 2
New Map: 0 1 2
KGXGN Map: 0 1 2
KGXGN Map2: 0 1 2
Master node: 0
Memcnt 3 Rcvcnt 0
Substate Proposal: false
Inc Proposal:
incarn 0 memcnt 0 master 0
proposal false matched false
map:
Master Inc State:
incarn 0 memcnt 0 agrees 0 flag 0x1
wmap:
nmap:
ubmap:
Substate Handler Execution State
substate 0 status done
substate 1 status done
substate 2 status done
substate 3 status done
substate 4 status done
substate 5 status done
substate 6 status done
IMR hist: 20[0x0a00:0x53019b0e] 4[0x0007:0x53019b0e] 3[0x0006:0x53019b0e]
IMR hist: 20[0x0902:0x53019b0e] 20[0x0702:0x53019b0b] 20[0x0702:0x53019b0a]
IMR hist: 20[0x0702:0x53019b0a] 1[0x0006:0x53019b0a] 20[0x0702:0x53019aff]
IMR hist: 10[0x0006:0x52fdbdb1] 20[0x0b00:0x52fdbdb1] 9[0x0006:0x52fdbdaf]
IMR hist: 20[0x0a02:0x52fdbdaf] 20[0x0a01:0x52fdbce1] 20[0x0a00:0x52fdbc8a]
IMR hist: 4[0x0005:0x52fdbc86] 3[0x0004:0x52fdbc4c] 20[0x0900:0x52fdbc4c]
IMR hist: 20[0x0802:0x52fdbc08] 20[0x0801:0x52fdbc08] 20[0x0801:0x52fdbc08]
IMR hist: 20[0x0602:0x52fdbc08] 20[0x0601:0x52fdbc08] 20[0x0601:0x52fdbc08]
IMR hist: 20[0x0800:0x52fdbc08] 20[0x0700:0x52fdbc08] 20[0x0602:0x52fdbc07]
IMR hist: 20[0x0800:0x52fdbc07] 20[0x0700:0x52fdbc07] 1[0x0000:0x52fdbbb8]
IMR hist: 0[0x0000:0x00000000] 0[0x0000:0x00000000]
KJM HIST LMD0:
7:0 6:0 5:7:0 12:97697 7:0 6:0 5:7:0 12:97696 7:0 6:0
5:7:0 12:97703 7:0 6:0 5:7:0 2:0 1:0 12:97713 7:0 6:0
5:7:0 12:97766 7:0 6:0 5:7:0 12:97782 7:0 6:0 5:7:0 12:97778
7:0 6:0 5:7:0 12:97799 7:0 6:0 5:7:0 12:97771 7:0 6:0
5:7:0 12:97784 7:0 6:0 5:7:0 12:97805 7:0 6:0 5:7:0 12:97785
7:0 6:0 5:7:0 12:97757 7:0 6:0 5:7:0 12:97770 7:0 6:0
5:7:0 12:97784 7:0 6:0
KJM HIST LMS0:
7:0 6:0 5:7:0 10:0 12:97697 7:0 6:0 5:7:0 10:0 12:97696
7:0 6:0 5:7:0 10:0 12:97703 7:0 6:0 5:7:0 10:0 12:97713
7:0 6:0 5:7:0 10:0 2:0 12:97766 7:0 6:0 5:7:0 10:0
12:97782 7:0 6:0 5:7:0 10:0 12:97778 7:0 6:0 5:7:0 10:0
12:97799 7:0 6:0 5:7:0 10:0 12:97771 7:0 6:0 5:7:0 10:0
12:97784 7:0 6:0 5:7:0 10:0 12:97805 7:0 6:0 5:7:0 10:0
12:97785 7:0 6:0 5:7:0
DUMP state for lmd0 (ospid 274638)
DUMP IPC context for lmd0 (ospid 274638)
Dumping process 9.274638 info:
*** 2010-03-01 16:54:43.664
Process diagnostic dump for oracle@host-node1 (LMD0), OS id=274638,
pid: 9, proc_ser: 1, sid: 217, sess_ser: 1
loadavg : 1.72 1.07 0.90
swap info: free_mem = 28642.09M rsv = 16.00M
alloc = 21.13M avail = 4096.00M swap_free = 4074.87M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
240001 A oracle 274638 1 0 60 20 12ca9f590 156060 16:50:23 - 0:00 asm_lmd0_+ASM1
Short stack dump:
<-ksedsts()+0254<-ksdxfstk()+0028<-ksdxcb()+05d8<-sspuser()+0074<-4750<-poll()+000c<-sskgxp_select()+00e4<-skgxpiwait()+08a4<-skgxpwait()+06fc<-ksxpwait()+081c<-ksliwat()+0a58<-kslwaitctx()+0150<-kslwait()+006c<-ksxprcvimd()+0368<-kjctr_rksxp()+013c<-kjctrcv()+0160<-kjcsrmg()+005c<-kjmdm()+2454<-ksbrdp()+075c<-opirip()+0444<-opidrv()+0414<-sou2o()+0090<-opimai_real()+0148<-main()+0090<-__start()+0070
Process diagnostic dump actual duration=0.161000 sec
(max dump time=30.000000 sec)
*** 2010-03-01 16:54:43.825
SO: 0x70000001ff913a0, type: 2, owner: 0x0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x70000001ff913a0, name=process, file=ksu.h LINE:10706 ID:, pg=0
(process) Oracle pid:9, ser:1, calls cur/top: 0x70000001f733140/0x70000001f733140
flags : (0x6) SYSTEM
flags2: (0x100), flags3: (0x0)
int error: 0, call error: 0, sess error: 0, txn error 0
ksudlp FALSE at location: 0
(post info) last post received: 0 0 83
last post received-location: kji.h LINE:2369 ID:kjga: clear wait for lmon
last process to post me: 70000001ff903b0 1 6
last post sent: 0 0 25
last post sent-location: ksa2.h LINE:282 ID:ksasnd
last process posted by me: 70000001ff903b0 1 6
(latch info) wait_event=68 bits=0
Process Group: DEFAULT, pseudo proc: 0x70000001f4851d0
O/S info: user: oracle, term: UNKNOWN, ospid: 274638
OSD pid info: Unix process pid: 274638, image: oracle@host-node1 (LMD0)
Dump of memory from 0x070000001FF70038 to 0x070000001FF70240
70000001FF70030 00000000 00000000 [........]
70000001FF70040 00000000 00000000 00000000 00000000 [................]
Repeat 31 times
SO: 0x70000001f6de4a0, type: 4, owner: 0x70000001ff913a0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x70000001ff913a0, name=session, file=ksu.h LINE:10719 ID:, pg=0
(session) sid: 217 ser: 1 trans: 0x0, creator: 0x70000001ff913a0
flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x408) -/-
DID: , short-term DID:
txn branch: 0x0
oct: 0, prv: 0, sql: 0x0, psql: 0x0, user: 0/SYS
ksuxds FALSE at location: 0
service name: SYS$BACKGROUND
Current Wait Stack:
0: waiting for 'ges remote message'
waittime=40, loop=0, p3=44
wait_id=2613 seq_num=2614 snap_id=1
wait times: snap=0.018269 sec, exc=0.018269 sec, total=0.018269 sec
wait times: max=0.080000 sec
wait counts: calls=1 os=1
in_wait=1 iflags=0x5a8
Wait State:
auto_close=0 flags=0x22 boundary=0x0/-1
Session Wait History:
0: waited for 'ges remote message'
waittime=40, loop=0, p3=44
wait_id=2612 seq_num=2613 snap_id=1
wait times: snap=0.160172 sec, exc=0.160172 sec, total=0.160172 sec
wait times: max=0.080000 sec
wait counts: calls=1 os=1
occurred after 0.000008 sec of elapsed time
1: waited for 'ges remote message'
waittime=40, loop=0, p3=44
wait_id=2611 seq_num=2612 snap_id=1
wait times: snap=0.096359 sec, exc=0.096359 sec, total=0.096359 sec
wait times: max=0.080000 sec
wait counts: calls=1 os=1
occurred after 0.000008 sec of elapsed time
2: waited for 'ges remote message'
waittime=40, loop=0, p3=44
wait_id=2610 seq_num=2611 snap_id=1
wait times: snap=0.098065 sec, exc=0.098065 sec, total=0.098065 sec
wait times: max=0.080000 sec
wait counts: calls=1 os=1
occurred after 0.000007 sec of elapsed time
3: waited for 'ges remote message'
waittime=40, loop=0, p3=44
wait_id=2609 seq_num=2610 snap_id=1
wait times: snap=0.097831 sec, exc=0.097831 sec, total=0.097831 sec
wait times: max=0.080000 sec
wait counts: calls=1 os=1
occurred after 0.000014 sec of elapsed time
4: waited for 'ges remote message'
waittime=40, loop=0, p3=44
wait_id=2608 seq_num=2609 snap_id=1
wait times: snap=0.095876 sec, exc=0.095876 sec, total=0.095876 sec
wait times: max=0.080000 sec
wait counts: calls=1 os=1
occurred after 0.000008 sec of elapsed time
5: waited for 'ges remote message'
waittime=40, loop=0, p3=44
wait_id=2607 seq_num=2608 snap_id=1
wait times: snap=0.098788 sec, exc=0.098788 sec, total=0.098788 sec
wait times: max=0.080000 sec
wait counts: calls=1 os=1
occurred after 0.000006 sec of elapsed time
6: waited for 'ges remote message'
waittime=40, loop=0, p3=44
wait_id=2606 seq_num=2607 snap_id=1
wait times: snap=0.098854 sec, exc=0.098854 sec, total=0.098854 sec
wait times: max=0.080000 sec
wait counts: calls=1 os=1
occurred after 0.000007 sec of elapsed time
7: waited for 'ges remote message'
waittime=40, loop=0, p3=44
wait_id=2605 seq_num=2606 snap_id=1
wait times: snap=0.098040 sec, exc=0.098040 sec, total=0.098040 sec
wait times: max=0.080000 sec
wait counts: calls=1 os=1
occurred after 0.000008 sec of elapsed time
8: waited for 'ges remote message'
waittime=40, loop=0, p3=44
wait_id=2604 seq_num=2605 snap_id=1
wait times: snap=0.097322 sec, exc=0.097322 sec, total=0.097322 sec
wait times: max=0.080000 sec
wait counts: calls=1 os=1
occurred after 0.000007 sec of elapsed time
9: waited for 'ges remote message'
waittime=40, loop=0, p3=44
wait_id=2603 seq_num=2604 snap_id=1
wait times: snap=0.097334 sec, exc=0.097334 sec, total=0.097334 sec
wait times: max=0.080000 sec
wait counts: calls=1 os=1
occurred after 0.000008 sec of elapsed time
Sampled Session History
The sampled session history is constructed by sampling
the target session every 1 second. The sampling process
captures at each sample if the session is in a non-idle wait,
an idle wait, or not in a wait. If the session is in a
non-idle wait then one interval is shown for all the samples
the session was in the same non-idle wait. If the
session is in an idle wait or not in a wait for
consecutive samples then one interval is shown for all
the consecutive samples. Though we display these consecutive
samples in a single interval the session may NOT be continuously
idle or not in a wait (the sampling process does not know).
The history is displayed in reverse chronological order.
sample interval: 1 sec, max history 120 sec
KSFD PGA DUMPS
Number of completed I/O requests=0 flags=0
END OF PROCESS STATE
LMON IPC context:
ksxpdmp: facility 0 (?) (0x1, 0x0) counts 0, 0
ksxpdmp: Dumping the osd context
SKGXP: SKGXPCTX: 0x1103bfb58 ctx
SKGXP:
SKGXP: WAIT HISTORY
SKGXP: Time(msec)     Wait Type     Return Code
SKGXP: ----------     ---------     ------------
SKGXP: 0          NORMAL          SUCC
SKGXP: 0          NORMAL          SUCC
SKGXP: 0          NORMAL          SUCC
SKGXP: 0          NORMAL          SUCC
SKGXP: 0          NORMAL          TIMEDOUT
SKGXP: 12          NORMAL          TIMEDOUT
SKGXP: 0          NORMAL          TIMEDOUT
SKGXP: 20          NORMAL          TIMEDOUT
SKGXP: 0          NORMAL          TIMEDOUT
SKGXP: 19          NORMAL          TIMEDOUT
SKGXP: 0          NORMAL          TIMEDOUT
SKGXP: 20          NORMAL          TIMEDOUT
SKGXP: 0          NORMAL          TIMEDOUT
SKGXP: 19          NORMAL          TIMEDOUT
SKGXP: 0          NORMAL          TIMEDOUT
SKGXP: 20          NORMAL          TIMEDOUT
SKGXP: wait delta 0 sec (27 msec) ctx ts 0x3e377 last ts 0x3e381
SKGXP: user cpu time since last wait 0 sec 0 ticks
SKGXP: system cpu time since last wait 0 sec 0 ticks
SKGXP: locked 1
SKGXP: blocked 51
SKGXP: timed wait receives 0
SKGXP: admno 0x485303b1 admport:
SKGXP: SSKGXPT 0x103c0a74 flags sockno 12 IP 192.168.253.49 UDP 49777
SKGXP: context timestamp 0x3e377
SKGXP: buffers queued on port 1105aa950
SKGXP:
SKGXP: Dumping Connection Handle Table
SKGXP: sconno accono ertt state seq# RcvPid TotCredits
SKGXP: sent rtrans acks
SKGXP: CNH Table Bucket: 10
SKGXP: 0x339d0248 0x6dd6841c 64 4 32838 589900 8
SKGXP: 75d 5d 32838d
SKGXP: CNH Table Bucket: 11
SKGXP: 0x339d0249 0x75ef4c98 32 4 32811 1007758 8
SKGXP: 48d 12d 32811d
SKGXP: CNH Table Bucket: 12
SKGXP: 0x339d024a 0x75703ec2 16 4 32763 524518 8
SKGXP: 0d 0d 0d
SKGXP: CNH Table Bucket: 13
SKGXP: 0x339d024b 0x41094259 16 4 32763 520260 8
SKGXP: 0d 0d 0d
SKGXP: CNH Table Bucket: 14
SKGXP: 0x339d024c 0x7c1c696c 16 4 32763 585808 8
SKGXP: 0d 0d 0d
SKGXP: CNH Table Bucket: 15
SKGXP: 0x339d024d 0x138c8c4a 16 4 32763 843952 8
SKGXP: 0d 0d 0d
SKGXP:
SKGXP: Dumping Accept Handle Table
SKGXP: ach accono sconno admno state SndPid seq# rcv rtrans acks credits
SKGXP: ACH Table Bucket: 1472
SKGXP: 0x111088010 0x48cb4387 0x3365b236 0x1fe7dc68 40 1007758 32812 49 0 26 8
SKGXP: ACH Table Bucket: 1474
SKGXP: 0x11108b730 0x48cb4389 0x1c69654a 0x7183ff4c 40 589900 32838 75 0 52 8
Incident 116865 created, dump file: /oradb/oracle/diag/asm/+asm/+ASM1/incident/incdir_116865/+ASM1_lmon_860280_i116865.trc
ORA-29740: evicted by member 1, group incarnation 8
error 29740 detected in background process
ORA-29740: evicted by member 1, group incarnation 8
*** 2010-03-01 16:54:46.430
LMON (ospid: 860280): terminating the instance due to error 29740
ksuitm: waiting up to [5] seconds before killing DIAG
==========
DIAG trace files
=========
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oradb/oracle/product/11.1/asm_1
System name:     AIX
Node name:     host-node1
Release:     1
Version:     6
Machine:     00C39EA44C00
Instance name: +ASM1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 4
Unix process pid: 614488, image: oracle@host-node1 (DIAG)
*** 2010-03-01 16:50:22.947
*** SESSION ID:(222.1) 2010-03-01 16:50:22.947
*** CLIENT ID:() 2010-03-01 16:50:22.947
*** SERVICE NAME:() 2010-03-01 16:50:22.947
*** MODULE NAME:() 2010-03-01 16:50:22.947
*** ACTION NAME:() 2010-03-01 16:50:22.947
Node id: 0
List of nodes: 0, 1, 2,
*** 2010-03-01 16:50:22.948
Reconfiguration starts [incarn=0]
*** 2010-03-01 16:50:22.948
I'm the master node
Group reconfiguration cleanup
*** 2010-03-01 16:50:23.602
A rcfg proposal from node 2 is received
*** 2010-03-01 16:50:23.602
A rcfg proposal from node 1 is received
*** 2010-03-01 16:50:23.602
Reconfiguration completes [incarn=3]
*** 2010-03-01 16:53:00.877
A dump event msg is rcv'd
REQUEST:trace dump in directory cdmp_20100301165301
*** 2010-03-01 16:53:00.877
Trace dumping is performing id=[cdmp_20100301165301]....
*** 2010-03-01 16:53:01.041
Trace dumping is done
*** 2010-03-01 16:54:46.560
Instance is terminating by process 860280 [ospid=oracle@host-node1 (LMON)]
Performing diagnostic data dump for this instance
Incident 116833 created, dump file: /oradb/oracle/diag/asm/+asm/+ASM1/incident/incdir_116833/+ASM1_diag_614488_i116833.trc
ORA-29740: evicted by member , group incarnation
Error 29740 encountered during system state dump
*** 2010-03-01 16:54:49.280
----- Error Stack Dump -----
ORA-29740: evicted by member , group incarnation
*** 2010-03-01 16:54:49.281
Trace dumping is performing id=[cdmp_20100301165446]....
*** 2010-03-01 16:54:49.433
Trace dumping is done
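
When reading a trace like the one above, it can help to pair each "*** <timestamp>" marker with the message that follows it and then measure the gap between two events. The following is only a minimal sketch: the function name and the sample excerpt are illustrative, and it assumes the timestamp format seen in this trace (YYYY-MM-DD HH:MM:SS.mmm).

```python
from datetime import datetime

def timeline(trace_text):
    """Pair each '*** <timestamp>' marker with the first message line after it."""
    events = []
    stamp = None
    for line in trace_text.splitlines():
        line = line.strip()
        if line.startswith("*** "):
            try:
                # 23 characters: date, space, time with milliseconds
                stamp = datetime.strptime(line[4:27], "%Y-%m-%d %H:%M:%S.%f")
            except ValueError:
                # Header lines such as "*** SESSION ID:(222.1) ..." are skipped
                stamp = None
        elif line and stamp is not None:
            events.append((stamp, line))
            stamp = None
    return events

# Two events copied from the DIAG trace above
SAMPLE = """\
*** 2010-03-01 16:50:23.602
Reconfiguration completes [incarn=3]
*** 2010-03-01 16:54:46.560
Instance is terminating by process 860280 [ospid=oracle@host-node1 (LMON)]
"""

events = timeline(SAMPLE)
elapsed = (events[1][0] - events[0][0]).total_seconds()
print(f"{events[0][1]!r} -> {events[1][1]!r}: {elapsed:.3f} s")
```

Only the first line after each marker is kept, which is enough for a coarse timeline but drops multi-line messages.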

Oracle instance crashing when enabling use_indirect_data_buffers=true

I have a Windows 2003 EE server (32-bit) with 16 GB of RAM hosting a 10.2.0.2 Oracle server, which is used to support a commercial software package (ArcSight). I'm trying to get the Oracle backend to leverage the available system memory. I've read 50-60 different articles and posts regarding AWE and Oracle. I have successfully tuned the userva parameter in order to get the server to boot stably with the /3gb boot parameter. I've gotten to the point where the Oracle instance will start up, but within about 30-60 seconds the instance will crash. Below is the information I believe is relevant:
*.......From computer Registry.........*
AWE_MEMORY_WINDOW = 1288486912
ORA_WORKINGSETMIN = 2
*...........From init.ora.............*
*.__dg_broker_service_names=';'
arcsight.__java_pool_size=0
arcsight.__large_pool_size=0
arcsight.__shared_pool_size=314572800
arcsight.__streams_pool_size=0
*.audit_file_dest='E:\oracle10g\OraHome10g\admin\arcsight\adump'
*.audit_sys_operations=true
*.audit_trail='db'
*.background_dump_dest='E:\oracle10g\OraHome10g\admin\arcsight\bdump'
*.compatible='10.2.0.1.0'
*.control_files='E:\oracle10g\OraHome10g\oradata\arcsight\control01.ctl','f:\arcsight\control02.ctl','g:\arcsight\control03.ctl'
*.core_dump_dest='E:\oracle10g\OraHome10g\admin\arcsight\cdump'
*.cursor_sharing='FORCE'
*.db_block_size=16384
*.db_block_buffers=235929
*.db_domain=''
*.db_file_multiblock_read_count=16
*.db_files=2000
*.db_name='arcsight'
*.db_writer_processes=4
*.dispatchers=''
*.job_queue_processes=10
*.log_archive_dest_1='LOCATION=H:'
*.log_buffer=1048576
*.open_cursors=2000
*.parallel_max_servers=0
*.pga_aggregate_target=314572800
*.processes=300
*.recyclebin='OFF'
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=0
*.undo_management='AUTO'
*.undo_retention=43200
*.undo_tablespace='ARC_UNDO'
*.user_dump_dest='E:\oracle10g\OraHome10g\admin\arcsight\udump'
*.java_pool_size=0
*.large_pool_size=0
*.shared_pool_size=314572800
*.streams_pool_size=0
*.use_indirect_data_buffers=true
*......From oradim.log.......*
Sun Feb 22 18:37:33 2009
E:\oracle10g\OraHome10g\bin\oradim.exe -shutdown -sid arcsight -usrpwd * -shutmode immediate -log oradim.log
Sun Feb 22 18:37:34 2009
ORA-01012: not logged on
Sun Feb 22 18:37:45 2009
E:\oracle10g\OraHome10g\bin\oradim.exe -startup -sid arcsight -usrpwd * -log oradim.log -nocheck 0
Sun Feb 22 18:37:51 2009
ORA-03113: end-of-file on communication channel
*.......From alert_arcsight.log.........*
Dump file e:\oracle10g\orahome10g\admin\arcsight\bdump\alert_arcsight.log
Sun Feb 22 23:20:51 2009
ORACLE V10.2.0.2.0 - Production vsnsta=0
vsnsql=14 vsnxtr=3
Windows Server 2003 Version V5.2 Service Pack 2
CPU : 8 - type 586, 4 Physical Cores
Process Affinity : 0x00000000
Memory (Avail/Total): Ph:14554M/16215M, Ph+PgF:14862M/15967M, VA:1926M/2047M
Sun Feb 22 23:20:51 2009
Starting ORACLE instance (normal)
Sun Feb 22 23:20:52 2009
Window memory size 1288503296
Sun Feb 22 23:20:52 2009
Minimum working set window size : 4096
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 2
Autotune of undo retention is turned on.
IMODE=BR
ILAT =36
LICENSE_MAX_USERS = 0
SYS auditing is enabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.2.0.
System parameters with non-default values:
processes = 300
use_indirect_data_buffers= TRUE
__shared_pool_size = 318767104
shared_pool_size = 318767104
__large_pool_size = 0
large_pool_size = 0
__java_pool_size = 0
java_pool_size = 0
__streams_pool_size = 0
streams_pool_size = 0
sga_target = 0
control_files = E:\ORACLE10G\ORAHOME10G\ORADATA\ARCSIGHT\CONTROL01.CTL, F:\ARCSIGHT\CONTROL02.CTL, G:\ARCSIGHT\CONTROL03.CTL
db_block_buffers = 235932
db_block_size = 16384
db_writer_processes = 4
compatible = 10.2.0.1.0
log_archive_dest_1 = LOCATION=H:
log_buffer = 2097152
db_files = 2000
db_file_multiblock_read_count= 16
undo_management = AUTO
undo_tablespace = ARC_UNDO
undo_retention = 43200
recyclebin = OFF
remote_login_passwordfile= EXCLUSIVE
audit_sys_operations = TRUE
db_domain =
__dg_broker_service_names= ;
dispatchers =
job_queue_processes = 10
cursor_sharing = FORCE
parallel_max_servers = 0
audit_file_dest = E:\ORACLE10G\ORAHOME10G\ADMIN\ARCSIGHT\ADUMP
background_dump_dest = E:\ORACLE10G\ORAHOME10G\ADMIN\ARCSIGHT\BDUMP
user_dump_dest = E:\ORACLE10G\ORAHOME10G\ADMIN\ARCSIGHT\UDUMP
core_dump_dest = E:\ORACLE10G\ORAHOME10G\ADMIN\ARCSIGHT\CDUMP
audit_trail = DB
db_name = arcsight
open_cursors = 2000
pga_aggregate_target = 314572800
PMON started with pid=2, OS id=6676
PSP0 started with pid=6, OS id=7544
MMAN started with pid=10, OS id=7560
DBW0 started with pid=14, OS id=6500
DBW1 started with pid=18, OS id=6800
DBW2 started with pid=22, OS id=6276
DBW3 started with pid=26, OS id=520
LGWR started with pid=30, OS id=6756
CKPT started with pid=34, OS id=6380
SMON started with pid=38, OS id=7472
RECO started with pid=42, OS id=7696
CJQ0 started with pid=46, OS id=7912
MMON started with pid=50, OS id=7576
MMNL started with pid=54, OS id=6852
Sun Feb 22 23:20:53 2009
alter database mount exclusive
Sun Feb 22 23:20:57 2009
Setting recovery target incarnation to 1
Sun Feb 22 23:20:57 2009
Successful mount of redo thread 1, with mount id 1799551061
Sun Feb 22 23:20:57 2009
Database mounted in Exclusive Mode
Completed: alter database mount exclusive
Sun Feb 22 23:20:57 2009
alter database open
Sun Feb 22 23:20:58 2009
Beginning crash recovery of 1 threads
parallel recovery setup failed: using serial mode
Sun Feb 22 23:20:58 2009
Started redo scan
Sun Feb 22 23:20:58 2009
Completed redo scan
0 redo blocks read, 0 data blocks need recovery
Sun Feb 22 23:20:58 2009
Started redo application at
Thread 1: logseq 1137, block 3, scn 1707289029
Sun Feb 22 23:20:58 2009
Recovery of Online Redo Log: Thread 1 Group 5 Seq 1137 Reading mem 0
Mem# 0: I:\ARCSIGHT\REDO\REDO5.LOG
Mem# 1: I:\ARCSIGHT\REDO\REDO05B.LOG
Sun Feb 22 23:20:58 2009
Completed redo application
Sun Feb 22 23:20:58 2009
Completed crash recovery at
Thread 1: logseq 1137, block 3, scn 1707309030
0 data blocks read, 0 data blocks written, 0 redo blocks read
Sun Feb 22 23:20:59 2009
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=62, OS id=6972
Sun Feb 22 23:20:59 2009
ARC0: Archival started
ARC1 started with pid=66, OS id=6640
Sun Feb 22 23:20:59 2009
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Thread 1 advanced to log sequence 1138
Thread 1 opened at log sequence 1138
Current log# 4 seq# 1138 mem# 0: G:\ARCSIGHT\REDO\REDO4.LOG
Current log# 4 seq# 1138 mem# 1: G:\ARCSIGHT\REDO\REDO04B.LOG
Successful open of redo thread 1
Sun Feb 22 23:21:00 2009
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Feb 22 23:21:00 2009
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Sun Feb 22 23:21:00 2009
ARC1: Becoming the heartbeat ARCH
Sun Feb 22 23:21:00 2009
SMON: enabling cache recovery
Sun Feb 22 23:21:02 2009
Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_pmon_6676.trc:
ORA-27103: internal error
OSD-00028: additional error information
Sun Feb 22 23:21:02 2009
PMON: terminating instance due to error 27103
Sun Feb 22 23:21:02 2009
Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_reco_7696.trc:
ORA-27103: internal error
Sun Feb 22 23:21:02 2009
Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_smon_7472.trc:
ORA-27103: internal error
Sun Feb 22 23:21:02 2009
Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_ckpt_6380.trc:
ORA-27103: internal error
Sun Feb 22 23:21:02 2009
Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_lgwr_6756.trc:
ORA-27103: internal error
Sun Feb 22 23:21:03 2009
Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_dbw3_520.trc:
ORA-27103: internal error
Sun Feb 22 23:21:03 2009
Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_dbw2_6276.trc:
ORA-27103: internal error
Sun Feb 22 23:21:03 2009
Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_dbw1_6800.trc:
ORA-27103: internal error
Sun Feb 22 23:21:03 2009
Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_dbw0_6500.trc:
ORA-27103: internal error
Sun Feb 22 23:21:03 2009
Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_mman_7560.trc:
ORA-27103: internal error
Sun Feb 22 23:21:04 2009
Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_psp0_7544.trc:
ORA-27103: internal error
Instance terminated by PMON, pid = 6676
I appreciate any input on what to look at to further isolate this issue. I ran into many other issues along the way (setting AWE_WINDOW_MEMORY to a proper size, setting db_block_buffers to a proper value, etc.) that various forum searches helped resolve, but I've not been able to find anything related to the errors I'm getting now. If I set use_indirect_data_buffers=false and tune back the db_block_buffers, the instance starts without any problems. It's only when I try to enable the use of AWE that I have a problem.
Nick
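
For anyone checking the sizing in the post above, the arithmetic can be sketched as follows. The numbers are copied from the alert log excerpt. Treating the buffer cache as db_block_buffers times db_block_size is an assumption made for illustration, as are the constant and function names. This is not a statement of how Oracle sizes things internally.

```python
# Values copied from the alert log in the post
DB_BLOCK_SIZE = 16384            # bytes per block
DB_BLOCK_BUFFERS = 235932        # number of buffers requested
AWE_WINDOW_BYTES = 1288503296    # "Window memory size" line

MIB = 1024 * 1024

def buffer_cache_bytes(block_size, block_buffers):
    """Total bytes requested for the buffer cache (assumed: size * count)."""
    return block_size * block_buffers

cache_bytes = buffer_cache_bytes(DB_BLOCK_SIZE, DB_BLOCK_BUFFERS)
window_blocks = AWE_WINDOW_BYTES // DB_BLOCK_SIZE

print(f"buffer cache: {cache_bytes / MIB:.1f} MiB")
print(f"AWE window  : {AWE_WINDOW_BYTES / MIB:.1f} MiB")
print(f"blocks that fit in the window: {window_blocks}")
print(f"share of buffers mappable at once: {window_blocks / DB_BLOCK_BUFFERS:.2%}")
```

With these figures the cache is far larger than the window, so only a fraction of the buffers could be mapped into the process address space at any one time.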

Just wanted to close out this thread in case anyone else runs into a similar problem. It turns out we ran into a bug documented in the article linked below (we're using AMD processors). Essentially, we needed to disable NUMA.
http://blog.csdn.net/orapeasant/archive/2007/06/05/1639532.aspx
excerpt ....
But please be aware of Bug 4494543 - affecting 10g and fixed in Oracle 11.0 ......
ORA-7445: CORE DUMP [ACCESS_VIOLATION] WITH USE_INDIRECT_DATA_BUFFERS=TRUE
Rediscovery Information:
1) Using 32-Bit Oracle on a 32-Bit Windows 2003 server running on an AMD Opteron 64-Bit chip.
2) You have set use_indirect_data_buffers=true in init.ora
Workaround: Basically disable NUMA feature on 32-Bit platform :-
1) Set ENABLENUMA = FALSE in Windows registry for the Oracle Home.
2) Set enableNUMA_optimizations = FALSE (init.ora)
Thanks for the help. We'll see if access to the extra memory will be useful or not .....
Nick
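
To see quickly whether a parameter file matches the rediscovery conditions quoted above, something like the following could be used. It is a rough sketch, not an Oracle tool: the function names are hypothetical, the parser only understands simple "sid.name=value" lines like the ones in the init.ora listing earlier in this thread, and it only checks the use_indirect_data_buffers setting.

```python
def parse_pfile(text):
    """Parse 'sid.name=value' lines into {name: value}; the last value wins."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop the SID prefix ("*." or "arcsight.") in front of the name
        name = key.strip().split(".", 1)[-1].lower()
        params[name] = value.strip().strip("'")
    return params

def uses_indirect_buffers(params):
    """True when use_indirect_data_buffers is set to true."""
    return params.get("use_indirect_data_buffers", "false").lower() == "true"

# Lines taken from the init.ora listing in the original post
SAMPLE = """\
*.db_block_size=16384
*.db_block_buffers=235929
*.sga_target=0
*.use_indirect_data_buffers=true
"""

params = parse_pfile(SAMPLE)
if uses_indirect_buffers(params):
    print("use_indirect_data_buffers is enabled; check the NUMA workaround above")
```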

After upgrade to 7.0, crash when left idle for more than a few minutes & if click contact without checking the box first.

I had no serious problems with Firefox until downloading the current version. Since then I've had frequent crashes, usually several times a day, and even with Ctrl + Alt + Delete I have difficulty closing Firefox down. The crashes happen more frequently when I leave the program idle for, say, 15 minutes or more, or when I attempt to open a contact for editing by clicking directly on the contact name without checking the box first.
My anti-virus software is ESET-NOD.

I get this on firefox. I get this on waterfox. I get this on nightly. I don't get it on chrome. I don't get it on internet explorer. I don't get it on comodo dragon. I don't get it on arora. I don't get it on konqueror...... Anyone noticing a pattern here?
I'm willing to admit that it's probably something to do with my isp server, so Mozilla can feel holier than thou about it, but if that's the way they insist on going, I suggest that the next time they upgrade, they call their browser Canute.

iTunes crashes when selecting songs for importing from CD - any clues?

I'm running XP Pro on a Dell Latitude 620 with the latest version of iTunes.
When I put a CD in, iTunes shows me the tracks. But as I am selecting tracks to play (double-clicking on them), iTunes randomly crashes. It is not immediately repeatable, but it always happens within 5 minutes or so of selecting songs.
Any clues out there?

I had removed a couple of those artists and later replaced them with backups from a separate hard drive and it didn't help. I was going to ditch and replace all of them today to see if that made a difference. However, before I did, I thought I'd check a couple of things as far as renaming the artists was concerned. So I added a number to one problem artist's name, then tried opening it in the 'Artist' list. It worked fine. I went back, removed the number, and tried opening it again. Worked fine again. So I checked the other problem artists, without altering the names. They all open just fine. Problem solved! Well, not solved but mysteriously gone away, at least for now.
I think you're right about a faulty link being the problem, and I suppose something I did reestablished that link. Don't know what it was though, so I'm afraid I can't offer a solution to anyone else with a similar problem.

Windows 8.1 Pro Crashes When Enabling Hyper-V

So I bought a new HP Envy Desktop specifically for Windows Phone development and, ironically, I can't get Hyper-V running. After enabling Hyper-V in Windows Programs/Features, the machine just hangs on startup, and eventually Windows 8.1 Pro x64 crashes after multiple attempts. It's an Intel i5 with 12GB of RAM and virtualization is enabled in the BIOS. I've read through multiple threads over the last several days and tried the following:
- Updating all local drivers and Windows updates
- Disabling Bluetooth
- Uninstalling Avira AV
- Someone also suggested disabling USB 3.0 support but I don't see an option to do this in my BIOS.
I've wasted a lot of time on this and I would really appreciate any help.
Thanks

Hi ericvanburen,
I found a similar issue with the HP Envy Desktop in the HP forum; it might be due to the Bluetooth driver.
I have quoted the general solution here:
1. Enter into BIOS setup and set Virtualization as disabled and reboot system.
2. After booting windows 8, enter into Control panel and remove bluetooth driver.
3. Download and install Ralink Bluetooth driver version 9.2.101.10 (SP59632)
4. Reboot system and enable Virtualization Technology again.
For details please refer to following link:
http://h30434.www3.hp.com/t5/Notebook-Operating-Systems-and-Software/HP-ENVY-M6-1106er-Windows-8-Pro-hangs-up-to-start-after/m-p/2386453#M126362
Best Regards
Elton Ji

My App Store keeps crashing when I look for an app! How can I change this?

When I load an app, my App Store crashes! :/

Hello, I'm sorry to hear that you are experiencing what most of us are experiencing.
This is not something any of the users can fix, so we just have to wait for Apple to bring a new update.
Here is one thing you can try to clear it up though. Try rebooting your device by holding the power button down until it prompts you to shutdown, and restart it. If that doesn't work, try holding the power button and the home button down until the device is forced into a shutdown, and restart it.
I hope that helps a little at least.
~Lt. Leviathan

Crash when adding artwork for Films/Movies

I have converted some DVDs to .mp4 format for use on my iPad. The files play fine in Quicktime and iTunes, but when I "Get Info" and add artwork for the file - after clicking ok, the spinning beachball appears and iTunes eventually crashes.
The files themselves are about 3GB each. I have tried opening iTunes in safe mode, and it does add the artwork, but as soon as I deselect the video, the artwork clears again.
Does anyone have a solution, or some tips?
I am using Mac OS X 10.7.3 and iTunes 10.6 (40).
Thanks!

This is also driving me nuts... Actually, I have not even figured out how to enter a new appointment in month view. It used to be double-click, enter name, hit <tab> and fill out the rest in the panel, but this routine does not work any more.
