Node crashes when enabling RDS for private interconnect.

OS: oel6.3 - 2.6.39-300.17.2.el6uek.x86_64
Grid and DB: 11.2.0.3.4
This is a two-node Standard Edition cluster.
The node crashes when clusterware is restarted after following the instructions in note 751343.1 (RAC Support for RDS Over Infiniband) to enable RDS.
The cluster runs fine using IPoIB for the cluster_interconnect.
These are the relink steps I followed from the note:
1) As the ORACLE_HOME/GI_HOME owner, stop all resources (database, listener, ASM, etc.) that are running from the home. When stopping the database, use the NORMAL or IMMEDIATE option.
2) As root, if relinking 11gR2 Grid Infrastructure (GI) home, unlock GI home: GI_HOME/crs/install/rootcrs.pl -unlock
3) As the ORACLE_HOME/GI_HOME owner, go to ORACLE_HOME/GI_HOME and cd to rdbms/lib
4) As the ORACLE_HOME/GI_HOME owner, issue "make -f ins_rdbms.mk ipc_rds ioracle"
5) As root, if relinking 11gR2 Grid Infrastructure (GI) home, lock GI home: GI_HOME/crs/install/rootcrs.pl -patch
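For anyone scripting the relink, here is a minimal Python sketch of the command order in steps 2 to 5 above. It only builds the list of commands and does not run anything. The function name `rds_relink_plan` and the example home path are hypothetical; the commands themselves are the ones quoted from the note. Step 1 (stopping resources) is left out because the note gives no specific command for it.

```python
def rds_relink_plan(home, is_grid_home):
    """Return ordered (user, command) pairs for relinking a home with RDS.

    Mirrors steps 2-5 quoted above. Step 1 (stopping all resources that run
    from the home) must be done first and is not represented here.
    """
    steps = []
    if is_grid_home:
        # Step 2: only needed for an 11gR2 Grid Infrastructure home.
        steps.append(("root", f"{home}/crs/install/rootcrs.pl -unlock"))
    # Steps 3 and 4: run as the owner of the home.
    steps.append(("owner", f"cd {home}/rdbms/lib"))
    steps.append(("owner", "make -f ins_rdbms.mk ipc_rds ioracle"))
    if is_grid_home:
        # Step 5: lock the GI home again.
        steps.append(("root", f"{home}/crs/install/rootcrs.pl -patch"))
    return steps


if __name__ == "__main__":
    # Hypothetical GI home path, for illustration only.
    for user, cmd in rds_relink_plan("/u01/app/11.2.0/grid", is_grid_home=True):
        print(f"[{user}] {cmd}")
```

Printing the plan first makes it easy to check which steps need root before running them by hand on each node.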
The node appears to abend when ASM tries to start, with the message below on the console.
I have a service request open for this issue, but I am hoping someone has seen this and knows a way around it.
Thanks
Alan
kernel BUG at net/rds/ib_send.c:547!
invalid opcode: 0000 [#1] SMP
CPU 2
Modules linked in: 8021q garp stp llc iptable_filter ip_tables nfs lockd
fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand powernow_k8
freq_table mperf rds_rdma rds_tcp rds ib_ipoib rdma_ucm ib_ucm ib_uverbs
ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa sr_mod cdrom microcode
serio_raw pcspkr ghes hed k10temp hwmon amd64_edac_mod edac_core
edac_mce_amd i2c_piix4 i2c_core sg igb dca mlx4_ib ib_mad ib_core
mlx4_en mlx4_core ext4 mbcache jbd2 usb_storage sd_mod crc_t10dif ahci
libahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
scsi_wait_scan]
Pid: 4140, comm: kworker/u:1 Not tainted 2.6.39-300.17.2.el6uek.x86_64
#1 Supermicro BHDGT/BHDGT
RIP: 0010:[<ffffffffa02db829>] [<ffffffffa02db829>]
rds_ib_xmit+0xa69/0xaf0 [rds_rdma]
RSP: 0018:ffff880fb84a3c50 EFLAGS: 00010202
RAX: ffff880fbb694000 RBX: ffff880fb3e4e600 RCX: 0000000000000000
RDX: 0000000000000030 RSI: ffff880fbb6c3a00 RDI: ffff880fb058a048
RBP: ffff880fb84a3d30 R08: 0000000000000fd0 R09: ffff880fbb6c3b90
R10: 0000000000000000 R11: 000000000000001a R12: ffff880fbb6c3a00
R13: ffff880fbb6c3a00 R14: 0000000000000000 R15: ffff880fb84a3d90
FS: 00007fd0a3a56700(0000) GS:ffff88101e240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002158ca2 CR3: 0000000001783000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:1 (pid: 4140, threadinfo ffff880fb84a2000, task
ffff880fae970180)
Stack:
0000000000012200 0000000000012200 ffff880f00000000 0000000000000000
000000000000e5b0 ffffffff8115af81 ffffffff81b8d6c0 ffffffffa02b2e12
00000001bf272240 ffffffff81267020 ffff880fbb6c3a00 0000003000000002
Call Trace:
[<ffffffff8115af81>] ? __kmalloc+0x1f1/0x200
[<ffffffffa02b2e12>] ? rds_message_alloc+0x22/0x90 [rds]
[<ffffffff81267020>] ? sg_init_table+0x30/0x50
[<ffffffffa02b2db2>] ? rds_message_alloc_sgs+0x62/0xa0 [rds]
[<ffffffffa02b31e4>] ? rds_message_map_pages+0xa4/0x110 [rds]
[<ffffffffa02b4f3b>] rds_send_xmit+0x38b/0x6e0 [rds]
[<ffffffff81089d53>] ? cwq_activate_first_delayed+0x53/0x100
[<ffffffffa02b6040>] ? rds_recv_worker+0xc0/0xc0 [rds]
[<ffffffffa02b6075>] rds_send_worker+0x35/0xc0 [rds]
[<ffffffff81089fd6>] process_one_work+0x136/0x450
[<ffffffff8108bbe0>] worker_thread+0x170/0x3c0
[<ffffffff8108ba70>] ? manage_workers+0x120/0x120
[<ffffffff810907e6>] kthread+0x96/0xa0
[<ffffffff81515544>] kernel_thread_helper+0x4/0x10
[<ffffffff81090750>] ? kthread_worker_fn+0x1a0/0x1a0
[<ffffffff81515540>] ? gs_change+0x13/0x13
Code: ff ff e9 b1 fe ff ff 48 8b 0d b4 54 4b e1 48 89 8d 70 ff ff ff e9
71 ff ff ff 83 bd 7c ff ff ff 00 0f 84 f4 f5 ff ff 0f 0b eb fe <0f> 0b
eb fe 44 8b 8d 48 ff ff ff 41 b7 01 e9 51 f6 ff ff 0f 0b
RIP [<ffffffffa02db829>] rds_ib_xmit+0xa69/0xaf0 [rds_rdma]
RSP <ffff880fb84a3c50>
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.39-300.17.2.el6uek.x86_64
([email protected]) (gcc version 4.4.6 20110731 (Red
Hat 4.4.6-3) (GCC) ) #1 SMP Wed Nov 7 17:48:36 PST 2012
Command line: ro root=UUID=5ad1a268-b813-40da-bb76-d04895215677
rd_DM_UUID=ddf1_stor rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD
SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us numa=off
console=ttyS1,115200n8 irqpoll maxcpus=1 nr_cpus=1 reset_devices
cgroup_disable=memory mce=off memmap=exactmap memmap=538K@64K
memmap=130508K@770048K elfcorehdr=900556K memmap=72K#3668608K
memmap=184K#3668680K
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 0000000000096800 (usable)
BIOS-e820: 0000000000096800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dfe90000 (usable)
BIOS-e820: 00000000dfe9e000 - 00000000dfea0000 (reserved)
BIOS-e820: 00000000dfea0000 - 00000000dfeb2000 (ACPI data)
BIOS-e820: 00000000dfeb2000 - 00000000dfee0000 (ACPI NVS)
BIOS-e820: 00000000dfee0000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
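When attaching a trace like the one above to a service request, it can help to pull out the key facts first: the BUG location, the faulting function and module, and the firm call-trace frames (those without the "?" marker). The following is a rough Python sketch of that, assuming the console text has been saved to a string. The function name `summarize_oops` and the shortened sample are my own; the sample lines are copied from the trace above.

```python
import re

BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
RIP_RE = re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]")
FRAME_RE = re.compile(r"^\[<[0-9a-f]+>\] (\? )?(\w+)\+0x")


def summarize_oops(text):
    """Extract BUG location, faulting symbol and firm call-trace frames."""
    summary = {"bug_file": None, "bug_line": None,
               "rip_function": None, "rip_module": None, "frames": []}
    bug = BUG_RE.search(text)
    if bug:
        summary["bug_file"] = bug.group(1)
        summary["bug_line"] = int(bug.group(2))
    rip = RIP_RE.search(text)
    if rip:
        summary["rip_function"], summary["rip_module"] = rip.groups()
    for line in text.splitlines():
        frame = FRAME_RE.match(line.strip())
        # Frames marked "?" are unreliable stack leftovers, so skip them.
        if frame and not frame.group(1):
            summary["frames"].append(frame.group(2))
    return summary


SAMPLE = """\
kernel BUG at net/rds/ib_send.c:547!
Call Trace:
[<ffffffff8115af81>] ? __kmalloc+0x1f1/0x200
[<ffffffffa02b4f3b>] rds_send_xmit+0x38b/0x6e0 [rds]
[<ffffffffa02b6075>] rds_send_worker+0x35/0xc0 [rds]
[<ffffffff81089fd6>] process_one_work+0x136/0x450
RIP [<ffffffffa02db829>] rds_ib_xmit+0xa69/0xaf0 [rds_rdma]
"""

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

On the trace in this post, that reduces the crash to a BUG at net/rds/ib_send.c:547 in rds_ib_xmit (module rds_rdma), reached from rds_send_worker via rds_send_xmit.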

I believe the OFED version is 1.5.3.3, but I am not sure this is correct.
We have not added any third-party drivers. All that has been done to add Infiniband to our build is
a yum groupinstall "Infiniband Support".
I have not tried rds-stress, but rds-ping works fine and rds-info seems fine.
A service request has been opened, but so far I have had a better response here.
oracle@blade1-6:~> rds-info
RDS IB Connections:
LocalAddr RemoteAddr LocalDev RemoteDev
10.10.0.116 10.10.0.119 fe80::25:90ff:ff07:df1d fe80::25:90ff:ff07:e0e5
TCP Connections:
LocalAddr LPort RemoteAddr RPort HdrRemain DataRemain SentNxt ExpectUna SeenUna
Counters:
CounterName Value
conn_reset 5
recv_drop_bad_checksum 0
recv_drop_old_seq 0
recv_drop_no_sock 1
recv_drop_dead_sock 0
recv_deliver_raced 0
recv_delivered 18
recv_queued 18
recv_immediate_retry 0
recv_delayed_retry 0
recv_ack_required 4
recv_rdma_bytes 0
recv_ping 14
send_queue_empty 18
send_queue_full 0
send_lock_contention 0
send_lock_queue_raced 0
send_immediate_retry 0
send_delayed_retry 0
send_drop_acked 0
send_ack_required 3
send_queued 32
send_rdma 0
send_rdma_bytes 0
send_pong 14
page_remainder_hit 0
page_remainder_miss 0
copy_to_user 0
copy_from_user 0
cong_update_queued 0
cong_update_received 1
cong_send_error 0
cong_send_blocked 0
ib_connect_raced 4
ib_listen_closed_stale 0
ib_tx_cq_call 6
ib_tx_cq_event 6
ib_tx_ring_full 0
ib_tx_throttle 0
ib_tx_sg_mapping_failure 0
ib_tx_stalled 16
ib_tx_credit_updates 0
ib_rx_cq_call 33
ib_rx_cq_event 38
ib_rx_ring_empty 0
ib_rx_refill_from_cq 0
ib_rx_refill_from_thread 0
ib_rx_alloc_limit 0
ib_rx_credit_updates 0
ib_ack_sent 4
ib_ack_send_failure 0
ib_ack_send_delayed 0
ib_ack_send_piggybacked 0
ib_ack_received 3
ib_rdma_mr_alloc 0
ib_rdma_mr_free 0
ib_rdma_mr_used 0
ib_rdma_mr_pool_flush 8
ib_rdma_mr_pool_wait 0
ib_rdma_mr_pool_depleted 0
ib_atomic_cswp 0
ib_atomic_fadd 0
iw_connect_raced 0
iw_listen_closed_stale 0
iw_tx_cq_call 0
iw_tx_cq_event 0
iw_tx_ring_full 0
iw_tx_throttle 0
iw_tx_sg_mapping_failure 0
iw_tx_stalled 0
iw_tx_credit_updates 0
iw_rx_cq_call 0
iw_rx_cq_event 0
iw_rx_ring_empty 0
iw_rx_refill_from_cq 0
iw_rx_refill_from_thread 0
iw_rx_alloc_limit 0
iw_rx_credit_updates 0
iw_ack_sent 0
iw_ack_send_failure 0
iw_ack_send_delayed 0
iw_ack_send_piggybacked 0
iw_ack_received 0
iw_rdma_mr_alloc 0
iw_rdma_mr_free 0
iw_rdma_mr_used 0
iw_rdma_mr_pool_flush 0
iw_rdma_mr_pool_wait 0
iw_rdma_mr_pool_depleted 0
tcp_data_ready_calls 0
tcp_write_space_calls 0
tcp_sndbuf_full 0
tcp_connect_raced 0
tcp_listen_closed_stale 0
RDS Sockets:
BoundAddr BPort ConnAddr CPort SndBuf RcvBuf Inode
0.0.0.0 0 0.0.0.0 0 131072 131072 340441
RDS Connections:
LocalAddr RemoteAddr NextTX NextRX Flg
10.10.0.116 10.10.0.119 33 38 --C
Receive Message Queue:
LocalAddr LPort RemoteAddr RPort Seq Bytes
Send Message Queue:
LocalAddr LPort RemoteAddr RPort Seq Bytes
Retransmit Message Queue:
LocalAddr LPort RemoteAddr RPort Seq Bytes
10.10.0.116 0 10.10.0.119 40549 32 0
oracle@blade1-6:~> cat /etc/rdma/rdma.conf
# Load IPoIB
IPOIB_LOAD=yes
# Load SRP module
SRP_LOAD=no
# Load iSER module
ISER_LOAD=no
# Load RDS network protocol
RDS_LOAD=yes
# Should we modify the system mtrr registers? We may need to do this if you
# get messages from the ib_ipath driver saying that it couldn't enable
# write combining for the PIO buffs on the card.
# Note: recent kernels should do this for us, but in case they don't, we'll
# leave this option
FIXUP_MTRR_REGS=no
# Should we enable the NFSoRDMA service?
NFSoRDMA_LOAD=yes
NFSoRDMA_PORT=2050
oracle@blade1-6:~> /etc/init.d/rdma status
Low level hardware support loaded:
     mlx4_ib
Upper layer protocol modules:
     rds_rdma ib_ipoib
User space access modules:
     rdma_ucm ib_ucm ib_uverbs ib_umad
Connection management modules:
     rdma_cm ib_cm iw_cm
Configured IPoIB interfaces: none
Currently active IPoIB interfaces: ib0
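The rds-info counter list above is long and mostly zeros. As a small aid for comparing runs (for example before and after an rds-ping or rds-stress test), here is a Python sketch that reads the "Counters:" section from saved rds-info output and keeps only the nonzero entries. The function names are hypothetical, and the parser assumes the plain "name value" layout shown in this post.

```python
def parse_rds_counters(text):
    """Return {counter_name: value} from the 'Counters:' section of rds-info output."""
    counters = {}
    in_counters = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Counters:":
            in_counters = True
            continue
        if not in_counters:
            continue
        parts = stripped.split()
        if parts == ["CounterName", "Value"]:
            continue  # column header row
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
        else:
            break  # reached the next section, e.g. "RDS Sockets:"
    return counters


def nonzero_counters(text):
    """Keep only the counters whose value is not zero."""
    return {name: value for name, value in parse_rds_counters(text).items() if value}


# Shortened sample copied from the output above.
SAMPLE = """\
Counters:
CounterName Value
conn_reset 5
recv_drop_bad_checksum 0
ib_tx_stalled 16
ib_rdma_mr_pool_flush 8
RDS Sockets:
BoundAddr BPort ConnAddr CPort SndBuf RcvBuf Inode
"""

if __name__ == "__main__":
    for name, value in nonzero_counters(SAMPLE).items():
        print(f"{name:30s} {value}")
```

This only filters the output; it says nothing about which counter values are normal for a given OFED or kernel version.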

Similar Messages

  • Word Crashing When Discarding Checkout for Document Stored in SharePoint

    We have recently noticed an issue when we have a Word file checked out from SharePoint, if we then discard the checkout from Word, Word crashes. The error details point to an issue in wwlib.dll.
    Having done some further investigation into this it appears this only happens if the document has an attached template which contains a custom ribbon.  We use SharePoint 2013 and Word 2010, although I have tested using Word 2013 with the same results.
    This was noticed on our company templates which contain a custom ribbon tab and a number of custom buttons. I have since tested it by creating a template with a single button on a custom tab with the same results.
    Has anyone else come across this issue, and is there any way to resolve it?
    Thanks,
    Richard

    Hi Daniel,
    Thanks for the response.
    The issue only happens when discarding checkout. Editing and saving a document back to SharePoint does not cause an issue. I am using a dotm template. I am not storing the template
    as a content type. When I say I created a custom tab, this was created by adding the XML to create the ribbon tab and button. I used the 'Custom UI Editor for Microsoft Office' to add this. The file created for this is called customUI14.xml and the
    XML is as follows (in the basic test template that I created):
    <customUI xmlns="http://schemas.microsoft.com/office/2009/07/customui">
    <ribbon startFromScratch="false">
    <tabs>
    <tab id="customTab" label="Custom Tab">
    <group id="customGroup" label="Custom Group">
    <button id="customButton" label="Test Ribbon Button" imageMso="HappyFace" size="large" onAction="TestRibbonCrash" />
    </group>
    </tab>
    </tabs>
    </ribbon>
    </customUI>
    The code this points to is as follows:
    Sub TestRibbonCrash(ByRef Ctrl As IRibbonControl)
        MsgBox "Ribbon button working"
    End Sub
    Following your reply I have tested it by adding a custom tab and button through the GUI and the problem doesn't occur when using that method of creating the tab and button.
    I will see if I can find anything useful in the ULS logs although no correlation ID is displayed when the error happens and in my experience even with a correlation ID, 99% of the time the ULS logs don't provide any useful information.
    I've also observed that Word doesn't crash if I discard checkout when I have another document open at the same time which is based on the same template.
    I have already posted on a MS Word forum. Is there more chance of getting some help on one of the other Word forums you have suggested?
    http://answers.microsoft.com/en-us/office/forum/office_2010-word/word-crashing-when-discarding-checkout-for/95565f9e-411b-4f11-b14d-d3771e2e2ba4
    Thanks,
    Richard

  • Is anybody else experiencing crashes when installing iTunes for Windows?

    Is anybody else experiencing crashes when installing iTunes for Windows?

    For general advice see Troubleshooting issues with iTunes for Windows updates.
    The steps in the second box (similar to those given above) are a guide to removing everything related to iTunes and then rebuilding it which is often a good starting point unless the symptoms indicate a more specific approach. Review the other boxes and the list of support documents further down the page in case one of them applies.
    Your library should be unaffected by these steps but there is backup and recovery advice elsewhere in the user tip.
    tt2

  • Flash CS5.5 crashes when exporting movieclip for Actionscript

    Hello,
    My Flash CS5.5 crashes when exporting movieclip for Actionscript (select movieclips -> modify -> convert to symbol -> check export for actionscript). I think this only happens when Photoshop CS5 is open.
    Does anyone know how to fix this? I'm using Win7 64-bit. The only open programs are Flash, Photoshop, and Firefox.

    Oddly enough, I opened up Flash again today, just to see if it was still the same, and it worked. Everything seemed normal; I opened a new document and started working. Then, out of nowhere, it stopped responding again. I waited for it to respond, but it didn't. I closed it from the task manager, and now it's back to the original problem of crashing when I try to create a new document or open an old one.
    Update: I tried the cleaner, as you'd suggested. Followed all the steps, but to no avail. Think it may have something to do with my machine, and less with the software?

  • NICs for Private Interconnect redundancy

    DB/Grid version : 11.2.0.2
    Platform : AIX 6.1
    We are going to install a 2-node RAC on AIX (that thing which is almost as good as Solaris).
    Our primary private interconnect is
    ### Primary Private Interconnect
    169.21.204.1      scnuprd186-privt1.mvtrs.net  scnuprd186-privt1
    169.21.204.4      scnuprd187-privt1.mvtrs.net  scnuprd187-privt1
    For cluster interconnect redundancy, the Unix team has attached an extra NIC to each node, with an extra Gigabit Ethernet switch for these NICs.
    ###Redundant Private Interconnect attached to the server
    169.21.204.2      scnuprd186-privt2.mvtrs.net  scnuprd186-privt2  # Node1's newly attached redundant NIC
    169.21.204.5      scnuprd187-privt2.mvtrs.net  scnuprd187-privt2  # Node2's newly attached redundant NIC
    Example borrowed from citizen2's post
    Apparently I have 2 ways to implement cluster interconnect redundancy:
    Option1. NIC bonding at OS level
    Option2. Let grid software do it
    Question1. Which is better: Option 1 or 2?
    Question2.
    Regarding Option2.
    From googling and OTN, I gather that during grid installation you just provide 169.21.204.0 for the cluster interconnect and grid will identify the redundant NIC and switch. And if something goes wrong with the primary interconnect setup (shown above), grid will automatically re-route interconnect traffic using the redundant NIC setup. Is this correct?
    Question 3.
    My colleague tells me that unless I configure some multicasting (AIX specific) for the redundant (Gigabit) switch, I could get errors during installation. He isn't clear about what exactly it was. Has anyone faced a multicasting-related issue with this?

    Hi,
    My recommendation is to use AIX EtherChannel.
    AIX EtherChannel is much more powerful and stable compared with HAIP.
    See how to set up AIX EtherChannel on 10 Gigabit Ethernet interfaces:
    http://levipereira.wordpress.com/2011/01/26/setting-up-ibm-power-systems-10-gigabit-ethernet-ports-and-aix-6-1-etherchannel-for-oracle-rac-private-interconnectivity/
    If you choose to use HAIP, I recommend you read these notes and look up all notes about HAIP bugs on AIX.
    11gR2 Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip [ID 1210883.1]
    ASM Crashes as HAIP Does not Failover When Two or More Private Network Fails [ID 1323995.1]
    About multicasting, read this:
    Grid Infrastructure 11.2.0.2 Installation or Upgrade may fail due to Multicasting Requirement [ID 1212703.1]
    Regards,
    Levi Pereira

  • Calendar app crashes when selecting save for this event only

    When putting my work schedule into the calendar app, I usually put it in as a repeating event until a certain date. If I edit one of those days and hit "Save for this event only" the app will crash. It does save, but the crash is quite annoying.

    I'm having a similar issue, but mine is crashing when I try to delete one occurrence of a repeating event, and when I re-open the calendar the event I'm trying to delete is still there. This only began after the 8.3 update. Eventually I just gave up and deleted it on my Macbook at home, but there's obviously a bug in the update.

  • Layout Editor crashes when loading icon for iconic button (11gR2 patch 1)

    Using Forms 11gR2 Patch 1, 64 bit on Solaris 10, the Layout Editor crashes when reading the .GIF file to display an iconic button. (The workaround is not to set UI_ICON and UI_ICON_EXTENSION in the frmbld.sh script. In that case, the Layout Editor displays an empty iconic button, but at least it does not crash.)
    The truss output from frmbld says,
    5078/1:          open("/usr/local/Oracle/Middleware/Oracle_FRHome1/ohs/icons/bomb.gif", O_RDONLY) = 24
    5078/1:          lseek(24, 0, SEEK_END)                    = 308
    5078/1:          lseek(24, 0, SEEK_CUR)                    = 308
    5078/1:          lseek(24, 0, SEEK_SET)                    = 0
    5078/1:          lseek(24, 161, SEEK_SET)               = 161
    5078/1:          fstat(24, 0xFFFFFD7FFFDF9440)               = 0
    5078/1:          fstat(24, 0xFFFFFD7FFFDF9390)               = 0
    5078/1:          ioctl(24, TCGETA, 0xFFFFFD7FFFDF9400)          Err#25 ENOTTY
    5078/1:          read(24, "048F 0C9 IA7B8 X $C0BBDF".., 8192)     = 147
    5078/1:          Incurred fault #6, FLTBOUNDS %pc = 0xFFFFFD7FFDF3FB42
    5078/1:          siginfo: SIGSEGV SEGV_MAPERR addr=0x0000004C
    5078/1:          Received signal #11, SIGSEGV [caught]
    5078/1:          siginfo: SIGSEGV SEGV_MAPERR addr=0x0000004C
    (These are consecutive lines from the truss output, so it would appear that read()'ing from the bomb.gif file -- or any other .GIF file for that matter -- causes the FLTBOUNDS fault.)
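To back up the observation that the fault follows directly after the read(), a truss log can be scanned for the last system call logged before the first "Incurred fault" line. Below is a small Python sketch under the assumption that each truss line has the "pid/lwp: text" shape shown above; the function name `syscall_before_fault` is hypothetical.

```python
import re

# A logged system call looks like "name(args...) = result".
CALL_RE = re.compile(r"\w+\(")


def syscall_before_fault(truss_text):
    """Return (last_syscall_line, fault_line) for the first fault, or None."""
    last_call = None
    for line in truss_text.splitlines():
        # Drop the "pid/lwp:" prefix; keep everything after the first colon.
        _, _, rest = line.partition(":")
        rest = rest.strip()
        if rest.startswith("Incurred fault"):
            return last_call, rest
        if CALL_RE.match(rest):
            last_call = rest
    return None


# Shortened sample copied from the truss output above.
SAMPLE = """\
5078/1:          fstat(24, 0xFFFFFD7FFFDF9390)               = 0
5078/1:          ioctl(24, TCGETA, 0xFFFFFD7FFFDF9400)          Err#25 ENOTTY
5078/1:          read(24, "048F 0C9 IA7B8 X $C0BBDF".., 8192)     = 147
5078/1:          Incurred fault #6, FLTBOUNDS %pc = 0xFFFFFD7FFDF3FB42
5078/1:          siginfo: SIGSEGV SEGV_MAPERR addr=0x0000004C
"""

if __name__ == "__main__":
    print(syscall_before_fault(SAMPLE))
```

Note that this only shows ordering in the log. The read() itself returned successfully, so the fault may equally be in the code that processes the bytes afterwards.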
    The hs_err_pid*.log file in the Forms home directory says,
    # Problematic frame:
    # C [libuimotif.so.0+0x13fb42] uiimkxu_Support+0x4f42
    (If it matters, Solaris is running in a VMware virtual machine on Windows 7.)
    Did I find a bug?

    Stebalien wrote: Segmentation faults are ALWAYS bugs.
    https://bbs.archlinux.org/viewtopic.php?pid=1381173
    https://bugs.launchpad.net/ubuntu/+sour … ug/1279412
    So the problem is in fglrx, but in openSUSE it worked fine; maybe this happens because I use the latest beta driver.

  • WRT54GL crashes when i choose for a wireless encryption

    Hello people
    My problem:
    My wireless router crashes when I am installing it.
    When I configure the wireless encryption, my router hangs.
    Then I have to reset it and I can configure it again, but the same thing happens again.
    I am using the latest firmware, 4.30.7.
    Is my router out of order?
    Greetings, Neppinda

    Try to reflash the router's firmware and re-configure the router from scratch.

  • ASM on one node crashes when we start the other two nodes ASM

    We completed the database build in Aug 2010.
    We completed PSU patching at the end of January.
    On Feb 4th the database crashed.
    We cannot start ASM on node1.
    ASM starts fine on node2 and node3, but node1 cannot join.
    If ASM is down on node2 and node3, then we can start ASM on node1.
    Reconfiguration started (old inc 0, new inc 6)
    ASM instance
    List of nodes:
    0 1 2
    Global Resource Directory frozen
    * allocate domain 0, invalid = TRUE
    Communication channels reestablished
    * allocate domain 1, invalid = TRUE
    * allocate domain 2, invalid = TRUE
    Mon Mar 01 16:53:00 2010
    Trace dumping is performing id=[cdmp_20100301165301]
    Mon Mar 01 16:53:55 2010
    ERROR: LMD0 (ospid: 274638) detects an idle connection to instance 2
    Mon Mar 01 16:54:44 2010
    Errors in file /oradb/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_860280.trc (incident=116865):
    ORA-29740: evicted by member 1, group incarnation 8
    Incident details in: /oradb/oracle/diag/asm/+asm/+ASM1/incident/incdir_116865/+ASM1_lmon_860280_i116865.trc
    Errors in file /oradb/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_860280.trc:
    ORA-29740: evicted by member 1, group incarnation 8
    LMON (ospid: 860280): terminating the instance due to error 29740
    Mon Mar 01 16:54:46 2010
    System state dump is made for local instance
    Errors in file /oradb/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_diag_614488.trc (incident=116833):
    ORA-29740: evicted by member , group incarnation
    Incident details in: /oradb/oracle/diag/asm/+asm/+ASM1/incident/incdir_116833/+ASM1_diag_614488_i116833.trc
    Mon Mar 01 16:54:46 2010
    ORA-1092 : opitsk aborting process
    Errors in file /oradb/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_diag_614488.trc:
    ORA-29740: evicted by member , group incarnation
    Trace dumping is performing id=[cdmp_20100301165446]
    Instance terminated by LMON, pid = 860280
    Another thing we found is that when we start ASM on node1, the cluster interconnect hangs when we try to ping it.
    We modified the cluster_interconnect parameter to try using the public interface, but the issue remained the same and we were not able to ping the public interface.
    The CRS is fine:
    $ crs_stat -t
    Name Type Target State Host
    ora....p1.inst application ONLINE OFFLINE
    ora....p2.inst application ONLINE ONLINE noden2
    ora....p3.inst application ONLINE ONLINE noden3
    ora....1p2.srv application ONLINE ONLINE noden2
    ora....1p3.srv application ONLINE ONLINE noden3
    ora.....net.cs application ONLINE ONLINE noden1
    ora.appl.db application ONLINE ONLINE noden1
    ora....SM1.asm application ONLINE OFFLINE
    ora....N1.lsnr application ONLINE ONLINE noden1
    ora....8n1.gsd application ONLINE ONLINE noden1
    ora....8n1.ons application ONLINE ONLINE noden1
    ora....8n1.vip application ONLINE ONLINE noden1
    ora....SM2.asm application ONLINE ONLINE noden2
    ora....N2.lsnr application ONLINE ONLINE noden2
    ora....8n2.gsd application ONLINE ONLINE noden2
    ora....8n2.ons application ONLINE ONLINE noden2
    ora....8n2.vip application ONLINE ONLINE noden2
    ora....SM3.asm application ONLINE ONLINE noden3
    ora....N3.lsnr application ONLINE ONLINE noden3
    ora....8n3.gsd application ONLINE ONLINE noden3
    ora....8n3.ons application ONLINE ONLINE noden3
    ora....8n3.vip application ONLINE ONLINE noden3
    Any input would help.
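With a listing as long as the crs_stat -t output above, the resources whose State does not match their Target are easy to miss. Here is a short Python sketch that picks them out, assuming the whitespace-separated Name/Type/Target/State/Host layout shown in this post (the Host column is empty for offline resources). The function name `mismatched_resources` is hypothetical.

```python
def mismatched_resources(crs_stat_text):
    """Return (name, target, state) for rows where State differs from Target."""
    mismatched = []
    for line in crs_stat_text.splitlines():
        parts = line.split()
        # Skip blank lines, separator lines and the column header.
        if len(parts) < 4 or parts[0] == "Name":
            continue
        name, _res_type, target, state = parts[:4]
        if target != state:
            mismatched.append((name, target, state))
    return mismatched


# Shortened sample copied from the listing above.
SAMPLE = """\
Name Type Target State Host
ora....p1.inst application ONLINE OFFLINE
ora....p2.inst application ONLINE ONLINE noden2
ora....SM1.asm application ONLINE OFFLINE
ora....N1.lsnr application ONLINE ONLINE noden1
ora....SM2.asm application ONLINE ONLINE noden2
"""

if __name__ == "__main__":
    for name, target, state in mismatched_resources(SAMPLE):
        print(f"{name}: target={target} state={state}")
```

On the listing in this post, the only rows that differ are the node1 database instance and the node1 ASM resource, which matches the description that only node1 fails to join.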

    Env
    3-node RAC
    oracle version 11.1.0.7
    Latest PSU Jan applied
    OS is AIX, version 6100-02
    ==========
    LMON trace files
    ==========
    Trace file /oradb/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_860280.trc
    Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options
    ORACLE_HOME = /oradb/oracle/product/11.1/asm_1
    System name:     AIX
    Node name:     host-node1
    Release:     1
    Version:     6
    Machine:     00C39EA44C00
    Instance name: +ASM1
    Redo thread mounted by this instance: 0 <none>
    Oracle process number: 8
    Unix process pid: 860280, image: oracle@host-node1 (LMON)
    *** 2010-03-01 16:50:23.023
    *** SESSION ID:(218.1) 2010-03-01 16:50:23.023
    *** CLIENT ID:() 2010-03-01 16:50:23.023
    *** SERVICE NAME:() 2010-03-01 16:50:23.023
    *** MODULE NAME:() 2010-03-01 16:50:23.023
    *** ACTION NAME:() 2010-03-01 16:50:23.023
    GES resources 5596 pool 6
    GES enqueues 7959
    GES IPC: Receivers 2 Senders 2
    GES IPC: Buffers Receive 1000 Send (i:1150 b:482) Reserve 402
    GES IPC: Msg Size Regular 416 Batch 8192
    Batching factor: enqueue replay 201, ack 224
    Batching factor: cache replay 126 size per lock 64
    kjxggin: CGS tickets = 1000
    kjxgrdmpcpu: CPU Total 6 Core 3 Socket -1 OCPU 6
    kjxgrdmpcpu: High load threshold 21504
    *** 2010-03-01 16:50:23.362
    kjxgmrcfg: Reconfiguration started, type 1
    kjxgmcs: Setting state to 0 0.
    *** 2010-03-01 16:50:23.363
    Name Service frozen
    kjxgmcs: Setting state to 0 1.
    kjxgrdecidever: No old version members in the cluster
    kjxgrssvote: reconfig bitmap chksum 0x88477268 cnt 3 master 0 ret 0
    ksirValidateModuleInfo: action = 10 startup = 0
    Name Service Mode: multi (0x21)
    kjfcpiora: published my fusion master weight 5322
    kjfcpiora: publish my flogb 9
    kjfcpiora: publish my cluster_database_instances parameter=3
    kjxggpoll: change poll time to 50 ms
    kjxgrpropmsg: SSMEMI: inst 1 - no disk vote
    kjxgrpropmsg: SSMEMI: inst 1 - no disk vote
    kjxgrpropmsg: SSMEMI: inst 2 - no disk vote
    SSVOTE: Master indicates no Disk Voting
    kjxgmps: proposing substate 2
    kjxgmcs: Setting state to 6 2.
    kjfmuin: bitmap 0 1 2
    kjfmmhi: received msg from 0 (inc 6)
    kjfmmhi: received msg from 1 (inc 2)
    kjfmmhi: received msg from 2 (inc 4)
    Performed the unique instance identification check
    kjxgmps: proposing substate 3
    kjxgmcs: Setting state to 6 3.
    Name Service recovery started
    Deleted all dead-instance name entries
    kjxgmps: proposing substate 4
    kjxgmcs: Setting state to 6 4.
    Multicasted all local name entries for publish
    Replayed all pending requests
    kjxgmps: proposing substate 5
    kjxgmcs: Setting state to 6 5.
    Name Service normal
    Name Service recovery done
    *** 2010-03-01 16:50:23.889
    *** 2010-03-01 16:50:23.958
    kjxgmps: proposing substate 6
    kjxgmcs: Setting state to 6 6.
    kjxggpoll: change poll time to 600 ms
    2010-03-01 16:50:23.980620 :
    ********* kjfcrfg() called, BEGIN LMON RCFG *********
    kjfcrfg: DRM window size = 0->128 (min lognb = 9)
    2010-03-01 16:50:23.980811 :
    Reconfiguration started (old inc 0, new inc 6)
    ASM instance
    Send timeout: 300 secs
    Defer Queue timeout: 360 secs
    Synchronization timeout: 420 sec
    List of nodes:
    0 1 2
    *** 2010-03-01 16:50:24.023
    2010-03-01 16:50:24.034432 : Global Resource Directory frozen
    node 0
    release 11 1 0 7
    node 1
    release 11 1 0 7
    node 2
    release 11 1 0 7
    number of mastership buckets = 128
    2010-03-01 16:50:24.034959 :
    domain attach called for domid 0
    * kjbdomalc: domain 0 invalid = TRUE
    * kjbdomatt: first attach for domain 0
    asby init, 0/0/x1
    asby returns, 0/0/x1/false
    * Domain maps before reconfiguration:
    * DOMAIN 0 (valid 1): 0
    * End of domain mappings
    * Domain maps after recomputation:
    * DOMAIN 0 (valid 1): 0 1 2
    * End of domain mappings
    Dead inst
    Join inst 0 1 2
    Exist inst
    Active Sendback Threshold = 50 %
    Communication channels reestablished
    2010-03-01 16:50:24.152688 :
    received all domreplay (6.6)
    2010-03-01 16:50:24.152732 :
    sent master 1 (6.6)
    *** 2010-03-01 16:53:00.494
    kjfmReceiverHealthCB_Check: Reciever [0] is healthy.
    2010-03-01 16:52:56.921800 : Received comm error info from 2 (cnt 1)
    kjxgrvalid: valid - 0.1 : (6 6) from 2
    kjxgrrcfgchk: Initiating reconfig, reason=3
    kjxgrrcfgchk: COMM rcfg - Disk Vote Required
    2010-03-01 16:52:57.077877 : kjxgrnetchk: start 0x53001440, end 0x53019ae0
    2010-03-01 16:52:57.077906 : kjxgrnetchk: Sending comm check req to 1
    2010-03-01 16:52:57.078140 : kjxgrnetchk: Sending comm check req to 2
    kjxgrrcfgchk: prev pstate 5 mapsz 512
    kjxgrrcfgchk: new bmp: 0 1 2
    kjxgrrcfgchk: work bmp: 0 1 2
    kjxgrrcfgchk: rr bmp: 0 1 2
    *** 2010-03-01 16:53:00.792
    kjxgmrcfg: Reconfiguration started, type 3
    kjxgmcs: Setting state to 6 0.
    *** 2010-03-01 16:53:00.792
    Name Service frozen
    kjxgmcs: Setting state to 6 1.
    kjxgrdecidever: No old version members in the cluster
    kjxgrmsghndlr: Queue msg (0x110a21e50->0x110f09b90) type 7 for later
    *** 2010-03-01 16:54:43.233
    kjxgrssvote: reconfig bitmap chksum 0x88477268 cnt 3 master 2 ret 0
    kjxgrrcfgchk: disable CGS timeout
    kjxggpoll: change poll time to 50 ms
    * kjfcchknested: CGS rcfg detected in step 7.0.0
    SSVOTE: Master indicates Disk Voting required
    2010-03-01 16:54:37.535518 : kjxgrmsghndlr: evict req from 1 for 0, seq (8, 8) vers 2193970751
    2010-03-01 16:54:37.535587 : kjxgrdtrt: Evicted by 1, seq (8, 8)
    IMR state information
    Member 0, thread -1, state 0x2:c, flags 0x2c48
    RR seq commit 6 cur 8
    Propstate 3 prv 2 pending 0
    rcfg rsn 3, rcfg time 1392514113, mem ct 3
    master 2, master rcfg time 1392479783
    evicted memcnt 0, starttm 0 chkcnt 0
    system load 241 (normal)
    Member information:
    Member 0, incarn 6, version 0x82c5563f, thrd -1
    prev thrd -1, status 0x1203 (JR..), err 0x0000
    Member 1, incarn 6, version 0x82c1073b, thrd 2
    prev thrd -1, status 0x1007 (JRM.), err 0x0002
    Member 2, incarn 6, version 0x82c114ee, thrd 3
    prev thrd -1, status 0x0007 (JRM.), err 0x0000
    =====================================================
    Group name: +ASM
    Member id: 0
    Cached KGXGN event: 0
    Group State:
    State: 6 1
    Reconfig started start-tm 0x4b8c373c tmout period 0xffffffff state 0x2
    Reconfig INPG type 3 inc 6 rsn 0 data 0x0
    Reconfig COMP type 1 inc 6 rsn 0 data 0x0
    Commited Map: 0 1 2
    New Map: 0 1 2
    KGXGN Map: 0 1 2
    KGXGN Map2: 0 1 2
    Master node: 0
    Memcnt 3 Rcvcnt 0
    Substate Proposal: false
    Inc Proposal:
    incarn 0 memcnt 0 master 0
    proposal false matched false
    map:
    Master Inc State:
    incarn 0 memcnt 0 agrees 0 flag 0x1
    wmap:
    nmap:
    ubmap:
    Substate Handler Execution State
    substate 0 status done
    substate 1 status done
    substate 2 status done
    substate 3 status done
    substate 4 status done
    substate 5 status done
    substate 6 status done
    IMR hist: 20[0x0a00:0x53019b0e] 4[0x0007:0x53019b0e] 3[0x0006:0x53019b0e]
    IMR hist: 20[0x0902:0x53019b0e] 20[0x0702:0x53019b0b] 20[0x0702:0x53019b0a]
    IMR hist: 20[0x0702:0x53019b0a] 1[0x0006:0x53019b0a] 20[0x0702:0x53019aff]
    IMR hist: 10[0x0006:0x52fdbdb1] 20[0x0b00:0x52fdbdb1] 9[0x0006:0x52fdbdaf]
    IMR hist: 20[0x0a02:0x52fdbdaf] 20[0x0a01:0x52fdbce1] 20[0x0a00:0x52fdbc8a]
    IMR hist: 4[0x0005:0x52fdbc86] 3[0x0004:0x52fdbc4c] 20[0x0900:0x52fdbc4c]
    IMR hist: 20[0x0802:0x52fdbc08] 20[0x0801:0x52fdbc08] 20[0x0801:0x52fdbc08]
    IMR hist: 20[0x0602:0x52fdbc08] 20[0x0601:0x52fdbc08] 20[0x0601:0x52fdbc08]
    IMR hist: 20[0x0800:0x52fdbc08] 20[0x0700:0x52fdbc08] 20[0x0602:0x52fdbc07]
    IMR hist: 20[0x0800:0x52fdbc07] 20[0x0700:0x52fdbc07] 1[0x0000:0x52fdbbb8]
    IMR hist: 0[0x0000:0x00000000] 0[0x0000:0x00000000]
    KJM HIST LMD0:
    7:0 6:0 5:7:0 12:97697 7:0 6:0 5:7:0 12:97696 7:0 6:0
    5:7:0 12:97703 7:0 6:0 5:7:0 2:0 1:0 12:97713 7:0 6:0
    5:7:0 12:97766 7:0 6:0 5:7:0 12:97782 7:0 6:0 5:7:0 12:97778
    7:0 6:0 5:7:0 12:97799 7:0 6:0 5:7:0 12:97771 7:0 6:0
    5:7:0 12:97784 7:0 6:0 5:7:0 12:97805 7:0 6:0 5:7:0 12:97785
    7:0 6:0 5:7:0 12:97757 7:0 6:0 5:7:0 12:97770 7:0 6:0
    5:7:0 12:97784 7:0 6:0
    KJM HIST LMS0:
    7:0 6:0 5:7:0 10:0 12:97697 7:0 6:0 5:7:0 10:0 12:97696
    7:0 6:0 5:7:0 10:0 12:97703 7:0 6:0 5:7:0 10:0 12:97713
    7:0 6:0 5:7:0 10:0 2:0 12:97766 7:0 6:0 5:7:0 10:0
    12:97782 7:0 6:0 5:7:0 10:0 12:97778 7:0 6:0 5:7:0 10:0
    12:97799 7:0 6:0 5:7:0 10:0 12:97771 7:0 6:0 5:7:0 10:0
    12:97784 7:0 6:0 5:7:0 10:0 12:97805 7:0 6:0 5:7:0 10:0
    12:97785 7:0 6:0 5:7:0
    DUMP state for lmd0 (ospid 274638)
    DUMP IPC context for lmd0 (ospid 274638)
    Dumping process 9.274638 info:
    *** 2010-03-01 16:54:43.664
    Process diagnostic dump for oracle@host-node1 (LMD0), OS id=274638,
    pid: 9, proc_ser: 1, sid: 217, sess_ser: 1
    loadavg : 1.72 1.07 0.90
    swap info: free_mem = 28642.09M rsv = 16.00M
    alloc = 21.13M avail = 4096.00M swap_free = 4074.87M
    F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
    240001 A oracle 274638 1 0 60 20 12ca9f590 156060 16:50:23 - 0:00 asm_lmd0_+ASM1
    Short stack dump:
    <-ksedsts()+0254<-ksdxfstk()+0028<-ksdxcb()+05d8<-sspuser()+0074<-4750<-poll()+000c<-sskgxp_select()+00e4<-skgxpiwait()+08a4<-skgxpwait()+06fc<-ksxpwait()+081c<-ksliwat()+0a58<-kslwaitctx()+0150<-kslwait()+006c<-ksxprcvimd()+0368<-kjctr_rksxp()+013c<-kjctrcv()+0160<-kjcsrmg()+005c<-kjmdm()+2454<-ksbrdp()+075c<-opirip()+0444<-opidrv()+0414<-sou2o()+0090<-opimai_real()+0148<-main()+0090<-__start()+0070
    Process diagnostic dump actual duration=0.161000 sec
    (max dump time=30.000000 sec)
    *** 2010-03-01 16:54:43.825
    SO: 0x70000001ff913a0, type: 2, owner: 0x0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
    proc=0x70000001ff913a0, name=process, file=ksu.h LINE:10706 ID:, pg=0
    (process) Oracle pid:9, ser:1, calls cur/top: 0x70000001f733140/0x70000001f733140
    flags : (0x6) SYSTEM
    flags2: (0x100), flags3: (0x0)
    int error: 0, call error: 0, sess error: 0, txn error 0
    ksudlp FALSE at location: 0
    (post info) last post received: 0 0 83
    last post received-location: kji.h LINE:2369 ID:kjga: clear wait for lmon
    last process to post me: 70000001ff903b0 1 6
    last post sent: 0 0 25
    last post sent-location: ksa2.h LINE:282 ID:ksasnd
    last process posted by me: 70000001ff903b0 1 6
    (latch info) wait_event=68 bits=0
    Process Group: DEFAULT, pseudo proc: 0x70000001f4851d0
    O/S info: user: oracle, term: UNKNOWN, ospid: 274638
    OSD pid info: Unix process pid: 274638, image: oracle@host-node1 (LMD0)
    Dump of memory from 0x070000001FF70038 to 0x070000001FF70240
    70000001FF70030 00000000 00000000 [........]
    70000001FF70040 00000000 00000000 00000000 00000000 [................]
    Repeat 31 times
    SO: 0x70000001f6de4a0, type: 4, owner: 0x70000001ff913a0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
    proc=0x70000001ff913a0, name=session, file=ksu.h LINE:10719 ID:, pg=0
    (session) sid: 217 ser: 1 trans: 0x0, creator: 0x70000001ff913a0
    flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x408) -/-
    DID: , short-term DID:
    txn branch: 0x0
    oct: 0, prv: 0, sql: 0x0, psql: 0x0, user: 0/SYS
    ksuxds FALSE at location: 0
    service name: SYS$BACKGROUND
    Current Wait Stack:
    0: waiting for 'ges remote message'
    waittime=40, loop=0, p3=44
    wait_id=2613 seq_num=2614 snap_id=1
    wait times: snap=0.018269 sec, exc=0.018269 sec, total=0.018269 sec
    wait times: max=0.080000 sec
    wait counts: calls=1 os=1
    in_wait=1 iflags=0x5a8
    Wait State:
    auto_close=0 flags=0x22 boundary=0x0/-1
    Session Wait History:
    0: waited for 'ges remote message'
    waittime=40, loop=0, p3=44
    wait_id=2612 seq_num=2613 snap_id=1
    wait times: snap=0.160172 sec, exc=0.160172 sec, total=0.160172 sec
    wait times: max=0.080000 sec
    wait counts: calls=1 os=1
    occurred after 0.000008 sec of elapsed time
    1: waited for 'ges remote message'
    waittime=40, loop=0, p3=44
    wait_id=2611 seq_num=2612 snap_id=1
    wait times: snap=0.096359 sec, exc=0.096359 sec, total=0.096359 sec
    wait times: max=0.080000 sec
    wait counts: calls=1 os=1
    occurred after 0.000008 sec of elapsed time
    2: waited for 'ges remote message'
    waittime=40, loop=0, p3=44
    wait_id=2610 seq_num=2611 snap_id=1
    wait times: snap=0.098065 sec, exc=0.098065 sec, total=0.098065 sec
    wait times: max=0.080000 sec
    wait counts: calls=1 os=1
    occurred after 0.000007 sec of elapsed time
    3: waited for 'ges remote message'
    waittime=40, loop=0, p3=44
    wait_id=2609 seq_num=2610 snap_id=1
    wait times: snap=0.097831 sec, exc=0.097831 sec, total=0.097831 sec
    wait times: max=0.080000 sec
    wait counts: calls=1 os=1
    occurred after 0.000014 sec of elapsed time
    4: waited for 'ges remote message'
    waittime=40, loop=0, p3=44
    wait_id=2608 seq_num=2609 snap_id=1
    wait times: snap=0.095876 sec, exc=0.095876 sec, total=0.095876 sec
    wait times: max=0.080000 sec
    wait counts: calls=1 os=1
    occurred after 0.000008 sec of elapsed time
    5: waited for 'ges remote message'
    waittime=40, loop=0, p3=44
    wait_id=2607 seq_num=2608 snap_id=1
    wait times: snap=0.098788 sec, exc=0.098788 sec, total=0.098788 sec
    wait times: max=0.080000 sec
    wait counts: calls=1 os=1
    occurred after 0.000006 sec of elapsed time
    6: waited for 'ges remote message'
    waittime=40, loop=0, p3=44
    wait_id=2606 seq_num=2607 snap_id=1
    wait times: snap=0.098854 sec, exc=0.098854 sec, total=0.098854 sec
    wait times: max=0.080000 sec
    wait counts: calls=1 os=1
    occurred after 0.000007 sec of elapsed time
    7: waited for 'ges remote message'
    waittime=40, loop=0, p3=44
    wait_id=2605 seq_num=2606 snap_id=1
    wait times: snap=0.098040 sec, exc=0.098040 sec, total=0.098040 sec
    wait times: max=0.080000 sec
    wait counts: calls=1 os=1
    occurred after 0.000008 sec of elapsed time
    8: waited for 'ges remote message'
    waittime=40, loop=0, p3=44
    wait_id=2604 seq_num=2605 snap_id=1
    wait times: snap=0.097322 sec, exc=0.097322 sec, total=0.097322 sec
    wait times: max=0.080000 sec
    wait counts: calls=1 os=1
    occurred after 0.000007 sec of elapsed time
    9: waited for 'ges remote message'
    waittime=40, loop=0, p3=44
    wait_id=2603 seq_num=2604 snap_id=1
    wait times: snap=0.097334 sec, exc=0.097334 sec, total=0.097334 sec
    wait times: max=0.080000 sec
    wait counts: calls=1 os=1
    occurred after 0.000008 sec of elapsed time
    Sampled Session History
    The sampled session history is constructed by sampling
    the target session every 1 second. The sampling process
    captures at each sample if the session is in a non-idle wait,
    an idle wait, or not in a wait. If the session is in a
    non-idle wait then one interval is shown for all the samples
    the session was in the same non-idle wait. If the
    session is in an idle wait or not in a wait for
    consecutive samples then one interval is shown for all
    the consecutive samples. Though we display these consecutive
    samples in a single interval the session may NOT be continuously
    idle or not in a wait (the sampling process does not know).
    The history is displayed in reverse chronological order.
    sample interval: 1 sec, max history 120 sec
    KSFD PGA DUMPS
    Number of completed I/O requests=0 flags=0
    END OF PROCESS STATE
    LMON IPC context:
    ksxpdmp: facility 0 (?) (0x1, 0x0) counts 0, 0
    ksxpdmp: Dumping the osd context
    SKGXP: SKGXPCTX: 0x1103bfb58 ctx
    SKGXP:
    SKGXP: WAIT HISTORY
    SKGXP: Time(msec)     Wait Type     Return Code
    SKGXP: ----------     ---------     ------------
    SKGXP: 0          NORMAL          SUCC
    SKGXP: 0          NORMAL          SUCC
    SKGXP: 0          NORMAL          SUCC
    SKGXP: 0          NORMAL          SUCC
    SKGXP: 0          NORMAL          TIMEDOUT
    SKGXP: 12          NORMAL          TIMEDOUT
    SKGXP: 0          NORMAL          TIMEDOUT
    SKGXP: 20          NORMAL          TIMEDOUT
    SKGXP: 0          NORMAL          TIMEDOUT
    SKGXP: 19          NORMAL          TIMEDOUT
    SKGXP: 0          NORMAL          TIMEDOUT
    SKGXP: 20          NORMAL          TIMEDOUT
    SKGXP: 0          NORMAL          TIMEDOUT
    SKGXP: 19          NORMAL          TIMEDOUT
    SKGXP: 0          NORMAL          TIMEDOUT
    SKGXP: 20          NORMAL          TIMEDOUT
    SKGXP: wait delta 0 sec (27 msec) ctx ts 0x3e377 last ts 0x3e381
    SKGXP: user cpu time since last wait 0 sec 0 ticks
    SKGXP: system cpu time since last wait 0 sec 0 ticks
    SKGXP: locked 1
    SKGXP: blocked 51
    SKGXP: timed wait receives 0
    SKGXP: admno 0x485303b1 admport:
    SKGXP: SSKGXPT 0x103c0a74 flags sockno 12 IP 192.168.253.49 UDP 49777
    SKGXP: context timestamp 0x3e377
    SKGXP: buffers queued on port 1105aa950
    SKGXP:
    SKGXP: Dumping Connection Handle Table
    SKGXP: sconno accono ertt state seq# RcvPid TotCredits
    SKGXP: sent rtrans acks
    SKGXP: CNH Table Bucket: 10
    SKGXP: 0x339d0248 0x6dd6841c 64 4 32838 589900 8
    SKGXP: 75d 5d 32838d
    SKGXP: CNH Table Bucket: 11
    SKGXP: 0x339d0249 0x75ef4c98 32 4 32811 1007758 8
    SKGXP: 48d 12d 32811d
    SKGXP: CNH Table Bucket: 12
    SKGXP: 0x339d024a 0x75703ec2 16 4 32763 524518 8
    SKGXP: 0d 0d 0d
    SKGXP: CNH Table Bucket: 13
    SKGXP: 0x339d024b 0x41094259 16 4 32763 520260 8
    SKGXP: 0d 0d 0d
    SKGXP: CNH Table Bucket: 14
    SKGXP: 0x339d024c 0x7c1c696c 16 4 32763 585808 8
    SKGXP: 0d 0d 0d
    SKGXP: CNH Table Bucket: 15
    SKGXP: 0x339d024d 0x138c8c4a 16 4 32763 843952 8
    SKGXP: 0d 0d 0d
    SKGXP:
    SKGXP: Dumping Accept Handle Table
    SKGXP: ach accono sconno admno state SndPid seq# rcv rtrans acks credits
    SKGXP: ACH Table Bucket: 1472
    SKGXP: 0x111088010 0x48cb4387 0x3365b236 0x1fe7dc68 40 1007758 32812 49 0 26 8
    SKGXP: ACH Table Bucket: 1474
    SKGXP: 0x11108b730 0x48cb4389 0x1c69654a 0x7183ff4c 40 589900 32838 75 0 52 8
    Incident 116865 created, dump file: /oradb/oracle/diag/asm/+asm/+ASM1/incident/incdir_116865/+ASM1_lmon_860280_i116865.trc
    ORA-29740: evicted by member 1, group incarnation 8
    error 29740 detected in background process
    ORA-29740: evicted by member 1, group incarnation 8
    *** 2010-03-01 16:54:46.430
    LMON (ospid: 860280): terminating the instance due to error 29740
    ksuitm: waiting up to [5] seconds before killing DIAG
    ==========
    DIAG trace files
    =========
    Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options
    ORACLE_HOME = /oradb/oracle/product/11.1/asm_1
    System name:     AIX
    Node name:     host-node1
    Release:     1
    Version:     6
    Machine:     00C39EA44C00
    Instance name: +ASM1
    Redo thread mounted by this instance: 0 <none>
    Oracle process number: 4
    Unix process pid: 614488, image: oracle@host-node1 (DIAG)
    *** 2010-03-01 16:50:22.947
    *** SESSION ID:(222.1) 2010-03-01 16:50:22.947
    *** CLIENT ID:() 2010-03-01 16:50:22.947
    *** SERVICE NAME:() 2010-03-01 16:50:22.947
    *** MODULE NAME:() 2010-03-01 16:50:22.947
    *** ACTION NAME:() 2010-03-01 16:50:22.947
    Node id: 0
    List of nodes: 0, 1, 2,
    *** 2010-03-01 16:50:22.948
    Reconfiguration starts [incarn=0]
    *** 2010-03-01 16:50:22.948
    I'm the master node
    Group reconfiguration cleanup
    *** 2010-03-01 16:50:23.602
    A rcfg proposal from node 2 is received
    *** 2010-03-01 16:50:23.602
    A rcfg proposal from node 1 is received
    *** 2010-03-01 16:50:23.602
    Reconfiguration completes [incarn=3]
    *** 2010-03-01 16:53:00.877
    A dump event msg is rcv'd
    REQUEST:trace dump in directory cdmp_20100301165301
    *** 2010-03-01 16:53:00.877
    Trace dumping is performing id=[cdmp_20100301165301]....
    *** 2010-03-01 16:53:01.041
    Trace dumping is done
    *** 2010-03-01 16:54:46.560
    Instance is terminating by process 860280 [ospid=oracle@host-node1 (LMON)]
    Performing diagnostic data dump for this instance
    Incident 116833 created, dump file: /oradb/oracle/diag/asm/+asm/+ASM1/incident/incdir_116833/+ASM1_diag_614488_i116833.trc
    ORA-29740: evicted by member , group incarnation
    Error 29740 encountered during system state dump
    *** 2010-03-01 16:54:49.280
    ----- Error Stack Dump -----
    ORA-29740: evicted by member , group incarnation
    *** 2010-03-01 16:54:49.281
    Trace dumping is performing id=[cdmp_20100301165446]....
    *** 2010-03-01 16:54:49.433
    Trace dumping is done
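
    When reading a trace like the one above, it can help to pull out the `*** <date> <time>` markers and the ORA- codes to see how long the instance stayed up between the reconfiguration and the eviction. The following is only a minimal Python sketch of that idea: the function names are hypothetical, and the sample lines are copied from the DIAG trace above rather than read from a real trace file.

```python
import re
from datetime import datetime

# Matches stand-alone timestamp markers such as "*** 2010-03-01 16:50:23.602".
TS_RE = re.compile(r"^\*\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s*$")
ORA_RE = re.compile(r"ORA-(\d{5})")


def parse_events(lines):
    """Return (timestamp, message) pairs; a message inherits the latest marker."""
    events = []
    current = None
    for raw in lines:
        line = raw.strip()
        match = TS_RE.match(line)
        if match:
            current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        elif line and current is not None:
            events.append((current, line))
    return events


def first_time(events, needle):
    """Timestamp of the first message containing `needle`, or None."""
    for ts, msg in events:
        if needle in msg:
            return ts
    return None


# Sample lines copied from the DIAG trace in this thread.
SAMPLE = """\
*** 2010-03-01 16:50:23.602
Reconfiguration completes [incarn=3]
*** 2010-03-01 16:54:46.560
Instance is terminating by process 860280 [ospid=oracle@host-node1 (LMON)]
ORA-29740: evicted by member , group incarnation
""".splitlines()

events = parse_events(SAMPLE)
start = first_time(events, "Reconfiguration completes")
end = first_time(events, "Instance is terminating")
uptime = (end - start).total_seconds()
ora_codes = sorted({code for _, msg in events for code in ORA_RE.findall(msg)})
print(f"{uptime:.3f} s from reconfiguration to termination; ORA codes: {ora_codes}")
```

    Markers that carry extra text, such as `*** SESSION ID:(222.1) ...`, are deliberately skipped because the pattern only accepts a bare date and time.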

  • Oracle instance crashing when enabling use_indirect_data_buffers=true

    I have a Windows 2003 EE server (32-bit) with 16GB of RAM hosting a 10.2.0.2 Oracle server which is used to support a commercial software package (ArcSight). I'm trying to get the Oracle backend to leverage the available system memory. I've read 50-60 different articles and posts regarding AWE and Oracle. I have successfully tuned the userva parameter so that the server boots and runs stably with the /3gb boot parameter. I've gotten to the point where the Oracle instance will start up, but within about 30-60 seconds the instance crashes. Below is the information that I believe is relevant:
    *.......From computer Registry.........*
    AWE_MEMORY_WINDOW = 1288486912
    ORA_WORKINGSETMIN = 2
    *...........From init.ora.............*
    *.__dg_broker_service_names=';'
    arcsight.__java_pool_size=0
    arcsight.__large_pool_size=0
    arcsight.__shared_pool_size=314572800
    arcsight.__streams_pool_size=0
    *.audit_file_dest='E:\oracle10g\OraHome10g\admin\arcsight\adump'
    *.audit_sys_operations=true
    *.audit_trail='db'
    *.background_dump_dest='E:\oracle10g\OraHome10g\admin\arcsight\bdump'
    *.compatible='10.2.0.1.0'
    *.control_files='E:\oracle10g\OraHome10g\oradata\arcsight\control01.ctl','f:\arcsight\control02.ctl','g:\arcsight\control03.ctl'
    *.core_dump_dest='E:\oracle10g\OraHome10g\admin\arcsight\cdump'
    *.cursor_sharing='FORCE'
    *.db_block_size=16384
    *.db_block_buffers=235929
    *.db_domain=''
    *.db_file_multiblock_read_count=16
    *.db_files=2000
    *.db_name='arcsight'
    *.db_writer_processes=4
    *.dispatchers=''
    *.job_queue_processes=10
    *.log_archive_dest_1='LOCATION=H:'
    *.log_buffer=1048576
    *.open_cursors=2000
    *.parallel_max_servers=0
    *.pga_aggregate_target=314572800
    *.processes=300
    *.recyclebin='OFF'
    *.remote_login_passwordfile='EXCLUSIVE'
    *.sga_target=0
    *.undo_management='AUTO'
    *.undo_retention=43200
    *.undo_tablespace='ARC_UNDO'
    *.user_dump_dest='E:\oracle10g\OraHome10g\admin\arcsight\udump'
    *.java_pool_size=0
    *.large_pool_size=0
    *.shared_pool_size=314572800
    *.streams_pool_size=0
    *.use_indirect_data_buffers=true
    *......From oradim.log.......*
    Sun Feb 22 18:37:33 2009
    E:\oracle10g\OraHome10g\bin\oradim.exe -shutdown -sid arcsight -usrpwd * -shutmode immediate -log oradim.log
    Sun Feb 22 18:37:34 2009
    ORA-01012: not logged on
    Sun Feb 22 18:37:45 2009
    E:\oracle10g\OraHome10g\bin\oradim.exe -startup -sid arcsight -usrpwd * -log oradim.log -nocheck 0
    Sun Feb 22 18:37:51 2009
    ORA-03113: end-of-file on communication channel
    *.......From alert_arcsight.log.........*
    Dump file e:\oracle10g\orahome10g\admin\arcsight\bdump\alert_arcsight.log
    Sun Feb 22 23:20:51 2009
    ORACLE V10.2.0.2.0 - Production vsnsta=0
    vsnsql=14 vsnxtr=3
    Windows Server 2003 Version V5.2 Service Pack 2
    CPU : 8 - type 586, 4 Physical Cores
    Process Affinity : 0x00000000
    Memory (Avail/Total): Ph:14554M/16215M, Ph+PgF:14862M/15967M, VA:1926M/2047M
    Sun Feb 22 23:20:51 2009
    Starting ORACLE instance (normal)
    Sun Feb 22 23:20:52 2009
    Window memory size 1288503296
    Sun Feb 22 23:20:52 2009
    Minimum working set window size : 4096
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Picked latch-free SCN scheme 2
    Autotune of undo retention is turned on.
    IMODE=BR
    ILAT =36
    LICENSE_MAX_USERS = 0
    SYS auditing is enabled
    ksdpec: called for event 13740 prior to event group initialization
    Starting up ORACLE RDBMS Version: 10.2.0.2.0.
    System parameters with non-default values:
    processes = 300
    use_indirect_data_buffers= TRUE
    __shared_pool_size = 318767104
    shared_pool_size = 318767104
    __large_pool_size = 0
    large_pool_size = 0
    __java_pool_size = 0
    java_pool_size = 0
    __streams_pool_size = 0
    streams_pool_size = 0
    sga_target = 0
    control_files = E:\ORACLE10G\ORAHOME10G\ORADATA\ARCSIGHT\CONTROL01.CTL, F:\ARCSIGHT\CONTROL02.CTL, G:\ARCSIGHT\CONTROL03.CTL
    db_block_buffers = 235932
    db_block_size = 16384
    db_writer_processes = 4
    compatible = 10.2.0.1.0
    log_archive_dest_1 = LOCATION=H:
    log_buffer = 2097152
    db_files = 2000
    db_file_multiblock_read_count= 16
    undo_management = AUTO
    undo_tablespace = ARC_UNDO
    undo_retention = 43200
    recyclebin = OFF
    remote_login_passwordfile= EXCLUSIVE
    audit_sys_operations = TRUE
    db_domain =
    __dg_broker_service_names= ;
    dispatchers =
    job_queue_processes = 10
    cursor_sharing = FORCE
    parallel_max_servers = 0
    audit_file_dest = E:\ORACLE10G\ORAHOME10G\ADMIN\ARCSIGHT\ADUMP
    background_dump_dest = E:\ORACLE10G\ORAHOME10G\ADMIN\ARCSIGHT\BDUMP
    user_dump_dest = E:\ORACLE10G\ORAHOME10G\ADMIN\ARCSIGHT\UDUMP
    core_dump_dest = E:\ORACLE10G\ORAHOME10G\ADMIN\ARCSIGHT\CDUMP
    audit_trail = DB
    db_name = arcsight
    open_cursors = 2000
    pga_aggregate_target = 314572800
    PMON started with pid=2, OS id=6676
    PSP0 started with pid=6, OS id=7544
    MMAN started with pid=10, OS id=7560
    DBW0 started with pid=14, OS id=6500
    DBW1 started with pid=18, OS id=6800
    DBW2 started with pid=22, OS id=6276
    DBW3 started with pid=26, OS id=520
    LGWR started with pid=30, OS id=6756
    CKPT started with pid=34, OS id=6380
    SMON started with pid=38, OS id=7472
    RECO started with pid=42, OS id=7696
    CJQ0 started with pid=46, OS id=7912
    MMON started with pid=50, OS id=7576
    MMNL started with pid=54, OS id=6852
    Sun Feb 22 23:20:53 2009
    alter database mount exclusive
    Sun Feb 22 23:20:57 2009
    Setting recovery target incarnation to 1
    Sun Feb 22 23:20:57 2009
    Successful mount of redo thread 1, with mount id 1799551061
    Sun Feb 22 23:20:57 2009
    Database mounted in Exclusive Mode
    Completed: alter database mount exclusive
    Sun Feb 22 23:20:57 2009
    alter database open
    Sun Feb 22 23:20:58 2009
    Beginning crash recovery of 1 threads
    parallel recovery setup failed: using serial mode
    Sun Feb 22 23:20:58 2009
    Started redo scan
    Sun Feb 22 23:20:58 2009
    Completed redo scan
    0 redo blocks read, 0 data blocks need recovery
    Sun Feb 22 23:20:58 2009
    Started redo application at
    Thread 1: logseq 1137, block 3, scn 1707289029
    Sun Feb 22 23:20:58 2009
    Recovery of Online Redo Log: Thread 1 Group 5 Seq 1137 Reading mem 0
    Mem# 0: I:\ARCSIGHT\REDO\REDO5.LOG
    Mem# 1: I:\ARCSIGHT\REDO\REDO05B.LOG
    Sun Feb 22 23:20:58 2009
    Completed redo application
    Sun Feb 22 23:20:58 2009
    Completed crash recovery at
    Thread 1: logseq 1137, block 3, scn 1707309030
    0 data blocks read, 0 data blocks written, 0 redo blocks read
    Sun Feb 22 23:20:59 2009
    LGWR: STARTING ARCH PROCESSES
    ARC0 started with pid=62, OS id=6972
    Sun Feb 22 23:20:59 2009
    ARC0: Archival started
    ARC1 started with pid=66, OS id=6640
    Sun Feb 22 23:20:59 2009
    ARC1: Archival started
    LGWR: STARTING ARCH PROCESSES COMPLETE
    Thread 1 advanced to log sequence 1138
    Thread 1 opened at log sequence 1138
    Current log# 4 seq# 1138 mem# 0: G:\ARCSIGHT\REDO\REDO4.LOG
    Current log# 4 seq# 1138 mem# 1: G:\ARCSIGHT\REDO\REDO04B.LOG
    Successful open of redo thread 1
    Sun Feb 22 23:21:00 2009
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    Sun Feb 22 23:21:00 2009
    ARC0: Becoming the 'no FAL' ARCH
    ARC0: Becoming the 'no SRL' ARCH
    Sun Feb 22 23:21:00 2009
    ARC1: Becoming the heartbeat ARCH
    Sun Feb 22 23:21:00 2009
    SMON: enabling cache recovery
    Sun Feb 22 23:21:02 2009
    Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_pmon_6676.trc:
    ORA-27103: internal error
    OSD-00028: additional error information
    Sun Feb 22 23:21:02 2009
    PMON: terminating instance due to error 27103
    Sun Feb 22 23:21:02 2009
    Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_reco_7696.trc:
    ORA-27103: internal error
    Sun Feb 22 23:21:02 2009
    Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_smon_7472.trc:
    ORA-27103: internal error
    Sun Feb 22 23:21:02 2009
    Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_ckpt_6380.trc:
    ORA-27103: internal error
    Sun Feb 22 23:21:02 2009
    Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_lgwr_6756.trc:
    ORA-27103: internal error
    Sun Feb 22 23:21:03 2009
    Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_dbw3_520.trc:
    ORA-27103: internal error
    Sun Feb 22 23:21:03 2009
    Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_dbw2_6276.trc:
    ORA-27103: internal error
    Sun Feb 22 23:21:03 2009
    Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_dbw1_6800.trc:
    ORA-27103: internal error
    Sun Feb 22 23:21:03 2009
    Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_dbw0_6500.trc:
    ORA-27103: internal error
    Sun Feb 22 23:21:03 2009
    Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_mman_7560.trc:
    ORA-27103: internal error
    Sun Feb 22 23:21:04 2009
    Errors in file e:\oracle10g\orahome10g\admin\arcsight\bdump\arcsight_psp0_7544.trc:
    ORA-27103: internal error
    Instance terminated by PMON, pid = 6676
    I appreciate any input on what to look at to further isolate this issue. I ran into many other issues along the way (setting AWE_WINDOW_MEMORY to a proper size, setting db_block_buffers to a proper value, etc.) that various forum searches helped resolve, but I have not been able to find anything related to the errors I'm getting now. If I set use_indirect_data_buffers=false and scale back db_block_buffers, the instance starts without any problems. It's only when I try to enable the use of AWE that I have a problem.
    Nick
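
    To put the numbers in this post in perspective, the buffer cache requested through db_block_buffers can be compared with the AWE window size from the registry. Below is a minimal Python sketch of that arithmetic using the values quoted above. The helper name is made up for illustration, and the real memory layout of a 32-bit Oracle process involves more than these two figures, so treat it as a sanity check rather than a sizing rule.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def buffer_cache_bytes(db_block_buffers, db_block_size):
    """Total bytes requested for the buffer cache (hypothetical helper)."""
    return db_block_buffers * db_block_size


# Values quoted in the post (init.ora and registry).
db_block_buffers = 235929
db_block_size = 16384
awe_window = 1288486912

cache = buffer_cache_bytes(db_block_buffers, db_block_size)
window_blocks, leftover = divmod(awe_window, db_block_size)

print(f"buffer cache: {cache} bytes ({cache / GIB:.2f} GiB)")
print(f"AWE window:   {awe_window} bytes ({awe_window / MIB:.2f} MiB)")
print(f"window holds {window_blocks} blocks, leftover {leftover} bytes")
print(f"window / cache = {awe_window / cache:.4f}")
```

    With these values the window is a whole number of 16K blocks and works out to one third of the requested cache, so the two settings were at least chosen consistently with each other.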

    Just wanted to close out this thread in case anyone else runs into a similar problem. It turns out we ran into a bug documented in the article linked below (we're using AMD processors). Essentially, we needed to disable NUMA.
    http://blog.csdn.net/orapeasant/archive/2007/06/05/1639532.aspx
    excerpt ....
    But please be aware of Bug 4494543 - affecting 10g and fixed in Oracle 11.0 ......
    ORA-7445: CORE DUMP [ACCESS_VIOLATION] WITH USE_INDIRECT_DATA_BUFFERS=TRUE
    Rediscovery Information:
    1) Using 32-Bit Oracle on a 32-Bit Windows 2003 server running on an AMD Opteron 64-Bit chip.
    2) You have set use_indirect_data_buffers=true in init.ora
    Workaround: Basically disable NUMA feature on 32-Bit platform :-
    1) Set ENABLENUMA = FALSE in Windows registry for the Oracle Home.
    2) Set enableNUMA_optimizations = FALSE (init.ora)
    Thanks for the help. We'll see if access to the extra memory will be useful or not .....
    Nick
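
    For anyone who wants to check a parameter file for the combination described in the excerpt, here is a rough Python sketch. It is an assumption-laden illustration, not an Oracle tool: the function names are hypothetical, the NUMA parameter name is spelled exactly as it appears in the quoted excerpt (verify the exact spelling for your version), and it checks neither the ENABLENUMA registry value nor whether the server is actually a 32-bit install on an AMD Opteron.

```python
# Parameter name copied from the quoted excerpt; the exact spelling is unverified.
NUMA_PARAM = "enablenuma_optimizations"


def parse_pfile(text):
    """Parse 'sid.name=value' lines into {name: value}; the last entry wins."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop the "*." or "<sid>." prefix if there is one.
        name = key.split(".", 1)[-1].strip().lower()
        params[name] = value.strip().strip("'\"").lower()
    return params


def needs_numa_workaround(params):
    """True when indirect data buffers are on and NUMA is not explicitly off."""
    awe_on = params.get("use_indirect_data_buffers") == "true"
    numa_off = params.get(NUMA_PARAM) == "false"
    return awe_on and not numa_off


# Fragment modelled on the init.ora posted in this thread.
before = parse_pfile("""
*.db_block_size=16384
*.db_block_buffers=235929
*.use_indirect_data_buffers=true
""")

# Same fragment with the init.ora half of the workaround added.
after = parse_pfile("""
*.db_block_size=16384
*.use_indirect_data_buffers=true
enableNUMA_optimizations = FALSE
""")

print("workaround still needed (before):", needs_numa_workaround(before))
print("workaround still needed (after): ", needs_numa_workaround(after))
```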

  • After upgrade to 7.0, crash when left idle for more than a few minutes & if click contact without checking the box first.

    I had no serious problems with Firefox until downloading the current version. Since then I've had frequent crashes, usually several times a day, and even with Ctrl + Alt + Delete I have difficulty closing Firefox down. The crashes happen more frequently when I leave the program idle for, say, 15 minutes or more, or when I attempt to open a contact for editing by clicking directly on the contact name without checking the box first.
    My anti-virus software is ESET-NOD.

    I get this on Firefox. I get this on Waterfox. I get this on Nightly. I don't get it on Chrome. I don't get it on Internet Explorer. I don't get it on Comodo Dragon. I don't get it on Arora. I don't get it on Konqueror... Anyone noticing a pattern here?
    I'm willing to admit that it's probably something to do with my ISP's server, so Mozilla can feel holier than thou about it, but if that's the way they insist on going, I suggest that the next time they upgrade, they call their browser Canute.

  • iTunes crashes when selecting songs for importing from CD - any clues?

    I'm running XP Pro on a Dell Latitude 620 with the latest version of iTunes.
    When I put a CD in, iTunes shows me the tracks. But as I am selecting tracks to play (double-clicking on them), iTunes randomly crashes. It is not immediately repeatable, but it always happens within 5 minutes or so of selecting songs.
    Any clues out there?

    I had removed a couple of those artists and later replaced them with  backups from a separate hard drive and it didn't help. I was going to ditch and replace all of them today to see if that made a difference. However, before I did, I thought I'd check a couple of things as far as renaming the artists was concerned. So I added a number to one problem artist's name, then tried opening it in the 'Artist' list. It worked fine.  I went back, removed the number, and tried opening it again. Worked fine again. So I checked the other problem artists, without altering the names. They all open just fine. Problem solved! Well, not solved but mysteriously gone away, at least for now.
    I think you're right about a faulty link being the problem, and I suppose something I did reestablished that link. Don't know what it was though, so I'm afraid I can't offer a solution to anyone else with a similar problem.

  • Windows 8.1 Pro Crashes When Enabling Hyper-V

    So I bought a new HP Envy Desktop specifically for Windows Phone development and, ironically, I can't get Hyper-V running. After I enable Hyper-V in Windows Programs/Features, the machine hangs on startup and, after multiple attempts, Windows 8.1 Pro x64 eventually crashes. It's an Intel i5 with 12GB of RAM and virtualization is enabled in the BIOS. I've read through multiple threads over the last several days and tried the following:
    - Updating all local drivers and Windows updates
    - Disabling Bluetooth
    - Uninstalling Avira AV
    - Someone also suggested disabling USB 3.0 support but I don't see an option to do this in my BIOS.
    I've wasted a lot of time on this and I would really appreciate any help.
    Thanks

    Hi ericvanburen,
    I found a similar issue with the HP Envy Desktop in the HP forum; it might be due to the Bluetooth driver.
    I have quoted the general solution here:
    1. Enter into BIOS setup and set Virtualization as disabled and reboot system.
    2. After booting Windows 8, go to Control Panel and remove the Bluetooth driver.
    3. Download and install Ralink Bluetooth driver version 9.2.101.10 (SP59632)
    4. Reboot system and enable Virtualization Technology again.
    For details please refer to following link:
    http://h30434.www3.hp.com/t5/Notebook-Operating-Systems-and-Software/HP-ENVY-M6-1106er-Windows-8-Pro-hangs-up-to-start-after/m-p/2386453#M126362
    Best Regards
    Elton Ji

  • My App Store keeps crashing when I look for an app! How can I change this?

    When I load an app, my App Store crashes! :/

    Hello, I'm sorry to hear that you are experiencing what most of us are experiencing.
    This is not something any of the users can fix, so we just have to wait for Apple to release a new update.
    Here is one thing you can try to clear it up though. Try rebooting your device by holding the power button down until it prompts you to shutdown, and restart it. If that doesn't work, try holding the power button and the home button down until the device is forced into a shutdown, and restart it.
    I hope that helps a little at least.
    ~Lt. Leviathan

  • Crash when adding artwork for Films/Movies

    I have converted some DVDs to .mp4 format for use on my iPad. The files play fine in Quicktime and iTunes, but when I "Get Info" and add artwork for the file - after clicking ok, the spinning beachball appears and iTunes eventually crashes.
    The files themselves are about 3GB each. I have tried opening iTunes in safe mode, and it does add the artwork, but as soon as I deselect the video, the artwork clears again.
    Does anyone have a solution, or some tips?
    I am using Mac OS X 10.7.3 and iTunes 10.6 (40).
    Thanks!

    This is also driving me nuts... Actually, I have not even figured out how to enter a new appointment in month view. It used to be double-click, enter name, hit <tab> and fill out the rest in the panel, but this routine does not work any more.
