The panic protocol and Coherence 3.5

All,
We just upgraded from 3.3.1 to 3.5 but I'm having trouble forming a cluster in multi-server environments. Our config files were developed against older versions of Coherence and I had a lot of trouble with them at first, some of which is detailed here: Config file problem with new Coherence 3.5
The problem now is that we have 2 standalone nodes and 2 application nodes (WebLogic) spread across 2 physical servers (1 standalone and 1 application on each box). Previously, with Coherence 3.3.1, they all formed one happy cluster of 4 members. Now, with Coherence 3.5, they form separate clusters: each physical machine makes a cluster of 2 members. At startup, I can see the 2-node clusters form. Some time later (not immediately) I see the "unexpected cluster heartbeat" message warning about getting a heartbeat from the other physical server. Clearly the members of the different servers can communicate to some degree if they get these unexpected heartbeats. But why don't they form a cluster in the first place?
If I understand the config correctly, we're using a ttl of 4, the default. I ran the multicast test and a ttl of 1 worked also. I think the join timeout is 30000.
When the standalone node starts, it outputs a ttl of 4 and the expected cluster address and port to the log.
One wrinkle in the config is that there are 2 applications deployed to the same WebLogic JVM that both use Coherence. They are in separate classloaders and use unique cluster ports. This hasn't been a problem in the past. Now, however, my app is Coherence 3.5 and the other one is still 3.3.1. The Coherence jars are not shared and the startup params apply to both applications.
In the past I've seen errors where 2 nodes weren't using the same coherence version, same cluster name, etc. but I don't see anything like that now.
thanks
john

Hi John,
The clustering technologies did not change between 3.3 and 3.5. The fact that you could establish a multicast-based cluster in 3.3 and not in 3.5 is therefore quite odd. My initial guess would be that your network may be blocking certain multicast address/port ranges. Are you using the same multicast address and port as you'd successfully used in 3.3? Also, please use this address and port when running the multicast test, to make it as close as possible to the medium on which Coherence is trying to operate.
If none of these suggestions resolves the issue, can you please post the following:
- multicast test output from all nodes running the test concurrently
- Coherence logs from all nodes, including startup and panic
- Coherence operational configuration
Regarding the mix of Coherence 3.3 and 3.5 in the same JVM: so long as they are classloader-isolated and running on a different multicast address/port, you should be fine. Note I'm suggesting that both the address and the port be different. Some OSs (Linux) have issues related to not taking the port into consideration during multicast packet delivery. It wouldn't hurt to try starting 3.5 without the 3.3 app running, just to ensure that it isn't causing your troubles in some unforeseen way.
thanks,
Mark
Oracle Coherence
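
The advice above, that two clusters sharing a JVM or host should differ in both multicast address and port, can be checked mechanically before startup. The following is only a minimal Java sketch under stated assumptions: the class name MulticastEndpoint and the method fullyIsolated are hypothetical helpers, not part of the Coherence API, and the addresses and ports used with it are made-up example values rather than anything taken from John's configuration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration only; not part of the Coherence API.
final class MulticastEndpoint {
    final InetAddress address;
    final int port;

    MulticastEndpoint(String literalIp, int port) {
        InetAddress parsed;
        try {
            // A literal IP address is parsed locally, without a DNS lookup.
            parsed = InetAddress.getByName(literalIp);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("bad address: " + literalIp, e);
        }
        if (!parsed.isMulticastAddress()) {
            throw new IllegalArgumentException(literalIp + " is not a multicast address");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        this.address = parsed;
        this.port = port;
    }

    // True only when BOTH the address and the port differ. Differing in the
    // port alone is treated as not isolated, because some operating systems
    // reportedly ignore the port when delivering multicast packets.
    static boolean fullyIsolated(MulticastEndpoint a, MulticastEndpoint b) {
        return !a.address.equals(b.address) && a.port != b.port;
    }
}
```

With two example endpoints that share an address but use different ports, fullyIsolated returns false; it returns true only once the addresses differ as well. The check says nothing about whether the network actually forwards those groups, which is what the multicast test is for.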

Similar Messages

  • Life after the panic protocol

    We have 2 servers that run on GridGain containing Coherence distributed caches (version 3.4.1). The nodes in both servers are used for processing events. Those events are stored in the cache.
    When the network connection between server A and B fails, each one will continue in its own cluster island. Once the connection is established, Coherence will first log a message like the following:
    10 Mar 2009 10:07:32,748 [Logger@9257178 3.4.1/407] WARN Coherence - 2009-03-10 10:07:32.747/88550.631 Oracle Coherence GE 3.4.1/407 <Warning> (thread=Cluster, member=1): The member formerly known as Member(Id=6, Timestamp=2009-03-10 09:07:40.389, Address=192.168.1.7:8088, MachineId=40455, Location=process:29423, Role=ServerMain) has been forcefully evicted from the cluster, but continues to emit a cluster heartbeat; henceforth, the member will be shunned and its messages will be ignored.
    and half a minute later it will log the following:
    10 Mar 2009 10:08:02,803 [Logger@9257178 3.4.1/407] WARN Coherence - 2009-03-10 10:08:02.803/88580.687 Oracle Coherence GE 3.4.1/407 <Warning> (thread=Cluster, member=1): An existence of a cluster island with senior Member(Id=6, Timestamp=2009-03-10 09:02:07.28, Address=192.168.1.7:8088, MachineId=40455, Location=process:29423, Role=ServerMain) containing 5 nodes have been detected. Since this Member(Id=1, Timestamp=2009-03-09 09:31:52.149, Address=192.168.1.6:8088, MachineId=40454, Location=process:6853, Role=ServerMain) is the senior of an older cluster island, the panic protocol is being activated to stop the other island's senior and all junior nodes that belong to it.
    All this makes sense. However there's about 30 seconds between the time the network connection was reestablished and the time the cache from the "bad" cluster island was restarted. During those 30 seconds we are already assuming that the nodes from the "bad" cluster island can be used for processing, so events are already added to the cache on the nodes of the "bad" cluster. After the panic protocol the caches are restarted and the events that were added in those last 30 seconds are gone.
    There are two solutions that come to my mind.
    1. We make sure that we don't consider those rejoined nodes for processing events until after the panic protocol is resolved. Could we use a MemberListener for that? Will we only get a MemberListener.memberJoined() after the panic protocol is executed?
    2. We already use those rejoined nodes for event processing, but we restart any event processing once we get notified of the occurrence of the panic protocol. Is there a way we can listen for such an event indicating the cluster has been restarted?
    Best regards
    Jan

    Hi, the problem is we don't know in advance how many members the cluster will contain, or even how many nodes each server will contain or how many servers there will be in the cluster. So stopping event processing when the number of members in the cluster drops to a certain level won't work.
    However we could keep a list of the servers that aren't available anymore and when the connection is reestablished, wait for the members to reappear in the cluster before considering them for event processing.
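
The bookkeeping described in the last reply (remember which servers dropped out, and only use them again once their members reappear in the cluster) could be sketched roughly as follows. This is a library-free Java illustration under stated assumptions: RejoinTracker and its method names are hypothetical, and the string keys stand in for whatever identifies a server in the application. In a real deployment the two callbacks would presumably be driven from a Coherence MemberListener (memberLeft and memberJoined). Whether memberJoined is only delivered after the panic protocol has finished is not confirmed anywhere in this thread, so that remains an open assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker for illustration only; not part of the Coherence API.
final class RejoinTracker {
    // Servers whose members were seen leaving and have not yet rejoined.
    private final Set<String> lostServers = new HashSet<>();

    // Call when the members of a server leave the cluster view.
    synchronized void memberLeft(String server) {
        lostServers.add(server);
    }

    // Call when the members of a server (re)appear in the cluster view.
    synchronized void memberJoined(String server) {
        lostServers.remove(server);
    }

    // A server is eligible for event processing unless it is currently
    // marked as lost. Servers never seen before are eligible, since the
    // cluster size is not known in advance.
    synchronized boolean isEligible(String server) {
        return !lostServers.contains(server);
    }
}
```

The point of the design is that network reachability alone does not make a server eligible again; only a join event observed in the surviving cluster does. A server that hosts several nodes would need a per-node key or a counter, which this sketch leaves out.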

  • Solaris 10 gen the panic message and let the server auto reboot

    My Solaris 10 server reboots by itself at random times.
    I think it is an ip:tcp_unfuse problem, but I don't know how to fix it.
    Does anyone know how to fix this problem?
    Many Thanks!
    The following is error information
    # mdb 0
    ::status
    debugging crash dump vmcore.0 (32-bit) from tophkweb
    operating system: 5.10 Generic (i86pc)
    panic message:
    BAD TRAP: type=e (#pf Page fault) rp=d4a76acc addr=9d occurred in module "ip" due to a NULL pointer dereference
    dump content: kernel pages only
    panic[cpu0]/thread=d4a48c00:
    BAD TRAP: type=e (#pf Page fault) rp=d4a76acc addr=9d occurred in module "ip" due to a NULL pointer dereference
    ::msgbuf
    ...........
    httpd:
    #pf Page fault
    Bad kernel fault at addr=0x9d
    pid=277, pc=0xfeaa24b7, sp=0xd479f300, eflags=0x10202
    cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6d8<xmme,fxsr,pge,mce,pse,de>
    cr2: 9d cr3: 1e472000
    gs: fea401b0 fs: fea80000 es: d4530160 ds: ffff0160
    edi: d4594800 esi: 0 ebp: d4a76b18 esp: d4a76afc
    ebx: d479f440 edx: d3e83b40 ecx: 0 eax: a1
    trp: e err: 0 eip: feaa24b7 cs: 158
    efl: 10202 usp: d479f300 ss: d479f440
    d4a76a2c unix:die+a7 (e, d4a76acc, 9d, 0)
    d4a76ab8 unix:trap+f56 (d4a76acc, 9d, 0)
    d4a76acc unix:cmntrap+83 ()
    d4a76b18 ip:tcp_unfuse+2b (d479f440)
    d4a76bf4 ip:tcp_rput_data+4a (d479f300, d4594800,)
    .......

    other information:-
    ::regs
    %cs = 0x0158 %eax = 0x000000a1
    %ds = 0xffff0160 %ebx = 0xd479f440
    %ss = 0xd479f440 %ecx = 0x00000000
    %es = 0xd4530160 %edx = 0xd3e83b40
    %fs = 0xfea80000 %esi = 0x00000000
    %gs = 0xfea401b0 %edi = 0xd4594800
    %eip = 0xfeaa24b7 tcp_unfuse+0x2b
    %ebp = 0xd4a76b18
    %esp = 0xd4a76afc
    %eflags = 0x00010202
    id=0 vip=0 vif=0 ac=0 vm=0 rf=1 nt=0 iopl=0x0
    status=<of,df,IF,tf,sf,zf,af,pf,cf>
    %uesp = 0xd479f300
    %trapno = 0xe
    %err = 0x0
    $c
    tcp_unfuse+0x2b(d479f440)
    tcp_rput_data+0x4a(d479f300, d4594800, d3e8ef40)
    tcp_input+0x39(d479f300, d4594800, d3e8ef40)
    squeue_enter+0xb5(d3e8ef40, d4594800, feaae3c9, d479f300, 1)
    ip_fanout_tcp+0x38e(d5336b60, d4594800, da979714, d3e83ba0, a3, 0)
    ip_wput_local+0x1bb(d5336b60, da979714, d3e83ba0, d4594800, d45318c0, 0)
    ip_wput_ire+0x1436(d5336b60, d4594800, d45318c0, d513d540, 2)
    ip_output+0x70a(d513d540, d4594800, d5336b60, 2)
    tcp_send_data+0x5f9(d513d680, d5336b60, d4594800)
    tcp_xmit_end+0x88(d513d680)
    tcp_close_output+0x115(d513d540, d513d890, d3e8ef40)
    squeue_enter+0x1bf(d3e8ef40, d513d890, feaa5924, d513d540, 6)
    tcp_close+0x6c(d5336ad8, 83, dad03b48)
    qdetach+0x8b(d5336ad8, 1, 83, dad03b48, 0)
    strclose+0x2e3(d4675000, 83, dad03b48)
    socktpi_close+0xd7(d4675000, 83, 1, 0, 0, dad03b48)
    fop_close+0x26(d4675000, 83, 1, 0, 0, dad03b48)
    closef+0x56(d508d950)
    closeandsetf+0x2e7(16, 0)
    close+0xd()
    sys_sysenter+0xdc()
    tcp_unfuse::dis
    tcp_unfuse: pushl %ebp
    tcp_unfuse+1: movl %esp,%ebp
    tcp_unfuse+3: pushl %ebx
    tcp_unfuse+4: pushl %esi
    tcp_unfuse+5: movl 0x8(%ebp),%ebx
    tcp_unfuse+8: movl 0x2cc(%ebx),%esi
    tcp_unfuse+0xe: movb 0x9d(%ebx),%al
    tcp_unfuse+0x14: testb $0x1,%al
    tcp_unfuse+0x16: jne +0x15 <tcp_unfuse+0x2b>
    tcp_unfuse+0x18: leal 0x2d0(%ebx),%eax
    tcp_unfuse+0x1e: pushl %eax
    tcp_unfuse+0x1f: pushl %ebx
    tcp_unfuse+0x20: pushl 0x40(%ebx)
    tcp_unfuse+0x23: call +0x75b <tcp_fuse_rcv_drain>
    tcp_unfuse+0x28: addl $0xc,%esp
    tcp_unfuse+0x2b: movb 0x9d(%esi),%al
    tcp_unfuse+0x31: testb $0x1,%al
    tcp_unfuse+0x33: jne +0x15 <tcp_unfuse+0x48>

  • AS2  as transport protocol and AS2xml as message protocol

    Dear All
    The AS2 adapter has been installed in our landscape. When I try to select the transport protocol and message protocol, I can only see AS2 as the transport protocol and AS2xml as the message protocol, instead of the normal HTTP/HTTPS for the transport protocol and AS2 as the message protocol. Could anyone tell me if there is a problem? We need to send XML messages over the AS2 adapter and receive MDNs.

    Hi Arjun,
    Refer to the forum threads below, which discuss the configuration that needs to be done for the AS2 adapter. They should be helpful:
    Re: Pls.. Help Needed.. Seeburger Mapping Names..!!
    Re: Seeburger AS2 adapter...
    Re: AS2 Module tab.. Mapping Names for modified Standard Msg types ? ? BIC ??
    AS2 adpater-- Configuration details for both SND and RCV.
    Re: Regarding Seeburger AS2 Adapter
    Regards,
    Vinod.

  • Transport protocol and authentication method

    Hi gurus,
    I am trying to configure EBP-SUS, but I do not have access to Solution Manager.
    I am working on SRM_SERVER 5.5.
    Can somebody who has configured EBP-SUS give me more information about the transport protocol and the authentication method?

    solved by self

  • Retrieving protocol and connection address in runtime

    Hi,
    I'm trying to debug a P2P video application in the browser, and I noticed some problems when using different rooms in my application.
    I need to figure out the actual protocol and node connection in order to identify the problem. The thing is that this information is only available in the trace while testing in the IDE.
    Is there another way to find out the FMS node name and protocol after the actual connection is made?
    Best Regards,
    Gadi Srebnik

    With the latest SDK you can override the debug trace function and have the messages go, for example, in a textarea in your app.
    Look at com.adobe.rtc.util.DebugUtil.traceFunction.
    Otherwise you can dig in the ConnectSession object and get the underlying NetConnection object.

  • What protocol and port number(s) does BEx Analyzer use?

    I'm trying to find out if it's possible to packet shape BEx traffic to give it a higher priority. Is BEx using the same protocol and ports as the SAP GUI?

    Hi,
    Download TCPView from the following link to see the processes and ports used by SAP GUI and BEx:
    http://download.sysinternals.com/Files/TcpView.zip
    In addition to TCPView, you may install sniffer software on your computer to capture and analyze network packets.
    Regards.

  • GoldenGate and Coherence

    Hi,
    There is a recent requirement where a customer has asked for GoldenGate integration with Oracle Coherence.
    Can anyone provide some pointers? I have browsed a lot but didn't find anything substantial.
    I am looking for documentation or similar. I have also read that this integration is expected in 2012 and is not possible until then.
    Some of the material I browsed suggests:
    1) Using DCN or AQ (along with some triggers and JMS listeners) to publish changes to Coherence.
    2) Some posts suggested using the GG Java API.
    3) Others suggested using the GG JMS Adapter.
    Which one is best, or is there another option?
    Thanks,
    Vikas

    The out-of-the-box GoldenGate integration with Coherence and TopLink to keep a Coherence cache in-sync with the database (for those times when the database is updated not through the cache) was a coordinated development effort amongst the GoldenGate, TopLink and Coherence products. This feature will work in the upcoming GoldenGate v11.2 "adapters" release, but I just found out myself that this will be shipped with Coherence, as part of the 12c release (but will work in any OGG 11.2 or later release).
    In the meantime, you've correctly enumerated your available options.

  • My iPhone 4 keeps shutting down :/ lately it's taken 2hrs to come back  up. Diagnostics shows panic.plist and I've had my phone barely over a year. Will Apple replace it? As I hear I need to make an appt at the apple store.

    My iPhone 4 keeps shutting down. The past 3 days it's taken at least 2 hrs to come back on. It constantly has no service and tells me system restore is needed. Under diagnostics it shows panic.plist. I've had my phone a year and 3 months. I hear I need to go to Apple to get a replacement as this is a software issue. It isn't under warranty. Will they replace it or is it a waste of my time? I recently just got the iOS update and now my phone gets hot to the touch also. Please help. My upgrade isn't until summer of next year but I need a phone that works and will get signal!

    Make an appointment at
    http://www.apple.com/retail/geniusbar/
    Out of Warranty replacement for iPhone 4 is $149 US;
    you will get the same model, color and capacity you have.
    If your current iPhone was locked to a wireless provider,
    the replacement will be locked to that same provider.

  • What's the proper protocol for a reset on my ipod touch 4g?  iOS 6 has totally jacked it up and it will no longer do anything but crash, and won't sync with itunes wirelessly or by cable.

    What's the proper protocol for a reset on my ipod touch 4g?  iOS 6 has totally jacked it up and it will no longer do anything but crash, and won't sync with itunes wirelessly or by cable.
    It's a 64G ipod touch and was fine till Apple told me to upgrade to ios 6.  Now most of my apps crash, my music won't play and I just get a white screen when I hit Music.
    When I try to sync to iTunes it acts like it's going to sync and appears to recognize the iPod, but it's grayed out and has an update circle by it that spins for a while until iTunes eventually freezes altogether. Is there a way to go back to iOS 5 after an erase and reset?

    iOS: Unable to update or restore

  • File, Send link doesn't open a new email. Using Firefox 11.0. Outlook 2010 is the Mailto default and W7 default email program. On the About:config page network.protocol-handler.external.mailto is set to regular font (not bold) "default Boolean true".

    File, Send link doesn’t open a new email. Running Firefox 11.0. Outlook 2010 is the Mailto default and the W7 default email program. On the About:config page, network.protocol-handler.external.mailto is set to regular font (not bold) “default Boolean true”.

    I assume you have tried toggling the setting in Firefox between Outlook and, say, Gmail:
    orange Firefox button or classic Tools menu > Options > Applications
    In the search box, type or paste mailto and pause for the list to filter.
    Change the setting and OK to save it, then return to the dialog, change back, and OK again.
    You also might want to toggle the setting at the OS level between Microsoft Outlook and the native Windows Mail client in a similar fashion. In Windows XP you could use IE's Options dialog, Programs tab, for this, but I'm not sure in Windows 7.
    Since one possibility is a problem in your Firefox settings (including the possibility of interfering add-ons), and another is a problem at the Windows level (e.g., Registry settings), it would be useful to try to identify which one it is. One quick way to distinguish is to create a new Firefox profile. It will start up with all factory settings. You can switch back to your existing profile after testing.
    First, I recommend backing up your Firefox settings in case something goes wrong. See Backing up your information (https://support.mozilla.com/en-US/kb/Backing+up+your+information). (You can copy your entire Firefox profile folder somewhere outside of the Mozilla folder.)
    After closing Firefox, start up again in the Profile Manager as described in this article: Managing profiles (http://support.mozilla.com/kb/Managing+profiles).
    With the new profile, can Firefox successfully create a message in Outlook?

  • Hello I am not able to published to the web using an FTP the test has a negative response  I do not know what is required in Directory/path Protocol and port

    Hello, I am not able to publish to the web using FTP; the connection test gives a negative response. I do not know what is required for Directory/path, Protocol, and Port.

    If you use FTP then ftp is the protocol and 21 is the port.
    Your webhoster will tell you what path to use.
    You probably can read it in the FAQ/Help/Support pages where you host your website.
    All difficult words are explained in manuals, dictionaries or wikis.

  • RDS 2012 - Slow Performance, random disconnects - The RDP protocol component X.224 detected an error (0) in the protocol stream and the client was disconnected.

    We have an RDS environment configured on server 2012 with approx. 20 users connecting for remote app utilization across 4 different locations that are connected via VPN. Server 2012 has great resources from the virtual host so system resource allocation shouldn't be an issue. I'm thinking these errors are correlating with the performance problems. Any recommendations on how to effectively end these errors or to boost performance?
    RDS Log File
    Log Name:      Microsoft-Windows-RemoteDesktopServices-RdpCoreTS/Operational
    Source:        Microsoft-Windows-RemoteDesktopServices-RdpCoreTS
    Date:          3/3/2015 7:47:51 PM
    Event ID:      97
    Task Category: RemoteFX module
    Level:         Warning
    Keywords:     
    User:          NETWORK SERVICE
    Computer:      REMOTE1.mzltg.local
    Description: The RDP protocol component X.224 detected an error (0) in the protocol stream and the client was disconnected.
    System Log Error
    Log Name:      System
    Source:        Schannel
    Date:          3/4/2015 10:42:02 AM
    Event ID:      36887
    Task Category: None
    Level:         Error
    Keywords:     
    User:          SYSTEM
    Computer:      REMOTE1.mzltg.local
    Description: A fatal alert was received from the remote endpoint. The TLS protocol defined fatal alert code is 49.

    Hi Shane,
    Do you have any progress at the moment?
    Regarding the TLS error code 49, it indicates a valid certificate was received, but when access control was applied, the sender did not proceed with negotiation.
    More information for you:
    SSL/TLS Alert Protocol & the Alert Codes
    http://blogs.msdn.com/b/kaushal/archive/2012/10/06/ssl-tls-alert-protocol-amp-the-alert-codes.aspx
    Best Regards,
    Amy
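
For reading Schannel events like the one above, the numeric alert code can be translated into the name used by the TLS specification. The following Java sketch is a small, partial lookup of alert descriptions as defined in RFC 5246; the class name TlsAlerts and the method describe are hypothetical, and the table deliberately covers only a handful of certificate-related and handshake-related codes rather than the full list.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup for illustration; a partial table of TLS alert
// descriptions as named in RFC 5246, not a complete list.
final class TlsAlerts {
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        NAMES.put(0, "close_notify");
        NAMES.put(40, "handshake_failure");
        NAMES.put(42, "bad_certificate");
        NAMES.put(43, "unsupported_certificate");
        NAMES.put(44, "certificate_revoked");
        NAMES.put(45, "certificate_expired");
        NAMES.put(46, "certificate_unknown");
        NAMES.put(48, "unknown_ca");
        NAMES.put(49, "access_denied");
    }

    // Returns the alert name, or a placeholder for codes not in the table.
    static String describe(int code) {
        return NAMES.getOrDefault(code, "unknown(" + code + ")");
    }
}
```

Calling TlsAlerts.describe(49) gives access_denied, which matches the explanation in the reply above. The name alone does not say which side rejected the connection or why, so the linked alert-code article is still worth reading.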

  • Java API that implements the SSH, SFTP and Telnet protocols

    Hi,
    I'm looking for a Java API that implements the SSH, SFTP and Telnet protocols. Does anyone have a suggestion?
    Any suggestions are really appreciated.
    Thanks,
    Avin

    I believe SSH and Telnet are used for interactive command-line sessions; I don't know how you want to use them in a program.

  • EVENT 36888, Schannel A fatal alert was generated and sent to the remote endpoint. This may result in termination of the connection. The TLS protocol defined fatal error code is 43. The Windows SChannel error state is 252.

    I keep losing my network connection for a few seconds at a time.  Not  a big deal unless I just spent time filling in a form and have to redo it.
    Getting:
    A fatal alert was generated and sent to the remote endpoint. This may result in termination of the connection. The TLS protocol defined fatal error code is 43. The Windows SChannel error state is 252. Using Windows 8. I just installed the new ARRIS TG862 provided by Comcast.
    Any Ideas?
    Also get the following errors in my events:
    The name "WORKGROUP      :1d" could not be registered on the interface with IP address 10.0.0.2. The computer with the IP address 10.0.0.3 did not allow the name to be claimed by this computer.
    Realtek PCIe GBE Family Controller is disconnected from network.
    Any help is appreciated

    Hi,
    The critical Kernel-Power event ID 41 usually appears after the PC restarts unexpectedly, either with a BugcheckCode listed or after a cold reboot. Do you get a BSOD and any dump files?
    The default location is %SystemRoot%\Minidump. You can upload them to SkyDrive, then paste the link here.
    How to use SkyDrive
    http://www.wikihow.com/Use-SkyDrive
    Kernel-PnP event ID 219: A Plug and Play device driver on your system is failing to load due to a device driver or device malfunction. You can unplug any external devices (except mouse and keyboard, but please keep their drivers up to date) and check the device status in Device Manager. Please also keep all the drivers on your PC up to date.
    And for error 36888, I found a similar thread, please refer to this link
    http://social.technet.microsoft.com/Forums/windowsserver/en-US/4c5430f5-43f6-41b4-97d3-03cfb3efa70b/schannel-error-event-id-36888-is-there-a-way-to-identify-what-causes-schannel-to-log-error?forum=winserverDS
    Regards
    Yolanda
    TechNet Community Support
