Urgent: Child process closed admin channel on Solaris 9 and 6.1 SP5

Hi,
I'm facing this issue on Solaris 9 with Sun AppServer 6.1 SP5: the app server crashes for some reason and restarts itself. There doesn't seem to be any specific event that leads to this error. I notice that if I leave my application running for a while I end up seeing this error, and I need to kill the instance that was running and restart the server instance.
This is what I see in the log files before the server restarts itself:
" failure (12181): CORE3107: Child process closed admin channel"
It seems like I'm not the only one seeing this issue:
http://swforums.sun.com/jive/thread.jspa?threadID=55040&messageID=210473
http://forum.sun.com/jive/thread.jspa?threadID=95718&messageID=328813
Does anybody know what could be going on?
Thanks

After the core file is written, run pstack on it. In the pstack output you will find a stack trace of the crashing function. Alas, this information is only useful when you have the sources.
Also, usually there are other, more specific lines in the errors logfile e.g.:
[27/Jan/2006:07:38:55] catastrophe (15456): CORE3260: Server crash detected (signal SIGSEGV)
[27/Jan/2006:07:38:55] info (15456): CORE3262: Crash occurred in function INTprepare_nsapi_thread from module /opt/SUNWwbsvr/bin/https/lib/libns-httpd40.so
[27/Jan/2006:07:38:55] failure (15455): CORE3107: Child process closed admin channel
If you post them, maybe somebody can give you a hint...
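To pull those more specific lines out of a large errors file, something like the sketch below could help. It is only an illustration: the function name `find_crash_context`, the regular expression, and the five-line look-back window are my own assumptions, based solely on the log format in the sample above.

```python
import re

# Matches lines such as:
# [27/Jan/2006:07:38:55] failure (15455): CORE3107: Child process closed admin channel
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+)\s+\(\s*(?P<pid>\d+)\):\s+(?P<msg>.*)$"
)


def find_crash_context(lines, window=5):
    """Pair each CORE3107 line with crash-related lines logged just before it."""
    # Keep only lines that follow the "[timestamp] level (pid): message" layout.
    parsed = [m.groupdict() for m in map(LINE_RE.match, lines) if m]
    results = []
    for i, entry in enumerate(parsed):
        if "CORE3107" not in entry["msg"]:
            continue
        # Look back a few entries for a crash report from the worker process.
        before = parsed[max(0, i - window):i]
        related = [
            e for e in before
            if e["level"] == "catastrophe" or "Crash occurred" in e["msg"]
        ]
        results.append((entry, related))
    return results
```

Called with the lines of the errors file, it returns one tuple per CORE3107 entry. An empty second element means no crash details were logged within the window, which is the situation the original poster describes.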

Similar Messages

  • "CORE3107: Child process closed admin channel" in sun webserver 6.1

    Hi,
    Does anyone know of the following error in the Sun Java Webserver 6.1 logs:
    ===========================================
    [08/Mar/2010:20:41:12] failure (27868): CORE3107: Child process closed admin channel
    [08/Mar/2010:20:41:12] info (13932): HTTP3072: [LS ls1] http://172.21.159.184:80 ready to accept requests
    [08/Mar/2010:20:41:12] failure (13932): HTTP3068: Error receiving request from 172.21.158.15 (Connection refused)
    [08/Mar/2010:20:41:12] failure (13932): HTTP3068: Error receiving request from 172.21.158.14 (Connection refused)
    [08/Mar/2010:20:41:12] failure (13932): HTTP3068: Error receiving request from 172.21.158.15 (Connection refused)
    [08/Mar/2010:20:41:12] info (13932): CORE3274: successful server startup
    [08/Mar/2010:20:41:12] failure (13932): HTTP3068: Error receiving request from 172.21.158.14 (Connection refused)
    [08/Mar/2010:20:45:13] failure (27868): CORE3107: Child process closed admin channel
    [08/Mar/2010:20:45:13] info (15075): HTTP3072: [LS ls1] http://172.21.159.184:80 ready to accept requests
    [08/Mar/2010:20:45:13] failure (15075): HTTP3068: Error receiving request from 172.21.158.15 (Connection refused)
    [08/Mar/2010:20:45:13] failure (15075): HTTP3068: Error receiving request from 172.21.158.14 (Connection refused)
    [08/Mar/2010:20:45:13] info (15075): CORE3274: successful server startup
    =============================================
    We have Sun ONE Web Server 6.1 installed on a Solaris 10 box, where we have also installed the Siebel Web Server Extension (SWSE) plugin for our Siebel application server. Any pointers on how we can fix these crashes (child process error) on the web server?
    Appreciate your help.
    Thanks,
    Mahipal

    Hello, were there ever any updates to this post? I'm having basically the same issue and would be interested in an update.

  • CORE3107: Child process closed admin channel

    I Installed SUN ONE Web server 6.1 on a UNIX machine.
    Installed 2 web applications.
    1 of them works fine (has no Oblix logic), but the other gives me the error message "CORE3107: Child process closed admin channel" and terminates the application.
    The point of failure occurs when I call Oblix to initialize.
    I have the same web application on iPlanet version 6.0 and it has no issues.
    Please help.

    That message indicates that the child process terminated unexpectedly. Given your description, the problem is likely a crash in the Oblix plugin.
    You may wish to check for core files.
    (Note that "UNIX" is not enough information for us to suggest specific steps. Sun ONE Web Server 6.1 is supported on 3 different UNIX platforms, and each has a very different set of diagnostic tools.)

  • Child process dies, nfs locks not released, webserver hangs...

    Hi,
    I have Sun One 6.1 sp 11 on a solaris 10 ldom.
    The server is configured to write logs access and error to /logs which is an NFS mount to a separate solaris 10 box. The logging to an NFS mount is a business requirement.
    Sun JWS is configured to have two httpd processes and the watchdog to restart them if one should fail.
    Every now and then, about once a day (it varies), one of the child processes will die with messages like this in the error log: (1949 is the wdog pid)
    [09/Dec/2009:14:19:06] failure ( 1949): CORE3107: Child process closed admin channel
    [09/Dec/2009:14:19:06] fine ( 1949): CORE3061: signal_handler_thread: received signal 18
    [09/Dec/2009:14:19:06] fine ( 1949): CORE3049: Primordial process detected child 1950 died: status 37
    [09/Dec/2009:14:19:06] fine ( 1949): CORE3050: Is our child, will spawn replacement
    [09/Dec/2009:14:19:06] fine ( 1949): CORE3062: Unlinking of /tmp/https-wv2-819e4c2d/.cgistub_1950 returned -1
    [09/Dec/2009:14:19:06] fine ( 1949): CORE3047: Server spawned worker process 2011
    [09/Dec/2009:14:19:06] fine ( 2011): HTTP5169: User authentication cache entries expire in 120 seconds.
    [09/Dec/2009:14:19:06] fine ( 2011): HTTP5170: User authentication cache holds 200 users
    [09/Dec/2009:14:19:06] fine ( 2011): HTTP5171: Up to 4 groups are cached for each cached user.
    [09/Dec/2009:14:19:06] fine ( 2011): HTTP4207: file cache module initialized (API versions 2 through 2)
    [09/Dec/2009:14:19:06] fine ( 2011): HTTP4302: file cache has been initialized
    [09/Dec/2009:14:19:06] fine ( 2011): HTTP3066: MaxKeepAliveConnections set to 256
    [09/Dec/2009:14:19:06] fine ( 2011): Installed configuration 1
    [09/Dec/2009:14:19:06] fine ( 2011): HTTP4193: flex-rotate-init: rotate start time is 0h, 0m
    At this point the webserver will not respond. The processes (2*httpd, 1*wdog) are running but do not respond. The access log shows a weird lock with output from pfiles:
    21: S_IFREG mode:0777 dev:340,10 ino:34988 uid:111 gid:102 size:0
    O_RDWR|O_APPEND|O_CREAT|O_LARGEFILE FD_CLOEXEC
    advisory write lock set by system 0x2 process 280
    which I think means the new http process is waiting for the lock to be released, but the lock is never freed.
    But what I'm really curious about is why the process is dying in the first place. Has anyone seen "status 37" before, or know where I can look it up? I couldn't google up any reference on what it might mean...
    any help appreciated
    cheers
    Kristin.

    I found the following in http://docs.sun.com/app/docs/doc/816-4555/rfsrefer-134?l=ja&a=view :
    In this situation, the SIGLOST signal is posted to the process. The default action for the SIGLOST signal is to terminate the process.
    For you to recover from this state, you must restart any applications that had files open at the time of the failure. Note that the following can occur.
    - Some processes that did not reopen the file could receive I/O errors.
    - Other processes that did reopen the file, or performed the open operation after the recovery failure, are able to access the file without any problems.
    Thus, some processes can access a particular file while other processes cannot.
    Edited by: Arvind_Srinivasan on Dec 10, 2009 12:33 AM
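On the "status 37" question: if the number in the CORE3049 line is the raw status returned by wait(2) (that is an assumption, the log does not say so), then in the traditional encoding the low 7 bits hold the terminating signal. 37 would then mean "killed by signal 37", and on Solaris signal 37 is SIGLOST, which fits the NFS lock explanation quoted above. A small sketch of the decoding; the function name and the two-entry signal table are only for illustration:

```python
# Solaris signal numbers relevant to the log above; numbering differs on other systems.
SOLARIS_SIGNAL_NAMES = {18: "SIGCLD", 37: "SIGLOST"}


def decode_wait_status(status):
    """Decode a raw wait(2) status using the traditional UNIX bit layout."""
    low = status & 0x7F
    if low == 0:
        # Normal exit: the exit code lives in the next 8 bits.
        return {"kind": "exited", "exit_code": (status >> 8) & 0xFF}
    if low == 0x7F:
        # Stopped (not terminated): the stop signal is in the next 8 bits.
        return {"kind": "stopped", "signal": (status >> 8) & 0xFF}
    # Terminated by a signal: low 7 bits are the signal, bit 0x80 flags a core dump.
    return {
        "kind": "signaled",
        "signal": low,
        "name": SOLARIS_SIGNAL_NAMES.get(low, "unknown"),
        "core_dumped": bool(status & 0x80),
    }


print(decode_wait_status(37))
```

Under that assumption there would be no core file to look for, since the core flag is not set in 37.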

  • BPEL child process issue

    Problem Description:
    The parent process invokes more than 10 concurrent child processes with non-blocking invoke = true. The child processes are not all invoked at the same time: some of them wait for others to complete and are only invoked afterwards. Sometimes all the child processes (tested with up to 100) are invoked at the same time, but when the test is repeated immediately afterwards they execute with different timings. For example, if I invoke 90 child processes, they execute as 83 + 7.
    Parent process time – 3 mins(set)
    Child process execution time – 2 mins
    So the parent will be alive for 3 mins. The first set of child processes is created and finishes after 2 mins, and then the next set of child processes starts (i.e. in the 2nd minute of the parent) and does not execute completely, because the parent dies 1 min after the second set is created.
    The parent process “Times Out” while waiting for response from the child processes.
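The timing described above can be worked through in a few lines. This is just the arithmetic from the description (3 minute parent timeout, 2 minute children, batches of 83 and 7), under the simplifying assumption that the second batch starts exactly when the first one finishes. `batch_schedule` is a made-up helper, not a BPEL API.

```python
def batch_schedule(parent_timeout_min, child_duration_min, batch_sizes):
    """Return start/end minutes per batch and whether it ends before the parent times out."""
    rows = []
    start = 0
    for size in batch_sizes:
        end = start + child_duration_min
        rows.append({
            "size": size,
            "start": start,
            "end": end,
            "completes": end <= parent_timeout_min,
        })
        # Assumption: the next batch is only invoked once this one has finished.
        start = end
    return rows


for row in batch_schedule(3, 2, [83, 7]):
    print(row)
```

With these numbers the second batch would need until minute 4, one minute past the parent's timeout, which matches the "Times Out" behaviour reported.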
    We have similar environment which does not behave in this way, all the processes are executed in 1 single batch.
    BPEL version: 10.1.3.3.0
    BPEL server is on top of WAS 6.1.
    "Maximum batch size" is 1.
    Thread pool size of webcontainer is min-10, max-50.
    Please let me know how to fix this issue.
    Thanks,

    http://download.oracle.com/docs/cd/B31017_01/integrate.1013/b28980/bpel_install.htm
    look at section 2.11.1

  • Child process admin thread is shutting down.

    Hi,
    We operate a web server on which the child process keeps restarting with the following error messages.
    I would like to know what the cause is.
    Version - Sun Java System Web Server 6.1
    errors
    [09/Nov/2011:12:31:00] catastrophe ( 7647): Server crash detected (signal SIGBUS)
    [09/Nov/2011:12:31:00] info ( 7647): Crash occurred in NSAPI SAF flex-log
    [09/Nov/2011:12:31:00] info ( 7647): Crash occurred in function flex_init from module /netscape/servers/bin/https/lib/libns-httpd40.so
    [09/Nov/2011:12:31:00] failure (16223): Child process admin thread is shutting down
    [09/Nov/2011:13:42:06] catastrophe (16514): Server crash detected (signal SIGSEGV)
    [09/Nov/2011:13:42:06] info (16514): Crash occurred in NSAPI SAF flex-log
    [09/Nov/2011:13:42:06] info (16514): Crash occurred in function flex_init from module /netscape/servers/bin/https/lib/libns-httpd40.so
    [09/Nov/2011:13:42:06] failure (16223): Child process admin thread is shutting down
    Edited by: 896618 on 2011. 11. 10 9:21 PM

    thanks for the response chris. in answer to your questions - no there are no NSAPI plugins installed and we are getting zero helpful output from the log files.
    /server-root/logs/errors is the only log file that has relevant output at the time of the crashes. our own application logs and the system's syslogs have nothing relevant at those times.
    the o/p from the errors log is basically :
    [19/Dec/2002:02:05:39] config ( 5815): [GC
    [19/Dec/2002:02:05:39] config ( 5815): 154915K->129640K(249216K)
    [19/Dec/2002:02:05:39] config ( 5815): , 0.0299277 secs]
    [19/Dec/2002:02:05:39] config ( 5815):
    [19/Dec/2002:02:05:59] failure ( 5814): Child process admin thread is shutting down
    at which point it resets itself. we have a load balanced system and the resets aren't noticed at the front end but i'm beginning to tear my hair out.
    We have the exact same s/w configuration on 2 x Netra T1s and they've been running fine for over a year. We have 2 brand spanking new Fire V100s & the only significant h/w difference between the machines being L2 cache (512k v 2Mb). i would've thought that a bottleneck would throw up errors all over the place and result in a noticeably slower system which isn't the case.
    our next step is to throw in an extra 512Mb RAM and see does that increase the time between resets - currently 24-36hrs. i have a niggling suspicion it may be memory related.
    any other ideas?

  • TCP connection closed but a child process of SQL Server may be holding a duplicate of the connection's socket in SQL2008R2

    Hello,
    I intermittently get the SQL error below in our production environment:
    TCP connection closed but a child process of SQL Server may be holding a duplicate of the connection's socket.  Consider enabling the TcpAbortiveClose SQL Server registry setting and restarting SQL Server. If the problem persists, contact Technical Support.
    According to a post I found on MSDN, the above error is fixed in SQL2008R-CU6, but I have the SQL2008R2-SP02 CU09 patch in the production environment and the error still occurs intermittently. I am running SQL2008R2 SP02 CU09 with Windows 2008R2-SP01.
    I would like to know if anyone has seen the same error in their SQL environment after applying the SQL2008R2-SP02 CU06 patch or later.
    Any suggestion would be helpful.
    Best regards,
    PL.

    Hello,
    What happens if you apply the registry changes explained in the workaround section of the following article?
    http://support.microsoft.com/kb/2491214
    Hope this helps.
    Regards,
    Alberto Morillo
    SQLCoffee.com

  • Anyone know how to keep a child process from closing when the main application is closed

    I have a web-based application that needs an older version of Java to run properly. I have been able to sequence this and use a shortcut to call Iexplorer.exe to open the browser in the bubble and have the old version of Java run in the same bubble. This part is working as needed, but the issue I have run into is that when an end user opens a Word document from a link, checks it out to modify it, and then closes the browser, the Word document is immediately closed along with it. So my question is: is there a way to keep a child process open when the main application is closed? Has anyone run into anything similar, or is there any documentation on a way to keep the process alive until the end user closes it?

    Hi There,
    I don't believe there is a way to handle this currently within App-V. Other virtualization products do have the ability to exclude processes and force them to run outside of the bubble, or to exclude them from terminating on shutdown.
    It would be a great feature request for a future release. You can easily request it here:
    http://appv.uservoice.com/forums/280448-microsoft-application-virtualization

  • URGENT.... Shutting down child processes

    Hi,
    My application is running under Windows. It's a core Java application.
    When I start the application it starts in a DOS window.
    In normal execution my program starts 3 child processes: A, B and C. I have also added a shutdown hook to the main program, where I destroy these processes when someone presses CTRL + C in the main program. But how do I track it if some lame user closes the DOS window? In this case the shutdown hook doesn't get invoked...
    Please help me out. I am unable to destroy the child processes when the user closes the DOS window.
    Thanks guys

    Thanks Owen.
    I went through this thread, but they are only talking about CTRL + C and not about closing the DOS window itself. When user closes the DOS window itself, there is no control coming in the program.
    Is there a way to handle this ?

  • SunMC - Process is forking and reaping child processes. What's that?

    Hey folks,
    I'm really new to the sysadmin world, and I think maybe my company really didn't think things through when they decided to put me on this, hehehe.
    I work with a general queue through which my team receives tickets for different kinds of problems, among them Automation Alerts (I think you all know what I'm talking about).
    Recently (maybe 2-4 days) we've started to receive an Automation Ticket with the following message:
    Solaris Process Monitoring Process Monitoring Base03 CPU
    time for reaped children 107.1 30.0 Process is forking and
    reaping child processes.
    I've found almost nothing about this, and even though I think I know what this ticket is about (I've closed 2 or 3 of them stating no issues were found), I really want to understand it, know where to look, and find out if something can be done about it, because the processes on the server and its general state seem to be in good condition and nothing looks bad, apparently. The server is a Solaris 10 with zones.
    Can you shed some light on this? I'd appreciate all the help you can give me.
    Thank you and regards all.

    The same thought occurred to me, except I started seeing "failure" logs in JAMF the morning after we run maintenance policies. All of these failures were due to the Mac not restarting, which isn't really a failure - more of an inconvenience - since the second part of the policy (removing eTrust antivirus and restarting is part 1, installing SEP 12.1 and restarting is part 2) runs immediately on restart no matter what time of the day it is.
    In some cases the macs did not restart because the user had unsaved work or something else that would cancel logout. In this case, Casper prompts the user to restart and waits until the OK button is clicked, then counts down 1 minute and restarts. In the other case, nothing would have prevented logout yet the computer still did not logout and then restart, leading me to believe that the reason the Mac did not log out was a locked desktop, and since I am telling System Events to simulate a gui log out, this would be blocked by a locked desktop.
    The problem basically stems from the following two issues: 1. I am not to force a logout or restart when a console user is logged in, and 2. Casper will not automatically log a console user out on its own OR restart the computer if a console user is logged in, so I have to rely on the end user to follow the on-screen instructions, since the Casper restart prompt can be moved to the side and effectively ignored.
    When no restarts are required for the various policies we run, this is not an issue. And anyway, I originally just wanted to know what process is running when a desktop is asleep and locked, but no screen saver is active...

  • Coldfusion 10 Enterprise with Tomcat + mod_jk and Apache2 experiencing child process hangups

    I am experiencing the most bizarre thing that so far I am unable to reproduce with my own visits to the site.
    After restarting Apache2 my cacti graphs show that the child processes increment consistently over the course of a day without dropping back down during off hours.  This behavior eventually leaves the website inaccessible...
    Looking at server-status it is filled with Ws (Sending Reply) and GET calls to my cfm applications :
    Current Time: Tuesday, 22-Jul-2014 16:33:00 PDT
    Restart Time: Monday, 21-Jul-2014 22:51:12 PDT
    Parent Server Generation: 0
    Server uptime: 17 hours 41 minutes 48 seconds
    Total accesses: 194844 - Total Traffic: 3.8 GB
    CPU Usage: u201.55 s34.46 cu0 cs0 - .37% CPU load
    3.06 requests/sec - 63.2 kB/second - 20.6 kB/request
    73 requests currently being processed, 4 idle workers
    WWWWWWWWWWWWWWWWWWWWWWWWWKWWWWWWWWWWWWWWWWWCWWWWW_WWWWWWCWWW_WWW _WKW...KWW.W_KWW....W........................................... ................................................................ ................................................................
    Scoreboard Key:
    "_" Waiting for Connection, "S" Starting up, "R" Reading Request,
    "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
    "C" Closing connection, "L" Logging, "G" Gracefully finishing,
    "I" Idle cleanup of worker, "." Open slot with no current process
    Srv   PID    Acc         M  CPU    SS     Req  Conn  Child  Slot    Client        VHost           Request
    0-0   15074  0/46/1370   W  7.39   46158  0    0.0   0.44   23.89   192.168.1.10  www.mysite.edu  GET /directory/?directory=department&deptexp=7000 HTTP/1.1
    1-0   11563  0/47/468    W  2.75   58867  0    0.0   4.69   13.64   192.168.1.10  www.mysite.edu  GET /catalog/index.cfm?courselist=list&dept=&searchc=PEHW%20148
    2-0   12906  0/65/884    W  7.30   54536  0    0.0   0.80   14.62   192.168.1.10  www.mysite.edu  GET /athletics/resources/nwaacc-athlete-of-the-week/ HTTP/1.1
    3-0   13840  0/41/1085   W  4.01   51162  0    0.0   0.56   20.57   192.168.1.10  www.mysite.edu  GET /directory/?directory=department&deptexp=17001 HTTP/1.1
    4-0   15928  0/20/1635   W  5.40   43715  0    0.0   0.06   41.37   192.168.1.10  www.mysite.edu  GET /directory/?directory=department&deptexp=37000 HTTP/1.1
    5-0   18774  0/19/2387   W  0.33   34564  0    0.0   0.24   52.70   192.168.1.10  www.mysite.edu  GET /directory/?directory=department&deptexp=19009 HTTP/1.1
    6-0   4321   0/36/6612   W  3.61   13200  0    0.0   0.28   129.74  192.168.1.10  www.mysite.edu  GET /directory/index.cfm?directory=department&deptexp=28011 HTT
    7-0   13077  0/0/808     W  0.42   54383  0    0.0   0.00   24.81   192.168.1.10  www.mysite.edu  GET /directory/index.cfm?directory=department&deptexp=6005 HTTP
    8-0   16488  0/118/1673  W  12.39  40692  0    0.0   1.30   35.44   192.168.1.10  www.mysite.edu  GET /directory/?directory=department&deptexp=31003 HTTP/1.1
    9-0   10726  0/15/110    W  0.58   61963  0    0.0   0.05   1.83    192.168.1.10  www.mysite.edu  GET /directory/index.cfm?directory=All&index=Q HTTP/1.1
    10-0  13154  0/1/688     W  0.00   54165  0    0.0   0.00   16.83   192.168.1.10  www.mysite.edu  GET /directory/?directory=All&firstname=Patrick&lastname=Murphy
    11-0  12590  0/25/516    W  4.45   55851  0    0.0   0.76   11.19   192.168.1.10  www.mysite.edu  GET /directory/?directory=department&deptexp=4000 HTTP/1.1
    12-0  12551  0/13/454    W  1.84   56055  0    0.0   0.38   10.00   192.168.1.10  www.mysite.edu  GET /directory/index.cfm?directory=department&deptexp=20001 HTT
    13-0  13333  0/23/626    W  3.86   53189  0    0.0   0.57   11.66   192.168.1.10  www.mysite.edu  GET /directory/?directory=department&deptexp=31005 HTTP/1.1
    14-0  12410  0/13/387    W  2.70   56484  0    0.0   0.42   10.55   192.168.1.10  www.mysite.edu  GET /directory/?directory=department&deptexp=6003 HTTP/1.1
    15-0  13162  0/70/389    W  10.81  53114  0    0.0   0.86   5.60    192.168.1.10  www.mysite.edu  GET /directory/?directory=department&deptexp=6005 HTTP/1.1
    16-0  12309  0/22/275    W  2.23   56878  0    0.0   0.43   3.91    192.168.1.10  www.mysite.edu  GET /directory/?directory=department&deptexp=20005 HTTP/1.1
    17-0  13163  0/57/341    W  11.85  53120  0    0.0   1.38   6.49    192.168.1.10  www.mysite.edu  GET /catalog/index.cfm?courselist=list&dept=&searchc=ENGR%26%20
    I have straced a hung process to only find the following :
    strace -p 6472
    Process 6472 attached - interrupt to quit
    read(23,
    Another interesting bit of info, none of these GET requests make it into my access.log file which I find very peculiar as well.
    Here are my CF Specs
    Server Details
    Server Product ColdFusion
    Version 10,0,13,287689
    Tomcat Version 7.0.23.0
    Edition Enterprise 
    Serial Number
    Operating System UNIX 
    OS Version 3.2.0-65-generic 
    Update Level /opt/coldfusion10/cfusion/lib/updates/chf10000013.jar 
    Adobe Driver Version 4.1 (Build 0001) 
    JVM Details
    Java Version 1.6.0_29 
    Java Vendor Sun Microsystems Inc. 
    Here are my Apache2 Specs
    Server version: Apache/2.2.22 (Ubuntu)
    Server built:   Apr 17 2014 21:49:25
    Server's Module Magic Number: 20051115:30
    Server loaded:  APR 1.4.6, APR-Util 1.3.12
    Compiled using: APR 1.4.6, APR-Util 1.3.12
    Architecture:   64-bit
    Server MPM:     Prefork
      threaded:     no
        forked:     yes (variable process count)
    Server compiled with....
    -D APACHE_MPM_DIR="server/mpm/prefork"
    -D APR_HAS_SENDFILE
    -D APR_HAS_MMAP
    -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
    -D APR_USE_SYSVSEM_SERIALIZE
    -D APR_USE_PTHREAD_SERIALIZE
    -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
    -D APR_HAS_OTHER_CHILD
    -D AP_HAVE_RELIABLE_PIPED_LOGS
    -D DYNAMIC_MODULE_LIMIT=128
    -D HTTPD_ROOT="/etc/apache2"
    -D SUEXEC_BIN="/usr/lib/apache2/suexec"
    -D DEFAULT_PIDLOG="/var/run/apache2.pid"
    -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
    -D DEFAULT_LOCKFILE="/var/run/apache2/accept.lock"
    -D DEFAULT_ERRORLOG="logs/error_log"
    -D AP_TYPES_CONFIG_FILE="mime.types"
    -D SERVER_CONFIG_FILE="apache2.conf"
    I am hoping this is not normal behavior for ColdFusion 10.
    Many thanks in advance.
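For anyone who wants to track this over time, the scoreboard string from the server-status output above can be tallied with a few lines of Python. This is a rough sketch: the state names come from the Scoreboard Key in the post, while `tally_scoreboard` and `busy_workers` are names made up for this example.

```python
from collections import Counter

# State characters as listed in the Scoreboard Key of the server-status page.
SCOREBOARD_KEY = {
    "_": "Waiting for Connection",
    "S": "Starting up",
    "R": "Reading Request",
    "W": "Sending Reply",
    "K": "Keepalive (read)",
    "D": "DNS Lookup",
    "C": "Closing connection",
    "L": "Logging",
    "G": "Gracefully finishing",
    "I": "Idle cleanup of worker",
    ".": "Open slot with no current process",
}


def tally_scoreboard(scoreboard):
    """Count worker states, ignoring spaces and line breaks in the pasted string."""
    return Counter(ch for ch in scoreboard if ch in SCOREBOARD_KEY)


def busy_workers(counts):
    """Workers that are doing something, i.e. neither idle ("_") nor an open slot (".")."""
    return sum(n for state, n in counts.items() if state not in "_.")


counts = tally_scoreboard("WWWW_K.C")
print(counts["W"], busy_workers(counts))
```

Feeding it the scoreboard at regular intervals would show whether the "W" count only ever grows, which is what the cacti graphs suggest.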

    We're having the same problem, although with CF11.

  • Apache POST flex2gateway never closes or times out, reaches max child processes

    We have been trying to pass an external PCI scan, and noticed some server lockups after starting a scan.  We are scanning a couple hundred IP addresses, which all resolve to the same servers.  The scans are actively looking for vulnerabilities on the box, and one of which is flash remoting.  When we look at the apache /server-status page, it shows a ton of long running flex2gateway processes.  For instance:
    Srv   PID   Acc          M  CPU   SS      Req  Conn  Child  Slot   Client     VHost                    Request
    22-4  4466  0/3817/3817  W  4.07  163840  0    0.0   57.76  57.76  x.x.x.101  WebNode2.ambassador.int  POST /flex2gateway/http HTTP/1.1
    As you can see, this POST request has been running for 163840 seconds, or nearly two days.  Since it seems these POST requests never complete, even though the client has long since disconnected, they simply stack up until the server's max number of child processes has been reached, effectively killing our webserver.
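As a quick check of that figure: the SS column in mod_status is the number of seconds since the beginning of the most recent request, so the value can be converted directly. The helper name below is just for illustration.

```python
from datetime import timedelta


def format_elapsed(seconds):
    """Render an SS value from server-status as days and hh:mm:ss."""
    return str(timedelta(seconds=seconds))


print(format_elapsed(163840))  # 1 day, 21:30:40
```

So the request has been open for roughly 1.9 days.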
    When I try to restart the clustered coldfusion instances one at a time, these POST requests do not die off.
    If I stop both clustered CF instances, the requests complete (or get killed).
    If I reload or restart apache, the requests are gone as well.
    strace gives me nothing useful:
    [root@WebNode1 ~]# strace -p 34025
    Process 34025 attached - interrupt to quit
    read(185,
    pstack gives a little more, but nothing that looks obvious to me:
    [root@WebNode1 ~]# pstack -p 34025     
    Usage: pstack <process-id>
    [root@WebNode1 ~]# pstack 34025  
    #0  0x00007fdd40444740 in __read_nocancel () from /lib64/libpthread.so.0
    #1  0x00007fdd33efe2e6 in jk_tcp_socket_recvfull () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
    #2  0x00007fdd33f1b68d in ajp_connection_tcp_get_message () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
    #3  0x00007fdd33f1ceea in ajp_get_reply () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
    #4  0x00007fdd33f20308 in ajp_service () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
    #5  0x00007fdd33ef8f5d in jk_handler () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
    #6  0x00007fdd41b92cd0 in ap_run_handler ()
    #7  0x00007fdd41b9658e in ap_invoke_handler ()
    #8  0x00007fdd41ba1c50 in ap_process_request ()
    #9  0x00007fdd41b9eac8 in ?? ()
    #10 0x00007fdd41b9a7d8 in ap_run_process_connection ()
    #11 0x00007fdd41ba6ad7 in ?? ()
    #12 0x00007fdd41ba6dea in ?? ()
    #13 0x00007fdd41ba7a6c in ap_mpm_run ()
    #14 0x00007fdd41b7e9b0 in main ()
    I don't know what that tells us exactly, but I'm leaning toward a hangup between Apache and Tomcat.
    Any suggestions on where or how to troubleshoot this issue?

    On a test server, I have removed the wildcard from the uriworkermap.properties file, so it now only matches "/flex2gateway" and "/flex2gateway/".  Unfortunately I'm still seeing the occasional hung apache worker. 
    Anyone have any leads on this issue?  I don't mind doing the research, I've just exhausted the limits of my Google Fu.
    Apache Server Status for 10.10.10.205
    Server Version: Apache/2.2.15 (Unix) DAV/2 PHP/5.3.3 mod_ssl/2.2.15 OpenSSL/1.0.1e-fips mod_wsgi/3.2 Python/2.6.6 mod_jk/1.2.32 mod_perl/2.0.4 Perl/v5.10.1
    Server Built: Oct 16 2014 14:48:21
    Current Time: Monday, 10-Nov-2014 16:49:22 EST
    Restart Time: Monday, 10-Nov-2014 15:25:16 EST
    Parent Server Generation: 0
    Server uptime: 1 hour 24 minutes 6 seconds
    Total accesses: 5313 - Total Traffic: 98.4 MB
    CPU Usage: u3.97 s1.26 cu0 cs0 - .104% CPU load
    1.05 requests/sec - 20.0 kB/second - 19.0 kB/request
    15 requests currently being processed, 11 idle workers
    WWWWWWW_W_W_W__W__W__WW_W_...................................... ................................................................ ................................................................ ................................................................
    Scoreboard Key:
    "_" Waiting for Connection, "S" Starting up, "R" Reading Request,
    "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
    "C" Closing connection, "L" Logging, "G" Gracefully finishing,
    "I" Idle cleanup of worker, "." Open slot with no current process
    Srv   PID    Acc        M  CPU   SS    Req  Conn  Child  Slot   Client       VHost           Request
    0-0   8727   0/12/12    W  0.03  4572  0    0.0   0.05   0.05   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
    1-0   8728   0/11/11    W  0.03  4358  0    0.0   0.18   0.18   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
    2-0   8729   0/38/38    W  0.04  3910  0    0.0   1.11   1.11   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
    3-0   8730   0/27/27    W  0.03  4064  0    0.0   0.79   0.79   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
    4-0   8731   0/16/16    W  0.03  4354  0    0.0   0.12   0.12   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
    5-0   8732   0/7/7      W  0.02  4564  0    0.0   0.02   0.02   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
    6-0   8733   0/8/8      W  0.02  4673  0    0.0   0.01   0.01   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
    7-0   8734   0/386/386  _  0.37  4     0    0.0   6.49   6.49   10.10.2.212  www.company.qc  GET /marketingpages/images/login_over.jpg HTTP/1.1
    8-0   9422   0/10/10    W  0.02  4564  0    0.0   0.04   0.04   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
    9-0   10112  0/393/393  _  0.37  6     0    0.0   14.59  14.59  10.10.2.212  www.company.qc  GET /marketingpages/images/box_onesource.jpg HTTP/1.1
    10-0  10468  0/321/321  W  0.32  846   0    0.0   4.42   4.42   10.10.2.212  qc.company.int  POST /flex2gateway HTTP/1.1
    11-0  10470  0/398/398  _  0.38  6     0    0.0   12.80  12.80  10.10.2.212  www.company.qc  GET /marketingpages/images/home_eco.jpg HTTP/1.1
    12-0  10471  0/340/340  W  0.32  837   0    0.0   4.99   4.99   10.10.2.212  qc.company.int  POST /flex2gateway/ HTTP/1.1
    13-0  10544  0/404/404  _  0.34  6     0    0.0   5.21   5.21   10.10.2.212  www.company.qc  GET /marketingpages/images/box_top.jpg HTTP/1.1
    14-0  10592  0/353/353  _  0.40  6     12   0.0   14.10  14.10  10.10.2.212  www.company.qc  GET /?login HTTP/1.1
    15-0  10648  0/296/296  W  0.31  800   0    0.0   3.82   3.82   10.10.2.212  qc.company.int  POST /flex2gateway/ HTTP/1.1
    16-0  12382  0/339/339  _  0.33  6     0    0.0   2.85   2.85   10.10.2.212  www.company.qc  GET /marketingpages/images/logo_sourceone.jpg HTTP/1.1
    17-0  12387  0/336/336  _  0.34  6     0    0.0   5.06   5.06   10.10.2.212  www.company.qc  GET /marketingpages/images/logo_onesource.jpg HTTP/1.1
    18-0  12388  0/265/265  W  0.25  839   0    0.0   2.87   2.87   10.10.2.212  qc.company.int  POST /flex2gateway/ HTTP/1.1
    19-0  12389  0/323/323  _  0.31  0     0    0.0   4.82   4.82   10.10.2.212  www.company.qc  GET /marketingpages/lib/dimming.js HTTP/1.1
    20-0  12390  0/336/336  _  0.31  4     0    0.0   5.24   5.24   10.10.2.212  www.company.qc  GET /marketingpages/lib/superfish.js HTTP/1.1
    21-0  12391  0/289/289  W  0.27  805   0    0.0   2.49   2.49   10.10.2.212  qc.company.int  POST /flex2gateway/ HTTP/1.1
    22-0  12392  0/281/281  W  0.27  831   0    0.0   3.17   3.17   10.10.2.212  qc.company.int  POST /flex2gateway HTTP/1.1
    23-0  14750  0/41/41    _  0.04  6     0    0.0   0.92   0.92   10.10.2.212  www.company.qc  GET /marketingpages/images/close.jpg HTTP/1.1
    24-0  14751  0/43/43    W  0.04  0     0    0.0   1.21   1.21   10.10.2.36   qc.company.int  GET /server-status HTTP/1.1
    25-0  14752  0/40/40    _  0.04  6     0    0.0   0.96   0.96   10.10.2.212  www.company.qc  GET /marketingpages/images/box_sourceone.jpg HTTP/1.1

  • To Kill Parent / Child process

    Hi ,
    I'm having a problem killing a process. I'm using "kill -9 <ppid>" (ppid is the parent process ID) to kill the process. This command kills the parent process, but the associated child process is not killed.
    I'm new to Solaris, and my question is:
    Does killing the parent process internally kill the child process too? If the child process is taking time, how can I ensure that the child process is killed before the parent process is killed?
    Thanks in advance for your response.

    > Does killing the parent process internally kill the child process too?
    Hello.
    If the parent process ends (regardless of whether by a regular exit or by a kill), it sends a signal to all its child processes. This is equivalent to "kill -xxx <child_pid>" (sorry, I do not know the number "xxx" right now).
    By default this signal kills the child process, but it is a signal that can be caught or ignored. This means the child process can tell the operating system that it does not wish to be killed when the parent is killed, when the parent exits, or when an explicit "kill -xxx" is sent. (Only two signals cannot be caught or ignored: SIGKILL and SIGSTOP, the one that pauses the process.)
    Martin
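
    The original poster saw the child survive "kill -9" on the parent, so it seems safer not to rely on any signal being passed down automatically. One common approach, sketched here in Python under the assumption that you control how the parent is launched, is to start it in its own process group and signal the whole group. The names start_in_new_group and kill_tree are made up for this illustration.

```python
import os
import signal
import subprocess

def start_in_new_group(argv):
    """Hypothetical helper: launch argv as the leader of a new session,
    so its process group ID equals its PID."""
    return subprocess.Popen(argv, start_new_session=True)

def kill_tree(proc, sig=signal.SIGTERM, timeout=5.0):
    """Hypothetical helper: signal every process in proc's group, then reap
    the leader. Escalates to SIGKILL if the leader has not exited in time."""
    try:
        os.killpg(proc.pid, sig)  # the group ID equals the leader's PID here
    except ProcessLookupError:
        pass  # the group is already gone
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait()
```

    For example, kill_tree(start_in_new_group(["sh", "-c", "sleep 30 & wait"])) signals both the shell and its background sleep. A child that moves itself into another process group or session would not be reached this way.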

  • Signal for non-child process death

    I am porting an NT system to Solaris. One process (HM) is responsible for starting groups of server processes, monitoring for death of a process, stopping/restarting/recovering the group. I know how to port this using fork/exec to start processes and SIGCHLD to monitor for death of child.
    Now, for the hard part. If this HM process dies and is restarted by the OS, it reads a text file containing all the child processes and resumes monitoring them. This is done under Win32 API because we do a WaitForMultipleObjects() call and pass the process handles.
    How can a unix process monitor for the death of non-child processes (because after HM dies and is restarted the processes that it needs to monitor are not its children anymore)?
    Can a unix process "adopt" processes from init (which would be the parent of the children after HM died)?
    I thank you in advance for your kind consideration of my questions.

    > You cannot rely on /proc - it's not standardized
    > (yet?) across different UNIX'es.
    Any information on how to find out if and/or when it will be standardized?
    > And there is no 'watchdogs' that would allow you to
    > simply get a notification when specific /proc/<pid>
    > directory vanishes - unless you want to poll it...
    > Finally, you cannot write/create arbitrary stuff in
    > /proc - it's not a real file system, just a [mostly]
    > read-only interface to the OS guts...
    I was not planning on writing to it or making up arbitrary files. I was considering opening the /proc/<pid>/as file (the /proc man page says that it contains information about the address space of the process). I would open it read-only and pass the file descriptor to a select() call in the exceptfds array. I think that this will return as an exception when the process dies, because the file goes away.
    > So I'd simply use pipes in, say, /var/tmp, or even
    > /tmp...
    My issue with using pipes is that some of the processes that I want to monitor are third-party processes that are crucial to our software's proper operation (like the processes that make up the CORBA ORB). I do not have the source code and cannot make them open up a pipe. So, I am forced to rely on what the operating system will do for me.
    Please reply with any flaws in my thinking, any improvements on my idea, etc.
    Thanks,
    Raymond Hendrey
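
    If polling is acceptable, one fallback that does not depend on /proc is to probe each PID with signal 0, which checks for existence without delivering anything. The sketch below is in Python and is only an illustration: is_alive and wait_for_any_exit are made-up names, and the thread does not say this is what was finally used.

```python
import os
import time

def is_alive(pid):
    """Hypothetical helper: True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs the checks but sends no signal
    except ProcessLookupError:
        return False     # ESRCH: no such process
    except PermissionError:
        return True      # EPERM: the process exists but belongs to another user
    return True

def wait_for_any_exit(pids, interval=1.0):
    """Hypothetical helper: poll until at least one PID has disappeared,
    then return the set of PIDs that are gone."""
    while True:
        gone = {pid for pid in pids if not is_alive(pid)}
        if gone:
            return gone
        time.sleep(interval)
```

    Two caveats: a zombie still counts as alive until its parent reaps it, and a PID can be reused by an unrelated process, so a restarted monitor may want to record something more than the bare PID in its text file.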

  • Server spawning child processes

    Has anyone ever seen the WL server process spawn child processes? Does
    anyone know what it is doing when this happens and why it does so?
    Any help or insight is appreciated.
    Thanks,
    Raymond Lavoie
    P.S.
    Here is the environment:
    WL 4.5.1
    Solaris 2.6
    JDK 1.2.1_03
    using Native I/O

    Will garbage collection clean these up?
    Our heap has slowly risen to 77% during the course of the day. Should this
    happen? Will it wait until a certain % of heap use before it runs a big
    garbage collection? I don't think we are trying to create any new
    processes.
    Thanks.
    Rob Woollen wrote in message <[email protected]>...
    Those are zombies. The kernel will keep process information around until
    the parent process collects it.
    Does your code ever attempt to create new processes?
    -- Rob
    Rob Woollen
    Software Engineer
    BEA WebLogic
    [email protected]
    Raymond Lavoie wrote:
    Can you explain what is happening here with these processes running below
    (6203 is the original weblogic process).
    web001 28294 6203 0 0:00 <defunct>
    web001 27842 6203 0 0:00 <defunct>
    web001 20125 6203 0 0:00 <defunct>
    web001 26663 6203 0 0:00 <defunct>
    web001 24262 6203 0 0:00 <defunct>
    web001 23073 6203 0 0:00 <defunct>
    web001 28293 6203 0 0:00 <defunct>
    web001 23739 6203 0 0:00 <defunct>
    web001 27718 6203 0 0:00 <defunct>
    web001 21998 6203 0 0:00 <defunct>
    web001 23276 6203 0 0:00 <defunct>
    web001 25729 6203 0 0:00 <defunct>
    web001 24547 6203 0 0:00 <defunct>
    web001 25085 6203 0 0:00 <defunct>
    web001 26779 6203 0 0:00 <defunct>
    web001 12823 6203 0 0:00 <defunct>
    web001 6203 6180 0 10:49:14 pts/3 179:11 /wl_data_1/java1.2/bin/../bin/sparc/native_threads/java -ms256m -mx256m -Dweblo
    web001 20411 6203 0 0:00 <defunct>
    web001 19491 6203 0 0:00 <defunct>
    web001 12643 6203 0 0:00 <defunct>
    web001 13558 6203 0 0:00 <defunct>
    web001 28584 6203 0 0:00 <defunct>
    web001 26548 6203 0 0:00 <defunct>
    web001 13730 6203 0 0:00 <defunct>
    web001 17209 6203 0 0:00 <defunct>
    web001 26780 6203 0 0:00 <defunct>
    web001 14659 6203 0 0:00 <defunct>
    web001 26722 6203 0 0:00 <defunct>
    web001 26161 6203 0 0:00 <defunct>
    web001 26188 6203 0 0:00 <defunct>
    web001 24546 6203 0 0:00 <defunct>
    web001 22078 6203 0 0:00 <defunct>
    web001 12528 6203 0 0:00 <defunct>
    web001 28007 6203 0 0:00 <defunct>
    web001 13101 6203 0 0:00 <defunct>
    web001 12185 6203 0 16:26:54 pts/3 0:00 /wl_data_1/java1.2/bin/../bin/sparc/native_threads/java -ms256m -mx256m -Dweblo
    Don Ferguson wrote in message <[email protected]>...
    Well, I suppose processes are forked when compiling JSPs.
    Rob Woollen wrote:
    The WL server never forks another process. It does however use
    multiple threads.
    -- Rob
    Rob Woollen
    Software Engineer
    BEA WebLogic
    [email protected]
    Raymond Lavoie wrote:
    Has anyone ever seen the WL server process spawn child processes? Does
    anyone know what it is doing when this happens and why it does so?
    Any help or insight is appreciated.
    Thanks,
    Raymond Lavoie
    P.S.
    Here is the environment:
    WL 4.5.1
    Solaris 2.6
    JDK 1.2.1_03
    using Native I/O
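
    Rob's point that the kernel keeps a finished child's entry until the parent collects it can be shown with a small generic POSIX sketch in Python. It is not WebLogic or JVM code, and reap_children is a made-up name: a parent that calls something like this from time to time will not accumulate <defunct> entries.

```python
import os

def reap_children():
    """Hypothetical helper: collect the exit status of every child that has
    already finished, without blocking. Returns a list of (pid, status)."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))  # the zombie entry is released here
    return reaped
```

    Garbage collection in the parent does not do this by itself; the exit status has to be collected by a wait-style call, or the zombies disappear only when the parent exits and init adopts and reaps them.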
