CMS collector tuning

Hi all,
I have a server which monitors a great number of devices. Each value received from the devices is stored in memory for about 5 seconds. At maximum load I have about 400 MB of data to keep in memory. What I'd like to achieve is storing as few objects in the Old Generation as possible, since I observe 'concurrent mode failures' of CMS about twice per day, which cause long pauses in the application execution.
I thought that if I increased the size of the Young Generation, objects would stay longer in the Young Generation and be collected before ever reaching the Old Generation. But that seems not to be the case: the JVM doesn't even commit the maximum size for the Young Generation, and I haven't seen any change in the age thresholds chosen by the JVM.
What could the typical GC settings be for such an application?
For the moment we configure it like this:
-Xms1024m -Xmx2048m
-server
-XX:NewRatio=12
-XX:SurvivorRatio=2
-XX:-DisableExplicitGC
-XX:CompileThreshold=50
-XX:CMSInitiatingOccupancyFraction=50
-XX:+UseParNewGC
-XX:ParallelGCThreads=13
-XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode
-XX:CMSIncrementalDutyCycleMin=0
-XX:CMSIncrementalDutyCycle=10
-XX:+CMSParallelRemarkEnabled
-XX:ParallelCMSThreads=4
-XX:+CMSClassUnloadingEnabled
-XX:CMSFullGCsBeforeCompaction=1
We've got a 16-CPU blade with 8 GB RAM.
Thank you very much in advance for any kind of help!

[What is CMS failure|http://blogs.sun.com/jonthecollector/entry/what_the_heck_s_a]
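Since the idea is to test whether objects can die in the young generation before promotion, one hedged starting point is to pin the young generation explicitly and make the JVM print its tenuring decisions. This is only a sketch: the sizes are illustrative assumptions, not a recommendation, and `MonitoringServer` is a placeholder for the real entry point.

```shell
# Sketch: pin the young generation explicitly (-Xmn overrides -XX:NewRatio)
# and log tenuring decisions so the age threshold the JVM picks is visible.
# Sizes are illustrative; MonitoringServer is a placeholder class name.
java -server -Xms2048m -Xmx2048m -Xmn768m \
     -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=15 \
     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
     -XX:+PrintTenuringDistribution -XX:+PrintGCDetails \
     MonitoringServer
```

Setting -Xms equal to -Xmx also removes one reason the young generation never reaches its maximum: with -Xms1024m the heap may simply never grow to the point at which NewRatio would yield the larger young generation.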

Similar Messages

  • CMS collector taking too long pauses due to fragmentation

    we are using WebLogic 10gR3 servers with JDK 160_23 for an ODSI application, with the CMS collector for garbage collection. We are seeing ParNew promotion failures due to fragmentation, which end in CMS stop-the-world pauses of more than 30 seconds every 12-13 hours; apart from these, CMS normally pauses the application for only 0.03-0.05 seconds. Here are the JVM arguments we are using and the GC logs that show the ParNew promotion failures.
    /opt/oracle/10gR3/jdk160_23/jre/bin/java -Dweblogic.Name=member3MS1 -Djava.security.policy=/opt/oracle/10gR3/wlserver_10.3/server/lib/weblogic.policy -Dweblogic.management.server=http://wdcsn443a.sys.cigna.com:7001 -Djava.library.path=/opt/oracle/10gR3/jdk160_23/jre/lib/sparc/client:/opt/oracle/10gR3/jdk160_23/jre/lib/sparc:/opt/oracle/10gR3/jdk160_23/jre/../lib/sparc:/opt/oracle/10gR3/patch_wlw1030/profiles/default/native:/opt/oracle/10gR3/patch_wls1030/profiles/default/native:/opt/oracle/10gR3/patch_cie670/profiles/default/native:/opt/oracle/10gR3/patch_aldsp1030/profiles/default/native:/opt/oracle/10gR3/patch_wlw1030/profiles/default/native:/opt/oracle/10gR3/patch_wls1030/profiles/default/native:/opt/oracle/10gR3/patch_cie670/profiles/default/native:/opt/oracle/10gR3/patch_aldsp1030/profiles/default/native:.:/opt/oracle/10gR3/wlserver_10.3/server/native/solaris/sparc:/opt/oracle/10gR3/wlserver_10.3/server/native/solaris/sparc/oci920_8:/opt/oracle/10gR3/wlserver_10.3/server/native/solaris/sparc:/opt/oracle/10gR3/wlserver_10.3/server/native/solaris/sparc/oci920_8:/opt/oracle/10gR3/wlserver_10.3/server/native/solaris/sparc:/opt/oracle/10gR3/wlserver_10.3/server/native/solaris/sparc/oci920_8:/usr/jdk/packages/lib/sparc:/lib:/usr/lib 
-Djava.class.path=/opt/oracle/10gR3/user_projects/lib/commons-lang-2.4.jar:/opt/oracle/10gR3/user_projects/lib/log4j-1.2.15.jar:/opt/oracle/10gR3/modules/com.bea.common.configfwk_1.2.0.0.jar:/opt/oracle/10gR3/modules/com.bea.core.xquery.beaxmlbeans-interop_1.3.0.0.jar:/opt/oracle/10gR3/modules/com.bea.core.xquery.xmlbeans-interop_1.3.0.0.jar:/opt/oracle/10gR3/modules/com.bea.core.binxml_1.3.0.0.jar:/opt/oracle/10gR3/modules/com.bea.core.sdo_1.1.0.0.jar:/opt/oracle/10gR3/modules/com.bea.core.xquery_1.3.0.0.jar:/opt/oracle/10gR3/modules/com.bea.alsb.client_1.1.0.0.jar:/opt/oracle/10gR3/modules/com.bea.common.configfwk.wlinterop_10.3.0.0.jar:/opt/oracle/10gR3/patch_wss110/profiles/default/sys_manifest_classpath/weblogic_patch.jar:/opt/oracle/10gR3/patch_wls1001/profiles/default/sys_manifest_classpath/weblogic_patch.jar:/opt/oracle/10gR3/patch_cie650/profiles/default/sys_manifest_classpath/weblogic_patch.jar:/opt/oracle/10gR3/patch_aldsp320/profiles/default/sys_manifest_classpath/weblogic_patch.jar:/opt/oracle/10gR3/jdk160_23/lib/tools.jar:/opt/oracle/10gR3/wlserver_10.3/server/lib/weblogic_sp.jar:/opt/oracle/10gR3/wlserver_10.3/server/lib/weblogic.jar:/opt/oracle/10gR3/modules/features/weblogic.server.modules_10.0.1.0.jar:/opt/oracle/10gR3/modules/features/com.bea.cie.common-plugin.launch_2.1.2.0.jar:/opt/oracle/10gR3/wlserver_10.3/server/lib/webservices.jar:/opt/oracle/10gR3/modules/org.apache.ant_1.6.5/lib/ant-all.jar:/opt/oracle/10gR3/modules/net.sf.antcontrib_1.0b2.0/lib/ant-contrib.jar:/opt/oracle/10gR3/modules/features/aldsp.server.modules_3.2.0.0.jar:/opt/oracle/10gR3/odsi_10.3/lib/ld-server-core.jar:/opt/oracle/10gR3/wlserver_10.3/common/eval/pointbase/lib/pbclient51.jar:/opt/oracle/10gR3/wlserver_10.3/server/lib/xqrl.jar:/opt/oracle/10gR3/user_projects/lib/db2jcc.jar:/opt/oracle/10gR3/user_projects/lib/db2jcc_license_cisuz.jar:/opt/oracle/10gR3/properties 
-Dweblogic.system.BootIdentityFile=/opt/oracle/10gR3/user_projects/domains/DataFabricDomain/servers/member3MS1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=true -Dweblogic.ReverseDNSAllowed=false -Xms2048m -Xmx2048m -Xmn640m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:-UseBiasedLocking -XX:ParallelGCThreads=16 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/logs/oracle/10gR3/DataFabricDomain/ManagedServer/member3agc.log_* -da -Dplatform.home=/opt/oracle/10gR3/wlserver_10.3 -Dwls.home=/opt/oracle/10gR3/wlserver_10.3/server -Dweblogic.home=/opt/oracle/10gR3/wlserver_10.3/server -Dwli.home=/opt/oracle/10gR3/wlserver_10.3/integration -Daldsp.home=/opt/oracle/10gR3/odsi_10.3 -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl -Dweblogic.management.discover=false -Dweblogic.management.server=http://wdcsn443a.sys.cigna.com:7001 -Dwlw.iterativeDev=false -Dwlw.testConsole=false -Dwlw.logErrorsToConsole=true -Dweblogic.ext.dirs=/opt/oracle/10gR3/patch_wss110/profiles/default/sysext_manifest_classpath:/opt/oracle/10gR3/patch_wls1001/profiles/default/sysext_manifest_classpath:/opt/oracle/10gR3/patch_cie650/profiles/default/sysext_manifest_classpath:/opt/oracle/10gR3/patch_aldsp320/profiles/default/sysext_manifest_classpath -Dweblogic.system.BootIdentityFile=/opt/oracle/10gR3/user_projects/domains/DataFabricDomain/security/boot.properties -DDB2_USE_LEGACY_TOP_CLAUSE=true -Dlog4j.configuration=file:/opt/oracle/10gR3/user_projects/domains/DataFabricDomain/properties/log4j.xml -Ddeploymentsite=prod -DLOG4J_LEVEL=WARN -DLOG4J_ROOT=/logs/oracle/10gR3/DataFabricDomain -DLOG4J_NODENAME=member3a weblogic.Server
    48461.245: [GC 48461.245: [*ParNew (promotion failed)*: 559017K->551408K(589824K), 1.1880458 secs]48462.433: [CMS: 1294242K->895754K(1441792K), 28.3698618 secs] 1852617K->895754K(2031616K), [CMS Perm : 122026K->120411K(262144K)], 29.5587684 secs] [Times: user=29.93 sys=0.04, real=29.56 secs]
    Total time for which application threads were stopped: 29.5661221 seconds
    109007.379: [GC 109007.380: [ParNew: 531521K->8922K(589824K), 0.0181922 secs] 1805634K->1283302K(2031616K), 0.0187539 secs] [Times: user=0.22 sys=0.01, real=0.02 secs]
    Total time for which application threads were stopped: 0.0285263 seconds
    Application time: 33.9224151 seconds
    Total time for which application threads were stopped: 0.0086703 seconds
    Application time: 8.5028806 seconds
    109049.842: [GC 109049.842: [ParNew: 533210K->8861K(589824K), 0.0181380 secs] 1807590K->1283332K(2031616K), 0.0187288 secs] [Times: user=0.22 sys=0.01, real=0.02 secs]
    Total time for which application threads were stopped: 0.0283473 seconds
    Application time: 42.6375077 seconds
    109092.508: [GC 109092.508: [ParNew: 533149K->8811K(589824K), 0.0161865 secs] 1807620K->1283418K(2031616K), 0.0167544 secs] [Times: user=0.19 sys=0.00, real=0.02 secs]
    Total time for which application threads were stopped: 0.0264697 seconds
    109122.582: [GC 109122.583: [*ParNew (promotion failed)*: 533099K->532822K(589824K), 1.2159460 secs]109123.799: [CMS: 1274986K->928935K(1441792K), 30.2900798 secs] 1807706K->928935K(2031616K), [CMS Perm : 127780K->126922K(262144K)], 31.5070045 secs] [Times: user=31.72 sys=0.04, real=31.51 secs]
    Total time for which application threads were stopped: 31.5171276 seconds
    Even though we cannot avoid fragmentation entirely, what would be the best way to reduce these stop-the-world pauses?
    Edited by: user12844507 on Mar 31, 2011 6:19 AM
    Edited by: user12844507 on Mar 31, 2011 6:46 AM

    The problem appears to be that CMS works best if it can start before it is forced to start. -XX:CMSInitiatingOccupancyFraction= determines how full the old generation may get before a CMS cycle starts. However, it appears this is set too high: you are creating objects too fast, and CMS is running out of space before it finishes.
    In particular, "533099K->532822K(589824K)" indicates to me that you are filling the eden space with medium-lived objects very quickly (more than 1/2 GB of them).
    I would increase the young generation until it appears to be too large. I would try "-XX:NewSize=2g -mx3g" to give it a much larger young generation. This will stop some medium-lived objects from being promoted and flooding the tenured space (which then has to be cleaned up, resulting in fragmentation).
    Perhaps you have enough memory to try larger sizes still. I use "-XX:NewSize=7g -mx8g" and I have no objects being promoted after startup.
    BTW -mx == -Xmx
    You might find this interesting http://blogs.sun.com/jonthecollector/entry/when_the_sum_of_the
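As a sketch, the suggestions in this reply might translate into a command line like the following. The NewSize and heap values are the ones suggested above; adding -XX:+UseCMSInitiatingOccupancyOnly is my own assumption, to make the occupancy threshold binding rather than a hint, and the 60% value is illustrative.

```shell
# Sketch: larger young generation plus an earlier, binding CMS trigger.
# Values are illustrative, not tuned for this workload.
java -Xms3g -Xmx3g -XX:NewSize=2g \
     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=60 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     weblogic.Server
```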

  • Garbage collector tuning. Permanent generation

    Hi all,
    I'm learning about garbage collector tuning.
    Why does my system always give the permanent generation 8192K?
    And why is it always full, at 8191K? Maybe it is full because my application manages an internal java cache but ...
    Is it OK that it is always full, and how can I change its size?
    [Perm : 8191K->8191K(8192K)], 0.1922860 secs]
    I'm using Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_04-b05)
    Linux SuSE
    I'm using the following command
    java -XX:+PrintGCDetails -XX:NewRatio=3 -Xss256k -Xms128m -Xmx256m
    [Full GC [Tenured: 0K->2206K(98304K), 0.1920700 secs] 24643K->2206K(127808K), [Perm : 8191K->8191K(8192K)], 0.1922860 secs]
    [GC [DefNew: 26299K->1168K(29568K), 0.0566740 secs] 28505K->3374K(127872K), 0.0567870 secs]
    [GC [DefNew: 27472K->3264K(29568K), 0.0391920 secs] 29678K->6757K(127872K), 0.0392870 secs]
    [GC [DefNew: 29567K->3264K(29568K), 0.0756940 secs] 33061K->12212K(127872K), 0.0757840 secs]
    Thanks,

    Hi!
    In the permanent generation you have data like class information and static strings. This data is usually never garbage-collected, since it never becomes garbage anyway (it is "permanent" data). By default, the JVM starts with a very small perm gen (somewhere around 4 MB, I believe, but this may be system specific). The default maximum size for the perm gen is 64 MB on most systems.
    If your application needs more space in the perm gen than initially allocated, the JVM will enlarge the perm gen until your data fits (or the maximum size is reached). In your case, your application seems to need 8 MB of perm space, so the JVM enlarges the perm gen until it is 8 MB large.
    So, to answer your question, it's totally OK that your perm gen is 8 MB large and always full. There is no need to change its size in your case. If you still want to, you can use -XX:PermSize=<initialSize> and -XX:MaxPermSize=<maxSize>. Setting -XX:PermSize=8m may speed up your application's start a little, since the JVM then allocates enough space right at the beginning and doesn't need to enlarge the perm gen afterwards.
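A minimal sketch of the flags Nick mentions, combined with the poster's existing options (the 8m/64m values just mirror the defaults discussed above; `MyApp` is a placeholder):

```shell
# Sketch: pre-size the permanent generation so the JVM allocates it once
# at startup instead of growing it in steps. MyApp is a placeholder.
java -XX:PermSize=8m -XX:MaxPermSize=64m \
     -XX:+PrintGCDetails -XX:NewRatio=3 -Xss256k -Xms128m -Xmx256m \
     MyApp
```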
    Nick.

  • Monitoring GC events - CMS MBean reporting vs verbose GC log

    With verbose gc logging on, we see, say, 7 major collections reported by the CMS collector, but only 3 of them are reported by its MBean. Can anyone explain what events the MBean is reporting on, and how we can best monitor all of these major collections (some of which are Full GCs)? Also, why no notification API in com.sun.management.GcInfo?
    In addition, I am trying to correlate the start times reported by the MBean with the log's, and I can do so to within <1 ms or so - fine. However, the verbose GC log output is not easily grep'able for specific events. For example, here are the 3 events that are reported by both the log and the MBean, and which cannot be easily searched for in the log:
    7241.257: [Full GC 7241.257: [CMS: 493360K->263554K(983040K), 3.4341793 secs] 508315K->263554K(1042048K), [CMS Perm : 261924K->114742K(262144K)], 3.4343021 secs]
    9717.604: [GC 9717.604: [ParNew (promotion failed): 59007K->59007K(59008K), 0.1716470 secs]9717.776: [CMS: 708500K->334835K(983040K), 3.9617842 secs] 755796K->334835K(1042048K), 4.1336188 secs]
    11033.636: [GC 11033.636: [ParNew: 56766K->56766K(59008K), 0.0000312 secs]11033.636: [CMS11033.637: [CMS-concurrent-abortable-preclean: 0.034/0.384 secs]
    (concurrent mode failure): 964112K->316828K(983040K), 3.9151547 secs] 1020879K->316828K(1042048K), 3.9153244 secs]

    The GarbageCollectorMBean reports stop-the-world collections. For the CMS collector, it only reports the foreground collection, not the background collection that runs concurrently with the mutator threads. I believe the difference you observed is due to background collector activity.
    There is an RFE filed to report statistics for the background collector as well:
    http://bugs.sun.com/view_bug.do?bug_id=4975677
    Regarding the notification API, can you elaborate on what functionality you are looking for?
    We're working on providing a GChisto tool for viewing and analysing GC logs, which should help in your analysis. Stay tuned for news about it.

  • Java JVM 8: Using incremental CMS is deprecated and will likely be removed in a future release

    Hello,
    We (my company) have been using YoungGC=Parallel; Old=CMS/Incremental since Java 1.5 for a Java caching application running in a 64 GB heap (yes, even in HotSpot 1.5).
    At various times, we have tested the G1 collector.  The performance for our caching application has not met expectations with G1GC.  Always looking for ways to improve the system, we are certainly open to new technologies.  However, the loss of the CMS collector will mean our application won’t be able to adopt newer versions of Java.
    Basic JVM options for GC:
    -XX:+UnlockExperimentalVMOptions
    -XX:+UseParNewGC
    -XX:+UseConcMarkSweepGC
    -XX:+CMSIncrementalMode
    -XX:-CMSIncrementalPacing
    -XX:CMSIncrementalDutyCycleMin=100
    -XX:CMSIncrementalDutyCycleMin=95
    -XX:+ExplicitGCInvokesConcurrent
    -XX:ConcGCThreads=6
    With these options, on servers with a 64 GB heap used at 60% and thousands of QPS, we see young GC pauses in the range of 50-100 ms roughly every 10 seconds (0.5-1% of the time) and about the same for the short GC pauses of the background CMS. We NEVER see a long CMS pause, and the servers run for months at a time, being taken down pretty much only to patch the OS. With a new generation of hardware, improved software, and taking advantage of the ConcGCThreads option, we are just beginning a series of tests to determine how high we can crank up the memory to reduce the farm size. This project is expected to go open source later this year. Without the CMS collector, very large heaps will become very difficult (or impossible) to manage.
    Before removing the CMS collector, and I understand it is causing grief to still have it in the Java code base, please ensure there is an adequate replacement (G1 is currently not it).
    Thank you for your attention,
    Pierre

    I recommend you try out the Metaspace GC policy in 1.8.
    It may improve your application's performance by around 30%.

  • Concurrent mode failure triggered by a full permanent generation?

    Hello
    I've read various forum posts and blogs on concurrent mode failures, and I'm aware of why such failures occur in the old gen (small old gen, promotion guarantee failure, fragmentation) and of some options (increase heap size/old gen, use CMSInitiatingOccupancyFraction, explicit System.gc() at quiet hours for compaction, etc.) that may be used to alleviate the problem. In fact, thanks to these forums, I used some of these options to effectively deal with concurrent mode failures on a SPARC system (Solaris 9). However, with my current JVM tuning exercise, the following GC log snippet from a JVM 1.4.2_10 running on HP-UX 11.23 (ia64) leaves me in doubt.
    346179.949: [GC 346179.950: [ParNew: 419022K->27014K(491520K), 0.1540543 secs] 976980K->585383K(1474560K), 0.1542645 secs]
    346444.644: [GC 346444.645: [ParNew: 420230K->420230K(491520K), 0.0000416 secs]346444.645: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedMethodAccessor1107]
    [Unloading class sun.reflect.GeneratedMethodAccessor1087]
    [Unloading class org.springframework.ejb.support.AbstractJmsMessageDrivenBean]
    [Unloading class sun.reflect.GeneratedConstructorAccessor552]
    ...#lots of classes unloaded here#
    [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor155]
    : 558369K->317371K(983040K), 9.5144462 secs] 978599K->317371K(1474560K), 9.5147394 secs]
    346677.529: [GC 346677.530: [ParNew: 393216K->13960K(491520K), 0.2163942 secs] 710587K->331331K(1474560K), 0.2166204 secs]
    JVM 1.4.2_10 Flags: -Xms1536m -Xmx1536m -Xmn576m -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=31 -XX:TargetSurvivorRatio=80 -XX:SurvivorRatio=4 -XX:+TraceClassLoading -XX:+TraceClassUnloading -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xverbosegc:file=.....
    Question: When using CMS, perm gen collection is turned off by default and a serial Full GC collects the perm gen. Could this have led to the concurrent mode failure seen above, causing classes to be unloaded? I.e., when the CMS collector needs to clean up the perm gen via a serial STW GC, does it generate a concurrent mode failure?
    I doubt the concurrent mode failure could be caused by a full promotion guarantee failure or fragmentation given the available free heap and the fact that this was the first GC in the old generation since server startup.
    Note: We do have a perm gen memory leak caused by Spring/CGLIB. While we fix the class loading issue, we increased MaxPermSize from 256m to 512m to buy some time. With MaxPermSize=512m and -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled, we had very frequent CMS GCs (every 5 seconds), and I read that this happens because the CMS initiating threshold is the same for both the old gen and the perm gen. So we removed -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled, and the CMS GCs are no longer frequent.
    Thanks
    Edited by: cybergavin on Dec 28, 2009 12:09 AM

    Hmm... I think I was a bit hasty there.
    [ParNew: 420230K->420230K(491520K), 0.0000416 secs]346444.645: [CMS (concurrent mode failure)[Unloading class........ indicates that a parallel scavenge could not be performed, most probably due to a "young generation guarantee" failure. Also, HPjmeter revealed no issues with perm gen growth. So I have decreased the size of the young generation and triggered CMS earlier (changes in bold below):
    *JVM 1.4.2_10 Flags:* -Xms1536m -Xmx1536m *-Xmn512m* -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=31 -XX:TargetSurvivorRatio=80 -XX:SurvivorRatio=4 -XX:+TraceClassLoading -XX:+TraceClassUnloading -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps *-XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly* -Xverbosegc:file=.....
    Will post back on results of the above....

  • How can a JVM terminate with an exit code of 141 and no other diagnostics?

    Hello,
    We are encountering a JVM process that dies with little explanation other than an exit code of 141 - no HotSpot error file (hs_err_*) or crash dump. To date, the process runs anywhere from 30 minutes to 8 days before the problem occurs. The last application log entry is always the report of a lost SSL connection, the result of a thrown SSLException. (The exception itself is unavailable at this time - the JVM dies before it is logged - working on that.)
    How can a JVM produce an exit code of 141, and nothing else?  Can anyone suggest ideas for capturing additional diagnostic information?  Any help would be greatly appreciated!  Environment and efforts to date are described below.
    Thanks,
    -KK
    Host machine: 8x Xeon server with 256GB memory, RHEL 6 (or RHEL 5.5) 64-bit
    Java: Oracle Java SE 7u21 (or 6u26)
    java version "1.7.0_21"
    Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
    JVM arguments:
    -XX:+UseConcMarkSweepGC
    -XX:+CMSIncrementalMode
    -XX:+CMSClassUnloadingEnabled
    -XX:MaxPermSize=256m
    -XX:NewSize=64m
    -Xms128m
    -Xmx1037959168
    -Djava.awt.headless=true
    -Djava.security.egd=file:///dev/./urandom
    Diagnostics attempted to date:
    LD_PRELOAD=libjsig.so.   A modified version of libjsig.so was created to report all signal handler registrations and to report SIGPIPE signals received.  (Exit code 141 could be interpreted as 128+SIGPIPE(13).)  No JNI libraries are registering any signal handlers, and no SIGPIPE signal is reported by the library for the duration of the JVM run.  Calls to ::exit() are also intercepted and reported.  No call to exit() is reported.
    Inspect /var/log/messages for any indication that the OS killed the process, e.g. via the Out Of Memory (OOM) Killer.  Nothing found.
    Set 'ulimit -c unlimited', in case the default limit of 0 (zero) was preventing a core file from being written. Still no core dump.
    'top' reports that the VIRT size of the process can grow to 20 GB or more in a matter of hours, which is unusual compared to other JVM processes. The RES (resident set size) does not grow beyond about 375 MB, however, which is considered normal.
    This JVM process creates many short-lived Thread objects by way of a thread pool, averaging 1 thread every 2 seconds, and these objects end up referenced only by a Weak reference.   The CMS collector seems lazy about collecting these, and upwards of 2000 Thread objects have been seen (in heap dumps) held only by Weak references.  (The Java heap averages about 100MB, so the collector is not under any pressure.) However, a forced collection (via jconsole) cleans out the Thread objects as expected.  Any relationship of this to the VIRT size or the JVM disappearance, however, cannot be established.
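For what it's worth, the forced collection does not have to go through jconsole. Assuming a JDK 7 toolchain is on the box, a command-line sketch would be the following (the PID 12345 is a placeholder):

```shell
# Sketch: trigger a full GC, then check whether the weakly-referenced
# Thread objects disappear from a live-object histogram.
jcmd 12345 GC.run
jmap -histo:live 12345 | grep 'java.lang.Thread$'
```

Note that `jmap -histo:live` itself forces a full collection before counting, so the second command alone is often enough for this check.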
    The process also uses NIO and direct buffers, and maintains a DirectByteBuffer cache. There is some DirectByteBuffer churn. MBeans report stats like:
    Direct buffer pool: allocated=669 (20,824,064 bytes), released=665 (20,725,760), active=4 (98,304)  [note: equals 2x 32K buffers and 2x 16K buffers]
    java.nio.BufferPool > direct: Count=18, MemoryUsed=1343568, TotalCapacity=1343568
    These numbers appear normal and also do not seem to correlate with the VIRT size or the JVM disappearance.

    True, but the JNI call would still be reported by the LD_PRELOAD intercept, unless the native code could somehow circumvent that.  Using a test similar to GoodbyeWorld (shown below), I verified that the JNI call to exit() is reported.  In the failure case, no call to exit() is reported.
    Can an OS (or a manual) 'kill' specify an exit code?  Where could "141" be coming from?
    Thanks,
    -K2
    === GoodbyeWorldFromJNI.java ===
    package com.attachmate.test;

    public class GoodbyeWorldFromJNI {
        public static final String LIBRARY_NAME = "goodbye";
        static {
            try {
                System.loadLibrary(LIBRARY_NAME);
            } catch (UnsatisfiedLinkError error) {
                System.err.println("Failed to load " + System.mapLibraryName(LIBRARY_NAME));
            }
        }
        private static native void callExit(int exitCode);
        public static void main(String[] args) {
            callExit(141);
        }
    }
    === goodbye.c ===
    #include <stdlib.h>
    #include "goodbye.h"  // javah generated header file

    JNIEXPORT void JNICALL Java_com_attachmate_test_GoodbyeWorldFromJNI_callExit
      (JNIEnv *env, jclass theClass, jint exitCode)
    {
        exit(exitCode);
    }
    === script.sh ===
    #!/bin/bash -v
    uname -a
    export PATH=/opt/jre1.7.0_25/bin:$PATH
    java -version
    pwd
    LD_PRELOAD=./lib/linux-amd64/libjsigdebug.so java -classpath classes -Djava.library.path=lib/linux-amd64 com.attachmate.test.GoodbyeWorldFromJNI > stdout.txt
    echo $?
    tail stdout.txt
    === script output ===
    [keithk@keithk-RHEL5-dev goodbyeJNI]$ ./script.sh
    #!/bin/bash -v
    uname -a
    Linux keithk-RHEL5-dev 2.6.18-164.2.1.el5 #1 SMP Mon Sep 21 04:37:42 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
    export PATH=/opt/jre1.7.0_25/bin:$PATH
    java -version
    java version "1.7.0_25"
    Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
    pwd
    /tmp/goodbyeJNI
    LD_PRELOAD=./lib/linux-amd64/libjsigdebug.so java -classpath classes -Djava.library.path=lib/linux-amd64 com.attachmate.test.GoodbyeWorldFromJNI > stdout.txt
    echo $?
    141
    tail stdout.txt
    JSIG: exit(141) called
    JSIG: Call stack has 4 frames:
    JSIG: ./lib/linux-amd64/libjsigdebug.so [0x2b07dc1bdc2f]
    JSIG: ./lib/linux-amd64/libjsigdebug.so(exit+0x29) [0x2b07dc1bea41]
    JSIG: /tmp/goodbyeJNI/lib/linux-amd64/libgoodbye.so [0x2aaab3e82547]
    JSIG: [0x2aaaab366d8e]       
    === ===
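On the question of where 141 comes from: a POSIX shell reports a child terminated by signal N as exit status 128+N, and SIGPIPE is signal 13, so 128+13 = 141. A minimal demonstration with no JVM involved:

```shell
# Kill a background process with SIGPIPE (signal 13); the shell then
# reports its exit status as 128 + 13 = 141.
sleep 30 &
pid=$!
kill -PIPE "$pid"
wait "$pid"
echo "exit status: $?"   # prints: exit status: 141
```

So an external `kill -13 <pid>` (or any unhandled SIGPIPE the process receives, e.g. writing to a closed socket with the JVM's handler bypassed) would show up to the parent shell as exit code 141, without any hs_err file or core dump.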

  • Is there a flowchart describing gc behaviour of the Sun hotspot jvm?

    Given that garbage collector tuning is such an important part of application deployment it is surprising that the behaviour of the garbage collector implemented in the Sun JVM is so poorly documented.
    Our system is displaying behaviour changes that we don't expect and do not want, and we can't find enough information to predict how the garbage collector will behave under expected and changing conditions. This makes it impossible to select sensible parameters for our system.
    For example, we have included the -Xincgc option when running a WebLogic 6.0 server to reduce the size of the GC pauses. Monitoring the memory (both on the WebLogic console and via -verbose:gc), we see small GCs happening quite frequently, but after a couple of hours it switches over to full GCs and stays that way forever after. The full GCs bring the longer delays for garbage collection (typically around 5 seconds) every 5 minutes or so.
    Incidentally, we assume the small gcs are the incremental gcs performed on the old area but there is no way to distinguish those from the little scavenging gcs that are performed even without -Xincgc.
    The total memory used is quite modest compared to the max heap size and the old area (generally 1/4 to 1/2 the size), and is comparable to the new area.
    If there is a flowchart or uml activity diagram that describes the hotspot gc behaviour so that we could be a little more deterministic in our approach to gc tuning I would be most grateful to get access to it. This trial and error approach is very frustrating.
    There's some very useful information out there about the structure of the java heap and the meaning of the various options and even the garbage collection algorithms themselves but it is not sufficient to specify the behaviour of the specific hotspot jvm from Sun Microsystems.
    I liken it to having a class diagram describing a highly dynamic system but no interaction diagrams.

    I would also love to have a comprehensive explanation of garbage collection in Java. I'm still mystified by it in some respects.
    The author of this thread has obviously researched Java GC... I don't know if this helps, but someone in another thread posted this link to a JDC Tech Tips issue concerning memory allocation and GC:
    http://developer.java.sun.com/developer/TechTips/2000/tt1222.html
    Also the links near the bottom may be worth checking out...
    There's something in that web page that I still don't understand and I think I will post a message about it soon.

  • JVM 1.4.2_12-b03 Internal Error assistance?

    Hi all
    We have a production application running on WebLogic 8.1 SP4 that has recently been crashing with the error below. We've tried adjusting our JVM settings from
    JAVA_OPTIONS="-Dnetworkaddress.cache.ttl=60 -Xms2048M -Xmx2048M -XX:NewSize=512M -XX:MaxNewSize=512M -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -Xloggc:gc.log"
    to
    JAVA_OPTIONS="-server -Dnetworkaddress.cache.ttl=60 -Xms2048M -Xmx2048M -XX:NewSize=512M -XX:MaxNewSize=512M -XX:MaxPermSize=256m -XX:+UseParallelGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -Xloggc:gc.log "
    as we believed the problem might lie with the CMS collector, since the system bailed out whilst performing this type of collection, but the system has since crashed with the revised arguments. Has anyone encountered the same error?
    The tail end of our gc.log reads:
    237671.735: [GC  {Heap before GC invocations=1257:
    Heap
    par new generation   total 523840K, used 523383K [0x69400000, 0x89400000, 0x89400000)
      eden space 523392K,  99% used [0x69400000, 0x8931dee8, 0x89320000)
      from space 448K,   0% used [0x89390000, 0x89390000, 0x89400000)
      to   space 448K,   0% used [0x89320000, 0x89320000, 0x89390000)
    concurrent mark-sweep generation total 1572864K, used 777794K [0x89400000, 0xe9400000, 0xe9400000)
    concurrent-mark-sweep perm gen total 117040K, used 70194K [0xe9400000, 0xf064c000, 0xf9400000)
    237671.736: [ParNew: 523383K->0K(523840K), 0.5832577 secs] 1301178K->819005K(2096704K) Heap after GC invocations=1258:
    Heap
    par new generation total 523840K, used 0K [0x69400000, 0x89400000, 0x89400000)
    eden space 523392K, 0% used [0x69400000, 0x69400000, 0x89320000)
    from space 448K, 0% used [0x89320000, 0x89320000, 0x89390000)
    to space 448K, 0% used [0x89390000, 0x89390000, 0x89400000)
    concurrent mark-sweep generation total 1572864K, used 819005K [0x89400000, 0xe9400000, 0xe9400000)
    concurrent-mark-sweep perm gen total 117040K, used 70194K [0xe9400000, 0xf064c000, 0xf9400000)
    } , 0.5835878 secs]
    237672.327: [GC [1 CMS-initial-mark: 819005K(1572864K)] 819017K(2096704K), 0.0190249 secs]
    237672.347: [CMS-concurrent-mark-start]
    237675.099: [CMS-concurrent-mark: 2.752/2.752 secs]
    237675.099: [CMS-concurrent-preclean-start]
    237675.140: [CMS-concurrent-preclean: 0.040/0.041 secs]
    237675.143: [GC
    And below is the err log:
    # An unexpected error has been detected by HotSpot Virtual Machine:
    # Internal Error (434F4E43555252454E542D41524B335745455027454E45524154494F4E0E4350501175 01), pid=10258, tid=5
    # Java VM: Java HotSpot(TM) Client VM (1.4.2_12-b03 mixed mode)
    --------------- T H R E A D ---------------
    Current thread (0x000a27d0): GCTaskThread [id=5]
    Stack: [0x00000000,0x00000000), sp=0xfde818d0, free space=-34298k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    V [libjvm.so+0x3b0060]
    V [libjvm.so+0x23bab4]
    V [libjvm.so+0x236784]
    V [libjvm.so+0x34fd88]
    V [libjvm.so+0x237a0c]
    V [libjvm.so+0x232754]
    V [libjvm.so+0x3b3704]
    V [libjvm.so+0x356648]
    --------------- P R O C E S S ---------------
    Java Threads: ( => current thread )
    0x008333c8 JavaThread "Thread-82" daemon [_thread_blocked, id=127]
    0x00c6e8f8 JavaThread "AWT-Motif" daemon [_thread_in_native, id=125]
    0x02b8eda0 JavaThread "AsyncRunner 1" daemon [_thread_blocked, id=98]
    0x01ae51a0 JavaThread "ExecuteThread: '2' for queue: 'weblogic.kernel.Non-Blocking'" daemon [_thread_blocked, id=97]
    0x01ae2b88 JavaThread "ExecuteThread: '1' for queue: 'weblogic.kernel.Non-Blocking'" daemon [_thread_blocked, id=96]
    0x01ae1b58 JavaThread "ExecuteThread: '0' for queue: 'weblogic.kernel.Non-Blocking'" daemon [_thread_blocked, id=95]
    0x01b58c20 JavaThread "AuditRotor" daemon [_thread_blocked, id=94]
    0x01be4100 JavaThread "ExecuteThread: '0' for queue: 'JMSStore<ScritturaJMSJDBCStore>.ioThreadPool'" daemon [_thread_blocked, id=93]
    0x01329500 JavaThread "LargeQueueThread" daemon [_thread_blocked, id=92]
    0x027e0f70 JavaThread "ExecuteThread: '14' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=91]
    0x027e0378 JavaThread "ExecuteThread: '13' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=90]
    0x027df780 JavaThread "ExecuteThread: '12' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=89]
    0x027deb88 JavaThread "ExecuteThread: '11' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=88]
    0x027ddf90 JavaThread "ExecuteThread: '10' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=87]
    0x027dd398 JavaThread "ExecuteThread: '9' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=86]
    0x027dc7a0 JavaThread "ExecuteThread: '8' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=85]
    0x027dbba8 JavaThread "ExecuteThread: '7' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=84]
    0x027dafb0 JavaThread "ExecuteThread: '6' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=83]
    0x027da3b8 JavaThread "ExecuteThread: '5' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=82]
    0x027d98d0 JavaThread "ExecuteThread: '4' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=81]
    0x01771928 JavaThread "ExecuteThread: '3' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=80]
    0x01770e40 JavaThread "ExecuteThread: '2' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=79]
    0x01770c98 JavaThread "ExecuteThread: '1' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=78]
    0x017701c8 JavaThread "ExecuteThread: '0' for queue: 'JmsDispatcher'" daemon [_thread_blocked, id=77]
    0x01770020 JavaThread "ListenThread.Default" [_thread_blocked, id=76]
    0x02816ab8 JavaThread "ScritturaCron" [_thread_blocked, id=73]
    0x02816910 JavaThread "TradeDropMonitor" [_thread_blocked, id=72]
    0x028030f0 JavaThread "DocMgr Import Daemon" [_thread_blocked, id=70]
    0x00f45230 JavaThread "ExecuteThread: '5' for queue: 'JMS.TimerClientPool'" daemon [_thread_blocked, id=69]
    0x00f447d0 JavaThread "ExecuteThread: '4' for queue: 'JMS.TimerClientPool'" daemon [_thread_blocked, id=68]
    0x00a46fa8 JavaThread "ExecuteThread: '3' for queue: 'JMS.TimerClientPool'" daemon [_thread_blocked, id=67]
    0x00a46550 JavaThread "ExecuteThread: '2' for queue: 'JMS.TimerClientPool'" daemon [_thread_blocked, id=66]
    0x00a45970 JavaThread "ExecuteThread: '1' for queue: 'JMS.TimerClientPool'" daemon [_thread_blocked, id=65]
    0x00a44990 JavaThread "ExecuteThread: '0' for queue: 'JMS.TimerClientPool'" daemon [_thread_blocked, id=64]
    0x01487960 JavaThread "Thread-7" daemon [_thread_blocked, id=63]
    0x008c9580 JavaThread "ExecuteThread: '0' for queue: 'JMS.TimerTreePool'" daemon [_thread_blocked, id=62]
    0x00e6aaf0 JavaThread "Thread-6" [_thread_blocked, id=61]
    0x008ddd90 JavaThread "weblogic.health.CoreHealthMonitor" daemon [_thread_blocked, id=60]
    0x008693a0 JavaThread "Thread-5" [_thread_blocked, id=59]
    0x00aac6a0 JavaThread "LDAPConnThread-0 ldap://nys01a-4704.fir.fbc.com:19101" daemon [_thread_blocked, id=58]
    0x00a94308 JavaThread "VDE Transaction Processor Thread" [_thread_blocked, id=56]
    0x00a80fa0 JavaThread "ExecuteThread: '2' for queue: 'weblogic.admin.RMI'" daemon [_thread_blocked, id=55]
    0x00a80df8 JavaThread "ExecuteThread: '1' for queue: 'weblogic.admin.RMI'" daemon [_thread_blocked, id=54]
    0x00a80c50 JavaThread "ExecuteThread: '0' for queue: 'weblogic.admin.RMI'" daemon [_thread_blocked, id=53]
    0x00861cf8 JavaThread "ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon [_thread_blocked, id=52]
    0x0014dce0 JavaThread "ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon [_thread_blocked, id=51]
    0x00862508 JavaThread "ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" daemon [_thread_in_native, id=50]
    0x00147f98 JavaThread "weblogic.security.SpinnerRandomSource" daemon [_thread_blocked, id=49]
    0x00147678 JavaThread "weblogic.time.TimeEventGenerator" daemon [_thread_blocked, id=48]
    0x0022ff38 JavaThread "ExecuteThread: '4' for queue: 'weblogic.kernel.System'" daemon [_thread_blocked, id=47]
    0x0022f340 JavaThread "ExecuteThread: '3' for queue: 'weblogic.kernel.System'" daemon [_thread_blocked, id=46]
    0x0022e340 JavaThread "ExecuteThread: '2' for queue: 'weblogic.kernel.System'" daemon [_thread_blocked, id=45]
    0x0022d748 JavaThread "ExecuteThread: '1' for queue: 'weblogic.kernel.System'" daemon [_thread_blocked, id=44]
    0x0022cb50 JavaThread "ExecuteThread: '0' for queue: 'weblogic.kernel.System'" daemon [_thread_blocked, id=43]
    0x0022bf58 JavaThread "ExecuteThread: '24' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=42]
    0x0022b360 JavaThread "ExecuteThread: '23' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=41]
    0x0022a768 JavaThread "ExecuteThread: '22' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=40]
    0x00229b70 JavaThread "ExecuteThread: '21' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=39]
    0x00229088 JavaThread "ExecuteThread: '20' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=38]
    0x00212048 JavaThread "ExecuteThread: '19' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=37]
    0x00211450 JavaThread "ExecuteThread: '18' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=36]
    0x00210858 JavaThread "ExecuteThread: '17' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=35]
    0x0020fc60 JavaThread "ExecuteThread: '16' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=34]
    0x0020f068 JavaThread "ExecuteThread: '15' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=33]
    0x0020e470 JavaThread "ExecuteThread: '14' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=32]
    0x0020d878 JavaThread "ExecuteThread: '13' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=31]
    0x0020cc80 JavaThread "ExecuteThread: '12' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=30]
    0x0020c088 JavaThread "ExecuteThread: '11' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=29]
    0x0020b568 JavaThread "ExecuteThread: '10' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=28]
    0x003490f0 JavaThread "ExecuteThread: '9' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=27]
    0x003484f8 JavaThread "ExecuteThread: '8' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=26]
    0x003472f8 JavaThread "ExecuteThread: '7' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=25]
    0x00346700 JavaThread "ExecuteThread: '6' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=24]
    0x00345b08 JavaThread "ExecuteThread: '5' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=23]
    0x00344f40 JavaThread "ExecuteThread: '4' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=22]
    0x00344558 JavaThread "ExecuteThread: '3' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=21]
    0x00343b70 JavaThread "ExecuteThread: '2' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=20]
    0x00343188 JavaThread "ExecuteThread: '1' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=19]
    0x00342fe0 JavaThread "ExecuteThread: '0' for queue: 'weblogic.kernel.Default'" daemon [_thread_blocked, id=18]
    0x00853f00 JavaThread "Thread-1" daemon [_thread_blocked, id=17]
    0x00142dc0 JavaThread "CompilerThread0" daemon [_thread_blocked, id=15]
    0x00142158 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=14]
    0x0013ee40 JavaThread "Surrogate Locker Thread (CMS)" daemon [_thread_blocked, id=12]
    0x0013d5a8 JavaThread "Finalizer" daemon [_thread_blocked, id=11]
    0x0013bc40 JavaThread "Reference Handler" daemon [_thread_blocked, id=10]
    0x00038a90 JavaThread "main" [_thread_blocked, id=1]
    Other Threads:
    0x0013a7d8 VMThread [id=9]
    0x001447c0 WatcherThread [id=16]
    VM state:at safepoint (normal execution)
    VM Mutex/Monitor currently owned by a thread: ([mutex/lock_event])
    [0x00038340/0x00038370] Threads_lock - owner thread: 0x00115858
    [0x00035388/0x000386c0] Heap_lock - owner thread: 0x00115858
    Heap
    par new generation total 523840K, used 19742K [0x69400000, 0x89400000, 0x89400000)
    eden space 523392K, 3% used [0x69400000, 0x6a747ac8, 0x89320000)
    from space 448K, 0% used [0x89320000, 0x89320000, 0x89390000)
    to space 448K, 0% used [0x89390000, 0x89390000, 0x89400000)
    concurrent mark-sweep generation total 1572864K, used 790031K [0x89400000, 0xe9400000, 0xe9400000)
    concurrent-mark-sweep perm gen total 109584K, used 65799K [0xe9400000, 0xeff04000, 0xf9400000)
    Dynamic libraries:
    0x00010000      java
    0xff350000      /usr/lib/libthread.so.1
    0xff340000      /usr/lib/libdl.so.1
    0xff200000      /usr/lib/libc.so.1
    0xff390000      /usr/platform/SUNW,Sun-Fire-V440/lib/libc_psr.so.1
    0xfe000000      /app/java/jdk1.4.2_12/jre/lib/sparc/client/libjvm.so
    0xff2e0000      /usr/lib/libCrun.so.1
    0xff1e0000      /usr/lib/libsocket.so.1
    0xff100000      /usr/lib/libnsl.so.1
    0xff0b0000      /usr/lib/libm.so.1
    0xff1c0000      /usr/lib/libsched.so.1
    0xff310000      /usr/lib/libw.so.1
    0xff080000      /usr/lib/libmp.so.2
    0xff040000      /app/java/jdk1.4.2_12/jre/lib/sparc/native_threads/libhpi.so
    0xfe7d0000      /app/java/jdk1.4.2_12/jre/lib/sparc/libverify.so
    0xfe790000      /app/java/jdk1.4.2_12/jre/lib/sparc/libjava.so
    0xfe770000      /app/java/jdk1.4.2_12/jre/lib/sparc/libzip.so
    0xfbba0000      /app/java/jdk1.4.2_12/jre/lib/sparc/libnet.so
    0xfc1e0000      /app/bea/weblogic81sp4/server/lib/solaris/libweblogicunix1.so
    0xf9690000      /app/bea/weblogic81sp4/server/lib/solaris/libstackdump.so
    0xf95e0000      /app/bea/weblogic81sp4/server/lib/solaris/libmuxer.so
    0xf95c0000      /usr/ucblib/libucb.so.1
    0xf94b0000      /usr/lib/libresolv.so.2
    0xf9470000      /usr/lib/libelf.so.1
    0xf9590000      /app/java/jdk1.4.2_12/jre/lib/sparc/libnio.so
    0x693a0000      /usr/lib/librt.so.1
    0x692e0000      /usr/lib/libaio.so.1
    0x692c0000      /usr/lib/libsendfile.so.1
    0x692a0000      /app/java/jdk1.4.2_12/jre/lib/sparc/libioser12.so
    0x50900000      /app/java/jdk1.4.2_12/jre/lib/sparc/libmlib_image.so
    0x4f200000      /app/java/jdk1.4.2_12/jre/lib/sparc/libawt.so
    0x5c510000      /app/java/jdk1.4.2_12/jre/lib/sparc/motif21/libmawt.so
    0x4ef80000      /usr/dt/lib/libXm.so.4
    0x5e610000      /usr/openwin/lib/libXt.so.4
    0x5e710000      /usr/openwin/lib/libXext.so.0
    0x5e560000      /usr/openwin/lib/libXtst.so.1
    0x4ee80000      /usr/openwin/lib/libX11.so.4
    0x5e420000      /usr/openwin/lib/libdps.so.5
    0x5e530000      /usr/openwin/lib/libSM.so.6
    0x5e350000      /usr/openwin/lib/libICE.so.6
    0x4ed80000      /app/java/jdk1.4.2_12/jre/lib/sparc/libfontmanager.so
    0x5e230000      /app/java/jdk1.4.2_12/jre/lib/sparc/libcmm.so
    0x5e130000      /app/java/jdk1.4.2_12/jre/lib/sparc/libjpeg.so
    VM Arguments:
    jvm_args: -Dnetworkaddress.cache.ttl=60 -Xms2048M -Xmx2048M -XX:NewSize=512M -XX:MaxNewSize=512M -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -Xloggc:/app/fid/edocs/config/mydomain/logs/gc.log -Dscrittura.home=/app/fid/edocs/config/mydomain/scrittura -Dscrittura.env=PROD -Dscrittura.workflow.halted=true -Dscrittura.scrittura.halted=true -Dweblogic.alternateTypesDirectory=/app/fid/edocs/config/mydomain/applications/mbeantypes -Dscrittura.messaging.econfirm.transport.url=https://www.econfirm.com/ECFXmlMessaging/ECFXmlMessaging.class -Dscrittura.messaging.proxy.host=169.37.104.17 -Dscrittura.messaging.proxy.port=8080 -Dscrittura.messaging.econfirm.username=systemadmin -Dscrittura.messaging.econfirm.password=System25 -Dscrittura.messaging.econfirm.CSIBatchId=11472 -Dscrittura.messaging.econfirm.CSEBatchId=11508 -Dscrittura.messaging.econfirm.companyName=Test -Dwordml.pdf.timeout=120000 -Dwordml.datamatrix.dotSize=3 -Dcom.ipicorp.scrittura.disableRefiling=true -Dcom.ipicorp.mvc.disableSecurityChecks=true -Dscrittura.bulk.multi.transactions=true -Dscrittura.pdfpreview.quality=grayscale -Dscrittura.pib.bumpUpdate=true -Dweblogic.management.discover=true -Dweblogic.ProductionModeEnabled=true -Dweblogic.Name=edocs -Dbea.home=/app/bea/weblogic81sp4/server/.. -Djava.security.policy==/app/fid/edocs/config/mydomain/weblogic.policy -Dweblogic.system.StoreBootIdentity=true -Dweblogic.security.SSL.trustedCAKeyStore=/app/bea/weblogic81sp4/server/lib/csfbDefaultKeyStore.jks -Dweblogic.management.server=http://nys01a-4704.fir.fbc.com:19101
    java_command: weblogic.Server
    Launcher Type: SUN_STANDARD
    Environment Variables:
    CLASSPATH=:/app/bea/weblogic81sp4/server/lib/CR174593_81sp4_v2.jar:/app/java/jdk1.4.2_12/lib/tools.jar:/app/bea/weblogic81sp4/server/lib/CSFB_security_patches_81sp4.jar:/app/bea/weblogic81sp4/server/lib/weblogic_sp.jar:/app/bea/weblogic81sp4/server/lib/weblogic.jar:/app/fid/edocs/config/mydomain/applications/lib/commons-beanutils.jar:/app/fid/edocs/config/mydomain/applications/lib/commons-collections.jar:/app/fid/edocs/config/mydomain/applications/lib/commons-digester.jar:/app/fid/edocs/config/mydomain/applications/lib/commons-logging.jar:/app/fid/edocs/config/mydomain/applications/lib/dom4j-full.jar:/app/fid/edocs/config/mydomain/applications/lib/iText.jar:/app/fid/edocs/config/mydomain/applications/lib/jasperreports-1.3.0.jar:/app/fid/edocs/config/mydomain/applications/lib/jcommon-0.9.4.jar:/app/fid/edocs/config/mydomain/applications/lib/jcommon-1.0.6.jar:/app/fid/edocs/config/mydomain/applications/lib/jfreechart-0.9.19.jar:/app/fid/edocs/config/mydomain/applications/lib/jfreechart-1.0.3.jar:/app/fid/edocs/config/mydomain/applications/lib/log4j.jar:/app/fid/edocs/config/mydomain/applications/lib/Multivalent20040415.jar:/app/fid/edocs/config/mydomain/applications/lib/poi-2.5.1-final-20040804.jar
    PATH=/app/java/jdk1.4.2_12/jre/bin:/app/java/jdk1.4.2_12/bin:/bin:/usr/bin
    LD_LIBRARY_PATH=/app/java/jdk1.4.2_12/jre/lib/sparc/client:/app/java/jdk1.4.2_12/jre/lib/sparc:/app/java/jdk1.4.2_12/jre/../lib/sparc:/app/bea/weblogic81sp4/server/lib/solaris:/app/bea/weblogic81sp4/server/lib/solaris/oci816_8
    SHELL=/bin/ksh
    DISPLAY=:1
    --------------- S Y S T E M ---------------
    OS: Solaris 8 2/04 s28s_hw4wos_05a SPARC
    Copyright 2004 Sun Microsystems, Inc. All Rights Reserved.
    Assembled 08 January 2004
    uname:SunOS 5.8 Generic_117000-05 sun4u (T1 libthread)
    rlimit: STACK 8192k, CORE infinity, NOFILE 8192, AS infinity
    load average:0.33 0.46 0.62
    CPU:total 4 has_v8, has_v9, has_vis1, has_vis2, is_ultra3
    Memory: 8k page, physical 16777216k(10879240k free)
    vm_info: Java HotSpot(TM) Client VM (1.4.2_12-b03) for solaris-sparc, built on May 9 2006 13:03:17 by unknown with Workshop 5.2 compat=5

    >
    # Internal Error (434F4E43555252454E542D41524B335745455027454E45524154494F4E0E4350501175 01), pid=10258, tid=5
    Heap
    par new generation total 523840K, used 19742K [0x69400000, 0x89400000, 0x89400000)
    eden space 523392K, 3% used [0x69400000, 0x6a747ac8, 0x89320000)
    from space 448K, 0% used [0x89320000, 0x89320000, 0x89390000)
    to space 448K, 0% used [0x89390000, 0x89390000, 0x89400000)
    concurrent mark-sweep generation total 1572864K, used 790031K [0x89400000, 0xe9400000, 0xe9400000)
    concurrent-mark-sweep perm gen total 109584K, used 65799K [0xe9400000, 0xeff04000, 0xf9400000)
    vm_info: Java HotSpot(TM) Client VM (1.4.2_12-b03) for solaris-sparc, built on May 9 2006 13:03:17 by unknown with Workshop 5.2 compat=5
    The error you are seeing was fixed in 1.4.2_14 in CR 6409002, which also lists (as of this moment) the workaround -XX:-CMSParallelRemarkEnabled -XX:CMSMarkStackSize=64m (or some suitably large value; I think the default is 8M, but am not sure). However, this can increase the "CMS-remark" pause times, which could be as much as ~3x on your 4-CPU box:
    http://bugs.sun.com/view_bug.do?bug_id=6409002
    For more details see:
    http://bugs.sun.com/view_bug.do?bug_id=4615723
    The latest publicly available version of 1.4.2 is 1.4.2_17, so you might
    consider upgrading to that instead of to 1.4.2_14.
    Aside: I notice that you are using the "client jvm"; you might consider the
    server jvm (via -server) for improved performance.

  • How to specify when Full Garbage Collections occur in the Old Generation

    Hi. We seem to be having a problem with a number of JVMs (1.5.0_17-b04) that run a component of a Document Management application. This component stores a large amount of information in caches which reside in the Old Generation. Although these cache sizes can be somewhat controlled by the application, they are currently taking about 85% of the Old Generation space. Fortunately, very few objects get tenured into the Old Generation - they all are cleaned up in the New Generation space.
    The problem we are seeing is that with the Old Generation at 85% full, there are constant full GC's occurring. Since the caches cannot be removed, the system frantically tries to remove objects that can't be removed.
    We have three solutions in mind. The first is to increase the memory allocation to the Old Generation so that the caches take a smaller percentage of the available memory allocation. The second would be to decrease the size of the caches; but this is set more by the number of documents in the application and cannot be made much smaller.
    The third solution is to configure the JVM so that Garbage Collections in the Old Generation do not occur until the memory is more than a specific percentage of memory in the Old Generation. We would then set this percentage to be higher than the amount of memory being used by the caches.
    So, is it possible to tell the JVM to only run a Full GC when the memory in the Old Generation is greater than a specific value (say 85% full)?
    Thanks for your help.
    Andre Fischer.

    afischer wrote:
    The third solution is to configure the JVM so that Garbage Collections in the Old Generation do not occur until the memory is more than a specific percentage of memory in the Old Generation. We would then set this percentage to be higher than the amount of memory being used by the caches.
    So, is it possible to tell the JVM to only run a Full GC when the memory in the Old Generation is greater than a specific value (say 85% full)?
    Switch to the CMS collector.
    -XX:+UseConcMarkSweepGC
    -XX:CMSInitiatingOccupancyFraction=86
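    One caveat worth adding: on its own, CMSInitiatingOccupancyFraction only sets the initial trigger; the JVM's adaptive heuristics can then start collections earlier. To make the 86% threshold the sole trigger, it is usually paired with -XX:+UseCMSInitiatingOccupancyOnly. A minimal sketch of the combined flags (the jar name is a placeholder):

    ```shell
    # Start CMS old-generation collections only once occupancy crosses 86%,
    # ignoring the JVM's adaptive start heuristics.
    java -XX:+UseConcMarkSweepGC \
         -XX:CMSInitiatingOccupancyFraction=86 \
         -XX:+UseCMSInitiatingOccupancyOnly \
         -verbose:gc -XX:+PrintGCDetails \
         -jar app.jar   # placeholder application jar
    ```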

  • The GC running for no apparent reason.

    Has anyone seen the concurrent collector run, stopping the application, even though there is plenty of free memory?
    In the following logs from jstat, the eden space is not full and in fact is not touched. However, the CMS collector decides to run and locks up the application for 0.5 sec (seen in the GC logs).
    If it were triggered by a System.gc() you would see it as the cause (as you do on the first few lines)
    Is there anything special about an eden size which is 45% full? I have seen it get to 45% full before and not trigger a collection of the old gen.
    I have Java 6 update 21.
    From jstat -gccause <pid> 60s
    0.00   0.00  44.63  11.05  60.67      0    0.000    17   10.218   10.218 System.gc()          No GC
    0.00   0.00  44.64  11.05  60.67      0    0.000    17   10.218   10.218 System.gc()          No GC
    0.00   0.00  44.71  11.05  60.67      0    0.000    17   10.218   10.218 System.gc()          No GC
    0.00   0.00  44.71  11.05  60.67      0    0.000    17   10.218   10.218 System.gc()          No GC
    0.00   0.00  44.79  11.05  60.67      0    0.000    17   10.218   10.218 System.gc()          No GC
    0.00   0.00  44.92  11.05  60.67      0    0.000    17   10.218   10.218 System.gc()          No GC
    0.00   0.00  45.00  11.05  60.67      0    0.000    18   10.928   10.928 CMS Initial Mark     No GC
    0.00   0.00  45.00   7.29  60.00      0    0.000    19   11.501   11.501 CMS Final Remark     No GC
    0.00   0.00  45.10   7.29  60.00      0    0.000    19   11.501   11.501 CMS Final Remark     No GC
    0.00   0.00  45.24   7.29  60.00      0    0.000    19   11.501   11.501 CMS Final Remark     No GC
    0.00   0.00  45.33   7.29  60.00      0    0.000    19   11.501   11.501 CMS Final Remark     No GC
    BTW: The eden size is 6 GB, so there was over 3 GB free when the CMS collector ran.

    However, in the OP, the eden and survivor spaces are untouched. The eden space is at 45% full and continues to grow slowly. Only the tenured space is cleaned up (old and perm).
    Fair enough.
    I had imagined that the "concurrent" collector might conservatively collect the tenured generation to ensure the "young generation guarantee" before the eden space fills up: indeed, a procrastinating policy that waits for space to be scarce before kicking in increases both:
    - the risk that the collection occurs at a "bad time" (sudden increase of traffic, or whatever)
    - the duration of the pause (potentially more objects to traverse or scavenge, within a more fragmented and paged memory).
    However, I find nothing in the article that supports that hypothesis :(
    Edited by: jduprez on Oct 14, 2010 11:05 PM
    Hmm, I did find support for this hypothesis: see section "5.4 The Concurrent Low Pause Collector".
    Quoting the most relevant part:
    With the serial collector a major collection is started when the tenured generation becomes full and all application threads are stopped while the collection is done. In contrast a concurrent collection should be started at a time such that the collection can finish before the tenured generation becomes full. There are several ways a concurrent collection can be started.
    - (...) statistics on the time remaining before the tenured generation is full (...) appropriately padded so as to start a collection conservatively early.
    - (...) if the occupancy of the tenured generation grows above the initiating occupancy (...) by default is set to about 68%. (...)
    As per your OP, the tenured usage is much below 68%, but this leaves the first reason plausible. In particular, the article explicitly states that the pauses for the young generation collection and the tenured generation collection occur independently, so the tenured generation may be collected without (before) the young one.

  • Problems with cache.clear()

    Hello!
    We are having some problems with cache clears in our production cluster that we do once a day. Sometimes heaps "explode" with a lot of outgoing messages when we do a cache.clear() and the entire cluster starts failing.
    We had some success with an alternate method of doing the cache clear, where we iterate cache.keySet() and do a cache.remove(key) with a pause time of 100 ms after every 20000 objects until the cache is empty. But today nodes started failing on a cache.size() before the removes started (the first thing we do is log the size of the cache we are about to clear before the remove operations start).
    We have multiple distributed caches configured with a near cache. The nearcache has 10k objects as high units and the back caches vary in size, the largest is around 300k / node.
    In total the DistributedCache-service is handling ~20 caches.
    The cluster consists of 18 storage enabled nodes spread across 6 servers and 31 non storage enabled nodes running on 31 servers.
    The invalidation strategy on the near caches is ALL (or, it's AUTO, but it looks like it selects ALL, since ListenerFilterCount=29 and ListenerKeyCount=0 on a StorageManager?)
    Partition count is 257, backup count 1, no changes in thread count on the service, service is DistributedCache.
    Coherence version 3.6.1.0.
    A UDP test sending from one node to another shows 60 megabytes/s.
    Heapsize for the Coherence JVMs, 3gb. LargePages is used.
    Heapsize for the front nodes JVMs, 6gb. LargePages is used.
    No long GC-times (until the heaps explode), 0.2-0.6 seconds. CMS-collector.
    JDK 1.6 u21 64-bit.
    Windows 2k8R2.
    We are also running CoherenceWeb and some Read/Write-caches, but on different Coherence services. We are not doing any clear/size operations against caches owned by these services.
    Looking at some metrics from the last time we had this problem (where we crashed on cache.size()).
    The number of messages sent by the backing nodes went from <100/s to 20k-50k/s in 15 s.
    The number of messages resent by the backing nodes went from ~0/s to 1k-50k/s depending on the node in 15 s.
    At the time the total number of requests against the DistributedCache-service was around 6-8/s and node.
    To my questions: should it be a problem to do a cache clear with this setup (where the largest cache is around 5.4 million entries)? Should it be a problem to do a cache.size()?
    What is the nicest way to do a cache.clear()? Any other strategy?
    Could a lock on an entry in the cache cause this problem? Should it really cause a problem with cache.size()?
    Any advice?
    BR,
    Carl
    Edited by: carl_abramsson on 2011-nov-14 06:16
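    The batched-removal strategy described in the post (remove via keySet(), pausing 100 ms after every 20000 objects) can be sketched roughly as below. This is a hedged sketch, not Coherence-specific code: a plain java.util.Map stands in for the NamedCache (which implements Map), and the helper name batchedClear is our own.

    ```java
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class BatchedClear {
        // Removes entries in batches, pausing between batches so the cluster
        // can drain its message backlog instead of taking one huge burst.
        public static void batchedClear(Map<Object, Object> cache,
                                        int batchSize, long pauseMillis)
                throws InterruptedException {
            // Snapshot the keys first so we don't iterate a live, changing key set.
            Set<Object> keys = new HashSet<Object>(cache.keySet());
            int removed = 0;
            for (Object key : keys) {
                cache.remove(key);
                if (++removed % batchSize == 0) {
                    Thread.sleep(pauseMillis);
                }
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Map<Object, Object> cache = new HashMap<Object, Object>();
            for (int i = 0; i < 50; i++) {
                cache.put("key-" + i, i);
            }
            // The post used batchSize=20000, pauseMillis=100; small values here.
            batchedClear(cache, 20, 10);
            System.out.println("size after clear: " + cache.size());
        }
    }
    ```

    Against a real distributed cache, snapshotting keySet() also avoids ConcurrentModificationException-style surprises while other nodes keep writing.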

    Hi Charlie,
    Thank you for your answer! Yes, actually we are using a lot of expiry, and many of the items are created at roughly the same time! We haven't configured expiry in the cache configuration; instead we do a put with an expiry.
    Regarding the workload, compared to our peak hours it has been very low when we had problems with the size and clear operations. So the backing tier isn't really doing much at the time. That's what has been so strange about this problem.
    The release going live today has PRESENT as the near cache invalidation strategy. We will remove as much of the expiry usage as possible in the next.
    BR,
    Carl

  • Long GC pause - grey object rescan

    We observe occasional GC pauses on our app server which seem to be caused by a "grey object rescan". In the past they ranged from 20-30s, but yesterday we had an 84s pause. That's a bit much...
    [GC 1593754.769: [ParNew: 450212K->46400K(471872K), 0.1076340 secs] 1984347K->1585395K(3019584K), 0.1079070 secs]
    1593757.147: [GC[YG occupancy: 339101 K (471872 K)]1593757.147: [Rescan (non-parallel) 1593757.147: [grey object rescan, 84.6104290 secs]1593841.758: [root rescan, 0.3187110 secs], 84.9293110 secs]1593842.076: [weak refs processing, 0.0100700 secs]1593842.087: [class unloading, 0.1208600 secs]1593842.207: [scrub symbol & string tables, 0.0189270 secs] [1 CMS-remark: 1538995K(2547712K)] 1878096K(3019584K), 85.1028700 secs]
    My question is: what exactly causes this? Any way to avoid it?
    I've found a couple of bugs that might be related: 6367204, 6298694. Although I don't think we have any huge arrays.
    Our GC settings are as follows:
    /usr/java/jdk1.5.0_06/bin/java -server -Xms3000m -Xmx3000m -XX:LargePageSizeInBytes=2m -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSCompactWhenClearAllSoftRefs -XX:SoftRefLRUPolicyMSPerMB=200 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:-CMSParallelRemarkEnabled -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:NewSize=512m -XX:MaxNewSize=512m -XX:SurvivorRatio=8 -XX:PermSize=156m -XX:MaxPermSize=156m -XX:CMSInitiatingOccupancyFraction=60 -Xloggc:/var/ec/gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintClassHistogram -XX:+TraceClassUnloading -Duser.region=US -Dsun.rmi.dgc.client.gcInterval=86400000 -Dsun.rmi.dgc.server.gcInterval=86400000
    Any help/suggestions are highly appreciated. Thanks!

    When using the concurrent mark-sweep (CMS) collector there is a phase where
    CMS looks for all the live objects in the heap (referred to as concurrent
    marking of live objects) while the application is running and likely
    changing objects in the heap.
    In order to assure correctness, CMS has a phase (referred to as the remark)
    where all the application threads are stopped and it looks for
    objects that have changed while it was doing the concurrent marking. The "grey object
    rescan" refers to looking at the objects that have changed. This rescan
    depends on the number of objects that are changed during the
    concurrent marking, so the level of activity of the application can affect
    this phase.
    I note that you have turned off the parallel remark on the command line
    (-XX:-CMSParallelRemarkEnabled). If you've had problems with the
    parallel remark, then turning it on is not an option. If you have not had
    problems with parallel remark, turn it on and see if it helps.
    If you use the flag -XX:PrintCMSStatistics=1, you will get additional output.
    In it you can look for lines such as
    (re-scanned XXX dirty cards in cms gen)
    If XXX is smaller in the cases where the "grey object rescan" is shorter, and
    XXX is larger in the cases where the "grey object rescan" is longer, then
    the problem is due to lots of activity by the application during the concurrent
    mark. If you're not on the 5.0 JDK and can move to it, please do; the parallel
    remark will be more stable there.
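
    Putting the suggestions above together, a command line to diagnose the remark pauses might look roughly like this (the main class name and log path are placeholders, and the flag combination is a sketch to adapt, not a tuned recommendation):

    ```shell
    # Sketch only: turn parallel remark back on and enable per-phase CMS
    # statistics so the "re-scanned ... dirty cards" lines appear in the log.
    # All values here are illustrative assumptions, not tuned settings.
    java -server \
      -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
      -XX:+CMSParallelRemarkEnabled \
      -XX:PrintCMSStatistics=1 \
      -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
      -Xloggc:/var/ec/gc.log \
      MyServerMain
    ```

    One can then compare the dirty-card counts in the log against the remark pause times to see whether application activity during concurrent marking is the culprit.
    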

  • Find out current old heap usage from within the process

    Hello!
    We use the CMS garbage collector and need a way to find out how much of the old heap is used by reachable objects. This we have to do from within the process (not using jvmstat or jstat etc.).
    Since there is no way to distinguish between reachable and non-reachable objects (except by traversing the entire heap... -- or are there other possibilities?), our idea is to get the amount of used memory in the old heap right after a garbage collection.
    Using Java 1.5, this can be done by
    java.lang.management.MemoryPoolMXBean pool = <Pool for Old Generation>;
    pool.getUsage().getUsed();
    However, java.lang.management is only available in Java 1.5.
    Therefore my first question: Is there a similar way of finding out old heap usage in Java 1.4?
    There is another problem with this method: by calling pool.getUsage().getUsed(), one has to know when a GC has occurred (this could be done by calling it at an interval of x seconds -- if the current value is lower than the one before, a GC must have occurred). A better way would be to use pool.getCollectionUsage().getUsed(), but this seems not to work for the CMS collector.
    Second question: Is pool.getCollectionUsage().getUsed() really not working with CMS, or are we just doing it in a wrong way? Are there other ways of finding out the used memory in the old heap after a GC even when using CMS?
    Thanks for any help!
    Regards,
    Nicolas Michael

    Hi Nicolas,
    There is no API in 1.4 to get the after-GC memory usage of the old generation. The only thing close to it is (Runtime.totalMemory - Runtime.freeMemory), but that is the approximate amount of memory used for the whole heap (not just the old generation).
    MemoryPoolMXBean.getCollectionUsage() returns the after-GC MemoryUsage. This method should work for all collectors. I have a simple test case that shows it working fine with CMS; it shows the same value as -XX:+PrintGCDetails does.
    If you have a test case showing that this method doesn't work correctly, please submit a bug along with the test case. We'll investigate it.
    Thanks
    Mandy
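
    To make the answer above concrete, here is a minimal sketch of reading the after-GC usage of the old generation via MemoryPoolMXBean. The class name and the loose pool-name matching are my own assumptions; pool names differ between JVM versions and collectors (with CMS the pool is typically called "CMS Old Gen").

    ```java
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class OldGenAfterGc {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                // Match the old-generation pool loosely so this also works
                // with collectors other than CMS.
                String name = pool.getName();
                if (name.contains("Old Gen") || name.contains("Tenured")) {
                    // Usage measured right after the most recent collection
                    // of this pool; may be null before the first GC.
                    MemoryUsage afterGc = pool.getCollectionUsage();
                    if (afterGc != null) {
                        System.out.println(name + ": " + afterGc.getUsed()
                                + " bytes used after last GC");
                    }
                }
            }
        }
    }
    ```

    This polls nothing and needs no timer: getCollectionUsage() always reflects the state at the end of the last collection, which is exactly the "right after a GC" value the question asks for.
    
    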

  • Garbage collection eating up processor power

    Hi All,
    We've got 5 Sun Web Servers running on Java 1.4.2, and we used to use the default GC for the tenured space. The problem with that is that it takes 60-80 seconds every time the GC happens, and the latency on the site goes crazy. So we decided to switch one server to the Concurrent Mark Sweep collector to test it out. Here's the setting:
    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xms3G -Xmx3G -XX:NewSize=384M -XX:MaxNewSize=384M -XX:PermSize=64M -XX:MaxPermSize=64M -XX:CMSInitiatingOccupancyFraction=60
    With that setting, the server runs great. But eventually, when the server reaches a medium load (around 100-200 users), the tenured space is always around half full, and the CMS collector starts to run continuously, one cycle after another. It doesn't hurt the application, but it's taking 25% of processing time (we've got 4 CPUs, so one web server always keeps 1 CPU busy). I don't see that much CPU utilization on the other web servers that don't have CMS, and they have more users than the one with CMS. If we put CMS on all 5 web servers, I'm wondering if that will crash the server or not.
    Also, I'm thinking of using i-CMS on the JVM as well, even though i-CMS is optimized for 1-2 CPU machines; maybe that might reduce the amount of CPU used by CMS. Any thoughts?
    Thanks,
    TK
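
    For reference, trying i-CMS as suggested in the question would look roughly like this (the main class name is a placeholder and the duty-cycle values are illustrative assumptions to tune, not recommendations):

    ```shell
    # Sketch: incremental CMS breaks the concurrent work into short bursts,
    # trading longer collection cycles for lower sustained CPU use.
    # Flag values below are illustrative assumptions, not tuned settings.
    java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
      -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing \
      -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 \
      -Xms3G -Xmx3G -XX:NewSize=384M -XX:MaxNewSize=384M \
      WebServerMain
    ```

    Note that raising -XX:CMSInitiatingOccupancyFraction above 60 is another lever here: starting cycles later means fewer back-to-back collections when the tenured space hovers around half full.
    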

    This was cross-posted to [email protected] and answered there. See the archives at [http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2008-March/000081.html|http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2008-March/000081.html] for the analysis of this issue.
