Tools for root cause analysis of stack corruption?

I'm experiencing extremely rare stack corruption that results in SEGV core dumps for a large and complex C++ program. Having run all the standard tools such as IBM Rational Purify, Sun mdb/libumem and dbx/rtc (although unfortunately using dbx check -access takes inordinately long and eventually crashes before the main() function is executed), I am no closer to discovering the root cause of the stack corruption. I'm confident based on the tools run and on the stack trace from the core dump that the problem is not heap corruption, but stack corruption.
The environment is Sun Studio 12 (but not update 1) on Solaris 10 on SPARC. The program is compiled with minor optimisation (-xO2 -xbuiltin=%all).
Is anyone aware of other tools or approaches that could help pinpoint the problem? Your help would be much appreciated!
Thanks in advance,
Simon

If you are in fact having stack corruption issues, I don't think any of the tools you mentioned other than Purify would help you identify it.
You may also be simply running out of stack space, and not having corruption issues. Is your app multi-threaded? If so, you could increase the stack size your threads use to something larger than the default.
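As a rough sketch of the "increase the stack size" suggestion, assuming the application uses POSIX threads (the original post does not say which threading API is in use): the helper name `start_with_stack`, the example worker `deep_worker`, and the 8 MB figure are all made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical example worker: touches a 1 MB local buffer, so it needs a
 * thread stack comfortably larger than 1 MB. */
void *deep_worker(void *arg)
{
    volatile char buf[1024 * 1024];
    size_t i;

    for (i = 0; i < sizeof buf; i += 4096)
        buf[i] = 0;              /* touch one byte in every 4 KB */
    *(int *)arg = 1;             /* report completion to the caller */
    return NULL;
}

/* Hypothetical helper: start fn on a new thread whose stack is stack_bytes
 * large instead of the platform default.
 * Returns 0 on success or a pthread error number. */
int start_with_stack(pthread_t *tid, size_t stack_bytes,
                     void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc;

    assert(tid != NULL && fn != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Returns EINVAL if stack_bytes is below PTHREAD_STACK_MIN. */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return rc;
}
```

A caller would use something like `start_with_stack(&tid, 8u * 1024 * 1024, deep_worker, &done)` followed by `pthread_join(tid, NULL)`. If the crashes stop with a larger stack, that points to stack exhaustion rather than corruption.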
Another thing you can look for is syslog entries stating "no swap space to grow stack" for your process, which would mean you've run out of virtual memory. To avoid this, you can "pre-allocate" your stack memory with code similar to this:
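#include <alloca.h>   /* alloca() */
#include <string.h>   /* memset() */
void growStack( size_t bytes )
{
    /* Touch every byte so the stack pages are created now, not later. */
    char *mem = ( char * ) alloca( bytes );
    memset( mem, 0, bytes );
    return;
}
That code, when called, will force the creation of stack memory virtual pages backed by swap, before your server gets into a situation where free memory might be in short supply.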
I also seem to recall that Solaris under certain circumstances will allocate stack memory with the MAP_NORESERVE option, which means swap space won't be reserved for your stack. If your process gets swapped out, its stack(s) will be lost and you'll probably get a SIGSEGV or SIGBUS. See this bug:
http://bugs.opensolaris.org/view_bug.do?bug_id=1221729
I remember working a similar issue for a customer running large apps on Sun E15Ks, maybe about 5 years ago. To work around this behavior, I think you'll need to explicitly allocate stack memory for any threads you may be creating. I think that's what we had to do.

Similar Messages

  • E2E Root cause analysis in solution manager

    Hi
    How do I add E2E root cause analysis, the new functionality in Solution Manager 4.0? Does it require a support pack, and if so, which SP level is required to use the new functionality?
    Regards
    Dinaker vikas

    Hi,
    If you have PI system in your landscape, I would strongly recommend to upgrade to SP15.
    Trace monitoring for PI is to be available from SP17 (Not yet released)
    Check this link: http://service.sap.com/e2e
    Hope this solves your problem.
    Feel free to revert back.
    --Ragu

  • What is the diff of Solution monitoring and E2E Root cause analysis

    Hi,
    I want to know the difference between Solution Manager Solution Monitoring and End-to-End root cause analysis. We have currently implemented Change Management and Implementing and Upgrading SAP Solutions. If I would like to explore Solution Monitoring and End-to-End root cause analysis, what configuration steps do I need?
    If I would like to check whether the system is performing well, or set up alerts for when the system has a problem, which feature would be better: End-to-End functionality or Solution Monitoring?
    Please advise.
    thanks
    -Fung


  • Root Cause Analysis (OS and DB - OS Command Console) Connection problem!

    Dear Community,
    I've got a problem with the Root Cause Analysis OS Command console!
    Currently I try to read the System MemStat info from the console, without success!
    I've got the following errors:
    Error1:
    Error to parse the data returns by 'SAPOSCol', cause :com.sap.smd.plugin.remoteos.exception.SAPOSColDataParserException: Unknown Error during the parsing of saposcol output; nested exception is:
    java.lang.NullPointerException
    Error2:
    'SAPOSCol' Service not availbale, detail :com.sap.smd.plugin.remoteos.exception.SAPOScolNotAvailable: SAPOSCol not available (detail output: )
    The saposcol service is running, when I look on the server, I've got the correct status:
    server:<sid>adm> saposcol -s
    Collector Versions :
      running : COLL 20.94 700 - v2.00, AMD/Intel x86_64 with Linux
      dialog  : COLL 20.94 700 - v2.00, AMD/Intel x86_64 with Linux, 2007/02/16
    Shared Memory       : attached
    Number of records   : 5758
    Active Flag         : active (01)
    Operating System    : Linux server 2.6.5-7.283-smp #1 SMP Wed Nov 29
    Collector PID       : 9536 (00002540)
    Collector           : running
    Start time coll.    : Wed Apr 23 15:00:58 2008
    Current Time        : Wed Apr 23 15:12:08 2008
    Last write access   : Wed Apr 23 15:12:04 2008
    Last Read  Access   : Wed Apr 23 15:12:03 2008
    Collection Interval : 10 sec (next delay).
    Collection Interval : 10 sec (last ).
    Status              : free
    Collect Details     : required
    Refresh             : required
    Header Extention Structure
    Number of x-header      Records : 1
    Number of Communication Records : 60
    Number of free Com.     Records : 60
    Resulting offset to 1.data rec. : 61
    Trace level             : 2
    Collector in IDLE - mode ? : NO
      become idle after 300 sec without read access.
      Length of Idle Interval  : 60 sec
      Length of norm.Interval  : 10 sec
    The strange thing is, sometimes it works, sometimes not!
    Any Ideas?
    System: Solution Manager 4.0 SP Stack 13
    Thanks and regards
    Sascha

    >   running : COLL 20.94 700 - v2.00, AMD/Intel x86_64 with Linux
    >   dialog  : COLL 20.94 700 - v2.00, AMD/Intel x86_64 with Linux, 2007/02/16
    I suggest you upgrade your saposcol on the Linux box. It's more than a year old, and it's possible that the SMD client needs a newer version that provides the data you're requesting.
    Markus

  • Root Cause Analysis - Breakpoints in ABAP

    Gurus ,
    We had breakpoints in the ECC datasource extractors that got transported into production, and I have been given the task of performing a root cause analysis of why they were not caught in dev, QA and testing.
    Can anyone share their experience on how to avoid this negligence, and the additional steps required to ensure this does not happen again?
    Thanks in advance

    Hi,
    To identify breakpoints and avoid transporting them, you can run the extended program check. The breakpoints present in your report are listed there as errors under superfluous statements.
    Regards,
    Vasanth

  • How to do E2E root cause analysis?

    Gurus:
    Could you provide some "how-to" guides here about how to do E2E root cause analysis?
    I have NO problem with E2E set up. My problem is after setting up, how to use it to do E2E root cause analysis?
    Thanks!

    > Ashley Ho wrote:
    > Could you provide some "how-to" guides here about how to do E2E root cause analysis?
    Have you already visited the Diagnostics Page and SolMan Ramp-Up Knowledge Transfer in SAP Support Portal?
    [www.service.sap.com/diagnostics]
    [www.service.sap.com/rkt-solman]
    There is also a very interesting training available:
    E2E100 Root Cause Analysis    (I don't know the exact name)
    Best regards,
    Ruediger

  • Building and deploying J2EE apps ?  Now there is a solution for production root cause analysis.

    Is your organization building and deploying J2EE apps? If so, Halo
    can help solve one of the toughest issues facing enterprises today:
    Finding the root cause of software faults.
    "Halo monitors, pinpoints, reports on and provides a source-code level
    root cause of software faults in deployed J2EE apps. Halo is unique
    because it's the only technology that can give you a root cause
    diagnosis in a fully deployed, live production application. Halo has
    such low performance overhead that customers deploy their final,
    production versions of their applications with Halo enabled.
    Used with Web Application Servers like WebLogic, Halo helps ensure
    that deployed code is reliable and able to be quickly fixed if
    problems turn up. Most important, because Halo is an "always on"
    technology, you get all the information you need to rapidly solve a
    problem on the first fault. Problem replication and bug reports are
    obsolete with Halo.
    "Halo has a unique ability to provide a root cause diagnosis and
    understanding of software problems in production systems, without
    needing to replicate the issue.
    Tests on WebLogic proved that Halo runs with extremely low overhead and
    is suitable for use in deployed production systems."
    Andrew Sliwkowski, Software Engineer
    BEA Systems, Inc.
    The key is Halo's high performance, low overhead TraceBack
    instrumentation technology. Based on technology out of MIT and proven
    in the field, TraceBack enables you to instrument JARs, EARs and WARs
    within minutes, without touching source code.
    Halo is useful throughout the entire application life cycle, from
    development through test, beta and deployment.
    If you have interest in learning more visit our website at
    www.incer.com or email me directly at [email protected] (Rick Martin)

    I have two questions. We have just started developing apps using jdev9i, 9iAS v2 and are new to the j2ee environment so my questions may be very easy ones.
    Question 1: We have set up Oracle pooling connection to our databases. We have a development, test and production database. When I deploy my application, it includes the connections. This is preventing me from moving the EAR files from dev to test to prod without modification and redeploying to my EAR file. Is there a way or a place that I can put my database connections so that they will not be included in my EAR files and the application can still find them?
    datasources.xml is where the information regarding connections to databases is located. If you're using 9iAS you can use EM to create a datasource entry at the global level. In OC4J standalone you could use admin.jar or edit the file. Check out the standalone user's guide at http://otn.oracle.com/tech/java/oc4j/pdf/oc4j_so_usersguide_r2.pdf.
    Also, you will find other OC4J docs on OTN.
    Question 2: I have a standalone OC4J set up for our developers to use while testing their applications. The applications include libraries supplied in jdev such as xml parser v2. I do not want to deploy those lib files with the app because I will have to redeploy all my apps if I upgrade jdev. I just want to be able to upgrade the libraries, test the apps and not have to redeploy everything. I can do this by copying the jdev lib to 9iAS, but I can't seem to find the right place to put the lib for the standalone OC4J instance.
    You can use the library tag within application.xml for server-wide availability. Check out the article http://otn.oracle.com/oramag/oracle/02-sep/o52oc4j_2.html, specifically the section on class loading in OC4J.
    Any help would be greatly appreciated. Thanks in advance.

  • Third Party Tool for Scheduling Web Analysis reports

    Hello,
    I am using Hyperion Web Analysis for report development and Workspace for viewing reports.
    Now I am facing an issue with scheduling WA reports through Workspace. A few experts told me that we can't schedule a Web Analysis document in Workspace, but scheduling is an urgent and important requirement for me.
    Can anybody suggest how I can go about scheduling?
    Do I need a third-party scheduling tool? If yes, which one?
    Thanks

    WA is for online report only and does not support report delivery and scheduling. Convert your WA reports to IR. Since 9.3.1 IR connectivity to Essbase has been improved (see CubeQuery).
    HTH
    Gerd

  • E2e Root Cause Analysis Checklist

    Hello everyone,
    I know I am close to getting some systems setup for RCA, but nothing is showing up in EWAs or within the RCA workcenter.
    My system is currently at 7.01 SP05.
    Since I've gone through the solman_setup and received all green lights through setting up my  diagnostics system and a couple managed I don't know what else to get done. I know the managed system wizard is a step-through but I am asking for help setting up a checklist to make sure I've got everything installed and checked.
    Configuration:
       Initial - complete
       Basic - complete
    Diagnostics System
       setup wizard - green light
       advanced setup - green light
       upgrader - green light
    Managed Systems
       I have a bunch of different systems with green lights under setup results; ABAP, JAVA, and Dual Stack
    ISAGENT
       I've got updated isagents installed on each system
    Wily
       Currently at the latest release and our Solution Manager system is connected
    SMD Agents
       each server/LPAR has an agent installed and is attached
    I'd like to finish this up but again I am missing something?!?
    Thanks for any help on this one.
    Ryan

    Hello Ryan,
    I would suggest you visit these two links:
    > http://service.sap.com/diagnotics - media library - here you can get installation and troubleshooting guides
    and
    > http://service.sap.com/EWA
    This way you have all the steps documented, and you can check what you have done so far against what is required, and identify the gaps.
    Hopefully this information has been helpful.
    Regards,
    Paul

  • Directory damage and fallout -- looking for root cause

    One of the G5s (2x 2.7GHz, 4.5 GB RAM, 250 GB HD) I support at work started exhibiting erratic behavior yesterday (spinning ball, slow app launch, general weirdness) and I told the user to reboot. He did, and the grey pinwheel screen came up and it sat there spinning for a while and then went wind-tunnel with black screen.
    I powered it back on, went into single-user mode, ran fsck -fy and got a host of errors. Since I assumed fsck would fix things, I didn't note what the initial errors were, but there were a lot of them. Missing files maybe?
    Anyhow, it couldn't repair everything. The errors I then got were:
    Incorrect number of thread records (4, 228)
    and
    Invalid volume file count
    I booted with TechTool 4.x and it found errors as well but could not repair them. I then bought DiskWarrior, booted up another Mac and plugged the G5 in via FireWire disk mode. I rebuilt the directory and rebooted the G5 into single-user mode, and fsck gave me a clean bill of health.
    I rebooted again and it passed through the grey screen and "hung" on the blue screen prior to login for about 30 mins before I lost patience and rebooted in verbose mode.
    In verbose mode, everything seemed to go well until I got the following message every three seconds...
    ...waiting for IFC
    At that point, I left it to do its thing and came back this morning to the login window.
    I then rebooted the machine again and it "hung" on the blue screen again.
    Right now I am imaging the internal drive over to another drive. I am just going to rebuild the machine now, but what I really want to know is what is this IFC service that it was waiting for?
    (Unfortunately the rebuild will take a while since it has both Adobe CS2 AND Final Cut Pro -- two long installs that require multiple DVD swaps)
    --Mike

    Michael,
    Disk repair utilities, when they work, will return your file system to a "consistent state," but they cannot restore data that may have been lost as a result of the disk error. In other words, even after a successful repair, you may still have damaged files.
    If the damaged files happen to be critical system files, the machine "won't go." Period.
    The original cause could be one or more of several things. If the user that "rebooted" the machine had to perform a forced shutdown, doing so would only have aggravated the issue, causing further errors and damage. The original cause could have been a bad block on the drive, or it could simply have been a "tired" file system.
    If you want to get ahead of the game somewhat, I would recommend that you "zero" the drive (with the drive selected in Disk Utility, not the "volume" on the drive). This will map out any potential bad blocks, thereby eliminating the possible need to go through this again in the near future.
    Also, be aware that any "image" of the volume in question will, itself, remain damaged. I would recommend that you attempt to recover only user data from this backup; do not attempt to "restore" it using Disk Utility.
    Scott

  • Problem with E2E Root Cause analysis in EHP1

    Hi All,
    I'm facing a very strange problem with using the E2E RCA functionality in EHP1. The problem is when I select the systems that I want to analyse and click on a functionality such as change analysis, I get the following error message in SAP GUI:
    Action canceled
    Internet Explorer was unable to link to the Web page you requested. The page might be temporarily unavailable.
    I'm fairly sure that this is a connectivity problem because it only happens on my client machines which are behind the firewall. If I connect to Solution Manager without going through the firewall I can access the application. I have checked all firewall logs and it is not blocking anything. I also did a trace using Wireshark but could not find anything suspicious.
    Has anybody faced this problem? Any suggestions to resolve the problem?
    Cheers,
    Masoud


  • Tool for Airport network statistics per connected device ?

    I'm in search of a tool that would help me identify network traffic load per connected device to my Airport network.
    My network is made of
    * WiFi infra: one main Airport Extreme base station connected thru Ethernet to my SP's cable-modem + 2 Airport Express as relays, all interconnected with WDS.
    * WiFi clients : 2 Macs + 2 PCs + 1 printer + 1 WiFi PDA-Phone, all connected thru WiFi only
    * I suspect misuse by one of the Macs or PCs (or alien ?) of WiFi network as sometimes network performance is really low, impacting all end-points' network performance
    * ideal tool could be based on SNMP stats of Ethernet/TCP/UDP/ports & packets per connected device. Should cover each Airport Express relay, the main Airport Extreme, possibly the cable-modem, and bring help for root cause analysis to go up the chain to faulty client and application (at least port/protocol)
    any idea of such kind of tool (preferably run on Mac OS X)
    thx in advance

    I apologize for taking up your time. I had bought this to use with my PS3 (60GB Launch model with 802.11b) but hadn't used the PS3 in the equation yet. I kind of gave up on the project for awhile and unplugged the Express. Just for giggles I plugged it in later and cabled it up to the PS3 for the first time. Worked perfectly. Now my PS3 downloads are flying.
    I'm not sure what solved the problem but it is working great now! Thanks again for the help.

  • Indices configuration for XML document analysis (indexing time problems)

    Hi all,
    I'm currently developing a tool for XML document analysis using XQuery. We need to analyse the content of a large CMS dump, so I am adding all documents to a Berkeley DB XML database to be able to run XQueries against it.
    In my last run I ran into indexing speed problems, with single documents (typically 10-20 K in size) taking around 20 sec to be added to the database after 6000 documents (I've got around 20000 in total). The insert speed drops as the number of documents in the database grows.
    I suspect my index configuration to be the reason for this performance drop. Indeed, I've been very generous with indexes, as we have to analyse the data and don't know the structure in advance.
    Currently my index configuration includes:
    - 2 default indices: edge-element-presence-none and edge-attribute-presence-none to be able to speed up every possible xquery to analyse data patterns: ex. collection()//table//p[contains(.,'help')]
    - 8 edge-attribute-substring-string indices on attributes we use often (id, value, name, ...)
    - 1 edge-element-substring-string index on the root element of the xml documents to be able to speed up document searches: ex. collection()//page[contains(.,'help')]
    So here my questions:
    - Are there any possible performance optimisations in Database config (not index config)? I only set the following:
    setTransactional(false);
    envConf.setCacheSize(1024*64);
    envConf.setCacheMax(1024*256);
    - How can I test various index configuration on the fly? Are there any db tools that allow to set/remove indexes?
    - Is my index config suspect? ;-)
    Greetings,
    Nils

    Hi Nils,
    The edge-element-substring-string index on the document element is almost certainly the cause of the slow document inserts - that's really not a good idea. Substring indexes are used to optimize "=", contains(), starts-with() and ends-with() when they are applied to the named element that has the substring index, so I don't think that index will do what you want it to.
    John

  • Configuration Solution Manager - Travel Management for Root Cause Analysis

    Hi,
    Can you describe how to configure Solution Manager to monitor (RCA-E2E) our Travel System (SAP Enterprise Portal + SAP HR)?
    Best regards,
    Diego.

    Hi,
    The RCA configuration remains the same for all modules/components.
    All the steps are in the RCA Configuration thread:
    SPRO > SAP Solution Manager Implementation Guide > SAP Solution Manager > Basic Settings > Root Cause Analysis
    There you can find all the information regarding the setup; please follow the steps one after the other.
    Regards,
    Shyam.

  • Oracle.jbo.NoDefException: JBO-29114 ADFContext is not setup to process messages for this exception. Use the exception stack trace and error code to investigate the root cause of this exception. Root cause error code is JBO-25058. Error message parameters

    Dear Gurus,
    I have not been able to solve the above issue for the last couple of days.
    I am a newbie to web services.
    My issue:
    I am using JDeveloper 11.1.2.4.0 Release 2.
    1. Using JDev I built one small web service with two methods.
            While testing the web service...
                   I passed the user ID as a parameter and it successfully returned the values (user id, user name and description) from the fnd_user table.
    2. I created another application to consume the web service I created.
                   1. I added the web service SOAP and added the method.
                   2. I created a JSF page and dragged and dropped the parameter and return values onto the JSF page.
    3. While executing the created JSF page I received the error message below:
    "oracle.jbo.NoDefException: JBO-29114 ADFContext is not setup to process messages for this exception. Use the exception stack trace and error code to investigate the root cause of this exception. Root cause error code is JBO-25058. Error message parameters are {0=Attribute, 1=UserName, 2=UserName}"
    Even though I know this issue has come up before in our forum, I was not able to solve it.
    Can anybody help me solve this issue?
    Thanks and Regards,
    Durai S E

