Libumem and UMEM_LOGGING

Hello,
I am using libumem to try to find memory leaks, but my program needs to run for a long time (several hours) before leaks are likely to appear.
Meanwhile, many allocations and deallocations can occur.
Regarding the log as defined by UMEM_LOGGING=transaction=size, it seems this log is not circular.
In other words, when the log gets full, further allocations/deallocations are no longer logged. As a consequence, I cannot find any leak with mdb because the "interesting" transactions are ignored.
Can you confirm that this is how the libumem transaction log works? Is there any workaround?
Thanks,
Olivier.

Have you tried using dbx runtime checking to find memory leaks and access errors?
http://docs.sun.com/source/819-0489/RunTCheck.html
I don't know how libumem by itself can find memory leaks for you. A lot of the usefulness of libumem debugging is because of the functions integrated into mdb.
I suspect that the libumem logs are actually circular, because that is much more useful than simply disabling the log when it gets full.
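To make "circular" concrete, here is a minimal sketch of a ring-buffer log in C. It is only an illustration of the idea, not libumem's actual log implementation; the type ring_log_t, the functions and the tiny capacity are all made up for this example. The point is that once the log is full, new records overwrite the oldest ones, so the most recent transactions are always present.

```c
#include <stddef.h>

#define LOG_CAPACITY 4  /* hypothetical, tiny on purpose */

/* Illustrative circular log; zero-initialize before use. */
typedef struct {
    unsigned long entries[LOG_CAPACITY];
    size_t next;   /* slot that the next record will be written to */
    size_t count;  /* total number of records ever written */
} ring_log_t;

/* Record a transaction id, overwriting the oldest entry once the log is full. */
void ring_log_record(ring_log_t *log, unsigned long txn_id)
{
    log->entries[log->next] = txn_id;
    log->next = (log->next + 1) % LOG_CAPACITY;
    log->count++;
}

/* Oldest transaction still held in the log (requires count > 0). */
unsigned long ring_log_oldest(const ring_log_t *log)
{
    if (log->count < LOG_CAPACITY)
        return log->entries[0];
    return log->entries[log->next];
}

/* Most recent transaction in the log (requires count > 0). */
unsigned long ring_log_newest(const ring_log_t *log)
{
    return log->entries[(log->next + LOG_CAPACITY - 1) % LOG_CAPACITY];
}
```

With a capacity of 4, recording transactions 1 through 6 leaves 3, 4, 5 and 6 in the log. A non-circular log of the same size would instead hold 1 through 4 and drop the later ones, which is the behavior the original question is worried about.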
I think the way mdb checks for leaks is by examining which blocks are outstanding (currently allocated) and then scanning through all the stack memory and data memory and looking for any pointers to those blocks. If it finds blocks that have no pointers to them in valid memory, then that's a "leak".
Sometimes people call a block a "leak" if it is left allocated when the program exits, but that's not really a leak, in my opinion.
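The scanning idea described above can be sketched in C roughly as follows. This is a simplified illustration under my own assumptions, not mdb's code: block_t, mark_referenced and count_leaked are hypothetical names, and the sketch makes a single pass over the given words, whereas a fuller version would also scan the contents of blocks it finds reachable.

```c
#include <stddef.h>
#include <stdint.h>

/* One outstanding (currently allocated) block. */
typedef struct {
    uintptr_t start;      /* address of the first byte of the block */
    size_t    size;       /* size of the block in bytes */
    int       reachable;  /* set to 1 once some scanned word points into it */
} block_t;

/*
 * Treat every word of the scanned memory (stack, data) as a potential
 * pointer and mark each block that one of those words points into.
 */
void mark_referenced(block_t *blocks, size_t nblocks,
                     const uintptr_t *words, size_t nwords)
{
    size_t w, b;
    for (w = 0; w < nwords; w++) {
        for (b = 0; b < nblocks; b++) {
            if (words[w] >= blocks[b].start &&
                words[w] - blocks[b].start < blocks[b].size)
                blocks[b].reachable = 1;
        }
    }
}

/* Blocks that nothing points to are reported as leaks. */
size_t count_leaked(const block_t *blocks, size_t nblocks)
{
    size_t b, leaked = 0;
    for (b = 0; b < nblocks; b++)
        if (!blocks[b].reachable)
            leaked++;
    return leaked;
}
```

Because any word that merely looks like a pointer keeps a block "reachable", this kind of scan can miss real leaks, but a block it does report has no reference anywhere in the scanned memory.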
If you are playing around with libumem, you might be interested in a basic libumem module that I wrote for dbx. It's available from the blog entry that I wrote here:
http://blogs.sun.com/roller/page/quenelle?entry=umem_integration_with_dbx

Similar Messages

Libumem and java native leaks

I'm diagnosing a native memory leak in a Java process (not caused by a user JNI library, probably due to not closing some stream tied to native resources). I have used this technique before with some success, but this time I'm running into some problems. Basically, I'm using libumem and mdb to help find the leak. I realize that these tools can give some strange results when looking at the JVM, but when I have used them previously, I could just focus on the leaked buffers with a large count, and that pointed me right to the problem (the problem is bad enough that the process eventually runs out of memory space, so I know the leak is being triggered repeatedly). The first problem I have is with libumem/mdb and the stack printed by bufctl_audit, which only displays symbol addresses for Java routines. If it displayed the Java symbol names, or if there were a way to make it display them, I wouldn't have much of a problem (though I've found I need to increase the audit size because of the large stack frames of a Java process).
So what I've done before is find the C library/routines where the memory is being leaked, and then use dtrace to print stack traces for calls into that C library. Something like this:
dtrace -n 'pid$target:libzip:ZIP_GetEntry:entry { @s[jstack(60,3000)] = count() }' -p <PID>
This has pointed me to the right place in the past, because the place where the leak was happening was getting called frequently. The problem with the current leak is that this is much too coarse-grained. The C library in question is called quite often, so I'm getting way too many stacks, and sorting through them to find the problematic one is difficult. Now, if jstack() would show the Java symbol name AND its address, I could easily correlate it with what's in the findleaks/bufctl_audit output of libumem/mdb.
Is there something I can do inside mdb to help find the java symbol names, or is there a way to use dtrace to correlate the java symbol name with its address to help me out here? I can do some rather ugly iterative stuff with dtrace where I don't give it a large enough buffer to print the whole stack trace and with small increases probably find the mappings from the java symbols to the address, but I was hoping there was something a little less painful. This of course needs to be diagnosed in a production system...so what I can do is somewhat limited. There is redundancy, so I obviously can for a short period pause execution on one server to grab the findleaks output and things like that. Thanks,
Micah

Since your question is about tracing Java internals, you might do better posting in a Java-related forum. (This forum is for Sun Studio, for developing in C, C++, and Fortran.) Try one of the forums listed under Java here:
http://forums.sun.com/index.jspa?tab=forums
and especially here:
http://forums.sun.com/category.jspa?categoryID=39

Help finding location of Memory corruption using libumem and mdb

Hi there,
I am debugging an application, an RPC-based server that runs on a Solaris 10 box. This server has a footprint of about 2.3 GB (with the libumem debug options enabled, it's close to 3.5 GB). Most of the size is because of caching. When certain modifications are made to objects in the cache, the modifying thread makes a copy of the cache, and the older copy is freed when no other thread is using it.
One of our customers is using libumem as the memory manager. They started seeing crashes every now and then when the copy of the cache is being freed. Obviously, it doesn't crash every time, and each time the crash involves a different object altogether. libumem raises a signal 6 abort and complains that a piece of memory being freed is invalid or corrupt. Needless to say, countless hours of code walkthroughs and attempts to reproduce the problem in house have failed. So we turned on libumem debugging and logging. Using watchmalloc at this site is not an option because of performance considerations. This is the setup we are using.
LD_PRELOAD=${LD_PRELOAD}:libumem.so.1
export LD_PRELOAD
UMEM_DEBUG=default;export UMEM_DEBUG
UMEM_LOGGING=transaction;export UMEM_LOGGING
We have a core from this crash. I am using mdb to analyze the core.
::umem_status tells me that the address being freed is invalid or corrupt memory. It even shows me where it was initially allocated. I want to find out where this piece of memory got corrupted.
::umalog shows the log of all memory allocation and freeing transactions. But because this is a huge app, it has a ton of information. I tried using address::umalog and that doesn't seem to narrow down the log.
So my question is: given the core generated using these settings, what mdb commands do I use to find out what operation/code corrupted the memory? I would expect to see three pieces of information for the address whose free failed.
1. Where it was initially allocated
2. Where the error/crash occurred
3. What operation corrupted the memory, whether it's an overwrite or a free (leading to a double free)
I'll be glad to provide more information if needed.
Any help is deeply appreciated.

If you have the address that is corrupt, try finding addresses that are just before the corrupt address. If your corrupt address is 0xF028, for example, look for addresses like 0xF020 or 0xF018. Most likely, some code is writing too far into a block and corrupting what's in the next block.
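As a rough illustration of how an overrun like that can be detected, here is a minimal C sketch that places guard bytes directly after a buffer and checks them later. libumem's debugging mode relies on a similar redzone idea, but this is not its implementation: guarded_alloc, guard_intact, the pattern and the guard size are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define REDZONE_BYTES   8     /* hypothetical guard size */
#define REDZONE_PATTERN 0xAB  /* hypothetical guard pattern */

/* Allocate n usable bytes followed by REDZONE_BYTES guard bytes. */
void *guarded_alloc(size_t n)
{
    unsigned char *p = (unsigned char *)malloc(n + REDZONE_BYTES);
    if (p == NULL)
        return NULL;
    memset(p + n, REDZONE_PATTERN, REDZONE_BYTES);
    return p;
}

/* Return 1 if the guard bytes after the n usable bytes are untouched, else 0. */
int guard_intact(const void *buf, size_t n)
{
    const unsigned char *guard = (const unsigned char *)buf + n;
    size_t i;
    for (i = 0; i < REDZONE_BYTES; i++)
        if (guard[i] != REDZONE_PATTERN)
            return 0;
    return 1;
}
```

A check like this only tells you that the block was overrun at some point before the check ran, which matches the symptom in the question: the damage is noticed at free time, well after the write that caused it.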

JVM 1.5.0_11 and libumem -- need stack trace help for memory leak

I'm trying to track down the cause of some memory growth in a java application. In my tests, the java heap appears to remain stable, but the overall memory footprint of the jvm process continues to grow (observed with pmap).
I've run my application with libumem and have found what appears to be the culprit, but the memory allocation is in libjvm.so and I'm looking for ideas what might cause it.
uname -a for my host
SunOS thehost 5.10 Generic_118822-18 sun4u sparc SUNW,Netra-440
Here is the trace from libumem:
1f81c4c0::bufctl_audit
    ADDR      BUFADDR   TIMESTAMP     THREAD
              CACHE     LASTLOG       CONTENTS
    1f81c4c0  1f81a470  ac018b4577a0  7
              1f43f188  8cda6a4       0
libumem.so.1`umem_cache_alloc+0x210
libumem.so.1`umem_alloc+0x60
libumem.so.1`malloc+0x28
libjvm.so`void*os::malloc+0x28
libjvm.so`void*ResourceObj::operator new+0x38
libjvm.so`ThreadStackTrace::ThreadStackTrace #Nvariant 1+0x34
libjvm.so`void VM_ThreadDump::doit+0xcc
libjvm.so`void VM_Operation::evaluate+0x80
libjvm.so`void VMThread::run+0x6e0
libjvm.so`void*_start+0x208
libc.so.1`_lwp_start
What causes this invocation in the JVM? Is there a known memory leak associated with this?
Thanks in advance for the assistance.

More on this issue. The included program will continually allocate memory on the process heap until the JVM cannot allocate memory and it exits with the following exception.
Exception java.lang.OutOfMemoryError: requested 16 bytes for C_Heap: ResourceOBJ. Out of swap space?
import java.lang.StackTraceElement;
import java.lang.Thread;
public class TraceIt {
     public static void main(String[] args) {
          System.out.println("Starting trace");
          int i = 0;
          // Loop forever; each getStackTrace() call triggers the native allocation
          while (true) {
               if (i % 100 == 0) System.out.println(i);
               StackTraceElement[] se = Thread.currentThread().getStackTrace();
               i++;
          }
     }
}

How to find out what is using the native heap of a process running a JVM?

Hello,
I am not sure where to post this question so I am starting here.
I am troubleshooting a Java application that makes some native calls (32-bit Java running on Solaris 10). The size of the process (as reported by prstat) is slowly increasing day after day.
The size of the 'Java heap' is fixed at the start (-Xms and -Xmx are set to the same value on the command line when launching the Java app) and the GC is working fine. There are no memory complaints from the Java side of the application.
It is the size of the 'native' heap (as reported by 'pmap') that is increasing:
root@mas01 # pmap 5382
5382:/apps/java/bin/java -server -Xms207M -Xmx207M -XX:MaxNewSize=24M -XX:N
00010000 64K r-x-- /apps/jdk1.5.0_19/bin/java
0002E000 16K rwx-- /apps/jdk1.5.0_19/bin/java
00032000 3896K rwx-- [ heap ]
00400000 389120K rwx-- [ heap ]
18000000 2784K rwx-- [ heap ]
DCAF4000 48K rw--R [ stack tid=169 ]
DCBF6000 40K rw--R [ stack tid=161 ]
DCCF8000 32K rw--R [ stack tid=160 ]
My first reaction was to search for a memory leak. I found a minor leak in the JVM with the ::findleaks dcmd (run within the mdb debugger). I upgraded to a later release of Java 5 (Java 1.5.0_19) where the leak is fixed, but the heap is still increasing.
Many parts of the process allocate memory in the native heap: the JVM itself, the native calls made to a C++ library that is part of our Java application, and maybe also some third-party software.
I would like to know the best way to find out what is consuming more and more memory in the native heap. I started looking at DTrace, but I am new to it. I also thought the Solaris Perftools might be of use, but I have never used them. Before plunging into a tool more or less blindly, I am asking for advice on how to tackle this issue. Can someone recommend a tool or method to see what is allocated in the heap?
Regards,
Stéphan
Edited by: StephanDupont on Sep 22, 2009 8:47 AM

After googling a lot, I managed to run my application with libumem, generate a core file, and find some leaks with mdb, even though ::findleaks reported nothing.
Does anyone know whether ::findleaks (you need libumem and mdb) is supposed to find leaks in the native part of the memory of a Java application that uses the JNI interface?
Regards,
Stéphan

How to find out who is using the shared library?

Is there a way to find out who is listening to my music on my shared library and what they are listening to?


Memory leak with ThreadStackTrace in libjvm.so (jdk 1.5.0_11)

I posted this initially in "Desktop > Runtime Environment > Java Runtime Environment (JRE)", but am reposting here since this may be a more appropriate place. Here's a link to the original post:
http://forum.java.sun.com/thread.jspa?threadID=5156031&tstart=0
I'm trying to track down the cause of some memory growth in a java application. In my tests, the java heap appears to remain stable, but the overall memory footprint of the jvm process continues to grow (observed with pmap).
I've run my application with libumem and have found what appears to be the culprit, but the memory allocation is in libjvm.so and I'm looking for ideas what might cause it.
uname -a for my host
SunOS thehost 5.10 Generic_118822-18 sun4u sparc SUNW,Netra-440
and I'm using Java 1.5.0_11
Here is the trace from libumem:
1f81c4c0::bufctl_audit
    ADDR      BUFADDR   TIMESTAMP     THREAD
              CACHE     LASTLOG       CONTENTS
    1f81c4c0  1f81a470  ac018b4577a0  7
              1f43f188  8cda6a4       0
libumem.so.1`umem_cache_alloc+0x210
libumem.so.1`umem_alloc+0x60
libumem.so.1`malloc+0x28
libjvm.so`void*os::malloc+0x28
libjvm.so`void*ResourceObj::operator new+0x38
libjvm.so`ThreadStackTrace::ThreadStackTrace #Nvariant 1+0x34
libjvm.so`void VM_ThreadDump::doit+0xcc
libjvm.so`void VM_Operation::evaluate+0x80
libjvm.so`void VMThread::run+0x6e0
libjvm.so`void*_start+0x208
libc.so.1`_lwp_start
It looks like this leak occurs when getStackTrace() is called on a Thread.
I've found that the included program will continually allocate memory on the process heap until the JVM cannot allocate memory and it exits with the following exception.
Exception java.lang.OutOfMemoryError: requested 16 bytes for C_Heap: ResourceOBJ. Out of swap space?
import java.lang.StackTraceElement;
import java.lang.Thread;
public class TraceIt {
     public static void main(String[] args) {
          System.out.println("Starting trace");
          int i = 0;
          // Loop forever; each getStackTrace() call triggers the native allocation
          while (true) {
               if (i % 100 == 0) System.out.println(i);
               StackTraceElement[] se = Thread.currentThread().getStackTrace();
               i++;
          }
     }
}
Any ideas what would cause this? Is it a JVM bug?

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6469701
BTW, when you register to post on this site why isn't Cuba in the list of countries?

Memory leak using 10.2.0.3 OCI client on Solaris

Hi,
We are using OCI client libraries to connect our C++ program to the Oracle Database. The program does a lot of selects, inserts and SP calls.
Oracle client and Oracle server both are 10.2.0.3.
We have been observing a memory leak of 4 MB in the C++ program every few minutes for the last few days. After debugging with Purify, libumem, and Sun Studio 12, we finally managed to narrow the problem down to the Oracle client library OCI calls.
The Sun Studio leak check shows the following -
Leak #37, Instances = 157, Bytes Leaked = 655004
kpummapg + 0x00000098
kghgex + 0x00000648
kghfnd + 0x000005BC
kghalo + 0x00000A6C
kghgex + 0x000003BC
kghfnd + 0x000005BC
kghalo + 0x00000A6C
kghgex + 0x000003BC
kghfnd + 0x000005BC
kghalo + 0x00000A6C
kpuhhalo + 0x00000558
kpugdesc + 0x00000AD4
kpugparm + 0x00000374
COCIResultSet::InterpretData() + 0x000001B4
COCIResultSet::COCIResultSet(COCIStatement*,OCIStmt*,OCIError*) + 0x000000A4
COCIStatement::PrepareResult() + 0x00000190
A select is executed, a result set is fetched, and the result set is immediately closed. The same piece of code has been running on various production systems without any problems. Most of the other sites are on either 10.2.0.4 or 9i.
On searching Metalink and various other forums, I found similar issues reported for 10.2.0.1.
Could someone advise whether there are any closed bugs corresponding to this? Would upgrading to 10.2.0.4 solve the problem?
Thanks.

Hi,
Apparently a similar issue is being discussed over here:
Re: Memory Leak
Hope it helps.
Regards,
Naveed.

Memory leak using 10.2.0.3 OCCI client on Solaris 10

Hi,
We are using OCCI client libraries to connect our C++ program to the Oracle Database. The program does a lot of selects, inserts and SP calls.
Oracle client and Oracle server both are 10.2.0.3 on Solaris 10.
We have been observing a memory leak of 4 MB in the C++ program every few minutes for the last few days. After debugging with Purify, libumem, and Sun Studio 12, we finally managed to narrow the problem down to the Oracle client library OCI calls.
The Sun Studio leak check shows the following -
Leak #37, Instances = 157, Bytes Leaked = 655004
kpummapg + 0x00000098
kghgex + 0x00000648
kghfnd + 0x000005BC
kghalo + 0x00000A6C
kghgex + 0x000003BC
kghfnd + 0x000005BC
kghalo + 0x00000A6C
kghgex + 0x000003BC
kghfnd + 0x000005BC
kghalo + 0x00000A6C
kpuhhalo + 0x00000558
kpugdesc + 0x00000AD4
kpugparm + 0x00000374
COCIResultSet::InterpretData() + 0x000001B4
COCIResultSet::COCIResultSet(COCIStatement*,OCIStmt*,OCIError*) + 0x000000A4
COCIStatement::PrepareResult() + 0x00000190
A select is executed, a result set is fetched, and the result set is immediately closed. The same piece of code has been running on various production systems without any problems. Most of the other sites are on either 10.2.0.4 or 9i.
On searching Metalink and various other forums, I found similar issues reported for 10.2.0.1.
Could someone advise whether there are any closed bugs corresponding to this? Would upgrading to 10.2.0.4 solve the problem?
Thanks.

Please ... one post and one post only in the group most appropriate to your inquiry. Please open an SR at metalink.

Tools for root cause analysis of stack corruption?

I'm experiencing extremely rare stack corruption that results in SEGV core dumps for a large and complex C++ program. Having run all the standard tools such as IBM Rational Purify, Sun mdb/libumem and dbx/rtc (although unfortunately using dbx check -access takes inordinately long and eventually crashes before the main() function is executed), I am no closer to discovering the root cause of the stack corruption. I'm confident based on the tools run and on the stack trace from the core dump that the problem is not heap corruption, but stack corruption.
The environment is Sun Studio 12 (but not update 1) on Solaris 10 on SPARC. The program is compiled with minor optimisation (-xO2 -xbuiltin=%all).
Is anyone aware of other tools or approaches that could help pinpoint the problem? Your help would be much appreciated!
Thanks in advance,
Simon

If you are in fact having stack corruption issues, I don't think any of the tools you mentioned other than Purify would help you identify it.
You may also be simply running out of stack space, and not having corruption issues. Is your app multi-threaded? If so, you could increase the stack size your threads use to something larger than the default.
Another thing you can look for is syslog entries stating "no swap space to grow stack" for your process; if you see them, you've run out of virtual memory. To avoid this, you can "pre-allocate" your stack memory with code similar to this:
#include <alloca.h>
#include <string.h>

void growStack( size_t bytes )
{
    char *mem = ( char * ) alloca( bytes );
    memset( mem, 0, bytes ); /* touch every byte so the pages are really created */
    return;
}
That code, when called, will force the creation of stack memory virtual pages backed by swap, before your server gets into a situation where free memory might be in short supply.
I also seem to recall that Solaris under certain circumstances will allocate stack memory with the MAP_NORESERVE option, which means swap space won't be reserved for your stack. If your process gets swapped out, its stack(s) will be lost and you'll probably get a SIGSEGV or SIGBUS. See this bug:
http://bugs.opensolaris.org/view_bug.do?bug_id=1221729
I remember working a similar issue for a customer running large apps on Sun E15Ks, maybe about 5 years ago. To work around this behavior, I think you'll need to explicitly allocate stack memory for any threads you may be creating. I think that's what we had to do.
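For what it's worth, here is a minimal sketch of requesting an explicit stack size for a thread through the POSIX threads API, assuming the application uses pthreads (the original post does not say which thread library it uses). The helper start_thread_with_stack, the worker touch_stack_worker and the sizes are hypothetical, and the sketch by itself does not show whether this avoids the MAP_NORESERVE behavior mentioned above.

```c
#include <pthread.h>
#include <stddef.h>

#define TOUCH_BYTES (1024 * 1024)  /* hypothetical amount of stack the worker uses */

/* Example worker: uses a large local array and touches one byte per 4 KB. */
void *touch_stack_worker(void *arg)
{
    volatile char frame[TOUCH_BYTES];
    size_t i;
    for (i = 0; i < TOUCH_BYTES; i += 4096)
        frame[i] = 0;
    *(int *)arg = 1;  /* report back that the whole frame was touched */
    return NULL;
}

/* Start fn(arg) on a new thread whose stack is stack_bytes large. */
int start_thread_with_stack(pthread_t *tid, size_t stack_bytes,
                            void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;  /* 0 on success, otherwise an errno-style error code */
}
```

A caller would pass something like 4 * 1024 * 1024 for stack_bytes and then pthread_join the thread as usual. If the memory for the stack has to come from the application itself, pthread_attr_setstack is the related call for supplying a caller-allocated region.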

Malloc problem

I apologize if this is not the best forum, but I did not see another one devoted to development
in "C".
I am working on a problem for 64-bit Solaris. Here is my version:
SunOS netsun2 5.10 Generic_142901-05 i86pc i386 i86pc
I am also running MySQL version 5.1.46.
My program is written in "C" and uses POSIX threads (pthreads). Initially, I used regular
malloc and free, but have just changed over to use libumem, as it is supposed to be
thread-safe.
I am still seeing the same problem I had before, which is that when I call umem_alloc,
it appears to be returning the same address each time, even when the address is already
in use. My program sometimes allocates a buffer which is then passed to one of my
MySQL worker threads. When the worker thread is finished, it frees the buffer.
The behavior I consistently see is that the same buffer address appears to be allocated
when it is already in use. Now, I am printing the 'char *' address as "0x%x" in my debug printf
statements, so I assume I am still seeing the address correctly! I am also pretty sure that
I have debugged enough to know that I have not somehow already freed the buffer (in which
case I understand that it will get used again by a subsequent malloc).
At first I thought this might be a MySQL problem (after all I am writing a multi-threaded client),
but now this appears to be more systems software (or my program!) related.
Here are my compile and link flags:
CC = gcc
LINKER = gcc
CFLAGS = -Wall -Wextra -m64 -O3 -mtune=k8 -g -I/opt/mysql/mysql/include
LDFLAGS = -m64
LIBS = -L/usr/lib -lmysqlclient_r -lsocket -lnsl -lrt -lpthread -lm -lumem
Can anyone think of what might cause this behavior? Or suggest some specific steps
to figure out what I might be doing wrong?
Any answers will be greatly appreciated.
Mitch ([email protected])

mitchmcc wrote:
I apologize if this is not the best forum, but I did not see another one devoted to development
in "C".
I am working on a problem for 64-bit Solaris. Here is my version:
SunOS netsun2 5.10 Generic_142901-05 i86pc i386 i86pc
I am also running MySQL version 5.1.46.
My program is written in "C" and uses Posix threads (pthreads). Initially, I used regular
malloc and free, but have just changed over to use libumem, as it is supposed to be
thread-safe.
I wouldn't say that. The man page says
Functions in this library provide fast, scalable object-
caching memory allocation with multithreaded application
support.
What I would do:
Get your app working with the lowest common denominator (libc and malloc/free). Then try linking with libumem and using the libumem malloc/free. Then try umem_alloc/umem_free.
I'd also try using Sun Studio as it often produces faster code than GCC.
Paul
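One thing worth ruling out in the original post: the debug statements print a 'char *' with "0x%x". In a 64-bit (-m64) build, %x expects a 32-bit unsigned int, so passing a pointer to it is formally undefined and in practice usually shows only the low 32 bits. Two different buffers could then look like the same address in the debug output. The sketch below illustrates that effect under this assumption; it is not a diagnosis of the actual program, and the helper names are made up for the example.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format only the low 32 bits of a pointer, which is what "0x%x" tends to show. */
void format_truncated(char *out, size_t n, const void *p)
{
    snprintf(out, n, "0x%" PRIx32, (uint32_t)(uintptr_t)p);
}

/* Format the full pointer value, as wide as the platform's pointers are. */
void format_full(char *out, size_t n, const void *p)
{
    snprintf(out, n, "0x%" PRIxPTR, (uintptr_t)p);
}
```

On a 64-bit system, two addresses that differ only above bit 31 come out identical from format_truncated and different from format_full. Printing with "%p" and a cast to 'void *' also shows the full address.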

Crash aroused by mdb+libumem

Hi, all
I'm new to using mdb+libumem to debug memory leaks on Solaris 10. I found that I can run findleaks against some of my simple C++ applications; however, findleaks sometimes crashes the process for certain C++ applications. So I tried an alternative: using gcore to dump a core and then using mdb+libumem to do the memory leak analysis on it.
Now a new problem has come up. I recently found that some actions in my C++ application also crash the process after I launch it with environment settings like "UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so".
I'm confused and frustrated right now... would anyone please shed some light on
1. why does findleaks crash the process?
2. why does libumem crash my process?
Thanks in advance.

Well... could anybody please tell me whether this is the correct forum for this question?
Thanks very much.

Libumem complains on redzone violation in std::deque even with latest patch

HW: e2900 (USIV+ cpus), 5.10 Generic_118833-24
I've applied the latest libCstd patch 119963-08
SW:
CC +w -fast -erroff=hidef -V -PIC -mt -xtarget=ultra3 -xarch=v9a sunBugReportDeque.cc
CC: Sun C++ 5.8 Patch 121017-08 2006/12/06
cg: Sun Compiler Common 11 Patch 120760-11 2006/10/18
The attached code fails when running under libumem on said machine.
If I change the deque to a list, everything works smoothly, suggesting there is still something fishy inside the deque allocate/deallocate functions. I could not reproduce the fault on my (single-CPU) workstation, but on the 2900 it crashes immediately.
// file: sunBugReportDeque.cc
#ifndef DEPEND
#include <algorithm>   // for_each
#include <deque>
#include <exception>
#include <functional>
#include <iostream>
#include <list>
#include <sstream>
#include <stdexcept>
#include <stdio.h>     // sprintf
#include <synch.h>
#include <sys/errno.h> // ETIME
#include <sys/time.h> // gettimeofday
#include <thread.h>
#include <time.h>
#include <unistd.h>
#include <vector>
#endif
using namespace std;
#define ENSURE(expr) do { int rc = expr ; \
        if (rc != 0) { \
            char buf[800]; \
            snprintf(buf, sizeof(buf), "%s: unexpected rc: %d", #expr, rc); \
            buf[sizeof(buf)-1] = '\0'; \
            throw std::logic_error(buf);\
        } \
    } while(0)
/** A simple mutex implementation. */
class Mutex
{
    friend class Condition; // needs access to my mutex
public:
    Mutex() {
        ENSURE(::mutex_init(&_impl, 0, 0));
    }
    /**
     * Destructor.
     */
    ~Mutex() {
        ENSURE(::mutex_destroy(&_impl));
    }
    void acquire() {
        ENSURE(::mutex_lock(&_impl));
    }
    void acquireRead() { this->acquire(); }
    void acquireWrite() { this->acquire(); }
    void release() {
        ENSURE(::mutex_unlock(&_impl));
    }
    void releaseRead() { this->release(); }
    void releaseWrite() { this->release(); }
    inline bool tryAcquire();
    bool tryAcquireRead() { return this->tryAcquire(); }
    bool tryAcquireWrite() { return this->tryAcquire(); }
private:
    mutex_t _impl;
};
bool
Mutex::tryAcquire()
{
    while(true) {
        int rc = ::mutex_trylock(&_impl);
        switch (rc)
        {
        case 0: // ok, we got it...
            return true;
        case EBUSY: // nope, it was already taken...
            return false;
        case EINTR: // fork or signal, try again...
            continue;
        case EINVAL: // The rest are faults, should not happen...
        case EFAULT:
            throw std::invalid_argument("Internal error, illegal adress");
        default:
            char msg[800];
            ::sprintf(msg, "::mutex_trylock: Unrecognized rc: %d", rc);
            throw std::logic_error(msg);
        }
    }
}
/** A simple condition variable implementation. */
class Condition
{
public:
    enum WaitStatus { TIMEOUT, SIGNALED };
    Condition(Mutex &m) : _m(m) { ENSURE(::cond_init(&_impl, 0, 0)); }
    ~Condition() { ENSURE(::cond_destroy(&_impl)); }
    void signal() { ENSURE(::cond_signal(&_impl)); }
    void signalAll() { ENSURE(::cond_broadcast(&_impl)); }
    inline void wait();
    inline WaitStatus wait(unsigned long ms);
private:
    Condition(const Condition&);
    const Condition& operator=(const Condition&);
    cond_t _impl;
    Mutex &_m;
};
void
Condition::wait()
{
    while(true) {
        int rc = ::cond_wait(&_impl, &_m._impl);
        switch (rc)
        {
        case 0: // NoOp, all is well...
            return;
        case EFAULT:
            throw std::invalid_argument("Internal error, illegal adress");
        case EINTR: // fork or signal, we should still be sleeping
            continue;
        default:
            char msg[50];
            sprintf(msg, "::cond_wait: Unrecognized rc: %d", rc);
            throw std::logic_error(msg);
        }
    }
}
Condition::WaitStatus
Condition::wait(unsigned long millis)
{
    struct timeval tp;
    timestruc_t to;
    if(gettimeofday(&tp, 0))
    { // failed, use time() instead
        to.tv_sec = ::time(NULL) + millis / 1000;
        to.tv_nsec = (millis % 1000) * 1000000; // 1e6 nanos-per-milli
    }
    else
    { // Ok, calculate when to wake up..
        to.tv_sec = tp.tv_sec + millis/1000;
        to.tv_nsec = tp.tv_usec*1000 + (millis%1000)*1000000;
        if(to.tv_nsec >= 1000000000)
        {
            to.tv_nsec -= 1000000000;
            to.tv_sec++;
        }
    }
    while(true) {
        int rc = ::cond_timedwait(&_impl, &_m._impl, &to);
        switch (rc)
        {
        case 0: // NoOp, all is well... Someone told us to wake up before timeout...
            return SIGNALED;
        case EFAULT:
            throw std::invalid_argument("Internal error, illegal adress");
        case EINTR: // fork or signal, we should still be sleeping
            continue;
        case ETIME:
        case ETIMEDOUT:
            return TIMEOUT;
        default:
            char msg[50];
            sprintf(msg, "::cond_timedwait: Unrecognized rc: %d", rc);
            throw std::logic_error(msg);
        }
    }
}
/** Suitable for grabbing a mutex exclusively. The mutex will be
 * released when the object dies.
 */
template <class Lock>
class Guard
{
public:
    Guard(Lock &l) : _l(l) { _l.acquire(); }
    ~Guard() { _l.release(); }
private:
    Guard(const Guard&);
    const Guard& operator=(const Guard&);
    Lock &_l;
};
class Timer
{
public:
    Timer() : _cond(_mutex), _isCancelled(false) { }
    ~Timer() { }
    /** Sleeps the specified number of milliseconds, or until the timer is cancelled. */
    inline void sleep(const int millis);
    /** Cancels the timer. Ongoing sleeps will wake up, new ones will not block. */
    inline void cancel();
    inline bool isCancelled();
protected:
private:
    Timer(const Timer& aTimer);
    Timer& operator=(const Timer& aTimer);
    Mutex     _mutex;
    Condition _cond;
    bool         _isCancelled;
};
void Timer::sleep(const int millis)
{
    Guard<Mutex> lock(_mutex);
    if (! _isCancelled) // only wait one turn
        _cond.wait(millis);
}
void Timer::cancel()
{
    Guard<Mutex> lock(_mutex);
    _isCancelled = true;
    _cond.signalAll();
}
bool Timer::isCancelled()
{
    Guard<Mutex> lock(_mutex);
    return _isCancelled;
}
// shouldn't this be available in STL somewhere???
template <class T>
struct Predicate : public std::unary_function<T, bool>
{
    virtual bool operator()(const T &x) const = 0;
};
/** A simple Producer Consumer Queue implementation. */
template <class T>
class PCQueue
public:
    PCQueue(size_t aMaxLenght)
            : myMaxlength(aMaxLenght),
              myQNotEmpty(myQLock),
              myQNotFull(myQLock){ }
    void push(const T& aT);
    bool tryPush(const T& aT);
    T pop();
    bool tryPop(T& retVal, const unsigned int millis = 0);
    size_t size() const;
    bool isFull() const;
     * Atomically purges (removes) the FIRST element for which the supplied
     * predicate returns true.
    bool purge(const Predicate<T>& pred, T& theItem);
protected:
private:
    typedef Guard<Mutex> MutexGuard;
    std::deque<T>        myQueue;
    mutable Mutex        myQLock;
    Condition            myQNotEmpty;
    Condition            myQNotFull;
    size_t               myMaxlength;
template <class T>
size_t PCQueue<T>::size() const
    MutexGuard g(myQLock);
    return myQueue.size();
template <class T>
void PCQueue<T>::push(const T& aT)
    MutexGuard g(myQLock);
    while (myMaxlength && myQueue.size() >= myMaxlength)
        myQNotFull.wait();
    myQueue.push_back(aT);
    myQNotEmpty.signal();
template <class T>
bool PCQueue<T>::tryPush(const T& aT)
    MutexGuard g(myQLock);
    while (myMaxlength && myQueue.size() >= myMaxlength)
        return false;
    myQueue.push_back(aT);
    myQNotEmpty.signal();
    return true;
template <class T>
T PCQueue<T>::pop()
    MutexGuard g(myQLock);
    T entry;
    while (myQueue.empty())
        myQNotEmpty.wait();
    entry = myQueue.front();
    myQueue.pop_front();
    myQNotFull.signal();
    return entry;
template <class T>
bool PCQueue<T>::tryPop(T& retVal, const unsigned int millis)
    MutexGuard g(myQLock);
    long long start = ::gethrtime();
    long remainder = millis;
    while (remainder > 0 && myQueue.empty())
        myQNotEmpty.wait(remainder);
        remainder = millis - (long)((::gethrtime() - start) / 1000000LL);
    if (myQueue.empty()) // timed out
        return false;
    retVal = myQueue.front();
    myQueue.pop_front();
    myQNotFull.signal();
    return true;
template <class T>
bool PCQueue<T>::isFull() const
    MutexGuard g(myQLock);
    if (myMaxlength == 0) // No limit on the queue
        return false;
    return (myQueue.size() >= myMaxlength) ? true : false;
template <class T>
bool PCQueue<T>::purge(const Predicate<T> &pred, T& theItem)
    MutexGuard g(myQLock);
    for (typename std::deque<T>::iterator i = myQueue.begin(); i != myQueue.end(); ++i)
        if (pred(*i))
            theItem = *i;
            myQueue.erase(i);
            myQNotFull.signal();
            return true;
    return false;
struct fifthBitSet : public Predicate<hrtime_t *>
    bool operator()(hrtime_t * const &i) const { return (bool) ((*i) & (0x1L << 4)); }
class StressTest
public:
    StressTest(int consumers, int producers);
    ~StressTest();
    void start();
    void stop();
    void sleep(int seconds) { timer_.sleep(seconds * 1000); }
private:
    void consume();
    void produce();
    static void * consumer(void *arg);
    static void * producer(void *arg);
    static void joinThread(thread_t tid);
    Timer         timer_;
    PCQueue<hrtime_t*> queue_;
    vector<thread_t> consumers_;
    vector<thread_t> producers_;
StressTest::StressTest(int c, int p) : queue_(501), consumers_(c, 0L), producers_(p, 0L)
StressTest::~StressTest()
    hrtime_t *val = NULL;
    while (queue_.tryPop(val, 0))
        delete val;
void
StressTest::joinThread(thread_t tid)
    void * status;
    int rc = thr_join(tid,
                       NULL,
                       &status);
    if (rc != 0)
        char buf[80];
        snprintf(buf, sizeof(buf), "thr_join: unexpected rc: %d", rc);
        throw std::logic_error(buf);
void
StressTest::start()
    for (int i = 0; i < consumers_.size(); ++i)
        thread_t tid = 0L;
        thr_create(NULL, NULL, &consumer, this, NULL, &tid);
        consumers_[i] = tid;
    for (int i = 0; i < producers_.size(); ++i)
        thread_t tid = 0L;
        thr_create(NULL, NULL, &producer, this, NULL, &tid);
        producers_[i] = tid;
void
StressTest::stop()
    timer_.cancel();
    for (int i = 0; i < consumers_.size(); ++i)
        queue_.push(NULL);
    for_each(producers_.begin(), producers_.end(), joinThread);
    for_each(consumers_.begin(), consumers_.end(), joinThread);
void *
StressTest::consumer(void *arg)
    StressTest * test = reinterpret_cast<StressTest *>(arg);
    test->consume();
    return NULL;
void *
StressTest::producer(void *arg)
    StressTest * test = reinterpret_cast<StressTest *>(arg);
    test->produce();
    return NULL;
void
StressTest::consume()
    while (! timer_.isCancelled())
        hrtime_t *ptime = queue_.pop();
        while (ptime != NULL)
            hrtime_t now = ::gethrtime();
            if((now - *ptime) > 1000000000)
                ostringstream os;
                os << "Too old request: " << ((double)(now - *ptime)) / 1.0e9 << endl;
                cerr << os.str() << flush;
            delete ptime;
            ptime = queue_.pop();
void
StressTest::produce()
    while (! timer_.isCancelled())
        hrtime_t *pnow = new hrtime_t;
        *pnow = ::gethrtime();
        bool qIsFull = ! queue_.tryPush(pnow);
        while (qIsFull)
            hrtime_t *pToRemove = NULL;
            if (queue_.purge(fifthBitSet(), pToRemove) == false)
                ostringstream os;
                os << "Queue full, failed to make room, rejected call:" << *pnow << endl;
                cerr << os.str() << flush;
                delete pnow;
                return;
            if (pToRemove)
                ostringstream os;
                os << "Queue full, removed 1 item: " << *pToRemove << endl;
                cerr << os.str() << flush;
                delete pToRemove;
            qIsFull = ! queue_.tryPush(pnow);
int
main(const int argc, char *argv[])
    StressTest test(atoi(argv[1]), atoi(argv[2]));
    test.start();
    test.sleep(atoi(argv[3]));
    test.stop();
Message was edited by:
anderso
Forgot to include the libumem log:
%>env | fgrep UMEM
UMEM_DEBUG=default,verbose
UMEM_LOGGING=transaction,contents,fail
%>(setenv LD_PRELOAD /usr/lib/64/libumem.so ; ./a.out 8 8 400)
umem allocator: redzone violation: write past end of buffer
buffer=1007c3aa0 bufctl=1007c81f0 cache: umem_alloc_48
previous transaction on buffer 1007c3aa0:
thread=d time=T-0.000430640 slab=100799e10 cache: umem_alloc_48
libumem.so.1'?? (0xffffffff7f216278)
libumem.so.1'?? (0xffffffff7f2166d0)
libumem.so.1'?? (0xffffffff7f2130ec)
libCrun.so.1'?? (0xffffffff7ec08810)
a.out'?? (0x100006df4)
a.out'?? (0x1000046ec)
a.out'?? (0x100003c04)
libc.so.1'?? (0xffffffff7e7cd2f8)
umem: heap corruption detected
stack trace:
libumem.so.1'?? (0xffffffff7f21471c)
libumem.so.1'?? (0xffffffff7f213574)
libCrun.so.1'?? (0xffffffff7ec0786c)
a.out'?? (0x100006e70)
a.out'?? (0x1000046ec)
a.out'?? (0x100003c04)
libc.so.1'?? (0xffffffff7e7cd2f8)
Abort (core dumped)
%>mdb core
> ::umem_verify
Cache Name                      Addr      Cache Integrity
umem_magazine_1                 100720028 clean
umem_magazine_3                 100722028 clean
umem_magazine_7                 100724028 clean
umem_magazine_15                100728028 clean
umem_magazine_31                10072a028 clean
umem_magazine_47                10072c028 clean
umem_magazine_63                100730028 clean
umem_magazine_95                100732028 clean
umem_magazine_143               100734028 clean
umem_slab_cache                 100738028 clean
umem_bufctl_cache               10073a028 clean
umem_bufctl_audit_cache         10073c028 clean
umem_alloc_8                    100742028 clean
umem_alloc_16                   100748028 clean
umem_alloc_32                   10074a028 clean
umem_alloc_48                   10074c028 1 corrupt buffer
umem_alloc_64                   100750028 clean
> 10074c028::umem_verify
Summary for cache 'umem_alloc_48'
  buffer 1007c3aa0 (allocated) has a corrupt redzone size encoding

This issue has been filed as bug 6514832, and will be visible at bugs.sun.com in a day or two.

Libumem assert

Hi,
My application runs fine, but I wanted to check for memory errors. When I define LD_PRELOAD=/usr/lib/libumem.so.1, UMEM_LOGGING=transaction and UMEM_DEBUG=default, I get an assertion failure during static initialization in libumem, with the following call stack:
(dbx) where
current thread: t@1
=>[1] lwpkill(0x0, 0x6, 0x0, 0xfdbbc000, 0x0, 0x0), at 0xfdb9f82c
[2] raise(0x6, 0x80, 0x0, 0xffbfe570, 0x0, 0xff36ea8c), at 0xfdb50a1c
[3] umem_do_abort(0x27, 0xa, 0x25640a00, 0x7efefeff, 0x81010100, 0xff00), at 0xff363dcc
[4] panic(0xff37060c, 0xff37172c, 0xff371748, 0xc71, 0xff384000, 0xff3ee7b4), at 0xff3640e4
[5] __umem_assert_failed(0xff37172c, 0xff371748, 0xc71, 0xff3ee7b4, 0x13d3ccc, 0x0), at 0xff363e18
[6] umem_do_init(0xff387190, 0x1eafc, 0xffffffff, 0xff384000, 0x21, 0xffbfe664), at 0xff3699f8
[7] umem_alloc_retry(0xff387190, 0x0, 0xfdc8639c, 0xff36070c, 0x5, 0xffbfe6ec), at 0xff365524
[8] umem_alloc(0x28, 0x0, 0xfdc8d26c, 0xfdc1fd48, 0x2, 0xfe0e0039), at 0xff367304
[9] malloc(0x20, 0x8, 0x0, 0xfea48000, 0x74, 0xfe0e0020), at 0xff36354c
[10] __dce_pthread_get_self_tcb(0x0, 0x1, 0xfdfbff4c, 0xfdf85ab8, 0x0, 0x0), at 0xfde3156c
[11] __dce_pthread_init(0xfe03cdec, 0xfe03cddc, 0xfdfc00d0, 0xfdf85ab8, 0x1, 0xfffb904c), at 0xfde32104
[12] _init(0x0, 0xfe100bbc, 0x30, 0xfe100ba0, 0xfeab839c, 0x0), at 0xfdf52258
[13] call_init(0xfe101170, 0x0, 0xfe101170, 0xffdfffff, 0x400000, 0x80000), at 0xff3bf67c
[14] elf_bndr(0x1, 0x5, 0xff362964, 0xfdc19008, 0x5, 0x0), at 0xff3cb734
[15] elfrtbndr(0xff363488, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff3b3a64
[16] 0xff3846a8(0xff370390, 0x20ba4, 0x0, 0xff384000, 0x0, 0x0), at 0xff3846a7
[17] umem_process_envvars(0x1, 0x2, 0x1da70, 0xff36952c, 0x0, 0x0), at 0xff363488
[18] umem_init(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff369554
[19] umem_do_init(0xff387190, 0x1eafc, 0xffffffff, 0xff384000, 0x0, 0x0), at 0xff369a1c
[20] umem_alloc_retry(0xff387190, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff365524
[21] umem_alloc(0x48, 0x0, 0xfe1ddcd8, 0xfecd1488, 0x5, 0xffbfed74), at 0xff367304
[22] malloc(0x40, 0x58, 0xfe1de080, 0xfe1884d8, 0x1, 0xffbfedd4), at 0xff36354c
[23] operator new(0x40, 0x4d58, 0x13740, 0xfffffff8, 0xfecea8d4, 0x40), at 0xfecd71b8
[24] std::locale::init(0xfe2edd40, 0x68ec8, 0x0, 0x800, 0xfe2e6498, 0xb4c), at 0xfe27d5fc
[25] std::basic_istream<char,std::char_traits<char> >::basic_istream(0xfe2ec658, 0x0, 0x71940, 0xfe2e8e30, 0xfe2e8e3c, 0xfe2ec678), at 0xfe274bc8
[26] __SLIP.INIT_A(0x0, 0x714c8, 0x0, 0x800, 0xfe2e6498, 0x8e4), at 0xfe275010
[27] __STATIC_CONSTRUCTOR(0xfe26a308, 0x7c1fc, 0xa98, 0x800, 0xfe2e6498, 0x840), at 0xfe276048
[28] 0xfe28ae20(0x0, 0xfe170908, 0x18, 0xfe100c50, 0xff590988, 0x0), at 0xfe28ae1f
[29] call_init(0xfe1010e0, 0x0, 0xfe1010e0, 0xffdfffff, 0x400000, 0x80000), at 0xff3bf67c
[30] elf_bndr(0x1, 0x47, 0xfecd4020, 0xfe195308, 0x5, 0xffbff114), at 0xff3cb734
[31] elfrtbndr(0xfecd6ee4, 0xfeceafec, 0x1da70, 0xfecd53b8, 0x5, 0x0), at 0xff3b3a64
[32] 0xfeceaa00(0xfecea8d4, 0xfe27b70c, 0xfe27b70c, 0x1552c, 0xfecea8d4, 0xfeceb0bc), at 0xfecea9ff
[33] __Cimpl::cplus_init(0xfdbbe664, 0x0, 0x329ec, 0xff3ee7b4, 0xfffe1c54, 0xfffe1c54), at 0xfecd6ee4
[34] _init(0x0, 0xfe170aa0, 0x14, 0xfe100dc0, 0xfadf6998, 0x0), at 0xfdba0210
[35] call_init(0xfe1010c8, 0x0, 0xfe1010c8, 0xffdfffff, 0x400000, 0x80000), at 0xff3bf67c
[36] elf_bndr(0x1, 0x2e, 0x3ec628, 0xfdb09274, 0x5, 0x0), at 0xff3cb734
[37] elfrtbndr(0xfecd8bec, 0xff000000, 0x32a34, 0x0, 0xfeceafe0, 0xfecd9210), at 0xff3b3a64
[38] 0x2d25eec(0xfecd8b44, 0x0, 0x15840, 0xff3ee7b4, 0xfffee2f0, 0xfffeddb4), at 0x2d25eeb
[39] 0xfecd8bec(0x0, 0xfe17096c, 0x18, 0xfe100c88, 0xfbfaec38, 0x0), at 0xfecd8beb
[40] call_init(0xfe101094, 0x0, 0xfe101094, 0xffdfffff, 0x400000, 0x80000), at 0xff3bf67c
[41] elf_bndr(0x1, 0x6e, 0x3ec928, 0xfecd13f8, 0x5, 0x0), at 0xff3cb734
[42] elfrtbndr(0xfe128d40, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff3b3a64
[43] 0x2d25eec(0xfe13c0d0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x2d25eeb
[44] 0xfe128d40(0x0, 0xfe1004d8, 0x18, 0xfe100e7c, 0xff3bf674, 0xff3ee7b4), at 0xfe128d3f
[45] call_init(0xfe100ee0, 0x0, 0xfe100ed4, 0xffdfffff, 0x400000, 0x80000), at 0xff3bf67c
[46] setup(0xffbffa70, 0xff3ee194, 0x10000, 0x0, 0xf84a0, 0x0), at 0xff3bebcc
[47] _setup(0x0, 0x0, 0x1, 0x1, 0xffbfffc5, 0xff3cc4ec), at 0xff3cc8e0
[48] rtboot(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff3b3a28
Any thoughts?
Thanks!
Olivier.

The shell uses its own memory allocation. Preloading an alternative memory allocator usually causes the shell to crash at startup. You have to restrict the preload to a place where no shell will start until the library is unloaded.
You pretty much need to use the Bourne shell, and preload the library as part of running a single command:
% sh
$ LD_PRELOAD=/usr/lib/libumem.so.1 my_application
In addition, my_application must not start any shell.
Assuming you meet the prerequisites, the simple solution is to make my_application a Bourne shell script that contains a single command like the above.
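The wrapper-script idea can be sketched roughly as follows. This is only an illustration, not a tested recipe for any particular system: the function name run_with_umem and the UMEM_LIB override are made up for this example, and the UMEM_DEBUG and UMEM_LOGGING values are simply the ones quoted in the question. The point is that the variables are attached to one command only, so nothing else started from the script sees the preload.

```shell
#!/bin/sh
# Hypothetical helper: run exactly one command with libumem preloaded.
# UMEM_LIB is an assumed override for the library path; when it is unset,
# the path from the reply above is used.
run_with_umem() {
    # The assignments prefix a single command, so they are placed in that
    # command's environment only and are not exported to the script itself.
    LD_PRELOAD="${UMEM_LIB-/usr/lib/libumem.so.1}" \
    UMEM_DEBUG=default \
    UMEM_LOGGING=transaction \
    "$@"
}
```

From a Bourne shell script, something like `run_with_umem ./my_application arg1 arg2` would then start only that program under libumem. As noted above, the program itself still must not start a shell.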

Libumem : invalid or corrupted buffer

Hello all,
When libumem is enabled and the application is run, the process dumps core with the following message:
free(2c0a008): invalid or corrupted buffer
stack trace:
libumem.so.1'?? (0xff3799b4)
libCrun.so.1'__1c2k6Fpv_v_+0x4
libhlri_hdm_logical_model.so'__1cDstdFdeque4CpnLHDM_SSINHLR_n0AJallocator4C2___R__allocate_at_end6M_v_+0x154
libhlri_hdm_logical_model.so'__1cDstdFdeque4CpnLHDM_SSINHLR_n0AJallocator4C2___Jpush_back6Mrk2_v_+0x70
Can you please let us know what might be the problem?

I ran dd if=/dev/zero of=dev/rdsk/c4t4d0s2 count=64 and got output
64+0 records in
64+0 records out
I then ran format, and the following came up:
# format c4t4d0
selecting c4t4d0
[disk formatted]
Error occurred with device in use checking: No such device
FORMAT MENU:
When I look at the partition table, it appears to look normal.
partition> p
Current partition table (original):
Total disk cylinders available: 65533 + 2 (reserved cylinders)
Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 -   120     128.21MB    (121/0/0)       262570
  1       swap    wu     121 -   241     128.21MB    (121/0/0)       262570
  2     backup    wu       0 - 65532      67.81GB    (65533/0/0)  142206610
  3 unassigned    wm       0               0         (0/0/0)              0
  4 unassigned    wm       0               0         (0/0/0)              0
  5 unassigned    wm       0               0         (0/0/0)              0
  6        usr    wm     242 - 65532      67.56GB    (65291/0/0)  141681470
  7 unassigned    wm       0               0         (0/0/0)              0
Also, when I label the disk, I get the following error:
format> l
Error occurred with device in use checking: No such device
Ready to label disk, continue? y
It does label the disk, because when I go back to look at the partition table after I edit it, it shows what I changed it to.
I then tried to run zpool attach tank c2t12d0 c4t4d0 and I still get the stack dump. I guess I could try to shutdown the cluster and reboot both nodes and hope that both servers see the new disk. It seems almost like when I ran devfsadm after installing the disk that it didn't work.
Any other ideas?
Thanks.
Edited by: mbunixadm on Nov 18, 2008 8:43 AM
