Very high "load average" in top

Hi,
our OES11SP1 two-server-cluster (fully patched) shows a very high "load
average" (>50, up to 110) in top in some circumstances. There are no
problems in normal operation, but administrator actions like shutdown or
cluster migrate might trigger the problem.
For example when I enter 'halt', then there is the following line in
/var/log/messages:
Sep 12 20:27:18 srv1 shutdown[14675]: shutting down for system halt
more than 20 minutes later:
Sep 12 20:51:19 srv1 init: Switching to runlevel: 0
Within these 20 minutes nothing happens, but "average load" goes up to at
least 50, with ndsd at top. Access to storage related tools and commands is
not possible, for example 'nss /pool' hangs without any output.
This happens on nearly every shutdown, but from time to time it doesn't. The
same will sometimes be triggered by a cluster migrate.
This only happens with our OES11SP1 cluster, it does not happen with OES11
and OES2SP3; the only other difference I'm aware of: Novell CIFS is only
running on the OES11SP1 cluster.
Any ideas?
Thanks,
Mirko

Sorry for the delay; it seems to be a bad habit of mine to ask questions
immediately before holidays...
Yes, these servers have replicas, all of them... Cache size is set to 195328
KB, which is about twice the DIB size. IIRC this was a recommendation I read
somewhere at Novell. But I'll check that information again.
Thanks,
Mirko
kjhurni wrote:
>
> Mirko Guldner;2283539 Wrote:
>> top shows ndsd on top - but it's there in normal operation too, so I don't
>> know if this means something.. (?) And it's not always the CPU which is at
>> 100% - I have an example screenshot with:
>> load average 50.20, 51.61, 41.0
>> 3.2%us, 1.0%sy, 0.0%ni, 77.0%id 18%wa 0.0%hi 0.3%si 0.0%st.
>> But this is only an example - this differs.
>>
>> Thanks,
>> Mirko
>>
>> kjhurni wrote:
>>
>> >
>> > Mirko Guldner;2283448 Wrote:
>> >> Hi,
>> >>
>> >> our OES11SP1 two-server-cluster (fully patched) shows a very high "load
>> >> average" (>50, up to 110) in top in some circumstances. There are no
>> >> problems in normal operation, but administrator actions like shutdown or
>> >> cluster migrate might trigger the problem.
>> >>
>> >> For example when I enter 'halt', then there is the following line in
>> >> /var/log/messages:
>> >>
>> >> Sep 12 20:27:18 srv1 shutdown[14675]: shutting down for system halt
>> >>
>> >> more than 20 minutes later:
>> >>
>> >> Sep 12 20:51:19 srv1 init: Switching to runlevel: 0
>> >>
>> >> Within these 20 minutes nothing happens, but "average load" goes up to at
>> >> least 50, with ndsd at top. Access to storage related tools and commands is
>> >> not possible, for example 'nss /pool' hangs without any output.
>> >>
>> >> This happens on nearly every shutdown, but from time to time it doesn't. The
>> >> same will sometimes be triggered by a cluster migrate.
>> >>
>> >> This only happens with our OES11SP1 cluster, it does not happen with OES11
>> >> and OES2SP3; the only other difference I'm aware of: Novell CIFS is only
>> >> running on the OES11SP1 cluster.
>> >>
>> >> Any ideas?
>> >>
>> >> Thanks,
>> >> Mirko
>> >
>> > Which process(es) does top show as being the culprit?
>> >
>> > In the past (on OES2 SP3) we had issues with CIFS causing ncp to cause
>> > high utilization, but that was fixed a while ago.
>> >
>> > --Kevin
>> >
>> >
>
> I have seen ncp issues cause high ndsd utilization, but we've not yet
> upgraded our cluster or DS servers to OES11 (waiting for new
> hardware to go in place first).
>
> Out of curiosity, are the servers with high utilization also replica
> servers? For some reason, during one of our upgrades on a replica
> server (we have a server that contains all R/W copies of everything),
> the cache size got set down really low and that caused all sorts of
> issues.
>
> Maybe one of my colleagues will wander by and offer additional insight,
> as this may be eDir related and/or NCP related. Not sure if triggering
> a core manually would help (but you'd have to send that to Novell and
> open an SR to get it read).
>
> If you suspect CIFS, do you have the ability to temporarily shut off
> CIFS for a few days to see if that's the culprit?
>
>
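
A note on reading the figures quoted above: on Linux the load average counts tasks in uninterruptible sleep (state D, typically blocked on I/O) as well as runnable tasks. A load of 50 with 77% idle and 18% iowait therefore suggests many blocked tasks rather than a busy CPU. The following is a minimal sketch for checking this on the affected node, assuming a Linux /proc filesystem; the function names are made up for illustration and are not part of any OES or Novell tool.

```python
import os

def parse_loadavg(text):
    """Parse the three load averages from /proc/loadavg-style text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def task_state(stat_text):
    """Return the one-letter state from the content of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the last ')'.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]

def count_states(proc_root="/proc"):
    """Count processes per state letter (R = runnable, D = uninterruptible)."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                state = task_state(fh.read())
        except (OSError, IndexError):
            continue  # process exited while we were scanning
        counts[state] = counts.get(state, 0) + 1
    return counts

def report():
    """Print the current load averages next to the R and D process counts."""
    with open("/proc/loadavg") as fh:
        print("load averages:", parse_loadavg(fh.read()))
    states = count_states()
    print("runnable (R):", states.get("R", 0))
    print("uninterruptible (D):", states.get("D", 0))
```

Calling `report()` while the shutdown is stalled would show whether the load is made up of D-state tasks. This sketch only looks at the main thread of each process; the threads of a multi-threaded daemon appear under /proc/<pid>/task/ and would need a second loop.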

Similar Messages

  • Top- Large run queue & High load averages

    Which section of top shows a large run queue and high load averages?

    Can you be a little bit more specific about what OS you're running on? Top is not consistent on all versions of Linux...
    This link may help
    http://www.oracle.com/technology/pub/articles/advanced-linux-commands/part2.html

  • [Solved] Excessive high load average

    My laptop seems to have an excessively high load average. My system gets very slow even when I'm only running openbox with a browser, a terminal and a music player; that is enough to make applications freeze for a few seconds. I've seen the load average go above 5.00. Right now I have just logged in to openbox using lxdm and opened a browser and a terminal, and uptime shows:
    23:11:21 up 12 min,  2 users,  load average: 1.15, 0.92, 0.58
    $ top
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    1367 root 20 0 48140 2392 1832 R 100 0.1 7:39.51 lxdm-binary
    1411 jesse 20 0 402m 9800 7176 S 1 0.2 0:03.10 indicator-multi
    1370 root 20 0 110m 14m 6388 S 0 0.4 0:03.50 X
    1479 jesse 20 0 263m 13m 10m S 0 0.3 0:00.99 terminal
    1 root 20 0 4180 692 592 S 0 0.0 0:00.61 init
    2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
    3 root 20 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0
    6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0
    7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0
    8 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1
    9 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/1:0
    As you can see, process 1367 is using 100% of the CPU. I have no idea why this is happening. Also, my laptop isn't old: it has an Intel i5 processor with 4 GB of RAM.
    Does anybody have an idea of what is happening?
    EDIT: I just checked my load average again
    [jesse@myarch ~]$ uptime
    23:23:46 up 24 min, 3 users, load average: 4.42, 2.19, 1.25
    That is an absurd value, isn't it? Ah, and my system is up to date, if anyone asks.
    Last edited by sollidsnake (2012-02-17 12:35:34)

    Well, it looks like back in December there were some bugs reported with libglib2.0-0 2.31.2 that were causing this issue. You can try updating to make sure you have the most current version:
    pacman -S glib2
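
For context on whether 4.42 is really "absurd": a load average is usually read relative to the number of logical CPUs. Below is a rough sketch using only the Python standard library (`os.getloadavg` and `os.cpu_count` are real calls; the helper names and the rule-of-thumb threshold of 1.0 per CPU are illustrative assumptions).

```python
import os

def load_per_cpu(load, cpus):
    """Return the load average divided by the number of logical CPUs."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus

def describe(load, cpus, threshold=1.0):
    """Label a load figure; the threshold is a rule of thumb, not a hard limit."""
    ratio = load_per_cpu(load, cpus)
    return "saturated" if ratio > threshold else "within capacity"

def current_ratio():
    """Read the live 1-minute load (Unix only) and normalize it."""
    one_minute = os.getloadavg()[0]
    return load_per_cpu(one_minute, os.cpu_count() or 1)
```

Assuming the i5 exposes 4 logical CPUs (the post does not say), `describe(4.42, 4)` classifies the later reading as saturated, while the earlier 1.15 is within capacity. That fits one process spinning at 100% plus whatever else is queued behind it.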

  • One of 4 node RAC always have higher load averages and higher than others

    Hello,
    We have a 4-node RAC, 9208 on Linux 4. When viewing top, we noticed that the same node always has a higher load average than the other 3 nodes. Is this normal? Load balancing is working fine, but this one node always has a higher load average. This is the node where we did the RAC installation. Thank you.

    I do not remember what the default for clb_goal (client load balancing) is in 9i, but in 10g it is LONG.
    Check it:
    select clb_goal from dba_services where name = '<service name>';
    You may have to change it from LONG to SHORT or from SHORT to LONG, depending on your connection types:
    dbms_service.MODIFY_SERVICE('<service>', clb_goal => dbms_service.CLB_GOAL_LONG);
    Read the following article.
    http://www.databasejournal.com/features/oracle/article.php/3659411/Oracle-RAC-Administration---Part-15-Connection-Load-Balancing-and-FAN.htm

  • High load averages, low CPU usage

    Hi,
         I recently upgraded to Lion and I am noticing high load averages (~0.7) for my system. The CPU is, however, ~95% idle. I am not running any intensive apps. I first thought it was an I/O issue, but I have almost 500 MB of free memory, and there is no disk activity. The system is perfectly fine, with all the eye candy/animations running smoothly. Is this a bug?
    Thanks for any help.

    Same problem here
    Did a clean install of Lion, moved everything manually. System was clean and running fast. Then, for some reason started to slow down after a few months. My last system (macbook snow leopard) was running fine for 3 years.
    I noticed HIGH load averages (over 2.5!) while the CPU is idling (only around 30% for user and system). The system is slow and the CPU gets hot, resulting in loud fan noise.
    Googled a lot, did standard maintenance tasks, tried to pinpoint cause - nothing so far. Will update when I find out more. Maybe someone else has a clue or Apple releases a fix. Fingers crossed.

  • [SOLVED] High load average in X at idle

    Hello Archers,
    Recently my laptop has been showing abnormally high 1-minute load averages (~0.20-0.80 on a dual-core machine) at idle whenever I have X running. This figure has always settled down to 0.00 after a while, and indeed it does when I exit to the console. What's puzzling is that top reports a less-extreme 2-3% CPU usage number, with X taking 1% at most (still high, though, considering I'm running a very lightweight DWM setup). To be sure, this does heat up the CPU noticeably. I don't think I've changed anything on my system except for routine updating, and I've made sure my .xinitrc script isn't the culprit.
    Any ideas on why this is?
    EDIT: Marked as [SOLVED].
    Last edited by ktkhuong (2010-08-27 00:28:45)

    Kernel .35.
    https://bbs.archlinux.org/viewtopic.php?id=103346
    Last edited by karol (2010-08-26 10:33:35)

  • [Solved] High load average even when idle after update

    Hello arch fellows!
    I have a Dell Inspiron N4110 intel i5 4gb ram. I think it is a decent laptop and should run Arch Linux pretty well, right? Well, it does until I run 'pacman -Syu'. After the update, the load average gets over 1.00 even when idle. It hardly gets under 0.40. Before updating, the average is under 0.10 most times. The command top doesn't show me anything. All the processes are practically 0%. It seems to me the problem would be the kernel, am I right? But I tried the fallback option when booting but it is the same.
    Does anybody have an idea of what could be the problem? I appreciate any help.
    Last edited by sollidsnake (2012-05-18 00:32:52)

    Thanks for the reply. It was really a kernel problem. I downgraded it to an older version and it looks normal now

  • High load average

    Hi all,
    System: SunOS 5.10 Generic_127127-11 sun4u sparc SUNW,SPARC-Enterprise
    load: 4:08pm up 78 day(s), 6:57, 2 users, load average: 1.10, 1.14, 1.14
    ps -ef | more
    root 3 0 0 Oct 29 ? 1185:48 fsflush
    Is it normal that fsflush eats that much of time?
    Users told me, that it took several minutes to open files. What could be the issue?
    If you need more information (vmstat, iostat) let me know.
    Thanks in advance.
    Cheers,
    Axel

    asgrunix wrote:
    > Hi all,
    > System: SunOS 5.10 Generic_127127-11 sun4u sparc SUNW,SPARC-Enterprise
    > load: 4:08pm up 78 day(s), 6:57, 2 users, load average: 1.10, 1.14, 1.14
    > ps -ef | more
    > root 3 0 0 Oct 29 ? 1185:48 fsflush
    > Is it normal that fsflush eats that much of time?
    Doesn't seem unreasonable. A few hours of CPU time in 2 1/2 months? I wouldn't be looking there for your performance issues.
    > Users told me, that it took several minutes to open files. What could be the issue?
    Slow storage? Failing storage? Heavy load on the machine at the time? Any of those things.
    Darren
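
The arithmetic behind that answer can be checked directly. This is a small sketch, assuming the ps TIME column here is minutes:seconds and that fsflush has been running for the whole 78-day uptime; the helper names are illustrative only.

```python
def parse_ps_time(value):
    """Convert a 'minutes:seconds' string such as '1185:48' to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + int(seconds)

def uptime_seconds(days, hours, minutes):
    """Convert an uptime of days, hours and minutes to seconds."""
    return days * 86400 + hours * 3600 + minutes * 60

def cpu_share(cpu_seconds, wall_seconds):
    """Fraction of one CPU consumed over the given wall-clock period."""
    return cpu_seconds / wall_seconds

fsflush = parse_ps_time("1185:48")   # 71148 seconds of CPU time
uptime = uptime_seconds(78, 6, 57)   # 6764220 seconds of uptime
share = cpu_share(fsflush, uptime)
print(f"fsflush used {fsflush / 3600:.1f} CPU hours, {share:.2%} of one CPU")
```

Under those assumptions fsflush accounts for roughly one percent of a single CPU, which supports the view that it is not the cause of the slow file opens.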

  • High cpu load average

    Hi Experts,
    I have a SOA deployed on AS 10.1.3.2 which is integrated with BI EE 10.1.3.2 on OHEL 4.
    With this setup, I am seeing a very high load average on the CPU side. When I stop the SOA OC4J, the load average returns to a normal level of under 1, while with the SOA process started it goes as high as 15, which is pretty abnormal.
    Any pointers on how to debug this would be helpful.
    Thanks,
    Rishi

  • High load on x4100

    hey all,
    my new x4100s running linux sit at a load average of 8+, even when idle.
    this page http://www.mail-archive.com/[email protected]/msg45814.html
    mentions a workaround, by rmmodding ohci_hcd - however that kills usb, which i need.
    any ideas?

    What version of linux are you using?
    A load average of 8+ means something is truly wrong. My 4100s running RedHat Enterprise Linux V3, Update 6 have a load average at idle of 0, as they should.
    Did you do a 'top' to see what is consuming the CPU?
    Here is a list of the modules loaded on a typical 4100 in our shop:
    [root@iterppn]# lsmod
    Module Size Used by
    parport_pc 29185 0
    lp 15089 0
    parport 43981 2 parport_pc,lp
    autofs4 24009 5
    i2c_dev 13633 0
    i2c_core 28609 1 i2c_dev
    nfs 245617 2
    lockd 77809 2 nfs
    nfs_acl 5185 1 nfs
    sunrpc 175545 7 nfs,lockd,nfs_acl
    ds 21449 0
    yenta_socket 22977 0
    pcmcia_core 69329 2 ds,yenta_socket
    button 9057 0
    battery 11209 0
    ac 6729 0
    sr_mod 20581 0
    usb_storage 70921 0
    md5 5697 1
    ipv6 282913 24
    joydev 11841 0
    ohci_hcd 24273 0
    hw_random 7137 0
    e1000 120761 0
    dm_snapshot 19073 0
    dm_zero 3649 0
    dm_mirror 32465 0
    ext3 137809 6
    jbd 69104 1 ext3
    dm_mod 68097 24 dm_snapshot,dm_zero,dm_mirror
    mptscsih 2753 0
    mptsas 11981 3 mptscsih
    mptspi 11725 1 mptscsih
    mptfc 10825 1 mptscsih
    mptscsi 46161 3 mptsas,mptspi,mptfc
    mptbase 66721 4 mptsas,mptspi,mptfc,mptscsi
    sd_mod 19393 3
    scsi_mod 141457 7 sr_mod,usb_storage,mptsas,mptspi,mptfc,mptscsi,sd_mod
    You might want to compare your module list to this one. We have USB support and no high load averages.

  • Sync messages failed due to high load

    Hi!
    On our test system we had a very high load of async messages for 3 - 4 hours. The async messages were processed successfully but all sync messages failed during this time frame! As soon as the high load of async messages stopped, the sync messages started executing successfully again.
    Has anybody also experienced this kind of problem and what can be done to avoid this issue? Thanks for any input and suggestions.
    Regards, Tanja

    hey Tanja,
    I had a similar issue when testing the box with LoadRunner: a simple sync scenario failed when the payload or attachment was large. What I did was increase the virtual memory.
    You can also check the trace files to figure out the problem.
    This might not be a definitive answer; hope this helps,
    Vara

  • Very high server load

    Hi,
    I have run into an issue with very high server load on a OES11sp2 server.
    After a lot of troubleshooting we found that the cause was the adminusd process.
    Last month I patched the server and we upgraded to novell-nss-4.16.4940-0.5.9, which contains the new adminusd component.
    Specifically we found that it was related to volume space restrictions.
    After turning volume space restrictions off the load of the server dropped from 10+ down to 0.66 (atm).
    The strange thing was that adminusd wasn't showing up in top directly, but the us% was very high.
    The problem only pops up when a lot of people are logged in and drops off again when the number is low.
    Any idea what is going on?
    Cheers,
    Frank

    hi all,
    I checked the environment again
    and found that my SAN is not connected to the server through Fibre Channel or any other fast link, but through a 100 Mbps line. This was causing the slowness, and there was also a lot of I/O wait on my Oracle VM server.
    Thanks,
    Sri.

  • Once firefox is opened my mac pro lap top starts to rev very high. This has not happen before. And I lost all my tabs from the last session. Has this been a problem recently and how can I solve it? I am trying to avoid my system crashing.

    firefox is working my system too hard and the system is revving very high.
    == This happened ==
    Every time Firefox opened
    == today ==
    == User Agent ==
    Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-us) AppleWebKit/531.22.7 (KHTML, like Gecko) Version/4.0.5 Safari/531.22.7

    Does this start right when you start firefox or after you've loaded some pages? Are there specific pages that cause this problem?

  • Very high HDD load/unload cycles and hardware ECC (SMART)

    Hello,
    I just took a look at my Seagate HDDs SMART values and found 2 very high numbers:
    Load/Unload Cycle Count: 29.422, and increasing +1 like every 15 seconds, together with a clunking noise.
    Hardware ECC Recovered: 320.239.221, and increasing +10 or more like every second.
    The Load/Unload stuff seems to be aggressive power management of either the drive itself or the OS, I have no clue how to stop this. I tried keeping the drive busy with declunk, but it still parks the heads every few seconds.
    But what is Hardware ECC? And can anyone give me advice on how to stop my drive from moving the heads to the parking position every 15 seconds? When I used Tiger, opening Photo Booth did the trick, but that doesn't work on Leopard. The HDD is less than 1 month old...

    I too was getting so annoyed by the clunking sounds. The APM settings for the HDD are obviously out of whack completely. Tried Declunk which has been mentioned in other threads in the forums. It basically runs as a daemon making a small file and then deleting it every 5 seconds. I got it to work, but I had to introduce an additional parameter string to force it to create the file every 4 seconds. Visit http://kiza.kcore.de/software/declunk/ for more information on this issue.
    Overall, I was still unhappy with the solution. Creating a file every few seconds isn't optimal.
    Having tested disabling the APM settings with hdparm for WinXP (via Bootcamp) and found that it fixed the clunking sound, I knew that APM was definitely causing the sound. Of course, hdparm did not update the firmware of the HDD, so every time I powered down my MBP, the HDD would go back to APM being on "FULL", bringing back the "clunk" sound.
    The ideal solution was to obtain some kind of APM management software for OSX. Unfortunately, the historical application known as APM Tuner X was no longer being developed by the author and does not run in Leopard.
    That's when I found HDAPM!
    http://mckinlay.net.nz/hdapm/
    How did you determine the load/unload cycles for your HDD? In the end I located a copy of smartmon for Windows and booted into WinXp (via Bootcamp). I think my cycles are up to over 100,000... Not sure how to easily tell?!
    Anyway, I hope that this solves your issue. Apple's Energy Saving system preference pane SHOULD disable putting the hard disk to sleep if left unchecked, but I also believe this is an issue with many HDD's. Seagate are shocking when it comes to releasing firmware upgrades too by the looks of things.

  • Load+gen time very high

    Hi all
    I'm using SAP 4.7 on Oracle 9.2.
    Server: IBM P series, processor 375 MHz x 2, RAM 3 GB.
    I have a problem: in ST03N the Load+Gen time is very high (up to 500).
    Is it a hardware problem or a tuning problem?
    Thanks in advance

    SGEN will recompile the ABAP loads that you want.
    Your system looks old, so this can take anything from 1 hour to 8 hours.
    So do this when user load is minimal.
    I usually choose all options and do a complete generation. This takes a lot of space in your DB, but the system will be faster even for rarely used transactions.
