Calculating load averages
Generic_137111-02 sun4v sparc SUNW,SPARC-Enterprise-T5220 Solaris 10:
I have a couple of questions:
1. mpstat shows it has 63 CPUs, but the spec says the T5220 has 8 cores.
2. I don't understand what threading technology in CPUs is. How do hardware threads differ from cores?
3. How do I calculate load averages? When I run top, I don't know how to relate its output to this CPU threading technology. How do I know if the machine is overloaded?
The Niagara processors are designed so that several hardware threads can run in parallel on a single core.
The T2 processor (Niagara 2) in the T5220 has up to 8 cores, each capable of 8 parallel hardware threads. So, effectively, one T2 can run 64 threads simultaneously, and mpstat reports each hardware thread as a CPU.
http://blogs.sun.com/glennf/entry/getting_past_go_with_sparc
http://blogs.sun.com/glennf/tags/cmt
I'd recommend going through the two links above. They will save you a lot of grief when you go from single-threaded admins to multi-threaded admins.
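To relate a load average to a chip with many hardware threads, one common approach is to divide it by the number of logical CPUs the OS reports. The following is a minimal sketch, not a definitive sizing rule: the function names and the "ratio above 1.0" threshold are assumptions chosen for illustration.

```python
import os


def load_per_cpu(load_average: float, logical_cpus: int) -> float:
    """Return the average number of runnable threads per logical CPU."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return load_average / logical_cpus


def describe(load_average: float, logical_cpus: int) -> str:
    """Classify the load with a simple, assumed rule of thumb (ratio > 1.0)."""
    ratio = load_per_cpu(load_average, logical_cpus)
    status = "more runnable threads than CPUs" if ratio > 1.0 else "CPUs not saturated"
    return f"{ratio:.2f} per CPU: {status}"


if __name__ == "__main__":
    # os.cpu_count() counts logical CPUs, i.e. hardware threads, not cores.
    cpus = os.cpu_count() or 1
    try:
        one_minute, _, _ = os.getloadavg()  # available on Unix-like systems
        print(describe(one_minute, cpus))
    except OSError:
        print("load average not available on this platform")
```

On a fully populated T2 (8 cores x 8 threads), `describe(32.0, 64)` would report 0.50 per CPU. Hardware threads share a core's execution resources, so a ratio near 1.0 does not mean 64 times single-thread throughput.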
Similar Messages
-
Hi,
How is the load average in a Solaris system calculated? What load average threshold should be considered critical for a server?
Regards,
Siva
We had a large system with over 100 CPUs running Solaris 10, and the highest load point average (LPA) that I saw was over 1000. The system was slow but did not panic.
I believe that the LPA divided by the number of CPUs will tell you the number of runnable jobs per CPU. If the LPA is larger than the number of CPUs, then you are time slicing between the available jobs, and each job gets less than a full slice. -
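As background to "how is it calculated": Unix-style load averages are usually described as exponentially damped moving averages of the number of runnable threads, sampled at a fixed interval. The sketch below illustrates that idea only. The 5-second sampling interval and 60-second window are assumptions, and the actual Solaris kernel code differs in detail.

```python
import math


def update_load(previous: float, runnable: int,
                interval_s: float = 5.0, window_s: float = 60.0) -> float:
    """Fold one sample of the runnable count into a damped average."""
    decay = math.exp(-interval_s / window_s)
    return previous * decay + runnable * (1.0 - decay)


def simulate(samples, interval_s: float = 5.0, window_s: float = 60.0) -> float:
    """Start from an idle system (load 0.0) and apply each sample in order."""
    load = 0.0
    for runnable in samples:
        load = update_load(load, runnable, interval_s, window_s)
    return load


if __name__ == "__main__":
    # A constant 4 runnable threads: the average climbs toward 4.0 over time.
    for count in (1, 12, 60):
        print(count, "samples ->", round(simulate([4] * count), 2))
```

The damping is why the 1-minute figure reacts faster than the 5- and 15-minute figures: only the window length changes.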
Hi
I've been consulting two different Basis Consultants with this question and got two different answers, so I will just try this forum to figure out which one is right:
I have a Web AS server with 12 CPUs (running Business Warehouse) where the load average is between 4 and 10. In the detailed analysis I can see that most of the CPUs are idle even with a load average of, e.g., 8. My understanding of load average is that it is the number of work processes waiting for a CPU within a certain period (1 min, 5 min, etc.). Furthermore, I heard that this load average, as a rule of thumb, should not be higher than 2.
Answer from Basis-Consultant1:
Load average should be calculated according to the CPUs so in ST06 it is allowed to have a load average of (12 CPUs * 2 work processes) 24. Which means that 10 is not much and is probably due to lack of parallelism in the processes.
Answer from Basis-Consultant2
According to the SAP opinion - like a rule of thumb - if the average load is around 1 percent it is OK, if it is 3 percent there could be a serious bottleneck. But there are also more things to consider (CPU utilization per hour, memory consumption etc.). So the whole picture has to be evaluated.
Can anybody help me here.... which one is right?
Thanks in advance.
Best regards,
Keld Pilegaard
Hi,
Check out the PDF "Best Practices for Performance Tuning SAP R3 and Oracle, Part I" on SDN. It will give you a clear idea about load average.
https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/5d0db4c9-0e01-0010-b68f-9b1408d5f234
Kind Regards
Umesh K -
Hi Experts,
I have SOA deployed on AS 10.1.3.2, which is integrated with BI EE 10.1.3.2 on OHEL 4.
With this setup, I am seeing a very high load average on the CPU side. When I stop the SOA OC4J, the load average returns to a normal level of under 1. With the SOA process started, it goes as high as 15, which is pretty abnormal.
Any pointers on how to debug the issue would be helpful.
Thanks,
Rishi -
Question about Load Average in the AWR report
Hi,
I have some databases in 11.2 RAC on AIX.
I was analyzing the root causes of an eviction.
Looking at the AWR report before the reboot I see:
DB1
Host CPU (CPUs: 6 Cores: 3 Sockets: )
~~~~~~~~ Load Average
Begin End %User %System %WIO %Idle
4.18 12.33 60.9 12.6 1.6 26.5
Instance CPU
~~~~~~~~~~~~
% of total CPU for Instance: 27.4
% of busy CPU for Instance: 37.3
%DB time waiting for CPU - Resource Mgr: 10.6
DB2
Host CPU (CPUs: 6 Cores: 3 Sockets: )
~~~~~~~~ Load Average
Begin End %User %System %WIO %Idle
3.77 13.93 60.7 12.5 1.6 26.7
Instance CPU
~~~~~~~~~~~~
% of total CPU for Instance: 6.9
% of busy CPU for Instance: 9.5
%DB time waiting for CPU - Resource Mgr: 0.0
Do you think these values are high?
This is vmstat at the time of the reboot:
DATA                RUN BCK       AVM     FRE PRE PPI PPO PFR PSR PCY   FIN     FSY    FCS CUS CSY CID CWA
07/21/2013 00:08:17  31   0 7.400.345 579.923   0  81   0   0   0   0 3.292 187.010 19.560  84  16   0   0
07/21/2013 00:08:17  17   1 7.390.187 589.884   0 176   0   0   0   0 3.681 169.994 21.482  81  19   0   0
07/21/2013 00:08:17  27   1 7.402.121 577.816   0 115   0   0   0   0 3.150 157.210 18.503  84  16   0   0
07/21/2013 00:08:48  19   1 7.422.966 564.179   0 211   0   0   0   0 2.396 152.667 19.368  84  16   0   0
07/21/2013 00:08:48  19   1 7.427.693 559.268   0 162   0   0   0   0 2.990 154.733 19.843  85  15   0   0
07/21/2013 00:08:48  23   1 7.441.204 545.530   0 204   0   0   0   0 2.137 171.501 18.151  84  16   0   0
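One way to read the vmstat samples is to compare the RUN column with the number of logical CPUs. The sketch below assumes that RUN is the run-queue length (vmstat's `r` column) and takes the CPU count from the "CPUs: 6" line in the AWR header. Both are interpretations of the posted output, not something the post confirms.

```python
# RUN values from the six vmstat samples in the post.
RUN_QUEUE_SAMPLES = [31, 17, 27, 19, 19, 23]

# Logical CPU count as shown in the AWR "Host CPU" header (assumed to apply here).
LOGICAL_CPUS = 6


def mean_run_queue_per_cpu(samples, cpus: int) -> float:
    """Average run-queue length divided by the number of logical CPUs."""
    if not samples or cpus <= 0:
        raise ValueError("need at least one sample and a positive CPU count")
    return (sum(samples) / len(samples)) / cpus


if __name__ == "__main__":
    ratio = mean_run_queue_per_cpu(RUN_QUEUE_SAMPLES, LOGICAL_CPUS)
    print(f"runnable threads per logical CPU: {ratio:.2f}")
```

Under those assumptions there are several runnable threads per logical CPU in every sample, which is in line with the 0% idle time in the CID column.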
This is mpstat:
DATA                CPU   MIN MAJ MPC  INT   CS  ICS RQ MIG LPA  SYSC US SY WT ID   PC
07/21/2013 00:08:48   0 12896  44   0 1279 3030 1362  2 367 100 27313 86 14  0  0 0.49
07/21/2013 00:08:48   1 11055  93   0 1123 3137 1315  1 222 100 31860 85 15  0  0 0.51
07/21/2013 00:08:48   2  5938  51   0 1465 3840 1294  2 532 100 29992 85 15  0  0 0.49
07/21/2013 00:08:48   3  6266  57   0 1247 3177 1046  2 511 100 22793 85 15  0  0 0.51
07/21/2013 00:08:48   4  2661  18   0 1729 4087 1707  4 264 100 24647 85 15  0  0 0.49
07/21/2013 00:08:48   5  4211  10   0 1395 2709 1101  2 209 100 21019 86 14  0  0 0.51
07/21/2013 00:08:49   0  9372  27   0 1150 2583 1219  0 245 100 47745 82 18  0  0 0.47
07/21/2013 00:08:49   1 11327  13   0  726 1803  794  1 130 100 25239 87 13  0  0 0.52
07/21/2013 00:08:49   2  8970 118   0 1459 4396 1517  0 602 100 24833 81 19  0  0 0.49
07/21/2013 00:08:49   3  7328 267   0 1329 4136 1273  2 586 100 25385 81 19  0  0 0.51
07/21/2013 00:08:49   4  8793  19   0 1133 2583 1036  1 235 100 24327 86 14  0  0 0.50
07/21/2013 00:08:49   5  8239  12   0 1309 2846 1165  1 277 100 18513 86 14  0  0 0.50
Thank you
Thank you Jonathan,
I'm looking at ASH, 15 minutes before the crash.
I have 13% buffer busy waits and 13% resmgr:cpu quantum waits:
Avg Active
Event Event Class % Event Sessions
CPU + Wait for CPU CPU 59.09 0.15
buffer busy waits Concurrency 13.64 0.04
resmgr:cpu quantum Scheduler 13.64 0.04
The buffer busy waits were caused by an update of a table.
There are ETL jobs that run every night.
Looking at the I/O stats, I notice a change in the use of the swap:
Before the crash:
hdisk66 xfer: %tm_act bps tps bread bwrtn
1.0 8.2K 2.0 8.2K 0.0
read: rps avgserv minserv maxserv timeouts fails
2.0 6.7 3.8 9.6 0 0
write: wps avgserv minserv maxserv timeouts fails
0.0 0.0 0.0 0.0 0 0
queue: avgtime mintime maxtime avgwqsz avgsqsz sqfull
0.0 0.0 0.0 0.0 0.0 0.0
near the crash:
hdisk66 xfer: %tm_act bps tps bread bwrtn
71.0 241.7K 59.0 241.7K 0.0
read: rps avgserv minserv maxserv timeouts fails
59.0 12.1 0.2 183.5 0 0
write: wps avgserv minserv maxserv timeouts fails
0.0 0.0 0.0 0.0 0 0
queue: avgtime mintime maxtime avgwqsz avgsqsz sqfull
0.0 0.0 0.0 0.0 0.0 0.0 -
Problem in calculating the Average Daily Requirement
Hello all,
I don't understand how the system calculates the average daily requirement in the Dynamic Safety Stock process. The following process flow is given in SAP notes to explain how the system calculates the average daily requirement:
1. The system uses the defined parameters to determine the number of days used for calculating the average daily requirements. If the period is defined as a week, the period length as standard days (5 days) and the number of periods as 2, the system divides the total of the requirements by 10 days.
2. The system then calculates the total of the requirements for this period.
The system takes into account all requirements in the current period, even requirements that lie in the past but are still in the current period. For example, if the planning run is carried out in the middle of the month, then those requirements that were planned at the beginning of the month are also included in the calculation of the average daily requirements.
3. The average daily requirement is calculated using the formula:
Requirements in the specified number of periods / Number of days within the total period length
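The formula above can be sketched as a small calculation. The function name and the requirement quantities are hypothetical. Only the period settings (weekly periods, 5 standard days, 2 periods) come from the quoted SAP note.

```python
def average_daily_requirement(requirements, days_per_period: int,
                              number_of_periods: int) -> float:
    """Total requirements in the horizon divided by the days in that horizon."""
    total_days = days_per_period * number_of_periods
    if total_days <= 0:
        raise ValueError("the horizon must contain at least one day")
    return sum(requirements) / total_days


if __name__ == "__main__":
    # Hypothetical requirement quantities falling inside two 5-day weeks.
    illustrative_requirements = [10, 20]
    print(average_daily_requirement(illustrative_requirements, 5, 2))
```

With two weekly periods of 5 days each, the divisor is 10 days regardless of how many individual requirements fall inside the horizon.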
I have run MRP on 02/23/2009 and the following results are generated in stock requirement list of the component part:
Date Dependent Requirement MMSA Schedule Lines Quantity
3/3/2009 10 31
3/11/2009 20 20
3/31/2009 30 30
4/14/2009 40 49
4/22/2009 50 50
4/29/2009 60 60
5/11/2009 70 55
5/21/2009 80 80
Hi,
In addition to my previous reply,
If you made the following settings -
Range of coverage in first period -
min - blank
tgt - 7
max - blank
number of periods - blank
The system will calculate the safety stock for 7 days for each period; i.e., 7*3=21 and it will generate plnd orders as
week1 = 51
week2 = 14+21 = 35
week3 = 10+21 = 31
week4 = 30+21 = 51
If you want to restrict your calculation to 2 periods, then make the following settings -
Range of coverage in first period -
min - blank
tgt - 7
max - blank
number of periods - 2
Range of coverage in second period -
Make all blank
Range of coverage in the rest of the horizon -
min - blank
tgt - 3
max - blank
It means for first two weeks the safety stock will be 21 (equivalent to 7 days) and for rest of the horizon it will be 3*3 = 9 (equivalent to 3 days)
The Plnd orders will be -
week 1 = 51
week 2 = 35
week 3 = 14+9 = 23
week 4 = 30+9 = 39
and so on.
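The arithmetic in this reply (7 days x 3 = 21 for the first two weeks, 3 days x 3 = 9 afterwards) can be sketched as follows. This is a simplified illustration with hypothetical parameter names. It assumes an average daily requirement of 3, as in the reply, and does not model the real SAP range-of-coverage logic.

```python
def safety_stock(period_index: int, avg_daily_requirement: float,
                 first_target_days: int, first_periods: int,
                 rest_target_days: int) -> float:
    """Target days of coverage times the average daily requirement.

    period_index is 1-based; the first `first_periods` periods use
    `first_target_days`, all later periods use `rest_target_days`.
    """
    if period_index < 1:
        raise ValueError("period_index is 1-based")
    if period_index <= first_periods:
        days = first_target_days
    else:
        days = rest_target_days
    return days * avg_daily_requirement


if __name__ == "__main__":
    for week in range(1, 5):
        print("week", week, "safety stock", safety_stock(week, 3, 7, 2, 3))
```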
Regards,
Amol -
Why is my load average always above 1 regardless of cpu usage?
My load average is often ridiculous. For example, when I wake from sleep it's usually 45 or so. Even when I'm doing nothing with the machine (CPU is about 10% in each core) I still see load averages of 1.2 to 1.6 or so. Why would this happen?
Is there a way to figure out what is causing the load?
Activity Monitor can show what is happening on your MacBook.
Resetting the SMC could solve the problem.
Intel-based Macs: Resetting the System Management Controller (SMC) -
MacBook Pro Retina 2013 load average constantly above 1
I have a recently purchased (3 month old) MacBook Pro Retina - Late 2013.
I've noticed that the load averages appear to be rather consistently high.
After a fresh reboot, with nothing other than background applications and the Dashboard running, the load average stays above 1.0.
The CPU itself is close to 100% idle nearly all of the time, which certainly doesn't correspond to '1' unit of load on this system.
I suspect something is amiss but have been unable to figure anything out.
I have tried the instructions for resetting the SMC and this does not appear to have fixed anything.
The built-in diagnostics suggest nothing is wrong.
Any thoughts?
Pre-Mavericks
Open Activity Monitor in the Utilities folder. Select All Processes from the Processes dropdown menu. Click twice on the CPU% column header to display in descending order. If you find a process using a large amount of CPU time (>=70,) then select the process and click on the Quit icon in the toolbar. Click on the Force Quit button to kill the process. See if that helps. Be sure to note the name of the runaway process so you can track down the cause of the problem.
Mavericks and later
Open Activity Monitor in the Utilities folder. Select All Processes from the View menu. Click on the CPU tab in the toolbar. Click twice on the CPU% column header to display in descending order. If you find a process using a large amount of CPU time (>=70,) then select the process and click on the Quit icon in the toolbar. Click on the Force Quit button to kill the process. See if that helps. Be sure to note the name of the runaway process so you can track down the cause of the problem. -
One of 4 node RAC always have higher load averages and higher than others
Hello,
We have a 4-node RAC, 9208 on Linux 4. When viewing top, we noticed that the same node always has a higher load average than the other 3 nodes. Is this normal? Load balancing is working fine, but this one node always has a higher load average. This is the node where we did the RAC installation. Thank you.
I do not remember what the default for clb_goal (connection load balancing) is in 9i, but in 10g it is LONG.
Check it:
select clb_goal from dba_services where name = '<service name>';
You may have to change from LONG to SHORT or from SHORT to LONG, depending on your connection types:
exec dbms_service.modify_service('<service>', clb_goal => dbms_service.clb_goal_long);
Read the following article.
http://www.databasejournal.com/features/oracle/article.php/3659411/Oracle-RAC-Administration---Part-15-Connection-Load-Balancing-and-FAN.htm -
Load average on services.
Hi,
I have 2 nodes with an ASM file system:
Node 1 -> 7 services
Node 2 -> 6 services
How can I find out the load average of each service on each node?
Thanks
SQL> desc V_$SERVICE_STATS
Name Null? Type
SERVICE_NAME_HASH NUMBER
SERVICE_NAME VARCHAR2(64)
STAT_ID NUMBER
STAT_NAME VARCHAR2(64)
VALUE NUMBER -
System load average over 1.5 while cpu idle
Running 10.9.4. My load averages are continuously around 1.5 or higher while my cpu is around 95% idle, no apps running, I've just rebooted and logged in. This is very common now for my machine. Any suggestions? Thanks
On Unix systems that are idle I generally see near-zero load averages. I would expect OS X to be in that general vicinity and seem to recall seeing near-zero numbers in the past when the system is idle.
-
Auto check calculating the Average
I do not know how to approach an issue with my form, so any help or guidance will be very much appreciated!
For a Performance Form I have 5 sections for the managers to fill in.
Every section includes an Assessment drop down list with 5 items to select from.
Items for DDList (Named:Assessment):
Outstanding
Exceeds Expectations
Meets Expectations
Needs Improvement
Does Not Meet Expectations
Finally, at the end of the form I have a section named
OVERALL SUMMARY OF PERFORMANCE with 5 check boxes named:
Outstanding
Exceeds Expectations
Meets Expectations
Needs Improvement
Does Not Meet Expectations
Is it possible, with a script (calculating the average?), to automatically check one of the check boxes for the overall summary of performance?
THANK YOU
Hi Niall, I took your advice from the last sample you kindly sent me. I am close, but I still have a problem at the end of the line!
Here is what I have so far:
On the change event for Area1, the script below:
// Map the selected assessment text to a numeric score
switch (xfa.event.newText) {
    case "Outstanding":
        NumericField1.rawValue = "5";
        break;
    case "Exceeds expectations":
        NumericField2.rawValue = "4";
        break;
    case "Meets expectations":
        NumericField3.rawValue = "3";
        break;
    case "Needs improvement":
        NumericField4.rawValue = "2";
        break;
    case "Does not meet expectations":
        NumericField5.rawValue = "1";
        break;
}
For NumericField1 (score for Outstanding), the script on the Calculate event:
var vScore = 0;
for (var i = 0; i < 5; i++) {
    if (xfa.resolveNode("optionA[" + i + "]").rawValue == "5")
        vScore += xfa.resolveNode("optionA[" + i + "]").rawValue;
}
NumericField1.rawValue = vScore;
What I am getting for NumericField1, for example when I select Outstanding in all drop-down lists, is 55555 rather than the desired 25.
How can I make it work? Where is my mistake?
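A result like 55555 instead of 25 is most likely what you get when the scores are added as text ("5" + "5" gives "55") rather than as numbers. The sketch below shows the intended logic in Python rather than in the form's own JavaScript, purely to illustrate the calculation. The dictionary and function names are hypothetical, and rounding half up to pick the overall rating is an assumption.

```python
# Score for each assessment label, as described in the post.
SCORES = {
    "Outstanding": 5,
    "Exceeds Expectations": 4,
    "Meets Expectations": 3,
    "Needs Improvement": 2,
    "Does Not Meet Expectations": 1,
}
LABELS = {score: label for label, score in SCORES.items()}


def total_score(selections) -> int:
    """Add the scores as numbers, not as strings."""
    return sum(int(SCORES[selection]) for selection in selections)


def overall_rating(selections) -> str:
    """Average the section scores and map the result back to a label."""
    if not selections:
        raise ValueError("at least one section must be assessed")
    average = total_score(selections) / len(selections)
    rounded = int(average + 0.5)  # round half up (an assumed rule)
    return LABELS[rounded]


if __name__ == "__main__":
    print(total_score(["Outstanding"] * 5))
    print(overall_rating(["Outstanding", "Meets Expectations",
                          "Meets Expectations", "Needs Improvement",
                          "Exceeds Expectations"]))
```

In the form itself, the equivalent step would be converting each rawValue to a number before adding it to the running total.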
Thanks Niall -
Load averages over 1.00 after upgrading to Mavericks
Hi,
I have recently upgraded to OS X Mavericks on my old Macbook Pro (Late-2007, 4GB RAM) and I noticed that the load averages are always above 1.00, even when CPU is almost 100% idle. I know load averages don't reflect only CPU but after a reboot I see the same. Could this be because it's an old Macbook and it takes more resources from it to run the new OS X 10.9? This is so strange.
Thanks for reading!
L.
I've been having the same "issue" since I first upgraded to Mavericks from Lion. At first I thought it had something to do with the way processes are scheduled/handled in Mavericks, but now I realize that I DO get load averages below 1.0 (~0.7-0.8, never below this), but only when completely idle.
I have an early 2011 MacBook 2.7 GHz i7 with 16 GB RAM, so I find this kind of weird.
Are you using any particular applications/extensions?
(See also http://apple.stackexchange.com/questions/106828/avg-load-goes-up-after-upgrading-to-mavericks, although there is no solution there.) -
ODSEE 11g - DPS Directory proxy server suddenly increase load average
Hi all
We recently upgraded from Directory Server 5.2 to ODSEE 11g, with one directory proxy configured for one master directory server and one consumer directory server.
All three instances are on the same SPARC T3 machine.
The monitoring alerts show that the server load average on the machine is above 6.00; normally it is 0.66. I'm not sure what is causing the sudden burst in load. The traffic is normal and there are no abnormal requests coming to the server. Proxy performance degrades over a span of 24 hours, and once I restart the proxy services (dpsadm restart) all load averages return to normal and the directory proxy runs normally for the next two to three weeks. Then the same cycle repeats.
I increased the JVM heap size from 1 GB to 2 GB and still have the problem. Has anyone else experienced a similar problem? How did you fix it?
Any input or advice in the right direction is much appreciated.
Thank you.
By server load I'm referring to the load average shown by the "prstat" command, which suddenly shoots up from 0.66 to 6.00, i.e. the CPU usage. The alert is from our server monitoring tool and is not related to the directory proxy.
Clients report connection timeouts (etime goes from etime=0 to 2, 4, ...). Over 24 hours I can see the etime increase, and eventually the proxy server hangs and becomes unresponsive. Once I restart, performance is back to normal for at least another two weeks.
I suspect there might be a memory leak or a JVM garbage collection issue. Any expert input on how to figure this out will help.
Here are the JVM args for the proxy server: "-Xms2g -Xmx2g -Xmn1g -XX:SurvivorRatio=4 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
Here is a jstat during the problem
./jstat -gcutil -t 25365 2s 30
Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT
982106.4 0.00 26.17 4.26 92.25 59.52 523 60.979 689 1002.587 1063.566
982108.4 0.00 26.17 4.40 92.25 59.52 523 60.979 689 1002.587 1063.566
982110.4 0.00 26.17 4.80 92.25 59.52 523 60.979 689 1002.587 1063.566
982112.4 0.00 26.17 5.10 92.25 59.52 523 60.979 690 1002.719 1063.698
982114.4 0.00 26.17 5.15 92.25 59.52 523 60.979 690 1002.719 1063.698
982116.4 0.00 26.17 5.32 92.25 59.52 523 60.979 691 1003.009 1063.988
982118.4 0.00 26.17 5.72 92.25 59.52 523 60.979 691 1003.009 1063.988
982120.4 0.00 26.17 5.80 92.25 59.52 523 60.979 691 1003.009 1063.988
982122.4 0.00 26.17 5.93 92.25 59.52 523 60.979 692 1003.168 1064.146
982124.4 0.00 26.17 6.03 92.25 59.52 523 60.979 692 1003.168 1064.146
982126.4 0.00 26.17 6.15 92.25 59.52 523 60.979 693 1003.481 1064.460
982128.5 0.00 26.17 6.18 92.25 59.52 523 60.979 693 1003.481 1064.460
982130.5 0.00 26.17 6.25 92.25 59.52 523 60.979 693 1003.481 1064.460
982132.5 0.00 26.17 6.29 92.25 59.52 523 60.979 694 1003.656 1064.635
982134.5 0.00 26.17 6.31 92.25 59.52 523 60.979 694 1003.656 1064.635
982136.5 0.00 26.17 6.36 92.25 59.52 523 60.979 695 1003.988 1064.967
982138.5 0.00 26.17 6.89 92.25 59.52 523 60.979 695 1003.988 1064.967
982140.5 0.00 26.17 6.99 92.25 59.52 523 60.979 695 1003.988 1064.967
982142.5 0.00 26.17 7.08 92.25 59.52 523 60.979 696 1004.187 1065.165
982144.5 0.00 26.17 7.31 92.25 59.52 523 60.979 696 1004.187 1065.165
982146.5 0.00 26.17 7.82 92.25 59.52 523 60.979 697 1004.553 1065.531
982148.5 0.00 26.17 7.92 92.25 59.52 523 60.979 697 1004.553 1065.531
982150.5 0.00 26.17 8.01 92.25 59.52 523 60.979 697 1004.553 1065.531
982152.5 0.00 26.17 8.17 92.25 59.52 523 60.979 698 1004.786 1065.764
982154.5 0.00 26.17 8.26 92.25 59.52 523 60.979 698 1004.786 1065.764
982156.5 0.00 26.17 8.38 92.25 59.52 523 60.979 699 1005.174 1066.153
982158.5 0.00 26.17 8.74 92.25 59.52 523 60.979 699 1005.174 1066.153
982160.5 0.00 26.17 8.88 92.25 59.52 523 60.979 699 1005.174 1066.153
982162.5 0.00 26.17 8.96 92.25 59.52 523 60.979 700 1005.433 1066.412
982164.5 0.00 26.17 9.09 92.25 59.52 523 60.979 700 1005.433 1066.412
jstat after the restart
./jstat -gcutil -t 10084 2s 30
Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT
40312.6 0.00 25.13 88.49 1.98 63.68 21 2.366 0 0.000 2.366
40314.6 0.00 25.13 88.58 1.98 63.68 21 2.366 0 0.000 2.366
40316.6 0.00 25.13 88.71 1.98 63.68 21 2.366 0 0.000 2.366
40318.6 0.00 25.13 88.99 1.98 63.68 21 2.366 0 0.000 2.366
40320.6 0.00 25.13 89.31 1.98 63.68 21 2.366 0 0.000 2.366
40322.6 0.00 25.13 89.36 1.98 63.68 21 2.366 0 0.000 2.366
40324.6 0.00 25.13 89.42 1.98 63.68 21 2.366 0 0.000 2.366
40326.6 0.00 25.13 89.53 1.98 63.68 21 2.366 0 0.000 2.366
40328.6 0.00 25.13 89.60 1.98 63.68 21 2.366 0 0.000 2.366
40330.6 0.00 25.13 89.72 1.98 63.68 21 2.366 0 0.000 2.366
40332.6 0.00 25.13 90.11 1.98 63.68 21 2.366 0 0.000 2.366
40334.6 0.00 25.13 90.56 1.98 63.68 21 2.366 0 0.000 2.366
40336.6 0.00 25.13 90.67 1.98 63.68 21 2.366 0 0.000 2.366
40338.6 0.00 25.13 90.75 1.98 63.68 21 2.366 0 0.000 2.366
40340.6 0.00 25.13 91.09 1.98 63.68 21 2.366 0 0.000 2.366
40342.6 0.00 25.13 91.36 1.98 63.68 21 2.366 0 0.000 2.366
40344.6 0.00 25.13 91.47 1.98 63.68 21 2.366 0 0.000 2.366
40346.6 0.00 25.13 91.53 1.98 63.68 21 2.366 0 0.000 2.366
40348.7 0.00 25.13 91.64 1.98 63.68 21 2.366 0 0.000 2.366
40350.7 0.00 25.13 91.77 1.98 63.68 21 2.366 0 0.000 2.366
40352.7 0.00 25.13 91.87 1.98 63.68 21 2.366 0 0.000 2.366
40354.7 0.00 25.13 91.95 1.98 63.68 21 2.366 0 0.000 2.366
40356.7 0.00 25.13 92.11 1.98 63.68 21 2.366 0 0.000 2.366
40358.7 0.00 25.13 92.19 1.98 63.68 21 2.366 0 0.000 2.366
40360.7 0.00 25.13 92.24 1.98 63.68 21 2.366 0 0.000 2.366
40362.7 0.00 25.13 92.85 1.98 63.68 21 2.366 0 0.000 2.366
40364.7 0.00 25.13 93.19 1.98 63.68 21 2.366 0 0.000 2.366
40366.7 0.00 25.13 93.40 1.98 63.68 21 2.366 0 0.000 2.366
40368.7 0.00 25.13 93.44 1.98 63.68 21 2.366 0 0.000 2.366
40370.7 0.00 25.13 93.47 1.98 63.68 21 2.366 0 0.000 2.366
Has anyone else seen similar behavior? Any input in the right direction is highly appreciated.
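One way to quantify what the first jstat capture shows is to count how many full collections occur in the sampled interval while old-generation occupancy (the O column) stays high. The sketch below parses `jstat -gcutil -t` text like the output above. The 90% threshold and the function names are assumptions, and the sample holds only the first and last rows of the capture taken during the problem.

```python
SAMPLE = """\
Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT
982106.4 0.00 26.17 4.26 92.25 59.52 523 60.979 689 1002.587 1063.566
982164.5 0.00 26.17 9.09 92.25 59.52 523 60.979 700 1005.433 1066.412
"""


def parse_gcutil(text: str):
    """Turn jstat -gcutil -t output into a list of {column: float} dicts."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, map(float, row))) for row in data]


def old_gen_pressure(samples, old_threshold: float = 90.0):
    """Return (full GCs in the interval, elapsed seconds, old gen stayed high)."""
    first, last = samples[0], samples[-1]
    full_gcs = int(last["FGC"] - first["FGC"])
    elapsed = last["Timestamp"] - first["Timestamp"]
    stayed_high = all(sample["O"] >= old_threshold for sample in samples)
    return full_gcs, elapsed, stayed_high


if __name__ == "__main__":
    print(old_gen_pressure(parse_gcutil(SAMPLE)))
```

Repeated full collections that leave the old generation at the same occupancy fit the suspicion of a leak or an undersized old generation, but a heap dump or GC log would be needed to confirm it.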
Thanks. -
Very high "load average" in top
Hi,
our OES11SP1 two-server-cluster (fully patched) shows a very high "load
average" (>50, up to 110) in top in some circumstances. There are no
problems in normal operation, but administrator actions like shutdown or
cluster migrate might trigger the problem.
For example when I enter 'halt', then there is the following line in
/var/log/messages:
Sep 12 20:27:18 srv1 shutdown[14675]: shutting down for system halt
more than 20 minutes later:
Sep 12 20:51:19 srv1 init: Switching to runlevel: 0
Within these 20 minutes nothing happens, but "average load" goes up to at
least 50, with ndsd at top. Access to storage related tools and commands is
not possible, for example 'nss /pool' hangs without any output.
This happens on nearly every shutdown, but from time to time it doesn't. The
same will sometimes be triggered by a cluster migrate.
This only happens with our OES11SP1 cluster, it does not happen with OES11
and OES2SP3; the only other difference I'm aware of: Novell CIFS is only
running on the OES11SP1 cluster.
Any ideas?
Thanks,
Mirko
Sorry for the delay, it seems it's a bad habit of me to ask questions
immediately before holidays...
Yes, these servers have replicas, all of them... Cache size is set to 195328
KB, which is about twice the DIB size. IIRC this was a recommendation I read
somewhere at Novell. But I'll check that information again.
Thanks,
Mirko
kjhurni wrote:
>
> Mirko Guldner;2283539 Wrote:
>> top shows ndsd on top - but it's there in normal operation too, so I
>> don't
>> know if this means something.. (?) And it's not always the CPU which is
>> at
>> 100% - I have an example screenshot with: load average 50.20, 51.61,
>> 41.0
>> 3.2%us, 1.0%sy, 0.0%ni, 77.0%id 18%wa 0.0%hi 0.3%si 0.0%st. But this is
>> only
>> an example - this differs.
>>
>> Thanks,
>> Mirko
>>
>> kjhurni wrote:
>>
>> >
>> > Which process(es) does top show as being the culprit?
>> >
>> > In the past (on OES2 SP3) we had issues with CIFS causing ncp to
>> cause
>> > high utilization, but that was fixed a while ago.
>> >
>> > --Kevin
>> >
>> >
>
> I have seen ncp issues cause high ndsd utilization, but we've not yet
> upgraded our cluster or DS servers to OES11 yet (waiting for new
> hardware to go in place first).
>
> Out of curiosity, are the servers with high utilization also replica
> servers? For some reason, during one of our upgrades on a replica
> server (we have a server that contains all R/W copies of everything),
> the cache size got set down really low and that caused all sorts of
> issues.
>
> Maybe one of my colleagues will wander by and offer additional insight,
> as this may be eDir related and/or NCP related. Not sure if triggering
> a core manually would help (but you'd have to send that to Novell and
> open an SR to get it read).
>
> IF you suspect CIFS, do you have the ability to temporarily shut off
> CIFS for like a few days to see if that's the culprit?
>
>
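A load average of 50 with 77% idle and 18% I/O wait is possible on Linux (which OES11 runs on) because the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones. The sketch below counts process states from `/proc/<pid>/stat`. The function names are hypothetical, and it only makes sense on a Linux system with `/proc` mounted.

```python
import os
from collections import Counter


def parse_state(stat_line: str) -> str:
    """Extract the one-letter state from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    ')', so split on the last ')' before reading the state field.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(stat_lines) -> Counter:
    """Count how many processes are in each state (R, S, D, Z, ...)."""
    return Counter(parse_state(line) for line in stat_lines)


def read_stat_lines(proc_root: str = "/proc"):
    """Read the stat line of every process currently visible in /proc."""
    lines = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                lines.append(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
    return lines


if __name__ == "__main__" and os.path.isdir("/proc"):
    states = count_states(read_stat_lines())
    print("runnable (R):", states["R"], "uninterruptible (D):", states["D"])
```

A large D count during the hang would point at threads blocked on storage or another kernel resource rather than at CPU shortage, which fits the observation that storage-related commands hang at the same time.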