HubTransport Unhealthy - Total.Shadow.Queue.Length.Above.Threshold.Monitor - What to check?
Hi all,
Looking at Exchange 2013 server health, I see "HubTransport" in an unhealthy state because of this
item:
Total.Shadow.Queue.Length.Above.Threshold.Monitor
I cannot find any more information about this issue.
Many thanks for your help! :)
r.
Hi,
I found nothing in the public resources either.
Is there any error, warning, or informational event in the Event Viewer?
Please also check the detailed error message in the monitor, if possible.
Did this cause other issues, such as mail flow problems?
If everything is working well, I suggest disabling the alert.
Thanks
Mavis
Mavis Huang
TechNet Community Support
Similar Messages
-
Swap space used is above threshold of 90
Hi all,
I am facing an issue on a Linux box: swap space used is above the threshold of 90.
[oracle@gbaheovl21 scripts]$ free
             total       used       free     shared    buffers     cached
Mem:       8388608    8354400      34208          0      32748    4354904
-/+ buffers/cache:    3966748    4421860
Swap:      6257992    5816940     441052
scripts]$ sar -r
07:00:01 kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
07:10:01     15868   8372740     99.81     16348   4356644    479720   5778272     92.33    767516
07:20:01     93232   8295376     98.89     17624   4304864    433200   5824792     93.08    773240
07:30:01     29092   8359516     99.65     22664   4328688    442496   5815496     92.93    800512
07:40:01     39700   8348908     99.53     28584   4348424    441392   5816600     92.95    768592
Average:     57576   8331032     99.31     56029   4292652    349858   5908134     94.41    775742
Is there any fix for this issue without a reboot? Could anyone please help me with this?
Hi,
Sorry about that; I am pasting the info below.
Linux venkat.gb.com 2.6.18-238.el5xen #1 SMP Sun Dec 19 14:42:02 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
It is an Oracle virtual Linux machine, and a few databases are running on it.
[root@ ~]# sysctl vm.swappiness
vm.swappiness = 60
[root@~]# df /dev/shm
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 4194304 249220 3945084 6% /dev/shm
[root@ ~]# ipcs -um
------ Shared Memory Status --------
segments allocated 31
pages allocated 2099310
pages resident 1003699
pages swapped 881584
Swap performance: 0 attempts 0 successes
[root@ ~]# cat /proc/meminfo
MemTotal: 8388608 kB
MemFree: 29556 kB
Buffers: 39224 kB
Cached: 4458580 kB
SwapCached: 625964 kB
Active: 6728908 kB
Inactive: 548592 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 8388608 kB
LowFree: 29556 kB
SwapTotal: 6257992 kB
SwapFree: 512504 kB
Dirty: 3360 kB
Writeback: 0 kB
AnonPages: 2442472 kB
Mapped: 3943420 kB
Slab: 256204 kB
PageTables: 525464 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 10452296 kB
Committed_AS: 21157240 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 34920 kB
VmallocChunk: 34359703427 kB
thanks
Venkat -
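For anyone who wants to reproduce the "above threshold of 90" figure from the numbers posted, here is a minimal Python sketch. It assumes the alert is simply (SwapTotal - SwapFree) / SwapTotal; the function names are made up for illustration, and the sample values are copied from the /proc/meminfo output in this thread.

```python
def swap_used_percent(swap_total_kb: int, swap_free_kb: int) -> float:
    """Return the percentage of swap in use, given SwapTotal and SwapFree in kB."""
    if swap_total_kb <= 0:
        return 0.0  # no swap configured
    return 100.0 * (swap_total_kb - swap_free_kb) / swap_total_kb


def parse_meminfo(text: str) -> dict:
    """Parse 'Key: value kB' lines (the /proc/meminfo format) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


# Values copied from the /proc/meminfo output posted in this thread.
sample = "SwapTotal: 6257992 kB\nSwapFree: 512504 kB"
info = parse_meminfo(sample)
pct = swap_used_percent(info["SwapTotal"], info["SwapFree"])
print(f"swap used: {pct:.2f}%")
```

On a live box the same parser could be fed the contents of /proc/meminfo instead of the sample string.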
VM exhibiting 100% disk busy time, large disk queue lengths
Hi everyone,
We have a .VHD workload residing on a logical RAID1 mirror pair of 2 x 136 GB disks.
The .VHD file is 130 GB (with 70 GB of free space).
The virtual machine is running Windows 2008 R2 SP1 with 4 cores and 8 GB of RAM, and is exhibiting 100% disk busy time and disk queue lengths of anywhere between 14 and 44.
I assumed this was because there was virtually no free disk space on the logical drive. Ops Mgr 2012 R2 also reports high memory pages/sec.
So we backed up the .VHD workload, broke the RAID1 mirror, inserted 2 x 300 GB disks as a RAID1 mirror, and restored the .VHD / VM.
The logical disk now has 50% free disk space; however, the VM is still exhibiting 100% disk busy time and the above disk queue lengths.
It is running on an HP ProLiant server running Windows Server 2008 R2 SP1 Server Core with the Hyper-V role.
Any ideas most appreciated.
Hi,
A mirror array doesn't improve disk performance; it only provides disk redundancy. In my experience, applications that frequently operate on many small files can consume a lot of disk resources. If you are not sure whether the high disk I/O is caused by the guest VM or the host computer, first use Resource Monitor to identify which process is responsible for the high disk usage, then troubleshoot further.
A third-party example of using Resource Monitor:
How to use the Resource Monitor in Windows 7 & Windows 8
http://www.7tutorials.com/how-use-resource-monitor-windows-7
Hope this helps.
-
Exchange Log Shipping Replay queue length monitor
Hi guys,
Can anyone tell me what kind of monitor the Log Shipping Replay Queue Length monitor is?
Is it an average threshold monitor or a consecutive samples over threshold monitor?
Thanks
Hi,
This monitor is optimized for the CCR scenario and raises an alert if the number of transaction logs waiting to be committed is greater than 15 logs and has been waiting for more than 5 minutes. Therefore, it is a Consecutive Samples over Threshold monitor.
You can also find the answer in the Microsoft Exchange Server 2007 Management Pack Guide (page 72):
http://download.microsoft.com/download/1/E/D/1ED18BCA-B96D-4184-89DB-EDD9A77E5040/OM2007_MP_EX2007_SP1.doc
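To illustrate the difference between the two monitor types asked about, here is a rough Python sketch. It is not how Operations Manager implements them; the function names, the one-sample-per-minute interval, and the sample values are assumptions for illustration only. The threshold of 15 logs comes from the answer above.

```python
def consecutive_over_threshold(samples, threshold, required):
    """Return True if `required` samples in a row exceed `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run, or reset it when a sample is at or below the threshold.
        run = run + 1 if value > threshold else 0
        if run >= required:
            return True
    return False


def average_over_threshold(samples, threshold):
    """Return True if the mean of all samples exceeds `threshold`."""
    return bool(samples) and sum(samples) / len(samples) > threshold


# Hypothetical replay queue lengths, one sample per minute (assumed interval).
spiky = [3, 40, 2, 60, 1]
sustained = [16, 17, 18, 19, 20]

print(average_over_threshold(spiky, 15))             # the average is pulled up by spikes
print(consecutive_over_threshold(spiky, 15, 2))      # no two samples in a row are above 15
print(consecutive_over_threshold(sustained, 15, 5))  # five samples in a row are above 15
```

The point of the consecutive-samples type is that short spikes do not raise an alert, while a backlog that stays above the threshold for the whole window does.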
Niki Han
TechNet Community Support -
Does anyone have any advice/scripts for monitoring queue lengths?
I'd like to be able to monitor the lengths of the queues within my system, ideally
such that once queueing occurs an alert/message of sorts can be raised.
So far I have no continuously active monitoring of queue lengths; I rely
on the average queue length data provided by the pq command to identify whether queuing
is occurring.
I don't think relying on the average queue length reported by pq is the best
route to take. Sometimes it provides data that cannot be correct; I get the
impression that unless it has a reasonably constant flow of requests it isn't
very accurate.
I'm assuming what is actually required is some kind of MIB interrogation program.
Is there anyone who uses something like this to monitor queues?
I've discovered that the average queue length info provided by pq needs a little
data manipulation to be meaningful. For everyone's benefit, here's what needs to be
done:
The average queue length is the average number of messages in the queue (including
those being processed) minus one. I don't know the reason for the minus one,
but it is something to be aware of (particularly for MSSQ sets).
I subtract the number of servers serving the queue from the average queue length,
then add the one back on. This gives the average number of requests in the queue
that are actually waiting to be processed.
thanks
Jody
Just found it. Coherence->Cache->DistributedCacheForMessages->Attributes->Size
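The pq adjustment Jody describes (subtract the number of servers, then add the one back on) can be written down as a small Python sketch. The function name and the example numbers are hypothetical, and clamping the result at zero is my own assumption, not something stated in the post.

```python
def waiting_requests(reported_avg_queue_length: float, servers: int) -> float:
    """Estimate the average number of requests actually waiting in the queue."""
    # pq reports (messages in the queue, including those being processed) - 1,
    # so add the 1 back and subtract the requests the servers are handling.
    waiting = reported_avg_queue_length + 1 - servers
    # Assumption: a negative result just means nothing is waiting.
    return max(waiting, 0.0)


# Hypothetical example: pq reports 6.0 for a queue served by 3 servers.
print(waiting_requests(6.0, 3))
```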
-
Capturing/dumping active thread, queue length and throughput to a file
Hi,
I would like to capture active thread information, queue length, and throughput
information to a file for later analysis. This would be similar to setting the
verbosegc flag at the java startup to dump the heap space memory usage.
Is there a way of doing this?
Just like with heap info, you can use the weblogic.Admin GET functionality to query
WebLogic runtime information.
To see which bean types you need to query and what properties are available, you
can deploy these 2 JSPs on your WebLogic server:
http://dima.dhs.org/misc/listMBeans.jsp
http://dima.dhs.org/misc/showMBean.jsp
and point your browser to listMBeans.jsp - the rest is self-explanatory.
Mark Officer <[email protected]> wrote:
Hi,
I would like to capture active thread information, queue length, and throughput
information to a file for later analysis. This would be similar to setting the
verbosegc flag at the java startup to dump the heap space memory usage.
Is there a way of doing this?
--
Dimitri -
Our organization is having some slowness problems, particularly when most users are logging on and off, so in the mornings and around 3:30. I've been through everything (bandwidth etc.; we have 10G switches), but I've come across what I believe is the problem on the server where we redirect everyone's desktop, profile, etc. For that drive, Resource Monitor has a section for Disk Queue Length, which I've read should be 0-2. Ours averages 5-10 and spikes to 50 during these slow periods. All our servers are on VMware, on a SAN with SSD drives, so what can I do to resolve this? It's only on the drive that holds that data, so we've been considering creating another drive and splitting up the users' profile folders. Or do we need another separate server? How can I fix this problem? Is there a limit to the number of users that can be set up to access one server? Do I need to break that up across several servers?
Jason
Hi Jason0923,
A high disk queue length may have many causes, such as a high workload combined with a SAN I/O bottleneck. Generally, first confirm whether the SAN's disk write cache is enabled.
As a further clue, you can refer to the following articles to determine whether there is an I/O bottleneck on your SAN.
Monitoring Queue Length
https://technet.microsoft.com/en-us/library/cc938625.aspx?f=255&MSPPError=-2147217396
Windows Performance Monitor Disk Counters Explained
http://blogs.technet.com/b/askcore/archive/2012/03/16/windows-performance-monitor-disk-counters-explained.aspx
I’m glad to be of help to you!
-
WiSM Issue - More than three APs working at the power above threshold
According to the RRM document: http://www.cisco.com/en/US/tech/tk722/tk809/technologies_tech_note09186a008072c759.shtml , for each AP, the 3rd loudest AP should be heard at a power equal to or lower than the threshold (default -65 dBm or -70 dBm, depending on the code version). But in our environment, I can see that this is not true. Many APs have more than three neighbours heard above the threshold. Is it the same in your environment?
Thanks!
Zhenning
It's the way I have deployed the network. When the number of access points increases (even on different channels), the chance of interference increases, leading to decreased performance. So adjust the power settings and the number of access points accordingly.
-
Need help on explanation of Avg. Disk Queue Length
Based on perfmon, my Avg. Disk Queue Length on the physical disk hit 100%.
What does that mean? I really need an explanation.
I'm a bit confused by your statement. I'm not sure where the 100% is coming from.
Avg. Disk Queue Length is the average number of both read and write requests that were queued for the selected disk during the sample interval.
Current Disk Queue Length is the number of requests outstanding on the disk at the time the performance data is collected. It also includes requests in service at the time of the collection. This is an instantaneous snapshot, not an average over the time interval. Multi-spindle disk devices can have multiple requests active at one time while other concurrent requests await service. This counter might reflect a transitory high or low queue length, but if there is a sustained load on the disk drive, it is likely to be consistently high. Requests experience delays proportional to the length of this queue minus the number of spindles on the disks. For good performance, this difference should average less than two.
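As a rough illustration of the "queue length minus spindles should average less than two" rule of thumb, here is a small Python sketch. The function names are made up, and the sample values are hypothetical readings rather than anything collected from perfmon.

```python
def queue_excess(current_queue_length: float, spindles: int) -> float:
    """Queue length beyond what the spindles can service at once."""
    return current_queue_length - spindles


def within_guideline(samples, spindles: int, limit: float = 2.0) -> bool:
    """True if the average of (queue length - spindles) is below `limit`."""
    if not samples:
        raise ValueError("need at least one sample")
    excess = [queue_excess(s, spindles) for s in samples]
    return sum(excess) / len(excess) < limit


# Hypothetical Current Disk Queue Length readings for a 2-spindle volume.
print(within_guideline([1, 3, 2], spindles=2))     # small queue
print(within_guideline([14, 44, 30], spindles=2))  # sustained long queue
```

Note that on a SAN the real number of spindles behind a LUN is often unknown, which is one reason the latency counters are usually a better starting point.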
This whole topic can get very confusing.
Think of Current Disk Queue Length as in flight operations.
These are disk reads or writes that have passed through the Performance Filter Driver and are on their way to the physical disk and back. While in flight, a disk operation must pass through (assuming a SAN) your class drivers, multipath drivers, HBA card,
the network fabric, switches, and into the SAN, any of which could introduce a bottleneck.
Then the acknowledgment of completion must return.
Think of Avg. Disk Queue Length as disk activities waiting to jump onto the flight.
So if you have a nonzero Avg. Disk Queue Length, think of it as cars backing up on the on-ramp to get onto the highway.
Typically I start disk analysis by looking at:
Logical Disk\Avg. Disk sec/Read
Logical Disk\Avg. Disk sec/Write
The Queue Length counters are secondary and only used if the latency counters are out of spec.
Here are some good blogs and tools to use to follow up.
Taking Your Server's Pulse
http://technet.microsoft.com/en-us/magazine/2008.08.pulse.aspx?pr=blog
Performance Analysis of Logs (PAL) Tool
http://pal.codeplex.com/
The Case of the Mysterious Black Box
http://blogs.technet.com/b/clinth/archive/2009/11/18/the-case-of-the-mysterious-black-box-san-analysis-for-beginners.aspx
Bruce Adamczak -
Copy Queue Length - All of a sudden one server having communication issues
We have 4 servers in a DAG (3 at site A and 1 at site B).
Of the three servers at site A, two always show a copy queue length of 0. Recently one of the servers started to show a backlog, and we are seeing the following in the event viewer. We see this error when the problem server connects to either
of the other two servers in the same physical site.
The log copier was unable to communicate with server 'ABC1'. The copy of database 'DB2\ABC1' is in a disconnected state. The communication error was: An error occurred while communicating with server
'ABC1'. Error: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. The copier will automatically retry after a short delay.
At night the queue goes back to 0, and then it starts all over again. Currently the problem server holds only passive copies; we moved the active copies off just in case.
I have tried using the MAPI network to replicate (different physical NICs and switches), but that was even worse. I also tried deactivating the primary NIC in the team and using the secondary, which is connected to a different core switch.
Any ideas?
Hi,
Based on your post, I understand that one DAG member, which used to show a copy queue length of 0, now shows a backlog with the error "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. The copier will automatically retry after a short delay".
If I have misunderstood your concern, please do not hesitate to let me know.
Please run the commands below to check and adjust the TCP settings that can affect connectivity between the servers:
1. netsh int tcp show global
2. netsh int tcp set global autotuninglevel=disabled
3. netsh int tcp set global chimney=disabled
4. netsh int tcp set global rss=disabled
Meanwhile, follow the steps below:
1. Use the Get-DatabaseAvailabilityGroupNetwork cmdlet to check that the DAG networks are OK.
2. Run the Update-MailboxDatabaseCopy -Identity xx cmdlet to reseed the database copy.
3. Restart the Microsoft Exchange Replication service.
4. Ensure that port 64327 is open.
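If it helps with the last step, a quick way to see whether the replication port answers from another DAG member is a plain TCP connect test. Below is a minimal Python sketch using only the standard library; the function name is made up, and a successful connect only shows that something is listening on the port, not that replication itself is healthy.

```python
import socket


def port_is_open(host: str, port: int = 64327, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, calling port_is_open("ABC1") from the problem server would test the default port against the server named in the error message; a False result points toward a firewall, routing, or name resolution problem.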
Thanks
Allen Wang
TechNet Community Support -
"CPU#0 Tempature above threshold"
I am running Arch Linux with the ArchLinux supplied 2.6.2 kernel. When I try to compile a new kernel, I get the message repeatedly on every terminal, and CPU usage shoots up to 100% (according to IceWM).
Here is what the kernel tells me:
CPU#0: Tempature above threshold
CPU#0: Running in modulated clock mode
I see those messages repeatedly (like 3 times a second), and every once in a while I get compiler messages about what is being compiled. It seems to be compiling fine, but those messages are on every terminal, making it impossible to do anything! What's the problem? I need help!
I cleaned out the computer (it was pretty hot), and it still displayed the message, but it wasn't as bad. Maybe they should cut down on those messages and say when they start and stop, but not every blinking second!
-
DB, Replay Queue length is growing
This is Exchange 2013, just starting up after a migration from 2010 to 2013.
The replay queue length on specific passive DB copies, which are healthy, grows rapidly during business
hours; however, the copy queue length is OK.
It does not decrease at all during business hours. I'm searching for the cause:
mailbox server performance, disk I/O, or network... I need help.
Even at night, the specified DBs on one server seem to have a long replay queue.
Hi tanale,
It seems the log files are copied to the passive copies of the mailbox databases, but they are not replayed into the passive databases.
Please check whether the "Don't mount this database at startup" check box is selected on the database. If it is, please
clear it.
Regards
Chinthaka Shameera | MCITP: EA | MCSE: M |
http://howtoexchange.wordpress.com/ -
Another question about throughput and Queue Length
I'm not sure exactly what throughput and Queue Length mean in
the performance console of WebLogic Server.
Can anyone give me an explanation? Thanks a lot!
If you look at the ExecuteQueueRuntime MBean
(you can use these 2 JSPs:
http://dima.dhs.org/misc/listMBeans.jsp
http://dima.dhs.org/misc/showMBean.jsp
or Sun's HTMLAdaptor: http://dima.dhs.org/misc/StartHtmlAdaptor.jsp
to browse WLS MBeans), you will find these attributes:
PendingRequestCurrentCount
Returns the number of waiting requests in the queue.
ServicedRequestTotalCount
Returns the number of requests which have been processed by this queue.
ExecuteThreadCurrentIdleCount
Returns the number of idle threads assigned to the queue
PendingRequestOldestTime
Returns the time that the longest waiting request was placed in the queue.
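One plausible way to relate these attributes to the console's graphs, offered as an assumption rather than a statement of how the console computes them: queue length corresponds to PendingRequestCurrentCount, and throughput is the rate at which ServicedRequestTotalCount grows. A minimal Python sketch of that calculation, with a hypothetical function name and made-up sample values:

```python
def throughput_per_second(count_before: int, count_after: int,
                          interval_seconds: float) -> float:
    """Requests serviced per second between two ServicedRequestTotalCount readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (count_after - count_before) / interval_seconds


# Hypothetical readings taken 60 seconds apart.
print(throughput_per_second(12000, 12600, 60.0))
```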
Eric Nie <[email protected]> wrote:
I'm not sure exactly what throughput and Queue Length mean in
the performance console of WebLogic Server.
Can anyone give me an explanation? Thanks a lot!
--
Dimitri -
CPUx: Temperature above threshold, cpu clock throttled
My cousin's (Windows) computer died: it stopped
booting without any apparent reason.
He asked for my help, and we saved all the important data to an
external drive using an Arch Linux installation on a USB key
(thanks to every wiki writer).
I thought it could be a hardware problem, so I started
stress with a command line like:
stress --cpu 2 --io 1 --vm 1 --hdd 1
(pwd was in a mounted Windows partition)
The program has been running for an hour already. I got the error
in the title a few times:
CPUx: Temperature above threshold, cpu clock throttled
for both CPUs. I thought it was normal; after all,
the CPU might overheat under stress, and it slows down
to protect itself...
But suddenly I had a doubt: is it really the CPU that
slows itself down (so it is a hardware feature), or is it
Linux that slows it down (so it is a kernel feature)? In
the second case, does anyone know if Windows does anything
similar?
Secondly, in the error log I can see:
EDID checksum is invalid, remainder is 210
Raw EDID:
<hex dump omitted>
I searched the Internet, but I could not understand what
it means or whether it is dangerous... can anyone help me?
Thanks
The BIOS is usually responsible for setting thresholds and has settings to halt the computer when temperatures exceed those thresholds.
Since the program is a stress test, I'm guessing the software is throttling the CPU so the system does not shut off and the test can keep running.
I would highly suggest checking your BIOS's temperature thresholds, as well as the fan speed settings (BIOS or software).
I would also find out what a normal temperature for the CPU is and do some testing on what different loads do to the temperature. -
Monitor BizTalk Host Queue length and suspended msgs w/SCOM
First, I hope the BizTalk forum is the right place to ask this. Maybe I should try the SCOM forum as well.
I'm trying to create two monitors in SCOM (not rules, as we want the alert to return to healthy automatically when the value drops below the threshold again, and we want to see the health state as well), based on the performance counters for BizTalk MsgBox host queue length and suspended
messages. My question is: what should I use as the target (class) in SCOM? And can I use "All instances" of the counter, or must I create a monitor for each instance (which is a lot of work and not very dynamic)? We want to monitor all the instances/hosts with
different thresholds, so the first thing I did was target the "BizTalk Host" class, so that I can apply overrides to different hosts.
The problem with this is that it generates an alert for all hosts if one instance is over the threshold. I also tried targeting the "Run-time role", and this actually works better, but not perfectly, as I cannot set a threshold for just one instance/host,
and it closes the alert if any other instance is under the threshold.
Does anyone have experience with SCOM and monitoring host queues and/or suspended messages as monitors?
Thank you in advance for all suggestions!
I would suggest looking into the spool table and its size. As per the recommendation, its row count should not be greater than 3000 per server.
An easy way to monitor it is the performance counter "Message Box:General Counters/Spool Size".
You can also run one of the following SQL queries manually in the BizTalk MessageBox database to find the count:
SELECT count(*) from SPOOL WITH (NOLOCK)
SELECT top 1 rows FROM sys.partitions WHERE object_id = object_id('spool')
Note: The NOLOCK keyword is important in the first query; you don't want to put any locks on the spool table while measuring the row count. The second query is the one used by the "Spool Size" performance counter, via the stored procedure
"MsgBoxPerfCounters_GetSpoolSize".
Reference: http://msdn.microsoft.com/en-us/library/aa561922.aspx
Thanks
Abhishek
Maybe you are looking for
-
How to read a CSV file into the portal
Hi all, I want to read the content of a CSV file that is available on my desktop. Please help me with suitable code. Regards, Savitha Edited by: Savitha S R on Jun 1, 2009 8:25 AM
-
Bookmarks and Hyperlinks not Working in PDF's
Hi, I have 2 PDF files. One has extensive bookmarks which do not work. The other PDF has anchor links that refer you to certain areas in the document; some work and some do not. How can I fix this in Acrobat Pro 11?
-
Product creation in CRM system
hi Guys, I created Attributes, Set Types, Category and hierarchy. Later I when to Product maintenance and created a product under the new Category I created. When I tried to save the product, it says 'saving is not possible'. The error message is "Da
-
Restrict loading in info package by REQUID
Hello, I want to load data from cube in the Bw to another ODS I do not want to load the all data but to refer to the last reuest which been loaded to the cube. In Infopackage I have the possibility to restirct data by REQUID but when I choose type -
-
REM tick not getting deleted for Part no
Dear All , we have one material which had Procurement type " E " ( In-house SFG ) , But now the Part is totally Procured from vendor . So the Procurement type is changed to " F " . So we need to do the Material Master Correction . If we Try to dele