Custom Logical Disk monitor incorrectly flapping between healthy and unhealthy

One of the client Ops Mgr 2012 SP1 UR8 environments I support has some custom logical disk monitoring set up. There are 5 groups dynamically populated with logical drives according to their size (the 1st group holds small drives, up to the last group with very large drives). There is a 'Warning' and a 'Critical' Monitor per server OS version, and the Monitors are not enabled by default. Overrides applied to each group enable the Monitor and apply a threshold, with a different threshold for each group.
During some BAU tuning I could see that some of the above Monitors were appearing as Top-Talking alerts. Further investigation showed that the alerts were being triggered by drives that momentarily dropped below the applied threshold. I re-created the Monitors, changing them from 'Simple Threshold' to 'Consecutive Samples', and set the 'Number of Samples' to 6 at 3-minute intervals.
What I am seeing is that alerts from these Monitors are still appearing as Top Talkers. When I check Health Explorer for the repeating alerts, I can see the free disk space staying the same, below the applied threshold, but the health turns healthy and then back to unhealthy. I have confirmed each noisy Object has the expected threshold as per its dynamic group allocation and have also confirmed the drives are not fluctuating above and below the threshold. One thing I have noticed is that the Performance View for some drives is patchy, with lots of dotted lines between the coloured lines.
It's almost as if the Monitor moves a Logical Disk Object into an unhealthy state in the correct (and expected) manner, then somehow picks up an incorrect threshold that is below the current free space level. This moves it into a healthy state, only for the whole process to repeat. For example: Drive X: on a server is very large, the Group that it sits in has a threshold of 102400MB, and its current free space is roughly stable at 45500MB. Looking in Health Explorer I can see:
3:01pm | green state | last sampled value 45573 | 1 sample
3:16pm | yellow state | last sampled value 45573 | 6 samples
3:34pm | green state | last sampled value 45572 | 1 sample
3:49pm | yellow state | last sampled value 45571 | 6 samples
4:01pm | green state | last sampled value 45425 | 1 sample
and so on.
I'm scratching my head on this one and would appreciate any suggestions or assistance.
Thanks
BT

Thanks for the reply. This is not happening on just one server or drive; I am seeing it on everything. Once drives go into an unhealthy state they periodically go healthy and back again with no change in free disk space. To elaborate on how it is set up: a Monitor has been created for each OS version (2003, 2008 and 2012), with separate Monitors for Warning and Critical, so 6 Monitors in total. The Warning Monitors are created with a threshold of 5120MB for 6 samples and set to disabled.
The following groups have been created and the following thresholds added:
Group 1 (less than 60GB size): override added to enable. This group will then pick up the 5120MB threshold.
Group 2 (60 – 250GB size): override added to enable and override added for 10240MB threshold
Group 3 (250 – 500GB size): override added to enable and override added for 20480MB threshold
Group 4 (500GB – 1TB size): override added to enable and override added for 51200MB threshold
Group 5 (>1TB size): override added to enable and override added for 102400MB threshold
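As a cross-check of which threshold a drive should be getting, the group list above can be written as a small lookup. This is only a sketch, not something from the SCOM environment: the function names are made up for illustration, 1TB is assumed to be 1024GB, and a drive sitting exactly on a boundary is assumed to belong to the larger group (the list does not say which).

```python
# Warning thresholds per size group, as listed in the post.
# Assumptions: 1 TB = 1024 GB; a boundary size falls into the larger group.
GROUPS = [
    # (group name, exclusive upper size bound in GB, warning threshold in MB)
    ("Group 1", 60, 5120),
    ("Group 2", 250, 10240),
    ("Group 3", 500, 20480),
    ("Group 4", 1024, 51200),
]
LAST_GROUP = ("Group 5", 102400)


def expected_warning_threshold(size_gb):
    """Return (group name, warning threshold in MB) for a drive of this size."""
    if size_gb <= 0:
        raise ValueError("drive size must be positive")
    for name, upper_gb, threshold_mb in GROUPS:
        if size_gb < upper_gb:
            return name, threshold_mb
    return LAST_GROUP


def should_be_unhealthy(size_gb, free_mb):
    """True when free space is below the threshold of the drive's group."""
    _, threshold_mb = expected_warning_threshold(size_gb)
    return free_mb < threshold_mb


if __name__ == "__main__":
    # A 100GB drive lands in Group 2; 8500MB free is below its 10240MB threshold.
    print(expected_warning_threshold(100), should_be_unhealthy(100, 8500))
```

Note that a drive with 8500MB free would be healthy against the 5120MB default but unhealthy against the 10240MB Group 2 override, which is why the effective threshold matters here.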
One drive I was looking at was in Group 2 (threshold of 10240MB). It was staying at approx. 8500MB but periodically going into a healthy state, then back to unhealthy after 10 minutes (6 polls at 2-minute intervals). This process repeats once or twice per day.
I am wondering if the Object is somehow picking up the Monitor's own threshold (5120MB) and then going back to its correct overridden threshold. I have set up some test groups and monitors in a lab and will review the results over the coming days.
When the monitors were set up as 'Simple Threshold' this worked fine, but they were noisy due to drives spiking downwards. The issue only started occurring when I re-wrote them as 'Consecutive Samples over Threshold' Monitors.
Thanks
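
The Health Explorer entries quoted for drive X: can be checked against the monitor settings with a few lines of Python. This is a sketch for reasoning about the timeline only; the helper names are invented, and it assumes the state flips on the 6th consecutive sample, so a counter that restarts at a green entry needs (6 - 1) x 3 = 15 minutes to reach yellow again.

```python
# Health Explorer entries quoted in the post: (time, state, free MB, samples).
TIMELINE = [
    ("3:01pm", "green", 45573, 1),
    ("3:16pm", "yellow", 45573, 6),
    ("3:34pm", "green", 45572, 1),
    ("3:49pm", "yellow", 45571, 6),
    ("4:01pm", "green", 45425, 1),
]
THRESHOLD_MB = 102400  # override quoted for the drive's group
NUM_SAMPLES = 6        # 'Number of Samples' from the post
INTERVAL_MIN = 3       # sample interval from the post


def to_minutes(clock):
    """Convert a '3:01pm' style string to minutes since midnight."""
    suffix = clock[-2:].lower()
    hours, minutes = clock[:-2].split(":")
    hours = int(hours) % 12 + (12 if suffix == "pm" else 0)
    return hours * 60 + int(minutes)


def gaps_before(timeline, state):
    """Minutes elapsed before each transition INTO the given state."""
    gaps = []
    for previous, current in zip(timeline, timeline[1:]):
        if current[1] == state:
            gaps.append(to_minutes(current[0]) - to_minutes(previous[0]))
    return gaps


def unjustified_greens(timeline, threshold_mb):
    """Green entries whose sampled value is still below the threshold."""
    return [e for e in timeline if e[1] == "green" and e[2] < threshold_mb]


# Assumption: the state changes on the NUM_SAMPLES-th consecutive sample.
expected_gap = (NUM_SAMPLES - 1) * INTERVAL_MIN

if __name__ == "__main__":
    print("green -> yellow gaps (min):", gaps_before(TIMELINE, "yellow"))
    print("expected gap if the counter restarts (min):", expected_gap)
    print("greens below threshold:", len(unjustified_greens(TIMELINE, THRESHOLD_MB)))
```

Every green entry has a value far below 102400MB and a sample count of 1, and each return to yellow takes exactly the time six fresh samples would need. That pattern looks more like the sample counter being restarted than like a different threshold being applied, although the data quoted here cannot confirm the cause.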

Similar Messages

  • Windows Server 2008 R2 Logical Disk monitor is not listed in SCOM 2012

    Dear Experts,
    I tried to create an override to change the Logical Disk space threshold on one Windows 2008 R2 server, but I am unable to find the 'Windows Server 2008 R2 Logical Disk' monitor in the scope. I can see that the Windows Server 2008 Logical Disk monitor is there, but when I right-click it and create an override for a specific Windows Server 2008 computer, the Windows 2008 R2 computer is not displayed. Is the Windows Server 2008 R2 Logical Disk monitor missing?
    Can you please help me to achieve this?
    Saravana Raja

    Hi,
    Same issue/question in this thread:
    https://social.technet.microsoft.com/Forums/en-US/15f634e2-57d7-4f57-b579-61e5ee6a01a2/monitor-hard-drive-space-windows-2008-r2?forum=operationsmanagergeneral

  • Target drives for specific groups - Logical Disk Monitoring

    I want to target different drives and thresholds for logical disk monitoring. I want to be able to place the server in a group and monitor the drives accordingly. Can this be done? How would you go about doing it?

    A couple of ways you can go about this:
    1. Place the servers into a group and apply the necessary overrides to the logical disk monitors.
    2. If you don't want the same threshold applied to all the drives on the same server, create a new group, add the specific drives to it, and then apply the necessary overrides for the logical disk monitors to that group.
    3. If you don't want the same threshold applied to all the drives, you could also create a dynamic group (e.g. all D: drives on the servers) and then apply the overrides to that group.
    For an example of this, see: https://social.technet.microsoft.com/Forums/systemcenter/en-US/c2719fb1-9298-435a-8bf9-3c92d4b34f85/making-groups-of-logical-disk?forum=operationsmanagergeneral
    Cheers,
    Martin
    Blog:
    http://sustaslog.wordpress.com 

  • Logical Disk monitoring - Overrides galore?

    Hi all!
    We have a number of systems with differing needs for storage thresholds.  The defaults work the majority of time, but there are a number of 'buckets' our disks need.  For example, some disks might warrant an error at 5GB free, 10GB free, or even
    25GB free.  Let's pretend the percentages aren't important for now.
    Just to be clear, is it correct that I would need to create 2 (system / non-system) * 3 (2003, 2008, 2012) = 6 overrides for every single bucket I want (and make no mistake, we need way more than 3)?
    This seems absurd.  I can handle dumping the appropriate logical disks into buckets based on PowerShell or some other programmatic method.  Creating
    overrides seems a bit more complicated.  Is there not a simpler way to do this?
    I'm comfortable using PowerShell, but the 600 line example above seems like a pretty steep requirement for a very basic and common task (that is cumbersome and delay ridden when using the SCOM console).
    Please tell me there is an easier way to create overrides for logical disk space monitoring!

    I run into the same questions and problems with most customers. What I always suggest is to give the power to the server and/or application owners. This can be accomplished a few different ways, and I have done this by using a registry entry or by using
    a file on the root of each disk.
    For example, everyone gets default thresholds for disk monitoring. If the server owner wants different thresholds for a disk on their server(s), then they would create a csv file on the root of each disk in which they want different thresholds (or create
    a registry entry with similar threshold configuration).
    You then modify the built-in disk monitor to check whether this particular file is present and, if it is, read the file to get the custom thresholds and bypass the default thresholds for that particular disk. The built-in disk monitoring script is pretty
    long, but if you know a little VBScript then you should be able to figure out where to add more logic to retrieve custom thresholds based on csv file or registry information.
    You will need to put this monitor in a new pack (for example, Windows 2008 Operating System Extended), and disable the unit monitor in the vendor pack. This would end up being 2 new "extended" packs if you were to also do this for Windows Server
    2012.
    This will effectively remove the responsibility from the SCOM admin to manage hundreds (or more) disk monitoring thresholds, and place the responsibility into the hands of the server owner. This has worked quite well in every environment I've worked in.
    Jonathan Almquist | SCOMskills, LLC (http://scomskills.com)
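
    The lookup logic described above (use the owner's file if it exists, otherwise fall back to the defaults) can be sketched as follows. This is not the built-in VBScript and not code from the post: the file name `disk_thresholds.csv`, the column names `warning_mb` and `critical_mb`, and the default values are all assumptions made up for the illustration, and Python is used only to show the flow.

```python
import csv
import io
import os

# Placeholder defaults and a hypothetical override file name (both assumptions).
DEFAULTS = {"warning_mb": 2000, "critical_mb": 1000}
OVERRIDE_FILE = "disk_thresholds.csv"


def parse_thresholds(csv_text, defaults=DEFAULTS):
    """Read the first data row of the CSV; keep the default for anything invalid."""
    thresholds = dict(defaults)
    row = next(csv.DictReader(io.StringIO(csv_text)), None)
    if row is None:
        return thresholds
    for key in defaults:
        value = (row.get(key) or "").strip()
        if value.isdigit():
            thresholds[key] = int(value)
    return thresholds


def load_thresholds(disk_root, defaults=DEFAULTS):
    """Return custom thresholds if the owner placed a file on the disk root."""
    path = os.path.join(disk_root, OVERRIDE_FILE)
    if not os.path.isfile(path):
        return dict(defaults)
    with open(path, newline="") as handle:
        return parse_thresholds(handle.read(), defaults)


if __name__ == "__main__":
    print(parse_thresholds("warning_mb,critical_mb\n10240,5120\n"))
    print(load_thresholds("/path/that/does/not/exist"))
```

    Falling back per value rather than per file means a typo in one column does not silently disable monitoring for the disk.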

  • False Alerts from Cluster Server Logical Disk Monitor

    Hi All,
    I have a server cluster, and one of its servers has 10 cluster disks.
    Out of nowhere, for the last 5 hours I have been getting critical alerts for Cluster Disk - Free Space Monitor (MB) on every poll, even though the free space is well above the threshold defined in this monitor.
    A few minutes later, when this critical alert is resolved, I get a warning alert from Cluster Disk - Free Space Monitor (%). In this case too, the free space is well above the threshold.
    Also, the issue affects just this one disk; all other disks are perfectly fine. Kindly help.

    I'm not sure which version of the MP you have, but the following link may help; it covers a bug fix for this issue:
    http://technet.microsoft.com/en-us/library/dd262079.aspx

  • External monitor - any difference between 2ms and 5ms?

    Are there any significant differences?
    Would I only notice on super fast graphics?
    Should I be fussed in choosing between monitors that have either 2ms or 5ms?
    Thanks
    Omar

    Between 5ms and 2ms is not a lot of difference.
    Once you get that low on response times narrow your field by looking for a monitor with little input lag.
    Input lag won't affect image quality, but it can add a bit of latency to your games.

  • Flapping between CSS and router

    We set up the CSS as below:
    e0 : connected to SW (Cat4006)
    e2 : connected to WEB Svr #1 (w/ IP 172.31.6.41)
    e3 : connected to WEB Svr #2 (w/ IP 172.31.6.42)
    load-balancing WEB#1 and WEB#2 (w/ Virtual IP 172.31.6.43)
    VLAN 1 : IP 172.31.6.5 (mng access IP)
    We also linked a VPN device to the same SW (172.31.6.0 network).
    The VPN device was rebooted, and since then we can see flapping events in the log.
    Help me!!

    What type of CSS?
    What software version?
    What is flapping exactly?
    How often?
    What is the interface config of the CSS and the SW?
    Gilles.

  • Difference between HRHAP00_TMPL_608 and HRHAP00_TMPL_GETLIST badis in OSA

    Hello,
    1. Can anyone tell me the difference between the following two BADIs in Objective settings & Appraisals
    HRHAP00_TMPL_608
    HRHAP00_TMPL_GETLIST
    2. Actually we have created a custom implementation of HRHAP00_TMPL_GETLIST. It is working fine in restricting the templates in portal.
    But when I try PHAP_ADMIN_PA / PHAP_CHANGE_PA in the GUI, it throws the error "This template XXXX is not assigned to the add-on application". Any reason?
    Thank You & Regards
    Raghu Kolukuluri

    Hi
    About the 2nd point, maybe you should look at the tables VC_T77HAP_CATEGORY and VC_T77HAP_CAT_GROUP from tcode SM34.
    I am not sure about the cause of this problem, but my guess is that it may be arising due to the incorrect linking between categories and category groups.
    Hope this helps.
    Regards,
    Vikas Bhatia

  • Screen vibrates when booting up. It fluctuates very fast between Safari and Microsoft Messenger. Back and forth .... flicking away .... back and forth, very, very, quickly. And then when its completely finished booting, everything is OK.

    I sign in and when Safari opens, whilst Microsoft Messenger is downloading, the monitor screen vibrates between Safari and Messenger at a very, very, fast pace. Back and forth, back and forth, alternating between the two. When the booting process is completed the computer is fine. Everything is OK. Anyone know what is causing this, and what to do about it?

    YES .... but they have always been set to open at login. I must admit that if Messenger isn't set to open at login this doesn't happen. But even though Safari opens at login, the opening page in Safari doesn't open; I have to click on Safari again, (when I am reminded after the computer has booted-up), because the monitor screen is just blank! Then after a click on the Safari icon the screen fills up in a second. Gees, what have I unintentionally done?

  • Difference between KE24 and VA05

    Hi,
    I found that the customer-wise information does not match between KE24 and VA05. In fact the totals seem to match, while the individual customers seem to be getting consolidated into a few customers.
    This has started happening only recently.
    Can anyone tell me why this is happening?
    Regards,
    Arun

    Periodic billing is used when you want to bill the customer at different points of time based on the periodicity or progress of the work. Milestone billing can be used for this type of billing purposes when you want to have billing control from the project.
    Resource billing is based on the resource consumption for the particular activities. Dynamic Item Processor (DIP) profile is used for the resource related billing.
    Further information can be had from the link
    [http://help.sap.com/saphelp_47x200/helpdata/en/aa/96853478616434e10000009b38f83b/frameset.htm]
    Hope this is of some help
    Venu

  • Can't switch between HDMI and VGA displays

    I have a new H430, with HDMI and VGA ports. My HP zr2240w monitor permits switching between VGA and HDMI displays, which I would like to do. But when I plug in both cables (HDMI and VGA) only VGA will work. If I switch to HDMI or make it the default display, the screen freezes at the desktop background: no mouse, no taskbar etc. Is it possible that the H430 video card will not support switching between displays (on a single monitor)?
    thanks,
    saintmaur

    Dear Monika and Mylenium: I figured out part of the problem, but not the solution. The shape I had drawn (and couldn't switch between fill and stroke on) is one I had already done a gradient mesh on, too. I had later abandoned the gradient mesh by using the option key to delete mesh lines, but that didn't take the shape all the way back to its original flat state. So it wasn't a plain drawn shape any more. So how do I release the gradient mesh to get back to a flat shape? I'm also having trouble keeping the shapes in RGB or CMYK mode; they keep defaulting back to greyscale. Could that also be a symptom of using gradient mesh? Note: I'm using gradient mesh for the very first time now; I've never used it before. It seems to leave a lot of unexpected baggage behind. I would send a screen shot, but if I start up Grab or Voila, all the AI palettes go away, and that's half the information. So how do I control gradient mesh?

  • Logical Disk Free Space Monitor - Slow to detect low free space

    We are using the built-in two-trigger (MB and %) logical disk free space monitor in SCOM 2012 R2. We have set up overrides for MB warning and critical for both system and non-system drives, and for a group containing disks we do not want monitored. The monitor
    actually works fine, triggering an alert when both the MB and % free criteria are met. The problem is that it takes almost an hour for the initial alert to fire. After the initial alert, if I further fill the disk to push it from warning to critical, the alert
    changes within the specified interval, which we have left at 15 minutes. The alert also clears using the 15 minute interval.
    Has anyone else seen this behavior with this monitor? A disk monitor that takes an hour to fire is not going to be very useful.

    I wanted to see for myself if there was anything else that I might be missing, so I opened up the Windows 2008 Logical Disk Free Space monitor XML and noticed that there is a NumSamples configuration that is set to 4. So, if the interval is 15 minutes, the
    disk would have to exceed both threshold types for 4 consecutive samples in order to change state and generate an alert. With the default 15 minute interval that works out to roughly an hour (45 to 60 minutes, depending on where in the polling cycle the disk fills up) before an alert is raised.
    Unfortunately, NumSamples is not overridable in the monitor type, which is too bad... The only way to get an alert sooner than that is to override the interval. For example, if you want an alert within 20 minutes, override the interval to 300 seconds (5 minutes).
    Here is the code - see for yourself:
    <UnitMonitor ID="Microsoft.Windows.Server.2008.LogicalDisk.FreeSpace" Accessibility="Public" Enabled="true" Target="Server2008!Microsoft.Windows.Server.2008.LogicalDisk" ParentMonitorID="SystemHealth!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="Microsoft.Windows.Server.2008.FreeSpace.Monitortype" ConfirmDelivery="true">
      <Category>Custom</Category>
      <AlertSettings AlertMessage="Microsoft.Windows.Server.2008.LogicalDisk.FreeSpace.AlertMessage">
        <AlertOnState>Warning</AlertOnState>
        <AutoResolve>true</AutoResolve>
        <AlertPriority>Normal</AlertPriority>
        <AlertSeverity>MatchMonitorHealth</AlertSeverity>
        <AlertParameters>
          <AlertParameter1>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</AlertParameter1>
          <AlertParameter2>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</AlertParameter2>
        </AlertParameters>
      </AlertSettings>
      <OperationalStates>
        <OperationalState ID="UnderWarningThresholds" MonitorTypeStateID="UnderWarningThresholds" HealthState="Success" />
        <OperationalState ID="OverWarningUnderErrorThresholds" MonitorTypeStateID="OverWarningUnderErrorThresholds" HealthState="Warning" />
        <OperationalState ID="OverErrorThresholds" MonitorTypeStateID="OverErrorThresholds" HealthState="Error" />
      </OperationalStates>
      <Configuration>
        <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
        <DiskLabel>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</DiskLabel>
        <IntervalSeconds>900</IntervalSeconds>
        <SystemDriveWarningMBytesThreshold>500</SystemDriveWarningMBytesThreshold>
        <SystemDriveWarningPercentThreshold>10</SystemDriveWarningPercentThreshold>
        <SystemDriveErrorMBytesThreshold>300</SystemDriveErrorMBytesThreshold>
        <SystemDriveErrorPercentThreshold>5</SystemDriveErrorPercentThreshold>
        <NonSystemDriveWarningMBytesThreshold>2000</NonSystemDriveWarningMBytesThreshold>
        <NonSystemDriveWarningPercentThreshold>10</NonSystemDriveWarningPercentThreshold>
        <NonSystemDriveErrorMBytesThreshold>1000</NonSystemDriveErrorMBytesThreshold>
        <NonSystemDriveErrorPercentThreshold>5</NonSystemDriveErrorPercentThreshold>
        <NumSamples>4</NumSamples>
      </Configuration>
    </UnitMonitor>
    This proves 2 things:
    1. Your testing proved that the monitor is working as designed - you got an alert in about an hour
    2. This is a bad design at best, or a bug if you wish, as NumSamples should not be a hidden configuration - it should be exposed in override parameters in the console.
    This should be fixed by Microsoft.
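
    The relationship between NumSamples, IntervalSeconds and the time to first alert can be worked through in a few lines. This is a back-of-the-envelope sketch, not the monitor's actual implementation: the function names are invented, and it assumes the state changes on the NumSamples-th consecutive bad sample, so the delay depends on how soon after the disk fills the first sample is taken.

```python
def alert_delay_minutes(num_samples, interval_seconds):
    """Return (best, worst) minutes from crossing the threshold to the state change.

    Best case: the disk fills just before a sample is taken.
    Worst case: the disk fills just after one, so a full interval passes first.
    Assumes the state changes on the num_samples-th consecutive bad sample.
    """
    best = (num_samples - 1) * interval_seconds / 60
    worst = num_samples * interval_seconds / 60
    return best, worst


def interval_for_target(num_samples, target_minutes):
    """Largest interval (seconds) that keeps the worst case within the target."""
    return (target_minutes * 60) // num_samples


if __name__ == "__main__":
    print(alert_delay_minutes(4, 900))   # default interval from the XML above
    print(alert_delay_minutes(4, 300))   # interval overridden to 5 minutes
    print(interval_for_target(4, 20))    # interval needed for a 20 minute target
```

    With NumSamples fixed at 4, the interval is the only lever: the default 900 seconds gives 45 to 60 minutes, and 300 seconds gives 15 to 20 minutes.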
    Jonathan Almquist | SCOMskills, LLC (http://scomskills.com)

  • "logical disk health" monitor is critical

    Hello
    We have 2 HP-UX servers to monitor, and everything is fine until we mount or unmount the servers' disks. When we change a disk, the "Logical Disk Health" monitor's state goes Critical. Unfortunately this change may happen again. I reduced the discovery interval to 5 minutes, with no luck. Other monitors like "% free space" are healthy. What am I missing?

    I have done this. As I mentioned above, the "Logical Disk Health" monitor is in a critical state; the other monitors are healthy.
    Somebody told me to override the physical disk discovery interval from 14400 to 600 as a test. It didn't work.
    I checked the HP-UX MP user guide; there is no information about logical disk health parameters and dependencies.
    Any ideas?

  • What is the difference between Logical Disk and Physical Disk?

    Hi.
    When I run Performance Monitor, I collect the Logical Disk Avg. Disk sec/Write counter and the Physical Disk Avg. Disk sec/Write counter.
    They show different Avg. and Max. values, even though the Logical and Physical Disks map one-to-one.
    Why do I get this result?
    On the other hand, the Logical Disk and Physical Disk Avg. Disk sec/Read counters show the same Avg. and Max. values.

    Physical Disk refers to an actual physical HDD (or array in a hardware RAID setup), whereas Logical Disk refers to a Volume that has been created on that disk.
    So if you have one disk with one volume created on it then the values are likely to be 1 to 1, but if you have multiple volumes on the disk, for instance a physical disk with C:\ and D:\ volumes running on it, then the logical disks relate to c:\ and d:\
    rather than the disk they're running on.
    See
    http://blogs.technet.com/b/askcore/archive/2012/03/16/windows-performance-monitor-disk-counters-explained.aspx for a more in depth explanation.

  • Need to separate drive alerts with Logical Disk Free Space monitoring in SCOM 2012

    I have an interesting need here to separate our SCOM alerts for Logical Disk Free space so that one alert is for OSSystem drives ONLY (C:/D:) and the other monitor alerts on all APP drives only (E:, etc). So far we have had great success using Kevin Holman's
    blog post.
    http://blogs.technet.com/b/kevinholman/archive/2009/11/24/writing-monitors-to-target-logical-or-physical-disks.aspx
    We have overrides set so that the monitors report ONLY the percentage of free space left and ignores any MB threshold. So far so good, the alert comes in that host A reports low disk space on D: at 2.345...% free or host B reports low disk space on F: at
    4.567...% free space etc. Now that we have our monitors working within the Windows Server classes Logical Disk, we need to set these monitors so that one is just for C or D drives with the alert named system Logical Disk Free Space OS Disk Warn and the other
    monitor just reports on E - Z drives (excluding C or D) with the alert named Logical Disk Free Space APP Disk Warn.
    We are very new to SCOM, so I made the rookie mistake of creating a dynamic group for the Windows Server 2003 Logical Disk class that only includes Device Name = C or D, but found out too late that you can't point a monitor at a group; it has to target a class.
    The current monitors we set up using the above blog use the correct logical disk class, but they don't care what the instance is (device ID = value); they will report low disk space on ANY logical drive. How in the world can we separate these monitors
    so that one alerts only on OS disks (C and D) and the other alerts only on app disks (E through Z)?

    Hi Kevsharp,
    Your question is quite confusing to read.
    Based on your requirement, what I understand is that you need separate alerts for each drive that is running low on or out of space, right?
    For that, just create a simple performance counter monitor and use the same counters Kevin used in his blog.
    Target: use Windows Server Operating System (this will target all the Windows operating system agents in your SCOM, provided the relevant discovery MPs are installed).
    Set a threshold, for example below 10% is critical, or whatever suits. You will get the alerts in your console.
    Gautam.75801
