Agent health state issues

SCOM 2012 R2
Under Monitoring - Operations Manager - Agent Details - Agent Health Status
I have a number of odd server states showing, i.e.,
1) Agent State from Health Service Watcher pane shows server as critical
while Agent State pane shows state as healthy - grey circle with check mark
Tried an agent repair and no change
2) Agent State from Health Service Watcher pane shows server as healthy with green circle and check mark
while Agent State pane shows state as green circle not monitored
Tried an agent repair and no change
3) Agent State from Health Service Watcher pane shows server as critical
while Agent State pane shows state as grey triangle Warning
Tried an agent repair and no change
Under Active Alerts, a number of servers show as critical with the following Alert Details:
This monitor indicates that the target System Center Management Health Service that this Watcher is monitoring was not able to load internal system rules.
It says the resolution is to do a repair, but that did not help.
NM

Hi,
These two views are important because they give us a perspective of the agent from two different points:
1.  The perspective of the agent monitors running on the agent, measuring its own “health”.
2.  The perspective of the “Health Service Watcher”, which is the agent being monitored from a Management Server.
If any of these are red or yellow, that is an excellent place to start.  This should be an area that your level 1 support for Operations Manager checks DAILY.  We should never have a high number of agents that are not green here.  If they aren't, this is indicative of an unhealthy environment, or of the admin team not adhering to best practices (such as keeping up with hotfixes, using maintenance mode correctly, etc.).
Use Health Explorer on these views to drill down into exactly what is causing the Agent or Health Service Watcher state to be unhealthy.
Please go through the link below for more details:
http://blogs.technet.com/b/kevinholman/archive/2009/10/01/fixing-troubled-agents.aspx
Regards,
Yan Li
Please remember to mark the replies as answers if they help and unmark them if they provide no help. If you have feedback for TechNet Subscriber Support, contact [email protected]

Similar Messages

  • Custom Monitor Health State Troubleshooting Techniques

    Hi,
    I have created a Custom Timed Script 2-state Monitor through Ops Console in SCOM 2012 SP1 and the health state is misbehaving and remains Healthy always, even while the unhealthy expression should be evaluated as True.
    I have confirmed the monitor is targeted to class IIS FTP Server, as the monitor appears in Health Explorer for FTP Server and is displayed as enabled.
    I have used LogScriptEvent in the script to write an event of the results of the script on the agent and the value that will be passed back to the SCOM server, so I also know the script is executing and generating the correct results.
    The script returns a numeric value, either 0 or a positive value (i.e., never a negative number).
    The healthy expression is = 0.  The unhealthy expression is > 0.
    The returned value represents the number of missing files and when it is for example 12, the health state is not changing to unhealthy and no alert is generated.
    I have updated the MP version and confirmed the agent receives the latest version.
    I have flushed the agent health state.
    I have reset and recalculated the monitor.
    I have reviewed agent and server event logs and do not see any related errors.
    I have used a simple alert description in case it itself had an error and was preventing a healthy state change.
    I have reviewed the XML and see no issue. I updated the monitor XML manually to receive and compare the script value as an integer (instead of the default string type that the GUI creates).
    I wish I knew how to see the results of the SCOM workflow that determines if a health state change is required.
    My next step, in the absence of a better action plan, is to reverse the healthy/unhealthy expressions to see what happens, or to recreate the MP from scratch.
    Any help would be greatly appreciated as I have been banging my head on this one!!

    Thanks for the response Andy.
    Actually, you are only partially correct. The returned value IS a string by default, however you CAN cast it as an Integer if you export the MP to xml, substitute value String with Integer, and reimport the MP.
    Root cause of the issue turned out to be a typo in the parameter syntax used in the health expression.
    It was an easy catch once I compared a working monitor side-by-side against the non-working monitor.  Good example of how going too fast can actually slow you down :)
    Thanks again for taking the time to respond!
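
The string-versus-integer point in this thread is easy to reproduce outside SCOM. The following is a minimal Python sketch, not SCOM code: the function names `is_greater` and `two_state_health` are hypothetical, and the sketch only illustrates why a value compared as text can give a different answer than the same value compared as a number.

```python
def is_greater(left: str, right: str, as_integer: bool) -> bool:
    """Compare two script outputs either as integers or as plain strings."""
    if as_integer:
        return int(left) > int(right)
    # String comparison is lexicographic: "9" sorts after "10".
    return left > right


def two_state_health(returned_value: str) -> str:
    """Map a script result to a state: 0 is healthy, any positive count is unhealthy."""
    missing_files = int(returned_value)  # cast first, then compare numerically
    if missing_files == 0:
        return "Healthy"
    if missing_files > 0:
        return "Unhealthy"
    raise ValueError("the script is not expected to return a negative value")


if __name__ == "__main__":
    print(two_state_health("12"))
    print(is_greater("9", "10", as_integer=False))
    print(is_greater("9", "10", as_integer=True))
```

With a threshold of 0 the two comparison modes happen to agree for most inputs, which is why the type mismatch may not have been the root cause here; it matters more once the threshold has more than one digit.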

  • ConfigMgr 2007 Management Point Health: State

    I am getting this alert in SCOM 2012. How do I fix this issue?
    Does anybody know the triage steps to fix it?
    Manjish

    Hi,
    Verify that the SMS Agent service is running.
    Check the MPControl.log file and component status for MP_CONTROL_MANAGER for more details.
    ConfigMgr 2007 Management Point Health: State Rule
    http://mpwiki.viacode.com/default.aspx?g=posts&t=119435
    Note: Microsoft provides third-party contact information to help you find technical support. This contact information may change without notice. Microsoft does not guarantee the accuracy of this third-party contact information.

  • Group Member showing incorrect Health State

    Hi Everybody,
    Just looking for a few thoughts on an issue I've had today and a couple of times in the past.
    We are running SCOM 2012 and use dynamic groups based on name to send notifications to the right people. Today we had a critical alert; however, when looking at the group members, the health state for the server was only at warning level and a notification was not sent.
    The critical alert was that the server was not contactable, other notifications do however seem to work.
    Am I just missing something simple here?
    Many thanks,
    Allan
    Edit - All other health statuses match the alert states, and now that the critical issue is resolved the health state matches at a warning level.

    Allan,
    take a look at this post on Kevin's blog:
    http://blogs.technet.com/b/kevinholman/archive/2008/02/01/configuring-notifications-to-include-specific-alerts-from-specific-groups-and-classes.aspx
    This is exactly what I think.
    But be aware, this will only solve the problem with notifications.
    In situations like the one in your first post, you will still not see health roll up to the particular Windows computer in the case of the "Failed to Connect to Computer" alert. Instead you will see the corresponding "Health Service Watcher" in a critical state.
    This is because the class "Windows Computer" does not host the class "Health Service Watcher", so health from the watcher can't roll up to the health of the Windows computer.
    The watcher is something that resides on the management servers; in other words, the management servers are the owners of the agent watcher instances.
    The purpose of these watchers is to show you agent health from the "network" perspective...
    Regards,
    Ivan   
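
The hosting argument above can be pictured with a toy model. This is only a sketch under loud assumptions: the `Entity` class, its fields, and the worst-of rollup are hypothetical simplifications written in Python, not how SCOM stores or calculates health. It shows just one thing: a state cannot roll up along a relationship that does not exist.

```python
from dataclasses import dataclass, field
from typing import List

# Higher number means more severe.
SEVERITY = {"Healthy": 0, "Warning": 1, "Critical": 2}


@dataclass
class Entity:
    """Hypothetical monitored object with its own state and the objects it hosts."""
    name: str
    own_state: str = "Healthy"
    hosted: List["Entity"] = field(default_factory=list)

    def rolled_up_state(self) -> str:
        """Worst state among this entity and everything it hosts."""
        states = [self.own_state] + [child.rolled_up_state() for child in self.hosted]
        return max(states, key=SEVERITY.__getitem__)


if __name__ == "__main__":
    # The watcher is attached to the management-server side, not to the computer.
    watcher = Entity("SRV01 Health Service Watcher", own_state="Critical")
    watcher_owner = Entity("Management server side", hosted=[watcher])
    computer = Entity("SRV01 Windows Computer", own_state="Warning")

    print(computer.rolled_up_state())       # the computer never sees the watcher
    print(watcher_owner.rolled_up_state())  # the watcher's state shows up here
```

In this model a group that contains only the Windows computer object stays at Warning while the watcher is Critical, which matches the behaviour described in the question.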

  • Disk health state dont change OS health state

    Hi,
    Scenario: SCOM 2012 R2 UR4.
    The health state of a disk on a given server doesn't change the health state of the OS.
    The health state of the disk is warning, but the health state of the OS is OK.
    The questions are:
    Is this behavior normal?
    How can I configure the health state of the OS to be the same as that of the disk?
    Thanks in advance!

    I have seen such situations in my SCOM 2007 R2 environment, where the state shows Healthy even if there are multiple monitors in a critical state.
    Based on my understanding, SCOM calculates the state based on the dependency monitor.
    Example: best state and worst state.
    1. Even though the disk-related issue is there, it may not affect the health of the agent at all.
    2. Suppose a service named Service 1 is in a critical state, and Service 2 has also become critical because it depends on Service 1, so that one service has put multiple services into a non-healthy state. In that case the agent may change to a critical state in the dashboard.
    Your current situation may be like example 1 and not like example 2, so you are still getting the state as Healthy in the dashboard.
    =============================
    Any one please correct me if i am wrong.
    Gautam.75801
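
The "best state and worst state" idea mentioned above can be sketched in a few lines of Python. The function names and the three-level ordering are assumptions made for illustration; this is not the SCOM rollup implementation, and it says nothing about which policy a given management pack actually configures.

```python
from typing import Iterable

# Ordered from least to most severe.
SEVERITY_ORDER = ["Healthy", "Warning", "Critical"]


def worst_state(states: Iterable[str]) -> str:
    """Parent takes the most severe state of any member."""
    return max(states, key=SEVERITY_ORDER.index, default="Healthy")


def best_state(states: Iterable[str]) -> str:
    """Parent takes the least severe state of any member."""
    return min(states, key=SEVERITY_ORDER.index, default="Healthy")


if __name__ == "__main__":
    disks = ["Warning", "Healthy"]
    print(worst_state(disks))  # one warning disk is enough to change the parent
    print(best_state(disks))   # one healthy disk keeps the parent healthy
```

Under a best-state policy, or when nothing rolls the disk up to the parent at all, a single disk in warning leaves the parent Healthy, which would be consistent with the behaviour in the question.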

  • How Do I create a health state dashboard for a group of servers in SCOM 2012 R2

    I am aiming to create a health state dashboard that shows the agent health for a particular group of servers, but I am unable to find the option for the appropriate widget. Can somebody suggest how I should proceed?

    There is a State widget that shows whether each computer is healthy, warning, or critical:
    http://blogs.technet.com/b/antoni/archive/2013/05/13/operations-manager-2012-dashboard-widgets.aspx
    Mai Ali

  • Unable to retrieve topology component health states

    Hi,
    SharePoint 2013 SP1 + Sep 2014 CU and SQL Server 2012 SP1
    I have created the Search service application using PowerShell and also using Central Administration, but both times I get the following error message when I access the Search service application:
    Unable to retrieve topology component health states. This may be because the admin component is not up and running
    Please guide me on how to fix it.
    thx
    iffi

    Hi Imughal,
    This issue can have many causes, so please check the ULS log for a detailed error message.
    If the issue is caused by the Event ID 6482 error “A call to SSPI failed, see inner exception”, then it can be caused by the following:
    The timer service account tries to communicate with the HostController service and generates an 'SSPI Connect' call, which requires an SPN to be created for the target service's identity if no SPN is found.
    The HostController service's NodeRunner process is limited to a restricted amount of RAM.
    In that case, please follow the steps in the link below to solve this issue:
    http://blogs.msdn.com/b/bkr_sharepoint/archive/2014/06/09/sharepoint-2013-search-topology-activation-error-quot-unable-to-retrieve-topology-component-health-states-this-may-be-because-of-the-admin-component-is-not-up-and-running-quot.aspx
    Best regards.
    Thanks
    Victoria Xia
    TechNet Community Support

  • Unable to retrieve topology component health states. This may be because the admin component is not up and running.

    Unable to retrieve topology component health states. This may be because the admin component is not up and running.
    I have deleted Search Service App and created again.
    But I still get the same error. Can anyone give me a checklist? I need it very urgently.
    Please reply
    Thanks
    Raj

    Hi Raj,
    This issue can be caused by many reasons, so please check the ULS log for detailed error message.
    For SharePoint 2013, by default, ULS log is at C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\LOGS.
    Here are some similar issues and the corresponding solutions for you to take a look at:
    http://blogs.msdn.com/b/bkr_sharepoint/archive/2014/06/09/sharepoint-2013-search-topology-activation-error-quot-unable-to-retrieve-topology-component-health-states-this-may-be-because-of-the-admin-component-is-not-up-and-running-quot.aspx
    http://blogs.msdn.com/b/sambetts/archive/2013/08/28/sp2013-win2012-unable-to-retrieve-topology-component-health-states-this-may-be-because-the-admin-component-is-not-up-and-running.aspx
    http://techtrainingnotes.blogspot.com/2013/09/unable-to-retrieve-topology-component.html
    Best regards.
    Thanks
    Victoria Xia
    TechNet Community Support
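
Checking the ULS log for a specific error can be scripted. The following Python sketch is only an illustration: the function name `find_in_logs`, the `*.log` file pattern, and the UTF-8 decoding with replacement are assumptions, and the directory passed in would be the LOGS folder quoted above (or wherever the farm is configured to write its logs).

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_in_logs(log_dir: str, needle: str, pattern: str = "*.log") -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for every line containing needle, ignoring case."""
    wanted = needle.lower()
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if wanted in line.lower():
                    yield path.name, line_number, line.rstrip("\n")


if __name__ == "__main__":
    logs = r"C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\LOGS"
    for name, number, text in find_in_logs(logs, "A call to SSPI failed"):
        print(f"{name}:{number}: {text}")
```

Searching for the text of the error message rather than an event ID keeps the sketch independent of the exact column layout of the log files.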

  • Cisco Supervisor Desktop show "Agent Logs - call" and "Agent Logs - state" in N/A ::: UCCX 8.5.1

    Hi team.
    The Cisco Supervisor Desktop doesn't show any logs in "Agent Logs - State" and "Agent Logs - Call" for some agents.
    I restarted the Cisco Desktop Services in CCX Serviceability, but the issue continues.
    I would appreciate any help with this case.
    Thanks a lot.
    ErnestoG

    Hi Ernesto,
    Did you click or select the specific Agent\Inbound call that is currently being handled by the agent? From the first screenshot you attached, it doesn't look like the call has been selected.
    Please select or click on that Specific Agent\Inbound call from CSD and check these values.
    Hope this helps.
    Anand
    Please rate helpful posts !!

  • WEBLOGIC MANGED SERVER HEALTH STATE IS UNKNOWN

    Dear Team,
    I am working as a support engineer in a production environment.
    I am facing the issue described below:
    1. The health state of one of our WebLogic managed servers is WARNING. I checked the managed server log and found the error below:
        <Jun 22, 2014 5:13:37 AM IST> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "7,222" seconds working on the request "Workmanager: nlsi/wm/MarshallerWorkManager, Version: 0, Scheduled=true, Started=true, Started time: 7222299 ms
    ", which is more than the configured time (StuckThreadMaxTime) of "7,200" seconds. Stack trace:
    While the health is in the WARNING state, the managed server becomes fully unresponsive.
    I have the following questions:
    1. What is the risk if the managed server health state is WARNING?
    2. Will a stuck thread make the server go into an UNKNOWN state?
    3. If the managed server state is UNKNOWN, what is the risk?
    4. Could you please provide a solution so that we do not face this WARNING health state in the future?
    5. Could you please investigate why this WARNING health state occurs?
    Please help me.
    Regards,
    Maity
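
To see how far a thread has run past the limit, the two numbers in a BEA-000337 message can be pulled out with a regular expression. This is a minimal Python sketch: the pattern is derived only from the wording of the log entry quoted above, `parse_stuck_thread` is a hypothetical helper name, and the sample string is a shortened copy of that entry.

```python
import re
from typing import Optional, Tuple

# Matches the busy time and the configured StuckThreadMaxTime, even across a line break.
STUCK_THREAD = re.compile(
    r'has been busy for "([\d,]+)" seconds.*?\(StuckThreadMaxTime\) of "([\d,]+)" seconds',
    re.DOTALL,
)

# Shortened copy of the log entry from the post (request details elided with "...").
SAMPLE = (
    "<[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' "
    'has been busy for "7,222" seconds working on the request "Workmanager: ...\n'
    '", which is more than the configured time (StuckThreadMaxTime) of "7,200" seconds.'
)


def parse_stuck_thread(message: str) -> Optional[Tuple[int, int]]:
    """Return (busy_seconds, max_seconds), or None if the message does not match."""
    match = STUCK_THREAD.search(message)
    if match is None:
        return None
    busy, limit = (int(group.replace(",", "")) for group in match.groups())
    return busy, limit


if __name__ == "__main__":
    print(parse_stuck_thread(SAMPLE))
```

The entry in the post reports a thread busy for 7,222 seconds against a configured limit of 7,200 seconds, so the thread had only just crossed the threshold when the message was written.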

  • How to monitor  SQL statements issued by SIEBEL ?

    Hi,
    We have developed a BI Siebel 10.1.3.3 application.
    One of the requirements is to persist/store in the database the real SQL statements issued by BI Dashboards/Answers.
    The best solution would be to have online access to the cursor cache.
    Could someone please tell me how to achieve this?
    Regards,
    Cezary

    Sounds like you're looking for Usage Tracking.
    OBIEE Server Administration Guide – Pages: 220
    OBIEE Installation and Configuration Guide – Pages: 229
    And this post here;
    http://oraclebizint.wordpress.com/2007/08/14/usage-tracking-in-obi-ee/
    A.

  • ConfigMgr 2012 MP – suspending SCOM agent/health service

    Hi,
    I use SCOM agent/health for our SLA reporting. I found a handful of servers last month with something like 99.9% uptime even though I knew they had been up all of the time. Looking in the Operations Manager event log I have found the following events related to ConfigMgr 2012 -
    The System Center 2012 - Operations Manager agent running on computer "XXXXXXXXXXXXX" is suspended for the following reason:
    "ConfigMgr 5.00.7804.1000 - SMS_MaintenanceTaskRequests –
    I am trying to understand what and why it is doing this. Presumably something in the ConfigMgr MP is trying to “fix” something with a task? However if I checked open/closed alerts for the servers involved I can’t see any type of alert getting created.
    I have opened the ConfigMgr MP with “Management Pack Viewer” but can’t find any Diagnostics, Recoveries or Tasks listed. Just wondered if anyone had any idea what this was or perhaps it’s a task related to a SCOM internal MP itself? I can’t find any other examples of the agent getting suspended whilst something gets “fixed”

    Hi,
    This event is telling you that the SCOM Agent is being paused for an SCCM Maintenance task (often a backup).
    For more information, please refer to the link below:
    Use of Disable Operations Manager alerts option in ConfigMgr
    http://blog.tyang.org/2014/04/24/use-disable-operations-manager-alerts-option-configmgr/

  • Can I monitor SCCM agent health through SCOM, monitor logs like CCMeval, and set up alerts?

    Can I monitor SCCM agent health/inactive agents through SCOM, monitor logs like CCMeval, and set up alerts?

    You can find some management packs here:
    http://systemcenter.pinpoint.microsoft.com/en-US/applications/search/Operations-Manager-d11?q=
    There are other sites as well but this is the MS page for hosting MP's.  The default SCCM 2012 Management pack for SCOM 2012 is pretty functional, this page talks a little bit about it:
    http://blogs.technet.com/b/kevinholman/archive/2012/12/11/monitoring-configmgr-2012-with-opsmgr.aspx
    If I remember correctly, it does NOT include a lot of client monitoring but I could be wrong.  It might take some custom monitor creation or management pack downloads to get exactly what you're wanting.  If I can find something like that I'll add it to this post.
    A good rule of thumb that I live by with SCOM, in case the product is new to you, is to save all your changes and customizations to the SCCM management pack in a custom-created management pack.

  • JMS Health State

    At the following URL it states that the JMSRuntimeMBean and JMSServerRuntimeMBeans expose a HealthState via a getHealthState() method call:
    http://e-docs.bea.com/wls/docs70/admin_domain/monitoring.html
    From the text at that URL it would appear that this health state is related to the health of the JMS thread pool:
    "...the JMS subsystem monitors the condition of the JMS thread pool..."
    But after substantial testing it does not appear that the health state changes. It has 4 defined levels: OK (0), WARNING (1), CRITICAL (2), and FAIL (3), but I have never seen it move to anything other than 0 (OK).
    Here is my test environment:
    JMS Thread Pool set to 2 threads (reduced from 15 to try to observe this condition)
    For my Topic I have a maximum message count of 10 (again to try to observe errors)
    I have 200 threads sending 224 bytes of data into the JMS topic at 4 second intervals. I have observed the byte and message throughput drop to almost nothing with pending bytes and messages, but the health state is still OK. Furthermore, with only 2 threads to service 200 requests (backing up at some times to 300 concurrent connections) and a default execute queue of 50 threads (100% in use) with a queue length of 180, you would think that the health state would reflect a problem.
    Any insight?
    Thanks!
    Steve

    Sorry, the health column is showing a warning for that deployment.

  • JTA health state has changed to HEALTH_WARN

    Hi,
    I've got the following warning:
    The JTA health state has changed from HEALTH_OK to HEALTH_WARN with reason codes: Resource OracleStore_1 declared unhealthy.
    Can you point out a document that describes the rules for changing JTA's health state?
    regards
    Lukas
    WLS 9.2

    First of all, it's not likely there is lost data but the proposed solution to delete store directories will destroy data and is not normally recommended. It's much more likely that the data in question never made it into the system as the transactions that inserted data are in the process of rolling back or already rolled back (the exception in the first post indicates the transaction rolled back); or possibly the data is part of an unresolved transaction that has a pending commit.
    There appear to be two different problems in this thread and it's doubtful the same solution would apply to both. Based on the minimal information provided:
    - (A) check your server log for precursor warnings and errors
    - (B) increase transaction timeouts (the global domain default is 30 seconds)
    - (C) reboot your WebLogic servers
    - (D) ensure you're up to date on service packs/patches
    - (E) contact customer support
    HTH,
    Tom
