Best practice on monitoring Endeca health / defining outage

(This is a double post from the Endeca Experience Management forum)
I am looking for best practices on how to define an Endeca service outage and monitor the health of the system. I understand this depends on user requirements and may vary from customer to customer. Specifically, what criteria do you use to notify your engineers that there is a problem? We have our load balancers pinging the dgraphs on an interval, but the ping operation is not sufficient for our use case. We are also experimenting with running a "low cost" query against the dgraphs on an interval and using query latency thresholds to determine an outage. I would like to hear from people in the field running large commercial web sites about their best practices for monitoring the health of the system and sending notifications.
Thanks.
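
The "low cost query plus latency thresholds" idea from the question could be sketched roughly as below. This is only an illustration under stated assumptions: the probe URL, the threshold values, the window size and the state names are all made up here, and nothing in it is an Endeca API. It only times a plain HTTP request and classifies the recent results.

```python
import http.client
import time
import urllib.request
from collections import deque

# All thresholds below are illustrative assumptions, not recommended values.
WARN_SECONDS = 0.5     # latency above this counts as "slow"
FAIL_SECONDS = 2.0     # latency above this, or any error, counts as "failed"
WINDOW = 5             # number of recent probes considered
OUTAGE_FAILURES = 3    # failed probes within the window that declare an outage


def probe(url, timeout=FAIL_SECONDS):
    """Send one cheap query; return its latency in seconds, or None on error/timeout."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except (OSError, http.client.HTTPException):
        return None
    return time.monotonic() - start


def classify(latencies):
    """Map recent latencies (None = failed probe) to 'ok', 'degraded' or 'outage'."""
    failed = sum(1 for t in latencies if t is None or t > FAIL_SECONDS)
    slow = sum(1 for t in latencies if t is not None and WARN_SECONDS < t <= FAIL_SECONDS)
    if failed >= OUTAGE_FAILURES:
        return "outage"
    if failed or slow:
        return "degraded"
    return "ok"


def monitor(url, interval=30, notify=print):
    """Probe forever; call notify only when the state changes."""
    recent = deque(maxlen=WINDOW)
    state = "ok"
    while True:
        recent.append(probe(url))
        new_state = classify(recent)
        if new_state != state:
            notify(f"{url}: {state} -> {new_state}")
            state = new_state
        time.sleep(interval)
```

Requiring several failed probes inside a window, rather than alerting on a single slow response, is one way to keep a brief latency spike from paging an engineer; the right numbers depend on your own traffic and SLA.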

The performance metrics should help you analyse queries and fine-tune them.
Here are a few best practices:
1. Reduce the number of components per page
2. Avoid complex LQL queries
3. Keep the LQL threshold small
4. Display the minimum number of columns needed

Similar Messages

  • Best practice to monitor 10gR3 OSB performance using JMX API?

    Hi guys,
    I need some advice on the best practice to monitor 10gR3 OSB performance using JMX API.
    Just to show I have done my homework, I managed to get the JMX sample code from
    http://download.oracle.com/docs/cd/E13159_01/osb/docs10gr3/jmx_monitoring/example.html#wp1109828
    working.
    The following is the list of options I am thinking about:
    * Setup: I have a cluster of one admin server and two managed servers; each managed server runs an instance of OSB
    * What I am trying to achieve:
    - use the JMX API to collect OSB stats data periodically, as in the sample code above, then save the data as a record in a database table
    Options/ideas:
    1. Simplest approach: Run the modified version of JMX sample on the Admin Server to save stats data to database
    regularly. I can't see problems with this one ...
    2. Use WLI to schedule the Task of collecting stats data regularly. May be overkill if option 1 above is good for production
    3. Deploy a simple web app on Admin Server, say a simple servlet that displays a simple page to start/stop and configure
    data collection interval for the timer
    What approach would you experts recommend?
    BTW, the caveats on using JMX in http://download.oracle.com/docs/cd/E13159_01/osb/docs10gr3/jmx_monitoring/concepts.html#wp1095673
    say
         Oracle strongly discourages using this API in a concurrent manner with more than one thread or process. This is because a reset performed in
         one thread or process is not visible to another threads or processes. This caveat also applies to resets performed from the Monitoring Dashboard of
         the Oracle Service Bus Console, as such resets are not visible to this API.
    Under what scenario would I be breaking this rule? I am a little worried about its statement
         discourages using this API in a concurrent manner with more than one thread or process
    Thanks in advance,
    Sam

    Hi Manoj,
    Thanks for getting back. I am afraid configuring the aggregation interval from the Dashboard doesn't solve the problem, as I need to collect stats data per endpoint URI on an hourly or daily basis, then output it to CSV files so line graphs can be drawn for chosen applications.
    Just for those who may be interested. It's not possible to use SQL to query database tables to extract OSB stats for a specified time period, say 9am - 5pm. I raised a support case already and the response I got back is 'No'.
    That means using JMX API will be the way to go :)
    Has anyone actually done this kind of OSB stats report and would care to give some pointers?
    I am thinking of using 7 days or 1 day as the aggregation interval set in the Dashboard of the OSB admin console, then collecting stats data using JMX (as described in the previous link) hourly, using the WebLogic Server JMX Timer Service as described in
    http://download.oracle.com/docs/cd/E12840_01/wls/docs103/jmxinst/timer.html instead of Java's Timer class.
    Not sure if this is the best practice.
    Thanks,
    Regards,
    Sam
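
The "collect hourly, then output to CSV for line graphs" step could look roughly like the sketch below. It deliberately leaves out the JMX collection itself (that part is the Java sample linked above) and only shows appending timestamped snapshots to a CSV. The column names and the shape of each row are assumptions made for illustration, not OSB statistic names.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical column names; substitute whatever statistics your collector returns.
FIELDS = ["collected_at", "service", "message_count", "error_count", "avg_response_ms"]


def append_snapshot(out, rows, collected_at, write_header=False):
    """Append one CSV row per service to an open file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    stamp = collected_at.isoformat(timespec="seconds")
    for row in rows:
        writer.writerow({"collected_at": stamp, **row})


# Usage with an in-memory buffer and made-up numbers.
buf = io.StringIO()
sample = [{"service": "ProxyA", "message_count": 120, "error_count": 2, "avg_response_ms": 35.5}]
append_snapshot(buf, sample, datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc), write_header=True)
```

Running a single scheduled collector that writes these rows is also one way to stay clear of the concurrency caveat quoted earlier, since only one process would ever read the statistics.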

  • Best Practice for monitoring database targets configured for Data Guard

    We are in the process of migrating our DB targets to 12c Cloud Control. 
    In our current 10g environment the Primary Targets are monitored and administered by OEM GC A, and the Standby Targets are monitored by OEM GC B.  Originally, I believe this was because of proximity and network speed, and over time it evolved to a Primary/Standby separation.  One of the greatest challenges in this configuration is keeping OEM jobs in sync on both sides (in case of switchover/failover).
    For our new OEM CC environment we are setting up CC A and CC B. However, I would like to determine whether it would be smarter to monitor and administer all DB targets (Primary and Standby) from the same OEM CC console. I am trying to determine the best practice. I am not sure whether administering a switchover from Primary to Standby in Cloud Control requires that both targets are monitored in the same environment.
    I am interested in feedback.   I am also interested in finding good reference materials (I have been looking at Oracle documentation and other documents online).   Thanks for your input and thoughts.  I am deliberately trying to keep this as concise as possible.

    OMS is a tool; it is not needed to monitor your primary and standby. That is what I meant by the comment.
    The reason you want the same OMS to monitor both the primary and the standby is that the Data Guard administration screen will then show both targets. You will also have the option of doing switchovers and failovers, as well as converting the primary or standby. One of the options is to move all the jobs that are scheduled on the primary over to the standby during a switchover or failover.
    There is no document stating that you need to have all targets on one OMS, but that is the best method, given the reason for having OMS: it is a tool for keeping all targets in a central repository. If you start having different OMS servers and OMS repositories, you will need to log into separate OMS instances to administer the targets.

  • Best practice for monitoring MXE3500 v3.2.1

    Hi all
    We have recently upgraded to v3.2.1 on our MXE3500, all has gone well but I am looking for the best way to monitor the system health of the device for our support teams.
    The Show and Share and DMM servers come with SNMP monitoring, but I can't see the equivalent for the MXE.
    Due to the locked-down nature of the Windows VM in the 3.2.1 software, I do not want to install our standard OS monitoring software, BMC Patrol, as Cisco has advised against installing any additional software.
    Has anyone got any ideas on this?
    Adam

  • Best Practice for Monitoring, on VPS 2012 Server Standard. Extended Events or Profiler?

    Hi,
    What tools do you use to determine if you should tweak SQL Server configuration and optimize code route, or simply bump up your virtual resources? Can someone share a bag of Extended Events to monitor at the VPS level?  
    I'm a reasonably decent SQL developer but never advanced far with DBA work, especially once the mainstream went virtual. It seemed to me that all of SQL Server's flexibility in managing disk and memory went out the window now that everything is 'shared', NATed, and Plesked. So I basically dropped out of the conversation and built stuff with SSIS and T-SQL.
    Now I'm charged with assessing a bottleneck on a VPS running Windows Server 2012 Standard and SQL Server 2012 Express. I've read that Profiler and traces are deprecated, and I've looked a bit at the server's Extended Events on the hosted environment. I have not run anything.
    My question: does it make sense to think in terms of 'levels' when deciding what to monitor? I consider SQL Server one level, then the Windows server, and finally the virtual level. What I'm getting at is that, sure, I can monitor SQL Server with a profiling tool, but it won't know SQL Server is on a VPS. So am I missing something?
    There used to be a day when we had a dedicated physical box for SQL Server. We ran traces using Profiler and got good clues on how to improve performance. In today's VPS world we can use sliders to increase virtual memory and disk space. What tools do you use to determine if you should tweak SQL Server configuration and optimize code route, or simply bump up your virtual resources? Can someone share a bag of Extended Events to monitor at the VPS level?
    Which Extended Events at the VPS level tell me if SQL Server is struggling with the limited 1 GB of virtual memory? I realize this is not a direct question, but hopefully someone will point this developer in the right direction.
    John

    Hi John,
    From SQL Server's point of view it doesn't really matter whether the box is physical or virtual. So if you feel there are performance issues with SQL Server and you are well versed in troubleshooting with SQL Profiler, go ahead with that. If you feel SQL Server performance is good, then you know to look elsewhere.
    I would first try to find out whether it is really a problem with SQL Server before troubleshooting the SQL side, and I use perfmon counters to do that.
    I would also look at the SQL Server error log to see if there are any obvious errors.
    I haven't used Extended Events much, so I am leaving that for others to comment on. :)
    Regards, Ashwin Menon My Blog - http:\\sqllearnings.com

  • Best Practice for monitoring RAC 11gR2

    Hi,
    I have RAC 11gR2+ASM on two nodes.
    I would like to get your advice on the most critical things I should monitor regarding the RAC components.
    Thanks

    Hi,
    Here is some basic monitoring:
    1) Whether the interconnect switch (private network) is working properly
    2) Which instances are running on which nodes
    3) Whether ASM, listeners, nodeapps, GSD, VIPs, etc. are running
    4) Whether each instance is carrying its planned load (balanced?)
    5) Whether shared storage access is equal
    6) Interconnect load
    7) Latency
    8) High CPU usage: are Oracle processes getting enough resources?
    Thanks
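
Checks 2 and 3 in the list above could be scripted along these lines. This is a minimal sketch, not a complete RAC monitor: it assumes `crsctl` and `srvctl` are on the PATH of the user running it, "ORCL" is a placeholder database name, and the "not running" text match is a guess at the message wording that you should verify against your own output. Interconnect load, latency and CPU are not covered here.

```python
import subprocess

# "ORCL" is a placeholder; replace it with your database unique name.
CHECKS = {
    "clusterware": ["crsctl", "check", "cluster", "-all"],
    "instances": ["srvctl", "status", "database", "-d", "ORCL"],
    "asm": ["srvctl", "status", "asm"],
    "listeners": ["srvctl", "status", "listener"],
    "nodeapps": ["srvctl", "status", "nodeapps"],
}


def run_command(argv):
    """Run one command and return (exit status, trimmed stdout)."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    return result.returncode, result.stdout.strip()


def run_checks(runner=run_command, checks=CHECKS):
    """Return {check name: (ok, output)} for every configured check."""
    report = {}
    for name, argv in checks.items():
        try:
            code, output = runner(argv)
        except (OSError, subprocess.SubprocessError) as exc:
            code, output = 1, str(exc)
        # Assumption: a zero exit status alone may not mean "up", so also
        # look for "not running" in the text. Verify this wording yourself.
        ok = code == 0 and "not running" not in output.lower()
        report[name] = (ok, output)
    return report
```

Passing the command runner in as a parameter keeps the reporting logic testable on a machine without Grid Infrastructure installed.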

  • Best Practices for ASA 5500 Device Monitoring

    I have looked high and low and am unable to find anything on this topic. I am hoping that somebody here may be able to share some insight into what are considered the best practices for monitoring ASAs, specifically the 5510 with the Sec+ license.
    My current monitoring application keeps reporting issues with outbound interface buffers being too high, but there are not any performance issues and I believe the thresholds are just set absurdly low.
    Thank you in advance for any assistance.

    Hi James,
    You probably won't be able to find any all-encompassing documentation for these types of best practices that cover all scenarios. The better method would be to define exactly what items you'd like to monitor and we can provide some guidance on how to best get that working for you.
    -Mike

  • BizTalk monitoring best practice(s)

    I am looking for information on best practices for monitoring a BizTalk environment (ideally using Tivoli monitoring tools). Specifically, I am looking for insight into what should be monitored and how one analyzes the performance profile. Thanks

    When setting up monitoring agents/products for BizTalk Server (or for any server/application, for that matter), there are two ways to start:
    If available, import/install application-specific monitoring packages, i.e. prebuilt monitoring rules, alerts and actions specific to the application.
    Or create the rules, alerts and actions for the application from scratch.
    For monitoring products like SCOM, management packs for BizTalk Server are available as prebuilt, ready-made packages. For a non-Microsoft product like Tivoli, check with the vendor (IBM) for any such prebuilt monitoring packages for BizTalk. If one is available, purchase it and install it in Tivoli. This would be the best option to start with, instead of spending time and resources building the rules, alerts and actions from scratch.
    If a prebuilt monitoring package is not available, start by creating rules to monitor any errors or warnings in the event logs of the BizTalk and SQL servers. Gradually, you can update or add more rules based on your needs.
    Regarding analysing the performance profile, most monitoring products nowadays come with prebuilt alerts for monitoring server performance, CPU utilization, etc. I'm sure a renowned product like Tivoli has prebuilt alerts for monitoring server performance. The same can be configured to monitor BizTalk's performance. Monitoring event log entries would also pick up any performance-related issues.
    Moreover, Tivoli has a detailed user guide for setting up alerts for BizTalk Server. Check this document here.
    Reading the best practices and the links provided by MaheshKumar should also help you.
    The key point to remember is that no monitoring product is perfect; you can't create foolproof monitoring alerts and actions on day one. They will mature over time in your environment.

  • Service Model, Health Model, Best Practice (SML)

    Hello
    I am trying to explain to semi-technical people who do not know SCOM the principles of SCOM when it comes to monitoring concepts and best practice.
    Therefore what I am looking for is a set of slides, a short video, a Q&A, etc. that explains the concepts and reasoning behind taking the time to work out a Service Model and Health Model at the 'start' of a project (e.g. before installing BusinessAppA) so it can be properly monitored, alerted on, etc.
    Basically I am trying to get the architects/project managers to think about what I need as a SCOM engineer so I can discover and monitor the application/system they are proposing to install, rather than picking this up after the event.
    Does anyone know of any good resources to explain these concepts and get the message across?
    Thanks All
    AAnotherUser__

    Hi,
    Please refer to the links below:
    Service Model
    http://technet.microsoft.com/en-us/library/ee957038.aspx
    Health Model Introduction
    http://channel9.msdn.com/Series/System-Center-2012-R2-Operations-Manager-Management-Packs/Mod15
    Health Model
    http://technet.microsoft.com/en-us/library/ff381324.aspx

  • FWSM interface monitoring and best practices documentation.

    Hello everyone
    I have a couple of questions regarding VLAN interface monitoring and best practices specifically for this service module.
    I couldn't find a suggestion or guideline on how to define a VLAN interface on a management station. The FWSM total throughput is 5.5 Gbps and the interfaces are mapped to VLANs carried on trunks over 10 Gb EtherChannels. Is there a common practice, or past experience, for setting physical parameters on logical interfaces? The "show interface" command reports BW as unknown.
    Additionally, do any of you have a document addressing best practices for the FWSM? I have this for other platforms, plus general recommendations based on newer ASA versions, but nothing related to the FWSM.
    Thanks a lot!
    Regards
    Guido

    Hi,
    If you are looking for another command to check the throughput through the module:
    show firewall module <number> traffic
    Also, as this product is End of Life, I think you might have to look through some older Cisco documentation for the best practices.
    http://www.cisco.com/c/en/us/products/collateral/switches/catalyst-6500-series-switches/prod_white_paper0900aecd805457cc.html
    https://supportforums.cisco.com/discussion/11540181/ask-expertconfiguring-troubleshooting-best-practices-asa-fwsm-failover
    Thanks and Regards,
    Vibhor Amrodia

  • Basic Strategy / Best Practices for System Monitoring with Solution Manager

    I am very new to SAP and the Basis group at my company. I will be working on a project to identify the best practices of System and Service level monitoring using Solution Manager. I have read a good amount about SAP Solution Manager and the concept of monitoring but need to begin mapping out a monitoring strategy.
    We currently use the RZ20 transaction and basic CCMS monitors, such as watching for update errors, availability, short dumps, etc. What else should be monitored in order to proactively find possible issues? Are there any best practices you have found when implementing monitoring for new solutions added to the SAP landscape? What are the common things we would want to monitor across, say, ERP, CRM, SRM, etc.?
    Thanks in advance for any comments or suggestions!

    Hi Mike,
    Did you try the following link ?
    If not, it may be useful to some extent:
    http://service.sap.com/bestpractices
    ---> Cross-Industry Packages ---> Best Practices for Solution Management
    You have quite a few documents there - those on BPM may also cover Solution Monitoring aspects.
    Best regards,
    Srini
    Edited by: Srinivasan Radhakrishnan on Jul 7, 2008 7:02 PM

  • Best Practices for Defining NDS Java Projects...

    We are doing a Proof of Concept on using NDS to develop non-SAP Java applications.  We are attempting to determine if we can replace our current Java development tools with NDS/WAS.
    We are struggling with SAP's terminology and "plumbing" for setting up/defining Java projects.  For example, what is and when do you define Tracks, Software Components, Development Components, etc.  All of these terms are totally foreign to us and do not relate to our current Java environment (at least not that we can see).  We are also struggling with how the DTR and activities tie in to those components.
    If any one has defined best practices for setting up Java projects or has struggled with and overcome these same issues, please provide us with some guidance.  This is a very frustrating and time-consuming issue for us.
    Thank you!!

    Hi Peggy,
    In the Component Model we divide software projects into small components. Components can use other components in a well-defined manner.
    A development object is a part of a component that can be changed or developed in some way; it provides the component with a certain part of its functionality. A development object may be a Java class, a Web Dynpro view, a table definition, a JSP page, and so on. Development objects are always stored as “sources” in a repository.
    A development component can be defined as a frame shared by a number of objects, which are part of the software.
    Software components combine components (DCs) to larger units for delivery and deployment.
    A track comprises the configurations and runtime systems required for developing software component versions. It ensures stable states of the deliverables used by subsequent tracks.
    The Design Time Repository (DTR) provides versioned source code management, distributed development of software in teams, and transport and replication of sources.
    You can also find a lot of support for the above concepts, with tutorials, on SDN.
    Refer to this link for an overview of the Java Development Infrastructure (JDI):
    https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.km.cm.docs/library/webas/java/java development infrastructure jdi overview.pdf
    To understand further
    Working with Net Weaver Development Infrastructure :
    http://help.sap.com/saphelp_nw04/helpdata/en/03/f6bc3d42f46c33e10000000a11405a/content.htm
    In the above link you can find all the concepts clearly explained.You can also find the required tutorials for development.
    Regards,
    Vijith

  • Best practice to define length for varchar field of table in sql server

    What is the best practice for defining the length of a varchar field in a table?
    Suppose the field is "Remarks By Person": should it be varchar(max) or varchar(4000)?
    Could the choice affect optimization in the future?
    Experts, please reply.
    Dilip Patil..

    Hi Dilip,
    Varchar(n | max) is variable-length, non-Unicode character data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. We normally use varchar when the sizes of the column data entries vary considerably. If the field's data size might exceed 8,000 bytes in some cases, we should use varchar(max).
    So the conclusion is just as Uri said: whether to use varchar(max) or varchar(4000) depends on how many characters we are going to store.
    The following document about varchar in SQL Server is for your reference:
    http://technet.microsoft.com/en-us/library/ms176089.aspx
    Thanks,
    Katherine Xiong
    TechNet Community Support

  • Best practice for data persistance for monitoring without BAM

    Greetings,
    We are modeling a business process in a large organization using BPEL Process Manager. The key point is that business people need to monitor the execution of the business process at several key points, and they also need report information about the process.
    To model this in our project, we decided to create a new Oracle database schema that will hold the information about the business process execution (we decided this because, for this initial offering, the customer is not buying BAM). In this context, the BPEL process will send this key information to the repository so business people can view real-time information about the process execution as well as historical information in the form of reports.
    The important question here is whether there is a best practice for sending the information to the database schema. Could it simply use database adapters? Or perhaps sensors that send the data using topic connections?
    Any help will be highly appreciated.
    Thanks in advance.

    Hi, yes, this suggestion is good. First configure the sensors (activity or variable), then configure the sensor action as a JMS topic, which will in turn insert the data into a DB. Alternatively, when you configure the sensor action as a database action, the data goes to the Oracle Reports schema. I don't know whether it is possible to change the config files so that the data goes to a custom, user-created schema instead of that Reports schema. My own problem is that when I configure the JMS topic for sensor actions, I see blank data coming through; for some reason the data is not getting posted. I have used an ESB routing service based on the schema that I am monitoring. Can anyone help?

  • Best Practice for ASA Route Monitoring Options?

    We have a pair of Cisco ASA 5505s in different locations, with two point-to-point links between the locations: one primary link (static route with a low metric) and one backup (static route with a high metric). The tracked option is enabled to monitor the state of the primary route. The detailed monitoring parameters are as follows:
    Frequency: 30 seconds               Data Size: 28 bytes
    Threshold: 3000 milliseconds     Tos: 0
    Time out: 3000 milliseconds          Number of Packets: 8
    ------ show run------
    sla monitor 1
    type echo protocol ipIcmpEcho 10.200.200.2 interface Intersite_Traffic
    num-packets 8
    timeout 3000
    threshold 3000
    frequency 30
    sla monitor schedule 1 life forever start-time now
    ------ show run------
    I'm not sure whether the setting is so sensitive that the secondary static route takes over right away, even when only a small link flap occurs.
    What is the best practice for setting these parameters in a production environment? How can we choose reasonable monitoring options to fit our needs?
    Thank you for any idea.

    Hello,
    Of course, a setting that is too sensitive might cause failover to happen when only a few packets get lost, but remember that the whole purpose of this is to give your network as little downtime as possible.
    If you tune these parameters, failover will simply be triggered on a different time basis.
    The following is taken from a Cisco document. (If you tune the SLA process as it states, 3 packets will be sent every 10 seconds, so all 3 of them need to fail for the SLA failure to trigger.) This Cisco configuration example looks good, but there are network engineers who would rather use an even shorter interval than that.
    sla monitor 123
    type echo protocol ipIcmpEcho 10.0.0.1 interface outside
    num-packets 3
    frequency 10
    Regards,
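
To get a feel for how "sensitive" the two configurations in this thread are, a rough back-of-the-envelope model can help. The sketch below assumes, as the reply describes, that a probe cycle counts as failed only when every packet in it is lost, and it further assumes that packet losses are independent with a fixed loss rate. Real link flaps are bursty, so treat the numbers as an illustration only, not as ASA behaviour.

```python
def false_failover_probability(packet_loss, num_packets):
    """Chance that one probe cycle loses every packet, assuming independent losses."""
    if not 0.0 <= packet_loss <= 1.0:
        raise ValueError("packet_loss must be between 0 and 1")
    return packet_loss ** num_packets


def expected_false_failovers_per_day(packet_loss, num_packets, frequency_seconds):
    """Expected number of all-packets-lost probe cycles in 24 hours on a healthy but lossy link."""
    cycles_per_day = 86400 / frequency_seconds
    return cycles_per_day * false_failover_probability(packet_loss, num_packets)


# Compare the reply's example (3 packets every 10 s) with the original
# poster's settings (8 packets every 30 s) at a hypothetical 5% packet loss.
for label, packets, frequency in [("3 packets / 10 s", 3, 10), ("8 packets / 30 s", 8, 30)]:
    rate = expected_false_failovers_per_day(0.05, packets, frequency)
    print(f"{label}: {rate:.2e} expected false failovers per day")
```

Under these assumptions, more packets per cycle reduce spurious failovers sharply, while a longer frequency mainly delays detection of a real outage, which is the trade-off the reply is pointing at.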
