NUMA node zero at 100% CPU utilization while others are around 50% - why?

Questions:
How can I tell which processes are maxing out one of my eight NUMA nodes' CPU?  i.e. what tool would give me that insight?  I've poked around procmon.exe and don't think it has this, but am not sure yet.
Any thoughts on what's going wrong overall, given the description below?
Symptoms:
Task Manager shows one NUMA node at 100% CPU (and in processors view, shows all 8 CPUs in that node at 100%), while other nodes range from 0-90% depending on load
SQL Server, which is the only significant CPU user on this box, is running regular jobs 2x slower than it did last week.  Last week it was running Windows and SQL 2008 R2 SP1, and this week it has been rebuilt (from scratch - not upgraded) with Windows and SQL 2012.
Environment:
HP DL980 G7, 64 cores, 1TB RAM, 50TB fast disk for SQL use, 120TB slow disk for file storage
Hyperthreading enabled - last week, it was not (since it's not supported well in Windows 2008 R2 with the Hyper-V role and 64 cores)
Windows Server 2012 Standard, installer we got via Software Assurance (and yes, I have an MS support case, ID 112110854959943, but I often lose days before my case escalates to the right person, so I'm hoping someone knowledgeable sees this first, as I have some very unhappy users!).  Last week was Windows 2008 R2 SP1 Enterprise.
SQL Server 2012 RTM Enterprise Core.  Will update to latest CU as soon as I get a maintenance window.  Last week was SQL 2008 R2 SP1 Enterprise.
Max d.o.p. on six SQL instances is set to 8, on the seventh it is set to 16.  No CPU affinity set by us for SQL or anything else.
Hyper-V role installed.  Had this role last week too.
Domain controller.  Had this role last week too.
Also running DFS (with replication turned off this week, though it was on last week), not much going on there right now so I don't think it's related to the problem.
Task Manager and Resource Monitor show Memory is fine (100GB free), Disk is fine (<500 MB/s I/O, compared to regularly running at 1-2 GB/s in the past).  perfmon won't open for me, possibly due to an unrelated problem (port exhaustion with DNS Client, another MS support case we have ongoing).
Planned next steps:
I have removed the Hyper-V role and am awaiting user permission to reboot for that to complete.  Will see if that fixes it.
If it doesn't, I'll reboot again and remove hyperthreading.
And in the meantime, via MS support and this post, I'm trying to understand what is going on, what is wrong.  I would love your help!

To follow-up on Ethan's post (he works with me) - this is what it looks like in Task Manager.
Normally, all 64 CPUs would be at/near 100%. But SQL jobs look like they are being bottlenecked on NUMA node 0 (despite no affinity being set - everything is default/automatic). Another symptom (other than very slow SQL jobs) is that various management tools in the host partition are very unresponsive.
The significant changes are:
* WS2012 vs WS2008R2 SP1
* SQL2012 vs. SQL2008R2
* Hyperthreading enabled in BIOS to give 128 logical processors to the hypervisor. 64 logical processors are made available to the host partition (the limit in WS2012)
All SQL instances are running in the host partition. There is no significant network or other activity on this box. No VMs are running. CPU-Z shows 8 sockets, 8 cores/socket, 8 threads/socket in the host partition. So despite hyperthreading/SMT being turned on, it looks like all 64 real cores are being exposed to the host partition.
See also here, where Microsoft recommends leaving SMT/hyperthreading on in all cases with WS2012:
http://blogs.technet.com/b/matthts/archive/2012/10/14/windows-server-sockets-logical-processors-symmetric-multi-threading.aspx
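
To put a number on the imbalance described above, here is a rough sketch that folds per-CPU utilization samples into per-node averages. It assumes 8 logical processors per node, numbered contiguously (as Task Manager appears to show on this box). The function names and the 95% threshold are made up for illustration, and it does not collect the samples itself.

```python
def node_utilization(per_cpu_percent, cpus_per_node=8):
    """Average per-CPU utilization (0-100) into one figure per NUMA node.

    Assumption: CPUs are listed node by node, cpus_per_node at a time.
    """
    if len(per_cpu_percent) % cpus_per_node != 0:
        raise ValueError("CPU count is not a multiple of cpus_per_node")
    averages = []
    for start in range(0, len(per_cpu_percent), cpus_per_node):
        node_cpus = per_cpu_percent[start:start + cpus_per_node]
        averages.append(sum(node_cpus) / cpus_per_node)
    return averages


def saturated_nodes(node_averages, threshold=95.0):
    """Return the indexes of nodes whose average is at or above threshold."""
    return [i for i, avg in enumerate(node_averages) if avg >= threshold]
```

Fed the 64 per-CPU figures from the symptom above (eight CPUs at 100%, the rest around 50%), saturated_nodes should single out node 0. This only describes the imbalance; it does not say which process is responsible.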

Similar Messages

  • Load testing and 100% CPU Utilization - Multiple JVMs?

    Hi,
    Problem:
    While stress testing the application with 20 simultaneous users, the Unix box is 100% utilized and response times degrade.
    One component of the application invokes a Unix script, which executes a Java program after setting CLASSPATH, PATH etc. I guess each execution therefore starts a new JVM. At the end of the program it calls System.exit(). This Java program generates around 15-60 barcode images. It takes 2 secs normally.
    With 5 and 10 concurrent users executing this Unix script for a steady-state time of 30 minutes, it also works fine, even though response time degrades to 4 secs.
    With 20 concurrent users, the Unix box with 4 CPUs is 100% utilized and response time degrades to 11 secs or so.
    We need to do this through a Unix script only because it is executed from the database tier (Oracle Reports - rdfs) and Java stored procs don't allow AWT operations (image file creation).
    I repeated this testing using a Java class which just loops for 2 secs. With 20 concurrent users, it also degraded to 6 secs or so, with CPU utilization fluctuating around 100%.
    Any pointers on what we should be analyzing further, and how we should try to solve this issue?
    Thanks,
    Ayyappa

    Stupid forums made me change the "screen name." Whatever...
    Anyway, it is resource intensive to invoke a new JVM per request. You can do several things, but it comes down to the fact that you will want to keep ONE JVM running in the background awaiting requests to do work. These requests can come from several mechanisms, such as polling a database table, sending requests over a local TCP socket, listening on a Unix fifo file, etc.
    When the shell script is exec'd, connect to your JVM (for example) by opening a socket to it, enter your request for work, and wait for the response. The JVM app will listen for and accept socket requests, thread off and process each one, and then return data. Something in this form will be substantially more scalable than what is currently being done.
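
    The shape of that suggestion, as a minimal sketch. It is written in Python only to keep it short; in the poster's case the long-lived process would be the Java barcode program. The line-based protocol, the do_work placeholder and the function names are all assumptions for illustration.

```python
import socket
import socketserver
import threading


def do_work(request):
    """Placeholder for the real job (e.g. generating barcode images)."""
    return "done:" + request


class WorkHandler(socketserver.StreamRequestHandler):
    """Reads one request line, runs the job, writes one reply line."""

    def handle(self):
        request = self.rfile.readline().decode().strip()
        self.wfile.write((do_work(request) + "\n").encode())


def start_worker(host="127.0.0.1", port=0):
    """Start the long-lived worker; each connection gets its own thread."""
    server = socketserver.ThreadingTCPServer((host, port), WorkHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def submit(address, request):
    """What the shell-script side would do: connect, ask, wait for reply."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall((request + "\n").encode())
        with conn.makefile("r") as reply:
            return reply.readline().strip()
```

    The worker pays its startup cost once; each request then costs a local connection and a thread rather than a fresh process.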

  • 100% CPU utilization on svchost.exe or Automatic Updates service

    We have upgraded from WSUS 2.0 to 3.0 SP1 and now have a few Windows XP SP2 PCs that are extremely slow because the CPU is at 100% utilization running a process called "svchost.exe."  If I go into Services and stop and disable the "Automatic Updates" service, the CPU drops to normal almost instantly.  I tried forcing a reinstall of the Windows Update Agent.  After I enable the "Automatic Updates" service the machine works fine for a day, then after a reboot it goes back to 100% CPU utilization.  We need this fixed so we can get these computers updated.

    Hi Ryan / Folks,
    NO CA Products here - but I have had the same issues with Microsoft updates!
    Here's the install path I used during my experience :
    Cold install of XP with SP1 on the PC (Full factory system restore/rebuild).
    Acer - Sempron 1.8GHz + 1GB RAM - 8Mb ADSL connection to Internet
    XP SP2
    Windows updates OK - used to install IE7
    Reason - I found IE is compromised if you go straight to XP SP3
    XP SP3
    Next Office 2003 Pro
    Switch to Microsoft updates - Custom Updates - SVCHost issue - Still checking for updates after 15 minutes
    Switch back to Windows updates - Custom Updates - NO SVCHost Issue - Checking complete after 2 minutes
    Tried both Microsoft fixes mentioned above
    http://support.microsoft.com/kb/927891
    Same report back - i.e. SP3 newer etc.
    http://support.microsoft.com/kb/943144 - Method 2
    Seems to install ok
    Reboot PC
    Switch back to manual Microsoft Updates and all is not really rosy, as the initial "checking updates" scan can take at least 5 minutes with SVCHost at better than 90% CPU usage. So I believe the issue is not fixed, but it is just about usable.
    Interestingly, no issues on my work LAN where I am the systems manager - 20 PCs and 8 servers using WSUS. All units are up to date and no SVCHost issues.
    So.... No fix yet here, however my solution is as follows:
    On the problematic PC I decided to switch back to Automatic Windows Updates. This keeps the PC up to date with all Operating System patches. Performance is not affected. I have decided that I will manually switch to the Microsoft update system once every couple of weeks or so to catch the updates for Office etc. I'll just have to set the updates scan running over a quiet period I suppose.
    Plan B = Manual Office Updates http://office.microsoft.com/en-gb/downloads/default.aspx - Left Pane - Office Updates.
    Hope this sheds some light.
    Regards,
    Knaphie
     

  • I am new to Mac and am having trouble with uploading images from my pictures folder to Facebook and other share sites- some of my images are accessible while others are seemingly locked....

    I am new to Mac and am having trouble uploading images from my pictures folder to photography sites and Facebook. Some of the saved images are accessible, while others are not (they are light colored and cannot be uploaded). I am not saving them any differently.

    Hi Robodisko,
    Thanks for your prompt reply...... 
    I often proof my work in preview then edit images in photoshop and rename from there.  The dsc images are renamed to correlate the name of the class etc.  i.e. dcs_001 is saved as the file name required 7A.jpeg etc.
    The file names are required to correlate what is what and so on..........

  • Why are some of the messages i send in a blue background while others are in a green background?

    why are some of the messages i send in a blue background while others are in a green background?

    Blue = iMessage...
    Green = Text message... SMS

  • Why are some of my movies not syncing while others are? All of my movies are in m4v form, and some of them sync but some of them don't.

    Why are some of my movies not syncing while others are? All of my movies are in m4v form, and some of them sync but some of them don't. Also, when I try to play my movies on my Mac AND iPod, some of my movies just randomly freeze and won't let me continue watching. I don't understand why it's doing this.

    iTunes: May be unable to transfer videos to iPhone, iPad, or iPod - http://support.apple.com/kb/TS1497

  • Few mails are not reaching to a recipient while other are receiving the same email

    Hi
    I have been facing a very strange issue for a few days.
    An email sent by a user to multiple internal recipients does not reach one recipient, while the others receive it.
    A few users are facing this issue. The mail reaches neither their Inbox nor their Junk folder.
    I am using Exchange Server 2013 CU2 and TrendMicro MailScan anti-spam.
    I have searched the logs in TrendMicro Mail Scan but nothing found.
    Please help.
    Thanks,
    Manoj 

    Hi,
    To narrow down the cause, I’d like to ask the following questions:
    1. Does the issue happen on random or regular recipients?
    2. Does the issue happen on all emails or only emails including many recipients?
    We can check the Maximum number of recipients per message through a Receive connector:
    http://technet.microsoft.com/en-us/library/bb124345(v=exchg.150).aspx
    3. Is there any NDR or error message?
    If you have any question, please feel free to let me know.
    Thanks,
    Angela Shi
    TechNet Community Support

  • IWS 6.0 100% CPU utilization hanging- very urgent

    Hi,
    We are using iPlanet Web Server 6.0 on Windows 2000 SP2. The problem we are facing is that after 10 concurrent users have logged in, the CPU utilization shoots up to 100% and we have to reboot the system.
    Our billing Application is affected very much due to this.
    Can anybody throw some light on this?
    Thanks in advance.

    Hi,
    Are you using any plugin with iWS? Please let me know your config file. Meanwhile, please check the Solaris tuning parameters for the performance benchmark.
    http://docs.iplanet.com/docs/manuals/enterprise/50/tuning/perf6.htm#17580
    Regards,
    Dakshin.

  • SDK-based Management Pack causes 100% CPU utilization for HealthService.exe

    We developed an MP that uses the SDK to create SCOM objects and insert a large number of performance counters (~2000 every 5-minute interval).
    On the SCOM side, the MP has about 2000 instances of UnitMonitor (only ~20 per class, but there are lots of actual objects). In these monitors we use a DataSource based on Microsoft.SystemCenter.TargetEntitySdkPerformanceDataProvider. All worked fine for a year, but lately we added a bunch of new objects/counters, and CPU utilization for MonitoringHost.exe started to spike to 100% a few seconds after the performance counters were posted via the SDK. The spike lasts for up to 3 minutes. No DB or network spikes observed. We suspect SCOM does not deal efficiently with SDK performance data - as if a separate SdkPerformanceDataProvider is started for every UnitMonitor when the counters are posted, rather than being cooked down to one per Target instance.
    Can anyone shed some light on this? I suspect only the Microsoft engineers would know.
    Our environment is like this:
    SCOM 2012 R2 UR4
    Windows Server 2012 8 Core, 16 GB Ram
    Thanks,
    Dave

    Hi Dave,
    Is there a way that you can reduce the number of performance insertions? Even if the CPU issue gets fixed, the impact on the database is something to consider in the long term. My point is that fixing one issue will just take us to the next one, and so on.
    I would reduce the number of insertions to only the "most critical" or "top #" counters. Also, aren't those counters already captured out of the box?
    hth
    Jose

  • JRun 100% CPU Utilization - Source: Attack

    When I came in this morning, I found my server (HP ML370, 2x Xeon 3.06GHz, 4GB RAM, Windows 2003, CF7, Java 1.4.2_05) extremely slow. Checking Task Manager, I found that JRun was taking up 100% of the CPU. Yikes!
    First I thought it was a JVM problem, so I went about applying the various tweaks: No improvement.
    Rebooted: No improvement.
    Went through the CF and JVM logs: Nothing of importance.
    Finally looked at the 404 error log: Blam! A ton of entries (where usually it's pretty empty). It appears that since midnight my website has been under some sort of attack from 84.32.147.110 (RIPE network machine). Refer to the attached code - this is a single URL request (the /directory names/ are correct, but not that deep on my server - it appears he's appending them one after another in a DoS attack or something).
    What is this? Once I tracked it down to this machine, I blocked its access at our firewall. Yes, even now, hours later, he's still trying to hammer our site. Once I blocked him, JRun went back down to around 3-15% (leaning more towards the 3%) and everything went back to normal.
    I've written an abuse complaint to his ISP in Lithuania (I'm not holding my breath).
    What else can I do to avoid future attacks?
    Has anyone heard of anything like this?
    Help!!!!

    The only difference I can discern is that on the machine that doesn't work, the root drive is E
    hmmmm. if you have the drive letter for the root drive that high, it's probably worth checking to see if you're getting a Windows drive letter confusion. for troubleshooting advice on that, see:
    Windows confuses iPod with network drive or hard drive and may keep iPod from mounting or songs may seem to disappear

  • 100% CPU utilization issue by Adobe Form

    Hi
    I am creating a new Adobe Interactive Form in my web dynpro java application.
    When I try to run that application on the server, with only 2-3 UI elements in the interactive form, it uses 100% CPU and the Adobe form does not render.
    Some other Adobe forms are already working properly on the server.
    Has anybody faced such an issue, or does somebody know what may cause it? Please help me out.
    Thanks
    Ravi

    Clear the Temp folders: go to START->RUN and type TEMP. You will find processed forms that are stored locally in your system Temp folder; select all and delete.
    Also go to START->RUN, type %TEMP%, and delete the corresponding files in that folder.
    Kanagaraja L

  • 100% cpu utilization

    We have been struggling for a few months now trying to figure out this problem. The application runs fine for a few days (or even a few weeks in some instances), and then the load average (uptime) goes to 1 and the CPU becomes 100% utilized. If I leave it running, in a few hours the load average starts going up. Then the entire application locks up when the load average is around 15. Some of our applications have actually been running for a few months before this happens.
    We are running on Linux. The problem shows itself much faster on Redhat 9; it's not as bad on Redhat 8. Also, it's much better under the 1.4.2_02 JVM, and pretty bad on 1.4.2_06. It also happens in 1.5.0_01. It looks to be most stable with 1.5.0_01 with ConcurrentGC. We have a few dozen servers each running at least 2 JVMs, and tested every combination possible.
    Our application is heavily threaded and custom built. We have a load test running at 10 times the number of threads and cannot replicate this problem within a few days, so I believe it has nothing to do with the number of threads and the way we use them.

    General technique for Java apps that use 100% CPU is to take a series of thread dumps (send SIGQUIT on UNIX, Ctrl-Break on Windows), once a second for 10 seconds.
    When you analyze these dumps, look for threads that are consistently in Runnable state. If you've got some sort of loop running, you'll see the same set of threads Runnable in all the traces, and you'll see the stack trace showing what they're executing.
    Also, search the Bug Parade for your symptoms. Link is available from the main java.sun.com page.
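
    Comparing ten dumps by eye gets tedious, so here is a rough sketch of the "same threads Runnable in every dump" check. It assumes HotSpot-style dumps where each thread starts with a header line beginning with the quoted thread name, and where the state appears either as the word "runnable" on that header or on a following "Thread.State: RUNNABLE" line. Other JVMs may format this differently, and the function names are made up.

```python
import re

# Assumed header shape: "thread-name" prio=... tid=... nid=... runnable [...]
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def runnable_threads(dump_text):
    """Return the names of threads reported as runnable in one dump."""
    names = set()
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            if "runnable" in line.lower():
                names.add(current)
        elif current and "Thread.State: RUNNABLE" in line:
            names.add(current)
    return names


def always_runnable(dumps):
    """Return the threads that are runnable in every dump of the series."""
    per_dump = [runnable_threads(text) for text in dumps]
    return set.intersection(*per_dump) if per_dump else set()
```

    Threads that survive the intersection are the ones worth reading stack traces for. Note that a thread blocked in native I/O can also show as runnable, so the result is a shortlist, not a verdict.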

  • 100% CPU Utilization on

    I have a 1U Server purchased in August, 2003.  The motherboard model is MS-9129, motherboard chipset is Intel 845E ( Brookdale-E) + ICH4.  Bios is Phoneix Award 6.00 PG, dated 12/10/2002.  We are running Windows 2003 Standard Edition, SP1
    The system is currently operating at 100% CPU, with the System process consuming all otherwise unused capacity.  Within the System process, an ACPI thread is utilizing all the resources.  Our analysis, together with Microsoft Tech Support, suggests upgrading the BIOS to a more current version.
    Two Questions:
    1) Have you experienced this, and do you concur with the solution, or do you have a better one?
    2) What, and where can I get it, is the most current BIOS?  Do you have Installation instructions?
    Thanks
    Phil Rounds
    Time is of the essence here.

    I've run a virus scan showing no viruses, and Symantec AV is up to date and running.
    I can't get a screen shot, as there is no software on the server capable of getting one, and I can't load anything when it is this slow.  Besides, all diagnostics I have, or that Microsoft has had me run, show that it is the ACPI subsystem which is causing the problem.  I have disabled all software other than the basic operating system, have assured that the OS up to date and still have the same issues.
    MSI's LiveUpdate shows the BIOS to be version 1.0, and the most up-to-date one for this system seems to be 1.3.  Does anyone know how this translates to Phoenix's numbering system?  All the system diagnostic tools I have show my current BIOS to be Phoenix 6.00.  Is the MSI 1.3 BIOS really an upgrade?
    Thanks

  • 100% CPU Utilisation while reading from socket

    I have written a ServerSocket program to read data from an Ethernet client bridge. The client bridge is a system that converts serial data to TCP packets and vice versa. I can read from the client bridge only by creating a server socket and waiting for the client bridge to make a client socket connection (the ServerSocket.accept() method).
    The problem I'm facing is that the client bridge creates a connection only once. I have created an infinite loop which keeps on reading from the socket. The performance of the application goes down drastically (100% CPU utilisation in Task Manager) when this application is run. Is there a better way of reading data from the client bridge? The client bridge is not in our control, and the server can only wait for the client to make a connection.
    Please help. Thanking you all in anticipation.
    Regards,
    Amitabha

    "If I close the client socket, will the client create a new socket to transmit data, or does it depend on the client application? I have also tried creating a new socket connection for reading each frame of the protocol, but the client creates only one socket. What might be the problem?"
    I bet it depends on the client app, because your server app only waits for a connection from the client. So it's the client that initiates the connection. When you disconnect the client socket, you will have to wait for another client request. What you can do is stay connected (unless of course the client disconnects itself) and just keep on reading from the client socket.
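
    A sketch of "stay connected and keep reading" that should not spin the CPU. The original code was not posted, so the assumption here is that the busy loop polled the socket without ever blocking. The sketch is in Python for brevity (the thread's code is Java, where a blocking InputStream.read() plays the same role), and the function names are made up.

```python
import socket


def read_until_closed(conn, handle_chunk, bufsize=4096):
    """Read from a connected socket until the peer closes it.

    recv() blocks until data arrives, so the loop sleeps in the kernel
    between frames instead of burning CPU. Returns the total bytes read.
    """
    total = 0
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:  # empty result means the peer closed the connection
            break
        handle_chunk(chunk)
        total += len(chunk)
    return total


def serve_one_client(listener, handle_chunk):
    """Accept a single connection (the bridge connects once) and drain it."""
    conn, _peer = listener.accept()
    with conn:
        return read_until_closed(conn, handle_chunk)
```

    Note that TCP does not preserve frame boundaries, so handle_chunk may receive partial or merged frames and has to reassemble them according to the bridge's protocol.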

  • FF 20 and Flash 11 = 100% CPU - Opera, IE8, Chrome are all OK. FIX IT!

    I am absolutely sick and tired of hearing excuses about why FF is not to be blamed when it chews 100% CPU with the Adobe Flash plugin.
    I have done all the dancing around I am willing to endure. I have even downgraded to FF18 and FF19. No friggin setting works. Nothing. IT IS IN THE CODE!! FIX IT!!!
    It is simply NOT acceptable that ANY other browser in the known universe works OK with Flash BUT FF.
    It is time that devs acknowledge this and do something about it instead of scratching their butts!
    PLEASE do not get me more links to troubleshooting pages, either from Windows, Adobe or FF.
    I'll repeat: IT IS IN THE FRIGGIN CODE. FIX IT!!!

    Also, IE uses an ActiveX version of Flash, and Chrome comes with its own Flash, though it can use the Adobe version if the Pepper version is disabled.
    Adobe has been making a buggy Plugin for Windows ever since they came out with the Flash 11.3 version.
