
'System' process CPU monitoring

Yesterday I had a virtual machine that was running extremely slowly. Looking in Task Manager / Performance Monitor on the VM, I could see that CPU was maxed out at 100% constantly, and that the 'System' process was taking around 50% CPU all the time.

Looking at the same machine for the same time period in uberAgent, I don't see the 'System' process CPU consumption on the Process Performance dashboard. Is there a way to see this? The dashboard paints a different picture of CPU utilisation than the information on the host.

Thanks

14 comments

  • Dominik Britz (Official comment)

    Hi Paul,

    I deleted my old comment because it was partly wrong and confusing. Sorry for the confusion. Here is our new answer:

    The processes with the IDs 0 and 4 represent the kernel (System Idle Process and System, respectively). Determining where in the kernel the computing time is spent can be quite complex, so we don't even attempt it (yet).

    However, the information alone that a lot of computing time is spent in the kernel could be interesting. In Task Manager you can see this by the process with the PID 4 (System), in uberAgent only indirectly, e.g. by comparing the CPU load of the machine with the sum of the CPU loads of all processes: the difference would roughly correspond to the PID 4.

    Of course, this is not very practical, so the wish that we no longer exclude PID 4 (System) is legitimate. I have created a backlog entry for this.
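
    The indirect approach described above (machine CPU minus the sum of all process CPU values) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: estimate_kernel_cpu is a hypothetical helper, not part of uberAgent or Splunk, and the sample numbers are made up to mirror the situation in the question.

```python
from typing import Iterable


def estimate_kernel_cpu(machine_cpu_percent: float,
                        process_cpu_percents: Iterable[float]) -> float:
    """Estimate kernel (PID 4) CPU as machine CPU minus the sum of process CPU.

    Hypothetical helper for illustration only. The result is clamped to
    the range 0..100 because sampling jitter can make the per-process sum
    slightly exceed the machine total.
    """
    accounted = sum(process_cpu_percents)
    return min(max(machine_cpu_percent - accounted, 0.0), 100.0)


# Made-up sample: the machine is at 100 %, the visible processes add up to 50 %.
kernel_estimate = estimate_kernel_cpu(100.0, [30.0, 15.0, 5.0])
print(f"Estimated kernel (PID 4) CPU: {kernel_estimate:.1f} %")
```

    The estimate is only as good as the two inputs: both values need to cover the same machine and the same time window, otherwise the difference is meaningless.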

  • Dominik Britz

    Hi Paul,

    I did a quick check and can't see System either. I'm going to ask the developers and get back to you as soon as possible.

  • Paul Maddocks

    Hi

    Thanks for the quick feedback. I understand the technical challenge and appreciate you adding a feature to the backlog to investigate this further for future releases if possible.

    Could I be cheeky and ask if your Devs could give me a simple Splunk query that could be used to calculate this inline with a search on that index? That would really help the community out and save us trying to work it out...

    Thanks,

  • Dominik Britz

    Hey Paul,

    Could you possibly test the following search? Please change the filter to the same machine in both searches.

    On my machine, MachineCPUUsage and ProcessCPUUsage are always the same as I can't find an easy way to stress the kernel. On your machine, the values should differ.

    | pivot `uA_DM_Process_ProcessDetail` Process_ProcessDetail 
    sum(ProcCPUPercent) as MachineCPUUsage
    splitrow
    _time
    period minute
    filter host is ""
    | join type=outer _time
    [
    | pivot `uA_DM_Process_ProcessDetail` Process_ProcessDetail
    sum(ProcCPUPercent) as ProcessCPUUsage
    splitrow
    _time
    period minute
    filter host is ""
    | fields + ProcessCPUUsage
    ]
  • Paul Maddocks

    Hi

    Thanks for this, we will give it a go across the machine / time period where we saw the behaviour and let you know.

    Thanks.

  • Bob Franken

    I've run the query and it gives me the same chart for both runs. Also the numbers are identical.

  • Dominik Britz

    Try this one. Again you need to change the host filter.

    | pivot `uA_DM_Session_SessionDetail_Users` Session_SessionDetail_Users
    values(SessionCPUUsagePercent) as ValuesSessionCPUUsagePercent
    splitrow
    _time period minute
    filter host is ""
    | stats
    sum(ValuesSessionCPUUsagePercent) as SumSessionCPUUsagePercent
    count(ValuesSessionCPUUsagePercent) as CountSessionCPUUsagePercent
    by _time
    | join type=outer _time
    [
    | pivot `uA_DM_System_SystemPerformanceSummary` System_SystemPerformanceSummary
    values(CPUUsagePercent) as ValuesMachineCPUUsagePercent
    splitrow
    _time period minute
    filter host is ""
    | stats
    sum(ValuesMachineCPUUsagePercent) as SumMachineCPUUsagePercent
    count(ValuesMachineCPUUsagePercent) as CountMachineCPUUsagePercent
    by _time
    | fields + SumMachineCPUUsagePercent CountMachineCPUUsagePercent
    ]
    | timechart
    `uberAgent_dynamic_span`
    eval(sum(SumMachineCPUUsagePercent)/sum(CountMachineCPUUsagePercent)) as "Machine CPU (%)"
    eval(sum(SumSessionCPUUsagePercent)/sum(CountSessionCPUUsagePercent)) as "All sessions CPU (%)"
  • Bob Franken

    That query shows different results. The Machine graph shows high CPU utilization, and the session graph shows much lower utilization.

  • Dominik Britz

    Sounds good! In that case, the kernel, or what you see as "System" in Task Manager, is doing some heavy lifting while all other processes are working normally. That's exactly what Paul wrote initially.

  • Bob Franken

    OK, so Machine CPU is all CPU used by the machine, but All sessions CPU is just the user processes. This seems to be working.

    The Machine graph for the machine shows this utilization, but it doesn't show up in the applications or processes graph.

  • Dominik Britz

    "The Machine graph for the machine shows this utilization, but it doesn't show up in the applications or processes graph."

    Do you mean on our dashboards? What I created is a custom search; it's not automatically part of uberAgent's dashboards.

  • Sean Corrigan

    Sorry to bump this over a year later, but I am experiencing the same issue. However, the custom query Dominik put up doesn't seem to be delivering any stats at all for the "Machine CPU" column.

    I am trying to pin down the same behaviour: the VM is maxed out according to Task Manager in the guest, but uberAgent reports barely 25% CPU usage. Even more annoyingly, I'm seeing a third number for CPU usage in VMware, so I'm not sure whether the guest CPU is high due to IO constraints elsewhere, which I am going to look into. But the main issue is that there is no record of the overall CPU usage; we're only getting the number for user processes.

  • Dominik Britz

    Hi Sean,

    I've tested the search again and it still works fine.

    Be sure to use this one and remember to change the hostname in the two rows with filter host is "".

    Collecting the System's user CPU usage is on our backlog. I've just added you to the list of interested companies so you get notified once it's available.

    Best regards
    Dominik

  • Sean Corrigan

    Sorry, I missed that there were two places to put the hostname in, not one. D'oh.

    Thanks.

    And yes definitely thanks for putting us on the list. It seems we have a lot of apps that do a lot of work not as the user. It is causing some quite major headaches in terms of visibility.
