'System' process CPU monitoring

Yesterday I had a virtual machine that was running extremely slowly. Looking in Task Manager / Performance Monitor on the VM, I could see that the CPU was constantly maxed out at 100%, and that the 'System' process was taking around 50% CPU the whole time.

Looking at the same machine for the same time period in uberAgent, I don't see the 'System' process CPU consumption on the Process Performance dashboard. Is there a way to see this? As it stands, the dashboard paints a different picture of CPU utilisation than the information on the host.

Thanks

11 comments

  • Dominik Britz (Official comment)

    Hi Paul,

    I deleted my old comment because it was partly wrong and confusing. Sorry for the confusion. Here is our new answer:

    The processes with the IDs 0 and 4 represent the kernel (System Idle and System, respectively). Determining where in the kernel the computing time is spent can be quite complex, so we don't even attempt it (yet).

    However, the mere fact that a lot of computing time is spent in the kernel can be interesting. In Task Manager you can see this in the process with PID 4 (System); in uberAgent you can see it only indirectly, e.g. by comparing the CPU load of the machine with the sum of the CPU loads of all processes: the difference roughly corresponds to PID 4.

    Of course, this is not very practical, so the wish to no longer exclude PID 4 (System) is legitimate. I have created a backlog entry for this.

  • Dominik Britz

    Hi Paul,

    I did a quick check and can't see System either. I'm going to ask the developers and get back to you as soon as possible.

  • Paul Maddocks

    Hi

    Thanks for the quick feedback. I understand the technical challenge and appreciate you adding a feature to the backlog to investigate this further for future releases if possible.

    Could I be cheeky and ask if your Devs could give me a simple Splunk query that could be used to calculate this inline with a search on that index? That would really help the community out and save us trying to work it out...

    Thanks,

  • Dominik Britz

    Hey Paul,

    Could you possibly test the following search? Please change the filter to the same machine in both searches.

    On my machine, MachineCPUUsage and ProcessCPUUsage are always the same as I can't find an easy way to stress the kernel. On your machine, the values should differ.

    | pivot `uA_DM_Process_ProcessDetail` Process_ProcessDetail 
    sum(ProcCPUPercent) as MachineCPUUsage
    splitrow
    _time
    period minute
    filter host is ""
    | join type=outer _time
    [
    | pivot `uA_DM_Process_ProcessDetail` Process_ProcessDetail
    sum(ProcCPUPercent) as ProcessCPUUsage
    splitrow
    _time
    period minute
    filter host is ""
    | fields + ProcessCPUUsage
    ]
  • Paul Maddocks

    Hi

    Thanks for this. We will give it a go across the machine / time period where we saw the behaviour and let you know.

    Thanks.

  • Bob Franken

    I've run the query and it gives me the same chart for both runs. Also the numbers are identical.

  • Dominik Britz

    Try this one. Again you need to change the host filter.

    | pivot `uA_DM_Session_SessionDetail_Users` Session_SessionDetail_Users
    values(SessionCPUUsagePercent) as ValuesSessionCPUUsagePercent
    splitrow
    _time period minute
    filter host is ""
    | stats
    sum(ValuesSessionCPUUsagePercent) as SumSessionCPUUsagePercent
    count(ValuesSessionCPUUsagePercent) as CountSessionCPUUsagePercent
    by _time
    | join type=outer _time
    [
    | pivot `uA_DM_System_SystemPerformanceSummary` System_SystemPerformanceSummary
    values(CPUUsagePercent) as ValuesMachineCPUUsagePercent
    splitrow
    _time period minute
    filter host is ""
    | stats
    sum(ValuesMachineCPUUsagePercent) as SumMachineCPUUsagePercent
    count(ValuesMachineCPUUsagePercent) as CountMachineCPUUsagePercent
    by _time
    | fields + SumMachineCPUUsagePercent CountMachineCPUUsagePercent
    ]
    | timechart
    `uberAgent_dynamic_span`
    eval(sum(SumMachineCPUUsagePercent)/sum(CountMachineCPUUsagePercent)) as "Machine CPU (%)"
    eval(sum(SumSessionCPUUsagePercent)/sum(CountSessionCPUUsagePercent)) as "All sessions CPU (%)"
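
    To chart the gap between the two series directly, one option is to append an eval to the end of the search above. This is only a sketch: the output field name "Not attributed to sessions (%)" is made up for illustration, and the difference covers everything that is not attributed to a user session (the kernel, and possibly services or other non-session processes as well), so it is at best a rough estimate for PID 4.

    ```spl
    | eval "Not attributed to sessions (%)" = round('Machine CPU (%)' - 'All sessions CPU (%)', 1)
    ```

    The single quotes on the right-hand side are needed because the field names produced by the timechart contain spaces and parentheses.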
  • Bob Franken

    That query shows different results: the machine series shows high CPU utilization, and the session series shows much lower utilization.

  • Dominik Britz

    Sounds good! In that case, the kernel, or what you see as "System" in Task Manager, is doing some heavy lifting while all other processes are working normally. That's exactly what Paul wrote initially.

  • Bob Franken

    OK, so Machine CPU is all CPU used by the machine, but All sessions CPU is just the user processes. This seems to be working.

    The Machine graph for the machine shows this utilization, but it doesn't show up in the applications or processes graph.

  • Dominik Britz

    "The Machine graph for the machine shows this utilization, but it doesn't show up in the applications or processes graph."

    Do you mean on our dashboards? What I created is a custom search, it's not automatically part of uberAgent's dashboard.
