Monitoring Steal Time with NMON

March 2, 2015

For those of you who have been reading our article EC2 Monitoring: The Case of the Stolen CPU, the question of how to properly monitor stolen CPU and virtual machine performance is an ongoing discussion.

Axibase has taken an important step in monitoring virtual machines and tracking CPU steal time by releasing a new open source fork of the nmon tool for Linux systems.

The sole purpose of the new nmon fork is to collect and display CPU steal time. The tool is useful for monitoring micro-partitioned virtual machines, such as micro and small instances on the Amazon EC2 cloud, which are subject to dynamic CPU capping.
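For readers who want to see where a steal percentage can come from on Linux, here is a minimal Python sketch. It assumes the standard /proc/stat layout, in which steal is the eighth counter on the aggregate `cpu` line (kernel 2.6.11 or later). It is not taken from the nmon fork's source, and the two sample lines are hypothetical values made up for illustration.

```python
def parse_cpu_line(line):
    """Return the jiffy counters from an aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def steal_percent(before, after):
    """Percentage of elapsed CPU time reported as steal between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Field order: user nice system idle iowait irq softirq steal [guest guest_nice]
    # Guest time is already counted in user time, so only the first 8 are summed.
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    return 100.0 * deltas[7] / total


# Hypothetical samples (not real measurements), taken some interval apart.
sample_1 = "cpu  1000 0 200 5000 50 0 10 300 0 0"
sample_2 = "cpu  1482 0 213 5000 50 0 10 805 0 0"
print(round(steal_percent(sample_1, sample_2), 1))
```

On a live system the two samples would be read from the first line of /proc/stat a few seconds apart.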

On the Amazon EC2 cloud, a VM accumulates CPU credits while it is idle. When the VM becomes busy, it burns through any available CPU credits by consuming excess cycles from a shared CPU pool until it is capped at its entitlement. This behavior is specific to T2 instances, which are designed for workloads that rarely use the full CPU but occasionally need a burst of performance.

A CPU credit provides the performance of a full CPU core for one minute. When CPU credits are not being utilized, the instance provides baseline CPU performance. The rate at which CPU credits are earned depends on the instance size. Micro instances earn 6 credits per hour, small instances earn 12, and medium instances earn 24. Unused credits are stored for up to 24 hours, at which point they expire.
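The credit figures above lend themselves to some quick arithmetic. The following Python sketch is a simplified model derived only from the numbers in this article: it treats one credit as one core-minute, expresses the baseline as a share of a single core, and ignores any other accounting rules AWS may apply. The function names are hypothetical.

```python
# Earn rates quoted in the article, in credits per hour.
CREDITS_PER_HOUR = {"micro": 6, "small": 12, "medium": 24}
EXPIRY_HOURS = 24  # unused credits expire after 24 hours


def baseline_percent(size):
    """Sustainable load as a percentage of one core (credits earned per 60 minutes)."""
    return 100.0 * CREDITS_PER_HOUR[size] / 60


def max_balance(size):
    """Largest balance that can build up before the oldest credits expire."""
    return CREDITS_PER_HOUR[size] * EXPIRY_HOURS


def minutes_until_capped(size, balance, cpu_percent):
    """Minutes of sustained load (percent of one core) before the balance reaches zero."""
    burn_per_minute = cpu_percent / 100.0
    earn_per_minute = CREDITS_PER_HOUR[size] / 60.0
    net_burn = burn_per_minute - earn_per_minute
    if net_burn <= 0:
        return float("inf")  # load is at or below baseline, credits never run out
    return balance / net_burn


for size in CREDITS_PER_HOUR:
    full = max_balance(size)
    print(size, baseline_percent(size), full,
          round(minutes_until_capped(size, full, 100)))
```

Under this model a micro instance with a full balance of 144 credits could run one core at 100% for about 160 minutes before being capped at its 10% baseline, which is the point where steal time would be expected to rise.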

CloudWatch shows the CPU usage cycles for virtual machines. After the CPU credits are burned, the stolen CPU goes up significantly:

Below is the same chart that displays the breakdown in CPU usage with steal time prominently present. This graph is generated in the Axibase Time Series Database from the statistics gathered using the new Axibase nmon tool fork:

Changes have been made to enable steal time viewing in both nmon console and file modes:

CPU Steal Time in nmon header:

CPU001,CPU 1 ip-172-30-0-223,User%,Sys%,Wait%,Idle%,Steal%

CPU_ALL,CPU Total ip-172-30-0-223,User%,Sys%,Wait%,Idle%,Steal%,Busy,CPUs

CPU Steal Time in nmon snapshot:

CPU001,T0001,48.2,1.3,0.0,0.0,50.4

CPU_ALL,T0001,48.2,1.3,0.0,0.0,50.4,,1
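The header and snapshot lines above are plain comma-separated records, so the Steal% column can be pulled out with a few lines of Python. The sketch below is one possible parser, not part of nmon or ATSD: it assumes that the second field of a snapshot line is a `T` followed by digits, and that any other CPU line is a header. The function name `parse_nmon_cpu` is hypothetical.

```python
def parse_nmon_cpu(lines):
    """Pair each CPU snapshot line with the column names from its header line."""
    headers = {}    # section tag -> list of column names
    snapshots = []  # (tag, timestamp id, {column: value})
    for raw in lines:
        parts = raw.strip().split(",")
        if len(parts) < 3 or not parts[0].startswith("CPU"):
            continue
        tag, second = parts[0], parts[1]
        if second.startswith("T") and second[1:].isdigit():
            names = headers.get(tag)
            if names is None:
                continue  # snapshot seen before its header, skip it
            values = {}
            for name, cell in zip(names, parts[2:]):
                # Empty cells (such as Busy in CPU_ALL) are kept as None.
                values[name] = float(cell) if cell else None
            snapshots.append((tag, second, values))
        else:
            # Header line: the second field is a description, the rest are columns.
            headers[tag] = parts[2:]
    return snapshots


# Lines copied from the examples in this article.
sample = [
    "CPU001,CPU 1 ip-172-30-0-223,User%,Sys%,Wait%,Idle%,Steal%",
    "CPU_ALL,CPU Total ip-172-30-0-223,User%,Sys%,Wait%,Idle%,Steal%,Busy,CPUs",
    "CPU001,T0001,48.2,1.3,0.0,0.0,50.4",
    "CPU_ALL,T0001,48.2,1.3,0.0,0.0,50.4,,1",
]

for tag, stamp, values in parse_nmon_cpu(sample):
    print(tag, stamp, values["Steal%"])
```

In this snapshot roughly half of the CPU time is reported as stolen, while user time accounts for most of the remainder.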

CPU Steal Time in nmon console:

To learn more about the Axibase fork of the nmon system performance monitoring tool, please visit the GitHub pages:

nmon version 15a: https://github.com/axibase/nmon/tree/15a

nmon version 14i: https://github.com/axibase/nmon

To learn more about the Axibase Time Series Database, a tool that provides analysis, visualization, alerting and capacity planning for machines monitored with nmon, please visit the ATSD pages:

Axibase Time Series Database

ATSD nmon guide

Axibase Time Series Database can also collect Amazon CloudWatch metrics for advanced analytics and long-term retention. Learn about Amazon Web Services integration.