Axibase Time Series Database collects Docker container performance metrics through Google cAdvisor (Container Advisor) for long-term retention, analytics and visualization. A single ATSD instance can collect metrics from multiple Docker hosts and cAdvisor instances.
In a basic configuration, cAdvisor monitors all running containers on the Docker host and sends container statistics over TCP to an ATSD container installed on the same host. When a new container is launched, cAdvisor discovers it automatically and continuously sends its statistics to ATSD for as long as the container is running.
In an advanced configuration, multiple cAdvisor instances can be configured to send container statistics to a centralized ATSD installation. ATSD will store metrics from local and remote Docker hosts for consolidated reporting and analytics. This type of configuration is suited for centralized workload planning, capacity planning and performance monitoring.
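As a rough illustration of the transport described above, the sketch below formats container statistics as plain-text lines and writes them to a TCP socket. It assumes ATSD's network `series` command syntax (`series e:<entity> ms:<epoch-ms> m:<metric>=<value> t:<tag>=<value>`) and a TCP listener on the ATSD host (port 8081 is a common default; verify it for your installation). The entity and tag names are hypothetical, values are not escaped, and this is not how cAdvisor itself is implemented. In a multi-host setup, a tag identifying the Docker host is one way to keep series from different hosts distinguishable.

```python
import socket
import time


def format_series(entity, metrics, tags=None, timestamp_ms=None):
    """Build one 'series' command line (ATSD network command format assumed).

    No escaping is done, so names and values must not contain spaces or quotes.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    parts = ["series", f"e:{entity}", f"ms:{timestamp_ms}"]
    for name, value in metrics.items():
        parts.append(f"m:{name}={value}")
    for name, value in (tags or {}).items():
        parts.append(f"t:{name}={value}")
    return " ".join(parts) + "\n"


def send_commands(host, port, lines):
    """Send newline-terminated commands over a plain TCP connection."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))


if __name__ == "__main__":
    line = format_series(
        "abc123def456",  # hypothetical container entity name
        {"cpu.usage.total%": 12.5},
        tags={"docker-host": "docker-host-1"},  # hypothetical tag
        timestamp_ms=1700000000000,
    )
    print(line, end="")
    # send_commands("atsd.example.com", 8081, [line])  # hypothetical host/port
```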
Launch ATSD container as the back-end for cAdvisor
Installation steps are described here.
In addition to collecting container statistics, we recommend installing the collectd agent on each Docker host.
Built-in entity groups and portals
Default visualization portals for cAdvisor entities are included in ATSD.
Default cAdvisor portal names:
Cadvisor Disk Detail
Using the built-in Overview, Disk Detail, Host and Multi-Host visualization portals, you can quickly identify bottlenecks in your microservices infrastructure.
cAdvisor Overview Portal
cAdvisor Disk Detail Portal
cAdvisor Host Portal
cAdvisor Multi-Host Portal
Collected cAdvisor Metrics
CPU metrics are found in the cpuacct controller. CPU usage generated by the processes of the container is broken down into user and system time. User time is the time during which the processes were in direct control of the CPU, and system time is the time during which the CPU was executing system calls on behalf of those processes. These times are expressed in ticks of 1/100th of a second.
cpu.loadaverage cpu.loadaverage% cpu.usage.percpu cpu.usage.percpu% cpu.usage.system cpu.usage.system% cpu.usage.total cpu.usage.total% cpu.usage.user cpu.usage.user% cpu.host.usage.system% cpu.host.usage.total% cpu.host.usage.user%
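For example, 150 user ticks accumulated over a 10-second interval equal 1.5 CPU-seconds, or 15% of one CPU. The sketch below applies that arithmetic to two samples of a cumulative tick counter. It is an illustration only and not necessarily how cAdvisor or ATSD compute the `%` metrics listed above; the function name and the `num_cpus` parameter are hypothetical.

```python
TICKS_PER_SECOND = 100  # cpuacct ticks are 1/100th of a second


def cpu_percent(ticks_start, ticks_end, interval_seconds, num_cpus=1):
    """Share of total CPU capacity used between two cumulative tick samples.

    Returns a percentage of the capacity of `num_cpus` CPUs, or None if the
    counter went backwards (for example, because the container restarted).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta_ticks = ticks_end - ticks_start
    if delta_ticks < 0:
        return None
    cpu_seconds = delta_ticks / TICKS_PER_SECOND
    return 100.0 * cpu_seconds / (interval_seconds * num_cpus)


if __name__ == "__main__":
    # 150 ticks over 10 seconds on one CPU
    print(cpu_percent(1000, 1150, 10))  # 15.0
```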
Block I/O is accounted in the blkio controller.
io_service_bytes – the number of bytes read and written by the cgroup. It has 4 counters per device because, for each device, it distinguishes between synchronous and asynchronous I/O and between reads and writes.
io_serviced – the number of I/O operations performed, regardless of their size. It also has 4 counters per device.
diskio.ioservicebytes.async diskio.ioservicebytes.read diskio.ioservicebytes.sync diskio.ioservicebytes.total diskio.ioservicebytes.write diskio.ioserviced.async diskio.ioserviced.read diskio.ioserviced.sync diskio.ioserviced.total diskio.ioserviced.write
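To make the per-device counters concrete, here is a minimal parser for the cgroup v1 `blkio.io_service_bytes` file, whose lines have the form `MAJOR:MINOR Op Value` followed by a final `Total N` summary line. The sample content is made up, and matching the lowercase operation names to the `diskio.ioservicebytes.*` suffixes is an illustrative assumption. `blkio.io_serviced` uses the same layout, so the same parser applies to it.

```python
from collections import defaultdict

# Made-up sample: Read + Write == Sync + Async == Total
SAMPLE = """\
8:0 Read 1048576
8:0 Write 524288
8:0 Sync 1310720
8:0 Async 262144
8:0 Total 1572864
Total 1572864
"""


def parse_io_service_bytes(text):
    """Parse blkio.io_service_bytes content into {device: {op: bytes}}."""
    counters = defaultdict(dict)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            # Skips the trailing "Total N" summary line and blank lines.
            continue
        device, op, value = fields
        counters[device][op.lower()] = int(value)
    return dict(counters)


if __name__ == "__main__":
    for device, ops in parse_io_service_bytes(SAMPLE).items():
        print(device, ops)
```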
Memory metrics are found in the “memory” cgroup.
pgfault, pgmajfault – the number of times that a process of the cgroup triggered a page fault and a major fault, respectively. A page fault happens when a process accesses a part of its virtual memory space which is nonexistent or protected. A major fault happens when the kernel actually needs to read the data from disk.
memory.usage – the amount of all used memory, regardless of when it was accessed.
memory.workingset – the amount of memory that processes require in a given time interval. This includes recently accessed memory, dirty memory and kernel memory.
memory.containerdata.pgfault memory.containerdata.pgmajfault memory.hierarchicaldata.pgfault memory.hierarchicaldata.pgmajfault memory.usage memory.workingset cadvisor.memory.cache cadvisor.memory.rss
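The difference between `memory.usage` and `memory.workingset`, and between the two fault counters, comes down to simple arithmetic. This sketch assumes the working set is estimated as usage minus inactive file-backed memory (the `total_inactive_file` entry of cgroup v1 `memory.stat`), which is our understanding of recent cAdvisor versions rather than something stated above. Because `pgfault` counts all page faults, subtracting `pgmajfault` leaves the minor ones. The function names are hypothetical.

```python
MIB = 1024 * 1024


def working_set_bytes(usage_bytes, inactive_file_bytes):
    """Estimate working set as usage minus inactive file cache, floored at zero."""
    if inactive_file_bytes > usage_bytes:
        return 0
    return usage_bytes - inactive_file_bytes


def minor_faults(pgfault, pgmajfault):
    """pgfault counts every page fault; the non-major remainder are minor faults."""
    return pgfault - pgmajfault


if __name__ == "__main__":
    # 500 MiB in use, of which 200 MiB is inactive file cache
    print(working_set_bytes(500 * MIB, 200 * MIB) // MIB)  # 300
    print(minor_faults(1000, 30))  # 970
```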
Network metrics track the number of packets received and sent, the amount of traffic in bytes, dropped packets and errors.
network.rxbytes – cumulative count of bytes received.
network.rxpackets – cumulative count of packets received.
network.rxerrors – cumulative count of receive errors encountered.
network.rxdropped – cumulative count of packets dropped while receiving.
network.txbytes – cumulative count of bytes transmitted.
network.txpackets – cumulative count of packets transmitted.
network.txerrors – cumulative count of transmit errors encountered.
network.txdropped – cumulative count of packets dropped while transmitting.
network.rxbytes network.rxdropped network.rxerrors network.rxpackets network.txbytes network.txdropped network.txerrors network.txpackets
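Because the network counters are cumulative, charts usually show a per-second rate computed from two consecutive samples. A minimal sketch, assuming samples carry Unix timestamps in seconds and that a decreasing value means the counter was reset (for example, by a container restart), in which case no rate is reported for that interval:

```python
def counter_rate(prev_value, curr_value, prev_ts, curr_ts):
    """Per-second rate between two samples of a cumulative counter.

    Returns None when the counter decreased (treated as a reset) or when the
    timestamps are not strictly increasing.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0 or curr_value < prev_value:
        return None
    return (curr_value - prev_value) / elapsed


if __name__ == "__main__":
    # network.rxbytes grew by 6,000,000 bytes over 60 seconds
    print(counter_rate(10_000_000, 16_000_000, 1000, 1060))  # 100000.0
```

The same function applies to any of the cumulative counters above, such as `network.txpackets` or `network.rxerrors`.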
taskstats.nriowait – number of tasks waiting on I/O.
taskstats.nrrunning – number of running tasks.
taskstats.nrsleeping – number of sleeping tasks.
taskstats.nrstopped – number of tasks in stopped state.
taskstats.nruninterruptible – number of tasks in uninterruptible state.
taskstats.nriowait taskstats.nrrunning taskstats.nrsleeping taskstats.nrstopped taskstats.nruninterruptible
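The task-state counters can be approximated by tallying Linux process state letters. This is a hypothetical illustration, not how cAdvisor gathers these values (it is understood to read the kernel's per-cgroup task statistics instead). Here the state letter is taken from the field that follows the parenthesized command name in `/proc/<pid>/stat`. No state letter corresponds to I/O wait, so `taskstats.nriowait` is not derived, and states such as zombie (`Z`) are ignored.

```python
from collections import Counter

# Hypothetical mapping from process state letters to the counters above.
STATE_TO_COUNTER = {
    "R": "taskstats.nrrunning",
    "S": "taskstats.nrsleeping",
    "D": "taskstats.nruninterruptible",
    "T": "taskstats.nrstopped",
    "t": "taskstats.nrstopped",  # stopped by a tracer
}


def state_from_stat_line(line):
    """Return the state letter from a /proc/<pid>/stat line.

    The command name may contain spaces or parentheses, so split on the last ')'.
    """
    return line.rsplit(")", 1)[1].split()[0]


def tally_states(state_letters):
    """Count tasks per mapped counter; unmapped states are skipped."""
    counts = Counter({name: 0 for name in set(STATE_TO_COUNTER.values())})
    for letter in state_letters:
        name = STATE_TO_COUNTER.get(letter)
        if name is not None:
            counts[name] += 1
    return dict(counts)


if __name__ == "__main__":
    print(tally_states(["R", "S", "S", "D", "T", "Z"]))
```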
File System Metrics
File system metrics track capacity, usage and read/write activity for attached file systems.
filesystem.available filesystem.ioinprogress filesystem.iotime filesystem.limit filesystem.readscompleted filesystem.readsmerged filesystem.readtime filesystem.sectorsread filesystem.sectorswritten filesystem.usage filesystem.weightediotime filesystem.writescompleted filesystem.writesmerged filesystem.writetime cadvisor.filesystem.baseusage cadvisor.filesystem.inodesfree
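Two derived values are often more useful than the raw file system counters: the share of the limit in use and the average time per completed read. A minimal sketch, assuming `filesystem.usage` and `filesystem.limit` are both in bytes and that `filesystem.readtime` is a cumulative count of milliseconds as in `/proc/diskstats`. The function names and the dictionary keys are hypothetical.

```python
def percent_used(usage_bytes, limit_bytes):
    """Share of the file system limit in use, as a percentage."""
    if limit_bytes <= 0:
        return None
    return 100.0 * usage_bytes / limit_bytes


def avg_read_latency_ms(prev, curr):
    """Average time per completed read between two samples, in milliseconds.

    `prev` and `curr` are dicts with cumulative 'readtime' (ms) and
    'readscompleted' values, mirroring filesystem.readtime and
    filesystem.readscompleted.
    """
    reads = curr["readscompleted"] - prev["readscompleted"]
    if reads <= 0:
        return None
    return (curr["readtime"] - prev["readtime"]) / reads


if __name__ == "__main__":
    print(percent_used(25 * 1024**3, 100 * 1024**3))  # 25.0
    print(avg_read_latency_ms(
        {"readtime": 1000, "readscompleted": 200},
        {"readtime": 1600, "readscompleted": 500},
    ))  # 2.0
```

The write-side counters (`filesystem.writetime`, `filesystem.writescompleted`) can be combined the same way.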
NOTE: disk metrics and file system metrics are only collected from containers that have attached volumes.