MetricsView: Linux (Collectd) Custom Collector Task Setup

Device Type: MetricsView Platform

Glossary Entry:  Linux Custom Collector

Setting up and Installing a Linux Custom Collector

How to Edit a Linux Custom Collector Task

Wizard Interface

Task Name: Enter a name that describes the function of the task, for example “Server 2 CPU % in use”.

Task UID: A unique ID that is generated for each task. This ID is used to reference the task through the API.

Collector:  Select the collector from which to retrieve the data for the task.

Custom Counterpath: Reflects the relative path to the counter in the source system, in the form
[local or remote machine]\[category]\[instance]\[counter type]
For example: ..\cpu\0\cpu-idle
The Counterpath is generated automatically based on the values you select in the dropdown menus below.
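
As a rough illustration of the Counterpath layout described above, the following Python sketch splits a path into its four segments and joins them back together. The CounterPath type and both function names are hypothetical and are not part of MetricsView; the sketch assumes every path has exactly four backslash-separated segments and treats the machine segment as an opaque string.

```python
from typing import NamedTuple


class CounterPath(NamedTuple):
    """Hypothetical container for the four Counterpath segments."""
    machine: str    # local or remote machine, e.g. ".." in the example above
    category: str   # e.g. "cpu"
    instance: str   # e.g. "0"
    counter: str    # counter type, e.g. "cpu-idle"


def parse_counterpath(path: str) -> CounterPath:
    """Split a Counterpath on backslashes (assumes exactly four segments)."""
    parts = path.split("\\")
    if len(parts) != 4:
        raise ValueError(f"expected 4 segments, got {len(parts)}: {path!r}")
    return CounterPath(*parts)


def build_counterpath(cp: CounterPath) -> str:
    """Join the four segments back into a Counterpath string."""
    return "\\".join(cp)


example = parse_counterpath("..\\cpu\\0\\cpu-idle")
```

Here example.category is "cpu" and example.counter is "cpu-idle"; passing example to build_counterpath returns the original string.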

Hostname:  The name or IP address of the target machine.

Legacy Advanced Interface

Once you have created a device, installed the Linux Agent and are adding or editing a custom collector task, you will be prompted to adjust the following settings:

The MetricsView Collector dropdown shows a list of all registered custom collectors. The Linux Collector will appear in this list after installation is completed.

Counterpath reflects the relative path to the counter in the source system, in the form
[local or remote machine]\[category]\[instance]\[counter type]
For example: ..\cpu\0\cpu-idle
The Counterpath is generated automatically based on the values you select in the dropdown menus below.

Both Interfaces

The Category, Instance and Counter dropdowns are filled with values automatically.

Error Thresholds

Aggregate – All received data is aggregated at regular intervals, according to the configured device frequency.

Maximum – The highest of the collected values is taken.

Average – The value is calculated as the average of all collected values.

Minimum – The lowest of the collected values is taken.

Min Threshold – Results below this value trigger an error.

Max Threshold – Results exceeding this value trigger an error.

Ignore Errors if counter is not available – Each time the Agent communicates with the Server, the Agent asks whether there are any new counters to check. If it receives instructions to gather statistics on new counters, the Agent begins collecting them.

NO is selected – Each failure to gather counter data is reflected as an error in reports.

YES is selected – Failures are ignored.
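
The settings above can be sketched as a small evaluation routine. This is only an illustration of the described behavior, not MetricsView's actual implementation: the function name evaluate_task, its parameters and the (value, is_error) return shape are all hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

# Aggregate options as named in the settings above.
AGGREGATORS = {"Maximum": max, "Average": mean, "Minimum": min}


def evaluate_task(
    samples: Sequence[float],
    aggregate: str,
    min_threshold: float,
    max_threshold: float,
    ignore_missing: bool = False,
) -> Tuple[Optional[float], bool]:
    """Return (aggregated value, is_error) for one aggregation interval."""
    if not samples:
        # Counter not available: an error unless "Ignore Errors" is YES.
        return None, not ignore_missing
    value = AGGREGATORS[aggregate](samples)
    # Below Min Threshold or above Max Threshold triggers an error.
    is_error = value < min_threshold or value > max_threshold
    return value, is_error
```

For example, with samples 10, 20 and 60 and thresholds of 5 and 50, the Average (30) stays within bounds, while the Maximum (60) exceeds the Max Threshold and is flagged as an error.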

Counter descriptions (in progress):

CPU

The CPU plugin collects the amount of time spent by the CPU in various states, most notably executing user code, executing system code, waiting for I/O operations and being idle. See https://collectd.org/wiki/index.php/Plugin:CPU

cpu-interrupt :: Reflects time the processor has spent servicing interrupts

cpu-wait :: For a given CPU, it is the time during which that CPU was idle (i.e. didn’t execute any tasks) and there was at least one outstanding disk I/O operation requested by a task scheduled on that CPU (at the time it generated that I/O request).

cpu-system :: The amount of time the CPU was busy executing code in kernel space (https://en.wikipedia.org/wiki/Kernel_space).

cpu-softirq :: Reflects time the processor has spent servicing software interrupts (softirqs). For a better understanding of softirqs, we recommend Matthew Wilcox’s article “I’ll Do It Later: Softirqs, Tasklets, Bottom Halves, Task Queues, Work Queues and Timers”.

cpu-steal :: (For the whole system only.) On virtualized hardware, the amount of time the operating system wanted to execute but was not allowed to by the hypervisor. This can happen if the physical hardware runs multiple guest operating systems and the hypervisor chose to allocate a CPU time slot to another one.

cpu-nice :: The percentage of CPU time occupied by user-level processes with a positive nice value. For more details, run man nice in a console.

cpu-user :: The amount of time the CPU was busy executing code in user space (https://en.wikipedia.org/wiki/User_space).
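
To make the CPU states above more concrete, the sketch below derives a percentage per state from two samples of a per-CPU line in /proc/stat, which is where Linux exposes these cumulative time counters. It is an illustration only: the function names are hypothetical, the mapping of /proc/stat columns to the counter names used here is an assumption, and the Agent may compute its values differently.

```python
# Assumed mapping of the first eight /proc/stat columns
# (user nice system idle iowait irq softirq steal) to counter names.
STATES = (
    "cpu-user", "cpu-nice", "cpu-system", "cpu-idle",
    "cpu-wait", "cpu-interrupt", "cpu-softirq", "cpu-steal",
)


def parse_cpu_line(line: str) -> dict:
    """Parse one 'cpuN ...' line into cumulative ticks per state."""
    fields = line.split()
    ticks = [int(v) for v in fields[1:1 + len(STATES)]]
    if len(ticks) < len(STATES):
        raise ValueError("expected at least 8 numeric columns")
    return dict(zip(STATES, ticks))


def state_percentages(before: dict, after: dict) -> dict:
    """Share of elapsed CPU time spent in each state between two samples."""
    deltas = {s: after[s] - before[s] for s in STATES}
    total = sum(deltas.values())
    if total <= 0:
        return {s: 0.0 for s in STATES}
    return {s: 100.0 * d / total for s, d in deltas.items()}


# Made-up sample lines, for illustration only.
first = parse_cpu_line("cpu0 100 0 50 800 50 0 0 0 0 0")
second = parse_cpu_line("cpu0 150 0 70 820 60 0 0 0 0 0")
usage = state_percentages(first, second)
```

With these made-up samples, 100 ticks elapse in total, of which 50 are user time, 20 system, 20 idle and 10 waiting for I/O.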

interface

if_errors-rx :: Rate of read errors recorded on the interface
if_octets-rx :: Rate of octets read from the interface
if_octets-tx :: Rate of octets written to the interface
if_packets-tx :: Rate of packets written to the interface
if_errors-tx :: Rate of write errors recorded on the interface
if_packets-rx :: Rate of packets read from the interface
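
The interface counters above are described as rates. A rate of this kind is typically derived from two readings of a cumulative counter, as in the following sketch. The function name is hypothetical, and returning None after a counter reset is an illustrative design choice rather than documented Agent behavior.

```python
from typing import Optional


def rate_per_second(
    previous: int, current: int, interval_seconds: float
) -> Optional[float]:
    """Rate of change between two cumulative counter readings.

    Returns None when the counter went backwards (for example after a
    reboot or counter wrap), since no meaningful rate can be derived.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous
    if delta < 0:
        return None
    return delta / interval_seconds
```

For example, if the received-octets counter grows from 1000 to 4000 over 10 seconds, the resulting if_octets-rx rate would be 300 octets per second.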

df (space usage)

df_complex-free :: Bytes free on disk
df_complex-reserved :: Bytes reserved for root (Linux filesystems often reserve a small percentage of total disk capacity for the root user to protect the system from non-root users filling up the filesystem)
df_complex-used :: Bytes used on disk
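
The relationship between the three df values can be sketched with the figures returned by the POSIX statvfs call, available in Python as os.statvfs. The function names below are hypothetical, and the split shown (free = blocks available to non-root users, reserved = free blocks minus available blocks) is an assumption about how these counters are derived.

```python
import os


def df_complex(stats) -> dict:
    """Derive free/reserved/used bytes from a statvfs-like result."""
    block = stats.f_frsize  # fragment size in bytes
    return {
        # Space available to unprivileged users.
        "df_complex-free": stats.f_bavail * block,
        # Free space that only root may use.
        "df_complex-reserved": (stats.f_bfree - stats.f_bavail) * block,
        # Everything that is not free at all.
        "df_complex-used": (stats.f_blocks - stats.f_bfree) * block,
    }


def df_complex_for(path: str) -> dict:
    """Convenience wrapper that queries a mounted filesystem."""
    return df_complex(os.statvfs(path))
```

On a Linux host, df_complex_for("/") would return the three values for the root filesystem; their sum equals the total capacity reported by statvfs.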

disk (Disk I/O)

disk_time-write :: Amount of time (in milliseconds) the disk spent writing
disk_ops-read :: Total number of read operations performed by the disk
disk_ops-write :: Total number of write operations performed by the disk
disk_octets-write :: Rate of octets written to disk
disk_time-read :: Amount of time (in milliseconds) the disk spent reading
disk_merged-write :: The number of write operations that were merged together by the kernel (because they were adjacent)
disk_merged-read :: The number of read operations that were merged together by the kernel (because they were adjacent)
disk_octets-read :: Rate of octets read from the disk
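
On Linux, the raw figures behind disk counters like these are exposed as cumulative totals in /proc/diskstats. The sketch below parses one line of that file into similarly named fields. It is an illustration under stated assumptions: the function name is hypothetical, the mapping of columns to the counter names above is assumed, and the values shown are cumulative totals, so a rate counter would still need two samples to be differenced.

```python
SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats_line(line: str) -> dict:
    """Parse one /proc/diskstats line into cumulative per-disk totals."""
    fields = line.split()
    if len(fields) < 11:
        raise ValueError("expected at least 11 columns")
    name = fields[2]  # columns 0 and 1 are the major and minor numbers
    (reads, reads_merged, sectors_read, ms_reading,
     writes, writes_merged, sectors_written, ms_writing) = (
        int(v) for v in fields[3:11]
    )
    return {
        "device": name,
        "disk_ops-read": reads,
        "disk_merged-read": reads_merged,
        "disk_octets-read": sectors_read * SECTOR_BYTES,
        "disk_time-read": ms_reading,
        "disk_ops-write": writes,
        "disk_merged-write": writes_merged,
        "disk_octets-write": sectors_written * SECTOR_BYTES,
        "disk_time-write": ms_writing,
    }


# Made-up sample line, for illustration only.
totals = parse_diskstats_line("   8       0 sda 100 10 2000 300 50 5 1000 200 0 400 500")
```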

memory

memory-buffered :: The amount of memory used by Linux to buffer network and disk connections.

memory-cached :: Most Linux distributions use any available free RAM to cache access to files on disk, which helps to speed up disk access. When the system runs low on free memory, it automatically flushes this data out of RAM to make room for programs and other essential data.
memory-used :: The total amount of memory utilized by the system
memory-free :: The total amount of free memory in the system

memory-slab_unrecl ::
memory-slab_recl ::
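
As a rough illustration of how the memory counters relate to each other, the sketch below reads values in the format of /proc/meminfo and derives one figure per counter. The function names are hypothetical. The mapping of SReclaimable and SUnreclaim to the two slab counters, and the formula for memory-used (total minus free, buffered, cached and slab memory), are assumptions and may not match what the Agent reports.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a dict of byte values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        amount = int(fields[0])
        # Most entries are reported in kB; convert those to bytes.
        if len(fields) == 2 and fields[1] == "kB":
            amount *= 1024
        values[key.strip()] = amount
    return values


def memory_counters(info: dict) -> dict:
    """Derive memory counters from parsed meminfo values (assumed mapping)."""
    slab_recl = info.get("SReclaimable", 0)
    slab_unrecl = info.get("SUnreclaim", 0)
    accounted = (info["MemFree"] + info["Buffers"] + info["Cached"]
                 + slab_recl + slab_unrecl)
    return {
        "memory-free": info["MemFree"],
        "memory-buffered": info["Buffers"],
        "memory-cached": info["Cached"],
        "memory-slab_recl": slab_recl,
        "memory-slab_unrecl": slab_unrecl,
        "memory-used": info["MemTotal"] - accounted,
    }
```

On a Linux host, the input text could be obtained by reading /proc/meminfo, for example with open("/proc/meminfo").read().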