How to create devices and tasks

The MetricsView Linux monitoring service monitors Linux computers and servers. It also lets users set thresholds on data from Collectd performance counters and receive alerts when those thresholds are exceeded.

Once you have created a device and installed the Linux Agent, you will be prompted to adjust the following settings when adding or editing a custom collector task:

Collector

MetricsView Collectors gather data from Collectd performance counters on target computers.

To start, specify the collector agent that will be used as the monitoring target for the task:

  • If a collector agent has already been set up, select the name of the collector agent in the Use Existing Collector or Install New list.
  • If there are no collectors in the list, click Install New to set up a new collector.

Linux Counter Path reflects the relative path to the counter in the source system: [local or remote machine]\[category]\[instance]\[counter type]. The counter path is generated automatically from the values selected in the Hostname/Category/Instance/Counter dropdowns.

Example

Linux Counter Path: ..\cpu\0\cpu-idle 
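As an illustration of the format above, here is a minimal Python sketch of how the four dropdown values could be joined into a counter path. The function name build_counter_path is hypothetical and is not part of MetricsView; the host value ".." is copied as-is from the example.

```python
def build_counter_path(hostname: str, category: str, instance: str, counter: str) -> str:
    """Join the four selections into a backslash-separated counter path.

    Hypothetical helper for illustration only; MetricsView generates
    the path automatically from the dropdown selections.
    """
    parts = [hostname, category, instance, counter]
    if any(not p.strip() for p in parts):
        raise ValueError("every path component must be non-empty")
    return "\\".join(p.strip() for p in parts)


print(build_counter_path("..", "cpu", "0", "cpu-idle"))  # prints ..\cpu\0\cpu-idle
```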

Hostname

Provide the IP address or hostname of the target machine with the MetricsView Linux Agent (Collectd) installed.

Counter Category

Select the first-level grouping criterion (for example, CPU).

Counter Name

Select the second-level grouping criterion (for example, CPU #2 load).

Counter Instance

Select the third-level grouping criterion (for example, CPU core number).

See the counter descriptions at the end of the article.

Error Thresholds

  • Aggregate: All received data is aggregated at regular intervals, according to the configured device frequency.
    • Maximum – the highest value from the array is taken.
    • Average – the value is calculated as the average of all intermediate values.
    • Minimum – the lowest value from the array is taken.
  • Min threshold: Exceeding this threshold triggers an alert.
  • Max threshold: Exceeding this threshold triggers an alert.
  • Ignore if Unavailable: During each Agent <-> Server interaction, the Agent asks whether there are any new counters for it to check. If there are instructions to gather statistics on new counters, the Agent starts gathering them. If NO is selected, each counter polling failure is reported as an error; if YES is selected, failures are ignored.
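The interaction of these settings can be sketched in Python. This is only an illustration under stated assumptions, not the MetricsView implementation: the function name evaluate and the result strings are hypothetical, a failed poll is represented by None, and "exceeding" is assumed to mean falling below the Min threshold or rising above the Max threshold.

```python
from statistics import mean
from typing import Optional, Sequence

# Aggregation modes named in the settings above.
AGGREGATORS = {"maximum": max, "average": mean, "minimum": min}


def evaluate(samples: Sequence[Optional[float]], aggregate: str,
             min_threshold: Optional[float] = None,
             max_threshold: Optional[float] = None,
             ignore_if_unavailable: bool = False) -> str:
    """Return 'ok', 'alert', 'error' or 'ignored' for one aggregation interval."""
    # Assumption: None marks a poll that failed during the interval.
    if any(s is None for s in samples) and not ignore_if_unavailable:
        return "error"
    values = [s for s in samples if s is not None]
    if not values:
        return "ignored"  # nothing usable was collected, and failures are ignored
    value = AGGREGATORS[aggregate](values)
    # Assumption: alert below the Min threshold or above the Max threshold.
    if min_threshold is not None and value < min_threshold:
        return "alert"
    if max_threshold is not None and value > max_threshold:
        return "alert"
    return "ok"
```

For example, evaluate([10, 20, 30], "average", max_threshold=15) returns "alert" because the average of 20 is above 15, while evaluate([10, None], "maximum") returns "error" unless ignore_if_unavailable is set.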

Task UID

The UID is a unique ID generated for each task. It is used to reference the task in the API.

Counter descriptions

CPU

The CPU plugin collects the amount of time spent by the CPU in various states, most notably executing user code, executing system code, waiting for I/O operations, and being idle. See https://collectd.org/wiki/index.php/Plugin:CPU

cpu-interrupt :: The time the processor has spent servicing interrupts.

cpu-wait :: For a given CPU, the time during which that CPU was idle (i.e. didn’t execute any tasks) and there was at least one outstanding disk I/O operation requested by a task scheduled on that CPU (at the time it generated that I/O request).

cpu-system :: The amount of time the CPU was busy executing code in kernel space (https://en.wikipedia.org/wiki/Kernel_space).

cpu-softirq :: The time the processor has spent servicing soft interrupts (softirqs). For a better understanding of softirqs, we recommend Matthew Wilcox’s article “I’ll Do It Later: Softirqs, Tasklets, Bottom Halves, Task Queues, Work Queues, and Timers”.

cpu-steal :: (For the whole system only.) On virtualized hardware, the amount of time the operating system wanted to execute but was not allowed to by the hypervisor. This can happen if the physical hardware runs multiple guest operating systems and the hypervisor chose to allocate a CPU time slot to another one.

cpu-nice :: The percentage of CPU time occupied by user-level processes with a positive nice value. For more details, run man nice in a console.

cpu-user :: The amount of time the CPU was busy executing code in user space (https://en.wikipedia.org/wiki/User_space).
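On Linux, per-state CPU times of this kind are exposed by the kernel in /proc/stat as cumulative tick counts. The following Python sketch shows how two readings of one cpu line could be turned into per-state percentages. The helper names are hypothetical, and the mapping of /proc/stat fields to the counter names in this article is an assumption made for illustration.

```python
# Field order of a "cpu" line in /proc/stat (first eight values),
# labeled with the counter names used in this article (assumed mapping).
FIELDS = ("cpu-user", "cpu-nice", "cpu-system", "cpu-idle",
          "cpu-wait", "cpu-interrupt", "cpu-softirq", "cpu-steal")


def parse_cpu_line(line: str) -> dict:
    """Parse one 'cpu' or 'cpuN' line from /proc/stat into tick counts."""
    tokens = line.split()
    if not tokens or not tokens[0].startswith("cpu"):
        raise ValueError("not a cpu line: %r" % line)
    ticks = [int(t) for t in tokens[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, ticks))


def percentages(before: dict, after: dict) -> dict:
    """Share of time spent in each state between two readings, in percent."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}
```

With the readings "cpu0 100 0 50 800 50 0 0 0 0 0" and "cpu0 150 0 70 900 80 0 0 0 0 0", the CPU spent 50% of the interval idle and 25% in user space.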

interface

if_errors-rx :: Rate of read errors recorded on the interface
if_octets-rx :: Rate of octets read from the interface
if_octets-tx :: Rate of octets written to the interface
if_packets-tx :: Rate of packets written to the interface
if_errors-tx :: Rate of write errors recorded on the interface
if_packets-rx :: Rate of packets read from the interface
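Rates such as these are typically derived from cumulative totals read at two points in time. A minimal Python sketch of that calculation follows; the function name counter_rate is hypothetical, and treating a decreasing total as a counter reset is an assumption.

```python
from typing import Optional


def counter_rate(prev_value: int, prev_time: float,
                 cur_value: int, cur_time: float) -> Optional[float]:
    """Per-second rate between two readings of a cumulative counter.

    Returns None when no rate can be derived: the clock did not advance,
    or the counter went backwards (assumed to be a reset or wrap-around).
    """
    elapsed = cur_time - prev_time
    if elapsed <= 0 or cur_value < prev_value:
        return None
    return (cur_value - prev_value) / elapsed
```

For example, a total that grows from 1000 to 4000 octets over 10 seconds gives counter_rate(1000, 0.0, 4000, 10.0), which is 300.0 octets per second.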

df (space usage)

df_complex-free :: Bytes free on disk
df_complex-reserved :: Bytes reserved for root (Linux filesystems often reserve a small percentage of total disk capacity for the root user to protect the system from non-root users filling up the filesystem)

df_complex-used :: Bytes used on disk
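The relationship between the three values can be sketched with the filesystem statistics that Python exposes through os.statvfs. This is an illustration only: the helper names are hypothetical, and deriving the reserved space as the difference between the blocks free for root and the blocks free for unprivileged users is an assumption about how the values are calculated.

```python
import os


def df_complex(blocks: int, bfree: int, bavail: int, frsize: int) -> dict:
    """Split a filesystem's capacity into free, reserved and used bytes.

    blocks: total blocks; bfree: free blocks including the root reserve;
    bavail: free blocks for unprivileged users; frsize: block size in bytes.
    """
    return {
        "df_complex-free": bavail * frsize,
        "df_complex-reserved": (bfree - bavail) * frsize,
        "df_complex-used": (blocks - bfree) * frsize,
    }


def df_complex_for_path(path: str) -> dict:
    """Read the statistics for the filesystem that contains path (Unix only)."""
    st = os.statvfs(path)
    return df_complex(st.f_blocks, st.f_bfree, st.f_bavail, st.f_frsize)
```

The three values add up to the total capacity of the filesystem. Calling df_complex_for_path("/") on a Linux machine returns the figures for the root filesystem.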

disk (Disk I/O)

disk_time-write :: Amount of time (in milliseconds) the disk spent writing
disk_ops-read :: Total number of read operations performed by the disk
disk_ops-write :: Total number of write operations performed by the disk
disk_octets-write :: Rate of octets written to disk
disk_time-read :: Amount of time (in milliseconds) the disk spent reading
disk_merged-write :: The number of write operations that were merged together by the kernel (because they were adjacent)
disk_merged-read :: The number of read operations that were merged together by the kernel (because they were adjacent)

disk_octets-read :: Rate of octets read from the disk

memory

memory-buffered :: The amount of memory used by Linux to buffer network and disk connections.

memory-cached :: Most Linux distributions use any available free RAM to cache access to files on disk, which helps to speed up disk access. When the system runs low on free memory, it automatically flushes this data out of RAM to make room for programs and other essential data.

memory-used :: The total amount of memory utilized by the system.

memory-free :: The total amount of free memory in the system.