Resource Monitor¶
A simple cross-platform system resource monitor.
The monitor utility collects telemetry on system resources. Metrics are printed to stdout at regular intervals. Resources are organized under “cpu” or “gpu” device groups. All resources share some global options.
It is easy to install, has an intuitive interface, and is cross-platform. Python 3.7 or higher is required.
pip install resource-monitor
$ monitor
usage: monitor [-h] [-v] <device> <resource> [<args>...]
A simple cross-platform system resource monitor.
$ monitor cpu memory --actual --human-readable
2020-01-30 15:24:51.573 desktop.local monitor.cpu.memory 8.23G
2020-01-30 15:24:52.578 desktop.local monitor.cpu.memory 8.23G
...
Getting Started¶
Installation¶
monitor is built on Python 3.7+ and can be installed using pip:
pip install resource-monitor
For use with GPUs you will need to have the associated command-line tools installed.
Currently, Nvidia (using nvidia-smi) and AMD (using rocm-smi) are supported.
Basic Usage¶
The usage statement for each resource type is outlined below. The --help flag is
provided at each level. monitor --help will show the device groups (i.e., cpu/gpu).
monitor <device> --help will show available resources for that group.
For complete information at the command line, including examples, the manual page can be
accessed with man monitor.
CPU Percent¶
usage: monitor cpu percent [-h] [--all-cores] [-s SECONDS] [--csv [--no-header]]
Monitor CPU percent utilization.
options:
-t, --total Show values for total cpu usage (default).
-a, --all-cores Show values for individual cores.
-s, --sample-rate SECONDS Time between samples (default: 1).
--plain Print messages in syslog format (default).
--csv Print messages in CSV format.
--no-header Suppress printing header in CSV mode.
-h, --help Show this message and exit.
CPU Memory¶
usage: monitor cpu memory [-h] [-s SECONDS] [--actual [--human-readable]] [--csv [--no-header]]
Monitor CPU memory utilization.
options:
--percent Report value as a percentage (default).
--actual Report value as total bytes.
-s, --sample-rate SECONDS Time between samples (default: 1).
-H, --human-readable Human readable values (e.g., "8.2G").
--plain Print messages in syslog format (default).
--csv Print messages in CSV format.
--no-header Suppress printing header in CSV mode.
-h, --help Show this message and exit.
GPU Percent¶
usage: monitor gpu percent [-h] [-s SECONDS] [--csv [--no-header]]
Monitor GPU percent utilization.
options:
-s, --sample-rate SECONDS Time between samples (default: 1).
--plain Print messages in syslog format (default).
--csv Print messages in CSV format.
--no-header Suppress printing header in CSV mode.
-h, --help Show this message and exit.
GPU Memory¶
usage: monitor gpu memory [-h] [-s SECONDS] [--csv [--no-header]]
Monitor GPU memory utilization.
options:
-s, --sample-rate SECONDS Time between samples (default: 1).
--plain Print messages in syslog format (default).
--csv Print messages in CSV format.
--no-header Suppress printing header in CSV mode.
-h, --help Show this message and exit.
GPU Power¶
usage: monitor gpu power [-h] [-s SECONDS] [--csv [--no-header]]
Monitor GPU power consumption (in Watts).
options:
-s, --sample-rate SECONDS Time between samples (default: 1).
--plain Print messages in syslog format (default).
--csv Print messages in CSV format.
--no-header Suppress printing header in CSV mode.
-h, --help Show this message and exit.
GPU Temperature¶
usage: monitor gpu temp [-h] [-s SECONDS] [--csv [--no-header]]
Monitor GPU temperature (Celsius).
options:
-s, --sample-rate SECONDS Time between samples (default: 1).
--plain Print messages in syslog format (default).
--csv Print messages in CSV format.
--no-header Suppress printing header in CSV mode.
-h, --help Show this message and exit.
Examples¶
Simple¶
Monitor CPU utilization on a per-core basis at a 10-second interval.
$ monitor cpu percent --all-cores --sample-rate 10
2020-01-30 12:55:55.521 some-hostname.local monitor.cpu.percent [0] 7.5
2020-01-30 12:55:55.522 some-hostname.local monitor.cpu.percent [1] 2.3
2020-01-30 12:55:55.522 some-hostname.local monitor.cpu.percent [2] 8.5
2020-01-30 12:55:55.522 some-hostname.local monitor.cpu.percent [3] 0.8
2020-01-30 12:56:05.525 some-hostname.local monitor.cpu.percent [0] 8.5
2020-01-30 12:56:05.525 some-hostname.local monitor.cpu.percent [1] 2.3
2020-01-30 12:56:05.525 some-hostname.local monitor.cpu.percent [2] 8.6
2020-01-30 12:56:05.526 some-hostname.local monitor.cpu.percent [3] 0.8
...
Monitor CPU memory in actual bytes used and output in CSV format.
$ monitor cpu memory --actual --csv
timestamp,hostname,resource,memory_used
2020-01-30 12:58:21.476,some-hostname.local,cpu.memory,9707892736
2020-01-30 12:58:22.479,some-hostname.local,cpu.memory,9706946560
2020-01-30 12:58:23.480,some-hostname.local,cpu.memory,9724190720
2020-01-30 12:58:24.484,some-hostname.local,cpu.memory,9726636032
...
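The CSV output can be read back with standard tooling. The following is a minimal sketch using only the Python standard library; the column names are taken from the header shown above, while the helper name load_memory_samples and the inlined sample text are illustrative assumptions and not part of monitor itself.

```python
import csv
import io
from statistics import mean

# Sample text copied from the CSV example above (normally read from a file).
SAMPLE = """\
timestamp,hostname,resource,memory_used
2020-01-30 12:58:21.476,some-hostname.local,cpu.memory,9707892736
2020-01-30 12:58:22.479,some-hostname.local,cpu.memory,9706946560
2020-01-30 12:58:23.480,some-hostname.local,cpu.memory,9724190720
2020-01-30 12:58:24.484,some-hostname.local,cpu.memory,9726636032
"""


def load_memory_samples(stream):
    """Return a list of (timestamp, hostname, bytes_used) tuples.

    Hypothetical helper; assumes the header row is present in the stream.
    """
    reader = csv.DictReader(stream)
    return [(row['timestamp'], row['hostname'], int(row['memory_used']))
            for row in reader]


samples = load_memory_samples(io.StringIO(SAMPLE))
peak = max(used for _, _, used in samples)
average = mean(used for _, _, used in samples)
print(f'peak: {peak / 1024**3:.2f} GiB, mean: {average / 1024**3:.2f} GiB')
```

To read a log file instead, pass an open file handle in place of the io.StringIO object.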
Monitor GPU utilization on a per-GPU basis at a 10-second interval and log to a file.
$ monitor gpu percent --sample-rate 10 >gpu.percent.log
$ head -8 gpu.percent.log
2020-01-30 13:04:22.938 node-001.cluster monitor.gpu.percent [0] 79.0
2020-01-30 13:04:22.938 node-001.cluster monitor.gpu.percent [1] 0.0
2020-01-30 13:04:22.938 node-001.cluster monitor.gpu.percent [2] 0.0
2020-01-30 13:04:22.938 node-001.cluster monitor.gpu.percent [3] 87.0
2020-01-30 13:04:33.196 node-001.cluster monitor.gpu.percent [0] 72.0
2020-01-30 13:04:33.196 node-001.cluster monitor.gpu.percent [1] 0.0
2020-01-30 13:04:33.196 node-001.cluster monitor.gpu.percent [2] 0.0
2020-01-30 13:04:33.196 node-001.cluster monitor.gpu.percent [3] 90.0
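A log in the default plain format can be split back into fields for analysis. The sketch below assumes the layout seen in the examples in this document (timestamp, hostname, resource name, an optional bracketed device index, and a value); the regular expression and the parse_line helper are illustrative and not part of monitor.

```python
import re
from collections import defaultdict

# Fields: date and time, hostname, resource name, optional "[index]", value.
LINE = re.compile(
    r'^(?P<timestamp>\S+ \S+) (?P<hostname>\S+) (?P<resource>\S+) '
    r'(?:\[(?P<index>\d+)\] )?(?P<value>\S+)$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    if record['index'] is not None:
        record['index'] = int(record['index'])
    return record


# Lines copied from the log excerpt above.
SAMPLE = """\
2020-01-30 13:04:22.938 node-001.cluster monitor.gpu.percent [0] 79.0
2020-01-30 13:04:22.938 node-001.cluster monitor.gpu.percent [3] 87.0
2020-01-30 13:04:33.196 node-001.cluster monitor.gpu.percent [0] 72.0
2020-01-30 13:04:33.196 node-001.cluster monitor.gpu.percent [3] 90.0
"""

# Group the percent values by device index.
per_gpu = defaultdict(list)
for line in SAMPLE.splitlines():
    record = parse_line(line)
    if record is not None:
        per_gpu[record['index']].append(float(record['value']))

for index, values in sorted(per_gpu.items()):
    print(index, values)
```

The value is kept as a string by parse_line because human-readable output (e.g., "8.23G") is not a plain number; it is converted to float here only because percent values are numeric.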
Distributed¶
Monitor core utilization within a distributed computing context.
$ mpiexec -machinefile <(sort -u $NODEFILE) \
monitor cpu percent --all-cores
2020-01-30 13:17:50.980 node-001.cluster monitor.cpu.percent [0] 100.0
2020-01-30 13:17:50.980 node-001.cluster monitor.cpu.percent [1] 1.0
...
2020-01-30 13:17:51.208 node-002.cluster monitor.cpu.percent [0] 100.0
2020-01-30 13:17:51.208 node-002.cluster monitor.cpu.percent [1] 100.0
...
2020-01-30 13:17:51.294 node-003.cluster monitor.cpu.percent [0] 100.0
2020-01-30 13:17:51.295 node-003.cluster monitor.cpu.percent [1] 100.0
...
2020-01-30 13:17:51.319 node-004.cluster monitor.cpu.percent [0] 100.0
2020-01-30 13:17:51.320 node-004.cluster monitor.cpu.percent [1] 100.0
...
Monitor percent main memory utilization within a distributed computing context, as a
background task, and in CSV format. Each node would otherwise print its own header, so
suppress them with --no-header. Create a single header by slicing it off
the top of an initial invocation. Collect the process ID so the task can be interrupted
at the end of your job.
$ monitor cpu memory --csv | head -1 >log/resource.mem.csv
$ mpiexec -machinefile <(sort -u $NODEFILE) \
monitor cpu memory --csv --no-header >>log/resource.mem.csv &
$ MEM_PID=$!
...
$ kill -s INT $MEM_PID
Recommendations¶
If collecting data for benchmarking/profiling/scaling purposes (regarding CPU/memory in particular), it may be appropriate to also collect data in the absence of your application as a null-scenario. This can approximate a “background noise” that can be modeled and subtracted.
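As a rough illustration of the subtraction step, the sketch below models the background as a constant equal to the mean of the null-scenario samples. This is only the simplest possible model, chosen here as an assumption; the function name and the numbers are hypothetical and not measured data.

```python
from statistics import mean


def subtract_baseline(samples, baseline):
    """Subtract the mean of the baseline samples from each sample.

    Results are clamped at zero so noise never produces negative usage.
    """
    noise = mean(baseline)
    return [max(value - noise, 0.0) for value in samples]


# Hypothetical CPU percent readings.
baseline = [2.0, 3.0, 2.5, 2.5]     # collected without the application running
workload = [52.5, 61.0, 58.5, 1.0]  # collected while the application runs

print(subtract_baseline(workload, baseline))
```

A background that varies over time would call for a richer model than a single mean.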
Caveats¶
- monitor merely samples data made available by other libraries or command-line
tools. For CPU resources, this is the psutil library in Python. For GPU resources,
it is the output of the nvidia-smi or rocm-smi tool. Metrics are reported with regard
to the whole system, NOT JUST YOUR APPLICATION.
- For GPU resources, currently only nvidia-smi and rocm-smi are supported. Additional GPU providers could be supported in the future.
- Sampling more frequently than once per second is an error. The CPU percent utilization is a time-averaged metric subject to how frequently it is sampled.
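The effect of the sample rate on a time-averaged metric can be illustrated with a small sketch. It assumes each reported value is the average utilization over the preceding interval; the trace is hypothetical and the resample helper is not part of monitor.

```python
from statistics import mean


def resample(per_second, interval):
    """Average consecutive per-second values over windows of `interval` seconds."""
    return [mean(per_second[i:i + interval])
            for i in range(0, len(per_second), interval)]


# Hypothetical trace: a 2-second burst at 100% within 10 otherwise idle seconds.
trace = [0.0, 0.0, 0.0, 100.0, 100.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print(resample(trace, 1))   # the burst is visible at full height
print(resample(trace, 5))   # the burst is averaged into one 5-second window
print(resample(trace, 10))  # a single value for the whole trace
```

Longer intervals smooth out short bursts, so a low reading at a coarse sample rate does not rule out brief periods of full utilization.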
For Developers¶
Roadmap¶
- Explore additional resources (e.g., disk/filesystem, threads).
Contributing¶
Development of monitor happens on GitHub. Contributions are welcome in the form of suggestions for additional features, Pull Requests with new features or bug fixes, etc. If you find bugs or have questions, open an Issue.
Guide¶
The monitor command-line interface is written in Python and uses the psutil library. Additional resources may be possible to collect but may not necessarily be easily made cross-platform.
The GPU functionality is simply a wrapper around external tools, e.g., nvidia-smi.
In the library, a fully generalized notion of an ExternalMetric interface is provided.
In principle, anything that could conceivably be invoked on the
command-line need only have a parser method implemented.
For example:
from typing import Dict

# ExternalMetric is the base class provided by the monitor library.
class OpenFiles(ExternalMetric):
    """Report the number of open files (psutil already provides this)."""

    _cmd = 'lsof -u `whoami`'

    @classmethod
    def parse_text(cls, block: str) -> Dict[str, int]:
        """Count lines in the output."""
        return {'count': len(block.strip().split('\n'))}