How to graph the resource usage of servers?

I haven’t looked into it yet, but I’ve seen people use Grafana for this use-case.

1 Like

Thanks,

This ranks pretty high on my accidental home lab priorities. It would be nice to have a single pane of glass showing memory, CPU, and storage consumption across all of my servers. Then, if something seems to be nearing capacity, I can shift things around or buy more of a specific resource.

I’ll go through the link I posted above and see what I can figure out. But, to be honest, I like your video style. Your videos feel like they bring me up to the necessary level to understand how the various parts of a system fit together before I get down in the weeds.

David

1 Like

I’m hoping to cover this topic as well as centralized logging as soon as I can. There are some things I need to catch up on, and I’ll try to see where everything fits into my schedule. Sometimes the higher production quality can slow me down a bit, so I’m trying to figure out how to automate some of my video production so I can cover topics more quickly.

2 Likes

Great.

The last time I really played with my network was in the early 2000s, when the NSLU2 came out and flashing router firmware was all the rage. Then life and work seemed to take over.

Home labs and home networking (and Google search) are now at a stage where one can set up a surprisingly polished system with just a few hours per week. Most importantly, for me, the tools are good enough that when something goes wrong I can find the issue, or at least revert the change until I get some time to look into things.

1 Like

Grafana, InfluxDB, and Telegraf.

Telegraf is the agent you install on the host to push metrics into InfluxDB. Telegraf has a huge config file, but is very easy to configure. Grafana is the dashboard engine.

I installed all three on a Debian server last week and used Ansible to install Telegraf on 4 VMs. Then I configured Telegraf and had everything up and running within 20 minutes. I did, however, have to edit /etc/telegraf/telegraf.conf to enable network interface monitoring and specify the name of the interface to monitor.
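For anyone curious, the relevant sections of telegraf.conf look roughly like this (the interface name and InfluxDB address below are placeholders, not my actual values):

```toml
# Send collected metrics to the InfluxDB instance
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "telegraf"

# Collect per-interface network metrics
[[inputs.net]]
  interfaces = ["eth0"]
```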

Here is the article I used to perform the install and configuration:

3 Likes

I’m personally using Grafana and Prometheus as Docker containers, with prometheus-node-exporter running on each machine I’m monitoring. This works very well for me. :blush:
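In case it helps anyone, a minimal sketch of that kind of Docker setup might look something like this (images, ports, and paths are just the defaults; adjust to taste):

```yaml
version: "3"

services:
  prometheus:
    image: prom/prometheus
    volumes:
      # scrape config listing each node_exporter target
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    restart: unless-stopped

  grafana:
    image: grafana/grafana
    volumes:
      # keep dashboards and settings across container restarts
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    restart: unless-stopped

volumes:
  grafana-data:
```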

1 Like

Thanks,
I appreciate the suggestion. I’ll check out Prometheus.

Unless I am mistaken, monitoring and reporting seem to be among the things that users and companies are often willing to pay to do correctly. I went through the Graylog home lab episode over the weekend and got a good feel for how that works.

1 Like

It depends on your needs, but many of the heavy hitters have already been mentioned in this thread. The traditional ELK stack (producers, storage, processors, graphers) is a good starting point.

I ran across this one and bookmarked it some time back due to its Jupyter Notebook / Spark components.

I wouldn’t say it’s an entry level approach, but it certainly covers a lot of ground in the logging / analytics space.

1 Like

Thanks, I’ll look into it.

Thinking about it, I spent a ton of time working on getting the home/free versions of a VMware-based infrastructure working. While it might be great in a large network, it was overly complex for my basic needs (and basic hardware).

Proxmox meets my needs/willingness to deal with complexity almost perfectly. I am kind of hoping there is a similar type of monitoring tool.

For monitoring anything, Zabbix is an excellent tool, and it will run on a Raspberry Pi.

ELK and Splunk are a bit advanced. If you are new to monitoring, I’d recommend starting off with Grafana or Zabbix. But there are other tools out there; Nagios is popular as well.

I’ve been working in the application and network monitoring area for the past 15 years, so let me know if I can help.

1 Like

If anyone else is learning along…

There is an excellent explanation of the Prometheus Architecture at How Prometheus Monitoring works | Prometheus Architecture explained - YouTube

I am going to try to set up a simple monitoring system that covers my Proxmox server, my primary Synology NAS, my remote backup NAS, and my primary laptop, using a couple of different systems.

As odd as it seems, my goal is a ‘set it and forget it’ system where I can trust that my network is chugging along without any intervention from me.

Going to set up a Prometheus-based system today… will try Zabbix ASAP.

1 Like

Prometheus is also very popular, and Grafana integrates very well with it.

I like Grafana quite a bit. I was thinking of setting it up on my servers (which usually leads to a video), but I’ve fallen quite a bit behind schedule at the moment.

2 Likes

I’m currently running Grafana and InfluxDB from a VM, but they both run great on an RPi 4.

1 Like

I would be interested in a video on using Grafana and how to set up your own graphs. More specifically, what things should we make available at a glance, given the limited “above the fold” space?

I made my own Home Screen, using the Prometheus Node Exporter Full Dashboard as a template.

This has basic stats for my machines, as well as basic HDD and ZFS stats. And I link directly to the Node Exporter Full Dashboard for more detailed information.

3 Likes

@ameinild - First, nice dashboard!

I can’t go into great detail, but my team manages several critical applications where we are responsible for all aspects of the deployment(s) (bare-metal servers, instances, patching, application performance, SLAs, etc.). We have hundreds of nodes across virtually all regions where we have infrastructure (and that list is growing rapidly). Prometheus and Grafana are two tools my team uses extensively. When problems happen (it’s not a matter of if, but when), these tools (and others) prove invaluable for getting resources back online and operating in peak condition.

In low-volume scenarios, like most of us have in our home labs, the items you’ve chosen to monitor are solid choices (they apply equally at scale, too). However, in high-volume situations, things like networking, HTTP requests, stack tracing, thread monitoring, response-time metrics, database transactions, and more all become critical in troubleshooting issues that may have originated downstream. Setting these metrics up is no small task, but the effort is well worth it when a problem crops up.

Anyone looking to work at scale in the DevOps / cloud infrastructure / application orchestration world would be wise to gain a good understanding of at least the basics of Prometheus and Grafana. Replicating the dashboard(s) you’ve posted here would be a very good place to start.

1 Like

That dashboard is great, how long did that take?

Hi. I did an updated version here, also with links to the files on Grafana. I had been fiddling with the new layout over a weekend, so this is how it ended up. :blush:

Oh, posting our dashboards?

The first server is my file storage (samba / sshfs) VM, which I forgot to add to monitoring for quite a while. The second is my Proxmox server, and the third is (if it wasn’t obvious) the Grafana server (which also has Prometheus installed locally).

It seems the Firefox screenshot tool doesn’t know how to take a full-page screenshot of Grafana dashboards. Interesting. Well, in any case…

I know the question wasn’t aimed at me, but I’ll answer it too. I didn’t make the dashboard myself; there are lots of dashboards to grab from Grafana and other sources. But my total setup time for a Grafana + Prometheus server, plus installing node_exporter on all the clients, was around 15 minutes. It really is easy to set up.

In part, the ease of setup was because Grafana, Prometheus, and node_exporter are all in my distro’s main repo. But setting it up on CentOS / Rocky / Alma / Springdale / OEL or on Debian shouldn’t take more than 20-30 minutes either: just add the repos for Prometheus and Grafana and you’re good to go. Prometheus requires about 20 lines of config in its file. Grafana requires no fiddling other than changing the admin password, downloading your dashboard, and pointing it at your Prometheus server as a data source. node_exporter just needs to be downloaded, run, and enabled at startup.
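For reference, those roughly 20 lines of Prometheus config boil down to something like this (the hostnames below are placeholders for your own machines, not mine):

```yaml
global:
  scrape_interval: 15s   # how often to pull metrics

scrape_configs:
  # Prometheus scraping itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # node_exporter on each monitored client (default port 9100)
  - job_name: "node"
    static_configs:
      - targets:
          - "proxmox.lan:9100"
          - "nas.lan:9100"
          - "laptop.lan:9100"
```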

One thing I didn’t take the time to set up was AlertManager, Prometheus’ own alerting engine. With it, you can set thresholds and other alerts for hosts and services, and you get them in a JSON format (IIRC), which you can use as a source for another Grafana dashboard (aptly named Prometheus AlertManager) for things like storage capacity warnings or host-down messages. AlertManager can also send you emails, but I haven’t set up a mail server. It shouldn’t take too long to set up AlertManager if you already have a mail server.
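To give a rough idea (this isn’t from my setup, just an illustration): the alert conditions themselves are defined as Prometheus alerting rules, and AlertManager then routes the firing alerts to email or wherever. A simple host-down rule would look something like this:

```yaml
groups:
  - name: node-alerts
    rules:
      - alert: HostDown
        # 'up' is 0 when Prometheus fails to scrape a target
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} has been unreachable for 5 minutes"
```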

Here’s another dashboard that uses Node_exporter, but this time, it’s a more in-depth one for closer monitoring.

This dashboard is long and has a lot of data points being monitored; it’s pretty ridiculous what Prometheus can gather. And it doesn’t really use a lot of CPU.

Here’s a view of what node_exporter provides:

This page has 4134 lines, of which about 2902 are pure data (not comments).
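If you want to poke at that page yourself, it’s just plain text over HTTP on node_exporter’s default port 9100 (adjust the host if you’re not on the machine itself):

```sh
# dump the raw metrics page
curl -s http://localhost:9100/metrics | less

# count only the data lines (comment lines start with '#')
curl -s http://localhost:9100/metrics | grep -vc '^#'
```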

And here’s how Prometheus actually looks in the background, without the beauty of Grafana:

I disabled graphs there, because Grafana takes care of that. And I don’t have alerts set up, as mentioned before. I estimate that if you know how Linux works, or at the very least how your distro of choice works, and have no experience with monitoring but have fiddled with config files in the past (like sshd or apache / nginx), you should be able to set up a Prometheus + Grafana server and node_exporter clients in at most 2 hours. Honestly, I think 2 hours is a generous estimate, considering that a somewhat experienced sysadmin like myself set it up in just 15 minutes (and I’m no Linux guru; it’s very likely I know way less than Jay).

1 Like