Accidental Homelab - Sept 15, 2021 - Proxmox clusters -- big - little configuration

dfarning · September 15, 2021, 11:57pm

I have spent much of the past week learning about Dell PowerEdge servers and how to use one as part of a homelab cluster.

One of my goals in my homelab is to provide compute and storage for my home and office while keeping power consumption reasonable. I have been really happy running proxmox on my NUC to provide virtual machines. It sips power: 10-15 Watts at idle and 15-20 Watts under normal load. My Synology 918+ has done a great job as a basic file server. It uses 20-30 Watts with 3x8TB drives installed.

As always seems to be the case, this entry-level hardware was starting to feel cramped. The 32GB of memory in the NUC meant I spent a significant amount of effort keeping my memory footprint low. You can squeeze a surprising number of docker and LXC containers into 32GB. The other limiting factor has been the 1Gb network connectivity on both of those devices.

Thus the new (to me) server. After a fair bit of experimentation, I have gotten the R730 down to about 100 Watts at idle where it spends much of its time. Under heavy load, it can go up to 500-600 Watts.

Another interesting feature is that you can turn nodes of a proxmox cluster off when not in use. It is nowhere near as elegant as a laptop powering up and down as needed. But, it does work. So, I call it my big-little configuration.

The most interesting (for me) thing is the additional organization and discipline that a second server brings. I try to set my services up so that they are server agnostic. Ansible is a big help in that regard. I can change something in one place in ansible and have it propagate throughout the network rather than log into each machine one at a time and update things manually.

Sept 2021
Starting Balance -$ 219
Server Purchase -$1400
homelab monthly budget +$ 50

Ending Balance -$1569

I fear I may have to talk to my spouse about increasing my $50 per month homelab budget. I can feel the eye rolls already!

jay · September 16, 2021, 5:51pm

In regards to turning off nodes of a Proxmox cluster when not in use, can you elaborate more on that?

Also, have you considered having your PowerEdge servers automatically shut down via cron, and perhaps wake them up in the morning via a Raspberry Pi and wake on lan? That’s essentially what I’m doing on my end, though it’s a bit different because 10 Gigabit Intel cards don’t support wake on lan (but there are alternatives).

dfarning · September 17, 2021, 1:36am

Goals

The idea for the big-little proxmox servers comes from the big.LITTLE processor design ARM has been developing for the last couple of years (see ARM big.LITTLE - Wikipedia). The basic idea is that one has two heterogeneous types of cores (or server nodes in my case). Slow power-efficient cores run when there is very little load, and fast power-hungry cores turn on when the load increases.

In Proxmox

I don’t think that this is a use case that the proxmox developers take into consideration when working on proxmox. However, proxmox has a couple of things working in its favor:

Proxmox is designed to cope well with a node unexpectedly going offline. If a node dies unexpectedly, the other nodes continue running their VMs and containers as if nothing happened.
Proxmox can gracefully handle manual node shutdowns. If you shut a node off via the GUI or CLI, proxmox sends the poweroff command to the individual VMs and containers on that node. Once they are all powered off, the proxmox node shuts itself off. This works pretty well. The only questionable behavior I have seen is a GitLab container which sometimes takes 5-10 minutes to shut down. I need to read the logs to make sure that those containers are going down gracefully.

The challenge is that a Proxmox cluster is not designed to intentionally run without all of its nodes:

The first issue is quorum. When making a change to the ‘Proxmox configuration’, all of the nodes in a cluster have a vote, and a change needs a majority of those votes. If one node of a two-node cluster is powered off, it takes two votes to do anything… and there is only one node available to vote. This is a very common design pattern in clustering systems. Some of the early spacecraft used a similar system. They had three navigational computers; if all three didn’t agree, they would all try the calculation again.
The second issue is implementation. The Proxmox GUI does not do anything directly. Instead, the GUI writes to configuration files that are mostly stored in /etc/pve. Then when you hit apply, proxmox triggers a lower-level qemu CLI command which reads the configuration files and makes the actual changes to the system.

There doesn’t seem to be anything that can queue configuration changes and apply them when a node comes back online.
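The quorum arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the majority rule, assuming one vote per node; the function names `votes_needed` and `has_quorum` are made up for this example and are not part of any proxmox tooling.

```python
def votes_needed(total_votes: int) -> int:
    """Return the smallest strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, online_votes: int) -> bool:
    """True if the online nodes hold a strict majority of all votes."""
    return online_votes >= votes_needed(total_votes)


if __name__ == "__main__":
    # (total nodes, nodes currently powered on), one vote per node assumed
    for total, online in [(2, 2), (2, 1), (3, 2)]:
        state = "quorum" if has_quorum(total, online) else "no quorum"
        print(f"{online} of {total} nodes online: {state}")
```

With two nodes the majority is two votes, so powering off either node leaves the survivor without quorum. With three votes, one node can be off and the remaining two still form a majority.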

I need to do more experimentation to see what happens when a node is offline. The PowerEdge currently takes about 10-15 minutes to power back up, so waking it up just to make changes is not really an option.

Power off

Physically, I have found a couple of ways to power off the server:

1. Log in to the server and manually power it off.
2. Hit the shutdown button in the GUI.
3. Use a cron job to send the poweroff command.
4. On PowerEdge systems one can:
a. Log into iDRAC and shut it down.
b. Send IPMI commands (see Intelligent Platform Management Interface - Wikipedia).
I do it the simple way and set up an alias in ssh config to automatically ssh to the device and power it off.

I went with the manual method because my life is very unstructured. I don’t have set times to work in my homelab.

Power on

Interestingly, powering the system back on is a bit harder. My favorite technique is WoL, or Wake-on-LAN. There are several GUI and CLI tools that can send WoL signals manually or on a schedule. The challenge is that not all combinations of network adapters and motherboards respond correctly to WoL signals.
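For anyone curious what a WoL signal actually is, here is a minimal Python sketch. It assumes the standard magic packet layout (six 0xFF bytes followed by the target MAC address repeated 16 times) sent as a UDP broadcast. The function names, the broadcast address and port 9 are choices made for this example, and the MAC address in the usage note is a placeholder.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    cleaned = mac.replace(":", "").replace("-", "")
    if len(cleaned) != 12:
        raise ValueError(f"expected 12 hex digits, got {mac!r}")
    mac_bytes = bytes.fromhex(cleaned)  # raises ValueError on non-hex input
    # 6 bytes of 0xFF, then the 6-byte MAC repeated 16 times = 102 bytes
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local network over UDP."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling `send_magic_packet("00:11:22:33:44:55")` from the always-on machine would send the packet. Whether the target actually wakes up still depends on the adapter, the motherboard and the firmware settings, as noted above.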

On the PowerEdge server, I used IPMI commands (see above) sent through the iDRAC network adapter and it works a treat.
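As a rough sketch of what that can look like, the Python below wraps the `ipmitool` CLI. It assumes `ipmitool` is installed on the controlling machine, that IPMI over LAN is enabled on the iDRAC, and that the password is supplied in the `IPMI_PASSWORD` environment variable (which is what the `-E` flag reads). The host name and user in the usage note are placeholders, and the function names are made up for this example; this is not necessarily the exact command I run.

```python
import subprocess

# Chassis power actions this sketch allows; "soft" requests an OS-level shutdown
ALLOWED_ACTIONS = {"on", "soft", "status"}


def ipmi_power_command(host: str, user: str, action: str) -> list[str]:
    """Build the ipmitool argument list for a chassis power action."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return [
        "ipmitool", "-I", "lanplus",  # IPMI v2.0 over the network
        "-H", host,                   # address of the iDRAC interface
        "-U", user,
        "-E",                         # take the password from IPMI_PASSWORD
        "chassis", "power", action,
    ]


def ipmi_power(host: str, user: str, action: str) -> str:
    """Run the command and return ipmitool's output."""
    result = subprocess.run(
        ipmi_power_command(host, user, action),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

For example, `ipmi_power("idrac.example.lan", "root", "on")` would power the server up, and the same call with `"soft"` would ask it to shut down cleanly.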

I use my NUC the way @jay appears to use his RPi. It is the power-sipping ‘head’ of the homelab, the device which controls everything else.

If anyone else is interested in working on this, it might be interesting to look at the HA (High Availability) code in proxmox. HA has to prepare the cluster so that there are no service interruptions if a node goes offline. It might lend some insight into how to handle LA (Low Availability) situations.

Yep, in 5 years when LA is a thing in proxmox you can say you learned about it first on the Learn Linux TV forum. Now, I just have to convince people smarter than me to implement it.