Advice for a new Proxmox hyper-converged platform

Hello All,

We have been using XEN, and later XCP-ng, for many years. Recently I stumbled upon a few articles and recommendations for Proxmox, and looking at the product itself, it seems very comprehensive in terms of features and manageability, especially the hyper-converged options and the fact that containers run natively on the host itself. I asked myself how we had never tested Proxmox, so we decided to try it first on a modest platform (we are limited on budget), described below. It will mostly be used for our IT department's monitoring and management environment; if it runs well, we will later invest in more enterprise-grade disks. All our monitoring and management tools are dockerized, and we plan to move to K8s and Rancher in the future.

If all goes well, we would later like to build a system for our QA environment, based on Supermicro Hyper SuperServer hardware and hyper-converged with Proxmox.

Current platform, with the following spec:
Cluster quantity: 4x Supermicro 1U servers
CPU: 2x E5-2690 v4
Mem: 256 GB
Net: 4x 10Gb
Disk: 10x 2.5" bays; we plan to mount 6x 2TB Samsung 870 EVO + 2x 2TB Samsung NVMe (for cache)

As for hyper-converged mode, I understand there are two options.
The first is a Ceph cluster, which I don't have experience with and know very little about. I understand it uses the available physical disks directly and cannot be combined with ZFS volumes.
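From the docs I skimmed, setting it up seems to come down to a few `pveceph` commands. This is just a hedged sketch of what I found; the network, device name, and pool name are examples, and I'd double-check the flags against the current documentation:

```
pveceph install                          # install Ceph packages on each node
pveceph init --network 10.10.10.0/24     # dedicated Ceph cluster network
pveceph mon create                       # run on at least three nodes
pveceph osd create /dev/sdb              # one OSD per raw data disk
pveceph pool create vmpool               # replicated pool for VM disks
```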

The second option is to create ZFS volumes with GlusterFS on top of them. I have minimal experience with it; I used to play with it a few years ago. In case we go with GlusterFS, which mode is best suited for 3 nodes?
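From what I remember, a replicated volume on top of per-node ZFS datasets looked roughly like this. A hedged sketch only, and hostnames, dataset, and brick paths are made-up examples:

```
# on each node: a ZFS dataset whose subdirectory serves as the brick
zfs create tank/gluster
mkdir -p /tank/gluster/brick

# from one node: form the trusted pool, then a replica-3 volume
gluster peer probe pve2
gluster peer probe pve3
gluster volume create gv0 replica 3 \
    pve1:/tank/gluster/brick pve2:/tank/gluster/brick pve3:/tank/gluster/brick
gluster volume start gv0
```

With "replica 3" every file is stored on all three bricks, which is the mode I'd expect for 3 nodes, but I'd appreciate confirmation from people who run it.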

I would like your advice and tips on which option is the better hyper-converged setup, for example the network layout and disk settings for each option.
Please advise.

I don’t have hands-on experience with hyper-converged infrastructure, so I’ll be speaking from what I’ve heard from others.

People seem to be praising Ceph and the Ceph integration in Proxmox. I have been a Proxmox user for a while, both at home and in a really small data room. We have only used an NFS NAS attached to our Proxmox hosts, and aside from not having highly available storage (beyond the local RAID; I’m talking about a complete NAS failure), the host HA did work wonders: it could either live migrate, or move the activity to another host without noticeable downtime when an HA node fails.

I’m not in the business of doing a sales pitch for Proxmox, though. On the low end it does have issues: if your cluster ever enters a failed state, it’s going to be hard to recover without resurrecting the dead node(s), due to how Proxmox operates. But if you won’t be faced with such a situation, and if you will have more than 5-7 hosts, it should be fine.

As for storage, I’m not sure how Ceph itself behaves when run directly on the hosts. When you have a Ceph cluster, you get the benefit of what is basically a highly available SAN, but I don’t know how it performs on the local hosts. Common sense would dictate that running on the nodes doesn’t necessarily mean a disk image is served from the same host, so you are likely to be reading data from another host, which, besides defeating the point of local storage, also puts additional strain on the hosts that could have been used for other things.
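Just to put a rough number on that intuition: if replicas (and the "primary" copy that serves reads) are spread uniformly across hosts, then the chance a given read is served locally is about 1 over the number of hosts. This is my back-of-the-envelope assumption, not measured Ceph behavior:

```python
# Sketch of the data-locality intuition above (an assumption, not
# measured Ceph behavior): with copies spread uniformly, only about
# 1/num_hosts of reads would happen to land on the local host's disks.
def local_read_fraction(num_hosts: int) -> float:
    """Expected fraction of reads served by the local host, assuming
    the serving copy is placed uniformly at random across hosts."""
    return 1.0 / num_hosts

for hosts in (3, 4):
    print(f"{hosts} hosts -> ~{local_read_fraction(hosts):.0%} of reads local")
```

So on a 4-host cluster, roughly three quarters of reads would cross the network under that assumption, which is why a dedicated storage network matters.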

So my advice would be to run a Proxmox cluster and a separate Ceph cluster. The Ceph cluster can also run Proxmox, because I hear Ceph is easy to set up on Proxmox.

I just hope I’m not wrong in my assumptions about Ceph, or that there isn’t something missing from my current knowledge, like a way to run VMs on local storage and have them replicated to another server, because that would make it quite impressive.

Some additional points. Proxmox LXC integration is terrible. You are limited to what Proxmox provides: if you are OK with only running Alpine, Arch, CentOS, Debian, Fedora, Gentoo, openSUSE, Rocky, TurnKey and Ubuntu, you should be fine, but if you want something else, even if you have perfectly working LXC rootfs images, good luck with that, because Proxmox can only detect those. Also, Proxmox can’t live migrate LXC containers, only VMs.
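For what it’s worth, that catalog of supported templates is what the `pveam` tool manages. A quick sketch; the exact template file name below is only an example of the naming pattern, not something to copy verbatim:

```
pveam update                      # refresh the template catalog
pveam available --section system  # list the distributions Proxmox knows about
pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst
```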

As for the K8s cluster, I suggest that, if the OCI containers (Docker) themselves can be disposable, you run them inside LXC containers (nested containers). Make a few containers on each Proxmox node, say 4 LXC containers per Proxmox host, of which one is a K8s master node on each host and the rest are worker nodes. That would give you 3 master nodes and 9 worker nodes to work with. If any worker node goes down, the load can be spread over the others; if one of the Proxmox hosts goes down, the rest of the K8s cluster running on the other 2 Proxmox hosts can take up the slack.
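That layout is easy to sanity-check on paper. A small sketch of the arithmetic; the host and node names are made up for illustration:

```python
# Hypothetical layout helper for the scheme above: 4 LXC containers
# per Proxmox host, one of which is a K8s master, the rest workers.
def plan_k8s_layout(proxmox_hosts, containers_per_host=4):
    """Return {proxmox_host: [k8s node names]} with one master per host."""
    layout = {}
    for i, host in enumerate(proxmox_hosts, start=1):
        nodes = [f"k8s-master-{i}"]  # one master per Proxmox host
        nodes += [f"k8s-worker-{i}-{w}" for w in range(1, containers_per_host)]
        layout[host] = nodes
    return layout

layout = plan_k8s_layout(["pve1", "pve2", "pve3"])
masters = [n for ns in layout.values() for n in ns if "master" in n]
workers = [n for ns in layout.values() for n in ns if "worker" in n]
print(len(masters), "masters,", len(workers), "workers")  # 3 masters, 9 workers
```

The point of spreading one master per host is that losing any single Proxmox host leaves two masters standing, which keeps the K8s control plane quorate.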

However, if the K8s workers’ storage is not to be considered disposable, I suggest you switch from LXC to VMs and make them HA. This is not the way Kubernetes was intended to be run, but nowadays so many applications ship as OCI containers (usually Docker) that it’s hard not to run them as containers. You would still have the same number of K8s nodes, but when a node fails, if you need the information on that specific one, you have to revive that node.

Still, if you can follow the perishable-K8s-nodes route, that would be better, as the containers can be relaunched by the master nodes on demand, or even when you need more resources (auto-scaling). But for monitoring software (NMS), I don’t think that’s how it works. At least, I can’t imagine treating my Prometheus and Grafana servers as perishable; I need the information contained in those servers, and launching new instances when the old ones die wouldn’t help me without the data inside them.


One more thing, though it’s more of a personal preference: have 6 NICs:

  • 2 for the storage VLAN / subnet. Storage always has to have its own pipe(s); no negotiation here.
  • 2 for the Proxmox management subnet. Proxmox doesn’t really care about the size of this pipe; the devs say it can be 1 Gbps as long as latency is low, because the hosts need to communicate over it. Having it gobbled up by VM activity or, worse, by storage traffic makes it not so good. That said, we have run a 2x 1 Gbps LACP bond that carried both the VMs and the Proxmox communication, but we only got away with it because the VMs were mostly standalone and didn’t need much network access. They were on separate VLANs, though.
    But that won’t always be the case, especially when dealing with an NMS; we had ours virtualized but gave up and moved it to a host of its own. Also, if your VMs have a lot of RAM, like 64 GB+, live migrating them over just a 1 Gbps pipe will take a hot minute, so 10 Gbps pipes start making sense.
  • 2 for the VMs. Honestly, if your VMs aren’t generating much traffic, you can get away with 2x 1 Gbps ports, but that depends on how many VMs you run and how much network activity there is, so you will likely want those 10G pipes.
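On Proxmox that three-way split usually ends up in /etc/network/interfaces. A hedged sketch of what it could look like; the NIC names, addresses, and bond modes are examples and must match your switch configuration:

```
# /etc/network/interfaces sketch: bond0 = storage, bond1 = management,
# bond2 = VM traffic (example NIC names and addresses)
auto bond0
iface bond0 inet static
    address 10.10.10.11/24
    bond-slaves enp1s0f0 enp1s0f1
    bond-mode 802.3ad            # LACP; needs matching switch config
    bond-xmit-hash-policy layer3+4

auto bond1
iface bond1 inet static
    address 10.10.20.11/24
    bond-slaves enp2s0f0 enp2s0f1
    bond-mode active-backup      # no switch config needed for this mode

auto bond2
iface bond2 inet manual
    bond-slaves enp3s0f0 enp3s0f1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4

# VLAN-aware bridge for guests, on top of the VM bond
auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond2
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
```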

Thank you very much for the advice :smiley:
