Vagrant replacements for local development?

Hello,

As the title suggests, I'm looking for alternatives to Vagrant for local development and testing. I mostly use it to spin up a few VMs and try out things like Docker and Ansible, test-run a little script, try a new distro, and things like that.

I have no huge complaints about Vagrant, as it has worked pretty well for me, except that lately it has been misbehaving and leaving unreachable VMs behind that are hard to clean up. I'm not entirely sure if it's actually an issue with VirtualBox, to be honest, but I think it's time for a change and to learn something new.

The thing is, I'm pretty new to this whole realm of virtualization, so I know nothing about KVM, QEMU, etc. However, LXC and LXD have caught my attention and I'm very curious about them. I'd like to know your thoughts on this. I'm mostly interested in being able to run a VM to play with, and if I can learn something useful along the way, all the better. I feel like I know just enough about Vagrant to get by, but I don't see that knowledge being especially useful going forward.

Also worth noting that I’m running Ubuntu and I mostly use Ubuntu and sometimes CentOS inside my VMs.

Thanks a lot!

I'm not familiar with Vagrant; I just know it's a tool that makes creating VMs and containers easy.

In all honesty, if you are using Linux already, just install Virt-Manager. Look for how-to’s online. This should help you with VMs. If you use Windows, idk, try Hyper-V if you have a Pro version, or get VirtualBox (eeww).

As for LXD, it's my favorite container platform. There is no GUI for it, but it's pretty easy to use. Just:

lxc launch images:centos/7 container-name-number-1
lxc exec container-name-number-1 -- /bin/bash

And you are inside a shell in your CentOS 7 LXD container. I haven't used the LXC tools (the lxc-* commands, with the dash after lxc), but LXD (the lxc command without the dash) makes it really easy to launch images.

If you need OCI containers (Docker / Podman / k8s), I would install Portainer inside an LXD container. Be sure to enable container nesting / nested containerization on that LXD container, so that you can run other containers inside it. Then you can use the Portainer web GUI to manage Docker or MicroK8s.
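If you go that route, enabling nesting is just a config key (a minimal sketch; "docker-host" is a placeholder name):

lxc launch images:debian/12 docker-host -c security.nesting=true
# or, for an existing container:
lxc config set docker-host security.nesting true
lxc restart docker-host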

If you're going the LXD / LXC route, avoid Ubuntu as the container OS, just because snaps don't work inside it and a lot of stuff gets bundled as snaps now, like MicroK8s. Use Debian, Rocky or Alma (all of which are available on linuxcontainers.org, which LXD uses as the image repo you launch new containers from).

If you know Ansible, you can create Ansible playbooks to control all of this, but to me it's a PITA; for some reason I just don't like Ansible and prefer to stick to the CLI and regex.

I'm currently running Ubuntu 20.04, but if I make sure to install LXC/LXD through the official repositories rather than the snap package, would that be fine?

How does Virt-Manager compare to LXC? I've only taken a quick look, but it seems like Virt-Manager shows statistics and things like that; I'm sure that information can also be retrieved somehow through LXC. One of the tutorials I watched recently did mention that LXC instances tend to take up as many resources as are available, so I should be careful with that.

Again, my use case is not to set up a home lab, not yet anyway, but rather to run a few VMs to play around and test things out. Because of this, one of my goals is to have this process work with other tools like Ansible, as it currently does with Vagrant. I want things expressed in code so that I can make changes quickly, and backed up with git so that I can keep track of changes but also share it all easily with a git pull.

Virt-Manager is just a GUI for libvirt (virsh). You can achieve the same thing with virsh on the CLI. I'm not sure what your workflow is and how you actually use Ansible. If you use it to create, delete, start and stop VMs, then obviously virt-manager is not for you and you should probably drive libvirt directly. But if you use Ansible to configure VMs and test software after you create them, then a GUI is not too bad.
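If you do end up driving libvirt from the CLI or from playbooks, the day-to-day virsh commands are short. A rough sketch ("testvm" is just a placeholder name):

virsh list --all
virsh start testvm
virsh shutdown testvm
virsh destroy testvm
virsh undefine testvm --remove-all-storage

list --all shows every defined VM (running or not), shutdown is a clean ACPI shutdown, destroy is a hard power-off, and undefine removes the definition (and, with that flag, its disks).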

LXC is a container technology, basically chroot on steroids. So compared to VMs, LXC is really lightweight. LXC can be configured not to use all the memory and CPU, but I don't have enough experience with the LXC tools to say how to do that.

LXD is a container and VM orchestrator. In the back-end it uses LXC; however, LXD makes administration of containers and VMs much easier and, if needed, exposes it over the network (as opposed to using SSH), so you can manage LXC containers from another device. LXD also has a VM feature, which leverages QEMU. The syntax is the same as for containers; you just add --vm when creating VMs. So if you are going to create Ansible playbooks yourself, LXD would simplify the task (you basically get two-in-one).
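For example (a minimal sketch; instance and image names are placeholders, and the image needs a VM variant on the image server):

lxc launch images:ubuntu/22.04 test-ct
lxc launch images:ubuntu/22.04 test-vm --vm
lxc list

Both show up in the same lxc list output and are managed with the same commands.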

LXD also has an easy way to set the resource utilization of containers and VMs through profiles. You create a profile with standard RAM, CPU, networks and permissions, basically like AWS EC2 instance types (t2.medium, t2.large etc.), and you assign your containers and VMs to that profile; they will only use as much as you give them, or unlimited if nothing is set on the profile. You could also create a profile for each container or VM, but that becomes unmanageable fast.
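Something along these lines (a rough sketch; the profile name and limits are made up):

lxc profile create t2-medium-ish
lxc profile set t2-medium-ish limits.cpu 2
lxc profile set t2-medium-ish limits.memory 4GB
lxc launch images:debian/12 test1 -p default -p t2-medium-ish

The instance gets the default profile for networking and storage, plus the resource limits from the second profile.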

By editing a profile, you change the resources of all the containers and VMs that use it (VMs need to be restarted for the settings to apply, though). Or you can move a container to another profile and it will get the new resources.
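For example (again, just placeholder names):

lxc profile edit t2-medium-ish
lxc profile assign test1 default,t2-large-ish

The first bumps the limits for everything using that profile; the second moves one instance onto a different set of profiles.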

Finally, VMs, being complete virtual hardware, will use all the resources you allocate to them, plus a bit of your host's resources to virtualize the hardware itself. A now-standard feature in most hypervisors, including KVM (which QEMU, and thus libvirt, uses), is paravirtualization: with a driver in your guest OS, you reduce the amount of system resources spent emulating physical hardware. KVM calls those devices virtio. Lastly, VMs in libvirt will use all the resources they can get when it comes to RAM and storage.
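To make the virtio part concrete: when creating a VM with virt-install (libvirt's CLI installer), you can ask for virtio disk and network devices explicitly. A sketch only; the name, sizes and ISO path are placeholders:

virt-install \
  --name centos-test \
  --memory 2048 --vcpus 2 \
  --disk size=20,bus=virtio \
  --network network=default,model=virtio \
  --cdrom /path/to/CentOS-7-x86_64-Minimal.iso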

Another feature developed for VMs is the memory ballooning device, which lets you allocate less RAM to a VM and have it expand when the VM requires it, but I never use it, because I find it clunky. For disk, you have something called thin provisioning, which makes the VM only consume as much storage as it actually uses. The double-edged sword of this is that you can overprovision your server: you trick your VMs into having an overall bigger pool of resources than actually exists on the host. If the VMs behave, they may very well work and share nicely. But if suddenly all the VMs decide to gobble up RAM, either some VMs will crash or they will get suspended (likely the former). And if the storage fills up while a VM thinks it has more storage than it does and tries to write to it, it will automatically get suspended by the hypervisor.
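Thin provisioning is easy to see with qemu-img (just an illustration; the path and size are arbitrary):

qemu-img create -f qcow2 /var/lib/libvirt/images/test.qcow2 50G
qemu-img info /var/lib/libvirt/images/test.qcow2

The info output shows a 50G "virtual size" but a "disk size" near zero; the file only grows as the guest actually writes data.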

Now, after all that: LXC and LXD function basically like ballooning for RAM, and I believe thin provisioning for storage too (though I'm not certain on the latter), except that it's the default and cannot be changed, due to how LXC works. The resources you allocate to a container are not actually permanently reserved for it (not sure how it works for VMs in LXD, though). Your profile sets the maximum limit of a container, but if the RAM is not used, the rest is available to your host. That gets you into the same overprovisioning issue, so you have to keep in mind the total amount of resources the containers have set in their profiles; but for LXC/LXD it's not as big a deal as for VMs, since containers don't tend to crash when they reach their RAM limit.

The only disadvantage I see to using LXD VMs is that I don't think it can run OSes other than the images provided by linuxcontainers.org. It may be possible to install things like Windows or BSD manually and then just clone the VM, but I don't know how one would do that, so if you sometimes use OSes other than Linux in your testing, keep this in mind.

1 Like

:stuck_out_tongue:

4 Likes

There aren't enough hours in a day for the team I'm on to manage all our servers without automation. TF and Ansible are lifesavers (and not the candy kind). Between dev, test, stage and prod servers in HA / DR, our monthly patching cycles alone would ensure we get "no sleep" if we had to do it all manually, not to mention service maintenance and all that entails.

I can understand the desire to do things manually, but when compliance, uniformity, alarms, metrics and all the rest come into play, deploying updates from an inventory makes life "so much" easier.

3 Likes

Plus I think an Ansible playbook is or at least can be pretty much self-documenting, which saves time/errors too. :slight_smile:

3 Likes

Another example: what if the services you are responsible for are Tier-0 services, meaning everyone else needs your services to run their DevOps CI/CD pipelines or their orchestration mechanics? You can't take these services down; they have five-nines SLAs (99.999% uptime). You simply don't have time to do things manually. We have to shift these stacks on the load balancers, do what we need to do, then shift them back, like, yesterday. We have minutes, not hours, to perform the actions we need to do.

2 Likes

I'm not saying not to do automation, I'm saying I'm not a fan of Ansible. It's likely due to all the horror stories I've read of Ansible behaving unexpectedly even though the logic of the playbook indicates otherwise.

I did my automation using pssh, pscp (which is part of pssh) and shell scripts. With shell scripts, any Linux or Unix admin will understand what is happening from the code, unlike with Ansible, where you can only guess what is happening. Sometimes you have to write additional stuff, like checking os-release and picking whichever package manager is available in the script, which Ansible already handles, but only because people already wrote that previously.

Starting up and cleanly shutting down hundreds of Oracle DBs with one command was a godsend when we had planned maintenance on our hypervisors. We didn't have those kinds of SLAs, but if we did, we would probably have bought more hosts, moved VMs around, upgraded hosts, moved them back, and upgraded the other hosts; it just wasn't worth the investment. We still had some leeway: we never overprovisioned our Proxmox servers and we could move VMs around in case there was an unexpected issue with one, but we rarely needed to.

I had all kinds of shell scripts, from updating CentOS or Debian, to switching deprecated CentOS releases to the vault repo (because some customers refused to upgrade, so we still had to test stuff on CentOS 5 and 6), to configuring a new environment from scratch. I never really felt the need for Ansible, and given the already overloaded CRM, I did not have time to learn it. My colleagues set up an Ansible VM and tried playing with it, but I needed something and I needed it right then and there, so I just started writing shell scripts, which worked phenomenally.

It would have been wise back then to learn Ansible or another automation tool, but work did not allow me the time to do so; I had a tight schedule of writing shell scripts, testing them for correctness, then deploying them. And with everything being so tailor-made to my infrastructure and needing to run stuff on so many VMs (almost 300), I would even write a shell script that I only needed to run once, just so I would not have to SSH into all the VMs and run stuff manually.

That being said, there was one time when I could not, for the love of me, automate something, and that was upgrading Proxmox from 5 to 6. Proxmox asks questions during the upgrade process and some hosts would add or skip questions, like "file n was modified, but the repo has a new one, keep yours or replace it with upstream?" So for that, I just used a terminal multiplexer (dvtm), SSH'ed into all the hosts and used the multiplexer's feature to send the same input to all the windows (ctrl+g a). Typing "apt dist-upgrade" and "y" and "n" and so on for each question helped me keep my sanity in check and not have to answer every question on every host separately, even though for some of them I had to disable the input broadcast and select a specific window that had another question needing an answer.

Good times. Now I don’t maintain an infrastructure anymore. I miss having to buy and install servers and managing my cluster and the VMs… and the network, and the NASes. I don’t miss testing the power generator though.

Personally, I think Ansible is much easier to read and understand than someone else’s shell script.

I use Python for most of my automation, but there are many tools out there.

3 Likes

I found it lots easier to learn starting out, too. Probably because most of the hard parts are already written (all the different modules) and very standardized and so I just needed to learn to use them from the playbooks. And it’s nice to have a homelab to test it on too. :smile_cat:

2 Likes

Yeah, the first thing to learn about Ansible is that it's not a scripting language. Instead of defining and coding what you want done, the thought process is more one of describing a desired configuration state. Then the modules do the rest.
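Even a one-off ad-hoc command shows the idea: you state what should be true and the module decides whether anything needs doing (a sketch; the inventory group and package are placeholders):

ansible webservers -b -m ansible.builtin.package -a "name=nginx state=present"

Run it twice: the second run reports "ok" instead of "changed", because the described state is already satisfied.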

2 Likes

It's easy to read because you already know Ansible. But given a large sample of Linux or Unix admins, I would say there would be very few who wouldn't fully understand the shell script, and it wouldn't take more than two minutes to open the man page of any command they don't recognize and figure it out.

Here's an example:
#!/bin/bash
if [ -d /mnt/orabkp ]
then
ls -l /mnt/orabkp > /dev/null
ok=$(mount -l | grep -c bkp)
if [ ! $ok -le 0 ]
then
for foo in $(ls -l /opt/orabkp/ | grep drwx | awk '{print $9}')
do
        if [ "$foo" != "fakedb" ]
        then
                if [ ! -d /mnt/orabkp/$foo ]
                then
                        mkdir /mnt/orabkp/$foo
                        touch /mnt/orabkp/$foo/.mark
                        mkdir /mnt/orabkp/$foo/autobackup
                        mkdir /mnt/orabkp/$foo/backupset
                        chown -R oracle:dba /mnt/orabkp/$foo
                        chmod 775 -R /mnt/orabkp/$foo
                        chmod 777 /mnt/orabkp/$foo/.mark
                fi
        fi
done
fi
fi

This is when I was just starting to write shell scripts, now I could write them better, but I didn’t modify my original one, just to make a point.

The background: Oracle backups were made by having backup scripts for each DB, in folders in /opt/orabkp, each folder being named after the DB instance name. All the script does is verify that the folder /mnt/orabkp exists; if it does, it runs ls, just to make sure the NFS share is reachable. If the mount command detects an NFS mount in that location, it continues. It then checks whether the folders already exist on the NFS server and, if one doesn't, it means it's a new DB that needs to be backed up, so it creates its folder and the .mark file. That last file is used to verify that the NFS share is mounted before doing the DB dump.

I used this script hundreds of times when new DBs were created, along with another one (they were twin scripts; now that I think about it, I could have just combined them into one). I have more scripts like those that I used to automate tasks. Could I have used Ansible for this? Sure, if I had learned it, but the script above is simple, any Linux sysadmin worth their salt will understand it (I don't work at that company anymore, so someone will look, or has looked already, at it to understand what it does), and I wrote it in probably less than 10 minutes. I don't know how long it would have taken me to replicate the same tasks in Ansible. If I knew Ansible, probably about the same; but since I didn't, probably a lot more, time that I didn't have.

And to run it on multiple machines, I would use pssh:

pscp.pssh -h HOSTS_FILE -p 20 -o /var/log/pssh-output-logs/ -e /var/log/pssh-error-logs/ the-above-script.sh /root/scripts/
pssh -h HOSTS_FILE -p 20 -o /var/log/pssh-output-logs/ -e /var/log/pssh-error-logs/ "/root/scripts/the-above-script.sh"

Just like in Ansible, I had different hosts files with the servers I wanted to deploy and run the scripts on, in this case the Oracle servers. The gist is that pscp does an "scp file user@host:/path/" for each host in the hosts file you give it, with -p being the number of parallel threads. Then pssh runs the script on all of them, and I'd be done with my "work" and move on to my next task.

There weren't a lot of tasks I could automate, but I had a few, among them: changing repos, updating, installing all the software on a new VM, adding an SSH key to authorized_keys on a lot of VMs when someone new came along, removing an SSH key from all users' authorized_keys, or creating or deleting a user on all VMs when someone joined or left. All of it doable in Ansible, I'm certain.

Now that I'm not a sysadmin anymore, I'm not sure how worthwhile learning Ansible would be. If the opportunity comes along, I may, but if I need to do things for myself that would be easier with automation, I will probably go back to writing shell scripts.

Well, thank you all for this. I think I'll be going with LXC and see how things go from there :slight_smile:

1 Like

By easier to read, I meant to convey that with Ansible you describe a configuration state, and the Ansible modules provide the how of getting to that state. It's very different from Python or shell scripting, where you not only have to account for the configuration state, you also have to code how to get there. This concept helped me learn Ansible.

Good topic and great info. Thanks for sharing your insights.

2 Likes

Ok, now I get what you mean. Well, yeah, a shell script will always be bigger than an ansible playbook in that sense.

FWIW, we use several options: Ansible, shell, Perl, others. However, if we invoke scripts, they are not copied to the production servers; we use Artifactory and push a new package (script collection) to the server(s), then invoke it in whatever way is appropriate for the task. This is all part of our change control process: plan, implement, validate, and roll back (if validation fails). Anything hitting production servers must be run in a stage environment first, and pass successfully, before being pushed to prod.

4 Likes

I think this is the correct mindset. Use what you need and avoid being dogmatic about what to use and not to use. Personally I really like Ansible, which I'm learning right now, and I'm not a fan of shell scripting precisely because it's not as clear to read and write. But as there rarely is a one-size-fits-all solution, we should be able to adapt.

2 Likes

Perl ???
Man, that brings back some memories. I loved Perl for a long time.

Agreed. It always depends on the use case.

It’s good that we have so many choices.

1 Like

Not saying "I use Perl" … just that some still do. My days of perl -e cpan are long past their sell-by date, ha!

2 Likes