Vagrant replacements for local development?

There aren’t enough hours in the day for the team I’m on to manage all our servers without automation. Terraform and Ansible are lifesavers (and not the candy kind). Between dev, test, stage, and prod servers in HA/DR, our monthly patching cycles alone would ensure we get “no sleep” if we had to do it all manually, not to mention service maintenance and all that entails.

I can understand the desire to do things manually, but when compliance, uniformity, alarms, metrics, and all the rest come into play, deploying updates from an inventory makes life “so much” easier.

3 Likes

Plus I think an Ansible playbook is, or at least can be, pretty much self-documenting, which saves time and errors too. :slight_smile:

3 Likes

Another example: what if the services you are responsible for are Tier-0 services, meaning everyone else needs your services to run their DevOps CI/CD pipelines or their orchestration mechanics? You can’t take these services down; they have five-nines SLAs (99.999% uptime). You simply don’t have time to do things manually. We have to shift these stacks on the load balancers, do what we need to do, then shift them back, like, yesterday. We have minutes, not hours, to perform the actions we need to do.

2 Likes

I’m not saying not to do automation; I’m saying I’m not a fan of Ansible. It’s likely due to all the horror stories I’ve read of Ansible behaving unexpectedly even though the logic of the playbook indicates otherwise.

I did automation myself using pssh, pscp (which is part of pssh), and shell scripts. With shell scripts, any Linux or Unix admin will understand what is happening from the code, unlike with Ansible, where you can only guess what is happening underneath. Sometimes you have to write additional things, like checking os-release in the script and using whichever package manager is available to you, which Ansible already does, but only because people already wrote that logic for you.
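
For example, that distro check usually boiled down to something like this (a rough sketch, not my actual script):

# pick a package manager based on /etc/os-release
. /etc/os-release
case "$ID" in
    debian|ubuntu) PKG="apt-get" ;;
    centos|rhel)   PKG="yum" ;;
    *) echo "unsupported distro: $ID" >&2; exit 1 ;;
esac
$PKG install -y htop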

Starting up and cleanly shutting down hundreds of Oracle DBs with one command was a godsend when we had planned maintenance on our hypervisors. We didn’t have those kinds of SLAs, but if we did, we would probably have bought more hosts, moved VMs around, upgraded the hosts, moved them back, then upgraded the other hosts, but it wasn’t worth the investment. We still had some leeway: we never overprovisioned our Proxmox servers, so we could move VMs around in case there was an unexpected issue with one, but we rarely needed to.

I had all kinds of shell scripts, from updating CentOS or Debian, to pointing deprecated CentOS releases at the vault repo (because some customers refused to upgrade, so we still had to test stuff on CentOS 5 and 6), to configuring a new environment from scratch. I never really felt the need for Ansible, and given the already overloaded CRM, I did not have time to learn it. My colleagues set up an Ansible VM and tried playing with it, but I needed something right then and there, so I just started writing shell scripts, which worked phenomenally.

It would have been wise back then to learn Ansible or another automation tool, but work did not allow me the time: I had a tight schedule of writing shell scripts, testing them for correctness, then deploying them. With everything so tailor-made to my infrastructure, and needing to run things on so many VMs (almost 300), I would even write a shell script I only needed to run once, just so I would not have to SSH into all the VMs and run things manually.

That being said, there was one time when I could not, for the life of me, automate something, and that was upgrading Proxmox from 5 to 6. Proxmox asks questions during the upgrade process, and some hosts would add or skip questions, like “file n was modified, but the repo has a new one; keep yours or replace it with upstream?” So for that, I just used a terminal multiplexer (dvtm), SSH’ed into all the hosts, and used the multiplexer’s feature to send the same input to all the windows (ctrl+g a). Typing “apt dist-upgrade”, then “y” and “n” and so on for each question helped me keep my sanity in check and not have to answer every question on every host separately, even though for some of them I had to disable the input broadcast and select a particular window that had another question needing an answer.

Good times. Now I don’t maintain an infrastructure anymore. I miss buying and installing servers, managing my cluster and the VMs… and the network, and the NASes. I don’t miss testing the power generator, though.

Personally, I think Ansible is much easier to read and understand than someone else’s shell script.

I use Python for most of my automation, but there are many tools out there.

3 Likes

I found it a lot easier to learn starting out, too. Probably because most of the hard parts are already written (all the different modules) and very standardized, so I just needed to learn to use them from the playbooks. And it’s nice to have a homelab to test it on too. :smile_cat:

2 Likes

Yeah, the first thing to learn about Ansible is that it’s not a scripting language. Instead of defining and coding what you want done, the thought process is more about describing a desired configuration state. Then the modules do the rest.
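
For example, even an ad-hoc run just names the desired state, and the package module figures out apt vs yum on its own (the “webservers” inventory group here is made up):

ansible webservers -i hosts -m package -a "name=htop state=present" --become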

2 Likes

It’s easy to read because you already know Ansible. But given a large sample of Linux or Unix admins, I would say very few wouldn’t fully understand the shell script, and it wouldn’t take more than 2 minutes to open the man page of any command they don’t understand and figure it out.

Here's an example:
if [ -d /mnt/orabkp ]
then
ls -l /mnt/orabkp > /dev/null
ok=$(mount -l | grep -c bkp)
if [ ! $ok -le 0 ]
then
for foo in $(ls -l /opt/orabkp/ | grep drwx | awk '{print $9}')
do
        if [ "$foo" != "fakedb" ]
        then
                if [ ! -d /mnt/orabkp/$foo ]
                then
                        mkdir /mnt/orabkp/$foo
                        touch /mnt/orabkp/$foo/.mark
                        mkdir /mnt/orabkp/$foo/autobackup
                        mkdir /mnt/orabkp/$foo/backupset
                        chown -R oracle:dba /mnt/orabkp/$foo
                        chmod 775 -R /mnt/orabkp/$foo
                        chmod 777 /mnt/orabkp/$foo/.mark
                fi
        fi
done
fi
fi

This is from when I was just starting to write shell scripts; I could write them better now, but I didn’t modify my original one, just to make a point.

The background: Oracle backups were made by having backup scripts for each DB in folders under /opt/orabkp, each folder named after the DB instance name. All the script does is verify that the folder /mnt/orabkp exists; if it does, it runs ls just to poke the NFS mount. If the mount command detects an NFS mount in that location, it continues. It then checks whether each folder already exists on the NFS server, and if it doesn’t, that means it’s a new DB that needs to be backed up, so it creates its folder and the file .mark. That last file is used to verify that the NFS share is mounted before doing the DB dump.

I used this script hundreds of times when new DBs were created, along with another one (they were twin scripts; now that I think about it, I could have just combined them into one). I have more scripts like those that I used to automate tasks. Could I have used Ansible for this? Sure, if I had learned it, but the script above is simple, any Linux sysadmin worth their salt will understand it (I don’t work at that company anymore, so someone will look, or has looked already, at it to understand what it does), and I wrote it in probably less than 10 minutes. I don’t know how long it would have taken me to replicate the same tasks in Ansible. If I knew Ansible, probably about the same time, but since I didn’t, probably a lot more, time that I didn’t have.
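
If I were writing it today, it would probably look closer to this (just a sketch, assuming mountpoint(1) from util-linux is available):

#!/bin/sh
# Same logic, tightened up: fail early, use mountpoint(1), stop parsing ls.
set -eu

mountpoint -q /mnt/orabkp || exit 1

for dir in /opt/orabkp/*/; do
    [ -d "$dir" ] || continue
    db=$(basename "$dir")
    [ "$db" = "fakedb" ] && continue
    if [ ! -d "/mnt/orabkp/$db" ]; then
        mkdir -p "/mnt/orabkp/$db/autobackup" "/mnt/orabkp/$db/backupset"
        touch "/mnt/orabkp/$db/.mark"
        chown -R oracle:dba "/mnt/orabkp/$db"
        chmod -R 775 "/mnt/orabkp/$db"
        chmod 777 "/mnt/orabkp/$db/.mark"
    fi
done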

And to run it on multiple machines, I would use pssh:

pscp.pssh -h HOSTS_FILE -p 20 -o /var/log/pssh-output-logs/ -e /var/log/pssh-error-logs/ the-above-script.sh /root/scripts/
pssh -h HOSTS_FILE -p 20 -o /var/log/pssh-output-logs/ -e /var/log/pssh-error-logs/ "/root/scripts/the-above-script.sh"

Just like in Ansible, I had different hosts files with the servers I wanted to deploy to and run the scripts on, in this case Oracle servers. The basics of it: pscp does an “scp file user@host:/path/” for each host in the hosts file you give it, with -p being the number of parallel threads. Then pssh runs the script on all of them, and I’m done with my “work” and can move on to my next task.
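
A HOSTS_FILE is just one user@host per line, and pssh is essentially parallelizing the loop you would otherwise write yourself (host names below are examples):

# HOSTS_FILE contents:
#   root@oradb01.example.com
#   root@oradb02.example.com

# the serial loop that pscp/pssh replace:
while read -r host; do
    scp the-above-script.sh "$host":/root/scripts/
    ssh "$host" /root/scripts/the-above-script.sh
done < HOSTS_FILE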

There weren’t a lot of tasks that I could automate, but I had a few, among them: changing repos, updating, installing all the software on a new VM, adding an SSH key to a lot of VMs’ authorized_keys when someone new came along, removing an SSH key from all the users’ authorized_keys on the VMs, or creating or deleting a user on all VMs. I’m certain all of it is doable in Ansible.
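
The SSH key one, for instance, was basically a one-liner through pssh (a sketch; the user and key are placeholders):

pssh -h HOSTS_FILE -p 20 "mkdir -p /home/newadmin/.ssh && echo 'ssh-ed25519 AAAA... newadmin' >> /home/newadmin/.ssh/authorized_keys"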

Now that I’m not a sysadmin anymore, I’m not sure how worthwhile learning Ansible would be. If the opportunity comes along, I may, but if I need to do things for myself that would be done more easily by automation, I will probably go back to writing shell scripts.

Well, thank you all for this. I think I’ll be going with LXC and see how things go from there :slight_smile:

1 Like

By easier to read, I meant to convey that with Ansible you describe a configuration state. The Ansible modules provide for how to get to the described configuration state. It’s very different with Python or shell scripting, where you not only have to account for the configuration state, you also have to code how to get there. This concept helped me learn Ansible.

Good topic and great info. Thanks for sharing your insights.

2 Likes

Ok, now I get what you mean. Well, yeah, a shell script will always be longer than an Ansible playbook in that sense.

FWIW, we use several options: Ansible, shell, Perl, others. However, if we invoke a script, it is not copied to the production servers; we use Artifactory and push a new package (a script collection) to the server(s), then invoke it in whatever way is appropriate for the task. This is all part of our change-control process: plan, implement, validate, and roll back (if validation fails). Anything hitting production servers must run in a stage environment first, and pass successfully, before pushing to prod.

4 Likes

I think this is the correct mindset. Use what you need and avoid being dogmatic about what to use and what not to use. Personally I really like Ansible, which I’m learning right now, and I’m not a fan of shell scripting, precisely because it is not as clear to read and write. But as there rarely is a one-size-fits-all solution, we should be able to adapt.

2 Likes

Perl ???
Man, that brings back some memories. I loved Perl for a long time.

Agreed. It always depends on the use case.

It’s good that we have so many choices.

1 Like

Not saying “I use Perl” … just that some still do. My days of perl -e cpan are long past their sell-by date. Ha!

2 Likes

Here’s a really good exercise and classic example for somebody who wants to learn automation.

Nextcloud Full Setup Implementation - From Jay

  • Spin up a VM / BM server
  • Get the base OS updated, etc
  • Automate this deployment (Shell Scripts, Ansible, whatever)

You’ll learn “a lot” from going through this process no matter what framework you choose.

4 Likes

I just saw this posted. I have been meaning to set up something similar for a long time… I think I just need to get some small-scale practice with this type of project first: see what works and what goes wrong, fix issues, etc.

I was digging through my old script archive and ran across a bunch of Python Fabric modules I used some time ago. I’ve got a bunch more for deploying Nginx, and other tests I was playing with (if I can find them).

You can edit the ENV list and run the commands on multiple servers. It’s not Ansible, it’s Fabric, running commands remotely. You could easily create a Bash function to call these methods directly, like the sketch below.
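
For instance, a hypothetical wrapper (assumes fab.py is in the working directory and Fabric3 is installed):

# call the fab.py tasks below from a shell function
fab_upgrade() {
    fab -f fab.py -u "$1" -p "$2" --sudo-password="$2" setup upgrade
}

# usage: fab_upgrade myuser mypassword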

This particular set of functions is rather simple, and is used for Ubuntu-based servers.

  • Save file as: fab.py

You could get fancy and create a connection module, list a bunch of IPs, and do it all that way. This is just a simple example.

This could be handy for creating the functions to deploy the exercise above without going full Ansible or learning other intricate automation frameworks.

After you write up each individual function, you could write a simple main function that runs them all to deploy the service in its entirety.
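
Or skip the main function; Fabric runs tasks in the order you list them on the command line:

fab -u username -p password --sudo-password=password setup update upgrade clean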

Reference Documentation: Python3 Fabric

"""
Date.........: 2/19/2017
Author.......: KI7MT
Description..: Remote server commands using Fabric3

Update the package list:

  fab -u username -p password --sudo-password=password setup update

Check for updates:

  fab -u username -p password --sudo-password=password setup check

Update and upgrade (reboot if necessary):

  fab -u username -p password --sudo-password=password setup upgrade

"""

from fabric.api import sudo, run, env
from fabric.colors import red
from fabric.contrib.files import exists
from fabric.contrib.console import confirm

def setup():
    """
    Sets up the environment using the given user and host addresses
    """
    env.hosts = ["192.168.1.1"] # Change this to your remote server IP address

def update():
    """
    Updates the package list
    """
    sudo('apt-get -qq update')

def check():
    """
    Displays package updates needed
    """
    # Simulate the dist-upgrade, then drop apt's header (first 7 lines) and the
    # Inst/Conf lines, leaving a readable summary of packages needing updates.
    run("""apt-get dist-upgrade -s | python3 -c "import sys; [sys.stderr.write(l) for l in sys.stdin.readlines()[7:] if not (l.startswith('Inst') or l.startswith('Conf'))]" """)

def upgrade():
    """
    Upgrades the system, updating packages and reboots if needed
    """
    sudo('apt-get -y -qq upgrade')
    if exists('/var/run/reboot-required') and confirm('Needs reboot, do it now?'):
        print(red('Rebooting now', True))
        sudo('reboot')

def clean():
    """
    Cleans the packages and install script in /var/cache/apt/archives
    """
    sudo('apt-get -y -q clean')

def autoclean():
    """
    Removes obsolete Ubuntu packages
    """
    sudo('apt-get -y -q autoclean')

def autoremove():
    """
    Removes packages that are no longer needed
    """
    sudo('apt-get -y -q autoremove')

def autoremove_purge():
    """
    Removes packages that are no longer needed plus residual files
    """
    sudo('apt-get -y -q autoremove --purge')
2 Likes

After a couple of days I’m now realizing it’s probably going to be easier to simply switch from VirtualBox to libvirt as the default provider in Vagrant. This way I can keep all the configuration as code and everything will continue to work fairly easily, including private networks, SSH, etc. Since you can also specify the provisioner in the form of Ansible or shell scripts, I think it’s going to be more flexible.
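
From what I’ve read so far, the switch itself should be roughly this (assuming a working libvirt/KVM setup and a box that ships a libvirt image; untested on my end):

vagrant plugin install vagrant-libvirt
export VAGRANT_DEFAULT_PROVIDER=libvirt
vagrant up    # or: vagrant up --provider=libvirt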

I still haven’t tested this, but from the look of it, the fact that it returns Python objects makes it really useful for more complex cases where you can easily account for errors, specific versions, etc. I will try this out for sure, thanks! Do you have any more scripts like this? :slight_smile:

I found one project I was messing with on DigitalOcean. It simply deployed a .NET Core web app along with Nginx. If I have some time this weekend, I’ll clean it up and push the example to my GitHub repo.

This won’t work at present, as it needs the dependent files (systemd service file, project HelloMvc, etc.)

"""
Date.........: 3/3/2017
Author.......: KI7MT
Description..: Hellomvc Web Server Functions using Fabric3

SYNOPSIS
    Example .NET Core MVC project that runs on Ubuntu 16.04 with an Nginx
    proxy server. For simplicity, and to prevent inadvertent installation of
    cross-projects, each project should have its own *.py file.

    Publishing with Python Fabric3 is functional, but may not be the best
    solution for this activity. Each project requires a systemd *.service file
    in order to maintain the service. This script provides for that, but
    subsequent updates to the application (agile-style continuous integration)
    could be handled by repository tools.

USAGE
    Initial Deployment: fab -f hellomvc_example.py -u USER -p PASSWORD --sudo-password="PASSWORD" connect initial_webapp_deployment

    Republish: fab -f hellomvc_example.py -u USER -p PASSWORD --sudo-password="PASSWORD" connect update_webapp

"""
import os
import sys
from fabric.api import *
from fabric.colors import *
from server_connect import connect as connect


# Project Data
PROJECT_NAME='NginxTesting'
PACKAGE_NAME='Hellomvc'
GIT_URL='<UPDATE_THIS-REPO>'
SOURCE_DIR='~/src/' 
DESTINATION_DIR='/var/webapps/hellomvc'
SERVICE_FILE='hellomvc.service'
INSTALL_LOCATION='/etc/systemd/system'

#------------------------------------------------------------------------------
# GENERAL PURPOSE FUNCTIONS
#------------------------------------------------------------------------------
def checkout_project():
    """
    As a normal user, clone the sample project

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD connect checkout_project

    @param GIT_URL: the URL of the git repository

    """
    print(yellow("\nChecking Out Project: %s " % PROJECT_NAME))
    run('mkdir -p ~/src && cd ~/src && git clone %s' % GIT_URL)


def fetch_updates():
    """
    Fetch updates from the remote repository

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD connect fetch_updates

    """
    print(yellow("\nFetching Updates for: %s " % PROJECT_NAME))
    run('cd ~/src/%s && git fetch' % PROJECT_NAME)


def service_generate():
    """
    Generate systemd service file for the project
    
    NOTE: Normal Ubuntu systemd .service files go into /lib/systemd/system, with
    a symlink (ln -s) in /etc/systemd/system. However, the Microsoft instructions
    say it should be installed in the /etc/systemd/system directory.

    To maintain continuity with the Microsoft how-to, /etc/systemd/system
    is used for this example.

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD --sudo-password="PASSWORD" connect service_generate

    """
    # function variables
    f = SERVICE_FILE
    sf = SOURCE_DIR + f
    dest = DESTINATION_DIR

    # start the generation process
    print(yellow("\nGenerating [ %s ] File" % f))
    run('cd ~')
    run('rm -f %s' % sf)
    run('touch %s' % sf)
    run('echo [Unit] >> %s' % sf)
    run('echo Description="Dot Net Core Hello MVC Example" >> %s' % sf)
    run('echo "" >> %s' % sf)
    run('echo [Service] >> %s' % sf)
    run('echo WorkingDirectory=%s >> %s' %(dest,sf))
    run('echo ExecStart=/usr/bin/dotnet %s/Hellomvc.dll >> %s' % (dest,sf))
    run('echo Restart=always >> %s' % sf)
    run('echo RestartSec=10 >> %s' % sf)
    run('echo SyslogIdentifier=dotnet-example >> %s' % sf)
    run('echo User=www-data >> %s' % sf)
    run('echo Environment=ASPNETCORE_ENVIRONMENT=Production >> %s' % sf)
    run('echo Environment=DOTNET_PRINT_TELEMETRY_MESSAGE=false >> %s' % sf)
    run('echo "" >> %s' % sf)
    run('echo [Install] >> %s' % sf)
    run('echo WantedBy=multi-user.target >> %s' % sf)
    print(green("Finished!"))

    # Now install the service file to the final location
    print(yellow("\nInstalling [ %s ] File" % f))
    sudo('cp -u %s /etc/systemd/system/%s' % (sf,f))

    # change the permissions.
    sudo('chown root:root /etc/systemd/system/%s' % f)
    print(green("Finished!"))


def service_enable():
    """
    Enable a unit to be started on bootup: hellomvc.service

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD --sudo-password="PASSWORD" connect service_enable

    """
    print(yellow("\nEnabling Service [ %s ] " % SERVICE_FILE))
    sudo('systemctl enable %s' % SERVICE_FILE)


def service_start():
    """
    Start a unit immediately: hellomvc.service

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD --sudo-password="PASSWORD" connect service_start

    """
    print(yellow("\nStarting Service [ %s ] " % SERVICE_FILE))
    sudo('systemctl start %s' % SERVICE_FILE)


def service_stop():
    """
    Stop a unit immediately: hellomvc.service

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD --sudo-password="PASSWORD" connect service_stop

    """
    print(yellow("\nStopping Service [ %s ] " % SERVICE_FILE))
    sudo('systemctl stop %s' % SERVICE_FILE)


def service_restart():
    """
    Restart a unit: hellomvc.service

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD --sudo-password="PASSWORD" connect service_restart

    """
    print(yellow("\nRestarting Service [ %s ] " % SERVICE_FILE))
    sudo('systemctl restart %s' % SERVICE_FILE)


def service_disable():
    """
    Disable a unit so it does not start during bootup: hellomvc.service

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD --sudo-password="PASSWORD" connect service_disable

    """
    print(yellow("\nDisabling Service [ %s ] " % SERVICE_FILE))
    sudo('systemctl disable %s' % SERVICE_FILE)


def service_check():
    """
    Check whether a unit is already enabled or not: hellomvc.service

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD --sudo-password="PASSWORD" connect service_check

    """
    print(yellow("\nChecking Service Enabled [ %s ] " % SERVICE_FILE))
    sudo('systemctl is-enabled %s' % SERVICE_FILE)


def service_status():
    """
    Display status of hellomvc.service

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD --sudo-password="PASSWORD" connect service_status

    """
    print(yellow("\nChecking Service Status [ %s ] " % SERVICE_FILE))
    sudo('systemctl status %s' % SERVICE_FILE)


def reload_systemd():
    """
    Reload the systemd daemon

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD --sudo-password="PASSWORD" connect reload_systemd

    """
    print(yellow("\nReloading Systemd Daemon"))
    sudo('systemctl daemon-reload')

#------------------------------------------------------------------------------
# PUBLISH and UPDATE FUNCTIONS
#------------------------------------------------------------------------------
def update_webapp():
    """
    Republish the project to a specified location

    Note: By design, this function requires the user to enter the sudo password

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD connect update_webapp

    """
    # fetch new updates
    fetch_updates()
    
    # Stop the Web Service
    service_stop()

    # disable the service
    service_disable()

    # Rebuild the project
    print(yellow("\nPublishing Project : %s/%s (Requires SUDO Password)" % (PROJECT_NAME,PACKAGE_NAME)))
    run('cd ~/src/%s/%s && sudo dotnet publish -c Release -o %s' % (PROJECT_NAME,PACKAGE_NAME,DESTINATION_DIR))

    # Update Permissions on Destination directory
    print(yellow("\nUpdating Permissions on [%s]" % DESTINATION_DIR))
    sudo('chown -R www-data:www-data %s' % DESTINATION_DIR)

    # Re-Enable the Service
    service_enable()

    # Re-Start the system service
    service_start()
    
    # Reload Systemd
    reload_systemd()

    print(green("\nFinished Publishing [%s]" % PACKAGE_NAME))


def publish_webapp():
    """
    Initial project deployment 

    Note: By design, this function requires the user to enter the sudo password

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD connect publish_webapp

    """
    print(yellow("\nInitial Project Deployment : %s/%s (Requires SUDO Password)" % (PROJECT_NAME,PACKAGE_NAME)))
    run('cd ~/src/%s/%s && sudo dotnet publish -c Release -o %s' % (PROJECT_NAME,PACKAGE_NAME,DESTINATION_DIR))

    print(green("\nFinished Publishing [%s]" % PACKAGE_NAME))

#------------------------------------------------------------------------------
# PUBLISH NEW PROJECT
#------------------------------------------------------------------------------
def initial_webapp_deployment():
    """
    Deploy Hellomvc on a new web server after updates, upgrades, and Nginx and .NET SDK installation

    Usage: fab -f hellomvc_example.py -u USER -p PASSWORD --sudo-password="PASSWORD" connect initial_webapp_deployment

    """
    checkout_project()
    publish_webapp()
    service_generate()
    reload_systemd()
    service_enable()
    service_start()
    service_status()


# END hellomvc_example.py 

1 Like