Backup script: push or pull?

mason64 · December 20, 2023, 4:01pm

Hey All!

I’m about to start learning bash scripting, and as part of that I’m going to do a project: a basic backup script with rsync.

My end goal is to have a cron job run my bash script once a week or so to back up a couple of folders on my Samba share server, all stored locally on my office/house network. So far I have my Pi 4 as the backup server, running Ubuntu, with 2x 2.5" SATA 480GB HDDs mounted and ready to go. One drive is going to be for company stuff, the other for family videos and photos.

This topic may go on a bit asking for help, but here are my first two questions. First: when doing a backup, do you push or pull your data, or does it not matter?

My second question: is rsync the right tool for the job? I am looking to back up once a week (incremental backups, and maybe compressed if that’s the best option).

I may have more questions soon, sorry.

Thanks in advance!

ThatGuyB · December 20, 2023, 6:22pm

Depends. As far as rsync is concerned, it doesn’t matter.

As far as operational strategy goes, I always “pull”, or rather, I send the scripts from a main / control server to the hosts that require backup. The reasoning is that if the control server fails to “push” (execute) the script on a host, you get a notification (“failed to run backup”). If you instead run cron jobs locally on the hosts and, say, one of them just dies, you don’t get notified that its backups didn’t run, unless you add some kind of check on the backup server’s file system (like writing “begin backup at $(date)” and “end backup at $(date)” files and getting an alert if a backup has no end file after an expected time).
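A minimal sketch of that begin/end marker idea in bash. The marker directory, the file names, and the function names are all made up for illustration; it also assumes a `find` that supports `-mmin` (GNU and BSD find do).

```shell
#!/usr/bin/env bash
# Sketch only: begin/end marker files so the backup server can spot
# runs that started but never finished. MARKER_DIR is a hypothetical path.
MARKER_DIR="${MARKER_DIR:-/var/lib/backup-markers}"

# Record the start of a run for one host and clear any old end marker.
mark_begin() {
    local host="$1"
    mkdir -p "$MARKER_DIR"
    rm -f "$MARKER_DIR/$host.end"
    echo "begin backup at $(date)" > "$MARKER_DIR/$host.begin"
}

# Record a successful finish for one host.
mark_end() {
    local host="$1"
    echo "end backup at $(date)" > "$MARKER_DIR/$host.end"
}

# Usage: backup_is_healthy HOST MAX_MINUTES
# Returns 0 if the last run finished, or is still within MAX_MINUTES.
# Returns 1 if no run was ever started, or a run is overdue (raise an alert).
backup_is_healthy() {
    local host="$1" max_minutes="$2"
    local begin="$MARKER_DIR/$host.begin" end="$MARKER_DIR/$host.end"
    [ -f "$begin" ] || return 1    # no begin marker at all
    [ -f "$end" ] && return 0      # end marker present: run completed
    # Begin without end: healthy only while the begin marker is still recent.
    [ -z "$(find "$begin" -mmin +"$max_minutes")" ]
}
```

The backup script would call `mark_begin` before the copy and `mark_end` after it succeeds; a separate cron job on the backup server could run something like `backup_is_healthy pi4 120 || echo "backup overdue"` and send that wherever your alerts go.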

Rsync is a replication tool. It can be used for backups in some clever ways (like the way BackupPC 2.0 uses it; I don’t like 3.0), but it’s best to use dedicated backup utilities.

I’m starting to get into restic, which should get the job done really well (it’s push-only, it has client-side encryption, and it has deduplication!), but if you want to learn some enterprise backup solutions, look into stuff like Bacula. Note that Bacula is not easy.

Restic should be fine at a beginner level, as it’s not “client-server” software like Bacula. You just need a storage backend, be it NFS, SFTP, S3, or whatever else, and restic just dumps the backup into that. Everything is done on the client (push).
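To make that concrete, here is a rough sketch of what a restic push over SFTP could look like. The repository location, user, password file, and folder paths are hypothetical; the wrapper only prints the commands unless you set `DRY_RUN=0`, so you can check them before anything touches the repository.

```shell
#!/usr/bin/env bash
# Sketch only: restic pushing to an SFTP backend. All names below are
# assumptions for illustration, not values from this thread.
REPO="${REPO:-sftp:backup@pi4.lan:/mnt/family/restic-repo}"
PASSWORD_FILE="${PASSWORD_FILE:-/root/.restic-password}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# One-time setup: create the (encrypted) repository on the backend.
restic_init() {
    run restic -r "$REPO" --password-file "$PASSWORD_FILE" init
}

# Usage: restic_backup DIR...
# Each run creates a new snapshot; unchanged data is deduplicated.
restic_backup() {
    run restic -r "$REPO" --password-file "$PASSWORD_FILE" backup "$@"
}

# List the snapshots stored in the repository.
restic_snapshots() {
    run restic -r "$REPO" --password-file "$PASSWORD_FILE" snapshots
}
```

Typical use would be `restic_init` once, then `restic_backup /srv/samba/photos` from cron on the machine that holds the data, since restic always runs on the client side.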

I’m not sure how Bacula does it, but I think it’s some form of push too (you install the Bacula client on the host that needs backup).

Technically speaking, everywhere you go on the internet, people say to go with pull, because:

- if the backup host gets ransomwared, you still have the “live copy”
- if the host being backed up gets ransomwared, its “live copy” gets encrypted, and if it pushes junk data, you risk overwriting the good backed-up data

It makes sense when you think about it that way, but it’s not as black and white. If your control server gets trojaned, then the credentials or SSH private key it uses to authenticate to the hosts get leaked, giving an attacker access to basically your whole network.

A really secure approach would be to have some automation between the control server and the client. The control server uses SSH (with a really limited-access user) or SFTP (into a chroot) to drop a small file on the client; the client sees the changed file and triggers a backup. This only adds security if the client checks that the command or file really came from the control server. Triggering a backup on any file change, without supervision or checks, just means the client is effectively pushing the backup with more steps involved, which is obfuscation, not security.

Of course, don’t let secure approaches, overthinking, and overdesigning prevent you from having proper backups. Write the smallest script you can to get some data flowing (even an rsync combined with a filesystem snapshot on the target backup server counts as a backup, which is technically a “full” but run as a “forever incremental”).
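As a starting point, the “rsync plus snapshot” idea could look roughly like the sketch below, run on the backup server so it pulls from the share server. Host names, paths, and share names are hypothetical, and the snapshot step assumes each destination directory is a btrfs subvolume; on ZFS or LVM you would swap in that filesystem’s own snapshot command. Commands are only printed unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch only: pull a share with rsync, then snapshot the result.
# SOURCE_HOST, SOURCE_ROOT and DEST_ROOT are assumptions for illustration.
SOURCE_HOST="${SOURCE_HOST:-backup@samba.lan}"
SOURCE_ROOT="${SOURCE_ROOT:-/srv/samba}"
DEST_ROOT="${DEST_ROOT:-/mnt}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: pull_share SHARE
# Mirrors the remote share into DEST_ROOT/SHARE/current over SSH.
# --delete removes files that vanished on the source; the snapshots
# taken afterwards are what keep the older versions around.
pull_share() {
    local share="$1"
    run rsync -a --delete "$SOURCE_HOST:$SOURCE_ROOT/$share/" "$DEST_ROOT/$share/current/"
}

# Usage: snapshot_share SHARE
# Read-only btrfs snapshot named after today's date (assumes btrfs).
snapshot_share() {
    local share="$1"
    run btrfs subvolume snapshot -r "$DEST_ROOT/$share/current" "$DEST_ROOT/$share/snap-$(date +%F)"
}

# Weekly cron entry on the backup server (Sundays at 03:00), assuming
# the script is saved under this hypothetical path:
#   0 3 * * 0 /usr/local/bin/pull-backup.sh
```

Calling `pull_share company && snapshot_share company` for each share gives you one dated, read-only copy per week while rsync only transfers what changed.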

mason64 · December 20, 2023, 9:09pm

A great help. I did think pull was the way to go. I will start with rsync like you said and get some data flowing, then work on it from there to make it better and easier.

Thanks once again @ThatGuyB

ThatGuyB · December 20, 2023, 10:19pm

I will again shill for restic. Intermit.tech did a good tutorial on restic + S3 backups. You can use any other backend though, the easiest probably being SSH (SFTP) to the RPi.