Here is a comparison of various data backup and archiving tools. For background, see my blog post in which I discuss the difference between backup and archiving. In a nutshell, backups are designed to recover from a disaster that you can fairly rapidly detect. Archives are designed to survive for many years, protecting against disaster not only impacting the original equipment but also the original person that created them. That blog post goes into a lot of detail on what makes a good backup or archiving tool.
Comparison table
Let me give you the comparison here, and explain the features and their significance below.
Feature | backuppc | bacula (community edition) | borg | dar | git-annex |
---|---|---|---|---|---|
Storage type | reference-counted file tree | archive files | reference-counted chunk tree | archive files | sym/hard-link file tree by hash; history via git |
Supports streaming-only (tape, etc) | no | yes | no | yes | no |
Can save backup to pipe/FIFO | no | yes (FIFO only) | no | yes (pipe and FIFO) | no |
Asynchronous backups possible | no | no | no | yes | yes |
Multi-volume support | no | yes | no | yes | yes |
Single files larger than a volume | no | yes | no | yes | no |
Individual backup larger than a volume | no | yes | no | yes | yes (with separate repo) |
Volume identification | n/a | volume label (except for FIFO) | n/a | backup filename + slice number | repo name |
Backup rotation / pruning | automatic per configured rules | automatic per configured rules | CLI prune call with rules | manual (SaraB/Baras provides CLI with configured rules) | CLI drop call to delete old data |
Deduplication | file-level | common base only (paid version has more) | block-level | common base only | file level |
Compression | zlib at storage; ssh/rsync transport | zlib | lz4, zstd, zlib, lzma | lz4, zstd, zlib, bzip2, lzo, xz | no |
Can avoid re-compressing | no | no | no | yes (based on extension; configurable) | n/a |
Binary deltas | at transport, not storage | no | yes | yes | no |
Supports encryption | no | data only (filenames & EAs unencrypted) | yes (symmetric) | yes (both public key with gpg and symmetric) | with certain special remotes |
Zero-trust target | no | moderate (risk of forced keys to client) | yes (if targeted by only 1 machine) | yes | with certain special remotes |
Authentication / verification | no | X.509 RSA file signatures | HMAC-SHA256 | gpg-signed session key, detached sha512, par2; any pipe | secure hashes and signed commits |
Can directly back up Windows machines | if rsync installed | if agent installed | no | yes | yes (if git installed) |
Can directly back up *nix machines | if rsync installed | if agent installed | yes | yes | yes (if git installed) |
Can directly back up Mac machines | if rsync installed | if agent installed | yes | yes | yes (if git installed) |
Preserves Mac resource forks | no | yes | yes | yes | no |
Preserves timestamps | yes | yes | yes | yes | no |
Preserves *nix hard links | yes | yes | no | yes | no |
Preserves *nix symlinks | yes | yes | yes | yes | no |
Preserves *nix EAs and ACLs | yes | yes | yes | yes | no |
Preserves *nix ownership (uid/gid) | yes | yes | yes | yes | no |
Preserves *nix sparse files | no | yes | simulated | yes | no |
System model | daemon on storage; pull via rsync | daemons everywhere; pull | CLI | CLI, C++ library, Python library | CLI |
Network/remote support | Backs up systems using rsync+ssh | scheduler; backs up from/to multiple systems | push to remote using ssh+borg | push to remote on any curl backends, SFTP, ssh, or pipe | push to any of numerous special remotes or ssh+git-annex |
GUI available | native web interface | yes | yes (vorta) | yes (gdar, DarGUI) | limited web interface focusing on synchronization |
Restoration without using tool | no | no | no | no | file data but not tree updates |
External runtime dependencies | rsync, Perl | MySQL or PostgreSQL | Python | none | git |
Standalone binary distribution | no | no, but bls/bextract can be used in emergency | dynamic (includes Python) | dynamic or static for multiple platforms, from author or distro | dynamic (requires external git) |
Disaster recovery method | mount, reconfigure hosts | bscan to rebuild DB | normal commands | normal commands | normal, but may need to rescan repos |
Scheduling | internal | internal | external | external or wrapper script | external |
Supported platforms for storage | local *nix | *nix, Windows, Mac | local or ssh *nix, Mac | local or ssh *nix, Mac, Windows; curl; SFTP | local or ssh *nix, Mac, Windows; special remotes |
Supports incremental backups | yes | yes | yes | yes | yes |
Supports decremental backups | yes* | no | yes* | yes | yes* |
Here’s what the different features mean:
- Storage type
- What the backup storage looks like. backuppc’s reference-counted file tree is a tree on the filesystem, where each file corresponds to a file on the original. borg’s reference-counted chunk tree encodes each block of files as a file on the filesystem. archive files are large files that group multiple files into one. git-annex’s tree is similar to a reference-counted file tree, but achieves that via links. Of these, the archive files provide the most flexible storage, since they don’t even require a filesystem (and can be put on tape directly), while the reference-counted chunk tree represents the most efficient; see deduplication below.
- Supports streaming-only
- Can the backup system write to devices that do not support random access? That would include things such as tapes and pipes.
- Can save backup to pipe/FIFO
- For those that support streaming-only, can they write the backup data to a pipe or FIFO (named pipe)? These could allow them to be, for example, streamed over ssh.
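To illustrate the streaming model, here is a minimal sketch using a FIFO, with tar standing in for any FIFO-capable backup tool (bacula and dar can write to a FIFO directly; all paths here are invented for the demonstration):

```sh
set -eu
work=$(mktemp -d)
mkdir -p "$work/src" "$work/restore"
echo "hello" > "$work/src/file.txt"

mkfifo "$work/backup.fifo"
# The writer blocks until a reader attaches to the FIFO...
tar -C "$work" -cf "$work/backup.fifo" src &
# ...so the "remote" side can consume the stream as it is produced,
# e.g. the reader could be an ssh session instead of a local tar.
tar -C "$work/restore" -xf "$work/backup.fifo"
wait
cat "$work/restore/src/file.txt"   # prints: hello
```

The same shape works with any archiver that can write to a non-seekable destination; only the writer command changes.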
- Asynchronous backups possible
- If yes, the system being backed up and the ultimate storage destination do not have to be reachable over the network in real time. This means they support Asynchronous Communication (such as NNCP or Filespooler), which facilitates things like air-gapped backups and using sneakernet, with temporary storage on portable devices, to transport the data to its ultimate storage host.
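The asynchronous flow can be sketched in plain shell, using directories to simulate the source machine, the portable drive, and the storage host (all names are invented; a real setup would use a tool like dar or git-annex, possibly with NNCP or Filespooler as transport):

```sh
set -eu
work=$(mktemp -d)
mkdir -p "$work/laptop/data" "$work/usb-drive" "$work/storage-host"
echo "photos" > "$work/laptop/data/img.txt"

# Step 1: on the source machine, write the backup to portable media.
tar -C "$work/laptop" -czf "$work/usb-drive/backup.tar.gz" data

# Step 2: later (possibly days later), ingest it on the storage host.
# The two machines never need a live network connection to each other.
mv "$work/usb-drive/backup.tar.gz" "$work/storage-host/"
tar -C "$work/storage-host" -xzf "$work/storage-host/backup.tar.gz"
```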
- Multi-volume support
- Whether the backup system supports more than one volume for storing backups. Here a volume means a removable drive, a tape, an optical disc, or something similar. OS or hardware tricks to aggregate drives (eg, RAID) don’t count here.
- Single files larger than a volume
- If the system supports multiple volumes, whether it can split a single file across multiple volumes.
- Individual backup larger than a volume
- If the system supports multiple volumes, whether it can split a backup session across multiple volumes.
- Volume identification
- How a multi-volume-capable system identifies volumes.
- Deduplication
- Whether the backup system can detect duplicate data in the backup set and store it once. Block-level is the most efficient, as it detects common parts of files. File-level will typically hash files. Common base means that you can use a single base backup (eg, an installed OS when you back up multiple machines) and base incrementals on that, and is the least flexible.
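Here is a minimal sketch of file-level deduplication in the spirit of backuppc's pool: hash every file and hard-link identical ones to a single pooled copy (the pool layout is invented; real implementations also handle hash collisions and metadata):

```sh
set -eu
work=$(mktemp -d)
echo "same bytes" > "$work/a.txt"
echo "same bytes" > "$work/b.txt"

pool="$work/.pool"; mkdir -p "$pool"
for f in "$work"/*.txt; do
  h=$(sha256sum "$f" | cut -d' ' -f1)
  if [ -e "$pool/$h" ]; then
    ln -f "$pool/$h" "$f"        # duplicate: repoint at the pooled copy
  else
    ln "$f" "$pool/$h"           # first occurrence: add it to the pool
  fi
done
# a.txt and b.txt now share one inode; the data is stored once.
```

Block-level deduplication, as borg does it, applies the same idea to content-defined chunks of files rather than whole files, which is why it also catches files that merely share common parts.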
- Compression
- Whether the backup system supports compression, and if so, what kind.
- Can avoid re-compressing
- As a performance optimization, whether the backup system can avoid re-compressing already-compressed data
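The idea can be sketched as a simple extension check, loosely modeled on dar's configurable exclude-from-compression lists (the extension list here is invented for illustration):

```sh
set -eu
# Decide per file whether compression is worth attempting.
should_compress() {
  case "$1" in
    *.gz|*.zip|*.jpg|*.mp4) return 1 ;;  # already compressed: store as-is
    *) return 0 ;;                       # everything else: compress
  esac
}

should_compress notes.txt && echo "compress notes.txt"
should_compress movie.mp4 || echo "store movie.mp4 as-is"
```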
- Binary deltas
- Traditional backup systems will take any change in a file, even one bit, as a reason to store an entirely new copy of that file. Binary deltas store a more efficient representation of the difference, which can be used to bring the previous file to the new state. BackupPC supports binary deltas over the network, but not at storage. borg and dar support binary deltas both over the network and at storage.
- Supports encryption
- Whether and how the system can generate encrypted backups.
- Zero-trust target
- If a system supports encryption, whether the host storing the backup data can be prevented from decrypting it. “Yes” is best.
- Authentication / verification
- Whether a backup system provides integrated authentication of the backup data. With some, this is integrated with the encryption code and may require encryption (eg, Bacula). With others, such as git-annex, it is totally separate. dar provides two built-in options: `--sign`, which signs the encryption key used for the session, and `--hash`, which computes a SHA-512 hash while writing the archive and writes it to a separate file once the archive is written. It also integrates with par2 to create par2 signatures. Since dar creates archive files like tar does, it can also be used with any other tool that can sign data on disk or on a pipe; for instance, gpg could be used to provide stronger assurances than the built-in `--sign`.
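The detached-hash approach can be illustrated generically with coreutils, using tar as a stand-in for a dar slice (dar's --hash option writes a comparable digest file itself; paths here are invented):

```sh
set -eu
work=$(mktemp -d)
mkdir -p "$work/src"; echo "data" > "$work/src/f"
tar -C "$work" -cf "$work/backup.tar" src

# Write the digest to a separate file alongside the archive...
( cd "$work" && sha512sum backup.tar > backup.tar.sha512 )
# ...then anyone holding both files can verify the archive later.
( cd "$work" && sha512sum -c backup.tar.sha512 )
```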
- Can directly back up … machines
- Whether the program can back up machines running certain operating systems without using external helpers (sftp, etc). “*nix” means Unix/Linux/BSD.
- Preserves …
- Whether the backup system saves and restores given types of metadata. To preserve a hard link, the backup program must, at restore time, hard link together the exact same set of files that were hard linked in the source data, and no others (even if identical by content). Borg’s simulated support for sparse files means that it saves holes as blocks of NULLs at backup time, and can convert blocks of NULLs to holes at extract time. This doesn’t necessarily preserve the exact sparse structure of the original file, but should achieve roughly similar storage gains.
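To make the sparse-file rows concrete, here is a quick demonstration of a file whose apparent size far exceeds its allocated storage (GNU `truncate`, `stat`, and `du` assumed):

```sh
set -eu
work=$(mktemp -d)
truncate -s 10M "$work/sparse.img"   # creates a hole; writes no data blocks

apparent=$(stat -c %s "$work/sparse.img")         # bytes, as readers see it
allocated=$(du -k "$work/sparse.img" | cut -f1)   # KiB actually on disk
echo "apparent=$apparent allocated=${allocated}KiB"
```

A backup tool without sparse support restores such a file fully allocated, so the storage savings of the hole are lost on restore.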
- System model
- How the system works. backuppc runs a daemon on the system doing the storage, which pulls data from the systems being backed up using rsync. Bacula has a director daemon that performs scheduling and coordination, a storage daemon that runs on the system(s) providing storage, and a file daemon running on systems being backed up; it also requires a PostgreSQL or MySQL database. The CLI tools are typically invoked from the command line (possibly by cron or systemd).
- Network/remote support
- How it supports having the backup and the source data on different machines. BackupPC can use rsync over ssh. Bacula uses the daemons as noted, which can communicate over a network. borg can push to a remote over ssh, so long as borg itself can be executed on the remote. dar can push to a remote using backends supported by libcurl, or SFTP, or any command that can be piped to. git-annex has a set of special remotes that can be pushed to, though they may not necessarily preserve all metadata.
- GUI available
- Whether a graphical interface is available, and what type. Third-party FLOSS projects provide these for borg and dar. BackupPC's web interface is its primary interface. git-annex's assistant provides Dropbox-like synchronization with its web interface, but doesn't work well with all workflows git-annex makes possible.
- Restoration without using tool
- Whether you can restore data without using the particular backup tool used to create it. Of these, only git-annex has some support here; it could let you at least access the file data, even if you may wind up with duplicate copies after renames, deletions, etc.
- External runtime dependencies
- Things that must be present to run the tool. Of these tools, only dar is fully self-contained, and can be built into a statically-linked single binary on *nix platforms that has no external dependencies.
- Standalone binary distribution
- Whether a self-contained standalone binary is available, and if so, what kind. Borg's standalone binary is dynamically-linked and includes the Python environment necessary to run. dar provides a statically-linked binary, built by default; a statically-linked binary is the most portable. git-annex provides a dynamically-linked binary, which also requires git to be installed.
- Disaster recovery method
- How to recover the data if only the backup volumes survive a disaster. With BackupPC, you install BackupPC on a fresh system, configure the hosts, and then can restore. Bacula would have you make a fresh install, then use bscan to load the information about volumes into its database. Bacula does support bls/bextract commands as well, but their usage is complex and impractical for most. borg and dar would just have you use the same commands as usual, since they don't require any external configuration. git-annex may need you to run `git annex sync` from repos to reload their statuses, but otherwise doesn't need anything special - IF you have saved the git metadata somewhere.
- Scheduling
- How the backup system schedules backups. “Internal” means the backup software has a daemon running that does its own scheduling, often with limits on simultaneous backups and such that it can enforce. External means something like cron handles the scheduling.
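The "external" model can be sketched as a small wrapper script that cron or a systemd timer invokes (the crontab line, paths, and the tar command are all stand-ins; any CLI tool such as dar or borg would run inside the wrapper):

```sh
set -eu
# run_backup: archive SRC into DESTDIR with a date-stamped name (sketch).
run_backup() {
  src=$1; destdir=$2
  # Any CLI backup tool could run here instead of tar.
  tar -czf "$destdir/backup-$(date +%Y%m%d).tar.gz" -C "$src" .
}

# A crontab entry like the following would run the wrapper nightly at 02:00
# (hypothetical path):
#   0 2 * * * /usr/local/bin/nightly-backup.sh
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"; echo x > "$work/src/f"
run_backup "$work/src" "$work/dest"
```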
- Supported platforms for storage
- Built-in support for backup destinations. “Local” means storage local to the backup software. “ssh” means via ssh to another system running the backup software. Dar supports libcurl destinations (https/ftp/sftp/etc). git-annex has support for special remotes for various targets. Since dar is a pipe-friendly CLI program, it can be combined with others to support a wide variety of schemes; for instance, rclone to cloud. Emulations such as the Windows Subsystem for Linux don’t count as Windows support; here I mean native support.
- Supports incremental backups
- Whether the backup system supports storing just the changes since the last backup. All systems here do.
- Supports decremental backups
- Whether the backup system supports storing the most recent backup as a full backup, then deltas running back in time – sort of the opposite of a traditional incremental. backuppc, borg, and git-annex use a storage format that is equally efficient going forwards and backwards, so I rated them each as “yes*”.
Features every program here has
- Included in the Debian distribution and many others
- Supports random access efficient enough to extract a single file without reading an entire backup, when the underlying device supports random access
Overview of the tools and analysis
BackupPC
BackupPC is a single-daemon system that backs up remote systems using rsync. This means that network bandwidth is used efficiently. It stores the files in a file-level deduplicated directory tree. It is a simple solution for basic backups of *nix machines when the backups fit on a standard filesystem.
Bacula
Bacula has its heritage in the tape backup world. It supports full backups and incrementals in the traditional sense. It keeps a database (in PostgreSQL or MySQL) of the different volumes and their contents. This is used both to calculate what media is needed for a restore as well as implement volume reuse rules (only allowing a full to be overwritten when a more recent full exists, for instance). It is the only tool here to provide automation around many-to-many storage relationships (can back up many systems to many storage systems) and provides the most sophisticated automation around volume management. On the other hand, it is also the most complex to install and set up, requiring its own daemons on every relevant system, as well as a database server. The complexity of restores may be a problem for decades-long archival, but on the other hand, those making heavy use of removable media may appreciate its flexibility. Its real target is the enterprise market, and a commercial version adds additional features.
Borg
Borg does backups to a filesystem. Borg's emphasis is on efficiency; it is the most efficient of all the tools here, both over the network and on disk. Its on-disk format is a filesystem tree consisting of deduplicated chunks of files, which can also be compressed. Therefore, if you move a file, even from one machine to another, it will have to be neither re-transmitted nor stored again, because borg's deduplication will detect this. By supporting binary deltas, it also efficiently stores changes to files. It is the best solution for very slow network links or situations where storage space is at a premium. On the other hand, it has its own repository format that should ideally have the time-consuming `borg check` (which can take days) run periodically, and backups can be slow. Borg doesn't support multiple volumes.
Dar
Dar represents a kind of next-generation tar. It is a command-line program that is supremely flexible, offers integrated par2 support, and is designed to integrate well with external tools. I've written a lot about dar; my dar page has links to my articles. Of all these tools, dar is the most flexible about storage, since it can be used in a pipeline. It also supports tape drives, with hooks allowing you to run commands to, for instance, operate a changer or have an operator switch tapes. Its isolated catalogs feature makes for efficient tracking of backed-up data without requiring a separate SQL database as Bacula does. You could look at dar as the all-around most flexible option. While it's not quite as efficient on-disk as borg, nor as sophisticated in built-in volume management as Bacula, it does pretty well compared to both - and it is also a better tar than tar, a better zip than zip, and the most "Unixy" of all of these due to its ability to be used in pipelines. It can be thought of as a powerful filesystem differ/patcher, or the workhorse of your own backup scripts. It is also the most standalone of all the tools here, able to function as just a single statically-linked binary.
git-annex
git-annex isn’t designed as a backup tool at all, but it has a robust feature set that allows it to be used in such a way. It is more of a data-tracking and moving application. Uniquely, if certain care is used, backed-up data can be presented as plain files along with metadata, meaning that a worst-case scenario of a restore by an unrelated person in the future might at least get at your family photos, even if there are 5 copies of each due to renames; using a full git-annex would resolve that situation.
Links to this note
Sometimes we want better-than-firewall security for things. For instance:
dar is a Backup and archiving tool. You can think of it as a more modern tar. It supports both streaming and random-access modes, supports correct incrementals (unlike GNU tar's incremental mode), Encryption, various forms of compression, and even integrated rdiff deltas.
Here are some (potentially) interesting topics you can find here: