Obnam2—a backup system

Lars Wirzenius

2020-11-27 09:11

1 Introduction

Obnam2 is a backup system.

In 2004 I started a project to develop a backup program for myself, which in 2006 I named Obnam. In 2017 I retired the project, because it was no longer fun. The project had some long-standing, architectural issues related to performance that had become entrenched and were hard to fix, without breaking backwards compatibility.

In 2020, with Obnam2 I’m starting over from scratch. The new software is not, and will not become, compatible with Obnam1 in any way. I aim the new software to be more reliable and faster than Obnam1, without sacrificing security or ease of use, while being maintainable in the long run. I also intend to have fun while developing the new software.

Part of that maintainability is going to be achieved by using Rust as the programming language (which has a strong, static type system) rather than Python (which has a dynamic, comparatively weak type system). Another part is more strongly aiming for simplicity and elegance. Obnam1 used an elegant, but not very simple copy-on-write B-tree structure; Obnam2 will use SQLite.

1.1 Glossary

This document uses some specific terminology related to backups. Here is a glossary of such terms.

2 Requirements

The following high-level requirements are not meant to be verifiable in an automated way:

The detailed, automatically verified acceptance criteria are documented below, as scenarios described for the Subplot tool. The scenarios describe specific sequences of events and the expected outcomes.

3 Software architecture

3.1 Effects of requirements

The requirements stated above drive the software architecture of Obnam. Some requirements don’t affect the architecture at all: for example, “excellent documentation”. This section discusses the various requirements and notes how they affect the architecture.

3.2 On SFTP versus HTTPS

Obnam1 supported using a standard SFTP server as a backup repository, and this was a popular feature. This section argues against supporting SFTP in Obnam2.

The performance requirement for network use means favoring protocols such as HTTPS, or even QUIC, rather than SFTP.

SFTP works on top of SSH. SSH provides a TCP-like abstraction for SFTP, and thus multiple SFTP connections can run over the same SSH connection. However, SSH itself uses a single TCP connection. If that TCP connection has a dropped packet, all traffic over the SSH connections, including all SFTP connections, waits until TCP re-transmits the lost packet and re-synchronizes itself.

With multiple HTTP connections, each on its own TCP connection, a single dropped packet will not affect other HTTP transactions. Even better, the new QUIC protocol doesn’t use TCP.

The modern Internet is to a large degree designed for massive use of the world wide web, which runs over HTTP and is increasingly adopting QUIC. It seems wise for Obnam to make use of technologies that have been designed for, and proven to work well with, concurrency and network problems.

Further, experience with SFTP in Obnam1 showed that it is not always an easy protocol to use. In addition, if there is a desire to have controlled sharing of parts of one client's data with another, this would require writing a custom SFTP service, which seems much harder than writing a custom HTTP service. From experience, a custom HTTP service is easy to do. A custom SFTP service would need to shoehorn the abstractions it needs into something that looks more or less like a Unix file system.

The benefit of using SFTP would be that a standard SFTP service could be used, if partial data sharing between clients is not needed. This would simplify deployment and operations for many. However, it doesn’t seem important enough to warrant the implementation effort.

Supporting both HTTP and SFTP would be possible, but also much more work and against the desire to keep things simple.

3.3 On “btrfs send” and similar constructs

The btrfs and ZFS file systems, and possibly others, have a way to mark specific states of the file system and efficiently generate a “delta file” of all the changes between the states. The delta can be transferred elsewhere, and applied to a copy of the file system. This can be quite efficient, but Obnam won’t be built on top of such a system.

On the one hand, it would force the use of specific file systems: Obnam would not be able to back up data on, say, an ext4 file system, which seems to be the most popular one by far.

Worse, it would also require the data to be restored to the same type of file system as where the live data was originally. This would be onerous for many people.

3.4 Overall shape

It seems fairly clear that a simple shape of the software architecture of Obnam2 is to have a client and server component, where one server can handle any number of clients. They communicate over HTTPS, using proven web technologies for authentication and authorization.

The responsibilities of the server are roughly:

The responsibilities of the client are roughly:

There are many details to add to both the client and the server, but those will come later.

It is possible that an identity provider needs to be added to the architecture later, to provide strong authentication of clients. However, that will not be necessary for the minimum viable product version of Obnam. For the MVP, authentication will happen using RSA-signed JSON Web Tokens. The server is configured to trust specific public keys. The clients have the private keys and generate the tokens themselves.
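The token structure can be illustrated with a short, stdlib-only sketch. This decodes the claims of a JWT; real RSA signing and verification are omitted (the standard library has no RSA support, so a real client would use a crypto library), and the claim names used here are illustrative assumptions, not Obnam's actual schema.

```python
import base64
import json

def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the stripped padding."""
    pad = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + pad)

# A JWT is three base64url parts joined by dots: header.payload.signature.
# The claim names below ("sub", "exp") are illustrative assumptions.
header = {"alg": "RS256", "typ": "JWT"}
claims = {"sub": "client-1", "exp": 1700000000}

# In a real token the third part is an RSA signature over the first two;
# it is left empty here because stdlib Python cannot produce one.
token = ".".join([
    b64url_encode(json.dumps(header).encode()),
    b64url_encode(json.dumps(claims).encode()),
    "",
])

# The server can inspect the claims by splitting the token and decoding
# the payload part; it would verify the signature before trusting them.
payload = json.loads(b64url_decode(token.split(".")[1]))
print(payload["sub"])  # client-1
```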

4 Implementation

The minimum viable product will not support sharing of data between clients.

4.1 Chunks

Chunks consist of arbitrary binary data, a small amount of metadata, and an identifier chosen by the server. The chunk metadata is a JSON object, consisting of the following fields:

When creating or retrieving a chunk, its metadata is carried in a Chunk-Meta header as a JSON object, serialized into a textual form that can be put into HTTP headers.

4.2 Server

The server has the following API for managing chunks:

HTTP status codes are used to indicate if a request succeeded or not, using the customary meanings.

When creating a chunk, the chunk's metadata is sent in the Chunk-Meta header, and the contents in the request body. The new chunk gets a randomly assigned identifier, and if the request is successful, the response body is a JSON object with the identifier:

{
    "chunk_id": "fe20734b-edb3-432f-83c3-d35fe15969dd"
}

The identifier is a UUID4, but the client should not assume that and should treat it as an opaque value.

When a chunk is retrieved, the chunk metadata is returned in the Chunk-Meta header, and the contents in the response body.

It is not possible to update a chunk or its metadata.
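The create/retrieve/search/delete semantics can be sketched as a small in-memory store, ignoring HTTP entirely. The names here are illustrative, not the server's actual types:

```python
import uuid

class ChunkStore:
    """In-memory sketch of the chunk server's semantics (no HTTP)."""

    def __init__(self):
        self.chunks = {}  # chunk id -> (metadata dict, contents bytes)

    def create(self, meta, data):
        # The server picks a random identifier; clients treat it as opaque.
        chunk_id = str(uuid.uuid4())
        self.chunks[chunk_id] = (meta, data)
        return chunk_id

    def get(self, chunk_id):
        # Returns (metadata, contents), or None like an HTTP 404.
        return self.chunks.get(chunk_id)

    def find(self, sha256):
        # Search by checksum: identifiers and metadata of matching chunks.
        return {
            cid: meta
            for cid, (meta, _) in self.chunks.items()
            if meta.get("sha256") == sha256
        }

    def delete(self, chunk_id):
        self.chunks.pop(chunk_id, None)

store = ChunkStore()
meta = {"sha256": "abc", "generation": None, "ended": None}
cid = store.create(meta, b"data")
print(store.find("abc") == {cid: meta})  # True
store.delete(cid)
print(store.get(cid))  # None
```

Note that there is deliberately no update operation, matching the statement above that a chunk and its metadata are immutable once created.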

When searching for chunks, any matching chunk’s identifiers and metadata are returned in a JSON object:

{
  "fe20734b-edb3-432f-83c3-d35fe15969dd": {
     "sha256": "09ca7e4eaa6e8ae9c7d261167129184883644d07dfba7cbfbc4c8a2e08360d5b",
     "generation": null,
     "ended": null
  }
}

There can be any number of chunks in the search response.

4.3 Client

The client scans live data for files, reads each file, splits it into chunks, and searches the server for chunks with the same checksum. If none are found, the client uploads the chunk. For each backup run, the client creates an SQLite database in its own file, into which it inserts each file, its metadata, and the list of chunk ids for its content. At the end of the backup, it uploads the SQLite file as chunks, and finally creates a generation chunk, which has as its contents the list of chunk identifiers for the SQLite file.
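The per-run bookkeeping might look roughly like this sketch. The fixed chunk size and the table layout are both assumptions for illustration; the actual schema and chunking strategy are not specified here:

```python
import hashlib
import sqlite3

CHUNK_SIZE = 4096  # illustrative; the real chunk size is not specified here

def split_into_chunks(data: bytes):
    """Split file contents into fixed-size chunks and checksum each."""
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        yield hashlib.sha256(chunk).hexdigest(), chunk

# One SQLite database per backup run, in its own file (":memory:" here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT, metadata TEXT)")
db.execute("CREATE TABLE chunks (path TEXT, checksum TEXT)")

data = b"hello " * 1000  # 6000 bytes of stand-in live data
for checksum, chunk in split_into_chunks(data):
    # A real client would first search the server for this checksum and
    # upload the chunk only if no match is found; here we just record it.
    db.execute("INSERT INTO chunks VALUES (?, ?)", ("live/data.dat", checksum))

db.execute("INSERT INTO files VALUES (?, ?)", ("live/data.dat", "{}"))
count = db.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
print(count)  # 6000 bytes at 4096 bytes per chunk -> 2 chunks
```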

For an incremental backup, the client first retrieves the SQLite file for the previous generation, and compares each file’s metadata with that of the previous generation. If a live data file does not seem to have changed, the client copies its metadata to the new SQLite file.
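The unchanged-file check might be sketched as a metadata comparison. Which fields the client actually compares is not specified here; size and mtime are illustrative guesses:

```python
def file_seems_unchanged(old_meta: dict, new_meta: dict) -> bool:
    """Compare a live file's metadata with the previous generation's record.

    The fields compared (size, mtime) are illustrative assumptions; the
    document does not specify which metadata the client uses.
    """
    return (
        old_meta.get("size") == new_meta.get("size")
        and old_meta.get("mtime") == new_meta.get("mtime")
    )

previous = {"size": 6000, "mtime": 1700000000}
print(file_seems_unchanged(previous, {"size": 6000, "mtime": 1700000000}))  # True
print(file_seems_unchanged(previous, {"size": 6001, "mtime": 1700000000}))  # False
```

When the check returns True, the client copies the old record (metadata and chunk id list) into the new generation's SQLite file without re-reading or re-uploading the file's contents.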

When restoring, the user provides the chunk id of the generation to be restored. The client retrieves the generation chunk, gets the list of chunk ids for the corresponding SQLite file, retrieves those, and then restores all the files in the SQLite database.
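Restore walks the same structure in reverse. The sketch below stands an in-memory dict in for the server and assumes, purely for illustration, that the generation chunk's contents are a JSON list of chunk ids:

```python
import json

# Illustrative in-memory chunk map standing in for the chunk server.
chunks = {}

# The generation chunk's contents are the list of chunk ids that make up
# the SQLite file for that generation (stored as JSON here, an assumption).
chunks["sqlite-part-1"] = b"first half of the SQLite file "
chunks["sqlite-part-2"] = b"second half"
chunks["gen-1"] = json.dumps(["sqlite-part-1", "sqlite-part-2"]).encode()

def restore_sqlite(generation_id: str) -> bytes:
    """Fetch the generation chunk, then reassemble the SQLite file."""
    part_ids = json.loads(chunks[generation_id])
    return b"".join(chunks[pid] for pid in part_ids)

sqlite_blob = restore_sqlite("gen-1")
# A real client would write sqlite_blob to disk, open it with SQLite, and
# restore each file listed in it by fetching that file's content chunks.
```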

5 Acceptance criteria for the chunk server

These scenarios verify that the chunk server works on its own. Each scenario starts a fresh, empty chunk server, performs some operations on it, verifies the results, and finally terminates the server.

5.1 Chunk management happy path

We must be able to create a new chunk, retrieve it, find it via a search, and delete it. This is needed so the client can manage the storage of backed up data.

given an installed obnam
and a running chunk server
and a file data.dat containing some random data
when I POST data.dat to /chunks, with chunk-meta: {"sha256":"abc"}
then HTTP status code is 201
and content-type is application/json
and the JSON body has a field chunk_id, henceforth ID

We must be able to retrieve it.

when I GET /chunks/<ID>
then HTTP status code is 200
and content-type is application/octet-stream
and chunk-meta is {"sha256":"abc","generation":null,"ended":null}
and the body matches file data.dat

We must also be able to find it based on metadata.

when I GET /chunks?sha256=abc
then HTTP status code is 200
and content-type is application/json
and the JSON body matches {"<ID>":{"sha256":"abc","generation":null,"ended":null}}

Finally, we must be able to delete it. After that, we must not be able to retrieve it, or find it using metadata.

when I DELETE /chunks/<ID>
then HTTP status code is 200
when I GET /chunks/<ID>
then HTTP status code is 404
when I GET /chunks?sha256=abc
then HTTP status code is 200
and content-type is application/json
and the JSON body matches {}

5.2 Retrieve a chunk that does not exist

We must get the right error if we try to retrieve a chunk that does not exist.

given an installed obnam
and a running chunk server
when I try to GET /chunks/any.random.string
then HTTP status code is 404

5.3 Search without matches

We must get an empty result if searching for chunks that don’t exist.

given an installed obnam
and a running chunk server
when I GET /chunks?sha256=abc
then HTTP status code is 200
and content-type is application/json
and the JSON body matches {}

5.4 Delete chunk that does not exist

We must get the right error when deleting a chunk that doesn’t exist.

given an installed obnam
and a running chunk server
when I try to DELETE /chunks/any.random.string
then HTTP status code is 404

6 Smoke test for Obnam as a whole

This scenario verifies that a small amount of data in simple files in one directory can be backed up and restored, and the restored files and their metadata are identical to the original. This is the simplest possible useful use case for a backup system.

given an installed obnam
and a running chunk server
and a client config based on smoke.yaml
and a file live/data.dat containing some random data
when I run obnam backup smoke.yaml
then backup generation is GEN
when I run obnam list smoke.yaml
then generation list contains <GEN>
when I invoke obnam restore smoke.yaml <GEN> rest
then data in live and rest match

File: smoke.yaml

root: live