One of the stated goals for gludb was to have a sane, easy method for backing up the data stored. The current backup functionality provides an easy way to get all the data you're interested in (as defined by the Storable-derived classes that you include in your backup) into a single gzip'ed tarball stored in Amazon S3.
If you learn better by example, we have one!
Why Amazon S3
Before describing the backup, we should take a moment to discuss Amazon's S3. S3 (which stands for "Simple Storage Service") lets you define "buckets" where you can store large blobs of data, each under its own key. If you like file system analogies, you can think of buckets as directories, keys as file names, and the data stored as file contents. S3 is reasonably priced and offers a variety of ways to handle access to your data.
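If you'd like to see that analogy in code, here is a minimal sketch using boto3 directly (this is not part of gludb's backup API, and the bucket and key names are made up for illustration):

```python
import boto3

# A bucket is like a directory, a key is like a file name,
# and the object body is like the file contents.
s3 = boto3.client('s3')

s3.put_object(
    Bucket='my-example-bucket',        # hypothetical bucket name
    Key='backups/example.tar.gz',      # hypothetical key
    Body=b'some binary backup data'
)

data = s3.get_object(
    Bucket='my-example-bucket',
    Key='backups/example.tar.gz'
)['Body'].read()
```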
Those familiar with Amazon's storage offering might be wondering why we didn't choose Amazon's Glacier service. There are two main reasons:
- You might want fairly fast access to your backups. If you want to use the backup functionality for snapshots to examine, ETL projects, or some other purpose, Glacier's access time just isn't acceptable. (Glacier only guarantees that you can access a file within 5 hours.)
- You can update the storage policy on S3 buckets to archive the contents to Glacier after a certain amount of time, so S3 backups can actually use Glacier.
In fact, S3 bucket policies mean that you can create fairly sophisticated backup plans by using different buckets for different kinds of backups. For instance, you might send daily backups to a bucket where files older than 30 days are deleted. Once a month you could run a backup to a bucket that archives files to Glacier after 60 days. That way you have permanent monthly backups in Glacier, recent monthly backups with instant access, and daily backups for the last month with instant access.
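As a rough illustration (this is ordinary boto3/S3 configuration, not part of gludb, and the bucket name and day counts are just examples), the "archive to Glacier after 60 days" rule described above might look something like this:

```python
import boto3

s3 = boto3.client('s3')

# Hypothetical monthly-backup bucket: move objects to Glacier after 60 days.
s3.put_bucket_lifecycle_configuration(
    Bucket='my-monthly-backup-bucket',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'archive-to-glacier',
                'Status': 'Enabled',
                'Filter': {'Prefix': ''},  # apply to every object in the bucket
                'Transitions': [
                    {'Days': 60, 'StorageClass': 'GLACIER'}
                ],
            }
        ]
    }
)
```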
Requirements for S3
You'll need to have the name of your S3 bucket (which should already exist). You will also need an AWS Access ID and secret key with rights to add data to buckets. You can create user accounts for accessing S3 (using IAM), add S3 buckets, and configure your buckets by using the AWS Console.
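If you want a quick sanity check that your credentials and bucket are set up correctly before running a backup, something like the following boto3 call should succeed (the key values and bucket name here are placeholders):

```python
import boto3

s3 = boto3.client(
    's3',
    aws_access_key_id='My Access ID',       # placeholder
    aws_secret_access_key='My Secret Key'   # placeholder
)

# Raises an error if the bucket doesn't exist or the credentials lack access.
s3.head_bucket(Bucket='my-backup-bucket-name')
```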
From here on out, we're going to assume that you've examined the backup example that we mentioned above.
Assuming that you already have classes defined for storing and reading data (which you can read about here), there are really only three things you need to do to perform a backup:
- Create an instance of gludb.backup.Backup, specifying values for your AWS credentials and the backup bucket name
- Add the classes (or packages) to be backed up to the instance you created
- Call perform_backup on the instance you've created.
Let's assume that you've already created an instance of a Backup object:
```python
from gludb.backup import Backup

backup = Backup(
    aws_access_key='My Access ID',
    aws_secret_key='My Secret Key',
    bucketname='my-backup-bucket-name'
)
```
If you just have a few classes in your model, you can add them manually:
```python
from myapp.model import SomeClass, SomeOtherClass

backup.add_class(SomeClass)
backup.add_class(SomeOtherClass)
```
Note that by default, when you call add_class, all the base classes that are also derived from Storable will be added to the backup. If you don't want this behavior, you can disable it with the include_bases parameter (which defaults to True):
```python
backup.add_class(SomeClass, include_bases=False)
backup.add_class(SomeOtherClass, include_bases=False)
```
If all the classes are in one (or a few) packages, you can just add a package.
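For instance, assuming your Storable classes live in a package named myapp.model (the same example package used above), the call would look something like this:

```python
# Note that the package is specified as a string, not an imported module.
backup.add_package('myapp.model')
```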
Note that unlike add_class, you specify the package with a string. Also note that the add_package call will examine all modules and sub-packages for classes deriving from Storable. Each of those classes will be added to the backup via a call to add_class with include_bases=True. When adding a package, there are two parameters you can use to control this behavior:
- You can specify include_bases, which will be passed to add_class for every class found. If you need to specify it differently for different classes, consider calling add_class manually for each one.
- If you don't want sub-packages to be examined, pass recurse=False. The modules of the package you specified will still be scanned, but sub-packages will be ignored.
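Putting it together, a complete run using add_package might look something like the following sketch (the package name and parameter choices are just examples; perform_backup is the call from step three above):

```python
from gludb.backup import Backup

backup = Backup(
    aws_access_key='My Access ID',
    aws_secret_key='My Secret Key',
    bucketname='my-backup-bucket-name'
)

# Scan only the top-level modules of myapp.model (no sub-packages) and
# don't automatically pull in Storable base classes.
backup.add_package('myapp.model', include_bases=False, recurse=False)

# Write the gzip'ed tarball to the S3 bucket.
backup.perform_backup()
```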