Backups and Archives: Policies and Procedures

 

©2006–2012 by Wayne Pollock, Tampa Florida USA.

Overview of Backups and Archives

Statistic: 90% of all companies that suffer catastrophic data loss (such as a disk crash) are out of business within one year.

Backups are made for the purpose of rebuilding a system that is identical to the current one.  Backups are thus for recovery, not transferring of data to another system.  They do not need to be portable.  In this sense backup is used to mean a complete backup of an entire system: not just regular files but all owner, group, date, and permission information, for all files, links, /dev entries, some /proc entries, etc.

Archives are for transferring data to other systems, or making copies of files for historical or legal purposes.  As such they should be portable so that they may be recovered on new systems when the original systems are no longer available.  For example it should be possible for an archive of the files on a Solaris Unix system to be restored on an AIX Unix, or even a Linux system.  (Within limits this portability should extend to Windows and Macintosh systems as well.)

Most of the time the two terms are used interchangeably.  (In fact the above definitions are not universally agreed upon!)  In the rest of this document the term backup will be used to mean either a backup or an archive as defined above.  Most real-world situations call for archives, since the other objects (such as /dev entries) rarely if ever change on a production server once the system is installed and configured.  A single true backup is usually sufficient.  For home users, the original system CDs often serve as the only backup needed; all other backups are of modified files only and hence are archives.

Using RAID is not a replacement for regular backups!  (Imagine a disaster such as a fire on your site, an accidentally deleted file, or corrupted data.)

Creating backup policies (includes several sub-policies, discussed below) can be difficult.  Keep in mind the requirements of the organization, often specified in an SLA (or service level agreement).  Make sure users/customers are aware of what gets backed up and what doesn't, how to request a restore, and how long it might take to restore different data from various problems (very old data, a fire destroys the hardware, a DB record accidentally deleted from yesterday, ...).

Most people underestimate how slow a restore operation can be.  It is often 10–20 times longer to restore a file than to back one up.  (One reason: Operating systems are usually optimized for read operations, not write operations.)

You must also worry about the security of your backups.  Have a clear policy on who is allowed to request a restore and how to do so, or else one user might request a restore of another user's files.  In some cases this may be allowed, say by a manager or auditor.  (In a small organization where everyone knows everyone, this is not likely to be a problem.)

An example SLA

Customers should be able to recover any file version (with a granularity of a day) from the past 6 months, and any file version (with a granularity of a month) from the past 3 years.  Disk failure should result in no more than 4 hours of down-time, with at worst 2 business days of data lost.  Archives will be full backups each quarter and kept forever, with old data copied to new medium when the older technology is no longer supported.  Critical data will be kept on a separate system with hourly snapshots between 7:00 AM and 7:00 PM, with a midnight snapshot made and kept for a week.  Users have access to these snapshots.

(Database and financial data have a different SLA for compliance reasons.)

Types and Strategies of Backups and Archives

It is possible to back up only a portion of the files (and other objects in the case of a backup) on your systems.  In fact there are three types of backups (or archives):

  1. Full (also known as epoch or complete): everything gets backed up.
  2. Incremental: back up everything that has been added or modified since the last backup of any type (either incremental or full).
  3. Differential: back up everything that has been added or modified since the last full backup.  Differentials can be assigned levels:  level 0 is a full backup and level n is everything that has changed since the last level n−1 backup.

(Not everyone distinguishes between incremental and differential backups, but they are different so you must make sure anyone you speak with is using the same definitions as you are.)

A system administrator must choose a backup strategy (a combination of types) based on several factors.  The factors to consider are safety, required down time, cost, convenience, and speed of backups and recovery.  These factors vary in importance for different situations.  Common strategies include full backups with incremental backups in-between, and full backups with differential backups in-between (a two-level differential).  Sometimes a three level differential is used but rarely more levels.  (You never use both incremental and differential backups as part of a single strategy.)  The strategy of using only full backups is rarely used.

With modern backup software, the differences between the strategies mentioned above aren't that large.  Incrementals take less time to back up but more time to restore (since several different backup media may be needed), compared with differential backups (where at most two media, the last differential and the last full backup, are used to recover a file).  Full backups take a huge amount of time to make, but recovery is fast (only a single medium must be read).  Note most commercial software keeps a special directory file that is reset for each full backup, and keeps track of which incremental tape (or other medium) holds which files.  This file is read from the last incremental tape during a restore, to determine exactly which tape to use to recover some file.
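
As a rough illustration of the restore-time difference, the sketch below lists which media a restore would need on a given day of the cycle.  The function name restore_set and the labels full, incN, and diffN are made up for this example, and day 0 is assumed to be the day of the full backup.

```shell
#!/bin/sh
# Hypothetical helper: print the media needed to restore on day N of a
# cycle whose full backup was made on day 0.
restore_set() {
    strategy=$1
    day=$2
    media="full"
    case $strategy in
        incremental)
            # Every incremental since the full backup is needed, in order.
            i=1
            while [ "$i" -le "$day" ]; do
                media="$media inc$i"
                i=$((i + 1))
            done
            ;;
        differential)
            # Only the most recent differential is needed.
            if [ "$day" -gt 0 ]; then
                media="$media diff$day"
            fi
            ;;
        *)
            echo "unknown strategy: $strategy" >&2
            return 1
            ;;
    esac
    echo "$media"
}

restore_set incremental 3     # prints: full inc1 inc2 inc3
restore_set differential 3    # prints: full diff3
```

The incremental list grows by one medium per day, while the differential list never exceeds two.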

The Backup Schedule

The frequency of backups (the backup schedule) is another part of the policy.  In some cases it is reasonable to have full backups daily and incremental backups several times a day.  In other cases a full backup once a year with weekly or monthly incremental backups could be appropriate.  A common strategy suitable for most corporate environments would be monthly full backups and daily differential backups.  (Another example might be quarterly full (differential level 0) backups, with monthly level 1 differentials, and daily level 2 differentials.)  However more frequent full backups may save tapes (as the incremental backups near the end of the cycle may be too large for a single tape).

Note that in some cases there will be legal requirements for backups at certain intervals (e.g., the SEC for financial industries, the FBI for defense industries, or regulations for medical/personal data).  Depending on your backup software, it may be required to bring the system partially or completely off-line during the backup process.  Thus there is a trade-off among convenience, cost, and the safety of more frequent backups.

In a large organization it may not be possible to perform a full backup on all systems on the same weekend.  A staggered schedule is needed, where (say) 1/4 of the servers get backed up on the first Sunday of the month, 1/4 the second Sunday, and so on.  Each server is still being backed up monthly but not all on the same day of the month.
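
One way to sketch such a staggered schedule is to have cron start the same script every Sunday and let the script decide whether it is this server's turn.  The group number (1 to 4) assigned to each server and both function names are assumptions made for illustration.

```shell
#!/bin/sh
# Which week of the month does a given day fall in?  Days 1-7 contain
# the first Sunday, days 8-14 the second, and so on.
week_of_month() {
    day_of_month=$1
    echo $(( (day_of_month - 1) / 7 + 1 ))
}

# Succeed only if the given day falls in this server's week (group 1-4).
# A fifth Sunday (days 29-31) matches no group, so no full backup runs.
is_my_sunday() {
    group=$1
    day_of_month=$2
    [ "$(week_of_month "$day_of_month")" -eq "$group" ]
}

# Example for a server in group 2, assumed to be run by cron on Sundays.
# date +%d may print a leading zero (e.g. 08); strip it so the shell
# does not try to read the value as an octal number.
today=$(date +%d)
today=${today#0}
if is_my_sunday 2 "$today"; then
    echo "full backup due today"
else
    echo "not this server's week"
fi
```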

Ideally you don't want to have a single backup require more than one tape or whatever medium you're using.  Having to change tapes makes backup and recovery slower, and may make automatic backups impossible (if someone has to manually change tapes).

Be aware that small changes to the schedule can result in dramatic changes in the amount of backup media needed.  For example suppose you have 4 GB to backup within this SLA: full backup every 4 weeks (28 days) and differential backups between.  Now assume the differential backup grows 5% per day for the first 20 days (80% has changed) and stays the same size thereafter.  Some math reveals that doing full backups each week (which still meets the SLA) will use a third the amount of tape of a 28 day cycle, in this case.

Good schedules minimize backup and recovery times, minimize the amount of backup media required, and still meet the SLAs.  Such schedules require a lot of complex calculation to work out.  Modern backup software (such as Amanda) allows one to specify the SLA and will create a schedule automatically.  A dynamic schedule is adjusted automatically to optimize the backups, depending on how much data is actually copied for each backup.  Such software will simply inform the SA when to change the tapes in a jukebox.

On a busy (e.g., database) server downtime will be the most critical factor.  In such cases consider using LVM snapshots, which very quickly make a read-only copy of some logical volume using very little extra disk space.  You can then back up the snapshot while the rest of the system remains up.
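
A minimal sketch of a snapshot-based backup follows.  The volume group (vg0), logical volume (data), snapshot size (1G), mount point, and archive path are all assumptions; setting DRY_RUN=1 prints the commands instead of running them, which is a safe way to review the sequence first.

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

snapshot_backup() {
    vg=$1
    lv=$2
    archive=$3
    snap="${lv}-snap"
    mnt="/mnt/${snap}"

    # Create a copy-on-write snapshot; the 1G is room for changes made
    # to the origin volume while the backup is running.
    run lvcreate --size 1G --snapshot --name "$snap" "/dev/$vg/$lv" || return 1
    run mkdir -p "$mnt"
    run mount -o ro "/dev/$vg/$snap" "$mnt"
    # Archive the frozen view while the real volume stays in use.
    run tar -czf "$archive" -C "$mnt" .
    run umount "$mnt"
    # Remove the snapshot so it stops consuming space.
    run lvremove -f "/dev/$vg/$snap"
}

DRY_RUN=1
snapshot_backup vg0 data /backup/data.tar.gz
```

Some filesystems need extra mount options for a snapshot (XFS, for example, usually needs -o nouuid while the origin is mounted), so treat the mount line as a starting point.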

Another strategy is called disk-to-disk-to-tape, in which the data to be backed up is quickly copied to another disk and then written to the slower backup medium later.

Other Policies

Deciding what to backup is part of your policy too.  Are you responsible for backing up the servers only?  All partitions, some partitions, some directories, or just a few selected files?  The Boss' workstation?  All workstations?  (The users need to know!)  Network devices (e.g., routers and switches)?  It may be appropriate to use a different backup strategy for user workstations than for servers, for different servers, or even different partitions/directories/files of servers.

Another part of your backup policy is determining how long to keep old backups around.  This is called the backup retention policy.  In many cases it is appropriate to retain the full backups indefinitely.  In some cases backups should be kept for 7 to 15 years (in case of legal action or an IRS audit).  Such records are often useful for more than disaster recovery.  You may discover your system was compromised months after the break-in.  You may need to examine old files when investigating an employee.  You may need to recover an older version of your company's software.  Such records can help if legal action (either by your company or by someone else suing your company) occurs.

Since the Enron and Microsoft scandals (when corporate officers had emails subpoenaed by the DoJ), a common new policy became "if it doesn't exist, it can't be subpoenaed!"  These events led to a revision of the FRCP:

FRCP — The Federal Rules of Civil Procedure

The FRCP include rules for handling ESI (Electronically Stored Information) when legal action (e.g., a lawsuit) is imminent or already underway.  You must suspend normal data destruction activities (such as reusing backup media); possibly make snapshot backups of various workstations, log files, and other ESI; classify the ESI as easy or hard to produce, and estimate the cost to produce the hard ESI (which the other party must pay); work out a discovery (of evidence) plan; and actually produce the ESI in a legally acceptable manner.  An SA should consult the corporate lawyers well in advance to work out the procedures.

It is important to decide where to store the backup media (the storage policy).  These tapes or CDs contain valuable information and must be secured.  Also it makes no sense to store media in the same room as the server the backup was made from; if something nasty happens to the server such as theft, vandalism, fire, etc., then you lose your backups too.  A company should store backup media in a secure location, preferably off-site.  A bank safe-deposit box is usually less than $50 a year and is a good location to store backup media.  If on-site storage is desirable, consider a fire-proof safe.  (And keep the door shut all the time!)  Consider remote storage companies but beware of bandwidth and security issues.

Backup media will not last forever.  Considering how vital the backups might be, it is a false economy to buy cheap tape or reuse the same media over and over.  A reasonable media replacement policy (also known as the media rotation schedule) is to use a new tape once for a full backup, then use it 12 times for differential or 31 times for incremental backups, then toss it.  Before using new media for the first time, test it.

For security reasons you should completely erase the media before throwing them in the trash.  (This is harder than you think!)  An alternative is to shred or burn old media, and/or to encrypt backups as they are made.

Backup Media Choices

There are too many choices to count today.  For smaller archives flash disks, writable CD-ROMs, writable DVDs, (these are WORM media) and old fashioned DLT, DAT, DDS-{2,4,8,16} tape drives are popular.  (I used a DDS-2 SCSI drive at home for many years.)  Today (2010) consider LTO2 drives.

Tape storage is very cheap, typically less than $20 for 80 gigabytes of storage.  (DDS-2 tapes cost about $7 and hold 4 GB each.  DDS-4 drives are faster and their tapes hold 20 GB each, or about 40 GB with compression.)  However tapes and other magnetic media can be affected by strong electrical and magnetic fields, heat, humidity, etc.

An external hard drive (less than $100 for 1TB) connected directly to your PC can use the backup program that came with your operating system (Backup and Restore Center on Windows, and Time Machine on OS X).  Most backup software can automate backups of all new files or changed ones on a regular basis.  This is a simple option if you only have one PC.

Optical media such as CDs are more durable and fairly cheap but take longer to write.  They can be reused less often than magnetic media, and are still susceptible to heat and humidity.  Optical media can scratch if not carefully handled.  Also consider the bulk of the media.  If you must store seven years' worth of backups, it may be important to minimize the storage requirements and expense.  A CD-ROM can hold about 700 MiB while a dual-layer Blu-ray can hold 50 GB.

A choice becoming popular (since 2008) is on-line storage, e.g., HP Upline, Google GDrive, etc.  (For SOHO you can use Mozy or BackBlaze.)  This is a market that is growing and changing rapidly, so you need to do your own research on available choices (as the list above is likely out of date).  These companies offer cheap data storage and complete system backups, provided you have a fast Internet connection.  Many colocation facilities (network exchange points) provide this service as well to the connected ISPs.  Whether or not to trust the Internet and some outsourced company with your vital business data is a decision you will need to make.  If you go this route make sure all the data is encrypted using industry standard encryption at your site before transmission across the Internet.  (Never use any company that uses proprietary encryption regardless of how secure they claim it is!)
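
Encrypting at your own site before upload might look roughly like the sketch below.  It assumes OpenSSL 1.1.1 or later (for the -pbkdf2 option) and uses AES-256; the two function names and the choice to keep the passphrase in a file are illustrative, not a recommendation of any particular product.

```shell
#!/bin/sh
# Archive a directory and encrypt the stream before it is written out.
encrypt_archive() {
    src_dir=$1
    out_file=$2
    pass_file=$3      # first line of this file is used as the passphrase
    tar -czf - -C "$src_dir" . |
        openssl enc -aes-256-cbc -salt -pbkdf2 \
            -pass "file:$pass_file" -out "$out_file"
}

# Reverse the process: decrypt and unpack into an existing directory.
decrypt_archive() {
    in_file=$1
    dest_dir=$2
    pass_file=$3
    openssl enc -d -aes-256-cbc -pbkdf2 \
        -pass "file:$pass_file" -in "$in_file" |
        tar -xzf - -C "$dest_dir"
}
```

Only the encrypted file would then be sent to the storage company.  Keep a copy of the passphrase somewhere other than the system being backed up, or the backups are useless after a disaster.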

When backing up large transaction database files, the speed of the media transfer is important.  For instance, a 6 Mbps (Megabits per second) tape drive unit will backup 10 gigabytes in about 3 hours and 45 minutes.  (In most cases incremental or differential backups contain much less data!)
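
The arithmetic behind that estimate can be checked with a few lines of shell.  This sketch assumes decimal units (1 GB = 1000 MB) and a drive that sustains its rated speed; transfer_time is a made-up name.

```shell
#!/bin/sh
# Estimate how long a transfer takes, using integer arithmetic.
transfer_time() {
    size_gb=$1      # amount of data, in gigabytes
    rate_mbps=$2    # sustained drive speed, in megabits per second
    # gigabytes -> megabytes -> megabits, then divide by the rate.
    secs=$(( size_gb * 1000 * 8 / rate_mbps ))
    printf '%dh %dm %ds\n' \
        $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

transfer_time 10 6    # prints: 3h 42m 13s
```

This agrees with the rough figure above; real drives rarely sustain their rated speed, so expect longer.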

For IDE controllers your only choice is a TRAVAN backup drive.  These are very slow; don't use them!  For SCSI drives (such as DDS drives from HP) there are two speeds for the SCSI controller, depending on what devices are on it.  A tape drive will slow down the SCSI bus by half, so consider dual SCSI controllers.

For networks, consider a networked backup unit.  This would allow a single backup system to be used with many different computers.  Thus you can buy one high-speed device for about the same money as several lower-speed devices.  Keep in mind however that a network backup can bring a standard Ethernet network to its knees.  (The network only shares 10 Mbps for all users on a SOHO or wireless LAN.)  Even a Fast Ethernet (100 Mbps) LAN might suffer noticeable delays and problems.

An excellent choice for single-system backup is a USB disk.  Also using SAN/NAS to centralize your storage makes it easy to use a single backup system (robot tapes).

It is a good idea to have a spare media drive (e.g., DLT tape drive), in case the one built into a computer fails when the computer fails.  This is especially true for non-standard backup devices that may not be available from CompUSA on a moment's notice.  Regularly clean and maintain (and test) your backup drives.  (While I don't know of any organization that does this, consider copying old data to new hardware once the old drives are no longer supported or available.  If you don't have a working drive (including drivers), old backups are useless!)

Consumables Planning (Budgeting)

Suppose a medium to large organization uses 8 backup tapes a day, 6 days a week; that means 48 tapes a week.  If your retention policy is to keep 6 months' (26 weeks') worth of incrementals, that's 1,248 tapes needed.  High capacity quality tapes might go for $60 each, so you would need $74,880.00.  In the second part of the year, you only need new tapes for full backups, an additional 260 tapes (say) for $15,600, or more than $90k for the first year (about $7,500 a month).  (This does not count spares or the cost of drive units.)  Changes to the policies can result in expense differences of over $1,000 per month!
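
The budget above can be reproduced with shell arithmetic, which makes it easy to try other policies.  All the figures come from the example (taking six months as 26 weeks); only the variable names are invented here.

```shell
#!/bin/sh
tapes_per_day=8
days_per_week=6
weeks_retained=26        # six months of incrementals
price=60                 # dollars per tape
extra_full_tapes=260     # new tapes for full backups later in the year

tapes_per_week=$(( tapes_per_day * days_per_week ))
initial_tapes=$(( tapes_per_week * weeks_retained ))
initial_cost=$(( initial_tapes * price ))
extra_cost=$(( extra_full_tapes * price ))
first_year=$(( initial_cost + extra_cost ))
per_month=$(( first_year / 12 ))

echo "tapes per week:   $tapes_per_week"
echo "tapes to start:   $initial_tapes"
echo "initial cost:     \$$initial_cost"
echo "full-backup cost: \$$extra_cost"
echo "first year:       \$$first_year (about \$$per_month a month)"
```

Changing tapes_per_day from 8 to 7, for instance, shows how quickly a small policy change moves the total.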

As backup technology changes over the years, it is important to keep old drives (and drivers) around, to read old backup tapes when needed.  You should keep old drives around long enough to cover your data retention policies.  Try to avoid upgrading your backup technology (drives, tapes, software) every few years, or you'll end up with many different and incompatible backup tapes.

Tools for Archives and Backups

Archives are easier to make than backups, so most tools perform archives.  A tool cannot make a backup without knowing the underlying filesystem intimately, i.e. it must parse the filesystem on disk.  The reason is twofold:

  1. Different filesystems exhibit different semantics.  No single tool supports all the semantics of all filesystem types.
  2. The kernel interface obfuscates information about the layout of the file on disk.  You have to go around the kernel to the device interface to see all the information about a file that is necessary for recording it correctly.

GNU tar is a popular tool for archiving the user's view of files.  Another standard (and free) choice is cpio.  Note neither tool is standardized by POSIX.  A new standard tool, based on both (and hopefully better than either) is pax.  These, combined with find and some compression program (such as gzip or bzip2) are used to easily make archives.

You can ask find to locate all files modified since a certain date, and add them to a compressed tar archive created on a mounted backup tape drive.  A backup shell script can be written, so you don't end up attempting to back up /dev, /sys, or /proc files.  (Note!  Unix tar is not the same as GNU tar; use the GNU version.)

If you want to store the kernel's view of files, along with all of the semantics the filesystem provides, and none of the non-filesystem objects that might appear to inhabit the filesystem (such as /proc entries), use the filesystem's native dump (and restore) programs provided by your vendor specifically for that purpose, for your filesystem type (note for Reiser4Fs you can just use star).  dump uses /etc/dumpdates to track dump levels.  Some of the differences between dump (for backups) and tar or cpio for archives are:

Not all filesystem types support dump and restore utilities.  When picking a filesystem type keep in mind your backup requirements.

In any case, crontab can be used to schedule backups according to the backup schedule discussed earlier.  If your company prefers to have a human perform backups, remember that root permission will be needed to access the full system.  Often the backup program is controlled by sudo or a similar facility so the backup administrator doesn't need the root password.

The find command can be used to locate which files need to be backed up.  For incrementals and differentials, use find / -mtime -x to find files changed within the last x days, or find / -newer file to find files changed since a timestamp file (such as /etc/last-backup.{full,incremental,differential}) was last updated.  Use with tar roughly like this:

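mount /dev/zip   # assumes an /etc/fstab entry that mounts it on /zip
# Regular files changed in the last day.  -xdev stays on the root
# filesystem, which skips /proc, /sys, and /zip itself.  Null-separated
# names cope with spaces in filenames.  (Needs GNU find and GNU tar.)
find / -xdev -type f -mtime -1 -print0 \
| tar --null -T - -czf /zip/incremental-6-20-01
date >/etc/last-backup.incremental
umount /dev/zip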
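
Timestamp files give more exact results than counting days.  The following sketch shows how the choice of stamp file decides between a differential and an incremental; the helper name changed_since and the stamp file names are assumptions, and the demonstration runs in a scratch directory with hand-set dates.

```shell
#!/bin/sh
# List regular files under a directory modified after a stamp file.
changed_since() {
    top_dir=$1
    stamp=$2
    find "$top_dir" -type f -newer "$stamp" -print
}

# Demonstration: a full backup on Jan 1 and an incremental on Jan 3.
demo=$(mktemp -d)
touch -t 202001010000 "$demo/stamp.full"   # last full backup
touch -t 202001030000 "$demo/stamp.last"   # last backup of any type
mkdir "$demo/data"
touch -t 201912310000 "$demo/data/old"     # unchanged since the full
touch -t 202001020000 "$demo/data/jan2"    # changed on Jan 2
touch -t 202001040000 "$demo/data/jan4"    # changed on Jan 4

echo "differential (since last full backup):"
changed_since "$demo/data" "$demo/stamp.full" | sort   # jan2 and jan4
echo "incremental (since last backup of any type):"
changed_since "$demo/data" "$demo/stamp.last" | sort   # jan4 only
rm -rf "$demo"
```

In a real script the stamp file should be touched when the backup starts rather than when it ends, so files changed during the run are picked up next time.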

Commercial software is affordable and several packages are popular for Unix and Linux systems, including BRU (TolisGroup.com), BackupEXEC, and Arkeia (www.arkeia.com).  I haven't used these; I just use tar and find.  Of course there are other choices as well, such as KDE ark or amanda (network backups).  Some of these can create schedules, label tapes, encrypt tapes, follow media rotation schedules, etc.

The most important tool is the documentation: the backup strategy, media types and rotation schedule, hardware maintenance schedule, location of media storage (e.g., the address of the bank and box number), and all the other information discussed above.  This document is collectively referred to as the backup policy.  This document should clearly say to users what gets backed up and when, and what to do and who to contact if you need to recover files.

Note:  Whatever tools you use, make sure you test your backup method, by attempting to use the recovery procedure.  (I know someone who spent 45 minutes each working day doing backups for years, only to realize none of the backups ever worked the first time he attempted to recover a file!)
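
A simple restore test can be scripted.  The sketch below assumes a gzip-compressed tar archive made with tar -czf archive -C original . and that the original files have not changed since; verify_archive is a hypothetical helper name.

```shell
#!/bin/sh
# Extract an archive into a scratch directory and compare the result
# with the original tree.  Returns 0 only if the two match.
verify_archive() {
    archive=$1
    original=$2
    scratch=$(mktemp -d) || return 1
    status=0
    tar -xzf "$archive" -C "$scratch" || status=1
    if [ "$status" -eq 0 ]; then
        # diff -r exits non-zero if any file differs or is missing.
        diff -r "$original" "$scratch" >/dev/null || status=1
    fi
    rm -rf "$scratch"
    return "$status"
}
```

For example, verify_archive /backup/home.tar.gz /home && echo "restore OK".  Note diff -r compares file contents only, so owners, permissions, and ACLs still need a separate check.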

(Parts of this document were adapted from netnews (Usenet) postings in the newsgroup comp.unix.admin during 5/2001 by Jefferson Ogata.  Other parts were adapted from The Practice of System and Network Administration, by Limoncelli and Hogan, 1st Ed. ©2001 by Addison-Wesley.)

Backups with Solaris Zones

Solaris zones contain a complication for backup: many standard directories are actually mounted from the global zone via LOFS (loopback filesystem).  These should only be backed up from the global zone.  The only items in a local zone needing backup are (usually) application data and configuration files.  Using an archive tool (such as cpio, tar, or star) will work best:

find export/zone1 -fstype lofs -prune -o -local \
| cpio -oc -O /backup/zone1.cpio

Whole zones can be fully or incrementally backed up using ufsdump.  Shut down the zone before using the ufsdump command, to put the zone in a quiescent state, and avoid backing up shared file systems, with:

global# zlogin -S zone1 init 0

Solaris supports filesystem snapshots (like LVM does on Linux) so you don't have to shut off a zone.  However it must be quiesced by turning off applications before creating the snapshot.  Then you can turn them back on and perform the backup on the snapshot.  Create it with:

global# fssnap -o bs=/export /export/home # create the snapshot:
global# mount -o ro /dev/fssnap/0 /mnt    # then mount it.

You should make copies of your non-global zones' configurations in case you have to recreate the zones at some point in the future.  You should create the copy of the zone's configuration after you have logged into the zone for the first time and responded to the sysidtool questions:

global# zonecfg -z zone1 export > zone1.config

Adding a backup tape drive

Added SCSI controller (ADAPTEC 2940)
Added SCSI DDS2 Tape drive
On reboot kudzu detected and configured SCSI controller and tape device
Verify devices found with 'dmesg': indicates tape is /dev/st0 and /dev/nst0
Verify scsi devices with 'scsi_info' (/proc/scsi)
Verify device working with: mt -f /dev/st0 status
Create link: ln -s /dev/nst0 /dev/tape
Verify link: mt status

Note: /dev/st0 causes automatic tape rewind after any operation, /dev/nst0 has no automatic rewind, but most backup software knows to rewind before finishing.  If you plan to put multiple backup files on one tape you must use /dev/nst0.
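
Putting several archives on one tape might be sketched as follows.  It assumes the non-rewinding device /dev/nst0 from the notes above; the directories archived are placeholders, and setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
TAPE=/dev/nst0

# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

write_two_archives() {
    run mt -f "$TAPE" rewind
    run tar -cf "$TAPE" /etc      # becomes tape file 0
    run tar -cf "$TAPE" /home     # tape file 1, written after file 0
    run mt -f "$TAPE" rewind
}

list_second_archive() {
    run mt -f "$TAPE" rewind
    run mt -f "$TAPE" fsf 1       # skip forward over tape file 0
    run tar -tf "$TAPE"
}

DRY_RUN=1
write_two_archives
list_second_archive
```

With /dev/st0 the tape would rewind after the first tar, and the second archive would overwrite the first.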

Common Backup and Archive Tools

mt (/dev/mt0, /dev/rmt0)
st (/dev/st0, /dev/nst0 — use nst for no auto rewind)
mt and rmt (remote tape backups) — rewind, erase, ...
dump/restore  (These operate on the drive as a collection of disk blocks, below the abstractions of files, links and directories that are created by the file systems.  dump backs up an entire file system at a time.  It is unable to backup only part of a file system or a directory tree that spans more than one file system.)
tar, cpio, dd, star (and pax)

A comparison of these tools:

If you need to backup large (e.g., DB) files use a large blocksize.

Many types of systems can use LVM, ZFS, or some equivalent that supports snapshots for backup without needing to take the filesystem off-line.

NAS (and some SAN) systems are commonly backed up with some tool that supports NDMP (the network data management protocol), which usually works by doing background backup to tape of a snapshot.  This has a minimal effect on users of the storage system.

Additional Tools I've Heard About

Jörg Schilling's star program currently supports archiving of ACLs.  IEEE Standard 1003.1-2001 (POSIX.1) defined the pax interchange format that can handle ACLs and other extended attributes (e.g., SE Linux stuff).  GNU tar supposedly handles pax and star formats.  There is also a spax tool that supports the star extensions.

A tool that supposedly easily and correctly backs up ACLs, ext2 and ext3 attributes, and extended attributes (such as for SE Linux) is bsdtar, a BSD modified version of tar that uses libarchive.so to read/write a variety of formats.  (Personally I've only been able to use star and spax to make and restore such archives.)

bru (commercial software)

amanda (a powerful network backup scheduling tool)

unison (uses an rsync-like protocol to mirror directories between systems, including between Unix/Linux and Windows systems)

rsync (to make a backup of /home to server.kaos.org with rsync via ssh, see the first rsync example below, which is untested!)

Examples of Common Backup Tool Use

rsync

rsync -avre "ssh -p 2222" /home/ server.kaos.org:/home
# A trailing slash on the source means "the contents of this directory";
# without it you get documents/documents at the destination:
rsync -azve ssh me@server.kaos.org:documents/ documents
rsync -azve ssh documents/ me@server.kaos.org:documents
# With -R the full source path is recreated under the destination:
rsync -HavRuzc /var/www/html/ example.com:/
# or copy ~/public_html to/from me@example.com:public_html/
-v = verbose,
-c = compare files by checksum rather than by size and timestamp,
-a = archive mode = -rlptgoD = preserve almost everything,
-r = recursive,
-R = preserve src path at dest,
-z = compress when sending,
-b = backup dest file before over-writing/deleting,
-u = don't over-write newer files,
-l = preserve symlinks,
-H = preserve hard links,
-p = preserve permissions,
-o = preserve owner,
-g = preserve group,
-t = preserve timestamps,
-D = preserve device files,
-S = handle sparse files (files with "holes") efficiently,
--modify-window=1 = timestamps match if they differ by
                    no more than this number of seconds
      (required with Windows FAT filesystems, which only have
       2 second time precision)

Modern rsync has many options to control the attributes at the destination.  You can use --chmod, transfer ACLs and EAs.  You can create rsyncd.conf files to control behavior (and use a special ssh key to run a specific command), and define new arguments via ~/.popt.  But older rsync versions don't have all those features.  On Fedora Core 2 for example, rsync has no options to set/change the permissions when copying new files from Windows; umask applies.  (You can use special ssh tricks to work around this, to run a find ... |xargs chmod... command after each use of rsync.)

A better way on Fedora is to set a default ACL on each directory in your website.  Then all uploaded files will have the umask over-ridden:

cd ~/public_html # or wherever your web site is.
find . -type d -exec setfacl -m d:o:rX {} +

This ACL says to set a default ACL on all directories, to provide others read, plus execute if a directory.  (New directories get this default ACL too.)  With this ACL, uploading a file will have 644 or 755 permissions, rather than 640 or 750.

cpio

cd /someplace/..  # the parent of someplace
find someplace -depth | cpio -o --format=crc > ~/file.cpio
   # crc  is the new SysVr4 format with CRCs
   # (notice how we avoid using absolute pathnames with cpio!)

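cpio -idm < file.cpio  # -d means to create directories if needed;
                       # -m means to keep the files' modification times
                       # (-l, link instead of copy, only works with -p)

cpio -ivd 'glob-pattern' <file.cpio # restore files matching glob
   # quote the pattern so the shell doesn't expand it first;
   # note these globs (wildcards) match leading dot and slashes!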

cpio -it < file.cpio # table of contents
   # "-v" means print filenames as processed;
   # "-V" means print a dot per file processed (often better than "-v" when
   # there are lots of files, to prevent scrolling the screen much)

Sample commands to backup all files to backup media:

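cd /folderToBackup/..
# -prune has no effect when -depth is given, so -depth is omitted here;
# the pattern must match the path as find prints it.
find folderToBackup -path folderToBackup/proc -prune -o -print | \
   cpio -o --format=crc > /dev/tape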

# Command to restore complete (full) backup:
 cpio -imd < /dev/tape
 
# Command to restore just the file foo (pattern quoted so the shell
# doesn't expand it):
 cpio -imd '*/foo' < /dev/tape

# Command to get table of contents:
 cpio -itv < /dev/tape

pax

#pax works the same as cpio:
find ... | pax -wvx ustar > pax.out
pax [-v] < pax.out

pax -rv -pe < pax.out  # "-pe" means preserve everything
   # Note spax  has an additional "-acl" option

cp, tar, and ssh

To duplicate some files or a whole directory tree requires more than just copying the files.  Today you need to worry about ACLs, extended attributes (SE Linux labels), non-files (e.g., named pipes, or FIFOs), files with holes, etc.  GNU cp has options for that but can't be used to copy files between hosts.

The best way to duplicate a directory tree on the same host is GNU cp -a, or if not available use:

spax -pe -rw olddir newdir 

To copy a whole volume to another host you can use dump and then transfer that, and restore it on the remote system.  Files or backups and archives can be copied between hosts with scp, sftp, or rsync.

tar is often used to duplicate a directory tree to the same host if GNU cp isn't available.  tar can also be used to duplicate a directory tree to a different host, via ssh:

tar czf - -C sourcedir files \
| ssh remote_host 'tar xzf - -C destdir'

Use tar with ssh if this is a complete tree transfer.  For better compression over a slow link, use a different compressor (e.g., -j for bzip2), at the cost of more CPU time.
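
The same-host case mentioned earlier uses the same pipe without ssh.  This is a minimal sketch; copy_tree is a made-up name, and plain tar like this will not carry ACLs or extended attributes unless your tar has options for them.

```shell
#!/bin/sh
# Duplicate a directory tree on the same host with a tar pipe.
copy_tree() {
    src=$1
    dst=$2
    # The first tar writes the tree to stdout; the second unpacks it
    # in the destination.  -p keeps the original permissions.
    mkdir -p "$dst" &&
        tar -cf - -C "$src" . | tar -xpf - -C "$dst"
}
```

For example, copy_tree /var/www/html /srv/www-copy.  No compression is used, since nothing crosses a network.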

Using rsync over ssh often performs better than tar if it is an update (i.e., some random subset of files need to be transferred).