
A few tricks to optimize backups for QEMU/KVM virtual machines.

  1. Have the backup take up only as much disk space as the image actually uses, not the space allocated to it. More on that here. fi-backup, a backup program I helped write, works remarkably well for that.
     
  2. Have rotating incremental backups so that the master backup name/directory is always the same, e.g. Current -> .0, .0 -> .1, etc. I wrote a backup script that runs fi-backup and does either a blockpull or a blockcommit, depending on the compressibility of the .qemu image.
     
  3. Use rsync to synchronize the full backup. When I first started using rsync to back up the .qemu snapshots and looked at the performance/network numbers, I thought something was wrong because of how little network traffic it used. For a well-constructed virtual machine, most of the data that changes is normally only the client data and non-static memory.

    As an example: a 200 GB qemu disk snapshot of a running production Windows server transferred only about 2 GB of raw image backup data, with a total transfer time of about 2 hours.
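
The rotation scheme in item 2 (Current -> .0, .0 -> .1, etc.) can be sketched in a few lines of Python. This is only an illustration of the naming scheme, not the backup script mentioned above: the function name `rotate_backups`, the `keep` parameter, and the example path are all hypothetical.

```python
import os
import shutil


def _remove(path: str) -> None:
    """Delete a backup generation, whether it is a directory or a single file."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    else:
        os.remove(path)


def rotate_backups(base: str, keep: int = 3) -> None:
    """Shift base.0 -> base.1 -> ... and move `base` to base.0.

    `base` is the master backup path; `keep` is the number of numbered
    generations to retain (both are assumptions made for this sketch).
    """
    # Drop the oldest generation so the shift below never overwrites anything.
    oldest = f"{base}.{keep - 1}"
    if os.path.exists(oldest):
        _remove(oldest)

    # Shift from the highest surviving number down, e.g. .1 -> .2, then .0 -> .1.
    for i in range(keep - 2, -1, -1):
        src = f"{base}.{i}"
        if os.path.exists(src):
            os.rename(src, f"{base}.{i + 1}")

    # Finally, the current backup becomes generation .0.
    if os.path.exists(base):
        os.rename(base, f"{base}.0")


if __name__ == "__main__":
    # Hypothetical path; replace with the real backup location.
    rotate_backups("/backups/vm-example", keep=3)
```

Note that a rename moves the current backup out of the way. rsync's delta transfer needs an existing copy at the destination to diff against, so a real script would probably copy rather than move, or otherwise leave a basis file in place.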

Here's a real world example of a system that has a database and web server together for a client CRM solution.

STARTING BACKUP OF REDACTED

Number of files: 3 (reg: 2, dir: 1)
Number of created files: 0
Number of deleted files: 0
Number of regular files transferred: 2
Total file size: 214,272,381,971 bytes
Total transferred file size: 214,272,381,971 bytes
Literal data: 2,156,265,472 bytes
Matched data: 212,116,116,499 bytes
File list size: 110
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 14,713,927
Total bytes received: 2,163,265,771

sent 14,713,927 bytes  received 2,163,265,771 bytes  277,396.64 bytes/sec
total size is 214,272,381,971  speedup is 98.38
ENDING BACKUP OF REDACTED

 

The slow reported rate of about 277 KB/s (roughly 2.2 Mbps) really includes the time rsync spends calculating the binary diff. The actual backup took about 2 hours, which for 214 GB translates to an effective 238 Mbps. Yet in the network traffic logs I was only seeing traffic at 1/10th to 1/100th of actual network capacity. Amazing.
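
For anyone who wants to check the arithmetic, the figures in the rsync summary can be reproduced as follows. This is a rough sketch: the variable names are made up, and the elapsed time is only estimated by dividing the bytes on the wire by the rate rsync reported.

```python
# Figures copied from the rsync summary above.
total_size = 214_272_381_971   # "Total file size" in bytes
sent = 14_713_927              # "Total bytes sent"
received = 2_163_265_771       # "Total bytes received"
reported_rate = 277_396.64     # bytes/sec on rsync's final summary line

# Bytes that actually crossed the network.
wire_bytes = sent + received

# rsync's "speedup" is the total file size divided by the bytes on the wire.
speedup = total_size / wire_bytes

# Estimated wall-clock time, derived from the reported rate (an approximation).
elapsed_s = wire_bytes / reported_rate

# Rate on the wire versus the effective rate for the whole image, in megabits/sec.
wire_mbps = reported_rate * 8 / 1e6
effective_mbps = total_size * 8 / elapsed_s / 1e6

# The same effective rate using a rounded 2-hour run instead of the estimate.
effective_mbps_2h = total_size * 8 / (2 * 3600) / 1e6

print(f"speedup:        {speedup:.2f}")
print(f"elapsed:        {elapsed_s / 3600:.2f} hours")
print(f"wire rate:      {wire_mbps:.2f} Mbps")
print(f"effective rate: {effective_mbps:.0f} Mbps ({effective_mbps_2h:.0f} Mbps at exactly 2 hours)")
```

The gap between the wire rate and the effective rate is the same factor as the speedup: only about 1% of the image had to be sent.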