Snapshots Won’t Guard Against the Crackle & Pop!

Don’t turn to a snapshot when things go crackle or pop.

Just like the oath doctors take to “do no harm,” system administrators know that the first rule of system administration is: make sure you can undo anything you do. That sage advice requires the ability to roll back changes that have been made to the system. Normally this requires some form of backup of the directory, files or database. More and more, you hear the term “snapshot” used to refer to the known-good point to restore to if something goes wrong, that is, when data is corrupted (crackle) or hardware fails (pop).

Snapshot Defined

You may recognize the term from photography, and that’s a good analogy. A snapshot is the captured “image” of some directory, file or, sometimes, a database at a single point in time. When referring to a Virtual Machine (VM) snapshot, as is often the case, it means an image of the entire VM file (this is more or less the same for all hypervisors, the systems that run VMs). Just like their two-dimensional namesake, VM snapshots are useful, but they have some limitations worth noting.

Snapshot Limitations

When it comes to backups, VM snapshots are the most fragile.[1] They depend upon an active VM and duplicate any flaws in data or run state that the VM files, configuration and hardware have, which can render the VM unusable for a restore. They should not be relied on as a dependable backup. In fact, all vendors of VM software say the same thing.

Other limitations include:

  • Snapshots degrade VM performance.
  • Snapshots consume more and more storage space over time.
  • Snapshots are dependent upon the specific VM environment they are created in.[1]

Snapshot Uses

A VM snapshot is a quick method to revert to a point-in-time state of a virtual machine (this includes its configuration, files, and resources). It is a quick fail-safe rollback point to revert to when performing upgrades or installing or uninstalling software or components. Snapshots are useful for low-risk changes in a development environment where “rinse and repeat” testing, as it’s referred to, is being done and where the longer process of taking a backup is unnecessary.

VMware gives its own best practices for using VM snapshots:

  • Do not use snapshots as backups.
  • VMware supports a maximum of 32 snapshots in a chain.
    • However, for better performance, use only 2 to 3 snapshots.
  • Do not use a single snapshot for more than 24-72 hours.[3]
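
The guidelines above can be turned into a simple audit. The following is a minimal sketch, not a VMware tool: the `Snapshot` record and the `audit_snapshots` function are hypothetical names invented for illustration, and the snapshot data would in practice have to come from your hypervisor's own inventory. The thresholds are the ones listed above, with 72 hours used as the upper end of the 24-72 hour window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Thresholds taken from the guidelines listed above.
MAX_CHAIN_LENGTH = 32                   # maximum snapshots in a chain
RECOMMENDED_CHAIN_LENGTH = 3            # "use only 2 to 3 snapshots"
MAX_SNAPSHOT_AGE = timedelta(hours=72)  # upper end of the 24-72 hour window


@dataclass
class Snapshot:
    """Hypothetical record for one snapshot; not a VMware API object."""
    name: str
    created: datetime


def audit_snapshots(chain, now):
    """Return a list of warnings for one VM's snapshot chain."""
    warnings = []

    # Check the length of the chain against both limits.
    if len(chain) > MAX_CHAIN_LENGTH:
        warnings.append(
            f"chain has {len(chain)} snapshots, above the maximum of {MAX_CHAIN_LENGTH}"
        )
    elif len(chain) > RECOMMENDED_CHAIN_LENGTH:
        warnings.append(
            f"chain has {len(chain)} snapshots, above the recommended {RECOMMENDED_CHAIN_LENGTH}"
        )

    # Flag every snapshot that has been kept too long.
    for snap in chain:
        age = now - snap.created
        if age > MAX_SNAPSHOT_AGE:
            hours = age.total_seconds() / 3600
            warnings.append(f"snapshot '{snap.name}' is {hours:.0f} hours old")

    return warnings


# Example with made-up data: one stale snapshot and one recent one.
now = datetime(2024, 1, 10, 12, 0)
chain = [
    Snapshot("pre-upgrade", datetime(2024, 1, 5, 12, 0)),
    Snapshot("pre-patch", datetime(2024, 1, 10, 9, 0)),
]
for warning in audit_snapshots(chain, now):
    print(warning)
```

Run on a schedule, a check like this catches snapshots that were meant to be temporary but were quietly left in place.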

Snapshot Versus Backup

Unfortunately, the term snapshot has made its way into common usage to sometimes mean “backup,” even though they are quite different things. In administrative circles, it often requires clarification that the term, when used, refers specifically to a snapshot and not just a quick backup. Backups are more resilient because they are specifically designed for long-term data storage. They have these advantages over snapshots:

  • Backups do not grow over time or degrade system performance.[4]
  • Backups allow for incremental or selective restores.[5]
  • Backups can transplant data to another database or system.
    • This helps when the underlying system is upgraded.
    • This allows for data migration or multiplication among systems.
  • Backups can be stored for longer periods.[5]

Backup Uses

The central element of any business continuity or disaster recovery plan should be backups, not snapshots. Backups are the best way to guarantee Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO). Snapshots are not an appropriate medium for storing such critical data, and any system administrator relying on them to recover from a disaster is asking for trouble. Snapshots also do not allow for the type of granular restore that may be required for the tightest RPO. There are essentially two types of backups we need to be concerned with for this purpose.

VM backups are essentially a copy of the VM and are not dependent on their parent VM or VM environment at all. If the VM or VM hardware is completely lost, the VM is 100% restorable from just the VM backup.

Database backups are a copy of a database (should one be running on the system). These are independent of, and in addition to, a VM backup unless the VM backup tool integrates with the database software being used. If a database backup is not used, there’s no guarantee the database can be brought back to a consistent state just by restoring the VM (the system the database runs on).

Both of these backup types should be used to ensure recovery from a data loss (crackle) or a hardware failure (pop) of the system.
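
To make the RTO and RPO idea concrete, here is a minimal sketch of how the two objectives can be checked against a backup schedule. The function names, timestamps, and the one-hour RPO are all hypothetical values chosen for illustration; real figures come from your own continuity plan and backup tooling.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup, failure_time, rpo):
    """True if the data written since the last backup fits within the RPO."""
    return failure_time - last_backup <= rpo


def meets_rto(restore_duration, rto):
    """True if the time needed to restore fits within the RTO."""
    return restore_duration <= rto


# Hypothetical scenario: a failure at 14:30 with a one-hour RPO.
failure = datetime(2024, 1, 10, 14, 30)
rpo = timedelta(hours=1)

last_vm_backup = datetime(2024, 1, 10, 2, 0)   # assumed nightly VM backup
last_db_backup = datetime(2024, 1, 10, 14, 0)  # assumed half-hourly database backup

print(meets_rpo(last_vm_backup, failure, rpo))  # False: 12.5 hours of data at risk
print(meets_rpo(last_db_backup, failure, rpo))  # True: 30 minutes of data at risk
```

In this made-up scenario the VM backup alone misses the objective, while the more frequent database backup meets it, which is one reason to run both.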

Conclusion

So, the next time someone on your team says they are going to take a snapshot where a backup is required, take the time to clarify which type of recovery tool is meant and which should be used. Many have learned the hard lesson of relying on VM snapshots as a quick means to back up data, stretching them beyond their design or intended use case, only to end up with a corrupted system and data. Don’t use a snap to guard against the crackle or the pop.