It’s 3am. Your pager is beeping. Your Kubernetes cluster is down. Don’t panic: we’ve got you covered. In this talk, we’ll describe a variety of disaster scenarios you may encounter and arm you with the knowledge you need to overcome them. Whether you’re a systems administrator, application developer, or end user, you’ll walk away from this talk with a thorough understanding of Kubernetes disaster recovery, including:
- A disaster recovery overview
  - Strategies for Kubernetes
  - Comparisons to federation and high availability
  - Which components to back up vs. recreate from scratch
- How to minimize your time to recovery
  - Automate cluster creation and infrastructure configuration
  - Back up and quickly restore your cluster applications, workloads, and persistent volumes using tools such as Heptio Ark
- How to handle specific disaster scenarios
  - Losing nodes
  - Recovering from bad configuration updates
  - Cloud provider outages
Andy Goldstein is an engineer at VMware. Current and past projects and contributions include Cluster API, Velero, OpenShift, and Kubernetes. Andy lives in Rockville, MD, with his wife, two children, and two noisy cats.
Steve Kriss is a systems engineer at Heptio, building tools and products that help Kubernetes users succeed. He has previously contributed to upstream Kubernetes and served on the Kubernetes release team. Steve recently relocated to Seattle from New...