Friday, December 8 • 4:25pm - 5:00pm
What Happens When Something Goes Wrong? On Kubernetes Reliability [I] - Marek Grabowski & Tina Zhang, Google

One of the best features of Kubernetes is that it can automatically recover from various failures and keep your application working despite unfavorable circumstances. There are moments when this works like magic and operators won't even notice that anything went wrong. Sadly, sometimes automation fails.

In this talk we're going to describe the policies and mechanisms implemented in the system to keep user applications, and the cluster in general, running. We'll talk both about things that happen automatically and about those that users need to configure.

Marek Grabowski

Site Reliability Engineer, Google
Marek is a Software Engineer turned Site Reliability Engineer in late 2017. Currently he focuses on the reliability of Kubernetes clusters. Since 2013 he has been working on Google’s Technical Infrastructure, where in early 2015 he joined the Kubernetes engineering team. In Kubernetes his main...

Tina Zhang

Site Reliability Engineer, Google
Tina joined Google as a Site Reliability Engineer for GKE in March 2017 and has primarily been working on delivering High Availability Masters in GKE, bringing GKE to more cloud regions, and improving monitoring and alerting for the system. Prior to this, she had a previous life...

Friday December 8, 2017 4:25pm - 5:00pm CST
Ballroom A, Level 1