Operations - KubeCon
Thursday, December 7


GitOps - Operations by Pull Request [B] - Alexis Richardson, Weaveworks & William Denniss, Google
GitOps is the latest exciting evolution in empowering developers to do operations and CI/CD. Imagine describing your entire infra in Git declaratively and then continually using that to verify your state. Well, with Kubernetes and tools like Terraform and Ansible, you can. We've taken this forward by adding continuous diffs and alerting - and even some of our observability stack itself. An introduction is here: https://www.weave.works/blog/gitops-operations-by-pull-request

William (Google PM) and Alexis (Weaveworks, CNCF) will talk about how we jointly developed this pattern based around our own use cases. We will refer to other companies using the approach, such as GitHub and Atlassian. This is NOT a product pitch - we are going to teach you the PATTERNS.

William Denniss

Product Manager, Google
William is a Product Manager at Google on Google Kubernetes Engine. He chairs the Kubernetes Conformance working group, and has a passion for interoperability and developer experience. Previously he worked in the OAuth community, authoring RFC 8252 and creating AppAuth, the leading...

Alexis Richardson

Founder & CEO, Weaveworks
Alexis is the CEO of Weaveworks and the chairman of the TOC for CNCF. Previously he was at Pivotal, as head of products for Spring, RabbitMQ, Redis, Apache Tomcat and vFabric. Alexis was responsible for resetting the product direction of Spring and transitioning the vFabric business...

Thursday December 7, 2017 11:10am - 11:45am
Ballroom C, Level 1


Kubernetes on AWS: Practices & Opinions [I] - Arun Gupta, Amazon Web Services & Raffaele di Fazio, Zalando
A lot of progress has been made on how to bootstrap a cluster since Kubernetes' first commit. It now takes only minutes to go from zero to a running cluster on Amazon Web Services. Many fundamental topics must still be addressed to take a simple setup to something that can run in production in a large enterprise, and it is easy to get confused by the number of options and customizations.
In this talk we will show both common practices for running Kubernetes on AWS and an opinionated view of those. Specifically, we will cover options and recommendations on how to install and manage clusters, configure high availability, perform rolling upgrades and handle disaster recovery, as well as continuous integration and deployment of applications, logging, and security.
At the same time, we will explain how those topics are addressed at Zalando, Europe's leading fashion platform, based upon their experience of operating tens of Kubernetes clusters in production on AWS.

Raffaele Di Fazio

Software Engineer, Zalando SE
Raffaele has worked with Zalando's Platform Engineering team in Berlin since 2015. There he is working on container technologies, currently focusing on Kubernetes and cluster orchestration. Over the years, Raffaele developed a genuine passion for simplicity and the Golang language...

Arun Gupta

Principal Technologist, AWS
Arun Gupta is a Principal Technologist at Amazon Web Services. He is responsible for the Cloud Native Computing Foundation (CNCF) strategy within AWS, and actively participates in CNCF Board and technical meetings. He works with different teams at Amazon to help define their open...

Thursday December 7, 2017 11:55am - 12:30pm
Ballroom C, Level 1


Kubernetes Distributions and 'Kernels' - Tim Hockin & Michael Rubin, Google
Kubernetes has historically released a full-fledged distribution - everything you need. As the project gets more modular, that will become more complicated. This talk will explore the problems we face with this, and some ways we can solve them, considering other analogous OSS ecosystems.

Tim Hockin

Principal Software Engineer, Google
Tim is a Principal Software Engineer at Google, where he works on Kubernetes, Google Kubernetes Engine (GKE), and Anthos. He has been working on Kubernetes since before it was announced, and mostly pays attention to topics like APIs, networking, storage, nodes, multi-cluster...

Michael Rubin

Senior Staff Engineer & TLM, Google
Twenty years in the Systems Software Industry, from developing enterprise file servers and systems. For the past ten years he has worked at Google, where he founded the Linux Storage group for its data centers and worked on worldwide WAN and BGP technologies. Today he is co-leading and...

Thursday December 7, 2017 2:00pm - 2:35pm
Ballroom A, Level 1


Load Testing Kubernetes: How to Optimize Your Cluster Resource Allocation in Production - Harrison Harnisch, Buffer
So you've carefully crafted your first Kubernetes service, and you're ready to deploy it to production. Well, not quite: there are still some important unknowns to understand before your service will be ready for production traffic. It's still unclear how the new service behaves when it's being pushed, and it's possible that Kubernetes will kill the service before serving a single request. At Buffer, we've developed a technique to optimize Kubernetes deployment limits by using load testing to identify optimal values for resource limits. When the service is under heavy load there are a few key metrics to watch to identify bottlenecks. These key metrics can be used to adjust resource limits. This real world approach allowed us to safely and efficiently switch over more than half our production traffic to our Kubernetes cluster and can be applied to any application.

This talk will include a live demo of how to tune etcd using the methods we use at Buffer.

Harrison Harnisch

Staff Software Engineer, ZEIT
Harrison is a Staff Software Engineer at Buffer, implementing the transition to microservices with Kubernetes and Docker. He's given talks at KubeCon EU and KubeCon US about setting resource limits.

Thursday December 7, 2017 2:45pm - 3:20pm
Ballroom C, Level 1


Securing Cluster Networking with Network Policies - Ahmet Balkan, Google
In a secure microservices cluster, only the pods that need to communicate with each other should be able to establish network connections; all other connections should be blocked. But how? Until recently, Kubernetes users could not enforce policies for container networking.

First introduced in Kubernetes 1.3, Network Policies are now a stable feature in Kubernetes 1.7. In this talk, we will discuss use cases for network policies, the Network Policy API, how to configure network policies, and how the configured policies are enforced. We will also present some network policies that address some common use cases and are relevant to securing your Kubernetes clusters.

Also, we will discuss the roadmap for the Network Policies feature, other methods you can use to secure applications at network and application layers, and how Network Policies relate to service mesh projects such as Istio that offer similar functionality.
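
As a rough illustration of the kind of policy this abstract describes, here is a minimal Python sketch that builds a NetworkPolicy manifest admitting traffic to one set of pods only from another. The helper name `allow_ingress_from`, the policy name, and the `app: db` / `app: api` labels are assumptions made for this example and are not taken from the talk. The manifest uses the stable `networking.k8s.io/v1` API group.

```python
import json


def allow_ingress_from(name, target_labels, source_labels, namespace="default"):
    """Build a NetworkPolicy manifest (hypothetical helper).

    Pods matching target_labels accept ingress only from pods matching
    source_labels in the same namespace; other ingress to them is dropped
    once a network plugin that enforces policies is in place.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The pods this policy applies to.
            "podSelector": {"matchLabels": dict(target_labels)},
            # The only peers allowed to connect to those pods.
            "ingress": [
                {"from": [{"podSelector": {"matchLabels": dict(source_labels)}}]}
            ],
        },
    }


# Illustrative labels: database pods reachable only from API pods.
policy = allow_ingress_from("db-allow-api", {"app": "db"}, {"app": "api"})
print(json.dumps(policy, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML manifests.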

Ahmet Alp Balkan

Developer Advocate, Google
Ahmet Alp Balkan is a Software Engineer at Google, working on developer experiences for open source technologies like Kubernetes and Knative. He is the maintainer of developer tooling like kubectx.dev and krew.dev, which is a Kubernetes SIG CLI sub-project. At Google, he works on...

Thursday December 7, 2017 3:50pm - 4:25pm
Ballroom C, Level 1


kubeadm Cluster Creation Internals: From Self-Hosting to Upgradability and HA [A] - Lucas Käldström, Student
kubeadm is the Kubernetes tool that helps you set up a Kubernetes cluster quickly and easily. kubeadm is different from other Kubernetes setup tools in that it doesn’t assume or depend on any special infrastructure. It assumes that you have one or more machines available and those machines can connect to each other via the network.

The master plan is to make kubeadm work both as the “fast path” to getting a best-practice Kubernetes cluster with a couple of easy-to-remember commands and as a toolbox for higher-level solutions like GKE, kops and Tectonic.

But how does kubeadm actually set up a cluster? How is it so easy to add a node with the Bootstrap Token? How does it self-host the control plane? How does it upgrade clusters smoothly with only one command? What is the plan for achieving HA without relying on any external infrastructure?

After this talk, you will be able to describe how:
  • kubeadm runs the different tasks in different stages
  • the network traffic between the cluster components flows
  • self-hosting of the control plane works
  • the Bootstrap Token works
  • the `kubeadm upgrade` command works
  • kubeadm will support multiple masters that are dynamically rotated
  • you can extend kubeadm to build your higher-level Kubernetes deployment tool

Lucas Käldström

Student, Contracting
Lucas is a cloud native enthusiast who just graduated from high school. Lucas is serving the Kubernetes community in various lead positions, e.g. as a co-lead for SIG Cluster Lifecycle shepherding kubeadm from inception to GA, porting Kubernetes to multiple platforms and by being...

Thursday December 7, 2017 4:35pm - 5:10pm
Ballroom C, Level 1

Friday, December 8


Highly Available Services During Maintenance Events - Maisem Ali & Eric Tune, Google
Maintenance events occur and require taking down nodes for various reasons. Eric and Maisem talk about the best practices and lessons learned while trying to minimize downtime during routine maintenance events.

They show how to use StatefulSets and PodDisruptionBudgets to achieve highly available services. They go on to explain the best practices for performing node maintenance, using scenarios like failed pod evictions, non-responsive kubelets and network bisections.
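
To make the PodDisruptionBudget idea concrete, the following minimal Python sketch models the arithmetic behind a budget with an integer minimum-available count: a voluntary eviction is permitted only while the number of healthy pods stays at or above that minimum. The function names and the three-replica scenario are hypothetical, and this is a simplified model rather than the actual Kubernetes controller logic.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Pods that may be voluntarily evicted right now (simplified model)."""
    return max(0, healthy_pods - min_available)


def can_evict(healthy_pods: int, min_available: int) -> bool:
    """True if evicting one more healthy pod keeps the budget satisfied."""
    return allowed_disruptions(healthy_pods, min_available) > 0


# Hypothetical three-replica service that must keep at least two pods up.
print(allowed_disruptions(healthy_pods=3, min_available=2))  # 1
# After one node is drained, a second eviction has to wait.
print(can_evict(healthy_pods=2, min_available=2))  # False
```

In this model a node drain that would take the service below its minimum is refused until a replacement pod becomes healthy, which is the behavior that lets maintenance proceed one node at a time.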

Maisem Ali

Software Engineer, VMware

Eric Tune

Senior Staff Software Engineer, Google
Eric is a Senior Staff Software Engineer at Google, where he is an overall technical lead on Google Container Engine (GKE). He started contributing to Kubernetes in 2014. Before Kubernetes, he worked on Google's Borg project, and was a co-author of the Borg paper.

Friday December 8, 2017 11:10am - 11:45am
Ballroom C, Level 1


UDP in K8S: Signed, Sealed, but Delivered? [I] - Amanpreet Singh, Crowdfire
This talk is based on my personal experience working with Kubernetes in production. I will talk about the UDP failures we encountered in production, how we found the root cause, and how we mitigated and fixed the bug in kube-proxy. This will help members of the audience who are either planning to use Kubernetes or already using it to better understand the Kubernetes networking design and debug any issues they face.

Amanpreet Singh

Site Reliability Engineer, Indeed
Amanpreet is an engineer at Indeed & moonlights as a crowd entertainer. He’s an Open Source enthusiast who loves Go & can eat-drink-sleep Kubernetes. He gained extensive knowledge of Kubernetes and other cloud-native technology while handling the migration and continuous improvement...

Friday December 8, 2017 11:55am - 12:30pm
Ballroom C, Level 1


Disaster Recovery for your Kubernetes Clusters [I] - Andy Goldstein & Steve Kriss, Heptio
It’s 3am. Your pager is beeping. Your Kubernetes cluster is down. Don’t panic - we’ve got you covered. In this talk, we’ll describe a variety of disaster scenarios you may encounter. We’ll arm you with the knowledge you need to overcome them. Whether you’re a systems administrator, application developer, or end user, after this talk you’ll walk away with a thorough understanding of Kubernetes disaster recovery, including:

A disaster recovery overview
- Strategies for Kubernetes
- Comparisons to federation and high availability
- Which components to back up vs recreating from scratch

How to minimize your time to recovery
- Automate cluster creation and infrastructure configuration
- Back up and quickly restore your cluster applications, workloads, and persistent volumes using tools such as Heptio Ark

How to handle specific disaster scenarios
- Losing nodes
- Recovering from bad configuration updates
- Cloud provider outages

Andy Goldstein

Staff Systems Engineer, VMware
Andy Goldstein is an engineer at Heptio where he works on tooling to make operating Kubernetes clusters easier, such as Ark, a disaster recovery tool for backing up and restoring Kubernetes workloads and persistent data. He is also a contributor to Kubernetes. Prior to his current...

Steve Kriss

Steve Kriss is a systems engineer at Heptio working on building tools and products to help Kubernetes users be successful, and has been a contributor to upstream Kubernetes as well as a member of the Kubernetes release team in the past. Steve recently relocated to Seattle from New...

Friday December 8, 2017 2:00pm - 2:35pm
Ballroom A, Level 1


Running Mesos Frameworks on Kubernetes with the Open-Source Universal Resource Broker - Fritz Ferstl, UNIVA
While Kubernetes continues to gain in popularity for cloud applications, many organizations run popular frameworks deployed on Mesos. The need to support multiple orchestration frameworks can result in added cost and complexity as organizations struggle to manage separate, siloed environments. Based on earlier work done for HPC users, Univa has contributed their Universal Resource Broker (URB) Technology to the Kubernetes community as an open-source project. The freely available software allows any Mesos-compatible framework (including Spark, Hadoop, Storm, Jenkins, Marathon and Chronos) to run alongside native Kubernetes services on a shared Kubernetes cluster, providing the opportunity to simplify environments and consolidate infrastructure.

In his talk Mr. Ferstl will discuss the challenge of running mixed workloads on Kubernetes, provide an architectural overview of the URB and provide a demonstration of the technology. He will also explain how Mesos users or application developers can get started quickly with the technology, and consider it for use in their own environments and applications.


Fritz Ferstl

Chief Technology Officer, UNIVA
Fritz is the Chief Technology Officer at Univa where he helps set technical direction for the company while also spearheading strategic alliances in EMEA. Fritz is widely regarded as the father of Grid Engine software and its forerunners Codine and GRD. He ran the Grid Engine software...

Friday December 8, 2017 2:45pm - 3:20pm
Ballroom C, Level 1


Kubernetes Ingress Controller with Apache Traffic Server [I] - Mrunmayi Dhume, Oath (Yahoo) & Suresh Visvanathan, Yahoo!
Today, the Oath Media Brands and Products container platform is serving critical application workloads like Yahoo Sports and Yahoo Finance at a large scale using Kubernetes as the orchestration framework.

For a platform at this scale, it is critical to have a powerful and flexible ingress routing layer (controller) that is able to handle the dynamic behavior of container based applications, such as auto-scaling, frequently changing pod IP addresses, self-serve onboarding and cluster-aware routing. This L7 routing layer must be quick to react to changes on the cluster without affecting its routing capabilities and impacting the in-flight requests. In a multi-tenant system it is even more vital that a single application deployment does not cause an impact to user traffic or hinder the release velocity of other tenants.

We developed an ingress controller based on Apache Traffic Server that satisfies the requirements stated above, while remaining scalable and easy to integrate with both Kubernetes and the Oath ecosystem. In this talk, we will elaborate on the architecture of the ingress controller, the performance metrics we’ve achieved, and the key learnings from supporting such a critical infrastructure component.


Mrunmayi Dhume

Senior Software Engineer, Verizon Media (Yahoo Inc)
Mrunmayi Dhume is a Senior Software Engineer in the Core Infrastructure team at Oath Media Brands and Products. She was involved early on in the introduction of Kubernetes in the organization and took a leadership role in designing and implementing the ingress routing layer components...

Suresh Visvanathan

Sr Architect, Oath(Yahoo)
Suresh Visvanathan, Sr Architect, has over 13 years of experience in IT and Software. Suresh’s current responsibilities include the architecture, vision, strategy and design of cloud platform as-a-service (PaaS). Suresh has been architecting solutions and building products around...

Friday December 8, 2017 3:40pm - 4:15pm
Ballroom C, Level 1


What Happens When Something Goes Wrong? On Kubernetes Reliability [I] - Marek Grabowski & Tina Zhang, Google
One of the best features of Kubernetes is that it can automatically recover from various failures and keep your application working despite unfavorable circumstances. There are moments when this works like magic and operators won't even notice something was going on. Sadly, sometimes automation fails.

In this talk we're going to describe various policies and mechanisms that are implemented in the system to keep user applications, and the cluster in general, running. We'll talk both about things that will happen automatically and those that users need to configure.

Marek Grabowski

Site Reliability Engineer, Google
Marek is a Software Engineer turned Site Reliability Engineer in late 2017. Currently he focuses on reliability of Kubernetes clusters. Since 2013 he has been working on Google’s Technical Infrastructure, where in early 2015 he joined the Kubernetes engineering team. In Kubernetes his main...

Tina Zhang

Site Reliability Engineer, Google
Tina joined Google as a Site Reliability Engineer for GKE in March 2017 and has primarily been working on delivering High Availability Masters in GKE, bringing GKE to more cloud regions and improving monitoring and alerting for the system. Prior to this, she had a previous life...

Friday December 8, 2017 4:25pm - 5:00pm
Ballroom A, Level 1