Observability - CloudNativeCon
Wednesday, December 6


Unified Monitoring of Containers and Microservices [I] - Nishant Sahay, Wipro Limited
Microservices have become critical to enterprise strategies for simplifying the IT landscape. For successful microservice adoption, container management, DevOps, and monitoring all play an important role. Managing microservices in large-scale deployments is fraught with many unique challenges for enterprise IT.

The following are some of the key aspects of microservice monitoring that enable enterprises to manage their container platforms better:

1. Collecting logs and metrics from containers
2. Monitoring applications running inside the containers
3. Tracing distributed requests and the time taken by each service call
4. Storing and analyzing the collected metrics and logs
5. Performing RCA and anomaly detection on the collected logs and metrics

This session explains how to combine the power of Zipkin with the intelligence of the Spark ecosystem and the flexibility of ELK + Beats to create a unified monitoring solution. Key features of this solution are the use of distributed tracing and infrastructure metrics to manage containers. All of this is done through visualization, correlation, and predictive monitoring.

Nishant Sahay

Senior Architect, Wipro Limited
Nishant Sahay is a senior architect in the Open Source COE lab at Wipro, where he is responsible for research and solution development in the area of machine learning and deep learning. Nishant has extensive experience in data analysis, design, and visualization. He has written articles...

Wednesday December 6, 2017 11:10am - 11:45am
Ballroom C, Level 1


Full Stack Visibility with Elastic: Logs, Metrics and Traces - Carlos Pérez-Aradros, Elastic
"With microservices every outage is like a murder mystery" is a common complaint. But it doesn't have to be! This talk gives an overview of how to monitor distributed applications. We dive into:

System metrics: Keep track of network traffic and system load.
Application logs: Collect structured logs in a central location.
Audit info: Watch for user and process activity in the system.
Uptime monitoring: Ping services and actively monitor their availability and response time.
Application metrics: Get metrics and health information from the application via REST or JMX.
Request tracing: Gather timing data by using tools like Zipkin to retrieve and show call traces.

Carlos Pérez-Aradros

Software Engineer, Elastic
Carlos is a Software Engineer at Elastic and a core developer of the Beats project. With a love for distributed systems, he has experience in many container technologies and focuses on bringing the right tools to monitor them. When he is not coding you may find him playing...

Wednesday December 6, 2017 11:55am - 12:30pm
Ballroom C, Level 1


Would You Like Some Tracing With Your Monitoring? - Yuri Shkuro, Uber Technologies
Understanding how your microservices-based application is executing in a highly distributed and elastic cloud environment can be complicated. Distributed tracing has emerged as an invaluable technique that succeeds where traditional monitoring tools falter. Yet deploying it can be quite challenging, especially in the large-scale, polyglot environments of modern companies that mix together many different technologies. In this talk we share what we have learned while building and rolling out Jaeger, our open source, OpenTracing-native distributed tracing system, to hundreds of microservices at Uber. We showcase new and exciting features that make it even more valuable to engineers.

Yuri Shkuro

Staff Engineer, Uber Technologies
Yuri is a Staff Engineer at Uber Technologies, working on distributed tracing, reliability, monitoring, and performance. He is a member of the CNCF OpenTracing Specification Council, and the founder of Jaeger, Uber's open source distributed tracing system.

Wednesday December 6, 2017 2:00pm - 2:35pm
Ballroom C, Level 1


The RED Method: How To Instrument Your Services [B] - Tom Wilkie, Kausal
The RED Method defines three key metrics you should measure for every microservice in your architecture; inspired by the USE Method from Brendan Gregg, it gives developers a template for instrumenting their services and building dashboards in a consistent, repeatable fashion.

In this talk we will discuss patterns of application instrumentation, where and when they are applicable, and how they can be implemented with Prometheus. We’ll cover Google’s Four Golden Signals, the RED Method, the USE Method, and Dye Testing. We’ll also discuss why consistency is an important approach for reducing cognitive load. Finally we’ll talk about the limitations of these approaches and what can be done to overcome them.
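
As a rough illustration of the three RED metrics (Rate, Errors, Duration), here is a minimal sketch using only the Python standard library. The talk itself covers implementing these patterns with Prometheus; the `RedMetrics` class, its method names, and the `checkout` service name below are hypothetical and are not taken from the talk or from any Prometheus client library.

```python
import time
from collections import defaultdict


class RedMetrics:
    """Hypothetical helper that records Rate, Errors, and Duration per service."""

    def __init__(self):
        self.requests = defaultdict(int)    # request count, used to derive Rate
        self.errors = defaultdict(int)      # failed request count (Errors)
        self.durations = defaultdict(list)  # seconds spent per request (Duration)

    def observe(self, service, func, *args, **kwargs):
        """Call func and record one request, its outcome, and its duration."""
        start = time.perf_counter()
        self.requests[service] += 1
        try:
            return func(*args, **kwargs)
        except Exception:
            self.errors[service] += 1
            raise
        finally:
            self.durations[service].append(time.perf_counter() - start)

    def summary(self, service, window_seconds):
        """Summarize the three metrics over an assumed observation window."""
        count = self.requests[service]
        durations = self.durations[service]
        return {
            "rate_per_second": count / window_seconds,
            "error_ratio": self.errors[service] / count if count else 0.0,
            "mean_duration_seconds": (
                sum(durations) / len(durations) if durations else 0.0
            ),
        }


if __name__ == "__main__":
    metrics = RedMetrics()
    metrics.observe("checkout", lambda: "ok")
    print(metrics.summary("checkout", window_seconds=1.0))
```

Wrapping every handler in the same `observe` call is one way to get the consistency the abstract describes: each service exposes the same three numbers, so dashboards can be built from a single template.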

Tom Wilkie

VP Product, Grafana Labs
Tom is VP Product at Grafana Labs, but really he is a software engineer. Tom is a maintainer on the Prometheus project and a maintainer and the original author of Cortex, both CNCF projects. Previously Tom founded Kausal, a company working on Prometheus, and worked at companies such...

Wednesday December 6, 2017 2:45pm - 3:20pm
Ballroom C, Level 1


Fluentd and Distributed Logging [I] - Masahiro Nakagawa, Treasure Data
In the container era, logging is very important because applications are distributed. This session discusses why Fluentd is needed and how Fluentd solves the distributed logging problem in flexible and robust ways.


Masahiro Nakagawa

Principal Engineer, Arm Treasure Data
Fluentd maintainer

Wednesday December 6, 2017 3:40pm - 4:15pm
Ballroom C, Level 1


gRPC WG Update: Easy Instrumentation with OpenCensus - hosted by April Kyle Nassi, Jaana Burcu Dogan & Morgan McLean, Google

Join the gRPC contributors for a session looking at OpenCensus and gRPC integrations!

Getting traces and application-level metrics out of an application often involves headaches and a nontrivial amount of manual work, which is a challenge for developers and vendors alike. This is especially true when you have a microservice architecture. OpenCensus offers a simple and automatic way for developers to extract correlated traces and metrics from their application so that they can be processed by the backend of their choice.

Jaana Dogan

Engineer, Google
Jaana works on Google Compute Engine and is a familiar figure in the software development community via her previous work on Go and OpenCensus, and from her blog and Twitter presence (@rakyll).

Morgan McLean

Product Manager, Google
Morgan is a co-founder of OpenCensus and OpenTelemetry, and has spent much of his career as an engineer and product manager working on distributed systems and developer tools. Morgan is responsible for Google's distributed tracing, profiling, and debugging tools, including Stackdriver...

April Nassi

Program Manager, Google
April Kyle Nassi is an Istio and gRPC community manager at Google focused on open source strategy. Previously, she created the Salesforce Developer community program and put on many a Dreamforce DevZone. She’s a CNCF Ambassador, crazy dog lady, and native Texan. You can find her...

Wednesday December 6, 2017 4:25pm - 5:00pm
Meeting Room 4C, Level 3


“If you Don’t Monitor your Infrastructure, you Don’t Own it!” Regain Control Thanks to Prometheus [I] - Etienne Coutaud & Guillaume Lefevre, OCTO Technology
At the French FedEx company, we used Prometheus to monitor the infrastructure, which hosts a CQRS architecture composed of Kafka, Spark, Cassandra, Elasticsearch, and microservice APIs written in Scala.

This presentation is about using Prometheus in production. You will see why we chose Prometheus, how we integrated and configured it, and what kind of insights we extracted from the whole infrastructure.

In addition, you will see how Prometheus changed our way of working, how we implemented self-healing based on Prometheus, how we configured systemd to trigger the Alertmanager API, how we integrated with Slack, and other cool things.

Some demonstrations will be given alongside the presentation.

Etienne Coutaud

DevOps Engineer, OCTO Technology
Etienne Coutaud is a French DevOps Engineer who has worked at OCTO Technology in Paris for 2 years. Etienne worked on the implementation of OpenShift in production for the health insurance agency. Currently working for the French FedEx, he has participated in the cloud infrastructure automation...

Guillaume Lefevre

Guillaume Lefevre is a French DevOps Engineer who has been at OCTO Technology for a year. He worked in the networking field for various companies before moving to DevOps. Currently working for the French FedEx, he has participated in the cloud infrastructure automation, continuous integration and...

Wednesday December 6, 2017 4:25pm - 5:00pm
Ballroom C, Level 1