Taming troubleshooting at the cloud-native ‘connectivity layer’

Diagnosing the health of connections between modern API-driven applications is a beast. Isovalent and Grafana Labs are working to give platform teams simpler options.

KubeCon — underway this week in Detroit — is always a bellwether of where the pain points still exist around Kubernetes adoption, as platform teams evolve from the so-called “Day 1” challenges to the “Day 2” requirements needed to make K8s infrastructure easier to scale and operate.

A clear focus this year at KubeCon is how platform teams troubleshoot what’s increasingly being referred to as the cloud-native “connectivity layer.” Integration between open source Grafana and Cilium brings heightened observability to this layer.

Working in the dark

“The shift toward building modern applications as a collection of API-driven services has many benefits, but let’s be honest, simplified monitoring and troubleshooting is not one of them,” said Dan Wendlandt, CEO at Isovalent. “In a world where a single click by a user may result in dozens, or even hundreds, of API calls under the hood, any fault, over-capacity or latency in the underlying connectivity can and often will negatively impact application behavior in ways that can be devilishly difficult to detect and root cause.”

And those devilish details are many. For one, the container replicas that Kubernetes creates of each service across multi-tenant Linux clusters make it very difficult to pinpoint where these connectivity issues occur. Between the application layer and the underlying Layer 7 network, cloud-native connectivity is abstractions on top of abstractions, with endless layers to troubleshoot. And because K8s clusters often run thousands of different services as containerized workloads that are constantly being created and destroyed, there is a ton of noise and ephemerality to contend with.

It’s a completely different architecture from legacy VM environments, where direct access to low-level network counters and tools like netstat and tcpdump was once common fare for troubleshooting connectivity, and where IPs were instructive about the sources and destinations of connections.

“In the ‘olden days’ of static applications, servers run as physical nodes or VMs on dedicated VLANs and subnets, and the IP address or subnet of a workload was often a long-term meaningful way to identify a specific application,” said Wendlandt. “This meant that IP-based network logs or counters could be analyzed to make meaningful statements about the behavior of an application.… Outside the Kubernetes cluster, when application developers use external APIs from cloud providers or other third parties, the IP addresses associated with these destinations often vary from one connection attempt to another, making it hard to interpret using IP-based logs.”

All is not lost, however. Relief may be ahead for platform teams, made possible by eBPF-based Cilium.

Enhancing observability through Cilium and Grafana

Cilium — a CNCF incubating project that’s becoming a de facto container networking interface for all the major cloud service providers’ Kubernetes engines — builds on top of eBPF’s ability to inject kernel-level observability into a new connectivity layer.

“Cilium leverages eBPF to ensure that all connectivity observability data is associated not only with the IP addresses, but also with the higher-level service identity of applications on both sides of a network connection,” said Wendlandt. “Because eBPF operates at the Linux kernel layer, this added observability does not require any changes to applications themselves or the use of heavyweight and complex sidecar proxies. Instead, Cilium inserts transparently beneath existing workloads, scaling horizontally within a Kubernetes cluster as it grows.”
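To make the contrast concrete, here is a minimal Python sketch of why identity-tagged connectivity data is easier to reason about than IP-keyed data. It is an illustration only: the `Flow` record, its field names, the service names and the sample addresses are all hypothetical, and it does not reflect the actual schema that Cilium emits.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record carrying both IPs and service identities."""
    src_ip: str
    dst_ip: str
    src_service: str
    dst_service: str
    verdict: str  # "FORWARDED" or "DROPPED" (illustrative values)


# Made-up sample data: the "payments" pod is rescheduled onto a new IP,
# and a second "checkout" replica connects from a different address.
FLOWS = [
    Flow("10.0.1.12", "10.0.2.7", "checkout", "payments", "FORWARDED"),
    Flow("10.0.1.12", "10.0.2.7", "checkout", "payments", "DROPPED"),
    Flow("10.0.1.12", "10.0.3.41", "checkout", "payments", "DROPPED"),
    Flow("10.0.4.9", "10.0.3.41", "checkout", "payments", "DROPPED"),
]


def drops_by_ip_pair(flows):
    """Count dropped flows keyed by (source IP, destination IP)."""
    return Counter((f.src_ip, f.dst_ip) for f in flows if f.verdict == "DROPPED")


def drops_by_service_pair(flows):
    """Count dropped flows keyed by (source service, destination service)."""
    return Counter(
        (f.src_service, f.dst_service) for f in flows if f.verdict == "DROPPED"
    )


if __name__ == "__main__":
    print("By IP pair:", dict(drops_by_ip_pair(FLOWS)))
    print("By service pair:", dict(drops_by_service_pair(FLOWS)))
```

Keyed by IP, the drops scatter across three unrelated-looking address pairs with one drop each. Keyed by service identity, the same records collapse into a single checkout-to-payments entry, which is the kind of signal a platform team can act on.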

Today at KubeCon, Grafana Labs and Isovalent, the company whose founders include the creator of Cilium and the eBPF Linux kernel maintainer, have announced a new Cilium-Grafana integration. This Cilium integration into the Grafana stack means platform teams that want a consistent observability experience for service connectivity across their Kubernetes environments can start using the same Grafana visualization tools to roll up their logging, tracing and metrics across the cloud-native connectivity layer.

This integration of the two open source technologies marks the beginning of the joint engineering initiatives launched after Grafana Labs’ strategic investment in Isovalent’s Series B funding round last month.

I previously argued that “observability” seems to have risen as the cool new term for much the same metrics, logs and traces that we’ve been analyzing long before the term was coined. But clearly this cloud-native connectivity issue is an especially confounding problem domain for platform teams to troubleshoot, and with this new eBPF-driven, kernel-level data being exposed as a consistent connectivity datasource, there appears to be a very high ceiling for new observability use cases being discussed at KubeCon this week.

Disclosure: I work for MongoDB but the views expressed herein are mine.
