In this interview with Nicolas Vibert, explore how Kubernetes transformed networking, the impact of eBPF and Cilium, and the future of network administration.
Enterprise networking is a radically different discipline in today’s microservices, containers, and Kubernetes paradigm than what it used to be in the old three-tier architecture world. Containers broke traditional networking models, and in this networking rethink for the distributed computing era, a lot has happened in a very short period of time. In this exclusive interview, Nicolas Vibert — senior technical engineer at Isovalent — traces this evolution and explains how open-source projects like eBPF and Cilium continue to drive advanced networking use cases for the modern cloud era.
Q: How Did Kubernetes Change the Networking Model? What’s Fundamentally Different About Networking in K8s/Cloud-Native Environments From Prior Enterprise Architectures?
A: In many ways, Kubernetes networking is similar to traditional networking. Regardless of the underlying computing platform, you need your network to support the needs of your business and the applications it relies upon. The requirements are the same whether you are running apps on bare-metal servers, virtual machines, or Kubernetes: you need to connect applications, make them accessible to end users, secure access to them, adhere to regulatory requirements, and so on.
So, for me, a network engineer by trade, the differences weren’t necessarily about the architecture but more about the terminology. Translating terms like pods, nodes, ingress, and kube-proxy into concepts I was familiar with in the networking space took real mental effort. However, the most significant fundamental difference has to be the ephemeral nature of the systems the network interconnects. In traditional enterprise networks, workloads kept predictable IP addresses throughout their lifespan, and we used that IP as their identity; it was the basis for our security controls. That’s how we’ve operated for the past 30 years. In Kubernetes, IP addresses are meaningless: workloads come and go, and their addresses are unpredictable, which means we need new ways to enforce network security.
Q: What Do You Think Cilium Nailed — as an eBPF Program — in Giving Network Engineers a Better Abstraction to Kubernetes Networking? What’s Possible Now That Wasn’t Possible Before?
A: I’d say the first thing Cilium was great at was enforcing network security policies with a new identity model that does not rely on irrelevant IP addresses but on more meaningful abstractions like workload names and business logic. In Kubernetes, we can assign metadata to a workload using labels. This metadata can represent characteristics of a workload like its business intent, environment (prod, test, QA, etc.), and regulatory requirements (PCI DSS, HIPAA, etc.), and Cilium enables us to enforce security based on these identifiers.
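To make that contrast concrete, here is a minimal sketch in plain Python, not Cilium’s actual policy engine or CRD syntax, of what identity-based enforcement means: the policy is written against labels, so the decision survives a pod being rescheduled with a new IP. The workload names, labels, and addresses are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    """Toy stand-in for a pod: the labels are stable, the IP address is not."""
    name: str
    ip: str
    labels: dict

@dataclass
class LabelPolicy:
    """Allow traffic only when both endpoints carry the required labels."""
    from_labels: dict
    to_labels: dict
    port: int

    def allows(self, src: Workload, dst: Workload, port: int) -> bool:
        def matches(required: dict, actual: dict) -> bool:
            return required.items() <= actual.items()
        return (matches(self.from_labels, src.labels)
                and matches(self.to_labels, dst.labels)
                and port == self.port)

# The policy is written against labels (business intent), never against IPs.
policy = LabelPolicy(from_labels={"app": "frontend", "env": "prod"},
                     to_labels={"app": "backend", "env": "prod"},
                     port=443)

frontend = Workload("frontend-7d4b9", "10.0.1.23", {"app": "frontend", "env": "prod"})
backend = Workload("backend-5c6f2", "10.0.2.41", {"app": "backend", "env": "prod"})
print(policy.allows(frontend, backend, 443))  # True

# The frontend pod is rescheduled and comes back with a brand-new IP address;
# the decision is unchanged because identity never depended on the IP.
frontend.ip = "10.0.9.87"
print(policy.allows(frontend, backend, 443))  # True
```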
The other appealing aspect of Cilium is its versatility and flexibility. eBPF has allowed us to insert network functions into Cilium while maintaining high performance. Network engineers can think of Cilium as a polymorphous networking system: some users only use it for its security capabilities, while others use it to encrypt traffic between workloads. But we see more and more operators using Cilium to its full potential, removing the need to install and manage other tools like proxies, ingress controllers, or service meshes.
Q: Data-Path Performance in the Linux Kernel Itself, and How That Has Made It Into Cilium in the New 1.16 Release, Is a Really Big Story for the Evolution of the Project. What Can You Tell Readers About Netkit, the Most Recent Big Addition to Cilium and One Directly Related to That Performance Evolution?
A: Cilium 1.16 is another massive release, with lots of new service mesh and security functionality and improvements to scale and performance. The most notable for me is the support for Netkit, a replacement for veth, the virtual Ethernet device we’ve used to connect containers for the past decade.
If I had to summarize my deep dive on Netkit in a couple of sentences: computing abstractions like virtualization and containerization introduce performance compromises, especially for networking. For example, a few years ago, you’d typically see a 35% drop in performance between host and container. The Cilium developers have steadily added features over the past few years to close that gap. The final piece of the puzzle was addressing the limitations of veth. Veth was merged into the Linux kernel in 2007. While it has worked fine, it had one particular drawback: when traffic had to leave the Kubernetes Pod for another node, the data first had to enter the network stack within the container, switch namespaces, and be processed and handled in the host namespace before being handed to the network interface for transmission. This might not seem like much, but it had a noticeable impact on throughput.
Netkit lets us run BPF programs directly inside the Pods and bring networking logic even closer to the source, meaning we can make decisions earlier in the process and avoid the detours packets previously took.
The outcome? We can finally eliminate the performance penalty that comes with containerized networking.
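For readers who want to poke at this on their own nodes, here is a rough, read-only sketch. It assumes the third-party pyroute2 library is installed and that the kernel reports the new devices with the link kind "netkit" (veth pairs report "veth"); run on a node, it simply tallies which virtual device kinds the CNI has wired up. It is an illustration, not an official Cilium diagnostic.

```python
from collections import Counter
from pyroute2 import IPRoute  # third-party: pip install pyroute2

def count_link_kinds() -> Counter:
    """List every network device on this node and tally its link 'kind'."""
    kinds = Counter()
    with IPRoute() as ipr:
        for link in ipr.get_links():
            name = link.get_attr("IFLA_IFNAME")
            linkinfo = link.get_attr("IFLA_LINKINFO")
            # Physical NICs carry no IFLA_LINKINFO; virtual devices report
            # their driver, e.g. "veth", "bridge", or (assumed here) "netkit".
            kind = linkinfo.get_attr("IFLA_INFO_KIND") if linkinfo else "physical/other"
            print(f"{name}: {kind}")
            kinds[kind] += 1
    return kinds

if __name__ == "__main__":
    print(dict(count_link_kinds()))
    # A veth-based datapath shows one "veth" device per pod on the node;
    # after switching the datapath, those would show up as "netkit" instead.
```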
Q: When Something Goes Wrong, Everyone Always Wants To Blame the Network. How Does Network Troubleshooting Look Different in Kubernetes Environments From Single-Node Scenarios, and How Does Cilium Fit Into Troubleshooting?
A: You’re right; the network always takes the blame! In Kubernetes, you have a gang of potential networking troublemakers: the Kubernetes network, the Linux kernel network stack, the virtual network (Kubernetes nodes are often virtual machines that rely on network virtualization), and the actual physical network. To troubleshoot Kubernetes networking effectively, you need tools that can see all of these contexts and prove the innocence of each potential network delinquent. Again, that’s an area where an eBPF-based tool like Tetragon is strong. It is not only aware of IP networking traffic but also of the Kubernetes and runtime contexts: it knows the name and namespace of the Kubernetes pod that emitted the traffic, right down to the actual binary and process that generated it.
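As a toy illustration of why that context matters, the sketch below renders the same dropped flow twice: once as a bare 5-tuple, which is all a packet capture gives you, and once enriched with the pod and process metadata an eBPF-based agent can attach. The event fields and lookup table are invented for the example and are not Tetragon’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    """A bare layer-3/4 flow record."""
    src_ip: str
    dst_ip: str
    dst_port: int
    verdict: str

# Hypothetical metadata an eBPF agent already holds about local endpoints.
ENDPOINTS = {
    "10.0.1.23": {"namespace": "payments", "pod": "checkout-7d4b9", "binary": "/usr/bin/checkout"},
    "10.0.2.41": {"namespace": "payments", "pod": "ledger-5c6f2", "binary": "/usr/bin/ledger"},
}

def describe(flow: Flow) -> str:
    """Render the flow with Kubernetes and process context when it is known."""
    src, dst = ENDPOINTS.get(flow.src_ip), ENDPOINTS.get(flow.dst_ip)
    if not (src and dst):
        # Without context, all you can say is that some IP talked to some IP.
        return f"{flow.src_ip} -> {flow.dst_ip}:{flow.dst_port} {flow.verdict}"
    return (f"{src['namespace']}/{src['pod']} ({src['binary']}) -> "
            f"{dst['namespace']}/{dst['pod']}:{flow.dst_port} {flow.verdict}")

flow = Flow("10.0.1.23", "10.0.2.41", 5432, "DROPPED")
print(describe(flow))
# payments/checkout-7d4b9 (/usr/bin/checkout) -> payments/ledger-5c6f2:5432 DROPPED
# The enriched view points at a policy or application problem, not "the network".
```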
Q: What Do You See on the Horizon for Network Administrators? Does This Continue To Be a Specialized Role Within Large Enterprises, or Is Networking Dissolved Into the Broader “Platform Engineering” Agenda? How Do You See the Old Network World and New Cloud-Native Networking World Coexisting in the Decade Ahead, as Still, by Most Predictions, Only About 20 Percent of Enterprises Have Adopted Kubernetes Thus Far?
A: The future is bright for network engineers. There are so many exciting areas for them to explore beyond traditional data center networking. You’ve got cloud networking (connecting workloads within and between clouds), NetDevOps for network automation, and Kubernetes networking. Then you’ve got AI: AI applied to the network to make smarter, autonomous routing decisions, and networking for AI workloads to make the network as reliable and fast as possible for GPU-based machines.
I think networking will remain a distinct profession. While some talented SREs/Platform Engineers may be able to operate across the entire stack, I suspect most DevOps engineers will be more than happy to relinquish networking responsibilities to the experts.