  • Linux Internals of Kubernetes Networking

    Introduction

    This blog is a hands-on guide designed to help you understand Kubernetes networking concepts by following along. We’ll use K3s, a lightweight Kubernetes distribution, to explore how networking works within a cluster.

    System Requirements

    Before getting started, ensure your system meets the following requirements:

    • A Linux-based system (Ubuntu, CentOS, or equivalent).
    • At least 2 CPU cores and 4 GB of RAM.
    • Basic familiarity with Linux commands.

    Installing K3s

    To follow along with this guide, first install K3s, which is designed for ease of use and optimized for resource-constrained environments.

    Install K3s

    You can install K3s by running the following command in your terminal:

    curl -sfL https://get.k3s.io | sh -

    This script will:

    1. Download and install the K3s server.
    2. Set up the necessary dependencies.
    3. Start the K3s service automatically after installation.

    Verify K3s Installation

    After installation, you can check the status of the K3s service to make sure everything is running correctly:

    systemctl status k3s

    If everything is correct, you should see that the K3s service is active and running.

    Set Up kubectl

    K3s comes bundled with its own kubectl binary. To use it, you can either:

    Use the K3s binary directly:

    k3s kubectl get pods -A

    Or point kubectl at the K3s kubeconfig file by exporting its path:

    export KUBECONFIG="/etc/rancher/k3s/k3s.yaml"
    sudo chown "$USER" "$KUBECONFIG"
    kubectl get pods -A

    Understanding Kubernetes Networking

    In Kubernetes, networking plays a crucial role in ensuring seamless communication between pods, services, and external resources. In this section, we will dive into the network configuration and explore how pods communicate with one another.

    Viewing Pods and Their IP Addresses

    To check the IP addresses assigned to the pods, use the following kubectl command:

    CODE: https://gist.github.com/velotiotech/1961a4cdd5ec38f7f0fbe0523821dc7f.sh

    This will show you a list of all the pods across all namespaces, including their corresponding IP addresses. Each pod is assigned a unique IP address within the cluster.

    The pod IP addresses are allocated from the range configured for the cluster’s network plugin (such as Flannel or Calico). K3s uses Flannel as its default CNI with a cluster-wide pod CIDR of 10.42.0.0/16, and each node is allocated a /24 subnet from that range, so pods on the first node receive addresses in 10.42.0.0/24. These IPs allow communication within the cluster.
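
    To make the address range concrete, here is a minimal shell sketch that checks whether an address falls inside a CIDR block. The helper names ip_to_int and in_cidr are illustrative and not part of K3s or Flannel; the addresses are the example values used in this blog.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split "10.42.0.9" into 10 42 0 9
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit status 0) when IP $1 lies inside CIDR block $2.
in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")      # network part, before the "/"
    bits=${2#*/}                    # prefix length, after the "/"
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
}

# A pod IP and the host IP from this walkthrough.
for addr in 10.42.0.9 192.168.2.224; do
    if in_cidr "$addr" 10.42.0.0/24; then
        echo "$addr is inside 10.42.0.0/24"
    else
        echo "$addr is outside 10.42.0.0/24"
    fi
done
```

    The pod address matches the node’s pod subnet, while the host address does not, which is why traffic between the two has to be routed rather than switched.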

    Observing Network Configuration Changes

    When K3s starts, it sets up several network interfaces and configurations on the host machine. These are key to how Kubernetes networking operates. Let’s examine the changes using the ip utility.

    Show All Network Interfaces

    Run the following command to list all network interfaces:

    ip link show

    The output lists every network interface on the host:

    • lo, enp0s3, and enp0s9 are the host’s own network interfaces.
    • flannel.1 is created by the Flannel CNI and carries traffic between pods running on different nodes.
    • cni0 is a bridge created by the bridge CNI plugin and carries traffic between pods running on the same node.
    • vethXXXXXXXX@ifY interfaces are created by the bridge CNI plugin, one per pod. Each connects a pod to the cni0 bridge.
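
    As a quick way to apply this naming scheme, the following sketch maps an interface name to its role. The function name iface_role is hypothetical, and the patterns assume the default K3s setup described here; a cluster with a different CNI will use other names.

```shell
#!/bin/sh
# Map an interface name to the role it plays on a default K3s node.
iface_role() {
    name=${1%%@*}      # drop the "@ifN" peer suffix that ip prints for veths
    case $name in
        lo)        echo "host loopback" ;;
        flannel.*) echo "Flannel device (pod traffic between nodes)" ;;
        cni0)      echo "CNI bridge (pod traffic on the same node)" ;;
        veth*)     echo "host end of a pod veth pair (attached to cni0)" ;;
        *)         echo "host interface" ;;
    esac
}

# Interface names taken from this walkthrough; yours will differ.
for ifname in lo enp0s3 flannel.1 cni0 veth82ebd960@if2; do
    printf '%-18s %s\n' "$ifname" "$(iface_role "$ifname")"
done
```

    On a live node, the names could be fed in from ip -o link show instead of the hard-coded list.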

    Show IP Addresses

    To display the IP addresses assigned to the interfaces:

    ip -c -o addr show

    You should see the IP addresses of all the network interfaces. Of the K3s-related interfaces, only cni0 and flannel.1 have IP addresses. The vethXXXXXXXX interfaces have only MAC addresses; the reason is explained in a later section of this blog.

    Pod-to-Pod Communication and Bridge Networks

    The diagram illustrates how container networking works within a Kubernetes (K3s) node, showing the key components that enable pods to communicate with each other and the outside world. Let’s break down this networking architecture:

    At the top level, we have the host interface (enp0s9) with IP 192.168.2.224, which is the node’s physical network interface connected to the external network. This is the node’s gateway to the outside world.

    Below it sits the cni0 bridge (IP: 10.42.0.1/24), which acts like a virtual switch inside the node and serves as the internal network hub for all pods running on it. enp0s9 is not attached to the bridge as a port; the host kernel routes traffic between the two.

    Each pod runs in its own network namespace with a separate network stack, including its own network interfaces and routing table. The pod’s internal interface, eth0, as shown in the diagram above, carries the pod’s IP address. eth0 is one end of a virtual ethernet (veth) pair; the other end lives in the host’s network namespace and is attached to the cni0 bridge.

    Exploring Network Namespaces in Detail

    Kubernetes uses network namespaces to isolate networking for each pod, ensuring that pods have separate networking environments and do not interfere with each other. 

    A network namespace is a Linux kernel feature that provides network isolation for a group of processes. Each namespace has its own network interfaces, IP addresses, routing tables, and firewall rules. Kubernetes uses this feature to ensure that each pod has its own isolated network environment.

    In Kubernetes:

    • Each pod has its own network namespace.
    • All containers within a pod share that pod’s network namespace.

    Inspecting Network Namespaces

    To inspect the network namespaces, follow these steps:

    If you installed K3s as described in this blog, it uses the containerd runtime by default. The commands for finding the container PID will differ if you run K3s with Docker or another container runtime.

    List the running containers:

    sudo crictl ps

    Take the container ID from the output and use it to find the process ID:

    sudo crictl inspect <container-id> | grep pid

    Check the network namespace associated with the container

    sudo ls -l /proc/<container-pid>/ns/net

    You can use nsenter to enter the network namespace for further exploration.
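
    The /proc/<container-pid>/ns/net entry is a symbolic link whose target looks like net:[4026531840]; the number identifies the namespace, so two processes share a network namespace exactly when their numbers match. The sketch below extracts and compares that number. The function names are hypothetical, and the inode value is only an example.

```shell
#!/bin/sh
# Extract the inode number from a link target such as "net:[4026531840]".
netns_inode() {
    target=${1#*\[}        # strip everything up to and including "["
    echo "${target%\]}"    # strip the trailing "]"
}

# Succeed when the processes with PIDs $1 and $2 share a network namespace.
# Reading another user's /proc entries usually requires root.
same_netns() {
    a=$(readlink "/proc/$1/ns/net") || return 2
    b=$(readlink "/proc/$2/ns/net") || return 2
    [ "$(netns_inode "$a")" = "$(netns_inode "$b")" ]
}

# Example link target; the number on your system will differ.
netns_inode "net:[4026531840]"
```

    Running same_netns against the PIDs of two containers in one pod should succeed, while comparing a container PID with PID 1 of the host should fail.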

    Executing Into Network Namespaces

    To explore the network settings of a pod’s namespace, you can use the nsenter command.

    sudo nsenter --net=/proc/<container-pid>/ns/net
    ip addr show

    Script to exec into network namespace

    You can use the following script to get the container process ID and exec into the pod network namespace directly.

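    POD_ID=$(sudo crictl pods --name <pod_name> -q)
    # Any container in the pod will do, since they all share one network namespace.
    CONTAINER_ID=$(sudo crictl ps --pod "$POD_ID" -q | head -n 1)
    sudo nsenter -t "$(sudo crictl inspect "$CONTAINER_ID" | jq -r .info.pid)" -n ip addr show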

    Veth Interfaces and Their Connection to Bridge

    Inside the pod’s network namespace, you should see the pod’s interfaces (lo and eth0) and the IP address 10.42.0.8 assigned to the pod. Look closely and you will see eth0@if13, which means eth0 is paired with interface index 13 (on your system the index may differ). eth0 inside the pod is a virtual ethernet (veth) interface, and veths are always created in interconnected pairs. Here, one end of the pair is eth0 and the other end is if13. But where does if13 exist? It lives in the host’s network namespace, where it connects the pod to the host network through the cni0 bridge.

    ip link show | grep '^13:'

    Here you see veth82ebd960@if2, which means this veth is paired with interface index 2 in the pod’s network namespace. You can verify as follows that it is attached to the cni0 bridge. Every pod’s veth is attached to the same bridge, which is what enables communication between pods on the same node.
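
    The @ifN suffix can also be read programmatically. This small sketch pulls the peer index out of a name as printed by ip link; peer_ifindex is a hypothetical helper name, and the interface names are the ones from this walkthrough.

```shell
#!/bin/sh
# Print the peer interface index N from a name such as "eth0@if13".
peer_ifindex() {
    case $1 in
        *@if*)
            idx=${1##*@if}     # keep only what follows "@if"
            echo "${idx%:}"    # tolerate the trailing ":" that ip link prints
            ;;
        *)
            return 1           # not a veth-style name
            ;;
    esac
}

peer_ifindex eth0@if13           # pod side: peer is host interface 13
peer_ifindex veth82ebd960@if2    # host side: peer is pod interface 2
```

    The kernel exposes the same information in /sys/class/net/<interface>/iflink, which holds the index of the peer for a veth device.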

    brctl show

    Demonstrating Pod-to-Pod Communication

    Deploy Two Pods

    Deploy two busybox pods to test communication:

    kubectl run pod1 --image=busybox --restart=Never -- sleep infinity
    kubectl run pod2 --image=busybox --restart=Never -- sleep infinity

    Get the IP Addresses of the Pods

    kubectl get pods pod1 pod2 -o wide

    Pod1 IP: 10.42.0.9

    Pod2 IP: 10.42.0.10

    Ping Between Pods and Observe the Traffic Between Two Pods

    Before pinging Pod2 from Pod1, we will use tcpdump to watch cni0 and the host-side veth interfaces of Pod1 and Pod2, both of which are attached to cni0.

    Open three terminals and set up the tcpdump listeners: 

    # Terminal 1 – Watch traffic on cni0 bridge 

    sudo tcpdump -i cni0 icmp

    # Terminal 2 – Watch traffic on veth1 (Pod1’s veth pair)

    sudo tcpdump -i veth3a94f27 icmp

    # Terminal 3 – Watch traffic on veth2 (Pod2’s veth pair) 

    sudo tcpdump -i veth18eb7d52 icmp

    Exec into Pod1 and ping Pod2:

    kubectl exec -it pod1 -- ping -c 4 <pod2-IP>

    Watch results on veth3a94f27 pair of Pod1:

    Watch results on cni0:

    Watch results on veth18eb7d52 pair of Pod2:

    Observing the timestamps for each request and reply on different interfaces, we get the flow of request/reply, as shown in the diagram below.

    Deeper Dive into the Journey of Network Packets from One Pod to Another

    We have already seen the flow of request/reply between two pods via veth interfaces connected to each other in a bridge network. In this section, we will discuss the internal details of how a network packet reaches from one pod to another.

    Packet Leaving Pod1’s Network

    Inside Pod1’s network namespace, the packet leaves through eth0 (Pod1’s internal interface), whose veth peer sits in the host network. The destination address of the packet is 10.42.0.10, which lies within the range 10.42.0.0 to 10.42.0.255 (10.42.0.0/24), so it matches the second route.

    The packet exits Pod1’s namespace and enters the host namespace via the connected veth pair that exists in the host network. The packet arrives at bridge cni0 since it is the master of all the veth pairs that exist in the host network.

    Once the packet reaches cni0, it gets forwarded to the correct veth pair connected to Pod2.

    Packet Forwarding from cni0 to Pod2’s Network

    When the packet reaches cni0, the bridge’s job is to forward it to Pod2. Here cni0 acts as a Layer 2 switch that simply forwards the frame to the destination veth. The bridge maintains a forwarding database and dynamically learns which veth device each MAC address is reachable through.

    You can view forwarding database information with the following command:

    bridge fdb show

    In this screenshot, I have limited the forwarding database output to just the MAC address of Pod2’s eth0.

    1. First column: MAC address of Pod2’s eth0
    2. dev vethX: The network interface this MAC address is reachable through
    3. master cni0: Indicates this entry belongs to cni0 bridge
    4. Flags that may appear:
      • permanent: Static entry, manually added or system-generated
      • self: The entry is stored in the listed device’s own forwarding table rather than in its master bridge
      • No flag: The entry was dynamically learned
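
    To show how these fields fit together, here is a sketch that summarizes a single line of bridge fdb show output. The function name describe_fdb_entry is hypothetical, and the MAC addresses in the sample lines are made up for illustration; only the field layout follows the description above.

```shell
#!/bin/sh
# Summarize one line of "bridge fdb show" output.
describe_fdb_entry() {
    set -- $1                       # split the line into words
    mac=$1 dev= master= kind=dynamic
    while [ $# -gt 0 ]; do
        case $1 in
            dev)       dev=$2 ;;            # interface the MAC is reachable through
            master)    master=$2 ;;         # bridge the entry belongs to
            permanent) kind=permanent ;;    # static entry, not learned from traffic
        esac
        shift
    done
    echo "$mac via $dev (bridge: ${master:-none}, $kind)"
}

# Sample lines with invented MAC addresses.
describe_fdb_entry "0a:58:0a:2a:00:0a dev veth18eb7d52 master cni0"
describe_fdb_entry "33:33:00:00:00:01 dev cni0 self permanent"
```

    An entry without the permanent flag is reported as dynamic, which matches how a learned entry for Pod2’s MAC address would appear.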

    Dynamic MAC Learning Process

    When Pod1 issues an ICMP request, the packet is wrapped in a Layer 2 frame whose source MAC is the MAC address of Pod1’s eth0 interface. To find the destination MAC address, Pod1 broadcasts an ARP request from eth0 that contains the destination interface’s IP address.

    This ARP request is received by all interfaces connected to the bridge, but only Pod2’s eth0 interface responds with its MAC address. The destination MAC address is then added to the frame, and the frame is sent to the cni0 bridge.

    When this frame reaches the cni0 bridge, the bridge reads the frame’s header and records the source MAC against the interface it arrived on (the host-side veth of Pod1’s eth0) in the forwarding table.

    The bridge then has to forward the frame to the interface behind which the destination lies (i.e. the host-side veth of Pod2). If the forwarding table already has an entry for the destination MAC, the bridge forwards the frame out of that veth only; otherwise it floods the frame to all the other veths connected to the bridge, so it still reaches Pod2.

    When Pod2 replies to Pod1, the reverse path is followed. The frame leaves Pod2’s eth0 and reaches cni0 via the host-side veth of Pod2. The bridge records the source MAC address (this time Pod2’s eth0) and the device it is reachable through in the forwarding database, then forwards the reply to Pod1, completing the request and response cycle.
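
    The learn-then-forward-or-flood behaviour can be modelled in a few lines of shell. This is a toy simulation, not how the kernel bridge is implemented: the names FDB, fdb_lookup, and bridge_rx are hypothetical, the MAC addresses are placeholders, and real bridges also age out entries and update them when a MAC moves to another port.

```shell
#!/bin/sh
# Toy model of a learning bridge. FDB holds one "mac port" pair per line.
FDB=""

# Print the port a MAC address was learned on; fail if it is unknown.
fdb_lookup() {
    printf '%s\n' "$FDB" |
        awk -v mac="$1" '$1 == mac { print $2; found = 1 } END { exit !found }'
}

# Handle one frame: learn the source MAC, then forward or flood.
# $1 = ingress port, $2 = source MAC, $3 = destination MAC
bridge_rx() {
    if ! fdb_lookup "$2" >/dev/null; then
        FDB=$(printf '%s\n%s %s' "$FDB" "$2" "$1")
    fi
    if out_port=$(fdb_lookup "$3"); then
        echo "forward $2 -> $3 out $out_port"
    else
        echo "flood $2 -> $3 to every port except $1"
    fi
}

# Request from Pod1: Pod2's MAC is not known yet, so the frame is flooded.
bridge_rx veth3a94f27 mac-pod1 mac-pod2
# Reply from Pod2: Pod1's MAC was learned from the request, so it is forwarded.
bridge_rx veth18eb7d52 mac-pod2 mac-pod1
# Next request from Pod1: Pod2's MAC is now known as well.
bridge_rx veth3a94f27 mac-pod1 mac-pod2
```

    Only the first frame is flooded; after one request and one reply the table knows both pods, which mirrors the dynamic entries that bridge fdb show reports.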

    Summary and Key Takeaways

    In this guide, we explored the foundational elements of Linux that play a crucial role in Kubernetes networking using K3s. Here are the key takeaways:

    • Network Namespaces ensure pod isolation.
    • Veth Interfaces connect pods to the host network and enable inter-pod communication.
    • Bridge Networks facilitate pod-to-pod communication on the same node.

    I hope you gained a deeper understanding of how Linux internals are used in Kubernetes network design and how they play a key role in pod-to-pod communication within the same node.