Tag: kubernetes

  • Linux Internals of Kubernetes Networking

    Introduction

    This blog is a hands-on guide designed to help you understand Kubernetes networking concepts by following along. We’ll use K3s, a lightweight Kubernetes distribution, to explore how networking works within a cluster.

    System Requirements

    Before getting started, ensure your system meets the following requirements:

    • A Linux-based system (Ubuntu, CentOS, or equivalent).
    • At least 2 CPU cores and 4 GB of RAM.
    • Basic familiarity with Linux commands.

    Installing K3s

    To follow along with this guide, we first need to install K3s—a lightweight Kubernetes distribution designed for ease of use and optimized for resource-constrained environments.

    Install K3s

    You can install K3s by running the following command in your terminal:

    curl -sfL https://get.k3s.io | sh -

    This script will:

    1. Download and install the K3s server.
    2. Set up the necessary dependencies.
    3. Start the K3s service automatically after installation.

    Verify K3s Installation

    After installation, you can check the status of the K3s service to make sure everything is running correctly:

    systemctl status k3s

    If everything is correct, you should see that the K3s service is active and running.

    Set Up kubectl

    K3s comes bundled with its own kubectl binary. To use it, you can either:

    Use the K3s binary directly:

    k3s kubectl get pods -A

    Or set up the kubectl config file by exporting the Kubeconfig path:

    export KUBECONFIG="/etc/rancher/k3s/k3s.yaml"
    sudo chown $USER $KUBECONFIG
    kubectl get pods -A

    Understanding Kubernetes Networking

    In Kubernetes, networking plays a crucial role in ensuring seamless communication between pods, services, and external resources. In this section, we will dive into the network configuration and explore how pods communicate with one another.

    Viewing Pods and Their IP Addresses

    To check the IP addresses assigned to the pods, use the following kubectl command:

    kubectl get pods -A -o wide

    This will show you a list of all the pods across all namespaces, including their corresponding IP addresses. Each pod is assigned a unique IP address within the cluster.

    You’ll notice that the IP addresses are assigned by Kubernetes and typically belong to the range specified by the network plugin (such as Flannel or Calico). K3s uses the Flannel CNI by default, with a default cluster CIDR of 10.42.0.0/16, from which each node is allocated a /24 pod subnet (the first node gets 10.42.0.0/24). These IPs allow communication within the cluster.
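    As a quick sketch of how that cluster CIDR is carved up, Python’s ipaddress module can split 10.42.0.0/16 into the per-node /24 pod subnets (the bridge-IP convention shown is how Flannel typically assigns the first address of each node subnet):

    ```python
    import ipaddress

    # K3s' default cluster CIDR; each node receives one /24 slice of it.
    cluster_cidr = ipaddress.ip_network("10.42.0.0/16")
    node_subnets = list(cluster_cidr.subnets(new_prefix=24))

    print(node_subnets[0])                       # first node's pod subnet: 10.42.0.0/24
    print(node_subnets[0].network_address + 1)   # conventionally the cni0 bridge IP: 10.42.0.1
    print(ipaddress.ip_address("10.42.0.8") in node_subnets[0])  # a pod IP from this node
    ```

    This also explains why a pod IP like 10.42.0.8 is routable within the whole cluster while still being "local" to one node’s bridge.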

    Observing Network Configuration Changes

    Upon starting, K3s sets up several network interfaces and configurations on the host machine. These configurations are key to how Kubernetes networking operates. Let’s examine the changes using the ip utility.

    Show All Network Interfaces

    Run the following command to list all network interfaces:

    ip link show

    This will show all the network interfaces.

    • lo, enp0s3, and enp0s9 are network interfaces that belong to the host.
    • flannel.1 is created by the Flannel CNI for communication between pods on different nodes.
    • cni0 is a bridge created by the bridge CNI plugin for communication between pods on the same node.
    • vethXXXXXXXX@ifY interfaces are created by the bridge CNI plugin; each one connects a pod to the cni0 bridge.
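    The classification above can be summarized in a small helper. This is purely illustrative; the sample interface names are placeholders and yours will differ:

    ```python
    # Classify K3s-related interface names as described in the list above.
    def classify(ifname: str) -> str:
        name = ifname.split("@")[0]  # drop the "@ifN" peer-index suffix if present
        if name == "cni0":
            return "bridge for same-node pod traffic"
        if name == "flannel.1":
            return "VXLAN interface for cross-node pod traffic"
        if name.startswith("veth"):
            return "host end of a pod's veth pair"
        return "host interface"

    for ifname in ["lo", "enp0s3", "cni0", "flannel.1", "veth82ebd960@if2"]:
        print(f"{ifname}: {classify(ifname)}")
    ```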

    Show IP Addresses

    To display the IP addresses assigned to the interfaces:

    ip -c -o addr show

    You should see the IP addresses of all the network interfaces. Among the K3s-related interfaces, only cni0 and flannel.1 have IP addresses; the vethXXXXXXXX interfaces have only MAC addresses. The reason for this is explained in a later section of this blog.

    Pod-to-Pod Communication and Bridge Networks

    The diagram illustrates how container networking works within a Kubernetes (K3s) node, showing the key components that enable pods to communicate with each other and the outside world. Let’s break down this networking architecture:

    At the top level, we have the host interface (enp0s9) with IP 192.168.2.224, which is the node’s physical network interface connected to the external network. This is the node’s gateway to the outside world.

    Traffic from enp0s9 reaches the cni0 bridge (IP: 10.42.0.1/24), which acts like a virtual switch inside the node. This bridge serves as the internal network hub for all pods running on the node.

    Each pod runs in its own network namespace with its own separate network stack, including its own network interfaces and routing tables. Each pod’s internal interface, eth0, as shown in the diagram above, holds the pod’s IP address. The eth0 inside the pod is one end of a virtual ethernet (veth) pair; the other end lives in the host’s network and connects the pod’s eth0 to the cni0 bridge.

    Exploring Network Namespaces in Detail

    Kubernetes uses network namespaces to isolate networking for each pod, ensuring that pods have separate networking environments and do not interfere with each other. 

    A network namespace is a Linux kernel feature that provides network isolation for a group of processes. Each namespace has its own network interfaces, IP addresses, routing tables, and firewall rules. Kubernetes uses this feature to ensure that each pod has its own isolated network environment.

    In Kubernetes:

    • Each pod has its own network namespace.
    • Each container within a pod shares the same network namespace.
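    The two points above can be observed directly from /proc, where every process exposes its namespaces as symlinks. A minimal sketch (runs on any Linux host, no root required):

    ```python
    import os

    # Every process's namespaces appear under /proc/<pid>/ns/. The symlink
    # target encodes the namespace type and its inode number; two processes
    # share a network namespace exactly when these values are identical.
    self_netns = os.readlink("/proc/self/ns/net")
    print(self_netns)  # e.g. net:[4026531992]
    ```

    Containers in the same pod would all show the same net:[…] value here, while containers in different pods would show different ones.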

    Inspecting Network Namespaces

    To inspect the network namespaces, follow these steps:

    If you installed K3s as described in this blog, it uses the containerd runtime by default; the commands to get the container PID will differ if you run K3s with Docker or another container runtime.

    Identify the container runtime and get the list of running containers.

    sudo crictl ps

    Get the container ID from the output and use it to get the process ID:

    sudo crictl inspect <container-id> | grep pid

    Check the network namespace associated with the container:

    sudo ls -l /proc/<container-pid>/ns/net

    You can use nsenter to enter the network namespace for further exploration.

    Executing Into Network Namespaces

    To explore the network settings of a pod’s namespace, you can use the nsenter command.

    sudo nsenter --net=/proc/<container-pid>/ns/net
    ip addr show

    Script to exec into network namespace

    You can use the following script to get the container process ID and exec into the pod network namespace directly.

    POD_ID=$(sudo crictl pods --name <pod_name> -q) 
    CONTAINER_ID=$(sudo crictl ps --pod $POD_ID -q) 
    sudo nsenter -t $(sudo crictl inspect $CONTAINER_ID | jq -r .info.pid) -n ip addr show

    Veth Interfaces and Their Connection to Bridge

    Inside the pod’s network namespace, you should see the pod’s interfaces (lo and eth0) and the IP address 10.42.0.8 assigned to the pod. Looking closely, we see eth0@if13, which means eth0 is paired with interface index 13 (on your system the index will likely differ). The eth0 interface inside the pod is one end of a virtual ethernet (veth) pair; veths are always created in interconnected pairs. Here, one end of the pair is the pod’s eth0, while the other end is interface 13. But where does interface 13 exist? It lives in the host network, connecting the pod’s network to the host network via the bridge (cni0, in this case).

    ip link show | grep "^13:"

    Here you see veth82ebd960@if2, which denotes that this veth’s peer is interface number 2 inside the pod’s network namespace. You can verify that the veth is attached to the cni0 bridge as follows; the veth of every pod is attached to this bridge, which is what enables communication between pods on the same node.

    brctl show

    Demonstrating Pod-to-Pod Communication

    Deploy Two Pods

    Deploy two busybox pods to test communication:

    kubectl run pod1 --image=busybox --restart=Never -- sleep infinity
    kubectl run pod2 --image=busybox --restart=Never -- sleep infinity

    Get the IP Addresses of the Pods

    kubectl get pods pod1 pod2 -o wide

    Pod1 IP : 10.42.0.9

    Pod2 IP : 10.42.0.10

    Ping Between Pods and Observe the Traffic Between Two Pods

    Before we ping from Pod1 to Pod2, we will use tcpdump to set up a watch on cni0 and on the veth pairs of Pod1 and Pod2 that are connected to it.

    Open three terminals and set up the tcpdump listeners: 

    # Terminal 1 – Watch traffic on cni0 bridge 

    sudo tcpdump -i cni0 icmp

     # Terminal 2 – Watch traffic on veth1 (Pod1’s veth pair)

    sudo tcpdump -i veth3a94f27 icmp

    # Terminal 3 – Watch traffic on veth2 (Pod2’s veth pair) 

    sudo tcpdump -i veth18eb7d52 icmp

    Exec into Pod1 and ping Pod2:

    kubectl exec -it pod1 -- ping -c 4 <pod2-IP>

    Watch the results on veth3a94f27 (Pod1’s veth pair):

    Watch the results on cni0:

    Watch the results on veth18eb7d52 (Pod2’s veth pair):

    Observing the timestamps for each request and reply on different interfaces, we get the flow of request/reply, as shown in the diagram below.

    Deeper Dive into the Journey of Network Packets from One Pod to Another

    We have already seen the flow of request/reply between two pods via veth interfaces connected to each other in a bridge network. In this section, we will discuss the internal details of how a network packet reaches from one pod to another.

    Packet Leaving Pod1’s Network

    Inside Pod1’s network namespace, the packet originates from eth0 (Pod1’s internal interface) and is sent out via its veth pair in the host network. The destination address of the packet is 10.42.0.10, which lies within the CIDR range 10.42.0.0 – 10.42.0.255, hence it matches the route for the directly connected subnet 10.42.0.0/24.

    The packet exits Pod1’s namespace and enters the host namespace via the connected veth pair that exists in the host network. The packet arrives at bridge cni0 since it is the master of all the veth pairs that exist in the host network.

    Once the packet reaches cni0, it gets forwarded to the correct veth pair connected to Pod2.
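    The route match described above is a longest-prefix lookup. A small sketch using routes typical of a pod’s network namespace (the exact routes on your system may differ) shows why the /24 route wins for a same-node pod IP:

    ```python
    import ipaddress

    # Illustrative routes as seen inside a pod's network namespace.
    routes = [
        ("default", "via 10.42.0.1"),        # everything else, via the cni0 bridge
        ("10.42.0.0/24", "dev eth0"),        # same-node pod subnet, directly connected
        ("10.42.0.0/16", "via 10.42.0.1"),   # rest of the cluster, via the bridge
    ]

    def lookup(dst: str):
        """Longest-prefix match, mirroring the kernel's route selection."""
        dst_ip = ipaddress.ip_address(dst)
        best = None
        for prefix, nexthop in routes:
            net = ipaddress.ip_network("0.0.0.0/0" if prefix == "default" else prefix)
            if dst_ip in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, prefix, nexthop)
        return best[1], best[2]

    print(lookup("10.42.0.10"))  # the /24 route wins for Pod2's IP
    print(lookup("8.8.8.8"))     # only the default route matches
    ```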

    Packet Forwarding from cni0 to Pod2’s Network

    When the packet reaches cni0, the job of cni0 is to forward it to Pod2. The cni0 bridge acts as a Layer 2 switch here, simply forwarding the packet to the destination veth. The bridge maintains a forwarding database and dynamically learns the mapping between a destination MAC address and its corresponding veth device.

    You can view forwarding database information with the following command:

    bridge fdb show

    In this screenshot, the forwarding database output is limited to just the MAC address of Pod2’s eth0:

    1. First column: MAC address of Pod2’s eth0
    2. dev vethX: The network interface this MAC address is reachable through
    3. master cni0: Indicates this entry belongs to cni0 bridge
    4. Flags that may appear:
      • permanent: Static entry, manually added or system-generated
      • self: MAC address belongs to the bridge interface itself
      • No flag: The entry is dynamically learned.
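    The column layout described above can be made concrete with a small parser for one line of `bridge fdb show` output (the MAC and device names below are illustrative):

    ```python
    # Parse one `bridge fdb show` line into the fields described above.
    def parse_fdb_line(line: str) -> dict:
        tokens = line.split()
        entry = {"mac": tokens[0], "flags": []}
        i = 1
        while i < len(tokens):
            if tokens[i] in ("dev", "master"):
                entry[tokens[i]] = tokens[i + 1]  # keyword/value pair
                i += 2
            else:
                entry["flags"].append(tokens[i])  # e.g. permanent, self
                i += 1
        return entry

    print(parse_fdb_line("96:2e:37:f1:5a:0c dev veth82ebd960 master cni0"))
    print(parse_fdb_line("aa:bb:cc:dd:ee:ff dev cni0 self permanent"))
    ```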

    Dynamic MAC Learning Process

    When Pod1 generates a packet carrying an ICMP request, it is packed into a Layer 2 frame whose source MAC is the MAC address of Pod1’s eth0 interface. To obtain the destination MAC address, eth0 broadcasts an ARP request to all the network interfaces; the ARP request contains the destination interface’s IP address.

    This ARP request is received by all interfaces connected to the bridge, but only Pod2’s eth0 interface responds with its MAC address. The destination MAC address is then added to the frame, and the packet is sent to the cni0 bridge.

    When this frame reaches the cni0 bridge, the bridge inspects it and saves the source MAC against the source interface (the host-side veth pair of Pod1’s eth0) in the forwarding table.

    The bridge must then forward the frame to the interface behind which the destination lies (i.e., the host-side veth pair of Pod2). If the forwarding table has an entry for Pod2’s veth pair, the bridge forwards the frame directly to it; otherwise, it floods the frame to all the veths connected to the bridge, which also reaches Pod2.

    When Pod2 sends the reply to Pod1, the reverse path is followed. The frame leaves Pod2’s eth0 and reaches cni0 via the host-side veth pair of Pod2’s eth0. The bridge adds the source MAC address (in this case, Pod2’s eth0) and the device it is reachable through to the forwarding database, then forwards the reply to Pod1, completing the request and response cycle.
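    The learn/forward/flood behaviour described above can be captured in a toy model. The port names stand in for the veth interfaces attached to cni0; real bridges also handle aging and broadcast domains, which this sketch omits:

    ```python
    # A minimal learning bridge, mirroring how cni0 builds its forwarding table.
    class LearningBridge:
        def __init__(self, ports):
            self.ports = ports
            self.fdb = {}  # MAC address -> port it was last seen on

        def handle_frame(self, in_port, src_mac, dst_mac):
            self.fdb[src_mac] = in_port                      # learn the sender's port
            if dst_mac in self.fdb:                          # known destination: unicast
                return [self.fdb[dst_mac]]
            return [p for p in self.ports if p != in_port]   # unknown: flood

    br = LearningBridge(["veth-pod1", "veth-pod2", "veth-pod3"])

    # Pod1 -> Pod2: Pod2's MAC is still unknown, so the frame is flooded.
    print(br.handle_frame("veth-pod1", "MAC1", "MAC2"))  # ['veth-pod2', 'veth-pod3']

    # Pod2's reply teaches the bridge where MAC2 lives; later frames are unicast.
    br.handle_frame("veth-pod2", "MAC2", "MAC1")
    print(br.handle_frame("veth-pod1", "MAC1", "MAC2"))  # ['veth-pod2']
    ```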

    Summary and Key Takeaways

    In this guide, we explored the foundational elements of Linux that play a crucial role in Kubernetes networking using K3s. Here are the key takeaways:

    • Network Namespaces ensure pod isolation.
    • Veth Interfaces connect pods to the host network and enable inter-pod communication.
    • Bridge Networks facilitate pod-to-pod communication on the same node.

    I hope you gained a deeper understanding of how Linux internals are used in Kubernetes network design and how they play a key role in pod-to-pod communication within the same node.

  • Strategies for Cost Optimization Across Amazon EKS Clusters

    Fast-growing tech companies rely heavily on Amazon EKS clusters to host a variety of microservices and applications. The pairing of Amazon EKS for managing the Kubernetes Control Plane and Amazon EC2 for flexible Kubernetes nodes creates an optimal environment for running containerized workloads. 

    With the increasing scale of operations, optimizing costs across multiple EKS clusters has become a critical priority. This blog will demonstrate how we can leverage various tools and strategies to analyze, optimize, and manage EKS costs effectively while maintaining performance and reliability. 

    Cost Analysis:

    Cost analysis is the essential first step toward cost optimization: data plays an important role here, so trust your data. The total cost of operating an EKS cluster encompasses several components. The EKS Control Plane incurs a fixed cost of $0.10 per hour per cluster, offering straightforward pricing.

    Meanwhile, EC2 instances, serving as the cluster’s nodes, introduce various cost factors, such as block storage and data transfer, which can vary significantly based on workload characteristics. For this discussion, we’ll focus primarily on two aspects of EC2 cost: instance hours and instance pricing. Let’s look at how to do the cost analysis on your EKS cluster.

    • Tool Selection: We can begin our cost analysis journey by selecting Kubecost, a powerful tool specifically designed for Kubernetes cost analysis. Kubecost provides granular insights into resource utilization and costs across our EKS clusters.
    • Deployment and Usage: Deploying Kubecost is straightforward. We can integrate it with our Kubernetes clusters by following the provided documentation. Kubecost’s intuitive dashboard allows us to visualize resource usage, cost breakdowns, and cost allocation by namespace, pod, or label. Once deployed, you can see the Kubecost overview page in your browser by port-forwarding the Kubecost k8s service. It might take 5-10 minutes for Kubecost to gather metrics. You can then see your Amazon EKS spend, including cumulative cluster costs, associated Kubernetes asset costs, and monthly aggregated spend.
    • Cluster Level Cost Analysis: For multi-cluster cost analysis and cluster-level scoping, consider adopting an AWS tagging strategy and tagging your EKS clusters; see the AWS tagging documentation to learn more. You can then view your cost analysis in AWS Cost Explorer, which provides additional insights into your AWS usage and spending trends. By analyzing cost and usage data at a granular level, we can identify areas for further optimization and cost reduction.
    • Multi-Cluster Cost Analysis using Kubecost and Prometheus: The Kubecost deployment ships with a bundled Prometheus server that receives its cost-analysis metrics. For multiple EKS clusters, we can instead enable a remote Prometheus server, either AWS-managed or self-managed. To collect cost-analysis metrics from multiple clusters, we need to run Kubecost with an additional SigV4 proxy pod that sends individual and combined cluster metrics to a common Prometheus server. You can follow the AWS documentation for Multi-Cluster Cost Analysis using Kubecost and Prometheus.

    Cost Optimization Strategies:

    Based on the cost analysis, the next step is to plan your cost optimization strategies. As explained in the previous section, the Control Plane has a fixed cost and straightforward pricing model. So, we will focus mainly on optimizing the data nodes and optimizing the application configuration. Let’s look at the following strategies when optimizing the cost of the EKS cluster and supporting AWS services:

    • Right Sizing: On the cost optimization pillar of the AWS Well-Architected Framework, we find a section on Cost-Effective Resources, which describes Right Sizing as:

    “… using the lowest cost resource that still meets the technical specifications of a specific workload.”

    • Application Right Sizing: Right-sizing is the strategy of optimizing pod resources by allocating the appropriate CPU and memory to each pod. Care must be taken to set requests that align as closely as possible with the actual utilization of these resources. If the value is too low, the containers may be throttled, hurting performance; if it is too high, there is waste, since the unused resources remain reserved for that single container. When actual utilization is lower than the requested value, the difference is called slack cost. A tool like kube-resource-report is valuable for visualizing the slack cost and right-sizing the requests for the containers in a pod. Its installation instructions show how to install it via the included helm chart.

      helm upgrade --install kube-resource-report chart/kube-resource-report


      You can also consider tools like VPA recommender with Goldilocks to get an insight into your pod resource consumption and utilization.


    • Compute Right Sizing: Application right sizing and Kubecost analysis are required to right-size EKS compute. Here are several strategies for compute right sizing:
      • Mixed Instance Auto Scaling group: Employ a mixed instance policy to create a diversified pool of instances within your auto scaling group. This mix can include both spot and on-demand instances. However, it’s advisable not to mix instances of different sizes within the same Node group.
      • Node Groups, Taints, and Tolerations: Utilize separate Node Groups with varying instance sizes for different application requirements. For example, use distinct node groups for GPU-intensive and CPU-intensive applications. Use taints and tolerations to ensure applications are deployed on the appropriate node group.
      • Graviton Instances: Explore the adoption of Graviton Instances, which offer up to 40% better price performance compared to traditional instances. Consider migrating to Graviton Instances to optimize costs and enhance application performance.
    • Purchase Options: Another part of the cost optimization pillar of the AWS Well-Architected Framework that we can apply comes from the Purchasing Options section, which says:

    “Spot Instances allow you to use spare compute capacity at a significantly lower cost than On-Demand EC2 instances (up to 90%).”

    Understanding purchase options for Amazon EC2 is crucial for cost optimization. The Amazon EKS data plane consists of worker nodes or serverless compute resources responsible for running Kubernetes application workloads. These nodes can utilize different capacity types and purchase options, including  On-Demand, Spot Instances, Savings Plans, and Reserved Instances.

    On-Demand and Spot capacity offer flexibility without spending commitments. On-Demand instances are billed based on runtime and guarantee availability at On-Demand rates, while Spot instances offer discounted rates but are preemptible. Both options are suitable for temporary or bursty workloads, with Spot instances being particularly cost-effective for applications tolerant of compute availability fluctuations. 

    Reserved Instances involve upfront spending commitments over one or three years for discounted rates. Once a steady-state resource consumption profile is established, Reserved Instances or Savings Plans become effective. Savings Plans, introduced as a more flexible alternative to Reserved Instances, allow for commitments based on a “US Dollar spend amount,” irrespective of provisioned resources. There are two types: Compute Savings Plans, offering flexibility across instance types, Fargate, and Lambda charges, and EC2 Instance Savings Plans, providing deeper discounts but restricting compute choice to an instance family.

    Tailoring your approach to your workload can significantly impact cost optimization within your EKS cluster. For non-production environments, leveraging Spot Instances exclusively can yield substantial savings. Meanwhile, implementing Mixed-Instances Auto Scaling Groups for production workloads allows for dynamic scaling and cost optimization. Additionally, for steady workloads, investing in a Savings Plan for EC2 instances can provide long-term cost benefits. By strategically planning and optimizing your EC2 instances, you can achieve a notable reduction in your overall EKS compute costs, potentially reaching savings of approximately 60-70%.
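    As a back-of-the-envelope illustration of the savings such a mix can produce (all prices below are hypothetical placeholders, not AWS quotes):

    ```python
    # Blended monthly cost of a mixed Spot/On-Demand Auto Scaling group.
    on_demand_hourly = 0.096        # hypothetical On-Demand rate per node
    spot_discount = 0.70            # Spot often runs 60-90% below On-Demand
    spot_hourly = on_demand_hourly * (1 - spot_discount)

    nodes = 10
    spot_fraction = 0.6             # 60% Spot / 40% On-Demand mix
    hours_per_month = 730

    blended = (nodes * spot_fraction * spot_hourly
               + nodes * (1 - spot_fraction) * on_demand_hourly)
    all_on_demand = nodes * on_demand_hourly

    print(f"all On-Demand: ${all_on_demand * hours_per_month:,.2f}/month")
    print(f"mixed group:   ${blended * hours_per_month:,.2f}/month")
    print(f"savings:       {1 - blended / all_on_demand:.0%}")
    ```

    Pushing the Spot fraction higher (as suggested for non-production clusters) or layering a Savings Plan on the On-Demand remainder moves the figure toward the 60-70% range mentioned above.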

    “… this (matching supply and demand) is accomplished using Auto Scaling, which helps you to scale your EC2 instances and Spot Fleet capacity up or down automatically according to conditions you define.”

    • Cluster Autoscaling: Therefore, a prerequisite to cost optimization on a Kubernetes cluster is to ensure you have Cluster Autoscaler running. This tool performs two critical functions in the cluster. First, it will monitor the cluster for pods that are unable to run due to insufficient resources. Whenever this occurs, the Cluster Autoscaler will update the Amazon EC2 Auto Scaling group to increase the desired count, resulting in additional nodes in the cluster. Additionally, the Cluster Autoscaler will detect nodes that have been underutilized and reschedule pods onto other nodes. Cluster Autoscaler will then decrease the desired count for the Auto Scaling group to scale in the number of nodes.
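    The two decisions described above can be sketched as a toy function; the real Cluster Autoscaler logic (expanders, cooldowns, pod disruption checks) is far richer than this:

    ```python
    # Toy model of Cluster Autoscaler's effect on the ASG desired count.
    def desired_node_count(current, pending_unschedulable_pods,
                           pods_per_node, underutilized_nodes):
        if pending_unschedulable_pods > 0:
            # Scale out: request enough extra nodes for the pending pods.
            extra = -(-pending_unschedulable_pods // pods_per_node)  # ceil division
            return current + extra
        if underutilized_nodes > 0:
            # Scale in: drain underutilized nodes and lower the desired count.
            return current - underutilized_nodes
        return current

    print(desired_node_count(5, pending_unschedulable_pods=12,
                             pods_per_node=10, underutilized_nodes=0))  # 7
    print(desired_node_count(5, 0, 10, underutilized_nodes=2))          # 3
    ```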

    The Amazon EKS User Guide has a great section on the configuration of the Cluster Autoscaler. There are a couple of things to pay attention to when configuring the Cluster Autoscaler:

    IAM Roles for Service Accounts – Cluster Autoscaler requires access to update the desired capacity in the Auto Scaling group. The recommended approach is to create a new IAM role with the required policies and a trust policy that restricts access to the service account used by Cluster Autoscaler. The role ARN must then be provided as an annotation on the service account:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: cluster-autoscaler
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::000000000000:role/my_role_name

    Auto-Discovery Setup

    Set up your Cluster Autoscaler for auto-discovery by enabling the --node-group-auto-discovery flag as an argument. Also, make sure to tag your EKS nodes’ Auto Scaling groups with the following tags:

    k8s.io/cluster-autoscaler/enabled,
    k8s.io/cluster-autoscaler/<cluster-name>


    Auto Scaling Group per AZ – When Cluster Autoscaler scales out, it simply increases the desired count for the Auto Scaling group, leaving the responsibility for launching new EC2 instances to the AWS Auto Scaling service. If an Auto Scaling group is configured for multiple availability zones, then the new instance may be provisioned in any of those availability zones.

    For deployments that use persistent volumes, you will need to provision a separate Auto Scaling group for each availability zone. This way, when Cluster Autoscaler detects the need to scale out in response to a given pod, it can target the correct availability zone for the scale-out based on persistent volume claims that already exist in a given availability zone.

    When using multiple Auto Scaling groups, be sure to include the following argument in the pod specification for Cluster Autoscaler:

    --balance-similar-node-groups=true

    • Pod Autoscaling: Now that Cluster Autoscaler is running in the cluster, you can be confident that instance hours will align closely with pod demand. The next step is to use the Horizontal Pod Autoscaler (HPA) to scale the number of pods in a deployment out or in based on specific pod metrics, optimizing pod hours and, in turn, instance hours.

    The HPA controller is included with Kubernetes, so all that is required to configure HPA is to ensure that the Kubernetes Metrics Server is deployed in your cluster and then to define HPA resources for your deployments. For example, the following HPA resource is configured to monitor the CPU utilization of a deployment named nginx-ingress-controller. HPA will then scale the number of pods out or in between 1 and 5 to target an average CPU utilization of 80% across all the pods:

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-ingress-controller
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx-ingress-controller
      minReplicas: 1
      maxReplicas: 5
      targetCPUUtilizationPercentage: 80
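    The scaling rule HPA applies to a resource like this follows the formula documented for the Kubernetes HPA controller, clamped to the min/max replica bounds:

    ```python
    import math

    # desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
    # clamped to [minReplicas, maxReplicas] as in the HPA resource above.
    def hpa_desired_replicas(current_replicas, current_cpu, target_cpu,
                             min_replicas=1, max_replicas=5):
        desired = math.ceil(current_replicas * current_cpu / target_cpu)
        return max(min_replicas, min(max_replicas, desired))

    print(hpa_desired_replicas(2, current_cpu=120, target_cpu=80))  # 3: scale out
    print(hpa_desired_replicas(4, current_cpu=30, target_cpu=80))   # 2: scale in
    print(hpa_desired_replicas(5, current_cpu=200, target_cpu=80))  # 5: capped at max
    ```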

    The combination of Cluster Autoscaler and Horizontal Pod Autoscaler is an effective way to keep EC2 instance hours tied as close as possible to the actual utilization of the workloads running in the cluster.

    “Systems can be scheduled to scale out or in at defined times, such as the start of business hours, thus ensuring that resources are available when users arrive.”

    There are many deployments that only need to be available during business hours. A tool named kube-downscaler can be deployed to the cluster to scale in and out the deployments based on time of day. 

    Some example use cases of kube-downscaler are:

    • Deploy the downscaler to a test (non-prod) cluster with a default uptime or downtime time range to scale down all deployments during the night and weekend.
    • Deploy the downscaler to a production cluster without any default uptime/downtime setting and scale down specific deployments by setting the downscaler/uptime (or downscaler/downtime) annotation. This might be useful for internal tooling front ends, which are only needed during work time.
    • AWS Fargate with EKS: With AWS Fargate, a serverless compute service, you can run Kubernetes workloads without managing clusters of K8s servers.

    AWS Fargate pricing is based on usage (pay-per-use), and there are no upfront charges either. There is, however, a one-minute minimum charge, and all charges are rounded up to the nearest second. You will also be charged for any additional services you use, such as CloudWatch utilization charges and data transfer fees. Fargate can also reduce your management costs by reducing the number of DevOps professionals and tools you need to run Kubernetes on Amazon EKS.
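    The billing granularity described above (per-second billing with a one-minute minimum) can be sketched as follows; the per-vCPU and per-GB rates are hypothetical placeholders, not AWS quotes:

    ```python
    import math

    VCPU_PER_HOUR = 0.040   # hypothetical vCPU rate
    GB_PER_HOUR = 0.004     # hypothetical memory rate

    def fargate_charge(vcpus, memory_gb, runtime_seconds):
        # Round up to the second, then apply the one-minute minimum.
        billed_seconds = max(60, math.ceil(runtime_seconds))
        hours = billed_seconds / 3600
        return (vcpus * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR) * hours

    # A 42-second task is billed as if it ran for the 60-second minimum.
    print(fargate_charge(0.25, 0.5, runtime_seconds=42)
          == fargate_charge(0.25, 0.5, runtime_seconds=60))  # True
    ```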

    Conclusion:

    Effectively managing costs across multiple Amazon EKS clusters is essential for optimizing operations. By utilizing tools like Kubecost and AWS Cost Explorer, coupled with strategies such as right-sizing, mixed instance policies, and Spot Instances, organizations can streamline cost analysis and optimize resource allocation. Additionally, implementing auto-scaling mechanisms like Cluster Autoscaler ensures dynamic resource scaling based on demand, further optimizing costs. Leveraging AWS Fargate with EKS can eliminate the need to manage Kubernetes clusters, reducing management costs. Overall, by combining these strategies, organizations can achieve significant cost savings while maintaining performance and reliability in their containerized environments.

  • Mastering Prow: A Guide to Developing Your Own Plugin for Kubernetes CI/CD Workflow

    Continuous Integration and Continuous Delivery (CI/CD) pipelines are essential components of modern software development, especially in the world of Kubernetes and containerized applications. To facilitate these pipelines, many organizations use Prow, a CI/CD system built specifically for Kubernetes. While Prow offers a rich set of features out of the box, you may need to develop your own plugins to tailor the system to your organization’s requirements. In this guide, we’ll explore the world of Prow plugin development and show you how to get started.

    Prerequisites

    Before diving into Prow plugin development, ensure you have the following prerequisites:

    • Basic Knowledge of Kubernetes and CI/CD Concepts: Familiarity with Kubernetes concepts such as Pods, Deployments, and Services, as well as understanding CI/CD principles, will be beneficial for understanding Prow plugin development.
    • Access to a Kubernetes Cluster: You’ll need access to a Kubernetes cluster for testing your plugins. If you don’t have one already, you can set up a local cluster using tools like Minikube or use a cloud provider’s managed Kubernetes service.
    • Prow Setup: Install and configure Prow in your Kubernetes cluster. For setup instructions, see Velotio Technologies – Getting Started with Prow: A Kubernetes-Native CI/CD Framework.
    • Development Environment Setup: Ensure you have Git, Go, and Docker installed on your local machine for developing and testing Prow plugins. You’ll also need to configure your environment to interact with your organization’s Prow setup.

    The Need for Custom Prow Plugins

    While Prow provides a wide range of built-in plugins, your organization’s Kubernetes workflow may have specific requirements that aren’t covered by these defaults. This is where developing custom Prow plugins comes into play. Custom plugins allow you to extend Prow’s functionality to cater to your needs. Whether automating workflows, integrating with other tools, or enforcing custom policies, developing your own Prow plugins gives you the power to tailor your CI/CD pipeline precisely.

    Getting Started with Prow Plugin Development

    Developing a custom Prow plugin may seem daunting, but with the right approach and tools, it can be a rewarding experience. Here’s a step-by-step guide to get you started:

    1. Set Up Your Development Environment

    Before diving into plugin development, you need to set up your development environment. You will need Git, Go, and access to a Kubernetes cluster for testing your plugins. Ensure you have the necessary permissions to make changes to your organization’s Prow setup.

    2. Choose a Plugin Type

    Prow supports various plugin types, including postsubmits, presubmits, triggers, and utilities. Choose the type that best fits your use case.

    • Postsubmits: These plugins are executed after the code is merged and are often used for tasks like publishing artifacts or creating release notes.
    • Presubmits: Presubmit plugins run before code is merged, typically used for running tests and ensuring code quality.
    • Triggers: Trigger plugins allow you to trigger custom jobs based on specific events or criteria.
    • Utilities: Utility plugins offer reusable functions and utilities for other plugins.

    3. Create Your Plugin

    Once you’ve chosen a plugin type, it’s time to create it. Below is an example of a simple Prow plugin written in Go, named comment-plugin.go. It will create a comment on a pull request each time an event is received.

    This code sets up a basic HTTP server that listens for GitHub events and handles them by creating a comment using the GitHub API. Customize this code to fit your specific use case.

    package main
    
    import (
        "encoding/json"
        "flag"
        "net/http"
        "os"
        "strconv"
        "time"
    
        "github.com/sirupsen/logrus"
        "k8s.io/test-infra/pkg/flagutil"
        "k8s.io/test-infra/prow/config"
        "k8s.io/test-infra/prow/config/secret"
        prowflagutil "k8s.io/test-infra/prow/flagutil"
        configflagutil "k8s.io/test-infra/prow/flagutil/config"
        "k8s.io/test-infra/prow/github"
        "k8s.io/test-infra/prow/interrupts"
        "k8s.io/test-infra/prow/logrusutil"
        "k8s.io/test-infra/prow/pjutil"
        "k8s.io/test-infra/prow/pluginhelp"
        "k8s.io/test-infra/prow/pluginhelp/externalplugins"
    )
    
    const pluginName = "comment-plugin"
    
    type options struct {
        port int
    
        config                 configflagutil.ConfigOptions
        dryRun                 bool
        github                 prowflagutil.GitHubOptions
        instrumentationOptions prowflagutil.InstrumentationOptions
    
        webhookSecretFile string
    }
    
    type server struct {
        tokenGenerator func() []byte
        botUser        *github.UserData
        email          string
        ghc            github.Client
        log            *logrus.Entry
        repos          []github.Repo
    }
    
    func helpProvider(_ []config.OrgRepo) (*pluginhelp.PluginHelp, error) {
        pluginHelp := &pluginhelp.PluginHelp{
           Description: `The sample plugin`,
        }
        return pluginHelp, nil
    }
    
    func (o *options) Validate() error {
        return nil
    }
    
    func gatherOptions() options {
        o := options{config: configflagutil.ConfigOptions{ConfigPath: "./config.yaml"}}
        fs := flag.NewFlagSet(os.Args[0], flag.ExitOnError)
        fs.IntVar(&o.port, "port", 8888, "Port to listen on.")
        fs.BoolVar(&o.dryRun, "dry-run", false, "Dry run for testing. Uses API tokens but does not mutate.")
        fs.StringVar(&o.webhookSecretFile, "hmac-secret-file", "/etc/hmac", "Path to the file containing GitHub HMAC secret.")
        for _, group := range []flagutil.OptionGroup{&o.github} {
           group.AddFlags(fs)
        }
        fs.Parse(os.Args[1:])
        return o
    }
    
    func main() {
        o := gatherOptions()
        if err := o.Validate(); err != nil {
           logrus.Fatalf("Invalid options: %v", err)
        }
    
        logrusutil.ComponentInit()
        log := logrus.StandardLogger().WithField("plugin", pluginName)
    
        if err := secret.Add(o.webhookSecretFile); err != nil {
           logrus.WithError(err).Fatal("Error starting secrets agent.")
        }
    
        gitHubClient, err := o.github.GitHubClient(o.dryRun)
        if err != nil {
           logrus.WithError(err).Fatal("Error getting GitHub client.")
        }
    
        email, err := gitHubClient.Email()
        if err != nil {
           log.WithError(err).Fatal("Error getting bot e-mail.")
        }
    
        botUser, err := gitHubClient.BotUser()
        if err != nil {
           logrus.WithError(err).Fatal("Error getting bot name.")
        }
        repos, err := gitHubClient.GetRepos(botUser.Login, true)
        if err != nil {
           log.WithError(err).Fatal("Error listing bot repositories.")
        }
        serv := &server{
           tokenGenerator: secret.GetTokenGenerator(o.webhookSecretFile),
           botUser:        botUser,
           email:          email,
           ghc:            gitHubClient,
           log:            log,
           repos:          repos,
        }
    
        health := pjutil.NewHealthOnPort(o.instrumentationOptions.HealthPort)
        health.ServeReady()
    
        mux := http.NewServeMux()
        mux.Handle("/", serv)
        externalplugins.ServeExternalPluginHelp(mux, log, helpProvider)
        logrus.Info("starting server " + strconv.Itoa(o.port))
        httpServer := &http.Server{Addr: ":" + strconv.Itoa(o.port), Handler: mux}
        defer interrupts.WaitForGracefulShutdown()
        interrupts.ListenAndServe(httpServer, 5*time.Second)
    }
    
    func (s *server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        logrus.Info("inside http server")
        _, _, payload, ok, _ := github.ValidateWebhook(w, r, s.tokenGenerator)
        if !ok {
           return
        }
        logrus.Info(string(payload))
        logrus.Info("Event received. Have a nice day.")
        if err := s.handleEvent(payload); err != nil {
           logrus.WithError(err).Error("Error parsing event.")
        }
    }
    
    func (s *server) handleEvent(payload []byte) error {
        logrus.Info("inside handler")
        var pr github.PullRequestEvent
        if err := json.Unmarshal(payload, &pr); err != nil {
           return err
        }
        logrus.Info(pr.Number)
        if err := s.ghc.CreateComment(pr.PullRequest.Base.Repo.Owner.Login, pr.PullRequest.Base.Repo.Name, pr.Number, "comment from comment-plugin"); err != nil {
           return err
        }
        return nil
    }

    4. Deploy Your Plugin

    To deploy your custom Prow plugin, you will need to create a Docker image and deploy it into your Prow cluster.

    FROM golang AS app-builder
    WORKDIR /app
    RUN apt-get update && apt-get install -y git
    COPY . .
    RUN CGO_ENABLED=0 go build -o main

    FROM alpine:3.9
    RUN apk add --no-cache ca-certificates git
    COPY --from=app-builder /app/main /app/custom-plugin
    ENTRYPOINT ["/app/custom-plugin"]

    docker build -t jainbhavya65/custom-plugin:v1 .

    docker push jainbhavya65/custom-plugin:v1

    Deploy the Docker image using a Kubernetes Deployment:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: comment-plugin
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: comment-plugin
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: comment-plugin
        spec:
          containers:
          - args:
            - --github-token-path=/etc/github/oauth
            - --hmac-secret-file=/etc/hmac-token/hmac
            - --port=80
            image: <IMAGE>
            imagePullPolicy: Always
            name: comment-plugin
            ports:
            - containerPort: 80
              protocol: TCP
            volumeMounts:
            - mountPath: /etc/github
              name: oauth
              readOnly: true
            - mountPath: /etc/hmac-token
              name: hmac
              readOnly: true
          volumes:
          - name: oauth
            secret:
              defaultMode: 420
              secretName: oauth-token
          - name: hmac
            secret:
              defaultMode: 420
              secretName: hmac-token

    Create a service for deployment:
    apiVersion: v1
    kind: Service
    metadata:
      name: comment-plugin
    spec:
      ports:
      - port: 80
        protocol: TCP
        targetPort: 80
      selector:
        app: comment-plugin
      sessionAffinity: None
      type: ClusterIP

    After creating the deployment and service, integrate it into your organization’s Prow configuration. This involves updating your Prow plugin.yaml files to include your plugin and specify when it should run.

    external_plugins:
    - name: comment-plugin
      # No endpoint specified implies "http://{{name}}", since the plugin is deployed in the same cluster.
      # If the plugin is not deployed in the same cluster, specify its endpoint here.
      events:
      # Only pull_request and issue_comment events are sent to our plugin.
      - pull_request
      - issue_comment

    Conclusion

    Mastering Prow plugin development opens up a world of possibilities for tailoring your Kubernetes CI/CD workflow to meet your organization’s needs. While the initial learning curve may be steep, the benefits of custom plugins in terms of automation, efficiency, and control are well worth the effort.

    Remember that the key to successful Prow plugin development lies in clear documentation, thorough testing, and collaboration with your team to ensure that your custom plugins enhance your CI/CD pipeline’s functionality and reliability. As Kubernetes and containerized applications continue to evolve, Prow will remain a valuable tool for managing your CI/CD processes, and your custom plugins will be the secret sauce that sets your workflow apart from the rest.

  • The Ultimate Guide to Disaster Recovery for Your Kubernetes Clusters

    Kubernetes allows us to run a containerized application at scale without drowning in the details of application load balancing. You can ensure high availability for your applications running on Kubernetes by running multiple replicas (pods) of the application. All the complexity of container orchestration is hidden away safely so that you can focus on developing your application instead of deploying it. Learn more about high availability of Kubernetes clusters and how you can use kubeadm for high availability in Kubernetes here.

    But using Kubernetes has its own challenges and getting Kubernetes up and running takes some real work. If you are not familiar with getting Kubernetes up and running, you might want to take a look here.

    Kubernetes allows us to have a zero downtime deployment, yet service interrupting events are inevitable and can occur at any time. Your network can go down, your latest application push can introduce a critical bug, or in the rarest case, you might even have to face a natural disaster.

    When you are using Kubernetes, sooner or later, you need to set up a backup. In case your cluster goes into an unrecoverable state, you will need a backup to go back to the previous stable state of the Kubernetes cluster.

    Why Backup and Recovery?

    There are three reasons why you need a backup and recovery mechanism in place for your Kubernetes cluster. These are:

    1. To recover from disasters: for example, someone accidentally deleted the namespace where your deployments reside.
    2. To replicate the environment: you want to replicate your production environment to a staging environment before any major upgrade.
    3. To migrate the Kubernetes cluster: let’s say you want to migrate your Kubernetes cluster from one environment to another.

    What to Backup?

    Now that you know why, let’s see exactly what you need to back up. The two things you need to back up are:

    1. The Kubernetes control plane state, which is stored in etcd; you need to back up the etcd state to capture all the Kubernetes resources.
    2. The persistent volumes, if you have stateful containers (which you will have in the real world).

    How to Backup?

    There have been various tools like Heptio Ark and kube-backup to back up and restore Kubernetes clusters on cloud providers. But what if you are not using a managed Kubernetes cluster? You might have to get your hands dirty if you are running Kubernetes on bare metal, just like we are.

    We are running a 3-master Kubernetes cluster with an etcd member on each master. If we lose one master, we can still recover it because the etcd quorum is intact. But if we lose two masters, quorum is lost, and we need a mechanism to recover from such situations as well for production-grade clusters.

    Want to know how to set up multi-master Kubernetes cluster? Keep reading!

    Taking etcd backup:

    The mechanism for taking an etcd backup differs depending on how you set up your etcd cluster in the Kubernetes environment.

    There are two ways to set up an etcd cluster in a Kubernetes environment:

    1. Internal etcd cluster: you run your etcd cluster in the form of containers/pods inside the Kubernetes cluster, and it is the responsibility of Kubernetes to manage those pods.
    2. External etcd cluster: you run the etcd cluster outside the Kubernetes cluster, mostly in the form of Linux services, and provide its endpoints for the Kubernetes cluster to write to.

    Backup Strategy for Internal Etcd Cluster:

    To take a backup from inside an etcd pod, we will use the Kubernetes CronJob functionality, which does not require any etcdctl client to be installed on the host.

    Following is the definition of a Kubernetes CronJob that takes an etcd backup every minute:

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: backup
      namespace: kube-system
    spec:
      # activeDeadlineSeconds: 100
      schedule: "*/1 * * * *"
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: backup
                # Same image as in /etc/kubernetes/manifests/etcd.yaml
                image: k8s.gcr.io/etcd:3.2.24
                env:
                - name: ETCDCTL_API
                  value: "3"
                command: ["/bin/sh"]
                args: ["-c", "etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key snapshot save /backup/etcd-snapshot-$(date +%Y-%m-%d_%H:%M:%S_%Z).db"]
                volumeMounts:
                - mountPath: /etc/kubernetes/pki/etcd
                  name: etcd-certs
                  readOnly: true
                - mountPath: /backup
                  name: backup
              restartPolicy: OnFailure
              hostNetwork: true
              volumes:
              - name: etcd-certs
                hostPath:
                  path: /etc/kubernetes/pki/etcd
                  type: DirectoryOrCreate
              - name: backup
                hostPath:
                  path: /data/backup
                  type: DirectoryOrCreate

    Backup Strategy for External Etcd Cluster:

    If you are running your etcd cluster as a service on Linux hosts, you should set up a Linux cron job to take backups of your cluster.

    Run the following command to save an etcd backup:

    ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save /path/for/backup/snapshot.db

    Disaster Recovery

    Now, let’s say the Kubernetes cluster went completely down and we need to recover it from the etcd snapshot.

    Normally, you would start the etcd cluster and run kubeadm init on the master node with the etcd endpoints.

    Make sure you put the backed-up certificates into the /etc/kubernetes/pki folder before running kubeadm init, so that it picks up the same certificates.

    Restore Strategy for Internal Etcd Cluster:

    docker run --rm \
        -v '/data/backup:/backup' \
        -v '/var/lib/etcd:/var/lib/etcd' \
        --env ETCDCTL_API=3 \
        k8s.gcr.io/etcd:3.2.24 \
        /bin/sh -c "etcdctl snapshot restore '/backup/etcd-snapshot-2018-12-09_11:12:05_UTC.db' ; mv /default.etcd/member/ /var/lib/etcd/"
    
    kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd

    Restore Strategy for External Etcd Cluster

    Restore the etcd on 3 nodes using following commands:

    ETCDCTL_API=3 etcdctl snapshot restore snapshot-188.db \
        --name master-0 \
        --initial-cluster master-0=http://10.0.1.188:2380,master-1=http://10.0.1.136:2380,master-2=http://10.0.1.155:2380 \
        --initial-cluster-token my-etcd-token \
        --initial-advertise-peer-urls http://10.0.1.188:2380

    ETCDCTL_API=3 etcdctl snapshot restore snapshot-136.db \
        --name master-1 \
        --initial-cluster master-0=http://10.0.1.188:2380,master-1=http://10.0.1.136:2380,master-2=http://10.0.1.155:2380 \
        --initial-cluster-token my-etcd-token \
        --initial-advertise-peer-urls http://10.0.1.136:2380

    ETCDCTL_API=3 etcdctl snapshot restore snapshot-155.db \
        --name master-2 \
        --initial-cluster master-0=http://10.0.1.188:2380,master-1=http://10.0.1.136:2380,master-2=http://10.0.1.155:2380 \
        --initial-cluster-token my-etcd-token \
        --initial-advertise-peer-urls http://10.0.1.155:2380

    The above three commands will give you three restored folders on the three nodes, named master-0.etcd, master-1.etcd, and master-2.etcd.

    Now, stop the etcd service on all nodes, replace each node’s etcd data directory with its restored folder, and start the etcd service again. You will initially see all the nodes, but after some time only the first master will be Ready while the other nodes go into the NotReady state. You need to join those two nodes again using the existing ca.crt file (you should have a backup of it).

    Run the following command on master node:

    kubeadm token create --print-join-command

    It will give you a kubeadm join command; add an --ignore-preflight-errors flag and run that command on the other two nodes for them to come into the Ready state.

    Conclusion

    One way to deal with master failure is to set up a multi-master Kubernetes cluster, but even that does not let you skip etcd backup and restore entirely: it is still possible to accidentally destroy data in an HA environment.

    Need help with disaster recovery for your Kubernetes Cluster? Connect with the experts at Velotio!

    For more insights into Kubernetes Disaster Recovery check out here.

  • Streamline Kubernetes Storage Upgrades

    Introduction:

    As technology advances, organizations are constantly seeking ways to optimize their IT infrastructure to enhance performance, reduce costs, and gain a competitive edge. One such approach involves migrating from traditional storage solutions to more advanced options that offer superior performance and cost-effectiveness. 

    In this blog post, we’ll explore a recent project (On Azure) where we successfully migrated our client’s applications from Disk type Premium SSD to Premium SSD v2. This migration led to performance improvements and cost savings for our client.

    Prerequisites:

    Before initiating this migration, ensure the following prerequisites are in place:

    1. Kubernetes Cluster: Ensure you have a working K8S cluster to host your applications.
    2. Velero Backup Tool: Install Velero, a widely-used backup and restoration tool tailored for Kubernetes environments.

    Overview of Velero:

    Velero stands out as a powerful tool designed for robust backup, restore, and migration solutions within Kubernetes clusters. It plays a crucial role in ensuring data safety and continuity during complex migration operations.

    Refer to the article on Velero installation and configuration.

    Strategic Plan Overview:

    There are two methods for upgrading storage classes:

    • Migration via Velero and CSI Integration: 

    This approach leverages Velero’s capabilities in conjunction with CSI integration to achieve a seamless and efficient migration.

    • Using Cloud Methods: 

    This method involves leveraging cloud provider-specific procedures. It includes steps like taking a snapshot of the disk, creating a new disk from the snapshot, and then establishing a Kubernetes volume using disk referencing. 

    Step-by-Step Guide:

    Migration via Velero and CSI Integration:

    Step 1 : Storage Class for Premium SSD v2

    Define a new storage class that supports Azure Premium SSD v2 disks. This storage class will be used to provision new persistent volumes during the restore process.

    # We have taken azure storage class example
    
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: premium-ssd-v2
    parameters:
      cachingMode: None
      skuName: PremiumV2_LRS # (Disk Type)
    provisioner: disk.csi.azure.com
    reclaimPolicy: Delete
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true

    Step 2: Volume Snapshot Class

    Introduce a Volume Snapshot Class to enable snapshot creation for persistent volumes. This class will be utilized for capturing the current state of persistent volumes before restoring them using Premium SSD v2.

    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: disk-snapshot-class
    driver: disk.csi.azure.com
    deletionPolicy: Delete
    parameters:
      incremental: "false"

    Step 3: Update Velero Deployment and Daemonset

    Enable CSI (Container Storage Interface) support in both the Velero deployment and the node-agent daemonset. This modification allows Velero to interact with the Cloud Disk CSI driver for provisioning and managing persistent volumes. Additionally, configure the Velero client to utilize the CSI plugin, ensuring that Velero utilizes the Cloud Disk CSI driver for backup and restore operations.

    # Enable CSI Server side features 
    
    $ kubectl -n velero edit deployment/velero
    $ kubectl -n velero edit daemonset/restic
    
    # Add the --features=EnableCSI flag in both resources
    
        spec:
          containers:
          - args:
            - server
            - --features=EnableCSI
    
    # Enable client side features 
    
    $ velero client config set features=EnableCSI

    Step 4: Take Velero Backup

    Create a Velero backup of all existing persistent volumes stored on Premium SSD disks. These backups serve as a safety net in case of any unforeseen issues during the migration process. We can use the include and exclude flags with the velero backup command.

    Reference Article : https://velero.io/docs/v1.12/resource-filtering 

    # run the below command for taking backup 
    $ velero backup create backup_name --include-namespaces namespace_name

    Step 5: ConfigMap Deployment 

    Deploy a ConfigMap in the Velero namespace. This ConfigMap defines the mapping between the old storage class (Premium SSD) and the new storage class (Premium SSD v2). During the restore process, Velero will use this mapping to recreate the persistent volumes using the new storage class.

    apiVersion: v1
    data:
      # Format: <old storage-class name>: <new storage-class name>
      managed-premium: premium-ssd-v2
    kind: ConfigMap
    metadata:
      labels:
        velero.io/change-storage-class: RestoreItemAction
        velero.io/plugin-config: ""
      name: storage-class-config
      namespace: velero

    Step 6: Velero Restore Operation

    Initiate the Velero restore process. This will replace the existing persistent volumes with new ones provisioned using Disk Premium SSD v2. The ConfigMap will ensure that the restored persistent volumes utilize the new storage class. 

    Reference article: https://velero.io/docs/v1.12/restore-reference 

    # run the below command for restoring from backups to different namespace 
    $ velero restore create restore-name --from-backup backup-name --namespace-mappings namespace1:namespace2
    # verify the new restored resources in namespace2
    $ kubectl get pvc,pv,pod -n namespace2

    Step 7: Verification & Testing

    Verify that all applications continue to function correctly after the restore process. Check for any performance improvements and cost savings as a result of the migration to Premium SSD v2.

    Step 8: Post-Migration Cleanup

    Remove any temporary resources created during the migration process, such as the volume snapshots and the custom VolumeSnapshotClass. Then delete the old persistent volume claims (PVCs) that were associated with the Premium SSD disks; this will trigger the automatic deletion of the corresponding persistent volumes (PVs) and the underlying Azure Disk storage.

    Impact:

    This approach is less risky because all new objects are created while the originals are retained as snapshots. During scheduling of the new pods, the new Premium SSD v2 disks are provisioned in the same zone as the node where the pod is being scheduled. While the content of the new disks is restored from the snapshot, some downtime is expected; its duration depends on the size of the disks being restored.

    Conclusion:

    Migrating from any storage class to a newer, more performant one using Velero can provide significant benefits for your organization, whether you’re upgrading from Premium SSD to Premium SSD v2 or transitioning to a completely different storage provider. By leveraging Velero’s comprehensive backup and restore functionality, you can migrate your applications to the new storage class while maintaining data integrity and application functionality. By adopting this approach, organizations can reap the rewards of enhanced performance, reduced costs, and simplified storage management.

  • Unveiling the Magic of Kubernetes: Exploring Pod Priority, Priority Classes, and Pod Preemption

    Introduction:

    Generally, during the deployment of a manifest, we observe that some pods get scheduled successfully, while a few critical pods encounter scheduling issues. Therefore, we want to schedule the critical pods ahead of the other pods. While exploring, we discovered a built-in solution for this using Pod Priority and Priority Classes. So, in this blog, we’ll talk about Priority Classes and Pod Priority and how we can implement them for our use case.

    Pod Priority:

    It is used to prioritize one pod over another based on its importance. Pod Priority is particularly useful when critical pods cannot be scheduled due to limited resources.

    Priority Classes:

    This Kubernetes object defines the priority of pods as an integer value: the higher the value, the higher the pod’s priority.

    Understanding Priority Values:

    Priority Classes in Kubernetes are associated with priority values that range from 0 to 1000000000, with a higher value indicating greater importance.

    These values act as a guide for the scheduler when allocating resources. 

    Pod Preemption:

    It is already enabled when we create a priority class. The purpose of Pod Preemption is to evict lower-priority pods in order to make room for higher-priority pods to be scheduled.

    Example Scenario: The Enchanted Shop

    Let’s dive into a scenario featuring “The Enchanted Shop,” a Kubernetes cluster hosting an online store. The shop has three pods, each with a distinct role and priority:

    Priority Class:

    • Create High priority class: 
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority
    value: 1000000

    • Create Medium priority class:
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: medium-priority
    value: 500000

    • Create Low priority class:
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: low-priority
    value: 100000

    Pods:

    • Checkout Pod (High Priority): This pod is responsible for processing customer orders and must receive top priority.

    Create the Checkout Pod with a high-priority class:

    apiVersion: v1
    kind: Pod
    metadata:
      name: checkout-pod
      labels:
        app: checkout
    spec:
      priorityClassName: high-priority
      containers:
      - name: checkout-container
        image: nginx:checkout

    • Product Recommendations Pod (Medium Priority):

    This pod provides personalized product recommendations to customers and holds moderate importance.

    Create the Product Recommendations Pod with a medium priority class:

    apiVersion: v1
    kind: Pod
    metadata:
      name: product-rec-pod
      labels:
        app: product-recommendations
    spec:
      priorityClassName: medium-priority
      containers:
      - name: product-rec-container
        image: nginx:store

    • Shopping Cart Pod (Low Priority):

    This pod manages customers’ shopping carts and has a lower priority compared to the others.

    Create the Shopping Cart Pod with a low-priority class:

    apiVersion: v1
    kind: Pod
    metadata:
      name: shopping-cart-pod
      labels:
        app: shopping-cart
    spec:
      priorityClassName: low-priority
      containers:
      - name: shopping-cart-container
        image: nginx:cart

    With these pods and their respective priority classes, Kubernetes will allocate resources based on their importance, ensuring smooth operation even during peak loads.

    Commands to Witness the Magic:

    • Verify Priority Classes:

    kubectl get priorityclasses

    Note: Kubernetes includes two predefined Priority Classes: system-cluster-critical and system-node-critical. These classes are specifically designed to prioritize the scheduling of critical components, ensuring they are always scheduled first.

    • Check Pod Priority:

    kubectl get pod checkout-pod -o jsonpath='{.spec.priority}'

    Conclusion:

    In Kubernetes, you have the flexibility to define how your pods are scheduled. This ensures that your critical pods receive priority over lower-priority pods during the scheduling process. To get deeper into the concepts of Pod Priority, Priority Class, and Pod Preemption, you can find more information by referring to the following links.

  • How to deploy GitHub Actions Self-Hosted Runners on Kubernetes

    GitHub Actions jobs are run in the cloud by default; however, sometimes we want to run jobs in our own customized/private environment where we have full control. That is where a self-hosted runner saves us from this problem. 

    To get a basic understanding of running self-hosted runners on the Kubernetes cluster, this blog is perfect for you. 

    We’ll be focusing on running GitHub Actions on a self-hosted runner on Kubernetes. 

    An example use case would be to create an automation in GitHub Actions to execute MySQL queries on MySQL Database running in a private network (i.e., MySQL DB, which is not accessible publicly).

    A self-hosted runner normally requires provisioning and configuring a virtual machine instance; here, we are running it on Kubernetes instead. The actions-runner-controller makes it possible to run self-hosted runners on a Kubernetes cluster.

    This blog aims to try out self-hosted runners on Kubernetes and covers:

    1. Deploying MySQL Database on minikube, which is accessible only within Kubernetes Cluster.
    2. Deploying self-hosted action runners on the minikube.
    3. Running GitHub Action on minikube to execute MySQL queries on MySQL Database.

    Steps for completing this tutorial:

    Create a GitHub repository

    1. Create a private repository on GitHub. I am creating it with the name velotio/action-runner-poc.

    Setup a Kubernetes cluster using minikube

    1. Install Docker.
    2. Install Minikube.
    3. Install Helm.
    4. Install kubectl.

    Install cert-manager on a Kubernetes cluster

    • By default, actions-runner-controller uses cert-manager for certificate management of the admission webhook, so make sure cert-manager is installed on Kubernetes before you install actions-runner-controller.
    • Run the helm commands from the cert-manager documentation to install it on minikube.
    • Verify the installation using “kubectl --namespace cert-manager get all”. If everything is okay, you will see the cert-manager resources up and running.

    Setting Up Authentication for Hosted Runners‍

    There are two ways for actions-runner-controller to authenticate with the GitHub API (only 1 can be configured at a time, however):

    1. Using a GitHub App (not supported for enterprise-level runners due to lack of support from GitHub.)
    2. Using a PAT (personal access token)

    To keep this blog simple, we are going with PAT.

    To authenticate actions-runner-controller with the GitHub API, we can use a PAT, with which actions-runner-controller registers the self-hosted runners.

    • Go to account > Settings > Developer settings > Personal access tokens. Click on “Generate new token”. Under scopes, select “Full control of private repositories”.
    • Click on the “Generate token” button.
    • Copy the generated token and run the below commands to create a Kubernetes secret, which will be used by the actions-runner-controller deployment.
    export GITHUB_TOKEN=XXXxxxXXXxxxxXYAVNa 

    kubectl create ns actions-runner-system

    Create secret

    kubectl create secret generic controller-manager -n actions-runner-system \
      --from-literal=github_token=${GITHUB_TOKEN}
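    Under the hood, --from-literal base64-encodes the value before Kubernetes stores it. A quick sketch of that encoding, using a placeholder value rather than a real PAT:

```shell
# Kubernetes Secrets store values base64-encoded; --from-literal
# does the encoding for you. Demonstrated with a placeholder token:
token='sample-token'
encoded=$(printf '%s' "$token" | base64)
echo "encoded: $encoded"
# Decoding recovers the original value:
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "decoded: $decoded"
```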

    Install action runner controller on the Kubernetes cluster

    • Run the below helm commands
    helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
    helm repo update
    helm upgrade --install --namespace actions-runner-system \
      --create-namespace --wait actions-runner-controller \
      actions-runner-controller/actions-runner-controller \
      --set syncPeriod=1m

    • Verify that the actions-runner-controller installed properly using the below command:
    kubectl --namespace actions-runner-system get all

     

    Create a Repository Runner

    • Create a RunnerDeployment Kubernetes object, which will create self-hosted runners named k8s-action-runner for the GitHub repository velotio/action-runner-poc.
    • Update the repo name from “velotio/action-runner-poc” to “<your-repo-name>”.
    • To create the RunnerDeployment object, create the file runner.yaml as follows:
    apiVersion: actions.summerwind.dev/v1alpha1
    kind: RunnerDeployment
    metadata:
     name: k8s-action-runner
     namespace: actions-runner-system
    spec:
     replicas: 2
     template:
       spec:
         repository: velotio/action-runner-poc

    • To create, run this command:
    kubectl create -f runner.yaml

    Check that the pod is running using the below command:

    kubectl get pod -n actions-runner-system | grep -i "k8s-action-runner"
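    The grep above simply filters the pod list by name. On sample output (the pod names here are hypothetical), only the runner pods survive the filter:

```shell
# Filter a sample pod listing the same way the command above does:
matches=$(printf '%s\n' \
  'k8s-action-runner-abc12-xyz34 2/2 Running 0 1m' \
  'actions-runner-controller-aaaa 1/1 Running 0 5m' \
  | grep -i 'k8s-action-runner')
echo "$matches"
```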

    • If everything goes well, you should see two action runners on Kubernetes, and the same runners registered on GitHub. Check under Settings > Actions > Runners of your repository.
    • Check the pod with kubectl get po -n actions-runner-system

    Install a MySQL Database on the Kubernetes cluster

    • Create PV and PVC for MySQL Database. 
    • Create mysql-pv.yaml with the below content.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
     name: mysql-pv-volume
     labels:
       type: local
    spec:
     capacity:
       storage: 2Gi
     accessModes:
       - ReadWriteOnce
     hostPath:
       path: "/mnt/data"
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
     name: mysql-pv-claim
    spec:
     accessModes:
       - ReadWriteOnce
     resources:
       requests:
         storage: 2Gi

    • Create mysql namespace
    kubectl create ns mysql

    • Now apply mysql-pv.yaml to create PV and PVC 
    kubectl create -f mysql-pv.yaml -n mysql

    Create the file mysql-svc-deploy.yaml with the below content.

    Here, we have used “password” as the MYSQL_ROOT_PASSWORD.

    apiVersion: v1
    kind: Service
    metadata:
     name: mysql
    spec:
     ports:
       - port: 3306
     selector:
       app: mysql
     clusterIP: None
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
     name: mysql
    spec:
     selector:
       matchLabels:
         app: mysql
     strategy:
       type: Recreate
     template:
       metadata:
         labels:
           app: mysql
       spec:
         containers:
           - image: mysql:5.6
             name: mysql
             env:
                 # Use secret in real usage
               - name: MYSQL_ROOT_PASSWORD
                 value: password
             ports:
               - containerPort: 3306
                 name: mysql
             volumeMounts:
               - name: mysql-persistent-storage
                 mountPath: /var/lib/mysql
         volumes:
           - name: mysql-persistent-storage
             persistentVolumeClaim:
               claimName: mysql-pv-claim

    • Create the service and deployment
    kubectl create -f mysql-svc-deploy.yaml -n mysql

    • Verify that the MySQL database is running
    kubectl get po -n mysql
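    Note that the Service above is headless (clusterIP: None). The workflow later reaches MySQL through its in-cluster DNS name, which follows the standard Kubernetes pattern <service>.<namespace>.svc.cluster.local:

```shell
# Build the in-cluster DNS name for the "mysql" Service we just
# created in the "mysql" namespace:
service=mysql
namespace=mysql
host="${service}.${namespace}.svc.cluster.local"
echo "$host"
```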

    Create a GitHub repository secret to store MySQL password

    We will use the MySQL password in the GitHub Actions workflow file; as a good practice, we should not keep it there in plain text. So we will store the MySQL password in GitHub secrets and reference this secret in our workflow file.

    • Create a secret in the GitHub repository and give the name to the secret as “MYSQL_PASS”, and in the values, enter “password”. 

    Create a GitHub workflow file

    • YAML syntax is used to write GitHub workflows. Each workflow goes in a separate YAML file stored in the .github/workflows/ directory. So, create a .github/workflows/ directory in your repository and create a file .github/workflows/mysql_workflow.yaml as follows.
    ---
    name: Example 1
    on:
     push:
       branches: [ main ]
    jobs:
     build:
       name: Build-job
       runs-on: self-hosted
       steps:
       - name: Checkout
         uses: actions/checkout@v2
     
       - name: MySQLQuery
         env:
           PASS: ${{ secrets.MYSQL_PASS }}
         run: |
           docker run -v ${GITHUB_WORKSPACE}:/var/lib/docker --rm mysql:5.6 sh -c "mysql -u root -p$PASS -hmysql.mysql.svc.cluster.local </var/lib/docker/test.sql"

    • If you check the docker run command in the mysql_workflow.yaml file, we are referring to the .sql file, i.e., test.sql. So, create a test.sql file in your repository as follows:
    use mysql;
    CREATE TABLE IF NOT EXISTS Persons (
       PersonID int,
       LastName varchar(255),
       FirstName varchar(255),
       Address varchar(255),
       City varchar(255)
    );
     
    SHOW TABLES;

    • In test.sql, we are running MySQL queries, like creating tables.
    • Push changes to your repository main branch.
    • If everything is fine, you will be able to see that the GitHub action is getting executed in a self-hosted runner pod. You can check it under the “Actions” tab of your repository.
    • You can check the workflow logs to see the output of the SHOW TABLES command used in the test.sql file, and check whether the Persons table is created.

  • Ensure Continuous Delivery On Kubernetes With GitOps’ Argo CD

    What is GitOps?

    GitOps is a continuous deployment model for cloud-native applications. In GitOps, the Git repositories that contain the declarative descriptions of the infrastructure are considered the single source of truth for the desired state of the system, and an automated mechanism ensures that the deployed state of the system always matches the state defined in the Git repository. All changes (deployment/upgrade/rollback) to the environment are triggered by changes (commits) made to the Git repository.

    The artifacts that we run on any environment always have a corresponding code for them on some Git repositories. Can we say the same thing for our infrastructure code?

    Infrastructure as code tools, completely declarative orchestration tools like Kubernetes allow us to represent the entire state of our system in a declarative way. GitOps intends to make use of this ability and make infrastructure-related operations more developer-centric.

    Role of Infrastructure as Code (IaC) in GitOps

    The ability to represent the infrastructure as code is at the core of GitOps. But just having version-controlled infrastructure as code doesn’t amount to GitOps; we also need a mechanism in place to keep (or try to keep) our deployed state in sync with the state we define in the Git repository.

    Infrastructure as Code is necessary but not sufficient to achieve GitOps

    GitOps does pull-based deployments

    Most deployment pipelines we see today push changes to the deployed environment. For example, to upgrade our application to a newer version, we update its Docker image tag in some repository, which triggers a deployment pipeline that updates the deployed application. Here, the changes were pushed to the environment. In GitOps, we just update the image tag in the Git repository for that environment, and the changes are pulled into the environment to match the updated state in the Git repository. The magic of keeping the deployed state in sync with the state defined in Git is achieved with the help of operators/agents. The operator is a control loop that identifies differences between the deployed state and the desired state and makes sure they converge.
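    A toy sketch of that control loop, with plain variables standing in for Git and the cluster (real operators such as Argo CD compare manifests against the Kubernetes API, not strings):

```shell
# Desired state (what Git says) vs. deployed state (what is running).
desired_tag='v2'     # image tag committed to the Git repo
deployed_tag='v1'    # image tag currently deployed
if [ "$deployed_tag" != "$desired_tag" ]; then
  echo "drift: deployed=$deployed_tag desired=$desired_tag"
  deployed_tag="$desired_tag"   # the "sync" (pull) step
fi
echo "in sync: deployed=$deployed_tag"
```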

    Key benefits of GitOps:

    1. All the changes are verifiable and auditable as they make their way into the system through Git repositories.
    2. Easy and consistent replication of the environment as Git repository is the single source of truth. This makes disaster recovery much quicker and simpler.
    3. More developer-centric experience for operating infrastructure. Also a smaller learning curve for deploying dev environments.
    4. Consistent rollback of application as well as infrastructure state.

    Introduction to Argo CD

    Argo CD is a continuous delivery tool that works on the principles of GitOps and is built specifically for Kubernetes. The product was developed and open-sourced by Intuit and is currently a part of CNCF.

    Key components of Argo CD:

    1. API Server: Just like K8s, Argo CD also has an API server that exposes APIs that other systems can interact with. The API server is responsible for managing the application, repository and cluster credentials, enforcing authentication and authorization, etc.
    2. Repository server: The repository server keeps a local cache of the Git repository, which holds the K8s manifest files for the application. This service is called by other services to get the K8s manifests.  
    3. Application controller: The application controller continuously watches the deployed state of the application and compares it with the desired state, reports to the API server whenever they are not in sync with each other, and can take corrective actions as well. It is also responsible for executing user-defined hooks for various lifecycle events of the application.

    Key objects/resources in Argo CD:

    1. Application: Argo CD allows us to represent the instance of the application which we want to deploy in an environment by creating Kubernetes objects of a custom resource definition(CRD) named Application. In the specification of Application type objects, we specify the source (repository) of our application’s K8s manifest files, the K8s server where we want to deploy those manifests, namespace, and other information.
    2. AppProject: Just like Application, Argo CD provides another CRD named AppProject. AppProjects are used to logically group related applications.
    3. Repo Credentials: In the case of private repositories, we need to provide access credentials. For credentials, Argo CD uses the K8s secrets and config map. First, we create objects of secret types and then we update a special-purpose configuration map named argocd-cm with the repository URL and the secret which contains the credentials.
    4. Cluster Credentials: Along with Git repository credentials, we also need to provide the K8s cluster credentials. These credentials are also managed using K8s secret, we are required to add the label argocd.argoproj.io/secret-type: cluster to these secrets.

    Demo:

    Enough of theory, let’s try out the things we discussed above. For the demo, I have created a simple app named message-app. This app reads a message set in the environment variable named MESSAGE. We will populate the values of this environment variable using a K8s config map. I have kept the K8s manifest files for the app in a separate repository. We have the application and the K8s manifest files ready. Now we are all set to install Argo CD and deploy our application.

    Installing Argo CD:

    For installing Argo CD, we first need to create a namespace named argocd.

    kubectl create namespace argocd
    kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

    Applying the files from the argo-cd repo directly is fine for demo purposes, but in actual environments, you should copy the files into your own repository before applying them.

    We can see that this command has created the core components and CRDs we discussed earlier in the blog. There are some additional resources as well but we can ignore them for the time being.

    Accessing the Argo CD GUI

    We now have Argo CD running in our cluster. Argo CD also provides a GUI, which gives us a graphical representation of our K8s objects. It allows us to view events, pod logs, and other configurations.

    By default, the GUI service is not exposed outside the cluster. Let us update its service type to LoadBalancer so that we can access it from outside.

    kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'

    After this, we can use the external IP of the argocd-server service and access the GUI. 

    The initial username is admin, and the password is the name of the argocd-server pod. The password can be obtained by listing the pods in the argocd namespace or directly with this command:

    kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server -o name | cut -d'/' -f 2 
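    The command above relies on `kubectl ... -o name` printing names as pod/<pod-name>, with cut stripping the "pod/" prefix. The same pipeline on a sample line (the pod name here is hypothetical):

```shell
# Strip the "pod/" prefix the same way the command above does:
sample='pod/argocd-server-5f7bdc8f9-x2x7q'
pod_name=$(echo "$sample" | cut -d'/' -f 2)
echo "$pod_name"
```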

    Deploy the app:

    Now let’s go ahead and create the Application for the staging environment of our message app.

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: message-app-staging
      namespace: argocd
      labels:
        environment: staging
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
    
      # Source of the application manifests
      source:
        repoURL: https://github.com/akash-gautam/message-app-manifests.git
        targetRevision: HEAD
        path: manifests/staging
    
      # Destination cluster and namespace to deploy the application
      destination:
        server: https://kubernetes.default.svc
        namespace: staging
    
      syncPolicy:
        automated:
          prune: false
          selfHeal: false

    In the application spec, we have specified the repository, where our manifest files are stored and also the path of the files in the repository. 

    We want to deploy our app in the same k8s cluster where ArgoCD is running so we have specified the local k8s service URL in the destination. We want the resources to be deployed in the staging namespace, so we have set it accordingly.

    In the sync policy, we have enabled automated sync. We have kept the project as default. 

    Adding the resources-finalizer.argocd.argoproj.io ensures that all the resources created for the application are deleted when the Application is deleted. This is fine for our demo setup but might not always be desirable in real-life scenarios.

    Our git repos are public so we don’t need to create secrets for git repo credentials.

    We are deploying in the same cluster where Argo CD itself is running. As this is a demo setup, we can use the admin user created by Argo CD, so we don’t need to create secrets for cluster credentials either.

    Now let’s go ahead and create the application and see the magic happen.

    kubectl apply -f message-app-staging.yaml

    As soon as the application is created, we can see it on the GUI. 

    By clicking on the application, we can see all the Kubernetes objects created for it.

    It also shows the objects which are indirectly created by the objects we create. In the above image, we can see the ReplicaSet and Endpoints objects, which were created as a result of creating the deployment and service respectively.

    We can also click on the individual objects and see their configuration. For pods, we can see events and logs as well.

    As our app is deployed now, we can grab the public IP of the message-app service and access it in the browser.

    We can see that our app is deployed and accessible.

    Updating the app

    For updating our application, all we need to do is commit our changes to the GitHub repository. The message-app just displays the message we pass to it via a ConfigMap, so let’s update the message and push it to the repository.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: message-configmap
      labels:
        app: message-app
    data:
      MESSAGE: "This too shall pass" #Put the message you want to display here.

    Once the commit is done, Argo CD will start to sync again.

    Once the sync is done, we will restart our message app pod, so that it picks up the latest values in the config map. Then we need to refresh the browser to see updated values.

    As we discussed earlier, for making any changes to the environment, we just need to update the repo which is being used as the source for the environment and then the changes will get pulled in the environment. 

    We can follow the exact same approach to deploy the application to the production environment as well. We just need to create a new Application object and set the manifest path and deployment namespace accordingly.

    Conclusion: 

    It’s still early days for GitOps, but it has already been successfully implemented at scale by many organizations. As GitOps tools mature along with the ever-growing adoption of Kubernetes, I think many organizations will consider adopting GitOps soon. GitOps is not limited to Kubernetes, but the completely declarative nature of Kubernetes makes it simpler to achieve GitOps. Argo CD is a deployment tool tailored for Kubernetes that lets us do deployments in a Kubernetes-native way while following the principles of GitOps. I hope this blog helped you understand the what, why, and how of GitOps and gave you some insights into Argo CD.

  • A Practical Guide to Deploying Multi-tier Applications on Google Container Engine (GKE)

    Introduction

    All modern-era programmers can attest that containerization affords more flexibility and allows us to build truly cloud-native applications. Containers provide portability: the ability to easily move applications across environments. However, complex applications comprise many (tens or hundreds of) containers, and managing such applications is a real challenge. That’s where container orchestration and scheduling platforms like Kubernetes, Mesosphere, Docker Swarm, etc. come into the picture.
    Kubernetes, backed by Google, is leading the pack, given that Red Hat, Microsoft, and now Amazon are putting their weight behind it.

    Kubernetes can run on any cloud or bare metal infrastructure. Setting up & managing Kubernetes can be a challenge but Google provides an easy way to use Kubernetes through the Google Container Engine(GKE) service.

    What is GKE?

    Google Container Engine is a management and orchestration system for containers. In short, it is hosted Kubernetes. The goal of GKE is to increase the productivity of DevOps and development teams by hiding the complexity of setting up the Kubernetes cluster, the overlay network, etc.

    Why GKE? What are the things that GKE does for the user?

    • GKE abstracts away the complexity of managing a highly available Kubernetes cluster.
    • GKE takes care of the overlay network
    • GKE also provides built-in authentication
    • GKE also provides built-in auto-scaling.
    • GKE also provides easy integration with the Google storage services.

    In this blog, we will see how to create your own Kubernetes cluster in GKE and how to deploy a multi-tier application in it. The blog assumes you have a basic understanding of Kubernetes and have used it before. It also assumes you have created an account with Google Cloud Platform. If you are not familiar with Kubernetes, this guide from Deis  is a good place to start.

    Google provides a command-line interface (gcloud) to interact with all Google Cloud Platform products and services. gcloud is the primary command-line tool for Google Cloud Platform; it can be used in scripts to automate tasks or directly from the command line. Follow this guide to install the gcloud tool.

    Now let’s begin! The first step is to create the cluster.

    Basic Steps to create cluster

    In this section, we will create a GKE cluster using the command-line tool.

    Set the zone in which you want to deploy the cluster

    $ gcloud config set compute/zone us-west1-a

    Create the cluster using the following command:

    $ gcloud container --project <project-name> \
        clusters create <cluster-name> \
        --machine-type n1-standard-2 \
        --image-type "COS" --disk-size "50" \
        --num-nodes 2 --network default \
        --enable-cloud-logging --no-enable-cloud-monitoring

    Let’s try to understand what each of these parameters means:

    --project: Project name.

    --machine-type: Type of the machine, like n1-standard-2 or n1-standard-4.

    --image-type: OS image. “COS” is the Container-Optimized OS from Google; more info here.

    --disk-size: Disk size of each instance.

    --num-nodes: Number of nodes in the cluster.

    --network: Network to use for the cluster. In this case, we are using the default network.

    Apart from the above options, you can also use the following to provide specific requirements while creating the cluster:

    --scopes: Scopes allow the cluster’s instances to directly access Google services without needing credentials. You can specify a comma-separated list of scope APIs. For example:

    • compute: Lets you view and manage your Google Compute Engine resources.
    • logging.write: Submit log data to Stackdriver.

    You can find all the scopes that Google supports here.

    --additional-zones: Specify additional zones for high availability, e.g., --additional-zones us-east1-b,us-east1-d. Here Kubernetes will create a cluster across 3 zones (1 specified at the beginning and 2 additional ones here).

    --enable-autoscaling: Enables the autoscaling option. If you specify this option, then you also have to specify the minimum and maximum number of nodes, e.g., --enable-autoscaling --min-nodes=15 --max-nodes=50. You can read more about how autoscaling works here.

    You can fetch the credentials of the created cluster. This step updates the credentials in the kubeconfig file so that kubectl points to the required cluster.

    $ gcloud container clusters get-credentials my-first-cluster --project project-name

    Now, your First Kubernetes cluster is ready. Let’s check the cluster information & health.

    $ kubectl get nodes
    NAME    STATUS    AGE   VERSION
    gke-first-cluster-default-pool-d344484d-vnj1  Ready  2h  v1.6.4
    gke-first-cluster-default-pool-d344484d-kdd7  Ready  2h  v1.6.4
    gke-first-cluster-default-pool-d344484d-ytre2  Ready  2h  v1.6.4

    After creating the cluster, let’s see how to deploy a multi-tier application on it. We’ll use a simple Python Flask app that greets the user, stores employee data, and retrieves employee data.

    Application Deployment

    I have created a simple Python Flask application to deploy on the K8s cluster created using GKE. You can go through the source code here. If you check the source code, you will find the directory structure as follows:

    TryGKE/
    ├── Dockerfile
    ├── mysql-deployment.yaml
    ├── mysql-service.yaml
    ├── src
    │   ├── app.py
    │   └── requirements.txt
    ├── testapp-deployment.yaml
    └── testapp-service.yaml

    In this, I have written a Dockerfile for the Python Flask application in order to build our own image to deploy. For MySQL, we won’t build an image of our own. We will use the latest MySQL image from the public docker repository.

    Before deploying the application, let’s re-visit some of the important Kubernetes terms:

    Pods:

    A pod is a Docker container or a group of Docker containers that are deployed together on the host machine. It acts as a single unit of deployment.

    Deployments:

    A Deployment is an entity that manages ReplicaSets and provides declarative updates to pods. It is recommended to use Deployments instead of directly using ReplicaSets. We can use a Deployment to create, remove, and update ReplicaSets. Deployments have the ability to roll out and roll back changes.

    Services:

    A service in K8s is an abstraction which connects you to one or more pods. You can connect to a pod using the pod’s IP address, but since pods come and go, their IP addresses change. Services get their own IP & DNS, and those remain for the entire lifetime of the service.

    Each tier in an application is represented by a Deployment. A Deployment is described by the YAML file. We have two YAML files – one for MySQL and one for the Python application.

    1. MySQL Deployment YAML

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: mysql
    spec:
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
            - env:
                - name: MYSQL_DATABASE
                  value: admin
                - name: MYSQL_ROOT_PASSWORD
                  value: admin
              image: 'mysql:latest'
              name: mysql
              ports:
                - name: mysqlport
                  containerPort: 3306
                  protocol: TCP

    2. Python Application Deployment YAML

    apiVersion: apps/v1beta1
    kind: Deployment
    metadata:
      name: test-app
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: test-app
        spec:
          containers:
          - name: test-app
            image: ajaynemade/pymy:latest
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 5000
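    Note that both Deployment manifests above use API groups (extensions/v1beta1 and apps/v1beta1) that were removed in Kubernetes 1.16. On current clusters, the same Deployment would start like the sketch below (apps/v1 also makes spec.selector mandatory; the container section is abbreviated):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  selector:            # required in apps/v1
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql     # must match the selector above
    spec:
      containers:
        - name: mysql
          image: mysql:latest
```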

    Each Service is also represented by a YAML file as follows:

    1. MySQL service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: mysql-service
    spec:
      ports:
      - port: 3306
        targetPort: 3306
        protocol: TCP
        name: http
      selector:
        app: mysql

    2. Python Application service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: test-service
    spec:
      type: LoadBalancer
      ports:
      - name: test-service
        port: 80
        protocol: TCP
        targetPort: 5000
      selector:
        app: test-app

    You will find a ‘kind’ field in each YAML file. It is used to specify whether the given configuration is for deployment, service, pod, etc.

    In the Python app service YAML, I am using type = LoadBalancer. In GKE, there are two types of cloud load balancers available to expose the application to the outside world.

    1. TCP load balancer: This is a TCP Proxy-based load balancer. We will use this in our example.
    2. HTTP(s) load balancer: It can be created using Ingress. For more information, refer to this post that talks about Ingress in detail.

    In the MySQL service, I’ve not specified any type; in that case, the type ‘ClusterIP’ is used, which makes sure that the MySQL container is exposed within the cluster so the Python app can access it.

    If you check app.py, you can see that I have used “mysql-service.default” as the hostname. “mysql-service.default” is the DNS name of the service. The Python application refers to that DNS name while accessing the MySQL database.

    Now, let’s actually setup the components from the configurations. As mentioned above, we will first create services followed by deployments.

    Services:

    $ kubectl create -f mysql-service.yaml
    $ kubectl create -f testapp-service.yaml

    Deployments:

    $ kubectl create -f mysql-deployment.yaml
    $ kubectl create -f testapp-deployment.yaml

    Check the status of the pods and services. Wait till all the pods are in the running state and the Python application service gets an external IP, like below:

    $ kubectl get services
    NAME            CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
    kubernetes      10.55.240.1     <none>        443/TCP        5h
    mysql-service   10.55.240.57    <none>        3306/TCP       1m
    test-service    10.55.246.105   35.185.225.67     80:32546/TCP   11s

    Once you get the external IP, you should be able to make API calls using simple curl requests.

    Eg. To Store Data :

    curl -H "Content-Type: application/x-www-form-urlencoded" -X POST  http://35.185.225.67:80/storedata -d id=1 -d name=NoOne

    Eg. To Get Data :

    curl 35.185.225.67:80/getdata/1

    At this stage your application is completely deployed and is externally accessible.

    Manual scaling of pods

    Scaling your application up or down in Kubernetes is quite straightforward. Let’s scale up the test-app deployment.

    $ kubectl scale deployment test-app --replicas=3

    The deployment configuration for test-app will get updated, and you can see that 3 replicas of test-app are running. Verify it using:

    kubectl get pods

    In the same manner, you can scale down your application by reducing the replica count.

    Cleanup :

    Un-deploying an application from Kubernetes is also quite straightforward. All we have to do is delete the services and delete the deployments. The only caveat is that the deletion of the load balancer is an asynchronous process. You have to wait until it gets deleted.

    $ kubectl delete service mysql-service
    $ kubectl delete service test-service

    The above command will deallocate the load balancer which was created as a part of test-service. You can check the status of the load balancer with the following command.

    $ gcloud compute forwarding-rules list

    Once the load balancer is deleted, you can clean-up the deployments as well.

    $ kubectl delete deployments test-app
    $ kubectl delete deployments mysql

    Delete the Cluster:

    $ gcloud container clusters delete my-first-cluster

    Conclusion

    In this blog, we saw how easy it is to deploy, scale & terminate applications on Google Container Engine. Google Container Engine abstracts away all the complexity of Kubernetes and gives us a robust platform to run containerised applications. I am super excited about what the future holds for Kubernetes!

    Check out some of Velotio’s other blogs on Kubernetes.

  • Helm 3: A More Secured and Simpler Kubernetes Package Manager

    What is Helm?

    Helm helps you manage Kubernetes applications. Helm Charts help developers and operators easily define, install, and upgrade even the most complex Kubernetes application.

    Below are the three big concepts regarding Helm.

    1. Chart – A chart is a Helm package. It contains all resource definitions necessary to run an application, tool or service inside the Kubernetes cluster.

    2. Repository – A repository is a place where charts can be collected and shared.

    3. Release – A release is an instance of a chart running in a Kubernetes cluster. One chart can often be installed many times in the same cluster, and each time it is installed, a new release is created.

    Registry – Helm Registry stores Helm charts in a hierarchy storage structure and provides a function to orchestrate charts from the existing charts. To deploy and configure registry, refer to this.

    Why Helm?

    1. It helps you find and use popular software packaged as Kubernetes charts
    2. It helps you share your own applications as Kubernetes charts
    3. It manages releases of Helm packages
    4. It creates reproducible builds of your Kubernetes applications

    Changes since Helm2

    Helm 3 includes the following major changes:

    1. Client-only architecture

    Helm 2 has a client-server architecture, with the client called Helm and the server called Tiller. The client interacts with Tiller and the chart repository. Tiller interacts with the Kubernetes API server. It renders Helm template files into Kubernetes manifest files, which it uses for operations on the Kubernetes cluster through the Kubernetes API.

    Helm 3 has a client-only architecture, with the client still called Helm. It operates much like the Helm 2 client, but interacts directly with the Kubernetes API server. The in-cluster server, Tiller, is removed in Helm 3.

    2. No need to initialize Helm

    Initializing Helm is obsolete in version 3: helm init was removed, so you no longer need to install Tiller in the cluster or set up Helm state before using Helm. Helm state is created automatically whenever required.

    3. Chart dependency updated

    In Helm 2, chart dependencies are declared in requirements.yaml, as shown in the following example:

    dependencies:
    - name: mysql
      version: "1.3.2"
      repository: "https://example.com/charts/mysql"

    Helm 3 consolidates chart dependencies, moving the dependency definitions into Chart.yaml.
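    For comparison, the same dependency in Helm 3 lives in Chart.yaml; a minimal sketch (the chart name myapp is hypothetical):

    ```yaml
    # Chart.yaml (Helm 3): dependencies sit alongside the chart metadata
    apiVersion: v2
    name: myapp
    version: 0.1.0
    dependencies:
    - name: mysql
      version: "1.3.2"
      repository: "https://example.com/charts/mysql"
    ```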

    4. Chart value validation

    In Helm 3, values passed to a chart during any Helm command can be validated against a JSON schema. This helps chart consumers avoid setting incorrect values and improves chart usability. To enable validation, add a schema file named values.schema.json to the chart folder.

    The following commands invoke this validation:

    • helm install
    • helm upgrade
    • helm template
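    As a minimal sketch, a values.schema.json for the mysql chart built later in this post could constrain the service port (the property names mirror that chart's values.yaml):

    ```json
    {
      "$schema": "https://json-schema.org/draft-07/schema#",
      "type": "object",
      "properties": {
        "service": {
          "type": "object",
          "properties": {
            "port": { "type": "integer", "minimum": 1, "maximum": 65535 }
          },
          "required": ["port"]
        }
      },
      "required": ["service"]
    }
    ```

    With this file in place, an install that sets service.port to a non-integer value fails before anything reaches the cluster.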

    5. Helm test framework updates

    Helm 3 includes the following updates to the test framework (helm test):

    • Users can define tests as job resources
    • The test-failure hook was removed
    • The test-success hook was renamed to test, though test-success remains as an alias
    • You can dump logs from test pods with the --logs flag
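    Putting those updates together, a Helm 3 test resource is an ordinary pod (or job) template carrying the test hook annotation. A sketch modeled on the test-connection.yaml that helm create generates (the container image and command are illustrative):

    ```yaml
    # templates/tests/test-connection.yaml: runs when you execute `helm test <release>`
    apiVersion: v1
    kind: Pod
    metadata:
      name: "{{ include "mysql.fullname" . }}-test-connection"
      annotations:
        "helm.sh/hook": test  # Helm 3 hook name; test-success remains an alias
    spec:
      containers:
      - name: port-check
        image: busybox
        command: ["nc", "-z", "{{ include "mysql.fullname" . }}", "{{ .Values.service.port }}"]
      restartPolicy: Never
    ```

    After installing a release, helm test <release> --logs runs the pod and streams its logs.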

    Helm 3 is about more than just removing Tiller; it brings a lot of new capabilities. Even so, from a CLI and usage point of view, there is little or no difference between Helm 3 and Helm 2.

    Prerequisites

    1. A running Kubernetes cluster.
    2. The Kubernetes cluster API endpoint should be reachable from the machine you are running Helm commands.

    Installing Helm 

    1. Download binary from here.
    2. Unpack it (tar -zxvf helm-v3.0.0-linux-amd64.tgz)
    3. Find the Helm binary and move it to its desired destination (mv linux-amd64/helm /usr/local/bin/helm)

    From there, you should be able to run the client command: 'helm help'.

    Note: We will be using Helm version 3.0.0

    Deploy a sample Helm Chart

    Use the command below to create a new chart named mysql in a new directory:

    $ helm create mysql

    After running the above command, Helm creates a directory with the following layout:

    velotiotech:~/work/mysql$ tree
    .
    ├── charts
    ├── Chart.yaml
    ├── templates
    │   ├── deployment.yaml
    │   ├── _helpers.tpl
    │   ├── ingress.yaml
    │   ├── NOTES.txt
    │   ├── serviceaccount.yaml
    │   ├── service.yaml
    │   └── tests
    │       └── test-connection.yaml
    └── values.yaml
    
    3 directories, 9 files

    It creates a Chart.yaml file containing global variables for the chart such as version and description.

    velotiotech:~/work/mysql$ cat Chart.yaml 
    apiVersion: v2
    name: mysql
    description: A Helm chart for Kubernetes
    
    # A chart can be either an 'application' or a 'library' chart.
    #
    # Application charts are a collection of templates that can be packaged into versioned archives
    # to be deployed.
    #
    # Library charts provide useful utilities or functions for the chart developer. They're included as
    # a dependency of application charts to inject those utilities and functions into the rendering
    # pipeline. Library charts do not define any templates and therefore cannot be deployed.
    type: application
    
    # This is the chart version. This version number should be incremented each time you make changes
    # to the chart and its templates, including the app version.
    version: 0.1.0
    
    # This is the version number of the application being deployed. This version number should be
    # incremented each time you make changes to the application.
    appVersion: 1.16.0

    Then comes the templates directory, where you put all the *.yaml files for Kubernetes. Helm uses the Go templating language to customize these files. Helm creates three default file types: deployment, service, and ingress. All the files in this directory are skeletons that are filled in with the variables from values.yaml when you deploy your Helm chart. The file _helpers.tpl contains your custom helper functions for variable calculation.

    By default, Helm creates an nginx deployment. We will customize it into a Helm chart that deploys MySQL on a Kubernetes cluster. Replace the default deployment in the templates directory:

    velotiotech:~/work/mysql$ cat templates/deployment.yaml 
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: {{ include "mysql.fullname" . }}
    spec:
      selector:
        matchLabels:
          app: {{ include "mysql.name" . }}
      template:
        metadata:
          labels:
            app: {{ include "mysql.name" . }}
        spec:
          containers:
          - name: {{ .Chart.Name }}
            image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
            imagePullPolicy: {{ .Values.image.pullPolicy }}
            env:
            - name: MYSQL_ROOT_PASSWORD
              value: {{ .Values.mysql_root_password }}
            ports:
            - containerPort: {{ .Values.service.port }}
              name: mysql
            # Mount the PVC so MySQL data survives pod restarts
            volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql
          volumes:
          - name: mysql-persistent-storage
            persistentVolumeClaim:
              claimName: {{ .Values.persistentVolumeClaim }}

    Also, let's create the PVC used in the deployment by adding the file below to the templates directory.

    velotiotech:~/work/mysql$ cat templates/persistentVolumeClaim.yml 
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: {{ .Values.persistentVolumeClaim }}
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi

    Helm runs each file in the templates directory through the Go template rendering engine. Let's create service.yaml for connecting to the MySQL instance.

    velotiotech:~/work/mysql$ cat templates/service.yaml 
    apiVersion: v1
    kind: Service
    metadata:
      name: {{ include "mysql.fullname" . }}
    spec:
      ports:
      - port: {{ .Values.service.port }}
      selector:
        app: {{ include "mysql.name" . }}
      clusterIP: None

    Update values.yaml to populate the above chart’s templates.

    velotiotech:~/work/mysql$ cat values.yaml 
    # Default values for mysql.
    # This is a YAML-formatted file.
    # Declare variables to be passed into your templates.
    
    image:
      repository: mysql
      tag: 5.6
      pullPolicy: IfNotPresent
    
    nameOverride: ""
    fullnameOverride: ""
    
    serviceAccount:
      # Specifies whether a service account should be created
      create: false
      # The name of the service account to use.
      # If not set and create is true, a name is generated using the fullname template
      name:
    
    mysql_root_password: password 
    
    service:
      port: 3306
    
    persistentVolumeClaim: mysql-data-disk
    
    resources: {}
      # We usually recommend not to specify default resources and to leave this as a conscious
      # choice for the user. This also increases chances charts run on environments with little
      # resources, such as Minikube. If you do want to specify resources, uncomment the following
      # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
      # limits:
      #   cpu: 100m
      #   memory: 128Mi
      # requests:
      #   cpu: 100m
      #   memory: 128Mi

    After adding the above files, the directory structure will look like this:

    velotiotech:~/work/mysql$ tree
    .
    ├── charts
    ├── Chart.yaml
    ├── templates
    │   ├── deployment.yaml
    │   ├── _helpers.tpl
    │   ├── NOTES.txt
    │   ├── persistentVolumeClaim.yml
    │   ├── serviceaccount.yaml
    │   ├── service.yaml
    │   └── tests
    │       └── test-connection.yaml
    └── values.yaml
    
    3 directories, 9 files

    To render the chart templates locally and display the output, so you can check that everything is correct, run:

    $ helm template mysql
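    helm template also accepts flags to narrow and tweak the output; for example, --show-only renders a single template file (the overridden port value is illustrative):

    ```shell
    # Render only the service template, overriding one value inline
    helm template mysql-release ./mysql \
      --show-only templates/service.yaml \
      --set service.port=3307
    ```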

    Execute the following helm install command to deploy our mysql chart in the Kubernetes cluster.

    $ helm install mysql-release ./mysql

    velotiotech:~/work$ helm install mysql-release ./mysql
    NAME: mysql-release
    LAST DEPLOYED: Mon Nov 25 14:48:38 2019
    NAMESPACE: mysql-chart
    STATUS: deployed
    REVISION: 1
    NOTES:
    1. Use below command to connect to mysql:
       kubectl run -it --rm --image=mysql:5.6 --restart=Never mysql-client -- mysql -h mysql-release -ppassword
    
    2. Try creating database in mysql using command:
       create database test;

    Now the chart is installed. Note that installing a Helm chart creates a new release object. The release above is named mysql-release.

    To keep track of a release's state, or to re-read configuration information, you can use helm status:

    $ helm status mysql-release

    Additionally, to create a package, use the command below, which takes the path to a chart directory (one that contains a Chart.yaml file) and packages that directory:

    $ helm package <chart_directory>

    This command creates an archive like mysql-0.1.0.tgz, with which you can share your chart with others. For instance, you can upload this file to the Helm repository.

    You can also delete the sample release using the delete command. For example:

    $ helm delete mysql-release

    Upgrade a release

    Helm provides a way to perform an install or an upgrade as a single command: use helm upgrade with the --install flag. Helm then checks whether the release is already installed; if not, it runs an install, and if it is, the existing release is upgraded.

    $ helm upgrade --install <release name> --values <values file> <chart directory>
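    For the mysql chart above, a concrete invocation might look like this (the overridden image tag is illustrative):

    ```shell
    # Installs mysql-release on the first run, upgrades it on later runs
    helm upgrade --install mysql-release ./mysql --set image.tag=5.7
    ```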