Category: Cloud & DevOps

  • Making Your Terminal More Productive With Z-Shell (ZSH)

    When working with servers or command-line-based applications, we spend most of our time on the command line. A good-looking and productive terminal is better in many aspects than a GUI (Graphical User Interface) environment since the command line takes less time for most use cases. Today, we’ll look at some of the features that make a terminal cool and productive.

    You can use the following steps on Ubuntu 20.04. If you are using a different operating system, your commands will likely differ. If you’re using Windows, you can choose between Cygwin, WSL, and Git Bash.

    Prerequisites

    Let’s upgrade the system and install some basic tools needed.

    sudo apt update && sudo apt upgrade
    sudo apt install build-essential curl wget git

    Z-Shell (ZSH)

    Zsh is an extended Bourne shell with many improvements, including some features of Bash and other shells.

    Let’s install Z-Shell:

    sudo apt install zsh

    Make it our default shell for our terminal:

    chsh -s $(which zsh)

Now log out and log back in (or restart the system), then open the terminal again to be welcomed by ZSH. Unlike other shells such as Bash, ZSH requires some initial configuration, so it asks for some configuration options the first time we start it and saves them in a file called .zshrc in the home directory (/home/user, where user is the current system user).

    For now, we’ll skip the manual work and get a head start with the default configuration. Press 2, and ZSH will populate the .zshrc file with some default options. We can change these later.  

The initial configuration setup can be run again at any time by re-running zsh-newuser-install.
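You can also confirm that zsh is now the login shell for your user (a quick sanity check, not strictly required):

```shell
# Print the login shell recorded for the current user in /etc/passwd
getent passwd "$(id -un)" | cut -d: -f7
```

If this still prints /bin/bash, log out and back in so the chsh change takes effect.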

    Oh-My-ZSH

Oh-My-ZSH is a community-driven, open-source framework for managing your ZSH configuration. It comes with many plugins and helpers and can be installed with a single command, as below.

    Installation

    sh -c "$(wget https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh -O -)"

The installer backs up your existing .zshrc to a file named .zshrc.pre-oh-my-zsh, so if you ever uninstall Oh-My-ZSH, the backup is restored automatically.

    Font

A good terminal needs a good font. We'll use the Terminess Nerd Font to make our terminal look awesome, which can be downloaded here. Once downloaded, extract the fonts and move them to ~/.local/share/fonts to make them available for the current user, or to /usr/share/fonts to make them available for all users.

unzip Terminess.zip -d Terminess
mv Terminess/*.ttf ~/.local/share/fonts
fc-cache -f

Once the font is installed and selected in your terminal emulator's settings, the extra glyphs and icons will render correctly.

Among all the things Oh-My-ZSH provides, two are community favorites: plugins and themes.

    Theme

    My go-to ZSH theme is powerlevel10k because it’s flexible, provides everything out of the box, and is easy to install with one command as shown below:

    git clone --depth=1 https://github.com/romkatv/powerlevel10k.git ${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}/themes/powerlevel10k

To set this theme, open the .zshrc file and set the ZSH_THEME variable:

ZSH_THEME="powerlevel10k/powerlevel10k"

Close the terminal and start it again. Powerlevel10k will welcome you with its initial setup; go through it with the options you want. You can run this setup again by executing the below command:

    p10k configure

    Tools and plugins we can’t live without

Plugins are enabled by adding them to the plugins array in the .zshrc file. For every plugin from the list below that you want to use, add its name to that array; the complete array we'll end up with is shown at the end of this section.

    ZSH-Syntax-Highlighting

    This enables the highlighting of commands as you type and helps you catch syntax errors before you execute them:

For example, a valid command like ls is shown in green, while the mistyped lss is shown in red.

Execute the below command to install it:

    git clone https://github.com/zsh-users/zsh-syntax-highlighting.git ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-syntax-highlighting

    ZSH Autosuggestions

    This suggests commands as you type based on your history:

Install it by cloning the Git repo with the below command:

    git clone https://github.com/zsh-users/zsh-autosuggestions ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-autosuggestions

    ZSH Completions

For some extra Zsh completion scripts, execute the below command:

git clone https://github.com/zsh-users/zsh-completions ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-completions

    autojump

    It’s a faster way of navigating the file system; it works by maintaining a database of directories you visit the most. More details can be found here.

    sudo apt install autojump 

You can also use the z plugin as an alternative if you're unable to install autojump for any reason.

    Internal Plugins

Some plugins come bundled with Oh-My-ZSH and can be enabled directly in the .zshrc file without any installation.

    copyfile

    It copies the content of a file to the clipboard.

    copyfile test.txt

    copypath

    It copies the absolute path of the current directory to the clipboard.

    copybuffer

    This plugin copies the command that is currently typed in the command prompt to the clipboard. It works with the keyboard shortcut CTRL + o.

    sudo

    Sometimes, we forget to prefix a command with sudo, but that can be done in just a second with this plugin. When you hit the ESC key twice, it will prefix the command you’ve typed in the terminal with sudo.

    web-search

    This adds some aliases for searching with Google, Wikipedia, etc. For example, if you want to web-search with Google, you can execute the below command:

    google oh my zsh

Doing so will open the search in Google.

    More details can be found here.

Remember, you'll have to add each of these plugins to the .zshrc file as well. In the end, this is how the plugins array in the .zshrc file should look:

    plugins=(
            zsh-autosuggestions
            zsh-syntax-highlighting
            zsh-completions
            autojump
            copyfile
        copypath
            copybuffer
            history
            dirhistory
            sudo
            web-search
            git
    ) 

    You can add more plugins, like docker, heroku, kubectl, npm, jsontools, etc., if you’re a developer. There are plugins for system admins as well or for anything else you need. You can explore them here.

    Enhancd

Enhancd is a next-generation way to navigate the file system from the CLI. It works with a fuzzy finder; we'll install fzf for this purpose.

    sudo apt install fzf

Enhancd can be installed with the zplug plugin manager for Zsh, so first we'll install zplug with the below command:

curl -sL --proto-redir -all,https https://raw.githubusercontent.com/zplug/installer/master/installer.zsh | zsh

    Append the following to .zshrc:

    source ~/.zplug/init.zsh
    zplug load

Now add the following line to .zshrc (between the source line and zplug load), then open a new terminal and run zplug install to fetch enhancd:

    zplug "b4b4r07/enhancd", use:init.sh

    Aliases

As a developer, I need to execute Git commands many times a day, and typing each command in full every time is cumbersome, so we can use aliases for them. Aliases need to be added to .zshrc; here's how we can add them.

    alias gs='git status'
    alias ga='git add .'
    alias gf='git fetch'
    alias gr='git rebase'
    alias gp='git push'
    alias gd='git diff'
    alias gc='git commit'
    alias gh='git checkout'
    alias gst='git stash'
    alias gl='git log --oneline --graph'

    You can add these anywhere in the .zshrc file.

    Colorls

    Another tool that makes you say wow is Colorls. This tool colorizes the output of the ls command. This is how it looks once you install it:

It's built with Ruby; below is how you can install both Ruby and Colorls:

    sudo apt install ruby ruby-dev ruby-colorize
    sudo gem install colorls

Now, restart your terminal and execute the command colorls to see the magic!

Bonus – we can also add some aliases if we want ls to produce the same output as Colorls. Note that we're adding a cl alias that runs the original ls (via command, which bypasses aliases) so it remains available.

alias cl='command ls'
alias ls='colorls'
alias la='colorls -a'
alias ll='colorls -l'
alias lla='colorls -la'
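If your .zshrc is shared across machines, you may want to guard these aliases so plain ls keeps working where Colorls isn't installed. A small optional variant:

```shell
# Only shadow ls when colorls is actually available on this machine
if command -v colorls >/dev/null 2>&1; then
  alias ls='colorls'
  alias la='colorls -a'
  alias ll='colorls -l'
fi
```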

These are the tools and plugins I can't live without now. Let me know if I've missed anything.

    Automation

Would you want to repeat this whole process if, say, you bought a new laptop and wanted the same setup?

If your answer is no, you can automate all of it, and that's why I've created Project Automator. This project does a lot more than just setting up a terminal: it works with Arch Linux as of now, but you can take the parts you need and make it work with almost any *nix system you like.

    Explaining how it works is beyond the scope of this article, so I’ll have to leave you guys here to explore it on your own.

    Conclusion

We need to perform many tasks on our systems, and using a GUI (Graphical User Interface) tool can consume a lot of time, especially if you repeat the same task daily, like converting a media stream or setting up tools on a system.

    Using a command-line tool can save you a lot of time and you can automate repetitive tasks with scripting. It can be a great tool for your arsenal.

  • Payments Engineering & Intelligence

    Our Payments Engineering & Intelligence flyer showcases how we help enterprises modernize and scale payment ecosystems with:

    • End-to-end gateway lifecycle management, ensuring uptime, compliance, and API continuity across global acquirers
    • AI-driven fraud detection and risk scoring for secure, low-friction transaction experiences
    • Automated reconciliation, settlement, and treasury workflows for error-free, real-time financial control
    • Modular SDKs, tokenization, and smart routing to accelerate integration and reduce cost of ownership
    • Payments Intelligence dashboards and ML pipelines turning transaction data into actionable insights

    With this flyer you will –

    • See how enterprises achieve up to 75% faster integration cycles and 40% reduction in operational overhead
    • Discover how R Systems helps transform payments from a cost center to a strategic revenue enabler
    • Learn how to build resilient, compliant, and intelligent payments architectures ready for scale
  • OptimaAI Suite

    Our OptimaAI Suite flyer showcases how R Systems helps enterprises harness GenAI across the entire software lifecycle with:

    • AI-assisted software delivery copilots for coding, reviews, testing, and deployment
    • GenAI-powered modernization for legacy systems, accelerating transformation
    • Secure, governed frameworks with responsible AI guardrails and compliance checks
    • Intelligent interfaces, chatbots, copilots, voice agents, and search to boost user productivity
    • Domain-specific LLMs, pipelines, and accelerators tailored to industry needs

    With this flyer you will –

    • See how organizations achieved 18% faster development and 16% efficiency gains in modernization
    • Discover proven OptimaAI Suite implementations that reduce costs, enhance quality, and speed innovation
    • Learn how to scale AI adoption responsibly across engineering, operations, and customer experience
  • LegalTech

Our LegalTech flyer explores how R Systems empowers the LegalTech industry and corporate legal departments with:

    • AI-powered contract analysis and intelligent workflow automation
    • Secure cloud-enabled infrastructure for collaboration and scale
    • Predictive analytics and actionable insights for smarter decisions
    • Seamless integration between legacy systems and modern LegalTech

    With this flyer you will – 

    • See how leading firms achieved 50% faster case resolutions and 65% cost savings
    • Discover proven LegalTech solutions that reduce manual work and compliance risks
    • Learn how to modernize legal operations without disrupting existing systems
  • Shebang Your Shell Commands with GenAI using AWS Bedrock

    Generative AI (GenAI) is no longer a mystery—it’s been around for over two years now. Developers are leveraging GenAI for a wide range of tasks: writing code, handling customer queries, powering RAG pipelines for data retrieval, generating images and videos from text, and much more.

    In this blog post, we’ll integrate an AI model directly into the shell, enabling real-time translation of natural language queries into Linux shell commands—no more copying and pasting from tools like ChatGPT or Google Gemini. Even if you’re a Linux power user who knows most commands by heart, there are always moments when a specific command escapes you. We’ll use Amazon Bedrock, a fully managed serverless service, to run inferences with the model of our choice. For development and testing, we’ll start with local model hosting using Ollama and Open WebUI. Shell integration examples will cover both Zsh and Bash.

    Setting up Ollama and OpenWebUI for prompt testing

    1. Install Ollama

    curl -fsSL https://ollama.com/install.sh | sh

    2. Start Ollama service

sudo systemctl enable --now ollama

    By default, Ollama listens on port 11434. If you’re comfortable without a user interface like ChatGPT, you can start sending prompts directly to the /api/generate endpoint using tools like curl or Postman. Alternatively, you can run a model from the shell using:

    ollama run <model_name>

3. Install Open WebUI

At this step, we assume that you have Python and pip installed.

    pip install open-webui

Now that Open WebUI is installed, let's pull a model and begin prompt development. For this example, we'll use the Mistral model locally.

    4. Pull mistral:7b or mistral:latest model

    ollama pull mistral:latest

    5. Start Open-WebUI server

    open-webui serve

    This starts the Open WebUI on the default port 8080. Open your favorite web browser and navigate to http://localhost:8080/. Set an initial username and password. Once configured, you’ll see an interface similar to ChatGPT. You can choose your model from the dropdown in the top-left corner.

    Testing the prompt in Open-WebUI and with API calls:

    Goal:

    • User types a natural language query
    • Model receives the input and processes it
    • Model generates a structured JSON output
    • The shell replaces the original query with the actual command

    Why Structured Output Instead of Plain Text?

    You might wonder—why not just instruct the model to return a plain shell command with strict prompting rules? During testing, we observed that even with rigid prompt instructions, the model occasionally includes explanatory text. This often happens when the command in question could be dangerous or needs caution.

    For instance, the dd command can write directly to disk at a low level. Models like Mistral or Llama may append a warning or explanation along with the command to prevent accidental misuse. Using structured JSON helps us isolate the actual command cleanly, regardless of any extra text the model may generate.
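To sketch how that isolation works, here is the parsing step with a stub standing in for the model (stub_model is hypothetical; in the real setup the response comes from Ollama or Bedrock):

```shell
# Stub standing in for model inference; it returns JSON in the schema used in this post
stub_model() {
  echo '{"command": "docker ps -a", "notes": "", "status": "success"}'
}

query="list all docker containers"            # the user's natural language input
response=$(stub_model "$query")               # model returns structured JSON
cmd=$(echo "$response" | jq -r '.command')    # isolate just the command field
echo "$cmd"
```

Because we only read the command field, any warning or explanation the model packs into notes never reaches the shell line.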

    The Prompt:

    You are a linux system administrator and devops engineer assistant used in an automated system that parses your responses as raw JSON.
    STRICT RULES:
    - Output MUST be only valid raw JSON. Do NOT include markdown, backticks, or formatting tags.
    - NO explanations, no introductory text, and no comments.
    - If no suitable command is found, output: {"command": "", "notes": "no command found", "status": "error"}
    - Output must always follow this exact schema:
    {
        "command": "<actual Linux command here>",
        "notes": "<if applicable, add any notes to the command>",
        "status": "success/error"
    }
    - Any deviation from this format will result in system error.
    Respond to the following user query as per the rules above:
    <Query Here>

    Let’s test it with the query:
    “start nginx container backed by alpine image”

    And here’s the structured response we get:

    {
    "command": "docker run -d --name my-nginx -p 80:80 -p 443:443 -v /etc/nginx/conf.d:/etc/nginx/conf.d nginx:alpine",  
    "notes": "Replace 'my-nginx' with a suitable container name.",  
    "status": "success"
    }

    Bingo! This is exactly the output we expect—clean, structured, and ready for direct use.

    Now that our prompt works as expected, we can test it directly via Ollama’s API.

Assuming the payload shown below is saved in /tmp/payload.json, you can make the API call using curl:

{
  "model": "phi4:latest",
  "prompt": "You are a linux system administrator and devops engineer assistant used in an automated system that parses your responses as raw JSON.\nSTRICT RULES:\n- Output MUST be only valid raw JSON. Do NOT include markdown, backticks, or formatting tags.\n- NO explanations, no introductory text, and no comments.\n- If no suitable command is found, output: {\"command\": \"\", \"notes\": \"no command found\", \"status\": \"error\"}\n- Output must always follow this exact schema:\n{\n    \"command\": \"<actual Linux command here>\",\n    \"notes\": \"<if applicable, add any notes to the command>\",\n    \"status\": \"success/error\"\n}\n- Any deviation from this format will result in system error.\nRespond to the following user query as per the rules above:\nstart nginx container backed by alpine image",
  "stream": false
}

    curl -d @/tmp/payload.json -H 'Content-Type: application/json' 'http://localhost:11434/api/generate'

Note: Ensure that smart quotes (‘ ’ and “ ”) are not used in your actual command—replace them with straight quotes (' and ") to avoid errors in the terminal.

    This allows you to interact with the model programmatically, bypassing the UI and integrating the prompt into automated workflows or CLI tools.

    Setting up AWS Bedrock Managed Service

    Login to the AWS Console and navigate to the Bedrock service.

    Under Foundation Models, filter by Serverless models.

    Subscribe to a model that suits code generation use cases. For this blog, I’ve chosen Anthropic Claude 3.7 Sonnet, known for strong code generation capabilities.
    Alternatively, you can go with Amazon Titan or Amazon Nova models, which are more cost-effective and often produce comparable results.

    Configure Prompt Management

    1. Once subscribed, go to the left sidebar and under Builder Tools, click on Prompt Management.

    2. Click Create prompt and give it a name—e.g., Shebang-NLP-TO-SHELL-CMD.

    3. In the next window:

    • Expand System Instructions and paste the structured prompt we tested earlier (excluding the <Query Here> placeholder).
    • In the User Message, enter {{question}} — this will act as a placeholder for the user’s natural language query.

    4. Under Generative AI Resource, select your subscribed model.

    5. Leave the randomness and diversity settings as default. You may reduce the temperature slightly to get more deterministic responses, depending on your needs.

    6. At the bottom of the screen, you should see the question variable under the Test Variables section.
    Add a sample value like: list all docker containers

    7. Click Run. You should see the structured JSON response on the right pane.

    8. If the output looks good, click Create Version to save your tested prompt.

    Setting Up a “Flow” in AWS Bedrock

    1. From the left sidebar under Builder Tools, click on Flows.

    2. Click the Create Flow button.

    • Name your flow (e.g., ShebangShellFlow).
    • Keep the “Create and use a new service role” checkbox selected.
    • Click Create flow.

    Once created, you’ll see a flow graph with the following nodes:

    • Flow Input
    • Prompts
    • Flow Output

    Configure Nodes

    • Click on the Flow Input and Flow Output nodes.
      Note down the Node Name and Output Name (default: FlowInputNode and document, respectively).
    • Click on the Prompts node, then in the Configure tab on the left:
      • Select “Use prompt from prompt management” 
      • From the Prompt dropdown, select the one you created earlier.
      • Choose the latest Version of the prompt.
      • Click Save.
    Test the Flow

    You can now test the flow by providing a sample natural language input like:

    list all docker containers

    Finalizing the Flow

    1. Go back to the Flows list and select the flow you just created.

    2. Note down the Flow ID or ARN.

    3. Click Publish Version to create the first version of your flow.

    4. Navigate to the Aliases tab and click Create Alias:

    • Name your alias (e.g., prod or v1).
    • Choose “Use existing version to associate this alias”.
    • From the Version dropdown, select Version 1.
      Click Create alias.

    5. After it’s created, click on the new alias under the Aliases tab and note the Alias ARN—you’ll need this when calling the flow programmatically.

    Shell Integration for ZSH and BASH

    Configuring IAM Policy

    To use the Bedrock flow from your CLI, you need a minimal IAM policy as shown below:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Statement1",
          "Effect": "Allow",
          "Action": [
            "bedrock:InvokeFlow"
          ],
          "Resource": "<flow resource arn>"
        }
      ]
    }

    Attach this policy to the IAM user whose credentials you’ll use for invoking the flow.

    Note: This guide does not cover AWS credential configuration (e.g., ~/.aws/credentials).

    Bedrock Flow API

    AWS provides a REST endpoint to invoke a Bedrock flow:

    `/flows/<flowIdentifier>/aliases/<flowAliasIdentifier>`

    You can find the official API documentation here:
    InvokeFlow API Reference

    To simplify request signing (e.g., AWS SigV4), language-specific SDKs are available. For this example, we use the AWS SDK v3 for JavaScript and the InvokeFlowCommand from the @aws-sdk/client-bedrock-agent-runtime package:

    SDK Reference:
    https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/bedrock-agent-runtime/command/InvokeFlowCommand/

    Required Parameters

    You’ll need to substitute the following values in your SDK/API calls:

    • flowIdentifier: ID or ARN of the Bedrock flow
    • flowAliasIdentifier: Alias ARN of the flow version
    • nodeName: Usually FlowInputNode
    • content.document: Natural language query
    • nodeOutputName: Usually document

    Shell Script Integration

    The Node.js script reads a natural language query from standard input (either piped or redirected) and invokes the Bedrock flow accordingly. You can find the full source code of this project in the GitHub repo:
    https://github.com/azadsagar/ai-shell-helper

    Environment Variables

    To keep the script flexible across local and cloud-based inference, the following environment variables are used:

    INFERENCE_MODE="<ollama|aws_bedrock>"
    
    # For local inference
    OLLAMA_URL="http://localhost:11434"
    
    # For Bedrock inference
    BEDROCK_FLOW_IDENTIFIER="<flow ID or ARN>"
    BEDROCK_FLOW_ALIAS="<alias name or ARN>"
    AWS_REGION="us-east-1"

    Set INFERENCE_MODE to ollama if you want to use a locally hosted model.

    Configure ZSH/BASH shell to perform magic – Shebang

    When you type in a Zsh shell, your input is captured in a shell variable called LBUFFER. This is a duplex variable—meaning it can be read and also written back to. Updating LBUFFER automatically updates your shell prompt in place.

    In the case of Bash, the corresponding variable is READLINE_LINE. However, unlike Zsh, you must manually update the cursor position after modifying the input. You can do this by calculating the string length using ${#READLINE_LINE} and setting the cursor accordingly. This ensures the cursor moves to the end of the updated line.
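The cursor math is plain string length; for example, in Bash:

```shell
# ${#var} expands to the character length of the string
line="docker ps -a"
point=${#line}
echo "$point"   # 12 characters, so the cursor lands after the final "a"
```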

    From Natural Language to Shell Command

    Typing natural language directly in the shell and pressing Enter would usually throw a “command not found” error. Instead, we’ll map a shortcut key to a shell function that:

    • Captures the input (LBUFFER for Zsh, READLINE_LINE for Bash)
    • Sends it to a Node.js script via standard input
    • Replaces the shell line with the generated shell command

    Zsh Integration Example

    In Zsh, you must register the shell function as a Zsh widget, then bind it to a shortcut using bindkey.

function ai-command-widget() {
  # Invoke the helper script directly; an alias defined inside the function
  # body would not be expanded here, since zsh parses the body at definition time
  local input
  input="$LBUFFER"
  local cmdout
  cmdout=$(echo "$input" | node "$HOME/ai-shell-helper/main.js")

  # Replace current buffer with AI-generated command
  LBUFFER="$cmdout"
}
    
    # Register the widget
    zle -N ai-command-widget
    
    # Bind Ctrl+G to the widget
    bindkey '^G' ai-command-widget

    Bash Integration Example

    In Bash, the setup is slightly different. You bind the function using the bind command and use READLINE_LINE for input and output.

    ai_command_widget() {
      local input="$READLINE_LINE"
      local cmdout
      cmdout=$(echo "$input" | node "$HOME/ai-shell-helper/main.js")
    
      READLINE_LINE="$cmdout"
      READLINE_POINT=${#READLINE_LINE}
    }
    
    # Bind Ctrl+G to the function
bind -x '"\C-g": ai_command_widget'

    Note: Ensure that Node.js and npm are installed on your system before proceeding.

    Quick Setup

    If you’ve cloned the GitHub repo into your home directory, run the following to install dependencies and activate the integration:

    cd ~/ai-shell-helper && npm install
    
    # For Zsh
    echo "source $HOME/ai-shell-helper/zsh_int.sh" >> ~/.zshrc
    
    # For Bash
    echo "source $HOME/ai-shell-helper/bash_int.sh" >> ~/.bashrc

    Then, start a new terminal session.

    Try It Out!

    In your new shell, type a natural language query like:

    list all docker containers

    Now press Ctrl+G.
    You’ll see your input replaced with the actual command:

    docker ps -a

    And that’s the magic of Shebang Shell with GenAI!

    Demo Video:

  • Linux Internals of Kubernetes Networking

    Introduction

    This blog is a hands-on guide designed to help you understand Kubernetes networking concepts by following along. We’ll use K3s, a lightweight Kubernetes distribution, to explore how networking works within a cluster.

    System Requirements

    Before getting started, ensure your system meets the following requirements:

    • A Linux-based system (Ubuntu, CentOS, or equivalent).
    • At least 2 CPU cores and 4 GB of RAM.
    • Basic familiarity with Linux commands.

    Installing K3s

    To follow along with this guide, we first need to install K3s—a lightweight Kubernetes distribution designed for ease of use and optimized for resource-constrained environments.

    Install K3s

    You can install K3s by running the following command in your terminal:

    curl -sfL https://get.k3s.io | sh -

    This script will:

    1. Download and install the K3s server.
    2. Set up the necessary dependencies.
    3. Start the K3s service automatically after installation.

    Verify K3s Installation

    After installation, you can check the status of the K3s service to make sure everything is running correctly:

    systemctl status k3s

    If everything is correct, you should see that the K3s service is active and running.

    Set Up kubectl

    K3s comes bundled with its own kubectl binary. To use it, you can either:

    Use the K3s binary directly:

    k3s kubectl get pods -A

    Or set up the kubectl config file by exporting the Kubeconfig path:

    export KUBECONFIG="/etc/rancher/k3s/k3s.yaml"
    sudo chown -R $USER $KUBECONFIG
    kubectl get pods -A

    Understanding Kubernetes Networking

    In Kubernetes, networking plays a crucial role in ensuring seamless communication between pods, services, and external resources. In this section, we will dive into the network configuration and explore how pods communicate with one another.

    Viewing Pods and Their IP Addresses

    To check the IP addresses assigned to the pods, use the following kubectl command:

kubectl get pods -A -o wide

    This will show you a list of all the pods across all namespaces, including their corresponding IP addresses. Each pod is assigned a unique IP address within the cluster.

You'll notice that the IP addresses are assigned by Kubernetes and typically belong to the range specified by the network plugin (such as Flannel, Calico, or the default CNI). K3s uses the Flannel CNI by default and sets the default cluster CIDR to 10.42.0.0/16, carving a /24 out of it for each node. These IPs allow communication within the cluster.

    Observing Network Configuration Changes

When K3s starts, it sets up several network interfaces and configurations on the host machine. These configurations are key to how Kubernetes networking operates. Let's examine the changes using the ip utility.

    Show All Network Interfaces

    Run the following command to list all network interfaces:

    ip link show

    This will show all the network interfaces.

• lo, enp0s3, and enp0s9 are the network interfaces that belong to the host.
• The flannel.1 interface is created by the Flannel CNI for communication between pods that live on different nodes.
• The cni0 interface is created by the bridge CNI plugin for communication between pods that live on the same node.
• Each vethXXXXXXXX@ifY interface is created by the bridge CNI plugin; it connects a pod to the cni0 bridge.

    Show IP Addresses

    To display the IP addresses assigned to the interfaces:

    ip -c -o addr show

You should see the IP addresses of all the network interfaces. Regarding the K3s-related interfaces, only cni0 and flannel.1 have IP addresses; the vethXXXXXXXX interfaces only have MAC addresses. The details of this will be explained in a later section of this blog.

    Pod-to-Pod Communication and Bridge Networks

    The diagram illustrates how container networking works within a Kubernetes (K3s) node, showing the key components that enable pods to communicate with each other and the outside world. Let’s break down this networking architecture:

    At the top level, we have the host interface (enp0s9) with IP 192.168.2.224, which is the node’s physical network interface connected to the external network. This is the node’s gateway to the outside world.

    enp0s9 interface is connected to the cni0 bridge (IP: 10.42.0.1/24), which acts like a virtual switch inside the node. This bridge serves as the internal network hub for all pods running on the node.

Each pod runs in its own network namespace, with its own separate network stack, including its own network interfaces and routing tables. Each pod's internal interface, eth0 (as shown in the diagram above), carries the pod's IP address. The eth0 inside the pod is one end of a virtual ethernet (veth) pair; the other end lives in the host's network and connects the pod's eth0 to the cni0 bridge.

    Exploring Network Namespaces in Detail

    Kubernetes uses network namespaces to isolate networking for each pod, ensuring that pods have separate networking environments and do not interfere with each other. 

    A network namespace is a Linux kernel feature that provides network isolation for a group of processes. Each namespace has its own network interfaces, IP addresses, routing tables, and firewall rules. Kubernetes uses this feature to ensure that each pod has its own isolated network environment.

    In Kubernetes:

    • Each pod has its own network namespace.
    • Each container within a pod shares the same network namespace.
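You can observe namespace membership directly under /proc, even without Kubernetes: processes in the same network namespace see the same namespace inode (this runs as a normal user on any Linux box):

```shell
# The network namespace of a process is exposed as a symlink under /proc/<pid>/ns/
readlink /proc/$$/ns/net
# A child shell inherits its parent's namespace, so the symlink target is identical
bash -c 'readlink /proc/$$/ns/net'
```

A pod's containers share one such inode, while two different pods show two different ones.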

    Inspecting Network Namespaces

    To inspect the network namespaces, follow these steps:

If you installed K3s as per this blog, it uses the containerd runtime by default; your commands to get the container PID will differ if you run K3s with Docker or another container runtime.

    Identify the container runtime and get the list of running containers.

    sudo crictl ps

    Get the container-id from the output and use it to get the process ID

    sudo crictl inspect <container-id> | grep pid

    Check the network namespace associated with the container

    sudo ls -l /proc/<container-pid>/ns/net

    You can use nsenter to enter the network namespace for further exploration.

    Executing Into Network Namespaces

    To explore the network settings of a pod’s namespace, you can use the nsenter command.

    sudo nsenter --net=/proc/<container-pid>/ns/net
    ip addr show

    Script to exec into network namespace

    You can use the following script to get the container process ID and exec into the pod network namespace directly.

    POD_ID=$(sudo crictl pods --name <pod_name> -q)
    CONTAINER_ID=$(sudo crictl ps --pod $POD_ID -q)
    sudo nsenter -t $(sudo crictl inspect $CONTAINER_ID | jq -r .info.pid) -n ip addr show

    Veth Interfaces and Their Connection to Bridge

    Inside the pod’s network namespace, you should see the pod’s interfaces (lo and eth0) and the IP address 10.42.0.8 assigned to the pod. Looking closely, we see eth0@if13, which means eth0 is paired with interface number 13 (on your system the corresponding index may differ). The eth0 interface inside the pod is a virtual ethernet (veth) interface, and veths are always created in interconnected pairs. In this case, one end of the pair is eth0, while the other end is interface 13. But where does interface 13 exist? It exists as part of the host network, connecting the pod’s network to the host network via the bridge (cni0 in this case).

    ip link show | grep 13

    Here you see veth82ebd960@if2, which denotes that this veth is paired with interface number 2 in the pod’s network namespace. You can verify as follows that the veth is connected to the bridge cni0, and that the veth of each pod is connected to the bridge, which enables communication between the pods on the same node.

    brctl show
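
    As an aside, the @ifN peer indices seen above can be extracted mechanically. This small POSIX-shell helper is illustrative (it is not part of iproute2 or any standard tooling):

```shell
# Extract the peer interface index from a name such as "eth0@if13".
peer_index() {
  printf '%s\n' "$1" | sed -n 's/.*@if\([0-9][0-9]*\).*/\1/p'
}

peer_index "eth0@if13"          # prints 13
peer_index "veth82ebd960@if2"   # prints 2
```

    Running it against the names reported by ip link show on the host and inside the pod lets you pair each veth with its peer without eyeballing the output.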

    Demonstrating Pod-to-Pod Communication

    Deploy Two Pods

    Deploy two busybox pods to test communication:

    kubectl run pod1 --image=busybox --restart=Never -- sleep infinity
    kubectl run pod2 --image=busybox --restart=Never -- sleep infinity

    Get the IP Addresses of the Pods

    kubectl get pods pod1 pod2 -o wide

    Pod1 IP : 10.42.0.9

    Pod2 IP : 10.42.0.10

    Ping Between Pods and Observe the Traffic Between Two Pods

    Before we ping from Pod1 to Pod2, we will set up a watch with tcpdump on cni0 and on the veth pairs of Pod1 and Pod2 that are connected to cni0.

    Open three terminals and set up the tcpdump listeners: 

    # Terminal 1 – Watch traffic on cni0 bridge 

    sudo tcpdump -i cni0 icmp

     # Terminal 2 – Watch traffic on veth1 (Pod1’s veth pair)

    sudo tcpdump -i veth3a94f27 icmp

    # Terminal 3 – Watch traffic on veth2 (Pod2’s veth pair) 

    sudo tcpdump -i veth18eb7d52 icmp

    Exec into Pod1 and ping Pod2:

    kubectl exec -it pod1 -- ping -c 4 <pod2-IP>

    Watch results on veth3a94f27 pair of Pod1.

    Watch results on cni0:

    Watch results on veth18eb7d52 pair of Pod2:

    Observing the timestamps for each request and reply on different interfaces, we get the flow of request/reply, as shown in the diagram below.

    Deeper Dive into the Journey of Network Packets from One Pod to Another

    We have already seen the flow of request/reply between two pods via veth interfaces connected to each other in a bridge network. In this section, we will discuss the internal details of how a network packet reaches from one pod to another.

    Packet Leaving Pod1’s Network

    Inside Pod1’s network namespace, the packet originates from eth0 (Pod1’s internal interface) and is sent out via its virtual ethernet interface pair in the host network. The destination address of the network packet is 10.42.0.10, which lies within the CIDR range 10.42.0.0/24 (10.42.0.0 – 10.42.0.255), hence it matches the second route.
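
    For reference, running ip route inside such a pod’s namespace typically shows something like the following with the default flannel backend (addresses taken from this example; yours may differ):

```
default via 10.42.0.1 dev eth0
10.42.0.0/24 dev eth0 proto kernel scope link src 10.42.0.9
10.42.0.0/16 via 10.42.0.1 dev eth0
```

    The destination 10.42.0.10 falls inside 10.42.0.0/24, so the directly connected route is used and the frame is emitted on eth0.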

    The packet exits Pod1’s namespace and enters the host namespace via the connected veth pair that exists in the host network. The packet arrives at bridge cni0 since it is the master of all the veth pairs that exist in the host network.

    Once the packet reaches cni0, it gets forwarded to the correct veth pair connected to Pod2.

    Packet Forwarding from cni0 to Pod2’s Network

    When the packet reaches cni0, the job of cni0 is to forward this packet to Pod2. The cni0 bridge acts as a Layer 2 switch here, which simply forwards the frame to the destination veth. The bridge maintains a forwarding database and dynamically learns the mapping between a destination MAC address and its corresponding veth device.

    You can view forwarding database information with the following command:

    bridge fdb show

    In this screenshot, I have limited the result of the forwarding database to just the MAC address of Pod2’s eth0:

    1. First column: MAC address of Pod2’s eth0
    2. dev vethX: The network interface this MAC address is reachable through
    3. master cni0: Indicates this entry belongs to cni0 bridge
    4. Flags that may appear:
      • permanent: Static entry, manually added or system-generated
      • self: MAC address belongs to the bridge interface itself
      • No flag: The entry is dynamically learned.
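
    To answer "which veth is this MAC behind?" from a script, here is a small illustrative helper (the MAC address and veth name below are made up for the example) that filters bridge fdb show output:

```shell
# Print the device a given MAC is reachable through, reading
# `bridge fdb show` style lines on stdin.
fdb_dev_for_mac() {
  awk -v mac="$1" '$1 == mac { for (i = 1; i <= NF; i++) if ($i == "dev") print $(i + 1) }'
}

# Real usage: bridge fdb show br cni0 | fdb_dev_for_mac <pod2-mac>
printf 'de:ad:be:ef:00:02 dev veth82ebd960 master cni0\n' \
  | fdb_dev_for_mac 'de:ad:be:ef:00:02'   # prints veth82ebd960
```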

    Dynamic MAC Learning Process

    When Pod1 generates a packet carrying the ICMP request payload, it is packed as a frame at layer 2 with the source MAC set to the MAC address of the eth0 interface in Pod1. To obtain the destination MAC address, eth0 broadcasts an ARP request to all the network interfaces; the ARP request contains the destination interface’s IP address.

    This ARP request is received by all interfaces connected to the bridge, but only Pod2’s eth0 interface responds with its MAC address. The destination MAC address is then added to the frame, and the packet is sent to the cni0 bridge.


    When this frame reaches the cni0 bridge, the bridge examines it and saves the source MAC against the source interface (the veth pair of Pod1’s eth0 in the host network) in the forwarding table.

    Now the bridge has to forward the frame to the interface behind which the destination lies (i.e., the veth pair of Pod2 in the host network). If the forwarding table has an entry for Pod2’s veth pair, the bridge forwards the frame directly to it; otherwise, it floods the frame to all the veths connected to the bridge, which still reaches Pod2.

    When Pod2 sends the reply to Pod1, the reverse path is followed. The frame leaves Pod2’s eth0 and reaches cni0 via the veth pair of Pod2’s eth0 in the host network. The bridge records the source MAC address (this time Pod2’s eth0) and the device it is reachable through in the forwarding database, then forwards the reply to Pod1, completing the request and response cycle.

    Summary and Key Takeaways

    In this guide, we explored the foundational elements of Linux that play a crucial role in Kubernetes networking using K3s. Here are the key takeaways:

    • Network Namespaces ensure pod isolation.
    • Veth Interfaces connect pods to the host network and enable inter-pod communication.
    • Bridge Networks facilitate pod-to-pod communication on the same node.

    I hope you gained a deeper understanding of how Linux internals are used in Kubernetes network design and how they play a key role in pod-to-pod communication within the same node.

  • Taming the OpenStack Beast – A Fun & Easy Guide!

    So, you’ve heard about OpenStack, but it sounds like a mythical beast only cloud wizards can tame? Fear not! No magic spells or enchanted scrolls are needed—we’re breaking it down in a simple, engaging, and fun way.

    Ever felt like managing cloud infrastructure is like trying to tame a wild beast? OpenStack might seem intimidating at first, but with the right approach, it’s more like training a dragon: challenging but totally worth it!

    By the end of this guide, you’ll not only understand OpenStack but also be able to deploy it like a pro using Kolla-Ansible. Let’s dive in! 🚀

    🤔 What Is OpenStack?

    Imagine you’re running an online store. Instead of buying an entire warehouse upfront, you rent shelf space, scaling up or down based on demand. That’s exactly how OpenStack works for computing!

    OpenStack is an open-source cloud platform that lets companies build, manage, and scale their own cloud infrastructure—without relying on expensive proprietary solutions.

    Think of it as LEGO blocks for cloud computing—but instead of plastic bricks, you’re assembling compute, storage, and networking components to create a flexible and powerful cloud. 🧱🚀

    🤷‍♀️ Why Should You Care?

    OpenStack isn’t just another cloud platform—it’s powerful, flexible, and built for the future. Here’s why you should care:

    It’s Free & Open-Source – No hefty licensing fees, no vendor lock-in—just pure, community-driven innovation. Whether you’re a student, a startup, or an enterprise, OpenStack gives you the freedom to build your own cloud, your way.

    Trusted by Industry Giants – If OpenStack is good enough for NASA, PayPal, and CERN (yes, the guys running the Large Hadron Collider), it’s definitely worth your time! These tech powerhouses use OpenStack to manage mission-critical workloads, proving its reliability at scale.

    Super Scalable – Whether you’re running a tiny home lab or a massive enterprise deployment, OpenStack grows with you. Start with a few nodes and scale to thousands as your needs evolve—without breaking a sweat.

    Perfect for Hands-On Learning – Want real-world cloud experience? OpenStack is a playground for learning cloud infrastructure, automation, and networking. Setting up your own OpenStack lab is like a DevOps gym—you’ll gain hands-on skills that are highly valued in the industry.

    🏗️ OpenStack Architecture in Simple Terms – The Avengers of Cloud Computing

    OpenStack is a modular system. Think of it as assembling an Avengers team, where each component has a unique superpower, working together to form a powerful cloud infrastructure. Let’s break down the team:

    🦾 Nova (Iron Man) – The Compute Powerhouse

    Just like Iron Man powers up in his suit, Nova is the core component that spins up and manages virtual machines (VMs) in OpenStack. It ensures your cloud has enough compute power and efficiently allocates resources to different workloads.

    • Acts as the brain of OpenStack, managing instances on physical servers.
    • Works with different hypervisors like KVM, Xen, and VMware to create VMs.
    • Supports auto-scaling, so your applications never run out of power.

    🕸️ Neutron (Spider-Man) – The Web of Connectivity

    Neutron is like Spider-Man, ensuring all instances are connected via a complex web of virtual networking. It enables smooth communication between your cloud instances and the outside world.

    • Provides network automation, floating IPs, and load balancing.
    • Supports custom network configurations like VLANs, VXLAN, and GRE tunnels.
    • Just like Spidey’s web shooters, it’s flexible, allowing integration with SDN controllers like Open vSwitch and OVN.

    💪 Cinder (Hulk) – The Strength Behind Storage

    Cinder is OpenStack’s block storage service, acting like the Hulk’s immense strength, giving persistent storage to VMs. When VMs need extra storage, Cinder delivers!

    • Allows you to create, attach, and manage persistent block storage.
    • Works with backend storage solutions like Ceph, NetApp, and LVM.
    • If a VM is deleted, the data remains safe—just like Hulk’s memory, despite all the smashing.

    📸 Glance (Black Widow) – The Memory Keeper

    Glance is OpenStack’s image service, storing and managing operating system images, much like how Black Widow remembers every mission.

    • Acts as a repository for VM images, including Ubuntu, CentOS, and custom OS images.
    • Enables fast booting of instances by storing pre-configured templates.
    • Works with storage backends like Swift, Ceph, or NFS.

    🔑 Keystone (Nick Fury) – The Security Gatekeeper

    Keystone is the authentication and identity service, much like Nick Fury, who ensures that only authorized people (or superheroes) get access to SHIELD.

    • Handles user authentication and role-based access control (RBAC).
    • Supports multiple authentication methods, including LDAP, OAuth, and SAML.
    • Ensures that users and services only access what they are permitted to see.

    🧙‍♂️ Horizon (Doctor Strange) – The All-Seeing Dashboard

    Horizon provides a web-based UI for OpenStack, just like Doctor Strange’s ability to see multiple dimensions.

    • Gives a graphical interface to manage instances, networks, and storage.
    • Allows admins to control the entire OpenStack environment visually.
    • Supports multi-user access with dashboards customized for different roles.

    🚀 Additional Avengers (Other OpenStack Services)

    • Swift (Thor’s Mjolnir) – Object storage, durable and resilient like Thor’s hammer.
    • Heat (Wanda Maximoff) – Automates cloud resources like magic.
    • Ironic (Vision) – Bare metal provisioning, a bridge between hardware and cloud.

    Each of these heroes (services) communicates through APIs, working together to make OpenStack a powerful cloud platform.


    🛠️ How This Helps in Installation

    Understanding these services will make it easier to set up OpenStack. During installation, configure each component based on your needs:

    • If you need VMs, you focus on Nova, Glance, and Cinder.
    • If networking is key, properly configure Neutron.
    • Secure access? Keystone is your best friend.

    Now that you know the Avengers of OpenStack, you’re ready to start your cloud journey. Let’s get our hands dirty with some real-world OpenStack deployment using Kolla-Ansible.

    🛠️ Hands-on: Deploying OpenStack with Kolla-Ansible

    So, you’ve learned the Avengers squad of OpenStack—now it’s time to assemble your own OpenStack cluster! 💪

    🔍 Pre-requisites: What You Need Before We Begin

    Before we start, let’s make sure you have everything in place:

    🖥️ Hardware Requirements (Minimum for a Test Setup)

    • 1 Control Node + 1 Compute Node (or more for better scaling).
    • At least 8GB RAM, 4 vCPUs, 100GB Disk per node (More = Better).
    • Ubuntu 22.04 LTS (Recommended) or CentOS 9 Stream.
    • Internet Access (for downloading dependencies).

    🔧 Software & Tools Needed

    Python 3.10+ – Because Python runs the world.

    Ansible 8-9 (ansible-core 2.15-2.16) – Automating OpenStack deployment.

    Docker & Docker Compose – Because we’re running OpenStack in containers!

    Kolla-Ansible – The magic tool for OpenStack deployment.

    Step-by-Step: Setting Up OpenStack with Kolla-Ansible

    1️⃣ Set Up Your Environment

    First, update your system and install dependencies:

    sudo apt update && sudo apt upgrade -y
    sudo apt-get install python3-dev libffi-dev gcc libssl-dev python3-selinux python3-setuptools python3-venv -y

    python3 -m venv kolla-venv
    echo "source ~/kolla-venv/bin/activate" >> ~/.bashrc
    source ~/kolla-venv/bin/activate

    Install Ansible (Docker is installed on the hosts later by Kolla-Ansible’s bootstrap step):

    sudo apt install python3-pip -y
    pip install -U pip
    pip install 'ansible-core>=2.15,<2.17' ansible

    2️⃣ Install Kolla-Ansible

    pip install git+https://opendev.org/openstack/kolla-ansible@master

    3️⃣ Prepare Configuration Files

    Copy default configurations to /etc/kolla:

    sudo mkdir -p /etc/kolla
    sudo chown $USER:$USER /etc/kolla
    cp -r ~/kolla-venv/share/kolla-ansible/etc_examples/kolla/* /etc/kolla/

    Generate passwords for OpenStack services:

    kolla-genpwd

    Before deploying OpenStack, let’s configure some essential settings in globals.yml. This file defines how OpenStack services are installed and interact with your infrastructure.

    Run the following command to edit the file:

    nano /etc/kolla/globals.yml

    Here are a few key parameters you must configure:

    kolla_base_distro – Defines the OS used for deployment (e.g., ubuntu or centos).

    kolla_internal_vip_address – Set this to a free IP in your network. It acts as the virtual IP for OpenStack services. Example: 192.168.1.100.

    network_interface – Set this to your main network interface (e.g., eth0). Kolla-Ansible will use this interface for internal communication. (Check using ip -br a)

    enable_horizon – Set to yes to enable the OpenStack web dashboard (Horizon).

    Once configured, save and exit the file. These settings ensure that OpenStack is properly installed in your environment.
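
    Putting those settings together, a minimal all-in-one test configuration in /etc/kolla/globals.yml might look like this (the VIP and interface name are illustrative values; substitute a free IP on your network and your own interface from ip -br a):

```yaml
kolla_base_distro: "ubuntu"
kolla_internal_vip_address: "192.168.1.100"   # a free IP in your network
network_interface: "eth0"                     # your main interface
enable_horizon: "yes"
```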

    4️⃣ Bootstrap the Nodes (Prepare Servers for Deployment)

    kolla-ansible -i /etc/kolla/inventory/all-in-one bootstrap-servers

    5️⃣ Deploy OpenStack! (The Moment of Truth)

    kolla-ansible -i /etc/kolla/inventory/all-in-one deploy

    This step takes some time (~30 minutes), so grab some ☕ and let OpenStack build itself.

    6️⃣ Access Horizon (Web Dashboard)

    Once deployment is complete, generate the admin credentials file:

    kolla-ansible post-deploy

    Now, find your login details:

    cat /etc/kolla/admin-openrc.sh

    Source the credentials and log in:

    source /etc/kolla/admin-openrc.sh
    openstack service list

    Open your browser and try accessing: http://<your-server-ip>/dashboard/ or https://<your-server-ip>/dashboard/

    Use the username and the password from admin-openrc.sh.

    Troubleshooting Common Issues

    Deploying OpenStack isn’t always smooth sailing. Here are some common issues and how to fix them:

    Kolla-Ansible Fails at Bootstrap

    Solution: Run kolla-ansible -i /etc/kolla/inventory/all-in-one prechecks to check for missing dependencies before deployment.

    Containers Keep Restarting or Failing

    Solution: Run docker ps -a | grep Exit to check failed containers. Then inspect logs with:

    docker ps --format 'table {{.ID}}\t{{.Names}}\t{{.Status}}'
    docker logs $(docker ps -q --filter "status=exited")
    journalctl -u docker.service --no-pager | tail -n 50

    Horizon Dashboard Not Accessible

    Solution: Ensure enable_horizon: yes is set in globals.yml and restart services with:

    kolla-ansible -i /etc/kolla/inventory/all-in-one reconfigure

    Missing OpenStack CLI Commands

    Solution: Source the OpenStack credentials file before using the CLI:

    source /etc/kolla/admin-openrc.sh

    By tackling these common issues, you’ll have a much smoother OpenStack deployment experience.

    🎉 Congratulations, You Now Have Your Own Cloud!

    Now that your OpenStack deployment is up and running, you can start launching instances, creating networks, and exploring the endless possibilities.

    What’s Next?

    ✅ Launch your first VM using OpenStack CLI or Horizon!

    ✅ Set up floating IPs and networks to make instances accessible.

    ✅ Experiment with Cinder storage and Neutron networking.

    ✅ Explore Heat for automation and Swift for object storage.
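
    As a sketch of that first VM, a minimal CLI session might look like the following (the image, flavor, and network names here are illustrative; list what your deployment actually provides before creating the server):

```shell
source /etc/kolla/admin-openrc.sh

# See what is available first
openstack image list
openstack flavor list
openstack network list

# cirros, m1.tiny, and demo-net are example names, not defaults
openstack server create --image cirros --flavor m1.tiny \
  --network demo-net my-first-vm
openstack server show my-first-vm
```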

    Final Thoughts

    Deploying OpenStack manually can be a nightmare, but Kolla-Ansible makes it much easier. You’ve now got your own containerized OpenStack cloud running in no time.

  • Transforming Infrastructure at Scale with Azure Cloud

    • Infrastructure Costs cut by 30-34% monthly, optimizing resource utilization and generating substantial savings.
    • Customer Onboarding Time reduced from 50 to 4 days, significantly accelerating the client’s ability to onboard new customers.
    • Site Provisioning Time for existing customers reduced from weeks to a few hours, streamlining operations and improving customer satisfaction.
    • Downtime affecting customers was reduced to under 30 minutes, with critical issues resolved within 1 hour and most proactively addressed before customer notification.

  • Strategies for Cost Optimization Across Amazon EKS Clusters

    Fast-growing tech companies rely heavily on Amazon EKS clusters to host a variety of microservices and applications. The pairing of Amazon EKS for managing the Kubernetes Control Plane and Amazon EC2 for flexible Kubernetes nodes creates an optimal environment for running containerized workloads. 

    With the increasing scale of operations, optimizing costs across multiple EKS clusters has become a critical priority. This blog will demonstrate how we can leverage various tools and strategies to analyze, optimize, and manage EKS costs effectively while maintaining performance and reliability. 

    Cost Analysis:

    Cost analysis is a necessary foundation for any cost optimization work. Data plays an important role here, so trust your data. The total cost of operating an EKS cluster encompasses several components. The EKS Control Plane (or Master Node) incurs a fixed cost of $0.20 per hour, offering straightforward pricing.

    Meanwhile, EC2 instances, serving as the cluster’s nodes, introduce various cost factors, such as block storage and data transfer, which can vary significantly based on workload characteristics. For this discussion, we’ll focus primarily on two aspects of EC2 cost: instance hours and instance pricing. Let’s look at how to do the cost analysis on your EKS cluster.

    • Tool Selection: We can begin our cost analysis journey by selecting Kubecost, a powerful tool specifically designed for Kubernetes cost analysis. Kubecost provides granular insights into resource utilization and costs across our EKS clusters.
    • Deployment and Usage: Deploying Kubecost is straightforward. We can integrate it with our Kubernetes clusters following the provided documentation. Kubecost’s intuitive dashboard allows us to visualize resource usage, cost breakdowns, and cost allocation by namespace, pod, or label. Once deployed, you can see the Kubecost overview page in your browser by port-forwarding the Kubecost k8s service. It might take 5-10 minutes for Kubecost to gather metrics. You can then see your Amazon EKS spend, including cumulative cluster costs, associated Kubernetes asset costs, and monthly aggregated spend.
    • Cluster Level Cost Analysis: For multi-cluster cost analysis and cluster-level scoping, consider using the AWS tagging strategy and tag your EKS clusters. Learn more about tagging strategy from the following documentation. You can then view your cost analysis in AWS Cost Explorer. AWS Cost Explorer provides additional insights into our AWS usage and spending trends. By analyzing cost and usage data at a granular level, we can identify areas for further optimization and cost reduction.
    • Multi-Cluster Cost Analysis using Kubecost and Prometheus: The Kubecost deployment includes a Prometheus server that collects cost analysis metrics. For multiple EKS clusters, we can enable a remote Prometheus server, either an AWS-managed Prometheus server or a self-managed one. To gather cost analysis metrics from multiple clusters, we need to run Kubecost with an additional SigV4 pod that sends individual and combined cluster metrics to a common Prometheus server. You can follow the AWS documentation for Multi-Cluster Cost Analysis using Kubecost and Prometheus.
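
    If Kubecost was installed with its Helm chart into the kubecost namespace (the defaults in the Kubecost documentation; adjust the names if your install differs), the port-forward step looks like:

```shell
# Forward the Kubecost UI to localhost, then open http://localhost:9090
kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090
```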

    Cost Optimization Strategies:

    Based on the cost analysis, the next step is to plan your cost optimization strategies. As explained in the previous section, the Control Plane has a fixed cost and straightforward pricing model. So, we will focus mainly on optimizing the data nodes and optimizing the application configuration. Let’s look at the following strategies when optimizing the cost of the EKS cluster and supporting AWS services:

    • Right Sizing: On the cost optimization pillar of the AWS Well-Architected Framework, we find a section on Cost-Effective Resources, which describes Right Sizing as:

    “… using the lowest cost resource that still meets the technical specifications of a specific workload.”

    • Application Right Sizing: Right-sizing is the strategy to optimize pod resources by allocating the appropriate CPU and memory resources to pods. Care must be taken to try to set requests that align as close as possible to the actual utilization of these resources. If the value is too low, then the containers may experience throttling of the resources and impact the performance. However, if the value is too high, then there is waste, since those unused resources remain reserved for that single container. When actual utilization is lower than the requested value, the difference is called slack cost. A tool like kube-resource-report is valuable for visualizing the slack cost and right-sizing the requests for the containers in a pod. Installation instructions demonstrate how to install via an included helm chart.

      helm upgrade --install kube-resource-report chart/kube-resource-report


      You can also consider tools like VPA recommender with Goldilocks to get an insight into your pod resource consumption and utilization.


    • Compute Right Sizing: Application right sizing and Kubecost analysis are required to right-size EKS compute. Here are several strategies for compute right sizing:
      • Mixed Instance Auto Scaling group: Employ a mixed instance policy to create a diversified pool of instances within your auto scaling group. This mix can include both spot and on-demand instances. However, it’s advisable not to mix instances of different sizes within the same Node group.
      • Node Groups, Taints, and Tolerations: Utilize separate Node Groups with varying instance sizes for different application requirements. For example, use distinct node groups for GPU-intensive and CPU-intensive applications. Use taints and tolerations to ensure applications are deployed on the appropriate node group.
      • Graviton Instances: Explore the adoption of Graviton Instances, which offer up to 40% better price performance compared to traditional instances. Consider migrating to Graviton Instances to optimize costs and enhance application performance.
    • Purchase Options: Another part of the cost optimization pillar of the AWS Well-Architected Framework that we can apply comes from the Purchasing Options section, which says:

    “Spot Instances allow you to use spare compute capacity at a significantly lower cost than On-Demand EC2 instances (up to 90%).”

    Understanding purchase options for Amazon EC2 is crucial for cost optimization. The Amazon EKS data plane consists of worker nodes or serverless compute resources responsible for running Kubernetes application workloads. These nodes can utilize different capacity types and purchase options, including On-Demand, Spot Instances, Savings Plans, and Reserved Instances.

    On-Demand and Spot capacity offer flexibility without spending commitments. On-Demand instances are billed based on runtime and guarantee availability at On-Demand rates, while Spot instances offer discounted rates but are preemptible. Both options are suitable for temporary or bursty workloads, with Spot instances being particularly cost-effective for applications tolerant of compute availability fluctuations. 

    Reserved Instances involve upfront spending commitments over one or three years for discounted rates. Once a steady-state resource consumption profile is established, Reserved Instances or Savings Plans become effective. Savings Plans, introduced as a more flexible alternative to Reserved Instances, allow for commitments based on a “US Dollar spend amount,” irrespective of provisioned resources. There are two types: Compute Savings Plans, offering flexibility across instance types, Fargate, and Lambda charges, and EC2 Instance Savings Plans, providing deeper discounts but restricting compute choice to an instance family.

    Tailoring your approach to your workload can significantly impact cost optimization within your EKS cluster. For non-production environments, leveraging Spot Instances exclusively can yield substantial savings. Meanwhile, implementing Mixed-Instances Auto Scaling Groups for production workloads allows for dynamic scaling and cost optimization. Additionally, for steady workloads, investing in a Savings Plan for EC2 instances can provide long-term cost benefits. By strategically planning and optimizing your EC2 instances, you can achieve a notable reduction in your overall EKS compute costs, potentially reaching savings of approximately 60-70%.

    “… this (matching supply and demand) is accomplished using Auto Scaling, which helps you to scale your EC2 instances and Spot Fleet capacity up or down automatically according to conditions you define.”

    • Cluster Autoscaling: Therefore, a prerequisite to cost optimization on a Kubernetes cluster is to ensure you have Cluster Autoscaler running. This tool performs two critical functions in the cluster. First, it will monitor the cluster for pods that are unable to run due to insufficient resources. Whenever this occurs, the Cluster Autoscaler will update the Amazon EC2 Auto Scaling group to increase the desired count, resulting in additional nodes in the cluster. Additionally, the Cluster Autoscaler will detect nodes that have been underutilized and reschedule pods onto other nodes. Cluster Autoscaler will then decrease the desired count for the Auto Scaling group to scale in the number of nodes.

    The Amazon EKS User Guide has a great section on the configuration of the Cluster Autoscaler. There are a couple of things to pay attention to when configuring the Cluster Autoscaler:

    IAM Roles for Service Account – Cluster Autoscaler will require access to update the desired capacity in the Auto Scaling group. The recommended approach is to create a new IAM role with the required policies and a trust policy that restricts access to the service account used by Cluster Autoscaler. The role name must then be provided as an annotation on the service account:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: cluster-autoscaler
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::000000000000:role/my_role_name

    Auto-Discovery Setup

    Set up your Cluster Autoscaler for auto-discovery by enabling the --node-group-auto-discovery flag as an argument. Also, make sure to tag your EKS nodes’ Auto Scaling groups with the following tags:

    k8s.io/cluster-autoscaler/enabled,
    k8s.io/cluster-autoscaler/<cluster-name>
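
    In the Cluster Autoscaler deployment manifest, the flag and tags come together roughly like this (an illustrative excerpt; replace <cluster-name> with your EKS cluster name):

```yaml
# Container args excerpt for Cluster Autoscaler auto-discovery
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>
```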


    Auto Scaling Group per AZ – When Cluster Autoscaler scales out, it simply increases the desired count for the Auto Scaling group, leaving the responsibility for launching new EC2 instances to the AWS Auto Scaling service. If an Auto Scaling group is configured for multiple availability zones, then the new instance may be provisioned in any of those availability zones.

    For deployments that use persistent volumes, you will need to provision a separate Auto Scaling group for each availability zone. This way, when Cluster Autoscaler detects the need to scale out in response to a given pod, it can target the correct availability zone for the scale-out based on persistent volume claims that already exist in a given availability zone.

    When using multiple Auto Scaling groups, be sure to include the following argument in the pod specification for Cluster Autoscaler:

    --balance-similar-node-groups=true

    • Pod Autoscaling: Now that Cluster Autoscaler is running in the cluster, you can be confident that the instance hours will align closely with the demand from pods within the cluster. Next up is to use Horizontal Pod Autoscaler (HPA) to scale out or in the number of pods for a deployment based on specific metrics for the pods to optimize pod hours and further optimize our instance hours.

    The HPA controller is included with Kubernetes, so all that is required to configure HPA is to ensure that the Kubernetes metrics server is deployed in your cluster and then define HPA resources for your deployments. For example, the following HPA resource is configured to monitor the CPU utilization for a deployment named nginx-ingress-controller. HPA will then scale out or in the number of pods between 1 and 5 to target an average CPU utilization of 80% across all the pods:

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-ingress-controller
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx-ingress-controller
      minReplicas: 1
      maxReplicas: 5
      targetCPUUtilizationPercentage: 80
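
    Under the hood, the HPA controller derives the desired replica count from the ratio of current to target utilization (desired = ceil(currentReplicas × currentMetric / targetMetric)), clamped to the min/max bounds. A minimal sketch of that calculation in Go:

    ```go
    package main

    import (
        "fmt"
        "math"
    )

    // desiredReplicas applies the HPA scaling formula:
    // ceil(currentReplicas * currentUtilization / targetUtilization),
    // clamped to [minReplicas, maxReplicas].
    func desiredReplicas(current, minReplicas, maxReplicas int, currentUtil, targetUtil float64) int {
        desired := int(math.Ceil(float64(current) * currentUtil / targetUtil))
        if desired < minReplicas {
            desired = minReplicas
        }
        if desired > maxReplicas {
            desired = maxReplicas
        }
        return desired
    }

    func main() {
        // 2 pods averaging 120% CPU against an 80% target -> scale out to 3.
        fmt.Println(desiredReplicas(2, 1, 5, 120, 80))
        // 4 pods averaging 20% CPU against an 80% target -> scale in to 1.
        fmt.Println(desiredReplicas(4, 1, 5, 20, 80))
    }
    ```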

    The combination of Cluster Autoscaler and Horizontal Pod Autoscaler is an effective way to keep EC2 instance hours tied as close as possible to the actual utilization of the workloads running in the cluster.

    “Systems can be scheduled to scale out or in at defined times, such as the start of business hours, thus ensuring that resources are available when users arrive.”

    There are many deployments that only need to be available during business hours. A tool named kube-downscaler can be deployed to the cluster to scale in and out the deployments based on time of day. 

    Some example use cases of kube-downscaler are:

    • Deploy the downscaler to a test (non-prod) cluster with a default uptime or downtime time range to scale down all deployments during the night and weekend.
    • Deploy the downscaler to a production cluster without any default uptime/downtime setting and scale down specific deployments by setting the downscaler/uptime (or downscaler/downtime) annotation. This might be useful for internal tooling front ends, which are only needed during work time.
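
    For the second use case, the schedule is expressed as an annotation on the Deployment; a sketch (the deployment name, time range, and timezone are examples):

    ```yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: internal-tooling-frontend
      annotations:
        # kube-downscaler keeps this Deployment up only during the given window
        # and scales it down outside of it.
        downscaler/uptime: Mon-Fri 09:00-18:00 Asia/Kolkata
    ```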
    • AWS Fargate with EKS: You can run Kubernetes without managing clusters of K8s servers with AWS Fargate, a serverless compute service.

    AWS Fargate pricing is based on usage (pay-per-use), with no upfront charges. There is, however, a one-minute minimum charge, and usage is rounded up to the nearest second. You will also be charged for any additional services you use, such as CloudWatch and data transfer. Fargate can also reduce your management costs by reducing the number of DevOps professionals and tools you need to run Kubernetes on Amazon EKS.
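
    These metering rules can be made concrete with a small Go sketch; this is an illustrative model of the rounding behavior, not the official billing calculator:

    ```go
    package main

    import (
        "fmt"
        "math"
    )

    // billableSeconds models Fargate's metering rules: usage is rounded up
    // to the next whole second, with a one-minute minimum charge.
    func billableSeconds(runtimeSeconds float64) int {
        billed := int(math.Ceil(runtimeSeconds))
        if billed < 60 {
            billed = 60
        }
        return billed
    }

    func main() {
        fmt.Println(billableSeconds(12.4)) // short task -> one-minute minimum
        fmt.Println(billableSeconds(95.2)) // rounded up to a whole second
    }
    ```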

    Conclusion:

    Effectively managing costs across multiple Amazon EKS clusters is essential for optimizing operations. By utilizing tools like Kubecost and AWS Cost Explorer, coupled with strategies such as right-sizing, mixed instance policies, and Spot Instances, organizations can streamline cost analysis and optimize resource allocation. Additionally, implementing auto-scaling mechanisms like Cluster Autoscaler ensures dynamic resource scaling based on demand, further optimizing costs. Leveraging AWS Fargate with EKS can eliminate the need to manage Kubernetes clusters, reducing management costs. Overall, by combining these strategies, organizations can achieve significant cost savings while maintaining performance and reliability in their containerized environments.

  • Mastering Prow: A Guide to Developing Your Own Plugin for Kubernetes CI/CD Workflow

    Continuous Integration and Continuous Delivery (CI/CD) pipelines are essential components of modern software development, especially in the world of Kubernetes and containerized applications. To facilitate these pipelines, many organizations use Prow, a CI/CD system built specifically for Kubernetes. While Prow offers a rich set of features out of the box, you may need to develop your own plugins to tailor the system to your organization’s requirements. In this guide, we’ll explore the world of Prow plugin development and show you how to get started.

    Prerequisites

    Before diving into Prow plugin development, ensure you have the following prerequisites:

    • Basic Knowledge of Kubernetes and CI/CD Concepts: Familiarity with Kubernetes concepts such as Pods, Deployments, and Services, as well as understanding CI/CD principles, will be beneficial for understanding Prow plugin development.
    • Access to a Kubernetes Cluster: You’ll need access to a Kubernetes cluster for testing your plugins. If you don’t have one already, you can set up a local cluster using tools like Minikube or use a cloud provider’s managed Kubernetes service.
    • Prow Setup: Install and configure Prow in your Kubernetes cluster. For setup instructions, see the Velotio Technologies guide Getting Started with Prow: A Kubernetes-Native CI/CD Framework.
    • Development Environment Setup: Ensure you have Git, Go, and Docker installed on your local machine for developing and testing Prow plugins. You’ll also need to configure your environment to interact with your organization’s Prow setup.

    The Need for Custom Prow Plugins

    While Prow provides a wide range of built-in plugins, your organization’s Kubernetes workflow may have specific requirements that aren’t covered by these defaults. This is where developing custom Prow plugins comes into play. Custom plugins allow you to extend Prow’s functionality to cater to your needs. Whether automating workflows, integrating with other tools, or enforcing custom policies, developing your own Prow plugins gives you the power to tailor your CI/CD pipeline precisely.

    Getting Started with Prow Plugin Development

    Developing a custom Prow plugin may seem daunting, but with the right approach and tools, it can be a rewarding experience. Here’s a step-by-step guide to get you started:

    1. Set Up Your Development Environment

    Before diving into plugin development, you need to set up your development environment. You will need Git, Go, and access to a Kubernetes cluster for testing your plugins. Ensure you have the necessary permissions to make changes to your organization’s Prow setup.

    2. Choose a Plugin Type

    Prow supports various plugin types, including postsubmits, presubmits, triggers, and utilities. Choose the type that best fits your use case.

    • Postsubmits: These plugins are executed after the code is merged and are often used for tasks like publishing artifacts or creating release notes.
    • Presubmits: Presubmit plugins run before code is merged, typically used for running tests and ensuring code quality.
    • Triggers: Trigger plugins allow you to trigger custom jobs based on specific events or criteria.
    • Utilities: Utility plugins offer reusable functions and utilities for other plugins.

    3. Create Your Plugin

    Once you’ve chosen a plugin type, it’s time to create it. Below is an example of a simple Prow plugin written in Go, named comment-plugin.go. It will create a comment on a pull request each time an event is received.

    This code sets up a basic HTTP server that listens for GitHub events and handles them by creating a comment using the GitHub API. Customize this code to fit your specific use case.

    package main
    
    import (
        "encoding/json"
        "flag"
        "net/http"
        "os"
        "strconv"
        "time"
    
        "github.com/sirupsen/logrus"
        "k8s.io/test-infra/pkg/flagutil"
        "k8s.io/test-infra/prow/config"
        "k8s.io/test-infra/prow/config/secret"
        prowflagutil "k8s.io/test-infra/prow/flagutil"
        configflagutil "k8s.io/test-infra/prow/flagutil/config"
        "k8s.io/test-infra/prow/github"
        "k8s.io/test-infra/prow/interrupts"
        "k8s.io/test-infra/prow/logrusutil"
        "k8s.io/test-infra/prow/pjutil"
        "k8s.io/test-infra/prow/pluginhelp"
        "k8s.io/test-infra/prow/pluginhelp/externalplugins"
    )
    
    const pluginName = "comment-plugin"
    
    type options struct {
        port int
    
        config                 configflagutil.ConfigOptions
        dryRun                 bool
        github                 prowflagutil.GitHubOptions
        instrumentationOptions prowflagutil.InstrumentationOptions
    
        webhookSecretFile string
    }
    
    type server struct {
        tokenGenerator func() []byte
        botUser        *github.UserData
        email          string
        ghc            github.Client
        log            *logrus.Entry
        repos          []github.Repo
    }
    
    func helpProvider(_ []config.OrgRepo) (*pluginhelp.PluginHelp, error) {
        pluginHelp := &pluginhelp.PluginHelp{
       Description: `The comment-plugin creates a comment on pull requests when events are received.`,
        }
        return pluginHelp, nil
    }
    
    func (o *options) Validate() error {
        return nil
    }
    
    func gatherOptions() options {
        o := options{config: configflagutil.ConfigOptions{ConfigPath: "./config.yaml"}}
        fs := flag.NewFlagSet(os.Args[0], flag.ExitOnError)
        fs.IntVar(&o.port, "port", 8888, "Port to listen on.")
        fs.BoolVar(&o.dryRun, "dry-run", false, "Dry run for testing. Uses API tokens but does not mutate.")
        fs.StringVar(&o.webhookSecretFile, "hmac-secret-file", "/etc/hmac", "Path to the file containing GitHub HMAC secret.")
        for _, group := range []flagutil.OptionGroup{&o.github} {
           group.AddFlags(fs)
        }
        fs.Parse(os.Args[1:])
        return o
    }
    
    func main() {
        o := gatherOptions()
        if err := o.Validate(); err != nil {
           logrus.Fatalf("Invalid options: %v", err)
        }
    
        logrusutil.ComponentInit()
        log := logrus.StandardLogger().WithField("plugin", pluginName)
    
        if err := secret.Add(o.webhookSecretFile); err != nil {
           logrus.WithError(err).Fatal("Error starting secrets agent.")
        }
    
        gitHubClient, err := o.github.GitHubClient(o.dryRun)
        if err != nil {
           logrus.WithError(err).Fatal("Error getting GitHub client.")
        }
    
        email, err := gitHubClient.Email()
        if err != nil {
           log.WithError(err).Fatal("Error getting bot e-mail.")
        }
    
        botUser, err := gitHubClient.BotUser()
        if err != nil {
           logrus.WithError(err).Fatal("Error getting bot name.")
        }
        repos, err := gitHubClient.GetRepos(botUser.Login, true)
        if err != nil {
           log.WithError(err).Fatal("Error listing bot repositories.")
        }
        serv := &server{
           tokenGenerator: secret.GetTokenGenerator(o.webhookSecretFile),
           botUser:        botUser,
           email:          email,
           ghc:            gitHubClient,
           log:            log,
           repos:          repos,
        }
    
        health := pjutil.NewHealthOnPort(o.instrumentationOptions.HealthPort)
        health.ServeReady()
    
        mux := http.NewServeMux()
        mux.Handle("/", serv)
        externalplugins.ServeExternalPluginHelp(mux, log, helpProvider)
        logrus.Info("starting server " + strconv.Itoa(o.port))
        httpServer := &http.Server{Addr: ":" + strconv.Itoa(o.port), Handler: mux}
        defer interrupts.WaitForGracefulShutdown()
        interrupts.ListenAndServe(httpServer, 5*time.Second)
    }
    
    func (s *server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        logrus.Info("inside http server")
        // Validate the webhook HMAC signature before doing anything with the payload.
        _, _, payload, ok, _ := github.ValidateWebhook(w, r, s.tokenGenerator)
        if !ok {
           return
        }
        logrus.Info(string(payload))
        w.Write([]byte("Event received. Have a nice day."))
        if err := s.handleEvent(payload); err != nil {
           logrus.WithError(err).Error("Error parsing event.")
        }
    }
    
    func (s *server) handleEvent(payload []byte) error {
        logrus.Info("inside handler")
        var pr github.PullRequestEvent
        if err := json.Unmarshal(payload, &pr); err != nil {
           return err
        }
        logrus.Info(pr.Number)
        if err := s.ghc.CreateComment(pr.PullRequest.Base.Repo.Owner.Login, pr.PullRequest.Base.Repo.Name, pr.Number, "comment from comment-plugin"); err != nil {
           return err
        }
        return nil
    }
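
    In ServeHTTP above, github.ValidateWebhook takes care of checking the signature header against the shared HMAC secret. Conceptually, the check boils down to recomputing the HMAC digest over the request body and comparing it in constant time; a standalone sketch using only the standard library:

    ```go
    package main

    import (
        "crypto/hmac"
        "crypto/sha1"
        "encoding/hex"
        "fmt"
    )

    // validSignature recomputes the HMAC-SHA1 digest of the payload with the
    // shared secret and compares it, in constant time, to the signature GitHub
    // sent in the X-Hub-Signature header (formatted as "sha1=<hex digest>").
    func validSignature(payload, secret []byte, signature string) bool {
        mac := hmac.New(sha1.New, secret)
        mac.Write(payload)
        expected := "sha1=" + hex.EncodeToString(mac.Sum(nil))
        return hmac.Equal([]byte(expected), []byte(signature))
    }

    func main() {
        secret := []byte("hmac-secret")
        payload := []byte(`{"action":"opened"}`)

        // Recompute the digest the way GitHub would when delivering the hook.
        mac := hmac.New(sha1.New, secret)
        mac.Write(payload)
        sig := "sha1=" + hex.EncodeToString(mac.Sum(nil))

        fmt.Println(validSignature(payload, secret, sig))        // a valid signature
        fmt.Println(validSignature(payload, secret, "sha1=bad")) // a tampered signature
    }
    ```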

    4. Deploy Your Plugin

    To deploy your custom Prow plugin, you will need to create a Docker image and deploy it into your Prow cluster.

    FROM golang as app-builder
    WORKDIR /app
    RUN apt-get update && apt-get install -y git
    COPY . .
    RUN CGO_ENABLED=0 go build -o main
    
    FROM alpine:3.9
    RUN apk add ca-certificates git
    COPY --from=app-builder /app/main /app/custom-plugin
    ENTRYPOINT ["/app/custom-plugin"]

    docker build -t jainbhavya65/custom-plugin:v1 .

    docker push jainbhavya65/custom-plugin:v1

    Deploy the Docker image using a Kubernetes Deployment:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: comment-plugin
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: comment-plugin
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: comment-plugin
        spec:
          containers:
          - args:
            - --github-token-path=/etc/github/oauth
            - --hmac-secret-file=/etc/hmac-token/hmac
            - --port=80
            image: <IMAGE>
            imagePullPolicy: Always
            name: comment-plugin
            ports:
            - containerPort: 80
              protocol: TCP
            volumeMounts:
            - mountPath: /etc/github
              name: oauth
              readOnly: true
            - mountPath: /etc/hmac-token
              name: hmac
              readOnly: true
          volumes:
          - name: oauth
            secret:
              defaultMode: 420
              secretName: oauth-token
          - name: hmac
            secret:
              defaultMode: 420
              secretName: hmac-token

    Create a Service for the Deployment:
    apiVersion: v1
    kind: Service
    metadata:
      name: comment-plugin
    spec:
      ports:
      - port: 80
        protocol: TCP
        targetPort: 80
      selector:
        app: comment-plugin
      sessionAffinity: None
      type: ClusterIP

    After creating the deployment and service, integrate it into your organization’s Prow configuration. This involves updating your Prow plugin.yaml files to include your plugin and specify when it should run.

    external_plugins:
    - name: comment-plugin
      # No endpoint specified implies "http://{{name}}", since the plugin is
      # deployed in the same cluster. If the plugin runs elsewhere, set the
      # endpoint explicitly.
      events:
      # Only pull request and issue comment events are sent to our plugin.
      - pull_request
      - issue_comment

    Conclusion

    Mastering Prow plugin development opens up a world of possibilities for tailoring your Kubernetes CI/CD workflow to meet your organization’s needs. While the initial learning curve may be steep, the benefits of custom plugins in terms of automation, efficiency, and control are well worth the effort.

    Remember that the key to successful Prow plugin development lies in clear documentation, thorough testing, and collaboration with your team to ensure that your custom plugins enhance your CI/CD pipeline’s functionality and reliability. As Kubernetes and containerized applications continue to evolve, Prow will remain a valuable tool for managing your CI/CD processes, and your custom plugins will be the secret sauce that sets your workflow apart from the rest.