Category: Cloud & DevOps

  • The Ultimate Guide to Disaster Recovery for Your Kubernetes Clusters

Kubernetes allows us to run containerized applications at scale without drowning in the details of application load balancing. You can ensure high availability for your applications running on Kubernetes by running multiple replicas (pods) of each application. All the complexity of container orchestration is hidden away safely so that you can focus on developing your application instead of deploying it. Learn more about high availability of Kubernetes clusters and how you can use kubeadm for high availability in Kubernetes here.

But using Kubernetes has its own challenges, and getting it up and running takes some real work. If you are not familiar with getting Kubernetes up and running, you might want to take a look here.

    Kubernetes allows us to have a zero downtime deployment, yet service interrupting events are inevitable and can occur at any time. Your network can go down, your latest application push can introduce a critical bug, or in the rarest case, you might even have to face a natural disaster.

    When you are using Kubernetes, sooner or later, you need to set up a backup. In case your cluster goes into an unrecoverable state, you will need a backup to go back to the previous stable state of the Kubernetes cluster.

    Why Backup and Recovery?

    There are three reasons why you need a backup and recovery mechanism in place for your Kubernetes cluster. These are:

1. To recover from disasters: for example, someone accidentally deleted the namespace where your deployments reside.
2. To replicate the environment: you want to replicate your production environment to a staging environment before any major upgrade.
3. To migrate a Kubernetes cluster: let’s say you want to migrate your Kubernetes cluster from one environment to another.

    What to Backup?

Now that you know why, let’s see what exactly you need to back up. The two things you need to back up are:

1. Your Kubernetes control plane state is stored in etcd, so you need to back up the etcd state to capture all the Kubernetes resources.
2. If you have stateful containers (which you will have in the real world), you need a backup of the persistent volumes as well.

    How to Backup?

There have been various tools, like Heptio Ark (now Velero) and kube-backup, to back up and restore Kubernetes clusters on cloud providers. But what if you are not using a managed Kubernetes cluster? You might have to get your hands dirty if you are running Kubernetes on bare metal, just like we are.

We are running a three-master Kubernetes cluster with a three-member etcd cluster, one member on each master. If we lose one master, we can still recover because etcd quorum is intact. But if we lose two masters, we need a mechanism to recover from that situation as well for production-grade clusters.
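To see why losing two masters is unrecoverable without a backup, recall that etcd stays writable only while a quorum of members is up; for an n-member cluster that is floor(n/2)+1. A quick sketch:

```shell
# etcd quorum for an n-member cluster is floor(n/2) + 1.
# A 3-member cluster has quorum 2, so it tolerates exactly one failed
# member; losing two masters therefore loses quorum.
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 3   # -> 2 (tolerates 1 failure)
quorum 5   # -> 3 (tolerates 2 failures)
```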

Want to know how to set up a multi-master Kubernetes cluster? Keep reading!

    Taking etcd backup:

The mechanism for taking an etcd backup depends on how the etcd cluster is set up in your Kubernetes environment.

There are two ways to set up an etcd cluster in a Kubernetes environment:

1. Internal etcd cluster: you run etcd as containers/pods inside the Kubernetes cluster, and it is Kubernetes’ responsibility to manage those pods.
2. External etcd cluster: you run the etcd cluster outside of Kubernetes, mostly in the form of Linux services, and provide its endpoints to the Kubernetes cluster to write to.

    Backup Strategy for Internal Etcd Cluster:

To take a backup from inside an etcd pod, we will use the Kubernetes CronJob functionality, which does not require any etcdctl client to be installed on the host.

Following is the definition of a Kubernetes CronJob that will take an etcd backup every minute:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: backup
  namespace: kube-system
spec:
  # activeDeadlineSeconds: 100
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            # Same image as in /etc/kubernetes/manifests/etcd.yaml
            image: k8s.gcr.io/etcd:3.2.24
            env:
            - name: ETCDCTL_API
              value: "3"
            command: ["/bin/sh"]
            args: ["-c", "etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key snapshot save /backup/etcd-snapshot-$(date +%Y-%m-%d_%H:%M:%S_%Z).db"]
            volumeMounts:
            - mountPath: /etc/kubernetes/pki/etcd
              name: etcd-certs
              readOnly: true
            - mountPath: /backup
              name: backup
          restartPolicy: OnFailure
          hostNetwork: true
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
              type: DirectoryOrCreate
          - name: backup
            hostPath:
              path: /data/backup
              type: DirectoryOrCreate

    Backup Strategy for External Etcd Cluster:

If you are running your etcd cluster on Linux hosts as a service, you should set up a Linux cron job to back up your cluster.

Run the following command to save an etcd snapshot:

    ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save /path/for/backup/snapshot.db
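In practice, that command would be run from cron. A sketch of such a crontab entry is below; the schedule, etcdctl path, endpoint, and backup directory are assumptions to adapt to your environment (note that % must be escaped as \% inside a crontab):

```shell
# Write an illustrative crontab entry to a file for review; the paths,
# endpoint, and hourly schedule here are placeholders, not prescriptions.
cat > /tmp/etcd-backup.cron <<'EOF'
0 * * * * ETCDCTL_API=3 /usr/local/bin/etcdctl --endpoints https://127.0.0.1:2379 snapshot save /var/backup/etcd-snapshot-$(date +\%F_\%H\%M).db
EOF
# Review the entry, then install it with: crontab /tmp/etcd-backup.cron
cat /tmp/etcd-backup.cron
```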

    Disaster Recovery

Now, let’s say the Kubernetes cluster has gone down completely and we need to recover it from the etcd snapshot.

Normally, you would start the etcd cluster and run kubeadm init on the master node with the etcd endpoints.

Make sure you put the backed-up certificates into the /etc/kubernetes/pki folder before running kubeadm init, so that it picks up the same certificates.

    Restore Strategy for Internal Etcd Cluster:

docker run --rm \
-v '/data/backup:/backup' \
-v '/var/lib/etcd:/var/lib/etcd' \
--env ETCDCTL_API=3 \
k8s.gcr.io/etcd:3.2.24 \
/bin/sh -c "etcdctl snapshot restore '/backup/etcd-snapshot-2018-12-09_11:12:05_UTC.db' ; mv /default.etcd/member/ /var/lib/etcd/"

kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd

Restore Strategy for External Etcd Cluster:

Restore etcd on the three nodes using the following commands:

ETCDCTL_API=3 etcdctl snapshot restore snapshot-188.db \
--name master-0 \
--initial-cluster master-0=http://10.0.1.188:2380,master-1=http://10.0.1.136:2380,master-2=http://10.0.1.155:2380 \
--initial-cluster-token my-etcd-token \
--initial-advertise-peer-urls http://10.0.1.188:2380

ETCDCTL_API=3 etcdctl snapshot restore snapshot-136.db \
--name master-1 \
--initial-cluster master-0=http://10.0.1.188:2380,master-1=http://10.0.1.136:2380,master-2=http://10.0.1.155:2380 \
--initial-cluster-token my-etcd-token \
--initial-advertise-peer-urls http://10.0.1.136:2380

ETCDCTL_API=3 etcdctl snapshot restore snapshot-155.db \
--name master-2 \
--initial-cluster master-0=http://10.0.1.188:2380,master-1=http://10.0.1.136:2380,master-2=http://10.0.1.155:2380 \
--initial-cluster-token my-etcd-token \
--initial-advertise-peer-urls http://10.0.1.155:2380

The above three commands will give you three restored folders on the three nodes, named master-0.etcd, master-1.etcd, and master-2.etcd.

Now, stop the etcd service on all the nodes, replace each node’s etcd data folder with its restored folder, and start the etcd service again. Now you can see all the nodes, but after some time you will notice that only the master node is Ready while the other nodes have gone into the NotReady state. You need to join those two nodes again using the existing ca.crt file (you should have a backup of it).

    Run the following command on master node:

    kubeadm token create --print-join-command

It will give you the kubeadm join command. Add an --ignore-preflight-errors flag and run that command on the other two nodes to bring them into the Ready state.

    Conclusion

One way to deal with master failure is to set up a multi-master Kubernetes cluster, but even that does not let you eliminate Kubernetes etcd backup and restore entirely, since it is still possible to accidentally destroy data in an HA environment.

    Need help with disaster recovery for your Kubernetes Cluster? Connect with the experts at Velotio!

    For more insights into Kubernetes Disaster Recovery check out here.

  • Simplifying MySQL Sharding with ProxySQL: A Step-by-Step Guide

    Introduction:

    ProxySQL is a powerful SQL-aware proxy designed to sit between database servers and client applications, optimizing database traffic with features like load balancing, query routing, and failover. This article focuses on simplifying the setup of ProxySQL, especially for users implementing data-based sharding in a MySQL database.

    What is Sharding?

    Sharding involves partitioning a database into smaller, more manageable pieces called shards based on certain criteria, such as data attributes. ProxySQL supports data-based sharding, allowing users to distribute data across different shards based on specific conditions.
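As a toy illustration of the idea (the host names and continent values here are placeholders that mirror the example used later in this guide), data-based sharding ultimately boils down to a routing lookup on a data attribute:

```shell
# Toy data-based sharding: pick a shard from the value of a data
# attribute (here, a continent). Host names are illustrative only.
route_shard() {
  case $1 in
    'Asia'|'North America')   echo mysql_host_1 ;;
    'Africa'|'South America') echo mysql_host_2 ;;
    'Europe')                 echo mysql_host_3 ;;
    *)                        echo default_host ;;
  esac
}

route_shard 'Europe'   # -> mysql_host_3
route_shard 'Asia'     # -> mysql_host_1
```

ProxySQL does essentially this, except the "attribute" is extracted from the SQL text by regex-based query rules, as we will see below.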

    Understanding the Need for ProxySQL:

    ProxySQL is an intermediary layer that enhances database management, monitoring, and optimization. With features like data-based sharding, ProxySQL is an ideal solution for scenarios where databases need to be distributed based on specific data attributes, such as geographic regions.

Installation & Setup:

ProxySQL can be installed in two ways: via packages or by running it in a Docker container. For this guide, we will focus on the Docker installation.

    1. Install ProxySQL and MySQL Docker Images:

    To start, pull the necessary Docker images for ProxySQL and MySQL using the following commands:

    docker pull mysql:latest
    docker pull proxysql/proxysql

    2. Create Docker Network:

    Create a Docker network for communication between MySQL containers:

    docker network create multi-tenant-network

Note: The ProxySQL setup needs connections to multiple MySQL servers, so we will set up multiple MySQL containers inside a Docker network.

    Containers within the same Docker network can communicate with each other using their container names or IP addresses.

    You can check the list of all the Docker networks currently present by running the following command:

    docker network ls

    3. Set Up MySQL Containers:

    Now, create three MySQL containers within the network:

    Note: We can create any number of MySQL containers.

    docker run -d --name mysql_host_1 --network=multi-tenant-network -p 3307:3306 -e MYSQL_ROOT_PASSWORD=pass123 mysql:latest 
    docker run -d --name mysql_host_2 --network=multi-tenant-network -p 3308:3306 -e MYSQL_ROOT_PASSWORD=pass123 mysql:latest 
    docker run -d --name mysql_host_3 --network=multi-tenant-network -p 3309:3306 -e MYSQL_ROOT_PASSWORD=pass123 mysql:latest

    Note: Adjust port numbers as necessary. 

The default MySQL protocol port is 3306, but since we cannot expose all three of our MySQL containers on the same host port, we have mapped them to 3307, 3308, and 3309. Internally, all the MySQL containers still connect using port 3306.

--network=multi-tenant-network specifies that the container should be created inside the specified network.

We have also specified the root password used to log in to each MySQL container; the username is “root” and the password is “pass123” for all three of them.

    After running the above three commands, three MySQL containers will start running inside the network. You can connect to these three hosts using host = localhost or 127.0.0.1 and port = 3307 / 3308 / 3309.

To check that a port is reachable, use the following command:

for macOS:

nc -zv 127.0.0.1 3307

for Windows (PowerShell):

Test-NetConnection 127.0.0.1 -Port 3307

for Linux:

telnet 127.0.0.1 3307

    Reference Image

    4. Create Users in MySQL Containers:

    Create “user_shard” and “monitor” users in each MySQL container.

    The “user_shard” user will be used by the proxy to make queries to the DB.

    The “monitor” user will be used by the proxy to monitor the DB.

    Note: To access the MySQL container mysql_host_1, use the command:

    docker exec -it mysql_host_1 mysql -uroot -ppass123

    Use the following commands inside the MySQL container to create the user:

    CREATE USER 'user_shard'@'%' IDENTIFIED BY 'pass123'; 
    GRANT ALL PRIVILEGES ON *.* TO 'user_shard'@'%' WITH GRANT OPTION; 
    FLUSH PRIVILEGES;
    
    CREATE USER monitor@'%' IDENTIFIED BY 'pass123'; 
    GRANT ALL PRIVILEGES ON *.* TO monitor@'%' WITH GRANT OPTION; 
    FLUSH PRIVILEGES;

    Repeat the above steps for mysql_host_2 & mysql_host_3.

    If, at any point, you need to drop the user, you can use the following command:

DROP USER monitor@'%';

    5. Prepare ProxySQL Configuration:

    To prepare the configuration, we will need the IP addresses of the MySQL containers. To find those, we can use the following command:

docker inspect mysql_host_1
docker inspect mysql_host_2
docker inspect mysql_host_3

Running these commands prints the details of each MySQL container; the “IPAddress” field under your network is the IP address of that particular MySQL container.

    Example:
    mysql_host_1: 172.19.0.2

    mysql_host_2: 172.19.0.3

    mysql_host_3: 172.19.0.4

    Reference image for IP address of mysql_host_1: 172.19.0.2

    Now, create a ProxySQL configuration file named proxysql.cnf. Include details such as IP addresses of MySQL containers, administrative credentials, and MySQL users.

    Below is the content that needs to be added to the proxysql.cnf file:

    datadir="/var/lib/proxysql"
    
    admin_variables=
    {
        admin_credentials="admin:admin;radmin:radmin"
        mysql_ifaces="0.0.0.0:6032"
        refresh_interval=2000
        hash_passwords=false
    }
    
    mysql_variables=
    {
        threads=4
        max_connections=2048
        default_query_delay=0
        default_query_timeout=36000000
        have_compress=true
        poll_timeout=2000
        interfaces="0.0.0.0:6033;/tmp/proxysql.sock"
        default_schema="information_schema"
        stacksize=1048576
        server_version="5.1.30"
        connect_timeout_server=10000
        monitor_history=60000
        monitor_connect_interval=200000
        monitor_ping_interval=200000
        ping_interval_server_msec=10000
        ping_timeout_server=200
        commands_stats=true
        sessions_sort=true
        monitor_username="monitor"
        monitor_password="pass123"
    }
    
    mysql_servers =
    (
        { address="172.19.0.2" , port=3306 , hostgroup=10, max_connections=100 },
        { address="172.19.0.3" , port=3306 , hostgroup=20, max_connections=100 },
        { address="172.19.0.4" , port=3306 , hostgroup=30, max_connections=100 }
    )
    
    
    mysql_users =
    (
        { username = "user_shard" , password = "pass123" , default_hostgroup = 10 , active = 1 },
        { username = "user_shard" , password = "pass123" , default_hostgroup = 20 , active = 1 },
        { username = "user_shard" , password = "pass123" , default_hostgroup = 30 , active = 1 }
    )

    Most of the settings are default; we won’t go into much detail for each setting. 

    admin_variables: These variables are used for ProxySQL’s administrative interface. It allows you to connect to ProxySQL and perform administrative tasks such as configuring runtime settings, managing servers, and monitoring performance.

Within mysql_variables, monitor_username and monitor_password specify the user that ProxySQL will use when connecting to MySQL servers for monitoring purposes. This monitoring user executes queries and gathers statistics about the health and performance of the MySQL servers. This is the user we created during step 4.

mysql_servers contains all the MySQL servers we want ProxySQL to connect to. Each entry has the IP address of the MySQL container, the port, a host group, and max_connections. mysql_users holds all the users we created during step 4.

6. Run ProxySQL Container:

    Inside the same directory where the proxysql.cnf file is located, run the following command to start ProxySQL:

    docker run -d --rm -p 6032:6032 -p 6033:6033 -p 6080:6080 --name=proxysql --network=multi-tenant-network -v $PWD/proxysql.cnf:/etc/proxysql.cnf proxysql/proxysql

    Here, port 6032 is used for ProxySQL’s administrative interface. It allows you to connect to ProxySQL and perform administrative tasks such as configuring runtime settings, managing servers, and monitoring performance.

    Port 6033 is the default port for ProxySQL’s MySQL protocol interface. It is used for handling MySQL client connections. Our application will use it to access the ProxySQL db and make SQL queries.

    The above command will make ProxySQL run on our Docker with the configuration provided in the proxysql.cnf file.

    Inside ProxySQL Container:

7. Access ProxySQL Admin Console:

    Now, to access the ProxySQL Docker container, use the following command:

    docker exec -it proxysql bash

    Now, once you’re inside the ProxySQL Docker container, you can access the ProxySQL admin console using the command:

    mysql -u admin -padmin -h 127.0.0.1 -P 6032

    You can run the following queries to get insights into your ProxySQL server:

    i) To get the list of all the connected MySQL servers:

    SELECT * FROM mysql_servers;

    ii) Verify the status of the MySQL backends in the monitor database tables in ProxySQL admin using the following command:

    SHOW TABLES FROM monitor;


If this returns an empty set, it means that the monitor username and password are not set correctly. You can set them using the commands below:

UPDATE global_variables SET variable_value='monitor' WHERE variable_name='mysql-monitor_username'; 
UPDATE global_variables SET variable_value='pass123' WHERE variable_name='mysql-monitor_password';
LOAD MYSQL VARIABLES TO RUNTIME; 
SAVE MYSQL VARIABLES TO DISK;

Then restart the ProxySQL Docker container.

    iii) Check the status of DBs connected to ProxySQL using the following command:

    SELECT * FROM monitor.mysql_server_connect_log ORDER BY time_start_us DESC;

    iv) To get a list of all the ProxySQL global variables, use the following command:

    SELECT * FROM global_variables; 

    v) To get all the queries made on ProxySQL, use the following command:

SELECT * FROM stats_mysql_query_digest;

Note: Whenever we change any of these rows, use the commands below to load the changes:

    Change in variables:

    LOAD MYSQL VARIABLES TO RUNTIME; 
    SAVE MYSQL VARIABLES TO DISK;
    
    Change in mysql_servers:
    LOAD MYSQL SERVERS TO RUNTIME;
    SAVE MYSQL SERVERS TO DISK;
    
    Change in mysql_query_rules:
    LOAD MYSQL QUERY RULES TO RUNTIME;
    SAVE MYSQL QUERY RULES TO DISK;

Then restart the ProxySQL Docker container.

    IMPORTANT:

    To connect to ProxySQL’s admin console, first get into the Docker container using the following command:

    docker exec -it proxysql bash

    Then, to access the ProxySQL admin console, use the following command:

    mysql -u admin -padmin -h 127.0.0.1 -P6032

    To access the ProxySQL MySQL console, we can directly access it using the following command without going inside the Docker ProxySQL container:

    mysql -u user_shard -ppass123 -h 127.0.0.1 -P6033

    To make queries to the database, we make use of ProxySQL’s 6033 port, where MySQL is being accessed.

8. Define Query Rules:

    We can add custom query rules inside the mysql_query_rules table to redirect queries to specific databases based on defined patterns. Load the rules to runtime and save to disk.

9. Sharding Example:

    Now, let’s illustrate how to leverage ProxySQL’s data-based sharding capabilities through a practical example. We’ll create three MySQL containers, each containing data from different continents in the “world” database, specifically within the “countries” table.

    Step 1: Create 3 MySQL containers named mysql_host_1, mysql_host_2 & mysql_host_3.

    Inside all containers, create a database named “world” with a table named “countries”.

    i) Inside mysql_host_1: Insert countries using the following query:

    INSERT INTO `countries` VALUES (1,'India','Asia'),(2,'Japan','Asia'),(3,'China','Asia'),(4,'USA','North America'),(5,'Cuba','North America'),(6,'Honduras','North America');

    ii) Inside mysql_host_2: Insert countries using the following query:

INSERT INTO `countries` VALUES (1,'Kenya','Africa'),(2,'Ghana','Africa'),(3,'Morocco','Africa'),(4,'Brazil','South America'),(5,'Chile','South America'),(6,'Argentina','South America');

    iii) Inside mysql_host_3: Insert countries using the following query:

INSERT INTO `countries` VALUES (1,'Italy','Europe'),(2,'Germany','Europe'),(3,'France','Europe');

Now, we have distinct data sets: Asia & North America in mysql_host_1, Africa & South America in mysql_host_2, and Europe in mysql_host_3.

    Step 2: Define Query Rules for Sharding

    Let’s create custom query rules to redirect queries based on the continent specified in the SQL statement.

    For example, if the query contains the continent “Asia,” we want it to be directed to mysql_host_1.

-- Query Rule for Asia and North America 

INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, destination_hostgroup, apply) VALUES (10, 1, 'user_shard', "\s*continent\s*=\s*.*?(Asia|North America).*?\s*", 10, 0);

-- Query Rule for Africa and South America

INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, destination_hostgroup, apply) VALUES (20, 1, 'user_shard', "\s*continent\s*=\s*.*?(Africa|South America).*?\s*", 20, 0);

-- Query Rule for Europe 

INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, destination_hostgroup, apply) VALUES (30, 1, 'user_shard', "\s*continent\s*=\s*.*?(Europe).*?\s*", 30, 0);
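Before loading rules like these, it can help to sanity-check what a match_pattern would catch. The sketch below approximates the three rules with grep -E; the patterns are simplified POSIX-ERE stand-ins (using [[:space:]] instead of \s and a greedy .*), not ProxySQL's actual regex engine:

```shell
# Approximate the ProxySQL match_pattern checks with grep -E to see
# which hostgroup a query would be routed to. Patterns here are
# simplified stand-ins for the rules defined above.
route() {
  if   echo "$1" | grep -Eq "continent[[:space:]]*=[[:space:]]*.*(Asia|North America)"; then echo 10
  elif echo "$1" | grep -Eq "continent[[:space:]]*=[[:space:]]*.*(Africa|South America)"; then echo 20
  elif echo "$1" | grep -Eq "continent[[:space:]]*=[[:space:]]*.*(Europe)"; then echo 30
  else echo default
  fi
}

route "SELECT * FROM countries WHERE continent = 'Asia'"    # -> 10
route "SELECT * FROM countries WHERE continent = 'Europe'"  # -> 30
```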

    Step 3: Apply and Save Query Rules

    After adding the query rules, ensure they take effect by running the following commands:

    LOAD MYSQL QUERY RULES TO RUNTIME; 
    SAVE MYSQL QUERY RULES TO DISK;

    Step 4: Test Sharding

    Now, access the MySQL server using the ProxySQL port and execute queries:

    mysql -u user_shard -ppass123 -h 127.0.0.1 -P 6033

    use world;

-- Example Queries:

SELECT * FROM countries WHERE id = 1 AND continent = 'Asia';

-- This will return id=1, name=India, continent=Asia

SELECT * FROM countries WHERE id = 1 AND continent = 'Africa';

-- This will return id=1, name=Kenya, continent=Africa.

SELECT * FROM countries WHERE id = 1 AND continent = 'Europe';

-- This will return id=1, name=Italy, continent=Europe.

    Based on the defined query rules, the queries will be redirected to the specified MySQL host groups. If no rules match, the default host group that’s specified in mysql_users inside proxysql.cnf will be used.

    Conclusion:

    ProxySQL simplifies access to distributed data through effective sharding strategies. Its flexible query rules, combined with regex patterns and host group definitions, offer significant flexibility with relative simplicity.

    By following this step-by-step guide, users can quickly set up ProxySQL and leverage its capabilities to optimize database performance and achieve efficient data distribution.

    References:

    Download and Install ProxySQL – ProxySQL

    How to configure ProxySQL for the first time – ProxySQL

    Admin Variables – ProxySQL

  • Streamline Kubernetes Storage Upgrades

    Introduction:

    As technology advances, organizations are constantly seeking ways to optimize their IT infrastructure to enhance performance, reduce costs, and gain a competitive edge. One such approach involves migrating from traditional storage solutions to more advanced options that offer superior performance and cost-effectiveness. 

    In this blog post, we’ll explore a recent project (On Azure) where we successfully migrated our client’s applications from Disk type Premium SSD to Premium SSD v2. This migration led to performance improvements and cost savings for our client.

    Prerequisites:

    Before initiating this migration, ensure the following prerequisites are in place:

    1. Kubernetes Cluster: Ensure you have a working K8S cluster to host your applications.
    2. Velero Backup Tool: Install Velero, a widely-used backup and restoration tool tailored for Kubernetes environments.

    Overview of Velero:

    Velero stands out as a powerful tool designed for robust backup, restore, and migration solutions within Kubernetes clusters. It plays a crucial role in ensuring data safety and continuity during complex migration operations.

    Refer to the article on Velero installation and configuration.

    Strategic Plan Overview:

There are two methods for upgrading storage classes:

    • Migration via Velero and CSI Integration: 

    This approach leverages Velero’s capabilities in conjunction with CSI integration to achieve a seamless and efficient migration.

    • Using Cloud Methods: 

    This method involves leveraging cloud provider-specific procedures. It includes steps like taking a snapshot of the disk, creating a new disk from the snapshot, and then establishing a Kubernetes volume using disk referencing. 

    Step-by-Step Guide:

    Migration via Velero and CSI Integration:

Step 1: Storage Class for Premium SSD v2

    Define a new storage class that supports Azure Premium SSD v2 disks. This storage class will be used to provision new persistent volumes during the restore process.

    # We have taken azure storage class example
    
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
     name: premium-ssd-v2
    parameters:
     cachingMode: None
     skuName: PremiumV2_LRS # (Disk Type)
    provisioner: disk.csi.azure.com
    reclaimPolicy: Delete
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true

    Step 2: Volume Snapshot Class

    Introduce a Volume Snapshot Class to enable snapshot creation for persistent volumes. This class will be utilized for capturing the current state of persistent volumes before restoring them using Premium SSD v2.

    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: disk-snapshot-class
    driver: disk.csi.azure.com
    deletionPolicy: Delete
    parameters:
      incremental: "false"

    Step 3: Update Velero Deployment and Daemonset

    Enable CSI (Container Storage Interface) support in both the Velero deployment and the node-agent daemonset. This modification allows Velero to interact with the Cloud Disk CSI driver for provisioning and managing persistent volumes. Additionally, configure the Velero client to utilize the CSI plugin, ensuring that Velero utilizes the Cloud Disk CSI driver for backup and restore operations.

    # Enable CSI Server side features 
    
$ kubectl -n velero edit deployment/velero
    $ kubectl -n velero edit daemonset/restic
    
# Add the --features=EnableCSI flag below in both resources
    
        spec:
          containers:
          - args:
            - server
            - --features=EnableCSI
    
    # Enable client side features 
    
    $ velero client config set features=EnableCSI

    Step 4: Take Velero Backup

Create a Velero backup of all existing persistent volumes stored on Premium SSD disks. These backups serve as a safety net in case of any unforeseen issues during the migration process. You can use the include and exclude flags with the velero backup commands to scope the backup.

    Reference Article : https://velero.io/docs/v1.12/resource-filtering 

    # run the below command for taking backup 
    $ velero backup create backup_name --include-namespaces namespace_name

    Step 5: ConfigMap Deployment 

    Deploy a ConfigMap in the Velero namespace. This ConfigMap defines the mapping between the old storage class (Premium SSD) and the new storage class (Premium SSD v2). During the restore process, Velero will use this mapping to recreate the persistent volumes using the new storage class.

    apiVersion: v1
    data:
  # Map the old storage class name to the new one, e.g.:
  # managed-premium: premium-ssd-v2
  old-storage-class-name: new-storage-class-name
    kind: ConfigMap
    metadata:
      labels:
        velero.io/change-storage-class: RestoreItemAction
        velero.io/plugin-config: ""
      name: storage-class-config
      namespace: velero

    Step 6: Velero Restore Operation

    Initiate the Velero restore process. This will replace the existing persistent volumes with new ones provisioned using Disk Premium SSD v2. The ConfigMap will ensure that the restored persistent volumes utilize the new storage class. 

    Reference article: https://velero.io/docs/v1.12/restore-reference 

    # run the below command for restoring from backups to different namespace 
    $ velero restore create restore-name --from-backup backup-name --namespace-mappings namespace1:namespace2
    # verify the new restored resources in namespace2
    $ kubectl get pvc,pv,pod -n namespace2

    Step 7: Verification & Testing

    Verify that all applications continue to function correctly after the restore process. Check for any performance improvements and cost savings as a result of the migration to Premium SSD v2.

    Step 8: Post-Migration Cleanup

Remove any temporary resources created during the migration process, such as the volume snapshots and the custom Volume Snapshot Class. Then delete the old persistent volume claims (PVCs) that were associated with the Premium SSD disks. This will trigger the automatic deletion of the corresponding persistent volumes (PVs) and Azure Disk storage.

    Impact:

This approach is less risky because all new objects are created while the snapshots retain copies of the old data. During scheduling of new pods, the new Premium SSD v2 disks will be provisioned in the same zone as the node where each pod is scheduled. Some downtime is expected while the content of the new disks is restored from the snapshots; its duration depends on the size of the disks being restored.

    Conclusion:

Migrating from any storage class to a newer, more performant one using Velero can provide significant benefits for your organization, whether you’re upgrading from Premium SSD to Premium SSD v2 or transitioning to a completely different storage provider. By leveraging Velero’s comprehensive backup and restore functionality, you can migrate your applications to the new storage class while maintaining data integrity and application functionality. By adopting this approach, organizations can reap the rewards of enhanced performance, reduced costs, and simplified storage management.

  • Unlocking Key Insights in NATS Development: My Journey from Novice to Expert – Part 1

    By examining my personal journey from a NATS novice to mastering its intricacies, this long-form article aims to showcase the importance and applicability of NATS in the software development landscape. Through comprehensive exploration of various topics, readers will gain a solid foundation, advanced techniques, and best practices for leveraging NATS effectively in their projects.

    Introduction

Today’s topic is how to get started with NATS. We assume that you already know why you need NATS and want to learn its concepts, along with a walkthrough of how to deploy those concepts/components in your organization.

The first part covers the basic concepts, an installation and setup guide, admin-related CRUD operations, and shell scripts that might not be needed immediately but are good to have in your arsenal. The second part will be more developer-focused: applying NATS in an application, and so on. Let’s begin.

    Understanding NATS

    In this section, we will delve into the fundamentals of NATS and its key components.

    A. Definition and Overview

NATS, which originally stood for “Neural Autonomic Transport System,” is a lightweight, high-performance messaging system known for its simplicity and scalability. It enables the exchange of messages between applications in a distributed architecture, allowing for seamless communication and increased efficiency.

    B. Architecture Diagram

To better understand the inner workings of NATS, let’s take a closer look at its architecture. The diagram below illustrates the key components involved in a typical NATS deployment:

    C. Key Features

    NATS offers several key features that make it a powerful messaging system. These include:

• Publish-Subscribe Model: NATS follows a publish-subscribe model where publishers send messages to subjects and subscribers receive those messages based on their interest in specific subjects. This model allows for flexible and decoupled communication between different parts of an application or across multiple applications.
    • Scalability: With support for horizontal scaling, NATS can handle high loads of message traffic, making it suitable for large-scale systems.
    • Performance: NATS is built for speed, providing low-latency message delivery and high throughput.
    • Reliability: NATS ensures that messages are reliably delivered to subscribers, even in the presence of network interruptions or failures.
    • Security: NATS supports secure communication through various authentication and encryption mechanisms, protecting sensitive data.

    D. Use Cases and Applications

    NATS’ simplicity and versatility make it suitable for a wide range of use cases and applications. Some common use cases include:

    • Real-time data streaming and processing
    • Event-driven architectures
    • Microservices communication
    • IoT (Internet of Things) systems
    • Distributed systems and cloud-native applications

    E. Concepts

    To better grasp the various components and terminologies associated with NATS, let’s explore some key concepts:

    1. NATS server: The NATS server acts as the central messaging infrastructure, responsible for routing messages between publishers and subscribers.
    2. NATS CLI: The NATS command-line interface (CLI) is a tool that provides developers with a command-line interface to interact with the NATS server and perform various administrative tasks.
3. NATS clients: The NATS CLI and NATS clients are different things. A NATS client is an API/code-based way to access the NATS server. Clients are not as powerful as the CLI and are mainly used within source code to achieve a specific goal. We won’t be covering them, as they are out of scope here.
    4. Routes: Routes allow NATS clusters to bridge and share messages with other nodes within and outside clusters, enabling communication across geographically distributed systems.
    5. Accounts: Accounts in NATS provide isolation and access control mechanisms, ensuring that messages are exchanged securely and only between authorized parties.
    6. Gateway: Gateways list all the servers in different clusters that you want to connect in order to create a supercluster.
    7. SuperCluster: SuperCluster is a powerful feature that allows scaling NATS horizontally across multiple clusters, providing enhanced performance and fault tolerance.

    F. System Requirements

    Before diving into NATS, it’s important to ensure that our system meets the necessary requirements. The system requirements for NATS will vary depending on the specific deployment scenario and use case. However, in general, the minimum requirements include:

    Hardware:

    Network:

    • All the VMs should be part of the same cluster.
• Ports 4222, 8222, 4248, and 7222 should be open for inter-server and client connections.
    • Whitelisting of GitHub EMU account on prod servers (Phase 2).
    • Get AVI VIP for all the clusters from the network team.

    Logs:

    By default, logs will be disabled, but the configuration file will have placeholders for logs enablement. Some of the important changes include:

• debug: Shows system logs in verbose mode.
• trace: Records every message processed on NATS.
• logtime, logfile_size_limit, log_file: As the names suggest, these control whether timestamps are recorded in the logs, the size limit for individual log files (once a file fills up, rotation is done automatically by NATS), and the name of the log file, respectively.
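For reference, here is what these options might look like once enabled in nats.conf (the path and size limit are illustrative placeholders, not values from this setup):

```
# Illustrative logging block for /nats/nats.conf
debug: false                      # verbose system logs
trace: false                      # record every message processed (very noisy)
logtime: true                     # timestamp each log line
log_file: "/nats/logs/nats.log"   # log file name
logfile_size_limit: 100MB         # rotate once an individual file reaches this size
```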

    TLS:

I will show how to configure and use the certs. Keep in mind that this setup is for a development environment, which allows more flexibility in explaining and executing things.

    Getting Started with NATS

    In this section, we will guide you through the installation and setup process for NATS.

    Building the Foundation

    First, we will focus on building a strong foundation in NATS by understanding its core concepts and implementing basic messaging patterns.

    A. Understanding NATS Subjects

    In NATS, subjects serve as identifiers that help publishers and subscribers establish communication channels. They are represented as hierarchical strings, allowing for flexibility in message routing and subscription matching.

    B. Exploring Messages, Publishers, and Subscribers

    Messages are the units of data exchanged between applications through NATS. Publishers create and send messages, while subscribers receive and process them based on their subscribed subjects of interest.

    C. Implementing Basic Pub/Sub Pattern

    The publish-subscribe pattern is a fundamental messaging pattern in NATS. It allows publishers to distribute messages to multiple subscribers interested in specific subjects, enabling decoupled and efficient communication between different parts of the system.
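To make the pattern concrete, here is a toy, in-memory sketch of pub/sub in Python. It is purely illustrative – a real application would use a NATS client library talking to nats-server, and ToyBroker and the subject names are invented for this example:

```python
# Toy, in-memory model of NATS pub/sub -- illustrative only. A real app
# would use a NATS client library (e.g. nats-py) against a nats-server.
from collections import defaultdict

class ToyBroker:
    def __init__(self):
        # subject -> list of subscriber callbacks
        self.subscribers = defaultdict(list)

    def subscribe(self, subject, callback):
        self.subscribers[subject].append(callback)

    def publish(self, subject, msg):
        # every subscriber interested in this subject receives the message
        for cb in self.subscribers[subject]:
            cb(msg)

broker = ToyBroker()
seen = []
broker.subscribe("orders.created", lambda m: seen.append(("svc-a", m)))
broker.subscribe("orders.created", lambda m: seen.append(("svc-b", m)))
broker.publish("orders.created", "order #42")
# both subscribers received the message; the publisher knows nothing about them
```

The publisher never references its subscribers directly; the subject is the only coupling between them, which is exactly what makes the model decoupled.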

    D. JetStream

    JetStream is an advanced addition to NATS that provides durable, persistent message storage and retention policies. It is designed to handle high-throughput streaming scenarios while ensuring data integrity and fault tolerance.

    E. Single Cluster vs. SuperCluster

    NATS supports both single clusters and superclusters. Single clusters are ideal for smaller deployments, whereas superclusters provide the ability to horizontally scale NATS across multiple clusters, enhancing performance and fault tolerance.

    Implementation

As this blog approaches deploying NATS from an admin perspective, we will be using only shell scripts for this purpose.

    Let’s start with the implementation process:

    Prerequisite

These commands need to be run on all the servers hosting nats-server. In this blog, we will cover a 3-node cluster running JetStream.

    Installing NatsCLI and Nats-server:

    mkdir -p rpm

    # NATSCLI

    curl -o rpm/nats-0.0.35.rpm  -L https://github.com/nats-io/natscli/releases/download/v0.0.35/nats-0.0.35-amd64.rpm

    sudo yum install -y rpm/nats-0.0.35.rpm

    # NATS-server

    curl -o rpm/nats-server-2.9.20.rpm  -L https://github.com/nats-io/nats-server/releases/download/v2.9.20/nats-server-v2.9.20-amd64.rpm

    sudo yum install -y rpm/nats-server-2.9.20.rpm

    Local Machine Setup for JetStream:

    # Create User

sudo useradd --system --home /nats --shell /bin/false nats

    # Jetstream Storage

    sudo mkdir -p /nats/storage

    # Certs

    sudo mkdir -p /nats/certs

    # Logs

    sudo mkdir -p /nats/logs

    # Setting Right Permission

sudo chown --recursive nats:nats /nats

    sudo chmod 777 /nats

    sudo chmod 777 /nats/storage

    Next, we will create the service file in the servers at /etc/systemd/system/nats.service

sudo bash -c 'cat <<EOF > /etc/systemd/system/nats.service

    [Unit]

    Description=NATS Streaming Daemon

    Requires=network-online.target

    After=network-online.target

    ConditionFileNotEmpty=/nats/nats.conf

    [Service]

    #Type=notify

    User=nats

    Group=nats

    ExecStart=/usr/local/bin/nats-server -config=/nats/nats.conf

    #KillMode=process

    Restart=always

    RestartSec=10

    StandardOutput=syslog

    StandardError=syslog

    #TimeoutSec=900

    #LimitNOFILE=65536

    #LimitMEMLOCK=infinity

    [Install]

    WantedBy=multi-user.target

EOF'

    Full File will look like:

    #!/bin/bash
    
    mkdir -p rpm
    
    # NATSCLI
    curl -o rpm/nats-0.0.35.rpm  -L https://github.com/nats-io/natscli/releases/download/v0.0.35/nats-0.0.35-amd64.rpm
    sudo yum install -y rpm/nats-0.0.35.rpm
    
    # NATS-server
    curl -o rpm/nats-server-2.9.20.rpm  -L https://github.com/nats-io/nats-server/releases/download/v2.9.20/nats-server-v2.9.20-amd64.rpm
    sudo yum install -y rpm/nats-server-2.9.20.rpm
    
    # Create User
    sudo useradd --system --home /nats --shell /bin/false nats
    
    # Jetstream Storage
    sudo mkdir -p /nats/storage
    
    # Certs
    sudo mkdir -p /nats/certs
    
    # Logs
    sudo mkdir -p /nats/logs
    
    # Setting Right Permission
    sudo chown --recursive nats:nats /nats
    sudo chmod 777 /nats
    sudo chmod 777 /nats/storage
    sudo bash -c 'cat <<EOF > /etc/systemd/system/nats.service
    [Unit]
    Description=NATS Streaming Daemon
    Requires=network-online.target
    After=network-online.target
    ConditionFileNotEmpty=/nats/nats.conf
    [Service]
    #Type=notify
    User=nats
    Group=nats
    ExecStart=/usr/local/bin/nats-server -config=/nats/nats.conf
    #KillMode=process
    Restart=always
    RestartSec=10
    StandardOutput=syslog
    StandardError=syslog
    #TimeoutSec=900
    #LimitNOFILE=65536
    #LimitMEMLOCK=infinity
    [Install]
    WantedBy=multi-user.target
    EOF'

    Creating conf file at all the servers at /nats directory

    Server setup

    server_name=nts0

    listen: <IP/DNS-First>:4222 # For other servers edit the IP/DNS remaining in the cluster

    https: <DNS-First>:8222

#http: <IP/DNS-First>:8222 # Uncomment this if you are running without tls certs

    JetStream Configuration

    jetstream {

      store_dir=/nats/storage

      max_mem_store: 6GB

      max_file_store: 90GB

    }

    Intra Cluster Setup

    cluster {

      name: dev-nats # Super Cluster should have unique Cluster names

      host: <IP/DNS-First>

      port: 4248

      routes = [

        nats-route://<IP/DNS-First>:4248

        nats-route://<IP/DNS-Second>:4248

        nats-route://<IP/DNS-Third>:4248

      ]

    }

    Account Setup

    accounts: {

      $SYS: {

        users: [

          { user: admin, password: password }

        ]

      },

      B: {

        users: [

          {user: b, password: b}

        ],

        jetstream: enabled,

        imports: [

# {stream: {account: "$G"}}

        ]

      },

      C: {

        users: [

          {user: c, password: c}

        ],

        jetstream: enabled,

        imports: [

        ]

      },

      E: {

        users: [

          {user: e, password: e}

        ],

        jetstream: enabled,

        imports: [

        ]

      }

    }

    no_auth_user: e # Change this on every server to have a user in the system which does not need password, allowing local account in supercluster

We can use “Accounts” to provide local and global stream separation. The configuration is identical except for no_auth_user, which must be unique for each cluster, making the stream accessible only from the given cluster without the need to provide credentials explicitly.

    Gateway Setup: 

The Intra Cluster/Route Setup and Account Setup remain similar and also need to be present in the other cluster, with that cluster having the name “new-dev-nats.”

    gateway {

      name: dev-nats

      listen: <IP/DNS-First>:7222

      gateways: [

        {name: dev-nats, urls: [nats://<IP/DNS-First>:7222, nats://<IP/DNS-Second>:7222, nats://<IP/DNS-Third>:7222]},

        {name: new-dev-nats, urls: [nats://<NEW-IP/DNS-First>:7222, nats://<NEW-IP/DNS-Second>:7222, nats://<NEW-IP/DNS-Third>:7222]}

      ]

    }

    TLS setup

tls: {

  cert_file: "/nats/certs/natsio.crt"

  key_file: "/nats/certs/natsio.key"

  ca_file: "/nats/certs/natsio_rootCA.pem"

}

NOTE: no_auth_user is a special directive within NATS. If you choose to keep it different for each cluster, you can have a “local account” setup in the supercluster. This is beneficial when you want to publish data that should not be accessible by any other cluster.

    Complete conf file on <IP/DNS-First> machine would look like this:

    # `server_name`: Unique name for your node; attaching a number with increment value is recommended
    # listen: DNS name for the current node:4222
    # https: DNS name for the current node:8222
    # cluster.name: This is the name of your cluster. It is compulsory for them to be the same across all nodes.
    # cluster.host: DNS name for the current node
    # cluster.routes: List of all the DNS entries which will be part of the cluster in separate lines:4248
    # account.user: Make sure to use proper names here and also keep the same across all the nodes which will be involved as a super cluster
    # no_auth_user: To be unique for individual cluster
    # gateway.name: Should be for the current cluster the node is part of. (Best to match with cluster.name mentioned above)
    # gateway.listen: The same logic mentioned for listen is applicable here with port 7222
    # gateways:Mention all the nodes in all the cluster here with nodes separated logically by the cluster they are part of via name
    # tls: Make sure to have the certs ready to place at /nats/certs
    
    server_name=nts0
    listen: <IP/DNS-First>:4222 # For other servers edit the IP/DNS remaining in the cluster
    https: <DNS-First>:8222
#http: <IP/DNS-First>:8222 # Uncomment this if you are running without tls certs
    
    jetstream {
      store_dir=/nats/storage
      max_mem_store: 6GB
      max_file_store: 90GB
    }
    
    cluster {
      name: dev-nats # Super Cluster should have unique Cluster names
      host: <IP/DNS-First>
      port: 4248
      routes = [
        nats-route://<IP/DNS-First>:4248
        nats-route://<IP/DNS-Second>:4248
        nats-route://<IP/DNS-Third>:4248
      ]
    }
    
    accounts: {
      $SYS: {
        users: [
          { user: admin, password: password }
        ]
      },
      B: {
        users: [
          {user: b, password: b}
        ],
        jetstream: enabled,
        imports: [
        # {stream: {account: "$G"}}
        ]
      },
      C: {
        users: [
          {user: c, password: c}
        ],
        jetstream: enabled,
        imports: [
        ]
      },
      E: {
        users: [
          {user: e, password: e}
        ],
        jetstream: enabled,
        imports: [
        ]
      }
    }
    
    no_auth_user: e # Change this on every server to have a user in the system which does not need password, allowing local account in supercluster
    
    gateway {
      name: dev-nats
      listen: <IP/DNS-First>:7222
      gateways: [
        {name: dev-nats, urls: [nats://<IP/DNS-First>:7222, nats://<IP/DNS-Second>:7222, nats://<IP/DNS-Third>:7222]},
        {name: new-dev-nats, urls: [nats://<NEW-IP/DNS-First>:7222, nats://<NEW-IP/DNS-Second>:7222, nats://<NEW-IP/DNS-Third>:7222]}
      ]
    }
    
    tls: {
      cert_file: "/nats/certs/natsio.crt"
      key_file: "/nats/certs/natsio.key"
      ca_file: "/nats/certs/natsio_rootCA.pem"
    }

    Recap on Conf File Changes

The configuration file on all the nodes in all the environments will need to be updated to support “gateway” and “accounts.”

• Individual changes to all the conf files need to be made.
• Changes for the gateway will be almost identical except for the name, which is specific to the local cluster the given node is part of.
• Changes for an “account” will be almost identical except for the “no_auth_user” parameter, which is specific to the local cluster the given node is part of.
• The “nats-server --signal reload” command should pick up the changes.

    Starting Service

    After adding the certs, re-own the files:

sudo chown --recursive nats:nats /nats

    Creating firewall rules:

sudo firewall-cmd --permanent --add-port=4222/tcp

sudo firewall-cmd --permanent --add-port=8222/tcp

sudo firewall-cmd --permanent --add-port=4248/tcp

sudo firewall-cmd --permanent --add-port=7222/tcp

sudo firewall-cmd --reload

    Start the service:

    sudo systemctl start nats.service

    sudo systemctl enable nats.service

    Check status:

    sudo systemctl status nats.service -l

Note: Remember to check the status command logs on node2 and node3; they should show a connection with node1 and confirm that node1 has been elected the leader.

    Setting up the context:

    Setting up context will help us in managing our cluster better with NATSCLI.

# pass the --tlsca flag in dev because we do not have the DNS registered. In staging and Production the `tlsca` flag will not be needed because certs will be registered.

nats context add nats --server <IP/DNS-First>:4222,<IP/DNS-Second>:4222,<IP/DNS-Third>:4222 --description "Awesome Nats Servers List" --tlsca /nats/certs/natsio_rootCA.pem --select

    nats context ls

    nats account info

The complete file for starting the service would look like this:

    #!/bin/bash
    
    # Own the files
    sudo chown --recursive nats:nats /nats
    
    # Create Firewall Rules
    sudo firewall-cmd --permanent --add-port=4222/tcp
    sudo firewall-cmd --permanent --add-port=8222/tcp
    sudo firewall-cmd --permanent --add-port=4248/tcp
    sudo firewall-cmd --permanent --add-port=7222/tcp
    sudo firewall-cmd --reload
    
    # Start Service
    sudo systemctl start nats.service
    sudo systemctl enable nats.service
    sudo systemctl status nats.service -l
    
    # Setup Context
    # pass the --tlsca flag in dev because we do not have the DNS registered. In staging and Production the `tlsca` flag will not be needed because certs will be registered.
    nats context add nats --server <IP/DNS-First>:4222,<IP/DNS-Second>:4222,<IP/DNS-Third>:4222 --description "Awesome Nats Servers List" --tlsca /nats/certs/natsio_rootCA.pem --select
    
    nats context ls
    nats account info

    Validation: 

    The account info command should shuffle among the servers in the Connected URL string.

    Stream Listing:

Streams that are available across regions require credentials. The creds should be common across all clusters:

    The same info can be obtained from the different clusters when the same command is fired:

    To fetch local streams that are present under the no_auth_user:

    And from the different clusters using the same command (without credentials), we should get a different stream:

    Advanced Messaging Patterns with NATS

    In this section, we will explore advanced messaging patterns that leverage the capabilities of NATS for more complex communication scenarios.

    A. Request-Reply Pattern

    The request-reply pattern allows applications to send requests and receive corresponding responses through NATS. It enables synchronous communication, making it suitable for scenarios where immediate responses are required.
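As a hedged sketch of the mechanics (not a real client – ToyBroker and the subject names are invented for illustration), the requester publishes to a subject with a unique reply inbox, and the responder publishes its answer back to that inbox:

```python
# Toy sketch of NATS request-reply -- illustrative only.
import uuid

class ToyBroker:
    def __init__(self):
        self.handlers = {}  # subject -> handler(msg, reply_subject)

    def subscribe(self, subject, handler):
        self.handlers[subject] = handler

    def publish(self, subject, msg, reply=None):
        handler = self.handlers.get(subject)
        if handler:
            handler(msg, reply)

    def request(self, subject, msg):
        # a unique inbox subject collects the single reply
        inbox = f"_INBOX.{uuid.uuid4().hex}"
        result = []
        self.subscribe(inbox, lambda m, _reply: result.append(m))
        self.publish(subject, msg, reply=inbox)
        return result[0] if result else None

broker = ToyBroker()
# Responder: replies with the upper-cased request payload.
broker.subscribe("svc.echo", lambda msg, reply: broker.publish(reply, msg.upper()))
print(broker.request("svc.echo", "ping"))  # prints: PING
```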

    B. Publish-Subscribe Pattern with Wildcards

NATS introduces the concept of wildcards to the publish-subscribe pattern, allowing subscribers to receive messages based on pattern matching. This enables greater flexibility in subscription matching and expands the possibilities of message distribution.
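The matching rules can be sketched in a few lines of Python. This is a simplified model of the documented semantics – subjects are dot-separated tokens, `*` matches exactly one token, and `>`, which must be the last token, matches one or more trailing tokens:

```python
# Simplified model of NATS subject wildcard matching (illustrative):
# '*' matches exactly one token; '>' matches one or more trailing tokens.
def subject_matches(pattern, subject):
    p_toks, s_toks = pattern.split("."), subject.split(".")
    for i, tok in enumerate(p_toks):
        if tok == ">":
            return len(s_toks) > i  # '>' needs at least one remaining token
        if i >= len(s_toks):
            return False
        if tok != "*" and tok != s_toks[i]:
            return False
    return len(p_toks) == len(s_toks)

assert subject_matches("time.us.*", "time.us.east")
assert not subject_matches("time.us.*", "time.us.east.atlanta")  # '*' is one token
assert subject_matches("time.>", "time.us.east.atlanta")         # '>' spans the rest
```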

    C. Queue Groups for Load Balancing and Fault Tolerance

    Queue groups provide load balancing and fault tolerance capabilities in NATS. By grouping subscribers together, NATS ensures that messages are distributed evenly across the subscribers within the group, preventing any single subscriber from being overwhelmed.
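A toy Python model of the idea (illustrative only; the worker names are made up): the server delivers each message on the subject to exactly one member of the queue group, rather than to every subscriber:

```python
# Toy model of a NATS queue group -- illustrative only.
import random

class QueueGroup:
    def __init__(self, members):
        # member name -> list collecting that worker's messages
        self.members = members

    def deliver(self, msg):
        # nats-server picks one subscriber in the group per message
        worker = random.choice(list(self.members))
        self.members[worker].append(msg)

group = QueueGroup({"worker-1": [], "worker-2": [], "worker-3": []})
for i in range(9):
    group.deliver(f"job-{i}")

total = sum(len(msgs) for msgs in group.members.values())
print(total)  # 9 -- every job was handled exactly once across the group
```

Contrast this with plain pub/sub, where all three workers would each receive all nine jobs.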

    Overcoming Real-World Challenges

    In this section, we will discuss real-world challenges that developers may encounter when working with NATS and explore strategies to overcome them.

    A. Scalability and High Availability in NATS

    As applications grow and message traffic increases, scalability, and high availability become crucial considerations. NATS offers various techniques and features to address these challenges, including clustering, load balancing, and fault tolerance mechanisms.

    B. Securing NATS Communication

    Security is paramount in any messaging system, and NATS provides several mechanisms to secure communication. These include authentication, encryption, access control, and secure network configurations.

    C. Monitoring and Debugging Techniques

    Efficiently monitoring and troubleshooting a NATS deployment is essential for maintaining system health. NATS provides tools and techniques to monitor message traffic, track performance metrics, and identify and resolve potential issues in real time.

    Recovery Scenarios in NATS 

This section is intended to help in scenarios when NATS services are not usable, such as node failure, a node becoming unreachable, the service being down, or an entire region going down.

    Summary

    In this article, we have embarked on a journey from being a NATS novice to mastering its intricacies. We have explored the importance and applicability of NATS in the software development landscape. Through a comprehensive exploration of NATS’ definition, architecture, key features, and use cases, we have built a strong foundation in NATS. We have also examined advanced messaging patterns and discussed strategies to overcome real-world challenges in scalability, security, and monitoring. Furthermore, we have delved into the Recovery scenarios, which might come in handy when things don’t behave as expected. Armed with this knowledge, developers can confidently utilize NATS to unlock its full potential in their projects.

  • Unveiling the Magic of Kubernetes: Exploring Pod Priority, Priority Classes, and Pod Preemption

Introduction:

Generally, during the deployment of a manifest, we observe that some pods get scheduled successfully while a few critical pods encounter scheduling issues. The critical pods must therefore be scheduled ahead of the other pods. While exploring, we discovered a built-in solution for this: Pod Priority and Priority Classes. In this blog, we’ll talk about Priority Classes and Pod Priority and how we can implement them in our use case.

    Pod Priority:

    It is used to prioritize one pod over another based on its importance. Pod Priority is particularly useful when critical pods cannot be scheduled due to limited resources.

    Priority Classes:

This Kubernetes object defines the priority of pods. Priority is set with an integer value; a higher value gives the pod higher priority.

    Understanding Priority Values:

    Priority Classes in Kubernetes are associated with priority values that range from 0 to 1000000000, with a higher value indicating greater importance.

    These values act as a guide for the scheduler when allocating resources. 

    Pod Preemption:

    It is already enabled when we create a priority class. The purpose of Pod Preemption is to evict lower-priority pods in order to make room for higher-priority pods to be scheduled.
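The decision logic can be sketched as a toy Python model. This is a simplification of the real scheduler – each pod costs one unit of node capacity here, and the pod names anticipate the example scenario below:

```python
# Toy sketch of pod preemption (illustrative, not the real scheduler):
# when a node is full, lower-priority pods are evicted to make room
# for a higher-priority pod.
def schedule(node, capacity, pod, priority):
    # node: dict of running pod name -> priority; each pod costs 1 unit
    if len(node) < capacity:
        node[pod] = priority
        return []  # scheduled without preemption
    # candidate victims are strictly lower in priority than the incoming pod
    victims = [p for p, prio in node.items() if prio < priority]
    if not victims:
        return None  # incoming pod stays pending
    victim = min(victims, key=node.get)  # evict the lowest-priority pod
    del node[victim]
    node[pod] = priority
    return [victim]

node = {"shopping-cart-pod": 100000, "product-rec-pod": 500000}
evicted = schedule(node, capacity=2, pod="checkout-pod", priority=1000000)
print(evicted)  # ['shopping-cart-pod'] -- the low-priority pod is preempted
```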

    Example Scenario: The Enchanted Shop

    Let’s dive into a scenario featuring “The Enchanted Shop,” a Kubernetes cluster hosting an online store. The shop has three pods, each with a distinct role and priority:

    Priority Class:

    • Create High priority class: 
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority
    value: 1000000

    • Create Medium priority class:
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: medium-priority
    value: 500000

    • Create Low priority class:
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: low-priority
    value: 100000

    Pods:

    • Checkout Pod (High Priority): This pod is responsible for processing customer orders and must receive top priority.

    Create the Checkout Pod with a high-priority class:

    apiVersion: v1
    kind: Pod
    metadata:
      name: checkout-pod
      labels:
        app: checkout
    spec:
      priorityClassName: high-priority
      containers:
      - name: checkout-container
        image: nginx:checkout

    • Product Recommendations Pod (Medium Priority):

    This pod provides personalized product recommendations to customers and holds moderate importance.

    Create the Product Recommendations Pod with a medium priority class:

    apiVersion: v1
    kind: Pod
    metadata:
      name: product-rec-pod
      labels:
        app: product-recommendations
    spec:
      priorityClassName: medium-priority
      containers:
      - name: product-rec-container
        image: nginx:store

    • Shopping Cart Pod (Low Priority):

    This pod manages customers’ shopping carts and has a lower priority compared to the others.

    Create the Shopping Cart Pod with a low-priority class:

    apiVersion: v1
    kind: Pod
    metadata:
      name: shopping-cart-pod
      labels:
        app: shopping-cart
    spec:
      priorityClassName: low-priority
      containers:
      - name: shopping-cart-container
        image: nginx:cart

    With these pods and their respective priority classes, Kubernetes will allocate resources based on their importance, ensuring smooth operation even during peak loads.

    Commands to Witness the Magic:

    • Verify Priority Classes:

    kubectl get priorityclasses

    Note: Kubernetes includes two predefined Priority Classes: system-cluster-critical and system-node-critical. These classes are specifically designed to prioritize the scheduling of critical components, ensuring they are always scheduled first.

    • Check Pod Priority:

    Conclusion:

In Kubernetes, you have the flexibility to define how your pods are scheduled. This ensures that your critical pods receive priority over lower-priority pods during the scheduling process. To dig deeper into the concepts of Pod Priority, Priority Classes, and Pod Preemption, refer to the following links.

  • How to deploy GitHub Actions Self-Hosted Runners on Kubernetes

GitHub Actions jobs are run in the cloud by default; however, sometimes we want to run jobs in our own customized/private environment where we have full control. That is where a self-hosted runner comes in.

    To get a basic understanding of running self-hosted runners on the Kubernetes cluster, this blog is perfect for you. 

    We’ll be focusing on running GitHub Actions on a self-hosted runner on Kubernetes. 

    An example use case would be to create an automation in GitHub Actions to execute MySQL queries on MySQL Database running in a private network (i.e., MySQL DB, which is not accessible publicly).

A self-hosted runner normally requires the provisioning and configuration of a virtual machine instance; here, we are running it on Kubernetes instead. For running a self-hosted runner on a Kubernetes cluster, the actions-runner-controller makes that possible.

    This blog aims to try out self-hosted runners on Kubernetes and covers:

    1. Deploying MySQL Database on minikube, which is accessible only within Kubernetes Cluster.
    2. Deploying self-hosted action runners on the minikube.
    3. Running GitHub Action on minikube to execute MySQL queries on MySQL Database.

    Steps for completing this tutorial:

    Create a GitHub repository

    1. Create a private repository on GitHub. I am creating it with the name velotio/action-runner-poc.

    Setup a Kubernetes cluster using minikube

    1. Install Docker.
    2. Install Minikube.
    3. Install Helm 
    4. Install kubectl

    Install cert-manager on a Kubernetes cluster

    • By default, actions-runner-controller uses cert-manager for certificate management of admission webhook, so we have to make sure cert-manager is installed on Kubernetes before we install actions-runner-controller. 
    • Run the below helm commands to install cert-manager on minikube.
• Verify installation using “kubectl --namespace cert-manager get all”. If everything is okay, you will see output as below:

    Setting Up Authentication for Hosted Runners‍

    There are two ways for actions-runner-controller to authenticate with the GitHub API (only 1 can be configured at a time, however):

    1. Using a GitHub App (not supported for enterprise-level runners due to lack of support from GitHub.)
    2. Using a PAT (personal access token)

    To keep this blog simple, we are going with PAT.

To authenticate actions-runner-controller with the GitHub API, we can use a PAT, with which actions-runner-controller registers the self-hosted runners.

• Go to account > Settings > Developer settings > Personal access tokens. Click on “Generate new token”. Under scopes, select “Full control of private repositories”.
• Click on the “Generate token” button.
    • Copy the generated token and run the below commands to create a Kubernetes secret, which will be used by action-runner-controller deployment.
    export GITHUB_TOKEN=XXXxxxXXXxxxxXYAVNa 

    kubectl create ns actions-runner-system

    Create secret

kubectl create secret generic controller-manager -n actions-runner-system \
--from-literal=github_token=${GITHUB_TOKEN}

    Install action runner controller on the Kubernetes cluster

    • Run the below helm commands
    helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
    helm repo update
helm upgrade --install --namespace actions-runner-system \
--create-namespace --wait actions-runner-controller \
actions-runner-controller/actions-runner-controller \
--set syncPeriod=1m

    • Verify that the action-runner-controller installed properly using below command
    kubectl --namespace actions-runner-system get all


    Create a Repository Runner

    • Create a RunnerDeployment Kubernetes object, which will create a self-hosted runner named k8s-action-runner for the GitHub repository velotio/action-runner-poc
• Please update the repo name from “velotio/action-runner-poc” to “<Your-repo-name>”.
    • To create the RunnerDeployment object, create the file runner.yaml as follows:
    apiVersion: actions.summerwind.dev/v1alpha1
    kind: RunnerDeployment
    metadata:
     name: k8s-action-runner
     namespace: actions-runner-system
    spec:
     replicas: 2
     template:
       spec:
         repository: velotio/action-runner-poc

    • To create, run this command:
    kubectl create -f runner.yaml

    Check that the pod is running using the below command:

    kubectl get pod -n actions-runner-system | grep -i "k8s-action-runner"

• If everything goes well, you should see two action runners on Kubernetes, and the same are registered on GitHub. Check under Settings > Actions > Runners of your repository.
    • Check the pod with kubectl get po -n actions-runner-system

    Install a MySQL Database on the Kubernetes cluster

    • Create PV and PVC for MySQL Database. 
    • Create mysql-pv.yaml with the below content.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
     name: mysql-pv-volume
     labels:
       type: local
    spec:
     capacity:
       storage: 2Gi
     accessModes:
       - ReadWriteOnce
     hostPath:
       path: "/mnt/data"
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
     name: mysql-pv-claim
    spec:
     accessModes:
       - ReadWriteOnce
     resources:
       requests:
         storage: 2Gi

    • Create mysql namespace
    kubectl create ns mysql

    • Now apply mysql-pv.yaml to create PV and PVC 
    kubectl create -f mysql-pv.yaml -n mysql

    Create the file mysql-svc-deploy.yaml and add the below content to it.

    Here, we have used MYSQL_ROOT_PASSWORD as “password”.

    apiVersion: v1
    kind: Service
    metadata:
     name: mysql
    spec:
     ports:
       - port: 3306
     selector:
       app: mysql
     clusterIP: None
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
     name: mysql
    spec:
     selector:
       matchLabels:
         app: mysql
     strategy:
       type: Recreate
     template:
       metadata:
         labels:
           app: mysql
       spec:
         containers:
           - image: mysql:5.6
             name: mysql
             env:
                 # Use secret in real usage
               - name: MYSQL_ROOT_PASSWORD
                 value: password
             ports:
               - containerPort: 3306
                 name: mysql
             volumeMounts:
               - name: mysql-persistent-storage
                 mountPath: /var/lib/mysql
         volumes:
           - name: mysql-persistent-storage
             persistentVolumeClaim:
               claimName: mysql-pv-claim

    • Create the service and deployment
    kubectl create -f mysql-svc-deploy.yaml -n mysql

    • Verify that the MySQL database is running
    kubectl get po -n mysql

    Create a GitHub repository secret to store MySQL password

    We will use the MySQL password in the GitHub Actions workflow file, and as a good practice, we should not keep it in plain text. So we will store the MySQL password in GitHub secrets and reference the secret in our workflow file.

    • Create a secret in the GitHub repository, name it “MYSQL_PASS”, and set its value to “password”.

    Create a GitHub workflow file

    • GitHub workflows are written in YAML syntax. Each workflow gets a separate YAML file, stored in the .github/workflows/ directory. So, create a .github/workflows/ directory in your repository and create a file .github/workflows/mysql_workflow.yaml as follows:
    ---
    name: Example 1
    on:
     push:
       branches: [ main ]
    jobs:
     build:
       name: Build-job
       runs-on: self-hosted
       steps:
       - name: Checkout
         uses: actions/checkout@v2
     
       - name: MySQLQuery
         env:
           PASS: ${{ secrets.MYSQL_PASS }}
         run: |
           docker run -v ${GITHUB_WORKSPACE}:/var/lib/docker --rm mysql:5.6 sh -c "mysql -u root -p$PASS -hmysql.mysql.svc.cluster.local </var/lib/docker/test.sql"

    • The docker run command in the mysql_workflow.yaml file refers to a .sql file, test.sql. So, create a test.sql file in your repository as follows:
    use mysql;
    CREATE TABLE IF NOT EXISTS Persons (
       PersonID int,
       LastName varchar(255),
       FirstName varchar(255),
       Address varchar(255),
       City varchar(255)
    );
     
    SHOW TABLES;

    • In test.sql, we run MySQL queries that create a table and list the existing tables.
    • Push changes to your repository main branch.
    • If everything is fine, you will be able to see that the GitHub action is getting executed in a self-hosted runner pod. You can check it under the “Actions” tab of your repository.
    • You can check the workflow logs to see the output of the SHOW TABLES command used in the test.sql file and check whether the Persons table was created.

  • How to Setup HashiCorp Vault HA Cluster with Integrated Storage (Raft)

    As businesses move their data to the public cloud, one of the most pressing issues is how to keep it safe from unauthorized access.

    Using a tool like HashiCorp Vault gives you greater control over your sensitive credentials and fulfills cloud security regulations.

    In this blog, we’ll walk you through HashiCorp Vault High Availability Setup.

    Hashicorp Vault

    Hashicorp Vault is an open-source tool that provides a secure, reliable way to store and distribute sensitive information like API keys, access tokens, passwords, etc. Vault provides high-level policy management, secret leasing, audit logging, and automatic revocation to protect this information using UI, CLI, or HTTP API.

    High Availability

    Vault can run in a High Availability mode to protect against outages by running multiple Vault servers. When running in HA mode, Vault servers have two additional states, i.e., active and standby. Within a Vault cluster, only a single instance will be active, handling all requests, and all standby instances redirect requests to the active instance.

    Integrated Storage Raft

    The Integrated Storage backend is used to maintain Vault’s data. Unlike other storage backends, Integrated Storage does not operate from a single source of data. Instead, all the nodes in a Vault cluster will have a replicated copy of Vault’s data. Data gets replicated across all the nodes via the Raft Consensus Algorithm.

    Raft is officially supported by Hashicorp.

    Architecture

    Prerequisites

    This setup requires Vault, sudo access on the machines, and the configuration below to create the cluster.

    • Install Vault v1.6.3+ent or later on all nodes in the Vault cluster 

    In this example, we have three CentOS VMs provisioned using VMware.

    Setup

    1. Verify the Vault version on all the nodes using the below command (in this case, we have three nodes: node1, node2, and node3):

    vault --version

    2. Configure SSL certificates

    Note: Vault should always be used with TLS in production to provide secure communication between clients and the Vault server. It requires a certificate file and key file on each Vault host.

    We can generate SSL certs for the Vault cluster on the master node and copy them to the other nodes in the cluster.

    Refer to: https://developer.hashicorp.com/vault/tutorials/secrets-management/pki-engine#scenario-introduction for generating SSL certs.

    • Copy tls.crt, tls.key, and tls_ca.pem to /etc/vault.d/ssl/
    • Change ownership to `vault`
    [user@node1 ~]$ cd /etc/vault.d/ssl/           
    [user@node1 ssl]$ sudo chown vault. tls*

    • Copy tls* from /etc/vault.d/ssl/ to the same location on the other nodes

    3. Configure the enterprise license. Copy the license to all nodes:

    cp /root/vault.hclic /etc/vault.d/vault.hclic
    chown root:vault /etc/vault.d/vault.hclic
    chmod 0640 /etc/vault.d/vault.hclic

    4. Create the storage directory for raft storage on all nodes:

    sudo mkdir --parents /opt/raft
    sudo chown --recursive vault:vault /opt/raft

    5. Set firewall rules on all nodes:

    sudo firewall-cmd --permanent --add-port=8200/tcp
    sudo firewall-cmd --permanent --add-port=8201/tcp
    sudo firewall-cmd --reload

    6. Create the Vault configuration file on all nodes:

    ### Node 1 ###
    [user@node1 vault.d]$ cat vault.hcl
    storage "raft" {
        path = "/opt/raft"
        node_id = "node1"
        retry_join 
        {
            leader_api_addr = "https://node2.int.us-west-1-dev.central.example.com:8200"
            leader_ca_cert_file = "/etc/vault.d/ssl/tls_ca.pem"
            leader_client_cert_file = "/etc/vault.d/ssl/tls.crt"
            leader_client_key_file = "/etc/vault.d/ssl/tls.key"
        }
        retry_join 
        {
            leader_api_addr = "https://node3.int.us-west-1-dev.central.example.com:8200"
            leader_ca_cert_file = "/etc/vault.d/ssl/tls_ca.pem"
            leader_client_cert_file = "/etc/vault.d/ssl/tls.crt"
            leader_client_key_file = "/etc/vault.d/ssl/tls.key"
        }
    }
    
    listener "tcp" {
       address = "0.0.0.0:8200"
       tls_disable = false
       tls_cert_file = "/etc/vault.d/ssl/tls.crt"
       tls_key_file = "/etc/vault.d/ssl/tls.key"
       tls_client_ca_file = "/etc/vault.d/ssl/tls_ca.pem"
       tls_cipher_suites = "TLS_TEST_128_GCM_SHA256,
                            TLS_TEST_128_GCM_SHA256,
                            TLS_TEST20_POLY1305,
                            TLS_TEST_256_GCM_SHA384,
                            TLS_TEST20_POLY1305,
                            TLS_TEST_256_GCM_SHA384"
    }
    api_addr = "https://node1.int.us-west-1-dev.central.example.com:8200"
    cluster_addr = "https://node1.int.us-west-1-dev.central.example.com:8201"
    disable_mlock = true
    ui = true
    log_level = "trace"
    disable_cache = true
    cluster_name = "POC"
    
    # Enterprise license_path
    # This will be required for enterprise as of v1.8
    license_path = "/etc/vault.d/vault.hclic"

    ### Node 2 ###
    [user@node2 vault.d]$ cat vault.hcl
    storage "raft" {
        path = "/opt/raft"
        node_id = "node2"
        retry_join 
        {
            leader_api_addr = "https://node1.int.us-west-1-dev.central.example.com:8200"
            leader_ca_cert_file = "/etc/vault.d/ssl/tls_ca.pem"
            leader_client_cert_file = "/etc/vault.d/ssl/tls.crt"
            leader_client_key_file = "/etc/vault.d/ssl/tls.key"
        }
        retry_join 
        {
            leader_api_addr = "https://node3.int.us-west-1-dev.central.example.com:8200"
            leader_ca_cert_file = "/etc/vault.d/ssl/tls_ca.pem"
            leader_client_cert_file = "/etc/vault.d/ssl/tls.crt"
            leader_client_key_file = "/etc/vault.d/ssl/tls.key"
        } 
    }
    
    listener "tcp" {
       address = "0.0.0.0:8200"
       tls_disable = false
       tls_cert_file = "/etc/vault.d/ssl/tls.crt"
       tls_key_file = "/etc/vault.d/ssl/tls.key"
       tls_client_ca_file = "/etc/vault.d/ssl/tls_ca.pem"
       tls_cipher_suites = "TLS_TEST_128_GCM_SHA256,
                            TLS_TEST_128_GCM_SHA256,
                            TLS_TEST20_POLY1305,
                            TLS_TEST_256_GCM_SHA384,
                            TLS_TEST20_POLY1305,
                            TLS_TEST_256_GCM_SHA384"
    }
    api_addr = "https://node2.int.us-west-1-dev.central.example.com:8200"
    cluster_addr = "https://node2.int.us-west-1-dev.central.example.com:8201"
    disable_mlock = true
    ui = true
    log_level = "trace"
    disable_cache = true
    cluster_name = "POC"
    
    # Enterprise license_path
    # This will be required for enterprise as of v1.8
    license_path = "/etc/vault.d/vault.hclic"

    ### Node 3 ###
    [user@node3 ~]$ cat /etc/vault.d/vault.hcl
    storage "raft" {
        path = "/opt/raft"
        node_id = "node3"
        retry_join 
        {
            leader_api_addr = "https://node1.int.us-west-1-dev.central.example.com:8200"
            leader_ca_cert_file = "/etc/vault.d/ssl/tls_ca.pem"
            leader_client_cert_file = "/etc/vault.d/ssl/tls.crt"
            leader_client_key_file = "/etc/vault.d/ssl/tls.key"
        }
        retry_join 
        {
            leader_api_addr = "https://node2.int.us-west-1-dev.central.example.com:8200"
            leader_ca_cert_file = "/etc/vault.d/ssl/tls_ca.pem"
            leader_client_cert_file = "/etc/vault.d/ssl/tls.crt"
            leader_client_key_file = "/etc/vault.d/ssl/tls.key"
        }
    }
    
    listener "tcp" {
       address = "0.0.0.0:8200"
       tls_disable = false
       tls_cert_file = "/etc/vault.d/ssl/tls.crt"
       tls_key_file = "/etc/vault.d/ssl/tls.key"
       tls_client_ca_file = "/etc/vault.d/ssl/tls_ca.pem"
       tls_cipher_suites = "TLS_TEST_128_GCM_SHA256,
                            TLS_TEST_128_GCM_SHA256,
                            TLS_TEST20_POLY1305,
                            TLS_TEST_256_GCM_SHA384,
                            TLS_TEST20_POLY1305,
                            TLS_TEST_256_GCM_SHA384"
    }
    api_addr = "https://node3.int.us-west-1-dev.central.example.com:8200"
    cluster_addr = "https://node3.int.us-west-1-dev.central.example.com:8201"
    disable_mlock = true
    ui = true
    log_level = "trace"
    disable_cache = true
    cluster_name = "POC"
    
    # Enterprise license_path
    # This will be required for enterprise as of v1.8
    license_path = "/etc/vault.d/vault.hclic"

    7. Set environment variables on all nodes:

    export VAULT_ADDR=https://$(hostname):8200
    export VAULT_CACERT=/etc/vault.d/ssl/tls_ca.pem
    export CA_CERT=`cat /etc/vault.d/ssl/tls_ca.pem`

    8. Start Vault as a service on all nodes:

    You can view the systemd unit file, then enable and start the service:

    cat /etc/systemd/system/vault.service
    systemctl enable vault.service
    systemctl start vault.service
    systemctl status vault.service

    9. Check Vault status on all nodes:

    vault status

    10. Initialize Vault with the following command on Vault node 1 only. Store the unseal keys securely.

    [user@node1 vault.d]$ vault operator init -key-shares=1 -key-threshold=1
    Unseal Key 1: HPY/g5OiT8ivD6L4Bqfjx9L1We2MVb4WZAqKZk6zFf8=
    Initial Root Token: hvs.j4qTq1IZP9nscILMtN2p9GE0
    Vault initialized with 1 key shares and a key threshold of 1.
    Please securely distribute the key shares printed above. 
    When the Vault is re-sealed, restarted, or stopped, you must supply at least 1 of these keys to unseal it
    before it can start servicing requests.
    Vault does not store the generated root key. 
    Without at least 1 keys to reconstruct the root key, Vault will remain permanently sealed!
    It is possible to generate new unseal keys, provided you have a
    quorum of existing unseal keys shares. See "vault operator rekey" for more information.

    11. Set the Vault token environment variable so the vault CLI can authenticate to the server. Use the following command, replacing <initial-root-token> with the value generated in the previous step.

    export VAULT_TOKEN=<initial-root-token>
    echo "export VAULT_TOKEN=$VAULT_TOKEN" >> /root/.bash_profile
    ### Repeat this step for the other 2 servers.

    12. Unseal Vault1 using the unseal key generated in step 10. Notice the Unseal Progress key-value change as you present each key. After meeting the key threshold, the status of the key value for Sealed should change from true to false.

    [user@node1 vault.d]$ vault operator unseal HPY/g5OiT8ivD6L4Bqfjx9L1We2MVb4WZAqKZk6zFf8=
    Key                         Value
    ---                         -----
    Seal Type                   shamir
    Initialized                 true
    Sealed                      false
    Total Shares                1
    Threshold                   1
    Version                     1.11.0
    Build Date                  2022-06-17T15:48:44Z
    Storage Type                raft
    Cluster Name                POC
    Cluster ID                  109658fe-36bd-7d28-bf92-f095c77e860c
    HA Enabled                  true
    HA Cluster                  https://node1.int.us-west-1-dev.central.example.com:8201
    HA Mode                     active
    Active Since                2022-06-29T12:50:46.992698336Z
    Raft Committed Index        36
    Raft Applied Index          36

    13. Unseal Vault2 (Use the same unseal key generated in step 10 for Vault1):

    [user@node2 vault.d]$ vault operator unseal HPY/g5OiT8ivD6L4Bqfjx9L1We2MVb4WZAqKZk6zFf8=
    Key                Value
    ---                -----
    Seal Type          shamir
    Initialized        true
    Sealed             true
    Total Shares       1
    Threshold          1
    Unseal Progress    0/1
    Unseal Nonce       n/a
    Version            1.11.0
    Build Date         2022-06-17T15:48:44Z
    Storage Type       raft
    HA Enabled         true
    
    [user@node2 vault.d]$ vault status
    Key                   Value
    ---                   -----
    Seal Type             shamir
    Initialized           true
    Sealed                true
    Total Shares          1
    Threshold             1
    Version               1.11.0
    Build Date            2022-06-17T15:48:44Z
    Storage Type          raft
    Cluster Name          POC
    Cluster ID            109658fe-36bd-7d28-bf92-f095c77e860c
    HA Enabled            true
    HA Cluster            https://node1.int.us-west-1-dev.central.example.com:8201
    HA Mode               standby
    Active Node Address   https://node1.int.us-west-1-dev.central.example.com:8200
    Raft Committed Index  37
    Raft Applied Index    37

    14. Unseal Vault3 (Use the same unseal key generated in step 10 for Vault1):

    [user@node3 ~]$ vault operator unseal HPY/g5OiT8ivD6L4Bqfjx9L1We2MVb4WZAqKZk6zFf8=
    Key                Value
    ---                -----
    Seal Type          shamir
    Initialized        true
    Sealed             true
    Total Shares       1
    Threshold          1
    Unseal Progress    0/1
    Unseal Nonce       n/a
    Version            1.11.0
    Build Date         2022-06-17T15:48:44Z
    Storage Type       raft
    HA Enabled         true
    
    [user@node3 ~]$ vault status
    Key                       Value
    ---                       -----
    Seal Type                 shamir
    Initialized               true
    Sealed                    false
    Total Shares              1
    Threshold                 1
    Version                   1.11.0
    Build Date                2022-06-17T15:48:44Z
    Storage Type              raft
    Cluster Name              POC
    Cluster ID                109658fe-36bd-7d28-bf92-f095c77e860c
    HA Enabled                true
    HA Cluster                https://node1.int.us-west-1-dev.central.example.com:8201
    HA Mode                   standby
    Active Node Address       https://node1.int.us-west-1-dev.central.example.com:8200
    Raft Committed Index      39
    Raft Applied Index        39

    15. Check the cluster’s raft status with the following command:

    [user@node3 ~]$ vault operator raft list-peers
    Node      Address                                            State       Voter
    ----      -------                                            -----       -----
    node1    node1.int.us-west-1-dev.central.example.com:8201    leader      true
    node2    node2.int.us-west-1-dev.central.example.com:8201    follower    true
    node3    node3.int.us-west-1-dev.central.example.com:8201    follower    true

    16. Currently, node1 is the active node. We can experiment to see what happens if node1 steps down from its active node duty.

    In the terminal where VAULT_ADDR is set to: https://node1.int.us-west-1-dev.central.example.com, execute the step-down command.

    $ vault operator step-down # equivalent of stopping the node or stopping the systemctl service
    Success! Stepped down: https://node2.int.us-west-1-dev.central.example.com:8200

    In the terminal where VAULT_ADDR is set to https://node2.int.us-west-1-dev.central.example.com:8200, examine the raft peer set.

    [user@node1 ~]$ vault operator raft list-peers
    Node      Address                                            State       Voter
    ----      -------                                            -----       -----
    node1    node1.int.us-west-1-dev.central.example.com:8201    follower    true
    node2    node2.int.us-west-1-dev.central.example.com:8201    leader      true
    node3    node3.int.us-west-1-dev.central.example.com:8201    follower    true

    Conclusion 

    Vault servers are now operational in High Availability mode. We can test this by writing a secret from either the active or a standby Vault instance and seeing it succeed, which exercises request forwarding. We can also shut down the active Vault instance (sudo systemctl stop vault) to simulate a system failure and watch a standby instance assume leadership.

  • How to Avoid Screwing Up CI/CD: Best Practices for DevOps Team

    Basic Fundamentals (One-line Definitions):

    CI/CD is defined as continuous integration, continuous delivery, and/or continuous deployment. 

    Continuous Integration: 

    Continuous integration is defined as a practice where a developer’s changes are merged back to the main branch as soon as possible to avoid facing integration challenges.

    Continuous Delivery:

    Continuous delivery is the ability to get all types of changes deployed to production or delivered to the customer in a safe, quick, and sustainable way.

    An oversimplified CI/CD pipeline

    Why CI/CD?

    • Avoid integration hell

    In most modern application development scenarios, multiple developers work on different features simultaneously. However, if all the source code is to be merged on the same day, the result can be a manual, tedious process of resolving conflicts between branches, as well as a lot of rework.  

    Continuous integration (CI) is the process of merging code changes frequently (daily, or even multiple times a day) to a shared branch (aka master or trunk branch). The CI process makes it easier and quicker to identify bugs, saving a lot of developer time and effort.

    • Faster time to market

    Less time is spent on solving integration problems and reworking, allowing faster time to market for products.

    • Have a better and more reliable code

    The changes are small and thus easier to test. Each change goes through a rigorous cycle of unit tests, integration/regression tests, and performance tests before being pushed to prod, ensuring better code quality.

    • Lower costs 

    As we have a faster time to market and fewer integration problems, a lot of developer time and development cycles are saved, leading to a lower cost of development.

    Enough theory; now let’s dive into “How do I get started?”

    Basic Overview of CI/CD

    Decide on your branching strategy

    A good branching strategy should have the following characteristics:

    • Defines a clear development process from initial commit to production deployment
    • Enables parallel development
    • Optimizes developer productivity
    • Enables faster time to market for products and services
    • Facilitates integration with all DevOps practices and tools, such as different version control systems

    Types of branching strategies (please refer to the references for more details):

    • Git flow – Ideal when handling multiple versions of the production code and for enterprise customers who have to adhere to release plans and workflows 
    • Trunk-based development – Ideal for simpler workflows and if automated testing is available, leading to a faster development time
    • Other branching strategies that you can read about are GitHub flow, GitLab flow, and Forking flow.

    Build or compile your code 

    The next step is to build/compile your code, and if it is interpreted code, go ahead and package it.

    Build best practices:

    • Build once – Build a single artifact and promote it across environments; rebuilding the artifact separately for each environment is inadvisable.
    • Exact versions of third-party dependencies should be used.
    • Libraries used for debugging, etc., should be removed from the product package.
    • Have a feedback loop so that the team is made aware of the status of the build step.
    • Make sure your builds are versioned correctly using semver 2.0 (https://semver.org/).
    • Commit early, commit often.
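To make the SemVer point concrete, here is a minimal Python sketch that compares two MAJOR.MINOR.PATCH strings numerically rather than lexicographically; it deliberately ignores the pre-release and build-metadata parts that full SemVer 2.0 also defines:

```python
def parse_semver(version):
    """Split a MAJOR.MINOR.PATCH string into a tuple of ints."""
    major, minor, patch = version.split(".")
    return (int(major), int(minor), int(patch))

def is_newer(candidate, current):
    """True if candidate is a later release than current (numeric compare)."""
    return parse_semver(candidate) > parse_semver(current)

# "1.10.0" sorts after "1.9.3" numerically, though not as a plain string.
print(is_newer("1.10.0", "1.9.3"))  # -> True
```

Tuple comparison is what keeps artifact versions sortable in a registry; a plain string sort would put "1.10.0" before "1.9.3".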

    Select tool for stitching the pipeline together

    • You can choose from GitHub actions, Jenkins, circleci, GitLab, etc.
    • Tool selection will not affect the quality of your CI/CD pipeline, but maintenance overhead is higher for self-hosted services like Jenkins deployed on-prem as opposed to managed CI/CD services.

    Tools and strategy for SAST

    Instead of just DevOps, we should think in terms of DevSecOps. To make the code more secure and reliable, we can introduce a step for SAST (static application security testing).

    SAST, or static analysis, is a testing procedure that analyzes source code to find security vulnerabilities. SAST scans the application code before the code is compiled. It’s also known as white-box testing, and it helps shift towards a security-first mindset as the code is scanned right at the start of SDLC.

    Problems SAST solves:

    • SAST tools give developers real-time feedback as they code, helping them fix issues before they pass the code to the next phase of the SDLC. 
    • This prevents security-related issues from being considered an afterthought. 
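As a toy illustration of what static analysis does (not a real SAST tool), the sketch below walks Python source with the standard ast module and reports the line numbers of eval() calls, a classic injection risk that real scanners also flag:

```python
import ast

def find_eval_calls(source):
    """Statically scan Python source and return line numbers of eval() calls."""
    tree = ast.parse(source)  # no code is executed; we only inspect the syntax tree
    lines = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            lines.append(node.lineno)
    return lines

sample = "x = 1\ny = eval(input())\n"
print(find_eval_calls(sample))  # -> [2]
```

The key property, shared with real SAST tools, is that the code under test is never run: the finding comes purely from inspecting the source before compilation.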

    Deployment strategies

    How will you deploy your code with zero downtime so that the customer has the best experience? Try and implement one of the strategies below automatically via CI/CD. This will help in keeping the blast radius to the minimum in case something goes wrong. 

    • Ramped (also known as rolling update or incremental): The new version is slowly rolled out to replace the older version of the product.
    • Blue/Green: The new version is released alongside the older version, then the traffic is switched to the newer version.
    • Canary: The new version is released to a selected group of users before doing a full rollout. This can be achieved by feature flagging as well. For more information, read about tools like LaunchDarkly (https://launchdarkly.com/) and Unleash (https://github.com/Unleash/unleash).
    • A/B testing: The new version is released to a subset of users under specific conditions.
    • Shadow: The new version receives real-world traffic alongside the older version and doesn’t impact the response.
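The canary strategy above is commonly implemented by routing a fixed fraction of users to the new version. Here is a hedged sketch of one way to do that with deterministic hashing; the bucketing scheme is an illustrative choice, not a standard:

```python
import hashlib

def in_canary(user_id, percent):
    """Deterministically bucket a user into the canary group.

    Hashing the user id means the same user always gets the same answer,
    so their experience is stable across requests during the rollout.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0-99
    return bucket < percent

# The same user is routed consistently on every request.
print(in_canary("user-42", 10) == in_canary("user-42", 10))  # -> True
```

Raising `percent` gradually (10, 25, 50, 100) widens the canary without ever flipping a user back and forth between versions.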

    Config and Secret Management

    According to the 12-factor app, application configs should be exposed to the application through environment variables. However, it does not impose restrictions on where these configurations are stored and sourced from.

    A few things to keep in mind while storing configs:

    • Versioning of configs always helps, but storing secrets in VCS is strongly discouraged.
    • For an enterprise, it is beneficial to use a cloud-agnostic solution.

    Solution:

    • Store your configuration secrets outside of the version control system.
    • You can use AWS secret manager, Vault, and even S3 for storing your configs, e.g.: S3 with KMS, etc. There are other services available as well, so choose the one which suits your use case the best.
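Following the 12-factor guideline above, here is a minimal Python sketch of sourcing configuration from environment variables; the variable names are invented for the example, and in practice the secret value would be injected by the platform from a secret store rather than set in code:

```python
import os

def load_config():
    """Build app config from environment variables, per the 12-factor guideline."""
    return {
        # Non-secret setting with a sensible default.
        "db_host": os.environ.get("DB_HOST", "localhost"),
        # Secret: required, so a missing value fails fast at startup.
        "db_password": os.environ["DB_PASSWORD"],
    }

# For illustration only; a real platform injects this from a secret store.
os.environ["DB_PASSWORD"] = "example-only"
print(load_config()["db_host"])
```

Failing fast on a missing secret is deliberate: a clear crash at startup is easier to diagnose than a connection error deep inside the application.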

    Automate versioning and release notes generation

    All the releases should be tagged in the version control system. Versions can be automatically updated by looking at the git commit history and searching for keywords.

    There are many modules available for release notes generation. Try and automate these as well as a part of your CI/CD process. If this is done, you can successfully eliminate human intervention from the release process.
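Conceptually, the commit-keyword search works roughly like the sketch below; the "major:"/"feat:" keyword convention here is an illustrative assumption, not necessarily what any particular tool or action uses:

```python
def bump_version(version, commit_messages):
    """Bump a MAJOR.MINOR.PATCH version based on keywords in commit messages."""
    major, minor, patch = (int(part) for part in version.split("."))
    text = " ".join(commit_messages)
    if "major:" in text:          # breaking change -> bump major, reset the rest
        return f"{major + 1}.0.0"
    if "feat:" in text:           # new feature -> bump minor, reset patch
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"  # default: patch release

print(bump_version("1.4.2", ["fix: null check", "feat: add export"]))  # -> 1.5.0
```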

    Example from a GitHub Actions workflow:

    - name: Automated Version Bump
      id: version-bump
      uses: 'phips28/gh-action-bump-version@v9.0.16'
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      with:
        commit-message: 'CI: Bump version to v{{version}}'

    Have a rollback strategy

    If a regression, performance, or smoke test fails after deployment to an environment, feedback should be given and the version should be rolled back automatically as part of the CI/CD process. This ensures the environment stays up and also reduces the MTTR (mean time to recovery) and MTTD (mean time to detection) in case there is a production outage due to a code deployment.

    GitOps tools like Argo CD and Flux make this easy, but even if you are not using any GitOps tool, this can be managed using scripts or whatever tool you use for deployment.
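Stripped of any particular tool, the automated rollback loop described above looks roughly like this sketch; the deploy, smoke-test, and rollback callables are stand-ins for your real deployment and test steps:

```python
def deploy_with_rollback(deploy, smoke_test, rollback):
    """Deploy, run smoke tests, and roll back automatically on failure."""
    deploy()
    if smoke_test():
        return "released"
    rollback()  # automatic: no human in the loop, which keeps MTTR low
    return "rolled back"

# Stand-in steps for illustration only.
result = deploy_with_rollback(
    deploy=lambda: None,
    smoke_test=lambda: False,   # simulate a failing smoke test
    rollback=lambda: None,
)
print(result)  # -> rolled back
```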

    Include db changes as a part of your CI/CD

    Databases are often created manually and frequently evolve through manual changes, informal processes, and even testing in production. Manual changes often lack documentation and are harder to review, test, and coordinate with software releases. This makes the system more fragile with a higher risk of failure.

    The correct way to do this is to include the database in source control and CI/CD pipeline. This lets the team document each change, follow the code review process, test it thoroughly before release, make rollbacks easier, and coordinate with software releases. 

    For a more enterprise or structured solution, we could use a tool such as Liquibase, Alembic, or Flyway.

    How it should ideally be done:

    • We can have a migration-based strategy where, for each DB change, an additional migration script is added and executed as a part of CI/CD.
    • Things to keep in mind are that the CI/CD process should be the same across all the environments. Also, the amount of data on prod and other environments might vary drastically, so batching and limits should be used so that we don’t end up using all the memory of our database server.
    • As far as possible, DB migrations should be backward compatible. This makes it easier for rollbacks. This is the reason some companies only allow additive changes as a part of db migration scripts. 
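As a hedged sketch of the batching point above, the following backfills a column in small batches against an in-memory SQLite database, so no single statement touches the whole table at once; the table name and batch size are invented for the example:

```python
import sqlite3

def backfill_in_batches(conn, batch_size=2):
    """Backfill NULL city values in small batches to bound memory and lock time."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE persons SET city = 'unknown' "
            "WHERE rowid IN ("
            "  SELECT rowid FROM persons WHERE city IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # commit per batch so each chunk of work is durable
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT, city TEXT)")
conn.executemany("INSERT INTO persons VALUES (?, NULL)", [("a",), ("b",), ("c",)])
print(backfill_in_batches(conn))  # -> 3
```

On a production database the same shape applies: a LIMIT-ed subquery picks the next chunk, and the loop ends when an iteration updates zero rows.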

    Real-world scenarios

    • Gated approach 

    It is not always possible to have a fully automated CI/CD pipeline because the team may have just started the development of a product and might not have automated testing yet.

    So, in cases like these, we have manual gates that can be approved by the responsible teams. For example, we will deploy to the development environment and then wait for testers to test the code and approve the manual gate, then the pipeline can go forward.

    Most of the tools support these kinds of manual gates. Make sure that you are not holding any build resources during this step; otherwise, you will end up blocking resources for the other pipelines.

    Example:

    https://www.jenkins.io/doc/pipeline/steps/pipeline-input-step/#input-wait-for-interactive-input

    def LABEL_ID = "yourappname-${UUID.randomUUID().toString()}"
    def BRANCH_NAME = "<Your branch name>"
    def GIT_URL = "<Your git url>"
    // Start Agent
    node(LABEL_ID) {
        stage('Checkout') {
            doCheckout(BRANCH_NAME, GIT_URL)
        }
        stage('Build') {
            ...
        }
        stage('Tests') {
            ...
        }    
    }
    // Kill Agent
    // Input Step
    timeout(time: 15, unit: "MINUTES") {
        input message: 'Do you want to approve the deploy in production?', ok: 'Yes'
    }
    // Start Agent Again
    node(LABEL_ID) {
        doCheckout(BRANCH_NAME, GIT_URL) 
        stage('Deploy') {
            ...
        }
    }
    def doCheckout(branchName, gitUrl){
        checkout([$class: 'GitSCM',
            branches: [[name: branchName]],
            doGenerateSubmoduleConfigurations: false,
            extensions:[[$class: 'CloneOption', noTags: true, reference: '', shallow: true]],
            userRemoteConfigs: [[credentialsId: '<Your credentials id>', url: gitUrl]]])
    }

    Observability of releases 

    Whenever we are debugging the root cause of issues in production, we might need the information below. As the system gets more complex with multiple upstreams and downstream, it becomes imperative that we have this information, all in one place, for efficient debugging and support by the operations team.

    • When was the last deployment? What version was deployed?
    • The deployment history: which version was deployed when, along with the code changes that went in.

    Below are the two approaches organizations generally follow to achieve this:

    • Have a release workflow that is tracked using a Change request or Service request on Jira or any other tracking tool.
    • For GitOps applications using tools like Argo CD and Flux, all this information is available as part of the version control system and can be derived from there.

    DORA metrics 

    The DevOps maturity of a team is measured mainly based on the four metrics defined below, and CI/CD helps in improving all of them. So, teams and organizations should try to achieve Elite status on the DORA metrics.

    • Deployment Frequency: How often an organization successfully releases to production
    • Lead Time for Changes: The amount of time a commit takes to get into production
    • Change Failure Rate: The percentage of deployments causing a failure in production
    • Time to Restore Service: How long an organization takes to recover from a failure in production
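    The last two of these metrics are simple arithmetic over deployment records. The sketch below (in Go, with a hypothetical `Deployment` record type invented for illustration, not taken from any DORA tooling) shows one way to compute them:

    ```go
    package main

    import (
    	"fmt"
    	"time"
    )

    // Deployment is a hypothetical record of one production deployment.
    type Deployment struct {
    	CommitTime time.Time // when the change was committed
    	DeployTime time.Time // when the change reached production
    	Failed     bool      // did this deployment cause a production failure?
    }

    // leadTime returns the time a commit took to get into production.
    func leadTime(d Deployment) time.Duration {
    	return d.DeployTime.Sub(d.CommitTime)
    }

    // changeFailureRate returns the fraction of deployments that failed in production.
    func changeFailureRate(ds []Deployment) float64 {
    	if len(ds) == 0 {
    		return 0
    	}
    	failed := 0
    	for _, d := range ds {
    		if d.Failed {
    			failed++
    		}
    	}
    	return float64(failed) / float64(len(ds))
    }

    func main() {
    	t0 := time.Date(2023, 1, 1, 10, 0, 0, 0, time.UTC)
    	ds := []Deployment{
    		{CommitTime: t0, DeployTime: t0.Add(2 * time.Hour), Failed: false},
    		{CommitTime: t0, DeployTime: t0.Add(4 * time.Hour), Failed: true},
    	}
    	fmt.Println(leadTime(ds[0]))       // 2h0m0s
    	fmt.Println(changeFailureRate(ds)) // 0.5
    }
    ```

    Deployment Frequency and Time to Restore Service follow the same pattern: counting deploy events per period and measuring failure-to-recovery durations.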

    Conclusion 

    CI/CD forms an integral part of DevOps and SRE practices, and if done correctly, it can have a huge impact on a team’s and organization’s productivity.

    So, try and implement the above principles and get one step closer to having a highly productive team and a better product.

  • Getting Started With Kubernetes Operators (Golang Based) – Part 3

    Introduction

    In the first part, getting started with Kubernetes operators (Helm based), and the second part, getting started with Kubernetes operators (Ansible based), of this introduction to Kubernetes operators blog series, we learned various concepts related to Kubernetes operators and created a Helm based operator and an Ansible based operator, respectively. In this final part, we will build a Golang based operator. In the case of the Helm based operator, we executed a Helm chart whenever changes were made to the custom object type of our application; similarly, in the case of the Ansible based operator, we executed an Ansible role. In the case of a Golang based operator, we write the code for the action we need to perform (the reconcile logic) whenever the state of our custom object changes. This makes Golang based operators the most powerful and flexible of the three types, and at the same time the most complex to build.

    What Will We Build?

    The database server we deployed as part of our book store app in the previous blogs didn’t have a persistent volume attached to it, so we would lose data if the pod restarted. To avoid this, we will attach a persistent volume on the host (the K8s worker nodes) and run our database as a StatefulSet rather than a Deployment. We will also add a feature to expand the persistent volume associated with the MongoDB pod.

    Building the Operator

    1. Set up the project:  

    operator-sdk new bookstore-operator --dep-manager=dep

    INFO[0000] Generating api version blog.velotio.com/v1alpha1 for kind BookStore. 
    INFO[0000] Created pkg/apis/blog/group.go               
    INFO[0001] Created pkg/apis/blog/v1alpha1/bookstore_types.go 
    INFO[0001] Created pkg/apis/addtoscheme_blog_v1alpha1.go 
    INFO[0001] Created pkg/apis/blog/v1alpha1/register.go   
    INFO[0001] Created pkg/apis/blog/v1alpha1/doc.go        
    INFO[0001] Created deploy/crds/blog.velotio.com_v1alpha1_bookstore_cr.yaml 
    INFO[0009] Created deploy/crds/blog.velotio.com_bookstores_crd.yaml 
    INFO[0009] Running deepcopy code-generation for Custom Resource group versions: [blog:[v1alpha1], ] 
    INFO[0010] Code-generation complete.                    
    INFO[0010] Running OpenAPI code-generation for Custom Resource group versions: [blog:[v1alpha1], ] 
    INFO[0011] Created deploy/crds/blog.velotio.com_bookstores_crd.yaml 
    INFO[0011] Code-generation complete.                    
    INFO[0011] API generation complete.

    The above command creates the bookstore-operator folder in our $GOPATH/src. Here, we have set --dep-manager to dep, which signifies that we want to use dep for managing dependencies; by default, go modules are used. As we have seen earlier, the operator-sdk creates all the necessary folder structure for us inside the bookstore-operator folder.

    2. Add the custom resource definition

    operator-sdk add api --api-version=blog.velotio.com/v1alpha1 --kind=BookStore

    The above command creates the CRD and CR for the BookStore type. It also creates the Golang structs (pkg/apis/blog/v1alpha1/bookstore_types.go) for the BookStore type, registers the custom type (pkg/apis/blog/v1alpha1/register.go) with the scheme, and generates deep-copy methods. Here we can see that all the generic tasks are done by the operator framework itself, allowing us to focus on building the object and the controller. We will update the spec of the BookStore type to include two custom types, BookApp and BookDB.

    type BookStoreSpec struct {
    	BookApp BookApp `json:"bookApp,omitempty"`
    	BookDB  BookDB  `json:"bookDB,omitempty"`
    }

    type BookApp struct {
    	Repository      string             `json:"repository,omitempty"`
    	Tag             string             `json:"tag,omitempty"`
    	ImagePullPolicy corev1.PullPolicy  `json:"imagePullPolicy,omitempty"`
    	Replicas        int32              `json:"replicas,omitempty"`
    	Port            int32              `json:"port,omitempty"`
    	TargetPort      int                `json:"targetPort,omitempty"`
    	ServiceType     corev1.ServiceType `json:"serviceType,omitempty"`
    }

    type BookDB struct {
    	Repository      string            `json:"repository,omitempty"`
    	Tag             string            `json:"tag,omitempty"`
    	ImagePullPolicy corev1.PullPolicy `json:"imagePullPolicy,omitempty"`
    	Replicas        int32             `json:"replicas,omitempty"`
    	Port            int32             `json:"port,omitempty"`
    	DBSize          resource.Quantity `json:"dbSize,omitempty"`
    }

    Let’s also update the BookStore CR (blog.velotio.com_v1alpha1_bookstore_cr.yaml)

    apiVersion: blog.velotio.com/v1alpha1
    kind: BookStore
    metadata:
      name: example-bookstore
    spec:
      bookApp: 
        repository: "akash125/pyapp"
        tag: latest
        imagePullPolicy: "IfNotPresent"
        replicas: 1
        port: 80
        targetPort: 3000
        serviceType: "LoadBalancer"
      bookDB:
        repository: "mongo"
        tag: latest
        imagePullPolicy: "IfNotPresent"
        replicas: 1
        port: 27017
        dbSize: 2Gi

    3. Add the bookstore controller

    operator-sdk add controller --api-version=blog.velotio.com/v1alpha1 --kind=BookStore

    INFO[0000] Generating controller version blog.velotio.com/v1alpha1 for kind BookStore. 
    INFO[0000] Created pkg/controller/bookstore/bookstore_controller.go 
    INFO[0000] Created pkg/controller/add_bookstore.go      
    INFO[0000] Controller generation complete.

    The above command adds the bookstore controller (pkg/controller/bookstore/bookstore_controller.go) to the project and also adds it to the manager.

    If we take a look at the add function in the bookstore_controller.go file, we can see that a new controller is created here and added to the manager, so that the manager can start the controller when it (the manager) comes up. The add(mgr manager.Manager, r reconcile.Reconciler) function is called by the public function Add(mgr manager.Manager), which also creates a new reconciler object and passes it to add, where the controller is associated with the reconciler. In the add function, we also set the type of object (BookStore) which the controller will watch.

    // Watch for changes to primary resource BookStore
    	err = c.Watch(&source.Kind{Type: &blogv1alpha1.BookStore{}}, &handler.EnqueueRequestForObject{})
    	if err != nil {
    		return err
    	}

    This ensures that for any events related to any object of BookStore type, a reconcile request (a namespace/name key) is sent to the Reconcile method associated with the reconciler object (ReconcileBookStore) here.
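    To make the shape of that request concrete, here is a tiny stand-in for the namespace/name key (mirroring types.NamespacedName from apimachinery, stubbed here so it runs standalone):

    ```go
    package main

    import "fmt"

    // NamespacedName is a stand-in for apimachinery's types.NamespacedName:
    // the only information a reconcile request carries about the object.
    type NamespacedName struct {
    	Namespace string
    	Name      string
    }

    // String renders the key in the familiar "namespace/name" form.
    func (n NamespacedName) String() string {
    	return n.Namespace + "/" + n.Name
    }

    func main() {
    	// Any create/update/delete of the object enqueues the same key;
    	// the reconciler must fetch the object to learn its current state.
    	req := NamespacedName{Namespace: "default", Name: "example-bookstore"}
    	fmt.Println(req) // default/example-bookstore
    }
    ```

    Note that the request does not say *what* happened; the reconciler is expected to compare the actual state against the desired state on every call.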

    4. Build the reconcile logic

    The reconcile logic is implemented inside the Reconcile method of the reconciler object of the custom type which implements the reconcile loop.

    As a part of our reconcile logic, we will do the following:

    1. Create the bookstore app deployment if it doesn’t exist.
    2. Create the bookstore app service if it doesn’t exist.
    3. Create the Mongodb statefulset if it doesn’t exist.
    4. Create the Mongodb service if it doesn’t exist.
    5. Ensure deployments and services match their desired configurations like the replica count, image tag, service port, size of the PV associated with the Mongodb statefulset etc.

    There are three possible events that can happen to the BookStore object:

    1. The object was created: Whenever an object of kind BookStore is created, we create all the K8s resources mentioned above.
    2. The object was updated: When the object gets updated, we update all the K8s resources associated with it.
    3. The object was deleted: When the object gets deleted, we don’t need to do anything: while creating the K8s objects, we set the `BookStore` object as their owner, which ensures that all the K8s objects associated with it get deleted automatically when we delete the BookStore object.

    On receiving the reconcile request, the first step is to look up the object.

    func (r *ReconcileBookStore) Reconcile(request reconcile.Request) (reconcile.Result, error) {
    	reqLogger := log.WithValues("Request.Namespace", request.Namespace, "Request.Name", request.Name)
    	reqLogger.Info("Reconciling BookStore")
    
    	// Fetch the BookStore instance
    	bookstore := &blogv1alpha1.BookStore{}
    	err := r.client.Get(context.TODO(), request.NamespacedName, bookstore)

    If the object is not found, we assume that it was deleted; we don’t requeue the request and consider the reconcile successful.

    If any error occurs while reconciling, we return the error; whenever we return a non-nil error value, the controller requeues the request.
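    This not-found/requeue convention can be sketched with stand-in types (Result and errNotFound below are simplified stubs for illustration, not the controller-runtime API):

    ```go
    package main

    import "fmt"

    // Result is a stand-in for reconcile.Result; the controller requeues
    // whenever a non-nil error is returned.
    type Result struct{ Requeue bool }

    // errNotFound simulates the "object no longer exists" case that
    // errors.IsNotFound reports for the real client.
    var errNotFound = fmt.Errorf("not found")

    // reconcileOutcome mirrors the error-handling convention described above:
    // a missing object ends the reconcile successfully (it was deleted),
    // any other error is returned so the controller requeues the request.
    func reconcileOutcome(getErr error) (Result, error) {
    	if getErr != nil {
    		if getErr == errNotFound {
    			// Object deleted: owner references clean up children, nothing to do.
    			return Result{}, nil
    		}
    		// Transient failure: returning the error makes the controller retry.
    		return Result{}, getErr
    	}
    	return Result{}, nil
    }

    func main() {
    	_, err := reconcileOutcome(errNotFound)
    	fmt.Println(err == nil) // true: deletion is not requeued
    	_, err = reconcileOutcome(fmt.Errorf("apiserver timeout"))
    	fmt.Println(err != nil) // true: other errors are requeued
    }
    ```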

    In the reconcile logic, we call the BookStore method, which creates or updates all the K8s objects associated with the BookStore object, based on whether the object was created or updated.

    func (r *ReconcileBookStore) BookStore(bookstore *blogv1alpha1.BookStore) error {
    	reqLogger := log.WithValues("Namespace", bookstore.Namespace)

    	// MongoDB service: create if absent, update if the spec drifted.
    	mongoDBSvc := getmongoDBSvc(bookstore)
    	msvc := &corev1.Service{}
    	err := r.client.Get(context.TODO(), types.NamespacedName{Name: "mongodb-service", Namespace: bookstore.Namespace}, msvc)
    	if err != nil {
    		if errors.IsNotFound(err) {
    			controllerutil.SetControllerReference(bookstore, mongoDBSvc, r.scheme)
    			err = r.client.Create(context.TODO(), mongoDBSvc)
    			if err != nil { return err }
    		} else { return err }
    	} else if !reflect.DeepEqual(mongoDBSvc.Spec, msvc.Spec) {
    		mongoDBSvc.ObjectMeta = msvc.ObjectMeta
    		controllerutil.SetControllerReference(bookstore, mongoDBSvc, r.scheme)
    		err = r.client.Update(context.TODO(), mongoDBSvc)
    		if err != nil { return err }
    		reqLogger.Info("mongodb-service updated")
    	}

    	// MongoDB statefulset: create if absent, update if the spec drifted.
    	mongoDBSS := getMongoDBStatefulsets(bookstore)
    	mss := &appsv1.StatefulSet{}
    	err = r.client.Get(context.TODO(), types.NamespacedName{Name: "mongodb", Namespace: bookstore.Namespace}, mss)
    	if err != nil {
    		if errors.IsNotFound(err) {
    			reqLogger.Info("mongodb statefulset not found, will be created")
    			controllerutil.SetControllerReference(bookstore, mongoDBSS, r.scheme)
    			err = r.client.Create(context.TODO(), mongoDBSS)
    			if err != nil { return err }
    		} else {
    			reqLogger.Info("failed to get mongodb statefulset")
    			return err
    		}
    	} else if !reflect.DeepEqual(mongoDBSS.Spec, mss.Spec) {
    		r.UpdateVolume(bookstore)
    		mongoDBSS.ObjectMeta = mss.ObjectMeta
    		mongoDBSS.Spec.VolumeClaimTemplates = mss.Spec.VolumeClaimTemplates
    		controllerutil.SetControllerReference(bookstore, mongoDBSS, r.scheme)
    		err = r.client.Update(context.TODO(), mongoDBSS)
    		if err != nil { return err }
    		reqLogger.Info("mongodb statefulset updated")
    	}

    	// Bookstore app service.
    	bookStoreSvc := getBookStoreAppSvc(bookstore)
    	bsvc := &corev1.Service{}
    	err = r.client.Get(context.TODO(), types.NamespacedName{Name: "bookstore-svc", Namespace: bookstore.Namespace}, bsvc)
    	if err != nil {
    		if errors.IsNotFound(err) {
    			controllerutil.SetControllerReference(bookstore, bookStoreSvc, r.scheme)
    			err = r.client.Create(context.TODO(), bookStoreSvc)
    			if err != nil { return err }
    		} else {
    			reqLogger.Info("failed to get bookstore service")
    			return err
    		}
    	} else if !reflect.DeepEqual(bookStoreSvc.Spec, bsvc.Spec) {
    		bookStoreSvc.ObjectMeta = bsvc.ObjectMeta
    		bookStoreSvc.Spec.ClusterIP = bsvc.Spec.ClusterIP
    		controllerutil.SetControllerReference(bookstore, bookStoreSvc, r.scheme)
    		err = r.client.Update(context.TODO(), bookStoreSvc)
    		if err != nil { return err }
    		reqLogger.Info("bookstore service updated")
    	}

    	// Bookstore app deployment.
    	bookStoreDep := getBookStoreDeploy(bookstore)
    	bsdep := &appsv1.Deployment{}
    	err = r.client.Get(context.TODO(), types.NamespacedName{Name: "bookstore", Namespace: bookstore.Namespace}, bsdep)
    	if err != nil {
    		if errors.IsNotFound(err) {
    			controllerutil.SetControllerReference(bookstore, bookStoreDep, r.scheme)
    			err = r.client.Create(context.TODO(), bookStoreDep)
    			if err != nil { return err }
    		} else {
    			reqLogger.Info("failed to get bookstore deployment")
    			return err
    		}
    	} else if !reflect.DeepEqual(bookStoreDep.Spec, bsdep.Spec) {
    		bookStoreDep.ObjectMeta = bsdep.ObjectMeta
    		controllerutil.SetControllerReference(bookstore, bookStoreDep, r.scheme)
    		err = r.client.Update(context.TODO(), bookStoreDep)
    		if err != nil { return err }
    		reqLogger.Info("bookstore deployment updated")
    	}

    	r.client.Status().Update(context.TODO(), bookstore)
    	return nil
    }

    The implementation of the above method is a bit hacky but gives an idea of the flow. In the above function, we can see that we are setting the BookStore object as the owner of all the resources via controllerutil.SetControllerReference(bookstore, bookStoreDep, r.scheme), as discussed earlier. If we look at the owner references for these objects, we would see something like this:

    ownerReferences:
      - apiVersion: blog.velotio.com/v1alpha1
        blockOwnerDeletion: true
        controller: true
        kind: BookStore
        name: example-bookstore
        uid: 0ef42889-deb4-11e9-ba56-42010a800256
      resourceVersion: "20295281"

    5. Deploy the operator and verify that it works

    The approach to deploying and verifying the bookstore application is similar to what we did in the previous two blogs, the only difference being that MongoDB is now deployed as a StatefulSet; even if we restart the pod, we will see that the information we stored is still available.

    6. Verify volume expansion

    For updating the volume associated with the MongoDB instance, we first need to update the size of the volume we specified while creating the bookstore object. In the example above, I had set it to 2Gi; let’s update it to 3Gi and update the bookstore object.

    Once the bookstore object is updated, if we describe the MongoDB PVC, we will see that it still has a 2Gi PV, but under its conditions we will see something like this:

    Conditions:
      Type                      Status  LastProbeTime                     LastTransitionTime                Reason  Message
      ----                      ------  -----------------                 ------------------                ------  -------
      FileSystemResizePending   True    Mon, 01 Jan 0001 00:00:00 +0000   Mon, 30 Sep 2019 15:07:01 +0530           Waiting for user to (re-)start a pod to finish file system resize of volume on node.

    It is clear from the message that we need to restart the pod for the volume resize to take effect. Once we delete the pod, it will get restarted and the PVC will get updated to reflect the expanded volume size.

    The complete code is available here.

    Conclusion

    Golang based operators are built mostly for stateful applications like databases. Such an operator can automate complex operational tasks, allowing us to run applications with ease. At the same time, building and maintaining one can be quite complex, and we should build one only when we are fully convinced that our requirements can’t be met by any other type of operator. Operators are an interesting and emerging area in Kubernetes, and I hope this blog series on getting started with them helps readers learn the basics.

  • Setting Up A Robust Authentication Environment For OpenSSH Using QR Code PAM

    Do you like WhatsApp Web authentication? WhatsApp Web has always fascinated me with the simplicity of its QR-code based authentication. Though there are similar authentication UIs available, I always wondered whether a remote secure shell (SSH) could be authenticated with a QR code with this kind of simplicity while keeping the auth process secure. In this guide, we will see how to write and implement a bare-bones PAM module for OpenSSH on a Linux-based system.

    “OpenSSH is the premier connectivity tool for remote login with the SSH protocol. It encrypts all traffic to eliminate eavesdropping, connection hijacking, and other attacks. In addition, OpenSSH provides a large suite of secure tunneling capabilities, several authentication methods, and sophisticated configuration options.”

    openssh.com

    Meet PAM!

    PAM, short for “Pluggable Authentication Module,” is middleware that abstracts authentication features on Linux and UNIX-like operating systems. PAM has been around for more than two decades. The authentication process could be cumbersome, with each service authenticating users against a different set of hardware and software, such as username-password, a fingerprint module, face recognition, two-factor authentication, LDAP, etc. But the underlying process remains the same, i.e., users must be authenticated as who they say they are. This is where PAM comes into the picture: it provides an API to the application layer along with built-in functions to implement and extend PAM capabilities.

    Source: Redhat

    Understand how OpenSSH interacts with PAM

    The Linux host’s OpenSSH (the sshd daemon) begins by reading the configuration defined in /etc/pam.conf or, alternatively, in the /etc/pam.d configuration files. The config files are usually defined per service name, with various realms (auth, account, session, password). The “auth” realm is what takes care of authenticating users as who they say they are. A typical sshd PAM service file on Ubuntu can be seen below, and you can relate it to your own flavor of Linux:

    @include common-auth
    account    required     pam_nologin.so
    @include common-account
    session [success=ok ignore=ignore module_unknown=ignore default=bad]        pam_selinux.so close
    session    required     pam_loginuid.so
    session    optional     pam_keyinit.so force revoke
    @include common-session
    session    optional     pam_motd.so  motd=/run/motd.dynamic
    session    optional     pam_motd.so noupdate
    session    optional     pam_mail.so standard noenv # [1]
    session    required     pam_limits.so
    session    required     pam_env.so # [1]
    session    required     pam_env.so user_readenv=1 envfile=/etc/default/locale
    session [success=ok ignore=ignore module_unknown=ignore default=bad]        pam_selinux.so open
    @include common-password

    The common-auth file has an “auth” realm with the pam_unix.so PAM module, which is responsible for authenticating the user with a password. Our goal is to write a PAM module that replaces pam_unix.so with our own version.

    When OpenSSH makes calls to the PAM module, the very first function it looks for is “pam_sm_authenticate,” along with some other mandatory functions such as pam_sm_setcred. Thus, we will be implementing the pam_sm_authenticate function, which will be the entry point to our shared object library. The module should return PAM_SUCCESS (0) as the return code for successful authentication.

    Application Architecture

    The project architecture has four main applications. The backend is hosted on an AWS cloud with minimal and low-cost infrastructure resources.

    1. PAM Module: Provides QR-Code auth prompt to client SSH Login

    2. Android Mobile App: Authenticates SSH login by scanning a QR code

    3. QR Auth Server API: Backend application to which our Android App connects and communicates and shares authentication payload along with some other meta information

    4. WebSocket Server (API Gateway WebSocket, and NodeJS) App: The PAM module and the server-side app share the auth message payload in real time

    When a user connects to the remote server via SSH, the PAM module is triggered, offering a QR code for authentication. Information is exchanged over the API Gateway WebSocket, which in turn saves temporary auth data in DynamoDB. A user then uses an Android mobile app (written in react-native) to scan the QR code.

    Upon scanning, the app connects to the API Gateway. The API call is first authenticated by AWS Cognito to avoid any intrusion. The request is then proxied to the Lambda function, which authenticates the input payload by comparing it with the information available in DynamoDB. Upon successful authentication, the Lambda function makes a call to the API Gateway WebSocket to inform the PAM module to authenticate the user.

    Framework and Toolchains

    PAM modules are shared object libraries that must be written in C (although other languages can be used to compile and link, or to make cross-language calls, e.g., python-pam or pam_exec). Below are the framework and toolset I am using for this project:

    1. gcc, make, automake, autoreconf, libpam (GNU dev tools on Ubuntu OS)

    2. libqrencode, libwebsockets, libpam, libssl, libcrypto (C libraries)

    3. NodeJS, express (for server-side app)

    4. API gateway and API Gateway webSocket, AWS Lambda (AWS Cloud Services for hosting serverless server side app)

    5. Serverless framework (for easily deploying infrastructure)

    6. react-native, react-native-qrcode-scanner (for Android mobile app)

    7. AWS Cognito (for authentication)

    8. AWS Amplify Library

    This guide assumes you have a basic understanding of the Linux OS, C programming language, pointers, and gcc code compilation. For the backend APIs, I prefer to use NodeJS as a primary programming language, but you may opt for the language of your choice for designing HTTP APIs.

    Authentication with QR Code PAM Module

    When the module initializes, we first want to generate a random string with the help of the “/dev/urandom” character device. The byte string obtained from this device contains non-printable characters, so we encode it with Base64. Let’s call this string the auth verification string.

    void get_random_string(char *random_str, int length)
    {
       FILE *fp = fopen("/dev/urandom", "r");
       if (!fp) {
           perror("Unable to open urandom device");
           exit(EXIT_FAILURE);
       }
       fread(random_str, length, 1, fp);
       fclose(fp);
    }

    char random_string[11];

    // get a random string
    get_random_string(random_string, 10);
    // convert the random string to Base64 because the input comes from /dev/urandom and may contain binary chars
    const int encoded_length = Base64encode_len(10);
    base64_string = (char *)malloc(encoded_length + 1);
    Base64encode(base64_string, random_string, 10);
    base64_string[encoded_length] = '\0';
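    For comparison, the same idea expressed in Go: crypto/rand draws from the kernel CSPRNG (the same pool /dev/urandom exposes), and Base64 makes the bytes printable. This is an illustrative sketch, not part of the PAM module’s C code:

    ```go
    package main

    import (
    	"crypto/rand"
    	"encoding/base64"
    	"fmt"
    )

    // authVerificationString returns n random bytes encoded as Base64,
    // so the result contains only printable characters.
    func authVerificationString(n int) (string, error) {
    	buf := make([]byte, n)
    	if _, err := rand.Read(buf); err != nil { // crypto/rand uses the OS CSPRNG
    		return "", err
    	}
    	return base64.StdEncoding.EncodeToString(buf), nil
    }

    func main() {
    	s, err := authVerificationString(10)
    	if err != nil {
    		panic(err)
    	}
    	// 10 bytes encode to 16 Base64 characters (including padding).
    	fmt.Println(len(s)) // 16
    }
    ```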

    We then initiate a WebSocket connection with the help of the libwebsockets library and connect to our API Gateway WebSocket endpoint. Once the connection is established, we inform the server that a user may try to authenticate with the auth verification string. The API Gateway WebSocket returns a unique connection ID to our PAM module.

    static void connect_client(struct lws_sorted_usec_list *sul)
    {
       struct vhd_minimal_client_echo *vhd =
           lws_container_of(sul, struct vhd_minimal_client_echo, sul);
       struct lws_client_connect_info i;
       char host[128];
       lws_snprintf(host, sizeof(host), "%s:%u", *vhd->ads, *vhd->port);
       memset(&i, 0, sizeof(i));
       i.context = vhd->context;
      //i.port = *vhd->port;
       i.port = *vhd->port;
       i.address = *vhd->ads;
       i.path = *vhd->url;
       i.host = host;
       i.origin = host;
       i.ssl_connection = LCCSCF_USE_SSL | LCCSCF_ALLOW_SELFSIGNED | LCCSCF_SKIP_SERVER_CERT_HOSTNAME_CHECK | LCCSCF_PIPELINE;
      //i.ssl_connection = 0;
       if ((*vhd->options) & 2)
           i.ssl_connection |= LCCSCF_USE_SSL;
       i.vhost = vhd->vhost;
       i.iface = *vhd->iface;
      //i.protocol = ;
       i.pwsi = &vhd->client_wsi;
      //lwsl_user("connecting to %s:%d/%s\n", i.address, i.port, i.path);
       log_message(LOG_INFO,ws_applogic.pamh,"About to create connection %s",host);
      //return !lws_client_connect_via_info(&i);
       if (!lws_client_connect_via_info(&i))
           lws_sul_schedule(vhd->context, 0, &vhd->sul,
                    connect_client, 10 * LWS_US_PER_SEC);
    }

    Upon receiving the connection ID from the server, the PAM module converts the connection ID to a SHA1 hash string and finally composes a unique string for generating the QR code. This string consists of three parts separated by colons (:), i.e., “qrauth:BASE64(AUTH_VERIFY_STRING):SHA1(CONNECTION_ID)”.

    For example, let’s say the random Base64 encoded string is “UX6t4PcS5doEeA==” and the connection ID is “KZlfidYvBcwCFFw=”. Then the final encoded string is “qrauth:UX6t4PcS5doEeA==:2fc58b0cc3b13c3f2db49a5b4660ad47c873b81a”.
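    Composing this payload is mechanical; here is a standalone Go sketch of the same format (illustrative, not the module’s actual C implementation):

    ```go
    package main

    import (
    	"crypto/sha1"
    	"encoding/hex"
    	"fmt"
    )

    // qrPayload composes the string that gets encoded into the QR code:
    // "qrauth:<Base64 auth verification string>:<SHA1 hex of connection ID>".
    func qrPayload(authKey, connectionID string) string {
    	sum := sha1.Sum([]byte(connectionID))
    	return fmt.Sprintf("qrauth:%s:%s", authKey, hex.EncodeToString(sum[:]))
    }

    func main() {
    	p := qrPayload("UX6t4PcS5doEeA==", "KZlfidYvBcwCFFw=")
    	fmt.Println(p) // qrauth:UX6t4PcS5doEeA==:<40 hex chars>
    }
    ```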

    This string is then encoded into a UTF-8 QR code with the help of the libqrencode library, and the authentication screen is prompted by the PAM module.

    char *con_id=strstr(msg,ws_com_strings[READ_WS_CONNECTION_ID]);
               int length = strlen(ws_com_strings[READ_WS_CONNECTION_ID]);
              
               if(!con_id){
                   pam_login_status=PAM_AUTH_ERR;
                   interrupted=1;
                   return;
               }
               con_id+=length;
               log_message(LOG_DEBUG,ws_applogic.pamh,"strstr is %s",con_id);
               string_crypt(ws_applogic.sha_code_hex, con_id);
               sprintf(temp_text,"qrauth:%s:%s",ws_applogic.authkey,ws_applogic.sha_code_hex);
               char *qr_encoded_text=get_qrcode_string(temp_text);
               ws_applogic.qr_encoded_text=qr_encoded_text;
               conv_info(ws_applogic.pamh,"\nSSH Auth via QR Code\n\n");
               conv_info(ws_applogic.pamh, ws_applogic.qr_encoded_text);
               log_message(LOG_INFO,ws_applogic.pamh,"Use Mobile App to Scan \n %s",ws_applogic.qr_encoded_text);
               log_message(LOG_INFO,ws_applogic.pamh,"%s",temp_text);
               ws_applogic.current_action=READ_WS_AUTH_VERIFIED;
               sprintf(temp_text,ws_com_strings[SEND_WS_EXPECT_AUTH],ws_applogic.authkey,ws_applogic.username);
               websocket_write_back(wsi,temp_text,-1);
           conv_read(ws_applogic.pamh,"\n\nUse Mobile SSH QR Auth App to Authenticate SSH Login and Press Enter\n\n",PAM_PROMPT_ECHO_ON);

    API Gateway WebSocket App

    We used the Serverless Framework for easily creating and deploying our infrastructure resources. With the serverless CLI, we use the aws-nodejs template (serverless create --template aws-nodejs). You can find a detailed guide on Serverless, API Gateway WebSocket, and DynamoDB here. Below is the template YAML definition. Note that the DynamoDB resource has TTL set on the expires_at property. This field holds a UNIX epoch timestamp.

    What this means is that any record we store is automatically deleted at the epoch time set. We plan to keep each record for only 5 minutes, which also means the user must authenticate within 5 minutes of making the authentication request to the remote SSH server.
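    The TTL arithmetic itself is trivial: the record’s expires_at is the write-time epoch plus 300 seconds. A Go sketch of the calculation (the actual handler shown later is NodeJS):

    ```go
    package main

    import (
    	"fmt"
    	"time"
    )

    // expiresAt returns the UNIX epoch timestamp that DynamoDB's TTL acts on:
    // the write time plus a 5-minute authentication window.
    func expiresAt(now time.Time) int64 {
    	return now.Unix() + 300
    }

    func main() {
    	now := time.Unix(1600000000, 0)
    	fmt.Println(expiresAt(now)) // 1600000300
    }
    ```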

    service: ssh-qrapp-websocket
    frameworkVersion: '2'
    useDotenv: true
    provider:
     name: aws
     runtime: nodejs12.x
     lambdaHashingVersion: 20201221
     websocketsApiName: ssh-qrapp-websocket
     websocketsApiRouteSelectionExpression: $request.body.action
     region: ap-south-1
     iam:
       role:
         statements:
           - Effect: Allow
             Action:
               - "dynamodb:query"
               - "dynamodb:GetItem"
               - "dynamodb:PutItem"
             Resource:
               - Fn::GetAtt: [ SSHAuthDB, Arn ]
     environment:
      REGION: ${env:REGION}
      DYNAMODB_TABLE: SSHAuthDB
      WEBSOCKET_ENDPOINT: ${env:WEBSOCKET_ENDPOINT}
      NODE_ENV: ${env:NODE_ENV}
    package:
     patterns:
       - '!node_modules/**'
       - handler.js
       - '!package.json'
       - '!package-lock.json'
    plugins:
     - serverless-dotenv-plugin
    layers:
     sshQRAPPLibs:
       path: layer
       compatibleRuntimes:
         - nodejs12.x
    functions:
     connectionHandler:
       handler: handler.connectHandler
       timeout: 60
       memorySize: 256
       layers:
         - {Ref: SshQRAPPLibsLambdaLayer}
       events:
         - websocket:
            route: $connect
            routeResponseSelectionExpression: $default
     disconnectHandler:
       handler: handler.disconnectHandler
       memorySize: 256
       timeout: 60
       layers:
         - {Ref: SshQRAPPLibsLambdaLayer}
       events:
         - websocket: $disconnect
     defaultHandler:
       handler: handler.defaultHandler
       memorySize: 256
       timeout: 60
       layers:
         - {Ref: SshQRAPPLibsLambdaLayer}
       events:
         - websocket: $default
     customQueryHandler:
       handler: handler.queryHandler
       memorySize: 256
       timeout: 60
       layers:
         - {Ref: SshQRAPPLibsLambdaLayer}
       events:
         - websocket:
            route: expectauth
            routeResponseSelectionExpression: $default
         - websocket:
            route: getconid
            routeResponseSelectionExpression: $default
         - websocket:
            route: verifyauth
            routeResponseSelectionExpression: $default
    resources:
     Resources:
       SSHAuthDB:
         Type: AWS::DynamoDB::Table
         Properties:
           TableName: ${env:DYNAMODB_TABLE}
           AttributeDefinitions:
             - AttributeName: authkey
               AttributeType: S
           KeySchema:
             - AttributeName: authkey
               KeyType: HASH
           TimeToLiveSpecification:
             AttributeName: expires_at
             Enabled: true
           ProvisionedThroughput:
             ReadCapacityUnits: 2
             WriteCapacityUnits: 2

    The API Gateway WebSocket has three custom events. These events come as an argument to the Lambda function in “event.body.action”; API Gateway WebSocket refers to these as route selection expressions. The custom events are:

    • The “expectauth” event is sent by the PAM module to the WebSocket, informing it that a client has asked for authentication and the mobile application may try to authenticate by scanning the QR code. During this event, the WebSocket handler stores the connection ID along with the auth verification string, which acts as the primary key of our DynamoDB table.
    • The “getconid” event is sent to retrieve the current connection ID so that the PAM module can generate a SHA1 sum and provide the QR code prompt.
    • The “verifyauth” event is sent by the PAM module to confirm and verify authentication. During this event, even the WebSocket server expects random challenge response text. WebSocket server retrieves data payload from DynamoDB with auth verification string as primary key, and tries to find the key “authVerified” marked as “true” (more on this later).
    The Lambda handler below implements these three routes:

    // Assumes the AWS SDK v2 DynamoDB DocumentClient:
    // const { DynamoDB } = require("aws-sdk");
    queryHandler: async (event, context) => {
       const payload = JSON.parse(event.body);
       const documentClient = new DynamoDB.DocumentClient({
         region : process.env.REGION
       });
       try {
         switch(payload.action){
           case 'expectauth':
            
             const expires_at = parseInt(new Date().getTime() / 1000) + 300;
      
             await documentClient.put({
               TableName : process.env.DYNAMODB_TABLE,
               Item: {
                 authkey : payload.authkey,
                 connectionId : event.requestContext.connectionId,
                 username : payload.username,
                 expires_at : expires_at,
                 authVerified: false
               }
             }).promise();
             return {
               statusCode: 200,
               body : "OK"
             };
           case 'getconid':
             return {
               statusCode: 200,
               body: `connectionid:${event.requestContext.connectionId}`
             };
           case 'verifyauth':
             const data = await documentClient.get({
               TableName : process.env.DYNAMODB_TABLE,
               Key : {
                 authkey : payload.authkey
               }
             }).promise();
             if(!("Item" in data)){
               throw "Failed to query data";
             }
             if(data.Item.authVerified === true){
               return {
                 statusCode: 200,
                 body: `authverified:${payload.challengeText}`
               }
             }
             throw "auth verification failed";
         }
       } catch (error) {
         console.log(error);
       }
       return {
         statusCode: 200,
         body : "ok"
       };
     }

    Android App: SSH QR Code Auth


    The Android app consists of two parts: app login, and scanning the QR code for authentication. The AWS Cognito and Amplify libraries ease the process of building a secure login. By simply wrapping your react-native app with the “withAuthenticator” component, you get a ready-to-use login screen. We then use the react-native-qrcode-scanner component to scan the QR code.

    This component returns the decoded string on a successful scan. The application logic then splits the string and checks its validity. If the decoded string is a valid application string, an API call is made to the server with the appropriate payload.
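The decoded string is expected to take the colon-separated form “qrauth:&lt;authcode&gt;:&lt;shacode&gt;”. A minimal, self-contained sketch of that validation (the function name is mine, not from the app source):

```javascript
// Validate and split a decoded QR string of the form "qrauth:<authcode>:<shacode>".
function parseQrString(data) {
  const parts = data.split(":");
  if (parts.length < 3) throw new Error("invalid qr code");
  const [appstring, authcode, shacode] = parts;
  // Only strings stamped with our app identifier are accepted.
  if (appstring !== "qrauth") throw new Error("Not a valid app qr code");
  return { authcode, shacode };
}

console.log(parseQrString("qrauth:RANDOM_AUTH_STRING:0a1b2c"));
// { authcode: 'RANDOM_AUTH_STRING', shacode: '0a1b2c' }
```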

    render(){
       return (
         <View style={styles.container}>
           {this.state.authQRCode ?
           <AuthQRCode
            hideAuthQRCode = {this.hideAuthQRCode}
            qrScanData = {this.qrScanData}
           />
           :
           <View style={{marginVertical: 10}}>
           <Button title="Auth SSH Login" onPress={this.showAuthQRCode} />
           <View style={{margin:10}} />
           <Button title="Sign Out" onPress={this.signout} />
           </View>
          
           }
         </View>
       );
     }
         // Excerpt from the QR-scan success handler: e.data is the decoded QR string.
         const scanCode = e.data.split(':');
         if(scanCode.length < 3){
           throw "invalid qr code";
         }
         const [appstring,authcode,shacode] = scanCode;
         if(appstring !== "qrauth"){
           throw "Not a valid app qr code";
         }
         const authsession = await Auth.currentSession();
         const jwtToken = authsession.getIdToken().jwtToken;
         const response = await axios({
           url : "https://API_GATEWAY_URL/v1/app/sshqrauth/qrauth",
           method : "post",
           headers : {
             Authorization : jwtToken,
             'Content-Type' : 'application/json'
           },
           responseType: "json",
           data : {
             authcode,
             shacode
           }
         });
         if(response.data.status === 200){
           rescanQRCode=false;
           setTimeout(this.hideAuthQRCode, 1000);
         }

    This guide does not cover how to deploy react-native Android applications. You may refer to the official react-native guide to deploy your application to the Android mobile device.

    QR Auth API

    The QR Auth API is built using the Serverless Framework with the aws-nodejs template. It uses the API Gateway HTTP API, with AWS Cognito authorizing incoming requests. The serverless YAML definition is shown below.

    service: ssh-qrauth-server
    frameworkVersion: '2 || 3'
    useDotenv: true
    provider:
     name: aws
     runtime: nodejs12.x
     lambdaHashingVersion: 20201221
     deploymentBucket:
       name: ${env:DEPLOYMENT_BUCKET_NAME}
     httpApi:
       authorizers:
         cognitoJWTAuth:
           identitySource: $request.header.Authorization
           issuerUrl: ${env:COGNITO_ISSUER}
           audience:
             - ${env:COGNITO_AUDIENCE}
     region: ap-south-1
     iam:
       role:
         statements:
         - Effect: "Allow"
           Action:
             - "dynamodb:Query"
             - "dynamodb:PutItem"
             - "dynamodb:GetItem"
           Resource:
             - ${env:DYNAMO_DB_ARN}
         - Effect: "Allow"
           Action:
             - "execute-api:Invoke"
             - "execute-api:ManageConnections"
           Resource:
             - ${env:API_GATEWAY_WEBSOCKET_API_ARN}/*
     environment:
       REGION: ${env:REGION}
       COGNITO_ISSUER: ${env:COGNITO_ISSUER}
       DYNAMODB_TABLE: ${env:DYNAMODB_TABLE}
       COGNITO_AUDIENCE: ${env:COGNITO_AUDIENCE}
       POOLID: ${env:POOLID}
       COGNITOIDP: ${env:COGNITOIDP}
       WEBSOCKET_ENDPOINT: ${env:WEBSOCKET_ENDPOINT}
    package:
     patterns:
       - '!node_modules/**'
       - handler.js
       - '!package.json'
       - '!package-lock.json'
       - '!.env'
       - '!test.http'
    plugins:
     - serverless-deployment-bucket
     - serverless-dotenv-plugin
    layers:
     qrauthLibs:
       path: layer
       compatibleRuntimes:
         - nodejs12.x
    functions:
     sshauthqrcode:
       handler: handler.authqrcode
       memorySize: 256
       timeout: 30
       layers:
         - {Ref: QrauthLibsLambdaLayer}
       events:
         - httpApi:
             path: /v1/app/sshqrauth/qrauth
             method: post
             authorizer:
               name: cognitoJWTAuth

    Once API Gateway authenticates the incoming request, control is handed over to the serverless-express router. At this stage, we verify that the payload contains the auth verification string scanned by the Android mobile app. This string must be present in the DynamoDB table. Upon retrieving the record keyed by the auth verification string, we read its connection ID property and compute its SHA1 hash. If the hash matches the hash in the request payload, we set the record's “authVerified” attribute to “true” and inform the PAM module via the API Gateway WebSocket API. The PAM module then takes care of further validation via the challenge-response text.

    The entire authentication flow is depicted in a flow diagram, and the overall architecture is shown in the cover post of this blog.


    Compiling and Installing PAM module

    Unlike typical C programs, PAM modules are shared libraries. The compiled code may be loaded at an arbitrary address in memory, so the module must be compiled as position-independent code. With gcc, we pass the -fPIC option while compiling, and the -shared flag while linking to generate the shared object binary.

    gcc -I$PWD -fPIC -c $(ls *.c)
    gcc -shared -o pam_qrapp_auth.so $(ls *.o) -lpam -lqrencode -lssl -lcrypto -lpthread -lwebsockets

    To ease this process of compiling and validating libraries, I prefer to use the autoconf tool. The entire project, along with the autoconf scripts, is available in my GitHub repository.

    Once the shared object file (pam_qrapp_auth.so) is generated, copy it to the “/usr/lib64/security/” directory and run the ldconfig command to inform the OS that a new shared library is available. In /etc/pam.d/sshd, remove common-auth (if applicable) or any “auth” line that directly or indirectly uses the pam_unix.so module, since pam_unix.so enforces password or private-key authentication. Then add our module to the auth realm (“auth required pam_qrapp_auth.so”). Depending on your Linux flavor, your /etc/pam.d/sshd file may look similar to the below:

    auth       required     pam_qrapp_auth.so
    account    required     pam_nologin.so
    @include common-account
    session [success=ok ignore=ignore module_unknown=ignore default=bad]        pam_selinux.so close
    session    required     pam_loginuid.so
    session    optional     pam_keyinit.so force revoke
    @include common-session
    session    optional     pam_motd.so  motd=/run/motd.dynamic
    session    optional     pam_motd.so noupdate
    session    optional     pam_mail.so standard noenv # [1]
    session    required     pam_limits.so
    session    required     pam_env.so # [1]
    session    required     pam_env.so user_readenv=1 envfile=/etc/default/locale
    session [success=ok ignore=ignore module_unknown=ignore default=bad]        pam_selinux.so open
    @include common-password

    Finally, we need to configure the sshd daemon to allow challenge-response authentication. Open /etc/ssh/sshd_config and add “ChallengeResponseAuthentication yes” if the directive is missing, commented out, or set to “no.” Reload the sshd service with “systemctl reload sshd.” Voila, we are done here.
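Collected into a short script, the installation steps above look roughly like this (a sketch: the paths and the sed expression assume a typical Linux layout, and the /etc/pam.d/sshd edits described earlier still have to be made by hand):

```shell
set -e

# Install the PAM module and refresh the shared-library cache.
sudo cp pam_qrapp_auth.so /usr/lib64/security/
sudo ldconfig

# Enable challenge-response authentication for sshd: rewrite the directive if
# it exists (possibly commented out), append it otherwise, then reload sshd.
if grep -q '^#\?ChallengeResponseAuthentication' /etc/ssh/sshd_config; then
  sudo sed -i 's/^#\?ChallengeResponseAuthentication.*/ChallengeResponseAuthentication yes/' /etc/ssh/sshd_config
else
  echo 'ChallengeResponseAuthentication yes' | sudo tee -a /etc/ssh/sshd_config
fi
sudo systemctl reload sshd
```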

    Conclusion

    This guide was a barebones tutorial and is not meant for production use. The PAM module has certain gaps: for example, it should prompt for a password change when the password has expired, deny login when the account is locked, and handle similar security-related cases. Also, the Android mobile app should be bound to the SSH username, so that only the AWS Cognito user associated with that username can authenticate.

    One known limitation of this PAM module is that we always have to hit Enter after scanning the QR code with the Android mobile app. This limitation stems from how OpenSSH itself is implemented: the OpenSSH server withholds all informational text unless user input is required. In our case, the informational text is the UTF-8 QR code itself.

    However, no such input is actually needed from the interactive device, because the authentication event arrives at the PAM module via the WebSocket. But if we did not ask the user to explicitly press Enter after scanning, the QR code would never be displayed; the input here is a dummy. This is a known issue with OpenSSH's handling of PAM_TEXT_INFO. Find more about the issue here.

    References

    Pluggable authentication module

    An introduction to Pluggable Authentication Modules (PAM) in Linux

    Custom PAM for SSHD in C

    google-authenticator-libpam

    PAM_TEXT_INFO and PAM_ERROR_MSG conversation not honoured during PAM authentication