Author: admin

  • Micro Frontends: Reinventing UI In The Microservices World

    It is amazing how the software industry has evolved. Back in the day, software meant a simple program. Some of the earliest software, such as the Apollo mission guidance programs and the code run on the Manchester Baby, consisted of basic stored programs, used primarily for research and mathematical purposes.

    The invention of the personal computer and the rise of the Internet changed the software world. Desktop applications like word processors, spreadsheets, and games flourished, and websites gradually emerged. Back then, simple pages were delivered to the client as static documents for viewing. By the mid-1990s, with Netscape introducing the client-side scripting language JavaScript and Macromedia bringing in Flash, the browser became more powerful, allowing websites to become richer and more interactive. In the late 1990s, Java introduced Servlets, and thus the web application was born. Nevertheless, these applications were still relatively simple; engineers placed little emphasis on structuring them and mostly built unstructured monolithic applications.

    The advent of disruptive technologies like cloud computing and Big Data paved the way for more intricate web and native mobile applications. From e-commerce and video streaming to social media and photo editing, applications were doing some of the most complicated data processing and storage tasks. The traditional monolithic approach now posed several challenges in terms of scalability, team collaboration, and integration/deployment, and often led to huge, messy "Big Ball of Mud" codebases.

    Fig: Monolithic Application Problems

    To untangle this ball of software, a number of service-oriented architectures emerged. The most promising of them was Microservices: breaking an application into smaller chunks that can be developed, deployed, and tested independently while working together as a single cohesive unit. Its scalability and ease of deployment by multiple teams solved most of the architectural problems. A few front-end architectures also came up, such as MVC, MVVM, and Web Components, but none of them were fully able to reap the benefits of Microservices.

    Fig: Microservice Architecture

    Micro Frontends: The Concept

    Micro Frontends first came up on the ThoughtWorks Technology Radar, where the technique was assessed, trialed, and eventually adopted after showing significant benefits. It is a Microservices approach to front-end web development, in which independently deliverable front-end applications are composed into a whole.

    Together with Microservices, Micro Frontends break the last monolith to create a complete micro-architecture design pattern for web applications. The application is composed entirely of loosely coupled vertical slices of business functionality rather than horizontal layers. We can term these verticals "Microapps". The concept is not new; it appeared in Scaling with Microservices and Vertical Decomposition, which presented the idea of every vertical being responsible for a single business domain and having its own presentation layer, persistence layer, and separate database. From the development perspective, every vertical is implemented by exactly one team, and no code is shared among the different systems.

    Fig: Micro Frontends with Microservices (Micro-architecture)

    Why Micro Frontends?

    Micro Frontends bring a whole slew of advantages compared to a monolithic front-end architecture.

    Ease of Upgrades – Micro Frontends create strict bounded contexts in the application, so each part can be updated in an incremental, isolated fashion without the risk of breaking another part of the application.

    Scalability – Horizontal scaling is easy for Micro Frontends; each Micro Frontend just has to be kept stateless.

    Ease of Deployability – Each Micro Frontend has its own CI/CD pipeline, which builds, tests, and deploys it to production. It doesn't matter if another team is pushing a bug fix, a cutover, or a refactoring at the same time; as long as only one team works on a given Micro Frontend, pushing changes to it carries no risk for the others.

    Team Collaboration and Ownership – The Scrum Guide says that the "optimal Development Team size is small enough to remain nimble and large enough to complete significant work within a Sprint". Micro Frontends are perfect for multiple cross-functional teams that each completely own one stack (Micro Frontend) of an application, from UX to database design. On an e-commerce site, the Product team and the Payment team can work on the app concurrently without stepping on each other's toes.

    Micro Frontend Integration Approaches

    There is a multitude of ways to implement Micro Frontends. It is recommended to take a runtime-integration route instead of build-time integration, as the latter forces you to re-compile and release every single Micro Frontend in order to ship a change to any one of them.

    We shall learn some of the prominent approaches to Micro Frontends by building a simple pet store e-commerce site. The site has the following aspects (or Microapps, if you will): Home or Search, Cart, Checkout, Product, and Contact Us. We shall only be working on the front end of the site; you can assume that each Microapp has a dedicated microservice in the backend. You can view the project demo here and the code repository here. Each integration approach has a branch in the repo that you can check out.

    Single Page Frontends –

    The simplest way (but not the most elegant) to implement Micro Frontends is to treat each Micro Frontend as a single page.

    Fig: Single Page Micro Frontends: Each HTML file is a frontend.
    <!DOCTYPE html>
    <html lang="en">
    <head>
    	<title>The MicroFrontend - eCommerce Template</title>
    </head>
    <body>
      <header class="header-section header-normal">
        <!-- Header is repeated in each frontend, which is difficult to maintain -->
        ....
        ....
      </header>
      <main>
      </main>
      <footer>
        <!-- Footer is repeated in each frontend, so one change means edits across all frontends -->
      </footer>
      <script>
        // Cross-cutting features like notifications and authentication are replicated in all frontends
      </script>
    </body>
    </html>

    It is one of the purest ways of doing Micro Frontends because no container or stitching element binds the front ends together into an application. Each Micro Frontend is a standalone app with each dependency encapsulated in it and no coupling with the others. The flipside of this approach is that each frontend has a lot of duplication in terms of cross-cutting concerns like headers and footers, which adds redundancy and maintenance burden.

    JavaScript Rendering Components (or Web Components / Custom Elements) –

    As we saw above, single-page Micro Frontend architecture has its share of drawbacks. To overcome these, we should opt for an architecture that has a container element that builds the context of the app and the cross-cutting concerns like authentication, and stitches all the Micro Frontends together to create a cohesive application.

    // A base class from which all micro frontends extend
    class MicroFrontend {
      
      beforeMount() {
        // do things before the micro front-end mounts
      }
    
      onChange() {
        // do things when the attributes of a micro front-end changes
      }
    
      render() {
        // html of the micro frontend
        return '<div></div>';
      }
    
      onDismount() {
        // do things after the micro front-end dismounts 
      }
    }

    class Cart extends MicroFrontend {
      beforeMount() {
        // get previously saved cart from backend
      }
    
      render() {
        return `<!-- Page -->
        <div class="page-area cart-page spad">
          <div class="container">
            <div class="cart-table">
              <table>
                <thead>
                .....
                
         `
      }
    
      addItemToCart() {
        // ...
      }

      deleteItemFromCart() {
        // ...
      }

      applyCouponToCart() {
        // ...
      }
        
      onDismount() {
        // save Cart for the user to get back to afterwards
      }
    }

    class Product extends MicroFrontend {
      static get productDetails() {
        return {
          '1': {
            name: 'Cat Table',
            img: 'img/product/cat-table.jpg'
          },
          '2': {
            name: 'Dog House Sofa',
            img: 'img/product/doghousesofa.jpg'
          },
        }
      }
      getProductDetails() {
        var urlParams = new URLSearchParams(window.location.search);
        const productId = urlParams.get('productId');
        return this.constructor.productDetails[productId];
      }
      render() {
        const product = this.getProductDetails();
        return `	<!-- Page -->
        <div class="page-area product-page spad">
          <div class="container">
            <div class="row">
              <div class="col-lg-6">
                <figure>
                  <img class="product-big-img" src="${product.img}" alt="">`
      }
      selectProductColor(color) {}
    
      selectProductSize(size) {}
     
      addToCart() {
        // delegate call to MicroFrontend Cart.addToCart function
      }
      
    }

    <!DOCTYPE html>
    <html lang="en">
    <head>
    	<title>PetStore - because Pets love pampering</title>
    	<meta charset="UTF-8">
      <link rel="stylesheet" href="css/style.css"/>
    </head>
    <body>
    	<!-- Header section -->
    	<header class="header-section">
      ....
      </header>
    	<!-- Header section end -->
    	<main id='microfrontend'>
        <!-- This is where the Micro-frontend gets rendered by utility renderMicroFrontend.js-->
    	</main>
    	<!-- Footer section -->
    	<footer class="header-section">
      ....
      </footer>
    	<!-- Footer section end -->
      	<script src="frontends/MicroFrontend.js"></script>
    	<script src="frontends/Home.js"></script>
    	<script src="frontends/Cart.js"></script>
    	<script src="frontends/Checkout.js"></script>
    	<script src="frontends/Product.js"></script>
    	<script src="frontends/Contact.js"></script>
    	<script src="routes.js"></script>
    	<script src="renderMicroFrontend.js"></script>
    </body>
    </html>

    utility renderMicroFrontend.js:

    function renderMicroFrontend(pathname) {
      const microFrontend = routes[pathname || window.location.hash];
      const root = document.getElementById('microfrontend');
      root.innerHTML = microFrontend ? new microFrontend().render() : new Home().render();
      $(window).scrollTop(0);
    }

    $(window).bind('hashchange', function(e) { renderMicroFrontend(window.location.hash); });
    renderMicroFrontend(window.location.hash);
    
    utility routes.js (a map from the hash route to the Micro Frontend class):
    const routes = {
      '#': Home,
      '': Home,
      '#home': Home,
      '#cart': Cart,
      '#checkout': Checkout,
      '#product': Product,
      '#contact': Contact,
    };

    As you can see, this approach is pretty neat: a base MicroFrontend class encapsulates the shared lifecycle, and all other Micro Frontends extend from it. Notice how all the functionality related to a Microapp is encapsulated in its respective Micro Frontend. This ensures that concurrent work on one Micro Frontend doesn't break the others.

    Everything will work in a similar paradigm when it comes to Web Components and Custom Elements.

    React

    With client-side JavaScript frameworks being so popular, it is impossible to leave React out of any front-end discussion. React being a component-based JS library, much of what we discussed above also applies to it. Here, I am going to discuss some of the technicalities and challenges of Micro Frontends with React.

    Styling

    Since there should be minimal sharing of code between Micro Frontends, styling React components can be challenging, considering the global and cascading nature of CSS. We should make sure styles are scoped to a specific Micro Frontend without spilling over into others. Inline CSS, CSS-in-JS libraries like Radium, and CSS Modules can all be used with React.

    Redux

    Using React with Redux is kind of a norm in today’s front-end world. The convention is to use Redux as a single global store for the entire app for cross application communication. A Micro Frontend should be self-contained with no dependencies. Hence each Micro Frontend should have its own Redux store, moving towards a multiple Redux store architecture. 
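    As a minimal sketch of that multi-store direction, the snippet below uses a tiny hand-rolled store factory (an illustrative stand-in for the real Redux library, so the example stays dependency-free). Each Micro Frontend creates its own store, and dispatching into one never touches another:

```javascript
// createStore: a minimal Redux-like store (illustrative stand-in for Redux).
function createStore(reducer, initialState) {
  let state = initialState;
  const listeners = [];
  return {
    getState: () => state,
    dispatch: (action) => {
      state = reducer(state, action);
      listeners.forEach((listener) => listener(state));
    },
    subscribe: (listener) => listeners.push(listener),
  };
}

// Each Micro Frontend owns its own store instead of sharing one global store.
const cartStore = createStore(
  (state, action) =>
    action.type === 'ADD_ITEM' ? { items: [...state.items, action.item] } : state,
  { items: [] }
);

const productStore = createStore(
  (state, action) =>
    action.type === 'SELECT_COLOR' ? { ...state, color: action.color } : state,
  { color: null }
);

// Dispatching into the Cart store leaves the Product store untouched.
cartStore.dispatch({ type: 'ADD_ITEM', item: 'Cat Table' });
console.log(cartStore.getState().items.length); // 1
console.log(productStore.getState().color);     // null
```

    With real Redux, the same shape applies: each Micro Frontend calls its own createStore (or configureStore) rather than importing one shared global store.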

    Other Noteworthy Integration Approaches –

    Server-side Rendering – One can use a server to assemble Micro Frontend templates before dispatching the page to the browser. Server Side Includes (SSI) techniques can be used too.

    iframes – Each Micro Frontend can be an iframe. They also offer a good degree of isolation in terms of styling, and global variables don’t interfere with each other.
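    A container page for this approach might look like the sketch below; the iframe URL is hypothetical, and the container's routing logic would decide which Micro Frontend's page to load:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <title>PetStore Container</title>
</head>
<body>
  <header><!-- shared header owned by the container page --></header>
  <main>
    <!-- Each Micro Frontend is an independently deployed page; the container
         swaps the iframe's src to navigate between Microapps. -->
    <iframe id="microfrontend" src="https://cart.petstore.example/index.html"
            title="Cart Micro Frontend"></iframe>
  </main>
</body>
</html>
```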

    Summary

    Together with Microservices, Micro Frontends promise to bring in a lot of benefits when it comes to structuring a complex application and simplifying its development, deployment, and maintenance.

    But there is a wonderful saying that goes “there is no one-size-fits-all approach that anyone can offer you. The same hot water that softens a carrot hardens an egg”. Micro Frontend is no silver bullet for your architectural problems and comes with its own share of downsides. With more repositories, more tools, more build/deploy pipelines, more servers, more domains to maintain, Micro Frontends can increase the complexity of an app. It may render cross-application communication difficult to establish. It can also lead to duplication of dependencies and an increase in application size.

    Your decision to implement this architecture will depend on many factors, like the size of your organization and the complexity of your application, and whether it is a new or a legacy codebase. It is advisable to apply the technique gradually and review its efficacy over time.

  • Managing a TLS Certificate for Kubernetes Admission Webhook

    A Kubernetes admission controller is a great way to intercept an incoming request: adding or modifying fields, or denying the request, as per the rules/configuration defined. To extend the native functionality, admission webhook controllers call a custom-configured HTTP callback (a webhook server) for additional checks. However, the API server communicates with admission webhook servers only over HTTPS and needs the CA information for the server's TLS certificate. This raises the question of how to manage the webhook server certificate and pass the CA information to the API server automatically.

    One way to handle the TLS certificate and CA is Kubernetes cert-manager. However, cert-manager is itself a sizable application with many CRDs, and installing it just to handle an admission webhook's TLS certificate is not a good idea. The second, and arguably easier, way is to use a self-signed certificate and handle the CA ourselves using an init container. This eliminates the dependency on other applications like cert-manager and gives us the flexibility to control our application flow.

    How is a custom admission webhook written? We will not cover that in depth; only a basic overview of admission controllers and how they work is given. The main focus of this blog is to cover the second approach step by step: handling the admission webhook server's TLS certificate and CA on our own using an init container, so that the API server can communicate with our custom webhook.

    To understand the in-depth working of Admission Controllers, these articles are great: 

    Prerequisites:

    • Knowledge of Kubernetes admission controllers,  MutatingAdmissionWebhook, ValidatingAdmissionWebhook
    • Knowledge of Kubernetes resources like pods and volumes

    Basic Overview: 

    Admission controllers intercept requests to the Kubernetes API server before objects are persisted in etcd. These controllers are bundled and compiled into the kube-apiserver binary. Among them are two special controllers: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. MutatingAdmissionWebhook, as the name suggests, mutates/adds/modifies fields in the request object by creating a patch, while ValidatingAdmissionWebhook validates the request by checking, per custom logic, whether the request object's fields are valid or the operation is allowed.

    The main reason for these types of controllers is to dynamically add new checks along with the native existing checks in Kubernetes to allow a request, just like the plug-in model. To understand this more clearly, let’s say we want all the deployments in the cluster to have certain required labels. If the deployment does not have required labels, then the create deployment request should be denied. This functionality can be achieved in two ways: 

    1) Add these extra checks natively in Kubernetes API server codebase, compile a new binary, and run with the new binary. This is a tedious process, and every time new checks are needed, a new binary is required. 

    2) Create a custom admission webhook, a simple HTTP server, for these additional checks, and register it with the API server using the AdmissionRegistration API. For registration, the MutatingWebhookConfiguration and ValidatingWebhookConfiguration resources are used. The second approach is recommended, and it's quite easy as well; we will discuss it here in detail.
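    To make the required-labels example concrete, here is a minimal, self-contained sketch of the decision logic such a validating webhook could apply. The structs below hand-roll a small slice of the AdmissionReview payload (a real server would use the official admission/v1 types), and the function and label names are illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// admissionRequest models only the tiny slice of the AdmissionReview request
// we need here; the real server would use k8s.io/api/admission/v1 types.
type admissionRequest struct {
	Object struct {
		Metadata struct {
			Labels map[string]string `json:"labels"`
		} `json:"metadata"`
	} `json:"object"`
}

// validateLabels is the core decision: allow only if all required labels exist.
func validateLabels(raw []byte, required []string) (bool, string) {
	var req admissionRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return false, err.Error()
	}
	for _, key := range required {
		if _, ok := req.Object.Metadata.Labels[key]; !ok {
			return false, "missing required label: " + key
		}
	}
	return true, ""
}

func main() {
	deployment := []byte(`{"object":{"metadata":{"labels":{"team":"payments"}}}}`)
	allowed, reason := validateLabels(deployment, []string{"team", "env"})
	fmt.Println(allowed, reason) // false missing required label: env
}
```

    In the real webhook, this boolean and reason would be wrapped in an AdmissionReview response whose allowed field drives whether the API server accepts the request.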

    Custom Admission Webhook Server:

    As mentioned earlier, a custom admission webhook server is a simple HTTP server with TLS that exposes endpoints for mutation and validation; depending on the endpoint hit, the corresponding handler processes the mutation or validation. Once the custom webhook server is ready and deployed in a cluster as a Deployment, along with a webhook Service, the next part is to register it so the API server can communicate with it. For registration, MutatingWebhookConfiguration and ValidatingWebhookConfiguration are used; these configurations have a section for the custom webhook's information.

    apiVersion: admissionregistration.k8s.io/v1
    kind: MutatingWebhookConfiguration
    metadata:
      name: mutation-config
    webhooks:
      - admissionReviewVersions:
        - v1beta1
        name: mapplication.kb.io
        clientConfig:
          caBundle: ${CA_BUNDLE}
          service:
            name: webhook-service
            namespace: default
            path: /mutate
        rules:
          - apiGroups:
              - apps
            apiVersions:
              - v1
            operations:
              - CREATE
            resources:
              - deployments
        sideEffects: None

    Here, the service field gives the name, namespace, and endpoint path of the running webhook server. An important field to note is caBundle. A custom admission webhook must serve over TLS, because the API server communicates only over HTTPS. So the webhook server runs with a server certificate and key, and caBundle in the configuration carries the CA (Certificate Authority) information so that the API server can verify that server certificate.

    The problem is how to handle this server certificate and key, and how to obtain the CA bundle and pass it to the API server via the MutatingWebhookConfiguration or ValidatingWebhookConfiguration. This is the main focus of the following part.

    Here, we are going to use a self-signed certificate for the webhook server. This certificate can be made available to the webhook server in different ways; two possibilities are:

    • Create a Kubernetes Secret containing the certificate and key, and mount it as a volume onto the server pod
    • Generate the certificate and key in a shared volume, e.g., an emptyDir volume, from which the server consumes them

    However, with either of these two approaches, the remaining important part is to add the CA bundle to the mutation/validation configs.

    So, instead of doing all these steps manually, we will make use of a Kubernetes init container to perform them all for us.

    Custom Admission Webhook Server Init Container:

    The main function of this init container is to create a self-signed webhook server certificate and provide the CA bundle to the API server via the mutation/validation configs. How the webhook server consumes the certificate (via a Secret volume or an emptyDir volume) depends on the use case. The init container runs a simple Go binary to perform all these functions.

    package main
    
    import (
    	"bytes"
    	cryptorand "crypto/rand"
    	"crypto/rsa"
    	"crypto/x509"
    	"crypto/x509/pkix"
    	"encoding/pem"
    	"math/big"
    	"os"
    	"time"
    
    	log "github.com/sirupsen/logrus"
    )
    
    func main() {
    	var caPEM, serverCertPEM, serverPrivKeyPEM *bytes.Buffer
    	// CA config
    	ca := &x509.Certificate{
    		SerialNumber: big.NewInt(2020),
    		Subject: pkix.Name{
    			Organization: []string{"velotio.com"},
    		},
    		NotBefore:             time.Now(),
    		NotAfter:              time.Now().AddDate(1, 0, 0),
    		IsCA:                  true,
    		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth, x509.ExtKeyUsageServerAuth},
    		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
    		BasicConstraintsValid: true,
    	}
    
    	// CA private key
    	caPrivKey, err := rsa.GenerateKey(cryptorand.Reader, 4096)
    	if err != nil {
    		log.Panic(err)
    	}
    
    	// Self-signed CA certificate
    	caBytes, err := x509.CreateCertificate(cryptorand.Reader, ca, ca, &caPrivKey.PublicKey, caPrivKey)
    	if err != nil {
    		log.Panic(err)
    	}
    
    	// PEM encode CA cert
    	caPEM = new(bytes.Buffer)
    	_ = pem.Encode(caPEM, &pem.Block{
    		Type:  "CERTIFICATE",
    		Bytes: caBytes,
    	})
    
    	dnsNames := []string{"webhook-service",
    		"webhook-service.default", "webhook-service.default.svc"}
    	commonName := "webhook-service.default.svc"
    
    	// server cert config
    	cert := &x509.Certificate{
    		DNSNames:     dnsNames,
    		SerialNumber: big.NewInt(1658),
    		Subject: pkix.Name{
    			CommonName:   commonName,
    			Organization: []string{"velotio.com"},
    		},
    		NotBefore:    time.Now(),
    		NotAfter:     time.Now().AddDate(1, 0, 0),
    		SubjectKeyId: []byte{1, 2, 3, 4, 6},
    		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth, x509.ExtKeyUsageServerAuth},
    		KeyUsage:     x509.KeyUsageDigitalSignature,
    	}
    
    	// server private key
    	serverPrivKey, err := rsa.GenerateKey(cryptorand.Reader, 4096)
    	if err != nil {
    		log.Panic(err)
    	}
    
    	// sign the server cert with the CA private key
    	serverCertBytes, err := x509.CreateCertificate(cryptorand.Reader, cert, ca, &serverPrivKey.PublicKey, caPrivKey)
    	if err != nil {
    		log.Panic(err)
    	}
    
    	// PEM encode the server cert and key
    	serverCertPEM = new(bytes.Buffer)
    	_ = pem.Encode(serverCertPEM, &pem.Block{
    		Type:  "CERTIFICATE",
    		Bytes: serverCertBytes,
    	})
    
    	serverPrivKeyPEM = new(bytes.Buffer)
    	_ = pem.Encode(serverPrivKeyPEM, &pem.Block{
    		Type:  "RSA PRIVATE KEY",
    		Bytes: x509.MarshalPKCS1PrivateKey(serverPrivKey),
    	})
    
    	// 0755 so the directory is traversable by the server container
    	err = os.MkdirAll("/etc/webhook/certs/", 0755)
    	if err != nil {
    		log.Panic(err)
    	}
    	err = WriteFile("/etc/webhook/certs/tls.crt", serverCertPEM)
    	if err != nil {
    		log.Panic(err)
    	}
    
    	err = WriteFile("/etc/webhook/certs/tls.key", serverPrivKeyPEM)
    	if err != nil {
    		log.Panic(err)
    	}
    }
    
    // WriteFile writes data in the file at the given path
    func WriteFile(filepath string, sCert *bytes.Buffer) error {
    	f, err := os.Create(filepath)
    	if err != nil {
    		return err
    	}
    	defer f.Close()
    
    	_, err = f.Write(sCert.Bytes())
    	if err != nil {
    		return err
    	}
    	return nil
    }

    The steps to generate a self-signed CA and sign the webhook server certificate with it in Go:

    • Create a config for the CA, ca in the code above.
    • Create an RSA private key for this CA, caPrivKey in the code above.
    • Generate a self-signed CA, caBytes, and caPEM above. Here caPEM is the PEM encoded caBytes and will be the CA bundle given to the API server.
    • Create a config for the webhook server certificate, cert in the code above. The important fields here are DNSNames and CommonName: the name must be the full DNS name of the webhook service, so that the API server can reach the webhook pod.
    • Create an RSA private key for the webhook server, serverPrivKey in the code above.
    • Create the server certificate using ca and caPrivKey, serverCertBytes in the code above.
    • Finally, PEM encode serverPrivKey and serverCertBytes. These, serverCertPEM and serverPrivKeyPEM, are the TLS certificate and key that will be consumed by the webhook server.
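    As a quick sanity check on the steps above, the server certificate produced this way should verify against the generated CA. The following self-contained sketch mirrors the init container's certificate configs (trimmed, and with 2048-bit keys so it runs quickly; verifyChain is an illustrative helper, not part of the original code):

```go
package main

import (
	cryptorand "crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// verifyChain generates a CA and a CA-signed server certificate (mirroring the
// init-container config) and then verifies the server certificate against the
// CA, the way the API server effectively will via the caBundle.
func verifyChain() bool {
	ca := &x509.Certificate{
		SerialNumber:          big.NewInt(2020),
		Subject:               pkix.Name{Organization: []string{"velotio.com"}},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().AddDate(1, 0, 0),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	caKey, err := rsa.GenerateKey(cryptorand.Reader, 2048)
	if err != nil {
		return false
	}
	caDER, err := x509.CreateCertificate(cryptorand.Reader, ca, ca, &caKey.PublicKey, caKey)
	if err != nil {
		return false
	}
	caCert, err := x509.ParseCertificate(caDER)
	if err != nil {
		return false
	}

	server := &x509.Certificate{
		SerialNumber: big.NewInt(1658),
		Subject:      pkix.Name{CommonName: "webhook-service.default.svc"},
		DNSNames:     []string{"webhook-service.default.svc"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().AddDate(1, 0, 0),
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		KeyUsage:     x509.KeyUsageDigitalSignature,
	}
	serverKey, err := rsa.GenerateKey(cryptorand.Reader, 2048)
	if err != nil {
		return false
	}
	serverDER, err := x509.CreateCertificate(cryptorand.Reader, server, ca, &serverKey.PublicKey, caKey)
	if err != nil {
		return false
	}
	serverCert, err := x509.ParseCertificate(serverDER)
	if err != nil {
		return false
	}

	// A pool containing only our CA: this is what the caBundle amounts to.
	pool := x509.NewCertPool()
	pool.AddCert(caCert)
	_, err = serverCert.Verify(x509.VerifyOptions{
		Roots:   pool,
		DNSName: "webhook-service.default.svc",
	})
	return err == nil
}

func main() {
	fmt.Println("chain valid:", verifyChain())
}
```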

    At this point, we have generated the required certificate, key, and CA bundle using the init container. Now we will share the server certificate and key with the actual webhook server container in the same pod.

    • One approach is to create an empty Secret resource beforehand and create the webhook Deployment with the Secret name passed as an environment variable. The init container generates the server certificate and key and populates the empty Secret with them; this Secret is then mounted onto the webhook server container to start the HTTP server with TLS.
    • The second approach (used in the code above) is to use Kubernetes' native pod-scoped emptyDir volume, shared between both containers. In the code above, the init container writes the certificate and key to files on a particular path; that path is where the emptyDir volume is mounted, and the webhook server container reads the certificate and key for its TLS configuration from there and starts the HTTP webhook server. Refer to the below diagram:

    The pod spec will look something like this:

    spec:
      initContainers:
        - image: <webhook init-image name>
          imagePullPolicy: IfNotPresent
          name: webhook-init
          volumeMounts:
            - mountPath: /etc/webhook/certs
              name: webhook-certs
      containers:
        - image: <webhook server image name>
          imagePullPolicy: IfNotPresent
          name: webhook-server
          volumeMounts:
            - mountPath: /etc/webhook/certs
              name: webhook-certs
              readOnly: true
      volumes:
        - name: webhook-certs
          emptyDir: {}

    The only part remaining is to give this CA bundle information to the API server using mutation/validation configs. This can be done in two ways:

    • Patch the CA bundle in the existing MutatingWebhookConfiguration or ValidatingWebhookConfiguration using Kubernetes go-client in the init container.
    • Create MutatingWebhookConfiguration or ValidatingWebhookConfiguration in the init container itself with CA bundle information in configs.

    Here, we will create the configs through the init container. To get certain parameters dynamically, like the mutation config name, webhook service name, and webhook namespace, we can take these values from the init container's env:

    initContainers:
      - image: <webhook init-image name>
        imagePullPolicy: IfNotPresent
        name: webhook-init
        volumeMounts:
          - mountPath: /etc/webhook/certs
            name: webhook-certs
        env:
          - name: MUTATE_CONFIG
            value: mutating-webhook-configuration
          - name: VALIDATE_CONFIG
            value: validating-webhook-configuration
          - name: WEBHOOK_SERVICE
            value: webhook-service
          - name: WEBHOOK_NAMESPACE
            value: default

    To create the MutatingWebhookConfiguration, we add the following code to the init container, after the certificate-generation code.

    package main
    
    import (
    	"bytes"
    	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/client-go/kubernetes"
    	"os"
    	ctrl "sigs.k8s.io/controller-runtime"
    )
    
    func createMutationConfig(caCert *bytes.Buffer) {
    
    	var (
    		webhookNamespace, _ = os.LookupEnv("WEBHOOK_NAMESPACE")
    		mutationCfgName, _  = os.LookupEnv("MUTATE_CONFIG")
    		// validationCfgName, _ = os.LookupEnv("VALIDATE_CONFIG") Not used here in below code
    		webhookService, _ = os.LookupEnv("WEBHOOK_SERVICE")
    	)
    	config := ctrl.GetConfigOrDie()
    	kubeClient, err := kubernetes.NewForConfig(config)
    	if err != nil {
    		panic("failed to set up go-client")
    	}
    
    	path := "/mutate"
    	fail := admissionregistrationv1.Fail
    	sideEffects := admissionregistrationv1.SideEffectClassNone
    
    	mutateconfig := &admissionregistrationv1.MutatingWebhookConfiguration{
    		ObjectMeta: metav1.ObjectMeta{
    			Name: mutationCfgName,
    		},
    		Webhooks: []admissionregistrationv1.MutatingWebhook{{
    			Name:                    "mapplication.kb.io",
    			AdmissionReviewVersions: []string{"v1", "v1beta1"},
    			SideEffects:             &sideEffects,
    			ClientConfig: admissionregistrationv1.WebhookClientConfig{
    				CABundle: caCert.Bytes(), // CA bundle created earlier
    				Service: &admissionregistrationv1.ServiceReference{
    					Name:      webhookService,
    					Namespace: webhookNamespace,
    					Path:      &path,
    				},
    			},
    			Rules: []admissionregistrationv1.RuleWithOperations{{
    				Operations: []admissionregistrationv1.OperationType{admissionregistrationv1.Create},
    				Rule: admissionregistrationv1.Rule{
    					APIGroups:   []string{"apps"},
    					APIVersions: []string{"v1"},
    					Resources:   []string{"deployments"},
    				},
    			}},
    			FailurePolicy: &fail,
    		}},
    	}
    
    	if _, err := kubeClient.AdmissionregistrationV1().MutatingWebhookConfigurations().Create(mutateconfig); err != nil {
    		panic(err)
    	}
    }

    The code above is sample code for creating a MutatingWebhookConfiguration. First, we import the required packages; then we read the environment variables like webhookNamespace; next, we define the MutatingWebhookConfiguration struct with the CA bundle (created earlier) and the other required information; and finally, we create the configuration using the go-client. The same approach can be followed to create a ValidatingWebhookConfiguration. To handle pod restarts or deletion, we can add extra logic to the init container, such as deleting the existing configs before creating new ones, or updating only the CA bundle if the configs already exist.

    For certificate rotation, the procedure differs depending on how the certificate is served to the server container:

    • If we are using an emptyDir volume, the approach is simply to restart the webhook pod. Since an emptyDir volume is ephemeral and bound to the pod's lifecycle, a new certificate is generated on restart and served to the server container, and the new CA bundle is updated in the configs if they already exist.
    • If we are using a Secret volume, then when restarting the webhook pod, the expiration of the existing certificate in the Secret can be checked to decide whether to reuse it or create a new one.

    In both cases, a webhook pod restart is required to trigger the certificate rotation/renewal process. When and how the webhook pod is restarted depends on the use case; possible mechanisms include a CronJob, a controller, etc.
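The secret-volume decision above can be sketched in a few lines: on restart, reuse the stored certificate only if it is still comfortably within its validity window. This is an illustration, not part of any Kubernetes API; `not_after` would come from parsing the certificate stored in the Secret, and the 30-day threshold is an assumed policy.

```python
from datetime import datetime, timedelta, timezone

def should_rotate(not_after: datetime,
                  min_remaining: timedelta = timedelta(days=30)) -> bool:
    """Return True if a new certificate should be generated."""
    remaining = not_after - datetime.now(timezone.utc)
    return remaining < min_remaining

# A certificate expiring in 7 days should be rotated;
# one expiring in a year can be reused.
soon = datetime.now(timezone.utc) + timedelta(days=7)
later = datetime.now(timezone.utc) + timedelta(days=365)
print(should_rotate(soon), should_rotate(later))  # True False
```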

    Now our custom webhook is registered, the API server can read the CA bundle from the configs, and the webhook server is ready to serve mutation/validation requests according to the rules defined in the configs.

    Conclusion:

    We covered how to add additional mutation/validation checks by registering our own custom admission webhook server. We also covered how to automatically manage the webhook server's TLS certificate and key using init containers, and how to pass the CA bundle to the API server through the mutating/validating webhook configurations.

    Related Articles:

    1. OPA On Kubernetes: An Introduction For Beginners

    2. Prow + Kubernetes – A Perfect Combination To Execute CI/CD At Scale

  • Managing Secrets Using AWS Systems Manager Parameter Store and IAM Roles

    Amazon Web Services (AWS) offers an extremely wide variety of services covering almost all infrastructure requirements. Among them is AWS Systems Manager, a collection of services to manage AWS instances, hybrid environments, resources, and virtual machines through a common UI. Its services are divided into categories such as Resource Groups, Insights, Actions, and Shared Resources. One of the Shared Resources is Parameter Store, which is our topic of discussion today. Many Systems Manager features require the SSM Agent to be installed on the system, but Parameter Store can be used standalone as well.

    What is Parameter Store?

    Parameter Store is a service which helps you arrange your data in a systematic, hierarchical format for easy reference. The data can be of any type, like passwords, keys, URLs, or strings, and can be stored encrypted or as plain text, in key-value format. Parameter Store comes integrated with AWS KMS: it provides a default key and gives you the option to change it. In this blog, we will be using the default one.

    Why Parameter Store?

    Let's compare it with its competitors: HashiCorp Vault and AWS Secrets Manager.

    Vault stores secrets in a database or file system but requires you to manage the root token and unseal keys, and it is not easy to use.

    Next is the AWS-owned Secrets Manager. This service is not free and requires Lambda functions to be written for secret rotation, which can become an overhead. Also, the hierarchy is treated as a plain string, which can't be iterated.

    Some Key Features of Parameter Store include:

    • Because KMS is integrated, encryption takes place automatically without providing extra parameters.
    • It arranges your data hierarchically, and it is pretty simple: just use “/” to form the hierarchy, and with a recursive search we can fetch the required parameters.
    • It helps us remove those big config files that previously held our secrets and posed a severe security risk, helping us modularize our applications.
    • Simple Data like Name can be stored as String.
    • Secured Data as SecureString.
    • Even Array data can be stored using StringList.
    • Access configuration is manageable with IAM.
    • Linked with other services like AWS ECS, Lambda, and CloudFormation
    • AWS backed
    • Easy to use
    • Free of cost

    Note: Parameter Store is a region-specific service and thus might not be available in all regions.

    How to Use it?

    Initial Setup:

    Parameter Store can be used both via GUI and terminal.

    AWS console:

    1. Login into your account and select your preferred region.
    2. In Services select Systems Manager and after that select Parameter Store.
    3. If some credentials have already been created, their keys will be displayed.
    4. If not, you will be asked to “Create Parameter.”

    On CLI:

    1. Download the AWS CLI; it comes with inbuilt support for Systems Manager (SSM).
    2. Make sure your credentials file is configured.

    Use: Both on Console and CLI

    1. Create

    a. Enter the name of the key you wish to store. If it is hierarchical, use “/” (without quotes) to separate the levels, and enter the value in the Value field.

    Eg: This
      |- is
        | - Key : Value

    Then in Name enter “/This/is/Key” and in Value write “Value”.

    b. Select the type of storage: if the value can be stored as plain text, use String; if the value is in array format, choose StringList and mention the complete array in the value; and if you want to secure it, use SecureString.

    c. CLI:

    $ aws ssm put-parameter --name "/This/is/Key" --value "Value" --type String
    {
        "Version": 1
    }

    d. If you want to make it secure:

    $ aws ssm put-parameter --name "/This" --value "SecureValue" --type SecureString
    {
        "Version": 1
    }

    2. Read

    a. Once Stored, parameters get listed on the console.

    b. To check any of them, just click on the key. If not secured, the value will be directly visible and if it is secured, then the value would be hidden and you will have to explicitly press “Show”.

    AWS Parameter Overview

    c. CLI:

    $ aws ssm get-parameter --name /This/is/Key
    {
        "Parameter": {
            "Name": "/This/is/Key",
            "LastModifiedDate": 1535362148.994,
            "Value": "Value",
            "Version": 1,
            "Type": "String",
            "ARN": "arn:aws:ssm:us-east-1:275829625285:parameter/This/is/Key"
        }
    }

    d. For Secured String:  

    $ aws ssm get-parameter --name /This --with-decryption
    {
        "Parameter": {
            "Name": "/This",
            "LastModifiedDate": 1535362296.062,
            "Value": "SecureValue",
            "Version": 1,
            "Type": "SecureString",
            "ARN": "arn:aws:ssm:us-east-1:275829625285:parameter/This"
        }
    }

    e. If you observe the above command, you will realize that despite providing “/This” we did not receive the complete tree. To get it, modify the command as follows:

    $ aws ssm get-parameters-by-path --path /This --recursive
    {
        "Parameters": [
            {
                "Name": "/This/is/Key",
                "LastModifiedDate": 1535362148.994,
                "Value": "Value",
                "Version": 1,
                "Type": "String",
                "ARN": "arn:aws:ssm:us-east-1:275829625285:parameter/This/is/Key"
            }
        ]
    }
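The behaviour of this recursive lookup can be modeled with plain Python, treating the store as a flat dict of “/”-separated names. This is only an illustration of the path semantics (not the AWS API); the parameter names mirror the examples above, with one extra hypothetical entry to show filtering.

```python
# A flat store of "/"-separated parameter names, standing in for SSM.
store = {
    "/This/is/Key": "Value",
    "/This/other/Key": "Value2",
    "/That/Key": "Value3",
}

def get_parameters_by_path(path: str, recursive: bool = True) -> dict:
    """Return every parameter whose name falls under the given path."""
    prefix = path.rstrip("/") + "/"
    matches = {}
    for name, value in store.items():
        if not name.startswith(prefix):
            continue
        # A non-recursive lookup only returns direct children of the path
        if not recursive and "/" in name[len(prefix):]:
            continue
        matches[name] = value
    return matches

print(get_parameters_by_path("/This"))
# -> {'/This/is/Key': 'Value', '/This/other/Key': 'Value2'}
```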

    3. Rotate/Modify:

    a. Once a value is saved, it is automatically versioned as 1. If you click on the parameter and edit it, the version is incremented and the new value is stored as version 2. In this way, we achieve rotation of credentials as well.

    b. The type of a parameter cannot be changed; you will have to create a new one.

    c. CLI:
    The command itself is clear, just observe the version:

    $ aws ssm put-parameter --name "/This/is/Key" --value "NewValue" --type String --overwrite
    {
        "Version": 2
    }

    4. Delete:

    a. Select the parameter (or all the required parameters) and click Delete.

    b. CLI:

    $aws ssm delete-parameter --name "/This/is/Key"

    As you can see, the commands are pretty simple, and as you may have observed, the ARN information is also populated. Below, we will discuss the IAM roles we can configure to help us with access control.

    IAM (AWS Identity and Access Management)

    Remember that we are storing some very critical data in Parameter Store, so access to that data should be well managed. If a new developer joins the team and is mistakenly given full access to the parameters, chances are they might end up modifying or deleting production parameters. This is something we really don’t want.

    Generally, it is good practice to have roles and policies predefined so that only the people responsible have access to the required data. Control over the parameters can be managed at a granular level, but for this blog we will take a simple use case. That being said, we can take reference from the policies mentioned below.

    By using the Resource field, we can specify the parameter path that a particular policy can access. For example, if only a system admin should be able to fetch production credentials, we achieve this by placing “parameter/production” in the policy, where production represents the top-level hierarchy; anything stored under production then becomes accessible. If we want to fine-tune it further, we can add more levels after it: parameter/production/<till>/<the>/<last>/<level>

    Below are some of the policies that can be applied to a group or user on a server level. Depending on the requirement, explicit deny can also be applied to Developers for Production.
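To illustrate how path scoping behaves, here is a simplified Python sketch: an allow policy whose Resource pattern restricts actions to the production sub-tree. The region, account ID, action set, and the use of `fnmatch` for wildcard matching are all illustrative simplifications, not IAM's actual evaluation logic.

```python
from fnmatch import fnmatch

# A toy model of a read-only policy scoped to parameter/production
prod_read_only = {
    "actions": {"ssm:GetParameter", "ssm:GetParametersByPath"},
    "resource": "arn:aws:ssm:us-east-1:111122223333:parameter/production*",
}

def is_allowed(policy: dict, action: str, resource_arn: str) -> bool:
    """Allow only listed actions on ARNs matching the resource pattern."""
    return action in policy["actions"] and fnmatch(resource_arn, policy["resource"])

arn = "arn:aws:ssm:us-east-1:111122223333:parameter/production/db/password"
print(is_allowed(prod_read_only, "ssm:GetParameter", arn))  # True
print(is_allowed(prod_read_only, "ssm:PutParameter", arn))  # False
```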

    For Production Servers:

    SSMProdReadOnly:

    {
        "Effect": "Allow",
        "Action": [
            "ssm:GetParameterHistory",
            "ssm:ListTagsForResource",
            "ssm:GetParametersByPath",
            "ssm:GetParameters",
            "ssm:GetParameter"
        ],
        "Resource": "arn:aws:ssm:<Region>:<Account-ID>:parameter/production*"
    }

    SSMProdWriteOnly:

    {
        "Effect": "Allow",
        "Action": [
            "ssm:GetParameterHistory",
            "ssm:ListTagsForResource",
            "ssm:GetParametersByPath",
            "ssm:GetParameters",
            "ssm:GetParameter",
            "ssm:PutParameter",
            "ssm:DeleteParameter",
            "ssm:AddTagsToResource",
            "ssm:DeleteParameters"
        ],
        "Resource": "arn:aws:ssm:<Region>:<Account-ID>:parameter/production*"
    }

    For Dev Servers:

    SSMDevelopmentReadWrite:

    {
        "Effect": "Allow",
        "Action": [
            "ssm:PutParameter",
            "ssm:DeleteParameter",
            "ssm:RemoveTagsFromResource",
            "ssm:AddTagsToResource",
            "ssm:GetParameterHistory",
            "ssm:ListTagsForResource",
            "ssm:GetParametersByPath",
            "ssm:GetParameters",
            "ssm:GetParameter"
        ],
        "Resource": "arn:aws:ssm:<Region>:<Account-ID>:parameter/development*"
    }

    Conclusion

    This was all about the AWS Systems Manager Parameter Store and IAM roles. Now that you know what the Parameter Store is, why you should use it, and how to use it, I hope this helps you kick-start your credential management with AWS Parameter Store. Start using it and share your experiences or suggestions in the comments section below.

  • Chatbots With Google DialogFlow: Build a Fun Reddit Chatbot in 30 Minutes

    Google DialogFlow

    If you’ve been keeping up with the current advancements in the world of chat and voice bots, you’ve probably come across Google’s newest acquisition, DialogFlow (formerly api.ai): a platform for building use-case-specific, engaging voice- and text-based conversations, powered by AI. While understanding the intricacies of human conversation, where we say one thing but mean another, is still an art lost on machines, a domain-specific bot is the closest thing we can build.

    What is DialogFlow anyway?

    Natural language understanding (NLU) has always been the painful part while building a chatbot. How do you make sure your bot is actually understanding what the user says, and parsing their requests correctly? Well, here’s where DialogFlow comes in and fills the gap. It actually replaces the NLU parsing bit so that you can focus on other areas like your business logic!

    DialogFlow is simply a tool that allows you to make bots (or assistants or agents) that understand human conversation, string together a meaningful API call with appropriate parameters after parsing the conversation and respond with an adequate reply. You can then deploy this bot to any platform of your choosing – Facebook Messenger, Slack, Google Assistant, Twitter, Skype, etc. Or on your own app or website as well!

    The building blocks of DialogFlow

    Agent: DialogFlow allows you to make NLU modules, called agents (basically the face of your bot). The agent connects to your backend, which provides the business logic.

    Intent: An agent is made up of intents. Intents are simply actions that a user can perform on your agent. It maps what a user says to what action should be taken. They’re entry points into a conversation.

    In short, a user may request the same thing in many ways, re-structuring their sentences. But in the end, they should all resolve to a single intent.

    Examples of intents can be:
    “What’s the weather like in Mumbai today?” or “What is the recipe for an omelet?”

    You can create as many intents as your business logic desires, and even co-relate them, using contexts. An intent decides what API to call, with what parameters, and how to respond back, to a user’s request.

    Entity: An agent wouldn’t know what values to extract from a given user’s input. This is where entities come into play. Any information in a sentence, critical to your business logic, will be an entity. This includes stuff like dates, distance, currency, etc. There are system entities, provided by DialogFlow for simple things like numbers and dates. And then there are developer defined entities. For example, “category”, for a bot about Pokemon! We’ll dive into how to make a custom developer entity further in the post.

    Context: Final concept before we can get started with coding is “Context”. This is what makes the bot truly conversational. A context-aware bot can remember things, and hold a conversation like humans do. Consider the following conversation:

    “Hey, are you coming for piano practice tonight?”
    “Sorry, I’ve got dinner plans.”
    “Okay, what about tomorrow night then?”
    “That works!”

    Did you notice what just happened? The first question is straightforward to parse: The time is “tonight”, and the event, “piano practice”.

    However, the second question,  “Okay, what about tomorrow night then?” doesn’t specify anything about the actual event. It’s implied that we’re talking about “piano practice”. This sort of understanding comes naturally to us humans, but bots have to be explicitly programmed so that they understand the context across these sentences.

    Making a Reddit Chatbot using DialogFlow

    Now that we’re well equipped with the basics, let’s get started! We’re going to make a Reddit bot that tells a joke or an interesting fact from the day’s top threads on specific subreddits. We’ll also sprinkle in some context awareness so that the bot doesn’t feel “rigid”.

    NOTE: You would need a billing-enabled account on Google Cloud Platform(GCP) if you want to follow along with this tutorial. It’s free and just needs your credit card details to set up. 

    Creating an Agent 

    1. Log in to the DialogFlow dashboard using your Google account. Here’s the link for the lazy.
    2. Click on “Create Agent”
    3. Enter the details as below, and hit “Create”. You can select any other Google project if it has billing enabled on it as well.

    Setting up a “Welcome” Intent

    As soon as you create the agent, you see this intents page:

    The “Default Fallback” Intent exists in case the user says something unexpected and is outside the scope of your intents. We won’t worry too much about that right now. Go ahead and click on the “Default Welcome Intent”. We can notice a lot of options that we can tweak.
    Let’s start with a triggering phrase. Notice the “User Says” section? We want our bot to activate as soon as we say something along the lines of:

    Let’s fill that in. After that, scroll down to the “Responses” tab. You can see some generic welcome messages are provided. Get rid of them, and put in something more personalized to our bot, as follows:

    Now, this does a couple of things. Firstly, it lets the user know that they’re using our bot. It also guides the user to the next point in the conversation. Here, it is an “or” question.

    Hit “Save” and let’s move on.

    Creating a Custom Entity

    Before we start playing around with Intents, I want to set up a Custom Entity real quick. If you remember, Entities are what we extract from the user’s input for further processing. I’m going to call our Entity “content”, as the user will request a piece of content: either a joke or a fact. Let’s go ahead and create that. Click on the “Entities” tab on the left sidebar and click “Create Entity”.

    Fill in the following details:

    As you can see, we have 2 values possible for our content: “joke” and “fact”. We also have entered synonyms for each of them, so that if the user says something like “I want to hear something funny”, we know he wants a “joke” content. Click “Save” and let’s proceed to the next section!

    Attaching our new Entity to the Intent

    Create a new Intent called “say-content”. Add a phrase “Let’s hear a joke” in the “User Says” section, like so:

    Right off the bat, we notice a couple of interesting things. Dialogflow parsed this input and associated the entity content to it, with the correct value (here, “joke”). Let’s add a few more inputs:

    PS: Make sure all the highlighted words are in the same color and have associated the same entity. Dialogflow’s NLU isn’t perfect and sometimes assigns different Entities. If that’s the case, just remove it, double-click the word and assign the correct Entity yourself!

    Let’s add a placeholder text response to see it work. To do that, scroll to the bottom section “Response”, and fill it like so:

    The “$content” is a variable having a value extracted from user’s response that we saw above.

    Let’s see this in action. On the right side of every page on Dialogflow’s platform, you see a “Try It Now” box. Use that to test your work at any point in time. I’m going to go ahead and type “Tell a fact” in the box. Notice that the “Tell a fact” phrase wasn’t present in the samples we gave earlier. Dialogflow keeps training using its NLU modules and can extract data from similarly structured sentences:

    A Webhook to process requests

    To keep things simple, I’m gonna write a JS app that fulfills the request by querying Reddit’s website and returning the appropriate content. Luckily for us, Reddit doesn’t need authentication for reading in JSON format. Here’s the code:

    'use strict';
    const https = require('https');

    exports.appWebhook = (req, res) => {
      let content = req.body.result.parameters['content'];
      getContent(content).then((output) => {
        res.setHeader('Content-Type', 'application/json');
        res.send(JSON.stringify({ 'speech': output, 'displayText': output }));
      }).catch((error) => {
        // If there is an error, let the user know
        res.setHeader('Content-Type', 'application/json');
        res.send(JSON.stringify({ 'speech': error, 'displayText': error }));
      });
    };

    function getSubreddit (content) {
      if (content == "funny" || content == "joke" || content == "laugh") {
        return {sub: "jokes", displayText: "joke"};
      } else {
        return {sub: "todayILearned", displayText: "fact"};
      }
    }

    function getContent (content) {
      let subReddit = getSubreddit(content);
      return new Promise((resolve, reject) => {
        console.log('API Request: to Reddit');
        https.get(`https://www.reddit.com/r/${subReddit["sub"]}/top.json?sort=top&t=day`, (resp) => {
          let data = '';
          resp.on('data', (chunk) => {
            data += chunk;
          });
          resp.on('end', () => {
            let response = JSON.parse(data);
            let thread = response["data"]["children"][Math.floor((Math.random() * 24) + 1)]["data"];
            let output = `Here's a ${subReddit["displayText"]}: ${thread["title"]}`;
            if (subReddit['sub'] == "jokes") {
              output += " " + thread["selftext"];
            }
            output += "\nWhat do you want to hear next, a joke or a fact?";
            console.log(output);
            resolve(output);
          });
        }).on("error", (err) => {
          console.log("Error: " + err.message);
          reject(err);
        });
      });
    }
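The routing and response-shaping logic above can be isolated as pure functions and unit-tested without hitting Reddit or Dialogflow. A Python sketch mirroring `getSubreddit` and the (v1-style) webhook response body; this is an illustration, not an official Dialogflow SDK:

```python
import json

def get_subreddit(content: str) -> dict:
    """Map the extracted 'content' entity value to a subreddit."""
    if content in ("funny", "joke", "laugh"):
        return {"sub": "jokes", "displayText": "joke"}
    return {"sub": "todayILearned", "displayText": "fact"}

def make_response(text: str) -> str:
    # Dialogflow v1 webhooks reply with both `speech` and `displayText`
    return json.dumps({"speech": text, "displayText": text})

print(get_subreddit("laugh")["sub"])  # jokes
print(make_response("Here's a joke:"))
```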

    Now, before going ahead, follow the steps 1-5 mentioned here religiously.

    NOTE: For step 1, select the same Google Project that you created/used, when creating the agent.

    Now, to deploy our function using gcloud:

    $ gcloud beta functions deploy appWebhook --stage-bucket BUCKET_NAME --trigger-http

    To find the BUCKET_NAME, go to your Google project’s console and click on Cloud Storage under the Resources section.

    After you run the command, make note of the httpsTrigger URL mentioned. On the Dialogflow platform, find the “Fulfilment” tab on the sidebar. We need to enable webhooks and paste in the URL, like this:

    Hit “Done” on the bottom of the page, and now the final step. Visit the “say-content” Intent page and perform a couple of steps.

    1. Make the “content” parameter mandatory. This will make the bot ask explicitly for the parameter to the user if it’s not clear:

    2. Notice a new section has been added to the bottom of the screen called “Fulfilment”. Enable the “Use webhook” checkbox:

    Click “Save” and that’s it! Time to test this Intent out!

    Reddit’s crappy humor aside, this looks neat. Our replies always drive the conversation to places (Intents) that we want it to.

    Adding Context to our Bot

    Even though this works perfectly fine, there’s one more thing I’d like to add quickly. We want the user to be able to say, “More” or “Give me another one” and the bot to be able to understand what this means. This is done by emitting and absorbing contexts between intents.

    First, to emit the context, scroll up on the “say-content” Intent’s page and find the “Contexts” section. We want to output a context, with a count of 5. The count makes sure the bot remembers what the “content” is in the current conversation for up to 5 back-and-forths.

    Now, we want to create a new Intent that can absorb this context and make sense of phrases like “More please”:

    Finally, since we want it to work the same way, we’ll make the Action and Fulfilment sections look the same way as the “say-content” Intent does:

    And that’s it! Your bot is ready.

    Integrations

    Dialogflow provides integrations with probably every messaging service in the Silicon Valley, and more. But we’ll use the Web Demo. Go to “Integrations” tab from the sidebar and enable “Web Demo” settings. Your bot should work like this:

    And that’s it! Your bot is ready to face a real person! Now, you can easily keep adding more subreddits, like news, sports, bodypainting, dankmemes or whatever your hobbies in life are! Or make it understand a few more parameters. For example, “A joke about Donald Trump”.

    Consider that your homework. You can also add a “Bye” intent, and make the bot stop. Our bot currently isn’t so great with goodbyes, sort of like real people.

    Debugging and Tips

    If you’re facing issues with no replies from the Reddit script, go to your Google project and check the Error Reporting tab to make sure everything’s fine under the hood. If outbound requests are throwing an error, you probably don’t have billing enabled.

    Also, one caveat I found is that the entities can take up any value from the synonyms that you’ve provided. This means you HAVE to hardcode them in your business app as well. Which sucks right now, but maybe DialogFlow will provide a cleaner solution in the near future!

  • A Step Towards Machine Learning Algorithms: Univariate Linear Regression

    These days the concept of Machine Learning is evolving rapidly. The field is so vast and open that everyone has their own independent thoughts about it; here I am putting down mine. This blog is about my experience with learning algorithms. In it, we will get to know the basic difference between Artificial Intelligence, Machine Learning, and Deep Learning. We will also get to know the foundational Machine Learning algorithm, i.e., Univariate Linear Regression.

    Intermediate knowledge of Python and its libraries (NumPy, Pandas, Matplotlib) is a good start. For mathematics, a little knowledge of algebra, calculus, and graph theory will help in understanding the trick behind the algorithm.

    A way to Artificial intelligence, Machine Learning, and Deep Learning

    These are the three buzzwords of today’s Internet world, the place where the science domain meets programming: we use scientific concepts and mathematics with a programming language to simulate the decision-making process. Artificial Intelligence is the ability of a machine to make decisions more like humans do. Machine Learning supports Artificial Intelligence: it helps the machine observe patterns and learn from them to make decisions; here, programming helps in observing the patterns, not in making the decisions. Machine Learning requires more and more information from various sources to observe all of the variables of any given pattern and make more accurate decisions. Deep Learning, in turn, supports Machine Learning by creating a network (a neural network) to fetch all the required information and provide it to the Machine Learning algorithms.

    What is Machine Learning

    Definition: Machine Learning provides machines with the ability to learn autonomously based on experiences, observations and analyzing patterns within a given data set without explicitly programming.

    This is a two-part process. In the first part, we observe and analyze the patterns in the given data and make a shrewd guess at a mathematical function that will be very close to the pattern. There are various methods for this; a few of them are linear, non-linear, logistic, etc. Using the guessed mathematical function and the given data, we calculate an error function. In the second part, we minimize the error function; the minimized function is then used for predicting the pattern.

    Here are the general steps to understand the process of Machine Learning:

    1. Plot the given dataset on the x-y axes.
    2. By looking at the graph, guess a mathematical function that closely fits the pattern.
    3. Derive the error function from the given dataset and the guessed mathematical function.
    4. Minimize the error function using some algorithm.
    5. The minimized error function gives us a more accurate mathematical function for the given pattern.

    Getting Started with the First Algorithm: Univariate Linear Regression

    Linear Regression is a very basic algorithm; we can say it is the first, foundational algorithm for understanding the concepts of ML. We will try to understand it with an example dataset of plot prices for given areas.

    [Table: sample dataset of plot areas (10 sq mtr) and corresponding prices (Lakhs INR)]

    With this data, we can easily determine the price of a plot of a given area. But what if we want the price of a plot whose area is 5.0 (× 10 sq mtr)? There is no direct price for this in our dataset. So how can we get the price of a plot whose area is not in the dataset? This we can do using Linear Regression.

    So at first, we will plot this data into a graph.

    The below graphs describe the area of plots (10 sq mtr) in x-axis and its prices in y-axis (Lakhs INR).

    Definition of Linear Regression

    The objective of a linear regression model is to find a relationship between one or more features (independent variables) and a continuous target variable (dependent variable). When there is only one feature, it is called Univariate Linear Regression, and if there are multiple features, it is called Multiple Linear Regression.

    Hypothesis function:

    Here we will try to find the relation between the price and the area of a plot. As this is a univariate example, the price depends only on the area of the plot.

    By observing this pattern, we can write our hypothesis function as:

    f(x) = w * x + b

    where w is the weight and b is the bias.

    For different values of (w, b), many lines are possible, but one particular pair of values will fit the pattern closely.

    When we generalize this function to multiple variables, there will be a vector of weights w; these constants are also termed model parameters.

    Note: There is a range of mathematical functions that could fit this pattern, and the selection of the function is up to us. But take care that it neither underfits nor overfits, and that the function is continuous so that we can differentiate it easily and it has a global minimum or maximum.

    Error for a point

    As our hypothesis function is continuous, for every xi (area) there is one predicted price, f(xi), while yi is the actual price.

    So the error at any point is

    Ei = f(xi) - yi

    These errors are also called residuals. A residual can be positive (if the actual point lies below the predicted line) or negative (if the actual point lies above the predicted line). Our goal is to minimize this residual for each point.

    Note: While observing the patterns, it is possible that a few points lie very far from the pattern. For these far points, the residuals will be much larger, so if such points are few in number, we can discard them, considering them errors in the dataset. Such points are termed outliers.

    Energy Functions

    As there are m training points, we can calculate the average energy function:

    E(w, b) = (1/m) Σ(i=1..m) Ei

    and our motive is to minimize the energy function:

    min E(w, b) over (w, b)

    Little Calculus: For any differentiable function, the points where the first derivative is zero are points of either minima or maxima. If the second derivative is negative, it is a point of maxima, and if it is positive, it is a point of minima.

    Here we will do a trick: we will turn our energy function into an upward-opening parabola by squaring the error terms. This ensures that the energy function has only one global minimum (the point of our concern) and simplifies the calculation: the point where the first derivative of the energy function is zero is the point we need, and the value of (w, b) at that point is our required parameter set.

    So our final energy function is

    E(w, b) = (1/2m) Σ(i=1..m) (Ei)²

    Dividing by 2 doesn’t affect our result, and it cancels out at the time of differentiation, since, for example, the first derivative of x² is 2x.

    Gradient Descent Method

    Gradient descent is a generic optimization algorithm. It iteratively adjusts the parameters of the model in order to minimize the energy function.

    In the above picture, we can see on the right side:

    1. w0 and w1 are the random initialization, and by following gradient descent they move towards the global minimum.
    2. The number of turns of the black line is the number of iterations, which must be neither too many nor too few.
    3. The distance between the turns depends on alpha, i.e., the learning parameter.

    By solving this left side equation we will be able to get model params at the global minima of energy functions.

    Points to consider at the time of Gradient Descent calculations:

    1. Random initialization: We start the algorithm at a random point, i.e., a random (w, b) pair. As we move along, the algorithm decides in which direction the next trial has to be taken. Since the energy surface is an upward-opening parabola, moving in the right direction (towards the global minimum) yields a smaller value than at the previous point.
    2. Number of iterations: The number of iterations must be neither too low nor too high. If it is too low, we will not reach the global minimum; if it is too high, we waste extra calculations around the global minimum.
    3. Alpha as the learning parameter: When alpha is too small, gradient descent is slow, taking unnecessarily many steps to reach the global minimum. If alpha is too big, it might overshoot the global minimum and fail to converge, or even diverge.

    Implementation of Gradient Descent in Python

    """ Method to read the csv file using Pandas and later use this data for linear regression. """
    """ Better run with Python 3+. """
    
    # Library to read csv file effectively
    import pandas
    import matplotlib.pyplot as plt
    import numpy as np
    
    # Method to read the csv file
def load_data(file_name):
    column_names = ['area', 'price']
    # Read the two columns; the first row is a header, so we slice it off below
    io = pandas.read_csv(file_name, names=column_names, header=None)
    x_val = io.values[1:, 0]
    y_val = io.values[1:, 1]
    size_array = len(y_val)
    for i in range(size_array):
        x_val[i] = float(x_val[i])
        y_val[i] = float(y_val[i])
    # Return after the loop, not inside it, so the whole array is converted
    return x_val, y_val
    
    # Call the method for a specific file
    x_raw, y_raw = load_data('area-price.csv')
x_raw = x_raw.astype(float)
y_raw = y_raw.astype(float)
    y = y_raw
    
    # Modeling
    w, b = 0.1, 0.1
    num_epoch = 100
    converge_rate = np.zeros([num_epoch , 1], dtype=float)
    learning_rate = 1e-3
    for e in range(num_epoch):
    	# Calculate the gradient of the loss function with respect to arguments (model parameters) manually.
    	y_predicted = w * x_raw + b
    	grad_w, grad_b = (y_predicted - y).dot(x_raw), (y_predicted - y).sum()
    	# Update parameters.
    	w, b = w - learning_rate * grad_w, b - learning_rate * grad_b
    	converge_rate[e] = np.mean(np.square(y_predicted-y))
    
    print(w, b)
    print(f"predicted function f(x) = x * {w} + {b}" )
    calculatedprice = (10 * w) + b
    print(f"price of plot with area 10 sqmtr = 10 * {w} + {b} = {calculatedprice}")

This is a basic implementation of the gradient descent algorithm using NumPy and pandas. It reads the area-price.csv file. We take (w, b) = (0.1, 0.1) as the random initialization, run 100 iterations, and use a learning rate of 0.001.

In every iteration, we recalculate w and b and record the mean squared error to monitor the convergence rate.

We can repeat this calculation of (w, b) for different values of the random initialization, number of iterations, and learning rate (alpha).

Note: There is another Python library, TensorFlow, which is often preferred for such calculations since it has built-in gradient descent optimizers. But for better understanding, we have used NumPy and pandas here.

    RMSE (Root Mean Square Error)

RMSE: This is a method to verify to what extent our calculation of (w, b) is accurate. The basic formula is RMSE = √( (1/m) Σᵢ₌₁ᵐ (fᵢ − oᵢ)² ), where fᵢ is the predicted value and oᵢ is the observed value.

Note: There is no absolute good or bad threshold for RMSE; we can only judge it relative to the range of our observed values. For observed values ranging from 0 to 1000, an RMSE of 0.7 is small, but if the range is 0 to 1, it is not that small.
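As a quick sketch (using NumPy, as in the example above), RMSE can be computed like this:

```python
import numpy as np

def rmse(predicted, observed):
    """Root Mean Square Error between predicted (f) and observed (o) values."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return np.sqrt(np.mean(np.square(predicted - observed)))
```

For the linear model above, `rmse(w * x_raw + b, y_raw)` gives the fit error in the same units as the price.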

    Conclusion

As part of this article, we have seen a short introduction to machine learning and the need for it. With the help of a very basic example, we learned about linear regression (univariate only); this generalizes to the multivariate case as well. We then used the gradient descent method to calculate the predicted data model in linear regression and learned the basic flow of gradient descent, with a Python example demonstrating linear regression via gradient descent.

  • Lessons Learnt While Building an ETL Pipeline for MongoDB & Amazon Redshift Using Apache Airflow

Recently, I was involved in building an ETL (Extract-Transform-Load) pipeline. It involved extracting data from MongoDB collections, performing transformations, and then loading it into Redshift tables. Many ETL solutions available in the market partly solve the problem, but the key part of an ETL process lies in its ability to transform or process raw data before it is pushed to its destination.

Each ETL pipeline comes with a specific business requirement around processing data that is hard to achieve using off-the-shelf ETL solutions. This is why a majority of ETL solutions are custom-built from scratch. In this blog, I am going to talk about my learnings from building a custom ETL solution that moves data from MongoDB to Redshift using Apache Airflow.

    Background:

    I began by writing a Python-based command line tool which supported different phases of ETL, like extracting data from MongoDB, processing extracted data locally, uploading the processed data to S3, loading data from S3 to Redshift, post-processing and cleanup. I used the PyMongo library to interact with MongoDB and the Boto library for interacting with Redshift and S3.

I kept each operation atomic so that multiple instances of each operation could run independently of each other, which helps achieve parallelism. One of the major challenges was to achieve parallelism while running the ETL tasks. One option was to develop our own framework based on threads, or to build a distributed scheduler using a task queue like Celery combined with a message broker like RabbitMQ. After doing some research, I settled on Apache Airflow. Airflow is a Python-based scheduler where you can define DAGs (Directed Acyclic Graphs) that run on a given schedule and execute tasks in parallel in each phase of your ETL. You define a DAG as Python code, and Airflow also lets you manage the state of your DAG runs using variables. Features like task retries on failure are a plus.

    We faced several challenges while getting the above ETL workflow to be near real-time and fault tolerant. We discuss the challenges faced and the solutions below:

    Keeping your ETL code changes in sync with Redshift schema

While you are building the ETL tool, you may end up fetching a new field from MongoDB, but at the same time you have to add the corresponding column to the Redshift table. If you fail to do so, the ETL pipeline will start failing. To tackle this, I created a database migration tool which became the first step in my ETL workflow.

    The migration tool would:

    • keep the migration status in a Redshift table and
    • would track all migration scripts in a code directory.

In each ETL run, it would fetch the most recently run migrations from Redshift and search for any new migration scripts in the code directory. If found, it would run the new migration scripts, after which the regular ETL tasks would run. This puts the onus on developers to add a migration script whenever they make a change, such as adding or removing a field fetched from MongoDB.

    Maintaining data consistency

    While extracting data from MongoDB, one needs to ensure all the collections are extracted at a specific point in time else there can be data inconsistency issues. We need to solve this problem at multiple levels:

• While extracting data from MongoDB, define a parameter like the modified date and extract data from the different collections with a filter for records less than or equal to that date. This ensures you fetch point-in-time data from MongoDB.
• While loading data into Redshift tables, don’t load directly into the master table; instead, load into a staging table. Once you are done loading staging data for all related collections, move it from staging to master within a single transaction. This way, data is either updated in all related tables or in none of them.
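The staging-to-master move described above can be sketched in Redshift SQL roughly as follows (table and column names here are hypothetical):

```sql
-- Replace changed rows and append new ones atomically: either every
-- related table sees the new data, or none does.
BEGIN;
DELETE FROM orders USING orders_staging
    WHERE orders.order_id = orders_staging.order_id;
INSERT INTO orders SELECT * FROM orders_staging;
DELETE FROM orders_staging;
COMMIT;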

    A single bad record can break your ETL

While moving data across the ETL pipeline into Redshift, one needs to take care of field formats. For example, the date format in the incoming data can differ from the one in the Redshift schema. Another example: incoming data can exceed the length of a field defined in the schema. Redshift’s COPY command, which is used to load data from files into Redshift tables, is very sensitive to such variations in data. Even a single incorrectly formatted record will lead to all your data getting rejected, effectively breaking the ETL pipeline.

There are multiple ways to solve this problem: either handle it in one of the transform jobs in the pipeline, or put the onus on Redshift to handle these variances. Redshift’s COPY command has many options which can help you solve these problems. Some of the very useful options are:

    • ACCEPTANYDATE: Allows any date format, including invalid formats such as 00/00/00 00:00:00, to be loaded without generating an error.
    • ACCEPTINVCHARS: Enables loading of data into VARCHAR columns even if the data contains invalid UTF-8 characters.
    • TRUNCATECOLUMNS: Truncates data in columns to the appropriate number of characters so that it fits the column specification.
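A sketch of a COPY invocation using these options; the table, bucket, prefix, and IAM role below are placeholders, not values from the original pipeline:

```sql
COPY orders_staging
FROM 's3://my-etl-bucket/orders/part_'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role'
GZIP
ACCEPTANYDATE
ACCEPTINVCHARS
TRUNCATECOLUMNS;
```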

    Redshift going out of storage

    Redshift is based on PostgreSQL and one of the common problems is when you delete records from Redshift tables it does not actually free up space. So if your ETL process is deleting and creating new records frequently, then you may run out of Redshift storage space. VACUUM operation for Redshift is the solution to this problem. Instead of making VACUUM operation a part of your main ETL flow, define a different workflow which runs on a different schedule to run VACUUM operation. VACUUM operation reclaims space and resorts rows in either a specified table or all tables in the current database. VACUUM operation can be FULL, SORT ONLY, DELETE ONLY & REINDEX. More information on VACUUM can be found here.

    ETL instance going out of storage

    Your ETL will be generating a lot of files by extracting data from MongoDB onto your ETL instance. It is very important to periodically delete those files otherwise you are very likely to go out of storage on your ETL server. If your data from MongoDB is huge, you might end up creating large files on your ETL server. Again, I would recommend defining a different workflow which runs on a different schedule to run a cleanup operation.

    Making ETL Near Real Time

    Processing only the delta rather than doing a full load in each ETL run

ETL is faster if you keep track of already processed data and process only new data. If you do a full load in each ETL run, the solution will not scale as your data grows. As a solution, we made it mandatory for every collection in our MongoDB to have a created and a modified date. Our ETL checks the maximum value of the modified date for the given collection in the Redshift table, then generates a filter query to fetch only those records from MongoDB whose modified date is greater than that maximum. It may be difficult to make such changes in your product, but it’s worth the effort!
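The delta query described above might look like this with PyMongo (the `modified` field name follows the convention above; the helper name is ours):

```python
from datetime import datetime

def delta_filter(max_modified):
    """Build a MongoDB query fetching only records changed since the last ETL run."""
    return {"modified": {"$gt": max_modified}}

# Example: suppose Redshift reported 2019-06-01 as the latest loaded 'modified' value
last_loaded = datetime(2019, 6, 1)
query = delta_filter(last_loaded)
# cursor = collection.find(query)   # with a real PyMongo collection
```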

    Compressing and splitting files while loading

A good approach is to write files in some compressed format. It saves storage space on the ETL server and also helps when you load data into Redshift. Redshift’s COPY command documentation suggests providing compressed files as input. Also, instead of a single huge file, split your data into parts and give all the files to a single COPY command. This enables Redshift to use its compute resources across the cluster to do the copy in parallel, leading to faster loads.

    Streaming mongo data directly to S3 instead of writing it to ETL server

One of the major overheads in the ETL process is writing data first to the ETL server and then uploading it to S3. To reduce disk IO, do not store data on the ETL server; instead, use MongoDB’s handy stream API. For the MongoDB Node driver, both the collection.find() and collection.aggregate() functions return cursors, and the stream method accepts a transform function as a parameter, where all your custom transform logic can go. AWS S3’s Node library’s upload() function also accepts readable streams. Take the stream from the MongoDB stream method, pipe it into zlib to gzip it, then feed the resulting readable stream into S3’s Node library. Simple! This small but important change gives a large improvement in the ETL process.
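The paragraph above describes the Node driver; the same idea can be sketched in Python to match the rest of this (Python-based) pipeline. The helper below is an illustration; the commented boto3 `upload_fileobj` call shows where the stream would go:

```python
import gzip
import io
import json

def docs_to_gzip_stream(docs):
    """Gzip an iterable of documents into an in-memory buffer, ready to stream to S3."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for doc in docs:
            # One JSON document per line, so Redshift/downstream tools can split records
            gz.write((json.dumps(doc) + "\n").encode("utf-8"))
    buf.seek(0)
    return buf

# stream = docs_to_gzip_stream(collection.find(query))  # with a real PyMongo cursor
# s3_client.upload_fileobj(stream, "my-etl-bucket", "orders/part_000.json.gz")
```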

    Optimizing Redshift Queries

Optimizing Redshift queries helps in making the ETL system highly scalable and efficient, and also reduces cost. Let’s look at some of the approaches:

    Add a distribution key

Redshift is a clustered database, meaning your data is stored across cluster nodes. When you query for a certain set of records, Redshift has to search for those records on each node, leading to slow queries. A distribution key is a single column that decides how records are distributed across the cluster. If you have a column that is present in all your data, you can specify it as the distribution key. When loading data into Redshift, all rows with the same distribution key value are placed on a single node of the cluster, so when you query for certain records, Redshift knows exactly where to look. Note that this only helps when you also use the distribution key in your queries.
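As an illustration, a hypothetical orders table distributed on `customer_id` might be declared like this:

```sql
-- All rows for a given customer_id land on the same node, so joins and
-- filters on customer_id avoid cross-node data shuffling.
CREATE TABLE orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(10,2),
    created_at  TIMESTAMP
)
DISTKEY (customer_id);
```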

    Source: Slideshare

     

    Generating a numeric primary key for string primary key

In MongoDB, you can have any type of field as your primary key. If your Mongo collections have a non-numeric primary key and you use those same keys in Redshift, your joins end up being on string keys, which are slower. Instead, generate numeric keys for your string keys and join on those, which makes queries run much faster. Redshift supports marking a column with the IDENTITY attribute, which auto-generates a unique numeric value for the column that you can use as your primary key.
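A hypothetical sketch of such a table, keeping the original string key for traceability while joining on an IDENTITY surrogate key:

```sql
CREATE TABLE customers (
    customer_sk BIGINT IDENTITY(1,1),   -- auto-generated numeric surrogate key
    mongo_id    VARCHAR(24),            -- original MongoDB ObjectId (string)
    name        VARCHAR(256)
);
```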

    Conclusion:

In this blog, I have covered the best practices around building ETL pipelines for Redshift, based on my learnings. There are many more recommended practices, which can easily be found in the Redshift and MongoDB documentation.

  • An Innovator’s Guide to Kubernetes Storage Using Ceph

Kubernetes, the awesome container orchestration tool, is changing the way applications are developed and deployed. You can specify the resources you want and have them available without worrying about the underlying infrastructure. Kubernetes is way ahead in terms of high availability, scaling, and managing your application, but the storage section of k8s is still evolving. Support for many storage backends is being added, and several are production-ready.

People prefer clustered applications to store data. But what about non-clustered applications? Where do these applications store data to make it highly available? With these questions in mind, let’s go through Ceph storage and its integration with Kubernetes.

    What is Ceph Storage?

Ceph is open-source, software-defined storage maintained by Red Hat. It’s capable of block, object, and file storage. Ceph clusters are designed to run on any hardware with the help of an algorithm called CRUSH (Controlled Replication Under Scalable Hashing), which ensures that data is properly distributed across the cluster and can be accessed quickly without any bottlenecks. Replication, thin provisioning, and snapshots are the key features of Ceph storage.

There are good storage solutions like Gluster and Swift, but we are going with Ceph for the following reasons:

    1. File, Block, and Object storage in the same wrapper.
    2. Better transfer speed and lower latency
    3. Easily accessible storage that can quickly scale up or down

We are going to use two types of storage in this blog to integrate with Kubernetes:

    1. Ceph-RBD
    2. CephFS

    Ceph Deployment

Deploying a highly available Ceph cluster is pretty straightforward and easy. I am assuming that you are familiar with setting up a Ceph cluster. If not, refer to the official documentation here.

    If you check the status, you should see something like:

    # ceph -s
      cluster:
        id:     ed0bfe4e-f44c-4797-9bc6-21a988b645c7
        health: HEALTH_OK
     
      services:
        mon: 3 daemons, quorum ip-10-0-1-118,ip-10-0-1-172,ip-10-0-1-227
        mgr: ip-10-0-1-118(active), standbys: ip-10-0-1-227, ip-10-0-1-172
        mds: cephfs-1/1/1 up  {0=ip-10-0-1-118=up:active}
        osd: 3 osds: 3 up, 3 in
     
      data:
        pools:   2 pools, 160 pgs
        objects: 22  objects, 19 KiB
        usage:   3.0 GiB used, 21 GiB / 24 GiB avail
        pgs:     160 active+clean

Notice that my Ceph monitor IPs are 10.0.1.118, 10.0.1.227, and 10.0.1.172.

    K8s Integration

After setting up the Ceph cluster, we will consume it with Kubernetes. I am assuming that your Kubernetes cluster is up and running. We will be using Ceph-RBD and CephFS as storage in Kubernetes.

    Ceph-RBD and Kubernetes

We need a Ceph-RBD client to enable interaction between the Kubernetes cluster and the Ceph cluster. This client is not present in the official kube-controller-manager container, so let’s create an external storage plugin for Ceph.

    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: rbd-provisioner
    rules:
      - apiGroups: [""]
        resources: ["persistentvolumes"]
        verbs: ["get", "list", "watch", "create", "delete"]
      - apiGroups: [""]
        resources: ["persistentvolumeclaims"]
        verbs: ["get", "list", "watch", "update"]
      - apiGroups: ["storage.k8s.io"]
        resources: ["storageclasses"]
        verbs: ["get", "list", "watch"]
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["create", "update", "patch"]
      - apiGroups: [""]
        resources: ["services"]
        resourceNames: ["kube-dns","coredns"]
        verbs: ["list", "get"]
      - apiGroups: [""]
        resources: ["endpoints"]
        verbs: ["get", "list", "watch", "create", "update", "patch"]
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: rbd-provisioner
    subjects:
      - kind: ServiceAccount
        name: rbd-provisioner
        namespace: kube-system
    roleRef:
      kind: ClusterRole
      name: rbd-provisioner
      apiGroup: rbac.authorization.k8s.io
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: Role
    metadata:
      name: rbd-provisioner
    rules:
    - apiGroups: [""]
      resources: ["secrets"]
      verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: rbd-provisioner
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: rbd-provisioner
    subjects:
    - kind: ServiceAccount
      name: rbd-provisioner
      namespace: kube-system
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: rbd-provisioner
    ---
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: rbd-provisioner
    spec:
      replicas: 1
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: rbd-provisioner
        spec:
          containers:
          - name: rbd-provisioner
            image: "quay.io/external_storage/rbd-provisioner:latest"
            env:
            - name: PROVISIONER_NAME
              value: ceph.com/rbd
          serviceAccount: rbd-provisioner

    # kubectl create -n kube-system -f  Ceph-RBD-Provisioner.yaml

    • You will get output like this:
    clusterrole.rbac.authorization.k8s.io/rbd-provisioner created
    clusterrolebinding.rbac.authorization.k8s.io/rbd-provisioner created
    role.rbac.authorization.k8s.io/rbd-provisioner created
    rolebinding.rbac.authorization.k8s.io/rbd-provisioner created
    serviceaccount/rbd-provisioner created
    deployment.extensions/rbd-provisioner created

    • Check RBD volume provisioner status and wait till it comes up in running state. You would see something like following:
    [root@ip-10-0-1-226 Ceph-RBD]# kubectl get pods -l app=rbd-provisioner -n kube-system
    NAME                               READY     STATUS    RESTARTS   AGE
    rbd-provisioner-857866b5b7-vc4pr   1/1       Running   0          16s

• Once the provisioner is up, it needs the admin key for storage provisioning. You can run the following command to get the admin key:
    # ceph auth get-key client.admin
    AQDyWw9dOUm/FhAA4JCA9PXkPo6+OXpOj9N2ZQ==
    
# kubectl create secret generic ceph-secret \
    --type="kubernetes.io/rbd" \
    --from-literal=key='AQDyWw9dOUm/FhAA4JCA9PXkPo6+OXpOj9N2ZQ==' \
    --namespace=kube-system

    • Let’s create a separate Ceph pool for Kubernetes and the new client key:
    # ceph --cluster ceph osd pool create kube 1024 1024
    # ceph --cluster ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'

• Get the auth key which we created in the above command and create a Kubernetes secret for the new client of the kube pool.
    # ceph --cluster ceph auth get-key client.kube
    AQDabg9d4MBeIBAAaOhTjqsYpsNa4X10V0qCfw==
    
# kubectl create secret generic ceph-secret-kube \
    --type="kubernetes.io/rbd" \
    --from-literal=key='AQDabg9d4MBeIBAAaOhTjqsYpsNa4X10V0qCfw==' \
    --namespace=kube-system

    • Now let’s create the storage class.
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: fast-rbd
    provisioner: ceph.com/rbd
    parameters:
      monitors: 10.0.1.118:6789, 10.0.1.227:6789, 10.0.1.172:6789
      adminId: admin
      adminSecretName: ceph-secret
      adminSecretNamespace: kube-system
      pool: kube
      userId: kube
      userSecretName: ceph-secret-kube
      userSecretNamespace: kube-system
      imageFormat: "2"
      imageFeatures: layering

    # kubectl create -f Ceph-RBD-StorageClass.yaml

• We are all set now. We can test Ceph-RBD by creating a PVC; after creating the PVC, the PV will get created automatically. Let’s create the PVC now:
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: testclaim
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: fast-rbd

    # kubectl create -f Ceph-RBD-PVC.yaml
    
    [root@ip-10-0-1-226 Ceph-RBD]# kubectl get pvc
    NAME      STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    testclaim  Bound     pvc-c215ad98-95b3-11e9-8b5d-12e154d66096   1Gi        RWO            fast-rbd       2m

• If you check the PVC, you’ll find that it has been bound to the PV created by the storage class.
• Let’s check the persistent volume:
    [root@ip-10-0-1-226 Ceph-RBD]# kubectl get pv
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM             STORAGECLASS   REASON    AGE
    pvc-c215ad98-95b3-11e9-8b5d-12e154d66096   1Gi        RWO            Delete           Bound     default/testclaim   fast-rbd                 8m
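To confirm the volume is usable from a workload, a throwaway pod can mount the claim (a minimal sketch; the image, pod name, and mount path are arbitrary, the claim name is the testclaim created above):

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: rbd-test-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
    volumeMounts:
    - mountPath: /data
      name: test-volume
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: testclaim
```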

So far we have seen how to use block-based storage, i.e. Ceph-RBD, with Kubernetes by creating a dynamic storage provisioner. Now let’s go through the process of setting up storage using file-system-based storage, i.e. CephFS.

    CephFS and Kubernetes

• Let’s create the provisioner and storage class for CephFS. First, create a dedicated namespace for CephFS:
    # kubectl create ns cephfs

• Create the Kubernetes secret using the Ceph admin auth token:
    # ceph auth get-key client.admin
    AQDyWw9dOUm/FhAA4JCA9PXkPo6+OXpOj9N2ZQ==
    
    # kubectl create secret generic ceph-secret-admin --from-literal=key="AQDyWw9dOUm/FhAA4JCA9PXkPo6+OXpOj9N2ZQ==" -n cephfs

    • Create the cluster role, role binding, provisioner
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: cephfs-provisioner
      namespace: cephfs
    rules:
      - apiGroups: [""]
        resources: ["persistentvolumes"]
        verbs: ["get", "list", "watch", "create", "delete"]
      - apiGroups: [""]
        resources: ["persistentvolumeclaims"]
        verbs: ["get", "list", "watch", "update"]
      - apiGroups: ["storage.k8s.io"]
        resources: ["storageclasses"]
        verbs: ["get", "list", "watch"]
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["create", "update", "patch"]
      - apiGroups: [""]
        resources: ["services"]
        resourceNames: ["kube-dns","coredns"]
        verbs: ["list", "get"]
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: cephfs-provisioner
    subjects:
      - kind: ServiceAccount
        name: cephfs-provisioner
        namespace: cephfs
    roleRef:
      kind: ClusterRole
      name: cephfs-provisioner
      apiGroup: rbac.authorization.k8s.io
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: cephfs-provisioner
      namespace: cephfs
    rules:
      - apiGroups: [""]
        resources: ["secrets"]
        verbs: ["create", "get", "delete"]
      - apiGroups: [""]
        resources: ["endpoints"]
        verbs: ["get", "list", "watch", "create", "update", "patch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: cephfs-provisioner
      namespace: cephfs
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: cephfs-provisioner
    subjects:
    - kind: ServiceAccount
      name: cephfs-provisioner
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: cephfs-provisioner
      namespace: cephfs
    ---
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: cephfs-provisioner
      namespace: cephfs
    spec:
      replicas: 1
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: cephfs-provisioner
        spec:
          containers:
          - name: cephfs-provisioner
            image: "quay.io/external_storage/cephfs-provisioner:latest"
            env:
            - name: PROVISIONER_NAME
              value: ceph.com/cephfs
            - name: PROVISIONER_SECRET_NAMESPACE
              value: cephfs
            command:
            - "/usr/local/bin/cephfs-provisioner"
            args:
            - "-id=cephfs-provisioner-1"
          serviceAccount: cephfs-provisioner

    # kubectl create -n cephfs -f Ceph-FS-Provisioner.yaml

    • Create the storage class
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: cephfs
    provisioner: ceph.com/cephfs
    parameters:
        monitors: 10.0.1.226:6789, 10.0.1.205:6789, 10.0.1.82:6789
        adminId: admin
        adminSecretName: ceph-secret-admin
        adminSecretNamespace: cephfs
        claimRoot: /pvc-volumes

    # kubectl create -f Ceph-FS-StorageClass.yaml

• We are all set now. The CephFS provisioner is created. Let’s wait till it gets into the running state:
    # kubectl get pods -n cephfs
    NAME                                 READY     STATUS    RESTARTS   AGE
    cephfs-provisioner-8d957f95f-s7mdq   1/1       Running   0          1m

• Once the CephFS provisioner is up, try creating the persistent volume claim. In this step, the storage class takes care of creating the persistent volume dynamically.
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: claim1
    spec:
      storageClassName: cephfs
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 1Gi

    # kubectl create -f Ceph-FS-PVC.yaml

• Let’s check the created PV and PVC:
    # kubectl get pvc
    NAME      STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    claim1    Bound     pvc-a7db18a7-9641-11e9-ab86-12e154d66096   1Gi        RWX            cephfs         2m
    
    # kubectl get pv
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM            STORAGECLASS   REASON    AGE
    pvc-a7db18a7-9641-11e9-ab86-12e154d66096   1Gi        RWX            Delete           Bound     default/claim1   cephfs                   2m

    Conclusion

We have seen how to integrate Ceph storage with Kubernetes, covering both Ceph-RBD and CephFS. This approach is highly useful when your application is not a clustered application and you want to make it highly available.

  • Kubernetes CSI in Action: Explained with Features and Use Cases

    Kubernetes Volume plugins have been a great way for the third-party storage providers to support a block or file storage system by extending the Kubernetes volume interface and are “In-Tree” in nature.

In this post, we will dig into the Kubernetes Container Storage Interface. We will use the HostPath CSI driver locally on a single-node bare-metal cluster to get a conceptual understanding of the CSI workflow for provisioning a Persistent Volume and its lifecycle. A cool feature, snapshotting a volume and restoring from the snapshot, is also explained.

    Introduction

CSI is a standard for exposing arbitrary block and file storage systems to containerized workloads on container orchestrators like Kubernetes, Mesos, and Cloud Foundry. It makes it easy for third-party storage providers to expose new storage systems without touching the Kubernetes code, and a single, independent implementation of a CSI driver will work on any orchestrator.

    This new plugin mechanism has been one of the most powerful features of Kubernetes. It enables the storage vendors to:

    1. Automatically create storage when required.
    2. Make storage available to containers wherever they’re scheduled.
    3. Automatically delete the storage when no longer needed.

This decoupling helps vendors maintain independent release and feature cycles and focus on the API implementation without worrying about backward incompatibility, and supporting their plugin becomes as easy as deploying a few pods.

     

    Image Source: Weekly Geekly

    Why CSI?

Prior to CSI, k8s volume plugins had to be “in-tree”, compiled and shipped with the core Kubernetes binaries. This meant storage providers had to check their code into the core k8s codebase if they wished to add support for a new storage system.

A plugin-based solution, FlexVolume, tried to address this issue by exposing an exec-based API for external plugins. Although it also worked on the similar notion of being detached from the k8s binary, there were several major problems with that approach. Firstly, it needed root access to the host and master file system to deploy the driver files.

Secondly, it came with a huge baggage of prerequisites and OS dependencies which were assumed to be available on the host. CSI solves all these issues by being containerized and using the k8s storage primitives.

    CSI has evolved as the one-stop solution addressing all the above issues which enables storage plugins to be out-of-tree and deployed via standard k8s primitives, such as PVC, PV and StorageClasses.

    The main aim of introducing CSI is to establish a standard mechanism for exposing any type of storage system under the hood to all container orchestrators.
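    As an illustration of these standard primitives, a driver advertises itself to Kubernetes with a CSIDriver object. The fragment below is a hypothetical sketch (the driver name is made up, and the API group/version and available fields vary across Kubernetes releases):

```yaml
# Hypothetical CSIDriver object for a driver named example.csi.vendor.io
apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  name: example.csi.vendor.io
spec:
  attachRequired: true   # ask Kubernetes to run the attach/detach (ControllerPublish) flow
  podInfoOnMount: true   # pass pod metadata to the driver during mount operations
```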

    Deploy the Driver Plugin

    A CSI driver comprises a few main components: various sidecars, plus the vendor’s implementation of the CSI services, which the container orchestrator (CO) understands. The CSI services will be described later in this blog. Let’s try deploying the Hostpath CSI driver.

    Prerequisites:

    • Kubernetes cluster (not Minikube or Microk8s): Running version 1.13 or later
    • Access to the terminal with Kubectl installed

    Deploying HostPath Driver Plugin:

    1. Clone the repo of the HostPath driver plugin locally, or just copy the deploy and examples folders from the root path
    2. Check out the master branch (if not already on it)
    3. The hostpath driver comprises manifests for the following sidecars (in ./deploy/master/hostpath/):
      – csi-hostpath-attacher.yaml
      – csi-hostpath-provisioner.yaml
      – csi-hostpath-snapshotter.yaml
      – csi-hostpath-plugin.yaml:
      This one deploys the node-driver-registrar and hostpath-plugin containers (plus a liveness probe)
    4. The driver also includes a separate Service for each component, deployed together with a StatefulSet for its containers
    5. It also deploys ClusterRoleBindings and RBAC rules for each component
    6. Each component (sidecar) is maintained in a separate repository
    7. The /deploy/util/ directory contains a shell script that handles the complete deployment process
    8. After copying the folders or cloning the repo, just run:
    $ deploy/kubernetes-latest/deploy-hostpath.sh

    9. The output will be similar to:

    applying RBAC rules
    kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-provisioner/v1.0.1/deploy/kubernetes/rbac.yaml
    serviceaccount/csi-provisioner created
    clusterrole.rbac.authorization.k8s.io/external-provisioner-runner created
    clusterrolebinding.rbac.authorization.k8s.io/csi-provisioner-role created
    role.rbac.authorization.k8s.io/external-provisioner-cfg created
    rolebinding.rbac.authorization.k8s.io/csi-provisioner-role-cfg created
    kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-attacher/v1.0.1/deploy/kubernetes/rbac.yaml
    serviceaccount/csi-attacher created
    clusterrole.rbac.authorization.k8s.io/external-attacher-runner created
    clusterrolebinding.rbac.authorization.k8s.io/csi-attacher-role created
    role.rbac.authorization.k8s.io/external-attacher-cfg created
    rolebinding.rbac.authorization.k8s.io/csi-attacher-role-cfg created
    kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v1.0.1/deploy/kubernetes/rbac.yaml
    serviceaccount/csi-snapshotter created
    clusterrole.rbac.authorization.k8s.io/external-snapshotter-runner created
    clusterrolebinding.rbac.authorization.k8s.io/csi-snapshotter-role created
    deploying hostpath components
       deploy/kubernetes-1.13/hostpath/csi-hostpath-attacher.yaml
            using           image: quay.io/k8scsi/csi-attacher:v1.0.1
    service/csi-hostpath-attacher created
    statefulset.apps/csi-hostpath-attacher created
       deploy/kubernetes-1.13/hostpath/csi-hostpath-plugin.yaml
            using           image: quay.io/k8scsi/csi-node-driver-registrar:v1.0.2
            using           image: quay.io/k8scsi/hostpathplugin:v1.0.1
            using           image: quay.io/k8scsi/livenessprobe:v1.0.2
    service/csi-hostpathplugin created
    statefulset.apps/csi-hostpathplugin created
       deploy/kubernetes-1.13/hostpath/csi-hostpath-provisioner.yaml
            using           image: quay.io/k8scsi/csi-provisioner:v1.0.1
    service/csi-hostpath-provisioner created
    statefulset.apps/csi-hostpath-provisioner created
       deploy/kubernetes-1.13/hostpath/csi-hostpath-snapshotter.yaml
            using           image: quay.io/k8scsi/csi-snapshotter:v1.0.1
    service/csi-hostpath-snapshotter created
    statefulset.apps/csi-hostpath-snapshotter created
       deploy/kubernetes-1.13/hostpath/csi-hostpath-testing.yaml
            using           image: alpine/socat:1.0.3
    service/hostpath-service created
    statefulset.apps/csi-hostpath-socat created
    11:43:06 waiting for hostpath deployment to complete, attempt #0
    11:43:16 waiting for hostpath deployment to complete, attempt #1
    11:43:26 waiting for hostpath deployment to complete, attempt #2
    deploying snapshotclass
    volumesnapshotclass.snapshot.storage.k8s.io/csi-hostpath-snapclass created

    10. The driver is now deployed; we can check:

    $ kubectl get pods
    
    NAME                          READY   STATUS        RESTARTS    AGE
    csi-hostpath-attacher-0       1/1     Running        0          1m06s
    csi-hostpath-provisioner-0    1/1     Running        0          1m06s
    csi-hostpath-snapshotter-0    1/1     Running        0          1m06s
    csi-hostpathplugin-0          2/2     Running        0          1m06s

    CSI API-Resources:

    $ kubectl api-resources | grep -E "^Name|csi|storage|PersistentVolume"
    
    NAME                     APIGROUP                  NAMESPACED     KIND
    persistentvolumeclaims                             true           PersistentVolumeClaim
    persistentvolumes                                  false          PersistentVolume
    csidrivers               csi.storage.k8s.io        false          CSIDriver
    volumesnapshotclasses    snapshot.storage.k8s.io   false          VolumeSnapshotClass
    volumesnapshotcontents   snapshot.storage.k8s.io   false          VolumeSnapshotContent
    volumesnapshots          snapshot.storage.k8s.io   true           VolumeSnapshot
    csidrivers               storage.k8s.io            false          CSIDriver
    csinodes                 storage.k8s.io            false          CSINode
    storageclasses           storage.k8s.io            false          StorageClass

    There are resources from the core API group and from storage.k8s.io, as well as resources created by the CRDs snapshot.storage.k8s.io and csi.storage.k8s.io.

    CSI SideCars

    Kubernetes CSI containers are sidecars that simplify the development and deployment of CSI drivers on a Kubernetes cluster. Different drivers share similar logic to trigger the appropriate operations against the “CSI volume driver” container and to update the Kubernetes API as appropriate.

    These common containers have to be bundled with the provider-specific containers.

    The Kubernetes storage SIG contributors maintain the following basic skeleton containers for any CSI driver:

    Note: In the case of the Hostpath driver, only the ‘csi-hostpath-plugin’ container holds driver-specific code; all the others are common CSI sidecar containers. These containers share a socket mounted in the socket-dir volume of type EmptyDir, which makes communication between them possible over gRPC.

    1. External Provisioner:
      It  is a sidecar container that watches Kubernetes PersistentVolumeClaim objects and triggers CSI CreateVolume and DeleteVolume operations against a driver endpoint.
      The CSI external-provisioner also supports the Snapshot DataSource. If a Snapshot CRD is specified as a data source on a PVC object, the sidecar container fetches the information about the snapshot through the SnapshotContent object and populates the data source field, indicating to the storage system that the new volume should be populated using the specified snapshot.
    2. External Attacher:
      It is a sidecar container that watches Kubernetes VolumeAttachment objects and triggers CSI ControllerPublish and ControllerUnpublish operations against a driver endpoint
    3. Node-Driver Registrar:
      It is a sidecar container that registers the CSI driver with kubelet and adds the driver’s custom NodeId to a label on the Kubernetes Node API object. Communication with this sidecar is handled by the ‘Identity’ service implemented by the driver, and the CSI driver is registered with kubelet using its device-plugin mechanism
    4. External Snapshotter:
      It is a sidecar container that watches the Kubernetes API server for VolumeSnapshot and VolumeSnapshotContent CRD objects. The creation of a new VolumeSnapshot object referencing a SnapshotClass CRD object corresponding to this driver causes the sidecar container to provision a new snapshot; on being notified of a snapshot’s successful creation, it immediately creates the VolumeSnapshotContent resource
    5. Cluster-Driver Registrar:
      It is a sidecar container that registers the CSI driver with the cluster by creating a CSIDriver object, which enables the driver to customize how Kubernetes interacts with it
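    The shared-socket arrangement mentioned in the note above can be sketched as an abbreviated, hypothetical pod spec fragment, in which the sidecar and plugin containers mount the same EmptyDir volume and talk gRPC over a Unix socket inside it:

```yaml
# Abbreviated sketch: sidecars and the vendor plugin share one socket directory
spec:
  containers:
  - name: node-driver-registrar        # common sidecar
    volumeMounts:
    - name: socket-dir
      mountPath: /csi                  # sidecar dials the socket under /csi
  - name: hostpath-plugin              # vendor-specific container
    volumeMounts:
    - name: socket-dir
      mountPath: /csi                  # plugin serves the CSI gRPC services here
  volumes:
  - name: socket-dir
    emptyDir: {}                       # shared only within this pod
```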

    Developing a CSI Driver

    To start implementing a CSI driver, an application must implement the gRPC services described by the CSI specification.

    The minimum services a CSI application should implement are the following:

    • CSI Identity service: Enables Kubernetes components and CSI containers to identify the driver
    • CSI Node service: Its required methods enable callers to make a volume available at a specified path

    All the required services may be implemented in a single driver application or split across several. The CSI driver application should be containerized to make it easy to deploy on Kubernetes. Once the driver-specific logic is containerized, it can be attached to the sidecars and deployed in node and/or controller mode.

    Capabilities

    CSI also has provisions for a custom CSI driver to support many additional features/services by using “capabilities”: a list of all the features the driver supports.
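    For example, the Identity service’s GetPluginCapabilities RPC is how a driver advertises these capabilities; in pseudocode (the exact message shapes live in the CSI protobuf specification):

```
GetPluginCapabilitiesResponse {
  capabilities: [
    PluginCapability { service: CONTROLLER_SERVICE }   // the driver implements the Controller service
  ]
}
```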

    Note: Refer to the link for a detailed explanation of developing a CSI driver.

    Try out provisioning the PV:

    1. A StorageClass with volumeBindingMode: WaitForFirstConsumer:
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: csi-hostpath-sc
    provisioner: hostpath.csi.k8s.io
    volumeBindingMode: WaitForFirstConsumer

    2. Now a PVC is also needed, to be consumed by the sample Pod.

    A sample Pod is also required so that it can be bound to the PV created by the PVC from the above step.
    These files are found in the ./examples directory and can be deployed using the kubectl create or apply commands.

    The PVC manifest:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pvc-fs
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: csi-hostpath-sc # defined in csi-setup.yaml

    3. The Pod to consume the PV

    kind: Pod
    apiVersion: v1
    metadata:
      name: pod-fs
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - csi-hostpathplugin
            topologyKey: kubernetes.io/hostname
      containers:
        - name: my-frontend
          image: busybox
          volumeMounts:
          - mountPath: "/data"
            name: my-csi-volume
          command: [ "sleep", "1000000" ]
      volumes:
        - name: my-csi-volume
          persistentVolumeClaim:
            claimName: pvc-fs # defined in csi-pvc.yaml

    Validate the deployed components:

    $ kubectl get pv
    
    NAME                                    CAPACITY ACCESSMODES STATUS CLAIM         STORAGECLASS     
    pvc-58d5ec38-03e5-11e9-be51-000c29e88ff1  1Gi       RWO      Bound  default/pvc-fs csi-hostpath-sc
    $ kubectl get pvc
    
    NAME      STATUS   VOLUME                                     CAPACITY  ACCESS MODES    STORAGECLASS
    pvc-fs    Bound    pvc-58d5ec38-03e5-11e9-be51-000c29e88ff1     1Gi         RWO         csi-hostpath-sc

    Brief on how it works:

    • csi-provisioner issues a CreateVolume request over the CSI socket; the hostpath-plugin handles CreateVolume and informs csi-provisioner of the volume’s creation
    • csi-provisioner creates the PV and updates the PVC to Bound, and the VolumeAttachment object is created by the controller-manager
    • csi-attacher, which watches for VolumeAttachments, submits a ControllerPublishVolume RPC call to the hostpath-plugin; the hostpath-plugin handles ControllerPublishVolume and attaches the volume, and csi-attacher updates the VolumeAttachment status
    • All this time, kubelet waits for the volume to be attached, then submits NodeStageVolume (format and mount to the node’s staging dir) to the node-side hostpath-plugin
    • The node-side hostpath-plugin handles the NodeStageVolume call, mounts the volume to `/var/lib/kubelet/plugins/kubernetes.io/csi/pv/<pv-name>/globalmount`, and responds to kubelet
    • kubelet calls NodePublishVolume (mount the volume to the pod’s dir)
    • The node-side hostpath-plugin performs NodePublishVolume and mounts the volume to `/var/lib/kubelet/pods/<pod-uuid>/volumes/kubernetes.io~csi/<pvc-name>/mount`

      Finally, kubelet starts the pod’s container with the provisioned volume.


    Let’s confirm the working of Hostpath CSI driver:

    The Hostpath driver is configured to create new volumes inside the hostpath container of the plugin pod, under the ‘/tmp’ directory. This path persists as long as that pod is up and running.

    A file written to a properly mounted Hostpath volume inside an application pod should therefore show up inside the hostpath container.

    1. To try this out, create a file in the application pod:

    $ kubectl exec -it pod-fs /bin/sh
    
    / # touch /data/my-test
    / # exit

    2. Then exec into the hostpath container and search for the file:

    $ kubectl exec -it $(kubectl get pods --selector app=csi-hostpathplugin 
    -o jsonpath='{.items[*].metadata.name}') -c hostpath /bin/sh
    
    / # find /tmp -name my-test
    /tmp/057485ab-c714-11e8-bb16-000c2967769a/my-test
    / # exit

    Note: A better way to verify is to inspect the VolumeAttachment API object that was created to represent the attached volume

    Support for Snapshot

    Volume Snapshotting is introduced as an Alpha feature for the Kubernetes persistent volume in v1.12. 

    Being an alpha feature, ‘VolumeSnapshotDataSource’ feature gate needs to be enabled. This feature opens a pool of use cases of keeping the snapshot of data locally. The API objects used are VolumeSnapshot, VolumeSnapshotContent and VolumeSnapshotClass. It was developed with a similar notion and relationship of PV, PVC and StorageClass. 
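    Since the gate has to be switched on explicitly, the relevant setting is the following config fragment (exactly which components take the flag, and how they are launched, depends on how your cluster was set up):

```
--feature-gates=VolumeSnapshotDataSource=true
```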

    To create a snapshot, a VolumeSnapshot object needs to be created with a PVC as its source and a VolumeSnapshotClass; the csi-snapshotter container will then create a VolumeSnapshotContent.

    Let’s try out with an example:

    Just like the provisioner creates a PV for us when a PVC is created, a VolumeSnapshotContent object will be created when a VolumeSnapshot object is created.

    apiVersion: snapshot.storage.k8s.io/v1alpha1
    kind: VolumeSnapshot
    metadata:
      name: fs-pv-snapshot
    spec:
      snapshotClassName: csi-hostpath-snapclass
      source:
        name: pvc-fs
        kind: PersistentVolumeClaim

    The VolumeSnapshotContent is then created; the output will look like:

    $ kubectl get volumesnapshotcontent
     
    NAME                                                  AGE
    snapcontent-f55db632-c716-11e8-8911-000c2967769a      14s

    Restore from the snapshot:

    The dataSource field in a PVC can accept a source of kind VolumeSnapshot, which will create a new PV from that volume snapshot when a Pod is bound to the PVC.

    The new PV will have the same data as the PV from which the snapshot was taken, and it can be attached to any other pod. A new pod using that PV demonstrates the possible “restore” and “cloning” use cases.

    For example, a PVC that restores from the snapshot:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: fs-pvc-restore
    spec:
      storageClassName: csi-hostpath-sc
      dataSource:
        name: fs-pv-snapshot
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi

    To tear down the CSI Hostpath installation, delete the example resources and the deployed driver components (the repo ships a corresponding destroy script alongside the deploy script). And we’re done here.

    Conclusion

    Since Kubernetes has started supporting the raw block device type of persistent volume, the hostpath driver (and any other driver) may also support it; that will be explained in the next part of this blog. In this blog, we got a deep understanding of CSI, its components and services, the new features of CSI, and the problems it solves. The CSI Hostpath driver was explored in depth to experiment with and understand the provisioner and snapshotter flows for PVs and VolumeSnapshots. The PV snapshot, restore, and cloning use cases were also demonstrated.

  • A Beginner’s Guide to Kubernetes Python Client

    A couple of weeks back, we started working with the Kubernetes Python client to carry out basic operations on its components/ resources, and that’s when we realized how few resources there were (guides, docs) on the internet. So, we experimented and decided to share our findings with the community.

    This article is targeted towards an audience that is familiar with Kubernetes, its usage, and its architecture. This is not a simple Kubernetes guide; it’s about Kubernetes using Python, so as we move further, we may shed light on a few things that are required, but a few will be left for self exploration.

    Kubernetes Overview

    Kubernetes is an open-source container orchestration tool, largely used to simplify the process of deployment, maintenance, etc. in application development. Kubernetes is built to offer highly available, scalable, and reliable applications.

    Generally, kubectl commands are used to create, list, and delete the Kubernetes resources, but for this article, we put on a developer’s hat and use the Python way of doing things. In this article, we learn how to create, manage, and interact with Kubernetes resources using the Kubernetes’ Python library.

    But why, you may ask?

    Well, having the option of doing things programmatically creates the potential for endless exciting innovations for developers. Using Python, we can:

    • Create and manage Kubernetes resources dynamically
    • Apply algorithms that change the state or amount of resources in our cluster
    • Build a more robust application with solid alerting and monitoring features

    So, let us begin:

    Kubernetes achieves what it does with the help of its resources. These resources are the building blocks for developing a scalable, reliable application.

    Let’s briefly explore these resources to understand what they are and how exactly they work together in Kubernetes:

    • Node: Simple server, a physical/virtual machine.
    • Pod: Smallest unit of Kubernetes, provides abstraction over a container. Creates a running env/layer on top of the container. Usually runs only one application container but can run multiple as well.
    • Service: Static IP address for the pod. Remains the same even after the pod dies. Also doubles as a load-balancer for multiple pods:
      a) External services are used to make the app accessible through external sources.
      b) Internal services are used when accessibility is to be restricted.
    • Ingress: Additional layer of security and address translation for services. All requests first go to the ingress and are then forwarded to the service.
    • ConfigMap: External configuration of your app, like URLs of databases or other services.
    • Secret: Stores secret/sensitive data like DB credentials, encoded in base64 format.
    • Volumes: Kubernetes does not manage any data persistence on its own. Volumes are used to persist data generated by pods by attaching physical storage to the pod, which can be local or remote (cloud or on-premise servers).
    • Deployment: Defines the blueprint of the pods and their replication factor. This layer of abstraction over pods makes configuration convenient.
    • StatefulSet: Used to create stateful applications while avoiding data inconsistency; otherwise similar to a Deployment.
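    Since Secret values are base64-encoded (not encrypted), encoding and decoding them is straightforward with Python’s standard library; a quick sketch (the credential value is made up):

```python
import base64

# Encode a value the way Kubernetes expects it in a Secret manifest
password = "s3cr3t-db-pass"                        # hypothetical credential
encoded = base64.b64encode(password.encode()).decode()
print(encoded)                                     # base64 string for the manifest

# Decode what `kubectl get secret -o yaml` would show
decoded = base64.b64decode(encoded).decode()
print(decoded)
```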

    These are a few of the basic Kubernetes resources. If you want to explore the rest of these resources, you can click here.

    Apart from these resources, there are also namespaces in Kubernetes. You will come across them quite a few times in this article, so here are some noteworthy points for Kubernetes namespaces:

    Kubernetes namespaces: Used to group/organize resources in the cluster. It is like a virtual cluster inside a cluster. Namespaces are used to:

    1. Structure resources
    2. Avoid conflicts between teams
    3. Share services between different environments
    4. Restrict access
    5. Limit resources

    Setting up the system

    The Kubernetes library comes to our aid with quite a few modules; the ones featured in this article are the client and config modules, which we will be using heavily. So, let’s install the Kubernetes Python client.

    To install the Kubernetes Python client, we make use of Python’s standard package installer pip:

    pip install kubernetes

    For installation from the source, we can refer to this guide from the official Python client git repository. 

    Now that we have the kubernetes package installed, we can import it as:

    from kubernetes import client, config

    Loading cluster configurations

    To load our cluster configurations, we can use one of the following methods:

    config.load_kube_config()       # when running outside the cluster (reads ~/.kube/config)
    # or
    config.load_incluster_config()  # when running inside a pod in the cluster

    Executing one of these will load your cluster configuration, either from the local .kube/config file or, in-cluster, from the pod’s service account.

    Interacting with Kubernetes Resources

    Now that we have loaded the configurations, we can use the client module to interact with the resources.

    Get Resources: kubectl get commands are used to list all kinds of resources in a cluster. For example:

    – List nodes: To list all the nodes in the cluster, we fire following kubectl command:

    kubectl get nodes  # lists all the nodes

    In Python, we instantiate CoreV1Api class from client module:

    v1 = client.CoreV1Api()
    v1.list_node()  
    # returns a JSON with all the info like spec, metadata etc, for each node

    – List namespaces: To list all the namespaces in your cluster (a fresh cluster ships with a few by default):

    kubectl get namespaces  
    #	NAME          		 STATUS   	AGE
    #	default       		 Active   	94d
    #	kube-public   		 Active   	94d
    #	kube-system   		 Active   	94d

    In the Python client, we can achieve the same by:

    v1.list_namespace()
    """
    returns a JSON with all the info like spec, metadata for each namespace
    For eg:
    {'api_version': 'v1',
     'items': [{'api_version': None,
            	'kind': None,
            	'metadata': {'annotations': None,
                         	'cluster_name': None,
                         	'creation_timestamp': datetime.datetime(2021, 2, 11, 11, 29, 32, tzinfo=tzutc()),
                         	'deletion_grace_period_seconds': None,
                         	'deletion_timestamp': None,
                         	'finalizers': None,
                         	'generate_name': None,
                         	'generation': None,
                         	'labels': None,
                         	'managed_fields': [{'api_version': 'v1',
                                             	'fields_type': 'FieldsV1',
                                             	'fields_v1': {'f:status': {'f:phase': {}}},
                                             	'manager': 'kube-apiserver',
                                             	'operation': 'Update',
                                             	'time': datetime.datetime(2021, 2, 11, 11, 29, 32, tzinfo=tzutc())}],
                         	'name': 'default',
                         	'namespace': None,
                         	'owner_references': None,
                         	'resource_version': '199',
                         	'self_link': None,
                         	'uid': '3a362d64-437d-45b5-af19-4af9ae2c75fc'},
            	'spec': {'finalizers': ['kubernetes']},
            	'status': {'conditions': None, 'phase': 'Active'}}],
    'kind': 'NamespaceList',
     'metadata': {'_continue': None,
              	'remaining_item_count': None,
              	'resource_version': '69139',
              	'self_link': None}}
    """

    Similarly, we can list all the resources or resources in a particular namespace.

    For example, to list pods in all namespaces:

    v1.list_pod_for_all_namespaces()
    v1.list_persistent_volume_claim_for_all_namespaces()

    For all the resources that can be grouped within a given namespace, we can use:

    # v1.list_namespaced_pod(<namespace>)
    v1.list_namespaced_pod(namespace='default')
    
    # v1.list_namespaced_service(<namespace>)
    v1.list_namespaced_service(namespace='default')

    and so on.
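    Whichever list_* call you use, the response object carries an `items` list, and names are typically pulled out with a comprehension. Here is a runnable sketch that uses types.SimpleNamespace to stand in for the client’s response objects (no cluster required):

```python
from types import SimpleNamespace

# Stand-ins for the objects a list_* call returns (attribute access, like the real client)
response = SimpleNamespace(items=[
    SimpleNamespace(metadata=SimpleNamespace(name="default")),
    SimpleNamespace(metadata=SimpleNamespace(name="kube-system")),
])

# The same comprehension works on a real v1.list_namespace() response
names = [item.metadata.name for item in response.items]
print(names)  # ['default', 'kube-system']
```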

    Creating Resources: The usual way to create resources in Kubernetes is to use a kubectl create command with required parameters (defaults if not specified) or to use kubectl apply command, which takes a YAML/JSON format configuration file as input. This file contains all the specifications and metadata for the component to be created. For example:

    kubectl create deployment my-nginx-depl --image=nginx
    kubectl apply -f nginx_depl.yaml

    Where the contents of nginx_depl.yaml could be as follows:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
      labels:
        app: nginx
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.14.2
            ports:
            - containerPort: 80

    To create resources in Python, though, we use create functions from the same instance of CoreV1Api class:

    # v1.create_namespaced_pod(<namespace>, <body>)
    # v1.create_namespaced_persistent_volume_claim(<namespace>, <body>)

    So, basically, we just need two things: a string namespace that we want our resource to be in, and a body.

    This body is the same as the config.yaml file that we saw earlier. But how exactly do we create or use that in our code? For this, we utilize the component-specific classes that this library offers.

    Let us take an example: to create a pod, we use the V1Pod class from kubernetes.client.

    An instance of V1Pod takes all the params like kind, metadata, spec, etc., so all we need to do is pass them and we are good. And while we are at it, let’s create the metadata and spec as well, using a couple more classes.

    1. V1ObjectMeta: This takes all the fields that can be part of metadata as parameters, e.g.

    metadata = client.V1ObjectMeta(name='md1')
    
    # We could also set fields by accessing them through instance like:
    metadata.name = 'md2'

    2. V1Container: If you recall the brief definition of Kubernetes pods given earlier, pods are just layers above containers, which means we will have to provide the container(s) that the pod abstracts over. The V1Container class from the Kubernetes client does just what we need.

    These containers run the specified image, with their name taken as a parameter by the object. Containers also have several other parameters, like volume_mounts and ports, that can be passed during instantiation or set later through the object reference.

    We create a container using:

    # container1 = client.V1Container(name=<name>, image=<image>) e.g:
    container1 = client.V1Container(name='my_container', image='nginx')

    Kubernetes pods can have multiple containers running inside, hence the V1PodSpec class expects a list of those while we create a pod spec.

    containers = [container1, container2…]

    3. V1PodSpec: Depending on the component we are working on, the class for its spec and params change. For a pod, we can use V1PodSpec as:

    # pod_spec = client.V1PodSpec(<containers_list>)
    pod_spec = client.V1PodSpec(containers=containers)

    Now that we have both metadata and spec, let’s construct the pod’s body:

    pod_body = client.V1Pod(metadata=metadata, spec=pod_spec, kind='Pod', api_version='v1')

    And then, finally we could pass these to create a pod:

    pod = v1.create_namespaced_pod(namespace='my-namespace', body=pod_body)

    And there you have it, that’s how you create a pod.

    Similarly, we can create other resources, although not all resources take the same set of parameters. For example, a PersistentVolume (PV for short) does not come under namespaces; it is a cluster-wide resource, so naturally it won’t expect a namespace parameter.

    Fetching Logs:

    When it comes to monitoring and debugging Kubernetes’ resources, logs play a major role. Using the Kubernetes Python client, we can fetch logs for resources. For example, to fetch logs for a pod:

    Using kubectl:

    # kubectl logs pod_name
    kubectl logs my-pod

    Using Python:

    # pod_logs = v1.read_namespaced_pod_log(<pod_name>, <namespace>)
    pod_logs = v1.read_namespaced_pod_log(name='my-app', namespace='default')

    Deleting Resources: For deletion, we use the same class that we have been using so far, i.e., kubernetes.client.CoreV1Api.

    There are functions that directly deal with deletion of that component, for example:

    # v1.delete_namespaced_pod(<pod_name>, <namespace>)
    v1.delete_namespaced_pod(name='my-app', namespace='default')

    Pass the required parameters and the deletion will take place as expected.

    Complete Example for creating a Kubernetes Pod:

    from kubernetes import client, config
    
    config.load_kube_config()
    v1 = client.CoreV1Api()
    
    # List existing namespaces and pods
    namespaces_list = v1.list_namespace()
    namespaces = [item.metadata.name for item in namespaces_list.items]
    
    pods_list = v1.list_namespaced_pod(namespace='default')
    pods = [item.metadata.name for item in pods_list.items]
    
    # Build the pod body: container(s) -> spec + metadata -> V1Pod
    containers = []
    container1 = client.V1Container(name='my-nginx-container', image='nginx')
    containers.append(container1)
    
    pod_spec = client.V1PodSpec(containers=containers)
    pod_metadata = client.V1ObjectMeta(name='my-pod', namespace='default')
    
    pod_body = client.V1Pod(api_version='v1', kind='Pod', metadata=pod_metadata, spec=pod_spec)
    
    # Create, read logs from, and delete the pod
    v1.create_namespaced_pod(namespace='default', body=pod_body)
    
    pod_logs = v1.read_namespaced_pod_log(name='my-pod', namespace='default')
    
    v1.delete_namespaced_pod(namespace='default', name='my-pod')

    Conclusion

    There are quite a lot of ways this article could have been written, but as we conclude, it’s quite evident that we have barely scratched the surface. There are many more interesting, advanced things that we can do with this library, but those are beyond the scope of this article.

    We can do almost all the operations with the Python client that we usually do with kubectl on Kubernetes resources. We hope that we managed to keep the content both interesting and informative. 

    If you’re looking for a comprehensive guide on Kubernetes or something interesting to do with it, don’t worry, we’ve got you covered. You can refer to a few of our other articles and might find just what you need:

    1. Kubernetes CSI in Action: Explained with Features and Use Cases

    2. Continuous Deployment with Azure Kubernetes Service, Azure Container Registry & Jenkins

    3. Demystifying High Availability in Kubernetes Using Kubeadm

    References

    1. Official Kubernetes documentation: https://kubernetes.io/docs

    2. Kubernetes Resources: https://kubernetes.io/docs/reference/glossary/?core-object=true

    3. Kubernetes Python client: https://github.com/kubernetes-client/python

    4. kubectl: https://kubernetes.io/docs/reference/kubectl/overview/

  • Kubernetes Migration: How To Move Data Freely Across Clusters

    This blog focuses on migrating Kubernetes clusters from one cloud provider to another. We will be migrating our entire data from Google Kubernetes Engine to Azure Kubernetes Service using Velero.

    Prerequisite

    • A Kubernetes cluster running version 1.10 or later

    Setup Velero with Restic Integration

    Velero consists of a client installed on your local computer and a server that runs in your Kubernetes cluster, much like Helm.

    Installing Velero Client

    You can find the latest release corresponding to your OS and architecture on the Velero releases page and download it from there:

    $ wget https://github.com/vmware-tanzu/velero/releases/download/v1.3.1/velero-v1.3.1-linux-amd64.tar.gz

    Extract the tarball (change the version depending on yours) and move the Velero binary to /usr/local/bin:

    $ tar -xvzf velero-v1.3.1-linux-amd64.tar.gz
    $ sudo mv velero-v1.3.1-linux-amd64/velero /usr/local/bin/
    $ velero help

    Create a Bucket for Velero on GCP

    Velero needs an object storage bucket where it will store the backup. Create a GCS bucket using:

    gsutil mb gs://<bucket-name>

    Create a Service Account for Velero

    # Create a Service Account
    gcloud iam service-accounts create velero --display-name "Velero service account"
    SERVICE_ACCOUNT_EMAIL=$(gcloud iam service-accounts list --filter="displayName:Velero service account" --format 'value(email)')

    # Define Permissions for the Service Account
    ROLE_PERMISSIONS=(
    compute.disks.get
    compute.disks.create
    compute.disks.createSnapshot
    compute.snapshots.get
    compute.snapshots.create
    compute.snapshots.useReadOnly
    compute.snapshots.delete
    compute.zones.get
    )

    # Create a Role for Velero
    PROJECT_ID=$(gcloud config get-value project)

    gcloud iam roles create velero.server \
    --project $PROJECT_ID \
    --title "Velero Server" \
    --permissions "$(IFS=","; echo "${ROLE_PERMISSIONS[*]}")"

    # Create a Role Binding for Velero
    gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$SERVICE_ACCOUNT_EMAIL \
    --role projects/$PROJECT_ID/roles/velero.server

    # Grant the Service Account access to the bucket created earlier
    gsutil iam ch serviceAccount:$SERVICE_ACCOUNT_EMAIL:objectAdmin gs://<bucket-name>

    # Generate Service Key file for Velero and save it for later
    gcloud iam service-accounts keys create credentials-velero \
    --iam-account $SERVICE_ACCOUNT_EMAIL
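The `--permissions` flag above relies on a small shell idiom to join the `ROLE_PERMISSIONS` array into the comma-separated string that gcloud expects. Here is the idiom in isolation, with a shortened, illustrative permission list (this assumes bash, since `gcloud` snippets like the one above typically do):

```shell
# Join a bash array into a comma-separated string. Setting IFS inside
# the $( ... ) subshell scopes the change, so the shell's global IFS
# is left untouched; "${arr[*]}" joins elements with the first IFS char.
ROLE_PERMISSIONS=(
compute.disks.get
compute.disks.create
)
joined=$(IFS=","; echo "${ROLE_PERMISSIONS[*]}")
echo "$joined"   # prints "compute.disks.get,compute.disks.create"
```

Keeping the permissions as an array, one per line, makes the list easy to review and extend, while the join happens only at the point of use.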

    Install Velero Server on GKE and AKS

    Use the --use-restic flag on the Velero install command to install restic integration.

    $ velero install \
    --use-restic \
    --bucket <bucket-name> \
    --provider gcp \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false
    $ velero plugin add velero/velero-plugin-for-gcp:v1.0.1
    $ velero plugin add velero/velero-plugin-for-microsoft-azure:v1.0.0

    After that, you can see a restic DaemonSet and a Velero Deployment in your Kubernetes cluster.

    $ kubectl get po -n velero

    Restic Components

    In addition, there are three more Custom Resource Definitions (CRDs) and their associated controllers that provide restic support.

    Restic Repository

    • Maintains the complete lifecycle of Velero’s restic repositories.
    • Restic lifecycle commands such as restic init, check, and prune are handled by this CRD’s controller.

    PodVolumeBackup

    • This CRD backs up the persistent volume based on the annotated pod in selected namespaces. 
    • This controller executes backup commands on the pod to initialize backups. 

    PodVolumeRestore

    • This controller restores the pod volumes that were included in restic backups, and is responsible for executing the restore commands.

    Backup an application on GKE

    For this blog post, we assume that the Kubernetes cluster already has an application that uses persistent volumes. Alternatively, you can install WordPress as an example, as explained here.

    We will perform GKE Persistent disk migration to Azure Persistent Disk using Velero. 

    Follow the below steps:

    1. To back up a particular persistent volume, check the deployment or statefulset for the name of the volume that is mounted. For example, here the pods need to be annotated with the volume name “data”.
    volumes:
        - name: data
            persistentVolumeClaim:
                claimName: mongodb 

    2. Annotate the pods with the volume names you’d like to take the backup of; only those volumes will be backed up:
    $ kubectl -n NAMESPACE annotate pod/POD_NAME backup.velero.io/backup-volumes=VOLUME_NAME1,VOLUME_NAME2

    For example, 

    $ kubectl -n application annotate pod/wordpress-pod backup.velero.io/backup-volumes=data

    3. Take a backup of the entire namespace in which the application is running. You can also specify multiple namespaces, or skip this flag to back up all namespaces by default. We are going to back up only one namespace in this blog.
    $ velero backup create testbackup --include-namespaces application

    4. Monitor the progress of the backup:
    $ velero backup describe testbackup --details

    Once the backup is complete, you can list it using:

    $ velero backup get

    You can also check the backup on GCP Portal under Storage.
    Select the bucket you created and you should see a similar directory structure:

    Restore the application to AKS

    Follow the below steps to restore the backup:

    1. Make sure the same StorageClass is available on Azure as is used by the GKE Persistent Volumes. For example, if the StorageClass of the PVs is “persistent-ssd”, create the same on AKS using the below template:
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: persistent-ssd # same name as the GKE StorageClass
    provisioner: kubernetes.io/azure-disk
    parameters: 
      storageaccounttype: Premium_LRS
      kind: Managed 

    2. Run Velero restore:
    $ velero restore create testrestore --from-backup testbackup

    You can monitor the progress of the restore:

    $ velero restore describe testrestore --details

    You can also check on the GCP Portal; a new folder “restores” is created under the bucket.

    In some time, you should see that the application namespace is back and the WordPress and MySQL pods are running again.

    Troubleshooting

    For any errors/issues related to Velero, you may find the below commands helpful for debugging purposes:

    # Describe the backup to see the status
    $ velero backup describe testbackup --details
    
    # Check backup logs, and look for errors if any
    $ velero backup logs testbackup
    
    # Describe the restore to see the status
    $ velero restore describe testrestore --details
    
    # Check restore logs, and look for errors if any
    $ velero restore logs testrestore
    
    # Check velero and restic pod logs, and look for errors if any
    $ kubectl -n velero logs VELERO_POD_NAME/RESTIC_POD_NAME
    # NOTE: You can change the default log-level to debug mode by adding --log-level=debug as an argument to the container command in the velero pod template spec.
    
    # Describe the BackupStorageLocation resource and look for any errors in Events
    $ kubectl describe BackupStorageLocation default -n velero

    Conclusion

    Migrating persistent workloads across Kubernetes clusters on different cloud providers is difficult. Velero’s restic integration makes it possible, although, as mentioned on the official site, the integration is still considered beta quality. I performed a GKE to AKS migration and it went through successfully. You can try other combinations of cloud providers for migration.

    The only drawback of using Velero to migrate data is that if your data is very large, the migration may take a while to complete. It took me almost a day to migrate a 350 GB disk from GKE to AKS. But if your data is comparatively small, this should be a very efficient and hassle-free way to migrate it.