Tag: cloud native

  • Mastering Prow: A Guide to Developing Your Own Plugin for Kubernetes CI/CD Workflow

    Continuous Integration and Continuous Delivery (CI/CD) pipelines are essential components of modern software development, especially in the world of Kubernetes and containerized applications. To facilitate these pipelines, many organizations use Prow, a CI/CD system built specifically for Kubernetes. While Prow offers a rich set of features out of the box, you may need to develop your own plugins to tailor the system to your organization’s requirements. In this guide, we’ll explore the world of Prow plugin development and show you how to get started.

    Prerequisites

    Before diving into Prow plugin development, ensure you have the following prerequisites:

    • Basic Knowledge of Kubernetes and CI/CD Concepts: Familiarity with Kubernetes concepts such as Pods, Deployments, and Services, as well as understanding CI/CD principles, will be beneficial for understanding Prow plugin development.
    • Access to a Kubernetes Cluster: You’ll need access to a Kubernetes cluster for testing your plugins. If you don’t have one already, you can set up a local cluster using tools like Minikube or use a cloud provider’s managed Kubernetes service.
    • Prow Setup: Install and configure Prow in your Kubernetes cluster. You can visit Velotio Technologies – Getting Started with Prow: A Kubernetes-Native CI/CD Framework
    • Development Environment Setup: Ensure you have Git, Go, and Docker installed on your local machine for developing and testing Prow plugins. You’ll also need to configure your environment to interact with your organization’s Prow setup.

    The Need for Custom Prow Plugins

    While Prow provides a wide range of built-in plugins, your organization’s Kubernetes workflow may have specific requirements that aren’t covered by these defaults. This is where developing custom Prow plugins comes into play. Custom plugins allow you to extend Prow’s functionality to cater to your needs. Whether automating workflows, integrating with other tools, or enforcing custom policies, developing your own Prow plugins gives you the power to tailor your CI/CD pipeline precisely.

    Getting Started with Prow Plugin Development

    Developing a custom Prow plugin may seem daunting, but with the right approach and tools, it can be a rewarding experience. Here’s a step-by-step guide to get you started:

    1. Set Up Your Development Environment

    Before diving into plugin development, you need to set up your development environment. You will need Git, Go, and access to a Kubernetes cluster for testing your plugins. Ensure you have the necessary permissions to make changes to your organization’s Prow setup.

    2. Choose a Plugin Type

    Prow supports various plugin types, including postsubmits, presubmits, triggers, and utilities. Choose the type that best fits your use case.

    • Postsubmits: These plugins are executed after the code is merged and are often used for tasks like publishing artifacts or creating release notes.
    • Presubmits: Presubmit plugins run before code is merged, typically used for running tests and ensuring code quality.
    • Triggers: Trigger plugins allow you to trigger custom jobs based on specific events or criteria.
    • Utilities: Utility plugins offer reusable functions and utilities for other plugins.

    3. Create Your Plugin

    Once you’ve chosen a plugin type, it’s time to create it. Below is an example of a simple Prow plugin written in Go, named comment-plugin.go. It will create a comment on a pull request each time an event is received.

    This code sets up a basic HTTP server that listens for GitHub events and handles them by creating a comment using the GitHub API. Customize this code to fit your specific use case.

    package main
    
    import (
        "encoding/json"
        "flag"
        "net/http"
        "os"
        "strconv"
        "time"
    
        "github.com/sirupsen/logrus"
        "k8s.io/test-infra/pkg/flagutil"
        "k8s.io/test-infra/prow/config"
        "k8s.io/test-infra/prow/config/secret"
        prowflagutil "k8s.io/test-infra/prow/flagutil"
        configflagutil "k8s.io/test-infra/prow/flagutil/config"
        "k8s.io/test-infra/prow/github"
        "k8s.io/test-infra/prow/interrupts"
        "k8s.io/test-infra/prow/logrusutil"
        "k8s.io/test-infra/prow/pjutil"
        "k8s.io/test-infra/prow/pluginhelp"
        "k8s.io/test-infra/prow/pluginhelp/externalplugins"
    )
    
    const pluginName = "comment-plugin"
    
    type options struct {
        port int
    
        config                 configflagutil.ConfigOptions
        dryRun                 bool
        github                 prowflagutil.GitHubOptions
        instrumentationOptions prowflagutil.InstrumentationOptions
    
        webhookSecretFile string
    }
    
    type server struct {
        tokenGenerator func() []byte
        botUser        *github.UserData
        email          string
        ghc            github.Client
        log            *logrus.Entry
        repos          []github.Repo
    }
    
    func helpProvider(_ []config.OrgRepo) (*pluginhelp.PluginHelp, error) {
        pluginHelp := &pluginhelp.PluginHelp{
           Description: `The sample plugin`,
        }
        return pluginHelp, nil
    }
    
    func (o *options) Validate() error {
        return nil
    }
    
    func gatherOptions() options {
        o := options{config: configflagutil.ConfigOptions{ConfigPath: "./config.yaml"}}
        fs := flag.NewFlagSet(os.Args[0], flag.ExitOnError)
        fs.IntVar(&o.port, "port", 8888, "Port to listen on.")
        fs.BoolVar(&o.dryRun, "dry-run", false, "Dry run for testing. Uses API tokens but does not mutate.")
        fs.StringVar(&o.webhookSecretFile, "hmac-secret-file", "/etc/hmac", "Path to the file containing GitHub HMAC secret.")
        for _, group := range []flagutil.OptionGroup{&o.github} {
           group.AddFlags(fs)
        }
        fs.Parse(os.Args[1:])
        return o
    }
    
    func main() {
        o := gatherOptions()
        if err := o.Validate(); err != nil {
           logrus.Fatalf("Invalid options: %v", err)
        }
    
        logrusutil.ComponentInit()
        log := logrus.StandardLogger().WithField("plugin", pluginName)
    
        if err := secret.Add(o.webhookSecretFile); err != nil {
           logrus.WithError(err).Fatal("Error starting secrets agent.")
        }
    
        gitHubClient, err := o.github.GitHubClient(o.dryRun)
        if err != nil {
           logrus.WithError(err).Fatal("Error getting GitHub client.")
        }
    
        email, err := gitHubClient.Email()
        if err != nil {
           log.WithError(err).Fatal("Error getting bot e-mail.")
        }
    
        botUser, err := gitHubClient.BotUser()
        if err != nil {
           logrus.WithError(err).Fatal("Error getting bot name.")
        }
        repos, err := gitHubClient.GetRepos(botUser.Login, true)
        if err != nil {
           log.WithError(err).Fatal("Error listing bot repositories.")
        }
        serv := &server{
           tokenGenerator: secret.GetTokenGenerator(o.webhookSecretFile),
           botUser:        botUser,
           email:          email,
           ghc:            gitHubClient,
           log:            log,
           repos:          repos,
        }
    
        health := pjutil.NewHealthOnPort(o.instrumentationOptions.HealthPort)
        health.ServeReady()
    
        mux := http.NewServeMux()
        mux.Handle("/", serv)
        externalplugins.ServeExternalPluginHelp(mux, log, helpProvider)
        logrus.Info("starting server " + strconv.Itoa(o.port))
        httpServer := &http.Server{Addr: ":" + strconv.Itoa(o.port), Handler: mux}
        defer interrupts.WaitForGracefulShutdown()
        interrupts.ListenAndServe(httpServer, 5*time.Second)
    }
    
    func (s *server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        logrus.Info("inside http server")
        _, _, payload, ok, _ := github.ValidateWebhook(w, r, s.tokenGenerator)
        logrus.Info(string(payload))
        if !ok {
           return
        }
        logrus.Info(w, "Event received. Have a nice day.")
        if err := s.handleEvent(payload); err != nil {
           logrus.WithError(err).Error("Error parsing event.")
        }
    }
    
    func (s *server) handleEvent(payload []byte) error {
        logrus.Info("inside handler")
        var pr github.PullRequestEvent
        if err := json.Unmarshal(payload, &pr); err != nil {
           return err
        }
        logrus.Info(pr.Number)
        if err := s.ghc.CreateComment(pr.PullRequest.Base.Repo.Owner.Login, pr.PullRequest.Base.Repo.Name, pr.Number, "comment from smaple-plugin"); err != nil {
           return err
        }
        return nil
    }

    4. Deploy Your Plugin

    To deploy your custom Prow plugin, you will need to create a Docker image and deploy it into your Prow cluster.

    FROM golang as app-builder
    WORKDIR /app
    RUN apt  update
    RUN apt-get install git
    COPY . .
    RUN CGO_ENABLED=0 go build -o main
    
    FROM alpine:3.9
    RUN apk add ca-certificates git
    COPY --from=app-builder /app/main /app/custom-plugin
    ENTRYPOINT ["/app/custom-plugin"]

    docker build -t jainbhavya65/custom-plugin:v1 .

    docker push jainbhavya65/custom-plugin:v1

    Deploy the Docker image using Kubernetes deployment:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: comment-plugin
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: comment-plugin
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: comment-plugin
        spec:
          containers:
          - args:
            - --github-token-path=/etc/github/oauth
            - --hmac-secret-file=/etc/hmac-token/hmac
            - --port=80
            image: <IMAGE>
            imagePullPolicy: Always
            name: comment-plugin
            ports:
            - containerPort: 80
              protocol: TCP
            volumeMounts:
            - mountPath: /etc/github
              name: oauth
              readOnly: true
            - mountPath: /etc/hmac-token
              name: hmac
              readOnly: true
          volumes:
          - name: oauth
            secret:
              defaultMode: 420
              secretName: oauth-token
          - name: hmac
            secret:
              defaultMode: 420
              secretName: hmac-token

    Create a service for deployment:
    apiVersion: v1
    kind: Service
    metadata:
      name: comment-plugin
    spec:
      ports:
      - port: 80
        protocol: TCP
        targetPort: 80
      selector:
        app: comment-plugin
      sessionAffinity: None
      type: ClusterIP
    view raw

    After creating the deployment and service, integrate it into your organization’s Prow configuration. This involves updating your Prow plugin.yaml files to include your plugin and specify when it should run.

    external_plugins: 
    - name: comment-plugin
      # No endpoint specified implies "http://{{name}}". // as we deploy plugin into same cluster
      # if plugin is not deployed in same cluster then you can give endpoint
      events:
      # only pull request and issue comment events are send to our plugin
      - pull_request
      - issue_comment

    Conclusion

    Mastering Prow plugin development opens up a world of possibilities for tailoring your Kubernetes CI/CD workflow to meet your organization’s needs. While the initial learning curve may be steep, the benefits of custom plugins in terms of automation, efficiency, and control are well worth the effort.

    Remember that the key to successful Prow plugin development lies in clear documentation, thorough testing, and collaboration with your team to ensure that your custom plugins enhance your CI/CD pipeline’s functionality and reliability. As Kubernetes and containerized applications continue to evolve, Prow will remain a valuable tool for managing your CI/CD processes, and your custom plugins will be the secret sauce that sets your workflow apart from the rest.

  • Machine Learning for your Infrastructure: Anomaly Detection with Elastic + X-Pack

    Introduction

    The world continues to go through digital transformation at an accelerating pace. Modern applications and infrastructure continues to expand and operational complexity continues to grow. According to a recent ManageEngine Application Performance Monitoring Survey:

    • 28 percent use ad-hoc scripts to detect issues in over 50 percent of their applications.
    • 32 percent learn about application performance issues from end users.
    • 59 percent trust monitoring tools to identify most performance deviations.

    Most enterprises and web-scale companies have instrumentation & monitoring capabilities with an ElasticSearch cluster. They have a high amount of collected data but struggle to use it effectively. This available data can be used to improve availability and effectiveness of performance and uptime along with root cause analysis and incident prediction

    IT Operations & Machine Learning

    Here is the main question: How to make sense of the huge piles of collected data? The first step towards making sense of data is to understand the correlations between the time series data. But only understanding will not work since correlation does not imply causation. We need a practical and scalable approach to understand the cause-effect relationship between data sources and events across complex infrastructure of VMs, containers, networks, micro-services, regions, etc.

    It’s very likely that due to one component something goes wrong with another component. In such cases, operational historical data can be used to identify the root cause by investigating through a series of intermediate causes and effects. Machine learning is particularly useful for such problems where we need to identify “what changed”, since machine learning algorithms can easily analyze existing data to understand the patterns, thus making easier to recognize the cause. This is known as unsupervised learning, where the algorithm learns from the experience and identifies similar patterns when they come along again.

    Let’s see how you can setup Elastic + X-Pack to enable anomaly detection for your infrastructure & applications.

    Anomaly Detection using Elastic’s machine learning with X-Pack

    Step I: Setup

    1. Setup Elasticsearch: 

    According to Elastic documentation, it is recommended to use the Oracle JDK version 1.8.0_131. Check if you have required Java version installed on your system. It should be at least Java 8, if required install/upgrade accordingly.

    • Download elasticsearch tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.tar.gz
    $ tar -xzvf elasticsearch-5.5.1.tar.gz

    • It will then create a folder named elasticsearch-5.5.1. Go into the folder.
    $ cd elasticsearch-5.5.1

    • Install X-Pack into Elasticsearch
    $ ./bin/elasticsearch-plugin install x-pack

    • Start elasticsearch
    $ bin/elasticsearch

    2. Setup Kibana

    Kibana is an open source analytics and visualization platform designed to work with Elasticsearch.

    • Download kibana tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/kibana/kibana-5.5.1-linux-x86_64.tar.gz
    $ tar -xzf kibana-5.5.1-linux-x86_64.tar.gz

    • It will then create a folder named kibana-5.5.1. Go into the directory.
    $ cd kibana-5.5.1-linux-x86_64

    • Install X-Pack into Kibana
    $ ./bin/kibana-plugin install x-pack

    • Running kibana
    $ ./bin/kibana

    • Navigate to Kibana at http://localhost:5601/
    • Log in as the built-in user elastic and password changeme.
    • You will see the below screen:
    Kibana: X-Pack Welcome Page

     

    3. Metricbeat:

    Metricbeat helps in monitoring servers and the services they host by collecting metrics from the operating system and services. We will use it to get CPU utilization metrics of our local system in this blog.

    • Download Metric Beat’s tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-5.5.1-linux-x86_64.tar.gz
    $ tar -xvzf metricbeat-5.5.1-linux-x86_64.tar.gz

    • It will create a folder metricbeat-5.5.1-linux-x86_64. Go to the folder
    $ cd metricbeat-5.5.1-linux-x86_64

    • By default, Metricbeat is configured to send collected data to elasticsearch running on localhost. If your elasticsearch is hosted on any server, change the IP and authentication credentials in metricbeat.yml file.
     Metricbeat Config

     

    • Metric beat provides following stats:
    • System load
    • CPU stats
    • IO stats
    • Per filesystem stats
    • Per CPU core stats
    • File system summary stats
    • Memory stats
    • Network stats
    • Per process stats
    • Start Metricbeat as daemon process
    $ sudo ./metricbeat -e -c metricbeat.yml &

    Now, all setup is done. Let’s go to step 2 to create machine learning jobs. 

    Step II: Time Series data

    • Real-time data: We have metricbeat providing us the real-time series data which will be used for unsupervised learning. Follow below steps to define index pattern metricbeat-*  in Kibana to search against this pattern in Elasticsearch:
      – Go to Management -> Index Patterns  
      – Provide Index name or pattern as metricbeat-*
      – Select Time filter field name as @timestamp
      – Click Create

    You will not be able to create an index if elasticsearch did not contain any metric beat data. Make sure your metric beat is running and output is configured as elasticsearch.

    • Saved Historic data: Just to see quickly how machine learning detect the anomalies you can also use data provided by Elastic. Download sample data by clicking here.
    • Unzip the files in a folder: tar -zxvf server_metrics.tar.gz
    • Download this script. It will be used to upload sample data to elastic.
    • Provide execute permissions to the file: chmod +x upload_server-metrics.sh
    • Run the script.
    • As we created index pattern for metricbeat data, in same way create index pattern server-metrics*

    Step III: Creating Machine Learning jobs

    There are two scenarios in which data is considered anomalous. First, when the behavior of key indicator changes over time relative to its previous behavior. Secondly, when within a population behavior of an entity deviates from other entities in population over single key indicator.

    To detect these anomalies, there are three types of jobs we can create:

    1. Single Metric job: This job is used to detect Scenario 1 kind of anomalies over only one key performance indicator.
    2. Multimetric job: Multimetric job also detects Scenario 1 kind of anomalies but in this type of job we can track more than one performance indicators, such as CPU utilization along with memory utilization.
    3. Advanced job: This kind of job is created to detect anomalies of type 2.

    For simplicity, we are creating following single metric jobs:

    1. Tracking CPU Utilization: Using metric beat data
    2. Tracking total requests made on server: Using sample server data

    Follow below steps to create single metric jobs:

    Job1: Tracking CPU Utilization

    Job2: Tracking total requests made on server

    • Go to http://localhost:5601/
    • Go to Machine learning tab on the left panel of Kibana.
    • Click on Create new job
    • Click Create single metric job
    • Select index we created in Step 2 i.e. metricbeat-* and server-metrics* respectively
    • Configure jobs by providing following values:
    1. Aggregation: Here you need to select an aggregation function that will be applied to a particular field of data we are analyzing.
    2. Field: It is a drop down, will show you all field that you have w.r.t index pattern.
    3. Bucket span: It is interval time for analysis. Aggregation function will be applied on selected field after every interval time specified here.
    • If your data contains so many empty buckets i.e. data is sparse and you don’t want to consider it as anomalous check the checkbox named sparse data  (if it appears).
    • Click on Use full <index pattern=””> data to use all available data for analysis.</index>
    Metricbeats Description
    Server Description
    • Click on play symbol
    • Provide job name and description
    • Click on Create Job

    After creating job the data available will be analyzed. Click on view results, you will see a chart which will show the actual and upper & lower bound of predicted value. If actual value lies outside of the range, it will be considered as anomalous. The Color of the circles represents the severity level.

    Here we are getting a high range of prediction values since it just started learning. As we get more data the prediction will get better.
    You can see here predictions are pretty good since there is a lot of data to understand the pattern
    • Click on machine learning tab in the left panel. The jobs we created will be listed here.
    • You will see the list of actions for every job you have created.
    • Since we are storing every minute data for Job1 using metricbeat. We can feed the data to the job in real time. Click on play button to start data feed. As we get more and more data prediction will improve.
    • You see details of anomalies by clicking Anomaly Viewer.
    Anomaly in metricbeats data
    Server metrics anomalies  

    We have seen how machine learning can be used to get patterns among the different statistics along with anomaly detection. After identifying anomalies, it is required to find the context of those events. For example, to know about what other factors are contributing to the problem? In such cases, we can troubleshoot by creating multimetric jobs.

  • Automated Containerization and Migration of On-premise Applications to Cloud Platforms

    Containerized applications are becoming more popular with each passing year. All enterprise applications are adopting container technology as they modernize their IT systems. Migrating your applications from VMs or physical machines to containers comes with multiple advantages like optimal resource utilization, faster deployment times, replication, quick cloning, lesser lock-in and so on. Various container orchestration platforms like Kubernetes, Google Container Engine (GKE), Amazon EC2 Container Service (Amazon ECS) help in quick deployment and easy management of your containerized applications. But in order to use these platforms, you need to migrate your legacy applications to containers or rewrite/redeploy your applications from scratch with the containerization approach. Rearchitecting your applications using containerization approach is preferable, but is that possible for complex legacy applications? Is your deployment team capable enough to list down each and every detail about the deployment process of your application? Do you have the patience of authoring a Docker file for each of the components of your complex application stack?

    Automated migrations!

    Velotio has been helping customers with automated migration of VMs and bare-metal servers to various container platforms. We have developed automation to convert these migrated applications as containers on various container deployment platforms like GKE, Amazon ECS and Kubernetes. In this blog post, we will cover one such migration tool developed at Velotio which will migrate your application running on a VM or physical machine to Google Container Engine (GKE) by running a single command.

    Migration tool details

    We have named our migration tool as A2C(Anything to Container). It can migrate applications running on any Unix or Windows operating system. 

    The migration tool requires the following information about the server to be migrated:

    • IP of the server
    • SSH User, SSH Key/Password of the application server
    • Configuration file containing data paths for application/database/components (more details below)
    • Required name of your docker image (The docker image that will get created for your application)
    • GKE Container Cluster details

    In order to store persistent data, volumes can be defined in container definition. Data changes done on volume path remain persistent even if the container is killed or crashes. Volumes are basically filesystem path from host machine on which your container is running, NFS or cloud storage. Containers will mount the filesystem path from your local machine to container, leading to data changes being written on the host machine filesystem instead of the container’s filesystem. Our migration tool supports data volumes which can be defined in the configuration file. It will automatically create disks for the defined volumes and copy data from your application server to these disks in a consistent way.

    The configuration file we have been talking about is basically a YAML file containing filesystem level information about your application server. A sample of this file can be found below:

    includes:
    - /
    volumes:
    - var/log/httpd
    - var/log/mariadb
    - var/www/html
    - var/lib/mysql
    excludes:
    - mnt
    - var/tmp
    - etc/fstab
    - proc
    - tmp

    The configuration file contains 3 sections: includes, volumes and excludes:

    • Includes contains filesystem paths on your application server which you want to add to your container image.
    • Volumes contain filesystem paths on your application server which stores your application data. Generally, filesystem paths containing database files, application code files, configuration files, log files are good candidates for volumes.
    • The excludes section contains filesystem paths which you don’t want to make part of the container. This may include temporary filesystem paths like /proc, /tmp and also NFS mounted paths. Ideally, you would include everything by giving “/” in includes section and exclude specifics in exclude section.

    Docker image name to be given as input to the migration tool is the docker registry path in which the image will be stored, followed by the name and tag of the image. Docker registry is like GitHub of docker images, where you can store all your images. Different versions of the same image can be stored by giving version specific tag to the image. GKE also provides a Docker registry. Since in this demo we are migrating to GKE, we will also store our image to GKE registry.

    GKE container cluster details to be given as input to the migration tool, contains GKE specific details like GKE project name, GKE container cluster name and GKE region name. A container cluster can be created in GKE to host the container applications. We have a separate set of scripts to perform cluster creation operation. Container cluster creation can also be done easily through GKE UI. For now, we will assume that we have a 3 node cluster created in GKE, which we will use to host our application.

    Tasks performed under migration

    Our migration tool (A2C), performs the following set of activities for migrating the application running on a VM or physical machine to GKE Container Cluster:

    1. Install the A2C migration tool with all it’s dependencies to the target application server

    2. Create a docker image of the application server, based on the filesystem level information given in the configuration file

    3. Capture metadata from the application server like configured services information, port usage information, network configuration, external services, etc.

    4.  Push the docker image to GKE container registry

    5. Create disk in Google Cloud for each volume path defined in configuration file and prepopulate disks with data from application server

    6. Create deployment spec for the container application in GKE container cluster, which will open the required ports, configure required services, add multi container dependencies, attach the pre populated disks to containers, etc.

    7. Deploy the application, after which you will have your application running as containers in GKE with application software in running state. New application URL’s will be given as output.

    8. Load balancing, HA will be configured for your application.

    Demo

    For demonstration purpose, we will deploy a LAMP stack (Apache+PHP+Mysql) on a CentOS 7 VM and will run the migration utility for the VM, which will migrate the application to our GKE cluster. After the migration we will show our application preconfigured with the same data as on our VM, running on GKE.

    Step 1

    We setup LAMP stack using Apache, PHP and Mysql on a CentOS 7 VM in GCP. The PHP application can be used to list, add, delete or edit user data. The data is getting stored in MySQL database. We added some data to the database using the application and the UI would show the following:

    Step 2

    Now we run the A2C migration tool, which will migrate this application stack running on a VM into a container and auto-deploy it to GKE.

    # ./migrate.py -c lamp_data_handler.yml -d "tcp://35.202.201.247:4243" -i migrate-lamp -p glassy-chalice-XXXXX -u root -k ~/mykey -l a2c-host --gcecluster a2c-demo --gcezone us-central1-b 130.211.231.58

    Pushing converter binary to target machine
    Pushing data config to target machine
    Pushing installer script to target machine
    Running converter binary on target machine
    [130.211.231.58] out: creating docker image
    [130.211.231.58] out: image created with id 6dad12ba171eaa8615a9c353e2983f0f9130f3a25128708762228f293e82198d
    [130.211.231.58] out: Collecting metadata for image
    [130.211.231.58] out: Generating metadata for cent7
    [130.211.231.58] out: Building image from metadata
    Pushing the docker image to GCP container registryInitiate remote data copy
    Activated service account credentials for: [glassy-chaliceXXXXX@appspot.gserviceaccount.com]
    for volume var/log/httpd
    Creating disk migrate-lamp-0
    Disk Created Successfully
    transferring data from sourcefor volume var/log/mariadb
    Creating disk migrate-lamp-1
    Disk Created Successfully
    transferring data from sourcefor volume var/www/html
    Creating disk migrate-lamp-2
    Disk Created Successfully
    transferring data from sourcefor volume var/lib/mysql
    Creating disk migrate-lamp-3
    Disk Created Successfully
    transferring data from sourceConnecting to GCP cluster for deployment
    Created service file /tmp/gcp-service.yaml
    Created deployment file /tmp/gcp-deployment.yaml

    Deploying to GKE

    $ kubectl get pod
    
    NAMEREADY STATUSRESTARTS AGE
    migrate-lamp-3707510312-6dr5g 0/1 ContainerCreating 058s

    $ kubectl get deployment
    
    NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
    migrate-lamp 1 1 10 1m

    $ kubectl get service
    
    NAME CLUSTER-IP EXTERNAL-IP PORT(S)AGE
    kubernetes 10.59.240.1443/TCP23hmigrate-lamp 10.59.248.44 35.184.53.100 3306:31494/TCP,80:30909/TCP,22:31448/TCP 53s

    You can access your application using above connection details!

    Step 3

    Access LAMP stack on GKE using the IP 35.184.53.100 on default 80 port as was done on the source machine.

    Here is the Docker image being created in GKE Container Registry:

    We can also see that disks were created with migrate-lamp-x, as part of this automated migration.

    Load Balancer also got provisioned in GCP as part of the migration process

    Following service files and deployment files were created by our migration tool to deploy the application on GKE:

    # cat /tmp/gcp-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
    labels:
    app: migrate-lamp
    name: migrate-lamp
    spec:
    ports:
    - name: migrate-lamp-3306
    port: 3306
    - name: migrate-lamp-80
    port: 80
    - name: migrate-lamp-22
    port: 22
    selector:
    app: migrate-lamp
    type: LoadBalancer

    # cat /tmp/gcp-deployment.yaml
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
    labels:
    app: migrate-lamp
    name: migrate-lamp
    spec:
    replicas: 1
    selector:
    matchLabels:
    app: migrate-lamp
    template:
    metadata:
    labels:
    app: migrate-lamp
    spec:
    containers:
    - image: us.gcr.io/glassy-chalice-129514/migrate-lamp
    name: migrate-lamp
    ports:
    - containerPort: 3306
    - containerPort: 80
    - containerPort: 22
    securityContext:
    privileged: true
    volumeMounts:
    - mountPath: /var/log/httpd
    name: migrate-lamp-var-log-httpd
    - mountPath: /var/www/html
    name: migrate-lamp-var-www-html
    - mountPath: /var/log/mariadb
    name: migrate-lamp-var-log-mariadb
    - mountPath: /var/lib/mysql
    name: migrate-lamp-var-lib-mysql
    volumes:
    - gcePersistentDisk:
    fsType: ext4
    pdName: migrate-lamp-0
    name: migrate-lamp-var-log-httpd
    - gcePersistentDisk:
    fsType: ext4
    pdName: migrate-lamp-2
    name: migrate-lamp-var-www-html
    - gcePersistentDisk:
    fsType: ext4
    pdName: migrate-lamp-1
    name: migrate-lamp-var-log-mariadb
    - gcePersistentDisk:
    fsType: ext4
    pdName: migrate-lamp-3
    name: migrate-lamp-var-lib-mysql

    Conclusion

    Migrations are always hard for IT and development teams. At Velotio, we have been helping customers to migrate to cloud and container platforms using streamlined processes and automation. Feel free to reach out to us at contact@rsystems.com to know more about our cloud and container adoption/migration offerings.

  • A Primer on HTTP Load Balancing in Kubernetes using Ingress on Google Cloud Platform

    Containerized applications and Kubernetes adoption in cloud environments is on the rise. One of the challenges while deploying applications in Kubernetes is exposing these containerized applications to the outside world. This blog explores different options via which applications can be externally accessed with focus on Ingress – a new feature in Kubernetes that provides an external load balancer. This blog also provides a simple hand-on tutorial on Google Cloud Platform (GCP).  

    Ingress is the new feature (currently in beta) from Kubernetes which aspires to be an Application Load Balancer intending to simplify the ability to expose your applications and services to the outside world. It can be configured to give services externally-reachable URLs, load balance traffic, terminate SSL, offer name based virtual hosting etc. Before we dive into Ingress, let’s look at some of the alternatives currently available that help expose your applications, their complexities/limitations and then try to understand Ingress and how it addresses these problems.

    Current ways of exposing applications externally:

    There are certain ways using which you can expose your applications externally. Lets look at each of them:

    EXPOSE Pod:

    You can expose your application directly from your pod by using a port from the node which is running your pod, mapping that port to a port exposed by your container and using the combination of your HOST-IP:HOST-PORT to access your application externally. This is similar to what you would have done when running docker containers directly without using Kubernetes. Using Kubernetes you can use hostPortsetting in service configuration which will do the same thing. Another approach is to set hostNetwork: true in service configuration to use the host’s network interface from your pod.

    Limitations:

    • In both scenarios you should take extra care to avoid port conflicts at the host, and possibly some issues with packet routing and name resolutions.
    • This would limit running only one replica of the pod per cluster node as the hostport you use is unique and can bind with only one service.

    EXPOSE Service:

    Kubernetes services primarily work to interconnect different pods which constitute an application. You can scale the pods of your application very easily using services. Services are not primarily intended for external access, but there are some accepted ways to expose services to the external world.

    Basically, services provide a routing, balancing and discovery mechanism for the pod’s endpoints. Services target pods using selectors, and can map container ports to service ports. A service exposes one or more ports, although usually, you will find that only one is defined.

    A service can be exposed using 3 ServiceType choices:

    • ClusterIP: Exposes the service on a cluster-internal IP. Choosing this value makes the service only reachable from within the cluster. This is the default ServiceType.
    • NodePort: Exposes the service on each Node’s IP at a static port (the NodePort). A ClusterIP service, to which the NodePort service will route, is automatically created. You’ll be able to contact the NodePort service, from outside the cluster, by requesting <nodeip>:<nodeport>.Here NodePort remains fixed and NodeIP can be any node IP of your Kubernetes cluster.</nodeport></nodeip>
    • LoadBalancer: Exposes the service externally using a cloud provider’s load balancer (eg. AWS ELB). NodePort and ClusterIP services, to which the external load balancer will route, are automatically created.
    • ExternalName: Maps the service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up. This requires version 1.7 or higher of kube-dns

    Limitations:

    • If we choose NodePort to expose our services, kubernetes will generate ports corresponding to the ports of your pods in the range of 30000-32767. You will need to add an external proxy layer that uses DNAT to expose more friendly ports. The external proxy layer will also have to take care of load balancing so that you leverage the power of your pod replicas. Also it would not be easy to add TLS or simple host header routing rules to the external service.
    • ClusterIP and ExternalName similarly while easy to use have the limitation where we can add any routing or load balancing rules.
    • Choosing LoadBalancer is probably the easiest of all methods to get your service exposed to the internet. The problem is that there is no standard way of telling a Kubernetes service about the elements that a balancer requires, again TLS and host headers are left out. Another limitation is reliance on an external load balancer (AWS’s ELB, GCP’s Cloud Load Balancer etc.)

    Endpoints

    Endpoints are usually automatically created by services, unless you are using headless services and adding the endpoints manually. An endpoint is a host:port tuple registered at Kubernetes, and in the service context it is used to route traffic. The service tracks the endpoints as pods, that match the selector are created, deleted and modified. Individually, endpoints are not useful to expose services, since they are to some extent ephemeral objects.

    Summary

    If you can rely on your cloud provider to correctly implement the LoadBalancer for their API, to keep up-to-date with Kubernetes releases, and you are happy with their management interfaces for DNS and certificates, then setting up your services as type LoadBalancer is quite acceptable.

    On the other hand, if you want to manage load balancing systems manually and set up port mappings yourself, NodePort is a low-complexity solution. If you are directly using Endpoints to expose external traffic, perhaps you already know what you are doing (but consider that you might have made a mistake, there could be another option).

    Given that none of these elements has been originally designed to expose services to the internet, their functionality may seem limited for this purpose.

    Understanding Ingress

    Traditionally, you would create a LoadBalancer service for each public application you want to expose. Ingress gives you a way to route requests to services based on the request host or path, centralizing a number of services into a single entrypoint.

    Ingress is split up into two main pieces. The first is an Ingress resource, which defines how you want requests routed to the backing services and second is the Ingress Controller which does the routing and also keeps track of the changes on a service level.

    Ingress Resources

    The Ingress resource is a set of rules that map to Kubernetes services. Ingress resources are defined purely within Kubernetes as an object that other entities can watch and respond to.

    Ingress Supports defining following rules in beta stage:

    • host header:  Forward traffic based on domain names.
    • paths: Looks for a match at the beginning of the path.
    • TLS: If the ingress adds TLS, HTTPS and a certificate configured through a secret will be used.

    When no host header rules are included at an Ingress, requests without a match will use that Ingress and be mapped to the backend service. You will usually do this to send a 404 page to requests for sites/paths which are not sent to the other services. Ingress tries to match requests to rules, and forwards them to backends, which are composed of a service and a port.

    Ingress Controllers

    Ingress controller is the entity which grants (or remove) access, based on the changes in the services, pods and Ingress resources. Ingress controller gets the state change data by directly calling Kubernetes API.

    Ingress controllers are applications that watch Ingresses in the cluster and configure a balancer to apply those rules. You can configure any of the third party balancers like HAProxy, NGINX, Vulcand or Traefik to create your version of the Ingress controller.  Ingress controller should track the changes in ingress resources, services and pods and accordingly update configuration of the balancer.

    Ingress controllers will usually track and communicate with endpoints behind services instead of using services directly. This way some network plumbing is avoided, and we can also manage the balancing strategy from the balancer. Some of the open source implementations of Ingress Controllers can be found here.

    Now, let’s do an exercise of setting up a HTTP Load Balancer using Ingress on Google Cloud Platform (GCP), which has already integrated the ingress feature in it’s Container Engine (GKE) service.

    Ingress-based HTTP Load Balancer in Google Cloud Platform

    The tutorial assumes that you have your GCP account setup done and a default project created. We will first create a Container cluster, followed by deployment of a nginx server service and an echoserver service. Then we will setup an ingress resource for both the services, which will configure the HTTP Load Balancer provided by GCP

    Basic Setup

    Get your project ID by going to the “Project info” section in your GCP dashboard. Start the Cloud Shell terminal, set your project id and the compute/zone in which you want to create your cluster.

    $ gcloud config set project glassy-chalice-129514$ 
    gcloud config set compute/zone us-east1-d
    # Create a 3 node cluster with name “loadbalancedcluster”$ 
    gcloud container clusters create loadbalancedcluster  

    Fetch the cluster credentials for the kubectl tool:

    $ gcloud container clusters get-credentials loadbalancedcluster --zone us-east1-d --project glassy-chalice-129514

    Step 1: Deploy an nginx server and echoserver service

    $ kubectl run nginx --image=nginx --port=80
    $ kubectl run echoserver --image=gcr.io/google_containers/echoserver:1.4 --port=8080
    $ kubectl get deployments
    NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    echoserver   1         1         1            1           15s
    nginx        1         1         1            1           26m

    Step 2: Expose your nginx and echoserver deployment as a service internally

    Create a Service resource to make the nginx and echoserver deployment reachable within your container cluster:

    $ kubectl expose deployment nginx --target-port=80  --type=NodePort
    $ kubectl expose deployment echoserver --target-port=8080 --type=NodePort

    When you create a Service of type NodePort with this command, Container Engine makes your Service available on a randomly-selected high port number (e.g. 30746) on all the nodes in your cluster. Verify the Service was created and a node port was allocated:

    $ kubectl get service nginx
    NAME      CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
    nginx     10.47.245.54   <nodes>       80:30746/TCP   20s
    $ kubectl get service echoserver
    NAME         CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
    echoserver   10.47.251.9   <nodes>       8080:32301/TCP   33s

    In the output above, the node port for the nginx Service is 30746 and for echoserver service is 32301. Also, note that there is no external IP allocated for this Services. Since the Container Engine nodes are not externally accessible by default, creating this Service does not make your application accessible from the Internet. To make your HTTP(S) web server application publicly accessible, you need to create an Ingress resource.

    Step 3: Create an Ingress resource

    On Container Engine, Ingress is implemented using Cloud Load Balancing. When you create an Ingress in your cluster, Container Engine creates an HTTP(S) load balancer and configures it to route traffic to your application. Container Engine has internally defined an Ingress Controller, which takes the Ingress resource as input for setting up proxy rules and talk to Kubernetes API to get the service related information.

    The following config file defines an Ingress resource that directs traffic to your nginx and echoserver server:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata: 
    name: fanout-ingress
    spec: 
    rules: 
    - http:     
    paths:     
    - path: /       
    backend:         
    serviceName: nginx         
    servicePort: 80     
    - path: /echo       
    backend:         
    serviceName: echoserver         
    servicePort: 8080

    To deploy this Ingress resource run in the cloud shell:

    $ kubectl apply -f basic-ingress.yaml

    Step 4: Access your application

    Find out the external IP address of the load balancer serving your application by running:

    $ kubectl get ingress fanout-ingres
    NAME             HOSTS     ADDRESS          PORTS     AG
    fanout-ingress   *         130.211.36.168   80        36s    

     

    Use http://<external-ip-address> </external-ip-address>and http://<external-ip-address>/echo</external-ip-address> to access nginx and the echo-server.

    Summary

    Ingresses are simple and very easy to deploy, and really fun to play with. However, it’s currently in beta phase and misses some of the features that may restrict it from production use. Stay tuned to get updates in Ingress on Kubernetes page and their Github repo.

    References

  • A Practical Guide to Deploying Multi-tier Applications on Google Container Engine (GKE)

    Introduction

    All modern era programmers can attest that containerization has afforded more flexibility and allows us to build truly cloud-native applications. Containers provide portability – ability to easily move applications across environments. Although complex applications comprise of many (10s or 100s) containers. Managing such applications is a real challenge and that’s where container orchestration and scheduling platforms like Kubernetes, Mesosphere, Docker Swarm, etc. come into the picture. 
    Kubernetes, backed by Google is leading the pack given that Redhat, Microsoft and now Amazon are putting their weight behind it.

    Kubernetes can run on any cloud or bare metal infrastructure. Setting up & managing Kubernetes can be a challenge but Google provides an easy way to use Kubernetes through the Google Container Engine(GKE) service.

    What is GKE?

    Google Container Engine is a Management and orchestration system for Containers. In short, it is a hosted Kubernetes. The goal of GKE is to increase the productivity of DevOps and development teams by hiding the complexity of setting up the Kubernetes cluster, the overlay network, etc.

    Why GKE? What are the things that GKE does for the user?

    • GKE abstracts away the complexity of managing a highly available Kubernetes cluster.
    • GKE takes care of the overlay network
    • GKE also provides built-in authentication
    • GKE also provides built-in auto-scaling.
    • GKE also provides easy integration with the Google storage services.

    In this blog, we will see how to create your own Kubernetes cluster in GKE and how to deploy a multi-tier application in it. The blog assumes you have a basic understanding of Kubernetes and have used it before. It also assumes you have created an account with Google Cloud Platform. If you are not familiar with Kubernetes, this guide from Deis  is a good place to start.

    Google provides a Command-line interface (gcloud) to interact with all Google Cloud Platform products and services. gcloud is a tool that provides the primary command-line interface to Google Cloud Platform. Gcloud tool can be used in the scripts to automate the tasks or directly from the command-line. Follow this guide to install the gcloud tool.

    Now let’s begin! The first step is to create the cluster.

    Basic Steps to create cluster

    In this section, I would like to explain about how to create GKE cluster. We will use a command-line tool to setup the cluster.

    Set the zone in which you want to deploy the cluster

    $ gcloud config set compute/zone us-west1-a

    Create the cluster using following command,

    $ gcloud container --project <project-name> 
    clusters create <cluster-name> 
    --machine-type n1-standard-2 
    --image-type "COS" --disk-size "50" 
    --num-nodes 2 --network default 
    --enable-cloud-logging --no-enable-cloud-monitoring

    Let’s try to understand what each of these parameters mean:

    –project: Project Name

    –machine-type: Type of the machine like n1-standard-2, n1-standard-4

    –image-type: OS image.”COS” i.e. Container Optimized OS from Google: More Info here.

    –disk-size: Disk size of each instance.

    –num-nodes: Number of nodes in the cluster.

    –network: Network that users want to use for the cluster. In this case, we are using default network.

    Apart from the above options, you can also use the following to provide specific requirements while creating the cluster:

    –scopes: Scopes enable containers to direct access any Google service without needs credentials. You can specify comma separated list of scope APIs. For example:

    • Compute: Lets you view and manage your Google Compute Engine resources
    • Logging.write: Submit log data to Stackdriver.

    You can find all the Scopes that Google supports here: .

    –additional-zones: Specify additional zones to high availability. Eg. –additional-zones us-east1-b, us-east1-d . Here Kubernetes will create a cluster in 3 zones (1 specified at the beginning and additional 2 here).

    –enable-autoscaling : To enable the autoscaling option. If you specify this option then you have to specify the minimum and maximum required nodes as follows; You can read more about how auto-scaling works here. Eg:   –enable-autoscaling –min-nodes=15 –max-nodes=50

    You can fetch the credentials of the created cluster. This step is to update the credentials in the kubeconfig file, so that kubectl will point to required cluster.

    $ gcloud container clusters get-credentials my-first-cluster --project project-name

    Now, your First Kubernetes cluster is ready. Let’s check the cluster information & health.

    $ kubectl get nodes
    NAME    STATUS    AGE   VERSION
    gke-first-cluster-default-pool-d344484d-vnj1  Ready  2h  v1.6.4
    gke-first-cluster-default-pool-d344484d-kdd7  Ready  2h  v1.6.4
    gke-first-cluster-default-pool-d344484d-ytre2  Ready  2h  v1.6.4

    After creating Cluster, now let’s see how to deploy a multi tier application on it. Let’s use simple Python Flask app which will greet the user, store employee data & get employee data.

    Application Deployment

    I have created simple Python Flask application to deploy on K8S cluster created using GKE. you can go through the source code here. If you check the source code then you will find directory structure as follows:

    TryGKE/
    ├── Dockerfile
    ├── mysql-deployment.yaml
    ├── mysql-service.yaml
    ├── src    
      ├── app.py    
      └── requirements.txt    
      ├── testapp-deployment.yaml    
      └── testapp-service.yaml

    In this, I have written a Dockerfile for the Python Flask application in order to build our own image to deploy. For MySQL, we won’t build an image of our own. We will use the latest MySQL image from the public docker repository.

    Before deploying the application, let’s re-visit some of the important Kubernetes terms:

    Pods:

    The pod is a Docker container or a group of Docker containers which are deployed together on the host machine. It acts as a single unit of deployment.

    Deployments:

    Deployment is an entity which manages the ReplicaSets and provides declarative updates to pods. It is recommended to use Deployments instead of directly using ReplicaSets. We can use deployment to create, remove and update ReplicaSets. Deployments have the ability to rollout and rollback the changes.

    Services:

    Service in K8S is an abstraction which will connect you to one or more pods. You can connect to pod using the pod’s IP Address but since pods come and go, their IP Addresses change.  Services get their own IP & DNS and those remain for the entire lifetime of the service. 

    Each tier in an application is represented by a Deployment. A Deployment is described by the YAML file. We have two YAML files – one for MySQL and one for the Python application.

    1. MySQL Deployment YAML

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: mysql
    spec:
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
            - env:
                - name: MYSQL_DATABASE
                  value: admin
                - name: MYSQL_ROOT_PASSWORD
                  value: admin
              image: 'mysql:latest'
              name: mysql
              ports:
                - name: mysqlport
                  containerPort: 3306
                  protocol: TCP

    2. Python Application Deployment YAML

    apiVersion: apps/v1beta1
    kind: Deployment
    metadata:
      name: test-app
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: test-app
        spec:
          containers:
          - name: test-app
            image: ajaynemade/pymy:latest
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 5000

    Each Service is also represented by a YAML file as follows:

    1. MySQL service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: mysql-service
    spec:
      ports:
      - port: 3306
        targetPort: 3306
        protocol: TCP
        name: http
      selector:
        app: mysql

    2. Python Application service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: test-service
    spec:
      type: LoadBalancer
      ports:
      - name: test-service
        port: 80
        protocol: TCP
        targetPort: 5000
      selector:
        app: test-app

    You will find a ‘kind’ field in each YAML file. It is used to specify whether the given configuration is for deployment, service, pod, etc.

    In the Python app service YAML, I am using type = LoadBalancer. In GKE, There are two types of cloud load balancers available to expose the application to outside world.

    1. TCP load balancer: This is a TCP Proxy-based load balancer. We will use this in our example.
    2. HTTP(s) load balancer: It can be created using Ingress. For more information, refer to this post that talks about Ingress in detail.

    In the MySQL service, I’ve not specified any type, in that case, type ‘ClusterIP’ will get used, which will make sure that MySQL container is exposed to the cluster and the Python app can access it.

    If you check the app.py, you can see that I have used “mysql-service.default” as a hostname. “Mysql-service.default” is a DNS name of the service. The Python application will refer to that DNS name while accessing the MySQL Database.

    Now, let’s actually setup the components from the configurations. As mentioned above, we will first create services followed by deployments.

    Services:

    $ kubectl create -f mysql-service.yaml
    $ kubectl create -f testapp-service.yaml

    Deployments:

    $ kubectl create -f mysql-deployment.yaml
    $ kubectl create -f testapp-deployment.yaml

    Check the status of the pods and services. Wait till all pods come to the running state and Python application service to get external IP like below:

    $ kubectl get services
    NAME            CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
    kubernetes      10.55.240.1     <none>        443/TCP        5h
    mysql-service   10.55.240.57    <none>        3306/TCP       1m
    test-service    10.55.246.105   35.185.225.67     80:32546/TCP   11s

    Once you get the external IP, then you should be able to make APIs calls using simple curl requests.

    Eg. To Store Data :

    curl -H "Content-Type: application/x-www-form-urlencoded" -X POST  http://35.185.225.67:80/storedata -d id=1 -d name=NoOne

    Eg. To Get Data :

    curl 35.185.225.67:80/getdata/1

    At this stage your application is completely deployed and is externally accessible.

    Manual scaling of pods

    Scaling your application up or down in Kubernetes is quite straightforward. Let’s scale up the test-app deployment.

    $ kubectl scale deployment test-app --replicas=3

    Deployment configuration for test-app will get updated and you can see 3 replicas of test-app are running. Verify it using,

    kubectl get pods

    In the same manner, you can scale down your application by reducing the replica count.

    Cleanup :

    Un-deploying an application from Kubernetes is also quite straightforward. All we have to do is delete the services and delete the deployments. The only caveat is that the deletion of the load balancer is an asynchronous process. You have to wait until it gets deleted.

    $ kubectl delete service mysql-service
    $ kubectl delete service test-service

    The above command will deallocate Load Balancer which was created as a part of test-service. You can check the status of the load balancer with the following command.

    $ gcloud compute forwarding-rules list

    Once the load balancer is deleted, you can clean-up the deployments as well.

    $ kubectl delete deployments test-app
    $ kubectl delete deployments mysql

    Delete the Cluster:

    $ gcloud container clusters delete my-first-cluster

    Conclusion

    In this blog, we saw how easy it is to deploy, scale & terminate applications on Google Container Engine. Google Container Engine abstracts away all the complexity of Kubernetes and gives us a robust platform to run containerised applications. I am super excited about what the future holds for Kubernetes!

    Check out some of Velotio’s other blogs on Kubernetes.

  • How To Get Started With Logging On Kubernetes?

    In distributed systems like Kubernetes, logging is critical for monitoring and providing observability and insight into an application’s operations. With the ever-increasing complexity of distributed systems and the proliferation of cloud-native solutions, monitoring and observability have become critical components in knowing how the systems are functioning.

    Logs don’t lie! They have been one of our greatest companions when investigating a production incident.

    How is logging in Kubernetes different?

    Log aggregation in Kubernetes differs greatly from logging on traditional servers or virtual machines, owing to the way it manages its applications (pods).

    When an app crashes on a virtual machine, its logs remain accessible until they are deleted. When pods are evicted, crashed, deleted, or scheduled on a different node in Kubernetes, the container logs are lost. The system is self-cleaning. As a result, you are left with no knowledge of why the anomaly occurred. Because default logging in Kubernetes is transient, a centralized log management solution is essential.

    Kubernetes is highly distributed and dynamic in nature; hence, in production, you’ll most certainly be working with multiple machines that have multiple containers each, which can crash at any time. Kubernetes clusters add to the complexity by introducing new layers that must be monitored, each of which generates its own type of log.

    We’ve curated some of the best tools to help you achieve this, alongside a simple guide on how to get started with each of them, as well as a comparison of these tools to match your use case.

    PLG Stack

    Introduction:

    Promtail is an agent that ships the logs from the local system to the Loki cluster.

    Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It indexes only metadata and doesn’t index the content of the log. This design decision makes it very cost-effective and easy to operate.

    Grafana is the visualisation tool which consumes data from Loki data source

    Loki is like Prometheus, but for logs: we prefer a multidimensional label-based approach to indexing and want a single-binary, easy to operate a system with no dependencies. Loki differs from Prometheus by focusing on logs instead of metrics, and delivering logs via push, instead of pull.

    Configuration Options:

    Installation with Helm chart –

    # Create a namespace to deploy PLG stack :
    
    kubectl create ns loki
    
    # Add Grafana's Helm Chart repository and Update repo :
    
    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update
    
    # Deploy the Loki stack :
    
    helm upgrade --install loki-stack grafana/loki-stack -n loki --set grafana.enabled=true
    
    # Retrieve password to log into Grafana with user admin
    
    kubectl get secret loki-stack-grafan -n loki -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
    
    # Finally execute command below to access the Grafana UI on http://localhost:3000
    
    kubectl port-forward -n loki service/loki-stack-grafana 3000:80

    Log in with user name “admin” and the password you retrieved earlier.

    Query Methods:

    Using CLI :

    Curl command to fetch logs directly from Loki

    curl -G -s "http://localhost:3100/loki/api/v1/query" 
    --data-urlencode 'query={job="shruti/logging-golang"}' | jq

    Using LogQL :

    • LogQL provides the functionality to filter logs through operators.

    For example :

    {container="kube-apiserver"} |= "error" != "timeout"

    • LogCLI is the command-line interface to Grafana Loki. It facilitates running LogQL queries against a Loki instance.

    For example :

    logcli query '{job="shruti/logging-golang"}'

    Using Dashboard :

    Click on Explore tab on the left side. Select Loki from the data source dropdown

    EFK Stack

    Introduction :

    The Elastic Stack contains most of the tools required for log management

    • Elastic search is an open source, distributed, RESTful and scalable search engine. It is a NoSQL database, primarily to store logs and retrive logs from Fluentd.
    • Log shippers such as LogStash, Fluentd , Fluent-bit. It is an open source log collection agent which support multiple data sources and output formats. It can forward logs to solutions like Stackdriver, CloudWatch, Splunk, Bigquery, etc.
    • Kibana as the UI tool for querying, data visualisation and dashboards. It has ability to virtually  build any type of dashboards using Kibana. Kibana Query Language (KQL) is used for querying elasticsearch data.
    • Fluentd ➖Deployed as daemonset as it need to collect the container logs from all the nodes. It connects to the Elasticsearch service endpoint to forward the logs.
    • ElasticSearch ➖ Deployed as statefulset as it holds the log data. A service endpoint is also exposed for Fluentd and Kibana to connect with it.
    • Kibana ➖ Deployed as deployment and connects to elasticsearch service endpoint.

    Configuration Options :

    Can be installed through helm chart as a Stack or as Individual components

    • Add the Elastic Helm charts repo:
     helm repo add elastic https://helm.elastic.co && helm repo update

    • More information related to deploying these Helm Charts can be found here

    After installation is complete and Kibana Dashboard is accessible, We need to define index patterns to be able to see logs in Kibana.

    From homepage, write Kibana / Index Patterns to search bar. Go to Index patterns page and click to Create index pattern on the right corner. You will see the list of index patterns here.

    Add required patterns to your indices and From left side menu, click to discover and check your logs 🙂

    Query Methods :

    • Elastic Search can be queried directly on any indices.
    • For Example –
    curl -XGET 'localhost:9200/my_index/my_type/_count?q=field:value&pretty'

    More information can be found here  https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html

    Using Dashboard :

    Graylog Stack

    Introduction

    Graylog is a leading centralised log management solution built to open standards for capturing, storing, and enabling real-time analysis of terabytes of machine. it supports the Master-Slave Architecture. The Graylog Stack — Graylog v3, Elasticsearch v6 along with MongoDB v3.

    Graylog is an open-source log management tool, using Elasticsearch as its storage. Unlike the ELK stack, which is built from individual components (Elasticsearch, Logstash, Kibana), Graylog is built as a complete package that can do everything.

    • One package with all the essentials of log processing: collect, parse, buffer, index, search, analyze
    • Additional features that you don’t get with the open-source ELK stack, such as role-based access control and alerts
    • Fits the needs of most centralized log management use-cases in one package
    • Easily scale both the storage (Elasticsearch) and the ingestion pipeline
    • Graylog’s extractors allow to extract fields out of log messages using a lot of methods such grok expression, regex and json

    Cons :

    • Visualization capabilities are limited, at least compared to ELK’s Kibana
    • Can’t use the whole ELK ecosystem, because they wouldn’t directly access the Elasticsearch API. Instead, Graylog has its own API
    • It is Not implemented for kubernetes distribution directly rather supports logging via fluent-bit/logstash/fluentd

    Configurations Options

    Graylog is very flexible in such a way that it supports multiple inputs (data sources ) we can mention :

    • GELF TCP.
    • GELF Kafka.
    • AWS Logs.

    as well as Outputs (how can Graylog nodes forward messages) — we can mention :

    • GELF Output.
    • STDOUT.- query via http / rest api

    Connecting External GrayLog Stack:

    • Host & IP (12201) TCP input to push logs to graylog stack directly

    Query Methods

    Using CLI:

    curl -u admin:password -H 'X-Requested-By: cli' "http://GRAYLOG_IP_OR_HOSTNAME/api/search/universal/relative?query=*&range=3600&limit=100&sort=timestamp:desc&pretty=true" -H "Accept: application/json" -H "Content-Type: application/json"

    Where:

    • query=* – replace * with your desired string
    • range=3600 – replace 3600 with time range (in seconds)
    • limit=100 – replace 100 with number of returned results
    • sort=timestamp:desc – replace timestamp:desc with field you want to sort

    Using Dashboard:

    One can easily navigate the filter section and perform search with the help of labels generated by log collectors.

    Splunk Stack

    Introduction

    Splunk is used for monitoring and searching through big data. It indexes and correlates information in a container that makes it searchable, and makes it possible to generate alerts, reports and visualisations.

    Configuration Options

    1. Helm based Installation as well as Operator based Installation is supported

    2. Splunk Connect for Kubernetes provides a way to import and search your Kubernetes logging, object, and metrics data in your Splunk platform deployment. Splunk Connect for Kubernetes supports importing and searching your container logs on ECS, EKS, AKS, GKE and Openshift

    3. Splunk Connect for Kubernetes supports installation using Helm.

    4. Splunk Connect for Kubernetes deploys a DaemonSet on each node. And in the DaemonSet, a Fluentd container runs and does the collecting job. Splunk Connector for Kubernetes collects three types of data – Logs, Objects and Metrics

    5. We need a minimum of two Splunk platform indexes

    One events index, which will handle logs and objects (you may also create two separate indexes for logs and objects).

    One metrics index. If you do not configure these indexes, Kubernetes Connect for Splunk uses the defaults created in your HTTP Event Collector (HEC) token.

    6. An HEC token will be required, before moving on to installation

    7. To install and configure defaults with Helm :

    Add Splunk chart repo

    helm repo add splunk <https://splunk.github.io/splunk-connect-for-kubernetes/>

    Get values file in your working directory and prepare this Values file.

    helm show values splunk/splunk-connect-for-kubernetes > values.yaml

    Once you have a Values file, you can simply install the chart with by running

    helm install my-splunk-connect -f values.yaml splunk/splunk-connect-for-kubernetes

    To learn more about using and modifying charts, see: https://github.com/splunk/splunk-connect-for-kubernetes/tree/main/helm-chart

    The values file for logging

    Query Methods

    Using CLI :

    curl --location -k --request GET '<https://localhost:8089/services/search/jobs/export?search=search%20index=%22event%22%20sourcetype=%22kube:container:docker-log-generator%22&output_mode=json>' -u admin:Admin123!

    Using Dashboard :

    Logging Stack

    Comparison of Tools

    Some of the other tools that are interesting but aren’t open source—but are too good not to talk about and offer end-to-end functionality for all your logging needs:

    Sumo Logic :

    This log management tool can store logs as well as metrics. It has a powerful search syntax, where you can define operations similarly to UNIX pipes.

    • Powerful query language
    • Capability to detect common log patterns and trends
    • Centralized management of agents
    • Supports Log Archival & Retention
    • Ability to perform Audit Trails and Compliance Tracking

    Configuration Options :

    • A subscription to Sumo Logic will be required
    • Helm installation
    • Provides options to install side-by-side existing Prometheus Operator

    More information can be found here!

    Cons :

    • Performance can be bad for searches over large data sets or long timeframes.
    • Deployment only available on Cloud, SaaS, and Web-Based
    • Expensive – Pricing is per ingested byte, so it forces you to pick and choose what you log, rather than ingesting everything and figuring it out later

    Datadog:

    Datadog is a SaaS that started up as a monitoring (APM) tool and later added log management capabilities as well.

    You can send logs via HTTP(S) or syslog, either via existing log shippers (rsyslog, syslog-ng, Logstash, etc.) or through Datadog’s own agent. With it, observe your logs in real-time using the Live Tail, without indexing them. You can also ingest all of the logs from your applications and infrastructure, decide what to index dynamically with filters, and then store them in an archive.

    It features Logging without Limits™, which is a double-edged sword: it’s harder to predict and manage costs, but you get pay-as-you-use pricing combined with the fact that you can archive and restore from archive

    • Log processing pipelines have the ability to process millions of logs per minute or petabytes per month seamlessly.
    • Automatically detects common log patterns
    • Can archive logs to AWS/Azure/Google Cloud storage and rehydrate them later
    • Easy search with good autocomplete (based on facets)
    • Integration with Datadog metrics and traces
    • Affordable, especially for short retention and/or if you rely on the archive for a few searches going back

    Configuration options :

    Using CLI :

    curl -X GET "<https://api.datadoghq.com/api/v2/logs/events>" -H "Content-Type: application/json" -H "DD-API-KEY: {DD_API_KEY}" -H "DD-APPLICATION-KEY: ${DD_APP_KEY}"

    Cons :

    • Not available on premises
    • It is a bit complicated to set up for the first time. Is not quite easy to use or know at first about all the available features that Datadog has. The interface is tricky and can be a hindrance sometimes. Following that, if application fields are not mapped in the right way, filters are not that useful.
    • Datadog per host pricing can be very expensive.

    Conclusion :

    As one can see, each software has its own benefits and downsides. Grafana’s Loki is more lightweight than Elastic Stack in overall performance, supporting Persistent Storage Options.

    That being said, the right solution platform really depends on each administrator’s needs.

    That’s all! Thank you.

    If you enjoyed this article, please like it.

    Feel free to drop a comment too.

  • Cloud Native Applications — The Why, The What & The How

    Cloud-native is an approach to build & run applications that can leverage the advantages of the cloud computing model — On demand computing power & pay-as-you-go pricing model. These applications are built and deployed in a rapid cadence to the cloud platform and offer organizations greater agility, resilience, and portability across clouds.

    This blog explains the importance, the benefits and how to go about building Cloud Native Applications.

    CLOUD NATIVE – The Why?

    Early technology adapters like FANG (Facebook, Amazon, Netflix & Google) have some common themes when it comes to shipping software. They have invested heavily in building capabilities that enable them to release new features regularly (weekly, daily or in some cases even hourly). They have achieved this rapid release cadence while supporting safe and reliable operation of their applications; in turn allowing them to respond more effectively to their customers’ needs.

    They have achieved this level of agility by moving beyond ad-hoc automation and by adopting cloud native practices that deliver these predictable capabilities. DevOps,Continuous Delivery, micro services & containers form the 4 main tenets of Cloud Native patterns. All of them have the same overarching goal of making application development and operations team more efficient through automation.

    At this point though, these techniques have only been successfully proven at the aforementioned software driven companies. Smaller, more agile companies are also realising the value here. However, as per Joe Beda(creator of Kubernetes & CTO at Heptio) there are very few examples of this philosophy being applied outside these technology centric companies.

    Any team/company shipping products should seriously consider adopting Cloud Native practices if they want to ship software faster while reducing risk and in turn delighting their customers.

    CLOUD NATIVE – The What?

    Cloud Native practices comprise of 4 main tenets.

     

    Cloud native — main tenets
    • DevOps is the collaboration between software developers and IT operations with the goal of automating the process of software delivery & infrastructure changes.
    • Continuous Delivery enables applications to released quickly, reliably & frequently, with less risk.
    • Micro-services is an architectural approach to building an application as a collection of small independent services that run on their own and communicate over HTTP APIs.
    • Containers provide light-weight virtualization by dynamically dividing a single server into one or more isolated containers. Containers offer both effiiciency & speed compared to standard Virual Machines (VMs). Containers provide the ability to manage and migrate the application dependencies along with the application. while abstracting away the OS and the underlying cloud platform in many cases.

    The benefits that can be reaped by adopting these methodologies include:

    1. Self managing infrastructure through automation: The Cloud Native practice goes beyond ad-hoc automation built on top of virtualization platforms, instead it focuses on orchestration, management and automation of the entire infrastructure right upto the application tier.
    2. Reliable infrastructure & application: Cloud Native practice ensures that it much easier to handle churn, replace failed components and even easier to recover from unexpected events & failures.
    3. Deeper insights into complex applications: Cloud Native tooling provides visualization for health management, monitoring and notifications with audit logs making applications easy to audit & debug
    4. Security: Enable developers to build security into applications from the start rather than an afterthought.
    5. More efficient use of resources: Containers are lighter in weight that full systems. Deploying applications in containers lead to increased resource utilization.

    Software teams have grown in size and the amount of applications and tools that a company needs to be build has grown 10x over last few years. Microservices break large complex applications into smaller pieces so that they can be developed, tested and managed independently. This enables a single microservice to be updated or rolled-back without affecting other parts of the application. Also nowadays software teams are distributed and microservices enables each team to own a small piece with service contracts acting as the communication layer.

    CLOUD NATIVE – The How?

    Now, lets look at the various building blocks of the cloud native stack that help achieve the above described goals. Here, we have grouped tools & solutions as per the problem they solve. We start with the infrastructure layer at the bottom, then the tools used to provision the infrastructure, following which we have the container runtime environment; above that we have tools to manage clusters of container environments and then at the very top we have the tools, frameworks to develop the applications.

    1. Infrastructure: At the very bottom, we have the infrastructure layer which provides the compute, storage, network & operating system usually provided by the Cloud (AWS, GCP, Azure, Openstack, VMware).

    2. Provisioning: The provisioning layer consists of automation tools that help in provisioning the infrastructure, managing images and deploying the application. Chef, Puppet & Ansible are the DevOps tools that give the ability to manage their configuration & environments. Spinnaker, Terraform, Cloudformation provide workflows to provision the infrastructure. Twistlock, Clair provide the ability to harden container images.

    3. Runtime: The Runtime provides the environment in which the application runs. It consists of the Container Engines where the application runs along with the associated storage & networking. containerd & rkt are the most widely used Container engines. Flannel, OpenContrail provide the necessary overlay networking for containers to interact with each other and the outside world while Datera, Portworx, AppOrbit etc. provide the necessary persistent storage enabling easy movement of containers across clouds.

    4. Orchestration and Management: Tools like Kubernetes, Docker Swarm and Apache Mesos abstract the management container clusters allowing easy scheduling & orchestration of containers across multiple hosts. etcd, Consul provide service registries for discovery while AVI, Envoy provide proxy, load balancer etc. services.

    5. Application Definition & Development: We can build micro-services for applications across multiple langauges — Python, Spring/Java, Ruby, Node. Packer, Habitat & Bitnami provide image management for the application to run across all infrastructure — container or otherwise. 
    Jenkins, TravisCI, CircleCI and other build automation servers provide the capability to setup continuous integration and delivery pipelines.

    6. Monitoring, Logging & Auditing: One of the key features of managing Cloud Native Infrastructure is the ability to monitor & audit the applications & underlying infrastructure.

    All modern monitoring platforms like Datadog, Newrelic, AppDynamic support monitoring of containers & microservices.

    Splunk, Elasticsearch & fluentd help in log aggregration while Open Tracing and Zipkin help in debugging applications.

    7. Culture: Adopting cloud native practices needs a cultural change where teams no longer work in independent silos. End-to-end automation of software delivery pipelines is only possible when there is an increased collaboration between development and IT operations team with a shared responbility.

    When we put all the pieces together we get the complete Cloud Native Landscape as shown below.

    Cloud Native Landscape

    I hope this post gives an idea why Cloud Native is important and what the main benefits are. As you may have noticed in the above infographic, there are several projects, tools & companies trying to solve similar problems. The next questions in mind most likely will be How do i get started? Which tools are right for me? and so on. I will cover these topics and more in my following blog posts. Stay tuned!

    Please let us know what you think by adding comments to this blog or reaching out to chirag_jog or Velotio on Twitter.

    Learn more about what we do at Velotio here and how Velotio can get you started on your cloud native journey here.

    References:

  • Container Security: Let’s Secure Your Enterprise Container Infrastructure!

    Introduction

    Containerized applications are becoming more popular with each passing year. A reason for this rise in popularity could be the pivotal role they play in Continuous Delivery by enabling fast and automated deployment of software services.

    Security still remains a major concern mainly because of the way container images are being used. In the world of VMs, infra/security team used to validate the OS images and installed packages for vulnerabilities. But with the adoption of containers, developers are building their own container images. Images are rarely built from scratch. They are typically built on some base image, which is itself built on top of other base images. When a developer builds a container image, he typically grabs a base image and other layers from public third party sources. These images and libraries may contain obsolete or vulnerable packages, thereby putting your infrastructure at risk. An added complexity is that many existing vulnerability-scanning tools may not work with containers, nor do they support container delivery workflows including registries and CI/CD pipelines. In addition, you can’t simply scan for vulnerabilities – you must scan, manage vulnerability fixes and enforce vulnerability-based policies.

    The Container Security Problem

    The table below shows the number of vulnerabilities found in the images available on dockerhub. Note that (as of April 2016) the worst offending community images contained almost 1,800 vulnerabilities! Official images were much better, but still contained 392 vulnerabilities in the worst case.

    If we look at the distribution of vulnerability severities, we see that pretty much all of them are high severity, for both official and community images. What we’re not told is the underlying distribution of vulnerability severities in the CVE database, so this could simply be a reflection of that distribution.

    Over 80% of the latest versions of official images contained at least one high severity vulnerability!

    • There are so many docker images readily available on dockerhub – are you sure the ones you are using are safe?
    • Do you know where your containers come from?
    • Are your developers downloading container images and libraries from unknown and potentially harmful sources?
    • Do the containers use third party library code that is obsolete or vulnerable?

    In this blog post, I will explain some of the solutions available which can help with these challenges. Solutions like ‘Docker scanning services‘, ‘Twistlock Trust’ and an open-source solution ‘Clair‘ from Coreos.com which can help in scanning and fixing vulnerability problems making your container images secure.

    Clair

    Clair is an open source project for the static analysis of vulnerabilities in application containers. It works as an API that analyzes every container layer to find known vulnerabilities using existing package managers such as Debian (dpkg), Ubuntu (dpkg), CentOS (rpm). It also can be used from the command line. It provides a list of vulnerabilities that threaten a container, and can notify users when new vulnerabilities that affect existing containers become known. In regular intervals, Clair ingests vulnerability metadata from a configured set of sources and stores it in the database. Clients use the Clair API to index their container images; this parses a list of installed source packages and stores them in the database. Clients use the Clair API to query the database; correlating data in real time, rather than a cached result that needs re-scanning.

    Clair identifies security issues that developers introduce in their container images. The vanilla process for using Clair is as follows:

    1. A developer programmatically submits their container image to Clair
    2. Clair analyzes the image, looking for security vulnerabilities
    3. Clair returns a detailed report of security vulnerabilities present in the image
    4. Developer acts based on the report

    How to use Clair

    Docker is required to follow along with this demonstration. Once Docker is installed, use the Dockerfile below to create an Ubuntu image that contains a version of SSL that is susceptible to Heartbleed attacks.

    #Dockerfile
    FROM ubuntu:precise-20160303
    #Install WGet
    RUN apt-get update
    RUN apt-get -f install
    RUN apt-get install -y wget
    #Install an OpenSSL vulnerable to Heartbleed (CVE-2014-0160)
    RUN wget --no-check-certificate https://launchpad.net/~ubuntu-security/+archive/ubuntu/ppa/+build/5436462/+files/openssl_1.0.1-4ubuntu5.11_amd64.deb
    RUN dpkg -i openssl_1.0.1-4ubuntu5.11_amd64.deb

    Build the image using below command:

    $ docker build . -t madhurnawandar/heartbeat

    After creating the insecure Docker image, the next step is to download and install Clair from here. The installation choice used for this demonstration was the Docker Compose solution. Once Clair is installed, it can be used via querying its API or through the clairctl command line tool. Submit the insecure Docker image created above to Clair for analysis and it will catch the Heartbleed vulnerability.

    $ clairctl analyze --local madhurnawandar/heartbeat
    Image: /madhurnawandar/heartbeat:latest
    9 layers found 
    ➜ Analysis [f3ce93f27451] found 0 vulnerabilities. 
    ➜ Analysis [738d67d10278] found 0 vulnerabilities. 
    ➜ Analysis [14dfb8014dea] found 0 vulnerabilities. 
    ➜ Analysis [2ef560f052c7] found 0 vulnerabilities. 
    ➜ Analysis [69a7b8948d35] found 0 vulnerabilities. 
    ➜ Analysis [a246ec1b6259] found 0 vulnerabilities. 
    ➜ Analysis [fc298ae7d587] found 0 vulnerabilities. 
    ➜ Analysis [7ebd44baf4ff] found 0 vulnerabilities. 
    ➜ Analysis [c7aacca5143d] found 52 vulnerabilities.
    $ clairctl report --local --format json madhurnawandar/heartbeat
    JSON report at reports/json/analysis-madhurnawandar-heartbeat-latest.json

    You can view the detailed report here.

    Docker Security Scanning

    Docker Cloud and Docker Hub can scan images in private repositories to verify that they are free from known security vulnerabilities or exposures, and report the results of the scan for each image tag. Docker Security Scanning is available as an add-on to Docker hosted private repositories on both Docker Cloud and Docker Hub.

    Security scanning is enabled on a per-repository basis and is only available for private repositories. Scans run each time a build pushes a new image to your private repository. They also run when you add a new image or tag. The scan traverses each layer of the image, identifies the software components in each layer, and indexes the SHA of each component.

    The scan compares the SHA of each component against the Common Vulnerabilities and Exposures (CVE®) database. The CVE is a “dictionary” of known information security vulnerabilities. When the CVE database is updated, the service reviews the indexed components for any that match the new vulnerability. If the new vulnerability is detected in an image, the service sends an email alert to the maintainers of the image.

    A single component can contain multiple vulnerabilities or exposures and Docker Security Scanning reports on each one. You can click an individual vulnerability report from the scan results and navigate to the specific CVE report data to learn more about it.

    Twistlock

    Twistlock is a rule-based access control policy system for Docker and Kubernetes containers. Twistlock is able to be fully integrated within Docker, with out-of-the-box security policies that are ready to use.

    Security policies can set the conditions for users to, say, create new containers but not delete them; or, they can launch containers but aren’t allowed to push code to them. Twistlock features the same policy management rules as those on Kubernetes, wherein a user can modify management policies but cannot delete them.

    Twistlock also handles image scanning. Users can scan an entire container image, including any packaged Docker application. Twistlock has done its due-diligence in this area, correlating with Red Hat and Mirantis to ensure no container is left vulnerable while a scan is running.

    Twistlock also deals with image scanning of containers within the registries themselves. In runtime environments, Twistlock features a Docker proxy running on the same server with an application’s other containers. This is essentially traffic filtering, whereupon the application container calling the Docker daemon is then re-routed through Twistlock. This approach enforces access control, allowing for safer configuration where no containers are set to run as root. It’s also able to SSH into an instance, for example. In order to delve into these layers of security, Twistlock enforces the policy at runtime.

    When new code is written in images, it is then integrated into the Twistlock API to push an event, whereupon the new image is deposited into the registry along with its unique IDs. It is then pulled out by Twistlock and scanned to ensure it complies with the set security policies in place. Twistlock deposits the scan result into the CI process so that developers can view the result for debugging purposes.

    Integrating these vulnerability scanning tools into your CI/CD Pipeline:

    These tools becomes more interesting paired with a CI server like Jenkins, TravisCI, etc. Given proper configuration, process becomes:

    1. A developer submits application code to source control
    2. Source control triggers a Jenkins build
    3. Jenkins builds the software containers necessary for the application
    4. Jenkins submits the container images to vulnerability scanning tool
    5. Tool identifies security vulnerabilities in the container
    6. Jenkins receives the security report, identifies a high vulnerability in the report, and stops the build

    Conclusion

    There are many solutions like ‘Docker scanning services’, ‘Twistlock Trust’, ‘Clair‘, etc to secure your containers. It’s critical for organizations to adopt such tools in their CI/CD pipelines. But this itself is not going to make containers secure. There are lot of guidelines available in the CIS Benchmark for containers like tuning kernel parameters, setting proper network configurations for inter-container connectivity, securing access to host level directories and others. I will cover these items in the next set of blogs. Stay tuned!

  • Continuous Deployment with Azure Kubernetes Service, Azure Container Registry & Jenkins

    Introduction

    Containerization has taken the application development world by storm. Kubernetes has become the standard way of deploying new containerized distributed applications used by the largest enterprises in a wide range of industries for mission-critical tasks, it has become one of the biggest open-source success stories.

    Although Google Cloud has been providing Kubernetes as a service since November 2014 (Note it started with a beta project), Microsoft with AKS (Azure Kubernetes Service) and Amazon with EKS (Elastic Kubernetes Service)  have jumped on to the scene in the second half of 2017.

    Example:

    AWS had KOPS

    Azure had Azure Container Service.

    However, they were wrapper tools available prior to these services which would help a user create a Kubernetes cluster, but the management and the maintenance (like monitoring and upgrades) needed efforts.

    Azure Container Registry:

    With container demand growing, there is always a need in the market for storing and protecting the container images. Microsoft provides a Geo Replica featured private repository as a service named Azure Container Registry.

    Azure Container Registry is a registry offering from Microsoft for hosting container images privately. It integrates well with orchestrators like Azure Container Service, including Docker Swarm, DC/OS, and the new Azure Kubernetes service. Moreover, ACR  provides capabilities such as Azure Active Directory-based authentication, webhook support, and delete operations.

    The coolest feature provided is Geo-Replication. This will create multiple copies of your image and distribute it across the globe and the container when spawned will have access to the image which is nearest.

    Although Microsoft has good documentation on how to set up ACR  in your Azure Subscription, we did encounter some issues and hence decided to write a blog on the precautions and steps required to configure the Registry in the correct manner.

    Note: We tried this using a free trial account. You can setup it up by referring the following link

    Prerequisites:

    • Make sure you have resource groups created in the supported region.
      Supported Regions: eastus, westeurope, centralus, canada central, canadaeast
    • If you are using Azure CLI for operations please make sure you use the version: 2.0.23 or 2.0.25 (This was the latest version at the time of writing this blog)

    Steps to install Azure CLI 2.0.23 or 2.0.25 (ubuntu 16.04 workstation):

    echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ wheezy main" |            
    sudo tee /etc/apt/sources.list.d/azure-cli.list
    sudo apt-key adv --keyserver packages.microsoft.com --recv-keys 52E16F86FEE04B979B07E28DB02C46DF417A0893
    sudo apt-get install apt-transport-httpssudo apt-get update && sudo apt-get install azure-cli
    
    Install a specific version:
    
    sudo apt install azure-cli=2.0.23-1
    sudo apt install azure-cli=2.0.25.1

    Steps for Container Registry Setup:

    • Login to your Azure Account:
    az  login --username --password

    • Create a resource group:
    az group create --name <RESOURCE-GROUP-NAME>  --location eastus
    Example : az group create --name acr-rg  --location eastus

    • Create a Container Registry:
    az acr create --resource-group <RESOURCE-GROUP-NAME> --name <CONTAINER-REGISTRY-NAME> --sku Basic --admin-enabled true
    Example : az acr create --resource-group acr-rg --name testacr --sku Basic --admin-enabled true

    Note: SKU defines the storage available for the registry for type Basic the storage available is 10GB, 1 WebHook and the billing amount is 11 Rs/day.

    For detailed information on the different SKU available visit the following link

    • Login to the registry :
    az acr login --name <CONTAINER-REGISTRY-NAME>
    Example :az acr login --name testacr

    • Sample docker file for a node application :
    FROM node:carbon
    # Create app directory
    WORKDIR /usr/src/app
    COPY package*.json ./
    # RUN npm install
    EXPOSE 8080
    CMD [ "npm", "start" ]

    • Build the docker image :
    docker build -t <image-tag>:<software>
    Example :docker build -t base:node8

    • Get the login server value for your ACR :
    az acr list --resource-group acr-rg --query "[].{acrLoginServer:loginServer}" --output table
    Output  :testacr.azurecr.io

    • Tag the image with the Login Server Value:
      Note: Get the image ID from docker images command

    Example:

    docker tag image-id testacr.azurecr.io/base:node8

    Push the image to the Azure Container Registry:Example:

    docker push testacr.azurecr.io/base:node8

    Microsoft does provide a GUI option to create the ACR.

    • List Images in the Registry:

    Example:

    az acr repository list --name testacr --output table

    • List tags for the Images:

    Example:

    az acr repository show-tags --name testacr --repository <name> --output table

    • How to use the ACR image in Kubernetes deployment: Use the login Server Name + the image name

    Example :

    containers:- 
    name: demo
    image: testacr.azurecr.io/base:node8

    Azure Kubernetes Service

    Microsoft released the public preview of Managed Kubernetes for Azure Container Service (AKS) on October 24, 2017. This service simplifies the deployment, management, and operations of Kubernetes. It features an Azure-hosted control plane, automated upgrades, self-healing, easy scaling.

    Similarly to Google AKE and Amazon EKS, this new service will allow access to the nodes only and the master will be managed by Cloud Provider. For more information visit the following link.

    Let’s now get our hands dirty and deploy an AKS infrastructure to play with:

    • Enable AKS preview for your Azure Subscription: At the time of writing this blog, AKS is in preview mode, it requires a feature flag on your subscription.
    az provider register -n Microsoft.ContainerService

    • Kubernetes Cluster Creation Command: Note: A new separate resource group should be created for the Kubernetes service.Since the service is in preview, it is available only to certain regions.

    Make sure you create a resource group under the following regions.

    eastus, westeurope, centralus, canadacentral, canadaeast
    az  group create  --name  <RESOURCE-GROUP>   --location eastus
    Example : az group create --name aks-rg --location eastus
    az aks create --resource-group <RESOURCE-GROUP-NAME> --name <CLUSTER-NAME>   --node-count 2 --generate-ssh-keys
    Example : az aks create --resource-group aks-rg --name akscluster  --node-count 2 --generate-ssh-keys

    Example with different arguments :

    Create a Kubernetes cluster with a specific version.

    az aks create -g MyResourceGroup -n MyManagedCluster --kubernetes-version 1.8.1

    Create a Kubernetes cluster with a larger node pool.

    az aks create -g MyResourceGroup -n MyManagedCluster --node-count 7

    Install the Kubectl CLI :

    To connect to the kubernetes cluster from the client computer Kubectl command line client is required.

    sudo az aks install-cli

    Note: If you’re using Azure CloudShell, kubectl is already installed. If you want to install it locally, run the above  command:

    • To configure kubectl to connect to your Kubernetes cluster :
    az aks get-credentials --resource-group=<RESOURCE-GROUP-NAME> --name=<CLUSTER-NAME>

    Example :

    CODE: <a href="https://gist.github.com/velotiotech/ac40b6014a435271f49ca0e3779e800f">https://gist.github.com/velotiotech/ac40b6014a435271f49ca0e3779e800f</a>.js

    • Verify the connection to the cluster :
    kubectl get nodes -o wide 

    • For all the command line features available for Azure check the link: https://docs.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest

    We had encountered a few issues while setting up the AKS cluster at the time of writing this blog. Listing them along with the workaround/fix:

    az aks create --resource-group aks-rg --name akscluster  --node-count 2 --generate-ssh-keys

    Error: Operation failed with status: ‘Bad Request’.

    Details: Resource provider registrations Microsoft.Compute, Microsoft.Storage, Microsoft.Network are needed we need to enable them.

    Fix: If you are using the trial account, click on subscriptions and check whether the following providers are registered or not :

    • Microsoft.Compute
    • Microsoft.Storage
    • Microsoft.Network
    • Microsoft.ContainerRegistry
    • Microsoft.ContainerService

    Error: We had encountered the following mentioned open issues at the time of writing this blog.

    1. Issue-1
    2. Issue-2
    3. Issue-3

    Jenkins setup for CI/CD with ACR, AKS

    Microsoft provides a solution template which will install the latest stable Jenkins version on a Linux (Ubuntu 14.04 LTS) VM along with tools and plugins configured to work with Azure. This includes:

    • git for source control
    • Azure Credentials plugin for connecting securely
    • Azure VM Agents plugin for elastic build, test and continuous integration
    • Azure Storage plugin for storing artifacts
    • Azure CLI to deploy apps using scripts

    Refer the below link to bring up the Instance

    Pipeline plan for Spinning up a Nodejs Application using ACR – AKS – Jenkins

    What the pipeline accomplishes :

    Stage 1:

    The code gets pushed in the Github. The Jenkins job gets triggered automatically. The Dockerfile is checked out from Github.

    Stage 2:

    Docker builds an image from the Dockerfile and then the image is tagged with the build number.Additionally, the latest tag is also attached to the image for the containers to use.

    Stage 3:

    We have default deployment and service YAML files stored on the Jenkins server. Jenkins makes a copy of the default YAML files, make the necessary changes according to the build and put them in a separate folder.

    Stage 4:

    kubectl was initially configured at the time of setting up AKS on the Jenkins server. The YAML files are fed to the kubectl util which in turn creates pods and services.

    Sample Jenkins pipeline code :

    node {      
      // Mark the code checkout 'stage'....        
        stage('Checkout the dockefile from GitHub') {            
          git branch: 'docker-file', credentialsId: 'git_credentials', url: 'https://gitlab.com/demo.git'        
        }        
        // Build and Deploy to ACR 'stage'...        
        stage('Build the Image and Push to Azure Container Registry') {                
          app = docker.build('testacr.azurecr.io/demo')                
          withDockerRegistry([credentialsId: 'acr_credentials', url: 'https://testacr.azurecr.io']) {                
          app.push("${env.BUILD_NUMBER}")                
          app.push('latest')                
          }        
         }        
         stage('Build the Kubernetes YAML Files for New App') {
    <The code here will differ depending on the YAMLs used for the application>        
      }        
      stage('Delpoying the App on Azure Kubernetes Service') {            
        app = docker.image('testacr.azurecr.io/demo:latest')            
        withDockerRegistry([credentialsId: 'acr_credentials', url: 'https://testacr.azurecr.io']) {            
        app.pull()            
        sh "kubectl create -f ."            
        }       
       }    
    }

    What we achieved:

    • We managed to create a private Docker registry on Azure using the ACR feature using az-cli 2.0.25.
    • Secondly, we were able to spin up a private Kubernetes cluster on Azure with 2 nodes.
    • Setup Up Jenkins using a pre-cooked template which had all the plugins necessary for communication with ACR and AKS.
    • Orchestrate  a Continuous Deployment pipeline in Jenkins which uses docker features.
  • Demystifying High Availability in Kubernetes Using Kubeadm

    Introduction

    The rise of containers has reshaped the way we develop, deploy and maintain the software. Containers allow us to package the different services that constitute an application into separate containers, and to deploy those containers across a set of virtual and physical machines. This gives rise to container orchestration tool to automate the deployment, management, scaling and availability of a container-based application. Kubernetes allows deployment and management of container-based applications at scale. Learn more about backup and disaster recovery for your Kubernetes clusters.

    One of the main advantages of Kubernetes is how it brings greater reliability and stability to the container-based distributed application, through the use of dynamic scheduling of containers. But, how do you make sure Kubernetes itself stays up when a component or its master node goes down?
     

     

    Why we need Kubernetes High Availability?

    Kubernetes High-Availability is about setting up Kubernetes, along with its supporting components in a way that there is no single point of failure. A single master cluster can easily fail, while a multi-master cluster uses multiple master nodes, each of which has access to same worker nodes. In a single master cluster the important component like API server, controller manager lies only on the single master node and if it fails you cannot create more services, pods etc. However, in case of Kubernetes HA environment, these important components are replicated on multiple masters(usually three masters) and if any of the masters fail, the other masters keep the cluster up and running.

    Advantages of multi-master

    In the Kubernetes cluster, the master node manages the etcd database, API server, controller manager, and scheduler, along with all the worker nodes. What if we have only a single master node and if that node fails, all the worker nodes will be unscheduled and the cluster will be lost.

    In a multi-master setup, by contrast, a multi-master provides high availability for a single cluster by running multiple apiserver, etcd, controller-manager, and schedulers. This does not only provides redundancy but also improves network performance because all the masters are dividing the load among themselves.

    A multi-master setup protects against a wide range of failure modes, from a loss of a single worker node to the failure of the master node’s etcd service. By providing redundancy, a multi-master cluster serves as a highly available system for your end-users.

    Steps to Achieve Kubernetes HA

    Before moving to steps to achieve high-availability, let us understand what we are trying to achieve through a diagram:

    (Image Source: Kubernetes Official Documentation)

    Master Node: Each master node in a multi-master environment run its’ own copy of Kube API server. This can be used for load balancing among the master nodes. Master node also runs its copy of the etcd database, which stores all the data of cluster. In addition to API server and etcd database, the master node also runs k8s controller manager, which handles replication and scheduler, which schedules pods to nodes.

    Worker Node: Like single master in the multi-master cluster also the worker runs their own component mainly orchestrating pods in the Kubernetes cluster. We need 3 machines which satisfy the Kubernetes master requirement and 3 machines which satisfy the Kubernetes worker requirement.

    For each master, that has been provisioned, follow the installation guide to install kubeadm and its dependencies. In this blog we will use k8s 1.10.4 to implement HA.

    Note: Please note that cgroup driver for docker and kubelet differs in some version of k8s, make sure you change cgroup driver to cgroupfs for docker and kubelet. If cgroup driver for kubelet and docker differs then the master doesn’t come up when rebooted.

    Setup etcd cluster

    1. Install cfssl and cfssljson

    $ curl -o /usr/local/bin/cfssl https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 
    $ curl -o /usr/local/bin/cfssljson https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
    $ chmod +x /usr/local/bin/cfssl*
    $ export PATH=$PATH:/usr/local/bin

    2 . Generate certificates on master-0

    $ mkdir -p /etc/kubernetes/pki/etcd
    $ cd /etc/kubernetes/pki/etcd

    3. Create config.json file in /etc/kubernetes/pki/etcd folder with following content.

    {
        "signing": {
            "default": {
                "expiry": "43800h"
            },
            "profiles": {
                "server": {
                    "expiry": "43800h",
                    "usages": [
                        "signing",
                        "key encipherment",
                        "server auth",
                        "client auth"
                    ]
                },
                "client": {
                    "expiry": "43800h",
                    "usages": [
                        "signing",
                        "key encipherment",
                        "client auth"
                    ]
                },
                "peer": {
                    "expiry": "43800h",
                    "usages": [
                        "signing",
                        "key encipherment",
                        "server auth",
                        "client auth"
                    ]
                }
            }
        }
    }

    4. Create ca-csr.json file in /etc/kubernetes/pki/etcd folder with following content.

    {
        "CN": "etcd",
        "key": {
            "algo": "rsa",
            "size": 2048
        }
    }

    5. Create client.json file in /etc/kubernetes/pki/etcd folder with following content.

    {
        "CN": "client",
        "key": {
            "algo": "ecdsa",
            "size": 256
        }
    }

    $ cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
    $ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client

    6. Create a directory  /etc/kubernetes/pki/etcd on master-1 and master-2 and copy all the generated certificates into it.

    7. On all masters, now generate peer and etcd certs in /etc/kubernetes/pki/etcd. To generate them, we need the previous CA certificates on all masters.

    $ export PEER_NAME=$(hostname)
    $ export PRIVATE_IP=$(ip addr show eth0 | grep -Po 'inet K[d.]+')
    
    $ cfssl print-defaults csr > config.json
    $ sed -i 's/www.example.net/'"$PRIVATE_IP"'/' config.json
    $ sed -i 's/example.net/'"$PEER_NAME"'/' config.json
    $ sed -i '0,/CN/{s/example.net/'"$PEER_NAME"'/}' config.json
    
    $ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server config.json | cfssljson -bare server
    $ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer config.json | cfssljson -bare peer

    This will replace the default configuration with your machine’s hostname and IP address, so in case if you encounter any problem just check the hostname and IP address are correct and rerun cfssl command.

    8. On all masters, Install etcd and set it’s environment file.

    $ yum install etcd -y
    $ touch /etc/etcd.env
    $ echo "PEER_NAME=$PEER_NAME" >> /etc/etcd.env
    $ echo "PRIVATE_IP=$PRIVATE_IP" >> /etc/etcd.env

    9. Now, we will create a 3 node etcd cluster on all 3 master nodes. Starting etcd service on all three nodes as systemd. Create a file /etc/systemd/system/etcd.service on all masters.

    [Unit]
    Description=etcd
    Documentation=https://github.com/coreos/etcd
    Conflicts=etcd.service
    Conflicts=etcd2.service
    
    [Service]
    EnvironmentFile=/etc/etcd.env
    Type=notify
    Restart=always
    RestartSec=5s
    LimitNOFILE=40000
    TimeoutStartSec=0
    
    ExecStart=/bin/etcd --name <host_name>  --data-dir /var/lib/etcd --listen-client-urls http://<host_private_ip>:2379,http://127.0.0.1:2379 --advertise-client-urls http://<host_private_ip>:2379 --listen-peer-urls http://<host_private_ip>:2380 --initial-advertise-peer-urls http://<host_private_ip>:2380 --cert-file=/etc/kubernetes/pki/etcd/server.pem --key-file=/etc/kubernetes/pki/etcd/server-key.pem --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem --peer-cert-file=/etc/kubernetes/pki/etcd/peer.pem --peer-key-file=/etc/kubernetes/pki/etcd/peer-key.pem --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem --initial-cluster master-0=http://<master0_private_ip>:2380,master-1=http://<master1_private_ip>:2380,master-2=http://<master2_private_ip>:2380 --initial-cluster-token my-etcd-token --initial-cluster-state new --client-cert-auth=false --peer-client-cert-auth=false
    
    [Install]
    WantedBy=multi-user.target

    10. Ensure that you will replace the following placeholder with

    • <host_name> : Replace as the master’s hostname</host_name>
    • <host_private_ip>: Replace as the current host private IP</host_private_ip>
    • <master0_private_ip>: Replace as the master-0 private IP</master0_private_ip>
    • <master1_private_ip>: Replace as the master-1 private IP</master1_private_ip>
    • <master2_private_ip>: Replace as the master-2 private IP</master2_private_ip>

    11. Start the etcd service on all three master nodes and check the etcd cluster health:

    $ systemctl daemon-reload
    $ systemctl enable etcd
    $ systemctl start etcd
    
    $ etcdctl cluster-health

    This will show the cluster healthy and connected to all three nodes.

    Setup load balancer

    There are multiple cloud provider solutions for load balancing like AWS elastic load balancer, GCE load balancing etc. There might not be a physical load balancer available, we can setup a virtual IP load balancer to healthy node master. We are using keepalived for load balancing, install keepalived on all master nodes

    $ yum install keepalived -y

    Create the following configuration file /etc/keepalived/keepalived.conf on all master nodes:

    ! Configuration File for keepalived
    global_defs {
      router_id LVS_DEVEL
    }
    
    vrrp_script check_apiserver {
      script "/etc/keepalived/check_apiserver.sh"
      interval 3
      weight -2
      fall 10
      rise 2
    }
    
    vrrp_instance VI_1 {
        state <state>
        interface <interface>
        virtual_router_id 51
        priority <priority>
        authentication {
            auth_type PASS
            auth_pass velotiotechnologies
        }
        virtual_ipaddress {
            <virtual ip>
        }
        track_script {
            check_apiserver
        }
    }

    • state is either MASTER (on the first master nodes) or BACKUP (the other master nodes).
    • Interface is generally the primary interface, in my case it is eth0
    • Priority should be higher for master node e.g 101 and lower for others e.g 100
    • Virtual_ip should contain the virtual ip of master nodes

    Install the following health check script to /etc/keepalived/check_apiserver.sh on all master nodes:

    #!/bin/sh
    
    errorExit() {
        echo "*** $*" 1>&2
        exit 1
    }
    
    curl --silent --max-time 2 --insecure https://localhost:6443/ -o /dev/null || errorExit "Error GET https://localhost:6443/"
    if ip addr | grep -q <VIRTUAL-IP>; then
        curl --silent --max-time 2 --insecure https://<VIRTUAL-IP>:6443/ -o /dev/null || errorExit "Error GET https://<VIRTUAL-IP>:6443/"
    fi

    $ systemctl restart keepalived

    Setup three master node cluster

    Run kubeadm init on master0:

    Create config.yaml file with following content.

    apiVersion: kubeadm.k8s.io/v1alpha1
    kind: MasterConfiguration
    api:
      advertiseAddress: <master-private-ip>
    etcd:
      endpoints:
      - http://<master0-ip-address>:2379
      - http://<master1-ip-address>:2379
      - http://<master2-ip-address>:2379
      caFile: /etc/kubernetes/pki/etcd/ca.pem
      certFile: /etc/kubernetes/pki/etcd/client.pem
      keyFile: /etc/kubernetes/pki/etcd/client-key.pem
    networking:
      podSubnet: <podCIDR>
    apiServerCertSANs:
    - <load-balancer-ip>
    apiServerExtraArgs:
      endpoint-reconciler-type: lease

    Please ensure that the following placeholders are replaced:

    • <master-private-ip> with the private IPv4 of the master server on which config file resides.</master-private-ip>
    • <master0-ip-address>, <master1-ip-address> and <master-2-ip-address> with the IP addresses of your three master nodes</master-2-ip-address></master1-ip-address></master0-ip-address>
    • <podcidr> with your Pod CIDR. Please read the </podcidr>CNI network section of the docs for more information. Some CNI providers do not require a value to be set. I am using weave-net as pod network, hence podCIDR will be 10.32.0.0/12
    • <load-balancer-ip> with the virtual IP set up in the load balancer in the previous section.</load-balancer-ip>
    $ kubeadm init --config=config.yaml

    10. Run kubeadm init on master1 and master2:

    First of all copy /etc/kubernetes/pki/ca.crt, /etc/kubernetes/pki/ca.key, /etc/kubernetes/pki/sa.key, /etc/kubernetes/pki/sa.pub to master1’s and master2’s /etc/kubernetes/pki folder.

    Note: Copying this files is crucial, otherwise the other two master nodes won’t go into the ready state.

    Copy the config file config.yaml from master0 to master1 and master2. We need to change <master-private-ip> to current master host’s private IP.</master-private-ip>

    $ kubeadm init --config=config.yaml

    11. Now you can install pod network on all three masters to bring them in the ready state. I am using weave-net pod network, to apply weave-net run:

    export kubever=$(kubectl version | base64 | tr -d 'n') kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"

    12. By default, k8s doesn’t schedule any workload on the master, so if you want to schedule workload on master node as well, taint all the master nodes using the command:

    $ kubectl taint nodes --all node-role.kubernetes.io/master-

    13. Now that we have functional master nodes, we can join some worker nodes:

    Use the join string you got at the end of kubeadm init command

    $ kubeadm join 10.0.1.234:6443 --token llb1kx.azsbunpbg13tgc8k --discovery-token-ca-cert-hash sha256:1ad2a436ce0c277d0c5bd3826091e72badbd8417ffdbbd4f6584a2de588bf522

    High Availability in action

    The Kubernetes HA cluster will look like:

    [root@master-0 centos]# kubectl get nodes
    NAME       STATUS     ROLES     AGE       VERSION
    master-0   NotReady   master    4h        v1.10.4
    master-1   Ready      master    4h        v1.10.4
    master-2   Ready      master    4h        v1.10.4

    [root@master-0 centos]# kubectl get pods -n kube-system
    NAME                              READY     STATUS     RESTARTS   AGE
    kube-apiserver-master-0            1/1       Unknown    0          4h
    kube-apiserver-master-1            1/1       Running    0          4h
    kube-apiserver-master-2            1/1       Running    0          4h
    kube-controller-manager-master-0   1/1       Unknown    0          4h
    kube-controller-manager-master-1   1/1       Running    0          4h
    kube-controller-manager-master-2   1/1       Running    0          4h
    kube-dns-86f4d74b45-wh795          3/3       Running    0          4h
    kube-proxy-9ts6r                   1/1       Running    0          4h
    kube-proxy-hkbn7                   1/1       NodeLost   0          4h
    kube-proxy-sq6l6                   1/1       Running    0          4h
    kube-scheduler-master-0            1/1       Unknown    0          4h
    kube-scheduler-master-1            1/1       Running    0          4h
    kube-scheduler-master-2            1/1       Running    0          4h
    weave-net-6nzbq                    2/2       NodeLost   0          4h
    weave-net-ndx2q                    2/2       Running    0          4h
    weave-net-w2mfz                    2/2       Running    0          4h

    After failing over one master node the Kubernetes cluster is still accessible.

    [root@master-0 centos]# kubectl get nodes
    NAME       STATUS     ROLES     AGE       VERSION
    master-0   NotReady   master    4h        v1.10.4
    master-1   Ready      master    4h        v1.10.4
    master-2   Ready      master    4h        v1.10.4

    [root@master-0 centos]# kubectl get pods -n kube-system
    NAME                              READY     STATUS     RESTARTS   AGE
    kube-apiserver-master-0            1/1       Unknown    0          4h
    kube-apiserver-master-1            1/1       Running    0          4h
    kube-apiserver-master-2            1/1       Running    0          4h
    kube-controller-manager-master-0   1/1       Unknown    0          4h
    kube-controller-manager-master-1   1/1       Running    0          4h
    kube-controller-manager-master-2   1/1       Running    0          4h
    kube-dns-86f4d74b45-wh795          3/3       Running    0          4h
    kube-proxy-9ts6r                   1/1       Running    0          4h
    kube-proxy-hkbn7                   1/1       NodeLost   0          4h
    kube-proxy-sq6l6                   1/1       Running    0          4h
    kube-scheduler-master-0            1/1       Unknown    0          4h
    kube-scheduler-master-1            1/1       Running    0          4h
    kube-scheduler-master-2            1/1       Running    0          4h
    weave-net-6nzbq                    2/2       NodeLost   0          4h
    weave-net-ndx2q                    2/2       Running    0          4h
    weave-net-w2mfz                    2/2       Running    0          4h

    Even after one node failed, all the important components are up and running. The cluster is still accessible and you can create more pods, deployment services etc.

    [root@master-1 centos]# kubectl create -f nginx.yaml 
    deployment.apps "nginx-deployment" created
    [root@master-1 centos]# kubectl get pods -o wide
    NAME                                READY     STATUS    RESTARTS   AGE       IP              NODE
    nginx-deployment-75675f5897-884kc   1/1       Running   0          10s       10.117.113.98   master-2
    nginx-deployment-75675f5897-crgxt   1/1       Running   0          10s       10.117.113.2    master-1

    Conclusion

    High availability is an important part of reliability engineering, focused on making system reliable and avoid any single point of failure of the complete system. At first glance, its implementation might seem quite complex, but high availability brings tremendous advantages to the system that requires increased stability and reliability. Using highly available cluster is one of the most important aspects of building a solid infrastructure.