Tag: containers

  • Real Time Analytics for IoT Data using Mosquitto, AWS Kinesis and InfluxDB

    The Internet of Things (IoT) is maturing rapidly and finding applications across various industries. Everyday devices are turning into smart devices, which are essentially IoT devices. These devices capture various parameters in and around their environment, generating a huge amount of data. This data needs to be collected, processed, stored and analyzed in order to extract actionable insights from it. To do so, we need to build a data pipeline. In this blog we will build such a pipeline using Mosquitto, Kinesis, InfluxDB and Grafana. We will discuss each component of the pipeline and the steps to build it.

    Why the Analysis of IoT data is different

    In an IoT setup, the data is generated by sensors distributed across various locations. In order to use the data they generate, we first need to bring it to a common location from where the various applications that want to process it can read it.

    Network Protocol

    IoT devices have low computational and network resources. Moreover, these devices write data at very short intervals, so high throughput is expected on the network. For transferring IoT data it is desirable to use lightweight network protocols. A protocol like HTTP uses a complex structure for communication, consuming more resources and making it unsuitable for IoT data transfer. One lightweight protocol suitable for IoT data is MQTT, which we are using in our pipeline. MQTT is designed for machine-to-machine (M2M) connectivity. It uses a publisher/subscriber communication model and helps clients distribute telemetry data with very low network resource consumption. Beyond IoT, MQTT has been found useful in other fields as well.

    Other similar protocols include Constrained Application Protocol (CoAP), Advanced Message Queuing Protocol (AMQP) etc.

    Datastore   

    IoT devices generally collect telemetry about their environment, usually through sensors. In most IoT scenarios, we try to analyze how things have changed over a period of time. Storing this data in a time series database makes our analysis simpler and better. InfluxDB is a popular time series database which we will use in our pipeline. More about time series databases can be read here.

    Pipeline Overview

    The first thing we need for a data pipeline is data. As shown in the image above, the data generated by various sensors is written to a topic in the MQTT message broker. To mimic sensors, we will use a program that uses an MQTT client to write data to the MQTT broker.

    The next component is Amazon Kinesis, which is used for streaming data analysis. It closely resembles Apache Kafka, an open source tool used for similar purposes. Kinesis brings the data generated by a number of clients to a single location from where different consumers can pull it for processing. We are using Kinesis so that multiple consumers can read data from a single location. This approach scales well even if we have multiple message brokers.

    Once the data is written to the MQTT broker, a Kinesis producer subscribes to the topic, pulls the data from the broker and writes it to the Kinesis stream. From the Kinesis stream, the data is pulled by Kinesis consumers, which process it and write it to InfluxDB, a time series database.

    Finally, we use Grafana, a well-known tool for analytics and monitoring. We can connect it to many popular databases and perform analytics and monitoring. Another popular tool in this space is Kibana (the K of the ELK stack).

    Setting up an MQTT Message Broker Server:

    For the MQTT message broker we will use Mosquitto, a popular open source message broker that implements MQTT. The details of downloading and installing Mosquitto for various platforms are available here.

    For Ubuntu, it can be installed using the following commands

    sudo apt-add-repository ppa:mosquitto-dev/mosquitto-ppa
    sudo apt-get update
    sudo apt-get install mosquitto
    service mosquitto status
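Before wiring the broker into the pipeline, it helps to verify it works end to end. A minimal smoke test, assuming the mosquitto-clients package is installed and the broker runs on localhost (the topic name here is just an example, not the one the pipeline uses):

```shell
# In one terminal, subscribe to a test topic (blocks and prints incoming messages):
mosquitto_sub -h localhost -t "sensor/data"

# In a second terminal, publish a sample JSON payload to the same topic:
mosquitto_pub -h localhost -t "sensor/data" -m '{"id":"test","temperature":20.04,"humidity":32.06}'
```

If the payload appears in the subscriber terminal, the broker is ready.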

    Setting up InfluxDB and Grafana

    The simplest way to set up both these components is to use their Docker images directly:

    docker run --name influxdb -p 8083:8083 -p 8086:8086 influxdb:1.0
    docker run --name grafana -p 3000:3000 --link influxdb grafana/grafana:3.1.1

    For InfluxDB we have mapped two ports: port 8086 is the HTTP API endpoint, while 8083 is the administration web server’s port. We need to create a database where we will write our data.

    For creating a database we can directly go to the console at <influxdb-ip>:8083 and run the command:

    CREATE DATABASE "iotdata"

    Or we can do it via an HTTP request:

    curl -XPOST "http://localhost:8086/query" --data-urlencode "q=CREATE DATABASE iotdata"

    Creating a Kinesis stream

    In Kinesis, we create streams to which the Kinesis producers write the data coming from various sources and from which the Kinesis consumers read it. Within a stream, the data is distributed across shards. For our purpose, one shard is enough.
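The stream can be created from the AWS console, or with the AWS CLI as sketched below (assumes configured credentials and a default region; the stream name is our choice):

```shell
# Create a stream with a single shard:
aws kinesis create-stream --stream-name iot-data-stream --shard-count 1

# Block until the stream becomes ACTIVE:
aws kinesis wait stream-exists --stream-name iot-data-stream

# Verify the stream status:
aws kinesis describe-stream --stream-name iot-data-stream --query 'StreamDescription.StreamStatus'
```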

    Creating the MQTT client

    We will use the Golang client available in this repository to connect to our message broker server and write data to a specific topic. We will first create a new MQTT client. Here we can see the list of options we have for configuring our MQTT client.

    Once we create the options object we can pass it to the NewClient() method, which returns the MQTT client. Now we can write data to the MQTT server. We have defined the structure of the data in the SensorData struct. To mimic two sensors writing telemetry data to the MQTT broker, we run two goroutines which push data to the MQTT server every five seconds.

    package publisher
    
    import (
    	"config"
    	"encoding/json"
    	"fmt"
    	"log"
    	"math/rand"
    	"os"
    	"time"
    
    	"github.com/eclipse/paho.mqtt.golang"
    )
    
    type SensorData struct {
    	Id          string  `json:"id"`
    	Temperature float64 `json:"temperature"`
    	Humidity    float64 `json:"humidity"`
    	Timestamp   int64   `json:"timestamp"`
    	City        string  `json:"city"`
    }
    
    func StartMQTTPublisher() {
    	fmt.Println("MQTT publisher Started")
    	mqtt.DEBUG = log.New(os.Stdout, "", 0)
    	mqtt.ERROR = log.New(os.Stdout, "", 0)
    	opts := mqtt.NewClientOptions().AddBroker(config.GetMqttServerurl()).SetClientID("MqttPublisherClient")
    	opts.SetKeepAlive(2 * time.Second)
    	opts.SetPingTimeout(1 * time.Second)
    	c := mqtt.NewClient(opts)
    	if token := c.Connect(); token.Wait() && token.Error() != nil {
    		panic(token.Error())
    	}
    
    	go func() {
    		t := 20.04
    		h := 32.06
    		for i := 0; i < 100; i++ {
    			sensordata := SensorData{
    				Id:          "CITIMUM",
    				Temperature: t,
    				Humidity:    h,
    				Timestamp:   time.Now().Unix(),
    				City:        "Mumbai",
    			}
    			requestBody, err := json.Marshal(sensordata)
    			if err != nil {
    				fmt.Println(err)
    			}
    			token := c.Publish(config.GetMQTTTopicName(), 0, false, requestBody)
    			token.Wait()
    			if i < 50 {
    				t = t + 1*rand.Float64()
    				h = h + 1*rand.Float64()
    			} else {
    				t = t - 1*rand.Float64()
    				h = h - 1*rand.Float64()
    			}
    			time.Sleep(5 * time.Second)
    		}
    	}()
    	go func() {
    		t := 16.02
    		h := 24.04
    		for i := 0; i < 100; i++ {
    			sensordata := SensorData{
    				Id:          "CITIPUN",
    				Temperature: t,
    				Humidity:    h,
    				Timestamp:   time.Now().Unix(),
    				City:        "Pune",
    			}
    			requestBody, err := json.Marshal(sensordata)
    			if err != nil {
    				fmt.Println(err)
    			}
    			token := c.Publish(config.GetMQTTTopicName(), 0, false, requestBody)
    			token.Wait()
    			if i < 50 {
    				t = t + 1*rand.Float64()
    				h = h + 1*rand.Float64()
    			} else {
    				t = t - 1*rand.Float64()
    				h = h - 1*rand.Float64()
    			}
    			time.Sleep(5 * time.Second)
    		}
    	}()
    	time.Sleep(1000 * time.Second)
    	c.Disconnect(250)
    
    }

    Create a Kinesis Producer

    Now we will create a Kinesis producer which subscribes to the topic our MQTT client writes to, pulls the data from the broker and pushes it to the Kinesis stream. Just like in the previous section, here too we first create an MQTT client which connects to the message broker and subscribes to the topic to which our clients/publishers write data. In the client options, we can define a function which will be called whenever data is written to this topic. We have created a function postDataTokinesisStream() which connects to Kinesis using the Kinesis client and writes data to the Kinesis stream every time a message is pushed to the topic.

    package producer
    
    import (
    	"config"
    	"fmt"
    
    	"os"
    	"time"
    
    	"github.com/aws/aws-sdk-go/service/kinesis"
    
    	mqtt "github.com/eclipse/paho.mqtt.golang"
    )
    
    func postDataTokinesisStream(client mqtt.Client, message mqtt.Message) {
    	fmt.Printf("Received message on topic: %s\nMessage: %s\n", message.Topic(), message.Payload())
    	streamName := config.GetKinesisStreamName()
    	kclient := config.GetKinesisClient()
    	var putRecordInput kinesis.PutRecordInput
    	partitionKey := message.Topic()
    	putRecordInput.PartitionKey = &partitionKey
    	putRecordInput.StreamName = &streamName
    	putRecordInput.Data = message.Payload()
    	putRecordOutput, err := kclient.PutRecord(&putRecordInput)
    	if err != nil {
    		fmt.Println(err)
    	} else {
    		fmt.Println(putRecordOutput)
    	}
    
    }
    
    func StartKinesisProducer() {
    	fmt.Println("Kinesis Producer Started")
    	c := make(chan os.Signal, 1)
    	opts := mqtt.NewClientOptions().AddBroker(config.GetMqttServerurl()).SetClientID("MqttSubscriberClient")
    	opts.SetKeepAlive(2 * time.Second)
    	opts.SetPingTimeout(1 * time.Second)
    	opts.OnConnect = func(c mqtt.Client) {
    		if token := c.Subscribe(config.GetMQTTTopicName(), 0, postDataTokinesisStream); token.Wait() && token.Error() != nil {
    			panic(token.Error())
    		}
    	}
    
    	client := mqtt.NewClient(opts)
    	if token := client.Connect(); token.Wait() && token.Error() != nil {
    		panic(token.Error())
    	} else {
    		fmt.Printf("Connected to %s\n", config.GetMqttServerurl())
    	}
    
    	// Block forever so the subscriber keeps receiving messages.
    	<-c
    }

    Create a Kinesis Consumer

    Now that the data is available in our Kinesis stream, we can pull it for processing. In the Kinesis consumer, we create a Kinesis client just like we did in the previous section and then pull data from it. We first call the DescribeStream method, which returns the shard ID; we then use this shard ID to get a ShardIterator, and finally we fetch the records by passing the ShardIterator to the GetRecords() method. GetRecords() also returns a NextShardIterator which we can use to continuously poll for records in the shard until NextShardIterator becomes nil.

    package consumer
    
    import (
    	"config"
    	"fmt"
    
    	"github.com/aws/aws-sdk-go/service/kinesis"
    	"velotio.com/dao"
    )
    
    func StartKinesisConsumer() {
    	fmt.Println("Kinesis Consumer Started")
    	client := config.GetKinesisClient()
    	streamName := config.GetKinesisStreamName()
    
    	// Describe the stream to discover its shards.
    	var describeStreamInput kinesis.DescribeStreamInput
    	describeStreamInput.StreamName = &streamName
    	describeStreamOutput, err := client.DescribeStream(&describeStreamInput)
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	fmt.Println(*describeStreamOutput.StreamDescription.Shards[0].ShardId)
    
    	// Get an iterator positioned at the oldest record in the shard.
    	var getShardIteratorInput kinesis.GetShardIteratorInput
    	getShardIteratorInput.ShardId = describeStreamOutput.StreamDescription.Shards[0].ShardId
    	getShardIteratorInput.StreamName = &streamName
    	shardIteratorType := "TRIM_HORIZON"
    	getShardIteratorInput.ShardIteratorType = &shardIteratorType
    	getShardIteratorOutput, err := client.GetShardIterator(&getShardIteratorInput)
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	fmt.Println(*getShardIteratorOutput.ShardIterator)
    
    	// Poll the shard, following NextShardIterator on every call.
    	var getRecordsInput kinesis.GetRecordsInput
    	getRecordsInput.ShardIterator = getShardIteratorOutput.ShardIterator
    	getRecordsOutput, err := client.GetRecords(&getRecordsInput)
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	for getRecordsOutput.NextShardIterator != nil {
    		for i := 0; i < len(getRecordsOutput.Records); i++ {
    			sdf := &dao.SensorDataFiltered{}
    			sdf.PostDataToInfluxDB(getRecordsOutput.Records[i].Data)
    		}
    		getRecordsInput.ShardIterator = getRecordsOutput.NextShardIterator
    		getRecordsOutput, err = client.GetRecords(&getRecordsInput)
    		if err != nil {
    			fmt.Println(err)
    			return
    		}
    	}
    }

    Processing the data and writing it to InfluxDB

    Now we do a simple processing step of filtering the data. The data that we get from the sensor has the fields sensorId, temperature, humidity, city, and timestamp, but we are interested only in the values of temperature and humidity for a city, so we have created a new struct SensorDataFiltered which contains only the fields we need.

    For every record that the Kinesis consumer receives, it creates an instance of the SensorDataFiltered type and calls the PostDataToInfluxDB() method, where the record received from the Kinesis stream is unmarshaled into the SensorDataFiltered type and sent to InfluxDB. Here we need to provide the name of the database we created earlier in the variable dbName, and the InfluxDB host and port values in dbHost and dbPort respectively.

    In the InfluxDB request body, the first value that we provide is used as the measurement, which is InfluxDB’s construct for storing similar data together. Then come the tags; we have used `city` as our tag so that we can filter the data based on it. After the tags come the actual field values. For more details on the InfluxDB data write format please refer here.
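The write request body follows the InfluxDB line protocol: a measurement name, optional comma-separated tags, a space, then the field key=value pairs. A quick sketch of the string the code below assembles (the values here are sample data):

```shell
city="Mumbai"
humidity="32.06"
temperature="20.04"
# Line protocol shape: measurement,tag_set field_set
line="sensordata,city=${city} humidity=${humidity},temperature=${temperature}"
echo "$line"
# prints: sensordata,city=Mumbai humidity=32.06,temperature=20.04
```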

    package dao
    
    import (
    	"bytes"
    	"crypto/tls"
    	"encoding/json"
    	"fmt"
    	"net/http"
    )
    
    type SensorDataFiltered struct {
    	Temperature float64 `json:"temperature"`
    	Humidity    float64 `json:"humidity"`
    	City        string  `json:"city"`
    }
    
    var dbName = "iotdata"
    var dbHost = "184.73.62.30"
    var dbPort = "8086"
    
    func (sdf *SensorDataFiltered) PostDataToInfluxDB(data []byte) {
    	if err := json.Unmarshal(data, sdf); err != nil {
    		fmt.Println(err)
    		return
    	}
    	fmt.Println(sdf.Temperature, sdf.Humidity)
    	url := "http://" + dbHost + ":" + dbPort + "/write?db=" + dbName
    	humidity := fmt.Sprintf("%.2f", sdf.Humidity)
    	temperature := fmt.Sprintf("%.2f", sdf.Temperature)
    	// InfluxDB line protocol: measurement,tag_set field_set
    	requestBody := "sensordata,city=" + sdf.City + " humidity=" + humidity + ",temperature=" + temperature
    	req, err := http.NewRequest("POST", url, bytes.NewBuffer([]byte(requestBody)))
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	httpclient := &http.Client{
    		Transport: &http.Transport{
    			TLSClientConfig: &tls.Config{
    				InsecureSkipVerify: true,
    			},
    		},
    	}
    	resp, err := httpclient.Do(req)
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	defer resp.Body.Close()
    	fmt.Println("Status code for influxdb data write request =", resp.StatusCode)
    }

    Once the data is written to InfluxDB we can see it in the web console by querying the measurement created in our database.
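The same query can also be issued over InfluxDB's HTTP API; a sketch, assuming InfluxDB runs on localhost and the iotdata database created earlier:

```shell
# Fetch a few points from the sensordata measurement for one city:
curl -G "http://localhost:8086/query" \
  --data-urlencode "db=iotdata" \
  --data-urlencode "q=SELECT temperature, humidity FROM sensordata WHERE city='Mumbai' LIMIT 5"
```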

    Putting everything together in our main function

    Now we simply need to call the functions we discussed above and run our main program. Note that we have used `go` before the first two function calls, which makes them goroutines so that they execute concurrently.

    On running the code you will see the logs for all the stages of our pipeline written to stdout. This closely resembles real-life scenarios where data written by IoT devices gets processed in near real time.

    package main
    
    import (
    	"time"
    
    	"velotio.com/consumer"
    	"velotio.com/producer"
    	"velotio.com/publisher"
    )
    
    func main() {
    
    	go producer.StartKinesisProducer()
    	go publisher.StartMQTTPublisher()
    	time.Sleep(5 * time.Second)
    	consumer.StartKinesisConsumer()
    
    }

    Visualization through Grafana

    We can access the Grafana web console at port 3000 of the machine on which it is running. First, we need to add our InfluxDB as a data source to it under the data sources option.

    For creating dashboard go to the dashboard option and choose new. Once the dashboard is created we can start by adding a panel.

    We need to add Influxdb data source that we added earlier as the panel data source and write queries as shown in the image below.

    We can repeat the same process for adding another panel to the dashboard this time choosing a different city in our query.

    Conclusion:

    IoT data analytics is a fast-evolving and interesting space. The number of IoT devices is growing rapidly, and there is a great opportunity to get valuable insights from the huge amount of data generated by these devices. In this blog, I tried to help you grab that opportunity by building a near-real-time data pipeline for IoT data. If you liked it, please share and subscribe to our blog.

  • Taking Amazon’s Elastic Kubernetes Service for a Spin

    With the introduction of Elastic Kubernetes Service at AWS re:Invent last year, AWS finally threw their hat into the ever-booming space of managed Kubernetes services. In this blog post, we will learn the basic concepts of EKS, launch an EKS cluster and deploy a multi-tier application on it.

    What is Elastic Kubernetes service (EKS)?

    Kubernetes works on a master-slave architecture, where the master is also referred to as the control plane. If the master goes down it brings the entire cluster down with it, so ensuring high availability of the master is absolutely critical: it can be a single point of failure. Keeping the master highly available and managing all the worker nodes along with it is a cumbersome task, so it is desirable for organizations to have a managed Kubernetes cluster and focus on the most important task, running their applications, rather than managing the cluster. Other cloud providers like Google Cloud and Azure already had their managed Kubernetes services, named GKE and AKS respectively. Now, with EKS, Amazon has also rolled out its managed Kubernetes offering to provide a seamless way to run Kubernetes workloads.

    Key EKS concepts:

    EKS takes full advantage of the fact that it runs on AWS: instead of building Kubernetes-specific features from scratch, it reuses or plugs in existing AWS services to achieve Kubernetes-specific functionality. Here is a brief overview:

    IAM integration: Amazon EKS integrates IAM authentication with Kubernetes RBAC (the role-based access control system native to Kubernetes) with the help of the Heptio Authenticator, a tool that uses AWS IAM credentials to authenticate to a Kubernetes cluster. Here we can directly attach an RBAC role to an IAM entity, which saves the pain of managing another set of credentials at the cluster level.

    Container Interface: AWS has developed an open source CNI plugin which takes advantage of the fact that multiple network interfaces can be attached to a single EC2 instance, and that these interfaces can have multiple secondary private IPs associated with them. These secondary IPs are used to give pods running on EKS real IP addresses from the VPC CIDR pool. This improves latency for inter-pod communication, as the traffic flows without any overlay.

    ELB Support: We can use any of the AWS ELB offerings (Classic, Network, Application) to route traffic to services running on the worker nodes.

    Auto Scaling: The number of worker nodes in the cluster can grow and shrink using the EC2 Auto Scaling service.

    Route 53: With the help of the ExternalDNS project and AWS Route 53, we can manage the DNS entries for the load balancers that get created when we create an ingress object or a Service of type LoadBalancer in our EKS cluster. This way the DNS names are always in sync with the load balancers and we don’t have to give them separate attention.

    Shared responsibility for the cluster: The responsibilities for an EKS cluster are shared between AWS and the customer. AWS takes care of the most critical part, managing the control plane (API server and etcd database), and customers manage the worker nodes. Amazon EKS automatically runs Kubernetes with three masters across three Availability Zones to protect against a single point of failure. Control plane nodes are also monitored and replaced if they fail, and are patched and updated automatically. This ensures high availability of the cluster and makes it extremely simple to migrate existing workloads to EKS.

    Prerequisites for launching an EKS cluster:

    1.  IAM role to be assumed by the cluster: Create an IAM role that allows EKS to manage a cluster on your behalf. Choose EKS as the service which will assume this role and attach the AWS managed policies ‘AmazonEKSClusterPolicy’ and ‘AmazonEKSServicePolicy’ to it.

    2.  VPC for the cluster: We need to create the VPC where our cluster is going to reside, with subnets, internet gateways and other components configured. We can use an existing VPC if we wish, create one using the CloudFormation script provided by AWS here, or use the Terraform script available here. The scripts take the CIDR block of the VPC and of three subnets as arguments.

    Launching an EKS cluster:

    1.  Using the web console: With the prerequisites in place, we can go to the EKS console and launch an EKS cluster. When launching, we need to provide a name for the EKS cluster, choose the Kubernetes version to use, provide the IAM role we created in step one, and choose a VPC. Once we choose a VPC, we also need to select the subnets where we want our worker nodes to be launched; by default all the subnets in the VPC are selected. Finally, we need to provide a security group, which is applied to the elastic network interfaces (ENIs) that EKS creates to allow the control plane to communicate with the worker nodes.

    NOTE: A couple of things to note here: the subnets must be in at least two different availability zones, and the security group that we provided is later updated when we create the worker node cluster, so it is better not to use this security group with any other entity, or else to be completely sure of the changes happening to it.

    2. Using the AWS CLI:

    aws eks create-cluster --name eks-blog-cluster --role-arn arn:aws:iam::XXXXXXXXXXXX:role/eks-service-role \
    --resources-vpc-config subnetIds=subnet-0b8da2094908e1b23,subnet-01a46af43b2c5e16c,securityGroupIds=sg-03fa0c02886c183d4

    {
        "cluster": {
            "status": "CREATING",
            "name": "eks-blog-cluster",
            "certificateAuthority": {},
            "roleArn": "arn:aws:iam::XXXXXXXXXXXX:role/eks-service-role",
            "resourcesVpcConfig": {
                "subnetIds": [
                    "subnet-0b8da2094908e1b23",
                    "subnet-01a46af43b2c5e16c"
                ],
                "vpcId": "vpc-0364b5ed9f85e7ce1",
                "securityGroupIds": [
                    "sg-03fa0c02886c183d4"
                ]
            },
            "version": "1.10",
            "arn": "arn:aws:eks:us-east-1:XXXXXXXXXXXX:cluster/eks-blog-cluster",
            "createdAt": 1535269577.147
        }
    }

    In the response, we see that the cluster is in the CREATING state. It will take a few minutes before it is available. We can check the status using the command below:

    aws eks describe-cluster --name=eks-blog-cluster
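If we only care about the status field, we can narrow the output with a query, or block until the cluster is ready (both are standard AWS CLI invocations; adjust the cluster name to yours):

```shell
# Print just the cluster status (CREATING / ACTIVE / ...):
aws eks describe-cluster --name eks-blog-cluster --query 'cluster.status' --output text

# Block until the cluster reaches ACTIVE:
aws eks wait cluster-active --name eks-blog-cluster
```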

    Configure kubectl for EKS:

    We know that in Kubernetes we interact with the control plane by making requests to the API server. The most common way to interact with the API server is via the kubectl command line utility. As our cluster is ready, we now need to install kubectl.

    1.  Install the kubectl binary

    curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl

    Give executable permission to the binary.

    chmod +x ./kubectl

    Move the kubectl binary to a folder in your system’s $PATH.

    sudo cp ./kubectl /usr/local/bin/kubectl

    As discussed earlier EKS uses AWS IAM Authenticator for Kubernetes to allow IAM authentication for your Kubernetes cluster. So we need to download and install the same.

    2.  Install aws-iam-authenticator

    curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/bin/linux/amd64/aws-iam-authenticator

    Give executable permission to the binary

    chmod +x ./aws-iam-authenticator

    Move the aws-iam-authenticator binary to a folder in your system’s $PATH.

    sudo cp ./aws-iam-authenticator /bin/aws-iam-authenticator

    3.  Create the kubeconfig file

    First create the directory.

    mkdir -p ~/.kube

    Open a config file in the folder created above

    vi ~/.kube/config-eks-blog-cluster

    Paste the below code in the file

    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        server: https://DBFE36D09896EECAB426959C35FFCC47.sk1.us-east-1.eks.amazonaws.com
        certificate-authority-data: "...................."
      name: kubernetes
    contexts:
    - context:
        cluster: kubernetes
        user: aws
      name: aws
    current-context: aws
    preferences: {}
    users:
    - name: aws
      user:
        exec:
          apiVersion: client.authentication.k8s.io/v1alpha1
          command: aws-iam-authenticator
          args:
          - "token"
          - "-i"
          - "eks-blog-cluster"

    Replace the values of server and certificate-authority-data with the values for your cluster and certificate, and also update the cluster name in the args section. You can get these values from the web console or by using the command:

    aws eks describe-cluster --name=eks-blog-cluster

    Save and exit.

    Add that file path to your KUBECONFIG environment variable so that kubectl knows where to look for your cluster configuration.

    export KUBECONFIG=$KUBECONFIG:~/.kube/config-eks-blog-cluster

    To verify that kubectl is now properly configured:

    kubectl get all
    NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
    service/kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   50m

    Launch and configure worker nodes:

    Now we need to launch worker nodes before we can start deploying apps. We can create the worker node cluster using the CloudFormation script provided by AWS, which is available here, or the Terraform script available here. The scripts take the following parameters:

    • ClusterName: Name of the Amazon EKS cluster we created earlier.
    • ClusterControlPlaneSecurityGroup: Id of the security group we used in EKS cluster.
    • NodeGroupName: Name for the worker node auto scaling group.
    • NodeAutoScalingGroupMinSize: Minimum number of worker nodes that you always want in your cluster.
    • NodeAutoScalingGroupMaxSize: Maximum number of worker nodes that you want in your cluster.
    • NodeInstanceType: Type of worker node you wish to launch.
    • NodeImageId: AWS provides an Amazon EKS-optimized AMI to be used for worker nodes. Currently EKS is available in only two AWS regions, Oregon and N. Virginia, and the AMI IDs are ami-02415125ccd555295 and ami-048486555686d18a0 respectively.
    • KeyName: Name of the key you will use to ssh into the worker node.
    • VpcId: Id of the VPC that we created earlier.
    • Subnets: Subnets from the VPC we created earlier.
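As a sketch, the CloudFormation stack can also be launched from the CLI. The parameter keys below are the ones listed above; the stack name, template URL placeholder, instance type and key name are hypothetical examples, not values from the post:

```shell
aws cloudformation create-stack \
  --stack-name eks-blog-worker-nodes \
  --template-url <worker-node-template-url> \
  --capabilities CAPABILITY_IAM \
  --parameters \
    ParameterKey=ClusterName,ParameterValue=eks-blog-cluster \
    ParameterKey=ClusterControlPlaneSecurityGroup,ParameterValue=sg-03fa0c02886c183d4 \
    ParameterKey=NodeGroupName,ParameterValue=eks-blog-nodes \
    ParameterKey=NodeAutoScalingGroupMinSize,ParameterValue=1 \
    ParameterKey=NodeAutoScalingGroupMaxSize,ParameterValue=3 \
    ParameterKey=NodeInstanceType,ParameterValue=t2.medium \
    ParameterKey=NodeImageId,ParameterValue=ami-048486555686d18a0 \
    ParameterKey=KeyName,ParameterValue=<your-key-name> \
    ParameterKey=VpcId,ParameterValue=vpc-0364b5ed9f85e7ce1 \
    ParameterKey=Subnets,ParameterValue='subnet-0b8da2094908e1b23\,subnet-01a46af43b2c5e16c'
```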

    To enable worker nodes to join your cluster, we need to download, edit and apply the AWS authenticator config map.

    Download the config map:

    curl -O https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/aws-auth-cm.yaml

    Open it in an editor

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: aws-auth
      namespace: kube-system
    data:
      mapRoles: |
        - rolearn: <ARN of instance role (not instance profile)>
          username: system:node:{{EC2PrivateDNSName}}
          groups:
            - system:bootstrappers
            - system:nodes

    Edit the value of rolearn with the ARN of the role of your worker nodes. This value is available in the output of the script that you ran. Save the change and then apply it:

    kubectl apply -f aws-auth-cm.yaml

    Now you can check if the nodes have joined the cluster or not.

    kubectl get nodes
    NAME                         STATUS   ROLES    AGE   VERSION
    ip-10-0-2-171.ec2.internal   Ready    <none>   12s   v1.10.3
    ip-10-0-3-58.ec2.internal    Ready    <none>   14s   v1.10.3

    Deploying an application:

    As our cluster is now completely ready, we can start deploying applications on it. We will deploy a simple books API application which connects to a MongoDB database and allows users to store, list and delete book information.

    1. MongoDB Deployment YAML

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: mongodb
    spec:
      template:
        metadata:
          labels:
            app: mongodb
        spec:
          containers:
          - name: mongodb
            image: mongo
            ports:
            - name: mongodbport
              containerPort: 27017
              protocol: TCP

    2. Test Application Deployment YAML

    apiVersion: apps/v1beta1
    kind: Deployment
    metadata:
      name: test-app
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: test-app
        spec:
          containers:
          - name: test-app
            image: akash125/pyapp
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 3000

    3. MongoDB Service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: mongodb-service
    spec:
      ports:
      - port: 27017
        targetPort: 27017
        protocol: TCP
        name: mongodbport
      selector:
        app: mongodb

    4. Test Application Service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: test-service
    spec:
      type: LoadBalancer
      ports:
      - name: test-service
        port: 80
        protocol: TCP
        targetPort: 3000
      selector:
        app: test-app

    Services

    $ kubectl create -f mongodb-service.yaml
    $ kubectl create -f testapp-service.yaml

    Deployments

    $ kubectl create -f mongodb-deployment.yaml
    $ kubectl create -f testapp-deployment.yaml
    $ kubectl get services
    NAME              TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
    kubernetes        ClusterIP      172.20.0.1      <none>          443/TCP        12m
    mongodb-service   ClusterIP      172.20.55.194   <none>          27017/TCP      4m
    test-service      LoadBalancer   172.20.188.77   a7ee4f4c3b0ea   80:31427/TCP   3m

    In the EXTERNAL-IP column of test-service we see the DNS name of a load balancer. We can now access the application from outside the cluster using this DNS name.

    To Store Data :

    curl -X POST -d '{"name":"A Game of Thrones (A Song of Ice and Fire)", "author":"George R.R. Martin","price":343}' http://a7ee4f4c3b0ea11e8b0f912f36098e4d-672471149.us-east-1.elb.amazonaws.com/books
    {"id":"5b8fab49fa142b000108d6aa","name":"A Game of Thrones (A Song of Ice and Fire)","author":"George R.R. Martin","price":343}

    To Get Data :

    curl -X GET http://a7ee4f4c3b0ea11e8b0f912f36098e4d-672471149.us-east-1.elb.amazonaws.com/books
    [{"id":"5b8fab49fa142b000108d6aa","name":"A Game of Thrones (A Song of Ice and Fire)","author":"George R.R. Martin","price":343}]

    We can also put the URL used in the curl commands above directly into our browser; we will get the same response.

    Now our application is deployed on EKS and can be accessed by the users.

    Comparison between GKE, ECS and EKS:

    Cluster creation: Creating a GKE or ECS cluster is much simpler than creating an EKS cluster, with GKE being the simplest of the three.

    Cost: With both GKE and ECS we pay only for the infrastructure that is visible to us, i.e., servers, volumes, ELBs, etc.; there is no cost for master nodes or other cluster management services. With EKS, however, there is a charge of $0.20 per hour for the control plane.

    Add-ons: GKE provides the option of using Calico as the network plugin, which helps in defining network policies for controlling inter-pod communication (by default, all pods in k8s can communicate with each other).
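
    As a hedged sketch (the policy name and label selectors here are assumptions based on the labels used earlier in this post, not something from the original setup), a Kubernetes NetworkPolicy that Calico can enforce to restrict inter-pod traffic might look like:

    ```yaml
    # Illustrative only: allow mongodb pods to receive traffic solely from test-app pods.
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-testapp-to-mongodb
    spec:
      podSelector:
        matchLabels:
          app: mongodb
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: test-app
        ports:
        - protocol: TCP
          port: 27017
    ```

    Any pod without the app: test-app label would then be unable to reach MongoDB on port 27017.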

    Serverless: An ECS cluster can be created using Fargate, which is the Containers as a Service (CaaS) offering from AWS. Similarly, EKS is also expected to support Fargate very soon.

    In terms of availability and scalability, all three services are on par with each other.

    Conclusion:

    In this blog post we learned the basic concepts of EKS, launched our own EKS cluster and deployed an application on it as well. EKS is a much-awaited service from AWS, especially for folks who were already running their Kubernetes workloads on AWS, as now they can easily migrate to EKS and have a fully managed Kubernetes control plane. EKS is expected to be adopted by many organisations in the near future.


  • Continuous Integration & Delivery (CI/CD) for Kubernetes Using CircleCI & Helm

    Introduction

    Kubernetes is getting adopted rapidly across the software industry and is becoming the most preferred option for deploying and managing containerized applications. Once we have a fully functional Kubernetes cluster, we need an automated process to deploy our applications on it. In this blog post, we will create a fully automated “commit to deploy” pipeline for Kubernetes. We will use CircleCI and Helm for it.

    What is CircleCI?

    CircleCI is a fully managed SaaS offering which allows us to build, test, or deploy our code on every check-in. To get started with CircleCI, we log into their web console with our GitHub or Bitbucket credentials, add a project for the repository we want to build, and then add the CircleCI config file to that repository. The CircleCI config file is a YAML file which lists the steps we want to execute every time code is pushed to the repository.

    Some salient features of CircleCI are:

    1. Little or no operational overhead, as the infrastructure is managed completely by CircleCI.
    2. User authentication is done via GitHub or Bitbucket, so user management is quite simple.
    3. It automatically notifies the build status to the GitHub/Bitbucket email ids of the users who are following the project on CircleCI.
    4. The UI is quite simple and gives a holistic view of builds.
    5. It can be integrated with Slack, HipChat, Jira, etc.

    What is Helm?

    Helm is a chart manager, where a chart is a package of Kubernetes resources. Helm allows us to bundle related Kubernetes objects into charts and treat them as a single unit of deployment referred to as a release. For example, say you have an application app1 which you want to run on Kubernetes. For app1 you create multiple Kubernetes resources like a deployment, service, ingress, horizontal pod autoscaler, etc. While deploying the application, you would need to create all these Kubernetes resources separately by applying their manifest files. Helm allows us to group all those files into one chart (a Helm chart), and then we just need to deploy the chart. This also makes deleting and upgrading the resources quite simple.

    Some other benefits of Helm are:

    1. It makes the deployment highly configurable. Thus, just by changing the parameters, we can use the same chart for deploying to multiple environments like staging/prod or multiple cloud providers.
    2. We can roll back to a previous release with a single helm command.
    3. It makes managing and sharing Kubernetes-specific applications much simpler.

    Note: Helm is composed of two components: the Helm client and the Tiller server. Tiller is the component which runs inside the cluster as a deployment and serves the requests made by the Helm client. Tiller has potential security vulnerabilities, thus we will use tillerless helm in our pipeline, which runs Tiller only when we need it.

    Building the Pipeline

    Overview:

    We will create the pipeline for a Golang application. The pipeline will first build the binary, create a docker image from it, push the image to ECR, then deploy it on the Kubernetes cluster using its helm chart.

    We will use a simple app which just exposes a `hello` endpoint and returns the hello world message:

    package main
    
    import (
    	"encoding/json"
    	"net/http"
    	"log"
    	"github.com/gorilla/mux"
    )
    
    type Message struct {
    	Msg string
    }
    
    func helloWorldJSON(w http.ResponseWriter, r *http.Request) {
    	m := Message{"Hello World"}
    	response, _ := json.Marshal(m)
    	w.Header().Set("Content-Type", "application/json")
    	w.WriteHeader(http.StatusOK)
    	w.Write(response)
    }
    func main() {
    	r := mux.NewRouter()
    	r.HandleFunc("/hello", helloWorldJSON).Methods("GET")
    	if err := http.ListenAndServe(":8080", r); err != nil {
    		log.Fatal(err)
    	}
    }

    We will create a docker image for hello app using the following Dockerfile:

    FROM centos/systemd
    
    MAINTAINER "Akash Gautam" <akash.gautam@velotio.com>
    
    COPY hello-app  /
    
    ENTRYPOINT ["/hello-app"]

    Creating Helm Chart:

    Now we need to create the helm chart for hello app.

    First, we create the Kubernetes manifest files. We will create a deployment and a service file:

    apiVersion: apps/v1beta1
    kind: Deployment
    metadata:
      name: helloapp
    spec:
      replicas: 1
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 1
      template:
        metadata:
          labels:
            app: helloapp
            env: {{ .Values.labels.env }}
            cluster: {{ .Values.labels.cluster }}
        spec:
          containers:
          - name: helloapp
            image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
            imagePullPolicy: {{ .Values.image.imagePullPolicy }}
            readinessProbe:
              httpGet:
                path: /hello
                port: 8080
              initialDelaySeconds: 5
              periodSeconds: 5
              successThreshold: 1

    apiVersion: v1
    kind: Service
    metadata:
      name: helloapp
    spec:
      type: {{ .Values.service.type }}
      ports:
      - name: helloapp
        port: {{ .Values.service.port }}
        protocol: TCP
        targetPort: {{ .Values.service.targetPort }}
      selector:
        app: helloapp

    In the above files, you must have noticed that we have used the .Values object. All the values that we specify in the values.yaml file of our Helm chart can be accessed via the .Values object inside the templates.

    Let’s create the helm chart now:

    helm create helloapp

    The above command will create the Helm chart folder structure for us.

    helloapp/
    |
    |- .helmignore # Contains patterns to ignore when packaging Helm charts.
    |
    |- Chart.yaml # Information about your chart
    |
    |- values.yaml # The default values for your templates
    |
    |- charts/ # Charts that this chart depends on
    |
    |- templates/ # The template files

    We can remove the charts/ folder inside our helloapp chart, as our chart won’t have any sub-charts. Now we need to move our Kubernetes manifest files to the templates/ folder and update values.yaml and Chart.yaml.

    Our values.yaml looks like:

    image:
      tag: 0.0.1
      repository: 123456789870.dkr.ecr.us-east-1.amazonaws.com/helloapp
      imagePullPolicy: Always
    
    labels:
      env: "staging"
      cluster: "eks-cluster-blog"
    
    service:
      port: 80
      targetPort: 8080
      type: LoadBalancer

    This allows us to make our deployment more configurable. For example, here we have set our service type as LoadBalancer in values.yaml, but if we want to change it to NodePort we just need to set it while installing the chart (--set service.type=NodePort). Similarly, we have set the image pull policy as Always, which is fine for a development/staging environment, but when we deploy to production we may want to set it as IfNotPresent. In our chart, we need to identify the parameters/values which may change from one environment to another and make them configurable. This allows us to be flexible with our deployment and reuse the same chart.

    Finally, we need to update the Chart.yaml file. This file mostly contains metadata about the chart like the name, version, maintainer, etc., where name & version are the two mandatory fields for Chart.yaml.

    version: 1.0.0
    appVersion: 0.0.1
    name: helloapp
    description: Helm chart for helloapp
    source:
      - https://github.com/akash-gautam/helloapp

    Now that our Helm chart is ready, we can start with the pipeline. We need to create a folder named .circleci in the root folder of our repository and create a file named config.yml in it. In our config.yml we have defined two jobs: build&pushImage and deploy.

    Configure the pipeline:

    build&pushImage:
        working_directory: /go/src/hello-app (1)
        docker:
          - image: circleci/golang:1.10 (2)
        steps:
          - checkout (3)
          - run: (4)
              name: build the binary
              command: go build -o hello-app
          - setup_remote_docker: (5)
              docker_layer_caching: true
          - run: (6)
              name: Set the tag for the image, we will concatenate the app version and CircleCI build number with a `-` char in between
              command:  echo 'export TAG=$(cat VERSION)-$CIRCLE_BUILD_NUM' >> $BASH_ENV
          - run: (7)
              name: Build the docker image
              command: docker build . -t ${CIRCLE_PROJECT_REPONAME}:$TAG
          - run: (8)
              name: Install AWS cli
              command: export TZ=Europe/Minsk && sudo ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ | sudo tee /etc/timezone && sudo apt-get update && sudo apt-get install -y awscli
          - run: (9)
              name: Login to ECR
              command: $(aws ecr get-login --region $AWS_REGION | sed -e 's/-e none//g')
          - run: (10)
              name: Tag the image with ECR repo name 
              command: docker tag ${CIRCLE_PROJECT_REPONAME}:$TAG ${HELLOAPP_ECR_REPO}:$TAG    
          - run: (11)
              name: Push the image to the ECR repo
              command: docker push ${HELLOAPP_ECR_REPO}:$TAG

    1. We set the working directory for our job; we set it on the GOPATH so that we don’t need to do anything additional.
    2. We set the Docker image inside which we want the job to run; as our app is built using Golang, we use an image which already has Golang installed in it.
    3. This step checks out our repository into the working directory.
    4. In this step, we build the binary.
    5. Here we set up Docker with the help of the setup_remote_docker key provided by CircleCI.
    6. In this step we create the tag we will use while building the image; we take the app version available in the VERSION file and append the $CIRCLE_BUILD_NUM value to it, separated by a dash (`-`).
    7. Here we build and tag the image.
    8. We install the AWS CLI to interact with ECR later.
    9. Here we log into ECR.
    10. We tag the image built in step 7 with the ECR repository name.
    11. Finally, we push the image to ECR.
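
    The tag construction from step 6 can be tried locally; this is a sketch where the VERSION file content and the build number are assumed values:

    ```shell
    # Simulate what happens in the CircleCI job: VERSION holds the app version,
    # and CIRCLE_BUILD_NUM is injected by CircleCI for every build (42 is a made-up value).
    echo "0.0.1" > VERSION
    CIRCLE_BUILD_NUM=42
    TAG=$(cat VERSION)-$CIRCLE_BUILD_NUM
    echo "$TAG"   # prints 0.0.1-42
    ```

    This is the image tag that the docker build and docker push steps then reuse.
    
    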

    Now we will deploy our helm charts. For this, we have a separate job deploy.

    deploy:
        docker: (1)
            - image: circleci/golang:1.10
        steps: (2)
          - checkout
          - run: (3)
              name: Install AWS cli
              command: export TZ=Europe/Minsk && sudo ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ | sudo tee /etc/timezone && sudo apt-get update && sudo apt-get install -y awscli
          - run: (4)
              name: Set the tag for the image, we will concatenate the app version and CircleCI build number with a `-` char in between
              command:  echo 'export TAG=$(cat VERSION)-$CIRCLE_PREVIOUS_BUILD_NUM' >> $BASH_ENV
          - run: (5)
              name: Install and configure kubectl
              command: sudo curl -L https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl -o /usr/local/bin/kubectl && sudo chmod +x /usr/local/bin/kubectl
          - run: (6)
              name: Install and configure aws-iam-authenticator
              command: curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/bin/linux/amd64/aws-iam-authenticator && sudo chmod +x ./aws-iam-authenticator && sudo cp ./aws-iam-authenticator /bin/aws-iam-authenticator
          - run: (7)
              name: Install latest awscli version
              command: sudo apt install unzip && curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip" && unzip awscli-bundle.zip && ./awscli-bundle/install -b ~/bin/aws
          - run: (8)
              name: Get the kubeconfig file
              command: export KUBECONFIG=$HOME/.kube/kubeconfig && /home/circleci/bin/aws eks --region $AWS_REGION update-kubeconfig --name $EKS_CLUSTER_NAME
          - run: (9)
              name: Install and configure helm
              command: sudo curl -L https://storage.googleapis.com/kubernetes-helm/helm-v2.11.0-linux-amd64.tar.gz | tar xz && sudo mv linux-amd64/helm /bin/helm && sudo rm -rf linux-amd64
          - run: (10)
              name: Initialize helm
              command:  helm init --client-only --kubeconfig=$HOME/.kube/kubeconfig
          - run: (11)
              name: Install tiller plugin
              command: helm plugin install https://github.com/rimusz/helm-tiller --kubeconfig=$HOME/.kube/kubeconfig        
          - run: (12)
              name: Release helloapp using helm chart 
              command: bash scripts/release-helloapp.sh $TAG

    1. Set the Docker image inside which we want to execute the job.
    2. Check out the code using the `checkout` key.
    3. Install the AWS CLI.
    4. Set the value of the tag just like we did in the build&pushImage job. Note that here we use the CIRCLE_PREVIOUS_BUILD_NUM variable, which gives us the build number of the build&pushImage job and ensures that the tag values are the same.
    5. Download kubectl and make it executable.
    6. Install aws-iam-authenticator; this is required because our k8s cluster is on EKS.
    7. Here we install the latest version of the AWS CLI; EKS is a relatively new service from AWS and older versions of the AWS CLI don’t support it.
    8. Here we fetch the kubeconfig file. This step will vary depending upon where the k8s cluster has been set up. As our cluster is on EKS, we get the kubeconfig file via the AWS CLI; similarly, if your cluster is on GKE then you need to configure gcloud and use the command `gcloud container clusters get-credentials <cluster-name> --zone=<zone-name>`. We can also keep the kubeconfig file on some other secure storage system and fetch it from there.
    9. Download Helm and make it executable.
    10. Initialize Helm; note that we initialize it in client-only mode so that it doesn’t start the Tiller server.
    11. Download the tillerless helm plugin.
    12. Execute the release-helloapp.sh shell script and pass it the TAG value from step 4.

    In the release-helloapp.sh script we first start Tiller, then check whether the release is already present: if it is, we upgrade it; otherwise we make a new release. Here we override the image tag value in the chart by setting it to the tag of the newly built image. Finally, we stop the Tiller server.

    #!/bin/bash
    TAG=$1
    echo "start tiller"
    export KUBECONFIG=$HOME/.kube/kubeconfig
    helm tiller start-ci
    export HELM_HOST=127.0.0.1:44134
    if helm ls | grep -q helloapp; then
       helm upgrade --timeout 180 helloapp --set image.tag=$TAG charts/helloapp
    else
       helm install --timeout 180 --name helloapp --set image.tag=$TAG charts/helloapp
    fi
    echo "stop tiller"
    helm tiller stop 

    The complete CircleCI config.yml file looks like:

    version: 2
    
    jobs:
      build&pushImage:
        working_directory: /go/src/hello-app
        docker:
          - image: circleci/golang:1.10
        steps:
          - checkout
          - run:
              name: build the binary
              command: go build -o hello-app
          - setup_remote_docker:
              docker_layer_caching: true
          - run:
              name: Set the tag for the image, we will concatenate the app version and CircleCI build number with a `-` char in between
              command:  echo 'export TAG=$(cat VERSION)-$CIRCLE_BUILD_NUM' >> $BASH_ENV
          - run:
              name: Build the docker image
              command: docker build . -t ${CIRCLE_PROJECT_REPONAME}:$TAG
          - run:
              name: Install AWS cli
              command: export TZ=Europe/Minsk && sudo ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ | sudo tee /etc/timezone && sudo apt-get update && sudo apt-get install -y awscli
          - run:
              name: Login to ECR
              command: $(aws ecr get-login --region $AWS_REGION | sed -e 's/-e none//g')
          - run:
              name: Tag the image with ECR repo name
              command: docker tag ${CIRCLE_PROJECT_REPONAME}:$TAG ${HELLOAPP_ECR_REPO}:$TAG
          - run:
              name: Push the image to the ECR repo
              command: docker push ${HELLOAPP_ECR_REPO}:$TAG
      deploy:
        docker:
            - image: circleci/golang:1.10
        steps:
          - attach_workspace:
              at: /tmp/workspace
          - checkout
          - run:
              name: Install AWS cli
              command: export TZ=Europe/Minsk && sudo ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ | sudo tee /etc/timezone && sudo apt-get update && sudo apt-get install -y awscli
          - run:
              name: Set the tag for the image, we will concatenate the app version and CircleCI build number with a `-` char in between
              command:  echo 'export TAG=$(cat VERSION)-$CIRCLE_PREVIOUS_BUILD_NUM' >> $BASH_ENV
          - run:
              name: Install and configure kubectl
              command: sudo curl -L https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl -o /usr/local/bin/kubectl && sudo chmod +x /usr/local/bin/kubectl
          - run:
              name: Install and configure aws-iam-authenticator
              command: curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/bin/linux/amd64/aws-iam-authenticator && sudo chmod +x ./aws-iam-authenticator && sudo cp ./aws-iam-authenticator /bin/aws-iam-authenticator
          - run:
              name: Install latest awscli version
              command: sudo apt install unzip && curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip" && unzip awscli-bundle.zip && ./awscli-bundle/install -b ~/bin/aws
          - run:
              name: Get the kubeconfig file
              command: export KUBECONFIG=$HOME/.kube/kubeconfig && /home/circleci/bin/aws eks --region $AWS_REGION update-kubeconfig --name $EKS_CLUSTER_NAME
          - run:
              name: Install and configure helm
              command: sudo curl -L https://storage.googleapis.com/kubernetes-helm/helm-v2.11.0-linux-amd64.tar.gz | tar xz && sudo mv linux-amd64/helm /bin/helm && sudo rm -rf linux-amd64
          - run:
              name: Initialize helm
              command:  helm init --client-only --kubeconfig=$HOME/.kube/kubeconfig
          - run:
              name: Install tiller plugin
              command: helm plugin install https://github.com/rimusz/helm-tiller --kubeconfig=$HOME/.kube/kubeconfig        
          - run:
              name: Release helloapp using helm chart 
              command: bash scripts/release-helloapp.sh $TAG
    workflows:
      version: 2
      primary:
        jobs:
          - build&pushImage
          - deploy:
              requires:
                - build&pushImage

    At the end of the file we see the workflows section. Workflows control the order in which the jobs specified in the file are executed, and establish dependencies and conditions between jobs. For example, we want our deploy job to trigger only after the build job is complete, so we added a dependency between them. Similarly, we may want to exclude jobs from running on some particular branch; we can specify those types of conditions as well.
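
    For instance (a hedged sketch using CircleCI’s branch filters; the master-only restriction is an assumption, not part of this post’s config), limiting the pipeline to the master branch would look like:

    ```yaml
    workflows:
      version: 2
      primary:
        jobs:
          # Only build images for commits pushed to master.
          - build&pushImage:
              filters:
                branches:
                  only: master
          # deploy still runs only after build&pushImage succeeds.
          - deploy:
              requires:
                - build&pushImage
    ```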

    We have used a few environment variables in our pipeline configuration; some of them were created by us and some are made available by CircleCI. We created the AWS_REGION, HELLOAPP_ECR_REPO, EKS_CLUSTER_NAME, AWS_ACCESS_KEY_ID & AWS_SECRET_ACCESS_KEY variables. These variables are set via the CircleCI web console by going to the project’s settings. The other variables that we have used are made available by CircleCI as part of its environment setup process. The complete list of environment variables set by CircleCI can be found here.

    Verify the working of the pipeline:

    Once everything is set up properly, our application will get deployed on the k8s cluster and should be available for access. Get the external IP of the helloapp service and make a curl request to the hello endpoint:

    $ curl http://a31e25e7553af11e994620aebe144c51-242977608.us-west-2.elb.amazonaws.com/hello && printf "\n"
    
    {"Msg":"Hello World"}

    Now update the code and change the message “Hello World” to “Hello World Returns” and push your code. It will take a few minutes for the pipeline to complete execution and once it is complete make the curl request again to see the changes getting reflected.

    $ curl http://a31e25e7553af11e994620aebe144c51-242977608.us-west-2.elb.amazonaws.com/hello && printf "\n"
    
    {"Msg":"Hello World Returns"}

    Also, verify that a new tag has been created for the helloapp Docker image on ECR.

    Conclusion

    In this blog post, we explored how we can set up a CI/CD pipeline for Kubernetes and got basic exposure to CircleCI and Helm. Although Helm is not absolutely necessary for building a pipeline, it has lots of benefits and is widely used across the industry. We can extend the pipeline to handle cases where we have multiple environments like dev, staging & production and make the pipeline deploy the application to any of them depending upon some conditions. We can also add more jobs, like integration tests. All the code used in this blog post is available here.

    Related Reads:

    1. Continuous Deployment with Azure Kubernetes Service, Azure Container Registry & Jenkins
    2. Know Everything About Spinnaker & How to Deploy Using Kubernetes Engine
  • Automated Containerization and Migration of On-premise Applications to Cloud Platforms

    Containerized applications are becoming more popular with each passing year. Enterprise applications are adopting container technology as they modernize their IT systems. Migrating your applications from VMs or physical machines to containers comes with multiple advantages like optimal resource utilization, faster deployment times, replication, quick cloning, less lock-in and so on. Various container orchestration platforms like Kubernetes, Google Container Engine (GKE) and Amazon EC2 Container Service (Amazon ECS) help in quick deployment and easy management of your containerized applications. But in order to use these platforms, you need to migrate your legacy applications to containers or rewrite/redeploy your applications from scratch with the containerization approach. Rearchitecting your applications using the containerization approach is preferable, but is that possible for complex legacy applications? Is your deployment team capable enough to list down each and every detail about the deployment process of your application? Do you have the patience to author a Dockerfile for each of the components of your complex application stack?

    Automated migrations!

    Velotio has been helping customers with automated migration of VMs and bare-metal servers to various container platforms. We have developed automation to convert these migrated applications as containers on various container deployment platforms like GKE, Amazon ECS and Kubernetes. In this blog post, we will cover one such migration tool developed at Velotio which will migrate your application running on a VM or physical machine to Google Container Engine (GKE) by running a single command.

    Migration tool details

    We have named our migration tool A2C (Anything to Container). It can migrate applications running on any Unix or Windows operating system.

    The migration tool requires the following information about the server to be migrated:

    • IP of the server
    • SSH User, SSH Key/Password of the application server
    • Configuration file containing data paths for application/database/components (more details below)
    • Required name of your docker image (The docker image that will get created for your application)
    • GKE Container Cluster details

    In order to store persistent data, volumes can be defined in the container definition. Data changes made on a volume path remain persistent even if the container is killed or crashes. Volumes are basically filesystem paths from the host machine on which your container is running, NFS shares, or cloud storage. The container mounts the filesystem path from the host machine, so data changes are written to the host machine’s filesystem instead of the container’s filesystem. Our migration tool supports data volumes, which can be defined in the configuration file. It will automatically create disks for the defined volumes and copy data from your application server to these disks in a consistent way.

    The configuration file we have been talking about is basically a YAML file containing filesystem level information about your application server. A sample of this file can be found below:

    includes:
    - /
    volumes:
    - var/log/httpd
    - var/log/mariadb
    - var/www/html
    - var/lib/mysql
    excludes:
    - mnt
    - var/tmp
    - etc/fstab
    - proc
    - tmp

    The configuration file contains 3 sections: includes, volumes and excludes:

    • Includes contains filesystem paths on your application server which you want to add to your container image.
    • Volumes contain filesystem paths on your application server which stores your application data. Generally, filesystem paths containing database files, application code files, configuration files, log files are good candidates for volumes.
    • The excludes section contains filesystem paths which you don’t want to make part of the container. This may include temporary filesystem paths like /proc and /tmp, and also NFS-mounted paths. Ideally, you would include everything by giving “/” in the includes section and exclude specifics in the excludes section.

    The Docker image name to be given as input to the migration tool is the Docker registry path in which the image will be stored, followed by the name and tag of the image. A Docker registry is like GitHub for Docker images, where you can store all your images. Different versions of the same image can be stored by giving a version-specific tag to the image. GKE also provides a Docker registry. Since in this demo we are migrating to GKE, we will also store our image in the GKE registry.

    The GKE container cluster details to be given as input to the migration tool contain GKE-specific details like the GKE project name, container cluster name and region name. A container cluster can be created in GKE to host the container applications. We have a separate set of scripts to perform the cluster creation operation; container cluster creation can also be done easily through the GKE UI. For now, we will assume that we have a 3-node cluster created in GKE, which we will use to host our application.

    Tasks performed under migration

    Our migration tool (A2C), performs the following set of activities for migrating the application running on a VM or physical machine to GKE Container Cluster:

    1. Install the A2C migration tool with all its dependencies on the target application server

    2. Create a docker image of the application server, based on the filesystem level information given in the configuration file

    3. Capture metadata from the application server like configured services information, port usage information, network configuration, external services, etc.

    4. Push the docker image to the GKE container registry

    5. Create a disk in Google Cloud for each volume path defined in the configuration file and prepopulate the disks with data from the application server

    6. Create a deployment spec for the container application in the GKE container cluster, which will open the required ports, configure required services, add multi-container dependencies, attach the prepopulated disks to containers, etc.

    7. Deploy the application, after which you will have your application running as containers in GKE with the application software in a running state. The new application URLs will be given as output.

    8. Load balancing and HA will be configured for your application.
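
    As a rough illustration of what the generated deployment spec in step 6 might contain (the container image path and volume names below are assumptions for illustration, not A2C’s actual output), attaching a prepopulated GCE disk to a container looks like:

    ```yaml
    # Illustrative fragment: mount the prepopulated disk migrate-lamp-3 at MySQL's data path.
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: migrate-lamp
    spec:
      template:
        metadata:
          labels:
            app: migrate-lamp
        spec:
          containers:
          - name: migrate-lamp
            image: gcr.io/glassy-chalice-XXXXX/migrate-lamp
            volumeMounts:
            - name: mysql-data
              mountPath: /var/lib/mysql
          volumes:
          - name: mysql-data
            gcePersistentDisk:
              pdName: migrate-lamp-3
              fsType: ext4
    ```

    Because the data lives on the GCE disk rather than in the container filesystem, the MySQL data survives pod restarts and rescheduling.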

    Demo

    For demonstration purposes, we will deploy a LAMP stack (Apache + PHP + MySQL) on a CentOS 7 VM and will run the migration utility for the VM, which will migrate the application to our GKE cluster. After the migration we will show our application running on GKE, preconfigured with the same data as on our VM.

    Step 1

    We set up a LAMP stack using Apache, PHP and MySQL on a CentOS 7 VM in GCP. The PHP application can be used to list, add, delete or edit user data. The data is stored in a MySQL database. We added some data to the database using the application, and the UI would show the following:

    Step 2

    Now we run the A2C migration tool, which will migrate this application stack running on a VM into a container and auto-deploy it to GKE.

    # ./migrate.py -c lamp_data_handler.yml -d "tcp://35.202.201.247:4243" -i migrate-lamp -p glassy-chalice-XXXXX -u root -k ~/mykey -l a2c-host --gcecluster a2c-demo --gcezone us-central1-b 130.211.231.58

    Pushing converter binary to target machine
    Pushing data config to target machine
    Pushing installer script to target machine
    Running converter binary on target machine
    [130.211.231.58] out: creating docker image
    [130.211.231.58] out: image created with id 6dad12ba171eaa8615a9c353e2983f0f9130f3a25128708762228f293e82198d
    [130.211.231.58] out: Collecting metadata for image
    [130.211.231.58] out: Generating metadata for cent7
    [130.211.231.58] out: Building image from metadata
    Pushing the docker image to GCP container registry
    Initiate remote data copy
    Activated service account credentials for: [glassy-chaliceXXXXX@appspot.gserviceaccount.com]
    for volume var/log/httpd
    Creating disk migrate-lamp-0
    Disk Created Successfully
    transferring data from source
    for volume var/log/mariadb
    Creating disk migrate-lamp-1
    Disk Created Successfully
    transferring data from source
    for volume var/www/html
    Creating disk migrate-lamp-2
    Disk Created Successfully
    transferring data from source
    for volume var/lib/mysql
    Creating disk migrate-lamp-3
    Disk Created Successfully
    transferring data from source
    Connecting to GCP cluster for deployment
    Created service file /tmp/gcp-service.yaml
    Created deployment file /tmp/gcp-deployment.yaml
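    The log above creates one persistent disk per volume path, named `migrate-lamp-0` through `migrate-lamp-3`. A rough sketch of how that per-volume naming could map to `gcloud` disk-creation commands (the `--size` and `--zone` flags here are illustrative assumptions, not values taken from the tool):

    ```python
    # Volume paths discovered on the source VM, in the order seen in the log.
    volumes = ["var/log/httpd", "var/log/mariadb", "var/www/html", "var/lib/mysql"]
    image_name = "migrate-lamp"

    commands = []
    for i, vol in enumerate(volumes):
        disk = f"{image_name}-{i}"  # migrate-lamp-0 .. migrate-lamp-3
        # Illustrative gcloud invocation; size and zone are assumed values.
        commands.append(
            f"gcloud compute disks create {disk} --size=10GB --zone=us-central1-b"
        )
        print(f"for volume {vol}")
        print(f"Creating disk {disk}")
    ```

    Keeping the disk index tied to the volume's position in the config file is what lets the generated deployment spec attach each disk back to the right mount path.
    
    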

    Deploying to GKE

    $ kubectl get pod
    
    NAME                            READY   STATUS              RESTARTS   AGE
    migrate-lamp-3707510312-6dr5g   0/1     ContainerCreating   0          58s

    $ kubectl get deployment
    
    NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    migrate-lamp   1         1         1            0           1m

    $ kubectl get service
    
    NAME           CLUSTER-IP     EXTERNAL-IP     PORT(S)                                    AGE
    kubernetes     10.59.240.1    <none>          443/TCP                                    23h
    migrate-lamp   10.59.248.44   35.184.53.100   3306:31494/TCP,80:30909/TCP,22:31448/TCP   53s

    You can access your application using above connection details!
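    With a `LoadBalancer` service, the application URL follows directly from the `EXTERNAL-IP` column and the exposed service port. A small illustrative parser for a `kubectl get service` row (not part of the migration tool):

    ```python
    def service_url(line, port=80):
        """Derive an http URL for a given port from a 'kubectl get service' row."""
        name, cluster_ip, external_ip, ports, age = line.split()
        # PORT(S) looks like "3306:31494/TCP,80:30909/TCP,22:31448/TCP";
        # the number before each ':' is the service port.
        exposed = [p.split(":")[0] for p in ports.split(",")]
        if str(port) not in exposed:
            raise ValueError(f"port {port} not exposed by service {name}")
        return f"http://{external_ip}:{port}"

    row = "migrate-lamp 10.59.248.44 35.184.53.100 3306:31494/TCP,80:30909/TCP,22:31448/TCP 53s"
    print(service_url(row))  # http://35.184.53.100:80
    ```
    
    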

    Step 3

    Access the LAMP stack on GKE using the IP 35.184.53.100 on the default port 80, just as on the source machine.

    Here is the Docker image created in the GCP Container Registry:

    We can also see that disks named migrate-lamp-x were created as part of this automated migration.

    A load balancer was also provisioned in GCP as part of the migration process.

    The following service and deployment files were created by our migration tool to deploy the application on GKE:

    # cat /tmp/gcp-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: migrate-lamp
      name: migrate-lamp
    spec:
      ports:
      - name: migrate-lamp-3306
        port: 3306
      - name: migrate-lamp-80
        port: 80
      - name: migrate-lamp-22
        port: 22
      selector:
        app: migrate-lamp
      type: LoadBalancer

    # cat /tmp/gcp-deployment.yaml
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      labels:
        app: migrate-lamp
      name: migrate-lamp
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: migrate-lamp
      template:
        metadata:
          labels:
            app: migrate-lamp
        spec:
          containers:
          - image: us.gcr.io/glassy-chalice-129514/migrate-lamp
            name: migrate-lamp
            ports:
            - containerPort: 3306
            - containerPort: 80
            - containerPort: 22
            securityContext:
              privileged: true
            volumeMounts:
            - mountPath: /var/log/httpd
              name: migrate-lamp-var-log-httpd
            - mountPath: /var/www/html
              name: migrate-lamp-var-www-html
            - mountPath: /var/log/mariadb
              name: migrate-lamp-var-log-mariadb
            - mountPath: /var/lib/mysql
              name: migrate-lamp-var-lib-mysql
          volumes:
          - gcePersistentDisk:
              fsType: ext4
              pdName: migrate-lamp-0
            name: migrate-lamp-var-log-httpd
          - gcePersistentDisk:
              fsType: ext4
              pdName: migrate-lamp-2
            name: migrate-lamp-var-www-html
          - gcePersistentDisk:
              fsType: ext4
              pdName: migrate-lamp-1
            name: migrate-lamp-var-log-mariadb
          - gcePersistentDisk:
              fsType: ext4
              pdName: migrate-lamp-3
            name: migrate-lamp-var-lib-mysql
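    One property worth sanity-checking in a generated spec like this is that every `volumeMount` references a declared volume, since the disks are attached by name rather than by position. A small illustrative check (the spec fragment is hand-copied from the file above, not loaded from it):

    ```python
    # volumeMounts and volumes as they appear in /tmp/gcp-deployment.yaml.
    volume_mounts = [
        {"mountPath": "/var/log/httpd",   "name": "migrate-lamp-var-log-httpd"},
        {"mountPath": "/var/www/html",    "name": "migrate-lamp-var-www-html"},
        {"mountPath": "/var/log/mariadb", "name": "migrate-lamp-var-log-mariadb"},
        {"mountPath": "/var/lib/mysql",   "name": "migrate-lamp-var-lib-mysql"},
    ]
    volumes = [
        {"name": "migrate-lamp-var-log-httpd",   "gcePersistentDisk": {"pdName": "migrate-lamp-0"}},
        {"name": "migrate-lamp-var-www-html",    "gcePersistentDisk": {"pdName": "migrate-lamp-2"}},
        {"name": "migrate-lamp-var-log-mariadb", "gcePersistentDisk": {"pdName": "migrate-lamp-1"}},
        {"name": "migrate-lamp-var-lib-mysql",   "gcePersistentDisk": {"pdName": "migrate-lamp-3"}},
    ]

    declared = {v["name"] for v in volumes}
    missing = [m["name"] for m in volume_mounts if m["name"] not in declared]
    assert not missing, f"volumeMounts without a matching volume: {missing}"
    print("all volumeMounts resolve to a declared volume")
    ```

    Note that the `pdName` order (0, 2, 1, 3) differs from the mount order; the `name` field, not position, is what ties a mount to its disk.
    
    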

    Conclusion

    Migrations are always hard for IT and development teams. At Velotio, we have been helping customers migrate to cloud and container platforms using streamlined processes and automation. Feel free to reach out to us at contact@rsystems.com to learn more about our cloud and container adoption and migration offerings.