Category: Type

  • Real Time Analytics for IoT Data using Mosquitto, AWS Kinesis and InfluxDB

    The Internet of Things (IoT) is maturing rapidly and finding applications across various industries. Everyday devices are turning into smart devices, which are essentially IoT devices. These devices capture various parameters in and around their environment, generating a huge amount of data. This data needs to be collected, processed, stored, and analyzed in order to extract actionable insights, and to do so we need to build a data pipeline. In this blog we will build such a pipeline using Mosquitto, Kinesis, InfluxDB, and Grafana, discussing each component and the steps to put them together.

    Why the Analysis of IoT Data Is Different

    In an IoT setup, the data is generated by sensors distributed across various locations. In order to use the data they generate, we should first bring it to a common location from where the various applications that want to process it can read it.

    Network Protocol

    IoT devices have limited computational and network resources. Moreover, these devices write data at very short intervals, so high throughput is expected on the network. For transferring IoT data it is desirable to use lightweight network protocols. A protocol like HTTP uses a complex structure for communication, consuming more resources and making it unsuitable for IoT data transfer. One lightweight protocol suitable for IoT data is MQTT, which we use in our pipeline. MQTT is designed for machine-to-machine (M2M) connectivity. It uses a publish/subscribe communication model and helps clients distribute telemetry data with very low network resource consumption. Beyond IoT, MQTT has proven useful in other fields as well.

    Other similar protocols include the Constrained Application Protocol (CoAP) and the Advanced Message Queuing Protocol (AMQP).

    Datastore   

    IoT devices generally collect telemetry about their environment, usually through sensors. In most IoT scenarios, we try to analyze how things have changed over a period of time. Storing this data in a time series database makes our analysis simpler and better. InfluxDB is a popular time series database, and it is the one we will use in our pipeline. More about time series databases can be read here.

    Pipeline Overview

    The first thing we need for a data pipeline is data. As shown in the image above, the data generated by various sensors is written to a topic on the MQTT message broker. To mimic sensors, we will use a program that uses an MQTT client to write data to the MQTT broker.

    The next component is Amazon Kinesis, which is used for streaming data analysis. It closely resembles Apache Kafka, an open-source tool used for similar purposes. Kinesis brings the data generated by a number of clients to a single location from where different consumers can pull it for processing. We use Kinesis so that multiple consumers can read data from a single location; this approach scales well even if we have multiple message brokers.

    Once the data is written to the MQTT broker, a Kinesis producer subscribed to the topic pulls the data and writes it to the Kinesis stream. From the Kinesis stream, the data is pulled by Kinesis consumers, which process it and write it to InfluxDB, a time series database.

    Finally, we use Grafana, a well-known tool for analytics and monitoring that can connect to many popular databases. Another popular tool in this space is Kibana (the K of the ELK stack).

    Setting up an MQTT Message Broker Server:

    For the MQTT message broker we will use Mosquitto, a popular open-source message broker that implements MQTT. The details of downloading and installing Mosquitto for various platforms are available here.

    For Ubuntu, it can be installed using the following commands

    sudo apt-add-repository ppa:mosquitto-dev/mosquitto-ppa
    sudo apt-get update
    sudo apt-get install mosquitto
    service mosquitto status

    Setting up InfluxDB and Grafana

    The simplest way to set up both these components is to use their Docker images directly:

    docker run --name influxdb -p 8083:8083 -p 8086:8086 influxdb:1.0
    docker run --name grafana -p 3000:3000 --link influxdb grafana/grafana:3.1.1

    For InfluxDB we have mapped two ports: port 8086 is the HTTP API endpoint, while 8083 is the administration web server's port. Next, we need to create a database where we will write our data.

    For creating a database, we can go directly to the console at <influxdb-ip>:8083 and run the command:

    CREATE DATABASE "iotdata"

    Or we can do it via an HTTP request:

    curl -XPOST "http://localhost:8086/query" --data-urlencode "q=CREATE DATABASE iotdata"

    Creating a Kinesis stream

    In Kinesis, we create streams where Kinesis producers write the data coming from various sources and from which Kinesis consumers read it back. Within a stream, the data is stored in shards. For our purpose, one shard is enough.
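    Assuming the AWS CLI is installed and credentials are configured, the single-shard stream can be created like this (the stream name iot-data-stream is our placeholder; use your own):

```shell
# Create a stream with a single shard
aws kinesis create-stream --stream-name iot-data-stream --shard-count 1

# Wait until the stream becomes ACTIVE before writing to it
aws kinesis wait stream-exists --stream-name iot-data-stream
```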

    Creating the MQTT client

    We will use the Golang client available in this repository to connect to our message broker server and write data to a specific topic. We will first create a new MQTT client; here we can see the list of options available for configuring it.

    Once we create the options object, we pass it to the NewClient() method, which returns the MQTT client, and we can start writing data to the MQTT server. We have defined the structure of the data in the SensorData struct. To mimic two sensors writing telemetry data to the MQTT broker, we run two goroutines that each push data to the MQTT server every five seconds.

    package publisher
    
    import (
    	"config"
    	"encoding/json"
    	"fmt"
    	"log"
    	"math/rand"
    	"os"
    	"time"
    
    	"github.com/eclipse/paho.mqtt.golang"
    )
    
    type SensorData struct {
    	Id          string  `json:"id"`
    	Temperature float64 `json:"temperature"`
    	Humidity    float64 `json:"humidity"`
    	Timestamp   int64   `json:"timestamp"`
    	City        string  `json:"city"`
    }
    
    func StartMQTTPublisher() {
    	fmt.Println("MQTT publisher Started")
    	mqtt.DEBUG = log.New(os.Stdout, "", 0)
    	mqtt.ERROR = log.New(os.Stdout, "", 0)
    	opts := mqtt.NewClientOptions().AddBroker(config.GetMqttServerurl()).SetClientID("MqttPublisherClient")
    	opts.SetKeepAlive(2 * time.Second)
    	opts.SetPingTimeout(1 * time.Second)
    	c := mqtt.NewClient(opts)
    	if token := c.Connect(); token.Wait() && token.Error() != nil {
    		panic(token.Error())
    	}
    
    	go func() {
    		t := 20.04
    		h := 32.06
    		for i := 0; i < 100; i++ {
    			sensordata := SensorData{
    				Id:          "CITIMUM",
    				Temperature: t,
    				Humidity:    h,
    				Timestamp:   time.Now().Unix(),
    				City:        "Mumbai",
    			}
    			requestBody, err := json.Marshal(sensordata)
    			if err != nil {
    				fmt.Println(err)
    			}
    			token := c.Publish(config.GetMQTTTopicName(), 0, false, requestBody)
    			token.Wait()
    			if i < 50 {
    				t = t + 1*rand.Float64()
    				h = h + 1*rand.Float64()
    			} else {
    				t = t - 1*rand.Float64()
    				h = h - 1*rand.Float64()
    			}
    			time.Sleep(5 * time.Second)
    		}
    	}()
    	go func() {
    		t := 16.02
    		h := 24.04
    		for i := 0; i < 100; i++ {
    			sensordata := SensorData{
    				Id:          "CITIPUN",
    				Temperature: t,
    				Humidity:    h,
    				Timestamp:   time.Now().Unix(),
    				City:        "Pune",
    			}
    			requestBody, err := json.Marshal(sensordata)
    			if err != nil {
    				fmt.Println(err)
    			}
    			token := c.Publish(config.GetMQTTTopicName(), 0, false, requestBody)
    			token.Wait()
    			if i < 50 {
    				t = t + 1*rand.Float64()
    				h = h + 1*rand.Float64()
    			} else {
    				t = t - 1*rand.Float64()
    				h = h - 1*rand.Float64()
    			}
    			time.Sleep(5 * time.Second)
    		}
    	}()
    	time.Sleep(1000 * time.Second)
    	c.Disconnect(250)
    
    }

    Create a Kinesis Producer

    Now we will create a Kinesis producer that subscribes to the topic our MQTT client writes to, pulls the data from the broker, and pushes it to the Kinesis stream. Just like in the previous section, we first create an MQTT client that connects to the message broker and subscribes to the topic our clients/publishers write to. In the client options, we can define a function to be called whenever data is written to this topic. We have created a function postDataTokinesisStream() which connects to Kinesis using the Kinesis client and, every time a message is pushed to the topic, writes the data to the Kinesis stream.

    package producer
    
    import (
    	"config"
    	"fmt"
    	"os"
    	"os/signal"
    	"time"
    
    	"github.com/aws/aws-sdk-go/service/kinesis"
    
    	mqtt "github.com/eclipse/paho.mqtt.golang"
    )
    
    func postDataTokinesisStream(client mqtt.Client, message mqtt.Message) {
    	fmt.Printf("Received message on topic: %s\nMessage: %s\n", message.Topic(), message.Payload())
    	streamName := config.GetKinesisStreamName()
    	kclient := config.GetKinesisClient()
    	var putRecordInput kinesis.PutRecordInput
    	// The topic name doubles as the partition key for the stream.
    	partitionKey := message.Topic()
    	putRecordInput.PartitionKey = &partitionKey
    	putRecordInput.StreamName = &streamName
    	putRecordInput.Data = message.Payload()
    	putRecordOutput, err := kclient.PutRecord(&putRecordInput)
    	if err != nil {
    		fmt.Println(err)
    	} else {
    		fmt.Println(putRecordOutput)
    	}
    }
    
    func StartKinesisProducer() {
    	fmt.Println("Kinesis Producer Started")
    	// Block at the end of this function until an interrupt is received.
    	c := make(chan os.Signal, 1)
    	signal.Notify(c, os.Interrupt)
    	opts := mqtt.NewClientOptions().AddBroker(config.GetMqttServerurl()).SetClientID("MqttSubscriberClient")
    	opts.SetKeepAlive(2 * time.Second)
    	opts.SetPingTimeout(1 * time.Second)
    	opts.OnConnect = func(c mqtt.Client) {
    		if token := c.Subscribe(config.GetMQTTTopicName(), 0, postDataTokinesisStream); token.Wait() && token.Error() != nil {
    			panic(token.Error())
    		}
    	}
    
    	client := mqtt.NewClient(opts)
    	if token := client.Connect(); token.Wait() && token.Error() != nil {
    		panic(token.Error())
    	} else {
    		fmt.Printf("Connected to %s\n", config.GetMqttServerurl())
    	}
    
    	<-c
    }

    Create a Kinesis Consumer

    Now that the data is available in our Kinesis stream, we can pull it for processing. In the Kinesis consumer, we create a Kinesis client just like in the previous section and pull data from the stream. We first call the DescribeStream method, which returns the ShardId; we then use this ShardId to get a ShardIterator and finally fetch records by passing the ShardIterator to the GetRecords() method. GetRecords() also returns a NextShardIterator, which we use to continuously poll the shard for records until NextShardIterator becomes null.

    package consumer
    
    import (
    	"config"
    	"fmt"
    
    	"github.com/aws/aws-sdk-go/service/kinesis"
    	"velotio.com/dao"
    )
    
    func StartKinesisConsumer() {
    	fmt.Println("Kinesis Consumer Started")
    	client := config.GetKinesisClient()
    	streamName := config.GetKinesisStreamName()
    	var describeStreamInput kinesis.DescribeStreamInput
    	describeStreamInput.StreamName = &streamName
    	describeStreamOutput, err := client.DescribeStream(&describeStreamInput)
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	fmt.Println(*describeStreamOutput.StreamDescription.Shards[0].ShardId)
    	var getShardIteratorInput kinesis.GetShardIteratorInput
    	getShardIteratorInput.ShardId = describeStreamOutput.StreamDescription.Shards[0].ShardId
    	getShardIteratorInput.StreamName = &streamName
    	// TRIM_HORIZON starts reading from the oldest record in the shard.
    	shardIteratorType := "TRIM_HORIZON"
    	getShardIteratorInput.ShardIteratorType = &shardIteratorType
    	getShardIteratorOutput, err := client.GetShardIterator(&getShardIteratorInput)
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	var getRecordsInput kinesis.GetRecordsInput
    	getRecordsInput.ShardIterator = getShardIteratorOutput.ShardIterator
    	getRecordsOutput, err := client.GetRecords(&getRecordsInput)
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	// NextShardIterator is nil only once the shard has been closed.
    	for getRecordsOutput.NextShardIterator != nil {
    		for _, record := range getRecordsOutput.Records {
    			sdf := &dao.SensorDataFiltered{}
    			sdf.PostDataToInfluxDB(record.Data)
    		}
    		getRecordsInput.ShardIterator = getRecordsOutput.NextShardIterator
    		getRecordsOutput, err = client.GetRecords(&getRecordsInput)
    		if err != nil {
    			fmt.Println(err)
    			return
    		}
    	}
    }

    Processing the data and writing it to InfluxDB

    Now we do a simple processing step: filtering the data. The data we get from a sensor has the fields sensorId, temperature, humidity, city, and timestamp, but we are interested only in the temperature and humidity values for a city, so we have created a new struct, SensorDataFiltered, which contains only the fields we need.

    For every record the Kinesis consumer receives, it creates an instance of the SensorDataFiltered type and calls the PostDataToInfluxDB() method, where the record received from the Kinesis stream is unmarshaled into the SensorDataFiltered type and sent to InfluxDB. Here we need to provide the name of the database we created earlier in the variable dbName, and the InfluxDB host and port values in dbHost and dbPort respectively.

    In the InfluxDB request body, the first value we provide is the measurement, an InfluxDB structure for storing similar data together. Then come the tags; we use `city` as our tag so that we can filter the data by it. Finally come the actual field values. For more details on InfluxDB's data write format, please refer here.
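    As a concrete sketch of that format, the helper below (toLineProtocol is our own illustrative name, not part of the pipeline code) renders one reading in InfluxDB's line protocol:

```go
package main

import "fmt"

// toLineProtocol renders one reading in InfluxDB's line protocol:
// measurement,tag_key=tag_value field1=value1,field2=value2
// ("sensordata" and the field names mirror the code below).
func toLineProtocol(city string, humidity, temperature float64) string {
	return fmt.Sprintf("sensordata,city=%s humidity=%.2f,temperature=%.2f",
		city, humidity, temperature)
}

func main() {
	fmt.Println(toLineProtocol("Mumbai", 32.06, 20.04))
	// → sensordata,city=Mumbai humidity=32.06,temperature=20.04
}
```

    Each such line becomes one point in the sensordata measurement, tagged with its city.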

    package dao
    
    import (
    	"bytes"
    	"crypto/tls"
    	"encoding/json"
    	"fmt"
    	"net/http"
    )
    
    type SensorDataFiltered struct {
    	Temperature float64 `json:"temperature"`
    	Humidity    float64 `json:"humidity"`
    	City        string  `json:"city"`
    }
    
    var dbName = "iotdata"
    var dbHost = "184.73.62.30"
    var dbPort = "8086"
    
    func (sdf *SensorDataFiltered) PostDataToInfluxDB(data []byte) {
    	err := json.Unmarshal(data, sdf)
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	fmt.Println(sdf.Temperature, sdf.Humidity)
    	url := "http://" + dbHost + ":" + dbPort + "/write?db=" + dbName
    	humidity := fmt.Sprintf("%.2f", sdf.Humidity)
    	temperature := fmt.Sprintf("%.2f", sdf.Temperature)
    	city := sdf.City
    	// InfluxDB line protocol: measurement,tag_key=tag_value field=value,...
    	requestBody := "sensordata,city=" + city + " humidity=" + humidity + ",temperature=" + temperature
    	req, err := http.NewRequest("POST", url, bytes.NewBuffer([]byte(requestBody)))
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	httpclient := &http.Client{
    		Transport: &http.Transport{
    			TLSClientConfig: &tls.Config{
    				InsecureSkipVerify: true,
    			},
    		},
    	}
    	resp, err := httpclient.Do(req)
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	defer resp.Body.Close()
    	fmt.Println("Status code for influxdb data write request = ", resp.StatusCode)
    }

    Once the data is written to InfluxDB, we can see it in the web console by querying the measurement created in our database.
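    For example, an InfluxQL query over the measurement and tag used in this pipeline (the city value comes from our sample data) would be:

```sql
-- run against the "iotdata" database
SELECT "temperature", "humidity" FROM "sensordata" WHERE "city" = 'Mumbai'
```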

    Putting everything together in our main function

    Now we simply call the functions discussed above and run our main program. Note that we have used the `go` keyword before the first two function calls, which makes them goroutines so they execute concurrently.

    On running the code, you will see the logs for all the stages of our pipeline written to stdout. This closely resembles real-life scenarios where data written by IoT devices is processed in near real-time.

    package main
    
    import (
    	"time"
    
    	"velotio.com/consumer"
    	"velotio.com/producer"
    	"velotio.com/publisher"
    )
    
    func main() {
    
    	go producer.StartKinesisProducer()
    	go publisher.StartMQTTPublisher()
    	time.Sleep(5 * time.Second)
    	consumer.StartKinesisConsumer()
    
    }

    Visualization through Grafana

    We can access the Grafana web console on port 3000 of the machine it is running on. First, we need to add our InfluxDB instance as a data source under the data sources option.

    For creating a dashboard, go to the dashboard option and choose New. Once the dashboard is created, we can start adding panels.

    We need to set the InfluxDB data source we added earlier as the panel's data source and write queries as shown in the image below.

    We can repeat the same process to add another panel to the dashboard, this time choosing a different city in our query.
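    A typical panel query, assuming the measurement and tag names used earlier, looks like this ($timeFilter and $__interval are macros that Grafana expands at query time):

```sql
SELECT mean("temperature")
FROM "sensordata"
WHERE "city" = 'Mumbai' AND $timeFilter
GROUP BY time($__interval) fill(null)
```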

    Conclusion:

    IoT data analytics is a fast-evolving and interesting space. The number of IoT devices is growing rapidly, and there is a great opportunity to extract valuable insights from the huge amount of data generated by these devices. In this blog, I tried to help you grab that opportunity by building a near real-time data pipeline for IoT data. If you liked it, please share and subscribe to our blog.

  • How Much Do You Really Know About Simplified Cloud Deployments?

    Is your EC2/VM bill giving you sleepless nights?

    Are your EC2 instances under-utilized? Have you been wondering if there was an easy way to maximize the EC2/VM usage?

    Are you investing too much in your Control Plane and wish you could divert some of that investment towards developing more features in your applications (business logic)?

    Is your Configuration Management system overwhelming you and seems to have got a life of its own?

    Do you have legacy applications that do not need Docker at all?

    Would you like to simplify your deployment toolchain to streamline your workflows?

    Have you been recommended Kubernetes as a solution to fix all your woes, but you aren’t sure if Kubernetes is actually going to help you?

    Do you feel you are moving towards Docker, just so that Kubernetes can be used?

    If you answered “Yes” to any of the questions above, do read on, this article is just what you might need.

    There are steps to create a simple setup on your laptop at the end of the article.

    Introduction

    In the following article, we will present the typical components of a multi-tier application and how it is set up and deployed.

    We shall further go on to see how the same application deployment can be remodeled for scale using any cloud infrastructure. (The same software toolchain can be used to deploy the application on your on-premise infrastructure as well.)

    The tools that we propose are Nomad and Consul. We shall focus more on how to use these tools, rather than deep-dive into the specifics of the tools. We will briefly see the features of the software which would help us achieve our goals.

    • Nomad is a distributed workload manager, not only for Docker containers but also for various other types of workloads like legacy applications, Java, LXC, etc.

    More about Nomad Drivers here: Nomadproject.io, application delivery with HashiCorp, introduction to HashiCorp Nomad.

    • Consul is a distributed service mesh, with features like service registry and a key-value store, among others.

    Using these tools, the application/startup workflow would be as follows:

    Nomad will be responsible for starting the service.

    Nomad will publish the service information in Consul. The service information will include details like:

    • Where is the application running (IP:PORT) ?
    • What “service-name” is used to identify the application?
    • What “tags” (metadata) does this application have?
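    For illustration, the resulting service definition in Consul might look roughly like this (the name, address, port, and tag values here are hypothetical):

```json
{
  "service": {
    "name": "service-a",
    "address": "192.168.1.201",
    "port": 25101,
    "tags": ["primary"]
  }
}
```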

    A Typical Application

    A typical application deployment consists of a certain fixed set of processes, usually coupled with a database and a few (or many) peripheral services.

    These services could be primary (must-have) or support (optional) features of the application.

    Note: We are aware of what a proper “service-oriented architecture” should look like, though we will skip that discussion for now. We will rather focus on how real-world applications are set up and deployed.

    Simple Multi-tier Application

    In this section, let’s see the components of a multi-tier application along with typical access patterns from outside the system and within the system.

    • Load Balancer/Web/Front End Tier
    • Application Services Tier
    • Database Tier
    • Utility (or Helper Servers): To run background, cron, or queued jobs.

    Using a proxy/loadbalancer, the services (Service-A, Service-B, Service-C) could be accessed using distinct hostnames:

    • a.example.tld
    • b.example.tld
    • c.example.tld

    For an equivalent path-based routing approach, the setup would be similar. Instead of distinct hostnames, the communication mechanism would be:

    • common-proxy.example.tld/path-a/
    • common-proxy.example.tld/path-b/
    • common-proxy.example.tld/path-c/

    Problem Scenario 1

    Some of the basic problems with the deployment of the simple multi-tier application are:

    • What if the service process crashes during its runtime?
    • What if the host on which the services run shuts down, reboots or terminates?

    This is where Nomad’s feature of always keeping the service running is useful.

    In spite of this auto-restart feature, there could be issues if the service restarts on a different machine (i.e., with a different IP address).

    In case of Docker and ephemeral ports, the service could start on a different port as well.

    To solve this, we will use the service discovery feature provided by Consul, combined with a Consul-aware load-balancer/proxy that redirects traffic to the appropriate service.

    The order of the operations within the Nomad job will thus be:

    • Nomad will launch the job/task.
    • Nomad will register the task details as a service definition in Consul.
      (These steps will be re-executed if/when the application is restarted due to a crash/fail-over)
    • The Consul-aware load-balancer will route the traffic to the service (IP:PORT)

    Multi-tier Application With Load Balancer

    Using the Consul-aware load-balancer, the diagram will now look like:

    The details of the setup now are:

    • A Consul-aware load-balancer/proxy; the application will access the services via the load-balancer.
    • 3 (three) instances of service A; A1, A2, A3
    • 3 (three) instances of service B; B1, B2, B3

    The Routing Question

    At this moment, you could be wondering, “Why/How would the load-balancer know that it has to route traffic for service-A to A1/A2/A3 and route traffic for service-B to B1/B2/B3 ?”

    The answer lies in the Consul tags which will be published as part of the service definition (when Nomad registers the service in Consul).

    The appropriate Consul tags will tell the load-balancer to route traffic for a particular service to the appropriate backend.

    Let’s read that statement again (very slowly, just to be sure); The Consul tags, which are part of the service definition, will inform (advertise) the load-balancer to route traffic to the appropriate backend.

    This distinction is important to dwell upon, as it differs from how classic load-balancer/proxy software like HAProxy or NGINX is configured. For HAProxy/NGINX, the backend routing information resides with the load-balancer instance and is not “advertised” by the backend.

    Traditional load-balancers like NGINX/HAProxy do not natively support dynamic reloading of backends (when the backends stop, start, or move around). The heavy lifting of regenerating the configuration file and reloading the service is left to an external entity like Consul-Template.

    The use of a Consul-aware load-balancer, instead of a traditional load-balancer, eliminates the need of external workarounds.

    The setup can thus be termed a zero-configuration setup; you don’t have to re-configure the load-balancer, as it discovers the changing backend services from the information available in Consul.
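    A sketch of how this fits together in a Nomad job file, using Fabio’s `urlprefix-` tag convention as one example of a Consul-aware load-balancer (the job name, image, port, and health-check path are our assumptions):

```hcl
job "service-a" {
  datacenters = ["dc1"]

  group "app" {
    count = 3

    task "service-a" {
      driver = "docker"

      config {
        image = "example/service-a:latest"
        port_map {
          http = 8080
        }
      }

      resources {
        network {
          port "http" {}
        }
      }

      service {
        name = "service-a"
        port = "http"
        # a Fabio-style routing tag; other Consul-aware LBs use their own conventions
        tags = ["urlprefix-a.example.tld/"]

        check {
          type     = "http"
          path     = "/health"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

    When Nomad registers this service in Consul, the tag itself advertises the routing rule; no load-balancer configuration file needs editing.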

    Problem Scenario 2

    So far we have achieved a method to “automatically” discover the backends, but isn’t the Load-Balancer itself a single-point-of-failure (SPOF)?

    It absolutely is, and you should always have redundant load-balancer instances (which is what any cloud-provided load-balancer gives you).

    As there is a certain cost associated with using a cloud-provided load-balancer, we will create the load-balancers ourselves instead of using cloud-provided ones.

    To provide redundancy for the load-balancer instances, you should configure them using an Auto Scaling Group (AWS), VM Scale Sets (Azure), etc.

    The same redundancy strategy should also be used for the worker nodes, where the actual services reside, by putting them in Auto Scaling Groups/VMSS as well.

    The Complete Picture

    Installation and Configuration

    Given that nowadays laptops are pretty powerful, you can easily create a test setup on your laptop using VirtualBox, VMware Workstation Player, VMware Workstation, etc.

    As a prerequisite, you will need a few virtual machines which can communicate with each other.

    NOTE: Create the VMs with networking set to bridged mode.

    The machines needed for the simple setup/demo would be:

    • 1 Linux VM to act as a server (srv1)
    • 1 Linux VM to act as a load-balancer (lb1)
    • 2 Linux VMs to act as worker machines (client1, client2)

    *** Each machine can have 2 CPUs and 1 GB of memory.

    The configuration files and scripts needed for the demo, which will help you set up the Nomad and Consul cluster are available here.

    Setup the Server

    Install the binaries on the server

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Server Configuration

    ### On the server machine ...
    
    ### Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/server.hcl -O /etc/consul/server.hcl
    
    ### Edit Consul's server.hcl file and setup the fields 'encrypt' and 'retry_join' as per your cluster.
    sudo vim /etc/consul/server.hcl
    
    ### Nomad
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/server.hcl -O /etc/nomad/server.hcl
    
    ### Edit Nomad's server.hcl file and setup the fields 'encrypt' and 'retry_join' as per your cluster.
    sudo vim /etc/nomad/server.hcl
    
    ### After you are done with the edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members

    Setup the Load-Balancer

    Install the binaries on the load-balancer

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Load-Balancer Configuration

    ### On the load-balancer machine ...
    
    ### for Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/client.hcl -O /etc/consul/client.hcl
    
    ### Edit Consul's client.hcl file and setup the fields 'name', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/consul/client.hcl
    
    ### for Nomad ...
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/client.hcl -O /etc/nomad/client.hcl
    
    ### Edit Nomad's client.hcl file and setup the fields 'name', 'node_class', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/nomad/client.hcl
    
    ### After you are done with the edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members
    sudo nomad node status -verbose

    Setup the Client (Worker) Machines

    Install the binaries on the worker machines

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Worker Configuration

    ### On the client (worker) machine ...
    
    ### Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/client.hcl -O /etc/consul/client.hcl
    
    ### Edit Consul's client.hcl file and setup the fields 'name', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/consul/client.hcl
    
    ### Nomad
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/client.hcl -O /etc/nomad/client.hcl
    
    ### Edit Nomad's client.hcl file and setup the fields 'name', 'node_class', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/nomad/client.hcl
    
    ### After you are sure about your edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members
    sudo nomad node status -verbose

    Test the Setup

    For the sake of simplicity, we shall assume the following IP addresses for the machines. (You can adapt the IPs as per your actual cluster configuration)

    srv1: 192.168.1.11

    lb1: 192.168.1.101

    client1: 192.168.1.201

    client2: 192.168.1.202

    You can access the web GUI for Consul and Nomad at the following URLs:

    Consul: http://192.168.1.11:8500

    Nomad: http://192.168.1.11:4646

    Log in to the server and start the following watch command:

    # watch -n 5 "consul members; echo; nomad server members; echo; nomad node status -verbose; echo; nomad job status"

    Output:

    Node     Address             Status  Type    Build  Protocol  DC   Segment
    srv1     192.168.1.11:8301   alive   server  1.5.1  2         dc1  <all>
    client1  192.168.1.201:8301  alive   client  1.5.1  2         dc1  <default>
    client2  192.168.1.202:8301  alive   client  1.5.1  2         dc1  <default>
    lb1      192.168.1.101:8301  alive   client  1.5.1  2         dc1  <default>
    
    Name         Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
    srv1.global  192.168.1.11  4648  alive   true    2         0.9.3  dc1         global
    
    ID           DC   Name     Class   Address        Version Drain  Eligibility  Status
    37daf354...  dc1  client2  worker  192.168.1.202  0.9.3  false  eligible     ready
    9bab72b1...  dc1  client1  worker  192.168.1.201  0.9.3  false  eligible     ready
    621f4411...  dc1  lb1      lb      192.168.1.101  0.9.3  false  eligible     ready

    Submit Jobs

    Log in to the server (srv1) and download the sample jobs.

    Run the load-balancer job

    # nomad run fabio_docker.nomad

    Output:

    ==> Monitoring evaluation "bb140467"
        Evaluation triggered by job "fabio_docker"
        Allocation "1a6a5587" created: node "621f4411", group "fabio"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "bb140467" finished with status "complete"

    Check the status of the load-balancer

    # nomad alloc status 1a6a5587

    Output:

    ID                  = 1a6a5587
    Eval ID             = bb140467
    Name                = fabio_docker.fabio[0]
    Node ID             = 621f4411
    Node Name           = lb1
    Job ID              = fabio_docker
    Job Version         = 0
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 1m9s ago
    Modified            = 1m3s ago
    
    Task "fabio" is "running"
    Task Resources
    CPU        Memory          Disk     Addresses
    5/200 MHz  10 MiB/128 MiB  300 MiB  lb: 192.168.1.101:9999
                                        ui: 192.168.1.101:9998
    
    Task Events:
    Started At     = 2019-06-13T19:15:17Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-13T19:15:17Z  Started     Task started by client
    2019-06-13T19:15:12Z  Driver      Downloading image
    2019-06-13T19:15:12Z  Task Setup  Building Task Directory
    2019-06-13T19:15:12Z  Received    Task received by client

    Run the service ‘foo’

    # nomad run foo_docker.nomad

    Output:

    ==> Monitoring evaluation "a994bbf0"
        Evaluation triggered by job "foo_docker"
        Allocation "7794b538" created: node "9bab72b1", group "gowebhello"
        Allocation "eecceffc" modified: node "37daf354", group "gowebhello"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "a994bbf0" finished with status "complete"

    Check the status of service ‘foo’

    # nomad alloc status 7794b538

    Output:

    ID                  = 7794b538
    Eval ID             = a994bbf0
    Name                = foo_docker.gowebhello[1]
    Node ID             = 9bab72b1
    Node Name           = client1
    Job ID              = foo_docker
    Job Version         = 1
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 9s ago
    Modified            = 7s ago
    
    Task "gowebhello" is "running"
    Task Resources
    CPU        Memory           Disk     Addresses
    0/500 MHz  4.2 MiB/256 MiB  300 MiB  http: 192.168.1.201:23382
    
    Task Events:
    Started At     = 2019-06-13T19:27:17Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-13T19:27:17Z  Started     Task started by client
    2019-06-13T19:27:16Z  Task Setup  Building Task Directory
    2019-06-13T19:27:15Z  Received    Task received by client

    Run the service ‘bar’

    # nomad run bar_docker.nomad

    Output:

    ==> Monitoring evaluation "075076bc"
        Evaluation triggered by job "bar_docker"
        Allocation "9f16354b" created: node "9bab72b1", group "gowebhello"
        Allocation "b86d8946" created: node "37daf354", group "gowebhello"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "075076bc" finished with status "complete"

    Check the status of service ‘bar’

    # nomad alloc status 9f16354b

    Output:

    ID                  = 9f16354b
    Eval ID             = 075076bc
    Name                = bar_docker.gowebhello[1]
    Node ID             = 9bab72b1
    Node Name           = client1
    Job ID              = bar_docker
    Job Version         = 0
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 4m28s ago
    Modified            = 4m16s ago
    
    Task "gowebhello" is "running"
    Task Resources
    CPU        Memory           Disk     Addresses
    0/500 MHz  6.2 MiB/256 MiB  300 MiB  http: 192.168.1.201:23646
    
    Task Events:
    Started At     = 2019-06-14T06:49:36Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-14T06:49:36Z  Started     Task started by client
    2019-06-14T06:49:35Z  Task Setup  Building Task Directory
    2019-06-14T06:49:35Z  Received    Task received by client

    Check the Fabio Routes

    http://192.168.1.101:9998/routes

    Connect to the Services

    The services “foo” and “bar” are available at:

    http://192.168.1.101:9999/foo

    http://192.168.1.101:9999/bar

    Output:

    gowebhello root page
    
    https://github.com/udhos/gowebhello is a simple golang replacement for 'python -m SimpleHTTPServer'.
    Welcome!
    gowebhello version 0.7 runtime go1.12.5 os=linux arch=amd64
    Keepalive: true
    Application banner: Welcome to FOO
    ...
    ...

    Pressing F5 to refresh the browser should cycle you through the different backend instances, showing that requests are being load balanced across them.
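    The alternation you see on refresh is the load balancer rotating across healthy instances; round-robin is the simplest such strategy. A minimal sketch of the idea (an illustration only, not Fabio's actual routing code; the second backend address is hypothetical):

```typescript
// Minimal round-robin picker: each call returns the next backend in order,
// wrapping around, which is why refreshing alternates between instances.
class RoundRobin {
  private next = 0;
  constructor(private readonly backends: string[]) {}

  pick(): string {
    const backend = this.backends[this.next];
    this.next = (this.next + 1) % this.backends.length;
    return backend;
  }
}

// The first address comes from the alloc status above; the second is hypothetical.
const foo = new RoundRobin(['192.168.1.201:23382', '192.168.1.202:23383']);
console.log(foo.pick()); // 192.168.1.201:23382
console.log(foo.pick()); // 192.168.1.202:23383
console.log(foo.pick()); // 192.168.1.201:23382 (wraps around)
```

    Fabio's real target selection is driven by the Consul health checks and route weights; the sketch only captures why successive requests land on different instances.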

    Conclusion

    This article should give you a fair idea about the common problems of a distributed application and how they can be solved.

    Remodeling an existing application deployment as it scales can be quite a challenge. Hopefully the sample/demo setup will help you to explore, design and optimize the deployment workflows of your application, be it On-Premise or any Cloud Environment.

  • Why You Should Prefer Next.js 12 Over Other React Setup

    If you are coming from a robust framework, such as Angular or any other major full-stack framework, you have probably asked yourself why a popular library like React (yes, it’s not a framework, hence this blog) has the worst tooling and developer experience.

    They’ve done the least amount of work possible to build this library: no routing, no support for SSR, no decent design system or CSS support. Some people might disagree: “The whole idea is to keep it simple so that people can bootstrap their own framework.” –Dan Abramov. Here’s the catch, though: most people don’t want to go through the tedious process of setting all that up.

    Many just want to install and start building some robust applications, and with the new release of Next.js (12), it’s more production-ready than your own setup can ever be.

    Before we get started discussing what Next.js 12 can do for us, let’s get some facts straight:

    • React is indeed a library that could be used with or without JSX.
    • Next.js is a framework (Not entirely UI ) for building full-stack applications. 
    • Next.js is opinionated, so if your plan is to do whatever you want or how you want, maybe Next isn’t the right thing for you (mind that it’s for production).
    • Although Next has one of the most actively updated code bases and a massive community supporting it, a huge portion of it is handled by Vercel, and like other frameworks backed by a tech giant, be ready for occasional vendor lock-in (don’t forget React [Meta]).
    • This is not a Next.js tutorial; I won’t be going over Next.js. I will be going over the features that are released with V12 that make it go over the inflection point where Next could be considered as the primary framework for React apps.

    ES module support

    ES modules bring a standardized module system to the entire JS ecosystem. They’re supported by all major browsers and Node.js, enabling smaller package sizes in your build. This lets you use any package via a URL, with no installation or build step required, from any CDN that serves ES modules, as well as the design tools of the future (Framer already does it: https://www.framer.com/).

    import Card from 'https://framer.com/m/Card-3Yxh.js@gsb1Gjlgc5HwfhuD1VId';
    import Document from 'next/document';
    import Head from 'next/head';
    
    export default class MyDocument extends Document {
      render() {
        return (
          <>
            <Head>
              <title>URL imports for Next 12</title>
            </Head>
            <div>
              <Card variant='R3F' />
            </div>
          </>
        );
      }
    }

    As you can see, we are importing a Card component directly from the Framer CDN on the fly, with all its perks. This could in turn be the start of seamless integration with all your developer environments in the not-too-distant future. If you want to learn more about URL imports and how to enable the alpha version, go here.

    New engine for faster DEV run and production build:

    Next.js 12 ships a new Rust compiler with native infrastructure, built on top of SWC, an open platform for fast tooling. It claims impressive stats: roughly 3 times faster local refresh and 5 times faster production builds.

    Unlike most React production builds using webpack, which carry a ton of overhead and don’t really run natively on your system, SWC is going to save you a great deal of the time you lose to mundane build workloads.

    Source: Nextjs.org

    Next.js Live:

    If you are anything like me, you’ve probably had changes that you aren’t really sure about and just want to go through with the designer, without pushing the code to PROD. Taking a call with the designer and sharing your screen isn’t really the best way to do it. If only there were a way to share your work-in-progress with your team, with collaboration features that don’t take an entire day to set up. Well, Next.js Live lets you do just that.

    Source: Next.js

    With the help of the ES module system and native support for WebAssembly, Next.js Live runs entirely in the browser, irrespective of where you host it. The development engine behind it will soon be open source so that more platforms can take advantage of it, but for now, it’s all Next.js.

    Go over to V and do a test run.

    Middleware & serverless: 

    Middleware consists of repetitive pieces of code that can run on their own, outside of your actual backend. The best part is that you don’t really need to place it close to your backend. Before a request completes, you can rewrite, redirect, add headers, or even stream HTML. Depending on how you host your middleware, using Vercel Edge Functions or Lambdas on AWS, it can potentially handle:

    • Authentication
    • Bot protection
    • Redirects 
    • Browser support
    • Feature flags 
    • A/B tests
    • Server-side analytics 
    • Logging

    And since this is part of the Next build output, you can technically use any hosting provider with an edge network (no vendor lock-in).

    For implementing middleware, we can create a _middleware file inside any pages folder; it will run before any request to that particular route:
    
    pages/routeName/_middleware.ts

    import type { NextFetchEvent } from 'next/server';
    import { NextResponse } from 'next/server';
    
    export function middleware(event: NextFetchEvent) {
      // grab the user's country, or fall back to India ('in') by default
      const country = event.request.geo?.country?.toLowerCase() || 'in';
    
      // rewrite to a static, cached page for each locale
      return event.respondWith(NextResponse.rewrite(`/routeName/${country}`));
    }

    With this middleware in place, each request can be served from a cached, per-country page. Because the rewrite happens on the server, the URL in the client stays unchanged while Next.js still serves the country-specific content.

    Server-side streaming:

    React 18 now supports a server-side Suspense API and SSR streaming. One big drawback of SSR was that any page needing heavy lifting from the server would block rendering, giving you a higher FCP (First Contentful Paint). Streaming server-rendered pages over HTTP solves this higher render time. You can take a look at the alpha version by adding:

    module.exports = {
      experimental: {
        concurrentFeatures: true
      }
    }

    React server components:

    React Server Components allow us to render almost everything, including the components themselves, on the server. This is fundamentally different from SSR, where you are only generating HTML on the server; with server components there is zero client-side JavaScript needed, making the rendering process much faster (basically no hydration step). This can be seen as combining the best parts of server rendering with client-side interactivity.

    import Footer from '../components/Footer';
    import Page from '../components/Page';
    import Story from '../components/Story';
    import fetchData from '../lib/api';
    export async function getServerSideProps() {
      const storyIds = await fetchData('storyIds');
      const data = await Promise.all(
        storyIds.slice(0, 30).map(async (id) => await fetchData(`item/${id}`))
      );
    
      return {
        props: {
          data,
        },
      };
    }
    
    export default function News({ data }) {
      return (
        <Page>
          {data?.map((item, i) => (
            <Story key={i} {...item} />
          ))}
          <Footer />
        </Page>
      );
    }

    As you can see in the above SSR example, while we are fetching the stories from the endpoint, the client is left waiting on a blank page; depending on how fast your APIs are, this is a pretty big problem, and the reason we don’t just use SSR blindly everywhere.

    Now, let’s take a look at a server component example:

    Any file ending with .server.js/.ts will be treated as a server component in your Next.js application. 

    import { Suspense } from 'react';
    import Footer from '../components/Footer';
    import Page from '../components/Page';
    // Spinner and StoryWithData are assumed to live alongside the other components
    import Spinner from '../components/Spinner';
    import StoryWithData from '../components/StoryWithData';
    import fetchData from '../lib/api';
    
    export async function NewsWithData() {
      const storyIds = await fetchData('storyIds');
      return (
        <>
          {storyIds.slice(0, 30).map((id) => {
            return (
              <Suspense fallback={<Spinner />}>
                <StoryWithData id={id} />
              </Suspense>
            );
          })}
        </>
      );
    }
    
    export default function News() {
      return (
        <Page>
          <Suspense fallback={<Spinner />}>
            <NewsWithData />
          </Suspense>
          <Footer />
        </Page>
      );
    }

    This implementation will stream your components progressively, showing data as it is generated on the server, component by component. The difference is huge: it is the next level of code splitting, letting you do data fetching at the component level without worrying about making API calls from the browser.

    Functions like getStaticProps and getServerSideProps will become a thing of the past.

    This also aligns with the React Hooks model, moving toward a decentralized component model. It removes the choice we often have to make between static and dynamic, bringing the best of both worlds. In the future, Incremental Static Regeneration will work at a per-component level, replacing all-or-nothing page caching and in turn allowing intelligent caching decisions based on your needs.

    Next.js is internally working on a data component, which is basically the React Suspense API but with surrogate keys, revalidation, and fallbacks; it will help realize these ideas by letting you define your caching semantics at the component level.

    Conclusion:

    Although all the features mentioned above are still in development, their inception alone will push the React world, and frontend in general, in a particular direction, and it’s why you should keep Next.js as your default go-to production framework.

  • Machine Learning for your Infrastructure: Anomaly Detection with Elastic + X-Pack

    Introduction

    The world continues to go through digital transformation at an accelerating pace. Modern applications and infrastructure continue to expand, and operational complexity continues to grow. According to a recent ManageEngine Application Performance Monitoring Survey:

    • 28 percent use ad-hoc scripts to detect issues in over 50 percent of their applications.
    • 32 percent learn about application performance issues from end users.
    • 59 percent trust monitoring tools to identify most performance deviations.

    Most enterprises and web-scale companies have instrumentation and monitoring capabilities built around an Elasticsearch cluster. They collect a large amount of data but struggle to use it effectively. This available data can be used to improve performance and uptime, as well as to support root cause analysis and incident prediction.

    IT Operations & Machine Learning

    Here is the main question: how do we make sense of the huge piles of collected data? The first step is to understand the correlations between the time series data. But understanding correlations alone is not enough, since correlation does not imply causation. We need a practical, scalable approach to understand the cause-effect relationships between data sources and events across a complex infrastructure of VMs, containers, networks, microservices, regions, etc.

    It is very likely that a fault in one component causes something to go wrong in another. In such cases, historical operational data can be used to identify the root cause by tracing through a series of intermediate causes and effects. Machine learning is particularly useful for problems where we need to identify “what changed”, since the algorithms can analyze existing data to learn its patterns, making it easier to recognize the cause. This is known as unsupervised learning: the algorithm learns from experience and identifies similar patterns when they come along again.

    Let’s see how you can setup Elastic + X-Pack to enable anomaly detection for your infrastructure & applications.

    Anomaly Detection using Elastic’s machine learning with X-Pack

    Step I: Setup

    1. Setup Elasticsearch: 

    According to the Elastic documentation, it is recommended to use Oracle JDK version 1.8.0_131. Check that you have the required Java version installed on your system; it should be at least Java 8. Install or upgrade accordingly.

    • Download elasticsearch tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.tar.gz
    $ tar -xzvf elasticsearch-5.5.1.tar.gz

    • It will then create a folder named elasticsearch-5.5.1. Go into the folder.
    $ cd elasticsearch-5.5.1

    • Install X-Pack into Elasticsearch
    $ ./bin/elasticsearch-plugin install x-pack

    • Start elasticsearch
    $ bin/elasticsearch

    2. Setup Kibana

    Kibana is an open source analytics and visualization platform designed to work with Elasticsearch.

    • Download kibana tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/kibana/kibana-5.5.1-linux-x86_64.tar.gz
    $ tar -xzf kibana-5.5.1-linux-x86_64.tar.gz

    • It will then create a folder named kibana-5.5.1. Go into the directory.
    $ cd kibana-5.5.1-linux-x86_64

    • Install X-Pack into Kibana
    $ ./bin/kibana-plugin install x-pack

    • Running kibana
    $ ./bin/kibana

    • Navigate to Kibana at http://localhost:5601/
    • Log in as the built-in user elastic and password changeme.
    • You will see the below screen:
    Kibana: X-Pack Welcome Page

     

    3. Metricbeat:

    Metricbeat helps in monitoring servers and the services they host by collecting metrics from the operating system and services. We will use it to get CPU utilization metrics of our local system in this blog.

    • Download Metric Beat’s tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-5.5.1-linux-x86_64.tar.gz
    $ tar -xvzf metricbeat-5.5.1-linux-x86_64.tar.gz

    • It will create a folder metricbeat-5.5.1-linux-x86_64. Go to the folder
    $ cd metricbeat-5.5.1-linux-x86_64

    • By default, Metricbeat is configured to send collected data to elasticsearch running on localhost. If your elasticsearch is hosted on any server, change the IP and authentication credentials in metricbeat.yml file.
     Metricbeat Config

     

    • Metricbeat provides the following stats:
    • System load
    • CPU stats
    • IO stats
    • Per filesystem stats
    • Per CPU core stats
    • File system summary stats
    • Memory stats
    • Network stats
    • Per process stats
    • Start Metricbeat as daemon process
    $ sudo ./metricbeat -e -c metricbeat.yml &

    Now all the setup is done. Let’s move on to preparing the time series data and creating machine learning jobs.

    Step II: Time Series data

    • Real-time data: Metricbeat provides us the real-time series data that will be used for unsupervised learning. Follow the steps below to define the index pattern metricbeat-* in Kibana to search against this pattern in Elasticsearch:
      – Go to Management -> Index Patterns  
      – Provide Index name or pattern as metricbeat-*
      – Select Time filter field name as @timestamp
      – Click Create

    You will not be able to create the index pattern if Elasticsearch does not contain any Metricbeat data. Make sure Metricbeat is running and its output is configured to Elasticsearch.

    • Saved historic data: To quickly see how machine learning detects anomalies, you can also use sample data provided by Elastic. Download the sample data by clicking here.
    • Unzip the files in a folder: tar -zxvf server_metrics.tar.gz
    • Download this script. It will be used to upload sample data to elastic.
    • Provide execute permissions to the file: chmod +x upload_server-metrics.sh
    • Run the script.
    • Just as we created an index pattern for the Metricbeat data, create the index pattern server-metrics*

    Step III: Creating Machine Learning jobs

    There are two scenarios in which data is considered anomalous: first, when the behavior of a key indicator changes over time relative to its previous behavior; second, when, within a population, the behavior of an entity deviates from the other entities on a single key indicator.

    To detect these anomalies, there are three types of jobs we can create:

    1. Single metric job: Detects Scenario 1 anomalies over a single key performance indicator.
    2. Multimetric job: Also detects Scenario 1 anomalies, but can track more than one performance indicator, such as CPU utilization along with memory utilization.
    3. Advanced job: Created to detect Scenario 2 anomalies.

    For simplicity, we are creating following single metric jobs:

    1. Tracking CPU Utilization: Using metric beat data
    2. Tracking total requests made on server: Using sample server data

    Follow below steps to create single metric jobs:

    Job1: Tracking CPU Utilization

    Job2: Tracking total requests made on server

    • Go to http://localhost:5601/
    • Go to Machine learning tab on the left panel of Kibana.
    • Click on Create new job
    • Click Create single metric job
    • Select index we created in Step 2 i.e. metricbeat-* and server-metrics* respectively
    • Configure jobs by providing following values:
    1. Aggregation: Select the aggregation function that will be applied to the particular field of the data we are analyzing.
    2. Field: A drop-down showing all the fields available in the index pattern.
    3. Bucket span: The interval for analysis; the aggregation function is applied to the selected field over each interval specified here.
    • If your data contains many empty buckets, i.e. the data is sparse, and you don’t want to consider that anomalous, check the sparse data checkbox (if it appears).
    • Click on Use full <index pattern> data to use all available data for analysis.
    Metricbeats Description
    Server Description
    • Click on play symbol
    • Provide job name and description
    • Click on Create Job

    After the job is created, the available data will be analyzed. Click on View Results and you will see a chart showing the actual value along with the upper and lower bounds of the predicted value. If the actual value lies outside this range, it is considered anomalous. The color of the circles represents the severity level.
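    The “actual value outside the predicted bounds” idea can be sketched with a toy example (an illustration of the concept only, not Elastic’s actual modelling, which is probabilistic and far more sophisticated):

```typescript
// Flag points that fall outside mean ± tolerance * stddev, a crude stand-in
// for the learned upper/lower bounds shown in the ML results chart.
function detectAnomalies(values: number[], tolerance = 2): number[] {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length;
  const std = Math.sqrt(variance);
  // return the indices of the values outside the expected range
  return values
    .map((value, i) => (Math.abs(value - mean) > tolerance * std ? i : -1))
    .filter((i) => i !== -1);
}

// A mostly flat CPU-utilization series with one spike at index 5
console.log(detectAnomalies([10, 11, 9, 10, 12, 95, 10, 11])); // [ 5 ]
```

    Elastic’s ML jobs additionally adapt the bounds as more data arrives, which is why the predicted range narrows as the job keeps learning.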

    Here we get a wide prediction range since the job has just started learning; as it sees more data, the predictions improve.
    You can see here that the predictions are pretty good, since there is a lot of data from which to learn the pattern.
    • Click on machine learning tab in the left panel. The jobs we created will be listed here.
    • You will see the list of actions for every job you have created.
    • Since we are storing per-minute data for Job1 using Metricbeat, we can feed data to the job in real time. Click the play button to start the datafeed; as more data arrives, the predictions will improve.
    • You can see the details of the anomalies by clicking Anomaly Viewer.
    Anomaly in metricbeats data
    Server metrics anomalies  

    We have seen how machine learning can be used to find patterns across different statistics, along with anomaly detection. After identifying anomalies, we need to find the context of those events, for example, which other factors are contributing to the problem. In such cases, we can troubleshoot by creating multimetric jobs.

  • Idiot-proof Coding with Node.js and Express.js

    Node.js has become the most popular platform for web development, surpassing Ruby on Rails and Django in popularity. The growing popularity of full-stack development, along with the performance benefits of asynchronous programming, has led to Node’s rise. ExpressJs is a minimalistic, unopinionated and highly popular web framework built for Node that has become the de-facto choice for many projects.
    Note: This article is about building a RESTful API server with ExpressJs. I won’t be delving into a templating library like Handlebars to manage the views.

    A quick search on Google will lead you to a ton of articles agreeing with what I just said. Your next step would be to go through a couple of videos about ExpressJs on YouTube, try a hello world from a boilerplate template, choose a few recommended middleware for Express (Helmet, Multer etc.), an ORM (Mongoose if you are using Mongo, or Sequelize if you are using a relational DB) and start building the APIs. Wow, that was so fast!

    The problem starts to appear after a few weeks, when your code gets larger and more complex and you realise that there is no standard coding practice followed across the client and server code, refactoring or updating the code breaks something else, versioning of the APIs becomes difficult, and callbacks have made your life hell (you are smart if you are using Promises, but have you heard of async-await?).

    Do you think your code is not so idiot-proof anymore? Don’t worry! You aren’t the only one who thinks this way.

    Let me break the suspense and list down the technologies and libraries used in our idiot-proof code before you get restless.

    1. Node 8.11.3: This is the latest LTS release from Node. We are using all the ES6 features along with async-await. We have the latest version of ExpressJs (4.16.3).
    2. Typescript: It adds an optional static typing interface to JavaScript and also gives us familiar constructs like classes (ES6 also provides class as a construct), which makes it easy to maintain a large codebase.
    3. Swagger: It provides a specification to easily design, develop, test and document RESTful interfaces. Swagger also provides many open source tools like codegen and editor that makes it easy to design the app.
    4. TSLint: It performs static code analysis on Typescript for maintainability, readability and functionality errors.
    5. Prettier: It is an opinionated code formatter which maintains a consistent style throughout the project. This only takes care of the styling like the indentation (2 or 4 spaces), should the arguments remain on the same line or go to the next line when the line length exceeds 80 characters etc.
    6. Husky: It allows you to add git hooks (pre-commit, pre-push) which can trigger TSLint, Prettier or Unit tests to automatically format the code and to prevent the push if the lint or the tests fail.

    Before you move to the next section I would recommend going through the links to ensure that you have a sound understanding of these tools.

    Now I’ll talk about some of the challenges we faced in some of our older projects and how we addressed these issues in the newer projects with the tools/technologies listed above.

    Formal API definition

    A problem that everyone can relate to is the lack of formal documentation in the project. Swagger addresses a part of this problem with their OpenAPI specification which defines a standard to design REST APIs which can be discovered by both machines and humans. As a practice, we first design the APIs in swagger before writing the code. This has 3 benefits:

    • It helps us focus only on the design without having to worry about the code, scaffolding, naming conventions etc. Our API designs are consistent with the implementation because of this focused approach.
    • We can leverage tools like swagger-express-mw to internally wire the routes in the API doc to the controller, validate request and response object from their definitions etc.
    • Collaboration between teams becomes very easy, simple and standardised because of the Swagger specification.
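    As a sketch of the design-first flow, a minimal Swagger 2.0 document for a single endpoint might look like the fragment below. The API, paths, and models here are purely illustrative, not taken from any real project:

    ```yaml
    # Hypothetical design-first spec: written before any code exists.
    swagger: "2.0"
    info:
      title: Users API
      version: "1.0.0"
    paths:
      /users/{id}:
        get:
          parameters:
            - name: id
              in: path
              required: true
              type: integer
          responses:
            "200":
              description: A single user
              schema:
                $ref: "#/definitions/User"
    definitions:
      User:
        type: object
        required: [id, email]
        properties:
          id:
            type: integer
          email:
            type: string
    ```

    Tools like swagger-express-mw can then wire routes from this document to controllers and validate requests/responses against the `User` definition.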

    Code Consistency

    We wanted our code to look consistent across the stack (UI and backend), and we use ESLint to enforce this consistency.
    Example –
    Node traditionally used require, while the UI-based frameworks used the ES6 import syntax to load modules. We decided to follow the ES6 style across the project, and these rules are defined with ESLint.
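    For illustration, a minimal TypeScript sketch of the convention (the buildPath helper is hypothetical; the point is the module syntax):

    ```typescript
    // CommonJS style (traditional Node) — flagged by our lint rules:
    // const path = require('path');

    // ES6 module style, standardised across UI and backend:
    import * as path from 'path';

    function buildPath(dir: string, file: string): string {
      return path.join(dir, file);
    }

    console.log(buildPath('src', 'index.ts')); // src/index.ts (on POSIX)
    ```
    
    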

    Note — We have made slight adjustments to the TSLint rules for the backend and the frontend to make things easier for developers. For example, we allow up to 120 characters per line in React, as some of our DOM-related code gets lengthy very easily.

    Code Formatting

    This is as important as maintaining code consistency in the project. It’s easier to read code that follows a consistent format: indentation, spaces, line breaks, etc. Prettier does a great job at this. We have also integrated Prettier with TypeScript to highlight formatting errors along with linting errors. IDEs like VS Code also have a Prettier plugin that supports features like auto-format to make this easy.

    Strict Typing

    TypeScript is leveraged best when the application follows strict typing. We try to enforce it as much as possible, with exceptions made in some cases (mostly when a third-party library doesn’t have a type definition). This has the following benefits:

    • Static code analysis works better when your code is strongly typed. We discover about 80–90% of the issues before compilation itself using the plugins mentioned above.
    • Refactoring and enhancements become much simpler with TypeScript. We first update the interface or the function definition and then follow the errors thrown by the TypeScript compiler to refactor the code.
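    A small, hypothetical sketch of what strict typing buys us (assuming "strict": true in tsconfig.json; the User interface here is illustrative):

    ```typescript
    interface User {
      id: number;
      email: string;
    }

    function getDomain(user: User): string {
      return user.email.split('@')[1];
    }

    // Under strict mode the compiler rejects these before the code ever runs:
    // getDomain(null);        // 'null' is not assignable to parameter of type 'User'
    // getDomain({ id: 1 });   // Property 'email' is missing in type '{ id: number; }'

    console.log(getDomain({ id: 1, email: 'dev@example.com' })); // example.com
    ```
    
    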

    Git Hooks

    Husky’s “pre-push” hook runs TSLint to ensure that we don’t push code with linting issues. If you follow TDD (the way it’s supposed to be done), then you can also run unit tests before pushing the code. We decided to go with pre-hooks because:
    – Not everyone has CI from the very first day. With a git hook, we at least have some code quality checks from day one.
    – Running lint and unit tests on the developer’s system leaves your CI with more resources to run integration and other complex tests that are not possible in a local environment.
    – You force the developer to fix issues at the earliest, which results in better code quality, faster code merges, and faster releases.
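    For reference, a hypothetical package.json fragment wiring these hooks (this assumes husky v4-style configuration; newer husky versions move hooks into a .husky/ directory of shell scripts instead):

    ```json
    {
      "husky": {
        "hooks": {
          "pre-commit": "prettier --write . && tslint -p tsconfig.json",
          "pre-push": "tslint -p tsconfig.json && npm test"
        }
      }
    }
    ```

    With this in place, a push is rejected locally whenever lint or the unit tests fail.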

    Async-await

    We were using promises in our project for all asynchronous operations. Promises often led to long chains of then/catch blocks, which were not very comfortable to read and often resulted in bugs as they grew (it goes without saying that promises are still much better than the callback pattern). Async-await provides a very clean syntax for asynchronous operations that reads like sequential code. We have seen a drastic improvement in code quality, fewer bugs, and better readability since moving to async-await.
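    A minimal side-by-side sketch of the two styles (fetchUser is a hypothetical stand-in for a real DB or HTTP call):

    ```typescript
    // A simulated async operation (hypothetical stand-in for a DB or HTTP call).
    const fetchUser = (id: number): Promise<string> =>
      Promise.resolve(`user-${id}`);

    // Promise-chain style: every step adds another .then/.catch block.
    function greetWithThen(id: number): Promise<string> {
      return fetchUser(id)
        .then((name) => `hello ${name}`)
        .catch(() => 'hello stranger');
    }

    // async-await style: reads like sequential code; errors use try/catch.
    async function greetWithAwait(id: number): Promise<string> {
      try {
        const name = await fetchUser(id);
        return `hello ${name}`;
      } catch {
        return 'hello stranger';
      }
    }

    greetWithAwait(7).then(console.log); // hello user-7
    ```
    
    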

    Hope this article gave you some insights into tools and libraries that you can use to build a scalable ExpressJS app.

  • Cloud Native Applications — The Why, The What & The How

    Cloud-native is an approach to build & run applications that can leverage the advantages of the cloud computing model — On demand computing power & pay-as-you-go pricing model. These applications are built and deployed in a rapid cadence to the cloud platform and offer organizations greater agility, resilience, and portability across clouds.

    This blog explains the importance, the benefits and how to go about building Cloud Native Applications.

    CLOUD NATIVE – The Why?

    Early technology adopters like FANG (Facebook, Amazon, Netflix & Google) have some common themes when it comes to shipping software. They have invested heavily in building capabilities that enable them to release new features regularly (weekly, daily or in some cases even hourly). They have achieved this rapid release cadence while supporting safe and reliable operation of their applications, in turn allowing them to respond more effectively to their customers’ needs.

    They have achieved this level of agility by moving beyond ad-hoc automation and adopting cloud native practices that deliver these predictable capabilities. DevOps, Continuous Delivery, microservices & containers form the 4 main tenets of Cloud Native patterns. All of them share the same overarching goal of making application development and operations teams more efficient through automation.

    At this point though, these techniques have largely been proven only at the aforementioned software-driven companies. Smaller, more agile companies are also realising the value here. However, as per Joe Beda (creator of Kubernetes & CTO at Heptio), there are very few examples of this philosophy being applied outside these technology-centric companies.

    Any team/company shipping products should seriously consider adopting Cloud Native practices if they want to ship software faster while reducing risk and in turn delighting their customers.

    CLOUD NATIVE – The What?

    Cloud Native practice comprises 4 main tenets.

     

    Cloud native — main tenets
    • DevOps is the collaboration between software developers and IT operations with the goal of automating the process of software delivery & infrastructure changes.
    • Continuous Delivery enables applications to be released quickly, reliably & frequently, with less risk.
    • Micro-services is an architectural approach to building an application as a collection of small independent services that run on their own and communicate over HTTP APIs.
    • Containers provide light-weight virtualization by dynamically dividing a single server into one or more isolated containers. Containers offer both efficiency & speed compared to standard Virtual Machines (VMs). Containers provide the ability to manage and migrate the application dependencies along with the application, while abstracting away the OS and the underlying cloud platform in many cases.

    The benefits that can be reaped by adopting these methodologies include:

    1. Self-managing infrastructure through automation: The Cloud Native practice goes beyond ad-hoc automation built on top of virtualization platforms; instead it focuses on orchestration, management and automation of the entire infrastructure, right up to the application tier.
    2. Reliable infrastructure & applications: Cloud Native practice makes it much easier to handle churn, replace failed components, and recover from unexpected events & failures.
    3. Deeper insights into complex applications: Cloud Native tooling provides visualization for health management, monitoring and notifications with audit logs, making applications easy to audit & debug.
    4. Security: Enables developers to build security into applications from the start rather than as an afterthought.
    5. More efficient use of resources: Containers are lighter weight than full systems. Deploying applications in containers leads to increased resource utilization.

    Software teams have grown in size, and the number of applications and tools that a company needs to build has grown 10x over the last few years. Microservices break large complex applications into smaller pieces so that they can be developed, tested and managed independently. This enables a single microservice to be updated or rolled back without affecting other parts of the application. Also, software teams nowadays are distributed, and microservices enable each team to own a small piece, with service contracts acting as the communication layer.

    CLOUD NATIVE – The How?

    Now, let’s look at the various building blocks of the cloud native stack that help achieve the goals described above. Here, we have grouped tools & solutions by the problem they solve. We start with the infrastructure layer at the bottom, then the tools used to provision the infrastructure; above that sits the container runtime environment, then the tools to manage clusters of container environments, and at the very top the tools and frameworks used to develop the applications.

    1. Infrastructure: At the very bottom, we have the infrastructure layer which provides the compute, storage, network & operating system usually provided by the Cloud (AWS, GCP, Azure, Openstack, VMware).

    2. Provisioning: The provisioning layer consists of automation tools that help in provisioning the infrastructure, managing images and deploying the application. Chef, Puppet & Ansible are the DevOps tools that give the ability to manage their configuration & environments. Spinnaker, Terraform, Cloudformation provide workflows to provision the infrastructure. Twistlock, Clair provide the ability to harden container images.

    3. Runtime: The Runtime provides the environment in which the application runs. It consists of the Container Engines where the application runs along with the associated storage & networking. containerd & rkt are the most widely used Container engines. Flannel, OpenContrail provide the necessary overlay networking for containers to interact with each other and the outside world while Datera, Portworx, AppOrbit etc. provide the necessary persistent storage enabling easy movement of containers across clouds.

    4. Orchestration and Management: Tools like Kubernetes, Docker Swarm and Apache Mesos abstract the management of container clusters, allowing easy scheduling & orchestration of containers across multiple hosts. etcd and Consul provide service registries for discovery, while AVI and Envoy provide proxy, load balancing and related services.

    5. Application Definition & Development: We can build micro-services for applications across multiple languages — Python, Spring/Java, Ruby, Node. Packer, Habitat & Bitnami provide image management for the application to run across all infrastructure — container or otherwise.
    Jenkins, TravisCI, CircleCI and other build automation servers provide the capability to set up continuous integration and delivery pipelines.

    6. Monitoring, Logging & Auditing: One of the key features of managing Cloud Native Infrastructure is the ability to monitor & audit the applications & underlying infrastructure.

    All modern monitoring platforms like Datadog, Newrelic, AppDynamic support monitoring of containers & microservices.

    Splunk, Elasticsearch & fluentd help in log aggregation, while OpenTracing and Zipkin help in debugging applications.

    7. Culture: Adopting cloud native practices needs a cultural change where teams no longer work in independent silos. End-to-end automation of software delivery pipelines is only possible when there is increased collaboration between development and IT operations teams with a shared responsibility.

    When we put all the pieces together we get the complete Cloud Native Landscape as shown below.

    Cloud Native Landscape

    I hope this post gives an idea of why Cloud Native is important and what the main benefits are. As you may have noticed in the above infographic, there are several projects, tools & companies trying to solve similar problems. The next questions on your mind will most likely be: How do I get started? Which tools are right for me? And so on. I will cover these topics and more in my following blog posts. Stay tuned!

    Please let us know what you think by adding comments to this blog or reaching out to chirag_jog or Velotio on Twitter.

    Learn more about what we do at Velotio here and how Velotio can get you started on your cloud native journey here.


  • Eliminate Render-blocking Resources using React and Webpack

    In the previous blog, we learned how a browser downloads many scripts and other resources to render a webpage. Not all of them are necessary to show the page’s content, yet they delay the page rendering. However, most of them will be needed as the user navigates through the website’s various pages.

    In this article, we’ll learn to identify such resources and classify them as critical and non-critical. Once identified, we’ll inline the critical resources and defer the non-critical resources.

    For this blog, we’ll use the following tools:

    • Google Lighthouse and other Chrome DevTools to identify render-blocking resources.
    • Webpack and CRACO to fix it.

    Demo Configuration

    For the demo, I have added the JavaScript below to the <head> of index.html as a render-blocking JS resource. This script loads two more CSS resources on the page.

    https://use.fontawesome.com/3ec06e3d93.js

    Other configurations are as follows:

    • Create React App v4.0
    • Formik and Yup for handling form validations
    • Font Awesome and Bootstrap
    • Lazy loading and code splitting using Suspense, React lazy, and dynamic import
    • CRACO
    • html-critical-webpack-plugin
    • ngrok and serve for serving build

    Render-Blocking Resources

    A render-blocking resource typically refers to a script or link that prevents a browser from rendering the processed content.

    Lighthouse will flag the below as render-blocking resources:

    • A <script> tag in <head> that doesn’t have a defer or async attribute.
    • A <link rel="stylesheet"> tag that doesn’t have a media attribute matching the user’s device or a disabled attribute hinting the browser not to download it when unnecessary.
    • A <link rel="import"> that doesn’t have an async attribute.

    Identifying Render-Blocking Resources

    To reduce the impact of render-blocking resources, find out what’s critical for loading and what’s not.

    To do that, we’re going to use the Coverage Tab in Chrome DevTools. Follow the steps below:

    1. Open the Chrome DevTools (press F12)

    2. Go to the Sources tab and press Cmd+Shift+P (Ctrl+Shift+P on Windows/Linux) to open the Command Menu

    The screenshot below was taken on macOS.

    3. Search for Show Coverage and select it, which will show the Coverage tab below. Expand the tab.

    4. Click on the reload button on the Coverage tab to reload the page and start instrumenting the coverage of all the resources loading on the current page.

    5. After capturing the coverage, the resources loaded on the page will get listed (refer to the screenshot below). This will show you the code being used vs. the code loaded on the page.

    The list will display coverage in 2 colors:

    a. Green (critical) – The code needed for the first paint

    b. Red (non-critical) – The code not needed for the first paint.

    After checking each file and the generated index.html after the build, I found three primary non-critical files –

    a. 5.20aa2d7b.chunk.css – 98% non-critical code

    b. https://use.fontawesome.com/3ec06e3d93.js – 69.8% non-critical code. This script loads the following CSS –

    1. font-awesome-css.min.css – 100% non-critical code

    2. https://use.fontawesome.com/3ec06e3d93.css – 100% non-critical code

    c. main.6f8298b5.chunk.css – 58.6% non-critical code

    The above resources satisfy the condition of a render-blocking resource and hence are prompted by the Lighthouse Performance report as an opportunity to eliminate the render-blocking resources (refer screenshot). You can reduce the page size by only shipping the code that you need.

    Solution

    Once you’ve identified critical and non-critical code, it’s time to extract the critical part as an inline resource in index.html and defer the non-critical part using the webpack plugin configuration.

    For Inlining and Preloading CSS: 

    Use html-critical-webpack-plugin to inline the critical CSS into index.html. This will generate a <style> tag in the <head> with the critical CSS stripped out of the main CSS chunk, and preload the main file.

    const path = require('path');
    const { whenProd } = require('@craco/craco');
    const HtmlCriticalWebpackPlugin = require('html-critical-webpack-plugin');
    
    module.exports = {
      webpack: {
        configure: (webpackConfig) => {
          return {
            ...webpackConfig,
            plugins: [
              ...webpackConfig.plugins,
              ...whenProd(
                () => [
                  new HtmlCriticalWebpackPlugin({
                    base: path.resolve(__dirname, 'build'),
                    src: 'index.html',
                    dest: 'index.html',
                    inline: true,
                    minify: true,
                    extract: true,
                    width: 320,
                    height: 565,
                    penthouse: {
                      blockJSRequests: false,
                    },
                  }),
                ],
                []
              ),
            ],
          };
        },
      },
    };

    Once done, create a build and deploy. Here’s a screenshot of the improved opportunities:

    To use CRACO, refer to its README file.

    NOTE: If you’re planning to use the critters-webpack-plugin please check these issues first: Could not find HTML asset and Incompatible with html-webpack-plugin v4.

    For Deferring Routes/Pages:

    Use lazy-loading and code-splitting techniques along with webpack’s magic comments as below to preload or prefetch a route/page according to your use case.

    import { Suspense, lazy } from 'react';
    import { Redirect, Route, Switch } from 'react-router-dom';
    import Loader from '../../components/Loader';
    
    import './style.scss';
    
    const Login = lazy(() =>
      import(
        /* webpackChunkName: "login" */ /* webpackPreload: true */ '../../containers/Login'
      )
    );
    const Signup = lazy(() =>
      import(
        /* webpackChunkName: "signup" */ /* webpackPrefetch: true */ '../../containers/Signup'
      )
    );
    
    const AuthLayout = () => {
      return (
        <Suspense fallback={<Loader />}>
          <Switch>
            <Route path="/auth/login" component={Login} />
            <Route path="/auth/signup" component={Signup} />
            <Redirect from="/auth" to="/auth/login" />
          </Switch>
        </Suspense>
      );
    };
    
    export default AuthLayout;

    The magic comments enable webpack to add correct attributes to defer the scripts according to the use-case.

    For Deferring External Scripts:

    For those who are using a version of webpack lower than 5, use script-ext-html-webpack-plugin or resource-hints-webpack-plugin.

    I would recommend following the simple way given below to defer an external script.

    // Add defer/async attribute to external render-blocking script
    <script async defer src="https://use.fontawesome.com/3ec06e3d93.js"></script>

    The defer and async attributes can both be specified on an external script. When both are present, async takes precedence; browsers that don’t support async will fall back to the defer behaviour.

    If you want to know more about the async/defer, read the further reading section.

    Along with defer/async, we can also use media attributes to load CSS conditionally.
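    One common pattern for that (the stylesheet name here is a placeholder) is to request non-critical CSS with a media value that doesn’t match at first paint and swap it in once loaded:

    ```html
    <!-- The request is non-render-blocking because media="print" doesn't match
         on screen; the onload handler swaps it to all media once downloaded. -->
    <link rel="stylesheet" href="non-critical.css" media="print" onload="this.media='all'">
    ```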

    It’s also suggested to load fonts locally instead of from a CDN when we don’t need all the @font-face rules added by font providers.

    Now, let’s create and deploy the build once more and check the results.

    The opportunity to eliminate render-blocking resources shows no more in the list.

    We have finally achieved our goal!

    Final Thoughts

    The above configuration is a basic one. You can read the libraries’ docs for more complex implementation.

    Let me know if this helps you eliminate render-blocking resources from your app.

    If you want to check out the full implementation, here’s the link to the repo. I have created two branches—one with the problem and another with the solution. Read the further reading section for more details on the topics.

    Hope this helps.

    Happy Coding!

    Further Reading

  • Installing Redis Cluster with Persistent Storage on Mesosphere DC/OS

    In the first part of this blog, we saw how to install standalone redis service on DC/OS with Persistent storage using RexRay and AWS EBS volumes.

    A single server is a single point of failure in any system, so to ensure high availability of the Redis database, we can deploy a master-slave cluster of Redis servers. In this blog, we will see how to set up such a 6-node (3 master, 3 slave) Redis cluster and persist data using RexRay and AWS EBS volumes. After that, we will see how to import existing data into this cluster.

    Redis Cluster

    It is a form of replicated Redis servers in a multi-master architecture. All the data is sharded into 16384 buckets (hash slots), where every master node is assigned a subset of those buckets (generally evenly distributed) and each master is replicated by its slaves. It provides more resilience and scaling for production-grade deployments where heavy workloads are expected. Applications can connect to any node in cluster mode, and the request will be redirected to the respective master node.
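    The 16384-bucket sharding follows the Redis Cluster spec: slot = CRC16(key) mod 16384, using the CRC16-CCITT (XModem) checksum. A minimal TypeScript sketch of that mapping (illustrative only; real clients also honor the {hash tag} syntax, omitted here):

    ```typescript
    // CRC16-CCITT (XModem variant): init 0x0000, polynomial 0x1021, MSB-first.
    function crc16(data: string): number {
      let crc = 0x0000;
      for (let i = 0; i < data.length; i++) {
        crc ^= data.charCodeAt(i) << 8;
        for (let bit = 0; bit < 8; bit++) {
          crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
        }
      }
      return crc;
    }

    // A key always maps to one of the 16384 slots; each master owns a slot range.
    function keySlot(key: string): number {
      return crc16(key) % 16384;
    }

    console.log(keySlot('foo')); // some slot in [0, 16383]
    ```
    
    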

     Source:  Octo

         

    Objective: To create a Redis cluster from a number of services in a DCOS environment with persistent storage, and import existing Redis dump.rdb data into the cluster.

    Prerequisites:

    • Make sure the rexray component is running and in a healthy state on the DCOS cluster.

    Steps:

    • As per the Redis docs, a minimal cluster should have at least 3 master and 3 slave nodes, making a total of 6 Redis services.
    • All services will use a similar JSON configuration, except for changes in the names of the service, the external volume, and the port mappings.
    • We will deploy one Redis service for each Redis cluster node, and once all services are running, we will form the cluster among them.
    • We will use the host network for the Redis node containers; for that, we will restrict each Redis node to run on a particular DCOS node. This will help us troubleshoot the cluster (fixed IP, so we can restart a Redis node at any time without data loss).
    • Using the host network adds a prerequisite that the number of DCOS nodes must be >= the number of Redis nodes.
    1. First create Redis node services on DCOS:
    2. Click on the Add button in Services tab of DCOS UI
    • Click on JSON configuration
    • Add the JSON config below for the Redis service, changing the values written in BLOCK letters with # as prefix and suffix.
    • #NODENAME# – name of the Redis node (e.g. redis-node-1)
    • #NODEHOSTIP# – IP of the DCOS node on which this Redis node will run. This IP must be unique for each Redis node. (e.g. 10.2.12.23)
    • #VOLUMENAME# – name of the persistent volume; give a name that identifies the volume on AWS EBS (e.g. <dcos-cluster-name>-redis-node-<node-number>)
    • #NODEVIP# – VIP for the Redis node. It must be ‘redis’ for the first Redis node; for the others it can be the same as the NODENAME (e.g. redis-node-2)
    {
       "id": "/#NODENAME#",
       "backoffFactor": 1.15,
       "backoffSeconds": 1,
       "constraints": [
         [
           "hostname",
           "CLUSTER",
           "#NODEHOSTIP#"
         ]
       ],
       "container": {
         "type": "DOCKER",
         "volumes": [
           {
             "external": {
               "name": "#VOLUMENAME#",
               "provider": "dvdi",
               "options": {
                 "dvdi/driver": "rexray"
               }
             },
             "mode": "RW",
             "containerPath": "/data"
           }
         ],
         "docker": {
           "image": "parvezkazi13/redis:latest",
           "forcePullImage": false,
           "privileged": false,
           "parameters": []
         }
       },
       "cpus": 0.5,
       "disk": 0,
       "fetch": [],
       "healthChecks": [],
       "instances": 1,
       "maxLaunchDelaySeconds": 3600,
       "mem": 4096,
       "gpus": 0,
       "networks": [
         {
           "mode": "host"
         }
       ],
       "portDefinitions": [
         {
           "labels": {
             "VIP_0": "/#NODEVIP#:6379"
           },
           "name": "#NODEVIP#",
           "protocol": "tcp",
           "port": 6379
         }
       ],
       "requirePorts": true,
       "upgradeStrategy": {
         "maximumOverCapacity": 0,
         "minimumHealthCapacity": 0.5
       },
       "killSelection": "YOUNGEST_FIRST",
       "unreachableStrategy": {
         "inactiveAfterSeconds": 300,
         "expungeAfterSeconds": 600
       }
     }

    • After updating the highlighted fields, paste the JSON above into the JSON configuration box and click the ‘Review & Run’ button in the corner; this will start the service with the above configuration.
    • Once the above service is up and running, repeat steps 2 to 4 for each Redis node with the respective values for the highlighted fields.
    • So if we go with a 6-node cluster, at the end we will have 6 Redis nodes up and running, like:

    Note: Since we are using an external volume for persistent storage, we cannot scale our services, i.e. each service can have at most one instance. If we try to scale, we will get the error below:

    2. Form the Redis cluster between Redis node services:

    • To create or manage Redis-cluster, first deploy redis-cluster-util container on DCOS using below json config:
    {
     "id": "/infrastructure/redis-cluster-util",
     "backoffFactor": 1.15,
     "backoffSeconds": 1,
     "constraints": [],
     "container": {
       "type": "DOCKER",
       "volumes": [
         {
           "containerPath": "/backup",
           "hostPath": "backups",
           "mode": "RW"
         }
       ],
       "docker": {
         "image": "parvezkazi13/redis-util",
         "forcePullImage": true,
         "privileged": false,
         "parameters": []
       }
     },
     "cpus": 0.25,
     "disk": 0,
     "fetch": [],
     "instances": 1,
     "maxLaunchDelaySeconds": 3600,
     "mem": 4096,
     "gpus": 0,
     "networks": [
       {
         "mode": "host"
       }
     ],
     "portDefinitions": [],
     "requirePorts": true,
     "upgradeStrategy": {
       "maximumOverCapacity": 0,
       "minimumHealthCapacity": 0.5
     },
     "killSelection": "YOUNGEST_FIRST",
     "unreachableStrategy": {
       "inactiveAfterSeconds": 300,
       "expungeAfterSeconds": 600
     },
     "healthChecks": []
    }

    This will run the service as:

    • Get the IP addresses of all Redis nodes to form the cluster, as a Redis cluster cannot be created with a node’s hostname/DNS name. This is an open issue.

    Since we are using host network, we need the dcos node IP on which Redis nodes are running.

    Get all Redis nodes IP using:

    NODE_BASE_NAME=redis-node
    dcos task $NODE_BASE_NAME | grep -E "$NODE_BASE_NAME-[0-9]" | awk '{print $2":6379"}' | paste -s -d' '

    Here redis-node is the prefix used for all Redis nodes.

    Note the output of this command, we will use it in further steps.

    • Get the node where redis-cluster-util container is running and ssh to dcos node using:
    dcos node ssh --master-proxy --private-ip $(dcos task | grep "redis-cluster-util" | awk '{print $2}')

    • Now find the docker container id of redis-cluster-util and exec it using:
    docker exec -it $(docker ps -qf ancestor="parvezkazi13/redis-util") bash  

    • Now we are inside the redis-cluster-util container. Run the command below to form the Redis cluster.
    redis-trib.rb create --replicas 1 <Space separated IP address:PORT pair of all Redis nodes>

    • Here, use the Redis nodes’ IP addresses that we retrieved in the earlier step.
    redis-trib.rb create --replicas 1 10.0.1.90:6379 10.0.0.19:6379 10.0.9.203:6379 10.0.9.79:6379 10.0.3.199:6379 10.0.9.104:6379

    • Parameters:
    • The option --replicas 1 means that we want a slave for every master created.
    • The other arguments are the list of addresses (host:port) of the instances we want to use to create the new cluster.
    • Output:
    • Select ‘yes’ when it prompts to set the slot configuration shown.
    • Run the command below to check the status of the newly created cluster
    redis-trib.rb check <Any redis node host:PORT>
    Ex:
    redis-trib.rb check 10.0.1.90:6379

    • Parameters:
    • host:port of any node from the cluster.
    • Output:
    • If all OK, it will show OK with status, else it will show ERR with the error message.

    3. Import existing dump.rdb to Redis cluster

    • At this point, all the Redis nodes should be empty and each one should have an ID and some assigned slots:

    Before reusing existing dump data, we have to reshard all slots to one instance. We specify the number of slots to move (all, i.e. 16384), the ID of the node we move them to (here node 1 – 10.0.1.90:6379), and where we take these slots from (all other nodes).

    redis-trib.rb reshard 10.0.1.90:6379  

    Parameters:

    host:port of any node from the cluster.

    Output:

    It will prompt for the number of slots to move – here all, i.e. 16384

    Receiving node ID – here the ID of node 10.0.1.90:6379 (redis-node-1)

    Source node IDs – here all, as we want to move all slots to one node.

    And when prompted to proceed, type ‘yes’.

    • Now check again node 10.0.1.90:6379  
    redis-trib.rb check 10.0.1.90:6379  

    Parameters: host:port of any node from the cluster.

    Output: it will show all (16384) slots moved to node 10.0.1.90:6379

    • The next step is importing our existing Redis dump data.

    Now copy the existing dump.rdb to our redis-cluster-util container using the steps below:

    – Copy the existing dump.rdb to the DCOS node on which the redis-cluster-util container is running. You can use scp from any other public server to the DCOS node.

    – Now that we have dump.rdb on our DCOS node, copy this dump.rdb to the redis-cluster-util container using the command below:

    docker cp dump.rdb $(docker ps -qf ancestor="parvezkazi13/redis-util"):/data

    Now we have dump.rdb in our redis-cluster-util container, we can import it to our Redis cluster. Execute and go to the redis-cluster-util container using:

    docker exec -it $(docker ps -qf ancestor="parvezkazi13/redis-util") bash

    This attaches to the already-running redis-cluster-util container and starts a bash shell in it.

    Run below command to import dump.rdb to Redis cluster:

    rdb --command protocol /data/dump.rdb | redis-cli --pipe -h 10.0.1.90 -p 6379

    Parameters:

    Path to dump.rdb

    host:port of any node from the cluster.

    Output:

    If successful, you’ll see something like:

    All data transferred. Waiting for the last reply...
    Last reply received from server.
    errors: 0, replies: 4341259

    as well as this in the Redis server logs:

    95086:M 01 Mar 21:53:42.071 * 10000 changes in 60 seconds. Saving...
    95086:M 01 Mar 21:53:42.072 * Background saving started by pid 98223
    98223:C 01 Mar 21:53:44.277 * DB saved on disk

    WARNING:
    Just as an Oracle DB instance can have multiple databases, Redis saves keys in keyspaces.
    However, when Redis is in cluster mode, it does not accept dumps that have more than one keyspace. As per the documentation:

    “Redis Cluster does not support multiple databases like the stand alone version of Redis. There is just database 0 and the SELECT command is not allowed.”

    So while importing such a multi-keyspace Redis dump, the server fails while starting with the issue below:

    23049:M 16 Mar 17:21:17.772 * DB loaded from disk: 5.222 seconds
    23049:M 16 Mar 17:21:17.772 # You can't have keys in a DB different than DB 0 when in Cluster mode. Exiting.
    Solution / Workaround:

    The redis-cli command “MOVE” moves keys from one keyspace to another.

    Alternatively, run the command below to move all the keys from keyspace 1 to keyspace 0:

    redis-cli -h "$HOST" -p "$PORT" -n 1 --raw keys "*" |  xargs -I{} redis-cli -h "$HOST" -p "$PORT" -n 1 move {} 0

    • Verify the import status using the command below (inside the redis-cluster-util container):
    redis-cli -h 10.0.1.90 -p 6379 info keyspace

    It runs the Redis INFO command on node 10.0.1.90:6379 and fetches keyspace information, like below:

    # Keyspace
    db0:keys=33283,expires=0,avg_ttl=0
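
    If you need just the key count from that output in a script, the keyspace line can be parsed with standard tools. A quick sketch, using the sample line above:

```shell
# Extract the number of keys from a Redis INFO keyspace line
line='db0:keys=33283,expires=0,avg_ttl=0'
keys=$(echo "$line" | sed 's/.*keys=\([0-9]*\).*/\1/')
echo "$keys"   # 33283
```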

    • Now reshard all the slots to all instances evenly

    The reshard command will again list the existing nodes, their IDs and the assigned slots.

    redis-trib.rb reshard 10.0.1.90:6379

    Parameters:

    host:port of any node from the cluster.

    Output:

    It will prompt for the number of slots to move – here 16384 / 3 masters = 5461

    Receiving node ID – here the ID of master node 2

    Source node IDs – the ID of the first instance, which currently holds all the slots (master 1)

    And a prompt to proceed – type 'yes'

    Repeat the above step, and for the receiving node ID, give the ID of master node 3.
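
    The slot arithmetic at the reshard prompt can be sanity-checked in the shell: Redis Cluster always has 16384 hash slots, so an even three-master split gives 5461 slots per reshard, with one slot left over:

```shell
TOTAL_SLOTS=16384   # fixed number of hash slots in Redis Cluster
MASTERS=3
echo $((TOTAL_SLOTS / MASTERS))   # slots to enter at each reshard prompt -> 5461
echo $((TOTAL_SLOTS % MASTERS))   # remainder slot that stays on one master -> 1
```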

    • After the above step, all 3 masters will have equal slots and imported keys will be distributed among the master nodes.
    • Put keys into the cluster for verification

    Without cluster mode, setting a key whose hash slot lives on another instance returns a redirection error:

    redis-cli -h 10.0.1.90 -p 6379 set foo bar
    (error) MOVED 4813 10.0.9.203:6379

    The error shows that the server stores this key on instance 10.0.9.203:6379, so the client is asked to redirect. To follow the redirection automatically, pass the "-c" flag, which enables cluster mode in redis-cli:

    redis-cli -h 10.0.1.90 -p 6379 -c set foo bar
    OK

    Redis Entrypoint

    The application entrypoint for a Redis cluster mostly depends on how your Redis client handles cluster support. Generally, connecting to one of the master nodes should do the work.

    Use the following host:port in your applications:

    redis.marathon.l4lb.thisdcos.directory:6379

    Automation of Redis Cluster Creation

    We have an automation script in place that deploys a 6-node Redis cluster and forms the cluster between the nodes.

    Script location: Github

    • It deploys 6 Marathon apps for 6 Redis nodes. Each Redis node is placed on a different DC/OS node, with CLUSTER_NAME as a prefix to the volume name.
    • Once all nodes are up and running, it deploys the redis-cluster-util app, which will be used to form the Redis cluster.
    • It then prints the Redis nodes and their IP addresses and prompts the user to proceed with cluster creation.
    • If the user chooses to proceed, it runs the redis-cluster-util app and creates the cluster using the collected IP addresses. The util container will prompt for some input that the user has to confirm.
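
    The per-node deployment step can be sketched like this. This is a minimal illustration, not the actual script: the template file, its {{ID}} placeholder, and the app fields are all assumptions.

```shell
CLUSTER_NAME=redis

# Hypothetical Marathon app template with an {{ID}} placeholder
cat > /tmp/redis-node.json.tpl <<'EOF'
{"id": "/{{ID}}", "cpus": 1, "mem": 1024}
EOF

# Render one app definition per Redis node
for i in 1 2 3 4 5 6; do
  sed "s/{{ID}}/$CLUSTER_NAME-$i/g" /tmp/redis-node.json.tpl > /tmp/$CLUSTER_NAME-$i.json
  # each rendered file would then be deployed with:
  #   dcos marathon app add /tmp/$CLUSTER_NAME-$i.json
done

cat /tmp/$CLUSTER_NAME-6.json   # {"id": "/redis-6", "cpus": 1, "mem": 1024}
```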

    Conclusion

    We learned about Redis cluster deployment on DC/OS with persistent storage using RexRay. We also learned how RexRay automatically manages volumes on AWS EBS and how to integrate them into DC/OS apps/services. We saw how to use the redis-cluster-util container to manage a Redis cluster for different purposes, like forming the cluster, resharding, and importing existing dump.rdb data. Finally, we looked at automating the whole cluster setup using the DC/OS CLI and bash.


  • Mesosphere DC/OS Masterclass : Tips and Tricks to Make Life Easier

    DC/OS is an open-source operating system and distributed system for data centers, built on the Apache Mesos distributed systems kernel. As a distributed system, it is a cluster of master nodes and private/public agent nodes, where each node also has a host operating system that manages the underlying machine.

    It enables the management of multiple machines as if they were a single computer. It automates resource management, schedules process placement, facilitates inter-process communication, and simplifies the installation and management of distributed services. Its included web interface and available command-line interface (CLI) facilitate remote management and monitoring of the cluster and its services.

    • Distributed System : DC/OS is a distributed system with a group of private and public agent nodes coordinated by master nodes.
    • Cluster Manager : DC/OS is responsible for running tasks on agent nodes and providing the required resources to them. DC/OS uses Apache Mesos to provide cluster management functionality.
    • Container Platform : All DC/OS tasks are containerized. DC/OS uses two different container runtimes, Docker and Mesos, so containers can be started from Docker images, or they can be native executables (binaries or scripts) that are containerized at runtime by Mesos.
    • Operating System : As the name suggests, DC/OS is an operating system which abstracts cluster hardware and software resources and provides common services to applications.

    Unlike Linux, DC/OS is not a host operating system. DC/OS spans multiple machines, but relies on each machine to have its own host operating system and host kernel.

    The high level architecture of DC/OS can be seen below :

    For the detailed architecture and components of DC/OS, please click here.

    Adoption and usage of Mesosphere DC/OS:

    Mesosphere customers include :

    • 30% of the Fortune 50 U.S. Companies
    • 5 of the top 10 North American Banks
    • 7 of the top 12 Worldwide Telcos
    • 5 of the top 10 Highest Valued Startups

    Some companies using DC/OS are :

    • Cisco
    • Yelp
    • Tommy Hilfiger
    • Uber
    • Netflix
    • Verizon
    • Cerner
    • NIO

    Installing and using DC/OS

    A guide to installing DC/OS can be found here. After installing DC/OS on any platform, install dcos cli by following documentation found here.

    Using the dcos cli, we can manage cluster nodes, manage Marathon tasks and services, and install/remove packages from the universe. It also provides great support for automation, as each CLI command can output JSON.

    NOTE: The tasks below were executed and tested with the following tools:

    • DC/OS 1.11 Open Source
    • DC/OS cli 0.6.0
    • jq:1.5-1-a5b5cbe

    DC/OS commands and scripts

    Setup DC/OS cli with DC/OS cluster

    dcos cluster setup <CLUSTER URL>

    Example :

    dcos cluster setup http://dcos-cluster.com

    The above command will give you a link for OAuth authentication and prompt for an auth token. You can authenticate yourself with a Google, Github or Microsoft account. Paste the token generated after authentication into the CLI prompt (provided OAuth is enabled).

    DC/OS authentication token

    dcos config show core.dcos_acs_token

    DC/OS cluster url

    dcos config show core.dcos_url

    DC/OS cluster name

    dcos config show cluster.name

    Access Mesos UI

    <DC/OS_CLUSTER_URL>/mesos

    Example:

    http://dcos-cluster.com/mesos

    Access Marathon UI

    <DC/OS_CLUSTER_URL>/service/marathon

    Example:

    http://dcos-cluster.com/service/marathon

    Access any DC/OS service, like Marathon, Kafka, Elastic, Spark etc.[DC/OS Services]

    <DC/OS_CLUSTER_URL>/service/<SERVICE_NAME>

    Example:

    http://dcos-cluster.com/service/marathon
    http://dcos-cluster.com/service/kafka

    Access DC/OS slaves info in json using Mesos API [Mesos Endpoints]

    curl -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" $(dcos config show core.dcos_url)/mesos/slaves | jq

    Access DC/OS slaves info in json using DC/OS cli

    dcos node --json

    Note : The DC/OS cli command 'dcos node --json' is equivalent to calling the Mesos slaves endpoint (/mesos/slaves).

    Access DC/OS private slaves info using DC/OS cli

    dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip == null) | "Private Agent : " + .hostname ' -r

    Access DC/OS public slaves info using DC/OS cli

    dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip != null) | "Public Agent : " + .hostname ' -r

    Access DC/OS private and public slaves info using DC/OS cli

    dcos node --json | jq '.[] | select(.type | contains("agent")) | if (.attributes.public_ip != null) then "Public Agent : " else "Private Agent : " end + " - " + .hostname ' -r | sort
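
    To see what these filters do without a live cluster, you can feed them a hand-written sample of the JSON shape that `dcos node --json` returns. The sample below is trimmed to just the fields the filters use:

```shell
# Simplified sample of the `dcos node --json` output shape
cat > /tmp/nodes.json <<'EOF'
[
  {"type": "master (leader)", "ip": "10.0.7.1", "hostname": "10.0.7.1", "attributes": {}},
  {"type": "agent", "hostname": "10.0.1.90", "attributes": {}},
  {"type": "agent", "hostname": "10.0.9.203", "attributes": {"public_ip": "true"}}
]
EOF

# Same filter as above, read from the file instead of the live cluster
jq -r '.[] | select(.type | contains("agent")) | if (.attributes.public_ip != null) then "Public Agent : " else "Private Agent : " end + " - " + .hostname' /tmp/nodes.json | sort
```

    This prints one line per agent, with public and private agents labeled accordingly.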

    Get public IP of all public agents

    #!/bin/bash
    
    for id in $(dcos node --json | jq --raw-output '.[] | select(.attributes.public_ip == "true") | .id'); 
    do 
          dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --mesos-id=$id "curl -s ifconfig.co"
    done 2>/dev/null

    Note: 'dcos node ssh' requires the private key to be added to SSH. Make sure you add your private key as an SSH identity using:

    ssh-add </path/to/private/key/file/.pem>

    Get public IP of master leader

    dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "curl -s ifconfig.co" 2>/dev/null

    Get all master nodes and their private ip

    dcos node --json | jq '.[] | select(.type | contains("master")) | .ip + " = " + .type' -r

    Get list of all users who have access to DC/OS cluster

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users" | jq '.array[].uid' -r

    Add users to cluster using Mesosphere script (Run this on master)

    Users to add are given in list.txt, one user per line

    for i in `cat list.txt`; do echo $i;
    sudo -i dcos-shell /opt/mesosphere/bin/dcos_add_user.py $i; done

    Add users to cluster using DC/OS API

    #!/bin/bash
    
    # Usage : dcosAddUsers.sh (users to add are given in users.list, one user per line)
    for i in `cat users.list`; 
    do 
      echo $i
      curl -X PUT -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
    done

    Delete users from DC/OS cluster organization

    #!/bin/bash
    
    # Usage : dcosDeleteUsers.sh (users to delete are given in users.list, one user per line)
    
    for i in `cat users.list`; 
    do 
      echo $i
      curl -X DELETE -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
    done

    Offers/resources from individual DC/OS agent

    In recent versions of many DC/OS services, a scheduler endpoint at

    http://yourcluster.com/service/<service-name>/v1/debug/offers

    will display an HTML table containing a summary of recently-evaluated offers. This table's contents are currently very similar to what can be found in the logs, but in a slightly more accessible format. Alternatively, we can look at the scheduler's logs on stdout. An offer is a set of resources, all from one individual DC/OS agent.

    <DC/OS_CLUSTER_URL>/service/<service_name>/v1/debug/offers

    Example:

    http://dcos-cluster.com/service/kafka/v1/debug/offers
    http://dcos-cluster.com/service/elastic/v1/debug/offers

    Save JSON configs of all running Marathon apps

    #!/bin/bash
    
    # Save marathon configs in json format for all marathon apps
    # Usage : saveMarathonConfig.sh
    
    for service in `dcos marathon app list --quiet | tr -d "/" | sort`; do
      dcos marathon app show $service | jq '. | del(.tasks, .version, .versionInfo, .tasksHealthy, .tasksRunning, .tasksStaged, .tasksUnhealthy, .deployments, .executor, .lastTaskFailure, .args, .ports, .residency, .secrets, .storeUrls, .uris, .user)' >& $service.json
    done
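
    The jq del() call above simply strips runtime fields so that only the deployable configuration remains. A quick local illustration (the sample app JSON is a made-up minimal shape):

```shell
# Minimal stand-in for `dcos marathon app show` output
cat > /tmp/app.json <<'EOF'
{"id": "/redis", "cpus": 1, "mem": 1024, "tasks": [{"id": "t1"}], "version": "2018-01-01", "tasksRunning": 1}
EOF

# Runtime fields are removed; only the reusable config survives
jq 'del(.tasks, .version, .tasksRunning)' /tmp/app.json
```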

    Get report of Marathon apps with details like container type, Docker image, tag or service version used by Marathon app.

    #!/bin/bash
    
    TMP_CSV_FILE=$(mktemp /tmp/dcos-config.XXXXXX.csv)
    TMP_CSV_FILE_SORT="${TMP_CSV_FILE}_sort"
    #dcos marathon app list --json | jq '.[] | if (.container.docker.image != null ) then .id + ",Docker Application," + .container.docker.image else .id + ",DCOS Service," + .labels.DCOS_PACKAGE_VERSION end' -r > $TMP_CSV_FILE
    dcos marathon app list --json | jq '.[] | .id + if (.container.type == "DOCKER") then ",Docker Container," + .container.docker.image else ",Mesos Container," + if(.labels.DCOS_PACKAGE_VERSION !=null) then .labels.DCOS_PACKAGE_NAME+":"+.labels.DCOS_PACKAGE_VERSION  else "[ CMD ]" end end' -r > $TMP_CSV_FILE
    sed -i "s|^/||g" $TMP_CSV_FILE
    sort -t "," -k2,2 -k3,3 -k1,1 $TMP_CSV_FILE > ${TMP_CSV_FILE_SORT}
    cnt=1
    printf '%.0s=' {1..150}
    printf "\n  %-5s%-35s%-23s%-40s%-20s\n" "No" "Application Name" "Container Type" "Docker Image" "Tag / Version"
    printf '%.0s=' {1..150}
    while IFS=, read -r app typ image; 
    do
            tag=`echo $image | awk -F':' -v im="$image" '{tag=(im=="[ CMD ]")?"NA":($2=="")?"latest":$2; print tag}'`
            image=`echo $image | awk -F':' '{print $1}'`
            printf "\n  %-5s%-35s%-23s%-40s%-20s" "$cnt" "$app" "$typ" "$image" "$tag"
            cnt=$((cnt + 1))
            sleep 0.3
    done < $TMP_CSV_FILE_SORT
    printf "\n"
    printf '%.0s=' {1..150}
    printf "\n"
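
    The tag extraction inside the loop is the trickiest line, and it can be checked in isolation: for a Docker image string it returns the explicit tag, defaults to "latest" when no tag is given, and returns "NA" for plain command apps. A small wrapper function makes this easy to try:

```shell
# Same awk expression as in the report script, wrapped for testing
get_tag() {
  echo "$1" | awk -F':' -v im="$1" '{tag=(im=="[ CMD ]")?"NA":($2=="")?"latest":$2; print tag}'
}

get_tag "nginx:1.19"   # -> 1.19   (explicit tag)
get_tag "redis"        # -> latest (no tag given)
get_tag "[ CMD ]"      # -> NA     (plain command app)
```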

    Get DC/OS nodes with more information like node type, node ip, attributes, number of running tasks, free memory, free cpu etc.

    #!/bin/bash
    
    printf "\n  %-15s %-18s%-18s%-10s%-15s%-10s\n" "Node Type" "Node IP" "Attribute" "Tasks" "Mem Free (MB)" "CPU Free"
    printf '%.0s=' {1..90}
    printf "\n"
    TAB=`echo -e "\t"`
    dcos node --json | jq '.[] | if (.type | contains("leader")) then "Master (leader)" elif ((.type | contains("agent")) and .attributes.public_ip != null) then "Public Agent" elif ((.type | contains("agent")) and .attributes.public_ip == null) then "Private Agent" else empty end + "\t"+ if(.type |contains("master")) then .ip else .hostname end + "\t" +  (if (.attributes | length !=0) then (.attributes | to_entries[] | join(" = ")) else "NA" end) + "\t" + if(.type |contains("agent")) then (.TASK_RUNNING|tostring) + "\t" + ((.resources.mem - .used_resources.mem)| tostring) + "\t\t" +  ((.resources.cpus - .used_resources.cpus)| tostring)  else "\t\tNA\tNA\t\tNA"  end' -r | sort -t"$TAB" -k1,1d -k3,3d -k2,2d
    printf '%.0s=' {1..90}
    printf "\n"

    Framework Cleaner

    Uninstall a framework and clean up its reserved resources after the framework is deleted/uninstalled. (Applicable if running DC/OS 1.9 or older; on 1.10 and newer, the uninstall CLI alone is sufficient.)

    SERVICE_NAME=
    dcos package uninstall $SERVICE_NAME
    dcos node ssh --option StrictHostKeyChecking=no --master-proxy --leader "docker run mesosphere/janitor /janitor.py -r ${SERVICE_NAME}-role -p ${SERVICE_NAME}-principal -z dcos-service-${SERVICE_NAME}"

    Get DC/OS apps and their placement constraints

    dcos marathon app list --json | jq '.[] | if (.constraints != null) then .id, .constraints else empty end'

    Run shell command on all slaves

    #!/bin/bash
    
    # Run any shell command on all slave nodes (private and public)
    
    # Usage : dcosRunOnAllSlaves.sh <CMD= any shell command to run, Ex: ulimit -a >
    CMD=$1
    for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'`; do 
       echo -e "\n###> Running command [ $CMD ] on $i"
       dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
       echo -e "======================================\n"
    done
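
    The egrep/awk pair above just filters the tabular `dcos node` output down to agent hostnames. Against a saved sample of that output (the column layout and IDs below are assumptions for illustration), it behaves like this:

```shell
# Hand-written sample of `dcos node` tabular output
cat > /tmp/dcos-node.txt <<'EOF'
   HOSTNAME        IP           ID        TYPE
   10.0.1.90       10.0.1.90    abcd-S1   agent
   10.0.9.203      10.0.9.203   abcd-S2   agent
   10.0.7.1        10.0.7.1     abcd      master (leader)
EOF

# Drops the header (contains TYPE) and the master line, keeps agent hostnames
egrep -v "TYPE|master" /tmp/dcos-node.txt | awk '{print $1}'
```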

    Run shell command on master leader

    CMD="<shell command, Ex: ulimit -a>"
    dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "$CMD"

    Run shell command on all master nodes

    #!/bin/bash
    
    # Run any shell command on all master nodes
    
    # Usage : dcosRunOnAllMasters.sh <CMD= any shell command to run, Ex: ulimit -a >
    CMD=$1
    for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'` 
    do 
      echo -e "\n###> Running command [ $CMD ] on $i"
      dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
      echo -e "======================================\n"
    done

    Add node attributes to dcos nodes and run apps on nodes with required attributes using placement constraints

    #!/bin/bash
    
    #1. SSH on node 
    #2. Create or edit file /var/lib/dcos/mesos-slave-common
    #3. Add contents as :
    #    MESOS_ATTRIBUTES=<key>:<value>
    #    Example:
    #    MESOS_ATTRIBUTES=TYPE:DB;DB_TYPE:MONGO;
    #4. Stop dcos-mesos-slave service
    #    systemctl stop dcos-mesos-slave
    #5. Remove link for latest slave metadata
    #    rm -f /var/lib/mesos/slave/meta/slaves/latest
    #6. Start dcos-mesos-slave service
    #    systemctl start dcos-mesos-slave
    #7. Wait for some time, node will be in HEALTHY state again.
    #8. Add app placement constraint with field = key and value = value
    #9. Verify attributes, run on any node
    #    curl -s http://leader.mesos:5050/state | jq '.slaves[]| .hostname ,.attributes'
    #    OR Check DCOS cluster UI
    #    Nodes => Select any Node => Details Tab
    
    tmpScript=$(mktemp "/tmp/addDcosNodeAttributes-XXXXXXXX")
    
    # key:value paired attributes, separated by ;
    ATTRIBUTES=NODE_TYPE:GPU_NODE
    
    cat <<EOF > ${tmpScript}
    echo "MESOS_ATTRIBUTES=${ATTRIBUTES}" | sudo tee /var/lib/dcos/mesos-slave-common
    sudo systemctl stop dcos-mesos-slave
    sudo rm -f /var/lib/mesos/slave/meta/slaves/latest
    sudo systemctl start dcos-mesos-slave
    EOF
    
    # Add the private ip of nodes on which you want to add attributes, one ip per line.
    for i in `cat nodes.txt`; do 
        echo $i
        dcos node ssh --master-proxy --option StrictHostKeyChecking=no --private-ip $i <$tmpScript
        sleep 10
    done
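
    Step 8 above pairs the node attribute with a Marathon placement constraint. A minimal app definition using the NODE_TYPE attribute set by this script might look like the following sketch (the app id, command, and resource numbers are placeholders):

```shell
# Marathon app pinned to nodes carrying the NODE_TYPE:GPU_NODE attribute
cat > gpu-app.json <<'EOF'
{
  "id": "/gpu-app",
  "cmd": "sleep 3600",
  "cpus": 0.1,
  "mem": 128,
  "instances": 1,
  "constraints": [["NODE_TYPE", "CLUSTER", "GPU_NODE"]]
}
EOF
# Deploy it with: dcos marathon app add gpu-app.json
```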

    Install DC/OS Datadog metrics plugin on all DC/OS nodes

    #!/bin/bash
    
    # Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>
    
    DDAPI=$1
    
    if [[ -z $DDAPI ]]; then
        echo "[Datadog Plugin] Need datadog API key as parameter."
        echo "[Datadog Plugin] Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>."
        exit 1
    fi
    tmpScriptMaster=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")
    tmpScriptAgent=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")
    
    declare agent=$tmpScriptAgent
    declare master=$tmpScriptMaster
    
    for role in "agent" "master"
    do
    cat <<EOF > ${!role}
    curl -s -o /opt/mesosphere/bin/dcos-metrics-datadog -L https://downloads.mesosphere.io/dcos-metrics/plugins/datadog
    chmod +x /opt/mesosphere/bin/dcos-metrics-datadog
    echo "[Datadog Plugin] Downloaded dcos datadog metrics plugin."
    export DD_API_KEY=$DDAPI
    export AGENT_ROLE=$role
    sudo curl -s -o /etc/systemd/system/dcos-metrics-datadog.service https://downloads.mesosphere.io/dcos-metrics/plugins/datadog.service
    echo "[Datadog Plugin] Downloaded dcos-metrics-datadog.service."
    sudo sed -i "s/--dcos-role master/--dcos-role \$AGENT_ROLE/g;s/--datadog-key .*/--datadog-key \$DD_API_KEY/g" /etc/systemd/system/dcos-metrics-datadog.service
    echo "[Datadog Plugin] Updated dcos-metrics-datadog.service with DD API Key and agent role."
    sudo systemctl daemon-reload
    sudo systemctl start dcos-metrics-datadog.service
    echo "[Datadog Plugin] dcos-metrics-datadog.service is started !"
    servStatus=\$(sudo systemctl is-failed dcos-metrics-datadog.service)
    echo "[Datadog Plugin] dcos-metrics-datadog.service status : \${servStatus}"
    #sudo systemctl status dcos-metrics-datadog.service | head -3
    #sudo journalctl -u dcos-metrics-datadog
    EOF
    done
    
    echo "[Datadog Plugin] Temp script for master saved at : $tmpScriptMaster"
    echo "[Datadog Plugin] Temp script for agent saved at : $tmpScriptAgent"
    
    for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'` 
    do 
        echo -e "\n###> Node - $i"
        dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptAgent
        echo -e "======================================================="
    done
    
    for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'` 
    do 
        echo -e "\n###> Master Node - $i"
        dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptMaster
        echo -e "======================================================="
    done
    
    # Check status of dcos-metrics-datadog.service on all nodes.
    #for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'` ; do  echo -e "\n###> $i"; dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "sudo systemctl is-failed dcos-metrics-datadog.service"; echo -e "======================================\n"; done
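
    The sed call in the generated script rewrites the downloaded unit file in place. Its effect can be seen on a stand-in file; the ExecStart line below is a guess at the unit file's shape, and only the two rewritten flags matter:

```shell
# Stand-in for /etc/systemd/system/dcos-metrics-datadog.service (assumed shape)
echo 'ExecStart=/opt/mesosphere/bin/dcos-metrics-datadog --dcos-role master --datadog-key CHANGEME' > /tmp/dd.service

AGENT_ROLE=agent
DD_API_KEY=abc123
# Same substitution as in the install script: swap in the role and the API key
sed -i "s/--dcos-role master/--dcos-role $AGENT_ROLE/g;s/--datadog-key .*/--datadog-key $DD_API_KEY/g" /tmp/dd.service

cat /tmp/dd.service
```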

    Get app / node metrics fetched by dcos-metrics component using metrics API

    • Get DC/OS node id [dcos node]
    • Get node metrics (CPU, memory, local filesystems, networks, etc.) : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/node
    • Get the IDs of all containers running on that agent : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers
    • Get resource allocation and usage for a given container ID : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>
    • Get application-level metrics from the container (shipped in StatsD format using the listener available at STATSD_UDP_HOST and STATSD_UDP_PORT) : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>/app

    Get app / node metrics fetched by dcos-metrics component using dcos cli

    • Summary of container metrics for a specific task
    dcos task metrics summary <task-id>

    • All metrics in details for a specific task
    dcos task metrics details <task-id>

    • Summary of node metrics for a specific node
    dcos node metrics summary <mesos-node-id>

    • All Node metrics in details for a specific node
    dcos node metrics details <mesos-node-id>

    NOTE – All the above commands accept a '--json' flag for programmatic use.

    Launch / run command inside container for a task

    The DC/OS task exec cli only supports Mesos containers; this script supports both Mesos and Docker containers.

    #!/bin/bash
    
    echo "DCOS Task Exec 2.0"
    if [ "$#" -eq 0 ]; then
            echo "Need task name or id as input. Exiting."
            exit 1
    fi
    taskName=$1
    taskCmd=${2:-bash}
    TMP_TASKLIST_JSON=/tmp/dcostasklist.json
    dcos task --json > $TMP_TASKLIST_JSON
    taskExist=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '.[] | if(.name == $tname ) then .name else empty end' -r | wc -l`
    if [[ $taskExist -eq 0 ]]; then 
            echo "No task with name $taskName exists."
            echo "Do you mean ?"
            dcos task | grep $taskName | awk '{print $1}'
            exit 1
    fi
    taskType=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '[.[] | select(.name == $tname)][0] | .container.type' -r`
    TaskId=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '[.[] | select(.name == $tname)][0] | .id' -r`
    if [[ $taskExist -ne 1 ]]; then
            echo -e "More than one instance exists. Please select a task ID for executing the command.\n"
            #allTaskIds=$(dcos task $taskName | tee /dev/tty | grep -v "NAME" | awk '{print $5}' | paste -s -d",")
            echo ""
            read TaskId
    fi
    if [[ $taskType !=  "DOCKER" ]]; then
            echo "Task [ $taskName ] is of type MESOS Container."
            execCmd="dcos task exec --interactive --tty $TaskId $taskCmd"
            echo "Running [$execCmd]"
            $execCmd
    else
            echo "Task [ $taskName ] is of type DOCKER Container."
            taskNodeIP=`dcos task $TaskId | awk 'FNR == 2 {print $2}'`
            echo "Task [ $taskName ] with task Id [ $TaskId ] is running on node [ $taskNodeIP ]."
            taskContID=`dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --private-ip=$taskNodeIP --master-proxy "docker ps -q --filter label=MESOS_TASK_ID=$TaskId" 2> /dev/null`
            taskContID=`echo $taskContID | tr -d '\r'`
            echo "Task Docker Container ID : [ $taskContID ]"
            echo "Running [ docker exec -it $taskContID $taskCmd ]"
            dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --private-ip=$taskNodeIP --master-proxy "docker exec -it $taskContID $taskCmd" 2>/dev/null
    fi
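
    The container-type lookup that drives the Mesos/Docker branch above can be exercised against a saved sample of `dcos task --json` output (the shape below is trimmed to just the fields the script reads, and the task names are made up):

```shell
# Trimmed sample of `dcos task --json` output
cat > /tmp/dcostasklist.json <<'EOF'
[
  {"name": "redis", "id": "redis.instance-1", "container": {"type": "DOCKER"}},
  {"name": "kafka", "id": "kafka.instance-1", "container": {"type": "MESOS"}}
]
EOF

taskName=redis
# Same jq lookup as in the script: first task matching the name, then its container type
jq -r --arg tname "$taskName" '[.[] | select(.name == $tname)][0] | .container.type' /tmp/dcostasklist.json   # -> DOCKER
```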

    Get DC/OS tasks by node

    #!/bin/bash 
    
    function tasksByNodeAPI
    {
        echo "DC/OS Tasks By Node"
        if [ "$#" -eq 0 ]; then
            echo "Need node ip as input. Exiting."
            exit 1
        fi
        nodeIp=$1
        mesosId=`dcos node | grep $nodeIp | awk '{print $3}'`
        if [ -z "$mesosId" ]; then
            echo "No node found with ip $nodeIp. Exiting."
            exit 1
        fi
        curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/mesos/tasks?limit=10000" | jq --arg mesosId $mesosId '.tasks[] | select (.slave_id == $mesosId and .state == "TASK_RUNNING") | .name + "\t\t\t" + .id'  -r
    }
    
    function tasksByNodeCLI
    {
            echo "DC/OS Tasks By Node"
            if [ "$#" -eq 0 ]; then
                    echo "Need node ip as input. Exiting."
                    exit 1
            fi
            nodeIp=$1
            dcos task | egrep "HOST|$nodeIp"
    }

    Get cluster metadata – cluster Public IP and cluster ID

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" $(dcos config show core.dcos_url)/metadata

    Sample Output:

    {
    "PUBLIC_IPV4": "123.456.789.012",
    "CLUSTER_ID": "abcde-abcde-abcde-abcde-abcde-abcde"
    }

    Get DC/OS metadata – DC/OS version

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" $(dcos config show core.dcos_url)/dcos-metadata/dcos-version.json

    Sample Output:

    {
    "version": "1.11.0",
    "dcos-image-commit": "b6d6ad4722600877fde2860122f870031d109da3",
    "bootstrap-id": "a0654657903fb68dff60f6e522a7f241c1bfbf0f"
    }

    Get Mesos version

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" $(dcos config show core.dcos_url)/mesos/version

    Sample Output:

    {
    "build_date": "2018-02-27 21:31:27",
    "build_time": 1519767087.0,
    "build_user": "",
    "git_sha": "0ba40f86759307cefab1c8702724debe87007bb0",
    "version": "1.5.0"
    }

    Access DC/OS cluster exhibitor UI (Exhibitor supervises ZooKeeper and provides a management web interface)

    <CLUSTER_URL>/exhibitor

    Access DC/OS cluster data from cluster zookeeper using Zookeeper Python client – Run inside any node / container

    from kazoo.client import KazooClient
    
    zk = KazooClient(hosts='leader.mesos:2181', read_only=True)
    zk.start()
    
    clusterId = ""
    # Here we can give znode path to retrieve its decoded data,
    # for ex to get cluster-id, use
    # data, stat = zk.get("/cluster-id")
    # clusterId = data.decode("utf-8")
    
    # Get cluster Id
    if zk.exists("/cluster-id"):
        data, stat = zk.get("/cluster-id")
        clusterId = data.decode("utf-8")
    
    zk.stop()
    
    print (clusterId)

    Access DC/OS cluster data from the cluster ZooKeeper using the Exhibitor REST API

    # Get znode data using endpoint :
    # /exhibitor/exhibitor/v1/explorer/node-data?key=/path/to/node
    # Example : Get znode data for path = /cluster-id
    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/exhibitor/exhibitor/v1/explorer/node-data?key=/cluster-id"

    Sample Output:

    {
    "bytes": "3333-XXXXXX",
    "str": "abcde-abcde-abcde-abcde-abcde-",
    "stat": "XXXXXX"
    }

    Get cluster name using Mesos API

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" $(dcos config show core.dcos_url)/mesos/state-summary | jq .cluster -r

    Mark Mesos node as decommissioned

    Sometimes, instances running as DC/OS nodes get terminated and cannot come back online. For example, AWS EC2 instances, once terminated for any reason, cannot be started again. When Mesos detects that a node has stopped, it puts the node in the UNREACHABLE state, because Mesos does not know whether the node is temporarily stopped and will come back online, or whether it is permanently gone. In such a case, we can explicitly tell Mesos to put a node in the GONE state if we know the node will not come back.

    dcos node decommission <mesos-agent-id>

    Conclusion

    We learned about Mesosphere DC/OS, its functionality and roles. We also learned how to set up and use the DC/OS cli, use HTTP authentication to access DC/OS APIs, and use the DC/OS cli for automating tasks.

    We went through different API endpoints like Mesos, Marathon, DC/OS metrics, exhibitor, DC/OS cluster organization etc. Finally, we looked at different tricks and scripts to automate DC/OS, like DC/OS node details, task exec, Docker report, DC/OS API http authentication etc.

  • Enable Real-time Functionality in Your App with GraphQL and Pusher

    The most recognized solution for real-time problems is WebSockets (WS), where there is a persistent connection between the client and the server, and either can start sending data at any time. One of the latest implementations of WS is GraphQL subscriptions.

    With GraphQL subscriptions, you can easily add real-time functionality to your application. There is an easy and standard way to implement a subscription in a GraphQL app: the client makes a subscription query to the server, specifying the event and the data shape. With this query, the client establishes a long-lived connection with the server, on which it listens for specific events. Just as GraphQL solves the over-fetching problem of REST APIs, subscriptions extend that solution to real-time data.

    In this post, we will learn how to bring real-time functionality to your app by implementing GraphQL subscriptions with Pusher to manage Pub/Sub capabilities. The goal is to configure a Pusher channel and implement two subscriptions to be exposed by your GraphQL server. We will be implementing this in a Node.js runtime environment.

    Why Pusher?

    Why are we doing this using Pusher? 

    • Pusher, being a hosted real-time services provider, relieves us from managing our own real-time infrastructure, which is a highly complex problem.
    • Pusher provides an easy and consistent API.
    • Pusher also provides an entire set of tools to monitor and debug your realtime events.
    • Events can be triggered by and consumed easily from different applications written in different frameworks.

    Project Setup

    We will start with a repository that contains a codebase for a simple GraphQL backend in Node.js, which is a minimal representation of a blog post application. The entities included are:

    1. Link – Represents a URL and a small description for the link
    2. User – A Link belongs to a User
    3. Vote – Represents a user's vote for a Link

    In this application, a User can sign up and add or vote for a Link, and other users can upvote it. The database schema is built using Prisma and SQLite for quick bootstrapping. On the backend, we use graphql-yoga as the GraphQL server implementation. To test our GraphQL backend, we will use graphql-playground by Prisma as a client, which will perform all queries and mutations on the server.

    To set up the application:

    1. Clone the repository here
    2. Install all dependencies using
    npm install

    3. Set up the database using prisma-cli with the following commands:
    npx prisma migrate save --experimental
    # Select 'yes' at the prompt to add an SQLite db after this command, and enter a name for the migration.
    npx prisma migrate up --experimental
    npx prisma generate

    Note: Migrations are experimental features of the Prisma ORM, but you can ignore them because you can have a different backend setup for DB interactions. The purpose of using Prisma here is to quickly set up the project and dive into subscriptions.

    A new directory named prisma will be created, containing the schema and the SQLite database. Your database and app are now set up and ready to use.

    To start the Node.js application, execute the command:

    npm start

    Navigate to http://localhost:4000 to see the graphql-playground where we will execute our queries and mutations.

    Our next task is to add a GraphQL subscription to our server to allow clients to listen to the following events:

    • A new Link is created
    • A Link is upvoted

    To add subscriptions, we will use an npm package called graphql-pusher-subscriptions, which helps us interact with the Pusher service from within the GraphQL resolvers. The module triggers events on, and listens for events from, a channel on the Pusher service.

    Before that, let’s first create a channel in Pusher. To configure a Pusher channel, head to the Pusher website and create an account. Then, go to your dashboard and create a channels application. Choose a name, the cluster closest to your location, React as the frontend tech, and Node.js as the backend tech.

    Pusher will then show you some getting-started code for your chosen stack.

    Now, we add the graphql-pusher-subscriptions package via npm install graphql-pusher-subscriptions. This package takes the Pusher channel configuration and gives you an API to trigger and listen for events published on the channel.

    Now, we import the package in the src/index.js file.

    const { PusherChannel } = require('graphql-pusher-subscriptions');

    The PusherChannel class provided by the module accepts a configuration for the channel. We instantiate the class, passing the Pusher config object we received when creating the channels app, and keep a reference to the resulting pubsub object.

    const pubsub = new PusherChannel({
      appId: '<YOUR_APP_ID>',
      key: '<YOUR_APP_KEY>',
      secret: '<YOUR_APP_SECRET>',
      cluster: 'ap2',
      encrypted: true,
      channel: 'graphql-subscription'
    });
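    Hard-coding credentials is fine for a local experiment, but in practice you would typically load them from environment variables. A minimal sketch (the variable names here are illustrative, not required by the library):

```javascript
// Sketch: build the Pusher channel configuration from environment
// variables instead of hard-coding credentials (names are illustrative).
function buildPusherConfig(env = process.env) {
  return {
    appId: env.PUSHER_APP_ID,
    key: env.PUSHER_KEY,
    secret: env.PUSHER_SECRET,
    cluster: env.PUSHER_CLUSTER || 'ap2',            // default to the cluster used above
    encrypted: true,
    channel: env.PUSHER_CHANNEL || 'graphql-subscription',
  };
}

module.exports = { buildPusherConfig };
```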

    Now, we add “pubsub” to the context so that it is available to all the resolvers. The channel field tells the client which channel to subscribe to. Here we have the channel “graphql-subscription”.

    const server = new GraphQLServer({
      typeDefs: './src/schema.graphql',
      resolvers,
      context: request => {
        return {
          ...request,
          prisma,
          pubsub
        }
      },
    })

    The above part enables us to access the methods we need to implement our subscriptions from inside our resolvers via context.pubsub.

    Subscribing to Link-created Event

    The first step to add a subscription is to extend the GraphQL schema definition.

    type Subscription {
      newLink: Link
    }

    Next, we implement the resolver for the newLink subscription field. It is important to note that subscription resolvers differ from those for queries and mutations in a couple of ways:

    1. They return an AsyncIterator instead of data, which the GraphQL server then uses to push the event payload to subscribed clients.

    2. The subscription resolver is provided as the value of a subscribe field inside an object. That object should also contain a field named resolve, which extracts the payload from the data emitted by the AsyncIterator.
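    In code, that general shape looks like this (an illustrative sketch with a placeholder event name, not tied to any schema field):

```javascript
// General shape of a subscription resolver (illustrative sketch).
const exampleSubscription = {
  // subscribe returns an AsyncIterator bound to a named event
  subscribe: (parent, args, context, info) =>
    context.pubsub.asyncIterator('EXAMPLE_EVENT'),
  // resolve maps the raw emitted event data to the payload sent to clients
  resolve: payload => payload,
};

module.exports = { exampleSubscription };
```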

    To add the resolvers for the subscription, we start by adding a new file called Subscription.js.

    Inside the project directory, add the file as src/resolvers/Subscription.js.

    In this new file, add the following code, which is the subscription resolver for the newLink field we created in the GraphQL schema.

    function newLinkSubscribe(parent, args, context, info) {
      return context.pubsub.asyncIterator("NEW_LINK")
    }
    
    const newLink = {
      subscribe: newLinkSubscribe,
      resolve: payload => {
        return payload
      },
    }
    
    module.exports = {
      newLink,
    }

    In the code above, the subscription resolver function, newLinkSubscribe, is assigned to the subscribe property, just as described before. The context provides a reference to the pubsub object, on which we call asyncIterator() with "NEW_LINK" as the event name; the returned iterator emits an item each time that event is published.
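    To build intuition for what asyncIterator() and publish() do together, here is a minimal in-memory sketch of that contract (this is not the internals of graphql-pusher-subscriptions, which routes events through the Pusher service instead of local memory):

```javascript
// Minimal in-memory sketch of the publish/asyncIterator contract.
// Note: events published while no next() is pending are simply dropped.
class InMemoryPubSub {
  constructor() {
    this.waiting = new Map(); // event name -> pending next() resolvers
  }

  publish(eventName, payload) {
    const resolvers = this.waiting.get(eventName) || [];
    this.waiting.set(eventName, []);
    // every pending next() call receives this payload
    resolvers.forEach(resolve => resolve({ value: payload, done: false }));
  }

  asyncIterator(eventName) {
    const waiting = this.waiting;
    return {
      next() {
        return new Promise(resolve => {
          const pending = waiting.get(eventName) || [];
          pending.push(resolve);
          waiting.set(eventName, pending);
        });
      },
      [Symbol.asyncIterator]() { return this; },
    };
  }
}

// Usage: a subscriber awaits next(), then a publish resolves it.
const demo = new InMemoryPubSub();
const iterator = demo.asyncIterator('NEW_LINK');
const pendingNext = iterator.next();
demo.publish('NEW_LINK', { url: 'http://velotio.com' });
pendingNext.then(({ value }) => console.log(value.url)); // prints the published url
```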

    Adding Subscriptions to Your Resolvers

    The final step of our subscription implementation is to publish the event from inside a mutation resolver. We add the following call to pubsub.publish() inside the post resolver function in the Mutation.js file.

    async function post(parent, args, context, info) {
      const userId = getUserId(context)
      const newLink = await context.prisma.link.create({
        data: {
          url: args.url,
          description: args.description,
          postedBy: { connect: { id: userId } },
        }
      })
      context.pubsub.publish("NEW_LINK", newLink)
      return newLink
    }

    In the code above, we pass the same string, "NEW_LINK", to the publish method that we used in the newLinkSubscribe function earlier. "NEW_LINK" is the event name: publish sends the event to the Pusher service, and the subscription resolver binds to that same name. The second argument, newLink, carries the data for the published event. Note that context.pubsub.publish is called before returning the newLink data.

    Now, we will update the main resolver object, which is given to the GraphQL server.

    First, import the subscription module inside of the index.js file.

    const Subscription = require('./resolvers/Subscription') 
    const resolvers = {
      Query,
      Mutation,
      Subscription,
      User,
      Link,
    }

    Now, with all the code in place, we can start testing our real-time API. We will use multiple tabs of the GraphQL playground concurrently.

    Testing Subscriptions

    If your server is already running, then kill it with CTRL+C and restart with this command:

    npm start

    Next, open the browser and navigate to http://localhost:4000 to see the GraphQL playground. We will use one tab of the playground to perform the mutation to trigger the event to Pusher and invoke the subscriber.

    We will now execute some mutations to add entities to the application.

    First, let’s create a user in the application by using the signup mutation. We send the following mutation to the server to create a new User entity.

    mutation {
      signup(
        name: "Alice"
        email: "alice@prisma.io"
        password: "graphql"
      ) {
        token
        user {
          id
        }
      }
    }

    You will see a response in the playground that contains the authentication token for the user. Copy the token, and open another tab in the playground. Inside that new tab, open the HTTP_HEADERS section in the bottom and add the Authorization header.

    Replace the __TOKEN__ placeholder in the snippet below with the token copied above.

    {
      "Authorization": "Bearer __TOKEN__"
    }

    Now, all the queries or mutations executed from that tab will carry the authentication token. With this in place, we send the following mutation to our GraphQL server.

    mutation {
      post(
        url: "http://velotio.com"
        description: "An awesome GraphQL blog"
      ) {
        id
      }
    }

    The mutation above creates a Link entity inside the application. With an entity created, we can move on to testing the subscription part. In another tab, we will send the subscription query, which creates a persistent WebSocket connection to the server. Before firing off the subscription query, let us first understand its syntax. It starts with the keyword subscription, followed by the subscription name. The subscription is defined in the GraphQL schema and determines the data shape we can resolve to. Here, we subscribe to the newLink subscription, whose resolved data is that of a Link entity, which means we can ask for any specific part of the Link. In this case, we ask for attributes like id, url, description, and nested attributes of the postedBy field.

    subscription {
      newLink {
        id
        url
        description
        postedBy {
          id
          name
          email
        }
      }
    }

    The response of this operation differs from that of a mutation or query: you see a loading spinner, which indicates that the playground is waiting for an event to happen. This means the GraphQL client (the playground) has established a connection with the server and is listening for response data.

    Before triggering the subscription event, we will also keep an eye on the Pusher channel to verify that the Pusher service is integrated successfully.

    To do this, we go to the Pusher dashboard, navigate to the channels app we created, and click on the debug console. The debug console shows the events triggered in real time.

    Now that the Pusher dashboard is visible, we will trigger the subscription event by running the following mutation inside a new Playground tab.

    mutation {
      post(
        url: "www.velotio.com"
        description: "Graphql remote schema stitching"
      ) {
        id
      }
    }

    Now, we observe the Playground tab where the subscription was running.

    We can see that the newly created Link appears in the response section and that the subscription continues to listen.

    You will also observe on the Pusher debug console an event with the same name and data as sent by your post mutation, confirming the event reached the Pusher service.


    We have achieved our first goal, i.e., we have integrated the Pusher channel and implemented a subscription for a Link creation event.

    To achieve our second goal, i.e., to listen to Vote events, we repeat the same steps as we did for the Link subscription.

    We add a subscription resolver for Vote in the Subscription.js file and update the Subscription type in the GraphQL schema. To trigger a different event, we use "NEW_VOTE" as the event name and add the publish call inside the resolver for the vote mutation.
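    The updated Subscription type in the schema then covers both events (the field nullability here mirrors the earlier newLink definition):

```graphql
type Subscription {
  newLink: Link
  newVote: Vote
}
```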

    function newVoteSubscribe(parent, args, context, info) {
      return context.pubsub.asyncIterator("NEW_VOTE")
    }
    
    const newVote = {
      subscribe: newVoteSubscribe,
      resolve: payload => {
        return payload
      },
    }

    Update the export statement to add the newVote resolver.

    module.exports = {
      newLink,
      newVote,
    }

    Update the vote mutation to add the publish call before returning the newVote data. Notice that the first parameter, "NEW_VOTE", is passed so that listeners can bind to the new event name.

    async function vote(parent, args, context, info) {
      const userId = getUserId(context)
      const newVote = await context.prisma.vote.create({
        data: {
          user: { connect: { id: userId } },
          link: { connect: { id: Number(args.linkId) } },
        }
      })
      context.pubsub.publish("NEW_VOTE", newVote)
      return newVote
    }

    Now, restart the server, complete the signup process, and set HTTP_HEADERS as we did before. Add the following subscription in a new Playground tab.

    subscription {
      newVote {
        id
        link {
          url
          description
        }
        user {
          name
          email
        }
      }
    }

    In another Playground tab, send the following vote mutation to the server to trigger the event, but do not forget to set the Authorization header. The mutation below adds the user's Vote to the Link. Replace "__LINK_ID__" with the linkId returned by the earlier post mutation.

    mutation {
      vote(linkId: "__LINK_ID__") {
        link {
          url
          description
        }
        user {
          name
          email
        }
      }
    }

    Observe the event data in the response tab of the vote subscription. You can also check the triggered event on the Pusher dashboard.

    The final codebase is available on a branch named with-subscription.

    Conclusion

    By following the steps above, we saw how easy it is to add real-time features to GraphQL apps with subscriptions. Establishing a connection with the server is no hassle, and it is quite similar to how we implement queries and mutations. Unlike the mainstream approach, where one has to build and manage event handlers, GraphQL subscriptions come with these features built in for both client and server. We also saw how a managed real-time service like Pusher can be used for Pub/Sub events. Together, GraphQL and Pusher can prove to be a solid combination for a reliable real-time system.

    Related Articles

    1. Build and Deploy a Real-Time React App Using AWS Amplify and GraphQL

    2. Scalable Real-time Communication With Pusher