Category: Cloud & DevOps

  • A Primer on HTTP Load Balancing in Kubernetes using Ingress on Google Cloud Platform

    Containerized applications and Kubernetes adoption in cloud environments are on the rise. One of the challenges when deploying applications in Kubernetes is exposing them to the outside world. This blog explores the different options for accessing applications externally, with a focus on Ingress – a new feature in Kubernetes that provides an external load balancer. It also provides a simple hands-on tutorial on Google Cloud Platform (GCP).

    Ingress is a new feature (currently in beta) from Kubernetes that aspires to be an application load balancer, simplifying how you expose your applications and services to the outside world. It can be configured to give services externally reachable URLs, load balance traffic, terminate SSL, offer name-based virtual hosting, etc. Before we dive into Ingress, let’s look at the alternatives currently available for exposing your applications and their complexities and limitations, and then see how Ingress addresses these problems.

    Current ways of exposing applications externally:

    There are several ways to expose your applications externally. Let’s look at each of them:

    EXPOSE Pod:

    You can expose your application directly from your pod by using a port on the node that runs the pod, mapping that port to a port exposed by your container, and using the combination HOST-IP:HOST-PORT to access your application externally. This is similar to what you would do when running Docker containers directly without Kubernetes. In Kubernetes, the hostPort setting in the pod configuration does the same thing. Another approach is to set hostNetwork: true in the pod configuration so that the pod uses the host’s network interface.

    Limitations:

    • In both scenarios you should take extra care to avoid port conflicts at the host, and possibly some issues with packet routing and name resolutions.
    • With hostPort, only one replica of the pod can run per cluster node, because a given host port can be bound by only one pod on that node.

    EXPOSE Service:

    Kubernetes services primarily work to interconnect different pods which constitute an application. You can scale the pods of your application very easily using services. Services are not primarily intended for external access, but there are some accepted ways to expose services to the external world.

    Basically, services provide a routing, balancing and discovery mechanism for the pod’s endpoints. Services target pods using selectors, and can map container ports to service ports. A service exposes one or more ports, although usually, you will find that only one is defined.

    A service can be exposed using one of four ServiceType choices:

    • ClusterIP: Exposes the service on a cluster-internal IP. Choosing this value makes the service only reachable from within the cluster. This is the default ServiceType.
    • NodePort: Exposes the service on each Node’s IP at a static port (the NodePort). A ClusterIP service, to which the NodePort service will route, is automatically created. You’ll be able to contact the NodePort service, from outside the cluster, by requesting <nodeip>:<nodeport>. Here NodePort remains fixed and NodeIP can be any node IP of your Kubernetes cluster.
    • LoadBalancer: Exposes the service externally using a cloud provider’s load balancer (e.g. AWS ELB). NodePort and ClusterIP services, to which the external load balancer will route, are automatically created.
    • ExternalName: Maps the service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up. This requires version 1.7 or higher of kube-dns.

    Limitations:

    • If we choose NodePort to expose our services, Kubernetes will allocate ports for them from the range 30000-32767 by default. You will need to add an external proxy layer that uses DNAT to expose more friendly ports. The external proxy layer will also have to take care of load balancing so that you leverage the power of your pod replicas. It would also not be easy to add TLS or simple host header routing rules to the external service.
    • ClusterIP and ExternalName, while easy to use, similarly have the limitation that we cannot add any routing or load balancing rules.
    • Choosing LoadBalancer is probably the easiest way to get your service exposed to the internet. The problem is that there is no standard way of telling a Kubernetes service about the elements that a balancer requires; again, TLS and host headers are left out. Another limitation is the reliance on an external load balancer (AWS’s ELB, GCP’s Cloud Load Balancer, etc.).

    Endpoints

    Endpoints are usually created automatically by services, unless you are using headless services and adding the endpoints manually. An endpoint is a host:port tuple registered in Kubernetes, and in the service context it is used to route traffic. The service tracks the endpoints as pods that match the selector are created, deleted and modified. On their own, endpoints are not useful for exposing services, since they are to some extent ephemeral objects.

    Summary

    If you can rely on your cloud provider to correctly implement the LoadBalancer for their API, to keep up-to-date with Kubernetes releases, and you are happy with their management interfaces for DNS and certificates, then setting up your services as type LoadBalancer is quite acceptable.

    On the other hand, if you want to manage load balancing systems manually and set up port mappings yourself, NodePort is a low-complexity solution. If you are directly using Endpoints to expose external traffic, perhaps you already know what you are doing (but consider that you might have made a mistake, there could be another option).

    Given that none of these elements has been originally designed to expose services to the internet, their functionality may seem limited for this purpose.

    Understanding Ingress

    Traditionally, you would create a LoadBalancer service for each public application you want to expose. Ingress gives you a way to route requests to services based on the request host or path, centralizing a number of services into a single entrypoint.

    Ingress is split into two main pieces. The first is the Ingress resource, which defines how you want requests routed to the backing services; the second is the Ingress controller, which does the routing and also keeps track of changes at the service level.

    Ingress Resources

    The Ingress resource is a set of rules that map to Kubernetes services. Ingress resources are defined purely within Kubernetes as an object that other entities can watch and respond to.

    Ingress supports defining the following rules in its beta stage:

    • host header:  Forward traffic based on domain names.
    • paths: Looks for a match at the beginning of the path.
    • TLS: If the ingress adds TLS, HTTPS and a certificate configured through a secret will be used.

    When an Ingress has no host header rules, requests that do not match any other rule use that Ingress and are mapped to its backend service. This is usually done to serve a 404 page for sites/paths that are not routed to the other services. Ingress tries to match requests to rules and forwards them to backends, each composed of a service and a port.
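    The matching described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the Rule class, the route function, the host shop.example.com and the default-http-backend name are all hypothetical, and preferring the longest matching path is a choice made for this sketch; real Ingress controllers differ in how they order and match paths.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """One simplified Ingress rule (hypothetical model, not the Kubernetes API)."""
    host: Optional[str]  # None means "any host"
    path: str            # matched against the beginning of the request path
    service: str
    port: int


def route(rules, default_backend, host, path):
    """Return the (service, port) backend for a request, or the default backend."""
    candidates = [
        r for r in rules
        if (r.host is None or r.host == host) and path.startswith(r.path)
    ]
    if not candidates:
        return default_backend
    # Assumption for this sketch: the longest matching path wins.
    best = max(candidates, key=lambda r: len(r.path))
    return (best.service, best.port)


rules = [
    Rule(host=None, path="/echo", service="echoserver", port=8080),
    Rule(host="shop.example.com", path="/", service="nginx", port=80),
]
default_backend = ("default-http-backend", 80)

print(route(rules, default_backend, "shop.example.com", "/index.html"))  # ('nginx', 80)
print(route(rules, default_backend, "other.example.com", "/echo/ping"))  # ('echoserver', 8080)
print(route(rules, default_backend, "other.example.com", "/missing"))    # ('default-http-backend', 80)
```

    The last call shows the fallback case: no rule matches, so the request goes to the default backend, which is where a 404 page would typically be served from.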

    Ingress Controllers

    The Ingress controller is the entity that grants (or removes) access based on changes in services, pods and Ingress resources. It gets the state change data by calling the Kubernetes API directly.

    Ingress controllers are applications that watch Ingresses in the cluster and configure a balancer to apply those rules. You can configure any of the third-party balancers like HAProxy, NGINX, Vulcand or Traefik to create your version of the Ingress controller. An Ingress controller should track the changes in Ingress resources, services and pods and update the configuration of the balancer accordingly.

    Ingress controllers will usually track and communicate with endpoints behind services instead of using services directly. This way some network plumbing is avoided, and we can also manage the balancing strategy from the balancer. Some of the open source implementations of Ingress Controllers can be found here.

    Now, let’s do an exercise of setting up an HTTP Load Balancer using Ingress on Google Cloud Platform (GCP), which has already integrated the Ingress feature in its Container Engine (GKE) service.

    Ingress-based HTTP Load Balancer in Google Cloud Platform

    The tutorial assumes that you have set up your GCP account and created a default project. We will first create a Container cluster, followed by the deployment of an nginx server service and an echoserver service. Then we will set up an Ingress resource for both services, which will configure the HTTP Load Balancer provided by GCP.

    Basic Setup

    Get your project ID by going to the “Project info” section in your GCP dashboard. Start the Cloud Shell terminal, set your project id and the compute/zone in which you want to create your cluster.

    $ gcloud config set project glassy-chalice-129514
    $ gcloud config set compute/zone us-east1-d
    # Create a 3 node cluster with name "loadbalancedcluster"
    $ gcloud container clusters create loadbalancedcluster

    Fetch the cluster credentials for the kubectl tool:

    $ gcloud container clusters get-credentials loadbalancedcluster --zone us-east1-d --project glassy-chalice-129514

    Step 1: Deploy an nginx server and echoserver service

    $ kubectl run nginx --image=nginx --port=80
    $ kubectl run echoserver --image=gcr.io/google_containers/echoserver:1.4 --port=8080
    $ kubectl get deployments
    NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    echoserver   1         1         1            1           15s
    nginx        1         1         1            1           26m

    Step 2: Expose your nginx and echoserver deployment as a service internally

    Create a Service resource to make the nginx and echoserver deployment reachable within your container cluster:

    $ kubectl expose deployment nginx --target-port=80  --type=NodePort
    $ kubectl expose deployment echoserver --target-port=8080 --type=NodePort

    When you create a Service of type NodePort with this command, Container Engine makes your Service available on a randomly-selected high port number (e.g. 30746) on all the nodes in your cluster. Verify the Service was created and a node port was allocated:

    $ kubectl get service nginx
    NAME      CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
    nginx     10.47.245.54   <nodes>       80:30746/TCP   20s
    $ kubectl get service echoserver
    NAME         CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
    echoserver   10.47.251.9   <nodes>       8080:32301/TCP   33s

    In the output above, the node port for the nginx Service is 30746 and for the echoserver Service is 32301. Also, note that there is no external IP allocated for these Services. Since the Container Engine nodes are not externally accessible by default, creating these Services does not make your application accessible from the Internet. To make your HTTP(S) web server application publicly accessible, you need to create an Ingress resource.

    Step 3: Create an Ingress resource

    On Container Engine, Ingress is implemented using Cloud Load Balancing. When you create an Ingress in your cluster, Container Engine creates an HTTP(S) load balancer and configures it to route traffic to your application. Container Engine has an internally defined Ingress Controller, which takes the Ingress resource as input for setting up proxy rules and talks to the Kubernetes API to get the service-related information.

    The following config file (saved here as basic-ingress.yaml) defines an Ingress resource that directs traffic to your nginx and echoserver services:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: fanout-ingress
    spec:
      rules:
      - http:
          paths:
          - path: /
            backend:
              serviceName: nginx
              servicePort: 80
          - path: /echo
            backend:
              serviceName: echoserver
              servicePort: 8080

    To deploy this Ingress resource run in the cloud shell:

    $ kubectl apply -f basic-ingress.yaml

    Step 4: Access your application

    Find out the external IP address of the load balancer serving your application by running:

    $ kubectl get ingress fanout-ingress
    NAME             HOSTS     ADDRESS          PORTS     AGE
    fanout-ingress   *         130.211.36.168   80        36s

     

    Use http://<external-ip-address> and http://<external-ip-address>/echo to access nginx and the echo-server.

    Summary

    Ingresses are simple and very easy to deploy, and really fun to play with. However, Ingress is currently in beta and lacks some features, which may restrict its use in production. Stay tuned for updates on the Kubernetes Ingress page and their GitHub repo.

  • Elasticsearch 101: Fundamentals & Core Components

    Elasticsearch is currently the most popular way to implement free text search and analytics in applications. It is highly scalable and can easily manage petabytes of data. It supports a variety of use cases, like allowing users to easily search through any portal, collecting and analyzing log data, and building business intelligence dashboards to quickly analyze and visualize data.

    This blog acts as an introduction to Elasticsearch and covers the basic concepts of clusters, nodes, index, document and shards.

    What is Elasticsearch?

    Elasticsearch (ES) is a combination of an open-source, distributed, highly scalable data store and Lucene – a search engine that supports extremely fast full-text search. It is a beautifully crafted piece of software that hides the internal complexities and provides full-text search capabilities through simple REST APIs. Elasticsearch is written in Java with Apache Lucene at its core. It should be clear that Elasticsearch is not like a traditional RDBMS. It is not suitable for your transactional database needs, and hence, in my opinion, it should not be your primary data store. It is common practice to use a relational database as the primary data store and push only the required data into Elasticsearch.

    Elasticsearch is meant for fast text search. Several characteristics make it different from an RDBMS. Unlike an RDBMS, Elasticsearch stores data as denormalized JSON documents and doesn’t support transactions, referential integrity, joins, and subqueries.

    Elasticsearch works with structured, semi-structured, and unstructured data as well. In the next section, let’s walk through the various components in Elasticsearch.

    Elasticsearch Components

    Cluster

    One or more servers collectively providing indexing and search capabilities form an Elasticsearch cluster. The cluster size can vary from a single node to thousands of nodes, depending on the use cases.

    Node

    Node is a single physical or virtual machine that holds full or part of your data and provides computing power for indexing and searching your data. Every node is identified with a unique name. If the node identifier is not specified, a random UUID is assigned as a node identifier at the startup. Every node configuration has the property `cluster.name`. The cluster will be formed automatically with all the nodes having the same `cluster.name` at startup.

    A node has to accomplish several duties such as:

    • storing the data
    • performing operations on data (indexing, searching, aggregation, etc.)
    • maintaining the health of the cluster

    Each node in a cluster can do all these operations. Elasticsearch provides the capability to split responsibilities across different nodes. This makes it easy to scale, optimize, and maintain the cluster. Based on the responsibilities, the following are the different types of nodes that are supported:

    Data Node

    A data node is a node that has storage and computation capability. A data node stores part of the data in the form of shards (explained in a later section). Data nodes also participate in CRUD, search, and aggregate operations. These operations are resource-intensive, and hence, it is a good practice to have dedicated data nodes without the additional load of cluster administration. By default, every node of the cluster is a data node.

    Master Node

    Master nodes are reserved to perform administrative tasks. Master nodes track the availability/failure of the data nodes. The master nodes are responsible for creating and deleting the indices (explained in the later section).

    This makes the master node a critical part of the Elasticsearch cluster. It has to be stable and healthy. A single master node for a cluster is certainly a single point of failure. Elasticsearch provides the capability to have multiple master-eligible nodes. All the master-eligible nodes participate in an election to elect a master node. It is recommended to have a minimum of three master-eligible nodes in the cluster to avoid a split-brain situation. By default, all the nodes are both data nodes and master-eligible nodes. However, some nodes can be configured explicitly to be master-eligible only.

    Coordinating-Only Node

    Any node that is not a master node or a data node is a coordinating node. Coordinating nodes act as smart load balancers. They are exposed to end-user requests and appropriately redirect the requests between data nodes and master nodes.

    To take an example, a user’s search request is sent to different data nodes. Each data node searches locally and sends the result back to the coordinating node. The coordinating node aggregates the results and returns them to the user.
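    The scatter-and-gather flow in this example can be sketched roughly as follows. The node names, scores and document ids are made up, and coordinate is a hypothetical helper; the sketch only shows the merge step, assuming each data node returns its local hits as (score, document id) pairs.

```python
import heapq

# Hypothetical local results, as each data node might return them after
# searching its own shards: a list of (score, document id) pairs.
shard_results = {
    "data-node-1": [(0.91, "doc-7"), (0.40, "doc-2")],
    "data-node-2": [(0.85, "doc-9"), (0.77, "doc-4")],
    "data-node-3": [(0.30, "doc-1")],
}


def coordinate(results_by_node, size):
    """Merge the local results of every data node and keep the best `size` hits."""
    all_hits = (hit for hits in results_by_node.values() for hit in hits)
    return heapq.nlargest(size, all_hits)


top_hits = coordinate(shard_results, size=3)
print(top_hits)  # [(0.91, 'doc-7'), (0.85, 'doc-9'), (0.77, 'doc-4')]
```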

    There are a few concepts that are core to Elasticsearch. Understanding these basic concepts will tremendously ease the learning process.

    Index

    Index is a container to store data similar to a database in the relational databases. An index contains a collection of documents that have similar characteristics or are logically related. If we take an example of an e-commerce website, there will be one index for products, one for customers, and so on. Indices are identified by the lowercase name. The index name is required to perform the add, update, and delete operations on the documents.

    Type

    Type is a logical grouping of the documents within the index. In the previous example of product index, we can further group documents into types, like electronics, fashion, furniture, etc. Types are defined on the basis of documents having similar properties in it. It isn’t easy to decide when to use the type over the index. Indices have more overheads, so sometimes, it is better to use different types in the same index for better performance. There are a couple of restrictions to use types as well. For example, two fields having the same name in different types of documents should be of the same datatype (string, date, etc.).

    Document

    Document is the piece indexed by Elasticsearch. A document is represented in the JSON format. We can add as many documents as we want into an index. The following snippet shows how to create a document of type mobile in the index store. We will cover more about the individual field of the document in the Mapping Type section.

    HTTP POST <hostname:port>/store/mobile/
    {    
    "name": "Motorola G5",    
    "model": "XT3300",    
    "release_date": "2016-01-01",    
    "features": "16 GB ROM | Expandable Upto 128 GB | 5.2 inch Full HD Display | 12MP Rear Camera | 5MP Front Camera | 3000 mAh Battery | Snapdragon 625 Processor",    
    "ram_gb": "3",    
    "screen_size_inches": "5.2"
    }

    Mapping Types

    To create different types in an index, we need mapping types (or simply mappings) to be specified during index creation. Mappings can be defined as a list of directives given to Elasticsearch about how the data is supposed to be stored and retrieved. It is important to provide mapping information at the time of index creation based on how we want to retrieve our data later. In the context of relational databases, think of mappings as a table schema.

    Mapping provides information on how to treat each JSON field. For example, the field can be of type date, geolocation, or person name. Mappings also allow specifying which fields will participate in the full-text search, and specify the analyzers used to transform and decorate data before storing into an index. If no mapping is provided, Elasticsearch tries to identify the schema itself, known as Dynamic Mapping. 

    Each mapping type has Meta Fields and Properties. The snippet below shows the mapping of the type mobile.

    {
      "mappings": {
        "mobile": {
          "properties": {
            "name": {
              "type": "keyword"
            },
            "model": {
              "type": "keyword"
            },
            "release_date": {
              "type": "date"
            },
            "features": {
              "type": "text"
            },
            "ram_gb": {
              "type": "short"
            },
            "screen_size_inches": {
              "type": "float"
            }
          }
        }
      }
    }

    Meta Fields

    As the name indicates, meta fields store additional information about the document. Meta fields are meant mostly for internal usage, and it is unlikely that the end-user has to deal with them. Meta field names start with an underscore. There are around ten meta fields in total. We will talk about some of them here:

    _index

    It stores the name of the index the document belongs to. This is used internally to store/search the document within an index.

    _type

    It stores the type of the document. To get better performance, it is often included in search queries.

    _id

    This is the unique id of the document. It is used to access a specific document directly over the HTTP GET API.

    _source

    This holds the original JSON document before applying any analyzers/transformations. It is important to note that Elasticsearch can query only on fields that are indexed (those for which a mapping is provided). The _source field is not indexed and hence can’t be queried on, but it can be included in the final search result.

    Fields Or Properties

    The list of fields specifies which JSON fields in the document should be included in a particular type. In the e-commerce website example, mobile can be a type. It will have fields like operating_system, camera_specification, ram_size, etc.

    Fields also carry the data type information with them. This directs Elasticsearch to treat the specific fields in a particular way of storing/searching data. Data types are similar to what we see in any other programming language. We will talk about a few of them here.

    Simple Data Types

    Text

    This data type is used to store full text, like a product description. These fields participate in full-text search. They are analyzed while being stored, which makes it possible to search them by the individual words they contain. Such fields are not used in sorting and aggregation queries.

    Keywords

    This type is also used to store text data, but unlike Text, it is stored as-is without being analyzed. This is suitable for storing information like a user’s mobile number, city, age, etc. These fields are used in filter, aggregation, and sorting queries. For example, list all users from a particular city and filter them by age.
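    To make the difference between Text and Keyword fields more concrete, here is a toy sketch. The analyze function is a very rough stand-in for a real analyzer (it only lowercases and splits on whitespace), build_index is a hypothetical helper, and the second document with model XT1650 is invented for illustration.

```python
from collections import defaultdict


def analyze(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs, field, analyzed):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        # Text-like fields are broken into terms; keyword-like fields are kept whole.
        terms = analyze(doc[field]) if analyzed else [doc[field]]
        for term in terms:
            index[term].add(doc_id)
    return index


docs = {
    1: {"model": "XT3300", "features": "Full HD Display"},
    2: {"model": "XT1650", "features": "Quad HD Display"},
}

text_index = build_index(docs, "features", analyzed=True)
keyword_index = build_index(docs, "model", analyzed=False)

print(sorted(text_index["display"]))    # [1, 2]  any single word can be searched
print(sorted(keyword_index["XT3300"]))  # [1]     only the exact value matches
print(sorted(keyword_index["xt3300"]))  # []      different case, so no match
```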

    Numeric

    Elasticsearch supports a wide range of numeric types: long, integer, short, byte, double, float.

    There are a few more data types to support date, boolean (true/false, on/off, 1/0), IP (to store IP addresses).

    Special Data Types

    Geo Point

    This data type is used to store geographical location. It accepts latitude and longitude pair. For example, this data type can be used to arrange the user’s photo library by their geographical location or graphically display the locations trending on social media news.

    Geo Shape

    It allows storing arbitrary geometric shapes like rectangle, polygon, etc.

    Completion Suggester

    This data type is used to provide auto-completion feature over a specific field. As the user types certain text, the completion suggester can guide the user to reach particular results.

    Complex Data Type

    Object

    If you know JSON well, this concept won’t be new for you. Elasticsearch also allows storing nested JSON object structure as a document.

    Nested

    The Object data type is of limited use for arrays of objects because of its underlying data representation in the Lucene index. The Lucene index does not support inner JSON objects, so ES flattens the original JSON to make it compatible with storage in the Lucene index. As a result, the fields of multiple inner objects get merged into one, leading to wrong search results. In such cases, you will usually want the Nested data type rather than Object.
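    A small sketch can show why the flattening leads to wrong results. The users list and the flatten helper are hypothetical, and the two match expressions only imitate the Object and Nested behaviour in plain Python; they are not how Elasticsearch evaluates queries internally.

```python
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]


def flatten(field, objects):
    """Merge a list of inner objects into multi-valued "field.key" entries."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


flat = flatten("user", users)
print(flat)  # {'user.first': ['John', 'Alice'], 'user.last': ['Smith', 'White']}

# Look for a user called "Alice Smith", who does not exist in the document.
query = {"first": "Alice", "last": "Smith"}

# Object-like behaviour: each condition may be satisfied by a different inner object.
object_match = all(value in flat[f"user.{key}"] for key, value in query.items())

# Nested-like behaviour: all conditions must hold within the same inner object.
nested_match = any(all(u.get(k) == v for k, v in query.items()) for u in users)

print(object_match)  # True  (a false positive)
print(nested_match)  # False
```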

    Shards

    Shards help with enabling Elasticsearch to become horizontally scalable. An index can store millions of documents and occupy terabytes of data. This can cause problems with performance, scalability, and maintenance. Let’s see how Shards help achieve scalability.

    Indices are divided into multiple units called shards (refer to the diagram below). A shard is a full-featured subset of an index. Shards of the same index can reside on the same or different nodes of the cluster. The number of shards determines the degree of parallelism for search and indexing operations. Shards allow the cluster to grow horizontally. The number of shards per index can be specified at the time of index creation. By default, the number of shards created is 5. However, once the index is created, the number of shards cannot be changed. To change the number of shards, reindex the data.
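    To get a feel for why the shard count is fixed at creation time, consider the following sketch. It assumes documents are assigned to shards by hashing a routing value (here the document id) modulo the number of primary shards, which is how the Elasticsearch documentation describes routing in general terms. The zlib.crc32 hash is only a stand-in chosen for illustration, and shard_for is a hypothetical helper.

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Pick a shard from a stable hash of the document id (crc32 is a stand-in)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


doc_ids = [f"doc-{n}" for n in range(1000)]

with_5_shards = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
with_6_shards = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

# Count the documents whose shard would change if the shard count changed.
moved = sum(1 for doc_id in doc_ids if with_5_shards[doc_id] != with_6_shards[doc_id])
print(f"{moved} of {len(doc_ids)} documents would land on a different shard")
```

    Under this assumption, changing the divisor sends most documents to a different shard, so existing documents could no longer be found where the routing says they should be. That is why the data has to be reindexed.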

    Replication

    Hardware can fail at any time. To ensure fault tolerance and high availability, ES provides a feature to replicate the data. Shards can be replicated. A shard that is being copied is called a primary shard. The copy of the primary shard is called a replica shard or simply a replica. Like the number of shards, the number of replicas can also be specified at the time of index creation. Replication serves two purposes:

    • High Availability – A replica is never created on the same node where the primary shard is present. This ensures that data remains available through the replica shard even if the complete node fails.
    • Performance – Replicas can also contribute to search capabilities. The search queries will be executed in parallel across the replicas.

    To summarize, to achieve high availability and performance, the index is split into multiple shards. In a production environment, multiple replicas are created for every index. In the replicated index, only primary shards can serve write requests. However, all the shards (the primary shard as well as replica shards) can serve read/query requests. The replication factor is defined at the time of index creation and can be changed later if required. Choosing the number of shards is an important exercise because, once defined, it can’t be changed. If it must change, the only option is to create a new index with the required number of shards and reindex the old data.

    Summary

    In this blog, we have covered the basic but important aspects of Elasticsearch. In the following posts, I will talk about how indexing & searching works in detail. Stay tuned!

  • Mesosphere DC/OS Masterclass : Tips and Tricks to Make Life Easier

    DC/OS is an open-source distributed operating system for the data center, built on the Apache Mesos distributed systems kernel. As a distributed system, it is a cluster of master nodes and private/public nodes, where each node also has a host operating system that manages the underlying machine.

    It enables the management of multiple machines as if they were a single computer. It automates resource management, schedules process placement, facilitates inter-process communication, and simplifies the installation and management of distributed services. Its included web interface and available command-line interface (CLI) facilitate remote management and monitoring of the cluster and its services.

    • Distributed System : DC/OS is a distributed system with a group of private and public nodes which are coordinated by master nodes.
    • Cluster Manager : DC/OS is responsible for running tasks on agent nodes and providing the required resources to them. DC/OS uses Apache Mesos to provide cluster management functionality.
    • Container Platform : All DC/OS tasks are containerized. DC/OS uses two different container runtimes, i.e. Docker and Mesos, so containers can be started from Docker images or they can be native executables (binaries or scripts) which are containerized at runtime by Mesos.
    • Operating System : As the name specifies, DC/OS is an operating system which abstracts cluster hardware and software resources and provides common services to applications.

    Unlike Linux, DC/OS is not a host operating system. DC/OS spans multiple machines, but relies on each machine to have its own host operating system and host kernel.

    The high level architecture of DC/OS can be seen below :

    For the detailed architecture and components of DC/OS, please click here.

    Adoption and usage of Mesosphere DC/OS:

    Mesosphere customers include :

    • 30% of the Fortune 50 U.S. Companies
    • 5 of the top 10 North American Banks
    • 7 of the top 12 Worldwide Telcos
    • 5 of the top 10 Highest Valued Startups

    Some companies using DC/OS are :

    • Cisco
    • Yelp
    • Tommy Hilfiger
    • Uber
    • Netflix
    • Verizon
    • Cerner
    • NIO

    Installing and using DC/OS

    A guide to installing DC/OS can be found here. After installing DC/OS on any platform, install dcos cli by following documentation found here.

    Using the dcos cli, we can manage cluster nodes, manage Marathon tasks and services, and install/remove packages from the Universe. It also supports automation well, as each cli command can output JSON.

    NOTE: The tasks below were executed and tested with the following tools:

    • DC/OS 1.11 Open Source
    • DC/OS cli 0.6.0
    • jq:1.5-1-a5b5cbe

    DC/OS commands and scripts

    Setup DC/OS cli with DC/OS cluster

    dcos cluster setup <CLUSTER URL>

    Example :

    dcos cluster setup http://dcos-cluster.com

    The above command will give you the link for oauth authentication and prompt for auth token. You can authenticate yourself with any of Google, Github or Microsoft account. Paste the token generated after authentication to cli prompt. (Provided oauth is enabled).

    DC/OS authentication token

    dcos config show core.dcos_acs_token

    DC/OS cluster url

    dcos config show core.dcos_url

    DC/OS cluster name

    dcos config show cluster.name

    Access Mesos UI

    <DC/OS_CLUSTER_URL>/mesos

    Example:

    http://dcos-cluster.com/mesos

    Access Marathon UI

    <DC/OS_CLUSTER_URL>/service/marathon

    Example:

    http://dcos-cluster.com/service/marathon

    Access any DC/OS service, like Marathon, Kafka, Elastic, Spark etc.[DC/OS Services]

    <DC/OS_CLUSTER_URL>/service/<SERVICE_NAME>

    Example:

    http://dcos-cluster.com/service/marathon
    http://dcos-cluster.com/service/kafka

    Access DC/OS slaves info in json using Mesos API [Mesos Endpoints]

    curl -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
      "$(dcos config show core.dcos_url)/mesos/slaves" | jq .

    Access DC/OS slaves info in json using DC/OS cli

    dcos node --json

    Note : The DC/OS cli command 'dcos node --json' is equivalent to querying the Mesos slaves endpoint (/mesos/slaves).
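    The same authenticated call can be sketched in Python. This is a minimal sketch, not part of the original scripts: the helper names (dcos_config, build_request, fetch_json) are hypothetical, and it assumes the dcos cli is installed and already set up against the cluster. Only the request construction is exercised in the example, so no cluster is needed to try it.

```python
import json
import subprocess
import urllib.request


def dcos_config(key):
    """Read a value from the local DC/OS cli config, e.g. core.dcos_url."""
    result = subprocess.run(
        ["dcos", "config", "show", key],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_request(cluster_url, token, path):
    """Build a GET request with the same Authorization header as the curl call."""
    url = cluster_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})


def fetch_json(request):
    """Send the request and decode the JSON body (needs a reachable cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder values; no network call is made here.
req = build_request("http://dcos-cluster.com", "<token>", "/mesos/slaves")
print(req.full_url)
```

    Against a real cluster, the pieces would be combined as fetch_json(build_request(dcos_config("core.dcos_url"), dcos_config("core.dcos_acs_token"), "/mesos/slaves")).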

    Access DC/OS private slaves info using DC/OS cli

    dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip == null) | "Private Agent : " + .hostname ' -r

    Access DC/OS public slaves info using DC/OS cli

    dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip != null) | "Public Agent : " + .hostname ' -r

    Access DC/OS private and public slaves info using DC/OS cli

    dcos node --json | jq '.[] | select(.type | contains("agent")) | if (.attributes.public_ip != null) then "Public Agent : " else "Private Agent : " end + " - " + .hostname ' -r | sort
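    For scripts that outgrow a jq one-liner, the same public/private split can be sketched in Python. The sample data below is hypothetical and only mimics the fields the jq filters above rely on (type, hostname, attributes.public_ip); real 'dcos node --json' output contains many more fields.

```python
import json

# Hypothetical sample shaped like `dcos node --json` output (fields trimmed).
SAMPLE_NODES = json.loads("""
[
  {"type": "agent", "hostname": "10.0.1.5", "attributes": {"public_ip": "true"}},
  {"type": "agent", "hostname": "10.0.2.7", "attributes": {}},
  {"type": "master (leader)", "ip": "10.0.0.2"}
]
""")


def classify_agents(nodes):
    """Split agent hostnames into public and private, like the jq filters do."""
    public, private = [], []
    for node in nodes:
        if "agent" not in node.get("type", ""):
            continue  # skip masters
        attributes = node.get("attributes") or {}
        if attributes.get("public_ip") is not None:
            public.append(node["hostname"])
        else:
            private.append(node["hostname"])
    return {"public": sorted(public), "private": sorted(private)}


print(classify_agents(SAMPLE_NODES))
```

    In practice the list would come from json.loads() applied to the output of 'dcos node --json'.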

    Get public IP of all public agents

    #!/bin/bash
    
    for id in $(dcos node --json | jq --raw-output '.[] | select(.attributes.public_ip == "true") | .id'); 
    do 
          dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --mesos-id=$id "curl -s ifconfig.co"
    done 2>/dev/null

    Note: 'dcos node ssh' requires your private key to be available to ssh. Make sure you add your private key as an ssh identity using :

    ssh-add </path/to/private/key/file/.pem>

    Get public IP of master leader

    dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "curl -s ifconfig.co" 2>/dev/null

    Get all master nodes and their private ip

    dcos node --json | jq '.[] | select(.type | contains("master"))
    | .ip + " = " + .type' -r

    Get list of all users who have access to DC/OS cluster

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
      "$(dcos config show core.dcos_url)/acs/api/v1/users" | jq -r '.array[].uid'

    Add users to cluster using Mesosphere script (Run this on master)

    Users to add are given in list.txt, each user on new line

    for i in `cat list.txt`; do echo $i;
    sudo -i dcos-shell /opt/mesosphere/bin/dcos_add_user.py $i; done

    Add users to cluster using DC/OS API

    #!/bin/bash
    
    # Usage : dcosAddUsers.sh (users to add are listed in users.list, one user per line)
    for i in `cat users.list`; 
    do 
      echo $i
      curl -X PUT -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
    done

    Delete users from DC/OS cluster organization

    #!/bin/bash
    
    # Usage : dcosDeleteUsers.sh (users to delete are listed in users.list, one user per line)
    
    for i in `cat users.list`; 
    do 
      echo $i
      curl -X DELETE -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
    done

    Offers/resources from individual DC/OS agent

    In recent versions of many DC/OS services, a scheduler endpoint at

    http://yourcluster.com/service/<service-name>/v1/debug/offers

    will display an HTML table containing a summary of recently-evaluated offers. This table’s contents are currently very similar to what can be found in logs, but in a slightly more accessible format. Alternately, we can look at the scheduler’s logs in stdout. An offer is a set of resources all from one individual DC/OS agent.

    <DC/OS_CLUSTER_URL>/service/<service_name>/v1/debug/offers

    Example:

    http://dcos-cluster.com/service/kafka/v1/debug/offers
    http://dcos-cluster.com/service/elastic/v1/debug/offers

    Save JSON configs of all running Marathon apps

    #!/bin/bash
    
    # Save marathon configs in json format for all marathon apps
    # Usage : saveMarathonConfig.sh
    
    for service in `dcos marathon app list --quiet | tr -d "/" | sort`; do
      dcos marathon app show $service | jq '. | del(.tasks, .version, .versionInfo, .tasksHealthy, .tasksRunning, .tasksStaged, .tasksUnhealthy, .deployments, .executor, .lastTaskFailure, .args, .ports, .residency, .secrets, .storeUrls, .uris, .user)' >& $service.json
    done

    Get report of Marathon apps with details like container type, Docker image, tag or service version used by Marathon app.

    #!/bin/bash
    
    TMP_CSV_FILE=$(mktemp /tmp/dcos-config.XXXXXX.csv)
    TMP_CSV_FILE_SORT="${TMP_CSV_FILE}_sort"
    #dcos marathon app list --json | jq '.[] | if (.container.docker.image != null ) then .id + ",Docker Application," + .container.docker.image else .id + ",DCOS Service," + .labels.DCOS_PACKAGE_VERSION end' -r > $TMP_CSV_FILE
    dcos marathon app list --json | jq '.[] | .id + if (.container.type == "DOCKER") then ",Docker Container," + .container.docker.image else ",Mesos Container," + if(.labels.DCOS_PACKAGE_VERSION !=null) then .labels.DCOS_PACKAGE_NAME+":"+.labels.DCOS_PACKAGE_VERSION  else "[ CMD ]" end end' -r > $TMP_CSV_FILE
    sed -i "s|^/||g" $TMP_CSV_FILE
    sort -t "," -k2,2 -k3,3 -k1,1 $TMP_CSV_FILE > ${TMP_CSV_FILE_SORT}
    cnt=1
    printf '%.0s=' {1..150}
    printf "\n  %-5s%-35s%-23s%-40s%-20s\n" "No" "Application Name" "Container Type" "Docker Image" "Tag / Version"
    printf '%.0s=' {1..150}
    while IFS=, read -r app typ image; 
    do
            tag=`echo $image | awk -F':' -v im="$image" '{tag=(im=="[ CMD ]")?"NA":($2=="")?"latest":$2; print tag}'`
            image=`echo $image | awk -F':' '{print $1}'`
            printf "\n  %-5s%-35s%-23s%-40s%-20s" "$cnt" "$app" "$typ" "$image" "$tag"
            cnt=$((cnt + 1))
            sleep 0.3
    done < $TMP_CSV_FILE_SORT
    printf "\n"
    printf '%.0s=' {1..150}
    printf "\n"
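    The tag-extraction rule used by the awk call in the report script can be illustrated in isolation. The following is a small Python sketch of that rule under the same assumptions as the script: the image string is split on ':' and only the first two fields are considered. The function name split_image is hypothetical.

```python
def split_image(image):
    """Return (name, tag) using the same rule as the awk call in the report script."""
    if image == "[ CMD ]":
        # Mesos container started from a command: there is no image tag.
        return image, "NA"
    fields = image.split(":")
    name = fields[0]
    # A missing or empty second field means the implicit "latest" tag.
    tag = fields[1] if len(fields) > 1 and fields[1] else "latest"
    return name, tag


for example in ["nginx:1.15", "nginx", "[ CMD ]"]:
    print(example, "->", split_image(example))
```

    Like the awk version, this treats the text after the first ':' as the tag, so an image pulled from a registry with an explicit port (for example registry:5000/app:1.0) would be reported incorrectly. Splitting on the last ':' after the final '/' would be one way to handle that case.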

    Get DC/OS nodes with more information like node type, node ip, attributes, number of running tasks, free memory, free cpu etc.

    #!/bin/bash
    
    printf "\n  %-15s %-18s%-18s%-10s%-15s%-10s\n" "Node Type" "Node IP" "Attribute" "Tasks" "Mem Free (MB)" "CPU Free"
    printf '%.0s=' {1..90}
    printf "\n"
    TAB=`echo -e "\t"`
    dcos node --json | jq '.[] | if (.type | contains("leader")) then "Master (leader)" elif ((.type | contains("agent")) and .attributes.public_ip != null) then "Public Agent" elif ((.type | contains("agent")) and .attributes.public_ip == null) then "Private Agent" else empty end + "\t"+ if(.type |contains("master")) then .ip else .hostname end + "\t" +  (if (.attributes | length !=0) then (.attributes | to_entries[] | join(" = ")) else "NA" end) + "\t" + if(.type |contains("agent")) then (.TASK_RUNNING|tostring) + "\t" + ((.resources.mem - .used_resources.mem)| tostring) + "\t\t" +  ((.resources.cpus - .used_resources.cpus)| tostring)  else "\t\tNA\tNA\t\tNA"  end' -r | sort -t"$TAB" -k1,1d -k3,3d -k2,2d
    printf '%.0s=' {1..90}
    printf "\n"

    Framework Cleaner

    Uninstall the framework and clean up any reserved resources left behind after the framework is deleted/uninstalled. (The cleanup step applies to DC/OS 1.9 or older; on DC/OS 1.10 and later, uninstalling with the cli is sufficient.)

    SERVICE_NAME=
    dcos package uninstall $SERVICE_NAME
    dcos node ssh --option StrictHostKeyChecking=no --master-proxy --leader \
      "docker run mesosphere/janitor /janitor.py -r ${SERVICE_NAME}-role -p ${SERVICE_NAME}-principal -z dcos-service-${SERVICE_NAME}"

    Get DC/OS apps and their placement constraints

    dcos marathon app list --json | jq '.[] |
    if (.constraints != null) then .id, .constraints else empty end'

    Run shell command on all slaves

    #!/bin/bash
    
    # Run any shell command on all slave nodes (private and public)
    
    # Usage : dcosRunOnAllSlaves.sh <CMD= any shell command to run, Ex: ulimit -a >
    CMD=$1
    for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'`; do 
       echo -e "\n###> Running command [ $CMD ] on $i"
       dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
       echo -e "======================================\n"
    done

    Run shell command on master leader

    CMD="<shell command, Ex: ulimit -a>"
    dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "$CMD"

    Run shell command on all master nodes

    #!/bin/bash
    
    # Run any shell command on all master nodes
    
    # Usage : dcosRunOnAllMasters.sh <CMD= any shell command to run, Ex: ulimit -a >
    CMD=$1
    for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'`
    do
      echo -e "\n###> Running command [ $CMD ] on $i"
      dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
      echo -e "======================================\n"
    done

    Add node attributes to dcos nodes and run apps on nodes with required attributes using placement constraints

    #!/bin/bash
    
    #1. SSH on node 
    #2. Create or edit file /var/lib/dcos/mesos-slave-common
    #3. Add contents as :
    #    MESOS_ATTRIBUTES=<key>:<value>
    #    Example:
    #    MESOS_ATTRIBUTES=TYPE:DB;DB_TYPE:MONGO;
    #4. Stop dcos-mesos-slave service
    #    systemctl stop dcos-mesos-slave
    #5. Remove link for latest slave metadata
    #    rm -f /var/lib/mesos/slave/meta/slaves/latest
    #6. Start dcos-mesos-slave service
    #    systemctl start dcos-mesos-slave
    #7. Wait for some time, node will be in HEALTHY state again.
    #8. Add app placement constraint with field = key and value = value
    #9. Verify attributes, run on any node
    #    curl -s http://leader.mesos:5050/state | jq '.slaves[]| .hostname ,.attributes'
    #    OR Check DCOS cluster UI
    #    Nodes => Select any Node => Details Tab
    
    tmpScript=$(mktemp "/tmp/addDcosNodeAttributes-XXXXXXXX")
    
    # key:value paired attributes, separated by ;
    ATTRIBUTES=NODE_TYPE:GPU_NODE
    
    cat <<EOF > ${tmpScript}
    echo "MESOS_ATTRIBUTES=${ATTRIBUTES}" | sudo tee /var/lib/dcos/mesos-slave-common
    sudo systemctl stop dcos-mesos-slave
    sudo rm -f /var/lib/mesos/slave/meta/slaves/latest
    sudo systemctl start dcos-mesos-slave
    EOF
    
    # List the private IPs of the nodes to which you want to add attributes in nodes.txt, one IP per line.
    for i in `cat nodes.txt`; do 
        echo $i
        dcos node ssh --master-proxy --option StrictHostKeyChecking=no --private-ip $i <$tmpScript
        sleep 10
    done
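    Step 8 above (adding a placement constraint whose field is the attribute key and whose value is the attribute value) can be sketched as follows. This is an illustrative Python sketch, not part of the original script: the helper names and the app id /example-app are hypothetical, and it assumes Marathon's CLUSTER operator, which places tasks on agents whose attribute matches the given value.

```python
import json


def parse_attributes(raw):
    """Parse a MESOS_ATTRIBUTES value such as 'TYPE:DB;DB_TYPE:MONGO;' into a dict."""
    pairs = {}
    for item in raw.split(";"):
        if not item:
            continue  # tolerate a trailing ';'
        key, _, value = item.partition(":")
        pairs[key] = value
    return pairs


def constraints_for(attributes):
    """Build Marathon constraints of the form [field, operator, value]."""
    return [[key, "CLUSTER", value] for key, value in attributes.items()]


attrs = parse_attributes("TYPE:DB;DB_TYPE:MONGO;")
# Hypothetical app definition fragment; merge the constraints into your own app JSON.
app_fragment = {"id": "/example-app", "constraints": constraints_for(attrs)}
print(json.dumps(app_fragment, indent=2))
```

    The resulting "constraints" array is what goes into the Marathon app definition for apps that should only run on the attributed nodes.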

    Install DC/OS Datadog metrics plugin on all DC/OS nodes

    #!/bin/bash
    
    # Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>
    
    DDAPI=$1
    
    if [[ -z $DDAPI ]]; then
        echo "[Datadog Plugin] Need datadog API key as parameter."
        echo "[Datadog Plugin] Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>."
        exit 1
    fi
    tmpScriptMaster=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")
    tmpScriptAgent=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")
    
    declare agent=$tmpScriptAgent
    declare master=$tmpScriptMaster
    
    for role in "agent" "master"
    do
    cat <<EOF > ${!role}
    curl -s -o /opt/mesosphere/bin/dcos-metrics-datadog -L https://downloads.mesosphere.io/dcos-metrics/plugins/datadog
    chmod +x /opt/mesosphere/bin/dcos-metrics-datadog
    echo "[Datadog Plugin] Downloaded dcos datadog metrics plugin."
    export DD_API_KEY=$DDAPI
    export AGENT_ROLE=$role
    sudo curl -s -o /etc/systemd/system/dcos-metrics-datadog.service https://downloads.mesosphere.io/dcos-metrics/plugins/datadog.service
    echo "[Datadog Plugin] Downloaded dcos-metrics-datadog.service."
    sudo sed -i "s/--dcos-role master/--dcos-role \$AGENT_ROLE/g;s/--datadog-key .*/--datadog-key \$DD_API_KEY/g" /etc/systemd/system/dcos-metrics-datadog.service
    echo "[Datadog Plugin] Updated dcos-metrics-datadog.service with DD API Key and agent role."
    sudo systemctl daemon-reload
    sudo systemctl start dcos-metrics-datadog.service
    echo "[Datadog Plugin] dcos-metrics-datadog.service is started !"
    servStatus=\$(sudo systemctl is-failed dcos-metrics-datadog.service)
    echo "[Datadog Plugin] dcos-metrics-datadog.service status : \${servStatus}"
    #sudo systemctl status dcos-metrics-datadog.service | head -3
    #sudo journalctl -u dcos-metrics-datadog
    EOF
    done
    
    echo "[Datadog Plugin] Temp script for master saved at : $tmpScriptMaster"
    echo "[Datadog Plugin] Temp script for agent saved at : $tmpScriptAgent"
    
    for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'` 
    do 
        echo -e "\n###> Node - $i"
        dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptAgent
        echo -e "======================================================="
    done
    
    for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'` 
    do 
        echo -e "\n###> Master Node - $i"
        dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptMaster
        echo -e "======================================================="
    done
    
    # Check status of dcos-metrics-datadog.service on all nodes.
    #for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'` ; do  echo -e "\n###> $i"; dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "sudo systemctl is-failed dcos-metrics-datadog.service"; echo -e "======================================\n"; done

    Get app / node metrics fetched by dcos-metrics component using metrics API

    • Get DC/OS node id [dcos node]
    • Get Node metrics (CPU, memory, local filesystems, networks, etc) : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/node
    • Get id of all containers running on that agent : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers
    • Get Resource allocation and usage for the given container ID : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>
    • Get Application-level metrics from the container (shipped in StatsD format using the listener available at STATSD_UDP_HOST and STATSD_UDP_PORT) : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>/app
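    The endpoints listed above all share one prefix, so they are easy to generate from a cluster URL, an agent id and, optionally, a container id. A minimal Python sketch follows; the function name metrics_urls and the ids in the example are hypothetical placeholders.

```python
def metrics_urls(cluster_url, agent_id, container_id=None):
    """Return the dcos-metrics API URLs for an agent and, optionally, a container."""
    base = "{}/system/v1/agent/{}/metrics/v0".format(cluster_url.rstrip("/"), agent_id)
    urls = {
        "node": base + "/node",
        "containers": base + "/containers",
    }
    if container_id is not None:
        urls["container"] = base + "/containers/" + container_id
        urls["app"] = base + "/containers/" + container_id + "/app"
    return urls


# Placeholder ids for illustration only.
for name, url in metrics_urls("http://dcos-cluster.com", "<agent_id>", "<container_id>").items():
    print(name, url)
```

    Each URL is then requested with the same Authorization: Bearer header used for the other DC/OS API calls in this post.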

    Get app / node metrics fetched by dcos-metrics component using dcos cli

    • Summary of container metrics for a specific task
    dcos task metrics summary <task-id>

    • All metrics in details for a specific task
    dcos task metrics details <task-id>

    • Summary of Node metrics for a specific node
    dcos node metrics summary <mesos-node-id>

    • All Node metrics in details for a specific node
    dcos node metrics details <mesos-node-id>

    NOTE: All of the above commands accept the '--json' flag so that they can be used programmatically.

    Launch / run command inside container for a task

    The DC/OS task exec cli only supports Mesos containers; this script supports both Mesos and Docker containers.

    #!/bin/bash
    
    echo "DCOS Task Exec 2.0"
    if [ "$#" -eq 0 ]; then
            echo "Need task name or id as input. Exiting."
            exit 1
    fi
    taskName=$1
    taskCmd=${2:-bash}
    TMP_TASKLIST_JSON=/tmp/dcostasklist.json
    dcos task --json > $TMP_TASKLIST_JSON
    taskExist=`cat /tmp/dcostasklist.json | jq --arg tname $taskName '.[] | if(.name == $tname ) then .name else empty end' -r | wc -l`
    if [[ $taskExist -eq 0 ]]; then 
            echo "No task with name $taskName exists."
            echo "Do you mean ?"
            dcos task | grep $taskName | awk '{print $1}'
            exit 1
    fi
    taskType=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '[.[] | select(.name == $tname)][0] | .container.type' -r`
    TaskId=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '[.[] | select(.name == $tname)][0] | .id' -r`
    if [[ $taskExist -ne 1 ]]; then
            echo -e "More than one instance found. Please enter the task ID for executing the command.\n"
            #allTaskIds=$(dcos task $taskName | tee /dev/tty | grep -v "NAME" | awk '{print $5}' | paste -s -d",")
            echo ""
            read TaskId
    fi
    if [[ $taskType !=  "DOCKER" ]]; then
            echo "Task [ $taskName ] is of type MESOS Container."
            execCmd="dcos task exec --interactive --tty $TaskId $taskCmd"
            echo "Running [$execCmd]"
            $execCmd
    else
            echo "Task [ $taskName ] is of type DOCKER Container."
            taskNodeIP=`dcos task $TaskId | awk 'FNR == 2 {print $2}'`
            echo "Task [ $taskName ] with task Id [ $TaskId ] is running on node [ $taskNodeIP ]."
            taskContID=`dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --private-ip=$taskNodeIP --master-proxy "docker ps -q --filter "label=MESOS_TASK_ID=$TaskId"" 2> /dev/null`
            taskContID=`echo $taskContID | tr -d '\r'`
            echo "Task Docker Container ID : [ $taskContID ]"
            echo "Running [ docker exec -it $taskContID $taskCmd ]"
            dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --private-ip=$taskNodeIP --master-proxy "docker exec -it $taskContID $taskCmd" 2>/dev/null
    fi

    Get DC/OS tasks by node

    #!/bin/bash 
    
    function tasksByNodeAPI
    {
        echo "DC/OS Tasks By Node"
        if [ "$#" -eq 0 ]; then
            echo "Need node ip as input. Exiting."
            exit 1
        fi
        nodeIp=$1
        mesosId=`dcos node | grep $nodeIp | awk '{print $3}'`
        if [ -z "$mesosId" ]; then
            echo "No node found with ip $nodeIp. Exiting."
            exit 1
        fi
        curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/mesos/tasks?limit=10000" | jq --arg mesosId $mesosId '.tasks[] | select (.slave_id == $mesosId and .state == "TASK_RUNNING") | .name + "\t\t\t" + .id'  -r
    }
    
    function tasksByNodeCLI
    {
            echo "DC/OS Tasks By Node"
            if [ "$#" -eq 0 ]; then
                    echo "Need node ip as input. Exiting."
                    exit 1
            fi
            nodeIp=$1
            dcos task | egrep "HOST|$nodeIp"
    }
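    The filtering done by the jq expression in tasksByNodeAPI can also be expressed in Python, which may be easier to extend. The sketch below uses a hypothetical, trimmed sample of the /mesos/tasks response that contains only the fields the filter reads (name, id, slave_id, state).

```python
# Hypothetical sample shaped like the /mesos/tasks response (fields trimmed).
SAMPLE_RESPONSE = {
    "tasks": [
        {"name": "kafka-0-broker", "id": "kafka-0-broker__aaa", "slave_id": "S1", "state": "TASK_RUNNING"},
        {"name": "web", "id": "web.bbb", "slave_id": "S2", "state": "TASK_RUNNING"},
        {"name": "batch", "id": "batch.ccc", "slave_id": "S1", "state": "TASK_FINISHED"},
    ]
}


def running_tasks_on(response, mesos_id):
    """Return (name, id) for every running task on the agent with the given Mesos id."""
    return [
        (task["name"], task["id"])
        for task in response.get("tasks", [])
        if task.get("slave_id") == mesos_id and task.get("state") == "TASK_RUNNING"
    ]


for name, task_id in running_tasks_on(SAMPLE_RESPONSE, "S1"):
    print(name + "\t\t\t" + task_id)
```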

    Get cluster metadata – cluster Public IP and cluster ID

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
      "$(dcos config show core.dcos_url)/metadata"

    Sample Output:

    {
    "PUBLIC_IPV4": "123.456.789.012",
    "CLUSTER_ID": "abcde-abcde-abcde-abcde-abcde-abcde"
    }

    Get DC/OS metadata – DC/OS version

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
      "$(dcos config show core.dcos_url)/dcos-metadata/dcos-version.json"

    Sample Output:

    {
    "version": "1.11.0",
    "dcos-image-commit": "b6d6ad4722600877fde2860122f870031d109da3",
    "bootstrap-id": "a0654657903fb68dff60f6e522a7f241c1bfbf0f"
    }

    Get Mesos version

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
      "$(dcos config show core.dcos_url)/mesos/version"

    Sample Output:

    {
    "build_date": "2018-02-27 21:31:27",
    "build_time": 1519767087.0,
    "build_user": "",
    "git_sha": "0ba40f86759307cefab1c8702724debe87007bb0",
    "version": "1.5.0"
    }

    Access DC/OS cluster exhibitor UI (Exhibitor supervises ZooKeeper and provides a management web interface)

    <CLUSTER_URL>/exhibitor

    Access DC/OS cluster data from cluster zookeeper using Zookeeper Python client – Run inside any node / container

    from kazoo.client import KazooClient
    
    zk = KazooClient(hosts='leader.mesos:2181', read_only=True)
    zk.start()
    
    clusterId = ""
    # Here we can give znode path to retrieve its decoded data,
    # for ex to get cluster-id, use
    # data, stat = zk.get("/cluster-id")
    # clusterId = data.decode("utf-8")
    
    # Get cluster Id
    if zk.exists("/cluster-id"):
        data, stat = zk.get("/cluster-id")
        clusterId = data.decode("utf-8")
    
    zk.stop()
    
    print (clusterId)

    Access dcos cluster data from cluster zookeeper using exhibitor rest API

    # Get znode data using endpoint :
    # /exhibitor/exhibitor/v1/explorer/node-data?key=/path/to/node
    # Example : Get znode data for path = /cluster-id
    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
      "$(dcos config show core.dcos_url)/exhibitor/exhibitor/v1/explorer/node-data?key=/cluster-id"

    Sample Output:

    {
    "bytes": "3333-XXXXXX",
    "str": "abcde-abcde-abcde-abcde-abcde-",
    "stat": "XXXXXX"
    }

    Get cluster name using Mesos API

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
      "$(dcos config show core.dcos_url)/mesos/state-summary" | jq .cluster -r

    Mark Mesos node as decommissioned

    Sometimes instances running as DC/OS nodes get terminated and cannot come back online; AWS EC2 instances, for example, cannot be restarted once terminated. When Mesos detects that a node has stopped, it puts the node in the UNREACHABLE state, because Mesos does not know whether the node is temporarily stopped and will come back online or is permanently gone. In such cases, if we know a node will not come back, we can explicitly tell Mesos to put it in the GONE state.

    dcos node decommission <mesos-agent-id>

    Conclusion

    We learned about Mesosphere DC/OS, its functionality and roles. We also learned how to set up and use the DC/OS cli, how to use HTTP authentication to access DC/OS APIs, and how to use the DC/OS cli for automating tasks.

    We went through different API endpoints like Mesos, Marathon, DC/OS metrics, exhibitor, DC/OS cluster organization etc. Finally, we looked at different tricks and scripts to automate DC/OS, like DC/OS node details, task exec, Docker report, DC/OS API http authentication etc.

  • Ensure Continuous Delivery On Kubernetes With GitOps’ Argo CD

    What is GitOps?

    GitOps is a Continuous Deployment model for cloud-native applications. In GitOps, the Git repositories that contain the declarative descriptions of the infrastructure are treated as the single source of truth for the desired state of the system, and an automated process ensures that the deployed state of the system always matches the state defined in the Git repository. All changes (deployment/upgrade/rollback) to the environment are triggered by changes (commits) made to the Git repository.

    The artifacts that we run on any environment always have a corresponding code for them on some Git repositories. Can we say the same thing for our infrastructure code?

    Infrastructure as code tools and fully declarative orchestration tools like Kubernetes allow us to represent the entire state of our system in a declarative way. GitOps intends to make use of this ability and make infrastructure-related operations more developer-centric.

    Role of Infrastructure as Code (IaC) in GitOps

    The ability to represent the infrastructure as code is at the core of GitOps. But just having version-controlled infrastructure as code doesn’t mean GitOps; we also need a mechanism that keeps (or tries to keep) our deployed state in sync with the state we define in the Git repository.

    Infrastructure as Code is necessary but not sufficient to achieve GitOps

    GitOps does pull-based deployments

    Most of the deployment pipelines we see today push changes to the deployed environment. For example, to upgrade our application to a newer version, we update its Docker image tag in some repository, which triggers a deployment pipeline that updates the deployed application. Here the changes are pushed to the environment. In GitOps, we just need to update the image tag in the Git repository for that environment, and the changes are pulled into the environment to match the updated state in the Git repository. The magic of keeping the deployed state in sync with the state defined in Git is achieved with the help of operators/agents. The operator is like a control loop that identifies differences between the deployed state and the desired state and makes sure they are the same.
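    The control-loop idea can be illustrated with a few lines of Python. This is a deliberately simplified sketch, not how any particular GitOps operator is implemented: states are plain dictionaries keyed by a resource name, plan_sync is a hypothetical function, and the prune flag is an assumption modelled on the common option of deleting resources that no longer exist in Git.

```python
def plan_sync(desired, live, prune=False):
    """Compare the desired state (from Git) with the live state and list the actions needed."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("update", name))
    if prune:
        # Resources present in the cluster but absent from Git are removed only on request.
        for name in sorted(set(live) - set(desired)):
            actions.append(("delete", name))
    return actions


desired = {
    "deployment/web": {"image": "web:v2"},
    "service/web": {"port": 80},
}
live = {
    "deployment/web": {"image": "web:v1"},
    "configmap/old": {},
}

print(plan_sync(desired, live))
print(plan_sync(desired, live, prune=True))
```

    A real operator runs this comparison continuously (or on every Git change) and then applies the resulting actions to the cluster; when the two states match, the list of actions is empty.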

    Key benefits of GitOps:

    1. All the changes are verifiable and auditable as they make their way into the system through Git repositories.
    2. Easy and consistent replication of the environment as Git repository is the single source of truth. This makes disaster recovery much quicker and simpler.
    3. More developer-centric experience for operating infrastructure. Also a smaller learning curve for deploying dev environments.
    4. Consistent rollback of application as well as infrastructure state.

    Introduction to Argo CD

    Argo CD is a continuous delivery tool that works on the principles of GitOps and is built specifically for Kubernetes. The product was developed and open-sourced by Intuit and is currently a part of CNCF.

    Key components of Argo CD:

    1. API Server: Just like K8s, Argo CD also has an API server that exposes APIs that other systems can interact with. The API server is responsible for managing the application, repository and cluster credentials, enforcing authentication and authorization, etc.
    2. Repository server: The repository server keeps a local cache of the Git repository, which holds the K8s manifest files for the application. This service is called by other services to get the K8s manifests.  
    3. Application controller: The application controller continuously watches the deployed state of the application and compares it with the desired state, reports to the API server whenever they are out of sync, and optionally takes corrective action as well. It is also responsible for executing user-defined hooks for various lifecycle events of the application.

    Key objects/resources in Argo CD:

    1. Application: Argo CD allows us to represent the instance of the application which we want to deploy in an environment by creating Kubernetes objects of a custom resource definition(CRD) named Application. In the specification of Application type objects, we specify the source (repository) of our application’s K8s manifest files, the K8s server where we want to deploy those manifests, namespace, and other information.
    2. AppProject: Just like Application, Argo CD provides another CRD named AppProject. AppProjects are used to logically group related applications.
    3. Repo Credentials: In the case of private repositories, we need to provide access credentials. For credentials, Argo CD uses the K8s secrets and config map. First, we create objects of secret types and then we update a special-purpose configuration map named argocd-cm with the repository URL and the secret which contains the credentials.
    4. Cluster Credentials: Along with Git repository credentials, we also need to provide the K8s cluster credentials. These credentials are also managed using K8s secrets; we are required to add the label argocd.argoproj.io/secret-type: cluster to these secrets.
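    As an illustration of the cluster credentials object described above, the Python sketch below builds such a secret as a dictionary and prints it as JSON, which kubectl apply -f also accepts. Treat it as a sketch under assumptions: the field layout (name, server and a JSON config with bearerToken and tlsClientConfig) follows my understanding of Argo CD's declarative setup, and the function name, cluster name, server URL and token are hypothetical placeholders.

```python
import json


def cluster_secret(name, server, bearer_token):
    """Build a K8s Secret that registers an external cluster with Argo CD."""
    # Connection settings are passed as a JSON string in the `config` field.
    config = {"bearerToken": bearer_token, "tlsClientConfig": {"insecure": False}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": name,
            "namespace": "argocd",
            # This label is what marks the secret as cluster credentials.
            "labels": {"argocd.argoproj.io/secret-type": "cluster"},
        },
        "type": "Opaque",
        "stringData": {
            "name": name,
            "server": server,
            "config": json.dumps(config),
        },
    }


# Placeholder values for illustration only.
secret = cluster_secret("staging-cluster", "https://staging.example.com", "<service-account-token>")
print(json.dumps(secret, indent=2))
```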

    Demo:

    Enough of theory, let’s try out the things we discussed above. For the demo, I have created a simple app named message-app. This app reads a message set in the environment variable named MESSAGE. We will populate the values of this environment variable using a K8s config map. I have kept the K8s manifest files for the app in a separate repository. We have the application and the K8s manifest files ready. Now we are all set to install Argo CD and deploy our application.

    Installing Argo CD:

    For installing Argo CD, we first need to create a namespace named argocd.

    kubectl create namespace argocd
    kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

    Applying the files from the argo-cd repo directly is fine for demo purposes, but in actual environments you should copy the files into your own repository before applying them.

    We can see that this command has created the core components and CRDs we discussed earlier in the blog. There are some additional resources as well but we can ignore them for the time being.

    Accessing the Argo CD GUI

    We now have Argo CD running in our cluster. Argo CD also provides a GUI that gives us a graphical representation of our k8s objects. It allows us to view events, pod logs, and other configurations.

    By default, the GUI service is not exposed outside the cluster. Let us update its service type to LoadBalancer so that we can access it from outside.

    kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'

    After this, we can use the external IP of the argocd-server service and access the GUI. 

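    The initial username is admin. For Argo CD v1.8 and earlier, the initial password is the name of the api-server pod, which can be obtained by listing the pods in the argocd namespace or directly with the command below. For Argo CD v1.9 and later, the initial password is instead generated automatically and stored in a secret named argocd-initial-admin-secret in the argocd namespace.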

    kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server -o name | cut -d'/' -f 2 

    Deploy the app:

    Now let’s go ahead and create our application for the staging environment for our message app.

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: message-app-staging
      namespace: argocd
      labels:
        environment: staging
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
    
      # Source of the application manifests
      source:
        repoURL: https://github.com/akash-gautam/message-app-manifests.git
        targetRevision: HEAD
        path: manifests/staging
    
      # Destination cluster and namespace to deploy the application
      destination:
        server: https://kubernetes.default.svc
        namespace: staging
    
      syncPolicy:
        automated:
          prune: false
          selfHeal: false

    In the application spec, we have specified the repository, where our manifest files are stored and also the path of the files in the repository. 

    We want to deploy our app in the same k8s cluster where ArgoCD is running so we have specified the local k8s service URL in the destination. We want the resources to be deployed in the staging namespace, so we have set it accordingly.

    In the sync policy, we have enabled automated sync. We have kept the project as default. 

    Adding the resources-finalizer.argocd.argoproj.io ensures that all the resources created for the application are deleted when the Application is deleted. This is fine for our demo setup but might not always be desirable in real-life scenarios.

    Our git repos are public so we don’t need to create secrets for git repo credentials.

    We are deploying in the same cluster where Argo CD itself is running. As this is a demo setup, we can use the admin user created by Argo CD, so we don’t need to create secrets for cluster credentials either.

    Now let’s go ahead and create the application and see the magic happen.

    kubectl apply -f message-app-staging.yaml

    As soon as the application is created, we can see it on the GUI. 

    By clicking on the application, we can see all the Kubernetes objects created for it.

    It also shows the objects which are indirectly created by the objects we create. In the above image, we can see the replica set and endpoint object which are created as a result of creating the deployment and service respectively.

    We can also click on the individual objects and see their configuration. For pods, we can see events and logs as well.

    Now that our app is deployed, we can grab the public IP of the message-app service and access it in the browser.

    We can see that our app is deployed and accessible.

    Updating the app

    To update our application, all we need to do is commit our changes to the GitHub repository. We know the message-app just displays the message we pass to it via a ConfigMap, so let’s update the message and push it to the repository.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: message-configmap
      labels:
        app: message-app
    data:
      MESSAGE: "This too shall pass" #Put the message you want to display here.

    Once the commit is done, Argo CD will start to sync again.

    Once the sync is done, we will restart our message app pod, so that it picks up the latest values in the config map. Then we need to refresh the browser to see updated values.

    As we discussed earlier, for making any changes to the environment, we just need to update the repo which is being used as the source for the environment and then the changes will get pulled in the environment. 

    We can follow exactly the same approach to deploy the application to the production environment as well. We just need to create a new application object and set the manifest path and deployment namespace accordingly.

    Conclusion: 

    It’s still early days for GitOps, but it has already been successfully implemented at scale by many organizations. As GitOps tools mature along with the ever-growing adoption of Kubernetes, I think many organizations will consider adopting GitOps soon. GitOps is not limited to Kubernetes, but the completely declarative nature of Kubernetes makes GitOps simpler to achieve. Argo CD is a deployment tool that’s tailored for Kubernetes and allows us to do deployments in a Kubernetes-native way while following the principles of GitOps. I hope this blog helped you understand the how, what, and why of GitOps and gave you some insight into Argo CD.

  • Exploring Upgrade Strategies for Stateful Sets in Kubernetes

    Introduction

    In the age of continuous delivery and agility, where software is deployed tens of times per day (and sometimes per hour) using container orchestration platforms, a seamless upgrade mechanism becomes a critical aspect of any technology adoption, Kubernetes being no exception.

    Kubernetes provides a variety of controllers that define how pods are set up and deployed within the Kubernetes cluster. These controllers can group pods together according to their runtime needs and can be used to define pod replication and pod startup ordering. Kubernetes controllers are nothing but application patterns. The controller manages the pods (the smallest unit in Kubernetes), so you don’t need to create, manage, and delete the pods yourself. Kubernetes has several types of controllers, such as:

    1. Deployment
    2. Statefulset
    3. Daemonset
    4. Job
    5. Replica sets

    Each controller represents an application pattern. For example, Deployment represents the stateless application pattern, in which you don’t store the state of your application. StatefulSet represents the stateful application pattern, where you store data, for example, databases and message queues. We will focus on the StatefulSet controller and its update feature in this blog.

    Statefulset

    The StatefulSet acts as a controller in Kubernetes to deploy applications according to a specified rule set and is aimed at persistent, stateful applications. It provides ordered and graceful deployment. A StatefulSet is generally used with distributed applications that require each node to have a persistent state and the ability to configure an arbitrary number of nodes. StatefulSet pods have a unique identity that consists of an ordinal, a stable network identity, and stable storage. The identity sticks to the pod, regardless of which node it’s scheduled on. For more details check here.

    Update Strategies for StatefulSets

    There are a couple of different strategies available for upgrades – Blue/Green and Rolling updates. Let’s review them in detail:

    Blue-Green Deployment

    Blue-green deployment is one of the commonly used update strategies. This strategy uses two identical environments of your application. The Blue environment runs the current deployment, and the Green environment runs the new deployment to which we want to upgrade. The approach is simple:

    1. Switch the load balancer to route traffic to the Green environment.
    2. Delete the Blue environment once the Green environment is verified. 

    Disadvantages of Blue-Green deployment:

    1. One of the disadvantages of this strategy is that all current transactions and sessions will be lost, due to the physical switch from one machine serving the traffic to another.
    2. Implementing blue-green deployment becomes complex when a database is involved, especially if the database schema changes across versions.
    3. Blue-green deployment needs extra cloud setup/hardware, which increases the overall cost.

    Rolling update strategy

    After Blue-Green deployment, let’s take a look at rolling updates and how they work.

    1. In short, as the name suggests, this strategy replaces currently running instances of the application with new instances, one by one.
    2. In this strategy, health checks play an important role, i.e., old instances of the application are removed only if the new instances are healthy. Because of this, the existing deployment becomes heterogeneous while moving from the old version of the application to the new version.
    3. The benefit of this strategy is that it takes an incremental approach to rolling out the update, and verification happens in parallel while traffic to the new version increases.
    4. In the rolling update strategy, you don’t need extra hardware/cloud setup, so it’s a cost-effective upgrade technique.

    Statefulset upgrade strategies

    With a basic understanding of upgrade strategies, let’s explore the update strategies available for StatefulSets in Kubernetes. StatefulSets are used for databases, where the state of the application is a crucial part of the deployment. We will take the example of Cassandra to learn about the StatefulSet upgrade feature, and we will use gce-pd storage to store the data. StatefulSets (since Kubernetes 1.7) use an update strategy to configure and disable automated rolling updates for containers, labels, resource requests/limits, and annotations for their pods. The update strategy is configured using the updateStrategy field.

    The updateStrategy field accepts one of the following values:

    1. OnDelete
    2. RollingUpdate

    OnDelete update strategy

    OnDelete prevents the controller from automatically updating its pods. You need to delete each pod manually for the changes to take effect. It is essentially a manual update process for the StatefulSet application, and this is the main difference between the OnDelete and RollingUpdate strategies. The OnDelete update strategy is useful when the user needs to perform some actions or verification after each pod is updated. For example, after updating a single Cassandra pod, the user might need to check whether the updated pod joined the Cassandra cluster correctly.

    We will first create a StatefulSet. Let’s take a simple example of Cassandra and deploy it using a StatefulSet controller. Persistent storage is the key point in the StatefulSet controller. You can read more about the storage class here.

    For the purpose of this blog, we will use the Google Kubernetes Engine.

    • First, define the storage class as follows:
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: fast
    provisioner: kubernetes.io/gce-pd
    parameters:
      type: pd-ssd

    • Then create the Storage class using kubectl:
    $ kubectl create -f storage_class.yaml

    • Here is the YAML file for the Cassandra service and the Statefulset deployment.
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: cassandra
      name: cassandra
    spec:
      clusterIP: None
      ports:
      - port: 9042
      selector:
        app: cassandra
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: cassandra
      labels:
        app: cassandra
    spec:
      serviceName: cassandra
      replicas: 3
      updateStrategy:
        type: OnDelete
      selector:
        matchLabels:
          app: cassandra
      template:
        metadata:
          labels:
            app: cassandra
        spec:
          terminationGracePeriodSeconds: 1800
          containers:
          - name: cassandra
            image: gcr.io/google-samples/cassandra:v12
            imagePullPolicy: Always
            ports:
            - containerPort: 7000
              name: intra-node
            - containerPort: 7001
              name: tls-intra-node
            - containerPort: 7199
              name: jmx
            - containerPort: 9042
              name: cql
            resources:
              limits:
                cpu: "500m"
                memory: 1Gi
              requests:
                    cpu: "500m"
                    memory: 1Gi
            securityContext:
              capabilities:
                add:
                  - IPC_LOCK
            lifecycle:
              preStop:
                exec:
                  command: 
                  - /bin/sh
                  - -c
                  - nodetool drain
            env:
              - name: MAX_HEAP_SIZE
                value: 512M
              - name: HEAP_NEWSIZE
                value: 100M
              - name: CASSANDRA_SEEDS
                value: "cassandra-0.cassandra.default.svc.cluster.local"
              - name: CASSANDRA_CLUSTER_NAME
                value: "K8Demo"
              - name: CASSANDRA_DC
                value: "DC1-K8Demo"
              - name: CASSANDRA_RACK
                value: "Rack1-K8Demo"
              - name: POD_IP
                valueFrom:
                  fieldRef:
                    fieldPath: status.podIP
            readinessProbe:
              exec:
                command:
                - /bin/bash
                - -c
                - /ready-probe.sh
              initialDelaySeconds: 15
              timeoutSeconds: 5
            volumeMounts:
            - name: cassandra-data
              mountPath: /cassandra_data
      volumeClaimTemplates:
      - metadata:
          name: cassandra-data
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: "fast"
          resources:
            requests:
              storage: 5Gi

    • Let’s create the Statefulset now.
    $ kubectl create -f cassandra.yaml

    • After creating Cassandra Statefulset, if you check the running pods then you will find something like,
    $ kubectl get pods
    NAME          READY   STATUS    RESTARTS   AGE
    cassandra-0   1/1     Running   0          2m
    cassandra-1   1/1     Running   0          2m
    cassandra-2   1/1     Running   0          2m

    • Check whether the Cassandra cluster has formed correctly using the following command:
    $ kubectl exec -it cassandra-0 -- nodetool status
    Datacenter: DC1-K8Demo
    ======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address          Load        Tokens  Owns   Host ID                               Rack
    UN  192.168.4.193    101.15 KiB  32      72.0%  abd9f52d-85ef-44ee-863c-e1b174cd9412  Rack1-K8Demo
    UN  192.168.199.67   187.81 KiB  32      72.8%  c40e89e4-44fe-4fc2-9e8a-863b6a74c90c  Rack1-K8Demo
    UN  192.168.187.196  131.42 KiB  32      55.2%  c235505c-eec5-43bc-a4d9-350858814fe5  Rack1-K8Demo

    • Let’s describe the running pod before updating. Look for the Image field in the output of the following command:
    $ kubectl describe pod cassandra-0

    • The Image field will show gcr.io/google-samples/cassandra:v12. Now, let’s patch the Cassandra StatefulSet with the image to which we want to update. The new image might contain a new Cassandra version or database schema changes. Before upgrading such crucial components, it’s always safer to take a backup of the data first.
    $ kubectl patch statefulset cassandra --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"gcr.io/google-samples/cassandra:v13"}]'

    You will see output such as `statefulset.apps “cassandra” patched`, but the controller won’t update the running pods automatically in this strategy. You need to delete each pod and wait until a pod with the new configuration comes up. Let’s try deleting the cassandra-0 pod.

    $ kubectl delete pod cassandra-0

    • Wait until cassandra-0 comes up in the Running state, and then check that it is running the intended, updated image, i.e., gcr.io/google-samples/cassandra:v13. Now cassandra-0 is running the new image, while cassandra-1 and cassandra-2 are still running the old image. In this strategy, you need to delete these pods as well for the new image to take effect.
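    The OnDelete behavior walked through above can be modeled in a few lines of Python. This is only a toy simulation under stated assumptions, not the Kubernetes controller: the ToyStatefulSet class and its methods are hypothetical names invented for illustration. It shows that patching changes only the pod template, and a pod picks up the new image only after it has been deleted and recreated.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStatefulSet:
    """Toy model (hypothetical) of a StatefulSet using the OnDelete strategy."""
    name: str
    replicas: int
    template_image: str
    pods: dict = field(default_factory=dict)  # pod name -> image it is running

    def __post_init__(self):
        # Initially every pod is created from the current template.
        for ordinal in range(self.replicas):
            self.pods[f"{self.name}-{ordinal}"] = self.template_image

    def patch_image(self, image):
        # OnDelete: patching only changes the template; running pods are untouched.
        self.template_image = image

    def delete_pod(self, pod_name):
        # A deleted pod is recreated from the *current* template.
        if pod_name not in self.pods:
            raise KeyError(pod_name)
        self.pods[pod_name] = self.template_image


sts = ToyStatefulSet("cassandra", 3, "gcr.io/google-samples/cassandra:v12")
sts.patch_image("gcr.io/google-samples/cassandra:v13")
sts.delete_pod("cassandra-0")
for pod, image in sorted(sts.pods.items()):
    print(pod, image)
```

    After the single delete, only cassandra-0 reports the v13 image; the other two pods keep v12 until they are deleted too.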

    Rolling update strategy

    Rolling update is an automated update process. In this strategy, the controller deletes and then recreates each of its pods. Pods get updated one at a time. While updating, the controller makes sure that an updated pod is running and in the Ready state before updating its predecessor. The pods in the StatefulSet are updated in reverse ordinal order (the same as pod termination order, i.e., from the largest ordinal to the smallest).
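    The ordering rule described above can be sketched as follows. This is a simplified illustration, not the controller's implementation: both function names are made up, and becomes_ready is a hypothetical callback standing in for the readiness probe.

```python
def rolling_update_order(name, replicas):
    """Return pod names in the order a RollingUpdate replaces them (highest ordinal first)."""
    return [f"{name}-{ordinal}" for ordinal in range(replicas - 1, -1, -1)]


def simulate_rolling_update(name, replicas, becomes_ready):
    """Update pods one at a time, stopping at the first pod that does not become Ready.

    becomes_ready is a hypothetical stand-in for the pod's readiness probe.
    Returns the pods that were updated and became Ready.
    """
    updated = []
    for pod in rolling_update_order(name, replicas):
        if not becomes_ready(pod):
            break  # the rollout waits here; lower ordinals stay on the old version
        updated.append(pod)
    return updated


print(rolling_update_order("cassandra", 3))
# Pretend cassandra-1 never passes its readiness probe after the update.
print(simulate_rolling_update("cassandra", 3, lambda pod: pod != "cassandra-1"))
```

    In the second call the rollout stops at cassandra-1, so cassandra-0 is never touched. This is why a failing readiness probe stalls the update for the remaining pods.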

    For the rolling update strategy, we will create the Cassandra StatefulSet with the .spec.updateStrategy field set to RollingUpdate:

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: cassandra
      name: cassandra
    spec:
      clusterIP: None
      ports:
      - port: 9042
      selector:
        app: cassandra
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: cassandra
      labels:
        app: cassandra
    spec:
      serviceName: cassandra
      replicas: 3
      updateStrategy:
        type: RollingUpdate
      selector:
        matchLabels:
          app: cassandra
      template:
        metadata:
          labels:
            app: cassandra
        spec:
          terminationGracePeriodSeconds: 1800
          containers:
          - name: cassandra
            image: gcr.io/google-samples/cassandra:v12
            imagePullPolicy: Always
            ports:
            - containerPort: 7000
              name: intra-node
            - containerPort: 7001
              name: tls-intra-node
            - containerPort: 7199
              name: jmx
            - containerPort: 9042
              name: cql
            resources:
              limits:
                cpu: "500m"
                memory: 1Gi
              requests:
                    cpu: "500m"
                    memory: 1Gi
            securityContext:
              capabilities:
                add:
                  - IPC_LOCK
            lifecycle:
              preStop:
                exec:
                  command: 
                  - /bin/sh
                  - -c
                  - nodetool drain
            env:
              - name: MAX_HEAP_SIZE
                value: 512M
              - name: HEAP_NEWSIZE
                value: 100M
              - name: CASSANDRA_SEEDS
                value: "cassandra-0.cassandra.default.svc.cluster.local"
              - name: CASSANDRA_CLUSTER_NAME
                value: "K8Demo"
              - name: CASSANDRA_DC
                value: "DC1-K8Demo"
              - name: CASSANDRA_RACK
                value: "Rack1-K8Demo"
              - name: POD_IP
                valueFrom:
                  fieldRef:
                    fieldPath: status.podIP
            readinessProbe:
              exec:
                command:
                - /bin/bash
                - -c
                - /ready-probe.sh
              initialDelaySeconds: 15
              timeoutSeconds: 5
            volumeMounts:
            - name: cassandra-data
              mountPath: /cassandra_data
      volumeClaimTemplates:
      - metadata:
          name: cassandra-data
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: "fast"
          resources:
            requests:
              storage: 5Gi

    • To try the rolling update feature, we can patch the existing statefulset with the updated image.
    $ kubectl patch statefulset cassandra --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"gcr.io/google-samples/cassandra:v13"}]'

    • Once you execute the above command, monitor the output of the following command,
    $ kubectl get pods -w

    If a failure occurs during the update process, the controller restores any pod that fails during the update to its current version, i.e., pods that have already received the update will be restored to the updated version, and pods that have not yet received the update will be restored to the previous version.
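    Both walkthroughs change the image with a JSON patch. As a rough sketch of what that patch does, the snippet below builds the same patch document in Python and applies it to a trimmed-down, in-memory copy of the StatefulSet spec. The helper names are hypothetical, and apply_replace_ops handles only the "replace" operation with simple paths, which is all this example needs.

```python
import json


def image_patch(image, container_index=0):
    """Build the JSON Patch document passed to `kubectl patch --type='json'`."""
    return [{
        "op": "replace",
        "path": f"/spec/template/spec/containers/{container_index}/image",
        "value": image,
    }]


def apply_replace_ops(document, ops):
    """Apply 'replace' operations to a nested dict/list structure in place.

    Only the subset of JSON Patch needed for this example is handled.
    """
    for op in ops:
        if op["op"] != "replace":
            raise ValueError("only 'replace' is supported in this sketch")
        *parents, last = op["path"].lstrip("/").split("/")
        target = document
        for key in parents:
            target = target[int(key)] if isinstance(target, list) else target[key]
        if isinstance(target, list):
            target[int(last)] = op["value"]
        else:
            if last not in target:
                raise KeyError(last)  # 'replace' requires the field to exist
            target[last] = op["value"]
    return document


# Trimmed-down stand-in for the Cassandra StatefulSet object.
statefulset = {"spec": {"template": {"spec": {"containers": [
    {"name": "cassandra", "image": "gcr.io/google-samples/cassandra:v12"},
]}}}}

patch = image_patch("gcr.io/google-samples/cassandra:v13")
print(json.dumps(patch))
apply_replace_ops(statefulset, patch)
print(statefulset["spec"]["template"]["spec"]["containers"][0]["image"])
```

    Generating the patch with json.dumps instead of typing it by hand is one way to avoid shell-quoting mistakes when the command is scripted.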

    Partitioning a RollingUpdate (Staging an Update)

    The updateStrategy contains one more field for partitioning the RollingUpdate. If a partition is specified, all pods with an ordinal greater than or equal to the partition value will be updated, and pods with an ordinal less than the partition will not be updated. If pods with an ordinal less than the partition get deleted, they will be recreated with the old definition/version. Partitioned rolling updates are useful when you want to stage an update, roll out a canary, or perform a phased rollout.

    RollingUpdate supports a partitioning option. You can define the partition parameter under .spec.updateStrategy.rollingUpdate:

    $ kubectl patch statefulset cassandra -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'

    In the above command, we set the partition value to 2, which patches the Cassandra StatefulSet so that whenever we update it, only the cassandra-2 pod is updated. Let’s try patching the existing StatefulSet with an updated image.

    $ kubectl patch statefulset cassandra --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"gcr.io/google-samples/cassandra:v14"}]'

    After patching, watch the following command output,

    $ kubectl get pods -w

    You can keep decrementing the partition value, and each time the pods whose ordinal is greater than or equal to the new value will pick up the applied patch. For example, if you patch the StatefulSet with partition=0, then all the pods of the Cassandra StatefulSet will be updated with the provided upgrade configuration.
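    The partition rule can be checked with a small helper. This is a minimal sketch of the rule as described above (ordinals greater than or equal to the partition are updated), and the function name is made up for illustration.

```python
def pods_updated_by_partition(name, replicas, partition):
    """Pods that a partitioned RollingUpdate moves to the new revision.

    Only ordinals >= partition are updated, highest ordinal first.
    """
    return [f"{name}-{o}" for o in range(replicas - 1, -1, -1) if o >= partition]


# Walk the partition down from 3 to 0 for the three-replica Cassandra example.
for partition in (3, 2, 1, 0):
    print(partition, pods_updated_by_partition("cassandra", 3, partition))
```

    With three replicas, a partition of 3 updates nothing and a partition of 2 updates only cassandra-2. Each decrement adds one more pod, until a partition of 0 covers the whole StatefulSet.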

    Verifying if the upgrade was successful

    Verifying the upgrade is an important step in concluding it. This step might differ from application to application. In this blog we have taken Cassandra as the example, so we will verify that the cluster of Cassandra nodes has formed properly.

    Use the `nodetool status` command to verify the cluster. After upgrading all the pods, you might want to run some post-processing, such as migrating the schema, if your upgrade requires it.
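    If you want to script this check, one possible approach is to parse the `nodetool status` output and flag every node that is not in the UN (Up/Normal) state. The sketch below assumes output in the format shown earlier in this blog. The function name is hypothetical, and the "down" row in the second call is fabricated purely to exercise the failure path.

```python
def unhealthy_nodes(nodetool_status_output):
    """Return (address, state) pairs for nodes whose state is not UN (Up/Normal)."""
    problems = []
    for line in nodetool_status_output.splitlines():
        fields = line.split()
        # Node rows start with a two-letter code: U/D (up/down) + N/L/J/M (state).
        is_node_row = (
            len(fields) >= 2
            and len(fields[0]) == 2
            and fields[0][0] in "UD"
            and fields[0][1] in "NLJM"
        )
        if is_node_row and fields[0] != "UN":
            problems.append((fields[1], fields[0]))
    return problems


SAMPLE = """\
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load        Tokens  Owns   Host ID                               Rack
UN  192.168.4.193    101.15 KiB  32      72.0%  abd9f52d-85ef-44ee-863c-e1b174cd9412  Rack1-K8Demo
UN  192.168.199.67   187.81 KiB  32      72.8%  c40e89e4-44fe-4fc2-9e8a-863b6a74c90c  Rack1-K8Demo
UN  192.168.187.196  131.42 KiB  32      55.2%  c235505c-eec5-43bc-a4d9-350858814fe5  Rack1-K8Demo
"""

print(unhealthy_nodes(SAMPLE))
# Fabricated failure case: mark one node as Down/Normal.
print(unhealthy_nodes(SAMPLE.replace("UN  192.168.199.67", "DN  192.168.199.67")))
```

    An empty list means every node reported Up/Normal. With the OnDelete strategy, a check like this could be run after each pod deletion before moving on to the next pod.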

    Depending on the upgrade strategy, you can verify your application in the following ways.

    1. With the OnDelete update strategy, you can update the pods one by one and check the application status after each one to make sure the upgrade is working fine.
    2. With the RollingUpdate strategy, you can check the application status once all the running pods of your application have been upgraded.

    For an application like Cassandra, the OnDelete strategy is preferable to RollingUpdate. In a rolling update, we saw that Cassandra pods get updated one by one, from the highest to the lowest ordinal index. It can happen that after two pods are updated, the Cassandra cluster goes into a failed state, and you cannot step in and recover it between pods the way you can with the OnDelete strategy. You have to try to recover Cassandra once the complete upgrade is done, i.e., once all the pods have been upgraded to the provided image. If you have to use a rolling update, try partitioning it.

    Conclusion

    In this blog, we went through the Kubernetes controllers, focusing mainly on StatefulSets. We learned about the differences between the blue-green deployment and rolling update strategies, and then we worked through the Cassandra StatefulSet example and successfully upgraded it with the OnDelete and RollingUpdate strategies. Do let us know if you have any questions or additional thoughts in the comments section below.

  • Monitoring a Docker Container with Elasticsearch, Kibana, and Metricbeat

    Since you are on this page, you have probably already started using Docker to deploy your applications and are enjoying it compared to virtual machines, because it is lightweight and easy to deploy and has exceptional security management features.

    And, once the applications are deployed, monitoring your containers and tracking their activities in real time is very essential. Imagine a scenario where you are managing one or many virtual machines. Your pre-configured session will be doing everything, including monitoring. If you face any problems during production, then—with a handful of commands such as top, htop, iotop, and with flags like -o, %CPU, and %MEM—you are good to troubleshoot the issue.

    On the other hand, consider a scenario where you have the same nodes spread across 100-200 containers. You will need to see all activity in one place to query for information about what happened. Here, monitoring comes into the picture. We will be discussing more benefits as we move further.

    This blog will cover Docker monitoring with Elasticsearch, Kibana, and Metricbeat. Basically, Elasticsearch is a platform that allows us to have distributed search and analysis of data in real-time along with visualization. We’ll be discussing how all these work interdependently as we move ahead. Like Elasticsearch, Kibana is also open-source software. Kibana is an interface mainly used to visualize the data sent from Elasticsearch. Metricbeat is a lightweight shipper of collected metrics from your system to the desired target (Elasticsearch in this case). 

    What is Docker Monitoring?

    In simple terms, monitoring containers is how we keep track of the above metrics and analyze them to ensure the performance of applications built on microservices and to keep track of issues so that they can be solved more easily. This monitoring is vital for performance improvement and optimization and to find the RCA of various issues.

    There is a lot of software available for monitoring Docker containers, both open-source and proprietary, such as Prometheus, AppOptics, Metricbeat, Datadog, Sumo Logic, etc.

    You can choose any of these based on convenience. 

    Why is Docker Monitoring needed?

    1. Monitoring helps detect and fix issues early, avoiding breakdowns in production.
    2. New features and updates can be rolled out safely because the entire application is monitored.
    3. Docker monitoring is beneficial for developers, IT pros, and enterprises as well.
    • For developers, Docker monitoring tracks bugs and helps to resolve them quickly along with enhancing security.
    • For IT pros, it helps with flexible integration of existing processes and enterprise systems and satisfies all the requirements.
    • For enterprises, it helps to build the application within a certified container within a secured ecosystem that runs smoothly. 

    Elasticsearch is a platform that allows us to have distributed search and analysis of data in real-time, along with visualization. Elasticsearch is free and open-source software. It goes well with a huge number of technologies, like Metricbeat, Kibana, etc. Let’s move onto the installation of Elasticsearch.

    Installation of Elasticsearch:

    Prerequisite: Elasticsearch is built in Java, so make sure that your system has at least Java 8 to run Elasticsearch.

    For installing Elasticsearch for your OS, please follow the steps at Installing Elasticsearch | Elasticsearch Reference [7.11].

    After installing, check the status of Elasticsearch by sending an HTTP request to port 9200 on localhost.

    http://localhost:9200/

    This will give you a response as below:

    You can configure Elasticsearch by editing $ES_HOME/config/elasticsearch.yml 

    Learn more about configuring Elasticsearch here.

    Now, we are done with the Elasticsearch setup and are ready to move onto Kibana.

    Kibana:

    Like Elasticsearch, Kibana is also open-source software. Kibana is an interface mainly used to visualize the data from Elasticsearch. Kibana lets you explore that data through queries and generate numerous visuals as per your requirements. It lets you visualize enormous amounts of data as line graphs, gauges, and many other chart types.

    Let’s cover the installation steps of Kibana.

    Installing Kibana

    Prerequisites: 

    • Must have Java1.8+ installed 
    • Elasticsearch v1.4.4+
    • Web browser such as Chrome, Firefox

    For installing Kibana with respect to your OS, please follow the steps at Install Kibana | Kibana Guide [7.11]

    Kibana runs on default port number 5601. Just send an HTTP request to port 5601 on localhost with http://localhost:5601/ 

    You should land on the Kibana dashboard, and it is now ready to use:

    You can configure Kibana by editing kibana.yml in $KIBANA_HOME/config. For more about configuring Kibana, visit here.

    Let’s move onto the final part—setting up with Metricbeat.

    Metricbeat

    Metricbeat sends metrics frequently, and we can say it’s a lightweight shipper of collected metrics from your system.

    You can simply install Metricbeat on your system or servers to periodically collect metrics from the OS and from the services running on them. The collected metrics are shipped to the output you specify, e.g., Elasticsearch or Logstash.

    Installing Metricbeat

    For installing Metricbeat according to your OS, follow the installation steps in the Metricbeat Reference [7.11].

    As soon as we start the Metricbeat service, it sends Docker metrics to the Elasticsearch index, which can be confirmed by curling Elasticsearch indexes with the command:

    curl -XGET 'localhost:9200/_cat/indices?v&pretty'
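    The same check can be scripted. The sketch below is one possible way to do it in Python, assuming Elasticsearch is reachable at localhost:9200 and that Metricbeat uses its default index naming, which starts with "metricbeat-". The function names and the sample rows are made up for illustration, and fetch_indices is defined but not called here, so the example runs without a live cluster.

```python
import json
import urllib.request


def fetch_indices(base_url="http://localhost:9200"):
    """Fetch the index list as JSON (the _cat APIs accept format=json)."""
    with urllib.request.urlopen(f"{base_url}/_cat/indices?format=json") as resp:
        return json.load(resp)


def metricbeat_indices(rows):
    """Keep only the rows whose index name starts with 'metricbeat-'."""
    return [row for row in rows if row["index"].startswith("metricbeat-")]


# Hypothetical sample of what fetch_indices() might return; the values are made up.
sample_rows = [
    {"index": ".kibana_1", "health": "green", "docs.count": "42"},
    {"index": "metricbeat-7.11.0-2021.02.20-000001", "health": "yellow", "docs.count": "1523"},
]

for row in metricbeat_indices(sample_rows):
    print(row["index"], row["health"], row["docs.count"])
```

    Against a running setup, you would pass fetch_indices() instead of sample_rows. A docs.count that keeps growing between calls is a quick sign that Metricbeat is shipping data.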

    How Are They Internally Connected?

    We have now installed all three, and they are up and running. At the interval set by the period option, the Docker module configured in docker.yml queries the Docker API, and Metricbeat sends the resulting Docker metrics to Elasticsearch. Those metrics are then available in different Elasticsearch indexes. As mentioned earlier, Kibana queries the data in Elasticsearch and visualizes it in the form of graphs. This is how all three are connected.

    Please refer to the flow chart for more clarification:

    How to Create Dashboards?

    Now that we are aware of how these three tools work interdependently, let’s create dashboards to monitor our containers and understand those. 

    First of all, open the Dashboards section on Kibana (localhost:5601/) and click the Create dashboard button:

     

    You will be directed to the next page:

    Choose the type of visualization you want from all options:

    For example, let’s go with Lens

    (Learn more about Kibana Lens)

    Here, we will be looking for the number of containers vs. timestamps by selecting the timestamp on X-axis and the unique count of docker.container.created on Y-axis.

    As soon as we have selected both parameters, it will generate a graph as shown in the snapshot, and we will get the count of created containers (here Count=1). If you create more containers on your system, the graph and the counter will change once those metrics reach Kibana. In this way, you can monitor how many containers are created over time. In similar fashion, depending on your monitoring needs, you can choose a parameter from the left panel showing available fields like:

    activemq.broker.connections.count

    docker.container.status

    docker.container.tags

    Now, we will show one more example of how to create a bar graph:

    As mentioned above, to create a bar graph, just choose “vertical bar” from the above snapshot. Here, I’m trying to get a bar graph of the count of documents vs. metricset names, such as network, file system, cpu, etc. So, as shown on the left side of the snapshot, choose count as the Y-axis parameter, and, as shown on the right side, choose metricset.name as the X-axis parameter.

    After hitting enter, a graph will be generated: 

    Similarly, you can try it out with multiple parameters with different types of graphs to monitor. Now, we will move onto the most important and widely used monitoring tool to track warnings, errors, etc., which is DISCOVER.

    Discover for Monitoring:

    Basically, Discover provides deep insights into data and lets you apply searches and filters as well. With it, you can find which processes are taking more time and show only those. You can filter for errors by using the message filter with a value of ERROR, check the health of a container, or check for logged-in users. Queries like these work much like SQL queries and return the results you need for good container monitoring.

    [More about Discover here.]

    To apply filters, just click on the “filter by type” from the left panel, and you will see all available filtering options. From there, you can select one as per your requirements, and view those on the central panel. 

    Similar to filter, you can choose fields to be shown on the dashboard from the left panel with “Selected fields” right below the filters. (Here, we have only selected info for Source.)

    Now, if you take a look at the top part of the snapshot, you will find the search bar. This is the most useful part of Discover for monitoring.

    In that bar, you just need to put a query, and according to that query, logs will be filtered. For example, I will be putting a query for error messages equal to No memory stats data available.

    When we hit the update button on the right side, only logs containing that error message will be there and highlighted for differentiation, as shown in the snapshot. All other logs will not be shown. In this way, you can track a particular error and ensure that it does not exist after fixing it.

    In addition to query, it also provides keyword search. So, if you input a word like warning, error, memory, or user, then it will provide logs for that word, like “memory” in the snapshot:

     

    Similar to Kibana, we also receive logs in the terminal. For example, the following highlighted portion is about the state of your cluster. In the terminal, you can put a simple grep command for required logs. 

    With this, you can monitor Docker containers with multiple queries, such as nested queries for the Discover facility. There are many different graphs you can try depending on your requirements to keep your application running smoothly.
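    The error-message search shown in Discover can also be sent to Elasticsearch directly. The following is a rough sketch rather than the query Kibana itself generates: it assumes the message is stored in a field named error.message and that the data lives in indices matching metricbeat-*, both of which you should confirm against your own setup. The function names are hypothetical.

```python
import json
import urllib.request


def error_message_query(message, size=10):
    """Build a search body matching documents whose error.message contains the phrase."""
    return {
        "size": size,
        # error.message is an assumed field name; adjust it to your index mapping.
        "query": {"match_phrase": {"error.message": message}},
    }


def search(body, base_url="http://localhost:9200", index="metricbeat-*"):
    """POST the query to the _search endpoint and return the parsed response."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


body = error_message_query("No memory stats data available")
print(json.dumps(body, indent=2))
# Against a live cluster: hits = search(body)["hits"]["hits"]
```

    Only the query body is built and printed here. The search call is left commented out so that the example runs without a live Elasticsearch instance.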

    Conclusion

    Monitoring requires a lot of time and effort. What we have seen here is a drop in the ocean. For some next steps, try:

    1. Monitoring network
    2. Aggregating logs from your different applications
    3. Aggregating logs from multiple containers
    4. Alerts setting and monitoring
    5. Nested queries for logs
  • A Practical Guide to Deploying Multi-tier Applications on Google Container Engine (GKE)

    Introduction

    All modern-era programmers can attest that containerization has afforded more flexibility and allows us to build truly cloud-native applications. Containers provide portability: the ability to easily move applications across environments. However, complex applications comprise many (tens or hundreds of) containers, and managing such applications is a real challenge. That’s where container orchestration and scheduling platforms like Kubernetes, Mesosphere, Docker Swarm, etc. come into the picture.
    Kubernetes, backed by Google, is leading the pack, given that Red Hat, Microsoft, and now Amazon are putting their weight behind it.

    Kubernetes can run on any cloud or bare metal infrastructure. Setting up & managing Kubernetes can be a challenge, but Google provides an easy way to use Kubernetes through the Google Container Engine (GKE) service.

    What is GKE?

    Google Container Engine is a management and orchestration system for containers. In short, it is hosted Kubernetes. The goal of GKE is to increase the productivity of DevOps and development teams by hiding the complexity of setting up the Kubernetes cluster, the overlay network, etc.

    Why GKE? What are the things that GKE does for the user?

    • GKE abstracts away the complexity of managing a highly available Kubernetes cluster.
    • GKE takes care of the overlay network.
    • GKE provides built-in authentication.
    • GKE provides built-in auto-scaling.
    • GKE provides easy integration with the Google storage services.

    In this blog, we will see how to create your own Kubernetes cluster in GKE and how to deploy a multi-tier application on it. The blog assumes you have a basic understanding of Kubernetes and have used it before. It also assumes you have created an account with Google Cloud Platform. If you are not familiar with Kubernetes, this guide from Deis is a good place to start.

    Google provides a command-line interface (gcloud) to interact with all Google Cloud Platform products and services. The gcloud tool can be used in scripts to automate tasks or directly from the command line. Follow this guide to install the gcloud tool.

    Now let’s begin! The first step is to create the cluster.

    Basic Steps to create cluster

    In this section, I will explain how to create a GKE cluster. We will use the command-line tool to set up the cluster.

    Set the zone in which you want to deploy the cluster:

    $ gcloud config set compute/zone us-west1-a

    Create the cluster using the following command:

    $ gcloud container --project <project-name> \
        clusters create <cluster-name> \
        --machine-type n1-standard-2 \
        --image-type "COS" --disk-size "50" \
        --num-nodes 2 --network default \
        --enable-cloud-logging --no-enable-cloud-monitoring

    Let’s try to understand what each of these parameters means:

    --project: Project name.

    --machine-type: Type of the machine, like n1-standard-2 or n1-standard-4.

    --image-type: OS image. "COS" is the Container-Optimized OS from Google. More info here.

    --disk-size: Disk size of each instance, in GB.

    --num-nodes: Number of nodes in the cluster.

    --network: Network to use for the cluster. In this case, we are using the default network.

    Apart from the above options, you can also use the following to provide specific requirements while creating the cluster:

    --scopes: Scopes let containers access Google services directly without needing separate credentials. You can specify a comma-separated list of scope APIs. For example:

    • Compute: Lets you view and manage your Google Compute Engine resources
    • Logging.write: Submit log data to Stackdriver.

    You can find all the scopes that Google supports here.

    --additional-zones: Specify additional zones for high availability, e.g. --additional-zones us-east1-b,us-east1-d. Here, GKE will create the cluster in 3 zones (the one specified at the beginning and the 2 additional ones here).

    --enable-autoscaling: Enables the autoscaling option. If you specify this option, you also have to specify the minimum and maximum number of nodes, e.g. --enable-autoscaling --min-nodes=15 --max-nodes=50. You can read more about how auto-scaling works here.

    Next, fetch the credentials of the created cluster. This step updates the credentials in the kubeconfig file so that kubectl points to the required cluster.

    $ gcloud container clusters get-credentials my-first-cluster --project project-name

    Now, your first Kubernetes cluster is ready. Let’s check the cluster information & health.

    $ kubectl get nodes
    NAME                                            STATUS   AGE   VERSION
    gke-first-cluster-default-pool-d344484d-vnj1    Ready    2h    v1.6.4
    gke-first-cluster-default-pool-d344484d-kdd7    Ready    2h    v1.6.4
    gke-first-cluster-default-pool-d344484d-ytre2   Ready    2h    v1.6.4

    After creating the cluster, let’s see how to deploy a multi-tier application on it. Let’s use a simple Python Flask app which will greet the user, store employee data & get employee data.

    Application Deployment

    I have created a simple Python Flask application to deploy on the K8S cluster created using GKE. You can go through the source code here. If you check the source code, you will find the following directory structure:

    TryGKE/
    ├── Dockerfile
    ├── mysql-deployment.yaml
    ├── mysql-service.yaml
    ├── src
    │   ├── app.py
    │   └── requirements.txt
    ├── testapp-deployment.yaml
    └── testapp-service.yaml

    Here, I have written a Dockerfile for the Python Flask application in order to build our own image to deploy. For MySQL, we won’t build an image of our own; we will use the latest MySQL image from the public Docker repository.

    Before deploying the application, let’s re-visit some of the important Kubernetes terms:

    Pods:

    A pod is a container or a group of containers which are deployed together on the same host machine. It acts as a single unit of deployment.

    Deployments:

    A Deployment is an entity which manages ReplicaSets and provides declarative updates to pods. It is recommended to use Deployments instead of directly using ReplicaSets. We can use a Deployment to create, remove and update ReplicaSets. Deployments have the ability to roll out and roll back changes.

    Services:

    A Service in K8S is an abstraction which connects you to one or more pods. You can connect to a pod using the pod’s IP address, but since pods come and go, their IP addresses change. Services get their own IP & DNS name, and those remain for the entire lifetime of the service.

    Each tier in an application is represented by a Deployment. A Deployment is described by a YAML file. We have two YAML files – one for MySQL and one for the Python application.

    1. MySQL Deployment YAML

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: mysql
    spec:
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
            - env:
                - name: MYSQL_DATABASE
                  value: admin
                - name: MYSQL_ROOT_PASSWORD
                  value: admin
              image: 'mysql:latest'
              name: mysql
              ports:
                - name: mysqlport
                  containerPort: 3306
                  protocol: TCP

    2. Python Application Deployment YAML

    apiVersion: apps/v1beta1
    kind: Deployment
    metadata:
      name: test-app
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: test-app
        spec:
          containers:
          - name: test-app
            image: ajaynemade/pymy:latest
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 5000

    Each Service is also represented by a YAML file as follows:

    1. MySQL service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: mysql-service
    spec:
      ports:
      - port: 3306
        targetPort: 3306
        protocol: TCP
        name: mysql
      selector:
        app: mysql

    2. Python Application service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: test-service
    spec:
      type: LoadBalancer
      ports:
      - name: test-service
        port: 80
        protocol: TCP
        targetPort: 5000
      selector:
        app: test-app

    You will find a ‘kind’ field in each YAML file. It is used to specify whether the given configuration is for deployment, service, pod, etc.

    In the Python app service YAML, I am using type: LoadBalancer. In GKE, there are two types of cloud load balancers available to expose the application to the outside world.

    1. TCP load balancer: a network (layer 4) load balancer, which is what a Service of type LoadBalancer gets. We will use this in our example.
    2. HTTP(S) load balancer: It can be created using Ingress. For more information, refer to this post that talks about Ingress in detail.

    In the MySQL service, I’ve not specified any type, so the default type ‘ClusterIP’ is used. This exposes the MySQL container only inside the cluster, where the Python app can access it.

    If you check app.py, you can see that I have used "mysql-service.default" as the hostname. "mysql-service.default" is the DNS name of the service, in the form <service-name>.<namespace>. The Python application refers to that DNS name while accessing the MySQL database.
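
    To make the naming rule concrete, here is a minimal sketch of how such a name is put together. The function names are made up for this example, and the cluster domain "cluster.local" is an assumption (it is the usual default, but a cluster can be configured differently).

```go
package main

import "fmt"

// clusterDomain is the DNS suffix of the cluster. "cluster.local" is the
// usual default, but it is an assumption here and can differ per cluster.
const clusterDomain = "cluster.local"

// shortServiceName returns the "<service>.<namespace>" form used in app.py.
func shortServiceName(service, namespace string) string {
	return service + "." + namespace
}

// fullServiceName returns the fully qualified name for the same Service.
func fullServiceName(service, namespace string) string {
	return shortServiceName(service, namespace) + ".svc." + clusterDomain
}

func main() {
	fmt.Println(shortServiceName("mysql-service", "default"))
	fmt.Println(fullServiceName("mysql-service", "default"))
}
```

    Both forms resolve from inside the cluster; the short form is enough here because the Python app and MySQL run in the same cluster.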

    Now, let’s actually set up the components from the configurations. We will first create the services, followed by the deployments.

    Services:

    $ kubectl create -f mysql-service.yaml
    $ kubectl create -f testapp-service.yaml

    Deployments:

    $ kubectl create -f mysql-deployment.yaml
    $ kubectl create -f testapp-deployment.yaml

    Check the status of the pods and services. Wait until all pods are in the Running state and the Python application service gets an external IP, like below:

    $ kubectl get services
    NAME            CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
    kubernetes      10.55.240.1     <none>          443/TCP        5h
    mysql-service   10.55.240.57    <none>          3306/TCP       1m
    test-service    10.55.246.105   35.185.225.67   80:32546/TCP   11s

    Once you get the external IP, you should be able to make API calls using simple curl requests.

    E.g., to store data:

    $ curl -H "Content-Type: application/x-www-form-urlencoded" -X POST http://35.185.225.67:80/storedata -d id=1 -d name=NoOne

    E.g., to get data:

    $ curl 35.185.225.67:80/getdata/1
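
    If you would rather call the service from a program than from curl, the same two requests can be built with Go’s standard library. This is only a sketch: the helper names newStoreRequest and getDataURL are made up for this example, and the IP is the external IP from the sample output above, so replace it with your own.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newStoreRequest builds the same request as the first curl command:
// a form-encoded POST to <baseURL>/storedata with id and name fields.
func newStoreRequest(baseURL, id, name string) (*http.Request, error) {
	form := url.Values{}
	form.Set("id", id)
	form.Set("name", name)
	req, err := http.NewRequest(http.MethodPost, baseURL+"/storedata",
		strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

// getDataURL returns the URL used by the second curl command.
func getDataURL(baseURL, id string) string {
	return baseURL + "/getdata/" + url.PathEscape(id)
}

func main() {
	base := "http://35.185.225.67" // external IP from the sample output; use your own
	req, err := newStoreRequest(base, "1", "NoOne")
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("GET", getDataURL(base, "1"))
	// To actually send the request: resp, err := http.DefaultClient.Do(req)
}
```

    The sketch only builds and prints the requests, so it runs without a cluster; sending them is one call to http.DefaultClient.Do.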

    At this stage your application is completely deployed and is externally accessible.

    Manual scaling of pods

    Scaling your application up or down in Kubernetes is quite straightforward. Let’s scale up the test-app deployment.

    $ kubectl scale deployment test-app --replicas=3

    The deployment configuration for test-app will get updated, and you can see that 3 replicas of test-app are running. Verify it using:

    $ kubectl get pods

    In the same manner, you can scale down your application by reducing the replica count.

    Cleanup:

    Un-deploying an application from Kubernetes is also quite straightforward. All we have to do is delete the services and delete the deployments. The only caveat is that the deletion of the load balancer is an asynchronous process. You have to wait until it gets deleted.

    $ kubectl delete service mysql-service
    $ kubectl delete service test-service

    The above commands will deallocate the load balancer which was created as a part of test-service. You can check the status of the load balancer with the following command.

    $ gcloud compute forwarding-rules list

    Once the load balancer is deleted, you can clean up the deployments as well.

    $ kubectl delete deployments test-app
    $ kubectl delete deployments mysql

    Delete the Cluster:

    $ gcloud container clusters delete my-first-cluster

    Conclusion

    In this blog, we saw how easy it is to deploy, scale & terminate applications on Google Container Engine. Google Container Engine abstracts away all the complexity of Kubernetes and gives us a robust platform to run containerised applications. I am super excited about what the future holds for Kubernetes!

    Check out some of Velotio’s other blogs on Kubernetes.

  • Extending Kubernetes APIs with Custom Resource Definitions (CRDs)

    Introduction:

    Custom resource definition (CRD) is a powerful feature introduced in Kubernetes 1.7 which enables users to add their own custom objects to the Kubernetes cluster and use them like any other native Kubernetes objects. In this blog post, we will see how to add a custom resource to a Kubernetes cluster using the command line as well as the Golang client library, thus also learning how to programmatically interact with a Kubernetes cluster.

    What is a Custom Resource Definition (CRD)?

    In the Kubernetes API, every resource is an endpoint that stores API objects of a certain kind. For example, the built-in service resource contains a collection of service objects. The standard Kubernetes distribution ships with many built-in API objects/resources. A CRD comes into the picture when we want to introduce our own object into the Kubernetes cluster to fulfill our requirements. Once we create a CRD in Kubernetes, we can use it like any other native Kubernetes object, thus leveraging all the features of Kubernetes like its CLI, security, API services, RBAC, etc.

    The custom resource created is also stored in the etcd cluster with proper replication and lifecycle management. CRD allows us to use all the functionalities provided by a Kubernetes cluster for our custom objects and saves us the overhead of implementing them on our own.

    How to register a CRD using command line interface (CLI)

    Step-1: Create a CRD definition file sslconfig-crd.yaml

    apiVersion: "apiextensions.k8s.io/v1beta1"
    kind: "CustomResourceDefinition"
    metadata:
      name: "sslconfigs.blog.velotio.com"
    spec:
      group: "blog.velotio.com"
      version: "v1alpha1"
      scope: "Namespaced"
      names:
        plural: "sslconfigs"
        singular: "sslconfig"
        kind: "SslConfig"
      validation:
        openAPIV3Schema:
          required: ["spec"]
          properties:
            spec:
              required: ["cert","key","domain"]
              properties:
                cert:
                  type: "string"
                  minLength: 1
                key:
                  type: "string"
                  minLength: 1
                domain:
                  type: "string"
                  minLength: 1

    Here we are creating a custom resource definition for an object of kind SslConfig. This object allows us to store the SSL configuration information for a domain. As we can see under the validation section, specifying the cert, key and domain is mandatory for creating objects of this kind. Along with these, we can store other information, like the provider of the certificate. The metadata name that we specify must be spec.names.plural + "." + spec.group.
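
    The naming rule is easy to get wrong (for example by using the singular form), so here is a small sketch of it in code. The helper names crdName and checkCRDName are made up for this example and are not part of any Kubernetes library.

```go
package main

import "fmt"

// crdName builds metadata.name for a CRD from spec.names.plural and spec.group.
func crdName(plural, group string) string {
	return plural + "." + group
}

// checkCRDName reports an error when name does not follow that rule.
func checkCRDName(name, plural, group string) error {
	if want := crdName(plural, group); name != want {
		return fmt.Errorf("CRD name %q must be %q", name, want)
	}
	return nil
}

func main() {
	fmt.Println(crdName("sslconfigs", "blog.velotio.com"))
	// Using the singular form by mistake is reported as an error.
	fmt.Println(checkCRDName("sslconfig.blog.velotio.com", "sslconfigs", "blog.velotio.com"))
}
```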

    An API group (blog.velotio.com here) is a collection of API objects which are logically related to each other. We have also specified a version for our custom objects (spec.version). As the definition of the object is expected to change and evolve in the future, it is better to start with alpha so that users of the object know that the definition might change later. In the scope, we have specified Namespaced, so objects of this kind live inside a namespace; the alternative is Cluster, for cluster-wide objects.

    # kubectl create -f sslconfig-crd.yaml
    # kubectl get crd
    NAME                          AGE
    sslconfigs.blog.velotio.com   5s

    Step-2: Create objects using the definition we created above, in a file crd-obj.yaml

    apiVersion: "blog.velotio.com/v1alpha1"
    kind: "SslConfig"
    metadata:
      name: "sslconfig-velotio.com"
    spec:
      cert: "my cert file"
      key: "my private  key"
      domain: "*.velotio.com"
      provider: "digicert"

    # kubectl create -f crd-obj.yaml
    # kubectl get sslconfig
    NAME                    AGE
    sslconfig-velotio.com   12s

    Along with the mandatory fields cert, key and domain, we have also stored the information of the provider (certifying authority) of the cert.
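
    The API server enforces the mandatory fields for us, but it can be handy to run the same check on the client before submitting an object. The following is a minimal sketch under simplifying assumptions: sslConfigSpec is a local stand-in type, missingFields is a made-up helper, and an empty string is treated as a missing value.

```go
package main

import (
	"fmt"
	"strings"
)

// sslConfigSpec mirrors the fields of the example object. It is a local,
// simplified type for this sketch.
type sslConfigSpec struct {
	Cert, Key, Domain, Provider string
}

// missingFields returns the names of the mandatory fields that are empty.
// Provider is optional, so it is not checked.
func missingFields(s sslConfigSpec) []string {
	var missing []string
	if s.Cert == "" {
		missing = append(missing, "cert")
	}
	if s.Key == "" {
		missing = append(missing, "key")
	}
	if s.Domain == "" {
		missing = append(missing, "domain")
	}
	return missing
}

func main() {
	spec := sslConfigSpec{Cert: "my cert file", Domain: "*.velotio.com", Provider: "digicert"}
	if missing := missingFields(spec); len(missing) > 0 {
		fmt.Println("missing required fields:", strings.Join(missing, ", "))
	}
}
```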

    How to register a CRD programmatically using client-go

    The client-go project provides packages with which we can easily create a Go client and access the Kubernetes cluster. To create a client, we first need to create a connection with the API server.
    How we connect to the API server depends on whether we will be accessing it from within the cluster (our code running in the Kubernetes cluster itself) or from outside the cluster (our code running locally).

    If the code is running outside the cluster, we need to provide either the path of the config file or the URL of the Kubernetes proxy server running on the cluster.

    kubeconfig := filepath.Join(
    	os.Getenv("HOME"), ".kube", "config",
    )
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
    	log.Fatal(err)
    }

    OR

    var (
    	// Set during build
    	version string
    
    	proxyURL = flag.String("proxy", "",
    		`If specified, it is assumed that a kubectl proxy server is running on the
    		given url and creates a proxy client. In case it is not given InCluster
    		kubernetes setup will be used`)
    )
    
    // Inside main(), after flag.Parse():
    if *proxyURL != "" {
    	config, err = clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
    		&clientcmd.ClientConfigLoadingRules{},
    		&clientcmd.ConfigOverrides{
    			ClusterInfo: clientcmdapi.Cluster{
    				Server: *proxyURL,
    			},
    		}).ClientConfig()
    	if err != nil {
    		glog.Fatalf("error creating client configuration: %v", err)
    	}
    }

    When the code is to be run as a part of the cluster, we can simply use:

    import "k8s.io/client-go/rest"
    ...
    rest.InClusterConfig()

    Once the connection is established, we can use it to create a clientset. For accessing Kubernetes objects, generally the clientset from the client-go project is used, but for CRD-related operations we need to use the clientset from the apiextensions-apiserver project.

    import apiextension "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"

    kubeClient, err := apiextension.NewForConfig(config)
    if err != nil {
    	glog.Fatalf("Failed to create client: %v.", err)
    }

    Now we can use the client to make the API call which will create the CRD for us.

    package v1alpha1
    
    import (
    	"reflect"
    
    	apiextensionv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"
    	apiextension "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
    	apierrors "k8s.io/apimachinery/pkg/api/errors"
    	meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )
    
    const (
    	CRDPlural   string = "sslconfigs"
    	CRDGroup    string = "blog.velotio.com"
    	CRDVersion  string = "v1alpha1"
    	FullCRDName string = CRDPlural + "." + CRDGroup
    )
    
    func CreateCRD(clientset apiextension.Interface) error {
    	crd := &apiextensionv1beta1.CustomResourceDefinition{
    		ObjectMeta: meta_v1.ObjectMeta{Name: FullCRDName},
    		Spec: apiextensionv1beta1.CustomResourceDefinitionSpec{
    			Group:   CRDGroup,
    			Version: CRDVersion,
    			Scope:   apiextensionv1beta1.NamespaceScoped,
    			Names: apiextensionv1beta1.CustomResourceDefinitionNames{
    				Plural: CRDPlural,
    				Kind:   reflect.TypeOf(SslConfig{}).Name(),
    			},
    		},
    	}
    
    	_, err := clientset.ApiextensionsV1beta1().CustomResourceDefinitions().Create(crd)
    	if err != nil && apierrors.IsAlreadyExists(err) {
    		return nil
    	}
    	return err
    }

    In the create CRD function, we first create the definition of our custom object and then pass it to the create method which creates it in our cluster. Just like we did while creating our definition using CLI, here also we set the parameters like version, group, kind etc.

    Once our definition is ready we can create objects of its type just like we did earlier using the CLI. First we need to define our object.

    package v1alpha1
    
    import meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    
    type SslConfig struct {
    	meta_v1.TypeMeta   `json:",inline"`
    	meta_v1.ObjectMeta `json:"metadata"`
    	Spec               SslConfigSpec   `json:"spec"`
    	Status             SslConfigStatus `json:"status,omitempty"`
    }
    type SslConfigSpec struct {
    	Cert   string `json:"cert"`
    	Key    string `json:"key"`
    	Domain string `json:"domain"`
    }
    
    type SslConfigStatus struct {
    	State   string `json:"state,omitempty"`
    	Message string `json:"message,omitempty"`
    }
    
    type SslConfigList struct {
    	meta_v1.TypeMeta `json:",inline"`
    	meta_v1.ListMeta `json:"metadata"`
    	Items            []SslConfig `json:"items"`
    }

    The Kubernetes API conventions suggest that each object must have two nested object fields that govern the object’s configuration: the object spec and the object status. Objects must also have metadata associated with them. The custom objects that we define here comply with these standards. It is also recommended to create a list type for every type, so we have also created an SslConfigList struct.
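
    To see how the json tags map a manifest onto the spec, status and metadata fields, here is a small self-contained sketch. It uses simplified local stand-ins (metadata, sslConfig, and so on) instead of the meta_v1 types so that it runs with the standard library only; these stand-ins and the decodeSslConfig helper are assumptions made for the illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the types above. The real code embeds
// meta_v1.TypeMeta and meta_v1.ObjectMeta; these local structs keep
// only the fields needed for the illustration.
type metadata struct {
	Name string `json:"name"`
}

type sslConfigSpec struct {
	Cert   string `json:"cert"`
	Key    string `json:"key"`
	Domain string `json:"domain"`
}

type sslConfigStatus struct {
	State   string `json:"state,omitempty"`
	Message string `json:"message,omitempty"`
}

type sslConfig struct {
	APIVersion string          `json:"apiVersion"`
	Kind       string          `json:"kind"`
	Metadata   metadata        `json:"metadata"`
	Spec       sslConfigSpec   `json:"spec"`
	Status     sslConfigStatus `json:"status,omitempty"`
}

// decodeSslConfig parses a JSON manifest into the simplified type.
func decodeSslConfig(data string) (sslConfig, error) {
	var obj sslConfig
	err := json.Unmarshal([]byte(data), &obj)
	return obj, err
}

func main() {
	manifest := `{"apiVersion":"blog.velotio.com/v1alpha1","kind":"SslConfig",` +
		`"metadata":{"name":"sslconfigobj"},` +
		`"spec":{"cert":"my-cert","key":"my-key","domain":"*.velotio.com"},` +
		`"status":{"state":"created"}}`
	obj, err := decodeSslConfig(manifest)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(obj.Kind, obj.Metadata.Name, obj.Spec.Domain, obj.Status.State)
}
```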

    Now we need to write a function which will create a custom client which is aware of the new resource that we have created.

    package v1alpha1
    
    import (
    	meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/apimachinery/pkg/runtime"
    	"k8s.io/apimachinery/pkg/runtime/schema"
    	"k8s.io/apimachinery/pkg/runtime/serializer"
    	"k8s.io/client-go/rest"
    )
    
    var SchemeGroupVersion = schema.GroupVersion{Group: CRDGroup, Version: CRDVersion}
    
    func addKnownTypes(scheme *runtime.Scheme) error {
    	scheme.AddKnownTypes(SchemeGroupVersion,
    		&SslConfig{},
    		&SslConfigList{},
    	)
    	meta_v1.AddToGroupVersion(scheme, SchemeGroupVersion)
    	return nil
    }
    
    func NewClient(cfg *rest.Config) (*SslConfigV1Alpha1Client, error) {
    	scheme := runtime.NewScheme()
    	SchemeBuilder := runtime.NewSchemeBuilder(addKnownTypes)
    	if err := SchemeBuilder.AddToScheme(scheme); err != nil {
    		return nil, err
    	}
    	config := *cfg
    	config.GroupVersion = &SchemeGroupVersion
    	config.APIPath = "/apis"
    	config.ContentType = runtime.ContentTypeJSON
    	config.NegotiatedSerializer = serializer.DirectCodecFactory{CodecFactory: serializer.NewCodecFactory(scheme)}
    	client, err := rest.RESTClientFor(&config)
    	if err != nil {
    		return nil, err
    	}
    	return &SslConfigV1Alpha1Client{restClient: client}, nil
    }

    Building the custom client library

    Once we have registered our custom resource definition with the Kubernetes cluster, we can create objects of its type using the Kubernetes CLI, as we did earlier. However, for creating controllers for these objects or for developing custom functionality around them, we also need to build a client library with which we can access them from the Go API. For native Kubernetes objects, this type of library is provided for each object.

    package v1alpha1
    
    import (
    	meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/client-go/rest"
    )
    
    func (c *SslConfigV1Alpha1Client) SslConfigs(namespace string) SslConfigInterface {
    	return &sslConfigclient{
    		client: c.restClient,
    		ns:     namespace,
    	}
    }
    
    type SslConfigV1Alpha1Client struct {
    	restClient rest.Interface
    }
    
    type SslConfigInterface interface {
    	Create(obj *SslConfig) (*SslConfig, error)
    	Update(obj *SslConfig) (*SslConfig, error)
    	Delete(name string, options *meta_v1.DeleteOptions) error
    	Get(name string) (*SslConfig, error)
    }
    
    type sslConfigclient struct {
    	client rest.Interface
    	ns     string
    }
    
    func (c *sslConfigclient) Create(obj *SslConfig) (*SslConfig, error) {
    	result := &SslConfig{}
    	err := c.client.Post().
    		Namespace(c.ns).Resource("sslconfigs").
    		Body(obj).Do().Into(result)
    	return result, err
    }
    
    func (c *sslConfigclient) Update(obj *SslConfig) (*SslConfig, error) {
    	result := &SslConfig{}
    	// A PUT must address the individual object, so the name is part of the path.
    	err := c.client.Put().
    		Namespace(c.ns).Resource("sslconfigs").
    		Name(obj.Name).Body(obj).Do().Into(result)
    	return result, err
    }
    
    func (c *sslConfigclient) Delete(name string, options *meta_v1.DeleteOptions) error {
    	return c.client.Delete().
    		Namespace(c.ns).Resource("sslconfigs").
    		Name(name).Body(options).Do().
    		Error()
    }
    
    func (c *sslConfigclient) Get(name string) (*SslConfig, error) {
    	result := &SslConfig{}
    	err := c.client.Get().
    		Namespace(c.ns).Resource("sslconfigs").
    		Name(name).Do().Into(result)
    	return result, err
    }

    We can add more methods, like watch, update status, etc. Their implementation will be similar to the methods we have defined above. To look at the methods available for various Kubernetes objects like pod, node, etc., we can refer to the v1 package.
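
    It helps to know which URL each of these calls ends up hitting. The sketch below shows the path layout for a namespaced custom resource; resourcePath is a made-up helper for illustration, not a client-go function. Create posts to the collection path, while reads, updates and deletes of a single object address the path that ends in the object name.

```go
package main

import (
	"fmt"
	"strings"
)

// resourcePath builds the API path for a namespaced custom resource.
// An empty name yields the collection path used for create and list.
func resourcePath(group, version, namespace, plural, name string) string {
	parts := []string{"", "apis", group, version, "namespaces", namespace, plural}
	if name != "" {
		parts = append(parts, name)
	}
	return strings.Join(parts, "/")
}

func main() {
	// Collection path, used by Create.
	fmt.Println(resourcePath("blog.velotio.com", "v1alpha1", "default", "sslconfigs", ""))
	// Object path, used by Get, Update and Delete.
	fmt.Println(resourcePath("blog.velotio.com", "v1alpha1", "default", "sslconfigs", "sslconfigobj"))
}
```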

    Putting all things together

    Now, in our main function, we will bring everything together.

    package main
    
    import (
    	"flag"
    	"fmt"
    	"time"
    
    	"blog.velotio.com/crd-blog/v1alpha1"
    	"github.com/golang/glog"
    	apiextension "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
    	meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/client-go/rest"
    	"k8s.io/client-go/tools/clientcmd"
    	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
    )
    
    var (
    	// Set during build
    	version string
    
    	proxyURL = flag.String("proxy", "",
    		`If specified, it is assumed that a kubectl proxy server is running on the
    		given url and creates a proxy client. In case it is not given InCluster
    		kubernetes setup will be used`)
    )
    
    func main() {
    
    	flag.Parse()
    	var err error
    
    	var config *rest.Config
    	if *proxyURL != "" {
    		config, err = clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
    			&clientcmd.ClientConfigLoadingRules{},
    			&clientcmd.ConfigOverrides{
    				ClusterInfo: clientcmdapi.Cluster{
    					Server: *proxyURL,
    				},
    			}).ClientConfig()
    		if err != nil {
    			glog.Fatalf("error creating client configuration: %v", err)
    		}
    	} else {
    		if config, err = rest.InClusterConfig(); err != nil {
    			glog.Fatalf("error creating client configuration: %v", err)
    		}
    	}
    
    	kubeClient, err := apiextension.NewForConfig(config)
    	if err != nil {
    		glog.Fatalf("Failed to create client: %v", err)
    	}
    	// Create the CRD
    	err = v1alpha1.CreateCRD(kubeClient)
    	if err != nil {
    		glog.Fatalf("Failed to create crd: %v", err)
    	}
    
    	// Wait for the CRD to be created before we use it.
    	time.Sleep(5 * time.Second)
    
    	// Create a new clientset which include our CRD schema
    	crdclient, err := v1alpha1.NewClient(config)
    	if err != nil {
    		panic(err)
    	}
    
    	// Create a new SslConfig object
    
    	SslConfig := &v1alpha1.SslConfig{
    		ObjectMeta: meta_v1.ObjectMeta{
    			Name:   "sslconfigobj",
    			Labels: map[string]string{"mylabel": "test"},
    		},
    		Spec: v1alpha1.SslConfigSpec{
    			Cert:   "my-cert",
    			Key:    "my-key",
    			Domain: "*.velotio.com",
    		},
    		Status: v1alpha1.SslConfigStatus{
    			State:   "created",
    			Message: "Created, not processed yet",
    		},
    	}
    	// Create the SslConfig object we create above in the k8s cluster
    	resp, err := crdclient.SslConfigs("default").Create(SslConfig)
    	if err != nil {
    		fmt.Printf("error while creating object: %v\n", err)
    	} else {
    		fmt.Printf("object created: %v\n", resp)
    	}
    
    	obj, err := crdclient.SslConfigs("default").Get(SslConfig.ObjectMeta.Name)
    	if err != nil {
    		glog.Infof("error while getting the object %v\n", err)
    	}
    	fmt.Printf("SslConfig Objects Found: \n%v\n", obj)
    	select {}
    }

    Now, if we run our code, our custom resource definition will get created in the Kubernetes cluster, and an object of its type will also be there, just like with the CLI. The Docker image akash125/crdblog is built using the code discussed above; it can be pulled directly from Docker Hub and run in a Kubernetes cluster. After the image runs successfully, the CRD that we discussed above will get created in the cluster along with an object of its type. We can verify this using the CLI the way we did earlier, and we can also check the logs of the pod running the Docker image. The complete code is available here.
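
    One detail worth revisiting is the fixed five-second sleep in main, which can be too short on a busy cluster and needlessly long on a fast one. A common alternative is to poll until a condition holds or a timeout expires. The sketch below shows the idea with the standard library only; waitFor and the ready callback are made-up names, and in real code the callback would ask the API server whether the CRD is available. The apimachinery project also ships polling helpers in its util/wait package.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitFor calls ready every interval until it returns true, returns an
// error, or the timeout would be exceeded by waiting once more.
func waitFor(timeout, interval time.Duration, ready func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		ok, err := ready()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return errors.New("timed out waiting for condition")
		}
		time.Sleep(interval)
	}
}

func main() {
	// Hypothetical condition: pretend the CRD becomes available on the
	// third check. Real code would query the API server here.
	checks := 0
	err := waitFor(time.Second, 10*time.Millisecond, func() (bool, error) {
		checks++
		return checks >= 3, nil
	})
	fmt.Println(checks, err)
}
```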

    Conclusion and future work

    We learned how to create a custom resource definition and objects using the Kubernetes command line interface as well as the Golang client. We also learned how to programmatically access a Kubernetes cluster, which lets us build some really cool stuff on Kubernetes. We can now also create custom controllers for our resources, which continuously watch the cluster for various life cycle events of our objects and take the desired action accordingly. To read more about CRDs, refer to the following links:

  • Prow + Kubernetes – A Perfect Combination To Execute CI/CD At Scale

    Intro

    Kubernetes is currently the most popular and de facto standard way of deploying workloads in the cloud. It’s well-suited for companies and vendors that need self-healing, high availability, cloud-agnostic characteristics, and easy extensibility.

    Now, on another front, a problem has arisen within the CI/CD domain. Since people are using Kubernetes as the underlying orchestrator, they need a robust CI/CD tool that is entirely Kubernetes-native.

    Enter Prow

    Prow complements the Kubernetes family in the realm of automation and CI/CD.

    In fact, it is the project that best exemplifies why and how Kubernetes is such a superb platform to execute CI/CD at scale.

    Prow (meaning: the portion of a ship’s bow, the ship’s front end, that’s above water) is a Kubernetes-native CI/CD system, and it has been used over the past few years by many projects, like Kyma, Istio, Kubeflow, OpenShift, etc.

    Where did it come from?

    Kubernetes is one of the largest and most successful open-source projects on GitHub. When Prow was conceived, the Kubernetes community was trying hard to keep its head above water in matters of CI/CD. Their needs included the execution of more than 10k CI/CD jobs/day, spanning over 100 different repositories in various GitHub organizations, and other automation technology stacks were just not capable of handling everything at this scale.

    So, the Kubernetes Testing SIG created their own tools to complement Prow. Because Prow currently resides under the Kubernetes test-infra project, one might underestimate its true prowess/capabilities. I personally would like to see Prow receive a dedicated repo, coming out from under the umbrella of test-infra.

    What is Prow?

    Prow is not too complex to understand but still vast in a subtle way. It is designed and built on a distributed microservice architecture native to Kubernetes.

    It has many components that integrate with one another (plank, hook, etc.) and a bunch of standalone ones that are more of a plug-n-play nature (trigger, config-updater, etc.).

    For the context of this blog, I will not be covering Prow’s entire architecture, but feel free to dive into it on your own later. 

    Just to name the main building blocks for Prow:

    • Hook – acts as an API gateway to intercept all requests from GitHub; it then creates a Prow job custom resource, reading the job configuration and calling any specific plugin if needed.
    • Plank – is the Prow job controller; after Hook creates a Prow job, Plank processes it and creates a Kubernetes pod for it to run the tests.
    • Deck – serves as the UI for the history of jobs that ran in the past or are currently running.
    • Horologium – is the component that processes periodic jobs only.
    • Sinker – is responsible for cleaning up old jobs and pods from the cluster.

    More can be found here: Prow Architecture. Note that this link is not the official doc from Kubernetes but from another great open source project that uses Prow extensively day-in-day-out – Kyma.

    This is how Prow can be pictured:

    [Figure: Prow architecture diagram]

    Here is a list of things Prow can do and why it was conceived in the first place.

    • A wide range of GitHub automation

      – ChatOps via slash commands like “/foo”
      – Fine-tuned policies and permission management in GitHub via OWNERS files
      – tide – PR/merge automation
      – ghProxy – to avoid hitting API limits and to use GitHub API request cache
      – label plugin – labels management
      – branchprotector – branch protection configuration
      – releasenote – release notes management
    • Job execution engine – Plank
    • Job status reporting to the CI/CD dashboard – crier
    • Dashboards for comprehensive job/PR history, merge status, real-time logs, and other statuses – Deck
    • Plug-n-play service to interact with GCS and show job artifacts on the dashboard – Spyglass
    • Super easy pluggable Prometheus stack for observability – metrics
    • Config-as-Code for Prow itself – updateconfig
    • And many more, like sinker, branch protector, etc.

    Possible Jobs in Prow

    Here, a job means any “task that is executed in response to a trigger.” This trigger can be anything from a GitHub commit to a new PR or a periodic cron trigger. Possible jobs in Prow include:

    • Presubmit – triggered when a new GitHub PR is created.
    • Postsubmit – triggered when there is a new commit.
    • Periodic – triggered on a cron schedule.

    Possible states for a job

    • triggered – a new Prow-job custom resource is created reading the job configs
    • pending – a pod is created in response to the Prow-job to run the scripts/tests; Prow-job will be marked pending while the pod is getting created and running 
    • success – if a pod succeeds, the Prow-job status will change to success 
    • failure – if a pod fails, the Prow-job status will be marked failure
    • aborted – when a job is running and the same one is retriggered, the first Prow-job execution is aborted, its status changes to aborted, and the new one is marked pending
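
    The states above can be read as a small state machine. The following is only a sketch of that reading, not Prow's controller code: the event names (pod_created, pod_succeeded, and so on) are invented for illustration.

```python
# Transitions between Prow-job states as described in the list above.
# The event names are hypothetical; Prow itself does not expose them.
TRANSITIONS = {
    ("triggered", "pod_created"): "pending",
    ("pending", "pod_succeeded"): "success",
    ("pending", "pod_failed"): "failure",
    ("pending", "retriggered"): "aborted",
}


def next_state(state: str, event: str) -> str:
    """Return the state that follows `state` on `event`, or raise if none is listed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


# Walk one job through a successful run.
state = "triggered"
for event in ("pod_created", "pod_succeeded"):
    state = next_state(state, event)
print(state)
```

    Terminal states (success, failure, aborted) have no outgoing transitions in this table, which matches the description: a retriggered job gets a new Prow-job rather than reviving the old one.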

    What a job config looks like:

    presubmits:
      kubernetes/community:
      - name: pull-community-verify  # convention: (job type)-(repo name)-(suite name)
        branches:
        - master
        decorate: true
        always_run: true
        spec:
          containers:
          - image: golang:1.12.5
            command:
            - /bin/bash
            args:
            - -c
            - "export PATH=$GOPATH/bin:$PATH && make verify"

    • Here, this job is a “presubmit” type, meaning it will be executed when a PR is created against the “master” branch in repo “kubernetes/community”.
    • As shown in spec, a pod will be created from the image “golang:1.12.5”, the repo will be cloned into it, and the given command will be executed when the container starts.
    • The exit status of that command decides whether the pod has succeeded or failed, which, in turn, decides whether the Prow job has completed successfully.
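
    To make the last point concrete, here is a minimal sketch of how a command's exit status can be mapped to a job result. It assumes a POSIX shell at /bin/sh, and run_job is a name made up for this example; Prow itself does this through the pod's container status rather than a local subprocess.

```python
import subprocess


def run_job(command: str) -> str:
    """Run `command` in a shell and map its exit status to a job result."""
    # Exit status 0 means the command (e.g. `make verify`) passed.
    result = subprocess.run(["/bin/sh", "-c", command])
    return "success" if result.returncode == 0 else "failure"


print(run_job("true"))    # a command that exits with status 0
print(run_job("exit 2"))  # a command that exits with a non-zero status
```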

    More job configs used by Kubernetes itself can be found here – Jobs

    Getting a minimalistic Prow cluster up and running on the local system in minutes.

    Pre-reqs:

    • Knowledge of Kubernetes 
    • Knowledge of Google Cloud and IAM

    For the context of this blog, I have created a sample GitHub repo containing all the basic manifest files and config files. A basic CI has also been configured for this repo. Feel free to clone/fork it and use it as a getting started guide.

    Let’s look at the directory structure for the repo:

    .
    ├── docker/     # Contains docker image in which all the CI jobs will run
    ├── hack/       # Contains small hack scripts used in a wide range of jobs 
    ├── hello.go
    ├── hello_test.go
    ├── Dockerfile
    ├── Makefile
    ├── prow
    │   ├── cluster/       # Install prow on k8s cluster
    │   ├── jobs/          # CI jobs config
    │   ├── labels.yaml    # Prow label config for managing github labels
    │   ├── config.yaml    # Prow config
    │   └── plugins.yaml   # Prow plugins config
    └── README.md

    1. Create a bot account. For info, look here. Add this bot as a collaborator in your repo. 

    2. Create an OAuth2 token from the GitHub GUI for the bot account.

    $ echo "PUT_TOKEN_HERE" > oauth
    $ kubectl create secret generic oauth --from-file=oauth=oauth

    3. Generate a random HMAC token with OpenSSL. Hook uses it to validate incoming webhooks.

    $ openssl rand -hex 20 > hmac
    $ kubectl create secret generic hmac --from-file=hmac=hmac

    4. Install all the Prow components mentioned in prow-starter.yaml.

    $ make deploy-prow

    5. Update all the jobs and plugins needed for the CI (rules mentioned in the Makefile):

    • Updates in plugins.yaml and presubmits.yaml:
      – Change the repo name (velotio-tech/k8s-prow-guide) for the jobs to be configured
    • Updates in config.yaml:
      – Create a GCS bucket
      – Update the name of the GCS bucket (GCS_BUCKET_NAME) in config.yaml
      – Create a service account with GCS storage permission and download its key as service-account.json
      – Create a secret from the above service-account.json:
    $ kubectl create secret generic gcs-sa --from-file=service-account.json=service-account.json

      – Update the secret name (GCS_SERVICE_ACC) in config.yaml
    • Apply the changes with:
    $ make update-config
    $ make update-plugins
    $ make update-jobs

    6. For exposing a webhook from GitHub repo and pointing it to the local machine, use Ultrahook. Install Ultrahook. This will give you a publicly accessible endpoint. In my case, the result looked like this: http://github.sanster23.ultrahook.com. 

    $ echo "api_key: <API_KEY_ULTRAHOOK>" > ~/.ultrahook
    $ ultrahook github http://<MINIKUBE_IP>:<HOOK_NODE_PORT>/hook

    7. Create a webhook in your repo so that all events can be published to Hook via the public URL above:

    • Set the webhook URL from Step 6
    • Set Content Type as application/json
    • Set the value of the secret token to the hmac token created in Step 3
    • Check the “Send me everything” box
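
    The shared token is what lets Hook trust incoming requests. GitHub signs each delivery with the webhook secret and sends the digest in the X-Hub-Signature-256 header as sha256=<hex>. The sketch below shows that kind of check in isolation; it is not Prow's code, and the function names and sample payload are made up for illustration.

```python
import hashlib
import hmac
import secrets


def sign(secret: bytes, payload: bytes) -> str:
    """Build a signature header value the way GitHub formats it."""
    digest = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def is_valid(secret: bytes, payload: bytes, header: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign(secret, payload), header)


# 20 random bytes as hex, the same shape as `openssl rand -hex 20`.
secret = secrets.token_hex(20).encode()
payload = b'{"action": "opened"}'
header = sign(secret, payload)

print(is_valid(secret, payload, header))         # untouched payload
print(is_valid(secret, payload + b" ", header))  # payload altered in transit
```

    If the token in the webhook settings and the hmac secret in the cluster differ, every delivery fails this check, which is the usual cause of webhooks being rejected right after setup.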

    8. Create a new PR and see the magic.

    9. Dashboard for Prow will be accessible at http://<MINIKUBE_IP>:<DECK_NODE_PORT>

    • MINIKUBE_IP: 192.168.99.100 (run “minikube ip”)
    • DECK_NODE_PORT: 32710 (run “kubectl get svc deck”)

    I will leave you guys with an official reference of Prow Dashboard:

    What’s Next

    The above is just an effort to give you a taste of what Prow can do and how easy it is to set up for infrastructure of any scale and projects of any complexity.

    P.S. – The content surrounding Prow is scarce, making it a bit unexplored in certain ways, but I found this helpful channel on the Kubernetes Slack #prow. Hopefully, this helps you explore the uncharted waters of Kubernetes Native CI/CD. 

  • Hacking Your Way Around AWS IAM Roles

    Identity and Access Management (IAM) offers role-based access control (RBAC) to your AWS account users and resources, and you can fine-tune the permission set by defining policies. Whether you are experienced with the AWS cloud or just a beginner, you know how important IAM is.

    “AWS Identity and Access Management (IAM) is a web service that helps you securely control access to AWS resources. You use IAM to control who is authenticated (signed in) and authorized (has permissions) to use resources.”

    – AWS IAM User Guide

    With the emergence of cloud infrastructure services, the coolest thing you can do is write your infrastructure as code. AWS offers SDKs for various programming/scripting languages, and of course, like any other API call, you need to sign a request with tokens. The AWS IAM console lets you generate access_key and secret_access_key tokens. These tokens can then be configured with your SDK.

    Alternatively, you can configure the tokens with your user profile via aws cli. This also means anyone with the access_key and secret_access_key will have the permissions configured in the IAM policy. Thus, keeping credentials on the disk is insecure. You can implement a key rotation policy to keep the environment compliant. To avoid the problem altogether, you can use AWS IAM roles for services.

    Let’s say you are working on an AWS EC2 instance that needs access to some other AWS service, like S3. You can create an IAM role for EC2 with a policy that has appropriate permission to access the S3 bucket. In this case, your SDK doesn’t need a token (at least not on the disk or hardcoded in code). Let’s take a look at the order in which the AWS SDK looks for a token for signing requests.

    1. Embedded in your code (very insecure). This is the very first place your SDK looks. Below is a NodeJS example, where access_key and secret_access_key are part of the code itself.

    const {S3} = require("aws-sdk");
    const s3 = new S3({
       accessKeyId : "ABCDEFGHIJKLMNOPQRST",
       secretAccessKey : "7is/HVjA8lm9hRrJyZEPWAs5Bo8KyyvEqjjxIHoO"
      //sessionToken : "options_session_token_if_applicable"
    });

    2. AWS environment variables. If the token is not embedded in your code, your SDK looks for AWS environment variables available to the process. These environment variables are AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and the optional AWS_SESSION_TOKEN. For example, you can export the credentials in your shell and then run an aws cli command such as aws s3 ls to list S3 buckets. Note that once credentials are exported, they are available to all child processes. Therefore, these credentials are looked up automatically by your AWS SDK.
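
    The point about child processes can be checked without touching AWS at all. The sketch below sets the two variables with placeholder values and starts a child Python process that stands in for the aws cli or any SDK-based script; the child only prints what it inherited.

```python
import os
import subprocess
import sys

# Placeholder values; this is the in-process equivalent of `export VAR=...`.
os.environ["AWS_ACCESS_KEY_ID"] = "<ACCESS_KEY_ID>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<SECRET_ACCESS_KEY>"

# The child is a stand-in for any program started from this environment.
child_code = "import os; print(os.environ['AWS_ACCESS_KEY_ID'])"
seen_by_child = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
).stdout.strip()

print(seen_by_child)
```

    The same inheritance is why exported credentials are risky on shared machines: every program launched from that shell can read them.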

    3. The AWS credentials file located at ~/.aws/credentials (default profile). This is the third place for the lookup. You can generate this file by running the command aws configure. You may also manually create this file with various profiles. If you happen to have multiple profiles, you can export an environment variable called AWS_PROFILE to select the one to use. An example credentials file is given below:

    [default] ; default profile
    aws_access_key_id = <DEFAULT_ACCESS_KEY_ID>
    aws_secret_access_key = <DEFAULT_SECRET_ACCESS_KEY>
      
    [personal-account] ; personal account profile
    aws_access_key_id = <PERSONAL_ACCESS_KEY_ID>
    aws_secret_access_key = <PERSONAL_SECRET_ACCESS_KEY>
      
    [work-account] ; work account profile
    aws_access_key_id = <WORK_ACCESS_KEY_ID>
    aws_secret_access_key = <WORK_SECRET_ACCESS_KEY>
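
    Profile selection can be sketched with Python's standard configparser. This is an illustration of the idea only, not how any AWS SDK is implemented: load_profile is a made-up helper, and the sample text mirrors the file above with the inline comments left out.

```python
import configparser
import os

SAMPLE = """\
[default]
aws_access_key_id = <DEFAULT_ACCESS_KEY_ID>
aws_secret_access_key = <DEFAULT_SECRET_ACCESS_KEY>

[work-account]
aws_access_key_id = <WORK_ACCESS_KEY_ID>
aws_secret_access_key = <WORK_SECRET_ACCESS_KEY>
"""


def load_profile(text, profile=None):
    """Return the key/value pairs of one profile from credentials-file text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Fall back to AWS_PROFILE, then to the profile named "default".
    name = profile or os.environ.get("AWS_PROFILE", "default")
    return dict(parser[name])


print(load_profile(SAMPLE, "work-account")["aws_access_key_id"])
```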

    4. The IAM role attached to your resource. Your resource could be an EC2 instance, Lambda function, AWS Glue, ECS container, RDS, etc. This is a secure way of using credentials: since they are not stored anywhere on the disk, exported via an environment variable, or hardcoded in the code, you need not worry about key rotation at all.
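
    The four lookups can be summarized as a first-match chain. The sketch below mirrors only the order described in this post; real SDKs consult additional sources, and resolve_credentials with its parameters is a hypothetical helper, not an SDK API.

```python
def resolve_credentials(explicit=None, env=None, profile=None, role_fetcher=None):
    """Return (source, credentials) from the first source that provides any."""
    env = env or {}
    if explicit:  # 1. embedded in code
        return "code", explicit
    if "AWS_ACCESS_KEY_ID" in env and "AWS_SECRET_ACCESS_KEY" in env:
        # 2. environment variables
        return "environment", {
            "access_key": env["AWS_ACCESS_KEY_ID"],
            "secret_key": env["AWS_SECRET_ACCESS_KEY"],
        }
    if profile:  # 3. a profile read from ~/.aws/credentials
        return "credentials file", profile
    if role_fetcher is not None:  # 4. the IAM role attached to the resource
        return "iam role", role_fetcher()
    raise LookupError("no credentials found")


# With nothing configured locally, only the role can answer.
source, _ = resolve_credentials(role_fetcher=lambda: {"access_key": "<TEMPORARY_KEY>"})
print(source)
```

    Because the role comes last, a stale key left in the environment or in the credentials file silently wins over the role.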

    TL;DR: IAM roles are a secure way of using credentials. However, they are only applicable to resources within AWS, and you cannot use them outside of AWS. So, an IAM role can only be attached to resources like EC2, Lambda, ECS, etc.

    The problem statement:

    Let’s say a group of developers needs access to a few S3 buckets and DynamoDB. The organization does not want developers to use access_key and secret_access_key on their local machines (laptops), as the access_key and secret_access_key can be used anywhere and can be stolen.

    Since IAM roles are more secure, they allocate an EC2 instance with Windows OS, attach an IAM role with appropriate permissions to access the S3 buckets and DynamoDB, and configure an IDE and other essential dev tools. Developers then use RDP to connect to the EC2 instance. However, due to license restrictions, only two users can connect with RDP at a given time. So, they add more similar instances. This heavily increases cost. Wouldn’t it be nice if IAM roles could somehow be attached to local machines?

    How do IAM roles for resources work?

    Resources like EC2 or Lambda have the link-local address available. The link-local address 169.254.169.254 can be accessed over HTTP port 80 to retrieve instance metadata. For instance, to get the instance-id of an EC2 instance from the host itself, you can send a GET request with curl -L http://169.254.169.254/latest/meta-data/instance-id/. Similarly, you can retrieve IAM credentials if an IAM role is attached to the EC2 instance. Let’s assume you have created an IAM role for an EC2 instance with the name “iam-role-for-ec2”. Your SDK will then automatically fetch credentials via a GET request to http://169.254.169.254/latest/meta-data/iam/security-credentials/iam-role-for-ec2/, which you can reproduce with curl:

    $ curl -L 169.254.169.254/latest/meta-data/iam/security-credentials/iam-role-for-ec2/
    {
     "Code" : "Success",
     "LastUpdated" : "2021-08-03T09:18:49Z",
     "Type" : "AWS-HMAC",
     "AccessKeyId" : "ASIASP26DFHDIOFNJFFX",
     "SecretAccessKey" : "EK1A7x9dntSzF9LlG7BK08C6zpTS/F6MHYTBo/+U",
     "Token" : "IQoJb3JpZ2luX2VjEPr//////////wEaCXVzLXdlc3QtMiJIMEYCIQCOCqHrHjEkYZUFsRtGXwa8gfGjsBmaU+WrL2Z0ihvA3QIhAIsGhJFiPetOod7IUUC++unWZfoUEgjEU0ULYwZUvGwwKvoDCBIQAhoMMTcxNDU5MTYwNTE4IgxFUXJfE/0cdJs2Gigq1wM8Ww8yAS2i2qUqsQ1t+yd4ATkE5fvIMDtHxzPQ2raVQb+cCgC/eJVQpeNET1SP01HnrN5W1QFID+xOPk3vZt6NrCy48OUf6+cCGrd63Jv/7glAsyQGaGM/Jt5ddi6593dgN7VLFHsEBAwqkZ3j/VjAzYbthP3clmRl++6k+vpiUp2j4uwM4zW/6f8faR6awPbPVmJsyh94pXaQXJU+H0w+9Hp0MlUvP6GRqBiuTwv/+EOiRfth1XGRxxOuR5X+fr0Ve4tede2x0ZvSLeUsUENHlOQnUkSGbu1Hiv1BhDEjhzbHi7PXhW1G9N1FZObE+wdF4hGYbe3LUUIrnp2xnIcxKzmume2YQvFE4DvJvBtF22DsdLP4GPmitofhV2FGcVxP1f5Nv76M6SfOQY65vSZQde4LIwcotRIrMgwEWup2Rplq6s56K93IYXp6QmnUWLgdtcMBTMVQsOFhCdj05P+VYqlKe5xRT4/8BucmIHn7+J4indNoL+3BvYvnpiISdcEhlyswNZOPhVQJjwJfKPPdu9NDEKQ+Jep4wpVvOSh+CAtxKtqwGz1wrKzqlRvzqBFaEQrD4WdPdf9YnTvmKIXgPuk74pZRlarVsREL0KmG6G0zzA2lRYow6JOkiAY6pAHIZGH+UH5RL79drKe86tUnWCORcX9omN2uUK7FemTENwyvholib4jLGY6HcjvDF10jqkcu1KEV20xNsPj87BP7irEH7xH//Jz2+rnSaN5PCqLezSsATPYhHFQjg6Oti+0E33F+F5MA25Pn2+u5TDP1VfFgYExwSor79gNtwbOMs76432ssHYFioYjHttPfVwyNXloLCwgphqJBwiNhMDMcKapK6Q==",
     "Expiration" : "2021-08-03T15:47:26Z"
    }

    Notice that the response payload is JSON with AccessKeyId, SecretAccessKey, and Token. Additionally, there is an Expiration key, which states until when the credentials are valid. This means the credentials are temporary and are automatically regenerated when they expire.
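
    As a small illustration of what an SDK has to do with this payload, the sketch below parses a document of the same shape and computes how long the credentials remain valid. The timestamps are taken from the response above, the key values are replaced by placeholders, and the helper names are made up for this example.

```python
import json
from datetime import datetime, timezone

SAMPLE = """{
  "Code": "Success",
  "LastUpdated": "2021-08-03T09:18:49Z",
  "Type": "AWS-HMAC",
  "AccessKeyId": "<ACCESS_KEY_ID>",
  "SecretAccessKey": "<SECRET_ACCESS_KEY>",
  "Token": "<SESSION_TOKEN>",
  "Expiration": "2021-08-03T15:47:26Z"
}"""


def parse_time(value: str) -> datetime:
    """Parse the UTC timestamp format used in the metadata response."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def seconds_left(document: str, now: datetime) -> float:
    """Return how many seconds remain until the credentials expire."""
    creds = json.loads(document)
    return (parse_time(creds["Expiration"]) - now).total_seconds()


# Validity window of the sample, measured from the moment it was issued.
issued = parse_time(json.loads(SAMPLE)["LastUpdated"])
print(seconds_left(SAMPLE, issued))
```

    For the sample this works out to roughly six and a half hours, so a long-running process must be prepared to fetch fresh credentials rather than cache the first response forever.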

    Solution:

    Now that you know how IAM roles work and how important the link-local address is, you have probably guessed what needs to be done so that you can access IAM role credentials from your local machine. Two solutions come to mind:

    1. Host a lightweight reverse proxy server like Nginx and then write a wrapper around your SDK so that initial calls are made to EC2 and credentials are retrieved.

    2. Route traffic that originates from your system and targets 169.254.169.254 through an EC2 instance. The traffic should reach the EC2 instance, and EC2 itself should take care of forwarding packets to the instance metadata server.

    The second solution may sound pretty techy, but it is the ideal solution, and you don’t need to do additional tweaking in your SDK. The implementation is transparent to the developer. This blog will focus on implementing the second solution.

    Implementation:

    1. Launch a Linux (Ubuntu 20.04 LTS preferred) EC2 instance from the AWS console and attach the IAM role with appropriate permissions. The instance should be in a public subnet, and make sure to attach an Elastic IP address. In your instance security group, allow incoming UDP port 1194 (open to the world) and TCP port 22 (ssh, open to your IP address only).

    2. Install the OpenVPN and git packages: apt update; apt install git openvpn.

    3. Clone easy-rsa repository on your server. cd ~;git clone https://github.com/OpenVPN/easy-rsa.git

    4. Generate certificates for OpenVPN server and client using easy-rsa.

    #switch to easy-rsa directory
    cd ~/easy-rsa/easyrsa3
    #copy vars.example to vars
    cp vars.example vars
    #Find below variables in "vars" file and edit them according to your needs
    set_var EASYRSA_REQ_COUNTRY    "US"
    set_var EASYRSA_REQ_PROVINCE   "California"
    set_var EASYRSA_REQ_CITY       "San Francisco"
    set_var EASYRSA_REQ_ORG        "Copyleft Certificate Co"
    set_var EASYRSA_REQ_EMAIL      "me@example.net"
    set_var EASYRSA_REQ_OU         "My Organizational Unit"
    #Also edit below two variables if you plan to run easyrsa in non-interactive mode
    # EASYRSA_REQ_CN should be set to your ElasticIP Address.
    # Note: If you are using openvpn behind a load balancer, or if you plan to map DNS to your server, then this should be set to your DNS name
    set_var EASYRSA_REQ_CN         "Your Instance Elastic IP"
    set_var EASYRSA_BATCH          "NONEMPTY"
    #====================================================
    #Generate certificate and keys for server and client
    ./easyrsa init-pki
    ./easyrsa build-ca nopass
    ./easyrsa gen-dh
    ./easyrsa build-server-full server nopass
    ./easyrsa build-client-full client nopass
    #Copy certificates and keys to server configuration
    cp -p ./pki/ca.crt /etc/openvpn/
    cp -p ./pki/issued/server.crt /etc/openvpn/
    cp -p ./pki/private/server.key /etc/openvpn/
    cp -p ./pki/dh.pem /etc/openvpn/dh2048.pem
    cd /etc/openvpn
    openvpn --genkey --secret myvpn.tlsauth
    echo "net.ipv4.ip_forward = 1" >>/etc/sysctl.conf
    sysctl -p

    5. Configure OpenVPN server.conf file:

    port 1194
    proto udp
    dev tun
    ca ca.crt
    cert server.crt
    key server.key # This file should be kept secret
    dh dh2048.pem
    topology subnet
    server 10.8.0.0 255.255.255.0
    ifconfig-pool-persist ipp.txt
    push "redirect-gateway def1 bypass-dhcp"
    push "dhcp-option DNS 8.8.8.8"
    push "dhcp-option DNS 1.1.1.1"
    push "route 169.254.169.254 255.255.255.255"
    keepalive 10 120
    tls-auth myvpn.tlsauth 0
    cipher AES-256-CBC
    comp-lzo
    user nobody
    group nogroup
    persist-key
    persist-tun
    status openvpn-status.log
    log-append  /var/log/openvpn.log
    verb 4
    explicit-exit-notify 1
    remote-cert-eku "TLS Web Client Authentication"

    In the above configuration file, make sure line number 9 (server 10.8.0.0 255.255.255.0) does not conflict with your AWS VPC CIDR. Line number 14 (push “route 169.254.169.254 255.255.255.255”) does the trick for us and is the heart of this blog post. It ensures that when a client connects via OpenVPN, a route is added to the client machine so that packets targeting 169.254.169.254 are routed via the OpenVPN tunnel. (Note: If you do not add this here, you can manually add a route on the client side once OpenVPN is connected: ip route add 169.254.169.254/32 via YOUR_TUNNEL_IP dev tun0)
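
    The CIDR conflict check can be done with Python's ipaddress module before starting the server. This is a minimal sketch: the VPC ranges below are examples only, so substitute the CIDR of your own VPC.

```python
import ipaddress


def overlaps(vpn_subnet: str, vpc_cidr: str) -> bool:
    """Return True if the two networks share at least one address."""
    return ipaddress.ip_network(vpn_subnet).overlaps(ipaddress.ip_network(vpc_cidr))


# Network and netmask from the `server` directive in server.conf.
VPN_SUBNET = "10.8.0.0/255.255.255.0"

print(overlaps(VPN_SUBNET, "172.31.0.0/16"))  # example VPC range, no conflict
print(overlaps(VPN_SUBNET, "10.0.0.0/8"))     # example VPC range that conflicts

# The metadata address sits in the link-local block, so it is never routed
# beyond the local link unless a route like the pushed one is added.
print(ipaddress.ip_address("169.254.169.254").is_link_local)
```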

    6. Generate an OpenVPN client configuration file:

    #These commands are executed on your EC2 instance (OpenVPN server)
    cd ~/easy-rsa/easyrsa3
    cat <<EOF >/tmp/client.ovpn
    client
    dev tun
    proto udp
    remote YOUR-ELASTIC-IP-ADDRESS 1194
    resolv-retry infinite
    nobind
    persist-key
    persist-tun
    cipher AES-256-CBC
    comp-lzo
    verb 3
    key-direction 1
    EOF
    #append ca certificate
    echo '<ca>' >>/tmp/client.ovpn
    cat ./pki/ca.crt >>/tmp/client.ovpn
    echo '</ca>' >>/tmp/client.ovpn
    #append client certificate
    echo '<cert>' >>/tmp/client.ovpn
    sed -n '/BEGIN CERTIFICATE/,/END CERTIFICATE/{p;/END CERTIFICATE/q}' ./pki/issued/client.crt >>/tmp/client.ovpn
    echo '</cert>' >>/tmp/client.ovpn
    #append client key
    echo '<key>' >>/tmp/client.ovpn
    cat ./pki/private/client.key >>/tmp/client.ovpn
    echo '</key>' >>/tmp/client.ovpn
    #append TLS auth key
    echo '<tls-auth>' >>/tmp/client.ovpn
    cat /etc/openvpn/myvpn.tlsauth >>/tmp/client.ovpn
    echo '</tls-auth>' >>/tmp/client.ovpn

    In the above configuration file, make sure to update the remote line (remote YOUR-ELASTIC-IP-ADDRESS 1194). This could be your EC2 Elastic IP address (or domain if mapped and configured).
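
    The shell commands above build a single-file client profile by wrapping each credential in an inline block such as <ca>...</ca>. The sed step keeps only the PEM part of the client certificate, since an issued certificate file may carry a human-readable dump before it. The same two operations can be sketched as follows; the helper names and the certificate text are placeholders, not real output.

```python
import re


def first_pem_block(text: str) -> str:
    """Return the first BEGIN/END CERTIFICATE block, dropping any text around it."""
    match = re.search(
        r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", text, re.DOTALL
    )
    if match is None:
        raise ValueError("no certificate block found")
    return match.group(0)


def inline(tag: str, body: str) -> str:
    """Wrap `body` in an inline block for a .ovpn file, e.g. <cert>...</cert>."""
    return f"<{tag}>\n{body.strip()}\n</{tag}>\n"


# Placeholder standing in for ./pki/issued/client.crt.
issued_cert = (
    "Certificate:\n"
    "    Data: (human-readable dump)\n"
    "-----BEGIN CERTIFICATE-----\n"
    "<BASE64_CERTIFICATE_DATA>\n"
    "-----END CERTIFICATE-----\n"
)

print(inline("cert", first_pem_block(issued_cert)))
```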

    7. Finally, download the /tmp/client.ovpn file to your local machine. Install the OpenVPN client software, import the client.ovpn file, and connect. If you are using a Linux machine, you may connect using sudo openvpn --config /path/to/client.ovpn.

    Testing:

    Let us say you have configured the IAM role with a permission that lets you list S3 buckets. You should be able to access AWS resources once the OpenVPN client is connected. Your SDK should automatically look for credentials via the metadata link-local address. You may install the aws-cli utility and run aws s3 ls to list S3 buckets.

    Conclusion:

    IAM roles are meant to be used with AWS resources like EC2, ECS, Lambda, etc. so that you don’t keep the credentials hardcoded in the code or in a configuration file left unsecured on the disk. Our goal was to use the IAM role directly from the local machine (laptop). We achieved this by using an OpenVPN secure SSL tunnel. The VPN ensures that we are in a private network, thus keeping the environment compliant. This guide does not cover how to properly set up and secure an OpenVPN server/client, so you must harden the OpenVPN server yourself. You may put the server behind a network load balancer and enforce MAC binding for your clients.