Tag: load balancers

  • A Primer on HTTP Load Balancing in Kubernetes using Ingress on Google Cloud Platform

    Containerized applications and Kubernetes adoption in cloud environments is on the rise. One of the challenges while deploying applications in Kubernetes is exposing these containerized applications to the outside world. This blog explores different options via which applications can be externally accessed with focus on Ingress – a new feature in Kubernetes that provides an external load balancer. This blog also provides a simple hand-on tutorial on Google Cloud Platform (GCP).  

    Ingress is the new feature (currently in beta) from Kubernetes which aspires to be an Application Load Balancer intending to simplify the ability to expose your applications and services to the outside world. It can be configured to give services externally-reachable URLs, load balance traffic, terminate SSL, offer name based virtual hosting etc. Before we dive into Ingress, let’s look at some of the alternatives currently available that help expose your applications, their complexities/limitations and then try to understand Ingress and how it addresses these problems.

    Current ways of exposing applications externally:

    There are certain ways using which you can expose your applications externally. Lets look at each of them:

    EXPOSE Pod:

    You can expose your application directly from your pod by using a port from the node which is running your pod, mapping that port to a port exposed by your container and using the combination of your HOST-IP:HOST-PORT to access your application externally. This is similar to what you would have done when running docker containers directly without using Kubernetes. Using Kubernetes you can use hostPortsetting in service configuration which will do the same thing. Another approach is to set hostNetwork: true in service configuration to use the host’s network interface from your pod.

    Limitations:

    • In both scenarios you should take extra care to avoid port conflicts at the host, and possibly some issues with packet routing and name resolutions.
    • This would limit running only one replica of the pod per cluster node as the hostport you use is unique and can bind with only one service.

    EXPOSE Service:

    Kubernetes services primarily work to interconnect different pods which constitute an application. You can scale the pods of your application very easily using services. Services are not primarily intended for external access, but there are some accepted ways to expose services to the external world.

    Basically, services provide a routing, balancing and discovery mechanism for the pod’s endpoints. Services target pods using selectors, and can map container ports to service ports. A service exposes one or more ports, although usually, you will find that only one is defined.

    A service can be exposed using 3 ServiceType choices:

    • ClusterIP: Exposes the service on a cluster-internal IP. Choosing this value makes the service only reachable from within the cluster. This is the default ServiceType.
    • NodePort: Exposes the service on each Node’s IP at a static port (the NodePort). A ClusterIP service, to which the NodePort service will route, is automatically created. You’ll be able to contact the NodePort service, from outside the cluster, by requesting <nodeip>:<nodeport>.Here NodePort remains fixed and NodeIP can be any node IP of your Kubernetes cluster.</nodeport></nodeip>
    • LoadBalancer: Exposes the service externally using a cloud provider’s load balancer (eg. AWS ELB). NodePort and ClusterIP services, to which the external load balancer will route, are automatically created.
    • ExternalName: Maps the service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up. This requires version 1.7 or higher of kube-dns

    Limitations:

    • If we choose NodePort to expose our services, kubernetes will generate ports corresponding to the ports of your pods in the range of 30000-32767. You will need to add an external proxy layer that uses DNAT to expose more friendly ports. The external proxy layer will also have to take care of load balancing so that you leverage the power of your pod replicas. Also it would not be easy to add TLS or simple host header routing rules to the external service.
    • ClusterIP and ExternalName similarly while easy to use have the limitation where we can add any routing or load balancing rules.
    • Choosing LoadBalancer is probably the easiest of all methods to get your service exposed to the internet. The problem is that there is no standard way of telling a Kubernetes service about the elements that a balancer requires, again TLS and host headers are left out. Another limitation is reliance on an external load balancer (AWS’s ELB, GCP’s Cloud Load Balancer etc.)

    Endpoints

    Endpoints are usually automatically created by services, unless you are using headless services and adding the endpoints manually. An endpoint is a host:port tuple registered at Kubernetes, and in the service context it is used to route traffic. The service tracks the endpoints as pods, that match the selector are created, deleted and modified. Individually, endpoints are not useful to expose services, since they are to some extent ephemeral objects.

    Summary

    If you can rely on your cloud provider to correctly implement the LoadBalancer for their API, to keep up-to-date with Kubernetes releases, and you are happy with their management interfaces for DNS and certificates, then setting up your services as type LoadBalancer is quite acceptable.

    On the other hand, if you want to manage load balancing systems manually and set up port mappings yourself, NodePort is a low-complexity solution. If you are directly using Endpoints to expose external traffic, perhaps you already know what you are doing (but consider that you might have made a mistake, there could be another option).

    Given that none of these elements has been originally designed to expose services to the internet, their functionality may seem limited for this purpose.

    Understanding Ingress

    Traditionally, you would create a LoadBalancer service for each public application you want to expose. Ingress gives you a way to route requests to services based on the request host or path, centralizing a number of services into a single entrypoint.

    Ingress is split up into two main pieces. The first is an Ingress resource, which defines how you want requests routed to the backing services and second is the Ingress Controller which does the routing and also keeps track of the changes on a service level.

    Ingress Resources

    The Ingress resource is a set of rules that map to Kubernetes services. Ingress resources are defined purely within Kubernetes as an object that other entities can watch and respond to.

    Ingress Supports defining following rules in beta stage:

    • host header:  Forward traffic based on domain names.
    • paths: Looks for a match at the beginning of the path.
    • TLS: If the ingress adds TLS, HTTPS and a certificate configured through a secret will be used.

    When no host header rules are included at an Ingress, requests without a match will use that Ingress and be mapped to the backend service. You will usually do this to send a 404 page to requests for sites/paths which are not sent to the other services. Ingress tries to match requests to rules, and forwards them to backends, which are composed of a service and a port.

    Ingress Controllers

    Ingress controller is the entity which grants (or remove) access, based on the changes in the services, pods and Ingress resources. Ingress controller gets the state change data by directly calling Kubernetes API.

    Ingress controllers are applications that watch Ingresses in the cluster and configure a balancer to apply those rules. You can configure any of the third party balancers like HAProxy, NGINX, Vulcand or Traefik to create your version of the Ingress controller.  Ingress controller should track the changes in ingress resources, services and pods and accordingly update configuration of the balancer.

    Ingress controllers will usually track and communicate with endpoints behind services instead of using services directly. This way some network plumbing is avoided, and we can also manage the balancing strategy from the balancer. Some of the open source implementations of Ingress Controllers can be found here.

    Now, let’s do an exercise of setting up a HTTP Load Balancer using Ingress on Google Cloud Platform (GCP), which has already integrated the ingress feature in it’s Container Engine (GKE) service.

    Ingress-based HTTP Load Balancer in Google Cloud Platform

    The tutorial assumes that you have your GCP account setup done and a default project created. We will first create a Container cluster, followed by deployment of a nginx server service and an echoserver service. Then we will setup an ingress resource for both the services, which will configure the HTTP Load Balancer provided by GCP

    Basic Setup

    Get your project ID by going to the “Project info” section in your GCP dashboard. Start the Cloud Shell terminal, set your project id and the compute/zone in which you want to create your cluster.

    $ gcloud config set project glassy-chalice-129514$ 
    gcloud config set compute/zone us-east1-d
    # Create a 3 node cluster with name “loadbalancedcluster”$ 
    gcloud container clusters create loadbalancedcluster  

    Fetch the cluster credentials for the kubectl tool:

    $ gcloud container clusters get-credentials loadbalancedcluster --zone us-east1-d --project glassy-chalice-129514

    Step 1: Deploy an nginx server and echoserver service

    $ kubectl run nginx --image=nginx --port=80
    $ kubectl run echoserver --image=gcr.io/google_containers/echoserver:1.4 --port=8080
    $ kubectl get deployments
    NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    echoserver   1         1         1            1           15s
    nginx        1         1         1            1           26m

    Step 2: Expose your nginx and echoserver deployment as a service internally

    Create a Service resource to make the nginx and echoserver deployment reachable within your container cluster:

    $ kubectl expose deployment nginx --target-port=80  --type=NodePort
    $ kubectl expose deployment echoserver --target-port=8080 --type=NodePort

    When you create a Service of type NodePort with this command, Container Engine makes your Service available on a randomly-selected high port number (e.g. 30746) on all the nodes in your cluster. Verify the Service was created and a node port was allocated:

    $ kubectl get service nginx
    NAME      CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
    nginx     10.47.245.54   <nodes>       80:30746/TCP   20s
    $ kubectl get service echoserver
    NAME         CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
    echoserver   10.47.251.9   <nodes>       8080:32301/TCP   33s

    In the output above, the node port for the nginx Service is 30746 and for echoserver service is 32301. Also, note that there is no external IP allocated for this Services. Since the Container Engine nodes are not externally accessible by default, creating this Service does not make your application accessible from the Internet. To make your HTTP(S) web server application publicly accessible, you need to create an Ingress resource.

    Step 3: Create an Ingress resource

    On Container Engine, Ingress is implemented using Cloud Load Balancing. When you create an Ingress in your cluster, Container Engine creates an HTTP(S) load balancer and configures it to route traffic to your application. Container Engine has internally defined an Ingress Controller, which takes the Ingress resource as input for setting up proxy rules and talk to Kubernetes API to get the service related information.

    The following config file defines an Ingress resource that directs traffic to your nginx and echoserver server:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata: 
    name: fanout-ingress
    spec: 
    rules: 
    - http:     
    paths:     
    - path: /       
    backend:         
    serviceName: nginx         
    servicePort: 80     
    - path: /echo       
    backend:         
    serviceName: echoserver         
    servicePort: 8080

    To deploy this Ingress resource run in the cloud shell:

    $ kubectl apply -f basic-ingress.yaml

    Step 4: Access your application

    Find out the external IP address of the load balancer serving your application by running:

    $ kubectl get ingress fanout-ingres
    NAME             HOSTS     ADDRESS          PORTS     AG
    fanout-ingress   *         130.211.36.168   80        36s    

     

    Use http://<external-ip-address> </external-ip-address>and http://<external-ip-address>/echo</external-ip-address> to access nginx and the echo-server.

    Summary

    Ingresses are simple and very easy to deploy, and really fun to play with. However, it’s currently in beta phase and misses some of the features that may restrict it from production use. Stay tuned to get updates in Ingress on Kubernetes page and their Github repo.

    References

  • Demystifying High Availability in Kubernetes Using Kubeadm

    Introduction

    The rise of containers has reshaped the way we develop, deploy and maintain the software. Containers allow us to package the different services that constitute an application into separate containers, and to deploy those containers across a set of virtual and physical machines. This gives rise to container orchestration tool to automate the deployment, management, scaling and availability of a container-based application. Kubernetes allows deployment and management of container-based applications at scale. Learn more about backup and disaster recovery for your Kubernetes clusters.

    One of the main advantages of Kubernetes is how it brings greater reliability and stability to the container-based distributed application, through the use of dynamic scheduling of containers. But, how do you make sure Kubernetes itself stays up when a component or its master node goes down?
     

     

    Why we need Kubernetes High Availability?

    Kubernetes High-Availability is about setting up Kubernetes, along with its supporting components in a way that there is no single point of failure. A single master cluster can easily fail, while a multi-master cluster uses multiple master nodes, each of which has access to same worker nodes. In a single master cluster the important component like API server, controller manager lies only on the single master node and if it fails you cannot create more services, pods etc. However, in case of Kubernetes HA environment, these important components are replicated on multiple masters(usually three masters) and if any of the masters fail, the other masters keep the cluster up and running.

    Advantages of multi-master

    In the Kubernetes cluster, the master node manages the etcd database, API server, controller manager, and scheduler, along with all the worker nodes. What if we have only a single master node and if that node fails, all the worker nodes will be unscheduled and the cluster will be lost.

    In a multi-master setup, by contrast, a multi-master provides high availability for a single cluster by running multiple apiserver, etcd, controller-manager, and schedulers. This does not only provides redundancy but also improves network performance because all the masters are dividing the load among themselves.

    A multi-master setup protects against a wide range of failure modes, from a loss of a single worker node to the failure of the master node’s etcd service. By providing redundancy, a multi-master cluster serves as a highly available system for your end-users.

    Steps to Achieve Kubernetes HA

    Before moving to steps to achieve high-availability, let us understand what we are trying to achieve through a diagram:

    (Image Source: Kubernetes Official Documentation)

    Master Node: Each master node in a multi-master environment run its’ own copy of Kube API server. This can be used for load balancing among the master nodes. Master node also runs its copy of the etcd database, which stores all the data of cluster. In addition to API server and etcd database, the master node also runs k8s controller manager, which handles replication and scheduler, which schedules pods to nodes.

    Worker Node: Like single master in the multi-master cluster also the worker runs their own component mainly orchestrating pods in the Kubernetes cluster. We need 3 machines which satisfy the Kubernetes master requirement and 3 machines which satisfy the Kubernetes worker requirement.

    For each master, that has been provisioned, follow the installation guide to install kubeadm and its dependencies. In this blog we will use k8s 1.10.4 to implement HA.

    Note: Please note that cgroup driver for docker and kubelet differs in some version of k8s, make sure you change cgroup driver to cgroupfs for docker and kubelet. If cgroup driver for kubelet and docker differs then the master doesn’t come up when rebooted.

    Setup etcd cluster

    1. Install cfssl and cfssljson

    $ curl -o /usr/local/bin/cfssl https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 
    $ curl -o /usr/local/bin/cfssljson https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
    $ chmod +x /usr/local/bin/cfssl*
    $ export PATH=$PATH:/usr/local/bin

    2 . Generate certificates on master-0

    $ mkdir -p /etc/kubernetes/pki/etcd
    $ cd /etc/kubernetes/pki/etcd

    3. Create config.json file in /etc/kubernetes/pki/etcd folder with following content.

    {
        "signing": {
            "default": {
                "expiry": "43800h"
            },
            "profiles": {
                "server": {
                    "expiry": "43800h",
                    "usages": [
                        "signing",
                        "key encipherment",
                        "server auth",
                        "client auth"
                    ]
                },
                "client": {
                    "expiry": "43800h",
                    "usages": [
                        "signing",
                        "key encipherment",
                        "client auth"
                    ]
                },
                "peer": {
                    "expiry": "43800h",
                    "usages": [
                        "signing",
                        "key encipherment",
                        "server auth",
                        "client auth"
                    ]
                }
            }
        }
    }

    4. Create ca-csr.json file in /etc/kubernetes/pki/etcd folder with following content.

    {
        "CN": "etcd",
        "key": {
            "algo": "rsa",
            "size": 2048
        }
    }

    5. Create client.json file in /etc/kubernetes/pki/etcd folder with following content.

    {
        "CN": "client",
        "key": {
            "algo": "ecdsa",
            "size": 256
        }
    }

    $ cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
    $ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client

    6. Create a directory  /etc/kubernetes/pki/etcd on master-1 and master-2 and copy all the generated certificates into it.

    7. On all masters, now generate peer and etcd certs in /etc/kubernetes/pki/etcd. To generate them, we need the previous CA certificates on all masters.

    $ export PEER_NAME=$(hostname)
    $ export PRIVATE_IP=$(ip addr show eth0 | grep -Po 'inet K[d.]+')
    
    $ cfssl print-defaults csr > config.json
    $ sed -i 's/www.example.net/'"$PRIVATE_IP"'/' config.json
    $ sed -i 's/example.net/'"$PEER_NAME"'/' config.json
    $ sed -i '0,/CN/{s/example.net/'"$PEER_NAME"'/}' config.json
    
    $ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server config.json | cfssljson -bare server
    $ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer config.json | cfssljson -bare peer

    This will replace the default configuration with your machine’s hostname and IP address, so in case if you encounter any problem just check the hostname and IP address are correct and rerun cfssl command.

    8. On all masters, Install etcd and set it’s environment file.

    $ yum install etcd -y
    $ touch /etc/etcd.env
    $ echo "PEER_NAME=$PEER_NAME" >> /etc/etcd.env
    $ echo "PRIVATE_IP=$PRIVATE_IP" >> /etc/etcd.env

    9. Now, we will create a 3 node etcd cluster on all 3 master nodes. Starting etcd service on all three nodes as systemd. Create a file /etc/systemd/system/etcd.service on all masters.

    [Unit]
    Description=etcd
    Documentation=https://github.com/coreos/etcd
    Conflicts=etcd.service
    Conflicts=etcd2.service
    
    [Service]
    EnvironmentFile=/etc/etcd.env
    Type=notify
    Restart=always
    RestartSec=5s
    LimitNOFILE=40000
    TimeoutStartSec=0
    
    ExecStart=/bin/etcd --name <host_name>  --data-dir /var/lib/etcd --listen-client-urls http://<host_private_ip>:2379,http://127.0.0.1:2379 --advertise-client-urls http://<host_private_ip>:2379 --listen-peer-urls http://<host_private_ip>:2380 --initial-advertise-peer-urls http://<host_private_ip>:2380 --cert-file=/etc/kubernetes/pki/etcd/server.pem --key-file=/etc/kubernetes/pki/etcd/server-key.pem --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem --peer-cert-file=/etc/kubernetes/pki/etcd/peer.pem --peer-key-file=/etc/kubernetes/pki/etcd/peer-key.pem --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem --initial-cluster master-0=http://<master0_private_ip>:2380,master-1=http://<master1_private_ip>:2380,master-2=http://<master2_private_ip>:2380 --initial-cluster-token my-etcd-token --initial-cluster-state new --client-cert-auth=false --peer-client-cert-auth=false
    
    [Install]
    WantedBy=multi-user.target

    10. Ensure that you will replace the following placeholder with

    • <host_name> : Replace as the master’s hostname</host_name>
    • <host_private_ip>: Replace as the current host private IP</host_private_ip>
    • <master0_private_ip>: Replace as the master-0 private IP</master0_private_ip>
    • <master1_private_ip>: Replace as the master-1 private IP</master1_private_ip>
    • <master2_private_ip>: Replace as the master-2 private IP</master2_private_ip>

    11. Start the etcd service on all three master nodes and check the etcd cluster health:

    $ systemctl daemon-reload
    $ systemctl enable etcd
    $ systemctl start etcd
    
    $ etcdctl cluster-health

    This will show the cluster healthy and connected to all three nodes.

    Setup load balancer

    There are multiple cloud provider solutions for load balancing like AWS elastic load balancer, GCE load balancing etc. There might not be a physical load balancer available, we can setup a virtual IP load balancer to healthy node master. We are using keepalived for load balancing, install keepalived on all master nodes

    $ yum install keepalived -y

    Create the following configuration file /etc/keepalived/keepalived.conf on all master nodes:

    ! Configuration File for keepalived
    global_defs {
      router_id LVS_DEVEL
    }
    
    vrrp_script check_apiserver {
      script "/etc/keepalived/check_apiserver.sh"
      interval 3
      weight -2
      fall 10
      rise 2
    }
    
    vrrp_instance VI_1 {
        state <state>
        interface <interface>
        virtual_router_id 51
        priority <priority>
        authentication {
            auth_type PASS
            auth_pass velotiotechnologies
        }
        virtual_ipaddress {
            <virtual ip>
        }
        track_script {
            check_apiserver
        }
    }

    • state is either MASTER (on the first master nodes) or BACKUP (the other master nodes).
    • Interface is generally the primary interface, in my case it is eth0
    • Priority should be higher for master node e.g 101 and lower for others e.g 100
    • Virtual_ip should contain the virtual ip of master nodes

    Install the following health check script to /etc/keepalived/check_apiserver.sh on all master nodes:

    #!/bin/sh
    
    errorExit() {
        echo "*** $*" 1>&2
        exit 1
    }
    
    curl --silent --max-time 2 --insecure https://localhost:6443/ -o /dev/null || errorExit "Error GET https://localhost:6443/"
    if ip addr | grep -q <VIRTUAL-IP>; then
        curl --silent --max-time 2 --insecure https://<VIRTUAL-IP>:6443/ -o /dev/null || errorExit "Error GET https://<VIRTUAL-IP>:6443/"
    fi

    $ systemctl restart keepalived

    Setup three master node cluster

    Run kubeadm init on master0:

    Create config.yaml file with following content.

    apiVersion: kubeadm.k8s.io/v1alpha1
    kind: MasterConfiguration
    api:
      advertiseAddress: <master-private-ip>
    etcd:
      endpoints:
      - http://<master0-ip-address>:2379
      - http://<master1-ip-address>:2379
      - http://<master2-ip-address>:2379
      caFile: /etc/kubernetes/pki/etcd/ca.pem
      certFile: /etc/kubernetes/pki/etcd/client.pem
      keyFile: /etc/kubernetes/pki/etcd/client-key.pem
    networking:
      podSubnet: <podCIDR>
    apiServerCertSANs:
    - <load-balancer-ip>
    apiServerExtraArgs:
      endpoint-reconciler-type: lease

    Please ensure that the following placeholders are replaced:

    • <master-private-ip> with the private IPv4 of the master server on which config file resides.</master-private-ip>
    • <master0-ip-address>, <master1-ip-address> and <master-2-ip-address> with the IP addresses of your three master nodes</master-2-ip-address></master1-ip-address></master0-ip-address>
    • <podcidr> with your Pod CIDR. Please read the </podcidr>CNI network section of the docs for more information. Some CNI providers do not require a value to be set. I am using weave-net as pod network, hence podCIDR will be 10.32.0.0/12
    • <load-balancer-ip> with the virtual IP set up in the load balancer in the previous section.</load-balancer-ip>
    $ kubeadm init --config=config.yaml

    10. Run kubeadm init on master1 and master2:

    First of all copy /etc/kubernetes/pki/ca.crt, /etc/kubernetes/pki/ca.key, /etc/kubernetes/pki/sa.key, /etc/kubernetes/pki/sa.pub to master1’s and master2’s /etc/kubernetes/pki folder.

    Note: Copying this files is crucial, otherwise the other two master nodes won’t go into the ready state.

    Copy the config file config.yaml from master0 to master1 and master2. We need to change <master-private-ip> to current master host’s private IP.</master-private-ip>

    $ kubeadm init --config=config.yaml

    11. Now you can install pod network on all three masters to bring them in the ready state. I am using weave-net pod network, to apply weave-net run:

    export kubever=$(kubectl version | base64 | tr -d 'n') kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"

    12. By default, k8s doesn’t schedule any workload on the master, so if you want to schedule workload on master node as well, taint all the master nodes using the command:

    $ kubectl taint nodes --all node-role.kubernetes.io/master-

    13. Now that we have functional master nodes, we can join some worker nodes:

    Use the join string you got at the end of kubeadm init command

    $ kubeadm join 10.0.1.234:6443 --token llb1kx.azsbunpbg13tgc8k --discovery-token-ca-cert-hash sha256:1ad2a436ce0c277d0c5bd3826091e72badbd8417ffdbbd4f6584a2de588bf522

    High Availability in action

    The Kubernetes HA cluster will look like:

    [root@master-0 centos]# kubectl get nodes
    NAME       STATUS     ROLES     AGE       VERSION
    master-0   NotReady   master    4h        v1.10.4
    master-1   Ready      master    4h        v1.10.4
    master-2   Ready      master    4h        v1.10.4

    [root@master-0 centos]# kubectl get pods -n kube-system
    NAME                              READY     STATUS     RESTARTS   AGE
    kube-apiserver-master-0            1/1       Unknown    0          4h
    kube-apiserver-master-1            1/1       Running    0          4h
    kube-apiserver-master-2            1/1       Running    0          4h
    kube-controller-manager-master-0   1/1       Unknown    0          4h
    kube-controller-manager-master-1   1/1       Running    0          4h
    kube-controller-manager-master-2   1/1       Running    0          4h
    kube-dns-86f4d74b45-wh795          3/3       Running    0          4h
    kube-proxy-9ts6r                   1/1       Running    0          4h
    kube-proxy-hkbn7                   1/1       NodeLost   0          4h
    kube-proxy-sq6l6                   1/1       Running    0          4h
    kube-scheduler-master-0            1/1       Unknown    0          4h
    kube-scheduler-master-1            1/1       Running    0          4h
    kube-scheduler-master-2            1/1       Running    0          4h
    weave-net-6nzbq                    2/2       NodeLost   0          4h
    weave-net-ndx2q                    2/2       Running    0          4h
    weave-net-w2mfz                    2/2       Running    0          4h

    After failing over one master node the Kubernetes cluster is still accessible.

    [root@master-0 centos]# kubectl get nodes
    NAME       STATUS     ROLES     AGE       VERSION
    master-0   NotReady   master    4h        v1.10.4
    master-1   Ready      master    4h        v1.10.4
    master-2   Ready      master    4h        v1.10.4

    [root@master-0 centos]# kubectl get pods -n kube-system
    NAME                              READY     STATUS     RESTARTS   AGE
    kube-apiserver-master-0            1/1       Unknown    0          4h
    kube-apiserver-master-1            1/1       Running    0          4h
    kube-apiserver-master-2            1/1       Running    0          4h
    kube-controller-manager-master-0   1/1       Unknown    0          4h
    kube-controller-manager-master-1   1/1       Running    0          4h
    kube-controller-manager-master-2   1/1       Running    0          4h
    kube-dns-86f4d74b45-wh795          3/3       Running    0          4h
    kube-proxy-9ts6r                   1/1       Running    0          4h
    kube-proxy-hkbn7                   1/1       NodeLost   0          4h
    kube-proxy-sq6l6                   1/1       Running    0          4h
    kube-scheduler-master-0            1/1       Unknown    0          4h
    kube-scheduler-master-1            1/1       Running    0          4h
    kube-scheduler-master-2            1/1       Running    0          4h
    weave-net-6nzbq                    2/2       NodeLost   0          4h
    weave-net-ndx2q                    2/2       Running    0          4h
    weave-net-w2mfz                    2/2       Running    0          4h

    Even after one node failed, all the important components are up and running. The cluster is still accessible and you can create more pods, deployment services etc.

    [root@master-1 centos]# kubectl create -f nginx.yaml 
    deployment.apps "nginx-deployment" created
    [root@master-1 centos]# kubectl get pods -o wide
    NAME                                READY     STATUS    RESTARTS   AGE       IP              NODE
    nginx-deployment-75675f5897-884kc   1/1       Running   0          10s       10.117.113.98   master-2
    nginx-deployment-75675f5897-crgxt   1/1       Running   0          10s       10.117.113.2    master-1

    Conclusion

    High availability is an important part of reliability engineering, focused on making system reliable and avoid any single point of failure of the complete system. At first glance, its implementation might seem quite complex, but high availability brings tremendous advantages to the system that requires increased stability and reliability. Using highly available cluster is one of the most important aspects of building a solid infrastructure.

  • How Much Do You Really Know About Simplified Cloud Deployments?

    Is your EC2/VM bill giving you sleepless nights?

    Are your EC2 instances under-utilized? Have you been wondering if there was an easy way to maximize the EC2/VM usage?

    Are you investing too much in your Control Plane and wish you could divert some of that investment towards developing more features in your applications (business logic)?

    Is your Configuration Management system overwhelming you and seems to have got a life of its own?

    Do you have legacy applications that do not need Docker at all?

    Would you like to simplify your deployment toolchain to streamline your workflows?

    Have you been recommended to use Kubernetes as a problem to fix all your woes, but you aren’t sure if Kubernetes is actually going to help you?

    Do you feel you are moving towards Docker, just so that Kubernetes can be used?

    If you answered “Yes” to any of the questions above, do read on, this article is just what you might need.

    There are steps to create a simple setup on your laptop at the end of the article.

    Introduction

    In the following article, we will present the typical components of a multi-tier application and how it is setup and deployed.

    We shall further go on to see how the same application deployment can be remodeled for scale using any Cloud Infrastructure. (The same software toolchain can be used to deploy the application on your On-Premise Infrastructure as well)

    The tools that we propose are Nomad and Consul. We shall focus more on how to use these tools, rather than deep-dive into the specifics of the tools. We will briefly see the features of the software which would help us achieve our goals.

    • Nomad is a distributed workload manager for not only Docker containers, but also for various other types of workloads like legacy applications, JAVA, LXC, etc.

    More about Nomad Drivers here: Nomadproject.io, application delivery with HashiCorp, introduction to HashiCorp Nomad.

    • Consul is a distributed service mesh, with features like service registry and a key-value store, among others.

    Using these tools, the application/startup workflow would be as follows:

    Nomad will be responsible for starting the service.

    Nomad will publish the service information in Consul. The service information will include details like:

    • Where is the application running (IP:PORT) ?
    • What “service-name” is used to identify the application?
    • What “tags” (metadata) does this application have?

    A Typical Application

    A typical application deployment consists of a certain fixed set of processes, usually coupled with a database and a set of few (or many) peripheral services.

    These services could be primary (must-have) or support (optional) features of the application.

    Note: We are aware about what/how a proper “service-oriented-architecture” should be, though we will skip that discussion for now. We will rather focus on how real-world applications are setup and deployed.

    Simple Multi-tier Application

    In this section, let’s see the components of a multi-tier application along with typical access patterns from outside the system and within the system.

    • Load Balancer/Web/Front End Tier
    • Application Services Tier
    • Database Tier
    • Utility (or Helper Servers): To run background, cron, or queued jobs.

    Using a proxy/loadbalancer, the services (Service-A, Service-B, Service-C) could be accessed using distinct hostnames:

    • a.example.tld
    • b.example.tld
    • c.example.tld

    For an equivalent path-based routing approach, the setup would be similar. Instead of distinct hostnames, the communication mechanism would be:

    • common-proxy.example.tld/path-a/
    • common-proxy.example.tld/path-b/
    • common-proxy.example.tld/path-c/

    Problem Scenario 1

    Some of the basic problems with the deployment of the simple multi-tier application are:

    • What if the service process crashes during its runtime?
    • What if the host on which the services run shuts down, reboots or terminates?

    This is where Nomad’s feature of always keep the service running would be useful.

    In spite of this auto-restart feature, there could be issues if the service restarts on a different machine (i.e. different IP address).

    In case of Docker and ephemeral ports, the service could start on a different port as well.

    To solve this, we will use the service discovery feature provided by Consul, combined with a with a Consul-aware load-balancer/proxy to redirect traffic to the appropriate service.

    The order of the operations within the Nomad job will thus be:

    • Nomad will launch the job/task.
    • Nomad will register the task details as a service definition in Consul.
      (These steps will be re-executed if/when the application is restarted due to a crash/fail-over)
    • The Consul-aware load-balancer will route the traffic to the service (IP:PORT)

    Multi-tier Application With Load Balancer

    Using the Consul-aware load-balancer, the diagram will now look like:

    The details of the setup now are:

    • A Consul-aware load-balancer/proxy; the application will access the services via the load-balancer.
    • 3 (three) instances of service A; A1, A2, A3
    • 3 (three) instances of service B; B1, B2, B3

    The Routing Question

    At this moment, you could be wondering, “Why/How would the load-balancer know that it has to route traffic for service-A to A1/A2/A3 and route traffic for service-B to B1/B2/B3 ?”

    The answer lies in the Consul tags which will be published as part of the service definition (when Nomad registers the service in Consul).

    The appropriate Consul tags will tell the load-balancer to route traffic of a particular service to the appropriate backend. (+++)

    Let’s read that statement again (very slowly, just to be sure); The Consul tags, which are part of the service definition, will inform (advertise) the load-balancer to route traffic to the appropriate backend.

    The reason to dwell upon this distinction is very important, as this is different from how the classic load-balancer/proxy software like HAProxy or NGINX are configured. For HAProxy/NGINX the backend routing information resides with the load-balancer instance and is not “advertised” by the backend.

    The traditional load-balancers like NGINX/HAProxy do not natively support dynamic reloading of the backends. (when the backends stop/start/move-around). The heavy lifting of regenerating the configuration file and reloading the service is left up to an external entity like Consul-Template.

    The use of a Consul-aware load-balancer, instead of a traditional load-balancer, eliminates the need of external workarounds.

    The setup can thus be termed as a zero-configuration setup; you don’t have to re-configure the load-balancer, it will discover the changing backend services based on the information available from Consul.

    Problem Scenario 2

    So far we have achieved a method to “automatically” discover the backends, but isn’t the Load-Balancer itself a single-point-of-failure (SPOF)?

    It absolutely is, and you should always have redundant load-balancers instances (which is what any cloud-provided load-balancer has).

    As there is a certain cost associated with using “cloud-provided load-balancer”, we would create the load-balancers ourselves and not use cloud-provided load-balancers.

    To provide redundancy to the load-balancer instances, you should configure them using and AutoScalingGroup (AWS), VM Scale Sets (Azure), etc.

    The same redundancy strategy should also be used for the worker nodes, where the actual services reside, by using AutoScaling Groups/VMSS for the worker nodes.

    The Complete Picture

    Installation and Configuration

    Given that nowadays laptops are pretty powerful, you can easily create a test setup on your laptop using VirtualBox, VMware Workstation Player, VMware Workstation, etc.

    As a prerequisite, you will need a few virtual machines which can communicate with each other.

    NOTE: Create the VMs with networking set to bridged mode.

    The machines needed for the simple setup/demo would be:

    • 1 Linux VM to act as a server (srv1)
    • 1 Linux VM to act as a load-balancer (lb1)
    • 2 Linux VMs to act as worker machines (client1, client2)

    *** Each machine can be 2 CPU 1 GB memory each.

    The configuration files and scripts needed for the demo, which will help you set up the Nomad and Consul cluster are available here.

    Setup the Server

    Install the binaries on the server

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service

    Create the Server Configuration

    ### On the server machine ...
    
    ### Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/server.hcl -O /etc/consul/server.hcl
    
    ### Edit Consul's server.hcl file and setup the fields 'encrypt' and 'retry_join' as per your cluster.
    sudo vim /etc/consul/server.hcl
    
    ### Nomad
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/server.hcl -O /etc/nomad/server.hcl
    
    ### Edit Nomad's server.hcl file and setup the fields 'encrypt' and 'retry_join' as per your cluster.
    sudo vim /etc/nomad/server.hcl
    
    ### After you are done with the edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members

    Setup the Load-Balancer

    Install the binaries on the server

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Load-Balancer Configuration

    ### On the load-balancer machine ...
    
    ### for Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/client.hcl -O /etc/consul/client.hcl
    
    ### Edit Consul's client.hcl file and setup the fields 'name', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/consul/client.hcl
    
    ### for Nomad ...
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/client.hcl -O /etc/nomad/client.hcl
    
    ### Edit Nomad's client.hcl file and setup the fields 'name', 'node_class', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/nomad/client.hcl
    
    ### After you are done with the edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members
    sudo nomad node status -verbose

    Setup the Client (Worker) Machines

    Install the binaries on the server

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Worker Configuration

    ### On the client (worker) machine ...
    
    ### Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/client.hcl -O /etc/consul/client.hcl
    
    ### Edit Consul's client.hcl file and setup the fields 'name', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/consul/client.hcl
    
    ### Nomad
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/client.hcl -O /etc/nomad/client.hcl
    
    ### Edit Nomad's client.hcl file and setup the fields 'name', 'node_class', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/nomad/client.hcl
    
    ### After you are sure about your edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members
    sudo nomad node status -verbose

    Test the Setup

    For the sake of simplicity, we shall assume the following IP addresses for the machines. (You can adapt the IPs as per your actual cluster configuration)

    srv1: 192.168.1.11

    lb1: 192.168.1.101

    client1: 192.168.201

    client1: 192.168.202

    You can access the web GUI for Consul and Nomad at the following URLs:

    Consul: http://192.168.1.11:8500

    Nomad: http://192.168.1.11:4646

    Login into the server and start the following watch command:

    # watch -n 5 "consul members; echo; nomad server members; echo; nomad node status -verbose; echo; nomad job status"

    Output:

    Node     Address             Status  Type    Build  Protocol  DC   Segment
    srv1     192.168.1.11:8301   alive   server  1.5.1  2         dc1  <all>
    client1  192.168.1.201:8301  alive   client  1.5.1  2         dc1  <default>
    client2  192.168.1.202:8301  alive   client  1.5.1  2         dc1  <default>
    lb1      192.168.1.101:8301  alive   client  1.5.1  2         dc1  <default>
    
    Name         Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
    srv1.global  192.168.1.11  4648  alive   true    2         0.9.3  dc1         global
    
    ID           DC   Name     Class   Address        Version Drain  Eligibility  Status
    37daf354...  dc1  client2  worker  192.168.1.202  0.9.3  false  eligible     ready
    9bab72b1...  dc1  client1  worker  192.168.1.201  0.9.3  false  eligible     ready
    621f4411...  dc1  lb1      lb      192.168.1.101  0.9.3  false  eligible     ready

    Submit Jobs

    Login into the server (srv1) and download the sample jobs

    Run the load-balancer job

    # nomad run fabio_docker.nomad

    Output:

    ==> Monitoring evaluation "bb140467"
        Evaluation triggered by job "fabio_docker"
        Allocation "1a6a5587" created: node "621f4411", group "fabio"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "bb140467" finished with status "complete"

    Check the status of the load-balancer

    # nomad alloc status 1a6a5587

    Output:

    ID                  = 1a6a5587
    Eval ID             = bb140467
    Name                = fabio_docker.fabio[0]
    Node ID             = 621f4411
    Node Name           = lb1
    Job ID              = fabio_docker
    Job Version         = 0
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 1m9s ago
    Modified            = 1m3s ago
    
    Task "fabio" is "running"
    Task Resources
    CPU        Memory          Disk     Addresses
    5/200 MHz  10 MiB/128 MiB  300 MiB  lb: 192.168.1.101:9999
                                        ui: 192.168.1.101:9998
    
    Task Events:
    Started At     = 2019-06-13T19:15:17Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-13T19:15:17Z  Started     Task started by client
    2019-06-13T19:15:12Z  Driver      Downloading image
    2019-06-13T19:15:12Z  Task Setup  Building Task Directory
    2019-06-13T19:15:12Z  Received    Task received by client

    Run the service ‘foo’

    # nomad run foo_docker.nomad

    Output:

    ==> Monitoring evaluation "a994bbf0"
        Evaluation triggered by job "foo_docker"
        Allocation "7794b538" created: node "9bab72b1", group "gowebhello"
        Allocation "eecceffc" modified: node "37daf354", group "gowebhello"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "a994bbf0" finished with status "complete"

    Check the status of service ‘foo’

    # nomad alloc status 7794b538

    Output:

    ID                  = 7794b538
    Eval ID             = a994bbf0
    Name                = foo_docker.gowebhello[1]
    Node ID             = 9bab72b1
    Node Name           = client1
    Job ID              = foo_docker
    Job Version         = 1
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 9s ago
    Modified            = 7s ago
    
    Task "gowebhello" is "running"
    Task Resources
    CPU        Memory           Disk     Addresses
    0/500 MHz  4.2 MiB/256 MiB  300 MiB  http: 192.168.1.201:23382
    
    Task Events:
    Started At     = 2019-06-13T19:27:17Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-13T19:27:17Z  Started     Task started by client
    2019-06-13T19:27:16Z  Task Setup  Building Task Directory
    2019-06-13T19:27:15Z  Received    Task received by client

    Run the service ‘bar’

    # nomad run bar_docker.nomad

    Output:

    ==> Monitoring evaluation "075076bc"
        Evaluation triggered by job "bar_docker"
        Allocation "9f16354b" created: node "9bab72b1", group "gowebhello"
        Allocation "b86d8946" created: node "37daf354", group "gowebhello"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "075076bc" finished with status "complete"

    Check the status of service ‘bar’

    # nomad alloc status 9f16354b

    Output:

    ID                  = 9f16354b
    Eval ID             = 075076bc
    Name                = bar_docker.gowebhello[1]
    Node ID             = 9bab72b1
    Node Name           = client1
    Job ID              = bar_docker
    Job Version         = 0
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 4m28s ago
    Modified            = 4m16s ago
    
    Task "gowebhello" is "running"
    Task Resources
    CPU        Memory           Disk     Addresses
    0/500 MHz  6.2 MiB/256 MiB  300 MiB  http: 192.168.1.201:23646
    
    Task Events:
    Started At     = 2019-06-14T06:49:36Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-14T06:49:36Z  Started     Task started by client
    2019-06-14T06:49:35Z  Task Setup  Building Task Directory
    2019-06-14T06:49:35Z  Received    Task received by client

    Check the Fabio Routes

    http://192.168.1.101:9998/routes

    Connect to the Services

    The services “foo” and “bar” are available at:

    http://192.168.1.101:9999/foo

    http://192.168.1.101:9999/bar

    Output:

    gowebhello root page
    
    https://github.com/udhos/gowebhello is a simple golang replacement for 'python -m SimpleHTTPServer'.
    Welcome!
    gowebhello version 0.7 runtime go1.12.5 os=linux arch=amd64
    Keepalive: true
    Application banner: Welcome to FOO
    ...
    ...

    Pressing F5 to refresh the browser should keep changing the backend service that you are eventually connected to.

    Conclusion

    This article should give you a fair idea about the common problems of a distributed application and how they can be solved.

    Remodeling an existing application deployment as it scales can be quite a challenge. Hopefully the sample/demo setup will help you to explore, design and optimize the deployment workflows of your application, be it On-Premise or any Cloud Environment.