Blog

  • How Much Do You Really Know About Simplified Cloud Deployments?

    Is your EC2/VM bill giving you sleepless nights?

    Are your EC2 instances under-utilized? Have you been wondering if there was an easy way to maximize the EC2/VM usage?

    Are you investing too much in your Control Plane and wish you could divert some of that investment towards developing more features in your applications (business logic)?

    Is your Configuration Management system overwhelming you, as though it has taken on a life of its own?

    Do you have legacy applications that do not need Docker at all?

    Would you like to simplify your deployment toolchain to streamline your workflows?

    Have you been told that Kubernetes is the fix for all your woes, but you aren’t sure it will actually help you?

    Do you feel you are moving towards Docker, just so that Kubernetes can be used?

    If you answered “Yes” to any of the questions above, read on; this article may be just what you need.

    At the end of the article, you will find steps to create a simple setup on your laptop.

    Introduction

    In this article, we will present the typical components of a multi-tier application and how it is set up and deployed.

    We will then see how the same application deployment can be remodeled for scale on any cloud infrastructure. (The same software toolchain can be used to deploy the application on your On-Premise Infrastructure as well.)

    The tools that we propose are Nomad and Consul. We will focus on how to use these tools rather than deep-dive into their internals, and briefly cover the features that help us achieve our goals.

    • Nomad is a distributed workload manager not only for Docker containers, but also for various other types of workloads like legacy applications, Java, LXC, etc.

    More about Nomad Drivers here: Nomadproject.io, application delivery with HashiCorp, introduction to HashiCorp Nomad.

    • Consul is a distributed service mesh, with features like service registry and a key-value store, among others.

    Using these tools, the application startup workflow would be as follows:

    Nomad will be responsible for starting the service.

    Nomad will publish the service information in Consul. The service information will include details like:

    • Where is the application running (IP:PORT)?
    • What “service-name” is used to identify the application?
    • What “tags” (metadata) does this application have?
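    To make the list above concrete, the sketch below builds an object shaped like the JSON body of Consul's agent service-registration API (`ID`, `Name`, `Address`, `Port`, `Tags`). The helper name and the sample values are hypothetical; in this setup Nomad performs the actual registration for you.

```javascript
// Hypothetical helper: builds an object shaped like the body of Consul's
// agent service-registration API. Nomad performs the real registration itself.
function buildServiceRegistration({ name, instanceId, ip, port, tags = [] }) {
  return {
    ID: `${name}-${instanceId}`, // unique per running instance
    Name: name, // the "service-name" used to identify the application
    Address: ip, // where the application is running...
    Port: port, // ...and on which port
    Tags: [...tags], // metadata published alongside the service
  };
}

// Example values only; the IP matches the demo cluster used later.
const registration = buildServiceRegistration({
  name: 'service-a',
  instanceId: 'a1',
  ip: '192.168.1.201',
  port: 23382,
  tags: ['primary'],
});
console.log(JSON.stringify(registration));
// {"ID":"service-a-a1","Name":"service-a","Address":"192.168.1.201","Port":23382,"Tags":["primary"]}
```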

    A Typical Application

    A typical application deployment consists of a fixed set of processes, usually coupled with a database and a few (or many) peripheral services.

    These services could be primary (must-have) or support (optional) features of the application.

    Note: We are aware of what a proper “service-oriented architecture” should look like, but we will skip that discussion for now and focus instead on how real-world applications are set up and deployed.

    Simple Multi-tier Application

    In this section, let’s see the components of a multi-tier application along with typical access patterns from outside the system and within the system.

    • Load Balancer/Web/Front End Tier
    • Application Services Tier
    • Database Tier
    • Utility (or Helper Servers): To run background, cron, or queued jobs.

    Using a proxy/load-balancer, the services (Service-A, Service-B, Service-C) could be accessed using distinct hostnames:

    • a.example.tld
    • b.example.tld
    • c.example.tld

    For an equivalent path-based routing approach, the setup would be similar. Instead of distinct hostnames, the communication mechanism would be:

    • common-proxy.example.tld/path-a/
    • common-proxy.example.tld/path-b/
    • common-proxy.example.tld/path-c/
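    Both routing styles boil down to a lookup from request attributes to a service name. The sketch below models that lookup with hypothetical rule tables built from the hostnames and paths listed above: the Host header is checked first, then the longest matching path prefix wins. Real proxies have their own matching rules; this only illustrates the idea.

```javascript
// Hypothetical routing rules for the two approaches described above.
const hostRoutes = new Map([
  ['a.example.tld', 'service-a'],
  ['b.example.tld', 'service-b'],
  ['c.example.tld', 'service-c'],
]);
const pathRoutes = [
  ['/path-a/', 'service-a'],
  ['/path-b/', 'service-b'],
  ['/path-c/', 'service-c'],
];

// Resolve a request to a service name: match the Host header first,
// then fall back to the longest matching path prefix.
function resolveService(host, path) {
  if (hostRoutes.has(host)) return hostRoutes.get(host);
  let best = null;
  for (const [prefix, service] of pathRoutes) {
    if (path.startsWith(prefix) && (best === null || prefix.length > best.prefix.length)) {
      best = { prefix, service };
    }
  }
  return best ? best.service : null;
}

console.log(resolveService('b.example.tld', '/')); // service-b
console.log(resolveService('common-proxy.example.tld', '/path-c/status')); // service-c
```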

    Problem Scenario 1

    Some of the basic problems with the deployment of the simple multi-tier application are:

    • What if the service process crashes during its runtime?
    • What if the host on which the services run shuts down, reboots or terminates?

    This is where Nomad’s ability to keep the service running (restarting it when it crashes, or rescheduling it onto another host) is useful.

    In spite of this auto-restart feature, there could be issues if the service restarts on a different machine (i.e. different IP address).

    With Docker and dynamically assigned (ephemeral) ports, the service could start on a different port as well.

    To solve this, we will use the service discovery feature provided by Consul, combined with a Consul-aware load-balancer/proxy that redirects traffic to the appropriate service.

    The order of the operations within the Nomad job will thus be:

    • Nomad will launch the job/task.
    • Nomad will register the task details as a service definition in Consul.
      (These steps will be re-executed if/when the application is restarted due to a crash/fail-over)
    • The Consul-aware load-balancer will route the traffic to the service (IP:PORT)
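    The sequence above can be sketched with a tiny in-memory catalog standing in for Consul. The class, instance IDs and the second port are hypothetical; the point is that after a fail-over the old entry disappears and the new one is registered, so a lookup by service name always returns the current IP:PORT.

```javascript
// Minimal in-memory stand-in for a service catalog (hypothetical; Consul plays this role).
class ServiceCatalog {
  constructor() {
    this.instances = new Map(); // instance ID -> { name, address, port }
  }

  register(id, name, address, port) {
    this.instances.set(id, { name, address, port });
  }

  deregister(id) {
    this.instances.delete(id);
  }

  // What a Consul-aware proxy asks: "where is <name> running right now?"
  lookup(name) {
    return [...this.instances.values()]
      .filter((instance) => instance.name === name)
      .map((instance) => `${instance.address}:${instance.port}`);
  }
}

const catalog = new ServiceCatalog();
// Nomad launches the task and registers it.
catalog.register('a1', 'service-a', '192.168.1.201', 23382);
// The host fails: the old registration goes away...
catalog.deregister('a1');
// ...and the rescheduled task is registered at its new location (hypothetical port).
catalog.register('a1-new', 'service-a', '192.168.1.202', 27011);
console.log(catalog.lookup('service-a')); // [ '192.168.1.202:27011' ]
```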

    Multi-tier Application With Load Balancer

    Using the Consul-aware load-balancer, the diagram will now look like:

    The details of the setup now are:

    • A Consul-aware load-balancer/proxy; the application will access the services via the load-balancer.
    • 3 (three) instances of service A; A1, A2, A3
    • 3 (three) instances of service B; B1, B2, B3

    The Routing Question

    At this point, you may be wondering: “How would the load-balancer know that it has to route traffic for service-A to A1/A2/A3, and traffic for service-B to B1/B2/B3?”

    The answer lies in the Consul tags which will be published as part of the service definition (when Nomad registers the service in Consul).

    The appropriate Consul tags will tell the load-balancer to route traffic of a particular service to the appropriate backend.

    Let’s read that statement again (very slowly, just to be sure); The Consul tags, which are part of the service definition, will inform (advertise) the load-balancer to route traffic to the appropriate backend.

    This distinction is worth dwelling on because it differs from how classic load-balancer/proxy software like HAProxy or NGINX is configured. With HAProxy/NGINX, the backend routing information resides with the load-balancer instance and is not “advertised” by the backends.
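    As an example of backend-advertised routing, Fabio (the Consul-aware load-balancer run in the demo later) reads tags of the form `urlprefix-<host><path>`, optionally followed by space-separated options. The sketch below derives a route table purely from such tags; we assume the sample jobs publish tags like `urlprefix-/foo` and `urlprefix-/bar`, and the second `foo` address is hypothetical. Note that the load-balancer holds no backend list of its own.

```javascript
// Builds a route table from Consul-style service entries, following Fabio's
// `urlprefix-<host><path>` tag convention. Entries and addresses are hypothetical.
function buildRouteTable(services) {
  const routes = new Map(); // route ("host/path" or "/path") -> ["ip:port", ...]
  for (const svc of services) {
    for (const tag of svc.tags) {
      if (!tag.startsWith('urlprefix-')) continue; // not a routing tag
      // Drop the prefix and any trailing options (e.g. "strip=/foo").
      const route = tag.slice('urlprefix-'.length).split(' ')[0];
      const backends = routes.get(route) || [];
      backends.push(`${svc.address}:${svc.port}`);
      routes.set(route, backends);
    }
  }
  return routes;
}

const table = buildRouteTable([
  { name: 'foo', address: '192.168.1.201', port: 23382, tags: ['urlprefix-/foo'] },
  { name: 'foo', address: '192.168.1.202', port: 24410, tags: ['urlprefix-/foo'] },
  { name: 'bar', address: '192.168.1.201', port: 23646, tags: ['urlprefix-/bar strip=/bar'] },
]);
console.log(table.get('/foo')); // [ '192.168.1.201:23382', '192.168.1.202:24410' ]
```

    When a backend moves, only the Consul entries change; rebuilding the table from them is all the load-balancer has to do.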

    Traditional load-balancers like NGINX/HAProxy are typically configured with a static list of backends, so they do not automatically pick up changes when backends stop, start or move around. The heavy lifting of regenerating the configuration file and reloading the service is left to an external tool like Consul-Template.

    Using a Consul-aware load-balancer instead of a traditional one eliminates the need for such external workarounds.

    The setup can thus be termed as a zero-configuration setup; you don’t have to re-configure the load-balancer, it will discover the changing backend services based on the information available from Consul.

    Problem Scenario 2

    So far we have achieved a method to “automatically” discover the backends, but isn’t the Load-Balancer itself a single-point-of-failure (SPOF)?

    It absolutely is, and you should always run redundant load-balancer instances (as every cloud-provided load-balancer does).

    As there is a cost associated with cloud-provided load-balancers, we will run the load-balancers ourselves instead.

    To provide redundancy for the load-balancer instances, you should run them in an Auto Scaling Group (AWS), a VM Scale Set (Azure), or the equivalent on your cloud provider.

    The same redundancy strategy should also be used for the worker nodes, where the actual services reside, by using AutoScaling Groups/VMSS for the worker nodes.

    The Complete Picture

    Installation and Configuration

    Given that nowadays laptops are pretty powerful, you can easily create a test setup on your laptop using VirtualBox, VMware Workstation Player, VMware Workstation, etc.

    As a prerequisite, you will need a few virtual machines which can communicate with each other.

    NOTE: Create the VMs with networking set to bridged mode.

    The machines needed for the simple setup/demo would be:

    • 1 Linux VM to act as a server (srv1)
    • 1 Linux VM to act as a load-balancer (lb1)
    • 2 Linux VMs to act as worker machines (client1, client2)

    NOTE: Each machine can have 2 CPUs and 1 GB of memory.

    The configuration files and scripts needed for the demo, which will help you set up the Nomad and Consul cluster are available here.

    Setup the Server

    Install the binaries on the server

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Server Configuration

    ### On the server machine ...
    
    ### Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/server.hcl -O /etc/consul/server.hcl
    
    ### Edit Consul's server.hcl file and set the fields 'encrypt' and 'retry_join' as per your cluster.
    sudo vim /etc/consul/server.hcl
    
    ### Nomad
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/server.hcl -O /etc/nomad/server.hcl
    
    ### Edit Nomad's server.hcl file and set the fields 'encrypt' and 'retry_join' as per your cluster.
    sudo vim /etc/nomad/server.hcl
    
    ### After you are done with the edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members
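    The 'encrypt' field in both server.hcl files expects a base64-encoded gossip key that every agent of that tool (Consul or Nomad) must share. In practice, generate it with `consul keygen` or `nomad operator keygen`; the sketch below only shows what such a key is: random bytes, base64-encoded. The 32-byte default is an assumption, so check which key lengths your Consul/Nomad versions accept.

```javascript
// Generates a random base64-encoded key of the same shape as the output of
// `consul keygen` / `nomad operator keygen`. Prefer those commands in practice.
const crypto = require('crypto');

function generateGossipKey(bytes = 32) {
  // 16, 24 and 32 bytes are the AES key sizes supported by the gossip layer (memberlist).
  if (![16, 24, 32].includes(bytes)) {
    throw new Error('key length must be 16, 24 or 32 bytes');
  }
  return crypto.randomBytes(bytes).toString('base64');
}

console.log(generateGossipKey());
```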

    Setup the Load-Balancer

    Install the binaries on the load-balancer

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service
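    The same download steps repeat on every machine, and the release URLs follow a single pattern. Keeping the versions in one place makes upgrades less error-prone; the sketch below (a hypothetical helper) reproduces the URLs used above from a product name and version.

```javascript
// Builds the HashiCorp release-download URL pattern used in the install steps above.
function releaseUrl(product, version, os = 'linux', arch = 'amd64') {
  const file = `${product}_${version}_${os}_${arch}.zip`;
  return `https://releases.hashicorp.com/${product}/${version}/${file}`;
}

console.log(releaseUrl('consul', '1.7.3'));
// https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip
console.log(releaseUrl('nomad', '0.11.3'));
// https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip
```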

    Create the Load-Balancer Configuration

    ### On the load-balancer machine ...
    
    ### for Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/client.hcl -O /etc/consul/client.hcl
    
    ### Edit Consul's client.hcl file and set the fields 'name', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/consul/client.hcl
    
    ### for Nomad ...
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/client.hcl -O /etc/nomad/client.hcl
    
    ### Edit Nomad's client.hcl file and set the fields 'name', 'node_class', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/nomad/client.hcl
    
    ### After you are done with the edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members
    sudo nomad node status -verbose
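    The 'node_class' value is what lets a job land on the load-balancer rather than on a worker. The test output later in this article shows the classes 'lb' (lb1) and 'worker' (client1, client2); we assume the sample jobs select nodes with a Nomad constraint on `${node.class}`. The sketch below models only that filtering step, not Nomad's scheduler.

```javascript
// Simplified model of a Nomad constraint on `${node.class}`: only eligible
// nodes whose class matches can receive the job. Node data mirrors the demo cluster.
const nodes = [
  { name: 'lb1', nodeClass: 'lb', eligible: true },
  { name: 'client1', nodeClass: 'worker', eligible: true },
  { name: 'client2', nodeClass: 'worker', eligible: true },
];

function eligibleNodes(allNodes, requiredClass) {
  return allNodes
    .filter((node) => node.eligible && node.nodeClass === requiredClass)
    .map((node) => node.name);
}

console.log(eligibleNodes(nodes, 'lb')); // [ 'lb1' ]
console.log(eligibleNodes(nodes, 'worker')); // [ 'client1', 'client2' ]
```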

    Setup the Client (Worker) Machines

    Install the binaries on the client (worker) machines

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Worker Configuration

    ### On the client (worker) machine ...
    
    ### Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/client.hcl -O /etc/consul/client.hcl
    
    ### Edit Consul's client.hcl file and set the fields 'name', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/consul/client.hcl
    
    ### Nomad
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/client.hcl -O /etc/nomad/client.hcl
    
    ### Edit Nomad's client.hcl file and set the fields 'name', 'node_class', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/nomad/client.hcl
    
    ### After you are sure about your edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members
    sudo nomad node status -verbose

    Test the Setup

    For the sake of simplicity, we shall assume the following IP addresses for the machines. (You can adapt the IPs as per your actual cluster configuration)

    srv1: 192.168.1.11

    lb1: 192.168.1.101

    client1: 192.168.1.201

    client2: 192.168.1.202

    You can access the web GUI for Consul and Nomad at the following URLs:

    Consul: http://192.168.1.11:8500

    Nomad: http://192.168.1.11:4646

    Log in to the server and start the following watch command:

    # watch -n 5 "consul members; echo; nomad server members; echo; nomad node status -verbose; echo; nomad job status"

    Output:

    Node     Address             Status  Type    Build  Protocol  DC   Segment
    srv1     192.168.1.11:8301   alive   server  1.5.1  2         dc1  <all>
    client1  192.168.1.201:8301  alive   client  1.5.1  2         dc1  <default>
    client2  192.168.1.202:8301  alive   client  1.5.1  2         dc1  <default>
    lb1      192.168.1.101:8301  alive   client  1.5.1  2         dc1  <default>
    
    Name         Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
    srv1.global  192.168.1.11  4648  alive   true    2         0.9.3  dc1         global
    
    ID           DC   Name     Class   Address        Version Drain  Eligibility  Status
    37daf354...  dc1  client2  worker  192.168.1.202  0.9.3  false  eligible     ready
    9bab72b1...  dc1  client1  worker  192.168.1.201  0.9.3  false  eligible     ready
    621f4411...  dc1  lb1      lb      192.168.1.101  0.9.3  false  eligible     ready
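    Rather than eyeballing the watch output, a small check can flag cluster members that are not `alive`. The sketch below parses the plain-text `consul members` table in the column layout shown above; the sample with a `failed` member is hypothetical. For anything beyond a quick check, Consul's HTTP API would be a sturdier data source than parsed CLI output.

```javascript
// Parses `consul members` output (whitespace-separated columns) and returns
// the names of members whose Status is not "alive".
function findUnhealthyMembers(output) {
  const lines = output.trim().split('\n');
  const header = lines[0].trim().split(/\s+/);
  const nodeCol = header.indexOf('Node');
  const statusCol = header.indexOf('Status');
  return lines
    .slice(1)
    .map((line) => line.trim().split(/\s+/))
    .filter((cols) => cols[statusCol] !== 'alive')
    .map((cols) => cols[nodeCol]);
}

const sample = `
Node     Address             Status  Type    Build  Protocol  DC   Segment
srv1     192.168.1.11:8301   alive   server  1.5.1  2         dc1  <all>
client1  192.168.1.201:8301  failed  client  1.5.1  2         dc1  <default>
`;
console.log(findUnhealthyMembers(sample)); // [ 'client1' ]
```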

    Submit Jobs

    Log in to the server (srv1) and download the sample jobs.

    Run the load-balancer job

    # nomad run fabio_docker.nomad

    Output:

    ==> Monitoring evaluation "bb140467"
        Evaluation triggered by job "fabio_docker"
        Allocation "1a6a5587" created: node "621f4411", group "fabio"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "bb140467" finished with status "complete"

    Check the status of the load-balancer

    # nomad alloc status 1a6a5587

    Output:

    ID                  = 1a6a5587
    Eval ID             = bb140467
    Name                = fabio_docker.fabio[0]
    Node ID             = 621f4411
    Node Name           = lb1
    Job ID              = fabio_docker
    Job Version         = 0
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 1m9s ago
    Modified            = 1m3s ago
    
    Task "fabio" is "running"
    Task Resources
    CPU        Memory          Disk     Addresses
    5/200 MHz  10 MiB/128 MiB  300 MiB  lb: 192.168.1.101:9999
                                        ui: 192.168.1.101:9998
    
    Task Events:
    Started At     = 2019-06-13T19:15:17Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-13T19:15:17Z  Started     Task started by client
    2019-06-13T19:15:12Z  Driver      Downloading image
    2019-06-13T19:15:12Z  Task Setup  Building Task Directory
    2019-06-13T19:15:12Z  Received    Task received by client

    Run the service ‘foo’

    # nomad run foo_docker.nomad

    Output:

    ==> Monitoring evaluation "a994bbf0"
        Evaluation triggered by job "foo_docker"
        Allocation "7794b538" created: node "9bab72b1", group "gowebhello"
        Allocation "eecceffc" modified: node "37daf354", group "gowebhello"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "a994bbf0" finished with status "complete"

    Check the status of service ‘foo’

    # nomad alloc status 7794b538

    Output:

    ID                  = 7794b538
    Eval ID             = a994bbf0
    Name                = foo_docker.gowebhello[1]
    Node ID             = 9bab72b1
    Node Name           = client1
    Job ID              = foo_docker
    Job Version         = 1
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 9s ago
    Modified            = 7s ago
    
    Task "gowebhello" is "running"
    Task Resources
    CPU        Memory           Disk     Addresses
    0/500 MHz  4.2 MiB/256 MiB  300 MiB  http: 192.168.1.201:23382
    
    Task Events:
    Started At     = 2019-06-13T19:27:17Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-13T19:27:17Z  Started     Task started by client
    2019-06-13T19:27:16Z  Task Setup  Building Task Directory
    2019-06-13T19:27:15Z  Received    Task received by client

    Run the service ‘bar’

    # nomad run bar_docker.nomad

    Output:

    ==> Monitoring evaluation "075076bc"
        Evaluation triggered by job "bar_docker"
        Allocation "9f16354b" created: node "9bab72b1", group "gowebhello"
        Allocation "b86d8946" created: node "37daf354", group "gowebhello"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "075076bc" finished with status "complete"

    Check the status of service ‘bar’

    # nomad alloc status 9f16354b

    Output:

    ID                  = 9f16354b
    Eval ID             = 075076bc
    Name                = bar_docker.gowebhello[1]
    Node ID             = 9bab72b1
    Node Name           = client1
    Job ID              = bar_docker
    Job Version         = 0
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 4m28s ago
    Modified            = 4m16s ago
    
    Task "gowebhello" is "running"
    Task Resources
    CPU        Memory           Disk     Addresses
    0/500 MHz  6.2 MiB/256 MiB  300 MiB  http: 192.168.1.201:23646
    
    Task Events:
    Started At     = 2019-06-14T06:49:36Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-14T06:49:36Z  Started     Task started by client
    2019-06-14T06:49:35Z  Task Setup  Building Task Directory
    2019-06-14T06:49:35Z  Received    Task received by client

    Check the Fabio Routes

    http://192.168.1.101:9998/routes

    Connect to the Services

    The services “foo” and “bar” are available at:

    http://192.168.1.101:9999/foo

    http://192.168.1.101:9999/bar

    Output:

    gowebhello root page
    
    https://github.com/udhos/gowebhello is a simple golang replacement for 'python -m SimpleHTTPServer'.
    Welcome!
    gowebhello version 0.7 runtime go1.12.5 os=linux arch=amd64
    Keepalive: true
    Application banner: Welcome to FOO
    ...
    ...

    Refreshing the page (F5) should keep switching you between the different backend instances of the service.
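    Which instance serves a given request depends on the load-balancer's selection strategy; Fabio, for instance, supports both random and round-robin selection. The sketch below shows round-robin over the backends of one route. The second `foo` address is hypothetical, since the demo output does not show the port of the instance on client2.

```javascript
// Round-robin picker over the backends registered for one route.
function makeRoundRobin(backends) {
  if (backends.length === 0) throw new Error('no backends available');
  let next = 0;
  return function pick() {
    const backend = backends[next % backends.length];
    next += 1;
    return backend;
  };
}

const pickFoo = makeRoundRobin(['192.168.1.201:23382', '192.168.1.202:24410']);
console.log(pickFoo(), pickFoo(), pickFoo());
// 192.168.1.201:23382 192.168.1.202:24410 192.168.1.201:23382
```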

    Conclusion

    This article should give you a fair idea about the common problems of a distributed application and how they can be solved.

    Remodeling an existing application deployment as it scales can be quite a challenge. Hopefully the sample/demo setup will help you to explore, design and optimize the deployment workflows of your application, be it On-Premise or any Cloud Environment.

  • Set Up Simple S3 Deployment Workflow with Github Actions and CircleCI

    In this article, we’ll implement a continuous delivery (referred to as CD going forward) workflow using the Serverless framework for our demo React SPA application using Serverless Finch.

    Deploying single-page applications to AWS S3 is a common use case. Manual deployment and bucket configuration can be tedious and unreliable. By using Serverless and CD platforms, we can simplify this commonly faced CD challenge.

    In almost every project we have worked on, we have built a general-purpose continuous integration (referred to as CI through the rest of this article) setup as part of the initial project setup. The CI requirements might range from simple test workflows to cluster deployments.

    In this article, we’ll be focusing on a simple deployment workflow using Github Actions and CircleCI. Github Actions brought CI/CD to a wider community by simplifying the setup for CI pipelines. 

    Prerequisites

    This article assumes you have a basic understanding of CI/CD and AWS services such as IAM and S3. The sample application is a basic Create React App project used for the deployment demo, but knowing React.js is not required. You can implement the same flow for any other SPA or bare-bones application.

    Why Github Actions?

    There have always been great tools and CI platforms, such as AWS CodePipeline, Jenkins, Travis CI, CircleCI, etc. What makes Github Actions so compelling is that it’s built inside Github. Many organizations use Github for source control, and they often have to spend time configuring repositories with CI tools. On top of that, starting with Github Actions is free.

    As Github Actions is built inside the Github ecosystem, it’s a piece of cake to get CI pipelines up and running. Github Actions also allows you to build your own actions. However, as the platform is quite new compared to others, it has some limitations.

    Why CircleCI?

    CircleCI has been providing CI/CD solutions for almost a decade. One of many reasons to choose CircleCI is its pricing: it offers free credits each month without any upfront payment or payment details. It also offers a wide-ranging repository of plugins called Orbs, and building your own orbs is easy. On top of that, it provides simple and reliable workflow-building tools, among other features.

    Let’s Get Started

    To introduce the application, we’ll create a simple React application with master-detail flow added to it. We’ll be using React’s official CRA tool to create our project, which creates the boilerplate for us.

    Installing Dependencies

    We’ll call our demo project “Serverless S3”. Let’s install create-react-app as a global package and then use it to create our React app:

    yarn global add create-react-app
    create-react-app serverless-s3

    Now that we’ve created the frontend application, we can start building something cool with it. If we run the application with yarn start, we should see the default CRA welcome page.

    To implement our master-detail flow of Github repositories, we’ll need to add some navigation to our app, for which we’ll use react-router. Also, to keep things short, we’ll use Github’s official SDK package, Octokit, to fetch the data.

    yarn add react-router-dom@5 @octokit/core

    Our demo application will consist of two routes: 

    1. A list of all public repos of an organization
    2. The details of the repository after clicking a repo item from the list 

    We’ll be using the Octokit client to fetch the data from Github’s public endpoints. These don’t need any authentication with Github, although unauthenticated requests are subject to a lower rate limit.

    Adding Application Components

    Alright, now that we have our dependencies installed, we can add the routes to our App.js, which is the entry point for our React app.

    import { BrowserRouter as Router, Switch, Route } from 'react-router-dom';
     
    import RepoList from './RepoList';
    import RepoDetails from './RepoDetails';
     
    import './App.css';
     
    function App() {
      return (
       <Router>
         <div className="App">
           <Switch>
             <Route path="/repo/:owner/:repo" component={RepoDetails} />
             <Route path="/" component={RepoList} />
           </Switch>
         </div>
       </Router>
     );
    }
     
    export default App;
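    The order of the two routes inside the Switch matters, because the first matching route wins and "/" matches every path. The sketch below is a simplified model of that behavior (prefix matching on path segments, with `:name` segments captured as params); it is not react-router's actual implementation, and the helper names are hypothetical.

```javascript
// Simplified route matcher: a pattern matches when its segments match the first
// segments of the path; ":name" segments capture values (like useParams()).
function matchRoute(pattern, pathname) {
  const patternParts = pattern.split('/').filter(Boolean);
  const pathParts = pathname.split('/').filter(Boolean);
  if (pathParts.length < patternParts.length) return null;
  const params = {};
  for (let i = 0; i < patternParts.length; i++) {
    const part = patternParts[i];
    if (part.startsWith(':')) {
      params[part.slice(1)] = decodeURIComponent(pathParts[i]);
    } else if (part !== pathParts[i]) {
      return null;
    }
  }
  return params;
}

// Like <Switch>: the first route that matches wins, so order matters.
const routes = [
  { path: '/repo/:owner/:repo', component: 'RepoDetails' },
  { path: '/', component: 'RepoList' },
];

function selectRoute(pathname) {
  for (const route of routes) {
    const params = matchRoute(route.path, pathname);
    if (params) return { component: route.component, params };
  }
  return null;
}

console.log(selectRoute('/repo/octokit/core.js').component); // RepoDetails
console.log(selectRoute('/').component); // RepoList
```

    If "/" were listed first, it would match "/repo/octokit/core.js" too and the details page would never render.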

    Let’s initialize our Octokit client in client.js, which will help us make calls to Github’s open endpoints to get data.

    import { Octokit } from '@octokit/core';
     
    export const octokit = new Octokit({});

    You can even make calls to authorized resources with the Octokit client. Octokit client supports both GraphQL and REST API. You can learn more about the client through the official documentation.
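    The request strings passed to `octokit.request`, such as 'GET /repos/{owner}/{repo}', are route templates whose placeholders are filled from the parameters object. The sketch below is a hypothetical helper that mimics that expansion against the public API base URL (https://api.github.com); it is not Octokit's code, and it simply returns unused parameters instead of encoding them into the request.

```javascript
// Hypothetical helper that mimics how a route template such as
// 'GET /repos/{owner}/{repo}' is filled in from request parameters.
const API_BASE = 'https://api.github.com';

function expandRoute(route, params) {
  const [method, template] = route.split(' ');
  const used = new Set();
  const path = template.replace(/\{(\w+)\}/g, (_, key) => {
    if (!Object.prototype.hasOwnProperty.call(params, key)) {
      throw new Error(`missing parameter: ${key}`);
    }
    used.add(key);
    return encodeURIComponent(String(params[key]));
  });
  // Parameters not consumed by the template are returned separately;
  // a real client would send them as query-string or body fields.
  const rest = Object.keys(params).filter((key) => !used.has(key));
  return { method, url: API_BASE + path, rest };
}

const request = expandRoute('GET /repos/{owner}/{repo}', { owner: 'octokit', repo: 'core.js' });
console.log(request.method, request.url);
// GET https://api.github.com/repos/octokit/core.js
```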

    Let’s add the RepoList.js component to the application, which will fetch the list of repositories of a given organization and display hyperlinks to the details page.

    import React, { useEffect, useState } from 'react';
    import { Link } from 'react-router-dom';
    import { octokit } from './client';
     
    function RepoList() {
     const [repos, setRepos] = useState([]);
     useEffect(() => {
       octokit
         .request('GET /orgs/{org}/repos', {
           org: 'octokit',
         })
         .then((data) => setRepos(data.data));
     }, []);
     
     return (
       <div className="repo-list-container">
         <h1>Repositories</h1>
         <ul>
           {repos.map((repo) => (
             <li key={repo.id} className="repo-list-item">
               <Link to={`/repo/${repo.owner.login}/${repo.name}`}>{repo.full_name}</Link>
             </li>
           ))}
         </ul>
       </div>
     );
    }
     
    export default RepoList;

    Now that our list of repositories is ready, we can let users see some general details of each repository. Let’s create our details component called RepoDetails:

    import { useEffect, useState } from 'react';
    import { useParams } from 'react-router-dom';
    import { octokit } from './client';
    function RepoDetails() {
      const [repo, setRepo] = useState();
      const { repo: repoName, owner } = useParams();
      useEffect(() => {
        octokit
          .request('GET /repos/{owner}/{repo}', {
            owner,
            repo: repoName,
          })
          .then((data) => setRepo(data.data));
      }, [repoName, owner]);
      if (!repo) {
        return <b>loading...</b>;
      }
      return (
        <div className="repo-container">
          <h1>{repo.full_name}</h1>
          <p>Description: {repo.description}</p>
          <ul>
            <li><b>Forks:</b> {repo.forks}</li>
            <li><b>Subscribers:</b> {repo.subscribers_count}</li>
            <li><b>Watchers:</b> {repo.watchers}</li>
            <li><b>License:</b> {repo.license?.name ?? 'None'}</li>
          </ul>
        </div>
      );
    }
    export default RepoDetails;

    Setting up Serverless

    With this done, we have our repositories master-detail flow ready. Assuming we have an AWS account setup, we can start adding the Serverless config to our project. Let’s start with the CD setup. As we said before, we’ll be using the Serverless framework to achieve our deployment workflow. Let’s add it.

    We’ll also install the Serverless plugin called serverless-finch, which allows us to configure and deploy to S3 buckets.

    yarn global add serverless
    yarn add --dev serverless-finch

    Now that we have the Serverless CLI installed, let’s initialize a Serverless service in our project by running the following command, which uses the hello-world template:

    serverless create -t hello-world

    This will create a configuration yaml file and a handler lambda function. We don’t need the handler, so we can delete handler.js. Our serverless.yml should look like this:

    service: serverless-s3
    frameworkVersion: '2'
     
    # The `provider` block defines where your service will be deployed
    provider:
     name: aws
     runtime: nodejs12.x
     
    functions:
     helloWorld:
       handler: handler.helloWorld
       events:
         - http:
             path: helloWorld
             method: get
             cors: true

    The serverless.yml file contains the configuration for a lambda function called helloWorld. We can remove the functions block completely. After doing that, let’s register our Serverless Finch plugin:

    service: serverless-s3
    frameworkVersion: '2'
     
    provider:
     name: aws
     runtime: nodejs12.x
     
    plugins:
     - serverless-finch

    Alright, now that our plugin is ready to be used, we can add details about our S3 bucket so the plugin can deploy to it. Let’s add this block, which tells Serverless to deploy our code from the build directory to the serverles-s3-galileo bucket. Make sure you use a different bucket name, as S3 bucket names are globally unique.

    custom:
     client:
       bucketName: serverles-s3-galileo
       distributionFolder: build
       indexDocument: index.html
       errorDocument: index.html
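    Setting both indexDocument and errorDocument to index.html is what keeps client-side routes working on S3. A URL like /repo/octokit/core.js has no matching object in the bucket, so S3 answers with the error document, which is the React app itself, and the router takes over in the browser. The sketch below models that lookup; the bucket contents are hypothetical, and S3 still sends an error status code with the error document, modeled here as 404.

```javascript
// Simplified model of S3 static website hosting with indexDocument and
// errorDocument both set to index.html. Bucket contents are hypothetical.
const bucketKeys = new Set(['index.html', 'static/js/main.js', 'favicon.ico']);

function serve(urlPath, { indexDocument = 'index.html', errorDocument = 'index.html' } = {}) {
  let key = urlPath.replace(/^\//, '');
  // "Folder" requests (including the root) resolve to the index document.
  if (key === '' || key.endsWith('/')) key += indexDocument;
  if (bucketKeys.has(key)) return { status: 200, key };
  // Keys that do not exist, such as client-side routes, get the error document,
  // still with an error status (modeled here as 404).
  return { status: 404, key: errorDocument };
}

console.log(serve('/')); // { status: 200, key: 'index.html' }
console.log(serve('/repo/octokit/core.js')); // { status: 404, key: 'index.html' }
```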

    That is it! We’re ready to deploy our app to our bucket. Haven’t created a bucket yet? No problem: serverless-finch will create it automatically. The last thing we need to add is a bucket policy so our app can be accessed publicly. Let’s create our bucket policy in config/bucket-policy.json.

    Note: The indexDocument is the entry point for our web application, which is index.html in this case. We also need to add the same to errorDocument so our React routing works well in S3 hosting.

    {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "*"
               },
               "Action": "s3:GetObject",
               "Resource": "arn:aws:s3:::serverles-s3-galileo/*"
           }
       ]
    }
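    The policy above hard-codes the bucket name inside the Resource ARN, so it has to change whenever bucketName does. A minimal sketch that generates the same policy from a bucket name is shown below; public_read_policy is a hypothetical helper of ours, and you would redirect its output into config/bucket-policy.json.

```python
import json


def public_read_policy(bucket_name: str) -> dict:
    """Build a bucket policy that lets anyone read (s3:GetObject) every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "s3:GetObject",
                # The trailing /* targets the objects in the bucket, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy; redirect the output to config/bucket-policy.json to use it.
    print(json.dumps(public_read_policy("serverles-s3-galileo"), indent=4))
```

    Generating the file keeps the ARN and bucketName in sync, which matters because a mismatch silently leaves the real bucket private.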

    Because S3 objects are private by default, we need a bucket policy for our deployment bucket. This policy grants the public read-only access (s3:GetObject) to the objects in the bucket, so the deployed assets can be loaded in a browser. See the AWS documentation on bucket policies to learn more. Now let's point our Serverless config at the policy file. This is how our serverless.yml should look:

    service: serverless-s3
    frameworkVersion: '2'
     
    provider:
     name: aws
     runtime: nodejs12.x
     
    plugins:
     - serverless-finch
     
    custom:
     client:
       bucketName: serverles-s3-galileo
       distributionFolder: build
       indexDocument: index.html
       errorDocument: index.html
       bucketPolicyFile: config/bucket-policy.json
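    To see why setting both indexDocument and errorDocument to index.html keeps client-side routes working, here is a simplified model of how an S3 website endpoint answers a GET request. It is a sketch under our own assumptions (the resolve function and the bucket contents are hypothetical, and redirects and other S3 details are ignored), but it captures the key point: an unknown path gets the error document, served with a 404 status.

```python
from typing import Set, Tuple


def resolve(path: str, objects: Set[str],
            index_document: str = "index.html",
            error_document: str = "index.html") -> Tuple[int, str]:
    """Return (HTTP status, key served) for a GET against a simplified S3 website endpoint."""
    key = path.lstrip("/")
    # The root and "folder" paths are served their index document.
    if key == "" or key.endswith("/"):
        key += index_document
    if key in objects:
        return 200, key
    # Anything else gets the error document, served with a 404 status.
    return 404, error_document


bucket = {"index.html", "static/js/main.js"}
print(resolve("/", bucket))                    # (200, 'index.html')
print(resolve("/static/js/main.js", bucket))   # (200, 'static/js/main.js')
print(resolve("/dashboard/settings", bucket))  # (404, 'index.html')
```

    The browser still renders index.html for /dashboard/settings and React's router picks the route, even though the response status stays 404.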

    Creating Github Actions Workflow

    Assuming you've created your repo and pushed the code to it, we can start setting up our first workflow using GitHub Actions. Since Serverless deploys to S3 using AWS, we need to provide the credentials of an IAM user with permission to deploy to S3. The env block lets us inject custom environment variables into the CI build; here, we need the AWS access key ID and secret access key to deploy the build files to the S3 bucket.

    GitHub lets us store secret values that can be used in the GitHub Actions CI environment, and they are easy to set up in the repository settings. We need two secrets, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which the workflow reads later on.

    Now we can add a GitHub Actions workflow. GitHub only picks up workflow files from the .github/workflows directory, so let's create .github/workflows/deploy.yml and add the following to it:

    name: Serverless S3 Deploy
    on:
     push:
       branches: [ master ]
     pull_request:
       branches: [ master ]

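    The config above tells GitHub to trigger this workflow whenever someone pushes to the master branch or opens a pull request against it. Note that the deploy step we add below therefore also runs for pull requests; drop the pull_request trigger if you only want merged code to be deployed.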
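    The two triggers check the branch filter against different branches: for a push it is the branch that was pushed, while for a pull request it is the base (target) branch. The sketch below approximates the on: block for plain branch names only; should_trigger and WATCHED_BRANCHES are our own illustrative names, and glob patterns are not handled.

```python
from typing import Optional

# Branches listed under both `push` and `pull_request` in the workflow.
WATCHED_BRANCHES = {"master"}


def should_trigger(event: str, branch: str, base_branch: Optional[str] = None) -> bool:
    """Approximate the `on:` block for plain branch names (no glob patterns)."""
    if event == "push":
        # For pushes, the filter is checked against the branch that was pushed.
        return branch in WATCHED_BRANCHES
    if event == "pull_request":
        # For pull requests, the filter is checked against the base (target) branch.
        return base_branch in WATCHED_BRANCHES
    return False
```

    So a pull request from feature/login into master triggers the workflow, while a pull request from master into another branch does not.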

    As of now, our action config is incomplete and does nothing. Let’s add our first and only job to the workflow:

    name: Serverless S3
     
    on:
     push:
       branches: [ master ]
     pull_request:
       branches: [ master ]
     
    jobs:
     build:
       runs-on: ubuntu-latest
       strategy:
         matrix:
           node-version: [10.x]
       steps:
       - uses: actions/checkout@v2

    Let’s try to digest the config above:

    runs-on: ubuntu-latest

    The runs-on key specifies the runner that executes the job. In this case, it's the latest Ubuntu Linux image provided by GitHub.

    strategy:
      matrix:
        node-version: [10.x]

    The strategy defines the environments we want to run our job in. A matrix runs the job once for every combination of its values, which is useful when we want to test against several Node.js versions. We don't need that here, so our matrix contains a single Node.js version, 10.x.
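    The matrix expansion can be pictured as a Cartesian product of the listed values. The sketch below (expand_matrix is a hypothetical helper, and it ignores the include/exclude keys real matrices also support) shows that our single-entry matrix yields exactly one job, while two values on each of two axes would yield four.

```python
from itertools import product
from typing import Dict, List


def expand_matrix(matrix: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return one job configuration per combination of matrix values.

    Ignores the `include`/`exclude` keys that real matrices also support.
    """
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


print(expand_matrix({"node-version": ["10.x"]}))
# [{'node-version': '10.x'}]
print(len(expand_matrix({"node-version": ["10.x", "12.x"], "os": ["ubuntu-latest", "windows-latest"]})))
# 4
```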

    steps:
    - uses: actions/checkout@v2

    In the steps block, we define the tasks to be performed sequentially within a job. actions/checkout@v2 checks out our repository's code at the commit that triggered the workflow, so later steps can work with the source code.

    This bare minimum setup is required for running a job in our Github workflows. After this, we will need to set up the environment and deploy our application. So, let’s add the rest of the steps to it.

    name: Serverless S3
     
    on:
     push:
       branches: [ master ]
     pull_request:
       branches: [ master ]
     
    jobs:
     build:
       runs-on: ubuntu-latest
       strategy:
         matrix:
           node-version: [10.x]
       steps:
       - uses: actions/checkout@v2
       - name: Use Node.js ${{ matrix.node-version }}
         uses: actions/setup-node@v1
         with:
           node-version: ${{ matrix.node-version }}
       - run: yarn install
       - run: yarn build
       - name: serverless deploy s3
         uses: serverless/github-action@master
         with:
           args: client deploy --no-confirm
         env:
           AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
           AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

    These steps deploy our frontend assets to the S3 bucket. Reading through them, we do the following in sequence:

    1. Check out the current branch's code

    2. Set up our Node.js environment

    3. Install our dependencies with yarn install

    4. Create the production build with yarn build

    5. Deploy the build to S3 with serverless client deploy --no-confirm

    • uses specifies which action a step runs
    • with passes inputs to the action; here, args holds the arguments the Serverless action runs the CLI with
    • The --no-confirm flag stops Serverless Finch from asking for confirmation while deploying to the S3 bucket
    • env passes custom environment variables to an action
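    By default, GitHub Actions fails the job at the first failing step and skips the remaining ones (unless a step opts out, for example with continue-on-error). The sketch below mimics that fail-fast behaviour locally for the commands the workflow runs after checkout and Node.js setup; run_steps, default_runner and DEPLOY_STEPS are our own names, and running it for real assumes yarn, the Serverless CLI and AWS credentials are available on your machine.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands from the workflow's `run` steps and the Serverless action's args.
DEPLOY_STEPS: List[List[str]] = [
    ["yarn", "install"],
    ["yarn", "build"],
    ["serverless", "client", "deploy", "--no-confirm"],
]


def default_runner(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_steps(steps: Sequence[Sequence[str]],
              runner: Callable[[Sequence[str]], int] = default_runner) -> List[Sequence[str]]:
    """Run steps in order, stopping at the first failure; return the steps that were attempted."""
    attempted: List[Sequence[str]] = []
    for cmd in steps:
        attempted.append(cmd)
        if runner(cmd) != 0:
            break  # a failing step fails the job, so later steps are skipped
    return attempted
```

    Passing a fake runner makes the ordering easy to check without touching AWS: if yarn build fails, the deploy command is never attempted.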

    Now we have a CD workflow set up to deploy our app. We can make a commit and push it to the master branch, which should trigger our workflow. You can watch the workflow run in the Actions tab of your repository.

    You can check the output of the serverless deploy step and browse the S3 website URL. It should now show our application running.

    Creating CircleCI Workflow

    To build a repository on CircleCI, we first need to authorize CircleCI with our GitHub account. You can do that by signing up for CircleCI and following the steps in its documentation.

    Just as we added the IAM user's credentials as secrets for the GitHub Actions workflow, we need to make them available to CircleCI. We can do this by defining the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the CircleCI project settings.

    Just like in GitHub Actions, we can create workflows in CircleCI. CircleCI also supports reusable, shareable packages of configuration called orbs, which we can use in our deployment workflows.

    We’ll need the official CircleCI distributions of the aws-cli, serverless-framework, and node.js orbs for our deploy workflow. Let’s create our first job for our workflow:

    version: 2.1
     
    orbs:
     aws-cli: circleci/aws-cli@1.0.0
     serverless: circleci/serverless-framework@1.0.1
     node: circleci/node@4.1.0
     
    jobs:
     deploy:
       executor: serverless/default

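    The executor defines the environment the job runs in; serverless/default is an executor provided by the serverless-framework orb, which runs the job in a prebuilt image.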

    Just as we defined steps for the GitHub Actions job, we can add steps to the CircleCI job. Here we use the node/install-yarn command from the node orb to install yarn, plain run steps to install dependencies and build the project, and the aws-cli/setup and serverless/setup commands to configure the AWS CLI and the Serverless Framework. These setup steps read the AWS credentials from the CircleCI environment variables we defined earlier.

    version: 2.1
     
    orbs:
     aws-cli: circleci/aws-cli@1.0.0
     serverless: circleci/serverless-framework@1.0.1
     node: circleci/node@4.1.0
     
    jobs:
     deploy:
       executor: serverless/default
       steps:
         - checkout
         - node/install-yarn
         - run:
             name: install
             command: yarn install
         - run:
             name: build
             command: yarn build
         - aws-cli/setup
         - serverless/setup:
             app-name: serverless-s3
             org-name: velotio
         - run:
             name: deploy
             command: serverless client deploy --no-confirm
    workflows:
     deploy:
       jobs:
         - deploy:
             filters:
               branches:
                 only:
                   - master

    As with the GitHub Actions deploy job, the CircleCI deploy job runs the following steps in order:

    1. Check out the code
    2. Install the yarn package manager with node/install-yarn
    3. Install dependencies with yarn install
    4. Build the project with yarn build
    5. Set up the AWS and Serverless CLIs
    6. Deploy to S3 with serverless client deploy --no-confirm

    The workflows block in the config above tells CircleCI to run the deploy job, and the job's filters block restricts it to runs triggered by updates to the master branch.
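    Per CircleCI's configuration reference, entries under branches: only can be exact branch names or regular expressions wrapped in slashes, and a regex has to match the whole branch name. The sketch below models that rule; branch_allowed is a hypothetical helper, not part of CircleCI's tooling.

```python
import re
from typing import Iterable


def branch_allowed(branch: str, only: Iterable[str]) -> bool:
    """Mimic CircleCI's `filters: branches: only:` for exact names and /regex/ entries."""
    for spec in only:
        if len(spec) > 1 and spec.startswith("/") and spec.endswith("/"):
            # A regex filter has to match the entire branch name.
            if re.fullmatch(spec[1:-1], branch):
                return True
        elif spec == branch:
            return True
    return False
```

    With our config (only: [master]), a push to feature/login builds nothing, while a hypothetical extra entry such as /release-.*/ would also let release-1.2 deploy.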

    Once we’re done with the above setup, we can make a test commit and check whether our workflow is running.

    Conclusion

    We can easily integrate build and deployment workflows using the simple configuration offered by GitHub Actions. If we don't use GitHub as our primary version control host, we can opt for CircleCI for our workflows instead.


    You can find the referenced code at this repo.

  • Using Packer and Terraform to Setup Jenkins Master-Slave Architecture

    Automation is everywhere, and it is better to adopt it as early as possible. In this blog post, we are going to create the infrastructure for a Jenkins deployment pipeline hosted on AWS. Packer will be used to create AMIs, and Terraform will be used to create the master and slave machines. We will discuss different ways of connecting the slaves to the master and run a sample application through the pipeline.

    Please keep in mind that the intent of this blog is to bring all the different components together in one place, so some code that would normally live in the application's development repository is included here as well. Now that we have covered the required tools, the 10,000 ft view, and the intent of the blog, let's begin.

    Using Packer to Create AMIs for Jenkins Master and Linux Slave

    HashiCorp has given us some of the most useful tools for simplifying our lives, and Packer is one of them. Packer creates custom AMIs from existing AMIs: we write a JSON template, pass an installation script as part of the build, and Packer takes care of producing the AMI for us. Install Packer for your platform from the Packer downloads page. For simplicity, we will use Linux machines for both the Jenkins master and the Linux slave. Both use the same JSON template, but it can be split if needed.

    Note: the user data passed from Terraform differs between the two machines, and that is what ultimately determines whether an instance acts as the master or as a slave.

    We are using Amazon Linux 2 as the base image. Here is the Packer template, amazon.json:

    {
      "builders": [
        {
          "ami_description": "{{user `ami-description`}}",
          "ami_name": "{{user `ami-name`}}",
          "ami_regions": [
            "us-east-1"
          ],
          "ami_users": [
            "XXXXXXXXXX"
          ],
          "ena_support": "true",
          "instance_type": "t2.medium",
          "region": "us-east-1",
          "source_ami_filter": {
            "filters": {
              "name": "amzn2-ami-hvm-2.0*x86_64*",
              "root-device-type": "ebs",
              "virtualization-type": "hvm"
            },
            "most_recent": true,
            "owners": [
              "amazon"
            ]
          },
          "sriov_support": "true",
          "ssh_username": "ec2-user",
          "tags": {
            "Name": "{{user `ami-name`}}"
          },
          "type": "amazon-ebs"
        }
      ],
      "post-processors": [
        {
          "inline": [
            "echo AMI Name {{user `ami-name`}}",
            "date",
            "exit 0"
          ],
          "type": "shell-local"
        }
      ],
      "provisioners": [
        {
          "script": "install_amazon.bash",
          "type": "shell"
        }
      ],
      "variables": {
        "ami-description": "Amazon Linux for Jenkins Master and Slave ({{isotime \"2006-01-02-15-04-05\"}})",
        "ami-name": "amazon-linux-for-jenkins-{{isotime \"2006-01-02-15-04-05\"}}",
        "aws_access_key": "",
        "aws_secret_key": ""
      }
    }

    As you can see, the template is pretty simple. The only part of interest is the install_amazon.bash provisioner script. In this blog post, we will deploy a Node-based application running inside a Docker container, so the script installs the tooling for that. Here is the content of the bash file:

    #!/bin/bash
    
    set -x
    
    # For Node
    curl -sL https://rpm.nodesource.com/setup_10.x | sudo -E bash -
    
    # For xmlstarlet
    sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    
    sudo yum update -y
    
    sleep 10
    
    # Setting up Docker
    sudo yum install -y docker
    sudo usermod -a -G docker ec2-user
    
    # Just to be safe removing previously available java if present
    sudo yum remove -y java
    
    sudo yum install -y python2-pip jq unzip vim tree biosdevname nc mariadb bind-utils at screen tmux xmlstarlet git java-1.8.0-openjdk gcc-c++ make nodejs
    
    sudo -H pip install awscli bcrypt
    sudo -H pip install --upgrade awscli
    sudo -H pip install --upgrade aws-ec2-assign-elastic-ip
    
    sudo npm install -g @angular/cli
    
    sudo systemctl enable docker
    sudo systemctl enable atd
    
    sudo yum clean all
    sudo rm -rf /var/cache/yum/
    exit 0

    The script installs quite a few things, so let's go through them. As mentioned earlier, we will discuss different ways of connecting a slave, and for that we need xmlstarlet, which we later use to edit Jenkins' XML configuration files. The rest are packages that we may need in one way or another.

    Replace the XXXXXXXXXX placeholder in ami_users with the AWS account ID(s) the AMI should be shared with. You can find your account ID in the AWS console under Support > Support Center.

    Check that the template is valid by running packer validate amazon.json.

    Once confirmed, build the packer image by running packer build amazon.json.

    Once the build completes, check your AWS console and you will find the new AMI under “My AMIs”.

    It’s now time to start using terraform for creating the machines. 

    Prerequisite:

    1. Please make sure you create a provider.tf file.

    provider "aws" {
      region                  = "us-east-1"
      shared_credentials_file = "~/.aws/credentials"
      profile                 = "dev"
    }

    The credentials file must contain aws_access_key_id and aws_secret_access_key under the [dev] profile referenced above.
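    The shared credentials file is an INI file with one section per profile. The sketch below shows the expected layout for the dev profile and a way to check it before running Terraform; load_profile is a hypothetical helper, and the key values are AWS's documented placeholder examples, not real credentials.

```python
import configparser
from typing import Dict

# Example credentials file content; the keys are AWS's documented placeholder values.
SAMPLE_CREDENTIALS = """\
[dev]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
"""


def load_profile(text: str, profile: str = "dev") -> Dict[str, str]:
    """Return the access key pair stored under [profile] in credentials-file text."""
    # interpolation=None so characters such as % in values are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(profile):
        raise KeyError(f"profile [{profile}] not found")
    section = parser[profile]
    return {
        "aws_access_key_id": section["aws_access_key_id"],
        "aws_secret_access_key": section["aws_secret_access_key"],
    }
```

    To check your real file, pass it the contents of ~/.aws/credentials, for example load_profile(Path("~/.aws/credentials").expanduser().read_text()) with pathlib's Path.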

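    2. Keep SSH key pairs handy for the server and slave machines. Create them beforehand (for example with ssh-keygen, or in the AWS console) and reference them in the code: the scripts below expect jenkins_server.pub, jenkins_worker.pub, and jenkins_worker.pem in the Terraform working directory.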

    3. Look up the default VPC and its subnets:

    # lookup for the "default" VPC
    data "aws_vpc" "default_vpc" {
      default = true
    }
    
    # subnet list in the "default" VPC
    # The "default" VPC has all "public subnets"
    data "aws_subnet_ids" "default_public" {
      vpc_id = "${data.aws_vpc.default_vpc.id}"
    }

    Creating Terraform Script for Spinning up Jenkins Master

    Get Terraform from the Terraform downloads page if you haven't installed it yet.

    We will need to set up the Security Group before setting up the instance.

    # Security Group:
    resource "aws_security_group" "jenkins_server" {
      name        = "jenkins_server"
      description = "Jenkins Server: created by Terraform for [dev]"
    
      # legacy name of VPC ID
      vpc_id = "${data.aws_vpc.default_vpc.id}"
    
      tags {
        Name = "jenkins_server"
        env  = "dev"
      }
    }
    
    ###############################################################################
    # ALL INBOUND
    ###############################################################################
    
    # ssh
    resource "aws_security_group_rule" "jenkins_server_from_source_ingress_ssh" {
      type              = "ingress"
      from_port         = 22
      to_port           = 22
      protocol          = "tcp"
      security_group_id = "${aws_security_group.jenkins_server.id}"
      # your public IP plus the default VPC's private range
      cidr_blocks       = ["<Your Public IP>/32", "172.31.0.0/16"]
      description       = "ssh to jenkins_server"
    }
    
    # web
    resource "aws_security_group_rule" "jenkins_server_from_source_ingress_webui" {
      type              = "ingress"
      from_port         = 8080
      to_port           = 8080
      protocol          = "tcp"
      security_group_id = "${aws_security_group.jenkins_server.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "jenkins server web"
    }
    
    # JNLP
    resource "aws_security_group_rule" "jenkins_server_from_source_ingress_jnlp" {
      type              = "ingress"
      from_port         = 33453
      to_port           = 33453
      protocol          = "tcp"
      security_group_id = "${aws_security_group.jenkins_server.id}"
      cidr_blocks       = ["172.31.0.0/16"]
      description       = "jenkins server JNLP Connection"
    }
    
    ###############################################################################
    # ALL OUTBOUND
    ###############################################################################
    
    resource "aws_security_group_rule" "jenkins_server_to_other_machines_ssh" {
      type              = "egress"
      from_port         = 22
      to_port           = 22
      protocol          = "tcp"
      security_group_id = "${aws_security_group.jenkins_server.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins servers to ssh to other machines"
    }
    
    resource "aws_security_group_rule" "jenkins_server_outbound_all_80" {
      type              = "egress"
      from_port         = 80
      to_port           = 80
      protocol          = "tcp"
      security_group_id = "${aws_security_group.jenkins_server.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins servers for outbound yum"
    }
    
    resource "aws_security_group_rule" "jenkins_server_outbound_all_443" {
      type              = "egress"
      from_port         = 443
      to_port           = 443
      protocol          = "tcp"
      security_group_id = "${aws_security_group.jenkins_server.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins servers for outbound yum"
    }
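    When reviewing rules like these, it helps to check concretely which sources a CIDR block admits. The sketch below models single-port TCP ingress rules with Python's ipaddress module; is_allowed and JENKINS_SERVER_INGRESS are our own illustrative names, and the SSH rule is left out because its <Your Public IP> placeholder is not a real address. It also shows why a range like 172.0.0.0/8 is much wider than the default VPC's 172.31.0.0/16: it reaches addresses outside the private 172.16.0.0/12 block.

```python
import ipaddress
from typing import List, Tuple

# (port, CIDR blocks) pairs modelled on the Jenkins server ingress rules above.
JENKINS_SERVER_INGRESS: List[Tuple[int, List[str]]] = [
    (8080, ["0.0.0.0/0"]),        # web UI, open to everyone
    (33453, ["172.31.0.0/16"]),   # JNLP, default VPC range only
]


def is_allowed(rules: List[Tuple[int, List[str]]], port: int, source_ip: str) -> bool:
    """Return True if any rule opens `port` to `source_ip` (TCP, single-port rules only)."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        rule_port == port and any(ip in ipaddress.ip_network(cidr) for cidr in cidrs)
        for rule_port, cidrs in rules
    )


# 172.0.0.0/8 reaches well beyond the private 172.16.0.0/12 block.
print(ipaddress.ip_address("172.217.0.1") in ipaddress.ip_network("172.0.0.0/8"))  # True
print(ipaddress.ip_address("172.217.0.1").is_private)                              # False
```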

    Now that we have a custom AMI and a security group, let's use them to create the Jenkins master instance with Terraform.

    # AMI lookup for this Jenkins Server
    data "aws_ami" "jenkins_server" {
      most_recent      = true
      owners           = ["self"]
    
      filter {
        name   = "name"
        values = ["amazon-linux-for-jenkins*"]
      }
    }
    
    resource "aws_key_pair" "jenkins_server" {
      key_name   = "jenkins_server"
      public_key = "${file("jenkins_server.pub")}"
    }
    
    # userdata for the Jenkins server ...
    data "template_file" "jenkins_server" {
      template = "${file("scripts/jenkins_server.sh")}"
    
      vars {
        env = "dev"
        jenkins_admin_password = "mysupersecretpassword"
      }
    }
    
    # the Jenkins server itself; it references the security group resource directly
    # so that Terraform creates the group before the instance
    resource "aws_instance" "jenkins_server" {
      ami                    = "${data.aws_ami.jenkins_server.image_id}"
      instance_type          = "t3.medium"
      key_name               = "${aws_key_pair.jenkins_server.key_name}"
      subnet_id              = "${data.aws_subnet_ids.default_public.ids[0]}"
      vpc_security_group_ids = ["${aws_security_group.jenkins_server.id}"]
      iam_instance_profile   = "dev_jenkins_server"
      user_data              = "${data.template_file.jenkins_server.rendered}"
    
      tags {
        "Name" = "jenkins_server"
      }
    
      root_block_device {
        delete_on_termination = true
      }
    }
    
    output "jenkins_server_ami_name" {
        value = "${data.aws_ami.jenkins_server.name}"
    }
    
    output "jenkins_server_ami_id" {
        value = "${data.aws_ami.jenkins_server.id}"
    }
    
    output "jenkins_server_public_ip" {
      value = "${aws_instance.jenkins_server.public_ip}"
    }
    
    output "jenkins_server_private_ip" {
      value = "${aws_instance.jenkins_server.private_ip}"
    }
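    The aws_ami lookup and the Packer template work together through the AMI name: Packer's isotime function formats the current UTC time using a Go layout, and "2006-01-02-15-04-05" corresponds to Python's "%Y-%m-%d-%H-%M-%S". The sketch below reproduces the naming and the "newest matching image" selection that most_recent = true performs; ami_name and most_recent are hypothetical helpers, and the image records are made-up examples.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from typing import Dict, List

NAME_PREFIX = "amazon-linux-for-jenkins-"


def ami_name(now: datetime) -> str:
    """Mirror the Packer ami-name; Go layout 2006-01-02-15-04-05 is %Y-%m-%d-%H-%M-%S in UTC."""
    return NAME_PREFIX + now.astimezone(timezone.utc).strftime("%Y-%m-%d-%H-%M-%S")


def most_recent(images: List[Dict[str, str]],
                pattern: str = "amazon-linux-for-jenkins*") -> Dict[str, str]:
    """Pick the newest image whose name matches the wildcard, like most_recent = true."""
    matches = [img for img in images if fnmatchcase(img["name"], pattern)]
    if not matches:
        raise LookupError(f"no AMI name matches {pattern!r}")
    # AWS reports creation_date in one fixed ISO 8601 format, so string order is time order.
    return max(matches, key=lambda img: img["creation_date"])
```

    Because every Packer build produces a new, timestamped AMI, rerunning terraform apply after a rebuild picks up the newest image without any change to the Terraform code.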

    As mentioned before, we will discuss multiple ways of connecting the slaves to the Jenkins master. First, though, note that every new Jenkins installation generates a unique initial admin password. There are two ways to deal with this: wait for Jenkins to start and retrieve that password, or set the admin password ourselves while creating the Jenkins master. Here we will set the password while configuring Jenkins. (If you need a script that retrieves the Jenkins password as soon as it is generated, leave a comment and I will share it with you.)
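    For the first approach, waiting for Jenkins to generate its password, a minimal sketch is shown below. It polls the initialAdminPassword file whose path appears in the user data script until the file exists and is non-empty, with a timeout. wait_for_initial_password is a hypothetical helper, not the author's script, and reading the real file typically requires root, as EC2 user data has.

```python
import time
from pathlib import Path

# Path taken from the commented-out line in the user data script.
INITIAL_PASSWORD_FILE = "/var/lib/jenkins/secrets/initialAdminPassword"


def wait_for_initial_password(path: str = INITIAL_PASSWORD_FILE,
                              timeout: float = 300.0, interval: float = 5.0) -> str:
    """Poll until Jenkins has written its generated admin password, then return it."""
    deadline = time.monotonic() + timeout
    target = Path(path)
    while True:
        try:
            password = target.read_text().strip()
            if password:
                return password
        except FileNotFoundError:
            pass  # Jenkins has not created the file yet
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} did not appear within {timeout} seconds")
        time.sleep(interval)
```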

    Below is the user data to install Jenkins master, configure its password and install required packages.

    #!/bin/bash
    
    set -x
    
    function wait_for_jenkins()
    {
      while (( 1 )); do
          echo "waiting for Jenkins to launch on port [8080] ..."
          
          nc -zv 127.0.0.1 8080
          if (( $? == 0 )); then
              break
          fi
    
          sleep 10
      done
    
      echo "Jenkins launched"
    }
    
    function updating_jenkins_master_password ()
    {
      cat > /tmp/jenkinsHash.py <<EOF
    import bcrypt
    import sys
    if not sys.argv[1]:
      sys.exit(10)
    plaintext_pwd=sys.argv[1]
    encrypted_pwd=bcrypt.hashpw(sys.argv[1], bcrypt.gensalt(rounds=10, prefix=b"2a"))
    isCorrect=bcrypt.checkpw(plaintext_pwd, encrypted_pwd)
    if not isCorrect:
      sys.exit(20);
    print "{}".format(encrypted_pwd)
    EOF
    
      chmod +x /tmp/jenkinsHash.py
      
      # Wait till /var/lib/jenkins/users/admin* folder gets created
      sleep 10
    
      cd /var/lib/jenkins/users/admin*
      pwd
      while (( 1 )); do
          echo "Waiting for Jenkins to generate admin user's config file ..."
    
          if [[ -f "./config.xml" ]]; then
              break
          fi
    
          sleep 10
      done
    
      echo "Admin config file created"
    
      admin_password=$(python /tmp/jenkinsHash.py ${jenkins_admin_password} 2>&1)
      
      # Do not change the quoting below: it keeps the hash intact; otherwise $<character> sequences in the hash could be substituted with empty strings
      xmlstarlet -q ed --inplace -u "/user/properties/hudson.security.HudsonPrivateSecurityRealm_-Details/passwordHash" -v '#jbcrypt:'"$admin_password" config.xml
    
      # Restart
      systemctl restart jenkins
      sleep 10
    }
    
    function install_packages ()
    {
    
      wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat-stable/jenkins.repo
      rpm --import https://jenkins-ci.org/redhat/jenkins-ci.org.key
      yum install -y jenkins
    
      # firewall
      #firewall-cmd --permanent --new-service=jenkins
      #firewall-cmd --permanent --service=jenkins --set-short="Jenkins Service Ports"
      #firewall-cmd --permanent --service=jenkins --set-description="Jenkins Service firewalld port exceptions"
      #firewall-cmd --permanent --service=jenkins --add-port=8080/tcp
      #firewall-cmd --permanent --add-service=jenkins
      #firewall-cmd --zone=public --add-service=http --permanent
      #firewall-cmd --reload
      systemctl enable jenkins
      systemctl restart jenkins
      sleep 10
    }
    
    function configure_jenkins_server ()
    {
      # Jenkins cli
      echo "installing the Jenkins cli ..."
      cp /var/cache/jenkins/war/WEB-INF/jenkins-cli.jar /var/lib/jenkins/jenkins-cli.jar
    
      # Getting initial password
      # PASSWORD=$(cat /var/lib/jenkins/secrets/initialAdminPassword)
      PASSWORD="${jenkins_admin_password}"
      sleep 10
    
      jenkins_dir="/var/lib/jenkins"
      plugins_dir="$jenkins_dir/plugins"
    
      cd $jenkins_dir
    
      # Open JNLP port
      xmlstarlet -q ed --inplace -u "/hudson/slaveAgentPort" -v 33453 config.xml
    
      cd $plugins_dir || { echo "unable to chdir to [$plugins_dir]"; exit 1; }
    
      # List of plugins that are needed to be installed 
      plugin_list="git-client git github-api github-oauth github MSBuild ssh-slaves workflow-aggregator ws-cleanup"
    
      # remove existing plugins, if any ...
      rm -rfv $plugin_list
    
      for plugin in $plugin_list; do
          echo "installing plugin [$plugin] ..."
          java -jar $jenkins_dir/jenkins-cli.jar -s http://127.0.0.1:8080/ -auth admin:$PASSWORD install-plugin $plugin
      done
    
      # Restart jenkins after installing plugins
      java -jar $jenkins_dir/jenkins-cli.jar -s http://127.0.0.1:8080 -auth admin:$PASSWORD safe-restart
    }
    
    ### script starts here ###
    
    install_packages
    
    wait_for_jenkins
    
    updating_jenkins_master_password
    
    wait_for_jenkins
    
    configure_jenkins_server
    
    echo "Done"
    exit 0
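    The wait_for_jenkins function above probes port 8080 with nc -zv every 10 seconds until Jenkins answers. For comparison, here is the same readiness check sketched in Python with a plain TCP connect; wait_for_port is our own name, and unlike the bash loop it gives up after a bounded number of attempts instead of waiting forever.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 60, interval: float = 10.0) -> bool:
    """Python take on the `nc -zv` loop: True once host:port accepts a TCP connection."""
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            # Refused or timed out: Jenkins is not listening yet.
            if attempt < attempts - 1:
                time.sleep(interval)
    return False
```

    A bounded wait makes a broken Jenkins install show up as a failed boot script rather than an instance that hangs silently; with the defaults it waits about ten minutes.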
    

    A lot has been covered here, but the trickiest part is changing the Jenkins password. We use a Python script that calls bcrypt to hash the plain-text password in the format Jenkins expects (a $2a$ bcrypt hash stored with a #jbcrypt: prefix), and xmlstarlet to write that hash into the admin user's config.xml. We also use xmlstarlet to set the JNLP port (33453) that the Windows slave connects to. Remember that the initial Jenkins username is admin.
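    Both xmlstarlet calls do the same thing: set the text of one element addressed by its path below the root. The sketch below does the equivalent with Python's xml.etree.ElementTree, string in and string out; set_xml_text is a hypothetical helper, the bcrypt hash in the usage is a dummy value, and because Jenkins files often start with an XML 1.1 declaration, the sketch drops the declaration before parsing (so the output has none).

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def set_xml_text(xml_text: str, path: Sequence[str], value: str) -> str:
    """Set the text of the element reached by `path` (child tags below the root).

    Rough equivalent of `xmlstarlet ed -u /root/a/b -v value`; returns the edited XML
    without an XML declaration.
    """
    # Drop a leading <?xml ...?> declaration before parsing.
    stripped = xml_text.lstrip()
    if stripped.startswith("<?xml"):
        stripped = stripped[stripped.index("?>") + 2:]
    root = ET.fromstring(stripped.strip())
    node = root
    for tag in path:
        # Compare tags directly so dotted names such as hudson.security.* need no escaping.
        child = next((c for c in node if c.tag == tag), None)
        if child is None:
            raise KeyError(f"<{tag}> not found under <{node.tag}>")
        node = child
    node.text = value
    return ET.tostring(root, encoding="unicode")


# Same edits as the user data script: the admin password hash and the JNLP port.
REALM = "hudson.security.HudsonPrivateSecurityRealm_-Details"
user_xml = f"<user><properties><{REALM}><passwordHash>old</passwordHash></{REALM}></properties></user>"
print(set_xml_text(user_xml, ["properties", REALM, "passwordHash"], "#jbcrypt:$2a$10$dummyhash"))
print(set_xml_text("<hudson><slaveAgentPort>-1</slaveAgentPort></hudson>", ["slaveAgentPort"], "33453"))
```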

    Commands to run: initialize Terraform with terraform init, review the changes with terraform plan, and then apply them with terraform apply.
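    One refinement to this sequence is to save the reviewed plan to a file and apply exactly that file, so nothing changes between review and apply. Since terraform apply with a saved plan file does not prompt for approval, the sketch below asks for confirmation itself. terraform_commands, ask_to_apply and run_terraform are hypothetical helpers, and the plan file name tfplan is an assumption.

```python
import subprocess
from typing import Callable, List, Sequence


def terraform_commands(plan_file: str = "tfplan") -> List[List[str]]:
    """init, then save the plan to a file, then apply exactly that saved plan."""
    return [
        ["terraform", "init"],
        ["terraform", "plan", f"-out={plan_file}"],
        ["terraform", "apply", plan_file],
    ]


def ask_to_apply() -> bool:
    # Applying a saved plan does not prompt, so we ask before the final step.
    return input("Apply the saved plan? [y/N] ").strip().lower() == "y"


def run_terraform(commands: Sequence[Sequence[str]],
                  runner: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(list(cmd)).returncode,
                  confirm: Callable[[], bool] = ask_to_apply) -> bool:
    """Run the commands in order; return False if the apply step was declined."""
    for cmd in commands:
        if list(cmd[:2]) == ["terraform", "apply"] and not confirm():
            return False
        code = runner(cmd)
        if code != 0:
            raise RuntimeError(f"{' '.join(cmd)} exited with status {code}")
    return True
```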

    After the apply command succeeds, go to the AWS console and check that the new instance is coming up. Then browse to http://<public IP>:8080, log in as admin with the password you passed in, and your Jenkins master is ready to use.

    Note: I will be providing the terraform script and permission list of IAM roles for the user at the end of the blog.

    Creating Terraform Script for Spinning up Linux Slave and Connecting It to Master

    We won't create a new image here; instead, we reuse the AMI we built for the Jenkins master.

    The VPC stays the same, and the security groups for the slave are below:

    resource "aws_security_group" "dev_jenkins_worker_linux" {
      name        = "dev_jenkins_worker_linux"
      description = "Jenkins Linux worker: created by Terraform for [dev]"
    
      # legacy name of VPC ID
      vpc_id = "${data.aws_vpc.default_vpc.id}"
    
      tags {
        Name = "dev_jenkins_worker_linux"
        env  = "dev"
      }
    }
    
    ###############################################################################
    # ALL INBOUND
    ###############################################################################
    
    # ssh
    resource "aws_security_group_rule" "jenkins_worker_linux_from_source_ingress_ssh" {
      type              = "ingress"
      from_port         = 22
      to_port           = 22
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
      cidr_blocks       = ["<Your Public IP>/32"]
      description       = "ssh to jenkins_worker_linux"
    }
    
    # web
    resource "aws_security_group_rule" "jenkins_worker_linux_from_source_ingress_webui" {
      type              = "ingress"
      from_port         = 8080
      to_port           = 8080
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "web access to jenkins_worker_linux"
    }
    
    
    ###############################################################################
    # ALL OUTBOUND
    ###############################################################################
    
    resource "aws_security_group_rule" "jenkins_worker_linux_to_all_80" {
      type              = "egress"
      from_port         = 80
      to_port           = 80
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins worker to all 80"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_linux_to_all_443" {
      type              = "egress"
      from_port         = 443
      to_port           = 443
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins worker to all 443"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_linux_to_other_machines_ssh" {
      type              = "egress"
      from_port         = 22
      to_port           = 22
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins worker linux to jenkins server"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_linux_to_jenkins_server_8080" {
      type                     = "egress"
      from_port                = 8080
      to_port                  = 8080
      protocol                 = "tcp"
      security_group_id        = "${aws_security_group.dev_jenkins_worker_linux.id}"
      source_security_group_id = "${aws_security_group.jenkins_server.id}"
      description              = "allow jenkins workers linux to jenkins server"
    }

    Now that the required security groups are in place, it's time to look at the Terraform script for the Linux slave.

    data "aws_ami" "jenkins_worker_linux" {
      most_recent      = true
      owners           = ["self"]
    
      filter {
        name   = "name"
        values = ["amazon-linux-for-jenkins*"]
      }
    }
    
    resource "aws_key_pair" "jenkins_worker_linux" {
      key_name   = "jenkins_worker_linux"
      public_key = "${file("jenkins_worker.pub")}"
    }
    
    data "local_file" "jenkins_worker_pem" {
      filename = "${path.module}/jenkins_worker.pem"
    }
    
    data "template_file" "userdata_jenkins_worker_linux" {
      template = "${file("scripts/jenkins_worker_linux.sh")}"
    
      vars {
        env         = "dev"
        region      = "us-east-1"
        datacenter  = "dev-us-east-1"
        node_name   = "us-east-1-jenkins_worker_linux"
        domain      = ""
        device_name = "eth0"
        server_ip   = "${aws_instance.jenkins_server.private_ip}"
        worker_pem  = "${data.local_file.jenkins_worker_pem.content}"
        jenkins_username = "admin"
        jenkins_password = "mysupersecretpassword"
      }
    }
    
    resource "aws_launch_configuration" "jenkins_worker_linux" {
      name_prefix                 = "dev-jenkins-worker-linux"
      image_id                    = "${data.aws_ami.jenkins_worker_linux.image_id}"
      instance_type               = "t3.medium"
      iam_instance_profile        = "dev_jenkins_worker_linux"
      key_name                    = "${aws_key_pair.jenkins_worker_linux.key_name}"
      # reference the worker security group resource directly so it is created first
      security_groups             = ["${aws_security_group.dev_jenkins_worker_linux.id}"]
      user_data                   = "${data.template_file.userdata_jenkins_worker_linux.rendered}"
      associate_public_ip_address = false
    
      root_block_device {
        delete_on_termination = true
        volume_size = 100
      }
    
      lifecycle {
        create_before_destroy = true
      }
    }
    
    resource "aws_autoscaling_group" "jenkins_worker_linux" {
      name                      = "dev-jenkins-worker-linux"
      min_size                  = "1"
      max_size                  = "2"
      desired_capacity          = "2"
      health_check_grace_period = 60
      health_check_type         = "EC2"
      vpc_zone_identifier       = ["${data.aws_subnet_ids.default_public.ids}"]
      launch_configuration      = "${aws_launch_configuration.jenkins_worker_linux.name}"
      termination_policies      = ["OldestLaunchConfiguration"]
      wait_for_capacity_timeout = "10m"
      default_cooldown          = 60
    
      tags = [
        {
          key                 = "Name"
          value               = "dev_jenkins_worker_linux"
          propagate_at_launch = true
        },
        {
          key                 = "class"
          value               = "dev_jenkins_worker_linux"
          propagate_at_launch = true
        },
      ]
    }

    And now the final piece of code, which is user-data of slave machine.

    #!/bin/bash
    
    set -x
    
    function wait_for_jenkins ()
    {
        echo "Waiting for Jenkins to launch on 8080..."
    
        while (( 1 )); do
            echo "Waiting for Jenkins"
    
            nc -zv ${server_ip} 8080
            if (( $? == 0 )); then
                break
            fi
    
            sleep 10
        done
    
        echo "Jenkins launched"
    }
    
    function slave_setup()
    {
        # Wait till jar file gets available
        ret=1
        while (( $ret != 0 )); do
            wget -O /opt/jenkins-cli.jar http://${server_ip}:8080/jnlpJars/jenkins-cli.jar
            ret=$?
    
            echo "jenkins cli ret [$ret]"
        done
    
        ret=1
        while (( $ret != 0 )); do
            wget -O /opt/slave.jar http://${server_ip}:8080/jnlpJars/slave.jar
            ret=$?
    
            echo "jenkins slave ret [$ret]"
        done
        
        mkdir -p /opt/jenkins-slave
        chown -R ec2-user:ec2-user /opt/jenkins-slave
    
        # Register_slave
        JENKINS_URL="http://${server_ip}:8080"
    
        USERNAME="${jenkins_username}"
        
        # PASSWORD=$(cat /tmp/secret)
        PASSWORD="${jenkins_password}"
    
        SLAVE_IP=$(ip -o -4 addr list ${device_name} | head -n1 | awk '{print $4}' | cut -d/ -f1)
        NODE_NAME=$(echo "jenkins-slave-linux-$SLAVE_IP" | tr '.' '-')
        NODE_SLAVE_HOME="/opt/jenkins-slave"
        EXECUTORS=2
        SSH_PORT=22
    
        CRED_ID="$NODE_NAME"
        LABELS="build linux docker"
        USERID="ec2-user"
    
        cd /opt
        
        # Creating CMD utility for jenkins-cli commands
        jenkins_cmd="java -jar /opt/jenkins-cli.jar -s $JENKINS_URL -auth $USERNAME:$PASSWORD"
    
        # Waiting for Jenkins to load all plugins
        while (( 1 )); do
    
          count=$($jenkins_cmd list-plugins 2>/dev/null | wc -l)
          ret=$?
    
          echo "count [$count] ret [$ret]"
    
          if (( $count > 0 )); then
              break
          fi
    
          sleep 30
        done
    
        # Delete Credentials if present for respective slave machines
        $jenkins_cmd delete-credentials system::system::jenkins _ $CRED_ID
    
        # Generating cred.xml for creating credentials on Jenkins server
        cat > /tmp/cred.xml <<EOF
    <com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey plugin="ssh-credentials@1.16">
      <scope>GLOBAL</scope>
      <id>$CRED_ID</id>
      <description>Generated via Terraform for $SLAVE_IP</description>
      <username>$USERID</username>
      <privateKeySource class="com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey\$DirectEntryPrivateKeySource">
        <privateKey>${worker_pem}</privateKey>
      </privateKeySource>
    </com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey>
    EOF
    
        # Creating credential using cred.xml
        cat /tmp/cred.xml | $jenkins_cmd create-credentials-by-xml system::system::jenkins _
    
        # For Deleting Node, used when testing
        $jenkins_cmd delete-node $NODE_NAME
        
        # Generating node.xml for creating node on Jenkins server
        cat > /tmp/node.xml <<EOF
    <slave>
      <name>$NODE_NAME</name>
      <description>Linux Slave</description>
      <remoteFS>$NODE_SLAVE_HOME</remoteFS>
      <numExecutors>$EXECUTORS</numExecutors>
      <mode>NORMAL</mode>
      <retentionStrategy class="hudson.slaves.RetentionStrategy\$Always"/>
      <launcher class="hudson.plugins.sshslaves.SSHLauncher" plugin="ssh-slaves@1.5">
        <host>$SLAVE_IP</host>
        <port>$SSH_PORT</port>
        <credentialsId>$CRED_ID</credentialsId>
      </launcher>
      <label>$LABELS</label>
      <nodeProperties/>
      <userId>$USERID</userId>
    </slave>
    EOF
    
        sleep 10
    
        # Creating node using node.xml
        cat /tmp/node.xml | $jenkins_cmd create-node $NODE_NAME
    }
    
    ### script begins here ###
    
    wait_for_jenkins
    
    slave_setup
    
    echo "Done"
    exit 0

    This will not only create a node on Jenkins master but also attach it.
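    Both user-data scripts (this one and the Windows one later in the article) start by polling the Jenkins master with nc -zv in an endless loop until the port accepts TCP connections. As a minimal sketch of that probe, assuming IPv4 and the operating system's default connect timeout, here it is in C with a bounded number of attempts; wait_for_port and port_is_open are hypothetical names, not part of the scripts:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* One probe: true if a TCP connection to ip:port succeeds, the same check
   nc -zv performs. Uses the OS default connect timeout. */
static bool port_is_open(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return false;                      /* not a dotted-quad IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return false;
    bool reachable = connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0;
    close(fd);                             /* we only wanted to know it connects */
    return reachable;
}

/* Probe ip:port up to max_attempts times, sleeping delay_sec seconds between
   failed probes. Returns true as soon as one probe succeeds. */
bool wait_for_port(const char *ip, unsigned short port,
                   int max_attempts, unsigned int delay_sec)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (port_is_open(ip, port))
            return true;
        if (attempt < max_attempts)
            sleep(delay_sec);
    }
    return false;
}
```

    For example, wait_for_port("172.31.10.5", 8080, 60, 10) would give Jenkins roughly ten minutes to come up (the address is illustrative). A bound like this lets a bootstrap step fail loudly instead of hanging forever, which the while (( 1 )) loop in the script cannot do.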

    Commands to run: terraform init to initialize Terraform, then terraform plan to check the changes and terraform apply to apply them.

    One drawback of this approach: if the slave gets disconnected or goes down, it remains on the Jenkins master as an offline node, and it will not automatically reattach itself to the master.

    Some possible solutions:

    1. Create a cron job on the slave that re-runs the user-data script at a fixed interval.

    2. Use the Swarm plugin.

    3. Since we are on AWS, use the Amazon EC2 plugin.

    Maybe in a future blog, we will cover using both of these plugins as well.
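    Solution 1 is safe to repeat because the user-data script derives the node name deterministically from the worker's private IP and deletes any existing credential and node with that name before creating them again. A minimal C sketch of that naming rule, mirroring echo "jenkins-slave-linux-$SLAVE_IP" | tr '.' '-' (make_node_name is a hypothetical helper, not part of the scripts):

```c
#include <stddef.h>
#include <stdio.h>

/* Write prefix + ip into out, then replace every '.' with '-', like
   echo "jenkins-slave-linux-$SLAVE_IP" | tr '.' '-'.
   Returns 0 on success, -1 if out is too small for the whole name. */
int make_node_name(char *out, size_t out_len, const char *prefix, const char *ip)
{
    int n = snprintf(out, out_len, "%s%s", prefix, ip);
    if (n < 0 || (size_t)n >= out_len)
        return -1;                     /* never hand back a truncated name */
    for (char *p = out; *p != '\0'; p++)
        if (*p == '.')
            *p = '-';
    return 0;
}
```

    With prefix "jenkins-slave-linux-" and IP "172.31.5.10", this produces "jenkins-slave-linux-172-31-5-10", so the same worker always maps to the same node. Note that the Windows script builds its name as jenkins-slave-windows-$SLAVE_IP without replacing the dots.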

    Using Packer to create AMIs for Windows Slave

    The Windows AMI is also created with Packer. All the pointers from the Linux AMI apply here as well.

    {
      "variables": {
        "ami-description": "Windows Server for Jenkins Slave ({{isotime \"2006-01-02-15-04-05\"}})",
        "ami-name": "windows-slave-for-jenkins-{{isotime \"2006-01-02-15-04-05\"}}",
        "aws_access_key": "",
        "aws_secret_key": ""
      },
    
      "builders": [
        {
          "ami_description": "{{user `ami-description`}}",
          "ami_name": "{{user `ami-name`}}",
          "ami_regions": [
            "us-east-1"
          ],
          "ami_users": [
            "XXXXXXXXXX"
          ],
          "ena_support": "true",
          "instance_type": "t3.medium",
          "region": "us-east-1",
          "source_ami_filter": {
            "filters": {
              "name": "Windows_Server-2016-English-Full-Containers-*",
              "root-device-type": "ebs",
              "virtualization-type": "hvm"
            },
            "most_recent": true,
            "owners": [
              "amazon"
            ]
          },
          "sriov_support": "true",
          "user_data_file": "scripts/SetUpWinRM.ps1",
          "communicator": "winrm",
          "winrm_username": "Administrator",
          "winrm_insecure": true,
          "winrm_use_ssl": true,
          "tags": {
            "Name": "{{user `ami-name`}}"
          },
          "type": "amazon-ebs"
        }
      ],
      "post-processors": [
      {
        "inline": [
          "echo AMI Name {{user `ami-name`}}",
          "date",
          "exit 0"
        ],
        "type": "shell-local"
      }
      ],
      "provisioners": [
        {
          "type": "powershell",
          "valid_exit_codes": [ 0, 3010 ],
          "scripts": [
            "scripts/disable-uac.ps1",
            "scripts/enable-rdp.ps1",
            "install_windows.ps1"
          ]
        },
        {
          "type": "windows-restart",
          "restart_check_command": "powershell -command \"& {Write-Output 'restarted.'}\""
        },
        {
          "type": "powershell",
          "inline": [
            "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule",
            "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\SysprepInstance.ps1 -NoShutdown"
          ]
        }
      ]
    }

    Windows does not behave the same way Linux does. To communicate with this image, Packer needs WinRM, which we set up at the very beginning through user_data_file. Windows also asks for user input in many places, and an automated build cannot provide it without breaking the flow of execution, so we disable UAC. We also enable RDP so that we can connect to the machine from our local desktop for debugging if needed. Next, install_windows.ps1 sets up our slave. Finally, we call two EC2Launch PowerShell scripts: InitializeInstance.ps1 -Schedule schedules instance initialization for the next boot, which generates a new random Administrator password for every machine launched from this AMI, and SysprepInstance.ps1 -NoShutdown runs Sysprep to generalize the image. They are mandatory; without them you will never be able to log in to your machines.

    The template above references several scripts; let’s go through them in order of appearance.

    SetUpWinRM.ps1:

    <powershell>
    
    write-output "Running User Data Script"
    write-host "(host) Running User Data Script"
    
    Set-ExecutionPolicy Unrestricted -Scope LocalMachine -Force -ErrorAction Ignore
    
    # Don't set this before Set-ExecutionPolicy as it throws an error
    $ErrorActionPreference = "stop"
    
    # Remove HTTP listener
    Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse
    
    $Cert = New-SelfSignedCertificate -CertstoreLocation Cert:\LocalMachine\My -DnsName "packer"
    New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $Cert.Thumbprint -Force
    
    # WinRM
    write-output "Setting up WinRM"
    write-host "(host) setting up WinRM"
    
    cmd.exe /c winrm quickconfig -q
    cmd.exe /c winrm set "winrm/config" '@{MaxTimeoutms="1800000"}'
    cmd.exe /c winrm set "winrm/config/winrs" '@{MaxMemoryPerShellMB="1024"}'
    cmd.exe /c winrm set "winrm/config/service" '@{AllowUnencrypted="true"}'
    cmd.exe /c winrm set "winrm/config/client" '@{AllowUnencrypted="true"}'
    cmd.exe /c winrm set "winrm/config/service/auth" '@{Basic="true"}'
    cmd.exe /c winrm set "winrm/config/client/auth" '@{Basic="true"}'
    cmd.exe /c winrm set "winrm/config/service/auth" '@{CredSSP="true"}'
    cmd.exe /c winrm set "winrm/config/listener?Address=*+Transport=HTTPS" "@{Port=`"5986`";Hostname=`"packer`";CertificateThumbprint=`"$($Cert.Thumbprint)`"}"
    cmd.exe /c netsh advfirewall firewall set rule group="remote administration" new enable=yes
    cmd.exe /c netsh firewall add portopening TCP 5986 "Port 5986"
    cmd.exe /c net stop winrm
    cmd.exe /c sc config winrm start= auto
    cmd.exe /c net start winrm
    
    </powershell>

    The content is pretty straightforward, as it only sets up WinRM. The part that matters here is the <powershell> and </powershell> tags. They are mandatory: without them, the EC2 launch agent on Windows does not know that the user data is a PowerShell script and will not run it. Next come disable-uac.ps1 and enable-rdp.ps1, whose purpose we discussed above. The last script, install_windows.ps1, installs all the packages required in the AMI.

    Chocolatey, a blessing in disguise: installing applications on Windows through scripts is a real headache, as you have to write a lot of code just to install a single application. Luckily, we have Chocolatey, a package manager for Windows that lets us install applications much like we install packages on Linux. install_windows.ps1 shows how to install Chocolatey and how to use it to install other applications on Windows.

    See, such a small script and you can get all the components to run your Windows application in no time (Kidding… This script actually takes around 20 minutes to run :P)

    Remaining user-data can be found here.

    Now that we have the image, let’s write the Terraform script that makes this machine a slave of your Jenkins master.

    Creating a Terraform Script to Spin Up a Windows Slave and Connect It to the Master

    As before, we first create the security groups and then create the slave machine from the AMI we built above.

    resource "aws_security_group" "dev_jenkins_worker_windows" {
      name        = "dev_jenkins_worker_windows"
      description = "Jenkins Server: created by Terraform for [dev]"
    
      # legacy name of VPC ID
      vpc_id = "${data.aws_vpc.default_vpc.id}"
    
      tags {
        Name = "dev_jenkins_worker_windows"
        env  = "dev"
      }
    }
    
    ###############################################################################
    # ALL INBOUND
    ###############################################################################
    
    # Jenkins web UI (8080)
    resource "aws_security_group_rule" "jenkins_worker_windows_from_source_ingress_webui" {
      type              = "ingress"
      from_port         = 8080
      to_port           = 8080
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "8080 (web UI) to jenkins_worker_windows"
    }
    
    # rdp
    resource "aws_security_group_rule" "jenkins_worker_windows_from_rdp" {
      type              = "ingress"
      from_port         = 3389
      to_port           = 3389
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
      cidr_blocks       = ["<Your Public IP>/32"]
      description       = "rdp to jenkins_worker_windows"
    }
    
    ###############################################################################
    # ALL OUTBOUND
    ###############################################################################
    
    resource "aws_security_group_rule" "jenkins_worker_windows_to_all_80" {
      type              = "egress"
      from_port         = 80
      to_port           = 80
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins worker to all 80"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_windows_to_all_443" {
      type              = "egress"
      from_port         = 443
      to_port           = 443
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins worker to all 443"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_windows_to_jenkins_server_33453" {
      type              = "egress"
      from_port         = 33453
      to_port           = 33453
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
      cidr_blocks       = ["172.31.0.0/16"]
      description       = "allow jenkins worker windows to jenkins server"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_windows_to_jenkins_server_8080" {
      type                     = "egress"
      from_port                = 8080
      to_port                  = 8080
      protocol                 = "tcp"
      security_group_id        = "${aws_security_group.dev_jenkins_worker_windows.id}"
      source_security_group_id = "${aws_security_group.jenkins_server.id}"
      description              = "allow jenkins workers windows to jenkins server"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_windows_to_all_22" {
      type              = "egress"
      from_port         = 22
      to_port           = 22
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins worker windows to connect outbound from 22"
    }

    Once the security groups are in place, we create the Terraform file for the Windows machine itself. Unlike the Linux slave, the Windows slave cannot connect to the Jenkins master over SSH; it has to use JNLP instead. A quick recap: when creating the Jenkins master, we used xmlstarlet to set the JNLP port and added security group rules to allow JNLP connections. We have also opened the RDP port so that, if any issue occurs, you can log in to the machine and debug it.

    Terraform file:

    # Setting Up Windows Slave 
    data "aws_ami" "jenkins_worker_windows" {
      most_recent      = true
      owners           = ["self"]
    
      filter {
        name   = "name"
        values = ["windows-slave-for-jenkins*"]
      }
    }
    
    resource "aws_key_pair" "jenkins_worker_windows" {
      key_name   = "jenkins_worker_windows"
      public_key = "${file("jenkins_worker.pub")}"
    }
    
    data "template_file" "userdata_jenkins_worker_windows" {
      template = "${file("scripts/jenkins_worker_windows.ps1")}"
    
      vars {
        env         = "dev"
        region      = "us-east-1"
        datacenter  = "dev-us-east-1"
        node_name   = "us-east-1-jenkins_worker_windows"
        domain      = ""
        device_name = "eth0"
        server_ip   = "${aws_instance.jenkins_server.private_ip}"
        worker_pem  = "${data.local_file.jenkins_worker_pem.content}"
        jenkins_username = "admin"
        jenkins_password = "mysupersecretpassword"
      }
    }
    
    # look up the security group of the Windows Jenkins worker
    data "aws_security_group" "jenkins_worker_windows" {
      filter {
        name   = "group-name"
        values = ["dev_jenkins_worker_windows"]
      }
    }
    
    resource "aws_launch_configuration" "jenkins_worker_windows" {
      name_prefix                 = "dev-jenkins-worker-"
      image_id                    = "${data.aws_ami.jenkins_worker_windows.image_id}"
      instance_type               = "t3.medium"
      iam_instance_profile        = "dev_jenkins_worker_windows"
      key_name                    = "${aws_key_pair.jenkins_worker_windows.key_name}"
      security_groups             = ["${data.aws_security_group.jenkins_worker_windows.id}"]
      user_data                   = "${data.template_file.userdata_jenkins_worker_windows.rendered}"
      associate_public_ip_address = false
    
      root_block_device {
        delete_on_termination = true
        volume_size = 100
      }
    
      lifecycle {
        create_before_destroy = true
      }
    }
    
    resource "aws_autoscaling_group" "jenkins_worker_windows" {
      name                      = "dev-jenkins-worker-windows"
      min_size                  = "1"
      max_size                  = "2"
      desired_capacity          = "2"
      health_check_grace_period = 60
      health_check_type         = "EC2"
      vpc_zone_identifier       = ["${data.aws_subnet_ids.default_public.ids}"]
      launch_configuration      = "${aws_launch_configuration.jenkins_worker_windows.name}"
      termination_policies      = ["OldestLaunchConfiguration"]
      wait_for_capacity_timeout = "10m"
      default_cooldown          = 60
    
      #lifecycle {
      #  create_before_destroy = true
      #}
    
    
      ## on replacement, gives new service time to spin up before moving on to destroy
      #provisioner "local-exec" {
      #  command = "sleep 60"
      #}
    
      tags = [
        {
          key                 = "Name"
          value               = "dev_jenkins_worker_windows"
          propagate_at_launch = true
        },
        {
          key                 = "class"
          value               = "dev_jenkins_worker_windows"
          propagate_at_launch = true
        },
      ]
    }

    Finally, here is the user data referenced by the launch configuration. It downloads the required jar files, creates a node on Jenkins, and registers the machine as a slave over JNLP.

    <powershell>
    
    function Wait-For-Jenkins {
    
      Write-Host "Waiting for Jenkins to launch on 8080..."
    
      Do {
      Write-Host "Waiting for Jenkins"
    
       Nc -zv ${server_ip} 8080
       If( $? -eq $true ) {
         Break
       }
       Sleep 10
    
      } While (1)
    
      Do {
       Write-Host "Waiting for JNLP"
          
       Nc -zv ${server_ip} 33453
       If( $? -eq $true ) {
        Break
       }
       Sleep 10
    
      } While (1)      
    
      Write-Host "Jenkins launched"
    }
    
    function Slave-Setup()
    {
      # Register_slave
      $JENKINS_URL="http://${server_ip}:8080"
    
      $USERNAME="${jenkins_username}"
      
      $PASSWORD="${jenkins_password}"
    
      $AUTH = -join ("$USERNAME", ":", "$PASSWORD")
      echo $AUTH
    
      # Below IP collection logic works for Windows Server 2016 edition and needs testing for windows server 2008 edition
      $SLAVE_IP=(ipconfig | findstr /r "[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*" | findstr "IPv4 Address").substring(39) | findstr /B "172.31"
      
      $NODE_NAME="jenkins-slave-windows-$SLAVE_IP"
      
      $NODE_SLAVE_HOME="C:\Jenkins\"
      $EXECUTORS=2
      $JNLP_PORT=33453
    
      $CRED_ID="$NODE_NAME"
      $LABELS="build windows"
      
      # Creating CMD utility for jenkins-cli commands
      # This is not working in windows therefore specify full path
      $jenkins_cmd = "java -jar C:\Jenkins\jenkins-cli.jar -s $JENKINS_URL -auth admin:$PASSWORD"
    
      Sleep 20
    
      Write-Host "Downloading jenkins-cli.jar file"
      (New-Object System.Net.WebClient).DownloadFile("$JENKINS_URL/jnlpJars/jenkins-cli.jar", "C:\Jenkins\jenkins-cli.jar")
    
      Write-Host "Downloading slave.jar file"
      (New-Object System.Net.WebClient).DownloadFile("$JENKINS_URL/jnlpJars/slave.jar", "C:\Jenkins\slave.jar")
    
      Sleep 10
    
      # Waiting for Jenkins to load all plugins
      Do {
      
        $count=(java -jar C:\Jenkins\jenkins-cli.jar -s $JENKINS_URL -auth $AUTH list-plugins | Measure-Object -line).Lines
        $ret=$?
    
        Write-Host "count [$count] ret [$ret]"
    
        If ( $count -gt 0 ) {
            Break
        }
    
        sleep 30
      } While ( 1 )
    
      # For Deleting Node, used when testing
      Write-Host "Deleting Node $NODE_NAME if present"
      java -jar C:\Jenkins\jenkins-cli.jar -s $JENKINS_URL -auth $AUTH delete-node $NODE_NAME
      
      # Generating node.xml for creating node on Jenkins server
      $NodeXml = @"
    <slave>
    <name>$NODE_NAME</name>
    <description>Windows Slave</description>
    <remoteFS>$NODE_SLAVE_HOME</remoteFS>
    <numExecutors>$EXECUTORS</numExecutors>
    <mode>NORMAL</mode>
    <retentionStrategy class="hudson.slaves.RetentionStrategy`$Always`"/>
    <launcher class="hudson.slaves.JNLPLauncher">
      <workDirSettings>
        <disabled>false</disabled>
        <internalDir>remoting</internalDir>
        <failIfWorkDirIsMissing>false</failIfWorkDirIsMissing>
      </workDirSettings>
    </launcher>
    <label>$LABELS</label>
    <nodeProperties/>
    </slave>
    "@
      $NodeXml | Out-File -FilePath C:\Jenkins\node.xml 
    
      type C:\Jenkins\node.xml
    
      # Creating node using node.xml
      Write-Host "Creating $NODE_NAME"
      Get-Content -Path C:\Jenkins\node.xml | java -jar C:\Jenkins\jenkins-cli.jar -s $JENKINS_URL -auth $AUTH create-node $NODE_NAME
    
      Write-Host "Registering Node $NODE_NAME via JNLP"
      Start-Process java -ArgumentList "-jar C:\Jenkins\slave.jar -jnlpCredentials $AUTH -jnlpUrl $JENKINS_URL/computer/$NODE_NAME/slave-agent.jnlp"
    }
    
    ### script begins here ###
    
    Wait-For-Jenkins
    
    Slave-Setup
    
    echo "Done"
    </powershell>
    <persist>true</persist>

    Commands to run: terraform init to initialize Terraform, then terraform plan to check the changes and terraform apply to apply them.

    The same drawbacks apply here, and the same solutions work as well.

    Congratulations! You now have a Jenkins master with Windows and Linux slaves attached to it.

    IAM roles for reference

    Jenkins Master

    Linux Slave

    Windows Slave

    Bonus:

    If you want to grant IAM permissions to the user but cannot assign FULL ACCESS, here is a curated list for reference:

    Packer Policy

    Terraform Policy

    Conclusion:

    This blog highlights one way to use Packer and Terraform to create AMIs that serve as a Jenkins master and slaves. We covered not only their creation but also how to associate security groups, and we looked at some basic IAM roles that can be applied. We have covered the most common scenarios; depending on your use case, only a few changes should be required, and this can serve as boilerplate code when you begin planning your infrastructure on the cloud.

  • Setting Up A Robust Authentication Environment For OpenSSH Using QR Code PAM

    Do you like WhatsApp Web authentication? WhatsApp Web has always fascinated me with the simplicity of its QR-code-based authentication. Though similar authentication UIs exist, I always wondered whether a remote secure shell (SSH) login could be authenticated with a QR code with the same simplicity while keeping the auth process secure. In this guide, we will see how to write and implement a bare-bones PAM module for OpenSSH on a Linux-based system.

    “OpenSSH is the premier connectivity tool for remote login with the SSH protocol. It encrypts all traffic to eliminate eavesdropping, connection hijacking, and other attacks. In addition, OpenSSH provides a large suite of secure tunneling capabilities, several authentication methods, and sophisticated configuration options.”

    openssh.com

    Meet PAM!

    PAM, short for “Pluggable Authentication Modules,” is middleware that abstracts authentication features on Linux and UNIX-like operating systems. PAM has been around for more than two decades. Authentication can be cumbersome, with each service authenticating users through a different set of hardware and software, such as username-password, fingerprint modules, face recognition, two-factor authentication, LDAP, etc. But the underlying process remains the same: users must prove they are who they say they are. This is where PAM comes into the picture: it provides an API to the application layer, along with built-in functions for implementing and extending PAM capabilities.

    Source: Red Hat

    Understand how OpenSSH interacts with PAM

    On a Linux host, the OpenSSH daemon (sshd) hands authentication over to PAM, which reads the sshd service configuration from /etc/pam.d/sshd (or from /etc/pam.conf on systems without an /etc/pam.d directory). Each line in the file belongs to one of four realms, also called module types: auth, account, session, and password. The “auth” realm is responsible for verifying that users are who they say they are. A typical sshd PAM service file on Ubuntu is shown below; your own flavor of Linux will look similar:

    @include common-auth
    account    required     pam_nologin.so
    @include common-account
    session [success=ok ignore=ignore module_unknown=ignore default=bad]        pam_selinux.so close
    session    required     pam_loginuid.so
    session    optional     pam_keyinit.so force revoke
    @include common-session
    session    optional     pam_motd.so  motd=/run/motd.dynamic
    session    optional     pam_motd.so noupdate
    session    optional     pam_mail.so standard noenv # [1]
    session    required     pam_limits.so
    session    required     pam_env.so # [1]
    session    required     pam_env.so user_readenv=1 envfile=/etc/default/locale
    session [success=ok ignore=ignore module_unknown=ignore default=bad]        pam_selinux.so open
    @include common-password

    The common-auth file defines the “auth” realm using the pam_unix.so module, which authenticates the user with a password. Our goal is to write a PAM module that replaces pam_unix.so with our own version.

    When OpenSSH calls pam_authenticate(), libpam in turn calls the pam_sm_authenticate function of each module in the “auth” stack. An auth module must also provide pam_sm_setcred, which libpam calls when the application invokes pam_setcred(). Thus, we will implement pam_sm_authenticate, which is the entry point to our shared object library. The module should return PAM_SUCCESS (0) for successful authentication.

    Application Architecture

    The project architecture has four main applications. The backend is hosted on AWS with minimal, low-cost infrastructure resources.

    1. PAM Module: Provides QR-Code auth prompt to client SSH Login

    2. Android Mobile App: Authenticates SSH login by scanning a QR code

    3. QR Auth Server API: Backend application that our Android app connects to, sharing the authentication payload along with some other meta information

    4. WebSocket Server (API Gateway WebSocket and NodeJS) App: The PAM module and the server-side app share the auth message payload in real time

    When a user connects to the remote server via SSH, the PAM module is triggered and offers a QR code for authentication. The module exchanges information with the API Gateway WebSocket, which in turn saves temporary auth data in DynamoDB. The user then scans the QR code with an Android mobile app (written in React Native).

    Upon scanning, the app connects to the API Gateway. Each API call is first authenticated by AWS Cognito to prevent intrusion. The request is then proxied to a Lambda function, which validates the input payload against the information stored in DynamoDB. Upon successful validation, the Lambda function calls the API Gateway WebSocket to tell the PAM module to authenticate the user.

    Framework and Toolchains

    PAM modules are shared object libraries, typically written in C (other languages can be used through Python bindings for PAM, or by calling an external program via pam_exec). Below are the frameworks and tools I am using for this project:

    1. gcc, make, automake, autoreconf, libpam (GNU dev tools on Ubuntu OS)

    2. libqrencode, libwebsockets, libpam, libssl, libcrypto (C libraries)

    3. NodeJS, express (for server-side app)

    4. API Gateway and API Gateway WebSocket, AWS Lambda (AWS cloud services for hosting the serverless server-side app)

    5. Serverless framework (for easily deploying infrastructure)

    6. react-native, react-native-qrcode-scanner (for Android mobile app)

    7. AWS Cognito (for authentication)

    8. AWS Amplify Library

    This guide assumes you have a basic understanding of the Linux OS, C programming language, pointers, and gcc code compilation. For the backend APIs, I prefer to use NodeJS as a primary programming language, but you may opt for the language of your choice for designing HTTP APIs.

    Authentication with QR Code PAM Module

    When the module initializes, we first generate a random string with the help of the /dev/urandom character device. The bytes obtained from this device can include non-printable characters, so we encode them with Base64. Let’s call this string the auth verification string.

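    #include <stdio.h>
    #include <stdlib.h>
    
    /* Fill random_str with `length` raw bytes read from /dev/urandom. */
    void get_random_string(char *random_str, int length)
    {
        FILE *fp = fopen("/dev/urandom", "r");
        if (!fp) {
            perror("Unable to open urandom device");
            exit(EXIT_FAILURE);
        }
        /* fread() returns the number of complete items read: 1 on success */
        if (fread(random_str, length, 1, fp) != 1) {
            perror("Unable to read from urandom device");
            fclose(fp);
            exit(EXIT_FAILURE);
        }
        fclose(fp);
    }
    
    char random_string[11];
    char *base64_string;
    
    // get 10 random bytes
    get_random_string(random_string, 10);
    // convert them to Base64, because bytes from /dev/urandom may contain binary (non-printable) chars;
    // Base64encode_len() and Base64encode() come from a Base64 helper not shown here
    const int encoded_length = Base64encode_len(10);
    base64_string = (char *)malloc(encoded_length + 1);
    Base64encode(base64_string, random_string, 10);
    base64_string[encoded_length] = '\0';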
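    The snippet relies on Base64encode_len() and Base64encode(), which come from a Base64 helper the article does not show. If you need a stand-in, a minimal self-contained encoder for the standard RFC 4648 alphabet with = padding is sketched below; b64_encoded_size and b64_encode are hypothetical names, and this is not necessarily the helper the author used:

```c
#include <stddef.h>

static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Buffer size needed for the Base64 encoding of len bytes, including the
   terminating '\0': 4 output chars per (possibly partial) 3-byte group. */
size_t b64_encoded_size(size_t len)
{
    return ((len + 2) / 3) * 4 + 1;
}

/* Encode len bytes from in into out (standard alphabet, '=' padding).
   out must hold at least b64_encoded_size(len) bytes.
   Returns the length of the encoded string, excluding the '\0'. */
size_t b64_encode(char *out, const unsigned char *in, size_t len)
{
    size_t o = 0, i = 0;
    for (; i + 2 < len; i += 3) {            /* full 3-byte groups */
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        out[o++] = B64_ALPHABET[(v >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 12) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 6) & 0x3F];
        out[o++] = B64_ALPHABET[v & 0x3F];
    }
    if (i < len) {                           /* 1 or 2 trailing bytes */
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = B64_ALPHABET[(v >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? B64_ALPHABET[(v >> 6) & 0x3F] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}
```

    Ten random bytes always encode to 16 characters ending in ==, which matches the shape of the example verification string UX6t4PcS5doEeA== used later in this guide.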

    We then open a WebSocket connection to our API Gateway WebSocket endpoint using the libwebsockets library. Once the connection is established, we notify the server that a user may try to authenticate with the auth verification string. The API Gateway WebSocket returns a unique connection ID to our PAM module.

    static void connect_client(struct lws_sorted_usec_list *sul)
    {
       struct vhd_minimal_client_echo *vhd =
           lws_container_of(sul, struct vhd_minimal_client_echo, sul);
       struct lws_client_connect_info i;
       char host[128];
       lws_snprintf(host, sizeof(host), "%s:%u", *vhd->ads, *vhd->port);
       memset(&i, 0, sizeof(i));
       i.context = vhd->context;
       i.port = *vhd->port;
       i.address = *vhd->ads;
       i.path = *vhd->url;
       i.host = host;
       i.origin = host;
       /* Use TLS with normal certificate validation: API Gateway presents a
          publicly trusted certificate, so LCCSCF_ALLOW_SELFSIGNED and
          LCCSCF_SKIP_SERVER_CERT_HOSTNAME_CHECK are unnecessary and would
          weaken an authentication channel */
       i.ssl_connection = LCCSCF_USE_SSL | LCCSCF_PIPELINE;
       i.vhost = vhd->vhost;
       i.iface = *vhd->iface;
       i.pwsi = &vhd->client_wsi;
       log_message(LOG_INFO, ws_applogic.pamh, "About to create connection %s", host);
       /* If the connection attempt fails right away, retry in 10 seconds */
       if (!lws_client_connect_via_info(&i))
           lws_sul_schedule(vhd->context, 0, &vhd->sul,
                    connect_client, 10 * LWS_US_PER_SEC);
    }

    Upon receiving the connection ID from the server, the PAM module computes the SHA1 hash of the connection ID as a hex string and composes a unique string for generating the QR code. This string consists of three parts separated by colons (:):

    "qrauth:BASE64(AUTH_VERIFY_STRING):SHA1(CONNECTION_ID)". For example, suppose the random Base64-encoded string is "UX6t4PcS5doEeA==" and the connection ID is "KZlfidYvBcwCFFw=".

    Then the final string is "qrauth:UX6t4PcS5doEeA==:2fc58b0cc3b13c3f2db49a5b4660ad47c873b81a".

    This string is then rendered as a UTF-8 text QR code with the help of the libqrencode library, and the PAM module displays the authentication prompt.

    char *con_id = strstr(msg, ws_com_strings[READ_WS_CONNECTION_ID]);
    int length = strlen(ws_com_strings[READ_WS_CONNECTION_ID]);

    if (!con_id) {
        pam_login_status = PAM_AUTH_ERR;
        interrupted = 1;
        return;
    }
    /* Skip past the connection-ID prefix to the ID itself */
    con_id += length;
    log_message(LOG_DEBUG, ws_applogic.pamh, "strstr is %s", con_id);
    /* Hash the connection ID (SHA1, hex-encoded) and build the QR payload */
    string_crypt(ws_applogic.sha_code_hex, con_id);
    sprintf(temp_text, "qrauth:%s:%s", ws_applogic.authkey, ws_applogic.sha_code_hex);
    char *qr_encoded_text = get_qrcode_string(temp_text);
    ws_applogic.qr_encoded_text = qr_encoded_text;
    conv_info(ws_applogic.pamh, "\nSSH Auth via QR Code\n\n");
    conv_info(ws_applogic.pamh, ws_applogic.qr_encoded_text);
    log_message(LOG_INFO, ws_applogic.pamh, "Use Mobile App to Scan \n %s", ws_applogic.qr_encoded_text);
    log_message(LOG_INFO, ws_applogic.pamh, "%s", temp_text);
    /* Tell the server to expect an authentication for this auth key and user */
    ws_applogic.current_action = READ_WS_AUTH_VERIFIED;
    sprintf(temp_text, ws_com_strings[SEND_WS_EXPECT_AUTH], ws_applogic.authkey, ws_applogic.username);
    websocket_write_back(wsi, temp_text, -1);
    /* OpenSSH only shows the QR code once a prompt needs input (see Conclusion) */
    conv_read(ws_applogic.pamh, "\n\nUse Mobile SSH QR Auth App to Authenticate SSH Login and Press Enter\n\n", PAM_PROMPT_ECHO_ON);
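    To make the payload format concrete, here is a minimal TypeScript sketch of the same composition using Node's crypto module. It assumes the connection ID is hashed as UTF-8 text and the digest is written as lowercase hex, which matches the 40-character example above; the C helper string_crypt is not shown in this post, so treat that as an assumption. sha1Hex and buildQrPayload are hypothetical names.

```typescript
import { createHash } from "crypto";

// SHA1 digest of a string as 40 lowercase hex characters
function sha1Hex(text: string): string {
  return createHash("sha1").update(text, "utf8").digest("hex");
}

// Mirrors sprintf(temp_text, "qrauth:%s:%s", authkey, sha_code_hex) in the PAM module
function buildQrPayload(authKey: string, connectionId: string): string {
  return `qrauth:${authKey}:${sha1Hex(connectionId)}`;
}

// "abc" is the standard SHA1 test vector
console.log(buildQrPayload("UX6t4PcS5doEeA==", "abc"));
// qrauth:UX6t4PcS5doEeA==:a9993e364706816aba3e25717850c26c9cd0d89d
```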

    API Gateway WebSocket App

    We used the Serverless Framework to create and deploy our infrastructure resources easily. With the serverless CLI, we start from the aws-nodejs template (serverless create --template aws-nodejs). You can find a detailed guide on Serverless, API Gateway WebSocket, and DynamoDB here. Below is the template YAML definition. Note that the DynamoDB table has TTL enabled on the expires_at attribute, which holds a UNIX epoch timestamp in seconds.

    This means that each record we store is deleted automatically after its expiry time. We keep each record for only 5 minutes, so the user must authenticate within 5 minutes of the authentication request to the remote SSH server. Keep in mind that DynamoDB removes expired items in the background rather than at the exact expiry time, so code that reads a record should still check expires_at itself.

    service: ssh-qrapp-websocket
    frameworkVersion: '2'
    useDotenv: true
    provider:
      name: aws
      runtime: nodejs12.x
      lambdaHashingVersion: 20201221
      websocketsApiName: ssh-qrapp-websocket
      websocketsApiRouteSelectionExpression: $request.body.action
      region: ap-south-1
      iam:
        role:
          statements:
            - Effect: Allow
              Action:
                - "dynamodb:Query"
                - "dynamodb:GetItem"
                - "dynamodb:PutItem"
              Resource:
                - Fn::GetAtt: [ SSHAuthDB, Arn ]
      environment:
        REGION: ${env:REGION}
        # same value as the table's TableName below
        DYNAMODB_TABLE: ${env:DYNAMODB_TABLE}
        WEBSOCKET_ENDPOINT: ${env:WEBSOCKET_ENDPOINT}
        NODE_ENV: ${env:NODE_ENV}
    package:
      patterns:
        - '!node_modules/**'
        - handler.js
        - '!package.json'
        - '!package-lock.json'
    plugins:
      - serverless-dotenv-plugin
    layers:
      sshQRAPPLibs:
        path: layer
        compatibleRuntimes:
          - nodejs12.x
    functions:
      connectionHandler:
        handler: handler.connectHandler
        timeout: 60
        memorySize: 256
        layers:
          - {Ref: SshQRAPPLibsLambdaLayer}
        events:
          - websocket:
              route: $connect
              routeResponseSelectionExpression: $default
      disconnectHandler:
        handler: handler.disconnectHandler
        memorySize: 256
        timeout: 60
        layers:
          - {Ref: SshQRAPPLibsLambdaLayer}
        events:
          - websocket: $disconnect
      defaultHandler:
        handler: handler.defaultHandler
        memorySize: 256
        timeout: 60
        layers:
          - {Ref: SshQRAPPLibsLambdaLayer}
        events:
          - websocket: $default
      customQueryHandler:
        handler: handler.queryHandler
        memorySize: 256
        timeout: 60
        layers:
          - {Ref: SshQRAPPLibsLambdaLayer}
        events:
          - websocket:
              route: expectauth
              routeResponseSelectionExpression: $default
          - websocket:
              route: getconid
              routeResponseSelectionExpression: $default
          - websocket:
              route: verifyauth
              routeResponseSelectionExpression: $default
    # resources is a top-level key, not a function
    resources:
      Resources:
        SSHAuthDB:
          Type: AWS::DynamoDB::Table
          Properties:
            TableName: ${env:DYNAMODB_TABLE}
            AttributeDefinitions:
              - AttributeName: authkey
                AttributeType: S
            KeySchema:
              - AttributeName: authkey
                KeyType: HASH
            TimeToLiveSpecification:
              AttributeName: expires_at
              Enabled: true
            ProvisionedThroughput:
              ReadCapacityUnits: 2
              WriteCapacityUnits: 2
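    The TTL attribute holds whole seconds since the UNIX epoch. Because DynamoDB deletes expired items in a background process rather than at the exact expiry time, a read can still return a record whose expires_at has passed. The following minimal TypeScript sketch (hypothetical helper names) computes the 5-minute expiry value and shows the explicit check a reader can apply before trusting a record.

```typescript
// Records stay valid for 5 minutes
const AUTH_WINDOW_SECONDS = 300;

// Current UNIX time in whole seconds, the unit DynamoDB TTL expects
function nowEpochSeconds(): number {
  return Math.floor(Date.now() / 1000);
}

// Value to store in the expires_at TTL attribute when a record is created
function expiresAt(createdAtSeconds = nowEpochSeconds()): number {
  return createdAtSeconds + AUTH_WINDOW_SECONDS;
}

// Expired as soon as expires_at is reached, whether or not DynamoDB has deleted it yet
function isExpired(record: { expires_at: number }, now = nowEpochSeconds()): boolean {
  return record.expires_at <= now;
}

const createdAt = 1700000000;
const record = { expires_at: expiresAt(createdAt) };
console.log(isExpired(record, createdAt + 299)); // false
console.log(isExpired(record, createdAt + 300)); // true
```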

    The WebSocket API has three custom routes. API Gateway selects the route from the action field of each incoming JSON message (the route selection expression $request.body.action), and the Lambda function receives the raw message as a JSON string in event.body. These custom routes are:

    • The "expectauth" event is sent by the PAM module to inform the WebSocket server that a client has asked for authentication and that the mobile application may try to authenticate by scanning the QR code. During this event, the WebSocket handler stores the connection ID along with the auth verification string, which acts as the primary key of our DynamoDB table.
    • The "getconid" event is sent to retrieve the current connection ID, so that the PAM module can compute its SHA1 sum and show the QR code prompt.
    • The "verifyauth" event is sent by the PAM module to confirm and verify authentication. With this event, the PAM module also sends a random challenge text, which the server echoes back on success. The WebSocket server retrieves the record from DynamoDB using the auth verification string as the primary key and checks whether "authVerified" is set to "true" (more on this later).

    const { DynamoDB } = require("aws-sdk"); // AWS SDK v2, bundled with the nodejs12.x runtime

    module.exports.queryHandler = async (event, context) => {
      // API Gateway delivers the WebSocket message as a JSON string
      const payload = JSON.parse(event.body);
      const documentClient = new DynamoDB.DocumentClient({
        region: process.env.REGION
      });
      const now = Math.floor(Date.now() / 1000); // UNIX epoch seconds
      try {
        switch (payload.action) {
          case "expectauth": {
            // Store the auth verification string (primary key) with this
            // connection ID; the record expires 300 seconds (5 minutes) from now
            await documentClient.put({
              TableName: process.env.DYNAMODB_TABLE,
              Item: {
                authkey: payload.authkey,
                connectionId: event.requestContext.connectionId,
                username: payload.username,
                expires_at: now + 300,
                authVerified: false
              }
            }).promise();
            return { statusCode: 200, body: "OK" };
          }
          case "getconid":
            // Return the caller's connection ID so the PAM module can hash it
            return {
              statusCode: 200,
              body: `connectionid:${event.requestContext.connectionId}`
            };
          case "verifyauth": {
            const data = await documentClient.get({
              TableName: process.env.DYNAMODB_TABLE,
              Key: { authkey: payload.authkey }
            }).promise();
            if (!data.Item) {
              throw new Error("Failed to query data");
            }
            // TTL deletion is not immediate, so enforce the expiry here too
            if (data.Item.authVerified === true && data.Item.expires_at > now) {
              return {
                statusCode: 200,
                body: `authverified:${payload.challengeText}`
              };
            }
            throw new Error("auth verification failed");
          }
        }
      } catch (error) {
        console.log(error);
      }
      return { statusCode: 200, body: "ok" };
    };
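    On the other end of the socket, the PAM module has to interpret these plain-text bodies (connectionid:<id> and authverified:<challenge>). The real module works in C; the following TypeScript sketch only illustrates the logic, and parseReply, isAuthVerified, and the ServerReply type are hypothetical names. The key point is that a verifyauth reply counts only if it echoes the exact challenge text the client sent, so an "ok" body or a reply carrying someone else's challenge never authenticates anyone.

```typescript
// Replies produced by the WebSocket handler above
type ServerReply =
  | { kind: "connectionId"; connectionId: string }
  | { kind: "authVerified"; challenge: string }
  | { kind: "other"; body: string };

// Split at the first colon only, so the value after the prefix stays intact
function parseReply(body: string): ServerReply {
  const sep = body.indexOf(":");
  if (sep === -1) return { kind: "other", body };
  const prefix = body.slice(0, sep);
  const value = body.slice(sep + 1);
  if (prefix === "connectionid") return { kind: "connectionId", connectionId: value };
  if (prefix === "authverified") return { kind: "authVerified", challenge: value };
  return { kind: "other", body };
}

// Authentication succeeds only if the server echoed our own challenge text
function isAuthVerified(body: string, sentChallenge: string): boolean {
  const reply = parseReply(body);
  return reply.kind === "authVerified" && reply.challenge === sentChallenge;
}

console.log(isAuthVerified("authverified:x7Qp", "x7Qp")); // true
console.log(isAuthVerified("ok", "x7Qp")); // false
```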

    Android App: SSH QR Code Auth

     

    The Android app consists of two parts: app login, and scanning the QR code for authentication. The AWS Cognito and Amplify libraries make a secure login easy: just wrap your react-native app with the "withAuthenticator" component, and you get a ready-to-use login screen. We then use the react-native-qrcode-scanner component to scan the QR code.

    On a successful scan, this component returns the decoded string. The application logic then splits the string and checks whether it is valid. If it is a valid application string, the app calls our QR Auth API with the appropriate payload.

    render(){
       return (
         <View style={styles.container}>
           {this.state.authQRCode ?
           <AuthQRCode
            hideAuthQRCode = {this.hideAuthQRCode}
            qrScanData = {this.qrScanData}
           />
           :
           <View style={{marginVertical: 10}}>
           <Button title="Auth SSH Login" onPress={this.showAuthQRCode} />
           <View style={{margin:10}} />
           <Button title="Sign Out" onPress={this.signout} />
           </View>
          
           }
         </View>
       );
     }

    qrScanData = async (e) => {
      try {
        // Expected format: qrauth:<auth verification string>:<SHA1 of connection ID>
        const scanCode = e.data.split(':');
        if (scanCode.length !== 3) {
          throw new Error("invalid qr code");
        }
        const [appstring, authcode, shacode] = scanCode;
        if (appstring !== "qrauth") {
          throw new Error("Not a valid app qr code");
        }
        // Send the Cognito ID token so the API Gateway JWT authorizer accepts the call
        const authsession = await Auth.currentSession();
        const jwtToken = authsession.getIdToken().jwtToken;
        const response = await axios({
          url: "https://API_GATEWAY_URL/v1/app/sshqrauth/qrauth",
          method: "post",
          headers: {
            Authorization: jwtToken,
            'Content-Type': 'application/json'
          },
          responseType: "json",
          data: {
            authcode,
            shacode
          }
        });
        if (response.data.status === 200) {
          rescanQRCode = false;
          setTimeout(this.hideAuthQRCode, 1000);
        }
      } catch (error) {
        console.log(error);
      }
    };
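    Because neither Base64 nor hex digits contain a colon, a valid payload always has exactly three colon-separated fields, so the scanned data can also be validated field by field before calling the API. Here is a minimal TypeScript sketch; parseQrCode and QrAuthPayload are hypothetical names, and the regular expressions simply encode the payload format described earlier (a Base64 auth string and a 40-character SHA1 hex digest).

```typescript
// Fields the app posts to the QR Auth API
interface QrAuthPayload {
  authcode: string; // Base64 auth verification string
  shacode: string; // SHA1 hex digest of the WebSocket connection ID
}

const BASE64 = /^[A-Za-z0-9+/]+={0,2}$/;
const SHA1_HEX = /^[0-9a-f]{40}$/i;

// Hypothetical stricter parser for the scanned "qrauth:<authcode>:<shacode>" string
function parseQrCode(data: string): QrAuthPayload {
  const parts = data.split(":");
  if (parts.length !== 3) {
    throw new Error("invalid qr code");
  }
  const [appstring, authcode, shacode] = parts;
  if (appstring !== "qrauth" || !BASE64.test(authcode) || !SHA1_HEX.test(shacode)) {
    throw new Error("Not a valid app qr code");
  }
  return { authcode, shacode };
}

console.log(parseQrCode("qrauth:UX6t4PcS5doEeA==:2fc58b0cc3b13c3f2db49a5b4660ad47c873b81a").authcode);
// UX6t4PcS5doEeA==
```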

    This guide does not cover how to deploy react-native Android applications. You may refer to the official react-native guide to deploy your application to the Android mobile device.

    QR Auth API

    The QR Auth API is also built with the Serverless Framework and the aws-nodejs template. It uses an API Gateway HTTP API, with AWS Cognito (through a JWT authorizer) authorizing incoming requests. The serverless YAML definition is shown below.

    service: ssh-qrauth-server
    frameworkVersion: '2 || 3'
    useDotenv: true
    provider:
     name: aws
     runtime: nodejs12.x
     lambdaHashingVersion: 20201221
     deploymentBucket:
       name: ${env:DEPLOYMENT_BUCKET_NAME}
     httpApi:
       authorizers:
         cognitoJWTAuth:
           identitySource: $request.header.Authorization
           issuerUrl: ${env:COGNITO_ISSUER}
           audience:
             - ${env:COGNITO_AUDIENCE}
     region: ap-south-1
     iam:
       role:
         statements:
         - Effect: "Allow"
           Action:
             - "dynamodb:Query"
             - "dynamodb:PutItem"
             - "dynamodb:GetItem"
           Resource:
             - ${env:DYNAMO_DB_ARN}
         - Effect: "Allow"
           Action:
             - "execute-api:Invoke"
             - "execute-api:ManageConnections"
           Resource:
             - ${env:API_GATEWAY_WEBSOCKET_API_ARN}/*
     environment:
       REGION: ${env:REGION}
       COGNITO_ISSUER: ${env:COGNITO_ISSUER}
       DYNAMODB_TABLE: ${env:DYNAMODB_TABLE}
       COGNITO_AUDIENCE: ${env:COGNITO_AUDIENCE}
       POOLID: ${env:POOLID}
       COGNITOIDP: ${env:COGNITOIDP}
       WEBSOCKET_ENDPOINT: ${env:WEBSOCKET_ENDPOINT}
    package:
     patterns:
       - '!node_modules/**'
       - handler.js
       - '!package.json'
       - '!package-lock.json'
       - '!.env'
       - '!test.http'
    plugins:
     - serverless-deployment-bucket
     - serverless-dotenv-plugin
    layers:
     qrauthLibs:
       path: layer
       compatibleRuntimes:
         - nodejs12.x
    functions:
     sshauthqrcode:
       handler: handler.authqrcode
       memorySize: 256
       timeout: 30
       layers:
         - {Ref: QrauthLibsLambdaLayer}
       events:
         - httpApi:
             path: /v1/app/sshqrauth/qrauth
             method: post
             authorizer:
               name: cognitoJWTAuth

    Once API Gateway has authorized an incoming request, control is handed over to the serverless-express router. At this stage, we check the payload for the auth verification string that the Android app scanned. This string must exist in the DynamoDB table. After retrieving the record keyed by the auth verification string, we read its connection ID and compute its SHA1 hash. If this hash matches the hash in the request payload, we set the record's "authVerified" attribute to "true" and notify the PAM module through the API Gateway WebSocket API. The PAM module then completes the validation with the challenge-response text.
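    The matching step can be sketched in TypeScript as follows. The record shape mirrors the fields the WebSocket handler stores; scanMatchesRecord is a hypothetical name, and the expiry check and constant-time comparison are hardening suggestions rather than something the original API is known to do.

```typescript
import { createHash, timingSafeEqual } from "crypto";

// Shape of the record stored by the WebSocket handler
interface AuthRecord {
  authkey: string;
  connectionId: string;
  expires_at: number; // UNIX epoch seconds
  authVerified: boolean;
}

// Hypothetical check run before setting authVerified to true
function scanMatchesRecord(record: AuthRecord, shacode: string, nowSeconds: number): boolean {
  if (record.expires_at <= nowSeconds) {
    return false; // expired, even if DynamoDB TTL has not removed it yet
  }
  const expected = createHash("sha1").update(record.connectionId, "utf8").digest();
  // Invalid hex yields a shorter buffer, which fails the length check below
  const provided = Buffer.from(shacode, "hex");
  // timingSafeEqual requires equal lengths; it compares in constant time
  return provided.length === expected.length && timingSafeEqual(provided, expected);
}
```

    Comparing raw digest bytes rather than strings avoids case mismatches in the hex encoding and leaks no timing information about how many leading characters matched.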

    The entire authentication flow is depicted in a flow diagram, and the architecture is depicted in the cover post of this blog.

     

    Compiling and Installing PAM module

    Unlike ordinary C programs, PAM modules are shared libraries. The dynamic loader can map a shared library at an arbitrary address in memory, so the module must be compiled as position-independent code. With gcc, we pass the -fPIC option while compiling, and then use the -shared flag while linking to generate the shared object binary.

    gcc -I"$PWD" -fPIC -c *.c
    gcc -shared -o pam_qrapp_auth.so *.o -lpam -lqrencode -lssl -lcrypto -lpthread -lwebsockets

    To simplify compiling and checking for the required libraries, I prefer to use autoconf. The entire project, along with the autoconf scripts, is available in my GitHub repository.

    Once the shared object file (pam_qrapp_auth.so) is generated, copy it to the PAM module directory, "/usr/lib64/security/" on many distributions (Debian and Ubuntu typically use "/lib/x86_64-linux-gnu/security/" instead), and run the ldconfig command to refresh the shared library cache. Next, in /etc/pam.d/sshd, remove the "@include common-auth" line (if present) and any other "auth" line that pulls in the pam_unix.so module, whether directly or through an included file. The pam_unix.so module enforces password authentication. We then add our module to the auth stack ("auth required pam_qrapp_auth.so"). Depending on your Linux flavor, your /etc/pam.d/sshd file may look similar to the following:

    auth       required     pam_qrapp_auth.so
    account    required     pam_nologin.so
    @include common-account
    session [success=ok ignore=ignore module_unknown=ignore default=bad]        pam_selinux.so close
    session    required     pam_loginuid.so
    session    optional     pam_keyinit.so force revoke
    @include common-session
    session    optional     pam_motd.so  motd=/run/motd.dynamic
    session    optional     pam_motd.so noupdate
    session    optional     pam_mail.so standard noenv # [1]
    session    required     pam_limits.so
    session    required     pam_env.so # [1]
    session    required     pam_env.so user_readenv=1 envfile=/etc/default/locale
    session [success=ok ignore=ignore module_unknown=ignore default=bad]        pam_selinux.so open
    @include common-password

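    Finally, we need to allow challenge-response (keyboard-interactive) authentication in the sshd configuration. Open /etc/ssh/sshd_config and set "ChallengeResponseAuthentication yes" if the option is missing, commented out, or set to "no". On newer OpenSSH releases, this option is a deprecated alias for KbdInteractiveAuthentication, so set "KbdInteractiveAuthentication yes" there instead. Also make sure "UsePAM yes" is set, since sshd only runs the PAM stack when UsePAM is enabled. Reload the sshd service with "systemctl reload sshd". Voilà, we are done!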

    Conclusion

    This guide is a barebones tutorial and is not meant for production use. The PAM module has some known gaps. For example, it should prompt the user to change an expired password, deny login for locked accounts, and handle similar security checks. Also, the Android app should be bound to the SSH username, so that only the AWS Cognito user associated with that SSH username can authenticate.

    One known limitation of this PAM module is that the user always has to press Enter after scanning the QR code with the Android app. This is due to how OpenSSH is implemented: the OpenSSH server holds back informational text until a prompt that requires user input is sent. In our case, the informational text is the UTF-8 QR code itself.

    However, our module needs no such input from the interactive device, because the authentication event reaches the PAM module through the WebSocket. If we did not ask the user to press Enter after scanning, the QR code would never be displayed, so the input here is a dummy. This is a known OpenSSH issue with PAM_TEXT_INFO messages. Find more about the issue here.

    References

    Pluggable authentication module

    An introduction to Pluggable Authentication Modules (PAM) in Linux

    Custom PAM for SSHD in C

    google-authenticator-libpam

    PAM_TEXT_INFO and PAM_ERROR_MSG conversation not honoured during PAM authentication

  • Set Up A Production-ready REST API Server Using TypeScript, Express And PostgreSQL

    Introduction

    So, you have a brilliant idea for a web application. It’s going to be the next big thing, and you are super-excited about it. Maybe you have already started building the perfect React/Angular UI for your app.

    Eventually, you realize that, like most web apps, your app is going to be data-intensive and will need a lightning-fast web server. Node.js is a popular choice for web servers, not least because it unifies front-end and back-end web development under JavaScript, so you go for it.

    But you want your server to be robust and reliable too. A colleague introduces you to TypeScript, the superset of JavaScript developed by Microsoft, and recommends it for its strict static typing and compilation.

    Now comes storing the data. Naturally, you select PostgreSQL. After all, it bills itself as the world's most advanced open-source relational database management system (RDBMS), with its object-relational features and extensibility. But querying an RDBMS for frequently used data can be slow, so you decide to add Redis, an in-memory data store, as a cache to decrease data access latency and take load off your relational data store.

    That’s it. You have a perfect server waiting to be built. And while the initial process of getting it up and running can get arduous, you have come to the right place. This blog is going to guide you through the initial setup process.

    Prerequisites

    I am assuming you are working as a non-root user with sudo privileges on Ubuntu 16.04. Before we start, please make sure you have the following:

    1. NPM (~v6.9.0) and Node.js (~v10.16.0) – see How to Install Node.js on Ubuntu 16.04
    2. Redis – How to install Redis on Ubuntu 16.04
    3. PostgreSQL – How to install PostgreSQL on Ubuntu 16.04

    Of course, macOS or Windows would work fine for this tutorial too; just find the appropriate installation guides for them before moving forward.

    If you don’t want to go through the steps below, you can check out my GitHub Repo typescript-express-server and use it as your application skeleton. It has been set up with default configurations, which you can change later. Nevertheless, I strongly recommend going through this guide to further your understanding of the project files and configuration nuances.

    Initializing Server (Express with TypeScript)

    Setting up an Express Application with TypeScript can be done in three steps: 

    Initialize project using NPM

    Create a folder and run:

    npm init

    This will ask you a couple of project-specific questions, like name and version, and will create a package.json file, which may look like this:

    {
     "name": "my-typescript-express-server",
     "version": "0.0.0",
     "scripts": {
       "start": "node ./dist/index.js --env=production",
       "start:dev": "ts-node -r tsconfig-paths/register ./src"
     },
     "dependencies": {
       "cookie-parser": "^1.4.5",
       "dotenv": "^8.2.0"
     },
     "devDependencies": {
       "find": "^0.3.0",
       "fs-extra": "^9.0.1"
     }
    }

    This manifest file will contain all the metadata of your project, like module dependencies, configs, and scripts. For more information, check out this very good read about the basics of package.json

    Setting up TypeScript Configuration (tsconfig.json)

    This file lives in the root of a TypeScript project. During development, ts-node lets us run the .ts files directly. In production, however, Node.js only understands JavaScript, so all the TS files need to be transpiled to JS first. Some of the key options are include (the files to compile), exclude (the files to skip), and compiler options such as outDir (where the transpiled JS is written) and moduleResolution.

    First, we need to install some TypeScript-specific modules:

    npm i typescript ts-node tsconfig-paths

    This is the tsconfig.json file with some default configurations:

    CODE: https://gist.github.com/velotiotech/95b021f1728a9b8a61d9fca89b0b9e59.js

    For a detailed reference, check out the tsconfig.json documentation.

    Setting up ESLint

    It is not mandatory to use this JavaScript linter, but it’s highly recommended for enforcing code standards and keeping code clean. TypeScript projects once used TSLint, but it has been deprecated in favor of ESLint.

    Run this command:

    npm install --save-dev eslint @typescript-eslint/parser @typescript-eslint/eslint-plugin

    Create a .eslintrc file in the project root and use the following starter configuration:

    {
     "root": true,
     "parser": "@typescript-eslint/parser",
     "plugins": [
       "@typescript-eslint"
     ],
     "extends": [
       "eslint:recommended",
       "plugin:@typescript-eslint/eslint-recommended",
       "plugin:@typescript-eslint/recommended"
     ]
    }

    Lastly, add a lint script to the scripts section of package.json:

      "scripts": {
        "start": "node ./dist/index.js --env=production",
        "start:dev": "ts-node -r tsconfig-paths/register ./src",
        "lint": "eslint . --ext .ts"
       },

    Now, you can run the command below to lint your codebase for lint errors:

    npm run lint

    ESLint has ample rules for enforcing standards in your code. Please look them up at ESLint with TypeScript.

    Express App

    Finally, we need to install Express, along with http-status-codes, which the code below uses for its status constants:

    npm install --save express @types/express http-status-codes

    You need a server file (src/Server.ts), which you can create like this:

    import cookieParser from 'cookie-parser';
    import express from 'express';
    import { BAD_REQUEST } from 'http-status-codes';
    import BaseRouter from './routes';
    const app = express();
    app.use(express.json());
    app.use(express.urlencoded({extended: true}));
    app.use(cookieParser());
    // Add APIs
    app.use('/api', BaseRouter);
    // Export express instance
    export default app;

    You will also need src/index.ts, which will be the entry point for your application:

    import app from './Server';
    import logger from './shared/Logger'; // set up in the Configuring Logging section
    // Start the server
    const port = Number(process.env.PORT || 3000);
    app.listen(port, () => {
       logger.info('Express server started on port: ' + port);
    });

    Error Handling

    Many Express servers are configured to swallow all errors by registering an uncaughtException handler, which, in my opinion, is bad news. It is better to let the application crash and have a process manager restart it. Uncaught Exceptions in Node.js is a good read on this topic.

    Nonetheless, we are going to configure an Express error handler that logs errors and sends a 400 Bad Request response when an invalid HTTP request comes your API's way.

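    In src/Server.ts, add this after the routes (Express only passes an error to handlers registered after the middleware that raised it):

    // Add these imports at the top of the file
    import { NextFunction, Request, Response } from 'express';
    import logger from './shared/Logger';
    // Print API errors
    // Express recognizes error handlers by their four arguments, so keep `next`
    // eslint-disable-next-line @typescript-eslint/no-unused-vars
    app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
       logger.error(err.message, err);
       return res.status(BAD_REQUEST).json({
           error: err.message,
       });
    });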
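    As written, this handler answers every error with 400 Bad Request, including failures that are not the client's fault, such as a database outage. One common refinement, sketched below in TypeScript, is to attach a status code to the errors you raise for invalid input and fall back to 500 for everything else. HttpError and statusFor are hypothetical names, not part of Express.

```typescript
// Hypothetical error type for failures caused by the client's request
class HttpError extends Error {
  readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpError";
    this.status = status;
    // Keep `instanceof HttpError` working when compiling to ES5
    Object.setPrototypeOf(this, HttpError.prototype);
  }
}

// Status for the error response: the error's own status if it has one, otherwise 500
function statusFor(err: Error): number {
  return err instanceof HttpError ? err.status : 500;
}

console.log(statusFor(new HttpError(400, "age must be a number"))); // 400
console.log(statusFor(new Error("connection refused"))); // 500
```

    A route that rejects bad input would then call next(new HttpError(400, "...")), and the error handler would respond with res.status(statusFor(err)) instead of a fixed status.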

    Kudos! You have a basic Express server set up. Fire it up by running:

    npm run start:dev

    Connecting with the Database Store using TypeORM

    We have a basic server ready to go, but we still need to connect it to our Postgres database using an ORM. TypeORM is a versatile ORM that, unlike most other JavaScript ORMs, supports both the Active Record and Data Mapper patterns. Install it along with the Postgres driver (pg) and reflect-metadata:

    npm i --save typeorm pg reflect-metadata

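    Create an ormconfig.json file in your project root with the following configuration. Note that "synchronize": true makes TypeORM update the database schema to match your entities on every start; that is convenient during development, but you should turn it off in production and use migrations instead: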

    {
       "synchronize": true,
       "logging": false,
       "entities": [
          "src/entities/**/*.ts"
       ],
       "cli": {
          "entitiesDir": "src/entities",
          "migrationsDir": "src/migration",
          "subscribersDir": "src/subscriber"
        },
       "migrations": [
          "src/migration/**/*.ts"
       ],
       "subscribers": [
          "src/subscriber/**/*.ts"
        ]
    }

    Create a src/db.ts file that will initialize the database connection:

    import "reflect-metadata";
    import {createConnection} from "typeorm";
    import { Tedis } from "tedis";
    import logger from '../src/shared/Logger';
    export async function intializeDB(): Promise<void> {
      await createConnection();
    }

    TypeORM Entities are classes that represent the data models in our application. We are going to build a User Entity (which application doesn’t have a user, duh!) like this in src/entities/User.ts:

    import {Entity, PrimaryGeneratedColumn, Column} from "typeorm";
    @Entity()
    export class User {
      // `!` tells strict TypeScript that TypeORM assigns these fields
      @PrimaryGeneratedColumn()
      id!: number;
      @Column()
      firstName!: string;
      @Column()
      lastName!: string;
      @Column()
      age!: number;
    }

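    Then, add these lines to src/index.ts:

    import { intializeDB } from './db';
    // Exit if the database is unreachable instead of running without it
    intializeDB().catch((err) => {
      console.error('Database initialization failed', err);
      process.exit(1);
    });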

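    You will need environment variables such as TYPEORM_CONNECTION, TYPEORM_HOST, and TYPEORM_USERNAME set to your Postgres database's connection parameters. Please check TypeORM's documentation for more details, in particular how your TypeORM version chooses between environment variables and ormconfig.json when both are present.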

    Connecting Redis

    We will use Tedis, a Redis client written in TypeScript, in our server:

    npm i tedis

    Add these lines to src/db.ts:

    // Create a Redis client; returning Tedis (not unknown) keeps its methods typed
    export function initializeCache(port: number | undefined): Tedis {
     const tedis = new Tedis({
       port: port,
       host: "127.0.0.1"
     });
     return tedis;
    }

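    And these lines to src/index.ts:

    import { initializeCache } from './db';
    const redisPORT = Number(process.env.REDIS_PORT || 6379);
    // Keep the client so it can be passed to the code that needs the cache
    const cache = initializeCache(redisPORT);

    Now, your application code can use the Redis cache through the client returned by initializeCache.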
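    The caching rationale from the introduction (serve frequently read data from Redis and fall back to PostgreSQL) is usually implemented as the cache-aside pattern. Here is a minimal, library-agnostic TypeScript sketch; KeyValueCache, getOrLoad, and MemoryCache are hypothetical names, and wiring it to Tedis would need a small adapter around the client's own methods (check the Tedis documentation for their exact signatures).

```typescript
// Minimal async key-value interface that a Redis client could be adapted to
interface KeyValueCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Cache-aside read: try the cache first; on a miss, load from the database
// and store the result with an expiry so stale data ages out
async function getOrLoad<T>(
  cache: KeyValueCache,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T>,
): Promise<T> {
  const cached = await cache.get(key);
  if (cached !== null) {
    return JSON.parse(cached) as T;
  }
  const value = await load();
  await cache.set(key, JSON.stringify(value), ttlSeconds);
  return value;
}

// In-memory stand-in for Redis, useful in tests (ignores the TTL)
class MemoryCache implements KeyValueCache {
  private store = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}
```

    In a route handler, load would wrap the TypeORM query (for example, fetching a user by ID), and the key would encode what was queried, such as "user:42".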

    Configuring Logging

    Logging is pivotal to an application because it gives us a real-time view of its state. For development, we are going to install morgan, an HTTP request logger middleware that logs the details of each request. It comes in really handy for debugging.

    npm i morgan @types/morgan

    And include this in src/Server.ts, before the routes are added:

    import morgan from 'morgan';
    // Log each HTTP request in morgan's concise 'dev' format outside production
    if (process.env.NODE_ENV !== 'production') {
       app.use(morgan('dev'));
    }

    Winston works well as the application-wide logger. Install it like this:

    npm i winston

    Then, add a src/shared/Logger.ts file:

    import { createLogger, format, transports } from 'winston';
    // Import Functions
    const { Console } = transports;
    // Init Logger
    const logger = createLogger({
       level: 'info',
    });
    // Print error stacks directly and drop the regular formatted line for them
    const errorStackFormat = format((info) => {
       if (info.stack) {
          // eslint-disable-next-line no-console
          console.log(info.stack);
          return false;
       }
       return info;
    });
    const consoleTransport = new Console({
       format: format.combine(
          format.colorize(),
          format.simple(),
          errorStackFormat(),
       ),
    });
    logger.add(consoleTransport);
    export default logger;

    Now, you can use this logger from anywhere in the code, be it for error logging in your API methods or for debugging purposes:

    import logger from '@shared/Logger';
    export async function intializeDB(): Promise<void> {
     await createConnection()
     logger.info('Database successfully initialized');
    }

    Creating your First API Service

    This is the moment you have been waiting for: creating your first API service for your application, the crux of the functionality that will define your web application.

    This API service is a simple GET request handler that returns all the users in your database. You should have src/routes/Users.ts, which can look like this:

    import { NextFunction, Request, Response, Router } from 'express';
    import { OK } from 'http-status-codes';
    import { getConnection } from "typeorm";
    import { User } from "../entities/User";
    const router = Router();
    // GET /api/users/all: return every user in the database
    router.get('/all', async (req: Request, res: Response, next: NextFunction) => {
       try {
           const users = await getConnection()
               .getRepository(User)
               .createQueryBuilder("user")
               .getMany();
           return res.status(OK).json({users});
       } catch (err) {
           // Express 4 does not catch rejected promises; pass the error on
           return next(err);
       }
    });
    export default router;

    Then add src/routes/index.ts:

    import { Router } from 'express';
    import UserRouter from './Users';
    // Init router and path
    const router = Router();
    // Add sub-routes
    router.use('/users', UserRouter);
    // Export the base-router
    export default router;

    Voila! Your API service is ready. Fire up your server, then use Postman to send a GET request to http://localhost:3000/api/users/all and see the magic happen.

    You can also add other API services for fetching a user by ID, deleting a user, creating a user, and updating a user. I will not discuss them here to keep this blog short. You can find them in the GitHub repository I mentioned at the beginning.

    Deploying your Server to Production

    What we have done so far has been in the development phase. Now, we need to take this to production. You just need a <project-root>/build.js script that creates a <project-root>/dist folder and transpiles all the TypeScript files that you have written. It can look like this:

    const fsE = require('fs-extra');
    const childProcess = require('child_process');
    // Remove current build
    fsE.removeSync('./dist/');
    // Copy front-end files
    fsE.copySync('./src/public', './dist/public');
    fsE.copySync('./src/views', './dist/views');
    // Transpile the typescript files
    childProcess.execSync('tsc --build tsconfig.prod.json');

    Then, add a build script to the scripts section of your <project-root>/package.json:

    "scripts": {
       "build": "node build.js",
       "lint": "eslint . --ext .ts",
       ...
    }

    Now, you can build the project with the following command, which runs node build.js:

    npm run build

    Doing so builds the <project-root>/dist folder with your transpiled code. You can deploy this folder to your deployment environment and start your production server with the following command (assuming the start script in your package.json runs the compiled code in dist):

    npm start

    Note: You will need some additional setup of Nginx or your AWS virtual machine to complete your deployment, which is beyond the scope of this blog.

    Going Forward

    Congratulations. You have made it through this tutorial that guided you through the process of setting up a web server. But this is just the beginning, and there is no end to the improvements and optimizations that you can add to your server to make it better and sturdier. And you will continue to discover them in your journey of developing your web application. Some of the key points that I want to mention are:

    Managing Environments

    Your web server will run in multiple environments, such as development, testing, and production. Some vital configuration values, like AWS credentials and DB passwords, are sensitive information, and managing them per environment is key to your development and deployment cycle. I strongly recommend using a library like Dotenv and keeping the configuration for each environment in a separate file. You can look at typescript-express-server for an example of this setup.
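    As a minimal sketch of per-environment configuration, the function below builds a typed config object and fails fast when a required secret is missing. It takes the environment variables as a plain object, so you would call it with process.env after Dotenv has loaded the file for the current environment. The variable names (NODE_ENV, PORT, DB_PASSWORD, AWS_REGION) and the defaults are assumptions for illustration, not part of typescript-express-server.

```typescript
type Environment = "development" | "testing" | "production";

interface AppConfig {
  env: Environment;
  port: number;
  dbPassword: string;
  awsRegion: string;
}

// Environment variables as a plain map (e.g. process.env after Dotenv has run).
type EnvVars = Record<string, string | undefined>;

function isEnvironment(value: string): value is Environment {
  return value === "development" || value === "testing" || value === "production";
}

// Reads a required variable and fails fast, so a misconfigured deployment
// crashes at startup rather than on the first database call.
function required(vars: EnvVars, name: string): string {
  const value = vars[name];
  if (value === undefined || value === "") {
    throw new Error("Missing required environment variable: " + name);
  }
  return value;
}

// Builds a typed config; NODE_ENV defaults to development and PORT to 3000
// (both defaults are hypothetical choices).
function loadConfig(vars: EnvVars): AppConfig {
  const env = vars["NODE_ENV"] || "development";
  if (!isEnvironment(env)) {
    throw new Error("Unknown NODE_ENV: " + env);
  }
  const port = Number(vars["PORT"] || "3000");
  if (isNaN(port) || port <= 0 || Math.floor(port) !== port) {
    throw new Error("Invalid PORT: " + vars["PORT"]);
  }
  return {
    env,
    port,
    dbPassword: required(vars, "DB_PASSWORD"),
    awsRegion: required(vars, "AWS_REGION"),
  };
}
```

    Failing at startup is a deliberate choice: a missing DB password then shows up in the deployment logs immediately instead of as a confusing runtime error on the first request.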

    Configuring Swagger

    Software developers nowadays swear by this tool. It has proved to be a godsend for API documentation and for keeping APIs in conformance with the OpenAPI standard. On top of that, with a validation middleware, your API specification can also be used to validate incoming requests. I strongly recommend configuring it in your web server.

    Writing Tests

    Writing API tests and unit tests is a crucial part of web application development, as they expose possible gaps in your systems. You can use Superagent, a lightweight HTTP client library, to test your APIs against all possible request and response scenarios. See src/spec in typescript-express-server for how to use it. You can also use Postman for API test automation. For most of the services that you write, make sure to add unit tests for each of them using Jest.

    Further Reading

    1. Node.js production checklist
    2. Node.js production best practices
    3. Production best practices: performance and reliability

  • To Go Serverless Or Not Is The Question

    AWS Lambda was launched in 2014, and serverless computing (Function as a Service, or FaaS) has taken off since then. We have been using Lambda for our projects for the last couple of years to build complete end-to-end web applications, combining AWS Lambda with API Gateway (REST APIs), CloudWatch (logs), S3 (website hosting and data storage), and so on.

    Google and Microsoft Azure also provide serverless platforms similar to AWS Lambda, and Apache OpenWhisk is a popular open-source alternative. While implementing serverless applications on AWS, we have learned a lot about running a website on Lambda. Like every other technology, serverless has its own set of benefits and drawbacks, which we will discuss here.

    What Does Serverless Mean?

    Serverless is a cloud computing execution model in which the cloud provider (AWS, Google, or Azure) runs the servers and allocates resources dynamically. The code still runs on servers; “serverless” means that the servers are abstracted away from users and provided to them as a service.

    The Serverless World

    There’s so much excitement for serverless in the industry, but there are issues that sometimes outweigh the pros of a serverless architecture and need complex workarounds. AWS charges for each Lambda invocation plus its execution time, which at the time of writing was rounded up to the nearest 100 ms (AWS has since moved to 1 ms billing granularity). When thousands of requests arrive at EC2 servers, we need to scale up the servers to handle them, whereas Lambda scales on its own, with no auto-scaling groups or load balancers to create. But how much does it cost to use Lambda? Let’s compare.

    Let’s say we have a serverless application with only 1 Lambda function and 1 API Gateway API, handling 200 million requests a month:

    • API Gateway
      $3.50 per million API calls * 200 million API requests/month = 700 USD
    • Lambda compute
      $0.00001667 per GB-second * (200 million requests * 0.3 seconds per execution * 1 GB memory - 400,000 free-tier GB-seconds) ≈ 993.53 USD
    • Lambda requests
      $0.20 per million requests * (200 million - 1 million free-tier requests) = 39.80 USD
    • Total ≈ 1,733 USD (This is a lot)

    Now let’s look at the same workload on EC2:

    • 3 EC2 servers for high availability = 416 USD
      m5.xlarge: 16 GB RAM, 4 vCPUs each
    • Application Load Balancer = 39 USD
    • Total = 455 USD/Month

    So, at a sustained 200 million requests a month, the classic servers are considerably cheaper than serverless.
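    To rerun this comparison with your own traffic, here is a small calculator sketch. The API Gateway price, GB-second price, 400,000 free GB-seconds, 100 ms rounding, and EC2 figures come from the example above; the $0.20 per million Lambda requests and 1 million free requests are AWS list prices included as assumptions. Check the current AWS pricing pages for your region before relying on any of them.

```typescript
// Rough monthly cost of an API Gateway + Lambda setup.
interface Usage {
  requestsPerMonth: number;
  avgDurationMs: number;
  memoryGb: number;
}

interface Prices {
  apiGatewayPerMillionCalls: number; // USD per million API calls
  lambdaPerGbSecond: number;         // USD per GB-second of compute
  lambdaPerMillionRequests: number;  // USD per million invocations
  freeGbSeconds: number;             // monthly free-tier compute
  freeRequests: number;              // monthly free-tier invocations
  billingIncrementMs: number;        // duration is rounded up to this step
}

interface CostBreakdown {
  apiGateway: number;
  lambdaCompute: number;
  lambdaRequests: number;
  total: number;
}

// Assumed prices (see the lead-in); not fetched from AWS.
const EXAMPLE_PRICES: Prices = {
  apiGatewayPerMillionCalls: 3.5,
  lambdaPerGbSecond: 0.00001667,
  lambdaPerMillionRequests: 0.2,
  freeGbSeconds: 400000,
  freeRequests: 1000000,
  billingIncrementMs: 100,
};

function serverlessMonthlyCost(u: Usage, p: Prices): CostBreakdown {
  // Each invocation is billed for a whole number of increments.
  const billedSeconds =
    (Math.ceil(u.avgDurationMs / p.billingIncrementMs) * p.billingIncrementMs) / 1000;
  const gbSeconds = u.requestsPerMonth * billedSeconds * u.memoryGb;
  const apiGateway = (u.requestsPerMonth / 1e6) * p.apiGatewayPerMillionCalls;
  const lambdaCompute = Math.max(0, gbSeconds - p.freeGbSeconds) * p.lambdaPerGbSecond;
  const lambdaRequests =
    (Math.max(0, u.requestsPerMonth - p.freeRequests) / 1e6) * p.lambdaPerMillionRequests;
  return {
    apiGateway,
    lambdaCompute,
    lambdaRequests,
    total: apiGateway + lambdaCompute + lambdaRequests,
  };
}

// Fixed monthly cost of the EC2 alternative from the example (3 servers + ALB).
const EC2_MONTHLY_USD = 416 + 39;
```

    With these prices, 200 million requests of 300 ms at 1 GB come to about 1,733 USD a month against 455 USD for EC2, while 1 million requests cost only the 3.50 USD of API Gateway fees because the Lambda usage stays inside the free tier. That gap is why the answer depends so heavily on traffic volume.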

    Serverless can still be the better choice for startups in their early stages, where traffic is low and every rupee counts. Lambda charges only for the requests that actually reach your functions, and with less server management the team can spend more time on development. Development environments are another example: instead of setting up new servers for your developers, you can run the development environment serverless.

    Loss Of Control

    One of the biggest disadvantages of serverless is that you give up control over your infrastructure. We use many services that are managed by third-party cloud providers, like CloudWatch for logs and DynamoDB for databases. As your project grows, more and more functions need to be managed, and everything runs on the cloud provider’s terms. You lose portability as soon as you integrate Lambda with other services like SNS, DynamoDB, or Kinesis, which results in vendor lock-in and makes it difficult to change vendors later.

    On the other hand, in the non-serverless world, we can manage our language versions, queues, or DB queries ourselves, and all of our code lives in one place instead of being spread across many separate functions. As we said earlier, every technology has its pros and cons: the flip side of the loss of control in serverless is that the team spends less effort on infrastructure and more on adding business value to the product.

    Choosing serverless or non-serverless depends entirely on the product type. If you have a simple application, like selling cakes online, and need only a simple implementation or authentication, you can go with serverless. But if your application is really complex, for example because it needs complex algorithms, and you want full control over your code, security, and authentication, you should go with non-serverless.

    Security Issues

    The biggest risk in serverless, or in using cloud services at all, is poorly configured functions, services, or applications. Bad configuration can lead to security-related or infrastructure-related issues in your application. Whether you use AWS, GCP, or Azure, it’s important to give each function or service exactly the permissions it needs to access other services, and nothing more. Otherwise, you may run into permission issues or security breaches. Also, if you connect any third-party APIs to your provider, make sure the connections are secure and the data is encrypted in transit.
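    One cheap guard against over-broad permissions on AWS is to check IAM policy documents before deploying them. The sketch below flags Allow statements that grant a wildcard action (such as "*" or "dynamodb:*") or a wildcard resource. The policy fields follow the standard IAM JSON policy format (Version, Statement, Effect, Action, Resource), but the checker is a hypothetical helper of our own, not an AWS tool, and it only inspects those fields.

```typescript
// Minimal typing of an IAM policy document (only the fields we inspect).
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource: string | string[];
}

interface PolicyDocument {
  Version: string;
  Statement: PolicyStatement[];
}

function toArray(value: string | string[]): string[] {
  return typeof value === "string" ? [value] : value;
}

// Returns one warning per wildcard action or wildcard resource found in an
// Allow statement. Deny statements are skipped: broad denies are safe.
function findOverlyBroadGrants(policy: PolicyDocument): string[] {
  const warnings: string[] = [];
  policy.Statement.forEach((stmt, i) => {
    if (stmt.Effect !== "Allow") return;
    toArray(stmt.Action).forEach((action) => {
      if (action === "*" || action.slice(-2) === ":*") {
        warnings.push("Statement " + i + ': wildcard action "' + action + '"');
      }
    });
    toArray(stmt.Resource).forEach((resource) => {
      if (resource === "*") {
        warnings.push("Statement " + i + ': wildcard resource "*"');
      }
    });
  });
  return warnings;
}
```

    For example, a statement allowing "dynamodb:*" on "*" produces two warnings, while one allowing only dynamodb:GetItem and dynamodb:PutItem on a single table ARN produces none. Running a check like this in CI catches the "temporary" wildcard before it reaches production.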

    Correct configuration is the most important thing in both serverless and non-serverless applications. If you are strict about it when using cloud services, you will run into fewer security breaches and permission issues down the line.

    Testing & Debugging

    Serverless applications are hard to test. Normally, developers test code locally and then deploy it, but in the serverless world local testing is complicated: tools that emulate cloud services locally, such as the AWS SAM CLI or LocalStack, cover only part of the real behavior. So, we need to do a decent amount of integration testing before moving forward. The simplest way to test and debug the code is with console or print statements, whose output shows up in your CloudWatch logs. Here is a code snippet in Node.js:

    const https = require('https')
    const url = "https://google.com"
    
    exports.handler = async function(event) {
      return new Promise(function(resolve, reject) {
        // console output is written to CloudWatch Logs
        console.log("Processing URL: " + url)
        https.get(url, (res) => {
          // consume the response body so the socket is released
          res.resume()
          // console for debugging / testing purposes
          console.info("Request was successful! Status code: " + res.statusCode)
          resolve(res.statusCode)
        }).on('error', (e) => {
          // console for errors
          console.error("Error while processing: " + e)
          reject(e)
        })
      })
    }

    For serverless applications, it is important to invest time and effort upfront to architect your application correctly and to write good integration tests that run against the cloud infrastructure.

    Debugging is harder too. In non-serverless applications, we debug the code; in serverless, we need to debug the end-to-end integration with the multiple services we use. Lambda execution environments are so short-lived that by the time you go looking, the instance that produced an error is gone. In this situation, we rely on centralized logging services that are meant for exactly this, such as AWS CloudWatch or Google Stackdriver.
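    Because the execution environment is gone by the time you investigate, it helps to make every log line searchable. A common approach is to write one JSON object per line, tagged with the request ID, so that CloudWatch can filter on its fields. The helper below is a minimal sketch; the field names are our own choice, and in a real Node.js handler the request ID would come from the Lambda context object (context.awsRequestId).

```typescript
type Level = "debug" | "info" | "warn" | "error";

// Builds one JSON log line. Keeping each entry on a single line means
// CloudWatch stores it as one log event whose fields can be queried.
function formatLog(
  level: Level,
  requestId: string,
  message: string,
  fields: Record<string, unknown> = {}
): string {
  // Caller fields are spread first, so they cannot overwrite the core fields.
  const entry = {
    ...fields,
    level,
    requestId,
    message,
    timestamp: new Date().toISOString(),
  };
  return JSON.stringify(entry);
}

// Writes the entry via console, whose output Lambda forwards to CloudWatch Logs.
function log(level: Level, requestId: string, message: string, fields?: Record<string, unknown>): void {
  const line = formatLog(level, requestId, message, fields);
  if (level === "error") {
    console.error(line);
  } else {
    console.log(line);
  }
}
```

    A call such as log("info", requestId, "user created", { userId: 42 }) then lets you pull every line for one failing request, across all the services it touched, by searching for its ID.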

    Cold Start

    Figure: Regular cold start (Source: AWS re:Invent)

    An issue remains an issue until you trace it, and some technical issues are hard to find until you know about them or face them. Lambda has one such drawback, known as the cold start. Lambda code runs on servers managed by Amazon, and to make this feasible, Amazon doesn’t keep everyone’s code warm all the time. If your function hasn’t run in a while, the next request has to wait for Lambda to spin up a new execution environment and initialize your code before invoking it, so Lambda takes noticeably longer to return the result for that request.

    But how long is the wait? I was using Node.js, and it took around 4 seconds to respond. This is not good for the end-user experience, and it can impact your business. Delays like this are not tolerable today, when users expect fast responses.

    The problem is manageable with a few Lambdas, but what if their number grows? Let’s say there are 50 to 100 Lambdas: warming up every one of them gets annoying. You have to call your Lambdas before your users do, and you may well ask why you should have to. For a long time, keeping them warm was the only solution (AWS has since added Provisioned Concurrency, which keeps a configured number of execution environments initialized for an additional charge). I used the Serverless Framework for my serverless implementation, and it helped me solve most of the problems with Lambdas and the other resources we used to build serverless applications.
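    The sketch below shows the two ideas behind warming. Code at module scope runs once per execution environment, so it is where cold-start work happens and where expensive setup should live; and a scheduled keep-warm ping can be recognized and answered immediately. The WARMUP_SOURCE marker and the event shape are hypothetical; use whatever payload your scheduler or warm-up plugin actually sends.

```typescript
// Hypothetical marker that our scheduled keep-warm rule puts in the event.
const WARMUP_SOURCE = "keep-warm-ping";

interface HandlerEvent {
  source?: string;
  body?: string;
}

interface HandlerResult {
  statusCode: number;
  body: string;
}

// Module scope runs once per execution environment, i.e. during a cold start.
// Expensive setup (DB connections, SDK clients, config) belongs here so that
// warm invocations can reuse it.
let isColdStart = true;

export function handleEvent(event: HandlerEvent): HandlerResult {
  const coldStart = isColdStart;
  isColdStart = false;

  if (event.source === WARMUP_SOURCE) {
    // Keep-warm ping: return immediately so it costs as little as possible.
    return { statusCode: 200, body: JSON.stringify({ warmed: true, coldStart }) };
  }

  // Real request handling would go here.
  return { statusCode: 200, body: JSON.stringify({ ok: true, coldStart }) };
}

// The Lambda entry point just delegates to the synchronous logic.
export const handler = async (event: HandlerEvent): Promise<HandlerResult> => handleEvent(event);
```

    Logging the coldStart flag, as this handler does in its response, is also a cheap way to measure how often your users actually hit a cold start before deciding whether warming is worth it.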

    Conclusion

    Serverless has many problems, I agree, but which technology doesn’t? When choosing between serverless and non-serverless, make sure you do your research and analyze your requirements before deciding which direction to take. If you want to build small applications quickly, with strict deadlines and a small budget, go with serverless; otherwise, choose EC2 servers. It mainly depends on the requirements. If you are using serverless, some frameworks will help you a lot. Also, you can compare the pricing here.

    If you are new to serverless and want to implement it from scratch, you can have a look at the following link.

    Serverless still has its downsides, but we hope that Amazon and other cloud providers will come up with good solutions to make it more efficient. We look forward to learning as the technology evolves.

  • Serverless Computing: Predictions for 2017

    Serverless is the emerging trend in software architecture. 2016 was a very exciting year for serverless and adoption will continue to explode in 2017. This post covers the interesting developments in serverless space in 2016 and my thoughts on how this space will evolve in 2017.

    What is serverless?

    In my opinion, Serverless means two things:
    1. Serverless was initially used to describe fully hosted services, or Backend-as-a-Service (BaaS) offerings, where you fully depend on a 3rd party to manage the server-side logic and state. Examples include AWS S3 (storage), Auth0 (authentication), AWS DynamoDB (database), Firebase, etc.
    2. The popular interpretation of Serverless is Functions-as-a-Service (FaaS), where developers upload code that runs within stateless compute containers that are triggered by a variety of events, are ephemeral, and are fully managed by the cloud platform. FaaS obviates the need to provision, manage, scale, or ensure the availability of your own servers. The most popular FaaS offering is AWS Lambda, but Microsoft Azure Functions, Auth0 Webtask, and Google Cloud Functions are also coming up with fast-maturing offerings. IBM OpenWhisk and Iron.io allow serverless environments to be set up on-premises as well.
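    To make the FaaS model concrete, here is a minimal sketch of a stateless function triggered by an S3 upload, the common image-conversion-on-upload use case. The event types are hand-written and cover only the fields used (Records[].s3.bucket.name and Records[].s3.object.key, as in S3 event notifications); the thumbnails/ naming scheme is a hypothetical choice, and the actual image conversion is left out.

```typescript
// Only the parts of the S3 event notification this function reads.
interface S3EventRecord {
  s3: {
    bucket: { name: string };
    object: { key: string; size?: number };
  };
}

interface S3Event {
  Records: S3EventRecord[];
}

interface ConversionJob {
  bucket: string;
  sourceKey: string;
  targetKey: string;
}

// S3 URL-encodes object keys in event notifications, with spaces as "+".
function decodeS3Key(rawKey: string): string {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

// Stateless: everything the function needs arrives in the event, and nothing
// is kept between invocations. Each uploaded object becomes one job.
function planConversions(event: S3Event): ConversionJob[] {
  return event.Records.map((record) => {
    const sourceKey = decodeS3Key(record.s3.object.key);
    // Hypothetical naming scheme: uploads/cat.png -> thumbnails/cat.png
    const fileName = sourceKey.substring(sourceKey.lastIndexOf("/") + 1);
    return {
      bucket: record.s3.bucket.name,
      sourceKey,
      targetKey: "thumbnails/" + fileName,
    };
  });
}

// Lambda entry point; downloading, resizing and uploading are out of scope.
export const handler = async (event: S3Event): Promise<ConversionJob[]> => {
  const jobs = planConversions(event);
  // ...fetch each source object, convert it, and write it to targetKey...
  return jobs;
};
```

    If the thumbnails are written back to the same bucket, restrict the trigger with a prefix filter (for example, only keys under uploads/), otherwise each thumbnail upload would invoke the function again.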

    This post will focus more on FaaS and especially AWS Lambda since it is the leader in this space with the largest feature set.

    Serverless vs PaaS

    People are comparing serverless and Platform-as-a-Service (PaaS), which in my opinion is an invalid comparison. PaaS platforms provide you the ability to deploy and manage your entire application or micro-service. PaaS platforms will help you to scale your application infrastructure. Anyone who has used a PaaS knows that while it reduces administration, it does not do away with it. PaaS does not really require you to re-evaluate your code design. In PaaS, your application needs to be always on, though it can be scaled up or down.

    Serverless, on the other hand, is about “breaking up your application into discrete functions to obviate the need for any complex infrastructure or its management”. Serverless has several restrictions with respect to execution duration limits and state. Serverless requires developers to architect differently, thinking about the discrete functionality of each component of their application.

    New features from AWS for Serverless

    1. Lambda@Edge: This feature allows Lambda code to be executed on global AWS Edge locations. Imagine you deploy your code to one region of AWS and then are able to run it in any of the 10s and soon 100s of AWS Edge locations around the globe. Imagine the reduction in network latency for your end users.
    2. Step Functions: Companies that adopt serverless soon end up with 10s or 100s of functions. The logic and workflow between these functions is hard to track and manage. Step Functions allow developers to create visual workflows to organize the various components, micro-services, events and conditions. This is nothing but a Rapid Application Development (RAD) product. I expect AWS to build a fully functional enterprise RAD based on this feature.
    3. API Gateway Monetisation: This is a big deal in my opinion. There are various trends, like a) startups are increasingly API-first, b) all enterprises and startups build 10s or 100s of integrations with APIs, c) fine-grained, usage-based billing for APIs, and d) adoption of micro-services architecture, which uses API contracts. This feature allows companies to start monetising their APIs via the AWS Marketplace, and the APIs can be implemented in AWS Lambda in the backend. I expect to see a lot of “data integration”, “data pipeline” and “data marketplace” companies try out this approach.
    4. AWS Greengrass: Extend AWS compute to devices. This feature enables running Lambda functions offline on devices in the field. This is another great feature which is extending the meaning of a “global cloud”. Most of the use-cases for this feature are in IoT space.
    5. Continuous Deployment for Serverless: AWS CodePipeline, CodeCommit and CodeBuild support AWS Lambda and enable creation of continuous integration and deployment pipelines. Jenkins and other CI tools also have some plugins for serverless, which are improving all the time.
    6. API Gateway Developer Portal: Easily build your own developer portal with API documentation, based on your Swagger definitions.
    7. AWS X-Ray: Analyze and debug distributed applications, commonly used with micro-services architecture. X-Ray will soon get Lambda support and enable even easier debugging of Lambda logic flows across all the AWS services and triggers that are part of your workflow.
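    For a feel of what Step Functions organizes, the sketch below writes a small workflow in Amazon States Language (ASL), the JSON-based language Step Functions uses to define state machines, as a typed TypeScript object, and checks that it hangs together: the start state exists, every Next points to a defined state, and each Task has either Next or End but not both. The order workflow, state names, and Lambda ARNs are hypothetical, and only a small subset of ASL is typed.

```typescript
// A small subset of Amazon States Language, enough for a linear workflow.
type State =
  | { Type: "Task"; Resource: string; Next?: string; End?: boolean }
  | { Type: "Succeed" }
  | { Type: "Fail"; Error?: string; Cause?: string };

interface StateMachine {
  Comment?: string;
  StartAt: string;
  States: Record<string, State>;
}

// Hypothetical order-processing workflow made of three Lambda functions.
const orderWorkflow: StateMachine = {
  Comment: "Validate, charge and ship an order",
  StartAt: "ValidateOrder",
  States: {
    ValidateOrder: { Type: "Task", Resource: "arn:aws:lambda:us-east-1:123456789012:function:validateOrder", Next: "ChargeCard" },
    ChargeCard: { Type: "Task", Resource: "arn:aws:lambda:us-east-1:123456789012:function:chargeCard", Next: "ShipOrder" },
    ShipOrder: { Type: "Task", Resource: "arn:aws:lambda:us-east-1:123456789012:function:shipOrder", Next: "Done" },
    Done: { Type: "Succeed" },
  },
};

function hasState(sm: StateMachine, name: string): boolean {
  return Object.prototype.hasOwnProperty.call(sm.States, name);
}

// Returns the problems that would make this (subset of a) definition invalid.
function validateWorkflow(sm: StateMachine): string[] {
  const problems: string[] = [];
  if (!hasState(sm, sm.StartAt)) problems.push('StartAt "' + sm.StartAt + '" is not defined');
  Object.keys(sm.States).forEach((name) => {
    const state = sm.States[name];
    if (state.Type !== "Task") return; // Succeed and Fail are terminal
    if (state.Next !== undefined && !hasState(sm, state.Next)) {
      problems.push(name + ': Next "' + state.Next + '" is not defined');
    }
    if (state.Next === undefined && state.End !== true) {
      problems.push(name + ": Task needs either Next or End");
    }
    if (state.Next !== undefined && state.End === true) {
      problems.push(name + ": Task cannot have both Next and End");
    }
  });
  return problems;
}
```

    JSON.stringify(orderWorkflow) yields the ASL document you would hand to Step Functions; keeping it typed in code means a renamed state breaks the build instead of the workflow.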

    Predictions

    • Your code will increasingly run closer to clients. With Lambda@Edge, Greengrass & Snowball Edge, you can truly “deploy once and run around the globe”. This is not just scaling horizontally but scaling up or down geographically. I expect customers to leverage this feature in some very interesting ways.
    • Serverless frameworks will mature and allow easy creation of simple REST applications. I also expect vertical specific serverless frameworks to evolve, especially for IoT.
    • Monitoring, logging and debugging for serverless will improve with cloud vendor solutions as well as frameworks providing capabilities. A good example is IOPipe, which provides monitoring and logging capabilities by instrumenting your Lambda code.
    • I expect all the continuous integration and deployment tool vendors to add increasing support for serverless architectures in 2017. Automated testing for serverless will become easier via frameworks and CI tools.
    • With CloudFormation and Step Functions, AWS will try and solve the versioning and discovery problem for Lambda functions.
    • Mature patterns will emerge for serverless usage. For example, some very common use-cases today are image or video conversion based on S3 trigger, backends for messaging bots, data processing for IoT, etc. I expect to see more mature patterns and use-cases where serverless becomes an obvious solution. Expect lots of interesting white-papers and case studies from AWS to educate the market on emerging use-cases.
    • AWS will allow users to choose to keep some number of Lambda instances always-on for lower latency with some extra pricing. This will start addressing the latency issues that some heavy functions or frameworks entail.
    • API and data integration vendors will increasingly use Lambda and monetized API gateways. An early example is Cloud Elements, which has released a feature that allows publishing SaaS APIs as AWS Lambda functions.

    Serverless Frameworks

    There are open-source projects like Serverless, Chalice, Lambda Framework, Apex, and Gomix, which add a level of abstraction over vendor serverless platforms. These frameworks make it easier to develop and deploy serverless components.

    Some of these frameworks plan to allow abstraction across FaaS offerings to avoid lock-in into a single platform. I do not expect to see any abstraction or standardisation that allows portability of serverless functions. Cloud vendors have unique FaaS offerings with different implementations, and the triggers/events are based on their own proprietary services (for example, AWS has S3, Kinesis, DynamoDB, etc. triggers).

    Bots, IoT, mobile app backends, IoT backends, data processing/ETL, scheduled jobs and any other event driven use-cases are a perfect fit for serverless.

    Conclusion

    In my opinion, any service that adds agility along with reduction in IT operations and costs will become successful. Serverless definitely fits into this philosophy and in my opinion, will continue to see increasing adoption. It will not “replace” existing architectures but augment them. The serverless movement is just getting started and you can expect cloud vendors to invest heavily in improving the feature set and capabilities of serverless offerings. The serverless frameworks, DevOps teams, operational management and monitoring of FaaS will continue to mature in 2017. There are a lot of emergent trends at play here — containerization, serverless, LessOps, voice and chatbots, IoT proliferation, globalization and agility demands. All of these will accelerate serverless adoption.

  • A Quick Guide to Building a Serverless Chatbot With Amazon Lex

    Amazon announced “Amazon Lex” in December 2016, and since then we’ve been using it to build bots for our customers. Lex is effectively the technology used by Alexa, Amazon’s voice-activated virtual assistant, which lets people control things with voice commands such as playing music, setting alarms, or ordering groceries. It provides deep learning-powered natural-language understanding along with automatic speech recognition. Amazon now provides it as a service that gives developers the same features used by Amazon Alexa, so there is no need to spend time setting up and managing the infrastructure for your bots.

    Now, developers just need to design conversations according to their requirements in Lex console. The phrases provided by the developer are used to build the natural language model. After publishing the bot, Lex will process the text or voice conversations and execute the code to send responses.

    I’ve put together this quick-start tutorial to help you start building Lex chat-bots. To make the terms concrete, let’s consider an e-commerce bot that supports conversations about purchasing books.

    Lex-Related Terminologies

    Bot: It consists of all the components related to a conversation, which includes:

    • Intent: Intent represents a goal, needed to be achieved by the bot’s user. In our case, our goal is to purchase books.
    • Utterances: An utterance is a text phrase that invokes an intent. If we have more than one intent, we need to provide different utterances for each of them. Amazon Lex builds a language model based on the utterance phrases we provide, which then invoke the required intent. For our demo example, we need a single intent, “OrderBook”. Some sample utterances would be:
    • I want to order some books
    • Can you please order a book for me
    • Slots: Each slot is a piece of data that the user must supply in order to fulfill the intent. For instance, purchasing a book requires bookType and bookName as slots for intent “OrderBook” (I am considering these two factors for making the example simpler, otherwise there are so many other factors based on which one will purchase/select a book.). 
      Slots are inputs, such as a string, date, city, location, boolean, or number, that are needed to reach the goal of the intent. Each slot has a name, a slot type, a prompt, and a flag indicating whether it is required. The slot type defines the valid values a user can respond with, and it can be either custom-defined or one of Amazon’s pre-built types.
    • Prompt: A prompt is a question that Lex uses to ask the user to supply correct data (for a slot) that is needed to fulfill an intent, e.g. Lex will ask “What type of book do you want to buy?” to fill the slot bookType.
    • Fulfillment: Fulfillment provides the business logic that is executed once all the slot values required to achieve the goal have been collected. Amazon Lex supports the use of Lambda functions for fulfillment and for validation.

    Let’s Implement this Bot!

    Now that we are aware of the basic terminology used in Amazon Lex, let’s start building our chat-bot.

    Creating Lex Bot:

    • Go to the Amazon Lex console, which at the time of writing was available only in the US East (N. Virginia) region, and click the Create button.
    • Create a custom bot by providing following information:
    1. Bot Name: PurchaseBook
    2. Output voice: None, this is only a text-based application
    3. Set Session Timeout: 5 min
    4. Add Amazon Lex basic role to Bot app: Amazon will create it automatically. Find out more about Lex roles & permissions here.
    5. Click on Create button, which will redirect you to the editor page.

    Architecting Bot Conversations

    Create Slots: We are creating two slots named bookType and bookName. Slot type values can be chosen from 275 pre-built types provided by Amazon or we can create our own customized slot types.

    Create a custom slot type for bookType as shown here, and use the predefined type AMAZON.Book for bookName.

    Create Intent: Our bot requires a single custom intent named OrderBook.

    Configuring the Intents

    • Utterances: Provide some utterances to invoke the intent. An utterance can consist only of Unicode characters, spaces, and valid punctuation marks. Valid punctuation marks are periods for abbreviations, underscores, apostrophes, and hyphens. If there is a slot placeholder in your utterance, ensure that it’s in the {slotName} format and has spaces at both ends.

    Slots: Map the slots to their types and provide the prompt questions that need to be asked to get a valid value for each slot. Note the order: the bot asks the questions according to their priority.

    Confirmation prompt: This is optional. If required, you can provide a confirmation message, e.g. “Are you sure you want to purchase the book named {bookName}?”, where bookName is a slot placeholder.

    Fulfillment: Once the bot has gathered all the necessary data, it can be passed to a Lambda function, or the parameters can be returned to the client application, which then calls a REST endpoint.

    Creating AWS Lambda Functions

    Amazon Lex supports Lambda functions as code hooks for the bot. These functions can serve multiple purposes, such as improving the user’s interaction with the bot by using prior knowledge, validating the input data the bot received from the user, and fulfilling the intent.

    • Go to AWS Lambda console and choose to Create a Lambda function.
    • Select blueprint as blank function and click next.
    • To configure your Lambda function, provide its name, its runtime, and the code to be executed when the function is invoked. The code can also be uploaded as a zip file instead of being provided inline. We are using Node.js 4.3 as the runtime (long since deprecated on Lambda, so choose a current Node.js runtime today).
    • Click next and choose Create Function.

    We can configure our bot to invoke these Lambda functions at two places, both set while configuring the intent: initialization and validation, and fulfillment.

    Here, botCodeHook and fulfillment are the names of the Lambda functions we created.

    Lambda initialization and validation  

    The Lambda function used here, botCodeHook, is invoked on each user input whose intent Amazon Lex understands. It validates bookName against a predefined list of books.

    'use strict';
    exports.handler = (event, context, callback) => {
        const sessionAttributes = event.sessionAttributes;
        const slots = event.currentIntent.slots;
        const bookName = slots.bookName;
    
        // predefined list of available books
        const validBooks = ['harry potter', 'twilight', 'wings of fire'];
    
        // negative check: if the slot holds a book we don't have, ask Lex to
        // elicit the bookName slot from the user again
        if (bookName && validBooks.indexOf(bookName.toLowerCase()) === -1) {
            const response = {
                sessionAttributes: sessionAttributes,
                dialogAction: {
                    type: "ElicitSlot",
                    message: {
                        contentType: "PlainText",
                        content: `We do not have the book: ${bookName}. Please provide another book name, e.g. twilight.`
                    },
                    intentName: event.currentIntent.name,
                    slots: slots,
                    slotToElicit: "bookName"
                }
            };
            // return, so that the Delegate response below is not sent as well
            return callback(null, response);
        }
    
        // book name is valid (or not provided yet): let Lex choose the next action
        const response = {
            sessionAttributes: sessionAttributes,
            dialogAction: {
                type: "Delegate",
                slots: slots
            }
        };
        callback(null, response);
    };

    Fulfillment code hook

    This lambda function is invoked after receiving all slot data required to fulfill the intent.

    'use strict';
    
    exports.handler = (event, context, callback) => {
        // when the intent is fulfilled, tell Lex to close the conversation
        let response = {sessionAttributes: event.sessionAttributes,
          dialogAction: {
            type: "Close",
            fulfillmentState: "Fulfilled",
            message: {
              contentType: "PlainText",
              content: "Thanks for purchasing book."
            }
          }
        }
        callback(null, response);
    };
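    Since deployed serverless code is awkward to test, it can help to keep the code-hook logic in a pure function and exercise it with hand-built events. The sketch below is a TypeScript version of the botCodeHook validation above, typed only for the Lex (V1) event and response fields that code uses; the type names are our own, not an official SDK definition.

```typescript
// Just the Lex (V1) code-hook event fields the validation reads.
interface LexEvent {
  sessionAttributes: Record<string, string> | null;
  currentIntent: {
    name: string;
    slots: Record<string, string | null>;
  };
}

type DialogAction =
  | {
      type: "ElicitSlot";
      message: { contentType: "PlainText"; content: string };
      intentName: string;
      slots: Record<string, string | null>;
      slotToElicit: string;
    }
  | { type: "Delegate"; slots: Record<string, string | null> };

interface LexResponse {
  sessionAttributes: Record<string, string> | null;
  dialogAction: DialogAction;
}

const VALID_BOOKS = ["harry potter", "twilight", "wings of fire"];

// Pure function: no AWS calls and no callback, so it is easy to unit-test.
function validateOrderBook(event: LexEvent): LexResponse {
  const slots = event.currentIntent.slots;
  const bookName = slots["bookName"];
  if (bookName && VALID_BOOKS.indexOf(bookName.toLowerCase()) === -1) {
    return {
      sessionAttributes: event.sessionAttributes,
      dialogAction: {
        type: "ElicitSlot",
        message: {
          contentType: "PlainText",
          content: "We do not have the book: " + bookName + ". Please provide another book name, e.g. twilight.",
        },
        intentName: event.currentIntent.name,
        slots,
        slotToElicit: "bookName",
      },
    };
  }
  // Valid (or not yet provided): let Lex decide the next step.
  return { sessionAttributes: event.sessionAttributes, dialogAction: { type: "Delegate", slots } };
}

// The Lambda entry point only delegates to the pure function.
export const handler = async (event: LexEvent): Promise<LexResponse> => validateOrderBook(event);
```

    Because the handler has no side effects, a Jest test can call validateOrderBook with a sample event and assert on dialogAction.type, with no deployment or Lex console involved.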

    Error Handling: We can customize the error messages for our bot users. Click on error handling and replace the default values with the required ones. Since the number of retries is set to two, we can also provide a different message for each retry.

    Your Bot is Now Ready To Chat

    Click on Build to build the chat-bot. Congratulations! Your Lex chat-bot is ready to test. We can test it in the overlay which appears in the Amazon Lex console.

    Sample conversations:

    I hope you now understand the basic terminology of Amazon Lex and how to create a simple chat-bot using serverless (AWS Lambda). This is a really powerful platform for building mature and intelligent chatbots.

  • Introduction to the Modern Server-side Stack – Golang, Protobuf, and gRPC

    There are some new players in town for server programming and this time it’s all about Google. Golang has rapidly been gaining popularity ever since Google started using it for their own production systems. And since the inception of Microservice Architecture, people have been focusing on modern data communication solutions like gRPC along with Protobuf. In this post, I will walk you through each of these briefly.

    Golang

    Golang or Go is an open source, general purpose programming language by Google. It has been gaining popularity recently for all the good reasons. It may come as a surprise to most people that the language is almost 10 years old and has been production-ready for almost 7 years, according to Google.

    Golang is designed to be simple, modern, easy to understand, and quick to grasp. The creators of the language designed it in such a way that an average programmer can have a working knowledge of the language over a weekend. I can attest to the fact that they definitely succeeded. Speaking of the creators, they include Ken Thompson, who co-created Unix and the B language that C grew out of, so we can be assured that these people know what they are doing.

    That’s all good but why do we need another language?

    For most of the use cases, we actually don’t. In fact, Go doesn’t solve any new problems that haven’t been solved by some other language/tool before. But it does try to solve a specific set of relevant problems that people generally face in an efficient, elegant, and intuitive manner. Go’s primary focus is the following:

    • First class support for concurrency
    • An elegant, modern language that is very simple at its core
    • Very good performance
    • First hand support for the tools required for modern software development

    I’m going to briefly explain how Go provides all of the above. You can read more about the language and its features in detail from Go’s official website.

    Concurrency

    Concurrency is one of the primary concerns in most server applications, and given modern multi-core processors, a language should treat it as a primary concern too. Go introduces a concept called a ‘goroutine’. A goroutine is analogous to a ‘lightweight user-space thread’. In reality it is more involved than that, since the Go runtime multiplexes many goroutines onto a small number of OS threads, but the analogy gives you the general idea. Goroutines start with a very small stack that grows as needed, so they are light enough that you can run hundreds of thousands, or even a million, of them at once. Any function or method in Go can be used to spawn a goroutine: ‘go myAsyncTask()’ runs the ‘myAsyncTask’ function on a new goroutine. The following is an example:

    // performAsyncTasks runs the given tasks concurrently by spawning a goroutine
    // for each of them.
    func performAsyncTasks(tasks []Task) {
      for _, task := range tasks {
        // This spawns a separate goroutine to carry out this task.
        // The call is non-blocking, so performAsyncTasks returns without
        // waiting for the tasks to finish.
        go task.Execute()
      }
    }
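    Because the function above returns immediately, a caller that needs the results usually waits with a sync.WaitGroup. The sketch below assumes a hypothetical Task type (a function returning an int, not the article's real Task) and stores each result at its task's index so the goroutines never write to the same slot.

```go
package main

import (
	"fmt"
	"sync"
)

// Task is a hypothetical unit of work used only for this sketch.
type Task func() int

// performAsyncTasks starts one goroutine per task and waits for all of them,
// returning each task's result at the same index as the task.
func performAsyncTasks(tasks []Task) []int {
	results := make([]int, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1) // register before starting the goroutine
		go func(i int, task Task) {
			defer wg.Done()
			results[i] = task() // distinct index per goroutine, so no data race
		}(i, task)
	}
	wg.Wait() // block until every goroutine has called Done
	return results
}

func main() {
	tasks := []Task{
		func() int { return 1 },
		func() int { return 2 },
		func() int { return 3 },
	}
	fmt.Println(performAsyncTasks(tasks)) // [1 2 3]
}
```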

    Yes, it’s that easy and it is meant to be that way as Go is a simple language and you are expected to spawn a goroutine for every independent async task without caring much. Go’s runtime automatically takes care of running the goroutines in parallel if multiple cores are available. But how do these goroutines communicate? The answer is channels.

    A ‘channel’ is another language primitive, meant for communication among goroutines. You can send any value over a channel to another goroutine: a primitive Go type, a Go struct, or even another channel. A channel is essentially a typed, blocking FIFO queue: a send blocks while the channel is full (or, for an unbuffered channel, until a receiver is ready), and a receive blocks until a value is available. A channel can also be restricted to send-only or receive-only in a function signature. If you want one or more goroutines to wait for a certain condition before continuing, you can implement cooperative blocking with channels.
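    As a small illustration of those properties, the sketch below passes values from a producer goroutine to a consumer over an unbuffered channel, uses send-only and receive-only channel types in the signatures, and uses close plus range to signal that no more values are coming. The function names produce and sumSquares are made up for this example.

```go
package main

import "fmt"

// produce sends 1..n on a send-only channel and closes it when done.
func produce(n int, out chan<- int) {
	for i := 1; i <= n; i++ {
		out <- i // blocks until the consumer receives (unbuffered channel)
	}
	close(out) // tells the receiver that no more values are coming
}

// sumSquares receives from a receive-only channel until it is closed.
func sumSquares(in <-chan int) int {
	total := 0
	for v := range in {
		total += v * v
	}
	return total
}

func main() {
	ch := make(chan int) // unbuffered: every send waits for a matching receive
	go produce(4, ch)
	fmt.Println(sumSquares(ch)) // 30
}
```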

    These two primitives give a lot of flexibility and simplicity in writing asynchronous or parallel code. Other helper libraries like a goroutine pool can be easily created from the above primitives. One basic example is:

    package executor
    
    import (
    	"log"
    	"sync/atomic"
    )
    
    // Executor is the main executor for tasks. Task (whose Execute method returns a Report) and
    // Report are defined elsewhere in this package.
    // 'maxWorkers' is the maximum number of simultaneous goroutines.
    // 'ActiveWorkers' is the number of goroutines the Executor is running at a given time. It is
    // updated atomically, so read it with atomic.LoadInt64.
    // 'Tasks' is the channel on which the Executor receives tasks.
    // 'Reports' is the channel on which the Executor publishes every task's report.
    // 'signals' is a channel that can be used to control the Executor. Right now, only the termination
    // signal is supported, which the client requests by sending '1' on this channel. The Executor
    // replies on the same channel, so it should be unbuffered.
    type Executor struct {
    	maxWorkers    int64
    	ActiveWorkers int64
    
    	Tasks   chan Task
    	Reports chan Report
    	signals chan int
    }
    
    // NewExecutor creates a new Executor.
    // 'maxWorkers' tells the maximum number of simultaneous goroutines.
    // 'signals' channel can be used to control the Executor.
    func NewExecutor(maxWorkers int, signals chan int) *Executor {
    	chanSize := 1000
    
    	if maxWorkers > chanSize {
    		chanSize = maxWorkers
    	}
    
    	executor := Executor{
    		maxWorkers: int64(maxWorkers),
    		Tasks:      make(chan Task, chanSize),
    		Reports:    make(chan Report, chanSize),
    		signals:    signals,
    	}
    
    	go executor.launch()
    
    	return &executor
    }
    
    // launch starts the main loop that polls all the relevant channels and handles the different
    // messages. It blocks in select instead of spinning, and only accepts a new task while a worker
    // slot is free.
    func (executor *Executor) launch() int {
    	reports := make(chan Report, executor.maxWorkers)
    
    	for {
    		// A nil channel is never ready in a select, so new tasks are only received while
    		// fewer than maxWorkers goroutines are active.
    		var tasks chan Task
    		if atomic.LoadInt64(&executor.ActiveWorkers) < executor.maxWorkers {
    			tasks = executor.Tasks
    		}
    
    		select {
    		case signal := <-executor.signals:
    			if executor.handleSignals(signal) == 0 {
    				return 0
    			}
    
    		case r := <-reports:
    			executor.addReport(r)
    
    		case task := <-tasks:
    			atomic.AddInt64(&executor.ActiveWorkers, 1)
    			go executor.launchWorker(task, reports)
    		}
    	}
    }
    
    // handleSignals is called whenever anything is received on the 'signals' channel.
    // It performs the relevant task according to the received signal (request) and then responds
    // with 0 or 1, indicating whether the request was accepted (0) or rejected (1).
    func (executor *Executor) handleSignals(signal int) int {
    	if signal == 1 {
    		log.Println("Received termination request...")
    
    		if executor.Inactive() {
    			log.Println("No active workers, exiting...")
    			executor.signals <- 0
    			return 0
    		}
    
    		executor.signals <- 1
    		log.Println("Some tasks are still active...")
    	}
    
    	return 1
    }
    
    // launchWorker runs on its own goroutine whenever a new Task is received and the Executor has a
    // free worker slot. It performs the given task and publishes the report on the Executor's
    // internal reports channel without blocking.
    func (executor *Executor) launchWorker(task Task, reports chan<- Report) {
    	report := task.Execute()
    
    	// Free the worker slot before publishing, so the launch loop already sees the updated
    	// count when it wakes up on this report.
    	atomic.AddInt64(&executor.ActiveWorkers, -1)
    
    	select {
    	case reports <- report:
    	default:
    		log.Println("Executor's report channel is full...")
    	}
    }
    
    // AddTask submits a new task to the Executor in a non-blocking way. The client can also send on
    // the Executor's Tasks channel directly, but that blocks when the channel is full.
    // Note that this method does not add the task if the Tasks channel is full; it returns false,
    // and it is up to the client to try again later.
    func (executor *Executor) AddTask(task Task) bool {
    	// select with a default case makes the send non-blocking without a racy len/cap check.
    	select {
    	case executor.Tasks <- task:
    		return true
    	default:
    		return false
    	}
    }
    
    // addReport is used by the Executor to publish reports without blocking. If the client is not
    // reading the Reports channel, or reads it more slowly than the Executor publishes, the channel
    // fills up; in that case this method drops the report instead of blocking.
    func (executor *Executor) addReport(report Report) bool {
    	select {
    	case executor.Reports <- report:
    		return true
    	default:
    		return false
    	}
    }
    
    // Inactive checks whether the Executor is idle, i.e. there are no pending tasks, no active
    // workers, and no reports waiting to be read.
    func (executor *Executor) Inactive() bool {
    	return atomic.LoadInt64(&executor.ActiveWorkers) == 0 &&
    		len(executor.Tasks) == 0 && len(executor.Reports) == 0
    }

    Simple Language

    Unlike a lot of other modern languages, Golang doesn’t have a lot of features. In fact, a compelling case can be made that its feature set is too restrictive, and that’s intentional. It is not designed around a single programming paradigm like Java, or designed to support multiple programming paradigms like Python. It’s bare-bones structured programming: just the essential features, and not a single thing more.

    After looking at the language, you may feel that it doesn’t follow any particular philosophy or direction, and that every feature is included to solve a specific problem and nothing more. For example, it has methods and interfaces but no classes; the compiler produces a statically linked binary, yet the language still has a garbage collector; it has strict static typing but, until Go 1.18 added them in 2022, had no generics. The language has a thin runtime but doesn’t support exceptions; errors are returned as ordinary values.

    The main idea is that developers should spend the least amount of time expressing their idea or algorithm as code, without wondering “What’s the best way to do this in language X?”, and that the result should be easy for others to understand. It’s still not perfect and does feel limiting from time to time; features such as generics and better error handling were discussed as part of the ‘Go 2’ effort, and generics eventually shipped in Go 1.18.

    Performance

    Single-threaded execution performance is NOT a good metric to judge a language, especially one focused on concurrency and parallelism. Still, Golang posts impressive benchmark numbers, generally beaten only by systems programming languages like C, C++, and Rust, and it is still improving. The performance is very impressive for a garbage-collected language and is good enough for almost every use case.

    (Image Source: Medium)

    Developer Tooling

    The adoption of a new tool or language depends directly on its developer experience, and Go’s adoption speaks well of its tooling. The same philosophy applies here: the tooling is minimal but sufficient. Everything is done through the ‘go’ command and its subcommands, all from the command line.

    Go has no separate package manager like pip or npm. You can fetch any community package with the ‘go’ command itself, using the package’s import path:

    go get github.com/velotiotech/WebCrawler/executor

    Yes, it works. You can pull packages directly from GitHub or any other host. They are just source files.

    But what about package.json? In the classic workflow described here, Go has no equivalent (Go modules later added a `go.mod` file for this). You don’t need to list all your dependencies in a single file. You can directly use:

    import "github.com/xlab/pocketsphinx-go/sphinx"

    in your source file itself, and `go get` fetches it along with the rest of your imports. You can see the full source file here:

    package main
    
    import (
    	"encoding/binary"
    	"log"
    	"os/exec"
    
    	pulse "github.com/mesilliac/pulse-simple" // pulse-simple
    	"github.com/xlab/pocketsphinx-go/sphinx"
    )
    
    var buffSize int
    
    // readInt16 decodes one little-endian, signed 16-bit PCM sample from a 2-byte slice.
    func readInt16(buf []byte) int16 {
    	return int16(binary.LittleEndian.Uint16(buf))
    }
    
    func createStream() *pulse.Stream {
    	ss := pulse.SampleSpec{pulse.SAMPLE_S16LE, 16000, 1}
    	buffSize = int(ss.UsecToBytes(1 * 1000000))
    	stream, err := pulse.Capture("pulse-simple test", "capture test", &ss)
    	if err != nil {
    		log.Panicln(err)
    	}
    	return stream
    }
    
    func listen(decoder *sphinx.Decoder) {
    	stream := createStream()
    	defer stream.Free()
    	defer decoder.Destroy()
    	buf := make([]byte, buffSize)
    	var bits []int16
    
    	log.Println("Listening...")
    
    	for {
    		_, err := stream.Read(buf)
    		if err != nil {
    			log.Panicln(err)
    		}
    
    		for i := 0; i < buffSize; i += 2 {
    			bits = append(bits, readInt16(buf[i:i+2]))
    		}
    
    		process(decoder, bits)
    		bits = nil
    	}
    }
    
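    // process decodes one buffer of samples as a single utterance and acts on the
    // hypothesis if its score clears this example's cut-off of -2500.
    func process(dec *sphinx.Decoder, bits []int16) {
    	if !dec.StartUtt() {
    		panic("Decoder failed to start Utt")
    	}
    
    	dec.ProcessRaw(bits, false, false)
    	dec.EndUtt()
    	hyp, score := dec.Hypothesis()
    
    	if score > -2500 {
    		log.Println("Predicted:", hyp, score)
    		handleAction(hyp)
    	}
    }
    
    // executeCommand runs an external command and logs any failure instead of ignoring it.
    func executeCommand(commands ...string) {
    	cmd := exec.Command(commands[0], commands[1:]...)
    	if err := cmd.Run(); err != nil {
    		log.Println("command failed:", commands, err)
    	}
    }
    
    // handleAction maps a recognized phrase to a system command.
    func handleAction(hyp string) {
    	switch hyp {
    	case "SLEEP":
    		executeCommand("loginctl", "lock-session")
    	case "WAKE UP":
    		executeCommand("loginctl", "unlock-session")
    	case "POWEROFF":
    		executeCommand("poweroff")
    	}
    }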
    
    func main() {
    	cfg := sphinx.NewConfig(
    		sphinx.HMMDirOption("/usr/local/share/pocketsphinx/model/en-us/en-us"),
    		sphinx.DictFileOption("6129.dic"),
    		sphinx.LMFileOption("6129.lm"),
    		sphinx.LogFileOption("commander.log"),
    	)
    	
    	dec, err := sphinx.NewDecoder(cfg)
    	if err != nil {
    		panic(err)
    	}
    
    	listen(dec)
    }

    This ties the dependency declaration to the source itself.

    As you can see by now, it’s simple, minimal, and yet sufficient and elegant. There is built-in support for both unit tests and benchmarks through `go test`, and profiles can be visualized as flame graphs with `go tool pprof`. Just like the feature set, the tooling has its downsides. For example, classic `go get` doesn’t support versions, and you are locked to the import URL in your source file. This is evolving: other tools have come up for dependency management, and Go modules, introduced in Go 1.11, added versioned dependencies.

    Golang was originally designed to solve the problems Google had with its massive code bases and its pressing need to write efficient concurrent applications. It makes it very easy to write applications and libraries that use the multi-core nature of modern processors, and it never gets in a developer’s way. It’s a simple, modern language, and it never tries to become anything more than that.

    Protobuf (Protocol Buffers)

    Protobuf or Protocol Buffers is a binary communication format by Google. It is used to serialize structured data. A communication format? Kind of like JSON? Yes. It’s more than 10 years old and Google has been using it for a while now.

    But don’t we have JSON and it’s so ubiquitous…

    Just like Golang, Protobuf doesn’t really solve anything new. It just solves existing problems more efficiently and in a modern way. Unlike Golang, though, it is not necessarily more elegant than the existing solutions. Here are Protobuf’s focus points:

    • It’s a binary format, unlike text-based JSON and XML, which makes it far more space efficient.
    • First-class, sophisticated support for schemas.
    • First-class support for generating parsing and consumer code in various languages.

    Binary format and speed

    So is Protobuf really that fast? The short answer is yes. According to Google Developers, Protobuf messages are 3 to 10 times smaller and 20 to 100 times faster to process than XML. That is not surprising for a binary format; the trade-off is that the serialized data is not human-readable.

    (Image Source: Beating JSON performance with Protobuf)

    Protobuf takes a more planned approach. You define `.proto` files, which are schema files of a sort but much more powerful. You define how your messages are structured, the data types of their fields, and (in proto2) which fields are optional or required; proto3 dropped `required`. The Protobuf compiler then generates the data access classes for you, which you use in your business logic to facilitate communication.

    Looking at a `.proto` file related to a service will also give you a very clear idea of the specifics of the communication and the features that are exposed. A typical .proto file looks like this:

    syntax = "proto2";
    
    message Person {
      required string name = 1;
      required int32 id = 2;
      optional string email = 3;
    
      enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
      }
    
      message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
      }
    
      repeated PhoneNumber phone = 4;
    }

    Fun fact: Jon Skeet, the king of Stack Overflow, is one of the main contributors to the project.

    gRPC

    gRPC, as you may have guessed, is a modern RPC (Remote Procedure Call) framework. It is a batteries-included framework with built-in support for load balancing, tracing, health checking, and authentication. Google open sourced it in 2015, and it has been gaining popularity ever since.

    An RPC framework…? What about REST…?

    SOAP with WSDL was used for a long time for communication between systems in a Service Oriented Architecture. At the time, contracts were strictly defined, and systems were big and monolithic, exposing a large number of such interfaces.

    Then came the concept of ‘browsing’, where the server and client don’t need to be tightly coupled. A client should be able to browse service offerings even if the two were coded independently. If the client requested information about a book, the service might, along with what was requested, also offer a list of related books for the client to browse. The REST paradigm was essential to this, as it lets the server and client communicate freely, without strict contracts, using a small set of primitive verbs.

    As you can see above, the service behaves like a monolithic system that, along with what is required, also does a number of other things to give the client the intended ‘browsing’ experience. But that is not always the use case, is it?

    Enter the Microservices

    There are many reasons to adopt a Microservice Architecture, the most prominent being that Monolithic systems are very hard to scale. When a big system is designed with a Microservices Architecture, each business or technical requirement is carried out as a cooperative composition of several primitive ‘micro’ services.

    These services don’t need to be comprehensive in their responses. They should perform specific duties with expected responses. Ideally, they should behave like pure functions for seamless composability.

    For such services, REST as a communication paradigm offers little benefit. Exposing a REST API does give a service a lot of expressive power, but if that power is neither required nor intended, we can use a paradigm that focuses on other factors instead.

    gRPC intends to improve upon the following technical aspects over traditional HTTP requests:

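    • HTTP/2 by default, with all its benefits, such as multiplexing many calls over one connection and header compression.
    • Protobuf as the message format, since machines, not humans, are doing the talking.
    • Dedicated support for streaming calls, thanks to HTTP/2.
    • Pluggable auth, tracing, load balancing, and health checking, because you always need these.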

    As it’s an RPC framework, we again have concepts like Service Definition and Interface Description Language which may feel alien to the people who were not there before REST but this time it feels a lot less clumsy as gRPC uses Protobuf for both of these.

    Protobuf is designed in such a way that it can be used as a communication format as well as a protocol specification tool without introducing anything new. A typical gRPC service definition looks like this:

    syntax = "proto3";
    
    service HelloService {
      rpc SayHello (HelloRequest) returns (HelloResponse);
    }
    
    message HelloRequest {
      string greeting = 1;
    }
    
    message HelloResponse {
      string reply = 1;
    }

    You write a `.proto` file for your service describing the interface name, what each call expects, and what it returns, as Protobuf messages. The Protobuf compiler, together with the gRPC plugin for your language, then generates both the client and server side code. Clients call the generated stubs directly, and the server side implements the generated interfaces to fill in the business logic.

    Conclusion

    Golang, along with gRPC using Protobuf is an emerging stack for modern server programming. Golang simplifies making concurrent/parallel applications and gRPC with Protobuf enables efficient communication with a pleasing developer experience.

  • Scalable Real-time Communication With Pusher

    What and why?

    Pusher is a hosted API service which makes adding real-time data and functionality to web and mobile applications seamless. 

    Pusher works as a real-time communication layer between the server and its clients. It keeps persistent WebSocket connections open to the clients, so whenever new data arrives on your server, the server can push it to clients instantly through Pusher. It is highly flexible, scalable, and easy to integrate, and Pusher provides 40+ SDKs that cover almost every tech stack.

    Other hosted and self-hosted services can also deliver real-time data. The right choice depends on the use case, for example whether you need to broadcast data to all users or to specific target groups. Pusher suited our use case well; we chose it for its ease of use, scalability, private and public channels, webhooks, and event-based automation. Other options we considered included Socket.IO, Firebase, and Ably.

    Pusher is well suited for communication and collaboration features built on WebSockets. The key difference is that Pusher is a hosted service/API: it takes less work to get started than alternatives where you manage the deployment yourself, and once the setup is done, Pusher takes care of scaling, which reduces future effort.

    Some of the most common use cases of Pusher are:

    1. Notification: Pusher can inform users of any relevant change. Notifications can also act as signals: they may have no visible representation in the UI, yet still trigger a reaction within the application.

    2. Activity streams: A stream of activities, published whenever something changes on the server or someone publishes an activity to a channel.

    3. Live Data Visualizations: Pusher allows you to broadcast continuously changing data when needed.

    4. Chats: You can use Pusher for peer to peer or peer to multichannel communication.

    In this blog, we will focus on Channels, Pusher’s pub/sub messaging API, in a JavaScript-based application. Pusher also offers the Chatkit and Beams (push notification) SDKs/APIs.

    • Chatkit is designed to make integrating chat into your app as simple as possible. It lets you add group chat and one-to-one chat to your app, along with file attachments and online indicators.
    • Beams is used to add push notifications to your mobile app. It includes SDKs to manage push tokens and send notifications seamlessly.

    Step 1: Getting Started

    Set up your account on the Pusher dashboard and get your free API keys.

    Image Source: Pusher

    1. Click on Channels
    2. Create an App. Add details based on the project and the environment
    3. Click on the App Keys tab to get the app keys.
    4. You can also check the getting started page. It will give code snippets to get you started.

    Add Pusher to your project:

    CODE: https://gist.github.com/velotiotech/f09f14363bacd51446d5318e5050d628.js

    or, for the Node.js server library, using npm

    npm i pusher

    Step 2: Subscribing to Channels

    There are three types of channels in Pusher: Public, Private, and Presence.

    • Public channels: Anyone who knows the channel name can subscribe to a public channel and start receiving its messages. Public channels are commonly used to broadcast general information that contains no sensitive or user-specific data.
    • Private channels: These channels have an access control mechanism that lets the server control who can subscribe to the channel and receive its data. All private channel names must start with the `private-` prefix. They are commonly used when the server needs to know and validate who is subscribing.
    • Presence channels: An extension of private channels, named with the `presence-` prefix. In addition to the properties of private channels, they let the server ‘register’ user information on subscription, which enables other members to see who is online.
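    Since the channel type is encoded in the name, a server can tell which subscriptions need authorization just by looking at the prefix. The sketch below is a hypothetical helper, not part of any Pusher SDK; note that the check must include the dash, so a name like "privateish" is still public.

```go
package main

import (
	"fmt"
	"strings"
)

// channelKind classifies a channel name by its prefix: "presence-" names are
// presence channels, "private-" names are private channels, the rest are public.
func channelKind(name string) string {
	switch {
	case strings.HasPrefix(name, "presence-"):
		return "presence"
	case strings.HasPrefix(name, "private-"):
		return "private"
	default:
		return "public"
	}
}

// needsAuth reports whether subscribing requires a call to the auth endpoint.
func needsAuth(name string) bool {
	return channelKind(name) != "public"
}

func main() {
	for _, name := range []string{"my-channel", "private-orders", "presence-room-1"} {
		fmt.Println(name, channelKind(name), needsAuth(name))
	}
}
```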

    In your application, you can create a subscription and start listening to events like this:

    // Connect with your app key and cluster (placeholders from the App Keys tab).
    var pusher = new Pusher('APP_KEY', { cluster: 'APP_CLUSTER' });
    
    // 'my-channel' is the channel name. Once you subscribe to the channel and
    // bind to an event, every 'my-event' published on it reaches this callback.
    var channel = pusher.subscribe('my-channel');
    
    channel.bind('my-event', function(data) {
      alert('An event was triggered with message: ' + data.message);
    });
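    Conceptually, subscribe and bind register a handler for an (event, channel) pair, and trigger delivers a payload to every handler registered for that pair. The sketch below is a toy, in-process model of that routing written only to illustrate the idea; Hub, Bind, and Trigger are hypothetical names, and this is not how Pusher is implemented (there is no network, WebSocket, or authorization here).

```go
package main

import (
	"fmt"
	"sync"
)

// Hub routes events to handlers keyed by channel name, then event name.
type Hub struct {
	mu       sync.RWMutex
	handlers map[string]map[string][]func(data map[string]string)
}

func NewHub() *Hub {
	return &Hub{handlers: make(map[string]map[string][]func(map[string]string))}
}

// Bind registers a handler for one event on one channel (like subscribe + bind).
func (h *Hub) Bind(channel, event string, fn func(map[string]string)) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.handlers[channel] == nil {
		h.handlers[channel] = make(map[string][]func(map[string]string))
	}
	h.handlers[channel][event] = append(h.handlers[channel][event], fn)
}

// Trigger delivers data to every handler bound to (channel, event), in bind
// order, and returns how many received it. Unheard events are simply dropped.
func (h *Hub) Trigger(channel, event string, data map[string]string) int {
	h.mu.RLock()
	fns := h.handlers[channel][event] // indexing a nil inner map yields nil
	h.mu.RUnlock()
	for _, fn := range fns {
		fn(data)
	}
	return len(fns)
}

func main() {
	hub := NewHub()
	hub.Bind("my-channel", "my-event", func(d map[string]string) {
		fmt.Println("An event was triggered with message:", d["message"])
	})
	hub.Trigger("my-channel", "my-event", map[string]string{"message": "hello world"})
}
```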

    Step 3: Creating Channels

    Channels don’t need to be created explicitly: Pusher instantiates a channel on demand as soon as a client subscribes to it. To publish events, first create an app on your Pusher dashboard. You can then trigger events to it from the dashboard, or integrate Pusher with your server (for details, see the Server API documentation). Here is a sample snippet from our Node app:

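    var Pusher = require('pusher');
    
    // Server-side client; replace the placeholders with the values from the App Keys tab.
    var pusher = new Pusher({
      appId: 'APP_ID',
      key: 'APP_KEY',
      secret: 'APP_SECRET',
      cluster: 'APP_CLUSTER'
    });
    
    // Your application logic decides when to publish; this triggers 'my-event'
    // on 'my-channel' with a small JSON payload.
    function trigger() {
      // ... application logic ...
      pusher.trigger('my-channel', 'my-event', { message: 'hello world' });
    }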
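    Under the hood, a server library turns a trigger call into an HTTPS request to Pusher’s REST API. As we understand the HTTP API reference, the event-trigger body carries the event name, the target channels, and the payload as a JSON-encoded string. The Go sketch below builds only that body; the field names are our reading of the docs, request signing is deliberately omitted, and triggerBody and buildTriggerBody are hypothetical names, so treat it as an illustration rather than a client.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// triggerBody mirrors the assumed JSON body of an event-trigger request.
// Note that Data is a string that itself contains JSON.
type triggerBody struct {
	Name     string   `json:"name"`
	Channels []string `json:"channels"`
	Data     string   `json:"data"`
}

// buildTriggerBody encodes payload as JSON, then wraps it as the data string.
func buildTriggerBody(event string, channels []string, payload interface{}) ([]byte, error) {
	data, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	return json.Marshal(triggerBody{Name: event, Channels: channels, Data: string(data)})
}

func main() {
	body, err := buildTriggerBody("my-event", []string{"my-channel"},
		map[string]string{"message": "hello world"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// {"name":"my-event","channels":["my-channel"],"data":"{\"message\":\"hello world\"}"}
}
```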

    Step 4: Adding Security

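    By default, anyone who knows your public app key can open a connection to your Channels app. This does not add any security risk by itself, because such a connection can only receive data on public channels; subscribing to private and presence channels still requires authorization from your server.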

    For more advanced use cases, you need the “Authorized Connections” feature. It authorizes every connection to your channels and hence prevents unwanted or unauthorized connections. To enable authorization, set up an auth endpoint, then modify your client code to look like this:

    const channels = new Pusher(APP_KEY, {
      cluster: APP_CLUSTER,
      authEndpoint: '/your_auth_endpoint'
    });
    
    const channel = channels.subscribe('private-<channel-name>');

    For more details on how to create an auth endpoint on your server, read this. Here is a snippet from a Node.js app:

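    var express = require('express');
    var bodyParser = require('body-parser');
    var Pusher = require('pusher');
    
    // Server-side client used to sign subscription requests (placeholders from the App Keys tab).
    var pusher = new Pusher({
      appId: 'APP_ID',
      key: 'APP_KEY',
      secret: 'APP_SECRET',
      cluster: 'APP_CLUSTER'
    });
    
    var app = express();
    app.use(bodyParser.json());
    app.use(bodyParser.urlencoded({ extended: false }));
    
    // The client library POSTs socket_id and channel_name here when subscribing
    // to a private or presence channel. In a real app, check that the logged-in
    // user may access `channel` before signing.
    app.post('/pusher/auth', function(req, res) {
      var socketId = req.body.socket_id;
      var channel = req.body.channel_name;
      var auth = pusher.authenticate(socketId, channel);
      res.send(auth);
    });
    
    var port = process.env.PORT || 5000;
    app.listen(port);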
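    What the auth endpoint returns is a signature, not a secret. As we understand Pusher’s authentication docs, the auth string for a private channel is the app key, a colon, and the hex-encoded HMAC-SHA256 of "socket_id:channel_name" keyed with the app secret (presence channels also sign the channel data). The Go sketch below reproduces that scheme with the standard library; channelAuth and authResponse are hypothetical names and the credentials are placeholders, so check it against the official docs before relying on it.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// channelAuth returns "<key>:<signature>", where signature is the hex-encoded
// HMAC-SHA256 of "<socketID>:<channel>" keyed with the app secret.
func channelAuth(key, secret, socketID, channel string) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(socketID + ":" + channel)) // hash.Hash writes never return an error
	return key + ":" + hex.EncodeToString(mac.Sum(nil))
}

// authResponse is the JSON body an auth endpoint would send back for a private channel.
func authResponse(key, secret, socketID, channel string) ([]byte, error) {
	return json.Marshal(map[string]string{"auth": channelAuth(key, secret, socketID, channel)})
}

func main() {
	// Placeholder credentials; the secret must stay on the server.
	body, err := authResponse("APP_KEY", "APP_SECRET", "1234.1234", "private-orders")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

    Because the signature binds the socket ID to the channel name, a client cannot reuse it for another channel or another connection, and only a server holding the secret can produce it.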

    Step 5: Scale as you grow

     

    Pusher offers a wide range of plans that you can choose from based on your usage, so you can scale your application as it grows. Here is a snapshot of the available plans; for more details, refer to this page.

    Image Source: Pusher

    Conclusion

    This article has covered a brief description of Pusher, its use cases, and how you can use it to build a scalable real-time application. Whether Pusher is the right choice depends on your use case; there is no single answer. Pusher’s approach is simple and API-based, and it enables developers to add real-time functionality to any application in very little time.

    If you want to get hands-on tutorials/blogs, please visit here.