How Much Do You Really Know About Simplified Cloud Deployments?

Is your EC2/VM bill giving you sleepless nights?

Are your EC2 instances under-utilized? Have you been wondering whether there is an easy way to maximize EC2/VM utilization?

Are you investing too much in your control plane, and do you wish you could divert some of that investment toward developing more features (business logic) in your applications?

Is your configuration management system overwhelming you, as if it has taken on a life of its own?

Do you have legacy applications that do not need Docker at all?

Would you like to simplify your deployment toolchain to streamline your workflows?

Has Kubernetes been recommended to you as the fix for all your woes, but you aren’t sure it is actually going to help you?

Do you feel you are moving towards Docker just so that you can use Kubernetes?

If you answered “Yes” to any of the questions above, read on; this article may be just what you need.

Step-by-step instructions for creating a simple setup on your laptop are at the end of the article.

Introduction

In the following article, we will present the typical components of a multi-tier application and how it is set up and deployed.

We will then see how the same application deployment can be remodeled to scale on any cloud infrastructure. (The same software toolchain can also be used to deploy the application on your on-premise infrastructure.)

The tools we propose are Nomad and Consul. We will focus on how to use these tools rather than deep-dive into their specifics, and will only briefly cover the features that help us achieve our goals.

Nomad is a distributed workload manager not only for Docker containers but also for various other types of workloads, such as legacy applications, Java, LXC, etc.

More about Nomad Drivers here: Nomadproject.io, application delivery with HashiCorp, introduction to HashiCorp Nomad.

Consul is a distributed service mesh with features such as a service registry and a key-value store, among others.

Using these tools, the application/startup workflow would be as follows:

Nomad will be responsible for starting the service.

Nomad will publish the service information in Consul. The service information will include details like:

Where is the application running (IP:PORT)?
What “service-name” is used to identify the application?
What “tags” (metadata) does this application have?
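
To make those three pieces of information concrete, here is a sketch of a hand-written Consul service definition carrying the same data. In the Nomad workflow you do not write this file; Nomad registers the equivalent entry for you. The address and port are borrowed from the demo output later in this article, the `urlprefix-/foo` tag assumes Fabio (the Consul-aware load-balancer used in the demo) as its consumer, and the file name `foo-service.json` is hypothetical.

```shell
# Hypothetical, hand-written Consul service definition for service "foo".
# Nomad registers an equivalent entry automatically; this is only an illustration.
cat > foo-service.json <<'EOF'
{
  "service": {
    "name": "foo",
    "address": "192.168.1.201",
    "port": 23382,
    "tags": ["urlprefix-/foo"]
  }
}
EOF
# Outside of Nomad, a file like this could be registered with the local agent via:
#   consul services register foo-service.json
```

The `name` answers the “service-name” question, `address`/`port` answer the IP:PORT question, and `tags` carry the metadata.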

A Typical Application

A typical application deployment consists of a fixed set of processes, usually coupled with a database and a few (or many) peripheral services.

These services could be primary (must-have) or support (optional) features of the application.

Note: We are aware of what a proper “service-oriented architecture” should look like, but we will skip that discussion for now and focus instead on how real-world applications are set up and deployed.

Simple Multi-tier Application

In this section, let’s see the components of a multi-tier application along with typical access patterns from outside the system and within the system.

Load Balancer/Web/Front End Tier
Application Services Tier
Database Tier
Utility (or Helper) Servers: to run background, cron, or queued jobs.

Using a proxy/load-balancer, the services (Service-A, Service-B, Service-C) could be accessed via distinct hostnames:

a.example.tld
b.example.tld
c.example.tld

With path-based routing, the setup is similar; instead of distinct hostnames, each service is reached via a distinct path on a common hostname:

common-proxy.example.tld/path-a/
common-proxy.example.tld/path-b/
common-proxy.example.tld/path-c/
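
Assuming Fabio as the Consul-aware proxy (it is the one used in the demo at the end of this article), each routing style maps to a different service tag of the form `urlprefix-<host><path>`. The helper `route_tag` below is hypothetical; it simply prints the tag each service would advertise under either scheme.

```shell
# route_tag STYLE LETTER  (HYPOTHETICAL helper)
# Prints the Fabio "urlprefix-" tag a service would advertise.
route_tag() {
  case "$1" in
    host) printf 'urlprefix-%s.example.tld/\n' "$2" ;;                  # one hostname per service
    path) printf 'urlprefix-common-proxy.example.tld/path-%s/\n' "$2" ;; # one path per service
    *) echo "usage: route_tag host|path LETTER" >&2; return 1 ;;
  esac
}

for svc in a b c; do
  echo "service-$svc: $(route_tag host "$svc")  or  $(route_tag path "$svc")"
done
```

Either way, the routing decision travels with the service registration rather than living in the proxy’s own configuration.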

Problem Scenario 1

Some of the basic problems with the deployment of the simple multi-tier application are:

What if the service process crashes during its runtime?
What if the host on which the services run shuts down, reboots or terminates?

This is where Nomad’s ability to keep a service running is useful: it restarts a crashed task and, if a host goes away, reschedules the work onto another eligible machine.

Even with this auto-restart feature, problems can arise if the service restarts on a different machine (i.e., with a different IP address).

With Docker and ephemeral (dynamically assigned) ports, the service could also start on a different port.

To solve this, we will use Consul’s service discovery feature, combined with a Consul-aware load-balancer/proxy that redirects traffic to the appropriate service instance.

The order of the operations within the Nomad job will thus be:

Nomad will launch the job/task.
Nomad will register the task details as a service definition in Consul.
(These steps will be re-executed if/when the application is restarted due to a crash/fail-over)
The Consul-aware load-balancer will route the traffic to the service (IP:PORT)
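
The steps above map onto a Nomad job specification. The demo’s foo_docker.nomad is not reproduced in this article, so the following is only a sketch in Nomad 0.11-era syntax, chosen to be consistent with the allocation output shown later (group and task `gowebhello`, 500 MHz / 256 MiB, a dynamic `http` port on worker nodes). The Docker image name and the container port 8080 are placeholders, the `urlprefix-/foo` tag assumes Fabio as the load-balancer, and the file name avoids overwriting the demo’s own file.

```shell
# Sketch of a Nomad job (Nomad 0.11-era syntax) for service "foo".
# ASSUMPTIONS: the image name and container port 8080 are placeholders.
cat > foo_docker.sketch.nomad <<'EOF'
job "foo_docker" {
  datacenters = ["dc1"]
  type        = "service"

  group "gowebhello" {
    count = 2

    # Only place this service on worker nodes (node_class = "worker").
    constraint {
      attribute = "${node.class}"
      value     = "worker"
    }

    task "gowebhello" {
      driver = "docker"

      config {
        image = "example/gowebhello:0.7"   # placeholder image
        port_map {
          http = 8080                      # placeholder container port
        }
      }

      resources {
        cpu    = 500
        memory = 256
        network {
          port "http" {}                   # dynamic (ephemeral) host port
        }
      }

      # Nomad registers this in Consul as service "foo", with the
      # allocation's IP:PORT and the tags below; it re-registers on restart.
      service {
        name = "foo"
        port = "http"
        tags = ["urlprefix-/foo"]

        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
EOF
# Submit with: nomad run foo_docker.sketch.nomad
```

Because the host port is dynamic, nothing outside Consul needs to know it in advance.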

Multi-tier Application With Load Balancer

With a Consul-aware load-balancer in place, the setup now consists of:

A Consul-aware load-balancer/proxy, through which the application accesses the services
Three instances of service A: A1, A2, A3
Three instances of service B: B1, B2, B3

The Routing Question

At this point, you may be wondering: “How would the load-balancer know that traffic for service A must go to A1/A2/A3, and traffic for service B to B1/B2/B3?”

The answer lies in the Consul tags which will be published as part of the service definition (when Nomad registers the service in Consul).

The appropriate Consul tags tell the load-balancer to route traffic for a particular service to the appropriate backends.

Let’s read that statement again (very slowly, just to be sure): the Consul tags, which are part of the service definition, inform (advertise to) the load-balancer where to route traffic.

This distinction is worth dwelling on, because it differs from how classic load-balancers/proxies such as HAProxy or NGINX are configured. With HAProxy/NGINX, the backend routing information resides with the load-balancer instance and is not “advertised” by the backends.

Traditional load-balancers like NGINX/HAProxy are typically configured with a static list of backends and do not natively reload it when backends stop, start, or move around. The heavy lifting of regenerating the configuration file and reloading the service is left to an external tool such as Consul-Template.
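
For contrast, here is roughly what that external workaround looks like: a Consul-Template template that renders an NGINX `upstream` block from the Consul catalog, plus the command that re-renders it and reloads NGINX whenever the instances change. The service name and file paths are assumptions for illustration.

```shell
# Consul-Template template rendering an NGINX upstream for service "foo".
# ASSUMPTION: file names and the service name are illustrative.
cat > foo-upstream.ctmpl <<'EOF'
upstream foo {
{{- range service "foo" }}
  server {{ .Address }}:{{ .Port }};
{{- end }}
}
EOF

# Run consul-template so that every change in the "foo" instances
# re-renders the file and then reloads NGINX:
#   consul-template -template "foo-upstream.ctmpl:/etc/nginx/conf.d/foo-upstream.conf:nginx -s reload"
```

This is an extra moving part (one more daemon and a reload on every change) that a Consul-aware load-balancer avoids.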

Using a Consul-aware load-balancer instead of a traditional one eliminates the need for such external workarounds.

The setup can thus be described as zero-configuration: you don’t have to reconfigure the load-balancer, because it discovers the changing backend services from the information available in Consul.
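
A simplified illustration of that idea (not Fabio’s actual implementation): the proxy reads every registered instance (service name, address, port, tag) and turns each `urlprefix-` tag into a route. The function name `build_routes` and its input format are hypothetical; the output mimics Fabio’s `route add <service> <source> <destination>` syntax, and the sample instances are the foo and bar allocations from the demo output.

```shell
# build_routes: HYPOTHETICAL helper illustrating tag-driven routing.
# Input (stdin): one instance per line as "SERVICE ADDRESS PORT TAG".
# Output: Fabio-style "route add" lines for tags starting with "urlprefix-".
build_routes() {
  while read -r svc addr port tag; do
    case "$tag" in
      urlprefix-*)
        # Strip the "urlprefix-" marker to get the route source (host/path).
        printf 'route add %s %s http://%s:%s/\n' "$svc" "${tag#urlprefix-}" "$addr" "$port"
        ;;
    esac
  done
}

printf '%s\n' \
  "foo 192.168.1.201 23382 urlprefix-/foo" \
  "bar 192.168.1.201 23646 urlprefix-/bar" | build_routes
# route add foo /foo http://192.168.1.201:23382/
# route add bar /bar http://192.168.1.201:23646/
```

When an instance moves, only its registration changes; the route table is simply rebuilt from the new data.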

Problem Scenario 2

So far we have achieved a method to “automatically” discover the backends, but isn’t the Load-Balancer itself a single-point-of-failure (SPOF)?

It absolutely is, and you should always run redundant load-balancer instances (as any cloud-provided load-balancer does).

Since cloud-provided load-balancers come at a cost, we will run our own load-balancers instead of using them.

To provide redundancy for the load-balancer instances, run them in an Auto Scaling group (AWS), a Virtual Machine Scale Set (Azure), or the equivalent on your platform.

Use the same redundancy strategy for the worker nodes, where the actual services run, by placing them in Auto Scaling groups/VMSS as well.
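
On AWS, for example, a redundant pool could be an Auto Scaling group with a fixed minimum size. The sketch below only prints the AWS CLI call by default (set `DRY_RUN=0` to execute it); the group name, launch template, and subnet IDs are hypothetical placeholders, and the launch template is assumed to exist already.

```shell
# run: print the command unless DRY_RUN=0, in which case execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# create_asg NAME LAUNCH_TEMPLATE SUBNETS MIN MAX  (HYPOTHETICAL helper)
create_asg() {
  run aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name "$1" \
    --launch-template "LaunchTemplateName=$2,Version=\$Latest" \
    --vpc-zone-identifier "$3" \
    --min-size "$4" --max-size "$5" --desired-capacity "$4"
}

# Two load-balancer instances spread over two subnets (placeholder IDs).
create_asg lb-asg lb-template "subnet-aaaa1111,subnet-bbbb2222" 2 2
```

Spreading the group over several subnets (availability zones) means a single zone failure does not take out every load-balancer.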

The Complete Picture

Installation and Configuration

Since laptops are quite powerful nowadays, you can easily create a test setup on yours using VirtualBox, VMware Workstation Player, VMware Workstation, etc.

As a prerequisite, you will need a few virtual machines which can communicate with each other.

NOTE: Create the VMs with networking set to bridged mode.

The machines needed for the simple setup/demo would be:

1 Linux VM to act as a server (srv1)
1 Linux VM to act as a load-balancer (lb1)
2 Linux VMs to act as worker machines (client1, client2)

NOTE: Each machine can have 2 CPUs and 1 GB of memory.

The configuration files and scripts needed for the demo, which will help you set up the Nomad and Consul cluster, are available in the shantanugadgil/hashistack repository on GitHub.

Set Up the Server

Install the binaries on the server

# install the Consul binary
wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
unzip -o consul.zip
sudo chown root:root consul
sudo mv -fv consul /usr/sbin/

# install the Nomad binary
wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
unzip -o nomad.zip
sudo chown root:root nomad
sudo mv -fv nomad /usr/sbin/

# install Consul's service file
wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
sudo chown root:root consul.service
sudo mv -fv consul.service /etc/systemd/system/consul.service

# install Nomad's service file
wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
sudo chown root:root nomad.service
sudo mv -fv nomad.service /etc/systemd/system/nomad.service
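
Because these same install steps are repeated on every machine, you might wrap them in small shell functions. The function names below are hypothetical; the URLs follow the same releases.hashicorp.com and hashistack repository layout used in the commands above.

```shell
# hashicorp_zip_url PRODUCT VERSION -> release zip URL (linux/amd64)
hashicorp_zip_url() {
  printf 'https://releases.hashicorp.com/%s/%s/%s_%s_linux_amd64.zip\n' "$1" "$2" "$1" "$2"
}

# install_hashicorp_binary PRODUCT VERSION: download, unzip, move to /usr/sbin
install_hashicorp_binary() {
  wget "$(hashicorp_zip_url "$1" "$2")" -O "$1.zip" &&
    unzip -o "$1.zip" &&
    sudo chown root:root "$1" &&
    sudo mv -fv "$1" /usr/sbin/
}

# install_unit_file PRODUCT: fetch the systemd unit from the hashistack repo
install_unit_file() {
  wget "https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/$1.service" -O "$1.service" &&
    sudo chown root:root "$1.service" &&
    sudo mv -fv "$1.service" "/etc/systemd/system/$1.service"
}

# Usage on each machine (not executed here):
#   install_hashicorp_binary consul 1.7.3
#   install_hashicorp_binary nomad 0.11.3
#   install_unit_file consul
#   install_unit_file nomad
```

Chaining each step with `&&` stops a function at the first failed download or move instead of continuing with a missing file.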

Create the Server Configuration

### On the server machine ...

### Consul 
sudo mkdir -p /etc/consul/
sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/server.hcl -O /etc/consul/server.hcl

### Edit Consul's server.hcl file and set the fields 'encrypt' and 'retry_join' for your cluster.
sudo vim /etc/consul/server.hcl

### Nomad
sudo mkdir -p /etc/nomad/
sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/server.hcl -O /etc/nomad/server.hcl

### Edit Nomad's server.hcl file and set the fields 'encrypt' and 'retry_join' for your cluster.
sudo vim /etc/nomad/server.hcl

### After you are done with the edits ...

sudo systemctl daemon-reload
sudo systemctl enable consul nomad
sudo systemctl restart consul nomad
sleep 10
sudo consul members
sudo nomad server members
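
The `encrypt` field takes a gossip encryption key that must be identical on every Consul agent in the cluster; `consul keygen` generates one. To illustrate the fields involved, the following sketch writes a minimal single-server Consul configuration. It is not the repository’s server.hcl; the data directory, the placeholder key, and the use of srv1’s address from the test section are assumptions.

```shell
# Generate the gossip key once and reuse it on all agents (requires consul):
#   consul keygen
# ASSUMPTIONS: data_dir, bind_addr, and the placeholder key below.
cat > consul-server.example.hcl <<'EOF'
datacenter       = "dc1"
data_dir         = "/var/lib/consul"
server           = true
bootstrap_expect = 1
ui               = true
bind_addr        = "192.168.1.11"
client_addr      = "0.0.0.0"
encrypt          = "REPLACE_WITH_OUTPUT_OF_consul_keygen"
# Addresses of the Consul servers to join; a multi-server cluster lists them all.
retry_join       = ["192.168.1.11"]
EOF
```

With more servers, raise `bootstrap_expect` to the number of servers so the cluster waits for all of them before electing a leader.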

Set Up the Load-Balancer

Install the binaries on the load-balancer

# install the Consul binary
wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
unzip -o consul.zip
sudo chown root:root consul
sudo mv -fv consul /usr/sbin/

# install the Nomad binary
wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
unzip -o nomad.zip
sudo chown root:root nomad
sudo mv -fv nomad /usr/sbin/

# install Consul's service file
wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
sudo chown root:root consul.service
sudo mv -fv consul.service /etc/systemd/system/consul.service

# install Nomad's service file
wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
sudo chown root:root nomad.service
sudo mv -fv nomad.service /etc/systemd/system/nomad.service

Create the Load-Balancer Configuration

### On the load-balancer machine ...

### for Consul 
sudo mkdir -p /etc/consul/
sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/client.hcl -O /etc/consul/client.hcl

### Edit Consul's client.hcl file and set the fields 'name', 'encrypt', 'retry_join' for your cluster.
sudo vim /etc/consul/client.hcl

### for Nomad ...
sudo mkdir -p /etc/nomad/
sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/client.hcl -O /etc/nomad/client.hcl

### Edit Nomad's client.hcl file and set the fields 'name', 'node_class', 'encrypt', 'retry_join' for your cluster.
sudo vim /etc/nomad/client.hcl

### After you are done with the edits ...

sudo systemctl daemon-reload
sudo systemctl enable consul nomad
sudo systemctl restart consul nomad
sleep 10
sudo consul members
sudo nomad server members
sudo nomad node status -verbose
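
The `node_class` value is what lets jobs target specific machines: in the demo output below, lb1 reports class `lb` and the workers report `worker`. A minimal sketch of the Nomad client fields for lb1 might look like this. It is not the repository’s client.hcl; the data directory and the server address (srv1 from the test section) are assumptions. On the workers, use their own `name` and `node_class = "worker"`.

```shell
# ASSUMPTIONS: data_dir and the server address (srv1 from the test section).
cat > nomad-client-lb1.example.hcl <<'EOF'
name       = "lb1"
datacenter = "dc1"
data_dir   = "/var/lib/nomad"

client {
  enabled    = true
  node_class = "lb"   # jobs can constrain on ${node.class} == "lb"

  server_join {
    retry_join = ["192.168.1.11"]
  }
}
EOF
```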

Set Up the Client (Worker) Machines

Install the binaries on each client (worker) machine

# install the Consul binary
wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
unzip -o consul.zip
sudo chown root:root consul
sudo mv -fv consul /usr/sbin/

# install the Nomad binary
wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
unzip -o nomad.zip
sudo chown root:root nomad
sudo mv -fv nomad /usr/sbin/

# install Consul's service file
wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
sudo chown root:root consul.service
sudo mv -fv consul.service /etc/systemd/system/consul.service

# install Nomad's service file
wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
sudo chown root:root nomad.service
sudo mv -fv nomad.service /etc/systemd/system/nomad.service

Create the Worker Configuration

### On the client (worker) machine ...

### Consul 
sudo mkdir -p /etc/consul/
sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/client.hcl -O /etc/consul/client.hcl

### Edit Consul's client.hcl file and set the fields 'name', 'encrypt', 'retry_join' for your cluster.
sudo vim /etc/consul/client.hcl

### Nomad
sudo mkdir -p /etc/nomad/
sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/client.hcl -O /etc/nomad/client.hcl

### Edit Nomad's client.hcl file and set the fields 'name', 'node_class', 'encrypt', 'retry_join' for your cluster.
sudo vim /etc/nomad/client.hcl

### After you are sure about your edits ...

sudo systemctl daemon-reload
sudo systemctl enable consul nomad
sudo systemctl restart consul nomad
sleep 10
sudo consul members
sudo nomad server members
sudo nomad node status -verbose

Test the Setup

For simplicity, we will assume the following IP addresses for the machines. (Adapt the IPs to your actual cluster configuration.)

srv1: 192.168.1.11

lb1: 192.168.1.101

client1: 192.168.1.201

client2: 192.168.1.202

You can access the web GUI for Consul and Nomad at the following URLs:

Consul: http://192.168.1.11:8500

Nomad: http://192.168.1.11:4646

# watch -n 5 "consul members; echo; nomad server members; echo; nomad node status -verbose; echo; nomad job status"

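Output (captured with Consul 1.5.1 and Nomad 0.9.3, so the Build and Version columns will differ from the versions installed above):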

Node     Address             Status  Type    Build  Protocol  DC   Segment
srv1     192.168.1.11:8301   alive   server  1.5.1  2         dc1  <all>
client1  192.168.1.201:8301  alive   client  1.5.1  2         dc1  <default>
client2  192.168.1.202:8301  alive   client  1.5.1  2         dc1  <default>
lb1      192.168.1.101:8301  alive   client  1.5.1  2         dc1  <default>

Name         Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
srv1.global  192.168.1.11  4648  alive   true    2         0.9.3  dc1         global

ID           DC   Name     Class   Address        Version Drain  Eligibility  Status
37daf354...  dc1  client2  worker  192.168.1.202  0.9.3  false  eligible     ready
9bab72b1...  dc1  client1  worker  192.168.1.201  0.9.3  false  eligible     ready
621f4411...  dc1  lb1      lb      192.168.1.101  0.9.3  false  eligible     ready
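
Instead of eyeballing the `watch` output, you could script the check. The helpers below are hypothetical: `count_alive` counts the members whose Status column reads `alive` in `consul members` output, and `wait_for_members` polls until a given number are alive. With the four machines above, the count should reach 4 once the cluster has formed.

```shell
# count_alive: read `consul members` output on stdin, count rows with Status "alive".
count_alive() {
  awk 'NR > 1 && $3 == "alive" { n++ } END { print n + 0 }'
}

# wait_for_members N: poll every 5 s (up to ~150 s) until N members are alive.
wait_for_members() {
  tries=0
  until [ "$(consul members | count_alive)" -ge "$1" ]; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && return 1
    sleep 5
  done
}

# Usage on srv1 (not executed here):
#   wait_for_members 4 && echo "cluster formed"
```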

Submit Jobs

Run the load-balancer job

# nomad run fabio_docker.nomad

Output:

==> Monitoring evaluation "bb140467"
    Evaluation triggered by job "fabio_docker"
    Allocation "1a6a5587" created: node "621f4411", group "fabio"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "bb140467" finished with status "complete"

Check the status of the load-balancer

# nomad alloc status 1a6a5587

Output:

ID                  = 1a6a5587
Eval ID             = bb140467
Name                = fabio_docker.fabio[0]
Node ID             = 621f4411
Node Name           = lb1
Job ID              = fabio_docker
Job Version         = 0
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 1m9s ago
Modified            = 1m3s ago

Task "fabio" is "running"
Task Resources
CPU        Memory          Disk     Addresses
5/200 MHz  10 MiB/128 MiB  300 MiB  lb: 192.168.1.101:9999
                                    ui: 192.168.1.101:9998

Task Events:
Started At     = 2019-06-13T19:15:17Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-06-13T19:15:17Z  Started     Task started by client
2019-06-13T19:15:12Z  Driver      Downloading image
2019-06-13T19:15:12Z  Task Setup  Building Task Directory
2019-06-13T19:15:12Z  Received    Task received by client
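
The allocation above shows fabio running on lb1 with 200 MHz / 128 MiB and static ports 9999 (`lb`) and 9998 (`ui`). The article does not reproduce fabio_docker.nomad; a sketch consistent with that output, in Nomad 0.11-era syntax and modeled on the common pattern of running the `fabiolb/fabio` image with host networking so it can reach the local Consul agent, might look like this. The `system` job type and the `lb` node-class constraint are assumptions, and the file name avoids overwriting the demo’s own file.

```shell
# Sketch of a Fabio job (Nomad 0.11-era syntax); type and constraint are assumptions.
cat > fabio_docker.sketch.nomad <<'EOF'
job "fabio_docker" {
  datacenters = ["dc1"]
  type        = "system"   # one Fabio per eligible node

  group "fabio" {
    # Run only on load-balancer nodes (node_class = "lb").
    constraint {
      attribute = "${node.class}"
      value     = "lb"
    }

    task "fabio" {
      driver = "docker"

      config {
        image        = "fabiolb/fabio"
        network_mode = "host"   # reach the local Consul agent on 127.0.0.1:8500
      }

      resources {
        cpu    = 200
        memory = 128
        network {
          port "lb" {
            static = 9999       # proxy listener
          }
          port "ui" {
            static = 9998       # web UI / routes page
          }
        }
      }
    }
  }
}
EOF
```

A `system` job places one Fabio on every `lb`-class node, so adding a second load-balancer machine (for example through an Auto Scaling group) automatically adds a second Fabio.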

Run the service ‘foo’

# nomad run foo_docker.nomad

Output:

==> Monitoring evaluation "a994bbf0"
    Evaluation triggered by job "foo_docker"
    Allocation "7794b538" created: node "9bab72b1", group "gowebhello"
    Allocation "eecceffc" modified: node "37daf354", group "gowebhello"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "a994bbf0" finished with status "complete"

Check the status of service ‘foo’

# nomad alloc status 7794b538

Output:

ID                  = 7794b538
Eval ID             = a994bbf0
Name                = foo_docker.gowebhello[1]
Node ID             = 9bab72b1
Node Name           = client1
Job ID              = foo_docker
Job Version         = 1
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 9s ago
Modified            = 7s ago

Task "gowebhello" is "running"
Task Resources
CPU        Memory           Disk     Addresses
0/500 MHz  4.2 MiB/256 MiB  300 MiB  http: 192.168.1.201:23382

Task Events:
Started At     = 2019-06-13T19:27:17Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-06-13T19:27:17Z  Started     Task started by client
2019-06-13T19:27:16Z  Task Setup  Building Task Directory
2019-06-13T19:27:15Z  Received    Task received by client

Run the service ‘bar’

# nomad run bar_docker.nomad

Output:

==> Monitoring evaluation "075076bc"
    Evaluation triggered by job "bar_docker"
    Allocation "9f16354b" created: node "9bab72b1", group "gowebhello"
    Allocation "b86d8946" created: node "37daf354", group "gowebhello"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "075076bc" finished with status "complete"

Check the status of service ‘bar’

# nomad alloc status 9f16354b

Output:

ID                  = 9f16354b
Eval ID             = 075076bc
Name                = bar_docker.gowebhello[1]
Node ID             = 9bab72b1
Node Name           = client1
Job ID              = bar_docker
Job Version         = 0
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 4m28s ago
Modified            = 4m16s ago

Task "gowebhello" is "running"
Task Resources
CPU        Memory           Disk     Addresses
0/500 MHz  6.2 MiB/256 MiB  300 MiB  http: 192.168.1.201:23646

Task Events:
Started At     = 2019-06-14T06:49:36Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-06-14T06:49:36Z  Started     Task started by client
2019-06-14T06:49:35Z  Task Setup  Building Task Directory
2019-06-14T06:49:35Z  Received    Task received by client

Check the Fabio Routes

http://192.168.1.101:9998/routes

Connect to the Services

The services “foo” and “bar” are available at:

http://192.168.1.101:9999/foo

http://192.168.1.101:9999/bar

Output:

gowebhello root page

https://github.com/udhos/gowebhello is a simple golang replacement for 'python -m SimpleHTTPServer'.
Welcome!
gowebhello version 0.7 runtime go1.12.5 os=linux arch=amd64
Keepalive: true
Application banner: Welcome to FOO
...
...

Refreshing the page (F5) should change which backend instance you end up connected to.

Conclusion

This article should give you a fair idea of the common problems of a distributed application and how they can be solved.

Remodeling an existing application deployment as it scales can be quite a challenge. Hopefully the sample setup will help you explore, design, and optimize the deployment workflows of your application, be it on-premise or in any cloud environment.

December 12, 2022

Set Up Simple S3 Deployment Workflow with Github Actions and CircleCI
In this article, we’ll implement a continuous delivery (CD) workflow for a demo React single-page application (SPA), using the Serverless framework with Serverless Finch.

Deploying single-page applications to AWS S3 is a common use case. Manual deployment and bucket configuration can be tedious and unreliable. By using Serverless and CD platforms, we can simplify this commonly faced CD challenge.

In almost every project we have worked on, we have built a general-purpose continuous integration (CI) setup as part of the basic project setup. CI requirements can range from simple test workflows to cluster deployments.

In this article, we’ll be focusing on a simple deployment workflow using Github Actions and CircleCI. Github Actions brought CI/CD to a wider community by simplifying the setup for CI pipelines.

Prerequisites

This article assumes you have a basic understanding of CI/CD and AWS services such as IAM and S3. The sample application is a basic Create React App project, but knowing React.js is not required. You can implement the same flow for any other SPA or bare-bones application.

Why Github Actions?

There have always been great CI tools and platforms, such as AWS CodePipeline, Jenkins, Travis CI, and CircleCI. What makes Github Actions so compelling is that it’s built into Github. Many organizations use Github for source control, and they often have to spend time connecting their repositories to a separate CI tool. On top of that, getting started with Github Actions is free.

Because Github Actions is built into the Github ecosystem, it’s a piece of cake to get CI pipelines up and running. Github Actions also allows you to build your own actions. However, the platform is quite new compared to the others, so it still has some limitations.

Why CircleCI?

CircleCI has been providing CI/CD solutions for almost a decade. One of the many reasons to choose CircleCI is its pricing: it offers free credits each month without any upfront payment or payment details. It also offers a wide-ranging repository of reusable configuration packages called Orbs, and building your own orbs is easy. Its workflow-building tools are simple and reliable, and it offers many other features as well.

Let’s Get Started

To demonstrate the workflow, we’ll create a simple React application with a master-detail flow. We’ll use Create React App (CRA), React’s official tool for bootstrapping projects, to generate the boilerplate for us.

Installing Dependencies

Let’s install create-react-app as a global package and use it to create our demo project, which we’ll call “Serverless S3”:
```
yarn global add create-react-app
create-react-app serverless-s3
```
Now that we’ve created the frontend application, we can start building something cool with it. If we run the application with yarn start, we should see the default CRA welcome page.

To implement the master-detail flow for Github repositories, we need to add routing to our app, and we’ll use react-router for that. To keep the code short, we’ll also use Octokit, Github’s official SDK package:
```
yarn add react-router-dom@5 @octokit/core
```
Our demo application will consist of two routes:
1. A list of all public repos of an organization
2. The details of the repository after clicking a repo item from the list
We’ll use the Octokit client to fetch data from Github’s public REST endpoints. These don’t require authentication, although unauthenticated requests are rate-limited.
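Octokit’s request() takes a route such as 'GET /repos/{owner}/{repo}' and fills the {placeholders} from the parameters object. As a rough illustration of that idea (not Octokit’s actual implementation), here is a hypothetical Python helper, expand_route, that substitutes URL-encoded path parameters and, for GET requests, sends any leftover parameters as a query string:

```python
import re
from urllib.parse import quote, urlencode

API_BASE = "https://api.github.com"  # Github's public REST API host


def expand_route(route: str, params: dict) -> str:
    """Turn 'GET /orgs/{org}/repos' plus params into a full request URL.

    Hypothetical helper for illustration; it mimics, but is not, Octokit's logic.
    """
    method, path = route.split(" ", 1)
    remaining = dict(params)

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Path parameters are percent-encoded so a value stays a single segment
        return quote(str(remaining.pop(name)), safe="")

    url = API_BASE + re.sub(r"\{(\w+)\}", substitute, path)
    # Leftover parameters become the query string for GET requests
    if method.upper() == "GET" and remaining:
        url += "?" + urlencode(remaining)
    return url


print(expand_route("GET /orgs/{org}/repos", {"org": "octokit", "per_page": 10}))
# https://api.github.com/orgs/octokit/repos?per_page=10
```

The per_page parameter in the example is only there to show how extra parameters end up in the query string.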

Adding Application Components

Alright, now that our dependencies are installed, we can add the routes to App.js, the root component of our React app.
```
import { BrowserRouter as Router, Switch, Route } from 'react-router-dom';

import RepoList from './RepoList';
import RepoDetails from './RepoDetails';

import './App.css';

function App() {
  return (
    <Router>
      <div className="App">
        {/* Switch renders only the first matching route, so the more specific path comes first */}
        <Switch>
          <Route path="/repo/:owner/:repo" component={RepoDetails} />
          <Route path="/" component={RepoList} />
        </Switch>
      </div>
    </Router>
  );
}

export default App;
```
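The order of the routes matters: in React Router v5, <Switch> renders only the first <Route> whose path matches, and a path without the exact prop matches as a prefix, so "/" matches every URL. The following Python sketch models that first-match, prefix-matching behavior in simplified form (the match_route helper is hypothetical and ignores features such as optional parameters):

```python
from typing import Dict, List, Optional, Tuple

# (path pattern, component name), in the same order as in App.js
ROUTES: List[Tuple[str, str]] = [
    ("/repo/:owner/:repo", "RepoDetails"),
    ("/", "RepoList"),
]


def match_route(url: str, routes=ROUTES) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return the first route that prefix-matches url, like a non-exact <Route> in a <Switch>."""
    url_parts = [p for p in url.split("/") if p]
    for pattern, component in routes:
        pattern_parts = [p for p in pattern.split("/") if p]
        if len(pattern_parts) > len(url_parts):
            continue  # the URL is too short to satisfy this pattern
        params = {}
        for pat, seg in zip(pattern_parts, url_parts):
            if pat.startswith(":"):
                params[pat[1:]] = seg  # capture a named parameter
            elif pat != seg:
                break  # a literal segment differs
        else:
            return component, params
    return None


print(match_route("/repo/octokit/core.js"))  # ('RepoDetails', {'owner': 'octokit', 'repo': 'core.js'})
print(match_route("/"))                      # ('RepoList', {})
```

Passing the routes in reverse order makes "/" win for every URL, which is why the details route is listed first.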
Let’s initialize our Octokit client in src/client.js. The components import it from './client' to call Github’s public endpoints.
```
import { Octokit } from '@octokit/core';
 
export const octokit = new Octokit({});
```
The Octokit client can also make authenticated requests, and it supports both the GraphQL and REST APIs. You can learn more about the client in its official documentation.

Let’s add the RepoList.js component to the application, which will fetch the list of repositories of a given organization and display hyperlinks to the details page.
```
import React, { useEffect, useState } from 'react';
import { Link } from 'react-router-dom';
import { octokit } from './client';

function RepoList() {
  const [repos, setRepos] = useState([]);

  // Fetch the organization's public repositories once, when the component mounts
  useEffect(() => {
    octokit
      .request('GET /orgs/{org}/repos', {
        org: 'octokit',
      })
      .then((response) => setRepos(response.data))
      .catch((error) => console.error('Failed to load repositories', error));
  }, []);

  return (
    <div className="repo-list-container">
      <h1>Repositories</h1>
      <ul>
        {repos.map((repo) => (
          <li key={repo.id} className="repo-list-item">
            <Link to={`/repo/${repo.owner.login}/${repo.name}`}>{repo.full_name}</Link>
          </li>
        ))}
      </ul>
    </div>
  );
}

export default RepoList;
```
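RepoList turns each item of the API response into a link of the form /repo/{owner.login}/{name}, which is the shape the /repo/:owner/:repo route expects. A small Python sketch of that mapping, run on a hand-written sample payload (the sample data is made up, and real responses contain many more fields):

```python
from typing import Dict, List


def repo_links(repos: List[dict]) -> List[Dict[str, str]]:
    """Map Github repo objects to the key, link target, and label used by RepoList."""
    return [
        {
            "key": str(repo["id"]),  # RepoList uses repo.id as the React key
            "to": f"/repo/{repo['owner']['login']}/{repo['name']}",
            "label": repo["full_name"],
        }
        for repo in repos
    ]


# Made-up sample shaped like a few fields of GET /orgs/{org}/repos
sample = [
    {"id": 1, "name": "core.js", "full_name": "octokit/core.js", "owner": {"login": "octokit"}},
    {"id": 2, "name": "rest.js", "full_name": "octokit/rest.js", "owner": {"login": "octokit"}},
]

for link in repo_links(sample):
    print(link["to"], "->", link["label"])
```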
With the list of repositories in place, we can let users view some general details of each one. Let’s create a details component called RepoDetails:
```
import { useEffect, useState } from 'react';
import { useParams } from 'react-router-dom';
import { octokit } from './client';

function RepoDetails() {
  const [repo, setRepo] = useState();
  // :owner and :repo come from the route defined in App.js
  const { repo: repoName, owner } = useParams();

  useEffect(() => {
    octokit
      .request('GET /repos/{owner}/{repo}', {
        owner,
        repo: repoName,
      })
      .then((response) => setRepo(response.data))
      .catch((error) => console.error('Failed to load repository', error));
  }, [repoName, owner]);

  if (!repo) {
    return <b>loading...</b>;
  }

  return (
    <div className="repo-container">
      <h1>{repo.full_name}</h1>
      <p>Description: {repo.description}</p>
      <ul>
        <li><b>Forks:</b> {repo.forks}</li>
        <li><b>Subscribers:</b> {repo.subscribers_count}</li>
        {/* In the REST API, the `watchers` field mirrors the star count */}
        <li><b>Stars:</b> {repo.stargazers_count}</li>
        {/* license is null when Github detects no license */}
        <li><b>License:</b> {repo.license ? repo.license.name : 'None'}</li>
      </ul>
    </div>
  );
}

export default RepoDetails;
```
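One detail worth handling in the details view: the repository endpoint returns "license": null for repositories without a detected license, so reading license.name unguarded throws. A Python sketch of the same null-safe summary (the summarize_repo helper and the sample dictionaries are hypothetical):

```python
from typing import Optional


def summarize_repo(repo: dict) -> dict:
    """Pick the fields the details view displays, tolerating missing values."""
    license_info: Optional[dict] = repo.get("license")
    return {
        "name": repo["full_name"],
        "description": repo.get("description") or "",
        "forks": repo.get("forks", 0),
        "subscribers": repo.get("subscribers_count", 0),
        # license is None when Github can't detect one
        "license": license_info["name"] if license_info else "None",
    }


print(summarize_repo({"full_name": "octokit/core.js", "license": {"name": "MIT License"}}))
print(summarize_repo({"full_name": "someone/unlicensed", "license": None}))
```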
Setting up Serverless

With this done, our master-detail flow for repositories is ready. Assuming you have an AWS account set up, we can now add the Serverless configuration to our project. As mentioned earlier, we’ll use the Serverless framework to build our CD workflow, so let’s add it.

We’ll also install the Serverless plugin called serverless-finch, which allows us to configure and deploy to S3 buckets.
```
yarn global add serverless@2
yarn add serverless-finch --dev
```
With the Serverless CLI installed, let’s initialize a Serverless service in our project from the hello-world template:
```
serverless create -t hello-world
```
This creates a serverless.yml configuration file and a handler.js Lambda handler. We don’t need the handler, so we can delete handler.js. Our serverless.yml should look like this:
```
service: serverless-s3
frameworkVersion: '2'

# The `provider` block defines where your service will be deployed
provider:
  name: aws
  runtime: nodejs12.x

functions:
  helloWorld:
    handler: handler.helloWorld
    events:
      - http:
          path: helloWorld
          method: get
          cors: true
```
This serverless.yml configures a Lambda function called helloWorld. We’re only deploying static files, so we can remove the functions block completely. After doing that, let’s register the Serverless Finch plugin:
```
service: serverless-s3
frameworkVersion: '2'
 
provider:
 name: aws
 runtime: nodejs12.x
 
plugins:
 - serverless-finch
```
Alright, now that the plugin is registered, we can add the details of the S3 bucket it should deploy to. The block below tells Serverless Finch to deploy the contents of the build directory to the serverles-s3-galileo bucket. Use a different bucket name of your own, because S3 bucket names must be globally unique.
```
custom:
 client:
   bucketName: serverles-s3-galileo
   distributionFolder: build
   indexDocument: index.html
   errorDocument: index.html
```
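Because the deployment fails if the bucket name is invalid, it can help to check a name before running the deploy. A minimal Python sketch of the main S3 general-purpose bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no adjacent dots; not formatted like an IP address). AWS lists a few more rules, such as reserved prefixes, and global uniqueness can only be checked by S3 itself; the helper name is hypothetical:

```python
import ipaddress
import re

# 3-63 characters: lowercase letters, digits, dots and hyphens,
# beginning and ending with a letter or digit
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check the main S3 bucket naming rules (not global uniqueness)."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:
        return False  # two adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True  # not an IP address, so the name passes
    return False  # names formatted like an IP address are rejected


for candidate in ["serverles-s3-galileo", "Serverless_S3", "192.168.5.4", "ab"]:
    print(candidate, is_valid_bucket_name(candidate))
```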
We’re almost ready to deploy our app to the bucket. Haven’t created a bucket yet? No problem: serverless-finch will create it automatically. The last thing we need is a bucket policy that makes our app publicly accessible. Let’s create it in config/bucket-policy.json.

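Note: indexDocument is the entry point of our web application, index.html in this case. We set errorDocument to index.html as well, so that S3 serves the app even for paths that don’t exist as objects in the bucket (such as /repo/octokit/core.js), letting React Router handle the route in the browser.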
```
{
   "Version": "2012-10-17",
   "Statement": [
       {
           "Effect": "Allow",
           "Principal": {
               "AWS": "*"
           },
           "Action": "s3:GetObject",
           "Resource": "arn:aws:s3:::serverles-s3-galileo/*"
       }
   ]
}
```
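S3 objects are private by default, so we need a bucket policy for our deployment bucket. This policy grants everyone read-only access (s3:GetObject) to the objects in the bucket, so the deployed assets can be loaded in a browser. If S3 Block Public Access is enabled for the bucket or the account (the default for new buckets since April 2023), S3 rejects public policies like this one until that setting is relaxed for the bucket. You can learn more about bucket policies in the S3 documentation. Let’s update our Serverless config to use the policy. This is how our serverless.yml should look: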
```
service: serverless-s3
frameworkVersion: '2'
 
provider:
 name: aws
 runtime: nodejs12.x
 
plugins:
 - serverless-finch
 
custom:
 client:
   bucketName: serverles-s3-galileo
   distributionFolder: build
   indexDocument: index.html
   errorDocument: index.html
   bucketPolicyFile: config/bucket-policy.json
```
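To see why errorDocument matters for a single-page app, here is a simplified Python model of how S3 static website hosting answers a request (a sketch under assumptions: it ignores redirects, routing rules, and permissions). An existing object is served as-is, a folder-style path gets the index document, and anything else gets the error document with a 404 status. With both documents set to index.html, a deep link still loads the app and React Router takes over in the browser:

```python
from typing import Set, Tuple


def resolve(path: str, objects: Set[str],
            index_document: str = "index.html",
            error_document: str = "index.html") -> Tuple[int, str]:
    """Return (HTTP status, object key served) for a website request."""
    key = path.lstrip("/")
    if key == "" or key.endswith("/"):
        key += index_document  # folder-style request: serve the index document
    if key in objects:
        return 200, key
    # Missing object: S3 returns the error document with a 404 status
    return 404, error_document


bucket = {"index.html", "static/js/main.js"}  # hypothetical build output
print(resolve("/", bucket))                      # (200, 'index.html')
print(resolve("/static/js/main.js", bucket))     # (200, 'static/js/main.js')
print(resolve("/repo/octokit/core.js", bucket))  # (404, 'index.html')
```

Browsers still render the body of a 404 response, which is why this setup works for client-side routing, although the status code itself stays 404.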
Creating Github Actions Workflow

Assuming you’ve created your repo and pushed the code to it, we can set up our first workflow using Github Actions. Since we deploy to S3 with Serverless, the workflow needs the credentials of an IAM user that is allowed to deploy to the bucket. The env block lets us pass custom environment variables into a CI step; in this case, we need the AWS access key ID and secret access key to deploy the build files to the S3 bucket.

Github lets us store secret values that can be used in the Github Actions CI environment. Add the two keys as repository secrets named AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in your repository’s settings.

Now, we can move ahead and add a Github Actions workflow. Let’s create a workflow file at .github/workflows/deploy.yml and add the following to it.
```
name: Serverless S3 Deploy
on:
 push:
   branches: [ master ]
 pull_request:
   branches: [ master ]
```
Alright, so the Github Actions config above tells Github to trigger this workflow whenever someone pushes to the master branch or creates a PR against it.
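The on block acts as a filter over incoming events. As a simplified model (the should_trigger helper is hypothetical; Github’s real filters also support glob patterns, paths filters, and more), a push event carries the full ref such as refs/heads/master, while for a pull_request event the branch filter applies to the pull request’s base (target) branch:

```python
def should_trigger(event_name: str, payload: dict, branches=("master",)) -> bool:
    """Decide whether a push or pull_request event matches the workflow's branch filters."""
    if event_name == "push":
        # Push events carry the full ref, e.g. refs/heads/master
        ref = payload.get("ref", "")
        prefix = "refs/heads/"
        return ref.startswith(prefix) and ref[len(prefix):] in branches
    if event_name == "pull_request":
        # Pull request branch filters apply to the base (target) branch
        return payload["pull_request"]["base"]["ref"] in branches
    return False


print(should_trigger("push", {"ref": "refs/heads/master"}))  # True
print(should_trigger("pull_request", {"pull_request": {"base": {"ref": "develop"}}}))  # False
```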

As of now, our action config is incomplete and does nothing. Let’s add our first and only job to the workflow:
```
name: Serverless S3
 
on:
 push:
   branches: [ master ]
 pull_request:
   branches: [ master ]
 
jobs:
 build:
   runs-on: ubuntu-latest
   strategy:
     matrix:
       node-version: [10.x]
   steps:
   - uses: actions/checkout@v2
```
Let’s try to digest the config above:

runs-on: ubuntu-latest

The runs-on key specifies the runner that executes the job. In this case, it’s the latest Ubuntu runner image provided by Github.

strategy:
  matrix:
    node-version: [10.x]

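The strategy defines the environments we want to run our job in. A matrix is useful when we want to run the same job, such as a test suite, against several configurations. We don’t need that here, so we use a single Node.js environment with version 10.x. Note that recent versions of Create React App require Node.js 14 or newer, so you may need to raise this value.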
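A matrix expands into one job per combination of its values, that is, a Cartesian product. A small Python sketch of that expansion, ignoring include and exclude entries (the two-key example is hypothetical and not part of our workflow):

```python
from itertools import product
from typing import Dict, List


def expand_matrix(matrix: Dict[str, list]) -> List[Dict[str, object]]:
    """Return one job configuration per combination of matrix values."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in product(*(matrix[k] for k in keys))]


print(expand_matrix({"node-version": ["10.x"]}))
# [{'node-version': '10.x'}]
print(len(expand_matrix({"node-version": ["14.x", "16.x"], "os": ["ubuntu-latest", "windows-latest"]})))
# 4
```

With a single value, as in our workflow, the matrix produces exactly one job.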

steps:
  - uses: actions/checkout@v2

The steps block defines the tasks to be performed sequentially within a job. The actions/checkout@v2 action checks out our repository so that later steps can work with the source code.

This is the bare minimum needed to run a job in a Github workflow. Next, we need to set up the environment and deploy our application, so let’s add the remaining steps.
```
name: Serverless S3
 
on:
 push:
   branches: [ master ]
 pull_request:
   branches: [ master ]
 
jobs:
 build:
   runs-on: ubuntu-latest
   strategy:
     matrix:
       node-version: [10.x]
   steps:
   - uses: actions/checkout@v2
   - name: Use Node.js ${{ matrix.node-version }}
     uses: actions/setup-node@v1
     with:
       node-version: ${{ matrix.node-version }}
   - run: yarn install
   - run: yarn build
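   - name: serverless deploy s3
     # Deploy only on pushes to master; pull requests stop after the build step
     if: github.event_name == 'push'
     uses: serverless/github-action@master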
     with:
       args: client deploy --no-confirm
     env:
       AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
       AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```
These steps build our frontend and deploy its assets to our S3 bucket. Reading through them, we do the following in sequence:

1. Check out the current branch’s code

2. Set up our Node.js environment

3. Install our dependencies with yarn install

4. Create the production build with yarn build

5. Deploy the build to S3 with serverless client deploy --no-confirm
- The uses key defines which prebuilt action the step runs
- The args key under with passes command-line arguments to that action, here client deploy --no-confirm
- The --no-confirm flag stops Serverless Finch from asking for confirmation while deploying to the S3 bucket
- The env key passes custom environment variables, here the AWS credentials, to the action
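By default, the steps of a job run one after another, and once a step fails, the remaining steps are skipped and the job is marked as failed (unless a step opts in with a condition such as if: always()). That is what keeps a broken build from being deployed. A Python sketch of that control flow, with hypothetical stand-in functions in place of the real commands:

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]  # (step name, function returning success)


def run_job(steps: List[Step]) -> List[Tuple[str, str]]:
    """Run steps in order; after the first failure, mark the rest as skipped."""
    results = []
    failed = False
    for name, action in steps:
        if failed:
            results.append((name, "skipped"))
            continue
        ok = action()
        results.append((name, "success" if ok else "failure"))
        failed = not ok
    return results


steps = [
    ("checkout", lambda: True),
    ("yarn install", lambda: True),
    ("yarn build", lambda: False),  # pretend the build breaks
    ("serverless deploy s3", lambda: True),
]
for name, status in run_job(steps):
    print(f"{name}: {status}")
```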
Alright, now that the CD workflow is set up to deploy our app, we can commit and push to the master branch, which should trigger the workflow. You can watch it run in the Actions tab of your repository.

Once it finishes, check the output of the serverless deploy step for the S3 website URL and open it in a browser. It should show our application running.

Creating CircleCI Workflow

To build a repository on CircleCI, we first need to connect CircleCI to our Github account. You can do that by signing up for CircleCI and following the steps here.

As we did for the Github Actions workflow, we need to provide the IAM user’s AWS credentials. In CircleCI, we set them up as environment variables in the project settings.

Just like with Github Actions, we define CircleCI workflows in a configuration file, .circleci/config.yml. CircleCI also lets us use reusable third-party packages, called Orbs, in our deployment workflows.

We’ll need the official CircleCI aws-cli, serverless-framework, and node orbs for our deploy workflow. Let’s create the first job of our workflow:
```
version: 2.1
 
orbs:
 aws-cli: circleci/aws-cli@1.0.0
 serverless: circleci/serverless-framework@1.0.1
 node: circleci/node@4.1.0
 
jobs:
 deploy:
   executor: serverless/default
```
The executor is the prebuilt environment in which the job’s steps run; here, serverless/default is provided by the serverless-framework orb.

Just as we defined steps for the Github Actions job, we add steps to the CircleCI job. Here we use the node orb’s install-yarn command, run yarn install and yarn build, and then use the aws-cli and serverless orbs to set up the AWS CLI and Serverless before deploying. As with the Github Actions secrets, our AWS credentials must be defined as CircleCI environment variables.
```
version: 2.1
 
orbs:
 aws-cli: circleci/aws-cli@1.0.0
 serverless: circleci/serverless-framework@1.0.1
 node: circleci/node@4.1.0
 
jobs:
 deploy:
   executor: serverless/default
   steps:
     - checkout
     - node/install-yarn
     - run:
         name: install
         command: yarn install
     - run:
         name: build
         command: yarn build
     - aws-cli/setup
     - serverless/setup:
         app-name: serverless-s3
         org-name: velotio
     - run:
         name: deploy
         command: serverless client deploy --no-confirm
workflows:
 deploy:
   jobs:
     - deploy:
         filters:
           branches:
             only:
               - master
```
As with the Github Actions deploy job, the CircleCI deploy job runs the following steps in order:
1. Check out the code
2. Install yarn package manager with node/install-yarn
3. Install dependencies with yarn install
4. Build the project with yarn build
5. Set up the AWS CLI and Serverless
6. Deploy to S3 with serverless client deploy --no-confirm
The workflows block in the config above tells CircleCI to run the deploy job, and the filters block on that job restricts it to runs triggered by updates to the master branch.
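CircleCI’s only filter accepts exact branch names as well as regular expressions written between slashes, which must match the entire branch name. A Python sketch of that matching logic (the branch_allowed helper is hypothetical, and the /release-.*/ pattern in the example is illustrative only):

```python
import re
from typing import Iterable


def branch_allowed(branch: str, only: Iterable[str]) -> bool:
    """Return True if branch matches any entry of an `only` filter."""
    for entry in only:
        if len(entry) > 2 and entry.startswith("/") and entry.endswith("/"):
            # /.../ entries are regular expressions that must match the whole name
            if re.fullmatch(entry[1:-1], branch):
                return True
        elif entry == branch:
            return True
    return False


print(branch_allowed("master", ["master"]))             # True
print(branch_allowed("feature/login", ["master"]))      # False
print(branch_allowed("release-1.2", ["/release-.*/"]))  # True
```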

Once we’re done with the above setup, we can make a test commit and check whether our workflow is running.

Conclusion

We can easily integrate build/deployment workflows with simple configurations offered through Github Actions. If we don’t primarily use GitHub as version control, we can opt for CircleCI for our workflows.

Related Articles
1. Automating Serverless Framework Deployment using Watchdog
2. To Go Serverless Or Not Is The Question
You can find the referenced code at this repo.
December 12, 2022

Using Packer and Terraform to Setup Jenkins Master-Slave Architecture

Automation is everywhere, and it is better to adopt it as early as possible. In this blog post, we are going to discuss creating the infrastructure for a Jenkins master-slave setup. We will host our deployment pipeline on AWS, use Packer to create the AMIs, and use Terraform to create the master and slave machines. We will also discuss different ways of connecting the slaves and run a sample application through the pipeline.

Please remember that the intent of this blog is to bring all the different components together in one place, so some code that would normally live in a development repository is also included here. With the required tools, the 10,000 ft view, and the intent of the blog covered, let’s begin.

Using Packer to Create AMIs for Jenkins Master and Linux Slave

HashiCorp has given us some of the most amazing tools for simplifying our lives, and Packer is one of them. Packer can create custom AMIs from existing AMIs: we write a JSON template, pass it an installation script to run during the build, and Packer takes care of producing the AMI for us. Install Packer for your platform from the Packer downloads page. For simplicity, we will use Linux machines for both the Jenkins master and the Linux slave. Both use the same JSON template, but it can be split if needed.

Note: the user-data passed from Terraform differs between the two machines, and that is what ultimately determines whether an instance acts as the master or a slave.

We are using Amazon Linux 2 as the base image. Here is the Packer template, amazon.json:

{
  "builders": [
  {
    "ami_description": "{{user `ami-description`}}",
    "ami_name": "{{user `ami-name`}}",
    "ami_regions": [
      "us-east-1"
    ],
    "ami_users": [
      "XXXXXXXXXX"
    ],
    "ena_support": "true",
    "instance_type": "t2.medium",
    "region": "us-east-1",
    "source_ami_filter": {
      "filters": {
        "name": "amzn2-ami-hvm-2.0*x86_64*",
        "root-device-type": "ebs",
        "virtualization-type": "hvm"
      },
      "most_recent": true,
      "owners": [
        "amazon"
      ]
    },
    "sriov_support": "true",
    "ssh_username": "ec2-user",
    "tags": {
      "Name": "{{user `ami-name`}}"
    },
    "type": "amazon-ebs"
  }
],
"post-processors": [
  {
    "inline": [
      "echo AMI Name {{user `ami-name`}}",
      "date",
      "exit 0"
    ],
    "type": "shell-local"
  }
],
"provisioners": [
  {
    "script": "install_amazon.bash",
    "type": "shell"
  }
],
  "variables": {
    "ami-description": "Amazon Linux for Jenkins Master and Slave ({{isotime \"2006-01-02-15-04-05\"}})",
    "ami-name": "amazon-linux-for-jenkins-{{isotime \"2006-01-02-15-04-05\"}}",
    "aws_access_key": "",
    "aws_secret_key": ""
  }
}
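The ami-name variable uses Packer’s isotime function with a Go reference-time layout: "2006-01-02-15-04-05" means year-month-day-hour-minute-second, evaluated in UTC. A Python sketch that produces the same style of name, useful for predicting what the amazon-linux-for-jenkins* filter in Terraform will match (the helper is hypothetical, and the Go-to-strftime mapping below covers only this one layout):

```python
from datetime import datetime, timezone
from typing import Optional

# Go layout "2006-01-02-15-04-05" expressed as a strftime format
GO_LAYOUT_AS_STRFTIME = "%Y-%m-%d-%H-%M-%S"


def jenkins_ami_name(now: Optional[datetime] = None) -> str:
    """Build a name like amazon-linux-for-jenkins-2022-12-12-10-30-00, using UTC."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime(GO_LAYOUT_AS_STRFTIME)
    return "amazon-linux-for-jenkins-" + stamp


print(jenkins_ami_name(datetime(2022, 12, 12, 10, 30, 0, tzinfo=timezone.utc)))
# amazon-linux-for-jenkins-2022-12-12-10-30-00
```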

As you can see, the template is pretty simple. The only thing of interest here is the install_amazon.bash provisioner script. In this blog post, we will deploy a Node-based application that runs inside a Docker container. The content of the bash script is as follows:

#!/bin/bash

set -x

# For Node
curl -sL https://rpm.nodesource.com/setup_10.x | sudo -E bash -

# For xmlstarlet
sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

sudo yum update -y

sleep 10

# Setting up Docker
sudo yum install -y docker
sudo usermod -a -G docker ec2-user

# Just to be safe removing previously available java if present
sudo yum remove -y java

sudo yum install -y python2-pip jq unzip vim tree biosdevname nc mariadb bind-utils at screen tmux xmlstarlet git java-1.8.0-openjdk nc gcc-c++ make nodejs

sudo -H pip install awscli bcrypt
sudo -H pip install --upgrade awscli
sudo -H pip install --upgrade aws-ec2-assign-elastic-ip

sudo npm install -g @angular/cli

sudo systemctl enable docker
sudo systemctl enable atd

sudo yum clean all
sudo rm -rf /var/cache/yum/
exit 0

There is a lot in this script, so let’s go through it. As mentioned earlier, we will discuss different ways of connecting a slave, and one of them needs xmlstarlet. The rest are packages we might need in one way or another.

Replace the ami_users placeholder with the AWS account ID(s) allowed to launch the AMI. You can find your account ID in the AWS console under Support, in the Support Center.

Validate the template by running packer validate amazon.json.

Once it validates, build the AMI by running packer build amazon.json.

When the build completes, check your AWS console: you will find the new AMI under “My AMIs”.

It’s now time to start using Terraform to create the machines.

Prerequisites:

1. Please make sure you create a provider.tf file.

provider "aws" {
  region                  = "us-east-1"
  shared_credentials_file = "~/.aws/credentials"
  profile                 = "dev"
}

The credentials file (~/.aws/credentials) must contain aws_access_key_id and aws_secret_access_key for the dev profile.

2. Keep SSH key pairs handy for the master and slave machines. Here is a nice article on how to create them; alternatively, create them beforehand in the AWS console and reference them in the code.

3. VPC:

# lookup for the "default" VPC
data "aws_vpc" "default_vpc" {
  default = true
}

# subnet list in the "default" VPC
# The "default" VPC has all "public subnets"
data "aws_subnet_ids" "default_public" {
  vpc_id = "${data.aws_vpc.default_vpc.id}"
}
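The credentials file from the first prerequisite uses an INI-style format with one section per profile, and profile = "dev" in provider.tf selects the [dev] section. A Python sketch that reads such a file with configparser to confirm the profile has both keys before running Terraform (the key values are placeholders, and the read_profile helper is hypothetical):

```python
import configparser
from typing import Dict

SAMPLE = """
[dev]
aws_access_key_id = AKIAEXAMPLEKEY
aws_secret_access_key = example-secret
"""


def read_profile(text: str, profile: str = "dev") -> Dict[str, str]:
    """Return the access key pair of one profile, raising KeyError if anything is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[profile]  # KeyError if the profile does not exist
    return {
        "aws_access_key_id": section["aws_access_key_id"],
        "aws_secret_access_key": section["aws_secret_access_key"],
    }


print(read_profile(SAMPLE)["aws_access_key_id"])
```

To check your real file, pass the contents of ~/.aws/credentials instead of SAMPLE.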

Creating Terraform Script for Spinning up Jenkins Master

Get Terraform from the Terraform downloads page if you haven’t installed it yet.

We will need to set up the Security Group before setting up the instance.

# Security Group:
resource "aws_security_group" "jenkins_server" {
  name        = "jenkins_server"
  description = "Jenkins Server: created by Terraform for [dev]"

  # legacy name of VPC ID
  vpc_id = "${data.aws_vpc.default_vpc.id}"

  tags = {
    Name = "jenkins_server"
    env  = "dev"
  }
}

###############################################################################
# ALL INBOUND
###############################################################################

# ssh
resource "aws_security_group_rule" "jenkins_server_from_source_ingress_ssh" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  security_group_id = "${aws_security_group.jenkins_server.id}"
  # Your own public IP, plus the default VPC range for SSH from inside the VPC
  cidr_blocks       = ["<Your Public IP>/32", "172.31.0.0/16"]
  description       = "ssh to jenkins_server"
}

# web
resource "aws_security_group_rule" "jenkins_server_from_source_ingress_webui" {
  type              = "ingress"
  from_port         = 8080
  to_port           = 8080
  protocol          = "tcp"
  security_group_id = "${aws_security_group.jenkins_server.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "jenkins server web"
}

# JNLP
resource "aws_security_group_rule" "jenkins_server_from_source_ingress_jnlp" {
  type              = "ingress"
  from_port         = 33453
  to_port           = 33453
  protocol          = "tcp"
  security_group_id = "${aws_security_group.jenkins_server.id}"
  cidr_blocks       = ["172.31.0.0/16"]
  description       = "jenkins server JNLP Connection"
}

###############################################################################
# ALL OUTBOUND
###############################################################################

resource "aws_security_group_rule" "jenkins_server_to_other_machines_ssh" {
  type              = "egress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  security_group_id = "${aws_security_group.jenkins_server.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "allow jenkins servers to ssh to other machines"
}

resource "aws_security_group_rule" "jenkins_server_outbound_all_80" {
  type              = "egress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  security_group_id = "${aws_security_group.jenkins_server.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "allow jenkins servers for outbound yum"
}

resource "aws_security_group_rule" "jenkins_server_outbound_all_443" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  security_group_id = "${aws_security_group.jenkins_server.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "allow jenkins servers for outbound yum"
}
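Security group rules are allow-lists: inbound traffic is permitted only if some ingress rule matches its protocol, port, and source address. A Python sketch that evaluates the web UI and JNLP rules above for a given source IP (the rules are copied as data, the SSH rule is left out because its CIDR contains a placeholder, and the is_allowed helper is hypothetical):

```python
import ipaddress
from typing import List, NamedTuple


class IngressRule(NamedTuple):
    port: int
    cidr: str
    description: str


RULES: List[IngressRule] = [
    IngressRule(8080, "0.0.0.0/0", "jenkins server web"),
    IngressRule(33453, "172.31.0.0/16", "jenkins server JNLP Connection"),
]


def is_allowed(source_ip: str, port: int, rules: List[IngressRule] = RULES) -> bool:
    """Return True if any TCP ingress rule allows source_ip on port."""
    ip = ipaddress.ip_address(source_ip)
    return any(rule.port == port and ip in ipaddress.ip_network(rule.cidr) for rule in rules)


print(is_allowed("203.0.113.7", 8080))   # True: the web UI is open to everyone
print(is_allowed("172.31.12.5", 33453))  # True: an agent inside the default VPC
print(is_allowed("203.0.113.7", 33453))  # False: JNLP is reachable only from the VPC
```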

# Security Group:
resource "aws_security_group" "jenkins_server" {
  name        = "jenkins_server"
  description = "Jenkins Server: created by Terraform for [dev]"

  # legacy name of VPC ID
  vpc_id = "${data.aws_vpc.default_vpc.id}"

  tags {
    Name = "jenkins_server"
    env  = "dev"
  }
}

###############################################################################
# ALL INBOUND
###############################################################################

# ssh
resource "aws_security_group_rule" "jenkins_server_from_source_ingress_ssh" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  security_group_id = "${aws_security_group.jenkins_server.id}"
  cidr_blocks       = ["<Your Public IP>/32", "172.0.0.0/8"]
  description       = "ssh to jenkins_server"
}

# web
resource "aws_security_group_rule" "jenkins_server_from_source_ingress_webui" {
  type              = "ingress"
  from_port         = 8080
  to_port           = 8080
  protocol          = "tcp"
  security_group_id = "${aws_security_group.jenkins_server.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "jenkins server web"
}

# JNLP
resource "aws_security_group_rule" "jenkins_server_from_source_ingress_jnlp" {
  type              = "ingress"
  from_port         = 33453
  to_port           = 33453
  protocol          = "tcp"
  security_group_id = "${aws_security_group.jenkins_server.id}"
  cidr_blocks       = ["172.31.0.0/16"]
  description       = "jenkins server JNLP Connection"
}

###############################################################################
# ALL OUTBOUND
###############################################################################

resource "aws_security_group_rule" "jenkins_server_to_other_machines_ssh" {
  type              = "egress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  security_group_id = "${aws_security_group.jenkins_server.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "allow jenkins servers to ssh to other machines"
}

resource "aws_security_group_rule" "jenkins_server_outbound_all_80" {
  type              = "egress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  security_group_id = "${aws_security_group.jenkins_server.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "allow jenkins servers for outbound yum"
}

resource "aws_security_group_rule" "jenkins_server_outbound_all_443" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  security_group_id = "${aws_security_group.jenkins_server.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "allow jenkins servers for outbound yum"
}
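To sanity-check what the inbound rules above admit, here is a small Python sketch using the standard ipaddress module. It models only the web UI and JNLP rules as (port, CIDR) pairs copied by hand from the Terraform. Each rule opens a single port (from_port equals to_port), so a plain equality check is enough. The addresses in the usage lines are hypothetical, and real security group evaluation (protocols, port ranges, egress) does more than this.

```python
import ipaddress

# (port, CIDR) pairs mirroring two of the ingress rules above
INGRESS_RULES = [
    (8080, "0.0.0.0/0"),       # jenkins server web
    (33453, "172.31.0.0/16"),  # jenkins server JNLP connection
]


def is_allowed(port: int, source_ip: str) -> bool:
    """Return True if some modeled TCP ingress rule admits source_ip on port."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        port == rule_port and addr in ipaddress.ip_network(cidr)
        for rule_port, cidr in INGRESS_RULES
    )


if __name__ == "__main__":
    # A worker inside the default VPC can reach the JNLP port ...
    print(is_allowed(33453, "172.31.5.20"))   # True
    # ... but an internet host cannot, although it can reach the web UI
    print(is_allowed(33453, "198.51.100.7"))  # False
    print(is_allowed(8080, "198.51.100.7"))   # True
```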

Now that we have a custom AMI and security groups, let’s use them to create the Jenkins master instance with Terraform.

# AMI lookup for this Jenkins Server
data "aws_ami" "jenkins_server" {
  most_recent      = true
  owners           = ["self"]

  filter {
    name   = "name"
    values = ["amazon-linux-for-jenkins*"]
  }
}

resource "aws_key_pair" "jenkins_server" {
  key_name   = "jenkins_server"
  public_key = "${file("jenkins_server.pub")}"
}

# lookup the security group of the Jenkins Server
data "aws_security_group" "jenkins_server" {
  filter {
    name   = "group-name"
    values = ["jenkins_server"]
  }
}

# userdata for the Jenkins server ...
data "template_file" "jenkins_server" {
  template = "${file("scripts/jenkins_server.sh")}"

  vars {
    env = "dev"
    jenkins_admin_password = "mysupersecretpassword"
  }
}

# the Jenkins server itself
resource "aws_instance" "jenkins_server" {
  ami                    = "${data.aws_ami.jenkins_server.image_id}"
  instance_type          = "t3.medium"
  key_name               = "${aws_key_pair.jenkins_server.key_name}"
  subnet_id              = "${data.aws_subnet_ids.default_public.ids[0]}"
  vpc_security_group_ids = ["${data.aws_security_group.jenkins_server.id}"]
  iam_instance_profile   = "dev_jenkins_server"
  user_data              = "${data.template_file.jenkins_server.rendered}"

  tags {
    "Name" = "jenkins_server"
  }

  root_block_device {
    delete_on_termination = true
  }
}

output "jenkins_server_ami_name" {
  value = "${data.aws_ami.jenkins_server.name}"
}

output "jenkins_server_ami_id" {
  value = "${data.aws_ami.jenkins_server.id}"
}

output "jenkins_server_public_ip" {
  value = "${aws_instance.jenkins_server.public_ip}"
}

output "jenkins_server_private_ip" {
  value = "${aws_instance.jenkins_server.private_ip}"
}
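Once terraform apply finishes, the outputs above can be read back programmatically. This sketch assumes the shape printed by terraform output -json, where each output is an object with a value field. It builds the Jenkins URL from jenkins_server_public_ip. The helper names and the sample IP are hypothetical.

```python
import json
import subprocess


def read_terraform_outputs(workdir: str = ".") -> dict:
    """Run `terraform output -json` in workdir and parse its JSON."""
    result = subprocess.run(["terraform", "output", "-json"], cwd=workdir,
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def jenkins_url(outputs: dict, port: int = 8080) -> str:
    """Build the Jenkins web UI URL from the jenkins_server_public_ip output."""
    public_ip = outputs["jenkins_server_public_ip"]["value"]
    return f"http://{public_ip}:{port}/"


if __name__ == "__main__":
    # Hypothetical sample in the shape printed by `terraform output -json`
    sample = json.loads('{"jenkins_server_public_ip": '
                        '{"sensitive": false, "type": "string", "value": "203.0.113.25"}}')
    print(jenkins_url(sample))  # http://203.0.113.25:8080/
```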

As mentioned before, we will discuss multiple ways to connect the slaves to the Jenkins master. First, note that every time a new Jenkins master comes up, it generates a unique initial admin password. There are two ways to deal with this: wait for Jenkins to start and retrieve that password, or set the admin password directly while creating the Jenkins master. Here we discuss how to change the password while configuring Jenkins. (If you need the script that retrieves the Jenkins password as soon as it is created, leave a comment and I will share it with you as well.)
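For the first option (wait for Jenkins, then read the generated password), a minimal Python sketch can poll for the initialAdminPassword file. The user data below references this file in a commented-out line. This is not the author's retrieval script, which is not shown here. The function name, timeout and polling interval are assumptions, and reading the real file normally requires root.

```python
import time
from pathlib import Path

# Path taken from the commented-out line in the user data script
DEFAULT_SECRET = "/var/lib/jenkins/secrets/initialAdminPassword"


def wait_for_initial_password(path: str = DEFAULT_SECRET,
                              timeout: float = 600.0,
                              interval: float = 10.0) -> str:
    """Poll until the generated admin password file exists and is non-empty."""
    deadline = time.monotonic() + timeout
    secret = Path(path)
    while True:
        if secret.is_file():
            password = secret.read_text().strip()
            if password:
                return password
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} did not appear within {timeout} seconds")
        time.sleep(interval)
```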

Below is the user data that installs the Jenkins master, sets its admin password and installs the required packages.

#!/bin/bash

set -x

function wait_for_jenkins()
{
  while (( 1 )); do
      echo "waiting for Jenkins to launch on port [8080] ..."
      
      nc -zv 127.0.0.1 8080
      if (( $? == 0 )); then
          break
      fi

      sleep 10
  done

  echo "Jenkins launched"
}

function updating_jenkins_master_password ()
{
  cat > /tmp/jenkinsHash.py <<EOF
import bcrypt
import sys
if len(sys.argv) < 2 or not sys.argv[1]:
  sys.exit(10)
plaintext_pwd=sys.argv[1]
encrypted_pwd=bcrypt.hashpw(plaintext_pwd, bcrypt.gensalt(rounds=10, prefix=b"2a"))
isCorrect=bcrypt.checkpw(plaintext_pwd, encrypted_pwd)
if not isCorrect:
  sys.exit(20)
print "{}".format(encrypted_pwd)
EOF

  chmod +x /tmp/jenkinsHash.py

  # Wait till Jenkins creates /var/lib/jenkins/users/admin*/config.xml
  # (the glob also matches admin folders whose names carry a suffix)
  admin_config=""
  while [[ -z "$admin_config" ]]; do
      echo "Waiting for Jenkins to generate admin user's config file ..."
      sleep 10
      admin_config=$(ls -d /var/lib/jenkins/users/admin*/config.xml 2>/dev/null | head -n1)
  done

  echo "Admin config file created: $admin_config"
  cd "$(dirname "$admin_config")" || exit 1
  pwd

  # Hash the plain-text password; abort instead of writing an error message into config.xml
  admin_password=$(python /tmp/jenkinsHash.py "${jenkins_admin_password}") || exit 1

  # Keep this quoting as-is: the bcrypt hash contains '$' characters, and keeping the
  # expansion in double quotes passes them to xmlstarlet untouched
  xmlstarlet -q ed --inplace -u "/user/properties/hudson.security.HudsonPrivateSecurityRealm_-Details/passwordHash" -v '#jbcrypt:'"$admin_password" config.xml

  # Restart
  systemctl restart jenkins
  sleep 10
}

function install_packages ()
{

  wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat-stable/jenkins.repo
  rpm --import https://jenkins-ci.org/redhat/jenkins-ci.org.key
  yum install -y jenkins

  # firewall
  #firewall-cmd --permanent --new-service=jenkins
  #firewall-cmd --permanent --service=jenkins --set-short="Jenkins Service Ports"
  #firewall-cmd --permanent --service=jenkins --set-description="Jenkins Service firewalld port exceptions"
  #firewall-cmd --permanent --service=jenkins --add-port=8080/tcp
  #firewall-cmd --permanent --add-service=jenkins
  #firewall-cmd --zone=public --add-service=http --permanent
  #firewall-cmd --reload
  systemctl enable jenkins
  systemctl restart jenkins
  sleep 10
}

function configure_jenkins_server ()
{
  # Jenkins cli
  echo "installing the Jenkins cli ..."
  cp /var/cache/jenkins/war/WEB-INF/jenkins-cli.jar /var/lib/jenkins/jenkins-cli.jar

  # Use the admin password set earlier instead of the generated initial one
  # PASSWORD=$(cat /var/lib/jenkins/secrets/initialAdminPassword)
  PASSWORD="${jenkins_admin_password}"
  sleep 10

  jenkins_dir="/var/lib/jenkins"
  plugins_dir="$jenkins_dir/plugins"

  cd $jenkins_dir

  # Open JNLP port
  xmlstarlet -q ed --inplace -u "/hudson/slaveAgentPort" -v 33453 config.xml

  cd $plugins_dir || { echo "unable to chdir to [$plugins_dir]"; exit 1; }

  # List of plugins that need to be installed (plugin IDs are lowercase)
  plugin_list="git-client git github-api github-oauth github msbuild ssh-slaves workflow-aggregator ws-cleanup"

  # remove existing plugins, if any ...
  rm -rfv $plugin_list

  for plugin in $plugin_list; do
      echo "installing plugin [$plugin] ..."
      java -jar $jenkins_dir/jenkins-cli.jar -s http://127.0.0.1:8080/ -auth admin:$PASSWORD install-plugin $plugin
  done

  # Restart jenkins after installing plugins
  java -jar $jenkins_dir/jenkins-cli.jar -s http://127.0.0.1:8080 -auth admin:$PASSWORD safe-restart
}

### script starts here ###

install_packages

wait_for_jenkins

updating_jenkins_master_password

wait_for_jenkins

configure_jenkins_server

echo "Done"
exit 0
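The wait_for_jenkins loop above simply retries a TCP connection to port 8080 every 10 seconds. Below is the same idea as a Python sketch. It adds an optional timeout, which is an assumption: the shell version waits forever. The function name is hypothetical.

```python
import socket
import time
from typing import Optional


def wait_for_port(host: str = "127.0.0.1", port: int = 8080,
                  interval: float = 10.0, timeout: Optional[float] = None) -> None:
    """Block until a TCP connection to host:port succeeds (like `nc -z`)."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        print(f"waiting for Jenkins to launch on port [{port}] ...")
        try:
            with socket.create_connection((host, port), timeout=5):
                break  # something is listening; the port is open
        except OSError:
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError(f"{host}:{port} not reachable after {timeout} seconds")
            time.sleep(interval)
    print("Jenkins launched")
```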

A lot has been covered in this user data script, but the trickiest bit is changing the Jenkins password. We use a Python script that relies on bcrypt to hash the plain text in the format Jenkins expects, and xmlstarlet to write that hash into the admin user's config.xml. We also use xmlstarlet to set the JNLP port (33453) for the Windows slave. Remember that the initial Jenkins username is admin.
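The xmlstarlet update can be mirrored with Python's xml.etree.ElementTree to show exactly which element changes. This sketch assumes the admin config.xml layout implied by the XPath in the script. It takes an already computed bcrypt hash, because hashing itself needs the third-party bcrypt package. Unlike xmlstarlet --inplace, re-serializing with ElementTree drops the XML declaration and any comments, so treat this as an illustration rather than a drop-in replacement. The function name and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same element path as the xmlstarlet -u expression, relative to <user>
HASH_PATH = ("properties", "hudson.security.HudsonPrivateSecurityRealm_-Details", "passwordHash")


def set_password_hash(config_xml: str, bcrypt_hash: str) -> str:
    """Return config_xml with passwordHash set to '#jbcrypt:' + bcrypt_hash."""
    root = ET.fromstring(config_xml)
    if root.tag != "user":
        raise ValueError("expected a <user> root element")
    node = root
    for tag in HASH_PATH:
        # Walk direct children by exact tag name (no XPath parsing involved)
        node = next((child for child in node if child.tag == tag), None)
        if node is None:
            raise ValueError(f"missing <{tag}> element")
    node.text = "#jbcrypt:" + bcrypt_hash
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<user>
  <fullName>admin</fullName>
  <properties>
    <hudson.security.HudsonPrivateSecurityRealm_-Details>
      <passwordHash>#jbcrypt:OLD</passwordHash>
    </hudson.security.HudsonPrivateSecurityRealm_-Details>
  </properties>
</user>"""

if __name__ == "__main__":
    # Placeholder hash for illustration only; a real one comes from bcrypt
    print(set_password_hash(SAMPLE, "$2a$10$examplehashexamplehashexamplehashexample"))
```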

Commands to run: initialize Terraform with terraform init, then check and apply with terraform plan followed by terraform apply.

After the apply command succeeds, go to the AWS console and check that a new instance is coming up. Open <public IP>:8080 in a browser, log in with the credentials you passed in, and the Jenkins master is ready to use.

Note: I will provide the Terraform scripts and the list of IAM role permissions for the user at the end of the blog.

Creating a Terraform Script to Spin Up a Linux Slave and Connect It to the Master

We won’t create a new image here; instead, we reuse the one we built for the Jenkins master.

The VPC stays the same, and the security groups for the slave are below:

resource "aws_security_group" "dev_jenkins_worker_linux" {
  name        = "dev_jenkins_worker_linux"
  description = "Jenkins Linux Worker: created by Terraform for [dev]"

# legacy name of VPC ID
  vpc_id = "${data.aws_vpc.default_vpc.id}"

  tags {
    Name = "dev_jenkins_worker_linux"
    env  = "dev"
  }
}

###############################################################################
# ALL INBOUND
###############################################################################

# ssh (your IP, plus the default VPC range so the master's SSH launcher can connect)
resource "aws_security_group_rule" "jenkins_worker_linux_from_source_ingress_ssh" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
  cidr_blocks       = ["<Your Public IP>/32", "172.31.0.0/16"]
  description       = "ssh to jenkins_worker_linux"
}

# web
resource "aws_security_group_rule" "jenkins_worker_linux_from_source_ingress_webui" {
  type              = "ingress"
  from_port         = 8080
  to_port           = 8080
  protocol          = "tcp"
  security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "web (8080) to jenkins_worker_linux"
}


###############################################################################
# ALL OUTBOUND
###############################################################################

resource "aws_security_group_rule" "jenkins_worker_linux_to_all_80" {
  type              = "egress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "allow jenkins worker to all 80"
}

resource "aws_security_group_rule" "jenkins_worker_linux_to_all_443" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "allow jenkins worker to all 443"
}

resource "aws_security_group_rule" "jenkins_worker_linux_to_other_machines_ssh" {
  type              = "egress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "allow jenkins worker linux to ssh to other machines"
}

resource "aws_security_group_rule" "jenkins_worker_linux_to_jenkins_server_8080" {
  type                     = "egress"
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  security_group_id        = "${aws_security_group.dev_jenkins_worker_linux.id}"
  source_security_group_id = "${aws_security_group.jenkins_server.id}"
  description              = "allow jenkins workers linux to jenkins server"
}

Now that the required security groups are in place, it is time to look at the Terraform script for the Linux slave.

data "aws_ami" "jenkins_worker_linux" {
  most_recent      = true
  owners           = ["self"]

  filter {
    name   = "name"
    values = ["amazon-linux-for-jenkins*"]
  }
}

resource "aws_key_pair" "jenkins_worker_linux" {
  key_name   = "jenkins_worker_linux"
  public_key = "${file("jenkins_worker.pub")}"
}

data "local_file" "jenkins_worker_pem" {
  filename = "${path.module}/jenkins_worker.pem"
}

data "template_file" "userdata_jenkins_worker_linux" {
  template = "${file("scripts/jenkins_worker_linux.sh")}"

  vars {
    env         = "dev"
    region      = "us-east-1"
    datacenter  = "dev-us-east-1"
    node_name   = "us-east-1-jenkins_worker_linux"
    domain      = ""
    device_name = "eth0"
    server_ip   = "${aws_instance.jenkins_server.private_ip}"
    worker_pem  = "${data.local_file.jenkins_worker_pem.content}"
    jenkins_username = "admin"
    jenkins_password = "mysupersecretpassword"
  }
}

# lookup the security group of the Linux worker
data "aws_security_group" "jenkins_worker_linux" {
  filter {
    name   = "group-name"
    values = ["dev_jenkins_worker_linux"]
  }
}

resource "aws_launch_configuration" "jenkins_worker_linux" {
  name_prefix                 = "dev-jenkins-worker-linux"
  image_id                    = "${data.aws_ami.jenkins_worker_linux.image_id}"
  instance_type               = "t3.medium"
  iam_instance_profile        = "dev_jenkins_worker_linux"
  key_name                    = "${aws_key_pair.jenkins_worker_linux.key_name}"
  security_groups             = ["${data.aws_security_group.jenkins_worker_linux.id}"]
  user_data                   = "${data.template_file.userdata_jenkins_worker_linux.rendered}"
  associate_public_ip_address = false

  root_block_device {
    delete_on_termination = true
    volume_size = 100
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "jenkins_worker_linux" {
  name                      = "dev-jenkins-worker-linux"
  min_size                  = "1"
  max_size                  = "2"
  desired_capacity          = "2"
  health_check_grace_period = 60
  health_check_type         = "EC2"
  vpc_zone_identifier       = ["${data.aws_subnet_ids.default_public.ids}"]
  launch_configuration      = "${aws_launch_configuration.jenkins_worker_linux.name}"
  termination_policies      = ["OldestLaunchConfiguration"]
  wait_for_capacity_timeout = "10m"
  default_cooldown          = 60

  tags = [
    {
      key                 = "Name"
      value               = "dev_jenkins_worker_linux"
      propagate_at_launch = true
    },
    {
      key                 = "class"
      value               = "dev_jenkins_worker_linux"
      propagate_at_launch = true
    },
  ]
}

And now the final piece of code: the user data for the slave machine.

#!/bin/bash

set -x

function wait_for_jenkins ()
{
    echo "Waiting for Jenkins to launch on 8080..."

    while (( 1 )); do
        echo "Waiting for Jenkins"

        nc -zv ${server_ip} 8080
        if (( $? == 0 )); then
            break
        fi

        sleep 10
    done

    echo "Jenkins launched"
}

function slave_setup()
{
    # Wait till the jar files become available, retrying every 10 seconds
    ret=1
    while (( $ret != 0 )); do
        wget -O /opt/jenkins-cli.jar http://${server_ip}:8080/jnlpJars/jenkins-cli.jar
        ret=$?

        echo "jenkins cli ret [$ret]"
        (( $ret != 0 )) && sleep 10
    done

    ret=1
    while (( $ret != 0 )); do
        wget -O /opt/slave.jar http://${server_ip}:8080/jnlpJars/slave.jar
        ret=$?

        echo "jenkins slave ret [$ret]"
        (( $ret != 0 )) && sleep 10
    done
    
    mkdir -p /opt/jenkins-slave
    chown -R ec2-user:ec2-user /opt/jenkins-slave

    # Register the slave
    JENKINS_URL="http://${server_ip}:8080"

    USERNAME="${jenkins_username}"
    
    # PASSWORD=$(cat /tmp/secret)
    PASSWORD="${jenkins_password}"

    SLAVE_IP=$(ip -o -4 addr list ${device_name} | head -n1 | awk '{print $4}' | cut -d/ -f1)
    NODE_NAME=$(echo "jenkins-slave-linux-$SLAVE_IP" | tr '.' '-')
    NODE_SLAVE_HOME="/opt/jenkins-slave"
    EXECUTORS=2
    SSH_PORT=22

    CRED_ID="$NODE_NAME"
    LABELS="build linux docker"
    USERID="ec2-user"

    cd /opt
    
    # Creating CMD utility for jenkins-cli commands
    jenkins_cmd="java -jar /opt/jenkins-cli.jar -s $JENKINS_URL -auth $USERNAME:$PASSWORD"

    # Waiting for Jenkins to load all plugins
    while (( 1 )); do

      count=$($jenkins_cmd list-plugins 2>/dev/null | wc -l)
      ret=$?

      echo "count [$count] ret [$ret]"

      if (( $count > 0 )); then
          break
      fi

      sleep 30
    done

    # Delete Credentials if present for respective slave machines
    $jenkins_cmd delete-credentials system::system::jenkins _ $CRED_ID

    # Generating cred.xml for creating credentials on Jenkins server
    cat > /tmp/cred.xml <<EOF
<com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey plugin="ssh-credentials@1.16">
  <scope>GLOBAL</scope>
  <id>$CRED_ID</id>
  <description>Generated via Terraform for $SLAVE_IP</description>
  <username>$USERID</username>
  <privateKeySource class="com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey\$DirectEntryPrivateKeySource">
    <privateKey>${worker_pem}</privateKey>
  </privateKeySource>
</com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey>
EOF

    # Creating credential using cred.xml
    cat /tmp/cred.xml | $jenkins_cmd create-credentials-by-xml system::system::jenkins _

    # For Deleting Node, used when testing
    $jenkins_cmd delete-node $NODE_NAME
    
    # Generating node.xml for creating node on Jenkins server
    cat > /tmp/node.xml <<EOF
<slave>
  <name>$NODE_NAME</name>
  <description>Linux Slave</description>
  <remoteFS>$NODE_SLAVE_HOME</remoteFS>
  <numExecutors>$EXECUTORS</numExecutors>
  <mode>NORMAL</mode>
  <retentionStrategy class="hudson.slaves.RetentionStrategy\$Always"/>
  <launcher class="hudson.plugins.sshslaves.SSHLauncher" plugin="ssh-slaves@1.5">
    <host>$SLAVE_IP</host>
    <port>$SSH_PORT</port>
    <credentialsId>$CRED_ID</credentialsId>
  </launcher>
  <label>$LABELS</label>
  <nodeProperties/>
  <userId>$USERID</userId>
</slave>
EOF

    sleep 10

    # Creating node using node.xml
    cat /tmp/node.xml | $jenkins_cmd create-node $NODE_NAME
}

### script begins here ###

wait_for_jenkins

slave_setup

echo "Done"
exit 0

This script not only creates a node on the Jenkins master but also attaches it.
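The node definition that the script pipes into create-node can also be generated with ElementTree, which escapes special characters for you. The sketch below mirrors the node.xml heredoc and the NODE_NAME derivation, where the dots in the IP become dashes. The function names are hypothetical, the defaults are copied from the script, and the worker IP in the usage lines is a made-up example.

```python
import xml.etree.ElementTree as ET


def node_name_for(ip: str) -> str:
    """Mirror NODE_NAME: prefix the IP and replace dots with dashes."""
    return ("jenkins-slave-linux-" + ip).replace(".", "-")


def _add(parent, tag, text=None, attrib=None):
    """Append a child element, optionally with text and attributes."""
    child = ET.SubElement(parent, tag, attrib or {})
    if text is not None:
        child.text = str(text)
    return child


def build_node_xml(ip, cred_id, remote_fs="/opt/jenkins-slave", executors=2,
                   ssh_port=22, labels="build linux docker", user_id="ec2-user"):
    """Return a <slave> definition equivalent to the node.xml heredoc."""
    slave = ET.Element("slave")
    _add(slave, "name", node_name_for(ip))
    _add(slave, "description", "Linux Slave")
    _add(slave, "remoteFS", remote_fs)
    _add(slave, "numExecutors", executors)
    _add(slave, "mode", "NORMAL")
    _add(slave, "retentionStrategy",
         attrib={"class": "hudson.slaves.RetentionStrategy$Always"})
    launcher = _add(slave, "launcher", attrib={
        "class": "hudson.plugins.sshslaves.SSHLauncher",
        "plugin": "ssh-slaves@1.5",
    })
    _add(launcher, "host", ip)
    _add(launcher, "port", ssh_port)
    _add(launcher, "credentialsId", cred_id)
    _add(slave, "label", labels)
    _add(slave, "nodeProperties")
    _add(slave, "userId", user_id)
    return ET.tostring(slave, encoding="unicode")


if __name__ == "__main__":
    ip = "172.31.5.20"  # hypothetical worker address
    print(build_node_xml(ip, cred_id=node_name_for(ip)))
```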

Commands to run: initialize Terraform with terraform init, then check and apply with terraform plan followed by terraform apply.

One drawback of this approach is that if a slave gets disconnected or goes down, it remains on the Jenkins master as offline, and it will not automatically reattach itself to the master.

Some possible solutions are:

1. Create a cron job on the slave that re-runs the user-data script at a fixed interval.

2. Use the Swarm plugin.

3. Since we are on AWS, we can also use the Amazon EC2 plugin.

Maybe in a future blog, we will cover using both of these plugins as well.
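For the cron-job approach, blindly re-running the whole user data every few minutes would delete and recreate a healthy node each time, because the script calls delete-node before create-node. A minimal Python sketch of a check the cron job could run first is shown below. It assumes Jenkins' JSON remote API at /computer/<name>/api/json and its offline and temporarilyOffline fields; the function names are hypothetical, and how the cron job then re-runs the registration is left out.

```python
import base64
import json
import urllib.error
import urllib.parse
import urllib.request


def node_needs_registration(status_json):
    """Decide whether the cron job should re-run registration for this node.

    status_json is the body of GET <jenkins>/computer/<name>/api/json,
    or None when Jenkins reported that the node does not exist.
    """
    if status_json is None:
        return True  # node is missing on the master: register it again
    status = json.loads(status_json)
    if status.get("temporarilyOffline"):
        return False  # an admin marked it offline on purpose: leave it alone
    return bool(status.get("offline", True))


def fetch_node_status(jenkins_url, node_name, user, api_token, timeout=10):
    """Fetch the node's JSON status, returning None on HTTP 404 (unknown node)."""
    url = f"{jenkins_url}/computer/{urllib.parse.quote(node_name)}/api/json"
    request = urllib.request.Request(url)
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    request.add_header("Authorization", f"Basic {credentials}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

A cron entry could call fetch_node_status(...) and re-run the registration steps only when node_needs_registration(...) returns True. Skipping temporarily offline nodes is a design choice, so that a node an admin took down for maintenance is not re-attached behind their back.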

Using Packer to create AMIs for Windows Slave

The Windows AMI will also be created using Packer. All the pointers from the Linux slave still apply to Windows.

{
  "variables": {
    "ami-description": "Windows Server for Jenkins Slave ({{isotime \"2006-01-02-15-04-05\"}})",
    "ami-name": "windows-slave-for-jenkins-{{isotime \"2006-01-02-15-04-05\"}}",
    "aws_access_key": "",
    "aws_secret_key": ""
  },

  "builders": [
    {
      "ami_description": "{{user `ami-description`}}",
      "ami_name": "{{user `ami-name`}}",
      "ami_regions": [
        "us-east-1"
      ],
      "ami_users": [
        "XXXXXXXXXX"
      ],
      "ena_support": "true",
      "instance_type": "t3.medium",
      "region": "us-east-1",
      "source_ami_filter": {
        "filters": {
          "name": "Windows_Server-2016-English-Full-Containers-*",
          "root-device-type": "ebs",
          "virtualization-type": "hvm"
        },
        "most_recent": true,
        "owners": [
          "amazon"
        ]
      },
      "sriov_support": "true",
      "user_data_file": "scripts/SetUpWinRM.ps1",
      "communicator": "winrm",
      "winrm_username": "Administrator",
      "winrm_insecure": true,
      "winrm_use_ssl": true,
      "tags": {
        "Name": "{{user `ami-name`}}"
      },
      "type": "amazon-ebs"
    }
  ],
  "post-processors": [
  {
    "inline": [
      "echo AMI Name {{user `ami-name`}}",
      "date",
      "exit 0"
    ],
    "type": "shell-local"
  }
  ],
  "provisioners": [
    {
      "type": "powershell",
      "valid_exit_codes": [ 0, 3010 ],
      "scripts": [
        "scripts/disable-uac.ps1",
        "scripts/enable-rdp.ps1",
        "install_windows.ps1"
      ]
    },
    {
      "type": "windows-restart",
      "restart_check_command": "powershell -command \"& {Write-Output 'restarted.'}\""
    },
    {
      "type": "powershell",
      "inline": [
        "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule",
        "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\SysprepInstance.ps1 -NoShutdown"
      ]
    }
  ]
}

Windows does not behave the same way Linux does. To communicate with the instance while building the image, Packer needs WinRM (note "communicator": "winrm" in the template), so we set it up at the very beginning through user_data_file. Windows also requires user input for many actions, which cannot be provided during automation without breaking the flow of execution, so we disable UAC. We also enable RDP so that we can connect to the machine from our local desktop for debugging if needed. Finally, install_windows.ps1 sets up our slave. Please note that at the end we call two EC2Launch PowerShell scripts, InitializeInstance.ps1 and SysprepInstance.ps1, so that a new random password is generated every time a new machine is created from this AMI. They are mandatory; without them you will never be able to log in to your machines.

The template above uses several scripts; let’s go through them in their order of appearance.

SetUpWinRM.ps1:

<powershell>

write-output "Running User Data Script"
write-host "(host) Running User Data Script"

Set-ExecutionPolicy Unrestricted -Scope LocalMachine -Force -ErrorAction Ignore

# Don't set this before Set-ExecutionPolicy as it throws an error
$ErrorActionPreference = "stop"

# Remove HTTP listener
Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse

$Cert = New-SelfSignedCertificate -CertstoreLocation Cert:\LocalMachine\My -DnsName "packer"
New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $Cert.Thumbprint -Force

# WinRM
write-output "Setting up WinRM"
write-host "(host) setting up WinRM"

cmd.exe /c winrm quickconfig -q
cmd.exe /c winrm set "winrm/config" '@{MaxTimeoutms="1800000"}'
cmd.exe /c winrm set "winrm/config/winrs" '@{MaxMemoryPerShellMB="1024"}'
cmd.exe /c winrm set "winrm/config/service" '@{AllowUnencrypted="true"}'
cmd.exe /c winrm set "winrm/config/client" '@{AllowUnencrypted="true"}'
cmd.exe /c winrm set "winrm/config/service/auth" '@{Basic="true"}'
cmd.exe /c winrm set "winrm/config/client/auth" '@{Basic="true"}'
cmd.exe /c winrm set "winrm/config/service/auth" '@{CredSSP="true"}'
cmd.exe /c winrm set "winrm/config/listener?Address=*+Transport=HTTPS" "@{Port=`"5986`";Hostname=`"packer`";CertificateThumbprint=`"$($Cert.Thumbprint)`"}"
cmd.exe /c netsh advfirewall firewall set rule group="remote administration" new enable=yes
cmd.exe /c netsh firewall add portopening TCP 5986 "Port 5986"
cmd.exe /c net stop winrm
cmd.exe /c sc config winrm start= auto
cmd.exe /c net start winrm

</powershell>

The content is pretty straightforward, as it just sets up WinRM. The only thing to watch out for is the <powershell> and </powershell> tags. They are mandatory: they are how EC2 recognizes that the user data is a PowerShell script and runs it accordingly. Next come disable-uac.ps1 and enable-rdp.ps1, whose purpose we discussed above. The last script, install_windows.ps1, is the one that installs all the required packages in the AMI.
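Because EC2 only treats Windows user data as PowerShell when it is wrapped in these tags, it can help to generate the wrapper rather than hand-edit it. The Python sketch below is a hypothetical helper, not part of Packer or Terraform. It also shows the optional <persist>true</persist> tag that the slave user data later in this post appends, which, for the EC2Config/EC2Launch agents, asks for the user data to run again on later boots instead of only the first one.

```python
def ec2_windows_user_data(script, persist=False):
    """Hypothetical helper: wrap a PowerShell script in EC2 Windows user-data tags."""
    body = script.strip("\n")  # avoid stray blank lines inside the tags
    user_data = f"<powershell>\n{body}\n</powershell>"
    if persist:
        # Request that the user data run on every boot, not only the first one.
        user_data += "\n<persist>true</persist>"
    return user_data
```

For example, ec2_windows_user_data(open("scripts/SetUpWinRM.ps1").read()) would only make sense for a script body that does not already contain the tags; SetUpWinRM.ps1 as shown above already includes them.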

Chocolatey, a blessing in disguise: installing applications on Windows through scripts is a real headache, as you have to write a lot of code just to install a single application. Luckily for us, there is Chocolatey. It works as a package manager for Windows and lets us install applications the same way we install packages on Linux. install_windows.ps1 contains the installation step for Chocolatey and shows how it can be used to install other applications on Windows.

See, such a small script, and you get all the components needed to run your Windows application in no time. (Kidding... this script actually takes around 20 minutes to run :P)

The remaining scripts can be found here.

Now that we have the image, let’s write the Terraform script that makes this machine a slave of the Jenkins master.

Creating Terraform Script for Spinning up Windows Slave and Connecting it to Master

As before, we will first create the security groups and then launch the slave machine from the AMI we built above.

resource "aws_security_group" "dev_jenkins_worker_windows" {
  name        = "dev_jenkins_worker_windows"
  description = "Jenkins Server: created by Terraform for [dev]"

  # legacy name of VPC ID
  vpc_id = "${data.aws_vpc.default_vpc.id}"

  tags {
    Name = "dev_jenkins_worker_windows"
    env  = "dev"
  }
}

###############################################################################
# ALL INBOUND
###############################################################################

# web UI (8080)
resource "aws_security_group_rule" "jenkins_worker_windows_from_source_ingress_webui" {
  type              = "ingress"
  from_port         = 8080
  to_port           = 8080
  protocol          = "tcp"
  security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "8080 to jenkins_worker_windows"
}

# rdp
resource "aws_security_group_rule" "jenkins_worker_windows_from_rdp" {
  type              = "ingress"
  from_port         = 3389
  to_port           = 3389
  protocol          = "tcp"
  security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
  cidr_blocks       = ["<Your Public IP>/32"]
  description       = "rdp to jenkins_worker_windows"
}

###############################################################################
# ALL OUTBOUND
###############################################################################

resource "aws_security_group_rule" "jenkins_worker_windows_to_all_80" {
  type              = "egress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "allow jenkins worker to all 80"
}

resource "aws_security_group_rule" "jenkins_worker_windows_to_all_443" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "allow jenkins worker to all 443"
}

resource "aws_security_group_rule" "jenkins_worker_windows_to_jenkins_server_33453" {
  type              = "egress"
  from_port         = 33453
  to_port           = 33453
  protocol          = "tcp"
  security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
  cidr_blocks       = ["172.31.0.0/16"]
  description       = "allow jenkins worker windows to jenkins server"
}

resource "aws_security_group_rule" "jenkins_worker_windows_to_jenkins_server_8080" {
  type                     = "egress"
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  security_group_id        = "${aws_security_group.dev_jenkins_worker_windows.id}"
  source_security_group_id = "${aws_security_group.jenkins_server.id}"
  description              = "allow jenkins workers windows to jenkins server"
}

resource "aws_security_group_rule" "jenkins_worker_windows_to_all_22" {
  type              = "egress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "allow jenkins worker windows to connect outbound from 22"
}

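The CIDR blocks in these rules are worth a quick sanity check. 172.31.0.0/16 is the CIDR of the AWS default VPC (the templates use data.aws_vpc.default_vpc), so the JNLP egress rule on 33453 covers any private address in that VPC. The /32 on the RDP rule admits exactly one address. The Python sketch below uses the standard ipaddress module; 203.0.113.7 is a documentation-range stand-in for <Your Public IP>, and allowed is a hypothetical helper.

```python
import ipaddress

# CIDR blocks used in the security group rules above.
vpc = ipaddress.ip_network("172.31.0.0/16")          # egress to the Jenkins server (JNLP 33453)
rdp_source = ipaddress.ip_network("203.0.113.7/32")  # stand-in for "<Your Public IP>/32"
anywhere = ipaddress.ip_network("0.0.0.0/0")         # egress on 80, 443 and 22


def allowed(address, network):
    """True if the address falls inside the rule's CIDR block."""
    return ipaddress.ip_address(address) in network
```

vpc.num_addresses is 65536 (2 to the power of 32 minus 16), while rdp_source.num_addresses is 1. Note that the 8080 egress rule does not use a CIDR at all; it references the Jenkins server's security group via source_security_group_id.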

Once the security groups are in place, we move on to the Terraform file for the Windows machine itself. The Windows slave can’t be attached over SSH the way we attached the Linux slave; instead, it connects to the Jenkins master using JNLP. A quick recap: when creating the Jenkins master, we used xmlstarlet to set the JNLP port and added security group rules to allow JNLP connections. We have also opened the RDP port so that, if any issue occurs, you can get into the machine and debug it.
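Before registering, both slave scripts wait for Jenkins, and the Windows one also waits for the JNLP port 33453. Both do so in loops with no upper bound, so a missing security group rule shows up as a silent hang. A minimal Python sketch of the same readiness check with an overall timeout is shown below; wait_for_port and its default timings are assumptions, not taken from the article.

```python
import socket
import time


def wait_for_port(host, port, timeout=600.0, interval=10.0):
    """Return True once host:port accepts a TCP connection, False if timeout (seconds) expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed TCP handshake is enough; nothing is sent.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # refused, unreachable or timed out: try again
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

Mirroring the scripts, a caller would check wait_for_port(server_ip, 8080) and then wait_for_port(server_ip, 33453), and fail loudly if either returns False.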

Terraform file:

# Setting Up Windows Slave 
data "aws_ami" "jenkins_worker_windows" {
  most_recent      = true
  owners           = ["self"]

  filter {
    name   = "name"
    values = ["windows-slave-for-jenkins*"]
  }
}

resource "aws_key_pair" "jenkins_worker_windows" {
  key_name   = "jenkins_worker_windows"
  public_key = "${file("jenkins_worker.pub")}"
}

data "template_file" "userdata_jenkins_worker_windows" {
  template = "${file("scripts/jenkins_worker_windows.ps1")}"

  vars {
    env         = "dev"
    region      = "us-east-1"
    datacenter  = "dev-us-east-1"
    node_name   = "us-east-1-jenkins_worker_windows"
    domain      = ""
    device_name = "eth0"
    server_ip   = "${aws_instance.jenkins_server.private_ip}"
    worker_pem  = "${data.local_file.jenkins_worker_pem.content}"
    jenkins_username = "admin"
    jenkins_password = "mysupersecretpassword"
  }
}

# Look up the security group of the Windows worker (created above)
data "aws_security_group" "jenkins_worker_windows" {
  filter {
    name   = "group-name"
    values = ["dev_jenkins_worker_windows"]
  }
}

resource "aws_launch_configuration" "jenkins_worker_windows" {
  name_prefix                 = "dev-jenkins-worker-"
  image_id                    = "${data.aws_ami.jenkins_worker_windows.image_id}"
  instance_type               = "t3.medium"
  iam_instance_profile        = "dev_jenkins_worker_windows"
  key_name                    = "${aws_key_pair.jenkins_worker_windows.key_name}"
  security_groups             = ["${data.aws_security_group.jenkins_worker_windows.id}"]
  user_data                   = "${data.template_file.userdata_jenkins_worker_windows.rendered}"
  associate_public_ip_address = false

  root_block_device {
    delete_on_termination = true
    volume_size = 100
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "jenkins_worker_windows" {
  name                      = "dev-jenkins-worker-windows"
  min_size                  = "1"
  max_size                  = "2"
  desired_capacity          = "2"
  health_check_grace_period = 60
  health_check_type         = "EC2"
  vpc_zone_identifier       = ["${data.aws_subnet_ids.default_public.ids}"]
  launch_configuration      = "${aws_launch_configuration.jenkins_worker_windows.name}"
  termination_policies      = ["OldestLaunchConfiguration"]
  wait_for_capacity_timeout = "10m"
  default_cooldown          = 60

  #lifecycle {
  #  create_before_destroy = true
  #}


  ## on replacement, gives new service time to spin up before moving on to destroy
  #provisioner "local-exec" {
  #  command = "sleep 60"
  #}

  tags = [
    {
      key                 = "Name"
      value               = "dev_jenkins_worker_windows"
      propagate_at_launch = true
    },
    {
      key                 = "class"
      value               = "dev_jenkins_worker_windows"
      propagate_at_launch = true
    },
  ]
}

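The aws_ami data source above picks the newest AMI owned by the account whose name matches windows-slave-for-jenkins*. This works because Packer stamps every build name with isotime. The Python sketch below models that selection locally on image records shaped like EC2 DescribeImages output (Name, CreationDate). It illustrates the lookup logic only; it is not the Terraform provider's code, fnmatch merely approximates AWS's * wildcard, and the sample records in the test are made up.

```python
from fnmatch import fnmatchcase


def most_recent_image(images, name_pattern):
    """Mimic aws_ami with most_recent = true: newest CreationDate among images whose Name matches."""
    matching = [img for img in images if fnmatchcase(img["Name"], name_pattern)]
    if not matching:
        raise LookupError(f"no image matches {name_pattern!r}")
    # CreationDate is an ISO 8601 UTC timestamp in a fixed format,
    # so plain string comparison orders it chronologically.
    return max(matching, key=lambda img: img["CreationDate"])
```

Because each Packer run produces a new, later image, re-running terraform apply after a rebuild makes the launch configuration pick up the fresh AMI, and create_before_destroy lets the replacement be created before the old configuration is removed.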

Finally, we reach the user data that the Terraform plan passes to the instances. It downloads the required jar files, creates a node on Jenkins and registers the machine as a slave.

<powershell>

function Wait-For-Jenkins {

  Write-Host "Waiting for Jenkins to launch on 8080..."

  Do {
  Write-Host "Waiting for Jenkins"

   Nc -zv ${server_ip} 8080
   If( $? -eq $true ) {
     Break
   }
   Sleep 10

  } While (1)

  Do {
   Write-Host "Waiting for JNLP"
      
   Nc -zv ${server_ip} 33453
   If( $? -eq $true ) {
    Break
   }
   Sleep 10

  } While (1)      

  Write-Host "Jenkins launched"
}

function Slave-Setup()
{
  # Register_slave
  $JENKINS_URL="http://${server_ip}:8080"

  $USERNAME="${jenkins_username}"
  
  $PASSWORD="${jenkins_password}"

  $AUTH = -join ("$USERNAME", ":", "$PASSWORD")
  echo $AUTH

  # Below IP collection logic works for Windows Server 2016 edition and needs testing for windows server 2008 edition
  $SLAVE_IP=(ipconfig | findstr /r "[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*" | findstr "IPv4 Address").substring(39) | findstr /B "172.31"
  
  $NODE_NAME="jenkins-slave-windows-$SLAVE_IP"
  
  $NODE_SLAVE_HOME="C:\Jenkins\"
  $EXECUTORS=2
  $JNLP_PORT=33453

  $CRED_ID="$NODE_NAME"
  $LABELS="build windows"
  
  # Creating CMD utility for jenkins-cli commands
  # This is not working in windows therefore specify full path
  $jenkins_cmd = "java -jar C:\Jenkins\jenkins-cli.jar -s $JENKINS_URL -auth admin:$PASSWORD"

  Sleep 20

  Write-Host "Downloading jenkins-cli.jar file"
  (New-Object System.Net.WebClient).DownloadFile("$JENKINS_URL/jnlpJars/jenkins-cli.jar", "C:\Jenkins\jenkins-cli.jar")

  Write-Host "Downloading slave.jar file"
  (New-Object System.Net.WebClient).DownloadFile("$JENKINS_URL/jnlpJars/slave.jar", "C:\Jenkins\slave.jar")

  Sleep 10

  # Waiting for Jenkins to load all plugins
  Do {
  
    $count=(java -jar C:\Jenkins\jenkins-cli.jar -s $JENKINS_URL -auth $AUTH list-plugins | Measure-Object -line).Lines
    $ret=$?

    Write-Host "count [$count] ret [$ret]"

    If ( $count -gt 0 ) {
        Break
    }

    sleep 30
  } While ( 1 )

  # For Deleting Node, used when testing
  Write-Host "Deleting Node $NODE_NAME if present"
  java -jar C:\Jenkins\jenkins-cli.jar -s $JENKINS_URL -auth $AUTH delete-node $NODE_NAME
  
  # Generating node.xml for creating node on Jenkins server
  $NodeXml = @"
<slave>
<name>$NODE_NAME</name>
<description>Windows Slave</description>
<remoteFS>$NODE_SLAVE_HOME</remoteFS>
<numExecutors>$EXECUTORS</numExecutors>
<mode>NORMAL</mode>
<retentionStrategy class="hudson.slaves.RetentionStrategy`$Always`"/>
<launcher class="hudson.slaves.JNLPLauncher">
  <workDirSettings>
    <disabled>false</disabled>
    <internalDir>remoting</internalDir>
    <failIfWorkDirIsMissing>false</failIfWorkDirIsMissing>
  </workDirSettings>
</launcher>
<label>$LABELS</label>
<nodeProperties/>
</slave>
"@
  $NodeXml | Out-File -FilePath C:\Jenkins\node.xml 

  type C:\Jenkins\node.xml

  # Creating node using node.xml
  Write-Host "Creating $NODE_NAME"
  Get-Content -Path C:\Jenkins\node.xml | java -jar C:\Jenkins\jenkins-cli.jar -s $JENKINS_URL -auth $AUTH create-node $NODE_NAME

  Write-Host "Registering Node $NODE_NAME via JNLP"
  Start-Process java -ArgumentList "-jar C:\Jenkins\slave.jar -jnlpCredentials $AUTH -jnlpUrl $JENKINS_URL/computer/$NODE_NAME/slave-agent.jnlp"
}

### script begins here ###

Wait-For-Jenkins

Slave-Setup

echo "Done"
</powershell>
<persist>true</persist>

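The slave IP above is extracted by parsing ipconfig output at a fixed column (substring(39)) and filtering on the 172.31 prefix. The script's own comment notes this has only been tested on Windows Server 2016. A sturdier approach is to collect the machine's addresses and pick the one inside the VPC CIDR. The Python sketch below shows only that selection step. Gathering the address list is assumed (on Windows it could come from Get-NetIPAddress), and pick_vpc_address is a hypothetical name.

```python
import ipaddress


def pick_vpc_address(addresses, cidr="172.31.0.0/16"):
    """Return the first IPv4 address inside the given CIDR, or None if there is none."""
    network = ipaddress.ip_network(cidr)
    for raw in addresses:
        try:
            address = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # skip entries that are not IP addresses
        # IPv6 and link-local addresses simply fail the membership test.
        if address.version == 4 and address in network:
            return str(address)
    return None
```

Comparing against the CIDR rather than a string prefix keeps the logic independent of ipconfig's column layout, and the same check can be pointed at a non-default VPC by changing cidr.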

Commands to run: initialize Terraform with terraform init, then check and apply the changes with terraform plan followed by terraform apply.

The same drawbacks apply here, and the same solutions work as well.

Congratulations! You now have a Jenkins master with both Windows and Linux slaves attached to it.

IAM roles for reference

Jenkins Master

Linux Slave

Windows Slave

Bonus:

If you want to grant IAM permissions to the user but cannot assign FULL ACCESS, here is a curated list for reference:

Packer Policy

Terraform Policy

Conclusion:

This blog highlights one way to use Packer and Terraform to create AMIs that serve as a Jenkins master and slaves. We covered not only their creation but also how to associate security groups, and we looked at some basic IAM roles that can be applied. Although we have covered most of the common scenarios, your use case may still call for a few small changes; with those, this can serve as boilerplate code when you begin planning your infrastructure in the cloud.

December 12, 2022

Setting Up A Robust Authentication Environment For OpenSSH Using QR Code PAM

Do you like WhatsApp Web authentication? WhatsApp Web has always fascinated me with the simplicity of its QR-code-based authentication. Though similar authentication UIs exist, I always wondered whether a remote secure shell (SSH) login could be authenticated with a QR code with the same simplicity while keeping the auth process secure. In this guide, we will see how to write and implement a bare-bones PAM module for OpenSSH on a Linux-based system.

“OpenSSH is the premier connectivity tool for remote login with the SSH protocol. It encrypts all traffic to eliminate eavesdropping, connection hijacking, and other attacks. In addition, OpenSSH provides a large suite of secure tunneling capabilities, several authentication methods, and sophisticated configuration options.”

– openssh.com

Meet PAM!

PAM, short for “Pluggable Authentication Module,” is middleware that abstracts authentication features on Linux and UNIX-like operating systems. PAM has been around for more than two decades. Authentication can be cumbersome, with each service authenticating users against a different set of hardware and software, such as username-password, fingerprint modules, face recognition, two-factor authentication, LDAP, etc. But the underlying process remains the same, i.e., users must be verified as who they say they are. This is where PAM comes into the picture: it provides an API to the application layer, along with built-in functions to implement and extend PAM capability.

Source: Red Hat

Understand how OpenSSH interacts with PAM

When PAM is enabled, the OpenSSH daemon (sshd) reads its PAM configuration from /etc/pam.d/sshd (or, on systems without an /etc/pam.d directory, from /etc/pam.conf). The configuration lists modules under various realms (auth, account, session, password); the “auth” realm is responsible for verifying that users are who they claim to be. A typical sshd PAM service file on Ubuntu is shown below; yours may differ slightly depending on your Linux flavor:

@include common-auth
account    required     pam_nologin.so
@include common-account
session [success=ok ignore=ignore module_unknown=ignore default=bad]        pam_selinux.so close
session    required     pam_loginuid.so
session    optional     pam_keyinit.so force revoke
@include common-session
session    optional     pam_motd.so  motd=/run/motd.dynamic
session    optional     pam_motd.so noupdate
session    optional     pam_mail.so standard noenv # [1]
session    required     pam_limits.so
session    required     pam_env.so # [1]
session    required     pam_env.so user_readenv=1 envfile=/etc/default/locale
session [success=ok ignore=ignore module_unknown=ignore default=bad]        pam_selinux.so open
@include common-password

The common-auth file adds the pam_unix.so PAM module to the “auth” realm; pam_unix.so is responsible for authenticating the user with a password. Our goal is to write a PAM module that takes its place in the auth realm.

When sshd calls pam_authenticate(), libpam invokes the pam_sm_authenticate function exported by each module in the “auth” realm; an auth module must also export pam_sm_setcred. We will therefore implement pam_sm_authenticate as the entry point of our shared object library. It must return PAM_SUCCESS (0) for a successful authentication.

Application Architecture

The project consists of four main applications. The backend is hosted on AWS using minimal, low-cost infrastructure resources.

1. PAM Module: presents a QR code authentication prompt to the SSH client at login

2. Android Mobile App: authenticates the SSH login by scanning the QR code

3. QR Auth Server API: backend application that the Android app calls to submit the authentication payload along with some other metadata

4. WebSocket Server App (API Gateway WebSocket and NodeJS): channel over which the PAM module and the server-side app exchange auth messages in real time

When a user connects to the remote server via SSH, the PAM module is triggered and presents a QR code for authentication. The PAM module exchanges information with the API Gateway WebSocket API, which in turn saves temporary auth data in DynamoDB. The user then scans the QR code with an Android mobile app (written in React Native).

Upon scanning, the app calls the API Gateway HTTP API. The call is first authorized with AWS Cognito to keep out unauthenticated callers. The request is then proxied to a Lambda function, which validates the input payload against the information stored in DynamoDB. Upon successful validation, the Lambda function calls the API Gateway WebSocket API to tell the PAM module to authenticate the user.

Framework and Toolchains

PAM modules are shared object libraries and are normally written in C (other languages can be used as long as they produce a compatible shared object, and modules such as pam_python or pam_exec let PAM delegate to code written in other languages). Below are the frameworks and tools used in this project:

1. gcc, make, automake, autoreconf (GNU build tools on Ubuntu OS)

2. libqrencode, libwebsockets, libpam, libssl, libcrypto (C libraries)

3. NodeJS, express (for server-side app)

4. API Gateway (HTTP and WebSocket APIs) and AWS Lambda (AWS services hosting the serverless server-side apps)

5. Serverless Framework (for deploying the infrastructure easily)

6. react-native, react-native-qrcode-scanner (for Android mobile app)

7. AWS Cognito (for authentication)

8. AWS Amplify Library

This guide assumes you have a basic understanding of the Linux OS, C programming language, pointers, and gcc code compilation. For the backend APIs, I prefer to use NodeJS as a primary programming language, but you may opt for the language of your choice for designing HTTP APIs.

Authentication with QR Code PAM Module

When the module initializes, we first generate a random string using the /dev/urandom character device. The bytes obtained from this device include non-printable characters, so we encode them with Base64. Let’s call the result the auth verification string.

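#include <stdio.h>
#include <stdlib.h>
#include <security/pam_modules.h>  /* PAM_AUTH_ERR */
#include "base64.h"  /* project's Base64 helper: Base64encode_len(), Base64encode() (header name assumed) */

/* Read `length` raw bytes from /dev/urandom; returns 0 on success, -1 on failure.
   A PAM module should fail the login rather than call exit(), which would
   terminate the sshd process that loaded it. */
int get_random_string(char *random_str, int length)
{
    FILE *fp = fopen("/dev/urandom", "r");
    if (!fp) {
        perror("Unable to open urandom device");
        return -1;
    }
    size_t got = fread(random_str, 1, length, fp);
    fclose(fp);
    return got == (size_t)length ? 0 : -1;
}

/* ...later, inside pam_sm_authenticate(): */
char random_string[11];
char *base64_string;

/* get 10 random bytes */
if (get_random_string(random_string, 10) != 0)
    return PAM_AUTH_ERR;
/* the bytes come from /dev/urandom and may be non-printable, so Base64-encode them */
const int encoded_length = Base64encode_len(10);
base64_string = (char *)malloc(encoded_length + 1);
if (!base64_string)
    return PAM_AUTH_ERR;
Base64encode(base64_string, random_string, 10);
base64_string[encoded_length] = '\0';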
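For readers without the Base64 helper, the sketch below performs the same step using only the C standard library: it reads 10 bytes from /dev/urandom and encodes them with the standard RFC 4648 alphabet. The names read_urandom, base64_encode and make_auth_key are hypothetical, not taken from the project. Because 10 bytes are three full 3-byte groups plus one leftover byte, the encoded string is always 16 characters long and ends in “==”, like the “UX6t4PcS5doEeA==” example later in this guide.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Standard Base64 alphabet (RFC 4648, section 4). */
static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encoded length for n input bytes, not counting the terminating NUL. */
size_t base64_encoded_len(size_t n)
{
    return ((n + 2) / 3) * 4;
}

/* Encode n bytes of src into dst (needs base64_encoded_len(n) + 1 bytes). */
void base64_encode(char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0, o = 0;
    for (; i + 2 < n; i += 3) {                 /* full 3-byte groups */
        unsigned long v = ((unsigned long)src[i] << 16) |
                          ((unsigned long)src[i + 1] << 8) | src[i + 2];
        dst[o++] = B64[(v >> 18) & 0x3F];
        dst[o++] = B64[(v >> 12) & 0x3F];
        dst[o++] = B64[(v >> 6) & 0x3F];
        dst[o++] = B64[v & 0x3F];
    }
    if (i < n) {                                /* 1 or 2 leftover bytes */
        unsigned long v = (unsigned long)src[i] << 16;
        if (i + 1 < n)
            v |= (unsigned long)src[i + 1] << 8;
        dst[o++] = B64[(v >> 18) & 0x3F];
        dst[o++] = B64[(v >> 12) & 0x3F];
        dst[o++] = (i + 1 < n) ? B64[(v >> 6) & 0x3F] : '=';
        dst[o++] = '=';
    }
    dst[o] = '\0';
}

/* Read exactly n bytes from /dev/urandom. Returns 0 on success, -1 on error,
   so a PAM module can fail the login instead of exiting the sshd process. */
int read_urandom(unsigned char *buf, size_t n)
{
    FILE *fp = fopen("/dev/urandom", "rb");
    if (!fp)
        return -1;
    size_t got = fread(buf, 1, n, fp);
    fclose(fp);
    return got == n ? 0 : -1;
}

/* Build an auth verification string from 10 random bytes (16 Base64 chars).
   out must hold at least 17 bytes. Returns 0 on success, -1 on error. */
int make_auth_key(char *out, size_t out_size)
{
    unsigned char raw[10];
    if (out_size < base64_encoded_len(sizeof raw) + 1)
        return -1;
    if (read_urandom(raw, sizeof raw) != 0)
        return -1;
    base64_encode(out, raw, sizeof raw);
    return 0;
}
```

Returning an error code instead of exiting keeps the failure inside the PAM conversation, where the module can simply answer PAM_AUTH_ERR.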

We then open a WebSocket connection to our API Gateway WebSocket endpoint using the libwebsockets library. Once the connection is established, the PAM module obtains its unique connection ID from the API Gateway WebSocket and informs the server that a user may try to authenticate with the auth verification string.

static void connect_client(struct lws_sorted_usec_list *sul)
{
    struct vhd_minimal_client_echo *vhd =
        lws_container_of(sul, struct vhd_minimal_client_echo, sul);
    struct lws_client_connect_info i;
    char host[128];

    lws_snprintf(host, sizeof(host), "%s:%u", *vhd->ads, *vhd->port);
    memset(&i, 0, sizeof(i));
    i.context = vhd->context;
    i.port = *vhd->port;
    i.address = *vhd->ads;
    i.path = *vhd->url;
    i.host = host;
    i.origin = host;
    /* Use TLS and keep certificate and hostname verification enabled: this
       channel carries the auth verdict, so accepting self-signed or mismatched
       certificates would let a man-in-the-middle approve logins. */
    i.ssl_connection = LCCSCF_USE_SSL | LCCSCF_PIPELINE;
    i.vhost = vhd->vhost;
    i.iface = *vhd->iface;
    i.pwsi = &vhd->client_wsi;

    log_message(LOG_INFO, ws_applogic.pamh, "About to create connection %s", host);
    /* if the connection attempt fails, retry in 10 seconds */
    if (!lws_client_connect_via_info(&i))
        lws_sul_schedule(vhd->context, 0, &vhd->sul,
                         connect_client, 10 * LWS_US_PER_SEC);
}
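The messages the module sends over this connection are small JSON documents. The route names and payload fields below come from the API Gateway WebSocket template and handler shown later in this guide (routes are selected on $request.body.action; the handler reads authkey, username and challengeText). The exact strings in the module's ws_com_strings table are not shown, so this is a sketch under that assumption, and build_expectauth_msg and the other function names are hypothetical. Inputs that would need JSON escaping are rejected rather than escaped, which is enough for Base64 keys and ordinary usernames.

```c
#include <stdio.h>
#include <string.h>

/* 1 if s can go inside a JSON string literal as-is (no quote, backslash or
   control character), else 0. Base64 keys always pass. */
static int json_safe(const char *s)
{
    for (; *s; s++)
        if (*s == '"' || *s == '\\' || (unsigned char)*s < 0x20)
            return 0;
    return 1;
}

/* Turn snprintf's result into a length, or -1 if it failed or was truncated. */
static int checked_len(int n, size_t size)
{
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* {"action":"getconid"}: ask the server for this connection's ID. */
int build_getconid_msg(char *buf, size_t size)
{
    return checked_len(snprintf(buf, size, "{\"action\":\"getconid\"}"), size);
}

/* {"action":"expectauth",...}: announce that authkey may be verified for username. */
int build_expectauth_msg(char *buf, size_t size, const char *authkey,
                         const char *username)
{
    if (!json_safe(authkey) || !json_safe(username))
        return -1;
    return checked_len(snprintf(buf, size,
        "{\"action\":\"expectauth\",\"authkey\":\"%s\",\"username\":\"%s\"}",
        authkey, username), size);
}

/* {"action":"verifyauth",...}: ask whether authkey was verified, with a challenge. */
int build_verifyauth_msg(char *buf, size_t size, const char *authkey,
                         const char *challenge)
{
    if (!json_safe(authkey) || !json_safe(challenge))
        return -1;
    return checked_len(snprintf(buf, size,
        "{\"action\":\"verifyauth\",\"authkey\":\"%s\",\"challengeText\":\"%s\"}",
        authkey, challenge), size);
}
```

Each builder returns the message length, which is what a WebSocket text-frame write needs, or -1 so the caller can abort the login instead of sending a truncated message.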

Upon receiving the connection ID from the server, the PAM module computes the SHA-1 hash of the connection ID as a hex string and composes the unique string to encode in the QR code. This string consists of three parts separated by colons (:):

qrauth:BASE64(AUTH_VERIFY_STRING):SHA1(CONNECTION_ID)

For example, suppose the random Base64-encoded string is “UX6t4PcS5doEeA==” and the connection ID is “KZlfidYvBcwCFFw=”. Then the final string is “qrauth:UX6t4PcS5doEeA==:2fc58b0cc3b13c3f2db49a5b4660ad47c873b81a”.

The libqrencode library then turns this string into a QR code drawn with UTF-8 characters, and the PAM module displays it as the authentication prompt.

char *con_id = strstr(msg, ws_com_strings[READ_WS_CONNECTION_ID]);
int length = strlen(ws_com_strings[READ_WS_CONNECTION_ID]);

if (!con_id) {
    pam_login_status = PAM_AUTH_ERR;
    interrupted = 1;
    return;
}
con_id += length;   /* skip the connection-ID prefix */
log_message(LOG_DEBUG, ws_applogic.pamh, "strstr is %s", con_id);
string_crypt(ws_applogic.sha_code_hex, con_id);   /* SHA-1 hex of the connection ID */
sprintf(temp_text, "qrauth:%s:%s", ws_applogic.authkey, ws_applogic.sha_code_hex);
char *qr_encoded_text = get_qrcode_string(temp_text);
if (!qr_encoded_text) {
    pam_login_status = PAM_AUTH_ERR;
    interrupted = 1;
    return;
}
ws_applogic.qr_encoded_text = qr_encoded_text;
conv_info(ws_applogic.pamh, "\nSSH Auth via QR Code\n\n");
conv_info(ws_applogic.pamh, ws_applogic.qr_encoded_text);
log_message(LOG_INFO, ws_applogic.pamh, "Use Mobile App to Scan \n %s", ws_applogic.qr_encoded_text);
log_message(LOG_INFO, ws_applogic.pamh, "%s", temp_text);
ws_applogic.current_action = READ_WS_AUTH_VERIFIED;
/* tell the server to expect an authentication for this key and user */
sprintf(temp_text, ws_com_strings[SEND_WS_EXPECT_AUTH], ws_applogic.authkey, ws_applogic.username);
websocket_write_back(wsi, temp_text, -1);
conv_read(ws_applogic.pamh, "\n\nUse Mobile SSH QR Auth App to Authenticate SSH Login and Press Enter\n\n", PAM_PROMPT_ECHO_ON);
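The string_crypt helper is not shown in this guide; given the libssl and libcrypto link flags, it presumably computes the SHA-1 digest with OpenSSL. To keep the sketch dependency-free, below is a compact SHA-1 (as specified in FIPS 180-4) that produces the 40-character lowercase hex string, plus a bounded snprintf version of the payload composition. sha1_hex and compose_qr_payload are hypothetical names; in a real module, prefer OpenSSL's EVP digest API over hand-written crypto.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t rol32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Compress one 64-byte block into the running state h[0..4]. */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
    uint32_t w[80];
    for (int t = 0; t < 16; t++)
        w[t] = (uint32_t)p[4 * t] << 24 | (uint32_t)p[4 * t + 1] << 16 |
               (uint32_t)p[4 * t + 2] << 8 | (uint32_t)p[4 * t + 3];
    for (int t = 16; t < 80; t++)
        w[t] = rol32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);

    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int t = 0; t < 80; t++) {
        uint32_t f, k;
        if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        uint32_t tmp = rol32(a, 5) + f + e + k + w[t];
        e = d; d = c; c = rol32(b, 30); b = a; a = tmp;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* SHA-1 of a NUL-terminated string as 40 lowercase hex digits plus NUL. */
void sha1_hex(const char *msg, char out[41])
{
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    const unsigned char *m = (const unsigned char *)msg;
    size_t len = strlen(msg), i = 0;

    for (; len - i >= 64; i += 64)          /* all complete 64-byte blocks */
        sha1_block(h, m + i);

    /* Final block(s): leftover bytes, 0x80, zero padding, 64-bit bit length. */
    unsigned char tail[128] = {0};
    size_t rem = len - i, tail_len = rem < 56 ? 64 : 128;
    memcpy(tail, m + i, rem);
    tail[rem] = 0x80;
    uint64_t bits = (uint64_t)len * 8;
    for (int j = 0; j < 8; j++)
        tail[tail_len - 1 - j] = (unsigned char)(bits >> (8 * j));
    for (size_t off = 0; off < tail_len; off += 64)
        sha1_block(h, tail + off);

    for (int j = 0; j < 5; j++)
        snprintf(out + 8 * j, 9, "%08lx", (unsigned long)h[j]);
}

/* "qrauth:<authkey>:<sha1 hex of connection_id>"; returns length or -1. */
int compose_qr_payload(char *buf, size_t size, const char *authkey,
                       const char *connection_id)
{
    char hex[41];
    sha1_hex(connection_id, hex);
    int n = snprintf(buf, size, "qrauth:%s:%s", authkey, hex);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For instance, compose_qr_payload(buf, sizeof buf, "k", "abc") yields “qrauth:k:a9993e364706816aba3e25717850c26c9cd0d89d”, because that is the standard SHA-1 test vector for “abc”. Using snprintf with the buffer size also avoids the overflow risk of a bare sprintf.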

API Gateway WebSocket App

We used the Serverless Framework to create and deploy our infrastructure resources easily. With the serverless CLI, we used the aws-nodejs template (serverless create --template aws-nodejs). You can find a detailed guide on Serverless, API Gateway WebSocket, and DynamoDB here. Below is the template YAML definition. Note that the DynamoDB table has TTL enabled on the expires_at attribute, which holds a UNIX epoch timestamp.

This means that each stored record is deleted automatically once its expires_at time has passed. Deletion is a background process and can lag behind the expiry time, so handlers that read the table should also check expires_at themselves. We keep each record for only 5 minutes, which means the user must authenticate within 5 minutes of the authentication request to the remote SSH server.

service: ssh-qrapp-websocket
frameworkVersion: '2'
useDotenv: true
provider:
  name: aws
  runtime: nodejs12.x
  lambdaHashingVersion: 20201221
  websocketsApiName: ssh-qrapp-websocket
  websocketsApiRouteSelectionExpression: $request.body.action
  region: ap-south-1
  iam:
    role:
      statements:
        - Effect: Allow
          Action:
            - "dynamodb:Query"
            - "dynamodb:GetItem"
            - "dynamodb:PutItem"
          Resource:
            - Fn::GetAtt: [ SSHAuthDB, Arn ]
  environment:
    REGION: ${env:REGION}
    DYNAMODB_TABLE: ${env:DYNAMODB_TABLE}
    WEBSOCKET_ENDPOINT: ${env:WEBSOCKET_ENDPOINT}
    NODE_ENV: ${env:NODE_ENV}
package:
  patterns:
    - '!node_modules/**'
    - handler.js
    - '!package.json'
    - '!package-lock.json'
plugins:
  - serverless-dotenv-plugin
layers:
  sshQRAPPLibs:
    path: layer
    compatibleRuntimes:
      - nodejs12.x
functions:
  connectionHandler:
    handler: handler.connectHandler
    timeout: 60
    memorySize: 256
    layers:
      - {Ref: SshQRAPPLibsLambdaLayer}
    events:
      - websocket:
          route: $connect
          routeResponseSelectionExpression: $default
  disconnectHandler:
    handler: handler.disconnectHandler
    memorySize: 256
    timeout: 60
    layers:
      - {Ref: SshQRAPPLibsLambdaLayer}
    events:
      - websocket: $disconnect
  defaultHandler:
    handler: handler.defaultHandler
    memorySize: 256
    timeout: 60
    layers:
      - {Ref: SshQRAPPLibsLambdaLayer}
    events:
      - websocket: $default
  customQueryHandler:
    handler: handler.queryHandler
    memorySize: 256
    timeout: 60
    layers:
      - {Ref: SshQRAPPLibsLambdaLayer}
    events:
      - websocket:
          route: expectauth
          routeResponseSelectionExpression: $default
      - websocket:
          route: getconid
          routeResponseSelectionExpression: $default
      - websocket:
          route: verifyauth
          routeResponseSelectionExpression: $default
resources:
  Resources:
    SSHAuthDB:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: ${env:DYNAMODB_TABLE}
        AttributeDefinitions:
          - AttributeName: authkey
            AttributeType: S
        KeySchema:
          - AttributeName: authkey
            KeyType: HASH
        TimeToLiveSpecification:
          AttributeName: expires_at
          Enabled: true
        ProvisionedThroughput:
          ReadCapacityUnits: 2
          WriteCapacityUnits: 2
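The TTL arithmetic this template relies on is simple, but DynamoDB removes expired items in the background rather than exactly at expires_at, so anything that reads a record should apply the cut-off itself. The sketch below expresses both sides in C; AUTH_WINDOW_SECONDS and the function names are hypothetical, and the value 300 matches the 5-minute window used by the handler.

```c
#include <stdbool.h>
#include <time.h>

/* Lifetime of an auth request: 5 minutes, matching "+ 300" in the handler. */
#define AUTH_WINDOW_SECONDS 300

/* expires_at to store with a new request: UNIX epoch seconds, as TTL expects. */
long long auth_expires_at(time_t now)
{
    return (long long)now + AUTH_WINDOW_SECONDS;
}

/* TTL deletes expired items in the background, possibly well after
   expires_at, so readers apply the same cut-off themselves. */
bool auth_record_expired(long long expires_at, time_t now)
{
    return (long long)now > expires_at;
}

/* Seconds left to scan the QR code (0 once expired), e.g. for prompt text. */
long long auth_seconds_left(long long expires_at, time_t now)
{
    return auth_record_expired(expires_at, now) ? 0 : expires_at - (long long)now;
}
```

A request created at epoch second 1000 stores expires_at = 1300 and is treated as expired from second 1301 onwards, whether or not DynamoDB has deleted the item yet.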

The API Gateway WebSocket API has three custom routes. The route is chosen by the route selection expression $request.body.action, i.e., the “action” field of the JSON message body, which the Lambda function reads after parsing event.body. The custom routes are:

1. “expectauth” is sent by the PAM module to tell the WebSocket server that a client has requested authentication and that the mobile app may try to authenticate by scanning the QR code. The handler stores the connection ID and username together with the auth verification string, which serves as the primary key of the DynamoDB table.

2. “getconid” is sent to retrieve the current connection ID so that the PAM module can compute its SHA-1 hash and display the QR code prompt.

3. “verifyauth” is sent by the PAM module to confirm authentication. Along with the auth verification string, the module sends a random challenge text. The server retrieves the record from DynamoDB using the auth verification string as the primary key and checks that “authVerified” is “true” (more on this later); if so, it echoes the challenge back as “authverified:<challenge>”.
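On the PAM side, the reply to “verifyauth” deserves a strict check. The handler returns “authverified:” followed by the challenge text only when authVerified is true, and a plain “ok” otherwise. The sketch below assumes the reply arrives as a NUL-terminated string and that the module sends a fresh challenge for each attempt; auth_reply_matches is a hypothetical name. The echoed challenge ties a reply to the current request, while trust in the reply itself still rests on the TLS connection to API Gateway.

```c
#include <stdbool.h>
#include <string.h>

/* Prefix the verifyauth handler puts in front of the echoed challenge. */
#define AUTH_VERIFIED_PREFIX "authverified:"

/* True only when the server replied with exactly "authverified:<challenge>"
   for the challenge this module sent. Any other reply, including the
   handler's fallback body "ok", means the login is not verified (yet). */
bool auth_reply_matches(const char *reply, const char *challenge)
{
    const size_t plen = sizeof AUTH_VERIFIED_PREFIX - 1;
    if (reply == NULL || challenge == NULL || challenge[0] == '\0')
        return false;               /* an empty challenge proves nothing */
    if (strncmp(reply, AUTH_VERIFIED_PREFIX, plen) != 0)
        return false;
    return strcmp(reply + plen, challenge) == 0;
}
```

Comparing the whole remainder with strcmp, rather than only checking the prefix, rejects replies that merely start with the expected challenge.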

// handler.js (excerpt)
const { DynamoDB } = require('aws-sdk');

const documentClient = new DynamoDB.DocumentClient({
  region: process.env.REGION
});

module.exports.queryHandler = async (event) => {
  try {
    const payload = JSON.parse(event.body);
    switch (payload.action) {
      case 'expectauth': {
        // keep the record for 5 minutes (UNIX epoch seconds, used by DynamoDB TTL)
        const expires_at = Math.floor(Date.now() / 1000) + 300;
        await documentClient.put({
          TableName: process.env.DYNAMODB_TABLE,
          Item: {
            authkey: payload.authkey,
            connectionId: event.requestContext.connectionId,
            username: payload.username,
            expires_at: expires_at,
            authVerified: false
          }
        }).promise();
        return { statusCode: 200, body: 'OK' };
      }
      case 'getconid':
        return {
          statusCode: 200,
          body: `connectionid:${event.requestContext.connectionId}`
        };
      case 'verifyauth': {
        const data = await documentClient.get({
          TableName: process.env.DYNAMODB_TABLE,
          Key: { authkey: payload.authkey }
        }).promise();
        if (!data.Item) {
          throw new Error('Failed to query data');
        }
        // TTL deletion can lag, so reject expired records explicitly
        if (data.Item.expires_at < Math.floor(Date.now() / 1000)) {
          throw new Error('auth request expired');
        }
        if (data.Item.authVerified === true) {
          return {
            statusCode: 200,
            body: `authverified:${payload.challengeText}`
          };
        }
        throw new Error('auth verification failed');
      }
    }
  } catch (error) {
    console.log(error);
  }
  // any other outcome: the PAM module does not receive "authverified:..."
  return { statusCode: 200, body: 'ok' };
};

Android App: SSH QR Code Auth

The Android app has two parts: app login and scanning the QR code for authentication. AWS Cognito and the Amplify library simplify secure login: wrapping your React Native app with the “withAuthenticator” component gives you a ready-to-use login screen. We then use the react-native-qrcode-scanner component to scan the QR code.

This component returns the decoded string on a successful scan. The application logic then splits the string and checks that it is valid. If the decoded string is a valid application string, an API call is made to the server with the appropriate payload.

render() {
  return (
    <View style={styles.container}>
      {this.state.authQRCode ?
        <AuthQRCode
          hideAuthQRCode={this.hideAuthQRCode}
          qrScanData={this.qrScanData}
        />
        :
        <View style={{marginVertical: 10}}>
          <Button title="Auth SSH Login" onPress={this.showAuthQRCode} />
          <View style={{margin: 10}} />
          <Button title="Sign Out" onPress={this.signout} />
        </View>
      }
    </View>
  );
}

// Scan handler passed to <AuthQRCode> above; e.data is the decoded QR string
qrScanData = async (e) => {
  try {
    // expected format: qrauth:<auth verification string>:<sha1 of connection id>
    const scanCode = e.data.split(':');
    if (scanCode.length !== 3) {
      throw new Error('invalid qr code');
    }
    const [appstring, authcode, shacode] = scanCode;
    if (appstring !== 'qrauth') {
      throw new Error('Not a valid app qr code');
    }
    // the Cognito ID token authorizes the request at API Gateway
    const authsession = await Auth.currentSession();
    const jwtToken = authsession.getIdToken().getJwtToken();
    const response = await axios({
      url: 'https://API_GATEWAY_URL/v1/app/sshqrauth/qrauth',
      method: 'post',
      headers: {
        Authorization: jwtToken,
        'Content-Type': 'application/json'
      },
      responseType: 'json',
      data: { authcode, shacode }
    });
    if (response.data.status === 200) {
      rescanQRCode = false;
      setTimeout(this.hideAuthQRCode, 1000);
    }
  } catch (error) {
    console.log(error);
  }
};
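The validation the app performs with split(':') can be written in C to match the other examples in this guide. This version is slightly stricter than the JavaScript: it requires exactly three fields, the “qrauth” prefix, a non-empty key, and a 40-character hex hash. parse_qr_payload is a hypothetical helper, not part of the project; the app would apply the equivalent checks before calling the QR Auth API.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Split "qrauth:<authkey>:<sha1hex>" into caller-provided buffers.
   Returns false unless the text has exactly three colon-separated fields,
   the "qrauth" prefix, a non-empty key that fits, and a 40-char hex hash. */
bool parse_qr_payload(const char *text, char *authkey, size_t authkey_size,
                      char shacode[41])
{
    const char *p1 = strchr(text, ':');
    if (!p1 || (size_t)(p1 - text) != 6 || strncmp(text, "qrauth", 6) != 0)
        return false;
    const char *p2 = strchr(p1 + 1, ':');
    if (!p2 || strchr(p2 + 1, ':'))          /* need exactly two separators */
        return false;

    size_t klen = (size_t)(p2 - (p1 + 1));
    if (klen == 0 || klen >= authkey_size)
        return false;

    const char *hash = p2 + 1;
    if (strlen(hash) != 40)
        return false;
    for (size_t i = 0; i < 40; i++)
        if (!isxdigit((unsigned char)hash[i]))
            return false;

    memcpy(authkey, p1 + 1, klen);
    authkey[klen] = '\0';
    memcpy(shacode, hash, 41);               /* 40 hex digits + NUL */
    return true;
}
```

With the example string from earlier in this guide, “qrauth:UX6t4PcS5doEeA==:2fc58b0cc3b13c3f2db49a5b4660ad47c873b81a”, the function yields the key “UX6t4PcS5doEeA==” and the 40-digit hash; Base64 never contains a colon, so the split is unambiguous.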

This guide does not cover how to deploy react-native Android applications. You may refer to the official react-native guide to deploy your application to the Android mobile device.

QR Auth API

The QR Auth API is also built with the Serverless Framework using the aws-nodejs template. It uses an API Gateway HTTP API with an AWS Cognito JWT authorizer for incoming requests. The serverless YAML definition is shown below.

service: ssh-qrauth-server
frameworkVersion: '2 || 3'
useDotenv: true
provider:
 name: aws
 runtime: nodejs12.x
 lambdaHashingVersion: 20201221
 deploymentBucket:
   name: ${env:DEPLOYMENT_BUCKET_NAME}
 httpApi:
   authorizers:
     cognitoJWTAuth:
       identitySource: $request.header.Authorization
       issuerUrl: ${env:COGNITO_ISSUER}
       audience:
         - ${env:COGNITO_AUDIENCE}
 region: ap-south-1
 iam:
   role:
     statements:
     - Effect: "Allow"
       Action:
         - "dynamodb:Query"
         - "dynamodb:PutItem"
         - "dynamodb:GetItem"
       Resource:
         - ${env:DYNAMO_DB_ARN}
     - Effect: "Allow"
       Action:
         - "execute-api:Invoke"
         - "execute-api:ManageConnections"
       Resource:
         - ${env:API_GATEWAY_WEBSOCKET_API_ARN}/*
 environment:
   REGION: ${env:REGION}
   COGNITO_ISSUER: ${env:COGNITO_ISSUER}
   DYNAMODB_TABLE: ${env:DYNAMODB_TABLE}
   COGNITO_AUDIENCE: ${env:COGNITO_AUDIENCE}
   POOLID: ${env:POOLID}
   COGNITOIDP: ${env:COGNITOIDP}
   WEBSOCKET_ENDPOINT: ${env:WEBSOCKET_ENDPOINT}
package:
 patterns:
   - '!node_modules/**'
   - handler.js
   - '!package.json'
   - '!package-lock.json'
   - '!.env'
   - '!test.http'
plugins:
 - serverless-deployment-bucket
 - serverless-dotenv-plugin
layers:
 qrauthLibs:
   path: layer
   compatibleRuntimes:
     - nodejs12.x
functions:
 sshauthqrcode:
   handler: handler.authqrcode
   memorySize: 256
   timeout: 30
   layers:
     - {Ref: QrauthLibsLambdaLayer}
   events:
     - httpApi:
         path: /v1/app/sshqrauth/qrauth
         method: post
         authorizer:
           name: cognitoJWTAuth

Once API Gateway has authorized the incoming request, control is handed over to the serverless-express router. At this stage, we check the payload for the auth verification string scanned by the Android app; this string must exist in the DynamoDB table. From the record identified by the auth verification string, we read the connection ID and compute its SHA-1 hash. If the hash matches the one in the request payload, we set the record's “authVerified” attribute to “true” and notify the PAM module through the API Gateway WebSocket API. The PAM module then completes the validation with the challenge-response text.
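The decision step of the QR Auth API can be sketched independently of Node and DynamoDB. The struct below stands in for one DynamoDB item already fetched by authkey, and it assumes the SHA-1 hex of the stored connection ID has been computed. Three details go beyond the description above and are assumptions: the explicit expiry check (TTL deletion can lag), refusing a record that is already verified so a scan cannot be replayed, and comparing hashes without stopping at the first mismatching character. All names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one DynamoDB item written by the "expectauth" route. */
struct auth_record {
    const char *conn_sha1_hex;   /* SHA-1 hex of the stored connectionId */
    long long expires_at;        /* UNIX epoch seconds (TTL attribute) */
    bool auth_verified;          /* the authVerified attribute */
};

/* Case-insensitive comparison of two 40-char hex strings that does not stop
   at the first differing character. */
static bool hex40_equal(const char *a, const char *b)
{
    unsigned diff = 0;
    for (size_t i = 0; i < 40; i++) {
        if (a[i] == '\0' || b[i] == '\0')
            return false;                     /* too short */
        diff |= (unsigned)(tolower((unsigned char)a[i]) ^
                           tolower((unsigned char)b[i]));
    }
    return diff == 0 && a[40] == '\0' && b[40] == '\0';
}

/* Returns true and marks the record verified if the scanned hash matches.
   The caller would then write authVerified = true back to DynamoDB and
   notify the PAM module over its WebSocket connection. */
bool verify_scan(struct auth_record *rec, const char *shacode, long long now)
{
    if (rec == NULL || shacode == NULL)
        return false;                         /* no record for this authkey */
    if (rec->auth_verified)
        return false;                         /* already used: no replays */
    if (now > rec->expires_at)
        return false;                         /* expired, even if TTL lags */
    if (!hex40_equal(rec->conn_sha1_hex, shacode))
        return false;
    rec->auth_verified = true;
    return true;
}
```

Keeping this logic as a pure function of the record, the scanned hash and the current time makes it easy to test without touching AWS.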

The entire authentication flow is depicted in a flow diagram, and the architecture is depicted in the cover post of this blog.

Compiling and Installing PAM module

Unlike regular C programs, PAM modules are shared libraries, so the compiled code may be loaded at an arbitrary address in memory. The module must therefore be compiled as position-independent code: pass the -fPIC option to gcc when compiling, and use the -shared flag when linking the shared object.

gcc -I$PWD -fPIC -c *.c
gcc -shared -o pam_qrapp_auth.so *.o -lpam -lqrencode -lssl -lcrypto -lpthread -lwebsockets

To simplify compiling and checking for the required libraries, I prefer to use autoconf. The complete project, along with the autoconf scripts, is available in my GitHub repository.

Once the shared object file (pam_qrapp_auth.so) is generated, copy it to your distribution's PAM module directory (for example, /usr/lib64/security/ on RHEL-family systems or /lib/x86_64-linux-gnu/security/ on 64-bit Ubuntu) and run the ldconfig command so the OS picks up the new shared library. Then remove the “@include common-auth” line from /etc/pam.d/sshd, along with any other “auth” line that loads pam_unix.so directly or through an included file. pam_unix.so enforces password authentication (public key authentication is handled by sshd itself, outside PAM). Finally, add our module to the auth realm (“auth required pam_qrapp_auth.so”). Depending on your Linux flavor, your /etc/pam.d/sshd file may look similar to the one below:

auth       required     pam_qrapp_auth.so
account    required     pam_nologin.so
@include common-account
session [success=ok ignore=ignore module_unknown=ignore default=bad]        pam_selinux.so close
session    required     pam_loginuid.so
session    optional     pam_keyinit.so force revoke
@include common-session
session    optional     pam_motd.so  motd=/run/motd.dynamic
session    optional     pam_motd.so noupdate
session    optional     pam_mail.so standard noenv # [1]
session    required     pam_limits.so
session    required     pam_env.so # [1]
session    required     pam_env.so user_readenv=1 envfile=/etc/default/locale
session [success=ok ignore=ignore module_unknown=ignore default=bad]        pam_selinux.so open
@include common-password

Finally, we need to configure the sshd daemon to allow challenge-response authentication. Open /etc/ssh/sshd_config and set “ChallengeResponseAuthentication yes” if the option is missing, commented out, or set to “no” (on OpenSSH 8.7 and later, this option is a deprecated alias for KbdInteractiveAuthentication). Also make sure “UsePAM yes” is set, since sshd only consults PAM when it is enabled. Reload the sshd service with the command “systemctl reload sshd”. Voila, we are done here.

Conclusion

This guide was a barebones tutorial and is not meant for production use. The PAM module has certain gaps. For example, it should prompt the user to change an expired password, deny login for a locked account, and provide similar security features. Also, the Android mobile app should be bound to the SSH username, so that only the AWS Cognito user bound to that SSH username can authenticate.

One known limitation of this PAM module is that the user always has to press Enter after scanning the QR code with the Android mobile app. This is due to how OpenSSH itself is implemented: the OpenSSH server holds back all informational text until user input is required. In our case, that informational text is the UTF-8 QR code itself.

However, the module does not actually need any input from the interactive device, because the authentication event reaches the PAM module over the WebSocket. But if we do not explicitly ask the user to press Enter after scanning, the QR code will never be displayed, so the input here is a dummy. This is a known OpenSSH issue with PAM_TEXT_INFO. Find more about the issue here.

References

– Pluggable authentication module

– An introduction to Pluggable Authentication Modules (PAM) in Linux

– Custom PAM for SSHD in C

– google-authenticator-libpam

– PAM_TEXT_INFO and PAM_ERROR_MSG conversation not honoured during PAM authentication

December 12, 2022

Set Up A Production-ready REST API Server Using TypeScript, Express And PostgreSQL

Introduction

So, you have a brilliant idea for a web application. It’s going to be the next big thing, and you are super-excited about it. Maybe you have already started building the perfect React/Angular UI for your app.

Eventually, you realize that, like most web apps, your app is going to be data-intensive and will need a lightning-fast web server. You know that Node.js is a popular choice for web servers, not least because it unifies front-end and back-end web development under JavaScript, so you go for it.

But you want your server to be robust and reliable too. A colleague introduces you to TypeScript, the superset of JavaScript developed by Microsoft, and recommends it for its strict static typing and compilation.

Now comes storing the data. Naturally, you select PostgreSQL. After all, it bills itself as the world's most advanced open-source relational database management system (RDBMS), with its object-relational features and extensibility. But querying an RDBMS for frequently used data can be slow, so you decide to add Redis, the in-memory data store, as a cache to decrease data access latency and take load off your relational data store.

That’s it. You have a perfect server waiting to be built. And while the initial process of getting it up and running can get arduous, you have come to the right place. This blog is going to guide you through the initial setup process.

Prerequisites

I assume you are working as a non-root user with sudo privileges on Ubuntu 16.04. Before we start, please make sure you have the following installed:

NPM (~v6.9.0) and Node.js (~v10.16.0) – How to Install Node.js on Ubuntu 16.04
Redis – How to install Redis on Ubuntu 16.04
PostgreSQL – How to install PostgreSQL on Ubuntu 16.04

Of course, macOS or Windows will work fine for this tutorial too; just find the appropriate installation guides for them before moving forward.

If you don’t want to go through the steps below, you can check out my GitHub Repo typescript-express-server and use it as your application skeleton. It has been set up with default configurations, which you can change later. Nevertheless, I strongly recommend going through this guide to further your understanding of the project files and configuration nuances.

Initializing Server (Express with TypeScript)

Setting up an Express Application with TypeScript can be done in three steps:

Initialize project using NPM

Create a folder and run:

npm init

This will ask you a couple of project-specific questions, like name and version, and will create a package.json file, which may look like this:

{
  "name": "my-typescript-express-server",
  "version": "0.0.0",
  "scripts": {
    "start": "node ./dist/index.js --env=production",
    "start:dev": "ts-node -r tsconfig-paths/register ./src"
  },
  "dependencies": {
    "cookie-parser": "^1.4.5",
    "dotenv": "^8.2.0"
  },
  "devDependencies": {
    "find": "^0.3.0",
    "fs-extra": "^9.0.1"
  }
}

This manifest file will contain all the metadata of your project, like module dependencies, configs, and scripts. For more information, check out this very good read about the basics of package.json.

Setting up TypeScript Configuration (tsconfig.json)

This file needs to be created in the root of a TypeScript project. During development, ts-node lets us run the code directly from the .ts files. In production, however, Node.js only understands JavaScript, so all the TS files need to be transpiled to JS first, and tsconfig.json controls how that happens. Some of its options are include (the files to compile), exclude (the files to skip), and compiler options such as outFile and moduleResolution.

First, we need to install some TypeScript-specific modules:

npm i typescript ts-node tsconfig-paths

This is the tsconfig.json file with some default configurations: https://gist.github.com/velotiotech/95b021f1728a9b8a61d9fca89b0b9e59.js

For a detailed reference, check out tsconfig.json.

Setting up ESLint

It is not mandatory to use this JavaScript linter, but it’s highly recommended for enforcing code standards and keeping code clean. TypeScript projects once used TSLint, but it has been deprecated in favor of ESLint.

Run this command:

npm install --save-dev eslint @typescript-eslint/parser @typescript-eslint/eslint-plugin

Create a .eslintrc file in the project root and use the following starter configuration:

{
 "root": true,
 "parser": "@typescript-eslint/parser",
 "plugins": [
   "@typescript-eslint"
 ],
 "extends": [
   "eslint:recommended",
   "plugin:@typescript-eslint/eslint-recommended",
   "plugin:@typescript-eslint/recommended"
 ]
}

Lastly, add a lint script to package.json:

"scripts": {
  "start": "node ./dist/index.js --env=production",
  "start:dev": "ts-node -r tsconfig-paths/register ./src",
  "lint": "eslint . --ext .ts"
}

Now, you can run the command below to check your codebase for lint errors:

npm run lint

ESLint has plenty of rules for enforcing standards in your code. Please look them up at ESLint with TypeScript.

Express App

Finally, we need to install Express, which is as simple as running this command:

npm install --save express @types/express

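You need a server file (src/Server.ts), which you can create like this (it also uses the http-status-codes package, which you can install with npm install --save http-status-codes):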

import cookieParser from 'cookie-parser';
import express from 'express';
import { BAD_REQUEST } from 'http-status-codes';
import BaseRouter from './routes';
const app = express();
app.use(express.json());
app.use(express.urlencoded({extended: true}));
app.use(cookieParser());
// Add APIs
app.use('/api', BaseRouter);
// Export express instance
export default app;

You will also need src/index.ts that will be the entry point for your application:

import app from './Server';
import logger from './shared/Logger'; // created in the "Configuring Logging" section

// Start the server
const port = Number(process.env.PORT || 3000);
app.listen(port, () => {
    logger.info('Express server started on port: ' + port);
});

Error Handling

Many Express servers are configured to swallow all errors with an uncaughtException handler, which, in my opinion, is bad news: after an uncaught exception the process may be left in an inconsistent state, so the best thing to do is to let the application crash and restart. Uncaught Exceptions in Node.js is a good read regarding this.

We are, however, going to configure an error-handling middleware that logs errors and sends a 400 Bad Request response when a request to your API results in an error.

In the src/Server.ts, add this:

// Add these imports at the top of src/Server.ts
import { NextFunction, Request, Response } from 'express';
import logger from './shared/Logger';

// Print API errors. Register this after app.use('/api', BaseRouter) so it sees
// errors from the routes. Express recognizes error handlers by their four
// parameters, so `next` must stay even though it is unused.
// eslint-disable-next-line @typescript-eslint/no-unused-vars
app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
    logger.error(err.message, err);
    return res.status(BAD_REQUEST).json({
        error: err.message,
    });
});
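The handler above answers every error with 400 Bad Request, even when the failure is on the server side (for example, a lost database connection). If you want individual routes to choose the status code, one option is a small error class that carries it. The sketch below is an assumption, not part of Express or http-status-codes: HttpError and toErrorResponse are hypothetical names, and the 400 fallback mirrors the handler above.

```typescript
// Hypothetical error type that a route can pass to next() to choose the HTTP status.
export class HttpError extends Error {
  readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.status = status;
    this.name = "HttpError";
    // Keep `instanceof HttpError` working even when compiling to ES5.
    Object.setPrototypeOf(this, HttpError.prototype);
  }
}

// Shape of what the error handler sends back.
export interface ErrorResponse {
  status: number;
  body: { error: string };
}

// HttpErrors keep their own status; any other error falls back to 400 Bad Request,
// matching the behavior of the handler above.
export function toErrorResponse(err: Error): ErrorResponse {
  const status = err instanceof HttpError ? err.status : 400;
  return { status, body: { error: err.message } };
}
```

Inside the Express handler you would then write `const { status, body } = toErrorResponse(err); return res.status(status).json(body);`, and a route could call `next(new HttpError(404, 'User not found'))`.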

Kudos! You have a basic Express server set up. Fire it up by running:

npm run start:dev

Connecting with the Database Store using TypeORM

We have a basic server ready to go, but we need to connect it to our Postgres database using an ORM. TypeORM is a versatile ORM that, unlike most other JavaScript ORMs, supports both the Active Record and Data Mapper patterns. Install it, along with the PostgreSQL driver (pg) and reflect-metadata, with:

npm i --save typeorm pg reflect-metadata

Create an ormconfig.json file in your project root with the following configuration:

{
   "synchronize": true,
   "logging": false,
   "entities": [
      "src/entities/**/*.ts"
   ],
   "cli": {
      "entitiesDir": "src/entities",
      "migrationsDir": "src/migration",
      "subscribersDir": "src/subscriber"
   },
   "migrations": [
      "src/migration/**/*.ts"
   ],
   "subscribers": [
      "src/subscriber/**/*.ts"
   ]
}

Create a src/db.ts file that will initialize the database connection:

import "reflect-metadata";
import { createConnection } from "typeorm";
import { Tedis } from "tedis"; // used by initializeCache() in the Redis section
import logger from './shared/Logger'; // used once logging is configured

// Connect to PostgreSQL using the settings from ormconfig.json or the TYPEORM_* variables.
export async function intializeDB(): Promise<void> {
  await createConnection();
}

TypeORM Entities are classes that represent the data models in our application. We are going to build a User Entity (which application doesn’t have a user, duh!) like this in src/entities/User.ts:

import { Entity, PrimaryGeneratedColumn, Column } from "typeorm";

// Maps to a "user" table; each decorated property becomes a column.
@Entity()
export class User {
  // Auto-incrementing primary key. The `!` tells TypeScript that TypeORM
  // assigns these properties, which keeps strictPropertyInitialization happy.
  @PrimaryGeneratedColumn()
  id!: number;

  @Column()
  firstName!: string;

  @Column()
  lastName!: string;

  @Column()
  age!: number;
}

Then, add these lines to src/index.ts:

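import { intializeDB } from './db';

// Connect to the database; if that fails, log the error and exit so the
// process can be restarted instead of serving requests without a database.
intializeDB().catch((err) => {
    logger.error('Database initialization failed', err);
    process.exit(1);
});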

You will also need to set environment variables such as TYPEORM_CONNECTION, TYPEORM_HOST, and TYPEORM_USERNAME to your PostgreSQL database's connection parameters. Please check TypeORM's documentation for more details.
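A missing or empty variable otherwise tends to surface as a less obvious connection error, so it can help to fail fast at startup with a clear message. The following is a minimal sketch of such a check, not part of TypeORM: missingEnvVars and assertDbEnv are hypothetical helpers, and the variables beyond the three named above (TYPEORM_PASSWORD, TYPEORM_DATABASE) are an assumption about a typical PostgreSQL setup.

```typescript
// Hypothetical startup check, not part of TypeORM: adjust the list to your setup.
export type Env = Record<string, string | undefined>;

// Variables assumed for a typical PostgreSQL connection.
export const REQUIRED_DB_VARS: readonly string[] = [
  "TYPEORM_CONNECTION", // e.g. "postgres"
  "TYPEORM_HOST",
  "TYPEORM_USERNAME",
  "TYPEORM_PASSWORD",
  "TYPEORM_DATABASE",
];

// Returns the names of required variables that are unset or blank, in list order.
export function missingEnvVars(env: Env, required: readonly string[] = REQUIRED_DB_VARS): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throws one error that lists every missing variable, so startup fails fast.
export function assertDbEnv(env: Env): void {
  const missing = missingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing database environment variables: ${missing.join(", ")}`);
  }
}
```

For example, you could call assertDbEnv(process.env) in src/index.ts before intializeDB().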

Connecting Redis

We will use Tedis, a Redis client written in TypeScript, in our server:

npm i tedis

Add these lines to src/db.ts:

// Create a Tedis client for the Redis server running on localhost.
export function initializeCache(port: number | undefined): Tedis {
  const tedis = new Tedis({
    port: port,
    host: "127.0.0.1"
  });
  return tedis;
}

And these lines to src/index.ts:

import { initializeCache } from './db';

// Connect to Redis and keep the client so you can pass it to the code that needs it.
const redisPORT = Number(process.env.REDIS_PORT || 6379);
const cache = initializeCache(redisPORT);

Now, your application code can use the Redis cache through the client created above.
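A common way to use that client is the cache-aside pattern: look up a key in Redis first, and on a miss load the value from PostgreSQL and store it with an expiry. The sketch below codes against a small hypothetical CacheClient interface rather than the Tedis API, so it can be exercised with an in-memory stand-in; wiring it to Tedis (for example through Redis's GET and SETEX commands) is an assumption you should check against the Tedis documentation.

```typescript
// Minimal cache abstraction (hypothetical, not the Tedis API): adapt it to your Redis client.
export interface CacheClient {
  get(key: string): Promise<string | null>;
  // Store `value` under `key` for `ttlSeconds` seconds (what Redis SETEX does).
  setWithTtl(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Cache-aside read: return the cached value if present; otherwise call `load`,
// cache its JSON-serialized result, and return it.
export async function getOrLoad<T>(
  cache: CacheClient,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T>,
): Promise<T> {
  const cached = await cache.get(key);
  if (cached !== null) {
    return JSON.parse(cached) as T;
  }
  const fresh = await load();
  await cache.setWithTtl(key, JSON.stringify(fresh), ttlSeconds);
  return fresh;
}

// In-memory stand-in for tests and local experiments; expiry is checked on read.
export class MemoryCache implements CacheClient {
  private store = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= Date.now()) {
      this.store.delete(key);
      return null;
    }
    return entry.value;
  }

  async setWithTtl(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}
```

In a route you might write something like `getOrLoad(cache, 'users:all', 60, () => loadUsersFromDb())`, where loadUsersFromDb is your own query. Keep the TTL short for data that changes often, or delete the key when the underlying rows change.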

Configuring Logging

Logging is pivotal to an application because it gives us a real-time view of its state. For development, we are going to install Morgan, an HTTP request logger middleware that logs details of each incoming request. It comes in really handy for debugging.

npm i morgan

And include this in src/Server.ts:

import morgan from 'morgan'; // for TypeScript types: npm i --save-dev @types/morgan

// Log every HTTP request in development (method, URL, status, response time).
// Register it before the routes so that every request passes through it.
if (process.env.NODE_ENV === 'development') {
    app.use(morgan('dev'));
}

Winston can serve as the application-wide logger. Install it like this:

npm i winston

Then, add a src/shared/Logger.ts file:

import { createLogger, format, transports } from 'winston';

const { Console } = transports;

// Init logger: 'info' and more severe levels are logged
const logger = createLogger({
    level: 'info',
});

// Custom format: if the entry carries an error stack, print the stack and
// drop the entry (returning false filters it out of this transport)
const errorStackFormat = format((info) => {
    if (info.stack) {
        console.log(info.stack);
        return false;
    }
    return info;
});

// Colorized, human-readable console output
const consoleTransport = new Console({
    format: format.combine(
        format.colorize(),
        format.simple(),
        errorStackFormat(),
    ),
});
logger.add(consoleTransport);

export default logger;

Now, you can use this logger from anywhere in the code, be it for error logging in your API methods or for debugging purposes:

import { createConnection } from 'typeorm';
import logger from './shared/Logger';

// src/db.ts: log once the database connection is established
export async function intializeDB(): Promise<void> {
  await createConnection();
  logger.info('Database successfully initialized');
}

Creating your First API Service

This is the moment you have been waiting for: creating your first API service for your application, the crux of the functionality that will define your web application.

This API service is a simple GET request handler that returns all the users in your database. Create src/routes/Users.ts, which can look like this:

import { NextFunction, Request, Response, Router } from 'express';
import { OK } from 'http-status-codes';
import { getConnection } from "typeorm";
import { User } from "../entities/User";

const router = Router();

// GET /api/users/all: return every user in the database
router.get('/all', async (req: Request, res: Response, next: NextFunction) => {
    try {
        const users = await getConnection()
            .getRepository(User)
            .createQueryBuilder("user")
            .getMany();
        return res.status(OK).json({ users });
    } catch (err) {
        // Express 4 does not catch rejected promises, so forward the error
        // to the error-handling middleware explicitly
        return next(err);
    }
});

// routes/index.ts imports this router as the default export
export default router;

Then add src/routes/index.ts:

import { Router } from 'express';
import UserRouter from './Users';
// Init router and path
const router = Router();
// Add sub-routes
router.use('/users', UserRouter);
// Export the base-router
export default router;

Voila! Your API service is ready. Fire up your server, then use Postman to send a GET request to http://localhost:3000/api/users/all and see the magic happen.

You can also add other API services for fetching a user by ID, deleting a user, creating a user, and updating a user. I will not discuss them here to keep this blog short. You can find these in the GitHub repository I mentioned at the beginning.
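For the fetch-by-ID service, the ID arrives as a string in req.params, so it is worth validating before it reaches TypeORM. A minimal sketch, assuming a route such as GET /api/users/:id; parseUserId is a hypothetical helper, not part of Express:

```typescript
// Hypothetical helper for a route like GET /api/users/:id.
// Express passes route params as strings, so convert and validate before querying.
export function parseUserId(raw: string | undefined): number | null {
  // Accept digits only: rejects "", "abc", "-1", "1.5" and "1e3".
  if (raw === undefined || !/^\d+$/.test(raw)) {
    return null;
  }
  const id = Number(raw);
  // Auto-generated ids start at 1; also reject values too large to represent exactly.
  return id >= 1 && Number.isSafeInteger(id) ? id : null;
}
```

In the route handler, respond with BAD_REQUEST when parseUserId(req.params.id) returns null, and otherwise look the user up by the returned number.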

Deploying your Server to Production

So far, everything has been in the development phase. Now, we need to take this to production. All you need is a <project-root>/build.js script that creates a <project-root>/dist folder and transpiles all the TypeScript files that you have written. It can look like this:

const fsE = require('fs-extra');
const childProcess = require('child_process');

// Remove the previous build
fsE.removeSync('./dist/');

// Copy front-end files, if the project has any
for (const dir of ['public', 'views']) {
    if (fsE.existsSync(`./src/${dir}`)) {
        fsE.copySync(`./src/${dir}`, `./dist/${dir}`);
    }
}

// Transpile the TypeScript files (npx finds the locally installed tsc).
// tsconfig.prod.json is a production variant of tsconfig.json.
childProcess.execSync('npx tsc --build tsconfig.prod.json', { stdio: 'inherit' });

Then, add this line to your <project-root>/package.json:

"scripts": {
   "build": "node build.js",
   "lint": "eslint . --ext .ts",
   "start": "node ./dist/index.js --env=production",
   "start:dev": "ts-node -r tsconfig-paths/register ./src"
}

Now, you can use:

npm run build

Doing so creates the <project-root>/dist folder containing your transpiled code. Deploy this folder, together with package.json (so you can install the production dependencies there with npm install --production), to your deployment environment, and start your production server with:

npm start

Note: You will need some additional setup, such as configuring Nginx or your AWS virtual machine, to complete your deployment, which is beyond the scope of this blog.

Going Forward

Congratulations. You have made it through this tutorial that guided you through the process of setting up a web server. But this is just the beginning, and there is no end to the improvements and optimizations that you can add to your server to make it better and sturdier. And you will continue to discover them in your journey of developing your web application. Some of the key points that I want to mention are:

Managing Environments

Your web server will run in multiple environments, such as development, testing, and production. Some vital configuration values, like AWS credentials and database passwords, are sensitive, and managing them per environment is key to your development and deployment cycle. I strongly recommend using libraries like Dotenv and keeping your environment configuration separate from your code. You can look at typescript-express-server for an example.
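As a minimal sketch of per-environment configuration, assuming one env file per environment at a hypothetical env/<environment>.env path: the code below picks the file from NODE_ENV and turns the loaded variables into a typed config object. The loading itself is left to Dotenv (for example dotenv.config({ path })); resolveEnvironment, envFilePath, and loadConfig are illustrative names, and the defaults mirror the PORT and REDIS_PORT fallbacks used earlier in src/index.ts.

```typescript
// Illustrative sketch: the env/<name>.env layout and AppConfig fields are assumptions.
export type Environment = "development" | "test" | "production";

const ENVIRONMENTS: readonly Environment[] = ["development", "test", "production"];

// Default to development when NODE_ENV is unset; refuse unknown values so a typo
// such as "prod" cannot silently load the wrong settings.
export function resolveEnvironment(nodeEnv: string | undefined): Environment {
  if (nodeEnv === undefined || nodeEnv === "") {
    return "development";
  }
  const match = ENVIRONMENTS.find((env) => env === nodeEnv);
  if (match === undefined) {
    throw new Error(`Unknown NODE_ENV: ${nodeEnv}`);
  }
  return match;
}

// File to hand to Dotenv, e.g. dotenv.config({ path: envFilePath(env) }).
export function envFilePath(env: Environment): string {
  return `./env/${env}.env`;
}

// Parse a TCP port, falling back to a default when the variable is unset.
function toPort(raw: string | undefined, fallback: number): number {
  const port = raw === undefined || raw === "" ? fallback : Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${raw}`);
  }
  return port;
}

export interface AppConfig {
  environment: Environment;
  port: number;
  redisPort: number;
}

// Build a typed config from already-loaded variables (e.g. process.env).
export function loadConfig(vars: Record<string, string | undefined>): AppConfig {
  return {
    environment: resolveEnvironment(vars.NODE_ENV),
    port: toPort(vars.PORT, 3000),
    redisPort: toPort(vars.REDIS_PORT, 6379),
  };
}
```

Passing the variables in as a parameter, instead of reading process.env inside loadConfig, keeps the function easy to test with different environments.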

Configuring Swagger

Software developers nowadays swear by this tool. It has proved to be a godsend for API documentation and for keeping APIs in conformance with the OpenAPI standard. On top of that, Swagger-based tooling can validate API requests against your API specification. I strongly recommend configuring it in your web server.
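To make concrete what spec-driven request validation saves you from writing, here is a hand-written check for a hypothetical create-user request body shaped like the User entity from earlier (firstName, lastName, age). It only illustrates the kind of checks an OpenAPI validator derives from a spec; it is not Swagger's API.

```typescript
// Request body for a hypothetical POST /api/users, mirroring the User entity.
export interface CreateUserBody {
  firstName: string;
  lastName: string;
  age: number;
}

// Returns human-readable validation errors; an empty array means the body is valid.
export function validateCreateUser(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["body must be a JSON object"];
  }
  const fields = body as Record<string, unknown>;
  const errors: string[] = [];

  for (const name of ["firstName", "lastName"]) {
    const value = fields[name];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`${name} must be a non-empty string`);
    }
  }

  const age = fields.age;
  if (typeof age !== "number" || !Number.isInteger(age) || age < 0) {
    errors.push("age must be a non-negative integer");
  }
  return errors;
}

// Type guard so a handler can use the body with its proper type after validation.
export function isCreateUserBody(body: unknown): body is CreateUserBody {
  return validateCreateUser(body).length === 0;
}
```

With an OpenAPI spec in place, the same rules live in the schema instead of in code, and the validator returns the error list for you.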

Writing Tests

Writing API tests and unit tests is a crucial part of web application development, as it exposes possible gaps in your system. You can use SuperAgent, a lightweight HTTP client library, to test your APIs for all possible request and response scenarios. Please look at src/spec in typescript-express-server to see how to use it. You can also use Postman for API test automation. For most of the services that you write, make sure to add unit tests using Jest.

Introduction to the Modern Server-side Stack – Golang, Protobuf, and gRPC

There are some new players in town for server programming, and this time it’s all about Google. Golang has rapidly gained popularity ever since Google started using it for its own production systems. And since the rise of microservice architecture, people have been focusing on modern data communication solutions like gRPC along with Protobuf. In this post, I will walk you through each of these briefly.

Golang

Golang, or Go, is an open-source, general-purpose programming language developed at Google. It has been gaining popularity recently, for all the right reasons. It may come as a surprise to most people that the language is almost 10 years old and, according to Google, has been production-ready for almost 7 years.

Golang is designed to be simple, modern, easy to understand, and quick to grasp. Its creators designed it so that an average programmer can get a working knowledge of the language over a weekend, and I can attest that they definitely succeeded. Speaking of the creators, Go was designed by Robert Griesemer, Rob Pike, and Ken Thompson; Thompson co-created Unix and the B language, the predecessor of C, so we can be assured that these folks know what they are doing.

That’s all good but why do we need another language?

For most of the use cases, we actually don’t. In fact, Go doesn’t solve any new problems that haven’t been solved by some other language/tool before. But it does try to solve a specific set of relevant problems that people generally face in an efficient, elegant, and intuitive manner. Go’s primary focus is the following:

First-class support for concurrency
An elegant, modern language that is very simple at its core
Very good performance
First-class support for the tools required for modern software development

I’m going to briefly explain how Go provides all of the above. You can read more about the language and its features in detail from Go’s official website.

Concurrency

Concurrency is a primary concern in most server applications, and given modern multi-core processors, it should be a primary concern of the language too. Go introduces a concept called a ‘goroutine’, which is analogous to a ‘lightweight user-space thread’. The reality is more involved, since the Go runtime multiplexes many goroutines onto a small number of OS threads, but the analogy should give you the general idea. Goroutines start with a very small stack, so they are light enough that you can actually spin up a million of them simultaneously, and Go encourages you to create them freely. Any function or method call can be run as a goroutine: ‘go myAsyncTask()’ spawns a goroutine that runs the ‘myAsyncTask’ function. The following is an example:

// performAsyncTasks executes the given tasks concurrently by spawning a goroutine
// for each of them. Task is assumed to be any type with an Execute method.
func performAsyncTasks(tasks []Task) {
	for _, task := range tasks {
		// Spawn a separate goroutine to carry out this task.
		// The go statement is non-blocking, so the loop continues immediately.
		go task.Execute()
	}
}

Yes, it’s that easy, and it is meant to be that way: Go is a simple language, and you are expected to spawn a goroutine for every independent async task without worrying much about the cost. Note that performAsyncTasks returns immediately without waiting for the tasks to finish; waiting requires a sync.WaitGroup or channels. Go’s runtime automatically runs the goroutines in parallel if multiple cores are available. But how do these goroutines communicate? The answer is channels.

A ‘channel’ is another language primitive, meant for communication among goroutines. You can send any value of the channel’s element type through it to another goroutine: a primitive Go type, a Go struct, or even another channel. A channel is essentially a typed, blocking FIFO queue, and it can be restricted to a single direction (send-only or receive-only). If you want one or more goroutines to wait for a certain condition to be met before continuing, you can implement cooperative blocking of goroutines with the help of channels.

These two primitives give a lot of flexibility and simplicity in writing asynchronous or parallel code. Other helper libraries like a goroutine pool can be easily created from the above primitives. One basic example is:

package executor

import (
	"log"
	"sync/atomic"
)

// The Executor struct is the main executor for tasks.
// 'maxWorkers' is the maximum number of goroutines that may run tasks simultaneously.
// 'ActiveWorkers' is the number of worker goroutines running at a given time. It is updated
// atomically, so read it with atomic.LoadInt64.
// 'Tasks' is the channel on which the Executor receives tasks.
// 'Reports' is the channel on which the Executor publishes each task's report.
// 'signals' is a channel the client can use to control the Executor. Right now, only the
// termination signal is supported, which the client sends by writing 1 on this channel.
type Executor struct {
	maxWorkers    int64
	ActiveWorkers int64

	Tasks   chan Task
	Reports chan Report
	signals chan int
}

// NewExecutor creates a new Executor.
// 'maxWorkers' tells the maximum number of simultaneous goroutines.
// 'signals' channel can be used to control the Executor.
func NewExecutor(maxWorkers int, signals chan int) *Executor {
	chanSize := 1000

	if maxWorkers > chanSize {
		chanSize = maxWorkers
	}

	executor := Executor{
		maxWorkers: int64(maxWorkers),
		Tasks:      make(chan Task, chanSize),
		Reports:    make(chan Report, chanSize),
		signals:    signals,
	}

	go executor.launch()

	return &executor
}

// launch starts the main loop that polls all the relevant channels and handles the different
// messages. The default branch never blocks, so the loop busy-waits while there is nothing to
// dispatch.
func (executor *Executor) launch() int {
	reports := make(chan Report, executor.maxWorkers)

	for {
		select {
		case signal := <-executor.signals:
			if executor.handleSignals(signal) == 0 {
				return 0
			}

		case r := <-reports:
			executor.addReport(r)

		default:
			if atomic.LoadInt64(&executor.ActiveWorkers) < executor.maxWorkers && len(executor.Tasks) > 0 {
				task := <-executor.Tasks
				atomic.AddInt64(&executor.ActiveWorkers, 1)
				go executor.launchWorker(task, reports)
			}
		}
	}
}

// handleSignals is called whenever anything is received on the 'signals' channel.
// For the termination request (1), it replies on the same channel with 0 if the request was
// honored or 1 if it was rejected, and returns that value. Any other signal is ignored and 1
// is returned.
func (executor *Executor) handleSignals(signal int) int {
	if signal == 1 {
		log.Println("Received termination request...")

		if executor.Inactive() {
			log.Println("No active workers, exiting...")
			executor.signals <- 0
			return 0
		}

		executor.signals <- 1
		log.Println("Some tasks are still active...")
	}

	return 1
}

// launchWorker runs on its own goroutine: launch spawns one whenever a Task is available and
// fewer than maxWorkers workers are active.
// It performs the given task and publishes the report on the Executor's internal reports
// channel. If that channel is full, the report is dropped and a message is logged.
func (executor *Executor) launchWorker(task Task, reports chan<- Report) {
	report := task.Execute()

	// A select with a default case is Go's non-blocking send. Unlike checking len() before
	// sending, it cannot block if another worker fills the channel in between.
	select {
	case reports <- report:
	default:
		log.Println("Executor's report channel is full...")
	}

	atomic.AddInt64(&executor.ActiveWorkers, -1)
}

// AddTask is used to submit a new task to the Executor in a non-blocking way. The client can
// also send a task on the Executor's Tasks channel directly, but that blocks if the channel is
// full.
// Note that this method does not add the given task if the Tasks channel is full; it returns
// false, and it is up to the client to try again later.
func (executor *Executor) AddTask(task Task) bool {
	select {
	case executor.Tasks <- task:
		return true
	default:
		return false
	}
}

// addReport is used by the Executor to publish reports in a non-blocking way. If the client is
// not reading the Reports channel, or reads it more slowly than the Executor publishes, the
// channel fills up. In that case this method does not block: the report is dropped and false
// is returned.
func (executor *Executor) addReport(report Report) bool {
	select {
	case executor.Reports <- report:
		return true
	default:
		return false
	}
}

// Inactive checks whether the Executor is idle: no pending tasks, no active workers and no
// reports waiting to be read.
func (executor *Executor) Inactive() bool {
	return atomic.LoadInt64(&executor.ActiveWorkers) == 0 &&
		len(executor.Tasks) == 0 && len(executor.Reports) == 0
}

Simple Language

Unlike a lot of other modern languages, Golang doesn’t have a lot of features. In fact, a compelling case can be made that its feature set is too restrictive, and that is intentional. It is not designed around a single programming paradigm like Java, nor designed to support multiple paradigms like Python. It’s just bare-bones structured programming: the essential features, and not a single thing more.

After looking at the language, you may feel that it doesn’t follow any particular philosophy or direction, and that every feature is included to solve a specific problem and nothing more. For example, it has methods and interfaces but not classes; the compiler produces a statically linked binary that still includes a garbage collector; it has strict static typing but, until Go 1.18, no generics. The language has a thin runtime but doesn’t support exceptions.

The main idea here is that developers should spend the least amount of time expressing their idea or algorithm as code, without asking “What’s the best way to do this in language X?”, and the result should be easy for others to understand. It’s still not perfect; it does feel limiting from time to time, and essential features like generics and better error handling were being discussed for ‘Go 2’.

Performance

Single-threaded execution performance is NOT a good metric for judging a language, especially one focused on concurrency and parallelism. Still, Golang posts impressive benchmark numbers, beaten only by hardcore systems programming languages like C, C++ and Rust, and it is still improving. The performance is very impressive considering it’s a garbage-collected language, and it is good enough for almost every use case.

(Image Source: Medium)

Developer Tooling

The adoption of a new tool or language depends directly on its developer experience, and the adoption of Go speaks for its tooling. The same ideas apply here: the tooling is minimal but sufficient. Everything is done through the ‘go’ command and its subcommands, all on the command line.

There is no separate package manager like pip or npm. You can fetch any community package with just:

go get github.com/velotiotech/WebCrawler/executor

Yes, it works. You can pull packages directly from GitHub or anywhere else; they are just source files.

But what about package.json? Where do you list the dependencies for `go get`? You don’t. There is no need to declare all your dependencies in a single file. You can directly use:

import "github.com/xlab/pocketsphinx-go/sphinx"

in your source file itself, and the `go` tool fetches the package for you from that import path. You can see the full source file here:

package main

import (
	"encoding/binary"
	"log"
	"os/exec"

	pulse "github.com/mesilliac/pulse-simple" // pulse-simple
	"github.com/xlab/pocketsphinx-go/sphinx"
)

// buffSize is the number of bytes in one second of captured audio (set by createStream).
var buffSize int

// readInt16 decodes one little-endian 16-bit PCM sample from the first two bytes of buf.
func readInt16(buf []byte) int16 {
	return int16(binary.LittleEndian.Uint16(buf))
}

// createStream opens a PulseAudio capture stream: signed 16-bit little-endian, 16 kHz, mono.
func createStream() *pulse.Stream {
	ss := pulse.SampleSpec{pulse.SAMPLE_S16LE, 16000, 1}
	buffSize = int(ss.UsecToBytes(1 * 1000000))
	stream, err := pulse.Capture("pulse-simple test", "capture test", &ss)
	if err != nil {
		log.Panicln(err)
	}
	return stream
}

// listen reads one second of audio at a time and hands the samples to the decoder.
func listen(decoder *sphinx.Decoder) {
	stream := createStream()
	defer stream.Free()
	defer decoder.Destroy()
	buf := make([]byte, buffSize)
	var bits []int16

	log.Println("Listening...")

	for {
		_, err := stream.Read(buf)
		if err != nil {
			log.Panicln(err)
		}

		for i := 0; i < buffSize; i += 2 {
			bits = append(bits, readInt16(buf[i:i+2]))
		}

		process(decoder, bits)
		bits = nil
	}
}

// process decodes one utterance and acts on it only if the hypothesis score is above
// the -2500 threshold chosen for this example.
func process(dec *sphinx.Decoder, bits []int16) {
	if !dec.StartUtt() {
		panic("Decoder failed to start Utt")
	}

	dec.ProcessRaw(bits, false, false)
	dec.EndUtt()
	hyp, score := dec.Hypothesis()

	if score > -2500 {
		log.Println("Predicted:", hyp, score)
		handleAction(hyp)
	}
}

// executeCommand runs an external command and logs any failure instead of ignoring it.
func executeCommand(commands ...string) {
	cmd := exec.Command(commands[0], commands[1:]...)
	if err := cmd.Run(); err != nil {
		log.Println("command failed:", commands, err)
	}
}

// handleAction maps a recognized phrase to a session command.
func handleAction(hyp string) {
	switch hyp {
	case "SLEEP":
		executeCommand("loginctl", "lock-session")

	case "WAKE UP":
		executeCommand("loginctl", "unlock-session")

	case "POWEROFF":
		executeCommand("poweroff")
	}
}

func main() {
	cfg := sphinx.NewConfig(
		sphinx.HMMDirOption("/usr/local/share/pocketsphinx/model/en-us/en-us"),
		sphinx.DictFileOption("6129.dic"),
		sphinx.LMFileOption("6129.lm"),
		sphinx.LogFileOption("commander.log"),
	)

	dec, err := sphinx.NewDecoder(cfg)
	if err != nil {
		panic(err)
	}

	listen(dec)
}

This ties the dependency declaration to the source itself.

As you can see by now, it’s simple and minimal, yet sufficient and elegant. There is first-class support for both unit tests and benchmarks, with flame charts too. Just like the feature set, the tooling has its downsides. For example, `go get` originally didn’t support versions, so you were locked to whatever the import URL in your source file pointed to. The tooling has evolved since: Go modules, introduced in Go 1.11, add versioned dependency management through a `go.mod` file.

Golang was originally designed to solve the problems Google had with its massive code bases and the pressing need to write efficient concurrent applications. It makes writing applications and libraries that use the multicore nature of modern processors very easy, and it never gets in the developer’s way. It’s a simple, modern language, and it never tries to become anything more than that.

Protobuf (Protocol Buffers)

Protobuf, or Protocol Buffers, is a binary communication format by Google, used to serialize structured data. A communication format? Kind of like JSON? Yes. It’s more than 10 years old, and Google has been using it for a long time.

But don’t we already have JSON, and isn’t it ubiquitous?

Just like Golang, Protobuf doesn’t really solve anything new. It just solves existing problems more efficiently and in a modern way. Unlike Golang, it isn’t necessarily more elegant than the existing solutions. Here are Protobuf’s focus points:

- It’s a binary format, unlike the text-based JSON and XML, and hence far more space-efficient.
- First-class, sophisticated support for schemas.
- First-class support for generating parsing and consumer code in various languages.

Binary format and speed

So are Protobufs really that fast? The short answer is yes. According to Google’s developer documentation, they are 3 to 10 times smaller and 20 to 100 times faster than XML. That’s no surprise for a binary format; the trade-off is that the serialized data is not human-readable.

(Image Source: Beating JSON performance with Protobuf)
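To make “binary format” concrete, here is a minimal hand-written sketch of the varint encoding Protobuf uses for integer fields, as described in the Protocol Buffers encoding documentation. It is for illustration only; in practice the code generated by the Protobuf compiler does this for you. `appendVarint` and `stdlibVarint` are hypothetical helper names, and the second one simply shows that Go’s `encoding/binary` uses the same unsigned varint format.

```go
package main

import "encoding/binary"

// appendVarint appends v using the base-128 varint encoding Protobuf uses for
// integer fields: 7 bits of payload per byte, least-significant group first, with
// the high bit set on every byte except the last.
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80) // low 7 bits plus the "more bytes follow" bit
		v >>= 7
	}
	return append(b, byte(v)) // final byte, high bit clear
}

// stdlibVarint encodes v with encoding/binary, whose unsigned varint format is the same.
func stdlibVarint(v uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, v)
	return buf[:n]
}
```

Values below 128 fit in a single byte, and 150 takes two (`96 01`), which is one reason small integers are so cheap on the wire.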

Protobufs take a more planned approach. You define `.proto` files, which are a kind of schema file but much more powerful. You essentially define how your messages are structured: which fields are optional or required, their data types, and so on. The Protobuf compiler then generates the data access classes for you, and you use these classes in your business logic to facilitate communication.

Looking at a `.proto` file related to a service will also give you a very clear idea of the specifics of the communication and the features that are exposed. A typical .proto file looks like this:

syntax = "proto2";

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}
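To see how the numbered fields in the `.proto` file show up on the wire, here is a hand-written sketch that encodes only the `name` and `id` fields of `Person`, following the key and wire-type rules in the Protocol Buffers encoding documentation. `encodePerson` and its helpers are hypothetical; real code would use the classes generated by the Protobuf compiler. This version takes `id` as `uint32` to sidestep negative values (a negative `int32` is sign-extended to a 10-byte varint).

```go
package main

import "encoding/binary"

// Wire types used below, as defined in the Protobuf encoding documentation.
const (
	wireVarint = 0 // int32, int64, uint32, bool, enum
	wireBytes  = 2 // string, bytes, embedded messages
)

// appendUvarint appends v as a varint using the standard library encoder.
func appendUvarint(b []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(b, tmp[:n]...)
}

// appendKey writes a field key, (field_number << 3) | wire_type, itself as a varint.
func appendKey(b []byte, field, wireType uint64) []byte {
	return appendUvarint(b, field<<3|wireType)
}

// encodePerson hand-encodes only the name (field 1) and id (field 2) of Person.
// Hypothetical illustration: email and phone are skipped.
func encodePerson(name string, id uint32) []byte {
	var b []byte
	b = appendKey(b, 1, wireBytes)
	b = appendUvarint(b, uint64(len(name))) // length prefix for the string
	b = append(b, name...)
	b = appendKey(b, 2, wireVarint)
	b = appendUvarint(b, uint64(id))
	return b
}
```

`encodePerson("Ann", 7)` produces seven bytes, `0A 03 41 6E 6E 10 07`, carrying only field numbers and never field names. That is why the `= 1`, `= 2` tags must stay stable once data is in use. The equivalent JSON, `{"name":"Ann","id":7}`, is 21 bytes.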

Fun fact: Jon Skeet, the king of Stack Overflow, is one of the main contributors to the project.

gRPC

gRPC, as you may have guessed, is a modern RPC (Remote Procedure Call) framework. It is a batteries-included framework with built-in support for load balancing, tracing, health checking, and authentication. It was open-sourced by Google in 2015 and has been gaining popularity ever since.

An RPC framework…? What about REST…?

SOAP with WSDL was long used for communication between different systems in a Service Oriented Architecture. At the time, contracts were strictly defined, and systems were big and monolithic, exposing a large number of such interfaces.

Then came the concept of ‘browsing’, where the server and client don’t need to be tightly coupled. A client should be able to browse service offerings even if the two were coded independently. If the client asked for information about a book, the service might, along with what was requested, also offer a list of related books for the client to browse. The REST paradigm was essential to this, as it lets the server and client communicate freely, without strict contracts, using a few primitive verbs.

In the example above, the service behaves like a monolithic system: along with what was requested, it also does any number of other things to give the client the intended ‘browsing’ experience. But that is not always the use case, is it?

Enter the Microservices

There are many reasons to adopt a Microservice Architecture, the most prominent being that a monolithic system is very hard to scale. When a big system is designed with a Microservices Architecture, each business or technical requirement is carried out as a cooperative composition of several primitive ‘micro’ services.

These services don’t need to be comprehensive in their responses. They should perform specific duties with expected responses. Ideally, they should behave like pure functions for seamless composability.

Using REST as the communication paradigm for such services doesn’t provide much benefit. Exposing a REST API does give a service a lot of expressive power, but if that power is neither required nor intended, we can use a paradigm that focuses on other factors.

gRPC aims to improve on traditional HTTP requests in the following technical aspects:

- HTTP/2 by default, with all its goodies.
- Protobuf as the wire format, since machines are talking to machines.
- Dedicated support for streaming calls, thanks to HTTP/2.
- Pluggable auth, tracing, load balancing and health checking, because you always need these.

As it’s an RPC framework, we again have concepts like Service Definition and Interface Description Language, which may feel alien to people who weren’t around before REST. This time, though, it feels a lot less clumsy, because gRPC uses Protobuf for both.

Protobuf is designed in such a way that it can be used as a communication format as well as a protocol specification tool without introducing anything new. A typical gRPC service definition looks like this:

syntax = "proto3";

service HelloService {
  rpc SayHello (HelloRequest) returns (HelloResponse);
}

message HelloRequest {
  string greeting = 1;
}

message HelloResponse {
  string reply = 1;
}

You just write a `.proto` file for your service describing the interface name, what it expects and what it returns, as Protobuf messages. The Protobuf compiler, with its gRPC plugin, then generates both the client and server-side code. Clients can call the generated code directly, and the server side implements these APIs to fill in the business logic.

Conclusion

Golang, along with gRPC using Protobuf is an emerging stack for modern server programming. Golang simplifies making concurrent/parallel applications and gRPC with Protobuf enables efficient communication with a pleasing developer experience.

December 12, 2022

Scalable Real-time Communication With Pusher
What and why?

Pusher is a hosted API service which makes adding real-time data and functionality to web and mobile applications seamless.

Pusher works as a real-time communication layer between the server and the client. It maintains persistent WebSocket connections with clients, so whenever new data is added on your server and the server wants to push it to clients, it can do so instantly through Pusher. It is highly flexible, scalable, and easy to integrate, and Pusher offers more than 40 SDKs covering almost all tech stacks.

For delivering real-time data, there are other hosted and self-hosted services available. Which one fits depends on what exactly you need, for example whether you broadcast data to all users or to more complex, specific target groups. In our use case, Pusher was well suited; the decision was based on its ease of use, scalability, private and public channels, webhooks, and event-based automation. Other options we considered included Socket.IO, Firebase and Ably.

Pusher is particularly well suited to communication and collaboration features built on WebSockets. The key difference with Pusher is that it’s a hosted service/API. It takes less work to get started than alternatives where you manage the deployment yourself, and once it is set up, Pusher handles scaling, which reduces future effort.

Some of the most common use cases of Pusher are:

1. Notification: Pusher can inform users of any relevant change. Notifications can also be a form of signaling that has no representation in the UI but still triggers a reaction within the application.

2. Activity streams: A stream of activities published when something changes on the server or when someone publishes one across all channels.

3. Live Data Visualizations: Pusher allows you to broadcast continuously changing data when needed.

4. Chats: You can use Pusher for peer-to-peer or peer-to-multichannel communication.

In this blog, we will focus on Channels, Pusher’s Pub/Sub messaging API, in a JavaScript-based application. Pusher also offers the Chatkit and Beams (push notification) SDKs/APIs:
- Chatkit is designed to make adding chat to your app as simple as possible. It lets you add group chat and one-to-one chat, as well as file attachments and online indicators.
- Beams is used for adding push notifications to your mobile app. It includes SDKs to seamlessly manage push tokens and send notifications.
Step 1: Getting Started

Set up your account on the Pusher dashboard and get your free API keys.

Image Source: Pusher
1. Click on Channels
2. Create an App. Add details based on the project and the environment
3. Click on the App Keys tab to get the app keys.
4. You can also check the getting started page. It will give code snippets to get you started.
Add Pusher to your project:
```
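var express = require('express');
var bodyParser = require('body-parser');
var Pusher = require('pusher');

// Server-side Pusher client; replace the placeholders with your app keys.
var pusher = new Pusher({
  appId: 'APP_ID',
  key: 'APP_KEY',
  secret: 'APP_SECRET',
  cluster: 'APP_CLUSTER'
});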

var app = express();
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: false }));

app.post('/pusher/auth', function(req, res) {
  var socketId = req.body.socket_id;
  var channel = req.body.channel_name;
  var auth = pusher.authenticate(socketId, channel);
  res.send(auth);
});

var port = process.env.PORT || 5000;
app.listen(port);
```

or using npm
```
npm i pusher
```

Step 2: Subscribing to Channels

There are three types of channels in Pusher: Public, Private, and Presence.
- Public channels: These channels are public, so anyone who knows the channel name can subscribe and start receiving its messages. Public channels are commonly used to broadcast general information that contains nothing sensitive or user-specific.
- Private channels: These channels have an access control mechanism that lets the server control who can subscribe to the channel and receive its data. Private channel names must be prefixed with `private-`. They are commonly used when the server needs to know who is subscribing and validate the subscribers.
- Presence channels: These are an extension of private channels. In addition to the properties of private channels, they let the server ‘register’ user information on subscription, which also lets other members see who is online.
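The channel type follows from the channel name, which makes the rules above easy to express in code. This is a minimal sketch in Go, the language of the earlier sections, and not part of any Pusher SDK; `ChannelType`, `channelType` and `needsAuth` are hypothetical names. The `private-` prefix comes from the description above; the `presence-` prefix is taken from Pusher’s documentation.

```go
package main

import "strings"

// ChannelType mirrors the three kinds of Pusher channels.
type ChannelType int

const (
	Public ChannelType = iota
	Private
	Presence
)

func (t ChannelType) String() string {
	return [...]string{"public", "private", "presence"}[t]
}

// channelType infers the type from the name prefix; anything without a
// recognized prefix is a public channel.
func channelType(name string) ChannelType {
	switch {
	case strings.HasPrefix(name, "presence-"):
		return Presence
	case strings.HasPrefix(name, "private-"):
		return Private
	default:
		return Public
	}
}

// needsAuth reports whether subscribing requires a call to the server's auth endpoint.
func needsAuth(name string) bool {
	return channelType(name) != Public
}
```

A client library can use a check like `needsAuth` to decide whether a subscription must first be authorized by your server or can go straight through.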
In your application, you can create a subscription and start listening to events like this:
```
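// Connect to your app; APP_KEY and APP_CLUSTER are placeholders for your app keys.
var pusher = new Pusher('APP_KEY', { cluster: 'APP_CLUSTER' });

// Here my-channel is the channel name. All events published to this
// channel are delivered once you subscribe to it and start listening.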

var channel = pusher.subscribe('my-channel');

channel.bind('my-event', function(data) {
  alert('An event was triggered with message: ' + data.message);
});
```

Step 3: Creating Channels

To create channels, you can use the dashboard or integrate Pusher with your server. For more details on server integration, see the Server API documentation. Either way, you need to create an app on your Pusher dashboard, which you then use to trigger events to your app.

or

Integrate Pusher with your server. Here is a sample snippet from our Node app:
```
var Pusher = require('pusher');

var pusher = new Pusher({
  appId: 'APP_ID',
  key: 'APP_KEY',
  secret: 'APP_SECRET',
  cluster: 'APP_CLUSTER'
});

// Logic which will then trigger events to a channel
function trigger() {
  // ...
  pusher.trigger('my-channel', 'my-event', {"message": "hello world"});
  // ...
}
```
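The subscribe, bind and trigger calls in Steps 2 and 3 follow a Pub/Sub routing model: handlers are keyed by channel and event name, and a trigger fans the payload out to every matching handler. The sketch below is a toy, in-process Go model of that routing only, not how Pusher is implemented (there are no WebSockets, authorization or persistence here). `Hub`, `Bind` and `Trigger` are hypothetical names, and `Bind` stands in for `pusher.subscribe` plus `channel.bind`.

```go
package main

import "sync"

// Hub is a toy, in-process model of channel/event routing.
type Hub struct {
	mu       sync.RWMutex
	handlers map[string]map[string][]func(data map[string]string)
}

func NewHub() *Hub {
	return &Hub{handlers: make(map[string]map[string][]func(map[string]string))}
}

// Bind registers a handler for one event on one channel.
func (h *Hub) Bind(channel, event string, fn func(data map[string]string)) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.handlers[channel] == nil {
		h.handlers[channel] = make(map[string][]func(map[string]string))
	}
	h.handlers[channel][event] = append(h.handlers[channel][event], fn)
}

// Trigger delivers data to every handler bound to channel/event and returns how
// many handlers ran. Handlers are copied first so none run while the lock is held.
func (h *Hub) Trigger(channel, event string, data map[string]string) int {
	h.mu.RLock()
	fns := append([]func(map[string]string){}, h.handlers[channel][event]...)
	h.mu.RUnlock()
	for _, fn := range fns {
		fn(data)
	}
	return len(fns)
}
```

An event published on a channel nobody is bound to, or under an event name nobody listens for, simply reaches no handlers, which matches the behavior you see from the client snippets.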

Step 4: Adding Security

By default, anyone who knows your public app key can open a connection to your Channels app. This does not add any security risk, as connections can only access data on channels.

For more advanced use cases, you need the “Authorized Connections” feature. It authorizes every single connection to your channels and hence blocks unwanted or unauthorized connections. To enable authorization, set up an auth endpoint, then modify your client code to look like this:
```
const channels = new Pusher(APP_KEY, {
  cluster: APP_CLUSTER,
  authEndpoint: '/your_auth_endpoint'
});

const channel = channels.subscribe('private-<channel-name>');
```

For more details on how to create an auth endpoint for your server, read this. Here is a snippet from our Node.js app:
```
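var express = require('express');
var bodyParser = require('body-parser');
var Pusher = require('pusher');

// Server-side Pusher client; replace the placeholders with your app keys.
var pusher = new Pusher({
  appId: 'APP_ID',
  key: 'APP_KEY',
  secret: 'APP_SECRET',
  cluster: 'APP_CLUSTER'
});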

var app = express();
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: false }));

app.post('/pusher/auth', function(req, res) {
  var socketId = req.body.socket_id;
  var channel = req.body.channel_name;
  var auth = pusher.authenticate(socketId, channel);
  res.send(auth);
});

var port = process.env.PORT || 5000;
app.listen(port);
```
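Under the hood, `pusher.authenticate` returns a signed token. Pusher’s documented format for private channels is the app key, a colon, and the hex-encoded HMAC-SHA256 of `socket_id:channel_name` keyed with the app secret. Here is a minimal sketch of that computation in Go, useful for understanding or testing an auth endpoint; `privateChannelAuth` and `verifyAuth` are hypothetical names, the key and secret are placeholders, and presence channels (which also sign the user’s channel data) are not covered.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// privateChannelAuth computes the "auth" value for a private-channel subscription:
// "<app key>:<hex HMAC-SHA256 of socket_id:channel_name, keyed with the app secret>".
func privateChannelAuth(appKey, appSecret, socketID, channel string) string {
	mac := hmac.New(sha256.New, []byte(appSecret))
	mac.Write([]byte(socketID + ":" + channel)) // hash writes never return an error
	return appKey + ":" + hex.EncodeToString(mac.Sum(nil))
}

// verifyAuth recomputes the expected value and compares it in constant time.
func verifyAuth(auth, appKey, appSecret, socketID, channel string) bool {
	want := privateChannelAuth(appKey, appSecret, socketID, channel)
	return hmac.Equal([]byte(auth), []byte(want))
}
```

Because the signature covers both the socket ID and the channel name, a token issued for one channel cannot be reused to join another, and the app secret never leaves the server.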

Step 5: Scale as you grow

Pusher offers a wide range of plans you can subscribe to based on your usage, so you can scale your application as it grows. Here is a snapshot of the available plans; for more details, refer to this.

Image Source: Pusher

Conclusion

This article covered a brief description of Pusher, its use cases, and how you can use it to build a scalable real-time application. How you use Pusher will vary with your use case, and there is no single right choice among the alternatives. Pusher’s approach is simple and API-based, and it enables developers to add real-time functionality to any application in very little time.

If you want to get hands-on tutorials/blogs, please visit here.
December 12, 2022
