Blog

Understanding Node.js Async Flows: Parallel, Serial, Waterfall and Queues

Promises in Javascript has been around for a long time now. It helped solve the problem of callback hell. But as soon as the requirements get complicated with control flows, promises start getting unmanageable and harder to work with. This is where async flows come to the rescue. In this blog, let’s talk about the various async flows which are used frequently rather than raw promises and callbacks.

Async Utility Module

Async is a utility module which provides straight-forward, powerful functions for working with asynchronous JavaScript. Although it is built on top of promises, it makes asynchronous code look and behave a little more like synchronous code, making it easier to read and maintain.

Async utility has a number of control flows. Let’s discuss the most popular ones and their use cases:

1. Parallel

When we have to run multiple tasks independent of each other without waiting until the previous task has completed, parallel comes into the picture.

async.parallel(tasks, callback)

async.parallel(tasks, callback)

Tasks: A collection of functions to run. It can be an array, an object or any iterable.

Callback: This is the callback where all the task results are passed and is executed once all the task execution has completed.

In case an error is passed to a function’s callback, the main callback is immediately called with the error. Although parallel is about starting I/O tasks in parallel, it’s not about parallel execution since Javascript is single-threaded.

An example of Parallel is shared below:

async.parallel([
  function(callback) {
    setTimeout(function() {
      console.log('Task One');
      callback(null, 1);
    }, 200);
  },
  function(callback) {
    setTimeout(function() {
      console.log('Task Two');
      callback(null, 2);
    }, 100);
  }
],
function(err, results) {
  console.log(results);
  // the results array will equal [1, 2] even though
  // the second function had a shorter timeout.
});

// an example using an object instead of an array
async.parallel({
  task1: function(callback) {
    setTimeout(function() {
      console.log('Task One');
      callback(null, 1);
    }, 200);
  },
  task2: function(callback) {
    setTimeout(function() {
      console.log('Task Two');
      callback(null, 2);
    }, 100);
    }
}, function(err, results) {
  console.log(results);
  // results now equals to: { task1: 1, task2: 2 }
});

async.parallel([
  function(callback) {
    setTimeout(function() {
      console.log('Task One');
      callback(null, 1);
    }, 200);
  },
  function(callback) {
    setTimeout(function() {
      console.log('Task Two');
      callback(null, 2);
    }, 100);
  }
],
function(err, results) {
  console.log(results);
  // the results array will equal [1, 2] even though
  // the second function had a shorter timeout.
});

// an example using an object instead of an array
async.parallel({
  task1: function(callback) {
    setTimeout(function() {
      console.log('Task One');
      callback(null, 1);
    }, 200);
  },
  task2: function(callback) {
    setTimeout(function() {
      console.log('Task Two');
      callback(null, 2);
    }, 100);
    }
}, function(err, results) {
  console.log(results);
  // results now equals to: { task1: 1, task2: 2 }
});

2. Series

When we have to run multiple tasks which depend on the output of the previous task, series comes to our rescue.

async.series(tasks, callback)

async.series(tasks, callback)

Tasks: A collection of functions to run. It can be an array, an object or any iterable.

Callback: This is the callback where all the task results are passed and is executed once all the task execution has completed.

Callback function receives an array of result objects when all the tasks have been completed. If an error is encountered in any of the task, no more functions are run but the final callback is called with the error value.

An example of Series is shared below:

async.series([
  function(callback) {
    console.log('one');
    callback(null, 1);
  },
  function(callback) {
    console.log('two');
    callback(null, 2);
  },
  function(callback) {
    console.log('three');
    callback(null, 3);
  }
],
function(err, results) {
  console.log(result);
  // results is now equal to [1, 2, 3]
});

async.series({
  1: function(callback) {
    setTimeout(function() {
      console.log('Task 1');
      callback(null, 'one');
    }, 200);
  },
  2: function(callback) {
    setTimeout(function() {
      console.log('Task 2');
      callback(null, 'two');
    }, 300);
  },
  3: function(callback) {
    setTimeout(function() {
      console.log('Task 3');
      callback(null, 'three');
    }, 100);
  }
},
function(err, results) {
  console.log(results);
  // results is now equal to: { 1: 'one', 2: 'two', 3:'three' }
});

async.series([
  function(callback) {
    console.log('one');
    callback(null, 1);
  },
  function(callback) {
    console.log('two');
    callback(null, 2);
  },
  function(callback) {
    console.log('three');
    callback(null, 3);
  }
],
function(err, results) {
  console.log(result);
  // results is now equal to [1, 2, 3]
});

async.series({
  1: function(callback) {
    setTimeout(function() {
      console.log('Task 1');
      callback(null, 'one');
    }, 200);
  },
  2: function(callback) {
    setTimeout(function() {
      console.log('Task 2');
      callback(null, 'two');
    }, 300);
  },
  3: function(callback) {
    setTimeout(function() {
      console.log('Task 3');
      callback(null, 'three');
    }, 100);
  }
},
function(err, results) {
  console.log(results);
  // results is now equal to: { 1: 'one', 2: 'two', 3:'three' }
});

3. Waterfall

When we have to run multiple tasks which depend on the output of previous task, Waterfall can be helpful.

async.waterfall(tasks, callback)

async.waterfall(tasks, callback)

Tasks: A collection of functions to run. It can be an array, an object or any iterable structure.

Callback: This is the callback where all the task results are passed and is executed once all the task execution has completed.

It will run one function at a time and pass the result of the previous function to the next one.

An example of Waterfall is shared below:

async.waterfall([
  function(callback) {
    callback(null, 'Task 1', 'Task 2');
  },
  function(arg1, arg2, callback) {
    // arg1 now equals 'Task 1' and arg2 now equals 'Task 2'
    let arg3 = arg1 + ' and ' + arg2;
    callback(null, arg3);
  },
  function(arg1, callback) {
    // arg1 now equals 'Task1 and Task2'
    arg1 += ' completed';
    callback(null, arg1);
  }
], function(err, result) {
  // result now equals to 'Task1 and Task2 completed'
  console.log(result);
});

// Or, with named functions:
async.waterfall([
  myFirstFunction,
  mySecondFunction,
  myLastFunction,
], function(err, result) {
  // result now equals 'Task1 and Task2 completed'
  console.log(result);
});

function myFirstFunction(callback) {
  callback(null, 'Task 1', 'Task 2');
}
function mySecondFunction(arg1, arg2, callback) {
  // arg1 now equals 'Task 1' and arg2 now equals 'Task 2'
  let arg3 = arg1 + ' and ' + arg2;
  callback(null, arg3);
}
function myLastFunction(arg1, callback) {
  // arg1 now equals 'Task1 and Task2'
  arg1 += ' completed';
  callback(null, arg1);
}

async.waterfall([
  function(callback) {
    callback(null, 'Task 1', 'Task 2');
  },
  function(arg1, arg2, callback) {
    // arg1 now equals 'Task 1' and arg2 now equals 'Task 2'
    let arg3 = arg1 + ' and ' + arg2;
    callback(null, arg3);
  },
  function(arg1, callback) {
    // arg1 now equals 'Task1 and Task2'
    arg1 += ' completed';
    callback(null, arg1);
  }
], function(err, result) {
  // result now equals to 'Task1 and Task2 completed'
  console.log(result);
});

// Or, with named functions:
async.waterfall([
  myFirstFunction,
  mySecondFunction,
  myLastFunction,
], function(err, result) {
  // result now equals 'Task1 and Task2 completed'
  console.log(result);
});

function myFirstFunction(callback) {
  callback(null, 'Task 1', 'Task 2');
}
function mySecondFunction(arg1, arg2, callback) {
  // arg1 now equals 'Task 1' and arg2 now equals 'Task 2'
  let arg3 = arg1 + ' and ' + arg2;
  callback(null, arg3);
}
function myLastFunction(arg1, callback) {
  // arg1 now equals 'Task1 and Task2'
  arg1 += ' completed';
  callback(null, arg1);
}

4. Queue

When we need to run a set of tasks asynchronously, queue can be used. A queue object based on an asynchronous function can be created which is passed as worker.

async.queue(task, concurrency)

async.queue(task, concurrency)

Task: Here, it takes two parameters, first – the task to be performed and second – the callback function.

Concurrency: It is the number of functions to be run in parallel.

async.queue returns a queue object that supports few properties:

push: Adds tasks to the queue to be processed.
drain: The drain function is called after the last task of the queue.
unshift: Adds tasks in front of the queue.

An example of Queue is shared below:

// create a queue object with concurrency 2
var q = async.queue(function(task, callback) {
  console.log('Hello ' + task.name);
  callback();
}, 2);

// assign a callback
q.drain = function() {
  console.log('All items have been processed');
};

// add some items to the queue
q.push({name: 'foo'}, function(err) {
  console.log('Finished processing foo');
});

q.push({name: 'bar'}, function (err) {
  console.log('Finished processing bar');
});

// add some items to the queue (batch-wise)
q.push([{name: 'baz'},{name: 'bay'},{name: 'bax'}], function(err) {
  console.log('Finished processing item');
});

// add some items to the front of the queue
q.unshift({name: 'bar'}, function (err) {
  console.log('Finished processing bar');
});

// create a queue object with concurrency 2
var q = async.queue(function(task, callback) {
  console.log('Hello ' + task.name);
  callback();
}, 2);

// assign a callback
q.drain = function() {
  console.log('All items have been processed');
};

// add some items to the queue
q.push({name: 'foo'}, function(err) {
  console.log('Finished processing foo');
});

q.push({name: 'bar'}, function (err) {
  console.log('Finished processing bar');
});

// add some items to the queue (batch-wise)
q.push([{name: 'baz'},{name: 'bay'},{name: 'bax'}], function(err) {
  console.log('Finished processing item');
});

// add some items to the front of the queue
q.unshift({name: 'bar'}, function (err) {
  console.log('Finished processing bar');
});

5. Priority Queue

It is the same as queue, the only difference being that a priority can be assigned to the tasks which is considered in ascending order.

async.priorityQueue(task,concurrency)

async.priorityQueue(task,concurrency)

Task: Here, it takes three parameters:

First – task to be performed.
Second – priority, it is a number that determines the sequence of execution. For array of tasks, the priority remains same for all of them.
Third – Callback function.

The async.priorityQueue does not support ‘unshift’ property of the queue.

An example of Priority Queue is shared below:

// create a queue object with concurrency 1
var q = async.priorityQueue(function(task, callback) {
  console.log('Hello ' + task.name);
  callback();
}, 1);

// assign a callback
q.drain = function() {
  console.log('All items have been processed');
};

// add some items to the queue with priority
q.push({name: 'foo'}, 3, function(err) {
  console.log('Finished processing foo');
});

q.push({name: 'bar'}, 2, function (err) {
  console.log('Finished processing bar');
});

// add some items to the queue (batch-wise) which will have same priority
q.push([{name: 'baz'},{name: 'bay'},{name: 'bax'}], 1, function(err) {
  console.log('Finished processing item');
});

// create a queue object with concurrency 1
var q = async.priorityQueue(function(task, callback) {
  console.log('Hello ' + task.name);
  callback();
}, 1);

// assign a callback
q.drain = function() {
  console.log('All items have been processed');
};

// add some items to the queue with priority
q.push({name: 'foo'}, 3, function(err) {
  console.log('Finished processing foo');
});

q.push({name: 'bar'}, 2, function (err) {
  console.log('Finished processing bar');
});

// add some items to the queue (batch-wise) which will have same priority
q.push([{name: 'baz'},{name: 'bay'},{name: 'bax'}], 1, function(err) {
  console.log('Finished processing item');
});

6. Race

It runs all the tasks in parallel, but as soon as any of the function completes its execution or passes error to its callback, the main callback is immediately called.

async.race(tasks, callback)

async.race(tasks, callback)

Task: Here, it is a collection of functions to run. It is an array or any iterable.

Callback: The result of the first complete execution is passed. It may be the result or error.

An example of Race is shared below:

async.race([
  function (callback) {
    setTimeout(function () {
      callback(null, 'one');
    }, 300);
  },
  function (callback) {
    setTimeout(function () {
      callback(null, 'two');
    }, 100);
  },
  function (callback) {
    setTimeout(function () {
      callback(null, 'three');
    }, 200);
  }
],
  // main callback
  function (err, result) {
    // the result will be equal to 'two' as it finishes earlier than the other 2
    console.log('The result is ', result);
  });

async.race([
  function (callback) {
    setTimeout(function () {
      callback(null, 'one');
    }, 300);
  },
  function (callback) {
    setTimeout(function () {
      callback(null, 'two');
    }, 100);
  },
  function (callback) {
    setTimeout(function () {
      callback(null, 'three');
    }, 200);
  }
],
  // main callback
  function (err, result) {
    // the result will be equal to 'two' as it finishes earlier than the other 2
    console.log('The result is ', result);
  });

Combining Async Flows

In complex scenarios, the async flows like parallel and series can be combined and nested. This helps in achieving the expected output with the benefits of async utilities.

However, the only difference between Waterfall and Series async utility is that the final callback in series receives an array of results of all the task whereas in Waterfall, the result object of the final task is received by the final callback.

Conclusion

Async Utilities has an upper hand over promises due to its concise and clean code, better error handling and easier debugging. It makes us realize how simple and easy asynchronous code can be without the syntactical mess of promises and callback hell.

December 12, 2022

Getting Started With Golang Channels! Here’s Everything You Need to Know
We live in a world where speed is important. With cutting-edge technology coming into the telecommunications and software industry, we expect to get things done quickly. We want to develop applications that are fast, can process high volumes of data and requests, and keep the end-user happy.

This is great, but of course, it’s easier said than done. That’s why concurrency and parallelism are important in application development. We must process data as fast as possible. Every programming language has its own way of dealing with this, and we will see how Golang does it.

Now, many of us choose Golang because of its concurrency, and the inclusion of goroutines and channels has massively impacted the concurrency.

This blog will cover channels and how they work internally, as well as their key components. To benefit the most from this content, it will help to know a little about goroutines and channels as this blog gets into the internals of channels. If you don’t know anything, then don’t worry, we’ll be starting off with an introduction to channels, and then we’ll see how they operate.

What are channels?

Normally, when we talk about channels, we think of the ones in applications like RabbitMQ, Redis, AWS SQS, and so on. Anyone with no or only a small amount of Golang knowledge would think like this. But Channels in Golang are different from a work queue system. In the work queue system like above, there are TCP connections to the channels, but in Go, the channel is a data structure or even a design pattern, which we’ll explain later. So, what are the channels in Golang exactly?

Channels are the medium through which goroutines can communicate with each other. In simple terms, a channel is a pipe that allows a goroutine to either put or read the data.

What are goroutines?

So, a channel is a communication medium for goroutines. Now, let’s give a quick overview of what goroutines are. If you know this already, feel free to skip this section.

Technically, a goroutine is a function that executes independently in a concurrent fashion. In simple terms, it’s a lightweight thread that’s managed by go runtime.

You can create a goroutine by using a Go keyword before a function call.

Let’s say there’s a function called PrintHello, like this:
```
func PrintHello() {
   fmt.Println("Hello")
}
```
You can make this into a goroutine simply by calling this function, as below:
```
//create goroutine
 go PrintHello()
```
Now, let’s head back to channels, as that’s the important topic of this blog.

How to define a channel?

Let’s see a syntax that will declare a channel. We can do so by using the chan keyword provided by Go.

You must specify the data type as the channel can handle data of the same data type.
```
//create channel
 var c chan int
```
Very simple! But this is not useful since it would create a Nil channel. Let’s print it and see.
```
fmt.Println(c)
fmt.Printf("Type of channel: %T", c)
<nil>
Type of channel: chan int
```
As you can see, we have just declared the channel, but we can’t transport data through it. So, to create a useful channel, we must use the make function.
```
//create channel
c := make(chan int)
fmt.Printf("Type of `c`: %T\n", c)
fmt.Printf("Value of `c` is %v\n", c)
 
Type of `c`: chan int
Value of `c` is 0xc000022120
```
As you may notice here, the value of c is a memory address. Keep in mind that channels are nothing but pointers. That’s why we can pass them to goroutines, and we can easily put the data or read the data. Now, let’s quickly see how to read and write the data to a channel.

Read and write operations on a channel:

Go provides an easy way to read and write data to a channel by using the left arrow.
```
c <- 10
```
This is a simple syntax to put the value in our created channel. The same syntax is used to define the “send” only type of channels.

And to get/read the data from channel, we do this:
```
<-c
```
This is also the way to define the “receive” only type of channels.

Let’s see a simple program to use the channels.
```
func printChannelData(c chan int) {
   fmt.Println("Data in channel is: ", <-c)
}
```
This simple function just prints whatever data is in the channel. Now, let’s see the main function that will push the data into the channel.
```
func main() {
   fmt.Println("Main started...")
   //create channel of int
   c := make(chan int)
   // call to goroutine
   go printChannelData(c)
   // put the data in channel
   c <- 10
   fmt.Println("Main ended...")
}
```
This yields to the output:
```
Main started...
Data in channel is:  10
Main ended...
```
Let’s talk about the execution of the program.

1. We declared a printChannelData function, which accepts a channel c of data type integer. In this function, we are just reading data from channel c and printing it.

2. Now, this method will first print “main started…” to the console.

3. Then, we have created the channel c of data type integer using the make keyword.

4. We now pass the channel to the function printChannelData, and as we saw earlier, it’s a goroutine.

5. At this point, there are two goroutines. One is the main goroutine, and the other is what we have declared.

6. Now, we are putting 10 as data in the channel, and at this point, our main goroutine is blocked and waiting for some other goroutine to read the data. The reader, in this case, is the printChannelData goroutine, which was previously blocked because there was no data in the channel. Now that we’ve pushed the data onto the channel, the Go scheduler (more on this later in the blog) now schedules printChannelData goroutine, and it will read and print the value from the channel.

7. After that, the main goroutine again activates and prints “main ended…” and the program stops.

So, what’s happening here? Basically, blocking and unblocking operations are done over goroutines by the Go scheduler. Unless there’s data in a channel you can’t read from it, which is why our printChannelData goroutine was blocked in the first place, the written data has to be read first to resume further operations. This happened in case of our main goroutine.

With this, let’s see how channels operate internally.

Internals of channels:

Until now, we have seen how to define a goroutine, how to declare a channel, and how to read and write data through a channel with a very simple example. Now, let’s look at how Go handles this blocking and unblocking nature internally. But before that, let’s quickly see the types of channels.

Types of channels:

There are two basic types of channels: buffered channels and unbuffered channels. The above example illustrates the behaviour of unbuffered channels. Let’s quickly see the definition of these:
- Unbuffered channel: This is what we have seen above. A channel that can hold a single piece of data, which has to be consumed before pushing other data. That’s why our main goroutine got blocked when we added data into the channel.
- Buffered channel: In a buffered channel, we specify the data capacity of a channel. The syntax is very simple. c := make(chan int,10) the second argument in the make function is the capacity of a channel. So, we can put up to ten elements in a channel. When the capacity is full, then that channel would get blocked so that the receiver goroutine can start consuming it.
Properties of a channel:

A channel does lot of things internally, and it holds some of the properties below:
- Channels are goroutine-safe.
- Channels can store and pass values between goroutines.
- Channels provide FIFO semantics.
- Channels cause goroutines to block and unblock, which we just learned about.
As we see the internals of a channel, you’ll learn about the first three properties.

Channel Structure:

As we learned in the definition, a channel is data structure. Now, looking at the properties above, we want a mechanism that handles goroutines in a synchronized manner and with a FIFO semantics. This can be solved using a queue with a lock. So, the channel internally behaves in that fashion. It has a circular queue, a lock, and some other fields.

When we do this c := make(chan int,10) Go creates a channel using hchan struct, which has the following fields:
type hchan struct { qcount uint // total data in the queue dataqsiz uint // size of the circular queue buf unsafe.Pointer // points to an array of dataqsiz elements elemsize uint16 closed uint32 elemtype *_type // element type sendx uint // send index recvx uint // receive index recvq waitq // list of recv waiters sendq waitq // list of send waiters // lock protects all fields in hchan, as well as several // fields in sudogs blocked on this channel. // // Do not change another G's status while holding this lock // (in particular, do not ready a G), as this can deadlock // with stack shrinking. lock mutex }
```
type hchan struct {
   qcount   uint           // total data in the queue
   dataqsiz uint           // size of the circular queue
   buf      unsafe.Pointer // points to an array of dataqsiz elements
   elemsize uint16
   closed   uint32
   elemtype *_type // element type
   sendx    uint   // send index
   recvx    uint   // receive index
   recvq    waitq  // list of recv waiters
   sendq    waitq  // list of send waiters
 
   // lock protects all fields in hchan, as well as several
   // fields in sudogs blocked on this channel.
   //
   // Do not change another G's status while holding this lock
   // (in particular, do not ready a G), as this can deadlock
   // with stack shrinking.
   lock mutex
}
```
(Above info taken from Golang.org]

This is what a channel is internally. Let’s see one-by-one what these fields are.

qcount holds the count of items/data in the queue.

dataqsize is the size of a circular queue. This is used in case of buffered channels and is the second parameter used in the make function.

elemsize is the size of a channel with respect to a single element.

buf is the actual circular queue where the data is stored when we use buffered channels.

closed indicates whether the channel is closed. The syntax to close the channel is close(<channel_name>). The default value of this field is 0, which is set when the channel gets created, and it’s set to 1 when the channel is closed.

sendx and recvx indicates the current index of a buffer or circular queue. As we add the data into the buffered channel, sendx increases, and as we start receiving, recvx increases.

recvq and sendq are the waiting queue for the blocked goroutines that are trying to either read data from or write data to the channel.

lock is basically a mutex to lock the channel for each read or write operation as we don’t want goroutines to go into deadlock state.

These are the important fields of a hchan struct, which comes into the picture when we create a channel. This hchan struct basically resides on a heap and the make function gives us a pointer to that location. There’s another struct known as sudog, which also comes into the picture, but we’ll learn more about that later. Now, let’s see what happens when we write and read the data.

Read and write operations on a channel:

We are considering buffered channels in this. When one goroutine, let’s say G1, wants to write the data onto a channel, it does following:
- Acquire the lock: As we saw before, if we want to modify the channel, or hchan struct, we must acquire a lock. So, G1 in this case, will acquire a lock before writing the data.
- Perform enqueue operation: We now know that buf is actually a circular queue that holds the data. But before enqueuing the data, goroutine does a memory copy operation on the data and puts the copy into the buffer slot. We will see an example of this.
- Release the lock: After performing an enqueue operation, it just releases the lock and goes on performing further executions.
When goroutine, let’s say G2, reads the above data, it performs the same operation, except instead of enqueue, it performs dequeue while also performing the memory copy operation. This states that in channels there’s no shared memory, so the goroutines only share the hchan struct, which is protected by mutex. Others are just copies of memory.

This satisfies the famous Golang quote: “Do not communicate by sharing memory instead share memory by communicating.”

Now, let’s look at a small example of this memory copy operation.
func printData(c chan *int) { time.Sleep(time.Second * 3) data := <-c fmt.Println("Data in channel is: ", *data) } func main() { fmt.Println("Main started...") var a = 10 b := &a //create channel c := make(chan *int) go printData(c) fmt.Println("Value of b before putting into channel", *b) c <- b a = 20 fmt.Println("Updated value of a:", a) fmt.Println("Updated value of b:", *b) time.Sleep(time.Second * 2) fmt.Println("Main ended...") }
```
func printData(c chan *int) {
   time.Sleep(time.Second * 3)
   data := <-c
   fmt.Println("Data in channel is: ", *data)
}
 
func main() {
   fmt.Println("Main started...")
   var a = 10
   b := &a
   //create channel
   c := make(chan *int)
   go printData(c)
   fmt.Println("Value of b before putting into channel", *b)
   c <- b
   a = 20
   fmt.Println("Updated value of a:", a)
   fmt.Println("Updated value of b:", *b)
   time.Sleep(time.Second * 2)
   fmt.Println("Main ended...")
}
```
And the output of this is:
```
Main started...
Value of b before putting into channel 10
Updated value of a: 20
Updated value of b: 20
Data in channel is:  10
Main ended...
```
So, as you can see, we have added the value of variable a into the channel, and we modify that value before the channel can access it. However, the value in the channel stays the same, i.e., 10. Because here, the main goroutine has performed a memory copy operation before putting the value onto the channel. So, even if you change the value later, the value in the channel does not change.

Write in case of buffer overflow:

We’ve seen that the Go routine can add data up to the buffer capacity, but what happens when the buffer capacity is reached? When the buffer has no more space and a goroutine, let’s say G1, wants to write the data, the go scheduler blocks/pauses G1, which will wait until a receive happens from another goroutine, say G2. Now, since we are talking about buffer channels, when G2 consumes all the data, the Go scheduler makes G1 active again and G2 pauses. Remember this scenario, as we’ll use G1 and G2 frequently here onwards.

We know that goroutine works in a pause and resume fashion, but who controls it? As you might have guessed, the Go scheduler does the magic here. There are few things that the Go scheduler does and those are very important considering the goroutines and channels.

Go Runtime Scheduler

You may already know this, but goroutines are user-space threads. Now, the OS can schedule and manage threads, but it’s overhead to the OS, considering the properties that threads carry.

That’s why the Go scheduler handles the goroutines, and it basically multiplexes the goroutines on the OS threads. Let’s see how.

There are scheduling models, like 1:1, N:1, etc., but the Go scheduler uses the M:N scheduling model.

Basically, this means that there are a number of goroutines and OS threads, and the scheduler basically schedules the M goroutines on N OS threads. For example:

OS Thread 1:

OS Thread 2:

As you can see, there are two OS threads, and the scheduler is running six goroutines by swapping them as needed. The Go scheduler has three structures as below:
- M: M represents the OS thread, which is entirely managed by the OS, and it’s similar to POSIX thread. M stands for machine.
- G: G represents the goroutine. Now, a goroutine is a resizable stack that also includes information about scheduling, any channel it’s blocked on, etc.
- P: P is a context for scheduling. This is like a single thread that runs the Go code to multiplex M goroutines to N OS threads. This is important part, and that’s why P stands for processor.
Diagrammatically, we can represent the scheduler as:

(This diagram is referenced from The Go scheduler]

The P processor basically holds the queue of runnable goroutines—or simply run queues.

So, anytime the goroutine (G) wants to run it on a OS thread (M), that OS thread first gets hold of P i.e., the context. Now, this behaviour occurs when a goroutine needs to be paused and some other goroutines must run. One such case is a buffered channel. When the buffer is full, we pause the sender goroutine and activate the receiver goroutine.

Imagine the above scenario: G1 is a sender that tries to send a full buffered channel, and G2 is a receiver goroutine. Now, when G1 wants to send a full channel, it calls into the runtime Go scheduler and signals it as gopark. So, now scheduler, or M, changes the state of G1 from running to waiting, and it will schedule another goroutine from the run queue, say G2.

This transition diagram might help you better understand:

As you can see, after the gopark call, G1 is in a waiting state and G2 is running. We haven’t paused the OS thread (M); instead, we’ve blocked the goroutine and scheduled another one. So, we are using maximum throughput of an OS thread. The context switching of goroutine is handled by the scheduler (P), and because of this, it adds complexity to the scheduler.

This is great. But how do we resume G1 now because it still wants to add the data/task on a channel, right? So, before G1 sends the gopark signal, it actually sets a state of itself on a hchan struct, i.e., our channel in the sendq field. Remember the sendq and recvq fields? They’re waiting senders and receivers.

Now, G1 stores the state of itself as a sudog struct. A sudog is simply a goroutine that is waiting on an element. The sudog struct has these elements:
```
type sudog struct{
   g *g
   isSelect bool
   next *sudog
   prev *sudog
   elem unsafe.Pointer //data element
   ...
}
```
g is a waiting goroutine, next and prev are the pointers to sudog/goroutine respectively if there’s any next or previous goroutine present, and elem is the actual element it’s waiting on.

So, considering our example, G1 is basically waiting to write the data so it will create a state of itself, which we’ll call sudog as below:

Cool. Now we know, before going into the waiting state, what operations G1 performs. Currently, G2 is in a running state, and it will start consuming the channel data.

As soon as it receives the first data/task, it will check the waiting goroutine in the sendq attribute of an hchan struct, and it will find that G1 is waiting to push data or a task. Now, here is the interesting thing: G2 will copy that data/task to the buffer, and it will call the scheduler, and the scheduler will put G1 from the waiting state to runnable, and it will add G1 to the run queue and return to G2. This call from G2 is known as goready, and it will happen for G1. Impressive, right? Golang behaves like this because when G1 runs, it doesn’t want to hold onto a lock and push the data/task. That extra overhead is handled by G2. That’s why the sudog has the data/task and the details for the waiting goroutine. So, the state of G1 is like this:

As you can see, G1 is placed on a run queue. Now we know what’s done by the goroutine and the go scheduler in case of buffered channels. In this example, the sender gorountine came first, but what if the receiver goroutine comes first? What if there’s no data in the channel and the receiver goroutine is executed first? The receiver goroutine (G2) will create a sudog in recvq on the hchan struct. Things are a little twisted when G1 goroutine activates. It will now see whether there are any goroutines waiting in the recvq, and if there is, it will copy the task to the waiting goroutine’s (G2) memory location, i.e., the elem attribute of the sudog.

This is incredible! Instead of writing to the buffer, it will write the task/data to the waiting goroutine’s space simply to avoid G2’s overhead when it activates. We know that each goroutine has its own resizable stack, and they never use each other’s space except in case of channels. Until now, we have seen how the send and receive happens in a buffered channel.

This may have been confusing, so let me give you the summary of the send operation.

Summary of a send operation for buffered channels:
1. Acquire lock on the entire channel or the hchan struct.
2. Check if there’s any sudog or a waiting goroutine in the recvq. If so, then put the element directly into its stack. We saw this just now with G1 writing to G2’s stack.
3. If recvq is empty, then check whether the buffer has space. If yes, then do a memory copy of the data.
4. If the buffer is full, then create a sudog under sendq of the hchan struct, which will have details, like a currently executing goroutine and the data to put on the channel.
We have seen all the above steps in detail, but concentrate on the last point.

It’s kind of similar to an unbuffered channel. We know that for unbuffered channels, every read must have a write operation first and vice versa.

So, keep in mind that an unbuffered channel always works like a direct send. So, a summary of a read and write operation in unbuffered channel could be:
- Sender first: At this point, there’s no receiver, so the sender will create a sudog of itself and the receiver will receive the value from the sudog.
- Receiver first: The receiver will create a sudog in recvq, and the sender will directly put the data in the receiver’s stack.
With this, we have covered the basics of channels. We’ve learned how read and write operates in a buffered and unbuffered channel, and we talked about the Go runtime scheduler.

Conclusion:

Channels is a very interesting Golang topic. They seem to be difficult to understand, but when you learn the mechanism, they’re very powerful and help you to achieve concurrency in applications. Hopefully, this blog helps your understanding of the fundamental concepts and the operations of channels.
December 12, 2022

Building Type Safe Backend Apps with Typegoose and TypeGraphQL

In this article, we will be trying to solve the most common problem encountered while trying to model MongoDB backend schema with TypeScript and Mongoose. We will also try to address and solve the difficulties of maintaining GraphQL types.

Almost every serious JavaScript developer uses TypeScript. However, many aged libraries do not support it natively, which becomes an increasing issue as the project grows. Then, if you add up GraphQL, which is a great modern API development solution, it becomes too much of a boilerplate.

Prerequisites

This article assumes that you have working knowledge of TypeScript, MongoDB, and GraphQL. We’ll be using Mongoose for specifying models, which is the go-to Object Document Mapper (ODM) solution for MongoDB.

Let’s consider a basic example of a Mongoose model written in TypeScript. This might look something like the one mentioned below, a user model with basic model properties (email, first name, last name, and password):

import { Document, Model, Schema } from "mongoose";
import { db } from "../util/database";

export interface IUserProps {
  email: string;
  firstName: string;
  lastName: string;
  password: string;
}

export interface IUserDocument extends IUserProps, Document {
}

export interface IUserModel extends Model<IUserDocument> {
  dateCreated: Date;
  lastUpdated: Date;
  hashPassword(password: string): string;
}

const UserSchema: Schema = new Schema(
  {
    email: {
      type: String,
      unique: true,
    },
    firstName: {
      type: String,
    },
    password: {
      type: String,
    },
  },
  { timestamps: true }
);

const hashPassword = (_password: string) => {
  // logic to hash passwords
}

UserSchema.method("hashPassword", hashPassword);

export const User: IUserModel = db.model<IUserDocument, IUserModel>(
  "User",
  UserSchema
);

import { Document, Model, Schema } from "mongoose";
import { db } from "../util/database";

export interface IUserProps {
  email: string;
  firstName: string;
  lastName: string;
  password: string;
}

export interface IUserDocument extends IUserProps, Document {
}

export interface IUserModel extends Model<IUserDocument> {
  dateCreated: Date;
  lastUpdated: Date;
  hashPassword(password: string): string;
}

const UserSchema: Schema = new Schema(
  {
    email: {
      type: String,
      unique: true,
    },
    firstName: {
      type: String,
    },
    password: {
      type: String,
    },
  },
  { timestamps: true }
);

const hashPassword = (_password: string) => {
  // logic to hash passwords
}

UserSchema.method("hashPassword", hashPassword);

export const User: IUserModel = db.model<IUserDocument, IUserModel>(
  "User",
  UserSchema
);

As you can see, it would be cumbersome to add and maintain interfaces manually with Mongoose. We would need at least 2-3 interfaces to occupy the typing needs to get model properties and methods working with proper typing.

Moving forward to add our queries and mutations, we need to create resolvers for the model above, assuming we have a service that deals with models. Here’s what our resolver looks like:

import { ObjectId } from 'bson';
import { IResolvers } from 'graphql-tools';
import { IUserProps } from './user.model';
import { UserService } from './user.service';

const userService = new UserService();
export const userResolvers: IResolvers = {
  Query: {
    User: (_root: unknown, args: { id: ObjectId }) => userService.get(args.id),
    //...
  },
  Mutation: {
    createUser: async (_root: unknown, args: IUserProps) => await userService.create(args),
    //...
  }
};

import { ObjectId } from 'bson';
import { IResolvers } from 'graphql-tools';
import { IUserProps } from './user.model';
import { UserService } from './user.service';

const userService = new UserService();
export const userResolvers: IResolvers = {
  Query: {
    User: (_root: unknown, args: { id: ObjectId }) => userService.get(args.id),
    //...
  },
  Mutation: {
    createUser: async (_root: unknown, args: IUserProps) => await userService.create(args),
    //...
  }
};

Not bad, we got our model and service and the resolver also looks good. But wait, we need to add GraphQL types as well. Here we are intentionally not including inputs to keep it short. Let’s do that:

type Query {
  User(id: String): User
}

type Mutation {
  createUser(
    email: String,
    firstName: String,
    lastName: String,
    password: String,
  ): User
}

type User {
  id: String!
  email: String!
  firstName: String!
  lastName: String!
  password: String!
}

type Query {
  User(id: String): User
}

type Mutation {
  createUser(
    email: String,
    firstName: String,
    lastName: String,
    password: String,
  ): User
}

type User {
  id: String!
  email: String!
  firstName: String!
  lastName: String!
  password: String!
}

Now, we have to club the schemas and resolvers together then pass them onto the GraphQL Express server—Apollo Server in this case:

import * as path from 'path';
import * as fs from 'fs';
import { ApolloServer } from 'apollo-server'
import { makeExecutableSchema }  from 'graphql-tools';
import { resolvers } from './src/resolvers';

const userSchema = path.join(__dirname, 'src/user/user.schema.graphql');
const schemaDef = fs.readFileSync(userSchema, 'utf8');

const schema = makeExecutableSchema({ typeDefs: schemaDef });

const server = new ApolloServer({ schema, resolvers });

server.listen().then(({ url }) => {
  console.log(`🚀 Server ready at ${url}`);
});

import * as path from 'path';
import * as fs from 'fs';
import { ApolloServer } from 'apollo-server'
import { makeExecutableSchema }  from 'graphql-tools';
import { resolvers } from './src/resolvers';

const userSchema = path.join(__dirname, 'src/user/user.schema.graphql');
const schemaDef = fs.readFileSync(userSchema, 'utf8');

const schema = makeExecutableSchema({ typeDefs: schemaDef });

const server = new ApolloServer({ schema, resolvers });

server.listen().then(({ url }) => {
  console.log(`🚀 Server ready at ${url}`);
});

With this setup, we got four files per model: model, resolver, service, and GraphQL schema file.

That’s too many things to keep in sync in real life. Imagine you need to add a new property to the above model after reaching production. You’ll end up doing at least following:

Add a migration to sync the DB
Update the interfaces
Update the model schema
Update the GraphQL schema

Possible Solution

As we know, after this setup, we’re mostly dealing with the entity models and struggling to keep its types and relations in sync.

If the model itself can handle it somehow, we can definitely save some effort, which means we can sort things out if these entity model classes can represent both the database schema and its types.

Adding TypeGoose

Mongoose schema declarations with TypeScript can get tricky—or there might be a better way. Let’s add TypeGoose, so you no longer have to maintain interfaces (arguably). Here’s what the same user model looks like:

import { DocumentType, getModelForClass, prop as Property } from '@typegoose/typegoose';
import { getSchemaOptions } from 'src/util/typegoose';
import { Field as GqlField, ObjectType as GqlType } from 'type-graphql';

export class User {
  readonly _id: string;

  @Property({ required: true })
  firstName: string;

  @Property({ required: false })
  lastName: string;

  @Property({ required: true })
  password: string;

  @Property({ required: true, unique: true })
  email: string;

  hashPassword(this: DocumentType<User>, _password: string) {
    // logic to hash passwords
  }
}

import { DocumentType, getModelForClass, prop as Property } from '@typegoose/typegoose';
import { getSchemaOptions } from 'src/util/typegoose';
import { Field as GqlField, ObjectType as GqlType } from 'type-graphql';

export class User {
  readonly _id: string;

  @Property({ required: true })
  firstName: string;

  @Property({ required: false })
  lastName: string;

  @Property({ required: true })
  password: string;

  @Property({ required: true, unique: true })
  email: string;

  hashPassword(this: DocumentType<User>, _password: string) {
    // logic to hash passwords
  }
}

Alright, no need for adding interfaces for the model and documents. You could have an interface for model implementation, but it’s not necessary.

With Reflect, which is used internally by TypeGoose, we managed to skip the need for additional interfaces.

If we want to add custom validations and messages, TypeGoose allows us to do that too. The prop decorator offers almost all the things you can expect from a mongoose model schema definition.

@Property({ required: false, unique: true })

@Property({ required: false, unique: true })

Adding TypeGraphQL

Alright, TypeGoose has helped us with handling mongoose schema smoothly. But, we still need to define types for GraphQL. Also, we need to update the model types whenever we change our models.

Let’s add TypeGraphQL.

import { DocumentType, getModelForClass, prop as Property } from '@typegoose/typegoose';
import { getSchemaOptions } from 'src/util/typegoose';
import { Field as GqlField, ObjectType as GqlType } from 'type-graphql';

@GqlType()
export class User {
  @GqlField(_type => String)
  readonly _id: string;

  @GqlField(_type => String)
  @Property({ required: true })
  firstName: string;

  @GqlField(_type => String, { nullable: true })
  @Property({ required: false })
  lastName: string;

  @GqlField(_type => String)
  @Property({ required: true })
  password: string;

  @GqlField(_type => String)
  @Property({ required: true, unique: true })
  email: string;

  hashPassword(this: DocumentType<User>, _password: string) {
    // logic to hash passwords
  }
}

import { DocumentType, getModelForClass, prop as Property } from '@typegoose/typegoose';
import { getSchemaOptions } from 'src/util/typegoose';
import { Field as GqlField, ObjectType as GqlType } from 'type-graphql';

@GqlType()
export class User {
  @GqlField(_type => String)
  readonly _id: string;

  @GqlField(_type => String)
  @Property({ required: true })
  firstName: string;

  @GqlField(_type => String, { nullable: true })
  @Property({ required: false })
  lastName: string;

  @GqlField(_type => String)
  @Property({ required: true })
  password: string;

  @GqlField(_type => String)
  @Property({ required: true, unique: true })
  email: string;

  hashPassword(this: DocumentType<User>, _password: string) {
    // logic to hash passwords
  }
}

What we just did is use the same TypeScript user class to define the schema as well as its GraphQL type—pretty neat.

Because we have added TypeGraphQL, our resolvers no longer need extra interfaces. We can add input classes for parameter types. Consider common input types such as CreateInput, UpdateInput, and FilterInput.

import { Arg, Ctx, Mutation, Resolver } from 'type-graphql';
import { User } from './user.model';
import { UserService } from './user.service';

@Resolver(_of => User)

export class UserResolver {
  private __userService: UserService;

  constructor() {
    this.__userService = new UserService();    
  }

  @Mutation(_returns => User)
  async createUser(@Arg('data', type => UserCreateInput) data: UserCreateInput, @Ctx() ctx: any) {
   return this.__userService.create(data)
  }
}

import { Arg, Ctx, Mutation, Resolver } from 'type-graphql';
import { User } from './user.model';
import { UserService } from './user.service';

@Resolver(_of => User)

export class UserResolver {
  private __userService: UserService;

  constructor() {
    this.__userService = new UserService();    
  }

  @Mutation(_returns => User)
  async createUser(@Arg('data', type => UserCreateInput) data: UserCreateInput, @Ctx() ctx: any) {
   return this.__userService.create(data)
  }
}

You can learn more about the syntax and input definition in the official docs.

That’s it. We are ready with our setup, and we can now simply build a schema and pass it to the server entry point just like that. No need to import schema files and merge resolvers. Simply pass array of resolvers to buildSchema:

import {ApolloServer} from 'apollo-server'

import { resolvers } from './src/resolvers';
import { buildSchema }  from 'type-graphql';

const schema = buildSchema({
  resolvers,
});

const server = new ApolloServer({ schema, resolvers });

server.listen().then(({ url }) => {
  console.log(`🚀 Server ready at ${url}`);
});

import {ApolloServer} from 'apollo-server'

import { resolvers } from './src/resolvers';
import { buildSchema }  from 'type-graphql';

const schema = buildSchema({
  resolvers,
});

const server = new ApolloServer({ schema, resolvers });

server.listen().then(({ url }) => {
  console.log(`🚀 Server ready at ${url}`);
});

Once implemented, this is how our custom demo project architecture might look:

Limitations and Alternatives

Though these packages save some work for us, one may decide not to go for them since they use experimental features such as experimental decorators. However, the acceptance of these experimental features is growing.

TypeGoose:

Though TypeGoose offers a great extension to Mongoose, they’ve recently introduced some breaking changes. Upgrading from recent versions might be a risk. One alternative to TypeGoose for decorator-based schema definitions is TypeORM. Though, it currently has basic experimental support for MongoDB.

TypeGraphQL:

TypeGraphQL is a well-maintained library. There are other options, like Nest.js and graphql-schema-decorators, which supports decorators for GraphQL schema.

However, as Nest.js’s GraphQL support is more framework-oriented, it might be more than needed. The other one is not supported any longer. You can even integrate TypeGraphQL with Nest.js with some caveats.

Conclusion

Unsurprisingly, both of these libraries use experimental decorators API with Reflect Metadata. Reflect Metadata adds additional metadata support to the class and its members. The concept might look innovative but it’s nothing new. Languages like C# and Java support attributes or annotations that add metadata to types. With these added, it becomes handy to create and maintain well-typed applications.

One thing to note here would be—though the article introduces the benefits of using TypeGraphQL and TypeGoose together—it does not mean you can’t use them separately. Depending upon your requirements, you may use either of the tools or a combination of them.

This article covers a very basic setup for introduction of the mentioned technologies. You might want to learn more about advanced real-life needs with these tools and techniques from some of the articles mentioned below.

Unit Testing Data at Scale using Deequ and Apache Spark

Everyone knows the importance of knowledge and how critical it is to progress. In today’s world, data is knowledge. But that’s only when the data is “good” and correctly interpreted. Let’s focus on the “good” part. What do we really mean by “good data”?

Its definition can change from use case to use case but, in general terms, good data can be defined by its accuracy, legitimacy, reliability, consistency, completeness, and availability.

Bad data can lead to failures in production systems, unexpected outputs, and wrong inferences, leading to poor business decisions.

It’s important to have something in place that can tell us about the quality of the data we have, how close it is to our expectations, and whether we can rely on it.

This is basically the problem we’re trying to solve.

The Problem and the Potential Solutions

A manual approach to data quality testing is definitely one of the solutions and can work well.

We’ll need to write code for computing various statistical measures, running them manually on different columns, maybe draw some plots, and then conduct some spot checks to see if there’s something not right or unexpected. The overall process can get tedious and time-consuming if we need to do it on a daily basis.

Certain tools can make life easier for us, like:

In this blog, we’ll be focussing on Amazon Deequ.

Amazon Deequ

Amazon Deequ is an open-source tool developed and used at Amazon. It’s built on top of Apache Spark, so it’s great at handling big data. Deequ computes data quality metrics regularly, based on the checks and validations set, and generates relevant reports.

Deequ provides a lot of interesting features, and we’ll be discussing them in detail. Here’s a look at its main components:

Source: AWS

Prerequisites

Working with Deequ requires having Apache Spark up and running with Deequ as one of the dependencies.

As of this blog, the latest version of Deequ, 1.1.0, supports Spark 2.2.x to 2.4.x and Spark 3.0.x.

Sample Dataset

For learning more about Deequ and its features, we’ll be using an open-source IMDb dataset which has the following schema:

root
 |-- tconst: string (nullable = true)
 |-- titleType: string (nullable = true)
 |-- primaryTitle: string (nullable = true)
 |-- originalTitle: string (nullable = true)
 |-- isAdult: integer (nullable = true)
 |-- startYear: string (nullable = true)
 |-- endYear: string (nullable = true)
 |-- runtimeMinutes: string (nullable = true)
 |-- genres: string (nullable = true)
 |-- averageRating: double (nullable = true)
 |-- numVotes: integer (nullable = true)

root
 |-- tconst: string (nullable = true)
 |-- titleType: string (nullable = true)
 |-- primaryTitle: string (nullable = true)
 |-- originalTitle: string (nullable = true)
 |-- isAdult: integer (nullable = true)
 |-- startYear: string (nullable = true)
 |-- endYear: string (nullable = true)
 |-- runtimeMinutes: string (nullable = true)
 |-- genres: string (nullable = true)
 |-- averageRating: double (nullable = true)
 |-- numVotes: integer (nullable = true)

Here, tconst is the primary key, and the rest of the columns are pretty much self-explanatory.

Data Analysis and Validation

Before we start defining checks on the data, if we want to compute some basic stats on the dataset, Deequ provides us with an easy way to do that. They’re called metrics.

Deequ provides support for the following metrics:

ApproxCountDistinct, ApproxQuantile, ApproxQuantiles, Completeness, Compliance, Correlation, CountDistinct, DataType, Distance, Distinctness, Entropy, Histogram, Maximum, MaxLength, Mean, Minimum, MinLength, MutualInformation, PatternMatch, Size, StandardDeviation, Sum, UniqueValueRatio, Uniqueness

ApproxCountDistinct, ApproxQuantile, ApproxQuantiles, Completeness, Compliance, Correlation, CountDistinct, DataType, Distance, Distinctness, Entropy, Histogram, Maximum, MaxLength, Mean, Minimum, MinLength, MutualInformation, PatternMatch, Size, StandardDeviation, Sum, UniqueValueRatio, Uniqueness

Let’s go ahead and apply some metrics to our dataset.

val runAnalyzer: AnalyzerContext = { AnalysisRunner
  .onData(data)
  .addAnalyzer(Size())
  .addAnalyzer(Completeness("averageRating"))
  .addAnalyzer(Uniqueness("tconst"))
  .addAnalyzer(Mean("averageRating"))
  .addAnalyzer(StandardDeviation("averageRating"))
  .addAnalyzer(Compliance("top rating", "averageRating >= 7.0"))
  .addAnalyzer(Correlation("numVotes", "averageRating"))
  .addAnalyzer(Distinctness("tconst"))
  .addAnalyzer(Maximum("averageRating"))
  .addAnalyzer(Minimum("averageRating"))
  .run()
}

val metricsResult = successMetricsAsDataFrame(spark, runAnalyzer)
metricsResult.show(false)

val runAnalyzer: AnalyzerContext = { AnalysisRunner
  .onData(data)
  .addAnalyzer(Size())
  .addAnalyzer(Completeness("averageRating"))
  .addAnalyzer(Uniqueness("tconst"))
  .addAnalyzer(Mean("averageRating"))
  .addAnalyzer(StandardDeviation("averageRating"))
  .addAnalyzer(Compliance("top rating", "averageRating >= 7.0"))
  .addAnalyzer(Correlation("numVotes", "averageRating"))
  .addAnalyzer(Distinctness("tconst"))
  .addAnalyzer(Maximum("averageRating"))
  .addAnalyzer(Minimum("averageRating"))
  .run()
}

val metricsResult = successMetricsAsDataFrame(spark, runAnalyzer)
metricsResult.show(false)

We get the following output by running the code above:

+-----------+----------------------+-----------------+--------------------+
|entity     |instance              |name             |value               |
+-----------+----------------------+-----------------+--------------------+
|Mutlicolumn|numVotes,averageRating|Correlation      |0.013454113877394851|
|Column     |tconst                |Uniqueness       |1.0                 |
|Column     |tconst                |Distinctness     |1.0                 |
|Dataset    |*                     |Size             |7339583.0           |
|Column     |averageRating         |Completeness     |0.14858528066240276 |
|Column     |averageRating         |Mean             |6.886130810579155   |
|Column     |averageRating         |StandardDeviation|1.3982924856469208  |
|Column     |averageRating         |Maximum          |10.0                |
|Column     |averageRating         |Minimum          |1.0                 |
|Column     |top rating            |Compliance       |0.080230443609671   |
+-----------+----------------------+-----------------+--------------------+

+-----------+----------------------+-----------------+--------------------+
|entity     |instance              |name             |value               |
+-----------+----------------------+-----------------+--------------------+
|Mutlicolumn|numVotes,averageRating|Correlation      |0.013454113877394851|
|Column     |tconst                |Uniqueness       |1.0                 |
|Column     |tconst                |Distinctness     |1.0                 |
|Dataset    |*                     |Size             |7339583.0           |
|Column     |averageRating         |Completeness     |0.14858528066240276 |
|Column     |averageRating         |Mean             |6.886130810579155   |
|Column     |averageRating         |StandardDeviation|1.3982924856469208  |
|Column     |averageRating         |Maximum          |10.0                |
|Column     |averageRating         |Minimum          |1.0                 |
|Column     |top rating            |Compliance       |0.080230443609671   |
+-----------+----------------------+-----------------+--------------------+

Let’s try to quickly understand what this tells us.

The dataset has 7,339,583 rows.
The distinctness and uniqueness of the tconst column is 1.0, which means that all the values in the column are distinct and unique, which should be expected as it’s the primary key column.
The averageRating column has a min of 1 and a max of 10 with a mean of 6.88 and a standard deviation of 1.39, which tells us about the variation in the average rating values across the data.
The completeness of the averageRating column is 0.148, which tells us that we have an average rating available for around 15% of the dataset’s records.
Then, we tried to see if there’s any correlation between the numVotes and averageRating column. This metric calculates the Pearson correlation coefficient, which has a value of 0.01, meaning there’s no correlation between the two columns, which is expected.

This feature of Deequ can be really helpful if we want to quickly do some basic analysis on a dataset.

Let’s move on to defining and running tests and checks on the data.

Data Validation

For writing tests for our dataset, we use Deequ’s VerificationSuite and add checks on attributes of the dataset.

Deequ has a big handy list of validators available to use, which are:

hasSize, isComplete, hasCompleteness, isUnique, isPrimaryKey, hasUniqueness, hasDistinctness, hasUniqueValueRatio, hasNumberOfDistinctValues, hasHistogramValues, hasEntropy, hasMutualInformation, hasApproxQuantile, hasMinLength, hasMaxLength, hasMin, hasMax, hasMean, hasSum, hasStandardDeviation, hasApproxCountDistinct, hasCorrelation, satisfies, hasPattern, containsCreditCardNumber, containsEmail, containsURL, containsSocialSecurityNumber, hasDataType, isNonNegative, isPositive, isLessThan, isLessThanOrEqualTo, isGreaterThan, isGreaterThanOrEqualTo, isContainedIn

hasSize, isComplete, hasCompleteness, isUnique, isPrimaryKey, hasUniqueness, hasDistinctness, hasUniqueValueRatio, hasNumberOfDistinctValues, hasHistogramValues, hasEntropy, hasMutualInformation, hasApproxQuantile, hasMinLength, hasMaxLength, hasMin, hasMax, hasMean, hasSum, hasStandardDeviation, hasApproxCountDistinct, hasCorrelation, satisfies, hasPattern, containsCreditCardNumber, containsEmail, containsURL, containsSocialSecurityNumber, hasDataType, isNonNegative, isPositive, isLessThan, isLessThanOrEqualTo, isGreaterThan, isGreaterThanOrEqualTo, isContainedIn

Let’s apply some checks to our dataset.

val validationResult: VerificationResult = { VerificationSuite()
  .onData(data)
  .addCheck(
    Check(CheckLevel.Error, "Review Check") 
      .hasSize(_ >= 100000) // check if the data has atleast 100k records
      .hasMin("averageRating", _ > 0.0) // min rating should not be less than 0
      .hasMax("averageRating", _ < 9.0) // max rating should not be greater than 9
      .containsURL("titleType") // verify that titleType column has URLs
      .isComplete("primaryTitle") // primaryTitle should never be NULL
      .isNonNegative("numVotes") // should not contain negative values
      .isPrimaryKey("tconst") // verify that tconst is the primary key column
      .hasDataType("isAdult", ConstrainableDataTypes.Integral) 
      //column contains Integer values only, expected as values this col has are 0 or 1
      )
  .run()
}

val results = checkResultsAsDataFrame(spark, validationResult)
results.select("constraint","constraint_status","constraint_message").show(false)

val validationResult: VerificationResult = { VerificationSuite()
  .onData(data)
  .addCheck(
    Check(CheckLevel.Error, "Review Check") 
      .hasSize(_ >= 100000) // check if the data has atleast 100k records
      .hasMin("averageRating", _ > 0.0) // min rating should not be less than 0
      .hasMax("averageRating", _ < 9.0) // max rating should not be greater than 9
      .containsURL("titleType") // verify that titleType column has URLs
      .isComplete("primaryTitle") // primaryTitle should never be NULL
      .isNonNegative("numVotes") // should not contain negative values
      .isPrimaryKey("tconst") // verify that tconst is the primary key column
      .hasDataType("isAdult", ConstrainableDataTypes.Integral) 
      //column contains Integer values only, expected as values this col has are 0 or 1
      )
  .run()
}

val results = checkResultsAsDataFrame(spark, validationResult)
results.select("constraint","constraint_status","constraint_message").show(false)

We have added some checks to our dataset, and the details about the check can be seen as comments in the above code.

We expect all checks to pass for our dataset except the containsURL and hasMax ones.

That’s because the titleType column doesn’t have URLs, and we know that the max rating is 10.0, but we are checking against 9.0.

We can see the output below:

+--------------------------------------------------------------------------------------------+-----------------+-----------------------------------------------------+
|constraint                                                                                  |constraint_status|constraint_message                                   |
+--------------------------------------------------------------------------------------------+-----------------+-----------------------------------------------------+
|SizeConstraint(Size(None))                                                                  |Success          |                                                     |
|MinimumConstraint(Minimum(averageRating,None))                                              |Success          |                                                     |
|MaximumConstraint(Maximum(averageRating,None))                                              |Failure          |Value: 10.0 does not meet the constraint requirement!|
|containsURL(titleType)                                                                      |Failure          |Value: 0.0 does not meet the constraint requirement! |
|CompletenessConstraint(Completeness(primaryTitle,None))                                     |Success          |                                                     |
|ComplianceConstraint(Compliance(numVotes is non-negative,COALESCE(numVotes, 0.0) >= 0,None))|Success          |                                                     |
|UniquenessConstraint(Uniqueness(List(tconst),None))                                         |Success          |                                                     |
|AnalysisBasedConstraint(DataType(isAdult,None),<function1>,Some(<function1>),None)          |Success          |                                                     |
+--------------------------------------------------------------------------------------------+-----------------+-----------------------------------------------------+
view raw

+--------------------------------------------------------------------------------------------+-----------------+-----------------------------------------------------+
|constraint                                                                                  |constraint_status|constraint_message                                   |
+--------------------------------------------------------------------------------------------+-----------------+-----------------------------------------------------+
|SizeConstraint(Size(None))                                                                  |Success          |                                                     |
|MinimumConstraint(Minimum(averageRating,None))                                              |Success          |                                                     |
|MaximumConstraint(Maximum(averageRating,None))                                              |Failure          |Value: 10.0 does not meet the constraint requirement!|
|containsURL(titleType)                                                                      |Failure          |Value: 0.0 does not meet the constraint requirement! |
|CompletenessConstraint(Completeness(primaryTitle,None))                                     |Success          |                                                     |
|ComplianceConstraint(Compliance(numVotes is non-negative,COALESCE(numVotes, 0.0) >= 0,None))|Success          |                                                     |
|UniquenessConstraint(Uniqueness(List(tconst),None))                                         |Success          |                                                     |
|AnalysisBasedConstraint(DataType(isAdult,None),<function1>,Some(<function1>),None)          |Success          |                                                     |
+--------------------------------------------------------------------------------------------+-----------------+-----------------------------------------------------+
view raw

In order to perform these checks, behind the scenes, Deequ calculated metrics that we saw in the previous section.

To look at the metrics Deequ computed for the checks we defined, we can use:

VerificationResult.successMetricsAsDataFrame(spark,validationResult)
                  .show(truncate=false)

VerificationResult.successMetricsAsDataFrame(spark,validationResult)
                  .show(truncate=false)

Automated Constraint Suggestion

Automated constraint suggestion is a really interesting and useful feature provided by Deequ.

Adding validation checks on a dataset with hundreds of columns or on a large number of datasets can be challenging. With this feature, Deequ tries to make our task easier. Deequ analyses the data distribution and, based on that, suggests potential useful constraints that can be used as validation checks.

Let’s see how this works.

This piece of code can automatically generate constraint suggestions for us:

val constraintResult = { ConstraintSuggestionRunner()
  .onData(data)
  .addConstraintRules(Rules.DEFAULT)
  .run()
}

val suggestionsDF = constraintResult.constraintSuggestions.flatMap { 
  case (column, suggestions) => 
    suggestions.map { constraint =>
      (column, constraint.description, constraint.codeForConstraint)
    } 
}.toSeq.toDS()

suggestionsDF.select("_1","_2").show(false)

val constraintResult = { ConstraintSuggestionRunner()
  .onData(data)
  .addConstraintRules(Rules.DEFAULT)
  .run()
}

val suggestionsDF = constraintResult.constraintSuggestions.flatMap { 
  case (column, suggestions) => 
    suggestions.map { constraint =>
      (column, constraint.description, constraint.codeForConstraint)
    } 
}.toSeq.toDS()

suggestionsDF.select("_1","_2").show(false)

Let’s look at constraint suggestions generated by Deequ:

+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|runtimeMinutes|'runtimeMinutes' has less than 72% missing values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
|tconst        |'tconst' is not null                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|titleType     |'titleType' is not null                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|titleType     |'titleType' has value range 'tvEpisode', 'short', 'movie', 'video', 'tvSeries', 'tvMovie', 'tvMiniSeries', 'tvSpecial', 'videoGame', 'tvShort'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|titleType     |'titleType' has value range 'tvEpisode', 'short', 'movie' for at least 90.0% of values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|averageRating |'averageRating' has no negative values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|originalTitle |'originalTitle' is not null                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|startYear     |'startYear' has less than 9% missing values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|startYear     |'startYear' has type Integral                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|startYear     |'startYear' has no negative values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|endYear       |'endYear' has type Integral  
|endYear       |'endYear' has value range '2017', '2018', '2019', '2016', '2015', '2020', '2014', '2013', '2012', '2011', '2010',......|
|endYear       |'endYear' has value range '' for at least 99.0% of values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|endYear       |'endYear' has no negative values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|numVotes      |'numVotes' has no negative values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
|primaryTitle  |'primaryTitle' is not null                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|isAdult       |'isAdult' is not null                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|isAdult       |'isAdult' has no negative values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|genres        |'genres' has less than 7% missing values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|runtimeMinutes|'runtimeMinutes' has less than 72% missing values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
|tconst        |'tconst' is not null                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|titleType     |'titleType' is not null                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|titleType     |'titleType' has value range 'tvEpisode', 'short', 'movie', 'video', 'tvSeries', 'tvMovie', 'tvMiniSeries', 'tvSpecial', 'videoGame', 'tvShort'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|titleType     |'titleType' has value range 'tvEpisode', 'short', 'movie' for at least 90.0% of values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|averageRating |'averageRating' has no negative values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|originalTitle |'originalTitle' is not null                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|startYear     |'startYear' has less than 9% missing values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|startYear     |'startYear' has type Integral                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|startYear     |'startYear' has no negative values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|endYear       |'endYear' has type Integral  
|endYear       |'endYear' has value range '2017', '2018', '2019', '2016', '2015', '2020', '2014', '2013', '2012', '2011', '2010',......|
|endYear       |'endYear' has value range '' for at least 99.0% of values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|endYear       |'endYear' has no negative values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|numVotes      |'numVotes' has no negative values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
|primaryTitle  |'primaryTitle' is not null                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|isAdult       |'isAdult' is not null                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|isAdult       |'isAdult' has no negative values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|genres        |'genres' has less than 7% missing values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

We shouldn’t expect the constraint suggestions generated by Deequ to always make sense. They should always be verified before using.

This is because the algorithm that generates the constraint suggestions just works on the data distribution and isn’t exactly “intelligent.”

We can see that most of the suggestions generated make sense even though they might be really trivial.

For the endYear column, one of the suggestions is that endYear should be contained in a list of years, which indeed is true for our dataset. However, it can’t be generalized as every passing year, the value for endYear continues to increase.

But on the other hand, the suggestion that titleType can take the following values: ‘tvEpisode,’ ‘short,’ ‘movie,’ ‘video,’ ‘tvSeries,’ ‘tvMovie,’ ‘tvMiniSeries,’ ‘tvSpecial,’ ‘videoGame,’ and ‘tvShort’ makes sense and can be generalized, which makes it a great suggestion.

And this is why we should not blindly use the constraints suggested by Deequ and always cross-check them.

Something we can do to improve the constraint suggestions is to use the useTrainTestSplitWithTestsetRatio method in ConstraintSuggestionRunner.
It makes a lot of sense to use this on large datasets.

How does this work? If we use the config useTrainTestSplitWithTestsetRatio(0.1), Deequ would compute constraint suggestions on 90% of the data and evaluate the suggested constraints on the remaining 10%, which would improve the quality of the suggested constraints.

Anomaly Detection

Deequ also supports anomaly detection for data quality metrics.

The idea behind Deequ’s anomaly detection is that often we have a sense of how much change in certain metrics of our data can be expected. Say we are getting new data every day, and we know that the number of records we get on a daily basis are around 8 to 12k. On a random day, if we get 40k records, we know something went wrong with the data ingestion job or some other job didn’t go right.

Deequ will regularly store the metrics of our data in a MetricsRepository. Once that’s done, anomaly detection checks can be run. These compare the current values of the metrics to the historical values stored in the MetricsRepository, and that helps Deequ to detect anomalous changes that are a red flag.

One of Deequ’s anomaly detection strategies is the RateOfChangeStrategy, which limits the maximum change in the metrics by some numerical factor that can be passed as a parameter.

Deequ supports other strategies that can be found here. And code examples for anomaly detection can be found here.

Conclusion

We learned about the main features and capabilities of AWS Lab’s Deequ.

It might feel a little daunting to people unfamiliar with Scala or Spark, but using Deequ is very easy and straightforward. Someone with a basic understanding of Scala or Spark should be able to work with Deequ’s primary features without any friction.

For someone who rarely deals with data quality checks, manual test runs might be a good enough option. However, for someone dealing with new datasets frequently, as in multiple times in a day or a week, using a tool like Deequ to perform automated data quality testing makes a lot of sense in terms of time and effort.

We hope this article helped you get a deep dive into data quality testing and using Deequ for these types of engineering practices.

December 12, 2022

Test Automation in React Native apps using Appium and WebdriverIO

React Native provides a mobile app development experience without sacrificing user experience or visual performance. And when it comes to mobile app UI testing, Appium is a great way to test indigenous React Native apps out of the box. Creating native apps from the same code and being able to do it using JavaScript has made Appium popular. Apart from this, businesses are attracted by the fact that they can save a lot of money by using this application development framework.

In this blog, we are going to cover how to add automated tests for React native apps using Appium & WebdriverIO with a Node.js framework.

What are React Native Apps

‍React Native is an open-source framework for building Android and iOS apps using React and local app capabilities. With React Native, you can use JavaScript to access the APIs on your platform and define the look and behavior of your UI using React components: lots of usable, non-compact code. In the development of Android and iOS apps, “viewing” is the basic building block of a UI: this small rectangular object on the screen can be used to display text, photos, or user input. Even the smallest detail of an app, such as a text line or a button, is a kind of view. Some views may contain other views.

What is Appium

‍Appium is an open-source tool for traditional automation, web, and hybrid apps on iOS, Android, and Windows desktop mobile platforms. Indigenous apps are those written using iOS and Android. Mobile web applications are accessed using a mobile browser (Appium supports Safari for iOS apps and Chrome or the built-in ‘Browser’ for Android apps). Hybrid apps have a wrapper around “web view”—a traditional controller that allows you to interact with web content. Projects like Apache Cordova make it easy to build applications using web technology and integrate it into a traditional wrapper, creating a hybrid application.

Importantly, Appium is “cross-platform”, allowing you to write tests against multiple platforms (iOS, Android), using the same API. This enables code usage between iOS, Android, and Windows test suites. It runs on iOS and Android applications using the WebDriver protocol.

What is WebDriverIO

‍WebdriverIO is a next-gen browser and Node.js automated mobile testing framework. It allows you to customize any application written with modern web frameworks for mobile devices or browsers, such as React, Angular, Polymeror, and Vue.js.

WebdriverIO is a widely used test automation framework in JavaScript. It has various features like it supports many reports and services, Test Frameworks, and WDIO CLI Test Runners

The following are examples of supported services:

Appium Service
Devtools Service
Firefox Profile Service
Selenium Standalone Service
Shared Store Service
Static Server Service
ChromeDriver Service
Report Portal Service
Docker Service

The followings are supported by the test framework:

Mocha
Jasmine
Cucumber

Key features of Appium & WebdriverIO

Appium

Does not require application source code or library
Provides a strong and active community
Has multi-platform support, i.e., it can run the same test cases on multiple platforms
Allows the parallel execution of test scripts
In Appium, a small change does not require reinstallation of the application
Supports various languages like C#, Python, Java, Ruby, PHP, JavaScript with node.js, and many others that have a Selenium client library

WebdriverIO

Extendable
Compatible
Feature-rich
Supports modern web and mobile frameworks
Runs automation tests both for web applications as well as native mobile apps.
Simple and easy syntax
Integrates tests to third-party tools such as Appium
‘Wdio setup wizard’ makes the setup simple and easy
Integrated test runner

Installation & Configuration

Install the latest stable version of Android Studio from https://developer.android.com/studio
Install android-platform-tools from CLI
Install JDK latest stable version from here https://www.oracle.com/java/technologies/javase-jdk16-downloads.html
Download the latest stable version of VS Code from https://code.visualstudio.com/download
Install the latest version of Allure for Report Generation from https://docs.qameta.io/allure/
Download and install the latest LTS Node.js – https://nodejs.org/en/download
Open Terminal
Create Project Directory

$ mkdir Demo_Appium_Project

$ mkdir Demo_Appium_Project

Create a sample Appium Project

$ npm init
$ package name: (demo_appium_project) demo_appium_test
$ version: (1.0.0) 1.0.0
$ description: demo_appium_practice
$ entry point: (index.js) index.js
$ test command: "./node_modules/.bin/wdio wdio.conf.js"
$ git repository: 
$ keywords: 
$ author: Pushkar
$ license: (ISC) ISC

$ npm init
$ package name: (demo_appium_project) demo_appium_test
$ version: (1.0.0) 1.0.0
$ description: demo_appium_practice
$ entry point: (index.js) index.js
$ test command: "./node_modules/.bin/wdio wdio.conf.js"
$ git repository: 
$ keywords: 
$ author: Pushkar
$ license: (ISC) ISC

This will also create a package.json file for test settings and project dependencies.

Install node packages

$ npm install

$ npm install

Install Appium through npm or as a standalone app.

$ npm install -g appium or npm install --save appium

$ npm install -g appium or npm install --save appium

Appium Desktop version can be downloaded from here https://github.com/appium/appium-desktop/releases/
Install WebdriverIO

$ npm install -g webdriverio or npm install --save-dev webdriverio @wdio/cli

$ npm install -g webdriverio or npm install --save-dev webdriverio @wdio/cli

Install Chai Assertion library

$ npm install -g chai or npm install --save chai

$ npm install -g chai or npm install --save chai

Make sure you have following versions installed:

$ node --version - v.14.17.0
$ npm --version - 7.17.0
$ appium --version - 1.21.0
$ java --version - java 16.0.1
$ allure --version - 2.14.0

$ node --version - v.14.17.0
$ npm --version - 7.17.0
$ appium --version - 1.21.0
$ java --version - java 16.0.1
$ allure --version - 2.14.0

WebdriverIO Configuration

The web driver configuration file must be created to apply the configuration during the test Generate command below project:

$ npx wdio config

$ npx wdio config

With the following series of questions, install the required dependencies,

$ Where is your automation backend located? - On my local machine
$ Which framework do you want to use? - mocha	
$ Do you want to use a compiler? No!
$ Where are your test specs located? - ./test/specs/**/*.js
$ Do you want WebdriverIO to autogenerate some test files? - Yes
$ Do you want to use page objects (https://martinfowler.com/bliki/PageObject.html)? - No
$ Which reporter do you want to use? - Allure
$ Do you want to add a service to your test setup? - No
$ What is the base url? - http://localhost

$ Where is your automation backend located? - On my local machine
$ Which framework do you want to use? - mocha	
$ Do you want to use a compiler? No!
$ Where are your test specs located? - ./test/specs/**/*.js
$ Do you want WebdriverIO to autogenerate some test files? - Yes
$ Do you want to use page objects (https://martinfowler.com/bliki/PageObject.html)? - No
$ Which reporter do you want to use? - Allure
$ Do you want to add a service to your test setup? - No
$ What is the base url? - http://localhost

This is how wdio.conf.js looks:

exports.config = {
 port: 4724,
 path: '/wd/hub/',
 runner: 'local',
 specs: ['./test/specs/*.js'],
 maxInstances: 1,
 capabilities: [
   {
     platformName: 'Android',
     platformVersion: '11',
     appPackage: 'com.facebook.katana',
     appActivity: 'com.facebook.katana.LoginActivity',
     automationName: 'UiAutomator2'
   }
 ],
 services: [
   [
     'appium',
     {
       args: {
         relaxedSecurity: true
        },
       command: 'appium'
     }
   ]
 ],
 logLevel: 'debug',
 bail: 0,
 baseUrl: 'http://localhost',
 waitforTimeout: 10000,
 connectionRetryTimeout: 90000,
 connectionRetryCount: 3,
 framework: 'mocha',
 reporters: [
   [
     'allure',
     {
       outputDir: 'allure-results',
       disableWebdriverStepsReporting: true,
       disableWebdriverScreenshotsReporting: false
     }
   ]
 ],
 mochaOpts: {
   ui: 'bdd',
   timeout: 60000
 },
 afterTest: function(test, context, { error, result, duration, passed, retries }) {
   if (!passed) {
       browser.takeScreenshot();
   }
 }
}
view raw

exports.config = {
 port: 4724,
 path: '/wd/hub/',
 runner: 'local',
 specs: ['./test/specs/*.js'],
 maxInstances: 1,
 capabilities: [
   {
     platformName: 'Android',
     platformVersion: '11',
     appPackage: 'com.facebook.katana',
     appActivity: 'com.facebook.katana.LoginActivity',
     automationName: 'UiAutomator2'
   }
 ],
 services: [
   [
     'appium',
     {
       args: {
         relaxedSecurity: true
        },
       command: 'appium'
     }
   ]
 ],
 logLevel: 'debug',
 bail: 0,
 baseUrl: 'http://localhost',
 waitforTimeout: 10000,
 connectionRetryTimeout: 90000,
 connectionRetryCount: 3,
 framework: 'mocha',
 reporters: [
   [
     'allure',
     {
       outputDir: 'allure-results',
       disableWebdriverStepsReporting: true,
       disableWebdriverScreenshotsReporting: false
     }
   ]
 ],
 mochaOpts: {
   ui: 'bdd',
   timeout: 60000
 },
 afterTest: function(test, context, { error, result, duration, passed, retries }) {
   if (!passed) {
       browser.takeScreenshot();
   }
 }
}
view raw

For iOS Automation, just add the following capabilities in wdio.conf.js & the Appium Configuration:

{
  "platformName": "IOS",
  "platformVersion": "14.5",
  "app": "/Your_PATH/wdioNativeDemoApp.app",
  "deviceName": "iPhone 12 Pro Max"
}

{
  "platformName": "IOS",
  "platformVersion": "14.5",
  "app": "/Your_PATH/wdioNativeDemoApp.app",
  "deviceName": "iPhone 12 Pro Max"
}

Launch the iOS Simulator from Xcode

Install Appium Doctor for iOS by using following command:

npm install -g appium-doctor

npm install -g appium-doctor

This is how package.json will look:

{
 "name": "demo_appium_test",
 "version": "1.0.0",
 "description": "demo_appium_practice",
 "main": "index.js",
 "scripts": {
   "test": "./node_modules/.bin/wdio wdio.conf.js"
 },
 "author": "Pushkar",
 "license": "ISC",
 "dependencies": {
   "@wdio/sync": "^7.7.4",
   "appium": "^1.21.0",
   "chai": "^4.3.4",
   "webdriverio": "^7.7.4"
 },
 "devDependencies": {
   "@wdio/allure-reporter": "^7.7.3",
   "@wdio/appium-service": "^7.7.3",
   "@wdio/cli": "^7.7.4",
   "@wdio/local-runner": "^7.7.4",
   "@wdio/mocha-framework": "^7.7.4",
   "@wdio/selenium-standalone-service": "^7.7.4"
 }
}

{
 "name": "demo_appium_test",
 "version": "1.0.0",
 "description": "demo_appium_practice",
 "main": "index.js",
 "scripts": {
   "test": "./node_modules/.bin/wdio wdio.conf.js"
 },
 "author": "Pushkar",
 "license": "ISC",
 "dependencies": {
   "@wdio/sync": "^7.7.4",
   "appium": "^1.21.0",
   "chai": "^4.3.4",
   "webdriverio": "^7.7.4"
 },
 "devDependencies": {
   "@wdio/allure-reporter": "^7.7.3",
   "@wdio/appium-service": "^7.7.3",
   "@wdio/cli": "^7.7.4",
   "@wdio/local-runner": "^7.7.4",
   "@wdio/mocha-framework": "^7.7.4",
   "@wdio/selenium-standalone-service": "^7.7.4"
 }
}

Steps to follow if npm legacy peer deeps problem occurred:

npm install --save --legacy-peer-deps
npm config set legacy-peer-deps true
npm i --legacy-peer-deps
npm config set legacy-peer-deps true
npm cache clean --force

npm install --save --legacy-peer-deps
npm config set legacy-peer-deps true
npm i --legacy-peer-deps
npm config set legacy-peer-deps true
npm cache clean --force

This is how the folder structure will look in Appium with the WebDriverIO Framework:

Step-by-Step Configuration of Android Emulator using Android Studio

‍

‍

‍Appium Desktop Configuration

Setup of ANDROID_HOME + ANDROID_SDK_ROOT & JAVA_HOME

Follow these steps for setting up ANDROID_HOME:

vi ~/.bash_profile
Add following 
export ANDROID_HOME=/Users/pushkar/android-sdk 
export PATH=$PATH:$ANDROID_HOME/platform-tools 
export PATH=$PATH:$ANDROID_HOME/tools 
export PATH=$PATH:$ANDROID_HOME/tools/bin 
export PATH=$PATH:$ANDROID_HOME/emulator
Save ~/.bash_profile 
source ~/.bash_profile 
echo $ANDROID_HOME
/Users/pushkar/Library/Android/sdk

vi ~/.bash_profile
Add following 
export ANDROID_HOME=/Users/pushkar/android-sdk 
export PATH=$PATH:$ANDROID_HOME/platform-tools 
export PATH=$PATH:$ANDROID_HOME/tools 
export PATH=$PATH:$ANDROID_HOME/tools/bin 
export PATH=$PATH:$ANDROID_HOME/emulator
Save ~/.bash_profile 
source ~/.bash_profile 
echo $ANDROID_HOME
/Users/pushkar/Library/Android/sdk

Follow these steps for setting up ANDROID_SDK_ROOT:

vi ~/.bash_profile
Add following 
export ANDROID_HOME=/Users/pushkar/Android/sdk
export ANDROID_SDK_ROOT=/Users/pushkar/Android/sdk
export ANDROID_AVD_HOME=/Users/pushkar/.android/avd
Save ~/.bash_profile 
source ~/.bash_profile 
echo $ANDROID_SDK_ROOT
/Users/pushkar/Library/Android/sdk

vi ~/.bash_profile
Add following 
export ANDROID_HOME=/Users/pushkar/Android/sdk
export ANDROID_SDK_ROOT=/Users/pushkar/Android/sdk
export ANDROID_AVD_HOME=/Users/pushkar/.android/avd
Save ~/.bash_profile 
source ~/.bash_profile 
echo $ANDROID_SDK_ROOT
/Users/pushkar/Library/Android/sdk

Follow these steps for setting up JAVA_HOME:

java --version
vi ~/.bash_profile
Add following 
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home.
echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home

java --version
vi ~/.bash_profile
Add following 
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home.
echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home

‍

Note – Make sure you need to install the app from Google Play Store.

Fig: – Android Emulator with Facebook React Native Mobile App

Fig:- Locating Elements using Appium Inspector

How to write E2E React Native Mobile App Tests

‍Here is an example of how to write E2E test in Appium:

Positive Testing Scenario – Validate Login & Nav Bar

Open Facebook React Native App
Enter valid email and password
Click on Login
Users should be able to login into Facebook

Negative Testing Scenario – Invalid Login

Open Facebook React Native App
Enter invalid email and password
Click on login
Users should not be able to login after receiving an “Incorrect Password” message popup

Negative Testing Scenario – Invalid Element

Open Facebook React Native App
Enter invalid email and password
Click on login
Provide invalid element to capture message

Make sure test_script should be under test/specs folder

var expect = require('chai').expect

beforeEach(() => {
 driver.launchApp()
})

afterEach(() => {
 driver.closeApp()
})

describe('Verify Login Scenarios on Facebook React Native Mobile App', () => {
 it('User should be able to login using valid credentials to Facebook Mobile App', () => {   
   $(`~Username`).waitForDisplayed(20000)
   $(`~Username`).setValue('Valid-Email')
   $(`~Password`).waitForDisplayed(20000)
   $(`~Password`).setValue('Valid-Password')
   $('~Log In').click()
   browser.pause(10000)
 })

 it('User should not be able to login with invalid credentials to Facebook Mobile App', () => {
   $(`~Username`).waitForDisplayed(20000)
   $(`~Username`).setValue('Invalid-Email')
   $(`~Password`).waitForDisplayed(20000)
   $(`~Password`).setValue('Invalid-Password')   
   $('~Log In').click()
   $(
       '//android.widget.TextView[@resource-id="com.facebook.katana:id/(name removed)"]'
     )
     .waitForDisplayed(11000)
   const status = $(
     '//android.widget.TextView[@resource-id="com.facebook.katana:id/(name removed)"]'
   ).getText()
   expect(status).to.equal(
     `You Can't Use This Feature Right Now`     
   )
 })

 it('Test Case should Fail Because of Invalid Element', () => {
   $(`~Username`).waitForDisplayed(20000)
   $(`~Username`).setValue('Invalid-Email')
   $(`~Password`).waitForDisplayed(20000)
   $(`~Password`).setValue('Invalid-Pasword')   
   $('~Log In').click()
   $(
       '//android.widget.TextView[@resource-id="com.facebook.katana:id/(name removed)"'
     )
     .waitForDisplayed(11000)
   const status = $(
     '//android.widget.TextView[@resource-id="com.facebook.katana"'
   ).getText()
   expect(status).to.equal(
     `You Can't Use This Feature Right Now`     
   )
 })

})

var expect = require('chai').expect

beforeEach(() => {
 driver.launchApp()
})

afterEach(() => {
 driver.closeApp()
})

describe('Verify Login Scenarios on Facebook React Native Mobile App', () => {
 it('User should be able to login using valid credentials to Facebook Mobile App', () => {   
   $(`~Username`).waitForDisplayed(20000)
   $(`~Username`).setValue('Valid-Email')
   $(`~Password`).waitForDisplayed(20000)
   $(`~Password`).setValue('Valid-Password')
   $('~Log In').click()
   browser.pause(10000)
 })

 it('User should not be able to login with invalid credentials to Facebook Mobile App', () => {
   $(`~Username`).waitForDisplayed(20000)
   $(`~Username`).setValue('Invalid-Email')
   $(`~Password`).waitForDisplayed(20000)
   $(`~Password`).setValue('Invalid-Password')   
   $('~Log In').click()
   $(
       '//android.widget.TextView[@resource-id="com.facebook.katana:id/(name removed)"]'
     )
     .waitForDisplayed(11000)
   const status = $(
     '//android.widget.TextView[@resource-id="com.facebook.katana:id/(name removed)"]'
   ).getText()
   expect(status).to.equal(
     `You Can't Use This Feature Right Now`     
   )
 })

 it('Test Case should Fail Because of Invalid Element', () => {
   $(`~Username`).waitForDisplayed(20000)
   $(`~Username`).setValue('Invalid-Email')
   $(`~Password`).waitForDisplayed(20000)
   $(`~Password`).setValue('Invalid-Pasword')   
   $('~Log In').click()
   $(
       '//android.widget.TextView[@resource-id="com.facebook.katana:id/(name removed)"'
     )
     .waitForDisplayed(11000)
   const status = $(
     '//android.widget.TextView[@resource-id="com.facebook.katana"'
   ).getText()
   expect(status).to.equal(
     `You Can't Use This Feature Right Now`     
   )
 })

})

How to Run Mobile Tests Scripts

$ npm test 
This will create a Results folder with .xml report

$ npm test 
This will create a Results folder with .xml report

Reporting

The following are examples of the supported reporters:

Allure Reporter
Concise Reporter
Dot Reporter
JUnit Reporter
Spec Reporter
Sumologic Reporter
Report Portal Reporter
Video Reporter
HTML Reporter
JSON Reporter
Mochawesome Reporter
Timeline Reporter
CucumberJS JSON Reporter

Here, we are using Allure Reporting. Allure Reporting in WebdriverIO is a plugin to create Allure Test Reports.

The easiest way is to keep @wdio/allure-reporter as a devDependency in your package.json with

$ npm install @wdio/allure-reporter --save-dev

$ npm install @wdio/allure-reporter --save-dev

Reporter options can be specified in the wdio.conf.js configuration file

reporters: [
   [
     'allure',
     {
       outputDir: 'allure-results',
       disableWebdriverStepsReporting: true,
       disableWebdriverScreenshotsReporting: false
     }
   ]
 ]

reporters: [
   [
     'allure',
     {
       outputDir: 'allure-results',
       disableWebdriverStepsReporting: true,
       disableWebdriverScreenshotsReporting: false
     }
   ]
 ]

To convert Allure .xml report to .html report, run the following command:

$ allure generate && allure open
Allure HTML report should be opened in browser

$ allure generate && allure open
Allure HTML report should be opened in browser

This is what Allure Reports look like:

Limitations with Appium & WebDriverIO

Appium

Android versions lower than 4.2 are not supported for testing
Limited support for hybrid app testing
Doesn’t support image comparison.

WebdriverIO

It has a custom implementation
It can be used for automating AngularJS apps, but it is not as customized as Protractor.

Conclusion

‍In the QA and developer ecosystem, using Appium to test React native applications is common. Appium makes it easy to record test cases on both Android and iOS platforms while working with React Native. Selenium, a basic web developer, acts as a bridge between Appium and mobile platforms for delivery and testing. Appium is a solid framework for automatic UI testing. This article explains that this framework is capable of conducting test cases quickly and reliably. Most importantly, it can test both Android and iOS apps developed by the React Native framework on the basis of a single code.

References

‍

December 12, 2022

Taking Amazon’s Elastic Kubernetes Service for a Spin

With the introduction of Elastic Kubernetes service at AWS re: Invent last year, AWS finally threw their hat in the ever booming space of managed Kubernetes services. In this blog post, we will learn the basic concepts of EKS, launch an EKS cluster and also deploy a multi-tier application on it.

What is Elastic Kubernetes service (EKS)?

Kubernetes works on a master-slave architecture. The master is also referred to as control plane. If the master goes down it brings our entire cluster down, thus ensuring high availability of master is absolutely critical as it can be a single point of failure. Ensuring high availability of master and managing all the worker nodes along with it becomes a cumbersome task in itself, thus it is most desirable for organizations to have managed Kubernetes cluster so that they can focus on the most important task which is to run their applications rather than managing the cluster. Other cloud providers like Google cloud and Azure already had their managed Kubernetes service named GKE and AKS respectively. Similarly now with EKS Amazon has also rolled out its managed Kubernetes cluster to provide a seamless way to run Kubernetes workloads.

Key EKS concepts:

EKS takes full advantage of the fact that it is running on AWS so instead of creating Kubernetes specific features from the scratch they have reused/plugged in the existing AWS services with EKS for achieving Kubernetes specific functionalities. Here is a brief overview:

IAM-integration: Amazon EKS integrates IAM authentication with Kubernetes RBAC ( role-based access control system native to Kubernetes) with the help of Heptio Authenticator which is a tool that uses AWS IAM credentials to authenticate to a Kubernetes cluster. Here we can directly attach an RBAC role with an IAM entity this saves the pain of managing another set of credentials at the cluster level.

Container Interface: AWS has developed an open source cni plugin which takes advantage of the fact that multiple network interfaces can be attached to a single EC2 instance and these interfaces can have multiple secondary private ips associated with them, these secondary ips are used to provide pods running on EKS with real ip address from VPC cidr pool. This improves the latency for inter pod communications as the traffic flows without any overlay.

ELB Support: We can use any of the AWS ELB offerings (classic, network, application) to route traffic to our service running on the working nodes.

Auto scaling: The number of worker nodes in the cluster can grow and shrink using the EC2 auto scaling service.

Route 53: With the help of the External DNS project and AWS route53 we can manage the DNS entries for the load balancers which get created when we create an ingress object in our EKS cluster or when we create a service of type LoadBalancer in our cluster. This way the DNS names are always in sync with the load balancers and we don’t have to give separate attention to it.

Shared responsibility for cluster: The responsibilities of an EKS cluster is shared between AWS and customer. AWS takes care of the most critical part of managing the control plane (api server and etcd database) and customers need to manage the worker node. Amazon EKS automatically runs Kubernetes with three masters across three Availability Zones to protect against a single point of failure, control plane nodes are also monitored and replaced if they fail, and are also patched and updated automatically this ensures high availability of the cluster and makes it extremely simple to migrate existing workloads to EKS.

Prerequisites for launching an EKS cluster:

1. IAM role to be assumed by the cluster: Create an IAM role that allows EKS to manage a cluster on your behalf. Choose EKS as the service which will assume this role and add AWS managed policies ‘AmazonEKSClusterPolicy’ and ‘AmazonEKSServicePolicy’ to it.

2. VPC for the cluster: We need to create the VPC where our cluster is going to reside. We need a VPC with subnets, internet gateways and other components configured. We can use an existing VPC for this if we wish or create one using the CloudFormation script provided by AWS here or use the Terraform script available here. The scripts take ‘cidr’ block of the VPC and three other subnets as arguments.

Launching an EKS cluster:

1. Using the web console: With the prerequisites in place now we can go to the EKS console and launch an EKS cluster when we try to launch an EKS cluster we need to provide a the name of the EKS cluster, choose the Kubernetes version to use, provide the IAM role we created in step one and also choose a VPC, once we choose a VPC we also need to select subnets from the VPC where we want our worker nodes to be launched by default all the subnets in the VPC are selected we also need to provide a security group which is applied to the elastic network interfaces (eni) that EKS creates to allow control plane communicate with the worker nodes.

NOTE: Couple of things to note here is that the subnets must be in at least two different availability zones and the security group that we provided is later updated when we create worker node cluster so it is better to not use this security group with any other entity or be completely sure of the changes happening to it.

2. Using awscli :‍

aws eks create-cluster --name eks-blog-cluster --role-arn arn:aws:iam::XXXXXXXXXXXX:role/eks-service-role  
--resources-vpc-config subnetIds=subnet-0b8da2094908e1b23,subnet-01a46af43b2c5e16c,securityGroupIds=sg-03fa0c02886c183d4

aws eks create-cluster --name eks-blog-cluster --role-arn arn:aws:iam::XXXXXXXXXXXX:role/eks-service-role  
--resources-vpc-config subnetIds=subnet-0b8da2094908e1b23,subnet-01a46af43b2c5e16c,securityGroupIds=sg-03fa0c02886c183d4

{
    "cluster": {
        "status": "CREATING",
        "name": "eks-blog-cluster",
        "certificateAuthority": {},
        "roleArn": "arn:aws:iam::XXXXXXXXXXXX:role/eks-service-role",
        "resourcesVpcConfig": {
            "subnetIds": [
                "subnet-0b8da2094908e1b23",
                "subnet-01a46af43b2c5e16c"
            ],
            "vpcId": "vpc-0364b5ed9f85e7ce1",
            "securityGroupIds": [
                "sg-03fa0c02886c183d4"
            ]
        },
        "version": "1.10",
        "arn": "arn:aws:eks:us-east-1:XXXXXXXXXXXX:cluster/eks-blog-cluster",
        "createdAt": 1535269577.147
    }
}

{
    "cluster": {
        "status": "CREATING",
        "name": "eks-blog-cluster",
        "certificateAuthority": {},
        "roleArn": "arn:aws:iam::XXXXXXXXXXXX:role/eks-service-role",
        "resourcesVpcConfig": {
            "subnetIds": [
                "subnet-0b8da2094908e1b23",
                "subnet-01a46af43b2c5e16c"
            ],
            "vpcId": "vpc-0364b5ed9f85e7ce1",
            "securityGroupIds": [
                "sg-03fa0c02886c183d4"
            ]
        },
        "version": "1.10",
        "arn": "arn:aws:eks:us-east-1:XXXXXXXXXXXX:cluster/eks-blog-cluster",
        "createdAt": 1535269577.147
    }
}

In the response, we see that the cluster is in creating state. It will take a few minutes before it is available. We can check the status using the below command:

aws eks describe-cluster --name=eks-blog-cluster

aws eks describe-cluster --name=eks-blog-cluster

Configure kubectl for EKS:

We know that in Kubernetes we interact with the control plane by making requests to the API server. The most common way to interact with the API server is via kubectl command line utility. As our cluster is ready now we need to install kubectl.

1. Install the kubectl binary

curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s 
https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl

curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s 
https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl

Give executable permission to the binary.

chmod +x ./kubectl

chmod +x ./kubectl

Move the kubectl binary to a folder in your system’s $PATH.

sudo cp ./kubectl /bin/kubectl && export PATH=$HOME/bin:$PATH

sudo cp ./kubectl /bin/kubectl && export PATH=$HOME/bin:$PATH

As discussed earlier EKS uses AWS IAM Authenticator for Kubernetes to allow IAM authentication for your Kubernetes cluster. So we need to download and install the same.

2. Install aws-iam-authenticator

curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/bin/linux/amd64/aws-iam-authenticator

curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/bin/linux/amd64/aws-iam-authenticator

Give executable permission to the binary

chmod +x ./aws-iam-authenticator

chmod +x ./aws-iam-authenticator

Move the aws-iam-authenticator binary to a folder in your system’s $PATH.

sudo cp ./aws-iam-authenticator /bin/aws-iam-authenticator

sudo cp ./aws-iam-authenticator /bin/aws-iam-authenticator

3. Create the kubeconfig file

First create the directory.

mkdir -p ~/.kube

mkdir -p ~/.kube

Open a config file in the folder created above

sudo vi .kube/config-eks-blog-cluster

sudo vi .kube/config-eks-blog-cluster

Paste the below code in the file

clusters:      
- cluster:       
server: https://DBFE36D09896EECAB426959C35FFCC47.sk1.us-east-1.eks.amazonaws.com        
certificate-authority-data: ”....................”        
name: kubernetes        
contexts:        
- context:             
cluster: kubernetes             
user: aws          
name: aws        
current-context: aws        
kind: Config       
preferences: {}        
users:           
- name: aws            
user:                
exec:                    
apiVersion: client.authentication.k8s.io/v1alpha1                    
command: aws-iam-authenticator                    
args:                       
- "token"                       
- "-i"                     
- “eks-blog-cluster"

clusters:      
- cluster:       
server: https://DBFE36D09896EECAB426959C35FFCC47.sk1.us-east-1.eks.amazonaws.com        
certificate-authority-data: ”....................”        
name: kubernetes        
contexts:        
- context:             
cluster: kubernetes             
user: aws          
name: aws        
current-context: aws        
kind: Config       
preferences: {}        
users:           
- name: aws            
user:                
exec:                    
apiVersion: client.authentication.k8s.io/v1alpha1                    
command: aws-iam-authenticator                    
args:                       
- "token"                       
- "-i"                     
- “eks-blog-cluster"

Replace the values of the server and certificate–authority data with the values of your cluster and certificate and also update the cluster name in the args section. You can get these values from the web console as well as using the command.

aws eks describe-cluster --name=eks-blog-cluster

aws eks describe-cluster --name=eks-blog-cluster

Save and exit.

Add that file path to your KUBECONFIG environment variable so that kubectl knows where to look for your cluster configuration.

export KUBECONFIG=$KUBECONFIG:~/.kube/config-eks-blog-cluster

export KUBECONFIG=$KUBECONFIG:~/.kube/config-eks-blog-cluster

To verify that the kubectl is now properly configured :

kubectl get all
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 172.20.0.1  443/TCP 50m

kubectl get all
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 172.20.0.1  443/TCP 50m

Launch and configure worker nodes :

Now we need to launch worker nodes before we can start deploying apps. We can create the worker node cluster by using the CloudFormation script provided by AWS which is available here or use the Terraform script available here.

ClusterName: Name of the Amazon EKS cluster we created earlier.
ClusterControlPlaneSecurityGroup: Id of the security group we used in EKS cluster.
NodeGroupName: Name for the worker node auto scaling group.
NodeAutoScalingGroupMinSize: Minimum number of worker nodes that you always want in your cluster.
NodeAutoScalingGroupMaxSize: Maximum number of worker nodes that you want in your cluster.
NodeInstanceType: Type of worker node you wish to launch.
NodeImageId: AWS provides Amazon EKS-optimized AMI to be used as worker nodes. Currently AKS is available in only two AWS regions Oregon and N.virginia and the AMI ids are ami-02415125ccd555295 and ami-048486555686d18a0 respectively
KeyName: Name of the key you will use to ssh into the worker node.
VpcId: Id of the VPC that we created earlier.
Subnets: Subnets from the VPC we created earlier.

To enable worker nodes to join your cluster, we need to download, edit and apply the AWS authenticator config map.

Download the config map:

curl -O https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/aws-auth-cm.yaml

curl -O https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/aws-auth-cm.yaml

Open it in an editor

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: <ARN of instance role (not instance profile)>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: <ARN of instance role (not instance profile)>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

Edit the value of rolearn with the arn of the role of your worker nodes. This value is available in the output of the scripts that you ran. Save the change and then apply

kubectl apply -f aws-auth-cm.yaml

kubectl apply -f aws-auth-cm.yaml

Now you can check if the nodes have joined the cluster or not.

kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-2-171.ec2.internal Ready  12s v1.10.3
ip-10-0-3-58.ec2.internal Ready  14s v1.10.3

kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-2-171.ec2.internal Ready  12s v1.10.3
ip-10-0-3-58.ec2.internal Ready  14s v1.10.3

Deploying an application:

As our cluster is completely ready now we can start deploying applications on it. We will deploy a simple books api application which connects to a mongodb database and allows users to store,list and delete book information.

1. MongoDB Deployment YAML

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: mongodb
spec:
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
      - name: mongodb
        image: mongo
        ports:
        - name: mongodbport
          containerPort: 27017
          protocol: TCP

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: mongodb
spec:
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
      - name: mongodb
        image: mongo
        ports:
        - name: mongodbport
          containerPort: 27017
          protocol: TCP

2. Test Application Development YAML

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: test-app
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
      - name: test-app
        image: akash125/pyapp
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: test-app
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
      - name: test-app
        image: akash125/pyapp
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000

3. MongoDB Service YAML

apiVersion: v1
kind: Service
metadata:
  name: mongodb-service
spec:
  ports:
  - port: 27017
    targetPort: 27017
    protocol: TCP
    name: mongodbport
  selector:
    app: mongodb

apiVersion: v1
kind: Service
metadata:
  name: mongodb-service
spec:
  ports:
  - port: 27017
    targetPort: 27017
    protocol: TCP
    name: mongodbport
  selector:
    app: mongodb

4. Test Application Service YAML

apiVersion: v1
kind: Service
metadata:
  name: test-service
spec:
  type: LoadBalancer
  ports:
  - name: test-service
    port: 80
    protocol: TCP
    targetPort: 3000
  selector:
    app: test-app

apiVersion: v1
kind: Service
metadata:
  name: test-service
spec:
  type: LoadBalancer
  ports:
  - name: test-service
    port: 80
    protocol: TCP
    targetPort: 3000
  selector:
    app: test-app

Services

$ kubectl create -f mongodb-service.yaml
$ kubectl create -f testapp-service.yaml

$ kubectl create -f mongodb-service.yaml
$ kubectl create -f testapp-service.yaml

Deployments

$ kubectl create -f mongodb-deployment.yaml
$ kubectl create -f testapp-deployment.yaml$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 172.20.0.1 <none> 443/TCP 12m
mongodb-service ClusterIP 172.20.55.194 <none> 27017/TCP 4m
test-service LoadBalancer 172.20.188.77 a7ee4f4c3b0ea 80:31427/TCP 3m

$ kubectl create -f mongodb-deployment.yaml
$ kubectl create -f testapp-deployment.yaml$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 172.20.0.1 <none> 443/TCP 12m
mongodb-service ClusterIP 172.20.55.194 <none> 27017/TCP 4m
test-service LoadBalancer 172.20.188.77 a7ee4f4c3b0ea 80:31427/TCP 3m

In the EXTERNAL-IP section of the test-service we see dns of an load balancer we can now access the application from outside the cluster using this dns.

To Store Data :

curl -X POST -d '{"name":"A Game of Thrones (A Song of Ice and Fire)“, "author":"George R.R. Martin","price":343}' http://a7ee4f4c3b0ea11e8b0f912f36098e4d-672471149.us-east-1.elb.amazonaws.com/books
{"id":"5b8fab49fa142b000108d6aa","name":"A Game of Thrones (A Song of Ice and Fire)","author":"George R.R. Martin","price":343}

curl -X POST -d '{"name":"A Game of Thrones (A Song of Ice and Fire)“, "author":"George R.R. Martin","price":343}' http://a7ee4f4c3b0ea11e8b0f912f36098e4d-672471149.us-east-1.elb.amazonaws.com/books
{"id":"5b8fab49fa142b000108d6aa","name":"A Game of Thrones (A Song of Ice and Fire)","author":"George R.R. Martin","price":343}

To Get Data :

curl -X GET http://a7ee4f4c3b0ea11e8b0f912f36098e4d-672471149.us-east-1.elb.amazonaws.com/books
[{"id":"5b8fab49fa142b000108d6aa","name":"A Game of Thrones (A Song of Ice and Fire)","author":"George R.R. Martin","price":343}]

curl -X GET http://a7ee4f4c3b0ea11e8b0f912f36098e4d-672471149.us-east-1.elb.amazonaws.com/books
[{"id":"5b8fab49fa142b000108d6aa","name":"A Game of Thrones (A Song of Ice and Fire)","author":"George R.R. Martin","price":343}]

We can directly put the URL used in the curl operation above in our browser as well, we will get the same response.

Now our application is deployed on EKS and can be accessed by the users.

Comparison BETWEEN GKE, ECS and EKS:

Cluster creation: Creating GKE and ECS cluster is way simpler than creating an EKS cluster. GKE being the simplest of all three.

Cost: In case of both, GKE and ECS we pay only for the infrastructure that is visible to us i.e., servers, volumes, ELB etc. and there is no cost for master nodes or other cluster management services but with EKS there is a charge of 0.2 $ per hour for the control plane.

Add-ons: GKE provides the option of using Calico as the network plugin which helps in defining network policies for controlling inter pod communication (by default all pods in k8s can communicate with each other).

Serverless: ECS cluster can be created using Fargate which is container as a Service (CaaS) offering from AWS. Similarly EKS is also expected to support Fargate very soon.

In terms of availability and scalability all the services are at par with each other.

Conclusion:

In this blog post we learned the basics concepts of EKS, launched our own EKS cluster and deployed an application as well. EKS is much awaited service from AWS especially for the folks who were already running their Kubernetes workloads on AWS, as now they can easily migrate to EKS and have a fully managed Kubernetes control plane. EKS is expected to be adopted by many organisations in near future.

References:

December 12, 2022

Surviving & Thriving in the Age of Software Accelerations
It has almost been a decade since Marc Andreessen made this prescient statement. Software is not only eating the world but doing so at an accelerating pace. There is no industry that hasn’t been challenged by technology startups with disruptive approaches.
- Automakers are no longer just manufacturing companies: Tesla is disrupting the industry with their software approach to vehicle development and continuous over-the-air software delivery. Waymo’s autonomous cars have driven millions of miles and self-driving cars are a near-term reality. Uber is transforming the transportation industry into a service, potentially affecting the economics and incentives of almost 3–4% of the world GDP!
- Social networks and media platforms had a significant and decisive impact on the US election results.
- Banks and large financial institutions are being attacked by FinTech startups like WealthFront, Venmo, Affirm, Stripe, SoFi, etc. Bitcoin, Ethereum and the broader blockchain revolution can upend the core structure of banks and even sovereign currencies.
- Traditional retail businesses are under tremendous pressure due to Amazon and other e-commerce vendors. Retail is now a customer ownership, recommendations, and optimization business rather than a brick and mortar one.
Enterprises need to adopt a new approach to software development and digital innovation. At Velotio, we are helping customers to modernize and transform their business with all of the approaches and best practices listed below.

Agility

In this fast-changing world, your business needs to be agile and fast-moving. You need to ship software faster, at a regular cadence, with high quality and be able to scale it globally.

Agile practices allow companies to rally diverse teams behind a defined process that helps to achieve inclusivity and drives productivity. Agile is about getting cross-functional teams to work in concert in planned short iterations with continuous learning and improvement.

Generally, teams that work in an Agile methodology will:
- Conduct regular stand-ups and Scrum/Kanban planning meetings with the optimal use of tools like Jira, PivotalTracker, Rally, etc.
- Use pair programming and code review practices to ensure better code quality.
- Use continuous integration and delivery tools like Jenkins or CircleCI.
- Design processes for all aspects of product management, development, QA, DevOps and SRE.
- Use Slack, Hipchat or Teams for communication between team members and geographically diverse teams. Integrate all tools with Slack to ensure that it becomes the central hub for notifications and engagement.
Cloud-Native

Businesses need software that is purpose-built for the cloud model. What does that mean? Software team sizes are now in the hundreds of thousands. The number of applications and software stacks is growing rapidly in most companies. All companies use various cloud providers, SaaS vendors and best-of-breed hosted or on-premise software. Essentially, software complexity has increased exponentially which required a “cloud-native” approach to manage effectively. Cloud Native Computing Foundation defines cloud native as a software stack which is:
1. Containerized: Each part (applications, processes, etc) is packaged in its own container. This facilitates reproducibility, transparency, and resource isolation.
2. Dynamically orchestrated: Containers are actively scheduled and managed to optimize resource utilization.
3. Microservices oriented: Applications are segmented into micro services. This significantly increases the overall agility and maintainability of applications.
You can deep-dive into cloud native with this blog by our CTO, Chirag Jog.

Cloud native is disrupting the traditional enterprise software vendors. Software is getting decomposed into specialized best of breed components — much like the micro-services architecture. See the Cloud Native landscape below from CNCF.

DevOps

Process and toolsets need to change to enable faster development and deployment of software. Enterprises cannot compete without mature DevOps strategies. DevOps is essentially a set of practices, processes, culture, tooling, and automation that focuses on delivering software continuously with high quality.

DevOps tool chains & process

As you begin or expand your DevOps journey, a few things to keep in mind:
- Customize to your needs: There is no single DevOps process or toolchain that suits all needs. Take into account your organization structure, team capabilities, current software process, opportunities for automation and goals while making decisions. For example, your infrastructure team may have automated deployments but the main source of your quality issues could be the lack of code reviews in your development team. So identify the critical pain points and sources of delay to address those first.
- Automation: Automate everything that can be. The lesser the dependency on human intervention, the higher are the chances for success.
- Culture: Align the incentives and goals with your development, ITOps, SecOps, SRE teams. Ensure that they collaborate effectively and ownership in the DevOps pipeline is well established.
- Small wins: Pick one application or team and implement your DevOps strategy within it. That way you can focus your energies and refine your experiments before applying them broadly. Show success as measured by quantifiable parameters and use that to transform the rest of your teams.
- Organizational dynamics & integrations: Adoption of new processes and tools will cause some disruptions and you may need to re-skill part of your team or hire externally. Ensure that compliance, SecOps & audit teams are aware of your DevOps journey and get their buy-in.
- DevOps is a continuous journey: DevOps will never be done. Train your team to learn continuously and refine your DevOps practice to keep achieving your goal: delivering software reliably and quickly.
Micro-services

As the amount of software in an enterprise explodes, so does the complexity. The only way to manage this complexity is by splitting your software and teams into smaller manageable units. Micro-services adoption is primarily to manage this complexity.

Development teams across the board are choosing micro services to develop new applications and break down legacy monoliths. Every micro-service can be deployed, upgraded, scaled, monitored and restarted independent of other services. Micro-services should ideally be managed by an automated system so that teams can easily update live applications without affecting end-users.

There are companies with 100s of micro-services in production which is only possible with mature DevOps, cloud-native and agile practice adoption.

Interestingly, serverless platforms like Google Functions and AWS Lambda are taking the concept of micro-services to the extreme by allowing each function to act like an independent piece of the application. You can read about my thoughts on serverless computing in this blog: Serverless Computing Predictions for 2017.

Digital Transformation

Digital transformation involves making strategic changes to business processes, competencies, and models to leverage digital technologies. It is a very broad term and every consulting vendor twists it in various ways. Let me give a couple of examples to drive home the point that digital transformation is about using technology to improve your business model, gain efficiencies or built a moat around your business:
- GE has done an excellent job transforming themselves from a manufacturing company into an IoT/software company with Predix. GE builds airplane engines, medical equipment, oil & gas equipment and much more. Predix is an IoT platform that is being embedded into all of GE’s products. This enabled them to charge airlines on a per-mile basis by taking the ownership of maintenance and quality instead of charging on a one-time basis. This also gives them huge amounts of data that they can leverage to improve the business as a whole. So digital innovation has enabled a business model improvement leading to higher profits.
- Car companies are exploring models where they can provide autonomous car fleets to cities where they will charge on a per-mile basis. This will convert them into a “service” & “data” company from a pure manufacturing one.
- Insurance companies need to built digital capabilities to acquire and retain customers. They need to build data capabilities and provide ongoing value with services rather than interact with the customer just once a year.
You would be better placed to compete in the market if you have automation and digital process in place so that you can build new products and pivot in an agile manner.

Big Data / Data Science

Businesses need to deal with increasing amounts of data due to IoT, social media, mobile and due to the adoption of software for various processes. And they need to use this data intelligently. Cloud platforms provide the services and solutions to accelerate your data science and machine learning strategies. AWS, Google Cloud & open-source libraries like Tensorflow, SciPy, Keras, etc. have a broad set of machine learning and big data services that can be leveraged. Companies need to build mature data processing pipelines to aggregate data from various sources and store it for quick and efficient access to various teams. Companies are leveraging these services and libraries to build solutions like:
- Predictive analytics
- Cognitive computing
- Robotic Process Automation
- Fraud detection
- Customer churn and segmentation analysis
- Recommendation engines
- Forecasting
- Anomaly detection
Companies are creating data science teams to build long term capabilities and moats around their business by using their data smartly.

Re-platforming & App Modernization

Enterprises want to modernize their legacy, often monolithic apps as they migrate to the cloud. The move can be triggered due to hardware refresh cycles or license renewals or IT cost optimization or adoption of software-focused business models.

Benefits of modernization to customers and businesses

Intelligent Applications

Software is getting more intelligent and to enable this, businesses need to integrate disparate datasets, distributed teams, and processes. This is best done on a scalable global cloud platform with agile processes. Big data and data science enables the creation of intelligent applications.

How can smart applications help your business?
- New intelligent systems of engagement: intelligent apps surface insights to users enabling the user to be more effective and efficient. For example, CRMs and marketing software is getting intelligent and multi-platform enabling sales and marketing reps to become more productive.
- Personalisation: E-Commerce, social networks and now B2B software is getting personalized. In order to improve user experience and reduce churn, your applications should be personalized based on the user preferences and traits.
- Drive efficiencies: IoT is an excellent example where the efficiency of machines can be improved with data and cloud software. Real-time insights can help to optimize processes or can be used for preventive maintenance.
- Creation of new business models: Traditional and modern industries can use AI to build new business models. For example, what if insurance companies allow you to pay insurance premiums only for the miles driven?
Security

Security threats to governments, enterprises and data have never been greater. As business adopt cloud native, DevOps & micro-services practices, their security practices need to evolve.

In our experience, these are few of the features of a mature cloud native security practice:
- Automated: Systems are updated automatically with the latest fixes. Another approach is immutable infrastructure with the adoption of containers and serverless.
- Proactive: Automated security processes tend to be proactive. For example, if a malware of vulnerability is found in one environment, automation can fix it in all environments. Mature DevOps & CI/CD processes ensure that fixes can be deployed in hours or days instead of weeks or months.
- Cloud Platforms: Businesses have realized that the mega-clouds are way more secure than their own data centers can be. Many of these cloud platforms have audit, security and compliance services which should be leveraged.
- Protecting credentials: Use AWS KMS, Hashicorp Vault or other solutions for protecting keys, passwords and authorizations.
- Bug bounties: Either setup bug bounties internally or through sites like HackerOne. You want the good guys to work for you and this is an easy way to do that.
Conclusion

As you can see, all of these approaches and best practices are intertwined and need to be implemented in concert to gain the desired results. It is best to start with one project, one group or one application and build on early wins. Remember, that is is a process and you are looking for gradual improvements to achieve your final objectives.

Please let us know your thoughts and experiences by adding comments to this blog or reaching out to @kalpakshah or RSI. We would love to help your business adopt these best practices and help to build great software together. Drop me a note at kalpak (at) velotio (dot) com.
December 12, 2022
Your Complete Guide to Building Stateless Bots Using Rasa Stack
This blog aims at exploring the Rasa Stack to create a stateless chat-bot. We will look into how, the recently released Rasa Core, which provides machine learning based dialogue management, helps in maintaining the context of conversations using machine learning in an efficient way.

If you have developed chatbots, you would know how hopelessly bots fail in maintaining the context once complex use-cases need to be developed. There are some home-grown approaches that people currently use to build stateful bots. The most naive approach is to create the state machines where you create different states and based on some logic take actions. As the number of states increases, more levels of nested logic are required or there is a need to add an extra state to the state machine, with another set of rules for how to get in and out of that state. Both of these approaches lead to fragile code that is harder to maintain and update. Anyone who’s built and debugged a moderately complex bot knows this pain.

After building many chatbots, we have experienced that flowcharts are useful for doing the initial design of a bot and describing a few of the known conversation paths, but we shouldn’t hard-code a bunch of rules since this approach doesn’t scale beyond simple conversations.

Thanks to the Rasa guys who provided a way to go stateless where scaling is not at all a problem. Let’s build a bot using Rasa Core and learn more about this.

Rasa Core: Getting Rid of State Machines

The main idea behind Rasa Core is that thinking of conversations as a flowchart and implementing them as a state machine doesn’t scale. It’s very hard to reason about all possible conversations explicitly, but it’s very easy to tell, mid-conversation, if a response is right or wrong. For example, let’s consider a term insurance purchase bot, where you have defined different states to take different actions. Below diagram shows an example state machine:

Let’s consider a sample conversation where a user wants to compare two policies listed by policy_search state.

In above conversation, it can be compared very easily by adding some logic around the intent campare_policies. But real life is not so easy, as a majority of conversations are edge cases. We need to add rules manually to handle such cases, and after testing we realize that these clash with other rules we wrote earlier.

Rasa guys figured out how machine learning can be used to solve this problem. They have released Rasa Core where the logic of the bot is based on a probabilistic model trained on real conversations.

Structure of a Rasa Core App

Let’s understand few terminologies we need to know to build a Rasa Core app:

1. Interpreter: An interpreter is responsible for parsing messages. It performs the Natural Language Understanding and transforms the message into structured output i.e. intent and entities. In this blog, we are using Rasa NLU model as an interpreter. Rasa NLU comes under the Rasa Stack. In Training section, it is shown in detail how to prepare the training data and create a model.

2. Domain: To define a domain we create a domain.yml file, which defines the universe of your bot. Following things need to be defined in a domain file:
- Intents: Things we expect the user to say. It is more related to Rasa NLU.
- Entities: These represent pieces of information extracted what user said. It is also related to Rasa NLU.
- Templates: We define some template strings which our bot can say. The format for defining a template string is utter_<intent>. These are considered as actions which bot can take.
- Actions: List of things bot can do and say. There are two types of actions we define one those which will only utter message (Templates) and others some customised actions where some required logic is defined. Customised actions are defined as Python classes and are referenced in domain file.
- Slots: These are user-defined variables which need to be tracked in a conversation. For e.g to buy a term insurance we need to keep track of what policy user selects and details of the user, so all these details will come under slots.
3. Stories: In stories, we define what bot needs to do at what point in time. Based on these stories, a probabilistic model is generated which is used to decide which action to be taken next. There are two ways in which stories can be created which are explained in next section.

Let’s combine all these pieces together. When a message arrives in a Rasa Core app initially, interpreter transforms the message into structured output i.e. intents and entities. The Tracker is the object which keeps track of conversation state. It receives the info that a new message has come in. Then based on dialog model we generate using domain and stories policy chooses which action to take next. The chosen action is logged by the tracker and response is sent back to the user.

Training and Running A Sample Bot

We will create a simple Facebook chat-bot named Secure Life which assists you in buying term life insurance. To keep the example simple, we have restricted options such as age-group, term insurance amount, etc.

There are two models we need to train in the Rasa Core app:

Rasa NLU model based on which messages will be processed and converted to a structured form of intent and entities. Create following two files to generate the model:

data.json: Create this training file using the rasa-nlu trainer. Click here to know more about the rasa-nlu trainer.

nlu_config.json: This is the configuration file.
```
{
"pipeline": "spacy_sklearn",
"path" : "./models",
"project": "nlu",
"data" : "./data/data.md"
}
```
Run below command to train the rasa-nlu model:-
```
$ python -m rasa_nlu.train -c nlu_model_config.json --fixed_model_name current
```
Dialogue Model: This model is trained on stories we define, based on which the policy will take the action. There are two ways in which stories can be generated:
- Supervised Learning: In this type of learning we will create the stories by hand, writing them directly in a file. It is easy to write but in case of complex use-cases it is difficult to cover all scenarios.
- Reinforcement Learning: The user provides feedback on every decision taken by the policy. This is also known as interactive learning. This helps in including edge cases which are difficult to create by hand. You must be thinking how it works? Every time when a policy chooses an action to take, it is asked from the user whether the chosen action is correct or not. If the action taken is wrong, you can correct the action on the fly and store the stories to train the model again.
Since the example is simple, we have used supervised learning method, to generate the dialogue model. Below is the stories.md file.
## All yes * greet - utter_greet * affirm - utter_very_much_so * affirm - utter_gender * gender - utter_coverage_duration - action_gender * affirm - utter_nicotine * affirm - action_nicotine * age - action_thanks ## User not interested * greet - utter_greet * deny - utter_decline ## Coverage duration is not sufficient * greet - utter_greet * affirm - utter_very_much_so * affirm - utter_gender * gender - utter_coverage_duration - action_gender * deny - utter_decline
```
## All yes
* greet
- utter_greet
* affirm
- utter_very_much_so
* affirm
- utter_gender
* gender
- utter_coverage_duration
- action_gender
* affirm
- utter_nicotine
* affirm
- action_nicotine
* age
- action_thanks

## User not interested
* greet
- utter_greet
* deny
- utter_decline

## Coverage duration is not sufficient
* greet
- utter_greet
* affirm
- utter_very_much_so
* affirm
- utter_gender
* gender
- utter_coverage_duration
- action_gender
* deny
- utter_decline
```
Run below command to train dialogue model :
```
$ python -m rasa_core.train -s <path to stories.md file> -d <path to domain.yml> -o models/dialogue --epochs 300
```
Define a Domain: Create domain.yml file containing all the required information. Among the intents and entities write all those strings which bot is supposed to see when user say something i.e. intents and entities you defined in rasa NLU training file.
intents: - greet - goodbye - affirm - deny - age - gender slots: gender: type: text nicotine: type: text agegroup: type: text templates: utter_greet: - "hey there! welcome to Secure-Life!\nI can help you quickly estimate your rate of coverage.\nWould you like to do that ?" utter_very_much_so: - "Great! Let's get started.\nWe currently offer term plans of Rs. 1Cr. Does that suit your need?" utter_gender: - "What gender do you go by ?" utter_coverage_duration: - "We offer this term plan for a duration of 30Y. Do you think that's enough to cover entire timeframe of your financial obligations ?" utter_nicotine: - "Do you consume nicotine-containing products?" utter_age: - "And lastly, how old are you ?" utter_thanks: - "Thank you for providing all the info. Let me calculate the insurance premium based on your inputs." utter_decline: - "Sad to see you go. In case you change your plans, you know where to find me :-)" utter_goodbye: - "goodbye :(" actions: - utter_greet - utter_goodbye - utter_very_much_so - utter_coverage_duration - utter_age - utter_nicotine - utter_gender - utter_decline - utter_thanks - actions.ActionGender - actions.ActionNicotine - actions.ActionThanks
```
intents:
- greet
- goodbye
- affirm
- deny
- age
- gender

slots:
gender:
type: text
nicotine:
type: text
agegroup:
type: text

templates:
utter_greet:
- "hey there! welcome to Secure-Life!\nI can help you quickly estimate your rate of coverage.\nWould you like to do that ?"

utter_very_much_so:
- "Great! Let's get started.\nWe currently offer term plans of Rs. 1Cr. Does that suit your need?"

utter_gender:
- "What gender do you go by ?"

utter_coverage_duration:
- "We offer this term plan for a duration of 30Y. Do you think that's enough to cover entire timeframe of your financial obligations ?"

utter_nicotine:
- "Do you consume nicotine-containing products?"

utter_age:
- "And lastly, how old are you ?"

utter_thanks:
- "Thank you for providing all the info. Let me calculate the insurance premium based on your inputs."

utter_decline:
- "Sad to see you go. In case you change your plans, you know where to find me :-)"

utter_goodbye:
- "goodbye :("

actions:
- utter_greet
- utter_goodbye
- utter_very_much_so
- utter_coverage_duration
- utter_age
- utter_nicotine
- utter_gender
- utter_decline
- utter_thanks
- actions.ActionGender
- actions.ActionNicotine
- actions.ActionThanks
```
Define Actions: Templates defined in domain.yml also considered as actions. A sample customized action is shown below where we are setting a slot named gender with values according to the option selected by the user.
from rasa_core.actions.action import Action from rasa_core.events import SlotSet class ActionGender(Action): def name(self): return 'action_gender' def run(self, dispatcher, tracker, domain): messageObtained = tracker.latest_message.text.lower() if ("male" in messageObtained): return [SlotSet("gender", "male")] elif ("female" in messageObtained): return [SlotSet("gender", "female")] else: return [SlotSet("gender", "others")]
```
from rasa_core.actions.action import Action
from rasa_core.events import SlotSet

class ActionGender(Action):
def name(self):
return 'action_gender'
def run(self, dispatcher, tracker, domain):
messageObtained = tracker.latest_message.text.lower()

if ("male" in messageObtained):
return [SlotSet("gender", "male")]
elif ("female" in messageObtained):
return [SlotSet("gender", "female")]
else:
return [SlotSet("gender", "others")]
```
Running the Bot

Create a Facebook app and get the app credentials. Create a bot.py file as shown below:
from rasa_core import utils from rasa_core.agent import Agent from rasa_core.interpreter import RasaNLUInterpreter from rasa_core.channels import HttpInputChannel from rasa_core.channels.facebook import FacebookInput logger = logging.getLogger(__name__) def run(serve_forever=True): # create rasa NLU interpreter interpreter = RasaNLUInterpreter("models/nlu/current") agent = Agent.load("models/dialogue", interpreter=interpreter) input_channel = FacebookInput( fb_verify="your_fb_verify_token", # you need tell facebook this token, to confirm your URL fb_secret="your_app_secret", # your app secret fb_tokens={"your_page_id": "your_page_token"}, # page ids + tokens you subscribed to debug_mode=True # enable debug mode for underlying fb library ) if serve_forever: agent.handle_channel(HttpInputChannel(5004, "/app", input_channel)) return agent if __name__ == '__main__': utils.configure_colored_logging(loglevel="DEBUG") run()
```
from rasa_core import utils
from rasa_core.agent import Agent
from rasa_core.interpreter import RasaNLUInterpreter
from rasa_core.channels import HttpInputChannel
from rasa_core.channels.facebook import FacebookInput

logger = logging.getLogger(__name__)

def run(serve_forever=True):
# create rasa NLU interpreter
interpreter = RasaNLUInterpreter("models/nlu/current")
agent = Agent.load("models/dialogue", interpreter=interpreter)

input_channel = FacebookInput(
fb_verify="your_fb_verify_token", # you need tell facebook this token, to confirm your URL
fb_secret="your_app_secret", # your app secret
fb_tokens={"your_page_id": "your_page_token"}, # page ids + tokens you subscribed to
debug_mode=True # enable debug mode for underlying fb library
)

if serve_forever:
agent.handle_channel(HttpInputChannel(5004, "/app", input_channel))
return agent

if __name__ == '__main__':
utils.configure_colored_logging(loglevel="DEBUG")
run()
```
Run the file and your bot is ready to test. Sample conversations are provided below:

Summary

You have seen how Rasa Core has made it easier to build bots. Just create few files and boom! Your bot is ready! Isn’t it exciting? I hope this blog provided you some insights on how Rasa Core works. Start exploring and let us know if you need any help in building chatbots using Rasa Core.
December 12, 2022
A Step Towards Simplified Querying in NodeJS
Recently, I came across a question on StackOverflow regarding querying the data on relationship table using sequalize and I went into flashback with the same situation and hence decided to write a blog over a better alternative Objection.js. When we choose ORM’s without looking into the use case we are tackling we usually end up with a mess.

The question on StackOverflow was about converting the below query into sequalize query.
```
SELECT a.* 
FROM employees a, emp_dept_details b 
WHERE b.Dept_Id=2 AND a.Emp_No = b.Emp_Id
```
(Pardon me for naming in the query, it was asked by novice programmer and I wanted to keep it as it is for purity sake).

Seems pretty straightforward right? So the solution is like below:
```
Employee.findAll({ 
  include: [{ 
    model: EmployeeDeptDetails, 
    where: { 
      Emp_Id: Sequelize.col('employees.Emp_No'), 
      Dept_Id: 2 
    } 
  }] 
});
```
If you look at this it’s much complex solution for simple querying and this grows with added relationships. And also for simple queries like this, the sequalize documentation is not sufficient. Now if you ask me how it can be done in a better way with Objection.js below is the same query in objection.
```
Employee.query()
  .joinRelation(‘employeeDeptDetails’)
  .where({ ‘employeeDeptDetails.Dept_Id’: 2 })
```
Note: It’s assumed that relationship is defined (in model classes) in both examples.

Now you guys can see the difference this is just one example I came across there are others on the internet for better understanding. So are you guys ready for diving into Objection.js?

But before we dive in, I wanted to let you guys know whenever we check online for Node.js ORM, we always find some people saying “don’t use an ORM, just write plain SQL” and they are correct in their perception. If your app is small enough that you can write a bunch of query helper functions and carry out all the needed functionality, then don’t go with ORM approach, instead just use plain SQL.

But when your app has an ample amount of tables and relationships between them that need to be defined and multiple-joint queries need to done, there comes the power of ORM.

So when we search for the ORM’s (For relational DB) available in NodeJS arena we usually get the list below:

1. Sequelize‍

2. Objection.js

3. typeORM

There are others, I have just mentioned more popular ones.

Well, I have personally used both Sequelize and Objection.js as they are the most popular ORM available today. So if you are a person who is deciding on which ORM you should be using for your next project or got frustrated with the relationship query complexity of `Sequelize` then you have landed on the correct place.

I am going to be honest here, I am using Objection.js currently doesn’t make it the facto or best ORM for NodeJS. If you don’t love to write the SQL resembling queries and prefer the fully abstracted query syntax then I think `Sequelize` is the right option for you (though you might struggle with relationship queries as I did and land up with Objection.js later on) but if you want your queries to resemble the SQL one then you should read out this blog.

What Makes Objection So Special?

1. Objection under the hood uses KNEX.JS a powerful SQL query builder

2. Let’s you create models for tables with ES6 / ES7 classes and deﬁne the relationships between them

3. Make queries with async / await

4. Add validation to your models using JSON schema

5. Perform graph inserts and upserts

to name a few.

The Learning Curve

I have exclusively relied upon the documentation. The Knex.js and objection.js documentation is great and there are simple (One of them, I am going to use below for explanation) examples on the Objection GitHub. So if you have previously worked with any NodeJS ORM or you are a newbie, this will help you get started without any struggles.

So let’s get started with some of the important topics while I explain to you the advantages over other ORM and usage along the way.

For setup (package installation, configuration, etc.) and full code you can check out Github

Creating and Managing DB Schema

Migration is a good pattern to manage your changes database schema. Objection.js uses knex.js migration for this purpose.

So what is Migration : Migrations are changes to a database’s schema specified within your ORM, so we will be defining the tables and columns of our database straight in JavaScript rather than using SQL.

One of the best features of Knex is its robust migration support. To create a new migration simply use the knex cli:
```
knex mirate:make migration_name
```
After running this command you’ll notice that a new file is created within your migrations directory. This file will include a current timestamp as well as the name that you gave to your migration. The file will look like this:
```
exports.up = function(knex, Promise) {

};

exports.down = function(knex, Promise) {

};
```
As you can notice the first is `exports.up`, which specifies the commands that should be run to make the database change that you’d like to make.e.g creating database tables, adding or removing a column from a table, changing indexes, etc.

The second function within your migration file is `exports.down`. This functions goal is to do the opposite of what exports.up did. If `exports.up` created a table, then `exports.down` will drop that table. The reason to include `exports.down` is so that you can quickly undo a migration should you need to.

For example:
exports.up = knex => { return knex.schema .createTable('persons', table => { table.increments('id').primary(); table .integer('parentId') .unsigned() .references('id') .inTable('persons') .onDelete('SET NULL') .index(); table.string('firstName'); table.string('lastName'); table.integer('age'); table.json('address'); }) }; exports.down = knex => { return knex.schema .dropTableIfExists('persons'); };
```
exports.up = knex => {
  return knex.schema
    .createTable('persons', table => {
      table.increments('id').primary();
      table
	.integer('parentId')
	.unsigned()
	.references('id')
	.inTable('persons')
	.onDelete('SET NULL')
	.index();
      table.string('firstName');
      table.string('lastName');
      table.integer('age');
      table.json('address');
    })
};

exports.down = knex => {
  return knex.schema
    .dropTableIfExists('persons');
}; 
```
It’s that simple to create the migration. Now you can run your migration like below.
```
$ knex migrate:latest
```
You can also pass the `–env` flag or set `NODE_ENV` to select an alternative environment:
```
$ knex migrate:latest --env production
```
To rollback the last batch of migrations:
```
$ knex migrate:rollback
```
Models

Models are wrappers around the database tables, they help to encapsulate the business logic within those tables.

Objection.js allows to create model using ES classes.

Before diving into the example you guys need to clear your thoughts regarding model little bit as Objection.js Model does not create any table in DB. Yes! the only thing Models are used for are adding the validations and relationship mapping.

For example:
const { Model } = require('objection'); const Animal = require('./Animal'); class Person extends Model { // Table name is the only required property. static get tableName() { return 'persons'; } // Optional JSON schema. This is not the database schema. Nothing is generated // based on this. This is only used for validation. Whenever a model instance // is created it is checked against this schema. http://json-schema.org/. static get jsonSchema() { return { type: 'object', required: ['firstName', 'lastName'], properties: { id: { type: 'integer' }, parentId: { type: ['integer', 'null'] }, firstName: { type: 'string', minLength: 1, maxLength: 255 }, lastName: { type: 'string', minLength: 1, maxLength: 255 }, age: { type: 'number' }, address: { type: 'object', properties: { street: { type: 'string' }, city: { type: 'string' }, zipCode: { type: 'string' } } } } }; } // This object defines the relations to other models. static get relationMappings() { return { pets: { relation: Model.HasManyRelation, // The related model. This can be either a Model subclass constructor or an // absolute file path to a module that exports one. modelClass: Animal, join: { from: 'persons.id', to: 'animals.ownerId' } } }; } } module.exports = Person;
```
const { Model } = require('objection');
const Animal = require('./Animal');

class Person extends Model {
  // Table name is the only required property.
  static get tableName() {
    return 'persons';
  }

  // Optional JSON schema. This is not the database schema. Nothing is generated
  // based on this. This is only used for validation. Whenever a model instance
  // is created it is checked against this schema. http://json-schema.org/.
  static get jsonSchema() {
    return {
      type: 'object',
      required: ['firstName', 'lastName'],

      properties: {
	id: { type: 'integer' },
	parentId: { type: ['integer', 'null'] },
	firstName: { type: 'string', minLength: 1, maxLength: 255 },
	lastName: { type: 'string', minLength: 1, maxLength: 255 },
	age: { type: 'number' },
	address: {
	  type: 'object',
	  properties: {
	    street: { type: 'string' },
	    city: { type: 'string' },
	    zipCode: { type: 'string' }
	  }
	}
      }
    };
  }

  // This object defines the relations to other models.
  static get relationMappings() {
    return {
      pets: {
	relation: Model.HasManyRelation,
	// The related model. This can be either a Model subclass constructor or an
	// absolute file path to a module that exports one.
	modelClass: Animal,
	join: {
	  from: 'persons.id',
	  to: 'animals.ownerId'
	}
      }
    };	
  }
}

module.exports = Person;
```
- Now let’s break it down, that static getter `tableName` return the table name.
- We also have a second static getter method that defines the validations of each field and this is an optional thing to do. We can specify the required properties, type of the field i.e. number, string, object, etc and other validations as you can see in the example.
- Third static getter function we see is `relationMappings` which defines this models relationship to other models. In this case, the key of the outside object `pets` is how we will refer to the child class. The join property in addition to the relation type defines how the models are related to one another. The from and to properties of the join object define the database columns through which the models are associated. The modelClass passed to the relation mappings is the class of the related model.
So here `Person` has `HasManyRelation` with `Animal` model class and join is performed on persons `id` column and Animals `ownerId` column. So one person can have multiple pets.

Queries

Let’s start with simple SELECT queries:
```
SELECT * FROM persons;
```
Can be done like:
```
const persons = await Person.query();
```
Little advanced or should I say typical select query:
```
SELECT * FROM persons where firstName = 'Ben' ORDER BY age;
```
Can be done like:
```
const persons = await Person.query()
  .where({ firstName: 'Ben' })
  .orderBy('age');
```
So we can look how much objection queries resemble to the actual SQL queries so it’s always easy to transform SQL query easily into Objection.js one which is quite difficult with other ORMs.

INSERT Queries:
```
INSERT INTO persons (firstName) VALUES ('Ben');
```
Can be done like:
```
await Person.query().insert({ firstName: 'Ben' });
```
UPDATE Queries:
```
UPDATE persons set firstName = 'Brayn' where id = 1;
```
Can be done like:
```
await Person.query().patch({ firstName: 'Brayn' }).where({ id: 1 });
```
DELETE Queries:
```
DELETE from persons where id = 1;
```
Can be done like:
```
await Person.query().delete().where({ id: 1 });
```
Relationship Queries:

Suppose we want to fetch all the pets of Person whose first name is Ben.
```
const pets = await person
  .$relatedQuery('pets')
  .where('name', 'Ben');
```
Now suppose you want to insert person along with his pets. In this case we can use the graph queries.
const personWithPets = { firstName: 'Matt', lastName: 'Damon', age: 43, pets: [ { name: 'Doggo', species: 'dog' }, { name: 'Kat', species: 'cat' } ] }; // wrap `insertGraph` call in a transaction since its creating multiple queries. const insertedGraph = await transaction(Person.knex(), trx => { return ( Person.query(trx).insertGraph(personWithPets) ); });
```
const personWithPets = {
  firstName: 'Matt',
  lastName: 'Damon',
  age: 43,

    pets: [
    {
      name: 'Doggo',
      species: 'dog'
    },
    {
      name: 'Kat',
      species: 'cat'
    }
  ]
};

// wrap `insertGraph` call in a transaction since its creating multiple queries.
const insertedGraph = await transaction(Person.knex(), trx => {
  return (
    Person.query(trx).insertGraph(personWithPets)
  );
});
```
So here we can see the power of Objection queries and if try to compare these queries with other ORM queries you will find out the difference yourself which is better.

Plugin Availability

objection-password: This plugin automatically adds automatic password hashing to your Objection.js models. This makes it super-easy to secure passwords and other sensitive data.

objection-graphql: Automatic GraphQL API generator for objection.js models.

Verdict

I am having fun time working with Objection and Knex currently! If you ask me to choose between sequalize and objection.js I would definitely go with objection.js to avoid all the relationship queries pain. It’s worth noting that Objection.js is unlike your other ORM’s, it’s just a wrapper over the KNEX.js query builder so its like using query builder with additional features.
December 12, 2022

Blog

Async Utility Module

1. Parallel

2. Series

3. Waterfall

4. Queue

5. Priority Queue

6. Race

Combining Async Flows

Conclusion

What are channels?

What are goroutines?

How to define a channel?

Read and write operations on a channel:

Internals of channels:

Types of channels:

Properties of a channel:

Channel Structure:

Read and write operations on a channel:

Write in case of buffer overflow:

Go Runtime Scheduler

Summary of a send operation for buffered channels:

Conclusion:

Prerequisites

Possible Solution

Adding TypeGoose

Adding TypeGraphQL

Limitations and Alternatives

TypeGoose:

TypeGraphQL:

Conclusion

Further Reading

Why Do We Need to Split Reducers?

1. For fast page loads

2. Organization of code

3. One page/component

4. SEO

What Exists Today?

What it Does?

Conclusion

The Problem and the Potential Solutions

Amazon Deequ

Prerequisites

Data Analysis and Validation

Automated Constraint Suggestion

Anomaly Detection

Conclusion

What are React Native Apps

What is Appium

What is WebDriverIO

Key features of Appium & WebdriverIO

Appium

WebdriverIO

Installation & Configuration

WebdriverIO Configuration

Step-by-Step Configuration of Android Emulator using Android Studio

‍Appium Desktop Configuration

Setup of ANDROID_HOME + ANDROID_SDK_ROOT & JAVA_HOME

How to write E2E React Native Mobile App Tests

‍Here is an example of how to write E2E test in Appium:

How to Run Mobile Tests Scripts

Reporting

Limitations with Appium & WebDriverIO

Appium

WebdriverIO

Conclusion

Related Articles –

References

What is Elastic Kubernetes service (EKS)?

Key EKS concepts:

Prerequisites for launching an EKS cluster:

Launching an EKS cluster:

Configure kubectl for EKS:

Launch and configure worker nodes :

Deploying an application:

1. MongoDB Deployment YAML

2. Test Application Development YAML

3. MongoDB Service YAML

4. Test Application Service YAML

Services