Author: admin

  • Implementing Async Features in Python – A Step-by-step Guide

    Asynchronous programming is a characteristic of modern programming languages that allows an application to perform various operations without waiting for any of them. Asynchronicity is one of the big reasons for the popularity of Node.js.

    We have discussed Python’s asynchronous features as part of our previous post: an introduction to asynchronous programming in Python. This blog is a natural progression on the same topic. We are going to discuss async features in Python in detail and look at some hands-on examples.

    Consider a traditional web scraping application that needs to open thousands of network connections. We could open one network connection, fetch the result, and then move to the next ones iteratively. This approach increases the latency of the program. It spends a lot of time opening a connection and waiting for others to finish their bit of work.

    On the other hand, async provides you a method of opening thousands of connections at once and swapping among each connection as they finish and return their results. Basically, it sends the request to a connection and moves to the next one instead of waiting for the previous one’s response. It continues like this until all the connections have returned the outputs.  

    Source: phpmind

    From the above chart, we can see that using synchronous programming on four tasks took 45 seconds to complete, while in asynchronous programming, those four tasks took only 20 seconds.

    Where Does Asynchronous Programming Fit in the Real-world?

    Asynchronous programming is best suited for popular scenarios such as:

    1. The program takes too much time to execute.

    2. The reason for the delay is waiting for input or output operations, not computation.

    3. For the tasks that have multiple input or output operations to be executed at once.

    And application-wise, these are the example use cases:

    • Web Scraping
    • Network Services

    Difference Between Parallelism, Concurrency, Threading, and Async IO

    Because we discussed this comparison in detail in our previous post, we will just quickly go through the concept as it will help us with our hands-on example later.

    Parallelism involves performing multiple operations at a time. Multiprocessing is an example of it. It is well suited for CPU bound tasks.

    Concurrency is slightly broader than Parallelism. It involves multiple tasks running in an overlapping manner.

    Threading – a thread is a separate flow of execution. One process can contain multiple threads and each thread runs independently. It is ideal for IO bound tasks.

    Async IO is a single-threaded, single-process design that uses cooperative multitasking. In simple words, async IO gives a feeling of concurrency despite using a single thread in a single process.

     

    Fig:- A comparison in concurrency and parallelism

     

    Components of Async IO Programming

    Let’s explore the various components of Async IO in depth. We will also look at an example code to help us understand the implementation.

    1. Coroutines

    Coroutines are mainly generalization forms of subroutines. They are generally used for cooperative tasks and behave like Python generators.

    An async function uses the await keyword to denote a coroutine. When using the await keyword, coroutines release the flow of control back to the event loop.

    To run a coroutine, we need to schedule it on the event loop. After scheduling, coroutines are wrapped in Tasks as a Future object.

    Example:

    In the below snippet, we called async_func from the main function. We have to add the await keyword while calling the sync function. As you can see, async_func will do nothing unless the await keyword implementation accompanies it.

    import asyncio
    async def async_func():
        print('Velotio ...')
        await asyncio.sleep(1)
        print('... Technologies!')
    
    async def main():
        async_func()#this will do nothing because coroutine object is created but not awaited
        await async_func()
    
    asyncio.run(main())

    Output

    RuntimeWarning: coroutine 'async_func' was never awaited
     async_func()#this will do nothing because coroutine object is created but not awaited
    RuntimeWarning: Enable tracemalloc to get the object allocation traceback
    Velotio ...
    ... Blog!

    2. Tasks

    Tasks are used to schedule coroutines concurrently.

    When submitting a coroutine to an event loop for processing, you can get a Task object, which provides a way to control the coroutine’s behavior from outside the event loop.

    Example:

    In the snippet below, we are creating a task using create_task (an inbuilt function of asyncio library), and then we are running it.

    import asyncio
    async def async_func():
        print('Velotio ...')
        await asyncio.sleep(1)
        print('... Blog!')
    
    async def main():
        task = asyncio.create_task (async_func())
        await task
    asyncio.run(main())

    Output

    Velotio ...
    ... Blog!

    3 Event Loops

    This mechanism runs coroutines until they complete. You can imagine it as while(True) loop that monitors coroutine, taking feedback on what’s idle, and looking around for things that can be executed in the meantime.

    It can wake up an idle coroutine when whatever that coroutine is waiting on becomes available.

    Only one event loop can run at a time in Python.

    Example:

    In the snippet below, we are creating three tasks and then appending them in a list and executing all tasks asynchronously using get_event_loop, create_task and the await function of the asyncio library.

    import asyncio
    async def async_func(task_no):
        print(f'{task_no} :Velotio ...')
        await asyncio.sleep(1)
        print(f'{task_no}... Blog!')
    
    async def main():
        taskA = loop.create_task (async_func('taskA'))
        taskB = loop.create_task(async_func('taskB'))
        taskC = loop.create_task(async_func('taskC'))
        await asyncio.wait([taskA,taskB,taskC])
    
    if __name__ == "__main__":
        try:
            loop = asyncio.get_event_loop()
            loop.run_until_complete(main())
        except :
            pass

    Output

    taskA :Velotio ...
    taskB :Velotio ...
    taskC :Velotio ...
    taskA... Blog!
    taskB... Blog!
    taskC... Blog!

    Future

    A future is a special, low-level available object that represents an eventual result of an asynchronous operation.

    When a Future object is awaited, the co-routine will wait until the Future is resolved in some other place.

    We will look into the sample code for Future objects in the next section.

    A Comparison Between Multithreading and Async IO

    Before we get to Async IO, let’s use multithreading as a benchmark and then compare them to see which is more efficient.

    For this benchmark, we will be fetching data from a sample URL (the Velotio Career webpage) with different frequencies, like once, ten times, 50 times, 100 times, 500 times, respectively.

    We will then compare the time taken by both of these approaches to fetch the required data.

    Implementation

    Code of Multithreading:

    import requests
    import time
    from concurrent.futures import ProcessPoolExecutor
    
    
    def fetch_url_data(pg_url):
        try:
            resp = requests.get(pg_url)
        except Exception as e:
            print(f"Error occured during fetch data from url{pg_url}")
        else:
            return resp.content
            
    
    def get_all_url_data(url_list):
        with ProcessPoolExecutor() as executor:
            resp = executor.map(fetch_url_data, url_list)
        return resp
        
    
    if __name__=='__main__':
        url = "https://www.velotio.com/careers"
        for ntimes in [1,10,50,100,500]:
            start_time = time.time()
            responses = get_all_url_data([url] * ntimes)
            print(f'Fetch total {ntimes} urls and process takes {time.time() - start_time} seconds')

    Output

    Fetch total 1 urls and process takes 1.8822264671325684 seconds
    Fetch total 10 urls and process takes 2.3358211517333984 seconds
    Fetch total 50 urls and process takes 8.05638575553894 seconds
    Fetch total 100 urls and process takes 14.43302869796753 seconds
    Fetch total 500 urls and process takes 65.25404500961304 seconds

    ProcessPoolExecutor is a Python package that implements the Executor interface. The fetch_url_data is a function to fetch the data from the given URL using the requests python package, and the get_all_url_data function is used to map the fetch_url_data function to the lists of URLs.

    Async IO Programming Example:

    import asyncio
    import time
    from aiohttp import ClientSession, ClientResponseError
    
    
    async def fetch_url_data(session, url):
        try:
            async with session.get(url, timeout=60) as response:
                resp = await response.read()
        except Exception as e:
            print(e)
        else:
            return resp
        return
    
    
    async def fetch_async(loop, r):
        url = "https://www.velotio.com/careers"
        tasks = []
        async with ClientSession() as session:
            for i in range(r):
                task = asyncio.ensure_future(fetch_url_data(session, url))
                tasks.append(task)
            responses = await asyncio.gather(*tasks)
        return responses
    
    
    if __name__ == '__main__':
        for ntimes in [1, 10, 50, 100, 500]:
            start_time = time.time()
            loop = asyncio.get_event_loop()
            future = asyncio.ensure_future(fetch_async(loop, ntimes))
            loop.run_until_complete(future) #will run until it finish or get any error
            responses = future.result()
            print(f'Fetch total {ntimes} urls and process takes {time.time() - start_time} seconds')

    Output

    Fetch total 1 urls and process takes 1.3974951362609863 seconds
    Fetch total 10 urls and process takes 1.4191942596435547 seconds
    Fetch total 50 urls and process takes 2.6497368812561035 seconds
    Fetch total 100 urls and process takes 4.391665458679199 seconds
    Fetch total 500 urls and process takes 4.960426330566406 seconds

    We need to use the get_event_loop function to create and add the tasks. For running more than one URL, we have to use ensure_future and gather function.

    The fetch_async function is used to add the task in the event_loop object and the fetch_url_data function is used to read the data from the URL using the session package. The future_result method returns the response of all the tasks.

    Results:

    As you can see from the plot, async programming is much more efficient than multi-threading for the program above. 

    The graph of the multithreading program looks linear, while the asyncio program graph is similar to logarithmic.

     

    Conclusion

    As we saw in our experiment above, Async IO showed better performance with the efficient use of concurrency than multi-threading.

    Async IO can be beneficial in applications that can exploit concurrency. Though, based on what kind of applications we are dealing with, it is very pragmatic to choose Async IO over other implementations.

    We hope this article helped further your understanding of the async feature in Python and gave you some quick hands-on experience using the code examples shared above.

  • API Testing Using Postman and Newman

    In the last few years, we have an exponential increase in the development and use of APIs. We are in the era of API-first companies like Stripe, Twilio, Mailgun etc. where the entire product or service is exposed via REST APIs. Web applications also today are powered by REST-based Web Services. APIs today encapsulate critical business logic with high SLAs. Hence it is important to test APIs as part of the continuous integration process to reduce errors, improve predictability and catch nasty bugs.

    In the context of API development, Postman is great REST client to test APIs. Although Postman is not just a REST Client, it contains a full-featured testing sandbox that lets you write and execute Javascript based tests for your API.

    Postman comes with a nifty CLI tool – Newman. Newman is the Postman’s Collection Runner engine that sends API requests, receives the response and then runs your tests against the response. Newman lets developments easily integrate Postman into continuous integration systems like Jenkins. Some of the important features of Postman & Newman include:-

    1. Ability to test any API and see the response instantly.
    2. Ability to create test suites or collections using a collection of API endpoints.
    3. Ability to collaborate with team members on these collections.
    4. Ability to easily export/import collections as JSON files.

    We are going to look at all these features, some are intuitive and some not so much unless you’ve been using Postman for a while.

    Setting up Your Postman

    You can install Postman either as a Chrome extension or as a native application

    Later, can then look it up in your installed apps and open it. You can choose to Sign Up & create an account if you want, this is important especially for saving your API collections and accessing them anytime on any machine. However, for this article, we can skip this. There’s a button for that towards the bottom when you first launch the app.

    Postman Collections

    Postman Collections in simple words is a collection of tests. It is essentially a test suite of related tests. These tests can be scenario-based tests or sequence/workflow-based tests.

    There’s a Collections tab on the top left of Postman, with an example Postman Echo collection. You can open and go through it.

    Just like in the above screenshot, select a API request and click on the Tests. Check the first line:

    tests["response code is 200"] = responseCode.code === 200;

    The above line is a simple test to check if the response code for the API is 200. This is the pattern for writing Assertions/Tests in Postman (using JavaScript), and this is actually how you are going to write the tests for API’s need to be tested.You can open the other API requests in the POSTMAN Echo collection to get a sense of how requests are made.

    Adding a COLLECTION

    To make your own collection, click on the ‘Add Collection‘ button on the top left of Postman and call it “Test API”

    You will be prompted to give details about the collection, I’ve added a name Github API and given it a description.

    Clicking on Create should add the collection to the left pane, above, or below the example “POSTMAN Echo” collection.

    If you need a hierarchy for maintaining relevance between multiple API’s inside a collection, APIs can further be added to a folder inside a collection. Folders are a great way of separating different parts of your API workflow. You can be added folders through the “3 dot” button beside Collection Name:

    Eg.: name the folder “Get Calls” and give a description once again.

    Now that we have the folder, the next task is to add an API call that is related to the TEST_API_COLLECTION to that folder. That API call is to https://api.github.com/.

    If you still have one of the TEST_API_COLLECTION collections open, you can close it the same way you close tabs in a browser, or just click on the plus button to add a new tab on the right pane where we make requests.

    Type in or paste in https://api.github.com/ and press Send to see the response.

    Once you get the response, you can click on the arrow next to the Save button on the far right, and select Save As, a pop up will be displayed asking where to save the API call.

    Give a name, it can be the request URL, or a name like “GET Github Basic”, and a description, then choose the collection and folder, in this case, TEST_API_COLLECTION> GET CALLS, then click on Save. The API call will be added to the Github Root API folder on the left pane.

    Whenever you click on this request from the collection, it will open in the center pane.

    Write the Tests

    We’ve seen that the GET Github Basic request has a JSON response, which is usually the case for most of the APIs.This response has properties such as current_user_url, emails_url, followers_url and following_url to pick a few. The current_user_url has a value of https://api.github.com/user.  Let’s add a test, for this URL. Click on the ‘GET Github Basic‘ and click on the test tab in the section just below where the URL is put.

    You will notice on the right pane, we have some snippets which Postman creates when you click so that you don’t have to write a lot of code. Let’s add Response Body: JSON value check. Clicking on it produces the following snippet.

    var jsonData = JSON.parse(responseBody);
    tests["Your test name"] = jsonData.value === 100;

    From these two lines, it is apparent that Postman stores the response in a global object called responseBody, and we can use this to access response and assert values in tests as required.

    Postman also has another global variable object called tests, which is an object you can use to name your tests, and equate it to a boolean expression. If the boolean expression returns true, then the test passes.

    tests['some random test'] = x === y

    If you click on Send to make the request, you will see one of the tests failing.

    Lets create a test that relevant to our usecase.

    var jsonData = JSON.parse(responseBody);
    var usersURL = "https://api.github.com/user"
    tests["Gets the correct users url"] = jsonData.current_user_url === usersURL;

    Clicking on ‘Send‘, you’ll see the test passing.

    Let’s modify the test further to test some of the properties we want to check

    Ideally the things to be tested in an API Response Body should be:

    • Response Code ( Assert Correct Response Code for any request)
    • Response Time ( to check api responds in an acceptable time range / is not delayed)
    • Response Body is not empty / null
    tests["Status code is 200"] = responseCode.code === 200;
    tests["Response time is less than 200ms"] = responseTime < 200;
    tests["Response time is acceptable"] = _.inRange(responseTime, 0, 500);
    tests["Body is not empty"] = (responseBody!==null || responseBody.length!==0);

    Newman CLI

    Once you’ve set up all your collections and written tests for them, it may be tedious to go through them one by one and clicking send to see if a given collection test passes. This is where Newman comes in. Newman is a command-line collection runner for Postman.

    All you need to do is export your collection and the environment variables, then use Newman to run the tests from your terminal.

    NOTE: Make sure you’ve clicked on ‘Save’ to save your collection first before exporting.

    USING NEWMAN

    So the first step is to export your collection and environment variables. Click on the Menu icon for Github API collection, and select export.

    Select version 2, and click on “Export”

    Save the JSON file in a location you can access with your terminal. I created a local directory/folder called “postman” and saved it there.

    Install Newman CLI globally, then navigate to the where you saved the collection.

    npm install -g newman 
    cd postman

    Using Newman is quite straight-forward, and the documentation is extensive. You can even require it as a Node.js module and run the tests there. However, we will use the CLI.

    Once you are in the directory, run newman run <collection_name.json>, </collection_name.json> replacing the collection_name with the name you used to save the collection.

    newman run TEST_API_COLLECTION.postman_collection.json     

    NEWMAN CLI Options

    Newman provides a rich set of options to customize a run. A list of options can be retrieved by running it with the -h flag.

    
    $ newman run -h
    Options - Additional args: 
    Utility:
    -h, --help output usage information
    -v, --version output the version number
    Basic setup:
    --folder [folderName] Specify a single folder to run from a collection.
    -e, --environment [file|URL] Specify a Postman environment as a JSON [file]
    -d, --data [file] Specify a data file to use either json or csv
    -g, --global [file] Specify a Postman globals file as JSON [file]
    -n, --iteration-count [number] Define the number of iterations to run
    Request options:
    --delay-request [number] Specify a delay (in ms) between requests [number] --timeout-request [number] Specify a request timeout (in ms) for a request
    Misc.:
    --bail Stops the runner when a test case fails
    --silent Disable terminal output --no-color Disable colored output
    -k, --insecure Disable strict ssl
    -x, --suppress-exit-code Continue running tests even after a failure, but exit with code=0
    --ignore-redirects Disable automatic following of 3XX responses

    Lets try out of some of the options.

    Iterations

    Lets use the -n option to set the number of iterations to run the collection.

    $ newman run mycollection.json -n 10 # runs the collection 10 times

    To provide a different set of data, i.e. variables for each iteration, you can use the -d to specify a JSON or CSV file. For example, a data file such as the one shown below will run 2 iterations, with each iteration using a set of variables.

    [{
    "url": "http://127.0.0.1:5000",
      "user_id": "1",
      "id": "1",
      "token_id": "123123",
    },{
      "url": "http://postman-echo.com",
      "user_id": "2",
      "id": "2",
      "token_id": "899899",
    }]$ newman run mycollection.json -d data.json

    Alternately, the CSV file for the above set of variables would look like:

    url, user_id, id, token_id 
    http://127.0.0.1:5000, 1, 1, 123123123 
    http://postman-echo.com, 2, 2, 899899

    Environment Variables

    Each environment is a set of key-value pairs, with the key as the variable name. These Environment configurations can be used to differentiate between configurations specific to your execution environments eg. Dev, Test & Production.

    To provide a different execution environment, you can use the -e to specify a JSON or CSV file. For example, a environment file such as the one shown below will provide the environment variables globally to all tests during execution.

    postman_dev_env.json
    {
    "id": "b5c617ad-7aaf-6cdf-25c8-fc0711f8941b",
    "name": "dev env",
    "values": [
    {
    "enabled": true,
    "key": "env",
    "value": "dev.example.com",
    "type": "text"
    }  
    ],
    "timestamp": 1507210123364,
    "_postman_variable_scope": "environment",
    "_postman_exported_at": "2017-10-05T13:28:45.041Z",
    "_postman_exported_using": "Postman/5.2.1"
    }

    Bail FLAG

    Newman, by default, exits with a status code of 0 if everything runs well i.e. without any exceptions. Continuous integration tools respond to these exit codes and correspondingly pass or fail a build. You can use the –bail flag to tell Newman to halt on a test case error with a status code of 1 which can then be picked up by a CI tool or build system.

    $ newman run PostmanCollection.json -e environment.json --bail newman

    Conclusion

    Postman and Newman can be used for a number of test cases, including creating usage scenarios, Suites, Packs for your API Test Cases. Further NEWMAN / POSTMAN can be very well Integrated with CI/CD Tools such as Jenkins, Travis etc.

  • Building a Progressive Web Application in React [With Live Code Examples]

    What is PWA:

    A Progressive Web Application or PWA is a web application that is built to look and behave like native apps, operates offline-first, is optimized for a variety of viewports ranging from mobile, tablets to FHD desktop monitors and more. PWAs are built using front-end technologies such as HTML, CSS and JavaScript and bring native-like user experience to the web platform. PWAs can also be installed on devices just like native apps.

    For an application to be classified as a PWA, it must tick all of these boxes:

    • PWAs must implement service workers. Service workers act as a proxy between the web browsers and API servers. This allows web apps to manage and cache network requests and assets
    • PWAs must be served over a secure network, i.e. the application must be served over HTTPS
    • PWAs must have a web manifest definition, which is a JSON file that provides basic information about the PWA, such as name, different icons, look and feel of the app, splash screen, version of the app, description, author, etc

    Why build a PWA?

    Businesses and engineering teams should consider building a progressive web app instead of a traditional web app. Here are some of the most prominent arguments in favor of PWAs:

    • PWAs are responsive. The mobile-first design approach enables PWAs to support a variety of viewports and orientation
    • PWAs can work on slow Internet or no Internet environment. App developers can choose how a PWA will behave when there’s no Internet connectivity, whereas traditional web apps or websites simply stop working without an active Internet connection
    • PWAs are secure because they are always served over HTTPs
    • PWAs can be installed on the home screen, making the application more accessible
    • PWAs bring in rich features, such as push notification, application updates and more

    PWA and React

    There are various ways to build a progressive web application. One can just use Vanilla JS, HTML and CSS or pick up a framework or library. Some of the popular choices in 2020 are Ionic, Vue, Angular, Polymer, and of course React, which happens to be my favorite front-end library.

    Building PWAs with React

    To get started, let’s create a PWA which lists all the users in a system.

    npm init react-app users
    cd users
    yarn add react-router-dom
    yarn run start

    Next, we will replace the default App.js file with our own implementation.

    import React from "react";
    import { BrowserRouter, Route } from "react-router-dom";
    import "./App.css";
    const Users = () => {
     // state
     const [users, setUsers] = React.useState([]);
     // effects
     React.useEffect(() => {
       fetch("https://jsonplaceholder.typicode.com/users")
         .then((res) => res.json())
         .then((users) => {
           setUsers(users);
         })
         .catch((err) => {});
     }, []);
     // render
     return (
       <div>
         <h2>Users</h2>
         <ul>
           {users.map((user) => (
             <li key={user.id}>
               {user.name} ({user.email})
             </li>
           ))}
         </ul>
       </div>
     );
    };
    const App = () => (
     <BrowserRouter>
       <Route path="/" exact component={Users} />
     </BrowserRouter>
    );
    export default App;

    This displays a list of users fetched from the server.

    Let’s also remove the logo.svg file inside the src directory and truncate the App.css file that is populated as a part of the boilerplate code.

    To make this app a PWA, we need to follow these steps:

    1. Register service worker

    • In the file /src/index.js, replace serviceWorker.unregister() with serviceWorker.register().
    import React from 'react';
    import ReactDOM from 'react-dom';
    import './index.css';
    import App from './App';
    import * as serviceWorker from './serviceWorker';
    ReactDOM.render(
     <React.StrictMode>
       <App />
     </React.StrictMode>,
     document.getElementById('root')
    );
    serviceWorker.register();

    • The default behavior here is to not set up a service worker, i.e. the CRA boilerplate allows the users to opt-in for the offline-first experience.

    2. Update the manifest file

    • The CRA boilerplate provides a manifest file out of the box. This file is located at /public/manifest.json and needs to be modified to include the name of the PWA, description, splash screen configuration and much more. You can read more about available configuration options in the manifest file here.

    Our modified manifest file looks like this:

    {
     "short_name": "User Mgmt.",
     "name": "User Management",
     "icons": [
       {
         "src": "favicon.ico",
         "sizes": "64x64 32x32 24x24 16x16",
         "type": "image/x-icon"
       },
       {
         "src": "logo192.png",
         "type": "image/png",
         "sizes": "192x192"
       },
       {
         "src": "logo512.png",
         "type": "image/png",
         "sizes": "512x512"
       }
     ],
     "start_url": ".",
     "display": "standalone",
     "theme_color": "#aaffaa",
     "background_color": "#ffffff"
    }

    PWA Splash Screen

    Here the display mode selected is “standalone” which tells the web browsers to give this PWA the same look and feel as that of a standalone app. Other display options include, “browser,” which is the default mode and launches the PWA like a traditional web app and “fullscreen,” which opens the PWA in fullscreen mode – hiding all other elements such as navigation, the address bar and the status bar.

    The manifest can be inspected using Chrome dev tools > Application tab > Manifest.

    1. Test the PWA:

    • To test a progressive web app, build it completely first. This is because PWA features, such as caching aren’t enabled while running the app in dev mode to ensure hassle-free development  
    • Create a production build with: npm run build
    • Change into the build directory: cd build
    • Host the app locally: http-server or python3 -m http.server 8080
    • Test the application by logging in to http://localhost:8080

    2. Audit the PWA: If you are testing the app for the first time on a desktop or laptop browser, PWA may look like just another website. To test and audit various aspects of the PWA, let’s use Lighthouse, which is a tool built by Google specifically for this purpose.

    PWA on mobile

    At this point, we already have a simple PWA which can be published on the Internet and made available to billions of devices. Now let’s try to enhance the app by improving its offline viewing experience.

    1. Offline indication: Since service workers can operate without the Internet as well, let’s add an offline indicator banner to let users know the current state of the application. We will use navigator.onLine along with the “online” and “offline” window events to detect the connection status.

     // state
      const [offline, setOffline] = React.useState(false);
      // effects
      React.useEffect(() => {
        window.addEventListener("offline", offlineListener);
        return () => {
          window.removeEventListener("offline", offlineListener);
        };
      }, []);
      
      {/* add to jsx */}
      {offline ? (
        <div className="banner-offline">The app is currently offline</div>
      ) : null}

    The easiest way to test this is to just turn off the Wi-Fi on your dev machine. Chrome dev tools also provide an option to test this without actually going offline. Head over to Dev tools > Network and then select “Offline” from the dropdown in the top section. This should bring up the banner when the app is offline.

    2. Let’s cache a network request using service worker

    CRA comes with its own service-worker.js file which caches all static assets such as JavaScript and CSS files that are a part of the application bundle. To put custom logic into the service worker, let’s create a new file called ‘custom-service-worker.js’ and combine the two.

    • Install react-app-rewired and update package.json:
    1. yarn add react-app-rewired
    2. Update the package.json as follows:
    "scripts": {
       "start": "react-app-rewired start",
       "build": "react-app-rewired build",
       "test": "react-app-rewired test",
       "eject": "react-app-rewired eject"
    },

    • Create a config file to override how CRA generates service workers and inject our custom service worker, i.e. combine the two service worker files.
    const WorkboxWebpackPlugin = require("workbox-webpack-plugin");
    module.exports = function override(config, env) {
      config.plugins = config.plugins.map((plugin) => {
        if (plugin.constructor.name === "GenerateSW") {
          return new WorkboxWebpackPlugin.InjectManifest({
           swSrc: "./src/service-worker-custom.js",
           swDest: "service-worker.js"
          });
        }
        return plugin;
      });
      return config;
    };

    • Create service-worker-custom.js file and cache network request in there:
    workbox.skipWaiting();
    workbox.clientsClaim();
    workbox.routing.registerRoute(
      new RegExp("/users"),
      workbox.strategies.NetworkFirst()
    );
    workbox.precaching.precacheAndRoute(self.__precacheManifest || [])

    Your app should now work correctly in the offline mode.

    Distributing and publishing a PWA

    PWAs can be published just like any other website and only have one additional requirement, i.e. it must be served over HTTPs. When a user visits PWA from mobile or tablet, a pop-up is displayed asking the user if they’d like to install the app to their home screen.

    Conclusion

    Building PWAs with React enables engineering teams to develop, deploy and publish progressive web apps for billions of devices using technologies they’re already familiar with. Existing React apps can also be converted to a PWA. PWAs are fun to build, easy to ship and distribute, and add a lot of value to customers by providing native-live experience, better engagement via features, such as add to homescreen, push notifications and more without any installation process. 

  • Building Scalable and Efficient React Applications Using GraphQL and Relay

    Building a React application is not only about creating a user interface. It also has tricky parts like data fetching, re-render performance, and scalability. Many libraries and frameworks try to solve these problems, like Redux, Sagas, etc. But these tools come with their own set of difficulties.

    Redux gives you a single data source, but all the data fetching and rendering logic is handled by developers. Immer gives you immutable data structures, but one needs to handle the re-render performance of applications.

    GraphQL helps developers design and expose APIs on the backend, but no tool on the client side could utilize the full advantage of the single endpoint and data schema provided by GraphQL.

    In this article, we will learn about Relay as a GraphQL client. What are the advantages of using Relay in your application, and what conventions are required to integrate it?  We’ll also cover how following those conventions will give you a better developer experience and a performant app. We will also see how applications built with Relay are modular, scalable, efficient, and, by default, resilient to change.

    About Relay

    Relay is a JavaScript framework to declaratively fetch and manage your GraphQL data inside a React application. Relay uses static queries and ahead-of-time compilation to help you build a high-performance app. 

    But as the great saying goes, “With great power comes great responsibilities.” Relay comes with a set of costs (conventions), which—when compared with the benefits you get—is well worth it. We will explore the trade-offs in this article.

    The Relay framework is built of multiple modules:

    1. The compiler: This is a set of modules designed to extract GraphQL code from across the codebase and do validations and optimizations during build time.

    2. Relay runtime: A high-performance GraphQL runtime that features a normalized cache for objects and highly optimized read/write operations, simplified abstractions over fetching data fields, garbage collection, subscriptions, and more.

    3. React-relay: This provides the high-level APIs to integrate React with the Relay runtime.

    The Relay compiler runs as a separate process, like how webpack works for React. It keeps watching and compiling the GraphQL code, and in case of errors, it simply does not build your code, which prevents bugs from going into higher environments.

    Fragments

    Fragments are at the heart of how Relay blends with GraphQL. A fragment is a selection of fields on a GraphQL type. 

    fragment Avatar_user on User {
      avatarImgUrl
      firstName
      lastName
      userName
    }

    If we look at the sample fragment definition above, the fragment name, Avatar_user, is not just a random name. One of the Relay framework’s important conventions is that fragments have globally unique fragment names and follow a structure of <modulename>_<propertyname>. The example above is a fragment definition for Avatar_user.</propertyname></modulename>

    This fragment can then be reused throughout the queries instead of selecting the fields manually to render the avatar in each view.

    In the below query, we see the author type, and the first two who liked the blog post can use the fragment definition of Avatar_user

    query GetBlogPost($postId: ID!) {
          blogPostById(id: $postId) {
            author {
              firstName
              lastName
              avatarImgUrl
              userName
            }
            likedBy(first: 2) {
              edges {
                node {
                  firstName
                  lastName
                  avatarImgUrl
                  userName
                }
              }
            }
          }
        }

    Now, our new query with fragments looks like this:

    query GetBlogPost($postId: ID!) {
          blogPostById(id: $postId) {
            author {
              ...Avatar_user
            }
            likedBy(first: 2) {
              edges {
                node {
                  ...Avatar_user
                }
              }
            }
          }
        }

    Fragments not only allow us to reuse the definitions but more essentially, they let us add or remove fields needed to render our avatar as we evolve our application.

    Another highly important client-side convention is colocation. This means the data required for a component lives inside the component. This makes maintenance and extending much easier. Just like how React allows us to break our UI elements into components and group/compose different views, fragments in Relay allow us to split the data definitions and colocate the data and the view definitions.

    So, a good practice is to define single or multiple fragments that contain the data component to be rendered. This means that a component depends on some fields from the user type, irrespective of the parent component. In the example above, the <avatar> component will render an avatar using the fields specified in the Avatar_user fragment named.</avatar>

    How Relay leverages the GraphQL Fragment

    Relay wants all components to enlist all the data it needs to render, along with the component itself. Relay uses data and fragments to integrate the component and its data requirement. This convention mandates that every component lists the fields it needs access to. 

    Other advantages of the above are:

    1. Components are not dependent on data they don’t explicitly request.
    2. Components are modular and self-contained.
    3. Reusing and refactoring the components becomes easier.

    Performance

    In Relay, the component re-renders only when its exact fields change, and this feature available is out of the box. The fragment subscribes to updates specifically for data the component selects. This lets Relay enhance how the view is updated, and performance is not affected as codebase scales.

    Now, let’s look at an example of components in a single post of a blog application. Here is a wireframe of a sample post to give an idea of the data and view required.

    Now, let’s write a plain query without Relay, which will fetch all the data in a single query. It will look like this for the above wireframe:

    query GetBlogPost($postId: ID!) {
          blogPostById(id: $postId) {
            author {
              firstName
              lastName
              avatarUrl
              shortBio
            }
            title
            coverImgUrl
            createdAt
            tags {
              slug
              shortName
            }
            body
            likedByMe
            likedBy(first: 2) {
              totalCount
              edges {
                node {
                  firstName
                  lastName
                  avatarUrl
                }
              }
            }
          }
        }

    This one query has all the necessary data. Let’s also write down a sample structure of UI components for the query above:

    <BlogPostContainer>
        <BlogPostHead>
          <BlogPostAuthor>
            <Avatar />
          </BlogPostAuthor>
        </BlogPostHead>
        <BlogPostBody>
          <BlogPostTitle />
          <BlogPostMeta>
            <CreatedAtDisplayer />
            <TagsDisplayer />
          </BlogPostMeta>
          <BlogPostContent />
          <LikeButton>
            <LikedByDisplayer />
          </LikeButton>
        </BlogPostBody>
     </BlogPostContainer>

    In the implementation above, we have a single query that will be managed by the top-level component. It will be the top-level component’s responsibility to fetch the data and pass it down as props. Now, we will look at how we would build this in Relay:

    import * as React from "react";
        import { GetBlogPost } from "./__generated__/GetBlogPost.graphql";
        import { useLazyLoadQuery } from "react-relay/hooks";
        import { BlogPostHead } from "./BlogPostHead";
        import { BlogPostBody } from "./BlogPostBody";
        import { graphql } from "react-relay";
    
    
        interface BlogPostProps {
          postId: string;
        }
    
        export const BlogPost = ({ postId }: BlogPostProps) => {
          const { blogPostById } = useLazyLoadQuery<GetBlogPost>(
            graphql`
              query GetBlogPost($postId: ID!) {
                blogPostById(id: $postId) {
                  ...BlogPostHead_blogPost
                  ...BlogPostBody_blogPost
                }
              }
            `,
            {
              variables: { postId }
            }
          );
    
          if (!blogPostById) {
            return null;
          }
    
          return (
            <div>
              <BlogPostHead blogPost={blogPostById} />
              <BlogPostBody blogPost={blogPostById} />
            </div>
          );
        };

    First, let’s look at the query used inside the component:

    const { blogPostById } = useLazyLoadQuery<GetBlogPost>(
    graphql`
      query GetBlogPost($postId: ID!) {
        blogPostById(id: $postId) {
          ...BlogPostHead_blogPost
          ...BlogPostBody_blogPost
        }
      }
    `,
    {
      variables: { postId }
    }
    );

    The useLazyLoadQuery React hook from Relay will start fetching the GetBlogPost query just as the component renders. 

    NOTE: The useLazyLoadQuery is used here as it follows a common mental model of fetching data after the page is loaded. However, Relay encourages data to be fetched as early as possible using the usePreladedQuery hook. 

    For type safety, we are annotating the useLazyLoadQuery with the type GetBlogPost, which is imported from ./__generated__/GetBlogPost.graphql. This file is auto-generated and synced by the Relay compiler. It contains all the information about the types needed to be queried, along with the return type of data and the input variables for the query.

    The Relay compiler takes all the declared fragments in the codebase and generates the type files, which can then be used to annotate a particular component.

    The GetBlogPost query is defined by composing multiple fragments. Another great aspect of Relay is that there is no need to import the fragments manually. They are automatically included by the Relay compiler. Building the query by composing fragments, just like how we compose our component, is the key here. 

    Another approach can be to define queries per component, which takes full responsibility for its data requirements. But this approach has two problems: 

    1. Multiple queries are sent to the server instead of one.

    2. The loading will be slower as components would have to wait till they render to start fetching the data.

    In the above example, the GetBlogPost only deals with including the fragments for its child components, BlogPostHead and BlogPostBody. It is kept hidden from the actual data fields of the children component.

    When using Relay, components define their data requirement by themselves. These components can then be composed along with other components that have their own separate data. 

    At the same time, no component knows what data the other component needs except from the GraphQL type that has the required component data. Relay makes sure the right data is passed to the respective component, and all input for a query is sent to the server.

    This allows developers to think only about the component and fragments as one while Relay does all the heavy lifting in the background. Relay minimizes the round-trips to the server by placing the fragments from multiple components into optimized and efficient batches. 

    As we said earlier, the two fragments, BlogPostHead_blogPost and BlogPostBody_blogPost, which we referenced in the query, are not imported manually. This is because Relay imposes unique fragment names globally so that the compiler can include the definitions in queries sent to the server. This eliminates the chances of errors and takes away the laborious task of referencing the fragments by hand. 

     if (!blogPostById) {
          return null;
      }
    
      return (
        <div>
          <BlogPostHead blogPost={blogPostById} />
          <BlogPostBody blogPost={blogPostById} />
        </div>
      );

    Now, in the rendering logic above, we render the <BlogPostHead/> and <BlogPostBody/> and pass the blogPostById object as prop. It’s passed because it is the object inside the query that spreads the fragment needed by the two components. This is how Relay transfers fragment data. Because we spread both fragments on this object, it is guaranteed to satisfy both components.

    To put it into simpler terms, we say that to pass the fragment data, we pass the object where the fragment is spread, and the component then uses this object to get the real fragment data. Relay, through its robust type systems, makes sure that the right object is passed with required fragment spread on it.

    The previous component, the BlogPost, was the Parent component, i.e., the component with the root query object. The root query is necessary because it cannot fetch a fragment in isolation. Fragments must be included in the root query in a parent component. The parent can, in turn, be a fragment as long the root query exists in the hierarchy. Now, we will build the BlogPostHead component using fragments:

     import * as React from "react";
        import { useFragment } from "react-relay/hooks";
        import { graphql } from "react-relay";
        import {
          BlogPostHead_blogPost$key, BlogPostHead_blogPost
        } from "./__generated__/BlogPostHead_blogPost.graphql";
        import { BlogPostAuthor } from "./BlogPostAuthor";
        import { BlogPostLikeControls } from "./BlogPostLikeControls";
    
        interface BlogPostHeadProps {
          blogPost: BlogPostHead_blogPost$key;
        }
    
        export const BlogPostHead = ({ blogPost }: BlogPostHeadProps) => {
          const blogPostData = useFragment<BlogPostHead_blogPost>(
            graphql`
              fragment BlogPostHead_blogPost on BlogPost {
                title
                coverImgUrl
                ...BlogPostAuthor_blogPost
                ...BlogPostLikeControls_blogPost
              }
            `,
            blogPost
          );
    
          return (
            <div>
              <img src={blogPostData.coverImgUrl} />
              <h1>{blogPostData.title}</h1>
              <BlogPostAuthor blogPost={blogPostData} />
              <BlogPostLikeControls blogPost={blogPostData} />
            </div>
          );
        };

    NOTE: In our example, the BlogPostHead and BlogPostBody define only one fragment, but in general, a component can have any number of fragments or GraphQL types and even more than one fragments on the same type.

    In the component above, two type definitions, namely BlogPostHead_blogPost$key and BlogPostHead_blogPost, are imported from the file BlogPostHead_blogPost.graphql, generated by the Relay compiler. The compiler extracts the fragment code from this file and generates the types. This process is followed for all the GraphQL code—queries, mutations, fragments, and subscriptions.

    The blogPostHead_blogPost has the fragment type definitions, which is then passed to the useFragment hook to ensure type safety when using the data from the fragment. The other import, blogPostHead_blogPost$key, is used in the interface Props { … }, and this type definition makes sure that we pass the right object to useFragment. Otherwise,  the type system will throw errors during build time. In the above child component, the blogPost object is received as a prop and is passed to useFragment as a second parameter. If the blogPost object did not have the correct fragment, i.e., BlogPostHead_blogPost, spread on it, we would have received a type error. Even if there were another fragment with exact same data selection spread on it, Relay makes sure it’s the right fragment that we use with the useFragement. This allows you to change the update fragment definitions without affecting other components.

    Data masking

    In our example, the fragment BlogPostHead_blogPost explicitly selects two fields for the component:

    1. title
    2. coverImgUrl

    This is because we use/access only these two fields in the view for the <blogposthead></blogposthead> component. So, even if we define another fragment, BlogPostAuthor_blogPost, which selects the title and coverImgUrl, we don’t receive access to them unless we ask for them in the same fragment. This is enforced by Relay’s type system both at compile time and at runtime. This safety feature of Relay makes it impossible for components to depend on data they do not explicitly select. So, developers can refactor the components without risking other components. To reiterate, all components and their data dependencies are self-contained.

    The data for this component, i.e., title and coverImgUrl, will not be accessible on the parent component, BlogPost, even though the props object is sent by the parent. The data becomes available only through the useFragment React hook. This hook can consume the fragment definition. The useFragment takes in the fragment definition and the object where the fragment is spread to get the data listed for the particular fragment.  

    Just like how we spread the fragment for the BlogPostHead component in the BlogPost root query, we an also extend this to the child components of BlogPostHead. We spread the fragments, i.e., BlogPostAuthor_blogPost, BlogPostLikeControls_blogPost, since we are rendering <BlogPostAuthor /> and <BlogPostLikeControls />.

    NOTE: The useFragment hook does not fetch the data. It can be thought of as a selector that grabs only what is needed from the data definitions.

    Performance

    When using a fragment for a component, the component subscribes only to the data it depends on. In our example, the component BlogPostHead will only automatically re-render when the fields “coverImgUrl” or “title” change for a specific blog post the component renders. Since the BlogPostAuthor_blogPost fragment does not select those fields, it will not re-render. Subscription to any updates is made on fragment level. This is an essential feature that works out of the box with Relay for performance.

    Let us now see how general data and components are updated in a different GraphQL framework than Relay. The data that gets rendered on view actually comes from an operation that requests data from the server, i.e., a query or mutation. We write the query that fetches data from the server, and that data is passed down to different components as per their needs as props. The data flows from the root component, i.e., the component with the query, down to the components. 

    Let’s look at a graphical representation of the data flow in other GraphQL frameworks:

    Image source: Dev.to

    NOTE: Here, the framework data store is usually referred to as cache in most frameworks:

    1. The Profile component executes the operation ProfileQuery to a GraphQL server.

    2. The data return is kept in some framework-specific representation of the data store.

    3. The data is passed to the view rendering it.

    4. The view then passes on the data to all the child components who need it. Example: Name, Avatar, and Bio. And finally React renders the view.

    In contrast, the Relay framework takes a different approach:

    Image source: Dev.to

    Let’s breakdown the approach taken by Relay: 

    • For the initial part, we see nothing changes. We still have a query that is sent to the GraphQL server and the data is fetched and stored in the Relay data store.
    • What Relay does after this is different. The components get the data directly from the cache-store(data store). This is because the fragments help Relay integrate deeply with the component data requirements.The component fragments get the data straight from the framework data store and do not rely on data to be passed down as props. Although some information is passed from the query to the fragments used to look up the particular data needed from the data store, the data is fetched by the fragment itself.

    To conclude the above comparison, in other frameworks (like Apollo), the component uses the query as the data source. The implementation details of how the root component executing the query sends data to its descendants is left to us. But Relay takes a different approach of letting the component take care of the data in needs from the data store.

    In an approach used by other GraphQL frameworks, the query is the data source, and updates in the data store forces the component holding the query to re-render. This re-render cascades down to any number of components even if those components do not have to do anything with the updated data other than acting as a layer to pass data from parent to child. In the Relay approach, the components directly subscribe to the updates for the data used. This ensures the best performance as our app scales in size and complexity.

    Developer Experience

    Relay removes the responsibility of developers to route the data down from query to the components that need it. This eliminates the changes of developer error. There is no way for a component to accidentally or deliberately depend on data that it should be just passing down in the component tree if it cannot access it. All the hard work is taken care of by the Relay framework if we follow the conventions discussed.

    Conclusion

    To summarize, we detailed all the work Relay does for us and the effects:

    • The type system of the Relay framework makes sure the right components get the right data they need. Everything in Relay revolves around fragments.
    • In Relay, fragments are coupled and colocated with components, which allows it to mask the data requirements from the outside world. This increases the readability and modularity.
    • By default, Relay takes care of performance as components only re-render when the exact data they use change in the data store.
    • Type generation is a main feature of Relay compiler. Through type generation, interactions with the fragment’s data is typesafe.

    Conventions enforced by Relay’s philosophy and architecture allows it to take advantage of the information available about your component. It knows the exact data dependencies and types. It uses all this information to do a lot of work that developers are required to deal with.

    Related Articles

    1. Enable Real-time Functionality in Your App with GraphQL and Pusher

    2. Build and Deploy a Real-Time React App Using AWS Amplify and GraphQL

  • How To Implement Chaos Engineering For Microservices Using Istio

    “Embrace Failures. Chaos and failures are your friends, not enemies.” A microservice ecosystem is going to fail at some point. The issue is not if you fail, but when you fail, will you notice or not. It’s between whether it will affect your users because all of your services are down, or it will affect only a few users and you can fix it at your own time.

    Chaos Engineering is a practice to intentionally introduce faults and failures into your microservice architecture to test the resilience and stability of your system. Istio can be a great tool to do so. Let’s have a look at how Istio made it easy.

    For more information on how to setup Istio and what are virtual service and Gateways, please have a look at the following blog, how to setup Istio on GKE.

    Fault Injection With Istio

    Fault injection is a testing method to introduce errors into your microservice architecture to ensure it can withstand the error conditions. Istio lets you injects errors at HTTP layer instead of delaying the packets or killing the pods at network layer. This way, you can generate various types of HTTP error codes and test the reaction of your services under those conditions. 

    Generating HTTP 503 Error

    Here we see that two pods are running two different versions of recommendation service using the recommended tutorial while installing the sample application.

    Currently, the traffic on the recommendation service is automatically load balanced between those two pods.

    kubectl get pods -l app=recommendation
    NAME                                  READY     STATUS    RESTARTS   AGE
    recommendation-v1-798bf87d96-d9d95   2/2       Running   0          1h
    recommendation-v2-7bc4f7f696-d9j2m   2/2       Running   0          1h

    Now let’s apply a fault injection using virtual service which will send 503 HTTP error codes in 30% of the traffic serving the above pods.

    To test whether it is working, check the output from the curl of customer service microservice endpoint. 

    You will find the 503 error on approximately 30% of the request coming to recommendation service.

    To restore normal operation, please delete the above virtual service using:

    kubectl delete -f recommendation-fault.yaml

    Delay

    The most common failure we see in production is not the down service, rather a delay service. To inject network latency as a chaos experiment, you can create another virtual service. Sometimes, it happens that your application doesn’t respond on time and creates chaos in the complete ecosystem. How to simulate that behavior, let’s have a look.

    Now, if you hit the URL of endpoints of the above service in a loop, you will see the delays in some of the requests. 

    Retry‍

    In some of the production services, we expect that instead of failing instantly, it should retry N number of times to get the desired output. If not succeeded, then only a request should be considered as failed.

    For that mechanism, you can insert retries on those services as follows:

    Now any request coming to recommendation will do 3 attempts before considering it as failed.

    Timeout‍

    In the real world, an application faces most failures due to timeouts. It can be because of more load on the application or any other latency in serving the request. Your application should have proper timeouts defined, before declaring any request as “Failed”. You can use Istio to simulate the timeout mechanism and give our application a limited amount of time to respond before giving up.

    Wait only for N seconds before failing and giving up.

    kind: VirtualService
    metadata:
      name: recommendation
    spec:
      hosts:
      - recommendation
      http:
      - route:
        - destination:
            host: recommendation
        timeout: 1.000s

    Conclusion‍

    Istio lets you inject faults at the HTTP layer for your application and improves its resilience and stability. But, the application must handle the failures and take appropriate course of action. Chaos Engineering is only effective when you know your application can take failures, otherwise, there is no point in testing for chaos if you know your application is definitely broken.

  • A Comprehensive Tutorial to Implementing OpenTracing With Jaeger

    Introduction

    Recently, there has been a lot of discussion around OpenTracing. We’ll start this blog by introducing OpenTracing, explaining what it is and why it is gaining attention. Next, we will discuss distributed tracing system Jaeger and how it helps in troubleshooting microservices-based distributed systems. We will also set up Jaeger and learn to use it for monitoring and troubleshooting purposes.

    Drift to Microservice Architecture

    Microservice Architecture has now become the obvious choice for application developers. In the Microservice Architecture,  a monolithic application is broken down into a group of independently deployed services. In simple words,  an application is more like a collection of microservices. When we have millions of such intertwined microservices working together, it’s almost impossible to map the inter-dependencies of these services and understand the execution of a request.

    If a monolithic application fails then it is more feasible to do the root cause analysis and understand the path of a transaction using some logging frameworks. But in a microservice architecture, logging alone fails to deliver the complete picture.

    Is this service called first in the chain? How do I span all these services to get insight into the application? With questions like these, it becomes a significantly larger problem to debug a set of interdependent distributed services in comparison to a single monolithic application, making OpenTracing more and more popular.

    OpenTracing

    What is Distributed Tracing?

    Distributed tracing is a method used to monitor applications, mostly those built using the microservices architecture. Distributed tracing helps to highlight what causes poor performance and where failures occur.

    How OpenTracing Fits Into This?

    The OpenTracing API provides a standard, vendor neutral framework for instrumentation. This means that if a developer wants to try out a different distributed tracing system, then instead of repeating the whole instrumentation process for the new distributed tracing system, the developer can easily change the configuration of Tracer.

    OpenTracing uses basic terminologies, such as Span and Trace. You can read about them in detail here.

    OpenTracing is a way for services to “describe and propagate distributed traces without knowledge of the underlying OpenTracing implementation.

    Let us take the example of a service like renting a movie on any rental service like iTunes. A service like this requires many other microservices to check that the movie is available, proper payment credentials are received, and enough space exists on the viewer’s device for download. If either one of those microservice fail, then the entire transaction fails. In such a case, having logs just for the main rental service wouldn’t be very useful for debugging. However, if you were able to analyze each service you wouldn’t have to scratch your head to troubleshoot  which microservice failed and what made it fail.

    In real life, applications are even more complex and with the increasing complexity of applications, monitoring the applications has been a tedious task. Opentracing helps us to easily monitor:

    • Spans of services
    • Time taken by each service
    • Latency between the services
    • Hierarchy of services
    • Errors or exceptions during execution of each service.

    Jaeger: A Distributed Tracing System by Uber

    Jaeger, is released as an open source distributed tracing system by Uber Technologies. It is used for monitoring and troubleshooting microservices-based distributed systems, including:

    • Distributed transaction monitoring
    • Performance and latency optimization
    • Root cause analysis
    • Service dependency analysis
    • Distributed context propagation

    Major Components of Jaeger

    1. Jaeger Client Libraries
    2. Agent
    3. Collector
    4. Query
    5. Ingester

    Running Jaeger in a Docker Container

    1.  First, install Jaeger Client on your machine:

    $ pip install jaeger-client

    2.  Now, let’s run Jaeger backend as an all-in-one Docker image. The image launches the Jaeger UI, collector, query, and agent:

    $ docker run -d -p6831:6831/udp -p16686:16686 jaegertracing/all-in-one:latest

    TIP:  To check if the docker container is running, use: Docker ps.

    Once the container starts, open http://localhost:16686/  to access the Jaeger UI. The container runs the Jaeger backend with an in-memory store, which is initially empty, so there is not much we can do with the UI right now since the store has no traces.

    Creating Traces on Jaeger UI

    1.   Create a Python program to create Traces:

    Let’s generate some traces using a simple python program. You can clone the Jaeger-Opentracing repository given below for a sample program that is used in this blog.

    import sys
    import time
    import logging
    import random
    from jaeger_client import Config
    from opentracing_instrumentation.request_context import get_current_span, span_in_context
    
    def init_tracer(service):
        logging.getLogger('').handlers = []
        logging.basicConfig(format='%(message)s', level=logging.DEBUG)    
        config = Config(
            config={
                'sampler': {
                    'type': 'const',
                    'param': 1,
                },
                'logging': True,
            },
            service_name=service,
        )
        return config.initialize_tracer()
    
    def booking_mgr(movie):
        with tracer.start_span('booking') as span:
            span.set_tag('Movie', movie)
            with span_in_context(span):
                cinema_details = check_cinema(movie)
                showtime_details = check_showtime(cinema_details)
                book_show(showtime_details)
    
    def check_cinema(movie):
        with tracer.start_span('CheckCinema', child_of=get_current_span()) as span:
            with span_in_context(span):
                num = random.randint(1,30)
                time.sleep(num)
                cinema_details = "Cinema Details"
                flags = ['false', 'true', 'false']
                random_flag = random.choice(flags)
                span.set_tag('error', random_flag)
                span.log_kv({'event': 'CheckCinema' , 'value': cinema_details })
                return cinema_details
    
    def check_showtime( cinema_details ):
        with tracer.start_span('CheckShowtime', child_of=get_current_span()) as span:
            with span_in_context(span):
                num = random.randint(1,30)
                time.sleep(num)
                showtime_details = "Showtime Details"
                flags = ['false', 'true', 'false']
                random_flag = random.choice(flags)
                span.set_tag('error', random_flag)
                span.log_kv({'event': 'CheckCinema' , 'value': showtime_details })
                return showtime_details
    
    def book_show(showtime_details):
        with tracer.start_span('BookShow',  child_of=get_current_span()) as span:
            with span_in_context(span):
                num = random.randint(1,30)
                time.sleep(num)
                Ticket_details = "Ticket Details"
                flags = ['false', 'true', 'false']
                random_flag = random.choice(flags)
                span.set_tag('error', random_flag)
                span.log_kv({'event': 'CheckCinema' , 'value': showtime_details })
                print(Ticket_details)
    
    assert len(sys.argv) == 2
    tracer = init_tracer('booking')
    movie = sys.argv[1]
    booking_mgr(movie)
    # yield to IOLoop to flush the spans
    time.sleep(2)
    tracer.close()

    The Python program takes a movie name as an argument and calls three functions that get the cinema details, movie showtime details, and finally book a movie ticket.

    It creates some random delays in all the functions to make it more interesting, as in reality the functions would take certain time to get the details. Also the function throws random errors to give us a feel of how the traces of a real-life application may look like in case of failures.

    Here is a brief description of how OpenTracing has been used in the program:

    • Initializing a tracer:
    def init_tracer(service):
       logging.getLogger('').handlers = []
       logging.basicConfig(format='%(message)s', level=logging.DEBUG)   
       config = Config(
           config={
               'sampler': {
                   'type': 'const',
                   'param': 1,
               },
               'logging': True,
           },
           service_name=service,
       )
       return config.initialize_tracer()

    • Using the tracer instance:
    tracer = init_tracer('booking')

    • Starting new child spans using start_span:  
    with tracer.start_span('CheckCinema', child_of=get_current_span()) as span:

    • Using Tags:
    span.set_tag('Movie', movie)

    • Using Logs:
    span.log_kv({'event': 'CheckCinema' , 'value': cinema_details })

    2. Run the python program:

    $ python booking-mgr.py <movie-name>
    
    Initializing Jaeger Tracer with UDP reporter
    Using sampler ConstSampler(True)
    opentracing.tracer initialized to <jaeger_client.tracer.Tracer object at 0x7f72ffa25b50>[app_name=booking]
    Reporting span cfe1cc4b355aacd9:8d6da6e9161f32ac:cfe1cc4b355aacd9:1 booking.CheckCinema
    Reporting span cfe1cc4b355aacd9:88d294b85345ac7b:cfe1cc4b355aacd9:1 booking.CheckShowtime
    Ticket Details
    Reporting span cfe1cc4b355aacd9:98cbfafca3aa0fe2:cfe1cc4b355aacd9:1 booking.BookShow
    Reporting span cfe1cc4b355aacd9:cfe1cc4b355aacd9:0:1 booking.booking

    Now, check your Jaeger UI, you can see a new service “booking” added. Select the service and click on “Find Traces” to see the traces of your service. Every time you run the program a new trace will be created.

    You can now compare the duration of traces through the graph shown above. You can also filter traces using  “Tags” section under “Find Traces”. For example, Setting “error=true” tag will filter out all the jobs that have errors, as shown:

    To view the detailed trace, you can select a specific trace instance and check details like the time taken by each service, errors during execution and logs.

    The above trace instance has four spans, the first representing the root span “booking”, the second is the “CheckCinema”, the third is the “CheckShowtime” and last is the “BookShow”. In this particular trace instance, both the “CheckCinema” and “CheckShowtime” have reported errors, marked by the error=true tag.

    Conclusion

    In this blog, we’ve described the importance and benefits of OpenTracing, one of the core pillars of modern applications. We also explored how distributed tracer Jaeger collect and store traces while revealing inefficient portions of our applications. It is fully compatible with OpenTracing API and has a number of clients for different programming languages including Java, Go, Node.js, Python, PHP, and more.

    References

    • https://www.jaegertracing.io/docs/1.9/
    • https://opentracing.io/docs/
  • Building a Collaborative Editor Using Quill and Yjs

    “Hope this email finds you well” is how 2020-2021 has been in a nutshell. Since we’ve all been working remotely since last year, actively collaborating with teammates became one notch harder, from activities like brainstorming a topic on a whiteboard to building documentation.

    Having tools powered by collaborative systems had become a necessity, and to explore the same following the principle of build fast fail fast, I started building up a collaborative editor using existing available, open-source tools, which can eventually be extended for needs across different projects.

    Conflicts, as they say, are inevitable, when multiple users are working on the same document constantly modifying it, especially if it’s the same block of content. Ultimately, the end-user experience is defined by how such conflicts are resolved.

    There are various conflict resolution mechanisms, but two of the most commonly discussed ones are Operational Transformation (OT) and Conflict-Free Replicated Data Type (CRDT). So, let’s briefly talk about those first.

    Operational Transformation

    The order of operations matter in OT, as each user will have their own local copy of the document, and since mutations are atomic, such as insert V at index 4 and delete X at index 2. If the order of these operations is changed, the end result will be different. And that’s why all the operations are synchronized through a central server. The central server can then alter the indices and operations and then forward to the clients. For example, in the below image, User2 makes a delete(0) operation, but as the OT server realizes that User1 has made an insert operation, the User2’s operation needs to be changed as delete(1) before applying to User1.

    OT with a central server is typically easier to implement. Plain text operations with OT in its basic form only has three defined operations: insert, delete, and apply.

    Source: Conclave

    “Fully distributed OT and adding rich text operations are very hard, and that’s why there’s a million papers.”

    CRDT

    Instead of performing operations directly on characters like in OT, CRDT uses a complex data structure to which it can then add/update/remove properties to signify transformation, enabling scope for commutativity and idempotency. CRDTs guarantee eventual consistency.

    There are different algorithms, but in general, CRDT has two requirements: globally unique characters and globally ordered characters. Basically, this involves a global reference for each object, instead of positional indices, in which the ordering is based on the neighboring objects. Fractional indices can be used to assign index to an object.

    Source: Conclave

    As all the objects have their own unique reference, delete operation becomes idempotent. And giving fractional indices is one way to give unique references while insertion and updation.

    There are two types of CRDT, one is state-based, where the whole state (or delta) is shared between the instances and merged continuously. The other is operational based, where only individual operations are sent between replicas. If you want to dive deep into CRDT, here’s a nice resource.

    For our purposes, we choose CRDT since it can also support peer-to-peer networks. If you directly want to jump to the code, you can visit the repo here.

    Tools used for this project:

    As our goal was for a quick implementation, we targeted off-the-shelf tools for editor and backend to manage collaborative operations.

    • Quill.js is an API-driven WYSIWYG rich text editor built for compatibility and extensibility. We choose Quill as our editor because of the ease to plug it into your application and availability of extensions.
    • Yjs is a framework that provides shared editing capabilities by exposing its different shared data types (Array, Map, Text, etc) that are synced automatically. It’s also network agnostic, so the changes are synced when a client is online. We used it because it’s a CRDT implementation, and surprisingly had readily available bindings for quill.js.

    Prerequisites:

    To keep it simple, we’ll set up a client and server both in the same code base. Initialize a project with npm init and install the below dependencies:

    npm i quill quill-cursors webpack webpack-cli webpack-dev-server y-quill y-websocket yjs

    • Quill: Quill is the WYSIWYG rich text editor we will use as our editor.
    • quill-cursors is an extension that helps us to display cursors of other connected clients to the same editor room.
    • Webpack, webpack-cli, and webpack-dev-server are developer utilities, webpack being the bundler that creates a deployable bundle for your application.
    • The Y-quill module provides bindings between Yjs and QuillJS with use of the SharedType y.Text. For more information, you can check out the module’s source on Github.
    • Y-websocket provides a WebsocketProvider to communicate with Yjs server in a client-server manner to exchange awareness information and data.
    • Yjs, this is the CRDT framework which orchestrates conflict resolution between multiple clients. 

    Code to use

    const path = require('path');
    
    module.exports = {
      mode: 'development',
      devtool: 'source-map',
      entry: {
        index: './index.js'
      },
      output: {
        globalObject: 'self',
        path: path.resolve(__dirname, './dist/'),
        filename: '[name].bundle.js',
        publicPath: '/quill/dist'
      },
      devServer: {
        contentBase: path.join(__dirname),
        compress: true,
        publicPath: '/dist/'
      }
    }

    This is a basic webpack config where we have provided which file is the starting point of our frontend project, i.e., the index.js file. Webpack then uses that file to build the internal dependency graph of your project. The output property is to define where and how the generated bundles should be saved. And the devServer config defines necessary parameters for the local dev server, which runs when you execute “npm start”.

    We’ll first create an index.html file to define the basic skeleton:

    <!DOCTYPE html>
    <html>
      <head>
        <title>Yjs Quill Example</title>
        <script src="./dist/index.bundle.js" async defer></script>
        <link rel=stylesheet href="//cdn.quilljs.com/1.3.6/quill.snow.css" async defer>
      </head>
      <body>
        <button type="button" id="connect-btn">Disconnect</button>
        <div id="editor" style="height: 500px;"></div>
      </body>
    </html>

    The index.html has a pretty basic structure. In <head>, we’ve provided the path of the bundled js file that will be created by webpack, and the css theme for the quill editor. And for the <body> part, we’ve just created a button to connect/disconnect from the backend and a placeholder div where the quill editor will be plugged.

    • Here, we’ve just made the imports, registered quill-cursors extension, and added an event listener for window load:
    import Quill from "quill";
    import * as Y from 'yjs';
    import { QuillBinding } from 'y-quill';
    import { WebsocketProvider } from 'y-websocket';
    import QuillCursors from "quill-cursors";
    
    // Register QuillCursors module to add the ability to show multiple cursors on the editor.
    Quill.register('modules/cursors', QuillCursors);
    
    window.addEventListener('load', () => {
      // We'll add more blocks as we continue
    });

    • Let’s initialize the Yjs document, socket provider, and load the document:
    window.addEventListener('load', () => {
      const ydoc = new Y.Doc();
      const provider = new WebsocketProvider('ws://localhost:3312', 'velotio-demo', ydoc);
      const type = ydoc.getText('Velotio-Blog');
    });

    • We’ll now initialize and plug the Quill editor with its bindings:
    window.addEventListener('load', () => {
      // ### ABOVE CODE HERE ###
    
      const editorContainer = document.getElementById('editor');
      const toolbarOptions = [
        ['bold', 'italic', 'underline', 'strike'],  // toggled buttons
        ['blockquote', 'code-block'],
        [{ 'header': 1 }, { 'header': 2 }],               // custom button values
        [{ 'list': 'ordered' }, { 'list': 'bullet' }],
        [{ 'script': 'sub' }, { 'script': 'super' }],      // superscript/subscript
        [{ 'indent': '-1' }, { 'indent': '+1' }],          // outdent/indent
        [{ 'direction': 'rtl' }],                         // text direction
        // array for drop-downs, empty array = defaults
        [{ 'size': [] }],
        [{ 'header': [1, 2, 3, 4, 5, 6, false] }],
        [{ 'color': [] }, { 'background': [] }],          // dropdown with defaults from theme
        [{ 'font': [] }],
        [{ 'align': [] }],
        ['image', 'video'],
        ['clean']                                         // remove formatting button
      ];
    
      const editor = new Quill(editorContainer, {
        modules: {
          cursors: true,
          toolbar: toolbarOptions,
          history: {
            userOnly: true  // only user changes will be undone or redone.
          }
        },
        placeholder: "collab-edit-test",
        theme: "snow"
      });
    
      const binding = new QuillBinding(type, editor, provider.awareness);
    });

    • Finally, let’s implement the Connect/Disconnect button and complete the callback:
    window.addEventListener('load', () => {
      // ### ABOVE CODE HERE ###
    
      const connectBtn = document.getElementById('connect-btn');
      connectBtn.addEventListener('click', () => {
    	if (provider.shouldConnect) {
      	  provider.disconnect();
      	  connectBtn.textContent = 'Connect'
    	} else {
      	  provider.connect();
      	  connectBtn.textContent = 'Disconnect'
    	}
      });
    
      window.example = { provider, ydoc, type, binding, Y }
    });

    Steps to run:

    • Server:

    For simplicity, we’ll directly use the y-websocket-server out of the box.

    NOTE: You can either let it run and open a new terminal for the next commands, or let it run in the background using `&` at the end of the command.

    • Client:

    Start the client by npm start. On successful compilation, it should open on your default browser, or you can just go to http://localhost:8080.

    Show me the repo

    You can find the repository here.

    Conclusion:

    Conflict resolution approaches are not relatively new, but with the trend of remote culture, it is important to have good collaborative systems in place to enhance productivity.

    Although this example was just on rich text editing capabilities, we can extend existing resources to build more features and structures like tabular data, graphs, charts, etc. Yjs shared types can be used to define your own data format based on how your custom editor represents data internally.

  • Building an ETL Workflow Using Apache NiFi and Hive

    The objective of this article is to design an ETL workflow using Apache NiFi that will scrape a web page with almost no code to get an endpoint, extract and transform the dataset, and load the transformed data into a Hive table.

    Problem Statement

    One potential use case where we need to create a data pipeline would be to capture the district level COVID-19 information from the COVID19-India API website, which gets updated daily. So, the aim is to create a flow that collates and loads a dataset into a warehouse system used by various downstream applications for further analysis, and the flow should be easily configurable for future changes.

    Prerequisites

    Before we start, we must have a basic understanding of Apache NiFi, and having it installed on a system would be a great start for this article. If you do not have it installed, please follow these quick steps. Apache Hive should be added to this architecture, which also requires a fully functional Hadoop framework. For this article, I am using Hive on a single cluster installed locally, but you can use a remote hive connection as well.

    Basic Terminologies

    Apache NiFi is an ETL tool with flow-based programming that comes with a web UI built to provide an easy way (drag & drop) to handle data flow in real-time. It also supports powerful and scalable means of data routing and transformation, which can be run on a single server or in a clustered mode across many servers.

    NiFi workflow consists of processors, the rectangular components that can process, verify, filter, join, split, or adjust data. They exchange pieces of information called FlowFiles through queues named connections, and the FlowFile Controller helps to manage the resources between those components.

    Web scraping is a process to extract and collect structured web data with automation. It includes extracting and processing underlying HTML code using CSS selectors and the extracted data gets stored into a database.

    Apache Hive is a warehouse system built on top of Hadoop used for data summarization, query, and ad-hoc analysis.

    Steps for ETL Workflow

    Fig:- End-to-End NiFi WorkFlow

    The above flow comprises multiple processors each performing different tasks at different stages to process data. The different stages are Collect (InvokeHTTP – API Web Page, InvokeHTTP – Download District Data), Filter (GetHTMLElement, ExtractEndPoints, RouteOnAttributeDistrict API, QueryRecord), Transform (ReplaceHeaders, ConvertJSONToSQL), Load (PutHiveQL), and Logging (LogAttribute). Each processor is connected through different relationship connections and gets triggered on success until the data gets loaded into the table. The entire flow is scheduled to run daily.

    So, let’s dig into each step to understand the flow better.

    1. Get the HTML document using the Remote URL

    The flow starts with an InvokeHTTP processor that sends an HTTP GET request to the COVID19-India API URL and returns an HTML page in the response queue for further inspection. The processor can be used to invoke multiple HTTP methods (GET, PUT, POST, or PATCH) as well.

    Fig:- InvokeHTTP – API Web Page Configuration

    2. Extract listed endpoints

    The second step occurs when the GETHTMLElement processor targets HTML table rows from the response where all the endpoints are listed inside anchor tags using the CSS selector as tr > td > a. and extracts data into FlowFiles.

    Fig:- GetHTMLElement Configuration

    After the success of the previous step, the ExtractText processor evaluates regular expressions against the content of the FlowFile to extract the URLs, which are then assigned to a FlowFile attribute named data_url.

    Fig:- ExtractEndPoints Configuration

    Note: The layout of the web page may have changed in the future. So, if you are reading this article in the future, configure the above processors as per the layout changes if any.

    3. Pick districts API and Download the dataset

    Here, the RouteOnAttribute processor filters out an API for district-level information and ignores other APIs using Apache NiFi Expression since we are only interested in district.csv

    Fig:- RouteOnAttribute – District API Configuration

    And this time, the InvokeHTTP processor downloads the data using the extracted API endpoint assigned to the attribute data_url surrounded with curly braces and the response data will be in the CSV format.

    Fig:- InvokeHTTP – Download District Data Configuration

    ‍4. Transform and Filter the dataset

    In this stage, the header of the response data is changed to lowercase using the ReplaceText processor with Literal Replace strategy, and the first field name is changed from date to recorded_date to avoid using reserved database keywords.

    Since the data is being updated daily on an incremental basis, we will only extract the data from the previous day using the QueryRecord processor. It will also convert the CSV data into JSON FlowFile using the CSVReader and JsonRecordSetWriter controller services.

    Please note that both the CSVReader and JsonRecordSetWriter services can have the default settings for our use. You can check out this blog for more reading on the controller services.

    And as mentioned, QueryRecord evaluates the below query to get data from the previous day out of the FlowFile and passes it to the next processor.

    select * from FlowFile where recorded_date=’${now():toNumber():minus(86400000):format(‘yyyy-MM-dd’)}’ 

    Fig:- ReplaceHeaders Configuration

    Fig:- QueryRecord Configuration

    ‍5. Establish JDBC connection pool for Hive and create a table

    Let’s set up the Hive JDBC driver for the NiFi flow using HiveConnectinPool with required local/remote configurations (database connection URL, user, and password).  Hive Configuration Resources property expects Hive configuration file path, i.e., hive-site.xml.

    Fig:- HiveConnectionPool Setup

    Now, we need an empty table to load the data from the NiFi flow, and to do so, you can use the DDL structure below:

    CREATE TABLE IF NOT EXISTS demo.districts (recorded_date string, state string, district string, confirmed string, recovered string, deceased string, other string, tested string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

    6. Load data into the Hive table

    In this step, the JSON-formatted FlowFile is converted into an SQL statement using ConvertJSONToSQL to provide a SQL query as the output FlowFile. We can configure the HiveConnectinPool for the JDBC Connection Pool property along with the table name and statement type before running the processor. In this case, the statement would be an insert type since we need to load the data into the table.

    Also, please note that when preparing a SQL command, the SQL Parameter Attribute Prefix property should be hiveql. Otherwise, the very next processor will not be able to identify it and will throw an error.

    Then, on success, PutHiveQL executes the input SQL command and loads the data into the table. The success of this processor marks the end of the workflow and the data can be verified by fetching the target table.

    Fig:- ConvertJSONToSQL Configurations

     

    Fig:- PutHiveQL Configuration

    ‍7. Schedule the flow for daily updates

    You can schedule the entire flow to run at any given time using different NiFi scheduling strategies. Since the first InvokeHTTP is the initiator of this flow, we can configure it to run daily at 2 AM.

    Fig:- Scheduling Strategy

    8. Log Management

    Almost every processor has been directed to the LogAttribute processor with a failure/success queue, which will write the state and information of all used attributes into the NiFi file, logs/nifi-app.log. By checking this file, we can debug and fix the issue in case of any failure. To extend it even further, we can also set up a flow to capture and notify error logs using Apache Kafka over email.

    9. Consume data for analysis

    You can use various open-source visualization tools to start off with the exploratory data analysis on the data stored in the Hive table.

    You can download the template covid_etl_workflow.xml and run it on your machine for reference.

    Future Scope

    There are different ways to build any workflow, and this was one of them. You can take this further by allowing multiple datasets (state_wise, test_datasets) from the list with different combinations of various processors/controllers as a part of the flow. 

    You can also try scraping data from a product listing page of multiple e-commerce websites for a comparison between goods and price or you can even extract movie reviews and ratings from the IMDb website and use it as a recommendation for users. 

    Conclusion

    In this article, we discussed Apache NiFi and created a workflow to extract, filter, transform, and load the data for analysis purposes. If you are more comfortable building logics and want to focus on the architecture with less code, then Apache NiFi is the tool for you.

  • Amazon Lex + AWS Lambda: Beyond Hello World

    In my previous blog, I explained how to get started with Amazon Lex and build simple bots. This blog aims at exploring the Lambda functions used by Amazon Lex for code validation and fulfillment. We will go along with the same example we created in our first blog i.e. purchasing a book and will see in details how the dots are connected.

    This blog is divided into following sections:

    1. Lambda function input format
    2. Response format
    3. Managing conversation context
    4. An example (demonstration to understand better how context is maintained to make data flow between two different intents)

    NOTE: Input to a Lambda function will change according to the language you use to create the function. Since we have used NodeJS for our example, everything will thus be explained using it.

    Section 1:  Lambda function input format

    When communication is started with a Bot, Amazon Lex passes control to Lambda function, we have defined while creating the bot.

    There are three arguments that Amazon Lex passes to a Lambda function:

    1. Event:

    event is a JSON variable containing all details regarding a bot conversation. Every time lambda function is invoked, event JSON is sent by Amazon Lex which contains the details of the respective message sent by the user to the bot.

    Below is a sample event JSON:

    {  
    currentIntent: {    
    name: 'orderBook',    
    slots: {      
      bookType: null,      
      bookName: 'null'    
     },    
      confirmationStatus: 'None'
     },
     bot: {  
      name: 'PurchaseBook',  
      alias: '$LATEST',  
      version: '$LATEST'
     },
     userId: 'user-1',
     inputTranscript: 'buy me a book',
     invocationSource: 'DialogCodeHook',
     outputDialogMode: 'Text',
     messageVersion: '1.0'
     };

    Format of event JSON is explained below:-

    • currentIntent:  It will contain information regarding the intent of message sent by the user to the bot. It contains following keys:
    • name: intent name  (for e.g orderBook, we defined this intent in our previous blog).
    • slots: It will contain a map of slot names configured for that particular intent,  populated with values recognized by Amazon Lex during the conversation. Default values are null.  
    • confirmationStatus:  It provides the user response to a confirmation prompt if there is one. Possible values for this variable are:
    • None: Default value
    • Confirmed: When the user responds with a confirmation w.r.t confirmation prompt.
    • Denied: When the user responds with a deny w.r.t confirmation prompt.
    • inputTranscipt: Text input by the user for processing. In case of audio input, the text will be extracted from audio. This is the text that is actually processed to recognize intents and slot values.                                
    • invocationSource: Its value directs the reason for invoking the Lambda function. It can have following two values:
    • DialogCodeHook:  This value directs the Lambda function to initialize the validation of user’s data input. If the intent is not clear, Amazon Lex can’t invoke the Lambda function.
    • FulfillmentCodeHook: This value is set to fulfil the intent. If the intent is configured to invoke a Lambda function as a fulfilment code hook, Amazon Lex sets the invocationSource to this value only after it has all the slot data to fulfil the intent.
    • bot: Details of bot that processed the request. It consists of below information:
    • name:  name of the bot.
    • alias: alias of the bot version.
    • version: the version of the bot.
    • userId: Its value is defined by the client application. Amazon Lex passes it to the Lambda function.
    • outputDialogMode:  Its value depends on how you have configured your bot. Its value can be Text / Voice.
    • messageVersion: The version of the message that identifies the format of the event data going into the Lambda function and the expected format of the response from a Lambda function. In the current implementation, only message version 1.0 is supported. Therefore, the console assumes the default value of 1.0 and doesn’t show the message version.
    • sessionAttributes:  Application-specific session attributes that the client sent in the request. It is optional.

    2. Context:

    AWS Lambda uses this parameter to provide the runtime information of the Lambda function that is executing. Some useful information we can get from context object are:-

    • The time is remaining before AWS Lambda terminates the Lambda function.
    • The CloudWatch log stream associated with the Lambda function that is executing.
    • The AWS request ID returned to the client that invoked the Lambda function which can be used for any follow-up inquiry with AWS support.

    Section 2: Response Format

    Amazon Lex expects a response from a Lambda function in the following format:

    {  
    sessionAttributes: {},  
    dialogAction: {   
    type: "ElicitIntent/ ElicitSlot/ ConfirmIntent/ Delegate/ Close",
    <structure based on type> 
    }
    }

    The response consists of two fields. The sessionAttributes field is optional, the dialogAction field is required. The contents of the dialogAction field depends on the value of the type field.

    • sessionAttributes: This is an optional field, it can be empty. If the function has to send something back to the client it should be passed under sessionAttributes. We will see its use-case in Section-4.
    • dialogAction (Required): Type of this field defines the next course of action. There are five types of dialogAction explained below:-

    1) Close: Informs Amazon Lex not to expect a response from the user. This is the case when all slots get filled. If you don’t specify a message, Amazon Lex uses the goodbye message or the follow-up message configured for the intent.

    dialogAction: {   
    type: "Close",   
    fulfillmentState: "Fulfilled/ Failed", // (required)   
    message: { // (optional)     
    contentType: "PlainText or SSML",     
    content: "Message to convey to the user"   
    } 
    }

    2) ConfirmIntent: Informs Amazon Lex that the user is expected to give a yes or no answer to confirm or deny the current intent. The slots field must contain an entry for each of the slots configured for the specified intent. If the value of a slot is unknown, you must set it to null. The message and responseCard fields are optional.

    dialogAction: {   
    type: "ConfirmIntent",   
    intentName: "orderBook",   
    slots: {     
      bookName: "value",     
      bookType: "value",   
     }   
     message: { // (optional)     
      contentType: "PlainText or SSML",     
      content: "Message to convey to the user"   
      } 
      }

    3) Delegate:  Directs Amazon Lex to choose the next course of action based on the bot configuration. The response must include any session attributes, and the slots field must include all of the slots specified for the requested intent. If the value of the field is unknown, you must set it to null. You will get a DependencyFailedException exception if your fulfilment function returns the Delegate dialog action without removing any slots.

    dialogAction: {   
    type: "Delegate",   
    slots: {     
      slot1: "value",     
      slot2: "value"   
     } 
     }

    4) ElicitIntent: Informs Amazon Lex that the user is expected to respond with an utterance that includes an intent. For example, “I want a buy a book” which indicates the OrderBook intent. The utterance “book,” on the other hand, is not sufficient for Amazon Lex to infer the user’s intent

    dialogAction: 
    {   type: "ElicitIntent",   
    message: { // (optional)     
    contentType: "PlainText or SSML",     
    content: "Message to convey to the user"   
    } 
    }

    5) ElicitSlot:  Informs Amazon Lex that the user is expected to provide a slot value in the response. In below structure, we are informing Amazon lex that user response should provide value for the slot named ‘bookName’.

    dialogAction: {   
      type: "ElicitSlot",   
      intentName: "orderBook",   
      slots: {     
        bookName: "",     
        bookType: "fiction",   
       },   
       slotToElicit: "bookName",   
       message: { // (optional)     
       contentType: "PlainText or SSML",     
       content: "Message to convey to the user"   
       }
       }

    Section 3: Managing Conversation Context

    Conversation context is the information that a user, your application, or a Lambda function provides to an Amazon Lex bot to fulfill an intent. Conversation context includes slot data that the user provides, request attributes set by the client application, and session attributes that the client application and Lambda functions create.

    1. Setting session timeout

    Session timeout is the length of time that a conversation session lasts. For in-progress conversations, Amazon Lex retains the context information, slot data, and session attributes till the session ends. Default session duration is 5 minutes but it can be changed upto 24 hrs while creating the bot in Amazon Lex console.

    2.Setting session attributes

    Session attributes contain application-specific information that is passed between a bot and a client application during a session. Amazon Lex passes session attributes to all Lambda functions configured for a bot. If a Lambda function adds or updates session attributes, Amazon Lex passes the new information back to the client application.

    Session attributes persist for the duration of the session. Amazon Lex stores them in an encrypted data store until the session ends.

    3. Sharing information between intents

    If you have created a bot with more than one intent, information can be shared between them using session attributes. Attributes defined while fulfilling an intent can be used in other defined intent.

    For example, a user of the book ordering bot starts by ordering books. the bot engages in a conversation with the user, gathering slot data, such as book name, and quantity. When the user places an order, the Lambda function that fulfils the order sets the lastConfirmedReservation session attribute containing information regarding ordered book and currentReservationPrice containing the price of the book. So, when the user has fulfilled the intent orderMagazine, the final price will be calculated on the bases of currentReservationPrice.

    lastConfirmedReservation session attribute containing information regarding ordered book and currentReservationPrice containing the price of the book. So, when the user also fulfilled the intent orderMagazine, the final price will be calculated on the basis of currentReservationPrice.

    Section 4:  Example

    The details of example Bot are below:

    Bot Name: PurchaseBot

    Intents :

    • orderBook – bookName, bookType
    • orderMagazine – magazineName, issueMonth

    Session attributes set while fulfilling the intent “orderBook” are:

    1. lastConfirmedReservation: In this variable, we are storing slot values corresponding to intent orderBook.
    2. currentReservationPrice: Book price is calculated and stored in this variable

    When intent orderBook gets fulfilled we will ask the user if he also wants to order a magazine. If the user responds with a confirmation bot will start fulfilling the intent “orderMagazine”.  

    Conclusion

    AWS Lambda functions are used as code hooks for your Amazon Lex bot. You can identify Lambda functions to perform initialization and validation, fulfillment, or both in your intent configuration. This blog bought more technical insight of how Amazon Lex works and how it communicates with Lambda functions. This blog explains how a conversation context is maintained using the session attributes. I hope you find the information useful.

  • Machine Learning for your Infrastructure: Anomaly Detection with Elastic + X-Pack

    Introduction

    The world continues to go through digital transformation at an accelerating pace. Modern applications and infrastructure continues to expand and operational complexity continues to grow. According to a recent ManageEngine Application Performance Monitoring Survey:

    • 28 percent use ad-hoc scripts to detect issues in over 50 percent of their applications.
    • 32 percent learn about application performance issues from end users.
    • 59 percent trust monitoring tools to identify most performance deviations.

    Most enterprises and web-scale companies have instrumentation & monitoring capabilities with an ElasticSearch cluster. They have a high amount of collected data but struggle to use it effectively. This available data can be used to improve availability and effectiveness of performance and uptime along with root cause analysis and incident prediction

    IT Operations & Machine Learning

    Here is the main question: How to make sense of the huge piles of collected data? The first step towards making sense of data is to understand the correlations between the time series data. But only understanding will not work since correlation does not imply causation. We need a practical and scalable approach to understand the cause-effect relationship between data sources and events across complex infrastructure of VMs, containers, networks, micro-services, regions, etc.

    It’s very likely that due to one component something goes wrong with another component. In such cases, operational historical data can be used to identify the root cause by investigating through a series of intermediate causes and effects. Machine learning is particularly useful for such problems where we need to identify “what changed”, since machine learning algorithms can easily analyze existing data to understand the patterns, thus making easier to recognize the cause. This is known as unsupervised learning, where the algorithm learns from the experience and identifies similar patterns when they come along again.

    Let’s see how you can setup Elastic + X-Pack to enable anomaly detection for your infrastructure & applications.

    Anomaly Detection using Elastic’s machine learning with X-Pack

    Step I: Setup

    1. Setup Elasticsearch: 

    According to Elastic documentation, it is recommended to use the Oracle JDK version 1.8.0_131. Check if you have required Java version installed on your system. It should be at least Java 8, if required install/upgrade accordingly.

    • Download elasticsearch tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.tar.gz
    $ tar -xzvf elasticsearch-5.5.1.tar.gz

    • It will then create a folder named elasticsearch-5.5.1. Go into the folder.
    $ cd elasticsearch-5.5.1

    • Install X-Pack into Elasticsearch
    $ ./bin/elasticsearch-plugin install x-pack

    • Start elasticsearch
    $ bin/elasticsearch

    2. Setup Kibana

    Kibana is an open source analytics and visualization platform designed to work with Elasticsearch.

    • Download kibana tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/kibana/kibana-5.5.1-linux-x86_64.tar.gz
    $ tar -xzf kibana-5.5.1-linux-x86_64.tar.gz

    • It will then create a folder named kibana-5.5.1. Go into the directory.
    $ cd kibana-5.5.1-linux-x86_64

    • Install X-Pack into Kibana
    $ ./bin/kibana-plugin install x-pack

    • Running kibana
    $ ./bin/kibana

    • Navigate to Kibana at http://localhost:5601/
    • Log in as the built-in user elastic and password changeme.
    • You will see the below screen:
    Kibana: X-Pack Welcome Page

     

    3. Metricbeat:

    Metricbeat helps in monitoring servers and the services they host by collecting metrics from the operating system and services. We will use it to get CPU utilization metrics of our local system in this blog.

    • Download Metric Beat’s tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-5.5.1-linux-x86_64.tar.gz
    $ tar -xvzf metricbeat-5.5.1-linux-x86_64.tar.gz

    • It will create a folder metricbeat-5.5.1-linux-x86_64. Go to the folder
    $ cd metricbeat-5.5.1-linux-x86_64

    • By default, Metricbeat is configured to send collected data to elasticsearch running on localhost. If your elasticsearch is hosted on any server, change the IP and authentication credentials in metricbeat.yml file.
     Metricbeat Config

     

    • Metric beat provides following stats:
    • System load
    • CPU stats
    • IO stats
    • Per filesystem stats
    • Per CPU core stats
    • File system summary stats
    • Memory stats
    • Network stats
    • Per process stats
    • Start Metricbeat as daemon process
    $ sudo ./metricbeat -e -c metricbeat.yml &

    Now, all setup is done. Let’s go to step 2 to create machine learning jobs. 

    Step II: Time Series data

    • Real-time data: We have metricbeat providing us the real-time series data which will be used for unsupervised learning. Follow below steps to define index pattern metricbeat-*  in Kibana to search against this pattern in Elasticsearch:
      – Go to Management -> Index Patterns  
      – Provide Index name or pattern as metricbeat-*
      – Select Time filter field name as @timestamp
      – Click Create

    You will not be able to create an index if elasticsearch did not contain any metric beat data. Make sure your metric beat is running and output is configured as elasticsearch.

    • Saved Historic data: Just to see quickly how machine learning detect the anomalies you can also use data provided by Elastic. Download sample data by clicking here.
    • Unzip the files in a folder: tar -zxvf server_metrics.tar.gz
    • Download this script. It will be used to upload sample data to elastic.
    • Provide execute permissions to the file: chmod +x upload_server-metrics.sh
    • Run the script.
    • As we created index pattern for metricbeat data, in same way create index pattern server-metrics*

    Step III: Creating Machine Learning jobs

    There are two scenarios in which data is considered anomalous. First, when the behavior of key indicator changes over time relative to its previous behavior. Secondly, when within a population behavior of an entity deviates from other entities in population over single key indicator.

    To detect these anomalies, there are three types of jobs we can create:

    1. Single Metric job: This job is used to detect Scenario 1 kind of anomalies over only one key performance indicator.
    2. Multimetric job: Multimetric job also detects Scenario 1 kind of anomalies but in this type of job we can track more than one performance indicators, such as CPU utilization along with memory utilization.
    3. Advanced job: This kind of job is created to detect anomalies of type 2.

    For simplicity, we are creating following single metric jobs:

    1. Tracking CPU Utilization: Using metric beat data
    2. Tracking total requests made on server: Using sample server data

    Follow below steps to create single metric jobs:

    Job1: Tracking CPU Utilization

    Job2: Tracking total requests made on server

    • Go to http://localhost:5601/
    • Go to Machine learning tab on the left panel of Kibana.
    • Click on Create new job
    • Click Create single metric job
    • Select index we created in Step 2 i.e. metricbeat-* and server-metrics* respectively
    • Configure jobs by providing following values:
    1. Aggregation: Here you need to select an aggregation function that will be applied to a particular field of data we are analyzing.
    2. Field: It is a drop down, will show you all field that you have w.r.t index pattern.
    3. Bucket span: It is interval time for analysis. Aggregation function will be applied on selected field after every interval time specified here.
    • If your data contains so many empty buckets i.e. data is sparse and you don’t want to consider it as anomalous check the checkbox named sparse data  (if it appears).
    • Click on Use full <index pattern=””> data to use all available data for analysis.</index>
    Metricbeats Description
    Server Description
    • Click on play symbol
    • Provide job name and description
    • Click on Create Job

    After creating job the data available will be analyzed. Click on view results, you will see a chart which will show the actual and upper & lower bound of predicted value. If actual value lies outside of the range, it will be considered as anomalous. The Color of the circles represents the severity level.

    Here we are getting a high range of prediction values since it just started learning. As we get more data the prediction will get better.
    You can see here predictions are pretty good since there is a lot of data to understand the pattern
    • Click on machine learning tab in the left panel. The jobs we created will be listed here.
    • You will see the list of actions for every job you have created.
    • Since we are storing every minute data for Job1 using metricbeat. We can feed the data to the job in real time. Click on play button to start data feed. As we get more and more data prediction will improve.
    • You see details of anomalies by clicking Anomaly Viewer.
    Anomaly in metricbeats data
    Server metrics anomalies  

    We have seen how machine learning can be used to get patterns among the different statistics along with anomaly detection. After identifying anomalies, it is required to find the context of those events. For example, to know about what other factors are contributing to the problem? In such cases, we can troubleshoot by creating multimetric jobs.