Blog

  • Building Scalable Front-end With Lerna, YARN And React In 60 Minutes

Beginnings are often messy. In any IT project, we often see that at a certain point people look forward to revamping things. Revamping brings additional cost and additional time, and both grow much larger if they are not addressed at the right time to meet the customers’ feature demands. Things like code organization, reusability, code cleanup, and documentation are often left unattended at the beginning, only for teams to realize later that they hold the key to faster development and quick delivery of requested features, as projects grow into platforms serving huge numbers of users.

In this blog post, we are going to look at how to build a scalable front-end platform with React and Lerna within an hour. The goal is an organized, modular architecture that makes existing products easy to maintain and lets us deliver new modules quickly as they arrive.

    Going Mono-Repo:

Most projects start with a single git/bitbucket repository and end up in chaos over time. With a mono-repo, we can make things more manageable. That being said, we will use Lerna, npm and YARN to initialize our monorepo.

    Prerequisite:

    npm, npx

    Installing YARN: Installation Guide

Installing npx (it ships with npm 5.2+, so the global install below is only needed on older setups)

    npm install -g npx

    Installing Lerna

    npm install --global lerna

    After this, we will have Lerna installed globally.

    Initializing a project with Lerna.

    mkdir startup_ui && cd startup_ui
    npx lerna init

    Let’s go through the files generated with it. So we have the following files:

     – package.json 

     – lerna.json

     – packages/

package.json is the same as any other npm package.json file. It specifies the name of the project and the usual things we define, such as husky for pre-commit hooks.

lerna.json is Lerna's configuration file. You can find more about Lerna configuration and supported options at Lerna concepts.

packages/ is the directory where all our modules will be defined, and Lerna will take care of referencing them in each other. Lerna simply creates symlinks for referenced local modules to make them available to other modules.

    To understand Lerna and its concepts, you can go through its official documentation.

    For better performance, we will go with YARN. So how do we configure YARN with Lerna?

    It’s pretty simple as shown below.

We just need to add "npmClient": "yarn" to lerna.json.

    Using YARN workspaces with Lerna

YARN workspaces are a quick way to get around the mess of `yarn link`, i.e. referencing one local module from another. Lerna already provides this, so why go with YARN workspaces?

The answer is the excellent bootstrapping time that YARN workspaces provide.

Here is how we can use YARN workspaces with Lerna. In the root package.json, add a "workspaces" entry pointing at the packages directory:

    {
      "name": "root",
      "private": true,
      "workspaces": [
        "packages/*"
      ],
      "devDependencies": {
        "lerna": "^3.20.2"
      }
    }

Now we just need to add "useWorkspaces": true to lerna.json, which ends up looking like this:

    {
      "packages": [
        "packages/*"
      ],
      "version": "0.0.0",
      "npmClient": "yarn",
      "useWorkspaces": true
    }

This will take care of linking the different modules of our UI platform that live under the packages folder.

Once this is done, we can proceed to bootstrap, which simply means explicitly telling Lerna to link the packages. As of now, we do not have anything under packages/, but we can run it to check that the setup bootstraps properly.

    yarn install
    lerna bootstrap

So much for the Lerna and YARN setup, but how should one really organize a UI platform so that it stays manageable and modular?

    What most React projects are made of

    1 – Component Libraries
    2 – Modules
    3 – Utils libraries
    4 – Abstractions over React-Redux ( Optional but most organizations go with it)

Components: Most organizations end up building their own component libraries, and it is crucial to keep them separated from the product codebase so the components stay reusable. Plenty of libraries already exist, but when you start building the product, you realize every library offers some things and misses others. Most commonly available libraries embody a standard UX design pattern; what if your designs stop fitting those at a later point? So we need a separate components library where we can maintain organization-specific components.

Modules: At first you may have one module or product, but over time it will grow. To avoid breaking existing modules over time, and to keep change lists smaller and limited to individual products, it is essential to split the monolith into multiple modules so that the chaos of each module stays contained without impacting other, stable modules.

Utils: These are common to any project. Almost every one of us ends up creating a utils folder to hold little functions, such as converting currency or formatting large numbers like 100000 as 100K. Most of these functions are shared across modules yet specific to the organization. For example, a company working with statistics will have a helper to convert large numbers into human-readable figures, and teams eventually end up copying the same code around. Keeping utils separate gives us a chance to avoid that duplication and to keep behavior consistent across different modules.
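As a minimal illustration of the kind of helper that belongs in such a package (the function name and thresholds here are just an example, not from the original project):

export const formatLargeNumber = (value) => {
  // 1500 -> "1.5K", 2000000 -> "2M"; fall back to the plain number below 1000
  if (value >= 1e6) return `${(value / 1e6).toFixed(1).replace(/\.0$/, '')}M`;
  if (value >= 1e3) return `${(value / 1e3).toFixed(1).replace(/\.0$/, '')}K`;
  return String(value);
};

Keeping something like this in common-utils means every module formats numbers the same way instead of copying the snippet around.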

    Abstractions over React-Redux: A lot of organizations prefer to do it. AWS, Microsoft Outlook, and many more have already adopted this strategy to abstract React & Redux bindings to create their own simplified functions to quickly bootstrap a new module/product into an existing ecosystem. This helps in faster delivery of new modules since developers don’t get into the same problems of bootstrapping and can focus on product problems rather than setting up the environment. 

One of the most simplified approaches to reducing boilerplate is presented at react-redux-patch. We will not go into depth in this article since it is a vast topic and a lot of people have their own opinions on how this should be built.

    Example:

    We will use create-react-app & create-react-library to create a base for our libraries and modules.

    Installing create-react-library globally.

    yarn global add create-react-library

    Creating a components library:

    create-react-library takes away the pain of complex configurations and enables us to create components and utility libraries with ease.

    cd packages
    create-react-library ui-components

    For starters, just create a simple button component in the library.

import React from 'react'

const buttonStyle = {
  padding: '10px 20px'
};

const Button = (props) => {
  return (
    <button
      className="btn btn-default"
      style={buttonStyle}
      onClick={props.onClick}
    >
      {props.label}
    </button>
  )
}

Button.defaultProps = {
  onClick: () => {},
  label: ''
};

export { Button };
export default Button;

    Similarly, we can create common-utils packages with the help of create-react-library.

    We will just define a simple formatDate function in the common-utils library.

    export const formatDate = (date) => {
      var d = new Date(date),
          month = '' + (d.getMonth() + 1),
          day = '' + d.getDate(),
          year = d.getFullYear();
    
      if (month.length < 2) 
          month = '0' + month;
      if (day.length < 2) 
          day = '0' + day;
    
      return [year, month, day].join('-');
    }

Now these two packages, ui-components and common-utils, are ready to be used in multiple projects.

    Let’s create a simple React app with create-react-app.

To link these libraries, simply run lerna bootstrap whenever you add a new local package to a module's package.json, and they will be ready to use.

Let's add the dependencies to our product's package.json:

    {
      "name": "product-one",
      "version": "0.1.0",
      "private": true,
      "dependencies": {
        "@testing-library/jest-dom": "^4.2.4",
        "@testing-library/react": "^9.3.2",
        "@testing-library/user-event": "^7.1.2",
        "react": "^16.13.1",
        "react-dom": "^16.13.1",
        "react-scripts": "3.4.1",
        "ui-components": "latest",
        "common-utils": "latest"
      },
      "scripts": {
        "start": "react-scripts start",
        "build": "react-scripts build",
        "test": "react-scripts test",
        "eject": "react-scripts eject"
      },
      "eslintConfig": {
        "extends": "react-app"
      },
      "browserslist": {
        "production": [
          ">0.2%",
          "not dead",
          "not op_mini all"
        ],
        "development": [
          "last 1 chrome version",
          "last 1 firefox version",
          "last 1 safari version"
        ]
      }
    }

Let's run lerna bootstrap once more so the new dependencies get linked.

Here is a simple example of how we can use them in App.js:

    import React from 'react';
    import logo from './logo.svg';
    import './App.css';
    import { Button } from 'ui-components';
    import { formatDate } from 'common-utils';
    
    function App() {
      return (
        <div className="App">
          <div>
            Sample product
            <Button label="A product button" />
            <span>
              Today is: {formatDate(new Date())}
            </span>
          </div>
        </div>
      );
    }
    
    export default App;

    Real manageable chaos:

    With this strategy, our codebase is now split, can be easily reused, and is modular.

– Every module is placed under packages/

– Product code is separated from utils and components

– Multiple teams can work on individual modules

– Impact analysis is easier than in a monolithic project

– Side effects of small changes are mostly limited to a single module

    All the code in this article is available at lerna-yarn-react-example

    How about publishing the packages?

Publishing packages to a private npm registry can be chaotic in a monorepo. Lerna provides a very simple approach to publishing.

    You will just need to add the following to package.json

    "publishConfig": {
    "registry": REGISTRY_URL_HERE
    }

    "publishConfig": {
       "registry": REGISTRY_URL_HERE
    }

    Now you can simply run

    lerna publish

It will try to publish the packages that have changed since the last release (the last tagged version).
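If you want to double-check what is going to be published first, Lerna can list the packages it considers changed since that last release:

lerna changed

This is optional, but it is a quick sanity check before cutting new versions.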

    Summary:

With Lerna and YARN, we can create an efficient front-end architecture that lets us deliver new features quickly with little impact on existing modules. With additional bootstrapping tools like a yeoman generator, along with abstractions over React and Redux, introducing new modules becomes a piece of cake. Over time, you can easily split these modules and components into individual repositories by publishing them to private npm registries. But for the initial chaos of getting things working, and for quick prototyping of your next big company's UI architecture, Lerna and YARN are perfectly suited tools!

  • Building A Scalable API Testing Framework With Jest And SuperTest

    Focus on API testing

    Before starting off, below listed are the reasons why API testing should be encouraged:

• Identifies bugs before they reach the UI
    • Effective testing at a lower level over high-level broad-stack testing
    • Reduces future efforts to fix defects
    • Time-saving

Well, QA practices are becoming more automation-centric with evolving requirements, but identifying the appropriate approach is the primary and most essential step. This implies choosing a framework or a tool to develop a test setup which should be:

    • Scalable 
    • Modular
    • Maintainable
    • Able to provide maximum test coverage
    • Extensible
    • Able to generate test reports
    • Easy to integrate with source control tool and CI pipeline

To attain that goal, why not develop your own asset rather than relying on ready-made tools like Postman, JMeter, or others? Let's have a look at why you should choose writing your own code over depending on the API testing tools available in the market:

    1. Customizable
    2. Saves you from the trap of limitations of a ready-made tool
    3. Freedom to add configurations and libraries as required and not really depend on the specific supported plugins of the tool
    4. No limit on the usage and no question of cost
5. Let's take Postman for example. If we go with Newman (the CLI of Postman), the effort keeps growing as requirements grow or change. Adding a new test requires editing in Postman, saving it to the collection, exporting it again and running the entire collection.json through Newman. Isn't it tedious to repeat the same process every time?

    We can overcome such annoyance and meet our purpose using a self-built Jest framework using SuperTest. Come on, let’s dive in!


    Why Jest?

    Jest is pretty impressive. 

    • High performance
    • Easy and minimal setup
    • Provides in-built assertion library and mocking support
    • Several in-built testing features without any additional configuration
    • Snapshot testing
    • Brilliant test coverage
• Allows interactive watch mode ( jest --watch or jest --watchAll )

    Hold on. Before moving forward, let’s quickly visit Jest configurations, Jest CLI commands, Jest Globals and Javascript async/await for better understanding of the coming content.

    Ready, set, go!

Create a node project jest-supertest locally and run npm init. In the workspace, we will install Jest, jest-stare for generating custom test reports, and jest-serial-runner to disable parallel execution (since our tests might depend on each other), and save these as dev dependencies.

    npm install jest jest-stare jest-serial-runner --save-dev

Add the following to the scripts block in our package.json:

    
    "scripts": {
        "test": "NODE_TLS_REJECT_UNAUTHORIZED=0 jest --reporters default jest-stare --coverage --detectOpenHandles --runInBand --testTimeout=60000",
        "test:watch": "jest --verbose --watchAll"
      }

The npm run test command will invoke the test script with the following:

• NODE_TLS_REJECT_UNAUTHORIZED=0: ignores SSL certificate errors
• jest: runs the framework with the configurations defined under the jest block
• --reporters default jest-stare: uses the default reporter plus jest-stare
• --coverage: invokes test coverage
• --detectOpenHandles: for debugging
• --runInBand: serial execution of Jest tests
• --forceExit: to shut down cleanly (add it to the script above if you need it)
• --testTimeout=60000: custom timeout (the default is 5000 milliseconds)

    Jest configurations:

    [Note: This is customizable as per requirements]

    "jest": {
        "verbose": true,
        "testSequencer": "/home/abc/jest-supertest/testSequencer.js",
        "coverageDirectory": "/home/abc/jest-supertest/coverage/my_reports/",
        "coverageReporters": ["html","text"],
        "coverageThreshold": {
          "global": {
            "branches": 100,
            "functions": 100,
            "lines": 100,
            "statements": 100
          }
        }
      }

    testSequencer: to invoke testSequencer.js in the workspace to customize the order of running our test files

    touch testSequencer.js

    Below code in testSequencer.js will run our test files in alphabetical order.

    const Sequencer = require('@jest/test-sequencer').default;
    
    class CustomSequencer extends Sequencer {
      sort(tests) {
        // Test structure information
        // https://github.com/facebook/jest/blob/6b8b1404a1d9254e7d5d90a8934087a9c9899dab/packages/jest-runner/src/types.ts#L17-L21
        const copyTests = Array.from(tests);
        return copyTests.sort((testA, testB) => (testA.path > testB.path ? 1 : -1));
      }
    }
    
    module.exports = CustomSequencer;

    • verbose: to display individual test results
    • coverageDirectory: creates a custom directory for coverage reports
    • coverageReporters: format of reports generated
    • coverageThreshold: minimum and maximum threshold enforcements for coverage results

    Testing endpoints with SuperTest

SuperTest is a superagent-driven Node library for extensively testing RESTful web services. It hits the HTTP server to send requests (GET, POST, PATCH, PUT, DELETE) and fetch responses.

    Install SuperTest and save it as a dependency.

    npm install supertest --save-dev

    "devDependencies": {
        "jest": "^25.5.4",
        "jest-serial-runner": "^1.1.0",
        "jest-stare": "^2.0.1",
        "supertest": "^4.0.2"
      }

    All the required dependencies are installed and our package.json looks like:

    {
      "name": "supertestjest",
      "version": "1.0.0",
      "description": "",
      "main": "index.js",
      "jest": {
        "verbose": true,
        "testSequencer": "/home/abc/jest-supertest/testSequencer.js",
        "coverageDirectory": "/home/abc/jest-supertest/coverage/my_reports/",
        "coverageReporters": ["html","text"],
        "coverageThreshold": {
          "global": {
            "branches": 100,
            "functions": 100,
            "lines": 100,
            "statements": 100
          }
        }
      },
      "scripts": {
        "test": "NODE_TLS_REJECT_UNAUTHORIZED=0 jest --reporters default jest-stare --coverage --detectOpenHandles --runInBand --testTimeout=60000",
        "test:watch": "jest --verbose --watchAll"
      },
      "author": "",
      "license": "ISC",
      "devDependencies": {
        "jest": "^25.5.4",
        "jest-serial-runner": "^1.1.0",
        "jest-stare": "^2.0.1",
        "supertest": "^4.0.2"
      }
    }

    Now we are ready to create our Jest tests with some defined conventions:

• a describe block groups multiple tests (it blocks)
• a test block (aliased as it) holds a single test
• expect() performs assertions

Jest picks up test files

• with a .test.js extension
• with a .spec.js extension

as well as anything under a __tests__/ folder by default (see the minimal sketch below).
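To make these conventions concrete, here is a minimal, self-contained sketch of a test (it targets the public reqres.in demo API that the rest of this post also uses; it is not part of the framework we build below):

const supertest = require('supertest');
const request = supertest('https://reqres.in');

describe('Smoke test', () => {
  it('fetches a single user', async () => {
    const response = await request.get('/api/users/2').expect(200); // assert on the status code
    expect(response.body.data).toBeDefined(); // assert on the response body
  });
});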

    Here is a reference app for API tests.

Let's write commonTests.js, which will be required by every test file. It hits the app through SuperTest, logs in (if required) and saves the authorization token. The aliases are exported from here to be used in all the tests.

    [Note: commonTests.js, be created or not, will vary as per the test requirements]

    touch commonTests.js

var supertest = require('supertest'); //require supertest
const request = supertest('https://reqres.in/'); //supertest hits the HTTP server (your app)

/*
This piece of code gets the authorization token after logging in to your app.
let token;
test("Login to the application", function(){
    return request.post(``).then((response)=>{
        token = response.body.token  //save the login token for further requests
    })
});
*/

module.exports =
{
    request
        //, token     -- export it if a token is generated
}

Let's move forward to writing our tests for POST, GET, PUT and DELETE requests to get a basic understanding of the setup. We create two test files so we can also see that the sequencer works.

    mkdir __test__/
    touch __test__/postAndGet.test.js __test__/putAndDelete.test.js

Sticking to the Jest conventions mentioned above, here are our tests.

    postAndGet.test.js test file:

• requires commonTests.js into a ‘request’ alias
• POSTs to the api/users endpoint by calling supertest's post()
• GETs the api/users endpoint by calling supertest's get()
• uses the file system to write globals and read them across all the tests
• validates the response returned on hitting the HTTP endpoints

const request = require('../commonTests');
    const fs = require('fs');
    let userID;
    
    //Create a new user
    describe("POST request", () => {
      
      try{
        let userDetails;
        beforeEach(function () {  
            console.log("Input user details!")
            userDetails = {
              "name": "morpheus",
              "job": "leader"
          }; //new user details to be created
          });
        
        afterEach(function () {
          console.log("User is created with ID : ", userID)
        });
    
    	  it("Create user data", async done => {
    
        await request.request.post(`api/users`) //post() of supertest
                //.set('Authorization', `Token ${request.token}`) //Authorization token
                .send(userDetails) //request body
                .expect(201) //response status should be 201
                    .then((res) => {
                        expect(res.body).toBeDefined(); //test if response body is defined
                        //expect(res.body.status).toBe("success")
                        userID = res.body.id;
                        let jsonContent = JSON.stringify({userId: res.body.id}); // create a json
                        fs.writeFile("data.json", jsonContent, 'utf8', function (err) //write user id into global json file to be used 
                        {
                        if (err) {
                            return console.log(err);
                        }
                        console.log("POST response body : ", res.body)
                        done();
                        });
                      })
                    })
                  }
                  catch(err){
                    console.log("Exception : ", err)
                  }
            });
    
    //GET all users      
    describe("GET all user details", () => {
      
      try{
          beforeEach(function () {
            console.log("GET all users details ")
        });
              
          afterEach(function () {
            console.log("All users' details are retrieved")
        });
    
          test("GET user output", async done =>{
            await request.request.get(`api/users`) //get() of supertest
                                    //.set('Authorization', `Token ${request.token}`) 
                                    .expect(200).then((response) =>{
                                    console.log("GET RESPONSE : ", response.body);
                                    done();
                        })
          })
        }
      catch(err){
        console.log("Exception : ", err)
        }
    });

    putAndDelete.test.js file:

• requires commonTests.js into a ‘request’ alias
• reads data.json into a ‘data’ alias, the file created by our previous test to hold global variables
• PUTs to the api/users/${data.userId} endpoint by calling supertest's put()
• DELETEs the api/users/${data.userId} endpoint by calling supertest's delete()
• validates the responses returned by the endpoints
• removes data.json (similar to unsetting global variables) after all the tests are done

const request = require('../commonTests');
    const fs = require('fs'); //file system
    const data = require('../data.json'); //data.json containing the global variables
    
    //Update user data
    describe("PUT user details", () => {
    
        try{
            let newDetails;
            beforeEach(function () {
                console.log("Input updated user's details");
                newDetails = {
                    "name": "morpheus",
                    "job": "zion resident"
                }; // details to be updated
      
            });
            afterEach(function () {
                console.log("user details are updated");
            });
      
            test("Update user now", async done =>{
    
                console.log("User to be updated : ", data.userId)
    
                const response = await request.request.put(`api/users/${data.userId}`).send(newDetails) //call put() of supertest
                                    //.set('Authorization', `Token ${request.token}`) 
                                            .expect(200)
                expect(response.body.updatedAt).toBeDefined();
                console.log("UPDATED RESPONSE : ", response.body);
                done();
        })
      }
        catch(err){
            console.log("ERROR : ", err)
        }
    });
    
    //DELETE the user
    describe("DELETE user details", () =>{
        try{
            beforeAll(function (){
                console.log("To delete user : ", data.userId)
            });
    
            test("Delete request", async done =>{
                const response = await request.request.delete(`api/users/${data.userId}`) //invoke delete() of supertest
                                            .expect(204) 
                console.log("DELETE RESPONSE : ", response.body);
                done(); 
            });
    
            afterAll(function (){
                console.log("user is deleted!!")
                fs.unlinkSync('data.json'); //remove data.json after all tests are run
            });
        }
    
        catch(err){
            console.log("EXCEPTION : ", err);
        }
    });

And we are done setting up a decent framework; running it is just a command away!

    npm test

    Once complete, the test results will be immediately visible on the terminal.

An HTML report of the test results is also generated as index.html under jest-stare/.

    And test coverage details are created under coverage/my_reports/ in the workspace.

Similarly, other HTTP methods can also be tested: OPTIONS via supertest.options() (useful when dealing with CORS), PATCH via supertest.patch(), HEAD via supertest.head(), and many more. A small sketch follows below.
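As a rough sketch of what such a test could look like (again against the reqres.in demo endpoints; the assertion is illustrative):

const supertest = require('supertest');
const request = supertest('https://reqres.in/');

test('PATCH user details', async () => {
  const response = await request
    .patch('api/users/2')        // patch() of supertest
    .send({ name: 'neo' })       // request body with the fields to change
    .expect(200);                // reqres echoes back an updatedAt timestamp
  expect(response.body.updatedAt).toBeDefined();
});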

    Wasn’t it a convenient and successful journey?

    Conclusion

So, wrapping up with a note that API testing needs attention. As QA engineers, let's abide by the testing pyramid, which is really a tester's mindset: combat issues at the lower levels to avoid chaos at the upper levels, i.e. the UI.

    Testing Pyramid

    I hope you had a good read. Kindly spread the word. Happy coding!

  • Setting up S3 & CloudFront to Deliver Static Assets Across the Web

    If you have a web application, you probably have static content. Static content might include files like images, videos, and music. One of the simpler approaches to serve your content on the internet is Amazon AWS’s “S3 Bucket.” S3 is very easy to set up and use.

    Problems with only using S3 to serve your resources

But there are a few limitations to serving content directly from S3. You will need to either:

• keep the bucket public, which is not at all recommended, or
• create pre-signed URLs to access the private resources. If your application has tons of resources to load, pre-signing each and every resource before serving it on the UI adds a lot of latency (a sketch of what that looks like follows below).
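For reference, per-object pre-signing with the AWS SDK for JavaScript looks roughly like this (the bucket and key names below are hypothetical):

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// every single asset needs its own signed URL, generated again once it expires
const url = s3.getSignedUrl('getObject', {
  Bucket: 'my-app-assets',    // hypothetical bucket name
  Key: 'images/sample.jpeg',  // hypothetical object key
  Expires: 60,                // validity of the URL in seconds
});
console.log(url);

Doing this for every asset on a page is exactly the latency and bookkeeping overhead we want to avoid.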

    For these reasons, we will also use AWS’s CloudFront.

    Why use CloudFront with S3?

Amazon CloudFront (a CDN) is designed to work seamlessly with S3 to serve your S3 content faster. Using CloudFront to serve S3 content also gives you a lot more flexibility and control.

It has the following advantages:

• CloudFront handles viewer authentication (via signed URLs or cookies, as we will see below), so there is no need to generate a pre-signed S3 URL for each resource.
• Improved latency, which results in a better end-user experience.
• CloudFront provides caching, which can reduce running costs since content is not always served from S3 when cached.
• Another case for using CloudFront over S3 is that you can attach an SSL certificate to a custom domain in CloudFront.

    Setting up S3 & CloudFront

    Creating an S3 bucket

    1. Navigate to S3 from the AWS console and click on Create Bucket. Enter a unique bucket name and select the AWS Region.

2. Make sure the Block Public Access settings for this bucket are set to “Block All Public Access,” as recommended, since we don't need public access to the bucket.

    3. Review other options and create a bucket. Once a bucket is created, you can see it on the S3 dashboard. Open the bucket to view its details, and next, let’s add some assets.

    4. Click on upload and add/drag all the files or folders you want to upload. 

    5. Review the settings and upload. You can see the status on successful upload. Go to bucket details, and, after opening up the uploaded asset, you can see the details of the uploaded asset.

    If you try to copy the object URL and open it in the browser, you will get the access denied error as we have blocked direct public access. 

We will use CloudFront to serve the S3 assets in the next step. CloudFront will restrict access to your S3 bucket to CloudFront endpoints, making your content and application more secure and performant.

Creating a CloudFront distribution

    1. Navigate to CloudFront from AWS console and click on Create Distribution. For the Origin domain, select the bucket from which we want to serve the static assets.

2. Next, we need to use a CloudFront origin access identity (OAI) to access the S3 bucket. This will enable us to access private S3 content via CloudFront. To enable it, under S3 bucket access, select “Yes use OAI.” Select an existing origin access identity or create a new one.
You can also choose to update the S3 bucket policy to allow read access to the OAI if it is not already configured.

    3. Review all the settings and create distribution. You can see the domain name once it is successfully created.

4. The basic setup is done. If you try to access the asset we uploaded via the CloudFront domain in your browser, it should now serve the asset. You can access assets at {cloudfront domain name}/{s3 asset},
e.g. https://d1g71lhh75winl.cloudfront.net/sample.jpeg

Even though we successfully served the assets via CloudFront, note that all the assets are still publicly accessible and not secured. In the next section, we will see how you can secure your CloudFront assets.

    Restricting public access

    Previously, while configuring CloudFront, we set Restrict Viewer access to No, which enabled us to access the assets publicly.

    Let’s see how to configure CloudFront to enable signed URLs for assets that should have restricted access. We will be using Trusted key groups, which is the AWS recommended way for restricting access.

    Creating key group

    To create a key pair for a trusted key group, perform the following steps:

    1. Creating the public–private key pair.

    The below commands will generate an RSA key pair and will store the public key & private key in public_key.pem & private_key.pem files respectively.

    openssl genrsa -out private_key.pem 2048
    openssl rsa -pubout -in private_key.pem -out public_key.pem

    Note: The above steps use OpenSSL as an example to create a key pair. There are other ways to create an RSA key pair as well.
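As one such alternative, here is a sketch using Node's built-in crypto module (it produces the same PEM formats as the OpenSSL commands above: an SPKI public key and a PKCS#1 private key):

const { generateKeyPairSync } = require('crypto');
const fs = require('fs');

const { publicKey, privateKey } = generateKeyPairSync('rsa', {
  modulusLength: 2048,
  publicKeyEncoding: { type: 'spki', format: 'pem' },   // matches "openssl rsa -pubout"
  privateKeyEncoding: { type: 'pkcs1', format: 'pem' },  // matches "openssl genrsa"
});

fs.writeFileSync('public_key.pem', publicKey);
fs.writeFileSync('private_key.pem', privateKey);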

    2. Uploading the Public Key to CloudFront.

To upload, open the CloudFront console in the AWS console and navigate to Public Key. Choose Create Public Key, add a name, and paste the contents of the public_key.pem file under Key. Once done, click Create Public Key.

    3. Adding the public key to a Key Group.

    To do this, navigate to Key Groups. Add name and select the public key we created. Once done, click Create Key Group.

    Adding key group signer to distribution

    1. Navigate to CloudFront and choose the distribution whose files you want to protect with signed URLs or signed cookies.
    2. Navigate to the Behaviors tab. Select the cache behavior, and then choose Edit.
    3. For Restrict Viewer Access (Use Signed URLs or Signed Cookies), choose Yes and choose Trusted Key Groups.
    4. For Trusted Key Groups, select the key group, and then choose Add.
    5. Once done, review and Save Changes.

Cheers, you have successfully restricted public access to assets. If you try to open any asset URL in the browser now, you will get an error response (access denied) instead of the content.

You can create either signed URLs or signed cookies using the private key to access the assets.

    Setting cookies and accessing CloudFront private urls

You need to create and set cookies on the domain to access your content. Once the cookies are set, they will be sent along with every request by the browser.

    The cookies to be set are:

• CloudFront-Policy: Your policy statement in JSON format, with white space removed, then base64 encoded.
• CloudFront-Signature: A version of the JSON policy statement that is hashed, signed with the private key, and base64 encoded.
• CloudFront-Key-Pair-Id: The ID of a CloudFront public key, e.g. K4EGX7PEAN4EN. The public key ID tells CloudFront which public key to use to validate the signed URL or cookie.

    Please note that the cookie names are case-sensitive. Make sure cookies are http only and secure.

    Set-Cookie: 
    CloudFront-Policy=base64 encoded version of the policy statement; 
    Domain=optional domain name; 
    Path=/optional directory path; 
    Secure; 
    HttpOnly
    
    
    Set-Cookie: 
    CloudFront-Signature=hashed and signed version of the policy statement; 
    Domain=optional domain name; 
    Path=/optional directory path; 
    Secure; 
    HttpOnly
    
    Set-Cookie: 
    CloudFront-Key-Pair-Id=public key ID for the CloudFront public key whose corresponding private key you're using to generate the signature; 
    Domain=optional domain name; 
    Path=/optional directory path; 
    Secure; 
    HttpOnly

Cookies can be created in whatever language you are working in, with the help of the AWS SDK. For this blog, we will create the cookies in Python using the botocore module.

    import functools
    
    import rsa
    from botocore.signers import CloudFrontSigner
    
    CLOUDFRONT_RESOURCE = # IN format "{protocol}://{domain}/{resource}" for e.g. "https://d1g71lhh75winl.cloudfront.net/*"
    CLOUDFRONT_PUBLIC_KEY_ID = # The ID for a CloudFront public key
    CLOUDFRONT_PRIVATE_KEY = # contents of the private_key.pem file associated to public key e.g. open('private_key.pem','rb').read()
    EXPIRES_AT = # Enter datetime for expiry of cookies e.g.: datetime.datetime.now() + datetime.timedelta(hours=1)
    
    # load the private key
    key = rsa.PrivateKey.load_pkcs1(CLOUDFRONT_PRIVATE_KEY)
    # create a signer function that can sign message with the private key
    rsa_signer = functools.partial(rsa.sign, priv_key=key, hash_method="SHA-1")
    # Create a CloudFrontSigner boto3 object
    signer = CloudFrontSigner(CLOUDFRONT_PUBLIC_KEY_ID, rsa_signer)
    
    # build the CloudFront Policy
    policy = signer.build_policy(CLOUDFRONT_RESOURCE, EXPIRES_AT).encode("utf8")
    CLOUDFRONT_POLICY = signer._url_b64encode(policy).decode("utf8")
    
    # create CloudFront Signature
    signature = rsa_signer(policy)
    CLOUDFRONT_SIGNATURE = signer._url_b64encode(signature).decode("utf8")
    
    # you can set this cookies on response
    COOKIES = {
        "CloudFront-Policy": CLOUDFRONT_POLICY,
        "CloudFront-Signature": CLOUDFRONT_SIGNATURE,
        "CloudFront-Key-Pair-Id": CLOUDFRONT_PUBLIC_KEY_ID,
    }

    For more details, you can follow AWS official docs.

    Once you set cookies using the above guide, you should be able to access the asset.

    This is how you can effectively use CloudFront along with S3 to securely serve your content.

  • Real Time Text Classification Using Kafka and Scikit-learn

    Introduction:

Text classification is one of the essential tasks in supervised machine learning (ML). Assigning categories to text (tweets, Facebook posts, web pages, library books, media articles, and so on) has many applications such as spam filtering and sentiment analysis. In this blog, we build a text classification engine to classify topics in an incoming Twitter stream using Apache Kafka and scikit-learn, a Python-based machine learning library.

Let's dive into the details. Here is how the components and data flow fit together. The Kafka producer ingests data from Twitter and sends it to the Kafka broker. The Kafka consumer asks the Kafka broker for the tweets. We convert the binary tweet stream from Kafka into human-readable strings and perform predictions using saved models. We train the models on Twenty Newsgroups, a prebuilt training dataset from scikit-learn that is a standard data set for training classification algorithms.

In this blog we will use a Multinomial Naive Bayes classifier, fed with bag-of-words and tf-idf features.

We have used the following libraries/tools:

    • tweepy – Twitter library for python
    • Apache Kafka
    • scikit-learn
    • pickle – Python Object serialization library

    Let’s first understand the following key concepts:

    • Word to Vector Methodology (Word2Vec)
    • Bag-of-Words
    • tf-idf
    • Multinomial Naive Bayes classifier

    Word2Vec methodology

    One of the key ideas in Natural Language Processing(NLP) is how we can efficiently convert words into numeric vectors which can then be given as an input to machine learning models to perform predictions.

Neural networks, or any other machine learning models, are essentially mathematical functions that need numbers or vectors to churn out an output (tree-based methods are the exception, since they can work on categorical values directly).

For this we have an approach known as Word2Vec. A very trivial solution would be the “one-hot” method of converting each word into a sparse vector with only one element set to 1 and the rest zero.

For example, “the apple a day the good” would have the following representation.
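With the vocabulary {the, apple, a, day, good}, it would look roughly like this (one row per token, one column per vocabulary word):

         the  apple  a  day  good
the       1     0    0   0    0
apple     0     1    0   0    0
a         0     0    1   0    0
day       0     0    0   1    0
the       1     0    0   0    0
good      0     0    0   0    1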

Here we have transformed the above sentence into a 6×5 matrix, with 5 being the size of the vocabulary since “the” is repeated. But what are we supposed to do when we have a gigantic dictionary to learn from, say more than 100,000 words? Here one-hot encoding fails. It also loses the relationship between words, for example that “Lanka” should come after “Sri”.

    Here is where Word2Vec comes in. Our goal is to vectorize the words while maintaining the context. Word2vec can utilize either of two model architectures to produce a distributed representation of words: continuous bag-of-words (CBOW) or continuous skip-gram. In the continuous bag-of-words architecture, the model predicts the current word from a window of surrounding context words. The order of context words does not influence prediction (bag-of-words assumption). In the continuous skip-gram architecture, the model uses the current word to predict the surrounding window of context words. 

    Tf-idf (term frequency–inverse document frequency)

TF-IDF is a statistic that determines how important a word is to a document in a given corpus. Variations of tf-idf are used by search engines, for text summarization, etc. You can read more about tf-idf here.
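As a quick reference, a common formulation is tf-idf(t, d) = tf(t, d) × log(N / df(t)), where tf(t, d) is how often term t occurs in document d, N is the number of documents in the corpus, and df(t) is the number of documents containing t (scikit-learn's TfidfTransformer uses a smoothed variant of this).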

    Multinomial Naive Bayes classifier

The Naive Bayes classifier comes from a family of probabilistic classifiers based on Bayes' theorem. It is commonly used to classify spam vs. not spam, sports vs. politics, and so on. We are going to use it to classify the incoming stream of tweets. You can explore it here.

Let's see how they fit together.

The data from the “20 newsgroups” dataset is entirely in text format. We cannot feed it directly to any model for mathematical calculations. We have to extract features from the dataset and convert them into numbers that a model can ingest to produce an output.
So, we use the continuous bag-of-words representation and tf-idf to extract features from the dataset and then feed them to a Multinomial Naive Bayes classifier to get predictions.

    1. Train Your Model

We are going to use this dataset. We create another file and import the needed libraries. We are using sklearn for ML and pickle to save the trained model. Now we define the model.

from __future__ import division, print_function, absolute_import
from sklearn.datasets import fetch_20newsgroups  # built-in dataset
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
import pickle
from kafka import KafkaConsumer

# Defining model and training it
categories = ["talk.politics.misc", "misc.forsale", "rec.motorcycles",
              "comp.sys.mac.hardware", "sci.med", "talk.religion.misc"]  # http://qwone.com/~jason/20Newsgroups/ for reference

def fetch_train_dataset(categories):
    twenty_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)
    return twenty_train

def bag_of_words(categories):
    count_vect = CountVectorizer()
    X_train_counts = count_vect.fit_transform(fetch_train_dataset(categories).data)
    pickle.dump(count_vect.vocabulary_, open("vocab.pickle", 'wb'))
    return X_train_counts

def tf_idf(categories):
    tf_transformer = TfidfTransformer()
    return (tf_transformer, tf_transformer.fit_transform(bag_of_words(categories)))

def model(categories):
    clf = MultinomialNB().fit(tf_idf(categories)[1], fetch_train_dataset(categories).target)
    return clf

model = model(categories)
pickle.dump(model, open("model.pickle", 'wb'))
print("Training Finished!")
# Training Finished Here

    2. The Kafka Tweet Producer

We have the trained model in place. Now let's get the real-time Twitter stream via Kafka. We define the producer.

    # import required libraries
    from kafka import SimpleProducer, KafkaClient
    from tweepy.streaming import StreamListener
    from tweepy import OAuthHandler
    from tweepy import Stream
    from twitter_config import consumer_key, consumer_secret, access_token, access_token_secret
    import json

Now we will define the Kafka settings and create the KafkaPusher class. This is necessary because we need to send the data coming from the tweepy stream to the Kafka producer.

# Kafka settings
topic = b'twitter-stream'

# setting up Kafka producer
kafka = KafkaClient('localhost:9092')
producer = SimpleProducer(kafka)

class KafkaPusher(StreamListener):

    def on_data(self, data):
        all_data = json.loads(data)
        tweet = all_data["text"]
        producer.send_messages(topic, tweet.encode('utf-8'))
        return True

    def on_error(self, status):
        print(status)

WORDS_TO_TRACK = ["Politics", "Apple", "Google", "Microsoft", "Bikes", "Harley Davidson", "Medicine"]

if __name__ == '__main__':
    l = KafkaPusher()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    while True:
        try:
            stream.filter(languages=["en"], track=WORDS_TO_TRACK)
        except:
            pass

Note: You need to start the Kafka server before running this script.

    3. Loading your model for predictions

Now we have the trained model from step 1 and a Twitter stream from step 2. Let's use the model to make actual predictions. The first step is to load the model:

#Loading model and vocab
print("Loading pre-trained model")
vocabulary_to_load = pickle.load(open("vocab.pickle", 'rb'))
count_vect = CountVectorizer(vocabulary=vocabulary_to_load)
load_model = pickle.load(open("model.pickle", 'rb'))
count_vect._validate_vocabulary()
tfidf_transformer = tf_idf(categories)[0]

    Then we start the kafka consumer and begin predictions:

#predicting the streaming kafka messages
consumer = KafkaConsumer('twitter-stream', bootstrap_servers=['localhost:9092'])
print("Starting ML predictions.")
for message in consumer:
    X_new_counts = count_vect.transform([message.value])
    X_new_tfidf = tfidf_transformer.transform(X_new_counts)
    predicted = load_model.predict(X_new_tfidf)
    print(message.value + " => " + fetch_train_dataset(categories).target_names[predicted[0]])

Following are some of the classifications done by our model:

    • RT @amazingatheist: Making fun of kids who survived a school shooting just days after the event because you disagree with their politics is… => talk.politics.misc
    • sci.med
    • RT @DavidKlion: Apropos of that D’Souza tweet; I think in order to make sense of our politics, you need to understand that there are some t… => talk.politics.misc
    • RT @BeauWillimon: These students have already cemented a place in history with their activism, and they’re just getting started. No one wil… => talk.politics.misc
    • RT @byedavo: Cause we ain’t got no president => talk.politics.misc
    • RT @appleinsider: .@Apple reportedly in talks to buy cobalt, key Li-ion battery ingredient, directly from miners … => comp.sys.mac.hardware

    Here is the link to the complete git repository

    Conclusion:

In this blog, we successfully created a data pipeline that uses a Naive Bayes model to classify streaming Twitter data. The same approach can classify other sources of data like news articles, blog posts, etc. Do let us know if you have any questions, queries or additional thoughts in the comments section below.

    Happy coding!

  • Build and Deploy a Real-Time React App Using AWS Amplify and GraphQL

    GraphQL is becoming a popular way to use APIs in modern web and mobile apps.

    However, learning new things is always time-consuming and without getting your hands dirty, it’s very difficult to understand the nuances of a new technology.

So, we have put together a powerful and concise tutorial that will guide you through setting up a GraphQL backend and integrating it into your React app in the shortest time possible. This tutorial is light on opinions, so that once you get the hang of the fundamentals, you can go on and tailor your workflow.

    Key topics and takeaways:

    • Authentication
    • GraphQL API with AWS AppSync
    • Hosting
    • Working with multiple environments
    • Removing services

    What will we be building?

    We will build a basic real-time Restaurant CRUD app using authenticated GraphQL APIs. Click here to try the deployed version of the app to see what we’ll be building.

    Will this tutorial teach React or GraphQL concepts as well?

    No. The focus is to learn how to use AWS Amplify to build cloud-enabled, real-time web applications. If you are new to React or GraphQL, we recommend going through the official documentation and then coming back here.

    What do I need to take this tutorial?

    • Node >= v10.9.0
    • NPM >= v6.9.0 packaged with Node.

    Getting started – Creating the application

    To get started, we first need to create a React project using the create-react-app boilerplate:

    npx create-react-app amplify-app --typescript
    cd amplify-app

    Let’s now install the AWS Amplify and AWS Amplify React bindings and try running the application:

    npm install --save aws-amplify aws-amplify-react
    npm start

If you have initialized the app with TypeScript and see errors while using aws-amplify-react, add an aws-amplify-react.d.ts file to src with:

    declare module 'aws-amplify-react';

    Installing the AWS Amplify CLI and adding it to the project

    To install the CLI:

    npm install -g @aws-amplify/cli

    Now we need to configure the CLI with our credentials:

    amplify configure

    If you’d like to see a video walkthrough of this process, click here

    Here we’ll walk you through the amplify configure setup. After you sign in to the AWS console, follow these steps:

    • Specify the AWS region: ap-south-1 (Mumbai) <Select the region based on your location. Click here for reference>
• Specify the username of the new IAM user: amplify-app <name of your app>

    In the AWS Console, click Next: Permissions, Next: Tags, Next: Review, and Create User to create the new IAM user. Then, return to the command line and press Enter.

    • Enter the credentials of the newly created user:
  accessKeyId: <your_access_key_id>
  secretAccessKey: <your_secret_access_key>
    • Profile Name: default

    To view the newly created IAM user, go to the dashboard. Also, make sure that your region matches your selection.

    To add amplify to your project:

    amplify init

    Answer the following questions:

• Enter a name for the project: amplify-app <name of your app>
• Enter a name for the environment: dev <name of your environment>
• Choose your default editor: Visual Studio Code <your default editor>
• Choose the type of app that you're building: javascript
    • What JavaScript framework are you using: React
    • Source Directory Path: src
    • Distribution Directory Path: build
    • Build Command: npm run build (for macOS/Linux), npm.cmd run-script build (for Windows)
    • Start Command: npm start (for macOS/Linux), npm.cmd run-script start (for Windows)
    • Do you want to use an AWS profile: Yes
    • Please choose the profile you want to use: default

    Now, the AWS Amplify CLI has initialized a new project and you will see a new folder: amplify. This folder has files that hold your project configuration.

    <amplify-app>
    |_ amplify
    |_ .config
    |_ #current-cloud-backend
    |_ backend
    team-provider-info.json

    Adding Authentication

    To add authentication:

    amplify add auth

    When prompted, choose:

    • Do you want to use default authentication and security configuration: Default configuration
    • How do you want users to be able to sign in when using your Cognito User Pool: Username
    • What attributes are required for signing up: Email

    Now, let’s run the push command to create the cloud resources in our AWS account:

    amplify push

    To quickly check your newly created Cognito User Pool, you can run

    amplify status

    To access the AWS Cognito Console at any time, go to the dashboard. Also, ensure that your region is set correctly.

    Now, our resources are created and we can start using them.

    The first thing is to connect our React application to our new AWS Amplify project. To do this, reference the auto-generated aws-exports.js file that is now in our src folder.

    To configure the app, open App.tsx and add the following code below the last import:

    import Amplify from 'aws-amplify';
    import awsConfig from './aws-exports';
    
    Amplify.configure(awsConfig);

    Now, we can start using our AWS services.
    To add the Authentication flow to the UI, export the app component by wrapping it with the authenticator HOC:

    import { withAuthenticator } from 'aws-amplify-react';
    ...
    // app component
    ...
    export default withAuthenticator(App);

    Now, let’s run the app to check if an Authentication flow has been added before our App component is rendered.

    This flow gives users the ability to sign up and sign in. To view any users that were created, go back to the Cognito dashboard. Alternatively, you can also use:

    amplify console auth

    The withAuthenticator HOC is a really easy way to get up and running with authentication, but in a real-world application, we probably want more control over how our form looks and functions. We can use the aws-amplify/Auth class to do this. This class has more than 30 methods including signUp, signIn, confirmSignUp, confirmSignIn, and forgotPassword. These functions return a promise, so they need to be handled asynchronously.
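As a small sketch of what calling the Auth class directly can look like (the surrounding form state and error handling are up to you):

import { Auth } from 'aws-amplify';

const signIn = async (username: string, password: string) => {
  try {
    // resolves with the signed-in Cognito user on success
    const user = await Auth.signIn(username, password);
    console.log('Signed in as', user.getUsername());
  } catch (err) {
    // e.g. wrong credentials or an unconfirmed account
    console.error('Sign-in failed', err);
  }
};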

    Adding and Integrating the GraphQL API

    To add GraphQL API, use the following command:

    amplify add api

    Answer the following questions:

    • Please select from one of the below mentioned services: GraphQL
    • Provide API name: RestaurantAPI
    • Choose an authorization type for the API: API key
    • Do you have an annotated GraphQL schema: No
    • Do you want a guided schema creation: Yes
    • What best describes your project: Single object with fields (e.g., “Todo” with ID, name, description)
    • Do you want to edit the schema now: Yes

    When prompted, update the schema to the following:

    type Restaurant @model {
      id: ID!
      name: String!
      description: String!
      city: String!
    }

    Next, let’s run the push command to create the cloud resources in our AWS account:

    amplify push

    • Are you sure you want to continue: Yes
    • Do you want to generate code for your newly created GraphQL API: Yes
    • Choose the code generation language target: typescript
    • Enter the file name pattern of graphql queries, mutations and subscriptions: src/graphql/**/*.ts
    • Do you want to generate/update all possible GraphQL operations – queries, mutations and subscriptions: Yes
    • Enter maximum statement depth [increase from default if your schema is deeply nested]: 2
    • Enter the file name for the generated code: src/API.ts

    Notice your GraphQL endpoint and API KEY. This step has created a new AWS AppSync API and generated the GraphQL queries, mutations, and subscriptions on your local. To check, see src/graphql or visit the AppSync dashboard. Alternatively, you can use:

    amplify console api

    Please select from one of the below mentioned services: GraphQL

    Now, in the AppSync console, on the left side click on Queries. Execute the following mutation to create a restaurant in the API:

    mutation createRestaurant {
      createRestaurant(input: {
        name: "Nobu"
        description: "Great Sushi"
        city: "New York"
      }) {
        id name description city
      }
    }

    Now, let’s query for the restaurant:

    query listRestaurants {
      listRestaurants {
        items {
          id
          name
          description
          city
        }
      }
    }

    We can even search / filter data when querying:

    query searchRestaurants {
      listRestaurants(filter: {
        city: {
          contains: "New York"
        }
      }) {
        items {
          id
          name
          description
          city
        }
      }
    }

    Now that the GraphQL API is created, we can begin interacting with it from our client application. Here is how we’ll add queries, mutations, and subscriptions:

    import Amplify, { API, graphqlOperation } from 'aws-amplify';
    import { withAuthenticator } from 'aws-amplify-react';
    import React, { useEffect, useReducer } from 'react';
    import { Button, Col, Container, Form, Row, Table } from 'react-bootstrap';
    
    import './App.css';
    import awsConfig from './aws-exports';
    import { createRestaurant } from './graphql/mutations';
    import { listRestaurants } from './graphql/queries';
    import { onCreateRestaurant } from './graphql/subscriptions';
    
    Amplify.configure(awsConfig);
    
    type Restaurant = {
      name: string;
      description: string;
      city: string;
    };
    
    type AppState = {
      restaurants: Restaurant[];
      formData: Restaurant;
    };
    
    type Action =
      | {
          type: 'QUERY';
          payload: Restaurant[];
        }
      | {
          type: 'SUBSCRIPTION';
          payload: Restaurant;
        }
      | {
          type: 'SET_FORM_DATA';
          payload: { [field: string]: string };
        };
    
    type SubscriptionEvent<D> = {
      value: {
        data: D;
      };
    };
    
    const initialState: AppState = {
      restaurants: [],
      formData: {
        name: '',
        city: '',
        description: '',
      },
    };
    const reducer = (state: AppState, action: Action) => {
      switch (action.type) {
        case 'QUERY':
          return { ...state, restaurants: action.payload };
        case 'SUBSCRIPTION':
          return { ...state, restaurants: [...state.restaurants, action.payload] };
        case 'SET_FORM_DATA':
          return { ...state, formData: { ...state.formData, ...action.payload } };
        default:
          return state;
      }
    };
    
    const App: React.FC = () => {
      const createNewRestaurant = async (e: React.SyntheticEvent) => {
        e.stopPropagation();
        const { name, description, city } = state.formData;
        const restaurant = {
          name,
          description,
          city,
        };
        await API.graphql(graphqlOperation(createRestaurant, { input: restaurant }));
      };
    
      const [state, dispatch] = useReducer(reducer, initialState);
    
      useEffect(() => {
        getRestaurantList();
    
        const subscription = API.graphql(graphqlOperation(onCreateRestaurant)).subscribe({
          next: (eventData: SubscriptionEvent<{ onCreateRestaurant: Restaurant }>) => {
            const payload = eventData.value.data.onCreateRestaurant;
            dispatch({ type: 'SUBSCRIPTION', payload });
          },
        });
    
        return () => subscription.unsubscribe();
      }, []);
    
      const getRestaurantList = async () => {
        const restaurants = await API.graphql(graphqlOperation(listRestaurants));
        dispatch({
          type: 'QUERY',
          payload: restaurants.data.listRestaurants.items,
        });
      };
    
      const handleChange = (e: React.ChangeEvent<HTMLInputElement>) =>
        dispatch({
          type: 'SET_FORM_DATA',
          payload: { [e.target.name]: e.target.value },
        });
    
      return (
        <div className="App">
          <Container>
            <Row className="mt-3">
              <Col md={4}>
                <Form>
                  <Form.Group controlId="formDataName">
                    <Form.Control onChange={handleChange} type="text" name="name" placeholder="Name" />
                  </Form.Group>
                  <Form.Group controlId="formDataDescription">
                    <Form.Control onChange={handleChange} type="text" name="description" placeholder="Description" />
                  </Form.Group>
                  <Form.Group controlId="formDataCity">
                    <Form.Control onChange={handleChange} type="text" name="city" placeholder="City" />
                  </Form.Group>
                  <Button onClick={createNewRestaurant} className="float-left">
                    Add New Restaurant
                  </Button>
                </Form>
              </Col>
            </Row>
    
            {state.restaurants.length ? (
              <Row className="my-3">
                <Col>
                  <Table striped bordered hover>
                    <thead>
                      <tr>
                        <th>#</th>
                        <th>Name</th>
                        <th>Description</th>
                        <th>City</th>
                      </tr>
                    </thead>
                    <tbody>
                      {state.restaurants.map((restaurant, index) => (
                        <tr key={`restaurant-${index}`}>
                          <td>{index + 1}</td>
                          <td>{restaurant.name}</td>
                          <td>{restaurant.description}</td>
                          <td>{restaurant.city}</td>
                        </tr>
                      ))}
                    </tbody>
                  </Table>
                </Col>
              </Row>
            ) : null}
          </Container>
        </div>
      );
    };
    
    export default withAuthenticator(App);

Finally, we have our app ready. You can now sign up, sign in, add new restaurants, and see real-time updates of newly added restaurants.

    Hosting

    The hosting category enables you to deploy and host your app on AWS.

    amplify add hosting

    • Select the environment setup: DEV (S3 only with HTTP)
    • hosting bucket name: <YOUR_BUCKET_NAME>
    • index doc for the website: index.html
    • error doc for the website: index.html

    Now, everything is set up & we can publish it:

    amplify publish

    Working with multiple environments

You can create multiple environments for your application to build and test new features without affecting the main environment you are working on.

    When you use an existing environment to create a new environment, you get a copy of the entire backend application stack (CloudFormation) for the current environment. When you make changes in the new environment, you are then able to test these new changes in the new environment & merge only the changes that have been made since the new environment was created.

    Let’s take a look at how to create a new environment. In this new environment, we’ll add another field for the restaurant owner to the GraphQL Schema.

    First, we’ll initialize a new environment using amplify init:

    amplify init

    • Do you want to use an existing environment: N
    • Enter a name for the environment: apiupdate
    • Do you want to use an AWS profile: Y

    Once the new environment is initialized, we should be able to see some information about our environment setup by running:

    amplify env list
    
    | Environments |
    | ------------ |
    | dev |
    | *apiupdate |

    Now, add the owner field to the GraphQL Schema in

    amplify/backend/api/RestaurantAPI/schema.graphql:

    type Restaurant @model {
      ...
      owner: String
    }

    Run the push command to create a new stack:

amplify push

    After testing it out, it can be merged into our original dev environment:

    amplify env checkout dev
    amplify status
    amplify push

    • Do you want to update code for your updated GraphQL API: Y
    • Do you want to generate GraphQL statements: Y

    Removing Services

    If at any time, you would like to delete a service from your project & your account, you can do this by running the amplify remove command:

    amplify remove auth
    amplify push

    If you are unsure of what services you have enabled at any time, amplify status will give you the list of resources that are currently enabled in your app.

    Sample code

    The sample code for this blog post with an end to end working app is available here.

    Summary

    Once you’ve worked through all the sections above, your app should now have all the capabilities of a modern app, and building GraphQL + React apps should now be easier and faster with Amplify.

  • Enable Real-time Functionality in Your App with GraphQL and Pusher

The most recognized solution for real-time problems is WebSockets (WS), where a persistent connection is maintained between the client and the server, and either side can start sending data at any time. GraphQL subscriptions are one of the more recent features built on top of WS.

With GraphQL subscriptions, you can easily add real-time functionality to your application. There is an easy and standard way to implement a subscription in a GraphQL app. The client just has to make a subscription query to the server, which specifies the event and the data shape. With this query, the client establishes a long-lived connection with the server on which it listens to specific events. Just as GraphQL solves the over-fetching problem of REST APIs, subscriptions extend that solution to real-time data.

    In this post, we will learn how to bring real-time functionality to your app by implementing GraphQL subscriptions with Pusher to manage Pub/Sub capabilities. The goal is to configure a Pusher channel and implement two subscriptions to be exposed by your GraphQL server. We will be implementing this in a Node.js runtime environment.

    Why Pusher?

    Why are we doing this using Pusher? 

    • Pusher, being a hosted real-time services provider, relieves us from managing our own real-time infrastructure, which is a highly complex problem.
    • Pusher provides an easy and consistent API.
    • Pusher also provides an entire set of tools to monitor and debug your realtime events.
    • Events can be triggered by and consumed easily from different applications written in different frameworks.

    Project Setup

    We will start with a repository that contains a codebase for a simple GraphQL backend in Node.js, which is a minimal representation of a blog post application. The entities included are:

1. Link – Represents a URL and a small description of the Link
2. User – A Link belongs to a User
3. Vote – Represents a user's vote for a Link

In this application, a User can sign up and add or vote on a Link, and other users can upvote that Link. The database schema is built using Prisma and SQLite for quick bootstrapping. In the backend, we will use graphql-yoga as the GraphQL server implementation. To test our GraphQL backend, we will use graphql-playground by Prisma as a client, which will perform all queries and mutations on the server.

    To set up the application:

    1. Clone the repository here
    2. Install all dependencies using 
    npm install

3. Set up a database using prisma-cli with the following commands
npx prisma migrate save --experimental
# Select 'yes' when prompted to add an SQLite db after this command, and enter a name for the migration.
npx prisma migrate up --experimental
npx prisma generate

Note: Migrations are an experimental feature of the Prisma ORM, but you can ignore that detail here, since you could use a different backend setup for DB interactions. The purpose of using Prisma is to set up the project quickly and dive into subscriptions.

A new directory named prisma will be created, containing the schema and the SQLite database. Now you have your database and app set up and ready to use.

    To start the Node.js application, execute the command:

    npm start

    Navigate to http://localhost:4000 to see the graphql-playground where we will execute our queries and mutations.

    Our next task is to add a GraphQL subscription to our server to allow clients to listen to the following events:

    • A new Link is created
    • A Link is upvoted

    To add subscriptions, we will need an npm package called graphql-pusher-subscriptions to help us interact with the Pusher service from within the GraphQL resolvers. The module will trigger events and listen to events for a channel from the Pusher service.

    Before that, let’s first create a channel in Pusher. To configure a Pusher channel, head to their website at Pusher to create an account. Then, go to your dashboard and create a channels application. Choose a name, the cluster closest to your location, and frontend tech as React and backend tech as Node.js.

    You will receive the following code to start.
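The exact snippet depends on your app, but for a Node.js backend it looks roughly like the one below. The credentials are placeholders, and the pusher package and its trigger() method are Pusher's standard Node.js SDK; treat this as an illustration rather than the code from the original dashboard.

const Pusher = require('pusher');

// Placeholder credentials: use the values shown on your own app's getting-started page.
const pusher = new Pusher({
  appId: 'YOUR_APP_ID',
  key: 'YOUR_APP_KEY',
  secret: 'YOUR_APP_SECRET',
  cluster: 'YOUR_CLUSTER',
  useTLS: true,
});

// Trigger a test event on a channel to confirm the setup works.
pusher.trigger('my-channel', 'my-event', { message: 'hello world' });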

Now, we add the graphql-pusher-subscriptions package. This package will take the Pusher channel configuration and give you an API to trigger and listen to events published on the channel.
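Assuming the package is published on npm under the same name used in the require call below, it can be installed with:

npm install graphql-pusher-subscriptions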

    Now, we import the package in the src/index.js file.

    const { PusherChannel } = require('graphql-pusher-subscriptions');

The PusherChannel class provided by the module accepts the configuration for the channel. We instantiate the class and keep a Pub/Sub reference to the resulting object, passing in the Pusher config values we received while creating the channel.

    const pubsub = new PusherChannel({
      appId: '1046878',
      key: '3c84229419ed7b47e5b0',
      secret: 'e86868a98a2f052981a6',
      cluster: 'ap2',
      encrypted: true,
      channel: 'graphql-subscription'
    });

    Now, we add “pubsub” to the context so that it is available to all the resolvers. The channel field tells the client which channel to subscribe to. Here we have the channel “graphql-subscription”.

    const server = new GraphQLServer({
      typeDefs: './src/schema.graphql',
      resolvers,
      context: request => {
        return {
          ...request,
          prisma,
          pubsub
        }
      },
    })

    The above part enables us to access the methods we need to implement our subscriptions from inside our resolvers via context.pubsub.

    Subscribing to Link-created Event

    The first step to add a subscription is to extend the GraphQL schema definition.

    type Subscription {
      newLink: Link
    }

Next, we implement the resolver for the "newLink" subscription field. It is important to note that resolvers for subscriptions differ from those for queries and mutations in a couple of ways.

1. They return an AsyncIterator instead of data, which the GraphQL server then uses to publish the event payload to the subscribed clients.

2. The subscription resolver is provided as the value of a subscribe field inside an object. The object should also contain another field named "resolve" that returns the payload data from the data emitted by the AsyncIterator.

To add the resolvers for the subscription, we start by adding a new file called Subscription.js.

Inside the project directory, add the file as src/resolvers/Subscription.js.

    Now, in the new file created, add the following code, which will be the subscription resolver for the “newLink” type we created in GraphQL schema.

    function newLinkSubscribe(parent, args, context, info) {
      return context.pubsub.asyncIterator("NEW_LINK")
    }
    
    const newLink = {
      subscribe: newLinkSubscribe,
      resolve: payload => {
        return payload
      },
    }
    
    module.exports = {
      newLink,
    }

In the code above, the subscription resolver function, newLinkSubscribe, is provided as the value of the subscribe property, just as described before. The context gives us a reference to the Pub/Sub object, whose asyncIterator() method we call with "NEW_LINK" as a parameter. The returned iterator is what the GraphQL server uses to resolve the subscription whenever a "NEW_LINK" event is published.

    Adding Subscriptions to Your Resolvers

The final step of our subscription implementation is to publish the event that the subscription above listens for. We add the following call to pubsub.publish() inside the post resolver function in the Mutation.js file.

async function post(parent, args, context, info) {
  const userId = getUserId(context)
  // Create the Link, then publish it so active subscribers are notified.
  const newLink = await context.prisma.link.create({
    data: {
      url: args.url,
      description: args.description,
      postedBy: { connect: { id: userId } },
    }
  })
  context.pubsub.publish("NEW_LINK", newLink)
  return newLink
}

In the code above, we pass the same string, "NEW_LINK", to the publish method that we used in the newLinkSubscribe function before. "NEW_LINK" is the event name: publish sends the event to the Pusher service, and the subscription resolver binds to the same name to receive it. We also pass newLink as the second argument, which is the data payload of the event. The context.pubsub.publish call is triggered before returning the newLink data.

    Now, we will update the main resolver object, which is given to the GraphQL server.

    First, import the subscription module inside of the index.js file.

    const Subscription = require('./resolvers/Subscription') 
    const resolvers = {
      Query,
      Mutation,
      Subscription,
      User,
      Link,
    }

Now, with all the code in place, we can start testing our real-time API. We will use multiple instances/tabs of the GraphQL Playground concurrently.

    Testing Subscriptions

    If your server is already running, then kill it with CTRL+C and restart with this command:

    npm start

    Next, open the browser and navigate to http://localhost:4000 to see the GraphQL playground. We will use one tab of the playground to perform the mutation to trigger the event to Pusher and invoke the subscriber.

    We will now start to execute the queries to add some entities in the application.

    First, let’s create a user in the application by using the signup mutation. We send the following mutation to the server to create a new User entity.

mutation {
  signup(
    name: "Alice"
    email: "alice@prisma.io"
    password: "graphql"
  ) {
    token
    user {
      id
    }
  }
}

    You will see a response in the playground that contains the authentication token for the user. Copy the token, and open another tab in the playground. Inside that new tab, open the HTTP_HEADERS section in the bottom and add the Authorization header.

    Replace the __TOKEN__  placeholder from the below snippet with the copied token from above.

    {
      "Authorization": "Bearer __TOKEN__"
    }

Now, all the queries or mutations executed from that tab will carry the authentication token. With this in place, we send the following mutation to our GraphQL server.

mutation {
  post(
    url: "http://velotio.com"
    description: "An awesome GraphQL blog"
  ) {
    id
  }
}

The mutation above creates a Link entity inside the application. Now that we have created an entity, we can test the subscription part. In another tab, we will send the subscription query and create a persistent WebSocket connection to the server. Before firing off a subscription query, let us first understand its syntax. It starts with the keyword subscription, followed by the subscription name. The subscription is defined in the GraphQL schema, which also defines the data shape we can resolve to. Here, we subscribe to the newLink subscription, and the data it resolves to is that of a Link entity. That means we can resolve any specific part of the Link. Here, we are asking for attributes like id, url, and description, and nested attributes of the postedBy field.

subscription {
  newLink {
    id
    url
    description
    postedBy {
      id
      name
      email
    }
  }
}

    The response of this operation is different from that of a mutation or query. You see a loading spinner, which indicates that it is waiting for an event to happen. This means the GraphQL client (playground) has established a connection with the server and is listening for response data.

    Before triggering a subscription, we will also keep an eye on the Pusher channel for events triggered to verify that our Pusher service is integrated successfully.

To do this, we go to the Pusher dashboard, navigate to the channel app we created, and click on the debug console. The debug console shows the events triggered in real time.

    Now that the Pusher dashboard is visible, we will trigger the subscription event by running the following mutation inside a new Playground tab.

    mutation {
      post(
        url: "www.velotio.com"
        description: "Graphql remote schema stitching"
      ) {
        id
      }
    }

Now, we observe the Playground tab where the subscription was running.

We can see that the newly created Link is visible in the response section, the subscription continues to listen, and the event has reached the Pusher service.

You will observe an event on the Pusher console with the same event name and data as sent by your post mutation.

We have achieved our first goal, i.e., we have integrated the Pusher channel and implemented a subscription for the Link creation event.

    To achieve our second goal, i.e., to listen to Vote events, we repeat the same steps as we did for the Link subscription.

We add a subscription resolver for Vote in the Subscription.js file and update the Subscription type in the GraphQL schema (the updated type is shown below). To trigger a different event, we use "NEW_VOTE" as the event name and add the publish call inside the resolver for the vote mutation.
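The updated Subscription type in src/schema.graphql would then look roughly like this (a sketch extending the type we added earlier, assuming a Vote type already exists in the schema):

type Subscription {
  newLink: Link
  newVote: Vote
}

And here is the corresponding resolver in src/resolvers/Subscription.js: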

    function newVoteSubscribe(parent, args, context, info) {
      return context.pubsub.asyncIterator("NEW_VOTE")
    }
    
    const newVote = {
      subscribe: newVoteSubscribe,
      resolve: payload => {
        return payload
      },
    }

    Update the export statement to add the newVote resolver.

    module.exports = {
      newLink,
      newVote,
    }

    Update the Vote mutation to add the publish call before returning the newVote data. Notice that the first parameter, “NEW_VOTE”, is being passed so that the listener can bind to the new event with that name.

// Inside the vote resolver in src/resolvers/Mutation.js
  const newVote = context.prisma.vote.create({
    data: {
      user: { connect: { id: userId } },
      link: { connect: { id: Number(args.linkId) } },
    }
  })
  context.pubsub.publish("NEW_VOTE", newVote)
  return newVote
}

    Now, restart the server and complete the signup process with setting HTTP_HEADERS as we did before. Add the following subscription to a new Playground tab.

    subscription {
      newVote {
        id
        link {
          url
          description
        }
        user {
          name
          email
        }
      }
    }

In another Playground tab, send the following vote mutation to the server to trigger the event, but do not forget to set the Authorization header. The mutation below adds the user's Vote to the Link. Replace "__LINK_ID__" with the linkId returned by the earlier post mutation.

    mutation {
      vote(linkId: "__LINK_ID__") {
        link {
          url
          description
        }
        user {
          name
          email
        }
      }
    }

Observe the event data in the response tab of the vote subscription. You can also check the event triggered on the Pusher dashboard.

    The final codebase is available on a branch named with-subscription.

    Conclusion

By following the steps above, we saw how easy it is to add real-time features to GraphQL apps with subscriptions. Establishing a connection with the server is no hassle, and it is very similar to how we implement queries and mutations. Unlike the mainstream approach, where one has to build and manage event handlers, GraphQL subscriptions come with these features built in for both the client and the server. We also saw how useful a managed real-time service like Pusher can be for Pub/Sub events. GraphQL and Pusher together can prove to be a solid combination for a reliable real-time system.

    Related Articles

    1. Build and Deploy a Real-Time React App Using AWS Amplify and GraphQL

    2. Scalable Real-time Communication With Pusher

  • Real Time Analytics for IoT Data using Mosquitto, AWS Kinesis and InfluxDB

The Internet of Things (IoT) is maturing rapidly and finding applications across various industries. Everyday devices are turning into smart devices, which are essentially IoT devices. These devices capture various parameters in and around their environment, generating a huge amount of data. This data needs to be collected, processed, stored, and analyzed in order to get actionable insights from it. To do so, we need to build a data pipeline. In this blog, we will build such a pipeline using Mosquitto, Kinesis, InfluxDB, and Grafana. We will discuss each component of the pipeline and the steps to build it.

    Why the Analysis of IoT data is different

In an IoT setup, data is generated by sensors distributed across various locations. In order to use the data they generate, we should first bring it to a common location from which the various applications that want to process it can read it.

    Network Protocol

IoT devices have limited computational and network resources. Moreover, these devices write data at very short intervals, so high throughput is expected on the network. For transferring IoT data, it is desirable to use lightweight network protocols. A protocol like HTTP uses a complex structure for communication, which consumes more resources and makes it unsuitable for IoT data transfer. One lightweight protocol suitable for IoT data is MQTT, which we are using in our pipeline. MQTT is designed for machine-to-machine (M2M) connectivity. It uses a publisher/subscriber communication model and helps clients distribute telemetry data with very low network resource consumption. Beyond IoT, MQTT has been found useful in other fields as well.

    Other similar protocols include Constrained Application Protocol (CoAP), Advanced Message Queuing Protocol (AMQP) etc.

Datastore

IoT devices generally collect telemetry about their environment, usually through sensors. In most IoT scenarios, we try to analyze how things have changed over a period of time. Storing this data in a time series database makes our analysis simpler and better. InfluxDB is a popular time series database, which we will use in our pipeline. More about time series databases can be read here.

    Pipeline Overview

The first thing we need for a data pipeline is data. As shown in the image above, the data generated by various sensors is written to a topic on the MQTT message broker. To mimic sensors, we will use a program that uses an MQTT client to write data to the MQTT broker.

The next component is Amazon Kinesis, which is used for streaming data analysis. It closely resembles Apache Kafka, an open-source tool used for similar purposes. Kinesis brings the data generated by a number of clients to a single location from which different consumers can pull it for processing. We are using Kinesis so that multiple consumers can read data from a single location. This approach scales well even if we have multiple message brokers.

Once the data is written to the MQTT broker, a Kinesis producer that has subscribed to the topic pulls the data and writes it to the Kinesis stream. From the Kinesis stream, the data is pulled by Kinesis consumers, which process it and write it to InfluxDB, a time series database.

Finally, we use Grafana, a well-known tool for analytics and monitoring. We can connect it to many popular databases and build dashboards on top of them. Another popular tool in this space is Kibana (the K of the ELK stack).

Setting up an MQTT Message Broker Server:

For the MQTT message broker, we will use Mosquitto, a popular open-source message broker that implements MQTT. The details of downloading and installing Mosquitto for various platforms are available here.

    For Ubuntu, it can be installed using the following commands

    sudo apt-add-repository ppa:mosquitto-dev/mosquitto-ppa
    sudo apt-get update
    sudo apt-get install mosquitto
    service mosquitto status

    Setting up InfluxDB and Grafana

The simplest way to set up both of these components is to use their Docker images directly:

    docker run --name influxdb -p 8083:8083 -p 8086:8086 influxdb:1.0
    docker run --name grafana -p 3000:3000 --link influxdb grafana/grafana:3.1.1

For InfluxDB, we have mapped two ports: port 8086 is the HTTP API endpoint, while 8083 is the administration web server's port. We need to create a database where we will write our data.

To create a database, we can go directly to the console at <influxdb-ip>:8083 and run the command:

    CREATE DATABASE "iotdata"

    Or we can do it via HTTP request :

curl -XPOST "http://localhost:8086/query" --data-urlencode "q=CREATE DATABASE iotdata"

    Creating a Kinesis stream

In Kinesis, we create streams where Kinesis producers write the data coming from various sources and Kinesis consumers then read the data from the stream. Within a stream, the data is stored in shards. For our purpose, one shard is enough. The stream can be created from the AWS console, or programmatically as sketched below.
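As a sketch of that step, the stream can be created with the same aws-sdk-go Kinesis client helper used later in this post (config.GetKinesisClient()); the stream name below is just an example, while the real code reads it from config.GetKinesisStreamName().

package main

import (
	"config"
	"fmt"

	"github.com/aws/aws-sdk-go/service/kinesis"
)

func main() {
	// Reuse the helper from this project's config package to get a Kinesis client.
	client := config.GetKinesisClient()

	streamName := "iot-data-stream" // example name
	shardCount := int64(1)          // one shard is enough for this demo pipeline

	_, err := client.CreateStream(&kinesis.CreateStreamInput{
		StreamName: &streamName,
		ShardCount: &shardCount,
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("Kinesis stream creation requested")
}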

    Creating the MQTT client

    We will use the Golang client available in this repository to connect with our message broker server and write data to a specific topic. We will first create a new MQTT client. Here we can see the list of options we have for configuring our MQTT client.

Once we create the options object, we can pass it to the NewClient() method, which returns the MQTT client. Now we can write data to the MQTT server. We have defined the structure of the data in the SensorData struct. To mimic two sensors writing telemetry data to the MQTT broker, we have two goroutines that push data to the MQTT server every five seconds.

    package publisher
    
    import (
    	"config"
    	"encoding/json"
    	"fmt"
    	"log"
    	"math/rand"
    	"os"
    	"time"
    
    	"github.com/eclipse/paho.mqtt.golang"
    )
    
    type SensorData struct {
    	Id          string  `json:"id"`
    	Temperature float64 `json:"temperature"`
    	Humidity    float64 `json:"humidity"`
    	Timestamp   int64   `json:"timestamp"`
    	City        string  `json:"city"`
    }
    
    func StartMQTTPublisher() {
    	fmt.Println("MQTT publisher Started")
    	mqtt.DEBUG = log.New(os.Stdout, "", 0)
    	mqtt.ERROR = log.New(os.Stdout, "", 0)
    	opts := mqtt.NewClientOptions().AddBroker(config.GetMqttServerurl()).SetClientID("MqttPublisherClient")
    	opts.SetKeepAlive(2 * time.Second)
    	opts.SetPingTimeout(1 * time.Second)
    	c := mqtt.NewClient(opts)
    	if token := c.Connect(); token.Wait() && token.Error() != nil {
    		panic(token.Error())
    	}
    
    	go func() {
    		t := 20.04
    		h := 32.06
    		for i := 0; i < 100; i++ {
    			sensordata := SensorData{
    				Id:          "CITIMUM",
    				Temperature: t,
    				Humidity:    h,
    				Timestamp:   time.Now().Unix(),
    				City:        "Mumbai",
    			}
    			requestBody, err := json.Marshal(sensordata)
    			if err != nil {
    				fmt.Println(err)
    			}
    			token := c.Publish(config.GetMQTTTopicName(), 0, false, requestBody)
    			token.Wait()
    			if i < 50 {
    				t = t + 1*rand.Float64()
    				h = h + 1*rand.Float64()
    			} else {
    				t = t - 1*rand.Float64()
    				h = h - 1*rand.Float64()
    			}
    			time.Sleep(5 * time.Second)
    		}
    	}()
    	go func() {
    		t := 16.02
    		h := 24.04
    		for i := 0; i < 100; i++ {
    			sensordata := SensorData{
    				Id:          "CITIPUN",
    				Temperature: t,
    				Humidity:    h,
    				Timestamp:   time.Now().Unix(),
    				City:        "Pune",
    			}
    			requestBody, err := json.Marshal(sensordata)
    			if err != nil {
    				fmt.Println(err)
    			}
    			token := c.Publish(config.GetMQTTTopicName(), 0, false, requestBody)
    			token.Wait()
    			if i < 50 {
    				t = t + 1*rand.Float64()
    				h = h + 1*rand.Float64()
    			} else {
    				t = t - 1*rand.Float64()
    				h = h - 1*rand.Float64()
    			}
    			time.Sleep(5 * time.Second)
    		}
    	}()
    	time.Sleep(1000 * time.Second)
    	c.Disconnect(250)
    
    }

    Create a Kinesis Producer

    Now we will create a Kinesis producer which subscribes to the topic to which our MQTT client writes data and pull the data from the broker and pushes it to the Kinesis stream. Just like in the previous section here also we first create an MQTT client which connects to the message broker and subscribe to the topic to which our clients/publishers are going to write data to. In the client option, we have the option to define a function which will be called when data is written to this topic. We have created a function postDataTokinesisStream() which connects Kinesis using the Kinesis client and then writes data to the Kinesis stream, every time a data is pushed to the topic.

    package producer
    
    import (
    	"config"
    	"fmt"
    
    	"os"
    	"time"
    
    	"github.com/aws/aws-sdk-go/service/kinesis"
    
    	mqtt "github.com/eclipse/paho.mqtt.golang"
    )
    
    func postDataTokinesisStream(client mqtt.Client, message mqtt.Message) {
	fmt.Printf("Received message on topic: %s\nMessage: %s\n", message.Topic(), message.Payload())
    	streamName := config.GetKinesisStreamName()
    	kclient := config.GetKinesisClient()
    	var putRecordInput kinesis.PutRecordInput
    	partitionKey := message.Topic()
    	putRecordInput.PartitionKey = &partitionKey
    	putRecordInput.StreamName = &streamName
    	putRecordInput.Data = message.Payload()
    	putRecordOutput, err := kclient.PutRecord(&putRecordInput)
    	if err != nil {
    		fmt.Println(err)
    	} else {
    		fmt.Println(putRecordOutput)
    	}
    
    }
    
    func StartKinesisProducer() {
    	fmt.Println("Kinesis Producer Started")
    	c := make(chan os.Signal, 1)
    	opts := mqtt.NewClientOptions().AddBroker(config.GetMqttServerurl()).SetClientID("MqttSubscriberClient")
    	opts.SetKeepAlive(2 * time.Second)
    	opts.SetPingTimeout(1 * time.Second)
    	opts.OnConnect = func(c mqtt.Client) {
    		if token := c.Subscribe(config.GetMQTTTopicName(), 0, postDataTokinesisStream); token.Wait() && token.Error() != nil {
    			panic(token.Error())
    		}
    	}
    
    	client := mqtt.NewClient(opts)
    	if token := client.Connect(); token.Wait() && token.Error() != nil {
    		panic(token.Error())
    	} else {
		fmt.Printf("Connected to %s\n", config.GetMqttServerurl())
    	}
    
    	<-c
    }

    Create a Kinesis Consumer

Now that the data is available in our Kinesis stream, we can pull it for processing. In the Kinesis consumer, we create a Kinesis client just like we did in the previous section and then pull data from it. Here, we first call the DescribeStream method, which returns the shardId. We then use this shardId to get a ShardIterator, and finally we fetch the records by passing the ShardIterator to the GetRecords() method. GetRecords() also returns the NextShardIterator, which we can use to continuously look for records in the shard until NextShardIterator becomes null.

    package consumer
    
    import (
    	"config"
    	"fmt"
    
    	"github.com/aws/aws-sdk-go/service/kinesis"
    	"velotio.com/dao"
    )
    
    func StartKinesisConsumer() {
    	fmt.Println("Kinesis Consumer Started")
    	client := config.GetKinesisClient()
    	streamName := config.GetKinesisStreamName()
    	var describeStreamInput kinesis.DescribeStreamInput
    	describeStreamInput.StreamName = &streamName
    	describeStreamOutput, err := client.DescribeStream(&describeStreamInput)
    	if err != nil {
    		fmt.Println(err)
    	} else {
    		fmt.Println(*describeStreamOutput.StreamDescription.Shards[0].ShardId)
    	}
    	var getShardIteratorInput kinesis.GetShardIteratorInput
    	getShardIteratorInput.ShardId = describeStreamOutput.StreamDescription.Shards[0].ShardId
    	getShardIteratorInput.StreamName = &streamName
    	shardIteratorType := "TRIM_HORIZON"
    	getShardIteratorInput.ShardIteratorType = &shardIteratorType
    	getShardIteratorOuput, err := client.GetShardIterator(&getShardIteratorInput)
    	if err != nil {
    		fmt.Println(err)
    	} else {
    		fmt.Println(*getShardIteratorOuput.ShardIterator)
    	}
    	var getRecordsInput kinesis.GetRecordsInput
    
    	getRecordsInput.ShardIterator = getShardIteratorOuput.ShardIterator
    	getRecordsOuput, err := client.GetRecords(&getRecordsInput)
    	//fmt.Println(getRecordsOuput)
    	if err != nil {
    		fmt.Println(err)
    	} else {
    		for *getRecordsOuput.NextShardIterator != "" {
    			i := 0
    			for i < len(getRecordsOuput.Records) {
    				//fmt.Println(len(getRecordsOuput.Records))
    				sdf := &dao.SensorDataFiltered{}
    				sdf.PostDataToInfluxDB(getRecordsOuput.Records[i].Data)
    				i++
    			}
    			getRecordsInput.ShardIterator = getRecordsOuput.NextShardIterator
    			getRecordsOuput, err = client.GetRecords(&getRecordsInput)
    		}
    
    	}
    }

    Processing the data and writing it to InfluxDB

Now we do some simple processing to filter the data. The data we get from the sensor has the fields sensorId, temperature, humidity, city, and timestamp, but we are interested only in the temperature and humidity values for a city, so we have created a new struct, SensorDataFiltered, which contains only the fields we need.

For every record the Kinesis consumer receives, it creates an instance of the SensorDataFiltered type and calls the PostDataToInfluxDB() method, where the record received from the Kinesis stream is unmarshaled into the SensorDataFiltered type and sent to InfluxDB. Here, we need to provide the name of the database we created earlier in the dbName variable, and the InfluxDB host and port values in dbHost and dbPort respectively.

In the InfluxDB request body, the first value we provide is used as the measurement, which is an InfluxDB construct to store similar data together. Then come the tags; we have used city as our tag so that we can filter the data based on it. After that come the actual field values. For more details on the InfluxDB write format, please refer here.
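For example, a single write produced by the code below would look roughly like this in InfluxDB's line protocol (the numeric values are just samples):

sensordata,city=Mumbai humidity=32.50,temperature=21.30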

    package dao
    
    import (
    	"bytes"
    	"crypto/tls"
    	"encoding/json"
    	"fmt"
    	"net/http"
    )
    
    type SensorDataFiltered struct {
    	Temperature float64 `json:"temperature"`
    	Humidity    float64 `json:"humidity"`
    	City        string  `json:"city"`
    }
    
    var dbName = "iotdata"
    var dbHost = "184.73.62.30"
    var dbPort = "8086"
    
    func (sdf *SensorDataFiltered) PostDataToInfluxDB(Data []byte) {
    	err := json.Unmarshal(Data, &sdf)
    	if err != nil {
    		fmt.Println(err)
    	} else {
    		fmt.Println(sdf.Temperature, sdf.Humidity)
    	}
    	url := "http://" + dbHost + ":" + dbPort + "/write?db=" + dbName
    	humidity := fmt.Sprintf("%.2f", sdf.Humidity)
    	temperature := fmt.Sprintf("%.2f", sdf.Temperature)
    	city := sdf.City
    	requestBody := "sensordata,city=" + city + " humidity=" + humidity + ",temperature=" + temperature
    	req, err := http.NewRequest("POST", url, bytes.NewBuffer([]byte(requestBody)))
    	httpclient := &http.Client{
    		Transport: &http.Transport{
    			TLSClientConfig: &tls.Config{
    				InsecureSkipVerify: true,
    			},
    		},
    	}
	resp, err := httpclient.Do(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("Status code for influxdb data post request = ", resp.StatusCode)
    
    }

Once the data is written to InfluxDB, we can see it in the web console by querying the measurement created in our database.
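For instance, an InfluxQL query like the following would show the filtered points for one city; the measurement and tag names match what the code above writes, and the city value is just an example:

SELECT "temperature", "humidity" FROM "sensordata" WHERE "city" = 'Mumbai'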

    Putting everything together in our main function

Now we simply need to call the functions discussed above and run our main program. Note that we have used go before the first two function calls, which makes them goroutines so they execute concurrently.

On running the code, you will see the logs for all the stages of our pipeline written to stdout. This closely resembles real-life scenarios where data written by IoT devices is processed in near real time.

    package main
    
    import (
    	"time"
    
    	"velotio.com/consumer"
    	"velotio.com/producer"
    	"velotio.com/publisher"
    )
    
    func main() {
    
    	go producer.StartKinesisProducer()
    	go publisher.StartMQTTPublisher()
    	time.Sleep(5 * time.Second)
    	consumer.StartKinesisConsumer()
    
    }

    Visualization through Grafana

    We can access the Grafana web console at port 3000 of the machine on which it is running. First, we need to add our InfluxDB as a data source to it under the data sources option.

To create a dashboard, go to the dashboard option and choose New. Once the dashboard is created, we can start by adding a panel.

We need to add the InfluxDB data source we configured earlier as the panel's data source and write a query like the one shown below.
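As an illustration, a panel query for the average temperature of one city could look like the following InfluxQL (the city value is an example; $timeFilter is Grafana's built-in time range variable for the InfluxDB data source):

SELECT mean("temperature") FROM "sensordata" WHERE "city" = 'Mumbai' AND $timeFilter GROUP BY time(1m) fill(null)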

    We can repeat the same process for adding another panel to the dashboard this time choosing a different city in our query.

    Conclusion:

IoT data analytics is a fast-evolving and interesting space. The number of IoT devices is growing rapidly, and there is a great opportunity to get valuable insights from the huge amount of data generated by these devices. In this blog, I tried to help you grab that opportunity by building a near real-time data pipeline for IoT data. If you liked it, please share and subscribe to our blog.

  • An Introduction to React Fiber – The Algorithm Behind React

In this article, we will learn about React Fiber, the core algorithm behind React. React Fiber is the new reconciliation algorithm in React 16. You've most likely heard of the virtualDOM from React 15; its reconciler is the old algorithm (also known as the Stack Reconciler) because it uses a stack internally. The same reconciler is shared with different renderers like DOM, Native, and Android views, so calling it virtualDOM may lead to confusion.

    So without any delay, let’s see what React Fiber is.

    Introduction

    React Fiber is a completely backward-compatible rewrite of the old reconciler. This new reconciliation algorithm from React is called Fiber Reconciler. The name comes from fiber, which it uses to represent the node of the DOM tree. We will go through fiber in detail in later sections.

    The main goals of the Fiber reconciler are incremental rendering, better or smoother rendering of UI animations and gestures, and responsiveness of the user interactions. The reconciler also allows you to divide the work into multiple chunks and divide the rendering work over multiple frames. It also adds the ability to define the priority for each unit of work and pause, reuse, and abort the work. 

Some other features of React include returning multiple elements from a render function, better error handling (we can use the componentDidCatch method to get clearer error messages), and portals.

While computing new rendering updates, React yields back to the main thread periodically. As a result, high-priority work can jump ahead of low-priority work. React has priorities defined internally for each update.

Before going into technical details, I would recommend you learn the following terms, which will help you understand React Fiber.

    Prerequisites

    Reconciliation

    As explained in the official React documentation, reconciliation is the algorithm for diffing two DOM trees. When the UI renders for the first time, React creates a tree of nodes. Every individual node represents the React element. It creates a virtual tree (which is known as virtualDOM) that’s a copy of the rendered DOM tree. After any update from the UI, it recursively compares every tree node from two trees. The cumulative changes are then passed to the renderer.

    Scheduling

As explained in the React documentation, suppose we have some low-priority work (like a large computing function or the rendering of recently fetched elements) and some high-priority work (such as animation). There should be an option to prioritize the high-priority work over the low-priority work. In the old stack reconciler implementation, the recursive traversal and the calls to the render methods of the whole updated tree happen in a single flow. This can lead to dropped frames.

    Scheduling can be time-based or priority-based. The updates should be scheduled according to the deadline. The high-priority work should be scheduled over low-priority work.

    requestIdleCallback 

    requestAnimationFrame schedules the high-priority function to be called before the next animation frame. Similarly, requestIdleCallback schedules the low-priority or non-essential function to be called in the free time at the end of the frame. 

     requestIdleCallback(lowPriorityWork);

    This shows the usage of requestIdleCallback. lowPriorityWork is a callback function that will be called in the free time at the end of the frame.

    function lowPriorityWork(deadline) {
        while (deadline.timeRemaining() > 0 && workList.length > 0)
          performUnitOfWork();
      
        if (workList.length > 0)
          requestIdleCallback(lowPriorityWork);
      }

    When this callback function is called, it gets the argument deadline object. As you can see in the snippet above, the timeRemaining function returns the latest idle time remaining. If this time is greater than zero, we can do the work needed. And if the work is not completed, we can schedule it again at the last line for the next frame.

So, now we are good to proceed with how the fiber object itself looks and see how React Fiber works.

    Structure of fiber

A fiber (lowercase 'f') is a simple JavaScript object. It represents a React element or a node of the DOM tree. It's a unit of work. In comparison, Fiber (capital 'F') refers to the React Fiber reconciler.

    This example shows a simple React component that renders in root div.

    function App() {
        return (
          <div className="wrapper">
            <div className="list">
              <div className="list_item">List item A</div>
              <div className="list_item">List item B</div>
            </div>
            <div className="section">
              <button>Add</button>
              <span>No. of items: 2</span>
            </div>
          </div>
        );
      }
     
      ReactDOM.render(<App />, document.getElementById('root'));

It's a simple component that shows a list of items for the data we have in the component state. (I have replaced the .map iteration over data with two list items just to keep this example simpler.) There is also a button and a span, which shows the number of list items.

    As mentioned earlier, fiber represents the React element. While rendering for the first time, React goes through each of the React elements and creates a tree of fibers. (We will see how it creates this tree in later sections.) 

    It creates a fiber for each individual React element, like in the example above. It will create a fiber, such as W, for the div, which has the class wrapper. Then, fiber L for the div, which has a class list, and so on. Let’s name the fibers for two list items as LA and LB.

In a later section, we will see how it iterates and the final structure of the tree. Though we call it a tree, React Fiber creates a linked list of nodes where each node is a fiber, with relationships between parent, child, and siblings. React uses a return key to point to the parent node, to which a child fiber should return after completing its work. So, in the above example, LA's return is L, and its sibling is LB.

    So, how does this fiber object actually look?

    Below is the definition of type, as defined in the React codebase. I have removed some extra props and kept some comments to understand the meaning of the properties. You can find the detailed structure in the React codebase.

    export type Fiber = {
        // Tag identifying the type of fiber.
        tag: TypeOfWork,
     
        // Unique identifier of this child.
        key: null | string,
     
        // The value of element.type which is used to preserve the identity during
        // reconciliation of this child.
        elementType: any,
     
        // The resolved function/class/ associated with this fiber.
        type: any,
     
        // The local state associated with this fiber.
        stateNode: any,
     
        // Remaining fields belong to Fiber
     
        // The Fiber to return to after finishing processing this one.
        // This is effectively the parent.
        // It is conceptually the same as the return address of a stack frame.
        return: Fiber | null,
     
        // Singly Linked List Tree Structure.
        child: Fiber | null,
        sibling: Fiber | null,
        index: number,
     
        // The ref last used to attach this node.
        ref: null | (((handle: mixed) => void) & {_stringRef: ?string, ...}) | RefObject,
     
        // Input is the data coming into process this fiber. Arguments. Props.
        pendingProps: any, // This type will be more specific once we overload the tag.
        memoizedProps: any, // The props used to create the output.
     
        // A queue of state updates and callbacks.
        updateQueue: mixed,
     
        // The state used to create the output
        memoizedState: any,
     
        mode: TypeOfMode,
     
        // Effect
        effectTag: SideEffectTag,
        subtreeTag: SubtreeTag,
        deletions: Array<Fiber> | null,
     
        // Singly linked list fast path to the next fiber with side-effects.
        nextEffect: Fiber | null,
     
        // The first and last fiber with side-effect within this subtree. This allows
        // us to reuse a slice of the linked list when we reuse the work done within
        // this fiber.
        firstEffect: Fiber | null,
        lastEffect: Fiber | null,
     
        // This is a pooled version of a Fiber. Every fiber that gets updated will
        // eventually have a pair. There are cases when we can clean up pairs to save
        // memory if we need to.
        alternate: Fiber | null,
      };

    How does React Fiber work?

    Next, we will see how the React Fiber creates the linked list tree and what it does when there is an update.

    Before that, let’s explain what a current tree and workInProgress tree is and how the tree traversal happens. 

    The tree, which is currently flushed to render the UI, is called current. It’s one that was used to render the current UI. Whenever there is an update, Fiber builds a workInProgress tree, which is created from the updated data from the React elements. React performs work on this workInProgress tree and uses this updated tree for the next render. Once this workInProgress tree is rendered on the UI, it becomes the current tree.

    Fig:- Current and workInProgress trees

    Fiber tree traversal happens like this:

    • Start: Fiber starts traversal from the topmost React element and creates a fiber node for it. 
    • Child: Then, it goes to the child element and creates a fiber node for this element. This continues until the leaf element is reached. 
    • Sibling: Now, it checks for the sibling element if there is any. If there is any sibling, it traverses the sibling subtree until the leaf element of the sibling. 
    • Return: If there is no sibling, then it returns to the parent. 

    Every fiber has a child (or a null value if there is no child), sibling, and parent property (as you have seen the structure of fiber in the earlier section). These are the pointers in the Fiber to work as a linked list.

    Fig:- React Fiber tree traversal

    Let’s take the same example, but let’s name the fibers that correspond to the specific React elements.

    function App() {    // App
        return (
          <div className="wrapper">    // W
            <div className="list">    // L
              <div className="list_item">List item A</div>    // LA
              <div className="list_item">List item B</div>    // LB
            </div>
            <div className="section">   // S
              <button>Add</button>   // SB
              <span>No. of items: 2</span>   // SS
            </div>
          </div>
        );
      }
     
      ReactDOM.render(<App />, document.getElementById('root'));  // HostRoot

    First, we will quickly cover the mounting stage where the tree is created, and after that, we will see the detailed logic behind what happens after any update.

    Initial render

    The App component is rendered in root div, which has the id of root.

    Before traversing further, React Fiber creates a root fiber. Every Fiber tree has one root node. Here in our case, it’s HostRoot. There can be multiple roots if we import multiple React Apps in the DOM.

    Before rendering for the first time, there won’t be any tree. React Fiber traverses through the output from each component’s render function and creates a fiber node in the tree for each React element. It uses createFiberFromTypeAndProps to convert React elements to fiber. The React element can be a class component or a host component like div or span. For the class component, it creates an instance, and for the host component, it gets the data/props from the React Element.

So, as shown in the example, it creates a fiber for App. Going further, it creates one more fiber, W, and then it goes to the child div and creates a fiber L. Similarly, it creates the fibers LA and LB for its children. The fiber LA will have its return (which can also be called the parent in this case) fiber as L, and its sibling as LB.

    So, this is how the final fiber tree will look.

    Fig:- React Fiber Relationship

    This is how the nodes of a tree are connected using the child, sibling, and return pointers.

    Update Phase

    Now, let’s cover the second case, which is update—say due to setState. 

    So, at this time, Fiber already has the current tree. For every update, it builds a workInProgress tree. It starts with the root fiber and traverses the tree until the leaf node. Unlike the initial render phase, it doesn’t create a new fiber for every React element. It just uses the preexisting fiber for that React element and merges the new data/props from the updated element in the update phase. 

Earlier, in React 15, the stack reconciler was synchronous. An update would traverse the whole tree recursively and make a copy of the tree. If some other update with a higher priority arrived in between, there was no way to pause or abort the first update and perform the second one.

    React Fiber divides the update into units of works. It can assign the priority to each unit of work, and has the ability to pause, reuse, or abort the unit of work if not needed. React Fiber divides the work into multiple units of work, which is fiber. It schedules the work in multiple frames and uses the deadline from the requestIdleCallback. Every update has its priority defined like animation, or user input has a higher priority than rendering the list of items from the fetched data. Fiber uses requestAnimationFrame for higher priority updates and requestIdleCallback for lower priority updates. So, while scheduling a work, Fiber checks the priority of the current update and the deadline (free time after the end of the frame).

    Fiber can schedule multiple units of work after a single frame if the priority is higher than the pending work—or if there is no deadline or the deadline has yet to be reached. And the next set of units of work is carried over the further frames. This is what makes it possible for Fiber to pause, reuse, and abort the unit of work.

    So, let’s see what actually happens in the scheduled work. There are two phases to complete the work: render and commit.

    Render Phase

    The actual tree traversal and the use of deadline happens in this phase. This is the internal logic of Fiber, so the changes made on the Fiber tree in this phase won’t be visible to the user. So Fiber can pause, abort, or divide work on multiple frames. 

    We can call this phase the reconciliation phase. Fiber traverses from the root of the fiber tree and processes each fiber. The workLoop function is called for every unit of work to perform the work. We can divide this processing of the work into two steps: begin and complete.

    Begin Step

    If you find the workLoop function from the React codebase, it calls the performUnitOfWork, which takes the nextUnitOfWork as a parameter. It is nothing but the unit of work, which will be performed. The performUnitOfWork function internally calls the beginWork function. This is where the actual work happens on the fiber, and performUnitOfWork is just where the iteration happens. 

Inside the beginWork function, if the fiber doesn't have any pending work, it just bails out (skips) the fiber without entering the begin phase. This is how, while traversing a large tree, Fiber skips already processed fibers and jumps directly to the fiber that has pending work. If you look at the large beginWork function code block, you will find a switch block that calls the respective fiber update function depending on the fiber tag, like updateHostComponent for host components. These functions update the fiber.

The beginWork function returns the child fiber if there is one, or null if there is no child. The performUnitOfWork function keeps iterating and calls the child fibers until it reaches a leaf node. At a leaf node, beginWork returns null as there is no child, and performUnitOfWork calls the completeUnitOfWork function. Let's see the complete step now.

    Complete Step

The completeUnitOfWork function completes the current unit of work by calling a completeWork function. It returns a sibling fiber, if there is one, as the next unit of work; otherwise, it completes the return (parent) fiber. This continues until the return is null, i.e., until it reaches the root node. Like beginWork, completeWork is a function where the actual work happens, while completeUnitOfWork handles the iteration.
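To make the begin/complete flow above concrete, here is a rough, simplified sketch of the iteration. It is not the actual React source: beginWork and completeWork are placeholder stand-ins for the real implementations, and the inner loop plays the role of completeUnitOfWork.

let nextUnitOfWork = null; // set to the root fiber when an update is scheduled

function workLoop(deadline) {
  // Process units of work only while there is idle time left in the frame.
  while (nextUnitOfWork && deadline.timeRemaining() > 0) {
    nextUnitOfWork = performUnitOfWork(nextUnitOfWork);
  }
}

function performUnitOfWork(fiber) {
  const child = beginWork(fiber);          // begin step: may return a child fiber
  if (child) return child;                 // keep descending while there are children

  let node = fiber;                        // leaf reached: start completing work
  while (node) {
    completeWork(node);                    // complete this fiber's work
    if (node.sibling) return node.sibling; // a sibling becomes the next unit of work
    node = node.return;                    // otherwise climb back to the parent
  }
  return null;                             // the root has completed: render phase is done
}

// Placeholder implementations: in React these diff the fiber and collect effects.
function beginWork(fiber) {
  // ...reconcile this fiber's children...
  return fiber.child || null;              // return the first child to process, if any
}

function completeWork(fiber) {
  // ...record this fiber's side-effects on the effect list...
}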

The render phase produces an effect list (side-effects). These effects are operations like inserting, updating, or deleting a node for host components, or calling the lifecycle methods for class components. The fibers are marked with the respective effect tags.

    After the render phase, Fiber will be ready to commit the updates. 

    Commit Phase

    This is the phase where the finished work will be used to render it on the UI. As the result of this phase will be visible to the user, it can’t be divided in partial renders. This phase is a synchronous phase. 

    At the beginning of this phase, Fiber has the current tree (the one already rendered on the UI), the finishedWork (or workInProgress) tree built during the render phase, and the effect list.

    The effect list is a linked list of fibers that have side-effects. It is therefore a subset of the nodes of the workInProgress tree from the render phase, namely those with updates. The nodes of the effect list are linked through a nextEffect pointer, as sketched below.
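
    As a toy illustration (not React's actual data structures), the effect list can be pictured as nodes chained through a nextEffect pointer and walked linearly at commit time:

    class EffectNode:
        """Toy stand-in for a fiber that carries a side-effect."""
        def __init__(self, name, effect_tag):
            self.name = name
            self.effect_tag = effect_tag   # e.g. "PLACEMENT", "UPDATE", "DELETION"
            self.nextEffect = None

    def commit_effects(first_effect):
        # Walk the singly linked effect list and apply each effect in order.
        node = first_effect
        while node is not None:
            print("commit", node.effect_tag, "on", node.name)
            node = node.nextEffect

    update = EffectNode("input", "UPDATE")
    placement = EffectNode("listItem", "PLACEMENT")
    update.nextEffect = placement
    commit_effects(update)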

    The function called during this phase is completeRoot.

    Here, the workInProgress tree becomes the current tree, as it is the one used to render the UI. The actual DOM updates (insert, update, delete), calls to lifecycle methods, and updates related to refs happen for the nodes present in the effect list.

    That’s how the Fiber reconciler works.

    Conclusion

    This is how the React Fiber reconciler makes it possible to divide work into multiple units, set a priority for each, and pause, reuse, or abort a unit of work. In the fiber tree, each node keeps track of the information needed to make this possible. Every fiber is a node in a linked structure, connected through child, sibling, and return references.

    Here is a well-documented list of resources to learn more about React Fiber.

    Related Articles

    1. Using Formik To Build Dynamic Forms In React – Faster & Better

    2. Cleaner, Efficient Code with Hooks and Functional Programming

  • A Beginner’s Guide to Python Tornado

    The web is a big place now; we need to support thousands of clients at a time, and this is where Tornado comes in. Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.

    Tornado uses non-blocking network I/O and can therefore handle thousands of active server connections. It is a saviour for applications that use long polling or maintain a large number of active connections.

    Tornado is not like most Python frameworks. It is not based on WSGI, although it supports some WSGI features through the `tornado.wsgi` module. It uses an event-loop design that makes request execution in Tornado fast.

    What is a Synchronous Program?

    A synchronous function blocks, performs its computation, and returns once it is done. A function may block for many reasons: network I/O, disk I/O, mutexes, etc.

    Application performance depends on how efficiently the application uses CPU cycles, which is why blocking statements and calls must be taken seriously. Consider password-hashing functions like bcrypt, which by design use hundreds of milliseconds of CPU time, far more than a typical network or disk access. Since the CPU is busy rather than idle during such a call, there is little to gain from making it asynchronous.

    A function can be blocking in one respect and non-blocking in others. In the context of Tornado, we generally care about blocking due to network and disk I/O, although all kinds of blocking should be minimized.

    What is an Asynchronous Program?

    An asynchronous program such as Tornado has the following characteristics:

    1) Single-threaded architecture:

        This means it cannot run computation-heavy tasks in parallel.

    2) I/O concurrency:

        It can hand over I/O tasks to the operating system and move on to the next task, achieving I/O concurrency.

    3) epoll/kqueue:

        Underlying operating-system mechanisms that let an application receive events on file descriptors, i.e., on I/O-specific tasks.

    4) Event loop:

        It uses epoll or kqueue to check whether any event has occurred and executes the callbacks waiting for those events (see the sketch below).
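
    As a tiny illustration of the event-loop idea using Tornado's IOLoop (the callback and delay here are just example values):

    from tornado.ioloop import IOLoop

    def on_timer():
        # Executed by the event loop once the timer event fires.
        print("callback fired by the event loop")
        IOLoop.current().stop()

    loop = IOLoop.current()
    loop.call_later(1, on_timer)   # register a callback for a future event
    print("callback registered; the loop stays free until the event occurs")
    loop.start()                   # run the event loop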

    Asynchronous vs Synchronous Web Framework:

    In the synchronous model, each request or task is handed to a thread or routine, and when it finishes the result is handed back to the caller. This is easy to manage, but creating new threads adds a lot of overhead.

    On the other hand, an asynchronous framework like Node.js uses a single-threaded model, so there is far less overhead, but it comes with added complexity.

    Imagine thousands of requests coming in to a server that uses an event loop and callbacks. Until a request finishes processing, the server has to efficiently store and manage its state so that the callback result can be mapped back to the right client. A toy illustration of this bookkeeping follows.
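
    A toy way to picture that bookkeeping (purely illustrative, not Tornado internals) is a map from in-flight request IDs to the callbacks of the clients waiting on them:

    # Toy illustration of callback bookkeeping in an event-driven server.
    pending = {}  # request_id -> callback for the client that issued it

    def start_request(request_id, reply_to_client):
        pending[request_id] = reply_to_client      # remember who is waiting

    def on_io_complete(request_id, result):
        reply_to_client = pending.pop(request_id)  # map the result back to its client
        reply_to_client(result)

    start_request(42, lambda body: print("reply to client 42:", body))
    on_io_complete(42, "<html>...</html>")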

    Node.js vs Tornado

    Most of these comparison points are tied to the underlying programming language rather than the framework itself:

    • Node.js has one big advantage: virtually all of its libraries are async. Python has a huge number of packages, but very few of them are asynchronous.
    • Because Node.js is a JavaScript runtime and JS can be used for both the front end and the back end, developers can keep a single codebase and share the same utility libraries.
    • Google’s V8 engine makes Node.js faster than Tornado, but many Python libraries are written in C and can be faster alternatives.

    A Simple ‘Hello World’ Example

    import tornado.ioloop
    import tornado.web
    
    class MainHandler(tornado.web.RequestHandler):
        def get(self):
            self.write("Hello, world")
    
    def make_app():
        return tornado.web.Application([
            (r"/", MainHandler),
        ])
    
    if __name__ == "__main__":
        app = make_app()
        app.listen(8888)
        tornado.ioloop.IOLoop.current().start()

    Note: This example does not use any asynchronous feature.

    Using the AsyncHTTPClient module, we can make REST calls asynchronously.

    from tornado.httpclient import AsyncHTTPClient
    from tornado import gen
    
    @gen.coroutine
    def async_fetch_gen(url):
        http_client = AsyncHTTPClient()
        response = yield http_client.fetch(url)
        raise gen.Return(response.body)

    As you can see, `yield http_client.fetch(url)` runs the fetch as part of a coroutine.
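
    For instance, one way to drive this coroutine from a plain script (assuming the async_fetch_gen function above and a reachable example URL) is to let the IOLoop run it to completion:

    from tornado.ioloop import IOLoop

    # run_sync starts the loop, runs the coroutine, and returns its result.
    body = IOLoop.current().run_sync(lambda: async_fetch_gen("https://www.example.com"))
    print(body[:80])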

    Complex Example of Tornado Async

    Please have a look at the Asynchronous Request handler.

    WebSockets Using Tornado:

    Tornado has a built-in package for WebSockets that can easily be used with coroutines to achieve concurrency. Here is an example:

    import logging
    import tornado.escape
    import tornado.ioloop
    import tornado.options
    import tornado.web
    import tornado.websocket
    from tornado.options import define, options
    from tornado.httpserver import HTTPServer
    
    define("port", default=8888, help="run on the given port", type=int)
    
    
    
    def isPrime(num):
        """
        Simple worker; in a real app this would mostly be an I/O or network call
        """
        if num > 1:
            # check divisors from 2 up to num // 2 (inclusive)
            for i in range(2, num // 2 + 1):
                if (num % i) == 0:
                    return "is not a prime number"
            else:
                # for-else: runs only when the loop finds no divisor
                return "is a prime number"
        else:
            return "is not a prime number"
    
    class Application(tornado.web.Application):
        def __init__(self):
            handlers = [(r"/chatsocket", TornadoWebSocket)]
            super(Application, self).__init__(handlers)
    
    class TornadoWebSocket(tornado.websocket.WebSocketHandler):
        clients = set()
    
        # enable cross domain origin
        def check_origin(self, origin):
            return True
    
        def open(self):
            TornadoWebSocket.clients.add(self)
    
        # when client closes connection
        def on_close(self):
            TornadoWebSocket.clients.remove(self)
    
        @classmethod
        def send_updates(cls, producer, result):
    
            for client in cls.clients:
    
                # check if result is mapped to correct sender
                if client == producer:
                    try:
                        client.write_message(result)
                    except:
                        logging.error("Error sending message", exc_info=True)
    
        def on_message(self, message):
            try:
                num = int(message)
            except ValueError:
                TornadoWebSocket.send_updates(self, "Invalid input")
                return
            TornadoWebSocket.send_updates(self, isPrime(num))
    
    def start_websockets():
        tornado.options.parse_command_line()
        app = Application()
        server = HTTPServer(app)
        server.listen(options.port)
        tornado.ioloop.IOLoop.current().start()
    
    
    
    if __name__ == "__main__":
        start_websockets()

    One can use a WebSocket client application to connect to the server; the message can be any integer. After processing, the client receives a reply saying whether the integer is prime or not.
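
    For example, a minimal Tornado-based client could look like the sketch below; it assumes the server above is running locally on port 8888, and the number sent is arbitrary:

    from tornado import gen
    from tornado.ioloop import IOLoop
    from tornado.websocket import websocket_connect

    @gen.coroutine
    def check_number(num):
        # Connect to the WebSocket server started above and send one integer.
        conn = yield websocket_connect("ws://localhost:8888/chatsocket")
        conn.write_message(str(num))
        reply = yield conn.read_message()
        print(num, reply)
        conn.close()

    IOLoop.current().run_sync(lambda: check_number(17))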
    Here is one more example of Tornado's actual async features; many will find it similar to Go's goroutines and channels.

    In this example, we can start worker(s) that listen on a `tornado.queues.Queue`. This queue is asynchronous and very similar to the one in the asyncio package.

    # Example 1
    from tornado import gen, queues
    from tornado.ioloop import IOLoop
    
    @gen.coroutine
    def consumer(queue, num_expected):
        for _ in range(num_expected):
            # heavy I/O or network task
            print('got: %s' % (yield queue.get()))
    
    
    @gen.coroutine
    def producer(queue, num_items):
        for i in range(num_items):
            print('putting %s' % i)
            yield queue.put(i)
    
    @gen.coroutine
    def main():
        """
        Starts producer and consumer and wait till they finish
        """
        yield [producer(q, producer_num_items), consumer(q, producer_num_items)]
    
    queue_size = 1
    producer_num_items = 5
    q = queues.Queue(queue_size)
    
    results = IOLoop.current().run_sync(main)
    
    
    # Output:
    # putting 0
    # putting 1
    # got: 0
    # got: 1
    # putting 2
    # putting 3
    # putting 4
    # got: 2
    # got: 3
    # got: 4
    
    
    # Example 2
    # Condition
    # A condition allows one or more coroutines to wait until notified.
    from tornado import gen
    from tornado.ioloop import IOLoop
    from tornado.locks import Condition
    
    my_condition = Condition()
    
    @gen.coroutine
    def waiter():
        print("I'll wait right here")
        yield my_condition.wait()
        print("Received notification now doing my things")
    
    @gen.coroutine
    def notifier():
        yield gen.sleep(60)
        print("About to notify")
        my_condition.notify()
        print("Done notifying")
    
    @gen.coroutine
    def runner():
        # Wait for waiter() and notifier() in parallel
        yield([waiter(), notifier()])
    
    results = IOLoop.current().run_sync(runner)
    
    
    # output:
    
    # I'll wait right here
    # About to notify
    # Done notifying
    # Received notification now doing my things

    Conclusion

    1) Asynchronous frameworks are not of much use when most of the computation is CPU-bound rather than I/O-bound.

    2) With a single-threaded event loop (typically one process per core), Tornado can manage thousands of active client connections.

    3) Many say Django is too big, Flask is too small, and Tornado is just right:)

  • Publish APIs For Your Customers: Deploy Serverless Developer Portal For Amazon API Gateway

    Amazon API Gateway is a fully managed service that allows you to create, secure, publish, test and monitor your APIs. We often come across scenarios where customers of these APIs expect a platform to learn and discover APIs that are available to them (often with examples).

    The Serverless Developer Portal is one such application that is used for developer engagement by making your APIs available to your customers. Further, your customers can use the developer portal to subscribe to an API, browse API documentation, test published APIs, monitor their API usage, and submit their feedback.

    This blog is a detailed step-by-step guide for deploying the Serverless Developer Portal for APIs that are managed via Amazon API Gateway.

    Advantages

    The users of the Amazon API Gateway can be broadly categorized as –

    API Publishers – They can use the Serverless Developer Portal to expose and secure their APIs for customers, and can integrate it with AWS Marketplace for monetization. Furthermore, they can customize the developer portal, including content, styling, logos, custom domains, etc.

    API Consumers – They could be Frontend/Backend developers, third party customers, or simply students. They can explore available APIs, invoke the APIs, and go through the documentation to get an insight into how each API works with different requests. 

    Developer Portal Architecture

    We first need to establish a basic understanding of how the developer portal works. The Serverless Developer Portal is a serverless application built on a microservice architecture using Amazon API Gateway, Amazon Cognito, AWS Lambda, Amazon Simple Storage Service (S3) and Amazon CloudFront.

    The developer portal comprises multiple microservices and components as described in the following figure.

    Source: AWS

    There are a few key pieces in the above architecture –

    1. Identity Management: Amazon Cognito is basically the secure user directory of the developer portal responsible for user management. It allows you to configure triggers for registration, authentication, and confirmation, thereby giving you more control over the authentication process. 
    2. Business Logic: Amazon CloudFront is configured to serve the static content hosted in a private S3 bucket. The static content is built with React and interacts with the backend APIs that implement the business logic for various events.
    3. Catalog Management: The developer portal uses a catalog to render the APIs with their Swagger specifications on the APIs page. The catalog file (catalog.json in the S3 artifacts bucket) is updated whenever an API is published or removed. This is achieved with an S3 trigger on an AWS Lambda function that reads the contents of the catalog directory and regenerates the catalog for the developer portal.
    4. API Key Creation: An API key is created for each consumer at registration time. Whenever you subscribe to an API, the associated usage plan is attached to your API key, giving you access to those APIs as defined by the plan. The Cognito user to API key mapping is stored in a DynamoDB table along with other registration details.
    5. Static Asset Uploader: AWS Lambda (Static-Asset-Uploader) is responsible for updating/deploying static assets for the developer portal. Static assets include – content, logos, icons, CSS, JavaScripts, and other media files.

    Let’s move forward to building and deploying a simple Serverless Developer Portal.

    Building Your API

    Start with deploying an API which can be accessed using API Gateway from 

    https://<api-id>.execute-api.region.amazonaws.com/stage

    If you do not have any such API available, create a simple application by jumping to the section, “API Performance Across the Globe,” on this blog.

    Setup custom domain name

    For professional projects, I recommend creating a custom domain name, as it provides simpler and more intuitive URLs for your API users.

    Make sure your API Gateway domain name is updated in the Route53 record set created after you set up your custom domain name. 

    See more on Setting up custom domain names for REST APIs – Amazon API Gateway

    Enable CORS for an API Resource

    There are two ways you can enable CORS on a resource:

    1. Enable CORS Using the Console
    2. Enable CORS on a resource using the import API from Amazon API Gateway

    Let’s discuss the easiest way to do it, using the console.

    1. Open API Gateway console.
    2. Select the API Gateway for your API from the list.
    3. Choose a resource to enable CORS for all the methods under that resource.
      Alternatively, you could choose a method under the resource to enable CORS for just this method.
    4. Select Enable CORS from the Actions drop-down menu.
    5. In the Enable CORS form, do the following:
      – Leave the Access-Control-Allow-Headers and Access-Control-Allow-Origin headers at their default values.
      – Click on Enable CORS and replace existing CORS headers.
    6. Review the changes in Confirm method changes popup, choose Yes, overwrite existing values to apply your CORS settings.

    Once enabled, you can see a mock integration on the OPTIONS method for the selected resource. You must enable CORS for ${proxy} resources too. 

    To verify that CORS is enabled on the API resource, try a curl request against the OPTIONS method:

    curl -v -X OPTIONS -H "Access-Control-Request-Method: POST" -H "Origin: http://example.com" https://api-id.execute-api.region.amazonaws.com/stage
    

    You should see a 200 OK response that includes the CORS headers:

    < HTTP/1.1 200 OK
    < Content-Type: application/json
    < Content-Length: 0
    < Connection: keep-alive
    < Date: Mon, 13 Apr 2020 16:27:44 GMT
    < x-amzn-RequestId: a50b97b5-2437-436c-b99c-22e00bbe9430
    < Access-Control-Allow-Origin: *
    < Access-Control-Allow-Headers: Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token
    < x-amz-apigw-id: K7voBHDZIAMFu9g=
    < Access-Control-Allow-Methods: DELETE,GET,HEAD,OPTIONS,PATCH,POST,PUT
    < X-Cache: Miss from cloudfront
    < Via: 1.1 1c8c957c4a5bf1213bd57bd7d0ec6570.cloudfront.net (CloudFront)
    < X-Amz-Cf-Pop: BOM50-C1
    < X-Amz-Cf-Id: OmxFzV2-TH2BWPVyOohNrhNlJ-s1ZhYVKyoJaIrA_zyE9i0mRTYxOQ==

    Deploy Developer Portal

    There are two ways to deploy the developer portal for your API. 

    Using SAR

    An easy way is to deploy api-gateway-dev-portal directly from the AWS Serverless Application Repository.

    Note – If you intend to upgrade your developer portal to a new major version, refer to the Upgrading Instructions, which are currently under development.

    Using AWS SAM

    1. Ensure that you have the latest AWS CLI and AWS SAM CLI installed and configured.
    2. Download or clone the API Gateway Serverless Developer Portal repository.
    3. Update the Cloudformation template file – cloudformation/template.yaml.

    Parameters you must configure and verify include:

    • ArtifactsS3BucketName
    • DevPortalSiteS3BucketName
    • DevPortalCustomersTableName
    • DevPortalPreLoginAccountsTableName
    • DevPortalAdminEmail
    • DevPortalFeedbackTableName
    • CognitoIdentityPoolName
    • CognitoDomainNameOrPrefix
    • CustomDomainName
    • CustomDomainNameAcmCertArn
    • UseRoute53Nameservers
    • AccountRegistrationMode

    You can view your template file in AWS Cloudformation Designer to get a better idea of all the components/services involved and how they are connected.

    See Developer portal settings for more information about parameters.

    4. Replace the static files in your project with the ones you would like to use. They live under:
      – dev-portal/public/custom-content
      – lambdas/static-asset-uploader/build
      api-logo contains the logos you would like to show on the APIs page (in PNG format). The portal looks for an api-id_stage.png file when rendering an API page; if it is not found, it falls back to the default logo, default.png.
      content-fragments includes the markdown files that make up the content of the different pages of the portal.
      Other static assets, including favicon.ico, home-image.png and nav-logo.png, also appear on your portal.
    5. Create a ZIP file of your code and dependencies and upload it to Amazon S3. Running the command below creates an AWS SAM template, packaged.yaml, replacing references to local artifacts with the Amazon S3 locations where the command uploaded them:
    sam package --template-file ./cloudformation/template.yaml --output-template-file ./cloudformation/packaged.yaml --s3-bucket {your-lambda-artifacts-bucket-name}

    6. Run the following command from the project root to deploy your portal, replacing:
      – {your-template-bucket-name}
      with the name of your Amazon S3 bucket.
      – {custom-prefix}
      with a prefix that is globally unique.
      – {cognito-domain-or-prefix}
      with a unique string.
    sam deploy --template-file ./cloudformation/packaged.yaml --s3-bucket {your-template-bucket-name} --stack-name "{custom-prefix}-dev-portal" --capabilities CAPABILITY_NAMED_IAM

    Note: Ensure that you have the required privileges to make deployments; during the deployment process, the stack creates various resources such as AWS Lambda functions, a Cognito User Pool, IAM roles, an API Gateway, a CloudFront distribution, etc.

    After your developer portal has been fully deployed, you can get its URL as follows.

    1. Open the AWS CloudFormation console.
    2. Select the stack you created above.
    3. Open the Outputs section. The URL for the developer portal is specified in the WebSiteURL property (you can also fetch it programmatically, as sketched below).
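
    If you prefer to script this step, a small boto3 sketch (assuming your AWS credentials are configured and using the example stack name from the deploy command above) could read the output like this:

    import boto3

    # Fetch the WebSiteURL output from the developer portal stack.
    cfn = boto3.client("cloudformation")
    stack = cfn.describe_stacks(StackName="{custom-prefix}-dev-portal")["Stacks"][0]
    for output in stack["Outputs"]:
        if output["OutputKey"] == "WebSiteURL":
            print(output["OutputValue"])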

    Create Usage Plan

    Create a usage plan to list your API under the subscribable APIs category, allowing consumers to access the API with their API keys in the developer portal. Ensure that the API Gateway stage is associated with the usage plan.

    Publishing an API

    Only Administrators have permission to publish an API. To create an Administrator account for your developer portal –

    1. Go to the WebSiteURL obtained after the successful deployment. 

    2. On the top right of the home page click on Register.

    Source: Github

    3. Fill in the registration form and hit Sign up.

    4. Enter the confirmation code sent to the email address you provided in the previous step.

    5. Promote the user as Administrator by adding it to AdminGroup. 

    • Open Amazon Cognito User Pool console.
    • Select the User Pool created for your developer portal.
    • From the General Settings > Users and Groups page, select the User you want to promote as Administrator.
    • Click on Add to group and then select the Admin group from the dropdown and confirm.

    6. You will need to log in again to continue as an Administrator. Click on the Admin Panel and choose the API you wish to publish from the APIs list.

    Setting up an account

    The signup process depends on the registration mode selected for the developer portal. 

    For request registration mode, you need to wait for the Administrator to approve your registration request.

    For invite registration mode, you can only register on the portal when invited by the portal administrator. 

    Subscribing to an API

    1. Sign in to the developer portal.
    2. Navigate to the Dashboard page and copy your API key.
    3. Go to APIs Page to see a list of published APIs.
    4. Select an API you wish to subscribe to and hit the Subscribe button.

    Tips

    1. When a user subscribes to an API, all the APIs attached to that usage plan become accessible, whether or not they are published in the portal.
    2. Whenever you subscribe to an API, the catalog is exported from the API Gateway resource documentation. You can customize the workflow or override the catalog Swagger definition JSON in the S3 bucket defined by ArtifactsS3BucketName, under /catalog/<api-id>_<stage>.json.
    3. For backend APIs, CORS requests are allowed only from custom domain names selected for your developer portal.
    4. Ensure that the published APIs set the CORS response headers so that they can be invoked from the developer portal.

    Summary

    You’ve seen how to deploy a Serverless Developer Portal and publish an API. If you are creating a serverless application for the first time, you might want to read more on Serverless Computing and AWS Gateway before you get started. 

    Start building your own developer portal. To learn more about distributing your API Gateway APIs to your customers, follow this AWS guide.