Category: Industry

  • API Testing Using Postman and Newman

    In the last few years, we have an exponential increase in the development and use of APIs. We are in the era of API-first companies like Stripe, Twilio, Mailgun etc. where the entire product or service is exposed via REST APIs. Web applications also today are powered by REST-based Web Services. APIs today encapsulate critical business logic with high SLAs. Hence it is important to test APIs as part of the continuous integration process to reduce errors, improve predictability and catch nasty bugs.

    In the context of API development, Postman is great REST client to test APIs. Although Postman is not just a REST Client, it contains a full-featured testing sandbox that lets you write and execute Javascript based tests for your API.

    Postman comes with a nifty CLI tool – Newman. Newman is the Postman’s Collection Runner engine that sends API requests, receives the response and then runs your tests against the response. Newman lets developments easily integrate Postman into continuous integration systems like Jenkins. Some of the important features of Postman & Newman include:-

    1. Ability to test any API and see the response instantly.
    2. Ability to create test suites or collections using a collection of API endpoints.
    3. Ability to collaborate with team members on these collections.
    4. Ability to easily export/import collections as JSON files.

    We are going to look at all these features, some are intuitive and some not so much unless you’ve been using Postman for a while.

    Setting up Your Postman

    You can install Postman either as a Chrome extension or as a native application

    Later, can then look it up in your installed apps and open it. You can choose to Sign Up & create an account if you want, this is important especially for saving your API collections and accessing them anytime on any machine. However, for this article, we can skip this. There’s a button for that towards the bottom when you first launch the app.

    Postman Collections

    Postman Collections in simple words is a collection of tests. It is essentially a test suite of related tests. These tests can be scenario-based tests or sequence/workflow-based tests.

    There’s a Collections tab on the top left of Postman, with an example Postman Echo collection. You can open and go through it.

    Just like in the above screenshot, select a API request and click on the Tests. Check the first line:

    tests["response code is 200"] = responseCode.code === 200;

    The above line is a simple test to check if the response code for the API is 200. This is the pattern for writing Assertions/Tests in Postman (using JavaScript), and this is actually how you are going to write the tests for API’s need to be tested.You can open the other API requests in the POSTMAN Echo collection to get a sense of how requests are made.

    Adding a COLLECTION

    To make your own collection, click on the ‘Add Collection‘ button on the top left of Postman and call it “Test API”

    You will be prompted to give details about the collection, I’ve added a name Github API and given it a description.

    Clicking on Create should add the collection to the left pane, above, or below the example “POSTMAN Echo” collection.

    If you need a hierarchy for maintaining relevance between multiple API’s inside a collection, APIs can further be added to a folder inside a collection. Folders are a great way of separating different parts of your API workflow. You can be added folders through the “3 dot” button beside Collection Name:

    Eg.: name the folder “Get Calls” and give a description once again.

    Now that we have the folder, the next task is to add an API call that is related to the TEST_API_COLLECTION to that folder. That API call is to https://api.github.com/.

    If you still have one of the TEST_API_COLLECTION collections open, you can close it the same way you close tabs in a browser, or just click on the plus button to add a new tab on the right pane where we make requests.

    Type in or paste in https://api.github.com/ and press Send to see the response.

    Once you get the response, you can click on the arrow next to the Save button on the far right, and select Save As, a pop up will be displayed asking where to save the API call.

    Give a name, it can be the request URL, or a name like “GET Github Basic”, and a description, then choose the collection and folder, in this case, TEST_API_COLLECTION> GET CALLS, then click on Save. The API call will be added to the Github Root API folder on the left pane.

    Whenever you click on this request from the collection, it will open in the center pane.

    Write the Tests

    We’ve seen that the GET Github Basic request has a JSON response, which is usually the case for most of the APIs.This response has properties such as current_user_url, emails_url, followers_url and following_url to pick a few. The current_user_url has a value of https://api.github.com/user.  Let’s add a test, for this URL. Click on the ‘GET Github Basic‘ and click on the test tab in the section just below where the URL is put.

    You will notice on the right pane, we have some snippets which Postman creates when you click so that you don’t have to write a lot of code. Let’s add Response Body: JSON value check. Clicking on it produces the following snippet.

    var jsonData = JSON.parse(responseBody);
    tests["Your test name"] = jsonData.value === 100;

    From these two lines, it is apparent that Postman stores the response in a global object called responseBody, and we can use this to access response and assert values in tests as required.

    Postman also has another global variable object called tests, which is an object you can use to name your tests, and equate it to a boolean expression. If the boolean expression returns true, then the test passes.

    tests['some random test'] = x === y

    If you click on Send to make the request, you will see one of the tests failing.

    Lets create a test that relevant to our usecase.

    var jsonData = JSON.parse(responseBody);
    var usersURL = "https://api.github.com/user"
    tests["Gets the correct users url"] = jsonData.current_user_url === usersURL;

    Clicking on ‘Send‘, you’ll see the test passing.

    Let’s modify the test further to test some of the properties we want to check

    Ideally the things to be tested in an API Response Body should be:

    • Response Code ( Assert Correct Response Code for any request)
    • Response Time ( to check api responds in an acceptable time range / is not delayed)
    • Response Body is not empty / null
    tests["Status code is 200"] = responseCode.code === 200;
    tests["Response time is less than 200ms"] = responseTime < 200;
    tests["Response time is acceptable"] = _.inRange(responseTime, 0, 500);
    tests["Body is not empty"] = (responseBody!==null || responseBody.length!==0);

    Newman CLI

    Once you’ve set up all your collections and written tests for them, it may be tedious to go through them one by one and clicking send to see if a given collection test passes. This is where Newman comes in. Newman is a command-line collection runner for Postman.

    All you need to do is export your collection and the environment variables, then use Newman to run the tests from your terminal.

    NOTE: Make sure you’ve clicked on ‘Save’ to save your collection first before exporting.

    USING NEWMAN

    So the first step is to export your collection and environment variables. Click on the Menu icon for Github API collection, and select export.

    Select version 2, and click on “Export”

    Save the JSON file in a location you can access with your terminal. I created a local directory/folder called “postman” and saved it there.

    Install Newman CLI globally, then navigate to the where you saved the collection.

    npm install -g newman 
    cd postman

    Using Newman is quite straight-forward, and the documentation is extensive. You can even require it as a Node.js module and run the tests there. However, we will use the CLI.

    Once you are in the directory, run newman run <collection_name.json>, </collection_name.json> replacing the collection_name with the name you used to save the collection.

    newman run TEST_API_COLLECTION.postman_collection.json     

    NEWMAN CLI Options

    Newman provides a rich set of options to customize a run. A list of options can be retrieved by running it with the -h flag.

    
    $ newman run -h
    Options - Additional args: 
    Utility:
    -h, --help output usage information
    -v, --version output the version number
    Basic setup:
    --folder [folderName] Specify a single folder to run from a collection.
    -e, --environment [file|URL] Specify a Postman environment as a JSON [file]
    -d, --data [file] Specify a data file to use either json or csv
    -g, --global [file] Specify a Postman globals file as JSON [file]
    -n, --iteration-count [number] Define the number of iterations to run
    Request options:
    --delay-request [number] Specify a delay (in ms) between requests [number] --timeout-request [number] Specify a request timeout (in ms) for a request
    Misc.:
    --bail Stops the runner when a test case fails
    --silent Disable terminal output --no-color Disable colored output
    -k, --insecure Disable strict ssl
    -x, --suppress-exit-code Continue running tests even after a failure, but exit with code=0
    --ignore-redirects Disable automatic following of 3XX responses

    Lets try out of some of the options.

    Iterations

    Lets use the -n option to set the number of iterations to run the collection.

    $ newman run mycollection.json -n 10 # runs the collection 10 times

    To provide a different set of data, i.e. variables for each iteration, you can use the -d to specify a JSON or CSV file. For example, a data file such as the one shown below will run 2 iterations, with each iteration using a set of variables.

    [{
    "url": "http://127.0.0.1:5000",
      "user_id": "1",
      "id": "1",
      "token_id": "123123",
    },{
      "url": "http://postman-echo.com",
      "user_id": "2",
      "id": "2",
      "token_id": "899899",
    }]$ newman run mycollection.json -d data.json

    Alternately, the CSV file for the above set of variables would look like:

    url, user_id, id, token_id 
    http://127.0.0.1:5000, 1, 1, 123123123 
    http://postman-echo.com, 2, 2, 899899

    Environment Variables

    Each environment is a set of key-value pairs, with the key as the variable name. These Environment configurations can be used to differentiate between configurations specific to your execution environments eg. Dev, Test & Production.

    To provide a different execution environment, you can use the -e to specify a JSON or CSV file. For example, a environment file such as the one shown below will provide the environment variables globally to all tests during execution.

    postman_dev_env.json
    {
    "id": "b5c617ad-7aaf-6cdf-25c8-fc0711f8941b",
    "name": "dev env",
    "values": [
    {
    "enabled": true,
    "key": "env",
    "value": "dev.example.com",
    "type": "text"
    }  
    ],
    "timestamp": 1507210123364,
    "_postman_variable_scope": "environment",
    "_postman_exported_at": "2017-10-05T13:28:45.041Z",
    "_postman_exported_using": "Postman/5.2.1"
    }

    Bail FLAG

    Newman, by default, exits with a status code of 0 if everything runs well i.e. without any exceptions. Continuous integration tools respond to these exit codes and correspondingly pass or fail a build. You can use the –bail flag to tell Newman to halt on a test case error with a status code of 1 which can then be picked up by a CI tool or build system.

    $ newman run PostmanCollection.json -e environment.json --bail newman

    Conclusion

    Postman and Newman can be used for a number of test cases, including creating usage scenarios, Suites, Packs for your API Test Cases. Further NEWMAN / POSTMAN can be very well Integrated with CI/CD Tools such as Jenkins, Travis etc.

  • A Practical Guide to Deploying Multi-tier Applications on Google Container Engine (GKE)

    Introduction

    All modern era programmers can attest that containerization has afforded more flexibility and allows us to build truly cloud-native applications. Containers provide portability – ability to easily move applications across environments. Although complex applications comprise of many (10s or 100s) containers. Managing such applications is a real challenge and that’s where container orchestration and scheduling platforms like Kubernetes, Mesosphere, Docker Swarm, etc. come into the picture. 
    Kubernetes, backed by Google is leading the pack given that Redhat, Microsoft and now Amazon are putting their weight behind it.

    Kubernetes can run on any cloud or bare metal infrastructure. Setting up & managing Kubernetes can be a challenge but Google provides an easy way to use Kubernetes through the Google Container Engine(GKE) service.

    What is GKE?

    Google Container Engine is a Management and orchestration system for Containers. In short, it is a hosted Kubernetes. The goal of GKE is to increase the productivity of DevOps and development teams by hiding the complexity of setting up the Kubernetes cluster, the overlay network, etc.

    Why GKE? What are the things that GKE does for the user?

    • GKE abstracts away the complexity of managing a highly available Kubernetes cluster.
    • GKE takes care of the overlay network
    • GKE also provides built-in authentication
    • GKE also provides built-in auto-scaling.
    • GKE also provides easy integration with the Google storage services.

    In this blog, we will see how to create your own Kubernetes cluster in GKE and how to deploy a multi-tier application in it. The blog assumes you have a basic understanding of Kubernetes and have used it before. It also assumes you have created an account with Google Cloud Platform. If you are not familiar with Kubernetes, this guide from Deis  is a good place to start.

    Google provides a Command-line interface (gcloud) to interact with all Google Cloud Platform products and services. gcloud is a tool that provides the primary command-line interface to Google Cloud Platform. Gcloud tool can be used in the scripts to automate the tasks or directly from the command-line. Follow this guide to install the gcloud tool.

    Now let’s begin! The first step is to create the cluster.

    Basic Steps to create cluster

    In this section, I would like to explain about how to create GKE cluster. We will use a command-line tool to setup the cluster.

    Set the zone in which you want to deploy the cluster

    $ gcloud config set compute/zone us-west1-a

    Create the cluster using following command,

    $ gcloud container --project <project-name> 
    clusters create <cluster-name> 
    --machine-type n1-standard-2 
    --image-type "COS" --disk-size "50" 
    --num-nodes 2 --network default 
    --enable-cloud-logging --no-enable-cloud-monitoring

    Let’s try to understand what each of these parameters mean:

    –project: Project Name

    –machine-type: Type of the machine like n1-standard-2, n1-standard-4

    –image-type: OS image.”COS” i.e. Container Optimized OS from Google: More Info here.

    –disk-size: Disk size of each instance.

    –num-nodes: Number of nodes in the cluster.

    –network: Network that users want to use for the cluster. In this case, we are using default network.

    Apart from the above options, you can also use the following to provide specific requirements while creating the cluster:

    –scopes: Scopes enable containers to direct access any Google service without needs credentials. You can specify comma separated list of scope APIs. For example:

    • Compute: Lets you view and manage your Google Compute Engine resources
    • Logging.write: Submit log data to Stackdriver.

    You can find all the Scopes that Google supports here: .

    –additional-zones: Specify additional zones to high availability. Eg. –additional-zones us-east1-b, us-east1-d . Here Kubernetes will create a cluster in 3 zones (1 specified at the beginning and additional 2 here).

    –enable-autoscaling : To enable the autoscaling option. If you specify this option then you have to specify the minimum and maximum required nodes as follows; You can read more about how auto-scaling works here. Eg:   –enable-autoscaling –min-nodes=15 –max-nodes=50

    You can fetch the credentials of the created cluster. This step is to update the credentials in the kubeconfig file, so that kubectl will point to required cluster.

    $ gcloud container clusters get-credentials my-first-cluster --project project-name

    Now, your First Kubernetes cluster is ready. Let’s check the cluster information & health.

    $ kubectl get nodes
    NAME    STATUS    AGE   VERSION
    gke-first-cluster-default-pool-d344484d-vnj1  Ready  2h  v1.6.4
    gke-first-cluster-default-pool-d344484d-kdd7  Ready  2h  v1.6.4
    gke-first-cluster-default-pool-d344484d-ytre2  Ready  2h  v1.6.4

    After creating Cluster, now let’s see how to deploy a multi tier application on it. Let’s use simple Python Flask app which will greet the user, store employee data & get employee data.

    Application Deployment

    I have created simple Python Flask application to deploy on K8S cluster created using GKE. you can go through the source code here. If you check the source code then you will find directory structure as follows:

    TryGKE/
    ├── Dockerfile
    ├── mysql-deployment.yaml
    ├── mysql-service.yaml
    ├── src    
      ├── app.py    
      └── requirements.txt    
      ├── testapp-deployment.yaml    
      └── testapp-service.yaml

    In this, I have written a Dockerfile for the Python Flask application in order to build our own image to deploy. For MySQL, we won’t build an image of our own. We will use the latest MySQL image from the public docker repository.

    Before deploying the application, let’s re-visit some of the important Kubernetes terms:

    Pods:

    The pod is a Docker container or a group of Docker containers which are deployed together on the host machine. It acts as a single unit of deployment.

    Deployments:

    Deployment is an entity which manages the ReplicaSets and provides declarative updates to pods. It is recommended to use Deployments instead of directly using ReplicaSets. We can use deployment to create, remove and update ReplicaSets. Deployments have the ability to rollout and rollback the changes.

    Services:

    Service in K8S is an abstraction which will connect you to one or more pods. You can connect to pod using the pod’s IP Address but since pods come and go, their IP Addresses change.  Services get their own IP & DNS and those remain for the entire lifetime of the service. 

    Each tier in an application is represented by a Deployment. A Deployment is described by the YAML file. We have two YAML files – one for MySQL and one for the Python application.

    1. MySQL Deployment YAML

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: mysql
    spec:
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
            - env:
                - name: MYSQL_DATABASE
                  value: admin
                - name: MYSQL_ROOT_PASSWORD
                  value: admin
              image: 'mysql:latest'
              name: mysql
              ports:
                - name: mysqlport
                  containerPort: 3306
                  protocol: TCP

    2. Python Application Deployment YAML

    apiVersion: apps/v1beta1
    kind: Deployment
    metadata:
      name: test-app
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: test-app
        spec:
          containers:
          - name: test-app
            image: ajaynemade/pymy:latest
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 5000

    Each Service is also represented by a YAML file as follows:

    1. MySQL service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: mysql-service
    spec:
      ports:
      - port: 3306
        targetPort: 3306
        protocol: TCP
        name: http
      selector:
        app: mysql

    2. Python Application service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: test-service
    spec:
      type: LoadBalancer
      ports:
      - name: test-service
        port: 80
        protocol: TCP
        targetPort: 5000
      selector:
        app: test-app

    You will find a ‘kind’ field in each YAML file. It is used to specify whether the given configuration is for deployment, service, pod, etc.

    In the Python app service YAML, I am using type = LoadBalancer. In GKE, There are two types of cloud load balancers available to expose the application to outside world.

    1. TCP load balancer: This is a TCP Proxy-based load balancer. We will use this in our example.
    2. HTTP(s) load balancer: It can be created using Ingress. For more information, refer to this post that talks about Ingress in detail.

    In the MySQL service, I’ve not specified any type, in that case, type ‘ClusterIP’ will get used, which will make sure that MySQL container is exposed to the cluster and the Python app can access it.

    If you check the app.py, you can see that I have used “mysql-service.default” as a hostname. “Mysql-service.default” is a DNS name of the service. The Python application will refer to that DNS name while accessing the MySQL Database.

    Now, let’s actually setup the components from the configurations. As mentioned above, we will first create services followed by deployments.

    Services:

    $ kubectl create -f mysql-service.yaml
    $ kubectl create -f testapp-service.yaml

    Deployments:

    $ kubectl create -f mysql-deployment.yaml
    $ kubectl create -f testapp-deployment.yaml

    Check the status of the pods and services. Wait till all pods come to the running state and Python application service to get external IP like below:

    $ kubectl get services
    NAME            CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
    kubernetes      10.55.240.1     <none>        443/TCP        5h
    mysql-service   10.55.240.57    <none>        3306/TCP       1m
    test-service    10.55.246.105   35.185.225.67     80:32546/TCP   11s

    Once you get the external IP, then you should be able to make APIs calls using simple curl requests.

    Eg. To Store Data :

    curl -H "Content-Type: application/x-www-form-urlencoded" -X POST  http://35.185.225.67:80/storedata -d id=1 -d name=NoOne

    Eg. To Get Data :

    curl 35.185.225.67:80/getdata/1

    At this stage your application is completely deployed and is externally accessible.

    Manual scaling of pods

    Scaling your application up or down in Kubernetes is quite straightforward. Let’s scale up the test-app deployment.

    $ kubectl scale deployment test-app --replicas=3

    Deployment configuration for test-app will get updated and you can see 3 replicas of test-app are running. Verify it using,

    kubectl get pods

    In the same manner, you can scale down your application by reducing the replica count.

    Cleanup :

    Un-deploying an application from Kubernetes is also quite straightforward. All we have to do is delete the services and delete the deployments. The only caveat is that the deletion of the load balancer is an asynchronous process. You have to wait until it gets deleted.

    $ kubectl delete service mysql-service
    $ kubectl delete service test-service

    The above command will deallocate Load Balancer which was created as a part of test-service. You can check the status of the load balancer with the following command.

    $ gcloud compute forwarding-rules list

    Once the load balancer is deleted, you can clean-up the deployments as well.

    $ kubectl delete deployments test-app
    $ kubectl delete deployments mysql

    Delete the Cluster:

    $ gcloud container clusters delete my-first-cluster

    Conclusion

    In this blog, we saw how easy it is to deploy, scale & terminate applications on Google Container Engine. Google Container Engine abstracts away all the complexity of Kubernetes and gives us a robust platform to run containerised applications. I am super excited about what the future holds for Kubernetes!

    Check out some of Velotio’s other blogs on Kubernetes.

  • An Introduction To Cloudflare Workers And Cloudflare KV store

    Cloudflare Workers

    This post gives a brief introduction to Cloudflare Workers and Cloudflare KV store. They address a fairly common set of problems around scaling an application globally. There are standard ways of doing this but they usually require a considerable amount of upfront engineering work and developers have to be aware of the ‘scalability’ issues to some degree. Serverless application tools target easy scalability and quick response times around the globe while keeping the developers focused on the application logic rather than infra nitty-gritties.

    Global responsiveness

    When an application is expected to be accessed around the globe, requests from users sitting in different time-zones should take a similar amount of time. There can be multiple ways of achieving this depending upon how data intensive the requests are and what those requests actually do.

    Data intensive requests are harder and more expensive to globalize, but again not all the requests are same. On the other hand, static requests like getting a documentation page or a blog post can be globalized by generating markup at build time and deploying them on a CDN.

    And there are semi-dynamic requests. They render static content either with some small amount of data or their content change based on the timezone the request came from.

    The above is a loose classification of requests but there are exceptions, for example, not all the static requests are presentational.

    Serverless frameworks are particularly useful in scaling static and semi-static requests.

    Cloudflare Workers Overview

    Cloudflare worker is essentially a function deployment service. They provide a serverless execution environment which can be used to develop and deploy small(although not necessarily) and modular cloud functions with minimal effort.

    It is very trivial to start with workers. First, lets install wrangler, a tool for managing Cloudfare Worker projects.

    npm i @cloudflare/wrangler -g

    Wrangler handles all the standard stuff for you like project generation from templates, build, config, publishing among other things.

    A worker primarily contains 2 parts: an event listener that invokes a worker and an event handler that returns a response object. Creating a worker is as easy as adding an event listener to a button.

    addEventListener('fetch', event => {
        event.respondWith(handleRequest(event.request))
    })
    
    async function handleRequest(request) {
        return new Response("hello world")
    }

    Above is a simple hello world example. Wrangler can be used to build and get a live preview of your worker.

    wrangler build

    will build your worker. And 

    wrangler preview 

    can be used to take a live preview on the browser. The preview is only meant to be used for testing(either by you or others). If you want the workers to be triggered by your own domain or a workers.dev subdomain, you need to publish it.

    Publishing is fairly straightforward and requires very less configuration on both wrangler and your project.

    Wrangler Configuration

    Just create an account on Cloudflare and get API key. To configure wrangler, just do:

    wrangler config

    It will ask for the registered email and API key, and you are good to go.

    To publish your worker on a workers.dev subdomain, just fill your account ID in the wrangler.toml and hit wrangler publish. The worker will be deployed and live at a generated workers.dev subdomain.

    Regarding Routes

    When you publish on a {script-name}.{subdomain}.workers.dev domain, the script or project associated with script-name will be invoked. There is no way to call a script just from {subdomain}.workers.dev.

    Worker KV

    Workers alone can’t be used to make anything complex without any persistent storage, that’s where Workers KV comes into the picture. Workers KV as it sounds, is a low-latency, high-volume, key-value store that is designed for efficient reads.

    It optimizes the read latency by dynamically spreading the most frequently read entries to the edges(replicated in several regions) and storing less frequent entries centrally.

    Newly added keys(or a CREATE) are immediately reflected in every region while a value change in the keys(or an UPDATE) may take as long as 60 seconds to propagate, depending upon the region.

    Workers KV is only available to paid users of Cloudflare.

    Writing Data in Workers KV

    curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces" 
    -X POST 
    -H "X-Auth-Email: $CLOUDFLARE_EMAIL" 
    -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" 
    -H "Content-Type: application/json" 
    --data '{"title": "Requests"}'
    The above HTTP request will create a namespace by the name Requests. The response should look something like this:
    {
        "result": {
            "id": "30b52f55aafb41d88546d01d5f69440a",
            "title": "Requests",
            "supports_url_encoding": true
        },
        "success": true,
        "errors": [],
        "messages": []
    }

    Now we can write KV pairs in this namespace. The following HTTP requests will do the same:

    curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/values/first-key" 
    -X PUT 
    -H "X-Auth-Email: $CLOUDFLARE_EMAIL" 
    -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" 
    --data 'My first value!'

    Here the NAMESPACE_ID is the same ID that we received in the last request. First-key is the key name and the My first value is the value.

    Let’s complicate things a little

    Above overview just introduces the managed cloud workers with a ‘hello world’ app and basics of the Workers KV, but now let’s make something more complicated. We will make an app which will tell how many requests have been made from your country till now. For example, if you pinged the worker from the US then it will return number of requests made so far from the US.

    We will need: 

    • Some place to store the count of requests for each country. 
    • Find from which country the Worker was invoked.

    For the first part, we will use the Workers KV to store the count for every request.

    Let’s start

    First, we will create a new project using wrangler: wrangler generate request-count.

    We will be making HTTP calls to write values in the Workers KV, so let’s add ‘node-fetch’ to the project:

    npm install node-fetch

    Now, how do we find from which country each request is coming from? The answer is the cf object that is provided with each request to a worker.

    The cf object is a special object that is passed with each request and can be accessed with request.cf. This mainly contains region specific information along with TLS and Auth information. The details of what is provided in the cf, can be found here.

    As we can see from the documentation, we can get country from

    request.cf.country.

    The cf object is not correctly populated in the wrangler preview, you will need to publish your worker in order to test cf’s usage. An open issue mentioning the same can be found here.

    Now, the logic is pretty straightforward here. When we get a request from a country for which we don’t have an entry in the Worker’s KV, we make an entry with value 1, else we increment the value of the country key.

    To use Workers KV, we need to create a namespace. A namespace is just a collection of key-value pairs where all the keys have to be unique.

    A namespace can be created under the KV tab in Cloudflare web UI by giving the name or using the API call above. You can also view/browse all of your namespaces from the web UI. Following API call can be used to read the value of a key from a namespace:

    curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/values/first-key" 
    -H "X-Auth-Email: $CLOUDFLARE_EMAIL" 
    -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" 

    But, it is neither the fastest nor the easiest way. Cloudflare provides a better and faster way to read data from your namespaces. It’s called binding. Each KV namespace can be bound to a worker script so to make it available in the script by the variable name. Any namespace can be bound with any worker. A KV namespace can be bound to a worker by going to the editing menu of a worker from the Cloudflare UI. 

    Following steps show you how to bind a namespace to a worker:

    Go to the edit page of the worker in Cloudflare web UI and click on the KV tab:

    Then add a binding by clicking the ‘Add binding’ button.

    You can select the namespace name and the variable name by which it will be bound. More details can be found here. A binding that I’ve made can be seen in the above image.

    That’s all we need to get this to work. Following is the relevant part of the script:

    const fetch = require('node-fetch')
    
    addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request))
    })
    
    /**
    * Fetch and log a request
    * @param {Request} request
    */
    async function handleRequest(request) {
        const country = request.cf.country
    
        const url = `https://api.cloudflare.com/client/v4/accounts/account-id/storage/kv/namespaces/namespace-id/values/${country}`
    
        let count = await requests.get(country)
    
        if (!count) {
            count = 1
        } else {
            count = parseInt(count) + 1
        }
    
        try {
            response = await fetch(url, {
            method: 'PUT',
            headers: {"X-Auth-Email": "email", "X-Auth-Key": "auth-key"},
            body: `${count}`
            })
        } catch (error) {
            return new Response(error, { status: 500 })
        }
    
        return new Response(`${country}: ${count}`, { status: 200 }) 
    }

    In the above code, I bound the Requests namespace that we created by the requests variable that would be dynamically resolved when we publish.

    The full source of this can be found here.

    This small application also demonstrates some of the practical aspects of the workers. For example, you would notice that the updates take some time to get reflected and response time of the workers is quick, especially when they are deployed on a .workers.dev subdomain here.

    Side note: You will have to recreate the namespace-worker binding everytime you deploy the worker or you do wrangler publish.

    Workers vs. AWS Lambda

    AWS Lambda has been a major player in the serverless market for a while now. So, how is Cloudflare Workers as compared to it? Let’s see.

    Architecture:

    Cloudflare Workers `Isolates` instead of a container based underlying architecture. `Isolates` is the technology that allows V8(Google Chrome’s JavaScript Engine) to run thousands of processes on a single server in an efficient and secure manner. This effectively translates into faster code execution and lowers memory usage. More details can be found here.

    Price:

    The above mentioned architectural difference allows Workers to be significantly cheaper than Lambda. While a Worker offering 50 milliseconds of CPU costs $0.50 per million requests, the equivalent Lambda costs $1.84 per million. A more detailed price comparison can be found here.

    Speed:

    Workers also show significantly better performance numbers than Lambda and Lambda@Edge. Tests run by Cloudflare claim that they are 441% faster than Lambda and 192% faster than Lambda@Edge. A detailed performance comparison can be found here.

    This better performance is also confirmed by serverless-benchmark.

    Wrapping Up:

    As we have seen, Cloudflare Workers along with the KV Store does make it very easy to start with a serverless application. They provide fantastic performance while using less cost along with intuitive deployment. These properties make them ideal for making globally accessible serverless applications.

  • Explanatory vs. Predictive Models in Machine Learning

    My vision on Data Analysis is that there is continuum between explanatory models on one side and predictive models on the other side. The decisions you make during the modeling process depend on your goal. Let’s take Customer Churn as an example, you can ask yourself why are customers leaving? Or you can ask yourself which customers are leaving? The first question has as its primary goal to explain churn, while the second question has as its primary goal to predict churn. These are two fundamentally different questions and this has implications for the decisions you take along the way. The predictive side of Data Analysis is closely related to terms like Data Mining and Machine Learning.

    SPSS & SAS

    When we’re looking at SPSS and SAS, both of these languages originate from the explanatory side of Data Analysis. They are developed in an academic environment, where hypotheses testing plays a major role. This makes that they have significant less methods and techniques in comparison to R and Python. Nowadays, SAS and SPSS both have data mining tools (SAS Enterprise Miner and SPSS Modeler), however these are different tools and you’ll need extra licenses.

    I have spent some time to build extensive macros in SAS EG to seamlessly create predictive models, which also does a decent job at explaining the feature importance. While a Neural Network may do a fair job at making predictions, it is extremely difficult to explain such models, let alone feature importance. The macros that I have built in SAS EG does precisely the job of explaining the features, apart from producing excellent predictions.

    Open source TOOLS: R & PYTHON

    One of the major advantages of open source tools is that the community continuously improves and increases functionality. R was created by academics, who wanted their algorithms to spread as easily as possible. R has the widest range of algorithms, which makes R strong on the explanatory side and on the predictive side of Data Analysis.

    Python is developed with a strong focus on (business) applications, not from an academic or statistical standpoint. This makes Python very powerful when algorithms are directly used in applications. Hence, we see that the statistical capabilities are primarily focused on the predictive side. Python is mostly used in Data Mining or Machine Learning applications where a data analyst doesn’t need to intervene. Python is therefore also strong in analyzing images and videos. Python is also the easiest language to use when using Big Data Frameworks like Spark. With the plethora of packages and ever improving functionality, Python is a very accessible tool for data scientists.

    MACHINE LEARNING MODELS

    While procedures like Logistic Regression are very good at explaining the features used in a prediction, some others like Neural Networks are not. The latter procedures may be preferred over the former when it comes to only prediction accuracy and not explaining the models. Interpreting or explaining the model becomes an issue for Neural Networks. You can’t just peek inside a deep neural network to figure out how it works. A network’s reasoning is embedded in the behavior of numerous simulated neurons, arranged into dozens or even hundreds of interconnected layers. In most cases the Product Marketing Officer may be interested in knowing what are the factors that are most important for a specific advertising project. What can they concentrate on to get the response rates higher, rather than, what will be their response rate, or revenues in the upcoming year. These questions are better answered by procedures which can be interpreted in an easier way. This is a great article about the technical and ethical consequences of the lack of explanations provided by complex AI models.

    Procedures like Decision Trees are very good at explaining and visualizing what exactly are the decision points (features and their metrics). However, those do not produce the best models. Random Forests, Boosting are the procedures which use Decision Trees as the basic starting point to build the predictive models, which are by far some of the best methods to build sophisticated prediction models.

    While Random Forests use fully grown (highly complex) Trees, and by taking random samples from the training set (a process called Bootstrapping), then each split uses only a proper subset of features from the entire feature set to actually make the split, rather than using all of the features. This process of bootstrapping helps with lower number of training data (in many cases there is no choice to get more data). The (proper) subsetting of the features has a tremendous effect on de-correlating the Trees grown in the Forest (hence randomizing it), leading to a drop in Test Set error. A fresh subset of features is chosen at each step of splitting, making the method robust. The strategy also stops the strongest feature from appearing each time a split is considered, making all the trees in the forest similar. The final result is obtained by averaging the result over all trees (in case of Regression problems), or by taking a majority class vote (in case of classification problem).

    On the other hand, Boosting is a method where a Forest is grown using Trees which are NOT fully grown, or in other words, with Weak Learners. One has to specify the number of trees to be grown, and the initial weights of those trees for taking a majority vote for class selection. The default weight, if not specified is the average of the number of trees requested. At each iteration, the method fits these weak learners, finds the residuals. Then the weights of those trees which failed to predict the correct class is increased so that those trees can concentrate better on the failed examples. This way, the method proceeds by improving the accuracy of the Boosted Trees, stopping when the improvement is below a threshold. One particularly implementation of Boosting, AdaBoost has very good accuracy over other implementations. AdaBoost uses Trees of depth 1, known as Decision Stump as each member of the Forest. These are slightly better than random guessing to start with, but over time they learn the pattern and perform extremely well on test set. This method is more like a feedback control mechanism (where the system learns from the errors). To address overfitting, one can use the hyper-parameter Learning Rate (lambda) by choosing values in the range: (0,1]. Very small values of lambda will take more time to converge, however larger values may have difficulty converging. This can be achieved by a iterative process to select the correct value for lambda, plotting the test error rate against values of lambda. The value of lambda with the lowest test error should be chosen.

    In all these methods, as we move from Logistic Regression, to Decision Trees to Random Forests and Boosting, the complexity of the models increase, making it almost impossible to EXPLAIN the Boosting model to marketers/product managers. Decision Trees are easy to visualize, Logisitic Regression results can be used to demonstrate the most important factors in a customer acquisition model and hence will be well received by business leaders. On the other hand, the Random Forest and Boosting methods are extremely good predictors, without much scope for explaining. But there is hope: These models have functions for revealing the most important variables, although it is not possible to visualize why. 

    USING A BALANCED APPROACH

    So I use a mixed strategy: Use the previous methods as a step in Exploratory Data Analysis, present the importance of features, characteristics of the data to the business leaders in phase one, and then use the more complicated models to build the prediction models for deployment, after building competing models. That way, one not only gets to understand what is happening and why, but also gets the best predictive power. In most cases that I have worked, I have rarely seen a mismatch between the explanation and the predictions using different methods. After all, this is all math and the way of delivery should not change end results. Now that’s a happy ending for all sides of the business!

  • Installing Redis Service in DC/OS With Persistent Storage Using AWS Volumes

    Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker.

    It supports various data structures such as Strings, Hashes, Lists, Sets etc. DCOS offers Redis as a service

    Why Do We Use External Persistent Storage for Redis Mesos Containers?

    Since Redis is an in-memory database, an instance/service restart will result in loss of data. To counter this, it is always advisable to snapshot the Redis in-memory database from time to time.

    This helps Redis instance to recover from the point in time failure.

    In DCOS, Redis is deployed as a stateless service. To make it a stateful and persistent data, we can configure local volumes or external volumes.

    The disadvantage of having a local volume mapped to Mesos containers is when a slave node goes down, your local volume becomes unavailable, and the data loss occurs.

    However, with external persistent volumes, as they are available on each node of the DCOS cluster, a slave node failure does not impact the data availability.

    Rex-Ray

    REX-Ray is an open source, storage management solution designed to support container runtimes such as Docker and Mesos.

    REX-Ray enables stateful applications such as databases to persist and maintain its data after the life cycle of the container has ended. Built-in high availability enables orchestrators such as Docker Swarm, Kubernetes, and Mesos Frameworks like Marathon to automatically orchestrate storage tasks between hosts in a cluster.

    Built on top of the libStorage framework, REX-Ray’s simplified architecture consists of a single binary and runs as a stateless service on every host using a configuration file to orchestrate multiple storage platforms.

    Objective: To create a Redis service in DC/OS environment with persistent storage.

    Warning: The Persistent Volume feature is still in beta Phase for DC/OS Version 1.11.

    Prerequisites:

    • Make sure the rexray service is running and is in a healthy state for the cluster.

    Steps:

    • Click on the Add button in Services component of DC/OS GUI.
    • Click on JSON Configuration.  

    Note: For persistent storage, below code should be added in the normal Redis service configuration JSON file to mount external persistent volumes.

    "volumes": [
          {
            "containerPath": "/data",
            "mode": "RW",
            "external": {
              "name": "redis4volume",
              "provider": "dvdi",
              "options": {
                "dvdi/driver": "rexray"
              }
            }
          }
        ],

    • Make sure the service is up and in a running state:

    If you look closely, the service was suspended and respawned on a different slave node. We populated the database with dummy data and saved the snapshot in the data directory.

    When the service did come upon a different node 10.0.3.204, the data persisted and the volume was visible on the new node.

    core@ip-10-0-3-204 ~ $ /opt/mesosphere/bin/rexray volume list
    
    - name: datavolume
      volumeid: vol-00aacade602cf960c
      availabilityzone: us-east-1a
      status: in-use
      volumetype: standard
      iops: 0
      size: "16"
      networkname: ""
      attachments:
      - volumeid: vol-00aacade602cf960c
        instanceid: i-0d7cad91b62ec9a64
        devicename: /dev/xvdb
    

    •  Check the volume tab :

    Note: For external volumes, the status will be unavailable. This is an issue with DC/OS.

    The Entire Service JSON file:

    {
      "id": "/redis4.0-new-failover-test",
      "instances": 1,
      "cpus": 1.001,
      "mem": 2,
      "disk": 0,
      "gpus": 0,
      "backoffSeconds": 1,
      "backoffFactor": 1.15,
      "maxLaunchDelaySeconds": 3600,
      "container": {
        "type": "DOCKER",
        "volumes": [
          {
            "containerPath": "/data",
            "mode": "RW",
            "external": {
              "name": "redis4volume",
              "provider": "dvdi",
              "options": {
                "dvdi/driver": "rexray"
              }
            }
          }
        ],
        "docker": {
          "image": "redis:4",
          "network": "BRIDGE",
          "portMappings": [
            {
              "containerPort": 6379,
              "hostPort": 0,
              "servicePort": 10101,
              "protocol": "tcp",
              "name": "default",
              "labels": {
                "VIP_0": "/redis4.0:6379"
              }
            }
          ],
          "privileged": false,
          "forcePullImage": false
        }
      },
      "healthChecks": [
        {
          "gracePeriodSeconds": 60,
          "intervalSeconds": 5,
          "timeoutSeconds": 5,
          "maxConsecutiveFailures": 3,
          "portIndex": 0,
          "protocol": "TCP"
        }
      ],
      "upgradeStrategy": {
        "minimumHealthCapacity": 0.5,
        "maximumOverCapacity": 0
      },
      "unreachableStrategy": {
        "inactiveAfterSeconds": 300,
        "expungeAfterSeconds": 600
      },
      "killSelection": "YOUNGEST_FIRST",
      "requirePorts": true
    }

    Redis entrypoint

    To connect with Redis service, use below host:port in your applications:

    redis.marathon.l4lb.thisdcos.directory:6379

    Conclusion

    We learned about Standalone Redis Service deployment from DCOS catalog on DCOS.  Also, we saw how to add Persistent storage to it using RexRay. We also learned how RexRay automatically manages volumes over AWS ebs and how to integrate them in DCOS apps/services.  Finally, we saw how other applications can communicate with this Redis service.

    References

  • Micro Frontends: Reinventing UI In The Microservices World

    It is amazing how the software industry has evolved. Back in the day, a software was a simple program. Some of the first software applications like The Apollo Missions Landing modules and Manchester Baby were basic stored procedures. Software was primarily used for research and mathematical purposes.

    The invention of personal computers and the prominence of the Internet changed the software world. Desktop applications like word processors, spreadsheets, and games grew. Websites gradually emerged. Back then, simple pages were delivered to the client as static documents for viewing. By the mid-1990s, with Netscape introducing client-side scripting language, JavaScript and Macromedia bringing in Flash, the browser became more powerful, allowing websites to become richer and more interactive. In 1999, the Java language introduced Servlets. And thus born the Web Application. Nevertheless, these developments and applications were still simpler. Engineers didn’t emphasize enough on structuring them and mostly built unstructured monolithic applications.

    The advent of disruptive technologies like cloud computing and Big data paved the way for more intricate, convolute web and native mobile applications. From e-commerce and video streaming apps to social media and photo editing, we had applications doing some of the most complicated data processing and storage tasks. The traditional monolithic way now posed several challenges in terms of scalability, team collaboration and integration/deployment, and often led to huge and messy The Ball Of Mud codebases.

    Fig: Monolithic Application Problems – Source

    To untangle this ball of software, came in a number of service-oriented architectures. The most promising of them was Microservices – breaking an application into smaller chunks that can be developed, deployed and tested independently but worked as a single cohesive unit. Its benefits of scalability and ease of deployment by multiple teams proved as a panacea to most of the architectural problems. A few front-end architectures also came up, such as MVC, MVVM, Web Components, to name a few. But none of them were fully able to reap the benefits of Microservices.

    Fig: Microservice Architecture – Source

    ‍Micro Frontends: The Concept‍

    Micro Frontends first came up in ThoughtWorks Technology Radar where they assessed, tried and eventually adopted the technology after noticing significant benefits. It is a Microservice approach to front-end web development where independently deliverable front-end applications are composed as a whole. 

    With Microservices, Micro Frontends breaks the last monolith to create a complete Micro-Architecture design pattern for web applications. It is entirely composed of loosely coupled vertical slices of business functionality rather than in horizontals. We can term these verticals as ‘Microapps’. This concept is not new and has appeared in Scaling with Microservices and Vertical Decomposition. It first presented the idea of every vertical being responsible for a single business domain and having its presentation layer, persistence layer, and a separate database. From the development perspective, every vertical is implemented by exactly one team and no code is shared among different systems.

    Fig: Micro Frontends with Microservices (Micro-architecture)

    Why Micro Frontends?

    A microservice architecture has a whole slew of advantages when compared to monolithic architectures.

    Ease of Upgrades – Micro Frontends build strict bounded contexts in the application. Applications can be updated in a more incremental and isolated fashion without worrying about the risks of breaking up another part of the application.

    Scalability – Horizontal scaling is easy for Micro Frontends. Each Micro Frontend has to be stateless for easier scalability.

    Ease of deployability: Each Micro Frontend has its CI/CD pipeline, that builds, tests and deploys it to production. So it doesn’t matter if another team is working on a feature and has pushed a bug fix or if a cutover or refactoring is taking place. There should be no risks involved in pushing changes done on a Micro Frontend as long as there is only one team working on it.

    Team Collaboration and Ownership: The Scrum Guide says that “Optimal Development Team size is small enough to remain nimble and large enough to complete significant work within a Sprint”. Micro Frontends are perfect for multiple cross-functional teams that can completely own a stack (Micro Frontend) of an application from UX to Database design. In case of an E-commerce site, the Product team and the Payment team can concurrently work on the app without stepping on each other’s toes.

    Micro Frontend Integration Approaches

    There is a multitude of ways to implement Micro Frontends. It is recommended that any approach for this should take a Runtime integration route instead of a Build Time integration, as the former has to re-compile and release on every single Micro Frontend to release any one of the Micro Frontend’s changes.

    We shall learn some of the prominent approaches of Micro Frontends by building a simple Pet Store E-Commerce site. The site has the following aspects (or Microapps, if you will) – Home or Search, Cart, Checkout, Product, and Contact Us. We shall only be working on the Front-end aspect of the site. You can assume that each Microapp has a microservice dedicated to it in the backend. You can view the project demo here and the code repository here. Each way of integration has a branch in the repo code that you can check out to view.

    Single Page Frontends –

    The simplest way (but not the most elegant) to implement Micro Frontends is to treat each Micro Frontend as a single page.

    Fig: Single Page Micro Frontends: Each HTML file is a frontend.
    !DOCTYPE html>
    <html lang="zxx">
    <head>
    	<title>The MicroFrontend - eCommerce Template</title>
    </head>
    <body>
      <header class="header-section header-normal">
        <!-- Header is repeated in each frontend which is difficult to maintain -->
        ....
        ....
      </header
      <main>
      </main>
      <footer
        <!-- Footer is repeated in each frontend which means we have to multiple changes across all frontends-->
      </footer>
      <script>
        <!-- Cross Cutting features like notification, authentication are all replicated in all frontends-->
      </script>
    </body>

    It is one of the purest ways of doing Micro Frontends because no container or stitching element binds the front ends together into an application. Each Micro Frontend is a standalone app with each dependency encapsulated in it and no coupling with the others. The flipside of this approach is that each frontend has a lot of duplication in terms of cross-cutting concerns like headers and footers, which adds redundancy and maintenance burden.

    JavaScript Rendering Components (Or Web Components, Custom Element)-

    As we saw above, single-page Micro Frontend architecture has its share of drawbacks. To overcome these, we should opt for an architecture that has a container element that builds the context of the app and the cross-cutting concerns like authentication, and stitches all the Micro Frontends together to create a cohesive application.

    // A virtual class from which all micro-frontends would extend
    class MicroFrontend {
      
      beforeMount() {
        // do things before the micro front-end mounts
      }
    
      onChange() {
        // do things when the attributes of a micro front-end changes
      }
    
      render() {
        // html of the micro frontend
        return '<div></div>';
      }
    
      onDismount() {
        // do things after the micro front-end dismounts 
      }
    }

    class Cart extends MicroFrontend {
      beforeMount() {
        // get previously saved cart from backend
      }
    
      render() {
        return `<!-- Page -->
        <div class="page-area cart-page spad">
          <div class="container">
            <div class="cart-table">
              <table>
                <thead>
                .....
                
         `
      }
    
      addItemToCart(){
        ...
      }
        
      deleteItemFromCart () {
        ...
      }
    
      applyCouponToCart() {
        ...
      }
        
      onDismount() {
        // save Cart for the user to get back to afterwards
      }
    }

    class Product extends MicroFrontend {
      static get productDetails() {
        return {
          '1': {
            name: 'Cat Table',
            img: 'img/product/cat-table.jpg'
          },
          '2': {
            name: 'Dog House Sofa',
            img: 'img/product/doghousesofa.jpg'
          },
        }
      }
      getProductDetails() {
        var urlParams = new URLSearchParams(window.location.search);
        const productId = urlParams.get('productId');
        return this.constructor.productDetails[productId];
      }
      render() {
        const product = this.getProductDetails();
        return `	<!-- Page -->
        <div class="page-area product-page spad">
          <div class="container">
            <div class="row">
              <div class="col-lg-6">
                <figure>
                  <img class="product-big-img" src="${product.img}" alt="">`
      }
      selectProductColor(color) {}
    
      selectProductSize(size) {}
     
      addToCart() {
        // delegate call to MicroFrontend Cart.addToCart function
      }
      
    }

    <!DOCTYPE html>
    <html lang="zxx">
    <head>
    	<title>PetStore - because Pets love pampering</title>
    	<meta charset="UTF-8
      <link rel="stylesheet" href="css/style.css"/>
    
    </head>
    <body>
    	<!-- Header section -->
    	<header class="header-section">
      ....
      </header>
    	<!-- Header section end -->
    	<main id='microfrontend'>
        <!-- This is where the Micro-frontend gets rendered by utility renderMicroFrontend.js-->
    	</main>
                                    <!-- Header section -->
    	<footer class="header-section">
      ....
      </footer>
    	<!-- Footer section end -->
      	<script src="frontends/MicroFrontend.js"></script>
    	<script src="frontends/Home.js"></script>
    	<script src="frontends/Cart.js"></script>
    	<script src="frontends/Checkout.js"></script>
    	<script src="frontends/Product.js"></script>
    	<script src="frontends/Contact.js"></script>
    	<script src="routes.js"></script>
    	<script src="renderMicroFrontend.js"></script>

    function renderMicroFrontend(pathname) {
      const microFrontend = routes[pathname || window.location.hash];
      const root = document.getElementById('microfrontend');
      root.innerHTML = microFrontend ? new microFrontend().render(): new Home().render();
      $(window).scrollTop(0);
    }
    
    $(window).bind( 'hashchange', function(e) { renderFrontend(window.location.hash); });
    renderFrontend(window.location.hash);
    
    utility routes.js (A map of the hash route to the Microfrontend class)
    const routes = {
      '#': Home,
      '': Home,
      '#home': Home,
      '#cart': Cart,
      '#checkout': Checkout,
      '#product': Product,
      '#contact': Contact,
    };

    As you can see, this approach is pretty neat and encapsulates a separate class for Micro Frontends. All other Micro Frontends extend from this. Notice how all the functionality related to Microapp is encapsulated in the respective Micro Frontend. This makes sure that concurrent work on a Micro Frontend doesn’t mess up some other Micro Frontends.

    Everything will work in a similar paradigm when it comes to Web Components and Custom Elements.

    React

    With the client-side JavaScript frameworks being very popular, it is impossible to leave React from any Front End discussion. React being a component-based JS library, much of the things discussed above will also apply to React. I am going to discuss some of the technicalities and challenges when it comes to Micro Frontends with React.

    Styling

    Since there should be minimum sharing of code between any Micro Frontend, styling the React components can be challenging, considering the global and cascading nature of CSS. We should make sure styles are targeted on a specific Micro Frontend without spilling over to other Micro Frontends. Inline CSS, CSS in JS libraries like Radium,  and CSS Modules, can be used with React.

    Redux

    Using React with Redux is kind of a norm in today’s front-end world. The convention is to use Redux as a single global store for the entire app for cross application communication. A Micro Frontend should be self-contained with no dependencies. Hence each Micro Frontend should have its own Redux store, moving towards a multiple Redux store architecture. 

    Other Noteworthy Integration Approaches  –

    Server-side Rendering – One can use a server to assemble Micro Frontend templates before dispatching it to the browser. SSI techniques can be used too.

    iframes – Each Micro Frontend can be an iframe. They also offer a good degree of isolation in terms of styling, and global variables don’t interfere with each other.

    Summary

    With Microservices, Micro Frontends promise to  bring in a lot of benefits when it comes to structuring a complex application and simplifying its development, deployment and maintenance.

    But there is a wonderful saying that goes “there is no one-size-fits-all approach that anyone can offer you. The same hot water that softens a carrot hardens an egg”. Micro Frontend is no silver bullet for your architectural problems and comes with its own share of downsides. With more repositories, more tools, more build/deploy pipelines, more servers, more domains to maintain, Micro Frontends can increase the complexity of an app. It may render cross-application communication difficult to establish. It can also lead to duplication of dependencies and an increase in application size.

    Your decision to implement this architecture will depend on many factors like the size of your organization and the complexity of your application. Whether it is a new or legacy codebase, it is advisable to apply the technique gradually over time and review its efficacy over time.

  • Build and Deploy a Real-Time React App Using AWS Amplify and GraphQL

    GraphQL is becoming a popular way to use APIs in modern web and mobile apps.

    However, learning new things is always time-consuming and without getting your hands dirty, it’s very difficult to understand the nuances of a new technology.

    So, we have put together a powerful and concise tutorial that will guide you through setting up a GraphQL backend and integration into your React app in the shortest time possible. This tutorial is light on opinions, so that once you get a hang of the fundamentals, you can go on and tailor your workflow.

    Key topics and takeaways:

    • Authentication
    • GraphQL API with AWS AppSync
    • Hosting
    • Working with multiple environments
    • Removing services

    What will we be building?

    We will build a basic real-time Restaurant CRUD app using authenticated GraphQL APIs. Click here to try the deployed version of the app to see what we’ll be building.

    Will this tutorial teach React or GraphQL concepts as well?

    No. The focus is to learn how to use AWS Amplify to build cloud-enabled, real-time web applications. If you are new to React or GraphQL, we recommend going through the official documentation and then coming back here.

    What do I need to take this tutorial?

    • Node >= v10.9.0
    • NPM >= v6.9.0 packaged with Node.

    Getting started – Creating the application

    To get started, we first need to create a React project using the create-react-app boilerplate:

    npx create-react-app amplify-app --typescript
    cd amplify-app

    Let’s now install the AWS Amplify and AWS Amplify React bindings and try running the application:

    npm install --save aws-amplify aws-amplify-react
    npm start

    If you have initialized the app with Typescript and see errors while using

    aws-amplify-react, add aws-amplify-react.d.ts to src with:

    declare module 'aws-amplify-react';

    Installing the AWS Amplify CLI and adding it to the project

    To install the CLI:

    npm install -g @aws-amplify/cli

    Now we need to configure the CLI with our credentials:

    amplify configure

    If you’d like to see a video walkthrough of this process, click here

    Here we’ll walk you through the amplify configure setup. After you sign in to the AWS console, follow these steps:

    • Specify the AWS region: ap-south-1 (Mumbai) <Select the region based on your location. Click here for reference>
    • Specify the username of the new IAM user: amplify-app <name of=”” your=”” app=””></name>

    In the AWS Console, click Next: Permissions, Next: Tags, Next: Review, and Create User to create the new IAM user. Then, return to the command line and press Enter.

    • Enter the credentials of the newly created user:
      accessKeyId: <your_access_key_id> </your_access_key_id>
      secretAccessKey: <your_secret_access_key></your_secret_access_key>
    • Profile Name: default

    To view the newly created IAM user, go to the dashboard. Also, make sure that your region matches your selection.

    To add amplify to your project:

    amplify init

    Answer the following questions:

    • Enter a name for the project: amplify-app <name of=”” your=”” app=””></name>
    • Enter a name for the environment: dev <name of=”” your=”” environment=””></name>
    • Choose your default editor: Visual Studio Code <your default editor=””></your>
    • Choose the type of app that you’re building: javaScript
    • What JavaScript framework are you using: React
    • Source Directory Path: src
    • Distribution Directory Path: build
    • Build Command: npm run build (for macOS/Linux), npm.cmd run-script build (for Windows)
    • Start Command: npm start (for macOS/Linux), npm.cmd run-script start (for Windows)
    • Do you want to use an AWS profile: Yes
    • Please choose the profile you want to use: default

    Now, the AWS Amplify CLI has initialized a new project and you will see a new folder: amplify. This folder has files that hold your project configuration.

    <amplify-app>
    |_ amplify
    |_ .config
    |_ #current-cloud-backend
    |_ backend
    team-provider-info.json

    Adding Authentication

    To add authentication:

    amplify add auth

    When prompted, choose:

    • Do you want to use default authentication and security configuration: Default configuration
    • How do you want users to be able to sign in when using your Cognito User Pool: Username
    • What attributes are required for signing up: Email

    Now, let’s run the push command to create the cloud resources in our AWS account:

    amplify push

    To quickly check your newly created Cognito User Pool, you can run

    amplify status

    To access the AWS Cognito Console at any time, go to the dashboard. Also, ensure that your region is set correctly.

    Now, our resources are created and we can start using them.

    The first thing is to connect our React application to our new AWS Amplify project. To do this, reference the auto-generated aws-exports.js file that is now in our src folder.

    To configure the app, open App.tsx and add the following code below the last import:

    import Amplify from 'aws-amplify';
    import awsConfig from './aws-exports';
    
    Amplify.configure(awsConfig);

    Now, we can start using our AWS services.
    To add the Authentication flow to the UI, export the app component by wrapping it with the authenticator HOC:

    import { withAuthenticator } from 'aws-amplify-react';
    ...
    // app component
    ...
    export default withAuthenticator(App);

    Now, let’s run the app to check if an Authentication flow has been added before our App component is rendered.

    This flow gives users the ability to sign up and sign in. To view any users that were created, go back to the Cognito dashboard. Alternatively, you can also use:

    amplify console auth

    The withAuthenticator HOC is a really easy way to get up and running with authentication, but in a real-world application, we probably want more control over how our form looks and functions. We can use the aws-amplify/Auth class to do this. This class has more than 30 methods including signUp, signIn, confirmSignUp, confirmSignIn, and forgotPassword. These functions return a promise, so they need to be handled asynchronously.

    Adding and Integrating the GraphQL API

    To add GraphQL API, use the following command:

    amplify add api

    Answer the following questions:

    • Please select from one of the below mentioned services: GraphQL
    • Provide API name: RestaurantAPI
    • Choose an authorization type for the API: API key
    • Do you have an annotated GraphQL schema: No
    • Do you want a guided schema creation: Yes
    • What best describes your project: Single object with fields (e.g., “Todo” with ID, name, description)
    • Do you want to edit the schema now: Yes

    When prompted, update the schema to the following:

    type Restaurant @model {
      id: ID!
      name: String!
      description: String!
      city: String!
    }

    Next, let’s run the push command to create the cloud resources in our AWS account:

    amplify push

    • Are you sure you want to continue: Yes
    • Do you want to generate code for your newly created GraphQL API: Yes
    • Choose the code generation language target: typescript
    • Enter the file name pattern of graphql queries, mutations and subscriptions: src/graphql/**/*.ts
    • Do you want to generate/update all possible GraphQL operations – queries, mutations and subscriptions: Yes
    • Enter maximum statement depth [increase from default if your schema is deeply nested]: 2
    • Enter the file name for the generated code: src/API.ts

    Notice your GraphQL endpoint and API KEY. This step has created a new AWS AppSync API and generated the GraphQL queries, mutations, and subscriptions on your local. To check, see src/graphql or visit the AppSync dashboard. Alternatively, you can use:

    amplify console api

    Please select from one of the below mentioned services: GraphQL

    Now, in the AppSync console, on the left side click on Queries. Execute the following mutation to create a restaurant in the API:

    mutation createRestaurant {
      createRestaurant(input: {
        name: "Nobu"
        description: "Great Sushi"
        city: "New York"
      }) {
        id name description city
      }
    }

    Now, let’s query for the restaurant:

    query listRestaurants {
      listRestaurants {
        items {
          id
          name
          description
          city
        }
      }
    }

    We can even search / filter data when querying:

    query searchRestaurants {
      listRestaurants(filter: {
        city: {
          contains: "New York"
        }
      }) {
        items {
          id
          name
          description
          city
        }
      }
    }

    Now that the GraphQL API is created, we can begin interacting with it from our client application. Here is how we’ll add queries, mutations, and subscriptions:

    import Amplify, { API, graphqlOperation } from 'aws-amplify';
    import { withAuthenticator } from 'aws-amplify-react';
    import React, { useEffect, useReducer } from 'react';
    import { Button, Col, Container, Form, Row, Table } from 'react-bootstrap';
    
    import './App.css';
    import awsConfig from './aws-exports';
    import { createRestaurant } from './graphql/mutations';
    import { listRestaurants } from './graphql/queries';
    import { onCreateRestaurant } from './graphql/subscriptions';
    
    Amplify.configure(awsConfig);
    
    type Restaurant = {
      name: string;
      description: string;
      city: string;
    };
    
    type AppState = {
      restaurants: Restaurant[];
      formData: Restaurant;
    };
    
    type Action =
      | {
          type: 'QUERY';
          payload: Restaurant[];
        }
      | {
          type: 'SUBSCRIPTION';
          payload: Restaurant;
        }
      | {
          type: 'SET_FORM_DATA';
          payload: { [field: string]: string };
        };
    
    type SubscriptionEvent<D> = {
      value: {
        data: D;
      };
    };
    
    const initialState: AppState = {
      restaurants: [],
      formData: {
        name: '',
        city: '',
        description: '',
      },
    };
    const reducer = (state: AppState, action: Action) => {
      switch (action.type) {
        case 'QUERY':
          return { ...state, restaurants: action.payload };
        case 'SUBSCRIPTION':
          return { ...state, restaurants: [...state.restaurants, action.payload] };
        case 'SET_FORM_DATA':
          return { ...state, formData: { ...state.formData, ...action.payload } };
        default:
          return state;
      }
    };
    
    const App: React.FC = () => {
      const createNewRestaurant = async (e: React.SyntheticEvent) => {
        e.stopPropagation();
        const { name, description, city } = state.formData;
        const restaurant = {
          name,
          description,
          city,
        };
        await API.graphql(graphqlOperation(createRestaurant, { input: restaurant }));
      };
    
      const [state, dispatch] = useReducer(reducer, initialState);
    
      useEffect(() => {
        getRestaurantList();
    
        const subscription = API.graphql(graphqlOperation(onCreateRestaurant)).subscribe({
          next: (eventData: SubscriptionEvent<{ onCreateRestaurant: Restaurant }>) => {
            const payload = eventData.value.data.onCreateRestaurant;
            dispatch({ type: 'SUBSCRIPTION', payload });
          },
        });
    
        return () => subscription.unsubscribe();
      }, []);
    
      const getRestaurantList = async () => {
        const restaurants = await API.graphql(graphqlOperation(listRestaurants));
        dispatch({
          type: 'QUERY',
          payload: restaurants.data.listRestaurants.items,
        });
      };
    
      const handleChange = (e: React.ChangeEvent<HTMLInputElement>) =>
        dispatch({
          type: 'SET_FORM_DATA',
          payload: { [e.target.name]: e.target.value },
        });
    
      return (
        <div className="App">
          <Container>
            <Row className="mt-3">
              <Col md={4}>
                <Form>
                  <Form.Group controlId="formDataName">
                    <Form.Control onChange={handleChange} type="text" name="name" placeholder="Name" />
                  </Form.Group>
                  <Form.Group controlId="formDataDescription">
                    <Form.Control onChange={handleChange} type="text" name="description" placeholder="Description" />
                  </Form.Group>
                  <Form.Group controlId="formDataCity">
                    <Form.Control onChange={handleChange} type="text" name="city" placeholder="City" />
                  </Form.Group>
                  <Button onClick={createNewRestaurant} className="float-left">
                    Add New Restaurant
                  </Button>
                </Form>
              </Col>
            </Row>
    
            {state.restaurants.length ? (
              <Row className="my-3">
                <Col>
                  <Table striped bordered hover>
                    <thead>
                      <tr>
                        <th>#</th>
                        <th>Name</th>
                        <th>Description</th>
                        <th>City</th>
                      </tr>
                    </thead>
                    <tbody>
                      {state.restaurants.map((restaurant, index) => (
                        <tr key={`restaurant-${index}`}>
                          <td>{index + 1}</td>
                          <td>{restaurant.name}</td>
                          <td>{restaurant.description}</td>
                          <td>{restaurant.city}</td>
                        </tr>
                      ))}
                    </tbody>
                  </Table>
                </Col>
              </Row>
            ) : null}
          </Container>
        </div>
      );
    };
    
    export default withAuthenticator(App);

    Finally, we have our app ready. You can now sign-up,sign-in, add new restaurants, see real-time updates of newly added restaurants.

    Hosting

    The hosting category enables you to deploy and host your app on AWS.

    amplify add hosting

    • Select the environment setup: DEV (S3 only with HTTP)
    • hosting bucket name: <YOUR_BUCKET_NAME>
    • index doc for the website: index.html
    • error doc for the website: index.html

    Now, everything is set up & we can publish it:

    amplify publish

    Working with multiple environments

    You can create multiple environments for your application to create & test out new features without affecting the main environment which you are working on.

    When you use an existing environment to create a new environment, you get a copy of the entire backend application stack (CloudFormation) for the current environment. When you make changes in the new environment, you are then able to test these new changes in the new environment & merge only the changes that have been made since the new environment was created.

    Let’s take a look at how to create a new environment. In this new environment, we’ll add another field for the restaurant owner to the GraphQL Schema.

    First, we’ll initialize a new environment using amplify init:

    amplify init

    • Do you want to use an existing environment: N
    • Enter a name for the environment: apiupdate
    • Do you want to use an AWS profile: Y

    Once the new environment is initialized, we should be able to see some information about our environment setup by running:

    amplify env list
    
    | Environments |
    | ------------ |
    | dev |
    | *apiupdate |

    Now, add the owner field to the GraphQL Schema in

    amplify/backend/api/RestaurantAPI/schema.graphql:

    type Restaurant @model {
      ...
      owner: String
    }

    Run the push command to create a new stack:

    amplify push.

    After testing it out, it can be merged into our original dev environment:

    amplify env checkout dev
    amplify status
    amplify push

    • Do you want to update code for your updated GraphQL API: Y
    • Do you want to generate GraphQL statements: Y

    Removing Services

    If at any time, you would like to delete a service from your project & your account, you can do this by running the amplify remove command:

    amplify remove auth
    amplify push

    If you are unsure of what services you have enabled at any time, amplify status will give you the list of resources that are currently enabled in your app.

    Sample code

    The sample code for this blog post with an end to end working app is available here.

    Summary

    Once you’ve worked through all the sections above, your app should now have all the capabilities of a modern app, and building GraphQL + React apps should now be easier and faster with Amplify.

  • Surviving & Thriving in the Age of Software Accelerations

    It has almost been a decade since Marc Andreessen made this prescient statement. Software is not only eating the world but doing so at an accelerating pace. There is no industry that hasn’t been challenged by technology startups with disruptive approaches.

    • Automakers are no longer just manufacturing companies: Tesla is disrupting the industry with their software approach to vehicle development and continuous over-the-air software delivery. Waymo’s autonomous cars have driven millions of miles and self-driving cars are a near-term reality. Uber is transforming the transportation industry into a service, potentially affecting the economics and incentives of almost 3–4% of the world GDP!
    • Social networks and media platforms had a significant and decisive impact on the US election results.
    • Banks and large financial institutions are being attacked by FinTech startups like WealthFront, Venmo, Affirm, Stripe, SoFi, etc. Bitcoin, Ethereum and the broader blockchain revolution can upend the core structure of banks and even sovereign currencies.
    • Traditional retail businesses are under tremendous pressure due to Amazon and other e-commerce vendors. Retail is now a customer ownership, recommendations, and optimization business rather than a brick and mortar one.

    Enterprises need to adopt a new approach to software development and digital innovation. At Velotio, we are helping customers to modernize and transform their business with all of the approaches and best practices listed below.

    Agility

    In this fast-changing world, your business needs to be agile and fast-moving. You need to ship software faster, at a regular cadence, with high quality and be able to scale it globally.

    Agile practices allow companies to rally diverse teams behind a defined process that helps to achieve inclusivity and drives productivity. Agile is about getting cross-functional teams to work in concert in planned short iterations with continuous learning and improvement.

    Generally, teams that work in an Agile methodology will:

    • Conduct regular stand-ups and Scrum/Kanban planning meetings with the optimal use of tools like Jira, PivotalTracker, Rally, etc.
    • Use pair programming and code review practices to ensure better code quality.
    • Use continuous integration and delivery tools like Jenkins or CircleCI.
    • Design processes for all aspects of product management, development, QA, DevOps and SRE.
    • Use Slack, Hipchat or Teams for communication between team members and geographically diverse teams. Integrate all tools with Slack to ensure that it becomes the central hub for notifications and engagement.

    Cloud-Native

    Businesses need software that is purpose-built for the cloud model. What does that mean? Software team sizes are now in the hundreds of thousands. The number of applications and software stacks is growing rapidly in most companies. All companies use various cloud providers, SaaS vendors and best-of-breed hosted or on-premise software. Essentially, software complexity has increased exponentially which required a “cloud-native” approach to manage effectively. Cloud Native Computing Foundation defines cloud native as a software stack which is:

    1. Containerized: Each part (applications, processes, etc) is packaged in its own container. This facilitates reproducibility, transparency, and resource isolation.
    2. Dynamically orchestrated: Containers are actively scheduled and managed to optimize resource utilization.
    3. Microservices oriented: Applications are segmented into micro services. This significantly increases the overall agility and maintainability of applications.

    You can deep-dive into cloud native with this blog by our CTO, Chirag Jog.

    Cloud native is disrupting the traditional enterprise software vendors. Software is getting decomposed into specialized best of breed components — much like the micro-services architecture. See the Cloud Native landscape below from CNCF.

    DevOps

    Process and toolsets need to change to enable faster development and deployment of software. Enterprises cannot compete without mature DevOps strategies. DevOps is essentially a set of practices, processes, culture, tooling, and automation that focuses on delivering software continuously with high quality.

    DevOps tool chains & process

    As you begin or expand your DevOps journey, a few things to keep in mind:

    • Customize to your needs: There is no single DevOps process or toolchain that suits all needs. Take into account your organization structure, team capabilities, current software process, opportunities for automation and goals while making decisions. For example, your infrastructure team may have automated deployments but the main source of your quality issues could be the lack of code reviews in your development team. So identify the critical pain points and sources of delay to address those first.
    • Automation: Automate everything that can be. The lesser the dependency on human intervention, the higher are the chances for success.
    • Culture: Align the incentives and goals with your development, ITOps, SecOps, SRE teams. Ensure that they collaborate effectively and ownership in the DevOps pipeline is well established.
    • Small wins: Pick one application or team and implement your DevOps strategy within it. That way you can focus your energies and refine your experiments before applying them broadly. Show success as measured by quantifiable parameters and use that to transform the rest of your teams.
    • Organizational dynamics & integrations: Adoption of new processes and tools will cause some disruptions and you may need to re-skill part of your team or hire externally. Ensure that compliance, SecOps & audit teams are aware of your DevOps journey and get their buy-in.
    • DevOps is a continuous journey: DevOps will never be done. Train your team to learn continuously and refine your DevOps practice to keep achieving your goal: delivering software reliably and quickly.

    Micro-services

    As the amount of software in an enterprise explodes, so does the complexity. The only way to manage this complexity is by splitting your software and teams into smaller manageable units. Micro-services adoption is primarily to manage this complexity.

    Development teams across the board are choosing micro services to develop new applications and break down legacy monoliths. Every micro-service can be deployed, upgraded, scaled, monitored and restarted independent of other services. Micro-services should ideally be managed by an automated system so that teams can easily update live applications without affecting end-users.

    There are companies with 100s of micro-services in production which is only possible with mature DevOps, cloud-native and agile practice adoption.

    Interestingly, serverless platforms like Google Functions and AWS Lambda are taking the concept of micro-services to the extreme by allowing each function to act like an independent piece of the application. You can read about my thoughts on serverless computing in this blog: Serverless Computing Predictions for 2017.

    Digital Transformation

    Digital transformation involves making strategic changes to business processes, competencies, and models to leverage digital technologies. It is a very broad term and every consulting vendor twists it in various ways. Let me give a couple of examples to drive home the point that digital transformation is about using technology to improve your business model, gain efficiencies or built a moat around your business:

    • GE has done an excellent job transforming themselves from a manufacturing company into an IoT/software company with Predix. GE builds airplane engines, medical equipment, oil & gas equipment and much more. Predix is an IoT platform that is being embedded into all of GE’s products. This enabled them to charge airlines on a per-mile basis by taking the ownership of maintenance and quality instead of charging on a one-time basis. This also gives them huge amounts of data that they can leverage to improve the business as a whole. So digital innovation has enabled a business model improvement leading to higher profits.
    • Car companies are exploring models where they can provide autonomous car fleets to cities where they will charge on a per-mile basis. This will convert them into a “service” & “data” company from a pure manufacturing one.
    • Insurance companies need to built digital capabilities to acquire and retain customers. They need to build data capabilities and provide ongoing value with services rather than interact with the customer just once a year.

    You would be better placed to compete in the market if you have automation and digital process in place so that you can build new products and pivot in an agile manner.

    Big Data / Data Science

    Businesses need to deal with increasing amounts of data due to IoT, social media, mobile and due to the adoption of software for various processes. And they need to use this data intelligently. Cloud platforms provide the services and solutions to accelerate your data science and machine learning strategies. AWS, Google Cloud & open-source libraries like Tensorflow, SciPy, Keras, etc. have a broad set of machine learning and big data services that can be leveraged. Companies need to build mature data processing pipelines to aggregate data from various sources and store it for quick and efficient access to various teams. Companies are leveraging these services and libraries to build solutions like:

    • Predictive analytics
    • Cognitive computing
    • Robotic Process Automation
    • Fraud detection
    • Customer churn and segmentation analysis
    • Recommendation engines
    • Forecasting
    • Anomaly detection

    Companies are creating data science teams to build long term capabilities and moats around their business by using their data smartly.

    Re-platforming & App Modernization

    Enterprises want to modernize their legacy, often monolithic apps as they migrate to the cloud. The move can be triggered due to hardware refresh cycles or license renewals or IT cost optimization or adoption of software-focused business models.

     Benefits of modernization to customers and businesses

    Intelligent Applications

    Software is getting more intelligent and to enable this, businesses need to integrate disparate datasets, distributed teams, and processes. This is best done on a scalable global cloud platform with agile processes. Big data and data science enables the creation of intelligent applications.

    How can smart applications help your business?

    • New intelligent systems of engagement: intelligent apps surface insights to users enabling the user to be more effective and efficient. For example, CRMs and marketing software is getting intelligent and multi-platform enabling sales and marketing reps to become more productive.
    • Personalisation: E-Commerce, social networks and now B2B software is getting personalized. In order to improve user experience and reduce churn, your applications should be personalized based on the user preferences and traits.
    • Drive efficiencies: IoT is an excellent example where the efficiency of machines can be improved with data and cloud software. Real-time insights can help to optimize processes or can be used for preventive maintenance.
    • Creation of new business models: Traditional and modern industries can use AI to build new business models. For example, what if insurance companies allow you to pay insurance premiums only for the miles driven?

    Security

    Security threats to governments, enterprises and data have never been greater. As business adopt cloud native, DevOps & micro-services practices, their security practices need to evolve.

    In our experience, these are few of the features of a mature cloud native security practice:

    • Automated: Systems are updated automatically with the latest fixes. Another approach is immutable infrastructure with the adoption of containers and serverless.
    • Proactive: Automated security processes tend to be proactive. For example, if a malware of vulnerability is found in one environment, automation can fix it in all environments. Mature DevOps & CI/CD processes ensure that fixes can be deployed in hours or days instead of weeks or months.
    • Cloud Platforms: Businesses have realized that the mega-clouds are way more secure than their own data centers can be. Many of these cloud platforms have audit, security and compliance services which should be leveraged.
    • Protecting credentials: Use AWS KMS, Hashicorp Vault or other solutions for protecting keys, passwords and authorizations.
    • Bug bounties: Either setup bug bounties internally or through sites like HackerOne. You want the good guys to work for you and this is an easy way to do that.

    Conclusion

    As you can see, all of these approaches and best practices are intertwined and need to be implemented in concert to gain the desired results. It is best to start with one project, one group or one application and build on early wins. Remember, that is is a process and you are looking for gradual improvements to achieve your final objectives.

    Please let us know your thoughts and experiences by adding comments to this blog or reaching out to @kalpakshah or RSI. We would love to help your business adopt these best practices and help to build great software together. Drop me a note at kalpak (at) velotio (dot) com.

  • Implementing Async Features in Python – A Step-by-step Guide

    Asynchronous programming is a characteristic of modern programming languages that allows an application to perform various operations without waiting for any of them. Asynchronicity is one of the big reasons for the popularity of Node.js.

    We have discussed Python’s asynchronous features as part of our previous post: an introduction to asynchronous programming in Python. This blog is a natural progression on the same topic. We are going to discuss async features in Python in detail and look at some hands-on examples.

    Consider a traditional web scraping application that needs to open thousands of network connections. We could open one network connection, fetch the result, and then move to the next ones iteratively. This approach increases the latency of the program. It spends a lot of time opening a connection and waiting for others to finish their bit of work.

    On the other hand, async provides you a method of opening thousands of connections at once and swapping among each connection as they finish and return their results. Basically, it sends the request to a connection and moves to the next one instead of waiting for the previous one’s response. It continues like this until all the connections have returned the outputs.  

    Source: phpmind

    From the above chart, we can see that using synchronous programming on four tasks took 45 seconds to complete, while in asynchronous programming, those four tasks took only 20 seconds.

    Where Does Asynchronous Programming Fit in the Real-world?

    Asynchronous programming is best suited for popular scenarios such as:

    1. The program takes too much time to execute.

    2. The reason for the delay is waiting for input or output operations, not computation.

    3. For the tasks that have multiple input or output operations to be executed at once.

    And application-wise, these are the example use cases:

    • Web Scraping
    • Network Services

    Difference Between Parallelism, Concurrency, Threading, and Async IO

    Because we discussed this comparison in detail in our previous post, we will just quickly go through the concept as it will help us with our hands-on example later.

    Parallelism involves performing multiple operations at a time. Multiprocessing is an example of it. It is well suited for CPU bound tasks.

    Concurrency is slightly broader than Parallelism. It involves multiple tasks running in an overlapping manner.

    Threading – a thread is a separate flow of execution. One process can contain multiple threads and each thread runs independently. It is ideal for IO bound tasks.

    Async IO is a single-threaded, single-process design that uses cooperative multitasking. In simple words, async IO gives a feeling of concurrency despite using a single thread in a single process.

     

    Fig:- A comparison in concurrency and parallelism

     

    Components of Async IO Programming

    Let’s explore the various components of Async IO in depth. We will also look at an example code to help us understand the implementation.

    1. Coroutines

    Coroutines are mainly generalization forms of subroutines. They are generally used for cooperative tasks and behave like Python generators.

    An async function uses the await keyword to denote a coroutine. When using the await keyword, coroutines release the flow of control back to the event loop.

    To run a coroutine, we need to schedule it on the event loop. After scheduling, coroutines are wrapped in Tasks as a Future object.

    Example:

    In the below snippet, we called async_func from the main function. We have to add the await keyword while calling the sync function. As you can see, async_func will do nothing unless the await keyword implementation accompanies it.

    import asyncio
    async def async_func():
        print('Velotio ...')
        await asyncio.sleep(1)
        print('... Technologies!')
    
    async def main():
        async_func()#this will do nothing because coroutine object is created but not awaited
        await async_func()
    
    asyncio.run(main())

    Output

    RuntimeWarning: coroutine 'async_func' was never awaited
     async_func()#this will do nothing because coroutine object is created but not awaited
    RuntimeWarning: Enable tracemalloc to get the object allocation traceback
    Velotio ...
    ... Blog!

    2. Tasks

    Tasks are used to schedule coroutines concurrently.

    When submitting a coroutine to an event loop for processing, you can get a Task object, which provides a way to control the coroutine’s behavior from outside the event loop.

    Example:

    In the snippet below, we are creating a task using create_task (an inbuilt function of asyncio library), and then we are running it.

    import asyncio
    async def async_func():
        print('Velotio ...')
        await asyncio.sleep(1)
        print('... Blog!')
    
    async def main():
        task = asyncio.create_task (async_func())
        await task
    asyncio.run(main())

    Output

    Velotio ...
    ... Blog!

    3 Event Loops

    This mechanism runs coroutines until they complete. You can imagine it as while(True) loop that monitors coroutine, taking feedback on what’s idle, and looking around for things that can be executed in the meantime.

    It can wake up an idle coroutine when whatever that coroutine is waiting on becomes available.

    Only one event loop can run at a time in Python.

    Example:

    In the snippet below, we are creating three tasks and then appending them in a list and executing all tasks asynchronously using get_event_loop, create_task and the await function of the asyncio library.

    import asyncio
    async def async_func(task_no):
        print(f'{task_no} :Velotio ...')
        await asyncio.sleep(1)
        print(f'{task_no}... Blog!')
    
    async def main():
        taskA = loop.create_task (async_func('taskA'))
        taskB = loop.create_task(async_func('taskB'))
        taskC = loop.create_task(async_func('taskC'))
        await asyncio.wait([taskA,taskB,taskC])
    
    if __name__ == "__main__":
        try:
            loop = asyncio.get_event_loop()
            loop.run_until_complete(main())
        except :
            pass

    Output

    taskA :Velotio ...
    taskB :Velotio ...
    taskC :Velotio ...
    taskA... Blog!
    taskB... Blog!
    taskC... Blog!

    Future

    A future is a special, low-level available object that represents an eventual result of an asynchronous operation.

    When a Future object is awaited, the co-routine will wait until the Future is resolved in some other place.

    We will look into the sample code for Future objects in the next section.

    A Comparison Between Multithreading and Async IO

    Before we get to Async IO, let’s use multithreading as a benchmark and then compare them to see which is more efficient.

    For this benchmark, we will be fetching data from a sample URL (the Velotio Career webpage) with different frequencies, like once, ten times, 50 times, 100 times, 500 times, respectively.

    We will then compare the time taken by both of these approaches to fetch the required data.

    Implementation

    Code of Multithreading:

    import requests
    import time
    from concurrent.futures import ProcessPoolExecutor
    
    
    def fetch_url_data(pg_url):
        try:
            resp = requests.get(pg_url)
        except Exception as e:
            print(f"Error occured during fetch data from url{pg_url}")
        else:
            return resp.content
            
    
    def get_all_url_data(url_list):
        with ProcessPoolExecutor() as executor:
            resp = executor.map(fetch_url_data, url_list)
        return resp
        
    
    if __name__=='__main__':
        url = "https://www.velotio.com/careers"
        for ntimes in [1,10,50,100,500]:
            start_time = time.time()
            responses = get_all_url_data([url] * ntimes)
            print(f'Fetch total {ntimes} urls and process takes {time.time() - start_time} seconds')

    Output

    Fetch total 1 urls and process takes 1.8822264671325684 seconds
    Fetch total 10 urls and process takes 2.3358211517333984 seconds
    Fetch total 50 urls and process takes 8.05638575553894 seconds
    Fetch total 100 urls and process takes 14.43302869796753 seconds
    Fetch total 500 urls and process takes 65.25404500961304 seconds

    ProcessPoolExecutor is a Python package that implements the Executor interface. The fetch_url_data is a function to fetch the data from the given URL using the requests python package, and the get_all_url_data function is used to map the fetch_url_data function to the lists of URLs.

    Async IO Programming Example:

    import asyncio
    import time
    from aiohttp import ClientSession, ClientResponseError
    
    
    async def fetch_url_data(session, url):
        try:
            async with session.get(url, timeout=60) as response:
                resp = await response.read()
        except Exception as e:
            print(e)
        else:
            return resp
        return
    
    
    async def fetch_async(loop, r):
        url = "https://www.velotio.com/careers"
        tasks = []
        async with ClientSession() as session:
            for i in range(r):
                task = asyncio.ensure_future(fetch_url_data(session, url))
                tasks.append(task)
            responses = await asyncio.gather(*tasks)
        return responses
    
    
    if __name__ == '__main__':
        for ntimes in [1, 10, 50, 100, 500]:
            start_time = time.time()
            loop = asyncio.get_event_loop()
            future = asyncio.ensure_future(fetch_async(loop, ntimes))
            loop.run_until_complete(future) #will run until it finish or get any error
            responses = future.result()
            print(f'Fetch total {ntimes} urls and process takes {time.time() - start_time} seconds')

    Output

    Fetch total 1 urls and process takes 1.3974951362609863 seconds
    Fetch total 10 urls and process takes 1.4191942596435547 seconds
    Fetch total 50 urls and process takes 2.6497368812561035 seconds
    Fetch total 100 urls and process takes 4.391665458679199 seconds
    Fetch total 500 urls and process takes 4.960426330566406 seconds

    We need to use the get_event_loop function to create and add the tasks. For running more than one URL, we have to use ensure_future and gather function.

    The fetch_async function is used to add the task in the event_loop object and the fetch_url_data function is used to read the data from the URL using the session package. The future_result method returns the response of all the tasks.

    Results:

    As you can see from the plot, async programming is much more efficient than multi-threading for the program above. 

    The graph of the multithreading program looks linear, while the asyncio program graph is similar to logarithmic.

     

    Conclusion

    As we saw in our experiment above, Async IO showed better performance with the efficient use of concurrency than multi-threading.

    Async IO can be beneficial in applications that can exploit concurrency. Though, based on what kind of applications we are dealing with, it is very pragmatic to choose Async IO over other implementations.

    We hope this article helped further your understanding of the async feature in Python and gave you some quick hands-on experience using the code examples shared above.

  • Implementing gRPC In Python: A Step-by-step Guide

    In the last few years, we saw a great shift in technology, where projects are moving towards “microservice architecture” vs the old “monolithic architecture”. This approach has done wonders for us. 

    As we say, “smaller things are much easier to handle”, so here we have microservices that can be handled conveniently. We need to interact among different microservices. I handled it using the HTTP API call, which seems great and it worked for me.

    But is this the perfect way to do things?

    The answer is a resounding, “no,” because we compromised both speed and efficiency here. 

    Then came in the picture, the gRPC framework, that has been a game-changer.

    What is gRPC?

    Quoting the official documentation

    gRPC or Google Remote Procedure Call is a modern open-source high-performance RPC framework that can run in any environment. It can efficiently connect services in and across data centers with pluggable support for load balancing, tracing, health checking and authentication.”

     

    Credit: gRPC

    RPC or remote procedure calls are the messages that the server sends to the remote system to get the task(or subroutines) done.

    Google’s RPC is designed to facilitate smooth and efficient communication between the services. It can be utilized in different ways, such as:

    • Efficiently connecting polyglot services in microservices style architecture
    • Connecting mobile devices, browser clients to backend services
    • Generating efficient client libraries

    Why gRPC? 

    HTTP/2 based transport – It uses HTTP/2 protocol instead of HTTP 1.1. HTTP/2 protocol provides multiple benefits over the latter. One major benefit is multiple bidirectional streams that can be created and sent over TCP connections parallelly, making it swift. 

    Auth, tracing, load balancing and health checking – gRPC provides all these features, making it a secure and reliable option to choose.

    Language independent communication– Two services may be written in different languages, say Python and Golang. gRPC ensures smooth communication between them.

    Use of Protocol Buffers – gRPC uses protocol buffers for defining the type of data (also called Interface Definition Language (IDL)) to be sent between the gRPC client and the gRPC server. It also uses it as the message interchange format. 

    Let’s dig a little more into what are Protocol Buffers.

    Protocol Buffers

    Protocol Buffers like XML, are an efficient and automated mechanism for serializing structured data. They provide a way to define the structure of data to be transmitted. Google says that protocol buffers are better than XML, as they are:

    • simpler
    • three to ten times smaller
    • 20 to 100 times faster
    • less ambiguous
    • generates data access classes that make it easier to use them programmatically

    Protobuf are defined in .proto files. It is easy to define them. 

    Types of gRPC implementation

    1. Unary RPCs:- This is a simple gRPC which works like a normal function call. It sends a single request declared in the .proto file to the server and gets back a single response from the server.

    rpc HelloServer(RequestMessage) returns (ResponseMessage);

    2. Server streaming RPCs:- The client sends a message declared in the .proto file to the server and gets back a stream of message sequence to read. The client reads from that stream of messages until there are no messages.

    rpc HelloServer(RequestMessage) returns (stream ResponseMessage);

    3. Client streaming RPCs:- The client writes a message sequence using a write stream and sends the same to the server. After all the messages are sent to the server, the client waits for the server to read all the messages and return a response.

    rpc HelloServer(stream RequestMessage) returns (ResponseMessage);

    4. Bidirectional streaming RPCs:- Both gRPC client and the gRPC server use a read-write stream to send a message sequence. Both operate independently, so gRPC clients and gRPC servers can write and read in any order they like, i.e. the server can read a message then write a message alternatively, wait to receive all messages then write its responses, or perform reads and writes in any other combination.

    rpc HelloServer(stream RequestMessage) returns (stream ResponseMessage);

    **gRPC guarantees the ordering of messages within an individual RPC call. In the case of Bidirectional streaming, the order of messages is preserved in each stream.

    Implementing gRPC in Python

    Currently, gRPC provides support for many languages like Golang, C++, Java, etc. I will be focussing on its implementation using Python.

    mkdir grpc_example
    cd grpc_example
    virtualenv -p python3 env
    source env/bin/activate
    pip install grpcio grpcio-tools

    This will install all the required dependencies to implement gRPC.

    Unary gRPC 

    For implementing gRPC services, we need to define three files:-

    • Proto file – Proto file comprises the declaration of the service that is used to generate stubs (<package_name>_pb2.py and <package_name>_pb2_grpc.py). These are used by the gRPC client and the gRPC server.</package_name></package_name>
    • gRPC client – The client makes a gRPC call to the server to get the response as per the proto file.
    • gRPC Server – The server is responsible for serving requests to the client.
    syntax = "proto3";
    
    package unary;
    
    service Unary{
      // A simple RPC.
      //
      // Obtains the MessageResponse at a given position.
     rpc GetServerResponse(Message) returns (MessageResponse) {}
    
    }
    
    message Message{
     string message = 1;
    }
    
    message MessageResponse{
     string message = 1;
     bool received = 2;
    }

    In the above code, we have declared a service named Unary. It consists of a collection of services. For now, I have implemented a single service GetServerResponse(). This service takes an input of type Message and returns a MessageResponse. Below the service declaration, I have declared Message and Message Response.

    Once we are done with the creation of the .proto file, we need to generate the stubs. For that, we will execute the below command:-

    python -m grpc_tools.protoc --proto_path=. ./unary.proto --python_out=. --grpc_python_out=.

    Two files are generated named unary_pb2.py and unary_pb2_grpc.py. Using these two stub files, we will implement the gRPC server and the client.

    Implementing the Server

    import grpc
    from concurrent import futures
    import time
    import unary.unary_pb2_grpc as pb2_grpc
    import unary.unary_pb2 as pb2
    
    
    class UnaryService(pb2_grpc.UnaryServicer):
    
        def __init__(self, *args, **kwargs):
            pass
    
        def GetServerResponse(self, request, context):
    
            # get the string from the incoming request
            message = request.message
            result = f'Hello I am up and running received "{message}" message from you'
            result = {'message': result, 'received': True}
    
            return pb2.MessageResponse(**result)
    
    
    def serve():
        server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
        pb2_grpc.add_UnaryServicer_to_server(UnaryService(), server)
        server.add_insecure_port('[::]:50051')
        server.start()
        server.wait_for_termination()
    
    
    if __name__ == '__main__':
        serve()

    In the gRPC server file, there is a GetServerResponse() method which takes `Message` from the client and returns a `MessageResponse` as defined in the proto file.

    server() function is called from the main function, and makes sure that the server is listening to all the time. We will run the unary_server to start the server

    python3 unary_server.py

    Implementing the Client

    import grpc
    import unary.unary_pb2_grpc as pb2_grpc
    import unary.unary_pb2 as pb2
    
    
    class UnaryClient(object):
        """
        Client for gRPC functionality
        """
    
        def __init__(self):
            self.host = 'localhost'
            self.server_port = 50051
    
            # instantiate a channel
            self.channel = grpc.insecure_channel(
                '{}:{}'.format(self.host, self.server_port))
    
            # bind the client and the server
            self.stub = pb2_grpc.UnaryStub(self.channel)
    
        def get_url(self, message):
            """
            Client function to call the rpc for GetServerResponse
            """
            message = pb2.Message(message=message)
            print(f'{message}')
            return self.stub.GetServerResponse(message)
    
    
    if __name__ == '__main__':
        client = UnaryClient()
        result = client.get_url(message="Hello Server you there?")
        print(f'{result}')

    In the __init__func. we have initialized the stub using ` self.stub = pb2_grpc.UnaryStub(self.channel)’ And we have a get_url function which calls to server using the above-initialized stub  

    This completes the implementation of Unary gRPC service.

    Let’s check the output:-

    Run -> python3 unary_client.py 

    Output:-

    message: “Hello Server you there?”

    message: “Hello I am up and running. Received ‘Hello Server you there?’ message from you”

    received: true

    Bidirectional Implementation

    syntax = "proto3";
    
    package bidirectional;
    
    service Bidirectional {
      // A Bidirectional streaming RPC.
      //
      // Accepts a stream of Message sent while a route is being traversed,
       rpc GetServerResponse(stream Message) returns (stream Message) {}
    }
    
    message Message {
      string message = 1;
    }

    In the above code, we have declared a service named Bidirectional. It consists of a collection of services. For now, I have implemented a single service GetServerResponse(). This service takes an input of type Message and returns a Message. Below the service declaration, I have declared Message.

    Once we are done with the creation of the .proto file, we need to generate the stubs. To generate the stub, we need the execute the below command:-

    python -m grpc_tools.protoc --proto_path=.  ./bidirecctional.proto --python_out=. --grpc_python_out=.

    Two files are generated named bidirectional_pb2.py and bidirectional_pb2_grpc.py. Using these two stub files, we will implement the gRPC server and client.

    Implementing the Server

    from concurrent import futures
    
    import grpc
    import bidirectional.bidirectional_pb2_grpc as bidirectional_pb2_grpc
    
    
    class BidirectionalService(bidirectional_pb2_grpc.BidirectionalServicer):
    
        def GetServerResponse(self, request_iterator, context):
            for message in request_iterator:
                yield message
    
    
    def serve():
        server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
        bidirectional_pb2_grpc.add_BidirectionalServicer_to_server(BidirectionalService(), server)
        server.add_insecure_port('[::]:50051')
        server.start()
        server.wait_for_termination()
    
    
    if __name__ == '__main__':
        serve()

    In the gRPC server file, there is a GetServerResponse() method which takes a stream of `Message` from the client and returns a stream of `Message` independent of each other. server() function is called from the main function and makes sure that the server is listening to all the time.

    We will run the bidirectional_server to start the server:

    python3 bidirectional_server.py

    Implementing the Client

    from __future__ import print_function
    
    import grpc
    import bidirectional.bidirectional_pb2_grpc as bidirectional_pb2_grpc
    import bidirectional.bidirectional_pb2 as bidirectional_pb2
    
    
    def make_message(message):
        return bidirectional_pb2.Message(
            message=message
        )
    
    
    def generate_messages():
        messages = [
            make_message("First message"),
            make_message("Second message"),
            make_message("Third message"),
            make_message("Fourth message"),
            make_message("Fifth message"),
        ]
        for msg in messages:
            print("Hello Server Sending you the %s" % msg.message)
            yield msg
    
    
    def send_message(stub):
        responses = stub.GetServerResponse(generate_messages())
        for response in responses:
            print("Hello from the server received your %s" % response.message)
    
    
    def run():
        with grpc.insecure_channel('localhost:50051') as channel:
            stub = bidirectional_pb2_grpc.BidirectionalStub(channel)
            send_message(stub)
    
    
    if __name__ == '__main__':
        run()

    In the run() function. we have initialised the stub using `  stub = bidirectional_pb2_grpc.BidirectionalStub(channel)’

    And we have a send_message function to which the stub is passed and it makes multiple calls to the server and receives the results from the server simultaneously.

    This completes the implementation of Bidirectional gRPC service.

    Let’s check the output:-

    Run -> python3 bidirectional_client.py 

    Output:-

    Hello Server Sending you the First message

    Hello Server Sending you the Second message

    Hello Server Sending you the Third message

    Hello Server Sending you the Fourth message

    Hello Server Sending you the Fifth message

    Hello from the server received your First message

    Hello from the server received your Second message

    Hello from the server received your Third message

    Hello from the server received your Fourth message

    Hello from the server received your Fifth message

    For code reference, please visit here.

    Conclusion‍

    gRPC is an emerging RPC framework that makes communication between microservices smooth and efficient. I believe gRPC is currently confined to inter microservice but has many other utilities that we will see in the coming years. To know more about modern data communication solutions, check out this blog.