Category: Services

  • Building a WebSocket Service with AWS Lambda & DynamoDB

    WebSocket is an effective way for full-duplex, real-time communication between a web server and a client. It is widely used for building real-time web applications along with helper libraries that offer better features. Implementing WebSockets requires a persistent connection between two parties. Serverless functions are known for short execution time and non-persistent behavior. However, with the API Gateway support for WebSocket endpoints, it is possible to implement a Serverless service built on AWS Lambda, API Gateway, and DynamoDB.

    Prerequisites

    A basic understanding of real-time web applications will help with this implementation. Throughout this article, we will be using Serverless Framework for developing and deploying the WebSocket service. Also, Node.js is used to write the business logic. 

    Behind the scenes, Serverless uses Cloudformation to create various required resources, like API Gateway APIs, AWS Lambda functions, IAM roles and policies, etc.

    Why Serverless?

    Serverless Framework abstracts the complex syntax needed for creating the Cloudformation stacks and helps us focus on the business logic of the services. Along with that, there are a variety of plugins available that help developing serverless applications easier.

    Why DynamoDB?

    We need persistent storage for WebSocket connection data, along with AWS Lambda. DynamoDB, a serverless key-value database from AWS, offers low latency, making it a great fit for storing and retrieving WebSocket connection details.

    Overview

    In this application, we’ll be creating an AWS Lambda service that accepts the WebSocket connections coming via API Gateway. The connections and subscriptions to topics are persisted using DynamoDB. We will be using ws for implementing basic WebSocket clients for the demonstration. The implementation has a Lambda consuming WebSocket that receives the connections and handles the communication. 

    Base Setup

    We will be using the default Node.js boilerplate offered by Serverless as a starting point.

    serverless create --template aws-nodejs

    A few of the Serverless plugins are installed and used to speed up the development and deployment of the Serverless stack. We also add the webpack config given here to support the latest JS syntax.

    Adding Lambda role and policies:

    The lambda function requires a role attached to it that has enough permissions to access DynamoDB and Execute API. These are the links for the configuration files:

    Link to dynamoDB.yaml

    Link to lambdaRole.yaml

    Adding custom config for plugins:

    The plugins used for local development must have the custom config added in the yaml file.

    This is how our serverless.yaml file should look like after the base serverless configuration:

    service: websocket-app
    frameworkVersion: '2'
    custom:
     dynamodb:
       stages:
         - dev
       start:
         port: 8000
         inMemory: true
         heapInitial: 200m
         heapMax: 1g
         migrate: true
         convertEmptyValues: true
     webpack:
       keepOutputDirectory: true
       packager: 'npm'
       includeModules:
         forceExclude:
           - aws-sdk
     
    provider:
     name: aws
     runtime: nodejs12.x
     lambdaHashingVersion: 20201221
    plugins:
     - serverless-dynamodb-local
     - serverless-plugin-existing-s3
     - serverless-dotenv-plugin
     - serverless-webpack
     - serverless-offline
    resources:
     - Resources: ${file(./config/dynamoDB.yaml)}
     - Resources: ${file(./config/lambdaRoles.yaml)}
    functions:
     hello:
       handler: handler.hello

    Add WebSocket Lambda:

    We need to create a lambda function that accepts WebSocket events from API Gateway. As you can see, we’ve defined 3 WebSocket events for the lambda function.

    • $connect
    • $disconnect
    • $default

    These 3 events stand for the default routes that come with WebSocket API Gateway offering. $connect and $disconnect are used for initialization and termination of the socket connection, where $default route is for data transfer.

    functions:
     websocket:
       handler: lambda/websocket.handler
       events:
         - websocket:
             route: $connect
         - websocket:
             route: $disconnect
         - websocket:
             route: $default

    We can go ahead and update how data is sent and add custom WebSocket routes to the application.

    The lambda needs to establish a connection with the client and handle the subscriptions. The logic for updating the DynamoDB is written in a utility class client. Whenever a connection is received, we create a record in the topics table.

    console.log(`Received socket connectionId: ${event.requestContext && event.requestContext.connectionId}`);
           if (!(event.requestContext && event.requestContext.connectionId)) {
               throw new Error('Invalid event. Missing `connectionId` parameter.');
           }
           const connectionId = event.requestContext.connectionId;
           const route = event.requestContext.routeKey;
           console.log(`data from ${connectionId} ${event.body}`);
           const connection = new Client(connectionId);
           const response = { statusCode: 200, body: '' };
     
           if (route === '$connect') {
               console.log(`Route ${route} - Socket connectionId connectedconected: ${event.requestContext && event.requestContext.connectionId}`);
               await new Client(connectionId).connect();
               return response;
           } 

    The Client utility class internally creates a record for the given connectionId in the DynamoDB topics table.

    async subscribe({ topic, ttl }) {
       return dynamoDBClient
         .put({ 
            Item: {
             topic,
             connectionId: this.connectionId,
            ttl: typeof ttl === 'number' ? ttl : Math.floor(Date.now() / 1000) + 60 * 60 * 2,
           },
           TableName: process.env.TOPICS_TABLE,
         }).promise();
     }

    Similarly, for the $disconnect route, we remove the INITIAL_CONNECTION topic record when a client disconnects.

    else if (route === '$disconnect') {
     console.log(`Route ${route} - Socket disconnected: ${ event.requestContext.connectionId}`);
               await new Client(connectionId).unsubscribe();
               return response;
           }

    The client.unsubscribe method internally removes the connection entry from the DynamoDB table. Here, the getTopics method fetches all the topics the particular client has subscribed to.

    async unsubscribe() {
       const topics = await this.getTopics();
       if (!topics) {
         throw Error(`Topics got undefined`);
       }
       return this.removeTopics({
         [process.env.TOPICS_TABLE]: topics.map(({ topic, connectionId }) => ({
           DeleteRequest: { Key: { topic, connectionId } },
         })),
       });
     }

    Now comes the default route part of the lambda where we customize message handling. In this implementation, we’re relaying our message handling based on the event.body.type, which indicates what kind of message is received from the client to server. The subscribe type here is used to subscribe to new topics. Similarly, the message type is used to receive the message from one client and then publish it to other clients who have subscribed to the same topic as the sender.

    console.log(`Route ${route} - data from ${connectionId}`);
               if (!event.body) {
                   return response;
               }
               let body = JSON.parse(event.body);
               const topic = body.topic;
               if (body.type === 'subscribe') {
                   connection.subscribe({ topic });
                   console.log(`Client subscribing for topic: ${topic}`);
               }
               if (body.type === 'message') {
                   await new Topic(topic).publishMessage({ data: body.message });
                   console.error(`Published messages to subscribers`);
                   return response;
               }
               return response;

    Similar to $connect, the subscribe type of payload, when received, creates a new subscription for the mentioned topic.

    Publishing the messages

    Here is the interesting part of this lambda. When a client sends a payload with type message, the lambda calls the publishMessage method with the data received. The method gets the subscribers active for the topic and publishes messages using another utility TopicSubscriber.sendMessage

    async publishMessage(data) {
       const subscribers = await this.getSubscribers();
       const promises = subscribers.map(async ({ connectionId, subscriptionId }) => {
         const TopicSubscriber = new Client(connectionId);
           const res = await TopicSubscriber.sendMessage({
             id: subscriptionId,
             payload: { data },
             type: 'data',
           });
           return res;
       });
       return Promise.all(promises);
     }

    The sendMessage executes the API endpoint, which is the API Gateway URL after deployment. As we’re using serverless-offline for the local development, the IS_OFFLINE env variable is automatically set.

    const endpoint =  process.env.IS_OFFLINE ? 'http://localhost:3001' : process.env.PUBLISH_ENDPOINT;
       console.log('publish endpoint', endpoint);
       const gatewayClient = new ApiGatewayManagementApi({
         apiVersion: '2018-11-29',
         credentials: config,
         endpoint,
       });
       return gatewayClient
         .postToConnection({
           ConnectionId: this.connectionId,
           Data: JSON.stringify(message),
         })
         .promise();

    Instead of manually invoking the API endpoint, we can also use DynamoDB streams to trigger a lambda asynchronously and publish messages to topics.

    Implementing the client

    For testing the socket implementation, we will be using a node.js script ws-client.js. This creates two nodejs ws clients: one that sends the data and another that receives it.

    const WebSocket = require('ws');
    const sockedEndpoint = 'http://0.0.0.0:3001';
    const ws1 = new WebSocket(sockedEndpoint, {
     perMessageDeflate: false
    });
    const ws2 = new WebSocket(sockedEndpoint, {
     perMessageDeflate: false
    });

    The first client on connect sends the data at an interval of one second to a topic named general. The count increments each send.

    ws1.on('open', () => {
       console.log('WS1 connected');
       let count = 0;
       setInterval(() => {
         const data = {
           type: 'message',
           message: `count is ${count}`,
           topic: 'general'
         }
         const message  = JSON.stringify(data);
         ws1.send(message, (err) => {
           if(err) {
             console.log(`Error occurred while send data ${err.message}`)
           }
           console.log(`WS1 OUT ${message}`);
         })
         count++;
       }, 15000)
    })

    The second client on connect will first subscribe to the general topic and then attach a handler for receiving data.

    ws2.on('open', () => {
     console.log('WS2 connected');
     const data = {
       type: 'subscribe',
       topic: 'general'
     }
     ws2.send(JSON.stringify(data), (err) => {
       if(err) {
         console.log(`Error occurred while send data ${err.message}`)
       }
     })
    });
    ws2.on('message', ( message) => {
     console.log(`ws2 IN ${message}`);
    });

    Once the service is running, we should be able to see the following output, where the two clients successfully sharing and receiving the messages with our socket server.

    Conclusion

    With API Gateway WebSocket support and DynamoDB, we’re able to implement persistent socket connections using serverless functions. The implementation can be improved and can be as complex as needed.

    WebSocket is an effective way for full-duplex, real-time communication between a web server and a client. It is widely used for building real-time web applications along with helper libraries that offer better features. Implementing WebSockets requires a persistent connection between two parties. Serverless functions are known for short execution time and non-persistent behavior. However, with the API Gateway support for WebSocket endpoints, it is possible to implement a Serverless service built on AWS Lambda, API Gateway, and DynamoDB.

  • Elasticsearch – Basic and Advanced Concepts

    What is Elasticsearch?

    In our previous blog, we have seen Elasticsearch is a highly scalable open-source full-text search and analytics engine, built on the top of Apache Lucene. Elasticsearch allows you to store, search, and analyze huge volumes of data as quickly as possible and in near real-time.

    Basic Concepts –

    • Index – Large collection of JSON documents. Can be compared to a database in relational databases. Every document must reside in an index.
    • Shards – Since, there is no limit on the number of documents that reside in an index, indices are often horizontally partitioned as shards that reside on nodes in the cluster. 
      Max documents allowed in a shard = 2,147,483,519 (as of now)
    • Type – Logical partition of an index. Similar to a table in relational databases. 
    • Fields – Similar to a column in relational databases. 
    • Analyzers – Used while indexing/searching the documents. These contain “tokenizers” that split phrases/text into tokens and “token-filters”, that filter/modify tokens during indexing & searching.
    • Mappings – Combination of Field + Analyzers. It defines how your fields can be stored & indexed.

    Inverted Index

    ES uses Inverted Indexes under the hood. Inverted Index is an index which maps terms to documents containing them.

    Let’s say, we have 3 documents :

    1. Food is great
    2. It is raining
    3. Wind is strong

    An inverted index for these documents can be constructed as –

    The terms in the dictionary are stored in a sorted order to find them quickly.

    Searching multiple terms is done by performing a lookup on the terms in the index. It performs either UNION or INTERSECTION on them and fetches relevant matching documents.

    An ES Index is spanned across multiple shards, each document is routed to a shard in a round–robin fashion while indexing. We can customize which shard to route the document, and which shard search-requests are sent to.

    ES Index is made of multiple Lucene indexes, which in turn, are made up of index segments. These are write once, read many types of indices, i.e the index files Lucene writes are immutable (except for deletions).

    Analyzers –

    Analysis is the process of converting text into tokens or terms which are added to the inverted index for searching. Analysis is performed by an analyzer. An analyzer can be either a built-in or a custom. 

    We can define single analyzer for both indexing & searching, or a different search-analyzer and an index-analyzer for a mapping.

    Building blocks of analyzer- 

    • Character filters – receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters.
    • Tokenizers – receives a stream of characters, breaks it up into individual tokens. 
    • Token filters – receives the token stream and may add, remove, or change tokens.

    Some Commonly used built-in analyzers –

    1. Standard –

    Divides text into terms on word boundaries. Lower-cases all terms. Removes punctuation and stopwords (if specified, default = None).

    Text:  The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [the, 2, quick, brown, foxes, jumped, over, the, lazy, dog’s, bone]

    2. Simple/Lowercase –

    Divides text into terms whenever it encounters a non-letter character. Lower-cases all terms.

    Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]

    3. Whitespace –

    Divides text into terms whenever it encounters a white-space character.

    Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog’s, bone.]

    4. Stopword –

    Same as simple-analyzer with stop word removal by default.

    Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [ quick, brown, foxes, jumped, over, lazy, dog, s, bone]

    5. Keyword / NOOP –

    Returns the entire input string as it is.

    Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.]

    Some Commonly used built-in tokenizers –

    1. Standard –

    Divides text into terms on word boundaries, removes most punctuation.

    2. Letter –

    Divides text into terms whenever it encounters a non-letter character.

    3. Lowercase –

    Letter tokenizer which lowercases all tokens.

    4. Whitespace –

    Divides text into terms whenever it encounters any white-space character.

    5. UAX-URL-EMAIL –

    Standard tokenizer which recognizes URLs and email addresses as single tokens.

    6. N-Gram –

    Divides text into terms when it encounters anything from a list of specified characters (e.g. whitespace or punctuation), and returns n-grams of each word: a sliding window of continuous letters, e.g. quick → [qu, ui, ic, ck, qui, quic, quick, uic, uick, ick].

    7. Edge-N-Gram –

    It is similar to N-Gram tokenizer with n-grams anchored to the start of the word (prefix- based NGrams). e.g. quick → [q, qu, qui, quic, quick].

    8. Keyword –

    Emits exact same text as a single term.

    Make your mappings right –

    Analyzers if not made right, can increase your search time extensively. 

    Avoid using regular expressions in queries as much as possible. Let your analyzers handle them.

    ES provides multiple tokenizers (standard, whitespace, ngram, edge-ngram, etc) which can be directly used, or you can create your own tokenizer. 

    A simple use-case where we had to search for a user who either has “brad” in their name or “brad_pitt” in their email (substring based search), one would simply go and write a regex for this query, if no proper analyzers are written for this mapping.

    {
      "query": {
        "bool": {
          "should": [
            {
              "regexp": {
                "email.raw": ".*brad_pitt.*"
              }
            },
            {
              "regexp": {
                "name.raw": ".*brad.*"
              }
            }
          ]
        }
      }
    }

    This took 16s for us to fetch 1 lakh out of 60 million documents

    Instead, we created an n-gram analyzer with lower-case filter which would generate all relevant tokens while indexing.

    The above regex query was updated to –

    {
      "query": {
        "bool": {
          "multi_match": {
            "query": "brad",
            "fields": [
              "email.suggestion",
              "full_name.suggestion"
            ]
          }
        }
      },
      "size": 25
    }

    This took 109ms for us to fetch 1 lakh out of 60 million documents

    Thus, previous search query which took more than 10-25s got reduced to less than 800-900ms to fetch the same set of records.

    Had the use-case been to search results where name starts with “brad” or email starts with “brad_pitt” (prefix based search), it is better to go for edge-n-gram analyzer or suggesters.

    Performance Improvement with Filter Queries –

    Use Filter queries whenever possible. 

    ES usually scores documents and returns them in sorted order as per their scores. This may take a hit on performance if scoring of documents is not relevant to our use-case. In such scenarios, use “filter” queries which give boolean scores to documents.

    {
      "query": {
        "bool": {
          "multi_match": {
            "query": "brad",
            "fields": [
              "email.suggestion",
              "full_name.suggestion"
            ]
          }
        }
      },
      "size": 25
    }

    Above query can now be written as –

    {
      "query": {
        "bool": {
          "filter": {
            "bool": {
              "must": [
                {
                  "multi_match": {
                    "query": "brad",
                    "fields": [
                      "email.suggestion",
                      "full_name.suggestion"
                    ]
                  }
                }
              ]
            }
          }
        }
      },
      "size": 25
    }

    This will reduce query-time by a few milliseconds.

    Re-indexing made faster –

    Before creating any mappings, know your use-case well.

    ES does not allow us to alter existing mappings unlike “ALTER” command in relational databases, although we can keep adding new mappings to the index. 

    The only way to change existing mappings is by creating a new index, re-indexing existing documents and aliasing the new-index with required name with ZERO downtime on production. Note – This process can take days if you have millions of records to re-index.

    To re-index faster, we can change a few settings  –

    1. Disable swapping – Since no requests will be directed to the new index till indexing is done, we can safely disable swap
    Command for Linux machines –

    sudo swapoff -a

    2. Disable refresh_interval for ES – Default refresh_interval is 1s which can safely be disabled while documents are getting re-indexed.

    3. Change bulk size while indexing – ES usually indexes documents in chunks of size 1k. It is preferred to increase this default size to approx 5 to 10K, although we need to find the sweet spot while reindexing to avoid load on current index.

    4. Reset replica count to 0  – ES creates at least 1 replica per shard, by default. We can set this to 0 while indexing & reset it to required value post indexing.

    Conclusion

    ElasticSearch is a very powerful database for text-based searches. The Elastic ecosystem is widely used for reporting, alerting, machine learning, etc. This article just gives an overview of ElasticSearch mappings and how creating relevant mappings can improve your query performance & accuracy. Giving right mappings, right resources to your ElasticSearch cluster can do wonders.

  • How To Use Inline Functions In React Applications Efficiently

    This blog post explores the performance cost of inline functions in a React application. Before we begin, let’s try to understand what inline function means in the context of a React application.

    What is an inline function?

    Simply put, an inline function is a function that is defined and passed down inside the render method of a React component.

    Let’s understand this with a basic example of what an inline function might look like in a React application:

    export default class CounterApp extends React.Component {
      constructor(props) {
        super(props);
        this.state = { count: 0 };
      }
      render() {
        return (
          <div className="App">
            <button
              onClick={() => {
              this.setState({ count: this.state.count + 1 });
              }}
            >COUNT ({this.state.count})</button>
          </div>
        );
      }
    }

    The onClick prop, in the example above, is being passed as an inline function that calls this.setState. The function is defined within the render method, often inline with JSX. In the context of React applications, this is a very popular and widely used pattern.

    Let’s begin by listing some common patterns and techniques where inline functions are used in a React application:

    • Render prop: A component prop that expects a function as a value. This function must return a JSX element, hence the name. Render prop is a good candidate for inline functions.
    render() {
      return (
        <ListView
          items={items}
          render={({ item }) => (<div>{item.label}</div>)}
        />
      );
    }

    • DOM event handlers: DOM event handlers often make a call to setState or invoke some effect in the React application such as sending data to an API server.
    <button
      onClick={() => {
        this.setState({ count: this.state.count + 1 });
      }}>
      COUNT ({this.state.count})
    </button>

    • Custom function or event handlers passed to child: Oftentimes, a child component requires a custom event handler to be passed down as props. Inline function is usually used in this scenario.
    <Button onTap={() => {
      this.nextPage();
    }}>Next<Button>

    Alternatives to inline function

    • Bind in constructor: One of the most common patterns is to define the function within the class component and then bind context to the function in constructor. We only need to bind the current context if we want to use this keyword inside the handler function.
    export default class CounterApp extends React.Component {
      constructor(props) {
        super(props);
        this.state = { count: 0 };
        this.increaseCount = this.increaseCount.bind(this);
      }
    
      increaseCount() {
        this.setState({ count: this.state.count + 1 });
      }
    
      render() {
        return (
          <div className="App">
            <button onClick={this.increaseCount}>COUNT ({this.state.count})</button>
          </div>
        );
     }
    }

    • Bind in render: Another common pattern is to bind the context inline when the function is passed down. Eventually, this gets repetitive and hence the first approach is more popular.
    render() {
      return (
        <div className="App">
          <button onClick={this.increaseCount.bind(this)}>COUNT ({this.state.count})</button>
        </div>
      );
    }

    • Define as public field:
    increaseCount = () => {
      this.setState({ count: this.state.count + 1 });
    };
    
    render() {
      return (
        <div className="App">
          <button onClick={this.increaseCount}>
            COUNT ({this.state.count})
          </button>
        </div>
      );
    }
    view raw

    There are several other approaches that React dev community has come up with, like using a helper method to bind all functions automatically in the constructor.

    After understanding inline functions with its examples and also taking a look at a few alternatives, let’s see why inline functions are so popular and widely used.

    Why use inline function

    Inline function definitions are right where they are invoked or passed down. This means inline functions are easier to write, especially when the body of the function is of a few instructions such as calling setState. This works well within loops as well.

    For example, when rendering a list and assigning a DOM event handler to each list item, passing down an inline function feels much more intuitive. For the same reason, inline functions also make code more organized and readable.

    Inline arrow functions preserve context that means developers can use this without having to worry about current execution context or explicitly bind a context to the function.

    <Button onTap={() => {
      this.prevPage();
    }}>Previous<Button>

    Inline functions make value from parent scope available within the function definition. It results in more intuitive code and developers need to pass down fewer parameters. Let’s understand this with an example.

    render() {
      const { count } = this.state;
      return (
        <div className="App">
          <button
            onClick={() => {
              this.setState({ count: count + 1 });
          }}>
            COUNT ({count})
          </button>
        </div>
      );
    }

    Here, the value of count is readily available to onClick event handlers. This behavior is called closing over.

    For these reasons, React developers make use of inline functions heavily. That said, inline function has also been a hot topic of debate because of performance concerns. Let’s take a look at a few of these arguments.

    Arguments against inline functions

    • A new function is defined every time the render method is called. It results in frequent garbage collection, and hence performance loss.
    • There is an eslint config that advises against using inline function jsx-no-bind. The idea behind this rule is when an inline function is passed down to a child component, React uses reference checks to re-render the component. This can result in child component rendering again and again as a reference to the passed prop value i.e. inline function. In this case, it doesn’t match the original one.

    <listitem onclick=”{()” ==””> console.log(‘click’)}></listitem>

    Suppose ListItem component implements shouldComponentUpdate method where it checks for onClick prop reference. Since inline functions are created every time a component re-renders, this means that the ListItem component will reference a new function every time, which points to a different location in memory. The comparison checks in shouldComponentUpdate and tells React to re-render ListItem even though the inline function’s behavior doesn’t change. This results in unnecessary DOM updates and eventually reduces the performance of applications.

    Performance concerns revolving around the Function.prototype.bind methods: when not using arrow functions, the inline function being passed down must be bound to a context if using this keyword inside the function. The practice of calling .bind before passing down an inline function raises performance concerns, but it has been fixed. For older browsers, Function.prototype.bind can be supplemented with a polyfill for performance.

    Now that we’ve summarized a few arguments in favor of inline functions and a few arguments against it, let’s investigate and see how inline functions really perform.

    render() {
      return (
        <div>
          {this.state.timeThen > this.state.timeNow ? (
           <>
             <button onClick={() => { /* some action */ }} />
             <button onClick={() => { /* another action */ }} />
           </>
          ) : (
            <button onClick={() => { /* yet another action */ }} />
          )}
        </div>
      );
    }

    Pre-optimization can often lead to bad code. For instance, let’s try to get rid of all the inline function definitions in the component above and move them to the constructor because of performance concerns.

    We’d then have to define 3 custom event handlers in the class definition and bind context to all three functions in the constructor.

    export default class CounterApp extends React.Component {
      constructor(props) {
        super(props);
        this.state = {
          timeThen: ...,
          timeNow: Date.now()
        };
        this.someAction = this.someAction.bind(this);
        this.anotherAction = this.anotherAction.bind(this);
        this.yetAnotherAction = this.yetAnotherAction.bind(this);
      }
    
      someAction() { /* some action */ }
      anotherAction() { /* another action */ }
      yetAnotherAction() { /* yet another action */ }
    
      render() {
        return (<div>
          {this.state.timeThen > this.state.timeNow ? (
            <>
              <button onClick={this.someAction} />
              <button onClick={this.anotherAction} />
            </>
          ) : (
            <button onClick={this.yetAnotherAction} />
          )}</div>);
      }
    }

    This would increase the initialization time of the component significantly as opposed to inline function declarations where only one or two functions are defined and used at a time based on the result of condition timeThen > timeNow.

    Concerns around render props: A render prop is a method that returns a React element and is used to share state among React components.

    Render props are meant to be invoked on each render since they share state between parent components and enclosed React elements. Inline functions are a good candidate for use in render prop and won’t cause any performance concern.

    render() {
      return (
        <ListView
          items={items}
          render={({ item }) => (<div>{item.label}</div>)}
        />
      )
    }

    Here, the render prop of ListView component returns a label enclosed in a div. Since the enclosed component can never know what the value of the item variable is, it can never be a PureComponent or have a meaningful implementation of shouldComponentUpdate(). This eliminates the concerns around use of inline function in render prop. In fact, promotes it in most cases.

    In my experience, inline render props can sometimes be harder to maintain especially when render prop returns a larger more complicated component in terms of code size. In such cases, breaking down the component further or having a separate method that gets passed down as render prop has worked well for me.

    Concerns around PureComponents and shouldComponentUpdate(): Pure components and various implementations of shouldComponentUpdate both do a strict type comparison of props and state. These act as performance enhancers by letting React know when or when not to trigger a render based on changes to state and props. Since inline functions are created on every render, when such a method is passed down to a pure component or a component that implements the shouldComponentUpdate method, it can lead to an unnecessary render. This is because of the changed reference of the inline function.

    To overcome this, consider skipping checks on all function props in shouldComponentUpdate(). This assumes that inline functions passed to the component are only different in reference and not behavior. If there is a difference in the behavior of the function passed down, it will result in a missed render and eventually lead to bugs in the component’s state and effects.

    Conclusion‍

    A common rule of thumb is to measure performance of the app and only optimize if needed. Performance impact of inline function, often categorized under micro-optimizations, is always a tradeoff between code readability, performance gain, code organization, etc that must be thought out carefully on a case by case basis and pre-optimization should be avoided.

    In this blog post, we observed that inline functions don’t necessarily bring in a lot of performance cost. They are widely used because of ease of writing, reading and organizing inline functions, especially when inline function definitions are short and simple.

  • Mesosphere DC/OS Masterclass : Tips and Tricks to Make Life Easier

    DC/OS is an open-source operating system and distributed system for data center built on Apache Mesos distributed system kernel. As a distributed system, it is a cluster of master nodes and private/public nodes, where each node also has host operating system which manages the underlying machine. 

    It enables the management of multiple machines as if they were a single computer. It automates resource management, schedules process placement, facilitates inter-process communication, and simplifies the installation and management of distributed services. Its included web interface and available command-line interface (CLI) facilitate remote management and monitoring of the cluster and its services.

    • Distributed System DC/OS is distributed system with group of private and public nodes which are coordinated by master nodes.
    • Cluster Manager : DC/OS  is responsible for running tasks on agent nodes and providing required resources to them. DC/OS uses Apache Mesos to provide cluster management functionality.
    • Container Platform : All DC/OS tasks are containerized. DC/OS uses two different container runtimes, i.e. docker and mesos. So that containers can be started from docker images or they can be native executables (binaries or scripts) which are containerized at runtime by mesos.
    • Operating System :  As name specifies, DC/OS is an operating system which abstracts cluster h/w and s/w resources and provide common services to applications.

    Unlike Linux, DC/OS is not a host operating system. DC/OS spans multiple machines, but relies on each machine to have its own host operating system and host kernel.

    The high level architecture of DC/OS can be seen below :

    For the detailed architecture and components of DC/OS, please click here.

    Adoption and usage of Mesosphere DC/OS:

    Mesosphere customers include :

    • 30% of the Fortune 50 U.S. Companies
    • 5 of the top 10 North American Banks
    • 7 of the top 12 Worldwide Telcos
    • 5 of the top 10 Highest Valued Startups

    Some companies using DC/OS are :

    • Cisco
    • Yelp
    • Tommy Hilfiger
    • Uber
    • Netflix
    • Verizon
    • Cerner
    • NIO

    Installing and using DC/OS

    A guide to installing DC/OS can be found here. After installing DC/OS on any platform, install dcos cli by following documentation found here.

    Using dcos cli, we can manager cluster nodes, manage marathon tasks and services, install/remove packages from universe and it provides great support for automation process as each cli command can be output to json.

    NOTE: The tasks below are executed with and tested on below tools:

    • DC/OS 1.11 Open Source
    • DC/OS cli 0.6.0
    • jq:1.5-1-a5b5cbe

    DC/OS commands and scripts

    Setup DC/OS cli with DC/OS cluster

    dcos cluster setup <CLUSTER URL>

    Example :

    dcos cluster setup http://dcos-cluster.com

    The above command will give you the link for oauth authentication and prompt for auth token. You can authenticate yourself with any of Google, Github or Microsoft account. Paste the token generated after authentication to cli prompt. (Provided oauth is enabled).

    DC/OS authentication token

    docs config show core.dcos_acs_token

    DC/OS cluster url

    dcos config show core.dcos_url

    DC/OS cluster name

    dcos config show cluster.name

    Access Mesos UI

    <DC/OS_CLUSTER_URL>/mesos

    Example:

    http://dcos-cluster.com/mesos

    Access Marathon UI

    <DC/OS_CLUSTER_URL>/service/marathon

    Example:

    http://dcos-cluster.com/service/marathon

    Access any DC/OS service, like Marathon, Kafka, Elastic, Spark etc.[DC/OS Services]

    <DC/OS_CLUSTER_URL>/service/<SERVICE_NAME>

    Example:

    http://dcos-cluster.com/service/marathon
    http://dcos-cluster.com/service/kafka

    Access DC/OS slaves info in json using Mesos API [Mesos Endpoints]

    curl -H "Authorization: Bearer $(dcos config show 
    core.dcos_acs_token)" $(dcos config show 
    core.dcos_url)/mesos/slaves | jq

    Access DC/OS slaves info in json using DC/OS cli

    dcos node --json

    Note : DC/OS cli ‘dcos node –json’ is equivalent to running mesos slaves endpoint (/mesos/slaves)

    Access DC/OS private slaves info using DC/OS cli

    dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip == null) | "Private Agent : " + .hostname ' -r

    Access DC/OS public slaves info using DC/OS cli

    dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip != null) | "Public Agent : " + .hostname ' -r

    Access DC/OS private and public slaves info using DC/OS cli

    dcos node --json | jq '.[] | select(.type | contains("agent")) | if (.attributes.public_ip != null) then "Public Agent : " else "Private Agent : " end + " - " + .hostname ' -r | sort

    Get public IP of all public agents

    #!/bin/bash
    
    for id in $(dcos node --json | jq --raw-output '.[] | select(.attributes.public_ip == "true") | .id'); 
    do 
          dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --mesos-id=$id "curl -s ifconfig.co"
    done 2>/dev/null

    Note: As ‘dcos node ssh’ requires private key to be added to ssh. Make sure you add your private key as ssh identity using :

    ssh-add </path/to/private/key/file/.pem>

    Get public IP of master leader

    dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "curl -s ifconfig.co" 2>/dev/null

    Get all master nodes and their private ip

    dcos node --json | jq '.[] | select(.type | contains("master"))
    | .ip + " = " + .type' -r

    Get list of all users who have access to DC/OS cluster

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)"
    "$(dcos config show core.dcos_url)/acs/api/v1/users" | jq ‘.array[].uid’ -r

    Add users to cluster using Mesosphere script (Run this on master)

    Users to add are given in list.txt, each user on new line

    for i in `cat list.txt`; do echo $i;
    sudo -i dcos-shell /opt/mesosphere/bin/dcos_add_user.py $i; done

    Add users to cluster using DC/OS API

    #!/bin/bash
    
    # Uage dcosAddUsers.sh <Users to add are given in list.txt, each user on new line>
    for i in `cat users.list`; 
    do 
      echo $i
      curl -X PUT -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
    done

    Delete users from DC/OS cluster organization

    #!/bin/bash
    
    # Usage dcosDeleteUsers.sh <Users to delete are given in list.txt, each user on new line>
    
    for i in `cat users.list`; 
    do 
      echo $i
      curl -X DELETE -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
    done

    Offers/resources from individual DC/OS agent

    In recent versions of the many dcos services, a scheduler endpoint at                

    http://yourcluster.com/service/<service-name>/v1/debug/offers

    will display an HTML table containing a summary of recently-evaluated offers. This table’s contents are currently very similar to what can be found in logs, but in a slightly more accessible format. Alternately, we can look at the scheduler’s logs in stdout. An offer is a set of resources all from one individual DC/OS agent.

    <DC/OS_CLUSTER_URL>/service/<service_name>/v1/debug/offers

    Example:

    http://dcos-cluster.com/service/kafka/v1/debug/offers
    http://dcos-cluster.com/service/elastic/v1/debug/offers

    Save JSON configs of all running Marathon apps

    #!/bin/bash
    
    # Save marathon configs in json format for all marathon apps
    # Usage : saveMarathonConfig.sh
    
    for service in `dcos marathon app list --quiet | tr -d "/" | sort`; do
      dcos marathon app show $service | jq '. | del(.tasks, .version, .versionInfo, .tasksHealthy, .tasksRunning, .tasksStaged, .tasksUnhealthy, .deployments, .executor, .lastTaskFailure, .args, .ports, .residency, .secrets, .storeUrls, .uris, .user)' >& $service.json
    done

    Get report of Marathon apps with details like container type, Docker image, tag or service version used by Marathon app.

    #!/bin/bash
    
    TMP_CSV_FILE=$(mktemp /tmp/dcos-config.XXXXXX.csv)
    TMP_CSV_FILE_SORT="${TMP_CSV_FILE}_sort"
    #dcos marathon app list --json | jq '.[] | if (.container.docker.image != null ) then .id + ",Docker Application," + .container.docker.image else .id + ",DCOS Service," + .labels.DCOS_PACKAGE_VERSION end' -r > $TMP_CSV_FILE
    dcos marathon app list --json | jq '.[] | .id + if (.container.type == "DOCKER") then ",Docker Container," + .container.docker.image else ",Mesos Container," + if(.labels.DCOS_PACKAGE_VERSION !=null) then .labels.DCOS_PACKAGE_NAME+":"+.labels.DCOS_PACKAGE_VERSION  else "[ CMD ]" end end' -r > $TMP_CSV_FILE
    sed -i "s|^/||g" $TMP_CSV_FILE
    sort -t "," -k2,2 -k3,3 -k1,1 $TMP_CSV_FILE > ${TMP_CSV_FILE_SORT}
    cnt=1
    printf '%.0s=' {1..150}
    printf "n  %-5s%-35s%-23s%-40s%-20sn" "No" "Application Name" "Container Type" "Docker Image" "Tag / Version"
    printf '%.0s=' {1..150}
    while IFS=, read -r app typ image; 
    do
            tag=`echo $image | awk -F':' -v im="$image" '{tag=(im=="[ CMD ]")?"NA":($2=="")?"latest":$2; print tag}'`
            image=`echo $image | awk -F':' '{print $1}'`
            printf "n  %-5s%-35s%-23s%-40s%-20s" "$cnt" "$app" "$typ" "$image" "$tag"
            cnt=$((cnt + 1))
            sleep 0.3
    done < $TMP_CSV_FILE_SORT
    printf "n"
    printf '%.0s=' {1..150}
    printf "n"

    Get DC/OS nodes with more information like node type, node ip, attributes, number of running tasks, free memory, free cpu etc.

    #!/bin/bash
    
    printf "n  %-15s %-18s%-18s%-10s%-15s%-10sn" "Node Type" "Node IP" "Attribute" "Tasks" "Mem Free (MB)" "CPU Free"
    printf '%.0s=' {1..90}
    printf "n"
    TAB=`echo -e "t"`
    dcos node --json | jq '.[] | if (.type | contains("leader")) then "Master (leader)" elif ((.type | contains("agent")) and .attributes.public_ip != null) then "Public Agent" elif ((.type | contains("agent")) and .attributes.public_ip == null) then "Private Agent" else empty end + "t"+ if(.type |contains("master")) then .ip else .hostname end + "t" +  (if (.attributes | length !=0) then (.attributes | to_entries[] | join(" = ")) else "NA" end) + "t" + if(.type |contains("agent")) then (.TASK_RUNNING|tostring) + "t" + ((.resources.mem - .used_resources.mem)| tostring) + "tt" +  ((.resources.cpus - .used_resources.cpus)| tostring)  else "ttNAtNAttNA"  end' -r | sort -t"$TAB" -k1,1d -k3,3d -k2,2d
    printf '%.0s=' {1..90}
    printf "n"

    Framework Cleaner

    Uninstall framework and clean reserved resources if any after framework is deleted/uninstalled. (applicable if running DC/OS 1.9 or older, if higher than 1.10, then only uninstall cli is sufficient)

    SERVICE_NAME=
    dcos package uninstall $SERVICE_NAME
    dcos node ssh --option StrictHostKeyChecking=no --master-proxy
    --leader "docker run mesosphere/janitor /janitor.py -r
    ${SERVICE_NAME}-role -p ${SERVICE_NAME}-principal -z dcos-service-${SERVICE_NAME}"

    Get DC/OS apps and their placement constraints

    dcos marathon app list --json | jq '.[] |
    if (.constraints != null) then .id, .constraints else empty end'

    Run shell command on all slaves

    #!/bin/bash
    
    # Run any shell command on all slave nodes (private and public)
    
    # Usage : dcosRunOnAllSlaves.sh <CMD= any shell command to run, Ex: ulimit -a >
    CMD=$1
    for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'`; do 
       echo -e "n###> Running command [ $CMD ] on $i"
       dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
       echo -e "======================================n"
    done

    Run shell command on master leader

    CMD=<shell command, Ex: ulimit -a >dcos node ssh --option StrictHostKeyChecking=no --option
    LogLevel=quiet --master-proxy --leader "$CMD"

    Run shell command on all master nodes

    #!/bin/bash
    
    # Run any shell command on all master nodes
    
    # Usage : dcosRunOnAllSlaves.sh <CMD= any shell command to run, Ex: ulimit -a >
    CMD=$1
    for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'` 
    do 
      echo -e "n###> Running command [ $CMD ] on $i"
      dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
     echo -e "======================================n"
    done

    Add node attributes to dcos nodes and run apps on nodes with required attributes using placement constraints

    #!/bin/bash
    
    #1. SSH on node 
    #2. Create or edit file /var/lib/dcos/mesos-slave-common
    #3. Add contents as :
    #    MESOS_ATTRIBUTES=<key>:<value>
    #    Example:
    #    MESOS_ATTRIBUTES=TYPE:DB;DB_TYPE:MONGO;
    #4. Stop dcos-mesos-slave service
    #    systemctl stop dcos-mesos-slave
    #5. Remove link for latest slave metadata
    #    rm -f /var/lib/mesos/slave/meta/slaves/latest
    #6. Start dcos-mesos-slave service
    #    systemctl start dcos-mesos-slave
    #7. Wait for some time, node will be in HEALTHY state again.
    #8. Add app placement constraint with field = key and value = value
    #9. Verify attributes, run on any node
    #    curl -s http://leader.mesos:5050/state | jq '.slaves[]| .hostname ,.attributes'
    #    OR Check DCOS cluster UI
    #    Nodes => Select any Node => Details Tab
    
    tmpScript=$(mktemp "/tmp/addDcosNodeAttributes-XXXXXXXX")
    
    # key:value paired attribues, separated by ;
    ATTRIBUTES=NODE_TYPE:GPU_NODE
    
    cat <<EOF > ${tmpScript}
    echo "MESOS_ATTRIBUTES=${ATTRIBUTES}" | sudo tee /var/lib/dcos/mesos-slave-common
    sudo systemctl stop dcos-mesos-slave
    sudo rm -f /var/lib/mesos/slave/meta/slaves/latest
    sudo systemctl start dcos-mesos-slave
    EOF
    
    # Add the private ip of nodes on which you want to add attrubutes, one ip per line.
    for i in `cat nodes.txt`; do 
        echo $i
        dcos node ssh --master-proxy --option StrictHostKeyChecking=no --private-ip $i <$tmpScript
        sleep 10
    done

    Install DC/OS Datadog metrics plugin on all DC/OS nodes

    #!/bin/bash
    
    # Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>
    
    DDAPI=$1
    
    if [[ -z $DDAPI ]]; then
        echo "[Datadog Plugin] Need datadog API key as parameter."
        echo "[Datadog Plugin] Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>."
    fi
    tmpScriptMaster=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")
    tmpScriptAgent=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")
    
    declare agent=$tmpScriptAgent
    declare master=$tmpScriptMaster
    
    for role in "agent" "master"
    do
    cat <<EOF > ${!role}
    curl -s -o /opt/mesosphere/bin/dcos-metrics-datadog -L https://downloads.mesosphere.io/dcos-metrics/plugins/datadog
    chmod +x /opt/mesosphere/bin/dcos-metrics-datadog
    echo "[Datadog Plugin] Downloaded dcos datadog metrics plugin."
    export DD_API_KEY=$DDAPI
    export AGENT_ROLE=$role
    sudo curl -s -o /etc/systemd/system/dcos-metrics-datadog.service https://downloads.mesosphere.io/dcos-metrics/plugins/datadog.service
    echo "[Datadog Plugin] Downloaded dcos-metrics-datadog.service."
    sudo sed -i "s/--dcos-role master/--dcos-role \$AGENT_ROLE/g;s/--datadog-key .*/--datadog-key \$DD_API_KEY/g" /etc/systemd/system/dcos-metrics-datadog.service
    echo "[Datadog Plugin] Updated dcos-metrics-datadog.service with DD API Key and agent role."
    sudo systemctl daemon-reload
    sudo systemctl start dcos-metrics-datadog.service
    echo "[Datadog Plugin] dcos-metrics-datadog.service is started !"
    servStatus=\$(sudo systemctl is-failed dcos-metrics-datadog.service)
    echo "[Datadog Plugin] dcos-metrics-datadog.service status : \${servStatus}"
    #sudo systemctl status dcos-metrics-datadog.service | head -3
    #sudo journalctl -u dcos-metrics-datadog
    EOF
    done
    
    echo "[Datadog Plugin] Temp script for master saved at : $tmpScriptMaster"
    echo "[Datadog Plugin] Temp script for agent saved at : $tmpScriptAgent"
    
    for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'` 
    do 
        echo -e "\n###> Node - $i"
        dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptAgent
        echo -e "======================================================="
    done
    
    for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'` 
    do 
        echo -e "\n###> Master Node - $i"
        dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptMaster
        echo -e "======================================================="
    done
    
    # Check status of dcos-metrics-datadog.service on all nodes.
    #for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'` ; do  echo -e "\n###> $i"; dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "sudo systemctl is-failed dcos-metrics-datadog.service"; echo -e "======================================\n"; done

    Get app / node metrics fetched by dcos-metrics component using metrics API

    • Get DC/OS node id [dcos node]
    • Get Node metrics (CPU, memory, local filesystems, networks, etc) :  <dc os_cluster_url=””>/system/v1/agent/<agent_id>/metrics/v0/node</agent_id></dc>
    • Get id of all containers running on that agent : <dc os_cluster_url=””>/system/v1/agent/<agent_id>/metrics/v0/containers</agent_id></dc>
    • Get Resource allocation and usage for the given container ID. : <dc os_cluster_url=””>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id></container_id></agent_id></dc>
    • Get Application-level metrics from the container (shipped in StatsD format using the listener available at STATSD_UDP_HOST and STATSD_UDP_PORT) : <dc os_cluster_url=””>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>/app     </container_id></agent_id></dc>

    Get app / node metrics fetched by dcos-metrics component using dcos cli

    • Summary of container metrics for a specific task
    dcos task metrics summary <task-id>

    • All metrics in details for a specific task
    dcos task metrics details <task-id>

    • Summary of Node metrics for a specific node
    dcos task metrics summary <mesos-node-id>

    • All Node metrics in details for a specific node
    dcos node metrics details <mesos-node-id>

    NOTE – All above commands have ‘–json’ flag to use them programmatically.  

    Launch / run command inside container for a task

    DC/OS task exec cli only supports Mesos containers, this script supports both Mesos and Docker containers.

    #!/bin/bash
    
    echo "DCOS Task Exec 2.0"
    if [ "$#" -eq 0 ]; then
            echo "Need task name or id as input. Exiting."
            exit 1
    fi
    taskName=$1
    taskCmd=${2:-bash}
    TMP_TASKLIST_JSON=/tmp/dcostasklist.json
    dcos task --json > $TMP_TASKLIST_JSON
    taskExist=`cat /tmp/dcostasklist.json | jq --arg tname $taskName '.[] | if(.name == $tname ) then .name else empty end' -r | wc -l`
    if [[ $taskExist -eq 0 ]]; then 
            echo "No task with name $taskName exists."
            echo "Do you mean ?"
            dcos task | grep $taskName | awk '{print $1}'
            exit 1
    fi
    taskType=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '[.[] | select(.name == $tname)][0] | .container.type' -r`
    TaskId=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '[.[] | select(.name == $tname)][0] | .id' -r`
    if [[ $taskExist -ne 1 ]]; then
            echo -e "More than one instances. Please select task ID for executing command.n"
            #allTaskIds=$(dcos task $taskName | tee /dev/tty | grep -v "NAME" | awk '{print $5}' | paste -s -d",")
            echo ""
            read TaskId
    fi
    if [[ $taskType !=  "DOCKER" ]]; then
            echo "Task [ $taskName ] is of type MESOS Container."
            execCmd="dcos task exec --interactive --tty $TaskId $taskCmd"
            echo "Running [$execCmd]"
            $execCmd
    else
            echo "Task [ $taskName ] is of type DOCKER Container."
            taskNodeIP=`dcos task $TaskId | awk 'FNR == 2 {print $2}'`
            echo "Task [ $taskName ] with task Id [ $TaskId ] is running on node [ $taskNodeIP ]."
            taskContID=`dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --private-ip=$taskNodeIP --master-proxy "docker ps -q --filter "label=MESOS_TASK_ID=$TaskId"" 2> /dev/null`
            taskContID=`echo $taskContID | tr -d 'r'`
            echo "Task Docker Container ID : [ $taskContID ]"
            echo "Running [ docker exec -it $taskContID $taskCmd ]"
            dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --private-ip=$taskNodeIP --master-proxy "docker exec -it $taskContID $taskCmd" 2>/dev/null
    fi

    Get DC/OS tasks by node

    #!/bin/bash 
    
    function tasksByNodeAPI
    {
        echo "DC/OS Tasks By Node"
        if [ "$#" -eq 0 ]; then
            echo "Need node ip as input. Exiting."
            exit 1
        fi
        nodeIp=$1
        mesosId=`dcos node | grep $nodeIp | awk '{print $3}'`
        if [ -z "mesosId" ]; then
            echo "No node found with ip $nodeIp. Exiting."
            exit 1
        fi
        curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/mesos/tasks?limit=10000" | jq --arg mesosId $mesosId '.tasks[] | select (.slave_id == $mesosId and .state == "TASK_RUNNING") | .name + "ttt" + .id'  -r
    }
    
    function tasksByNodeCLI
    {
            echo "DC/OS Tasks By Node"
            if [ "$#" -eq 0 ]; then
                    echo "Need node ip as input. Exiting."
                    exit 1
            fi
            nodeIp=$1
            dcos task | egrep "HOST|$nodeIp"
    }

    Get cluster metadata – cluster Public IP and cluster ID

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)"           
    $(dcos config show core.dcos_url)/metadata 

    Sample Output:

    {
    "PUBLIC_IPV4": "123.456.789.012",
    "CLUSTER_ID": "abcde-abcde-abcde-abcde-abcde-abcde"
    }

    Get DC/OS metadata – DC/OS version

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)"
    $(dcos config show core.dcos_url)/dcos-metadata/dcos-version.jsonq  

    Sample Output:

    {
    "version": "1.11.0",
    "dcos-image-commit": "b6d6ad4722600877fde2860122f870031d109da3",
    "bootstrap-id": "a0654657903fb68dff60f6e522a7f241c1bfbf0f"
    }

    Get Mesos version

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)"
    $(dcos config show core.dcos_url)/mesos/version

    Sample Output:

    {
    "build_date": "2018-02-27 21:31:27",
    "build_time": 1519767087.0,
    "build_user": "",
    "git_sha": "0ba40f86759307cefab1c8702724debe87007bb0",
    "version": "1.5.0"
    }

    Access DC/OS cluster exhibitor UI (Exhibitor supervises ZooKeeper and provides a management web interface)

    <CLUSTER_URL>/exhibitor

    Access DC/OS cluster data from cluster zookeeper using Zookeeper Python client – Run inside any node / container

    from kazoo.client import KazooClient
    
    zk = KazooClient(hosts='leader.mesos:2181', read_only=True)
    zk.start()
    
    clusterId = ""
    # Here we can give znode path to retrieve its decoded data,
    # for ex to get cluster-id, use
    # data, stat = zk.get("/cluster-id")
    # clusterId = data.decode("utf-8")
    
    # Get cluster Id
    if zk.exists("/cluster-id"):
        data, stat = zk.get("/cluster-id")
        clusterId = data.decode("utf-8")
    
    zk.stop()
    
    print (clusterId)

    Access dcos cluster data from cluster zookeeper using exhibitor rest API

    # Get znode data using endpoint :
    # /exhibitor/exhibitor/v1/explorer/node-data?key=/path/to/node
    # Example : Get znode data for path = /cluster-id
    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)"
    $(dcos config show core.dcos_url)/exhibitor/exhibitor/v1/explorer/node-data?key=/cluster-id

    Sample Output:

    {
    "bytes": "3333-XXXXXX",
    "str": "abcde-abcde-abcde-abcde-abcde-",
    "stat": "XXXXXX"
    }

    Get cluster name using Mesos API

    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)"
    $(dcos config show core.dcos_url)/mesos/state-summary | jq .cluster -r

    Mark Mesos node as decommissioned

    Some times instances which are running as DC/OS node gets terminated and can not come back online, like AWS EC2 instances, once terminated due to any reason, can not start back. When Mesos detects that a node has stopped, it puts the node in the UNREACHABLE state because Mesos does not know if the node is temporarily stopped and will come back online, or if it is permanently stopped. In such case, we can explicitly tell Mesos to put a node in the GONE state if we know a node will not come back.

    dcos node decommission <mesos-agent-id>

    Conclusion

    We learned about Mesosphere DC/OS, its functionality and roles. We also learned how to setup and use DC/OS cli and use http authentication to access DC/OS APIs as well as using DC/OS cli for automating tasks.

    We went through different API endpoints like Mesos, Marathon, DC/OS metrics, exhibitor, DC/OS cluster organization etc. Finally, we looked at different tricks and scripts to automate DC/OS, like DC/OS node details, task exec, Docker report, DC/OS API http authentication etc.

  • Enable Real-time Functionality in Your App with GraphQL and Pusher

    The most recognized solution for real-time problems is WebSockets (WS), where there is a persistent connection between the client and the server, and either can start sending data at any time. One of the latest implementations of WS is GraphQL subscriptions.

    With GraphQL subscriptions, you can easily add real-time functionalities to your application. There is an easy and standard way to implement a subscription in the GraphQL app. The client just has to make a subscription query to the server, which specifies the event and the data shape. With this query, the client establishes a long-lived connection with the server on which it listens to specific events. Just as how GraphQL solves the over-fetching problem in the REST API, a subscription continues to extend the solution for real-time.

    In this post, we will learn how to bring real-time functionality to your app by implementing GraphQL subscriptions with Pusher to manage Pub/Sub capabilities. The goal is to configure a Pusher channel and implement two subscriptions to be exposed by your GraphQL server. We will be implementing this in a Node.js runtime environment.

    Why Pusher?

    Why are we doing this using Pusher? 

    • Pusher, being a hosted real-time services provider, relieves us from managing our own real-time infrastructure, which is a highly complex problem.
    • Pusher provides an easy and consistent API.
    • Pusher also provides an entire set of tools to monitor and debug your realtime events.
    • Events can be triggered by and consumed easily from different applications written in different frameworks.

    Project Setup

    We will start with a repository that contains a codebase for a simple GraphQL backend in Node.js, which is a minimal representation of a blog post application. The entities included are:

    1. Link – Represents an URL and a small description for the Link
    2. User – Link belongs to User
    3. Vote – Represents users vote for a Link

    In this application, a User can sign up and add or vote a Link in the application, and other users can upvote the Link. The database schema is built using Prisma and SQLite for quick bootstrapping. In the backend, we will use graphql-yoga as the GraphQL server implementation. To test our GraphQL backend, we will use the graphql-playground by Prisma, as a client, which will perform all queries and mutations on the server.

    To set up the application:

    1. Clone the repository here
    2. Install all dependencies using 
    npm install

    1. Set up a database using prisma-cli with following commands
    npx prisma migrate save --experimental
    #! Select ‘yes’ for the prompt to add an SQLite db after this command and enter a name for the migration. 
    npx prisma migrate up --experimental
    npx prisma generate

    Note: Migrations are experimental features of the Prisma ORM, but you can ignore them because you can have a different backend setup for DB interactions. The purpose of using Prisma here is to quickly set up the project and dive into subscriptions.

    A new directory, named Prisma, will be created containing the schema and database in SQLite. Now, you have your database and app set up and ready to use.

    To start the Node.js application, execute the command:

    npm start

    Navigate to http://localhost:4000 to see the graphql-playground where we will execute our queries and mutations.

    Our next task is to add a GraphQL subscription to our server to allow clients to listen to the following events:

    • A new Link is created
    • A Link is upvoted

    To add subscriptions, we will need an npm package called graphql-pusher-subscriptions to help us interact with the Pusher service from within the GraphQL resolvers. The module will trigger events and listen to events for a channel from the Pusher service.

    Before that, let’s first create a channel in Pusher. To configure a Pusher channel, head to their website at Pusher to create an account. Then, go to your dashboard and create a channels application. Choose a name, the cluster closest to your location, and frontend tech as React and backend tech as Node.js.

    You will receive the following code to start.

    Now, we add the graphql-pusher-subscription package. This package will take the Pusher channel configuration and give you an API to trigger and listen to events published on the channel.

    Now, we import the package in the src/index.js file.

    const { PusherChannel } = require('graphql-pusher-subscriptions');

    After the PusherChannel class provided by the module accepts a configuration for the channel, we need to instantiate the class and create a reference Pub/Sub to the object. We give the Pusher config object given while creating the channel.

    const pubsub = new PusherChannel({
      appId: '1046878',
      key: '3c84229419ed7b47e5b0',
      secret: 'e86868a98a2f052981a6',
      cluster: 'ap2',
      encrypted: true,
      channel: 'graphql-subscription'
    });

    Now, we add “pubsub” to the context so that it is available to all the resolvers. The channel field tells the client which channel to subscribe to. Here we have the channel “graphql-subscription”.

    const server = new GraphQLServer({
      typeDefs: './src/schema.graphql',
      resolvers,
      context: request => {
        return {
          ...request,
          prisma,
          pubsub
        }
      },
    })

    The above part enables us to access the methods we need to implement our subscriptions from inside our resolvers via context.pubsub.

    Subscribing to Link-created Event

    The first step to add a subscription is to extend the GraphQL schema definition.

    type Subscription {
      newLink: Link
    }

    Next, we implement the resolver for the “newLink” subscription type field. It is important to note that resolvers for subscriptions are different from queries and mutations in minor ways.

    1. They return an AsyncIterator instead of data, which is then used by a GraphQL server to publish the event payload to the subscriber client.

    2. The subscription resolvers are provided as a value of the resolve field inside an object. The object should also contain another field named “resolve” that returns the payload data from the data emitted by AsyncIterator.

    To add the resolvers for the subscription, we start by adding a new file called Subscriptions.js

    Inside the project directory, add the file as src/resolvers/Subscription.js

    Now, in the new file created, add the following code, which will be the subscription resolver for the “newLink” type we created in GraphQL schema.

    function newLinkSubscribe(parent, args, context, info) {
      return context.pubsub.asyncIterator("NEW_LINK")
    }
    
    const newLink = {
      subscribe: newLinkSubscribe,
      resolve: payload => {
        return payload
      },
    }
    
    module.exports = {
      newLink,
    }
    view raw

    In the code above, the subscription resolver function, newLinkSubscribe, is added as a field value to the property subscribe just as we described before. The context provides reference to the Pub/Sub object, which lets us use the asyncIterator() with “NEW_LINK” as a parameter. This function resolves subscriptions and publishes events.

    Adding Subscriptions to Your Resolvers

    The final step for our subscription implementation is to call the function above inside of a resolver. We add the following call to pubsub.publish() inside the post resolver function inside Mutation.js file.

    function post(parent, args, context, info) {
      const userId = getUserId(context)
      const newLink = await context.prisma.link.create({
        data: {
          url: args.url,
          description: args.description,
          postedBy: { connect: { id: userId } },
        }
      })
      context.pubsub.publish("NEW_LINK", newLink)
      return newLink
    }

    In the code above, we can see that we pass the same string “NEW_LINK” to the publish method as we did in the newLinkSubscribe function in the subscription function before. The “NEW_LINK” is the event name, and it will publish events to the Pusher service, and the same name will be used on the subscription resolver to bind to the particular event name. We also add the newLink as a second argument, which contains the data part for the event that will be published. The context.pubsub.publish function will be triggered before returning the newLink data.

    Now, we will update the main resolver object, which is given to the GraphQL server.

    First, import the subscription module inside of the index.js file.

    const Subscription = require('./resolvers/Subscription') 
    const resolvers = {
      Query,
      Mutation,
      Subscription,
      User,
      Link,
    }

    Now, with all code in place, we start testing our real time API. We will use multiple instances/tabs of GraphQL playground concurrently.

    Testing Subscriptions

    If your server is already running, then kill it with CTRL+C and restart with this command:

    npm start

    Next, open the browser and navigate to http://localhost:4000 to see the GraphQL playground. We will use one tab of the playground to perform the mutation to trigger the event to Pusher and invoke the subscriber.

    We will now start to execute the queries to add some entities in the application.

    First, let’s create a user in the application by using the signup mutation. We send the following mutation to the server to create a new User entity.

    mutation {
        signup(
        name: "Alice"
        email: "alice@prisma.io"
        password: "graphql"
      ) {
        token
        user {
          Id
        }
      }
    }

    You will see a response in the playground that contains the authentication token for the user. Copy the token, and open another tab in the playground. Inside that new tab, open the HTTP_HEADERS section in the bottom and add the Authorization header.

    Replace the __TOKEN__  placeholder from the below snippet with the copied token from above.

    {
      "Authorization": "Bearer __TOKEN__"
    }

    Now, all the queries or mutations executed from that tab will carry the authentication token. With this in place, we sent the following mutation to our GraphQL server.

    mutation {
    post(
        url: "http://velotio.com"
        description: "An awesome GraphQL blog"
      ) {
        id
      }
    }

    The mutations above create a Link entity inside the application. Now that we have created an entity, we now move to test the subscription part. In another tab, we will send the subscription query and create a persistent WebSocket connection to the server. Before firing out a subscription query, let us first understand the syntax of it. It starts with the keyword subscription followed by the subscription name. The subscription query is defined in the GraphQL schema and shows the data shape we can resolve to. Here, we want to subscribe to a newLink subscription name, and the data resolved by it consists of that of a Link entity. That means we can resolve any specific part of the Link entity. Here, we are asking for attributes like id, URL, description, and nested attributes of the postedBy field.

    subscription {
      newLink {
          id
          url
          description
          postedBy {
            id
            name
            email
          }
      }
    }

    The response of this operation is different from that of a mutation or query. You see a loading spinner, which indicates that it is waiting for an event to happen. This means the GraphQL client (playground) has established a connection with the server and is listening for response data.

    Before triggering a subscription, we will also keep an eye on the Pusher channel for events triggered to verify that our Pusher service is integrated successfully.

    To do this, we go to Pusher dasboard and navigate to the channel app we created and click on the debug console. The debug console will show us the events triggered in real-time.

    Now that the Pusher dashboard is visible, we will trigger the subscription event by running the following mutation inside a new Playground tab.

    mutation {
      post(
        url: "www.velotio.com"
        description: "Graphql remote schema stitching"
      ) {
        id
      }
    }

    Now, we observe the Playground where subscription was running.

    We can see that the newly created Link is visible in the response section, and the subscription continues to listen, and the event has reached the Pusher service.

    You will observe an event on the Pusher console that is the same event and data as sent by your post mutation.

     

    We have achieved our first goal, i.e., we have integrated the Pusher channel and implemented a subscription for a Link creation event.

    To achieve our second goal, i.e., to listen to Vote events, we repeat the same steps as we did for the Link subscription.

    We add a subscription resolver for Vote in the Subscriptions.js file and update the Subscription type in the GraphQL schema. To trigger a different event, we use “NEW_VOTE” as the event name and add the publish function inside the resolver for Vote mutation.

    function newVoteSubscribe(parent, args, context, info) {
      return context.pubsub.asyncIterator("NEW_VOTE")
    }
    
    const newVote = {
      subscribe: newVoteSubscribe,
      resolve: payload => {
        return payload
      },
    }
    view raw

    Update the export statement to add the newVote resolver.

    module.exports = {
      newLink,
      newVote,
    }

    Update the Vote mutation to add the publish call before returning the newVote data. Notice that the first parameter, “NEW_VOTE”, is being passed so that the listener can bind to the new event with that name.

    const newVote = context.prisma.vote.create({
        data: {
          user: { connect: { id: userId } },
          link: { connect: { id: Number(args.linkId) } },
        }
      })
      context.pubsub.publish("NEW_VOTE", newVote)
      return newVote
    }

    Now, restart the server and complete the signup process with setting HTTP_HEADERS as we did before. Add the following subscription to a new Playground tab.

    subscription {
      newVote {
        id
        link {
          url
          description
        }
        user {
          name
          email
        }
      }
    }

    In another Playground tab, send the following Vote mutation to the server to trigger the event, but do not forget to verify the Authorization header. The below mutation will add the Vote of the user to the Link.  Replace the “__LINK_ID__” with the linkId generated in the previous post mutation.

    mutation {
      vote(linkId: "__LINK_ID__") {
        link {
          url
          description
        }
        user {
          name
          email
        }
      }
    }

    Observe the event data on the response tab of the vote subscription. Also, you can check your event triggered on the pusher dashboard.

    The final codebase is available on a branch named with-subscription.

    Conclusion

    By following the steps above, we saw how easy it is to add real-time features to GraphQL apps with subscriptions. Also, establishing a connection with the server is no hassle, and it is much similar to how we implement the queries and mutations. Unlike the mainstream approach, where one has to build and manage the event handlers, the GraphQL subscriptions come with these features built-in for the client and server. Also, we saw how we can use a managed real-time service like Pusher can be for Pub/Sub events. Both GraphQL and Pusher can prove to be a solid combination for a reliable real-time system.

    Related Articles

    1. Build and Deploy a Real-Time React App Using AWS Amplify and GraphQL

    2. Scalable Real-time Communication With Pusher

  • Your Complete Guide to Building Stateless Bots Using Rasa Stack

    This blog aims at exploring the Rasa Stack to create a stateless chat-bot. We will look into how, the recently released Rasa Core, which provides machine learning based dialogue management, helps in maintaining the context of conversations using machine learning in an efficient way.

    If you have developed chatbots, you would know how hopelessly bots fail in maintaining the context once complex use-cases need to be developed. There are some home-grown approaches that people currently use to build stateful bots. The most naive approach is to create the state machines where you create different states and based on some logic take actions. As the number of states increases, more levels of nested logic are required or there is a need to add an extra state to the state machine, with another set of rules for how to get in and out of that state. Both of these approaches lead to fragile code that is harder to maintain and update. Anyone who’s built and debugged a moderately complex bot knows this pain.

    After building many chatbots, we have experienced that flowcharts are useful for doing the initial design of a bot and describing a few of the known conversation paths, but we shouldn’t hard-code a bunch of rules since this approach doesn’t scale beyond simple conversations.

    Thanks to the Rasa guys who provided a way to go stateless where scaling is not at all a problem. Let’s build a bot using Rasa Core and learn more about this.

    Rasa Core: Getting Rid of State Machines

    The main idea behind Rasa Core is that thinking of conversations as a flowchart and implementing them as a state machine doesn’t scale. It’s very hard to reason about all possible conversations explicitly, but it’s very easy to tell, mid-conversation, if a response is right or wrong. For example, let’s consider a term insurance purchase bot, where you have defined different states to take different actions. Below diagram shows an example state machine:

    Let’s consider a sample conversation where a user wants to compare two policies listed by policy_search state.

    In above conversation, it can be compared very easily by adding some logic around the intent campare_policies. But real life is not so easy, as a majority of conversations are edge cases. We need to add rules manually to handle such cases, and after testing we realize that these clash with other rules we wrote earlier.

    Rasa guys figured out how machine learning can be used to solve this problem. They have released Rasa Core where the logic of the bot is based on a probabilistic model trained on real conversations.

    Structure of a Rasa Core App

    Let’s understand few terminologies we need to know to build a Rasa Core app:

    1. Interpreter: An interpreter is responsible for parsing messages. It performs the Natural Language Understanding and transforms the message into structured output i.e. intent and entities. In this blog, we are using Rasa NLU model as an interpreter. Rasa NLU comes under the Rasa Stack. In Training section, it is shown in detail how to prepare the training data and create a model.

    2. Domain: To define a domain we create a domain.yml file, which defines the universe of your bot. Following things need to be defined in a domain file:

    • Intents: Things we expect the user to say. It is more related to Rasa NLU.
    • Entities: These represent pieces of information extracted what user said. It is also related to Rasa NLU.
    • Templates: We define some template strings which our bot can say. The format for defining a template string is utter_<intent>. These are considered as actions which bot can take.
    • Actions: List of things bot can do and say. There are two types of actions we define one those which will only utter message (Templates) and others some customised actions where some required logic is defined. Customised actions are defined as Python classes and are referenced in domain file.
    • Slots: These are user-defined variables which need to be tracked in a conversation. For e.g to buy a term insurance we need to keep track of what policy user selects and details of the user, so all these details will come under slots.

    3. Stories: In stories, we define what bot needs to do at what point in time. Based on these stories, a probabilistic model is generated which is used to decide which action to be taken next. There are two ways in which stories can be created which are explained in next section.

    Let’s combine all these pieces together. When a message arrives in a Rasa Core app initially, interpreter transforms the message into structured output i.e. intents and entities. The Tracker is the object which keeps track of conversation state. It receives the info that a new message has come in. Then based on dialog model we generate using domain and stories policy chooses which action to take next. The chosen action is logged by the tracker and response is sent back to the user.

    Training and Running A Sample Bot

    We will create a simple Facebook chat-bot named Secure Life which assists you in buying term life insurance. To keep the example simple, we have restricted options such as age-group, term insurance amount, etc.

    There are two models we need to train in the Rasa Core app:

    Rasa NLU model based on which messages will be processed and converted to a structured form of intent and entities. Create following two files to generate the model:

    data.json: Create this training file using the rasa-nlu trainer. Click here to know more about the rasa-nlu trainer.

    nlu_config.json: This is the configuration file.

    {
    "pipeline": "spacy_sklearn",
    "path" : "./models",
    "project": "nlu",
    "data" : "./data/data.md"
    }

    Run below command to train the rasa-nlu model:-

    $ python -m rasa_nlu.train -c nlu_model_config.json --fixed_model_name current

    Dialogue Model: This model is trained on stories we define, based on which the policy will take the action. There are two ways in which stories can be generated:

    • Supervised Learning: In this type of learning we will create the stories by hand, writing them directly in a file. It is easy to write but in case of complex use-cases it is difficult to cover all scenarios.
    • Reinforcement Learning: The user provides feedback on every decision taken by the policy. This is also known as interactive learning. This helps in including edge cases which are difficult to create by hand. You must be thinking how it works? Every time when a policy chooses an action to take, it is asked from the user whether the chosen action is correct or not. If the action taken is wrong, you can correct the action on the fly and store the stories to train the model again.

    Since the example is simple, we have used supervised learning method, to generate the dialogue model. Below is the stories.md file.

    ## All yes
    * greet
    - utter_greet
    * affirm
    - utter_very_much_so
    * affirm
    - utter_gender
    * gender
    - utter_coverage_duration
    - action_gender
    * affirm
    - utter_nicotine
    * affirm
    - action_nicotine
    * age
    - action_thanks
    
    ## User not interested
    * greet
    - utter_greet
    * deny
    - utter_decline
    
    ## Coverage duration is not sufficient
    * greet
    - utter_greet
    * affirm
    - utter_very_much_so
    * affirm
    - utter_gender
    * gender
    - utter_coverage_duration
    - action_gender
    * deny
    - utter_decline

    Run below command to train dialogue model :

    $ python -m rasa_core.train -s <path to stories.md file> -d <path to domain.yml> -o models/dialogue --epochs 300

    Define a Domain: Create domain.yml file containing all the required information. Among the intents and entities write all those strings which bot is supposed to see when user say something i.e. intents and entities you defined in rasa NLU training file.

    intents:
    - greet
    - goodbye
    - affirm
    - deny
    - age
    - gender
    
    slots:
    gender:
    type: text
    nicotine:
    type: text
    agegroup:
    type: text
    
    templates:
    utter_greet:
    - "hey there! welcome to Secure-Life!\nI can help you quickly estimate your rate of coverage.\nWould you like to do that ?"
    
    utter_very_much_so:
    - "Great! Let's get started.\nWe currently offer term plans of Rs. 1Cr. Does that suit your need?"
    
    utter_gender:
    - "What gender do you go by ?"
    
    utter_coverage_duration:
    - "We offer this term plan for a duration of 30Y. Do you think that's enough to cover entire timeframe of your financial obligations ?"
    
    utter_nicotine:
    - "Do you consume nicotine-containing products?"
    
    utter_age:
    - "And lastly, how old are you ?"
    
    utter_thanks:
    - "Thank you for providing all the info. Let me calculate the insurance premium based on your inputs."
    
    utter_decline:
    - "Sad to see you go. In case you change your plans, you know where to find me :-)"
    
    utter_goodbye:
    - "goodbye :("
    
    actions:
    - utter_greet
    - utter_goodbye
    - utter_very_much_so
    - utter_coverage_duration
    - utter_age
    - utter_nicotine
    - utter_gender
    - utter_decline
    - utter_thanks
    - actions.ActionGender
    - actions.ActionNicotine
    - actions.ActionThanks

    Define Actions: Templates defined in domain.yml also considered as actions. A sample customized action is shown below where we are setting a slot named gender with values according to the option selected by the user.

    from rasa_core.actions.action import Action
    from rasa_core.events import SlotSet
    
    class ActionGender(Action):
    def name(self):
    return 'action_gender'
    def run(self, dispatcher, tracker, domain):
    messageObtained = tracker.latest_message.text.lower()
    
    if ("male" in messageObtained):
    return [SlotSet("gender", "male")]
    elif ("female" in messageObtained):
    return [SlotSet("gender", "female")]
    else:
    return [SlotSet("gender", "others")]

    Running the Bot

    Create a Facebook app and get the app credentials. Create a bot.py file as shown below:

    from rasa_core import utils
    from rasa_core.agent import Agent
    from rasa_core.interpreter import RasaNLUInterpreter
    from rasa_core.channels import HttpInputChannel
    from rasa_core.channels.facebook import FacebookInput
    
    logger = logging.getLogger(__name__)
    
    def run(serve_forever=True):
    # create rasa NLU interpreter
    interpreter = RasaNLUInterpreter("models/nlu/current")
    agent = Agent.load("models/dialogue", interpreter=interpreter)
    
    input_channel = FacebookInput(
    fb_verify="your_fb_verify_token", # you need tell facebook this token, to confirm your URL
    fb_secret="your_app_secret", # your app secret
    fb_tokens={"your_page_id": "your_page_token"}, # page ids + tokens you subscribed to
    debug_mode=True # enable debug mode for underlying fb library
    )
    
    if serve_forever:
    agent.handle_channel(HttpInputChannel(5004, "/app", input_channel))
    return agent
    
    if __name__ == '__main__':
    utils.configure_colored_logging(loglevel="DEBUG")
    run()

    Run the file and your bot is ready to test. Sample conversations are provided below:

    Summary

    You have seen how Rasa Core has made it easier to build bots. Just create few files and boom! Your bot is ready! Isn’t it exciting? I hope this blog provided you some insights on how Rasa Core works. Start exploring and let us know if you need any help in building chatbots using Rasa Core.

  • Building a Progressive Web Application in React [With Live Code Examples]

    What is PWA:

    A Progressive Web Application or PWA is a web application that is built to look and behave like native apps, operates offline-first, is optimized for a variety of viewports ranging from mobile, tablets to FHD desktop monitors and more. PWAs are built using front-end technologies such as HTML, CSS and JavaScript and bring native-like user experience to the web platform. PWAs can also be installed on devices just like native apps.

    For an application to be classified as a PWA, it must tick all of these boxes:

    • PWAs must implement service workers. Service workers act as a proxy between the web browsers and API servers. This allows web apps to manage and cache network requests and assets
    • PWAs must be served over a secure network, i.e. the application must be served over HTTPS
    • PWAs must have a web manifest definition, which is a JSON file that provides basic information about the PWA, such as name, different icons, look and feel of the app, splash screen, version of the app, description, author, etc

    Why build a PWA?

    Businesses and engineering teams should consider building a progressive web app instead of a traditional web app. Here are some of the most prominent arguments in favor of PWAs:

    • PWAs are responsive. The mobile-first design approach enables PWAs to support a variety of viewports and orientation
    • PWAs can work on slow Internet or no Internet environment. App developers can choose how a PWA will behave when there’s no Internet connectivity, whereas traditional web apps or websites simply stop working without an active Internet connection
    • PWAs are secure because they are always served over HTTPs
    • PWAs can be installed on the home screen, making the application more accessible
    • PWAs bring in rich features, such as push notification, application updates and more

    PWA and React

    There are various ways to build a progressive web application. One can just use Vanilla JS, HTML and CSS or pick up a framework or library. Some of the popular choices in 2020 are Ionic, Vue, Angular, Polymer, and of course React, which happens to be my favorite front-end library.

    Building PWAs with React

    To get started, let’s create a PWA which lists all the users in a system.

    npm init react-app users
    cd users
    yarn add react-router-dom
    yarn run start

    Next, we will replace the default App.js file with our own implementation.

    import React from "react";
    import { BrowserRouter, Route } from "react-router-dom";
    import "./App.css";
    const Users = () => {
     // state
     const [users, setUsers] = React.useState([]);
     // effects
     React.useEffect(() => {
       fetch("https://jsonplaceholder.typicode.com/users")
         .then((res) => res.json())
         .then((users) => {
           setUsers(users);
         })
         .catch((err) => {});
     }, []);
     // render
     return (
       <div>
         <h2>Users</h2>
         <ul>
           {users.map((user) => (
             <li key={user.id}>
               {user.name} ({user.email})
             </li>
           ))}
         </ul>
       </div>
     );
    };
    const App = () => (
     <BrowserRouter>
       <Route path="/" exact component={Users} />
     </BrowserRouter>
    );
    export default App;

    This displays a list of users fetched from the server.

    Let’s also remove the logo.svg file inside the src directory and truncate the App.css file that is populated as a part of the boilerplate code.

    To make this app a PWA, we need to follow these steps:

    1. Register service worker

    • In the file /src/index.js, replace serviceWorker.unregister() with serviceWorker.register().
    import React from 'react';
    import ReactDOM from 'react-dom';
    import './index.css';
    import App from './App';
    import * as serviceWorker from './serviceWorker';
    ReactDOM.render(
     <React.StrictMode>
       <App />
     </React.StrictMode>,
     document.getElementById('root')
    );
    serviceWorker.register();

    • The default behavior here is to not set up a service worker, i.e. the CRA boilerplate allows the users to opt-in for the offline-first experience.

    2. Update the manifest file

    • The CRA boilerplate provides a manifest file out of the box. This file is located at /public/manifest.json and needs to be modified to include the name of the PWA, description, splash screen configuration and much more. You can read more about available configuration options in the manifest file here.

    Our modified manifest file looks like this:

    {
     "short_name": "User Mgmt.",
     "name": "User Management",
     "icons": [
       {
         "src": "favicon.ico",
         "sizes": "64x64 32x32 24x24 16x16",
         "type": "image/x-icon"
       },
       {
         "src": "logo192.png",
         "type": "image/png",
         "sizes": "192x192"
       },
       {
         "src": "logo512.png",
         "type": "image/png",
         "sizes": "512x512"
       }
     ],
     "start_url": ".",
     "display": "standalone",
     "theme_color": "#aaffaa",
     "background_color": "#ffffff"
    }

    PWA Splash Screen

    Here the display mode selected is “standalone” which tells the web browsers to give this PWA the same look and feel as that of a standalone app. Other display options include, “browser,” which is the default mode and launches the PWA like a traditional web app and “fullscreen,” which opens the PWA in fullscreen mode – hiding all other elements such as navigation, the address bar and the status bar.

    The manifest can be inspected using Chrome dev tools > Application tab > Manifest.

    1. Test the PWA:

    • To test a progressive web app, build it completely first. This is because PWA features, such as caching aren’t enabled while running the app in dev mode to ensure hassle-free development  
    • Create a production build with: npm run build
    • Change into the build directory: cd build
    • Host the app locally: http-server or python3 -m http.server 8080
    • Test the application by logging in to http://localhost:8080

    2. Audit the PWA: If you are testing the app for the first time on a desktop or laptop browser, PWA may look like just another website. To test and audit various aspects of the PWA, let’s use Lighthouse, which is a tool built by Google specifically for this purpose.

    PWA on mobile

    At this point, we already have a simple PWA which can be published on the Internet and made available to billions of devices. Now let’s try to enhance the app by improving its offline viewing experience.

    1. Offline indication: Since service workers can operate without the Internet as well, let’s add an offline indicator banner to let users know the current state of the application. We will use navigator.onLine along with the “online” and “offline” window events to detect the connection status.

     // state
      const [offline, setOffline] = React.useState(false);
      // effects
      React.useEffect(() => {
        window.addEventListener("offline", offlineListener);
        return () => {
          window.removeEventListener("offline", offlineListener);
        };
      }, []);
      
      {/* add to jsx */}
      {offline ? (
        <div className="banner-offline">The app is currently offline</div>
      ) : null}

    The easiest way to test this is to just turn off the Wi-Fi on your dev machine. Chrome dev tools also provide an option to test this without actually going offline. Head over to Dev tools > Network and then select “Offline” from the dropdown in the top section. This should bring up the banner when the app is offline.

    2. Let’s cache a network request using service worker

    CRA comes with its own service-worker.js file which caches all static assets such as JavaScript and CSS files that are a part of the application bundle. To put custom logic into the service worker, let’s create a new file called ‘custom-service-worker.js’ and combine the two.

    • Install react-app-rewired and update package.json:
    1. yarn add react-app-rewired
    2. Update the package.json as follows:
    "scripts": {
       "start": "react-app-rewired start",
       "build": "react-app-rewired build",
       "test": "react-app-rewired test",
       "eject": "react-app-rewired eject"
    },

    • Create a config file to override how CRA generates service workers and inject our custom service worker, i.e. combine the two service worker files.
    const WorkboxWebpackPlugin = require("workbox-webpack-plugin");
    module.exports = function override(config, env) {
      config.plugins = config.plugins.map((plugin) => {
        if (plugin.constructor.name === "GenerateSW") {
          return new WorkboxWebpackPlugin.InjectManifest({
           swSrc: "./src/service-worker-custom.js",
           swDest: "service-worker.js"
          });
        }
        return plugin;
      });
      return config;
    };

    • Create service-worker-custom.js file and cache network request in there:
    workbox.skipWaiting();
    workbox.clientsClaim();
    workbox.routing.registerRoute(
      new RegExp("/users"),
      workbox.strategies.NetworkFirst()
    );
    workbox.precaching.precacheAndRoute(self.__precacheManifest || [])

    Your app should now work correctly in the offline mode.

    Distributing and publishing a PWA

    PWAs can be published just like any other website and only have one additional requirement, i.e. it must be served over HTTPs. When a user visits PWA from mobile or tablet, a pop-up is displayed asking the user if they’d like to install the app to their home screen.

    Conclusion

    Building PWAs with React enables engineering teams to develop, deploy and publish progressive web apps for billions of devices using technologies they’re already familiar with. Existing React apps can also be converted to a PWA. PWAs are fun to build, easy to ship and distribute, and add a lot of value to customers by providing native-live experience, better engagement via features, such as add to homescreen, push notifications and more without any installation process. 

  • Machine Learning for your Infrastructure: Anomaly Detection with Elastic + X-Pack

    Introduction

    The world continues to go through digital transformation at an accelerating pace. Modern applications and infrastructure continues to expand and operational complexity continues to grow. According to a recent ManageEngine Application Performance Monitoring Survey:

    • 28 percent use ad-hoc scripts to detect issues in over 50 percent of their applications.
    • 32 percent learn about application performance issues from end users.
    • 59 percent trust monitoring tools to identify most performance deviations.

    Most enterprises and web-scale companies have instrumentation & monitoring capabilities with an ElasticSearch cluster. They have a high amount of collected data but struggle to use it effectively. This available data can be used to improve availability and effectiveness of performance and uptime along with root cause analysis and incident prediction

    IT Operations & Machine Learning

    Here is the main question: How to make sense of the huge piles of collected data? The first step towards making sense of data is to understand the correlations between the time series data. But only understanding will not work since correlation does not imply causation. We need a practical and scalable approach to understand the cause-effect relationship between data sources and events across complex infrastructure of VMs, containers, networks, micro-services, regions, etc.

    It’s very likely that due to one component something goes wrong with another component. In such cases, operational historical data can be used to identify the root cause by investigating through a series of intermediate causes and effects. Machine learning is particularly useful for such problems where we need to identify “what changed”, since machine learning algorithms can easily analyze existing data to understand the patterns, thus making easier to recognize the cause. This is known as unsupervised learning, where the algorithm learns from the experience and identifies similar patterns when they come along again.

    Let’s see how you can setup Elastic + X-Pack to enable anomaly detection for your infrastructure & applications.

    Anomaly Detection using Elastic’s machine learning with X-Pack

    Step I: Setup

    1. Setup Elasticsearch: 

    According to Elastic documentation, it is recommended to use the Oracle JDK version 1.8.0_131. Check if you have required Java version installed on your system. It should be at least Java 8, if required install/upgrade accordingly.

    • Download elasticsearch tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.tar.gz
    $ tar -xzvf elasticsearch-5.5.1.tar.gz

    • It will then create a folder named elasticsearch-5.5.1. Go into the folder.
    $ cd elasticsearch-5.5.1

    • Install X-Pack into Elasticsearch
    $ ./bin/elasticsearch-plugin install x-pack

    • Start elasticsearch
    $ bin/elasticsearch

    2. Setup Kibana

    Kibana is an open source analytics and visualization platform designed to work with Elasticsearch.

    • Download kibana tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/kibana/kibana-5.5.1-linux-x86_64.tar.gz
    $ tar -xzf kibana-5.5.1-linux-x86_64.tar.gz

    • It will then create a folder named kibana-5.5.1. Go into the directory.
    $ cd kibana-5.5.1-linux-x86_64

    • Install X-Pack into Kibana
    $ ./bin/kibana-plugin install x-pack

    • Running kibana
    $ ./bin/kibana

    • Navigate to Kibana at http://localhost:5601/
    • Log in as the built-in user elastic and password changeme.
    • You will see the below screen:
    Kibana: X-Pack Welcome Page

     

    3. Metricbeat:

    Metricbeat helps in monitoring servers and the services they host by collecting metrics from the operating system and services. We will use it to get CPU utilization metrics of our local system in this blog.

    • Download Metric Beat’s tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-5.5.1-linux-x86_64.tar.gz
    $ tar -xvzf metricbeat-5.5.1-linux-x86_64.tar.gz

    • It will create a folder metricbeat-5.5.1-linux-x86_64. Go to the folder
    $ cd metricbeat-5.5.1-linux-x86_64

    • By default, Metricbeat is configured to send collected data to elasticsearch running on localhost. If your elasticsearch is hosted on any server, change the IP and authentication credentials in metricbeat.yml file.
     Metricbeat Config

     

    • Metric beat provides following stats:
    • System load
    • CPU stats
    • IO stats
    • Per filesystem stats
    • Per CPU core stats
    • File system summary stats
    • Memory stats
    • Network stats
    • Per process stats
    • Start Metricbeat as daemon process
    $ sudo ./metricbeat -e -c metricbeat.yml &

    Now, all setup is done. Let’s go to step 2 to create machine learning jobs. 

    Step II: Time Series data

    • Real-time data: We have metricbeat providing us the real-time series data which will be used for unsupervised learning. Follow below steps to define index pattern metricbeat-*  in Kibana to search against this pattern in Elasticsearch:
      – Go to Management -> Index Patterns  
      – Provide Index name or pattern as metricbeat-*
      – Select Time filter field name as @timestamp
      – Click Create

    You will not be able to create an index if elasticsearch did not contain any metric beat data. Make sure your metric beat is running and output is configured as elasticsearch.

    • Saved Historic data: Just to see quickly how machine learning detect the anomalies you can also use data provided by Elastic. Download sample data by clicking here.
    • Unzip the files in a folder: tar -zxvf server_metrics.tar.gz
    • Download this script. It will be used to upload sample data to elastic.
    • Provide execute permissions to the file: chmod +x upload_server-metrics.sh
    • Run the script.
    • As we created index pattern for metricbeat data, in same way create index pattern server-metrics*

    Step III: Creating Machine Learning jobs

    There are two scenarios in which data is considered anomalous. First, when the behavior of key indicator changes over time relative to its previous behavior. Secondly, when within a population behavior of an entity deviates from other entities in population over single key indicator.

    To detect these anomalies, there are three types of jobs we can create:

    1. Single Metric job: This job is used to detect Scenario 1 kind of anomalies over only one key performance indicator.
    2. Multimetric job: Multimetric job also detects Scenario 1 kind of anomalies but in this type of job we can track more than one performance indicators, such as CPU utilization along with memory utilization.
    3. Advanced job: This kind of job is created to detect anomalies of type 2.

    For simplicity, we are creating following single metric jobs:

    1. Tracking CPU Utilization: Using metric beat data
    2. Tracking total requests made on server: Using sample server data

    Follow below steps to create single metric jobs:

    Job1: Tracking CPU Utilization

    Job2: Tracking total requests made on server

    • Go to http://localhost:5601/
    • Go to Machine learning tab on the left panel of Kibana.
    • Click on Create new job
    • Click Create single metric job
    • Select index we created in Step 2 i.e. metricbeat-* and server-metrics* respectively
    • Configure jobs by providing following values:
    1. Aggregation: Here you need to select an aggregation function that will be applied to a particular field of data we are analyzing.
    2. Field: It is a drop down, will show you all field that you have w.r.t index pattern.
    3. Bucket span: It is interval time for analysis. Aggregation function will be applied on selected field after every interval time specified here.
    • If your data contains so many empty buckets i.e. data is sparse and you don’t want to consider it as anomalous check the checkbox named sparse data  (if it appears).
    • Click on Use full <index pattern=””> data to use all available data for analysis.</index>
    Metricbeats Description
    Server Description
    • Click on play symbol
    • Provide job name and description
    • Click on Create Job

    After creating job the data available will be analyzed. Click on view results, you will see a chart which will show the actual and upper & lower bound of predicted value. If actual value lies outside of the range, it will be considered as anomalous. The Color of the circles represents the severity level.

    Here we are getting a high range of prediction values since it just started learning. As we get more data the prediction will get better.
    You can see here predictions are pretty good since there is a lot of data to understand the pattern
    • Click on machine learning tab in the left panel. The jobs we created will be listed here.
    • You will see the list of actions for every job you have created.
    • Since we are storing every minute data for Job1 using metricbeat. We can feed the data to the job in real time. Click on play button to start data feed. As we get more and more data prediction will improve.
    • You see details of anomalies by clicking Anomaly Viewer.
    Anomaly in metricbeats data
    Server metrics anomalies  

    We have seen how machine learning can be used to get patterns among the different statistics along with anomaly detection. After identifying anomalies, it is required to find the context of those events. For example, to know about what other factors are contributing to the problem? In such cases, we can troubleshoot by creating multimetric jobs.

  • Idiot-proof Coding with Node.js and Express.js

    Node.js has become the most popular framework for web development surpassing Ruby on Rails and Django in terms of the popularity.The growing popularity of full stack development along with the performance benefits of asynchronous programming has led to the rise of Node’s popularity. ExpressJs is a minimalistic, unopinionated and the most popular web framework built for Node which has become the de-facto framework for many projects.
    Note — This article is about building a Restful API server with ExpressJs . I won’t be delving into a templating library like handlebars to manage the views.

    A quick search on google will lead you a ton of articles agreeing with what I just said which could validate the theory. Your next step would be to go through a couple of videos about ExpressJS on Youtube, try hello world with a boilerplate template, choose few recommended middleware for Express (Helmet, Multer etc), an ORM (mongoose if you are using Mongo or Sequelize if you are using relational DB) and start building the APIs. Wow, that was so fast!

    The problem starts to appear after a few weeks when your code gets larger and complex and you realise that there is no standard coding practice followed across the client and the server code, refactoring or updating the code breaks something else, versioning of the APIs becomes difficult, call backs have made your life hell (you are smart if you are using Promises but have you heard of async-await?).

    Do you think you your code is not so idiot-proof anymore? Don’t worry! You aren’t the only one who thinks this way after reading this.

    Let me break the suspense and list down the technologies and libraries used in our idiot-proof code before you get restless.

    1. Node 8.11.3: This is the latest LTS release from Node. We are using all the ES6 features along with async-await. We have the latest version of ExpressJs (4.16.3).
    2. Typescript: It adds an optional static typing interface to Javascript and also gives us familiar constructs like classes (Es6 also gives provides class as a construct) which makes it easy to maintain a large codebase.
    3. Swagger: It provides a specification to easily design, develop, test and document RESTful interfaces. Swagger also provides many open source tools like codegen and editor that makes it easy to design the app.
    4. TSLint: It performs static code analysis on Typescript for maintainability, readability and functionality errors.
    5. Prettier: It is an opinionated code formatter which maintains a consistent style throughout the project. This only takes care of the styling like the indentation (2 or 4 spaces), should the arguments remain on the same line or go to the next line when the line length exceeds 80 characters etc.
    6. Husky: It allows you to add git hooks (pre-commit, pre-push) which can trigger TSLint, Prettier or Unit tests to automatically format the code and to prevent the push if the lint or the tests fail.

    Before you move to the next section I would recommend going through the links to ensure that you have a sound understanding of these tools.

    Now I’ll talk about some of the challenges we faced in some of our older projects and how we addressed these issues in the newer projects with the tools/technologies listed above.

    Formal API definition

    A problem that everyone can relate to is the lack of formal documentation in the project. Swagger addresses a part of this problem with their OpenAPI specification which defines a standard to design REST APIs which can be discovered by both machines and humans. As a practice, we first design the APIs in swagger before writing the code. This has 3 benefits:

    • It helps us to focus only on the design without having to worry about the code, scaffolder, naming conventions etc. Our API designs are consistent with the implementation because of this focused approach.
    • We can leverage tools like swagger-express-mw to internally wire the routes in the API doc to the controller, validate request and response object from their definitions etc.
    • Collaboration between teams becomes very easy, simple and standardised because of the Swagger specification.

    Code Consistency

    We wanted our code to look consistent across the stack (UI and Backend)and we use ESlint to enforce this consistency.
    Example –
    Node traditionally used “require” and the UI based frameworks used “import” based syntax to load the modules. We decided to follow ES6 style across the project and these rules are defined with ESLint.

    Note — We have made slight adjustments to the TSlint for the backend and the frontend to make it easy for the developers. For example, we allow upto 120 characters in React as some of our DOM related code gets lengthy very easily.

    Code Formatting

    This is as important as maintaining the code consistency in the project. It’s easy to read a code which follows a consistent format like indentation, spaces, line breaks etc. Prettier does a great job at this. We have also integrated Prettier with Typescript to highlight the formatting errors along with linting errors. IDE like VS Code also has prettier plugin which supports features like auto-format to make it easy.

    Strict Typing

    Typescript can be leveraged to the best only if the application follows strict typing. We try to enforce it as much as possible with exceptions made in some cases (mostly when a third party library doesn’t have a type definition). This has the following benefits:

    • Static code analysis works better when your code is strongly typed. We discover about 80–90% of the issues before compilation itself using the plugins mentioned above.
    • Refactoring and enhancements becomes very simple with Typescript. We first update the interface or the function definition and then follow the errors thrown by Typescript compiler to refactor the code.

    Git Hooks

    Husky’s “pre-push” hook runs TSLint to ensure that we don’t push the code with linting issues. If you follow TDD (in the way it’s supposed to be done), then you can also run unit tests before pushing the code. We decided to go with pre-hooks because
    – Not everyone has CI from the very first day. With a git hook, we at least have some code quality checks from the first day.
    – Running lint and unit tests on the dev’s system will leave your CI with more resources to run integration and other complex tests which are not possible to do in local environment.
    – You force the developer to fix the issues at the earliest which results in better code quality, faster code merges and release.

    Async-await

    We were using promises in our project for all the asynchronous operations. Promises would often lead to a long chain of then-error blocks which is not very comfortable to read and often result in bugs when it got very long (it goes without saying that Promises are much better than the call back function pattern). Async-await provides a very clean syntax to write asynchronous operations which just looks like sequential code. We have seen a drastic improvement in the code quality, fewer bugs and better readability after moving to async-await.

    Hope this article gave you some insights into tools and libraries that you can use to build a scalable ExpressJS app.

  • Cloud Native Applications — The Why, The What & The How

    Cloud-native is an approach to build & run applications that can leverage the advantages of the cloud computing model — On demand computing power & pay-as-you-go pricing model. These applications are built and deployed in a rapid cadence to the cloud platform and offer organizations greater agility, resilience, and portability across clouds.

    This blog explains the importance, the benefits and how to go about building Cloud Native Applications.

    CLOUD NATIVE – The Why?

    Early technology adapters like FANG (Facebook, Amazon, Netflix & Google) have some common themes when it comes to shipping software. They have invested heavily in building capabilities that enable them to release new features regularly (weekly, daily or in some cases even hourly). They have achieved this rapid release cadence while supporting safe and reliable operation of their applications; in turn allowing them to respond more effectively to their customers’ needs.

    They have achieved this level of agility by moving beyond ad-hoc automation and by adopting cloud native practices that deliver these predictable capabilities. DevOps,Continuous Delivery, micro services & containers form the 4 main tenets of Cloud Native patterns. All of them have the same overarching goal of making application development and operations team more efficient through automation.

    At this point though, these techniques have only been successfully proven at the aforementioned software driven companies. Smaller, more agile companies are also realising the value here. However, as per Joe Beda(creator of Kubernetes & CTO at Heptio) there are very few examples of this philosophy being applied outside these technology centric companies.

    Any team/company shipping products should seriously consider adopting Cloud Native practices if they want to ship software faster while reducing risk and in turn delighting their customers.

    CLOUD NATIVE – The What?

    Cloud Native practices comprise of 4 main tenets.

     

    Cloud native — main tenets
    • DevOps is the collaboration between software developers and IT operations with the goal of automating the process of software delivery & infrastructure changes.
    • Continuous Delivery enables applications to released quickly, reliably & frequently, with less risk.
    • Micro-services is an architectural approach to building an application as a collection of small independent services that run on their own and communicate over HTTP APIs.
    • Containers provide light-weight virtualization by dynamically dividing a single server into one or more isolated containers. Containers offer both effiiciency & speed compared to standard Virual Machines (VMs). Containers provide the ability to manage and migrate the application dependencies along with the application. while abstracting away the OS and the underlying cloud platform in many cases.

    The benefits that can be reaped by adopting these methodologies include:

    1. Self managing infrastructure through automation: The Cloud Native practice goes beyond ad-hoc automation built on top of virtualization platforms, instead it focuses on orchestration, management and automation of the entire infrastructure right upto the application tier.
    2. Reliable infrastructure & application: Cloud Native practice ensures that it much easier to handle churn, replace failed components and even easier to recover from unexpected events & failures.
    3. Deeper insights into complex applications: Cloud Native tooling provides visualization for health management, monitoring and notifications with audit logs making applications easy to audit & debug
    4. Security: Enable developers to build security into applications from the start rather than an afterthought.
    5. More efficient use of resources: Containers are lighter in weight that full systems. Deploying applications in containers lead to increased resource utilization.

    Software teams have grown in size and the amount of applications and tools that a company needs to be build has grown 10x over last few years. Microservices break large complex applications into smaller pieces so that they can be developed, tested and managed independently. This enables a single microservice to be updated or rolled-back without affecting other parts of the application. Also nowadays software teams are distributed and microservices enables each team to own a small piece with service contracts acting as the communication layer.

    CLOUD NATIVE – The How?

    Now, lets look at the various building blocks of the cloud native stack that help achieve the above described goals. Here, we have grouped tools & solutions as per the problem they solve. We start with the infrastructure layer at the bottom, then the tools used to provision the infrastructure, following which we have the container runtime environment; above that we have tools to manage clusters of container environments and then at the very top we have the tools, frameworks to develop the applications.

    1. Infrastructure: At the very bottom, we have the infrastructure layer which provides the compute, storage, network & operating system usually provided by the Cloud (AWS, GCP, Azure, Openstack, VMware).

    2. Provisioning: The provisioning layer consists of automation tools that help in provisioning the infrastructure, managing images and deploying the application. Chef, Puppet & Ansible are the DevOps tools that give the ability to manage their configuration & environments. Spinnaker, Terraform, Cloudformation provide workflows to provision the infrastructure. Twistlock, Clair provide the ability to harden container images.

    3. Runtime: The Runtime provides the environment in which the application runs. It consists of the Container Engines where the application runs along with the associated storage & networking. containerd & rkt are the most widely used Container engines. Flannel, OpenContrail provide the necessary overlay networking for containers to interact with each other and the outside world while Datera, Portworx, AppOrbit etc. provide the necessary persistent storage enabling easy movement of containers across clouds.

    4. Orchestration and Management: Tools like Kubernetes, Docker Swarm and Apache Mesos abstract the management container clusters allowing easy scheduling & orchestration of containers across multiple hosts. etcd, Consul provide service registries for discovery while AVI, Envoy provide proxy, load balancer etc. services.

    5. Application Definition & Development: We can build micro-services for applications across multiple langauges — Python, Spring/Java, Ruby, Node. Packer, Habitat & Bitnami provide image management for the application to run across all infrastructure — container or otherwise. 
    Jenkins, TravisCI, CircleCI and other build automation servers provide the capability to setup continuous integration and delivery pipelines.

    6. Monitoring, Logging & Auditing: One of the key features of managing Cloud Native Infrastructure is the ability to monitor & audit the applications & underlying infrastructure.

    All modern monitoring platforms like Datadog, Newrelic, AppDynamic support monitoring of containers & microservices.

    Splunk, Elasticsearch & fluentd help in log aggregration while Open Tracing and Zipkin help in debugging applications.

    7. Culture: Adopting cloud native practices needs a cultural change where teams no longer work in independent silos. End-to-end automation of software delivery pipelines is only possible when there is an increased collaboration between development and IT operations team with a shared responbility.

    When we put all the pieces together we get the complete Cloud Native Landscape as shown below.

    Cloud Native Landscape

    I hope this post gives an idea why Cloud Native is important and what the main benefits are. As you may have noticed in the above infographic, there are several projects, tools & companies trying to solve similar problems. The next questions in mind most likely will be How do i get started? Which tools are right for me? and so on. I will cover these topics and more in my following blog posts. Stay tuned!

    Please let us know what you think by adding comments to this blog or reaching out to chirag_jog or Velotio on Twitter.

    Learn more about what we do at Velotio here and how Velotio can get you started on your cloud native journey here.

    References: