Blog

  • API Testing Using Postman and Newman

    In the last few years, we have seen an exponential increase in the development and use of APIs. We are in the era of API-first companies such as Stripe, Twilio and Mailgun, whose entire product or service is exposed via REST APIs. Today's web applications are also powered by REST-based web services, and APIs now encapsulate critical business logic with demanding SLAs. Hence it is important to test APIs as part of the continuous integration process to reduce errors, improve predictability and catch nasty bugs.

    In the context of API development, Postman is a great REST client for testing APIs. But Postman is not just a REST client: it also contains a full-featured testing sandbox that lets you write and execute JavaScript-based tests for your API.

    Postman comes with a nifty CLI tool, Newman. Newman is Postman's collection runner engine: it sends API requests, receives the responses and then runs your tests against them. Newman lets developers easily integrate Postman into continuous integration systems like Jenkins. Some of the important features of Postman and Newman include:

    1. Ability to test any API and see the response instantly.
    2. Ability to create test suites or collections using a collection of API endpoints.
    3. Ability to collaborate with team members on these collections.
    4. Ability to easily export/import collections as JSON files.

    We are going to look at all these features; some are intuitive, and some less so unless you've been using Postman for a while.

    Setting up Your Postman

    You can install Postman either as a Chrome extension or as a native application.

    Once it is installed, look it up in your installed apps and open it. You can choose to sign up and create an account, which is especially useful for saving your API collections and accessing them anytime on any machine. However, for this article we can skip this step; there's a button for that towards the bottom of the window when you first launch the app.

    Postman Collections

    In simple words, a Postman collection is a collection of tests: essentially a test suite of related tests. These tests can be scenario-based or sequence/workflow-based.

    There’s a Collections tab on the top left of Postman, with an example Postman Echo collection. You can open and go through it.

    Select an API request in that collection and click on the Tests tab. Check the first line:

    tests["response code is 200"] = responseCode.code === 200;

    The above line is a simple test that checks whether the response code for the API is 200. This is the pattern for writing assertions/tests in Postman (using JavaScript), and it is how you are going to write the tests for the APIs you need to test. You can open the other API requests in the Postman Echo collection to get a sense of how requests are made.

    Adding a Collection

    To make your own collection, click on the 'Add Collection' button on the top left of Postman.

    You will be prompted to give details about the collection; I've named it TEST_API_COLLECTION and given it a description.

    Clicking on Create should add the collection to the left pane, above or below the example "Postman Echo" collection.

    If you need a hierarchy to keep related APIs together inside a collection, APIs can be added to a folder within the collection. Folders are a great way of separating different parts of your API workflow. You can add folders through the "3 dot" button beside the collection name.

    For example, name the folder "Get Calls" and give it a description.

    Now that we have the folder, the next task is to add an API call to it. That API call is to https://api.github.com/.

    If you still have one of the Postman Echo requests open, you can close it the same way you close tabs in a browser, or just click on the plus button to add a new tab in the right pane where we make requests.

    Type in or paste in https://api.github.com/ and press Send to see the response.

    Once you get the response, click on the arrow next to the Save button on the far right and select Save As. A pop-up will ask where to save the API call.

    Give it a name (either the request URL or something like "GET Github Basic") and a description, then choose the collection and folder, in this case TEST_API_COLLECTION > Get Calls, and click on Save. The API call will be added to the Get Calls folder in the left pane.

    Whenever you click on this request from the collection, it will open in the center pane.

    Write the Tests

    We've seen that the GET Github Basic request returns a JSON response, which is the case for most APIs. This response has properties such as current_user_url, emails_url, followers_url and following_url, to pick a few. The current_user_url has a value of https://api.github.com/user. Let's add a test for this URL. Click on 'GET Github Basic' and open the Tests tab, just below the field where the URL is entered.

    In the right-hand pane of the Tests tab, Postman offers snippets that generate test code when clicked, so you don't have to write a lot of code yourself. Let's add Response body: JSON value check. Clicking on it produces the following snippet:

    var jsonData = JSON.parse(responseBody);
    tests["Your test name"] = jsonData.value === 100;

    From these two lines, it is apparent that Postman exposes the raw response body as a global string called responseBody; parsing it with JSON.parse gives us an object we can use to access the response and assert values in tests as required.

    Postman also has another global object called tests. Each key you assign to it names a test, and its value is a boolean expression. If the boolean expression evaluates to true, the test passes.

    tests['some random test'] = x === y

    If you click on Send to make the request, you will see this test failing, because the GitHub response has no value property.

    Let's create a test that is relevant to our use case.

    var jsonData = JSON.parse(responseBody);
    var usersURL = "https://api.github.com/user"
    tests["Gets the correct users url"] = jsonData.current_user_url === usersURL;

    Clicking on ‘Send‘, you’ll see the test passing.
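    To see the mechanism outside Postman, the sketch below re-creates the tests object in plain Node.js. It assumes a hard-coded, trimmed sample of the https://api.github.com/ response in place of the sandbox's responseBody global, and its report function is a hypothetical stand-in for Postman's Tests tab.

```javascript
// Assumption: a trimmed sample of the GitHub root response. Inside Postman,
// the sandbox provides responseBody as a string; here we build one ourselves.
const responseBody = JSON.stringify({
  current_user_url: "https://api.github.com/user",
  emails_url: "https://api.github.com/user/emails",
});

// Each key names a test; each value is the boolean result of an assertion.
const tests = {};
const jsonData = JSON.parse(responseBody);
tests["Your test name"] = jsonData.value === 100; // fails: there is no `value` field
tests["Gets the correct users url"] =
  jsonData.current_user_url === "https://api.github.com/user";

// Hypothetical reporter: prints PASS/FAIL per test and returns the failure count.
function report(results) {
  for (const [name, passed] of Object.entries(results)) {
    console.log(`${passed ? "PASS" : "FAIL"}  ${name}`);
  }
  return Object.values(results).filter((passed) => !passed).length;
}

const failures = report(tests);
```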

    Let's extend the tests to cover some other properties worth checking.

    Ideally, the things to check in an API response are:

    • Response code (assert the correct response code for every request)
    • Response time (check that the API responds within an acceptable time range and is not delayed)
    • Response body is not empty / null

    tests["Status code is 200"] = responseCode.code === 200;
    tests["Response time is less than 200ms"] = responseTime < 200;
    tests["Response time is acceptable"] = _.inRange(responseTime, 0, 500);
    tests["Body is not empty"] = (responseBody !== null && responseBody.length !== 0);
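    The same four checks can be expressed as a plain function, which makes the pass/fail logic easy to reason about outside the sandbox. This is a sketch: basicResponseChecks is a hypothetical name, the status code, time and body are passed in rather than read from Postman's globals, and the lodash _.inRange call is replaced by the equivalent half-open comparison (0 <= time < 500).

```javascript
// Sketch: the response checks as a pure function of (status code, time in ms, body).
function basicResponseChecks(code, timeMs, body) {
  const tests = {};
  tests["Status code is 200"] = code === 200;
  tests["Response time is less than 200ms"] = timeMs < 200;
  // Same half-open range as lodash _.inRange(timeMs, 0, 500).
  tests["Response time is acceptable"] = timeMs >= 0 && timeMs < 500;
  // Both conditions must hold; the null check runs first so .length is never read on null.
  tests["Body is not empty"] = body !== null && body.length !== 0;
  return tests;
}
```

    With ||, a null body would make the check throw and an empty string would still pass; && makes both cases fail as intended.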

    Newman CLI

    Once you've set up all your collections and written tests for them, it can be tedious to go through them one by one, clicking Send to see whether each test passes. This is where Newman comes in. Newman is a command-line collection runner for Postman.

    All you need to do is export your collection and the environment variables, then use Newman to run the tests from your terminal.

    NOTE: Make sure you've clicked on 'Save' to save your collection before exporting it.

    Using Newman

    So the first step is to export your collection and environment variables. Click on the menu icon for the TEST_API_COLLECTION collection and select Export.

    Select version 2 and click on "Export".

    Save the JSON file in a location you can access with your terminal. I created a local directory/folder called “postman” and saved it there.

    Install the Newman CLI globally, then navigate to the directory where you saved the collection.

    npm install -g newman 
    cd postman

    Using Newman is quite straightforward, and the documentation is extensive. You can even require it as a Node.js module and run the tests from a script. However, we will use the CLI.
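    For reference, a minimal sketch of the Node.js route, assuming newman has been installed locally with npm install newman. newman.run(options, callback) and summary.run.failures are part of Newman's Node API; buildRunOptions and runCollection are hypothetical helpers, and the file names are placeholders for your own exports.

```javascript
// Builds the options object for newman.run(); newman accepts a file path
// (or parsed JSON) for both the collection and the environment.
function buildRunOptions(collectionPath, environmentPath) {
  const options = {
    collection: collectionPath,
    reporters: "cli", // same console output as `newman run`
  };
  if (environmentPath) {
    options.environment = environmentPath;
  }
  return options;
}

// Runs the collection and reports the number of failures to `done`.
function runCollection(collectionPath, environmentPath, done) {
  const newman = require("newman"); // required lazily so buildRunOptions works without it
  newman.run(buildRunOptions(collectionPath, environmentPath), (err, summary) => {
    if (err) return done(err);
    // summary.run.failures lists every failed assertion or error in the run.
    done(null, summary.run.failures.length);
  });
}
```

    Calling runCollection("TEST_API_COLLECTION.postman_collection.json", null, (err, failures) => { ... }) would run the collection with CLI-style output and hand back the failure count, which a script can turn into its own exit code.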

    Once you are in the directory, run newman run <collection_file>.json, replacing <collection_file> with the name of the file you exported.

    newman run TEST_API_COLLECTION.postman_collection.json     

    Newman CLI Options

    Newman provides a rich set of options to customize a run. A list of options can be retrieved by running it with the -h flag.

    
    $ newman run -h
    Options - Additional args:
    Utility:
    -h, --help output usage information
    -v, --version output the version number
    Basic setup:
    --folder [folderName] Specify a single folder to run from a collection.
    -e, --environment [file|URL] Specify a Postman environment as a JSON [file]
    -d, --iteration-data [file] Specify a data file to use either json or csv
    -g, --globals [file] Specify a Postman globals file as JSON [file]
    -n, --iteration-count [number] Define the number of iterations to run
    Request options:
    --delay-request [number] Specify a delay (in ms) between requests
    --timeout-request [number] Specify a request timeout (in ms) for a request
    Misc.:
    --bail Stops the runner when a test case fails
    --silent Disable terminal output
    --no-color Disable colored output
    -k, --insecure Disable strict ssl
    -x, --suppress-exit-code Continue running tests even after a failure, but exit with code=0
    --ignore-redirects Disable automatic following of 3XX responses

    Let's try out some of the options.

    Iterations

    Let's use the -n option to set the number of iterations to run the collection.

    $ newman run mycollection.json -n 10 # runs the collection 10 times

    To provide a different set of data, i.e. variables, for each iteration, you can use the -d option to specify a JSON or CSV file. For example, a data file such as the one shown below will run 2 iterations, with each iteration using its own set of variables.

    [{
      "url": "http://127.0.0.1:5000",
      "user_id": "1",
      "id": "1",
      "token_id": "123123"
    },{
      "url": "http://postman-echo.com",
      "user_id": "2",
      "id": "2",
      "token_id": "899899"
    }]

    $ newman run mycollection.json -d data.json

    Alternatively, the CSV file for the above set of variables would look like:

    url,user_id,id,token_id
    http://127.0.0.1:5000,1,1,123123
    http://postman-echo.com,2,2,899899
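    If you maintain both formats, one option is to generate the CSV from the same records as the JSON file so they cannot drift apart. A minimal sketch: toIterationCsv is a hypothetical helper that does no quoting, so it assumes values contain no commas, quotes or newlines. It also writes no spaces after the commas, to avoid any chance of a space ending up in a column name or value.

```javascript
// Converts an array of flat objects (one per iteration) into CSV text,
// using the keys of the first record as the header row.
function toIterationCsv(rows) {
  const headers = Object.keys(rows[0]);
  const lines = [headers.join(",")];
  for (const row of rows) {
    lines.push(headers.map((h) => String(row[h])).join(","));
  }
  return lines.join("\n");
}

// The same two iterations as the JSON data file.
const iterations = [
  { url: "http://127.0.0.1:5000", user_id: "1", id: "1", token_id: "123123" },
  { url: "http://postman-echo.com", user_id: "2", id: "2", token_id: "899899" },
];
const csv = toIterationCsv(iterations);
```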

    Environment Variables

    Each environment is a set of key-value pairs, with the key as the variable name. Environments let you keep configuration that is specific to each execution environment separate, e.g. Dev, Test and Production.

    To select an execution environment, use the -e option to specify an environment JSON file. For example, an environment file such as the one shown below makes its variables available to every request and test during the run.

    postman_dev_env.json
    {
      "id": "b5c617ad-7aaf-6cdf-25c8-fc0711f8941b",
      "name": "dev env",
      "values": [
        {
          "enabled": true,
          "key": "env",
          "value": "dev.example.com",
          "type": "text"
        }
      ],
      "timestamp": 1507210123364,
      "_postman_variable_scope": "environment",
      "_postman_exported_at": "2017-10-05T13:28:45.041Z",
      "_postman_exported_using": "Postman/5.2.1"
    }
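    Rather than hand-editing exported files, one environment file per target can also be generated. A minimal sketch, assuming Newman only needs the name and values fields (the id, timestamp and _postman_* fields from the export are left out); buildEnvironment and writeEnvironment are hypothetical helpers.

```javascript
const fs = require("fs");

// Builds a Postman-style environment object from a plain key/value map.
function buildEnvironment(name, vars) {
  return {
    name,
    values: Object.entries(vars).map(([key, value]) => ({
      enabled: true,
      key,
      value,
      type: "text",
    })),
  };
}

// Writes the environment as pretty-printed JSON, ready for `newman run ... -e <file>`.
function writeEnvironment(file, name, vars) {
  fs.writeFileSync(file, JSON.stringify(buildEnvironment(name, vars), null, 2));
}
```

    For example, writeEnvironment("postman_dev_env.json", "dev env", { env: "dev.example.com" }) produces a file whose env variable requests can reference as {{env}}.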

    Bail Flag

    Newman exits with a status code of 0 when every request and test in the run passes, and with a status code of 1 when a test fails or an error occurs (unless you pass -x). Continuous integration tools respond to these exit codes and correspondingly pass or fail a build. The --bail flag additionally tells Newman to halt the run at the first failing test case instead of finishing the whole collection, so the CI tool or build system picks up the failure sooner.

    $ newman run PostmanCollection.json -e environment.json --bail
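    In a CI job, the simplest integration is to run the same command and pass Newman's exit status through to the build. A sketch using Node's child_process.spawnSync, assuming newman is on the PATH of a POSIX build agent; newmanArgs and runNewman are hypothetical helper names.

```javascript
const { spawnSync } = require("child_process");

// Builds the newman CLI arguments for one run.
function newmanArgs({ collection, environment, bail = true }) {
  const args = ["run", collection];
  if (environment) args.push("-e", environment);
  if (bail) args.push("--bail");
  return args;
}

// Runs newman with the terminal output passed through and returns its exit status.
function runNewman(opts) {
  const result = spawnSync("newman", newmanArgs(opts), { stdio: "inherit" });
  if (result.error) throw result.error; // e.g. newman is not installed
  // status is null when the process was killed by a signal; treat that as a failure.
  return result.status === null ? 1 : result.status;
}
```

    A CI step could then end with process.exit(runNewman({ collection: "PostmanCollection.json", environment: "environment.json" })), so a failing test fails the build.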

    Conclusion

    Postman and Newman can be used for a number of testing needs, including usage scenarios, suites and packs of API test cases. Newman and Postman also integrate well with CI/CD tools such as Jenkins and Travis.

  • Machine Learning for your Infrastructure: Anomaly Detection with Elastic + X-Pack

    Introduction

    The world continues to go through digital transformation at an accelerating pace. Modern applications and infrastructure continue to expand, and operational complexity continues to grow. According to a recent ManageEngine Application Performance Monitoring Survey:

    • 28 percent use ad-hoc scripts to detect issues in over 50 percent of their applications.
    • 32 percent learn about application performance issues from end users.
    • 59 percent trust monitoring tools to identify most performance deviations.

    Most enterprises and web-scale companies have instrumentation and monitoring capabilities backed by an Elasticsearch cluster. They collect large amounts of data but struggle to use it effectively. This data can be used to improve availability and performance, and to support root cause analysis and incident prediction.

    IT Operations & Machine Learning

    Here is the main question: how do we make sense of these huge piles of collected data? The first step is to understand the correlations between the time series. But correlation alone is not enough, since correlation does not imply causation. We need a practical and scalable approach to understanding the cause-and-effect relationships between data sources and events across a complex infrastructure of VMs, containers, networks, microservices, regions, etc.

    Quite often, a problem in one component causes something to go wrong in another. In such cases, historical operational data can be used to identify the root cause by investigating a series of intermediate causes and effects. Machine learning is particularly useful for problems where we need to identify "what changed", because machine learning algorithms can analyze existing data to learn its patterns, making it easier to recognize the cause. This is known as unsupervised learning: the algorithm learns from unlabeled historical data and recognizes similar patterns, or deviations from them, when new data comes along.

    Let's see how you can set up Elastic + X-Pack to enable anomaly detection for your infrastructure and applications.

    Anomaly Detection using Elastic’s machine learning with X-Pack

    Step I: Setup

    1. Setup Elasticsearch: 

    According to the Elastic documentation, the recommended JDK is Oracle JDK version 1.8.0_131. Check which Java version is installed on your system; it should be at least Java 8. Install or upgrade if required.

    • Download elasticsearch tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.tar.gz
    $ tar -xzvf elasticsearch-5.5.1.tar.gz

    • It will then create a folder named elasticsearch-5.5.1. Go into the folder.
    $ cd elasticsearch-5.5.1

    • Install X-Pack into Elasticsearch
    $ ./bin/elasticsearch-plugin install x-pack

    • Start elasticsearch
    $ bin/elasticsearch

    2. Setup Kibana

    Kibana is an open source analytics and visualization platform designed to work with Elasticsearch.

    • Download kibana tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/kibana/kibana-5.5.1-linux-x86_64.tar.gz
    $ tar -xzf kibana-5.5.1-linux-x86_64.tar.gz

    • It will then create a folder named kibana-5.5.1-linux-x86_64. Go into the directory.
    $ cd kibana-5.5.1-linux-x86_64

    • Install X-Pack into Kibana
    $ ./bin/kibana-plugin install x-pack

    • Running kibana
    $ ./bin/kibana

    • Navigate to Kibana at http://localhost:5601/
    • Log in as the built-in user elastic and password changeme.
    • You will see the Kibana X-Pack welcome page.

     

    3. Metricbeat:

    Metricbeat helps in monitoring servers and the services they host by collecting metrics from the operating system and services. We will use it to get CPU utilization metrics of our local system in this blog.

    • Download Metricbeat's tarball and untar it
    $ wget https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-5.5.1-linux-x86_64.tar.gz
    $ tar -xvzf metricbeat-5.5.1-linux-x86_64.tar.gz

    • It will create a folder metricbeat-5.5.1-linux-x86_64. Go to the folder
    $ cd metricbeat-5.5.1-linux-x86_64

    • By default, Metricbeat is configured to send collected data to Elasticsearch running on localhost. If your Elasticsearch is hosted on another server, change the host and authentication credentials in the metricbeat.yml file.
     Metricbeat Config

     

    • Metricbeat provides the following stats:
      – System load
      – CPU stats
      – IO stats
      – Per filesystem stats
      – Per CPU core stats
      – File system summary stats
      – Memory stats
      – Network stats
      – Per process stats
    • Start Metricbeat as a background process:
    $ sudo ./metricbeat -e -c metricbeat.yml &

    Now all the setup is done. Let's move on to preparing the time series data and creating the machine learning jobs.

    Step II: Time Series data

    • Real-time data: Metricbeat provides the real-time time series data that will be used for unsupervised learning. Follow the steps below to define the index pattern metricbeat-* in Kibana, so that you can search against this pattern in Elasticsearch:
      – Go to Management -> Index Patterns  
      – Provide Index name or pattern as metricbeat-*
      – Select Time filter field name as @timestamp
      – Click Create

    You will not be able to create the index pattern if Elasticsearch does not contain any Metricbeat data. Make sure Metricbeat is running and its output is configured to send data to Elasticsearch.

    • Saved historic data: To quickly see how machine learning detects anomalies, you can also use sample data provided by Elastic. Download the sample data (server_metrics.tar.gz).
    • Extract the files into a folder: tar -zxvf server_metrics.tar.gz
    • Download the upload script, upload_server-metrics.sh. It will be used to upload the sample data to Elasticsearch.
    • Give the script execute permissions: chmod +x upload_server-metrics.sh
    • Run the script.
    • Just as we created an index pattern for the Metricbeat data, create an index pattern server-metrics*.

    Step III: Creating Machine Learning jobs

    There are two scenarios in which data is considered anomalous. First, when the behavior of a key indicator changes over time relative to its own previous behavior. Second, when the behavior of one entity in a population deviates from that of the other entities for a single key indicator.

    To detect these anomalies, there are three types of jobs we can create:

    1. Single metric job: detects anomalies of the first kind in a single key performance indicator.
    2. Multimetric job: also detects anomalies of the first kind, but can track more than one performance indicator, such as CPU utilization along with memory utilization.
    3. Advanced job: used to detect anomalies of the second kind (population analysis).

    For simplicity, we are creating the following single metric jobs:

    1. Tracking CPU Utilization: Using metric beat data
    2. Tracking total requests made on server: Using sample server data

    Follow the steps below to create the single metric jobs:

    Job1: Tracking CPU Utilization

    Job2: Tracking total requests made on server

    • Go to http://localhost:5601/
    • Go to Machine learning tab on the left panel of Kibana.
    • Click on Create new job
    • Click Create single metric job
    • Select index we created in Step 2 i.e. metricbeat-* and server-metrics* respectively
    • Configure jobs by providing following values:
    1. Aggregation: select the aggregation function to apply to the field of data you are analyzing.
    2. Field: a drop-down showing all the fields available in the selected index pattern.
    3. Bucket span: the time interval used for analysis. The aggregation function is applied to the selected field once per interval specified here.
    • If your data contains many empty buckets, i.e. the data is sparse, and you don't want them treated as anomalous, check the checkbox named Sparse data (if it appears).
    • Click on Use full <index pattern> data to use all available data for analysis.
    Metricbeats Description
    Server Description
    • Click on play symbol
    • Provide job name and description
    • Click on Create Job
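    Behind the wizard, the three settings map onto an anomaly detection job configuration. As a sketch, this is roughly the body you would send to the Elasticsearch 5.x X-Pack ML create-job endpoint (PUT _xpack/ml/anomaly_detectors/<job_id>) for Job1. The field name system.cpu.user.pct, the 5m bucket span and the singleMetricJob helper are assumptions for illustration; the wizard also creates a datafeed, which is not shown.

```javascript
// Hypothetical helper: one detector = one aggregation function on one field.
function singleMetricJob({ description, fn, field, bucketSpan, timeField = "@timestamp" }) {
  return {
    description,
    analysis_config: {
      bucket_span: bucketSpan,                          // the "Bucket span" setting
      detectors: [{ function: fn, field_name: field }], // "Aggregation" + "Field"
    },
    data_description: { time_field: timeField },       // time field of the index pattern
  };
}

const cpuJob = singleMetricJob({
  description: "Tracking CPU utilization",
  fn: "mean",
  field: "system.cpu.user.pct", // assumed Metricbeat 5.x system module field
  bucketSpan: "5m",
});

console.log(JSON.stringify(cpuJob, null, 2));
```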

    Once the job is created, the available data will be analyzed. Click on View Results to see a chart showing the actual value together with the upper and lower bounds of the predicted value. If the actual value lies outside this range, it is considered anomalous. The color of each circle represents the severity level.
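    To make the "actual value outside the predicted bounds" idea concrete, here is a deliberately simple stand-in: raw samples are averaged into fixed buckets (the bucket span), and each bucket is compared with the mean ± k standard deviations of the preceding buckets. This is NOT the model X-Pack ML uses, which is considerably more sophisticated; bucketMeans, flagAnomalies, the window of 5 and k = 3 are all illustrative assumptions.

```javascript
// Aggregates {t (ms), v} samples into fixed buckets and returns their means, oldest first.
function bucketMeans(samples, spanMs) {
  const buckets = new Map();
  for (const { t, v } of samples) {
    const start = Math.floor(t / spanMs) * spanMs;
    const b = buckets.get(start) || { sum: 0, n: 0 };
    b.sum += v;
    b.n += 1;
    buckets.set(start, b);
  }
  return [...buckets.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([start, { sum, n }]) => ({ start, value: sum / n }));
}

// Flags buckets whose value falls outside mean ± k·stddev of the previous `window` buckets.
function flagAnomalies(series, window = 5, k = 3) {
  const out = [];
  for (let i = window; i < series.length; i++) {
    const past = series.slice(i - window, i).map((b) => b.value);
    const mean = past.reduce((acc, x) => acc + x, 0) / window;
    const sd = Math.sqrt(past.reduce((acc, x) => acc + (x - mean) ** 2, 0) / window);
    const lower = mean - k * sd;
    const upper = mean + k * sd;
    const actual = series[i].value;
    out.push({ start: series[i].start, actual, lower, upper, anomalous: actual < lower || actual > upper });
  }
  return out;
}
```

    As in the Kibana chart, a model that has seen only a few buckets produces wide, unreliable bounds; the bands tighten as more history accumulates.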

    For Job1, the predicted range is wide because the model has only just started learning; as more data arrives, the predictions will get better.
    For Job2, the predictions are already quite good, since there is a lot of data from which to learn the pattern.
    • Click on the Machine Learning tab in the left panel. The jobs we created will be listed here.
    • You will see a list of actions for every job you have created.
    • Since Metricbeat keeps storing new data for Job1, we can feed that data to the job in real time. Click on the play button to start the datafeed. As more data comes in, the predictions will improve.
    • You can see details of the anomalies by clicking on Anomaly Viewer.
    Anomaly in metricbeats data
    Server metrics anomalies  

    We have seen how machine learning can be used to find patterns across different statistics and to detect anomalies. After identifying anomalies, we need to find the context of those events, for example which other factors are contributing to the problem. In such cases, we can troubleshoot by creating multimetric jobs.

  • Amazon Lex + AWS Lambda: Beyond Hello World

    In my previous blog, I explained how to get started with Amazon Lex and build simple bots. This blog explores the Lambda functions used by Amazon Lex for input validation and fulfillment. We will continue with the example we created in the first blog, i.e. purchasing a book, and see in detail how the dots are connected.

    This blog is divided into the following sections:

    1. Lambda function input format
    2. Response format
    3. Managing conversation context
    4. An example (demonstration to understand better how context is maintained to make data flow between two different intents)

    NOTE: The input to a Lambda function depends on the language you use to write the function. Since our example uses Node.js, everything is explained in terms of Node.js.

    Section 1:  Lambda function input format

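    When a conversation with the bot starts, Amazon Lex passes control to the Lambda function we defined while creating the bot.

    A Node.js Lambda handler receives three arguments: event, context and callback. The first two are described below; callback is how the function hands its response (described in Section 2) back to Amazon Lex.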

    1. Event:

    event is a JSON object containing all the details of a bot conversation. Every time the Lambda function is invoked, Amazon Lex sends an event containing the details of the corresponding message the user sent to the bot.

    Below is a sample event JSON:

    {
      currentIntent: {
        name: 'orderBook',
        slots: {
          bookType: null,
          bookName: null
        },
        confirmationStatus: 'None'
      },
      bot: {
        name: 'PurchaseBook',
        alias: '$LATEST',
        version: '$LATEST'
      },
      userId: 'user-1',
      inputTranscript: 'buy me a book',
      invocationSource: 'DialogCodeHook',
      outputDialogMode: 'Text',
      messageVersion: '1.0'
    }

    The fields of the event JSON are explained below:

    • currentIntent: Information about the intent of the message the user sent to the bot. It contains the following keys:
      – name: the intent name (e.g. orderBook, which we defined in our previous blog).
      – slots: a map of the slot names configured for that intent, populated with the values Amazon Lex has recognized during the conversation. Default values are null.
      – confirmationStatus: the user's response to a confirmation prompt, if there is one. Possible values are:
        – None: the default value.
        – Confirmed: the user confirmed in response to the confirmation prompt.
        – Denied: the user denied in response to the confirmation prompt.
    • inputTranscript: The text input by the user for processing. For audio input, the text is extracted from the audio. This is the text that is actually processed to recognize intents and slot values.
    • invocationSource: The reason the Lambda function was invoked. It can have the following two values:
      – DialogCodeHook: directs the Lambda function to validate the user's data input. If the intent is not clear, Amazon Lex can't invoke the Lambda function.
      – FulfillmentCodeHook: directs the Lambda function to fulfil the intent. If the intent is configured to invoke a Lambda function as a fulfilment code hook, Amazon Lex sets invocationSource to this value only after it has all the slot data needed to fulfil the intent.
    • bot: Details of the bot that processed the request:
      – name: the name of the bot.
      – alias: the alias of the bot version.
      – version: the version of the bot.
    • userId: Defined by the client application; Amazon Lex passes it through to the Lambda function.
    • outputDialogMode: Depends on how you have configured your bot; its value can be Text or Voice.
    • messageVersion: The version of the message that identifies the format of the event data going into the Lambda function and the expected format of the response from a Lambda function. In the current implementation, only message version 1.0 is supported. Therefore, the console assumes the default value of 1.0 and doesn't show the message version.
    • sessionAttributes: Application-specific session attributes that the client sent in the request. It is optional.
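    Putting the event fields to work, here is a sketch of a Node.js code hook that branches on invocationSource: it validates during DialogCodeHook and closes the conversation on FulfillmentCodeHook. The bookType rule (fiction or non-fiction) and all messages are hypothetical examples, not part of the original bot; the response shapes follow the dialogAction types described in Section 2.

```javascript
// Hypothetical validation rule for the bookType slot.
const ALLOWED_BOOK_TYPES = ["fiction", "non-fiction"];

function handler(event, context, callback) {
  const { name, slots } = event.currentIntent;
  const sessionAttributes = event.sessionAttributes || {};

  if (event.invocationSource === "DialogCodeHook") {
    // Validation: reject an unknown bookType and ask for it again.
    if (slots.bookType && !ALLOWED_BOOK_TYPES.includes(slots.bookType.toLowerCase())) {
      return callback(null, {
        sessionAttributes,
        dialogAction: {
          type: "ElicitSlot",
          intentName: name,
          slots: Object.assign({}, slots, { bookType: null }),
          slotToElicit: "bookType",
          message: { contentType: "PlainText", content: "Would you like fiction or non-fiction?" },
        },
      });
    }
    // Nothing to correct: let Lex continue with the configured prompts.
    return callback(null, { sessionAttributes, dialogAction: { type: "Delegate", slots } });
  }

  // FulfillmentCodeHook: Lex only sends this once all required slots are filled.
  return callback(null, {
    sessionAttributes,
    dialogAction: {
      type: "Close",
      fulfillmentState: "Fulfilled",
      message: { contentType: "PlainText", content: `Ordered ${slots.bookName}.` },
    },
  });
}

// In the deployed Lambda file, export it as: exports.handler = handler;
```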

    2. Context:

    AWS Lambda uses this parameter to provide runtime information about the Lambda function that is executing. Some useful information we can get from the context object:

    • The time remaining before AWS Lambda terminates the Lambda function.
    • The CloudWatch log stream associated with the Lambda function that is executing.
    • The AWS request ID returned to the client that invoked the Lambda function which can be used for any follow-up inquiry with AWS support.
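    A sketch of reading those three details inside a Node.js handler. awsRequestId, logStreamName and getRemainingTimeInMillis() are members of the Node.js Lambda context object; describeInvocation is a hypothetical helper and the 1000 ms threshold is an arbitrary example.

```javascript
// Collects the context details listed above into one object, e.g. for logging.
function describeInvocation(context) {
  const remainingMs = context.getRemainingTimeInMillis();
  return {
    requestId: context.awsRequestId,  // quote this in AWS support inquiries
    logStream: context.logStreamName, // CloudWatch log stream for this invocation
    remainingMs,                      // time left before Lambda terminates the function
    nearTimeout: remainingMs < 1000,  // e.g. skip slow calls when time is short
  };
}
```

    Inside the handler, console.log(describeInvocation(context)) writes these details to the function's CloudWatch log stream.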

    Section 2: Response Format

    Amazon Lex expects a response from a Lambda function in the following format:

    {  
    sessionAttributes: {},  
    dialogAction: {   
    type: "ElicitIntent/ ElicitSlot/ ConfirmIntent/ Delegate/ Close",
    <structure based on type> 
    }
    }

    The response consists of two fields. The sessionAttributes field is optional; the dialogAction field is required. The contents of the dialogAction field depend on the value of the type field.

    • sessionAttributes: This is an optional field and can be empty. If the function has to send something back to the client, it should be passed under sessionAttributes. We will see its use case in Section 4.
    • dialogAction (required): The type of this field defines the next course of action. The five types of dialogAction are explained below:

    1) Close: Informs Amazon Lex not to expect a response from the user. This is the case when all slots get filled. If you don’t specify a message, Amazon Lex uses the goodbye message or the follow-up message configured for the intent.

    dialogAction: {   
    type: "Close",   
    fulfillmentState: "Fulfilled/ Failed", // (required)   
    message: { // (optional)     
    contentType: "PlainText or SSML",     
    content: "Message to convey to the user"   
    } 
    }

    2) ConfirmIntent: Informs Amazon Lex that the user is expected to give a yes or no answer to confirm or deny the current intent. The slots field must contain an entry for each of the slots configured for the specified intent. If the value of a slot is unknown, you must set it to null. The message and responseCard fields are optional.

    dialogAction: {
      type: "ConfirmIntent",
      intentName: "orderBook",
      slots: {
        bookName: "value",
        bookType: "value"
      },
      message: { // (optional)
        contentType: "PlainText or SSML",
        content: "Message to convey to the user"
      }
    }

    3) Delegate: Directs Amazon Lex to choose the next course of action based on the bot configuration. The response must include any session attributes, and the slots field must include all of the slots specified for the requested intent. If the value of a slot is unknown, you must set it to null. You will get a DependencyFailedException if your fulfilment function returns the Delegate dialog action without removing any slots.

    dialogAction: {   
    type: "Delegate",   
    slots: {     
      slot1: "value",     
      slot2: "value"   
     } 
     }

    4) ElicitIntent: Informs Amazon Lex that the user is expected to respond with an utterance that includes an intent. For example, "I want to buy a book" indicates the orderBook intent. The utterance "book", on the other hand, is not sufficient for Amazon Lex to infer the user's intent.

    dialogAction: {
      type: "ElicitIntent",
      message: { // (optional)
        contentType: "PlainText or SSML",
        content: "Message to convey to the user"
      }
    }

    5) ElicitSlot: Informs Amazon Lex that the user is expected to provide a slot value in the response. In the structure below, we are telling Amazon Lex that the user's response should provide a value for the slot named 'bookName'.

    dialogAction: {
      type: "ElicitSlot",
      intentName: "orderBook",
      slots: {
        bookName: null,
        bookType: "fiction"
      },
      slotToElicit: "bookName",
      message: { // (optional)
        contentType: "PlainText or SSML",
        content: "Message to convey to the user"
      }
    }
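    Since the five shapes differ only in a few fields, handlers often wrap them in small builder functions. A sketch with hypothetical helper names; message is optional everywhere and, for simplicity, is always sent as PlainText here.

```javascript
// Adds a PlainText message to a dialogAction only when one is given.
function withMessage(action, message) {
  if (message) action.message = { contentType: "PlainText", content: message };
  return action;
}

// One builder per dialogAction type described above.
const responses = {
  close: (sessionAttributes, fulfillmentState, message) => ({
    sessionAttributes,
    dialogAction: withMessage({ type: "Close", fulfillmentState }, message),
  }),
  confirmIntent: (sessionAttributes, intentName, slots, message) => ({
    sessionAttributes,
    dialogAction: withMessage({ type: "ConfirmIntent", intentName, slots }, message),
  }),
  delegate: (sessionAttributes, slots) => ({
    sessionAttributes,
    dialogAction: { type: "Delegate", slots },
  }),
  elicitIntent: (sessionAttributes, message) => ({
    sessionAttributes,
    dialogAction: withMessage({ type: "ElicitIntent" }, message),
  }),
  elicitSlot: (sessionAttributes, intentName, slots, slotToElicit, message) => ({
    sessionAttributes,
    dialogAction: withMessage({ type: "ElicitSlot", intentName, slots, slotToElicit }, message),
  }),
};
```

    Leaving out message lets Amazon Lex fall back to the prompts configured for the intent, as described for Close above.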

    Section 3: Managing Conversation Context

    Conversation context is the information that a user, your application, or a Lambda function provides to an Amazon Lex bot to fulfill an intent. Conversation context includes slot data that the user provides, request attributes set by the client application, and session attributes that the client application and Lambda functions create.

    1. Setting session timeout

    Session timeout is the length of time that a conversation session lasts. For in-progress conversations, Amazon Lex retains the context information, slot data and session attributes until the session ends. The default session timeout is 5 minutes, but it can be set to any value up to 24 hours when creating the bot in the Amazon Lex console.

    2. Setting session attributes

    Session attributes contain application-specific information that is passed between a bot and a client application during a session. Amazon Lex passes session attributes to all Lambda functions configured for a bot. If a Lambda function adds or updates session attributes, Amazon Lex passes the new information back to the client application.

    Session attributes persist for the duration of the session. Amazon Lex stores them in an encrypted data store until the session ends.

    3. Sharing information between intents

    If you have created a bot with more than one intent, information can be shared between intents using session attributes. Attributes defined while fulfilling one intent can be used in other intents.

    For example, a user of the book ordering bot starts by ordering books. The bot engages in a conversation with the user, gathering slot data such as the book name and quantity. When the user places an order, the Lambda function that fulfills the order sets the lastConfirmedReservation session attribute, containing information about the ordered book, and currentReservationPrice, containing the price of the book. So, when the user also fulfills the orderMagazine intent, the final price is calculated on the basis of currentReservationPrice.
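    A rough sketch of how the two Lambda handlers could share this data, assuming Python code hooks and the Lex V1 event and response format. Session attribute values are strings, so the slot data is JSON-encoded before it is stored. The function names fulfill_order_book and final_price, and the book_price/magazine_price arguments, are hypothetical; the post does not say how prices are looked up.

```python
import json


def fulfill_order_book(event: dict, book_price: float) -> dict:
    """orderBook fulfillment: remember the order and its price for later intents."""
    attrs = dict(event.get("sessionAttributes") or {})
    slots = event["currentIntent"]["slots"]
    # Session attributes are a string-to-string map, so encode structured data as JSON.
    attrs["lastConfirmedReservation"] = json.dumps(slots)
    attrs["currentReservationPrice"] = str(book_price)
    return {
        "sessionAttributes": attrs,
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": "Your book is ordered."},
        },
    }


def final_price(event: dict, magazine_price: float) -> float:
    """orderMagazine fulfillment: add the magazine to the price saved by orderBook."""
    attrs = event.get("sessionAttributes") or {}
    previous = float(attrs.get("currentReservationPrice", "0"))
    return previous + magazine_price
```

    Because Lex sends the stored attributes back with every later event in the session, the orderMagazine handler can read currentReservationPrice without any external database.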


    Section 4: Example

    The details of the example bot are below:

    Bot Name: PurchaseBot

    Intents:

    • orderBook – bookName, bookType
    • orderMagazine – magazineName, issueMonth

    Session attributes set while fulfilling the intent “orderBook” are:

    1. lastConfirmedReservation: In this variable, we are storing slot values corresponding to intent orderBook.
    2. currentReservationPrice: Book price is calculated and stored in this variable

    When the orderBook intent is fulfilled, the bot asks the user whether they also want to order a magazine. If the user confirms, the bot starts fulfilling the “orderMagazine” intent.
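    One way to ask this follow-up question is to return a ConfirmIntent dialog action that names orderMagazine and carries the session attributes along; the user's yes/no answer then arrives in the next event as currentIntent.confirmationStatus. This is a sketch under that assumption (Lex V1 format, Python code hook); the post does not show how the example bot implements this step, and the helper names offer_magazine and user_confirmed are hypothetical.

```python
def offer_magazine(session_attributes: dict) -> dict:
    """After orderBook, ask whether the user also wants the orderMagazine intent."""
    return {
        # Pass the attributes through so orderMagazine can read currentReservationPrice.
        "sessionAttributes": session_attributes,
        "dialogAction": {
            "type": "ConfirmIntent",
            "intentName": "orderMagazine",
            "slots": {"magazineName": None, "issueMonth": None},
            "message": {"contentType": "PlainText",
                        "content": "Would you also like to order a magazine?"},
        },
    }


def user_confirmed(event: dict) -> bool:
    """Lex reports the answer to a ConfirmIntent prompt in confirmationStatus."""
    return event["currentIntent"].get("confirmationStatus") == "Confirmed"
```

    If the status is "Denied", the handler would typically close the conversation instead of eliciting magazineName.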

    Conclusion

    AWS Lambda functions are used as code hooks for your Amazon Lex bot. In your intent configuration, you can specify Lambda functions to perform initialization and validation, fulfillment, or both. This post provided technical insight into how Amazon Lex works and how it communicates with Lambda functions, and explained how conversation context is maintained using session attributes. I hope you find the information useful.

  • Acquiring Temporary AWS Credentials with Browser Navigated Authentication

    In one of my previous blog posts (Hacking your way around AWS IAM Roles), we demonstrated how users can access AWS resources without having to store AWS credentials on disk. This was achieved by setting up an OpenVPN server and a client-side route that gets pushed automatically when the user connects to the VPN. To this day, I find this a compliance-friendly solution that doesn’t force users to do any manual configuration on their systems. It also makes sense to grant access to AWS resources only while users are connected to the VPN. One of the downsides of this method is maintaining the OpenVPN server, keeping it secure, and running it in a highly available (HA) state. If the OpenVPN server is compromised, our credentials are at stake. Secondly, all users connected to the VPN get the same level of access.

    In this blog post, we present a CLI utility written in Rust that writes temporary AWS credentials to a user profile (in the ~/.aws/credentials file) using browser-based Google authentication. This utility is inspired by gimme-aws-creds (written in Python for Okta-authenticated AWS farms) and the Heroku CLI (written in Node.js using the oclif framework). We will refer to our utility as auth-awscreds throughout this post.

    “If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.”

    – George Bernard Shaw

    What does this CLI utility (auth-awscreds) do?

    When the user runs the auth-awscreds command in the terminal, our program reads its configuration from the file .auth-awscreds in the user’s home directory. If this file is not present, the utility prompts the user to set up the configuration for the first time. The configuration file is in INI format. The program then opens the default web browser and navigates to the URL read from the configuration file. At this point, the utility waits while the user authenticates in the browser: the web UI navigates to Google authentication, and if authentication is successful, it calls back the CLI utility with temporary AWS credentials, which are then written to the ~/.aws/credentials file.
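    The first-run configuration step can be sketched in Python with the standard configparser module. This is not the utility's Rust code; the key names profile_name and app_domain are assumptions, since the post only says that both values are stored under the default section of the INI file.

```python
import configparser
from pathlib import Path


def load_or_prompt_config(path: Path, ask=input) -> tuple:
    """Return (profile_name, app_domain), prompting and saving them on first run."""
    config = configparser.ConfigParser(interpolation=None)
    if path.exists():
        config.read(path)
    else:
        # An empty answer falls back to the "default" AWS profile, as the prompt suggests.
        profile = ask("Which profile to write AWS Credentials [default] : ").strip() or "default"
        domain = ask("App Domain : ").strip()
        # Hypothetical key names; the post only says both values go under [default].
        config["default"] = {"profile_name": profile, "app_domain": domain}
        with path.open("w") as fh:
            config.write(fh)
    section = config["default"]
    return section["profile_name"], section["app_domain"]
```

    Typical usage would be profile, domain = load_or_prompt_config(Path.home() / ".auth-awscreds"); the ask parameter only exists so the prompts can be replaced in tests.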

    Block Diagram

    Tech Stack Used

    As stated earlier, we wrote this utility in Rust. One of the reasons for choosing Rust is that we wanted a compiled native binary (ELF) that runs without an interpreter and ships as-is once compiled. Programs written in Python or Node.js, by contrast, need a language interpreter and supporting libraries installed. Go would also have served our purpose, but I prefer Rust over Go.

    Software Stack:

    • Rust (for CLI utility)
    • Actix Web – HTTP Server
    • Node.js, Express, ReactJS, serverless-http, aws-sdk, AWS Amplify, axios
    • Terraform and serverless framework

    Infrastructure Stack:

    • AWS Cognito (User Pool and Federated Identities)
    • AWS API Gateway (HTTP API)
    • AWS Lambda
    • AWS S3 Bucket (React App)
    • AWS CloudFront (For Serving React App)
    • AWS ACM (SSL Certificate)

    Recipe

    Architecture Diagram

    CLI Utility: auth-awscreds

    Our goal is this: when the auth-awscreds command is fired, we first check whether the ~/.aws/credentials file exists in the user’s home directory. If not, we create the ~/.aws directory. This is the default AWS credentials directory, where the AWS SDK usually looks for credentials (unless explicitly overridden by the AWS_SHARED_CREDENTIALS_FILE environment variable). The next step is to check whether a ~/.auth-awscreds file exists. If it doesn’t, we prompt the user for two inputs:

    1. AWS credentials profile name (used by SDK, default is preferred) 

    2. Application domain URL (Our backend app domain is used for authentication)

    // Requires: use std::io::{self, Write}; use std::path::Path;
    let app_profile_file = format!("{}/.auth-awscreds", &user_home_dir);

    let config_exist: bool = Path::new(&app_profile_file).exists();

    let mut profile_name = String::new();
    let mut app_domain = String::new();

    if !config_exist {
        // ask the series of questions
        print!("Which profile to write AWS Credentials [default] : ");
        io::stdout().flush().unwrap();
        io::stdin()
            .read_line(&mut profile_name)
            .expect("Failed to read line");

        print!("App Domain : ");
        io::stdout().flush().unwrap();
        io::stdin()
            .read_line(&mut app_domain)
            .expect("Failed to read line");

        profile_name = String::from(profile_name.trim());
        app_domain = String::from(app_domain.trim());
        // An empty answer falls back to the "default" profile shown in the prompt
        if profile_name.is_empty() {
            profile_name = String::from("default");
        }

        config_profile(&profile_name, &app_domain);
    } else {
        (profile_name, app_domain) = read_profile();
    }

    These two properties are written to ~/.auth-awscreds under the default section. Next, our utility generates a 1024-bit RSA key pair (public and private key), and both keys are base64-encoded.

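    use openssl::rsa::Rsa;
    use openssl::symm::Cipher;

    // Generates a 1024-bit RSA key pair and returns (private, public) as base64-encoded PEM.
    // The private key PEM is encrypted with AES-128-CBC under a fixed passphrase.
    pub fn genkeypairs() -> (String, String) {
        let rsa = Rsa::generate(1024).unwrap();

        let private_key: Vec<u8> = rsa
            .private_key_to_pem_passphrase(Cipher::aes_128_cbc(), "Sagar Barai".as_bytes())
            .unwrap();
        let public_key: Vec<u8> = rsa.public_key_to_pem().unwrap();

        (base64::encode(private_key), base64::encode(public_key))
    }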

    We then launch a browser window and navigate to the specified app domain URL. At this stage, our utility starts a temporary web server with the help of the Actix Web framework, listening on port 63442 of localhost.

    println!("Opening web ui for authentication...!");
    open::that(&app_domain).unwrap();

    HttpServer::new(move || {
        let cors = Cors::permissive();
        App::new()
            .wrap(cors)
            .app_data(crypto_data.clone())
            .service(get_public_key)
            .service(set_aws_creds)
    })
    .bind(("127.0.0.1", 63442))?
    .run()
    .await

    The localhost web server has two endpoints.

    1. GET Endpoint (/publickey): This endpoint is called by our React app after authentication and returns the public key created during initialization. Since the web server hosted by the Rust application is plain HTTP (no SSL), the AWS credentials must be posted to it as encrypted strings, using this public key.

    #[get("/publickey")]
    pub async fn get_public_key(data: web::Data<AppData>) -> impl Responder {
        let public_key = &data.public_key;

        web::Json(HTTPResponseData {
            status: 200,
            msg: String::from("Ok"),
            success: true,
            data: String::from(public_key)
        })
    }

    2. POST Endpoint (/setcreds): This endpoint is called when the React app has successfully retrieved credentials from API Gateway. The credentials are decrypted with the private key and then written to the ~/.aws/credentials file under the profile name defined in the utility configuration.

    let encrypted_data = payload["data"].as_array().unwrap();
    let username = payload["username"].as_str().unwrap();

    let mut decypted_payload = vec![];

    // Each element is one base64-encoded RSA ciphertext block: decrypt it and append in order
    for chunk in encrypted_data.iter() {
        let s = chunk.as_str().unwrap();
        let decrypted = decrypt_data(&private_key, &s.to_string());
        decypted_payload.extend_from_slice(&decrypted);
    }

    let credentials: serde_json::Value =
        serde_json::from_str(&String::from_utf8(decypted_payload).unwrap()).unwrap();

    let aws_creds = AWSCreds {
        profile_name: String::from(profile_name),
        aws_access_key_id: String::from(credentials["AccessKeyId"].as_str().unwrap()),
        aws_secret_access_key: String::from(credentials["SecretAccessKey"].as_str().unwrap()),
        aws_session_token: String::from(credentials["SessionToken"].as_str().unwrap())
    };

    println!("Authenticated as {}", username);
    println!("Updating AWS Credentials File...!");

    configcreds(&aws_creds);

    One of the interesting parts of this code is the decryption process, which iterates through an array of strings and joins the decrypted pieces with decypted_payload.extend_from_slice(&decrypted);. With RSA-1024 the modulus is 128 bytes, so every ciphertext block is 128 bytes long. We used OAEP padding, which (with SHA-1) takes 42 bytes of each block, so at most 86 bytes of plaintext can be encrypted per block. The credentials therefore arrive as an array of base64-encoded 128-byte ciphertexts; each string has to be base64-decoded into a data buffer and decrypted piece by piece.
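    The 86-byte figure follows from the RSA-OAEP message length limit in RFC 8017: a k-byte modulus and an hLen-byte hash leave k - 2*hLen - 2 bytes for the message. The sketch below checks that arithmetic and shows the chunk-and-rejoin pattern that the Lambda function (creds.match(/.{1,86}/g)) and the Rust loop (extend_from_slice) implement; the actual RSA encryption and decryption of each piece are left out, and the function names are our own. Note that the JavaScript regex splits by characters, which matches the byte count only because the STS credentials JSON is ASCII.

```python
import hashlib


def oaep_max_plaintext(modulus_bytes: int, hash_len: int = hashlib.sha1().digest_size) -> int:
    """Largest message RSA-OAEP can encrypt in one block: k - 2*hLen - 2 (RFC 8017)."""
    return modulus_bytes - 2 * hash_len - 2


def split_for_rsa(plaintext: bytes, chunk: int) -> list:
    """Cut plaintext into pieces small enough to encrypt as one RSA block each."""
    return [plaintext[i:i + chunk] for i in range(0, len(plaintext), chunk)]


# RSA-1024 (128-byte modulus) with SHA-1 OAEP leaves 86 bytes per block.
max_chunk = oaep_max_plaintext(128)
```

    On the receiving side, decrypting each piece in order and concatenating the results (b"".join in Python, extend_from_slice in Rust) restores the original JSON.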

    To build an optimized release binary, run: cargo build --release

    AWS Cognito and Google Authentication

    This guide does not cover how to set up Cognito and integration with Google Authentication. You can refer to our old post for a detailed guide on setting up authentication and authorization. (Refer to the sections Setup Authentication and Setup Authorization).

    React App:

    The React app is launched by our Rust CLI utility. It is served straight from the S3 bucket via CloudFront. When the React app loads, it checks whether the current session is authenticated. If not, the AWS Amplify framework redirects the app to the Cognito-hosted authentication UI, which in turn automatically redirects to the Google login page.

    render(){
       return (
         <div className="centerdiv">
           {
             this.state.appInitialised ?
               this.state.user === null ? Auth.federatedSignIn({provider: 'Google'}) :
               <Aux>
                 {this.state.pageContent}
               </Aux>
             :
             <Loader/>
           }
         </div>
       )
     }

    Once the session is authenticated, we set the React state variables and then retrieve the public key from the Actix web server (Rust CLI app: auth-awscreds) by calling its /publickey GET endpoint. Next, an Ajax POST request (/auth-creds) is made to API Gateway via the axios library. The request carries the public key and the JWT for authentication. The expected response from API Gateway is the encrypted temporary AWS credentials, which are then relayed to our CLI application.

    To ease this deployment, we have written Terraform code (available in the repository) that takes care of creating the S3 bucket, CloudFront distribution, and ACM certificate, building the React app, and deploying it to the S3 bucket. Open the vars.tf file and change the respective default variables. The Terraform script will fail on the first run because ACM needs DNS record validation; create a CNAME record for DNS validation and re-run the Terraform script to continue the deployment. The React app expects a few environment variables. Below is a sample .env file; update the values for your environment.

    REACT_APP_IDENTITY_POOL_ID=
    REACT_APP_COGNITO_REGION=
    REACT_APP_COGNITO_USER_POOL_ID=
    REACT_APP_COGNTIO_DOMAIN_NAME=
    REACT_APP_DOMAIN_NAME=
    REACT_APP_CLIENT_ID=
    REACT_APP_CLI_APP_URL=
    REACT_APP_API_APP_URL=

    Finally, deploy the React app using the sample commands below.

    $ terraform init               #initialize providers and modules
    $ terraform plan -out plan     #creates plan for revision
    $ terraform apply plan         #apply plan and deploy

    API Gateway HTTP API and Lambda Function

    When a request first reaches API Gateway, API Gateway validates the JWT on its own, since it natively supports Cognito integration. Thus, any request with an invalid authorization header is rejected at API Gateway itself. This simplifies our authentication process and verifies the caller’s identity. If the request is valid, it is passed on to our Lambda function, which is a Node.js Express app wrapped with the serverless-http framework. The Express app has only one endpoint.

    /auth-creds (POST): once a request is received, the function retrieves the caller’s identity ID from Cognito and logs it to stdout for auditing purposes.

    const { CognitoIdentity } = require('aws-sdk'); // aws-sdk v2 (uses .promise())

    let identityParams = {
        IdentityPoolId: process.env.IDENTITY_POOL_ID,
        Logins: {}
    };

    // Map the Cognito user pool provider name to the caller's JWT
    identityParams.Logins[`${process.env.COGNITOIDP}`] = req.headers.authorization;

    const ci = new CognitoIdentity({region : process.env.AWSREGION});

    let idpResponse = await ci.getId(identityParams).promise();

    console.log("Auth Creds Request Received from ", JSON.stringify(idpResponse));

    The app then decodes the base64-encoded public key sent in the request. Next, it calls AWS STS (Security Token Service) AssumeRole to obtain temporary credentials, which are then encrypted with the public key in chunks of 86 bytes.

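    const { publicEncrypt } = require('crypto'); // defaults to RSA_PKCS1_OAEP_PADDING
    // sts: an aws-sdk v2 STS client; public_key: the base64 key sent by the React app

    const pemPublicKey = Buffer.from(public_key, 'base64').toString();

    const authdata = await sts.assumeRole({
        ExternalId: process.env.STS_EXTERNAL_ID,
        RoleArn: process.env.IAM_ROLE_ARN,
        RoleSessionName: "DemoAWSAuthSession"
    }).promise();

    // Split the JSON into pieces of at most 86 characters (ASCII, so at most 86 bytes)
    const creds = JSON.stringify(authdata.Credentials);
    const splitData = creds.match(/.{1,86}/g);

    const encryptedData = splitData.map(d => {
        return publicEncrypt(pemPublicKey, Buffer.from(d)).toString('base64');
    });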

    Here, assumeRole assumes an IAM role that has the appropriate policies attached. For the sake of this demo, we attached the AdministratorAccess managed policy. However, you should harden the policy document and avoid attaching the Administrator policy directly to the role.

    resources:
     Resources:
       AuthCredsAssumeRole:
         Type: AWS::IAM::Role
         Properties:
           AssumeRolePolicyDocument:
             Version: "2012-10-17"
             Statement:
               -
                 Effect: Allow
                 Principal:
                   AWS: !GetAtt IamRoleLambdaExecution.Arn
                 Action: sts:AssumeRole
                 Condition:
                   StringEquals:
                     sts:ExternalId: ${env:STS_EXTERNAL_ID}
           RoleName: auth-awscreds-api
           ManagedPolicyArns:
             - arn:aws:iam::aws:policy/AdministratorAccess

    Finally, the response is sent to the React app. 

    We used the Serverless Framework to deploy the API. It creates the API Gateway, Lambda function, Lambda layer, and IAM role, and takes care of deploying the code to the Lambda function.

    To deploy this application, follow the steps below.

    1. cd layer/nodejs && npm install && cd ../.. && npm install

    2. npm install -g serverless (alternatively, skip this step and use the npx serverless command instead)

    3. Create a .env file, add the environment variables below to it, and set their respective values.

    AWSREGION=ap-south-1
    COGNITO_USER_POOL_ID=
    IDENTITY_POOL_ID=
    COGNITOIDP=
    APP_CLIENT_ID=
    STS_EXTERNAL_ID=
    IAM_ROLE_ARN=
    DEPLOYMENT_BUCKET=
    APP_DOMAIN=

    4. serverless deploy or npx serverless deploy

    The entire codebase for the CLI app, React app, and backend API is available in the GitHub repository.

    Testing:

    Assuming that the compiled binary (auth-awscreds) is available on your local machine and that you have installed the AWS CLI for testing, run /path/to/your/auth-awscreds.

    App Testing

    If you chose “demo-awscreds” as your AWS profile name, export the AWS_PROFILE environment variable (for example, export AWS_PROFILE=demo-awscreds). If you chose the “default” profile, you don’t need to export anything, as the AWS SDK selects the “default” profile on its own.

    [demo-awscreds]
    aws_access_key_id=ASIAUAOF2CHC77SJUPZU
    aws_secret_access_key=r21J4vwPDnDYWiwdyJe3ET+yhyzFEj7Wi1XxdIaq
    aws_session_token=FwoGZXIvYXdzEIj//////////wEaDHVLdvxSNEqaQZPPQyK2AeuaSlfAGtgaV1q2aKBCvK9c8GCJqcRLlNrixCAFga9n+9Vsh/5AWV2fmea6HwWGqGYU9uUr3mqTSFfh+6/9VQH3RTTwfWEnQONuZ6+E7KT9vYxPockyIZku2hjAUtx9dSyBvOHpIn2muMFmizZH/8EvcZFuzxFrbcy0LyLFHt2HI/gy9k6bLCMbcG9w7Ej2l8vfF3dQ6y1peVOQ5Q8dDMahhS+CMm1q/T1TdNeoon7mgqKGruO4KJrKiZoGMi1JZvXeEIVGiGAW0ro0/Vlp8DY1MaL7Af8BlWI1ZuJJwDJXbEi2Y7rHme5JjbA=
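    A profile section like the one above is what the utility's configcreds step produces. A minimal Python equivalent, assuming the standard configparser module is acceptable for this file (it rewrites the whole file and drops any comments); write_aws_profile is a hypothetical name, not the Rust function from the repository.

```python
import configparser
from pathlib import Path


def write_aws_profile(path: Path, profile: str, access_key_id: str,
                      secret_access_key: str, session_token: str) -> None:
    """Create or replace one profile section in an AWS shared credentials file."""
    # interpolation=None so '%' characters in secrets are stored literally
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is silently ignored
    config[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "aws_session_token": session_token,
    }
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create ~/.aws on first run
    with path.open("w") as fh:
        config.write(fh)
```

    Other profiles already in the file are preserved, because the file is read before the target section is replaced.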

    To validate, run “aws s3 ls”. You should see the S3 buckets in your AWS account listed. Note that these credentials are valid for only 60 minutes, which means you will have to re-run the command to acquire a new set of AWS credentials. You can, of course, configure your IAM role to allow a longer session duration for AssumeRole.

    auth-awscreds in Action:

    Summary

    Currently, “auth-awscreds” is at an early stage of development. This post demonstrates how AWS credentials can be acquired temporarily without having to worry about key rotation. One of the features we are currently working on is RBAC, with the help of AWS Cognito. Since the tool doesn’t yet support any command-line arguments, the utility configuration can’t be changed from the CLI; you can manually edit or delete the configuration file, and deleting it triggers the configuration prompt on the next run. We also want to add support for multiple profiles so that multiple AWS accounts can be used.

  • A Comprehensive Tutorial to Implementing OpenTracing With Jaeger

    Introduction

    Recently, there has been a lot of discussion around OpenTracing. We’ll start this blog by introducing OpenTracing, explaining what it is and why it is gaining attention. Next, we will discuss the distributed tracing system Jaeger and how it helps in troubleshooting microservices-based distributed systems. We will also set up Jaeger and learn to use it for monitoring and troubleshooting purposes.

    Drift to Microservice Architecture

    Microservice architecture has now become the obvious choice for application developers. In a microservice architecture, a monolithic application is broken down into a group of independently deployed services; in simple words, an application becomes a collection of microservices. When many such intertwined microservices work together, it’s almost impossible to map their interdependencies and understand how a request is executed.

    When a monolithic application fails, it is relatively easy to do root cause analysis and trace the path of a transaction using a logging framework. In a microservice architecture, however, logging alone fails to deliver the complete picture.

    Is this service called first in the chain? How do I span all these services to get insight into the application? Questions like these make debugging a set of interdependent distributed services a significantly larger problem than debugging a single monolithic application, which is why OpenTracing is becoming more and more popular.

    OpenTracing

    What is Distributed Tracing?

    Distributed tracing is a method used to monitor applications, mostly those built using the microservices architecture. Distributed tracing helps to highlight what causes poor performance and where failures occur.

    How Does OpenTracing Fit Into This?

    The OpenTracing API provides a standard, vendor-neutral framework for instrumentation. This means that if a developer wants to try out a different distributed tracing system, then instead of repeating the whole instrumentation process for the new system, the developer can simply change the Tracer configuration.

    OpenTracing uses basic terminology such as Span and Trace. You can read about these in detail in the OpenTracing documentation.

    OpenTracing is a way for services to “describe and propagate distributed traces without knowledge of the underlying OpenTracing implementation.”

    Take the example of renting a movie on a service like iTunes. A service like this relies on several other microservices to check that the movie is available, that valid payment credentials have been received, and that enough space exists on the viewer’s device for the download. If any one of those microservices fails, the entire transaction fails. In such a case, logs from the main rental service alone wouldn’t be very useful for debugging. However, if you could analyze each service, you wouldn’t have to scratch your head to work out which microservice failed and what made it fail.

    In real life, applications are even more complex, and as their complexity increases, monitoring them becomes a tedious task. OpenTracing helps us easily monitor:

    • Spans of services
    • Time taken by each service
    • Latency between the services
    • Hierarchy of services
    • Errors or exceptions during execution of each service.

    Jaeger: A Distributed Tracing System by Uber

    Jaeger is an open source distributed tracing system released by Uber Technologies. It is used for monitoring and troubleshooting microservices-based distributed systems, including:

    • Distributed transaction monitoring
    • Performance and latency optimization
    • Root cause analysis
    • Service dependency analysis
    • Distributed context propagation

    Major Components of Jaeger

    1. Jaeger Client Libraries
    2. Agent
    3. Collector
    4. Query
    5. Ingester

    Running Jaeger in a Docker Container

    1. First, install the Jaeger Python client on your machine:

    $ pip install jaeger-client

    2.  Now, let’s run Jaeger backend as an all-in-one Docker image. The image launches the Jaeger UI, collector, query, and agent:

    $ docker run -d -p6831:6831/udp -p16686:16686 jaegertracing/all-in-one:latest

    TIP: To check whether the Docker container is running, use: docker ps

    Once the container starts, open http://localhost:16686/ to access the Jaeger UI. The container runs the Jaeger backend with an in-memory store, which starts out empty, so there is not much we can do with the UI until we send it some traces.

    Creating Traces on Jaeger UI

    1. Create a Python program that generates traces:

    Let’s generate some traces using a simple Python program. You can clone the Jaeger-Opentracing repository for the sample program used in this blog.

    import sys
    import time
    import logging
    import random
    from jaeger_client import Config
    from opentracing_instrumentation.request_context import get_current_span, span_in_context
    
    def init_tracer(service):
        logging.getLogger('').handlers = []
        logging.basicConfig(format='%(message)s', level=logging.DEBUG)    
        config = Config(
            config={
                'sampler': {
                    'type': 'const',
                    'param': 1,
                },
                'logging': True,
            },
            service_name=service,
        )
        return config.initialize_tracer()
    
    def booking_mgr(movie):
        with tracer.start_span('booking') as span:
            span.set_tag('Movie', movie)
            with span_in_context(span):
                cinema_details = check_cinema(movie)
                showtime_details = check_showtime(cinema_details)
                book_show(showtime_details)
    
    def check_cinema(movie):
        with tracer.start_span('CheckCinema', child_of=get_current_span()) as span:
            with span_in_context(span):
                num = random.randint(1,30)
                time.sleep(num)
                cinema_details = "Cinema Details"
                flags = ['false', 'true', 'false']
                random_flag = random.choice(flags)
                span.set_tag('error', random_flag)
                span.log_kv({'event': 'CheckCinema' , 'value': cinema_details })
                return cinema_details
    
    def check_showtime( cinema_details ):
        with tracer.start_span('CheckShowtime', child_of=get_current_span()) as span:
            with span_in_context(span):
                num = random.randint(1,30)
                time.sleep(num)
                showtime_details = "Showtime Details"
                flags = ['false', 'true', 'false']
                random_flag = random.choice(flags)
                span.set_tag('error', random_flag)
                span.log_kv({'event': 'CheckShowtime', 'value': showtime_details})
                return showtime_details
    
    def book_show(showtime_details):
        with tracer.start_span('BookShow', child_of=get_current_span()) as span:
            with span_in_context(span):
                num = random.randint(1,30)
                time.sleep(num)
                ticket_details = "Ticket Details"
                flags = ['false', 'true', 'false']
                random_flag = random.choice(flags)
                span.set_tag('error', random_flag)
                span.log_kv({'event': 'BookShow', 'value': ticket_details})
                print(ticket_details)
    
    assert len(sys.argv) == 2
    tracer = init_tracer('booking')
    movie = sys.argv[1]
    booking_mgr(movie)
    # yield to IOLoop to flush the spans
    time.sleep(2)
    tracer.close()

    The Python program takes a movie name as an argument and calls three functions that get the cinema details, movie showtime details, and finally book a movie ticket.

    The program adds random delays to all the functions to make it more interesting, since in reality the functions would take some time to fetch the details. Each function also randomly tags its span with error=true, to give us a feel for how the traces of a real-life application look when failures occur.

    Here is a brief description of how OpenTracing has been used in the program:

    • Initializing a tracer:
    def init_tracer(service):
        logging.getLogger('').handlers = []
        logging.basicConfig(format='%(message)s', level=logging.DEBUG)
        config = Config(
            config={
                'sampler': {
                    'type': 'const',
                    'param': 1,
                },
                'logging': True,
            },
            service_name=service,
        )
        return config.initialize_tracer()

    • Using the tracer instance:
    tracer = init_tracer('booking')

    • Starting new child spans using start_span:  
    with tracer.start_span('CheckCinema', child_of=get_current_span()) as span:

    • Using Tags:
    span.set_tag('Movie', movie)

    • Using Logs:
    span.log_kv({'event': 'CheckCinema' , 'value': cinema_details })

    2. Run the Python program:

    $ python booking-mgr.py <movie-name>
    
    Initializing Jaeger Tracer with UDP reporter
    Using sampler ConstSampler(True)
    opentracing.tracer initialized to <jaeger_client.tracer.Tracer object at 0x7f72ffa25b50>[app_name=booking]
    Reporting span cfe1cc4b355aacd9:8d6da6e9161f32ac:cfe1cc4b355aacd9:1 booking.CheckCinema
    Reporting span cfe1cc4b355aacd9:88d294b85345ac7b:cfe1cc4b355aacd9:1 booking.CheckShowtime
    Ticket Details
    Reporting span cfe1cc4b355aacd9:98cbfafca3aa0fe2:cfe1cc4b355aacd9:1 booking.BookShow
    Reporting span cfe1cc4b355aacd9:cfe1cc4b355aacd9:0:1 booking.booking
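    Each “Reporting span” line above carries the span context as trace_id:span_id:parent_span_id:flags, with 0 as the parent of the root span, which is Jaeger's span-context format. As a small illustration, the span hierarchy can be rebuilt from these log lines; the regex assumes exactly the line layout shown above, and span_tree is a hypothetical helper, not part of jaeger_client.

```python
import re
from collections import defaultdict

# Assumed layout (from the output above): trace:span:parent:flags service.operation
SPAN_LINE = re.compile(
    r"Reporting span (?P<trace>[0-9a-f]+):(?P<span>[0-9a-f]+):(?P<parent>[0-9a-f]+):(?P<flags>\d+) "
    r"(?P<service>[^.\s]+)\.(?P<operation>\S+)"
)


def span_tree(log_text: str) -> dict:
    """Map each parent operation name to its child operation names ('' = no parent)."""
    spans = {}
    for m in SPAN_LINE.finditer(log_text):
        spans[m["span"]] = (m["parent"], m["operation"])
    tree = defaultdict(list)
    for parent, operation in spans.values():
        # The root's parent id (0) never appears as a span id, so it maps to ''.
        parent_op = spans[parent][1] if parent in spans else ""
        tree[parent_op].append(operation)
    return dict(tree)
```

    Notice that the root span “booking” is reported last: a span is reported only when it finishes, and the root finishes after all of its children.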

    Now check your Jaeger UI: you should see a new service, “booking”. Select the service and click “Find Traces” to see its traces. Every run of the program creates a new trace.

    You can now compare the durations of traces in the graph at the top of the search results. You can also filter traces using the “Tags” field under “Find Traces”. For example, entering the tag “error=true” restricts the results to traces that contain errors.

    To view the detailed trace, you can select a specific trace instance and check details like the time taken by each service, errors during execution and logs.

    A trace instance like this has four spans: the root span “booking”, followed by “CheckCinema”, “CheckShowtime”, and “BookShow”. In the trace instance we examined, both “CheckCinema” and “CheckShowtime” reported errors, marked by the error=true tag.

    Conclusion

    In this blog, we’ve described the importance and benefits of OpenTracing, one of the core pillars of modern applications. We also explored how the distributed tracer Jaeger collects and stores traces while revealing inefficient portions of our applications. Jaeger is fully compatible with the OpenTracing API and has client libraries for many programming languages, including Java, Go, Node.js, Python, PHP, and more.

    References

    • https://www.jaegertracing.io/docs/1.9/
    • https://opentracing.io/docs/
  • ClickHouse – The Newest Data Store in Your Big Data Arsenal

    ClickHouse

    ClickHouse is an open-source, column-oriented data warehouse for online analytical processing (OLAP) of queries. It is fast, scalable, flexible, cost-efficient, and easy to run. It offers industry-leading query performance while significantly reducing storage requirements through its innovative use of columnar storage and compression.

    ClickHouse’s performance exceeds that of comparable column-oriented database management systems on the market. ClickHouse is a database management system, not a single database: it allows creating tables and databases at runtime, loading data, and running queries without reconfiguring or restarting the server.

    ClickHouse can process hundreds of millions to over a billion rows per second on a single server, and it runs on clusters of hundreds of nodes. It uses all available hardware to process each query as fast as possible. The peak processing performance for a single query exceeds two terabytes per second.

    What makes ClickHouse unique?

    • Data Storage & Compression: ClickHouse is designed to work on regular hard drives but takes advantage of SSDs and additional RAM when they are available. Data compression plays a crucial role in ClickHouse's performance. It provides general-purpose compression codecs as well as specialized codecs for specific kinds of data. These codecs trade CPU consumption against disk space, which helps ClickHouse outperform other databases.
    • High Performance: With vectorized query execution, data is processed in vectors (chunks of columns), which achieves high CPU efficiency. Query processing is spread across multiple cores, so large queries are parallelized naturally. ClickHouse also supports distributed query processing: data is spread across shards, and the shards execute the query in parallel.
    • Primary & Secondary Indexes: Data is physically sorted by the primary key, allowing low-latency extraction of specific values or ranges. Secondary indexes let the database determine that some parts of the data cannot match the query's filter conditions and skip them entirely, which is why they are also called data-skipping indexes.
    • Support for Approximate Calculations: ClickHouse can trade accuracy for performance. It provides aggregate functions that approximate the number of distinct values, medians, and quantiles. It can also run a query on a sample of the data, reading proportionally less data from disk and returning an approximate result.
    • Data Replication and Data Integrity Support: Data written to any available replica is copied to the remaining replicas in the background, so the system keeps identical data on several replicas. Most failures are recovered from automatically, or semi-automatically in complex scenarios.
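    The sampling idea in the approximate-calculations bullet can be modeled in a few lines. This is a simplified sketch, not ClickHouse's implementation. It assumes rows are stored ordered by a hash of the key (in ClickHouse, sampling relies on a SAMPLE BY expression that must be part of the table's primary key), uses splitmix64 as a stand-in hash, and all type and function names are hypothetical. Because rows are sorted by the hashed key, a 10% sample is a contiguous prefix. The query therefore reads roughly a tenth of the rows and scales its count back up.

    ```go
    package main

    import (
    	"fmt"
    	"sort"
    )

    // splitmix64 scrambles an integer key into a well-mixed 64-bit hash.
    // It stands in for whatever hash function a real table would sample by.
    func splitmix64(x uint64) uint64 {
    	x += 0x9E3779B97F4A7C15
    	x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9
    	x = (x ^ (x >> 27)) * 0x94D049BB133111EB
    	return x ^ (x >> 31)
    }

    // unit maps a hash onto [0, 1) using its top 53 bits.
    func unit(h uint64) float64 {
    	return float64(h>>11) / float64(uint64(1)<<53)
    }

    // sampledTable stores rows ordered by their sampling key, as if the
    // table's sort order started with the hashed key.
    type sampledTable struct {
    	ids  []uint64
    	keys []float64 // keys[i] is the sampling key of ids[i], ascending
    }

    func newSampledTable(ids []uint64) sampledTable {
    	t := sampledTable{ids: append([]uint64(nil), ids...)}
    	sort.Slice(t.ids, func(i, j int) bool {
    		return splitmix64(t.ids[i]) < splitmix64(t.ids[j])
    	})
    	t.keys = make([]float64, len(t.ids))
    	for i, id := range t.ids {
    		t.keys[i] = unit(splitmix64(id))
    	}
    	return t
    }

    // approxCount counts rows matching pred within the first `fraction` of the
    // key range and scales the count by 1/fraction. Rows are sorted by key, so
    // the sample is a prefix found by binary search and only that prefix is
    // read. It also reports how many rows were read.
    func (t sampledTable) approxCount(fraction float64, pred func(uint64) bool) (float64, int) {
    	n := sort.Search(len(t.keys), func(i int) bool { return t.keys[i] >= fraction })
    	matched := 0
    	for _, id := range t.ids[:n] {
    		if pred(id) {
    			matched++
    		}
    	}
    	return float64(matched) / fraction, n
    }

    func main() {
    	ids := make([]uint64, 100000)
    	for i := range ids {
    		ids[i] = uint64(i)
    	}
    	table := newSampledTable(ids)
    	isEven := func(id uint64) bool { return id%2 == 0 }
    	estimate, read := table.approxCount(0.1, isEven)
    	fmt.Printf("~%.0f even IDs (exact: 50000), read %d of %d rows\n", estimate, read, len(ids))
    }
    ```

    The estimate lands close to, but usually not exactly on, 50000 after reading about 10,000 rows. With a fraction of 1.0, the whole table is read and the count is exact.
    
    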

    But it can’t be all good, can it? There are some disadvantages to ClickHouse as well:

    • No full-fledged transactions.
    • Inability to efficiently and precisely change or remove previously inserted data. Batch deletes and updates are available, for example to clean up or modify data to comply with GDPR, but they are not meant for frequent, low-latency changes.
    • ClickHouse is less efficient for point queries that retrieve individual rows by their keys, due to its sparse index.
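    A toy model illustrates both the sparse primary index described earlier and the point-query cost in the last bullet. It is a sketch under simplifying assumptions, not ClickHouse's code: a single in-memory part, an integer primary key, a granule of 4 rows instead of the default index_granularity of 8192, and hypothetical names throughout. The index stores one mark per granule, so a lookup binary-searches the marks and reads only the candidate granules. A point lookup still has to read a whole granule to find one row.

    ```go
    package main

    import (
    	"fmt"
    	"sort"
    )

    // granuleSize is how many consecutive rows share one index mark. ClickHouse's
    // default index_granularity is 8192 rows; 4 keeps the example readable.
    const granuleSize = 4

    // part holds rows physically sorted by primary key plus a sparse index that
    // stores only the first key of every granule.
    type part struct {
    	keys  []int // primary-key column, ascending
    	marks []int // marks[g] == keys[g*granuleSize]
    }

    func newPart(sortedKeys []int) part {
    	p := part{keys: sortedKeys}
    	for i := 0; i < len(sortedKeys); i += granuleSize {
    		p.marks = append(p.marks, sortedKeys[i])
    	}
    	return p
    }

    // granuleRange returns the half-open range [first, last) of granules that may
    // hold keys in [lo, hi], found by binary search over the marks alone.
    func (p part) granuleRange(lo, hi int) (first, last int) {
    	// Granules that start after hi cannot match.
    	last = sort.Search(len(p.marks), func(g int) bool { return p.marks[g] > hi })
    	// The granule just before the first mark >= lo may still contain lo.
    	first = sort.Search(len(p.marks), func(g int) bool { return p.marks[g] >= lo })
    	if first > 0 {
    		first--
    	}
    	return first, last
    }

    // lookup is a point query: it must scan every row of each candidate granule.
    func (p part) lookup(key int) (found bool, rowsRead int) {
    	first, last := p.granuleRange(key, key)
    	end := last * granuleSize
    	if end > len(p.keys) {
    		end = len(p.keys)
    	}
    	for _, k := range p.keys[first*granuleSize : end] {
    		rowsRead++
    		if k == key {
    			found = true
    		}
    	}
    	return found, rowsRead
    }

    func main() {
    	p := newPart([]int{2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24})
    	found, read := p.lookup(14)
    	fmt.Println("marks:", p.marks, "found:", found, "rows read:", read)
    }
    ```

    Scaled up to 8192-row granules, the same arithmetic explains the trade-off. The index stays small enough to keep in memory, but fetching a single row means reading its entire granule.
    
    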

    ClickHouse against its contemporaries

    So, with all these distinctive features, how does ClickHouse compare with other industry-leading data storage tools? ClickHouse is general-purpose, has a variety of use cases, and has its pros and cons, so here is a high-level comparison against the best tools in their respective domains. Each tool has unique traits for its own use case, and comparing those would not be fair. What we care about most is performance, scalability, cost, and other key attributes that can be compared irrespective of the domain. So here we go:

    ClickHouse vs Snowflake:

    • With its decoupled storage & compute approach, Snowflake is able to segregate workloads and enhance performance. The search optimization service in Snowflake further improves performance for point lookups but has additional costs attached to it. ClickHouse, on the other hand, drastically improves query performance with its local runtime and inherent support for multiple forms of indexing.
    • Regarding scalability, a self-hosted, on-premises ClickHouse deployment is somewhat harder to scale than cloud-based Snowflake. Provisioning clusters and migrating data manually is doable but tedious. One possible solution is to deploy ClickHouse in the cloud, which is cheaper and, frankly, the most viable option.

    ClickHouse vs Redshift:

    • Redshift is a managed, scalable cloud data warehouse that offers both provisioned and serverless options. Its RA3 nodes scale compute independently and cache the data they need. Even so, it does not isolate different workloads running on the same data, which puts it at the lower end of decoupled compute-and-storage cloud architectures. ClickHouse’s local runtime is one of the fastest.
    • Both Redshift and ClickHouse are columnar and sort data, so queries read only the data they need. Deploying ClickHouse is cheaper. Redshift is tailored to be a ready-to-use tool, but ClickHouse is the better choice if you don’t depend heavily on Redshift’s features such as configuration, backup, and monitoring.

    ClickHouse vs InfluxDB:

    • InfluxDB, an open-source NoSQL database written in Go, is one of the most popular choices for time-series data and analysis. Although ClickHouse is a general-purpose analytical DB, it provides competitive write performance.
    • ClickHouse engines such as AggregatingMergeTree let real-time data be stored in a pre-aggregated format, which puts its performance on par with dedicated time-series databases (TSDBs). It is significantly faster for heavy queries and comparable for light ones.
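    Pre-aggregation with AggregatingMergeTree works by storing intermediate aggregate states rather than final values. In SQL, this uses -State and -Merge combinators such as avgState and avgMerge. The Go sketch below models the idea with hypothetical types. An average is kept as a (sum, count) state, states from separately inserted parts are merged, and the average is computed only at query time. Averaging the per-part averages instead would be wrong. For example, avg(15, 30) = 22.5, while the true average of 10, 20, and 30 is 20.

    ```go
    package main

    import "fmt"

    // avgState is a partial aggregate: enough information to merge with other
    // partial aggregates and still produce an exact average later.
    type avgState struct {
    	sum   float64
    	count uint64
    }

    func (s avgState) merge(o avgState) avgState {
    	return avgState{sum: s.sum + o.sum, count: s.count + o.count}
    }

    // finalize turns the state into the value a query would return.
    func (s avgState) finalize() float64 {
    	if s.count == 0 {
    		return 0
    	}
    	return s.sum / float64(s.count)
    }

    // reading is one raw (sensor, value) row from an insert.
    type reading struct {
    	sensor string
    	value  float64
    }

    // aggregate turns one inserted batch into one state per sensor.
    func aggregate(batch []reading) map[string]avgState {
    	out := make(map[string]avgState)
    	for _, r := range batch {
    		out[r.sensor] = out[r.sensor].merge(avgState{sum: r.value, count: 1})
    	}
    	return out
    }

    // mergeParts combines states from separately inserted parts, as a
    // background merge would.
    func mergeParts(parts ...map[string]avgState) map[string]avgState {
    	out := make(map[string]avgState)
    	for _, p := range parts {
    		for k, s := range p {
    			out[k] = out[k].merge(s)
    		}
    	}
    	return out
    }

    func main() {
    	part1 := aggregate([]reading{{"a", 10}, {"a", 20}, {"b", 5}})
    	part2 := aggregate([]reading{{"a", 30}})
    	merged := mergeParts(part1, part2)
    	fmt.Println(merged["a"].finalize(), merged["b"].finalize())
    }
    ```
    
    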

    ClickHouse vs PostgreSQL:

    • Postgres is another very versatile DB and, like ClickHouse, is widely used for various use cases. Postgres, however, is an OLTP database, so unlike ClickHouse, analytics is not its primary aim, although it is still used for analytics to a certain extent.
    • For transactional data, ClickHouse’s columnar nature puts it behind Postgres. For analytics, however, ClickHouse stays ahead even after Postgres is tuned to its full potential, e.g., with materialized views, indexing, and cache and buffer sizes.

    ClickHouse vs Apache Druid:

    • Apache Druid is an open-source data store used primarily for OLAP. Druid and ClickHouse are very similar in their approaches and use cases but differ in their architecture. Druid is mainly used for real-time analytics with heavy ingestion and high uptime.
    • Unlike Druid, ClickHouse has a much simpler deployment. ClickHouse can be deployed on a single server, while a Druid setup needs multiple types of nodes (master, broker, ingestion, etc.). With its SQL support, ClickHouse provides better flexibility, and it is more performant when the deployment is small.

    To summarize the differences between ClickHouse and other data warehouses:

    ClickHouse Engines

    Depending on the type of your table (internal or external), ClickHouse provides an array of engines that connect to different data stores and determine how data is stored and accessed and which operations can be performed on it.

    These engines are mainly categorized into two types:

    Database Engines:

    These allow us to work with different databases and tables.
    By default, ClickHouse uses the Atomic database engine, which provides configurable table engines and an SQL dialect. Other database engines, such as PostgreSQL and MySQL, connect to external databases.

    Table Engines:

    These determine:

    • how and where data is stored
    • where to read/write it from/to
    • which queries it supports
    • use of indexes
    • concurrent data access and so on.

    These engines are further classified into families based on the above parameters:

    MergeTree Engines:

    This is the most universal and functional table engine family for high-load tasks. Its engines support quick data insertion with subsequent background data processing. They also support data replication, partitioning, secondary data-skipping indexes, and other features. Popular engines in this family include:

    • MergeTree
    • SummingMergeTree
    • AggregatingMergeTree

    MergeTree engines, with their indexing and partitioning support, allow data to be processed at tremendous speed. They can also back materialized views that store aggregated data, further improving performance.
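    SummingMergeTree can be pictured as a merge of two parts, each sorted by the table's sorting key, that sums the numeric column for rows ending up with the same key. This is an illustrative model with a hypothetical (day, page, views) schema, not ClickHouse's merge code. Merges happen in the background at an unpredictable time, so rows with the same key can coexist until then. Queries should therefore still aggregate with sum() and GROUP BY.

    ```go
    package main

    import "fmt"

    // row is one record keyed by (day, page) with a numeric column to sum.
    type row struct {
    	day   string
    	page  string
    	views int
    }

    // less orders rows by the sorting key (day, page).
    func less(a, b row) bool {
    	if a.day != b.day {
    		return a.day < b.day
    	}
    	return a.page < b.page
    }

    func sameKey(a, b row) bool { return a.day == b.day && a.page == b.page }

    // mergeParts merges two parts that are each sorted by (day, page). Rows that
    // share a key become adjacent in the sorted output and are collapsed into
    // one row whose views are summed.
    func mergeParts(a, b []row) []row {
    	out := make([]row, 0, len(a)+len(b))
    	push := func(r row) {
    		if n := len(out); n > 0 && sameKey(out[n-1], r) {
    			out[n-1].views += r.views
    			return
    		}
    		out = append(out, r)
    	}
    	i, j := 0, 0
    	for i < len(a) || j < len(b) {
    		// Take from a when b is exhausted or a[i] <= b[j].
    		if j == len(b) || (i < len(a) && !less(b[j], a[i])) {
    			push(a[i])
    			i++
    		} else {
    			push(b[j])
    			j++
    		}
    	}
    	return out
    }

    func main() {
    	part1 := []row{{"2024-01-01", "/home", 3}, {"2024-01-01", "/pricing", 1}}
    	part2 := []row{{"2024-01-01", "/home", 2}, {"2024-01-02", "/home", 7}}
    	fmt.Println(mergeParts(part1, part2))
    }
    ```
    
    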

    Log Engines:

    These are lightweight engines with minimum functionality. These work the best when the requirement is to quickly write into many small tables and read them later as a whole. This family consists of:

    • Log
    • StripeLog
    • TinyLog

    These engines append data to the disk in a sequential fashion and support concurrent reading. They do not support indexing, updating, or deleting and hence are only useful when the data is small, sequential, and immutable.

    Integration Engines:

    These are used to communicate with other data storage and processing systems. They include:

    • JDBC
    • MongoDB
    • HDFS
    • S3
    • Kafka and so on.

    Using these engines, we can import data from and export data to external sources. With the Kafka engine, we can ingest data directly from a topic into a ClickHouse table, and with the S3 engine, we can work directly with S3 objects.

    Special Engines:

    ClickHouse offers some special engines that are specific to the use case. For example:

    • MaterializedView
    • Distributed
    • Merge
    • File and so on.

    These special engines have their own quirks. For example, with File we can export data to a file, or update data in the table by updating the file.

    Summary

    We learned that ClickHouse is a very powerful and versatile tool: it has stellar performance and is feature-packed, very cost-efficient, and open-source. We saw a high-level comparison of ClickHouse with some of the best choices across an array of use cases. Although the choice ultimately comes down to how specific and demanding your use case is, ClickHouse’s generic nature lets it measure up well in many situations.

    ClickHouse’s applicability in web analytics, network management, log analysis, time series analysis, asset valuation in financial markets, and security threat identification makes it tremendously versatile. By consistently solving business problems with low-latency responses over petabytes of data, ClickHouse is indeed one of the fastest data warehouses out there.


  • Cleaner, Efficient Code with Hooks and Functional Programming

    React Hooks were introduced in 2018, and numerous POCs have been built around them ever since. Hooks arrive at a time when React has become the norm and class components are becoming increasingly complex. In this blog, I will show how Hooks can reduce the size of your code by up to 90%. Yes, you heard that right. Exciting, isn’t it?

    Hooks are a powerful upgrade that shipped with React 16.8, and they embrace the functional programming paradigm. React also acknowledges the volume of class components already written, so Hooks are backward compatible. You can practice by refactoring a small chunk of your codebase to use React Hooks without affecting existing functionality.

    In this article, I will show you how Hooks can help you write cleaner, smaller, and more efficient code. 90%, remember!

    First, let’s list out the common problems we all face with React Components as they are today:

    1. Huge Components – caused by logic distributed across lifecycle methods

    2. Wrapper Hell – caused by re-using components

    3. Confusing and hard to understand classes

    In my opinion, these are symptoms of one big problem: React does not provide a stateful primitive that is simpler, smaller, and more lightweight than a class component. That is why solving one problem worsens another. For example, if we put all of the logic into components to fix Wrapper Hell, we end up with Huge Components that are hard to refactor. On the other hand, if we divide huge components into smaller reusable pieces, we get more nesting in the component tree, i.e. Wrapper Hell. In either case, the confusion around classes remains.

    Let’s approach these problems one by one and solve them in isolation.

    Huge Components –

    We have all used lifecycle methods, and over time they tend to accumulate more and more stateful logic. That stateful logic is also often shared among lifecycle methods. For example, suppose you add an event listener in componentDidMount. componentDidUpdate might also contain logic for setting up event listeners, and the cleanup code goes in componentWillUnmount. See how the logic for a single concern is split across these lifecycle methods.

    // Class component
    
    import React from "react";
    
    export default class LazyLoader extends React.Component {
      constructor(props) {
        super(props);
    
        this.state = { data: [] };
      }
    
      loadMore = () => {
        // Load More Data
        console.log("loading data");
      };
    
      handleScroll = () => {
        if (!this.props.isLoading && this.props.isCompleted) {
          this.loadMore();
        }
      };
    
      componentDidMount() {
        this.loadMore();
        document.addEventListener("scroll", this.handleScroll, false);
        // more subscribers and event listeners
      }
    
      componentDidUpdate() {
        //
      }
    
      componentWillUnmount() {
        document.removeEventListener("scroll", this.handleScroll, false);
        // unsubscribe and remove listeners
      }
    
      render() {
        return <div>{this.state.data}</div>;
      }
    }

    React Hooks approach this with useEffect.

    import React, { useEffect, useState } from "react";
    
    export const LazyLoader = ({ isLoading, isCompleted }) => {
      const [data, setData] = useState([]);
    
      const loadMore = () => {
        // Load and setData here
      };
    
      // cDM: load the first page once, as componentDidMount did
      useEffect(() => {
        loadMore();
      }, []);
    
      // cDM, cDU and cWU for the scroll listener. Listing the props it reads
      // keeps handleScroll from seeing stale values: when they change, React
      // runs the cleanup and attaches a fresh listener.
      useEffect(() => {
        const handleScroll = () => {
          if (!isLoading && isCompleted) {
            loadMore();
          }
        };
    
        document.addEventListener("scroll", handleScroll, false);
        // more subscribers and event listeners
    
        return () => {
          document.removeEventListener("scroll", handleScroll, false);
          // unsubscribe and remove listeners
        };
      }, [isLoading, isCompleted]);
    
      // cDU
      useEffect(() => {
        //
      }, [/** dependencies */]);
    
      return data && <div>{data}</div>;
    };

    Now, let’s move the logic to a custom Hook.

    import { useEffect, useState } from "react";
    
    // Custom Hook: owns the data and the scroll subscription.
    // The caller passes in the loading flags that the scroll handler reads.
    export function useScroll({ isLoading, isCompleted } = {}) {
      const [data, setData] = useState([]);
    
      const loadMore = () => {
        // Load and setData here
      };
    
      // cDM: load the first page once
      useEffect(() => {
        loadMore();
      }, []);
    
      // cDM, cDU and cWU for the scroll listener
      useEffect(() => {
        const handleScroll = () => {
          if (!isLoading && isCompleted) {
            loadMore();
          }
        };
    
        document.addEventListener("scroll", handleScroll, false);
        // more subscribers and event listeners
    
        return () => {
          document.removeEventListener("scroll", handleScroll, false);
          // unsubscribe and remove listeners
        };
      }, [isLoading, isCompleted]);
    
      return data;
    }

    import React, { useEffect } from "react";
    import { useScroll } from "./useScroll";
    
    const LazyLoader = ({ isLoading, isCompleted }) => {
      const data = useScroll({ isLoading, isCompleted });
    
      // cDU
      useEffect(() => {
        //
      }, [/** dependencies */]);
    
      return data && <div>{data}</div>;
    };

    useEffect puts code that changes together in one place, making it more readable and easier to understand. You can also use multiple useEffect calls, which lets you keep unrelated logic in separate effects.
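    To make the dependency-array rule concrete, here is a toy model of a single useEffect call site, written in Go. It is not how React is implemented. React compares each dependency with Object.is and schedules effects after rendering, and the effectSlot type and its methods are hypothetical. The model captures the rule the examples above rely on. The effect runs on the first render and is skipped while its dependencies are unchanged, so [] means "run once". When the dependencies change, the previous cleanup runs before the effect runs again. Omitting the array (nil here) re-runs the effect after every render.

    ```go
    package main

    import "fmt"

    // effectSlot models one useEffect call site across renders.
    // Dependencies must be comparable values (strings, numbers, ...).
    type effectSlot struct {
    	deps    []any
    	hasRun  bool
    	cleanup func()
    }

    // depsChanged: a nil list means "after every render"; otherwise re-run
    // only if some dependency differs from the previous render.
    func depsChanged(prev, next []any) bool {
    	if prev == nil || next == nil || len(prev) != len(next) {
    		return true
    	}
    	for i := range prev {
    		if prev[i] != next[i] {
    			return true
    		}
    	}
    	return false
    }

    // render runs the effect if needed, calling the previous cleanup first.
    // It reports whether the effect ran.
    func (s *effectSlot) render(deps []any, effect func() func()) bool {
    	if s.hasRun && !depsChanged(s.deps, deps) {
    		return false
    	}
    	if s.cleanup != nil {
    		s.cleanup()
    	}
    	s.cleanup = effect()
    	s.deps = deps
    	s.hasRun = true
    	return true
    }

    // unmount runs the last cleanup, like componentWillUnmount.
    func (s *effectSlot) unmount() {
    	if s.cleanup != nil {
    		s.cleanup()
    		s.cleanup = nil
    	}
    }

    func main() {
    	var log []string
    	slot := &effectSlot{}
    	subscribe := func() func() {
    		log = append(log, "add listener")
    		return func() { log = append(log, "remove listener") }
    	}
    	slot.render([]any{}, subscribe) // mount: runs
    	slot.render([]any{}, subscribe) // re-render with []: skipped
    	slot.unmount()                  // cleanup
    	fmt.Println(log)
    }
    ```
    
    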

    Wrapper Hell –

    If you’re well versed in React, you probably know it doesn’t provide a way to attach reusable behavior to a component (like “connect” in react-redux). React code typically shares logic through the render props and higher-order component patterns. But these patterns require restructuring your components, which is hard to follow and, at times, cumbersome. This typically leads to a problem called Wrapper Hell. You can see it by inspecting the application in React DevTools, where components are wrapped in layer after layer of providers, consumers, HOCs, and other abstractions. React needed a better way to share logic.

    The code below is inspired by the React Conf 2018 talk “90% Cleaner React With Hooks”.

    import React from "react";
    import Media from "./components/Media";
    
    function App() {
      return (
        <Media query="(max-width: 480px)">
          {small => (
            <Media query="(min-width: 1024px)">
              {large => (
                <div className="media">
                  <h1>Media</h1>
                  <p>{small ? "small screen" : "not a small screen"}</p>
                  <p>{large ? "large screen" : "not a large screen"}</p>
                </div>
              )}
            </Media>
          )}
        </Media>
      );
    }
    
    export default App;

    import React from "react";
    
    export default class Media extends React.Component {
      removeListener = () => null;
    
      constructor(props) {
        super(props);
        this.state = {
          matches: window.matchMedia(this.props.query).matches
        };
      }
    
      componentDidMount() {
        this.init();
      }
    
      init() {
        const media = window.matchMedia(this.props.query);
        if (media.matches !== this.state.matches) {
          this.setState({ matches: media.matches });
        }
    
        const listener = () => this.setState({ matches: media.matches });
        media.addListener(listener);
        this.removeListener = () => media.removeListener(listener);
      }
    
      componentDidUpdate(prevProps) {
        if (prevProps.query !== this.props.query) {
          this.removeListener();
          this.init();
        }
      }
    
      componentWillUnmount() {
        this.removeListener();
      }
    
      render() {
        return this.props.children(this.state.matches);
      }
    }

    The example below shows how Hooks fix this problem.

    import { useState, useEffect } from "react";
    
    export default function(query) {
      let [matches, setMatches] = useState(window.matchMedia(query).matches);
    
      useEffect(() => {
        let media = window.matchMedia(query);
        if (media.matches !== matches) {
          setMatches(media.matches);
        }
        const listener = () => setMatches(media.matches);
        media.addListener(listener);
        return () => media.removeListener(listener);
      }, [query, matches]);
    
      return matches;
    }

    import React from "react";
    import useMedia from "./hooks/useMedia";
    
    function App() {
      let small = useMedia("(max-width: 480px)");
      let large = useMedia("(min-width: 1024px)");
      return (
        <div className="media">
          <h1>Media</h1>
          <p>{small ? "small screen" : "not a small screen"}</p>
          <p>{large ? "large screen" : "not a large screen"}</p>
        </div>
      );
    }
    
    export default App;

    Hooks let you extract reusable stateful logic from a component without changing the component hierarchy. This also lets that logic be tested independently.

    Confusing and hard to understand classes

    Classes pose more problems than they solve. We have worked with React for a long time, and there’s no denying that classes are hard for humans as well as for machines; they confuse both. Here’s why:

    For Humans –

    1. There’s a fair amount of boilerplate when defining a class.

    2. Beginners and even expert developers find it difficult to bind methods and write class components.

    3. People often struggle to choose between function and class components, since a component that starts without state may need it later.

    For Machines –

    1. In the minified version of a component file, the method names are not minified and the unused methods are not stripped out, as it’s not possible to tell how all the methods fit together.

    2. Classes make it difficult for React to implement hot loading reliably.

    3. Classes encourage patterns that make it difficult for the compiler to optimize.

    Because of these problems, classes can be a large barrier to learning React. To keep React relevant, the React team has been experimenting with component folding using Prepack, but classes make such optimizations fall back to a slower path. Hence, the team wanted to offer an API that makes it more likely for code to stay on the optimizable path.

    React components have always been closer to functions. Since Hooks bring stateful logic to function components, they let you use more of React’s features without classes. Hooks embrace functions without compromising the practical spirit of React, and they don’t require you to learn complex functional or reactive programming techniques.

    Conclusion –

    React Hooks got me excited, and I am learning new things every day. Hooks let you write far less code for the same use case. Also, Hooks don’t ask developers who are already busy shipping to rewrite everything. You can redo small components with Hooks first and move to the complex ones later.

    Adopting Hooks is meant to be gradual. I hope this blog makes you want to get your hands dirty with Hooks. Do share your thoughts and experiences with them. Finally, I strongly recommend the official documentation, which has great content.

    Recommended Reading: React Today and Tomorrow and 90% Cleaner React With Hooks

  • Continuous Integration & Delivery (CI/CD) for Kubernetes Using CircleCI & Helm

    Introduction

    Kubernetes is being adopted rapidly across the software industry and is becoming the preferred option for deploying and managing containerized applications. Once we have a fully functional Kubernetes cluster, we need an automated process to deploy our applications on it. In this blog post, we will create a fully automated “commit to deploy” pipeline for Kubernetes using CircleCI and Helm.

    What is CircleCI?

    CircleCI is a fully managed SaaS offering that lets us build, test, or deploy our code on every check-in. To get started with CircleCI, we log into its web console with our GitHub or Bitbucket credentials, add a project for the repository we want to build, and then add the CircleCI config file to that repository. The CircleCI config file is a YAML file that lists the steps to execute every time code is pushed to the repository.

    Some salient features of CircleCI are:

    1. Little or no operational overhead as the infrastructure is managed completely by CircleCI.
    2. User authentication is done via GitHub or Bitbucket, so user management is quite simple.
    3. It automatically sends build status notifications to the GitHub/Bitbucket email addresses of users who follow the project on CircleCI.
    4. The UI is quite simple and gives a holistic view of builds.
    5. It can be integrated with Slack, HipChat, Jira, etc.

    What is Helm?

    Helm is a chart manager, where a chart is a package of Kubernetes resources. Helm lets us bundle related Kubernetes objects into a chart and treat them as a single unit of deployment, referred to as a release. For example, suppose you have an application, app1, that you want to run on Kubernetes. For app1, you create multiple Kubernetes resources such as a deployment, service, ingress, horizontal pod autoscaler, etc. Without Helm, you need to create each of these resources separately by applying its manifest file. Helm lets us group all those files into one chart (a Helm chart), so we just need to deploy the chart. This also makes upgrading and deleting the resources quite simple.

    Some other benefits of Helm are:

    1. It makes deployments highly configurable. Just by changing parameters, we can use the same chart to deploy to multiple environments, like staging and production, or to multiple cloud providers.
    2. We can roll back to a previous release with a single helm command.
    3. It makes managing and sharing Kubernetes-specific applications much simpler.

    Note: Helm 2, which this pipeline uses, has two components: the helm client and the Tiller server. Tiller runs inside the cluster as a deployment and serves the requests made by the helm client. A permanently running Tiller has potential security vulnerabilities, so we will use tillerless Helm in our pipeline, which runs Tiller only when we need it. (Helm 3 removed Tiller altogether.)

    Building the Pipeline

    Overview:

    We will create the pipeline for a Golang application. The pipeline will first build the binary, create a docker image from it, push the image to ECR, then deploy it on the Kubernetes cluster using its helm chart.

    We will use a simple app which just exposes a `hello` endpoint and returns the hello world message:

    package main
    
    import (
    	"encoding/json"
    	"log"
    	"net/http"
    
    	"github.com/gorilla/mux"
    )
    
    // Message is the JSON payload returned by the /hello endpoint.
    type Message struct {
    	Msg string
    }
    
    // helloWorldJSON responds with {"Msg":"Hello World"}.
    func helloWorldJSON(w http.ResponseWriter, r *http.Request) {
    	m := Message{"Hello World"}
    	response, err := json.Marshal(m)
    	if err != nil {
    		http.Error(w, err.Error(), http.StatusInternalServerError)
    		return
    	}
    	w.Header().Set("Content-Type", "application/json")
    	w.WriteHeader(http.StatusOK)
    	w.Write(response)
    }
    
    func main() {
    	r := mux.NewRouter()
    	r.HandleFunc("/hello", helloWorldJSON).Methods("GET")
    	if err := http.ListenAndServe(":8080", r); err != nil {
    		log.Fatal(err)
    	}
    }
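    Because the handler is a plain http.HandlerFunc, it can be exercised in-process with the standard library's net/http/httptest package, without starting a server. The sketch below mirrors the Message type and handler so that it compiles on its own without gorilla/mux. The router is bypassed by calling the handler directly, and checkHello is a hypothetical helper name. In a real repository, the same checks would typically live in a _test.go file run by go test.

    ```go
    package main

    import (
    	"encoding/json"
    	"fmt"
    	"net/http"
    	"net/http/httptest"
    )

    // Message and helloWorldJSON mirror the app above.
    type Message struct {
    	Msg string
    }

    func helloWorldJSON(w http.ResponseWriter, r *http.Request) {
    	response, err := json.Marshal(Message{"Hello World"})
    	if err != nil {
    		http.Error(w, err.Error(), http.StatusInternalServerError)
    		return
    	}
    	w.Header().Set("Content-Type", "application/json")
    	w.WriteHeader(http.StatusOK)
    	w.Write(response)
    }

    // checkHello calls the handler with a recorded GET /hello request and
    // decodes the JSON body it wrote.
    func checkHello() (int, string, Message, error) {
    	req := httptest.NewRequest(http.MethodGet, "/hello", nil)
    	rec := httptest.NewRecorder()
    	helloWorldJSON(rec, req)

    	var m Message
    	err := json.Unmarshal(rec.Body.Bytes(), &m)
    	return rec.Code, rec.Header().Get("Content-Type"), m, err
    }

    func main() {
    	code, ctype, m, err := checkHello()
    	if err != nil {
    		fmt.Println("response was not valid JSON:", err)
    		return
    	}
    	fmt.Println(code, ctype, m.Msg)
    }
    ```
    
    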

    We will create a docker image for hello app using the following Dockerfile:

    FROM centos/systemd
    
    LABEL maintainer="Akash Gautam <akash.gautam@velotio.com>"
    
    COPY hello-app  /
    
    ENTRYPOINT ["/hello-app"]

    Creating Helm Chart:

    Now we need to create the helm chart for hello app.

    First, we create the Kubernetes manifest files. We will create a deployment and a service file:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: helloapp
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: helloapp
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 1
      template:
        metadata:
          labels:
            app: helloapp
            env: {{ .Values.labels.env }}
            cluster: {{ .Values.labels.cluster }}
        spec:
          containers:
          - name: helloapp
            image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
            imagePullPolicy: {{ .Values.image.imagePullPolicy }}
            readinessProbe:
              httpGet:
                path: /hello
                port: 8080
              initialDelaySeconds: 5
              periodSeconds: 5
              successThreshold: 1

    apiVersion: v1
    kind: Service
    metadata:
      name: helloapp
    spec:
      type: {{ .Values.service.type }}
      ports:
      - name: helloapp
        port: {{ .Values.service.port }}
        protocol: TCP
        targetPort: {{ .Values.service.targetPort }}
      selector:
        app: helloapp

    You may have noticed that the manifests above use the .Values object. Every value specified in the chart's values.yaml file can be accessed through the .Values object inside a template.
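    Helm renders chart templates with Go's text/template package, adding extra template functions and built-in objects such as .Release and .Chart. The sketch below uses only the standard library (Go 1.18+) to show how the .Values lookups in the Service manifest resolve. The values are a hand-built nested map standing in for values.yaml, and render is a hypothetical helper. Setting missingkey=error makes a missing value fail loudly instead of rendering "<no value>".

    ```go
    package main

    import (
    	"fmt"
    	"os"
    	"strings"
    	"text/template"
    )

    // serviceTmpl is a copy of the Service manifest above.
    const serviceTmpl = `apiVersion: v1
    kind: Service
    metadata:
      name: helloapp
    spec:
      type: {{ .Values.service.type }}
      ports:
      - name: helloapp
        port: {{ .Values.service.port }}
        protocol: TCP
        targetPort: {{ .Values.service.targetPort }}
    `

    // render executes the template against a top-level object whose "Values"
    // entry holds the nested values map.
    func render(values map[string]any) (string, error) {
    	tmpl, err := template.New("service").Option("missingkey=error").Parse(serviceTmpl)
    	if err != nil {
    		return "", err
    	}
    	var sb strings.Builder
    	err = tmpl.Execute(&sb, map[string]any{"Values": values})
    	return sb.String(), err
    }

    func main() {
    	values := map[string]any{
    		"service": map[string]any{"type": "LoadBalancer", "port": 80, "targetPort": 8080},
    	}
    	out, err := render(values)
    	if err != nil {
    		fmt.Fprintln(os.Stderr, err)
    		os.Exit(1)
    	}
    	fmt.Print(out)
    }
    ```
    
    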

    Let’s create the helm chart now:

    helm create helloapp

    The above command creates the folder structure for a Helm chart:

    helloapp/
    |
    |- .helmignore # Contains patterns to ignore when packaging Helm charts.
    |
    |- Chart.yaml # Information about your chart
    |
    |- values.yaml # The default values for your templates
    |
    |- charts/ # Charts that this chart depends on
    |
    |- templates/ # The template files

    We can remove the charts/ folder inside our helloapp chart, as our chart won’t have any sub-charts. Next, we move our Kubernetes manifest files into the templates/ folder, replacing the sample templates that helm create generates, and update values.yaml and Chart.yaml.

    Our values.yaml looks like:

    image:
      tag: 0.0.1
      repository: 123456789870.dkr.ecr.us-east-1.amazonaws.com/helloapp
      imagePullPolicy: Always
    
    labels:
      env: "staging"
      cluster: "eks-cluster-blog"
    
    service:
      port: 80
      targetPort: 8080
      type: LoadBalancer

    This makes our deployment more configurable. For example, we have set the service type to LoadBalancer in values.yaml, but to change it to NodePort we just need to override it while installing the chart (--set service.type=NodePort). Similarly, we have set the image pull policy to Always, which is fine for development/staging environments, but for production we may want IfNotPresent. In our chart, we need to identify the parameters/values that may change from one environment to another and make them configurable. This keeps our deployment flexible and lets us reuse the same chart across environments.
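    The --set override can be pictured as walking a dotted path into the nested values map and replacing the leaf. This is a simplified, hypothetical model in Go 1.18+, not Helm's parser. It stores every value as a string, whereas helm --set also parses numbers, booleans, lists, and escaped characters.

    ```go
    package main

    import (
    	"fmt"
    	"strings"
    )

    // setValue applies one "a.b.c=value" override to a nested values map,
    // creating intermediate maps as needed. Values are stored as strings.
    func setValue(values map[string]any, override string) error {
    	path, val, ok := strings.Cut(override, "=")
    	if !ok {
    		return fmt.Errorf("override %q has no '='", override)
    	}
    	keys := strings.Split(path, ".")
    	cur := values
    	for _, k := range keys[:len(keys)-1] {
    		next, ok := cur[k].(map[string]any)
    		if !ok {
    			// Missing (or scalar) intermediate key: replace it with a map.
    			next = map[string]any{}
    			cur[k] = next
    		}
    		cur = next
    	}
    	cur[keys[len(keys)-1]] = val
    	return nil
    }

    func main() {
    	values := map[string]any{
    		"image":   map[string]any{"imagePullPolicy": "Always"},
    		"service": map[string]any{"type": "LoadBalancer", "port": 80},
    	}
    	for _, o := range []string{"service.type=NodePort", "image.imagePullPolicy=IfNotPresent"} {
    		if err := setValue(values, o); err != nil {
    			fmt.Println(err)
    		}
    	}
    	fmt.Println(values["service"], values["image"])
    }
    ```
    
    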

    Finally, we need to update the Chart.yaml file. This file mostly contains metadata about the chart, such as its name, version, and maintainers; name and version are the two mandatory fields.

    version: 1.0.0
    appVersion: 0.0.1
    name: helloapp
    description: Helm chart for helloapp
    sources:
      - https://github.com/akash-gautam/helloapp

    Now that our Helm chart is ready, we can start with the pipeline. We need to create a folder named .circleci in the root folder of our repository and create a file named config.yml in it. In config.yml, we define two jobs: build&pushImage and deploy.

    Configure the pipeline:

    build&pushImage:
        working_directory: /go/src/hello-app (1)
        docker:
          - image: circleci/golang:1.10 (2)
        steps:
          - checkout (3)
          - run: (4)
              name: build the binary
              command: go build -o hello-app
          - setup_remote_docker: (5)
              docker_layer_caching: true
          - run: (6)
              name: Set the tag for the image, we will concatenate the app version and circle build number with a `-` char in between
              command:  echo 'export TAG=$(cat VERSION)-$CIRCLE_BUILD_NUM' >> $BASH_ENV
          - run: (7)
              name: Build the docker image
              command: docker build . -t ${CIRCLE_PROJECT_REPONAME}:$TAG
          - run: (8)
              name: Install AWS cli
              command: export TZ=Europe/Minsk && sudo ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ | sudo tee /etc/timezone && sudo apt-get update && sudo apt-get install -y awscli
          - run: (9)
              name: Login to ECR
              command: $(aws ecr get-login --region $AWS_REGION | sed -e 's/-e none//g')
          - run: (10)
              name: Tag the image with ECR repo name 
              command: docker tag ${CIRCLE_PROJECT_REPONAME}:$TAG ${HELLOAPP_ECR_REPO}:$TAG    
          - run: (11)
              name: Push the image to the ECR repo
              command: docker push ${HELLOAPP_ECR_REPO}:$TAG

    1. We set the working directory for our job. We set it inside the GOPATH so that no additional setup is needed.
    2. We set the Docker image inside which the job runs. As our app is written in Go, we use an image that already has Go installed.
    3. This step checks out our repository into the working directory.
    4. In this step, we build the binary.
    5. Here we set up Docker with the help of the setup_remote_docker key provided by CircleCI.
    6. In this step, we create the tag we will use when building the image: the app version from the VERSION file, followed by a dash (`-`) and the value of $CIRCLE_BUILD_NUM.
    7. Here we build the image and tag it.
    8. We install the AWS CLI to interact with ECR later.
    9. Here we log into ECR.
    10. We tag the image built in step 7 with the ECR repository name.
    11. Finally, we push the image to ECR.
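    Steps 6 and 10 boil down to simple string assembly. The tag is the contents of the VERSION file, a dash, and the CircleCI build number, and the pushed reference is the ECR repository plus that tag. The small Go sketch below makes the rule explicit; imageTag and imageRef are hypothetical helper names, and the build number 42 is made up.

    ```go
    package main

    import (
    	"fmt"
    	"strings"
    )

    // imageTag mirrors `$(cat VERSION)-$CIRCLE_BUILD_NUM`. Shell command
    // substitution drops trailing newlines; TrimSpace also drops other
    // surrounding whitespace.
    func imageTag(versionFile, buildNum string) (string, error) {
    	version := strings.TrimSpace(versionFile)
    	if version == "" || buildNum == "" {
    		return "", fmt.Errorf("need both a version and a build number")
    	}
    	return version + "-" + buildNum, nil
    }

    // imageRef mirrors `${HELLOAPP_ECR_REPO}:$TAG`.
    func imageRef(repo, tag string) string {
    	return repo + ":" + tag
    }

    func main() {
    	tag, err := imageTag("0.0.1\n", "42")
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	fmt.Println(imageRef("123456789870.dkr.ecr.us-east-1.amazonaws.com/helloapp", tag))
    }
    ```
    
    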

    Now we will deploy our helm charts. For this, we have a separate job deploy.

    deploy:
        docker: (1)
            - image: circleci/golang:1.10
        steps: (2)
          - checkout
          - run: (3)
              name: Install AWS cli
              command: export TZ=Europe/Minsk && sudo ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ | sudo tee /etc/timezone && sudo apt-get update && sudo apt-get install -y awscli
          - run: (4)
              name: Set the tag for the image, we will concatenate the app version and circle build number with a `-` char in between
              command:  echo 'export TAG=$(cat VERSION)-$CIRCLE_PREVIOUS_BUILD_NUM' >> $BASH_ENV
          - run: (5)
              name: Install and configure kubectl
              command: sudo curl -L https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl -o /usr/local/bin/kubectl && sudo chmod +x /usr/local/bin/kubectl  
          - run: (6)
              name: Install and configure aws-iam-authenticator
              command: curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/bin/linux/amd64/aws-iam-authenticator && sudo chmod +x ./aws-iam-authenticator && sudo cp ./aws-iam-authenticator /bin/aws-iam-authenticator
          - run: (7)
              name: Install latest awscli version
              command: sudo apt-get install -y unzip && curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip" && unzip awscli-bundle.zip && ./awscli-bundle/install -b ~/bin/aws
          - run: (8)
              name: Get the kubeconfig file 
              command: export KUBECONFIG=$HOME/.kube/kubeconfig && /home/circleci/bin/aws eks --region $AWS_REGION update-kubeconfig --name $EKS_CLUSTER_NAME
          - run: (9)
              name: Install and configure helm
              command: sudo curl -L https://storage.googleapis.com/kubernetes-helm/helm-v2.11.0-linux-amd64.tar.gz | tar xz && sudo mv linux-amd64/helm /bin/helm && sudo rm -rf linux-amd64
          - run: (10)
              name: Initialize helm
              command:  helm init --client-only --kubeconfig=$HOME/.kube/kubeconfig
          - run: (11)
              name: Install tiller plugin
              command: helm plugin install https://github.com/rimusz/helm-tiller --kubeconfig=$HOME/.kube/kubeconfig        
          - run: (12)
              name: Release helloapp using helm chart 
              command: bash scripts/release-helloapp.sh $TAG

    1. Set the Docker image inside which we want to execute the job.
    2. Check out the code using the checkout key.
    3. Install the AWS CLI.
    4. Set the value of the tag just as we did in the build&pushImage job. Note that here we use the CIRCLE_PREVIOUS_BUILD_NUM variable, which gives us the build number of the build&pushImage job and ensures that the tag values match. This relies on build&pushImage being the previous build on the same branch.
    5. Download kubectl and make it executable.
    6. Install aws-iam-authenticator; this is required because my Kubernetes cluster runs on EKS.
    7. Install the latest version of the AWS CLI. EKS is a relatively new AWS service, and older versions of the AWS CLI don’t support it.
    8. Fetch the kubeconfig file. This step varies depending on where the Kubernetes cluster is set up. As my cluster is on EKS, I get the kubeconfig file via the AWS CLI. Similarly, if your cluster is on GKE, you need to configure gcloud and run `gcloud container clusters get-credentials <cluster-name> --zone=<zone-name>`. We could also keep the kubeconfig file on some other secure storage system and fetch it from there.
    9. Download Helm and make it executable.
    10. Initialize Helm in client-only mode so that it doesn’t install the Tiller server in the cluster.
    11. Download the tillerless Helm plugin.
    12. Execute the release-helloapp.sh shell script and pass it the TAG value from step 4.

    In the release-helloapp.sh script, we first start Tiller. Next, we check whether the release already exists: if it does, we upgrade it; otherwise, we create a new release. In both cases, we override the image tag in the chart with the tag of the newly built image. Finally, we stop the Tiller server.

    #!/bin/bash
    # Usage: release-helloapp.sh <image-tag>
    # Exit on errors so that a failed install or upgrade fails the CI job.
    set -eu
    TAG=$1
    export KUBECONFIG=$HOME/.kube/kubeconfig

    echo "start tiller"
    helm tiller start-ci
    export HELM_HOST=127.0.0.1:44134

    # Upgrade the release if a release named exactly "helloapp" exists, otherwise install it.
    if helm ls --short | grep -qx helloapp; then
       helm upgrade --timeout 180 helloapp --set image.tag="$TAG" charts/helloapp
    else
       helm install --timeout 180 --name helloapp --set image.tag="$TAG" charts/helloapp
    fi

    echo "stop tiller"
    helm tiller stop
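    The install-or-upgrade branch in the script is essentially a membership check on the release name. The TypeScript sketch below models that decision, assuming its input is the output of `helm ls --short` (one release name per line); chooseHelmAction and helmCommand are hypothetical helpers. It compares names exactly, because a substring match such as a plain `grep helloapp` would also match a release with a hypothetical name like helloapp-staging.

```typescript
// Hypothetical model of the install-or-upgrade decision in release-helloapp.sh.
type HelmAction = "install" | "upgrade";

/** Input is assumed to be `helm ls --short` output: one release name per line. */
function chooseHelmAction(helmLsShortOutput: string, releaseName: string): HelmAction {
  const releases = helmLsShortOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  // Exact comparison, so "helloapp-staging" is not mistaken for "helloapp".
  const exists = releases.some((release) => release === releaseName);
  return exists ? "upgrade" : "install";
}

/** Builds the Helm 2 command line the script runs in each case. */
function helmCommand(action: HelmAction, releaseName: string, imageTag: string, chartPath: string): string {
  const setTag = `--set image.tag=${imageTag}`;
  return action === "upgrade"
    ? `helm upgrade --timeout 180 ${releaseName} ${setTag} ${chartPath}`
    : `helm install --timeout 180 --name ${releaseName} ${setTag} ${chartPath}`;
}

const helmAction = chooseHelmAction("helloapp-staging\n", "helloapp");
console.log(helmCommand(helmAction, "helloapp", "1.0.0-42", "charts/helloapp"));
// helm install --timeout 180 --name helloapp --set image.tag=1.0.0-42 charts/helloapp
```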

    The complete CircleCI config.yml file looks like this:

    version: 2
    
    jobs:
      build&pushImage:
        working_directory: /go/src/hello-app
        docker:
          - image: circleci/golang:1.10
        steps:
          - checkout
          - run:
              name: build the binary
              command: go build -o hello-app
          - setup_remote_docker:
              docker_layer_caching: true
          - run:
              name: Set the tag for the image, we will concatenate the app version and circle build number with a `-` char in between
              command:  echo 'export TAG=$(cat VERSION)-$CIRCLE_BUILD_NUM' >> $BASH_ENV
          - run:
              name: Build the docker image
              command: docker build . -t ${CIRCLE_PROJECT_REPONAME}:$TAG
          - run:
              name: Install AWS cli
              command: export TZ=Europe/Minsk && sudo ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ | sudo tee /etc/timezone && sudo apt-get update && sudo apt-get install -y awscli
          - run:
              name: Login to ECR
              command: $(aws ecr get-login --region $AWS_REGION | sed -e 's/-e none//g')
          - run: 
              name: Tag the image with ECR repo name 
              command: docker tag ${CIRCLE_PROJECT_REPONAME}:$TAG ${HELLOAPP_ECR_REPO}:$TAG    
          - run: 
              name: Push the image to the ECR repo
              command: docker push ${HELLOAPP_ECR_REPO}:$TAG
      deploy:
        docker:
            - image: circleci/golang:1.10
        steps:
          - attach_workspace:
              at: /tmp/workspace
          - checkout
          - run:
              name: Install AWS cli
              command: export TZ=Europe/Minsk && sudo ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ | sudo tee /etc/timezone && sudo apt-get update && sudo apt-get install -y awscli
          - run:
              name: Set the tag for the image, we will concatenate the app version and circle build number with a `-` char in between
              command:  echo 'export TAG=$(cat VERSION)-$CIRCLE_PREVIOUS_BUILD_NUM' >> $BASH_ENV
          - run:
              name: Install and configure kubectl
              command: sudo curl -L https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl -o /usr/local/bin/kubectl && sudo chmod +x /usr/local/bin/kubectl  
          - run:
              name: Install and configure aws-iam-authenticator
              command: curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/bin/linux/amd64/aws-iam-authenticator && sudo chmod +x ./aws-iam-authenticator && sudo cp ./aws-iam-authenticator /bin/aws-iam-authenticator
          - run:
              name: Install latest awscli version
              command: sudo apt-get install -y unzip && curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip" && unzip awscli-bundle.zip && ./awscli-bundle/install -b ~/bin/aws
          - run:
              name: Get the kubeconfig file 
              command: export KUBECONFIG=$HOME/.kube/kubeconfig && /home/circleci/bin/aws eks --region $AWS_REGION update-kubeconfig --name $EKS_CLUSTER_NAME
          - run:
              name: Install and configure helm
              command: sudo curl -L https://storage.googleapis.com/kubernetes-helm/helm-v2.11.0-linux-amd64.tar.gz | tar xz && sudo mv linux-amd64/helm /bin/helm && sudo rm -rf linux-amd64
          - run:
              name: Initialize helm
              command:  helm init --client-only --kubeconfig=$HOME/.kube/kubeconfig
          - run:
              name: Install tiller plugin
              command: helm plugin install https://github.com/rimusz/helm-tiller --kubeconfig=$HOME/.kube/kubeconfig        
          - run:
              name: Release helloapp using helm chart 
              command: bash scripts/release-helloapp.sh $TAG
    workflows:
      version: 2
      primary:
        jobs:
          - build&pushImage
          - deploy:
              requires:
                - build&pushImage

    At the end of the file, we define the workflows. Workflows control the order in which the jobs in the file are executed and establish dependencies and conditions for them. For example, we want the deploy job to run only after the build job completes, so we added a dependency between them. Similarly, if we want to prevent jobs from running on a particular branch, we can specify that kind of condition as well.
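    To make the ordering idea concrete, here is a small TypeScript sketch that treats a workflow as a dependency graph and computes an order in which every job runs after the jobs it requires. It is not CircleCI’s scheduler: CircleCI can run independent jobs in parallel, and WorkflowJob and executionOrder are hypothetical names.

```typescript
// Hypothetical model of a workflow: each job lists the jobs it requires.
interface WorkflowJob {
  jobName: string;
  requires: string[];
}

/** Returns one valid sequential order (dependencies first); throws on cycles or unknown jobs. */
function executionOrder(jobs: WorkflowJob[]): string[] {
  const byName: Record<string, WorkflowJob> = {};
  for (const job of jobs) byName[job.jobName] = job;

  const state: Record<string, "visiting" | "done"> = {};
  const order: string[] = [];

  // Depth-first search: a job is appended only after everything it requires.
  const visit = (jobName: string): void => {
    if (state[jobName] === "done") return;
    if (state[jobName] === "visiting") throw new Error(`dependency cycle at ${jobName}`);
    if (!Object.prototype.hasOwnProperty.call(byName, jobName)) throw new Error(`unknown job: ${jobName}`);
    state[jobName] = "visiting";
    for (const dep of byName[jobName].requires) visit(dep);
    state[jobName] = "done";
    order.push(jobName);
  };

  for (const job of jobs) visit(job.jobName);
  return order;
}

// The workflow from the config above: deploy requires build&pushImage.
const workflowOrder = executionOrder([
  { jobName: "deploy", requires: ["build&pushImage"] },
  { jobName: "build&pushImage", requires: [] },
]);
console.log(workflowOrder.join(" -> ")); // build&pushImage -> deploy
```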

    We used a few environment variables in our pipeline configuration; some were created by us and others are provided by CircleCI. We created the AWS_REGION, HELLOAPP_ECR_REPO, EKS_CLUSTER_NAME, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables through the CircleCI web console, in the project’s settings. The other variables are made available by CircleCI as part of its environment setup. The complete list of environment variables set by CircleCI can be found here.

    Verifying the Pipeline

    Once everything is set up properly, our application is deployed on the Kubernetes cluster and should be accessible. Get the external address of the helloapp service (on EKS, an ELB hostname) and make a curl request to the hello endpoint:

    $ curl http://a31e25e7553af11e994620aebe144c51-242977608.us-west-2.elb.amazonaws.com/hello && printf "\n"
    
    {"Msg":"Hello World"}

    Now update the code to change the message from “Hello World” to “Hello World Returns” and push it. The pipeline takes a few minutes to complete; once it finishes, make the curl request again to see the change reflected.

    $ curl http://a31e25e7553af11e994620aebe144c51-242977608.us-west-2.elb.amazonaws.com/hello && printf "\n"
    
    {"Msg":"Hello World Returns"}

    Also, verify that a new tag has been created for the helloapp Docker image on ECR.

    Conclusion

    In this blog post, we explored how to set up a CI/CD pipeline for Kubernetes and got basic exposure to CircleCI and Helm. Although Helm is not strictly necessary for building a pipeline, it has many benefits and is widely used across the industry. We can extend the pipeline to handle multiple environments, such as dev, staging and production, and deploy the application to any of them based on certain conditions. We can also add more jobs, such as integration tests. All the code used in this blog post is available here.

    Related Reads:

    1. Continuous Deployment with Azure Kubernetes Service, Azure Container Registry & Jenkins
    2. Know Everything About Spinnaker & How to Deploy Using Kubernetes Engine
  • How To Implement Chaos Engineering For Microservices Using Istio

    “Embrace failures. Chaos and failures are your friends, not enemies.” A microservice ecosystem is going to fail at some point. The question is not whether you will fail, but whether you will notice when you do, and whether the failure affects all of your users because every service is down, or only a few users while you fix it in your own time.

    Chaos Engineering is the practice of intentionally introducing faults and failures into your microservice architecture to test the resilience and stability of your system. Istio can be a great tool for this. Let’s look at how Istio makes it easy.

    For more information on how to set up Istio, and on what virtual services and gateways are, have a look at our blog post on how to set up Istio on GKE.

    Fault Injection With Istio

    Fault injection is a testing method that introduces errors into your microservice architecture to ensure it can withstand error conditions. Istio lets you inject errors at the HTTP layer instead of delaying packets or killing pods at the network layer. This way, you can generate various HTTP error codes and test how your services react under those conditions.

    Generating HTTP 503 Error

    Here, two pods run two different versions of the recommendation service, deployed by following the sample application’s installation tutorial.

    Currently, traffic to the recommendation service is automatically load balanced between these two pods.

    kubectl get pods -l app=recommendation
    NAME                                  READY     STATUS    RESTARTS   AGE
    recommendation-v1-798bf87d96-d9d95   2/2       Running   0          1h
    recommendation-v2-7bc4f7f696-d9j2m   2/2       Running   0          1h

    Now let’s apply a fault injection with a virtual service that returns HTTP 503 errors for 30% of the traffic to the pods above.

    To test whether it works, curl the customer microservice endpoint and check the output.

    You will see a 503 error on approximately 30% of the requests that reach the recommendation service.
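    In Istio, this behavior is configured declaratively in the virtual service’s `fault.abort` section, with `percentage` and `httpStatus` fields, and enforced by the Envoy sidecars. The TypeScript sketch below only models the semantics: withAbortFault, ServiceCall and the always-successful recommendation handler are hypothetical, and the random source is injectable so the behavior can be tested deterministically.

```typescript
// Hypothetical model of an HTTP abort fault; Istio's Envoy sidecars do the real work.
interface HttpResult {
  code: number;
  body: string;
}

type ServiceCall = () => HttpResult;

/**
 * Wraps a handler so that roughly `percentage`% of calls return `httpStatus`
 * without reaching the real service. `random` must return values in [0, 1).
 */
function withAbortFault(handler: ServiceCall, percentage: number, httpStatus: number, random: () => number): ServiceCall {
  return () => {
    if (random() * 100 < percentage) {
      return { code: httpStatus, body: "injected fault" };
    }
    return handler();
  };
}

// Hypothetical recommendation service that always succeeds.
const recommendationService: ServiceCall = () => ({ code: 200, body: "recommendation v1" });
const faultyService = withAbortFault(recommendationService, 30, 503, Math.random);

let abortedCount = 0;
const totalRequests = 10000;
for (let i = 0; i < totalRequests; i++) {
  if (faultyService().code === 503) abortedCount++;
}
// Prints a share close to, but not exactly, 30%.
console.log(`${((abortedCount / totalRequests) * 100).toFixed(1)}% of requests got a 503`);
```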

    To restore normal operation, delete the fault-injection virtual service:

    kubectl delete -f recommendation-fault.yaml

    Delay

    The most common failure we see in production is not a service that is down, but a service that is slow. Sometimes your application doesn’t respond on time, and that creates chaos across the whole ecosystem. To simulate this behavior, you can inject network latency as a chaos experiment by creating another virtual service.

    Now, if you call the service endpoint in a loop, you will see delays in some of the requests.
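    Istio configures this in the virtual service’s `fault.delay` section, with `fixedDelay` and `percentage` fields. The sketch below models what a caller observes in such a loop; the 20 ms service latency, 50% share and 7 s delay are assumed values for illustration, not the article’s configuration, and the function names are hypothetical.

```typescript
// Hypothetical model of a delay fault: some requests are held for a fixed extra time.

/** Returns the latency a caller would observe for one request, in milliseconds. */
function latencyWithDelayFault(
  serviceLatencyMs: number,
  percentage: number,   // share of requests to delay, 0-100
  fixedDelayMs: number, // extra time added to a delayed request
  random: () => number  // returns values in [0, 1)
): number {
  const delayed = random() * 100 < percentage;
  return serviceLatencyMs + (delayed ? fixedDelayMs : 0);
}

/** Simulates calling the endpoint in a loop, as described above. */
function simulateLoop(requests: number, random: () => number): number[] {
  const latencies: number[] = [];
  for (let i = 0; i < requests; i++) {
    // Assumed values: 20 ms normal latency, 50% of requests delayed by 7 s.
    latencies.push(latencyWithDelayFault(20, 50, 7000, random));
  }
  return latencies;
}

console.log(simulateLoop(10, Math.random).join(" ms, ") + " ms");
```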

    Retry

    In some production services, we expect a request to be retried N times instead of failing instantly, and to be considered failed only if none of the attempts succeeds.

    To get this behavior, you can configure retries on those services as follows:

    Now a failing request to the recommendation service is retried up to 3 times before it is considered failed.
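    Istio’s retry settings live under the virtual service’s `retries` field (`attempts`, `perTryTimeout`, and `retryOn` for the conditions), and its documentation describes `attempts` as the number of retries allowed for a request. The sketch below models that loop; callWithRetries is a hypothetical helper, and treating every 5xx as retryable is an assumption of the sketch, not Istio’s default.

```typescript
// Hypothetical model of a retry policy; not Istio's implementation.

interface RetryOutcome {
  code: number;  // final HTTP status seen by the caller
  calls: number; // how many times the service was actually called
}

/**
 * Calls the service once, then retries up to `retries` more times while the
 * status is 5xx. Treating every 5xx as retryable is an assumption of this sketch.
 */
function callWithRetries(call: () => number, retries: number): RetryOutcome {
  let calls = 0;
  let code = 0;
  do {
    code = call();
    calls++;
  } while (code >= 500 && calls <= retries);
  return { code, calls };
}

// Hypothetical service that fails twice and then recovers.
const scriptedCodes = [503, 503, 200];
let nextCode = 0;
const flakyService = () => scriptedCodes[Math.min(nextCode++, scriptedCodes.length - 1)];
console.log(JSON.stringify(callWithRetries(flakyService, 3))); // {"code":200,"calls":3}
```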

    Timeout

    In the real world, many application failures are caused by timeouts, whether due to heavy load on the application or other latency in serving requests. Your application should define proper timeouts before declaring a request as “Failed”. You can use Istio to simulate this timeout mechanism and give your application a limited amount of time to respond before giving up.

    Wait only for N seconds before failing and giving up.

    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: recommendation
    spec:
      hosts:
      - recommendation
      http:
      - route:
        - destination:
            host: recommendation
        timeout: 1.000s
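    The route-level `timeout` bounds how long the caller waits for the upstream. The sketch below models that rule with plain numbers; applyRouteTimeout is a hypothetical helper, and the 504 status is an assumption based on what Envoy typically returns for an upstream timeout.

```typescript
// Hypothetical model of a route timeout; Envoy enforces the real one.

interface ObservedResponse {
  code: number;      // status the caller receives
  elapsedMs: number; // how long the caller waited
}

/** The caller waits at most `timeoutMs`; slower upstream responses become a 504. */
function applyRouteTimeout(upstreamLatencyMs: number, upstreamCode: number, timeoutMs: number): ObservedResponse {
  if (upstreamLatencyMs > timeoutMs) {
    // The caller gives up at the timeout and never sees the upstream response.
    return { code: 504, elapsedMs: timeoutMs };
  }
  return { code: upstreamCode, elapsedMs: upstreamLatencyMs };
}

// With a 1 s timeout, a 20 ms response passes through and a 7 s one times out.
console.log(JSON.stringify(applyRouteTimeout(20, 200, 1000)));   // {"code":200,"elapsedMs":20}
console.log(JSON.stringify(applyRouteTimeout(7020, 200, 1000))); // {"code":504,"elapsedMs":1000}
```

    Combined with the delay fault above, this is what makes the timeout visible: delayed requests stop costing the caller seven seconds and fail fast instead.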

    Conclusion

    Istio lets you inject faults at the HTTP layer so that you can test and improve your application’s resilience and stability. But the application itself must handle the failures and take an appropriate course of action. Chaos Engineering is only effective when you expect your application to tolerate failures; there is no point in running chaos experiments if you already know your application is broken.

  • Building Scalable and Efficient React Applications Using GraphQL and Relay

    Building a React application is not only about creating a user interface. It also has tricky parts like data fetching, re-render performance, and scalability. Many libraries and frameworks try to solve these problems, like Redux, Sagas, etc. But these tools come with their own set of difficulties.

    Redux gives you a single data source, but all the data fetching and rendering logic is handled by developers. Immer gives you immutable data structures, but one needs to handle the re-render performance of applications.

    GraphQL helps developers design and expose APIs on the backend, but client-side tools have not taken full advantage of the single endpoint and data schema that GraphQL provides.

    In this article, we will look at Relay as a GraphQL client: the advantages of using it in your application and the conventions required to integrate it. We’ll also cover how following those conventions gives you a better developer experience and a performant app, and how applications built with Relay are modular, scalable, efficient and, by default, resilient to change.

    About Relay

    Relay is a JavaScript framework to declaratively fetch and manage your GraphQL data inside a React application. Relay uses static queries and ahead-of-time compilation to help you build a high-performance app. 

    But as the saying goes, “With great power comes great responsibility.” Relay comes with a set of costs (conventions) that are well worth it compared with the benefits you get. We will explore these trade-offs in this article.

    The Relay framework is built of multiple modules:

    1. The compiler: This is a set of modules designed to extract GraphQL code from across the codebase and do validations and optimizations during build time.

    2. Relay runtime: A high-performance GraphQL runtime that features a normalized cache for objects and highly optimized read/write operations, simplified abstractions over fetching data fields, garbage collection, subscriptions, and more.

    3. React-relay: This provides the high-level APIs to integrate React with the Relay runtime.

    The Relay compiler runs as a separate process, much like webpack does during development. It watches and compiles the GraphQL code, and when it finds errors it does not build your code, which prevents bugs from reaching higher environments.

    Fragments

    Fragments are at the heart of how Relay blends with GraphQL. A fragment is a selection of fields on a GraphQL type. 

    fragment Avatar_user on User {
      avatarImgUrl
      firstName
      lastName
      userName
    }

    Note that the fragment name in the definition above, Avatar_user, is not arbitrary. One of the Relay framework’s important conventions is that fragment names are globally unique and follow the structure <ModuleName>_<propertyName>. In Avatar_user, Avatar is the module (component) name and user is the property name.
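    The convention is mechanical enough to check in code. The TypeScript sketch below is a hypothetical checker, not the Relay compiler’s own validation: parseFragmentName and findDuplicateNames are illustrative names, and requiring property names to start with a lowercase letter is an assumption of the sketch.

```typescript
// Hypothetical checker for the <ModuleName>_<propertyName> naming convention.
// The Relay compiler does its own validation; this only illustrates the rule.

interface FragmentName {
  moduleName: string;
  propertyName: string;
}

/** Splits a fragment name for the given module, or returns null if it breaks the convention. */
function parseFragmentName(fragmentName: string, moduleName: string): FragmentName | null {
  const prefix = moduleName + "_";
  if (fragmentName.indexOf(prefix) !== 0) return null;
  const propertyName = fragmentName.slice(prefix.length);
  // Assumption of this sketch: property names are identifiers starting with a lowercase letter.
  if (!/^[a-z][A-Za-z0-9]*$/.test(propertyName)) return null;
  return { moduleName, propertyName };
}

/** Fragment names must be globally unique; returns each duplicated name once. */
function findDuplicateNames(fragmentNames: string[]): string[] {
  return fragmentNames.filter(
    (n, i) => fragmentNames.indexOf(n) !== i && fragmentNames.lastIndexOf(n) === i
  );
}

console.log(JSON.stringify(parseFragmentName("Avatar_user", "Avatar")));
// {"moduleName":"Avatar","propertyName":"user"}
console.log(parseFragmentName("Avatar_user", "BlogPostHead")); // null
```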

    This fragment can then be reused throughout the queries instead of selecting the fields manually to render the avatar in each view.

    In the query below, both the author and the first two users who liked the blog post select the same fields needed to render an avatar, so both can use the Avatar_user fragment instead:

    query GetBlogPost($postId: ID!) {
      blogPostById(id: $postId) {
        author {
          firstName
          lastName
          avatarImgUrl
          userName
        }
        likedBy(first: 2) {
          edges {
            node {
              firstName
              lastName
              avatarImgUrl
              userName
            }
          }
        }
      }
    }

    Now, our new query with fragments looks like this:

    query GetBlogPost($postId: ID!) {
      blogPostById(id: $postId) {
        author {
          ...Avatar_user
        }
        likedBy(first: 2) {
          edges {
            node {
              ...Avatar_user
            }
          }
        }
      }
    }

    Fragments not only let us reuse definitions but, more importantly, let us add or remove the fields needed to render our avatar as the application evolves.

    Another highly important client-side convention is colocation: the data a component requires lives inside the component. This makes maintenance and extension much easier. Just as React lets us break our UI into components and compose different views, fragments in Relay let us split the data definitions and colocate them with the view definitions.

    So, a good practice is to define one or more fragments containing the data the component needs to render. This way, a component declares which fields of the User type it depends on, independently of its parent component. In the example above, the <Avatar> component renders an avatar using the fields specified in the Avatar_user fragment.

    How Relay leverages the GraphQL Fragment

    Relay expects every component to declare all the data it needs to render, alongside the component itself. Relay uses fragments to tie a component to its data requirements. This convention mandates that every component list the fields it needs access to.

    Other advantages of the above are:

    1. Components are not dependent on data they don’t explicitly request.
    2. Components are modular and self-contained.
    3. Reusing and refactoring the components becomes easier.

    Performance

    In Relay, a component re-renders only when the exact fields it selects change, and this works out of the box. The fragment subscribes to updates only for the data the component selects. This lets Relay optimize how the view is updated, so performance does not degrade as the codebase scales.

    Now, let’s look at an example of components in a single post of a blog application. Here is a wireframe of a sample post to give an idea of the data and view required.

    Now, let’s write a plain query without Relay, which will fetch all the data in a single query. It will look like this for the above wireframe:

    query GetBlogPost($postId: ID!) {
      blogPostById(id: $postId) {
        author {
          firstName
          lastName
          avatarUrl
          shortBio
        }
        title
        coverImgUrl
        createdAt
        tags {
          slug
          shortName
        }
        body
        likedByMe
        likedBy(first: 2) {
          totalCount
          edges {
            node {
              firstName
              lastName
              avatarUrl
            }
          }
        }
      }
    }

    This one query has all the necessary data. Let’s also write down a sample structure of UI components for the query above:

    <BlogPostContainer>
      <BlogPostHead>
        <BlogPostAuthor>
          <Avatar />
        </BlogPostAuthor>
      </BlogPostHead>
      <BlogPostBody>
        <BlogPostTitle />
        <BlogPostMeta>
          <CreatedAtDisplayer />
          <TagsDisplayer />
        </BlogPostMeta>
        <BlogPostContent />
        <LikeButton>
          <LikedByDisplayer />
        </LikeButton>
      </BlogPostBody>
    </BlogPostContainer>

    In the implementation above, we have a single query that will be managed by the top-level component. It will be the top-level component’s responsibility to fetch the data and pass it down as props. Now, we will look at how we would build this in Relay:

    import * as React from "react";
    import { GetBlogPost } from "./__generated__/GetBlogPost.graphql";
    import { useLazyLoadQuery } from "react-relay/hooks";
    import { BlogPostHead } from "./BlogPostHead";
    import { BlogPostBody } from "./BlogPostBody";
    import { graphql } from "react-relay";

    interface BlogPostProps {
      postId: string;
    }

    export const BlogPost = ({ postId }: BlogPostProps) => {
      // Root query: it only spreads the fragments of the child components.
      const { blogPostById } = useLazyLoadQuery<GetBlogPost>(
        graphql`
          query GetBlogPost($postId: ID!) {
            blogPostById(id: $postId) {
              ...BlogPostHead_blogPost
              ...BlogPostBody_blogPost
            }
          }
        `,
        // The second argument is the query's variables object.
        { postId }
      );

      if (!blogPostById) {
        return null;
      }

      // Pass the object the fragments were spread on to each child.
      return (
        <div>
          <BlogPostHead blogPost={blogPostById} />
          <BlogPostBody blogPost={blogPostById} />
        </div>
      );
    };

    First, let’s look at the query used inside the component:

    const { blogPostById } = useLazyLoadQuery<GetBlogPost>(
      graphql`
        query GetBlogPost($postId: ID!) {
          blogPostById(id: $postId) {
            ...BlogPostHead_blogPost
            ...BlogPostBody_blogPost
          }
        }
      `,
      { postId }
    );

    The useLazyLoadQuery React hook from Relay will start fetching the GetBlogPost query just as the component renders. 

    NOTE: useLazyLoadQuery is used here because it follows the common mental model of fetching data after the page loads. However, Relay encourages fetching data as early as possible with the usePreloadedQuery hook.

    For type safety, we are annotating the useLazyLoadQuery with the type GetBlogPost, which is imported from ./__generated__/GetBlogPost.graphql. This file is auto-generated and synced by the Relay compiler. It contains all the information about the types needed to be queried, along with the return type of data and the input variables for the query.

    The Relay compiler takes all the declared fragments in the codebase and generates the type files, which can then be used to annotate a particular component.

    The GetBlogPost query is defined by composing multiple fragments. Another great aspect of Relay is that there is no need to import the fragments manually. They are automatically included by the Relay compiler. Building the query by composing fragments, just like how we compose our component, is the key here. 

    Another approach is to define a query per component, so that each component takes full responsibility for its data requirements. But this approach has two problems:

    1. Multiple queries are sent to the server instead of one.

    2. The loading will be slower as components would have to wait till they render to start fetching the data.

    In the example above, GetBlogPost only deals with including the fragments for its child components, BlogPostHead and BlogPostBody. The actual data fields of the child components are hidden from it.

    When using Relay, components define their data requirement by themselves. These components can then be composed along with other components that have their own separate data. 

    At the same time, no component knows which data another component needs; it only knows the GraphQL type on which that component’s fragment is defined. Relay makes sure the right data is passed to each component and that all the fields required by a query are sent to the server.

    This allows developers to think about a component and its fragments as one unit while Relay does the heavy lifting in the background. Relay minimizes round trips to the server by composing the fragments from multiple components into a single, optimized query.

    As we said earlier, the two fragments, BlogPostHead_blogPost and BlogPostBody_blogPost, which we referenced in the query, are not imported manually. This is because Relay imposes unique fragment names globally so that the compiler can include the definitions in queries sent to the server. This eliminates the chances of errors and takes away the laborious task of referencing the fragments by hand. 
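    Because every fragment name is unique, a build step can resolve spreads by name alone. The TypeScript sketch below is a deliberately simplified, string-based model of that idea; the real Relay compiler works on parsed GraphQL documents. collectSpreads, buildOperationText and the registry are hypothetical, and the fields in BlogPostAuthor_blogPost and BlogPostBody_blogPost are made up for the example.

```typescript
// Hypothetical, string-based model of including fragment definitions automatically.
// The real Relay compiler works on parsed GraphQL documents, not on raw text.

type FragmentRegistry = Record<string, string>; // fragment name -> definition text

/** Finds named spreads such as `...Avatar_user`, skipping inline fragments (`... on Type`). */
function collectSpreads(text: string): string[] {
  const names: string[] = [];
  const spread = /\.\.\.\s*([A-Za-z_][A-Za-z0-9_]*)/g;
  let match: RegExpExecArray | null;
  while ((match = spread.exec(text)) !== null) {
    if (match[1] !== "on") names.push(match[1]);
  }
  return names;
}

/** Appends every fragment the operation needs, transitively, exactly once. */
function buildOperationText(operation: string, registry: FragmentRegistry): string {
  const included: string[] = [];
  const pending = collectSpreads(operation);
  while (pending.length > 0) {
    const fragmentName = pending.shift() as string;
    if (included.indexOf(fragmentName) !== -1) continue;
    if (!Object.prototype.hasOwnProperty.call(registry, fragmentName)) {
      throw new Error(`Unknown fragment: ${fragmentName}`);
    }
    included.push(fragmentName);
    pending.push(...collectSpreads(registry[fragmentName]));
  }
  return [operation].concat(included.map((n) => registry[n])).join("\n\n");
}

const registry: FragmentRegistry = {
  BlogPostHead_blogPost: "fragment BlogPostHead_blogPost on BlogPost { title coverImgUrl ...BlogPostAuthor_blogPost }",
  BlogPostAuthor_blogPost: "fragment BlogPostAuthor_blogPost on BlogPost { author { ...Avatar_user } }",
  BlogPostBody_blogPost: "fragment BlogPostBody_blogPost on BlogPost { body }",
  Avatar_user: "fragment Avatar_user on User { avatarImgUrl firstName lastName userName }",
};
const blogPostQuery =
  "query GetBlogPost($postId: ID!) { blogPostById(id: $postId) { ...BlogPostHead_blogPost ...BlogPostBody_blogPost } }";
console.log(buildOperationText(blogPostQuery, registry));
```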

    if (!blogPostById) {
      return null;
    }

    return (
      <div>
        <BlogPostHead blogPost={blogPostById} />
        <BlogPostBody blogPost={blogPostById} />
      </div>
    );

    In the rendering logic above, we render <BlogPostHead/> and <BlogPostBody/> and pass the blogPostById object as a prop. We pass it because it is the object in the query on which the fragments needed by the two components are spread. This is how Relay transfers fragment data: because we spread both fragments on this object, it is guaranteed to satisfy both components.

    In simpler terms: to pass fragment data, we pass the object on which the fragment is spread, and the component then uses this object to read the actual fragment data. Relay’s type system makes sure that the object passed has the required fragment spread on it.

    The previous component, BlogPost, is the parent component, i.e., the component with the root query. The root query is necessary because a fragment cannot be fetched in isolation; it must be included in a root query in a parent component. The parent can, in turn, be a fragment itself, as long as a root query exists higher in the hierarchy. Now, we will build the BlogPostHead component using fragments:

    import * as React from "react";
    import { useFragment } from "react-relay/hooks";
    import { graphql } from "react-relay";
    import {
      BlogPostHead_blogPost$key,
      BlogPostHead_blogPost
    } from "./__generated__/BlogPostHead_blogPost.graphql";
    import { BlogPostAuthor } from "./BlogPostAuthor";
    import { BlogPostLikeControls } from "./BlogPostLikeControls";

    interface BlogPostHeadProps {
      blogPost: BlogPostHead_blogPost$key;
    }

    export const BlogPostHead = ({ blogPost }: BlogPostHeadProps) => {
      const blogPostData = useFragment<BlogPostHead_blogPost>(
        graphql`
          fragment BlogPostHead_blogPost on BlogPost {
            title
            coverImgUrl
            ...BlogPostAuthor_blogPost
            ...BlogPostLikeControls_blogPost
          }
        `,
        blogPost
      );

      return (
        <div>
          <img src={blogPostData.coverImgUrl} />
          <h1>{blogPostData.title}</h1>
          <BlogPostAuthor blogPost={blogPostData} />
          <BlogPostLikeControls blogPost={blogPostData} />
        </div>
      );
    };

    NOTE: In our example, BlogPostHead and BlogPostBody each define only one fragment, but in general a component can have any number of fragments on different GraphQL types, and even more than one fragment on the same type.

    In the component above, two type definitions, BlogPostHead_blogPost$key and BlogPostHead_blogPost, are imported from the file BlogPostHead_blogPost.graphql, generated by the Relay compiler. The compiler extracts the fragment from the component’s source and generates these types. This process is followed for all GraphQL code: queries, mutations, fragments and subscriptions.

    BlogPostHead_blogPost holds the fragment’s data type, which is passed to the useFragment hook to ensure type safety when using the data from the fragment. The other import, BlogPostHead_blogPost$key, is used in the BlogPostHeadProps interface; this type makes sure we pass the right object to useFragment, otherwise the type checker reports an error at build time. In the child component above, the blogPost object is received as a prop and passed to useFragment as the second parameter. If the blogPost object did not have the correct fragment, i.e., BlogPostHead_blogPost, spread on it, we would get a type error. Even if another fragment with exactly the same field selection were spread on it, Relay makes sure that we use the right fragment with useFragment. This allows you to update fragment definitions without affecting other components.

    Data masking

    In our example, the fragment BlogPostHead_blogPost explicitly selects two fields for the component:

    1. title
    2. coverImgUrl

    This is because only these two fields are used in the view of the <BlogPostHead> component. Each component can access only the fields selected in its own fragment, so even if another fragment, such as BlogPostAuthor_blogPost, also selected title and coverImgUrl, neither component would gain access to fields that only the other fragment selects. This is enforced by Relay both at compile time and at runtime. This safety feature makes it impossible for components to depend on data they do not explicitly select, so developers can refactor components without risking breaking others. To reiterate, all components and their data dependencies are self-contained.

    The data for this component, i.e., title and coverImgUrl, is not accessible in the parent component, BlogPost, even though the parent is the one passing the blogPost object down as a prop. The data becomes available only through the useFragment React hook, which takes two arguments: the fragment definition and the object on which that fragment is spread. It returns the fields selected by that particular fragment.
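    The masking behaviour can be modelled at runtime with a few lines of TypeScript. This is an illustrative sketch rather than Relay's implementation; Fragment, Ref, readFragment, the store contents, and the authorName field are assumptions. The parent holds only an opaque reference (a record ID plus the names of the fragments spread on it), and readFragment, playing the role of useFragment, returns just the fields the given fragment selects. It reads data already in the store and makes no network request.

```typescript
// Illustrative model of data masking; not Relay's implementation.
type Fragment = { readonly name: string; readonly fields: readonly string[] };
type Ref = { readonly id: string; readonly spreads: ReadonlySet<string> };

// Hypothetical fragment definitions mirroring the article's example.
const BlogPostHead_blogPost: Fragment = {
  name: "BlogPostHead_blogPost",
  fields: ["title", "coverImgUrl"],
};
const BlogPostAuthor_blogPost: Fragment = {
  name: "BlogPostAuthor_blogPost",
  fields: ["authorName"], // hypothetical field
};

// The store holds every field the query fetched for record "post:1".
const store = new Map<string, Record<string, unknown>>([
  ["post:1", { title: "Hello Relay", coverImgUrl: "/cover.png", authorName: "Ada" }],
]);

// Stand-in for useFragment: a masked read from the store, no fetching.
function readFragment(fragment: Fragment, ref: Ref): Record<string, unknown> {
  if (!ref.spreads.has(fragment.name)) {
    throw new Error(`${fragment.name} is not spread on this object`);
  }
  const record: Record<string, unknown> = store.get(ref.id) ?? {};
  const masked: Record<string, unknown> = {};
  for (const field of fragment.fields) {
    masked[field] = record[field];
  }
  return masked;
}

// The parent (BlogPost) holds only this opaque reference, not the fields.
const blogPost: Ref = {
  id: "post:1",
  spreads: new Set(["BlogPostHead_blogPost", "BlogPostAuthor_blogPost"]),
};

const headData = readFragment(BlogPostHead_blogPost, blogPost);
const authorData = readFragment(BlogPostAuthor_blogPost, blogPost);
console.log(headData); // { title: 'Hello Relay', coverImgUrl: '/cover.png' }
console.log(authorData); // { authorName: 'Ada' }
```

    The same shape explains why BlogPost cannot see title or coverImgUrl: it only ever holds the reference, which has no fields of its own.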

    Just as we spread the fragment for the BlogPostHead component in the BlogPost root query, we can extend this to the child components of BlogPostHead. Because BlogPostHead renders <BlogPostAuthor /> and <BlogPostLikeControls />, we spread their fragments, BlogPostAuthor_blogPost and BlogPostLikeControls_blogPost, in BlogPostHead's fragment.

    NOTE: The useFragment hook does not fetch any data. It can be thought of as a selector that reads only what its fragment needs from the data already in the Relay store.

    Performance

    When a component uses a fragment, it subscribes only to the data it depends on. In our example, BlogPostHead automatically re-renders only when the title or coverImgUrl field of the blog post it renders changes. Since the BlogPostAuthor_blogPost fragment does not select those fields, the BlogPostAuthor component does not re-render when they change. Subscriptions to updates are made at the fragment level, and this performance feature works out of the box in Relay.
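    A toy store makes fragment-level subscriptions easier to picture. This sketch is a simplified model and not Relay's store: Store, subscribe, update, the render counters, and the authorName field are hypothetical. Each component subscribes to one record and to the fields its fragment selects, and an update notifies only the subscriptions whose selected fields actually changed.

```typescript
// Toy model of fragment-level subscriptions; not Relay's actual store.
type Listener = () => void;
type Subscription = { id: string; fields: readonly string[]; listener: Listener };

class Store {
  private records = new Map<string, Record<string, unknown>>();
  private subscriptions: Subscription[] = [];

  // A component subscribes to one record and to the fields its fragment selects.
  subscribe(id: string, fields: readonly string[], listener: Listener): void {
    this.subscriptions.push({ id, fields, listener });
  }

  // Merge new values into a record, then notify only the subscriptions
  // whose selected fields actually changed.
  update(id: string, patch: Record<string, unknown>): void {
    const prev: Record<string, unknown> = this.records.get(id) ?? {};
    const changed = Object.keys(patch).filter((field) => prev[field] !== patch[field]);
    this.records.set(id, { ...prev, ...patch });
    for (const sub of this.subscriptions) {
      if (sub.id === id && sub.fields.some((field) => changed.includes(field))) {
        sub.listener();
      }
    }
  }
}

// Render counters standing in for React re-renders.
const renders = { BlogPostHead: 0, BlogPostAuthor: 0 };

const store = new Store();
store.update("post:1", { title: "Hello", coverImgUrl: "/cover.png", authorName: "Ada" });
store.subscribe("post:1", ["title", "coverImgUrl"], () => { renders.BlogPostHead++; });
store.subscribe("post:1", ["authorName"], () => { renders.BlogPostAuthor++; });

store.update("post:1", { title: "Hello, Relay" }); // notifies BlogPostHead only
store.update("post:1", { title: "Hello, Relay" }); // same value: notifies nobody
console.log(renders); // { BlogPostHead: 1, BlogPostAuthor: 0 }
```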

    Let us now see how data reaches components in GraphQL frameworks other than Relay. The data rendered in the view comes from an operation that requests data from the server, i.e., a query or mutation. We write the query that fetches data from the server, and that data is passed down as props to the components that need it. The data flows from the root component, i.e., the component with the query, down to its descendants.

    Let’s look at a graphical representation of the data flow in other GraphQL frameworks:

    Image source: Dev.to

    NOTE: The framework data store shown here is what most frameworks call the cache. The diagram shows the following steps:

    1. The Profile component sends the ProfileQuery operation to a GraphQL server.

    2. The returned data is kept in a framework-specific data store.

    3. The data is passed to the view that renders it.

    4. The view then passes the data on to the child components that need it, e.g., Name, Avatar, and Bio. Finally, React renders the view.

    In contrast, the Relay framework takes a different approach:

    Image source: Dev.to

    Let's break down the approach taken by Relay:

    • The first part does not change. A query is still sent to the GraphQL server, and the fetched data is stored in the Relay data store.
    • What Relay does next is different. Components get their data directly from the data store (the cache). This is possible because fragments let Relay integrate deeply with each component's data requirements. A component's fragment reads its data straight from the data store instead of relying on data passed down as props. The query does pass some information down to the fragments, but only what is needed to look up the particular data in the store; the data itself is read by the fragment.

    To sum up the comparison: in other frameworks (like Apollo), components use the query as their data source, and how the root component executing the query sends data to its descendants is left to us. Relay instead lets each component take care of reading the data it needs from the data store.

    In the approach used by other GraphQL frameworks, the query is the data source, so updates in the data store force the component holding the query to re-render. This re-render cascades down to any number of components, even those that have nothing to do with the updated data and only act as a layer passing data from parent to child. In the Relay approach, components subscribe directly to updates for the data they use, so only the components whose data changed re-render. This helps the app stay fast as it grows in size and complexity.
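    The difference can be reduced to a toy comparison using the Profile, Name, Avatar, and Bio components from the diagrams. This is a simplified sketch, not a model of React, Apollo, or Relay internals: the field names and both functions are hypothetical, and a component "rendering" just means its name appears in the returned list. It assumes that, in the prop-passing approach, children receive the query result from the root and are not memoized.

```typescript
// Toy comparison of which components re-render; not a model of React, Apollo,
// or Relay internals.
type ProfileData = { readonly name: string; readonly avatarUrl: string; readonly bio: string };

// Hypothetical leaf components from the diagram and the one field each displays.
const leaves: Record<string, keyof ProfileData> = { Name: "name", Avatar: "avatarUrl", Bio: "bio" };

// Prop passing: the root holding the query gets a new result, re-renders, and
// passes the result down, so every (non-memoized) child re-renders as well.
function rendersWithPropPassing(prev: ProfileData, next: ProfileData): string[] {
  if (prev === next) return [];
  return ["Profile", ...Object.keys(leaves)];
}

// Store subscriptions: each leaf re-renders only if its own field changed;
// the root that ran the query is not involved.
function rendersWithSubscriptions(prev: ProfileData, next: ProfileData): string[] {
  return Object.keys(leaves).filter((component) => {
    const field = leaves[component];
    return prev[field] !== next[field];
  });
}

const previous: ProfileData = { name: "Ada", avatarUrl: "/ada.png", bio: "Engineer" };
const updated: ProfileData = { ...previous, bio: "Engineer and writer" };

console.log(rendersWithPropPassing(previous, updated)); // [ 'Profile', 'Name', 'Avatar', 'Bio' ]
console.log(rendersWithSubscriptions(previous, updated)); // [ 'Bio' ]
```

    Real applications can narrow the gap in the prop-passing model with memoization, but that is extra work the developer has to do per component, whereas the subscription model scopes re-renders by construction.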

    Developer Experience

    Relay relieves developers of the responsibility of routing data down from the query to the components that need it, which eliminates a whole class of developer errors. Since a component cannot access data that it only passes down the component tree, it cannot accidentally or deliberately depend on that data. As long as we follow the conventions discussed above, Relay does the hard work for us.

    Conclusion

    To summarize, here is the work Relay does for us and its effects:

    • The type system of the Relay framework makes sure each component gets exactly the data it needs. Everything in Relay revolves around fragments.
    • In Relay, fragments are coupled and colocated with components, which lets Relay mask each component's data requirements from the outside world. This improves readability and modularity.
    • By default, Relay takes care of performance, as components re-render only when the exact data they use changes in the data store.
    • Type generation is a main feature of the Relay compiler. Through type generation, interactions with a fragment's data are type-safe.

    The conventions enforced by Relay's philosophy and architecture allow it to take advantage of the information available about your components. It knows their exact data dependencies and types, and it uses this information to do a lot of work that developers would otherwise have to handle themselves.
