Category: Type

  • An Introduction to React Fiber – The Algorithm Behind React

    In this article, we will learn about React Fiber—the core algorithm behind React. React Fiber is the new reconciliation algorithm in React 16. You’ve most likely heard of the virtual DOM from React 15. Its reconciler is the old algorithm (also known as the Stack Reconciler), so called because it uses a stack internally. That same reconciler is shared across different renderers, such as React DOM and React Native. So, calling it the virtual DOM may lead to confusion.

    So without any delay, let’s see what React Fiber is.

    Introduction

    React Fiber is a completely backward-compatible rewrite of the old reconciler. This new reconciliation algorithm from React is called Fiber Reconciler. The name comes from fiber, which it uses to represent the node of the DOM tree. We will go through fiber in detail in later sections.

    The main goals of the Fiber reconciler are incremental rendering, smoother rendering of UI animations and gestures, and responsiveness to user interactions. The reconciler also allows you to split rendering work into multiple chunks and spread it over multiple frames. It also adds the ability to assign a priority to each unit of work, and to pause, reuse, and abort work.

    Other features introduced alongside Fiber include returning multiple elements from a render function, better error handling (we can use the componentDidCatch method to get clearer error messages), and portals.

    While computing rendering updates, React yields back to the main thread periodically. As a result, high-priority work can jump ahead of low-priority work. React defines priorities internally for each kind of update.

    Before going into technical details, I recommend you become familiar with the following terms, which will help you understand React Fiber.

    Prerequisites

    Reconciliation

    As explained in the official React documentation, reconciliation is the algorithm for diffing two trees. When the UI renders for the first time, React creates a tree of nodes, where each node represents a React element. This virtual tree (known as the virtual DOM) is a copy of the rendered DOM tree. After any update to the UI, React recursively compares every node of the new tree with the previous one. The cumulative changes are then passed to the renderer.

    Scheduling

    As explained in the React documentation, suppose we have some low-priority work (like a large computing function or rendering recently fetched elements) and some high-priority work (such as an animation). There should be a way to prioritize the high-priority work over the low-priority work. In the old stack reconciler implementation, the recursive traversal and the render calls for the whole updated tree happened in a single, uninterruptible flow. This could lead to dropped frames.

    Scheduling can be time-based or priority-based. The updates should be scheduled according to the deadline. The high-priority work should be scheduled over low-priority work.
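    The idea can be sketched with a toy priority queue. (Purely illustrative, not React’s actual scheduler; the numeric priorities below are made up for the example, with a smaller number meaning higher priority.)

    ```javascript
    // Toy scheduler: run the highest-priority pending task first,
    // regardless of the order in which tasks were scheduled.
    const taskQueue = [];

    function schedule(priority, run) {
      taskQueue.push({ priority, run });
    }

    function flush() {
      taskQueue.sort((a, b) => a.priority - b.priority); // high priority first
      const results = [];
      while (taskQueue.length > 0) {
        results.push(taskQueue.shift().run());
      }
      return results;
    }

    schedule(5, () => 'render fetched list'); // low priority
    schedule(1, () => 'animation frame');     // high priority
    ```

    Flushing this queue runs the animation task before the list render, even though it was scheduled later.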

    requestIdleCallback 

    requestAnimationFrame schedules the high-priority function to be called before the next animation frame. Similarly, requestIdleCallback schedules the low-priority or non-essential function to be called in the free time at the end of the frame. 

     requestIdleCallback(lowPriorityWork);

    This shows the usage of requestIdleCallback. lowPriorityWork is a callback function that will be called in the free time at the end of the frame.

    function lowPriorityWork(deadline) {
      // Work while there is idle time left and work remaining
      while (deadline.timeRemaining() > 0 && workList.length > 0) {
        performUnitOfWork();
      }

      // If work remains, schedule it again for the browser's next idle period
      if (workList.length > 0) {
        requestIdleCallback(lowPriorityWork);
      }
    }

    When this callback is invoked, it receives a deadline object as an argument. As you can see in the snippet above, the timeRemaining function returns the amount of idle time remaining in the current frame. While this time is greater than zero, we can perform the work that’s needed. And if the work is not completed, the last line schedules the callback again for a later frame.
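    To make the deadline mechanics concrete, here is a small simulation that drives the same kind of loop with a fake deadline object instead of the real browser API. (workList and performUnitOfWork are placeholders, as in the snippet above.)

    ```javascript
    // Fake-deadline simulation of splitting work across idle periods.
    const workList = ['unit-1', 'unit-2', 'unit-3'];
    const done = [];

    function performUnitOfWork() {
      done.push(workList.shift());
    }

    function runFrame(idleUnits) {
      let remaining = idleUnits;
      const deadline = { timeRemaining: () => remaining };
      while (deadline.timeRemaining() > 0 && workList.length > 0) {
        performUnitOfWork();
        remaining -= 1; // pretend each unit of work uses one unit of idle time
      }
      return workList.length > 0; // true means: schedule another callback
    }
    ```

    With three units of work and two units of idle time per frame, the first runFrame(2) returns true, leaving one unit for the next frame, and the second runFrame(2) returns false.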

    Now we are ready to look at how the fiber object itself is structured and how React Fiber works.

    Structure of fiber

    A fiber (lowercase ‘f’) is a simple JavaScript object. It represents a React element, or a node of the DOM tree; it’s a unit of work. In comparison, Fiber (capital ‘F’) refers to the React Fiber reconciler.

    This example shows a simple React component that renders into the root div.

    function App() {
        return (
          <div className="wrapper">
            <div className="list">
              <div className="list_item">List item A</div>
              <div className="list_item">List item B</div>
            </div>
            <div className="section">
              <button>Add</button>
              <span>No. of items: 2</span>
            </div>
          </div>
        );
      }
     
      ReactDOM.render(<App />, document.getElementById('root'));

    It’s a simple component that shows a list of items. (I have replaced the usual .map iteration over data with two hard-coded list items just to keep the example simple.) There is also a button, and a span that shows the number of list items.

    As mentioned earlier, a fiber represents a React element. While rendering for the first time, React goes through each React element and creates a tree of fibers. (We will see how it builds this tree in later sections.)

    It creates a fiber for each individual React element in the example above: a fiber, say W, for the div with the class wrapper, then a fiber L for the div with the class list, and so on. Let’s name the fibers for the two list items LA and LB.

    In a later section, we will see how React iterates over the elements and what the final structure of the tree looks like. Though we call it a tree, React Fiber actually creates a linked list of nodes, where each node is a fiber, with relationships between parent, child, and siblings. React uses a return pointer to reference the parent node, which is where a child fiber returns after completing its work. So, in the above example, LA’s return is L, and its sibling is LB.
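    To make the pointers concrete, here is how the fibers from our example could be linked using plain objects. (Illustrative only; these are not React’s real Fiber objects.)

    ```javascript
    // Hand-built fiber relationships for the wrapper/list part of the example.
    function createFiber(name) {
      return { name, child: null, sibling: null, return: null };
    }

    const W = createFiber('wrapper');
    const L = createFiber('list');
    const LA = createFiber('list item A');
    const LB = createFiber('list item B');

    W.child = L;     // a parent points only to its first child
    L.return = W;
    L.child = LA;
    LA.return = L;
    LA.sibling = LB; // LB is reached through LA, not through L.child
    LB.return = L;   // siblings share the same return fiber
    ```

    Note that a parent holds a pointer to its first child only; the remaining children are reached through the sibling chain.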

    So, how does this fiber object actually look?

    Below is the definition of type, as defined in the React codebase. I have removed some extra props and kept some comments to understand the meaning of the properties. You can find the detailed structure in the React codebase.

    export type Fiber = {
        // Tag identifying the type of fiber.
        tag: TypeOfWork,
     
        // Unique identifier of this child.
        key: null | string,
     
        // The value of element.type which is used to preserve the identity during
        // reconciliation of this child.
        elementType: any,
     
        // The resolved function/class associated with this fiber.
        type: any,
     
        // The local state associated with this fiber.
        stateNode: any,
     
        // Remaining fields belong to Fiber
     
        // The Fiber to return to after finishing processing this one.
        // This is effectively the parent.
        // It is conceptually the same as the return address of a stack frame.
        return: Fiber | null,
     
        // Singly Linked List Tree Structure.
        child: Fiber | null,
        sibling: Fiber | null,
        index: number,
     
        // The ref last used to attach this node.
        ref: null | (((handle: mixed) => void) & {_stringRef: ?string, ...}) | RefObject,
     
        // Input is the data coming into process this fiber. Arguments. Props.
        pendingProps: any, // This type will be more specific once we overload the tag.
        memoizedProps: any, // The props used to create the output.
     
        // A queue of state updates and callbacks.
        updateQueue: mixed,
     
        // The state used to create the output
        memoizedState: any,
     
        mode: TypeOfMode,
     
        // Effect
        effectTag: SideEffectTag,
        subtreeTag: SubtreeTag,
        deletions: Array<Fiber> | null,
     
        // Singly linked list fast path to the next fiber with side-effects.
        nextEffect: Fiber | null,
     
        // The first and last fiber with side-effect within this subtree. This allows
        // us to reuse a slice of the linked list when we reuse the work done within
        // this fiber.
        firstEffect: Fiber | null,
        lastEffect: Fiber | null,
     
        // This is a pooled version of a Fiber. Every fiber that gets updated will
        // eventually have a pair. There are cases when we can clean up pairs to save
        // memory if we need to.
        alternate: Fiber | null,
      };

    How does React Fiber work?

    Next, we will see how React Fiber creates the linked-list tree and what it does when there is an update.

    Before that, let’s explain what the current and workInProgress trees are and how tree traversal happens.

    The tree that is currently flushed to render the UI is called the current tree. Whenever there is an update, Fiber builds a workInProgress tree from the updated React elements. React performs all work on this workInProgress tree and uses it for the next render. Once the workInProgress tree is rendered on the UI, it becomes the current tree.

    Fig:- Current and workInProgress trees

    Fiber tree traversal happens like this:

    • Start: Fiber starts traversal from the topmost React element and creates a fiber node for it. 
    • Child: Then, it goes to the child element and creates a fiber node for this element. This continues until the leaf element is reached. 
    • Sibling: Now, it checks for the sibling element if there is any. If there is any sibling, it traverses the sibling subtree until the leaf element of the sibling. 
    • Return: If there is no sibling, then it returns to the parent. 

    Every fiber has a child (or null if there is no child), a sibling, and a return property (as you saw in the structure of fiber in the earlier section). These are the pointers that let the fiber tree work as a linked list.
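    The steps above can be sketched as a small function over a hand-built tree, using the fiber names from our example. (Illustrative only; React’s real loop performs begin/complete work at each node instead of just recording names.)

    ```javascript
    // Build a minimal linked tree: HostRoot -> App -> W -> (L -> (LA, LB), S -> (SB, SS)).
    function fiber(name) {
      return { name, child: null, sibling: null, return: null };
    }

    const nodes = {};
    for (const name of ['HostRoot', 'App', 'W', 'L', 'LA', 'LB', 'S', 'SB', 'SS']) {
      nodes[name] = fiber(name);
    }

    function link(parent, ...children) {
      children.forEach((child, i) => {
        child.return = parent;
        if (i === 0) parent.child = child;
        else children[i - 1].sibling = child;
      });
    }

    link(nodes.HostRoot, nodes.App);
    link(nodes.App, nodes.W);
    link(nodes.W, nodes.L, nodes.S);
    link(nodes.L, nodes.LA, nodes.LB);
    link(nodes.S, nodes.SB, nodes.SS);

    // Child -> Sibling -> Return walk, exactly as described in the steps above.
    function traverse(root) {
      const visited = [];
      let node = root;
      while (node) {
        visited.push(node.name);                          // process this fiber
        if (node.child) { node = node.child; continue; }  // Child
        while (node && !node.sibling && node !== root) {
          node = node.return;                             // Return
        }
        if (!node || node === root) break;
        node = node.sibling;                              // Sibling
      }
      return visited;
    }
    ```

    Calling traverse(nodes.HostRoot) visits HostRoot, App, W, L, LA, LB, S, SB, SS, in that order.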

    Fig:- React Fiber tree traversal

    Let’s take the same example, but let’s name the fibers that correspond to the specific React elements.

    function App() {    // App
        return (
          <div className="wrapper">    // W
            <div className="list">    // L
              <div className="list_item">List item A</div>    // LA
              <div className="list_item">List item B</div>    // LB
            </div>
            <div className="section">   // S
              <button>Add</button>   // SB
              <span>No. of items: 2</span>   // SS
            </div>
          </div>
        );
      }
     
      ReactDOM.render(<App />, document.getElementById('root'));  // HostRoot

    First, we will quickly cover the mounting stage where the tree is created, and after that, we will see the detailed logic behind what happens after any update.

    Initial render

    The App component is rendered into the root div, which has the id root.

    Before traversing further, React Fiber creates a root fiber. Every fiber tree has one root node; in our case, it’s HostRoot. There can be multiple root fibers if we mount multiple React apps into the DOM.

    Before the first render, there is no tree. React Fiber traverses the output of each component’s render function and creates a fiber node in the tree for each React element. It uses createFiberFromTypeAndProps to convert React elements to fibers. The React element can come from a class component or be a host component like div or span. For a class component, it creates an instance; for a host component, it takes the data/props from the React element.

    So, as shown in the example, it creates a fiber for App. Going further, it creates one more fiber, W, then goes to the child div and creates a fiber L. In the same way, it creates the fibers LA and LB for its children. The fiber LA will have L as its return fiber (which can also be called its parent in this case) and LB as its sibling.

    So, this is how the final fiber tree will look.

    Fig:- React Fiber Relationship

    This is how the nodes of a tree are connected using the child, sibling, and return pointers.

    Update Phase

    Now, let’s cover the second case: an update, say one triggered by setState.

    So, at this point, Fiber already has the current tree. For every update, it builds a workInProgress tree. It starts with the root fiber and traverses the tree down to the leaf nodes. Unlike the initial render phase, it doesn’t create a new fiber for every React element. It reuses the preexisting fiber for each React element and merges in the new data/props from the updated element.
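    The alternate field from the fiber structure shown earlier is what makes this reuse possible. A rough sketch of the idea (heavily simplified from React’s createWorkInProgress; not the real implementation):

    ```javascript
    // Reuse the paired alternate fiber if it exists; otherwise clone once
    // and link the pair so later updates can reuse the same two objects.
    function createWorkInProgress(current, pendingProps) {
      let workInProgress = current.alternate;
      if (workInProgress === null) {
        // First update for this fiber: clone it and pair the two objects.
        workInProgress = { ...current, pendingProps, alternate: current };
        current.alternate = workInProgress;
      } else {
        // Later updates reuse the existing object; only the inputs change.
        workInProgress.pendingProps = pendingProps;
      }
      return workInProgress;
    }
    ```

    A second update to the same fiber returns the same workInProgress object instead of allocating a new one.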

    Earlier, in React 15, the stack reconciler was synchronous, so an update would traverse the whole tree recursively and make a copy of it. Suppose that, partway through, another update arrives with a higher priority: there is no way to pause or abort the first update and perform the second one instead.

    React Fiber divides an update into units of work. It can assign a priority to each unit of work, and it has the ability to pause, reuse, or abort a unit of work if it’s no longer needed. Each unit of work is a fiber. Fiber schedules the work over multiple frames and uses the deadline from requestIdleCallback. Every update has a defined priority: for example, an animation or user input has a higher priority than rendering a list of items from freshly fetched data. Fiber uses requestAnimationFrame for higher-priority updates and requestIdleCallback for lower-priority updates. So, while scheduling work, Fiber checks the priority of the current update and the deadline (the free time at the end of the frame).

    Fiber can process multiple units of work within a single frame if their priority is higher than the pending work, or if the deadline has not yet been reached. The next set of units of work is carried over to subsequent frames. This is what makes it possible for Fiber to pause, reuse, and abort units of work.

    So, let’s see what actually happens in the scheduled work. There are two phases to complete the work: render and commit.

    Render Phase

    The actual tree traversal and the use of the deadline happen in this phase. This is Fiber’s internal logic, so the changes made to the fiber tree during this phase aren’t visible to the user. That’s why Fiber can pause, abort, or divide the work over multiple frames.

    We can call this phase the reconciliation phase. Fiber traverses from the root of the fiber tree and processes each fiber. The workLoop function is called for every unit of work to perform the work. We can divide this processing of the work into two steps: begin and complete.

    Begin Step

    If you look at the workLoop function in the React codebase, it calls performUnitOfWork, which takes nextUnitOfWork as a parameter. This is simply the unit of work to be performed. The performUnitOfWork function internally calls the beginWork function. That is where the actual work happens on the fiber, while performUnitOfWork just drives the iteration.

    Inside the beginWork function, if the fiber doesn’t have any pending work, it just bails out of (skips) the fiber without entering the begin phase. This is how, while traversing a large tree, Fiber skips already-processed fibers and jumps directly to the fibers that have pending work. Inside the large beginWork function code block, you will find a switch block that calls the respective fiber update function depending on the fiber tag, such as updateHostComponent for host components. These functions update the fiber.

    The beginWork function returns the child fiber if there is one, or null if there is no child. The performUnitOfWork function keeps iterating, processing child fibers until it reaches a leaf node. At a leaf node, beginWork returns null since there is no child, and performUnitOfWork calls the completeUnitOfWork function. Let’s see the complete step now.

    Complete Step

    The completeUnitOfWork function completes the current unit of work by calling a completeWork function. It returns a sibling fiber, if there is one, to perform the next unit of work; otherwise, it completes the return (parent) fiber. This continues until return is null, i.e., until it reaches the root node. Like beginWork, completeWork is the function where the actual work happens, while completeUnitOfWork handles the iteration.
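    Putting the two steps together, the iteration can be sketched like this. (Loosely modeled on React’s workLoop; the function names match the ones discussed above, but the bodies are stand-ins that only record the order of work.)

    ```javascript
    const log = [];

    // "Begin" work on a fiber and descend to its child, if any.
    function beginWork(fiber) {
      log.push('begin ' + fiber.name);
      return fiber.child; // next unit of work, or null at a leaf
    }

    // Complete this fiber, then move to a sibling or complete the parent.
    function completeUnitOfWork(fiber) {
      let node = fiber;
      while (node) {
        log.push('complete ' + node.name);
        if (node.sibling) return node.sibling; // sibling still needs begin work
        node = node.return;                    // no sibling: complete the parent
      }
      return null; // past the root: all work is done
    }

    function performUnitOfWork(fiber) {
      const next = beginWork(fiber);
      return next !== null ? next : completeUnitOfWork(fiber);
    }

    function workLoop(root) {
      let nextUnitOfWork = root;
      while (nextUnitOfWork) {
        nextUnitOfWork = performUnitOfWork(nextUnitOfWork);
      }
    }

    // A tiny tree, App -> W -> (LA, LB), using fiber names from the example.
    const la = { name: 'LA', child: null, sibling: null, return: null };
    const lb = { name: 'LB', child: null, sibling: null, return: null };
    const w = { name: 'W', child: la, sibling: null, return: null };
    const app = { name: 'App', child: w, sibling: null, return: null };
    la.return = w;
    la.sibling = lb;
    lb.return = w;
    w.return = app;

    workLoop(app);
    ```

    Afterward, log shows each fiber beginning before its children and completing after them: begin App, begin W, begin LA, complete LA, begin LB, complete LB, complete W, complete App.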

    The render phase produces an effect list (a list of side-effects). These effects describe work such as inserting, updating, or deleting a node of a host component, or calling the lifecycle methods of a class component. The fibers are marked with their respective effect tags.

    After the render phase, Fiber will be ready to commit the updates. 

    Commit Phase

    This is the phase where the finished work is used to render the UI. As the result of this phase is visible to the user, it can’t be divided into partial renders. This phase is synchronous.

    At the beginning of this phase, Fiber has the current tree (the one already rendered on the UI), the finishedWork (workInProgress) tree built during the render phase, and the effect list.

    The effect list is a linked list of the fibers that have side-effects. It’s a subset of the nodes of the workInProgress tree from the render phase: the ones that have side-effects (updates). The effect list nodes are linked using the nextEffect pointer.
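    As an illustration, committing such a list is just a walk along the nextEffect pointers. (Simplified: real effect tags are bitmask flags, not strings, and the commit applies actual DOM mutations.)

    ```javascript
    // Walk the nextEffect chain and "apply" each side-effect in order.
    function commitAllEffects(firstEffect) {
      const applied = [];
      let effect = firstEffect;
      while (effect !== null) {
        applied.push(effect.effectTag + ' ' + effect.name); // e.g. a DOM mutation
        effect = effect.nextEffect;
      }
      return applied;
    }

    // Suppose only two fibers from our example ended up with side-effects.
    const spanEffect = { name: 'SS', effectTag: 'Update', nextEffect: null };
    const buttonEffect = { name: 'SB', effectTag: 'Placement', nextEffect: spanEffect };
    ```

    Here commitAllEffects(buttonEffect) applies the placement for SB and then the update for SS, without ever visiting the fibers that had no side-effects.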

    The function called during this phase is completeRoot.

    Here, the workInProgress tree becomes the current tree, as it is now the one used to render the UI. The actual DOM updates (insert, update, delete), calls to lifecycle methods, and updates related to refs happen for the nodes present in the effect list.

    That’s how the Fiber reconciler works.

    Conclusion

    This is how the React Fiber reconciler makes it possible to divide the work into multiple units. It assigns a priority to each piece of work and makes it possible to pause, reuse, and abort units of work. In the fiber tree, each node keeps track of the information needed to make all of this possible. Every fiber is a node of a linked list, connected through the child, sibling, and return references.

    There are well-documented collections of resources available if you want to learn more about React Fiber.

    Related Articles

    1. Using Formik To Build Dynamic Forms In React – Faster & Better

    2. Cleaner, Efficient Code with Hooks and Functional Programming

  • Set Up Simple S3 Deployment Workflow with Github Actions and CircleCI

    In this article, we’ll implement a continuous delivery (referred to as CD going forward) workflow for a demo React SPA, using the Serverless framework together with the Serverless Finch plugin.

    Deploying single-page applications to AWS S3 is a common use case. Manual deployment and bucket configuration can be tedious and unreliable. By using Serverless and CD platforms, we can simplify this commonly faced CD challenge.

    In almost every project we have worked on, we have built a general-purpose continuous integration (referred to as CI through the rest of this article) setup as part of our basic setups. The CI requirements might range from simple test workflows to cluster deployments.

    In this article, we’ll be focusing on a simple deployment workflow using Github Actions and CircleCI. Github Actions brought CI/CD to a wider community by simplifying the setup for CI pipelines. 

    Prerequisites

    This article assumes you have a basic understanding of CI/CD and of AWS services such as IAM and S3. The sample application uses a basic Create React App project for the deployment demo, but knowing React.js is not required. You can implement the same flow for any other SPA or bare-bones application.

    Why Github Actions?

    There have always been great tools and CI platforms, such as AWS CodePipeline, Jenkins, Travis CI, CircleCI, etc. What makes Github Actions so compelling is that it’s built inside Github. Many organizations use Github for source control, and they often have to spend time configuring repositories with CI tools. On top of that, starting with Github Actions is free.

    As Github Actions is built inside the Github ecosystem, it’s a piece of cake to get CI pipelines up and running. Github Actions also allows you to build your own actions. However, there are some limitations, because the platform is quite new compared to the others.

    Why CircleCI?

    CircleCI has been in the market for almost a decade, providing CI/CD solutions. One of many reasons to choose CircleCI is its pricing: it offers free credits each month without any upfront payment or payment details. It also offers a wide-ranging repository of plugins called Orbs, and you can even build your own orbs, which is easy. On top of that, it provides simple and reliable workflow-building tools, and it has plenty of other features as well.

    Let’s Get Started

    To introduce the application, we’ll create a simple React application with master-detail flow added to it. We’ll be using React’s official CRA tool to create our project, which creates the boilerplate for us.

    Installing Dependencies

    Let’s install create-react-app as a global package. We’ll call our demo project serverless-s3. Now, we can create our React app with the following:

    yarn global add create-react-app
    create-react-app serverless-s3

    Now that we’ve created the frontend application, we can start building something cool with it. If we run the application with yarn start, we should be able to see the default CRA welcome page:

    Source: React

    To implement our master-detail flow of Github repositories, we’ll need to add some navigation to our app, so let’s add react-router. To keep things short, we’ll also use Github’s official SDK package, Octokit.

    yarn add react-router-dom @octokit/core

    Our demo application will consist of two routes: 

    1. A list of all public repos of an organization
    2. The details of the repository after clicking a repo item from the list 

    We’ll be using the Octokit client to fetch the data from Github’s open endpoints. This won’t need any authentication with Github.

    Adding Application Components

    Alright, now that we have our dependencies installed, we can add the routes to our App.js, which is the entry point for our React app.

    import { BrowserRouter as Router, Switch, Route } from 'react-router-dom';
     
    import RepoList from './RepoList';
    import RepoDetails from './RepoDetails';
     
    import './App.css';
     
    function App() {
      return (
       <Router>
         <div className="App">
           <Switch>
             <Route path="/repo/:owner/:repo" component={RepoDetails} />
             <Route path="/" component={RepoList} />
           </Switch>
         </div>
       </Router>
     );
    }
     
    export default App;

    Let’s initialize our Octokit client, which will help us make calls to Github’s open endpoints to get data.

    import { Octokit } from '@octokit/core';
     
    export const octokit = new Octokit({});

    You can also make calls to authorized resources with the Octokit client. The Octokit client supports both GraphQL and REST APIs. You can learn more about the client through the official documentation.

    Let’s add the RepoList.js component to the application, which will fetch the list of repositories of a given organization and display hyperlinks to the details page.

    import React, { useEffect, useState } from 'react';
    import { Link } from 'react-router-dom';
    import { octokit } from './client';
     
    function RepoList() {
     const [repos, setRepos] = useState([]);
     useEffect(() => {
       octokit
         .request('GET /orgs/:org/repos', {
           org: 'octokit',
         })
         .then((data) => setRepos(data.data));
     }, []);
     
     return (
       <div className="repo-list-container">
         <h1>Repositories</h1>
         <ul>
           {repos.map((repo) => (
             <li key={repo.id} className="repo-list-item">
               <Link to={`/repo/${repo.owner.login}/${repo.name}`}>{repo.full_name}</Link>
             </li>
           ))}
         </ul>
       </div>
     );
    }
     
    export default RepoList;

    Now that we have our list of repositories ready, we can let users see some of a repository’s general details. Let’s create our details component, called RepoDetails:

    import { useEffect, useState } from 'react';
    import { useParams } from 'react-router-dom';
    import { octokit } from './client';
    function RepoDetails() {
      const [repo, setRepo] = useState();
      const { repo: repoName, owner } = useParams();
      useEffect(() => {
        octokit
          .request('GET /repos/{owner}/{repo}', {
            owner,
            repo: repoName,
          })
          .then((data) => setRepo(data.data));
      }, [repoName, owner]);
      if (!repo) {
        return <b>loading...</b>;
      }
      return (
        <div className="repo-container">
          <h1>{repo.full_name}</h1>
          <p>Description: {repo.description}</p>
          <ul>
            <li><b>Forks:</b> {repo.forks}</li>
            <li><b>Subscribers:</b> {repo.subscribers_count}</li>
            <li><b>Watchers:</b> {repo.watchers}</li>
            <li><b>License:</b> {repo.license ? repo.license.name : 'None'}</li>
          </ul>
        </div>
      );
    }
    export default RepoDetails;

    Setting up Serverless

    With this done, we have our repositories master-detail flow ready. Assuming we have an AWS account setup, we can start adding the Serverless config to our project. Let’s start with the CD setup. As we said before, we’ll be using the Serverless framework to achieve our deployment workflow. Let’s add it.

    We’ll also install the Serverless plugin called serverless-finch, which allows us to configure and deploy to S3 buckets.

    yarn global add serverless
    yarn add serverless-finch --save-dev

    Now that we have the Serverless CLI installed, we can initialize a Serverless service in our project by running the following command, which creates a hello-world service:

    serverless create -t hello-world

    This will create a configuration yaml file and a handler lambda function. We don’t need the handler, so we can delete handler.js. Our serverless.yml should look like this:

    service: serverless-s3
    frameworkVersion: '2'
     
    # The `provider` block defines where your service will be deployed
    provider:
     name: aws
     runtime: nodejs12.x
     
    functions:
     helloWorld:
       handler: handler.helloWorld
       events:
         - http:
             path: helloWorld
             method: get
             cors: true

    The serverless.yml file contains configurations for a lambda function called hello-world. We can remove the functions block completely. After doing that, let’s register our Serverless Finch plugin:

    service: serverless-s3
    frameworkVersion: '2'
     
    provider:
     name: aws
     runtime: nodejs12.x
     
    plugins:
     - serverless-finch

    Alright, now that our plugin is ready to be used, we can add details about our S3 bucket so the plugin can deploy to it. Let’s add this block, which tells Serverless to deploy our code from the build directory to the serverles-s3-galileo bucket. Make sure you use a different bucket name, as S3 bucket names are globally unique.

    custom:
     client:
       bucketName: serverles-s3-galileo
       distributionFolder: build
       indexDocument: index.html
       errorDocument: index.html

    That’s it! We’re ready to deploy our app to our bucket. Haven’t created a bucket yet? No problem: serverless-finch will automatically create it. The last thing we need to add is a bucket policy so our app can be accessed publicly. Let’s create our bucket policy.

    Note: The indexDocument is the entry point for our web application, which is index.html in this case. We also need to add the same to errorDocument so our React routing works well in S3 hosting.

    {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "*"
               },
               "Action": "s3:GetObject",
               "Resource": "arn:aws:s3:::serverles-s3-galileo/*"
           }
       ]
    }

    As the default access to S3 assets is private, we need to set up a bucket policy for our deployment bucket. The policy gives the public read-only access to our app so the deployed assets can be browsed in the browser. You can learn more about bucket policies in the AWS documentation. Let’s update our Serverless config to use our policy. This is how our serverless.yml should look:

    service: serverless-s3
    frameworkVersion: '2'
     
    provider:
     name: aws
     runtime: nodejs12.x
     
    plugins:
     - serverless-finch
     
    custom:
     client:
       bucketName: serverles-s3-galileo
       distributionFolder: build
       indexDocument: index.html
       errorDocument: index.html
       bucketPolicyFile: config/bucket-policy.json

    Creating Github Actions Workflow

    Assuming you’ve created your repo and pushed the code to it, we can start setting up our first workflow using Github Actions. As we’re using AWS for our Serverless deployments to S3, we need to provide AWS credentials. The env block allows us to insert custom environment variables into the CI build; in this case, we need the AWS access key ID and secret access key to deploy the build files to the S3 bucket.

    Github allows us to store secret values that can be used in the CI environment of Github Actions. You can easily set up these secrets for your repositories. This is how they should look when configured:

    Now, we can move ahead and add a Github Actions workflow. Let’s create a workflow file at the .github/workflows/deploy.yml location and add the following to it.

    name: Serverless S3 Deploy
    on:
     push:
       branches: [ master ]
     pull_request:
       branches: [ master ]

    Alright, so the Github Actions config above tells Github to trigger this workflow whenever someone pushes to the master branch or creates a PR against it.

    As of now, our action config is incomplete and does nothing. Let’s add our first and only job to the workflow:

    name: Serverless S3
     
    on:
     push:
       branches: [ master ]
     pull_request:
       branches: [ master ]
     
    jobs:
     build:
       runs-on: ubuntu-latest
       strategy:
         matrix:
           node-version: [10.x]
       steps:
       - uses: actions/checkout@v2

    Let’s try to digest the config above:

    runs-on: ubuntu-latest

    The runs-on statement specifies which executor will run the job. In this case, it’s the latest Ubuntu Linux release.

    strategy:
      matrix:
        node-version: [10.x]

    The strategy block defines the environments we want to run our job in. This is usually useful when we want to run tests against multiple configurations. In our case, we don’t need that, so we’ll use a single Node.js environment with version 10.x.

    steps:
    - uses: actions/checkout@v2

    In the configuration’s steps block, we can define various tasks to be sequentially performed within a job. actions/checkout@v2 does the work of checking out branches for us. This step is required so we can do further work on our source code.

    This is the bare minimum setup required to run a job in a Github workflow. After this, we need to set up the environment and deploy our application. So, let’s add the rest of the steps.

    name: Serverless S3
     
    on:
     push:
       branches: [ master ]
     pull_request:
       branches: [ master ]
     
    jobs:
     build:
       runs-on: ubuntu-latest
       strategy:
         matrix:
           node-version: [10.x]
       steps:
       - uses: actions/checkout@v2
       - name: Use Node.js ${{ matrix.node-version }}
         uses: actions/setup-node@v1
         with:
           node-version: ${{ matrix.node-version }}
       - run: yarn install
       - run: yarn build
       - name: serverless deploy s3
         uses: serverless/github-action@master
         with:
           args: client deploy --no-confirm
         env:
           AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
           AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

    These actions need to be executed to deploy our frontend assets to our S3 buckets. As we read through the steps, we’re doing the following things in sequence:

1. Checking out the current branch code

2. Setting up our Node.js environment

3. Installing our dependencies with yarn install

4. Building our production build with yarn build

5. Deploying our build to S3 with serverless client deploy --no-confirm

• The uses block defines which custom action we’re using
• The args block allows us to pass specific arguments to the action
• The --no-confirm flag is needed so Serverless Finch does not ask us for confirmations while deploying to S3 buckets
• The env block allows us to pass custom environment variables to an action

Alright, so now we have the CD workflow set up to deploy our app. We can make a commit and push to the master branch, which should trigger our workflow. You can see it running in the Actions section of your repository like this:

    You can check the output of the serverless deploy step and browse the S3 website URL. It should now show our application running.

    Creating CircleCI Workflow

To start building a repository, we need to authorize it with our GitHub account. You can do that by signing up for CircleCI and following the steps here.

Just as we added the IAM secret credentials to our GitHub Actions workflow, we can set up environment variables for our workflows in CircleCI. This is how they should look once configured in the project settings:

Just like with GitHub Actions, we can create workflows in CircleCI. CircleCI also allows us to use third-party plugins, called Orbs, in our deployment workflows.

    We’ll need the official CircleCI distributions of the aws-cli, serverless-framework, and node.js orbs for our deploy workflow. Let’s create our first job for our workflow:

    version: 2.1
     
    orbs:
     aws-cli: circleci/aws-cli@1.0.0
     serverless: circleci/serverless-framework@1.0.1
     node: circleci/node@4.1.0
     
    jobs:
     deploy:
       executor: serverless/default

The executor here is a prebuilt image that allows us to run Serverless framework commands.

Just like we defined steps for our jobs in GitHub Actions, we can add them for CircleCI. Here, we’re using commands made available by the node orb to install dependencies and build the project, and the aws-cli and serverless orbs to set up Serverless with AWS. Just like we set up the secrets for GitHub Actions, we need to define our AWS credentials under the CircleCI environment variables.

    version: 2.1
     
    orbs:
     aws-cli: circleci/aws-cli@1.0.0
     serverless: circleci/serverless-framework@1.0.1
     node: circleci/node@4.1.0
     
    jobs:
     deploy:
       executor: serverless/default
       steps:
         - checkout
         - node/install-yarn
         - run:
             name: install
             command: yarn install
         - run:
             name: build
             command: yarn build
         - aws-cli/setup
         - serverless/setup:
             app-name: serverless-s3
             org-name: velotio
         - run:
             name: deploy
             command: serverless client deploy --no-confirm
    workflows:
     deploy:
       jobs:
         - deploy:
             filters:
               branches:
                 only:
                   - master

The workflows section in the above YAML file indicates that we want to trigger the deploy workflow whenever our master branch gets updated. Just like we listed the steps for the GitHub Actions deploy job, the CircleCI job does the following:

    1. Check out the code
    2. Install yarn package manager with node/install-yarn 
    3. Install dependencies with yarn install
    4. Build the project with yarn build
    5. Setup AWS and Serverless CLI
6. Deploy to S3 with serverless client deploy --no-confirm

    The workflow block in the config above tells CircleCI to run the deploy job. The filters block for the deploy job above tells us that we want to run the job only when the master branch gets updated. 

    Once we’re done with the above setup, we can make a test commit and check whether our workflow is running.

    Conclusion

We can easily integrate build/deployment workflows with the simple configurations offered by GitHub Actions. If we don’t use GitHub as our primary version control host, we can opt for CircleCI for our workflows.

    Related Articles

    1. Automating Serverless Framework Deployment using Watchdog
    2. To Go Serverless Or Not Is The Question

    You can find the referenced code at this repo.

  • What is Gatsby.Js and What Problems Does it Solve?

According to their site, “Gatsby is a free and open source framework based on React that helps developers build blazing fast websites and apps”. Gatsby allows developers to build a site using React and work with any data source (CMSs, Markdown, etc.) of their choice. Then, at build time, it pulls the data from these sources and outputs a set of static files optimized by Gatsby for performance. Gatsby loads only the critical HTML, CSS, and JavaScript so that the site loads as fast as possible. Once loaded, Gatsby prefetches resources for other pages, so clicking around the site feels incredibly fast.

What Does Gatsby Try to Achieve?

• Construct new, higher-level web building blocks: Gatsby is trying to build abstractions like gatsby-image and gatsby-link, which make web development easier by providing reusable building blocks instead of requiring a new component for each project.

• Create a cohesive “content mesh system”: The Content Management System (CMS) was developed to make content sites possible. Traditionally, a CMS solution was a monolith application to store content, build sites, and deliver them to users. But with time, the industry moved to using specialized tools to handle key areas like search, analytics, payments, etc., which have improved rapidly, while the quality of monolithic enterprise CMS applications like Adobe Experience Manager and Sitecore has stayed roughly the same.

      To tackle this modular CMS architecture, Gatsby aims to build a “content mesh” – the infrastructure layer for a decoupled website. The content mesh stitches together content systems in a modern development environment while optimizing website delivery for performance. The content mesh empowers developers while preserving content creators’ workflows. It gives you access to best-of-breed services without the pain of manual integration.

    Image Source: Gatsby

• Make building websites fun by making them simple: Each of the stakeholders in a website project should be able to see their creation quickly. Using these building blocks along with the content mesh, website building feels fun no matter how big the project gets. As Alan Kay said, “you get simplicity by finding slightly more sophisticated building blocks”.

An example of this can be seen in the gatsby-image component. First, let’s consider how a single image gets onto a website:

    1. A page is designed
    2. Specific images are chosen
    3. The images are resized (with ideally multiple thumbnails to fit different devices)
    4. And finally, the image(s) are included in the HTML/CSS/JS (or React component) for the page.

gatsby-image is integrated into Gatsby’s data layer and uses its image processing capabilities along with GraphQL to query for differently sized and shaped images.

We also skip the complexity around lazy-loading images within placeholders, and the complexity of generating the right-sized image thumbnails is taken care of as well.

So instead of a long pipeline of tasks to set up optimized images for your site, the steps now are:
    1. Install gatsby-image
    2. Decide what size of image you need
    3. Add your query and the gatsby-image component to your page
    4. And…that’s it!

    Now images are fun!

• Build a better web: Qualities like speed, security, maintainability, and SEO should be baked into the framework being used. If they have to be implemented on a per-site basis, they become a luxury. Gatsby bakes these qualities in by default so that the right thing is the easy thing. The most high-impact way to make the web better is to make it high-quality by default.

    It is More Than Just a Static Site Generator

Gatsby is not just for creating static sites. It is fully capable of generating a PWA with everything we expect a modern web app to do, including auth, dynamic interactions, fetching data, etc.

Gatsby does this by generating the static content using React DOM server-side APIs. Once this basic HTML is generated by Gatsby, React picks up where it left off. That basically means Gatsby statically renders as much as possible upfront; then client-side React takes over, and the app can do whatever a traditional React web app can do.
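The render-then-hydrate flow can be illustrated with a toy sketch in plain JavaScript. The component and renderer below are simplified, hypothetical stand-ins for React and ReactDOMServer, not Gatsby’s actual implementation:

```javascript
// A "component" here is just a function from props to an HTML string.
const Page = ({ title }) => `<h1>${title}</h1>`;

// Build time: render the component tree to static HTML once.
function renderToStaticHtml(component, props) {
  return `<div id="root">${component(props)}</div>`;
}

// Client time: "hydration" reuses the existing markup and only attaches
// dynamic behavior on top of it instead of re-creating the DOM.
function hydrate(staticHtml, attachHandlers) {
  attachHandlers(staticHtml); // in a browser, this would wire up events
  return staticHtml;
}

const html = renderToStaticHtml(Page, { title: "Hello" });
hydrate(html, () => {});
console.log(html); // <div id="root"><h1>Hello</h1></div>
```

The key design point is that the HTML the user first sees is produced at build time, while interactivity is layered on afterward in the browser.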

    Best of Both Worlds

By generating static HTML and then handing over to client-side React for whatever it needs to do, Gatsby gives us the best of both worlds.

Statically rendered pages maximize SEO and provide a better TTI and better general web performance. Static sites are also easy to distribute globally and easier to deploy.

    Conclusion

If the code runs successfully in development mode (gatsby develop), it doesn’t mean there will be no issues with the build version. An easy solution is to build the code regularly and fix the issues as they appear. This is easy enough when a build is generated after every change and the build time is a couple of minutes. But if changes are frequent and the build only gets created a few times a week or month, it becomes harder, as multiple issues will have to be solved at build time.

If you have a very big site with a lot of styled components and libraries, the build time increases substantially. If a build takes half an hour, it is no longer feasible to run it after every change, which makes finding build issues regularly complicated.

  • Building a Progressive Web Application in React [With Live Code Examples]

    What is PWA:

A Progressive Web Application, or PWA, is a web application built to look and behave like a native app: it operates offline-first and is optimized for a variety of viewports, from mobile and tablets to FHD desktop monitors and beyond. PWAs are built using front-end technologies such as HTML, CSS, and JavaScript, and bring a native-like user experience to the web platform. PWAs can also be installed on devices just like native apps.

    For an application to be classified as a PWA, it must tick all of these boxes:

    • PWAs must implement service workers. Service workers act as a proxy between the web browsers and API servers. This allows web apps to manage and cache network requests and assets
    • PWAs must be served over a secure network, i.e. the application must be served over HTTPS
    • PWAs must have a web manifest definition, which is a JSON file that provides basic information about the PWA, such as name, different icons, look and feel of the app, splash screen, version of the app, description, author, etc

    Why build a PWA?

    Businesses and engineering teams should consider building a progressive web app instead of a traditional web app. Here are some of the most prominent arguments in favor of PWAs:

    • PWAs are responsive. The mobile-first design approach enables PWAs to support a variety of viewports and orientation
    • PWAs can work on slow Internet or no Internet environment. App developers can choose how a PWA will behave when there’s no Internet connectivity, whereas traditional web apps or websites simply stop working without an active Internet connection
• PWAs are secure because they are always served over HTTPS
    • PWAs can be installed on the home screen, making the application more accessible
    • PWAs bring in rich features, such as push notification, application updates and more

    PWA and React

    There are various ways to build a progressive web application. One can just use Vanilla JS, HTML and CSS or pick up a framework or library. Some of the popular choices in 2020 are Ionic, Vue, Angular, Polymer, and of course React, which happens to be my favorite front-end library.

    Building PWAs with React

    To get started, let’s create a PWA which lists all the users in a system.

    npm init react-app users
    cd users
    yarn add react-router-dom
    yarn run start

    Next, we will replace the default App.js file with our own implementation.

    import React from "react";
    import { BrowserRouter, Route } from "react-router-dom";
    import "./App.css";
    const Users = () => {
     // state
     const [users, setUsers] = React.useState([]);
     // effects
     React.useEffect(() => {
       fetch("https://jsonplaceholder.typicode.com/users")
         .then((res) => res.json())
         .then((users) => {
           setUsers(users);
         })
         .catch((err) => {});
     }, []);
     // render
     return (
       <div>
         <h2>Users</h2>
         <ul>
           {users.map((user) => (
             <li key={user.id}>
               {user.name} ({user.email})
             </li>
           ))}
         </ul>
       </div>
     );
    };
    const App = () => (
     <BrowserRouter>
       <Route path="/" exact component={Users} />
     </BrowserRouter>
    );
    export default App;

    This displays a list of users fetched from the server.

    Let’s also remove the logo.svg file inside the src directory and truncate the App.css file that is populated as a part of the boilerplate code.

    To make this app a PWA, we need to follow these steps:

    1. Register service worker

    • In the file /src/index.js, replace serviceWorker.unregister() with serviceWorker.register().
    import React from 'react';
    import ReactDOM from 'react-dom';
    import './index.css';
    import App from './App';
    import * as serviceWorker from './serviceWorker';
    ReactDOM.render(
     <React.StrictMode>
       <App />
     </React.StrictMode>,
     document.getElementById('root')
    );
    serviceWorker.register();

    • The default behavior here is to not set up a service worker, i.e. the CRA boilerplate allows the users to opt-in for the offline-first experience.

    2. Update the manifest file

    • The CRA boilerplate provides a manifest file out of the box. This file is located at /public/manifest.json and needs to be modified to include the name of the PWA, description, splash screen configuration and much more. You can read more about available configuration options in the manifest file here.

    Our modified manifest file looks like this:

    {
     "short_name": "User Mgmt.",
     "name": "User Management",
     "icons": [
       {
         "src": "favicon.ico",
         "sizes": "64x64 32x32 24x24 16x16",
         "type": "image/x-icon"
       },
       {
         "src": "logo192.png",
         "type": "image/png",
         "sizes": "192x192"
       },
       {
         "src": "logo512.png",
         "type": "image/png",
         "sizes": "512x512"
       }
     ],
     "start_url": ".",
     "display": "standalone",
     "theme_color": "#aaffaa",
     "background_color": "#ffffff"
    }
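As a quick sanity check on the manifest, the presence of the basic fields used above can be verified with a small script. checkManifest is a hypothetical helper written for illustration, not part of any PWA tooling:

```javascript
// Hypothetical helper: verify that a manifest object carries the basic
// fields a PWA manifest is expected to have.
function checkManifest(manifest) {
  const required = ["name", "short_name", "start_url", "display", "icons"];
  const missing = required.filter((key) => !(key in manifest));
  return { valid: missing.length === 0, missing };
}

const result = checkManifest({
  short_name: "User Mgmt.",
  name: "User Management",
  icons: [{ src: "logo192.png", type: "image/png", sizes: "192x192" }],
  start_url: ".",
  display: "standalone",
});
console.log(result); // { valid: true, missing: [] }
```

A check like this could run in CI so a broken manifest is caught before deployment.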

    PWA Splash Screen

    Here the display mode selected is “standalone” which tells the web browsers to give this PWA the same look and feel as that of a standalone app. Other display options include, “browser,” which is the default mode and launches the PWA like a traditional web app and “fullscreen,” which opens the PWA in fullscreen mode – hiding all other elements such as navigation, the address bar and the status bar.

    The manifest can be inspected using Chrome dev tools > Application tab > Manifest.

3. Test the PWA:

• To test a progressive web app, build it completely first. This is because PWA features, such as caching, aren’t enabled while running the app in dev mode, to ensure hassle-free development.
    • Create a production build with: npm run build
    • Change into the build directory: cd build
    • Host the app locally: http-server or python3 -m http.server 8080
• Test the application by navigating to http://localhost:8080

4. Audit the PWA: If you are testing the app for the first time on a desktop or laptop browser, the PWA may look like just another website. To test and audit various aspects of the PWA, let’s use Lighthouse, a tool built by Google specifically for this purpose.

    PWA on mobile

    At this point, we already have a simple PWA which can be published on the Internet and made available to billions of devices. Now let’s try to enhance the app by improving its offline viewing experience.

    1. Offline indication: Since service workers can operate without the Internet as well, let’s add an offline indicator banner to let users know the current state of the application. We will use navigator.onLine along with the “online” and “offline” window events to detect the connection status.

 // state
  const [offline, setOffline] = React.useState(!navigator.onLine);
  // effects
  React.useEffect(() => {
    const goOffline = () => setOffline(true);
    const goOnline = () => setOffline(false);
    window.addEventListener("offline", goOffline);
    window.addEventListener("online", goOnline);
    return () => {
      window.removeEventListener("offline", goOffline);
      window.removeEventListener("online", goOnline);
    };
  }, []);

  {/* add to jsx */}
  {offline ? (
    <div className="banner-offline">The app is currently offline</div>
  ) : null}

    The easiest way to test this is to just turn off the Wi-Fi on your dev machine. Chrome dev tools also provide an option to test this without actually going offline. Head over to Dev tools > Network and then select “Offline” from the dropdown in the top section. This should bring up the banner when the app is offline.

    2. Let’s cache a network request using service worker

CRA comes with its own service-worker.js file, which caches all static assets such as the JavaScript and CSS files that are a part of the application bundle. To put custom logic into the service worker, let’s create a new file called ‘service-worker-custom.js’ and combine the two.

    • Install react-app-rewired and update package.json:
    1. yarn add react-app-rewired
    2. Update the package.json as follows:
    "scripts": {
       "start": "react-app-rewired start",
       "build": "react-app-rewired build",
       "test": "react-app-rewired test",
       "eject": "react-app-rewired eject"
    },

    • Create a config file to override how CRA generates service workers and inject our custom service worker, i.e. combine the two service worker files.
    const WorkboxWebpackPlugin = require("workbox-webpack-plugin");
    module.exports = function override(config, env) {
      config.plugins = config.plugins.map((plugin) => {
        if (plugin.constructor.name === "GenerateSW") {
          return new WorkboxWebpackPlugin.InjectManifest({
           swSrc: "./src/service-worker-custom.js",
           swDest: "service-worker.js"
          });
        }
        return plugin;
      });
      return config;
    };

    • Create service-worker-custom.js file and cache network request in there:
workbox.skipWaiting();
workbox.clientsClaim();
workbox.routing.registerRoute(
  new RegExp("/users"),
  new workbox.strategies.NetworkFirst()
);
workbox.precaching.precacheAndRoute(self.__precacheManifest || []);

    Your app should now work correctly in the offline mode.
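The network-first strategy used above can be illustrated with a plain-JavaScript toy model. This is a sketch of the idea, not Workbox’s actual implementation; fetchFn and cache are hypothetical stand-ins for the browser’s fetch and Cache APIs:

```javascript
// Try the network first; on failure, fall back to the last cached response.
async function networkFirst(request, { fetchFn, cache }) {
  try {
    const response = await fetchFn(request);
    cache.set(request, response); // refresh the cache on every success
    return response;
  } catch (err) {
    const cached = cache.get(request);
    if (cached) return cached; // offline: serve the last good response
    throw err; // nothing cached either
  }
}

// Simulate one successful fetch, then going offline.
const cache = new Map();
const online = (req) => Promise.resolve(`fresh:${req}`);
const offline = () => Promise.reject(new Error("offline"));

networkFirst("/users", { fetchFn: online, cache })
  .then(() => networkFirst("/users", { fetchFn: offline, cache }))
  .then((res) => console.log(res)); // logs "fresh:/users", served from cache
```

This is why the app keeps working offline: fresh data wins when the network is up, and the cache quietly takes over when it isn’t.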

    Distributing and publishing a PWA

PWAs can be published just like any other website, with only one additional requirement: they must be served over HTTPS. When a user visits a PWA from a mobile or tablet, a pop-up is displayed asking the user if they’d like to install the app to their home screen.

    Conclusion

Building PWAs with React enables engineering teams to develop, deploy, and publish progressive web apps for billions of devices using technologies they’re already familiar with. Existing React apps can also be converted to PWAs. PWAs are fun to build, easy to ship and distribute, and add a lot of value for customers by providing a native-like experience and better engagement via features such as add-to-homescreen and push notifications, all without any installation process.

  • Amazon Lex + AWS Lambda: Beyond Hello World

In my previous blog, I explained how to get started with Amazon Lex and build simple bots. This blog aims at exploring the Lambda functions used by Amazon Lex for code validation and fulfillment. We will go along with the same example we created in the first blog, i.e., purchasing a book, and will see in detail how the dots are connected.

    This blog is divided into following sections:

    1. Lambda function input format
    2. Response format
    3. Managing conversation context
    4. An example (demonstration to understand better how context is maintained to make data flow between two different intents)

NOTE: The input to a Lambda function changes according to the language you use to write the function. Since we used Node.js for our example, everything will be explained using it.

    Section 1:  Lambda function input format

When a conversation is started with a bot, Amazon Lex passes control to the Lambda function we defined while creating the bot.

    There are three arguments that Amazon Lex passes to a Lambda function:

    1. Event:

event is a JSON object containing all details regarding a bot conversation. Every time the Lambda function is invoked, Amazon Lex sends an event JSON containing the details of the respective message sent by the user to the bot.

    Below is a sample event JSON:

{
  currentIntent: {
    name: 'orderBook',
    slots: {
      bookType: null,
      bookName: null
    },
    confirmationStatus: 'None'
  },
  bot: {
    name: 'PurchaseBook',
    alias: '$LATEST',
    version: '$LATEST'
  },
  userId: 'user-1',
  inputTranscript: 'buy me a book',
  invocationSource: 'DialogCodeHook',
  outputDialogMode: 'Text',
  messageVersion: '1.0'
};

The format of the event JSON is explained below:

• currentIntent: It contains information regarding the intent of the message sent by the user to the bot. It contains the following keys:
  • name: the intent name (e.g., orderBook, which we defined in our previous blog).
  • slots: a map of the slot names configured for that particular intent, populated with values recognized by Amazon Lex during the conversation. Default values are null.
  • confirmationStatus: the user’s response to a confirmation prompt, if there is one. Possible values for this variable are:
    • None: the default value.
    • Confirmed: when the user confirms the confirmation prompt.
    • Denied: when the user denies the confirmation prompt.
• inputTranscript: the text input by the user for processing. In the case of audio input, the text is extracted from the audio. This is the text that is actually processed to recognize intents and slot values.
• invocationSource: Its value indicates the reason for invoking the Lambda function. It can have the following two values:
  • DialogCodeHook: directs the Lambda function to initialize validation of the user’s data input. If the intent is not clear, Amazon Lex can’t invoke the Lambda function.
  • FulfillmentCodeHook: directs the Lambda function to fulfil the intent. If the intent is configured to invoke a Lambda function as a fulfillment code hook, Amazon Lex sets invocationSource to this value only after it has all the slot data to fulfil the intent.
• bot: details of the bot that processed the request. It consists of the following information:
  • name: the name of the bot.
  • alias: the alias of the bot version.
  • version: the version of the bot.
• userId: Its value is defined by the client application; Amazon Lex passes it to the Lambda function.
• outputDialogMode: Its value depends on how you have configured your bot and can be Text or Voice.
• messageVersion: the version of the message that identifies the format of the event data going into the Lambda function and the expected format of the response from the Lambda function. In the current implementation, only message version 1.0 is supported, so the console assumes the default value of 1.0 and doesn’t show the message version.
• sessionAttributes: application-specific session attributes that the client sent in the request. This field is optional.

    2. Context:

AWS Lambda uses this parameter to provide runtime information about the Lambda function that is executing. Some useful information we can get from the context object includes:

• The time remaining before AWS Lambda terminates the Lambda function.
• The CloudWatch log stream associated with the Lambda function that is executing.
• The AWS request ID returned to the client that invoked the Lambda function, which can be used for any follow-up inquiry with AWS support.

    Section 2: Response Format

    Amazon Lex expects a response from a Lambda function in the following format:

{
  sessionAttributes: {},
  dialogAction: {
    type: "ElicitIntent / ElicitSlot / ConfirmIntent / Delegate / Close",
    <structure based on type>
  }
}

The response consists of two fields. The sessionAttributes field is optional; the dialogAction field is required. The contents of the dialogAction field depend on the value of the type field.

• sessionAttributes: This is an optional field and can be empty. If the function has to send something back to the client, it should be passed under sessionAttributes. We will see its use case in Section 4.
• dialogAction (required): The type of this field defines the next course of action. The five types of dialogAction are explained below:

    1) Close: Informs Amazon Lex not to expect a response from the user. This is the case when all slots get filled. If you don’t specify a message, Amazon Lex uses the goodbye message or the follow-up message configured for the intent.

dialogAction: {
  type: "Close",
  fulfillmentState: "Fulfilled / Failed", // (required)
  message: { // (optional)
    contentType: "PlainText or SSML",
    content: "Message to convey to the user"
  }
}

    2) ConfirmIntent: Informs Amazon Lex that the user is expected to give a yes or no answer to confirm or deny the current intent. The slots field must contain an entry for each of the slots configured for the specified intent. If the value of a slot is unknown, you must set it to null. The message and responseCard fields are optional.

dialogAction: {
  type: "ConfirmIntent",
  intentName: "orderBook",
  slots: {
    bookName: "value",
    bookType: "value"
  },
  message: { // (optional)
    contentType: "PlainText or SSML",
    content: "Message to convey to the user"
  }
}

3) Delegate: Directs Amazon Lex to choose the next course of action based on the bot configuration. The response must include any session attributes, and the slots field must include all of the slots specified for the requested intent. If the value of a field is unknown, you must set it to null. You will get a DependencyFailedException if your fulfilment function returns the Delegate dialog action without removing any slots.

dialogAction: {
  type: "Delegate",
  slots: {
    slot1: "value",
    slot2: "value"
  }
}

4) ElicitIntent: Informs Amazon Lex that the user is expected to respond with an utterance that includes an intent. For example, “I want to buy a book,” which indicates the orderBook intent. The utterance “book,” on the other hand, is not sufficient for Amazon Lex to infer the user’s intent.

dialogAction: {
  type: "ElicitIntent",
  message: { // (optional)
    contentType: "PlainText or SSML",
    content: "Message to convey to the user"
  }
}

5) ElicitSlot: Informs Amazon Lex that the user is expected to provide a slot value in the response. In the structure below, we are informing Amazon Lex that the user’s response should provide a value for the slot named ‘bookName’.

dialogAction: {
  type: "ElicitSlot",
  intentName: "orderBook",
  slots: {
    bookName: "",
    bookType: "fiction"
  },
  slotToElicit: "bookName",
  message: { // (optional)
    contentType: "PlainText or SSML",
    content: "Message to convey to the user"
  }
}

    Section 3: Managing Conversation Context

    Conversation context is the information that a user, your application, or a Lambda function provides to an Amazon Lex bot to fulfill an intent. Conversation context includes slot data that the user provides, request attributes set by the client application, and session attributes that the client application and Lambda functions create.

    1. Setting session timeout

Session timeout is the length of time a conversation session lasts. For in-progress conversations, Amazon Lex retains the context information, slot data, and session attributes until the session ends. The default session duration is 5 minutes, but it can be changed to up to 24 hours while creating the bot in the Amazon Lex console.

2. Setting session attributes

    Session attributes contain application-specific information that is passed between a bot and a client application during a session. Amazon Lex passes session attributes to all Lambda functions configured for a bot. If a Lambda function adds or updates session attributes, Amazon Lex passes the new information back to the client application.

    Session attributes persist for the duration of the session. Amazon Lex stores them in an encrypted data store until the session ends.

    3. Sharing information between intents

    If you have created a bot with more than one intent, information can be shared between them using session attributes. Attributes defined while fulfilling an intent can be used in other defined intent.

For example, a user of the book-ordering bot starts by ordering books. The bot engages in a conversation with the user, gathering slot data such as book name and quantity. When the user places an order, the Lambda function that fulfils the order sets the lastConfirmedReservation session attribute, containing information regarding the ordered book, and currentReservationPrice, containing the price of the book. So, when the user later fulfils the orderMagazine intent, the final price will be calculated on the basis of currentReservationPrice.


    4. Example

    The details of the example bot are below:

    Bot Name: PurchaseBot

    Intents:

    • orderBook – bookName, bookType
    • orderMagazine – magazineName, issueMonth

    Session attributes set while fulfilling the intent “orderBook” are:

    1. lastConfirmedReservation: In this variable, we store the slot values corresponding to the orderBook intent.
    2. currentReservationPrice: The book price is calculated and stored in this variable.

    When the orderBook intent gets fulfilled, we ask the user whether they also want to order a magazine. If the user responds with a confirmation, the bot starts fulfilling the orderMagazine intent.
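    A sketch of how the orderMagazine fulfillment logic might read those attributes back to compute a combined price. The attribute names come from the example above, but the price format and the discount rule are assumptions for illustration:

```javascript
// Illustrative sketch: the orderMagazine fulfillment hook reads the session
// attributes set earlier by the orderBook intent to compute a combined price.
function computeFinalPrice(sessionAttributes, magazinePrice) {
  // currentReservationPrice was stored as a string by the previous intent.
  const bookPrice = Number(sessionAttributes.currentReservationPrice || 0);
  // Hypothetical rule: 10% off the magazine when a book was ordered first.
  const discount = sessionAttributes.lastConfirmedReservation ? 0.1 : 0;
  return bookPrice + magazinePrice * (1 - discount);
}
```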

    Conclusion

    AWS Lambda functions are used as code hooks for your Amazon Lex bot. In your intent configuration, you can specify Lambda functions to perform initialization and validation, fulfillment, or both. This blog brought more technical insight into how Amazon Lex works, how it communicates with Lambda functions, and how conversation context is maintained using session attributes. I hope you find the information useful.

  • Ensure Continuous Delivery On Kubernetes With GitOps’ Argo CD

    What is GitOps?

    GitOps is a continuous deployment model for cloud-native applications. In GitOps, the Git repositories that contain the declarative descriptions of the infrastructure are considered the single source of truth for the desired state of the system, and an automated mechanism ensures that the deployed state of the system always matches the state defined in the Git repository. All changes to the environment (deployment, upgrade, rollback) are triggered by changes (commits) made to the Git repository.

    The application artifacts that we run in any environment always have corresponding code in some Git repository. Can we say the same about our infrastructure?

    Infrastructure-as-code tools and completely declarative orchestration platforms like Kubernetes allow us to represent the entire state of our system declaratively. GitOps makes use of this ability to make infrastructure-related operations more developer-centric.

    Role of Infrastructure as Code (IaC) in GitOps

    The ability to represent the infrastructure as code is at the core of GitOps. But just having version-controlled infrastructure as code doesn't mean GitOps; we also need a mechanism in place to keep (or try to keep) our deployed state in sync with the state defined in the Git repository.

    Infrastructure as Code is necessary but not sufficient to achieve GitOps

    GitOps does pull-based deployments

    Most deployment pipelines we see today push changes to the deployed environment. For example, to upgrade our application to a newer version, we update its Docker image tag in some repository, which triggers a deployment pipeline that updates the deployed application. Here the changes are pushed to the environment. In GitOps, we just update the image tag in the Git repository for that environment, and the changes are pulled into the environment to match the updated state in the Git repository. The magic of keeping the deployed state in sync with the state defined in Git is achieved with the help of operators/agents. An operator is like a control loop that identifies differences between the deployed state and the desired state and makes sure they converge.
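    Conceptually, such an operator's control loop can be sketched as follows. This is a simplification for illustration only; real operators such as Argo CD diff full Kubernetes manifests, not just image tags:

```javascript
// Conceptual sketch of a GitOps reconciliation step (not Argo CD's actual code).
// desiredState comes from the Git repository; deployedState from the cluster.
function reconcile(desiredState, deployedState) {
  const actions = [];
  for (const [app, desired] of Object.entries(desiredState)) {
    const deployed = deployedState[app];
    if (!deployed) {
      actions.push({ app, action: 'create', to: desired.image });
    } else if (deployed.image !== desired.image) {
      actions.push({ app, action: 'update', from: deployed.image, to: desired.image });
    }
  }
  return actions; // an empty list means the two states are already in sync
}
```

    The operator would run this comparison in a loop and apply the resulting actions to the cluster.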

    Key benefits of GitOps:

    1. All the changes are verifiable and auditable as they make their way into the system through Git repositories.
    2. Easy and consistent replication of the environment as Git repository is the single source of truth. This makes disaster recovery much quicker and simpler.
    3. More developer-centric experience for operating infrastructure. Also a smaller learning curve for deploying dev environments.
    4. Consistent rollback of application as well as infrastructure state.

    Introduction to Argo CD

    Argo CD is a continuous delivery tool that works on the principles of GitOps and is built specifically for Kubernetes. The product was developed and open-sourced by Intuit and is currently a part of CNCF.

    Key components of Argo CD:

    1. API Server: Just like K8s, Argo CD also has an API server that exposes APIs that other systems can interact with. The API server is responsible for managing the application, repository and cluster credentials, enforcing authentication and authorization, etc.
    2. Repository server: The repository server keeps a local cache of the Git repository, which holds the K8s manifest files for the application. This service is called by other services to get the K8s manifests.  
    3. Application controller: The application controller continuously watches the deployed state of the application and compares it with the desired state, reports to the API server whenever they are out of sync, and, when automated sync is enabled, takes corrective actions as well. It is also responsible for executing user-defined hooks for various lifecycle events of the application.

    Key objects/resources in Argo CD:

    1. Application: Argo CD allows us to represent the instance of the application which we want to deploy in an environment by creating Kubernetes objects of a custom resource definition(CRD) named Application. In the specification of Application type objects, we specify the source (repository) of our application’s K8s manifest files, the K8s server where we want to deploy those manifests, namespace, and other information.
    2. AppProject: Just like Application, Argo CD provides another CRD named AppProject. AppProjects are used to logically group related applications.
    3. Repo Credentials: In the case of private repositories, we need to provide access credentials. For credentials, Argo CD uses the K8s secrets and config map. First, we create objects of secret types and then we update a special-purpose configuration map named argocd-cm with the repository URL and the secret which contains the credentials.
    4. Cluster Credentials: Along with Git repository credentials, we also need to provide the K8s cluster credentials. These credentials are also managed using K8s secret, we are required to add the label argocd.argoproj.io/secret-type: cluster to these secrets.

    Demo:

    Enough of theory, let’s try out the things we discussed above. For the demo, I have created a simple app named message-app. This app reads a message set in the environment variable named MESSAGE. We will populate the values of this environment variable using a K8s config map. I have kept the K8s manifest files for the app in a separate repository. We have the application and the K8s manifest files ready. Now we are all set to install Argo CD and deploy our application.

    Installing Argo CD:

    For installing Argo CD, we first need to create a namespace named argocd.

    kubectl create namespace argocd
    kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

    Applying the files from the argo-cd repo directly is fine for demo purposes, but in real environments, you should copy the files into your own repository before applying them.

    We can see that this command has created the core components and CRDs we discussed earlier in the blog. There are some additional resources as well but we can ignore them for the time being.

    Accessing the Argo CD GUI

    We have Argo CD running in our cluster. Argo CD also provides a GUI that gives us a graphical representation of our K8s objects. It allows us to view events, pod logs, and other configuration.

    By default, the GUI service is not exposed outside the cluster. Let us update its service type to LoadBalancer so that we can access it from outside.

    kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'

    After this, we can use the external IP of the argocd-server service and access the GUI. 

    The initial username is admin, and the password is the name of the argocd-server pod. The password can be obtained by listing the pods in the argocd namespace, or directly with this command:

    kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server -o name | cut -d'/' -f 2 

    Deploy the app:

    Now let’s go ahead and create our application for the staging environment for our message app.

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: message-app-staging
      namespace: argocd
      labels:
        environment: staging
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
    
      # Source of the application manifests
      source:
        repoURL: https://github.com/akash-gautam/message-app-manifests.git
        targetRevision: HEAD
        path: manifests/staging
    
      # Destination cluster and namespace to deploy the application
      destination:
        server: https://kubernetes.default.svc
        namespace: staging
    
      syncPolicy:
        automated:
          prune: false
          selfHeal: false

    In the application spec, we have specified the repository, where our manifest files are stored and also the path of the files in the repository. 

    We want to deploy our app in the same K8s cluster where Argo CD is running, so we have specified the in-cluster Kubernetes API server URL in the destination. We want the resources to be deployed in the staging namespace, so we have set it accordingly.

    In the sync policy, we have enabled automated sync. We have kept the project as default. 

    Adding the resources-finalizer.argocd.argoproj.io ensures that all the resources created for the application are deleted when the Application is deleted. This is fine for our demo setup but might not always be desirable in real-life scenarios.

    Our git repos are public so we don’t need to create secrets for git repo credentials.

    We are deploying in the same cluster where Argo CD itself is running. As this is a demo setup, we can use the admin user created by Argo CD, so we don’t need to create secrets for cluster credentials either.

    Now let’s go ahead and create the application and see the magic happen.

    kubectl apply -f message-app-staging.yaml

    As soon as the application is created, we can see it on the GUI. 

    By clicking on the application, we can see all the Kubernetes objects created for it.

    It also shows the objects which are indirectly created by the objects we create. In the above image, we can see the replica set and endpoint object which are created as a result of creating the deployment and service respectively.

    We can also click on the individual objects and see their configuration. For pods, we can see events and logs as well.

    As our app is deployed now, we can grab the public IP of the message-app service and access it in the browser.

    We can see that our app is deployed and accessible.

    Updating the app

    To update our application, all we need to do is commit our changes to the GitHub repository. We know the message-app just displays the message we pass to it via a config map, so let's update the message and push it to the repository.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: message-configmap
      labels:
        app: message-app
    data:
      MESSAGE: "This too shall pass" #Put the message you want to display here.

    Once the commit is done, Argo CD will start to sync again.

    Once the sync is done, we restart our message-app pod so that it picks up the latest values from the config map. Then we refresh the browser to see the updated value.

    As we discussed earlier, for making any changes to the environment, we just need to update the repo which is being used as the source for the environment and then the changes will get pulled in the environment. 

    We can follow exactly the same approach to deploy the application to the production environment. We just need to create a new Application object and set the manifest path and deployment namespace accordingly.
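    For example, a production Application object might look roughly like this (a sketch; the manifests/production path and the production namespace are assumptions based on the repository layout used above):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: message-app-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/akash-gautam/message-app-manifests.git
    targetRevision: HEAD
    path: manifests/production   # assumed path for production manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: production        # assumed production namespace
  syncPolicy:
    automated:
      prune: false
      selfHeal: false
```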

    Conclusion: 

    It’s still early days for GitOps, but it has already been implemented successfully at scale by many organizations. As GitOps tools mature, along with the ever-growing adoption of Kubernetes, I think many organizations will consider adopting GitOps soon. GitOps is not limited to Kubernetes, but the completely declarative nature of Kubernetes makes GitOps simpler to achieve. Argo CD is a deployment tool tailored for Kubernetes that allows us to do deployments in a Kubernetes-native way while following the principles of GitOps. I hope this blog helped you understand the how, what, and why of GitOps and gave you some insight into Argo CD.

  • Building a WebSocket Service with AWS Lambda & DynamoDB

    WebSocket is an effective way for full-duplex, real-time communication between a web server and a client. It is widely used for building real-time web applications along with helper libraries that offer better features. Implementing WebSockets requires a persistent connection between two parties. Serverless functions are known for short execution time and non-persistent behavior. However, with the API Gateway support for WebSocket endpoints, it is possible to implement a Serverless service built on AWS Lambda, API Gateway, and DynamoDB.

    Prerequisites

    A basic understanding of real-time web applications will help with this implementation. Throughout this article, we will be using Serverless Framework for developing and deploying the WebSocket service. Also, Node.js is used to write the business logic. 

    Behind the scenes, Serverless uses Cloudformation to create various required resources, like API Gateway APIs, AWS Lambda functions, IAM roles and policies, etc.

    Why Serverless?

    Serverless Framework abstracts away the complex syntax needed for creating the Cloudformation stacks and helps us focus on the business logic of our services. Along with that, there is a variety of plugins available that make developing serverless applications easier.

    Why DynamoDB?

    We need persistent storage for WebSocket connection data, along with AWS Lambda. DynamoDB, a serverless key-value database from AWS, offers low latency, making it a great fit for storing and retrieving WebSocket connection details.

    Overview

    In this application, we’ll be creating an AWS Lambda service that accepts WebSocket connections coming via API Gateway. The connections and subscriptions to topics are persisted using DynamoDB. We will be using ws to implement basic WebSocket clients for the demonstration. The implementation has a single Lambda function that receives the WebSocket connections and handles the communication.

    Base Setup

    We will be using the default Node.js boilerplate offered by Serverless as a starting point.

    serverless create --template aws-nodejs

    A few of the Serverless plugins are installed and used to speed up the development and deployment of the Serverless stack. We also add the webpack config given here to support the latest JS syntax.

    Adding Lambda role and policies:

    The lambda function requires a role attached to it that has enough permissions to access DynamoDB and Execute API. These are the links for the configuration files:

    Link to dynamoDB.yaml

    Link to lambdaRole.yaml

    Adding custom config for plugins:

    The plugins used for local development must have the custom config added in the yaml file.

    This is how our serverless.yaml file should look after the base serverless configuration:

    service: websocket-app
    frameworkVersion: '2'
    custom:
     dynamodb:
       stages:
         - dev
       start:
         port: 8000
         inMemory: true
         heapInitial: 200m
         heapMax: 1g
         migrate: true
         convertEmptyValues: true
     webpack:
       keepOutputDirectory: true
       packager: 'npm'
       includeModules:
         forceExclude:
           - aws-sdk
     
    provider:
     name: aws
     runtime: nodejs12.x
     lambdaHashingVersion: 20201221
    plugins:
     - serverless-dynamodb-local
     - serverless-plugin-existing-s3
     - serverless-dotenv-plugin
     - serverless-webpack
     - serverless-offline
    resources:
     - Resources: ${file(./config/dynamoDB.yaml)}
     - Resources: ${file(./config/lambdaRoles.yaml)}
    functions:
     hello:
       handler: handler.hello

    Add WebSocket Lambda:

    We need to create a lambda function that accepts WebSocket events from API Gateway. As you can see, we’ve defined 3 WebSocket events for the lambda function.

    • $connect
    • $disconnect
    • $default

    These 3 events stand for the default routes that come with the API Gateway WebSocket offering. $connect and $disconnect are used for initialization and termination of the socket connection, while the $default route is used for data transfer.

    functions:
     websocket:
       handler: lambda/websocket.handler
       events:
         - websocket:
             route: $connect
         - websocket:
             route: $disconnect
         - websocket:
             route: $default

    We can go ahead and update how data is sent and add custom WebSocket routes to the application.
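    For instance, a custom route could be declared alongside the default ones (a sketch; with Serverless Framework’s default route selection expression, $request.body.action, the client payload would need an action field for API Gateway to match this route):

```yaml
functions:
 websocket:
   handler: lambda/websocket.handler
   events:
     - websocket:
         route: $connect
     - websocket:
         route: $disconnect
     - websocket:
         route: $default
     # Hypothetical custom route: matched when the JSON payload contains
     # "action": "subscribe" (per the default route selection expression).
     - websocket:
         route: subscribe
```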

    The lambda needs to establish a connection with the client and handle the subscriptions. The logic for updating DynamoDB is written in a utility class, Client. Whenever a connection is received, we create a record in the topics table.

    console.log(`Received socket connectionId: ${event.requestContext && event.requestContext.connectionId}`);
    if (!(event.requestContext && event.requestContext.connectionId)) {
        throw new Error('Invalid event. Missing `connectionId` parameter.');
    }
    const connectionId = event.requestContext.connectionId;
    const route = event.requestContext.routeKey;
    console.log(`data from ${connectionId} ${event.body}`);
    const connection = new Client(connectionId);
    const response = { statusCode: 200, body: '' };

    if (route === '$connect') {
        console.log(`Route ${route} - Socket connected: ${event.requestContext && event.requestContext.connectionId}`);
        await new Client(connectionId).connect();
        return response;
    }

    The Client utility class internally creates a record for the given connectionId in the DynamoDB topics table.

    async subscribe({ topic, ttl }) {
      return dynamoDBClient
        .put({
          Item: {
            topic,
            connectionId: this.connectionId,
            ttl: typeof ttl === 'number' ? ttl : Math.floor(Date.now() / 1000) + 60 * 60 * 2,
          },
          TableName: process.env.TOPICS_TABLE,
        }).promise();
    }

    Similarly, for the $disconnect route, we remove the INITIAL_CONNECTION topic record when a client disconnects.

    else if (route === '$disconnect') {
        console.log(`Route ${route} - Socket disconnected: ${event.requestContext.connectionId}`);
        await new Client(connectionId).unsubscribe();
        return response;
    }

    The client.unsubscribe method internally removes the connection entry from the DynamoDB table. Here, the getTopics method fetches all the topics the particular client has subscribed to.

    async unsubscribe() {
       const topics = await this.getTopics();
       if (!topics) {
         throw Error(`Topics got undefined`);
       }
       return this.removeTopics({
         [process.env.TOPICS_TABLE]: topics.map(({ topic, connectionId }) => ({
           DeleteRequest: { Key: { topic, connectionId } },
         })),
       });
     }

    Now comes the $default route part of the lambda, where we customize message handling. In this implementation, we route message handling based on event.body.type, which indicates what kind of message was sent from the client to the server. The subscribe type is used to subscribe to new topics. Similarly, the message type is used to receive a message from one client and publish it to the other clients that have subscribed to the same topic as the sender.

    console.log(`Route ${route} - data from ${connectionId}`);
    if (!event.body) {
        return response;
    }
    const body = JSON.parse(event.body);
    const topic = body.topic;
    if (body.type === 'subscribe') {
        await connection.subscribe({ topic });
        console.log(`Client subscribing for topic: ${topic}`);
    }
    if (body.type === 'message') {
        await new Topic(topic).publishMessage({ data: body.message });
        console.log(`Published messages to subscribers`);
        return response;
    }
    return response;

    Similar to $connect, the subscribe type of payload, when received, creates a new subscription for the mentioned topic.

    Publishing the messages

    Here is the interesting part of this lambda. When a client sends a payload with type message, the lambda calls the publishMessage method with the data received. The method gets the active subscribers for the topic and publishes the message to each of them using the sendMessage method of the Client utility.

    async publishMessage(data) {
       const subscribers = await this.getSubscribers();
       const promises = subscribers.map(async ({ connectionId, subscriptionId }) => {
         const TopicSubscriber = new Client(connectionId);
           const res = await TopicSubscriber.sendMessage({
             id: subscriptionId,
             payload: { data },
             type: 'data',
           });
           return res;
       });
       return Promise.all(promises);
     }

    The sendMessage method posts to the API endpoint, which is the API Gateway URL after deployment. As we’re using serverless-offline for local development, the IS_OFFLINE env variable is automatically set.

    const endpoint =  process.env.IS_OFFLINE ? 'http://localhost:3001' : process.env.PUBLISH_ENDPOINT;
       console.log('publish endpoint', endpoint);
       const gatewayClient = new ApiGatewayManagementApi({
         apiVersion: '2018-11-29',
         credentials: config,
         endpoint,
       });
       return gatewayClient
         .postToConnection({
           ConnectionId: this.connectionId,
           Data: JSON.stringify(message),
         })
         .promise();

    Instead of manually invoking the API endpoint, we can also use DynamoDB streams to trigger a lambda asynchronously and publish messages to topics.

    Implementing the client

    For testing the socket implementation, we will use a Node.js script, ws-client.js. It creates two ws clients: one that sends the data and another that receives it.

    const WebSocket = require('ws');
    const sockedEndpoint = 'http://0.0.0.0:3001';
    const ws1 = new WebSocket(sockedEndpoint, {
     perMessageDeflate: false
    });
    const ws2 = new WebSocket(sockedEndpoint, {
     perMessageDeflate: false
    });

    The first client, on connect, periodically sends data to a topic named general. The count increments on each send.

    ws1.on('open', () => {
       console.log('WS1 connected');
       let count = 0;
       setInterval(() => {
         const data = {
           type: 'message',
           message: `count is ${count}`,
           topic: 'general'
         }
         const message  = JSON.stringify(data);
          ws1.send(message, (err) => {
            if (err) {
              return console.log(`Error occurred while sending data: ${err.message}`);
            }
            console.log(`WS1 OUT ${message}`);
          })
         count++;
       }, 15000)
    })

    The second client on connect will first subscribe to the general topic and then attach a handler for receiving data.

    ws2.on('open', () => {
     console.log('WS2 connected');
     const data = {
       type: 'subscribe',
       topic: 'general'
     }
     ws2.send(JSON.stringify(data), (err) => {
       if (err) {
         console.log(`Error occurred while sending data: ${err.message}`);
       }
     })
    });
    ws2.on('message', (message) => {
     console.log(`ws2 IN ${message}`);
    });

    Once the service is running, we should be able to see the following output, with the two clients successfully sending and receiving messages through our socket server.

    Conclusion

    With API Gateway WebSocket support and DynamoDB, we’re able to implement persistent socket connections using serverless functions. The implementation can be improved and can be as complex as needed.


  • Elasticsearch – Basic and Advanced Concepts

    What is Elasticsearch?

    In our previous blog, we saw that Elasticsearch is a highly scalable, open-source, full-text search and analytics engine built on top of Apache Lucene. Elasticsearch allows you to store, search, and analyze huge volumes of data quickly and in near real-time.

    Basic Concepts –

    • Index – Large collection of JSON documents. Can be compared to a database in relational databases. Every document must reside in an index.
    • Shards – Since there is no limit on the number of documents that can reside in an index, indices are horizontally partitioned into shards that reside on the nodes of the cluster. 
      Max documents allowed in a shard = 2,147,483,519 (as of now)
    • Type – Logical partition of an index. Similar to a table in relational databases. 
    • Fields – Similar to a column in relational databases. 
    • Analyzers – Used while indexing/searching the documents. These contain “tokenizers” that split phrases/text into tokens and “token-filters”, that filter/modify tokens during indexing & searching.
    • Mappings – Combination of Field + Analyzers. It defines how your fields can be stored & indexed.

    Inverted Index

    ES uses inverted indexes under the hood. An inverted index maps each term to the documents that contain it.

    Let’s say, we have 3 documents :

    1. Food is great
    2. It is raining
    3. Wind is strong

    An inverted index for these documents can be constructed as –

        Term      Documents
        ----      ---------
        food      1
        great     1
        is        1, 2, 3
        it        2
        raining   2
        strong    3
        wind      3

    The terms in the dictionary are stored in a sorted order to find them quickly.

    Searching for multiple terms is done by looking up each term in the index, then performing either a UNION or an INTERSECTION on the resulting document lists to fetch the relevant matching documents.
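    A toy sketch of building such an index and intersecting term lookups (illustrative only; Lucene's actual data structures and posting-list algorithms are far more sophisticated):

```javascript
// Toy inverted index: maps each term to the set of document IDs containing it.
function buildInvertedIndex(docs) {
  const index = new Map();
  docs.forEach((text, docId) => {
    for (const term of text.toLowerCase().split(/\s+/)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(docId + 1); // 1-based document IDs
    }
  });
  return index;
}

// INTERSECTION search: documents that contain every query term.
function searchAll(index, terms) {
  const postings = terms.map((t) => index.get(t) || new Set());
  return [...postings.reduce((acc, s) => new Set([...acc].filter((d) => s.has(d))))];
}
```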

    An ES index is spread across multiple shards; while indexing, each document is routed to one of the shards (by default, based on a hash of its ID). We can customize which shard a document is routed to, and which shards search requests are sent to.

    An ES index is made up of multiple Lucene indexes, which in turn are made up of index segments. These are write-once, read-many indices, i.e., the index files Lucene writes are immutable (apart from deletions).

    Analyzers –

    Analysis is the process of converting text into tokens or terms, which are added to the inverted index for searching. Analysis is performed by an analyzer. An analyzer can be either built-in or custom. 

    We can define a single analyzer for both indexing and searching, or specify a separate search analyzer and index analyzer for a mapping.

    Building blocks of analyzer- 

    • Character filters – receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters.
    • Tokenizers – receives a stream of characters, breaks it up into individual tokens. 
    • Token filters – receives the token stream and may add, remove, or change tokens.

    Some Commonly used built-in analyzers –

    1. Standard –

    Divides text into terms on word boundaries and lower-cases all terms. Removes punctuation, and removes stop words if specified (default: none).

    Text:  The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [the, 2, quick, brown, foxes, jumped, over, the, lazy, dog’s, bone]

    2. Simple/Lowercase –

    Divides text into terms whenever it encounters a non-letter character. Lower-cases all terms.

    Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]

    3. Whitespace –

    Divides text into terms whenever it encounters a white-space character.

    Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog’s, bone.]

    4. Stopword –

    Same as the simple analyzer, but with stop-word removal enabled by default.

    Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [ quick, brown, foxes, jumped, over, lazy, dog, s, bone]

    5. Keyword / NOOP –

    Returns the entire input string as it is.

    Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.]

    Some Commonly used built-in tokenizers –

    1. Standard –

    Divides text into terms on word boundaries, removes most punctuation.

    2. Letter –

    Divides text into terms whenever it encounters a non-letter character.

    3. Lowercase –

    Letter tokenizer which lowercases all tokens.

    4. Whitespace –

    Divides text into terms whenever it encounters any white-space character.

    5. UAX-URL-EMAIL –

    Standard tokenizer which recognizes URLs and email addresses as single tokens.

    6. N-Gram –

    Divides text into terms when it encounters anything from a list of specified characters (e.g. whitespace or punctuation), and returns n-grams of each word: a sliding window of continuous letters, e.g. quick → [qu, ui, ic, ck, qui, quic, quick, uic, uick, ick].

    7. Edge-N-Gram –

    It is similar to N-Gram tokenizer with n-grams anchored to the start of the word (prefix- based NGrams). e.g. quick → [q, qu, qui, quic, quick].

    8. Keyword –

    Emits exact same text as a single term.
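    The n-gram and edge-n-gram outputs above can be reproduced with a small sketch (illustrative only; the real tokenizers also handle character classes, Unicode, and gram-size limits):

```javascript
// Illustrative generators for the n-gram tokenizer examples above.
function edgeNGrams(word, minGram = 1, maxGram = word.length) {
  const grams = [];
  for (let len = minGram; len <= Math.min(maxGram, word.length); len++) {
    grams.push(word.slice(0, len)); // n-grams anchored at the start of the word
  }
  return grams;
}

function nGrams(word, minGram = 2, maxGram = word.length) {
  const grams = [];
  for (let start = 0; start < word.length; start++) {
    for (let len = minGram; len <= maxGram && start + len <= word.length; len++) {
      grams.push(word.slice(start, start + len)); // sliding window of letters
    }
  }
  return grams;
}
```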

    Make your mappings right –

    Analyzers, if not designed right, can increase your search time extensively. 

    Avoid using regular expressions in queries as much as possible. Let your analyzers handle them.

    ES provides multiple tokenizers (standard, whitespace, ngram, edge-ngram, etc) which can be directly used, or you can create your own tokenizer. 

    Consider a simple use-case where we had to search for a user who either has “brad” in their name or “brad_pitt” in their email (substring-based search). Without proper analyzers in the mapping, one would simply write a regex query:

    {
      "query": {
        "bool": {
          "should": [
            {
              "regexp": {
                "email.raw": ".*brad_pitt.*"
              }
            },
            {
              "regexp": {
                "name.raw": ".*brad.*"
              }
            }
          ]
        }
      }
    }

    This took 16 s to fetch 1 lakh (100,000) documents out of 60 million.

    Instead, we created an n-gram analyzer with a lowercase filter, which generates all the relevant tokens at indexing time.
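    The index settings for such an analyzer might look roughly like this (a sketch, not our exact production mapping; the gram sizes, names, and the max_ngram_diff value are illustrative):

```json
{
  "settings": {
    "index": {
      "max_ngram_diff": 7
    },
    "analysis": {
      "tokenizer": {
        "suggestion_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "suggestion_analyzer": {
          "type": "custom",
          "tokenizer": "suggestion_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```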

    The above regex query was updated to –

    {
      "query": {
        "multi_match": {
          "query": "brad",
          "fields": [
            "email.suggestion",
            "full_name.suggestion"
          ]
        }
      },
      "size": 25
    }

    This took 109ms for us to fetch 1 lakh out of 60 million documents

    Thus, search queries that previously took 10-25s were reduced to 800-900ms for the same set of records.

    If the use-case were instead to find users whose name starts with “brad” or email starts with “brad_pitt” (a prefix-based search), an edge-n-gram analyzer or suggesters would be a better fit.

    Performance Improvement with Filter Queries –

    Use Filter queries whenever possible. 

    ES usually scores documents and returns them sorted by score. This can hurt performance when scoring is not relevant to the use-case. In such scenarios, use “filter” queries, which simply match documents with a yes/no decision instead of computing a score.

    {
      "query": {
        "bool": {
          "multi_match": {
            "query": "brad",
            "fields": [
              "email.suggestion",
              "full_name.suggestion"
            ]
          }
        }
      },
      "size": 25
    }

    The above query can now be written as –

    {
      "query": {
        "bool": {
          "filter": {
            "bool": {
              "must": [
                {
                  "multi_match": {
                    "query": "brad",
                    "fields": [
                      "email.suggestion",
                      "full_name.suggestion"
                    ]
                  }
                }
              ]
            }
          }
        }
      },
      "size": 25
    }

    This will reduce query-time by a few milliseconds.

    Re-indexing made faster –

    Before creating any mappings, know your use-case well.

    Unlike the “ALTER” command in relational databases, ES does not allow us to alter existing mappings, although we can keep adding new fields to the index.

    The only way to change existing mappings is to create a new index, re-index the existing documents into it, and alias the new index to the required name, achieving zero downtime in production. Note – this process can take days if you have millions of records to re-index.
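As a sketch of that flow (the index names users_v1/users_v2 and the alias users are hypothetical), the re-index and the alias swap look like this in Dev Tools syntax:

```
POST _reindex
{
  "source": { "index": "users_v1" },
  "dest": { "index": "users_v2" }
}

POST _aliases
{
  "actions": [
    { "remove": { "index": "users_v1", "alias": "users" } },
    { "add": { "index": "users_v2", "alias": "users" } }
  ]
}
```

Both alias actions in one _aliases request are applied atomically, which is what gives the zero-downtime cutover.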

    To re-index faster, we can change a few settings –

    1. Disable swapping – Since no requests will be directed to the new index till indexing is done, we can safely disable swap.
    The command for Linux machines –

    sudo swapoff -a

    2. Disable refresh_interval for ES – The default refresh_interval is 1s; it can safely be disabled (set to -1) while documents are being re-indexed.

    3. Change bulk size while indexing – ES usually indexes documents in chunks of 1k. It is preferable to increase this default to approx 5-10k, though we need to find the sweet spot while re-indexing to avoid load on the current index.

    4. Reset replica count to 0 – By default, ES creates at least 1 replica per shard. We can set this to 0 while indexing and reset it to the required value post indexing.
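Settings 2 and 4 above can be applied (and later reverted) with a single settings call on the new index (the index name is hypothetical):

```
PUT users_v2/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}
```

Once re-indexing finishes, the same endpoint restores the values, e.g. "refresh_interval": "1s" and the original replica count.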

    Conclusion

    ElasticSearch is a very powerful database for text-based searches. The Elastic ecosystem is widely used for reporting, alerting, machine learning, etc. This article gives an overview of ElasticSearch mappings and how creating the right mappings can improve query performance and accuracy. Giving the right mappings and the right resources to your ElasticSearch cluster can do wonders.

  • How To Use Inline Functions In React Applications Efficiently

    This blog post explores the performance cost of inline functions in a React application. Before we begin, let’s try to understand what inline function means in the context of a React application.

    What is an inline function?

    Simply put, an inline function is a function that is defined and passed down inside the render method of a React component.

    Let’s understand this with a basic example of what an inline function might look like in a React application:

    export default class CounterApp extends React.Component {
      constructor(props) {
        super(props);
        this.state = { count: 0 };
      }
      render() {
        return (
          <div className="App">
            <button
              onClick={() => {
              this.setState({ count: this.state.count + 1 });
              }}
            >COUNT ({this.state.count})</button>
          </div>
        );
      }
    }

    The onClick prop, in the example above, is being passed as an inline function that calls this.setState. The function is defined within the render method, often inline with JSX. In the context of React applications, this is a very popular and widely used pattern.

    Let’s begin by listing some common patterns and techniques where inline functions are used in a React application:

    • Render prop: A component prop that expects a function as a value. This function must return a JSX element, hence the name. Render props are a good candidate for inline functions.
    render() {
      return (
        <ListView
          items={items}
          render={({ item }) => (<div>{item.label}</div>)}
        />
      );
    }

    • DOM event handlers: DOM event handlers often make a call to setState or invoke some effect in the React application such as sending data to an API server.
    <button
      onClick={() => {
        this.setState({ count: this.state.count + 1 });
      }}>
      COUNT ({this.state.count})
    </button>

    • Custom function or event handlers passed to child: Oftentimes, a child component requires a custom event handler to be passed down as props. An inline function is usually used in this scenario.
    <Button onTap={() => {
      this.nextPage();
    }}>Next</Button>

    Alternatives to inline function

    • Bind in constructor: One of the most common patterns is to define the function within the class component and then bind context to the function in constructor. We only need to bind the current context if we want to use this keyword inside the handler function.
    export default class CounterApp extends React.Component {
      constructor(props) {
        super(props);
        this.state = { count: 0 };
        this.increaseCount = this.increaseCount.bind(this);
      }
    
      increaseCount() {
        this.setState({ count: this.state.count + 1 });
      }
    
      render() {
        return (
          <div className="App">
            <button onClick={this.increaseCount}>COUNT ({this.state.count})</button>
          </div>
        );
     }
    }

    • Bind in render: Another common pattern is to bind the context inline when the function is passed down. Eventually, this gets repetitive and hence the first approach is more popular.
    render() {
      return (
        <div className="App">
          <button onClick={this.increaseCount.bind(this)}>COUNT ({this.state.count})</button>
        </div>
      );
    }

    • Define as a public class field: an arrow function assigned to a class field keeps this bound to the instance, so no explicit binding is needed.
    increaseCount = () => {
      this.setState({ count: this.state.count + 1 });
    };
    
    render() {
      return (
        <div className="App">
          <button onClick={this.increaseCount}>
            COUNT ({this.state.count})
          </button>
        </div>
      );
    }

    There are several other approaches that the React dev community has come up with, like using a helper method to bind all functions automatically in the constructor.
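A minimal sketch of such a helper (the name autoBind is hypothetical, not a React API): it walks the component's prototype and rebinds every method to the instance, so the constructor needs a single call instead of one .bind per handler.

```javascript
// Rebind every prototype method of `instance` to the instance itself.
function autoBind(instance) {
  const proto = Object.getPrototypeOf(instance);
  for (const name of Object.getOwnPropertyNames(proto)) {
    const value = proto[name];
    if (name !== "constructor" && typeof value === "function") {
      instance[name] = value.bind(instance);
    }
  }
}

class Counter {
  constructor() {
    this.count = 0;
    autoBind(this); // replaces this.increase = this.increase.bind(this), etc.
  }
  increase() {
    this.count += 1;
  }
}
```

After autoBind runs, a detached reference like const inc = counter.increase can be passed down as a prop and still update the right instance.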

    After understanding inline functions through examples and taking a look at a few alternatives, let’s see why inline functions are so popular and widely used.

    Why use inline function

    Inline function definitions sit right where they are invoked or passed down. This makes them easier to write, especially when the body of the function is just a few instructions, such as calling setState. This works well within loops too.

    For example, when rendering a list and assigning a DOM event handler to each list item, passing down an inline function feels much more intuitive. For the same reason, inline functions also make code more organized and readable.

    Inline arrow functions preserve the enclosing context, which means developers can use this without worrying about the current execution context or explicitly binding the function.

    <Button onTap={() => {
      this.prevPage();
    }}>Previous</Button>

    Inline functions make values from the parent scope available within the function body. This results in more intuitive code, and developers need to pass down fewer parameters. Let’s understand this with an example.

    render() {
      const { count } = this.state;
      return (
        <div className="App">
          <button
            onClick={() => {
              this.setState({ count: count + 1 });
          }}>
            COUNT ({count})
          </button>
        </div>
      );
    }

    Here, the value of count is readily available to the onClick event handler. This behavior is called closing over: the inline function forms a closure over count.

    For these reasons, React developers make use of inline functions heavily. That said, inline function has also been a hot topic of debate because of performance concerns. Let’s take a look at a few of these arguments.

    Arguments against inline functions

    • A new function is created every time the render method is called, resulting in more frequent garbage collection and hence potential performance loss.
    • There is an ESLint rule, jsx-no-bind, that advises against using inline functions. The idea behind this rule is that when an inline function is passed down to a child component, React relies on reference checks to decide whether to re-render it. Because a new inline function is created on every render, its reference never matches the previous one, which can make the child component render again and again.

    <ListItem onClick={() => console.log('click')} />

    Suppose the ListItem component implements the shouldComponentUpdate method where it checks the onClick prop reference. Since an inline function is created every time the parent re-renders, ListItem receives a new function reference each time, pointing to a different location in memory. The comparison in shouldComponentUpdate therefore fails and tells React to re-render ListItem even though the inline function’s behavior doesn’t change. This results in unnecessary DOM updates and eventually reduces the performance of the application.
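The reference problem can be seen outside React with plain JavaScript: two functions produced by separate calls to the same render code behave identically but never compare equal, which is exactly what a shallow prop check sees.

```javascript
// Simulates a component whose render creates a fresh inline handler each time.
function render() {
  return () => console.log("click");
}

const firstRender = render();
const secondRender = render();

// Same behavior, different identity: a shallow prop comparison sees a change.
console.log(firstRender === secondRender); // false
```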

    Performance concerns revolving around the Function.prototype.bind method: when not using arrow functions, an inline function being passed down must be bound to a context if it uses the this keyword. The practice of calling .bind before passing down an inline function raised performance concerns, but modern JavaScript engines have largely fixed this. For older browsers, Function.prototype.bind can be supplemented with a polyfill for performance.

    Now that we’ve summarized a few arguments in favor of inline functions and a few arguments against it, let’s investigate and see how inline functions really perform.

    render() {
      return (
        <div>
          {this.state.timeThen > this.state.timeNow ? (
           <>
             <button onClick={() => { /* some action */ }} />
             <button onClick={() => { /* another action */ }} />
           </>
          ) : (
            <button onClick={() => { /* yet another action */ }} />
          )}
        </div>
      );
    }

    Premature optimization can often lead to bad code. For instance, let’s try to get rid of all the inline function definitions in the component above and move them to the constructor because of performance concerns.

    We’d then have to define 3 custom event handlers in the class definition and bind context to all three functions in the constructor.

    export default class CounterApp extends React.Component {
      constructor(props) {
        super(props);
        this.state = {
          timeThen: ...,
          timeNow: Date.now()
        };
        this.someAction = this.someAction.bind(this);
        this.anotherAction = this.anotherAction.bind(this);
        this.yetAnotherAction = this.yetAnotherAction.bind(this);
      }
    
      someAction() { /* some action */ }
      anotherAction() { /* another action */ }
      yetAnotherAction() { /* yet another action */ }
    
      render() {
        return (<div>
          {this.state.timeThen > this.state.timeNow ? (
            <>
              <button onClick={this.someAction} />
              <button onClick={this.anotherAction} />
            </>
          ) : (
            <button onClick={this.yetAnotherAction} />
          )}</div>);
      }
    }

    This would increase the initialization time of the component significantly as opposed to inline function declarations where only one or two functions are defined and used at a time based on the result of condition timeThen > timeNow.

    Concerns around render props: A render prop is a function prop that returns a React element and is used to share state among React components.

    Render props are meant to be invoked on each render since they share state between parent components and enclosed React elements. Inline functions are a good candidate for use in render prop and won’t cause any performance concern.

    render() {
      return (
        <ListView
          items={items}
          render={({ item }) => (<div>{item.label}</div>)}
        />
      )
    }

    Here, the render prop of the ListView component returns a label enclosed in a div. Since the enclosed component can never know what the value of the item variable is, it can never be a PureComponent or have a meaningful implementation of shouldComponentUpdate(). This eliminates the concerns around using inline functions in render props; in fact, it promotes them in most cases.

    In my experience, inline render props can sometimes be harder to maintain, especially when the render prop returns a larger, more complicated component in terms of code size. In such cases, breaking the component down further or passing a separate method as the render prop has worked well for me.

    Concerns around PureComponents and shouldComponentUpdate(): Pure components and typical implementations of shouldComponentUpdate both do a shallow (reference) comparison of props and state. These act as performance enhancers by letting React know when or when not to trigger a render based on changes to state and props. Since inline functions are created on every render, passing one down to a pure component, or to a component that implements shouldComponentUpdate, can lead to an unnecessary render. This is because of the changed reference of the inline function.

    To overcome this, consider skipping checks on all function props in shouldComponentUpdate(). This assumes that inline functions passed to the component differ only in reference, not in behavior. If the behavior of the passed function does differ, this will result in a missed render and eventually lead to bugs in the component’s state and effects.
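A sketch of what that comparison could look like (a hypothetical helper, not a React API): it does a shallow reference check on all props but skips pairs where both values are functions, under the assumption above that only their identity differs.

```javascript
// Shallow comparison of props that ignores function-valued props: functions
// are assumed to change only in reference, not in behavior.
function propsChanged(prevProps, nextProps) {
  const keys = new Set([...Object.keys(prevProps), ...Object.keys(nextProps)]);
  for (const key of keys) {
    const prev = prevProps[key];
    const next = nextProps[key];
    // Skip the reference check when both sides are functions.
    if (typeof prev === "function" && typeof next === "function") continue;
    if (prev !== next) return true;
  }
  return false;
}
```

A component would then implement shouldComponentUpdate(nextProps) by returning propsChanged(this.props, nextProps).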

    Conclusion

    A common rule of thumb is to measure the performance of the app and only optimize if needed. The performance impact of inline functions, often categorized under micro-optimizations, is always a tradeoff between code readability, performance gain, code organization, etc., that must be weighed carefully on a case-by-case basis; premature optimization should be avoided.

    In this blog post, we observed that inline functions don’t necessarily bring a large performance cost. They are widely used because they are easy to write, read, and organize, especially when the function definitions are short and simple.

  • Managing a TLS Certificate for Kubernetes Admission Webhook

    A Kubernetes admission controller is a great way of handling an incoming request, whether to add or modify fields or deny the request as per the rules/configuration defined. To extend the native functionalities, these admission webhook controllers call a custom-configured HTTP callback (webhook server) for additional checks. But the API server only communicates over HTTPS with admission webhook servers and needs the TLS certificate’s CA information. This poses the problem of how to manage this webhook server certificate and how to pass the CA information to the API server automatically.

    One way to handle the TLS certificate and CA is to use Kubernetes cert-manager. However, cert-manager is itself a big application and consists of many CRDs to handle its operation. It is not a good idea to install cert-manager just to handle the admission webhook TLS certificate and CA. The second, and possibly easier, way is to use a self-signed certificate and handle the CA on our own using an init container. This eliminates the dependency on other applications, like cert-manager, and gives us the flexibility to control our application flow.

    How is a custom admission webhook written? We will not cover this in depth; only a basic overview of admission controllers and how they work will be covered. The main focus of this blog is to cover the second approach step by step: handling the admission webhook server TLS certificate and CA on our own using an init container so that the API server can communicate with our custom webhook.

    To understand the in-depth working of Admission Controllers, these articles are great: 

    Prerequisites:

    • Knowledge of Kubernetes admission controllers,  MutatingAdmissionWebhook, ValidatingAdmissionWebhook
    • Knowledge of Kubernetes resources like pods and volumes

    Basic Overview: 

    Admission controllers intercept requests to the Kubernetes API server before the persistence of objects in etcd. These controllers are bundled and compiled together into the kube-apiserver binary. Among them are two special controllers: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. MutatingAdmissionWebhook, as the name suggests, mutates/adds/modifies fields in the request object by creating a patch, and ValidatingAdmissionWebhook validates the request, checking, for example, whether the request object’s fields are valid or the operation is allowed, as per custom logic.

    The main reason for these types of controllers is to dynamically add new checks along with the native existing checks in Kubernetes to allow a request, just like the plug-in model. To understand this more clearly, let’s say we want all the deployments in the cluster to have certain required labels. If the deployment does not have required labels, then the create deployment request should be denied. This functionality can be achieved in two ways: 

    1) Add these extra checks natively in Kubernetes API server codebase, compile a new binary, and run with the new binary. This is a tedious process, and every time new checks are needed, a new binary is required. 

    2) Create a custom admission webhook, a simple HTTP server, for these additional checks, and register this admission webhook with the API server using the AdmissionRegistration API. To register it, two configurations are used: MutatingWebhookConfiguration and ValidatingWebhookConfiguration. The second approach is recommended, and it’s quite easy as well. We will discuss it here in detail.

    Custom Admission Webhook Server:

    As mentioned earlier, a custom admission webhook server is a simple HTTP server with TLS that exposes endpoints for mutation and validation. Depending upon the endpoint hit, the corresponding handler processes mutation or validation. Once a custom webhook server is ready and deployed in a cluster as a deployment along with a webhook service, the next part is to register it with the API server so that the API server can communicate with it. To register, MutatingWebhookConfiguration and ValidatingWebhookConfiguration are used. These configurations have a section for the custom webhook’s information.

    apiVersion: admissionregistration.k8s.io/v1
    kind: MutatingWebhookConfiguration
    metadata:
      name: mutation-config
    webhooks:
      - admissionReviewVersions:
        - v1beta1
        name: mapplication.kb.io
        clientConfig:
          caBundle: ${CA_BUNDLE}
          service:
            name: webhook-service
            namespace: default
            path: /mutate
        rules:
          - apiGroups:
              - apps
            apiVersions:
              - v1
            resources:
              - deployments
        sideEffects: None

    Here, the service field gives the name, namespace, and endpoint path of the running webhook server. An important field to note is the CA bundle. A custom admission webhook must run its HTTP server with TLS because the API server only communicates over HTTPS. The webhook server therefore runs with a server certificate and key, and the “caBundle” field in the configuration carries the CA (Certificate Authority) information so that the API server can recognize the server certificate.

    The problem here is how to handle this server certificate and key, and how to get this CA bundle and pass it to the API server using MutatingWebhookConfiguration or ValidatingWebhookConfiguration. This will be the main focus of the following sections.

    Here, we are going to use a self-signed certificate for the webhook server. This certificate can be made available to the webhook server in different ways. Two possible approaches are:

    • Create a Kubernetes secret containing the certificate and key, and mount it as a volume on the server pod
    • Create the certificate and key in a shared volume, e.g., an emptyDir volume, and have the server consume them from that volume

    However, with either approach, the remaining important part is to add the CA bundle to the mutation/validation configs.

    So, instead of doing all these steps manually, we can make use of Kubernetes init containers to perform all of them for us.

    Custom Admission Webhook Server Init Container:

    The main function of this init container is to create a self-signed webhook server certificate and provide the CA bundle to the API server via the mutation/validation configs. How the webhook server consumes this certificate (via a secret volume or an emptyDir volume) depends on the use-case. The init container runs a simple Go binary to perform all of these functions.

    package main
    
    import (
    	"bytes"
    	cryptorand "crypto/rand"
    	"crypto/rsa"
    	"crypto/x509"
    	"crypto/x509/pkix"
    	"encoding/pem"
    	"math/big"
    	"os"
    	"time"
    
    	log "github.com/sirupsen/logrus"
    )
    
    func main() {
    	var caPEM, serverCertPEM, serverPrivKeyPEM *bytes.Buffer
    	// CA config
    	ca := &x509.Certificate{
    		SerialNumber: big.NewInt(2020),
    		Subject: pkix.Name{
    			Organization: []string{"velotio.com"},
    		},
    		NotBefore:             time.Now(),
    		NotAfter:              time.Now().AddDate(1, 0, 0),
    		IsCA:                  true,
    		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth, x509.ExtKeyUsageServerAuth},
    		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
    		BasicConstraintsValid: true,
    	}
    
    	// CA private key
    	caPrivKey, err := rsa.GenerateKey(cryptorand.Reader, 4096)
    	if err != nil {
    		log.Panic(err)
    	}
    
    	// Self-signed CA certificate
    	caBytes, err := x509.CreateCertificate(cryptorand.Reader, ca, ca, &caPrivKey.PublicKey, caPrivKey)
    	if err != nil {
    		log.Panic(err)
    	}
    
    	// PEM encode CA cert
    	caPEM = new(bytes.Buffer)
    	_ = pem.Encode(caPEM, &pem.Block{
    		Type:  "CERTIFICATE",
    		Bytes: caBytes,
    	})
    
    	dnsNames := []string{"webhook-service",
    		"webhook-service.default", "webhook-service.default.svc"}
    	commonName := "webhook-service.default.svc"
    
    	// server cert config
    	cert := &x509.Certificate{
    		DNSNames:     dnsNames,
    		SerialNumber: big.NewInt(1658),
    		Subject: pkix.Name{
    			CommonName:   commonName,
    			Organization: []string{"velotio.com"},
    		},
    		NotBefore:    time.Now(),
    		NotAfter:     time.Now().AddDate(1, 0, 0),
    		SubjectKeyId: []byte{1, 2, 3, 4, 6},
    		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth, x509.ExtKeyUsageServerAuth},
    		KeyUsage:     x509.KeyUsageDigitalSignature,
    	}
    
    	// server private key
    	serverPrivKey, err := rsa.GenerateKey(cryptorand.Reader, 4096)
    	if err != nil {
    		log.Panic(err)
    	}
    
    	// sign the server cert with the CA private key
    	serverCertBytes, err := x509.CreateCertificate(cryptorand.Reader, cert, ca, &serverPrivKey.PublicKey, caPrivKey)
    	if err != nil {
    		log.Panic(err)
    	}
    
    	// PEM encode the server cert and key
    	serverCertPEM = new(bytes.Buffer)
    	_ = pem.Encode(serverCertPEM, &pem.Block{
    		Type:  "CERTIFICATE",
    		Bytes: serverCertBytes,
    	})
    
    	serverPrivKeyPEM = new(bytes.Buffer)
    	_ = pem.Encode(serverPrivKeyPEM, &pem.Block{
    		Type:  "RSA PRIVATE KEY",
    		Bytes: x509.MarshalPKCS1PrivateKey(serverPrivKey),
    	})
    
    	err = os.MkdirAll("/etc/webhook/certs/", 0755)
    	if err != nil {
    		log.Panic(err)
    	}
    	err = WriteFile("/etc/webhook/certs/tls.crt", serverCertPEM)
    	if err != nil {
    		log.Panic(err)
    	}
    
    	err = WriteFile("/etc/webhook/certs/tls.key", serverPrivKeyPEM)
    	if err != nil {
    		log.Panic(err)
    	}
    
    }
    
    // WriteFile writes data in the file at the given path
    func WriteFile(filepath string, sCert *bytes.Buffer) error {
    	f, err := os.Create(filepath)
    	if err != nil {
    		return err
    	}
    	defer f.Close()
    
    	_, err = f.Write(sCert.Bytes())
    	if err != nil {
    		return err
    	}
    	return nil
    }

    The steps to generate a self-signed CA and sign the webhook server certificate with it in Go:

    • Create a config for the CA, ca in the code above.
    • Create an RSA private key for this CA, caPrivKey in the code above.
    • Generate a self-signed CA, caBytes, and caPEM above. Here caPEM is the PEM encoded caBytes and will be the CA bundle given to the API server.
    • Create a config for the webhook server certificate, cert in the code above. The important fields in this config are DNSNames and CommonName; the common name must be the full DNS name of the webhook service so that the API server can reach the webhook pod.
    • Create an RSA private key for the webhook server, serverPrivKey in the code above.
    • Create the server certificate using ca and caPrivKey, serverCertBytes in the code above.
    • Now, PEM encode serverPrivKey and serverCertBytes. The resulting serverCertPEM and serverPrivKeyPEM are the TLS certificate and key that will be consumed by the webhook server.

    At this point, we have generated the required certificate, key, and CA bundle using the init container. Now we will share the server certificate and key with the actual webhook server container in the same pod.

    • One approach is to create an empty Secret resource beforehand and create the webhook deployment with the secret name passed as an environment variable. The init container will generate the server certificate and key and populate the empty secret with them. This secret will be mounted onto the webhook server container to start the HTTP server with TLS.
    • The second approach (used in the code above) is to use Kubernetes’ native pod-scoped emptyDir volume, shared between both containers. In the code above, the init container writes the certificate and key to files on a particular path. This path is where the emptyDir volume is mounted, and the webhook server container reads the certificate and key for its TLS configuration from that path and starts the HTTP webhook server. Refer to the below diagram:

    The pod spec will look something like this:

    spec:
      initContainers:
        - image: <webhook init-image name>
          imagePullPolicy: IfNotPresent
          name: webhook-init
          volumeMounts:
            - mountPath: /etc/webhook/certs
              name: webhook-certs
      containers:
        - image: <webhook server image name>
          imagePullPolicy: IfNotPresent
          name: webhook-server
          volumeMounts:
            - mountPath: /etc/webhook/certs
              name: webhook-certs
              readOnly: true
      volumes:
        - name: webhook-certs
          emptyDir: {}

    The only part remaining is to give this CA bundle information to the API server using mutation/validation configs. This can be done in two ways:

    • Patch the CA bundle in the existing MutatingWebhookConfiguration or ValidatingWebhookConfiguration using Kubernetes go-client in the init container.
    • Create MutatingWebhookConfiguration or ValidatingWebhookConfiguration in the init container itself with CA bundle information in configs.

    Here, we will create the configs through the init container. To get certain parameters dynamically, like the mutation config name, webhook service name, and webhook namespace, we can read them from the init container’s env:

    initContainers:
      - image: <webhook init-image name>
        imagePullPolicy: IfNotPresent
        name: webhook-init
        volumeMounts:
          - mountPath: /etc/webhook/certs
            name: webhook-certs
        env:
          - name: MUTATE_CONFIG
            value: mutating-webhook-configuration
          - name: VALIDATE_CONFIG
            value: validating-webhook-configuration
          - name: WEBHOOK_SERVICE
            value: webhook-service
          - name: WEBHOOK_NAMESPACE
            value: default

    To create the MutatingWebhookConfiguration, we will add the following code to the init container, below the certificate generation code.

    package main
    
    import (
    	"bytes"
    	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/client-go/kubernetes"
    	"os"
    	ctrl "sigs.k8s.io/controller-runtime"
    )
    
    func createMutationConfig(caCert *bytes.Buffer) {
    
    	var (
    		webhookNamespace, _ = os.LookupEnv("WEBHOOK_NAMESPACE")
    		mutationCfgName, _  = os.LookupEnv("MUTATE_CONFIG")
    		// validationCfgName, _ = os.LookupEnv("VALIDATE_CONFIG") Not used here in below code
    		webhookService, _ = os.LookupEnv("WEBHOOK_SERVICE")
    	)
    	config := ctrl.GetConfigOrDie()
    	kubeClient, err := kubernetes.NewForConfig(config)
    	if err != nil {
    		panic("failed to set up go-client")
    	}
    
    	path := "/mutate"
    	fail := admissionregistrationv1.Fail
    
    	mutateconfig := &admissionregistrationv1.MutatingWebhookConfiguration{
    		ObjectMeta: metav1.ObjectMeta{
    			Name: mutationCfgName,
    		},
    		Webhooks: []admissionregistrationv1.MutatingWebhook{{
    			Name: "mapplication.kb.io",
    			ClientConfig: admissionregistrationv1.WebhookClientConfig{
    				CABundle: caCert.Bytes(), // CA bundle created earlier
    				Service: &admissionregistrationv1.ServiceReference{
    					Name:      webhookService,
    					Namespace: webhookNamespace,
    					Path:      &path,
    				},
    			},
    			Rules: []admissionregistrationv1.RuleWithOperations{{Operations: []admissionregistrationv1.OperationType{
    				admissionregistrationv1.Create},
    				Rule: admissionregistrationv1.Rule{
    					APIGroups:   []string{"apps"},
    					APIVersions: []string{"v1"},
    					Resources:   []string{"deployments"},
    				},
    			}},
    			FailurePolicy: &fail,
    		}},
    	}
      
    	if _, err := kubeClient.AdmissionregistrationV1().MutatingWebhookConfigurations().Create(mutateconfig); err != nil {
    		panic(err)
    	}
    }

    The code above is a sample for creating a MutatingWebhookConfiguration. First, we import the required packages. Then, we read environment variables such as webhookNamespace. Next, we define the MutatingWebhookConfiguration struct with the CA bundle created earlier and the other required information. Finally, we create the configuration using the go-client. The same approach can be followed for creating a ValidatingWebhookConfiguration. To handle pod restarts or deletions, we can add extra logic to the init container, such as deleting the existing configs before creating new ones, or updating only the CA bundle if the configs already exist.
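    The create-or-update logic for restarts can be sketched roughly as shown below. This is only a sketch, not a complete implementation: it reuses the kubeClient, mutationCfgName, mutateconfig, and caCert variables from createMutationConfig above, and apierrors refers to the k8s.io/apimachinery/pkg/api/errors package.

```go
	// Sketch: create the config if it does not exist yet, otherwise refresh
	// only its CA bundle. Reuses kubeClient, mutationCfgName, mutateconfig,
	// and caCert from createMutationConfig above.
	existing, err := kubeClient.AdmissionregistrationV1().MutatingWebhookConfigurations().
		Get(context.TODO(), mutationCfgName, metav1.GetOptions{})
	switch {
	case apierrors.IsNotFound(err):
		// No config yet: create it as before.
		_, err = kubeClient.AdmissionregistrationV1().MutatingWebhookConfigurations().
			Create(context.TODO(), mutateconfig, metav1.CreateOptions{})
	case err == nil:
		// Config already exists: update only the CA bundle in every webhook entry.
		for i := range existing.Webhooks {
			existing.Webhooks[i].ClientConfig.CABundle = caCert.Bytes()
		}
		_, err = kubeClient.AdmissionregistrationV1().MutatingWebhookConfigurations().
			Update(context.TODO(), existing, metav1.UpdateOptions{})
	}
	if err != nil {
		panic(err)
	}
```

    Using Get followed by Create or Update keeps the operation idempotent across pod restarts, at the cost of one extra API call.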

    For certificate rotation, the approach depends on how the certificate is served to the server container:

    • If we are using an emptyDir volume, the approach is simply to restart the webhook pod. Since an emptyDir volume is ephemeral and bound to the lifecycle of the pod, a new certificate will be generated on restart and served to the server container. The new CA bundle is then written into the configs if they already exist.
    • If we are using a secret volume, then on webhook pod restart we can check the expiration of the existing certificate in the secret to decide whether to keep using it for the server or to create a new one.

    In both cases, a webhook pod restart is required to trigger the certificate rotation/renewal process. When and how the webhook pod is restarted will vary with the use case; possible mechanisms include a cron job or a controller.
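    A cron-job based restart could look like the following sketch. The names webhook-cert-rotator, webhook-restart-sa, and deployment/webhook-server are placeholders, and the service account needs RBAC permission to patch the deployment:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: webhook-cert-rotator
spec:
  schedule: "0 0 1 * *"  # once a month, before the certificate expires
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: webhook-restart-sa
          restartPolicy: OnFailure
          containers:
          - name: restart
            image: bitnami/kubectl:latest
            command:
            - kubectl
            - rollout
            - restart
            - deployment/webhook-server
            - --namespace=webhook-namespace
```

    The rollout restart recreates the webhook pod, which reruns the init container and so regenerates or renews the certificate.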

    Now our custom webhook is registered, the API server can read the CA bundle information through the configs, and the webhook server is ready to serve mutation/validation requests according to the rules defined in the configs.

    Conclusion:

    We covered how to add additional mutation/validation checks by registering our own custom admission webhook server. We also covered how to automatically manage the webhook server's TLS certificate and key using init containers, and how to pass the CA bundle information to the API server through the mutation/validation configs.

    Related Articles:

    1. OPA On Kubernetes: An Introduction For Beginners

    2. Prow + Kubernetes – A Perfect Combination To Execute CI/CD At Scale