  • An Introduction to React Fiber – The Algorithm Behind React

    In this article, we will learn about React Fiber, the core algorithm behind React. React Fiber is the new reconciliation algorithm introduced in React 16. You’ve most likely heard of the virtualDOM from React 15. That is the old reconciler algorithm, also known as the Stack Reconciler because it uses a stack internally. The same reconciler is shared by different renderers such as DOM, Native, and Android view, so calling it virtualDOM may lead to confusion.

    So without any delay, let’s see what React Fiber is.

    Introduction

    React Fiber is a completely backward-compatible rewrite of the old reconciler. This new reconciliation algorithm from React is called Fiber Reconciler. The name comes from fiber, which it uses to represent the node of the DOM tree. We will go through fiber in detail in later sections.

    The main goals of the Fiber reconciler are incremental rendering, better or smoother rendering of UI animations and gestures, and responsiveness of the user interactions. The reconciler also allows you to divide the work into multiple chunks and divide the rendering work over multiple frames. It also adds the ability to define the priority for each unit of work and pause, reuse, and abort the work. 

    React 16 also introduced other features, such as returning multiple elements from a render function, better error handling (we can use the componentDidCatch method to get clearer error messages), and portals.

    While computing new rendering updates, React yields back to the main thread multiple times. As a result, high-priority work can jump ahead of low-priority work. React has priorities defined internally for each update.

    Before going into technical details, I would recommend you learn the following terms, which will help understand React Fiber.

    Prerequisites

    Reconciliation

    As explained in the official React documentation, reconciliation is the algorithm React uses to diff one tree against another to determine which parts need to change. When the UI renders for the first time, React creates a tree of nodes, where each node represents a React element. It creates a virtual tree (known as the virtualDOM) that’s a copy of the rendered DOM tree. After any update from the UI, it recursively compares every node of the two trees. The cumulative changes are then passed to the renderer.

    Scheduling

    As explained in the React documentation, suppose we have some low-priority work (like a large computing function or the rendering of recently fetched elements), and some high-priority work (such as animation). There should be an option to prioritize the high-priority work over low-priority work. In the old stack reconciler implementation, recursive traversal and calling the render method of the whole updated tree happen in a single pass. This can lead to dropped frames.

    Scheduling can be time-based or priority-based. The updates should be scheduled according to the deadline. The high-priority work should be scheduled over low-priority work.
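
    As a rough illustration of priority-based scheduling, here is a minimal sketch in plain JavaScript. The PRIORITY levels, the createScheduler helper, and the task names are hypothetical and are not part of React; they only show high-priority work being picked ahead of low-priority work regardless of arrival order.

```javascript
// Hypothetical priority levels for illustration; a lower number is more urgent.
const PRIORITY = { ANIMATION: 1, USER_INPUT: 2, DATA_RENDER: 3 };

function createScheduler() {
  const queue = [];
  return {
    // Queue a named task with a priority.
    schedule(name, priority) {
      queue.push({ name, priority });
    },
    // Return the queued task names, most urgent first, and empty the queue.
    // Array.prototype.sort is stable, so equal priorities keep arrival order.
    flush() {
      const ordered = [...queue].sort((a, b) => a.priority - b.priority);
      queue.length = 0;
      return ordered.map((task) => task.name);
    },
  };
}

const scheduler = createScheduler();
scheduler.schedule('render fetched list', PRIORITY.DATA_RENDER);
scheduler.schedule('run animation', PRIORITY.ANIMATION);
scheduler.schedule('handle click', PRIORITY.USER_INPUT);

const executionOrder = scheduler.flush();
console.log(executionOrder);
```

    Although the list rendering was scheduled first, the animation and the click handler come out ahead of it.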

    requestIdleCallback 

    requestAnimationFrame schedules the high-priority function to be called before the next animation frame. Similarly, requestIdleCallback schedules the low-priority or non-essential function to be called in the free time at the end of the frame. 

     requestIdleCallback(lowPriorityWork);

    This shows the usage of requestIdleCallback. lowPriorityWork is a callback function that will be called in the free time at the end of the frame.

    function lowPriorityWork(deadline) {
      while (deadline.timeRemaining() > 0 && workList.length > 0) {
        performUnitOfWork();
      }

      if (workList.length > 0) {
        requestIdleCallback(lowPriorityWork);
      }
    }

    When this callback function is called, it receives a deadline object as its argument. As you can see in the snippet above, the timeRemaining function returns the idle time remaining in the current idle period. If this time is greater than zero, we can do the work needed. And if the work is not completed, we can schedule it again in the last line for the next frame.

    So, now we are ready to see what the fiber object itself looks like and how React Fiber works.
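
    Since requestIdleCallback only exists in browsers, the sketch below simulates the same loop so it can run anywhere. The createFakeDeadline helper, its spend method, and the budget of two units per frame are assumptions made for illustration; a real deadline object only exposes timeRemaining and is supplied by the browser.

```javascript
// Simulated stand-in for the browser's deadline object (an assumption for
// this sketch): every unit of work costs one time unit out of `budget`.
function createFakeDeadline(budget) {
  let remaining = budget;
  return {
    timeRemaining: () => remaining,
    spend: () => { remaining -= 1; },
  };
}

const workList = ['a', 'b', 'c', 'd', 'e'];
const log = [];

// Take one unit of work off the list and record the frame it ran in.
function performUnitOfWork(deadline, frame) {
  const unit = workList.shift();
  log.push(`frame ${frame}: ${unit}`);
  deadline.spend();
}

// Each pass of the outer loop plays the role of one idle callback.
let frame = 0;
while (workList.length > 0) {
  frame += 1;
  const deadline = createFakeDeadline(2);
  while (deadline.timeRemaining() > 0 && workList.length > 0) {
    performUnitOfWork(deadline, frame);
  }
}

console.log(log);
```

    With a budget of two units per frame, the five units of work are spread over three frames instead of blocking a single one.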

    Structure of fiber

    A fiber (lowercase ‘f’) is a simple JavaScript object. It represents a React element or a node of the DOM tree. It’s a unit of work. In comparison, Fiber (uppercase ‘F’) is the React Fiber reconciler.

    This example shows a simple React component that renders in root div.

    function App() {
        return (
          <div className="wrapper">
            <div className="list">
              <div className="list_item">List item A</div>
              <div className="list_item">List item B</div>
            </div>
            <div className="section">
              <button>Add</button>
              <span>No. of items: 2</span>
            </div>
          </div>
        );
      }
     
      ReactDOM.render(<App />, document.getElementById('root'));

    It’s a simple component that shows a list of items for the data we get from the component state. (I have replaced the .map iteration over the data with two list items just to keep this example simple.) There is also a button and a span, which shows the number of list items.

    As mentioned earlier, fiber represents the React element. While rendering for the first time, React goes through each of the React elements and creates a tree of fibers. (We will see how it creates this tree in later sections.) 

    It creates a fiber for each individual React element, like in the example above. It will create a fiber, such as W, for the div, which has the class wrapper. Then, fiber L for the div, which has a class list, and so on. Let’s name the fibers for two list items as LA and LB.

    In a later section, we will see how it iterates and the final structure of the tree. Though we call it a tree, React Fiber creates a linked list of nodes where each node is a fiber, with relationships between parent, child, and siblings. React uses a return key to point to the parent node, which is where a child fiber returns after completing its work. So, in the above example, LA’s return is L, and its sibling is LB.
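
    To make these pointers concrete, here is a minimal sketch that links the fibers W, L, LA, and LB by hand. The createFiber and linkChildren helpers are hypothetical and keep only the three pointer fields; real fibers carry many more properties.

```javascript
// A stripped-down fiber holding only a name and the three pointers.
function createFiber(name) {
  return { name, child: null, sibling: null, return: null };
}

// parent.child points at the first child only; each child points at its
// next sibling and back at the parent through `return`.
function linkChildren(parent, children) {
  parent.child = children[0] || null;
  children.forEach((fiber, i) => {
    fiber.return = parent;
    fiber.sibling = children[i + 1] || null;
  });
  return parent;
}

const W = createFiber('W');
const L = createFiber('L');
const LA = createFiber('LA');
const LB = createFiber('LB');

linkChildren(W, [L]);
linkChildren(L, [LA, LB]);

console.log(LA.return.name);  // parent of LA
console.log(LA.sibling.name); // next sibling of LA
console.log(LB.sibling);      // LB is the last child
```

    Note that L only points at its first child, LA. The second child, LB, is reached through LA’s sibling pointer, which is what makes the structure a linked list rather than a tree with child arrays.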

    So, how does this fiber object actually look?

    Below is the Fiber type definition from the React codebase. I have removed some extra props and kept some comments to explain the meaning of the properties. You can find the detailed structure in the React codebase.

    export type Fiber = {
        // Tag identifying the type of fiber.
        tag: TypeOfWork,
     
        // Unique identifier of this child.
        key: null | string,
     
        // The value of element.type which is used to preserve the identity during
        // reconciliation of this child.
        elementType: any,
     
        // The resolved function/class/ associated with this fiber.
        type: any,
     
        // The local state associated with this fiber.
        stateNode: any,
     
        // Remaining fields belong to Fiber
     
        // The Fiber to return to after finishing processing this one.
        // This is effectively the parent.
        // It is conceptually the same as the return address of a stack frame.
        return: Fiber | null,
     
        // Singly Linked List Tree Structure.
        child: Fiber | null,
        sibling: Fiber | null,
        index: number,
     
        // The ref last used to attach this node.
        ref: null | (((handle: mixed) => void) & {_stringRef: ?string, ...}) | RefObject,
     
        // Input is the data coming into process this fiber. Arguments. Props.
        pendingProps: any, // This type will be more specific once we overload the tag.
        memoizedProps: any, // The props used to create the output.
     
        // A queue of state updates and callbacks.
        updateQueue: mixed,
     
        // The state used to create the output
        memoizedState: any,
     
        mode: TypeOfMode,
     
        // Effect
        effectTag: SideEffectTag,
        subtreeTag: SubtreeTag,
        deletions: Array<Fiber> | null,
     
        // Singly linked list fast path to the next fiber with side-effects.
        nextEffect: Fiber | null,
     
        // The first and last fiber with side-effect within this subtree. This allows
        // us to reuse a slice of the linked list when we reuse the work done within
        // this fiber.
        firstEffect: Fiber | null,
        lastEffect: Fiber | null,
     
        // This is a pooled version of a Fiber. Every fiber that gets updated will
        // eventually have a pair. There are cases when we can clean up pairs to save
        // memory if we need to.
        alternate: Fiber | null,
      };

    How does React Fiber work?

    Next, we will see how the React Fiber creates the linked list tree and what it does when there is an update.

    Before that, let’s explain what the current and workInProgress trees are and how the tree traversal happens.

    The tree that is currently flushed to render the UI is called current. It’s the one that was used to render the current UI. Whenever there is an update, Fiber builds a workInProgress tree from the updated data in the React elements. React performs work on this workInProgress tree and uses the updated tree for the next render. Once the workInProgress tree is rendered on the UI, it becomes the current tree.
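
    The swap between the two trees can be sketched with a single fiber and its alternate pointer. The createWorkInProgress function below borrows its name from the React codebase, but its body is a heavily simplified assumption, as are the createFiber helper and the props values.

```javascript
function createFiber(name, props) {
  return { name, props, alternate: null };
}

// Simplified sketch: reuse the paired fiber if one exists,
// otherwise create it and link the pair in both directions.
function createWorkInProgress(current, pendingProps) {
  let workInProgress = current.alternate;
  if (workInProgress === null) {
    workInProgress = createFiber(current.name, pendingProps);
    workInProgress.alternate = current;
    current.alternate = workInProgress;
  } else {
    workInProgress.props = pendingProps;
  }
  return workInProgress;
}

const original = createFiber('App', { items: 2 });
const root = { current: original };

// First update: a new workInProgress fiber is created.
const firstWorkInProgress = createWorkInProgress(root.current, { items: 3 });
root.current = firstWorkInProgress; // "commit": workInProgress becomes current

// Second update: the previous current fiber is reused as workInProgress.
const secondWorkInProgress = createWorkInProgress(root.current, { items: 4 });

console.log(secondWorkInProgress === original);
console.log(secondWorkInProgress.props.items);
```

    After the first commit, the two fiber objects simply trade roles on every update, so no new fiber needs to be allocated.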

    Fig:- Current and workInProgress trees

    Fiber tree traversal happens like this:

    • Start: Fiber starts traversal from the topmost React element and creates a fiber node for it. 
    • Child: Then, it goes to the child element and creates a fiber node for this element. This continues until the leaf element is reached. 
    • Sibling: Now, it checks for the sibling element if there is any. If there is any sibling, it traverses the sibling subtree until the leaf element of the sibling. 
    • Return: If there is no sibling, then it returns to the parent. 

    Every fiber has child, sibling, and return (parent) properties, each of which is null when there is no such node (as you have seen in the structure of a fiber in the earlier section). These pointers are what let the fibers work as a linked list.

    Fig:- React Fiber tree traversal

    Let’s take the same example, but let’s name the fibers that correspond to the specific React elements.

    function App() {    // App
        return (
          <div className="wrapper">    {/* W */}
            <div className="list">    {/* L */}
              <div className="list_item">List item A</div>    {/* LA */}
              <div className="list_item">List item B</div>    {/* LB */}
            </div>
            <div className="section">   {/* S */}
              <button>Add</button>   {/* SB */}
              <span>No. of items: 2</span>   {/* SS */}
            </div>
          </div>
        );
      }
     
      ReactDOM.render(<App />, document.getElementById('root'));  // HostRoot

    First, we will quickly cover the mounting stage where the tree is created, and after that, we will see the detailed logic behind what happens after any update.

    Initial render

    The App component is rendered in root div, which has the id of root.

    Before traversing further, React Fiber creates a root fiber. Every Fiber tree has one root node. Here in our case, it’s HostRoot. There can be multiple roots if we mount multiple React apps in the DOM.

    Before rendering for the first time, there won’t be any tree. React Fiber traverses through the output from each component’s render function and creates a fiber node in the tree for each React element. It uses createFiberFromTypeAndProps to convert React elements to fiber. The React element can be a class component or a host component like div or span. For the class component, it creates an instance, and for the host component, it gets the data/props from the React Element.

    So, as shown in the example, it creates a fiber for App. Going further, it creates one more fiber, W, and then it goes to the child div and creates a fiber, L. Likewise, it creates fibers LA and LB for L’s children. The fiber LA will have L as its return (which can also be called its parent in this case) and LB as its sibling.

    So, this is how the final fiber tree will look.

    Fig:- React Fiber Relationship

    This is how the nodes of a tree are connected using the child, sibling, and return pointers.
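
    The following sketch builds the example tree with these three pointers and walks it using the child, sibling, return rule described earlier. The fiber helper and the visitOrder function are hypothetical names used only for illustration.

```javascript
// Create a fiber and wire up child, sibling, and return for its children.
function fiber(name, children = []) {
  const node = { name, child: null, sibling: null, return: null };
  node.child = children[0] || null;
  children.forEach((childFiber, i) => {
    childFiber.return = node;
    childFiber.sibling = children[i + 1] || null;
  });
  return node;
}

const hostRoot = fiber('HostRoot', [
  fiber('App', [
    fiber('W', [
      fiber('L', [fiber('LA'), fiber('LB')]),
      fiber('S', [fiber('SB'), fiber('SS')]),
    ]),
  ]),
]);

// Visit the child first, then the sibling, otherwise climb via return.
function visitOrder(root) {
  const order = [];
  let node = root;
  while (node !== null) {
    order.push(node.name);
    if (node.child !== null) {
      node = node.child;
      continue;
    }
    // Leaf reached: climb until some ancestor has a sibling left to visit.
    while (node !== null && node.sibling === null) {
      node = node.return;
    }
    if (node !== null) {
      node = node.sibling;
    }
  }
  return order;
}

const order = visitOrder(hostRoot);
console.log(order.join(' -> '));
```

    The walk needs no recursion and no explicit stack, because the return pointer already records where to go back to.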

    Update Phase

    Now, let’s cover the second case, which is an update, say due to setState.

    So, at this time, Fiber already has the current tree. For every update, it builds a workInProgress tree. It starts with the root fiber and traverses the tree until the leaf node. Unlike the initial render phase, it doesn’t create a new fiber for every React element. It just uses the preexisting fiber for that React element and merges the new data/props from the updated element in the update phase. 

    Earlier, in React 15, the stack reconciler was synchronous. So, an update would traverse the whole tree recursively and make a copy of the tree. Suppose that, in the middle of this, another update with a higher priority arrives. There is no way to abort or pause the first update and perform the second one.

    React Fiber divides the update into units of work, where each unit of work is a fiber. It can assign a priority to each unit of work, and it can pause, reuse, or abort a unit of work that is no longer needed. It schedules the work across multiple frames and uses the deadline from requestIdleCallback. Every update has a priority: an animation or user input, for example, has a higher priority than rendering a list of items from fetched data. Fiber uses requestAnimationFrame for higher-priority updates and requestIdleCallback for lower-priority updates. So, while scheduling work, Fiber checks the priority of the current update and the deadline (the free time at the end of the frame).

    Fiber can schedule multiple units of work after a single frame if their priority is higher than that of the pending work, or if there is no deadline or the deadline has not yet been reached. The next set of units of work is carried over to later frames. This is what makes it possible for Fiber to pause, reuse, and abort a unit of work.

    So, let’s see what actually happens in the scheduled work. There are two phases to complete the work: render and commit.

    Render Phase

    The actual tree traversal and the use of deadline happens in this phase. This is the internal logic of Fiber, so the changes made on the Fiber tree in this phase won’t be visible to the user. So Fiber can pause, abort, or divide work on multiple frames. 

    We can call this phase the reconciliation phase. Fiber traverses from the root of the fiber tree and processes each fiber. The workLoop function drives this processing, performing one unit of work per iteration. We can divide the processing of each unit of work into two steps: begin and complete.

    Begin Step

    If you look at the workLoop function in the React codebase, it calls performUnitOfWork, which takes nextUnitOfWork as a parameter. This is simply the unit of work that will be performed next. The performUnitOfWork function internally calls the beginWork function. This is where the actual work happens on the fiber; performUnitOfWork is just where the iteration happens.

    Inside the beginWork function, if the fiber doesn’t have any pending work, it just bails out of (skips) the fiber without entering the begin phase. This is how, while traversing a large tree, Fiber skips already processed fibers and jumps directly to the fiber that has pending work. If you look at the large beginWork function, you will find a switch block that calls the respective fiber update function depending on the fiber tag, such as updateHostComponent for host components. These functions update the fiber.

    The beginWork function returns the child fiber if there is one, or null if there is no child. The performUnitOfWork function keeps iterating and calling beginWork on the child fibers until it reaches a leaf node. For a leaf node, beginWork returns null because there is no child, and performUnitOfWork calls the completeUnitOfWork function. Let’s see the complete step now.

    Complete Step

    The completeUnitOfWork function completes the current unit of work by calling the completeWork function. If there is a sibling fiber, completeUnitOfWork returns it as the next unit of work; otherwise, it moves on to complete the return (parent) fiber. This continues until return is null, i.e., until it reaches the root node. Like beginWork, completeWork is the function where the actual work happens, and completeUnitOfWork handles the iteration.

    The render phase produces an effect list (side-effects). These effects include inserting, updating, or deleting a node for host components, or calling lifecycle methods for class components. The fibers are marked with the respective effect tag.
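
    The interplay of the begin and complete steps can be sketched as follows. The function names match the ones discussed in this article, but the bodies are simplified assumptions that only log the order of calls; the real React functions do far more. The fiber helper is hypothetical.

```javascript
// Create a fiber and wire up child, sibling, and return for its children.
function fiber(name, children = []) {
  const node = { name, child: null, sibling: null, return: null };
  node.child = children[0] || null;
  children.forEach((childFiber, i) => {
    childFiber.return = node;
    childFiber.sibling = children[i + 1] || null;
  });
  return node;
}

const log = [];

// Begin step: do the work for this fiber and hand back its child (or null).
function beginWork(unit) {
  log.push(`begin ${unit.name}`);
  return unit.child;
}

// Complete step: finish the work for this fiber.
function completeWork(unit) {
  log.push(`complete ${unit.name}`);
}

// Complete the unit, then return a sibling if there is one;
// otherwise keep completing parents until return is null.
function completeUnitOfWork(unit) {
  let node = unit;
  while (node !== null) {
    completeWork(node);
    if (node.sibling !== null) {
      return node.sibling;
    }
    node = node.return;
  }
  return null;
}

function performUnitOfWork(unit) {
  const next = beginWork(unit);
  return next !== null ? next : completeUnitOfWork(unit);
}

function workLoop(root) {
  let nextUnitOfWork = root;
  while (nextUnitOfWork !== null) {
    nextUnitOfWork = performUnitOfWork(nextUnitOfWork);
  }
}

workLoop(fiber('L', [fiber('LA'), fiber('LB')]));
console.log(log);
```

    A parent is begun before its children but completed only after all of them, and because the loop keeps its position in nextUnitOfWork, it could stop between any two iterations and resume later.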

    After the render phase, Fiber will be ready to commit the updates. 

    Commit Phase

    This is the phase where the finished work will be used to render it on the UI. As the result of this phase will be visible to the user, it can’t be divided into partial renders. This phase is synchronous.

    At the beginning of this phase, Fiber has three things: the current tree that’s already rendered on the UI; the finishedWork tree, which is the workInProgress tree built during the render phase; and the effect list.

    The effect list is a linked list of the fibers that have side-effects. So, it’s the subset of nodes from the render phase’s workInProgress tree that have side-effects (updates). The effect list nodes are linked using the nextEffect pointer.

    The function called during this phase is completeRoot.

    Here, the workInProgress tree becomes the current tree as it is used to render the UI. The actual DOM updates (insert, update, and delete), calls to lifecycle methods, and updates related to refs happen for the nodes present in the effect list.
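
    Walking the effect list can be sketched like this. The effect tag strings, the createEffectFiber helper, the commitAllEffects function, and the extra list item LC are all hypothetical; the sketch only shows that the commit phase follows the nextEffect pointers instead of traversing the whole tree again.

```javascript
// Hypothetical effect tags, written as strings for readability.
const PLACEMENT = 'PLACEMENT';
const UPDATE = 'UPDATE';

function createEffectFiber(name, effectTag) {
  return { name, effectTag, nextEffect: null };
}

// Suppose a new list item LC was added and the counter span SS changed.
const insertItem = createEffectFiber('LC', PLACEMENT);
const updateSpan = createEffectFiber('SS', UPDATE);
insertItem.nextEffect = updateSpan;

const finishedWork = { firstEffect: insertItem, lastEffect: updateSpan };

// Follow nextEffect from firstEffect and apply every effect in one pass.
function commitAllEffects(tree) {
  const applied = [];
  let effect = tree.firstEffect;
  while (effect !== null) {
    applied.push(`${effect.effectTag} ${effect.name}`);
    effect = effect.nextEffect;
  }
  return applied;
}

const applied = commitAllEffects(finishedWork);
console.log(applied);
```

    Fibers without side-effects never appear in this list, so the commit phase only touches the nodes that actually changed.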

    That’s how the Fiber reconciler works.

    Conclusion

    This is how the React Fiber reconciler makes it possible to divide the work into multiple units of work. It sets the priority of each unit of work and makes it possible to pause, reuse, and abort it. In the fiber tree, each individual node keeps track of the information needed to make this possible. Every fiber is a node in a linked list, connected to other fibers through the child, sibling, and return references.

    Here is a well documented list of resources you can find to know more about the React Fiber.

  • How Much Do You Really Know About Simplified Cloud Deployments?

    Is your EC2/VM bill giving you sleepless nights?

    Are your EC2 instances under-utilized? Have you been wondering if there was an easy way to maximize the EC2/VM usage?

    Are you investing too much in your Control Plane and wish you could divert some of that investment towards developing more features in your applications (business logic)?

    Is your Configuration Management system overwhelming you and seems to have got a life of its own?

    Do you have legacy applications that do not need Docker at all?

    Would you like to simplify your deployment toolchain to streamline your workflows?

    Have you been recommended to use Kubernetes as a solution to fix all your woes, but you aren’t sure if Kubernetes is actually going to help you?

    Do you feel you are moving towards Docker, just so that Kubernetes can be used?

    If you answered “Yes” to any of the questions above, do read on; this article is just what you might need.

    There are steps to create a simple setup on your laptop at the end of the article.

    Introduction

    In the following article, we will present the typical components of a multi-tier application and how it is set up and deployed.

    We shall further go on to see how the same application deployment can be remodeled for scale using any Cloud Infrastructure. (The same software toolchain can be used to deploy the application on your On-Premise Infrastructure as well.)

    The tools that we propose are Nomad and Consul. We shall focus more on how to use these tools, rather than deep-dive into the specifics of the tools. We will briefly see the features of the software which would help us achieve our goals.

    • Nomad is a distributed workload manager for not only Docker containers, but also for various other types of workloads like legacy applications, JAVA, LXC, etc.

    More about Nomad Drivers here: Nomadproject.io, application delivery with HashiCorp, introduction to HashiCorp Nomad.

    • Consul is a distributed service mesh, with features like service registry and a key-value store, among others.

    Using these tools, the application/startup workflow would be as follows:

    Nomad will be responsible for starting the service.

    Nomad will publish the service information in Consul. The service information will include details like:

    • Where is the application running (IP:PORT) ?
    • What “service-name” is used to identify the application?
    • What “tags” (metadata) does this application have?

    A Typical Application

    A typical application deployment consists of a certain fixed set of processes, usually coupled with a database and a set of few (or many) peripheral services.

    These services could be primary (must-have) or support (optional) features of the application.

    Note: We are aware of what a proper “service-oriented-architecture” should be and how it should look, though we will skip that discussion for now. We will rather focus on how real-world applications are set up and deployed.

    Simple Multi-tier Application

    In this section, let’s see the components of a multi-tier application along with typical access patterns from outside the system and within the system.

    • Load Balancer/Web/Front End Tier
    • Application Services Tier
    • Database Tier
    • Utility (or Helper Servers): To run background, cron, or queued jobs.

    Using a proxy/loadbalancer, the services (Service-A, Service-B, Service-C) could be accessed using distinct hostnames:

    • a.example.tld
    • b.example.tld
    • c.example.tld

    For an equivalent path-based routing approach, the setup would be similar. Instead of distinct hostnames, the communication mechanism would be:

    • common-proxy.example.tld/path-a/
    • common-proxy.example.tld/path-b/
    • common-proxy.example.tld/path-c/

    Problem Scenario 1

    Some of the basic problems with the deployment of the simple multi-tier application are:

    • What if the service process crashes during its runtime?
    • What if the host on which the services run shuts down, reboots or terminates?

    This is where Nomad’s ability to always keep the service running is useful.

    In spite of this auto-restart feature, there could be issues if the service restarts on a different machine (i.e. different IP address).

    In case of Docker and ephemeral ports, the service could start on a different port as well.

    To solve this, we will use the service discovery feature provided by Consul, combined with a Consul-aware load-balancer/proxy to redirect traffic to the appropriate service.

    The order of the operations within the Nomad job will thus be:

    • Nomad will launch the job/task.
    • Nomad will register the task details as a service definition in Consul.
      (These steps will be re-executed if/when the application is restarted due to a crash/fail-over)
    • The Consul-aware load-balancer will route the traffic to the service (IP:PORT)

    Multi-tier Application With Load Balancer

    Using the Consul-aware load-balancer, the diagram will now look like:

    The details of the setup now are:

    • A Consul-aware load-balancer/proxy; the application will access the services via the load-balancer.
    • 3 (three) instances of service A; A1, A2, A3
    • 3 (three) instances of service B; B1, B2, B3

    The Routing Question

    At this moment, you could be wondering, “How would the load-balancer know that it has to route traffic for service-A to A1/A2/A3 and route traffic for service-B to B1/B2/B3?”

    The answer lies in the Consul tags which will be published as part of the service definition (when Nomad registers the service in Consul).

    The appropriate Consul tags will tell the load-balancer to route traffic of a particular service to the appropriate backend. (+++)

    Let’s read that statement again (very slowly, just to be sure): the Consul tags, which are part of the service definition, will inform (advertise) the load-balancer to route traffic to the appropriate backend.

    It is important to dwell on this distinction, as this is different from how classic load-balancer/proxy software like HAProxy or NGINX is configured. For HAProxy/NGINX, the backend routing information resides with the load-balancer instance and is not “advertised” by the backend.

    Traditional load-balancers like NGINX/HAProxy do not natively support dynamic reloading of the backends (when the backends stop, start, or move around). The heavy lifting of regenerating the configuration file and reloading the service is left up to an external entity like Consul-Template.

    The use of a Consul-aware load-balancer, instead of a traditional load-balancer, eliminates the need for external workarounds.

    The setup can thus be termed as a zero-configuration setup; you don’t have to re-configure the load-balancer, it will discover the changing backend services based on the information available from Consul.
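
    The tag-driven routing described above can be sketched in a few lines of JavaScript. This is not the code of any particular load-balancer: the "route=" tag format, the service entries, the addresses, and the buildRoutingTable function are all assumptions made for illustration. The point is only that the routing table is derived from what the backends advertise, not from the load-balancer's own configuration file.

```javascript
// Hypothetical service entries, as a Consul-aware proxy might read them
// from the service registry. The "route=<hostname>" tag is an assumed format.
const services = [
  { name: 'service-a', address: '10.0.0.11', port: 21001, tags: ['route=a.example.tld'] },
  { name: 'service-a', address: '10.0.0.12', port: 21044, tags: ['route=a.example.tld'] },
  { name: 'service-b', address: '10.0.0.11', port: 23310, tags: ['route=b.example.tld'] },
];

// Build hostname -> list of "IP:PORT" backends purely from advertised tags.
function buildRoutingTable(entries) {
  const table = new Map();
  for (const { address, port, tags } of entries) {
    for (const tag of tags) {
      if (!tag.startsWith('route=')) continue;
      const host = tag.slice('route='.length);
      if (!table.has(host)) table.set(host, []);
      table.get(host).push(`${address}:${port}`);
    }
  }
  return table;
}

const routingTable = buildRoutingTable(services);
console.log(routingTable.get('a.example.tld'));
console.log(routingTable.get('b.example.tld'));
```

    When a service restarts on another machine or port, its new registration carries the same tag, so rebuilding the table picks up the new backend without touching the load-balancer configuration.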

    Problem Scenario 2

    So far we have achieved a method to “automatically” discover the backends, but isn’t the Load-Balancer itself a single-point-of-failure (SPOF)?

    It absolutely is, and you should always have redundant load-balancer instances (which is what any cloud-provided load-balancer has).

    As there is a certain cost associated with using a “cloud-provided load-balancer”, we will create the load-balancers ourselves and not use cloud-provided load-balancers.

    To provide redundancy to the load-balancer instances, you should configure them using an Auto Scaling Group (AWS), VM Scale Sets (Azure), etc.

    The same redundancy strategy should also be used for the worker nodes, where the actual services reside, by using AutoScaling Groups/VMSS for the worker nodes.

    The Complete Picture

    Installation and Configuration

    Given that nowadays laptops are pretty powerful, you can easily create a test setup on your laptop using VirtualBox, VMware Workstation Player, VMware Workstation, etc.

    As a prerequisite, you will need a few virtual machines which can communicate with each other.

    NOTE: Create the VMs with networking set to bridged mode.

    The machines needed for the simple setup/demo would be:

    • 1 Linux VM to act as a server (srv1)
    • 1 Linux VM to act as a load-balancer (lb1)
    • 2 Linux VMs to act as worker machines (client1, client2)

    Each machine can have 2 CPUs and 1 GB of memory.

    The configuration files and scripts needed for the demo, which will help you set up the Nomad and Consul cluster, are available here.

    Set Up the Server

    Install the binaries on the server

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Server Configuration

    ### On the server machine ...
    
    ### Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/server.hcl -O /etc/consul/server.hcl
    
    ### Edit Consul's server.hcl file and set up the fields 'encrypt' and 'retry_join' as per your cluster.
    sudo vim /etc/consul/server.hcl
    
    ### Nomad
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/server.hcl -O /etc/nomad/server.hcl
    
    ### Edit Nomad's server.hcl file and set up the fields 'encrypt' and 'retry_join' as per your cluster.
    sudo vim /etc/nomad/server.hcl
    
    ### After you are done with the edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members

    Set Up the Load-Balancer

    Install the binaries on the load-balancer

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Load-Balancer Configuration

    ### On the load-balancer machine ...
    
    ### for Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/client.hcl -O /etc/consul/client.hcl
    
    ### Edit Consul's client.hcl file and set up the fields 'name', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/consul/client.hcl
    
    ### for Nomad ...
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/client.hcl -O /etc/nomad/client.hcl
    
    ### Edit Nomad's client.hcl file and set up the fields 'name', 'node_class', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/nomad/client.hcl
    
    ### After you are done with the edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members
    sudo nomad node status -verbose

    Set Up the Client (Worker) Machines

    Install the binaries on each client (worker) machine

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Worker Configuration

    ### On the client (worker) machine ...
    
    ### Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/client.hcl -O /etc/consul/client.hcl
    
    ### Edit Consul's client.hcl file and set up the fields 'name', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/consul/client.hcl
    
    ### Nomad
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/client.hcl -O /etc/nomad/client.hcl
    
    ### Edit Nomad's client.hcl file and set up the fields 'name', 'node_class', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/nomad/client.hcl
    
    ### After you are sure about your edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members
    sudo nomad node status -verbose

    Test the Setup

    For the sake of simplicity, we shall assume the following IP addresses for the machines. (You can adapt the IPs as per your actual cluster configuration)

    srv1: 192.168.1.11

    lb1: 192.168.1.101

    client1: 192.168.1.201

    client2: 192.168.1.202

    You can access the web GUI for Consul and Nomad at the following URLs:

    Consul: http://192.168.1.11:8500

    Nomad: http://192.168.1.11:4646

    Log in to the server and start the following watch command:

    # watch -n 5 "consul members; echo; nomad server members; echo; nomad node status -verbose; echo; nomad job status"

    Output:

    Node     Address             Status  Type    Build  Protocol  DC   Segment
    srv1     192.168.1.11:8301   alive   server  1.5.1  2         dc1  <all>
    client1  192.168.1.201:8301  alive   client  1.5.1  2         dc1  <default>
    client2  192.168.1.202:8301  alive   client  1.5.1  2         dc1  <default>
    lb1      192.168.1.101:8301  alive   client  1.5.1  2         dc1  <default>
    
    Name         Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
    srv1.global  192.168.1.11  4648  alive   true    2         0.9.3  dc1         global
    
    ID           DC   Name     Class   Address        Version  Drain  Eligibility  Status
    37daf354...  dc1  client2  worker  192.168.1.202  0.9.3    false  eligible     ready
    9bab72b1...  dc1  client1  worker  192.168.1.201  0.9.3    false  eligible     ready
    621f4411...  dc1  lb1      lb      192.168.1.101  0.9.3    false  eligible     ready

    Submit Jobs

    Log in to the server (srv1) and download the sample jobs.

    Run the load-balancer job

    # nomad run fabio_docker.nomad

    Output:

    ==> Monitoring evaluation "bb140467"
        Evaluation triggered by job "fabio_docker"
        Allocation "1a6a5587" created: node "621f4411", group "fabio"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "bb140467" finished with status "complete"

    Check the status of the load-balancer

    # nomad alloc status 1a6a5587

    Output:

    ID                  = 1a6a5587
    Eval ID             = bb140467
    Name                = fabio_docker.fabio[0]
    Node ID             = 621f4411
    Node Name           = lb1
    Job ID              = fabio_docker
    Job Version         = 0
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 1m9s ago
    Modified            = 1m3s ago
    
    Task "fabio" is "running"
    Task Resources
    CPU        Memory          Disk     Addresses
    5/200 MHz  10 MiB/128 MiB  300 MiB  lb: 192.168.1.101:9999
                                        ui: 192.168.1.101:9998
    
    Task Events:
    Started At     = 2019-06-13T19:15:17Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-13T19:15:17Z  Started     Task started by client
    2019-06-13T19:15:12Z  Driver      Downloading image
    2019-06-13T19:15:12Z  Task Setup  Building Task Directory
    2019-06-13T19:15:12Z  Received    Task received by client

    Run the service ‘foo’

    # nomad run foo_docker.nomad

    Output:

    ==> Monitoring evaluation "a994bbf0"
        Evaluation triggered by job "foo_docker"
        Allocation "7794b538" created: node "9bab72b1", group "gowebhello"
        Allocation "eecceffc" modified: node "37daf354", group "gowebhello"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "a994bbf0" finished with status "complete"

    Check the status of service ‘foo’

    # nomad alloc status 7794b538

    Output:

    ID                  = 7794b538
    Eval ID             = a994bbf0
    Name                = foo_docker.gowebhello[1]
    Node ID             = 9bab72b1
    Node Name           = client1
    Job ID              = foo_docker
    Job Version         = 1
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 9s ago
    Modified            = 7s ago
    
    Task "gowebhello" is "running"
    Task Resources
    CPU        Memory           Disk     Addresses
    0/500 MHz  4.2 MiB/256 MiB  300 MiB  http: 192.168.1.201:23382
    
    Task Events:
    Started At     = 2019-06-13T19:27:17Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-13T19:27:17Z  Started     Task started by client
    2019-06-13T19:27:16Z  Task Setup  Building Task Directory
    2019-06-13T19:27:15Z  Received    Task received by client

    Run the service ‘bar’

    # nomad run bar_docker.nomad

    Output:

    ==> Monitoring evaluation "075076bc"
        Evaluation triggered by job "bar_docker"
        Allocation "9f16354b" created: node "9bab72b1", group "gowebhello"
        Allocation "b86d8946" created: node "37daf354", group "gowebhello"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "075076bc" finished with status "complete"

    Check the status of service ‘bar’

    # nomad alloc status 9f16354b

    Output:

    ID                  = 9f16354b
    Eval ID             = 075076bc
    Name                = bar_docker.gowebhello[1]
    Node ID             = 9bab72b1
    Node Name           = client1
    Job ID              = bar_docker
    Job Version         = 0
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 4m28s ago
    Modified            = 4m16s ago
    
    Task "gowebhello" is "running"
    Task Resources
    CPU        Memory           Disk     Addresses
    0/500 MHz  6.2 MiB/256 MiB  300 MiB  http: 192.168.1.201:23646
    
    Task Events:
    Started At     = 2019-06-14T06:49:36Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-14T06:49:36Z  Started     Task started by client
    2019-06-14T06:49:35Z  Task Setup  Building Task Directory
    2019-06-14T06:49:35Z  Received    Task received by client

    Check the Fabio Routes

    http://192.168.1.101:9998/routes

    Connect to the Services

    The services “foo” and “bar” are available at:

    http://192.168.1.101:9999/foo

    http://192.168.1.101:9999/bar

    Output:

    gowebhello root page
    
    https://github.com/udhos/gowebhello is a simple golang replacement for 'python -m SimpleHTTPServer'.
    Welcome!
    gowebhello version 0.7 runtime go1.12.5 os=linux arch=amd64
    Keepalive: true
    Application banner: Welcome to FOO
    ...
    ...

    Pressing F5 to refresh the browser should keep changing the backend service that you are eventually connected to.
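
    The behavior seen on refresh, where requests to the same URL prefix land on different backends, can be illustrated with a small routing sketch. This is not Fabio's actual implementation: the route table, the pickBackend function, and the client2 addresses are hypothetical, and only the client1 addresses are taken from the allocation output above.

    ```javascript
    // Hypothetical route table: URL prefix -> backend addresses.
    // The 192.168.1.202 entries are made up for illustration.
    const routes = {
      '/foo': ['192.168.1.201:23382', '192.168.1.202:23382'],
      '/bar': ['192.168.1.201:23646', '192.168.1.202:23646'],
    };

    // Number of requests already served per prefix.
    const counters = {};

    // Pick the next backend for a request path, rotating through the list.
    function pickBackend(path) {
      const prefix = Object.keys(routes).find((p) => path.startsWith(p));
      if (!prefix) return null; // no route registered for this path
      const backends = routes[prefix];
      const n = counters[prefix] || 0;
      counters[prefix] = n + 1;
      return backends[n % backends.length];
    }
    ```

    Calling pickBackend('/foo') repeatedly alternates between the two listed backends, which mirrors what you see when refreshing the browser.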

    Conclusion

    This article should give you a fair idea about the common problems of a distributed application and how they can be solved.

    Remodeling an existing application deployment as it scales can be quite a challenge. Hopefully, the sample/demo setup will help you explore, design, and optimize the deployment workflows of your application, whether it runs on-premises or in any cloud environment.

  • Why You Should Prefer Next.js 12 Over Other React Setup

    If you are coming from a robust framework, such as Angular or any other major full-stack framework, you have probably asked yourself why a popular library like React (yes, it’s not a framework, hence this blog) has the worst tooling and developer experience.

    Its authors have done the least amount of work possible in building this library: no routing, no SSR support, no decent design system, and no CSS support. Some people might disagree: “The whole idea is to keep it simple so that people can bootstrap their own framework.” –Dan Abramov. However, here’s the catch: most people don’t want to go through the tedious process of setting everything up.

    Many just want to install and start building some robust applications, and with the new release of Next.js (12), it’s more production-ready than your own setup can ever be.

    Before we get started discussing what Next.js 12 can do for us, let’s get some facts straight:

    • React is indeed a library that could be used with or without JSX.
    • Next.js is a framework (not entirely a UI framework) for building full-stack applications.
    • Next.js is opinionated, so if your plan is to do whatever you want or how you want, maybe Next isn’t the right thing for you (mind that it’s for production).
    • Although Next is one of the most actively updated codebases and has a massive community supporting it, a huge portion of it is handled by Vercel, and like other frameworks backed by a tech giant, be ready for occasional vendor lock-in (don’t forget React and Meta).
    • This is not a Next.js tutorial; I won’t be going over Next.js. I will be going over the features that are released with V12 that make it go over the inflection point where Next could be considered as the primary framework for React apps.

    ES module support

    ES modules bring a standardized module system to the entire JS ecosystem. They’re supported by all major browsers and Node.js, which helps your build produce smaller packages. With URL imports, you can use any package through a URL, with no installation or build step required. You can use any CDN that serves ES modules, as well as the design tools of the future (Framer already does it: https://www.framer.com/).

    import Card from 'https://framer.com/m/Card-3Yxh.js@gsb1Gjlgc5HwfhuD1VId';
    import Head from 'next/head';
    
    // A regular page component (the name Home is arbitrary);
    // Card is loaded straight from the URL above.
    export default function Home() {
      return (
        <>
          <Head>
            <title>URL imports for Next 12</title>
          </Head>
          <div>
            <Card variant='R3F' />
          </div>
        </>
      );
    }

    As you can see, we are importing a Card component directly from the framer CDN on the go with all its perks. This would, in turn, be the start of seamless integration with all your developer environments in the not-too-distant future. If you want to learn more about URL imports and how to enable the alpha version, go here.
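
    For orientation, URL imports in Next.js 12 are opt-in through an experimental flag in next.config.js. The snippet below is a minimal sketch that assumes the experimental urlImports option as documented for Next.js 12; the allowed prefix is chosen to match the Framer URL above, and the option may differ in other versions.

    ```javascript
    // next.config.js
    module.exports = {
      experimental: {
        // Allow-list of URL prefixes that modules may be imported from.
        urlImports: ['https://framer.com/m/'],
      },
    };
    ```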

    New engine for faster DEV run and production build:

    Next.js 12 ships with a new Rust-based compiler that runs natively. It is built on top of SWC, an open platform for fast tooling. According to Next.js, it delivers roughly 3 times faster local refresh and 5 times faster production builds.

    Most production builds of React apps use webpack, which carries a lot of overhead and doesn’t run as native code. SWC should save you much of the time you currently lose to these mundane workloads.

    Source: Nextjs.org

    Next.js Live:

    If you are anything like me, you’ve probably had some changes that you aren’t really sure about and just want to go through them with the designer, but you don’t really wanna push the code to PROD. Taking a call with the designer and sharing your screen isn’t really the best way to do it. If only there were a way to share your workflow on the go with your team, with a collaboration feature that wouldn’t take an entire day to set up. Well, Next.js Live lets you do just that.

    Source: Next.js

    With the help of the ES module system and native support for WebAssembly, Next.js Live runs entirely in the browser, irrespective of where you host it. The development engine behind it will soon be open source so that more platforms can take advantage of it, but for now, it’s all Next.js.

    Go over to V and do a test run.

    Middleware & serverless: 

    Middleware covers the repetitive pieces of code that could run on their own, outside your actual backend. The best part is that you don’t need to place them close to your backend. Before a request completes, you can rewrite, redirect, add headers, or even stream HTML. Depending on how you host your middleware (Vercel Edge Functions or AWS Lambda, for example), it can handle:

    • Authentication
    • Bot protection
    • Redirects 
    • Browser support
    • Feature flags 
    • A/B tests
    • Server-side analytics 
    • Logging

    And since this is part of the Next build output, you can technically use any hosting provider with an edge network (no vendor lock-in).

    To implement middleware, create a _middleware file inside any folder under pages. It will run before every request to that particular route (routeName):

    pages/routeName/_middleware.ts. 

    import type { NextFetchEvent } from 'next/server';
    import { NextResponse } from 'next/server';
    export function middleware(event: NextFetchEvent) {
      // grab the user's country, or fall back to India ('in') when geo data is missing
      const country = event.request.geo?.country?.toLowerCase() ?? 'in';
    
      // rewrite to a static, cached page for each locale
      return event.respondWith(NextResponse.rewrite(`/routeName/${country}`));
    }

    With this middleware, each response can be cached. Because a rewrite does not change the URL shown in the client, Next.js can still tell the requests apart and serve the matching country flag.
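
    The country lookup with a default can be isolated into a small pure function, which makes the fallback easy to reason about. This is a sketch only: countryRoute is a hypothetical helper, not part of Next.js, and the two-letter fallback 'in' is an assumption about the format of the geo country code.

    ```javascript
    // Hypothetical helper: build the rewrite target from an optional geo object.
    function countryRoute(geo, fallback = 'in') {
      // geo or geo.country may be missing, for example during local development.
      const country = geo && geo.country ? geo.country : fallback;
      return `/routeName/${country.toLowerCase()}`;
    }
    ```

    With this shape, countryRoute({ country: 'US' }) yields /routeName/us, while a missing geo object falls back to /routeName/in instead of throwing.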

    Server-side streaming:

    React 18 now supports the server-side Suspense API and SSR streaming. One big drawback of SSR was that the page could not be sent until the server had finished all of its work. So any page that needed heavy lifting on the server could give you a higher FCP (first contentful paint). Streaming lets you send server-rendered pages in pieces over HTTP, which addresses the long render time. You can try the alpha version by adding the following to next.config.js:

    module.exports = {
      experimental: {
        concurrentFeatures: true
      }
    }

    React server components:

    React server components allow us to render almost everything, including the components themselves, on the server. This is fundamentally different from SSR, where you are just generating HTML on the server. With server components, zero client-side JavaScript is needed, which makes rendering much faster (basically, there is no hydration step). You could say it combines the best parts of server rendering with client-side interactivity.

    import Footer from '../components/Footer';
    import Page from '../components/Page';
    import Story from '../components/Story';
    import fetchData from '../lib/api';
    export async function getServerSideProps() {
      const storyIds = await fetchData('storyIds');
      const data = await Promise.all(
        storyIds.slice(0, 30).map(async (id) => await fetchData(`item/${id}`))
      );
    
      return {
        props: {
          data,
        },
      };
    }
    
    export default function News({ data }) {
      return (
        <Page>
          {data?.map((item, i) => (
            <Story key={i} {...item} />
          ))}
          <Footer />
        </Page>
      );
    }

    As you can see in the above SSR example, while we are fetching the stories from the endpoint, our client is actually waiting for a response with a blank page, and depending upon how fast your APIs are, this is a pretty big problem—and the reason we don’t just use SSR blindly everywhere.

    Now, let’s take a look at a server component example:

    Any file ending with .server.js/.ts will be treated as a server component in your Next.js application. 

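    import { Suspense } from 'react';
    // Page, Footer, and fetchData paths follow the SSR example above;
    // the Spinner and StoryWithData paths are hypothetical.
    import Footer from '../components/Footer';
    import Page from '../components/Page';
    import Spinner from '../components/Spinner';
    import StoryWithData from '../components/StoryWithData';
    import fetchData from '../lib/api';
    
    export async function NewsWithData() {
      const storyIds = await fetchData('storyIds');
      return (
        <>
          {storyIds.slice(0, 30).map((id) => {
            return (
              <Suspense key={id} fallback={<Spinner />}>
                <StoryWithData id={id} />
              </Suspense>
            );
          })}
        </>
      );
    }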
    
    export default function News() {
      return (
        <Page>
          <Suspense fallback={<Spinner />}>
            <NewsWithData />
          </Suspense>
          <Footer />
        </Page>
      );
    }

    This implementation streams your components progressively and shows your data as it gets generated on the server, component by component. The difference is huge: it is the next level of code-splitting, it lets you fetch data at the component level, and you don’t need to worry about making an API call in the browser.
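
    The contrast between sending one finished page and streaming it piece by piece can be mimicked with a plain generator. This is a conceptual sketch only and not how React or Next.js implement streaming; renderBlocking and renderStreaming are hypothetical names, and the markup is a stand-in.

    ```javascript
    // Blocking SSR: nothing is sent until every story has been rendered.
    function renderBlocking(stories) {
      const body = stories.map((s) => `<story>${s}</story>`).join('');
      return [`<page>${body}</page>`]; // a single chunk
    }

    // Streaming: the shell goes out first, then each story as it becomes ready.
    function* renderStreaming(stories) {
      yield '<page>';
      for (const s of stories) {
        yield `<story>${s}</story>`;
      }
      yield '</page>';
    }

    const sampleStories = ['one', 'two'];
    const blockingChunks = renderBlocking(sampleStories);
    const streamedChunks = [...renderStreaming(sampleStories)];
    ```

    Both versions produce the same final markup; the streamed one simply lets the client start working with the first chunk before the last story exists.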

    Functions like getStaticProps and getServerSideProps will become a thing of the past.

    This also aligns with the React Hooks model and its decentralized component model. It removes the choice we often need to make between static and dynamic, bringing the best of both worlds. In the future, incremental static regeneration will work at the per-component level, removing all-or-nothing page caching and, in turn, allowing intelligent caching based on your needs.

    Next.js is internally working on a data component, which is basically the React Suspense API with surrogate keys, revalidate, and fallback. It will help realize these things in the future by letting you define your caching semantics at the component level.

    Conclusion:

    Although all the features mentioned above are still in development, their arrival alone sets a clear direction for the React world and for frontend development in general. That is the reason you should consider Next.js your default production framework.

  • Building a Collaborative Editor Using Quill and Yjs

    “Hope this email finds you well” is how 2020-2021 has been in a nutshell. Since we’ve all been working remotely since last year, actively collaborating with teammates became one notch harder, from activities like brainstorming a topic on a whiteboard to building documentation.

    Tools powered by collaborative systems had become a necessity. To explore this space, following the principle of build fast, fail fast, I started building a collaborative editor using existing open-source tools, which can eventually be extended for needs across different projects.

    Conflicts, as they say, are inevitable, when multiple users are working on the same document constantly modifying it, especially if it’s the same block of content. Ultimately, the end-user experience is defined by how such conflicts are resolved.

    There are various conflict resolution mechanisms, but two of the most commonly discussed ones are Operational Transformation (OT) and Conflict-Free Replicated Data Type (CRDT). So, let’s briefly talk about those first.

    Operational Transformation

    The order of operations matters in OT. Each user has their own local copy of the document, and mutations are atomic, such as insert V at index 4 or delete X at index 2. If the order of these operations changes, the end result will be different. That’s why all the operations are synchronized through a central server, which can alter the indices and operations before forwarding them to the clients. For example, in the image below, User2 makes a delete(0) operation, but because the OT server knows that User1 has made an insert operation, User2’s operation needs to be changed to delete(1) before it is applied to User1.

    OT with a central server is typically easier to implement. Plain text OT in its basic form has only three defined operations: insert, delete, and apply.

    Source: Conclave
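
    The index shift from delete(0) to delete(1) can be sketched in a few lines. This is a simplified illustration, not a full OT implementation: the operation shape ({ type, index, char }) and the functions applyOp and transformAgainstInsert are made up for this example, and it only covers transforming an operation against one concurrent insert.

    ```javascript
    // Apply a single-character operation to a string.
    function applyOp(doc, op) {
      if (op.type === 'insert') {
        return doc.slice(0, op.index) + op.char + doc.slice(op.index);
      }
      // Otherwise treat it as a delete of one character.
      return doc.slice(0, op.index) + doc.slice(op.index + 1);
    }

    // Adjust `op` so it can be applied after a concurrent insert was already applied.
    function transformAgainstInsert(op, insert) {
      // An insert at or before op's index shifts op's target one place to the right.
      if (insert.index <= op.index) {
        return { ...op, index: op.index + 1 };
      }
      return op;
    }

    const base = 'at';
    const user1Insert = { type: 'insert', index: 0, char: 'c' };
    const user2Delete = { type: 'delete', index: 0 };

    // User1 has already applied its own insert, so User2's delete must be shifted.
    const shiftedDelete = transformAgainstInsert(user2Delete, user1Insert);
    const user1Final = applyOp(applyOp(base, user1Insert), shiftedDelete);

    // User2 applied its delete first; the insert at index 0 needs no adjustment here.
    const user2Final = applyOp(applyOp(base, user2Delete), user1Insert);
    ```

    Both replicas end up with the same text, which is the property the central server is there to preserve.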

    “Fully distributed OT and adding rich text operations are very hard, and that’s why there’s a million papers.”

    CRDT

    Instead of performing operations directly on characters like in OT, CRDT uses a complex data structure to which it can then add/update/remove properties to signify transformation, enabling scope for commutativity and idempotency. CRDTs guarantee eventual consistency.

    There are different algorithms, but in general, CRDT has two requirements: globally unique characters and globally ordered characters. Basically, this involves a global reference for each object, instead of positional indices, in which the ordering is based on the neighboring objects. Fractional indices can be used to assign index to an object.

    Source: Conclave

    As all the objects have their own unique reference, the delete operation becomes idempotent. Fractional indices are one way to assign unique references during insertions and updates.
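
    A toy model makes these two properties concrete. The sketch below assumes each character is an object with a globally unique id and a fractional position; insertChar, deleteChar, and toText are hypothetical names, and real CRDT libraries such as Yjs use different internal structures (and typically keep tombstones, which this sketch omits).

    ```javascript
    // Insert a character object; re-delivering the same id is a no-op.
    function insertChar(state, char) {
      if (state.some((c) => c.id === char.id)) return state;
      return [...state, char];
    }

    // Delete by unique id; deleting an id that is already gone changes nothing.
    function deleteChar(state, id) {
      return state.filter((c) => c.id !== id);
    }

    // Order by fractional position, breaking ties with the unique id.
    function toText(state) {
      return [...state]
        .sort((a, b) => a.pos - b.pos || a.id.localeCompare(b.id))
        .map((c) => c.value)
        .join('');
    }

    const seed = [
      { id: 'u1:1', pos: 1, value: 'c' },
      { id: 'u1:2', pos: 2, value: 't' },
    ];
    const insertA = { id: 'u2:1', pos: 1.5, value: 'a' }; // lands between 'c' and 't'

    // Replica 1 inserts, then receives the same delete twice.
    const replica1 = deleteChar(deleteChar(insertChar(seed, insertA), 'u1:2'), 'u1:2');
    // Replica 2 sees the delete first, then the insert.
    const replica2 = insertChar(deleteChar(seed, 'u1:2'), insertA);
    ```

    Because operations refer to ids and positions instead of array indices, the two replicas agree regardless of delivery order or duplicate deliveries.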

    There are two types of CRDT. One is state-based, where the whole state (or a delta) is shared between the instances and merged continuously. The other is operation-based, where only individual operations are sent between replicas. If you want to dive deep into CRDT, here’s a nice resource.
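
    To make the state-based flavor concrete, here is a sketch of a grow-only counter, a standard textbook CRDT that is not used elsewhere in this post. Each replica keeps its own count, and merging takes the per-replica maximum; the function names are chosen for this example only.

    ```javascript
    // Record one increment made on the given replica.
    function increment(state, replicaId) {
      return { ...state, [replicaId]: (state[replicaId] || 0) + 1 };
    }

    // Merge two full states by taking the maximum count per replica.
    function mergeStates(a, b) {
      const merged = { ...a };
      for (const [id, count] of Object.entries(b)) {
        merged[id] = Math.max(merged[id] || 0, count);
      }
      return merged;
    }

    // The counter's value is the sum over all replicas.
    function counterValue(state) {
      return Object.values(state).reduce((sum, n) => sum + n, 0);
    }

    const onA = increment(increment({}, 'A'), 'A'); // replica A counted twice
    const onB = increment({}, 'B'); // replica B counted once
    const mergedAB = mergeStates(onA, onB);
    const mergedBA = mergeStates(onB, onA);
    ```

    The merge is commutative and idempotent, so replicas can exchange whole states in any order, and as often as they like, and still agree on the result.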

    For our purposes, we chose CRDT since it can also support peer-to-peer networks. If you want to jump directly to the code, you can visit the repo here.

    Tools used for this project:

    As our goal was for a quick implementation, we targeted off-the-shelf tools for editor and backend to manage collaborative operations.

    • Quill.js is an API-driven WYSIWYG rich text editor built for compatibility and extensibility. We chose Quill as our editor because it is easy to plug into an application and has many extensions available.
    • Yjs is a framework that provides shared editing capabilities by exposing its different shared data types (Array, Map, Text, etc) that are synced automatically. It’s also network agnostic, so the changes are synced when a client is online. We used it because it’s a CRDT implementation, and surprisingly had readily available bindings for quill.js.

    Prerequisites:

    To keep it simple, we’ll set up a client and server both in the same code base. Initialize a project with npm init and install the below dependencies:

    npm i quill quill-cursors webpack webpack-cli webpack-dev-server y-quill y-websocket yjs

    • quill is the WYSIWYG rich text editor we will use as our editor.
    • quill-cursors is an extension that displays the cursors of other clients connected to the same editor room.
    • webpack, webpack-cli, and webpack-dev-server are developer utilities; webpack is the bundler that creates a deployable bundle for your application.
    • y-quill provides bindings between Yjs and Quill using the shared type Y.Text. For more information, you can check out the module’s source on GitHub.
    • y-websocket provides a WebsocketProvider that communicates with a Yjs server in a client-server manner to exchange awareness information and data.
    • yjs is the CRDT framework that orchestrates conflict resolution between multiple clients.

    Code to use

    const path = require('path');
    
    module.exports = {
      mode: 'development',
      devtool: 'source-map',
      entry: {
        index: './index.js'
      },
      output: {
        globalObject: 'self',
        path: path.resolve(__dirname, './dist/'),
        filename: '[name].bundle.js',
        publicPath: '/quill/dist'
      },
      devServer: {
        contentBase: path.join(__dirname),
        compress: true,
        publicPath: '/dist/'
      }
    }

    This is a basic webpack config where we have provided which file is the starting point of our frontend project, i.e., the index.js file. Webpack then uses that file to build the internal dependency graph of your project. The output property is to define where and how the generated bundles should be saved. And the devServer config defines necessary parameters for the local dev server, which runs when you execute “npm start”.

    We’ll first create an index.html file to define the basic skeleton:

    <!DOCTYPE html>
    <html>
      <head>
        <title>Yjs Quill Example</title>
        <script src="./dist/index.bundle.js" async defer></script>
        <link rel="stylesheet" href="//cdn.quilljs.com/1.3.6/quill.snow.css">
      </head>
      <body>
        <button type="button" id="connect-btn">Disconnect</button>
        <div id="editor" style="height: 500px;"></div>
      </body>
    </html>

    The index.html has a pretty basic structure. In <head>, we’ve provided the path of the bundled js file that will be created by webpack, and the css theme for the quill editor. And for the <body> part, we’ve just created a button to connect/disconnect from the backend and a placeholder div where the quill editor will be plugged.

    • Here, we’ve just made the imports, registered quill-cursors extension, and added an event listener for window load:
    import Quill from "quill";
    import * as Y from 'yjs';
    import { QuillBinding } from 'y-quill';
    import { WebsocketProvider } from 'y-websocket';
    import QuillCursors from "quill-cursors";
    
    // Register QuillCursors module to add the ability to show multiple cursors on the editor.
    Quill.register('modules/cursors', QuillCursors);
    
    window.addEventListener('load', () => {
      // We'll add more blocks as we continue
    });

    • Let’s initialize the Yjs document, socket provider, and load the document:
    window.addEventListener('load', () => {
      const ydoc = new Y.Doc();
      const provider = new WebsocketProvider('ws://localhost:3312', 'velotio-demo', ydoc);
      const type = ydoc.getText('Velotio-Blog');
    });

    • We’ll now initialize and plug the Quill editor with its bindings:
    window.addEventListener('load', () => {
      // ### ABOVE CODE HERE ###
    
      const editorContainer = document.getElementById('editor');
      const toolbarOptions = [
        ['bold', 'italic', 'underline', 'strike'],  // toggled buttons
        ['blockquote', 'code-block'],
        [{ 'header': 1 }, { 'header': 2 }],               // custom button values
        [{ 'list': 'ordered' }, { 'list': 'bullet' }],
        [{ 'script': 'sub' }, { 'script': 'super' }],      // superscript/subscript
        [{ 'indent': '-1' }, { 'indent': '+1' }],          // outdent/indent
        [{ 'direction': 'rtl' }],                         // text direction
        // array for drop-downs, empty array = defaults
        [{ 'size': [] }],
        [{ 'header': [1, 2, 3, 4, 5, 6, false] }],
        [{ 'color': [] }, { 'background': [] }],          // dropdown with defaults from theme
        [{ 'font': [] }],
        [{ 'align': [] }],
        ['image', 'video'],
        ['clean']                                         // remove formatting button
      ];
    
      const editor = new Quill(editorContainer, {
        modules: {
          cursors: true,
          toolbar: toolbarOptions,
          history: {
            userOnly: true  // only user changes will be undone or redone.
          }
        },
        placeholder: "collab-edit-test",
        theme: "snow"
      });
    
      const binding = new QuillBinding(type, editor, provider.awareness);
    });

    • Finally, let’s implement the Connect/Disconnect button and complete the callback:
    window.addEventListener('load', () => {
      // ### ABOVE CODE HERE ###
    
      const connectBtn = document.getElementById('connect-btn');
      connectBtn.addEventListener('click', () => {
        if (provider.shouldConnect) {
          provider.disconnect();
          connectBtn.textContent = 'Connect';
        } else {
          provider.connect();
          connectBtn.textContent = 'Disconnect';
        }
      });
    
      window.example = { provider, ydoc, type, binding, Y }
    });

    Steps to run:

    • Server:

    For simplicity, we’ll directly use the y-websocket-server out of the box.

    NOTE: You can either let it run and open a new terminal for the next commands, or let it run in the background using `&` at the end of the command.

    • Client:

    Start the client with npm start. On successful compilation, it should open in your default browser, or you can just go to http://localhost:8080.

    Show me the repo

    You can find the repository here.

    Conclusion:

    Conflict resolution approaches are not new, but with the trend toward remote work, it is important to have good collaborative systems in place to enhance productivity.

    Although this example was just on rich text editing capabilities, we can extend existing resources to build more features and structures like tabular data, graphs, charts, etc. Yjs shared types can be used to define your own data format based on how your custom editor represents data internally.

  • Acquiring Temporary AWS Credentials with Browser Navigated Authentication

    In one of my previous blog posts (Hacking your way around AWS IAM Roles), we demonstrated how users can access AWS resources without having to store AWS credentials on disk. This was achieved by setting up an OpenVPN server and a client-side route that gets pushed automatically when the user connects to the VPN. To this day, I find this a compliance-friendly solution that doesn’t force users to do any manual configuration on their system. It also makes sense for users to have access to AWS resources only as long as they are connected to the VPN. One downside of this method is the need to maintain an OpenVPN server, keep it secure, and run it in a highly available (HA) state. If the OpenVPN server is compromised, our credentials are at stake. Secondly, all users connected to the VPN get the same level of access.

    In this blog post, we present a CLI utility written in Rust that writes temporary AWS credentials to a user profile (~/.aws/credentials file) using web-browser-navigated Google authentication. This utility is inspired by gimme-aws-creds (written in Python for an Okta-authenticated AWS farm) and the Heroku CLI (written in Node.js using the oclif framework). We will refer to our utility as auth-awscreds throughout this post.

    “If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.”

    – George Bernard Shaw

    What does this CLI utility (auth-awscreds) do?

    When the user fires the command (auth-awscreds) on the terminal, our program reads the utility configuration from the file .auth-awscreds located in the user’s home directory. If this file is not present, the utility prompts the user to set the configuration for the first time. The utility configuration file is in INI format. The program then opens the default web browser and navigates to the URL read from the configuration file. At this point, the utility waits for the browser to navigate to the URL and authorize. The web UI then redirects to Google Authentication. If authentication is successful, a callback is made to the CLI utility along with temporary AWS credentials, which are then written to the ~/.aws/credentials file.

    Block Diagram

    Tech Stack Used

    As stated earlier, we wrote this utility in Rust. One of the reasons for choosing Rust is that we wanted a natively compiled binary (ELF) file that runs without an interpreter and ships as is once compiled. Programs written in Python or Node.js, by contrast, need a language interpreter and supporting libraries installed on the target machine. Go would also have served our purpose, but I prefer Rust over Go.

    Software Stack:

    • Rust (for CLI utility)
    • Actix Web – HTTP Server
    • Node.js, Express, ReactJS, serverless-http, aws-sdk, AWS Amplify, axios
    • Terraform and serverless framework

    Infrastructure Stack:

    • AWS Cognito (User Pool and Federated Identities)
    • AWS API Gateway (HTTP API)
    • AWS Lambda
    • AWS S3 Bucket (React App)
    • AWS CloudFront (For Serving React App)
    • AWS ACM (SSL Certificate)

    Recipe

    Architecture Diagram

    CLI Utility: auth-awscreds

    When the auth-awscreds command is fired, we first check whether the ~/.aws/credentials file exists in the user’s home directory. If not, we create the ~/.aws directory. This is the default AWS credentials directory, where the AWS SDK usually looks for credentials (unless explicitly overridden by the env var AWS_SHARED_CREDENTIALS_FILE). The next step is to check whether a ~/.auth-awscreds file exists. If it doesn’t, we prompt the user for two inputs:

    1. AWS credentials profile name (used by SDK, default is preferred) 

    2. Application domain URL (Our backend app domain is used for authentication)

    let app_profile_file = format!("{}/.auth-awscreds",&user_home_dir);
     
       let config_exist : bool = Path::new(&app_profile_file).exists();
     
       let mut profile_name = String::new();
       let mut app_domain = String::new();
     
       if !config_exist {
           //ask the series of questions
           print!("Which profile to write AWS Credentials [default] : ");
           io::stdout().flush().unwrap();
           io::stdin()
               .read_line(&mut profile_name)
               .expect("Failed to read line");
     
           print!("App Domain : ");
           io::stdout().flush().unwrap();
          
           io::stdin()
               .read_line(&mut app_domain)
               .expect("Failed to read line");
          
           profile_name=String::from(profile_name.trim());
           app_domain=String::from(app_domain.trim());
          
           config_profile(&profile_name,&app_domain);
          
       }
       else {
           (profile_name,app_domain) = read_profile();
       }

    These two properties are written to ~/.auth-awscreds under the default section. After this, our utility generates a 1024-bit RSA key pair (public and private keys). Both keys are then encoded as base64.

    pub fn genkeypairs() -> (String,String) {
       let rsa = Rsa::generate(1024).unwrap();
     
       let private_key: Vec<u8> = rsa.private_key_to_pem_passphrase(Cipher::aes_128_cbc(),"Sagar Barai".as_bytes()).unwrap();
       let public_key: Vec<u8> = rsa.public_key_to_pem().unwrap();
     
       (base64::encode(private_key) , base64::encode(public_key))
    }

    We then launch a browser window and navigate to the specified app domain URL. At this stage, our utility starts a temporary web server with the help of the Actix Web framework and listens on port 63442 of localhost.

    println!("Opening web ui for authentication...!");
       open::that(&app_domain).unwrap();
     
       HttpServer::new(move || {
           //let stopper = tx.clone();
           let cors = Cors::permissive();
           App::new()
           .wrap(cors)
           //.app_data(stopper)
           .app_data(crypto_data.clone())
           .service(get_public_key)
           .service(set_aws_creds)
       })
       .bind(("127.0.0.1",63442))?
       .run()
       .await

    The localhost web server has two endpoints.

    1. GET Endpoint (/publickey): This endpoint is called by our React app after authentication and returns the public key created during the initialization process. Since the web server hosted by the Rust application is insecure (non-SSL), the actual AWS credentials should be posted to it as an encrypted string, encrypted with this public key.

    #[get("/publickey")]
    pub async fn get_public_key(data: web::Data<AppData>) -> impl Responder {
       let public_key = &data.public_key;
      
       web::Json(HTTPResponseData{
           status: 200,
           msg: String::from("Ok"),
           success: true,
           data: String::from(public_key)
       })
    }

    2. POST Endpoint (/setcreds): This endpoint is called when the react app has successfully retrieved credentials from API Gateway. Credentials are decrypted by private key and then written to ~/.aws/credentials file defined by profile name in utility configuration. 

    let encrypted_data = payload["data"].as_array().unwrap();
       let username = payload["username"].as_str().unwrap();
     
       let mut decypted_payload = vec![];
     
       for str in encrypted_data.iter() {
           //println!("{}",str.to_string());
           let s = str.as_str().unwrap();
           let decrypted = decrypt_data(&private_key, &s.to_string());
           decypted_payload.extend_from_slice(&decrypted);
       }
     
       let credentials : serde_json::Value = serde_json::from_str(&String::from_utf8(decypted_payload).unwrap()).unwrap();
     
       let aws_creds = AWSCreds{
           profile_name: String::from(profile_name),
           aws_access_key_id: String::from(credentials["AccessKeyId"].as_str().unwrap()),
           aws_secret_access_key: String::from(credentials["SecretAccessKey"].as_str().unwrap()),
           aws_session_token: String::from(credentials["SessionToken"].as_str().unwrap())
       };
     
       println!("Authenticated as {}",username);
       println!("Updating AWS Credentials File...!");
     
       configcreds(&aws_creds);

    One of the interesting parts of this code is the decryption process, which iterates through an array of strings and joins the decrypted pieces with decypted_payload.extend_from_slice(&decrypted);. A 1024-bit RSA key works on 128-byte blocks, and we used OAEP padding, which takes 42 bytes of overhead and leaves the rest for data. Thus, at most 86 bytes can be encrypted per block. So, when credentials are received, they arrive as an array of base64-encoded strings, each holding one 128-byte encrypted block. One has to decode each base64 string to a data buffer and then decrypt the data piece by piece.
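    The chunking arithmetic above can be tried end to end with Node's built-in crypto module. This is only a sketch, not the code from the repository: the key pair is generated in-process, the sample credentials are made up, and the helper names (encryptInChunks, decryptChunks) are hypothetical. It assumes OAEP with SHA-1, which is what yields the 42-byte overhead (2 x 20 + 2).

```javascript
const crypto = require('crypto');

// 1024-bit key => 128-byte blocks; OAEP with SHA-1 costs 2 * 20 + 2 = 42 bytes.
const KEY_BYTES = 1024 / 8;
const SHA1_BYTES = 20;
const MAX_CHUNK = KEY_BYTES - 2 * SHA1_BYTES - 2; // 86 bytes of plaintext per block

const OAEP = { padding: crypto.constants.RSA_PKCS1_OAEP_PADDING, oaepHash: 'sha1' };

// Split the plaintext into 86-byte slices and encrypt each slice separately.
function encryptInChunks(publicKey, plaintext) {
  const data = Buffer.from(plaintext, 'utf8');
  const chunks = [];
  for (let offset = 0; offset < data.length; offset += MAX_CHUNK) {
    const slice = data.subarray(offset, offset + MAX_CHUNK);
    const block = crypto.publicEncrypt({ key: publicKey, ...OAEP }, slice);
    chunks.push(block.toString('base64'));
  }
  return chunks;
}

// Decode each base64 string, decrypt it, and join the pieces in order.
function decryptChunks(privateKey, chunks) {
  const parts = chunks.map((chunk) =>
    crypto.privateDecrypt({ key: privateKey, ...OAEP }, Buffer.from(chunk, 'base64'))
  );
  return Buffer.concat(parts).toString('utf8');
}

// Made-up payload shaped like STS credentials (values are placeholders).
const sample = JSON.stringify({
  AccessKeyId: 'EXAMPLEACCESSKEY',
  SecretAccessKey: 'x'.repeat(40),
  SessionToken: 'y'.repeat(300),
});

const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 1024 });
const encrypted = encryptInChunks(publicKey, sample);
const roundTrip = decryptChunks(privateKey, encrypted);

console.log(`chunks: ${encrypted.length}, round trip ok: ${roundTrip === sample}`);
```

    Slicing the byte buffer rather than the string keeps every piece within the 86-byte limit even if the payload contains multi-byte characters.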

    To generate the release binary, run: cargo build --release

    AWS Cognito and Google Authentication

    This guide does not cover how to set up Cognito and integration with Google Authentication. You can refer to our old post for a detailed guide on setting up authentication and authorization. (Refer to the sections Setup Authentication and Setup Authorization).

    React App:

    The React app is launched via our Rust CLI utility. This application is served right from the S3 bucket via CloudFront. When our React app is loaded, it checks if the current session is authenticated. If not, then with the help of the AWS Amplify framework, our app is redirected to Cognito-hosted UI authentication, which in turn auto redirects to Google Login page.

    render(){
       return (
         <div className="centerdiv">
           {
             this.state.appInitialised ?
               this.state.user === null ? Auth.federatedSignIn({provider: 'Google'}) :
               <Aux>
                 {this.state.pageContent}
               </Aux>
             :
             <Loader/>
           }
         </div>
       )
     }

    Once the session is authenticated, we set the React state variables and then retrieve the public key from the Actix web server (Rust CLI app: auth-awscreds) by calling the /publickey GET method. After this, an Ajax POST request (/auth-creds) is made via the axios library to API Gateway. The payload contains the public key and the JWT for authentication. The expected response from API Gateway is the encrypted temporary AWS credentials, which are then proxied to our CLI application.
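    The three calls described above can be sketched as a single function. This is a hedged illustration rather than the React app's actual code: relayCredentials is a hypothetical name, the http parameter is any client with axios-style get and post methods, and the request field public_key and the response field data are assumptions about the API's payload shape.

```javascript
// Relay flow: CLI /publickey -> API Gateway /auth-creds -> CLI /setcreds.
// `http` is injected so the flow can be exercised without a network;
// in the browser it could simply be the axios instance.
async function relayCredentials({ cliUrl, apiUrl, idToken, username, http }) {
  // 1. Ask the local CLI web server for the public key it generated.
  const keyResponse = await http.get(`${cliUrl}/publickey`);
  const publicKey = keyResponse.data.data;

  // 2. Send the public key to the backend, authenticating with the JWT.
  //    (Field name `public_key` is an assumption.)
  const apiResponse = await http.post(
    `${apiUrl}/auth-creds`,
    { public_key: publicKey },
    { headers: { Authorization: idToken } }
  );

  // 3. Hand the encrypted chunks back to the CLI, which decrypts them locally.
  return http.post(`${cliUrl}/setcreds`, {
    data: apiResponse.data.data,
    username,
  });
}
```

    Because the credentials stay encrypted between steps 2 and 3, the browser only ever relays ciphertext that the CLI's private key can open.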

    To ease this deployment, we have written Terraform code (available in the repository) that takes care of creating the S3 bucket, CloudFront distribution, and ACM certificate, building the React app, and deploying it to the S3 bucket. Navigate to the vars.tf file and change the respective default variables. The Terraform script will fail at first launch since ACM needs DNS record validation. You can create a CNAME record for DNS validation and re-run the Terraform script to continue the deployment. The React app expects a few environment variables. Below is a sample .env file; update the respective values for your environment.

    REACT_APP_IDENTITY_POOL_ID=
    REACT_APP_COGNITO_REGION=
    REACT_APP_COGNITO_USER_POOL_ID=
    REACT_APP_COGNTIO_DOMAIN_NAME=
    REACT_APP_DOMAIN_NAME=
    REACT_APP_CLIENT_ID=
    REACT_APP_CLI_APP_URL=
    REACT_APP_API_APP_URL=

    Finally, deploy the React app using the sample commands below.

    $ terraform plan -out plan     #creates plan for revision
    $ terraform apply plan         #apply plan and deploy

    API Gateway HTTP API and Lambda Function

    When a request is first intercepted by API Gateway, it validates the JWT on its own. API Gateway natively supports Cognito integration. Thus, any request with an invalid authorization header is rejected at API Gateway itself. This eases our authentication process and validates the identity. If the request is valid, it is then received by our Lambda function. Our Lambda function is written in Node.js and is an Express app wrapped with the serverless-http framework. The Express app has only one endpoint.

    /auth-creds (POST): once the request is received, it retrieves the ID from Cognito and logs it to stdout for audit purpose.

    let identityParams = {
               IdentityPoolId: process.env.IDENTITY_POOL_ID,
               Logins: {}
           };
      
           identityParams.Logins[`${process.env.COGNITOIDP}`] = req.headers.authorization;
      
           const ci = new CognitoIdentity({region : process.env.AWSREGION});
      
           let idpResponse = await ci.getId(identityParams).promise();
      
           console.log("Auth Creds Request Received from ",JSON.stringify(idpResponse));

    The app then extracts the base64-encoded public key. After this, an STS (Security Token Service) API call is made and temporary credentials are derived. These credentials are then encrypted with the public key in chunks of 86 bytes.

    const pemPublicKey = Buffer.from(public_key,'base64').toString();
     
           const authdata=await sts.assumeRole({
               ExternalId: process.env.STS_EXTERNAL_ID,
               RoleArn: process.env.IAM_ROLE_ARN,
               RoleSessionName: "DemoAWSAuthSession"
           }).promise();
     
           const creds = JSON.stringify(authdata.Credentials);
           const splitData = creds.match(/.{1,86}/g);
          
           const encryptedData = splitData.map(d=>{
               return publicEncrypt(pemPublicKey,Buffer.from(d)).toString('base64');
           });

    Here, assumeRole assumes the IAM role, which has the appropriate policy documents attached. For the sake of this demo, we attached the AdministratorAccess managed policy. However, one should use a hardened, least-privilege policy document and avoid attaching the Administrator policy directly to the role.

    resources:
     Resources:
       AuthCredsAssumeRole:
         Type: AWS::IAM::Role
         Properties:
           AssumeRolePolicyDocument:
             Version: "2012-10-17"
             Statement:
               -
                 Effect: Allow
                 Principal:
                   AWS: !GetAtt IamRoleLambdaExecution.Arn
                 Action: sts:AssumeRole
                 Condition:
                   StringEquals:
                     sts:ExternalId: ${env:STS_EXTERNAL_ID}
           RoleName: auth-awscreds-api
           ManagedPolicyArns:
             - arn:aws:iam::aws:policy/AdministratorAccess

    Finally, the response is sent to the React app. 

    We have used the Serverless framework to deploy the API. The Serverless framework creates API gateway, lambda function, Lambda Layer, and IAM role, and takes care of code deployment to lambda function.

    To deploy this application, follow the below steps.

    1. cd layer/nodejs && npm install && cd ../.. && npm install

    2. npm install -g serverless (on mac you can skip this step and use the npx serverless command instead) 

    3. Create a .env file, add the environment variables below to it, and set the respective values.

    AWSREGION=ap-south-1
    COGNITO_USER_POOL_ID=
    IDENTITY_POOL_ID=
    COGNITOIDP=
    APP_CLIENT_ID=
    STS_EXTERNAL_ID=
    IAM_ROLE_ARN=
    DEPLOYMENT_BUCKET=
    APP_DOMAIN=

    4. serverless deploy or npx serverless deploy

    Entire codebase for CLI APP, React App, and Backend API  is available on the GitHub repository.

    Testing:

    Assuming that the compiled binary (auth-awscreds) is available on your local machine and that you have installed `aws-cli` for testing, you can run /path/to/your/auth-awscreds.

    App Testing

    If you selected your AWS profile name as “demo-awscreds,” you can then export the AWS_PROFILE environment variable. If you prefer a “default” profile, you don’t need to export the environment variable as AWS SDK selects a “default” profile on its own.

    [demo-awscreds]
    aws_access_key_id=ASIAUAOF2CHC77SJUPZU
    aws_secret_access_key=r21J4vwPDnDYWiwdyJe3ET+yhyzFEj7Wi1XxdIaq
    aws_session_token=FwoGZXIvYXdzEIj//////////wEaDHVLdvxSNEqaQZPPQyK2AeuaSlfAGtgaV1q2aKBCvK9c8GCJqcRLlNrixCAFga9n+9Vsh/5AWV2fmea6HwWGqGYU9uUr3mqTSFfh+6/9VQH3RTTwfWEnQONuZ6+E7KT9vYxPockyIZku2hjAUtx9dSyBvOHpIn2muMFmizZH/8EvcZFuzxFrbcy0LyLFHt2HI/gy9k6bLCMbcG9w7Ej2l8vfF3dQ6y1peVOQ5Q8dDMahhS+CMm1q/T1TdNeoon7mgqKGruO4KJrKiZoGMi1JZvXeEIVGiGAW0ro0/Vlp8DY1MaL7Af8BlWI1ZuJJwDJXbEi2Y7rHme5JjbA=

    To validate, you can then run “aws s3 ls.” You should see S3 buckets listed from your AWS account. Note that these credentials are only valid for 60 minutes. This means you will have to re-run the command and acquire a new pair of AWS credentials. Of course, you can configure your IAM role to extend expiry for an “assume role.” 
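    For readers curious how a profile block like the one above might be written without disturbing other profiles, here is a minimal sketch. The actual utility does this in Rust; upsertProfile is a hypothetical helper, and the sketch assumes a plain INI layout with one [section] header per profile.

```javascript
// Replace (or append) one profile section in the text of ~/.aws/credentials.
function upsertProfile(fileText, profileName, creds) {
  const header = `[${profileName}]`;
  const block = [
    header,
    `aws_access_key_id=${creds.AccessKeyId}`,
    `aws_secret_access_key=${creds.SecretAccessKey}`,
    `aws_session_token=${creds.SessionToken}`,
  ];

  // Keep every line except those belonging to the profile being replaced.
  const kept = [];
  let skipping = false;
  for (const line of fileText.split('\n')) {
    const trimmed = line.trim();
    if (trimmed.startsWith('[') && trimmed.endsWith(']')) {
      skipping = trimmed === header;
    }
    if (!skipping) kept.push(line);
  }

  // Drop trailing blank lines so sections stay separated by exactly one.
  while (kept.length > 0 && kept[kept.length - 1].trim() === '') kept.pop();

  const prefix = kept.length > 0 ? kept.join('\n') + '\n\n' : '';
  return prefix + block.join('\n') + '\n';
}

// Placeholder values only; real credentials come from the decrypted payload.
const existing = '[default]\naws_access_key_id=OLD\n\n[demo-awscreds]\naws_access_key_id=STALE\n';
const updated = upsertProfile(existing, 'demo-awscreds', {
  AccessKeyId: 'NEWKEY',
  SecretAccessKey: 'NEWSECRET',
  SessionToken: 'NEWTOKEN',
});
console.log(updated);
```

    Re-running the utility after the 60-minute expiry would then overwrite only the chosen profile and leave the others as they were.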

    auth-awscreds in Action:

    Summary

    Currently, “auth-awscreds” is at its early development stage. This post demonstrates how AWS credentials can be acquired temporarily without having to worry about key rotation. One of the features that we are currently working on is RBAC, with the help of AWS Cognito. Since this tool currently doesn’t support any command line argument, we can’t reconfigure utility configuration. You can manually edit or delete the utility configuration file, which triggers a prompt for configuring during the next run. We also want to add multiple profiles so that multiple AWS accounts can be used.

  • A Primer on HTTP Load Balancing in Kubernetes using Ingress on Google Cloud Platform

    Containerized applications and Kubernetes adoption in cloud environments are on the rise. One of the challenges while deploying applications in Kubernetes is exposing these containerized applications to the outside world. This blog explores the different options through which applications can be accessed externally, with a focus on Ingress – a new feature in Kubernetes that provides an external load balancer. This blog also provides a simple hands-on tutorial on Google Cloud Platform (GCP).

    Ingress is the new feature (currently in beta) from Kubernetes which aspires to be an Application Load Balancer intending to simplify the ability to expose your applications and services to the outside world. It can be configured to give services externally-reachable URLs, load balance traffic, terminate SSL, offer name based virtual hosting etc. Before we dive into Ingress, let’s look at some of the alternatives currently available that help expose your applications, their complexities/limitations and then try to understand Ingress and how it addresses these problems.

    Current ways of exposing applications externally:

    There are several ways in which you can expose your applications externally. Let’s look at each of them:

    EXPOSE Pod:

    You can expose your application directly from your pod by using a port on the node that is running your pod, mapping that port to a port exposed by your container, and using the combination HOST-IP:HOST-PORT to access your application externally. This is similar to what you would have done when running Docker containers directly without Kubernetes. In Kubernetes, you can use the hostPort setting in the pod configuration, which does the same thing. Another approach is to set hostNetwork: true in the pod configuration to use the host’s network interface from your pod.

    Limitations:

    • In both scenarios you should take extra care to avoid port conflicts at the host, and possibly some issues with packet routing and name resolutions.
    • This would limit running only one replica of the pod per cluster node as the hostport you use is unique and can bind with only one service.

    EXPOSE Service:

    Kubernetes services primarily work to interconnect different pods which constitute an application. You can scale the pods of your application very easily using services. Services are not primarily intended for external access, but there are some accepted ways to expose services to the external world.

    Basically, services provide a routing, balancing and discovery mechanism for the pod’s endpoints. Services target pods using selectors, and can map container ports to service ports. A service exposes one or more ports, although usually, you will find that only one is defined.

    A service can be exposed using one of 4 ServiceType choices:

    • ClusterIP: Exposes the service on a cluster-internal IP. Choosing this value makes the service only reachable from within the cluster. This is the default ServiceType.
    • NodePort: Exposes the service on each Node’s IP at a static port (the NodePort). A ClusterIP service, to which the NodePort service will route, is automatically created. You’ll be able to contact the NodePort service, from outside the cluster, by requesting <nodeip>:<nodeport>. Here NodePort remains fixed and NodeIP can be any node IP of your Kubernetes cluster.
    • LoadBalancer: Exposes the service externally using a cloud provider’s load balancer (eg. AWS ELB). NodePort and ClusterIP services, to which the external load balancer will route, are automatically created.
    • ExternalName: Maps the service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up. This requires version 1.7 or higher of kube-dns

    Limitations:

    • If we choose NodePort to expose our services, kubernetes will generate ports corresponding to the ports of your pods in the range of 30000-32767. You will need to add an external proxy layer that uses DNAT to expose more friendly ports. The external proxy layer will also have to take care of load balancing so that you leverage the power of your pod replicas. Also it would not be easy to add TLS or simple host header routing rules to the external service.
    • ClusterIP and ExternalName, while easy to use, similarly have the limitation that we cannot add any routing or load balancing rules.
    • Choosing LoadBalancer is probably the easiest of all methods to get your service exposed to the internet. The problem is that there is no standard way of telling a Kubernetes service about the elements that a balancer requires, again TLS and host headers are left out. Another limitation is reliance on an external load balancer (AWS’s ELB, GCP’s Cloud Load Balancer etc.)

    Endpoints

    Endpoints are usually created automatically by services, unless you are using headless services and adding the endpoints manually. An endpoint is a host:port tuple registered at Kubernetes, and in the service context it is used to route traffic. The service tracks the endpoints as pods that match the selector are created, deleted, and modified. Individually, endpoints are not useful for exposing services, since they are to some extent ephemeral objects.

    Summary

    If you can rely on your cloud provider to correctly implement the LoadBalancer for their API, to keep up-to-date with Kubernetes releases, and you are happy with their management interfaces for DNS and certificates, then setting up your services as type LoadBalancer is quite acceptable.

    On the other hand, if you want to manage load balancing systems manually and set up port mappings yourself, NodePort is a low-complexity solution. If you are directly using Endpoints to expose external traffic, perhaps you already know what you are doing (but consider that you might have made a mistake, there could be another option).

    Given that none of these elements has been originally designed to expose services to the internet, their functionality may seem limited for this purpose.

    Understanding Ingress

    Traditionally, you would create a LoadBalancer service for each public application you want to expose. Ingress gives you a way to route requests to services based on the request host or path, centralizing a number of services into a single entrypoint.

    Ingress is split up into two main pieces. The first is an Ingress resource, which defines how you want requests routed to the backing services and second is the Ingress Controller which does the routing and also keeps track of the changes on a service level.

    Ingress Resources

    The Ingress resource is a set of rules that map to Kubernetes services. Ingress resources are defined purely within Kubernetes as an object that other entities can watch and respond to.

    Ingress supports defining the following rules in its beta stage:

    • host header:  Forward traffic based on domain names.
    • paths: Looks for a match at the beginning of the path.
    • TLS: If the ingress adds TLS, HTTPS and a certificate configured through a secret will be used.

    When an Ingress includes no host header rules, requests that match no other rule use that Ingress and are mapped to its backend service. You will usually do this to serve a 404 page for requests to sites or paths that are not routed to the other services. Ingress tries to match requests to rules and forwards them to backends, which are composed of a service and a port.
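    To make the rule matching concrete, here is a small sketch of how a request could be mapped to a backend. It is an illustration only: matchBackend is a hypothetical function, the two path rules are made up, and choosing the longest matching prefix is an assumption. Real Ingress controllers differ in how they interpret paths.

```javascript
// Pick the backend whose path is the longest prefix of the request path.
// Rules without a `host` apply to every host.
function matchBackend(rules, host, path) {
  let best = null;
  for (const rule of rules) {
    if (rule.host && rule.host !== host) continue;
    for (const candidate of rule.paths) {
      if (!path.startsWith(candidate.path)) continue;
      if (best === null || candidate.path.length > best.path.length) {
        best = candidate;
      }
    }
  }
  return best ? best.backend : null;
}

// Two hypothetical path rules with no host restriction.
const rules = [
  {
    paths: [
      { path: '/', backend: { serviceName: 'nginx', servicePort: 80 } },
      { path: '/echo', backend: { serviceName: 'echoserver', servicePort: 8080 } },
    ],
  },
];

console.log(matchBackend(rules, 'example.com', '/echo/hello').serviceName);
console.log(matchBackend(rules, 'example.com', '/index.html').serviceName);
```

    With these rules, a request for /echo/hello goes to echoserver, while anything else falls through to nginx via the catch-all / rule.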

    Ingress Controllers

    The Ingress controller is the entity that grants (or removes) access based on changes in the services, pods, and Ingress resources. The Ingress controller gets the state change data by calling the Kubernetes API directly.

    Ingress controllers are applications that watch Ingresses in the cluster and configure a balancer to apply those rules. You can configure any of the third party balancers like HAProxy, NGINX, Vulcand or Traefik to create your version of the Ingress controller.  Ingress controller should track the changes in ingress resources, services and pods and accordingly update configuration of the balancer.

    Ingress controllers will usually track and communicate with endpoints behind services instead of using services directly. This way some network plumbing is avoided, and we can also manage the balancing strategy from the balancer. Some of the open source implementations of Ingress Controllers can be found here.

    Now, let’s do an exercise of setting up an HTTP Load Balancer using Ingress on Google Cloud Platform (GCP), which has already integrated the Ingress feature in its Container Engine (GKE) service.

    Ingress-based HTTP Load Balancer in Google Cloud Platform

    The tutorial assumes that you have set up your GCP account and created a default project. We will first create a Container cluster, followed by the deployment of an nginx server service and an echoserver service. Then we will set up an Ingress resource for both services, which will configure the HTTP Load Balancer provided by GCP.

    Basic Setup

    Get your project ID by going to the “Project info” section in your GCP dashboard. Start the Cloud Shell terminal, set your project id and the compute/zone in which you want to create your cluster.

    $ gcloud config set project glassy-chalice-129514
    $ gcloud config set compute/zone us-east1-d
    # Create a 3 node cluster with name “loadbalancedcluster”
    $ gcloud container clusters create loadbalancedcluster

    Fetch the cluster credentials for the kubectl tool:

    $ gcloud container clusters get-credentials loadbalancedcluster --zone us-east1-d --project glassy-chalice-129514

    Step 1: Deploy an nginx server and echoserver service

    $ kubectl run nginx --image=nginx --port=80
    $ kubectl run echoserver --image=gcr.io/google_containers/echoserver:1.4 --port=8080
    $ kubectl get deployments
    NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    echoserver   1         1         1            1           15s
    nginx        1         1         1            1           26m

    Step 2: Expose your nginx and echoserver deployment as a service internally

    Create a Service resource to make the nginx and echoserver deployment reachable within your container cluster:

    $ kubectl expose deployment nginx --target-port=80  --type=NodePort
    $ kubectl expose deployment echoserver --target-port=8080 --type=NodePort

    When you create a Service of type NodePort with this command, Container Engine makes your Service available on a randomly-selected high port number (e.g. 30746) on all the nodes in your cluster. Verify the Service was created and a node port was allocated:

    $ kubectl get service nginx
    NAME      CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
    nginx     10.47.245.54   <nodes>       80:30746/TCP   20s
    $ kubectl get service echoserver
    NAME         CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
    echoserver   10.47.251.9   <nodes>       8080:32301/TCP   33s

    In the output above, the node port for the nginx Service is 30746 and for the echoserver Service is 32301. Also, note that there is no external IP allocated for these Services. Since the Container Engine nodes are not externally accessible by default, creating these Services does not make your application accessible from the Internet. To make your HTTP(S) web server application publicly accessible, you need to create an Ingress resource.

    Step 3: Create an Ingress resource

    On Container Engine, Ingress is implemented using Cloud Load Balancing. When you create an Ingress in your cluster, Container Engine creates an HTTP(S) load balancer and configures it to route traffic to your application. Container Engine has internally defined an Ingress Controller, which takes the Ingress resource as input for setting up proxy rules and talks to the Kubernetes API to get the service-related information.

    The following config file defines an Ingress resource that directs traffic to your nginx and echoserver server:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: fanout-ingress
    spec:
      rules:
      - http:
          paths:
          - path: /
            backend:
              serviceName: nginx
              servicePort: 80
          - path: /echo
            backend:
              serviceName: echoserver
              servicePort: 8080

    To deploy this Ingress resource run in the cloud shell:

    $ kubectl apply -f basic-ingress.yaml

    Step 4: Access your application

    Find out the external IP address of the load balancer serving your application by running:

    $ kubectl get ingress fanout-ingress
    NAME             HOSTS     ADDRESS          PORTS     AGE
    fanout-ingress   *         130.211.36.168   80        36s

     

    Use http://<external-ip-address> and http://<external-ip-address>/echo to access nginx and the echo-server.

    Summary

    Ingress is simple, very easy to deploy, and really fun to play with. However, it is currently in beta and is missing some features, which may restrict its use in production. Stay tuned for updates on the Kubernetes Ingress page and their GitHub repo.


  • SEO for Web Apps: How to Boost Your Search Rankings

    The responsibilities of a web developer are not just designing and developing a web application but also adding the right set of features that allow the site to get higher traffic. One way of getting traffic is by ensuring your web page is listed in the top search results of Google. Search engines consider certain factors while ranking a web page (which are covered in this guide below), and accommodating these factors in your web app is called search engine optimization.

    A web app that is search engine optimized loads faster, has a good user experience, and is shown in the top search results of Google. If you want your web app to have these features, then this essential guide to SEO will provide you with a checklist to follow when working on SEO improvements.

    Key Facts:

    • 75% of visitors only visit the first three links listed and results from the second page get only 0.78% of clicks.
    • 95% of visitors visit only the links from the first page of Google.
    • Search engines give 300% more traffic than social media.
    • 8% of searches from browsers are in the form of a question.
    • 40% of visitors will leave a website if it takes more than 3 seconds to load. And more shocking is that 80% of those visitors will not visit the same site again.

    How Search Works:

     

     

    1. Crawling: Crawling is done by automated scripts, often referred to as web crawlers, web spiders, or Googlebot, and sometimes shortened to crawlers. These scripts start from the results of past crawls and from the sitemap file, which is found at the root directory of the web application. We will cover more on the sitemap later. For now, just understand that the sitemap file has all the links to your website, ordered hierarchically. Crawlers add those links to the crawl queue so that they can be crawled later. Crawlers pay special attention to newly added sites and frequently updated or visited sites, and they use several algorithms to decide how often an existing site should be recrawled.
    2. Indexing: Let us first understand what indexing means. Indexing is collecting, parsing, and storing data to enable a super-fast response to queries. Now, Google uses the same steps to perform web indexing. Google visits each page from the crawl queue and analyzes what the page is about and analyzes the content, images, and video, then parses the analyzed result and stores it into their database called Google Index.
    3. Serving: When a user makes a search query on Google, Google tries to determine the highest quality result and considers other criteria before serving the result, like user’s location, user’s submitted data, language, and device (desktop/mobile). That is why responsiveness is also considered for SEO. Unresponsive sites might have a higher ranking for desktop but will have a lower ranking for mobile because, while analyzing the page content, these bots see the pages as what the user sees and assign the ranking accordingly.

    Factors that affect SEO ranking:

    1. Sitemap: There are two types of sitemap files, HTML and XML, and both are placed at the root of the web app. The HTML sitemap guides users around the website pages, and it lists the pages hierarchically to help users understand the flow of the website. The XML sitemap helps the search engine bots crawl the pages of the site and understand the website structure. Each entry carries the following fields, which help the bots crawl efficiently.

    loc: The URL of the webpage.

    lastmod: When the content of the URL was last updated.

    changefreq: How often the content of the page gets changed.

    priority: It has the range from 0 to 1—0 represents the lowest priority, and 1 represents the highest. 1 is generally given to the home or landing page. Setting 1 to every URL will cause search engines to ignore this field.

    Click here to see what a sitemap.xml looks like.

    The below example shows how the URL will be written along with the fields.
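
    As a minimal sketch, a single sitemap entry carrying the four fields described above could be generated like this. The `sitemap_url_entry` helper, the example.com URL, and the field values are hypothetical and only illustrate the shape of a `<url>` element.

    ```ruby
    require "date"

    # Hypothetical helper: builds one <url> element of an XML sitemap.
    # Assumes `loc` is already a valid, XML-escaped URL.
    def sitemap_url_entry(loc:, lastmod:, changefreq:, priority:)
      <<~XML
        <url>
          <loc>#{loc}</loc>
          <lastmod>#{lastmod.iso8601}</lastmod>
          <changefreq>#{changefreq}</changefreq>
          <priority>#{format("%.1f", priority)}</priority>
        </url>
      XML
    end

    # The landing page gets the highest priority, as suggested above.
    puts sitemap_url_entry(
      loc: "https://www.example.com/",
      lastmod: Date.new(2021, 1, 15),
      changefreq: "weekly",
      priority: 1.0
    )
    ```

    In a real sitemap.xml, entries like this one are wrapped in a single `<urlset>` root element.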

     

    2. Meta tags: Meta tags are very important because they indirectly affect the SEO ranking. They contain important information about the web page, and this information is shown as the snippet in Google search results. Users see this snippet and decide whether to click the link, and search engines consider the click rate when serving the results. Meta tags are not visible to the user on the web page, but they are part of the HTML code.

    A few important meta tags for SEO are:

    • Meta title: This is the primary content shown by the search results, and it plays a huge role in deciding the click rates because it gives users a quick glance at what this page is about. It should ideally be 50-60 characters long, and the title should be unique for each page.
    • Meta description: It summarizes or gives an overview of the page content in short. The description should be precise and of high quality. It should include some targeted keywords the user will likely search and be under 160 characters.
    • Meta robots: It tells search engines whether to index and crawl web pages. The four values it can contain are index, noindex, follow, or nofollow. If these values are not used correctly, then it will negatively impact the SEO.
      index/noindex: Tells whether to index the web page.
      follow/nofollow: Tells whether to crawl links on the web page.
    • Meta viewport: It instructs the browser on how to render the page on different screen sizes and signals to search engines that the web page is responsive. The presence of this tag helps search engines understand that the website is mobile-friendly, which matters because Google ranks the results differently in mobile search. If the desktop version is opened on a mobile device, the user will most likely close the page, sending a negative signal to Google that this page has some undesirable content, which results in a lower ranking. This tag should be present on all the web pages.

      Let us look at what a Velotio page would look like with and without the meta viewport tag.


    • Meta charset: In simple terms, it sets the character encoding of the web page, telling the browser how the text should be displayed on the page. Wrong character encoding will make content hard to read for search engines and will lead to a bad user experience. Use UTF-8 character encoding wherever possible.
    • Meta keywords: Search engines don’t consider this tag anymore. Bing considers this tag as spam. If this tag is added to any of the web pages, it may work against SEO. It is advisable not to have this tag on your pages.
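
    The length guidelines for the meta title and meta description are easy to check automatically. Below is a minimal sketch; the `meta_tag_warnings` helper and its default limits are hypothetical and simply encode the numbers mentioned above.

    ```ruby
    # Hypothetical helper: checks a page's meta title and description against
    # the length guidelines mentioned above (50-60 and under 160 characters).
    def meta_tag_warnings(title:, description:, title_range: (50..60), description_limit: 160)
      warnings = []
      unless title_range.cover?(title.length)
        warnings << "title is #{title.length} characters, expected #{title_range.min}-#{title_range.max}"
      end
      if description.length >= description_limit
        warnings << "description is #{description.length} characters, expected under #{description_limit}"
      end
      warnings
    end

    # A very short title triggers a warning; the description is within limits.
    puts meta_tag_warnings(
      title: "Home",
      description: "A basic guide to sitemaps, meta tags, and page speed for better SEO."
    )
    ```

    A check like this could run as part of a build or test step so that every page is verified before it is published.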

    3. Usage of Headers / Hierarchical content: Header tags are the heading tags that are important for user readability and search engines. Headers organize the content of the web page so that it won’t look like a plain wall of text. Bots check for how well the content is organized and assign the ranking accordingly. Headers make the content user-friendly, scannable, and accessible. Header tags are from h1 to h6, with h1 being high importance and h6 being low importance. Googlebot considers h1 mainly because it is typically the title of the page and provides brief information about what this page content has.

    If Velotio’s different pages of content were written on one big page (not good advice, just for example), then hierarchy can be done like the below snapshot.

    4. Usage of Breadcrumb: Breadcrumbs are the navigational elements that allow users to track which page they are currently on. Search engines find this helpful to understand the structure of the website. It lowers the bounce rate by engaging users to explore other pages of the website. Breadcrumbs can be found at the top of the page with slightly smaller fonts. Usage of breadcrumb is always recommended if your site has deeply nested pages.

    If we refer to the MDN pages, then a hierarchical breadcrumb can be found at the top of the page.

    5. User Experience (UX): UX has become an integral component of SEO. A good UX always makes your users stay longer, which lowers the bounce rate and makes them visit your site again. Google recognizes this stay time and click rates and considers the site as more attractive to users, ranking it higher in the search results. Consider the following points to have a good user experience.

    1. Divide content into sections, not just a plain wall of text
    2. Use hierarchical font sizes
    3. Use images/videos that summarize the content
    4. Good theme and color contrast
    5. Responsiveness (desktop/tablet/mobile)

    6. Robots.txt: The robots.txt file tells crawlers which pages of the site they should not access. It contains directives that disallow bots from crawling the listed pages. Pages that are not crawled are generally not indexed either, although a disallowed URL can still show up in search results if other pages link to it. The best example of a page that should not be crawled is the payment gateway page. Robots.txt is kept at the root of the web app and should be public. Refer to Velotio’s robots.txt file to know more about it. User-Agent:* means the given command will be applied to all the bots that support robots.txt.
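
    To make the idea concrete, here is a rough sketch of how a well-behaved bot could interpret a robots.txt file. The file content and the helper names (`disallowed_prefixes`, `crawl_allowed?`) are hypothetical, and the parsing is deliberately simplified: it only reads the `User-agent: *` group and ignores Allow rules and wildcards.

    ```ruby
    # Collects the Disallow path prefixes that apply to all bots (User-agent: *).
    # Simplified: ignores Allow rules, wildcards, and bot-specific groups.
    def disallowed_prefixes(robots_txt)
      applies = false
      prefixes = []
      robots_txt.each_line do |line|
        # Drop trailing comments, then split "Field: value".
        field, value = line.split("#").first.to_s.split(":", 2).map(&:strip)
        next if field.nil? || value.nil?

        case field.downcase
        when "user-agent"
          applies = (value == "*")
        when "disallow"
          prefixes << value if applies && !value.empty?
        end
      end
      prefixes
    end

    # A path may be crawled unless it starts with a disallowed prefix.
    def crawl_allowed?(robots_txt, path)
      disallowed_prefixes(robots_txt).none? { |prefix| path.start_with?(prefix) }
    end

    robots_txt = <<~ROBOTS
      User-agent: *
      Disallow: /checkout/
      Disallow: /admin/
    ROBOTS

    puts crawl_allowed?(robots_txt, "/blog/seo-basics")   # true
    puts crawl_allowed?(robots_txt, "/checkout/payment")  # false
    ```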

    7. Page speed: Page speed is the time it takes to get the page fully displayed and interactive. Google also considers page speed an important factor for SEO. As we have seen from the facts section, users tend to close a site if it takes longer than 3 seconds to load. To Googlebot, this is something unfavorable to the user experience, and it will lower the ranking. We will go through some tools later in this section to measure the loading speed of a page, but if your site loads slowly, then look into the recommendations below.

    • Image compression: In a consumer-oriented website, the images contribute to around 50-90% of the page. The images must load quickly. Use compressed images, which lowers the file size without compromising the quality. Cloudinary is a platform that does this job decently.
      If your image size is 700×700 and is shown in a 300×300 container, then rather than resizing it with CSS, load the image at 300×300, because browsers don’t need to load such a big image, and it will take more time to reduce the image through CSS. All this time can be avoided by loading an image of the required size.
      By utilizing deferring/lazy image loading, images are downloaded when they are needed as the user scrolls on the webpage. Doing this allows the images to not be loaded at once, and browsers will have the bandwidth to perform other tasks.
      Using sprite images is also an effective way to reduce the HTTP requests by combining small icons into one sprite image and displaying the section we want to show. This will save load time by avoiding loading multiple images.
    • Code optimization: Every developer should consider reusability while developing code, which will help in reducing the code size. Nowadays, most websites are developed using bundlers. Use bundle analyzers to analyze which piece of code is leading to a size increase. Bundlers are already doing the minification process while generating the build artifacts.
    • Removing render-blocking resources: Browsers build the DOM tree by parsing HTML. During this process, if it finds any scripts, then the creation of the DOM tree is paused and script execution starts. This will increase the page load time, and to make it work without blocking DOM creation, use async & defer in your scripts and load the script at the footer of the body. Keep in mind, though, that some scripts need to be loaded on the header like Google analytics script. Don’t use this suggested step blindly as it may cause some unusual behavior in your site.
    • Implementing a Content Distribution Network (CDN): It helps in loading the resources in a shorter time by figuring out the nearest server located from the user location and delivering the content from the nearest server.
    • Good hosting platform: Optimizing images and code alone cannot always improve page speed. Budget-friendly shared servers host many other websites, which can prevent your site from loading quickly. So, it is recommended to use a premium hosting service or a dedicated server.
    • Implement caching: If resources are cached on a browser, then they are not fetched from the server; rather the browser picks them from the cache. It is important to have an expiration time while setting cache. And caching should also be done only on the resources that are not updated frequently.
    • Reducing redirects: Each redirect adds the time of an extra HTTP request-response cycle. It is advisable not to use too many redirects.

    Some tools help us find the score of our website and provide information on what areas can be improved. These tools consider SEO, user experience, and accessibility point of view while calculating the score. These tools give results in some technical terms. Let us understand them in short:

    1. Time to first byte (TTFB): It is the time between the browser requesting the page and receiving the first byte of the response. When we see a white screen for some time on page landing, part of that wait is TTFB.

    2. First contentful paint: It represents when the user sees something on the web page.

    3. First meaningful paint: It tells when the user understands the content, like text/images on the web page.

    4. First CPU idle: It represents the moment when the site has loaded enough information for it to be able to handle the user’s first input.

    5. Largest contentful paint: It represents when the largest image or text block visible within the viewport (above the page’s fold, without scrolling) has rendered.

    6. Time to interactive: It represents the moment when the web page is fully interactive.

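    7. Total blocking time: It is the total amount of time, between first contentful paint and time to interactive, during which the main thread was blocked long enough to prevent the page from responding to user input.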

    8. Cumulative layout shift: It is a score, not a time, that measures how much visible elements shift unexpectedly while the page is being rendered.

    Below are some popular tools we can use for performance analysis:

    1. PageSpeed Insights: This assessment tool provides the score and opportunities to improve.

    2. WebPageTest: This monitoring tool lets you analyze each resource’s loading time.

    3. GTmetrix: This is also an assessment tool like Lighthouse that gives some more information, and we can set the test location as well.

    Conclusion:

    We have seen what SEO is, how it works, and how we can improve it by going through sitemap, meta tags, heading tags, robots.txt, breadcrumb, user experience, and finally the page load speed. For a business-to-consumer application, SEO is highly important. It lets you drive more traffic to your website. Hopefully, this basic guide will help you improve SEO for your existing and future websites.


  • Elasticsearch 101: Fundamentals & Core Components

    Elasticsearch is currently the most popular way to implement free text search and analytics in applications. It is highly scalable and can easily manage petabytes of data. It supports a variety of use cases, such as letting users easily search through any portal, collecting and analyzing log data, and building business intelligence dashboards to quickly analyze and visualize data.

    This blog acts as an introduction to Elasticsearch and covers the basic concepts of clusters, nodes, index, document and shards.

    What is Elasticsearch?

    Elasticsearch (ES) combines an open-source, distributed, highly scalable data store with Lucene, a search engine library that supports extremely fast full-text search. It is a beautifully crafted piece of software that hides the internal complexities and provides full-text search capabilities through simple REST APIs. Elasticsearch is written in Java with Apache Lucene at its core. It should be clear that Elasticsearch is not like a traditional RDBMS. It is not suitable for your transactional database needs, and hence, in my opinion, it should not be your primary data store. It is a common practice to use a relational database as the primary data store and index only the required data into Elasticsearch.

    Elasticsearch is meant for fast text search. Several characteristics make it different from an RDBMS. Unlike an RDBMS, Elasticsearch stores data as JSON documents, which are denormalized, and it doesn’t support transactions, referential integrity, joins, or subqueries.

    Elasticsearch works with structured, semi-structured, and unstructured data as well. In the next section, let’s walk through the various components in Elasticsearch.

    Elasticsearch Components

    Cluster

    One or more servers collectively providing indexing and search capabilities form an Elasticsearch cluster. The cluster size can vary from a single node to thousands of nodes, depending on the use cases.

    Node

    A node is a single physical or virtual machine that holds all or part of your data and provides computing power for indexing and searching your data. Every node is identified by a unique name. If the node identifier is not specified, a random UUID is assigned as the node identifier at startup. Every node configuration has the property `cluster.name`. The cluster is formed automatically from all the nodes that have the same `cluster.name` at startup.

    A node has to accomplish several duties such as:

    • storing the data
    • performing operations on data (indexing, searching, aggregation, etc.)
    • maintaining the health of the cluster

    Each node in a cluster can do all these operations. Elasticsearch provides the capability to split responsibilities across different nodes. This makes it easy to scale, optimize, and maintain the cluster. Based on the responsibilities, the following are the different types of nodes that are supported:

    Data Node

    A data node is a node that has storage and computation capability. A data node stores part of the data in the form of shards (explained in a later section). Data nodes also participate in the CRUD, search, and aggregate operations. These operations are resource-intensive, and hence, it is a good practice to have dedicated data nodes without the additional load of cluster administration. By default, every node of the cluster is a data node.

    Master Node

    Master nodes are reserved to perform administrative tasks. Master nodes track the availability/failure of the data nodes. The master nodes are responsible for creating and deleting the indices (explained in the later section).

    This makes the master node a critical part of the Elasticsearch cluster. It has to be stable and healthy. A single master node for a cluster is certainly a single point of failure. Elasticsearch provides the capability to have multiple master-eligible nodes. All the master-eligible nodes participate in an election to elect a master node. It is recommended to have a minimum of three master-eligible nodes in the cluster to avoid a split-brain situation. By default, all the nodes are both data nodes and master-eligible nodes. However, some nodes can be made master-eligible only through explicit configuration.

    Coordinating-Only Node

    Any node that is not a master node or a data node is a coordinating node. Coordinating nodes act as smart load balancers. They are exposed to end-user requests and route the requests appropriately between data nodes and master nodes.

    To take an example, a user’s search request is sent to different data nodes. Each data node searches locally and sends the result back to the coordinating node. The coordinating node aggregates the results and returns them to the user.
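
    The search flow described above can be sketched in a few lines of plain Ruby. This is only an illustration of the scatter-and-gather idea, not how Elasticsearch is implemented: the data nodes are plain arrays, the relevance scores are made up, and `search_locally` and `coordinate_search` are hypothetical names.

    ```ruby
    # Hypothetical stand-in for a data node searching its local documents:
    # keeps the matching documents and returns its own top `limit` hits.
    def search_locally(documents, term, limit)
      documents
        .select { |doc| doc[:text].downcase.include?(term.downcase) }
        .sort_by { |doc| -doc[:score] }
        .first(limit)
    end

    # The coordinating node fans the request out to every data node,
    # then merges the partial results into one final top `limit` list.
    def coordinate_search(data_nodes, term, limit)
      partial_results = data_nodes.map { |docs| search_locally(docs, term, limit) }
      partial_results.flatten.sort_by { |doc| -doc[:score] }.first(limit)
    end

    node_a = [
      { id: 1, text: "Motorola G5", score: 2.1 },
      { id: 2, text: "Nokia 3310", score: 1.0 }
    ]
    node_b = [
      { id: 3, text: "Motorola Edge", score: 3.4 },
      { id: 4, text: "Motorola Razr", score: 0.5 }
    ]

    top_hits = coordinate_search([node_a, node_b], "motorola", 2)
    puts top_hits.map { |doc| doc[:id] }.inspect # [3, 1]
    ```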

    There are a few concepts that are core to Elasticsearch. Understanding these basic concepts will tremendously ease the learning process.

    Index

    An index is a container to store data, similar to a database in relational databases. An index contains a collection of documents that have similar characteristics or are logically related. If we take an example of an e-commerce website, there will be one index for products, one for customers, and so on. Indices are identified by a lowercase name. The index name is required to perform the add, update, and delete operations on the documents.

    Type

    A type is a logical grouping of the documents within the index. In the previous example of the product index, we can further group documents into types, like electronics, fashion, furniture, etc. Types are defined on the basis of documents having similar properties. It isn’t easy to decide when to use a type over an index. Indices have more overhead, so sometimes, it is better to use different types in the same index for better performance. There are a couple of restrictions on using types as well. For example, two fields having the same name in different types of documents should be of the same datatype (string, date, etc.). Note that multiple types per index are only available in Elasticsearch versions before 6.0; from 6.0 onward an index can have a single type, and types were removed in later versions.

    Document

    A document is the basic unit of data indexed by Elasticsearch. A document is represented in the JSON format. We can add as many documents as we want into an index. The following snippet shows how to create a document of type mobile in the index store. We will cover more about the individual fields of the document in the Mapping Types section.

    HTTP POST <hostname:port>/store/mobile/
    {
      "name": "Motorola G5",
      "model": "XT3300",
      "release_date": "2016-01-01",
      "features": "16 GB ROM | Expandable Upto 128 GB | 5.2 inch Full HD Display | 12MP Rear Camera | 5MP Front Camera | 3000 mAh Battery | Snapdragon 625 Processor",
      "ram_gb": "3",
      "screen_size_inches": "5.2"
    }

    Mapping Types

    To create different types in an index, we need mapping types (or simply mappings) to be specified during index creation. Mappings can be defined as a list of directives given to Elasticsearch about how the data is supposed to be stored and retrieved. It is important to provide mapping information at the time of index creation based on how we want to retrieve our data later. In the context of relational databases, think of mappings as a table schema.

    Mapping provides information on how to treat each JSON field. For example, the field can be of type date, geolocation, or person name. Mappings also allow specifying which fields will participate in the full-text search, and specify the analyzers used to transform and decorate data before storing into an index. If no mapping is provided, Elasticsearch tries to identify the schema itself, known as Dynamic Mapping. 

    Each mapping type has Meta Fields and Properties. The snippet below shows the mapping of the type mobile.

    {
      "mappings": {
        "mobile": {
          "properties": {
            "name": {
              "type": "keyword"
            },
            "model": {
              "type": "keyword"
            },
            "release_date": {
              "type": "date"
            },
            "features": {
              "type": "text"
            },
            "ram_gb": {
              "type": "short"
            },
            "screen_size_inches": {
              "type": "float"
            }
          }
        }
      }
    }

    Meta Fields

    As the name indicates, meta fields store additional information about the document. Meta fields are meant mostly for internal usage, and it is unlikely that the end-user has to deal with them. Meta field names start with an underscore. There are around ten meta fields in total. We will talk about some of them here:

    _index

    It stores the name of the index the document belongs to. This is used internally to store/search the document within an index.

    _type

    It stores the type of the document. To get better performance, it is often included in search queries.

    _id

    This is the unique id of the document. It is used to access a specific document directly over the HTTP GET API.

    _source

    This holds the original JSON document before applying any analyzers/transformations. It is important to note that Elasticsearch can query only on fields that are indexed (those for which a mapping is provided). The _source field is not indexed, and hence, can’t be queried on, but it can be included in the final search result.

    Fields Or Properties

    The list of fields specifies which JSON fields in the document should be included in a particular type. In the e-commerce website example, mobile can be a type. It will have fields like operating_system, camera_specification, ram_size, etc.

    Fields also carry the data type information with them. This directs Elasticsearch to treat the specific fields in a particular way of storing/searching data. Data types are similar to what we see in any other programming language. We will talk about a few of them here.

    Simple Data Types

    Text

    This data type is used to store full text, like a product description. These fields participate in full-text search. They are analyzed while being stored, which makes it possible to search them by the individual words they contain. Such fields are not used in sorting and aggregation queries.

    Keywords

    This type is also used to store text data, but unlike Text, it is stored as-is without being analyzed. This is suitable for storing information like a user’s mobile number, city, age, etc. These fields are used in filter, aggregation, and sorting queries. For example, list all users from a particular city and filter them by age.
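
    The difference between the two types can be illustrated with a toy inverted index. This is a rough sketch under simplifying assumptions: `analyze` is a crude stand-in for a real analyzer (lowercase and split on non-word characters), and `build_index` and the sample documents are hypothetical.

    ```ruby
    # Very rough stand-in for an analyzer: lowercase and split on non-word characters.
    def analyze(text)
      text.downcase.split(/\W+/).reject(&:empty?)
    end

    # Builds a map of term => [document ids]. A :text field indexes every
    # analyzed token; a :keyword field indexes the whole value as one term.
    def build_index(docs, field, type)
      index = Hash.new { |hash, term| hash[term] = [] }
      docs.each do |doc|
        terms = type == :text ? analyze(doc[field]) : [doc[field]]
        terms.uniq.each { |term| index[term] << doc[:id] }
      end
      index
    end

    docs = [
      { id: 1, features: "Full HD Display", city: "New York" },
      { id: 2, features: "HD Display with Gorilla Glass", city: "York" }
    ]

    text_index = build_index(docs, :features, :text)
    keyword_index = build_index(docs, :city, :keyword)

    puts text_index["display"].inspect      # [1, 2]: matched by an individual word
    puts keyword_index["New York"].inspect  # [1]: matched only by the exact value
    puts keyword_index.key?("york")         # false: the value was never tokenized
    ```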

    Numeric

    Elasticsearch supports a wide range of numeric types: long, integer, short, byte, double, float.

    There are a few more data types to support date, boolean (true/false, on/off, 1/0), and IP (to store IP addresses).

    Special Data Types

    Geo Point

    This data type is used to store a geographical location. It accepts a latitude and longitude pair. For example, this data type can be used to arrange the user’s photo library by geographical location or to graphically display the locations trending on social media news.

    Geo Shape

    It allows storing arbitrary geometric shapes like rectangle, polygon, etc.

    Completion Suggester

    This data type is used to provide an auto-completion feature over a specific field. As the user types certain text, the completion suggester can guide the user to reach particular results.

    Complex Data Type

    Object

    If you know JSON well, this concept won’t be new for you. Elasticsearch also allows storing nested JSON object structure as a document.

    Nested

    The Object data type is not that useful due to its underlying data representation in the Lucene index. The Lucene index does not support inner JSON objects, so ES flattens the original JSON to make it compatible with storage in the Lucene index. Thus, the fields of multiple inner objects get merged into one, leading to wrong search results. Most of the time, you may want to use the Nested data type over Object.
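
    The following sketch mimics this flattening in plain Ruby to show why the merged fields can produce a wrong match. The `flatten_objects` helper and the reviews data are hypothetical; the real storage format in Lucene is more involved.

    ```ruby
    # Mimics how an array of inner objects is flattened when mapped as Object:
    # each inner field becomes one multi-valued field on the parent document.
    def flatten_objects(field, objects)
      objects.each_with_object({}) do |object, flat|
        object.each { |key, value| (flat["#{field}.#{key}"] ||= []) << value }
      end
    end

    reviews = [
      { "author" => "alice", "rating" => 5 },
      { "author" => "bob",   "rating" => 1 }
    ]

    flat = flatten_objects("reviews", reviews)
    # flat is { "reviews.author" => ["alice", "bob"], "reviews.rating" => [5, 1] }

    # Query: is there a review written by bob with a rating of 5?
    object_match = flat["reviews.author"].include?("bob") && flat["reviews.rating"].include?(5)
    nested_match = reviews.any? { |r| r["author"] == "bob" && r["rating"] == 5 }

    puts object_match # true: a false positive, the author/rating pairing was lost
    puts nested_match # false: each inner object is matched on its own
    ```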

    Shards

    Shards help with enabling Elasticsearch to become horizontally scalable. An index can store millions of documents and occupy terabytes of data. This can cause problems with performance, scalability, and maintenance. Let’s see how Shards help achieve scalability.

    Indices are divided into multiple units called shards (refer to the diagram below). A shard is a full-featured subset of an index. Shards of the same index can reside on the same or different nodes of the cluster. The number of shards decides the degree of parallelism for search and indexing operations. Shards allow the cluster to grow horizontally. The number of shards per index can be specified at the time of index creation. By default, the number of shards created is 5 (in versions before 7.0; later versions default to 1). However, once the index is created, the number of shards cannot be changed. To change the number of shards, reindex the data.
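
    One way to see why the shard count is fixed: a document is routed to a shard by hashing its routing value (the document id by default) and taking the remainder modulo the number of primary shards. The sketch below illustrates that idea only. `shard_for` is a hypothetical helper, and CRC32 is used as a stand-in hash because it ships with Ruby; Elasticsearch uses a different hash function internally.

    ```ruby
    require "zlib"

    # Hypothetical stand-in for the routing step: hash the routing key and take
    # the remainder modulo the number of primary shards.
    def shard_for(routing_key, number_of_shards)
      Zlib.crc32(routing_key.to_s) % number_of_shards
    end

    doc_ids = %w[user-1 user-2 user-3 user-4 user-5 user-6]

    # Same documents, routed with 5 shards and then with 6 shards.
    with_five = doc_ids.map { |id| shard_for(id, 5) }
    with_six = doc_ids.map { |id| shard_for(id, 6) }

    moved = doc_ids.zip(with_five, with_six).count { |_, five, six| five != six }
    puts "documents that would map to a different shard: #{moved} of #{doc_ids.size}"
    ```

    Because changing the divisor changes where existing documents are expected to live, lookups by id would go to the wrong shard, which is why the data has to be reindexed into a new index instead.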

    Replication

    Hardware can fail at any time. To ensure fault tolerance and high availability, ES provides a feature to replicate the data. Shards can be replicated. A shard that is being copied is called a primary shard. The copy of the primary shard is called a replica shard or simply a replica. Like the number of shards, the number of replicas can also be specified at the time of index creation. Replication serves two purposes:

    • High Availability – A replica is never created on the same node where the primary shard is present. This ensures that data remains available through the replica shard even if the complete node fails.
    • Performance – Replicas can also contribute to search capabilities. The search queries are executed in parallel across the replicas.

    To summarize, to achieve high availability and performance, the index is split into multiple shards. In a production environment, multiple replicas are created for every index. In the replicated index, only primary shards can serve write requests. However, all the shards (the primary shard as well as replicated shards) can serve read/query requests. The replication factor is defined at the time of index creation and can be changed later if required. Choosing the number of shards is an important exercise because, once defined, it can’t be changed. In critical scenarios, changing the number of shards requires creating a new index with the required shards and reindexing the old data.

    Summary

    In this blog, we have covered the basic but important aspects of Elasticsearch. In the following posts, I will talk about how indexing & searching works in detail. Stay tuned!

  • Improving Elasticsearch Indexing in the Rails Model using Searchkick

    Searching has become a prominent feature of any web application, and a relevant search feature requires a robust search engine. The search engine should be capable of full-text search, auto-completion, suggestions, spelling corrections, fuzzy search, and analytics.

    Elasticsearch, a distributed, fast, and scalable search and analytic engine, takes care of all these basic search requirements.

    The focus of this post is using a few approaches with Elasticsearch in our Rails application to reduce the latency of web requests. Let’s review one of the best ways to improve the Elasticsearch indexing in Rails models by moving it to background jobs.

    In a Rails application, Elasticsearch can be integrated with any of the following popular gems:

    We can continue with any of these gems mentioned above. But for this post, we will be moving forward with the Searchkick gem, which is a much more Rails-friendly gem.

    The default Searchkick gem option uses the object callbacks to sync the data in the respective Elasticsearch index. Because the sync runs inside the callbacks, any web request that creates or updates a resource takes additional time to process.

    The below image shows logs from a Rails application, captured for an update request of a user record. We have added a print statement before Elasticsearch tries to sync in the Rails model so that it helps identify from the logs where the indexing has started. These logs show that the last two queries were executed for indexing the data in the Elasticsearch index.

    Since the Elasticsearch sync is happening while updating a user record, we can conclude that the user update request will take additional time to cover the Elasticsearch sync.

    Below is the request flow diagram:

    From the request flow diagram, we can say that the end-user must wait for steps 3 and 4 to be completed. Step 3 is to fetch the children object details from the database.

    To tackle the problem, we can move the Elasticsearch indexing to the background jobs. Usually, for Rails apps in production, there are separate app servers, database servers, background job processing servers, and Elasticsearch servers (in this scenario).

    This is how the request flow looks when we move Elasticsearch indexing:

    Let’s get to coding!

    For demo purposes, we will have a Rails app with models: `User` and `Blogpost`. The stack used here:

    • Rails 5.2
    • Elasticsearch 6.6.7
    • MySQL 5.6
    • Searchkick (gem for writing Elasticsearch queries in Ruby)
    • Sidekiq (gem for background processing)

    This approach does not require any specific version of Rails, Elasticsearch, or MySQL. Moreover, this approach is database agnostic. You can go through the code from this GitHub repo for reference.

    Let’s take a look at the user model with Elasticsearch index.

    # == Schema Information
    #
    # Table name: users
    #
    #  id            :bigint           not null, primary key
    #  name          :string(255)
    #  email         :string(255)
    #  mobile_number :string(255)
    #  created_at    :datetime         not null
    #  updated_at    :datetime         not null
    #
    class User < ApplicationRecord
     searchkick
    
     has_many :blogposts
     def search_data
       {
         name: name,
         email: email,
         total_blogposts: blogposts.count,
         last_published_blogpost_date: last_published_blogpost_date
       }
     end
     ...
    end

    Anytime a user object is inserted, updated, or deleted, Searchkick reindexes the data in the Elasticsearch user index synchronously.

    Searchkick already provides four ways to sync Elasticsearch index:

    • Inline (default)
    • Asynchronous
    • Queuing
    • Manual

    For more detailed information on this, refer to this page. In this post, we are looking at the manual approach to reindex the model data.

    To manually reindex, the user model will look like:

    class User < ApplicationRecord
     searchkick callbacks: false
    
     def search_data
       ...
     end
    end

    Now, we will need to define a callback that can sync the data to the Elasticsearch index. Typically, this callback would have to be written in every model that has an Elasticsearch index. Instead, we can write a common concern and include it in the required models.

    Here is what our concern will look like:

    module ElasticsearchIndexer
     extend ActiveSupport::Concern
    
     included do
       after_commit :reindex_model
       def reindex_model
         ElasticsearchWorker.perform_async(self.id, self.class.name)
       end
     end
    end

    In the above active support concern, we have called the Sidekiq worker named ElasticsearchWorker. After adding this concern, don’t forget to include the Elasticsearch indexer concern in the user model, like so:

    include ElasticsearchIndexer

    Now, let’s see the Elasticsearch Sidekiq worker:

    class ElasticsearchWorker
     include Sidekiq::Worker
     def perform(id, klass)
       begin
         klass.constantize.find(id.to_s).reindex
       rescue => e
         # Handle exception
       end
     end
    end

    That’s it, we’ve done it. Cool, huh? Now, whenever a web request creates, updates, or deletes a user, a background job will be created. The background job can be seen in the Sidekiq web UI at localhost:3000/sidekiq

    Now, there is a small problem in the Elasticsearch indexer concern. To reproduce it, go to your user edit page, click save without changing anything, and look at localhost:3000/sidekiq: a job will be queued even though no attribute changed.

    We can handle this case by tracking the dirty attributes. 

    module ElasticsearchIndexer
     extend ActiveSupport::Concern
     included do
       after_commit :reindex_model
       def reindex_model
         return if self.previous_changes.keys.blank?
         ElasticsearchWorker.perform_async(self.id, self.class.name)
       end
     end
    end

    Furthermore, there are a few more areas of improvement. Suppose you update a field of the user model that is not part of the Elasticsearch index: the Elasticsearch worker Sidekiq job will still be created and will reindex the associated model object. The concern can be modified to create the indexing job only if fields that are part of the Elasticsearch index were updated.

    module ElasticsearchIndexer
     extend ActiveSupport::Concern
     included do
       after_commit :reindex_model
       def reindex_model
         updated_fields = self.previous_changes.keys
        
         # For getting ES Index fields you can also maintain constant
         # on model level or get from the search_data method.
         es_index_fields = self.search_data.stringify_keys.keys
         return if (updated_fields & es_index_fields).blank?
         ElasticsearchWorker.perform_async(self.id, self.class.name)
       end
     end
    end

    Conclusion

    Moving the Elasticsearch indexing to background jobs is a great way to boost the performance of the web app by reducing the response time of any web request. Implementing this approach for every model would not be ideal. I would recommend this approach only if the Elasticsearch index data are not needed in real-time.

    Since the execution of background jobs depends on the number of jobs queued, it might take time for changes to show up in the Elasticsearch index if there are lots of jobs waiting. To solve this problem to some extent, the Elasticsearch indexing jobs can be added to a high-priority queue. This approach also works best if the app server is separate from the background job processing server.

  • Managing a TLS Certificate for Kubernetes Admission Webhook

    A Kubernetes admission controller is a great way of handling an incoming request, whether to add or modify fields or deny the request as per the rules/configuration defined. To extend the native functionalities, these admission webhook controllers call a custom-configured HTTP callback (webhook server) for additional checks. But the API server only communicates over HTTPS with the admission webhook servers and needs the CA information for the webhook server's TLS certificate. This raises the question of how to handle the webhook server certificate and how to pass the CA information to the API server automatically.

    One way to handle this TLS certificate and CA is to use cert-manager. However, cert-manager itself is a big application and consists of many CRDs to handle its operation. It is not a good idea to install cert-manager just to handle the admission webhook TLS certificate and CA. The second and possibly easier way is to use a self-signed certificate and handle the CA on our own using an init container. This eliminates the dependency on other applications, like cert-manager, and gives us the flexibility to control our application flow.

    How is a custom admission webhook written? We will not cover this in-depth, and only a basic overview of admission controllers and their working will be covered. The main focus for this blog will be to cover the second approach step-by-step: handling admission webhook server TLS certificate and CA on our own using init container so that the API server can communicate with our custom webhook.

    To understand the in-depth working of Admission Controllers, these articles are great: 

    Prerequisites:

    • Knowledge of Kubernetes admission controllers,  MutatingAdmissionWebhook, ValidatingAdmissionWebhook
    • Knowledge of Kubernetes resources like pods and volumes

    Basic Overview: 

    Admission controllers intercept requests to the Kubernetes API server before persistence of objects in the etcd. These controllers are bundled and compiled together into the kube-apiserver binary. They consist of a list of controllers, and in that list, there are two special controllers: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. MutatingAdmissionWebhook, as the name suggests, mutates/adds/modifies some fields in the request object by creating a patch, and ValidatingAdmissionWebhook validates the request by checking if the request object fields are valid or if the operation is allowed, etc., as per custom logic.
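
    To make the "creating a patch" part concrete, here is a minimal sketch of how a mutating webhook might build a JSON Patch (RFC 6902) that adds one label to the incoming object. The helper names addLabelPatch and escapePointer, and the label used in main, are assumptions made for illustration; they are not part of any Kubernetes library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// patchOp is one JSON Patch (RFC 6902) operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// escapePointer escapes a key for use inside a JSON Pointer (RFC 6901):
// "~" becomes "~0" and "/" becomes "~1".
func escapePointer(key string) string {
	return strings.NewReplacer("~", "~0", "/", "~1").Replace(key)
}

// addLabelPatch returns a JSON Patch that sets one label on an object.
// existing holds the labels the object already carries (may be nil).
func addLabelPatch(existing map[string]string, key, value string) ([]byte, error) {
	if len(existing) == 0 {
		// No labels yet: create the whole labels map in one operation.
		return json.Marshal([]patchOp{{
			Op:    "add",
			Path:  "/metadata/labels",
			Value: map[string]string{key: value},
		}})
	}
	// Labels exist: add (or overwrite) a single member of the map.
	return json.Marshal([]patchOp{{
		Op:    "add",
		Path:  "/metadata/labels/" + escapePointer(key),
		Value: value,
	}})
}

func main() {
	patch, err := addLabelPatch(map[string]string{"app": "demo"}, "example.com/team", "platform")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

    In a real webhook, the resulting bytes would be returned in the patch field of the admission response together with the patch type JSONPatch.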

    The main reason for these types of controllers is to dynamically add new checks along with the native existing checks in Kubernetes to allow a request, just like the plug-in model. To understand this more clearly, let’s say we want all the deployments in the cluster to have certain required labels. If the deployment does not have required labels, then the create deployment request should be denied. This functionality can be achieved in two ways: 

    1) Add these extra checks natively in Kubernetes API server codebase, compile a new binary, and run with the new binary. This is a tedious process, and every time new checks are needed, a new binary is required. 

    2) Create a custom admission webhook, a simple HTTP server, for these additional checks, and register this admission webhook with the API server using the AdmissionRegistration API. To register it, two configurations are used: MutatingWebhookConfiguration and ValidatingWebhookConfiguration. The second approach is recommended and it’s quite easy as well. We will be discussing it here in detail.
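
    As a rough illustration of the required-labels check from the example above, the sketch below decides whether to allow an object based on its labels. It uses only the standard library and hand-written structs that mirror a small subset of the AdmissionReview JSON; the type and function names (admissionRequest, validate, missingLabels) are assumptions for this example, and a real webhook would more likely use the types from k8s.io/api.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// Minimal subset of the AdmissionReview (admission.k8s.io/v1) JSON shape.
type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Request    *admissionRequest  `json:"request,omitempty"`
	Response   *admissionResponse `json:"response,omitempty"`
}

type admissionRequest struct {
	UID    string          `json:"uid"`
	Object json.RawMessage `json:"object"`
}

type admissionResponse struct {
	UID     string  `json:"uid"`
	Allowed bool    `json:"allowed"`
	Status  *status `json:"status,omitempty"`
}

type status struct {
	Message string `json:"message"`
}

// missingLabels returns the required keys that are absent from labels, sorted.
func missingLabels(labels map[string]string, required []string) []string {
	var missing []string
	for _, k := range required {
		if _, ok := labels[k]; !ok {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

// validate builds the response for one request: denied if any required
// label is missing from the submitted object, allowed otherwise.
func validate(req admissionRequest, required []string) (admissionResponse, error) {
	var obj struct {
		Metadata struct {
			Labels map[string]string `json:"labels"`
		} `json:"metadata"`
	}
	if err := json.Unmarshal(req.Object, &obj); err != nil {
		return admissionResponse{}, err
	}
	// The response must echo the UID of the request it answers.
	resp := admissionResponse{UID: req.UID, Allowed: true}
	if missing := missingLabels(obj.Metadata.Labels, required); len(missing) > 0 {
		resp.Allowed = false
		resp.Status = &status{Message: "missing required labels: " + strings.Join(missing, ", ")}
	}
	return resp, nil
}

func main() {
	req := admissionRequest{
		UID:    "example-uid",
		Object: json.RawMessage(`{"metadata":{"labels":{"app":"demo"}}}`),
	}
	resp, err := validate(req, []string{"app", "team"})
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(admissionReview{
		APIVersion: "admission.k8s.io/v1",
		Kind:       "AdmissionReview",
		Response:   &resp,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

    An HTTP handler on the validation endpoint would decode the request body into admissionReview, call validate, and write the resulting review back as JSON.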

    Custom Admission Webhook Server:

    As mentioned earlier, a custom admission webhook server is a simple HTTP server with TLS that exposes endpoints for mutation and validation. Depending upon the endpoint hit, corresponding handlers process mutation and validation. Once a custom webhook server is ready and deployed in a cluster as a deployment along with a webhook service, the next part is to register it with the API server so that the API server can communicate with the custom webhook server. To register, MutatingWebhookConfiguration and ValidatingWebhookConfiguration are used. These configurations have a section to fill in the custom webhook related information.

    apiVersion: admissionregistration.k8s.io/v1
    kind: MutatingWebhookConfiguration
    metadata:
      name: mutation-config
    webhooks:
      - admissionReviewVersions:
        - v1beta1
        name: mapplication.kb.io
        clientConfig:
          caBundle: ${CA_BUNDLE}
          service:
            name: webhook-service
            namespace: default
            path: /mutate
        rules:
          - apiGroups:
              - apps
            apiVersions:
              - v1
            operations:
              - CREATE
            resources:
              - deployments
        sideEffects: None

    Here, the service field gives the name, namespace, and endpoint path of the running webhook server. An important field to note is caBundle. A custom admission webhook must run its HTTP server with TLS because the API server only communicates over HTTPS. So the webhook server runs with a server certificate and key, and caBundle in the configuration carries the CA (Certificate Authority) information that lets the API server recognize the server certificate.

    The problem here is how to handle this server certificate and key, and how to get the CA bundle and pass it to the API server using MutatingWebhookConfiguration or ValidatingWebhookConfiguration. This will be the main focus of the following part.
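
    To see what "recognizing" the server certificate involves, the following sketch performs roughly the check a TLS client makes: it loads a CA bundle into a certificate pool and verifies that a server certificate chains to it and matches the expected DNS name. The function names verifyWithBundle and newTestPair are assumptions for this example, and the throwaway certificates use ECDSA keys only to keep the example fast (the init container code in this post uses RSA). In the YAML manifest, caBundle holds the base64 encoding of the PEM data.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// verifyWithBundle reports whether certPEM chains to a CA in caPEM and is
// valid for dnsName.
func verifyWithBundle(caPEM, certPEM []byte, dnsName string) error {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return errors.New("CA bundle contains no usable certificate")
	}
	block, _ := pem.Decode(certPEM)
	if block == nil {
		return errors.New("server certificate is not PEM encoded")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return err
	}
	_, err = cert.Verify(x509.VerifyOptions{Roots: roots, DNSName: dnsName})
	return err
}

// newTestPair creates a throwaway CA and a server certificate signed by it.
func newTestPair(dnsName string) (caPEM, certPEM []byte, err error) {
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{Organization: []string{"example"}},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	caDER, err := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	caCert, err := x509.ParseCertificate(caDER)
	if err != nil {
		return nil, nil, err
	}
	srvKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	srvTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: dnsName},
		DNSNames:     []string{dnsName},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	srvDER, err := x509.CreateCertificate(rand.Reader, srvTmpl, caCert, &srvKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	caPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: caDER})
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: srvDER})
	return caPEM, certPEM, nil
}

func main() {
	const name = "webhook-service.default.svc"
	caPEM, certPEM, err := newTestPair(name)
	if err != nil {
		panic(err)
	}
	fmt.Println("matching name:", verifyWithBundle(caPEM, certPEM, name))
	fmt.Println("wrong name:", verifyWithBundle(caPEM, certPEM, "other.default.svc"))
}
```

    Verification fails both when the bundle does not contain the signing CA and when the certificate does not list the name being dialed, which is why the CA bundle and the DNS names in the server certificate both matter.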

    Here, we are going to use a self-signed certificate for the webhook server. This self-signed certificate can be made available to the webhook server in different ways. Two possible ways are:

    • Create a Kubernetes secret containing certificate and key and mount that as volume on to the server pod
    • Somehow create certificate and key in a volume, e.g., emptyDir volume and server consumes those from that volume

    However, whichever of the two ways is used, the remaining important part is to add the CA bundle to the mutation/validation configs.

    So, instead of doing all these steps manually, we will make use of Kubernetes init containers to perform all these functions for us.

    Custom Admission Webhook Server Init Container:

    The main function of this init container will be to create a self-signed webhook server certificate and provide the CA bundle to the API server via mutation/validation configs. How the webhook server consumes this certificate (via a secret volume or an emptyDir volume) depends on the use case. This init container will run a simple Go binary to perform all these functions.

    package main
    
    import (
    	"bytes"
    	cryptorand "crypto/rand"
    	"crypto/rsa"
    	"crypto/x509"
    	"crypto/x509/pkix"
    	"encoding/pem"
    	"fmt"
    	log "github.com/sirupsen/logrus"
    	"math/big"
    	"os"
    	"time"
    )
    
    func main() {
    	var caPEM, serverCertPEM, serverPrivKeyPEM *bytes.Buffer
    	// CA config
    	ca := &x509.Certificate{
    		SerialNumber: big.NewInt(2020),
    		Subject: pkix.Name{
    			Organization: []string{"velotio.com"},
    		},
    		NotBefore:             time.Now(),
    		NotAfter:              time.Now().AddDate(1, 0, 0),
    		IsCA:                  true,
    		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth, x509.ExtKeyUsageServerAuth},
    		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
    		BasicConstraintsValid: true,
    	}
    
    	// CA private key
    	caPrivKey, err := rsa.GenerateKey(cryptorand.Reader, 4096)
    	if err != nil {
    		fmt.Println(err)
    		os.Exit(1) // stop here: continuing with a nil key would crash later
    	}
    
    	// Self signed CA certificate
    	caBytes, err := x509.CreateCertificate(cryptorand.Reader, ca, ca, &caPrivKey.PublicKey, caPrivKey)
    	if err != nil {
    		fmt.Println(err)
    		os.Exit(1) // stop here: there is no CA certificate to work with
    	}
    
    	// PEM encode CA cert
    	caPEM = new(bytes.Buffer)
    	_ = pem.Encode(caPEM, &pem.Block{
    		Type:  "CERTIFICATE",
    		Bytes: caBytes,
    	})
    
    	dnsNames := []string{"webhook-service",
    		"webhook-service.default", "webhook-service.default.svc"}
    	commonName := "webhook-service.default.svc"
    
    	// server cert config
    	cert := &x509.Certificate{
    		DNSNames:     dnsNames,
    		SerialNumber: big.NewInt(1658),
    		Subject: pkix.Name{
    			CommonName:   commonName,
    			Organization: []string{"velotio.com"},
    		},
    		NotBefore:    time.Now(),
    		NotAfter:     time.Now().AddDate(1, 0, 0),
    		SubjectKeyId: []byte{1, 2, 3, 4, 6},
    		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth, x509.ExtKeyUsageServerAuth},
    		KeyUsage:     x509.KeyUsageDigitalSignature,
    	}
    
    	// server private key
    	serverPrivKey, err := rsa.GenerateKey(cryptorand.Reader, 4096)
    	if err != nil {
    		fmt.Println(err)
    		os.Exit(1) // stop here: continuing with a nil key would crash later
    	}
    
    	// sign the server cert
    	serverCertBytes, err := x509.CreateCertificate(cryptorand.Reader, cert, ca, &serverPrivKey.PublicKey, caPrivKey)
    	if err != nil {
    		fmt.Println(err)
    		os.Exit(1) // stop here: there is no server certificate to write
    	}
    
    	// PEM encode the  server cert and key
    	serverCertPEM = new(bytes.Buffer)
    	_ = pem.Encode(serverCertPEM, &pem.Block{
    		Type:  "CERTIFICATE",
    		Bytes: serverCertBytes,
    	})
    
    	serverPrivKeyPEM = new(bytes.Buffer)
    	_ = pem.Encode(serverPrivKeyPEM, &pem.Block{
    		Type:  "RSA PRIVATE KEY",
    		Bytes: x509.MarshalPKCS1PrivateKey(serverPrivKey),
    	})
    
    	// Directories need the execute bit to be traversable, so use 0755 rather than 0666.
    	err = os.MkdirAll("/etc/webhook/certs/", 0755)
    	if err != nil {
    		log.Panic(err)
    	}
    	err = WriteFile("/etc/webhook/certs/tls.crt", serverCertPEM)
    	if err != nil {
    		log.Panic(err)
    	}
    
    	err = WriteFile("/etc/webhook/certs/tls.key", serverPrivKeyPEM)
    	if err != nil {
    		log.Panic(err)
    	}
    
    }
    
    // WriteFile writes data in the file at the given path
    func WriteFile(filepath string, sCert *bytes.Buffer) error {
    	f, err := os.Create(filepath)
    	if err != nil {
    		return err
    	}
    	defer f.Close()
    
    	_, err = f.Write(sCert.Bytes())
    	if err != nil {
    		return err
    	}
    	return nil
    }

    The steps to generate self-signed CA and sign webhook server certificate using this CA in Golang:

    • Create a config for the CA, ca in the code above.
    • Create an RSA private key for this CA, caPrivKey in the code above.
    • Generate a self-signed CA certificate, caBytes in the code above, and PEM encode it into caPEM. caPEM will be the CA bundle given to the API server.
    • Create a config for the webhook server certificate, cert in the code above. The important fields in this configuration are DNSNames and CommonName. These names must match the full service name through which the webhook server is reached.
    • Create an RSA private key for the webhook server, serverPrivKey in the code above.
    • Create the server certificate, signed using ca and caPrivKey: serverCertBytes in the code above.
    • Now, PEM encode serverPrivKey and serverCertBytes. The resulting serverPrivKeyPEM and serverCertPEM are the TLS key and certificate that will be consumed by the webhook server.
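
    The DNS names are hardcoded in the code above. As a small sketch, they could instead be derived from the service name and namespace, following the same three forms used there. The helper name serviceDNSNames is an assumption for this example.

```go
package main

import "fmt"

// serviceDNSNames returns the in-cluster DNS names for a Service, in the
// same three forms used for the server certificate: short name,
// name.namespace, and name.namespace.svc.
func serviceDNSNames(service, namespace string) []string {
	return []string{
		service,
		service + "." + namespace,
		service + "." + namespace + ".svc",
	}
}

func main() {
	for _, name := range serviceDNSNames("webhook-service", "default") {
		fmt.Println(name)
	}
}
```

    The last form is the one that matters most here, since the API server reaches a webhook registered through a service reference under the name.namespace.svc form.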

    At this point, we have generated the required certificate, key, and CA bundle using init container. Now we will share this server certificate and key with the actual webhook server container in the same pod. 

    • One approach is to create an empty secret resource beforehand and create the webhook deployment by passing the secret name as an environment variable. The init container will generate the server certificate and key and populate the empty secret with them. This secret will be mounted on the webhook server container to start the HTTP server with TLS.
    • The second approach (used in the code above) is to use Kubernetes' native, pod-specific emptyDir volume. This volume will be shared between both containers. In the code above, we can see the init container writing the certificate and key to files on a particular path. This path will be the one the emptyDir volume is mounted to, and the webhook server container will read the certificate and key for its TLS configuration from that path and start the HTTP webhook server. Refer to the below diagram:

    The pod spec will look something like this:

    spec:
      initContainers:
        - image: <webhook init-image name>
          imagePullPolicy: IfNotPresent
          name: webhook-init
          volumeMounts:
            - mountPath: /etc/webhook/certs
              name: webhook-certs
      containers:
        - image: <webhook server image name>
          imagePullPolicy: IfNotPresent
          name: webhook-server
          volumeMounts:
            - mountPath: /etc/webhook/certs
              name: webhook-certs
              readOnly: true
      volumes:
        - name: webhook-certs
          emptyDir: {}
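
    On the server side, reading the certificate and key from the shared volume might look like the sketch below. The function name newWebhookServer, the port 8443, and the placeholder handler are assumptions for this example; only the file names tls.crt and tls.key and the mount path come from the code and pod spec above.

```go
package main

import (
	"crypto/tls"
	"log"
	"net/http"
	"path/filepath"
)

// newWebhookServer loads tls.crt and tls.key from certDir and returns an
// HTTPS server configured with that key pair.
func newWebhookServer(certDir string, handler http.Handler) (*http.Server, error) {
	pair, err := tls.LoadX509KeyPair(
		filepath.Join(certDir, "tls.crt"),
		filepath.Join(certDir, "tls.key"),
	)
	if err != nil {
		return nil, err
	}
	return &http.Server{
		Addr:    ":8443", // assumed port; it must match the webhook service
		Handler: handler,
		TLSConfig: &tls.Config{
			Certificates: []tls.Certificate{pair},
			MinVersion:   tls.VersionTLS12,
		},
	}, nil
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/mutate", func(w http.ResponseWriter, r *http.Request) {
		// Placeholder: the mutation logic would go here.
		w.WriteHeader(http.StatusNotImplemented)
	})

	srv, err := newWebhookServer("/etc/webhook/certs", mux)
	if err != nil {
		log.Printf("certificate and key not available: %v", err)
		return
	}
	// Empty file names make the server use TLSConfig.Certificates.
	log.Fatal(srv.ListenAndServeTLS("", ""))
}
```

    Because the init container finishes before the server container starts, the files are expected to be in place by the time this code runs.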

    The only part remaining is to give this CA bundle information to the API server using mutation/validation configs. This can be done in two ways:

    • Patch the CA bundle in the existing MutatingWebhookConfiguration or ValidatingWebhookConfiguration using Kubernetes go-client in the init container.
    • Create MutatingWebhookConfiguration or ValidatingWebhookConfiguration in the init container itself with CA bundle information in configs.
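
    For the first option, the patch body itself is small. The sketch below builds a JSON Patch that sets caBundle on one webhook entry of an existing configuration; the helper name caBundlePatch is an assumption for this example, and sending the patch to the API server is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// caBundleOp is a JSON Patch (RFC 6902) operation whose value is a CA bundle.
type caBundleOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value []byte `json:"value"` // encoding/json emits []byte as base64
}

// caBundlePatch returns a JSON Patch that sets clientConfig.caBundle on the
// webhook at webhookIndex. "add" on an object member also overwrites an
// existing value, so it works whether or not caBundle is already set.
func caBundlePatch(webhookIndex int, caPEM []byte) ([]byte, error) {
	return json.Marshal([]caBundleOp{{
		Op:    "add",
		Path:  fmt.Sprintf("/webhooks/%d/clientConfig/caBundle", webhookIndex),
		Value: caPEM,
	}})
}

func main() {
	patch, err := caBundlePatch(0, []byte("-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

    Such a payload would typically be sent with the client's Patch call using the JSON patch type.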

    Here, we will create the configs through the init container. To get certain parameters dynamically, like the mutation config name, webhook service name, and webhook namespace, we can take these values from the init container's env:

    initContainers:
    - image: <webhook init-image name>
      imagePullPolicy: IfNotPresent
      name: webhook-init
      volumeMounts:
        - mountPath: /etc/webhook/certs
          name: webhook-certs
      env:
        - name: MUTATE_CONFIG
          value: mutating-webhook-configuration
        - name: VALIDATE_CONFIG
          value: validating-webhook-configuration
        - name: WEBHOOK_SERVICE
          value: webhook-service
        - name: WEBHOOK_NAMESPACE
          value:  default

    To create MutatingWebhookConfiguration, we will add the below piece of code in init container code below the certificate generation code.

    package main
    
    import (
    	"bytes"
    	"context"
    	"os"
    
    	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/client-go/kubernetes"
    	ctrl "sigs.k8s.io/controller-runtime"
    )
    
    func createMutationConfig(caCert *bytes.Buffer) {
    
    	var (
    		webhookNamespace, _ = os.LookupEnv("WEBHOOK_NAMESPACE")
    		mutationCfgName, _  = os.LookupEnv("MUTATE_CONFIG")
    		// validationCfgName, _ = os.LookupEnv("VALIDATE_CONFIG") Not used here in below code
    		webhookService, _ = os.LookupEnv("WEBHOOK_SERVICE")
    	)
    	config := ctrl.GetConfigOrDie()
    	kubeClient, err := kubernetes.NewForConfig(config)
    	if err != nil {
    		panic("failed to set go-client: " + err.Error())
    	}
    
    	path := "/mutate"
    	fail := admissionregistrationv1.Fail
    	// sideEffects and admissionReviewVersions are required fields in
    	// admissionregistration.k8s.io/v1; the values match the YAML shown earlier.
    	sideEffects := admissionregistrationv1.SideEffectClassNone
    
    	mutateconfig := &admissionregistrationv1.MutatingWebhookConfiguration{
    		ObjectMeta: metav1.ObjectMeta{
    			Name: mutationCfgName,
    		},
    		Webhooks: []admissionregistrationv1.MutatingWebhook{{
    			Name:                    "mapplication.kb.io",
    			AdmissionReviewVersions: []string{"v1beta1"},
    			SideEffects:             &sideEffects,
    			ClientConfig: admissionregistrationv1.WebhookClientConfig{
    				CABundle: caCert.Bytes(), // CA bundle created earlier
    				Service: &admissionregistrationv1.ServiceReference{
    					Name:      webhookService,
    					Namespace: webhookNamespace,
    					Path:      &path,
    				},
    			},
    			Rules: []admissionregistrationv1.RuleWithOperations{{
    				Operations: []admissionregistrationv1.OperationType{
    					admissionregistrationv1.Create,
    				},
    				Rule: admissionregistrationv1.Rule{
    					APIGroups:   []string{"apps"},
    					APIVersions: []string{"v1"},
    					Resources:   []string{"deployments"},
    				},
    			}},
    			FailurePolicy: &fail,
    		}},
    	}
    
    	// client-go v0.18 and later take a context and CreateOptions here;
    	// older releases accept only the object.
    	if _, err := kubeClient.AdmissionregistrationV1().MutatingWebhookConfigurations().Create(
    		context.TODO(), mutateconfig, metav1.CreateOptions{}); err != nil {
    		panic(err)
    	}
    }

    The code above is just sample code to create a MutatingWebhookConfiguration. First, we import the required packages. Then, we read the environment variables, like webhookNamespace. Next, we define the MutatingWebhookConfiguration struct with the CA bundle (created earlier) and the other required information. Finally, we create the configuration using the go-client. The same approach can be followed for creating the ValidatingWebhookConfiguration. To handle pod restarts or deletion, we can add extra logic to the init container, such as deleting the existing configs before creating new ones, or updating only the CA bundle if the configs already exist.

    For certificate rotation, the procedure depends on how the certificate is served to the server container:

    • If we are using emptyDir volume, then the approach will be to just restart the webhook pod. As emptyDir volume is ephemeral and bound to the lifecycle of the pod, on restart, a new certificate will be generated and served to the server container. A new CA bundle will be added in configs if configs already exist.
    • If we are using secret volume, then, while restarting the webhook pod, the expiration of the existing certificate from the secret can be checked to decide whether to use the existing certificate for the server or create a new one.
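
    For the secret volume case, the expiration check could be sketched as follows: parse the existing PEM certificate and report whether it expires within a chosen window. The names needsRenewal, selfSigned, and day are assumptions for this example, and the throwaway certificate uses an ECDSA key only to keep the example fast.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

const day = 24 * time.Hour

// needsRenewal reports whether the certificate in certPEM expires within
// the given window, counted from the current time.
func needsRenewal(certPEM []byte, window time.Duration) (bool, error) {
	block, _ := pem.Decode(certPEM)
	if block == nil || block.Type != "CERTIFICATE" {
		return false, errors.New("no PEM certificate found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return false, err
	}
	return time.Until(cert.NotAfter) <= window, nil
}

// selfSigned returns a throwaway PEM certificate that stays valid for validFor.
func selfSigned(validFor time.Duration) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(validFor),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	certPEM, err := selfSigned(2 * day)
	if err != nil {
		panic(err)
	}
	for _, window := range []time.Duration{day, 3 * day} {
		due, err := needsRenewal(certPEM, window)
		if err != nil {
			panic(err)
		}
		fmt.Printf("renew within %v: %v\n", window, due)
	}
}
```

    The init container could run such a check against the certificate stored in the secret and only generate a new pair, and update the CA bundle, when the check reports that renewal is due.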

    In both cases, a webhook pod restart is required to trigger the certificate rotation/renewal process. When and how the webhook pod is restarted will vary depending on the use case. A few possible ways are a cron job, a controller, etc.

    Now, our custom webhook is registered, the API server can read CA bundle information through configs, and the webhook server is ready to serve the mutation/validation requests as per rules defined in configs. 

    Conclusion:

    We covered how to add additional mutation/validation checks by registering our own custom admission webhook server. We also covered how to automatically handle the webhook server TLS certificate and key using init containers and how to pass the CA bundle information to the API server through mutation/validation configs.
