Author: admin

  • Elasticsearch – Basic and Advanced Concepts

    What is Elasticsearch?

    In our previous blog, we saw that Elasticsearch is a highly scalable open-source full-text search and analytics engine built on top of Apache Lucene. Elasticsearch allows you to store, search, and analyze huge volumes of data quickly and in near real-time.

    Basic Concepts –

    • Index – Large collection of JSON documents. Can be compared to a database in relational databases. Every document must reside in an index.
    • Shards – Since there is no limit on the number of documents that can reside in an index, indices are often horizontally partitioned into shards that reside on nodes in the cluster.
      Max documents allowed in a shard = 2,147,483,519 (as of now)
    • Type – Logical partition of an index. Similar to a table in relational databases. Mapping types were deprecated in Elasticsearch 6.x and removed in later major versions.
    • Fields – Similar to a column in relational databases. 
    • Analyzers – Used while indexing/searching the documents. These contain “tokenizers” that split phrases/text into tokens and “token filters” that filter/modify tokens during indexing & searching.
    • Mappings – Combination of Fields + Analyzers. They define how your fields are stored & indexed.

    Inverted Index

    ES uses inverted indexes under the hood. An inverted index is an index which maps terms to the documents containing them.

    Let’s say we have 3 documents:

    1. Food is great
    2. It is raining
    3. Wind is strong

    An inverted index for these documents can be constructed as (terms lowercased) –

    Term      Documents
    food      1
    great     1
    is        1, 2, 3
    it        2
    raining   2
    strong    3
    wind      3

    The terms in the dictionary are stored in a sorted order to find them quickly.

    Searching for multiple terms is done by looking up each term in the index. ES then performs either a UNION or an INTERSECTION on the resulting document lists and fetches the matching documents.
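
    The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Lucene stores its index: the function names are made up for this example, and tokenization is a plain lowercase split on whitespace.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, terms, mode="and"):
    """Look up every term, then intersect (AND) or union (OR) the posting sets."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    if mode == "and":
        return set.intersection(*postings)
    return set.union(*postings)

docs = {1: "Food is great", 2: "It is raining", 3: "Wind is strong"}
index = build_inverted_index(docs)

# The term dictionary is iterated in sorted order, as in the text above.
for term in sorted(index):
    print(term, sorted(index[term]))

print(search(index, ["is", "raining"], mode="and"))  # intersection -> {2}
print(search(index, ["food", "wind"], mode="or"))    # union -> {1, 3}
```

    An AND search intersects the posting sets, so adding terms can only narrow the result, while an OR search unions them.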

    An ES index spans multiple shards. While indexing, each document is routed to a shard based on a hash of its routing value, which is the document _id by default. We can customize which shard a document is routed to, and which shards search requests are sent to.

    An ES index is made of multiple Lucene indexes, which in turn are made up of index segments. These are write-once, read-many indices, i.e., the index files Lucene writes are immutable (except for deletions).
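
    By default Elasticsearch picks the shard from a hash of the routing value (the document _id unless a custom routing value is given), taken modulo the number of primary shards. A rough sketch of that rule follows. Note the assumptions: CRC32 stands in for the Murmur3 hash Elasticsearch uses, and the shard count and document IDs are made up.

```python
import zlib

def route_to_shard(routing_value: str, num_primary_shards: int) -> int:
    """Pick a shard as hash(routing) mod number_of_primary_shards.

    Assumption: CRC32 is only a stand-in for the real hash function,
    so the shard numbers here will not match a real cluster.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

NUM_SHARDS = 5  # hypothetical index with 5 primary shards

for doc_id in ["user-1", "user-2", "user-3"]:
    print(doc_id, "->", route_to_shard(doc_id, NUM_SHARDS))
```

    Because the same routing value always maps to the same shard, a search that supplies that routing value only needs to hit one shard. It also shows why the number of primary shards cannot simply be changed later: the modulo would send existing documents to different shards.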

    Analyzers –

    Analysis is the process of converting text into tokens or terms which are added to the inverted index for searching. Analysis is performed by an analyzer, which can be either built-in or custom.

    We can define a single analyzer for both indexing & searching, or separate index and search analyzers for a field in a mapping.

    Building blocks of an analyzer –

    • Character filters – receive the original text as a stream of characters and can transform the stream by adding, removing, or changing characters.
    • Tokenizers – receive a stream of characters and break it up into individual tokens.
    • Token filters – receive the token stream and may add, remove, or change tokens.
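
    The three building blocks chain together in a fixed order: character filters, then one tokenizer, then token filters. The sketch below mimics that flow in plain Python. All function names and the tiny stop list are hypothetical, and the real Elasticsearch components are considerably more sophisticated.

```python
import re

def html_strip_char_filter(text):
    """Character filter: remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)

def whitespace_tokenizer(text):
    """Tokenizer: break the character stream into tokens on whitespace."""
    return text.split()

def lowercase_filter(tokens):
    """Token filter: change tokens by lower-casing them."""
    return [t.lower() for t in tokens]

def stop_filter(tokens, stop_words=frozenset({"the", "is", "a"})):
    """Token filter: remove tokens found in a (hypothetical) stop list."""
    return [t for t in tokens if t not in stop_words]

def analyze(text, char_filters, tokenizer, token_filters):
    """Run char filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

tokens = analyze(
    "<b>The</b> Food is GREAT",
    char_filters=[html_strip_char_filter],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase_filter, stop_filter],
)
print(tokens)  # ['food', 'great']
```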

    Some Commonly used built-in analyzers –

    1. Standard –

    Divides text into terms on word boundaries. Lower-cases all terms. Removes punctuation and stopwords (if specified, default = None).

    Text:  The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [the, 2, quick, brown, foxes, jumped, over, the, lazy, dog’s, bone]

    2. Simple/Lowercase –

    Divides text into terms whenever it encounters a non-letter character. Lower-cases all terms.

    Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]

    3. Whitespace –

    Divides text into terms whenever it encounters a white-space character.

    Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog’s, bone.]

    4. Stop –

    Same as the simple analyzer, with stop word removal enabled by default.

    Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [ quick, brown, foxes, jumped, over, lazy, dog, s, bone]

    5. Keyword / NOOP –

    Returns the entire input string as it is.

    Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.

    Output: [The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.]
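
    To make the differences concrete, here is a rough Python approximation of the simple, whitespace, stop and keyword analyzers applied to the same sentence. These are stand-ins written for illustration, not the Lucene implementations, and the stop word set is a small assumed subset of an English stop list.

```python
import re

TEXT = "The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone."

# Assumption: a small subset of English stop words, enough for this sentence.
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}

def simple_analyzer(text):
    """Split on any non-letter character and lower-case every term."""
    return re.findall(r"[^\W\d_]+", text.lower())

def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()

def stop_analyzer(text):
    """Simple analyzer followed by stop word removal."""
    return [term for term in simple_analyzer(text) if term not in STOP_WORDS]

def keyword_analyzer(text):
    """Return the whole input as a single term."""
    return [text]

for analyzer in (simple_analyzer, whitespace_analyzer, stop_analyzer, keyword_analyzer):
    print(analyzer.__name__, analyzer(TEXT))
```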

    Some Commonly used built-in tokenizers –

    1. Standard –

    Divides text into terms on word boundaries, removes most punctuation.

    2. Letter –

    Divides text into terms whenever it encounters a non-letter character.

    3. Lowercase –

    Letter tokenizer which lowercases all tokens.

    4. Whitespace –

    Divides text into terms whenever it encounters any white-space character.

    5. UAX URL Email (uax_url_email) –

    Standard tokenizer which recognizes URLs and email addresses as single tokens.

    6. N-Gram –

    Divides text into terms when it encounters anything from a list of specified characters (e.g. whitespace or punctuation), and returns n-grams of each word: a sliding window of contiguous letters. For example, with a minimum length of 2 and a maximum length of 5, quick → [qu, qui, quic, quick, ui, uic, uick, ic, ick, ck].

    7. Edge-N-Gram –

    Similar to the N-Gram tokenizer, but the n-grams are anchored to the start of the word (prefix-based n-grams), e.g. quick → [q, qu, qui, quic, quick].

    8. Keyword –

    Emits exact same text as a single term.
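
    The difference between n-grams and edge n-grams is easy to see in code. The following is a minimal sketch of the two sliding-window schemes for a single word. The function names and parameters are chosen for this example and only loosely mirror the min_gram/max_gram options of the real tokenizers.

```python
def ngrams(word, min_gram, max_gram):
    """All substrings of length min_gram..max_gram, via a sliding window."""
    return [
        word[start:start + size]
        for start in range(len(word))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(word)
    ]

def edge_ngrams(word, min_gram, max_gram):
    """Only the n-grams anchored to the start of the word (prefixes)."""
    longest = min(max_gram, len(word))
    return [word[:size] for size in range(min_gram, longest + 1)]

print(ngrams("quick", 2, 5))
# ['qu', 'qui', 'quic', 'quick', 'ui', 'uic', 'uick', 'ic', 'ick', 'ck']
print(edge_ngrams("quick", 1, 5))
# ['q', 'qu', 'qui', 'quic', 'quick']
```

    Plain n-grams support substring search at the cost of many more tokens per word, while edge n-grams only support prefix search but produce far fewer tokens.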

    Make your mappings right –

    Analyzers, if not set up right, can increase your search time considerably.

    Avoid using regular expressions in queries as much as possible. Let your analyzers handle them.

    ES provides multiple tokenizers (standard, whitespace, ngram, edge-ngram, etc.) which can be used directly, or you can create your own tokenizer.

    Consider a simple use-case where we had to search for a user who has either “brad” in their name or “brad_pitt” in their email (substring-based search). If no proper analyzers are defined for this mapping, one would simply write a regex for this query.

    {
      "query": {
        "bool": {
          "should": [
            {
              "regexp": {
                "email.raw": ".*brad_pitt.*"
              }
            },
            {
              "regexp": {
                "name.raw": ".*brad.*"
              }
            }
          ]
        }
      }
    }

    This took 16s for us to fetch 100,000 (1 lakh) out of 60 million documents.

    Instead, we created an n-gram analyzer with a lowercase filter, which generates all the relevant tokens at index time.
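
    The article does not show the analyzer definition itself, so the following is only a guess at what such a setup might look like, written as a Python dict that serializes to the JSON body for index creation. Every name (substring_tokenizer, substring_analyzer) and every number (min_gram, max_gram, max_ngram_diff) is an assumption for illustration, not the configuration used for the timings in this post.

```python
import json

# Hypothetical analysis settings: an n-gram tokenizer plus a lowercase filter.
index_settings = {
    "settings": {
        # Assumption: needed on versions that limit max_gram - min_gram.
        "index": {"max_ngram_diff": 7},
        "analysis": {
            "tokenizer": {
                "substring_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 10,
                    # "punctuation" keeps "_" inside tokens such as brad_pitt.
                    "token_chars": ["letter", "digit", "punctuation"],
                }
            },
            "analyzer": {
                "substring_analyzer": {
                    "type": "custom",
                    "tokenizer": "substring_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        },
    }
}

# Hypothetical field definition exposing the analyzer as a "suggestion" sub-field.
email_field = {
    "type": "text",
    "fields": {
        "raw": {"type": "keyword"},
        "suggestion": {"type": "text", "analyzer": "substring_analyzer"},
    },
}

print(json.dumps(index_settings, indent=2))
print(json.dumps(email_field, indent=2))
```

    The n-grams are produced once at index time, so a query against the suggestion sub-field becomes a cheap term lookup instead of a regex scan. The trade-off is a larger index.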

    The above regex query was updated to –

    {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "brad",
                "fields": [
                  "email.suggestion",
                  "full_name.suggestion"
                ]
              }
            }
          ]
        }
      },
      "size": 25
    }

    This took 109ms for us to fetch 100,000 (1 lakh) out of 60 million documents.

    Thus, a search query which previously took 10-25s was reduced to under 800-900ms for the same set of records.

    Had the use-case been to find results where the name starts with “brad” or the email starts with “brad_pitt” (prefix-based search), an edge n-gram analyzer or suggesters would have been a better fit.

    Performance Improvement with Filter Queries –

    Use filter queries whenever possible.

    ES usually scores documents and returns them sorted by score. This can hurt performance when scoring is not relevant to our use-case. In such scenarios, use “filter” queries, which only decide whether a document matches (yes/no) without computing a relevance score, and whose results can be cached.

    {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "brad",
                "fields": [
                  "email.suggestion",
                  "full_name.suggestion"
                ]
              }
            }
          ]
        }
      },
      "size": 25
    }

    The above query can now be written as –

    {
      "query": {
        "bool": {
          "filter": {
            "bool": {
              "must": [
                {
                  "multi_match": {
                    "query": "brad",
                    "fields": [
                      "email.suggestion",
                      "full_name.suggestion"
                    ]
                  }
                }
              ]
            }
          }
        }
      },
      "size": 25
    }

    This will reduce query-time by a few milliseconds.

    Re-indexing made faster –

    Before creating any mappings, know your use-case well.

    Unlike the “ALTER” command in relational databases, ES does not allow us to alter the mapping of an existing field, although we can keep adding new fields to the mapping.

    The only way to change existing mappings is to create a new index, re-index the existing documents into it, and then point an alias with the required name at the new index, which gives ZERO downtime on production. Note – This process can take days if you have millions of records to re-index.

    To re-index faster, we can change a few settings –

    1. Disable swapping – Since no requests will be directed to the new index till indexing is done, we can safely disable swap.
    Command for Linux machines –

    sudo swapoff -a

    2. Disable refresh_interval for ES – The default refresh_interval is 1s; it can safely be disabled (set to -1) while documents are being re-indexed.

    3. Change bulk size while indexing – ES re-indexes documents in batches of 1,000 by default. Increasing this to roughly 5,000 to 10,000 usually helps, although we need to find the sweet spot while re-indexing to avoid overloading the current index.

    4. Reset replica count to 0 – ES creates 1 replica per primary shard by default. We can set this to 0 while indexing & reset it to the required value once indexing is done.
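
    Steps 2 to 4, plus the final alias switch, map onto a handful of REST request bodies. The sketch below builds them as Python dicts and prints the JSON. The index and alias names (users_v1, users_v2, users) are hypothetical, and the batch size and restored values are assumptions that need tuning for a real cluster.

```python
import json

OLD_INDEX, NEW_INDEX, ALIAS = "users_v1", "users_v2", "users"  # hypothetical names

# PUT /users_v2/_settings  (before re-indexing: no refresh, no replicas)
bulk_load_settings = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}

# POST /_reindex  (copy documents in batches larger than the default 1000)
reindex_body = {
    "source": {"index": OLD_INDEX, "size": 5000},
    "dest": {"index": NEW_INDEX},
}

# PUT /users_v2/_settings  (after re-indexing: restore the normal values)
restore_settings = {"index": {"refresh_interval": "1s", "number_of_replicas": 1}}

# POST /_aliases  (move the alias from the old index to the new one in one call)
alias_body = {
    "actions": [
        {"remove": {"index": OLD_INDEX, "alias": ALIAS}},
        {"add": {"index": NEW_INDEX, "alias": ALIAS}},
    ]
}

steps = [
    ("before re-index", bulk_load_settings),
    ("re-index", reindex_body),
    ("after re-index", restore_settings),
    ("switch alias", alias_body),
]
for name, body in steps:
    print(f"# {name}")
    print(json.dumps(body, indent=2))
```

    Both alias actions go in a single request so the switch is atomic, which is what makes the cut-over possible without downtime.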

    Conclusion

    Elasticsearch is a very powerful database for text-based searches. The Elastic ecosystem is widely used for reporting, alerting, machine learning, etc. This article only gives an overview of Elasticsearch mappings and how creating relevant mappings can improve your query performance & accuracy. Giving the right mappings and the right resources to your Elasticsearch cluster can do wonders.

  • Elasticsearch 101: Fundamentals & Core Components

    Elasticsearch is currently the most popular way to implement free text search and analytics in applications. It is highly scalable and can easily manage petabytes of data. It supports a variety of use cases, such as letting users easily search through any portal, collecting and analyzing log data, and building business intelligence dashboards to quickly analyze and visualize data.

    This blog acts as an introduction to Elasticsearch and covers the basic concepts of clusters, nodes, index, document and shards.

    What is Elasticsearch?

    Elasticsearch (ES) combines an open-source, distributed, highly scalable data store with Lucene, a search engine that supports extremely fast full-text search. It is a beautifully crafted piece of software which hides the internal complexities and provides full-text search capabilities through simple REST APIs. Elasticsearch is written in Java with Apache Lucene at its core. It should be clear that Elasticsearch is not like a traditional RDBMS. It is not suitable for your transactional database needs, and hence, in my opinion, it should not be your primary data store. It is a common practice to use a relational database as the primary data store and index only the required data into Elasticsearch.

    Elasticsearch is meant for fast text search. Several characteristics make it different from an RDBMS. Unlike an RDBMS, Elasticsearch stores data in the form of JSON documents, which are denormalized, and it doesn’t support transactions, referential integrity, joins, or subqueries.

    Elasticsearch works with structured, semi-structured, and unstructured data as well. In the next section, let’s walk through the various components in Elasticsearch.

    Elasticsearch Components

    Cluster

    One or more servers collectively providing indexing and search capabilities form an Elasticsearch cluster. The cluster size can vary from a single node to thousands of nodes, depending on the use cases.

    Node

    A node is a single physical or virtual machine that holds all or part of your data and provides computing power for indexing and searching your data. Every node is identified by a unique name. If the node identifier is not specified, a random UUID is assigned as the node identifier at startup. Every node configuration has the property `cluster.name`. The cluster is formed automatically from all the nodes that have the same `cluster.name` at startup.

    A node has to accomplish several duties such as:

    • storing the data
    • performing operations on data (indexing, searching, aggregation, etc.)
    • maintaining the health of the cluster

    Each node in a cluster can do all these operations. Elasticsearch provides the capability to split responsibilities across different nodes. This makes it easy to scale, optimize, and maintain the cluster. Based on the responsibilities, the following are the different types of nodes that are supported:

    Data Node

    A data node is a node that has storage and computation capability. A data node stores part of the data in the form of shards (explained in a later section). Data nodes also participate in CRUD, search, and aggregation operations. These operations are resource-intensive, and hence, it is a good practice to have dedicated data nodes without the additional load of cluster administration. By default, every node of the cluster is a data node.

    Master Node

    Master nodes are reserved for administrative tasks. Master nodes track the availability/failure of the data nodes. The master nodes are responsible for creating and deleting indices (explained in a later section).

    This makes the master node a critical part of the Elasticsearch cluster. It has to be stable and healthy. A single master node for a cluster is certainly a single point of failure. Elasticsearch provides the capability to have multiple master-eligible nodes. All the master-eligible nodes participate in an election to elect a master node. It is recommended to have a minimum of three master-eligible nodes in the cluster to avoid a split-brain situation. By default, all nodes are both data nodes and master-eligible nodes. However, some nodes can be made dedicated master-eligible nodes through explicit configuration.
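
    The reason three is the usual minimum comes from simple majority arithmetic: a master can only be elected by more than half of the master-eligible nodes, so two isolated halves of a cluster can never both elect one. A small sketch of that calculation follows; the function names are ours, and in older Elasticsearch versions this majority had to be configured by hand (as discovery.zen.minimum_master_nodes).

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1

def tolerated_failures(master_eligible_nodes: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)

for n in (1, 2, 3, 5):
    print(f"{n} master-eligible: quorum={quorum(n)}, can lose {tolerated_failures(n)}")
```

    With one or two master-eligible nodes, no failure can be tolerated; three is the smallest count that survives the loss of one node.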

    Coordinating-Only Node

    Any node which is neither a master node nor a data node is a coordinating-only node. Coordinating nodes act as smart load balancers. They are exposed to end-user requests and route those requests appropriately to the data nodes and master nodes.

    For example, a user’s search request is sent to different data nodes. Each data node searches locally and sends its result back to the coordinating node. The coordinating node aggregates the results and returns them to the user.
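
    This scatter-and-gather flow can be illustrated with a toy simulation. The shard contents and scores below are invented, and real searches involve separate query and fetch phases, so treat this purely as a sketch of the merge step.

```python
import heapq

# Hypothetical per-shard hits as (score, doc_id) pairs.
shards = {
    "shard-0": [(0.9, "doc-1"), (0.4, "doc-7"), (0.2, "doc-3")],
    "shard-1": [(0.8, "doc-5"), (0.7, "doc-2")],
    "shard-2": [(0.95, "doc-9"), (0.1, "doc-4")],
}

def search_shard(hits, size):
    """Each data node returns only its local top `size` hits."""
    return heapq.nlargest(size, hits)

def coordinate(shards, size):
    """Scatter the request to every shard, then merge the partial results."""
    partial = [hit for hits in shards.values() for hit in search_shard(hits, size)]
    return heapq.nlargest(size, partial)

print(coordinate(shards, size=3))
# [(0.95, 'doc-9'), (0.9, 'doc-1'), (0.8, 'doc-5')]
```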

    There are a few concepts that are core to Elasticsearch. Understanding these basic concepts will tremendously ease the learning process.

    Index

    An index is a container for data, similar to a database in relational databases. An index contains a collection of documents that have similar characteristics or are logically related. In an e-commerce website, for example, there will be one index for products, one for customers, and so on. Indices are identified by a lowercase name. The index name is required to perform add, update, and delete operations on the documents.

    Type

    A type is a logical grouping of the documents within an index. In the previous example of the product index, we can further group documents into types like electronics, fashion, furniture, etc. Types are defined on the basis of documents having similar properties. It isn’t easy to decide when to use a type over an index. Indices have more overhead, so sometimes it is better to use different types in the same index for better performance. There are a couple of restrictions on using types as well. For example, two fields having the same name in different types should be of the same datatype (string, date, etc.). Note that mapping types were deprecated in Elasticsearch 6.x and removed in later major versions, so this concept applies to older clusters.

    Document

    A document is the basic unit of data indexed by Elasticsearch. A document is represented in JSON format. We can add as many documents as we want to an index. The following snippet shows how to create a document of type mobile in the index store. We will cover the individual fields of the document in more detail in the Mapping Types section.

    POST <hostname:port>/store/mobile/
    {
      "name": "Motorola G5",
      "model": "XT3300",
      "release_date": "2016-01-01",
      "features": "16 GB ROM | Expandable Upto 128 GB | 5.2 inch Full HD Display | 12MP Rear Camera | 5MP Front Camera | 3000 mAh Battery | Snapdragon 625 Processor",
      "ram_gb": "3",
      "screen_size_inches": "5.2"
    }

    Mapping Types

    To create different types in an index, we need mapping types (or simply mappings) to be specified during index creation. Mappings can be defined as a list of directives given to Elasticsearch about how the data is supposed to be stored and retrieved. It is important to provide mapping information at the time of index creation based on how we want to retrieve our data later. In the context of relational databases, think of mappings as a table schema.

    Mapping provides information on how to treat each JSON field. For example, the field can be of type date, geolocation, or person name. Mappings also allow specifying which fields will participate in the full-text search, and specify the analyzers used to transform and decorate data before storing into an index. If no mapping is provided, Elasticsearch tries to identify the schema itself, known as Dynamic Mapping. 

    Each mapping type has Meta Fields and Properties. The snippet below shows the mapping of the type mobile.

    {
      "mappings": {
        "mobile": {
          "properties": {
            "name": {
              "type": "keyword"
            },
            "model": {
              "type": "keyword"
            },
            "release_date": {
              "type": "date"
            },
            "features": {
              "type": "text"
            },
            "ram_gb": {
              "type": "short"
            },
            "screen_size_inches": {
              "type": "float"
            }
          }
        }
      }
    }

    Meta Fields

    As the name indicates, meta fields store additional information about the document. Meta fields are meant mostly for internal usage, and it is unlikely that the end-user has to deal with them. Meta field names start with an underscore. There are around ten meta fields in total. We will talk about some of them here:

    _index

    It stores the name of the index the document belongs to. This is used internally to store/search the document within an index.

    _type

    It stores the type of the document. To get better performance, it is often included in search queries.

    _id

    This is the unique id of the document. It is used to access a specific document directly through the HTTP GET API.

    _source

    This holds the original JSON document before any analyzers/transformations are applied. It is important to note that Elasticsearch can only query fields that are indexed (those covered by the mapping). The _source field is not indexed and hence can’t be queried, but it can be included in the final search result.

    Fields Or Properties

    The list of fields specifies which JSON fields in the document should be included in a particular type. In the e-commerce website example, mobile can be a type. It will have fields like operating_system, camera_specification, ram_size, etc.

    Fields also carry data type information. This tells Elasticsearch how to store and search each specific field. Data types are similar to what we see in any programming language. We will talk about a few of them here.

    Simple Data Types

    Text

    This data type is used to store full text, such as a product description. These fields participate in full-text search. They are analyzed while being stored, which makes it possible to search them by the individual words they contain. Such fields are not used in sorting and aggregation queries.

    Keyword

    This type is also used to store text data, but unlike Text, it is stored as-is without being analyzed. It is suitable for information like a user’s mobile number, city, age, etc. These fields are used in filter, aggregation, and sorting queries. For example, list all users from a particular city and filter them by age.

    Numeric

    Elasticsearch supports a wide range of numeric types: long, integer, short, byte, double, float.

    There are a few more data types to support date, boolean (true/false, on/off, 1/0), IP (to store IP addresses).

    Special Data Types

    Geo Point

    This data type is used to store a geographical location. It accepts a latitude and longitude pair. For example, this data type can be used to arrange a user’s photo library by geographical location or to graphically display the locations trending in social media news.

    Geo Shape

    It allows storing arbitrary geometric shapes like rectangle, polygon, etc.

    Completion Suggester

    This data type is used to provide an auto-completion feature over a specific field. As the user types, the completion suggester can guide the user towards particular results.

    Complex Data Types

    Object

    If you know JSON well, this concept won’t be new for you. Elasticsearch also allows storing nested JSON object structure as a document.

    Nested

    The Object data type is of limited use for arrays of inner objects because of its underlying representation in the Lucene index. A Lucene index does not support inner JSON objects, so ES flattens the original JSON to make it compatible with Lucene. As a result, the fields of multiple inner objects get merged into one, which leads to wrong search results. In such cases, you will usually want the Nested data type rather than Object.
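
    The flattening problem is easiest to see with a concrete example. The sketch below imitates how an array of inner objects ends up as multi-valued fields, and why a query for first name “alice” and last name “smith” then matches even though no single person has that name. The document, field names and helper functions are invented for illustration.

```python
doc = {
    "users": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ]
}

def flatten_object(field, objects):
    """Object-style storage: each inner field becomes one multi-valued field."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value.lower())
    return flat

def match_flattened(flat, field, first, last):
    """Both values exist somewhere in the document, not necessarily together."""
    return first in flat[f"{field}.first"] and last in flat[f"{field}.last"]

def match_nested(objects, first, last):
    """Nested-style matching: both conditions must hold within one inner object."""
    return any(
        obj["first"].lower() == first and obj["last"].lower() == last
        for obj in objects
    )

flat = flatten_object("users", doc["users"])
print(flat)
# {'users.first': ['john', 'alice'], 'users.last': ['smith', 'white']}
print(match_flattened(flat, "users", "alice", "smith"))  # True (a false positive)
print(match_nested(doc["users"], "alice", "smith"))      # False
```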

    Shards

    Shards help with enabling Elasticsearch to become horizontally scalable. An index can store millions of documents and occupy terabytes of data. This can cause problems with performance, scalability, and maintenance. Let’s see how Shards help achieve scalability.

    Indices are divided into multiple units called shards. A shard is a full-featured subset of an index. Shards of the same index can reside on the same or different nodes of the cluster. The number of shards determines the degree of parallelism for search and indexing operations, and shards allow the cluster to grow horizontally. The number of shards per index can be specified at the time of index creation. By default, the number of shards created is 5 (in versions before 7.0; newer versions default to 1). However, once the index is created, the number of shards cannot be changed. To change the number of shards, reindex the data.

    Replication

    Hardware can fail at any time. To ensure fault tolerance and high availability, ES provides a feature to replicate the data. Shards can be replicated. A shard which is being copied is called a primary shard. The copy of the primary shard is called a replica shard, or simply a replica. Like the number of shards, the number of replicas can also be specified at the time of index creation. Replication serves two purposes:

    • High Availability – A replica is never created on the same node as its primary shard. This ensures that the data remains available through the replica shard even if an entire node fails.
    • Performance – Replicas also contribute to search capacity. Search queries are executed in parallel across the replicas.

    To summarize, to achieve high availability and performance, the index is split into multiple shards. In a production environment, multiple replicas are created for every index. In a replicated index, only primary shards can serve write requests. However, all the shards (the primary shards as well as the replica shards) can serve read/query requests. The replication factor is defined at the time of index creation and can be changed later if required. Choosing the number of shards is an important exercise, because once defined, it can’t be changed. In critical scenarios, changing the number of shards requires creating a new index with the required number of shards and reindexing the old data.
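
    In request terms, the shard count is fixed in the index creation body while the replica count can be updated afterwards. The sketch below shows both bodies as Python dicts, together with a helper that counts how many shard copies the cluster has to host. The index name store comes from the earlier example; the counts and the helper name are assumptions.

```python
import json

# PUT /store  (shard count is chosen once, at index creation)
create_index_body = {
    "settings": {"index": {"number_of_shards": 5, "number_of_replicas": 1}}
}

# PUT /store/_settings  (replica count can still be changed later)
update_replicas_body = {"index": {"number_of_replicas": 2}}

def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every primary shard plus `replicas` copies of each."""
    return primaries * (1 + replicas)

print(json.dumps(create_index_body, indent=2))
print(json.dumps(update_replicas_body, indent=2))
print(total_shard_copies(5, 1))  # 10
print(total_shard_copies(5, 2))  # 15
```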

    Summary

    In this blog, we have covered the basic but important aspects of Elasticsearch. In the following posts, I will talk about how indexing & searching works in detail. Stay tuned!

  • A Beginner’s Guide to Edge Computing

    In a world of data centers with wings and wheels, there is an opportunity to offload some work from centralized cloud computing by moving less compute-intensive tasks to other components of the architecture. In this blog, we will explore the upcoming frontier of the web – Edge Computing.

    What is the “Edge”?

    The ‘Edge’ refers to having computing infrastructure closer to the source of data. It is a distributed framework where data is processed as close to the originating data source as possible. This infrastructure requires effective use of resources that may not be continuously connected to a network, such as laptops, smartphones, tablets, and sensors. Edge Computing covers a wide range of technologies including wireless sensor networks, cooperative distributed peer-to-peer ad-hoc networking and processing, also classifiable as local cloud/fog computing, mobile edge computing, distributed data storage and retrieval, autonomic self-healing networks, remote cloud services, augmented reality, and more.

    Cloud Computing is expected to go through a phase of decentralization. Edge Computing brings the idea of moving compute, storage and networking closer to the consumer.

    But Why?

    Legit question! Why do we even need Edge Computing? What are the advantages of having this new infrastructure?

    Imagine a self-driving car that continuously sends a live stream to central servers. Now the car has to make a crucial decision. The consequences can be disastrous if the car waits for the central servers to process the data and respond. Although algorithms like YOLO_v2 have sped up object detection, the latency lies in the part of the system where the car has to send terabytes of data to the central server, receive the response, and only then act. Hence, basic processing, like deciding when to stop or decelerate, needs to be done in the car itself.

    The goal of Edge Computing is to minimize the latency by bringing the public cloud capabilities to the edge. This can be achieved in two forms – custom software stack emulating the cloud services running on existing hardware, and the public cloud seamlessly extended to multiple point-of-presence (PoP) locations.

    Following are some promising reasons to use Edge Computing:

    1. Privacy: Avoid sending all raw data to be stored and processed on cloud servers.
    2. Real-time responsiveness: Sometimes the reaction time can be a critical factor.
    3. Reliability: The system is capable of working even when disconnected from cloud servers. This removes a single point of failure.

    To understand the points mentioned above, let’s take the example of a device which responds to a hot keyword, for example, Jarvis from Iron Man. Imagine if your personal Jarvis sent all of your private conversations to a remote server for analysis. Instead, it is intelligent enough to respond only when it is called. At the same time, it is real-time and reliable.

    Intel CEO Brian Krzanich said at an event that autonomous cars will generate 40 terabytes of data for every eight hours of driving. With that flood of data, transmission time will go up substantially. For self-driving cars, real-time or quick decisions are essential, and this is where edge computing infrastructure comes to the rescue. These cars need to decide in a split second whether to stop or not, or else the consequences can be disastrous.

    Another example is drones or quadcopters. If we use them to identify people or deliver relief packages, the machines should be intelligent enough to make basic decisions locally, like changing their path to avoid obstacles.

    Forms of Edge Computing

    Device Edge:

    In this model, Edge Computing is brought to customers in their existing environments. Examples include AWS Greengrass and Microsoft Azure IoT Edge.

    Cloud Edge:

    This model of Edge Computing is basically an extension of the public cloud. Content Delivery Networks are classic examples of this topology, in which static content is cached and delivered through geographically spread edge locations.

    Vapor IO is an emerging player in this category. They are attempting to build infrastructure for the cloud edge. Vapor IO has various products, like Vapor Chamber. These are self-monitored: they have embedded sensors through which they are continuously monitored and evaluated by Vapor’s software, VEC (Vapor Edge Controller). They have also built OpenDCRE, which we will see later in this blog.

    The fundamental difference between device edge and cloud edge lies in the deployment and pricing models. The deployment of these models – device edge and cloud edge – is specific to different use cases. Sometimes, it may be an advantage to deploy both models.

    Edges around you

    Edge Computing examples can be increasingly found around us:

    1. Smart street lights
    2. Automated Industrial Machines
    3. Mobile devices
    4. Smart Homes
    5. Automated Vehicles (cars, drones etc)

    Data transmission is expensive. Bringing compute closer to the origin of data reduces latency and gives end users a better experience. Some of the evolving use cases of Edge Computing are Augmented Reality (AR), Virtual Reality (VR) and the Internet of Things. For example, the rush people got while playing an Augmented Reality based Pokemon game wouldn’t have been possible if the game had not been real-time. That was possible because the smartphone itself was doing the AR, not the central servers. Even Machine Learning (ML) can benefit greatly from Edge Computing. All the heavy-duty training of ML algorithms can be done in the cloud, and the trained model can be deployed on the edge for near real-time or even real-time predictions. In today’s data-driven world, edge computing is becoming a necessary component.

    There is a lot of confusion between Edge Computing and IoT. Stated simply, Edge Computing is, in a way, nothing but the intelligent Internet of Things (IoT). Edge Computing actually complements traditional IoT. In the traditional IoT model, all the devices, like sensors, mobiles, laptops, etc., are connected to a central server. Now imagine you give your lamp the command to switch off. For such a simple task, data needs to be transmitted to the cloud and analyzed there, and only then does the lamp receive a command to switch off. Edge Computing brings computing closer to your home: either the fog layer between the lamp and the cloud servers, or the lamp itself, is smart enough to process the data.

    In a standard IoT implementation, everything is centralized, whereas the Edge Computing philosophy is about decentralizing the architecture.

    The Fog  

    Sandwiched between the edge layer and the cloud layer is the Fog Layer. It bridges the connection between the other two layers.

    The difference between fog and edge computing is described in this article:

    • Fog Computing – Fog computing pushes intelligence down to the local area network level of network architecture, processing data in a fog node or IoT gateway.
    • Edge Computing – Edge computing pushes the intelligence, processing power and communication capabilities of an edge gateway or appliance directly into devices like programmable automation controllers (PACs).

    How do we manage Edge Computing?

    Device Relationship Management (DRM) refers to managing and monitoring interconnected components over the internet. Several platforms address this: AWS offers AWS IoT Core and AWS Greengrass, Nebbiolo Technologies has developed Fog Node and Fog OS, and Vapor IO has OpenDCRE, with which one can control and monitor data centers.

    The following image (source: AWS) shows how to manage ML on the edge using AWS infrastructure.

    AWS Greengrass makes it possible for users to use Lambda functions to build IoT devices and application logic. Specifically, AWS Greengrass provides cloud-based management of applications that can be deployed for local execution. Locally deployed Lambda functions are triggered by local events, messages from the cloud, or other sources.

    This GitHub repo demonstrates a traffic light example using two Greengrass devices, a light controller, and a traffic light.

    Conclusion

    We believe that next-gen computing will be influenced a lot by Edge Computing, and we will continue to explore the new use cases that the Edge makes possible.


  • The 7 Most Useful Design Patterns in ES6 (and how you can implement them)

    After spending a couple of years in JavaScript development, I’ve realized how incredibly important design patterns are in modern JavaScript (ES6). I’d love to share my experience and knowledge on the subject, hoping you’ll make them a critical part of your development process as well.

    Note: All the examples covered in this post are implemented with ES6 features, but you can also integrate the design patterns with ES5.

    At Velotio, we always follow best practices to achieve more maintainable and robust code, and we strongly believe that design patterns are one of the best ways to write clean code.

    In the post below, I’ve listed the most useful design patterns I’ve implemented so far and how you can implement them too:

    1. Module

    The module pattern simply allows you to keep units of code cleanly separated and organized. 

    Modules promote encapsulation: variables and functions stay private inside the module body unless they are exported, so code outside the module can’t overwrite them.

    Creating a module in ES6 is quite simple.

    // Addition module
    export const sum = (num1, num2) => num1 + num2;

    // usage
    import { sum } from 'modules/sum';
    const result = sum(20, 30); // 50

    ES6 also allows us to export the module as default. The following example gives you a better understanding of this.

    // All the variables and functions which are not exported are private within the module and cannot be used outside. Only the exported members are public and can be used by importing them.
    
    // Here the businessList is private member to city module
    const businessList = new WeakMap();
     
    // Here City uses the businessList member as it’s in same module
    class City {
     constructor() {
       businessList.set(this, ['Pizza Hut', 'Dominos', 'Street Pizza']);
     }
     
     // public method to access the private ‘businessList’
     getBusinessList() {
       return businessList.get(this);
     }
    
    // public method to add business to ‘businessList’
     addBusiness(business) {
       businessList.get(this).push(business);
     }
    }
     
    // export the City class as default module
    export default City;

    // usage
    import City from 'modules/city';
    const city = new City();
    city.getBusinessList();

    There is a great article written on the features of ES6 modules here.

    2. Factory

    Imagine a Notification Management application that currently only supports notifications through email, so most of the code lives inside the EmailNotification class. Now there is a new requirement for push notifications. Because the application is tightly coupled to EmailNotification, implementing PushNotification takes a lot of work, and you would have to repeat that work for every future notification type.

    To solve this complexity, we delegate object creation to another object, called a factory.

    class PushNotification {
     constructor(sendTo, message) {
       this.sendTo = sendTo;
       this.message = message;
     }
    }
     
    class EmailNotification {
     constructor(sendTo, cc, emailContent) {
       this.sendTo = sendTo;
       this.cc = cc;
       this.emailContent = emailContent;
     }
    }
     
    // Notification Factory
     
    class NotificationFactory {
     createNotification(type, props) {
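       switch (type) {
         case 'email':
           return new EmailNotification(props.sendTo, props.cc, props.emailContent);
         case 'push':
           return new PushNotification(props.sendTo, props.message);
         default:
           // fail loudly instead of silently returning undefined
           throw new Error(`Unknown notification type: ${type}`);
       }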
     }
    }
     
    // usage
    const factory = new NotificationFactory();
     
    // create email notification
    const emailNotification = factory.createNotification('email', {
     sendTo: 'receiver@domain.com',
     cc: 'test@domain.com',
     emailContent: 'This is the email content to be delivered.!',
    });
     
    // create push notification
    const pushNotification = factory.createNotification('push', {
     sendTo: 'receiver-device-id',
     message: 'The push notification message',
    });
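
    One possible variation, sketched below, replaces the switch statement with a registry, so that a new notification type can be added without editing the factory itself. The names `RegistryNotificationFactory`, `SmsNotification` and `register` are illustrative assumptions and not part of the example above.

```javascript
// A new notification type, introduced only for this sketch.
class SmsNotification {
  constructor({ sendTo, text }) {
    this.sendTo = sendTo;
    this.text = text;
  }
}

class RegistryNotificationFactory {
  constructor() {
    // Maps a type name to a function that builds the notification.
    this.creators = new Map();
  }

  register(type, creator) {
    this.creators.set(type, creator);
    return this; // returning the factory allows chained register() calls
  }

  createNotification(type, props) {
    const creator = this.creators.get(type);
    if (!creator) {
      throw new Error(`Unknown notification type: ${type}`);
    }
    return creator(props);
  }
}

// usage
const registryFactory = new RegistryNotificationFactory()
  .register('sms', props => new SmsNotification(props));

const sms = registryFactory.createNotification('sms', {
  sendTo: '+10000000000',
  text: 'This is the SMS text',
});
console.log(sms.sendTo, sms.text);
```

    The trade-off is one extra registration step per type in exchange for a factory that never needs to change.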

    3. Observer

    (Also known as the publish/subscribe pattern.)

    An observer pattern maintains the list of subscribers so that whenever an event occurs, it will notify them. An observer can also remove the subscriber if the subscriber no longer wishes to be notified.

    On YouTube, many times, the channels we’re subscribed to will notify us whenever a new video is uploaded.

    // Publisher
    class Video {
     constructor(observable, name, content) {
       this.observable = observable;
       this.name = name;
       this.content = content;
       // publish the ‘video-uploaded’ event
       this.observable.publish('video-uploaded', {
         name,
         content,
       });
     }
    }
    // Subscriber
    class User {
     constructor(observable) {
       this.observable = observable;
       this.interestedVideos = [];
       // subscribe with the event name and the callback function
       this.observable.subscribe('video-uploaded', this.addVideo.bind(this));
     }
     
     addVideo(video) {
       this.interestedVideos.push(video);
     }
    }
    // Observer 
    class Observable {
     constructor() {
       // keyed by event name, so a plain object is used rather than an array
       this.handlers = {};
     }
     
     subscribe(event, handler) {
       this.handlers[event] = this.handlers[event] || [];
       this.handlers[event].push(handler);
     }
     
     publish(event, eventData) {
       const eventHandlers = this.handlers[event];
     
       if (eventHandlers) {
         for (let i = 0, l = eventHandlers.length; i < l; ++i) {
           eventHandlers[i].call({}, eventData);
         }
       }
     }
    }
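    // usage
    const observable = new Observable();
    const user = new User(observable);
    // placeholder for the actual video content (assumed for this example)
    const videoFile = 'es6-design-patterns.mp4';
    const video = new Video(observable, 'ES6 Design Patterns', videoFile);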
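
    The description above mentions removing a subscriber, which the listing does not show. A minimal sketch of one way to do it follows; `UnsubscribableObservable` and its `unsubscribe` method are assumptions made for illustration, not part of the original classes.

```javascript
class UnsubscribableObservable {
  constructor() {
    // Maps an event name to the list of handlers subscribed to it.
    this.handlers = new Map();
  }

  subscribe(event, handler) {
    const list = this.handlers.get(event) || [];
    list.push(handler);
    this.handlers.set(event, list);
    // Hand back a function that removes exactly this subscription.
    return () => this.unsubscribe(event, handler);
  }

  unsubscribe(event, handler) {
    const list = this.handlers.get(event) || [];
    this.handlers.set(event, list.filter(h => h !== handler));
  }

  publish(event, eventData) {
    (this.handlers.get(event) || []).forEach(handler => handler(eventData));
  }
}

// usage
const channel = new UnsubscribableObservable();
const received = [];
const stop = channel.subscribe('video-uploaded', video => received.push(video.name));

channel.publish('video-uploaded', { name: 'First video' });
stop(); // the subscriber no longer wishes to be notified
channel.publish('video-uploaded', { name: 'Second video' });

console.log(received); // ['First video']
```

    Returning the removal function from `subscribe` means the caller does not have to keep a reference to the handler just to unsubscribe later.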

    4. Mediator

    The mediator pattern provides a unified interface through which different components of an application can communicate with each other.

    If a system appears to have too many direct relationships between components, it may be time to have a central point of control that components communicate through instead. 

    The mediator promotes loose coupling. 

    A real-world analogy is a traffic light that decides which vehicles can go and which must stop: all the communication is controlled by the traffic light.

    Let’s create a chatroom (mediator) through which the participants can register themselves. The chatroom is responsible for handling the routing when the participants chat with each other. 

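    // each participant is represented by a Participant object
    class Participant {
     constructor(name) {
       this.name = name;
       this.chatroom = null; // set by Chatroom.register()
     }
     getParticipantDetails() {
       return this.name;
     }
     // delegate sending to the chatroom (mediator)
     send(message, to) {
       this.chatroom.send(message, this, to);
     }
     // called by the chatroom when a message arrives
     receive(message, from) {
       console.log(`${from.name} to ${this.name}: ${message}`);
     }
    }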
     
    // Mediator
    class Chatroom {
     constructor() {
       this.participants = {};
     }
     
     register(participant) {
       this.participants[participant.name] = participant;
       participant.chatroom = this;
     }
     
     send(message, from, to) {
       if (to) {
         // single message
         to.receive(message, from);
       } else {
         // broadcast message to everyone
           for (const key in this.participants) {
           if (this.participants[key] !== from) {
             this.participants[key].receive(message, from);
           }
         }
       }
     }
    }
     
    // usage
    // Create two participants  
     const john = new Participant('John');
     const snow = new Participant('Snow');
    // Register the participants to Chatroom
     const chatroom = new Chatroom();
     chatroom.register(john);
     chatroom.register(snow);
    // Participants now chat with each other
     john.send('Hey, Snow!');
     john.send('Are you there?');
     snow.send('Hey man', john);
     snow.send('Yes, I heard that!');

    5. Command

    In the command pattern, an operation is wrapped as a command object and passed to the invoker object. The invoker object passes the command to the corresponding object, which executes the command.

    The command pattern decouples the objects executing the commands from the objects issuing them by encapsulating actions as objects. The invoker maintains a stack of commands: whenever a command is executed, it is pushed onto the stack. To undo a command, the invoker pops it from the stack and performs the reverse action.

    You can think of a calculator that performs addition, subtraction, division and multiplication, where each operation is encapsulated by a command object.

    // The list of operations that can be performed
    const addNumbers = (num1, num2) => num1 + num2;
    const subNumbers = (num1, num2) => num1 - num2;
    const multiplyNumbers = (num1, num2) => num1 * num2;
    const divideNumbers = (num1, num2) => num1 / num2;
     
    // CalculatorCommand is initialized with an execute function, an undo function
    // and the value
    class CalculatorCommand {
     constructor(execute, undo, value) {
       this.execute = execute;
       this.undo = undo;
       this.value = value;
     }
    }
    // Here we are creating the command objects
    const DoAddition = value => new CalculatorCommand(addNumbers, subNumbers, value);
    const DoSubtraction = value => new CalculatorCommand(subNumbers, addNumbers, value);
    const DoMultiplication = value => new CalculatorCommand(multiplyNumbers, divideNumbers, value);
    const DoDivision = value => new CalculatorCommand(divideNumbers, multiplyNumbers, value);
     
    // AdvancedCalculator maintains the list of executed commands so that it can
    // execute a command and undo the last executed one
    class AdvancedCalculator {
     constructor() {
       this.current = 0;
       this.commands = [];
     }
     
     execute(command) {
       this.current = command.execute(this.current, command.value);
       this.commands.push(command);
     }
     
     undo() {
       const command = this.commands.pop();
       // nothing to undo if no command has been executed yet
       if (!command) return;
       this.current = command.undo(this.current, command.value);
     }
     
     getCurrentValue() {
       return this.current;
     }
    }
    
    // usage
    const advCal = new AdvancedCalculator();
     
    // invoke commands (the Do* helpers are arrow functions, so they are called without `new`)
    advCal.execute(DoAddition(50)); //50
    advCal.execute(DoSubtraction(25)); //25
    advCal.execute(DoMultiplication(4)); //100
    advCal.execute(DoDivision(2)); //50
     
    // undo commands
    advCal.undo();
    advCal.getCurrentValue(); //100

    6. Facade

    The facade pattern is used when we want to expose a higher level of abstraction and hide the complexity of a large codebase behind it.

    A great example of this pattern is common DOM manipulation libraries like jQuery, which simplify element selection and event handling.

    // JavaScript:
    /* handle click event  */
    document.getElementById('counter').addEventListener('click', () => {
     counter++;
    });
     
    // jQuery:
    /* handle click event */
    $('#counter').on('click', () => {
     counter++;
    });

    Though it seems simple on the surface, a lot of complex logic runs behind the scenes when the operation is performed.

    The following account creation example illustrates the facade pattern:

    // Here AccountManager is responsible to create new account of type 
    // Savings or Current with the unique account number
    let currentAccountNumber = 0;
    
    class AccountManager {
     createAccount(type, details) {
       const accountNumber = AccountManager.getUniqueAccountNumber();
       let account;
       if (type === 'current') {
         account = new CurrentAccounts();
       } else {
         account = new SavingsAccount();
       }
       return account.addAccount({ accountNumber, details });
     }
     
     static getUniqueAccountNumber() {
       return ++currentAccountNumber;
     }
    }
    
    
    // class Accounts maintains the list of all accounts created
    class Accounts {
     constructor() {
       this.accounts = [];
     }
     
     addAccount(account) {
       this.accounts.push(account);
       return this.successMessage(account);
     }
     
     getAccount(accountNumber) {
       return this.accounts.find(account => account.accountNumber === accountNumber);
     }
     
     successMessage(account) {}
    }
    
    // CurrentAccounts extends the implementation of Accounts for providing more specific success messages on successful account creation
    class CurrentAccounts extends Accounts {
     constructor() {
       super();
       if (CurrentAccounts.exists) {
         return CurrentAccounts.instance;
       }
       CurrentAccounts.instance = this;
       CurrentAccounts.exists = true;
       return this;
     }
     
     successMessage({ accountNumber, details }) {
       return `Current Account created with ${details}. ${accountNumber} is your account number.`;
     }
    }
     
    // Same here, SavingsAccount extends the implementation of Accounts for providing more specific success messages on successful account creation
    class SavingsAccount extends Accounts {
     constructor() {
       super();
       if (SavingsAccount.exists) {
         return SavingsAccount.instance;
       }
       SavingsAccount.instance = this;
       SavingsAccount.exists = true;
       return this;
     }
     
     successMessage({ accountNumber, details }) {
       return `Savings Account created with ${details}. ${accountNumber} is your account number.`;
     }
    }
     
    // usage
    // Here we are hiding the complexities of creating account
    const accountManager = new AccountManager();
     
    const currentAccount = accountManager.createAccount('current', { name: 'John Snow', address: 'pune' });
     
    const savingsAccount = accountManager.createAccount('savings', { name: 'Petter Kim', address: 'mumbai' });

    7. Adapter

    The adapter pattern converts the interface of a class to another expected interface, making two incompatible interfaces work together. 

    For example, you might need to show data from a 3rd party library as a bar chart, but the data format of the 3rd party library API differs from the format the bar chart expects. Below, you’ll find an adapter that converts the 3rd party library API response to Highcharts’ bar representation:

    // API Response
    const response = [{
       symbol: 'SIC DIVISION',
       exchange: 'Agricultural services',
       volume: 42232,
    }];
     
    // Required format
    // [{
    //    category: 'Agricultural services',
    //    name: 'SIC DIVISION',
    //    y: 42232,
    // }]
     
    // Maps each API response key to the key the chart expects
    const mapping = {
     symbol: 'name',
     exchange: 'category',
     volume: 'y',
    };
     
    const highchartsAdapter = (response, mapping) => {
     return response.map(item => {
       const normalized = {};
     
       // Normalize each response's item key, according to the mapping
       Object.keys(item).forEach(key => (normalized[mapping[key]] = item[key]));
       return normalized;
     });
    };
     
    highchartsAdapter(response, mapping);

    Conclusion

    This has been a brief introduction to design patterns in modern JavaScript (ES6). The subject is massive, but hopefully this article has shown you the benefits of using them when writing code.

    Related Articles

    1. Cleaner, Efficient Code with Hooks and Functional Programming

    2. Building a Progressive Web Application in React [With Live Code Examples]

  • Deploy Serverless, Event-driven Python Applications Using Zappa

    Introduction

    Zappa is a very powerful open source Python project which lets you easily build, deploy and update your WSGI app hosted on AWS Lambda + API Gateway. This blog is a detailed step-by-step guide focusing on the challenges faced while deploying a Django application on AWS Lambda using Zappa as the deployment tool.

    Building Your Application

    If you do not have a Django application already you can build one by cloning this GitHub repository.

    $ git clone https://github.com/velotiotech/django-zappa-sample.git    

    Cloning into 'django-zappa-sample'...
    remote: Counting objects: 18, done.
    remote: Compressing objects: 100% (13/13), done.
    remote: Total 18 (delta 1), reused 15 (delta 1), pack-reused 0
    Unpacking objects: 100% (18/18), done.
    Checking connectivity... done.

    Once you have cloned the repository you will need a virtual environment which provides an isolated Python environment for your application. I prefer virtualenvwrapper to create one.

    Command :

    $ mkvirtualenv django_zappa_sample 

    Installing setuptools, pip, wheel...done.
    virtualenvwrapper.user_scripts creating /home/velotio/Envs/django_zappa_sample/bin/predeactivate
    virtualenvwrapper.user_scripts creating /home/velotio/Envs/django_zappa_sample/bin/postdeactivate
    virtualenvwrapper.user_scripts creating /home/velotio/Envs/django_zappa_sample/bin/preactivate
    virtualenvwrapper.user_scripts creating /home/velotio/Envs/django_zappa_sample/bin/postactivate
    virtualenvwrapper.user_scripts creating /home/velotio/Envs/django_zappa_sample/bin/get_env_details

    Install dependencies from requirements.txt.

    $ pip install -r requirements.txt

    Collecting Django==1.11.11 (from -r requirements.txt (line 1))
      Downloading https://files.pythonhosted.org/packages/d5/bf/2cd5eb314aa2b89855c01259c94dc48dbd9be6c269370c1f7ae4979e6e2f/Django-1.11.11-py2.py3-none-any.whl (6.9MB)
        100% |████████████████████████████████| 7.0MB 772kB/s 
    Collecting zappa==0.45.1 (from -r requirements.txt (line 2))
    Collecting pytz (from Django==1.11.11->-r requirements.txt (line 1))
      Downloading https://files.pythonhosted.org/packages/dc/83/15f7833b70d3e067ca91467ca245bae0f6fe56ddc7451aa0dc5606b120f2/pytz-2018.4-py2.py3-none-any.whl (510kB)
        100% |████████████████████████████████| 512kB 857kB/s 
    Collecting future==0.16.0 (from zappa==0.45.1->-r requirements.txt (line 2))
    Collecting toml>=0.9.3 (from zappa==0.45.1->-r requirements.txt (line 2))
    Collecting docutils>=0.12 (from zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/50/09/c53398e0005b11f7ffb27b7aa720c617aba53be4fb4f4f3f06b9b5c60f28/docutils-0.14-py2-none-any.whl
    Collecting PyYAML==3.12 (from zappa==0.45.1->-r requirements.txt (line 2))
    Collecting futures==3.1.1 (from zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/a6/1c/72a18c8c7502ee1b38a604a5c5243aa8c2a64f4bba4e6631b1b8972235dd/futures-3.1.1-py2-none-any.whl
    Requirement already satisfied: wheel>=0.30.0 in /home/velotio/Envs/django_zappa_sample/lib/python2.7/site-packages (from zappa==0.45.1->-r requirements.txt (line 2)) (0.31.1)
    Collecting base58==0.2.4 (from zappa==0.45.1->-r requirements.txt (line 2))
    Collecting durationpy==0.5 (from zappa==0.45.1->-r requirements.txt (line 2))
    Collecting kappa==0.6.0 (from zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/ed/cf/a8aa5964557c8a4828da23d210f8827f9ff190318838b382a4fb6f118f5d/kappa-0.6.0-py2-none-any.whl
    Collecting Werkzeug==0.12 (from zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/ae/c3/f59f6ade89c811143272161aae8a7898735e7439b9e182d03d141de4804f/Werkzeug-0.12-py2.py3-none-any.whl
    Collecting boto3>=1.4.7 (from zappa==0.45.1->-r requirements.txt (line 2))
      Downloading https://files.pythonhosted.org/packages/cd/a3/4d1caf76d8f5aac8ab1ffb4924ecf0a43df1572f6f9a13465a482f94e61c/boto3-1.7.24-py2.py3-none-any.whl (128kB)
        100% |████████████████████████████████| 133kB 1.1MB/s 
    Collecting six>=1.11.0 (from zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
    Collecting tqdm==4.19.1 (from zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/c0/d3/7f930cbfcafae3836be39dd3ed9b77e5bb177bdcf587a80b6cd1c7b85e74/tqdm-4.19.1-py2.py3-none-any.whl
    Collecting argcomplete==1.9.2 (from zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/0f/ee/625763d848016115695942dba31a9937679a25622b6f529a2607d51bfbaa/argcomplete-1.9.2-py2.py3-none-any.whl
    Collecting hjson==3.0.1 (from zappa==0.45.1->-r requirements.txt (line 2))
    Collecting troposphere>=1.9.0 (from zappa==0.45.1->-r requirements.txt (line 2))
    Collecting python-dateutil==2.6.1 (from zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/4b/0d/7ed381ab4fe80b8ebf34411d14f253e1cf3e56e2820ffa1d8844b23859a2/python_dateutil-2.6.1-py2.py3-none-any.whl
    Collecting botocore>=1.7.19 (from zappa==0.45.1->-r requirements.txt (line 2))
      Downloading https://files.pythonhosted.org/packages/65/98/12aa979ca3215d69111026405a9812d7bb0c9ae49e2800b00d3bd794705b/botocore-1.10.24-py2.py3-none-any.whl (4.2MB)
        100% |████████████████████████████████| 4.2MB 768kB/s 
    Collecting requests>=2.10.0 (from zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl
    Collecting jmespath==0.9.3 (from zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/b7/31/05c8d001f7f87f0f07289a5fc0fc3832e9a57f2dbd4d3b0fee70e0d51365/jmespath-0.9.3-py2.py3-none-any.whl
    Collecting wsgi-request-logger==0.4.6 (from zappa==0.45.1->-r requirements.txt (line 2))
    Collecting lambda-packages==0.19.0 (from zappa==0.45.1->-r requirements.txt (line 2))
    Collecting python-slugify==1.2.4 (from zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/9f/77/ab7134b731d0e831cf82861c1ab0bb318e80c41155fa9da18958f9d96057/python_slugify-1.2.4-py2.py3-none-any.whl
    Collecting placebo>=0.8.1 (from kappa==0.6.0->zappa==0.45.1->-r requirements.txt (line 2))
    Collecting click>=5.1 (from kappa==0.6.0->zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/34/c1/8806f99713ddb993c5366c362b2f908f18269f8d792aff1abfd700775a77/click-6.7-py2.py3-none-any.whl
    Collecting s3transfer<0.2.0,>=0.1.10 (from boto3>=1.4.7->zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/d7/14/2a0004d487464d120c9fb85313a75cd3d71a7506955be458eebfe19a6b1d/s3transfer-0.1.13-py2.py3-none-any.whl
    Collecting cfn-flip>=0.2.5 (from troposphere>=1.9.0->zappa==0.45.1->-r requirements.txt (line 2))
    Collecting certifi>=2017.4.17 (from requests>=2.10.0->zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl
    Collecting chardet<3.1.0,>=3.0.2 (from requests>=2.10.0->zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
    Collecting idna<2.7,>=2.5 (from requests>=2.10.0->zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl
    Collecting urllib3<1.23,>=1.21.1 (from requests>=2.10.0->zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl
    Collecting Unidecode>=0.04.16 (from python-slugify==1.2.4->zappa==0.45.1->-r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/59/ef/67085e30e8bbcdd76e2f0a4ad8151c13a2c5bce77c85f8cad6e1f16fb141/Unidecode-1.0.22-py2.py3-none-any.whl
    Installing collected packages: pytz, Django, future, toml, docutils, PyYAML, futures, base58, durationpy, jmespath, six, python-dateutil, botocore, s3transfer, boto3, placebo, click, kappa, Werkzeug, tqdm, argcomplete, hjson, cfn-flip, troposphere, certifi, chardet, idna, urllib3, requests, wsgi-request-logger, lambda-packages, Unidecode, python-slugify, zappa
    Successfully installed Django-1.11.11 PyYAML-3.12 Unidecode-1.0.22 Werkzeug-0.12 argcomplete-1.9.2 base58-0.2.4 boto3-1.7.24 botocore-1.10.24 certifi-2018.4.16 cfn-flip-1.0.3 chardet-3.0.4 click-6.7 docutils-0.14 durationpy-0.5 future-0.16.0 futures-3.1.1 hjson-3.0.1 idna-2.6 jmespath-0.9.3 kappa-0.6.0 lambda-packages-0.19.0 placebo-0.8.1 python-dateutil-2.6.1 python-slugify-1.2.4 pytz-2018.4 requests-2.18.4 s3transfer-0.1.13 six-1.11.0 toml-0.9.4 tqdm-4.19.1 troposphere-2.2.1 urllib3-1.22 wsgi-request-logger-0.4.6 zappa-0.45.1

    Now if you run the server directly it will log a warning as the database is not set up yet.

    $ python manage.py runserver  

    Performing system checks...
    
    System check identified no issues (0 silenced).
    
    You have 13 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
    Run 'python manage.py migrate' to apply them.
    
    May 20, 2018 - 14:47:32
    Django version 1.11.11, using settings 'django_zappa_sample.settings'
    Starting development server at http://127.0.0.1:8000/
    Quit the server with CONTROL-C.

    Also, trying to access the admin page (http://localhost:8000/admin/) will throw an “OperationalError” exception, with the log below on the server side.

    Internal Server Error: /admin/
    Traceback (most recent call last):
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/core/handlers/exception.py", line 41, in inner
        response = get_response(request)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, in _get_response
        response = self.process_exception_by_middleware(e, request)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, in _get_response
        response = wrapped_callback(request, *callback_args, **callback_kwargs)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/contrib/admin/sites.py", line 242, in wrapper
        return self.admin_view(view, cacheable)(*args, **kwargs)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/utils/decorators.py", line 149, in _wrapped_view
        response = view_func(request, *args, **kwargs)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/views/decorators/cache.py", line 57, in _wrapped_view_func
        response = view_func(request, *args, **kwargs)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/contrib/admin/sites.py", line 213, in inner
        if not self.has_permission(request):
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/contrib/admin/sites.py", line 187, in has_permission
        return request.user.is_active and request.user.is_staff
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/utils/functional.py", line 238, in inner
        self._setup()
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/utils/functional.py", line 386, in _setup
        self._wrapped = self._setupfunc()
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/contrib/auth/middleware.py", line 24, in <lambda>
        request.user = SimpleLazyObject(lambda: get_user(request))
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/contrib/auth/middleware.py", line 12, in get_user
        request._cached_user = auth.get_user(request)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/contrib/auth/__init__.py", line 211, in get_user
        user_id = _get_user_session_key(request)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/contrib/auth/__init__.py", line 61, in _get_user_session_key
        return get_user_model()._meta.pk.to_python(request.session[SESSION_KEY])
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/contrib/sessions/backends/base.py", line 57, in __getitem__
        return self._session[key]
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/contrib/sessions/backends/base.py", line 207, in _get_session
        self._session_cache = self.load()
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/contrib/sessions/backends/db.py", line 35, in load
        expire_date__gt=timezone.now()
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/db/models/manager.py", line 85, in manager_method
        return getattr(self.get_queryset(), name)(*args, **kwargs)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/db/models/query.py", line 374, in get
        num = len(clone)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/db/models/query.py", line 232, in __len__
        self._fetch_all()
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/db/models/query.py", line 1118, in _fetch_all
        self._result_cache = list(self._iterable_class(self))
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/db/models/query.py", line 53, in __iter__
        results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
        raise original_exception
    OperationalError: no such table: django_session
    [20/May/2018 14:59:23] "GET /admin/ HTTP/1.1" 500 153553
    Not Found: /favicon.ico

    To fix this, run the migrations against your database so that essential tables such as auth_user and django_session are created before any request reaches the server.

    $ python manage.py migrate 

    Operations to perform:
      Apply all migrations: admin, auth, contenttypes, sessions
    Running migrations:
      Applying contenttypes.0001_initial... OK
      Applying auth.0001_initial... OK
      Applying admin.0001_initial... OK
      Applying admin.0002_logentry_remove_auto_add... OK
      Applying contenttypes.0002_remove_content_type_name... OK
      Applying auth.0002_alter_permission_name_max_length... OK
      Applying auth.0003_alter_user_email_max_length... OK
      Applying auth.0004_alter_user_username_opts... OK
      Applying auth.0005_alter_user_last_login_null... OK
      Applying auth.0006_require_contenttypes_0002... OK
      Applying auth.0007_alter_validators_add_error_messages... OK
      Applying auth.0008_alter_user_username_max_length... OK
      Applying sessions.0001_initial... OK

    NOTE: Use the DATABASES setting in your project settings file to configure the database that your Django application should use once it is hosted on AWS Lambda. By default, it is configured to create a local SQLite database file as the backend.
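
    As a rough illustration of the note above, the sketch below builds a DATABASES setting from environment variables so that the deployed function can point at a hosted database instead of a local SQLite file. The DB_* variable names and the choice of PostgreSQL are assumptions made for this example (the PostgreSQL backend also needs a driver such as psycopg2 installed); adapt them to whatever database you actually use.

```python
import os


def build_databases(env=os.environ):
    """Return a Django DATABASES dict built from environment variables.

    The DB_* variable names are hypothetical; use whatever your deployment defines.
    """
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("DB_NAME", "django_zappa_sample"),
            "USER": env.get("DB_USER", ""),
            "PASSWORD": env.get("DB_PASSWORD", ""),
            "HOST": env.get("DB_HOST", "localhost"),
            "PORT": env.get("DB_PORT", "5432"),
        }
    }


# In settings.py you would assign the result:
DATABASES = build_databases()
```

    Reading the values from the environment keeps credentials out of the repository and lets each stage use its own database.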

    You can run the server again and it should now load the admin panel of your website.

    Before moving forward, verify that the zappa Python package is installed in your virtual environment.
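
    One quick way to perform this check from Python is sketched below. It assumes a Python 3 interpreter (the transcript in this post uses Python 2.7, where `pip freeze` is the simpler option), and the helper name is_installed is made up for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be imported in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this with the virtual environment activated.
    print("zappa installed:", is_installed("zappa"))
```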

    Configuring Zappa Settings

    Deploying with Zappa is simple: it needs only a configuration file, and Zappa manages the rest. To create this configuration file, run the following from your project root directory:

    $ zappa init 

    ███████╗ █████╗ ██████╗ ██████╗  █████╗
    ╚══███╔╝██╔══██╗██╔══██╗██╔══██╗██╔══██╗
      ███╔╝ ███████║██████╔╝██████╔╝███████║
     ███╔╝  ██╔══██║██╔═══╝ ██╔═══╝ ██╔══██║
    ███████╗██║  ██║██║     ██║     ██║  ██║
    ╚══════╝╚═╝  ╚═╝╚═╝     ╚═╝     ╚═╝  ╚═╝
    
    Welcome to Zappa!
    
    Zappa is a system for running server-less Python web applications on AWS Lambda and AWS API Gateway.
    This `init` command will help you create and configure your new Zappa deployment.
    Let's get started!
    
    Your Zappa configuration can support multiple production stages, like 'dev', 'staging', and 'production'.
    What do you want to call this environment (default 'dev'): 
    
    AWS Lambda and API Gateway are only available in certain regions. Let's check to make sure you have a profile set up in one that will work.
    We found the following profiles: default, and hdx. Which would you like us to use? (default 'default'): 
    
    Your Zappa deployments will need to be uploaded to a private S3 bucket.
    If you don't have a bucket yet, we'll create one for you too.
    What do you want call your bucket? (default 'zappa-108wqhyn4'): django-zappa-sample-bucket
    
    It looks like this is a Django application!
    What is the module path to your projects's Django settings?
    We discovered: django_zappa_sample.settings
    Where are your project's settings? (default 'django_zappa_sample.settings'): 
    
    You can optionally deploy to all available regions in order to provide fast global service.
    If you are using Zappa for the first time, you probably don't want to do this!
    Would you like to deploy this application globally? (default 'n') [y/n/(p)rimary]: n
    
    Okay, here's your zappa_settings.json:
    
    {
        "dev": {
            "aws_region": "us-east-1", 
            "django_settings": "django_zappa_sample.settings", 
            "profile_name": "default", 
            "project_name": "django-zappa-sa", 
            "runtime": "python2.7", 
            "s3_bucket": "django-zappa-sample-bucket"
        }
    }
    
    Does this look okay? (default 'y') [y/n]: y
    
    Done! Now you can deploy your Zappa application by executing:
    
    	$ zappa deploy dev
    
    After that, you can update your application code with:
    
    	$ zappa update dev
    
    To learn more, check out our project page on GitHub here: https://github.com/Miserlou/Zappa
    and stop by our Slack channel here: https://slack.zappa.io
    
    Enjoy!,
     ~ Team Zappa!

    You can verify that zappa_settings.json was generated in your project root directory.
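
    If you want to check the generated file programmatically, a small sketch like the following may help. The set of expected keys is simply the set that appears in the generated example above, not a list of what Zappa strictly requires, and the function name missing_keys is hypothetical.

```python
import json

# Keys that appear in the generated file shown above (an assumption for
# this check, not an official list of required Zappa settings).
EXPECTED_KEYS = {"aws_region", "django_settings", "profile_name",
                 "project_name", "runtime", "s3_bucket"}


def missing_keys(settings_text, stage="dev"):
    """Parse zappa_settings.json text and return expected keys absent from a stage."""
    settings = json.loads(settings_text)
    if stage not in settings:
        raise KeyError("stage %r not found in settings" % stage)
    return sorted(EXPECTED_KEYS - set(settings[stage]))
```

    To use it, read zappa_settings.json from your project root and pass its contents to missing_keys; an empty list means every key from the example is present.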

    TIP: The virtual environment name should not be the same as the Zappa project name, as this may cause errors.

    Additionally, you can specify other options in zappa_settings.json as required; see Zappa's Advanced Settings.

    Now, you’re ready to deploy!

    IAM Permissions

    To deploy the Django application to Lambda and API Gateway, set up an IAM role (e.g., ZappaLambdaExecutionRole) with the following permissions:

    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Effect": "Allow",
    "Action": [
    "iam:AttachRolePolicy",
    "iam:CreateRole",
    "iam:GetRole",
    "iam:PutRolePolicy"
    ],
    "Resource": [
    "*"
    ]
    },
    {
    "Effect": "Allow",
    "Action": [
    "iam:PassRole"
    ],
    "Resource": [
    "arn:aws:iam:::role/*-ZappaLambdaExecutionRole"
    ]
    },
    {
    "Effect": "Allow",
    "Action": [
    "apigateway:DELETE",
    "apigateway:GET",
    "apigateway:PATCH",
    "apigateway:POST",
    "apigateway:PUT",
    "events:DeleteRule",
    "events:DescribeRule",
    "events:ListRules",
    "events:ListTargetsByRule",
    "events:ListRuleNamesByTarget",
    "events:PutRule",
    "events:PutTargets",
    "events:RemoveTargets",
    "lambda:AddPermission",
    "lambda:CreateFunction",
    "lambda:DeleteFunction",
    "lambda:GetFunction",
    "lambda:GetPolicy",
    "lambda:ListVersionsByFunction",
    "lambda:RemovePermission",
    "lambda:UpdateFunctionCode",
    "lambda:UpdateFunctionConfiguration",
    "cloudformation:CreateStack",
    "cloudformation:DeleteStack",
    "cloudformation:DescribeStackResource",
    "cloudformation:DescribeStacks",
    "cloudformation:ListStackResources",
    "cloudformation:UpdateStack",
    "logs:DescribeLogStreams",
    "logs:FilterLogEvents",
    "route53:ListHostedZones",
    "route53:ChangeResourceRecordSets",
    "route53:GetHostedZone",
    "s3:CreateBucket"
    ],
    "Resource": [
    "*"
    ]
    },
    {
    "Effect": "Allow",
    "Action": [
    "s3:ListBucket"
    ],
    "Resource": [
    "arn:aws:s3:::"
    ]
    },
    {
    "Effect": "Allow",
    "Action": [
    "s3:DeleteObject",
    "s3:GetObject",
    "s3:PutObject",
    "s3:CreateMultipartUpload",
    "s3:AbortMultipartUpload",
    "s3:ListMultipartUploadParts",
    "s3:ListBucketMultipartUploads"
    ],
    "Resource": [
    "arn:aws:s3:::/*"
    ]
    }
    ]
    }
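
    Before attaching a policy document like this, it can be worth confirming that it parses as JSON (a stray trailing comma, for instance, makes the document invalid) and reviewing which actions it grants. The sketch below is one possible way to do that; the function name actions_by_service is made up for this example.

```python
import json
from collections import defaultdict


def actions_by_service(policy_text):
    """Group the allowed actions of an IAM policy document by service prefix."""
    policy = json.loads(policy_text)  # raises ValueError on malformed JSON
    grouped = defaultdict(list)
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement["Action"]
        if isinstance(actions, str):  # IAM also accepts a single string here
            actions = [actions]
        for action in actions:
            service, _, name = action.partition(":")
            grouped[service].append(name)
    return dict(grouped)
```

    Passing the policy text above to actions_by_service gives a per-service summary (iam, apigateway, events, lambda, and so on) that is easier to review than the flat list.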

    Deploying Django Application

    Before deploying the application, ensure that the IAM role is set in the config JSON as follows:

    {
    "dev": {
    ...
    "manage_roles": false, // Disable Zappa client managing roles.
    "role_name": "MyLambdaRole", // Name of your Zappa execution role. Optional, default: --ZappaExecutionRole.
    "role_arn": "arn:aws:iam::12345:role/app-ZappaLambdaExecutionRole", // ARN of your Zappa execution role. Optional.
    ...
    },
    ...
    }

    Once your settings are configured, you can package and deploy your application to a stage called “dev” with a single command:

    $ zappa deploy dev

    Calling deploy for stage dev..
    Downloading and installing dependencies..
    Packaging project as zip.
    Uploading django-zappa-sa-dev-1526831069.zip (10.9MiB)..
    100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11.4M/11.4M [01:02<00:00, 75.3KB/s]
    Scheduling..
    Scheduled django-zappa-sa-dev-zappa-keep-warm-handler.keep_warm_callback with expression rate(4 minutes)!
    Uploading django-zappa-sa-dev-template-1526831157.json (1.6KiB)..
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.60K/1.60K [00:02<00:00, 792B/s]
    Waiting for stack django-zappa-sa-dev to create (this can take a bit)..
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:11<00:00,  2.92s/res]
    Deploying API Gateway..
    Deployment complete!: https://akg59b222b.execute-api.us-east-1.amazonaws.com/dev

    You should see that your Zappa deployment completed successfully, along with the API Gateway URL created for your application.

    Troubleshooting

    1. If you see the following error during deployment, you probably do not have sufficient privileges to deploy to AWS Lambda. Ensure your IAM role has all the permissions described above, or set “manage_roles” to true so that Zappa can create and manage the IAM role for you.

    Calling deploy for stage dev..
    Creating django-zappa-sa-dev-ZappaLambdaExecutionRole IAM Role..
    Error: Failed to manage IAM roles!
    You may lack the necessary AWS permissions to automatically manage a Zappa execution role.
    To fix this, see here: https://github.com/Miserlou/Zappa#using-custom-aws-iam-roles-and-policies

    2. The error below occurs when “events.amazonaws.com” is not listed as a trusted entity for your IAM role. Either add it as a trusted entity or set the “keep_warm” parameter to false in your Zappa settings file. Note that the deployment terminates abnormally, leaving your Zappa application only partially deployed.

    Downloading and installing dependencies..
    100%|████████████████████████████████████████████| 44/44 [00:05<00:00, 7.92pkg/s]
    Packaging project as zip..
    Uploading django-zappa-sample-dev-1482817370.zip (8.8MiB)..
    100%|█████████████████████████████████████████| 9.22M/9.22M [00:17<00:00, 527KB/s]
    Scheduling...
    Oh no! An error occurred! :(
    
    ==============
    
    Traceback (most recent call last):
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 2610, in handle
        sys.exit(cli.handle())
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 505, in handle
        self.dispatch_command(self.command, stage)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 539, in dispatch_command
        self.deploy(self.vargs['zip'])
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 800, in deploy
        self.zappa.add_binary_support(api_id=api_id, cors=self.cors)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/core.py", line 1490, in add_binary_support
        restApiId=api_id
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/botocore/client.py", line 314, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/botocore/client.py", line 612, in _make_api_call
        raise error_class(parsed_response, operation_name)
    ClientError: An error occurred (ValidationError) when calling the PutRole operation: Provided role 'arn:aws:iam:484375727565:role/lambda_basic_execution' cannot be assumed by principal
    'events.amazonaws.com'.
    
    ==============
    
    Need help? Found a bug? Let us know! :D
    File bug reports on GitHub here: https://github.com/Miserlou/Zappa
    And join our Slack channel here: https://slack.zappa.io
    Love!,
    ~ Team Zappa!

    3. Adding the parameter and running zappa update causes the error below. As you can see, it says “Stack 'django-zappa-sa-dev' does not exist” because the previous deployment was unsuccessful. To fix this, delete the Lambda function from the console and rerun the deployment.

    Downloading and installing dependencies..
    100%|████████████████████████████████████████████| 44/44 [00:05<00:00, 7.92pkg/s]
    Packaging project as zip..
    Uploading django-zappa-sample-dev-1482817370.zip (8.8MiB)..
    100%|█████████████████████████████████████████| 9.22M/9.22M [00:17<00:00, 527KB/s]
    Updating Lambda function code..
    Updating Lambda function configuration..
    Uploading djangoo-zapppa-sample-dev-template-1482817403.json (1.5KiB)..
    100%|████████████████████████████████████████| 1.56K/1.56K [00:00<00:00, 6.56KB/s]
    CloudFormation stack missing, re-deploy to enable updates
    ERROR:Could not get API ID.
    Traceback (most recent call last):
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 2610, in handle
        sys.exit(cli.handle())
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 505, in handle
        self.dispatch_command(self.command, stage)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 539, in dispatch_command
        self.deploy(self.vargs['zip'])
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 800, in deploy
        self.zappa.add_binary_support(api_id=api_id, cors=self.cors)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/core.py", line 1490, in add_binary_support
        restApiId=api_id
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/botocore/client.py", line 314, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/botocore/client.py", line 612, in _make_api_call
        raise error_class(parsed_response, operation_name)
    ClientError: An error occurred (ValidationError) when calling the DescribeStackResource operation: Stack 'django-zappa-sa-dev' does not exist
    Deploying API Gateway..
    Oh no! An error occurred! :(
    
    ==============
    
    Traceback (most recent call last):
    File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 1847, in handle
    sys.exit(cli.handle())
    File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 345, in handle
    self.dispatch_command(self.command, environment)
    File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 379, in dispatch_command
    self.update()
    File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 605, in update
    endpoint_url = self.deploy_api_gateway(api_id)
    File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 1816, in deploy_api_gateway
    cloudwatch_metrics_enabled=self.zappa_settings[self.api_stage].get('cloudwatch_metrics_enabled', False),
    File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/zappa.py", line 1014, in deploy_api_gateway
    variables=variables or {}
    File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/botocore/client.py", line 251, in _api_call
    return self._make_api_call(operation_name, kwargs)
    File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/botocore/client.py", line 513, in _make_api_call
    api_params, operation_model, context=request_context)
    File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/botocore/client.py", line 566, in _convert_to_request_dict
    api_params, operation_model)
    File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/botocore/validate.py", line 270, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
    ParamValidationError: Parameter validation failed:
    Invalid type for parameter restApiId, value: None, type: <type 'NoneType'>, valid types: <type 'basestring'>
    
    ==============
    
    Need help? Found a bug? Let us know! :D
    File bug reports on GitHub here: https://github.com/Miserlou/Zappa
    And join our Slack channel here: https://slack.zappa.io
    Love!,
    ~ Team Zappa!

    4. If you run into a distribution error like the one below, try downgrading pip to version 9.0.1.

    $ pip install pip==9.0.1   

    Calling deploy for stage dev..
    Downloading and installing dependencies..
    Oh no! An error occurred! :(
    
    ==============
    
    Traceback (most recent call last):
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 2610, in handle
        sys.exit(cli.handle())
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 505, in handle
        self.dispatch_command(self.command, stage)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 539, in dispatch_command
        self.deploy(self.vargs['zip'])
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 709, in deploy
        self.create_package()
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 2171, in create_package
        disable_progress=self.disable_progress
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/core.py", line 595, in create_lambda_zip
        installed_packages = self.get_installed_packages(site_packages, site_packages_64)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/core.py", line 751, in get_installed_packages
        pip.get_installed_distributions()
    AttributeError: 'module' object has no attribute 'get_installed_distributions'
    
    ==============
    
    Need help? Found a bug? Let us know! :D
    File bug reports on GitHub here: https://github.com/Miserlou/Zappa
    And join our Slack channel here: https://slack.zappa.io
    Love!,
     ~ Team Zappa!

    Or, if you run into a NotFoundException (Invalid REST API identifier), try undeploying the Zappa stage and deploying again.

    Calling deploy for stage dev..
    Downloading and installing dependencies..
    Packaging project as zip.
    Uploading django-zappa-sa-dev-1526830532.zip (10.9MiB)..
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11.4M/11.4M [00:42<00:00, 331KB/s]
    Scheduling..
    Scheduled django-zappa-sa-dev-zappa-keep-warm-handler.keep_warm_callback with expression rate(4 minutes)!
    Uploading django-zappa-sa-dev-template-1526830690.json (1.6KiB)..
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.60K/1.60K [00:01<00:00, 801B/s]
    Oh no! An error occurred! :(
    
    ==============
    
    Traceback (most recent call last):
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 2610, in handle
        sys.exit(cli.handle())
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 505, in handle
        self.dispatch_command(self.command, stage)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 539, in dispatch_command
        self.deploy(self.vargs['zip'])
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 800, in deploy
        self.zappa.add_binary_support(api_id=api_id, cors=self.cors)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/core.py", line 1490, in add_binary_support
        restApiId=api_id
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/botocore/client.py", line 314, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/botocore/client.py", line 612, in _make_api_call
        raise error_class(parsed_response, operation_name)
    NotFoundException: An error occurred (NotFoundException) when calling the GetRestApi operation: Invalid REST API identifier specified 484375727565:akg59b222b
    
    ==============
    
    Need help? Found a bug? Let us know! :D
    File bug reports on GitHub here: https://github.com/Miserlou/Zappa
    And join our Slack channel here: https://slack.zappa.io
    Love!,
     ~ Team Zappa!
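
    For the trusted-entity problem described in item 2 above, the role's trust (assume-role) policy needs to name every service that will assume it. The sketch below builds such a document. Only events.amazonaws.com comes from the error message in this post; including the Lambda and API Gateway principals is an assumption about what a typical Lambda plus API Gateway deployment needs.

```python
import json

# Services allowed to assume the execution role. events.amazonaws.com is the
# principal named in the error above; the other two are assumptions.
TRUSTED_SERVICES = [
    "apigateway.amazonaws.com",
    "lambda.amazonaws.com",
    "events.amazonaws.com",
]


def trust_policy(services=TRUSTED_SERVICES):
    """Build an IAM trust (assume-role) policy document for the given services."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": list(services)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Print the document so it can be pasted into the role's trust relationship.
print(json.dumps(trust_policy(), indent=2))
```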

    TIP: To understand how your application works on serverless environment please visit this link.

    Post Deployment Setup

    Migrate database

    At this point, you should have an empty database that your Django application can populate with its schema.

    $ zappa manage dev migrate

    Once you run the above command, the migrations are applied to the database specified in your Django settings.

    Creating Superuser of Django Application

    You might also need to create a new superuser in the database. You can use the following command from your project directory.

    $ zappa invoke --raw dev "from django.contrib.auth.models import User; User.objects.create_superuser('username', 'username@yourdomain.com', 'password')"

    Alternatively,

    $ python manage.py createsuperuser

    Note that your local environment must be connected to the same database, because this runs as a standard Django administration command (not a Zappa command).

    Managing static files

    Your Django application will depend on static files; the Django admin panel, for example, uses a combination of JS, CSS, and image files.

    NOTE: Zappa is for running your application code, not for serving static web assets. If you plan on serving custom static assets in your web application (CSS/JavaScript/images/etc.), you’ll likely want to use a combination of AWS S3 and AWS CloudFront.

    You will need to add the django-storages and boto packages to your virtual environment; they are required for moving files to and from S3.

    $ pip install django-storages boto

    Add django-storages to your INSTALLED_APPS in settings.py:

    INSTALLED_APPS = (
        ...,
        'storages',
    )

    Configure django-storages in settings.py as follows:

    AWS_STORAGE_BUCKET_NAME = 'django-zappa-sample-bucket'
    AWS_S3_CUSTOM_DOMAIN = '%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME
    STATIC_URL = "https://%s/" % AWS_S3_CUSTOM_DOMAIN
    STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

    Once you have set up the Django application to serve your static files from AWS S3, run the following command to upload the static files from your project to S3.

    $ python manage.py collectstatic --noinput

    or

    $ zappa update dev
    $ zappa manage dev "collectstatic --noinput"

    Check that at least 61 static files have been moved to the S3 bucket; the admin panel alone relies on 61 static files.

    NOTE: STATICFILES_DIRS must be configured properly to collect your files from the appropriate location.

    Tip: To render static files in your templates, load the static template tag and use it to build each asset path. Example: {% static %}

    Setting Up API Gateway

    To connect to your Django application, you also need to ensure that API Gateway is set up for your AWS Lambda function. You need GET methods set up for all the URL resources used in your Django application. Alternatively, you can set up a proxy method to allow all subresources to be processed through one API method.

    Go to AWS Lambda function console and add API Gateway from ‘Add triggers’.

    1. Configure API, Deployment Stage, and Security for API Gateway. Click Save once it is done.

    2. Go to API Gateway console and,

    a. Recreate ANY method for / resource.

    i. Check `Use Lambda Proxy integration`

    ii. Set `Lambda Region` and `Lambda Function` and `Save` it.

    b. Recreate ANY method for /{proxy+} resource.

    i. Select `Lambda Function Proxy`

    ii. Set `Lambda Region` and `Lambda Function` and `Save` it.

    3. Click on Action and select Deploy API. Set Deployment Stage and click Deploy

    4. Ensure that the GET and POST methods for / and Proxy are set to Override for this method.
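
    To make the proxy setup above more concrete, here is a minimal sketch of what a Lambda function receives and returns under Lambda proxy integration. It is not Zappa's handler (Zappa translates these events into WSGI requests for Django); it only illustrates the event and response shapes, and the function name handler is an arbitrary choice.

```python
import json


def handler(event, context):
    """Echo the method and path from an API Gateway Lambda proxy event."""
    method = event.get("httpMethod", "GET")
    path = event.get("path", "/")
    # With a /{proxy+} resource, the matched sub-path is also passed
    # as the "proxy" path parameter.
    proxy = (event.get("pathParameters") or {}).get("proxy")
    body = {"method": method, "path": path, "proxy": proxy}
    # Proxy integration expects statusCode, headers, and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

    Because a single ANY method on /{proxy+} forwards every sub-path and HTTP verb to the same function, routing is left entirely to the application behind it.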

    Setting Up Custom SSL Endpoint

    Optionally, you can set up your own custom SSL endpoint with Zappa and install a certificate for your domain by running certify:

    $ zappa certify dev
    
    ...
    "certificate_arn": "arn:aws:acm:us-east-1:xxxxxxxxxxxx:certificate/xxxxxxxxxxxx-xxxxxx-xxxx-xxxx-xxxxxxxxxxxxxx",
    "domain": "django-zappa-sample.com"

    Now you are ready to launch your Django Application hosted on AWS Lambda.

    Additional Notes:

    • Once deployed, you must run “zappa update <stage-name>” to update your already hosted AWS Lambda function.
    • You can check server logs for investigation by running the “zappa tail” command.
    • To un-deploy your application, simply run: `zappa undeploy <stage-name>`

    You’ve seen how to deploy a Django application on AWS Lambda using Zappa. If you are creating your Django application for the first time, you might also want to read Edgar Roman’s Django Zappa Guide.

    Start building your Django application and let us know in the comments if you need any help during your application deployment over AWS Lambda.

  • OPA On Kubernetes: An Introduction For Beginners

    Introduction:

    More often than not organizations need to apply various kinds of policies on the environments where they run their applications. These policies might be required to meet compliance requirements, achieve a higher degree of security, achieve standardization across multiple environments, etc. This calls for an automated/declarative way to define and enforce these policies. Policy engines like OPA help us achieve the same. 

    Motivation behind Open Policy Agent (OPA)

    When we run our application, it generally comprises multiple subsystems. Even in the simplest of cases, we will have an API gateway/load balancer, one or two applications, and a database. Generally, each of these subsystems has a different mechanism for authorizing requests. For example, the application might use JWT tokens to authorize a request while the database uses grants. The application may also access third-party APIs or cloud services, which again authorize requests in their own way. Add to this your CI/CD servers, your log server, etc., and you can see how many different ways of authorization can exist even in a small system.

    The existence of so many authorization models in our system makes life difficult when we need to meet compliance or information security requirements or even some self-imposed organizational policies. For example, if we need to adhere to some new compliance requirements then we need to understand and implement the same for all the components which do authorization in our system.

    “The main motivation behind OPA is to achieve unified policy enforcement across the stack.”

    What are Open Policy Agent (OPA) and OPA Gatekeeper

    OPA is an open-source, general-purpose policy engine that can be used to enforce policies on various types of software systems like microservices, CI/CD pipelines, gateways, Kubernetes, etc. OPA was developed by Styra and is currently a part of CNCF.

    OPA provides us with REST APIs that our system can call to check whether the policies are being met for a request payload. It also provides us with a high-level declarative language, Rego, which allows us to specify the policies we want to enforce as code. This gives us a lot of flexibility while defining our policies.

    The above image shows the architecture of OPA. It exposes APIs that any service needing an authorization or policy decision can call (a policy query). OPA then makes a decision based on the Rego code for the policy and returns it to the service, which processes the request accordingly. Enforcement is done by the service itself; OPA is responsible only for making the decision. This is how OPA becomes a general-purpose policy engine and supports a large number of services.
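
    As a rough sketch of what such a policy query looks like from the calling service's side, the snippet below posts an input document to OPA's Data API (POST /v1/data/<path>) and reads the decision from the "result" field of the response. The base URL, policy path, and input document used in the usage note are hypothetical; they depend on how OPA is deployed and which policies are loaded.

```python
import json
import urllib.request


def build_opa_query(base_url, policy_path, input_doc):
    """Build a POST request for OPA's Data API: /v1/data/<policy_path>."""
    url = "%s/v1/data/%s" % (base_url.rstrip("/"), policy_path.strip("/"))
    payload = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_opa(base_url, policy_path, input_doc):
    """Send the query and return the "result" field of OPA's JSON response."""
    request = build_opa_query(base_url, policy_path, input_doc)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8")).get("result")
```

    A service might call query_opa("http://localhost:8181", "httpapi/authz/allow", {"user": "alice"}) and then allow or reject the request itself based on the returned value, which matches the split between decision and enforcement described above.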

    The Gatekeeper project is a Kubernetes-specific implementation of OPA. Gatekeeper allows us to use OPA in a Kubernetes-native way to enforce the desired policies.

    How Gatekeeper enforces policies

    On the Kubernetes cluster, Gatekeeper is installed as a ValidatingAdmissionWebhook. Admission controllers intercept requests after they have been authenticated and authorized by the K8s API server, but before they are persisted in the database. If any admission controller rejects the request, the overall request is rejected. The limitation of admission controllers is that they need to be compiled into the kube-apiserver and can be enabled only when the apiserver starts up.

    To overcome this rigidity of the admission controller, admission webhooks were introduced. Once we enable the admission webhook controllers in our cluster, they can send admission requests to external HTTP callbacks and receive admission responses. Admission webhooks are of two types: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. The difference between the two is that mutating webhooks can modify the objects they receive, while validating webhooks cannot. The below image roughly shows the flow of an API request once both mutating and validating admission controllers are enabled.

     

    The role of Gatekeeper is simply to check whether the request meets the defined policy, which is why it is installed as a validating webhook.
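
    To illustrate what a validating webhook hands back to the API server, the sketch below builds an AdmissionReview response that echoes the request uid and either allows or denies the request. This is a simplified illustration of the admission webhook contract, not Gatekeeper's implementation, and the helper name admission_response is made up.

```python
def admission_response(review, allowed, message=""):
    """Build the AdmissionReview reply a validating webhook sends back."""
    # The response must carry the same uid as the incoming request.
    response = {"uid": review["request"]["uid"], "allowed": allowed}
    if not allowed and message:
        # The message is what the user sees when the request is rejected.
        response["status"] = {"message": message}
    return {
        "apiVersion": review.get("apiVersion", "admission.k8s.io/v1"),
        "kind": "AdmissionReview",
        "response": response,
    }
```

    A validating webhook can only set allowed to true or false; it has no way to change the object, which is the distinction from mutating webhooks drawn above.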

    Demo:

    Install Gatekeeper:

    kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/deploy/gatekeeper.yaml

    Now we have Gatekeeper up and running in our cluster. The above installation also created a CRD named `constrainttemplates.templates.gatekeeper.sh`. This CRD allows us to create constraint templates for the policy we want to enforce. In the constraint template, we define the constraint logic using Rego code, along with its schema. Once the constraint template is created, we can create constraints, which are instances of the constraint template created for specific resources. Think of it as functions and function calls: constraint templates are like functions that constraints invoke with different parameter values (resource kind and other values).

    To get a better understanding of the same, let’s go ahead and create constraint templates and constraints.

    The policy that we want to enforce is to prevent developers from creating a service of type LoadBalancer in the `dev` namespace of the cluster, where they verify the working of other code. Creating services of type LoadBalancer in the dev environment is adding unnecessary costs. 

    Below is the constraint template for the same.

    apiVersion: templates.gatekeeper.sh/v1beta1
    kind: ConstraintTemplate
    metadata:
      name: lbtypesvcnotallowed
    spec:
      crd:
        spec:
          names:
            kind: LBTypeSvcNotAllowed
            listKind: LBTypeSvcNotAllowedList
            plural: lbtypesvcnotallowed
            singular: lbtypesvcnotallowed
      targets:
        - target: admission.k8s.gatekeeper.sh
          rego: |
            package kubernetes.admission

            violation[{"msg": msg}] {
              input.review.kind.kind == "Service"
              input.review.operation == "CREATE"
              input.review.object.spec.type == "LoadBalancer"
              msg := "LoadBalancer Services are not permitted"
            }

    In the constraint template spec, we define a new object kind that we will use when creating constraints. In the target, we specify the Rego code that checks whether a request meets the policy. The Rego code declares a violation: if the request creates a Service of type LoadBalancer, it should be denied.
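    To make the rule's logic concrete, here is a minimal Python sketch that mirrors the three conditions of the violation rule. It only illustrates the evaluation; Gatekeeper evaluates the Rego itself. The function name `violations` and the sample request dictionaries are assumptions made for this illustration.

```python
def violations(review):
    """Return violation messages for an admission review, mirroring the Rego rule."""
    is_service = review.get("kind", {}).get("kind") == "Service"
    is_create = review.get("operation") == "CREATE"
    is_lb = review.get("object", {}).get("spec", {}).get("type") == "LoadBalancer"
    # All three conditions must hold, just like the body of the Rego rule.
    if is_service and is_create and is_lb:
        return [{"msg": "LoadBalancer Services are not permitted"}]
    return []


# Hypothetical admission reviews, trimmed to the fields the rule reads.
lb_request = {
    "kind": {"kind": "Service"},
    "operation": "CREATE",
    "object": {"spec": {"type": "LoadBalancer"}},
}
cluster_ip_request = {
    "kind": {"kind": "Service"},
    "operation": "CREATE",
    "object": {"spec": {"type": "ClusterIP"}},
}

print(violations(lb_request))
print(violations(cluster_ip_request))
```

    An empty result means the request passes; any returned message means the request is denied.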

    Using the above template, we can now define constraints:

    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: LBTypeSvcNotAllowed
    metadata:
      name: deny-lb-type-svc-dev-ns
    spec:
      match:
        kinds:
          - apiGroups: [""]
            kinds: ["Service"]
        namespaces:
          - "dev"

    Here we have specified the kind of the Kubernetes object (Service) on which we want to apply the constraint and we have specified the namespace as dev because we want the constraint to be enforced only on the dev namespace.

    Let’s go ahead and create the constraint template and constraint:

    Note: After creating the constraint template, check that its status reports it as created (true); otherwise you will get an error while creating constraints. It is also advisable to verify the Rego snippet before using it in a constraint template.

    Now let’s try to create a service of type LoadBalancer in the dev namespace:

    kind: Service
    apiVersion: v1
    metadata:
      name: opa-service
    spec:
      type: LoadBalancer
      selector:
        app: opa-app
      ports:
      - protocol: TCP
        port: 80
        targetPort: 8080

    When we try to create a service of type LoadBalancer in the dev namespace, the request is denied by the admission webhook because of the `deny-lb-type-svc-dev-ns` constraint, but creating the same service in the default namespace succeeds.

    Here we are not passing any parameters from the constraint to the Rego policy, but we can do so to make the policy more generic. For example, we can add a field named servicetype to the constraint template and, in the policy code, deny every request whose service type matches the servicetype value defined in the constraint. With this, we can also deny service types other than LoadBalancer, in any namespace of our cluster.
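    The parameterized variant can be sketched in Python as follows. This is an illustration of the idea only, not Gatekeeper code: the function name `violations_for` and the sample data are assumptions, and in a real template the constraint's parameters would be read from `input.parameters` in Rego.

```python
def violations_for(review, parameters):
    """Deny Services whose type equals the constraint's servicetype parameter."""
    denied_type = parameters["servicetype"]
    is_service = review.get("kind", {}).get("kind") == "Service"
    requested_type = review.get("object", {}).get("spec", {}).get("type")
    if is_service and requested_type == denied_type:
        return [{"msg": f"{denied_type} Services are not permitted"}]
    return []


# Hypothetical request: a NodePort Service.
node_port_request = {
    "kind": {"kind": "Service"},
    "object": {"spec": {"type": "NodePort"}},
}

# The same policy logic serves two different constraints.
print(violations_for(node_port_request, {"servicetype": "NodePort"}))
print(violations_for(node_port_request, {"servicetype": "LoadBalancer"}))
```

    One template now backs many constraints: each constraint supplies its own servicetype value and its own namespace match.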

    Gatekeeper also provides auditing for resources that were created before the constraint was applied. The information is available in the status of the constraint objects. This helps us in identifying which objects in our cluster are not compliant with our constraints. 

    Conclusion:

    OPA allows us to apply fine-grained policies in our Kubernetes clusters and can be instrumental in improving the overall security of Kubernetes clusters which has always been a concern for many organizations while adopting or migrating to Kubernetes. It also makes meeting the compliance and audit requirements much simpler. There is some learning curve as we need to get familiar with Rego to code our policies, but the language is very simple and there are quite a few good examples to help in getting started.

  • Demystifying High Availability in Kubernetes Using Kubeadm

    Introduction

    The rise of containers has reshaped the way we develop, deploy, and maintain software. Containers allow us to package the different services that constitute an application into separate containers and to deploy those containers across a set of virtual and physical machines. This created the need for container orchestration tools that automate the deployment, management, scaling, and availability of container-based applications. Kubernetes allows deployment and management of container-based applications at scale.

    One of the main advantages of Kubernetes is how it brings greater reliability and stability to the container-based distributed application, through the use of dynamic scheduling of containers. But, how do you make sure Kubernetes itself stays up when a component or its master node goes down?
     

     

    Why do we need Kubernetes High Availability?

    Kubernetes high availability is about setting up Kubernetes, along with its supporting components, so that there is no single point of failure. A single-master cluster can easily fail, while a multi-master cluster uses multiple master nodes, each of which has access to the same worker nodes. In a single-master cluster, important components such as the API server and controller manager run only on that one master node, and if it fails you cannot create more services, pods, etc. In a Kubernetes HA environment, these components are replicated on multiple masters (usually three), and if any master fails, the other masters keep the cluster up and running.
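    One reason three masters is the usual choice is etcd's majority rule: the cluster stays writable only while more than half of its members are reachable. The short Python sketch below works through that arithmetic, assuming each master runs one etcd member as in this setup; the function names are made up for the illustration.

```python
def quorum(members):
    """Smallest number of etcd members that forms a majority."""
    return members // 2 + 1


def fault_tolerance(members):
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


for n in (1, 2, 3, 5):
    print(f"{n} member(s): quorum={quorum(n)}, tolerated failures={fault_tolerance(n)}")
```

    Two members tolerate no more failures than one, which is why HA setups typically jump from one master straight to three.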

    Advantages of multi-master

    In a Kubernetes cluster, the master node runs the etcd database, API server, controller manager, and scheduler, and manages all the worker nodes. If we have only a single master node and that node fails, nothing can be scheduled on the worker nodes any more and the cluster is effectively lost.

    A multi-master setup, by contrast, provides high availability for a single cluster by running multiple instances of the API server, etcd, controller manager, and scheduler. This not only provides redundancy but also improves network performance, because the masters divide the load among themselves.

    A multi-master setup protects against a wide range of failure modes, from a loss of a single worker node to the failure of the master node’s etcd service. By providing redundancy, a multi-master cluster serves as a highly available system for your end-users.

    Steps to Achieve Kubernetes HA

    Before moving to steps to achieve high-availability, let us understand what we are trying to achieve through a diagram:

    (Image Source: Kubernetes Official Documentation)

    Master Node: Each master node in a multi-master environment runs its own copy of the Kube API server, which can be used for load balancing among the master nodes. Each master node also runs its own copy of the etcd database, which stores all the cluster data. In addition to the API server and etcd, the master node runs the k8s controller manager, which handles replication, and the scheduler, which schedules pods to nodes.

    Worker Node: As in a single-master cluster, the worker nodes in a multi-master cluster run their own components, mainly to orchestrate pods. We need 3 machines that satisfy the Kubernetes master requirements and 3 machines that satisfy the Kubernetes worker requirements.

    For each master that has been provisioned, follow the installation guide to install kubeadm and its dependencies. In this blog we will use k8s 1.10.4 to implement HA.

    Note: The cgroup driver for docker and kubelet differs in some versions of k8s, so make sure you set the cgroup driver to cgroupfs for both docker and kubelet. If the cgroup drivers differ, the master doesn’t come up after a reboot.

    Setup etcd cluster

    1. Install cfssl and cfssljson

    $ curl -o /usr/local/bin/cfssl https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 
    $ curl -o /usr/local/bin/cfssljson https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
    $ chmod +x /usr/local/bin/cfssl*
    $ export PATH=$PATH:/usr/local/bin

    2. Generate certificates on master-0

    $ mkdir -p /etc/kubernetes/pki/etcd
    $ cd /etc/kubernetes/pki/etcd

    3. Create the ca-config.json file in the /etc/kubernetes/pki/etcd folder with the following content. (The cfssl commands in the later steps read it via -config=ca-config.json.)

    {
        "signing": {
            "default": {
                "expiry": "43800h"
            },
            "profiles": {
                "server": {
                    "expiry": "43800h",
                    "usages": [
                        "signing",
                        "key encipherment",
                        "server auth",
                        "client auth"
                    ]
                },
                "client": {
                    "expiry": "43800h",
                    "usages": [
                        "signing",
                        "key encipherment",
                        "client auth"
                    ]
                },
                "peer": {
                    "expiry": "43800h",
                    "usages": [
                        "signing",
                        "key encipherment",
                        "server auth",
                        "client auth"
                    ]
                }
            }
        }
    }

    4. Create ca-csr.json file in /etc/kubernetes/pki/etcd folder with following content.

    {
        "CN": "etcd",
        "key": {
            "algo": "rsa",
            "size": 2048
        }
    }

    5. Create client.json file in /etc/kubernetes/pki/etcd folder with following content.

    {
        "CN": "client",
        "key": {
            "algo": "ecdsa",
            "size": 256
        }
    }

    $ cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
    $ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client

    6. Create a directory  /etc/kubernetes/pki/etcd on master-1 and master-2 and copy all the generated certificates into it.

    7. On all masters, now generate the server and peer certificates in /etc/kubernetes/pki/etcd. To generate them, we need the CA certificates from the previous steps on all masters.

    $ export PEER_NAME=$(hostname)
    $ export PRIVATE_IP=$(ip addr show eth0 | grep -Po 'inet \K[\d.]+')
    
    $ cfssl print-defaults csr > config.json
    $ sed -i 's/www\.example\.net/'"$PRIVATE_IP"'/' config.json
    $ sed -i 's/example\.net/'"$PEER_NAME"'/' config.json
    $ sed -i '0,/CN/{s/example\.net/'"$PEER_NAME"'/}' config.json
    
    $ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server config.json | cfssljson -bare server
    $ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer config.json | cfssljson -bare peer

    These commands replace the default configuration values with your machine’s hostname and IP address. If you run into any problem, check that the hostname and IP address are correct and rerun the cfssl commands.
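    When debugging this step, it helps to know what config.json should contain after the substitutions. The Python sketch below applies the same two replacements to a template. The template shape is an assumption about what `cfssl print-defaults csr` emits (the names section is omitted), and the hostname and IP are hypothetical values.

```python
import json

# Assumed shape of the default CSR template (names section omitted).
DEFAULT_CSR = {
    "CN": "example.net",
    "hosts": ["example.net", "www.example.net"],
    "key": {"algo": "ecdsa", "size": 256},
}


def customize_csr(template, peer_name, private_ip):
    """Apply the same substitutions as the sed commands, in the same order."""
    text = json.dumps(template)
    text = text.replace("www.example.net", private_ip)  # host entry becomes the IP
    text = text.replace("example.net", peer_name)       # CN and host entry become the hostname
    return json.loads(text)


csr = customize_csr(DEFAULT_CSR, "master-0", "10.0.1.10")
print(json.dumps(csr, indent=4))
```

    If the generated config.json still shows example.net anywhere, the PEER_NAME or PRIVATE_IP variable was probably empty when sed ran.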

    8. On all masters, install etcd and set up its environment file.

    $ yum install etcd -y
    $ touch /etc/etcd.env
    $ echo "PEER_NAME=$PEER_NAME" >> /etc/etcd.env
    $ echo "PRIVATE_IP=$PRIVATE_IP" >> /etc/etcd.env

    9. Now we will create a three-node etcd cluster across the 3 master nodes, running etcd as a systemd service on each. Create the file /etc/systemd/system/etcd.service on all masters.

    [Unit]
    Description=etcd
    Documentation=https://github.com/coreos/etcd
    Conflicts=etcd.service
    Conflicts=etcd2.service
    
    [Service]
    EnvironmentFile=/etc/etcd.env
    Type=notify
    Restart=always
    RestartSec=5s
    LimitNOFILE=40000
    TimeoutStartSec=0
    
    ExecStart=/bin/etcd --name <host_name>  --data-dir /var/lib/etcd --listen-client-urls http://<host_private_ip>:2379,http://127.0.0.1:2379 --advertise-client-urls http://<host_private_ip>:2379 --listen-peer-urls http://<host_private_ip>:2380 --initial-advertise-peer-urls http://<host_private_ip>:2380 --cert-file=/etc/kubernetes/pki/etcd/server.pem --key-file=/etc/kubernetes/pki/etcd/server-key.pem --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem --peer-cert-file=/etc/kubernetes/pki/etcd/peer.pem --peer-key-file=/etc/kubernetes/pki/etcd/peer-key.pem --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem --initial-cluster master-0=http://<master0_private_ip>:2380,master-1=http://<master1_private_ip>:2380,master-2=http://<master2_private_ip>:2380 --initial-cluster-token my-etcd-token --initial-cluster-state new --client-cert-auth=false --peer-client-cert-auth=false
    
    [Install]
    WantedBy=multi-user.target

    10. Make sure you replace the following placeholders:

    • <host_name>: the master’s hostname
    • <host_private_ip>: the current host’s private IP
    • <master0_private_ip>: the master-0 private IP
    • <master1_private_ip>: the master-1 private IP
    • <master2_private_ip>: the master-2 private IP
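    The --initial-cluster value is the easiest part of the unit file to get wrong by hand, since it must be identical on all three masters. A small Python sketch for generating it is shown below; the helper name and the IP addresses are hypothetical, and the http scheme and port 2380 follow the unit file above.

```python
def initial_cluster(masters, scheme="http", peer_port=2380):
    """Build the value for etcd's --initial-cluster flag from name -> IP pairs."""
    return ",".join(
        f"{name}={scheme}://{ip}:{peer_port}" for name, ip in masters.items()
    )


# Hypothetical private IPs of the three masters.
masters = {
    "master-0": "10.0.1.10",
    "master-1": "10.0.1.11",
    "master-2": "10.0.1.12",
}

print(initial_cluster(masters))
```

    The member names used here must match the --name value each node passes to etcd.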

    11. Start the etcd service on all three master nodes and check the etcd cluster health:

    $ systemctl daemon-reload
    $ systemctl enable etcd
    $ systemctl start etcd
    
    $ etcdctl cluster-health

    This will show the cluster healthy and connected to all three nodes.

    Setup load balancer

    Cloud providers offer several load-balancing solutions, such as AWS Elastic Load Balancer and GCE load balancing. When no physical load balancer is available, we can set up a virtual IP that always points to a healthy master node. We are using keepalived for this; install keepalived on all master nodes:

    $ yum install keepalived -y

    Create the following configuration file /etc/keepalived/keepalived.conf on all master nodes:

    ! Configuration File for keepalived
    global_defs {
      router_id LVS_DEVEL
    }
    
    vrrp_script check_apiserver {
      script "/etc/keepalived/check_apiserver.sh"
      interval 3
      weight -2
      fall 10
      rise 2
    }
    
    vrrp_instance VI_1 {
        state <state>
        interface <interface>
        virtual_router_id 51
        priority <priority>
        authentication {
            auth_type PASS
            auth_pass velotiotechnologies
        }
        virtual_ipaddress {
            <virtual ip>
        }
        track_script {
            check_apiserver
        }
    }

    • <state> is either MASTER (on the first master node) or BACKUP (on the other master nodes).
    • <interface> is generally the primary interface; in my case it is eth0.
    • <priority> should be higher for the first master node (e.g. 101) and lower for the others (e.g. 100).
    • <virtual ip> should contain the virtual IP of the master nodes.
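    The interplay between priority and the check script's weight -2 decides which node holds the virtual IP. The Python sketch below is a simplified model of that arithmetic, not keepalived's actual election logic: a failing check lowers a node's priority by 2, and the node with the highest resulting priority owns the VIP. The node priorities are hypothetical and kept distinct so no tie-breaking is needed.

```python
def effective_priority(base, weight, check_ok):
    """Priority a node advertises after applying the track_script weight."""
    if weight < 0 and not check_ok:
        return base + weight  # a failing check lowers the priority
    if weight > 0 and check_ok:
        return base + weight  # a passing check raises the priority
    return base


def vip_owner(nodes, weight=-2):
    """Return the node with the highest effective priority."""
    scores = {
        name: effective_priority(base, weight, ok)
        for name, (base, ok) in nodes.items()
    }
    return max(scores, key=scores.get)


# name -> (configured priority, API server check passing?)
healthy = {"master-0": (101, True), "master-1": (100, True), "master-2": (99, True)}
degraded = {"master-0": (101, False), "master-1": (100, True), "master-2": (99, True)}

print(vip_owner(healthy))
print(vip_owner(degraded))
```

    With these values, master-0 drops from 101 to 99 when its API server check fails, so master-1 at 100 takes over the virtual IP. This is why the gap between priorities should be smaller than the weight.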

    Install the following health check script to /etc/keepalived/check_apiserver.sh on all master nodes:

    #!/bin/sh
    
    errorExit() {
        echo "*** $*" 1>&2
        exit 1
    }
    
    curl --silent --max-time 2 --insecure https://localhost:6443/ -o /dev/null || errorExit "Error GET https://localhost:6443/"
    if ip addr | grep -q <VIRTUAL-IP>; then
        curl --silent --max-time 2 --insecure https://<VIRTUAL-IP>:6443/ -o /dev/null || errorExit "Error GET https://<VIRTUAL-IP>:6443/"
    fi

    $ systemctl restart keepalived

    Setup three master node cluster

    Run kubeadm init on master0:

    Create config.yaml file with following content.

    apiVersion: kubeadm.k8s.io/v1alpha1
    kind: MasterConfiguration
    api:
      advertiseAddress: <master-private-ip>
    etcd:
      endpoints:
      - http://<master0-ip-address>:2379
      - http://<master1-ip-address>:2379
      - http://<master2-ip-address>:2379
      caFile: /etc/kubernetes/pki/etcd/ca.pem
      certFile: /etc/kubernetes/pki/etcd/client.pem
      keyFile: /etc/kubernetes/pki/etcd/client-key.pem
    networking:
      podSubnet: <podCIDR>
    apiServerCertSANs:
    - <load-balancer-ip>
    apiServerExtraArgs:
      endpoint-reconciler-type: lease

    Please ensure that the following placeholders are replaced:

    • <master-private-ip> with the private IPv4 address of the master server on which the config file resides.
    • <master0-ip-address>, <master1-ip-address> and <master2-ip-address> with the IP addresses of your three master nodes.
    • <podCIDR> with your Pod CIDR. Please read the CNI network section of the docs for more information. Some CNI providers do not require a value to be set. I am using weave-net as the pod network, hence podCIDR will be 10.32.0.0/12.
    • <load-balancer-ip> with the virtual IP set up in the load balancer in the previous section.

    $ kubeadm init --config=config.yaml

    10. Run kubeadm init on master1 and master2:

    First of all copy /etc/kubernetes/pki/ca.crt, /etc/kubernetes/pki/ca.key, /etc/kubernetes/pki/sa.key, /etc/kubernetes/pki/sa.pub to master1’s and master2’s /etc/kubernetes/pki folder.

    Note: Copying these files is crucial; otherwise the other two master nodes won’t go into the ready state.

    Copy the config file config.yaml from master0 to master1 and master2. We need to change <master-private-ip> to the current master host’s private IP.

    $ kubeadm init --config=config.yaml

    11. Now you can install pod network on all three masters to bring them in the ready state. I am using weave-net pod network, to apply weave-net run:

    $ export kubever=$(kubectl version | base64 | tr -d '\n')
    $ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"

    12. By default, k8s doesn’t schedule any workload on the master nodes. If you want to schedule workloads on the master nodes as well, remove the master taint from all of them using the command:

    $ kubectl taint nodes --all node-role.kubernetes.io/master-

    13. Now that we have functional master nodes, we can join some worker nodes:

    Use the join string you got at the end of kubeadm init command

    $ kubeadm join 10.0.1.234:6443 --token llb1kx.azsbunpbg13tgc8k --discovery-token-ca-cert-hash sha256:1ad2a436ce0c277d0c5bd3826091e72badbd8417ffdbbd4f6584a2de588bf522

    High Availability in action

    With one master node (master-0) down, the Kubernetes HA cluster will look like:

    [root@master-0 centos]# kubectl get nodes
    NAME       STATUS     ROLES     AGE       VERSION
    master-0   NotReady   master    4h        v1.10.4
    master-1   Ready      master    4h        v1.10.4
    master-2   Ready      master    4h        v1.10.4

    [root@master-0 centos]# kubectl get pods -n kube-system
    NAME                              READY     STATUS     RESTARTS   AGE
    kube-apiserver-master-0            1/1       Unknown    0          4h
    kube-apiserver-master-1            1/1       Running    0          4h
    kube-apiserver-master-2            1/1       Running    0          4h
    kube-controller-manager-master-0   1/1       Unknown    0          4h
    kube-controller-manager-master-1   1/1       Running    0          4h
    kube-controller-manager-master-2   1/1       Running    0          4h
    kube-dns-86f4d74b45-wh795          3/3       Running    0          4h
    kube-proxy-9ts6r                   1/1       Running    0          4h
    kube-proxy-hkbn7                   1/1       NodeLost   0          4h
    kube-proxy-sq6l6                   1/1       Running    0          4h
    kube-scheduler-master-0            1/1       Unknown    0          4h
    kube-scheduler-master-1            1/1       Running    0          4h
    kube-scheduler-master-2            1/1       Running    0          4h
    weave-net-6nzbq                    2/2       NodeLost   0          4h
    weave-net-ndx2q                    2/2       Running    0          4h
    weave-net-w2mfz                    2/2       Running    0          4h

    After failing over one master node the Kubernetes cluster is still accessible.

    Even after one node has failed, all the important components are up and running. The cluster is still accessible, and you can create more pods, deployments, services, etc.

    [root@master-1 centos]# kubectl create -f nginx.yaml 
    deployment.apps "nginx-deployment" created
    [root@master-1 centos]# kubectl get pods -o wide
    NAME                                READY     STATUS    RESTARTS   AGE       IP              NODE
    nginx-deployment-75675f5897-884kc   1/1       Running   0          10s       10.117.113.98   master-2
    nginx-deployment-75675f5897-crgxt   1/1       Running   0          10s       10.117.113.2    master-1

    Conclusion

    High availability is an important part of reliability engineering, focused on making a system reliable and avoiding any single point of failure. At first glance, its implementation might seem quite complex, but high availability brings tremendous advantages to systems that require increased stability and reliability. Using a highly available cluster is one of the most important aspects of building a solid infrastructure.

  • Creating GraphQL APIs Using Elixir Phoenix and Absinthe

    Introduction

    GraphQL is the new hype in the field of API technologies. We have been building and using REST APIs for quite some time now and only recently started hearing about GraphQL. GraphQL is usually described as a frontend-directed API technology, as it allows front-end developers to request data in a much simpler way than before. The goal of this query language is to give client applications an intuitive and flexible format for describing their data requirements and interactions.

    The Phoenix Framework runs on Elixir, which is built on top of Erlang. Elixir’s core strengths are scaling and concurrency. Phoenix is a powerful and productive web framework that does not compromise on speed or maintainability. Phoenix comes with built-in support for web sockets, enabling you to build real-time apps.

    Prerequisites:

    1. Elixir & Erlang: Phoenix is built on top of these
    2. Phoenix Web Framework: Used for writing the server application. (It’s a well-known and lightweight framework in Elixir.)
    3. Absinthe: GraphQL library written for Elixir, used for writing queries and mutations.
    4. GraphiQL: Browser-based GraphQL IDE for testing your queries. Consider it similar to what Postman is for testing REST APIs.

    Overview:

    The application we will be developing is a simple blog application written using the Phoenix Framework, with two schemas, User and Post, defined in Accounts and Blog respectively. We will design the application to support APIs related to blog creation and management. We assume you have Erlang, Elixir and mix installed.

    Where to Start:

    At first, we have to create a Phoenix web application using the following command, where <app_name> stands for the name of your application:

    mix phx.new <app_name> --no-brunch --no-html

    • --no-brunch: do not generate brunch files for static asset building. When choosing this option, you will need to manually handle JavaScript dependencies if building HTML apps.

    • --no-html: do not generate HTML views.

    Note: As we are going to work mostly with the API, we don’t need any web pages or HTML views, hence the command args --no-brunch and --no-html.

    Dependencies:

    After we create the project, we need to add dependencies in mix.exs to make GraphQL available for the Phoenix application.

    defp deps do
      [
        {:absinthe, "~> 1.3.1"},
        {:absinthe_plug, "~> 1.3.0"},
        {:absinthe_ecto, "~> 0.1.3"}
      ]
    end

    Structuring the Application:

    We can use the following components to structure our GraphQL application:

    1. GraphQL schema: This goes inside lib/graphql_web/schema/schema.ex. The schema defines your queries and mutations.
    2. Custom types: Your schema may include custom types, which should be defined inside lib/graphql_web/schema/types.ex.
    3. Resolvers: We have to write resolver functions that handle the business logic and map each one to its query or mutation. Resolvers should be defined in their own files; we defined ours in lib/graphql/accounts/user_resolver.ex and lib/graphql/blog/post_resolver.ex.

    We also need to update the router in lib/graphql_web/router.ex so that we can make queries using the GraphQL client, and create a GraphQL pipeline, in the same file, through which the API requests are routed:

    pipeline :graphql do
      plug Graphql.Context  # custom plug defined in lib/graphql_web/plug/context.ex
    end
    
    scope "/api" do
      pipe_through(:graphql)  # pipeline through which the requests are routed
    
      # The more specific path is listed first so the catch-all "/" does not shadow it.
      forward("/graphiql", Absinthe.Plug.GraphiQL, schema: GraphqlWeb.Schema)
      forward("/", Absinthe.Plug, schema: GraphqlWeb.Schema)
    end

    Writing GraphQL Queries:

    Let’s write some GraphQL queries, which can be considered equivalent to GET requests in REST. But before getting into queries, let’s take a look at the GraphQL schema we defined and its resolver mapping:

    defmodule GraphqlWeb.Schema do
      use Absinthe.Schema
      import_types(GraphqlWeb.Schema.Types)
    
      query do
        field :blog_posts, list_of(:blog_post) do
          resolve(&Graphql.Blog.PostResolver.all/2)
        end
    
        field :blog_post, type: :blog_post do
          arg(:id, non_null(:id))
          resolve(&Graphql.Blog.PostResolver.find/2)
        end
    
        field :accounts_users, list_of(:accounts_user) do
          resolve(&Graphql.Accounts.UserResolver.all/2)
        end
    
        field :accounts_user, :accounts_user do
          arg(:email, non_null(:string))
          resolve(&Graphql.Accounts.UserResolver.find/2)
        end
      end
    end

    As you can see above, we have defined four queries in the schema. Let’s pick one query and see what goes into it:

    field :accounts_user, :accounts_user do
      arg(:email, non_null(:string))
      resolve(&Graphql.Accounts.UserResolver.find/2)
    end

    The query above retrieves a particular user by email address.

    1. arg(:email, non_null(:string)): defines a non-null incoming string argument, i.e. the user’s email in our case.
    2. Graphql.Accounts.UserResolver.find/2: the resolver function mapped via the schema, which contains the core business logic for retrieving a user.
    3. :accounts_user: the custom type, which is defined inside lib/graphql_web/schema/types.ex as follows:

    object :accounts_user do
      field(:id, :id)
      field(:name, :string)
      field(:email, :string)
      field(:posts, list_of(:blog_post), resolve: assoc(:blog_posts))
    end

    We need to write a separate resolver function for every query we define. Let’s go over the resolver functions for accounts_user, which are in the lib/graphql/accounts/user_resolver.ex file:

    defmodule Graphql.Accounts.UserResolver do
      alias Graphql.Accounts                    #import lib/graphql/accounts/accounts.ex as Accounts
    
      def all(_args, _info) do
        {:ok, Accounts.list_users()}
      end
    
      def find(%{email: email}, _info) do
        case Accounts.get_user_by_email(email) do
          nil -> {:error, "User email #{email} not found!"}
          user -> {:ok, user}
        end
      end
    end

    These functions are used to list all users or retrieve a particular user by email address. Let’s run them now using the GraphiQL browser. You need to have the server running on port 4000. To start the Phoenix server use:

    mix deps.get      # pulls all the dependencies
    mix deps.compile  # compiles the dependencies
    mix phx.server    # starts the Phoenix server

    Let’s retrieve a user by email address via a query:

    Above, we have retrieved the id, email and name fields by executing the accountsUser query with an email address. GraphQL also allows us to define variables, which we will show later when writing different mutations.
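    For readers without the GraphiQL browser at hand, the same query can be sent from a script. The following Python sketch builds the HTTP request for the accountsUser query using only the standard library. It assumes the endpoint is http://localhost:4000/api (from the router scope and port mentioned earlier) and that the custom context plug lets the request through; the email address and helper names are hypothetical.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:4000/api"  # assumed from the "/api" scope and port 4000

# Absinthe exposes snake_case fields in camelCase, hence accountsUser.
ACCOUNTS_USER_QUERY = """
{
  accountsUser(email: "jane@example.com") {
    id
    email
    name
  }
}
"""


def build_request(query, url=GRAPHQL_URL):
    """Wrap a GraphQL query in a JSON POST request."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query):
    """Send the query to the running Phoenix server and decode the JSON reply."""
    with urllib.request.urlopen(build_request(query)) as response:
        return json.load(response)


request = build_request(ACCOUNTS_USER_QUERY)
print(request.get_method(), request.full_url)
```

    Calling run_query(ACCOUNTS_USER_QUERY) against a running server should return a JSON object with either a data or an errors key, depending on whether the resolver found the user.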

    Let’s execute another query to list all blog posts that we have defined:

    Writing GraphQL Mutations:

    Let’s write some GraphQL mutations. If you have understood the way GraphQL queries are written, mutations are easy to follow: they are defined in the same form as queries, each with a resolver function. The mutations we are going to write are as follows:

    1. create_post: create a new blog post
    2. update_post: update an existing blog post
    3. delete_post: delete an existing blog post

    The mutations look as follows:

    defmodule GraphqlWeb.Schema do
      use Absinthe.Schema
      import_types(GraphqlWeb.Schema.Types)
    
      # The query block shown earlier stays unchanged. Mutations go in their
      # own top-level mutation block, not inside query.
      mutation do
        field :create_post, type: :blog_post do
          arg(:title, non_null(:string))
          arg(:body, non_null(:string))
          arg(:accounts_user_id, non_null(:id))
    
          resolve(&Graphql.Blog.PostResolver.create/2)
        end
    
        field :update_post, type: :blog_post do
          arg(:id, non_null(:id))
          arg(:post, :update_post_params)
    
          resolve(&Graphql.Blog.PostResolver.update/2)
        end
    
        field :delete_post, type: :blog_post do
          arg(:id, non_null(:id))
          resolve(&Graphql.Blog.PostResolver.delete/2)
        end
      end
    end

    Let’s run some mutations to create a post in GraphQL:

    Notice the method is POST and not GET over here.
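    As an illustration of a mutation with variables, the Python sketch below builds the JSON body that would be POSTed for create_post. The argument names come from the schema above, in the camelCase form that Absinthe exposes. The selected fields (id, title, body) are an assumption about the blog_post type, and the helper name and variable values are hypothetical.

```python
import json

# Variables ($title, $body, $userId) keep the values out of the query text.
CREATE_POST_MUTATION = """
mutation CreatePost($title: String!, $body: String!, $userId: ID!) {
  createPost(title: $title, body: $body, accountsUserId: $userId) {
    id
    title
    body
  }
}
"""


def graphql_payload(query, variables):
    """Serialize a GraphQL operation and its variables as a JSON request body."""
    return json.dumps({"query": query, "variables": variables})


payload = graphql_payload(
    CREATE_POST_MUTATION,
    {"title": "Hello GraphQL", "body": "My first post", "userId": "1"},
)
print(payload)
```

    The resulting string is sent as the body of a POST request with a Content-Type of application/json, the same way GraphiQL does it.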

    Let’s dig into the update mutation:

    field :update_post, type: :blog_post do
      arg(:id, non_null(:id))
      arg(:post, :update_post_params)
    
      resolve(&Graphql.Blog.PostResolver.update/2)
    end

    Here, update_post takes two arguments as input: a non-null id and a post parameter of type update_post_params that holds the values to update. The mutation is defined in lib/graphql_web/schema/schema.ex, while the input parameter type is defined in lib/graphql_web/schema/types.ex:

    input_object :update_post_params do
      field(:title, :string)
      field(:body, :string)
      field(:accounts_user_id, :id)
    end

    The difference with previous type definitions is that it’s defined as input_object instead of object.

    The corresponding resolver function is defined as follows:

    def update(%{id: id, post: post_params}, info) do
      # Reuse find/2 to look the post up before updating it.
      case find(%{id: id}, info) do
        {:ok, post} -> post |> Blog.update_post(post_params)
        {:error, _} -> {:error, "Post id #{id} not found"}
      end
    end

         

    Here we have defined a query parameter to specify the id of the blog post to be updated.

    Conclusion

    This is all you need, to write a basic GraphQL server for any Phoenix application using Absinthe.  

    References:

    1. https://www.howtographql.com/graphql-elixir/0-introduction/
    2. https://pragprog.com/book/wwgraphql/craft-graphql-apis-in-elixir-with-absinthe
    3. https://itnext.io/graphql-with-elixir-phoenix-and-absinthe-6b0ffd260094
  • Creating a Frictionless SignUp Experience with Auth0 for your Application

    What is Auth0 and frictionless signup?

    Auth0 is a service that handles your application’s authentication and authorization needs with simple drop-in solutions. It can save time and reduce risk compared to building your own authentication/authorization system. Auth0 also has its own universal login/signup page that can be customized through the dashboard, and it provides APIs to create and manage users.

    A frictionless signup flow allows the user to use a core feature of the application without forcing the user to sign up first. Many companies use this flow, namely Bookmyshow, Redbus, Makemytrip, and Goibibo. 

    So, as an example, we will see how an application like Bookmyshow looks with this frictionless flow. First, let’s assume the user is a first-time user for this application; the user lands on the landing page, selects a movie, selects the theater, selects the number of seats, and then lands on the payment page where they will fill in their contact details (email and mobile number) and proceed to complete the booking flow by paying for the ticket. At this point, the user has accessed the website and made a booking without even signing up.

    Later on, when the user signs up using the same contact details that were provided during booking, they will find their previous bookings and other details waiting for them on the app’s account page.

    What will we be doing in this blog?

    In this blog, we will integrate Auth0 and replicate the feature described above. In this code sample, we will be using react.js for the frontend and nest.js for the backend.

    To keep the blog short, we will only focus on the logic related to the frictionless signup with Auth0. We will not be going through other aspects, like payment service/integration, nest.js, ORM, etc.

    Setup for the Auth0 dashboard:

    Auth0’s documentation is pretty straightforward and easy to understand; we’ll link the sections for this setup, and you can easily sign up and continue your setup with the help of their documentation.

    Do note that you will have to create two applications for this flow. One is a Single-Page Application for your frontend so that you can initiate login from your frontend app and the other is ManagementV2 for your server so that you can use their management APIs to create a user.

    After registering, you will find the client ID and client secret on each application’s details page. You will need these keys to configure Auth0’s SDK and call its APIs from your application.

    Setup for your single-page application:

    To use Auth0’s API, we have to install its SDK. For single-page applications, Auth0 offers auth0-spa-js, a rewrite of auth0-js. If you are using Angular, React, or Vue, Auth0 also provides a framework-specific wrapper for us to use.

    So, we will move on to installing its React wrapper and continuing with the setup:

    npm install @auth0/auth0-react

    Then, we will wrap our app with Auth0Provider and provide the keys from the Auth0 application settings dashboard:

    <Auth0Provider
         domain={process.env.NEXT_PUBLIC_AUTH0_DOMAIN}
         clientId={process.env.NEXT_PUBLIC_AUTH0_CLIENT_ID}
         redirectUri={
           typeof window !== 'undefined' &&
           `${window.location.origin}/auth-callback`
         }
         onRedirectCallback={onRedirectCallback}
         audience={process.env.NEXT_PUBLIC_AUTH0_AUDIENCE}
         // Safari uses ITP, which prevents silent auth. Please refer to https://www.py4u.net/discuss/353302
         useRefreshTokens={true}
         cacheLocation="localstorage"
       >
         <App />
       </Auth0Provider>

    You will find the explanation of the above props and more on Auth0’s React APIs through their GitHub link https://github.com/auth0/auth0-react.

    But we do want to cover one issue with their authenticated state and redirection. We noticed that when Auth0 redirects to our application, the isAuthenticated flag doesn’t get reflected immediately. The states get sequentially updated like so:

    • isLoading: false
      isAuthenticated: false
    • isLoading: true
      isAuthenticated: false
    • isLoading: false
      isAuthenticated: false
    • isLoading: false
      isAuthenticated: true

    This can be a pain if you have some common redirection logic based on the user’s authentication state and user type.

    What we found on Auth0’s community forum is that Auth0 takes some time to parse and update its state, and only after those updates does it call the onRedirectCallback function. That makes onRedirectCallback a safe place for redirection logic, but it brings another issue.

    The function doesn’t have access to Auth0’s context, so it cannot read the user object or any other state. Instead, when onRedirectCallback is called, redirect to a page that holds your redirection logic.

    So, in place of the actual page set in redirectUri, you would want to use a buffer page like the /auth-callback route where it just shows a progress bar and nothing else.
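
    The decision the buffer page has to make can be sketched as a small pure function. This is a minimal sketch, not Auth0’s API: resolvePostLoginRoute, the route table, and the userType field on the user object are all hypothetical names chosen for illustration. It simply ignores every intermediate state from the sequence above and returns a route only once isAuthenticated is true.

    ```javascript
    // Hypothetical route table keyed by a hypothetical `userType` field on the user.
    const ROUTES = new Map([
      ["admin", "/admin"],
      ["customer", "/bookings"],
    ]);

    // Returns the route to navigate to, or null while the buffer page should
    // keep showing the progress bar.
    function resolvePostLoginRoute({ isLoading, isAuthenticated, user }) {
      if (isLoading || !isAuthenticated || !user) {
        return null; // intermediate state: Auth0 has not finished updating yet
      }
      // Unknown user types fall back to the (hypothetical) bookings page.
      return ROUTES.get(user.userType) ?? "/bookings";
    }

    // The four states from the sequence above, in order.
    const states = [
      { isLoading: false, isAuthenticated: false },
      { isLoading: true, isAuthenticated: false },
      { isLoading: false, isAuthenticated: false },
      { isLoading: false, isAuthenticated: true, user: { userType: "customer" } },
    ];
    console.log(states.map((state) => resolvePostLoginRoute(state)));
    // [ null, null, null, '/bookings' ]
    ```

    On the /auth-callback page, you could call this function inside an effect with the values returned by useAuth0() and navigate only when the result is not null.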

    Implementation:

    For login/signup, since we are using the universal page, we don’t have to do much. We just initiate the login with the loginWithRedirect() function from the UI, and Auth0 will handle the rest.

    Now, for the core part of the blog, we will create a createBooking API on our nest.js backend, which will accept an email, mobile number, and booking details (movie, theater location, number of seats), and try to create a booking.

    In this frictionless flow, the application internally creates a user for the booking to refer to; otherwise, it would be difficult to show the bookings once the user signs up and tries to access them.

    So, the logic would go as follows: first, it will check if a user exists with the provided email in our DB. If not, then we will create the user in Auth0 through its management API with a temporary password, and then we will link the newly created Auth0 user in our users table. Then, by using this, we will create a booking.

    Here is an overview of how the createBooking API will look:

    @Post('/bookings/create')
      async createBooking(
        @Body() createBookingDto: CreateBookingDto,
      ): Promise<BookingResponseDto> {
        const { email } = createBookingDto;
        // If no user exists for this email, create an account;
        // otherwise reuse the existing user for the booking
        let user = await this.userRepository.findByEmail(email);

        if (!user) {
          // Use a random password to create the user on Auth0
          const password = Utilities.generatePassword(16);
          const { auth0Response } = await this.createUserOnAuth0(
            email,
            password,
          );
          this.logger.debug(auth0Response, 'Created Auth0 User');

          const userData: CreateUserDto = {
            email,
            auth0UserId: auth0Response['_id'],
          };
          // Create the user in our DB and link it to the Auth0 user
          user = await this.userRepository.addUser(userData);
        }

        const booking = {
          userId: user.id,
          // Assumes the payment was completed earlier by a different service
          transactionId: createBookingDto.transaction.id,
          showId: createBookingDto.show.id,
          theaterId: createBookingDto.theater.id,
          seatNumbers: createBookingDto.seats,
        };
        // Create the booking
        const bookingObject = await this.bookingRepository.bookTicket(booking);

        return new BookingResponseDto(bookingObject);
      }

    As for creating the user on Auth0, we will use the /dbconnections/signup endpoint, which belongs to Auth0’s Authentication API.
    Apart from the config details that we send (client_id, client_secret and connection), it also requires email and password. For the password, we will use a randomly generated one.

    After the user has been created, we will send a password reset email to that address so that the user can set a password and access the account.

    Do note you will have to use the client_id, client_secret, and connection of the ManagementV2 application that was created in the Auth0 dashboard.

    private async createUserOnAuth0(
        email: string,
        password: string,
        retryCount = 0,
      ): Promise<{ auth0Response: Record<string, string>; password: string }> {
        try {
          const axiosResponse = await this.httpService
            .post(
              `https://${configService.getAuth0Domain()}/dbconnections/signup`,
              {
                client_id: configService.getAuth0ClientId(),
                client_secret: configService.getAuth0ClientSecret(),
                connection: configService.getAuth0Connection(),
                email,
                password,
              },
            )
            .toPromise();

          this.logger.log(
            axiosResponse.data.email,
            'Auth0 user created with email',
          );

          // Send the password reset email; it is not awaited, so the booking
          // does not wait for it (errors are logged inside the method)
          this.sendPasswordResetEmail(email);

          return { auth0Response: axiosResponse.data, password };
        } catch (err) {
          this.logger.error(err);
          /**
           * {@link https://auth0.com/docs/connections/database/password-strength}
           * Auth0 does not send a specific error response here, so we assume the
           * password failed Auth0's strength requirements and retry with a new one.
           * We retry at most ERROR_RETRY_COUNT times to avoid an infinite loop.
           */
          if (retryCount < ERROR_RETRY_COUNT) {
            return this.createUserOnAuth0(
              email,
              Utilities.generatePassword(16),
              retryCount + 1,
            );
          }

          throw new HttpException(err, HttpStatus.BAD_REQUEST);
        }
      }
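
    The retry above exists because a random password may not satisfy the connection’s password policy. Utilities.generatePassword is not shown in this blog, so the following is only a hypothetical sketch of such a helper, written in plain JavaScript for Node.js. It assumes the policy asks for at least one lowercase letter, one uppercase letter, one digit, and one symbol; the symbol set is also an assumption, so check your own connection’s policy.

    ```javascript
    const { randomInt } = require("node:crypto");

    const LOWER = "abcdefghijklmnopqrstuvwxyz";
    const UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const DIGITS = "0123456789";
    const SYMBOLS = "!@#$%^&*"; // assumed symbol set
    const ALL = LOWER + UPPER + DIGITS + SYMBOLS;

    // Picks one character using a cryptographically secure random index.
    const pick = (chars) => chars[randomInt(chars.length)];

    // Hypothetical stand-in for Utilities.generatePassword.
    function generatePassword(length = 16) {
      if (length < 4) {
        throw new RangeError("length must be at least 4");
      }
      // Start with one character from each class so the mix is guaranteed.
      const chars = [pick(LOWER), pick(UPPER), pick(DIGITS), pick(SYMBOLS)];
      while (chars.length < length) {
        chars.push(pick(ALL));
      }
      // Fisher-Yates shuffle so the guaranteed characters are not always first.
      for (let i = chars.length - 1; i > 0; i--) {
        const j = randomInt(i + 1);
        [chars[i], chars[j]] = [chars[j], chars[i]];
      }
      return chars.join("");
    }
    ```

    Guaranteeing each character class up front should make the retry path rare, though the retry is still worth keeping for policies with extra rules.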

    To send the password reset email, we will use the /dbconnections/change_password endpoint, which belongs to Auth0’s Authentication API. The code is pretty straightforward.

    This way, the user can set a new password and access their account.

    private async sendPasswordResetEmail(email: string): Promise<void> {
        try {
          const axiosResponse = await this.httpService
            .post(
              `https://${configService.getAuth0Domain()}/dbconnections/change_password`,
              {
                client_id: configService.getAuth0ClientId(),
                client_secret: configService.getAuth0ClientSecret(),
                connection: configService.getAuth0Connection(),
                email,
              },
            )
            .toPromise();
     
          this.logger.log(email, 'Password reset email sent to');
     
          return axiosResponse.data;
        } catch (err) {
          this.logger.error(err);
        }
      }

    With this, the user can make a booking without signing up, and a user is created in Auth0 on their behalf. When they log in later through the universal login page, Auth0 will already have a record for them.

    Conclusion:

    Auth0 is a great platform for managing your application’s authentication and authorization needs if you have a simple enough login/signup flow. It can get a bit tricky when you are trying to implement a non-traditional login/signup flow or a custom flow, which is not supported by Auth0. In such a scenario, you would need to add some custom code as explained in the example above. 

  • Creating Faster and High Performing User Interfaces in Web Apps With Web Workers

    The data we render on a UI originates from different sources like databases, APIs, files, and more. In React applications, when the data is received, we first store it in state and then pass it to the other components in multiple ways for rendering.

    But most of the time, the format of the data is inconvenient for the rendering component. So, we have to format data and perform some prior calculations before we give it to the rendering component.

    Sending raw data directly to the rendering component and processing it inside that component is not recommended. Beyond data processing, heavy background jobs that we would otherwise have to delegate to the backend can now be done on the client side, because React lets us keep business logic on the front end.

    A good practice is to create a separate function for processing that data which is isolated from the rendering logic, so that data processing and data representation will be done separately.

    Why? There are two reasons:

    – The processed data can be shared/used by other components, too.

    – The main reason: if the data processing is time-consuming, you will see some lag on the UI, or in the worst case, the page may become unresponsive.

    As JavaScript is a single-threaded environment, it has only one call stack to execute scripts (in simple terms, you cannot run more than one script at the same time).

    For example, suppose you have to do some DOM manipulations and, at the same time, want to do some complex calculations. You cannot perform these two operations in parallel. If the JavaScript engine is busy with the complex calculation, then all other tasks, like event listeners and rendering callbacks, are blocked for that time, and the page may become unresponsive.

    How can you solve this problem?

    Though JavaScript is single-threaded, many developers mimic concurrency with the help of timer functions and event handlers. For example, you can break a heavy (time-consuming) task into small chunks and use timers to spread their execution. Let’s take a look at the following example.

    Here, the processDataArray function uses setTimeout to split the execution. It processes array items for up to 50 ms, yields for 25 ms, and then processes more items. Once all the array elements have been processed, it sends the result back through finishCallback.

    const processDataArray = (dataArray, finishCallback) => {
     // take a new copy of array
     const todo = dataArray.concat();
     // to store each processed data
     let result = [];
     // timer function
     const timedProcessing = () => {
       const start = +new Date();
       do {
         // process each data item and store its result
         const singleResult = processSingleData(todo.shift());
         result.push(singleResult);
         // check if todo has something to process and the time difference must not be greater than 50
       } while (todo.length > 0 && +new Date() - start < 50);
     
       // check for remaining items to process
       if (todo.length > 0) {
         setTimeout(timedProcessing, 25);
       } else {
         // finished with all the items, initiate finish callback
         finishCallback(result);
       }
     };
     setTimeout(timedProcessing, 25);
    };
    
    
    
    const processSingleData = data => {
     // placeholder: replace this with the real processing logic
     const processedData = data;
     return processedData;
    };

    You can find more about how JavaScript timers work internally here.

    This does not fully solve the problem: the main thread is still doing the computation, so UI events like button clicks or mouse scrolling can still feel delayed. That is a bad user experience when a large array is being processed and the user is impatient.

    A better, truly multithreaded way to solve this problem and run multiple scripts in parallel is to use Web Workers.

    What are Web Workers?

    Web Workers provide a mechanism to run a separate script in the background, where you can do any type of calculation without disturbing the UI. Web Workers run outside the context of the HTML document’s scripts, which makes it easy to execute JavaScript concurrently. With Web Workers, you get real multithreading behavior.

    Communication between the page (main thread) and the worker happens using a simple mechanism. They send messages to each other using the postMessage method and receive messages using the onmessage callback function. Let’s take a look at a simple example:

    In this example, we will delegate the work of multiplying all the numbers in an array to a Web Worker, and the Web Worker returns the result back to the main thread.

    // App.js
    import "./App.css";
    import { useEffect, useState } from "react";
     
    function App() {
     // Lazy initializer: the worker is created only once, on the first render.
     // It loads and executes the worker.js script in the background.
     const [webworker] = useState(() => new window.Worker("worker.js"));
     const [result, setResult] = useState("Calculating....");
     
     useEffect(() => {
       // Attach the handlers before posting so that no response is missed
       webworker.onerror = () => {
         setResult("Error");
       };
     
       webworker.onmessage = (e) => {
         if (e.data) {
           setResult(e.data.result);
         } else {
           setResult("Error");
         }
       };
     
       const message = { multiply: { array: new Array(1000).fill(2) } };
       webworker.postMessage(message);
     }, [webworker]);
    
     useEffect(() => {
       // Stop the worker when the component unmounts
       return () => {
         webworker.terminate();
       };
     }, [webworker]);
    
     return (
       <div className="App">
         <h1>Webworker Example In React</h1>
         <header className="App-header">
           <h1>Multiplication Of large array</h1>
           <h2>Result: {result}</h2>
         </header>
       </div>
     );
    }
     
    export default App;

    // worker.js
    onmessage = (e) => {
     const { multiply } = e.data;
     // check data is correctly framed
     if (multiply && multiply.array.length) {
       // intentionally delay the execution
       setTimeout(() => {
         // this post back the result to the page
         postMessage({
           result: multiply.array.reduce(
             (firstItem, secondItem) => firstItem * secondItem
           ),
         });
       }, 2000);
     } else {
       postMessage({ result: 0 });
     }
    };

    If the worker script throws an exception, you can handle it by attaching a callback function to the onerror property of the worker in the App.js script.

    From the main thread, you can stop the worker immediately by calling its terminate method. A terminated worker cannot be restarted, so you need to create a new instance if you need one again.

    You can find a working example here.

    Use cases of Web Workers:

    Charting middleware – Suppose you have to design a dashboard for a business retention application that shows customer engagement analytics through a pivot table, pie charts, and bar charts. Converting the data into the format each table or chart expects involves heavy processing. On a single thread, this may cause the UI to stop updating, freeze, or become unresponsive. Here, we can delegate the processing logic to Web Workers so that the main thread is always available to handle other UI events.

    Emulating Excel functionality – For example, if you have thousands of rows in a spreadsheet and each one of them needs some long-running calculations, you can write custom functions containing the processing logic and put them in the Web Worker’s script.

    Real-time text analyzer – This is another good example, where a Web Worker analyzes the text typed by the user in real time to show the word count, character count, repeated word count, etc. With a traditional implementation, you may experience performance issues as the text grows, but this can be optimized by using Web Workers.
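
    As a rough illustration of the text analyzer use case, the worker script could look like the sketch below. The analyzeText function, the message shape ({ text }), and the definition of a word are assumptions made for this example. The analysis is a pure function with no DOM access, which is what makes it easy to move into a worker.

    ```javascript
    // Pure analysis function; it needs no DOM access, so it can run in a worker.
    function analyzeText(text) {
      // Treat runs of letters, digits, and apostrophes as words (an assumption).
      const words = text.toLowerCase().match(/[\p{L}\p{N}']+/gu) ?? [];
      const counts = new Map();
      for (const word of words) {
        counts.set(word, (counts.get(word) ?? 0) + 1);
      }
      // Count the distinct words that occur more than once.
      let repeatedWords = 0;
      for (const count of counts.values()) {
        if (count > 1) repeatedWords += 1;
      }
      return { characters: text.length, words: words.length, repeatedWords };
    }

    // Worker wiring: attached only when running inside a worker scope.
    if (typeof self !== "undefined" && typeof window === "undefined") {
      self.onmessage = (e) => self.postMessage(analyzeText(e.data.text));
    }

    console.log(analyzeText("the cat and the hat"));
    // { characters: 19, words: 5, repeatedWords: 1 }
    ```

    The page would then post { text } to the worker on each input change and render whatever comes back in onmessage.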

    Web Worker limitations:

    Web Workers are amazing and quite simple to use, but because a worker runs in a separate thread, it does not have access to the window object, the document object, or the parent object. We also cannot pass functions through postMessage.

    But Web Workers have access to:

    – Navigator object

    – Location object (read-only)

    – XMLHttpRequest

    – setTimeout, setInterval, clearTimeout, clearInterval

    – You can import other scripts in WebWorker using the importScripts() method
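
    The restriction on passing functions comes from the structured clone algorithm, which postMessage uses to copy messages between threads. The sketch below assumes a runtime that exposes the global structuredClone function (modern browsers and recent Node.js versions), which applies the same algorithm; canBePosted is a hypothetical helper name.

    ```javascript
    // Hypothetical helper: reports whether a value survives the structured
    // clone algorithm, which postMessage uses to copy messages.
    function canBePosted(value) {
      try {
        structuredClone(value);
        return true;
      } catch (err) {
        // Functions cannot be cloned, so cloning throws a DataCloneError.
        return false;
      }
    }

    console.log(canBePosted({ multiply: { array: [2, 2, 2] } })); // true
    console.log(canBePosted({ createdAt: new Date(), tags: new Set(["a"]) })); // true
    console.log(canBePosted({ callback: () => {} })); // false
    ```

    In practice, this means a message should carry plain data, and the worker should decide which of its own functions to run based on that data.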

    Here are some other types of workers:
    Shared Worker
    Service Worker
    Audio Worklet

    Conclusion:

    Web Workers make our lives easier by running jobs in parallel in the background. However, they are relatively heavyweight, with a high startup cost and a high per-instance memory cost, so, per the WHATWG HTML specification, they are not intended to be used in large numbers.