In our previous blog, we saw that Elasticsearch is a highly scalable, open-source full-text search and analytics engine built on top of Apache Lucene. Elasticsearch allows you to store, search, and analyze huge volumes of data quickly and in near real time.
Basic Concepts –
Index – Large collection of JSON documents. Can be compared to a database in relational databases. Every document must reside in an index.
Shards – Since there is no limit on the number of documents that can reside in an index, indices are often horizontally partitioned into shards that reside on nodes in the cluster. The maximum number of documents allowed in a single shard is 2,147,483,519 (a Lucene-imposed limit).
Type – Logical partition of an index. Similar to a table in relational databases.
Fields – Similar to a column in relational databases.
Analyzers – Used while indexing/searching the documents. These contain “tokenizers” that split phrases/text into tokens and “token-filters”, that filter/modify tokens during indexing & searching.
Mappings – Combination of Field + Analyzers. It defines how your fields can be stored & indexed.
Inverted Index
ES uses Inverted Indexes under the hood. Inverted Index is an index which maps terms to documents containing them.
Let’s say we have 3 documents:
1. Food is great
2. It is raining
3. Wind is strong
An inverted index for these documents can be constructed as:
Term     Documents
food     1
great    1
is       1, 2, 3
it       2
raining  2
strong   3
wind     3
The terms in the dictionary are stored in a sorted order to find them quickly.
Searching for multiple terms is done by looking up each term in the index and then performing a UNION or INTERSECTION on their document lists to fetch the relevant matching documents.
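The lookup-and-combine idea can be sketched in a few lines of JavaScript – a toy illustration of the concept, not Lucene's actual implementation (document IDs here are zero-based array indices):

```javascript
// Build a toy inverted index: term -> set of document IDs containing it.
function buildInvertedIndex(documents) {
  const index = new Map();
  documents.forEach((text, docId) => {
    // naive analysis: lowercase and split on whitespace
    for (const term of text.toLowerCase().split(/\s+/)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(docId);
    }
  });
  return index;
}

const docs = ['Food is great', 'It is raining', 'Wind is strong'];
const index = buildInvertedIndex(docs);

// "is" appears in all three documents
console.log([...index.get('is')]); // [0, 1, 2]

// an AND query is an intersection of the two posting lists
const hits = [...index.get('is')].filter((d) => index.get('wind').has(d));
console.log(hits); // [2]
```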
An ES index is spread across multiple shards; while indexing, each document is routed to a shard based on a hash of its routing value (the document ID by default). We can customize the routing value to control which shard a document goes to, and which shards search requests are sent to.
ES Index is made of multiple Lucene indexes, which in turn, are made up of index segments. These are write once, read many types of indices, i.e the index files Lucene writes are immutable (except for deletions).
Analyzers –
Analysis is the process of converting text into tokens or terms which are added to the inverted index for searching. Analysis is performed by an analyzer, which can be either built-in or custom.
We can define a single analyzer for both indexing & searching, or separate index and search analyzers for a mapping.
Building blocks of analyzer-
Character filters – receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters.
Tokenizers – receives a stream of characters, breaks it up into individual tokens.
Token filters – receives the token stream and may add, remove, or change tokens.
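These three building blocks can be combined into a custom analyzer at index-creation time. A sketch of what such a definition could look like (the index and analyzer names are made up; `html_strip`, `standard`, `lowercase`, and `stop` are standard built-ins):

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
```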
Some Commonly used built-in analyzers –
1. Standard –
Divides text into terms on word boundaries, lower-cases all terms, and removes punctuation. It supports removing stop words, but none are removed by default.
Text: The 2 QUICK Brown-Foxes jumped over the lazy dog’s bone.
Output: [the, 2, quick, brown, foxes, jumped, over, the, lazy, dog’s, bone]
Some Commonly used built-in tokenizers –
1. Standard –
Divides text into terms on word boundaries, removes most punctuation.
2. Letter –
Divides text into terms whenever it encounters a non-letter character.
3. Lowercase –
Letter tokenizer which lowercases all tokens.
4. Whitespace –
Divides text into terms whenever it encounters any white-space character.
5. UAX-URL-EMAIL –
Standard tokenizer which recognizes URLs and email addresses as single tokens.
6. N-Gram –
Divides text into terms when it encounters anything from a list of specified characters (e.g. whitespace or punctuation), and returns n-grams of each word: a sliding window of continuous letters, e.g. quick → [qu, ui, ic, ck, qui, quic, quick, uic, uick, ick].
7. Edge-N-Gram –
It is similar to the N-Gram tokenizer, with n-grams anchored to the start of the word (prefix-based n-grams), e.g. quick → [q, qu, qui, quic, quick].
8. Keyword –
Emits exact same text as a single term.
Make your mappings right –
Analyzers, if not designed right, can increase your search time drastically.
Avoid using regular expressions in queries as much as possible. Let your analyzers handle them.
ES provides multiple tokenizers (standard, whitespace, ngram, edge-ngram, etc) which can be directly used, or you can create your own tokenizer.
Take a simple use-case where we had to search for a user who has either “brad” in their name or “brad_pitt” in their email (a substring-based search). Without proper analyzers on the mapping, one would simply write a regex for this query.
With proper analyzers on the mapping, the same lookup took us 109 ms to fetch 1 lakh out of 60 million documents.
Thus, a search query that earlier took more than 10-25s was reduced to under 800-900ms for the same set of records.
Had the use-case been to search results where name starts with “brad” or email starts with “brad_pitt” (prefix based search), it is better to go for edge-n-gram analyzer or suggesters.
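For the prefix-based case, an edge_ngram setup might look like this – a sketch only, with illustrative index/analyzer/field names, and `min_gram`/`max_gram` values that need tuning for your data:

```
PUT users
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "prefix_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "prefix_analyzer": {
          "type": "custom",
          "tokenizer": "prefix_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "prefix_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
```

Using a separate `search_analyzer` avoids n-gramming the query string itself, so a search for “brad” matches documents whose name begins with those letters.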
Performance Improvement with Filter Queries –
Use Filter queries whenever possible.
ES usually scores documents and returns them sorted by score. This can hurt performance when scoring is irrelevant to our use-case. In such scenarios, use “filter” clauses: filters do not compute scores (a document either matches or it doesn’t) and their results can be cached.
This will reduce query-time by a few milliseconds.
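For example, non-scoring clauses can be moved into a bool filter (field names here are hypothetical):

```
GET users/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "city": "pune" } },
        { "range": { "age": { "gte": 18 } } }
      ]
    }
  }
}
```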
Re-indexing made faster –
Before creating any mappings, know your use-case well.
ES does not allow altering existing mappings, unlike the “ALTER” command in relational databases, although we can keep adding new fields to the index.
The only way to change existing mappings is to create a new index, re-index the existing documents into it, and then alias the new index to the required name – with ZERO downtime in production. Note – this process can take days if you have millions of records to re-index.
To re-index faster, we can change a few settings –
1. Disable swapping – Since no requests will be directed to the new index till indexing is done, we can safely disable swap. Command for Linux machines –
sudo swapoff -a
2. Disable refresh_interval for ES – Default refresh_interval is 1s which can safely be disabled while documents are getting re-indexed.
3. Change bulk size while indexing – ES usually indexes documents in chunks of 1,000. It is preferable to increase this default to approximately 5-10k, although we need to find the sweet spot while reindexing to avoid overloading the current index.
4. Reset replica count to 0 – ES creates at least 1 replica per shard, by default. We can set this to 0 while indexing & reset it to required value post indexing.
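Points 2 and 4 above can be applied in a single settings request while re-indexing (the index name is hypothetical); remember to restore both values once indexing finishes:

```
PUT new_index/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}
```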
Conclusion
Elasticsearch is a very powerful database for text-based search, and the Elastic ecosystem is widely used for reporting, alerting, machine learning, and more. This article gives an overview of Elasticsearch mappings and how creating the right mappings can improve your query performance & accuracy. Giving the right mappings and the right resources to your Elasticsearch cluster can do wonders.
Elasticsearch is currently the most popular way to implement free-text search and analytics in applications. It is highly scalable and can easily manage petabytes of data. It supports a variety of use cases: letting users easily search through any portal, collecting and analyzing log data, and building business intelligence dashboards to quickly analyze and visualize data.
This blog acts as an introduction to Elasticsearch and covers the basic concepts of clusters, nodes, index, document and shards.
What is Elasticsearch?
Elasticsearch (ES) is an open-source, distributed, highly scalable data store built around Lucene – a search engine library that supports extremely fast full-text search. It is beautifully crafted software which hides the internal complexities and provides full-text search capabilities through simple REST APIs. Elasticsearch is written in Java with Apache Lucene at its core. It should be clear that Elasticsearch is not like a traditional RDBMS: it is not suitable for transactional database needs and hence, in my opinion, should not be your primary data store. It is common practice to use a relational database as the primary data store and inject only the required data into Elasticsearch.
Elasticsearch is meant for fast text search. There are several functionalities, which make it different from RDBMS. Unlike RDBMS, Elasticsearch stores data in the form of a JSON document, which is denormalized and doesn’t support transactions, referential integrity, joins, and subqueries.
Elasticsearch works with structured, semi-structured, and unstructured data as well. In the next section, let’s walk through the various components in Elasticsearch.
Elasticsearch Components
Cluster
One or more servers collectively providing indexing and search capabilities form an Elasticsearch cluster. The cluster size can vary from a single node to thousands of nodes, depending on the use cases.
Node
A node is a single physical or virtual machine that holds all or part of your data and provides computing power for indexing and searching it. Every node is identified by a unique name; if no name is specified, a random UUID is assigned as the node identifier at startup. Every node configuration has the property `cluster.name`, and a cluster forms automatically from all nodes that share the same `cluster.name` at startup.
A node has to accomplish several duties such as:
storing the data
performing operations on data (indexing, searching, aggregation, etc.)
maintaining the health of the cluster
Each node in a cluster can do all these operations. Elasticsearch provides the capability to split responsibilities across different nodes. This makes it easy to scale, optimize, and maintain the cluster. Based on the responsibilities, the following are the different types of nodes that are supported:
Data Node
Data node is the node that has storage and computation capability. Data node stores the part of data in the form of shards (explained in the later section). Data nodes also participate in the CRUD, search, and aggregate operations. These operations are resource-intensive, and hence, it is a good practice to have dedicated data nodes without having the additional load of cluster administration. By default, every node of the cluster is a data node.
Master Node
Master nodes are reserved to perform administrative tasks. Master nodes track the availability/failure of the data nodes. The master nodes are responsible for creating and deleting the indices (explained in the later section).
This makes the master node a critical part of the Elasticsearch cluster. It has to be stable and healthy. A single master node for a cluster is certainly a single point of failure. Elasticsearch provides the capability to have multiple master-eligible nodes. All the master eligible nodes participate in an election to elect a master node. It is recommended to have a minimum of three nodes in the cluster to avoid a split-brain situation. By default, all the nodes are both data nodes as well as master nodes. However, some nodes can be master-eligible nodes only through explicit configuration.
Coordinating-Only Node
Any node which is not a master node or a data node is a coordinating node. Coordinating nodes act as smart load balancers; they are exposed to end-user requests and route requests appropriately between data nodes and master nodes.
For example, a user’s search request is sent to different data nodes. Each data node searches locally and sends its result back to the coordinating node, which aggregates the results and returns them to the user.
There are a few concepts that are core to Elasticsearch. Understanding these basic concepts will tremendously ease the learning process.
Index
Index is a container that stores data, similar to a database in the relational world. An index contains a collection of documents that have similar characteristics or are logically related. Taking the example of an e-commerce website, there would be one index for products, one for customers, and so on. Indices are identified by lowercase names. The index name is required to perform add, update, and delete operations on documents.
Type
Type is a logical grouping of the documents within the index. In the previous example of product index, we can further group documents into types, like electronics, fashion, furniture, etc. Types are defined on the basis of documents having similar properties in it. It isn’t easy to decide when to use the type over the index. Indices have more overheads, so sometimes, it is better to use different types in the same index for better performance. There are a couple of restrictions to use types as well. For example, two fields having the same name in different types of documents should be of the same datatype (string, date, etc.).
Document
Document is the basic unit of information indexed by Elasticsearch, represented in JSON format. We can add as many documents as we want to an index. The following snippet shows how to create a document of type mobile in the index store. We will cover the individual fields of the document in the Mapping Types section.
```
POST <hostname:port>/store/mobile/
{
  "name": "Motorola G5",
  "model": "XT3300",
  "release_date": "2016-01-01",
  "features": "16 GB ROM | Expandable Upto 128 GB | 5.2 inch Full HD Display | 12MP Rear Camera | 5MP Front Camera | 3000 mAh Battery | Snapdragon 625 Processor",
  "ram_gb": "3",
  "screen_size_inches": "5.2"
}
```
Mapping Types
To create different types in an index, we need mapping types (or simply mappings) to be specified during index creation. Mappings can be defined as a list of directives given to Elasticsearch about how the data is supposed to be stored and retrieved. It is important to provide mapping information at index-creation time based on how we want to retrieve our data later. In the context of relational databases, think of mappings as a table schema.
Mapping provides information on how to treat each JSON field. For example, a field can be of type date, geo-location, or person name. Mappings also allow specifying which fields participate in full-text search, and which analyzers are used to transform and decorate data before storing it in the index. If no mapping is provided, Elasticsearch tries to infer the schema itself, which is known as Dynamic Mapping.
Each mapping type has Meta Fields and Properties. The snippet below shows the mapping of the type mobile.
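Based on the document shown earlier, a sketch of what such a mapping could look like – the field types here are assumptions for illustration:

```
PUT <hostname:port>/store
{
  "mappings": {
    "mobile": {
      "properties": {
        "name": { "type": "text" },
        "model": { "type": "keyword" },
        "release_date": { "type": "date" },
        "features": { "type": "text" },
        "ram_gb": { "type": "keyword" },
        "screen_size_inches": { "type": "keyword" }
      }
    }
  }
}
```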
As the name indicates, meta fields store additional information about the document. Meta fields are meant mostly for internal usage, and it is unlikely that the end-user has to deal with them. Meta field names start with an underscore. There are around ten meta fields in total; we will talk about some of them here:
_index
It stores the name of the index the document belongs to. This is used internally to store/search the document within an index.
_type
It stores the type of the document. To get better performance, it is often included in search queries.
_id
This is the unique id of the document. It is used to access specific document directly over the HTTP GET API.
_source
This holds the original JSON document before any analyzers/transformations are applied. It is important to note that Elasticsearch can only query fields that are indexed (that have a mapping). The _source field is not indexed and hence can’t be queried, but it can be included in the final search result.
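Putting _id and _source together: a document can be fetched directly by its id over the HTTP GET API, and the response includes the stored _source JSON. For example, for the store index used earlier (the id 1 is hypothetical):

```
GET <hostname:port>/store/mobile/1
```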
Fields Or Properties
List of fields specifies which all JSON fields in the document should be included in a particular type. In the e-commerce website example, mobile can be a type. It will have fields, like operating_system, camera_specification, ram_size, etc.
Fields also carry the data type information with them. This directs Elasticsearch to treat the specific fields in a particular way of storing/searching data. Data types are similar to what we see in any other programming language. We will talk about a few of them here.
Simple Data Types
Text
This data type is used to store full text, like a product description. These fields participate in full-text search and are analyzed while storing, which enables searching them by the individual words within. Such fields are not used in sorting and aggregation queries.
Keyword
This type is also used to store text data, but unlike Text, it is stored without being analyzed. This makes it suitable for information like a user’s mobile number, city, or age. These fields are used in filter, aggregation, and sorting queries, e.g., list all users from a particular city and filter them by age.
Numeric
Elasticsearch supports a wide range of numeric types: long, integer, short, byte, double, float.
There are a few more data types to support date, boolean (true/false, on/off, 1/0), IP (to store IP addresses).
Special Data Types
Geo Point
This data type is used to store geographical location. It accepts latitude and longitude pair. For example, this data type can be used to arrange the user’s photo library by their geographical location or graphically display the locations trending on social media news.
Geo Shape
It allows storing arbitrary geometric shapes like rectangle, polygon, etc.
Completion Suggester
This data type is used to provide auto-completion feature over a specific field. As the user types certain text, the completion suggester can guide the user to reach particular results.
Complex Data Type
Object
If you know JSON well, this concept won’t be new for you. Elasticsearch also allows storing nested JSON object structure as a document.
Nested
The Object data type is less useful due to its underlying representation in the Lucene index: Lucene does not support inner JSON objects, so ES flattens the original JSON to make it storable. As a result, fields of multiple inner objects get merged into one, which can lead to wrong search results. Most of the time, you should prefer the Nested data type over Object.
Shards
Shards are what enable Elasticsearch to scale horizontally. An index can store millions of documents and occupy terabytes of data, which can cause problems with performance, scalability, and maintenance. Let’s see how shards help achieve scalability.
Indices are divided into multiple units called shards (refer to the diagram below). A shard is a full-featured subset of an index. Shards of the same index can reside on the same or different nodes of the cluster. Sharding decides the degree of parallelism for search and indexing operations and allows the cluster to grow horizontally. The number of shards per index can be specified at the time of index creation; by default, 5 shards are created (in Elasticsearch 7 and later, the default is 1). Once the index is created, the number of shards cannot be changed – to change it, you must reindex the data.
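The shard count is given in the index settings at creation time, for example (index name and values are illustrative):

```
PUT <hostname:port>/my_index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
```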
Replication
Hardware can fail at any time. To ensure fault tolerance and high availability, ES provides a feature to replicate the data: shards can be replicated. A shard that is copied is called a primary shard; the copy of the primary shard is called a replica shard, or simply a replica. Like the number of shards, the number of replicas can also be specified at index-creation time. Replication serves two purposes:
High Availability – A replica is never created on the same node as its primary shard. This ensures that data remains available through the replica shard even if a whole node fails.
Performance – Replicas also contribute to search capabilities: search queries are executed in parallel across the replicas.
To summarize: to achieve high availability and performance, an index is split into multiple shards, and in a production environment, multiple replicas are created for every index. In a replicated index, only primary shards serve write requests, but all shards (primary as well as replicas) can serve read/query requests. The replication factor is defined at index-creation time and can be changed later if required. Choosing the number of shards, however, is an important exercise, as it can’t be changed once defined; in critical scenarios, changing the number of shards requires creating a new index with the required shards and reindexing the old data.
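Changing the replication factor later is a simple settings update (index name hypothetical):

```
PUT <hostname:port>/my_index/_settings
{
  "index": { "number_of_replicas": 2 }
}
```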
Summary
In this blog, we have covered the basic but important aspects of Elasticsearch. In the following posts, I will talk about how indexing & searching works in detail. Stay tuned!
In the world of data centers with wings and wheels, there is an opportunity to lay some work off from the centralized cloud computing by taking less compute intensive tasks to other components of the architecture. In this blog, we will explore the upcoming frontier of the web – Edge Computing.
What is the “Edge”?
The ‘Edge’ refers to having computing infrastructure closer to the source of data. It is a distributed framework where data is processed as close to the originating data source as possible. This infrastructure requires effective use of resources that may not be continuously connected to a network, such as laptops, smartphones, tablets, and sensors. Edge Computing covers a wide range of technologies including wireless sensor networks, cooperative distributed peer-to-peer ad-hoc networking and processing (also classifiable as local cloud/fog computing), mobile edge computing, distributed data storage and retrieval, autonomic self-healing networks, remote cloud services, augmented reality, and more.
Cloud Computing is expected to go through a phase of decentralization. Edge Computing is coming up with an ideology of bringing compute, storage and networking closer to the consumer.
But Why?
Legit question! Why do we even need Edge Computing? What are the advantages of having this new infrastructure?
Imagine the case of a self-driving car that continuously sends a live stream to central servers. Now the car has to make a crucial decision. The consequences can be disastrous if the car waits for the central servers to process the data and respond. Although algorithms like YOLOv2 have sped up object detection, the latency lies in the part of the system where the car has to send terabytes to the central server and then wait for the response before acting. Hence, we need basic processing – like deciding when to stop or decelerate – to be done in the car itself.
The goal of Edge Computing is to minimize the latency by bringing the public cloud capabilities to the edge. This can be achieved in two forms – custom software stack emulating the cloud services running on existing hardware, and the public cloud seamlessly extended to multiple point-of-presence (PoP) locations.
Following are some promising reasons to use Edge Computing:
Privacy: Avoid sending all raw data to be stored and processed on cloud servers.
Real-time responsiveness: Sometimes the reaction time can be a critical factor.
Reliability: The system is capable of working even when disconnected from cloud servers, and a single point of failure is removed.
To understand the points mentioned above, let’s take the example of a device that responds to a hot keyword – say, Jarvis from Iron Man. Imagine if your personal Jarvis sent all of your private conversations to a remote server for analysis. Instead, it is intelligent enough to respond only when it is called; at the same time, it is real-time and reliable.
Intel CEO Brian Krzanich said at an event that autonomous cars will generate 40 terabytes of data for every eight hours of driving. With that flood of data, transmission time goes up substantially. For self-driving cars, real-time or near-real-time decisions are essential: the car needs to decide in a split second whether to stop, or the consequences can be disastrous. Here, edge computing infrastructure comes to the rescue.
Another example is drones or quadcopters. Say we are using them to identify people or deliver relief packages; the machines should then be intelligent enough to make basic decisions locally, such as changing path to avoid obstacles.
This model of Edge Computing is basically an extension of the public cloud. Content Delivery Networks are a classic example of this topology, in which static content is cached and delivered through geographically distributed edge locations.
Vapor IO is an emerging player in this category, attempting to build infrastructure for the cloud edge. Vapor IO’s products, such as the Vapor Chamber, are self-monitored: sensors embedded in them allow continuous monitoring and evaluation by Vapor’s software, the Vapor Edge Controller (VEC). Vapor IO has also built OpenDCRE, which we will see later in this blog.
The fundamental difference between the device edge and the cloud edge lies in the deployment and pricing models. Which model to deploy is specific to the use case, and sometimes it is an advantage to deploy both.
Edges around you
Edge Computing examples can be increasingly found around us:
Smart street lights
Automated Industrial Machines
Mobile devices
Smart Homes
Automated Vehicles (cars, drones etc)
Data transmission is expensive. By bringing compute closer to the origin of the data, latency is reduced and end users get a better experience. Some of the evolving use cases of Edge Computing are Augmented Reality (AR), Virtual Reality (VR), and the Internet of Things. For example, the rush people got while playing an augmented-reality-based Pokémon game wouldn’t have been possible if “real-timeliness” were missing from the game – it was possible because the smartphone itself was doing the AR, not the central servers. Even Machine Learning (ML) can benefit greatly from Edge Computing: the heavy-duty training of ML models can be done in the cloud, while the trained model is deployed on the edge for near-real-time, or even real-time, predictions. In today’s data-driven world, edge computing is becoming a necessary component.
There is a lot of confusion between Edge Computing and IoT. Put simply, Edge Computing is the intelligent Internet of Things (IoT), and it complements traditional IoT. In the traditional IoT model, all devices – sensors, mobiles, laptops, etc. – are connected to a central server. Imagine you command your lamp to switch off: for such a simple task, the data needs to be transmitted to the cloud and analyzed there before the lamp receives a command to switch off. Edge Computing brings the computation closer to your home: either the fog layer between the lamp and the cloud servers is smart enough to process the request, or the lamp itself is.
The image below shows a standard IoT implementation, where everything is centralized; the Edge Computing philosophy, in contrast, decentralizes the architecture.
The Fog
Sandwiched between the edge layer and the cloud layer is the Fog Layer, which bridges the other two.
The difference between fog and edge computing is described in this article –
Fog Computing – Fog computing pushes intelligence down to the local area network level of network architecture, processing data in a fog node or IoT gateway.
Edge computing pushes the intelligence, processing power and communication capabilities of an edge gateway or appliance directly into devices like programmable automation controllers (PACs).
How do we manage Edge Computing?
Device Relationship Management (DRM) refers to managing and monitoring interconnected components over the internet. AWS offers IoT Core and Greengrass, Nebbiolo Technologies has developed Fog Node and FogOS, and Vapor IO has OpenDCRE, with which one can control and monitor data centers.
Following image (source – AWS) shows how to manage ML on Edge Computing using AWS infrastructure.
AWS Greengrass makes it possible for users to use Lambda functions to build IoT devices and application logic. Specifically, AWS Greengrass provides cloud-based management of applications that can be deployed for local execution. Locally deployed Lambda functions are triggered by local events, messages from the cloud, or other sources.
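As a concrete sketch of such a locally deployed function, here is a minimal Node.js Lambda-style handler. The event shape and threshold are hypothetical, invented for illustration; in a real deployment this function would be exported as the Lambda handler and triggered by local messages:

```javascript
// A minimal handler of the kind Greengrass can deploy for local execution.
// The event shape (temperatureCelsius) and the 75-degree threshold are
// made-up examples, not part of any real Greengrass API.
const handler = async (event) => {
  // react to a locally published message, e.g. from a sensor,
  // without a round trip to the cloud
  if (event.temperatureCelsius > 75) {
    return { action: 'shutdown', reason: 'overheat' };
  }
  return { action: 'none' };
};
```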
This GitHub repo demonstrates a traffic light example using two Greengrass devices, a light controller, and a traffic light.
Conclusion
We believe that next-gen computing will be influenced a lot by Edge Computing and will continue to explore new use-cases that will be made possible by the Edge.
After spending a couple of years in JavaScript development, I’ve realized how incredibly important design patterns are in modern JavaScript (ES6). And I’d love to share my experience and knowledge on the subject, hoping you’ll make design patterns a critical part of your development process as well.
Note: All the examples covered in this post are implemented with ES6 features, but you can also integrate the design patterns with ES5.
At Velotio, we always follow best practices to achieve highly maintainable and more robust code. And we are strong believers of using design patterns as one of the best ways to write clean code.
In the post below, I’ve listed the most useful design patterns I’ve implemented so far and how you can implement them too:
1. Module
The module pattern simply allows you to keep units of code cleanly separated and organized.
Modules promote encapsulation, which means the variables and functions are kept private inside the module body and can’t be overwritten.
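For instance, a hypothetical modules/sum.js – the helper stays private, only sum is exported:

```javascript
// modules/sum.js (hypothetical module)
// validate is private: it is not exported, so it cannot be
// accessed or overwritten from outside this module.
const validate = (...args) => args.every((n) => typeof n === 'number');

// sum is the module's public API
export const sum = (...args) => {
  if (!validate(...args)) throw new TypeError('sum expects only numbers');
  return args.reduce((total, n) => total + n, 0);
};
```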
```javascript
// usage
import { sum } from 'modules/sum';

const result = sum(20, 30); // 50
```
ES6 also allows us to export the module as default. The following example gives you a better understanding of this.
```javascript
// All the variables and functions which are not exported are private
// within the module and cannot be used outside. Only the exported
// members are public and can be used by importing them.

// Here businessList is a private member of the city module
const businessList = new WeakMap();

// City uses the businessList member as it's in the same module
class City {
  constructor() {
    businessList.set(this, ['Pizza Hut', 'Dominos', 'Street Pizza']);
  }

  // public method to access the private 'businessList'
  getBusinessList() {
    return businessList.get(this);
  }

  // public method to add a business to 'businessList'
  addBusiness(business) {
    businessList.get(this).push(business);
  }
}

// export the City class as the default export
export default City;
```
```javascript
// usage
import City from 'modules/city';

const city = new City();
city.getBusinessList(); // ['Pizza Hut', 'Dominos', 'Street Pizza']
```
There is a great article written on the features of ES6 modules here.
2. Factory
Imagine creating a notification management application that currently only supports notifications through email, so most of the code lives inside the EmailNotification class. Now there is a new requirement for push notifications. To implement PushNotification, you have to do a lot of work because your application is tightly coupled with EmailNotification – and you would repeat that work for every future channel.
To solve this complexity, we will delegate the object creation to another object called factory.
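A minimal sketch of this factory for the notification example – the class and method names are illustrative, not from a real library:

```javascript
// Concrete products (illustrative implementations)
class EmailNotification {
  send(message) { return `email: ${message}`; }
}

class PushNotification {
  send(message) { return `push: ${message}`; }
}

// The factory owns object creation, so callers never depend on
// concrete notification classes. Adding a new channel means adding
// one case here instead of touching every call site.
class NotificationFactory {
  create(channel) {
    switch (channel) {
      case 'email': return new EmailNotification();
      case 'push': return new PushNotification();
      default: throw new Error(`Unknown channel: ${channel}`);
    }
  }
}

// usage
const factory = new NotificationFactory();
const notifier = factory.create('push');
console.log(notifier.send('New video uploaded')); // push: New video uploaded
```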
3. Observer
The observer pattern maintains a list of subscribers so that whenever an event occurs, they are notified. An observer can also remove a subscriber that no longer wishes to be notified.
On YouTube, for example, the channels we are subscribed to notify us whenever a new video is uploaded.
```javascript
// Publisher
class Video {
  constructor(observable, name, content) {
    this.observable = observable;
    this.name = name;
    this.content = content;
    // publish the 'video-uploaded' event
    this.observable.publish('video-uploaded', { name, content });
  }
}

// Subscriber
class User {
  constructor(observable) {
    this.observable = observable;
    this.interestedVideos = [];
    // subscribe with the event name and the callback function
    this.observable.subscribe('video-uploaded', this.addVideo.bind(this));
  }

  addVideo(video) {
    this.interestedVideos.push(video);
  }
}

// Observer
class Observable {
  constructor() {
    this.handlers = {};
  }

  subscribe(event, handler) {
    this.handlers[event] = this.handlers[event] || [];
    this.handlers[event].push(handler);
  }

  publish(event, eventData) {
    const eventHandlers = this.handlers[event];
    if (eventHandlers) {
      for (let i = 0, l = eventHandlers.length; i < l; ++i) {
        eventHandlers[i].call({}, eventData);
      }
    }
  }
}

// usage (videoFile is assumed to be defined elsewhere)
const observable = new Observable();
const user = new User(observable);
const video = new Video(observable, 'ES6 Design Patterns', videoFile);
```
4. Mediator
The mediator pattern provides a unified interface through which different components of an application can communicate with each other.
If a system appears to have too many direct relationships between components, it may be time to have a central point of control that components communicate through instead.
The mediator promotes loose coupling.
A real-world analogy is a traffic signal that controls which vehicles may go and which must stop; all of the communication is coordinated through the signal.
Let’s create a chatroom (mediator) through which the participants can register themselves. The chatroom is responsible for handling the routing when the participants chat with each other.
// Each participant is represented by a Participant object
class Participant {
  constructor(name) {
    this.name = name;
    this.chatroom = null;
  }

  getParticipantDetails() {
    return this.name;
  }

  send(message, to) {
    // delegate the actual routing to the mediator
    this.chatroom.send(message, this, to);
  }

  receive(message, from) {
    console.log(`${from.name} to ${this.name}: ${message}`);
  }
}

// Mediator
class Chatroom {
  constructor() {
    this.participants = {};
  }

  register(participant) {
    this.participants[participant.name] = participant;
    participant.chatroom = this;
  }

  send(message, from, to) {
    if (to) {
      // direct message to a single participant
      to.receive(message, from);
    } else {
      // broadcast the message to everyone except the sender
      for (const key in this.participants) {
        if (this.participants[key] !== from) {
          this.participants[key].receive(message, from);
        }
      }
    }
  }
}

// usage
// Create two participants
const john = new Participant('John');
const snow = new Participant('Snow');

// Register the participants with the Chatroom
const chatroom = new Chatroom();
chatroom.register(john);
chatroom.register(snow);

// Participants now chat with each other
john.send('Hey, Snow!');
john.send('Are you there?');
snow.send('Hey man', john);
snow.send('Yes, I heard that!');
5. Command
In the command pattern, an operation is wrapped as a command object and passed to the invoker object. The invoker object passes the command to the corresponding object, which executes the command.
The command pattern decouples the objects executing commands from the objects issuing them by encapsulating actions as objects. The invoker maintains a stack of commands: whenever a command is executed, it is pushed onto the stack. To undo a command, the invoker pops it from the stack and performs the reverse action.
You can consider a calculator as a command that performs addition, subtraction, division and multiplication, and each operation is encapsulated by a command object.
// The list of operations that can be performed
const addNumbers = (num1, num2) => num1 + num2;
const subNumbers = (num1, num2) => num1 - num2;
const multiplyNumbers = (num1, num2) => num1 * num2;
const divideNumbers = (num1, num2) => num1 / num2;

// A CalculatorCommand is initialized with an execute function,
// an undo function, and the value to apply
class CalculatorCommand {
  constructor(execute, undo, value) {
    this.execute = execute;
    this.undo = undo;
    this.value = value;
  }
}

// Factories that create the command objects
const DoAddition = value => new CalculatorCommand(addNumbers, subNumbers, value);
const DoSubtraction = value => new CalculatorCommand(subNumbers, addNumbers, value);
const DoMultiplication = value => new CalculatorCommand(multiplyNumbers, divideNumbers, value);
const DoDivision = value => new CalculatorCommand(divideNumbers, multiplyNumbers, value);

// AdvancedCalculator (the invoker) maintains the list of executed
// commands so that they can be undone
class AdvancedCalculator {
  constructor() {
    this.current = 0;
    this.commands = [];
  }

  execute(command) {
    this.current = command.execute(this.current, command.value);
    this.commands.push(command);
  }

  undo() {
    const command = this.commands.pop();
    this.current = command.undo(this.current, command.value);
  }

  getCurrentValue() {
    return this.current;
  }
}

// usage
const advCal = new AdvancedCalculator();

// invoke commands
advCal.execute(DoAddition(50));       // 50
advCal.execute(DoSubtraction(25));    // 25
advCal.execute(DoMultiplication(4));  // 100
advCal.execute(DoDivision(2));        // 50

// undo the last command
advCal.undo();
advCal.getCurrentValue();             // 100
6. Facade
The facade pattern is used when we want to expose a higher level of abstraction and hide the complexity of a large codebase behind it.
A great example of this pattern is found in common DOM manipulation libraries like jQuery, which simplify element selection and event handling.
Though it seems simple on the surface, there is an entire complex logic implemented when performing the operation.
The following Account Creation example gives you clarity about the facade pattern:
// AccountManager is responsible for creating a new account of type
// Savings or Current with a unique account number
let currentAccountNumber = 0;

class AccountManager {
  createAccount(type, details) {
    const accountNumber = AccountManager.getUniqueAccountNumber();
    let account;
    if (type === 'current') {
      account = new CurrentAccount();
    } else {
      account = new SavingsAccount();
    }
    return account.addAccount({ accountNumber, details });
  }

  static getUniqueAccountNumber() {
    return ++currentAccountNumber;
  }
}

// Accounts maintains the list of all accounts created
class Accounts {
  constructor() {
    this.accounts = [];
  }

  addAccount(account) {
    this.accounts.push(account);
    return this.successMessage(account);
  }

  getAccount(accountNumber) {
    return this.accounts.find(account => account.accountNumber === accountNumber);
  }

  successMessage(account) {}
}

// CurrentAccount extends Accounts (as a singleton) to provide a more
// specific success message on account creation
class CurrentAccount extends Accounts {
  constructor() {
    super();
    if (CurrentAccount.exists) {
      return CurrentAccount.instance;
    }
    CurrentAccount.instance = this;
    CurrentAccount.exists = true;
    return this;
  }

  successMessage({ accountNumber, details }) {
    return `Current Account created with ${details}. ${accountNumber} is your account number.`;
  }
}

// Likewise, SavingsAccount extends Accounts to provide its own
// success message on account creation
class SavingsAccount extends Accounts {
  constructor() {
    super();
    if (SavingsAccount.exists) {
      return SavingsAccount.instance;
    }
    SavingsAccount.instance = this;
    SavingsAccount.exists = true;
    return this;
  }

  successMessage({ accountNumber, details }) {
    return `Savings Account created with ${details}. ${accountNumber} is your account number.`;
  }
}

// usage
// The facade hides the complexity of creating an account
const accountManager = new AccountManager();
const currentAccount = accountManager.createAccount('current', { name: 'John Snow', address: 'pune' });
const savingsAccount = accountManager.createAccount('savings', { name: 'Petter Kim', address: 'mumbai' });
7. Adapter
The adapter pattern converts the interface of a class to another expected interface, making two incompatible interfaces work together.
As an example, suppose you need to display data from a 3rd party library as a bar chart, but the data format of the 3rd party API differs from the format the bar chart expects. Below, you'll find an adapter that converts the 3rd party library's API response into Highcharts' bar representation:
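The article's adapter code is missing here, so the following is a minimal sketch under assumed data shapes (the 3rd party response format and the ChartDataAdapter name are invented for illustration; the output mirrors the categories/series shape a Highcharts bar chart consumes):

```javascript
// Imagined 3rd party API response: an array of { label, total } objects
const thirdPartyResponse = [
  { label: 'Q1', total: 120 },
  { label: 'Q2', total: 150 },
  { label: 'Q3', total: 90 },
];

// Adapter: converts the 3rd party format into the options object
// a Highcharts bar chart expects
class ChartDataAdapter {
  constructor(response) {
    this.response = response;
  }

  toBarChartOptions() {
    return {
      chart: { type: 'bar' },
      xAxis: { categories: this.response.map(item => item.label) },
      series: [{ data: this.response.map(item => item.total) }],
    };
  }
}

// usage
const options = new ChartDataAdapter(thirdPartyResponse).toBarChartOptions();
// options.xAxis.categories -> ['Q1', 'Q2', 'Q3']
// options.series[0].data   -> [120, 150, 90]
```

Neither the 3rd party API nor Highcharts changes; only the adapter knows about both formats.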
This has been a brief introduction to design patterns in modern JavaScript (ES6). The subject is massive, but hopefully this article has shown you the benefits of applying these patterns when writing code.
Zappa is a very powerful open-source Python project that lets you easily build, deploy and update a WSGI app hosted on AWS Lambda + API Gateway. This blog is a detailed step-by-step guide focusing on the challenges faced while deploying a Django application on AWS Lambda using Zappa as the deployment tool.
Building Your Application
If you do not have a Django application already, you can build one by cloning this GitHub repository.
Once you have cloned the repository you will need a virtual environment which provides an isolated Python environment for your application. I prefer virtualenvwrapper to create one.
Now if you run the server directly it will log a warning as the database is not set up yet.
$ python manage.py runserver
Performing system checks...

System check identified no issues (0 silenced).

You have 13 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.

May 20, 2018 - 14:47:32
Django version 1.11.11, using settings 'django_zappa_sample.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Also, trying to access the admin page (http://localhost:8000/admin/) will throw an "OperationalError" exception, with the below log at the server end.
Internal Server Error: /admin/
Traceback (most recent call last):
  File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)
  File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  ...
  File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
    raise original_exception
OperationalError: no such table: django_session
[20/May/2018 14:59:23] "GET /admin/ HTTP/1.1" 500 153553
Not Found: /favicon.ico
In order to fix this, you need to run the migrations on your database so that essential tables (auth_user, sessions, etc.) are created before any request is made to the server.
NOTE: Use DATABASES in the project settings file to configure the database that you want your Django application to use once hosted on AWS Lambda. By default, it is configured to create a local SQLite database file as the backend.
You can run the server again and it should now load the admin panel of your website.
Verify that the zappa Python package is installed in your virtual environment before moving forward.
Configuring Zappa Settings
Deploying with Zappa is simple, as it only needs a configuration file; the rest is managed by Zappa. To create this configuration file, run the following from your project root directory:
$ zappa init
Welcome to Zappa!

Zappa is a system for running server-less Python web applications on AWS Lambda and AWS API Gateway.
This `init` command will help you create and configure your new Zappa deployment.
Let's get started!

Your Zappa configuration can support multiple production stages, like 'dev', 'staging', and 'production'.
What do you want to call this environment (default 'dev'):

AWS Lambda and API Gateway are only available in certain regions. Let's check to make sure you have a profile set up in one that will work.
We found the following profiles: default, and hdx. Which would you like us to use? (default 'default'):

Your Zappa deployments will need to be uploaded to a private S3 bucket.
If you don't have a bucket yet, we'll create one for you too.
What do you want call your bucket? (default 'zappa-108wqhyn4'): django-zappa-sample-bucket

It looks like this is a Django application!
What is the module path to your project's Django settings?
We discovered: django_zappa_sample.settings
Where are your project's settings? (default 'django_zappa_sample.settings'):

You can optionally deploy to all available regions in order to provide fast global service.
If you are using Zappa for the first time, you probably don't want to do this!
Would you like to deploy this application globally? (default 'n') [y/n/(p)rimary]: n

Okay, here's your zappa_settings.json:

{
    "dev": {
        "aws_region": "us-east-1",
        "django_settings": "django_zappa_sample.settings",
        "profile_name": "default",
        "project_name": "django-zappa-sa",
        "runtime": "python2.7",
        "s3_bucket": "django-zappa-sample-bucket"
    }
}

Does this look okay? (default 'y') [y/n]: y

Done! Now you can deploy your Zappa application by executing:

    $ zappa deploy dev

After that, you can update your application code with:

    $ zappa update dev

To learn more, check out our project page on GitHub here: https://github.com/Miserlou/Zappa
and stop by our Slack channel here: https://slack.zappa.io

Enjoy!,
 ~ Team Zappa!
You can verify zappa_settings.json generated at your project root directory.
TIP: The virtual environment name should not be the same as the Zappa project name, as this may cause errors.
Additionally, you can specify other settings in the zappa_settings.json file as required, using the Advanced Settings.
Now, you’re ready to deploy!
IAM Permissions
In order to deploy the Django application to Lambda/API Gateway, set up an IAM role (e.g. ZappaLambdaExecutionRole) with the following permissions:
Before deploying the application, ensure that the IAM role is set in the config JSON as follows:
{
    "dev": {
        ...
        "manage_roles": false, // Disable Zappa client managing roles.
        "role_name": "MyLambdaRole", // Name of your Zappa execution role. Optional, default: <project_name>-<env>-ZappaExecutionRole.
        "role_arn": "arn:aws:iam::12345:role/app-ZappaLambdaExecutionRole", // ARN of your Zappa execution role. Optional.
        ...
    },
    ...
}
Once your settings are configured, you can package and deploy your application to a stage called “dev” with a single command:
$ zappa deploy dev
Calling deploy for stage dev..
Downloading and installing dependencies..
Packaging project as zip.
Uploading django-zappa-sa-dev-1526831069.zip (10.9MiB)..
100%|██████████| 11.4M/11.4M [01:02<00:00, 75.3KB/s]
Scheduling..
Scheduled django-zappa-sa-dev-zappa-keep-warm-handler.keep_warm_callback with expression rate(4 minutes)!
Uploading django-zappa-sa-dev-template-1526831157.json (1.6KiB)..
100%|██████████| 1.60K/1.60K [00:02<00:00, 792B/s]
Waiting for stack django-zappa-sa-dev to create (this can take a bit)..
100%|██████████| 4/4 [00:11<00:00, 2.92s/res]
Deploying API Gateway..
Deployment complete!: https://akg59b222b.execute-api.us-east-1.amazonaws.com/dev
You should see that your Zappa deployment completed successfully, with the URL of the API Gateway endpoint created for your application.
Troubleshooting
1. If you see the following error during deployment, it's probably because you do not have sufficient privileges to deploy to AWS Lambda. Ensure your IAM role has all the permissions described above, or set "manage_roles" to true so that Zappa can create and manage the IAM role for you.
Calling deploy for stage dev..
Creating django-zappa-sa-dev-ZappaLambdaExecutionRole IAM Role..
Error: Failed to manage IAM roles!
You may lack the necessary AWS permissions to automatically manage a Zappa execution role.
To fix this, see here: https://github.com/Miserlou/Zappa#using-custom-aws-iam-roles-and-policies
2. The error below occurs when "events.amazonaws.com" is not listed as a Trusted Entity for your IAM role. Add it, or set the "keep_warm" parameter to false in your Zappa settings file. Note that the deployment is left partially complete, as it terminated abnormally.
Downloading and installing dependencies..
100%|██████████| 44/44 [00:05<00:00, 7.92pkg/s]
Packaging project as zip..
Uploading django-zappa-sample-dev-1482817370.zip (8.8MiB)..
100%|██████████| 9.22M/9.22M [00:17<00:00, 527KB/s]
Scheduling..
Oh no! An error occurred! :(
==============
Traceback (most recent call last):
  File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 2610, in handle
    sys.exit(cli.handle())
  ...
  File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
ClientError: An error occurred (ValidationError) when calling the PutRole operation: Provided role 'arn:aws:iam:484375727565:role/lambda_basic_execution' cannot be assumed by principal 'events.amazonaws.com'.
==============
Need help? Found a bug? Let us know! :D
File bug reports on GitHub here: https://github.com/Miserlou/Zappa
And join our Slack channel here: https://slack.zappa.io
Love!,
 ~ Team Zappa!
3. Adding the parameter and re-running zappa update may fail with an error saying "Stack django-zappa-sa-dev does not exist", because the previous deployment was unsuccessful. To fix this, delete the Lambda function from the console and rerun the deployment.
4. If you run into a distribution error like the one below, try downgrading your pip version to 9.0.1:
$ pip install pip==9.0.1
Calling deploy for stage dev..
Downloading and installing dependencies..
Oh no! An error occurred! :(
==============
Traceback (most recent call last):
  File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 2610, in handle
    sys.exit(cli.handle())
  ...
  File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/core.py", line 751, in get_installed_packages
    pip.get_installed_distributions()
AttributeError: 'module' object has no attribute 'get_installed_distributions'
==============
Need help? Found a bug? Let us know! :D
File bug reports on GitHub here: https://github.com/Miserlou/Zappa
And join our Slack channel here: https://slack.zappa.io
Love!,
 ~ Team Zappa!
or,
If you run into a NotFoundException ("Invalid REST API identifier" issue), try undeploying the Zappa stage and deploying again.
Calling deploy for stage dev..
Downloading and installing dependencies..
Packaging project as zip.
Uploading django-zappa-sa-dev-1526830532.zip (10.9MiB)..
100%|██████████| 11.4M/11.4M [00:42<00:00, 331KB/s]
Scheduling..
Scheduled django-zappa-sa-dev-zappa-keep-warm-handler.keep_warm_callback with expression rate(4 minutes)!
Uploading django-zappa-sa-dev-template-1526830690.json (1.6KiB)..
100%|██████████| 1.60K/1.60K [00:01<00:00, 801B/s]
Oh no! An error occurred! :(
==============
Traceback (most recent call last):
  File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/zappa/cli.py", line 2610, in handle
    sys.exit(cli.handle())
  ...
  File "/home/velotio/Envs/django_zappa_sample/local/lib/python2.7/site-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
NotFoundException: An error occurred (NotFoundException) when calling the GetRestApi operation: Invalid REST API identifier specified 484375727565:akg59b222b
==============
Need help? Found a bug? Let us know! :D
File bug reports on GitHub here: https://github.com/Miserlou/Zappa
And join our Slack channel here: https://slack.zappa.io
Love!,
 ~ Team Zappa!
TIP: To understand how your application works in a serverless environment, please visit this link.
Post Deployment Setup
Migrate database
At this point, you should have an empty database for your Django application to fill up with a schema.
$ zappa manage dev migrate
Once you run the above command, the database migrations will be applied to the database specified in your Django settings.
Creating Superuser of Django Application
You might also need to create a superuser in the database. You can do this with the following command, run from your project directory.
Note that your local environment must be connected to the same database, since this runs as a standard Django administration command (not a Zappa command).
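The command itself is missing from the original post; given the description (a standard Django administration command run against the same database), it is presumably the stock management command:

```
$ python manage.py createsuperuser
```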
Managing static files
Your Django application will likely depend on static files; the Django admin panel, for example, uses a combination of JS, CSS and image files.
NOTE: Zappa is for running your application code, not for serving static web assets. If you plan on serving custom static assets in your web application (CSS/JavaScript/images/etc.), you’ll likely want to use a combination of AWS S3 and AWS CloudFront.
You will need to add the following packages, django-storages and boto, to your virtual environment; they are required for managing files to and from S3.
$ pip install django-storages boto

Add django-storages to your INSTALLED_APPS in settings.py:

INSTALLED_APPS = (
    ...,
    'storages',
)

Configure django-storages in settings.py as:

AWS_STORAGE_BUCKET_NAME = 'django-zappa-sample-bucket'
AWS_S3_CUSTOM_DOMAIN = '%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME
STATIC_URL = "https://%s/" % AWS_S3_CUSTOM_DOMAIN
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
Once you have set up the Django application to serve your static files from AWS S3, run the following command to upload the static files from your project to S3.
$ python manage.py collectstatic --noinput
or
$ zappa update dev
$ zappa manage dev "collectstatic --noinput"
Check that the static files have been moved to the S3 bucket; the admin panel alone is built from 61 static files.
NOTE: STATICFILES_DIRS must be configured properly to collect your files from the appropriate location.
Tip: Render static files in your templates by loading the static tag library and referencing files with the {% static %} tag.
Setting Up API Gateway
To connect to your Django application, you also need to ensure an API Gateway is set up for your AWS Lambda function. You need GET methods set up for all the URL resources used in your Django application. Alternatively, you can set up a proxy method to allow all sub-resources to be processed through one API method.
Go to AWS Lambda function console and add API Gateway from ‘Add triggers’.
1. Configure API, Deployment Stage, and Security for API Gateway. Click Save once it is done.
2. Go to API Gateway console and,
a. Recreate ANY method for / resource.
i. Check `Use Lambda Proxy integration`
ii. Set `Lambda Region` and `Lambda Function` and `Save` it.
b. Recreate ANY method for the /{proxy+} resource.
i. Select `Lambda Function Proxy`
ii. Set `Lambda Region` and `Lambda Function` and `Save` it.
3. Click on Actions and select Deploy API. Set the Deployment Stage and click Deploy.
4. Ensure that the GET and POST methods for / and the proxy resource are set to "Override for this method".
Setting Up Custom SSL Endpoint
Optionally, you can also set up your own custom SSL endpoint and install a certificate for your domain by running Zappa's certify command.
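Assuming the stage is named dev as in the rest of this guide, the certify command looks like:

```
$ zappa certify dev
```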
Now you are ready to launch your Django Application hosted on AWS Lambda.
Additional Notes:
Once deployed, you must run "zappa update <stage-name>" to update your already-hosted AWS Lambda function.
You can check the server logs for investigation by running the "zappa tail" command.
To un-deploy your application, simply run: `zappa undeploy <stage-name>`
You’ve seen how to deploy a Django application on AWS Lambda using Zappa. If you are creating your Django application for the first time, you might also want to read Edgar Roman’s Django Zappa Guide.
Start building your Django application and let us know in the comments if you need any help during your application deployment over AWS Lambda.
More often than not, organizations need to apply various kinds of policies to the environments where they run their applications. These policies might be required to meet compliance requirements, achieve a higher degree of security, achieve standardization across multiple environments, and so on. This calls for an automated, declarative way to define and enforce these policies. Policy engines like OPA help us achieve exactly that.
Motivation behind Open Policy Agent (OPA)
When we run an application, it generally comprises multiple subsystems. Even in the simplest of cases, we will have an API gateway/load balancer, one or two applications and a database. Generally, these subsystems have different mechanisms for authorizing requests. For example, the application might use JWT tokens to authorize a request, while the database uses grants; the application may also access third-party APIs or cloud services that authorize requests in yet another way. Add your CI/CD servers, your log server, etc., and you can see how many different authorization mechanisms can exist even in a small system.
The existence of so many authorization models in our system makes life difficult when we need to meet compliance or information security requirements or even some self-imposed organizational policies. For example, if we need to adhere to some new compliance requirements then we need to understand and implement the same for all the components which do authorization in our system.
“The main motivation behind OPA is to achieve unified policy enforcements across the stack”
What are Open Policy Agent (OPA) and OPA Gatekeeper
OPA is an open-source, general-purpose policy engine that can be used to enforce policies on various types of software systems like microservices, CI/CD pipelines, gateways, Kubernetes, etc. OPA was developed by Styra and is currently a part of the CNCF.
OPA provides us with REST APIs that our system can call to check whether the policies are met for a request payload. It also provides a high-level declarative language, Rego, which allows us to specify the policies we want to enforce as code. This gives us a lot of flexibility when defining our policies.
The above image shows the architecture of OPA. It exposes APIs that any service needing an authorization or policy decision can call (a policy query); OPA then makes a decision based on the Rego code for the policy and returns it to the service, which processes the request accordingly. Enforcement is done by the service itself; OPA is responsible only for making the decision. This is what makes OPA a general-purpose policy engine that supports a large number of services.
The Gatekeeper project is a Kubernetes specific implementation of the OPA. Gatekeeper allows us to use OPA in a Kubernetes native way to enforce the desired policies.
How Gatekeeper enforces policies
On the Kubernetes cluster, Gatekeeper is installed as a ValidatingAdmissionWebhook. Admission controllers can intercept requests after they have been authenticated and authorized by the K8s API server, but before they are persisted in the database. If any of the admission controllers rejects the request, the overall request is rejected. The limitation of admission controllers is that they need to be compiled into the kube-apiserver and can be enabled only when the apiserver starts up.
To overcome this rigidity of admission controllers, admission webhooks were introduced. Once we enable the admission webhook controllers in our cluster, they can send admission requests to external HTTP callbacks and receive admission responses. Admission webhooks can be of two types: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. The difference between the two is that mutating webhooks can modify the objects that they receive, while validating webhooks cannot. The below image roughly shows the flow of an API request once both mutating and validating admission controllers are enabled.
The role of Gatekeeper is to simply check if the request meets the defined policy or not, that is why it is installed as a validating webhook.
Now we have Gatekeeper up and running in our cluster. The above installation also created a CRD named `constrainttemplates.templates.gatekeeper.sh`. This CRD allows us to create constraint templates for the policies we want to enforce. In a constraint template, we define the constraint logic as Rego code, along with its schema. Once the constraint template is created, we can create constraints, which are instances of the constraint template created for specific resources. Think of it as functions and function calls: the constraint templates are like functions, and the constraints invoke them with different parameter values (resource kind and other values).
To get a better understanding, let's go ahead and create constraint templates and constraints.
The policy that we want to enforce is to prevent developers from creating a Service of type LoadBalancer in the `dev` namespace of the cluster, where they verify their code. Creating LoadBalancer Services in the dev environment adds unnecessary cost.
In the constraint template spec, we define a new object kind/type that we will use while creating the constraints; then, in the target, we specify the Rego code that verifies whether the request meets the policy. In the Rego code, we specify a violation: if the request creates a Service of type LoadBalancer, the request should be denied.
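As a sketch (the template name `k8sdenylbsvc`, the kind `K8sDenyLbSvc`, and the message text are illustrative, not from the original post), such a constraint template could look like this:

```yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sdenylbsvc
spec:
  crd:
    spec:
      names:
        kind: K8sDenyLbSvc            # the new kind our constraints will use
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdenylbsvc

        # violation fires when the incoming Service is of type LoadBalancer
        violation[{"msg": msg}] {
          input.review.object.spec.type == "LoadBalancer"
          msg := "Services of type LoadBalancer are not allowed"
        }
```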
Using the above template, we can now define constraints:
Here we have specified the kind of the Kubernetes object (Service) on which we want to apply the constraint and we have specified the namespace as dev because we want the constraint to be enforced only on the dev namespace.
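A matching constraint, reusing the constraint name that appears in the denial error later in the post (the kind `K8sDenyLbSvc` is an assumed name defined by the constraint template), might look like:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyLbSvc                    # hypothetical kind defined by the template
metadata:
  name: deny-lb-type-svc-dev-ns
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Service"]
    namespaces: ["dev"]               # enforce only in the dev namespace
```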
Let’s go ahead and create the constraint template and constraint:
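Assuming the manifests are saved as constraint-template.yaml and constraint.yaml (hypothetical file names), this is just two kubectl applies, with a status check in between:

```shell
kubectl apply -f constraint-template.yaml
kubectl get constrainttemplates       # verify the template status before creating constraints
kubectl apply -f constraint.yaml
```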
Note: After creating the constraint template, please check whether its status is true; otherwise, you will get an error while creating the constraints. It is also advisable to verify Rego code snippets before using them in the constraint template.
Now let’s try to create a service of type LoadBalancer in the dev namespace:
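A minimal Service manifest to test the constraint could look like this (the name test-lb-svc and selector are made up for illustration):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: test-lb-svc
  namespace: dev            # the namespace where the constraint is enforced
spec:
  type: LoadBalancer
  selector:
    app: test
  ports:
    - port: 80
```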
When we tried to create a service of type LoadBalancer in the dev namespace, we got an error saying the request was denied by the admission webhook due to the `deny-lb-type-svc-dev-ns` constraint; but when we tried to create the service in the default namespace, we were able to do so.
Here we are not passing any parameters to the Rego policy from our constraints, but we can certainly do so to make our policy more generic. For example, we can add a field named servicetype to the constraint template and, in the policy code, deny all requests where the servicetype value defined in the constraint matches the value in the request. With this, we will be able to deny Services of types other than LoadBalancer as well, in any namespace of our cluster.
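A sketch of such a parameterized violation rule (the parameter name servicetype follows the text above; the message wording is ours):

```rego
# In the constraint template's Rego, compare against a parameter
# instead of a hard-coded "LoadBalancer":
violation[{"msg": msg}] {
  input.review.object.spec.type == input.parameters.servicetype
  msg := sprintf("Services of type %v are not allowed", [input.parameters.servicetype])
}
```

In the constraint, the value would then be supplied under spec.parameters.servicetype (e.g. servicetype: LoadBalancer).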
Gatekeeper also provides auditing for resources that were created before the constraint was applied. The information is available in the status of the constraint objects. This helps us in identifying which objects in our cluster are not compliant with our constraints.
Conclusion:
OPA allows us to apply fine-grained policies in our Kubernetes clusters and can be instrumental in improving the overall security of Kubernetes clusters which has always been a concern for many organizations while adopting or migrating to Kubernetes. It also makes meeting the compliance and audit requirements much simpler. There is some learning curve as we need to get familiar with Rego to code our policies, but the language is very simple and there are quite a few good examples to help in getting started.
The rise of containers has reshaped the way we develop, deploy, and maintain software. Containers allow us to package the different services that constitute an application into separate containers, and to deploy those containers across a set of virtual and physical machines. This gives rise to container orchestration tools that automate the deployment, management, scaling, and availability of container-based applications. Kubernetes allows deployment and management of container-based applications at scale.
One of the main advantages of Kubernetes is how it brings greater reliability and stability to the container-based distributed application, through the use of dynamic scheduling of containers. But, how do you make sure Kubernetes itself stays up when a component or its master node goes down?
Why do we need Kubernetes High Availability?
Kubernetes High Availability is about setting up Kubernetes, along with its supporting components, in such a way that there is no single point of failure. A single-master cluster can easily fail, while a multi-master cluster uses multiple master nodes, each of which has access to the same worker nodes. In a single-master cluster, important components like the API server and controller manager live only on the single master node, and if it fails, you cannot create more services, pods, etc. In a Kubernetes HA environment, however, these important components are replicated on multiple masters (usually three), and if any of the masters fail, the other masters keep the cluster up and running.
Advantages of multi-master
In a Kubernetes cluster, the master node manages the etcd database, API server, controller manager, and scheduler, along with all the worker nodes. If we have only a single master node and that node fails, no new workloads can be scheduled and the cluster is lost.
A multi-master setup, by contrast, provides high availability for a single cluster by running multiple instances of the apiserver, etcd, controller-manager, and scheduler. This not only provides redundancy but also improves network performance, because the masters divide the load among themselves.
A multi-master setup protects against a wide range of failure modes, from a loss of a single worker node to the failure of the master node’s etcd service. By providing redundancy, a multi-master cluster serves as a highly available system for your end-users.
Steps to Achieve Kubernetes HA
Before moving to steps to achieve high-availability, let us understand what we are trying to achieve through a diagram:
(Image Source: Kubernetes Official Documentation)
Master Node: Each master node in a multi-master environment runs its own copy of the Kube API server. This can be used for load balancing among the master nodes. Each master node also runs its own copy of the etcd database, which stores all the data of the cluster. In addition to the API server and the etcd database, the master node runs the k8s controller manager, which handles replication, and the scheduler, which schedules pods to nodes.
Worker Node: As in a single-master cluster, the worker nodes in a multi-master cluster run their own components, mainly orchestrating pods. We need 3 machines that satisfy the Kubernetes master requirements and 3 machines that satisfy the Kubernetes worker requirements.
For each master, that has been provisioned, follow the installation guide to install kubeadm and its dependencies. In this blog we will use k8s 1.10.4 to implement HA.
Note: Please note that the cgroup driver for docker and kubelet differs in some versions of k8s; make sure you change the cgroup driver to cgroupfs for both docker and kubelet. If the cgroup drivers for kubelet and docker differ, the master doesn't come up when rebooted.
6. Create a directory /etc/kubernetes/pki/etcd on master-1 and master-2 and copy all the generated certificates into it.
7. On all masters, now generate peer and etcd certs in /etc/kubernetes/pki/etcd. To generate them, we need the previous CA certificates on all masters.
This will replace the default configuration with your machine's hostname and IP address, so if you encounter any problem, just check that the hostname and IP address are correct and rerun the cfssl command.
8. On all masters, install etcd and set its environment file.
9. Now, we will create a 3-node etcd cluster across all 3 master nodes, starting the etcd service on each node via systemd. Create a file /etc/systemd/system/etcd.service on all masters.
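The post does not include the unit file itself; the sketch below follows the etcd systemd unit used in the Kubernetes 1.10-era HA guides, with ${PEER_NAME}, ${PRIVATE_IP}, and the master IPs as placeholders set in /etc/etcd.env per master:

```ini
[Unit]
Description=etcd
Documentation=https://github.com/coreos/etcd

[Service]
EnvironmentFile=/etc/etcd.env
ExecStart=/usr/local/bin/etcd \
  --name ${PEER_NAME} \
  --data-dir /var/lib/etcd \
  --listen-client-urls https://${PRIVATE_IP}:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://${PRIVATE_IP}:2379 \
  --listen-peer-urls https://${PRIVATE_IP}:2380 \
  --initial-advertise-peer-urls https://${PRIVATE_IP}:2380 \
  --cert-file=/etc/kubernetes/pki/etcd/server.pem \
  --key-file=/etc/kubernetes/pki/etcd/server-key.pem \
  --client-cert-auth \
  --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
  --peer-cert-file=/etc/kubernetes/pki/etcd/peer.pem \
  --peer-key-file=/etc/kubernetes/pki/etcd/peer-key.pem \
  --peer-client-cert-auth \
  --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
  --initial-cluster master0=https://<master0-ip-address>:2380,master1=https://<master1-ip-address>:2380,master2=https://<master2-ip-address>:2380 \
  --initial-cluster-token my-etcd-token \
  --initial-cluster-state new
Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target
```

After `systemctl daemon-reload && systemctl start etcd` on all masters, the cluster status can be checked with `etcdctl cluster-health`.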
This will show the cluster healthy and connected to all three nodes.
Setup load balancer
There are multiple cloud provider solutions for load balancing, such as AWS Elastic Load Balancer, GCE load balancing, etc. If a physical load balancer is not available, we can set up a virtual IP that always points to a healthy master node. We are using keepalived for load balancing; install keepalived on all master nodes:
$ yum install keepalived -y
Create the following configuration file /etc/keepalived/keepalived.conf on all master nodes:
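A sketch of such a keepalived.conf (the interface name, priority, health-check command, and auth_pass are illustrative; the placeholders match the list below):

```ini
vrrp_script check_apiserver {
    # health-check the local kube-apiserver; lower priority when it is down
    script "curl -o /dev/null -s -k https://localhost:6443/healthz"
    interval 3
    weight -2
    fall 10
    rise 2
}

vrrp_instance VI_1 {
    state MASTER                # BACKUP on the other masters
    interface eth0              # adjust to your NIC
    virtual_router_id 51
    priority 101                # give each master a different priority
    authentication {
        auth_type PASS
        auth_pass SomeSecret
    }
    virtual_ipaddress {
        <load-balancer-ip>      # the virtual IP fronting the apiservers
    }
    track_script {
        check_apiserver
    }
}
```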
Please ensure that the following placeholders are replaced:
<master-private-ip> with the private IPv4 address of the master server on which the config file resides.
<master0-ip-address>, <master1-ip-address> and <master2-ip-address> with the IP addresses of your three master nodes.
<podCIDR> with your Pod CIDR. Please read the CNI network section of the docs for more information. Some CNI providers do not require a value to be set. I am using weave-net as the pod network, hence the podCIDR will be 10.32.0.0/12.
<load-balancer-ip> with the virtual IP set up in the load balancer in the previous section.
10. Run kubeadm init on master1 and master2:
First of all copy /etc/kubernetes/pki/ca.crt, /etc/kubernetes/pki/ca.key, /etc/kubernetes/pki/sa.key, /etc/kubernetes/pki/sa.pub to master1’s and master2’s /etc/kubernetes/pki folder.
Note: Copying these files is crucial, otherwise the other two master nodes won't go into the Ready state.
Copy the config file config.yaml from master0 to master1 and master2. We need to change <master-private-ip> to the current master host's private IP.
$ kubeadm init --config=config.yaml
11. Now you can install a pod network on all three masters to bring them into the Ready state. I am using the weave-net pod network; to apply weave-net, run:
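The weave-net manifest can be applied with the command from Weave's documentation (shown as used at the time of writing; verify against the current docs):

```shell
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
```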
12. By default, k8s doesn't schedule any workload on the masters, so if you want to schedule workloads on the master nodes as well, remove the master taint from all the master nodes using the command:
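The trailing hyphen in the command below removes the taint rather than adding one:

```shell
kubectl taint nodes --all node-role.kubernetes.io/master-
```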
Even after one node fails, all the important components are up and running. The cluster is still accessible, and you can create more pods, deployments, services, etc.
High availability is an important part of reliability engineering, focused on making a system reliable and avoiding any single point of failure in the complete system. At first glance, its implementation might seem quite complex, but high availability brings tremendous advantages to systems that require increased stability and reliability. Using a highly available cluster is one of the most important aspects of building a solid infrastructure.
GraphQL is the new hype in the field of API technologies. We have been building and using REST APIs for quite some time now, and have started hearing about GraphQL recently. GraphQL is usually described as a frontend-directed API technology, as it allows front-end developers to request data in a simpler way than ever before. The objective of this query language is to let client applications describe their data requirements and interactions in an intuitive and flexible format.
The Phoenix Framework runs on Elixir, which is built on top of Erlang. Elixir's core strengths are scalability and concurrency. Phoenix is a powerful and productive web framework that does not compromise speed or maintainability, and it comes with built-in support for WebSockets, enabling you to build real-time apps.
Prerequisites:
Elixir & Erlang: Phoenix is built on top of these
Phoenix Web Framework: Used for writing the server application. (It's a well-known and lightweight framework in Elixir.)
Absinthe: GraphQL library written for Elixir used for writing queries and mutations.
GraphiQL: Browser based GraphQL ide for testing your queries. Consider it similar to what Postman is used for testing REST APIs.
Overview:
The application we will be developing is a simple blog application written using the Phoenix Framework, with two schemas, User and Post, defined in the Accounts and Blog contexts respectively. We will design the application to support APIs related to blog creation and management. We assume you have Erlang, Elixir, and mix installed.
Where to Start:
At first, we have to create a Phoenix web application using the following command:
mix phx.new graphql --no-brunch --no-html
• --no-brunch – do not generate Brunch files for static asset building. When choosing this option, you will need to manually handle JavaScript dependencies if building HTML apps.
• --no-html – do not generate HTML views.
Note: As we are mostly going to work with APIs, we don't need any web pages or HTML views, hence the command arguments above.
Dependencies:
After we create the project, we need to add dependencies in mix.exs to make GraphQL available for the Phoenix application.
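A sketch of the relevant dependencies in mix.exs (version numbers are illustrative; pick ones compatible with your Phoenix release):

```elixir
defp deps do
  [
    {:phoenix, "~> 1.3"},
    # GraphQL implementation for Elixir, plus its Plug/Phoenix integration
    {:absinthe, "~> 1.4"},
    {:absinthe_plug, "~> 1.4"}
  ]
end
```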
We can use the following components to design/structure our GraphQL application:
GraphQL Schemas: These go inside lib/graphql_web/schema/schema.ex. The schema defines your queries and mutations.
Custom types: Your schema may include some custom properties which should be defined inside lib/graphql_web/schema/types.ex
Resolvers: We have to write a resolver function that handles the business logic, mapped to each query or mutation. Resolvers should be defined in their own files; we define them inside lib/graphql/accounts/user_resolver.ex and lib/graphql/blog/post_resolver.ex.
Also, we need to update the router so that we can make queries using a GraphQL client, and create a GraphQL pipeline to route the API requests; both go inside lib/graphql_web/router.ex:
pipeline :graphql do
  # custom plug written in lib/graphql_web/plug/context.ex
  plug Graphql.Context
end

scope "/api" do
  # pipeline through which the requests are routed
  pipe_through(:graphql)
  forward("/", Absinthe.Plug, schema: GraphqlWeb.Schema)
  forward("/graphiql", Absinthe.Plug.GraphiQL, schema: GraphqlWeb.Schema)
end
Writing GraphQL Queries:
Let's write some GraphQL queries, which can be considered equivalent to GET requests in REST. But before getting into queries, let's take a look at the GraphQL schema we defined and its resolver mapping:
defmodule GraphqlWeb.Schema do
  use Absinthe.Schema
  import_types(GraphqlWeb.Schema.Types)

  query do
    field :blog_posts, list_of(:blog_post) do
      resolve(&Graphql.Blog.PostResolver.all/2)
    end

    field :blog_post, type: :blog_post do
      arg(:id, non_null(:id))
      resolve(&Graphql.Blog.PostResolver.find/2)
    end

    field :accounts_users, list_of(:accounts_user) do
      resolve(&Graphql.Accounts.UserResolver.all/2)
    end

    field :accounts_user, :accounts_user do
      arg(:email, non_null(:string))
      resolve(&Graphql.Accounts.UserResolver.find/2)
    end
  end
end
You can see above that we have defined four queries in the schema. Let's pick a query and see what goes into it:
field :accounts_user, :accounts_user do
  arg(:email, non_null(:string))
  resolve(&Graphql.Accounts.UserResolver.find/2)
end
Above, we have retrieved a particular user using his email address through Graphql query.
arg(:email, non_null(:string)): defines a non-null incoming string argument, i.e. the user's email for us.
Graphql.Accounts.UserResolver.find/2: the resolver function mapped via the schema, which contains the core business logic for retrieving a user.
:accounts_user: the custom type, defined inside lib/graphql_web/schema/types.ex as follows:
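The post's type definition is not reproduced here; a minimal sketch of the :accounts_user object type in types.ex (field names assumed from the queries above) could be:

```elixir
object :accounts_user do
  field(:id, :id)
  field(:name, :string)
  field(:email, :string)
  # posts authored by this user (assumes a :blog_post type is also defined)
  field(:posts, list_of(:blog_post))
end
```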
We need to write a separate resolver function for every query we define. We'll go over the resolver functions for accounts_user, which live in the lib/graphql/accounts/user_resolver.ex file:
defmodule Graphql.Accounts.UserResolver do
  # import lib/graphql/accounts/accounts.ex as Accounts
  alias Graphql.Accounts

  def all(_args, _info) do
    {:ok, Accounts.list_users()}
  end

  def find(%{email: email}, _info) do
    case Accounts.get_user_by_email(email) do
      nil -> {:error, "User email #{email} not found!"}
      user -> {:ok, user}
    end
  end
end
This function is used to list all users or retrieve a particular user using an email address. Let’s run it now using GraphiQL browser. You need to have the server running on port 4000. To start the Phoenix server use:
mix deps.get      # pull all the dependencies
mix deps.compile  # compile the code
mix phx.server    # start the Phoenix server
Let's retrieve a user by email address via a query:
Above, we have retrieved the id, email, and name fields by executing the accountsUser query with an email address. GraphQL also allows us to define variables, which we will show later when writing the mutations.
Let’s execute another query to list all blog posts that we have defined:
Writing GraphQL Mutations:
Let's write some GraphQL mutations. If you have understood the way GraphQL queries are written, mutations are similar and easy to understand: they are defined in the same form as queries, with a resolver function. The mutations we are going to write are as follows:
create_post:- create a new blog post
update_post :- update an existing blog post
delete_post:- delete an existing blog post
The mutation looks as follows:
defmodule GraphqlWeb.Schema do
  use Absinthe.Schema
  import_types(GraphqlWeb.Schema.Types)

  query do
    # ... the queries from the previous section ...
  end

  mutation do
    field :create_post, type: :blog_post do
      arg(:title, non_null(:string))
      arg(:body, non_null(:string))
      arg(:accounts_user_id, non_null(:id))
      resolve(&Graphql.Blog.PostResolver.create/2)
    end

    field :update_post, type: :blog_post do
      arg(:id, non_null(:id))
      arg(:post, :update_post_params)
      resolve(&Graphql.Blog.PostResolver.update/2)
    end

    field :delete_post, type: :blog_post do
      arg(:id, non_null(:id))
      resolve(&Graphql.Blog.PostResolver.delete/2)
    end
  end
end
Let’s run some mutations to create a post in GraphQL:
Notice the method is POST and not GET over here.
Let's dig into the update mutation:
field :update_post, type: :blog_post do
  arg(:id, non_null(:id))
  arg(:post, :update_post_params)
  resolve(&Graphql.Blog.PostResolver.update/2)
end
Here, update_post takes two arguments as input: a non-null id, and a post parameter of type update_post_params that holds the input values to update. The mutation is defined in lib/graphql_web/schema/schema.ex, while the input parameter type is defined in lib/graphql_web/schema/types.ex:
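A minimal sketch of that input type (the field names are assumed from the create_post arguments above):

```elixir
input_object :update_post_params do
  field(:title, :string)
  field(:body, :string)
end
```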
Auth0 is a service that handles your application’s authentication and authorization needs with simple drop-in solutions. It can save time and risk compared to building your own authentication/authorization system. Auth0 even has its own universal login/signup page that can be customized through the dashboard, and it also provides APIs to create/manage users.
A frictionless signup flow allows the user to use a core feature of the application without forcing the user to sign up first. Many companies use this flow, namely Bookmyshow, Redbus, Makemytrip, and Goibibo.
So, as an example, we will see how an application like Bookmyshow looks with this frictionless flow. First, let’s assume the user is a first-time user for this application; the user lands on the landing page, selects a movie, selects the theater, selects the number of seats, and then lands on the payment page where they will fill in their contact details (email and mobile number) and proceed to complete the booking flow by paying for the ticket. At this point, the user has accessed the website and made a booking without even signing up.
Later on, when the user signs up using the same contact details that were provided during booking, they will notice their previous bookings and other details waiting for them on the app's account page.
What will we be doing in this blog?
In this blog, we will be implementing Auth0 and replicating a similar feature as mentioned above using Auth0. In this code sample, we will be using react.js for the frontend and nest.js for the backend.
To keep the blog short, we will only focus on the logic related to the frictionless signup with Auth0. We will not be going through other aspects, like payment service/integration, nest.js, ORM, etc.
Setup for the Auth0 dashboard:
Auth0’s documentation is pretty straightforward and easy to understand; we’ll link the sections for this setup, and you can easily sign up and continue your setup with the help of their documentation.
Do note that you will have to create two applications for this flow. One is a Single-Page Application for your frontend so that you can initiate login from your frontend app and the other is ManagementV2 for your server so that you can use their management APIs to create a user.
After registering, you will get the client ID and client secret on the application details page. You will need these keys to plug into Auth0's SDK so that you can use its APIs in your application.
Setup for your single-page application:
To use Auth0's APIs, we have to install its SDK. For single-page applications, Auth0 has rewritten its auth0-js SDK as auth0-spa-js. If you are using Angular, React, or Vue, Auth0 already provides a framework-specific wrapper for us to use.
So, we will move on to installing its React wrapper and continuing with the setup:
npm install @auth0/auth0-react
Then, we will wrap our app with Auth0Provider and provide the keys from the Auth0 application settings dashboard:
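A sketch of that wrapping in index.js (the domain, client ID, and the /auth-callback route are placeholders you supply from your dashboard; the onRedirectCallback body is ours, anticipating the redirection issue discussed below):

```jsx
import React from "react";
import ReactDOM from "react-dom";
import { Auth0Provider } from "@auth0/auth0-react";
import App from "./App";

// Redirection logic is safe here, but note that the Auth0 context
// (user, isAuthenticated) is NOT available inside this callback.
const onRedirectCallback = (appState) => {
  window.history.replaceState(
    {},
    document.title,
    (appState && appState.returnTo) || "/auth-callback"
  );
};

ReactDOM.render(
  <Auth0Provider
    domain="YOUR_AUTH0_DOMAIN"
    clientId="YOUR_SPA_CLIENT_ID"
    redirectUri={window.location.origin + "/auth-callback"}
    onRedirectCallback={onRedirectCallback}
  >
    <App />
  </Auth0Provider>,
  document.getElementById("root")
);
```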
But we do want to cover one issue with their authenticated state and redirection. We noticed that when Auth0 redirects to our application, the isAuthenticated flag doesn’t get reflected immediately. The states get sequentially updated like so:
isLoading: false isAuthenticated: false
isLoading: true isAuthenticated: false
isLoading: false isAuthenticated: false
isLoading: false isAuthenticated: true
This can be a pain if you have some common redirection logic based on the user's authentication state and user type.
What we found out from the Auth0’s community forum is that Auth0 does take some time to parse and update its states, and after the update operations, it then calls the onRedirectCallback function, so it’s safe to put your redirection logic in onRedirectCallback, but there is another issue with that.
The function doesn’t have access to Auth0’s context, so you can’t access the user object or any other state for your redirection logic, so you would want to redirect to a page where you have your redirection logic when onRedirectCallback is called.
So, in place of the actual page set in redirectUri, you would want to use a buffer page like the /auth-callback route where it just shows a progress bar and nothing else.
Implementation:
For login/signup, since we are using the universal login page, we don't have to do much; we just initiate the login with the loginWithRedirect() function from the UI, and Auth0 will handle the rest.
Now, for the core part of the blog, we will now be creating a createBooking API on our nest.js backend, which will accept email, mobile number, booking details (movie, theater location, number of seats), and try to create a booking.
In this frictionless flow, the application does internally create a user for the booking to refer to; otherwise, it would be difficult to show the bookings once the user signs up and tries to access their bookings.
So, the logic would go as follows: first, it will check if a user exists with the provided email in our DB. If not, then we will create the user in Auth0 through its management API with a temporary password, and then we will link the newly created Auth0 user in our users table. Then, by using this, we will create a booking.
Here is an overview of how the createBooking API will look:
@Post('/bookings/create')
async createBooking(
  @Body() createBookingDto: CreateBookingDto
): Promise<BookingResponseDto> {
  const { email } = createBookingDto;
  // Check if the email exists or not; if it doesn't exist, we create
  // an account, else we use the existing user to create a booking
  let user = await this.userRepository.findByEmail(email);
  if (!user) {
    // We use a random password here to create the user on Auth0
    const password = Utilities.generatePassword(16);
    const { auth0Response } = await this.createUserOnAuth0(email, password);
    this.logger.debug(auth0Response, 'Created Auth0 User');
    const userData: CreateUserDto = {
      email,
      auth0UserId: auth0Response['_id'],
    };
    // Creates and links the Auth0 user with our DB
    user = await this.userRepository.addUser(userData);
  }
  const booking = {
    userId: user.id,
    // Assuming the payment was done before this API call in a different service
    transactionId: createBookingDto.transaction.id,
    showId: createBookingDto.show.id,
    theaterId: createBookingDto.theater.id,
    seatNumbers: createBookingDto.seats,
  };
  // Creates a booking
  const bookingObject = await this.bookingRepository.bookTicket(booking);
  return new BookingResponseDto(bookingObject);
}
As for creating the user on Auth0, we will use Auth0’s management API with the /dbconnections/signup endpoint. Apart from the config details that the API requires (client_id, client_secret and connection), it also requires email and password. For the password, we will use a randomly generated one.
After the user has been created, we will send a forgotten password email to that email address so that the user can set the password and access the account.
Do note you will have to use the client_id, client_secret, and connection of the ManagementV2 application that was created in the Auth0 dashboard.
private async createUserOnAuth0(
  email: string,
  password: string,
  createdBy: string,
  retryCount = 0,
): Promise<Record<string, string>> {
  try {
    const axiosResponse = await this.httpService
      .post(`https://${configService.getAuth0Domain()}/dbconnections/signup`, {
        client_id: configService.getAuth0ClientId(),
        client_secret: configService.getAuth0ClientSecret(),
        connection: configService.getAuth0Connection(),
        email,
        password,
      })
      .toPromise();
    this.logger.log(axiosResponse.data.email, 'Auth0 user created with email');
    // Send password reset email
    this.sendPasswordResetEmail(email);
    return { auth0Response: axiosResponse.data, password };
  } catch (err) {
    this.logger.error(err);
    /**
     * {@link https://auth0.com/docs/connections/database/password-strength}
     * Auth0 does not send any specific response, so here we call create user again,
     * assuming the password failed to meet Auth0's requirements.
     * We retry at most ERROR_RETRY_COUNT times and then stop,
     * so we don't get into an infinite loop.
     */
    if (retryCount < ERROR_RETRY_COUNT) {
      return this.createUserOnAuth0(
        email,
        Utilities.generatePassword(16),
        createdBy,
        retryCount + 1,
      );
    }
    throw new HttpException(err, HttpStatus.BAD_REQUEST);
  }
}
To send the forgotten password email, we will use the /dbconnections/change_password endpoint from the management API. The code is pretty straightforward.
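As a sketch, here is a small helper that builds the request for that endpoint (the /dbconnections/change_password endpoint and its client_id, connection, and email fields come from Auth0's API; the helper name and argument names are ours):

```javascript
// Build the request for Auth0's /dbconnections/change_password endpoint,
// which emails the user a link to set a new password.
function buildChangePasswordRequest(domain, clientId, connection, email) {
  return {
    url: `https://${domain}/dbconnections/change_password`,
    body: {
      client_id: clientId,
      connection: connection,
      email: email,
    },
  };
}
```

The returned body can then be POSTed as JSON to the returned url with any HTTP client (e.g. the injected httpService, as in createUserOnAuth0 above).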
This way, the user can change the password, and he/she will be able to access their account.
With this, the user can now make a booking without signing up, while a corresponding user is created in Auth0, so when they log in later using the universal login page, Auth0 will have a reference for them.
Conclusion:
Auth0 is a great platform for managing your application’s authentication and authorization needs if you have a simple enough login/signup flow. It can get a bit tricky when you are trying to implement a non-traditional login/signup flow or a custom flow, which is not supported by Auth0. In such a scenario, you would need to add some custom code as explained in the example above.
The data we render on a UI originates from different sources like databases, APIs, files, and more. In React applications, when the data is received, we first store it in state and then pass it to the other components in multiple ways for rendering.
But most of the time, the format of the data is inconvenient for the rendering component. So, we have to format data and perform some prior calculations before we give it to the rendering component.
Sending data directly to the rendering component and processing the data inside that component is not recommended. Not only data processing, but also heavy background jobs that we would otherwise depend on the backend for, can now be done on the client side, because React allows business logic to live on the front-end.
A good practice is to create a separate function for processing that data which is isolated from the rendering logic, so that data processing and data representation will be done separately.
Why? There are two possible reasons: – The processed data can be shared/used by other components, too.
– The main reason to avoid this is: if the data processing is a time-consuming task, you will see some lag on the UI, or in the worst-case scenario, sometimes the page may become unresponsive.
As JavaScript is a single-threaded environment, it has only one call stack to execute scripts (in a simple way, you cannot run more than one script at the same time).
For example, suppose you have to do some DOM manipulations and, at the same time, want to do some complex calculations. You can not perform these two operations in parallel. If the JavaScript engine is busy computing the complex computation, then all the other tasks like event listeners and rendering callbacks will get blocked for that amount of time, and the page may become unresponsive.
How can you solve this problem?
Though JavaScript is single-threaded, many developers mimic concurrency with the help of timer functions and event handlers: by breaking heavy (time-consuming) tasks into tiny chunks and using timers, you can split up their execution. Let's take a look at the following example.
Here, the processDataArray function uses setTimeout to split the execution: it processes some items of the array, then, after a short delay, processes more items; once all the array elements have been processed, it sends the processed result back via finishCallback.
const processDataArray = (dataArray, finishCallback) => {
  // take a new copy of the array
  const todo = dataArray.concat();
  // to store each processed item
  let result = [];
  // timer function
  const timedProcessing = () => {
    const start = +new Date();
    do {
      // process each data item and store its result
      const singleResult = processSingleData(todo.shift());
      result.push(singleResult);
      // keep going while todo has items and less than 50 ms have elapsed
    } while (todo.length > 0 && +new Date() - start < 50);
    // check for remaining items to process
    if (todo.length > 0) {
      setTimeout(timedProcessing, 25);
    } else {
      // finished with all the items, initiate the finish callback
      finishCallback(result);
    }
  };
  setTimeout(timedProcessing, 25);
};

const processSingleData = data => {
  // process data
  return processedData;
};
You can find more about how JavaScript timers work internally here.
The problem is not fully solved, though: the main thread is still busy with the computation, so you can still see delays in UI events like button clicks or mouse scrolls. That is a bad user experience when a big array computation is running and the web user is impatient.
The better, genuinely multithreaded way to solve this problem and run multiple scripts in parallel is to use Web Workers.
What are Web Workers?
Web Workers provide a mechanism to run a separate script in the background, where you can do any kind of computation without disturbing the UI. A Web Worker runs outside the context of the HTML document’s scripts, which makes true concurrent execution of JavaScript possible: with Web Workers you get multithreaded behavior.
Communication between the page (main thread) and the worker happens through a simple mechanism: each side sends messages with the postMessage method and receives them in an onmessage callback function. Let’s take a look at a simple example.
In this example, we will delegate the work of multiplying all the numbers in an array to a Web Worker, and the Web Worker returns the result back to the main thread.
```javascript
// App.js
import "./App.css";
import { useEffect, useState } from "react";

function App() {
  // This will load and execute the worker.js script in the background.
  const [webworker] = useState(new window.Worker("worker.js"));
  const [result, setResult] = useState("Calculating....");

  useEffect(() => {
    const message = { multiply: { array: new Array(1000).fill(2) } };
    webworker.postMessage(message);
    webworker.onerror = () => {
      setResult("Error");
    };
    webworker.onmessage = (e) => {
      if (e.data) {
        setResult(e.data.result);
      } else {
        setResult("Error");
      }
    };
  }, []);

  useEffect(() => {
    return () => {
      webworker.terminate();
    };
  }, []);

  return (
    <div className="App">
      <h1>Webworker Example In React</h1>
      <header className="App-header">
        <h1>Multiplication Of large array</h1>
        <h2>Result: {result}</h2>
      </header>
    </div>
  );
}

export default App;
```
```javascript
// worker.js
onmessage = (e) => {
  const { multiply } = e.data;
  // check that the data is correctly framed
  if (multiply && multiply.array.length) {
    // intentionally delay the execution
    setTimeout(() => {
      // post the result back to the page
      postMessage({
        result: multiply.array.reduce(
          (firstItem, secondItem) => firstItem * secondItem
        ),
      });
    }, 2000);
  } else {
    postMessage({ result: 0 });
  }
};
```
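To see what the worker actually computes, here is the same reduce run outside the worker; the multiplyAll helper is hypothetical, used only for illustration:

```javascript
// The worker's reduce simply multiplies every element of the array.
// multiplyAll is a hypothetical helper, not part of the worker code.
const multiplyAll = (array) =>
  array.reduce((firstItem, secondItem) => firstItem * secondItem);

console.log(multiplyAll([3, 4, 5])); // 60
console.log(multiplyAll(new Array(10).fill(2))); // 2 ** 10 = 1024
```

The worker in the example does exactly this over an array of one thousand 2s, after an artificial 2-second delay.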
If the worker script throws an exception, you can handle it by attaching a callback function to the onerror property of the worker in the App.js script.
From the main thread, you can terminate the worker immediately using the worker’s terminate method. A terminated worker cannot be restarted; you need to create a new instance if you want to use a worker again.
Web Worker use cases:
– Charting middleware – Suppose you have to design a dashboard that presents business-engagement analytics for a retention application through a pivot table, pie charts, and bar charts. Converting the raw data into the format each table and chart expects involves heavy processing, which may leave the UI failing to update, freezing, or the page unresponsive because of JavaScript’s single-threaded behavior. Here we can delegate the processing logic to a Web Worker, so the main thread stays available to handle other UI events.
– Emulating Excel functionality – For example, if a spreadsheet has thousands of rows and each of them needs some potentially long-running calculations, you can write custom functions containing the processing logic and put them in the Web Worker’s script.
– Real-time text analyzer – Another good example: use a Web Worker to show the word count, character count, repeated-word count, etc., by analyzing the text the user types in real time. With a traditional implementation you may see performance issues as the text grows, but delegating the analysis to a Web Worker avoids them.
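As a sketch of the analysis logic such a worker might run, here is a hypothetical analyzeText helper (not part of any library); inside the worker you would call it from onmessage and send the result back with postMessage:

```javascript
// Hypothetical analysis logic for the worker's script.
// In worker.js: onmessage = (e) => postMessage(analyzeText(e.data.text));
const analyzeText = (text) => {
  // split into lowercase word tokens (letters, digits, apostrophes)
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  const frequencies = {};
  for (const word of words) {
    frequencies[word] = (frequencies[word] || 0) + 1;
  }
  return {
    wordCount: words.length,
    charCount: text.length,
    repeatedWords: Object.keys(frequencies).filter((w) => frequencies[w] > 1),
  };
};

console.log(analyzeText("The rain in Spain stays mainly in the plain"));
// wordCount: 9, repeatedWords: ["the", "in"]
```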
Web Worker limitations:
Web Workers are amazing and quite simple to use, but since a Web Worker runs on a separate thread, it does not have access to the window, document, or parent objects. And we cannot pass functions through postMessage: only data that the structured clone algorithm can copy.
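postMessage copies data with the structured clone algorithm, which is also exposed as the global structuredClone function in modern browsers and Node.js 17+; this quick sketch shows what does and does not survive cloning:

```javascript
// Plain data (objects, arrays, Dates, typed arrays, ...) clones fine.
const payload = { numbers: [1, 2, 3], when: new Date(0) };
const copy = structuredClone(payload);
console.log(copy.numbers); // a deep copy: [ 1, 2, 3 ]
console.log(copy.when instanceof Date); // true

// Functions cannot be cloned, so sending one via postMessage throws
// a DataCloneError; structuredClone fails the same way.
let cloneFailed = false;
try {
  structuredClone({ fn: () => {} });
} catch (err) {
  cloneFailed = true;
}
console.log(cloneFailed); // true
```

This is also why worker messages should carry plain data and the functions that act on it should already live in the worker's script.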
Web Workers make our lives easier by doing jobs in parallel in the background, but they are relatively heavyweight, with a high startup cost and a high per-instance memory cost, so per the WHATWG HTML specification they are not intended to be used in large numbers.