A Progressive Web Application, or PWA, is a web application built to look and behave like a native app: it works offline-first and is optimized for a variety of viewports, from mobile phones and tablets to FHD desktop monitors and beyond. PWAs are built with front-end technologies such as HTML, CSS, and JavaScript, bring a native-like user experience to the web platform, and can be installed on devices just like native apps.
For an application to be classified as a PWA, it must tick all of these boxes:
PWAs must implement service workers. A service worker acts as a proxy between the browser and the network, allowing the web app to intercept, manage, and cache network requests and assets
PWAs must be served over a secure network, i.e., the application must be served over HTTPS
PWAs must have a web app manifest, a JSON file that provides basic information about the PWA, such as its name, icons, look and feel, splash screen, version, description, author, etc.
Why build a PWA?
Businesses and engineering teams should consider building a progressive web app instead of a traditional web app. Here are some of the most prominent arguments in favor of PWAs:
PWAs are responsive. The mobile-first design approach enables PWAs to support a variety of viewports and orientations
PWAs can work in slow or no-Internet environments. App developers can choose how a PWA behaves when there's no connectivity, whereas traditional web apps or websites simply stop working without an active Internet connection
PWAs are secure because they are always served over HTTPS
PWAs can be installed on the home screen, making the application more accessible
PWAs bring rich features, such as push notifications, application updates, and more
PWA and React
There are various ways to build a progressive web application. One can just use Vanilla JS, HTML and CSS or pick up a framework or library. Some of the popular choices in 2020 are Ionic, Vue, Angular, Polymer, and of course React, which happens to be my favorite front-end library.
Building PWAs with React
To get started, let’s create a PWA which lists all the users in a system.
npm init react-app users
cd users
yarn add react-router-dom
yarn run start
Next, we will replace the default App.js file with our own implementation.
The default behavior here is to not register a service worker, i.e., the Create React App (CRA) boilerplate lets users opt in to the offline-first experience.
2. Update the manifest file
The CRA boilerplate provides a manifest file out of the box. This file is located at /public/manifest.json and needs to be modified to include the name of the PWA, description, splash screen configuration and much more. You can read more about available configuration options in the manifest file here.
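A minimal manifest along these lines might look like the following; the names, icon paths, and colors are placeholders close to the CRA defaults:

```json
{
  "short_name": "Users",
  "name": "Users PWA",
  "description": "Lists all the users in the system",
  "icons": [
    { "src": "favicon.ico", "sizes": "64x64 32x32 24x24 16x16", "type": "image/x-icon" },
    { "src": "logo192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "logo512.png", "sizes": "512x512", "type": "image/png" }
  ],
  "start_url": ".",
  "display": "standalone",
  "theme_color": "#000000",
  "background_color": "#ffffff"
}
```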
Here the display mode selected is "standalone," which tells web browsers to give this PWA the same look and feel as a standalone app. Other display options include "browser," the default mode, which launches the PWA like a traditional web app, and "fullscreen," which opens the PWA in fullscreen mode, hiding all other elements such as the navigation, the address bar, and the status bar.
The manifest can be inspected using Chrome dev tools > Application tab > Manifest.
1. Test the PWA:
To test a progressive web app, build it completely first. PWA features such as caching aren't enabled while running the app in dev mode, to ensure hassle-free development.
Create a production build with: npm run build
Change into the build directory: cd build
Host the app locally: http-server or python3 -m http.server 8080
Test the application by navigating to http://localhost:8080
2. Audit the PWA: If you are testing the app for the first time on a desktop or laptop browser, PWA may look like just another website. To test and audit various aspects of the PWA, let’s use Lighthouse, which is a tool built by Google specifically for this purpose.
PWA on mobile
At this point, we already have a simple PWA which can be published on the Internet and made available to billions of devices. Now let’s try to enhance the app by improving its offline viewing experience.
1. Offline indication: Since service workers can operate without the Internet as well, let’s add an offline indicator banner to let users know the current state of the application. We will use navigator.onLine along with the “online” and “offline” window events to detect the connection status.
The easiest way to test this is to just turn off the Wi-Fi on your dev machine. Chrome dev tools also provide an option to test this without actually going offline. Head over to Dev tools > Network and then select “Offline” from the dropdown in the top section. This should bring up the banner when the app is offline.
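As a sketch of that detection logic in plain JavaScript (in the React app this would live in component state; the `offline-banner` element id and banner text are assumptions for illustration):

```javascript
// Pure helper: derive the banner state from a connectivity flag.
function bannerFor(isOnline) {
  return isOnline
    ? { visible: false, text: '' }
    : { visible: true, text: 'You are offline. Some features may be unavailable.' };
}

// Browser-only wiring; the guard lets the module load outside a browser too.
if (typeof window !== 'undefined') {
  const banner = document.getElementById('offline-banner'); // assumed element
  const render = () => {
    const state = bannerFor(navigator.onLine);
    banner.hidden = !state.visible;
    banner.textContent = state.text;
  };
  window.addEventListener('online', render);
  window.addEventListener('offline', render);
  render();
}
```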
2. Let’s cache a network request using service worker
CRA comes with its own service-worker.js file which caches all static assets such as JavaScript and CSS files that are a part of the application bundle. To put custom logic into the service worker, let’s create a new file called ‘custom-service-worker.js’ and combine the two.
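CRA's production service worker is generated by Workbox; the snippet below is a hand-written sketch of the cache-first idea such a custom worker could add. The cache name and API path are assumptions:

```javascript
const CACHE_NAME = 'app-cache-v1'; // assumed cache name
const API_PREFIX = '/api/users';   // assumed endpoint to cache

// Decide whether a request should be served cache-first.
function isCacheable(url) {
  return new URL(url, 'http://localhost').pathname.startsWith(API_PREFIX);
}

// Service-worker-only wiring: `self` is the worker's global scope.
if (typeof self !== 'undefined' && 'caches' in self) {
  self.addEventListener('fetch', (event) => {
    if (!isCacheable(event.request.url)) return;
    event.respondWith(
      caches.open(CACHE_NAME).then((cache) =>
        cache.match(event.request).then(
          (cached) =>
            cached ||
            fetch(event.request).then((response) => {
              // Cache a copy of the fresh response for offline use.
              cache.put(event.request, response.clone());
              return response;
            })
        )
      )
    );
  });
}
```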
Install react-app-rewired and update package.json:
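After installing, the package.json scripts point at react-app-rewired instead of react-scripts; this is the standard react-app-rewired setup, with the custom webpack override living in a config-overrides.js file:

```json
"scripts": {
  "start": "react-app-rewired start",
  "build": "react-app-rewired build",
  "test": "react-app-rewired test"
}
```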
Your app should now work correctly in offline mode.
Distributing and publishing a PWA
PWAs can be published just like any other website, with only one additional requirement: they must be served over HTTPS. When a user visits the PWA from a mobile or tablet device, a pop-up asks whether they'd like to install the app to their home screen.
Conclusion
Building PWAs with React enables engineering teams to develop, deploy, and publish progressive web apps for billions of devices using technologies they're already familiar with. Existing React apps can also be converted to PWAs. PWAs are fun to build, easy to ship and distribute, and add a lot of value for customers by providing a native-like experience and better engagement via features such as add-to-homescreen and push notifications, all without an app-store installation process.
“Hope this email finds you well” is how 2020-2021 has been in a nutshell. Since we’ve all been working remotely since last year, actively collaborating with teammates became one notch harder, from activities like brainstorming a topic on a whiteboard to building documentation.
Tools powered by collaborative systems have become a necessity. To explore this, following the "build fast, fail fast" principle, I started building a collaborative editor from existing open-source tools, one that can eventually be extended for needs across different projects.
Conflicts, as they say, are inevitable when multiple users constantly modify the same document, especially the same block of content. Ultimately, the end-user experience is defined by how such conflicts are resolved.
There are various conflict resolution mechanisms, but two of the most commonly discussed ones are Operational Transformation (OT) and Conflict-Free Replicated Data Type (CRDT). So, let’s briefly talk about those first.
Operational Transformation
The order of operations matters in OT. Each user has their own local copy of the document, and mutations are atomic, such as "insert V at index 4" or "delete X at index 2". If the order of these operations changes, the end result differs. That's why all operations are synchronized through a central server, which can adjust indices and operations before forwarding them to the clients. For example, in the image below, User2 issues a delete(0) operation, but because the OT server knows that User1 has made an insert operation, User2's operation must be transformed into delete(1) before being applied to User1's copy.
OT with a central server is typically easier to implement. Plain text operations with OT in its basic form only has three defined operations: insert, delete, and apply.
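The index-shifting idea can be sketched as a tiny transform function over single-character insert and delete operations. This is a heavy simplification of real OT, which also handles ties, rich-text attributes, and composite operations:

```javascript
// Transform a remote operation so it can be applied after a local one.
// Each op is { type: 'insert' | 'delete', index: number }.
function transform(remote, local) {
  if (local.type === 'insert' && remote.index >= local.index) {
    // A local insert shifted everything at or after its index one step right.
    return { ...remote, index: remote.index + 1 };
  }
  if (local.type === 'delete' && remote.index > local.index) {
    // A local delete shifted everything after its index one step left.
    return { ...remote, index: remote.index - 1 };
  }
  return remote;
}
```

With User1's insert at index 0 as the local operation, User2's delete(0) is transformed into delete(1), matching the example above.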
“Fully distributed OT and adding rich text operations are very hard, and that’s why there’s a million papers.”
CRDT
Instead of performing operations directly on characters like in OT, CRDT uses a complex data structure to which it can then add/update/remove properties to signify transformation, enabling scope for commutativity and idempotency. CRDTs guarantee eventual consistency.
There are different algorithms, but in general CRDT has two requirements: globally unique characters and globally ordered characters. Basically, each object gets a global reference instead of a positional index, and ordering is derived from the neighboring objects. Fractional indices can be used to assign an index to an object.
Because every object has its own unique reference, the delete operation becomes idempotent, and fractional indexing is one way to assign unique references during insertion and updates.
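A toy illustration of fractional indexing follows; real CRDTs also attach a unique client/site ID to each position to break ties between concurrent inserts, which is omitted here:

```javascript
// Each character gets a position strictly between its neighbours, so
// concurrent inserts never need to shift existing indices.
function positionBetween(left, right) {
  return (left + right) / 2;
}

// Insert a character between two neighbour positions in a sorted list.
function insertChar(chars, ch, leftPos, rightPos) {
  const pos = positionBetween(leftPos, rightPos);
  return [...chars, { ch, pos }].sort((a, b) => a.pos - b.pos);
}
```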
There are two types of CRDT: state-based, where the whole state (or a delta of it) is shared between instances and merged continuously, and operation-based, where only individual operations are sent between replicas. If you want to dive deep into CRDT, here's a nice resource.
For our purposes, we choose CRDT since it can also support peer-to-peer networks. If you directly want to jump to the code, you can visit the repo here.
Tools used for this project:
As our goal was a quick implementation, we targeted off-the-shelf tools for the editor and for the backend that manages collaborative operations.
Quill.js is an API-driven WYSIWYG rich text editor built for compatibility and extensibility. We chose Quill as our editor because it is easy to plug into an application and has extensions readily available.
Yjs is a framework that provides shared editing capabilities by exposing its different shared data types (Array, Map, Text, etc) that are synced automatically. It’s also network agnostic, so the changes are synced when a client is online. We used it because it’s a CRDT implementation, and surprisingly had readily available bindings for quill.js.
Prerequisites:
To keep it simple, we’ll set up a client and server both in the same code base. Initialize a project with npm init and install the below dependencies:
npm i quill quill-cursors webpack webpack-cli webpack-dev-server y-quill y-websocket yjs
Quill: Quill is the WYSIWYG rich text editor we will use as our editor.
quill-cursors is an extension that helps us to display cursors of other connected clients to the same editor room.
Webpack, webpack-cli, and webpack-dev-server are developer utilities, webpack being the bundler that creates a deployable bundle for your application.
The y-quill module provides bindings between Yjs and Quill using the shared type Y.Text. For more information, you can check out the module's source on Github.
Y-websocket provides a WebsocketProvider to communicate with Yjs server in a client-server manner to exchange awareness information and data.
Yjs is the CRDT framework that orchestrates conflict resolution between multiple clients.
This is a basic webpack config where we specify the entry point of our frontend project, the index.js file. Webpack uses that file to build the internal dependency graph of the project. The output property defines where and how the generated bundles should be saved, and the devServer config defines the necessary parameters for the local dev server, which runs when you execute "npm start".
We’ll first create an index.html file to define the basic skeleton:
The index.html has a pretty basic structure. In <head>, we’ve provided the path of the bundled js file that will be created by webpack, and the css theme for the quill editor. And for the <body> part, we’ve just created a button to connect/disconnect from the backend and a placeholder div where the quill editor will be plugged.
Here, we’ve just made the imports, registered quill-cursors extension, and added an event listener for window load:
import Quill from "quill";
import * as Y from "yjs";
import { QuillBinding } from "y-quill";
import { WebsocketProvider } from "y-websocket";
import QuillCursors from "quill-cursors";

// Register the QuillCursors module to add the ability to show multiple cursors on the editor.
Quill.register("modules/cursors", QuillCursors);

window.addEventListener("load", () => {
  // We'll add more blocks as we continue
});
Let’s initialize the Yjs document, socket provider, and load the document:
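That wiring, sketched as a helper function so the shape is easy to see; the editor selector, server URL, and room name are assumptions, and the dependencies are passed in explicitly (in the real file they come from the imports above):

```javascript
// Wire Quill to Yjs: one shared Y.Text per room, synced over a websocket.
function initCollaboration({ Y, Quill, WebsocketProvider, QuillBinding }, serverUrl, roomName) {
  // The CRDT document; all clients in the same room converge on its state.
  const ydoc = new Y.Doc();
  // Syncs the doc with the y-websocket server and carries awareness (cursors).
  const provider = new WebsocketProvider(serverUrl, roomName, ydoc);
  // The shared text type that backs the editor contents.
  const ytext = ydoc.getText('quill');
  // The editor itself, with the cursors module registered earlier enabled.
  const editor = new Quill('#editor', { modules: { cursors: true }, theme: 'snow' });
  // Two-way binding: local edits update ytext, remote updates patch the editor.
  const binding = new QuillBinding(ytext, editor, provider.awareness);
  return { ydoc, provider, editor, binding };
}
```

Inside the window "load" listener this would be invoked as, e.g., `initCollaboration({ Y, Quill, WebsocketProvider, QuillBinding }, 'ws://localhost:1234', 'quill-demo-room')`.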
Conflict resolution approaches are not new, but with the trend toward remote work, it is important to have good collaborative systems in place to enhance productivity.
Although this example was just on rich text editing capabilities, we can extend existing resources to build more features and structures like tabular data, graphs, charts, etc. Yjs shared types can be used to define your own data format based on how your custom editor represents data internally.
An app is only as good as the problem it solves. But your app’s performance can be extremely critical to its success as well. A slow-loading web app can make users quit and try out an alternative in no time. Testing an app’s performance should thus be an integral part of your development process and not an afterthought.
In this article, we will talk about how you can proactively monitor and boost your app’s performance as well as fix common issues that are slowing down the performance of your app.
I’ll use the following tools for this blog.
Lighthouse – A performance audit tool, developed by Google
Webpack – A JavaScript bundler
You can find similar tools online, both free and paid. So let’s give our Vue a new Angular perspective to make our apps React faster.
Performance Metrics
First, we need to understand which metrics play an important role in determining an app’s performance. Lighthouse helps us calculate a score based on a weighted average of the below metrics:
First Contentful Paint (FCP) – 15%
Speed Index (SI) – 15%
Largest Contentful Paint (LCP) – 25%
Time to Interactive (TTI) – 15%
Total Blocking Time (TBT) – 25%
Cumulative Layout Shift (CLS) – 5%
By taking the above stats into account, Lighthouse gauges your app’s performance as such:
0 to 49 (slow): Red
50 to 89 (moderate): Orange
90 to 100 (fast): Green
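The weighted average can be illustrated with a toy function. This is not Lighthouse's actual code: real Lighthouse first converts raw metric timings into 0–100 scores via log-normal curves; here those per-metric scores are assumed to be given:

```javascript
// Weights from the list above (Lighthouse v6-era weighting).
const WEIGHTS = { FCP: 0.15, SI: 0.15, LCP: 0.25, TTI: 0.15, TBT: 0.25, CLS: 0.05 };

// Weighted average of per-metric scores, each already on a 0-100 scale.
function performanceScore(metricScores) {
  const total = Object.entries(WEIGHTS).reduce(
    (sum, [metric, weight]) => sum + weight * metricScores[metric],
    0
  );
  return Math.round(total);
}
```

A page with perfect LCP and TBT but zeros elsewhere would score only 50, landing in the "moderate" band.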
I would recommend going through Lighthouse performance scoring to learn more. Once you understand Lighthouse, you can audit websites of your choosing.
I gathered audit scores for a few websites, including Walmart, Zomato, Reddit, and British Airways. Almost all of them had a performance score below 30. A few even scored in the single digits.
To attract more customers, businesses fill their apps with many attractive features. But they ignore the most important thing: performance, which degrades with the addition of each such feature.
As I said earlier, it’s all about the user experience. You can read more about why performance matters and how it impacts the overall experience.
Now, with that being said, I want to challenge you to conduct a performance test on your favorite app. Let me know if it receives a good score. If not, then don’t feel bad.
Below your scores are the improvement opportunities suggested by Lighthouse. Fixing these improves the performance metrics above and eventually boosts your app's performance. So let's check them out one by one.
Here are all the possible opportunities listed by Lighthouse:
Eliminate render-blocking resources
Properly size images
Defer offscreen images
Minify CSS & JavaScript
Serve images in the next-gen formats
Enable text compression
Preconnect to required origins
Avoid multiple page redirects
Use video formats for animated content
A few other opportunities won’t be covered in this blog, but they are just an extension of the above points. Feel free to read them under the further reading section.
This section lists all the render-blocking resources. The main goal is to reduce their impact by:
removing unnecessary resources,
deferring non-critical resources, and
in-lining critical resources.
To do that, we need to understand what a render-blocking resource is.
Render-blocking resources and how to identify them
As the name suggests, it’s a resource that prevents a browser from rendering processed content. Lighthouse identifies the following as render-blocking resources:
A <script> tag in <head> that doesn't have a defer or async attribute
A <link rel="stylesheet"> tag that doesn't have a media attribute matching the user's device, or a disabled attribute hinting the browser not to download it when unnecessary
A <link rel="import"> that doesn't have an async attribute
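For example, marking non-critical resources so they stop blocking the first paint; the file names here are placeholders:

```html
<head>
  <!-- Critical CSS inlined so first paint doesn't wait on a network fetch -->
  <style>/* critical above-the-fold rules */</style>
  <!-- Non-critical stylesheet scoped to a media query -->
  <link rel="stylesheet" href="print.css" media="print">
  <!-- Script downloads in parallel and runs only after parsing finishes -->
  <script src="app.js" defer></script>
</head>
```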
To reduce the impact, you need to identify what’s critical and what’s not. You can read how to identify critical resources using the Chrome dev tool.
Classify Resources
Classify resources as critical and non-critical based on the following color code:
Green (critical): Needed for the first paint.
Red (non-critical): Not needed for the first paint but will be needed later.
Solution
Now, to eliminate render-blocking resources:
Extract the critical part into an inline resource and add the correct attributes to the non-critical resources. These attributes will indicate to the browser what to download asynchronously. This can be done manually or by using a JS bundler.
Webpack users can use the libraries below to do it in a few easy steps:
For extracting critical CSS, you can use html-critical-webpack-plugin or critters-webpack-plugin. It generates an inline <style> tag in <head> with the critical CSS stripped out of the main CSS chunk, and preloads the main file
For extracting CSS depending on media queries, use media-query-splitting-plugin or media-query-plugin
The first paint shouldn't depend on JavaScript files. Use lazy loading and code-splitting techniques so resources are downloaded only when the browser requests them. Webpack's magic comments make lazy loading easy
And finally, for the main chunk, vendor chunk, or any other external scripts (included in index.html), you can defer them using script-ext-html-webpack-plugin
There are many more libraries for inlining CSS and deferring external scripts. Feel free to use as per the use case.
Use Properly Sized Images
This section lists all the images used in a page that aren’t properly sized, along with the stats on potential savings for each image.
How Does Lighthouse Identify Oversized Images?
Lighthouse calculates potential savings by comparing the rendered size of each image on the page with its actual size. The rendered image varies based on the device pixel ratio. If the size difference is at least 25 KB, the image will fail the audit.
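That rule can be sketched as a toy check; a simplification, since the real audit derives the needed size from rendered dimensions and device pixel ratio, while here both sizes are assumed given in kilobytes:

```javascript
// An image fails the audit when serving it costs at least 25 KB more
// than a version sized to its rendered dimensions would.
function failsSizeAudit(actualKB, neededKB) {
  return actualKB - neededKB >= 25;
}
```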
Solution
DO NOT serve images that are larger than their rendered versions! The wasted bytes only hamper load time.
Alternatively,
Use responsive images. With this technique, create multiple versions of the images to be used in the application and serve them depending on the media queries, viewport dimensions, etc
Use image CDNs to optimize images. These are like a web service API for transforming images
Use vector images, like SVG. These are built from simple primitives and can scale without losing quality or increasing file size
You can resize images online or on your system using tools. Learn how to serve responsive images.
Defer Offscreen Images
An offscreen image is an image located outside of the visible browser viewport.
The audit fails if the page has offscreen images. Lighthouse lists all offscreen or hidden images in your page, along with the potential savings.
Solution
Load offscreen images only when the user focuses on that part of the viewport. To achieve this, lazy-load these images after loading all critical resources.
There are many libraries available online that will load images depending on the visible viewport. Feel free to use them as per the use case.
Minify CSS and JavaScript
Lighthouse identifies all the CSS and JS files that are not minified. It will list all of them along with potential savings.
Minifiers can do it for you. Webpack users can use mini-css-extract-plugin and terser-webpack-plugin for minifying CSS and JS, respectively.
Serve Images in Next-gen Formats
Following are the next-gen image formats:
WebP
JPEG 2000
JPEG XR
The image formats we use regularly (i.e., JPEG and PNG) have inferior compression and quality characteristics compared to next-gen formats. Encoding images in these formats can load your website faster and consume less cellular data.
Lighthouse converts each image in an older format to WebP and reports the ones with potential savings of more than 8 KB.
Solution
Convert all, or at least the images Lighthouse recommends, into the above formats. Use your converted images with the fallback technique below to support all browsers.
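A common fallback pattern uses the <picture> element: the browser picks the first source it can decode and otherwise falls back to the plain <img>. The file names here are placeholders:

```html
<picture>
  <source srcset="photo.webp" type="image/webp">
  <!-- Browsers without WebP support fall back to the plain <img> -->
  <img src="photo.jpg" alt="Description of the photo">
</picture>
```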
Enable Text Compression
Text compression uses compression algorithms to find repeated sequences in textual resources and replace them with shorter representations. It's done to further minimize the total network bytes.
Lighthouse lists all the text-based resources that are not compressed.
It computes the potential savings by identifying text-based resources that do not include a Content-Encoding header set to br, gzip, or deflate, and compressing each of them with gzip.
If the potential compression savings is more than 10% of the original size, then the file fails the audit.
Solution
Webpack users can use compression-webpack-plugin for text compression.
The best part about this plugin is that it supports Google’s Brotli compression algorithm which is superior to gzip. Alternatively, you can also use brotli-webpack-plugin. All you need to do is configure your server to return Content-Encoding as br.
Brotli compresses faster than gzip and produces smaller files (up to 20% smaller). As of June 2020, Brotli is supported by all major browsers except Safari on iOS and desktop and Internet Explorer.
Don’t worry. You can still use gzip as a fallback.
Preconnect to Required Origins
This section lists all the key fetch requests that are not yet prioritized using <link rel="preconnect">.
Establishing connections often takes significant time, especially for secure connections, since it involves DNS lookups, redirects, and several round trips to the final server handling the user's request.
Solution
Establish an early connection to required origins. Doing so will improve the user experience without affecting bandwidth usage.
To achieve this connection, use preconnect or dns-prefetch. This informs the browser that the app wants to establish a connection to the third-party origin as soon as possible.
Use preconnect for most critical connections. For non-critical connections, use dns-prefetch. Check out the browser support for preconnect. You can use dns-prefetch as the fallback.
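In markup this looks like the following; the origins are placeholders:

```html
<!-- Critical third-party origin: do the DNS lookup and TCP/TLS handshakes early -->
<link rel="preconnect" href="https://api.example.com" crossorigin>
<!-- Non-critical origin: resolve DNS only; also works as a preconnect fallback -->
<link rel="dns-prefetch" href="https://cdn.example.com">
```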
Apply this to the origins Lighthouse flags; it will help you optimize your pages' Critical Rendering Path.
Use Video Formats for Animated Content
This section lists all the animated GIFs on your page, along with the potential savings.
Large GIFs are inefficient when delivering animated content. You can save a significant amount of bandwidth by using videos over GIFs.
Solution
Consider using MPEG4 or WebM videos instead of GIFs. Many tools can convert a GIF into a video, such as FFmpeg.
Use the code below to replicate a GIF's behavior using MPEG4 and WebM. It will play silently and automatically in an endless loop, just like a GIF, and it ensures there is a fallback for unsupported formats.
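The file names here are placeholders:

```html
<!-- Silent, autoplaying, endless loop: GIF-like behavior -->
<video autoplay loop muted playsinline>
  <source src="animation.webm" type="video/webm">
  <!-- MP4 fallback for browsers without WebM support -->
  <source src="animation.mp4" type="video/mp4">
</video>
```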
Note: Do not use video formats for a small batch of GIF animations. It’s not worth doing it. It comes in handy when your website makes heavy use of animated content.
Final Thoughts
I saw a great improvement in my app's performance after trying out the techniques above.
While they may not all fit your app, try them and see what works and what doesn't. I have compiled a list of resources that will help you enhance performance. Hopefully, they help.
Do share your starting and final audit scores with me.
With the evolving architectural design of web applications, microservices have become a successful trend in architecting the application landscape. Along with these advancements in application architecture, transport protocols such as REST and gRPC are becoming more efficient and faster. Containerizing microservice applications also greatly helps agile development and high-speed delivery.
In this blog, I will try to showcase how simple it is to build a cloud-native application on the microservices architecture using Go.
We will break the solution into multiple steps. We will learn how to:
1) Build a set of containerized microservices, each owning a very specific set of independent tasks and related only to its specific logical component.
2) Use go-kit as the framework for developing and structuring the components of each service.
3) Build APIs that will use HTTP (REST) and Protobuf (gRPC) as the transport mechanisms, PostgreSQL for databases and finally deploy it on Azure stack for API management and CI/CD.
Note: Deployment, CI/CD setup, and API management on Azure or any other cloud are not in the scope of this blog.
Prerequisites:
A beginner’s level of understanding of web services, Rest APIs and gRPC
GoLand/ VS Code
Properly installed and configured Go. If not, check it out here
Set up a new project directory under the GOPATH
Understanding of the standard Golang project. For reference, visit here
PostgreSQL client installed
Go kit
What are we going to do?
We will develop a simple web application working on the following problem statement:
A global publishing company that publishes books and journals wants to develop a service to watermark their documents. A document (books, journals) has a title, author and a watermark property
The watermark operation can be in Started, InProgress, or Finished status
Only a specific set of users should be able to watermark a document
Once the watermark is done, the document can never be re-marked
Example of a document:
{content: "book", title: "The Dark Code", author: "Bruce Wayne", topic: "Science"}
For a detailed understanding of the requirement, please refer to this.
Architecture:
In this project, we will have 3 microservices: Authentication Service, Database Service and the Watermark Service. We have a PostgreSQL database server and an API-Gateway.
Authentication Service:
The application is supposed to have a role-based and user-based access control mechanism. This service will authenticate the user according to their specific role and return only HTTP status codes: 200 when the user is authorized and 401 for unauthorized users.
APIs:
/user/access, Method: GET, Secured: True, payload: user: <name> It will take the user name as input, and the auth service will return the roles and the privileges assigned to it
/authenticate, Method: GET, Secured: True, payload: user: <name>, operation: <op> It will authenticate the user, checking whether the passed operation is accessible for their role
/healthz, Method: GET, Secured: True It will return the status of the service
Database Service:
We will need databases for our application to store the user, their roles and the access privileges to that role. Also, the documents will be stored in the database without the watermark. It is a requirement that any document cannot have a watermark at the time of creation. A document is said to be created successfully only when the data inputs are valid and the database service returns the success status.
We will use two separate databases, one per service. This design is not strictly necessary; it simply follows the "single database per service" rule of microservice architecture.
APIs:
/get, Method: GET, Secured: True, payload: filters: []filter{"field-name": "value"} It will return the list of documents matching the passed filters
/update, Method: POST, Secured: True, payload: title: <id>, document: {"field": "value", …} It will update the document for the given title ID
/add, Method: POST, Secured: True, payload: document: {"field": "value", …} It will add the document and return the title ID
/remove, Method: POST, Secured: True, payload: title: <id> It will remove the document entry for the passed title ID
/healthz, Method: GET, Secured: True It will return the status of the service
Watermark Service:
This is the main service that performs the API calls to watermark a given document. Each time a user wants to watermark a document, they pass the TicketID along with the appropriate Mark in the watermark API request. The service internally calls the database Update API with the provided request and returns the status of the watermark process: initially "Started", then after some time "InProgress", and finally "Finished" if the call was valid, or "Error" if the request was not.
APIs:
/get, Method: GET, Secured: True, payload: filters: []filter{"field-name": "value"} It will return the list of documents matching the passed filters
/status, Method: GET, Secured: True, payload: ticket: <id> It will return the watermark status of the document for the passed ticket ID
/addDocument, Method: POST, Secured: True, payload: document: {"field": "value", …} It will add the document and return the title ID
/watermark, Method: POST, Secured: True, payload: title: <id>, mark: "string" It is the main watermark operation API, which accepts the mark string
/healthz, Method: GET, Secured: True It will return the status of the service
Operations and Flow:
Watermark Service APIs are the only ones that will be used by the user/actor to request watermark or add the document. Authentication and Database service APIs are the private ones that will be called by other services internally. The only URL accessible to the user is the API Gateway URL.
The user will access the API Gateway URL with the required user name, the ticket-id and the mark with which the user wants the document to apply watermark
The user should not know about the authentication or database services
Once the request is made by the user, it will be accepted by the API Gateway. The gateway will validate the request along with the payload
An API forwarding rule, which routes a specific request to the right service, should be defined in the gateway. Once validated, the request is forwarded to a service according to that rule.
We will define a forwarding rule where any watermark request is first forwarded to the authentication service, which authenticates the request, checks that the user is authorized, and returns the appropriate status code.
The authorization service will look up the requesting user, along with their roles and permissions, in the user database and send the response accordingly
Once the request has been authorized by the service, it will be forwarded back to the actual watermark service
The watermark service then performs the appropriate operation: putting the watermark on the document, adding a new document entry, or any other request
The Get, Watermark, or AddDocument operation is performed by calling the database CRUD APIs, and the result is forwarded to the user
If the request is AddDocument, the service should return the TicketID; if it is a watermark request, it should return the status of the operation
Note:
Each user will have some specific roles, based on which the access controls will be identified for the user. For the sake of simplicity, the roles will be based on the type of document only, not the specific name of the book or journal
Getting Started:
Let’s start by creating a folder for our application in the $GOPATH. This will be the root folder containing our set of services.
Project Layout:
The project will follow the standard Golang project layout. If you want the full working code, please refer here
api: Stores the versioned Swagger files for the APIs, along with the proto and pb files for the gRPC protobuf interface
cmd: This will contain the entry point (main.go) files for all the services, and for any other container images, if any
docs: This will contain the documentation for the project
config: All the sample files or any specific configuration files should be stored here
deploy: This directory will contain the deployment files used to deploy the application
internal: This package is the conventional internal package identified by the Go compiler. It contains all the packages which need to be private and imported by its child directories and immediate parent directory. All the packages from this directory are common across the project
pkg: This directory will have the complete executing code of all the services in separate packages.
tests: It will have all the integration and E2E tests
vendor: This directory stores all the third-party dependencies locally so that dependency versions don't drift later
We are going to use the Go kit framework for developing the set of services. The official Go kit examples of services are very good, though the documentation is not that great.
Watermark Service:
1. Under the Go kit framework, a service should always be represented by an interface.
Create a package named watermark in the pkg folder. Create a new service.go file in that package. This file is the blueprint of our service.
```go
package watermark

import (
	"context"

	"github.com/velotiotech/watermark-service/internal"
)

type Service interface {
	// Get the list of all documents
	Get(ctx context.Context, filters ...internal.Filter) ([]internal.Document, error)
	Status(ctx context.Context, ticketID string) (internal.Status, error)
	Watermark(ctx context.Context, ticketID, mark string) (int, error)
	AddDocument(ctx context.Context, doc *internal.Document) (string, error)
	ServiceStatus(ctx context.Context) (int, error)
}
```
2. As per the functions defined in the interface, we will need five endpoints to handle requests for the above methods. If you are wondering why we use the context package, please refer here. Contexts let the microservices handle multiple concurrent requests (carrying cancellation signals, deadlines and request-scoped values); we don't lean on it heavily in this blog, but it is the idiomatic way to write service methods.
3. Implementing our service:
```go
package watermark

import (
	"context"
	"net/http"
	"os"

	"github.com/velotiotech/watermark-service/internal"

	"github.com/go-kit/kit/log"
	"github.com/lithammer/shortuuid/v3"
)

type watermarkService struct{}

func NewService() Service { return &watermarkService{} }

func (w *watermarkService) Get(_ context.Context, filters ...internal.Filter) ([]internal.Document, error) {
	// query the database using the filters and return the list of documents
	// return error if the filter (key) is invalid and also return error if no item found
	doc := internal.Document{
		Content: "book",
		Title:   "Harry Potter and Half Blood Prince",
		Author:  "J.K. Rowling",
		Topic:   "Fiction and Magic",
	}
	return []internal.Document{doc}, nil
}

func (w *watermarkService) Status(_ context.Context, ticketID string) (internal.Status, error) {
	// query database using the ticketID and return the document info
	// return err if the ticketID is invalid or no Document exists for that ticketID
	return internal.InProgress, nil
}

func (w *watermarkService) Watermark(_ context.Context, ticketID, mark string) (int, error) {
	// update the database entry with watermark field as non-empty
	// first check if the watermark status is not already in InProgress, Started or Finished state
	// if yes, then return invalid request
	// return error if no item found using the ticketID
	return http.StatusOK, nil
}

func (w *watermarkService) AddDocument(_ context.Context, doc *internal.Document) (string, error) {
	// add the document entry in the database by calling the database service
	// return error if the doc is invalid and/or the database returns an invalid-entry error
	newTicketID := shortuuid.New()
	return newTicketID, nil
}

func (w *watermarkService) ServiceStatus(_ context.Context) (int, error) {
	logger.Log("Checking the Service health...")
	return http.StatusOK, nil
}

var logger log.Logger

func init() {
	logger = log.NewLogfmtLogger(log.NewSyncWriter(os.Stderr))
	logger = log.With(logger, "ts", log.DefaultTimestampUTC)
}
```
We have defined the new type watermarkService, an empty struct which implements the service interface defined above. This struct implementation is hidden from the rest of the world.
NewService() is created as the constructor of our “object”. This is the only function available outside this package to instantiate the service.
4. Now we will create the endpoints package which will contain two files. One is where we will store all types of requests and responses. The other file will be endpoints which will have the actual implementation of the requests parsing and calling the appropriate service function.
– Create a file named reqJSONMap.go. We will define all the requests and responses struct with the fields in this file such as GetRequest, GetResponse, StatusRequest, StatusResponse, etc. Add the necessary fields in these structs which we want to have input in a request or we want to pass the output in the response.
In this file, we have a struct Set, which is the collection of all the endpoints, along with a constructor for it. We also have internal constructor functions, such as MakeGetEndpoint() and MakeStatusEndpoint(), which return objects implementing Go kit's generic endpoint.Endpoint interface.
In order to expose the Get, Status, Watermark, ServiceStatus and AddDocument APIs, we need to create endpoints for all of them. These functions handle the incoming requests and call the specific service methods.
5. Adding the transport methods to expose the services. Our services will support HTTP, exposed as REST APIs, as well as protobuf over gRPC.
Create a separate transport package in the watermark directory. This package will hold all the handlers, decoders and encoders for each transport mechanism.
6. Create a file http.go: This file will have the transport functions and handlers for HTTP with a separate path as the API routes.
This file maps the JSON payloads to their requests and responses. It contains the HTTP handler constructor, which registers the API routes to the specific handler functions (endpoints), along with the decoders and encoders of the requests and responses, wrapped into a server object per request. The decoders and encoders exist simply to translate requests and responses into the form needed for processing. In our case, we just convert the requests/responses into the appropriate request and response structs using the JSON encoder and decoder.
We have the generic encoder for the response output, which is a simple JSON encoder.
7. Create another file in the same transport package with the name grpc.go. As with the above, the name of the file is self-explanatory: it maps the protobuf payloads to their requests and responses. We create a gRPC handler constructor which creates the set of gRPC servers and registers each endpoint with the decoders and encoders for its requests and responses.
– Before moving on to the implementation, we have to create a proto file that serves as the definition of our service interface and its request/response structs, so that the protobuf files (.pb) can be generated and used as the interface over which the services communicate.
– Create package pb in the api/v1 package path. Create a new file watermarksvc.proto. Firstly, we will create our service interface, which represents the remote functions to be called by the client. Refer to this for syntax and deep understanding of the protobuf.
We mirror the Go service interface as a service definition in the proto file. We also recreate the request and response structs, exactly as before, in the proto file so that they can be understood by the RPCs defined in the service.
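A shortened sketch of what watermarksvc.proto might look like. The service name Watermark follows from pb.RegisterWatermarkServer used later; the message and field names are assumptions, and the remaining RPC message pairs would follow the same pattern.

```protobuf
syntax = "proto3";

package pb;

// Watermark mirrors the Go Service interface as remote procedures.
service Watermark {
  rpc Get(GetRequest) returns (GetReply) {}
  rpc Status(StatusRequest) returns (StatusReply) {}
  rpc Watermark(WatermarkRequest) returns (WatermarkReply) {}
  rpc AddDocument(AddDocumentRequest) returns (AddDocumentReply) {}
  rpc ServiceStatus(ServiceStatusRequest) returns (ServiceStatusReply) {}
}

message Document {
  string content = 1;
  string title = 2;
  string author = 3;
  string topic = 4;
  string watermark = 5;
}

message GetRequest { repeated string filters = 1; }
message GetReply {
  repeated Document documents = 1;
  string err = 2;
}

message StatusRequest { string ticketID = 1; }
message StatusReply {
  string status = 1;
  string err = 2;
}
```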
Note: Creating the proto files and generating the pb files using protoc is not the scope of this blog. We have assumed that you already know how to create a proto file and generate a pb file from it. If not, please refer protobuf and protoc gen
I have also created a script to generate the pb file, which just needs the path with the name of the proto file.
```sh
#!/usr/bin/env sh
# Install proto3 from source
#   brew install autoconf automake libtool
#   git clone https://github.com/google/protobuf
#   ./autogen.sh ; ./configure ; make ; make install
#
# Update protoc Go bindings via
#   go get -u github.com/golang/protobuf/{proto,protoc-gen-go}
#
# See also
#   https://github.com/grpc/grpc-go/tree/master/examples

REPO_ROOT="${REPO_ROOT:-$(cd "$(dirname "$0")/../.." && pwd)}"
PB_PATH="${REPO_ROOT}/api/v1/pb"
PROTO_FILE=${1:-"watermarksvc.proto"}

echo "Generating pb files for ${PROTO_FILE} service"
protoc -I="${PB_PATH}" "${PB_PATH}/${PROTO_FILE}" --go_out=plugins=grpc:"${PB_PATH}"
```
8. Now, once the pb file is generated in the api/v1/pb/watermark package, we will create a new struct, grpcServer, grouping all the endpoints for gRPC. This struct should implement pb.WatermarkServer, the server interface referenced by the services.
To implement this interface, we define functions such as func (g *grpcServer) Get(ctx context.Context, r *pb.GetRequest) (*pb.GetReply, error). Each function takes the request param, runs the ServeGRPC() function, and returns the response. We implement the remaining functions in the same way.
These functions are the actual Remote Procedures to be called by the service.
We will also need to add the decode and encode functions for the request and response structs from protobuf structs. These functions will map the proto Request/Response struct to the endpoint req/resp structs. For example: func decodeGRPCGetRequest(_ context.Context, grpcReq interface{}) (interface{}, error). This will assert the grpcReq to pb.GetRequest and use its fields to fill the new struct of type endpoints.GetRequest{}. The decoding and encoding functions should be implemented similarly for the other requests and responses.
9. Finally, we just have to create the entry point (main) files in cmd for each service. We have already mapped the appropriate routes to the endpoints that call the service functions, and mapped the proto service server to the endpoints via the ServeGRPC() functions; now we have to call the HTTP and gRPC server constructors here and start them.
Create a package watermark in the cmd directory and create a file watermark.go which will hold the code to start and stop the HTTP and gRPC server for the service
```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"

	pb "github.com/velotiotech/watermark-service/api/v1/pb/watermark"
	"github.com/velotiotech/watermark-service/pkg/watermark"
	"github.com/velotiotech/watermark-service/pkg/watermark/endpoints"
	"github.com/velotiotech/watermark-service/pkg/watermark/transport"

	"github.com/go-kit/kit/log"
	kitgrpc "github.com/go-kit/kit/transport/grpc"
	"github.com/oklog/oklog/pkg/group"
	"google.golang.org/grpc"
)

const (
	defaultHTTPPort = "8081"
	defaultGRPCPort = "8082"
)

func main() {
	var (
		logger   log.Logger
		httpAddr = net.JoinHostPort("localhost", envString("HTTP_PORT", defaultHTTPPort))
		grpcAddr = net.JoinHostPort("localhost", envString("GRPC_PORT", defaultGRPCPort))
	)

	logger = log.NewLogfmtLogger(log.NewSyncWriter(os.Stderr))
	logger = log.With(logger, "ts", log.DefaultTimestampUTC)

	var (
		service     = watermark.NewService()
		eps         = endpoints.NewEndpointSet(service)
		httpHandler = transport.NewHTTPHandler(eps)
		grpcServer  = transport.NewGRPCServer(eps)
	)

	var g group.Group
	{
		// The HTTP listener mounts the Go kit HTTP handler we created.
		httpListener, err := net.Listen("tcp", httpAddr)
		if err != nil {
			logger.Log("transport", "HTTP", "during", "Listen", "err", err)
			os.Exit(1)
		}
		g.Add(func() error {
			logger.Log("transport", "HTTP", "addr", httpAddr)
			return http.Serve(httpListener, httpHandler)
		}, func(error) {
			httpListener.Close()
		})
	}
	{
		// The gRPC listener mounts the Go kit gRPC server we created.
		grpcListener, err := net.Listen("tcp", grpcAddr)
		if err != nil {
			logger.Log("transport", "gRPC", "during", "Listen", "err", err)
			os.Exit(1)
		}
		g.Add(func() error {
			logger.Log("transport", "gRPC", "addr", grpcAddr)
			// We add the Go kit gRPC Interceptor to our gRPC service as it is
			// used by the zipkin tracing middleware demonstrated here.
			baseServer := grpc.NewServer(grpc.UnaryInterceptor(kitgrpc.Interceptor))
			pb.RegisterWatermarkServer(baseServer, grpcServer)
			return baseServer.Serve(grpcListener)
		}, func(error) {
			grpcListener.Close()
		})
	}
	{
		// This function just sits and waits for ctrl-C.
		cancelInterrupt := make(chan struct{})
		g.Add(func() error {
			c := make(chan os.Signal, 1)
			signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)
			select {
			case sig := <-c:
				return fmt.Errorf("received signal %s", sig)
			case <-cancelInterrupt:
				return nil
			}
		}, func(error) {
			close(cancelInterrupt)
		})
	}
	logger.Log("exit", g.Run())
}

func envString(env, fallback string) string {
	e := os.Getenv(env)
	if e == "" {
		return fallback
	}
	return e
}
```
Let me walk you through the above code. First, we use fixed default ports for the servers to listen on: 8081 for the HTTP server and 8082 for the gRPC server. Then, in these code stubs, we create the HTTP and gRPC servers, the endpoints of the service backend, and the service itself.
```go
service     = watermark.NewService()
eps         = endpoints.NewEndpointSet(service)
grpcServer  = transport.NewGRPCServer(eps)
httpHandler = transport.NewHTTPHandler(eps)
```
Now the next step is interesting. We create a variable of type group.Group from oklog. If you are new to this term, please refer here. Group helps you elegantly manage a group of goroutines. We create three goroutines: one for the HTTP server, one for the gRPC server, and the last one for watching for cancel interrupts, as in the listener blocks of main.go above.
With the gRPC server and the cancel-interrupt watcher started the same way, we are done here. Now, let's run the service:
go run ./cmd/watermark/watermark.go
The server has started locally. Now, just open Postman or run curl against one of the endpoints. For example, we hit the HTTP server to check the service status.
We have successfully created a service and exercised its endpoints.
Further:
I always like to round out a project with the maintenance pieces that surround the code: a proper README, .gitignore, .dockerignore, Makefile, Dockerfiles, golangci-lint config files, CI/CD config files, etc.
I have created a separate Dockerfile for each of the three services in path /images/.
I have created a multi-stage Dockerfile that builds the binary of the service and runs it. We copy the appropriate code directories into the image, build the binary in one stage, then create a new image in the same file and copy the binary into it from the previous stage. The Dockerfiles for the other services are created similarly.
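A multi-stage Dockerfile along those lines might look like this sketch. The Go version, base images and paths are assumptions; the repo's actual files may differ.

```dockerfile
# Stage 1: build a static binary from the copied source
FROM golang:1.14-alpine AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/watermark ./cmd/watermark

# Stage 2: copy only the binary into a minimal runtime image
FROM alpine:3.12
COPY --from=builder /bin/watermark /bin/watermark
EXPOSE 8081 8082
CMD ["/bin/watermark"]
```

The payoff of the second stage is image size: the final image carries only the binary, not the Go toolchain or the source tree.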
In the Dockerfile, we have set the CMD to go run the watermark service; this command is the entry point of the container. I have also created a Makefile with two main targets: build-image, to build the image, and build-push, to push it.
Note: I am keeping this blog concise as it is difficult to cover all the things. The code in the repo that I have shared in the beginning covers most of the important concepts around services. I am still working and continue committing improvements and features.
Let’s see how we can deploy:
We will see how to deploy all these services on a container orchestration platform (e.g., Kubernetes). I am assuming you have worked with Kubernetes before, with at least a beginner's understanding.
In deploy dir, create a sample deployment having three containers: auth, watermark and database. Since for each container, the entry point commands are already defined in the dockerfiles, we don’t need to send any args or cmd in the deployment.
We will also need a Service to route external request traffic, via a LoadBalancer or NodePort type service. To make it work for now, we can create a NodePort type service to expose the watermark-service and get it running.
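A NodePort Service along these lines would do; the names, labels and node port are assumptions and must match your deployment's pod labels.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: watermark-service
spec:
  type: NodePort
  selector:
    app: watermark        # must match the pod labels in the deployment
  ports:
    - name: http
      port: 8081
      targetPort: 8081
      nodePort: 30081     # reachable on <node-ip>:30081
```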
Another important and very interesting part is deploying the API Gateway. This requires at least some knowledge of a cloud provider's stack. I have used the Azure stack to deploy an API Gateway using the resource called “API Management” in the Azure portal. Refer to the rules config files for the Azure APIM API gateway in the repo.
Further, only a proper CI/CD setup remains, which is one of the most essential parts of a project after development. I would definitely like to discuss all the above deployment-related material in more detail, but that is out of the scope of my current blog. Maybe I will write another post for it.
Wrapping up:
We have learned how to build a complete project with three microservices in Golang using one of the best distributed-system development frameworks: Go kit. We have also used the PostgreSQL database via GORM, an ORM used heavily in the Go community. We did not stop at development: we also covered, at a high level, the project's deployment lifecycle, i.e., what to deploy, and how and where to deploy it.
We created one microservice completely from scratch. Go kit makes it very simple to write the relationship between endpoints, service implementations and the communication/transport mechanisms. Now, go and try to create other services from the problem statement.
Being an avid Google Photos user, I really love some of its features, such as album, face search, and unlimited storage. However, when Google announced the end of unlimited storage on June 1st, 2021, I started thinking about how I could create a cheaper solution that would meet my photo backup requirement.
“Taking an image, freezing a moment, reveals how rich reality truly is.”
– Anonymous
Google offers 100 GB of storage for 130 INR. This storage can be used across various Google applications. However, I don't use all the space in one go. I snap photos randomly: sometimes I visit places and take random snaps with my DSLR and smartphone. In general, I upload approximately 200 photos monthly, each in the range of 4MB to 30MB. On average, that comes to about 4GB of monthly storage; I also back up raw photos, even the bad ones, on my external hard drive. Photos backed up on the cloud should be visually high-quality, and it's good to have a raw copy available at the same time, so that you can make some Lightroom edits (although I never touch them 😛). So, here is my minimal requirement:
Should support social authentication (Google sign-in preferred).
Photos should be stored securely in raw format.
Storage should be scaled with usage.
Uploading and downloading photos should be easy.
Web view for preview would be a plus.
Should have almost no operations headache and solution should be as cheap as possible 😉.
Selecting Tech Stack
To avoid the operational headaches of servers going down, scaling, application crashes, and overall monitoring, I opted for a serverless solution on AWS. AWS S3 is infinitely scalable storage, and you only pay for the amount of storage you use. On top of that, you can opt for an S3 storage class that is efficient and cost-effective.
Infrastructure Stack:
1. AWS API Gateway (HTTP API)
2. AWS Lambda (for processing images and API gateway queries)
3. DynamoDB (for storing image metadata)
4. AWS Cognito (for authentication)
5. AWS S3 Bucket (for storage and web application hosting)
6. AWS Certificate Manager (to use an SSL certificate for a custom domain with API gateway)
We will create three S3 buckets. The first one hosts the frontend application (refer to the architecture diagram; more on this later in the build and hosting part). The second one is for temporarily uploading images. The third one is for the actual backup and storage (enable server-side encryption on this bucket). Images uploaded to the temporary bucket will be pre-processed from there.
During pre-processing, we will resize the original image into two different sizes. One is for thumbnail purposes (400px width), another one is for viewing purposes, but with reduced quality (webp format). Once images are resized, upload all three (raw, thumbnail, and webview) to the third S3 bucket and create a record in dynamodb. Set up object expiry policy on the temporary bucket for 1 day. This way, uploaded objects are automatically deleted from the temporary bucket.
Setup trigger on the temporary bucket for uploaded images:
We will need to set up an S3 PUT event, which will trigger our Lambda function to download and process images. We filter on the suffixes .jpg and .jpeg for the event trigger, meaning that any file with extension .jpg or .jpeg uploaded to our temporary bucket will automatically invoke the lambda function with the event payload. Using that payload, the lambda function downloads the uploaded file and processes it. Your serverless function definition would look like:
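A sketch of that function definition in serverless.yml; the handler path and bucket name are hypothetical placeholders. Serverless Framework rules allow only one suffix per event, hence the two entries.

```yaml
functions:
  processImage:
    handler: src/processImage.handler    # hypothetical handler path
    events:
      - s3:
          bucket: my-temp-upload-bucket  # hypothetical bucket name
          event: s3:ObjectCreated:Put
          rules:
            - suffix: .jpg
          existing: true                 # don't (re)create the bucket on deploy
      - s3:
          bucket: my-temp-upload-bucket
          event: s3:ObjectCreated:Put
          rules:
            - suffix: .jpeg
          existing: true
```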
Notice that in the YAML events section, we set existing: true. This ensures that the bucket will not be created during the serverless deployment. However, if you do not plan to create your S3 bucket manually, you can let the framework create the bucket for you.
DynamoDB as the metadata DB:
AWS DynamoDB is a key-value document DB that suits our use case: it will help us retrieve the list of photos in time-series order. DynamoDB uses a primary key to uniquely identify each record. A primary key is composed of a hash key and an optional range key (also called a sort key). We will use the federated identity ID (discussed in the authorization setup) as the hash key (partition key), naming the attribute username with type string. We will use a timestamp attribute as the range key, with type number; the range key lets us query results as a time series (Unix epoch). We could also use DynamoDB secondary indexes to sort results more specifically; however, to keep the application simple, we're going to opt out of this feature for now. Your serverless resource definition would look like:
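A sketch of the table resource (the table name and billing mode are assumptions; the key schema follows the description above):

```yaml
resources:
  Resources:
    PhotosTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: photos-metadata        # hypothetical table name
        AttributeDefinitions:
          - AttributeName: username       # federated identity ID (hash key)
            AttributeType: S
          - AttributeName: timestamp      # Unix epoch (range key)
            AttributeType: N
        KeySchema:
          - AttributeName: username
            KeyType: HASH
          - AttributeName: timestamp
            KeyType: RANGE
        BillingMode: PAY_PER_REQUEST      # pay only for what you use
```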
Finally, you also need to set up an IAM role so that the process-image lambda function has access to the S3 buckets and DynamoDB. Here is the serverless definition for the IAM role.
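A sketch of those IAM statements; bucket and table names are the same hypothetical placeholders as above, and depending on your Serverless Framework version this lives under provider.iamRoleStatements (v1/v2) or provider.iam.role.statements (v3).

```yaml
provider:
  name: aws
  runtime: nodejs14.x
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObject
      Resource:
        - arn:aws:s3:::my-temp-upload-bucket/*
        - arn:aws:s3:::my-photo-storage-bucket/*
    - Effect: Allow
      Action:
        - dynamodb:PutItem
        - dynamodb:Query
      Resource:
        - arn:aws:dynamodb:*:*:table/photos-metadata
```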
Okay, to set up a Cognito user pool, head to the Cognito console and create a user pool with below config:
1. Pool Name: photobucket-users
2. How do you want your end-users to sign in?
Select: Email Address or Phone Number
Select: Allow Email Addresses
Check: (Recommended) Enable case insensitivity for username input
3. Which standard attributes are required?
email
4. Keep the defaults for “Policies”
5. MFA and Verification:
I opted to manually reset the password for each user (since this is an internal app)
Disabled user verification
6. Keep the default for Message Customizations, tags, and devices.
7. App Clients :
App client name: myappclient
Let the refresh token, access token, and id token be default
Check all “Auth flow configurations”
Check enable token revocation
8. Skip Triggers
9. Review and create the pool
Once created, go to App integration -> Domain name. Create a Cognito subdomain of your choice and note it down. Next, I plan to use the Google sign-in feature with Cognito Federated Identity Providers. Use this guide to set up a Google social identity with Cognito.
Setup Authorization:
Once the user identity is verified, we need to allow them to access the s3 bucket with limited permissions. Head to the Cognito console, select federated identities, and create a new identity pool. Follow these steps to configure:
1. Identity pool name: photobucket_auth
2. Keep Unauthenticated and Authentication flow settings unchecked.
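The policy attached to the identity pool's authenticated role might look like the sketch below (the bucket name and the exact action list are assumptions); it is what the following explanation refers to.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::my-photo-storage-bucket/${cognito-identity.amazonaws.com:sub}/*"
      ]
    }
  ]
}
```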
${cognito-identity.amazonaws.com:sub} is a special AWS policy variable. When a user is authenticated with a federated identity, each user is assigned a unique identity ID. What the above policy means is that any authenticated user has access only to objects prefixed by their own identity ID. This is how we limit each user's authorization to their own area within the S3 bucket.
Copy the Identity Pool ID (from sample code section). You will need this in your backend to get the identity id of the authenticated user via JWT token.
Amplify configuration for the frontend UI sign-in:
This object helps you set up the minimal configuration for your application. This is all that we need to sign in via Cognito and access the S3 photo bucket.
```javascript
const awsconfig = {
  Auth: {
    identityPoolId: "identity pool id created during authorization setup",
    region: "your aws region",
    identityPoolRegion: "same as above if cognito is in same region",
    userPoolId: "cognito user pool id created during authentication setup",
    userPoolWebClientId: "cognito app client id",
    cookieStorage: {
      domain: "https://your-app-domain-name", // this is very important
      secure: true,
    },
    oauth: {
      domain: "{cognito domain name}.auth.{cognito region name}.amazoncognito.com",
      scope: ["profile", "email", "openid"],
      redirectSignIn: "https://your-app-domain-name",
      redirectSignOut: "https://your-app-domain-name",
      responseType: "token",
    },
  },
  Storage: {
    AWSS3: {
      bucket: "your-actual-bucket-name",
      region: "region-of-your-bucket",
    },
  },
};

export default awsconfig;
```
You can then use the below code to configure and sign in via social authentication.
```javascript
import Amplify, { Auth } from 'aws-amplify';
import awsconfig from './aws-config';

Amplify.configure(awsconfig);

// Once Amplify is configured, you can use the call below with the onClick
// event of a button or any other visual component to sign in.
// Example:
<Button
  startIcon={<img alt="Sign in With Google" src={logo} />}
  fullWidth
  variant="outlined"
  color="primary"
  onClick={() => Auth.federatedSignIn({ provider: 'Google' })}
>
  Sign in with Google
</Button>
```
Gallery View:
When the application is loaded, we use the PhotoGallery component to load photos and view thumbnails on the page. The PhotoGallery component is a wrapper around the InfiniteScroller component, which keeps loading images as the user scrolls. The idea here is that we query a maximum of 10 images in one go. Our backend returns a list of 10 images (just the mapping and metadata pointing into the S3 bucket). We must load these images from the S3 bucket and then show thumbnails on-screen as a gallery view. When the user reaches the bottom of the screen, or there is empty space left, the InfiniteScroller component loads 10 more images. This continues until our backend replies with a stop marker.
The key point here is that we need to send the JWT token as a header to our backend service via an ajax call. The JWT token is obtained after sign-in from the Amplify framework. An example of obtaining a JWT token:
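A sketch of this flow. The Amplify call that yields the token is Auth.currentSession() (from 'aws-amplify'); since that needs a configured Amplify app, it is shown in a comment here, and only the header-building part is a standalone function. The endpoint path and header shape are assumptions.

```javascript
// Build the request headers for our backend, given a JWT id token.
function authHeaders(jwtToken) {
  return {
    Authorization: jwtToken,
    'Content-Type': 'application/json',
  };
}

// In the app (assumes Amplify is configured as shown above):
//   const session = await Auth.currentSession();
//   const res = await fetch('/api/photos?limit=10', {
//     headers: authHeaders(session.getIdToken().getJwtToken()),
//   });

console.log(authHeaders('example-token').Authorization);
```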
An example of infinite scroller component usage is given below. Note that “gallery” is a JSX-composed array of photo thumbnails. The “loadMore” method calls our ajax function to the server-side backend, updates the “gallery” variable, and sets the “hasMore” variable to true/false so that the infinite scroller component can stop querying when there are no photos left to display on the screen.
The Lightbox component gives a zoom effect to the thumbnail. When the thumbnail is clicked, a higher resolution picture (webp version) is downloaded from the S3 bucket and shown on the screen. We use a storage object from the Amplify library. Downloaded content is a blob and must be converted into image data. To do so, we use the javascript native method, createObjectURL. Below is the sample code that downloads the object from the s3 bucket and then converts it into a viewable image for the HTML IMG tag.
The S3 SDK lets you generate a pre-signed POST URL. Anyone who gets this URL will be able to upload objects to the S3 bucket directly without needing credentials. Of course, we can actually set up some boundaries, like a max object size, key of the uploaded object, etc. Refer to this AWS blog for more on pre-signed URLs. Here is the sample code to generate a pre-signed URL.
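A sketch of the pre-signed POST generation. The params-building is shown as a pure function; the actual signing call (s3.createPresignedPost, from the AWS SDK for JavaScript v2) needs real credentials, so it appears only in a comment. Bucket name, expiry, and size cap are assumptions.

```javascript
// Build the params for S3's createPresignedPost: the key the client must use
// and a condition capping the uploaded object's size.
function buildPresignedPostParams(bucket, key, maxBytes) {
  return {
    Bucket: bucket,
    Fields: { key },
    Expires: 300, // URL validity in seconds
    Conditions: [
      ['content-length-range', 0, maxBytes], // reject oversized uploads
    ],
  };
}

// In the backend lambda (assumes aws-sdk v2):
//   const AWS = require('aws-sdk');
//   const s3 = new AWS.S3();
//   s3.createPresignedPost(
//     buildPresignedPostParams('my-temp-upload-bucket', key, 30 * 1024 * 1024),
//     (err, data) => { /* return data.url and data.fields to the client */ }
//   );

console.log(buildPresignedPostParams('b', 'k', 100).Conditions[0][2]);
```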
For a better UX, we can allow our users to upload more than one photo at a time. However, a pre-signed URL lets you upload a single object at a time. To overcome this, we generate multiple pre-signed URLs. Initially, we send a request to our backend asking to upload photos with expected keys. This request is originated once the user selects photos to upload. Our backend then generates pre-signed URLs for us. Our frontend React app then provides the illusion that all photos are being uploaded as a whole.
When the upload is successful, the S3 PUT event is triggered, which we discussed earlier. The complete flow of the application is given in a sequence diagram. You can find the complete source code here in my GitHub repository.
React Build Steps and Hosting:
The ideal way to build the React app is to execute npm run build. However, we take a slightly different approach. We are not using an S3 static website for serving the frontend UI, for one reason: S3 static websites are non-SSL unless we use CloudFront. Therefore, we will make the API gateway our application's entry point, so the UI will also be served from the API gateway. However, we want to reduce the calls made to the API gateway. For this reason, we will only deliver the index.html file with the help of API gateway/Lambda, and serve the rest of the static files (React supporting JS files) from the S3 bucket.
Your index.html should have all its reference paths pointed at the S3 bucket. The build must explicitly specify that static files are located somewhere other than relative to the index.html file. Your S3 bucket needs to be public, with the right bucket policy and CORS set so that end-users can only retrieve files and not upload nasty objects. Those who are confused about how an S3 static website and an S3 public bucket differ may refer here. Below are the React build steps, bucket policy, and CORS.
```shell
PUBLIC_URL=https://{your-static-bucket-name}.s3.{aws_region}.amazonaws.com/ npm run build
```

Bucket policy:

```json
{
  "Version": "2012-10-17",
  "Id": "http referer from your domain only",
  "Statement": [
    {
      "Sid": "Allow get requests originating from",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::{your-static-bucket-name}/static/*",
      "Condition": {
        "StringLike": {
          "aws:Referer": ["https://your-app-domain-name"]
        }
      }
    }
  ]
}
```

CORS:

```json
[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET"],
    "AllowedOrigins": ["https://your-app-domain-name"],
    "ExposeHeaders": []
  }
]
```
Once a build is complete, upload index.html to a lambda that serves your UI. Run the below shell commands to compress static contents and host them on our static S3 bucket.
```sh
# assuming you are in your react app directory
mkdir /tmp/s3uploads
cp -ar build/static /tmp/s3uploads/
cd /tmp/s3uploads

# add gzip encoding to all the files
gzip -9 `find ./ -type f`

# remove .gz extension from compressed files
for i in `find ./ -type f`
do
  mv $i ${i%.*}
done

# sync your files to the s3 static bucket and mention that these files are
# compressed with gzip encoding so that the browser will not treat them as
# regular files
aws s3 --region $AWSREGION sync . s3://${S3_STATIC_BUCKET}/static/ --content-encoding gzip --delete --sse

cd -
rm -rf /tmp/s3uploads
```
Our backend uses nodejs express framework. Since this is a serverless application, we need to wrap express with a serverless-http framework to work with lambda. Sample source code is given below, along with serverless framework resource definition. Notice that, except for the UI home endpoint ( “/” ), the rest of the API endpoints are authenticated with Cognito on the API gateway itself.
Lastly, we will set up a custom domain so that we don't need to use the gibberish domain name generated by the API gateway, along with a certificate for our custom domain. You don't need to use Route 53 for this part: if you have an existing domain, you can create a subdomain and point it to the API gateway. First things first: head to the AWS ACM console and generate a certificate for the domain name. Once the request is generated, you need to validate your domain by creating a TXT record as per the ACM console. ACM is a free service. Domain verification may take a few minutes to several hours. Once you have the certificate ready, head back to the API gateway console. Navigate to “custom domain names” and click create.
Enter your application domain name
Check TLS 1.2 as TLS version
Select Endpoint type as Regional
Select ACM certificate from dropdown list
Create domain name
Select the newly created custom domain and note the API gateway domain name from the Domain Details -> Configuration tab; you will need it to create a CNAME/ALIAS record with your DNS provider. Then click the API mappings tab, click configure API mappings, select your API gateway from the dropdown, set the stage to default, and click save. You are done here.
Future Scope and Improvements
To improve application latency, we can use CloudFront as a CDN. That way, our entry point could be S3, and we would no longer need the API gateway regional endpoint. We can add AWS WAF in front of our API gateway for added security, inspecting incoming requests and payloads. We can also use DynamoDB secondary indexes to search metadata in the table efficiently. Finally, a lifecycle rule can transition raw photos that have not been accessed for more than a year to the S3 Glacier storage class, and a further transition to Glacier Deep Archive can save even more on storage costs.
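As a sketch, such a lifecycle configuration could look like the following. The `raw/` prefix and the day thresholds are assumptions for illustration; the object printed here is the JSON document you would pass to `aws s3api put-bucket-lifecycle-configuration`.

```javascript
// Hypothetical S3 lifecycle configuration for a raw-photos prefix.
// Prefix and day thresholds are illustrative assumptions.
const lifecycle = {
  Rules: [
    {
      ID: 'archive-raw-photos',
      Status: 'Enabled',
      Filter: { Prefix: 'raw/' },
      Transitions: [
        { Days: 365, StorageClass: 'GLACIER' },      // after ~1 year -> Glacier
        { Days: 730, StorageClass: 'DEEP_ARCHIVE' }, // after ~2 years -> Deep Archive
      ],
    },
  ],
};

console.log(JSON.stringify(lifecycle, null, 2));
```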
Bots are the new black! The entire tech industry seems to be buzzing with "bot" fever. My co-founders and I often come across a "bot" company and discuss its business model. Chirag Jog has always been enthusiastic about the bot wave, while I have been mostly pessimistic, especially about B2C bots. We should consider that there are many types of "bots": chat bots, voice bots, AI assistants, robotic process automation (RPA) bots, conversational agents within apps or websites, etc.
Over the last year, we have been building some interesting chat- and voice-based bots, and that work has given me some interesting insights. I hope to lay out my thoughts on bots in some detail and with some structure.
What are bots?
Bots are software programs that automate tasks humans would otherwise do themselves. Bots are developed using machine learning software and are expected to aggregate data to make the interface more intelligent and intuitive. There have always been simple rule-based bots that provide a very specific service with low utility. In the last couple of years, we are seeing the emergence of intelligent bots that can serve more complex use-cases.
Why now?
Machine learning, NLP, and AI technologies have matured, enabling practical applications where bots can actually do intelligent work more than 75% of the time. Has general AI been solved? No. But is it good enough to do the simple things well and give hope for more complex things? Yes.
Secondly, there are billions of DAUs on Whatsapp & Facebook Messenger. There are tens of millions of users on enterprise messaging platforms like Slack, Skype & Microsoft Teams. Startups and enterprises want to use this distribution channel and will continue to experiment aggressively to find relevant use-cases. Millennials are very comfortable using the chat and voice interfaces for a broader variety of use-cases since they used chat services as soon as they came online. As millennials become a growing part of the workforce, the adoption of bots may increase.
Thirdly, software is becoming more prevalent and more complex. Data is exploding and making sense of this data is getting harder and requiring more skill. Companies are experimenting with bots to provide an “easy to consume” interface to casual users. So non-experts can use the bot interface while experts can use the mobile or web application for the complex workflows. This is mostly true for B2B & enterprise. A good example is how Slack has become the system of engagement for many companies (including at @velotiotech). We require all the software we use (Gitlab, Asana, Jira, Google Docs, Zoho, Marketo, Zendesk, etc.) to provide notifications into Slack. Over time, we expect to start querying the respective Slack bots for information. Only domain experts will log into the actual SaaS applications.
Types of Bots
B2C Chat-Bots
Consumer focused bots use popular messaging and social platforms like Facebook, Telegram, Kik, WeChat, etc. Some examples of consumer bots include weather, e-commerce, travel bookings, personal finance, fitness, news. These are mostly inspired by WeChat which owns the China market and is the default gateway to various internet services. These bots show up as “contacts” in these messenger platforms.
Strategically, the B2C bots are basically trying to get around the distribution monopoly of Apple & Google Android. As many studies have indicated, getting mobile users to install apps is getting extremely hard. Facebook, Skype, Telegram hope to become the system of engagement and distribution for various apps thereby becoming an alternate “App Store” or “Bot Store”.
I believe that SMS is a great channel for basic chatbot functionality. Chatbots with an SMS interface can be used by all age groups and in remote parts of the world where data infrastructure is lacking. I expect to see some interesting companies use SMS chatbots to build new business models. Mobile bots that sniff or integrate with as many of your mobile apps as possible, to provide cross-platform and cross-app "intelligence", will also succeed; Google Now is a good example.
An often-cited example is the DoNotPay chatbot, which helps people contest parking tickets in the UK. In my opinion, the novelty is in the service and its efficiency, not in the chatbot interface as such. Also, I have not met anyone who uses a B2C chatbot even on a weekly or monthly basis.
B2B Bots
Enterprise bots are available through platforms and interfaces like Slack, Skype, Microsoft Teams, website chat windows, email assistants, etc. They are focused on collaboration, replacing/augmenting emails, information assistants, support, and speeding up decision-making/communications.
Most of the enterprise bots solve niche and specific problems. This is a great advantage considering the current state of AI/ML technologies. Many of these enterprise bot companies are also able to augment their intelligence with human agents thereby providing better experiences to users.
Some of the interesting bots and services in the enterprise space include:
x.ai and Clara Labs provide a virtual assistant to help you set up and manage your meetings.
Gong.io and Chorus provide a bot that listens in on sales calls and uses voice-to-text and other machine learning algorithms to help your sales teams get better and close more deals.
Astro is building an AI assisted email app which will have multiple interfaces including voice (Echo).
Twyla is helping make chatbots on websites more intelligent using ML. It integrates with your existing ZenDesk, LivePerson, or Salesforce support.
Clarke.ai is a bot which uses AI to take notes for your meeting so you can focus better.
Smacc provides AI assisted automated book-keeping for SMBs.
Slack is one of the fastest growing SaaS companies and has the most popular enterprise bot store. Slack bots are great for pushing and pulling information & data. All SaaS services and apps should have bots that can emit useful updates, charts, data, links, etc to a specific set of users. This is much better than sending emails to an email group. Simple decisions can be taken within a chat interface using something like Slack Buttons. Instead of receiving an email and opening a web page, most people would prefer approving a leave or an expense right within Slack. Slack/Skype/etc will add the ability to embed “cards” or “webviews” or “interactive sections” within chats. This will enable some more complex use-cases to be served via bots. Most enterprise services have Slack bots and are allowing Slack to be a basic system of engagement.
Chatbots, or even voice-based bots, on websites will be a big deal. Imagine that each website has a virtual support rep or sales rep available to you 24x7 in the most popular languages. All businesses would want such "agents" or "bots" for greater sales conversions and better support.
Automation of back-office tasks can be a HUGE business. KPOs and BPOs are a huge market, so if you can build software or software-enabled processes to reduce costs, you can build a significant-sized company. Some interesting examples here are Automation Anywhere and WorkFusion.
Voice based Bots
Amazon had a surprise hit in the consumer electronics space with its Amazon Echo device, a voice-based assistant. Google recently released its own voice-enabled apps to compete with Echo/Alexa. Voice assistants provide weather, music, search, and e-commerce ordering via an NLP voice interface. Apple's Siri should have been leading this market, but as usual, Apple is following rather than leading.
Voice bots have one great advantage: with the miniaturization of devices (Apple Watch, AirPods, smaller wearables), the only practical interface is voice. The other option is pairing the device with your mobile phone, which is not a smooth and intuitive process. Echo is already a great device for listening to music with its Spotify integration; this feature alone is enough of a reason for most families to buy it.
Conclusion
Bots are useful and here to stay. I am not sure about the form or the distribution channel through which bots will become prevalent. In my opinion, bots are an additional interface to intelligence and application workflows. They are not disrupting any process or industry. Consumers will not shop more due to chat or voice interface bots, employees will not collaborate as desired due to bots, information discovery within your company will not improve due to bots. Actually, existing software and SaaS services are getting more intelligent, predictive and prescriptive. So this move towards “intelligent interfaces” is the real disruption.
So my concluding predictions:
B2C chatbots will turn out to be mostly hype and very few practical scalable use-cases will emerge.
Voice bots will see increasing adoption due to smaller device sizes. IoT, wearables and music are excellent use-cases for voice based interfaces. Amazon’s Alexa will become the dominant platform for voice controlled apps and devices. Google and Microsoft will invest aggressively to take on Alexa.
B2B bots can be intelligent interfaces on software platforms and SaaS products. Or they can be agents that solve very specific vertical use-cases. I am most bullish about these enterprise focused bots which are helping enterprises become more productive or to increase efficiency with intelligent assistants for specific job functions.
If you’d like to chat about anything related to this article, what tools we use to build bots, or anything else, get in touch.
Bots are the flavor of the season. Every day, we hear about a new bot being launched, catering to domains like travel, social, legal, support, and sales. Facebook Messenger alone had more than 11,000 bots when I last checked, and has probably added thousands more as I write this article.
The first generation of bots were dumb: they could understand only a limited set of queries based on keywords in the conversation. But the commoditization of NLP (Natural Language Processing) and machine learning by services like Wit.ai, API.ai, Luis.ai, Amazon Lex, IBM Watson, etc. has resulted in the growth of intelligent bots like DoNotPay and chatShopper. I don't know if bots are just hype or the real deal, but I can say with certainty that building a bot is fun and challenging at the same time. In this article, I would like to introduce you to some of the tools to build an intelligent chatbot.
As the title suggests, we used Botkit and Rasa (NLU) to build our bot. Before getting into the technicalities, I would like to share the reasons for choosing these two platforms and how they fit our use case. Also read: How to build a serverless chatbot with Amazon Lex.
Bot development framework: Howdy's Botkit and the Microsoft (MS) Bot Framework were good contenders. Both frameworks:
- are open source
- have integrations with popular messaging platforms like Slack, Facebook Messenger, and Twilio
- have good documentation
- have an active developer community
Due to compliance issues, we had chosen AWS to deploy all our services and we wanted the same with the bot as well.
NLU (Natural Language Understanding): API.ai (acquired by Google) and Wit.ai (acquired by Facebook) are two popular NLU tools in the bot industry, which we first considered for this task. Both solutions:
- are hosted as a cloud service
- have Node.js and Python SDKs and a REST interface
- have good documentation
- support stateful or contextual intents, which makes it very easy to build a conversational platform on top of them
As stated before, we couldn't use either of these hosted solutions due to compliance, and that is when we came across an open-source NLU called Rasa: a perfect replacement for API.ai and Wit.ai that we could host and manage on AWS ourselves.
You may now be wondering why I used the term NLU for API.ai and Wit.ai rather than NLP (Natural Language Processing).
- NLP refers to all systems that handle interactions with humans in a way humans find natural, meaning we can converse with a system just the way we talk to other human beings.
- NLU is a subfield of NLP that handles the narrow but complex challenge of converting unstructured input into a structured form that a machine can understand and act upon. So when you say "Book a hotel for me in San Francisco on 20th April 2017", the bot uses NLU to extract date=20th April 2017, location=San Francisco, and action=book hotel, which the system can understand.
RASA NLU
In this section, I would like to explain Rasa in detail, along with some NLP terms you should be familiar with.
- Intent: what the user would like to do, e.g., raise a complaint or request a refund.
- Entities: attributes that give details about the user's task, e.g., which service the complaint is about, or the refund amount.
- Confidence score: a score indicating how confident the NLU is that the input matches one of the known intents.

Here is an example to help you understand the terms above.
Input: "My internet isn't working since morning"
- intent: "service_interruption"
- entities: "service=internet", "duration=morning"
- confidence score: 0.84 (this could vary based on your training)
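A consumer of such a parse result typically checks the confidence score before acting on the intent. A minimal sketch in JavaScript, where the field names mirror the example above and the 0.6 threshold is an assumption:

```javascript
// Hypothetical shape of an NLU parse result; field names are modeled on
// the example above and may differ across NLU versions.
const parsed = {
  intent: { name: 'service_interruption', confidence: 0.84 },
  entities: [
    { entity: 'service', value: 'internet' },
    { entity: 'duration', value: 'morning' },
  ],
};

// Act on the intent only when the classifier is confident enough;
// otherwise fall back to asking the user a clarifying question.
function resolveIntent(result, threshold = 0.6) {
  if (result.intent && result.intent.confidence >= threshold) {
    return result.intent.name;
  }
  return 'fallback';
}

console.log(resolveIntent(parsed)); // prints service_interruption
```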
NLU's job (Rasa's, in our case) is to accept a sentence and give us the intent, entities, and a confidence score that our bot can use. Rasa essentially provides a high-level API over various NLP and ML libraries that perform intent classification and entity extraction. These NLP and ML libraries are called backends in Rasa, and they are what bring the intelligence to Rasa. Here are some of the backends used with Rasa:
MITIE: an all-inclusive library, meaning it has an NLP library for entity extraction as well as an ML library for intent classification built in.
spaCy + sklearn: spaCy is an NLP library that only does entity extraction; sklearn is used with spaCy to add ML capabilities for intent classification.
MITIE + sklearn: the best of both worlds, combining MITIE's good entity recognition with sklearn's fast and good intent classification.
I have used the MITIE backend to train Rasa. For the demo, I've built a "Live Support ChatBot" trained on messages like these:
- My phone isn't working.
- My phone isn't turning on.
- My phone crashed and isn't working anymore.
My training data looks like this:
```json
{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "hi",
        "intent": "greet",
        "entities": []
      },
      {
        "text": "my phone isn't turning on.",
        "intent": "device_failure",
        "entities": [
          { "start": 3, "end": 8, "value": "phone", "entity": "device" }
        ]
      },
      {
        "text": "my phone is not working.",
        "intent": "device_failure",
        "entities": [
          { "start": 3, "end": 8, "value": "phone", "entity": "device" }
        ]
      },
      {
        "text": "My phone crashed and isn’t working anymore.",
        "intent": "device_failure",
        "entities": [
          { "start": 3, "end": 8, "value": "phone", "entity": "device" }
        ]
      }
    ]
  }
}
```
NOTE: We have observed that MITIE gives better accuracy than spaCy + sklearn on a small training set, but as you keep adding more intents, training on MITIE gets slower and slower. For a training set of 200+ examples with about 10-15 intents, MITIE takes us about 35-45 minutes to train on a c4.4xlarge instance (16 cores, 30 GB RAM) on AWS.
Botkit-Rasa Integration
Botkit is an open-source bot development framework designed by the creators of Howdy. It provides a set of tools for building bots on Facebook Messenger, Slack, Twilio, Kik, and other popular platforms. The team has also released an IDE for bot development called Botkit Studio. In short, Botkit lets us write a bot once and deploy it on multiple messaging platforms.
Botkit also supports middleware, which can be used to extend its functionality. Integrations with databases, CRMs, NLU, and statistical tools are provided via middleware, which makes the framework extensible. This design also lets us add integrations with other tools and software by simply writing middleware modules for them.
I've integrated botkit with Slack for this demo. You can use this boilerplate template to set up botkit for Slack. We have extended the Botkit-Rasa middleware, which you can find here.
The Botkit-Rasa middleware has two functions, receive and hears, which override the default botkit behavior.
1. receive: invoked when botkit receives a message. It sends the user's message to Rasa and stores the returned intent and entities in the botkit message object.
2. hears: overrides the default botkit hears method, i.e., controller.hears. The default hears method uses regexes to search for the given patterns in the user's message, while the hears method from the Botkit-Rasa middleware matches on the intent.
```javascript
let Botkit = require('botkit');
let rasa = require('./Middleware/rasa')({ rasa_uri: 'http://localhost:5000' });

let controller = Botkit.slackbot({
  clientId: process.env.clientId,
  clientSecret: process.env.clientSecret,
  scopes: ['bot'],
  json_file_store: __dirname + '/.db/'
});

// Override receive method in botkit
controller.middleware.receive.use(rasa.receive);

// Override hears method in botkit
controller.changeEars(function (patterns, message) {
  return rasa.hears(patterns, message);
});

controller.setupWebserver(3000, function (err, webserver) {
  // Configure a route to receive webhooks from slack
  controller.createWebhookEndpoints(webserver);
});
```
Let's try an example: "my phone is not turning on". Rasa will return the following:
1. Intent: device_failure
2. Entities: device=phone
If you look carefully, the input I gave, i.e., "my phone is not turning on", is not present in my training file. Rasa has enough intelligence built in to identify the intent and entities correctly for such variations.
We need to add a hears method listening for the intent "device_failure" to process this input. Remember that the intent and entities returned by Rasa are stored in the message object by the Botkit-Rasa middleware.
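The handler logic can be sketched as plain functions. These helpers are hypothetical and the reply text is a placeholder; in the real bot, the body of deviceFailureReply would run inside controller.hears(['device_failure'], ...) and the result would be sent with bot.reply:

```javascript
// Pull the first value for a named entity out of the entity list that the
// Botkit-Rasa middleware attaches to the message object.
function firstEntity(entities, name) {
  const match = (entities || []).find((e) => e.entity === name);
  return match ? match.value : null;
}

// Build the reply for a "device_failure" intent (placeholder wording).
function deviceFailureReply(message) {
  const device = firstEntity(message.entities, 'device') || 'device';
  return `Sorry to hear your ${device} isn't working. A support agent will reach out shortly.`;
}

console.log(deviceFailureReply({
  intent: 'device_failure',
  entities: [{ entity: 'device', value: 'phone' }],
}));
// prints: Sorry to hear your phone isn't working. A support agent will reach out shortly.
```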
You should be able to run this bot with Slack and see the output shown below (support_bot is the name of my bot).
Conclusion
You are now familiar with the process of building a chatbot with a bot development framework and an NLU. I hope this helps you get started on your own bot quickly. If you have any suggestions, questions, or feedback, tweet me at @harjun1601. Keep following our blog for more articles on bot development, ML, and AI.
The objective of this article is to design an ETL workflow using Apache NiFi that will scrape a web page with almost no code to get an endpoint, extract and transform the dataset, and load the transformed data into a Hive table.
Problem Statement
One potential use case where we need to create a data pipeline would be to capture the district level COVID-19 information from the COVID19-India API website, which gets updated daily. So, the aim is to create a flow that collates and loads a dataset into a warehouse system used by various downstream applications for further analysis, and the flow should be easily configurable for future changes.
Prerequisites
Before we start, we need a basic understanding of Apache NiFi, and having it installed on your system would be a great start for this article. If you do not have it installed, please follow these quick steps. Apache Hive is also part of this architecture, which in turn requires a fully functional Hadoop framework. For this article, I am using Hive on a single-node cluster installed locally, but you can use a remote Hive connection as well.
Basic Terminologies
Apache NiFi is an ETL tool with flow-based programming that comes with a web UI built to provide an easy way (drag & drop) to handle data flow in real-time. It also supports powerful and scalable means of data routing and transformation, which can be run on a single server or in a clustered mode across many servers.
A NiFi workflow consists of processors: the rectangular components that can process, verify, filter, join, split, or adjust data. Processors exchange pieces of information called FlowFiles through queues named connections, and the FlowFile Controller manages the resources between those components.
Web scraping is the process of extracting and collecting structured web data through automation. It involves extracting and processing the underlying HTML code using CSS selectors, and storing the extracted data in a database.
Apache Hive is a warehouse system built on top of Hadoop used for data summarization, query, and ad-hoc analysis.
Steps for ETL Workflow
Fig:- End-to-End NiFi WorkFlow
The above flow comprises multiple processors, each performing a different task at a different stage of data processing. The stages are Collect (InvokeHTTP – API Web Page, InvokeHTTP – Download District Data), Filter (GetHTMLElement, ExtractEndPoints, RouteOnAttribute – District API, QueryRecord), Transform (ReplaceHeaders, ConvertJSONToSQL), Load (PutHiveQL), and Logging (LogAttribute). The processors are connected through relationship connections and are triggered on success until the data is loaded into the table. The entire flow is scheduled to run daily.
So, let’s dig into each step to understand the flow better.
1. Get the HTML document using the Remote URL
The flow starts with an InvokeHTTP processor that sends an HTTP GET request to the COVID19-India API URL and returns an HTML page in the response queue for further inspection. The processor can be used to invoke multiple HTTP methods (GET, PUT, POST, or PATCH) as well.
Fig:- InvokeHTTP – API Web Page Configuration
2. Extract listed endpoints
In the second step, the GetHTMLElement processor targets the HTML table rows in the response where all the endpoints are listed inside anchor tags, using the CSS selector tr > td > a, and extracts the data into FlowFiles.
Fig:- GetHTMLElement Configuration
After the success of the previous step, the ExtractText processor evaluates regular expressions against the content of the FlowFile to extract the URLs, which are then assigned to a FlowFile attribute named data_url.
Fig:- ExtractEndPoints Configuration
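The extraction idea can be sketched in JavaScript: a regular expression pulls the CSV endpoint URLs out of the selected anchor tags. The HTML snippet and the pattern below are illustrative (the hostname is a placeholder); the actual regex lives in the ExtractEndPoints processor configuration.

```javascript
// Illustrative HTML rows, shaped like the ones selected by tr > td > a.
const html =
  '<tr><td><a href="https://example.org/csv/latest/district.csv">districts</a></td></tr>' +
  '<tr><td><a href="https://example.org/csv/latest/state_wise.csv">states</a></td></tr>';

// Capture every CSV URL that appears in an href attribute.
const urls = [...html.matchAll(/href="(https?:\/\/[^"]+\.csv)"/g)].map((m) => m[1]);

console.log(urls); // prints the two .csv URLs
```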
Note: The layout of the web page may change in the future. If it has changed by the time you read this article, configure the above processors to match the new layout.
3. Pick districts API and Download the dataset
Here, the RouteOnAttribute processor filters for the district-level API and ignores the other APIs using the Apache NiFi Expression Language, since we are only interested in district.csv.
Fig:- RouteOnAttribute – District API Configuration
This time, the InvokeHTTP processor downloads the data using the extracted API endpoint, referencing the data_url attribute wrapped in curly braces; the response data will be in CSV format.
Fig:- InvokeHTTP – Download District Data Configuration
4. Transform and Filter the dataset
In this stage, the header of the response data is changed to lowercase using the ReplaceText processor with the Literal Replace strategy, and the first field name is changed from date to recorded_date to avoid using a reserved database keyword.
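The header rewrite can be sketched as a plain function. The column names mirror the district dataset; the actual replacement is configured in the ReplaceHeaders processor, not in code:

```javascript
// Lowercase the CSV header row and rename the first field from "date"
// to "recorded_date", leaving the data rows untouched.
function fixHeader(csv) {
  const [header, ...rows] = csv.split('\n');
  const fixed = header
    .toLowerCase()
    .split(',')
    .map((col) => (col === 'date' ? 'recorded_date' : col))
    .join(',');
  return [fixed, ...rows].join('\n');
}

console.log(fixHeader('Date,State,District,Confirmed\n2021-06-01,Goa,North Goa,100'));
// prints:
// recorded_date,state,district,confirmed
// 2021-06-01,Goa,North Goa,100
```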
Since the data is being updated daily on an incremental basis, we will only extract the data from the previous day using the QueryRecord processor. It will also convert the CSV data into JSON FlowFile using the CSVReader and JsonRecordSetWriter controller services.
Please note that both the CSVReader and JsonRecordSetWriter services can have the default settings for our use. You can check out this blog for more reading on the controller services.
And as mentioned, QueryRecord evaluates the below query to get data from the previous day out of the FlowFile and passes it to the next processor.
select * from FlowFile where recorded_date='${now():toNumber():minus(86400000):format('yyyy-MM-dd')}'
Fig:- ReplaceHeaders Configuration
Fig:- QueryRecord Configuration
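The NiFi expression in that query subtracts 24 hours (86,400,000 ms) from the current time and formats the result as yyyy-MM-dd. The equivalent computation in JavaScript looks like this (note that this sketch formats in UTC, while NiFi's format() uses the server's local time zone):

```javascript
// Yesterday's date as "yyyy-MM-dd", mirroring
// ${now():toNumber():minus(86400000):format('yyyy-MM-dd')}
function yesterdayISO(nowMs = Date.now()) {
  const d = new Date(nowMs - 86400000); // minus 24 hours in milliseconds
  return d.toISOString().slice(0, 10);  // keep only the date part
}

console.log(yesterdayISO(Date.parse('2021-06-02T12:00:00Z'))); // prints 2021-06-01
```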
5. Establish JDBC connection pool for Hive and create a table
Let's set up the Hive JDBC driver for the NiFi flow using HiveConnectionPool with the required local/remote configurations (database connection URL, user, and password). The Hive Configuration Resources property expects the path to the Hive configuration file, i.e., hive-site.xml.
Fig:- HiveConnectionPool Setup
Now, we need an empty table to load the data from the NiFi flow, and to do so, you can use the DDL structure below:
```sql
CREATE TABLE IF NOT EXISTS demo.districts (
  recorded_date string,
  state string,
  district string,
  confirmed string,
  recovered string,
  deceased string,
  other string,
  tested string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
```
6. Load data into the Hive table
In this step, the JSON-formatted FlowFile is converted into an SQL statement using ConvertJSONToSQL, which outputs an SQL query as the new FlowFile. We configure the HiveConnectionPool for the JDBC Connection Pool property, along with the table name and statement type, before running the processor. In this case, the statement type is insert, since we need to load the data into the table.
Also, please note that when preparing a SQL command, the SQL Parameter Attribute Prefix property should be hiveql. Otherwise, the very next processor will not be able to identify it and will throw an error.
Then, on success, PutHiveQL executes the input SQL command and loads the data into the table. The success of this processor marks the end of the workflow and the data can be verified by fetching the target table.
Fig:- ConvertJSONToSQL Configurations
Fig:- PutHiveQL Configuration
7. Schedule the flow for daily updates
You can schedule the entire flow to run at any given time using different NiFi scheduling strategies. Since the first InvokeHTTP is the initiator of this flow, we can configure it to run daily at 2 AM.
Fig:- Scheduling Strategy
8. Log Management
Almost every processor is connected to the LogAttribute processor through a failure/success queue, which writes the state and information of all used attributes into the NiFi log file, logs/nifi-app.log. By checking this file, we can debug and fix issues in case of failure. To extend this even further, we could also set up a flow that captures error logs and sends notifications, for example via Apache Kafka or email.
9. Consume data for analysis
You can use various open-source visualization tools to start off with the exploratory data analysis on the data stored in the Hive table.
You can download the template covid_etl_workflow.xml and run it on your machine for reference.
Future Scope
There are different ways to build any workflow, and this was one of them. You can take this further by allowing multiple datasets (state_wise, test_datasets) from the list with different combinations of various processors/controllers as a part of the flow.
You can also try scraping data from a product listing page of multiple e-commerce websites for a comparison between goods and price or you can even extract movie reviews and ratings from the IMDb website and use it as a recommendation for users.
Conclusion
In this article, we discussed Apache NiFi and created a workflow to extract, filter, transform, and load data for analysis. If you prefer focusing on architecture and building logic with less code, then Apache NiFi is the tool for you.