Category: Industry

  • Building Dynamic Forms in React Using Formik

    Every day we see a huge number of web applications that allow customization. They use drag-and-drop or metadata-driven UI interfaces to support multiple layouts on top of a single backend. A feedback collection system is one of the simplest examples of such a product: on the admin side, one can manage the layout, and on the consumer side, users are shown that layout to capture their data. This post focuses on building a micro-framework to support such use cases with the help of React and Formik.

    Building big forms in React can be extremely time-consuming and tedious when structural changes are requested. Handling their validations also eats up too much of the development life cycle. If we use Redux-based solutions to simplify this, like Redux Form, we run into a lot of performance bottlenecks. So here comes Formik!

    Why Formik?

    “Why” is one of the most important questions while solving any problem. There are quite a few reasons to lean towards Formik for implementing such systems:

    • Simplicity
    • Advanced validation support with Yup
    • Good community support with a lot of people helping on GitHub

    That being said, it’s one of the easiest libraries for quick form-building. Formik’s clean API lets us use it without worrying about a lot of state management.

    Yup is probably the best library out there for validation, and Formik provides out-of-the-box support for Yup validations, which makes it even more programmer-friendly!

    API Responses:

    We need to follow certain API structures to let our React code understand which component to render where.

    Let’s assume we will be getting responses from the backend API in the following fashion.

    [{
       "type": "text",
       "field": "name",
       "label": "User's name",
       "style": {
             "width": "50%"
        }
    }]

    We can have any number of fields, but each one will have two mandatory properties: type and field. We will use those properties to build the UI as well as the response.

    So let’s start with building the simplest form with React and Formik.

    import React from 'react';
    import { useFormik } from 'formik';
    
    const SignupForm = () => {
      const formik = useFormik({
        initialValues: {
          email: '',
        },
        onSubmit: values => {
          alert(JSON.stringify(values, null, 2));
        },
      });
      return (
        <form onSubmit={formik.handleSubmit}>
          <label htmlFor="email">Email Address</label>
          <input
            id="email"
            name="email"
            type="email"
            onChange={formik.handleChange}
            value={formik.values.email}
          />
          <button type="submit">Submit</button>
        </form>
      );
    };
    
    export default SignupForm;

    import React from 'react';
    
    export default ({ name }) => <h1>Hello {name}!</h1>;

    <div id="root"></div>

    import React, { Component } from 'react';
    import { render } from 'react-dom';
    import Basic from './Basic';
    import './style.css';
    
    class App extends Component {
      constructor() {
        super();
        this.state = {
          name: 'React'
        };
      }
    
      render() {
        return (
          <div>
            <Basic />
          </div>
        );
      }
    }
    
    render(<App />, document.getElementById('root'));

    {
      "name": "react",
      "version": "0.0.0",
      "private": true,
      "dependencies": {
        "react": "^16.12.0",
        "react-dom": "^16.12.0",
        "formik": "latest"
      },
      "scripts": {
        "start": "react-scripts start",
        "build": "react-scripts build",
        "test": "react-scripts test --env=jsdom",
        "eject": "react-scripts eject"
      },
      "devDependencies": {
        "react-scripts": "latest"
      }
    }

    h1, p {
      font-family: Lato;
    }

    You can view the fiddle of the above code here to see the live demo.

    We will go with the latest functional components to build this form. You can find more information on the useFormik hook in the useFormik Hook documentation.

    It’s nothing more than just a wrapper for Formik functionality.

    Adding dynamic nature

    So let’s first create and import the mocked API response to build the UI dynamically.

    import React from 'react';
    import { useFormik } from 'formik';
    import response from "./apiresponse"
    
    const SignupForm = () => {
      const formik = useFormik({
        initialValues: {
          email: '',
        },
        onSubmit: values => {
          alert(JSON.stringify(values, null, 2));
        },
      });
      return (
        <form onSubmit={formik.handleSubmit}>
          <label htmlFor="email">Email Address</label>
          <input
            id="email"
            name="email"
            type="email"
            onChange={formik.handleChange}
            value={formik.values.email}
          />
          <button type="submit">Submit</button>
        </form>
      );
    };
    
    export default SignupForm;

    You can view the fiddle here.

    We simply imported the file and made it available for processing. Now we need to write the logic to build the components dynamically.
    Let’s visualize the possible DOM hierarchy of components:

    <Container>
    	<TextField />
    	<NumberField />
    	<Container>
    		<TextField />
    		<BooleanField />
    	</Container>
    </Container>

    We can have a container recurring within a container, so let’s address this by adding a children attribute to the API response.

    export default [
      {
        "type": "text",
        "field": "name",
        "label": "User's name"
      },
      {
        "type": "number",
        "field": "number",
        "label": "User's age",
      },
      {
        "type": "array",
        "field": "none",
        "children": [
          {
            "type": "text",
            "field": "user.hobbies",
            "label": "User's hobbies"
          }
        ]
      }
    ]

    You can see the fiddle with the response processing and a live demo here.

    To process the recursive nature, we will create a separate component.

    import React from 'react';
    
    const RecursiveContainer = ({ config, formik }) => {
      const builder = (individualConfig) => {
        switch (individualConfig.type) {
          case 'text':
            return (
              <div>
                <label htmlFor={individualConfig.field}>{individualConfig.label}</label>
                <input
                  id={individualConfig.field}
                  type="text"
                  name={individualConfig.field}
                  onChange={formik.handleChange}
                  style={{ ...individualConfig.style }}
                />
              </div>
            );
          case 'number':
            return (
              <div>
                <label htmlFor={individualConfig.field}>{individualConfig.label}</label>
                <input
                  id={individualConfig.field}
                  type="number"
                  name={individualConfig.field}
                  onChange={formik.handleChange}
                  style={{ ...individualConfig.style }}
                />
              </div>
            );
          case 'array':
            return (
              <RecursiveContainer config={individualConfig.children || []} formik={formik} />
            );
          default:
            return <div>Unsupported field</div>;
        }
      };
    
      return (
        <>
          {config.map((c) => (
            <React.Fragment key={c.field}>{builder(c)}</React.Fragment>
          ))}
        </>
      );
    };
    
    export default RecursiveContainer;

    You can view the complete fiddle of the recursive component here.

    What we do here is pretty simple. We pass config, the JSON array retrieved from the API response, iterate through it, and build each component based on its type. When the type is array, we render the same RecursiveContainer component, which is basic recursion.

    We can harden this by passing the current depth and restricting recursion to some maximum depth, to avoid running out of stack at runtime. Specifying a depth limit ultimately makes the component less prone to runtime errors. There is no standard limit; it varies from use case to use case. If you are planning to build a system based on a compliance questionnaire, it can go to a max depth of 5 to 7, while a basic signup form often needs a depth of only 2.
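    As a rough sketch (the depth and maxDepth prop names and the default limit are purely illustrative, not part of the component above), the guard could look like this:

    import React from 'react';
    
    // Same RecursiveContainer as above, with an added depth guard.
    const RecursiveContainer = ({ config, formik, depth = 0, maxDepth = 7 }) => {
      // Stop rendering once the configured depth limit is exceeded,
      // instead of recursing indefinitely on a malformed or very deep config.
      if (depth > maxDepth) {
        return null;
      }
    
      const builder = (individualConfig) => {
        switch (individualConfig.type) {
          case 'array':
            return (
              <RecursiveContainer
                config={individualConfig.children || []}
                formik={formik}
                depth={depth + 1}
                maxDepth={maxDepth}
              />
            );
          // ...the text, number and default cases stay exactly as before
          default:
            return <div>Unsupported field</div>;
        }
      };
    
      return (
        <>
          {config.map((c) => (
            <React.Fragment key={c.field}>{builder(c)}</React.Fragment>
          ))}
        </>
      );
    };
    
    export default RecursiveContainer;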

    So we generated the forms but how do we validate them? How do we enforce required, min, max checks on the form?

    For this, Yup is very helpful. Yup is an object schema validation library that helps us validate an object and gives us the results back. Its chaining syntax makes it much easier to build incremental validation functions.

    Yup provides us with a vast variety of existing validations. We can combine them, specify error or warning messages to be thrown and much more.
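    For instance, here is a small illustrative schema (the field name and messages are made up for this example) that chains a few of Yup’s built-in string validations:

    import * as yup from 'yup';
    
    // Chain built-in validations and attach a custom message to each rule
    const nameSchema = yup
      .string()
      .required('Name is required')
      .min(2, 'Name is too short')
      .max(50, 'Name is too long');
    
    // isValid resolves to true/false instead of throwing
    nameSchema.isValid('Al').then((valid) => console.log(valid)); // logs: true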

    You can find more information on Yup at the Yup Official Documentation.

    To build a validation function, we need to pass a Yup schema to Formik.

    Here is a simple example: 

    import React from 'react';
    import { useFormik } from 'formik';
    import response from "./apiresponse"
    import RecursiveContainer from './RecursiveContainer';
    import * as yup from 'yup';
    
    const SignupForm = () => {
      const signupSchema = yup.object().shape({
          name: yup.string().required()
      });
    
      const formik = useFormik({
        initialValues: {
        },
        onSubmit: values => {
          alert(JSON.stringify(values, null, 2));
        },
        validationSchema: signupSchema
      });
      console.log(formik, response)
      return (
        <form onSubmit={formik.handleSubmit}>
          <RecursiveContainer config={response} formik={formik} />
          <button type="submit">Submit</button>
        </form>
      );
    };
    
    export default SignupForm;

    You can see the schema usage example here.

    In this example, we simply created a schema and passed it to the useFormik hook. You will notice that until the user fills in the name field, the form submission does not go through.

    Here is a simple hack to make the button disabled until all necessary fields are filled.

    import React from 'react';
    import { useFormik } from 'formik';
    import response from "./apiresponse"
    import RecursiveContainer from './RecursiveContainer';
    import * as yup from 'yup';
    
    const SignupForm = () => {
      const signupSchema = yup.object().shape({
          name: yup.string().required()
      });
    
      const formik = useFormik({
        initialValues: {
        },
        onSubmit: values => {
          alert(JSON.stringify(values, null, 2));
        },
        validationSchema: signupSchema
      });
      console.log(formik, response)
      return (
        <form onSubmit={formik.handleSubmit}>
          <RecursiveContainer config={response} formik={formik} />
          <button type="submit" disabled={!formik.isValid}>Submit</button>
        </form>
      );
    };
    
    export default SignupForm;

    You can see how to use submit validation with the live fiddle here.

    Formik exposes a wide variety of state while the form is being rendered (values, errors, touched, isValid, and so on), and we can use it however it suits us. You can find the full API of Formik in the Formik Official Documentation.
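    For example, one way (purely illustrative, not part of the fiddles above) to surface those errors next to each input is a tiny helper that reads formik.touched and formik.errors; it could be rendered below every input inside RecursiveContainer:

    import React from 'react';
    
    // Renders the validation message for a single field, if Formik reports one.
    // formik.errors and formik.touched come straight from the useFormik return value.
    const FieldError = ({ formik, field }) =>
      formik.touched[field] && formik.errors[field] ? (
        <div className="error">{formik.errors[field]}</div>
      ) : null;
    
    export default FieldError;

    Note that for formik.touched to be populated, the inputs would also need formik.handleBlur wired to their onBlur prop.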

    The built-in validations are fine, but we often run into cases where we would like to write our own. How do we write them and integrate them with Yup?

    For this, there are two different approaches with Formik + Yup: we can either extend Yup to support the additional validation, or pass a validate function to Formik. The validate function approach is much simpler; you just write a function that returns an errors object to Formik. As simple as it sounds, it does get messy at times.
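    For reference, here is a minimal sketch of the validate function approach (the name field and messages are illustrative); Formik’s validate option expects an object keyed by field name:

    import React from 'react';
    import { useFormik } from 'formik';
    
    // A plain function that inspects the values and returns an errors object
    const validate = (values) => {
      const errors = {};
      if (!values.name) {
        errors.name = 'Name is required';
      }
      return errors;
    };
    
    const NameForm = () => {
      const formik = useFormik({
        initialValues: { name: '' },
        validate, // used instead of validationSchema
        onSubmit: (values) => alert(JSON.stringify(values, null, 2)),
      });
    
      return (
        <form onSubmit={formik.handleSubmit}>
          <input name="name" onChange={formik.handleChange} value={formik.values.name} />
          {formik.errors.name && <div>{formik.errors.name}</div>}
          <button type="submit">Submit</button>
        </form>
      );
    };
    
    export default NameForm;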

    So we will see an example of adding custom validation to Yup. Yup provides an addMethod interface for adding our own user-defined validations to the application.

    Let’s say we want to create aliases for existing validations to tolerate casing differences, because that’s the most common mistake we see: Url instead of url, or trim arriving from the backend as Trim. These method names are case sensitive, so yup.Url will fail, while yup.url gives us a function. These are just examples; you can also alias them with entirely different names, for instance aliasing required as the more readable notEmpty.

    The usage is very simple and straightforward as follows: 

    yup.addMethod(yup.string, "URL", function (...args) {
      return this.url(...args);
    });

    This will create an alias for url as URL.

    Here is an example of a custom validation method that accepts Y and N as boolean values.

    import * as yup from 'yup';
    import { isEmpty } from 'lodash';
    
    const validator = function (message) {
      return this.test('is-string-boolean', message, function (value) {
        // An empty value is treated as valid here; chain .required() to enforce presence
        if (isEmpty(value)) {
          return true;
        }
        return ['Y', 'N'].indexOf(value) !== -1;
      });
    };
    
    yup.addMethod(yup.string, 'stringBoolean', validator);
    yup.addMethod(yup.string, 'StringBoolean', validator);

    With the above, we will be able to execute yup.string().stringBoolean() and yup.string().StringBoolean().

    It’s a pretty handy syntax that lets users create their own validations. You can create many more validations in your project to be used with Yup and reuse them wherever required.

    Writing the schema by hand is also a cumbersome task, and a hardcoded schema is useless when the form is dynamic. When the form is dynamic, the validations also need to be dynamic. Yup’s chaining syntax lets us achieve this very easily.

    We will assume that the backend sends us the following additional properties along with the metadata.

    [{
       "type": "text",
       "field": "name",
       "label": "User's name",
       "style": {
             "width": "50%"
        },
       "validationType": "string",
       "validations": [{
              "type": "required",
              "params": ["Name is required"]
        }]
    }]

    validationType will hold one of Yup’s data types (string, number, date, etc.), and validations will hold the list of validations that need to be applied to that field.

    So let’s have a look at the following snippet which utilizes the above structure and generates dynamic validation.

    import * as yup from 'yup';
    import { isEmpty } from 'lodash';
    
    /** Adding just the additional methods here */
    
    yup.addMethod(yup.string, "URL", function(...args) {
        return this.url(...args);
    });
    
    
    const validator = function (message) {
        return this.test('is-string-boolean', message, function (value) {
          if (isEmpty(value)) {
            return true;
          }
    
          if (['Y', 'N'].indexOf(value) !== -1) {
            return true;
          } else {
            return false;
          }
        });
      };
    
    yup.addMethod(yup.string, "stringBoolean", validator);
    yup.addMethod(yup.string, "StringBoolean", validator);
    
    
    
    
    export function createYupSchema(schema, config) {
      const { field, validationType, validations = [] } = config;
      if (!yup[validationType]) {
        return schema;
      }
      let validator = yup[validationType]();
      validations.forEach((validation) => {
        const { params, type } = validation;
        if (!validator[type]) {
          return;
        }
        validator = validator[type](...params);
      });
      if (field.indexOf('.') !== -1) {
        // nested fields are not covered in this example, but they are easy to handle as well
      } else {
        schema[field] = validator;
      }
    
      return schema;
    }
    
    export const getYupSchemaFromMetaData = (
      metadata,
      additionalValidations,
      forceRemove
    ) => {
      const yepSchema = metadata.reduce(createYupSchema, {});
      const mergedSchema = {
        ...yepSchema,
        ...additionalValidations,
      };
    
      forceRemove.forEach((field) => {
        delete mergedSchema[field];
      });
    
      const validateSchema = yup.object().shape(mergedSchema);
    
      return validateSchema;
    };

    You can see the complete live fiddle with dynamic validations and Formik here.

    We have added the above code snippets to show how easily we can add a new method to Yup. Along with that, there are two functions, createYupSchema and getYupSchemaFromMetaData, which drive the whole logic for building the dynamic schema. We pass the validations in the response and build the validation from them.

    createYupSchema simply builds the Yup validation based on the validations array and validationType. getYupSchemaFromMetaData iterates over the response array, builds the Yup validation for each field, and at the end wraps everything in an object schema. In this way, we can generate dynamic validations. One can even go further and create nested validations with recursion.
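    For completeness, here is a hedged usage sketch showing how the generated schema could be handed to useFormik (the ./yupSchemaCreator file name is an assumption; the empty arrays mean no extra validations and nothing force-removed):

    import React from 'react';
    import { useFormik } from 'formik';
    import response from './apiresponse';
    import RecursiveContainer from './RecursiveContainer';
    import { getYupSchemaFromMetaData } from './yupSchemaCreator'; // assumed file name
    
    const DynamicForm = () => {
      // Build the Yup schema straight from the backend metadata
      const validationSchema = getYupSchemaFromMetaData(response, [], []);
    
      const formik = useFormik({
        initialValues: {},
        validationSchema,
        onSubmit: (values) => alert(JSON.stringify(values, null, 2)),
      });
    
      return (
        <form onSubmit={formik.handleSubmit}>
          <RecursiveContainer config={response} formik={formik} />
          <button type="submit" disabled={!formik.isValid}>Submit</button>
        </form>
      );
    };
    
    export default DynamicForm;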

    Conclusion

    With the traditional approach of writing large boilerplate for forms, adding just another field is often time-consuming. This approach eliminates the need to hardcode the fields and allows them to be backend-driven.

    Formik provides well-optimized state management, which reduces the performance issues we generally see when Redux-based form state is updated frequently.

    As we saw above, it’s very easy to build dynamic forms with Formik. We can save the templates and even create template libraries, which are very common in question-and-answer systems. Used correctly, we can simply store the templates in a NoSQL database like MongoDB and quickly generate a vast number of forms, along with their validations.
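    As a purely illustrative example (the _id, name and fields keys are assumptions, not a prescribed schema), a stored form template document might look like this:

    {
      "_id": "signup-form-v1",
      "name": "Signup form",
      "fields": [
        {
          "type": "text",
          "field": "name",
          "label": "User's name",
          "validationType": "string",
          "validations": [{ "type": "required", "params": ["Name is required"] }]
        }
      ]
    }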

    To learn more and build optimized solutions, you can also refer to the FastField and Field APIs in their official documentation. Thanks for reading!

  • Building A Containerized Microservice in Golang: A Step-by-step Guide

    With the evolving architectural design of web applications, microservices have become a successful trend in architecting the application landscape. Along with the advancements in application architecture, transport protocols such as REST and gRPC are getting better in efficiency and speed. Containerizing microservice applications also helps greatly with agile development and high-speed delivery.

    In this blog, I will try to showcase how simple it is to build a cloud-native application on the microservices architecture using Go.

    We will break the solution into multiple steps. We will learn how to:

    1) Build a set of containerized microservices, each with a very specific set of independent tasks, related only to its specific logical component.

    2) Use go-kit as the framework for developing and structuring the components of each service.

    3) Build APIs that will use HTTP (REST) and Protobuf (gRPC) as the transport mechanisms, PostgreSQL for databases and finally deploy it on Azure stack for API management and CI/CD.

    Note: Deployment, setting up the CI-CD and API-Management on Azure or any other cloud is not in the scope of the current blog.

    Prerequisites:

    • A beginner’s level of understanding of web services, Rest APIs and gRPC
    • GoLand/ VS Code
    • Properly installed and configured Go. If not, check it out here
    • Set up a new project directory under the GOPATH
    • Understanding of the standard Golang project. For reference, visit here
    • PostgreSQL client installed
    • Go kit

    What are we going to do?

    We will develop a simple web application working on the following problem statement:

    • A global publishing company that publishes books and journals wants to develop a service to watermark their documents. A document (books, journals) has a title, author and a watermark property
    • The watermark operation can be in Started, InProgress and Finished status
    • Only a specific set of users should be able to watermark a document
    • Once the watermark is done, the document can never be re-marked

    Example of a document:

    {content: "book", title: "The Dark Code", author: "Bruce Wayne", topic: "Science"}

    For a detailed understanding of the requirement, please refer to this.

    Architecture:

    In this project, we will have 3 microservices: Authentication Service, Database Service and the Watermark Service. We have a PostgreSQL database server and an API-Gateway.

    Authentication Service:

    The application is supposed to have a role-based and user-based access control mechanism. This service will authenticate the user according to their specific role and return only HTTP status codes: 200 when the user is authorized and 401 for unauthorized users.

    APIs:

    • /user/access, Method: GET, Secured: True, payload: user: <name>
      It will take the user name as input, and the auth service will return the roles and the privileges assigned to it
    • /authenticate, Method: GET, Secured: True, payload: user: <name>, operation: <op>
      It will authenticate the user for the passed operation if that operation is accessible to the user’s role
    • /healthz, Method: GET, Secured: True
      It will return the status of the service

    Database Service:

    We will need databases for our application to store the users, their roles and the access privileges for those roles. The documents will also be stored in the database, without the watermark. It is a requirement that no document can have a watermark at the time of creation. A document is said to be created successfully only when the data inputs are valid and the database service returns the success status.

    We will be using two databases, one for each of the two services that consume them. This design is not strictly necessary, but it follows the “single database per service” rule of the microservice architecture.

    APIs:

    • /get, Method: GET, Secured: True, payload: filters: []filter{"field-name": "value"}
      It will return the list of documents according to the specific filters passed
    • /update, Method: POST, Secured: True, payload: "Title": <id>, document: {"field": "value", …}
      It will update the document for the given title ID
    • /add, Method: POST, Secured: True, payload: document: {"field": "value", …}
      It will add the document and return the title ID
    • /remove, Method: POST, Secured: True, payload: title: <id>
      It will remove the document entry according to the passed title ID
    • /healthz, Method: GET, Secured: True
      It will return the status of the service

    Watermark Service:

    This is the main service that will perform the API calls to watermark the passed document. Every time a user needs to watermark a document, they pass the TicketID in the watermark API request along with the appropriate Mark. The service internally calls the database Update API with the provided request and returns the status of the watermark process: initially “Started”, then after some time “InProgress”, and finally “Finished” if the call was valid, or “Error” if the request is not valid.

    APIs:

    • /get, Method: GET, Secured: True, payload: filters: []filter{"field-name": "value"}
      It will return the list of documents according to the specific filters passed
    • /status, Method: GET, Secured: True, payload: "Ticket": <id>
      It will return the watermarking status of the document for the passed ticket ID
    • /addDocument, Method: POST, Secured: True, payload: document: {"field": "value", …}
      It will add the document and return the title ID
    • /watermark, Method: POST, Secured: True, payload: title: <id>, mark: "string"
      It is the main watermark operation API, which accepts the mark string
    • /healthz, Method: GET, Secured: True
      It will return the status of the service

    Operations and Flow:

    Watermark Service APIs are the only ones that will be used by the user/actor to request watermark or add the document. Authentication and Database service APIs are the private ones that will be called by other services internally. The only URL accessible to the user is the API Gateway URL.

    1. The user will access the API Gateway URL with the required user name, the ticket-id and the mark that they want applied to the document
    2. The user should not know about the authentication or database services
    3. Once the request is made by the user, it will be accepted by the API Gateway. The gateway will validate the request along with the payload
    4. An API forwarding rule, which routes the traffic of a specific request to a service, should be defined in the gateway. Once validated, the request will be forwarded to the service according to that rule.
    5. We will define an API forwarding rule where any watermark request is first forwarded to the authentication service, which will authenticate the request, check for authorized users and return the appropriate status code.
    6. The authorization service will look up the requesting user in the user database, along with their roles and permissions, and send the response accordingly
    7. Once the request has been authorized by the service, it will be forwarded back to the actual watermark service
    8. The watermark service then performs the appropriate operation: putting the watermark on the document, adding a new entry for the document, or any other request
    9. The Get, Watermark or AddDocument operation of the watermark service will be performed by calling the database CRUD APIs, and the result will be forwarded to the user
    10. If the request is AddDocument, the service should return the “TicketID”; if it is a watermark request, it should return the status of the operation

    Note:

    Each user will have specific roles, based on which the access controls will be identified for that user. For the sake of simplicity, the roles will be based on the type of document only, not the specific name of the book or journal.

    Getting Started:

    Let’s start by creating a folder for our application in the $GOPATH. This will be the root folder containing our set of services.

    Project Layout:

    The project will follow the standard Golang project layout. If you want the full working code, please refer here

    • api: Stores the versions of the APIs swagger files and also the proto and pb files for the gRPC protobuf interface.
    • cmd: This will contain the entry point (main.go) files for all the services, along with any other container image definitions, if any
    • docs: This will contain the documentation for the project
    • config: All the sample files or any specific configuration files should be stored here
    • deploy: This directory will contain the deployment files used to deploy the application
    • internal: This package is the conventional internal package identified by the Go compiler. It contains all the packages which need to be private and imported by its child directories and immediate parent directory. All the packages from this directory are common across the project
    • pkg: This directory will have the complete executing code of all the services in separate packages.
    • tests: It will have all the integration and E2E tests
    • vendor: This directory stores all the third-party dependencies locally so that versions don’t mismatch later

    We are going to use the Go kit framework for developing the set of services. The official Go kit examples of services are very good, though the documentation is not that great.

    Watermark Service:

    1. Under the Go kit framework, a service should always be represented by an interface.

    Create a package named watermark in the pkg folder. Create a new service.go file in that package. This file is the blueprint of our service.

    package watermark
    
    import (
    	"context"
    
    	"github.com/velotiotech/watermark-service/internal"
    )
    
    type Service interface {
    	// Get the list of all documents
    	Get(ctx context.Context, filters ...internal.Filter) ([]internal.Document, error)
    	Status(ctx context.Context, ticketID string) (internal.Status, error)
    	Watermark(ctx context.Context, ticketID, mark string) (int, error)
    	AddDocument(ctx context.Context, doc *internal.Document) (string, error)
    	ServiceStatus(ctx context.Context) (int, error)
    }
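
    The service interface above refers to types from the project’s internal package, which are not shown in this post. A minimal sketch of what they might look like, inferred from how they are used in the rest of the code (field names and JSON tags here are assumptions), is:

    package internal
    
    // Status represents the watermarking state of a document.
    type Status int
    
    const (
    	Pending Status = iota
    	Started
    	InProgress
    	Finished
    	Failed
    )
    
    // Document is the entity being watermarked.
    type Document struct {
    	Content   string `json:"content"`
    	Title     string `json:"title"`
    	Author    string `json:"author"`
    	Topic     string `json:"topic"`
    	Watermark string `json:"watermark,omitempty"`
    }
    
    // Filter is a key/value pair used to filter documents in Get requests.
    type Filter struct {
    	Key   string `json:"key"`
    	Value string `json:"value"`
    }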

    2. As per the functions defined in the interface, we will need five endpoints to handle the requests for the above methods. If you are wondering why we are using the context package, please refer here. Contexts enable the microservice to handle multiple concurrent requests; we don’t use them heavily in this blog, but passing a context is simply the idiomatic way to work with Go services.

    3. Implementing our service:

    package watermark
    
    import (
    	"context"
    	"net/http"
    	"os"
    
    	"github.com/velotiotech/watermark-service/internal"
    
    	"github.com/go-kit/kit/log"
    	"github.com/lithammer/shortuuid/v3"
    )
    
    type watermarkService struct{}
    
    func NewService() Service { return &watermarkService{} }
    
    func (w *watermarkService) Get(_ context.Context, filters ...internal.Filter) ([]internal.Document, error) {
    	// query the database using the filters and return the list of documents
    	// return error if the filter (key) is invalid and also return error if no item found
    	doc := internal.Document{
    		Content: "book",
    		Title:   "Harry Potter and Half Blood Prince",
    		Author:  "J.K. Rowling",
    		Topic:   "Fiction and Magic",
    	}
    	return []internal.Document{doc}, nil
    }
    
    func (w *watermarkService) Status(_ context.Context, ticketID string) (internal.Status, error) {
    	// query database using the ticketID and return the document info
    	// return err if the ticketID is invalid or no Document exists for that ticketID
    	return internal.InProgress, nil
    }
    
    func (w *watermarkService) Watermark(_ context.Context, ticketID, mark string) (int, error) {
    	// update the database entry with watermark field as non empty
    	// first check if the watermark status is not already in InProgress, Started or Finished state
    	// If yes, then return invalid request
    	// return error if no item found using the ticketID
    	return http.StatusOK, nil
    }
    
    func (w *watermarkService) AddDocument(_ context.Context, doc *internal.Document) (string, error) {
    	// add the document entry in the database by calling the database service
    	// return error if the doc is invalid and/or the database invalid entry error
    	newTicketID := shortuuid.New()
    	return newTicketID, nil
    }
    
    func (w *watermarkService) ServiceStatus(_ context.Context) (int, error) {
    	logger.Log("Checking the Service health...")
    	return http.StatusOK, nil
    }
    
    var logger log.Logger
    
    func init() {
    	logger = log.NewLogfmtLogger(log.NewSyncWriter(os.Stderr))
    	logger = log.With(logger, "ts", log.DefaultTimestampUTC)
    }

    We have defined a new type, watermarkService, as an empty struct that implements the service interface defined above. This struct implementation is hidden from the rest of the world.

    NewService() is created as the constructor of our “object”. This is the only function available outside this package to instantiate the service.

    4. Now we will create the endpoints package, which will contain two files. One is where we will store all the request and response types. The other file, endpoints.go, will have the actual implementation that parses the requests and calls the appropriate service function.

    – Create a file named reqJSONMap.go. We will define all the request and response structs with their fields in this file, such as GetRequest, GetResponse, StatusRequest, StatusResponse, etc. Add to these structs the fields we want as input in a request or as output in the response.

    package endpoints
    
    import "github.com/velotiotech/watermark-service/internal"
    
    type GetRequest struct {
    	Filters []internal.Filter `json:"filters,omitempty"`
    }
    
    type GetResponse struct {
    	Documents []internal.Document `json:"documents"`
    	Err       string              `json:"err,omitempty"`
    }
    
    type StatusRequest struct {
    	TicketID string `json:"ticketID"`
    }
    
    type StatusResponse struct {
    	Status internal.Status `json:"status"`
    	Err    string          `json:"err,omitempty"`
    }
    
    type WatermarkRequest struct {
    	TicketID string `json:"ticketID"`
    	Mark     string `json:"mark"`
    }
    
    type WatermarkResponse struct {
    	Code int    `json:"code"`
    	Err  string `json:"err"`
    }
    
    type AddDocumentRequest struct {
    	Document *internal.Document `json:"document"`
    }
    
    type AddDocumentResponse struct {
    	TicketID string `json:"ticketID"`
    	Err      string `json:"err,omitempty"`
    }
    
    type ServiceStatusRequest struct{}
    
    type ServiceStatusResponse struct {
    	Code int    `json:"status"`
    	Err  string `json:"err,omitempty"`
    }

    – Create a file named endpoints.go. This file will contain the actual calling of the service implemented functions.

    package endpoints
    
    import (
    	"context"
    	"errors"
    	"os"
    
    	"github.com/velotiotech/watermark-service/internal"
    	"github.com/velotiotech/watermark-service/pkg/watermark"
    
    	"github.com/go-kit/kit/endpoint"
    	"github.com/go-kit/kit/log"
    )
    
    type Set struct {
    	GetEndpoint           endpoint.Endpoint
    	AddDocumentEndpoint   endpoint.Endpoint
    	StatusEndpoint        endpoint.Endpoint
    	ServiceStatusEndpoint endpoint.Endpoint
    	WatermarkEndpoint     endpoint.Endpoint
    }
    
    func NewEndpointSet(svc watermark.Service) Set {
    	return Set{
    		GetEndpoint:           MakeGetEndpoint(svc),
    		AddDocumentEndpoint:   MakeAddDocumentEndpoint(svc),
    		StatusEndpoint:        MakeStatusEndpoint(svc),
    		ServiceStatusEndpoint: MakeServiceStatusEndpoint(svc),
    		WatermarkEndpoint:     MakeWatermarkEndpoint(svc),
    	}
    }
    
    func MakeGetEndpoint(svc watermark.Service) endpoint.Endpoint {
    	return func(ctx context.Context, request interface{}) (interface{}, error) {
    		req := request.(GetRequest)
    		docs, err := svc.Get(ctx, req.Filters...)
    		if err != nil {
    			return GetResponse{docs, err.Error()}, nil
    		}
    		return GetResponse{docs, ""}, nil
    	}
    }
    
    func MakeStatusEndpoint(svc watermark.Service) endpoint.Endpoint {
    	return func(ctx context.Context, request interface{}) (interface{}, error) {
    		req := request.(StatusRequest)
    		status, err := svc.Status(ctx, req.TicketID)
    		if err != nil {
    			return StatusResponse{Status: status, Err: err.Error()}, nil
    		}
    		return StatusResponse{Status: status, Err: ""}, nil
    	}
    }
    
    func MakeAddDocumentEndpoint(svc watermark.Service) endpoint.Endpoint {
    	return func(ctx context.Context, request interface{}) (interface{}, error) {
    		req := request.(AddDocumentRequest)
    		ticketID, err := svc.AddDocument(ctx, req.Document)
    		if err != nil {
    			return AddDocumentResponse{TicketID: ticketID, Err: err.Error()}, nil
    		}
    		return AddDocumentResponse{TicketID: ticketID, Err: ""}, nil
    	}
    }
    
    func MakeWatermarkEndpoint(svc watermark.Service) endpoint.Endpoint {
    	return func(ctx context.Context, request interface{}) (interface{}, error) {
    		req := request.(WatermarkRequest)
    		code, err := svc.Watermark(ctx, req.TicketID, req.Mark)
    		if err != nil {
    			return WatermarkResponse{Code: code, Err: err.Error()}, nil
    		}
    		return WatermarkResponse{Code: code, Err: ""}, nil
    	}
    }
    
    func MakeServiceStatusEndpoint(svc watermark.Service) endpoint.Endpoint {
    	return func(ctx context.Context, request interface{}) (interface{}, error) {
    		_ = request.(ServiceStatusRequest)
    		code, err := svc.ServiceStatus(ctx)
    		if err != nil {
    			return ServiceStatusResponse{Code: code, Err: err.Error()}, nil
    		}
    		return ServiceStatusResponse{Code: code, Err: ""}, nil
    	}
    }
    
    func (s *Set) Get(ctx context.Context, filters ...internal.Filter) ([]internal.Document, error) {
    	resp, err := s.GetEndpoint(ctx, GetRequest{Filters: filters})
    	if err != nil {
    		return []internal.Document{}, err
    	}
    	getResp := resp.(GetResponse)
    	if getResp.Err != "" {
    		return []internal.Document{}, errors.New(getResp.Err)
    	}
    	return getResp.Documents, nil
    }
    
    func (s *Set) ServiceStatus(ctx context.Context) (int, error) {
    	resp, err := s.ServiceStatusEndpoint(ctx, ServiceStatusRequest{})
    	if err != nil {
    		return 0, err
    	}
    	svcStatusResp := resp.(ServiceStatusResponse)
    	if svcStatusResp.Err != "" {
    		return svcStatusResp.Code, errors.New(svcStatusResp.Err)
    	}
    	return svcStatusResp.Code, nil
    }
    
    func (s *Set) AddDocument(ctx context.Context, doc *internal.Document) (string, error) {
    	resp, err := s.AddDocumentEndpoint(ctx, AddDocumentRequest{Document: doc})
    	if err != nil {
    		return "", err
    	}
    	adResp := resp.(AddDocumentResponse)
    	if adResp.Err != "" {
    		return "", errors.New(adResp.Err)
    	}
    	return adResp.TicketID, nil
    }
    
    func (s *Set) Status(ctx context.Context, ticketID string) (internal.Status, error) {
    	resp, err := s.StatusEndpoint(ctx, StatusRequest{TicketID: ticketID})
    	if err != nil {
    		return internal.Failed, err
    	}
    	stsResp := resp.(StatusResponse)
    	if stsResp.Err != "" {
    		return internal.Failed, errors.New(stsResp.Err)
    	}
    	return stsResp.Status, nil
    }
    
    func (s *Set) Watermark(ctx context.Context, ticketID, mark string) (int, error) {
    	resp, err := s.WatermarkEndpoint(ctx, WatermarkRequest{TicketID: ticketID, Mark: mark})
    	if err != nil {
    		return 0, err
    	}
    	wmResp := resp.(WatermarkResponse)
    	if wmResp.Err != "" {
    		return wmResp.Code, errors.New(wmResp.Err)
    	}
    	return wmResp.Code, nil
    }
    
    var logger log.Logger
    
    func init() {
    	logger = log.NewLogfmtLogger(log.NewSyncWriter(os.Stderr))
    	logger = log.With(logger, "ts", log.DefaultTimestampUTC)
    }

    In this file, we have a struct Set, which is the collection of all the endpoints, along with a constructor for it. We also have the constructor functions, such as MakeGetEndpoint() and MakeStatusEndpoint(), which return objects implementing Go kit’s generic endpoint.Endpoint interface.

    In order to expose the Get, Status, Watermark, ServiceStatus and AddDocument APIs, we need to create endpoints for all of them. These functions handle the incoming requests and call the specific service methods.

    5. Adding the transport layer to expose the services. Our services will support HTTP, exposed as REST APIs, as well as protobuf over gRPC.

    Create a separate transport package in the watermark directory. This package will hold all the handlers, decoders and encoders for each type of transport mechanism.

    6. Create a file http.go: This file will have the transport functions and handlers for HTTP, with a separate path for each API route.

    package transport
    
    import (
    	"context"
    	"encoding/json"
    	"net/http"
    	"os"
    
    	"github.com/velotiotech/watermark-service/internal/util"
    	"github.com/velotiotech/watermark-service/pkg/watermark/endpoints"
    
    	"github.com/go-kit/kit/log"
    	httptransport "github.com/go-kit/kit/transport/http"
    )
    
    func NewHTTPHandler(ep endpoints.Set) http.Handler {
    	m := http.NewServeMux()
    
    	m.Handle("/healthz", httptransport.NewServer(
    		ep.ServiceStatusEndpoint,
    		decodeHTTPServiceStatusRequest,
    		encodeResponse,
    	))
    	m.Handle("/status", httptransport.NewServer(
    		ep.StatusEndpoint,
    		decodeHTTPStatusRequest,
    		encodeResponse,
    	))
    	m.Handle("/addDocument", httptransport.NewServer(
    		ep.AddDocumentEndpoint,
    		decodeHTTPAddDocumentRequest,
    		encodeResponse,
    	))
    	m.Handle("/get", httptransport.NewServer(
    		ep.GetEndpoint,
    		decodeHTTPGetRequest,
    		encodeResponse,
    	))
    	m.Handle("/watermark", httptransport.NewServer(
    		ep.WatermarkEndpoint,
    		decodeHTTPWatermarkRequest,
    		encodeResponse,
    	))
    
    	return m
    }
    
    func decodeHTTPGetRequest(_ context.Context, r *http.Request) (interface{}, error) {
    	var req endpoints.GetRequest
    	if r.ContentLength == 0 {
    		logger.Log("Get request with no body")
    		return req, nil
    	}
    	err := json.NewDecoder(r.Body).Decode(&req)
    	if err != nil {
    		return nil, err
    	}
    	return req, nil
    }
    
    func decodeHTTPStatusRequest(ctx context.Context, r *http.Request) (interface{}, error) {
    	var req endpoints.StatusRequest
    	err := json.NewDecoder(r.Body).Decode(&req)
    	if err != nil {
    		return nil, err
    	}
    	return req, nil
    }
    
    func decodeHTTPWatermarkRequest(_ context.Context, r *http.Request) (interface{}, error) {
    	var req endpoints.WatermarkRequest
    	err := json.NewDecoder(r.Body).Decode(&req)
    	if err != nil {
    		return nil, err
    	}
    	return req, nil
    }
    
    func decodeHTTPAddDocumentRequest(_ context.Context, r *http.Request) (interface{}, error) {
    	var req endpoints.AddDocumentRequest
    	err := json.NewDecoder(r.Body).Decode(&req)
    	if err != nil {
    		return nil, err
    	}
    	return req, nil
    }
    
    func decodeHTTPServiceStatusRequest(_ context.Context, _ *http.Request) (interface{}, error) {
    	var req endpoints.ServiceStatusRequest
    	return req, nil
    }
    
    func encodeResponse(ctx context.Context, w http.ResponseWriter, response interface{}) error {
    	if e, ok := response.(error); ok && e != nil {
    		encodeError(ctx, e, w)
    		return nil
    	}
    	return json.NewEncoder(w).Encode(response)
    }
    
    func encodeError(_ context.Context, err error, w http.ResponseWriter) {
    	w.Header().Set("Content-Type", "application/json; charset=utf-8")
    	switch err {
    	case util.ErrUnknown:
    		w.WriteHeader(http.StatusNotFound)
    	case util.ErrInvalidArgument:
    		w.WriteHeader(http.StatusBadRequest)
    	default:
    		w.WriteHeader(http.StatusInternalServerError)
    	}
    	json.NewEncoder(w).Encode(map[string]interface{}{
    		"error": err.Error(),
    	})
    }
    
    var logger log.Logger
    
    func init() {
    	logger = log.NewLogfmtLogger(log.NewSyncWriter(os.Stderr))
    	logger = log.With(logger, "ts", log.DefaultTimestampUTC)
    }

    This file maps the JSON payloads to their request and response structs. It contains the HTTP handler constructor, which registers each API route with its specific handler function (endpoint) plus a decoder for the request and an encoder for the response, wrapped into a server object per route. The decoders and encoders are defined simply to translate the requests and responses into the desired form for processing; in our case, we just convert them into the appropriate request and response structs using the JSON encoder and decoder.

    We have the generic encoder for the response output, which is a simple JSON encoder.
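
    To see how these pieces fit together, here is a minimal, HTTP-only sketch of what the entry point under cmd could look like (the file location, port and error handling here are illustrative assumptions, not taken from this post):

    package main
    
    import (
    	"log"
    	"net/http"
    
    	"github.com/velotiotech/watermark-service/pkg/watermark"
    	"github.com/velotiotech/watermark-service/pkg/watermark/endpoints"
    	"github.com/velotiotech/watermark-service/pkg/watermark/transport"
    )
    
    func main() {
    	svc := watermark.NewService()            // concrete service implementation
    	eps := endpoints.NewEndpointSet(svc)     // endpoint set built on top of the service
    	handler := transport.NewHTTPHandler(eps) // HTTP transport with all routes registered
    
    	// Serve the REST API; the address is an assumption for this sketch.
    	log.Fatal(http.ListenAndServe(":8081", handler))
    }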

    7. Create another file in the same transport package with the name grpc.go. Similar to the above, the name of the file is self-explanatory: it maps the protobuf payloads to their request and response structs. We create a gRPC server constructor which builds the set of gRPC handlers, registering each endpoint with the decoders and encoders for its requests and responses.

    – Before moving on to the implementation, we have to create a proto file that acts as the definition of our service interface and the request/response structs, so that the protobuf (pb) files can be generated and used as the interface for the services to communicate.

    – Create package pb in the api/v1 package path. Create a new file watermarksvc.proto. Firstly, we will create our service interface, which represents the remote functions to be called by the client. Refer to this for syntax and deep understanding of the protobuf.

    We will translate the Go service interface into a service definition in the proto file. We also recreate the request and response structs as messages in the proto file so that they can be understood by the RPCs defined in the service.

    syntax = "proto3";
    
    package pb;
    
    service Watermark {
        rpc Get (GetRequest) returns (GetReply) {}
    
        rpc Watermark (WatermarkRequest) returns (WatermarkReply) {}
    
        rpc Status (StatusRequest) returns (StatusReply) {}
    
        rpc AddDocument (AddDocumentRequest) returns (AddDocumentReply) {}
    
        rpc ServiceStatus (ServiceStatusRequest) returns (ServiceStatusReply) {}
    }
    
    message Document {
        string content = 1;
        string title = 2;
        string author = 3;
        string topic = 4;
        string watermark = 5;
    }
    
    message GetRequest {
        message Filters {
            string key = 1;
            string value = 2;
        }
        repeated Filters filters = 1;
    }
    
    message GetReply {
        repeated Document documents = 1;
        string Err = 2;
    }
    
    message StatusRequest {
        string ticketID = 1;
    }
    
    message StatusReply {
        enum Status {
            PENDING = 0;
            STARTED = 1;
            IN_PROGRESS = 2;
            FINISHED = 3;
            FAILED = 4;
        }
        Status status = 1;
        string Err = 2;
    }
    
    message WatermarkRequest {
        string ticketID = 1;
        string mark = 2;
    }
    
    message WatermarkReply {
        int64 code = 1;
        string err = 2;
    }
    
    message AddDocumentRequest {
        Document document = 1;
    }
    
    message AddDocumentReply {
        string ticketID = 1;
        string err = 2;
    }
    
    message ServiceStatusRequest {}
    
    message ServiceStatusReply {
        int64 code = 1;
        string err = 2;
    }

    Note: Creating the proto files and generating the pb files using protoc is not in the scope of this blog. We assume that you already know how to create a proto file and generate a pb file from it. If not, please refer to protobuf and protoc gen.

    I have also created a script to generate the pb file, which just needs the path with the name of the proto file.

    #!/usr/bin/env sh
    
    # Install proto3 from source
    #  brew install autoconf automake libtool
    #  git clone https://github.com/google/protobuf
    #  ./autogen.sh ; ./configure ; make ; make install
    #
    # Update protoc Go bindings via
    #  go get -u github.com/golang/protobuf/{proto,protoc-gen-go}
    #
    # See also
    #  https://github.com/grpc/grpc-go/tree/master/examples
    
    REPO_ROOT="${REPO_ROOT:-$(cd "$(dirname "$0")/../.." && pwd)}"
    PB_PATH="${REPO_ROOT}/api/v1/pb"
    PROTO_FILE=${1:-"watermarksvc.proto"}
    
    
    echo "Generating pb files for ${PROTO_FILE} service"
    protoc -I="${PB_PATH}"  "${PB_PATH}/${PROTO_FILE}" --go_out=plugins=grpc:"${PB_PATH}"

    8. Now, once the pb file is generated in the api/v1/pb/watermark package, we will create a new struct, grpcServer, grouping all the endpoints for gRPC. This struct should implement the generated WatermarkServer interface, which is the server interface referred to by the services.

    To implement these services, we define functions such as func (g *grpcServer) Get(ctx context.Context, r *pb.GetRequest) (*pb.GetReply, error). This function takes the request param, runs the handler’s ServeGRPC() function, and then returns the response. We implement the rest of the methods similarly.

    These functions are the actual Remote Procedures to be called by the service.

    We will also need to add the decode and encode functions for translating the request and response structs from the protobuf structs. These functions map the proto request/response structs to the endpoint request/response structs. For example, func decodeGRPCGetRequest(_ context.Context, grpcReq interface{}) (interface{}, error) type-asserts grpcReq to *pb.GetRequest and uses its fields to fill a new struct of type endpoints.GetRequest{}. The decoding and encoding functions for the other requests and responses are implemented similarly.

    package transport
    
    import (
    	"context"
    
    	"github.com/velotiotech/watermark-service/api/v1/pb/watermark"
    
    	"github.com/velotiotech/watermark-service/internal"
    	"github.com/velotiotech/watermark-service/pkg/watermark/endpoints"
    
    	grpctransport "github.com/go-kit/kit/transport/grpc"
    )
    
    type grpcServer struct {
    	get           grpctransport.Handler
    	status        grpctransport.Handler
    	addDocument   grpctransport.Handler
    	watermark     grpctransport.Handler
    	serviceStatus grpctransport.Handler
    }
    
    func NewGRPCServer(ep endpoints.Set) watermark.WatermarkServer {
    	return &grpcServer{
    		get: grpctransport.NewServer(
    			ep.GetEndpoint,
    			decodeGRPCGetRequest,
    			decodeGRPCGetResponse,
    		),
    		status: grpctransport.NewServer(
    			ep.StatusEndpoint,
    			decodeGRPCStatusRequest,
    			decodeGRPCStatusResponse,
    		),
    		addDocument: grpctransport.NewServer(
    			ep.AddDocumentEndpoint,
    			decodeGRPCAddDocumentRequest,
    			decodeGRPCAddDocumentResponse,
    		),
    		watermark: grpctransport.NewServer(
    			ep.WatermarkEndpoint,
    			decodeGRPCWatermarkRequest,
    			decodeGRPCWatermarkResponse,
    		),
    		serviceStatus: grpctransport.NewServer(
    			ep.ServiceStatusEndpoint,
    			decodeGRPCServiceStatusRequest,
    			decodeGRPCServiceStatusResponse,
    		),
    	}
    }
    
    func (g *grpcServer) Get(ctx context.Context, r *watermark.GetRequest) (*watermark.GetReply, error) {
    	_, rep, err := g.get.ServeGRPC(ctx, r)
    	if err != nil {
    		return nil, err
    	}
    	return rep.(*watermark.GetReply), nil
    }
    
    func (g *grpcServer) ServiceStatus(ctx context.Context, r *watermark.ServiceStatusRequest) (*watermark.ServiceStatusReply, error) {
    	_, rep, err := g.serviceStatus.ServeGRPC(ctx, r)
    	if err != nil {
    		return nil, err
    	}
    	return rep.(*watermark.ServiceStatusReply), nil
    }
    
    func (g *grpcServer) AddDocument(ctx context.Context, r *watermark.AddDocumentRequest) (*watermark.AddDocumentReply, error) {
    	_, rep, err := g.addDocument.ServeGRPC(ctx, r)
    	if err != nil {
    		return nil, err
    	}
    	return rep.(*watermark.AddDocumentReply), nil
    }
    
    func (g *grpcServer) Status(ctx context.Context, r *watermark.StatusRequest) (*watermark.StatusReply, error) {
    	_, rep, err := g.status.ServeGRPC(ctx, r)
    	if err != nil {
    		return nil, err
    	}
    	return rep.(*watermark.StatusReply), nil
    }
    
    func (g *grpcServer) Watermark(ctx context.Context, r *watermark.WatermarkRequest) (*watermark.WatermarkReply, error) {
    	_, rep, err := g.watermark.ServeGRPC(ctx, r)
    	if err != nil {
    		return nil, err
    	}
    	return rep.(*watermark.WatermarkReply), nil
    }
    
    func decodeGRPCGetRequest(_ context.Context, grpcReq interface{}) (interface{}, error) {
    	req := grpcReq.(*watermark.GetRequest)
    	var filters []internal.Filter
    	for _, f := range req.Filters {
    		filters = append(filters, internal.Filter{Key: f.Key, Value: f.Value})
    	}
    	return endpoints.GetRequest{Filters: filters}, nil
    }
    
    func decodeGRPCStatusRequest(_ context.Context, grpcReq interface{}) (interface{}, error) {
    	req := grpcReq.(*watermark.StatusRequest)
    	return endpoints.StatusRequest{TicketID: req.TicketID}, nil
    }
    
    func decodeGRPCWatermarkRequest(_ context.Context, grpcReq interface{}) (interface{}, error) {
    	req := grpcReq.(*watermark.WatermarkRequest)
    	return endpoints.WatermarkRequest{TicketID: req.TicketID, Mark: req.Mark}, nil
    }
    
    func decodeGRPCAddDocumentRequest(_ context.Context, grpcReq interface{}) (interface{}, error) {
    	req := grpcReq.(*watermark.AddDocumentRequest)
    	doc := &internal.Document{
    		Content:   req.Document.Content,
    		Title:     req.Document.Title,
    		Author:    req.Document.Author,
    		Topic:     req.Document.Topic,
    		Watermark: req.Document.Watermark,
    	}
    	return endpoints.AddDocumentRequest{Document: doc}, nil
    }
    
    func decodeGRPCServiceStatusRequest(_ context.Context, grpcReq interface{}) (interface{}, error) {
    	return endpoints.ServiceStatusRequest{}, nil
    }
    
    func decodeGRPCGetResponse(_ context.Context, grpcReply interface{}) (interface{}, error) {
    	reply := grpcReply.(*watermark.GetReply)
    	var docs []internal.Document
    	for _, d := range reply.Documents {
    		doc := internal.Document{
    			Content:   d.Content,
    			Title:     d.Title,
    			Author:    d.Author,
    			Topic:     d.Topic,
    			Watermark: d.Watermark,
    		}
    		docs = append(docs, doc)
    	}
    	return endpoints.GetResponse{Documents: docs, Err: reply.Err}, nil
    }
    
    func decodeGRPCStatusResponse(_ context.Context, grpcReply interface{}) (interface{}, error) {
    	reply := grpcReply.(*watermark.StatusReply)
    	return endpoints.StatusResponse{Status: internal.Status(reply.Status), Err: reply.Err}, nil
    }
    
    func decodeGRPCWatermarkResponse(ctx context.Context, grpcReply interface{}) (interface{}, error) {
    	reply := grpcReply.(*watermark.WatermarkReply)
    	return endpoints.WatermarkResponse{Code: int(reply.Code), Err: reply.Err}, nil
    }
    
    func decodeGRPCAddDocumentResponse(ctx context.Context, grpcReply interface{}) (interface{}, error) {
    	reply := grpcReply.(*watermark.AddDocumentReply)
    	return endpoints.AddDocumentResponse{TicketID: reply.TicketID, Err: reply.Err}, nil
    }
    
    func decodeGRPCServiceStatusResponse(ctx context.Context, grpcReply interface{}) (interface{}, error) {
    	reply := grpcReply.(*watermark.ServiceStatusReply)
    	return endpoints.ServiceStatusResponse{Code: int(reply.Code), Err: reply.Err}, nil
    }

9. Finally, we create the entry point (main) files in cmd for each service. We have already mapped the HTTP routes to the endpoints by calling the service functions, and mapped the proto service server to the endpoints via the ServeGRPC() handlers; now we just have to call the HTTP and gRPC server constructors and start the servers.

Create a package watermark in the cmd directory and add a file watermark.go, which will hold the code to start and stop the HTTP and gRPC servers for the service.

    package main
    
    import (
    	"fmt"
    	"net"
    	"net/http"
    	"os"
    	"os/signal"
    	"syscall"
    
    	pb "github.com/velotiotech/watermark-service/api/v1/pb/watermark"
    	"github.com/velotiotech/watermark-service/pkg/watermark"
    	"github.com/velotiotech/watermark-service/pkg/watermark/endpoints"
    	"github.com/velotiotech/watermark-service/pkg/watermark/transport"
    
    	"github.com/go-kit/kit/log"
    	kitgrpc "github.com/go-kit/kit/transport/grpc"
    	"github.com/oklog/oklog/pkg/group"
    	"google.golang.org/grpc"
    )
    
    const (
    	defaultHTTPPort = "8081"
    	defaultGRPCPort = "8082"
    )
    
    func main() {
    	var (
    		logger   log.Logger
    		httpAddr = net.JoinHostPort("localhost", envString("HTTP_PORT", defaultHTTPPort))
    		grpcAddr = net.JoinHostPort("localhost", envString("GRPC_PORT", defaultGRPCPort))
    	)
    
    	logger = log.NewLogfmtLogger(log.NewSyncWriter(os.Stderr))
    	logger = log.With(logger, "ts", log.DefaultTimestampUTC)
    
    	var (
    		service     = watermark.NewService()
    		eps         = endpoints.NewEndpointSet(service)
    		httpHandler = transport.NewHTTPHandler(eps)
    		grpcServer  = transport.NewGRPCServer(eps)
    	)
    
    	var g group.Group
    	{
    		// The HTTP listener mounts the Go kit HTTP handler we created.
    		httpListener, err := net.Listen("tcp", httpAddr)
    		if err != nil {
    			logger.Log("transport", "HTTP", "during", "Listen", "err", err)
    			os.Exit(1)
    		}
    		g.Add(func() error {
    			logger.Log("transport", "HTTP", "addr", httpAddr)
    			return http.Serve(httpListener, httpHandler)
    		}, func(error) {
    			httpListener.Close()
    		})
    	}
    	{
    		// The gRPC listener mounts the Go kit gRPC server we created.
    		grpcListener, err := net.Listen("tcp", grpcAddr)
    		if err != nil {
    			logger.Log("transport", "gRPC", "during", "Listen", "err", err)
    			os.Exit(1)
    		}
    		g.Add(func() error {
    			logger.Log("transport", "gRPC", "addr", grpcAddr)
    			// we add the Go Kit gRPC Interceptor to our gRPC service as it is used by
    			// the here demonstrated zipkin tracing middleware.
    			baseServer := grpc.NewServer(grpc.UnaryInterceptor(kitgrpc.Interceptor))
    			pb.RegisterWatermarkServer(baseServer, grpcServer)
    			return baseServer.Serve(grpcListener)
    		}, func(error) {
    			grpcListener.Close()
    		})
    	}
    	{
    		// This function just sits and waits for ctrl-C.
    		cancelInterrupt := make(chan struct{})
    		g.Add(func() error {
    			c := make(chan os.Signal, 1)
    			signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)
    			select {
    			case sig := <-c:
    				return fmt.Errorf("received signal %s", sig)
    			case <-cancelInterrupt:
    				return nil
    			}
    		}, func(error) {
    			close(cancelInterrupt)
    		})
    	}
    	logger.Log("exit", g.Run())
    }
    
    func envString(env, fallback string) string {
    	e := os.Getenv(env)
    	if e == "" {
    		return fallback
    	}
    	return e
    }

Let’s walk through the above code. First, we use fixed default ports so the servers have something to listen on: 8081 for the HTTP server and 8082 for the gRPC server (both overridable via the HTTP_PORT and GRPC_PORT environment variables). Then, in the stubs below, we create the service, its endpoint set, and the HTTP and gRPC servers.

    service = watermark.NewService()
    eps = endpoints.NewEndpointSet(service)
    grpcServer = transport.NewGRPCServer(eps)
    httpHandler = transport.NewHTTPHandler(eps)

Now the next step is interesting. We create a variable of type group.Group from the oklog package (if you are new to this, please refer here). Group helps you manage a set of goroutines elegantly. We create three goroutines: one for the HTTP server, one for the gRPC server, and one that watches for cancel interrupts. Just like this:

    g.Add(func() error {
        logger.Log("transport", "HTTP", "addr", httpAddr)
        return http.Serve(httpListener, httpHandler)
    }, func(error) {
        httpListener.Close()
    })

    Similarly, we will start a gRPC server and a cancel interrupt watcher.
    Great!! We are done here. Now, let’s run the service.

    go run ./cmd/watermark/watermark.go

The server has started locally. Now, open Postman or run curl against one of the endpoints. Below, we hit the HTTP server to check the service status:

    ~ curl http://localhost:8081/healthz
    {"status":200}

We have successfully created a service and exercised its endpoints.

    Further:

I like to make a project complete, with all the maintenance pieces around it: a proper README, .gitignore, .dockerignore, Makefile, Dockerfiles, golangci-lint config, CI/CD config files, and so on.

    I have created a separate Dockerfile for each of the three services in path /images/.

I have created a multi-stage Dockerfile to build the service binary and run it. The first stage copies the relevant source directories into the image and builds the binary; a second stage in the same file then starts from a fresh base image and copies only the binary over from the previous stage, as sketched below. The Dockerfiles for the other services are written similarly.
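
A minimal sketch of that multi-stage layout, assuming the binary is built from ./cmd/watermark (image tags and paths are illustrative, not the repo's actual Dockerfile):

# Stage 1: build the watermark binary
FROM golang:1.16 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/watermark ./cmd/watermark

# Stage 2: copy only the binary into a small runtime image
FROM alpine:3.13
COPY --from=builder /out/watermark /usr/local/bin/watermark
CMD ["watermark"]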

In the Dockerfile, we have given the CMD as go run watermark; this command is the entry point of the container.
I have also created a Makefile with two main targets: build-image and build-push. The first builds the image and the second pushes it.
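
A rough sketch of what those targets might look like (the image name and Dockerfile path are placeholders, and recipe lines must be indented with tabs):

# IMAGE is a placeholder; point it at your own registry and tag.
IMAGE ?= <registry>/watermark-service:latest

build-image:
	docker build -t $(IMAGE) -f images/watermark/Dockerfile .

build-push: build-image
	docker push $(IMAGE)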

Note: I am keeping this blog concise, as it is difficult to cover everything here. The repo I shared at the beginning covers most of the important concepts around these services, and I am still committing improvements and features.

    Let’s see how we can deploy:

We will see how to deploy all these services on a container orchestration platform (for example, Kubernetes). This assumes you have at least a beginner's understanding of Kubernetes.

In the deploy dir, create a sample deployment with three containers: auth, watermark, and database. Since the entry point commands for each container are already defined in the Dockerfiles, we don't need to pass any args or cmd in the deployment.

We also need a Kubernetes Service to route external traffic to the pods, either behind a load balancer or exposed directly. To keep things simple for now, we can create a NodePort-type service to expose the watermark-service, as sketched below.
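
A minimal sketch of such a Deployment plus NodePort Service (names, images, and ports are placeholders, not the repo's actual manifests):

# deploy/watermark.yaml (illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: watermark-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: watermark-service
  template:
    metadata:
      labels:
        app: watermark-service
    spec:
      containers:
      - name: watermark
        image: <registry>/watermark-service:latest
      - name: auth
        image: <registry>/auth-service:latest
      - name: database
        image: <registry>/database-service:latest
---
# NodePort service to expose the watermark HTTP port externally for now
apiVersion: v1
kind: Service
metadata:
  name: watermark-service
spec:
  type: NodePort
  selector:
    app: watermark-service
  ports:
  - name: http
    port: 8081
    targetPort: 8081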

Another important and very interesting part is deploying the API gateway. This requires at least some familiarity with a cloud provider's stack. I used Azure, deploying an API gateway with the resource called “API Management” (APIM); refer to the rules config files for the Azure APIM API gateway.

Beyond that, only a proper CI/CD setup remains, which is one of the most essential parts of a project after development.
I would like to discuss all of this deployment-related work in more detail, but that is out of scope for this blog. Maybe I will write another post on it.

    Wrapping up:

We have learned how to build a complete project with three microservices in Golang using one of the best distributed-system development frameworks: Go kit. We also used PostgreSQL through GORM, an ORM used heavily in the Go community.
We did not stop at development: we also walked through the rest of the project's lifecycle, covering what, how, and where to deploy.

We created one microservice completely from scratch. Go kit makes it very simple to wire endpoints, service implementations, and the communication/transport mechanisms together. Now, go and try to create the other services from the problem statement.

  • Building Your First AWS Serverless Application? Here’s Everything You Need to Know

A serverless architecture is a way to implement and run applications, services, or microservices without needing to manage infrastructure. Your application still runs on servers, but all the server management is done by AWS: we no longer need to provision, scale, or maintain servers to run our applications, databases, and storage systems. These managed services provide ready-made building blocks, so developers don't have to build every piece of an application from scratch.

    Why Serverless

1. More focus on development rather than managing servers.
2. Cost effective.
3. Applications that scale automatically.
4. Quick application setup.

Services For Serverless

Multiple cloud providers offer services for implementing a serverless architecture, though here we will focus on AWS. The following are the services we can use, depending on the application's requirements.

    1. Lambda: It is used to write business logic / schedulers / functions.
2. S3: It is mostly used for storing objects, but it also lets you host web apps. You can host a static website on S3.
3. API Gateway: It is used for creating, publishing, maintaining, monitoring and securing REST and WebSocket APIs at any scale.
4. Cognito: It provides authentication, authorization & user management for your web and mobile apps. Your users can sign in directly with a username and password or through third parties such as Facebook, Amazon or Google.
    5. DynamoDB: It is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.

    Three-tier Serverless Architecture

So, let’s take a use case in which you want to develop a three-tier serverless application. The three-tier architecture is a popular pattern for user-facing applications. The tiers that comprise the architecture are the presentation tier, the logic tier, and the data tier. The presentation tier is the component users directly interact with, such as a web page or mobile app UI. The logic tier contains the code that translates user actions in the presentation tier into the functionality that drives the application’s behaviour. The data tier consists of your storage media (databases, file systems, object stores) that hold the data relevant to the application. The figure shows a simple three-tier application.

     Figure: Simple Three-Tier Architectural Pattern

    Presentation Tier

The presentation tier represents the View part of the application. Here you can use S3 to host a static website. On a static website, individual web pages contain static content, and they can also contain client-side scripting.

    The following is a quick procedure to configure an Amazon S3 bucket for static website hosting in the S3 console.

    To configure an S3 bucket for static website hosting

1. Log in to the AWS Management Console and open the S3 console.

    2. In the Bucket name list, choose the name of the bucket that you want to enable static website hosting for.

    3. Choose Properties.

    4. Choose Static Website Hosting

    Once you enable your bucket for static website hosting, browsers can access all of your content through the Amazon S3 website endpoint for your bucket.

    5. Choose Use this bucket to host.

A. For Index Document, type the name of your index document, which is typically named index.html. When you configure an S3 bucket for website hosting, you must specify an index document, which S3 returns when requests are made to the root domain or to any of the subfolders.

B. (Optional) For 4XX errors, you can provide your own custom error document that gives additional guidance to your users. Type the name of the file that contains the custom error document; if an error occurs, S3 returns this document.

C. (Optional) If you want advanced redirection rules, you describe them as XML in the Edit redirection rules text box.
E.g.

    <RoutingRules>
        <RoutingRule>
            <Condition>
                <HttpErrorCodeReturnedEquals>403</HttpErrorCodeReturnedEquals>
            </Condition>
            <Redirect>
                <HostName>mywebsite.com</HostName>
                <ReplaceKeyPrefixWith>notfound/</ReplaceKeyPrefixWith>
            </Redirect>
        </RoutingRule>
    </RoutingRules>

    6. Choose Save

7. Add a bucket policy to the website bucket that grants everyone access to the objects in it. When you configure an S3 bucket as a website, you must make the objects you want to serve publicly readable. To do so, you write a bucket policy that grants everyone the s3:GetObject permission. The following bucket policy grants everyone access to the objects in the example-bucket bucket.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": [
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::example-bucket/*"
                ]
            }
        ]
    }

Note: If you choose Disable Website Hosting, S3 removes the website configuration from the bucket, so the bucket is no longer accessible from the website endpoint, but it is still available at the REST endpoint.
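
If you prefer the command line, the same website configuration can be applied with the AWS CLI (the bucket name is a placeholder):

# Enable static website hosting on an existing bucket
aws s3 website s3://example-bucket/ --index-document index.html --error-document error.html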

    Logic Tier

The logic tier represents the brains of the application, and this is where using the two core serverless services, API Gateway and Lambda, can be so revolutionary. Together, these two services let you build a serverless production application that is highly scalable, available, and secure. Your application could require the capacity of many servers, yet by leveraging this pattern you do not have to manage a single one. In addition, by using these managed services together you get the following benefits:

    1. No operating system to choose, secure or manage.
2. No servers to right-size or monitor.
3. No risk to your cost from over-provisioning.
4. No risk to your performance from under-provisioning.

    API Gateway

API Gateway is a fully managed service for defining, deploying, and maintaining APIs. Anyone can integrate with the APIs using standard HTTPS requests. It also has specific features and qualities that make it a natural edge for your logic tier.

    Integration with Lambda

API Gateway gives your application a simple way to invoke AWS Lambda directly over HTTPS. It forms the bridge that connects your presentation tier and the functions you write in Lambda. After you define the client/server relationship using your API, the contents of the client's HTTPS request are passed to the Lambda function for execution. The contents include request metadata, request headers, and the request body.
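
For example, if you use the Lambda proxy integration, those pieces arrive on the handler's event argument under the standard proxy-event keys; a minimal sketch (not the exact function used later in this walkthrough):

import json

def lambda_handler(event, context):
    # With proxy integration, API Gateway passes the HTTP request details in `event`.
    method = event.get("httpMethod")
    headers = event.get("headers") or {}
    body = json.loads(event["body"]) if event.get("body") else {}

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "method": method,
            "userAgent": headers.get("User-Agent"),
            "received": body,
        }),
    }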

    API Performance Across the Globe

Each deployment of API Gateway includes an Amazon CloudFront distribution under the covers. Amazon CloudFront is a content delivery web service that uses Amazon's global network of edge locations as connection points for clients integrating with your API. This helps drive down the total response-time latency of your API. Through its use of multiple edge locations across the world, Amazon CloudFront also gives you capabilities to combat distributed denial-of-service (DDoS) attack scenarios.

You can improve the performance of specific API requests by using API Gateway to store responses in an optional in-memory cache. This not only provides performance benefits for repeated API requests, but also reduces backend executions, which can reduce overall cost.

    Let’s dive into each step

    1. Create Lambda Function
Log in to the AWS Console, head over to the Lambda service, and click on “Create a function”.

    A. Choose first option “Author from scratch”
    B. Enter Function Name
    C. Select Runtime e.g. Python 2.7
    D. Click on “Create Function”

Once your function is created, you can see that a basic function has been generated in the language you chose.
E.g.

    import json
    
    def lambda_handler(event, context):
        # TODO implement
        return {
            'statusCode': 200,
            'body': json.dumps('Hello from Lambda!')
        }

    2. Testing Lambda Function

Click on the “Test” button at the top-right corner, where we need to configure a test event. As we are not sending any events, just give the event a name, keep the “Hello World” template as it is, and click “Create”.

Now, when you hit the “Test” button again, it runs the function we created earlier and returns the configured value.

    Create & Configure API Gateway connecting to Lambda

We are done with creating the Lambda function, but how do we invoke it from the outside world? We need an endpoint, right?

Go to API Gateway and click on “Get Started”. It will suggest creating an Example API, but we will not use that; instead, we will create a “New API”. Give it a name and keep the “Endpoint Type” as Regional for now.

Create the API and you will land on the “Resources” page of the newly created API Gateway. Go through the following steps:

A. Click on “Actions”, then click on “Create Method”. Select the GET method for our function, then click the tick mark on the right side of “GET” to set it up.
B. Choose “Lambda Function” as the integration type.
C. Choose the region where we created the Lambda function earlier.
D. Enter the name of the Lambda function we created.
E. Save the method; it will ask you to confirm “Add Permission to Lambda Function”. Agree to that and it's done.
F. Now we can test our setup. Click on “Test” to run the API. It should return the same response text we saw on the Lambda test screen.

Now, to get an endpoint, we need to deploy the API. On the “Actions” dropdown, click on “Deploy API” under API Actions, fill in the deployment details, and hit Deploy.

    After that, we will get our HTTPS endpoint.

On the above screen you can see settings like caching, throttling, and logging, which can be configured. Save the changes and browse to the invoke URL, which returns the same response we were earlier getting from Lambda. With that, the logic tier of our serverless application is done.

    Data Tier

By using Lambda as your logic tier, you have a number of storage options for your data tier. These options fall into two broad categories: Amazon VPC-hosted data stores and IAM-enabled data stores. Lambda can integrate securely with both.

    Amazon VPC Hosted Data Stores

    1. Amazon RDS
2. Amazon ElastiCache
    3. Amazon Redshift

    IAM-Enabled Data Stores

    1. Amazon DynamoDB
    2. Amazon S3
    3. Amazon ElasticSearch Service

You can use any of these for storage, but DynamoDB is one of the best suited for serverless applications.

    Why DynamoDB ?

1. It is a NoSQL DB that is fully managed by AWS.
2. It provides fast & predictable performance with seamless scalability.
3. DynamoDB lets you offload the administrative burden of operating and scaling a distributed system.
4. It offers encryption at rest, which eliminates the operational burden and complexity involved in protecting sensitive data.
5. You can scale your tables' throughput capacity up or down without downtime or performance degradation.
6. It provides on-demand backups as well as point-in-time recovery for your DynamoDB tables.
7. DynamoDB can automatically delete expired items from a table to help you reduce storage usage and the cost of storing data that is no longer relevant.

Following is a sample DynamoDB script in Python, which you can adapt for use in a Lambda function (the endpoint_url below points at a local DynamoDB instance).

    from __future__ import print_function # Python 2/3 compatibility
    import boto3
    import json
    import decimal
    from boto3.dynamodb.conditions import Key, Attr
    from botocore.exceptions import ClientError
    
    # Helper class to convert a DynamoDB item to JSON.
    class DecimalEncoder(json.JSONEncoder):
        def default(self, o):
            if isinstance(o, decimal.Decimal):
                if o % 1 > 0:
                    return float(o)
                else:
                    return int(o)
            return super(DecimalEncoder, self).default(o)
    
    dynamodb = boto3.resource("dynamodb", region_name='us-west-2', endpoint_url="http://localhost:8000")
    
    table = dynamodb.Table('Movies')
    
    title = "The Big New Movie"
    year = 2015
    
    try:
        response = table.get_item(
            Key={
                'year': year,
                'title': title
            }
        )
    except ClientError as e:
        print(e.response['Error']['Message'])
    else:
        item = response['Item']
        print("GetItem succeeded:")
        print(json.dumps(item, indent=4, cls=DecimalEncoder))

Note: To run the above script successfully, you need to attach a policy to your Lambda function's role. In this case, you need a policy allowing the DynamoDB operations, plus CloudWatch Logs permissions if you want to store your logs. The following is a policy you can attach to the role for the DB executions.

    {
    	"Version": "2012-10-17",
    	"Statement": [{
    			"Effect": "Allow",
    			"Action": [
    				"dynamodb:BatchGetItem",
    				"dynamodb:GetItem",
    				"dynamodb:Query",
    				"dynamodb:Scan",
    				"dynamodb:BatchWriteItem",
    				"dynamodb:PutItem",
    				"dynamodb:UpdateItem"
    			],
    			"Resource": "arn:aws:dynamodb:eu-west-1:123456789012:table/SampleTable"
    		},
    		{
    			"Effect": "Allow",
    			"Action": [
    				"logs:CreateLogStream",
    				"logs:PutLogEvents"
    			],
    			"Resource": "arn:aws:logs:eu-west-1:123456789012:*"
    		},
    		{
    			"Effect": "Allow",
    			"Action": "logs:CreateLogGroup",
    			"Resource": "*"
    		}
    	]
    }

    Sample Architecture Patterns

You can implement the following popular architecture patterns using API Gateway and Lambda as your logic tier, Amazon S3 for the presentation tier, and DynamoDB as your data tier. For each example, we will only use AWS services that do not require users to manage their own infrastructure.

    Mobile Backend

    1. Presentation Tier: A mobile application running on each user’s smartphone.

2. Logic Tier: API Gateway & Lambda. The logic tier is globally distributed by the Amazon CloudFront distribution created as part of each API Gateway API. One set of Lambda functions can be dedicated to user/device identity management and authentication, managed by Amazon Cognito, which provides integration with IAM for temporary user access credentials as well as with popular third-party identity providers. Other Lambda functions can define the core business logic for your mobile back end.

    3. Data Tier: The various data storage services can be leveraged as needed; options are given above in data tier.

    Amazon S3 Hosted Website

1. Presentation Tier: Static website content hosted on S3, distributed by Amazon CloudFront. Hosting static website content on S3 is a cost-effective alternative to hosting content on server-based infrastructure. However, for a website to offer rich features, the static content often must integrate with a dynamic back end.

2. Logic Tier: API Gateway & Lambda. Static web content hosted in S3 can directly integrate with API Gateway, which can be made CORS compliant.

    3. Data Tier: The various data storage services can be leveraged based on your requirement.

Serverless Costing

At the top of the AWS invoice, we can see the total cost of the AWS services. The bill shown was for 2.1 million API requests and all of the infrastructure required to support them.

    Following is the list of services with their costing.

Note: You can estimate your own costs with the AWS calculators using the following links:

    1. https://calculator.s3.amazonaws.com/index.html
    2. AWS Pricing Calculator

    Conclusion

The three-tier architecture pattern encourages the best practice of creating application components that are easy to maintain, develop, decouple, and scale. Which serverless services you use will vary based on your application's requirements.

  • Learn How to Quickly Setup Istio Using GKE and its Applications

    In this blog, we will try to understand Istio and its YAML configurations. You will also learn why Istio is great for managing traffic and how to set it up using Google Kubernetes Engine (GKE). I’ve also shed some light on deploying Istio in various environments and applications like intelligent routing, traffic shifting, injecting delays, and testing the resiliency of your application.

    What is Istio?

Istio’s website describes it as “An open platform to connect, manage, and secure microservices”.

As a network of microservices, known as a service mesh, grows in size and complexity, it can become tougher to understand and manage. Its requirements can include discovery, load balancing, failure recovery, metrics, and monitoring, and often more complex operational needs such as A/B testing, canary releases, rate limiting, access control, and end-to-end authentication. Istio claims to provide a complete end-to-end solution to these problems.

    Why Istio?

    • Provides automatic load balancing for various protocols like HTTP, gRPC, WebSocket, and TCP traffic. It means you can cater to the needs of web services and also frameworks like Tensorflow (it uses gRPC).
    • To control the flow of traffic and API calls between services, make calls more reliable, and make the network more robust in the face of adverse conditions.
    • To gain understanding of the dependencies between services and the nature and flow of traffic between them, providing the ability to quickly identify issues etc.

    Let’s explore the architecture of Istio.

    Istio’s service mesh is split logically into two components:

1. Data plane – a set of intelligent proxies (Envoy) deployed as sidecars alongside each microservice; they mediate the communication between microservices.
2. Control plane – manages and configures the proxies to route traffic. It also enforces policies.

Envoy – Istio uses an extended version of Envoy (an L7 proxy and communication bus designed for large, modern service-oriented architectures), written in C++. It manages inbound and outbound traffic for the service mesh.

Enough of theory; now let us set up Istio to see things in action. A notable point is that Istio is pretty fast: its control plane is written in Go, and it adds very little overhead to your system.

    Setup Istio on GKE

You can set up Istio either via the command line or via the UI. We have used the command-line installation for this blog.

    Sample Book Review Application

Following this link, you can easily deploy the sample Bookinfo application.

    The Bookinfo application is broken into four separate microservices:

    • productpage. The productpage microservice calls the details and reviews microservices to populate the page.
    • details. The details microservice contains book information.
    • reviews. The reviews microservice contains book reviews. It also calls the ratings microservice.
    • ratings. The ratings microservice contains book ranking information that accompanies a book review.

    There are 3 versions of the reviews microservice:

    • Version v1 doesn’t call the ratings service.
    • Version v2 calls the ratings service and displays each rating as 1 to 5 black stars.
    • Version v3 calls the ratings service and displays each rating as 1 to 5 red stars.

    The end-to-end architecture of the application is shown below.

    If everything goes well, You will have a web app like this (served at http://GATEWAY_URL/productpage)

    Let’s take a case where 50% of traffic is routed to v1 and the remaining 50% to v3.

This is what the config file looks like (/path/to/istio-0.2.12/samples/bookinfo/kube/route-rule-reviews-50-v3.yaml):

    apiVersion: config.istio.io/v1alpha2
    kind: RouteRule
    metadata:
      name: reviews-default
    spec:
      destination:
        name: reviews
      precedence: 1
      route:
      - labels:
          version: v1
        weight: 50
      - labels:
          version: v3
        weight: 50

    Let’s try to understand the config file above.

    Istio provides a simple Domain-specific language (DSL) to control how API calls and layer-4 traffic flow across various services in the application deployment.

In the above configuration, we are adding a “route rule”, which controls how traffic bound for a destination is routed. The destination is the name of the service to which the traffic is being routed, and the route labels identify the specific service instances that will receive the traffic.

In this Kubernetes deployment of Istio, the route labels “version: v1” and “version: v3” indicate that only pods carrying the label “version: v1” or “version: v3” will receive traffic, 50% each.

    Now multiple route rules could be applied to the same destination. The order of evaluation of rules corresponding to a given destination, when there is more than one, can be specified by setting the precedence field of the rule.

    The precedence field is an optional integer value, 0 by default. Rules with higher precedence values are evaluated first. If there is more than one rule with the same precedence value the order of evaluation is undefined.

When is precedence useful? Whenever the routing for a particular service is purely weight-based, it can be specified in a single rule; when other criteria (such as header-based matches) are involved, multiple rules are needed and precedence decides their evaluation order.

    Once a rule is found that applies to the incoming request, it will be executed and the rule-evaluation process will terminate. That’s why it’s very important to carefully consider the priorities of each rule when there is more than one.

In the 50/50 example above, both versions are listed in a single rule, so precedence does not come into play; it matters once additional rules (like the header-based rule in the next section) target the same reviews destination.

    Intelligent Routing Using Istio

We will demonstrate an example in which we aim to get more control over routing the traffic coming to our app. Before reading ahead, make sure that you have installed Istio and the Bookinfo review application.

    First, we will set a default version for all microservices.

    > kubectl create -f samples/bookinfo/kube/route-rule-all-v1.yaml

Then wait a few seconds for the rules to propagate to all pods before accessing the application. This sets the default route to the v1 version (which doesn't call the ratings service). Now we want a specific user, say Velotio, to see the v2 version. We write a yaml file (test-velotio.yaml):

    apiVersion: config.istio.io/v1alpha2
    kind: RouteRule
    metadata:
      name: test-velotio
      namespace: default
      ...
    spec:
      destination:
        name: reviews
      match:
        request:
          headers:
            cookie:
              regex: ^(.*?;)?(user=velotio)(;.*)?$
      precedence: 2
      route:
      - labels:
          version: v2

    We then set this rule

> kubectl create -f path/to/test-velotio.yaml

Now if any other user logs in, they won't see any ratings (they will see the v1 version), but when the “velotio” user logs in, they will see the v2 version!

This is how we can do content-based routing intelligently. We used Istio to send 100% of the traffic to the v1 version of each of the Bookinfo services, and then set a rule to selectively send traffic to version v2 of the reviews service based on a header (i.e., a user cookie) in the request.

    Traffic Shifting

    Now Let’s take a case in which we have to shift traffic from an old service to a new service.

We can use Istio to gradually shift traffic from one microservice version to another; for example, we could move 10%, 20%, 25%, … up to 100% of the traffic. For simplicity in this blog, we will move traffic from reviews:v1 to reviews:v3 in two steps: first 40%, then 100%.

    First, we set the default version v1.

    > kubectl create -f samples/bookinfo/kube/route-rule-all-v1.yaml

    We write a yaml file route-rule-reviews-40-v3.yaml

    apiVersion: config.istio.io/v1alpha2
    kind: RouteRule
    metadata:
      name: reviews-default
      namespace: default
    spec:
      destination:
        name: reviews
      precedence: 1
      route:
      - labels:
          version: v1
        weight: 60
      - labels:
          version: v3
        weight: 40

    Then we apply a new rule.

    > kubectl create -f path/to/route-rule-reviews-40-v3.yaml

Now, refresh the productpage in your browser and you should see red-colored star ratings approximately 40% of the time. Once that is stable, we transfer all the traffic to v3.

    > istioctl replace -f samples/bookinfo/kube/route-rule-reviews-v3.yaml

    Inject Delays and Test the Resiliency of Your Application

Here we will test fault injection using an HTTP delay. To test the resiliency of our Bookinfo microservices, we will inject a 7s delay between the reviews:v2 and ratings microservices for the user “Jason”. Since the reviews:v2 service has a 10s timeout for its calls to the ratings service, we expect the end-to-end flow to continue without any errors.
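
The rule in that file looks roughly like the sketch below, in the same v1alpha2 RouteRule format used earlier (the httpFault fields follow the old delay-injection schema, and the cookie value must match the username used at login; check the sample file shipped with your Istio release for the exact contents):

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: ratings-test-delay
spec:
  destination:
    name: ratings
  precedence: 2
  match:
    request:
      headers:
        cookie:
          regex: "^(.*?;)?(user=jason)(;.*)?$"
  route:
  - labels:
      version: v1
  httpFault:
    delay:
      percent: 100
      fixedDelay: 7s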

    > istioctl create -f samples/bookinfo/kube/route-rule-ratings-test-delay.yaml

    Now we check if the rule was applied correctly,

    > istioctl get routerule ratings-test-delay -o yaml

Now we allow several seconds for the rule to propagate to all pods, then log in as user “Jason”. If the application's front page is set up to correctly handle delays, we expect it to load within approximately 7 seconds.

    Conclusion

In this blog, we only explored the routing capabilities of Istio. We found that Istio gives us a good amount of control over routing, fault injection, and so on in microservices. Istio has a lot more to offer, like load balancing and security. We encourage you to play around with Istio and tell us about your experiences.

    Happy Coding!

  • The 7 Most Useful Design Patterns in ES6 (and how you can implement them)

    After spending a couple of years in JavaScript development, I’ve realized how incredibly important design patterns are, in modern JavaScript (ES6). And I’d love to share my experience and knowledge on the subject, hoping you’d make this a critical part of your development process as well.

    Note: All the examples covered in this post are implemented with ES6 features, but you can also integrate the design patterns with ES5.

    At Velotio, we always follow best practices to achieve highly maintainable and more robust code. And we are strong believers of using design patterns as one of the best ways to write clean code. 

    In the post below, I’ve listed the most useful design patterns I’ve implemented so far and how you can implement them too:

    1. Module

    The module pattern simply allows you to keep units of code cleanly separated and organized. 

    Modules promote encapsulation, which means the variables and functions are kept private inside the module body and can’t be overwritten.

    Creating a module in ES6 is quite simple.

    // Addition module
    export const sum = (num1, num2) => num1 + num2;

    // usage
    import { sum } from 'modules/sum';
    const result = sum(20, 30); // 50

    ES6 also allows us to export the module as default. The following example gives you a better understanding of this.

    // All the variables and functions which are not exported are private within the module and cannot be used outside. Only the exported members are public and can be used by importing them.
    
    // Here the businessList is private member to city module
    const businessList = new WeakMap();
     
    // Here City uses the businessList member as it’s in same module
    class City {
     constructor() {
       businessList.set(this, ['Pizza Hut', 'Dominos', 'Street Pizza']);
     }
     
     // public method to access the private ‘businessList’
     getBusinessList() {
       return businessList.get(this);
     }
    
    // public method to add business to ‘businessList’
     addBusiness(business) {
       businessList.get(this).push(business);
     }
    }
     
    // export the City class as default module
    export default City;

    // usage
    import City from 'modules/city';
    const city = new City();
    city.getBusinessList();

    There is a great article written on the features of ES6 modules here.

    2. Factory

    Imagine creating a Notification Management application where your application currently only allows for a notification through Email, so most of the code lives inside the EmailNotification class. And now there is a new requirement for PushNotifications. So, to implement the PushNotifications, you have to do a lot of work as your application is mostly coupled with the EmailNotification. You will repeat the same thing for future implementations.

    To solve this complexity, we will delegate the object creation to another object called factory.

    class PushNotification {
     constructor(sendTo, message) {
       this.sendTo = sendTo;
       this.message = message;
     }
    }
     
    class EmailNotification {
     constructor(sendTo, cc, emailContent) {
       this.sendTo = sendTo;
       this.cc = cc;
       this.emailContent = emailContent;
     }
    }
     
    // Notification Factory
     
    class NotificationFactory {
     createNotification(type, props) {
       switch (type) {
         case 'email':
           return new EmailNotification(props.sendTo, props.cc, props.emailContent);
         case 'push':
           return new PushNotification(props.sendTo, props.message);
       }
     }
    }
     
    // usage
    const factory = new NotificationFactory();
     
    // create email notification
    const emailNotification = factory.createNotification('email', {
     sendTo: 'receiver@domain.com',
     cc: 'test@domain.com',
     emailContent: 'This is the email content to be delivered.!',
    });
     
    // create push notification
    const pushNotification = factory.createNotification('push', {
     sendTo: 'receiver-device-id',
     message: 'The push notification message',
    });

    3. Observer

    (Also known as the publish/subscribe pattern.)

    An observer pattern maintains the list of subscribers so that whenever an event occurs, it will notify them. An observer can also remove the subscriber if the subscriber no longer wishes to be notified.

    On YouTube, many times, the channels we’re subscribed to will notify us whenever a new video is uploaded.

    // Publisher
    class Video {
     constructor(observable, name, content) {
       this.observable = observable;
       this.name = name;
       this.content = content;
       // publish the ‘video-uploaded’ event
       this.observable.publish('video-uploaded', {
         name,
         content,
       });
     }
    }
    // Subscriber
    class User {
     constructor(observable) {
       this.observable = observable;
   this.interestedVideos = [];
   // subscribe with the event name and the callback function
       this.observable.subscribe('video-uploaded', this.addVideo.bind(this));
     }
     
     addVideo(video) {
   this.interestedVideos.push(video);
     }
    }
    // Observer 
    class Observable {
     constructor() {
       this.handlers = [];
     }
     
     subscribe(event, handler) {
       this.handlers[event] = this.handlers[event] || [];
       this.handlers[event].push(handler);
     }
     
     publish(event, eventData) {
       const eventHandlers = this.handlers[event];
     
       if (eventHandlers) {
         for (var i = 0, l = eventHandlers.length; i < l; ++i) {
           eventHandlers[i].call({}, eventData);
         }
       }
     }
    }
    // usage
    const observable = new Observable();
    const user = new User(observable);
const video = new Video(observable, 'ES6 Design Patterns', 'es6-design-patterns.mp4');

    4. Mediator

    The mediator pattern provides a unified interface through which different components of an application can communicate with each other.

    If a system appears to have too many direct relationships between components, it may be time to have a central point of control that components communicate through instead. 

    The mediator promotes loose coupling. 

    A real-time analogy could be a traffic light signal that handles which vehicles can go and stop, as all the communications are controlled from a traffic light.

    Let’s create a chatroom (mediator) through which the participants can register themselves. The chatroom is responsible for handling the routing when the participants chat with each other. 

    // each participant represented by Participant object
class Participant {
 constructor(name) {
   this.name = name;
   this.chatroom = null;
 }

 getParticipantDetails() {
   return this.name;
 }

 // send a message through the mediator (the chatroom routes it)
 send(message, to) {
   this.chatroom.send(message, this, to);
 }

 // receive a message routed by the mediator
 receive(message, from) {
   console.log(`${from.name} to ${this.name}: ${message}`);
 }
}
     
    // Mediator
    class Chatroom {
     constructor() {
       this.participants = {};
     }
     
     register(participant) {
       this.participants[participant.name] = participant;
       participant.chatroom = this;
     }
     
     send(message, from, to) {
       if (to) {
         // single message
         to.receive(message, from);
       } else {
         // broadcast message to everyone
         for (const key in this.participants) {
           if (this.participants[key] !== from) {
             this.participants[key].receive(message, from);
           }
         }
       }
     }
    }
     
    // usage
    // Create two participants  
     const john = new Participant('John');
     const snow = new Participant('Snow');
    // Register the participants to Chatroom
     var chatroom = new Chatroom();
     chatroom.register(john);
     chatroom.register(snow);
    // Participants now chat with each other
     john.send('Hey, Snow!');
     john.send('Are you there?');
 snow.send('Hey man', john);
     snow.send('Yes, I heard that!');

    5. Command

    In the command pattern, an operation is wrapped as a command object and passed to the invoker object. The invoker object passes the command to the corresponding object, which executes the command.

The command pattern decouples the objects executing the commands from the objects issuing them, and it encapsulates each action as an object. A stack of commands is maintained: whenever a command is executed, it is pushed onto the stack; to undo a command, it is popped off the stack and its reverse action is performed.

    You can consider a calculator as a command that performs addition, subtraction, division and multiplication, and each operation is encapsulated by a command object.

    // The list of operations can be performed
    const addNumbers = (num1, num2) => num1 + num2;
    const subNumbers = (num1, num2) => num1 - num2;
    const multiplyNumbers = (num1, num2) => num1 * num2;
    const divideNumbers = (num1, num2) => num1 / num2;
     
    // CalculatorCommand class initialize with execute function, undo function // and the value 
    class CalculatorCommand {
     constructor(execute, undo, value) {
       this.execute = execute;
       this.undo = undo;
       this.value = value;
     }
    }
    // Here we are creating the command objects
    const DoAddition = value => new CalculatorCommand(addNumbers, subNumbers, value);
    const DoSubtraction = value => new CalculatorCommand(subNumbers, addNumbers, value);
    const DoMultiplication = value => new CalculatorCommand(multiplyNumbers, divideNumbers, value);
    const DoDivision = value => new CalculatorCommand(divideNumbers, multiplyNumbers, value);
     
    // AdvancedCalculator which maintains the list of commands to execute and // undo the executed command
    class AdvancedCalculator {
     constructor() {
       this.current = 0;
       this.commands = [];
     }
     
     execute(command) {
       this.current = command.execute(this.current, command.value);
       this.commands.push(command);
     }
     
     undo() {
       let command = this.commands.pop();
       this.current = command.undo(this.current, command.value);
     }
     
     getCurrentValue() {
       return this.current;
     }
    }
    
    // usage
    const advCal = new AdvancedCalculator();
     
    // invoke commands
advCal.execute(DoAddition(50)); // 50
advCal.execute(DoSubtraction(25)); // 25
advCal.execute(DoMultiplication(4)); // 100
advCal.execute(DoDivision(2)); // 50
     
    // undo commands
    advCal.undo();
    advCal.getCurrentValue(); //100

    6. Facade

The facade pattern is used when we want to present a higher level of abstraction and hide the complexity of a larger codebase behind a simple interface.

    A great example of this pattern is used in the common DOM manipulation libraries like jQuery, which simplifies the selection and events adding mechanism of the elements.

    // JavaScript:
    /* handle click event  */
    document.getElementById('counter').addEventListener('click', () => {
     counter++;
    });
     
    // jQuery:
    /* handle click event */
    $('#counter').on('click', () => {
     counter++;
    });

Though it seems simple on the surface, there is quite a bit of complex logic implemented behind these operations.

    The following Account Creation example gives you clarity about the facade pattern: 

    // Here AccountManager is responsible to create new account of type 
    // Savings or Current with the unique account number
    let currentAccountNumber = 0;
    
    class AccountManager {
     createAccount(type, details) {
       const accountNumber = AccountManager.getUniqueAccountNumber();
       let account;
       if (type === 'current') {
     account = new CurrentAccounts();
       } else {
         account = new SavingsAccount();
       }
       return account.addAccount({ accountNumber, details });
     }
     
     static getUniqueAccountNumber() {
       return ++currentAccountNumber;
     }
    }
    
    
    // class Accounts maintains the list of all accounts created
    class Accounts {
     constructor() {
       this.accounts = [];
     }
     
     addAccount(account) {
       this.accounts.push(account);
   return this.successMessage(account);
     }
     
     getAccount(accountNumber) {
       return this.accounts.find(account => account.accountNumber === accountNumber);
     }
     
     successMessage(account) {}
    }
    
    // CurrentAccounts extends the implementation of Accounts for providing more specific success messages on successful account creation
    class CurrentAccounts extends Accounts {
     constructor() {
       super();
       if (CurrentAccounts.exists) {
         return CurrentAccounts.instance;
       }
       CurrentAccounts.instance = this;
       CurrentAccounts.exists = true;
       return this;
     }
     
     successMessage({ accountNumber, details }) {
       return `Current Account created with ${details}. ${accountNumber} is your account number.`;
     }
    }
     
    // Same here, SavingsAccount extends the implementation of Accounts for providing more specific success messages on successful account creation
    class SavingsAccount extends Accounts {
     constructor() {
       super();
       if (SavingsAccount.exists) {
         return SavingsAccount.instance;
       }
       SavingsAccount.instance = this;
       SavingsAccount.exists = true;
       return this;
     }
     
     successMessage({ accountNumber, details }) {
       return `Savings Account created with ${details}. ${accountNumber} is your account number.`;
     }
    }
     
    // usage
    // Here we are hiding the complexities of creating account
    const accountManager = new AccountManager();
     
    const currentAccount = accountManager.createAccount('current', { name: 'John Snow', address: 'pune' });
     
    const savingsAccount = accountManager.createAccount('savings', { name: 'Petter Kim', address: 'mumbai' });

    7. Adapter

    The adapter pattern converts the interface of a class to another expected interface, making two incompatible interfaces work together. 

A typical use case for the adapter pattern: you need to show data from a 3rd-party library in a bar chart, but the data format of the 3rd-party API differs from the format the chart expects. Below is an adapter that converts the 3rd-party API response to Highcharts' bar-chart format:

// API Response
const response = [{
   symbol: 'SIC DIVISION',
   exchange: 'Agricultural services',
   volume: 42232,
}];
     
    // Required format
    [{
       category: 'Agricultural services',
       name: 'SIC DIVISION',
       y: 42232,
    }]
     
    const mapping = {
     symbol: 'category',
     exchange: 'name',
     volume: 'y',
    };
     
    const highchartsAdapter = (response, mapping) => {
     return response.map(item => {
       const normalized = {};
     
       // Normalize each response's item key, according to the mapping
       Object.keys(item).forEach(key => (normalized[mapping[key]] = item[key]));
       return normalized;
     });
    };
     
    highchartsAdapter(response, mapping);

    Conclusion

This has been a brief introduction to design patterns in modern JavaScript (ES6). The subject is massive, but hopefully this article has shown you the benefits of using them when writing code.

    Related Articles

    1. Cleaner, Efficient Code with Hooks and Functional Programming

    2. Building a Progressive Web Application in React [With Live Code Examples]

  • Implementing GraphQL with Flutter: Everything you need to know

    Thinking about using GraphQL but unsure where to start? 

    This is a concise tutorial based on our experience using GraphQL. You will learn how to use GraphQL in a Flutter app, including how to create a query, a mutation, and a subscription using the graphql_flutter plugin. Once you’ve mastered the fundamentals, you can move on to designing your own workflow.

    Key topics and takeaways:

    * GraphQL

    * What is graphql_flutter?

    * Setting up graphql_flutter and GraphQLProvider

    * Queries

    * Mutations

    * Subscriptions

    GraphQL

Do you find yourself calling multiple endpoints to populate data for a single screen? Wish you had more control over the data returned by an endpoint? Do your calls return more data than the fields you actually need?

    Follow along to learn how to do this with GraphQL. GraphQL’s goal was to change the way data is supplied from the backend, and it allows you to specify the data structure you want.

    Let’s imagine that we have the table model in our database that looks like this:

Movie {
  title
  genre
  rating
  year
}

    These fields represent the properties of the Movie Model:

• the title property is the name of the movie,
• genre describes what kind of movie it is,
• rating represents viewers' ratings, and
• year states when it was released.

    We can get movies like this using REST:

    /GET localhost:8080/movies

    [
     {
       "title": "The Godfather",
       "genre":  "Drama",
       "rating": 9.2,
       "year": 1972
     }
    ]

    As you can see, whether or not we need them, REST returns all of the properties of each movie. In our frontend, we may just need the title and genre properties, yet all of them were returned.

    We can avoid redundancy by using GraphQL. We can specify the properties we wish to be returned using GraphQL, for example:

query movies {
  Movie {
    title
    genre
  }
}

    We’re informing the server that we only require the movie table’s title and genre properties. It provides us with exactly what we require:

    {
     "data": [
       {
         "title": "The Godfather",
         "genre": "Drama"
       }
     ]
    }

    GraphQL is a backend technology, whereas Flutter is a frontend SDK for developing mobile apps. We get the data displayed on the mobile app from a backend when we use mobile apps.

    It’s simple to create a Flutter app that retrieves data from a GraphQL backend. Simply make an HTTP request from the Flutter app, then use the returned data to set up and display the UI.

    The new graphql_flutter plugin includes APIs and widgets for retrieving and using data from GraphQL backends.

    What is graphql_flutter?

    The new graphql_flutter plugin includes APIs and widgets that make it simple to retrieve and use data from a GraphQL backend.

    graphql_flutter, as the name suggests, is a GraphQL client for Flutter. It exports widgets and providers for retrieving data from GraphQL backends, such as:

    • HttpLink — This is used to specify the backend’s endpoint or URL.
    • GraphQLClient — This class is used to retrieve a query or mutation from a GraphQL endpoint as well as to connect to a GraphQL server.
    • GraphQLCache — We use this class to cache our queries and mutations. It has an options store where we pass the type of store to it during its caching operation.
    • GraphQLProvider — This widget encapsulates the graphql flutter widgets, allowing them to perform queries and mutations. This widget is given to the GraphQL client to use. All widgets in this provider’s tree have access to this client.
    • Query — This widget is used to perform a backend GraphQL query.
    • Mutation — This widget is used to modify a GraphQL backend.
    • Subscription — This widget allows you to create a subscription.

    Setting up graphql_flutter and GraphQLProvider

    Create a Flutter project:

    flutter create flutter_graphql
    cd flutter_graphql

    Next, install the graphql_flutter package:

    flutter pub add graphql_flutter

    The code above will set up the graphql_flutter package. This will include the graphql_flutter package in the dependencies section of your pubspec.yaml file:

    dependencies:
      graphql_flutter: ^5.0.0

    To use the widgets, we must import the package as follows:

    import 'package:graphql_flutter/graphql_flutter.dart';

    Before we can start making GraphQL queries and mutations, we must first wrap our root widget in GraphQLProvider. A GraphQLClient instance must be provided to the GraphQLProvider’s client property.

    GraphQLProvider( client: GraphQLClient(...))

    The GraphQLClient includes the GraphQL server URL as well as a caching mechanism.

    final httpLink = HttpLink(uri: "http://10.0.2.2:8000/");

    ValueNotifier<GraphQLClient> client = ValueNotifier(
      GraphQLClient(
        cache: InMemoryCache(),
        link: httpLink,
      ),
    );

    HttpLink is used to generate the URL for the GraphQL server. The GraphQLClient receives the instance of the HttpLink in the form of a link property, which contains the URL of the GraphQL endpoint.

    The cache passed to GraphQLClient specifies the cache mechanism to be used. To persist or store caches, the InMemoryCache instance makes use of an in-memory database.

    A GraphQLClient instance is passed to a ValueNotifier. This ValueNotifier holds a single value and has listeners that are notified when that value changes. graphql_flutter uses it to notify its widgets when the data from a GraphQL endpoint changes, which helps graphql_flutter remain responsive.

    We’ll now encase our MaterialApp widget in GraphQLProvider:

    void main() {
      runApp(MyApp());
    }

    class MyApp extends StatelessWidget {
      // This widget is the root of your application.
      @override
      Widget build(BuildContext context) {
        return GraphQLProvider(
            client: client,
            child: MaterialApp(
              title: 'GraphQL Demo',
              theme: ThemeData(primarySwatch: Colors.blue),
              home: MyHomePage(title: 'GraphQL Demo'),
            ));
      }
    }

    Queries

    We’ll use the Query widget to create a query with the graphql_flutter package.

    class MyHomePage extends StatelessWidget {
      @override
      Widget build(BuildContext context) {
        return Query(
          options: QueryOptions(
            document: gql(readCounters),
            variables: {
              'counterId': 23,
            },
            pollInterval: Duration(seconds: 10),
          ),
          builder: (QueryResult result,
              {VoidCallback refetch, FetchMore fetchMore}) {
            if (result.hasException) {
              return Text(result.exception.toString());
            }

            if (result.isLoading) {
              return Text('Loading');
            }

            // it can be either a Map or a List
            List counters = result.data['counter'];

            return ListView.builder(
                itemCount: counters.length,
                itemBuilder: (context, index) {
                  return Text(counters[index]['name']);
                });
          },
        );
      }
    }

    The Query widget encloses the ListView widget, which will display the list of counters to be retrieved from our GraphQL server. As a result, the Query widget must wrap the widget where the data fetched by the Query widget is to be displayed.

    The Query widget cannot be the tree’s topmost widget. It can be placed wherever you want as long as the widget that will use its data is underneath or wrapped by it.

    In addition, two properties have been passed to the Query widget: options and builder.

    options

    options: QueryOptions(
      document: gql(readCounters),
      variables: {
        'counterId': 23,
      },
      pollInterval: Duration(seconds: 10),
    ),

    The option property is where the query configuration is passed to the Query widget. This options prop is a QueryOptions instance. The QueryOptions class exposes properties that we use to configure the Query widget.

    The query string, i.e., the query to be executed by the Query widget, is passed in via the document property. We passed in the readCounters string here:

    // A raw string (r"""...""") keeps Dart from trying to interpolate the $counterId GraphQL variable.
    final String readCounters = r"""
    query readCounters($counterId: Int!) {
      counter {
        name
        id
      }
    }
    """;

    The variables attribute is used to send query variables to the Query widget. There is a ‘counterId’: 23 there. In the readCounters query string, this will be passed in place of $counterId.

    The pollInterval specifies how often the Query widget polls or refreshes the query data. The timer is set to 10 seconds, so the Query widget will perform HTTP requests to refresh the query data every 10 seconds.

    builder

    The builder property is a function. It is called when the Query widget sends an HTTP request to the GraphQL server endpoint. The Query widget calls the builder function with the data from the query, a function to re-fetch the data, and a function for pagination, which is used to get more data.

    The builder function returns widgets that are listed below the Query widget. The result argument is a QueryResult instance. The QueryResult class has properties that can be used to determine the query’s current state and the data returned by the Query widget.

    • If the query encounters an error, QueryResult.hasException is set.
    • If the query is still in progress, QueryResult.isLoading is set. We can use this property to show our users a UI progress bar to let them know that something is on its way.
    • The data returned by the GraphQL endpoint is stored in QueryResult.data.

    Mutations

    Let’s look at how to make mutation queries with the Mutation widget in graphql_flutter.

    The Mutation widget is used as follows:

    Mutation(
      options: MutationOptions(
        document: gql(addCounter),
        update: (GraphQLDataProxy cache, QueryResult result) {
          return cache;
        },
        onCompleted: (dynamic resultData) {
          print(resultData);
        },
      ),
      builder: (
        RunMutation runMutation,
        QueryResult result,
      ) {
        return FlatButton(
            onPressed: () => runMutation({
                  'counterId': 21,
                }),
            child: Text('Add Counter'));
      },
    );

    The Mutation widget, like the Query widget, accepts some properties.

    • options is a MutationOptions class instance. This is the location of the mutation string and other configurations.
    • The mutation string is set using a document. An addCounter mutation has been passed to the document in this case. The Mutation widget will handle it.
    • When we want to update the cache, we call update. The update function receives the previous cache (cache) and the outcome of the mutation. Anything returned by the update becomes the cache’s new value. Based on the results, we’re refreshing the cache.
    • onCompleted is called when the mutation on the GraphQL endpoint has completed; it receives the mutation's result data. builder returns the widget below the Mutation widget in the tree. It is invoked with a RunMutation instance, runMutation, and a QueryResult instance, result.
    • The Mutation widget’s mutation is executed using runMutation. The Mutation widget causes the mutation whenever it is called. The mutation variables are passed as parameters to the runMutation function. The runMutation function is invoked with the counterId variable, 21.

    When the Mutation’s mutation is finished, the builder is called, and the Mutation rebuilds its tree. runMutation and the mutation result are passed to the builder function.

    Subscriptions

    Subscriptions in GraphQL are similar to an event system that listens on a WebSocket and calls a function whenever an event is emitted into the stream.

    The client connects to the GraphQL server via a WebSocket. The event is passed to the WebSocket whenever the server emits an event from its end. So this is happening in real-time.

    The graphql_flutter plugin in Flutter uses WebSockets and Dart streams to open and receive real-time updates from the server.

    Let’s look at how we can use our Flutter app’s Subscription widget to create a real-time connection. We’ll start by creating our subscription string:

    final counterSubscription = '''
    subscription counterAdded {
      counterAdded {
        name
        id
      }
    }
    ''';

    When we add a new counter to our GraphQL server, this subscription will notify us in real-time.

    Subscription(
      options: SubscriptionOptions(
        document: gql(counterSubscription),
      ),
      builder: (result) {
        if (result.hasException) {
          return Text("Error occurred: " + result.exception.toString());
        }

        if (result.isLoading) {
          return Center(
            child: const CircularProgressIndicator(),
          );
        }

        return ResultAccumulator.appendUniqueEntries(
          latest: result.data,
          builder: (context, {results}) => ...
        );
      },
    ),

    The Subscription widget has several properties, as we can see:

    • options holds the Subscription widget’s configuration.
    • document holds the subscription string.
    • builder returns the Subscription widget’s widget tree.

    The subscription result is used to call the builder function. The end result has the following properties:

    • If the Subscription widget encounters an error while polling the GraphQL server for updates, result.hasException is set.
    • If polling from the server is active, result.isLoading is set.

    The provided helper widget ResultAccumulator is used to collect subscription results, according to graphql_flutter’s pub.dev page.

    Conclusion

    This blog intends to help you understand what makes GraphQL so powerful, how to use it in Flutter, and how to take advantage of the reactive nature of graphql_flutter. You can now take the first steps in building your applications with GraphQL!

  • A Step Towards Machine Learning Algorithms: Univariate Linear Regression

    These days, the concept of Machine Learning is evolving rapidly. The understanding of it is so vast and open that everyone has their own independent take on it, and here I am sharing mine. This blog is about my experience with learning algorithms. In it, we will get to know the basic difference between Artificial Intelligence, Machine Learning, and Deep Learning. We will also get to know the foundational Machine Learning algorithm, i.e., Univariate Linear Regression.

    Intermediate knowledge of Python and its libraries (NumPy, Pandas, Matplotlib) is good to start with. For the mathematics, a little knowledge of algebra, calculus, and graph theory will help in understanding the trick behind the algorithm.

    A way to Artificial intelligence, Machine Learning, and Deep Learning

    These are the three buzzwords of today's Internet world, and they point to where programming is headed. Specifically, we can say that this is where the science domain meets programming: we use scientific concepts and mathematics together with a programming language to simulate the decision-making process. Artificial Intelligence is a program, or the ability of a machine, to make decisions more the way humans do. Machine Learning is another program that supports Artificial Intelligence. It helps the machine observe patterns and learn from them to make a decision; here, programming helps in observing the patterns, not in making the decisions. Machine Learning requires more and more information from various sources in order to observe all of the variables of a given pattern and make more accurate decisions. Deep Learning supports Machine Learning by creating a network (a neural network) to fetch all the required information and provide it to the machine learning algorithms.

    What is Machine Learning

    Definition: Machine Learning provides machines with the ability to learn autonomously based on experiences, observations, and analyzing patterns within a given data set, without being explicitly programmed.

    This is a two-part process. In the first part, it observes and analyzes the patterns in the given data and makes a shrewd guess of a mathematical function that will be very close to the pattern. There are various methods for this; a few of them are linear, non-linear, logistic, etc. Here we calculate the error function using the guessed mathematical function and the given data. In the second part, we minimize the error function. This minimized function is then used for the prediction of the pattern.

    Here are the general steps to understand the process of Machine Learning:

    1. Plot the given dataset on the x-y axis
    2. By looking at the graph, guess a mathematical function that closely fits the data
    3. Derive the error function from the given dataset and the guessed mathematical function
    4. Try to minimize the error function using some algorithm
    5. The minimized error function will give us a more accurate mathematical function for the given pattern

    Getting Started with the First Algorithm: Univariate Linear Regression

    Linear Regression is a very basic algorithm, or we can say the first and foundational algorithm, for understanding the concept of ML. We will try to understand it with an example dataset of plot prices for given plot areas.

    [Sample dataset: plot areas (in units of 10 sq mtr) and their corresponding prices (in Lakhs INR)]

    With this data, we can easily determine the price of a plot of any listed area. But what if we want the price of a plot with an area of 5.0 * 10 sq mtr? There is no direct price for this in our dataset. So how can we get the price of a plot whose area is not in the dataset? This is what we can do using Linear Regression.

    So at first, we will plot this data into a graph.

    The graph below describes the area of the plots (in 10 sq mtr) on the x-axis and their prices (in Lakhs INR) on the y-axis.

    Definition of Linear Regression

    The objective of a linear regression model is to find a relationship between one or more features (independent variables) and a continuous target variable (dependent variable). When there is only one feature, it is called Univariate Linear Regression, and if there are multiple features, it is called Multiple Linear Regression.

    Hypothesis function:

    Here we will try to find the relation between price and area of plots. As this is an example of univariate, we can see that the price is only dependent on the area of the plot.

    By observing this pattern we can have our hypothesis function as below:

    f(x) = w * x + b

    where w is the weight and b is the bias.

    For different sets of values of (w, b), multiple lines are possible, but for one particular set of values, the line will be closest to this pattern.

    When we generalize this function for multiple variables, there will be a set of values of w; these constants are also termed model parameters.

    Note: There is a range of mathematical functions that could fit this pattern, and the selection of the function is totally up to us. But the point to take care of is that it should neither underfit nor overfit the data, and the function must be continuous so that we can easily differentiate it and so that it has a global minimum or maximum.
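
    To make this concrete, here is a minimal Python sketch of the hypothesis function; the area values and the (w, b) pair below are made-up placeholders for illustration, not the dataset used later in this post.

    import numpy as np

    def hypothesis(x, w, b):
        # f(x) = w * x + b, applied element-wise to an array of plot areas
        return w * x + b

    areas = np.array([5.0, 7.5, 10.0])        # plot areas in 10 sq mtr (hypothetical)
    print(hypothesis(areas, w=2.0, b=1.0))    # predicted prices in Lakhs INR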

    Error for a point

    As our hypothesis function is continuous, for every Xi (area point) there will be one predicted price F(Xi), while Yi is the actual price.

    So the error at any point,

    Ei = F(Xi) - Yi (predicted price minus actual price)

    These errors are also called residuals. A residual can be positive (if the actual point lies below the predicted line) or negative (if the actual point lies above the predicted line). Our motive is to minimize these residuals for each of the points.

    Note: While observing the patterns, it is possible that a few points are very far from the pattern. For these far points, the residuals will be much larger, so if these points are few in number, we can ignore them, considering them to be errors in the dataset. Such points are termed outliers.

    Energy Functions

    As there are m training points, we can calculate the Average Energy function below

    E(w, b) = (1/m) * Σ (i = 1 to m) Ei

    and

    our motive is to minimize the energy functions

    min E(w, b) with respect to (w, b)

    Little Calculus: For any continuous function, the points where the first derivative is zero are the points of either minima or maxima. If the second derivative is negative, it is the point of maxima and if it is positive, it is the point of minima.

    Here we will do the trick: we will convert our energy function into an upward (convex) parabola by squaring the error function. This ensures that our energy function has only one global minimum (the point of our concern). It also simplifies our calculation: the point where the first derivative of the energy function is zero is the point we need, and the value of (w, b) at that point is our required point.

    So our final Energy function is

    E(w, b) = (1/(2m)) * Σ (i = 1 to m) (Ei)²

    Dividing by 2 doesn't affect our result, and at the time of differentiation it cancels out (for example, the first derivative of x² is 2x).
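
    As a quick illustration, here is a minimal NumPy sketch of this energy (cost) function; the arrays x and y stand in for the area and price columns and are assumptions of this sketch, not part of the code shown later.

    import numpy as np

    def energy(w, b, x, y):
        # E(w, b) = 1/(2m) * sum((w * x_i + b - y_i) ** 2)
        m = len(x)
        residuals = (w * x + b) - y
        return np.sum(residuals ** 2) / (2 * m)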

    Gradient Descent Method

    Gradient descent is a generic optimization algorithm. It iteratively adjusts, trial by trial, the parameters of the model in order to minimize the energy function.

    In the above picture, we can see on the right side:

    1. w0 and w1 are the random initialization, and by following gradient descent they move towards the global minimum.
    2. The number of turns of the black line is the number of iterations, so it must be neither too many nor too few.
    3. The distance between the turns is alpha, i.e., the learning parameter.

    By solving this left side equation we will be able to get model params at the global minima of energy functions.
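
    For reference (the figure from the original post is not reproduced here), the standard gradient descent update equations for this energy function are:

    repeat until convergence:
      w := w - alpha * (1/m) * Σ (i = 1 to m) (F(Xi) - Yi) * Xi
      b := b - alpha * (1/m) * Σ (i = 1 to m) (F(Xi) - Yi)

    where alpha is the learning rate and both parameters are updated simultaneously in each iteration.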

    Points to consider at the time of Gradient Descent calculations:

    1. Random initialization: We start this algorithm at a random point, that is, a set of random (w, b) values. Moving along, the algorithm decides in which direction the next trial has to be taken. As the energy function is an upward parabola, by moving in the right direction (towards the global minimum) we get a smaller value compared to the previous point.
    2. Number of iterations: The number of iterations must be neither too large nor too small. If it is too small, we will not reach the global minimum, and if it is too large, we will perform extra calculations around the global minimum.
    3. Alpha as the learning parameter: When alpha is too small, gradient descent will be slow, as it takes unnecessarily small steps to reach the global minimum. If alpha is too big, it might overshoot the global minimum; in that case it may fail to converge, or even diverge.

    Implementation of Gradient Descent in Python

    """ Method to read the csv file using Pandas and later use this data for linear regression. """
    """ Better run with Python 3+. """
    
    # Library to read csv file effectively
    import pandas
    import matplotlib.pyplot as plt
    import numpy as np
    
    # Method to read the csv file
    def load_data(file_name):
        column_names = ['area', 'price']
        # To read columns (the first row of the file is a header, so it is skipped below)
        io = pandas.read_csv(file_name, names=column_names, header=None)
        x_val = io.values[1:, 0]
        y_val = io.values[1:, 1]
        size_array = len(y_val)
        for i in range(size_array):
            x_val[i] = float(x_val[i])
            y_val[i] = float(y_val[i])
        # Return outside the loop so that every row is converted, not just the first one
        return x_val, y_val
    
    # Call the method for a specific file
    x_raw, y_raw = load_data('area-price.csv')
    x_raw = x_raw.astype(float)
    y_raw = y_raw.astype(float)
    y = y_raw
    
    # Modeling
    w, b = 0.1, 0.1
    num_epoch = 100
    m = len(x_raw)  # number of training points
    converge_rate = np.zeros([num_epoch, 1], dtype=float)
    learning_rate = 1e-3
    for e in range(num_epoch):
        # Calculate the gradient of the energy function E(w, b) with respect to the model parameters manually.
        y_predicted = w * x_raw + b
        grad_w, grad_b = (y_predicted - y).dot(x_raw) / m, (y_predicted - y).sum() / m
        # Update parameters.
        w, b = w - learning_rate * grad_w, b - learning_rate * grad_b
        converge_rate[e] = np.mean(np.square(y_predicted - y))
    
    print(w, b)
    print(f"predicted function f(x) = x * {w} + {b}" )
    calculatedprice = (10 * w) + b
    print(f"price of plot with area 10 sqmtr = 10 * {w} + {b} = {calculatedprice}")

    This is a basic implementation of the Gradient Descent algorithm using NumPy and Pandas. It reads the area-price.csv file; the x-axis can also be normalized for better readability of the data points on the graph. We have taken (w, b) as (0.1, 0.1) for the random initialization, 100 as the number of iterations, and 0.001 as the learning rate.

    In every iteration, we are calculating w and b value and seeing it for converging rate.

    We can repeat this calculation for (w,b) for different values of random initialization, no of iterations and learning rate (alpha).

    Note: There is another python Library TensorFlow which is more preferable for such calculations. There are inbuilt functions of Gradient Descent in TensorFlow. But for better understanding, we have used library numpy and pandas here.

    RMSE (Root Mean Square Error)

    RMSE: This is the method to verify to what extent our calculation of (w, b) is accurate. Below is the basic formula for calculating RMSE, where f is the predicted value and o is the observed value.
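
    Since the image with the formula is not reproduced here, the standard form is:

    RMSE = sqrt( (1/m) * Σ (i = 1 to m) (f_i - o_i)² )

    With NumPy this is essentially np.sqrt(np.mean(np.square(y_predicted - y))), i.e., the square root of the converge_rate value computed in the code above.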

    Note: There is no absolute good or bad threshold value for RMSE; however, we can judge it based on our observed values. For observed values ranging from 0 to 1000, an RMSE of 0.7 is small, but if the range is 0 to 1, it is not that small.

    Conclusion

    As part of this article, we have seen a little introduction to Machine Learning and the need for it. Then with the help of a very basic example, we learned about one of the various optimization algorithms i.e. Linear Regression (for univariate only). This can be generalized for multivariate also. We then use the Gradient Descent Method for the calculation of the predicted data model in Linear Regression. We also learned the basic flow details of Gradient Descent. There is one example in python for displaying Linear Regression via Gradient Descent.

  • Publish APIs For Your Customers: Deploy Serverless Developer Portal For Amazon API Gateway

    Amazon API Gateway is a fully managed service that allows you to create, secure, publish, test and monitor your APIs. We often come across scenarios where customers of these APIs expect a platform to learn and discover APIs that are available to them (often with examples).

    The Serverless Developer Portal is one such application that is used for developer engagement by making your APIs available to your customers. Further, your customers can use the developer portal to subscribe to an API, browse API documentation, test published APIs, monitor their API usage, and submit their feedback.

    This blog is a detailed step-by-step guide for deploying the Serverless Developer Portal for APIs that are managed via Amazon API Gateway.

    Advantages

    The users of Amazon API Gateway can be broadly categorized as:

    API Publishers – They can use the Serverless Developer Portal to expose and secure their APIs for customers which can be integrated with AWS Marketplace for monetary benefits. Furthermore, they can customize the developer portal, including content, styling, logos, custom domains, etc. 

    API Consumers – They could be Frontend/Backend developers, third party customers, or simply students. They can explore available APIs, invoke the APIs, and go through the documentation to get an insight into how each API works with different requests. 

    Developer Portal Architecture

    We would need to establish a basic understanding of how the developer portal works. The Serverless Developer Portal is a serverless application built on microservice architecture using Amazon API Gateway, Amazon Cognito, AWS Lambda, Simple Storage Service and Amazon CloudFront. 

    The developer portal comprises multiple microservices and components as described in the following figure.

    Source: AWS

    There are a few key pieces in the above architecture –

    1. Identity Management: Amazon Cognito is basically the secure user directory of the developer portal responsible for user management. It allows you to configure triggers for registration, authentication, and confirmation, thereby giving you more control over the authentication process. 
    2. Business Logic: AWS Cloudfront is configured to serve your static content hosted in a private S3 bucket. The static content is built using the React JS framework which interacts with backend APIs dictating the business logic for various events. 
    3. Catalog Management: Developer portal uses catalog for rendering the APIs with Swagger specifications on the APIs page. The catalog file (catalog.json in S3 Artifact bucket) is updated whenever an API is published or removed. This is achieved by creating an S3 trigger on AWS Lambda responsible for studying the content of the catalog directory and generating a catalog for the developer portal.  
    4. API Key Creation: API Key is created for consumers at the time of registration. Whenever you subscribe to an API, associated Usage Plans are updated to your API key, thereby giving you access to those APIs as defined by the usage plan. Cognito User – API key mapping is stored in the DynamoDB table along with other registration related details.
    5. Static Asset Uploader: AWS Lambda (Static-Asset-Uploader) is responsible for updating/deploying static assets for the developer portal. Static assets include – content, logos, icons, CSS, JavaScripts, and other media files.

    Let’s move forward to building and deploying a simple Serverless Developer Portal.

    Building Your API

    Start with deploying an API which can be accessed using API Gateway from 

    https://<api-id>.execute-api.region.amazonaws.com/stage

    If you do not have any such API available, create a simple application by jumping to the section, “API Performance Across the Globe,” on this blog.

    Setup custom domain name

    For professional projects, I recommend that you create a custom domain name as they provide simpler and more intuitive URLs you can provide to your API users.

    Make sure your API Gateway domain name is updated in the Route53 record set created after you set up your custom domain name. 

    See more on Setting up custom domain names for REST APIs – Amazon API Gateway

    Enable CORS for an API Resource

    There are two ways you can enable CORS on a resource:

    1. Enable CORS Using the Console
    2. Enable CORS on a resource using the import API from Amazon API Gateway

    Let’s discuss the easiest way to do it using a console.

    1. Open API Gateway console.
    2. Select the API Gateway for your API from the list.
    3. Choose a resource to enable CORS for all the methods under that resource.
      Alternatively, you could choose a method under the resource to enable CORS for just this method.
    4. Select Enable CORS from the Actions drop-down menu.
    5. In the Enable CORS form, do the following:
      – Leave Access-Control-Allow-Headers and Access-Control-Allow-Origin header to default values.
      – Click on Enable CORS and replace existing CORS headers.
    6. Review the changes in Confirm method changes popup, choose Yes, overwrite existing values to apply your CORS settings.

    Once enabled, you can see a mock integration on the OPTIONS method for the selected resource. You must enable CORS for ${proxy} resources too. 

    To verify the CORS is enabled on API resource, try curl on OPTIONS method

    curl -v -X OPTIONS -H "Access-Control-Request-Method: POST" -H "Origin: http://example.com" https://api-id.execute-api.region.amazonaws.com/stage
    

    You should see the response OK in the header:

    < HTTP/1.1 200 OK
    < Content-Type: application/json
    < Content-Length: 0
    < Connection: keep-alive
    < Date: Mon, 13 Apr 2020 16:27:44 GMT
    < x-amzn-RequestId: a50b97b5-2437-436c-b99c-22e00bbe9430
    < Access-Control-Allow-Origin: *
    < Access-Control-Allow-Headers: Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token
    < x-amz-apigw-id: K7voBHDZIAMFu9g=
    < Access-Control-Allow-Methods: DELETE,GET,HEAD,OPTIONS,PATCH,POST,PUT
    < X-Cache: Miss from cloudfront
    < Via: 1.1 1c8c957c4a5bf1213bd57bd7d0ec6570.cloudfront.net (CloudFront)
    < X-Amz-Cf-Pop: BOM50-C1
    < X-Amz-Cf-Id: OmxFzV2-TH2BWPVyOohNrhNlJ-s1ZhYVKyoJaIrA_zyE9i0mRTYxOQ==

    Deploy Developer Portal

    There are two ways to deploy the developer portal for your API. 

    Using SAR

    An easy way will be to deploy api-gateway-dev-portal directly from AWS Serverless Application Repository. 

    Note: If you intend to upgrade your Developer Portal to a major version, you need to refer to the Upgrading Instructions, which are currently under development.

    Using AWS SAM

    1. Ensure that you have the latest AWS CLI and AWS SAM CLI installed and configured.
    2. Download or clone the API Gateway Serverless Developer Portal repository.
    3. Update the Cloudformation template file – cloudformation/template.yaml.

    Parameters you must configure and verify include:

    • ArtifactsS3BucketName
    • DevPortalSiteS3BucketName
    • DevPortalCustomersTableName
    • DevPortalPreLoginAccountsTableName
    • DevPortalAdminEmail
    • DevPortalFeedbackTableName
    • CognitoIdentityPoolName
    • CognitoDomainNameOrPrefix
    • CustomDomainName
    • CustomDomainNameAcmCertArn
    • UseRoute53Nameservers
    • AccountRegistrationMode

    You can view your template file in AWS Cloudformation Designer to get a better idea of all the components/services involved and how they are connected.

    See Developer portal settings for more information about parameters.

    4. Replace the static files in your project with the ones you would like to use.
      dev-portal/public/custom-content
      lambdas/static-asset-uploader/build
      api-logo contains the logos you would like to show on the API page (in png format). Portal checks for an api-id_stage.png file when rendering the API page. If not found, it chooses the default logo – default.png.
      content-fragments includes various markdown files comprising the content of the different pages in the portal. 
      Other static assets including favicon.ico, home-image.png and nav-logo.png that appear on your portal. 
    5. Create a ZIP file of your code and dependencies, and upload it to Amazon S3. Running the command below creates an AWS SAM template, packaged.yaml, replacing references to local artifacts with the Amazon S3 location where the command uploaded them:
    sam package --template-file ./cloudformation/template.yaml --output-template-file ./cloudformation/packaged.yaml --s3-bucket {your-lambda-artifacts-bucket-name}

    6. Run the following command from the project root to deploy your portal, replacing:
      – {your-template-bucket-name}
      with the name of your Amazon S3 bucket.
      – {custom-prefix}
      with a prefix that is globally unique.
      – {cognito-domain-or-prefix}
      with a unique string.
    sam deploy --template-file ./cloudformation/packaged.yaml --s3-bucket {your-template-bucket-name} --stack-name "{custom-prefix}-dev-portal" --capabilities CAPABILITY_NAMED_IAM
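
    If you would rather not hard-code values in template.yaml, most of the parameters listed earlier can also be supplied at deploy time with the SAM CLI's --parameter-overrides flag. A hedged example (the parameter values below are placeholders, not defaults from the project):

    sam deploy --template-file ./cloudformation/packaged.yaml --s3-bucket {your-template-bucket-name} --stack-name "{custom-prefix}-dev-portal" --capabilities CAPABILITY_NAMED_IAM --parameter-overrides DevPortalAdminEmail=admin@example.com CognitoDomainNameOrPrefix={cognito-domain-or-prefix} AccountRegistrationMode=request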

    Note: Ensure that you have required privileges to make deployments, as, during the deployment process, it attempts to create various resources such as AWS Lambda, Cognito User Pool, IAM roles, API Gateway, Cloudfront Distribution, etc. 

    After your developer portal has been fully deployed, you can get its URL as follows:

    1. Open the AWS CloudFormation console.
    2. Select your stack you created above.
    3. Open the Outputs section. The URL for the developer portal is specified in the WebSiteURL property.

    Create Usage Plan

    Create a usage plan to list your API under the subscribable APIs category, allowing consumers to access the API using their API keys in the developer portal. Ensure that the API Gateway stage is configured for the usage plan.

    Publishing an API

    Only Administrators have permission to publish an API. To create an Administrator account for your developer portal –

    1. Go to the WebSiteURL obtained after the successful deployment. 

    2. On the top right of the home page click on Register.

    Source: Github

    3. Fill the registration form and hit Sign up.

    4. Enter the confirmation code received on your email address provided in the previous step.

    5. Promote the user as Administrator by adding it to AdminGroup. 

    • Open Amazon Cognito User Pool console.
    • Select the User Pool created for your developer portal.
    • From the General Settings > Users and Groups page, select the User you want to promote as Administrator.
    • Click on Add to group and then select the Admin group from the dropdown and confirm.

    6. You will be required to log in again to log in as an Administrator. Click on the Admin Panel and choose the API you wish to publish from the APIs list.

    Setting up an account

    The signup process depends on the registration mode selected for the developer portal. 

    For request registration mode, you need to wait for the Administrator to approve your registration request.

    For invite registration mode, you can only register on the portal when invited by the portal administrator. 

    Subscribing an API

    1. Sign in to the developer portal.
    2. Navigate to the Dashboard page and Copy your API Key.
    3. Go to APIs Page to see a list of published APIs.
    4. Select an API you wish to subscribe to and hit the Subscribe button.

    Tips

    1. When a user subscribes to an API, all the APIs included under that usage plan become accessible, whether or not they are published on the portal.
    2. Whenever you publish an API, the catalog is exported from the API Gateway resource documentation. You can customize the workflow or override the catalog Swagger definition JSON in the S3 bucket defined in ArtifactsS3BucketName, under /catalog/<api-id>_<stage>.json.
    3. For backend APIs, CORS requests are allowed only from custom domain names selected for your developer portal.
    4. Ensure to set the CORS response header from the published API in order to invoke them from the developer portal.

    Summary

    You’ve seen how to deploy a Serverless Developer Portal and publish an API. If you are creating a serverless application for the first time, you might want to read more on Serverless Computing and AWS Gateway before you get started. 

    Start building your own developer portal. To know more on distributing your API Gateway APIs to your customers follow this AWS guide.

  • Using Packer and Terraform to Setup Jenkins Master-Slave Architecture

    Automation is everywhere, and it is better to adopt it as soon as possible. In this blog post, we are going to discuss creating the infrastructure for a deployment pipeline hosted on AWS: Packer will be used to create the AMIs, and Terraform will be used to create the master and slaves. We will discuss different ways of connecting the slaves and will also run a sample application through the pipeline.

    Please remember that the intent of the blog is to bring all the different components together; this means some code that would normally live in a development code repo is also included here. Now that we have highlighted the required tools, the 10,000 ft view, and the intent of the blog, let's begin.

    Using Packer to Create AMIs for the Jenkins Master and Linux Slave

    HashiCorp has given us some of the most amazing tools for simplifying our lives, and Packer is one of them. Packer can be used to create custom AMIs from already available AMIs. We just need to create a JSON file and pass an installation script as part of the creation, and it will take care of building the AMI for us. Install Packer, depending on your requirements, from the Packer downloads page. For simplicity, we will be using a Linux machine for both the Jenkins master and the Linux slave. The JSON file for both of them will be the same but can be separated if needed.

    Note: The user data passed from Terraform will be different, which will eventually differentiate their usage.

    We are using Amazon Linux 2; the JSON file for it is below.

    {
      "builders": [
      {
        "ami_description": "{{user `ami-description`}}",
        "ami_name": "{{user `ami-name`}}",
        "ami_regions": [
          "us-east-1"
        ],
        "ami_users": [
          "XXXXXXXXXX"
        ],
        "ena_support": "true",
        "instance_type": "t2.medium",
        "region": "us-east-1",
        "source_ami_filter": {
          "filters": {
            "name": "amzn2-ami-hvm-2.0*x86_64*",
            "root-device-type": "ebs",
            "virtualization-type": "hvm"
          },
          "most_recent": true,
          "owners": [
            "amazon"
          ]
        },
        "sriov_support": "true",
        "ssh_username": "ec2-user",
        "tags": {
          "Name": "{{user `ami-name`}}"
        },
        "type": "amazon-ebs"
      }
    ],
    "post-processors": [
      {
        "inline": [
          "echo AMI Name {{user `ami-name`}}",
          "date",
          "exit 0"
        ],
        "type": "shell-local"
      }
    ],
    "provisioners": [
      {
        "script": "install_amazon.bash",
        "type": "shell"
      }
    ],
      "variables": {
        "ami-description": "Amazon Linux for Jenkins Master and Slave ({{isotime \"2006-01-02-15-04-05\"}})",
        "ami-name": "amazon-linux-for-jenkins-{{isotime \"2006-01-02-15-04-05\"}}",
        "aws_access_key": "",
        "aws_secret_key": ""
      }
    }

    As you can see the file is pretty simple. The only thing of interest here is the install_amazon.bash script. In this blog post, we will deploy a Node-based application which is running inside a docker container. Content of the bash file is as follows:

    #!/bin/bash
    
    set -x
    
    # For Node
    curl -sL https://rpm.nodesource.com/setup_10.x | sudo -E bash -
    
    # For xmlstarlet
    sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    
    sudo yum update -y
    
    sleep 10
    
    # Setting up Docker
    sudo yum install -y docker
    sudo usermod -a -G docker ec2-user
    
    # Just to be safe removing previously available java if present
    sudo yum remove -y java
    
    sudo yum install -y python2-pip jq unzip vim tree biosdevname nc mariadb bind-utils at screen tmux xmlstarlet git java-1.8.0-openjdk nc gcc-c++ make nodejs
    
    sudo -H pip install awscli bcrypt
    sudo -H pip install --upgrade awscli
    sudo -H pip install --upgrade aws-ec2-assign-elastic-ip
    
    sudo npm install -g @angular/cli
    
    sudo systemctl enable docker
    sudo systemctl enable atd
    
    sudo yum clean all
    sudo rm -rf /var/cache/yum/
    exit 0

    Now, there are a lot of things mentioned here, so let's check them out. As mentioned earlier, we will be discussing different ways of connecting to a slave, and for one of them we need xmlstarlet. The rest are packages that we might need in one way or another.

    Update ami_users with your actual AWS account ID. This can be found on the AWS console under Support and, inside it, Support Center.

    Validate what we have written by running packer validate amazon.json.

    Once confirmed, build the packer image by running packer build amazon.json.

    After completion, check your AWS console and you will find a new AMI created under “My AMIs”.
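
    Optionally, you can confirm this from the terminal as well; a quick check with the AWS CLI (the filter value assumes the AMI naming used in the Packer variables above):

    aws ec2 describe-images --owners self --filters "Name=name,Values=amazon-linux-for-jenkins-*" --query "Images[].{Name:Name,Id:ImageId}" --output table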

    It’s now time to start using terraform for creating the machines. 

    Prerequisite:

    1. Please make sure you create a provider.tf file.

    provider "aws" {
      region                  = "us-east-1"
      shared_credentials_file = "~/.aws/credentials"
      profile                 = "dev"
    }

    The ‘credentials file’ will contain aws_access_key_id and aws_secret_access_key.
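
    For reference, a typical ~/.aws/credentials file with a dev profile looks like this (the key values below are placeholders):

    [dev]
    aws_access_key_id     = AKIAXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX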

    2. Keep SSH keys handy for the server/slave machines. Here is a nice article highlighting how to create them, or else create them beforehand on the AWS console and reference them in the code.

    3. VPC:

    # lookup for the "default" VPC
    data "aws_vpc" "default_vpc" {
      default = true
    }
    
    # subnet list in the "default" VPC
    # The "default" VPC has all "public subnets"
    data "aws_subnet_ids" "default_public" {
      vpc_id = "${data.aws_vpc.default_vpc.id}"
    }

    Creating Terraform Script for Spinning up Jenkins Master

    Get Terraform from the Terraform downloads page.

    We will need to set up the Security Group before setting up the instance.

    # Security Group:
    resource "aws_security_group" "jenkins_server" {
      name        = "jenkins_server"
      description = "Jenkins Server: created by Terraform for [dev]"
    
      # legacy name of VPC ID
      vpc_id = "${data.aws_vpc.default_vpc.id}"
    
      tags {
        Name = "jenkins_server"
        env  = "dev"
      }
    }
    
    ###############################################################################
    # ALL INBOUND
    ###############################################################################
    
    # ssh
    resource "aws_security_group_rule" "jenkins_server_from_source_ingress_ssh" {
      type              = "ingress"
      from_port         = 22
      to_port           = 22
      protocol          = "tcp"
      security_group_id = "${aws_security_group.jenkins_server.id}"
      cidr_blocks       = ["<Your Public IP>/32", "172.0.0.0/8"]
      description       = "ssh to jenkins_server"
    }
    
    # web
    resource "aws_security_group_rule" "jenkins_server_from_source_ingress_webui" {
      type              = "ingress"
      from_port         = 8080
      to_port           = 8080
      protocol          = "tcp"
      security_group_id = "${aws_security_group.jenkins_server.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "jenkins server web"
    }
    
    # JNLP
    resource "aws_security_group_rule" "jenkins_server_from_source_ingress_jnlp" {
      type              = "ingress"
      from_port         = 33453
      to_port           = 33453
      protocol          = "tcp"
      security_group_id = "${aws_security_group.jenkins_server.id}"
      cidr_blocks       = ["172.31.0.0/16"]
      description       = "jenkins server JNLP Connection"
    }
    
    ###############################################################################
    # ALL OUTBOUND
    ###############################################################################
    
    resource "aws_security_group_rule" "jenkins_server_to_other_machines_ssh" {
      type              = "egress"
      from_port         = 22
      to_port           = 22
      protocol          = "tcp"
      security_group_id = "${aws_security_group.jenkins_server.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins servers to ssh to other machines"
    }
    
    resource "aws_security_group_rule" "jenkins_server_outbound_all_80" {
      type              = "egress"
      from_port         = 80
      to_port           = 80
      protocol          = "tcp"
      security_group_id = "${aws_security_group.jenkins_server.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins servers for outbound yum"
    }
    
    resource "aws_security_group_rule" "jenkins_server_outbound_all_443" {
      type              = "egress"
      from_port         = 443
      to_port           = 443
      protocol          = "tcp"
      security_group_id = "${aws_security_group.jenkins_server.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins servers for outbound yum"
    }

    Now that we have a custom AMI and security groups, let's use them to create an EC2 instance with Terraform.

    # AMI lookup for this Jenkins Server
    data "aws_ami" "jenkins_server" {
      most_recent      = true
      owners           = ["self"]
    
      filter {
        name   = "name"
        values = ["amazon-linux-for-jenkins*"]
      }
    }
    
    resource "aws_key_pair" "jenkins_server" {
      key_name   = "jenkins_server"
      public_key = "${file("jenkins_server.pub")}"
    }
    
    # lookup the security group of the Jenkins Server
    data "aws_security_group" "jenkins_server" {
      filter {
        name   = "group-name"
        values = ["jenkins_server"]
      }
    }
    
    # userdata for the Jenkins server ...
    data "template_file" "jenkins_server" {
      template = "${file("scripts/jenkins_server.sh")}"
    
      vars {
        env = "dev"
        jenkins_admin_password = "mysupersecretpassword"
      }
    }
    
    # the Jenkins server itself
    resource "aws_instance" "jenkins_server" {
      ami                    		= "${data.aws_ami.jenkins_server.image_id}"
      instance_type          		= "t3.medium"
      key_name               		= "${aws_key_pair.jenkins_server.key_name}"
      subnet_id              		= "${data.aws_subnet_ids.default_public.ids[0]}"
      vpc_security_group_ids 		= ["${data.aws_security_group.jenkins_server.id}"]
      iam_instance_profile   		= "dev_jenkins_server"
      user_data              		= "${data.template_file.jenkins_server.rendered}"
    
      tags {
        "Name" = "jenkins_server"
      }
    
      root_block_device {
        delete_on_termination = true
      }
    }
    
    output "jenkins_server_ami_name" {
        value = "${data.aws_ami.jenkins_server.name}"
    }
    
    output "jenkins_server_ami_id" {
        value = "${data.aws_ami.jenkins_server.id}"
    }
    
    output "jenkins_server_public_ip" {
      value = "${aws_instance.jenkins_server.public_ip}"
    }
    
    output "jenkins_server_private_ip" {
      value = "${aws_instance.jenkins_server.private_ip}"
    }

    As mentioned before, we will be discussing multiple ways in which we can connect the slaves to the Jenkins master. It is already known that every time a new Jenkins instance comes up, it generates a unique admin password. There are two ways to deal with this: one is to wait for Jenkins to spin up and retrieve that password, and the other is to directly set the admin password while creating the Jenkins master. Here we will discuss how to change the password when configuring Jenkins. (If you need the script to retrieve the Jenkins password as soon as it gets created, leave a comment and I will share that with you as well.)

    Below is the user data to install Jenkins master, configure its password and install required packages.

    #!/bin/bash
    
    set -x
    
    function wait_for_jenkins()
    {
      while (( 1 )); do
          echo "waiting for Jenkins to launch on port [8080] ..."
          
          nc -zv 127.0.0.1 8080
          if (( $? == 0 )); then
              break
          fi
    
          sleep 10
      done
    
      echo "Jenkins launched"
    }
    
    function updating_jenkins_master_password ()
    {
      cat > /tmp/jenkinsHash.py <<EOF
    import bcrypt
    import sys
    if not sys.argv[1]:
      sys.exit(10)
    plaintext_pwd=sys.argv[1]
    encrypted_pwd=bcrypt.hashpw(sys.argv[1], bcrypt.gensalt(rounds=10, prefix=b"2a"))
    isCorrect=bcrypt.checkpw(plaintext_pwd, encrypted_pwd)
    if not isCorrect:
      sys.exit(20);
    print "{}".format(encrypted_pwd)
    EOF
    
      chmod +x /tmp/jenkinsHash.py
      
      # Wait till /var/lib/jenkins/users/admin* folder gets created
      sleep 10
    
      cd /var/lib/jenkins/users/admin*
      pwd
      while (( 1 )); do
          echo "Waiting for Jenkins to generate admin user's config file ..."
    
          if [[ -f "./config.xml" ]]; then
              break
          fi
    
          sleep 10
      done
    
      echo "Admin config file created"
    
      admin_password=$(python /tmp/jenkinsHash.py ${jenkins_admin_password} 2>&1)
      
      # Please do not remove alter quote as it keeps the hash syntax intact or else while substitution, $<character> will be replaced by null
      xmlstarlet -q ed --inplace -u "/user/properties/hudson.security.HudsonPrivateSecurityRealm_-Details/passwordHash" -v '#jbcrypt:'"$admin_password" config.xml
    
      # Restart
      systemctl restart jenkins
      sleep 10
    }
    
    function install_packages ()
    {
    
      wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat-stable/jenkins.repo
      rpm --import https://jenkins-ci.org/redhat/jenkins-ci.org.key
      yum install -y jenkins
    
      # firewall
      #firewall-cmd --permanent --new-service=jenkins
      #firewall-cmd --permanent --service=jenkins --set-short="Jenkins Service Ports"
      #firewall-cmd --permanent --service=jenkins --set-description="Jenkins Service firewalld port exceptions"
      #firewall-cmd --permanent --service=jenkins --add-port=8080/tcp
      #firewall-cmd --permanent --add-service=jenkins
      #firewall-cmd --zone=public --add-service=http --permanent
      #firewall-cmd --reload
      systemctl enable jenkins
      systemctl restart jenkins
      sleep 10
    }
    
    function configure_jenkins_server ()
    {
      # Jenkins cli
      echo "installing the Jenkins cli ..."
      cp /var/cache/jenkins/war/WEB-INF/jenkins-cli.jar /var/lib/jenkins/jenkins-cli.jar
    
      # Getting initial password
      # PASSWORD=$(cat /var/lib/jenkins/secrets/initialAdminPassword)
      PASSWORD="${jenkins_admin_password}"
      sleep 10
    
      jenkins_dir="/var/lib/jenkins"
      plugins_dir="$jenkins_dir/plugins"
    
      cd $jenkins_dir
    
      # Open JNLP port
      xmlstarlet -q ed --inplace -u "/hudson/slaveAgentPort" -v 33453 config.xml
    
      cd $plugins_dir || { echo "unable to chdir to [$plugins_dir]"; exit 1; }
    
      # List of plugins that are needed to be installed 
      plugin_list="git-client git github-api github-oauth github MSBuild ssh-slaves workflow-aggregator ws-cleanup"
    
      # remove existing plugins, if any ...
      rm -rfv $plugin_list
    
      for plugin in $plugin_list; do
          echo "installing plugin [$plugin] ..."
          java -jar $jenkins_dir/jenkins-cli.jar -s http://127.0.0.1:8080/ -auth admin:$PASSWORD install-plugin $plugin
      done
    
      # Restart jenkins after installing plugins
      java -jar $jenkins_dir/jenkins-cli.jar -s http://127.0.0.1:8080 -auth admin:$PASSWORD safe-restart
    }
    
    ### script starts here ###
    
    install_packages
    
    wait_for_jenkins
    
    updating_jenkins_master_password
    
    wait_for_jenkins
    
    configure_jenkins_server
    
    echo "Done"
    exit 0
    

    There is a lot of stuff covered here, but the trickiest bit is changing the Jenkins password. We are using a Python script which uses bcrypt to hash the plain text in Jenkins' encryption format, and xmlstarlet to replace that password in the actual location. We are also using xmlstarlet to edit the JNLP port for the Windows slave. Do remember that the initial username for Jenkins is admin.

    Commands to run: initialize Terraform with terraform init, then check and apply with terraform plan followed by terraform apply.

    After successfully running the apply command, go to the AWS console and check for a new instance coming up. Hit <public-ip>:8080, enter the credentials you passed, and you will have your Jenkins master ready to be used.

    Note: I will be providing the terraform script and permission list of IAM roles for the user at the end of the blog.

    Creating a Terraform Script for Spinning up the Linux Slave and Connecting It to the Master

    We won’t be creating a new image here; rather, we will use the same one that we used for the Jenkins master.

    The VPC will be the same, and the updated security groups for the slave are below:

    resource "aws_security_group" "dev_jenkins_worker_linux" {
      name        = "dev_jenkins_worker_linux"
      description = "Jenkins Server: created by Terraform for [dev]"
    
    # legacy name of VPC ID
      vpc_id = "${data.aws_vpc.default_vpc.id}"
    
      tags {
        Name = "dev_jenkins_worker_linux"
        env  = "dev"
      }
    }
    
    ###############################################################################
    # ALL INBOUND
    ###############################################################################
    
    # ssh
    resource "aws_security_group_rule" "jenkins_worker_linux_from_source_ingress_ssh" {
      type              = "ingress"
      from_port         = 22
      to_port           = 22
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
      cidr_blocks       = ["<Your Public IP>/32"]
      description       = "ssh to jenkins_worker_linux"
    }
    
    # ssh
    resource "aws_security_group_rule" "jenkins_worker_linux_from_source_ingress_webui" {
      type              = "ingress"
      from_port         = 8080
      to_port           = 8080
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "ssh to jenkins_worker_linux"
    }
    
    
    ###############################################################################
    # ALL OUTBOUND
    ###############################################################################
    
    resource "aws_security_group_rule" "jenkins_worker_linux_to_all_80" {
      type              = "egress"
      from_port         = 80
      to_port           = 80
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins worker to all 80"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_linux_to_all_443" {
      type              = "egress"
      from_port         = 443
      to_port           = 443
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins worker to all 443"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_linux_to_other_machines_ssh" {
      type              = "egress"
      from_port         = 22
      to_port           = 22
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_linux.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins worker linux to jenkins server"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_linux_to_jenkins_server_8080" {
      type                     = "egress"
      from_port                = 8080
      to_port                  = 8080
      protocol                 = "tcp"
      security_group_id        = "${aws_security_group.dev_jenkins_worker_linux.id}"
      source_security_group_id = "${aws_security_group.jenkins_server.id}"
      description              = "allow jenkins workers linux to jenkins server"
    }

    Now that we have the required security groups in place, it is time to look at the Terraform script for the Linux slave.

    data "aws_ami" "jenkins_worker_linux" {
      most_recent      = true
      owners           = ["self"]
    
      filter {
        name   = "name"
        values = ["amazon-linux-for-jenkins*"]
      }
    }
    
    resource "aws_key_pair" "jenkins_worker_linux" {
      key_name   = "jenkins_worker_linux"
      public_key = "${file("jenkins_worker.pub")}"
    }
    
    data "local_file" "jenkins_worker_pem" {
      filename = "${path.module}/jenkins_worker.pem"
    }
    
    data "template_file" "userdata_jenkins_worker_linux" {
      template = "${file("scripts/jenkins_worker_linux.sh")}"
    
      vars {
        env         = "dev"
        region      = "us-east-1"
        datacenter  = "dev-us-east-1"
        node_name   = "us-east-1-jenkins_worker_linux"
        domain      = ""
        device_name = "eth0"
        server_ip   = "${aws_instance.jenkins_server.private_ip}"
        worker_pem  = "${data.local_file.jenkins_worker_pem.content}"
        jenkins_username = "admin"
        jenkins_password = "mysupersecretpassword"
      }
    }
    
    # look up the security group of the Jenkins Linux worker
    data "aws_security_group" "jenkins_worker_linux" {
      filter {
        name   = "group-name"
        values = ["dev_jenkins_worker_linux"]
      }
    }
    
    resource "aws_launch_configuration" "jenkins_worker_linux" {
      name_prefix                 = "dev-jenkins-worker-linux"
      image_id                    = "${data.aws_ami.jenkins_worker_linux.image_id}"
      instance_type               = "t3.medium"
      iam_instance_profile        = "dev_jenkins_worker_linux"
      key_name                    = "${aws_key_pair.jenkins_worker_linux.key_name}"
      security_groups             = ["${data.aws_security_group.jenkins_worker_linux.id}"]
      user_data                   = "${data.template_file.userdata_jenkins_worker_linux.rendered}"
      associate_public_ip_address = false
    
      root_block_device {
        delete_on_termination = true
        volume_size = 100
      }
    
      lifecycle {
        create_before_destroy = true
      }
    }
    
    resource "aws_autoscaling_group" "jenkins_worker_linux" {
      name                      = "dev-jenkins-worker-linux"
      min_size                  = "1"
      max_size                  = "2"
      desired_capacity          = "2"
      health_check_grace_period = 60
      health_check_type         = "EC2"
      vpc_zone_identifier       = ["${data.aws_subnet_ids.default_public.ids}"]
      launch_configuration      = "${aws_launch_configuration.jenkins_worker_linux.name}"
      termination_policies      = ["OldestLaunchConfiguration"]
      wait_for_capacity_timeout = "10m"
      default_cooldown          = 60
    
      tags = [
        {
          key                 = "Name"
          value               = "dev_jenkins_worker_linux"
          propagate_at_launch = true
        },
        {
          key                 = "class"
          value               = "dev_jenkins_worker_linux"
          propagate_at_launch = true
        },
      ]
    }

    And now the final piece of code: the user data for the slave machine.

    #!/bin/bash
    
    set -x
    
    function wait_for_jenkins ()
    {
        echo "Waiting jenkins to launch on 8080..."
    
        while (( 1 )); do
            echo "Waiting for Jenkins"
    
            nc -zv ${server_ip} 8080
            if (( $? == 0 )); then
                break
            fi
    
            sleep 10
        done
    
        echo "Jenkins launched"
    }
    
    function slave_setup()
    {
        # Wait till jar file gets available
        ret=1
        while (( $ret != 0 )); do
            wget -O /opt/jenkins-cli.jar http://${server_ip}:8080/jnlpJars/jenkins-cli.jar
            ret=$?
    
            echo "jenkins cli ret [$ret]"
        done
    
        ret=1
        while (( $ret != 0 )); do
            wget -O /opt/slave.jar http://${server_ip}:8080/jnlpJars/slave.jar
            ret=$?
    
            echo "jenkins slave ret [$ret]"
        done
        
        mkdir -p /opt/jenkins-slave
        chown -R ec2-user:ec2-user /opt/jenkins-slave
    
        # Register_slave
        JENKINS_URL="http://${server_ip}:8080"
    
        USERNAME="${jenkins_username}"
        
        # PASSWORD=$(cat /tmp/secret)
        PASSWORD="${jenkins_password}"
    
        SLAVE_IP=$(ip -o -4 addr list ${device_name} | head -n1 | awk '{print $4}' | cut -d/ -f1)
        NODE_NAME=$(echo "jenkins-slave-linux-$SLAVE_IP" | tr '.' '-')
        NODE_SLAVE_HOME="/opt/jenkins-slave"
        EXECUTORS=2
        SSH_PORT=22
    
        CRED_ID="$NODE_NAME"
        LABELS="build linux docker"
        USERID="ec2-user"
    
        cd /opt
        
        # Creating CMD utility for jenkins-cli commands
        jenkins_cmd="java -jar /opt/jenkins-cli.jar -s $JENKINS_URL -auth $USERNAME:$PASSWORD"
    
        # Waiting for Jenkins to load all plugins
        while (( 1 )); do
    
          count=$($jenkins_cmd list-plugins 2>/dev/null | wc -l)
          ret=$?
    
          echo "count [$count] ret [$ret]"
    
          if (( $count > 0 )); then
              break
          fi
    
          sleep 30
        done
    
        # Delete Credentials if present for respective slave machines
        $jenkins_cmd delete-credentials system::system::jenkins _ $CRED_ID
    
        # Generating cred.xml for creating credentials on Jenkins server
        cat > /tmp/cred.xml <<EOF
    <com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey plugin="ssh-credentials@1.16">
      <scope>GLOBAL</scope>
      <id>$CRED_ID</id>
      <description>Generated via Terraform for $SLAVE_IP</description>
      <username>$USERID</username>
      <privateKeySource class="com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey\$DirectEntryPrivateKeySource">
        <privateKey>${worker_pem}</privateKey>
      </privateKeySource>
    </com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey>
    EOF
    
        # Creating credential using cred.xml
        cat /tmp/cred.xml | $jenkins_cmd create-credentials-by-xml system::system::jenkins _
    
        # For Deleting Node, used when testing
        $jenkins_cmd delete-node $NODE_NAME
        
        # Generating node.xml for creating node on Jenkins server
        cat > /tmp/node.xml <<EOF
    <slave>
      <name>$NODE_NAME</name>
      <description>Linux Slave</description>
      <remoteFS>$NODE_SLAVE_HOME</remoteFS>
      <numExecutors>$EXECUTORS</numExecutors>
      <mode>NORMAL</mode>
      <retentionStrategy class="hudson.slaves.RetentionStrategy\$Always"/>
      <launcher class="hudson.plugins.sshslaves.SSHLauncher" plugin="ssh-slaves@1.5">
        <host>$SLAVE_IP</host>
        <port>$SSH_PORT</port>
        <credentialsId>$CRED_ID</credentialsId>
      </launcher>
      <label>$LABELS</label>
      <nodeProperties/>
      <userId>$USERID</userId>
    </slave>
    EOF
    
      sleep 10
      
      # Creating node using node.xml
      cat /tmp/node.xml | $jenkins_cmd create-node $NODE_NAME
    }
    
    ### script begins here ###
    
    wait_for_jenkins
    
    slave_setup
    
    echo "Done"
    exit 0

    This will not only create a node on the Jenkins master but also attach the slave to it.

    Commands to run: initialize Terraform with terraform init, then review and apply the changes with terraform plan followed by terraform apply.

    One drawback of this approach is that if the slave gets disconnected or goes down, it remains on the Jenkins master as an offline node and will not automatically reattach itself to the master.

    Some possible solutions are:

    1. Create a cron job on the slave which will run user-data after a certain interval.

    2. Use swarm plugin.

    3. As we are on AWS, we can even use Amazon EC2 Plugin.

    Maybe in a future blog, we will cover using both of these plugins as well.

    Using Packer to Create AMIs for the Windows Slave

    The Windows AMI will also be created using Packer. All the pointers covered for Linux apply to Windows as well.

    {
      "variables": {
        "ami-description": "Windows Server for Jenkins Slave ({{isotime \"2006-01-02-15-04-05\"}})",
        "ami-name": "windows-slave-for-jenkins-{{isotime \"2006-01-02-15-04-05\"}}",
        "aws_access_key": "",
        "aws_secret_key": ""
      },
    
      "builders": [
        {
          "ami_description": "{{user `ami-description`}}",
          "ami_name": "{{user `ami-name`}}",
          "ami_regions": [
            "us-east-1"
          ],
          "ami_users": [
            "XXXXXXXXXX"
          ],
          "ena_support": "true",
          "instance_type": "t3.medium",
          "region": "us-east-1",
          "source_ami_filter": {
            "filters": {
              "name": "Windows_Server-2016-English-Full-Containers-*",
              "root-device-type": "ebs",
              "virtualization-type": "hvm"
            },
            "most_recent": true,
            "owners": [
              "amazon"
            ]
          },
          "sriov_support": "true",
          "user_data_file": "scripts/SetUpWinRM.ps1",
          "communicator": "winrm",
          "winrm_username": "Administrator",
          "winrm_insecure": true,
          "winrm_use_ssl": true,
          "tags": {
            "Name": "{{user `ami-name`}}"
          },
          "type": "amazon-ebs"
        }
      ],
      "post-processors": [
      {
        "inline": [
          "echo AMI Name {{user `ami-name`}}",
          "date",
          "exit 0"
        ],
        "type": "shell-local"
      }
      ],
      "provisioners": [
        {
          "type": "powershell",
          "valid_exit_codes": [ 0, 3010 ],
          "scripts": [
            "scripts/disable-uac.ps1",
            "scripts/enable-rdp.ps1",
            "install_windows.ps1"
          ]
        },
        {
          "type": "windows-restart",
          "restart_check_command": "powershell -command \"& {Write-Output 'restarted.'}\""
        },
        {
          "type": "powershell",
          "inline": [
            "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule",
            "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\SysprepInstance.ps1 -NoShutdown"
          ]
        }
      ]
    }

    Now, when it comes to Windows, one should know that it does not behave the same way Linux does. For us to be able to communicate with this image, an essential component is WinRM, which we set up at the very beginning as part of user_data_file. Windows also asks for user input for a lot of things; while automating, it is not possible to provide that input without breaking the flow of execution, so we disable UAC and enable RDP so that we can connect to the machine from our local desktop for debugging if needed. Finally, we execute the install_windows.ps1 file, which sets up our slave. Please note that at the end we call two PowerShell scripts (InitializeInstance.ps1 and SysprepInstance.ps1) to generate a random password every time a new machine is created. They are mandatory; without them you will never be able to log in to your machines.

    There are multiple scripts referenced in the code above; let’s look at them in their order of appearance.

    SetUpWinRM.ps1:

    <powershell>
    
    write-output "Running User Data Script"
    write-host "(host) Running User Data Script"
    
    Set-ExecutionPolicy Unrestricted -Scope LocalMachine -Force -ErrorAction Ignore
    
    # Don't set this before Set-ExecutionPolicy as it throws an error
    $ErrorActionPreference = "stop"
    
    # Remove HTTP listener
    Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse
    
    $Cert = New-SelfSignedCertificate -CertstoreLocation Cert:\LocalMachine\My -DnsName "packer"
    New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $Cert.Thumbprint -Force
    
    # WinRM
    write-output "Setting up WinRM"
    write-host "(host) setting up WinRM"
    
    cmd.exe /c winrm quickconfig -q
    cmd.exe /c winrm set "winrm/config" '@{MaxTimeoutms="1800000"}'
    cmd.exe /c winrm set "winrm/config/winrs" '@{MaxMemoryPerShellMB="1024"}'
    cmd.exe /c winrm set "winrm/config/service" '@{AllowUnencrypted="true"}'
    cmd.exe /c winrm set "winrm/config/client" '@{AllowUnencrypted="true"}'
    cmd.exe /c winrm set "winrm/config/service/auth" '@{Basic="true"}'
    cmd.exe /c winrm set "winrm/config/client/auth" '@{Basic="true"}'
    cmd.exe /c winrm set "winrm/config/service/auth" '@{CredSSP="true"}'
    cmd.exe /c winrm set "winrm/config/listener?Address=*+Transport=HTTPS" "@{Port=`"5986`";Hostname=`"packer`";CertificateThumbprint=`"$($Cert.Thumbprint)`"}"
    cmd.exe /c netsh advfirewall firewall set rule group="remote administration" new enable=yes
    cmd.exe /c netsh firewall add portopening TCP 5986 "Port 5986"
    cmd.exe /c net stop winrm
    cmd.exe /c sc config winrm start= auto
    cmd.exe /c net start winrm
    
    </powershell>

    The content is pretty straightforward, as it just sets up WinRM. The only thing that really matters here is the <powershell> and </powershell> tags; they are mandatory, as Packer will otherwise not be able to tell what type of script it is. Next, we come across disable-uac.ps1 and enable-rdp.ps1, whose purpose we discussed above. The last script is the actual user data that we need to install all the required packages into the AMI.

    Chocolatey: a blessing in disguise – Installing the required applications on Windows by scripting is a real headache, as you have to write a lot of code just to install a single application. Luckily for us, there is Chocolatey. It works as a package manager for Windows and lets us install applications the same way we install packages on Linux. install_windows.ps1 contains the installation step for Chocolatey and shows how it can be used to install other applications on Windows.

    See, such a small script and you can get all the components to run your Windows application in no time (Kidding… This script actually takes around 20 minutes to run :P)

    Remaining user-data can be found here.

    Now that we have the image, let’s move on to the Terraform script that makes this machine a slave of your Jenkins master.

    Creating a Terraform Script to Spin up a Windows Slave and Connect It to the Master

    This time too, we will first create the security groups and then create the slave machine from the same AMI that we built above.

    resource "aws_security_group" "dev_jenkins_worker_windows" {
      name        = "dev_jenkins_worker_windows"
      description = "Jenkins Server: created by Terraform for [dev]"
    
      # legacy name of VPC ID
      vpc_id = "${data.aws_vpc.default_vpc.id}"
    
      tags {
        Name = "dev_jenkins_worker_windows"
        env  = "dev"
      }
    }
    
    ###############################################################################
    # ALL INBOUND
    ###############################################################################
    
    # web ui
    resource "aws_security_group_rule" "jenkins_worker_windows_from_source_ingress_webui" {
      type              = "ingress"
      from_port         = 8080
      to_port           = 8080
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "ssh to jenkins_worker_windows"
    }
    
    # rdp
    resource "aws_security_group_rule" "jenkins_worker_windows_from_rdp" {
      type              = "ingress"
      from_port         = 3389
      to_port           = 3389
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
      cidr_blocks       = ["<Your Public IP>/32"]
      description       = "rdp to jenkins_worker_windows"
    }
    
    ###############################################################################
    # ALL OUTBOUND
    ###############################################################################
    
    resource "aws_security_group_rule" "jenkins_worker_windows_to_all_80" {
      type              = "egress"
      from_port         = 80
      to_port           = 80
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins worker to all 80"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_windows_to_all_443" {
      type              = "egress"
      from_port         = 443
      to_port           = 443
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins worker to all 443"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_windows_to_jenkins_server_33453" {
      type              = "egress"
      from_port         = 33453
      to_port           = 33453
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
      cidr_blocks       = ["172.31.0.0/16"]
      description       = "allow jenkins worker windows to jenkins server"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_windows_to_jenkins_server_8080" {
      type                     = "egress"
      from_port                = 8080
      to_port                  = 8080
      protocol                 = "tcp"
      security_group_id        = "${aws_security_group.dev_jenkins_worker_windows.id}"
      source_security_group_id = "${aws_security_group.jenkins_server.id}"
      description              = "allow jenkins workers windows to jenkins server"
    }
    
    resource "aws_security_group_rule" "jenkins_worker_windows_to_all_22" {
      type              = "egress"
      from_port         = 22
      to_port           = 22
      protocol          = "tcp"
      security_group_id = "${aws_security_group.dev_jenkins_worker_windows.id}"
      cidr_blocks       = ["0.0.0.0/0"]
      description       = "allow jenkins worker windows to connect outbound from 22"
    }

    Once the security groups are in place, we move on to the Terraform file for the Windows machine itself. Windows can’t connect to the Jenkins master over SSH, the method we used for the Linux slave; instead we have to use JNLP. A quick recap: when creating the Jenkins master, we used xmlstarlet to set the JNLP port and added security group rules to allow JNLP connections. We have also opened the RDP port so that if any issue occurs, you can get into the machine and debug it.

    Terraform file:

    # Setting Up Windows Slave 
    data "aws_ami" "jenkins_worker_windows" {
      most_recent      = true
      owners           = ["self"]
    
      filter {
        name   = "name"
        values = ["windows-slave-for-jenkins*"]
      }
    }
    
    resource "aws_key_pair" "jenkins_worker_windows" {
      key_name   = "jenkins_worker_windows"
      public_key = "${file("jenkins_worker.pub")}"
    }
    
    data "template_file" "userdata_jenkins_worker_windows" {
      template = "${file("scripts/jenkins_worker_windows.ps1")}"
    
      vars {
        env         = "dev"
        region      = "us-east-1"
        datacenter  = "dev-us-east-1"
        node_name   = "us-east-1-jenkins_worker_windows"
        domain      = ""
        device_name = "eth0"
        server_ip   = "${aws_instance.jenkins_server.private_ip}"
        worker_pem  = "${data.local_file.jenkins_worker_pem.content}"
        jenkins_username = "admin"
        jenkins_password = "mysupersecretpassword"
      }
    }
    
    # look up the security group of the Jenkins Windows worker
    data "aws_security_group" "jenkins_worker_windows" {
      filter {
        name   = "group-name"
        values = ["dev_jenkins_worker_windows"]
      }
    }
    
    resource "aws_launch_configuration" "jenkins_worker_windows" {
      name_prefix                 = "dev-jenkins-worker-"
      image_id                    = "${data.aws_ami.jenkins_worker_windows.image_id}"
      instance_type               = "t3.medium"
      iam_instance_profile        = "dev_jenkins_worker_windows"
      key_name                    = "${aws_key_pair.jenkins_worker_windows.key_name}"
      security_groups             = ["${data.aws_security_group.jenkins_worker_windows.id}"]
      user_data                   = "${data.template_file.userdata_jenkins_worker_windows.rendered}"
      associate_public_ip_address = false
    
      root_block_device {
        delete_on_termination = true
        volume_size = 100
      }
    
      lifecycle {
        create_before_destroy = true
      }
    }
    
    resource "aws_autoscaling_group" "jenkins_worker_windows" {
      name                      = "dev-jenkins-worker-windows"
      min_size                  = "1"
      max_size                  = "2"
      desired_capacity          = "2"
      health_check_grace_period = 60
      health_check_type         = "EC2"
      vpc_zone_identifier       = ["${data.aws_subnet_ids.default_public.ids}"]
      launch_configuration      = "${aws_launch_configuration.jenkins_worker_windows.name}"
      termination_policies      = ["OldestLaunchConfiguration"]
      wait_for_capacity_timeout = "10m"
      default_cooldown          = 60
    
      #lifecycle {
      #  create_before_destroy = true
      #}
    
    
      ## on replacement, gives new service time to spin up before moving on to destroy
      #provisioner "local-exec" {
      #  command = "sleep 60"
      #}
    
      tags = [
        {
          key                 = "Name"
          value               = "dev_jenkins_worker_windows"
          propagate_at_launch = true
        },
        {
          key                 = "class"
          value               = "dev_jenkins_worker_windows"
          propagate_at_launch = true
        },
      ]
    }

    Finally, we reach the user data for this Terraform plan. It downloads the required jar files, creates a node on Jenkins, and registers the machine as a slave.

    <powershell>
    
    function Wait-For-Jenkins {
    
      Write-Host "Waiting jenkins to launch on 8080..."
    
      Do {
      Write-Host "Waiting for Jenkins"
    
       Nc -zv ${server_ip} 8080
       If( $? -eq $true ) {
         Break
       }
       Sleep 10
    
      } While (1)
    
      Do {
       Write-Host "Waiting for JNLP"
          
       Nc -zv ${server_ip} 33453
       If( $? -eq $true ) {
        Break
       }
       Sleep 10
    
      } While (1)      
    
      Write-Host "Jenkins launched"
    }
    
    function Slave-Setup()
    {
      # Register_slave
      $JENKINS_URL="http://${server_ip}:8080"
    
      $USERNAME="${jenkins_username}"
      
      $PASSWORD="${jenkins_password}"
    
      $AUTH = -join ("$USERNAME", ":", "$PASSWORD")
      echo $AUTH
    
      # Below IP collection logic works for Windows Server 2016 edition and needs testing for windows server 2008 edition
      $SLAVE_IP=(ipconfig | findstr /r "[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*" | findstr "IPv4 Address").substring(39) | findstr /B "172.31"
      
      $NODE_NAME="jenkins-slave-windows-$SLAVE_IP"
      
      $NODE_SLAVE_HOME="C:\Jenkins\"
      $EXECUTORS=2
      $JNLP_PORT=33453
    
      $CRED_ID="$NODE_NAME"
      $LABELS="build windows"
      
      # Creating CMD utility for jenkins-cli commands
      # This is not working in windows therefore specify full path
      $jenkins_cmd = "java -jar C:\Jenkins\jenkins-cli.jar -s $JENKINS_URL -auth admin:$PASSWORD"
    
      Sleep 20
    
      Write-Host "Downloading jenkins-cli.jar file"
      (New-Object System.Net.WebClient).DownloadFile("$JENKINS_URL/jnlpJars/jenkins-cli.jar", "C:\Jenkins\jenkins-cli.jar")
    
      Write-Host "Downloading slave.jar file"
      (New-Object System.Net.WebClient).DownloadFile("$JENKINS_URL/jnlpJars/slave.jar", "C:\Jenkins\slave.jar")
    
      Sleep 10
    
      # Waiting for Jenkins to load all plugins
      Do {
      
        $count=(java -jar C:\Jenkins\jenkins-cli.jar -s $JENKINS_URL -auth $AUTH list-plugins | Measure-Object -line).Lines
        $ret=$?
    
        Write-Host "count [$count] ret [$ret]"
    
        If ( $count -gt 0 ) {
            Break
        }
    
        sleep 30
      } While ( 1 )
    
      # For Deleting Node, used when testing
      Write-Host "Deleting Node $NODE_NAME if present"
      java -jar C:\Jenkins\jenkins-cli.jar -s $JENKINS_URL -auth $AUTH delete-node $NODE_NAME
      
      # Generating node.xml for creating node on Jenkins server
      $NodeXml = @"
    <slave>
    <name>$NODE_NAME</name>
    <description>Windows Slave</description>
    <remoteFS>$NODE_SLAVE_HOME</remoteFS>
    <numExecutors>$EXECUTORS</numExecutors>
    <mode>NORMAL</mode>
    <retentionStrategy class="hudson.slaves.RetentionStrategy`$Always`"/>
    <launcher class="hudson.slaves.JNLPLauncher">
      <workDirSettings>
        <disabled>false</disabled>
        <internalDir>remoting</internalDir>
        <failIfWorkDirIsMissing>false</failIfWorkDirIsMissing>
      </workDirSettings>
    </launcher>
    <label>$LABELS</label>
    <nodeProperties/>
    </slave>
    "@
      $NodeXml | Out-File -FilePath C:\Jenkins\node.xml 
    
      type C:\Jenkins\node.xml
    
      # Creating node using node.xml
      Write-Host "Creating $NODE_NAME"
      Get-Content -Path C:\Jenkins\node.xml | java -jar C:\Jenkins\jenkins-cli.jar -s $JENKINS_URL -auth $AUTH create-node $NODE_NAME
    
      Write-Host "Registering Node $NODE_NAME via JNLP"
      Start-Process java -ArgumentList "-jar C:\Jenkins\slave.jar -jnlpCredentials $AUTH -jnlpUrl $JENKINS_URL/computer/$NODE_NAME/slave-agent.jnlp"
    }
    
    ### script begins here ###
    
    Wait-For-Jenkins
    
    Slave-Setup
    
    echo "Done"
    </powershell>
    <persist>true</persist>

    Commands to run: initialize Terraform with terraform init, then review and apply the changes with terraform plan followed by terraform apply.

    The same drawbacks apply here, and the same solutions will work as well.

    Congratulations! You now have a Jenkins master with both Windows and Linux slaves attached to it.

    IAM roles for reference

    Jenkins Master

    Linux Slave

    Windows Slave

    Bonus:

    If you want to associate IAM permissions with the user but cannot assign full access, here is a curated list for reference:

    Packer Policy

    Terraform Policy

    Conclusion:

    This blog highlights one of the ways we can use Packer and Terraform to create AMIs that serve as Jenkins master and slaves. We covered not only their creation but also how to associate security groups, and we looked at some of the basic IAM roles that can be applied. Although we have covered most of the common scenarios, the changes required for your specific use case should be small, and this can serve as boilerplate code when you begin planning your infrastructure on the cloud.

  • Web Scraping: Introduction, Best Practices & Caveats

    Web scraping is a process to crawl various websites and extract the required data using spiders. This data is processed in a data pipeline and stored in a structured format. Today, web scraping is widely used and has many use cases:

    • Using web scraping, Marketing & Sales companies can fetch lead-related information.
    • Web scraping is useful for Real Estate businesses to get the data of new projects, resale properties, etc.
    • Price comparison portals, like Trivago, extensively use web scraping to get the information of product and price from various e-commerce sites.

    The process of web scraping usually involves spiders, which fetch the HTML documents from relevant websites, extract the needed content based on the business logic, and finally store it in a structured format. This blog is a primer on building highly scalable scrapers. We will cover the following items:

    1. Ways to scrape: We’ll see basic ways to scrape data using techniques and frameworks in Python with some code snippets.
    2. Scraping at scale: Scraping a single page is straightforward, but there are challenges in scraping millions of websites, including managing the spider code, collecting data, and maintaining a data warehouse. We’ll explore such challenges and their solutions to make scraping easy and accurate.
    3. Scraping Guidelines: Scraping data from websites without the owner’s permission can be deemed malicious. Certain guidelines need to be followed to ensure our scrapers are not blacklisted. We’ll look at some of the best practices one should follow while crawling.

    So let’s start scraping. 

    Different Techniques for Scraping

    Here, we will discuss how to scrape a page and the different libraries available in Python.

    Note: Python is the most popular language for scraping.  

    1. Requests – HTTP Library in Python: To scrape a website or a page, first fetch the content of the HTML page into an HTTP response object. The requests library from Python is pretty handy and easy to use; it uses urllib3 under the hood. I like ‘requests’ because it’s simple and the code stays readable too.

    #Example showing how to use the requests library
    import requests
    r = requests.get("https://velotio.com") #Fetch HTML Page

    2. BeautifulSoup: Once you get the webpage, the next step is to extract the data. BeautifulSoup is a powerful Python library that helps you extract data from a page. It’s easy to use and has a wide range of APIs for extraction. We use the requests library to fetch the HTML page and then use BeautifulSoup to parse it. In this example, we can easily fetch the page title and all links on the page. Check out the documentation for all the possible ways in which we can use BeautifulSoup.

    from bs4 import BeautifulSoup
    import requests
    r = requests.get("https://velotio.com") #Fetch HTML Page
    soup = BeautifulSoup(r.text, "html.parser") #Parse HTML Page
    print "Webpage Title:" + soup.title.string
    print "Fetch All Links:" soup.find_all('a')

    3. Python Scrapy Framework:

    Scrapy is a Python-based web scraping framework that allows you to create different kinds of spiders to fetch the source code of the target website. Scrapy starts crawling the web pages present on a certain website, and then you can write the extraction logic to get the required data. Scrapy is built on top of Twisted, a Python-based asynchronous networking library that performs the requests asynchronously to boost spider performance. Scrapy is faster than BeautifulSoup. Moreover, it is a framework for writing scrapers, as opposed to BeautifulSoup, which is just a library for parsing HTML pages.

    Here is a simple example of how to use Scrapy. Install Scrapy via pip. Scrapy gives a shell after parsing a website:

    $ pip install scrapy #Install Scrapy
    $ scrapy shell https://velotio.com
    In [1]: response.xpath("//a").extract() #Fetch all a hrefs

    Now, let’s write a custom spider to parse a website.

    $ cat > myspider.py << EOF
    import scrapy
    
    class BlogSpider(scrapy.Spider):
        name = 'blogspider'
        start_urls = ['https://blog.scrapinghub.com']
    
        def parse(self, response):
            for title in response.css('h2.entry-title'):
                yield {'title': title.css('a ::text').extract_first()}
    EOF
    $ scrapy runspider myspider.py

    That’s it. Your first custom spider is created. Now, let’s understand the code.

    • name: Name of the spider. In this case, it’s “blogspider”.
    • start_urls: A list of URLs where the spider will begin to crawl from.
    • parse(self, response): This function is called whenever the crawler successfully crawls a URL. The response object used earlier in the Scrapy shell is the same response object that is passed to the parse(..).

    When you run this, Scrapy will crawl the start URLs, find every element with the h2.entry-title class, and extract the associated link text from it. Alternatively, you can write your extraction logic directly in the parse method, or create a separate class for extraction and call its object from the parse method.

    You’ve seen how to extract simple items from a website using Scrapy, but this is just the surface. Scrapy provides a lot of powerful features for making scraping easy and efficient. Here is a tutorial for Scrapy and the additional documentation for LinkExtractor by which you can instruct Scrapy to extract links from a web page.

    4. Python lxml.html library: This is another library from Python, just like BeautifulSoup; in fact, Scrapy uses lxml internally. It comes with a list of APIs you can use for data extraction. Why would you use it when Scrapy itself can extract the data? Say you want to iterate over every ‘div’ tag and perform some operation on each tag present under it; this library gives you the list of ‘div’ tags, and you can then iterate over them with the iter() function and traverse each child tag inside the parent div tag. Such traversal operations are difficult to express with scraping selectors alone. Here is the documentation for this library.
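
    Below is a minimal sketch of that kind of traversal with lxml.html; the URL and the printed fields are only illustrative.

    # A small lxml.html sketch: fetch a page, list its div tags, and walk each one's children
    import requests
    from lxml import html
    
    r = requests.get("https://velotio.com")
    tree = html.fromstring(r.content)
    
    for div in tree.findall(".//div"):
        for child in div.iter():
            if isinstance(child.tag, str):  # skip comments and processing instructions
                print(child.tag, (child.text or "").strip())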

    Challenges while Scraping at Scale

    Let’s look at the challenges and solutions while scraping at large scale, i.e., scraping 100-200 websites regularly:

    1. Data warehousing: Data extraction at a large scale generates vast volumes of information. Fault tolerance, scalability, security, and high availability are must-have features for a data warehouse. If your data warehouse is not stable or accessible, then operations like searching and filtering the data become an overhead. To achieve this, instead of maintaining your own database or infrastructure, you can use Amazon Web Services (AWS). You can use RDS (Relational Database Service) for a structured database and DynamoDB for a non-relational database. AWS takes care of backing up the data, automatically takes snapshots of the database, and gives you database error logs as well. This blog explains how to set up infrastructure in the cloud for scraping.

    2. Pattern Changes: Scraping relies heavily on the user interface and its structure, i.e., CSS and XPath. If the target website changes, our scraper may crash completely or return random data that we don’t want. This is a common scenario, and that’s why it’s more difficult to maintain scrapers than to write them. To handle this case, we can write test cases for the extraction logic and run them daily, either manually or from CI tools like Jenkins, to track whether the target website has changed; a small sketch of such a test follows.
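
    Here is a hedged sketch of such a test; the fixture file name and the selector are assumptions for illustration, and it can be run with pytest from a daily Jenkins job.

    # Assumed fixture: a saved copy of the target page at fixtures/listing_page.html
    from bs4 import BeautifulSoup
    
    def test_title_selector_still_matches():
        with open("fixtures/listing_page.html", encoding="utf-8") as f:
            soup = BeautifulSoup(f.read(), "html.parser")
        titles = soup.select("h2.entry-title a")
        # If the site layout changed, this assertion fails and the CI job alerts us
        assert titles, "h2.entry-title selector matched nothing; page structure may have changed"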

    3. Anti-scraping Technologies: Web scraping is a common thing these days, and every website host would want to prevent their data from being scraped; anti-scraping technologies help them do that. For example, if you hit a particular website from the same IP address at a regular interval, the target website can block your IP. Adding a captcha to a website also helps. There are methods by which we can bypass these anti-scraping measures. For example, we can use proxy servers to hide our original IP; several proxy services keep rotating the IP before each request. It is also easy to add proxy support in the code, and in Python the Scrapy framework supports it, as sketched below.
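
    A minimal sketch of routing Scrapy requests through a proxy via the built-in HttpProxyMiddleware; the proxy address is a placeholder for whatever rotating proxy service you use.

    import scrapy
    
    class ProxiedSpider(scrapy.Spider):
        name = "proxied"
        start_urls = ["https://velotio.com"]
    
        def start_requests(self):
            for url in self.start_urls:
                # Scrapy's HttpProxyMiddleware honours the 'proxy' key in request meta;
                # a rotating proxy service would hand out a different address per request
                yield scrapy.Request(url, meta={"proxy": "http://my-rotating-proxy.example:8000"})
    
        def parse(self, response):
            yield {"title": response.css("title::text").get()}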

    4. JavaScript-based dynamic content: Websites that heavily rely on JavaScript and Ajax to render dynamic content make data extraction difficult. Scrapy and similar frameworks/libraries will only extract what they find in the HTML document; Ajax calls and JavaScript are executed at runtime, so that content can’t be scraped directly. This can be handled by rendering the web page in a headless browser such as headless Chrome, which essentially allows running Chrome in a server environment. You can also use PhantomJS, which provides a headless WebKit-based environment.
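
    As an illustration, here is a small sketch that renders a page in headless Chrome with Selenium before handing the HTML to BeautifulSoup; it assumes Chrome and a matching chromedriver are installed.

    from selenium import webdriver
    from bs4 import BeautifulSoup
    
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")   # run Chrome without a display, e.g. on a server
    
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://velotio.com")
        # page_source now contains the DOM after JavaScript has executed
        soup = BeautifulSoup(driver.page_source, "html.parser")
        print(soup.title.string)
    finally:
        driver.quit()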

    5. Honeypot traps: Some websites place honeypot traps on their webpages to detect web crawlers. These are hard to detect, as most of the links are blended with the background color or have their CSS display property set to none. Building and avoiding them requires large coding efforts on the server and the crawler side respectively, hence this method is not frequently used.

    6. Quality of data: Currently, AI and ML projects are in high demand, and these projects need data at a large scale. Data integrity is also important, as one fault can cause serious problems in AI/ML algorithms. So, in scraping, it is very important not just to scrape the data but to verify its integrity as well. Doing this in real time is not always possible, so I prefer to write test cases for the extraction logic to make sure whatever your spiders are extracting is correct and that they are not scraping any bad data.

    7. More Data, More Time: This one is obvious. The larger a website is, the more data it contains, and the longer it takes to scrape that site. This may be fine if your purpose for scanning the site isn’t time-sensitive, but that isn’t often the case. Stock prices don’t stay the same over hours. Sales listings, currency exchange rates, media trends, and market prices are just a few examples of time-sensitive data. What to do in this case, then? Well, one solution is to design your spiders carefully. If you’re using a framework like Scrapy, apply proper LinkExtractor rules so that the spider does not waste time scraping unrelated URLs (see the sketch after this item).

    You may use multithreading scraping packages available in Python, such as Frontera and Scrapy Redis. Frontera lets you send out only one request per domain at a time, but can hit multiple domains at once, making it great for parallel scraping. Scrapy Redis lets you send out multiple requests to one domain. The right combination of these can result in a very powerful web spider that can handle both the bulk and variation for large websites.
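
    As mentioned above, careful LinkExtractor rules keep a spider focused; here is a hedged sketch in which the domain and URL patterns are assumptions for illustration only.

    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor
    
    class FocusedSpider(CrawlSpider):
        name = "focused"
        allowed_domains = ["blog.scrapinghub.com"]
        start_urls = ["https://blog.scrapinghub.com"]
    
        rules = (
            # Follow pagination links but don't parse them
            Rule(LinkExtractor(allow=(r"/page/\d+",)), follow=True),
            # Parse only post-like URLs, skipping tag and author listings
            Rule(LinkExtractor(allow=(r"/\d{4}/\d{2}/",), deny=(r"/tag/", r"/author/")),
                 callback="parse_post"),
        )
    
        def parse_post(self, response):
            yield {"title": response.css("h1::text").get()}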

    8. Captchas: Captchas are a good way of keeping crawlers away from a website, and many website hosts use them. So, in order to scrape data from such websites, we need a mechanism to solve the captchas. There are packages and services that can solve the captcha and act as a middleware between the target website and your spider. You may also use libraries like Pillow and Tesseract in Python to solve simple image-based captchas.
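
    For simple, undistorted image captchas, a sketch like the following can work; it assumes the Tesseract binary plus the pytesseract and Pillow packages are installed, and captcha.png is a placeholder file name.

    from PIL import Image
    import pytesseract
    
    img = Image.open("captcha.png").convert("L")   # grayscale often improves OCR accuracy
    text = pytesseract.image_to_string(img)
    print(text.strip())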

    9. Maintaining Deployment: Normally, we don’t want to limit ourselves to scraping just a few websites. We want the maximum amount of data that is present on the Internet, and that may mean scraping millions of websites. Now you can imagine the size of the code base and the deployment. We can’t run spiders at this scale from a single machine. What I prefer here is to dockerize the scrapers and take advantage of technologies like AWS ECS or Kubernetes to run our scraper containers. This keeps our scrapers highly available and easy to maintain, and we can schedule them to run at regular intervals.

    Scraping Guidelines/ Best Practices

    1. Respect the robots.txt file: robots.txt is a text file that webmasters create to instruct search engine robots on how to crawl and index pages on the website, and it generally contains the instructions for crawlers. Before even planning the extraction logic, you should check this file. You can usually find it at the root of the website, e.g., https://example.com/robots.txt. It holds the rules for how crawlers should interact with the website. For example, if a website has a link to download critical information, they probably don’t want to expose it to crawlers. Another important directive is the crawl delay, which means crawlers should only hit the website at the specified interval. If someone has asked us not to crawl their website, then we had better not do it, because if they catch your crawlers, it can lead to serious legal issues. A small sketch of checking robots.txt from Python follows.
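
    Here is a quick sketch using Python’s built-in robotparser to check whether a URL may be crawled and what crawl delay the site requests; the URLs are examples.

    from urllib import robotparser
    
    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()
    
    print(rp.can_fetch("*", "https://example.com/blog"))  # True if crawling this path is allowed
    print(rp.crawl_delay("*"))                            # Crawl-delay directive, if the site sets one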

    2. Do not hit the servers too frequently: As mentioned above, some websites specify a crawl delay for crawlers. Use it wisely, because not every website is tested against high load. If you hit the site at a constant, aggressive rate, it creates huge traffic on the server side, and the site may crash or fail to serve other requests. This has a high impact on user experience, and users are more important than the bots. So, make requests according to the interval specified in robots.txt, or use a standard delay of 10 seconds. This also helps you avoid getting blocked by the target website; a sketch of the corresponding Scrapy settings follows.
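
    In Scrapy, this kind of politeness is configured in settings.py; the values below are examples and should be tuned to the target site.

    # settings.py (example values)
    ROBOTSTXT_OBEY = True               # respect robots.txt rules
    DOWNLOAD_DELAY = 10                 # seconds to wait between requests to the same site
    CONCURRENT_REQUESTS_PER_DOMAIN = 1  # avoid hammering a single domain
    AUTOTHROTTLE_ENABLED = True         # back off automatically when the server slows down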

    3. User Agent Rotation and Spoofing: Every request carries a User-Agent string in its headers. This string identifies the browser you are using, its version, and the platform. If we use the same User-Agent in every request, it’s easy for the target website to detect that the requests are coming from a crawler. So, to avoid this, rotate the User-Agent string between requests. You can easily find examples of genuine User-Agent strings on the Internet; try them out. If you’re using Scrapy, you can set the USER_AGENT property in settings.py; a small sketch with requests follows.
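
    A simple sketch of rotating User-Agent strings with requests; the strings are examples of genuine browser UAs and should be kept up to date.

    import random
    import requests
    
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
        "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    ]
    
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    r = requests.get("https://velotio.com", headers=headers)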

    4. Disguise your requests by rotating IPs and Proxy Services: We’ve discussed this in the challenges above. It’s always better to use rotating IPs and a proxy service so that your spider doesn’t get blocked.

    5. Do not follow the same crawling pattern: As you know, many websites use anti-scraping technologies, so it’s easy for them to detect your spider if it crawls in the same pattern every time. As humans, we normally would not follow a fixed pattern on a particular website. So, to keep your spiders running smoothly, we can introduce actions like mouse movements, clicking a random link, etc., which give the impression that your spider is a human.

    6. Scrape during off-peak hours: Off-peak hours are suitable for bots/crawlers as the traffic on the website is considerably less. These hours can be identified by the geolocation from where the site’s traffic originates. This also helps to improve the crawling rate and avoid the extra load from spider requests. Thus, it is advisable to schedule the crawlers to run in the off-peak hours.

    7. Use the scraped data responsibly: We should always take responsibility for the scraped data. It is not acceptable to scrape data and then republish it somewhere else; this can be considered a breach of copyright law and may lead to legal issues. So, it is advisable to check the target website’s Terms of Service page before scraping.

    8. Use Canonical URLs: When we scrape, we tend to scrape duplicate URLs, and hence duplicate data, which is the last thing we want. It can happen within a single website that multiple URLs serve the same data. In this situation, the duplicate URLs point to a canonical URL, which is the parent or original URL. By using it, we make sure we don’t scrape duplicate content. In frameworks like Scrapy, duplicate URLs are handled by default; a small sketch of reading the canonical URL follows.
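
    Outside of Scrapy, the canonical URL can be read directly from the page, as in this small sketch; the query-string URL is just an example of a duplicate.

    import requests
    from bs4 import BeautifulSoup
    
    r = requests.get("https://velotio.com/blog?utm_source=newsletter")
    soup = BeautifulSoup(r.text, "html.parser")
    
    link = soup.find("link", rel="canonical")
    canonical = link["href"] if link else r.url   # fall back to the fetched URL
    print(canonical)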

    9. Be transparent: Don’t misrepresent your purpose or use deceptive methods to gain access. If you have a login and a password that identifies you to gain access to a source, use it.  Don’t hide who you are. If possible, share your credentials.

    Conclusion

    We’ve seen the basics of scraping, frameworks, how to crawl, and the best practices of scraping. To conclude:

    • Follow the target websites’ rules while scraping, and don’t give them a reason to block your spider.
    • Maintenance of data and spiders at scale is difficult. Use Docker/ Kubernetes and public cloud providers, like AWS to easily scale your web-scraping backend.
    • Always respect the rules of the websites you plan to crawl. If APIs are available, always use them first.