Tag: amazon

  • Amazon Lex + AWS Lambda: Beyond Hello World

In my previous blog, I explained how to get started with Amazon Lex and build simple bots. This blog explores the Lambda functions that Amazon Lex uses for input validation and fulfillment. We will continue with the example we created in our first blog, purchasing a book, and see in detail how the dots are connected.

This blog is divided into the following sections:

1. Lambda function input format
2. Response format
3. Managing conversation context
4. An example demonstrating how context is maintained so that data can flow between two different intents

NOTE: The input to a Lambda function changes according to the language you use to write the function. Since we have used Node.js for our example, everything will be explained using it.

    Section 1:  Lambda function input format

When a conversation with the bot starts, Amazon Lex passes control to the Lambda function we defined while creating the bot.

Amazon Lex invokes a Node.js Lambda function with three arguments: event, context, and callback. The first two carry useful information and are described below:

    1. Event:

event is a JSON object containing all the details of a bot conversation. Every time the Lambda function is invoked, Amazon Lex sends an event JSON containing the details of the message the user just sent to the bot.

    Below is a sample event JSON:

{
  currentIntent: {
    name: 'orderBook',
    slots: {
      bookType: null,
      bookName: null
    },
    confirmationStatus: 'None'
  },
  bot: {
    name: 'PurchaseBook',
    alias: '$LATEST',
    version: '$LATEST'
  },
  userId: 'user-1',
  inputTranscript: 'buy me a book',
  invocationSource: 'DialogCodeHook',
  outputDialogMode: 'Text',
  messageVersion: '1.0'
}

The format of the event JSON is explained below:

• currentIntent: Contains information about the intent of the message the user sent to the bot. It has the following keys:
  • name: the intent name (e.g. orderBook, which we defined in our previous blog).
  • slots: a map of the slot names configured for that intent, populated with the values recognized by Amazon Lex during the conversation. The default value is null.
  • confirmationStatus: the user's response to a confirmation prompt, if there is one. Possible values are:
    • None: default value.
    • Confirmed: the user responded affirmatively to the confirmation prompt.
    • Denied: the user denied the confirmation prompt.
• inputTranscript: the text input by the user, for processing. In the case of audio input, the text is extracted from the audio. This is the text that is actually processed to recognize intents and slot values.
• invocationSource: indicates the reason for invoking the Lambda function. It can have the following two values:
  • DialogCodeHook: directs the Lambda function to validate the user's data input. Amazon Lex can invoke the Lambda function with this value only once the intent is clear.
  • FulfillmentCodeHook: set to fulfil the intent. If the intent is configured to invoke a Lambda function as a fulfilment code hook, Amazon Lex sets invocationSource to this value only after it has all the slot data needed to fulfil the intent.
• bot: details of the bot that processed the request:
  • name: the name of the bot.
  • alias: the alias of the bot version.
  • version: the version of the bot.
• userId: a value defined by the client application; Amazon Lex passes it through to the Lambda function.
• outputDialogMode: depends on how you have configured your bot; its value can be Text or Voice.
• messageVersion: the version of the message format, which identifies the format of the event data going into the Lambda function and the expected format of its response. In the current implementation, only message version 1.0 is supported, so the console assumes the default value of 1.0 and doesn't show the message version.
• sessionAttributes: application-specific session attributes that the client sent in the request. Optional.

    2. Context:

AWS Lambda uses this parameter to provide runtime information about the Lambda function that is executing. Some useful pieces of information available from the context object are:

• The time remaining before AWS Lambda terminates the function.
• The CloudWatch log stream associated with the executing function.
• The AWS request ID returned to the client that invoked the function, which can be used for any follow-up inquiry with AWS support.

    Section 2: Response Format

    Amazon Lex expects a response from a Lambda function in the following format:

{
  sessionAttributes: {},
  dialogAction: {
    type: "ElicitIntent / ElicitSlot / ConfirmIntent / Delegate / Close",
    <structure based on type>
  }
}

The response consists of two fields. The sessionAttributes field is optional; the dialogAction field is required. The contents of the dialogAction field depend on the value of the type field.

• sessionAttributes: This is an optional field; it can be empty. If the function has to send something back to the client, it should be passed under sessionAttributes. We will see a use case in Section 4.
• dialogAction (required): Its type field defines the next course of action. The five types of dialogAction are explained below:

1) Close: Informs Amazon Lex not to expect a response from the user; this is the case when all slots have been filled. If you don't specify a message, Amazon Lex uses the goodbye message or the follow-up message configured for the intent.

dialogAction: {
  type: "Close",
  fulfillmentState: "Fulfilled / Failed", // (required)
  message: { // (optional)
    contentType: "PlainText or SSML",
    content: "Message to convey to the user"
  }
}

    2) ConfirmIntent: Informs Amazon Lex that the user is expected to give a yes or no answer to confirm or deny the current intent. The slots field must contain an entry for each of the slots configured for the specified intent. If the value of a slot is unknown, you must set it to null. The message and responseCard fields are optional.

dialogAction: {
  type: "ConfirmIntent",
  intentName: "orderBook",
  slots: {
    bookName: "value",
    bookType: "value"
  },
  message: { // (optional)
    contentType: "PlainText or SSML",
    content: "Message to convey to the user"
  }
}

3) Delegate:  Directs Amazon Lex to choose the next course of action based on the bot configuration. The response must include any session attributes, and the slots field must include all of the slots specified for the requested intent. If the value of a slot is unknown, you must set it to null. You will get a DependencyFailedException if your fulfilment function returns the Delegate dialog action without removing any slots.

dialogAction: {
  type: "Delegate",
  slots: {
    slot1: "value",
    slot2: "value"
  }
}

4) ElicitIntent: Informs Amazon Lex that the user is expected to respond with an utterance that includes an intent. For example, "I want to buy a book" indicates the orderBook intent. The utterance "book," on the other hand, is not sufficient for Amazon Lex to infer the user's intent.

dialogAction: {
  type: "ElicitIntent",
  message: { // (optional)
    contentType: "PlainText or SSML",
    content: "Message to convey to the user"
  }
}

5) ElicitSlot:  Informs Amazon Lex that the user is expected to provide a slot value in the response. In the structure below, we are informing Amazon Lex that the user's response should provide a value for the slot named 'bookName'.

dialogAction: {
  type: "ElicitSlot",
  intentName: "orderBook",
  slots: {
    bookName: "",
    bookType: "fiction"
  },
  slotToElicit: "bookName",
  message: { // (optional)
    contentType: "PlainText or SSML",
    content: "Message to convey to the user"
  }
}
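Since every code hook ultimately returns one of these five structures, it is convenient to wrap them in small builder functions. Below is a sketch; the helper names are our own convention, not part of the Lex API.

```javascript
'use strict';

// Builders for the five dialogAction response shapes Amazon Lex accepts.
// The helper names are our own convention, not part of the Lex API.
function plainMessage(content) {
    return { contentType: 'PlainText', content: content };
}

function close(sessionAttributes, fulfillmentState, content) {
    return {
        sessionAttributes: sessionAttributes,
        dialogAction: {
            type: 'Close',
            fulfillmentState: fulfillmentState,
            message: plainMessage(content)
        }
    };
}

function confirmIntent(sessionAttributes, intentName, slots, content) {
    return {
        sessionAttributes: sessionAttributes,
        dialogAction: {
            type: 'ConfirmIntent',
            intentName: intentName,
            slots: slots,
            message: plainMessage(content)
        }
    };
}

function delegate(sessionAttributes, slots) {
    return {
        sessionAttributes: sessionAttributes,
        dialogAction: { type: 'Delegate', slots: slots }
    };
}

function elicitIntent(sessionAttributes, content) {
    return {
        sessionAttributes: sessionAttributes,
        dialogAction: { type: 'ElicitIntent', message: plainMessage(content) }
    };
}

function elicitSlot(sessionAttributes, intentName, slots, slotToElicit, content) {
    return {
        sessionAttributes: sessionAttributes,
        dialogAction: {
            type: 'ElicitSlot',
            intentName: intentName,
            slots: slots,
            slotToElicit: slotToElicit,
            message: plainMessage(content)
        }
    };
}

module.exports = { close, confirmIntent, delegate, elicitIntent, elicitSlot };
```

For example, elicitSlot(event.sessionAttributes, 'orderBook', slots, 'bookName', 'Which book?') produces the ElicitSlot structure shown above.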

    Section 3: Managing Conversation Context

    Conversation context is the information that a user, your application, or a Lambda function provides to an Amazon Lex bot to fulfill an intent. Conversation context includes slot data that the user provides, request attributes set by the client application, and session attributes that the client application and Lambda functions create.

    1. Setting session timeout

Session timeout is the length of time that a conversation session lasts. For in-progress conversations, Amazon Lex retains the context information, slot data, and session attributes until the session ends. The default session duration is 5 minutes, but it can be changed to up to 24 hours while creating the bot in the Amazon Lex console.

2. Setting session attributes

    Session attributes contain application-specific information that is passed between a bot and a client application during a session. Amazon Lex passes session attributes to all Lambda functions configured for a bot. If a Lambda function adds or updates session attributes, Amazon Lex passes the new information back to the client application.

    Session attributes persist for the duration of the session. Amazon Lex stores them in an encrypted data store until the session ends.

    3. Sharing information between intents

If you have created a bot with more than one intent, information can be shared between them using session attributes. Attributes defined while fulfilling one intent can be used in another intent.

For example, a user of the book ordering bot starts by ordering a book. The bot engages in a conversation with the user, gathering slot data such as the book name and quantity. When the user places an order, the Lambda function that fulfils it sets the lastConfirmedReservation session attribute, containing information about the ordered book, and currentReservationPrice, containing the price of the book. Later, when the user fulfils the orderMagazine intent, the final price is calculated on the basis of currentReservationPrice.

    Section 4:  Example

    The details of example Bot are below:

    Bot Name: PurchaseBot

    Intents :

    • orderBook – bookName, bookType
    • orderMagazine – magazineName, issueMonth

    Session attributes set while fulfilling the intent “orderBook” are:

    1. lastConfirmedReservation: In this variable, we are storing slot values corresponding to intent orderBook.
    2. currentReservationPrice: Book price is calculated and stored in this variable

When the orderBook intent gets fulfilled, we will ask the user whether he also wants to order a magazine. If the user responds with a confirmation, the bot will start fulfilling the orderMagazine intent.
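To make this concrete, here is a sketch of how the two fulfillment code hooks might pass the price along. The function names and prices are illustrative, not from the actual bot; note that session attribute values must be strings, so objects are JSON-encoded.

```javascript
'use strict';

// orderBook fulfillment: store the reservation and its price in session
// attributes. Session attribute values must be strings, so the slot map
// is JSON-encoded. Function names and prices here are illustrative.
function fulfillOrderBook(event, bookPrice) {
    const sessionAttributes = event.sessionAttributes || {};
    sessionAttributes.lastConfirmedReservation = JSON.stringify(event.currentIntent.slots);
    sessionAttributes.currentReservationPrice = String(bookPrice);
    return {
        sessionAttributes: sessionAttributes,
        dialogAction: {
            type: 'Close',
            fulfillmentState: 'Fulfilled',
            message: {
                contentType: 'PlainText',
                content: 'Book ordered. Would you also like a magazine?'
            }
        }
    };
}

// orderMagazine fulfillment: add the magazine price to the price carried
// over from the orderBook intent via session attributes.
function fulfillOrderMagazine(event, magazinePrice) {
    const sessionAttributes = event.sessionAttributes || {};
    const previous = Number(sessionAttributes.currentReservationPrice || 0);
    const total = previous + magazinePrice;
    return {
        sessionAttributes: sessionAttributes,
        dialogAction: {
            type: 'Close',
            fulfillmentState: 'Fulfilled',
            message: {
                contentType: 'PlainText',
                content: `Your total is ${total}.`
            }
        }
    };
}

module.exports = { fulfillOrderBook, fulfillOrderMagazine };
```

Because Amazon Lex passes session attributes back on every subsequent request, the second function sees exactly what the first one stored.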

    Conclusion

AWS Lambda functions are used as code hooks for your Amazon Lex bot. In your intent configuration, you can identify Lambda functions to perform initialization and validation, fulfillment, or both. This blog brought more technical insight into how Amazon Lex works and how it communicates with Lambda functions, and explained how conversation context is maintained using session attributes. I hope you find the information useful.

  • Lessons Learnt While Building an ETL Pipeline for MongoDB & Amazon Redshift Using Apache Airflow

Recently, I was involved in building an ETL (Extract-Transform-Load) pipeline. It involved extracting data from MongoDB collections, performing transformations, and then loading the data into Redshift tables. Many ETL solutions available in the market more or less solve the issue, but the key part of an ETL process lies in its ability to transform or process raw data before it is pushed to its destination.

Each ETL pipeline comes with a specific business requirement around processing data which is hard to achieve using off-the-shelf ETL solutions. This is why a majority of ETL solutions are custom built, from scratch. In this blog, I am going to talk about my learnings from building a custom ETL solution which involved moving data from MongoDB to Redshift using Apache Airflow.

    Background:

    I began by writing a Python-based command line tool which supported different phases of ETL, like extracting data from MongoDB, processing extracted data locally, uploading the processed data to S3, loading data from S3 to Redshift, post-processing and cleanup. I used the PyMongo library to interact with MongoDB and the Boto library for interacting with Redshift and S3.

I kept each operation atomic so that multiple instances of each operation could run independently of each other, which helps achieve parallelism. One of the major challenges was to achieve parallelism while running the ETL tasks. One option was to develop our own framework based on threads, or to build a distributed task scheduler using a task queue like Celery combined with a message broker like RabbitMQ. After doing some research I settled on Apache Airflow. Airflow is a Python-based scheduler where you can define DAGs (Directed Acyclic Graphs), which run on the given schedule and run the tasks in each phase of your ETL in parallel. You define a DAG as Python code, and Airflow also enables you to handle the state of your DAG run using environment variables. Features like task retries on failure are a plus.

We faced several challenges while making the above ETL workflow near real-time and fault tolerant. The challenges and their solutions are discussed below:

    Keeping your ETL code changes in sync with Redshift schema

While you are building the ETL tool, you may end up fetching a new field from MongoDB, but at the same time you have to add that column to the corresponding Redshift table. If you fail to do so, the ETL pipeline will start failing. To tackle this, I created a database migration tool which became the first step in my ETL workflow.

    The migration tool would:

• keep the migration status in a Redshift table, and
• track all migration scripts in a code directory.

In each ETL run, it would get the most recently run migrations from Redshift and search for any new migration script in the code directory. If one is found, it runs the newly found migration script, after which the regular ETL tasks run. This puts the onus on the developer to add a migration script whenever making a change such as the addition or removal of a field fetched from MongoDB.
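At its core, the selection of pending migrations is a pure diff between what the Redshift status table records and what is on disk, so it is easy to sketch. The function name and the timestamp-prefix ordering convention below are our own, not from the actual tool.

```javascript
'use strict';

// Given the migrations already recorded in the Redshift status table and the
// scripts found in the code directory, return the scripts still to run, in
// lexicographic order (i.e. chronological, if files are timestamp-prefixed).
function pendingMigrations(appliedNames, scriptNames) {
    const applied = new Set(appliedNames);
    return scriptNames
        .filter((name) => !applied.has(name))
        .sort();
}

module.exports = { pendingMigrations };
```

The migration step runs each returned script against Redshift and records it in the status table before the regular ETL tasks start.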

    Maintaining data consistency

    While extracting data from MongoDB, one needs to ensure all the collections are extracted at a specific point in time else there can be data inconsistency issues. We need to solve this problem at multiple levels:

• While extracting data from MongoDB, define a parameter like the modified date and extract data from the different collections with a filter of records less than or equal to that date. This ensures you fetch point-in-time data from MongoDB.
• While loading data into Redshift tables, don't load directly into the master table; instead load into a staging table. Once you are done loading the staging data for all related collections, load it from staging into master within a single transaction. This way data is either updated in all related tables or in none of them.

    A single bad record can break your ETL

While moving data across the ETL pipeline into Redshift, one needs to take care of field formats. For example, the date field in the incoming data can differ from the Redshift schema design, or the incoming data can exceed the length of the field in the schema. Redshift's COPY command, which is used to load data from files into Redshift tables, is very vulnerable to such variations in data types. Even a single incorrectly formatted record will lead to all your data getting rejected, effectively breaking the ETL pipeline.

There are multiple ways to solve this problem: either handle it in one of the transform jobs in the pipeline, or put the onus on Redshift to handle these variances. Redshift's COPY command has many options which can help you solve these problems. Some of the very useful options are:

    • ACCEPTANYDATE: Allows any date format, including invalid formats such as 00/00/00 00:00:00, to be loaded without generating an error.
    • ACCEPTINVCHARS: Enables loading of data into VARCHAR columns even if the data contains invalid UTF-8 characters.
    • TRUNCATECOLUMNS: Truncates data in columns to the appropriate number of characters so that it fits the column specification.
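Put together, a defensive COPY statement might be generated like this. This is a sketch: the table name, S3 prefix, and IAM role are placeholders, and the GZIP and pipe-delimiter options assume gzipped, pipe-delimited input files.

```javascript
'use strict';

// Build a Redshift COPY statement with the defensive options discussed above.
// The table name, S3 prefix, and IAM role passed in are placeholders; the
// GZIP / DELIMITER options assume gzipped, pipe-delimited input files.
function buildCopyStatement(table, s3Prefix, iamRole) {
    return [
        `COPY ${table}`,
        `FROM '${s3Prefix}'`,
        `IAM_ROLE '${iamRole}'`,
        'GZIP',
        "DELIMITER '|'",
        'ACCEPTANYDATE',
        'ACCEPTINVCHARS',
        'TRUNCATECOLUMNS'
    ].join('\n');
}

module.exports = { buildCopyStatement };
```

The generated SQL is then executed against Redshift by whatever client library your ETL uses.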

    Redshift going out of storage

Redshift is based on PostgreSQL, and one of the common problems is that deleting records from Redshift tables does not actually free up space. So if your ETL process deletes and creates records frequently, you may run out of Redshift storage space. The VACUUM operation is the solution to this problem. Instead of making VACUUM part of your main ETL flow, define a separate workflow which runs on a different schedule to run the VACUUM operation. VACUUM reclaims space and re-sorts rows in either a specified table or all tables in the current database. A VACUUM operation can be FULL, SORT ONLY, DELETE ONLY, or REINDEX. More information on VACUUM can be found here.

    ETL instance going out of storage

Your ETL will generate a lot of files while extracting data from MongoDB onto your ETL instance. It is very important to periodically delete those files, otherwise you are very likely to run out of storage on your ETL server. If your MongoDB data is huge, you might end up creating large files on your ETL server. Again, I would recommend defining a separate workflow which runs on a different schedule to run a cleanup operation.

    Making ETL Near Real Time

    Processing only the delta rather than doing a full load in each ETL run

The ETL would be faster if you keep track of already-processed data and process only the new data. If you do a full load of the data in each ETL run, the solution will not scale as your data grows. As a solution, we made it mandatory for every collection in our MongoDB to have a created and a modified date. Our ETL checks the maximum value of the modified date for the given collection in the Redshift table, and then generates a filter query to fetch only those records from MongoDB whose modified date is greater than that maximum. It may be difficult for you to make such changes in your product, but it's worth the effort!
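The delta query itself is just a filter on the modified date. A minimal sketch, assuming the collection's audit field is called modifiedAt (use whatever your collections actually call it):

```javascript
'use strict';

// Build a MongoDB filter that fetches only records modified after the
// maximum modified date already loaded into Redshift.
// 'modifiedAt' is an assumed field name for the collection's audit field.
function deltaFilter(lastLoadedModifiedDate) {
    return { modifiedAt: { $gt: lastLoadedModifiedDate } };
}

module.exports = { deltaFilter };
```

The returned object is passed straight to collection.find() in the MongoDB Node driver.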

    Compressing and splitting files while loading

A good approach is to write files in some compressed format. It saves storage space on the ETL server and also helps when you load data into Redshift. Redshift's COPY command documentation suggests that you provide compressed files as input. Also, instead of a single huge file, you should split your data into parts and give all the files to a single COPY command. This enables Redshift to use its computing resources across the cluster to do the copy in parallel, leading to faster loads.

    Streaming mongo data directly to S3 instead of writing it to ETL server

One of the major overheads in the ETL process is writing data first to the ETL server and then uploading it to S3. To reduce disk IO, you should not store data on the ETL server at all; instead, use MongoDB's handy stream API. In the MongoDB Node driver, both collection.find() and collection.aggregate() return cursors that can be consumed as streams, and the stream method accepts a transform function as a parameter into which all your custom transform logic can go. AWS S3's Node library's upload() function also accepts readable streams. Take the stream from the MongoDB cursor, pipe it into zlib to gzip it, then feed the readable stream to S3's upload(). Simple! This simple but important change will give you a large improvement in your ETL process.

    Optimizing Redshift Queries

Optimizing Redshift queries helps make the ETL system highly scalable and efficient, and also reduces cost. Let's look at some of the approaches:

    Add a distribution key

A Redshift database is clustered, meaning your data is stored across cluster nodes. When you query for a certain set of records, Redshift has to search for those records on each node, leading to slow queries. A distribution key is a single column which decides how the records of a table are distributed across the cluster. If you have a single column which is available in all your data, you can specify it as the distribution key. When loading data into Redshift, all rows with a certain value of the distribution key will be placed on a single node of the cluster, so when you query for those records, Redshift knows exactly where to look. This helps only when you also use the distribution key to query the data.


    Generating a numeric primary key for string primary key

In MongoDB, you can have any type of field as your primary key. If your Mongo collections have a non-numeric primary key and you use those same keys in Redshift, your joins will end up being on string keys, which are slower. Instead, generate numeric keys for your string keys and join on those, which will make queries run much faster. Redshift supports marking a column with the IDENTITY attribute, which auto-generates a unique numeric value for the column that you can then use as your primary key.
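For example, a Redshift table carrying both keys might be declared like this (a sketch; the table and column names are illustrative):

```sql
-- 'book_sk' is an auto-generated numeric surrogate key; joins should use it
-- instead of the original Mongo ObjectId string kept in 'mongo_id'.
CREATE TABLE books (
    book_sk   BIGINT IDENTITY(1, 1),
    mongo_id  VARCHAR(24) NOT NULL,
    book_name VARCHAR(256),
    PRIMARY KEY (book_sk)
);
```

The original string key is retained only for traceability back to the source collection.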

    Conclusion:

In this blog, I have covered best practices around building ETL pipelines for Redshift, based on my learnings. There are many more recommended practices which can easily be found in the Redshift and MongoDB documentation.

  • A Quick Guide to Building a Serverless Chatbot With Amazon Lex

Amazon announced “Amazon Lex” in December 2016, and since then we’ve been using it to build bots for our customers. Lex is effectively the technology used by Alexa, Amazon’s voice-activated virtual assistant, which lets people control things with voice commands such as playing music, setting alarms, ordering groceries, etc. It provides deep learning-powered natural-language understanding along with automatic speech recognition. Amazon now provides it as a service that allows developers to take advantage of the same features used by Amazon Alexa, so there is no need to spend time setting up and managing infrastructure for your bots.

    Now, developers just need to design conversations according to their requirements in Lex console. The phrases provided by the developer are used to build the natural language model. After publishing the bot, Lex will process the text or voice conversations and execute the code to send responses.

I’ve put together this quick-start tutorial with which you can start building Lex chat-bots. To understand the terms correctly, let’s consider an e-commerce bot that supports conversations involving the purchase of books.

    Lex-Related Terminologies

    Bot: It consists of all the components related to a conversation, which includes:

• Intent: An intent represents a goal that the bot’s user wants to achieve. In our case, the goal is to purchase books.
• Utterances: An utterance is a text phrase that invokes an intent. If we have more than one intent, we need to provide different utterances for each. Amazon Lex builds a language model based on the utterance phrases we provide, which then invoke the required intent. For our demo example, we need a single intent, “OrderBook”. Some sample utterances would be:
  • I want to order some books
  • Can you please order a book for me
• Slots: Each slot is a piece of data that the user must supply in order to fulfill the intent. For instance, purchasing a book requires bookType and bookName as slots for the intent “OrderBook” (I am considering just these two factors to keep the example simple; in reality there are many other factors based on which one selects a book).
  A slot is an input (a string, date, city, location, boolean, number, etc.) needed to reach the goal of the intent. Each slot has a name, a slot type, a prompt, and a required flag. The slot type defines the valid values a user can respond with, and can be either custom defined or one of Amazon’s pre-built types.
• Prompt: A prompt is a question that Lex asks the user to supply the correct data for a slot needed to fulfill an intent, e.g. Lex will ask “What type of book do you want to buy?” to fill the slot bookType.
• Fulfillment: Fulfillment provides the business logic that is executed once all required slot values needed to achieve the goal have been gathered. Amazon Lex supports the use of Lambda functions for fulfillment of business logic and for validations.

    Let’s Implement this Bot!

    Now that we are aware of the basic terminology used in Amazon Lex, let’s start building our chat-bot.

    Creating Lex Bot:

• Go to the Amazon Lex console, which is available only in the US East (N. Virginia) region, and click on the Create button.
• Create a custom bot by providing the following information:
1. Bot Name: PurchaseBook
2. Output voice: None, this is only a text-based application
3. Session Timeout: 5 min
4. Add the Amazon Lex basic role to the bot: Amazon will create it automatically. Find out more about Lex roles & permissions here.
5. Click on the Create button, which will redirect you to the editor page.

    Architecting Bot Conversations

Create Slots: We are creating two slots, named bookType and bookName. Slot types can be chosen from the 275 pre-built types provided by Amazon, or we can create our own custom slot types.

Create a custom slot type for bookType as shown here, and use the predefined type AMAZON.Book for bookName.

Create Intent: Our bot requires a single custom intent, named OrderBook.

    Configuring the Intents

• Utterances: Provide some utterances to invoke the intent. An utterance can consist only of Unicode characters, spaces, and valid punctuation marks. Valid punctuation marks are periods for abbreviations, underscores, apostrophes, and hyphens. If there is a slot placeholder in your utterance, ensure that it’s in the {slotName} format and has spaces at both ends.

Slots: Map slots to their types and provide the prompt questions that will be asked to get a valid value for each slot. Note the sequence: the Lex bot will ask the questions according to priority.

    Confirmation prompt: This is optional. If required you can provide a confirmation message e.g. Are you sure you want to purchase book named {bookName}?, where bookName is a slot placeholder.

Fulfillment: Now that we have all the necessary data gathered by the chatbot, it can either be passed to a Lambda function, or the parameters can be returned to the client application, which then calls a REST endpoint.

    Creating Amazon Lambda Functions

Amazon Lex supports Lambda functions as code hooks for the bot. These functions can serve multiple purposes, such as improving user interaction with the bot by using prior knowledge, validating the input data the bot received from the user, and fulfilling the intent.

• Go to the AWS Lambda console and choose Create a Lambda function.
• Select the Blank Function blueprint and click Next.
• To configure your Lambda function, provide its name, runtime, and the code to be executed when the function is invoked. The code can also be uploaded as a zip file instead of being provided inline. We are using the Node.js 4.3 runtime.
    • Click next and choose Create Function.

We can configure our bot to invoke these Lambda functions at two places. We need to do this while configuring the intent as shown below:

Here, botCodeHook and fulfillment are the names of the Lambda functions we created.

    Lambda initialization and validation  

The Lambda function provided here, botCodeHook, will be invoked on each user input whose intent is understood by Amazon Lex. It validates the bookName against a predefined list of books.

'use strict';
exports.handler = (event, context, callback) => {
    const sessionAttributes = event.sessionAttributes;
    const slots = event.currentIntent.slots;
    const bookName = slots.bookName;

    // predefined list of available books
    const validBooks = ['harry potter', 'twilight', 'wings of fire'];

    // negative check: if a valid slot value is not obtained, inform Lex that
    // the user is expected to respond with a slot value
    if (bookName && validBooks.indexOf(bookName.toLowerCase()) === -1) {
        let response = {
            sessionAttributes: sessionAttributes,
            dialogAction: {
                type: "ElicitSlot",
                message: {
                    contentType: "PlainText",
                    content: `We do not have book: ${bookName}. Provide any other book name, e.g. twilight.`
                },
                intentName: event.currentIntent.name,
                slots: slots,
                slotToElicit: "bookName"
            }
        };
        // return here so we don't fall through and invoke the callback twice
        return callback(null, response);
    }

    // if a valid book name is obtained, let Lex choose the next course of action
    let response = {
        sessionAttributes: sessionAttributes,
        dialogAction: {
            type: "Delegate",
            slots: slots
        }
    };
    callback(null, response);
};

    Fulfillment code hook

    This lambda function is invoked after receiving all slot data required to fulfill the intent.

'use strict';

exports.handler = (event, context, callback) => {
    // when the intent gets fulfilled, inform Lex to close the conversation
    let response = {
        sessionAttributes: event.sessionAttributes,
        dialogAction: {
            type: "Close",
            fulfillmentState: "Fulfilled",
            message: {
                contentType: "PlainText",
                content: "Thanks for purchasing the book."
            }
        }
    };
    callback(null, response);
};

Error Handling: We can customize the error messages for our bot users. Click on Error Handling and replace the default values with the required ones. Since the number of retries is two, we can also provide a different message for each retry.

    Your Bot is Now Ready To Chat

Click on Build to build the chat-bot. Congratulations! Your Lex chat-bot is ready to test. You can test it in the overlay that appears in the Amazon Lex console.

    Sample conversations:

I hope you have understood the basic terminology of Amazon Lex, along with how to create a simple chat-bot using serverless AWS Lambda functions. This is a really powerful platform for building mature and intelligent chatbots.

  • Taking Amazon’s Elastic Kubernetes Service for a Spin

With the introduction of the Elastic Kubernetes Service at AWS re:Invent last year, AWS finally threw their hat into the ever-booming space of managed Kubernetes services. In this blog post, we will learn the basic concepts of EKS, launch an EKS cluster, and deploy a multi-tier application on it.

    What is Elastic Kubernetes service (EKS)?

    Kubernetes works on a master-slave architecture; the master is also referred to as the control plane. If the master goes down, it brings the entire cluster down, so ensuring high availability of the master is absolutely critical: it can be a single point of failure. Keeping the master highly available and managing all the worker nodes is a cumbersome task in itself, so most organizations prefer a managed Kubernetes cluster, which lets them focus on the most important task of running their applications rather than managing the cluster. Other cloud providers like Google Cloud and Azure already had managed Kubernetes services, named GKE and AKS respectively. With EKS, Amazon has now rolled out its own managed Kubernetes offering to provide a seamless way to run Kubernetes workloads.

    Key EKS concepts:

    EKS takes full advantage of the fact that it runs on AWS: instead of creating Kubernetes-specific features from scratch, it reuses and plugs in existing AWS services to achieve Kubernetes-specific functionality. Here is a brief overview:

    IAM-integration: Amazon EKS integrates IAM authentication with Kubernetes RBAC (the role-based access control system native to Kubernetes) with the help of the Heptio Authenticator, a tool that uses AWS IAM credentials to authenticate to a Kubernetes cluster. We can attach an RBAC role directly to an IAM entity, which saves the pain of managing another set of credentials at the cluster level.
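
    As an illustration, an individual IAM user can be granted cluster access by adding a `mapUsers` entry to the `aws-auth` ConfigMap (we will meet this ConfigMap again when joining worker nodes). The ARN, username and group below are placeholders, a sketch rather than a recommended policy:

    ```yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: aws-auth
      namespace: kube-system
    data:
      mapUsers: |
        - userarn: arn:aws:iam::XXXXXXXXXXXX:user/dev-user   # placeholder IAM user ARN
          username: dev-user
          groups:
            - system:masters    # full cluster admin; use a narrower RBAC group in practice
    ```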

    Container Network Interface:  AWS has developed an open-source CNI plugin that takes advantage of the fact that multiple network interfaces can be attached to a single EC2 instance, and that each interface can have multiple secondary private IPs associated with it. These secondary IPs give pods running on EKS real IP addresses from the VPC CIDR pool, which improves latency for inter-pod communication because traffic flows without any overlay.

    ELB Support:  We can use any of the AWS ELB offerings (Classic, Network or Application Load Balancer) to route traffic to our services running on the worker nodes.
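
    The load balancer type is chosen per Service via annotations; for instance, a Network Load Balancer can be requested with the `service.beta.kubernetes.io/aws-load-balancer-type` annotation (by default a Classic ELB is provisioned). The service and app names below are hypothetical:

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: web-service                 # hypothetical service name
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: nlb
    spec:
      type: LoadBalancer
      ports:
      - port: 80
        targetPort: 8080
      selector:
        app: web                        # hypothetical app label
    ```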

    Auto scaling:  The number of worker nodes in the cluster can grow and shrink using the EC2 auto scaling service.

    Route 53: With the help of the ExternalDNS project and AWS Route 53, we can manage the DNS entries for the load balancers that get created when we create an Ingress object or a Service of type LoadBalancer in our EKS cluster. This way the DNS names are always in sync with the load balancers and we don’t have to manage them separately.
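
    Assuming ExternalDNS is deployed in the cluster and a matching Route 53 hosted zone exists, a DNS record can be requested with the `external-dns.alpha.kubernetes.io/hostname` annotation; the hostname and names below are placeholders:

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: books-frontend                                      # hypothetical name
      annotations:
        external-dns.alpha.kubernetes.io/hostname: books.example.com
    spec:
      type: LoadBalancer
      ports:
      - port: 80
        targetPort: 3000
      selector:
        app: books-frontend
    ```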

    Shared responsibility for the cluster: Responsibility for an EKS cluster is shared between AWS and the customer. AWS takes care of the most critical part, managing the control plane (the API server and the etcd database), while customers manage the worker nodes. Amazon EKS automatically runs three Kubernetes masters across three Availability Zones to protect against a single point of failure; control plane nodes are monitored and replaced if they fail, and are patched and updated automatically. This ensures high availability of the cluster and makes it extremely simple to migrate existing workloads to EKS.

    Prerequisites for launching an EKS cluster:

    1.  IAM role to be assumed by the cluster: Create an IAM role that allows EKS to manage a cluster on your behalf. Choose EKS as the service which will assume this role and add AWS managed policies ‘AmazonEKSClusterPolicy’ and ‘AmazonEKSServicePolicy’ to it.

    2.  VPC for the cluster:  We need to create the VPC where our cluster is going to reside, with subnets, internet gateways and other components configured. We can use an existing VPC if we wish, or create one using the CloudFormation script provided by AWS here or the Terraform script available here. The scripts take the CIDR blocks of the VPC and its three subnets as arguments.

    Launching an EKS cluster:

    1.  Using the web console: With the prerequisites in place, we can go to the EKS console and launch a cluster. When launching, we need to provide a name for the EKS cluster, choose the Kubernetes version to use, provide the IAM role we created in step one, and choose a VPC. Once we choose a VPC, we also select the subnets where we want our worker nodes to be launched (by default all the subnets in the VPC are selected) and provide a security group, which is applied to the elastic network interfaces (ENIs) that EKS creates to let the control plane communicate with the worker nodes.

    NOTE: A couple of things to note here: the subnets must span at least two different Availability Zones, and the security group we provided is later updated when we create the worker node group. It is therefore better not to use this security group with any other entity, or at least to be completely sure of the changes happening to it.

    2. Using awscli :

    aws eks create-cluster --name eks-blog-cluster --role-arn arn:aws:iam::XXXXXXXXXXXX:role/eks-service-role  
    --resources-vpc-config subnetIds=subnet-0b8da2094908e1b23,subnet-01a46af43b2c5e16c,securityGroupIds=sg-03fa0c02886c183d4

    {
        "cluster": {
            "status": "CREATING",
            "name": "eks-blog-cluster",
            "certificateAuthority": {},
            "roleArn": "arn:aws:iam::XXXXXXXXXXXX:role/eks-service-role",
            "resourcesVpcConfig": {
                "subnetIds": [
                    "subnet-0b8da2094908e1b23",
                    "subnet-01a46af43b2c5e16c"
                ],
                "vpcId": "vpc-0364b5ed9f85e7ce1",
                "securityGroupIds": [
                    "sg-03fa0c02886c183d4"
                ]
            },
            "version": "1.10",
            "arn": "arn:aws:eks:us-east-1:XXXXXXXXXXXX:cluster/eks-blog-cluster",
            "createdAt": 1535269577.147
        }
    }

    In the response, we see that the cluster is in creating state. It will take a few minutes before it is available. We can check the status using the below command:

    aws eks describe-cluster --name=eks-blog-cluster

    Configure kubectl for EKS:

    We know that in Kubernetes we interact with the control plane by making requests to the API server, most commonly via the kubectl command line utility. As our cluster is now ready, we need to install kubectl.

    1.  Install the kubectl binary

    curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl

    Give executable permission to the binary.

    chmod +x ./kubectl

    Move the kubectl binary to a folder in your system’s $PATH.

    sudo cp ./kubectl /usr/local/bin/kubectl

    As discussed earlier EKS uses AWS IAM Authenticator for Kubernetes to allow IAM authentication for your Kubernetes cluster. So we need to download and install the same.

    2.  Install aws-iam-authenticator

    curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/bin/linux/amd64/aws-iam-authenticator

    Give executable permission to the binary

    chmod +x ./aws-iam-authenticator

    Move the aws-iam-authenticator binary to a folder in your system’s $PATH.

    sudo cp ./aws-iam-authenticator /bin/aws-iam-authenticator

    3.  Create the kubeconfig file

    First create the directory.

    mkdir -p ~/.kube

    Open a config file in the folder created above

    vi ~/.kube/config-eks-blog-cluster

    Paste the below code in the file

    apiVersion: v1
    clusters:
    - cluster:
        server: https://DBFE36D09896EECAB426959C35FFCC47.sk1.us-east-1.eks.amazonaws.com
        certificate-authority-data: "...................."
      name: kubernetes
    contexts:
    - context:
        cluster: kubernetes
        user: aws
      name: aws
    current-context: aws
    kind: Config
    preferences: {}
    users:
    - name: aws
      user:
        exec:
          apiVersion: client.authentication.k8s.io/v1alpha1
          command: aws-iam-authenticator
          args:
            - "token"
            - "-i"
            - "eks-blog-cluster"

    Replace the values of server and certificate-authority-data with the values for your cluster and certificate, and update the cluster name in the args section. You can get these values from the web console or by using the command below.

    aws eks describe-cluster --name=eks-blog-cluster

    Save and exit.

    Add that file path to your KUBECONFIG environment variable so that kubectl knows where to look for your cluster configuration.

    export KUBECONFIG=$KUBECONFIG:~/.kube/config-eks-blog-cluster

    To verify that kubectl is now properly configured:

    kubectl get all
    NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
    service/kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   50m

    Launch and configure worker nodes :

    Now we need to launch worker nodes before we can start deploying apps. We can create the worker node group using the CloudFormation script provided by AWS, which is available here, or the Terraform script available here. The scripts take the following parameters:

    • ClusterName: Name of the Amazon EKS cluster we created earlier.
    • ClusterControlPlaneSecurityGroup: Id of the security group we used in EKS cluster.
    • NodeGroupName: Name for the worker node auto scaling group.
    • NodeAutoScalingGroupMinSize: Minimum number of worker nodes that you always want in your cluster.
    • NodeAutoScalingGroupMaxSize: Maximum number of worker nodes that you want in your cluster.
    • NodeInstanceType: Type of worker node you wish to launch.
    • NodeImageId: AWS provides an Amazon EKS-optimized AMI to be used for worker nodes. Currently EKS is available in only two AWS regions, Oregon and N. Virginia, and the AMI IDs are ami-02415125ccd555295 and ami-048486555686d18a0 respectively.
    • KeyName: Name of the key you will use to ssh into the worker node.
    • VpcId: Id of the VPC that we created earlier.
    • Subnets: Subnets from the VPC we created earlier.

    To enable worker nodes to join your cluster, we need to download, edit and apply the AWS authenticator config map.

    Download the config map:

    curl -O https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/aws-auth-cm.yaml

    Open it in an editor

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: aws-auth
      namespace: kube-system
    data:
      mapRoles: |
        - rolearn: <ARN of instance role (not instance profile)>
          username: system:node:{{EC2PrivateDNSName}}
          groups:
            - system:bootstrappers
            - system:nodes

    Edit the value of rolearn with the ARN of your worker nodes’ role. This value is available in the output of the script you ran. Save the change and then apply:

    kubectl apply -f aws-auth-cm.yaml

    Now you can check if the nodes have joined the cluster or not.

    kubectl get nodes
    NAME                         STATUS   ROLES    AGE   VERSION
    ip-10-0-2-171.ec2.internal   Ready    <none>   12s   v1.10.3
    ip-10-0-3-58.ec2.internal    Ready    <none>   14s   v1.10.3

    Deploying an application:

    As our cluster is now completely ready, we can start deploying applications on it. We will deploy a simple books API application which connects to a MongoDB database and allows users to store, list and delete book information.

    1. MongoDB Deployment YAML

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: mongodb
    spec:
      template:
        metadata:
          labels:
            app: mongodb
        spec:
          containers:
          - name: mongodb
            image: mongo
            ports:
            - name: mongodbport
              containerPort: 27017
              protocol: TCP
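
    Note that this Deployment keeps MongoDB’s data inside the container filesystem, so the data is lost whenever the pod is rescheduled. For anything beyond a demo you would mount a volume at MongoDB’s data path. A minimal sketch using an `emptyDir` volume (a PersistentVolumeClaim would be the durable choice):

    ```yaml
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: mongodb
    spec:
      template:
        metadata:
          labels:
            app: mongodb
        spec:
          containers:
          - name: mongodb
            image: mongo
            ports:
            - name: mongodbport
              containerPort: 27017
              protocol: TCP
            volumeMounts:
            - name: mongo-data
              mountPath: /data/db      # MongoDB's default data directory
          volumes:
          - name: mongo-data
            emptyDir: {}               # survives container restarts, not pod deletion
    ```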

    2. Test Application Deployment YAML

    apiVersion: apps/v1beta1
    kind: Deployment
    metadata:
      name: test-app
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: test-app
        spec:
          containers:
          - name: test-app
            image: akash125/pyapp
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 3000

    3. MongoDB Service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: mongodb-service
    spec:
      ports:
      - port: 27017
        targetPort: 27017
        protocol: TCP
        name: mongodbport
      selector:
        app: mongodb

    4. Test Application Service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: test-service
    spec:
      type: LoadBalancer
      ports:
      - name: test-service
        port: 80
        protocol: TCP
        targetPort: 3000
      selector:
        app: test-app

    Services

    $ kubectl create -f mongodb-service.yaml
    $ kubectl create -f testapp-service.yaml

    Deployments

    $ kubectl create -f mongodb-deployment.yaml
    $ kubectl create -f testapp-deployment.yaml

    $ kubectl get services
    NAME              TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
    kubernetes        ClusterIP      172.20.0.1      <none>          443/TCP        12m
    mongodb-service   ClusterIP      172.20.55.194   <none>          27017/TCP      4m
    test-service      LoadBalancer   172.20.188.77   a7ee4f4c3b0ea   80:31427/TCP   3m

    In the EXTERNAL-IP column of test-service we see the DNS name of a load balancer (truncated in the output above); we can now access the application from outside the cluster using this DNS name.

    To Store Data :

    curl -X POST -d '{"name":"A Game of Thrones (A Song of Ice and Fire)", "author":"George R.R. Martin","price":343}' http://a7ee4f4c3b0ea11e8b0f912f36098e4d-672471149.us-east-1.elb.amazonaws.com/books
    {"id":"5b8fab49fa142b000108d6aa","name":"A Game of Thrones (A Song of Ice and Fire)","author":"George R.R. Martin","price":343}

    To Get Data :

    curl -X GET http://a7ee4f4c3b0ea11e8b0f912f36098e4d-672471149.us-east-1.elb.amazonaws.com/books
    [{"id":"5b8fab49fa142b000108d6aa","name":"A Game of Thrones (A Song of Ice and Fire)","author":"George R.R. Martin","price":343}]

    We can also put the URL used in the curl command above directly into a browser; we will get the same response.

    Now our application is deployed on EKS and can be accessed by the users.

    Comparison between GKE, ECS and EKS:

    Cluster creation: Creating a GKE or ECS cluster is much simpler than creating an EKS cluster, GKE being the simplest of the three.

    Cost: With both GKE and ECS, we pay only for the infrastructure that is visible to us (servers, volumes, ELBs etc.); there is no charge for master nodes or other cluster management services. With EKS there is a charge of $0.20 per hour for the control plane.

    Add-ons: GKE provides the option of using Calico as the network plugin which helps in defining network policies for controlling inter pod communication (by default all pods in k8s can communicate with each other).

    Serverless: An ECS cluster can be created using Fargate, the Containers-as-a-Service (CaaS) offering from AWS. EKS is also expected to support Fargate soon.

    In terms of availability and scalability, all three services are on par with each other.

    Conclusion:

    In this blog post we learned the basic concepts of EKS, launched our own EKS cluster and deployed an application on it. EKS is a much-awaited service from AWS, especially for folks who were already running Kubernetes workloads on AWS: they can now easily migrate to EKS and get a fully managed Kubernetes control plane. EKS is expected to be adopted by many organisations in the near future.

    References: