Acquiring Temporary AWS Credentials with Browser Navigated Authentication

In one of my previous blog posts (Hacking your way around AWS IAM Roles), we demonstrated how users can access AWS resources without having to store AWS credentials on disk. This was achieved by setting up an OpenVPN server and a client-side route that gets pushed automatically when the user connects to the VPN. To this date, I find this a compliance-friendly solution that does not force users to do any manual configuration on their system. It also makes sense for users to have access to AWS resources only as long as they are connected to the VPN. One downside of this method is the need to maintain an OpenVPN server, keep it secure, and run it in a highly available (HA) state. If the OpenVPN server is compromised, our credentials are at stake. Secondly, all users connected to the VPN get the same level of access.

In this blog post, we present a CLI utility written in Rust that writes temporary AWS credentials to a user profile (the ~/.aws/credentials file) using web-browser-navigated Google authentication. This utility is inspired by gimme-aws-creds (written in Python for Okta-authenticated AWS farms) and the Heroku CLI (written in Node.js and built on the oclif framework). We will refer to our utility as auth-awscreds throughout this post.

“If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.”

– George Bernard Shaw

What does this CLI utility (auth-awscreds) do?

When the user runs the auth-awscreds command in a terminal, the program reads its configuration from the .auth-awscreds file in the user’s home directory. If this file is not present, the utility prompts the user to set the configuration for the first time. The configuration file uses the INI format. The program then opens the default web browser and navigates to the URL read from the configuration file. At this point, the utility waits while the user authenticates in the browser: the web UI redirects to Google authentication and, if authentication succeeds, calls back to the CLI utility with temporary AWS credentials, which are then written to the ~/.aws/credentials file.

Tech Stack Used

As stated earlier, we wrote this utility in Rust. One of the reasons for choosing Rust is that we wanted a natively compiled binary (ELF) file that runs without an interpreter and ships as is once compiled. Programs written in Python or Node.js, by contrast, need a language interpreter and supporting libraries installed on the target machine. Go would also have served our purpose, but I prefer Rust over Go.

Software Stack:

Rust (for CLI utility)
Actix Web – HTTP Server
Node.js, Express, ReactJS, serverless-http, aws-sdk, AWS Amplify, axios
Terraform and serverless framework

Infrastructure Stack:

AWS Cognito (User Pool and Federated Identities)
AWS API Gateway (HTTP API)
AWS Lambda
AWS S3 Bucket (React App)
AWS CloudFront (For Serving React App)
AWS ACM (SSL Certificate)

Recipe

CLI Utility: auth-awscreds

Our goal is as follows: when the auth-awscreds command is run, we first check whether the ~/.aws/credentials file exists in the user’s home directory. If not, we create the ~/.aws directory. This is the default AWS credentials directory, where the AWS SDK usually looks for credentials (unless explicitly overridden by the AWS_SHARED_CREDENTIALS_FILE environment variable). The next step is to check whether the ~/.auth-awscreds file exists. If it doesn’t, we prompt the user for two inputs:

1. AWS credentials profile name (used by SDK, default is preferred)

2. Application domain URL (Our backend app domain is used for authentication)

let app_profile_file = format!("{}/.auth-awscreds",&user_home_dir);
 
   let config_exist : bool = Path::new(&app_profile_file).exists();
 
   let mut profile_name = String::new();
   let mut app_domain = String::new();
 
   if !config_exist {
       //ask the series of questions
       print!("Which profile to write AWS Credentials [default] : ");
       io::stdout().flush().unwrap();
       io::stdin()
           .read_line(&mut profile_name)
           .expect("Failed to read line");
 
       print!("App Domain : ");
       io::stdout().flush().unwrap();
      
       io::stdin()
           .read_line(&mut app_domain)
           .expect("Failed to read line");
      
       profile_name=String::from(profile_name.trim());
       app_domain=String::from(app_domain.trim());
      
       config_profile(&profile_name,&app_domain);
      
   }
   else {
       (profile_name,app_domain) = read_profile();
   }
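
The first-run logic above can be sketched in Python with the standard configparser module. This is only an illustration, not the utility’s actual code: the option names profile_name and app_domain and the helper names write_config and read_config are assumptions, since the post only states that the file uses the INI format with a default section.

```python
import configparser
from pathlib import Path
from typing import Tuple


def write_config(path: Path, profile_name: str, app_domain: str) -> None:
    """Store the two answers in an INI file under a [default] section."""
    config = configparser.ConfigParser()
    config["default"] = {
        # An empty answer falls back to the "default" AWS profile.
        "profile_name": profile_name.strip() or "default",
        "app_domain": app_domain.strip(),
    }
    with path.open("w") as handle:
        config.write(handle)


def read_config(path: Path) -> Tuple[str, str]:
    """Return (profile_name, app_domain) from an existing config file."""
    config = configparser.ConfigParser()
    config.read(path)
    section = config["default"]
    return section["profile_name"], section["app_domain"]
```

A caller would check whether the file exists, prompt and call write_config if it does not, and call read_config otherwise, which mirrors the if/else branch in the Rust snippet.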

These two properties are written to ~/.auth-awscreds under the default section. Following this, our utility generates a 1024-bit RSA key pair (public and private keys). Both keys are then base64-encoded.

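use openssl::rsa::Rsa;
use openssl::symm::Cipher;

// Generates a 1024-bit RSA key pair and returns (private_key, public_key),
// each as base64-encoded PEM.
pub fn genkeypairs() -> (String,String) {
   let rsa = Rsa::generate(1024).unwrap();
 
   // The private key PEM is encrypted with AES-128-CBC under a hard-coded passphrase.
   let private_key: Vec<u8> = rsa.private_key_to_pem_passphrase(Cipher::aes_128_cbc(),"Sagar Barai".as_bytes()).unwrap();
   let public_key: Vec<u8> = rsa.public_key_to_pem().unwrap();
 
   (base64::encode(private_key) , base64::encode(public_key))
}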

We then launch a browser window and navigate to the specified app domain URL. At this stage, our utility starts a temporary web server with the help of the Actix Web framework and listens on port 63442 of localhost.

println!("Opening web ui for authentication...!");
   open::that(&app_domain).unwrap();
 
   HttpServer::new(move || {
       //let stopper = tx.clone();
       let cors = Cors::permissive();
       App::new()
       .wrap(cors)
       //.app_data(stopper)
       .app_data(crypto_data.clone())
       .service(get_public_key)
       .service(set_aws_creds)
   })
   .bind(("127.0.0.1",63442))?
   .run()
   .await

The localhost web server has two endpoints.

1. GET endpoint (/publickey): This endpoint is called by our React app after authentication and returns the public key created during the initialization process. Since the web server hosted by the Rust application is not secured with TLS (plain HTTP), the actual AWS credentials must be posted to it as strings encrypted with this public key.

#[get("/publickey")]
pub async fn get_public_key(data: web::Data<AppData>) -> impl Responder {
   let public_key = &data.public_key;
  
   web::Json(HTTPResponseData{
       status: 200,
       msg: String::from("Ok"),
       success: true,
       data: String::from(public_key)
   })
}

2. POST endpoint (/setcreds): This endpoint is called once the React app has successfully retrieved credentials from API Gateway. The credentials are decrypted with the private key and then written to the ~/.aws/credentials file under the profile name defined in the utility configuration.

let encrypted_data = payload["data"].as_array().unwrap();
   let username = payload["username"].as_str().unwrap();
 
   let mut decypted_payload = vec![];
 
   for str in encrypted_data.iter() {
       //println!("{}",str.to_string());
       let s = str.as_str().unwrap();
       let decrypted = decrypt_data(&private_key, &s.to_string());
       decypted_payload.extend_from_slice(&decrypted);
   }
 
   let credentials : serde_json::Value = serde_json::from_str(&String::from_utf8(decypted_payload).unwrap()).unwrap();
 
   let aws_creds = AWSCreds{
       profile_name: String::from(profile_name),
       aws_access_key_id: String::from(credentials["AccessKeyId"].as_str().unwrap()),
       aws_secret_access_key: String::from(credentials["SecretAccessKey"].as_str().unwrap()),
       aws_session_token: String::from(credentials["SessionToken"].as_str().unwrap())
   };
 
   println!("Authenticated as {}",username);
   println!("Updating AWS Credentials File...!");
 
   configcreds(&aws_creds);

One of the interesting parts of this code is the decryption process, which iterates through an array of strings and joins the decrypted pieces with decypted_payload.extend_from_slice(&decrypted);. A 1024-bit RSA key works on 128-byte blocks, and we used OAEP padding, which takes 42 bytes of each block (with the default SHA-1 hash) and leaves the rest for data. Thus, at most 86 bytes can be encrypted per block. So, when the credentials are received, they arrive as an array of base64-encoded strings, each holding one 128-byte encrypted block. One has to decode each base64 string into a data buffer and then decrypt the data piece by piece.
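
The 86-byte limit follows from the OAEP size formula in PKCS #1 (maximum message length = k - 2*hLen - 2, where k is the modulus length in bytes and hLen the hash length). The sketch below reproduces that arithmetic and the split-and-rejoin step in Python. It performs no encryption; the function names are illustrative, and SHA-1 (20-byte digest) is assumed as the OAEP hash.

```python
import base64

RSA_KEY_BITS = 1024   # key size used by the utility
SHA1_DIGEST_LEN = 20  # OAEP hash length in bytes (SHA-1 assumed)


def oaep_max_plaintext_len(key_bits: int, hash_len: int = SHA1_DIGEST_LEN) -> int:
    """Largest message RSA-OAEP can encrypt in one block: k - 2*hLen - 2."""
    modulus_len = key_bits // 8
    return modulus_len - 2 * hash_len - 2


def split_into_chunks(data: bytes, chunk_len: int) -> list:
    """Split data into consecutive pieces of at most chunk_len bytes."""
    return [data[i:i + chunk_len] for i in range(0, len(data), chunk_len)]


def join_chunks(chunks: list) -> bytes:
    """Reassemble pieces in order, as the CLI does after decrypting each one."""
    return b"".join(chunks)


def encoded_block_len(key_bits: int) -> int:
    """Length of the base64 text that carries one ciphertext block."""
    return len(base64.b64encode(bytes(key_bits // 8)))
```

For a 1024-bit key, oaep_max_plaintext_len returns 86, and each 128-byte encrypted block travels as a 172-character base64 string. A 200-byte credentials document would therefore be sent as three array elements (86 + 86 + 28 bytes of plaintext).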

To generate a release binary, run: cargo build --release

AWS Cognito and Google Authentication

This guide does not cover how to set up Cognito and integration with Google Authentication. You can refer to our old post for a detailed guide on setting up authentication and authorization. (Refer to the sections Setup Authentication and Setup Authorization).

React App:

The React app is launched via our Rust CLI utility. This application is served straight from the S3 bucket via CloudFront. When our React app is loaded, it checks whether the current session is authenticated. If not, then with the help of the AWS Amplify framework, our app is redirected to the Cognito-hosted UI for authentication, which in turn automatically redirects to the Google login page.

render(){
   return (
     <div className="centerdiv">
       {
         this.state.appInitialised ?
           this.state.user === null ? Auth.federatedSignIn({provider: 'Google'}) :
           <Aux>
             {this.state.pageContent}
           </Aux>
         :
         <Loader/>
       }
     </div>
   )
 }

Once the session is authenticated, we set the React state variables and then retrieve the public key from the Actix Web server (Rust CLI app: auth-awscreds) by calling the /publickey GET endpoint. Following this, a POST request (/auth-creds) is made to API Gateway via the axios library. The request carries the public key and a JWT for authentication. The expected response from API Gateway is the encrypted temporary AWS credentials, which are then forwarded to our CLI application.

To ease this deployment, we have written Terraform code (available in the repository) that takes care of creating the S3 bucket, CloudFront distribution, and ACM certificate, building the React app, and deploying it to the S3 bucket. Navigate to the vars.tf file and change the respective default variables. The Terraform script will fail at first launch since the ACM certificate needs DNS record validation. You can create a CNAME record for DNS validation and re-run the Terraform script to continue the deployment. The React app expects a few environment variables. Below is a sample .env file; update the respective values for your environment.

REACT_APP_IDENTITY_POOL_ID=
REACT_APP_COGNITO_REGION=
REACT_APP_COGNITO_USER_POOL_ID=
REACT_APP_COGNTIO_DOMAIN_NAME=
REACT_APP_DOMAIN_NAME=
REACT_APP_CLIENT_ID=
REACT_APP_CLI_APP_URL=
REACT_APP_API_APP_URL=

Finally, deploy the React app using the sample commands below.

$ terraform plan -out plan     #creates plan for revision
$ terraform apply plan         #apply plan and deploy

API Gateway HTTP API and Lambda Function

When a request first reaches API Gateway, the gateway validates the JWT on its own. API Gateway natively supports Cognito integration, so any request with an invalid authorization header is rejected at API Gateway itself. This eases our authentication process and validates the identity. If the request is valid, it is passed on to our Lambda function. The Lambda function is written in Node.js and is an Express app wrapped with the serverless-http framework. The Express app has only one endpoint.

/auth-creds (POST): Once the request is received, it retrieves the identity ID from Cognito and logs it to stdout for audit purposes.

let identityParams = {
           IdentityPoolId: process.env.IDENTITY_POOL_ID,
           Logins: {}
       };
  
       identityParams.Logins[`${process.env.COGNITOIDP}`] = req.headers.authorization;
  
       const ci = new CognitoIdentity({region : process.env.AWSREGION});
  
       let idpResponse = await ci.getId(identityParams).promise();
  
       console.log("Auth Creds Request Received from ",JSON.stringify(idpResponse));

The app then extracts the base64-encoded public key. Following this, an STS (Security Token Service) API call is made to obtain temporary credentials. These credentials are then encrypted with the public key in chunks of 86 bytes.

const pemPublicKey = Buffer.from(public_key,'base64').toString();
 
       const authdata=await sts.assumeRole({
           ExternalId: process.env.STS_EXTERNAL_ID,
           RoleArn: process.env.IAM_ROLE_ARN,
           RoleSessionName: "DemoAWSAuthSession"
       }).promise();
 
       const creds = JSON.stringify(authdata.Credentials);
       const splitData = creds.match(/.{1,86}/g);
      
       const encryptedData = splitData.map(d=>{
           return publicEncrypt(pemPublicKey,Buffer.from(d)).toString('base64');
       });

Here, assumeRole assumes the IAM role, which has the appropriate policy documents attached. For the sake of this demo, we attached the AdministratorAccess policy. However, you should consider a hardened policy document and avoid attaching the administrator policy directly to the role.

resources:
 Resources:
   AuthCredsAssumeRole:
     Type: AWS::IAM::Role
     Properties:
       AssumeRolePolicyDocument:
         Version: "2012-10-17"
         Statement:
           -
             Effect: Allow
             Principal:
               AWS: !GetAtt IamRoleLambdaExecution.Arn
             Action: sts:AssumeRole
             Condition:
               StringEquals:
                 sts:ExternalId: ${env:STS_EXTERNAL_ID}
       RoleName: auth-awscreds-api
       ManagedPolicyArns:
         - arn:aws:iam::aws:policy/AdministratorAccess

Finally, the response is sent to the React app.

We have used the Serverless Framework to deploy the API. It creates the API Gateway, Lambda function, Lambda layer, and IAM role, and takes care of deploying the code to the Lambda function.

To deploy this application, follow the steps below.

1. cd layer/nodejs && npm install && cd ../.. && npm install

2. npm install -g serverless (on mac you can skip this step and use the npx serverless command instead)

3. Create a .env file, add the environment variables below to it, and set the respective values.

AWSREGION=ap-south-1
COGNITO_USER_POOL_ID=
IDENTITY_POOL_ID=
COGNITOIDP=
APP_CLIENT_ID=
STS_EXTERNAL_ID=
IAM_ROLE_ARN=
DEPLOYMENT_BUCKET=
APP_DOMAIN=

4. serverless deploy or npx serverless deploy

The entire codebase for the CLI app, React app, and backend API is available in the GitHub repository.

Testing:

Assuming that you have the compiled binary (auth-awscreds) available on your local machine and, for the sake of testing, have installed `aws-cli`, you can run /path/to/your/auth-awscreds.

If you selected your AWS profile name as “demo-awscreds,” you can then export the AWS_PROFILE environment variable. If you prefer a “default” profile, you don’t need to export the environment variable as AWS SDK selects a “default” profile on its own.

[demo-awscreds]
aws_access_key_id=ASIAUAOF2CHC77SJUPZU
aws_secret_access_key=r21J4vwPDnDYWiwdyJe3ET+yhyzFEj7Wi1XxdIaq
aws_session_token=FwoGZXIvYXdzEIj//////////wEaDHVLdvxSNEqaQZPPQyK2AeuaSlfAGtgaV1q2aKBCvK9c8GCJqcRLlNrixCAFga9n+9Vsh/5AWV2fmea6HwWGqGYU9uUr3mqTSFfh+6/9VQH3RTTwfWEnQONuZ6+E7KT9vYxPockyIZku2hjAUtx9dSyBvOHpIn2muMFmizZH/8EvcZFuzxFrbcy0LyLFHt2HI/gy9k6bLCMbcG9w7Ej2l8vfF3dQ6y1peVOQ5Q8dDMahhS+CMm1q/T1TdNeoon7mgqKGruO4KJrKiZoGMi1JZvXeEIVGiGAW0ro0/Vlp8DY1MaL7Af8BlWI1ZuJJwDJXbEi2Y7rHme5JjbA=
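
A profile entry like the one above can be produced with a few lines of Python. The sketch below is an assumption-laden illustration of what a credentials writer such as configcreds might do, not a port of the Rust code: the function name write_aws_profile is hypothetical, and rewriting the file with configparser drops any comments it contained.

```python
import configparser
from pathlib import Path


def write_aws_profile(credentials_file: Path, profile: str,
                      access_key_id: str, secret_access_key: str,
                      session_token: str) -> None:
    """Create or replace one profile section, keeping other profiles intact."""
    # interpolation=None keeps '%' characters in values from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config.read(credentials_file)  # a missing file is silently skipped
    config[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "aws_session_token": session_token,
    }
    credentials_file.parent.mkdir(parents=True, exist_ok=True)
    with credentials_file.open("w") as handle:
        config.write(handle)
    # Credentials should be readable by the owner only.
    credentials_file.chmod(0o600)
```

Calling it again with the same profile name overwrites the expired keys, which is what a re-run of the utility needs after the session ends.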

To validate, you can then run “aws s3 ls.” You should see the S3 buckets from your AWS account listed. Note that these credentials are only valid for 60 minutes, after which you will have to re-run the command to acquire a new set of AWS credentials. Of course, you can configure your IAM role to allow a longer session duration for the assumed role.

auth-awscreds in Action:

Summary

Currently, “auth-awscreds” is at an early development stage. This post demonstrates how AWS credentials can be acquired temporarily without having to worry about key rotation. One of the features that we are currently working on is RBAC, with the help of AWS Cognito. Since this tool currently doesn’t support any command-line arguments, the utility configuration can’t be changed from the command line. You can manually edit or delete the utility configuration file; deleting it triggers the configuration prompt on the next run. We also want to add support for multiple profiles so that multiple AWS accounts can be used.

December 12, 2022

Automating Serverless Framework Deployment using Watchdog

These days, we see that most software development is moving towards serverless architecture, and that’s no surprise. Almost all top cloud service providers have serverless services that follow a pay-as-you-go model. This way, consumers don’t have to pay for any unused resources. Also, there’s no need to worry about procuring dedicated servers, network/hardware management, operating system security updates, etc.

Unfortunately, for cloud developers, serverless tools don’t provide an auto-deploy service that picks up changes from the local environment. This is still a headache: the developer must deploy and test changes manually. Web app projects using Node or Django, by contrast, run a watcher alongside their development server. When something changes in the code directory, the server automatically restarts with the new changes, and the developer can check whether they work as expected.

In this blog, we will talk about automating serverless application deployment whenever the local codebase changes. We are using AWS as the cloud provider and primarily focusing on Lambda to demonstrate the functionality.

Prerequisites:

This article uses AWS, so command-line and programmatic access are necessary.
This article is written with deployment to AWS in mind, so AWS credentials are needed to make changes in the Stack. In the case of other cloud providers, we would require that provider’s command-line access.
We are using a serverless application framework for deployment. (This example will also work for other tools like Zappa.) So, some serverless context would be required.

Before development, let’s divide the problem statement into sub-tasks and build them one step at a time.

Problem Statement

Create a codebase watcher service that would trigger either a stack update on AWS or run a local test. By doing this, developers would not have to worry about manual deployment on the cloud provider. This service needs to keep an eye on the code and generate events when an update/modify/copy/delete occurs in the given codebase.

Solution

First, to watch the codebase, we need logic that acts as a trigger and notifies us when the underlying files change. Packages for this already exist in various programming languages. In this example, we are using the Python watchdog package.

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

CODE_PATH = "<codebase path>"

class WatchMyCodebase:
    # Set the directory on watch
    def __init__(self):
        self.observer = Observer()

    def run(self):
        event_handler = EventHandler()
        # recursive flag decides if watcher should collect changes in CODE_PATH directory tree.
        self.observer.schedule(event_handler, CODE_PATH, recursive=True)
        self.observer.start()
        self.observer.join()


class EventHandler(FileSystemEventHandler):
    """Handle events generated by Watchdog Observer"""

    @classmethod
    def on_any_event(cls, event):
        if event.is_directory:
            # Ignore directory-level events, such as creating a new empty directory.
            return None

        elif event.event_type == 'modified':
            print("file under codebase directory is modified...")

if __name__ == '__main__':
    watch = WatchMyCodebase()
    watch.run()

Here, the on_any_event() class method gets called on any update in the watched directory, and this is where we need to add the deployment logic. However, we can’t deploy as soon as a notification arrives from the watcher, because modern IDEs save files as soon as the user changes them. If we deployed on every change, then most of the time we would deploy half-complete services.

To handle this, we must add some timeout before deploying the service.

Here, the program will wait for some time after the file is changed. And if it finds that, for some time, there have been no new changes in the codebase, it will deploy the service.

import time
import subprocess
import threading
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

valid_events = ['created', 'modified', 'deleted', 'moved']
DEPLOY_AFTER_CHANGE_THRESHOLD = 300
STAGE_NAME = ""
CODE_PATH = "<codebase path>"

def deploy_env():
    # Run `sls deploy` and capture its output as text so it prints readably.
    process = subprocess.Popen(['sls', 'deploy', '--stage', STAGE_NAME, '--verbose'],
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               text=True)
    stdout, stderr = process.communicate()
    print(stdout, stderr)

def deploy_service_on_change():
    while True:
        if EventHandler.last_update_time and (int(time.time() - EventHandler.last_update_time) > DEPLOY_AFTER_CHANGE_THRESHOLD):
            EventHandler.last_update_time = None
            deploy_env()
        time.sleep(5)

def start_interval_watcher_thread():
    # daemon=True lets the process exit on Ctrl-C instead of hanging on the polling loop.
    interval_watcher_thread = threading.Thread(target=deploy_service_on_change, daemon=True)
    interval_watcher_thread.start()


class WatchMyCodebase:
    # Set the directory on watch
    def __init__(self):
        self.observer = Observer()

    def run(self):
        event_handler = EventHandler()
        self.observer.schedule(event_handler, CODE_PATH, recursive=True)
        self.observer.start()
        self.observer.join()


class EventHandler(FileSystemEventHandler):
    """Handle events generated by Watchdog Observer"""
    last_update_time = None

    @classmethod
    def on_any_event(cls, event):
        if event.is_directory:
            # Ignore directory-level events, such as creating a new empty directory.
            return None

        # Ignore changes inside the .serverless directory; Serverless creates
        # a few temporary files there while deploying.
        elif event.event_type in valid_events and '.serverless' not in event.src_path:
            cls.last_update_time = time.time()


if __name__ == '__main__':
    start_interval_watcher_thread()
    watch = WatchMyCodebase()
    watch.run()

The specified valid_events acts as a filter to deploy, and we are only considering these events and acting upon them.
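
The filter can also be pulled out into a small pure function, which makes it easy to test without touching the file system. The sketch below is one possible variant, not the code used above: the names should_trigger_deploy and IGNORED_DIRS are made up for illustration, and it matches .serverless as a whole path component rather than as a substring of the path.

```python
from pathlib import PurePath

VALID_EVENTS = {"created", "modified", "deleted", "moved"}
IGNORED_DIRS = {".serverless"}  # build artifacts written during deploy


def should_trigger_deploy(event_type: str, src_path: str,
                          is_directory: bool = False) -> bool:
    """Return True when a file system event should schedule a deployment."""
    if is_directory or event_type not in VALID_EVENTS:
        return False
    # Compare whole path components so a file that merely contains
    # ".serverless" in its name is not ignored by accident.
    return not IGNORED_DIRS.intersection(PurePath(src_path).parts)
```

Inside on_any_event, the handler could then call should_trigger_deploy(event.event_type, event.src_path, event.is_directory) and update last_update_time only when it returns True.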

Moreover, to add a delay after file changes and ensure that there are no new changes, we added interval_watcher_thread. This checks the difference between current and last directory update time, and if it’s greater than the specified threshold, we deploy serverless resources.

def deploy_service_on_change():
    while True:
        if EventHandler.last_update_time and (int(time.time() - EventHandler.last_update_time) > DEPLOY_AFTER_CHANGE_THRESHOLD):
            EventHandler.last_update_time = None
            deploy_env()
        time.sleep(5)

def start_interval_watcher_thread():
    # daemon=True lets the process exit on Ctrl-C instead of hanging on the polling loop.
    interval_watcher_thread = threading.Thread(target=deploy_service_on_change, daemon=True)
    interval_watcher_thread.start()

Here, the sleep time in deploy_service_on_change is important. It keeps the loop from burning CPU cycles by constantly checking whether the condition to deploy is satisfied. On the other hand, too long a sleep would delay the deployment beyond the value specified in DEPLOY_AFTER_CHANGE_SEC.

Note: With a language like Golang and its goroutines and channels, we could build an even more efficient application. The same is possible in Python with the help of thread signals.
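As a rough illustration of the thread-based idea in Python, the sketch below replaces the polling loop with a timer that restarts on every change. The Debouncer class and its trigger() method are hypothetical names that are not part of the watcher above; the sketch relies only on threading.Timer from the standard library.

```python
import threading


class Debouncer:
    """Run `action` once, `delay` seconds after the most recent trigger() call."""

    def __init__(self, delay, action):
        self.delay = delay
        self.action = action
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self):
        # Called on every file change: cancel the pending run (if any)
        # and restart the countdown from zero.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.action)
            self._timer.daemon = True
            self._timer.start()
```

Under this approach, on_any_event would call something like debouncer.trigger() instead of recording last_update_time, and no thread has to wake up every few seconds while the codebase is idle.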

Let’s build one lambda function that automatically deploys on a change. Let’s also be a little lazy and develop a basic python lambda that takes a number as an input and returns its factorial value.

import math

def lambda_handler(event, context):
    """
    Handler for get factorial
    """

    number = event['number']
    return math.factorial(number)
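math.factorial raises an exception for negative input, and the handler above fails with a KeyError when 'number' is missing from the event. The variant below is a minimal sketch of validating the input first; the shape of the error payload is an assumption, not something the original handler defines.

```python
import math


def lambda_handler(event, context):
    """Return the factorial of event['number'], or an error payload for bad input."""
    number = event.get("number")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(number, bool) or not isinstance(number, int) or number < 0:
        return {"error": "number must be a non-negative integer"}
    return {"number": number, "factorial": math.factorial(number)}


# Local invocation; the context argument is unused, so None is enough here.
print(lambda_handler({"number": 5}, None))  # {'number': 5, 'factorial': 120}
```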

We are using a serverless application framework, so to deploy this lambda, we need a serverless.yml file that specifies stack details like execution environment, cloud provider, environment variables, etc. More parameters are listed in this guide.

service: get-factorial

provider:
  name: aws
  runtime: python3.7

functions:
  get_factorial:
    handler: handler.lambda_handler

We need to keep both handler.py and serverless.yml in the same folder, or we need to update the function handler in serverless.yml.

We can deploy it manually using this serverless command:

sls deploy --stage production -v

Note: Before deploying, export AWS credentials.

The above command deploys the stack using CloudFormation:

--stage specifies the environment where the stack should be deployed. Like any other software project, it can have stage names such as production, dev, test, etc.
-v enables verbose output.

To auto-deploy changes from now on, we can use the watcher.

Start the watcher with this command:

python3  auto_deploy_sls.py

This will run continuously and keep an eye on the codebase directory, and if any changes are detected, it will deploy them. We can customize this further, for example by adding a post-deploy step that runs test cases against the new stack.

If the stack has lots of dependencies, repeatedly deploying to an actual cloud provider for testing generates network traffic and might increase billing. However, we can easily avoid this by using serverless local development.

Here is a serverless blog that specifies local development and testing of a cloudformation stack. It emulates cloud behavior on the local setup, so there’s no need to worry about cloud service billing.

One useful upgrade is support for complex directory structures.

In the above example, we are assuming that only one single directory is present, so it’s fine to deploy using the command:

sls deploy --stage production -v

But in some projects, one might have multiple stacks present in the codebase at different hierarchies. Consider the below example: We have three different lambdas, so updating in the `check-prime` directory requires updating only that lambda and not the others.

├── check-prime
│   ├── handler.py
│   └── serverless.yml
├── get-factorial
│   ├── handler.py
│   └── serverless.yml
└── get-factors
    ├── handler.py
    └── serverless.yml

The above can be achieved in on_any_event(). By using the variable event.src_path, we can learn the file path that received the event.

Now, deployment command changes to:

cd <updated_directory> && sls deploy --stage <your-stage> -v

This will deploy only the updated stack.
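One way to turn event.src_path into the directory to deploy is to walk up from the changed file until a serverless.yml is found. The helper below is a sketch of that lookup; find_stack_dir and its parameters are hypothetical names, and it assumes every stack keeps its serverless.yml at the top of its own directory, as in the tree above.

```python
import os


def find_stack_dir(changed_path, code_root):
    """Return the nearest parent directory of changed_path that contains
    serverless.yml, or None if none is found at or below code_root."""
    code_root = os.path.abspath(code_root)
    current = os.path.dirname(os.path.abspath(changed_path))
    while True:
        if os.path.isfile(os.path.join(current, "serverless.yml")):
            return current
        parent = os.path.dirname(current)
        # Stop at the watched root, or at the filesystem root as a safety net.
        if current == code_root or parent == current:
            return None
        current = parent
```

The watcher could remember the returned directory alongside last_update_time and use it as <updated_directory> in the deploy command.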

Conclusion

We learned that even though serverless deployment is a manual task, it can be automated with the help of Watchdog for a better developer workflow.

With the help of serverless local development, we can test changes as we make them, without manually deploying to the cloud environment each time.

We hope this helps you improve your serverless development experience and close the loop faster.

December 12, 2022

Building Your First AWS Serverless Application? Here’s Everything You Need to Know
A serverless architecture is a way to implement and run applications, services, or microservices without the need to manage infrastructure. Your application still runs on servers, but all the server management is done by AWS. We no longer need to provision, scale, or maintain servers to run our applications, databases, and storage systems. These managed services also spare developers from building every part of an application from scratch.

Why Serverless
1. More focus on development rather than managing servers.
2. Cost Effective.
3. Application which scales automatically.
4. Quick application setup.
Services For ServerLess

To implement a serverless architecture, cloud providers offer multiple services; here we will mostly explore the ones from AWS. The following services can be used depending on the application requirements.
1. Lambda: It is used to write business logic / schedulers / functions.
2. S3: It is mostly used for storing objects, but it can also host web apps. You can host a static website on S3.
3. API Gateway: It is used for creating, publishing, maintaining, monitoring and securing REST and WebSocket APIs at any scale.
4. Cognito: It provides authentication, authorization & user management for your web and mobile apps. Your users can sign in directly with a username and password or through third parties such as Facebook, Amazon or Google.
5. DynamoDB: It is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
Three-tier Serverless Architecture

So, let’s take a use case in which you want to develop a three-tier serverless application. The three-tier architecture is a popular pattern for user-facing applications. The tiers that comprise the architecture are the presentation tier, the logic tier and the data tier. The presentation tier represents the component that users directly interact with, such as a web page or mobile app UI. The logic tier contains the code required to translate user actions at the presentation tier into the functionality that drives the application’s behaviour. The data tier consists of your storage media (databases, file systems, object stores) that hold the data relevant to the application. The figure shows a simple three-tier application.

Figure: Simple Three-Tier Architectural Pattern

Presentation Tier

The presentation tier of the three-tier architecture represents the View part of the application. Here you can use S3 to host a static website. On a static website, individual web pages include static content, and they may also contain client-side scripts.

The following is a quick procedure to configure an Amazon S3 bucket for static website hosting in the S3 console.

To configure an S3 bucket for static website hosting

1. Log in to the AWS Management Console and open the S3 console at

2. In the Bucket name list, choose the name of the bucket that you want to enable static website hosting for.

3. Choose Properties.

4. Choose Static Website Hosting

Once you enable your bucket for static website hosting, browsers can access all of your content through the Amazon S3 website endpoint for your bucket.

5. Choose Use this bucket to host.

A. For Index Document, type the name of your index document, which is typically named index.html. When you configure an S3 bucket for website hosting, you must specify an index document, which will be returned by S3 when requests are made to the root domain or any of the subfolders.

B. (Optional) For 4XX errors, you can provide your own custom error document that gives additional guidance to your users. Type the name of the file that contains the custom error document. If an error occurs, S3 returns this error document.

C. (Optional) If you want to specify advanced redirection rules, describe them in XML in the Edit Redirection Rules text box.
E.g.
```
<RoutingRules>
    <RoutingRule>
        <Condition>
            <HttpErrorCodeReturnedEquals>403</HttpErrorCodeReturnedEquals>
        </Condition>
        <Redirect>
            <HostName>mywebsite.com</HostName>
            <ReplaceKeyPrefixWith>notfound/</ReplaceKeyPrefixWith>
        </Redirect>
    </RoutingRule>
</RoutingRules>
```
6. Choose Save

7. Add a bucket policy to the website bucket that grants everyone access to the objects in the S3 bucket. When you configure an S3 bucket as a website, you must make the objects that you want to serve publicly readable. To do so, you write a bucket policy that grants everyone the s3:GetObject permission. The following bucket policy grants everyone access to the objects in the example-bucket bucket.
```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::example-bucket/*"
            ]
        }
    ]
}
```
Note: If you choose Disable Website Hosting, S3 removes the website configuration from the bucket, so that the bucket is no longer accessible from the website endpoint, but the bucket is still available at the REST endpoint.
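If you manage several website buckets, the public-read policy shown above can be generated instead of edited by hand. The helper below is a small sketch; public_read_policy is a hypothetical name, and the function only builds the JSON document without applying it.

```python
import json


def public_read_policy(bucket_name):
    """Build the public-read bucket policy shown above for the given bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject"],
                "Resource": ["arn:aws:s3:::{}/*".format(bucket_name)],
            }
        ],
    }


policy_json = json.dumps(public_read_policy("example-bucket"), indent=4)
print(policy_json)
```

The resulting string can be pasted into the bucket policy editor in the S3 console, or passed as the Policy argument of boto3's put_bucket_policy call.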

Logic Tier

The logic tier represents the brains of the application. Here the two core serverless services, API Gateway and Lambda, are used together to form your logic tier. The features of these two services allow you to build a serverless production application that is highly scalable, available and secure. Your application could use any number of servers; however, by leveraging this pattern you do not have to manage a single one. In addition, by using these managed services together you get the following benefits:
1. No operating system to choose, secure or manage.
2. No servers to right-size or monitor.
3. No risk to your cost from over-provisioning.
4. No risk to your performance from under-provisioning.
API Gateway

API Gateway is a fully managed service for defining, deploying and maintaining APIs. Anyone can integrate with the APIs using standard HTTPS requests. It also has specific features and qualities that make it a good fit for your logic tier.

Integration with Lambda

API Gateway gives your application a simple way to invoke AWS Lambda directly through HTTPS requests. API Gateway forms the bridge that connects your presentation tier and the functions you write in Lambda. After you define the client / server relationship using your API, the contents of the client’s HTTPS requests are passed to the Lambda function for execution. The contents include request metadata, request headers and the request body.
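To make the request hand-off concrete, here is a minimal sketch of a handler that reads those parts of the request. It assumes the method uses Lambda proxy integration, where API Gateway passes the request as an event with keys such as httpMethod, headers, queryStringParameters and body; with a non-proxy integration the event shape depends on your mapping template. The summary fields returned are purely illustrative.

```python
import json


def lambda_handler(event, context):
    """Echo back selected parts of an API Gateway proxy-integration request."""
    # API Gateway sends null for absent headers/parameters, hence the `or {}`.
    headers = event.get("headers") or {}
    params = event.get("queryStringParameters") or {}
    raw_body = event.get("body")
    payload = json.loads(raw_body) if raw_body else {}

    summary = {
        "method": event.get("httpMethod"),
        "content_type": headers.get("Content-Type"),
        "query": params,
        "payload": payload,
    }
    # Proxy integration expects statusCode, headers and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(summary),
    }
```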

API Performance Across the Globe

Each edge-optimized deployment of API Gateway includes an Amazon CloudFront distribution under the covers. Amazon CloudFront is a content delivery web service that uses Amazon’s global network of edge locations as connection points for clients integrating with the API. This helps drive down the total response time of your API. Through its use of multiple edge locations across the world, Amazon CloudFront also provides you with capabilities to combat distributed denial of service (DDoS) attack scenarios.

You can improve the performance of specific API requests by using API Gateway to store responses in an optional in-memory cache. This not only provides performance benefits for repeated API requests, but also reduces backend executions, which can reduce overall cost.

Let’s dive into each step

1. Create Lambda Function
Log in to the AWS Console, head over to the Lambda service and click on “Create A Function”.

A. Choose the first option, “Author from scratch”
B. Enter Function Name
C. Select Runtime e.g. Python 2.7
D. Click on “Create Function”

Once your function is ready, you will see a basic function generated in the language you chose.
E.g.
```
import json

def lambda_handler(event, context):
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
```
2. Testing Lambda Function

Click on the “Test” button at the top right corner to configure a test event. As we are not sending any event data, just give the event a name, keep the “Hello World” template as it is and “Create” it.

Now, when you hit the “Test” button again, it runs the function we created earlier and returns the configured value.

Create & Configure API Gateway connecting to Lambda

We are done creating the Lambda function, but how do we invoke it from the outside world? We need an endpoint, right?

Go to API Gateway and click on “Get Started”. Agree to creating an Example API, but instead of using it, choose “New API”. Give it a name and keep the “Endpoint Type” as Regional for now.

Create the API and you will land on the “Resources” page of the created API Gateway. Go through the following steps:

A. Click on “Actions”, then click on “Create Method”. Select the GET method for our function, then click the tick mark on the right side of “GET” to set it up.
B. Choose “Lambda Function” as the integration type.
C. Choose the region where we created the function earlier.
D. Enter the name of the Lambda function we created.
E. Save the method. It will ask you to confirm “Add Permission to Lambda Function”; agree to that and the setup is done.
F. Now, we can test our setup. Click on “Test” to run the API. It should give the response text we saw on the Lambda test screen.

Now, to get an endpoint, we need to deploy the API. In the Actions dropdown, click on Deploy API under API Actions. Fill in the deployment details and hit Deploy.

After that, we will get our HTTPS endpoint.

On the above screen you can see settings such as caching, throttling and logging, which can be configured. Save the changes and browse to the invoke URL; it returns the same response we were getting earlier from Lambda. With that, the logic tier of our serverless application is done.

Data Tier

By using Lambda as your logic tier, you have a number of data storage options for your data tier. These options fall into two broad categories: Amazon VPC hosted data stores and IAM-enabled data stores. Lambda has the ability to integrate with both securely.

Amazon VPC Hosted Data Stores
1. Amazon RDS
2. Amazon ElastiCache
3. Amazon Redshift
IAM-Enabled Data Stores
1. Amazon DynamoDB
2. Amazon S3
3. Amazon Elasticsearch Service
You can use any of these for storage, but DynamoDB is one of the best suited for serverless applications.

Why DynamoDB ?
1. It is a NoSQL database that is fully managed by AWS.
2. It provides fast & predictable performance with seamless scalability.
3. DynamoDB lets you offload the administrative burden of operating and scaling a distributed system.
4. It offers encryption at rest, which eliminates the operational burden and complexity involved in protecting sensitive data.
5. You can scale your tables’ throughput capacity up or down without downtime or performance degradation.
6. It provides on-demand backups and lets you enable point-in-time recovery for your DynamoDB tables.
7. DynamoDB allows you to delete expired items from table automatically to help you reduce storage usage and the cost of storing data that is no longer relevant.
Following is the sample script for DynamoDB with Python which you can use with lambda.
```
from __future__ import print_function # Python 2/3 compatibility
import boto3
import json
import decimal
from boto3.dynamodb.conditions import Key, Attr
from botocore.exceptions import ClientError

# Helper class to convert a DynamoDB item to JSON.
class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            if o % 1 > 0:
                return float(o)
            else:
                return int(o)
        return super(DecimalEncoder, self).default(o)

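# endpoint_url points at a local DynamoDB instance (DynamoDB Local);
# omit it to use the AWS-hosted service in the given region.
dynamodb = boto3.resource("dynamodb", region_name='us-west-2', endpoint_url="http://localhost:8000")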

table = dynamodb.Table('Movies')

title = "The Big New Movie"
year = 2015

try:
    response = table.get_item(
        Key={
            'year': year,
            'title': title
        }
    )
except ClientError as e:
    print(e.response['Error']['Message'])
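else:
    # get_item omits the 'Item' key when no item matches the given key.
    item = response.get('Item')
    if item is None:
        print("Item not found")
    else:
        print("GetItem succeeded:")
        print(json.dumps(item, indent=4, cls=DecimalEncoder))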
```
Note: To run the above script successfully, you need to attach a policy to your Lambda role. In this case, the policy must allow the DynamoDB operations and, if you want to store logs, CloudWatch Logs. Following is the policy which you can attach to your role for DB executions.
```
{
	"Version": "2012-10-17",
	"Statement": [{
			"Effect": "Allow",
			"Action": [
				"dynamodb:BatchGetItem",
				"dynamodb:GetItem",
				"dynamodb:Query",
				"dynamodb:Scan",
				"dynamodb:BatchWriteItem",
				"dynamodb:PutItem",
				"dynamodb:UpdateItem"
			],
			"Resource": "arn:aws:dynamodb:eu-west-1:123456789012:table/SampleTable"
		},
		{
			"Effect": "Allow",
			"Action": [
				"logs:CreateLogStream",
				"logs:PutLogEvents"
			],
			"Resource": "arn:aws:logs:eu-west-1:123456789012:*"
		},
		{
			"Effect": "Allow",
			"Action": "logs:CreateLogGroup",
			"Resource": "*"
		}
	]
}
```
Sample Architecture Patterns

You can implement the following popular architecture patterns using API Gateway & Lambda as your logic tier, Amazon S3 for the presentation tier, and DynamoDB as your data tier. For each example, we will only use AWS services that do not require users to manage their own infrastructure.

Mobile Backend

1. Presentation Tier: A mobile application running on each user’s smartphone.

2. Logic Tier: API Gateway & Lambda. The logic tier is globally distributed by the Amazon CloudFront distribution created as part of each API Gateway API. A set of Lambda functions can be specific to user / device identity management and authentication, managed by Amazon Cognito, which provides integration with IAM for temporary user access credentials as well as with popular third-party identity providers. Other Lambda functions can define the core business logic for your mobile back end.

3. Data Tier: The various data storage services can be leveraged as needed; options are given above in data tier.

Amazon S3 Hosted Website

1. Presentation Tier: Static website content hosted on S3, distributed by Amazon CloudFront. Hosting static website content on S3 is a cost-effective alternative to hosting content on server-based infrastructure. However, for a website to offer rich features, the static content often must integrate with a dynamic back end.

2. Logic Tier: API Gateway & Lambda. Static web content hosted in S3 can directly integrate with API Gateway, which can be CORS compliant.

3. Data Tier: The various data storage services can be leveraged based on your requirement.

ServerLess Costing

At the top of the AWS invoice, we can see the total cost of AWS services. The bill covered 2.1 million API requests and all of the infrastructure required to support them.

Following is the list of services with their cost.

Note: You can estimate your cost with the AWS calculators using the following links:
1. https://calculator.s3.amazonaws.com/index.html
2. AWS Pricing Calculator
Conclusion

The three-tier architecture pattern encourages the best practice of creating application components that are decoupled, scalable, and easy to develop and maintain. The serverless services you pick will vary with the requirements of the application.
December 12, 2022
BigQuery 101: All the Basics You Need to Know
Google BigQuery is an enterprise data warehouse built using BigTable and Google Cloud Platform. It’s serverless and completely managed. BigQuery works great with all sizes of data, from a 100 row Excel spreadsheet to several Petabytes of data. Most importantly, it can execute a complex query on those data within a few seconds.

We need to note one thing before we proceed: BigQuery is not a transactional database. It takes around 2 seconds to run a simple query like ‘SELECT * FROM bigquery-public-data.object LIMIT 10’ on a 100 KB table with 500 rows. Hence, it shouldn’t be thought of as an OLTP (Online Transaction Processing) database. BigQuery is for Big Data!

BigQuery supports SQL-like queries, which makes it user-friendly and beginner-friendly. It’s accessible via its web UI, command-line tool, or client libraries (written in C#, Go, Java, Node.js, PHP, Python, and Ruby). You can also take advantage of its REST APIs and get your job done by sending a JSON request.

Now, let’s dive deeper to understand it better. Suppose you are a data scientist (or a startup which analyzes data) and you need to analyze terabytes of data. If you choose a tool like MySQL, the first step before even thinking about any query is to have an infrastructure in place, that can store this magnitude of data.

Designing this setup itself will be a difficult task because you have to figure out what will be the RAM size, DCOS or Kubernetes, and other factors. And if you have streaming data coming, you will need to set up and maintain a Kafka cluster. In BigQuery, all you have to do is a bulk upload of your CSV/JSON file, and you are done. BigQuery handles all the backend for you. If you need streaming data ingestion, you can use Fluentd. Another advantage of this is that you can connect Google Analytics with BigQuery seamlessly.

BigQuery is serverless, highly available, and petabyte scalable service which allows you to execute complex SQL queries quickly. It lets you focus on analysis rather than handling infrastructure. The idea of hardware is completely abstracted and not visible to us, not even as virtual machines.

Architecture of Google BigQuery

You don’t need to know too much about the underlying architecture of BigQuery. That’s actually the whole idea of it – you don’t need to worry about architecture and operation.

However, understanding BigQuery Architecture helps us in controlling costs, optimizing query performance, and optimizing storage. BigQuery is built using the Google Dremel paper.

Quoting an Abstract from the Google Dremel Paper –

“Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel and explain how it complements MapReduce-based computing. We present a novel columnar storage representation for nested records and discuss experiments on few-thousand node instances of the system.”

Dremel has been in production at Google since 2006. Google has used it for the following tasks –
- Analysis of crawled web documents.
- Tracking install data for applications on Android Market.
- Crash reporting for Google products.
- OCR results from Google Books.
- Spam analysis.
- Debugging of map tiles on Google Maps.
- Tablet migrations in managed Bigtable instances.
- Results of tests run on Google’s distributed build system.
- Disk I/O statistics for hundreds of thousands of disks.
- Resource monitoring for jobs run in Google’s data centers.
- Symbols and dependencies in Google’s codebase.
BigQuery is much more than Dremel. Dremel is just a query execution engine, whereas BigQuery also builds on interesting technologies like Borg (predecessor of Kubernetes) and Colossus. Colossus is the successor to the Google File System (GFS), as mentioned in the Google Spanner paper.

How BigQuery Stores Data?

BigQuery stores data in a columnar format called Capacitor (the successor of ColumnarIO), which achieves a very high compression ratio and scan throughput. Unlike ColumnarIO, Capacitor lets BigQuery operate directly on compressed data without decompressing it.

Columnar storage has the following advantages:
- Traffic minimization – When you submit a query, the required column values on each query are scanned and only those are transferred on query execution. E.g., a query `SELECT title FROM Collection` would access the title column values only.
- Higher compression ratio – Columnar storage can achieve a compression ratio of 1:10, whereas ordinary row-based storage can compress at roughly 1:3.
(Image source: Google Dremel Paper)

Columnar storage has the disadvantage of not working efficiently when updating existing records. That is why Dremel doesn’t support any update queries.
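The two advantages listed above can be illustrated with a toy example in plain Python. This is only a sketch of the general idea, not how Capacitor is implemented; the sample rows and the run-length encoding used to show compression are assumptions chosen for illustration.

```python
from itertools import groupby

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"title": "Intro", "country": "US", "views": 10},
    {"title": "Setup", "country": "US", "views": 7},
    {"title": "Usage", "country": "US", "views": 3},
    {"title": "FAQ", "country": "IN", "views": 5},
]

# Column-oriented layout: one list per field.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def select_column(columns, name):
    """SELECT <name> FROM ...: only the requested column is read."""
    return columns[name]


def run_length_encode(values):
    """Compress runs of equal neighbours, which are common within one column."""
    return [(value, len(list(group))) for value, group in groupby(values)]


print(select_column(columns, "title"))
print(run_length_encode(columns["country"]))
```

Reading the title column never touches country or views, and the repeated country values collapse into a couple of (value, count) pairs, which is the kind of redundancy a row layout hides by interleaving different fields.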

How the Query Gets Executed?

BigQuery depends on Borg for data processing. Borg simultaneously instantiates hundreds of Dremel jobs across required clusters made up of thousands of machines. In addition to assigning compute capacity for Dremel jobs, Borg handles fault-tolerance as well.

Now, how do you design/execute a query which can run on thousands of nodes and fetches the result? This challenge was overcome by using the Tree Architecture. This architecture forms a gigantically parallel distributed tree for pushing down a query to the tree and aggregating the results from the leaves at a blazingly fast speed.

(Image source: Google Dremel Paper)
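The push-down and aggregate-up idea can be sketched in a few lines of Python. This is a single-process toy under stated assumptions: tree_count, leaf_count and the fanout parameter are hypothetical names, each shard stands in for the data held by one leaf server, and the recursion stands in for intermediate servers that would run in parallel in the real system.

```python
def leaf_count(shard, predicate):
    """Leaf server: scan the local shard and return a partial count."""
    return sum(1 for row in shard if predicate(row))


def tree_count(shards, predicate, fanout=2):
    """Intermediate server: split the shards among up to `fanout` children
    and add up the partial results they return. Requires fanout >= 2."""
    if not shards:
        return 0
    if len(shards) == 1:
        return leaf_count(shards[0], predicate)
    size = -(-len(shards) // fanout)  # ceiling division
    groups = [shards[i:i + size] for i in range(0, len(shards), size)]
    return sum(tree_count(group, predicate, fanout) for group in groups)


shards = [["python", "go"], ["python", "rust", "python"], ["c"], ["go", "python"]]
print(tree_count(shards, lambda tag: tag == "python"))  # 4
```

Each level only forwards the query downwards and small partial aggregates upwards, so no single node ever has to see all the rows.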

BigQuery vs. MapReduce

The key differences between BigQuery and MapReduce are –
- Dremel is designed as an interactive data analysis tool for large datasets
- MapReduce is designed as a programming framework to batch process large datasets
Moreover, Dremel finishes most queries within seconds or tens of seconds and can even be used by non-programmers, whereas MapReduce takes much longer (sometimes even hours or days) to process a query.

Following is a comparison on running MapReduce on a row and columnar DB:

(Image source: Google Dremel Paper)

Another important thing to note is that BigQuery is meant to analyze structured data (SQL) but in MapReduce, you can write logic for unstructured data as well.

Comparing BigQuery and Redshift

In Redshift, you need to allocate different instance types and create your own clusters. The benefit of this is that it lets you tune the compute/storage to meet your needs. However, you have to be aware of (virtualized) hardware limits and scale up/out based on that. Note that you are charged by the hour for each instance you spin up.

In BigQuery, you just upload the data and query it. It is a truly managed service. You are charged by storage, streaming inserts, and queries.

The two data warehouses have more similarities than differences.

A smart user will definitely take advantage of the hybrid cloud (GCE+AWS) and leverage different services offered by both the ecosystems. Check out your quintessential guide to AWS Athena here.

Getting Started With Google BigQuery

Following is a quick example to show how you can quickly get started with BigQuery:
1. There are many public datasets available on BigQuery; here you are going to play with the ‘bigquery-public-data:stackoverflow’ dataset. You can click on the “Add Data” button on the left panel and select datasets.
2. Next, find the language that has the best community, based on the response time. You can write the following query to do that.
```
WITH question_answers_join AS (
  SELECT *
    , GREATEST(1, TIMESTAMP_DIFF(answers.first, creation_date, minute)) minutes_2_answer
  FROM (
    SELECT id, creation_date, title
      , (SELECT AS STRUCT MIN(creation_date) first, COUNT(*) c
         FROM `bigquery-public-data.stackoverflow.posts_answers` 
         WHERE a.id=parent_id
      ) answers
      , SPLIT(tags, '|') tags
    FROM `bigquery-public-data.stackoverflow.posts_questions` a
    WHERE EXTRACT(year FROM creation_date) > 2014
  )
)
SELECT COUNT(*) questions, tag
  , ROUND(EXP(AVG(LOG(minutes_2_answer))), 2) mean_geo_minutes
  , APPROX_QUANTILES(minutes_2_answer, 100)[SAFE_OFFSET(50)] median
FROM question_answers_join, UNNEST(tags) tag
WHERE tag IN ('javascript', 'python', 'rust', 'java', 'scala', 'ruby', 'go', 'react', 'c', 'c++')
AND answers.c > 0
GROUP BY tag
ORDER BY mean_geo_minutes
```
3. Now you can execute the query and get results –

You can see that C has the best community followed by JavaScript!

How to do Machine Learning on BigQuery?

Now that you have a sound understanding of BigQuery, it’s time for some real action.

As discussed above, you can connect Google Analytics with BigQuery. Go to the Google Analytics Admin panel; in the PROPERTY column, click All Products, then click Link BigQuery. After that, enter your BigQuery ID (or project number), and BigQuery will be linked to Google Analytics. Note – Right now BigQuery integration is only available to Google Analytics 360.

Assuming that you have already uploaded your Google Analytics data, here is how you can create a logistic regression model. Here, you are predicting whether a website visitor will make a transaction or not.
```
CREATE MODEL `velotio_tutorial.sample_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
```
The query creates a model named ‘velotio_tutorial.sample_model’ and sets ‘model_type’ to ‘logistic_reg’ because you want to train a logistic regression model. A logistic regression model splits input data into two classes and gives the probability that the data is in one of the classes. It is commonly used for “spam or not spam” type problems. Here, the problem is similar: a transaction will be made or not.

The SELECT statement in the above query gets the number of page views, the country from which the session originated, the operating system of the visitor’s device, whether the device is mobile, and whether any e-commerce transaction was made within the session (the label).
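
To make the "probability of one of two classes" idea concrete, here is a minimal JavaScript sketch of how a logistic regression model turns features into a probability. The weights are hypothetical and chosen only for illustration; a trained model would learn them from the data, and BigQuery ML handles that internally.

```javascript
// Logistic (sigmoid) function: maps any real-valued score to (0, 1).
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Hypothetical weights; a trained model would learn these from the data.
const weights = { bias: -3.0, pageviews: 0.2, isMobile: -0.5 };

// Probability that a session ends in a transaction (label = 1).
function predictTransaction(session) {
  const z = weights.bias
    + weights.pageviews * session.pageviews
    + weights.isMobile * (session.isMobile ? 1 : 0);
  return sigmoid(z);
}

const probability = predictTransaction({ pageviews: 20, isMobile: false });
console.log(probability > 0.5 ? 'transaction likely' : 'transaction unlikely', probability.toFixed(3));
```

A probability above 0.5 is read as "a transaction will be made", anything below as "no transaction".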

Now you just press run query to execute the query.

Conclusion

BigQuery is a query service that allows us to run SQL-like queries against multiple terabytes of data in a matter of seconds. If you have structured data, BigQuery is the best option to go for. It can help even a non-programmer to get the analytics right!

December 12, 2022
An Introduction To Cloudflare Workers And Cloudflare KV store
Cloudflare Workers

This post gives a brief introduction to Cloudflare Workers and Cloudflare KV store. They address a fairly common set of problems around scaling an application globally. There are standard ways of doing this but they usually require a considerable amount of upfront engineering work and developers have to be aware of the ‘scalability’ issues to some degree. Serverless application tools target easy scalability and quick response times around the globe while keeping the developers focused on the application logic rather than infra nitty-gritties.

Global responsiveness

When an application is expected to be accessed around the globe, requests from users sitting in different time-zones should take a similar amount of time. There can be multiple ways of achieving this depending upon how data intensive the requests are and what those requests actually do.

Data-intensive requests are harder and more expensive to globalize, but not all requests are the same. Static requests, like getting a documentation page or a blog post, can be globalized by generating markup at build time and deploying it on a CDN.

Then there are semi-dynamic requests. They render static content either with a small amount of data, or their content changes based on the time zone the request came from.

The above is a loose classification of requests, and there are exceptions. For example, not all static requests are presentational.

Serverless frameworks are particularly useful in scaling static and semi-static requests.
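
As an illustration of a semi-dynamic request, here is a minimal JavaScript sketch that renders the same markup for everyone except for one small piece that depends on the caller's time zone. The function names and the greeting thresholds are assumptions made for this example, and how the time zone is obtained for a request is left open.

```javascript
// Returns the hour (0-23) of `date` in the given IANA time zone.
function hourInTimeZone(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    hour: 'numeric',
    hourCycle: 'h23',
    timeZone,
  }).formatToParts(date);
  return parseInt(parts.find(part => part.type === 'hour').value, 10);
}

// Same static markup for everyone, with one small time-dependent piece.
function renderGreeting(date, timeZone) {
  const hour = hourInTimeZone(date, timeZone);
  const greeting = hour < 12 ? 'Good morning' : hour < 18 ? 'Good afternoon' : 'Good evening';
  return `<h1>${greeting}</h1>`;
}

const sampleTime = new Date('2022-12-12T06:00:00Z');
console.log(renderGreeting(sampleTime, 'Asia/Kolkata'));
console.log(renderGreeting(sampleTime, 'America/Los_Angeles'));
```

Nothing here needs a database or a central server, which is what makes this class of request a good fit for functions running close to the user.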

Cloudflare Workers Overview

Cloudflare Workers is essentially a function deployment service. It provides a serverless execution environment that can be used to develop and deploy small (although not necessarily) and modular cloud functions with minimal effort.

Getting started with Workers is easy. First, let’s install wrangler, a tool for managing Cloudflare Workers projects.
```
npm i @cloudflare/wrangler -g
```
Wrangler handles all the standard stuff for you, like project generation from templates, build, config, and publishing, among other things.

A worker primarily contains 2 parts: an event listener that invokes a worker and an event handler that returns a response object. Creating a worker is as easy as adding an event listener to a button.
```
addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
    return new Response("hello world")
}
```
Above is a simple hello world example. Wrangler can be used to build and get a live preview of your worker.
```
wrangler build
```
will build your worker. And
```
wrangler preview 
```
can be used to take a live preview in the browser. The preview is only meant to be used for testing (either by you or others). If you want the worker to be triggered by your own domain or a workers.dev subdomain, you need to publish it.

Publishing is fairly straightforward and requires very little configuration on both wrangler and your project.

Wrangler Configuration

Just create an account on Cloudflare and get an API key. To configure wrangler, just do:
```
wrangler config
```
It will ask for the registered email and API key, and you are good to go.

To publish your worker on a workers.dev subdomain, just fill in your account ID in wrangler.toml and run wrangler publish. The worker will be deployed and live at a generated workers.dev subdomain.

Regarding Routes

When you publish on a {script-name}.{subdomain}.workers.dev domain, the script or project associated with script-name will be invoked. There is no way to call a script just from {subdomain}.workers.dev.

Worker KV

Workers alone can’t be used to make anything complex without persistent storage; that’s where Workers KV comes into the picture. Workers KV, as the name suggests, is a low-latency, high-volume key-value store that is designed for efficient reads.

It optimizes read latency by dynamically spreading the most frequently read entries to the edges (replicated in several regions) and storing less frequently read entries centrally.

Newly added keys (a CREATE) are immediately reflected in every region, while a change to the value of an existing key (an UPDATE) may take as long as 60 seconds to propagate, depending upon the region.

Workers KV is only available to paid users of Cloudflare.
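
To see what a propagation delay of up to 60 seconds means for a reader, here is a toy JavaScript model of an edge location that caches values from a central store for a fixed window. The EdgeCache class, its 60-second window, and the fake clock are assumptions made purely for illustration; this is not how Workers KV is implemented.

```javascript
// Toy model of an edge location that caches values from a central store.
// This is an illustration only, not how Workers KV is implemented.
class EdgeCache {
  constructor(centralStore, ttlMs, clock = Date.now) {
    this.central = centralStore; // Map shared by all edges
    this.ttlMs = ttlMs;
    this.clock = clock;
    this.cache = new Map(); // key -> { value, fetchedAt }
  }

  get(key) {
    const entry = this.cache.get(key);
    if (entry && this.clock() - entry.fetchedAt < this.ttlMs) {
      return entry.value; // served from the edge, possibly stale
    }
    const value = this.central.get(key);
    this.cache.set(key, { value, fetchedAt: this.clock() });
    return value;
  }
}

let fakeNow = 0; // fake clock in milliseconds
const central = new Map([['first-key', 'v1']]);
const edge = new EdgeCache(central, 60000, () => fakeNow);

const reads = [];
reads.push(edge.get('first-key')); // fetched from the central store and cached
central.set('first-key', 'v2');    // an UPDATE happens centrally
fakeNow = 30000;
reads.push(edge.get('first-key')); // still inside the window: old value
fakeNow = 60000;
reads.push(edge.get('first-key')); // window elapsed: fresh value
console.log(reads);
```

The second read still sees the old value even though the update has already happened, which is the behaviour an application built on an eventually consistent store has to tolerate.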

Writing Data in Workers KV
```
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces" \
  -X POST \
  -H "X-Auth-Email: $CLOUDFLARE_EMAIL" \
  -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" \
  -H "Content-Type: application/json" \
  --data '{"title": "Requests"}'
```
The above HTTP request will create a namespace by the name Requests. The response should look something like this:
```
{
    "result": {
        "id": "30b52f55aafb41d88546d01d5f69440a",
        "title": "Requests",
        "supports_url_encoding": true
    },
    "success": true,
    "errors": [],
    "messages": []
}
```
Now we can write KV pairs in this namespace. The following HTTP request does that:
```
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/values/first-key" \
  -X PUT \
  -H "X-Auth-Email: $CLOUDFLARE_EMAIL" \
  -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" \
  --data 'My first value!'
```
Here, NAMESPACE_ID is the ID that we received in the response to the last request, first-key is the key name, and ‘My first value!’ is the value.
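
The same write can be expressed from JavaScript. Here is a minimal sketch that only builds the URL and the fetch-style options for the PUT request shown above, without sending anything. The helper name buildKvPutRequest and the placeholder credentials are assumptions made for this example.

```javascript
// Builds the URL and fetch-style options for writing one key.
// Mirrors the curl call above; nothing is sent over the network here.
function buildKvPutRequest({ accountId, namespaceId, email, authKey }, key, value) {
  const base = 'https://api.cloudflare.com/client/v4';
  const url = `${base}/accounts/${accountId}/storage/kv/namespaces/${namespaceId}`
    + `/values/${encodeURIComponent(key)}`;
  return {
    url,
    options: {
      method: 'PUT',
      headers: { 'X-Auth-Email': email, 'X-Auth-Key': authKey },
      body: String(value),
    },
  };
}

// Placeholder credentials; real values would come from your Cloudflare account.
const kvRequest = buildKvPutRequest(
  { accountId: 'ACCOUNT_ID', namespaceId: 'NAMESPACE_ID', email: 'me@example.com', authKey: 'AUTH_KEY' },
  'first-key',
  'My first value!'
);
console.log(kvRequest.url);
```

The returned url and options can be handed to any fetch implementation, for example fetch(kvRequest.url, kvRequest.options).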

Let’s complicate things a little

The above overview just introduces managed cloud workers with a ‘hello world’ app and the basics of Workers KV, so now let’s make something more complicated. We will make an app that tells how many requests have been made from your country so far. For example, if you ping the worker from the US, it will return the number of requests made so far from the US.

We will need:
- Some place to store the count of requests for each country.
- A way to find the country from which the Worker was invoked.
For the first part, we will use Workers KV to store the request count for each country.

Let’s start

First, we will create a new project using wrangler: wrangler generate request-count.

We will be making HTTP calls to write values in the Workers KV, so let’s add ‘node-fetch’ to the project:
```
npm install node-fetch
```
Now, how do we find which country each request is coming from? The answer is the cf object that is provided with each request to a worker.

The cf object is a special object that is passed with each request and can be accessed with request.cf. It mainly contains region-specific information along with TLS and auth information. The details of what cf provides can be found here.

As we can see from the documentation, we can get the country from
```
request.cf.country
```
The cf object is not correctly populated in the wrangler preview, so you will need to publish your worker in order to test the use of cf. An open issue mentioning the same can be found here.

Now, the logic is pretty straightforward here. When we get a request from a country for which we don’t have an entry in Workers KV, we make an entry with value 1; otherwise, we increment the value of the country key.
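
The counting rule above can be sketched as a small pure function. The name nextCount is an assumption made for this example, and the sketch assumes that a missing key reads as null and that stored values come back as strings.

```javascript
// Given the raw value read from the store (a string, or null when the key
// does not exist yet), return the count to write back.
function nextCount(stored) {
  const current = parseInt(stored, 10);
  return Number.isNaN(current) ? 1 : current + 1;
}

console.log(nextCount(null), nextCount('1'), nextCount('41'));
```

Since the read and the write are separate steps, two simultaneous requests could read the same value and one increment would be lost. The sketch ignores that, which seems acceptable for a demo counter.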

To use Workers KV, we need to create a namespace. A namespace is just a collection of key-value pairs where all the keys have to be unique.

A namespace can be created under the KV tab in the Cloudflare web UI by giving it a name, or by using the API call above. You can also view and browse all of your namespaces from the web UI. The following API call can be used to read the value of a key from a namespace:
```
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/values/first-key" \
  -H "X-Auth-Email: $CLOUDFLARE_EMAIL" \
  -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY"
```
But it is neither the fastest nor the easiest way. Cloudflare provides a better and faster way to read data from your namespaces, called binding. Each KV namespace can be bound to a worker script, which makes it available in the script under a variable name. Any namespace can be bound to any worker. A KV namespace can be bound to a worker from the editing menu of the worker in the Cloudflare UI.

The following steps show you how to bind a namespace to a worker:

Go to the edit page of the worker in Cloudflare web UI and click on the KV tab:

Then add a binding by clicking the ‘Add binding’ button.

You can select the namespace name and the variable name by which it will be bound. More details can be found here. A binding that I’ve made can be seen in the above image.

That’s all we need to get this to work. The following is the relevant part of the script:
```
const fetch = require('node-fetch')

addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request))
})

/**
 * Count a request against the country it came from
 * @param {Request} request
 */
async function handleRequest(request) {
    const country = request.cf.country

    // REST endpoint for the country's counter; replace account-id and namespace-id
    const url = `https://api.cloudflare.com/client/v4/accounts/account-id/storage/kv/namespaces/namespace-id/values/${country}`

    // `requests` is the KV namespace binding; get() resolves to null for a missing key
    let count = await requests.get(country)

    if (!count) {
        count = 1
    } else {
        count = parseInt(count, 10) + 1
    }

    try {
        // Write the new count back through the REST API
        const response = await fetch(url, {
            method: 'PUT',
            headers: {"X-Auth-Email": "email", "X-Auth-Key": "auth-key"},
            body: `${count}`
        })
        if (!response.ok) {
            return new Response(`KV write failed with status ${response.status}`, { status: 500 })
        }
    } catch (error) {
        return new Response(String(error), { status: 500 })
    }

    return new Response(`${country}: ${count}`, { status: 200 })
}
```
In the above code, I bound the Requests namespace that we created to the requests variable, which is dynamically resolved when we publish.

The full source of this can be found here.

This small application also demonstrates some of the practical aspects of workers. For example, you will notice that updates take some time to be reflected, and that the response time of the workers is quick, especially when they are deployed on a .workers.dev subdomain here.

Side note: You will have to recreate the namespace-worker binding every time you deploy the worker or run wrangler publish.

Workers vs. AWS Lambda

AWS Lambda has been a major player in the serverless market for a while now. So, how does Cloudflare Workers compare to it? Let’s see.

Architecture:

Cloudflare Workers uses `Isolates` instead of a container-based underlying architecture. `Isolates` is the technology that allows V8 (Google Chrome’s JavaScript engine) to run thousands of processes on a single server in an efficient and secure manner. This effectively translates into faster code execution and lower memory usage. More details can be found here.

Price:

The above mentioned architectural difference allows Workers to be significantly cheaper than Lambda. While a Worker offering 50 milliseconds of CPU costs $0.50 per million requests, the equivalent Lambda costs $1.84 per million. A more detailed price comparison can be found here.
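
To put the two quoted prices side by side, here is a small JavaScript sketch that computes the bill for a given number of requests. The monthly workload of 10 million requests is a hypothetical figure; the per-million prices are the ones quoted above.

```javascript
// Per-million-request prices quoted above (USD).
const PRICE_PER_MILLION = { workers: 0.50, lambda: 1.84 };

// Cost in USD for a given number of requests.
function requestCost(platform, requests) {
  return (requests / 1e6) * PRICE_PER_MILLION[platform];
}

const monthlyRequests = 10e6; // hypothetical workload
const workersCost = requestCost('workers', monthlyRequests);
const lambdaCost = requestCost('lambda', monthlyRequests);
console.log(workersCost.toFixed(2), lambdaCost.toFixed(2), (lambdaCost / workersCost).toFixed(2));
```

At these prices the ratio stays the same at any volume, since both costs scale linearly with the number of requests.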

Speed:

Workers also show significantly better performance numbers than Lambda and Lambda@Edge. Cloudflare claims, based on its own tests, that Workers are 441% faster than Lambda and 192% faster than Lambda@Edge. A detailed performance comparison can be found here.

This better performance is also confirmed by serverless-benchmark.

Wrapping Up:

As we have seen, Cloudflare Workers along with the KV store make it very easy to start with a serverless application. They provide fantastic performance at a lower cost, along with intuitive deployment. These properties make them ideal for building globally accessible serverless applications.
December 12, 2022
A Beginner’s Guide to Edge Computing
In the world of data centers with wings and wheels, there is an opportunity to offload some work from the centralized cloud by moving less compute-intensive tasks to other components of the architecture. In this blog, we will explore the upcoming frontier of the web: Edge Computing.

What is the “Edge”?

The ‘Edge’ refers to having computing infrastructure closer to the source of data. It is a distributed framework where data is processed as close to the originating data source as possible. This infrastructure requires effective use of resources that may not be continuously connected to a network, such as laptops, smartphones, tablets, and sensors. Edge Computing covers a wide range of technologies including wireless sensor networks, cooperative distributed peer-to-peer ad-hoc networking and processing (also classifiable as local cloud/fog computing), mobile edge computing, distributed data storage and retrieval, autonomic self-healing networks, remote cloud services, augmented reality, and more.

Cloud Computing is expected to go through a phase of decentralization. Edge Computing comes with the idea of bringing compute, storage and networking closer to the consumer.

But Why?

Legit question! Why do we even need Edge Computing? What are the advantages of having this new infrastructure?

Imagine a self-driving car that is continuously sending a live stream to the central servers. Now the car has to make a crucial decision. The consequences can be disastrous if the car waits for the central servers to process the data and respond. Although algorithms like YOLO_v2 have sped up object detection, the latency lies in the part of the system where the car has to send terabytes of data to the central server, receive the response, and only then act. Hence, basic processing, like deciding when to stop or decelerate, needs to be done in the car itself.

The goal of Edge Computing is to minimize the latency by bringing the public cloud capabilities to the edge. This can be achieved in two forms – custom software stack emulating the cloud services running on existing hardware, and the public cloud seamlessly extended to multiple point-of-presence (PoP) locations.

Following are some promising reasons to use Edge Computing:
1. Privacy: Avoid sending all raw data to be stored and processed on cloud servers.
2. Real-time responsiveness: Sometimes the reaction time can be a critical factor.
3. Reliability: The system is capable of working even when disconnected from cloud servers. This removes a single point of failure.
To understand the points mentioned above, let’s take the example of a device that responds to a hot keyword, like Jarvis from Iron Man. Imagine if your personal Jarvis sent all of your private conversations to a remote server for analysis. Instead, it is intelligent enough to detect locally when it is called and respond only then. At the same time, it is real-time and reliable.

Intel CEO Brian Krzanich said at an event that autonomous cars will generate 40 terabytes of data for every eight hours of driving. With that flood of data, transmission time will go up substantially. For self-driving cars, real-time decisions are essential, and this is where edge computing infrastructure comes to the rescue. These cars need to decide in a split second whether to stop or not; otherwise, the consequences can be disastrous.
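
A quick back-of-the-envelope calculation shows why shipping all of that data to the cloud is impractical. The JavaScript sketch below converts the quoted 40 terabytes per eight hours into an average data rate, assuming decimal terabytes and an even spread over the eight hours.

```javascript
const TERABYTE = 1e12; // decimal terabyte, in bytes

// Average data rate in bytes per second for a volume produced over some hours.
function averageBytesPerSecond(terabytes, hours) {
  return (terabytes * TERABYTE) / (hours * 3600);
}

const carRate = averageBytesPerSecond(40, 8);
const gigabytesPerSecond = carRate / 1e9;
const gigabitsPerSecond = (carRate * 8) / 1e9;
console.log(gigabytesPerSecond.toFixed(2), gigabitsPerSecond.toFixed(1));
```

That is roughly 1.4 gigabytes, or about 11 gigabits, every second, sustained for the whole drive, for a single car.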

Another example is drones or quadcopters. If we are using them to identify people or deliver relief packages, the machines should be intelligent enough to make basic decisions locally, like changing their path to avoid obstacles.

Forms of Edge Computing

Device Edge:

In this model, Edge Computing is taken to customers in their existing environments. Examples include AWS Greengrass and Microsoft Azure IoT Edge.

Cloud Edge:

This model of Edge Computing is basically an extension of the public cloud. Content Delivery Networks are classic examples of this topology, in which static content is cached and delivered through geographically spread edge locations.

Vapor IO is an emerging player in this category. They are attempting to build infrastructure for the cloud edge. Vapor IO has various products like Vapor Chamber. These are self-monitored: they have embedded sensors through which they are continuously monitored and evaluated by Vapor’s software, VEC (Vapor Edge Controller). They have also built OpenDCRE, which we will see later in this blog.

The fundamental difference between device edge and cloud edge lies in the deployment and pricing models. Each model suits different use cases, and sometimes it may be an advantage to deploy both.

Edges around you

Edge Computing examples can be increasingly found around us:
1. Smart street lights
2. Automated Industrial Machines
3. Mobile devices
4. Smart Homes
5. Automated Vehicles (cars, drones etc)
Data transmission is expensive. Bringing compute closer to the origin of data reduces latency and gives end users a better experience. Some of the evolving use cases of Edge Computing are Augmented Reality (AR), Virtual Reality (VR), and the Internet of Things. For example, the rush people got while playing an Augmented Reality based Pokémon game wouldn’t have been possible without the game’s “real-timeliness”. It was made possible because the smartphone itself was doing the AR, not the central servers. Even Machine Learning (ML) can benefit greatly from Edge Computing: all the heavy-duty training of ML algorithms can be done in the cloud, and the trained model can be deployed on the edge for near real-time or even real-time predictions. In today’s data-driven world, edge computing is becoming a necessary component.

Edge Computing and IoT are often confused. Stated simply, Edge Computing is, in a way, an intelligent Internet of Things (IoT), and it complements traditional IoT. In the traditional IoT model, all the devices, like sensors, mobiles, laptops, etc., are connected to a central server. Now imagine that you command your lamp to switch off: even for such a simple task, data needs to be transmitted to the cloud and analyzed there, and only then does the lamp receive the command to switch off. Edge Computing brings computing closer to your home: either the fog layer between the lamp and the cloud servers, or the lamp itself, is smart enough to process the data.
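
The lamp example can be sketched in a few lines of JavaScript. Everything here is hypothetical: the command names, the list of commands considered simple enough to be handled locally, and the shape of the lamp state are assumptions made only to illustrate the idea of deciding near the device.

```javascript
// Commands simple enough to be decided next to the device (hypothetical list).
const LOCAL_COMMANDS = new Set(['switch_on', 'switch_off']);

// Decide where a command gets processed: at the edge or in the cloud.
function routeCommand(command) {
  return LOCAL_COMMANDS.has(command.action) ? 'edge' : 'cloud';
}

// A fog node (or the lamp itself) handling an incoming command.
function handleCommand(command, lamp) {
  if (routeCommand(command) === 'cloud') {
    // Forwarded to the cloud; the local state is left unchanged.
    return { handledBy: 'cloud', lamp };
  }
  return { handledBy: 'edge', lamp: { ...lamp, on: command.action === 'switch_on' } };
}

const lampState = { id: 'lamp-1', on: true };
console.log(handleCommand({ action: 'switch_off' }, lampState));
console.log(handleCommand({ action: 'analyze_usage' }, lampState));
```

Switching the lamp off never leaves the house in this sketch, while a heavier request such as usage analysis is still passed on to the cloud.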

The image below shows a standard IoT implementation where everything is centralized, whereas the Edge Computing philosophy is about decentralizing the architecture.

The Fog

Sandwiched between the edge layer and the cloud layer is the Fog Layer. It bridges the connection between the other two layers.

The difference between fog and edge computing is described in this article:
- Fog computing pushes intelligence down to the local area network level of network architecture, processing data in a fog node or IoT gateway.
- Edge computing pushes the intelligence, processing power and communication capabilities of an edge gateway or appliance directly into devices like programmable automation controllers (PACs).
How do we manage Edge Computing?

Device Relationship Management (DRM) refers to managing and monitoring interconnected components over the internet. Several vendors offer tooling for this: AWS has IoT Core and Greengrass, Nebbiolo Technologies has developed Fog Node and Fog OS, and Vapor IO has OpenDCRE, which can be used to control and monitor data centers.

Following image (source – AWS) shows how to manage ML on Edge Computing using AWS infrastructure.

AWS Greengrass makes it possible for users to use Lambda functions to build IoT devices and application logic. Specifically, AWS Greengrass provides cloud-based management of applications that can be deployed for local execution. Locally deployed Lambda functions are triggered by local events, messages from the cloud, or other sources.

This GitHub repo demonstrates a traffic light example using two Greengrass devices, a light controller, and a traffic light.

Conclusion

We believe that next-gen computing will be influenced a lot by Edge Computing and will continue to explore new use-cases that will be made possible by the Edge.

December 12, 2022

Mesosphere DC/OS Masterclass : Tips and Tricks to Make Life Easier

DC/OS is an open-source distributed operating system for data centers, built on the Apache Mesos distributed systems kernel. As a distributed system, it is a cluster of master nodes and private/public nodes, where each node also has a host operating system that manages the underlying machine.

It enables the management of multiple machines as if they were a single computer. It automates resource management, schedules process placement, facilitates inter-process communication, and simplifies the installation and management of distributed services. Its included web interface and available command-line interface (CLI) facilitate remote management and monitoring of the cluster and its services.

Distributed System : DC/OS is a distributed system made up of a group of private and public nodes that are coordinated by master nodes.
Cluster Manager : DC/OS is responsible for running tasks on agent nodes and providing the required resources to them. DC/OS uses Apache Mesos to provide cluster management functionality.
Container Platform : All DC/OS tasks are containerized. DC/OS uses two different container runtimes, Docker and Mesos, so containers can be started from Docker images or they can be native executables (binaries or scripts) that are containerized at runtime by Mesos.
Operating System : As the name suggests, DC/OS is an operating system that abstracts cluster hardware and software resources and provides common services to applications.

Unlike Linux, DC/OS is not a host operating system. DC/OS spans multiple machines, but relies on each machine to have its own host operating system and host kernel.

The high level architecture of DC/OS can be seen below :

For the detailed architecture and components of DC/OS, please click here.

Adoption and usage of Mesosphere DC/OS:

Mesosphere customers include :

30% of the Fortune 50 U.S. Companies
5 of the top 10 North American Banks
7 of the top 12 Worldwide Telcos
5 of the top 10 Highest Valued Startups

Some companies using DC/OS are :

Cisco
Yelp
Tommy Hilfiger
Uber
Netflix
Verizon
Cerner
NIO

Installing and using DC/OS

A guide to installing DC/OS can be found here. After installing DC/OS on any platform, install dcos cli by following documentation found here.

Using the dcos cli, we can manage cluster nodes, manage Marathon tasks and services, and install or remove packages from the Universe. It also provides great support for automation, as the output of each cli command can be emitted as json.

NOTE: The tasks below were executed and tested with the following tools:

DC/OS 1.11 Open Source
DC/OS cli 0.6.0
jq:1.5-1-a5b5cbe

DC/OS commands and scripts

Setup DC/OS cli with DC/OS cluster

dcos cluster setup <CLUSTER URL>

Example :

dcos cluster setup http://dcos-cluster.com

The above command will give you a link for OAuth authentication and prompt for an auth token. You can authenticate with a Google, GitHub, or Microsoft account. Paste the token generated after authentication at the cli prompt (provided OAuth is enabled).

DC/OS authentication token

dcos config show core.dcos_acs_token

DC/OS cluster url

dcos config show core.dcos_url

DC/OS cluster name

dcos config show cluster.name

Access Mesos UI

<DC/OS_CLUSTER_URL>/mesos

Example:

http://dcos-cluster.com/mesos

Access Marathon UI

<DC/OS_CLUSTER_URL>/service/marathon

Example:

http://dcos-cluster.com/service/marathon

Access any DC/OS service, like Marathon, Kafka, Elastic, Spark etc.[DC/OS Services]

<DC/OS_CLUSTER_URL>/service/<SERVICE_NAME>

Example:

http://dcos-cluster.com/service/marathon
http://dcos-cluster.com/service/kafka

Access DC/OS slaves info in json using Mesos API [Mesos Endpoints]

curl -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/mesos/slaves" | jq .

Access DC/OS slaves info in json using DC/OS cli

dcos node --json

Note : The DC/OS cli command ‘dcos node --json’ is equivalent to querying the Mesos slaves endpoint (/mesos/slaves).

Access DC/OS private slaves info using DC/OS cli

dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip == null) | "Private Agent : " + .hostname ' -r

Access DC/OS public slaves info using DC/OS cli

dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip != null) | "Public Agent : " + .hostname ' -r

Access DC/OS private and public slaves info using DC/OS cli

dcos node --json | jq '.[] | select(.type | contains("agent")) | if (.attributes.public_ip != null) then "Public Agent : " else "Private Agent : " end + " - " + .hostname ' -r | sort
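
For readers less familiar with jq, the filtering done by the commands above can be mirrored in JavaScript. This is only a sketch: the sample node list is hypothetical and contains just the fields the jq filters rely on (type, hostname, and attributes.public_ip), and the output format is simplified.

```javascript
// Keep only agents and label each as public or private, like the jq filter.
function describeAgents(nodes) {
  return nodes
    .filter(node => node.type.includes('agent'))
    .map(node => {
      const isPublic = (node.attributes || {}).public_ip != null;
      return `${isPublic ? 'Public Agent' : 'Private Agent'} : ${node.hostname}`;
    })
    .sort();
}

// Hypothetical sample shaped like the fields used by the jq filters.
const sampleNodes = [
  { type: 'master', hostname: '10.0.4.10' },
  { type: 'agent', hostname: '10.0.2.21', attributes: {} },
  { type: 'agent', hostname: '10.0.6.30', attributes: { public_ip: 'true' } },
];
console.log(describeAgents(sampleNodes).join('\n'));
```

As in the jq version, an agent counts as public when its attributes carry a public_ip entry, and as private otherwise.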

Get public IP of all public agents

#!/bin/bash
for id in $(dcos node --json | jq --raw-output '.[] | select(.attributes.public_ip == "true") | .id'); 
do 
      dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --mesos-id=$id "curl -s ifconfig.co"
done 2>/dev/null

Note: ‘dcos node ssh’ requires your private key to be added to ssh. Make sure you add your private key as an ssh identity using :

ssh-add </path/to/private/key/file/.pem>

Get public IP of master leader

dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "curl -s ifconfig.co" 2>/dev/null

Get all master nodes and their private ip

dcos node --json | jq '.[] | select(.type | contains("master"))
| .ip + " = " + .type' -r

Get list of all users who have access to DC/OS cluster

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users" | jq '.array[].uid' -r

Add users to cluster using Mesosphere script (Run this on master)

Users to add are given in list.txt, each user on new line

for i in `cat list.txt`; do echo $i;
sudo -i dcos-shell /opt/mesosphere/bin/dcos_add_user.py $i; done

Add users to cluster using DC/OS API

#!/bin/bash
# Usage: dcosAddUsers.sh (users to add are listed in users.list, one user per line)
for i in `cat users.list`; 
do 
  echo $i
  curl -X PUT -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
done

Delete users from DC/OS cluster organization

#!/bin/bash
# Usage: dcosDeleteUsers.sh (users to delete are listed in users.list, one user per line)
for i in `cat users.list`; 
do 
  echo $i
  curl -X DELETE -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
done

Offers/resources from individual DC/OS agent

In recent versions of many DC/OS services, a scheduler endpoint at

http://yourcluster.com/service/<service-name>/v1/debug/offers

will display an HTML table containing a summary of recently-evaluated offers. This table’s contents are currently very similar to what can be found in logs, but in a slightly more accessible format. Alternately, we can look at the scheduler’s logs in stdout. An offer is a set of resources all from one individual DC/OS agent.

<DC/OS_CLUSTER_URL>/service/<service_name>/v1/debug/offers

Example:

http://dcos-cluster.com/service/kafka/v1/debug/offers
http://dcos-cluster.com/service/elastic/v1/debug/offers
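
The same endpoint can also be fetched from a terminal. The following is a minimal sketch, assuming the endpoint sits behind the same token authentication as the other API calls in this post; the helper names offers_url and fetch_offers are our own and not part of the DC/OS CLI.

```shell
#!/bin/bash
# Build the debug/offers URL for a service (pure string helper, no network).
# $1 = cluster base URL, $2 = service name
offers_url() {
  echo "${1%/}/service/$2/v1/debug/offers"
}

# Fetch the offers summary for a service from the cluster configured in the
# DC/OS CLI. Hypothetical helper; call it as: fetch_offers kafka
fetch_offers() {
  curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
    "$(offers_url "$(dcos config show core.dcos_url)" "$1")"
}
```

The response is the HTML table described above, so it is easiest to redirect it to a file and open that file in a browser.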

Save JSON configs of all running Marathon apps

#!/bin/bash
# Save marathon configs in json format for all marathon apps
# Usage : saveMarathonConfig.sh
for service in `dcos marathon app list --quiet | tr -d "/" | sort`; do
  dcos marathon app show $service | jq '. | del(.tasks, .version, .versionInfo, .tasksHealthy, .tasksRunning, .tasksStaged, .tasksUnhealthy, .deployments, .executor, .lastTaskFailure, .args, .ports, .residency, .secrets, .storeUrls, .uris, .user)' > "$service.json"
done

Get report of Marathon apps with details like container type, Docker image, tag or service version used by Marathon app.

#!/bin/bash
TMP_CSV_FILE=$(mktemp /tmp/dcos-config.XXXXXX.csv)
TMP_CSV_FILE_SORT="${TMP_CSV_FILE}_sort"
#dcos marathon app list --json | jq '.[] | if (.container.docker.image != null ) then .id + ",Docker Application," + .container.docker.image else .id + ",DCOS Service," + .labels.DCOS_PACKAGE_VERSION end' -r > $TMP_CSV_FILE
dcos marathon app list --json | jq '.[] | .id + if (.container.type == "DOCKER") then ",Docker Container," + .container.docker.image else ",Mesos Container," + if(.labels.DCOS_PACKAGE_VERSION !=null) then .labels.DCOS_PACKAGE_NAME+":"+.labels.DCOS_PACKAGE_VERSION  else "[ CMD ]" end end' -r > $TMP_CSV_FILE
sed -i "s|^/||g" $TMP_CSV_FILE
sort -t "," -k2,2 -k3,3 -k1,1 $TMP_CSV_FILE > ${TMP_CSV_FILE_SORT}
cnt=1
printf '%.0s=' {1..150}
printf "\n  %-5s%-35s%-23s%-40s%-20s\n" "No" "Application Name" "Container Type" "Docker Image" "Tag / Version"
printf '%.0s=' {1..150}
while IFS=, read -r app typ image; 
do
        tag=`echo $image | awk -F':' -v im="$image" '{tag=(im=="[ CMD ]")?"NA":($2=="")?"latest":$2; print tag}'`
        image=`echo $image | awk -F':' '{print $1}'`
        printf "\n  %-5s%-35s%-23s%-40s%-20s" "$cnt" "$app" "$typ" "$image" "$tag"
        cnt=$((cnt + 1))
        sleep 0.3
done < $TMP_CSV_FILE_SORT
printf "\n"
printf '%.0s=' {1..150}
printf "\n"
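
The report above splits the image on the first colon, which works for names like nginx:1.13 but would misread an image pulled from a registry with a port (for example registry.local:5000/app). A possible alternative, sketched here with a helper name of our own (split_image), only treats a colon as a tag separator when it appears after the last slash. Digest references (image@sha256:...) are not handled.

```shell
#!/bin/bash
# Split a Docker image reference into "name tag".
# A colon counts as a tag separator only if it comes after the last slash,
# so a registry port such as registry.local:5000 stays part of the name.
split_image() {
  local image=$1 name tag
  local last=${image##*/}        # part after the final slash
  if [[ $last == *:* ]]; then
    tag=${image##*:}             # text after the last colon
    name=${image%:*}             # everything before the last colon
  else
    tag=latest                   # no explicit tag given
    name=$image
  fi
  printf '%s %s\n' "$name" "$tag"
}
```

Inside the loop it could be used as: read -r image tag <<< "$(split_image "$image")".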

Get DC/OS nodes with more information like node type, node ip, attributes, number of running tasks, free memory, free cpu etc.

#!/bin/bash
printf "\n  %-15s %-18s%-18s%-10s%-15s%-10s\n" "Node Type" "Node IP" "Attribute" "Tasks" "Mem Free (MB)" "CPU Free"
printf '%.0s=' {1..90}
printf "\n"
TAB=`echo -e "\t"`
dcos node --json | jq '.[] | if (.type | contains("leader")) then "Master (leader)" elif ((.type | contains("agent")) and .attributes.public_ip != null) then "Public Agent" elif ((.type | contains("agent")) and .attributes.public_ip == null) then "Private Agent" else empty end + "\t"+ if(.type |contains("master")) then .ip else .hostname end + "\t" +  (if (.attributes | length !=0) then (.attributes | to_entries[] | join(" = ")) else "NA" end) + "\t" + if(.type |contains("agent")) then (.TASK_RUNNING|tostring) + "\t" + ((.resources.mem - .used_resources.mem)| tostring) + "\t\t" +  ((.resources.cpus - .used_resources.cpus)| tostring)  else "\t\tNA\tNA\t\tNA"  end' -r | sort -t"$TAB" -k1,1d -k3,3d -k2,2d
printf '%.0s=' {1..90}
printf "\n"

Framework Cleaner

Uninstall a framework and clean up any resources that remain reserved after the framework is deleted/uninstalled. (This applies to DC/OS 1.9 or older; on DC/OS 1.10 or newer, the CLI uninstall command alone is sufficient.)

SERVICE_NAME=
dcos package uninstall $SERVICE_NAME
dcos node ssh --option StrictHostKeyChecking=no --master-proxy \
--leader "docker run mesosphere/janitor /janitor.py -r \
${SERVICE_NAME}-role -p ${SERVICE_NAME}-principal -z dcos-service-${SERVICE_NAME}"

Get DC/OS apps and their placement constraints

dcos marathon app list --json | jq '.[] |
if (.constraints != null) then .id, .constraints else empty end'

Run shell command on all slaves

#!/bin/bash
# Run any shell command on all slave nodes (private and public)
# Usage : dcosRunOnAllSlaves.sh <CMD= any shell command to run, Ex: ulimit -a >
CMD=$1
for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'`; do 
   echo -e "\n###> Running command [ $CMD ] on $i"
   dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
   echo -e "======================================\n"
done

Run shell command on master leader

CMD=<shell command, Ex: ulimit -a >
dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "$CMD"

Run shell command on all master nodes

#!/bin/bash
# Run any shell command on all master nodes
# Usage : dcosRunOnAllMasters.sh <CMD= any shell command to run, Ex: ulimit -a >
CMD=$1
for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'` 
do 
  echo -e "\n###> Running command [ $CMD ] on $i"
  dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
  echo -e "======================================\n"
done

Add node attributes to dcos nodes and run apps on nodes with required attributes using placement constraints

#!/bin/bash
#1. SSH on node 
#2. Create or edit file /var/lib/dcos/mesos-slave-common
#3. Add contents as :
#    MESOS_ATTRIBUTES=<key>:<value>
#    Example:
#    MESOS_ATTRIBUTES=TYPE:DB;DB_TYPE:MONGO;
#4. Stop dcos-mesos-slave service
#    systemctl stop dcos-mesos-slave
#5. Remove link for latest slave metadata
#    rm -f /var/lib/mesos/slave/meta/slaves/latest
#6. Start dcos-mesos-slave service
#    systemctl start dcos-mesos-slave
#7. Wait for some time, node will be in HEALTHY state again.
#8. Add app placement constraint with field = key and value = value
#9. Verify attributes, run on any node
#    curl -s http://leader.mesos:5050/state | jq '.slaves[]| .hostname ,.attributes'
#    OR Check DCOS cluster UI
#    Nodes => Select any Node => Details Tab
tmpScript=$(mktemp "/tmp/addDcosNodeAttributes-XXXXXXXX")
# key:value paired attributes, separated by ;
ATTRIBUTES=NODE_TYPE:GPU_NODE
cat <<EOF > ${tmpScript}
echo "MESOS_ATTRIBUTES=${ATTRIBUTES}" | sudo tee /var/lib/dcos/mesos-slave-common
sudo systemctl stop dcos-mesos-slave
sudo rm -f /var/lib/mesos/slave/meta/slaves/latest
sudo systemctl start dcos-mesos-slave
EOF
# Add the private IPs of the nodes on which you want to add attributes to nodes.txt, one IP per line.
for i in `cat nodes.txt`; do 
    echo $i
    dcos node ssh --master-proxy --option StrictHostKeyChecking=no --private-ip $i <$tmpScript
    sleep 10
done
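
Step 8 above (adding a placement constraint) can be sketched as follows. This is a minimal example, assuming Marathon's CLUSTER constraint operator; the app id, command and resource sizes are placeholders of our own, and constrained_app_json is a hypothetical helper name.

```shell
#!/bin/bash
# Print a minimal Marathon app definition pinned to nodes whose attribute
# <key> equals <value>. App id, cmd and sizes are placeholder assumptions.
constrained_app_json() {
  local key=$1 value=$2
  cat <<EOF
{
  "id": "/constraint-demo",
  "cmd": "sleep 3600",
  "cpus": 0.1,
  "mem": 32,
  "instances": 1,
  "constraints": [["${key}", "CLUSTER", "${value}"]]
}
EOF
}

# Usage (needs a configured DC/OS CLI):
#   constrained_app_json NODE_TYPE GPU_NODE > app.json
#   dcos marathon app add app.json
```

With the attribute set in the script above (NODE_TYPE:GPU_NODE), Marathon should only place this app on the nodes listed in nodes.txt.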

Install DC/OS Datadog metrics plugin on all DC/OS nodes

#!/bin/bash

# Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>

DDAPI=$1

if [[ -z $DDAPI ]]; then
    echo "[Datadog Plugin] Need datadog API key as parameter."
    echo "[Datadog Plugin] Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>."
    exit 1
fi
tmpScriptMaster=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")
tmpScriptAgent=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")

declare agent=$tmpScriptAgent
declare master=$tmpScriptMaster

for role in "agent" "master"
do
cat <<EOF > ${!role}
curl -s -o /opt/mesosphere/bin/dcos-metrics-datadog -L https://downloads.mesosphere.io/dcos-metrics/plugins/datadog
chmod +x /opt/mesosphere/bin/dcos-metrics-datadog
echo "[Datadog Plugin] Downloaded dcos datadog metrics plugin."
export DD_API_KEY=$DDAPI
export AGENT_ROLE=$role
sudo curl -s -o /etc/systemd/system/dcos-metrics-datadog.service https://downloads.mesosphere.io/dcos-metrics/plugins/datadog.service
echo "[Datadog Plugin] Downloaded dcos-metrics-datadog.service."
sudo sed -i "s/--dcos-role master/--dcos-role \$AGENT_ROLE/g;s/--datadog-key .*/--datadog-key \$DD_API_KEY/g" /etc/systemd/system/dcos-metrics-datadog.service
echo "[Datadog Plugin] Updated dcos-metrics-datadog.service with DD API Key and agent role."
sudo systemctl daemon-reload
sudo systemctl start dcos-metrics-datadog.service
echo "[Datadog Plugin] dcos-metrics-datadog.service is started !"
servStatus=\$(sudo systemctl is-failed dcos-metrics-datadog.service)
echo "[Datadog Plugin] dcos-metrics-datadog.service status : \${servStatus}"
#sudo systemctl status dcos-metrics-datadog.service | head -3
#sudo journalctl -u dcos-metrics-datadog
EOF
done

echo "[Datadog Plugin] Temp script for master saved at : $tmpScriptMaster"
echo "[Datadog Plugin] Temp script for agent saved at : $tmpScriptAgent"

for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'` 
do 
    echo -e "\n###> Node - $i"
    dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptAgent
    echo -e "======================================================="
done

for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'` 
do 
    echo -e "\n###> Master Node - $i"
    dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptMaster
    echo -e "======================================================="
done

# Check status of dcos-metrics-datadog.service on all nodes.
#for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'` ; do  echo -e "\n###> $i"; dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "sudo systemctl is-failed dcos-metrics-datadog.service"; echo -e "======================================\n"; done

Get app / node metrics fetched by dcos-metrics component using metrics API

Get DC/OS node id [dcos node]
Get Node metrics (CPU, memory, local filesystems, networks, etc) : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/node
Get id of all containers running on that agent : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers
Get Resource allocation and usage for the given container ID : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>
Get Application-level metrics from the container (shipped in StatsD format using the listener available at STATSD_UDP_HOST and STATSD_UDP_PORT) : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>/app
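
These endpoints can be called with the same token authentication used elsewhere in this post. The sketch below is one possible wrapper; metrics_path and metrics_get are helper names of our own, not DC/OS commands.

```shell
#!/bin/bash
# Build a dcos-metrics API path for an agent (pure string helper, no network).
# $1 = kind: node | containers | container | app
# $2 = agent id, $3 = container id (only for container and app)
metrics_path() {
  local kind=$1 agent_id=$2 container_id=$3
  local base="/system/v1/agent/${agent_id}/metrics/v0"
  case "$kind" in
    node)       echo "${base}/node" ;;
    containers) echo "${base}/containers" ;;
    container)  echo "${base}/containers/${container_id}" ;;
    app)        echo "${base}/containers/${container_id}/app" ;;
    *)          echo "unknown kind: $kind" >&2; return 1 ;;
  esac
}

# Fetch a metrics path from the cluster configured in the DC/OS CLI.
# Example: metrics_get node <agent_id>
metrics_get() {
  local path
  path=$(metrics_path "$@") || return 1
  curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
    "$(dcos config show core.dcos_url)${path}"
}
```

The agent id is the Mesos agent id shown by dcos node, and the container ids come from the containers call.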

Get app / node metrics fetched by dcos-metrics component using dcos cli

Summary of container metrics for a specific task

dcos task metrics summary <task-id>

All metrics in details for a specific task

dcos task metrics details <task-id>

Summary of Node metrics for a specific node

dcos node metrics summary <mesos-node-id>

All Node metrics in details for a specific node

dcos node metrics details <mesos-node-id>

NOTE: All of the above commands accept the '--json' flag, so they can be used programmatically.

Launch / run command inside container for a task

The DC/OS task exec CLI only supports Mesos containers. This script supports both Mesos and Docker containers.

#!/bin/bash
echo "DCOS Task Exec 2.0"
if [ "$#" -eq 0 ]; then
        echo "Need task name or id as input. Exiting."
        exit 1
fi
taskName=$1
taskCmd=${2:-bash}
TMP_TASKLIST_JSON=/tmp/dcostasklist.json
dcos task --json > $TMP_TASKLIST_JSON
taskExist=`cat /tmp/dcostasklist.json | jq --arg tname $taskName '.[] | if(.name == $tname ) then .name else empty end' -r | wc -l`
if [[ $taskExist -eq 0 ]]; then 
        echo "No task with name $taskName exists."
        echo "Do you mean ?"
        dcos task | grep $taskName | awk '{print $1}'
        exit 1
fi
taskType=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '[.[] | select(.name == $tname)][0] | .container.type' -r`
TaskId=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '[.[] | select(.name == $tname)][0] | .id' -r`
if [[ $taskExist -ne 1 ]]; then
        echo -e "More than one instance found. Please enter the task ID for executing the command.\n"
        #allTaskIds=$(dcos task $taskName | tee /dev/tty | grep -v "NAME" | awk '{print $5}' | paste -s -d",")
        echo ""
        read TaskId
fi
if [[ $taskType !=  "DOCKER" ]]; then
        echo "Task [ $taskName ] is of type MESOS Container."
        execCmd="dcos task exec --interactive --tty $TaskId $taskCmd"
        echo "Running [$execCmd]"
        $execCmd
else
        echo "Task [ $taskName ] is of type DOCKER Container."
        taskNodeIP=`dcos task $TaskId | awk 'FNR == 2 {print $2}'`
        echo "Task [ $taskName ] with task Id [ $TaskId ] is running on node [ $taskNodeIP ]."
        taskContID=`dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --private-ip=$taskNodeIP --master-proxy "docker ps -q --filter \"label=MESOS_TASK_ID=$TaskId\"" 2> /dev/null`
        taskContID=`echo $taskContID | tr -d '\r'`
        echo "Task Docker Container ID : [ $taskContID ]"
        echo "Running [ docker exec -it $taskContID $taskCmd ]"
        dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --private-ip=$taskNodeIP --master-proxy "docker exec -it $taskContID $taskCmd" 2>/dev/null
fi

Get DC/OS tasks by node

#!/bin/bash 
function tasksByNodeAPI
{
    echo "DC/OS Tasks By Node"
    if [ "$#" -eq 0 ]; then
        echo "Need node ip as input. Exiting."
        exit 1
    fi
    nodeIp=$1
    mesosId=`dcos node | grep $nodeIp | awk '{print $3}'`
    if [ -z "$mesosId" ]; then
        echo "No node found with ip $nodeIp. Exiting."
        exit 1
    fi
    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/mesos/tasks?limit=10000" | jq --arg mesosId $mesosId '.tasks[] | select (.slave_id == $mesosId and .state == "TASK_RUNNING") | .name + "\t\t\t" + .id'  -r
}
function tasksByNodeCLI
{
        echo "DC/OS Tasks By Node"
        if [ "$#" -eq 0 ]; then
                echo "Need node ip as input. Exiting."
                exit 1
        fi
        nodeIp=$1
        dcos task | egrep "HOST|$nodeIp"
}

Get cluster metadata – cluster Public IP and cluster ID

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/metadata"

Sample Output:

{
"PUBLIC_IPV4": "123.456.789.012",
"CLUSTER_ID": "abcde-abcde-abcde-abcde-abcde-abcde"
}

Get DC/OS metadata – DC/OS version

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/dcos-metadata/dcos-version.json"

Sample Output:

{
"version": "1.11.0",
"dcos-image-commit": "b6d6ad4722600877fde2860122f870031d109da3",
"bootstrap-id": "a0654657903fb68dff60f6e522a7f241c1bfbf0f"
}

Get Mesos version

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/mesos/version"

Sample Output:

{
"build_date": "2018-02-27 21:31:27",
"build_time": 1519767087.0,
"build_user": "",
"git_sha": "0ba40f86759307cefab1c8702724debe87007bb0",
"version": "1.5.0"
}

Access DC/OS cluster exhibitor UI (Exhibitor supervises ZooKeeper and provides a management web interface)

<CLUSTER_URL>/exhibitor

Access DC/OS cluster data from cluster zookeeper using Zookeeper Python client – Run inside any node / container

from kazoo.client import KazooClient
zk = KazooClient(hosts='leader.mesos:2181', read_only=True)
zk.start()
clusterId = ""
# Here we can give znode path to retrieve its decoded data,
# for ex to get cluster-id, use
# data, stat = zk.get("/cluster-id")
# clusterId = data.decode("utf-8")
# Get cluster Id
if zk.exists("/cluster-id"):
    data, stat = zk.get("/cluster-id")
    clusterId = data.decode("utf-8")
zk.stop()
print(clusterId)

Access dcos cluster data from cluster zookeeper using exhibitor rest API

# Get znode data using endpoint :
# /exhibitor/exhibitor/v1/explorer/node-data?key=/path/to/node
# Example : Get znode data for path = /cluster-id
curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/exhibitor/exhibitor/v1/explorer/node-data?key=/cluster-id"

Sample Output:

{
"bytes": "3333-XXXXXX",
"str": "abcde-abcde-abcde-abcde-abcde-",
"stat": "XXXXXX"
}

Get cluster name using Mesos API

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/mesos/state-summary" | jq .cluster -r

Mark Mesos node as decommissioned

Sometimes an instance running as a DC/OS node gets terminated and cannot come back online. An AWS EC2 instance, for example, cannot be started again once it has been terminated. When Mesos detects that a node has stopped, it puts the node in the UNREACHABLE state, because Mesos does not know whether the node is temporarily stopped and will come back online or is permanently gone. If we know a node will not come back, we can explicitly tell Mesos to put it in the GONE state.

dcos node decommission <mesos-agent-id>

Conclusion

We learned about Mesosphere DC/OS, its functionality and roles. We also learned how to set up the DC/OS CLI, how to use HTTP authentication to access the DC/OS APIs, and how to use the DC/OS CLI for automating tasks.

We went through different API endpoints such as Mesos, Marathon, DC/OS metrics, Exhibitor, and DC/OS cluster organization. Finally, we looked at different tricks and scripts to automate DC/OS, such as DC/OS node details, task exec, the Docker report, and DC/OS API HTTP authentication.

December 12, 2022

To Go Serverless Or Not Is The Question
AWS Lambda was launched in 2014, and serverless computing (Function as a Service) has been around ever since. We have been using Lambda in our projects for the last couple of years to build complete end-to-end web applications, combining it with API Gateway (REST APIs), CloudWatch (logs), S3 (website hosting and data storage), and so on.

Google and Azure also provide serverless technologies like AWS. There is one more popular open-source solution, i.e. OpenWhisk. While implementing serverless applications on AWS, we have learned a lot about running a website on Lambda. Like every other technology, serverless also has its own set of benefits and drawbacks that we will discuss here.

What Does Serverless Mean?

Serverless is a dynamic cloud computing execution model where the servers are run by the cloud provider, i.e. AWS, Google, or Azure. The code still runs on servers, but "serverless" means that the servers are abstracted away from the users and provided to them as a service.

The Serverless World

There’s so much excitement about serverless in the industry. But there are issues that sometimes outweigh the pros of serverless architecture and need complex workarounds. When thousands of requests hit EC2 servers, we need to scale the servers up to handle them, but Lambda does this on its own: we don’t need to create auto-scaling groups or load balancers. AWS charges for each Lambda invocation in increments of 100 ms. But how much does it cost to use Lambda? Let’s compare that below.

Let’s say we have a serverless application with only 1 Lambda & 1 API Gateway:
- API Gateway
  $3.50 per million API calls * 200 million API requests/month = 700 USD
- Lambda
  $0.00001667 per GB-second * (200 million requests * 0.3 seconds per execution * 1 GB memory – 400k free tier GB-seconds in case of a new account) ≈ 994 USD
- Total ≈ 1694 USD (This is a lot)
Now let’s see the example of EC2 servers:
- 3 highly available EC2 servers = 416 USD
  m5.xlarge: 16 GB RAM, 4 vCPUs
- Application Load Balancer = 39 USD
- Total = 455 USD/month
So, if you compare the above pricing, classic servers are cheaper than serverless at this request volume.
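
To try the comparison with your own traffic numbers, the formulas above can be put into a small Node.js calculator. This is only a sketch: the rates are the ones quoted in the text, per-request Lambda charges are left out just as they are in the formula above, and all function names are made up for illustration.

```javascript
// Rates quoted in the text above; adjust them for your region and date.
const API_GATEWAY_USD_PER_MILLION = 3.5;
const LAMBDA_USD_PER_GB_SECOND = 0.00001667;
const FREE_TIER_GB_SECONDS = 400000;

// Monthly API Gateway cost for a given number of requests.
function apiGatewayCost(requests) {
  return (requests / 1e6) * API_GATEWAY_USD_PER_MILLION;
}

// Monthly Lambda compute cost; per-request charges are ignored, as in the text.
function lambdaComputeCost(requests, secondsPerRun, memoryGb, freeGbSeconds = FREE_TIER_GB_SECONDS) {
  const gbSeconds = requests * secondsPerRun * memoryGb;
  const billable = Math.max(0, gbSeconds - freeGbSeconds);
  return billable * LAMBDA_USD_PER_GB_SECOND;
}

// Fixed monthly cost of the server-based setup (figures taken from the text).
function serverCost(ec2Usd = 416, loadBalancerUsd = 39) {
  return ec2Usd + loadBalancerUsd;
}

const monthlyRequests = 200e6;
const serverlessUsd =
  apiGatewayCost(monthlyRequests) + lambdaComputeCost(monthlyRequests, 0.3, 1);

console.log("Serverless USD/month:", serverlessUsd.toFixed(2));
console.log("Servers USD/month:", serverCost().toFixed(2));
```

Lowering `monthlyRequests` shows where the two options cross over: the serverless bill shrinks with traffic, while the server bill stays fixed.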

That said, serverless can be really useful for startups in their early stages, where every rupee counts and traffic is still low. Lambda charges only for the hits your application actually receives, and it means less server management and more development time for the team. For example, for your developers' development environment, you can go with serverless instead of setting up new servers.

Loss Of Control

One of the biggest disadvantages of serverless is that you don’t have control over your services. We use a lot of services that are managed by third-party cloud providers, like CloudWatch for logs and DynamoDB for databases. As your project grows, more and more functions need to be managed, and everything underneath them is handled by the cloud provider. You lose portability as soon as you integrate Lambda with other services like SNS, DynamoDB, or Kinesis, which results in vendor lock-in. It becomes difficult to change the vendor later.

On the other hand, in the non-serverless world, we can manage our own language versions, queues, and DB queries. Basically, we have all the code in one place and don’t need to manage multiple functions. But every technology has its pros and cons, as we said earlier. In serverless, the loss of control is traded for focusing less on managing infrastructure and more on adding business value to the product.

Choosing serverless or non-serverless depends entirely on the product type. If you have a simple application, like selling cakes online, and you need a simple implementation or authentication, you can go with serverless. But if your application is really complex, needs complex algorithms, and you want control over your code, security, and authentication, you should go with non-serverless.

Security Issues

The biggest risk in serverless, or in using cloud services in general, is poorly configured functions, services, or applications. Bad configuration can lead to multiple issues in your application, either security-related or infrastructure-related. Whichever cloud provider you use (AWS, GCP, or Azure), it’s important to configure each function or service with exactly the permissions it needs to access other services and manage controls. Otherwise, it can lead to permission issues or a security breach. Also, if you are connecting any third-party APIs with your provider, make sure the connections are safe and the data is properly encrypted.

Correct configuration is the most important thing in both serverless and non-serverless applications. If you are strict about it when using cloud services, you will run into fewer security breaches and permission issues later.

Testing & Debugging

Serverless applications are hard to test. Normally, developers test the code locally and then deploy it. But in the serverless world, testing locally is complicated, because it is hard to mock all the cloud services in a local environment. So, we need to perform a decent amount of integration testing before moving forward. Currently, you can test and debug the code using console or print statements, whose output will be visible in your CloudWatch logs. Below is one such code snippet in Node.js.
```
const https = require('https')
const url = "https://google.com"

exports.handler = async function(event) {
  // Wrap the callback-style https.get in a promise so Lambda waits for the result.
  const promise = new Promise(function(resolve, reject) {
    console.log("Processing URL: " + url)
    https.get(url, (res) => {
        // console for debugging / testing purposes; the output lands in CloudWatch logs.
        console.info("Request was successful, status: " + res.statusCode)
        resolve(res.statusCode)
      }).on('error', (e) => {
        // console for errors
        console.error("Error while processing: " + e)
        reject(e)
      })
    })
  return promise
}
```
For serverless applications, it is important to give some time & effort upfront to architect your application correctly and create good integration tests over cloud infrastructure.

It is difficult to test or debug serverless applications. In non-serverless applications, we debug the code, but in serverless, we need to debug the end-to-end integration with the multiple services that we use. Lambda instances are so short-lived that by the time you go looking for their logs, the instance is gone. For this, we can use AWS CloudWatch or Google Stackdriver, which are meant to collect those logs.
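
One way to get some local coverage before the integration tests is to pass the external call into the handler, so that a fake can replace it on a developer machine. The sketch below assumes that approach; `makeHandler` and `fakeFetchStatus` are hypothetical names, and the real function would receive an HTTPS client instead of the fake.

```javascript
// Hypothetical factory: the function that performs the HTTP call is passed in,
// so a local test can replace it with a fake and never touch the network.
function makeHandler(fetchStatus) {
  return async function handler(event) {
    console.log("Processing URL: " + event.url);
    try {
      const statusCode = await fetchStatus(event.url);
      console.info("Request was successful, status: " + statusCode);
      return { ok: true, statusCode };
    } catch (err) {
      console.error("Error while processing: " + err.message);
      return { ok: false, error: err.message };
    }
  };
}

// Fake dependency for local runs: fails for URLs containing "bad".
async function fakeFetchStatus(url) {
  if (url.includes("bad")) {
    throw new Error("unreachable");
  }
  return 200;
}

const localHandler = makeHandler(fakeFetchStatus);

// Invoke the handler with hand-written events, as a test runner would.
localHandler({ url: "https://example.com" }).then((result) => console.log(result));
localHandler({ url: "https://bad.example.com" }).then((result) => console.log(result));
```

This only exercises the handler's own logic. The wiring to API Gateway, S3, or DynamoDB still needs the integration tests described above.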

Cold Start

Figure: Regular cold start (Source: AWS re:Invent)

An issue remains an issue until you trace it, and some technical issues are hard to find until you know about them or face them. Lambda has one such drawback, known as cold start. Lambda code runs on servers managed by Amazon, and to keep this feasible, Amazon doesn’t keep everyone’s code warm, i.e. ready to serve requests at all times. So, if your function hasn’t run in a while, a request has to wait for Lambda to spin up an execution environment and then invoke the code, which delays the response to that request.

But, wait for how long? I was using Node.js and it took around 4 seconds to respond. This is not good for the end-user experience and it can impact your business. This kind of issue is not tolerable in today’s world where we need requests to respond faster to provide a better user experience.

The problem is manageable for a handful of Lambdas, but what if the number of Lambdas increases? Let’s say there are 50 to 100 Lambdas; warming up every one of them can be annoying. You have to call each Lambda before the user calls it, which feels like it shouldn’t be necessary. But there isn’t any solution other than warming them. I used the Serverless Framework for my serverless implementation. It helped me solve most of the problems with Lambdas and the other resources that we used to build serverless applications.
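
A common shape for warming is a scheduled event that calls the function with a marker payload, which the handler recognizes and answers immediately. The sketch below assumes the schedule sends `{ "warmer": true }`; that field name is our own convention for this example, not something defined by AWS.

```javascript
// Module scope survives between invocations of a warm container,
// so this flag is true only for the first call after a cold start.
let isColdStart = true;

// Assumption: the scheduled "keep warm" rule sends { "warmer": true }.
function isWarmupEvent(event) {
  return Boolean(event && event.warmer === true);
}

async function handler(event) {
  const wasCold = isColdStart;
  isColdStart = false;

  if (isWarmupEvent(event)) {
    // Return immediately: the goal is only to keep the container alive.
    return { warmed: true, coldStart: wasCold };
  }

  // Real request handling would go here.
  return { warmed: false, coldStart: wasCold, message: "handled request" };
}

exports.handler = handler;
```

Logging the `coldStart` flag also makes it easy to see in CloudWatch how often real users still hit a cold container.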

Conclusion

Serverless has many problems, I agree, but which tech doesn’t? When you choose between serverless and non-serverless, make sure you do your study and analyze your requirements to decide which direction to go in. If you want to build small applications quickly, with strict deadlines and a smaller budget, go with serverless; otherwise, choose EC2 servers. It mainly depends on the requirements. If you are using serverless, some frameworks will help you a lot. Also, you can compare the pricing here.

If you are new to serverless and want to implement it from scratch, you can have a look at the following link.

Currently, serverless has its downsides, but we hope that Amazon and other cloud providers will come up with good solutions to make it more efficient. We look forward to learning as the technology evolves.
December 12, 2022

Your Quintessential Guide to AWS Athena

Introduction

Serverless has become a new trend today and is here to stay for sure! Now when you think of wireless internet, you know that it still has some wires but you don’t need to worry about them as you don’t have to maintain them. Similarly, serverless has servers but you don’t have to keep worrying about handling or maintaining them. All you need to do is focus on your code and you’re good to go.

It has some more benefits, such as:

Zero administration: You can deploy code without provisioning anything beforehand, or managing anything later. There is no concept of a fleet, an instance, or even an operating system.
Auto-scaling: It lets your service providers manage the scaling challenges. You don’t need to fire alerts or write scripts to scale up and down. It handles quick bursts of traffic and weekend lulls the same way.
Pay-per-use: The function-as-a-service compute and managed services are charged based on usage rather than pre-provisioned capacity. You can have complete resource utilization without paying a cent for idle time. The results? 90% cost-savings over a cloud VM, and the satisfaction of knowing that you never pay for resources you don’t use.

What is AWS Athena?

AWS Athena is a similar serverless service. It is more of an interactive query service than a code deployment service.

Using Athena, one can directly query the data stored in S3 buckets using standard ANSI SQL.

As mentioned earlier, it works on the principle of serverless, that is, there is no infrastructure to manage, and you pay only for the queries that you run.

Athena is easy to use. You can simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.

It is based on Facebook’s PrestoDB and can be used to query structured and semi-structured data.

Some Exciting Features of Athena are:

Serverless, no ETL: no need to set up and manage any servers or data warehouses.
Only pay for the data that is scanned.
You can ensure better performance by compressing, partitioning, and converting your data into columnar formats.
Can also handle complex analysis, including large joins, window functions, and arrays.
Athena automatically executes queries in parallel.
You only need to provide a path to the S3 folder; when new files are added, they are automatically reflected in the table.
Supports –
CSV, JSON, Parquet, ORC, and Avro data formats
Complex joins and data types
View creation
Does not support –
User-defined functions and stored procedures
Hive or Presto transactions
LZO compression (Snappy is supported)

Pricing of Athena

AWS Athena is priced at $5 for each TB of data scanned.
Queries are rounded up to the nearest MB, with a 10 MB minimum.
Users pay for stored data at regular S3 rates.
Amazon advises users to use compressed data files, keep data in columnar formats, and routinely delete old result sets to keep charges low. Partitioning data in tables can speed up queries and reduce query bills.
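
The pricing rules above can be turned into a rough per-query estimate. The sketch below assumes binary units (1 MB = 1024 * 1024 bytes, 1 TB = 1024 * 1024 MB), which is our assumption rather than something stated here; the function names are illustrative.

```javascript
const BYTES_PER_MB = 1024 * 1024; // assumption: binary megabytes
const MB_PER_TB = 1024 * 1024;    // assumption: binary terabytes
const USD_PER_TB = 5;             // rate quoted above
const MIN_MB_PER_QUERY = 10;      // minimum quoted above

// Billable megabytes for one query: round up to a whole MB, then apply the minimum.
function billableMb(bytesScanned) {
  const roundedMb = Math.ceil(bytesScanned / BYTES_PER_MB);
  return Math.max(roundedMb, MIN_MB_PER_QUERY);
}

// Cost in USD of a single query that scanned the given number of bytes.
function queryCostUsd(bytesScanned) {
  return (billableMb(bytesScanned) / MB_PER_TB) * USD_PER_TB;
}

const oneGb = 1024 * BYTES_PER_MB;
console.log("1 GB scanned:", queryCostUsd(oneGb).toFixed(6), "USD");
console.log("1 KB scanned (10 MB minimum applies):", queryCostUsd(1024).toFixed(6), "USD");
console.log("Same data compressed to 100 MB:", queryCostUsd(100 * BYTES_PER_MB).toFixed(6), "USD");
```

Because the bill follows bytes scanned, anything that shrinks the scan (compression, columnar formats, partition pruning) lowers the cost in direct proportion.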

Athena vs. Redshift Spectrum

AWS also has Redshift as a data warehouse service, and we can use Redshift Spectrum to query S3 data, so why should you use Athena?

Advantages of Redshift Spectrum:

Allows creation of Redshift tables. You’re able to join Redshift tables with Redshift spectrum tables efficiently.

If you do not need those things, then you should consider Athena as well.

Athena's differences from Redshift Spectrum:

Billing: This is a major difference, and depending on your use case you may find one much cheaper than the other.
Performance: Athena is slightly faster.
SQL syntax and features: Athena is derived from Presto and is a bit different from Redshift, which has its roots in Postgres.
It’s easy enough to connect to Athena using the API, JDBC, or ODBC, but many more products offer a “standard out of the box” connection to Redshift.
Athena has GIS functions and lambdas.

So, in a nutshell, if you have existing Redshift instances, you would probably go for Redshift Spectrum; if not, you can opt for Athena for querying the data. In some cases, you can use both in tandem.

Example

Here are sample queries to create a sample database with 3 tables (basic_details, contact_details, and bill_details) over CSV files uploaded to S3:

Basic_details:

[Table definition not recovered from the source. The sample query below reads the id, first_name, and gender columns of this table.]

Bill_details:

CREATE EXTERNAL TABLE `bil_details`(
  `id` int COMMENT '', 
  `amount_paid` string COMMENT '', 
  `amount_due` string COMMENT '')
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://athena-blog/bill-details'
TBLPROPERTIES (
  'has_encrypted_data'='false', 
  'skip.header.line.count'='1')

Contact_details:

CREATE EXTERNAL TABLE `contact_details`(
  `id` int COMMENT '', 
  `street` string COMMENT '', 
  `city` string COMMENT '', 
  `state` string COMMENT '', 
  `country` string COMMENT '', 
  `zip` string COMMENT '')
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://athena-blog/contact-details'
TBLPROPERTIES (
  'has_encrypted_data'='false', 
  'skip.header.line.count'='1')

Sample Query for – First names of men from Minnesota with amount_due > $100

WITH basic AS (
    SELECT id,
           first_name
    FROM basic_details
    WHERE lower(gender) = 'male'
),
bill AS (
    SELECT id
    FROM bil_details
    WHERE CAST(amount_due AS INTEGER) > 100
),
contact AS (
    SELECT contact_details.id
    FROM contact_details
    JOIN bill
        ON contact_details.id = bill.id
    WHERE state = 'Minnesota'
)
SELECT basic.first_name
FROM basic
JOIN contact
    ON basic.id = contact.id

Output:

Some Other Sample Queries:

1. Searching for Values in JSON

WITH dataset AS (
  SELECT * FROM (VALUES
    (JSON '{"name": "Bob Smith", "org": "legal", "projects": ["project1"]}'),
    (JSON '{"name": "Susan Smith", "org": "engineering", "projects": ["project1", "project2", "project3"]}'),
    (JSON '{"name": "Jane Smith", "org": "finance", "projects": ["project1", "project2"]}')
  ) AS t (users)
)
SELECT json_extract_scalar(users, '$.name') AS user
FROM dataset
WHERE json_array_contains(json_extract(users, '$.projects'), 'project2')

Output:

2. Extracting properties

WITH dataset AS (
  SELECT '{"name": "Susan Smith",
           "org": "engineering",
           "projects": [{"name":"project1", "completed":false},
           {"name":"project2", "completed":true}]}'
    AS blob
)
SELECT
  json_extract(blob, '$.name') AS name,
  json_extract(blob, '$.projects') AS projects
FROM dataset

Output:

3. Converting JSON to Athena Data Types

WITH dataset AS (
  SELECT
    CAST(JSON '"HELLO ATHENA"' AS VARCHAR) AS hello_msg,
    CAST(JSON '12345' AS INTEGER) AS some_int,
    CAST(JSON '{"a":1,"b":2}' AS MAP(VARCHAR, INTEGER)) AS some_map
)
SELECT * FROM dataset

Output:

Conclusion

Hence, we can easily say that AWS Athena gives us an efficient way to query our raw data present in different formats in S3 object storage, without spawning a dedicated infrastructure and at minimal cost.

December 12, 2022

Building Google Photos Alternative Using AWS Serverless

Being an avid Google Photos user, I really love some of its features, such as album, face search, and unlimited storage. However, when Google announced the end of unlimited storage on June 1st, 2021, I started thinking about how I could create a cheaper solution that would meet my photo backup requirement.

“Taking an image, freezing a moment, reveals how rich reality truly is.”

– Anonymous

Google offers 100 GB of storage for 130 INR. This storage can be used across various Google applications. However, I don’t use all the space in one go. I snap photos randomly: sometimes I visit places and take random snaps with my DSLR and smartphone. So, in general, I upload approximately 200 photos monthly. The size of these photos varies in the range of 4 MB to 30 MB. On average, I use about 4 GB of storage per month for backup on my external hard drive, where I keep the raw photos, even the bad ones. Photos backed up on the cloud should be visually high-quality, and it’s good to have a raw copy available at the same time, so that you can make some Lightroom edits (although I never touch them 😛). So, here are my minimal requirements:

Should support social authentication (Google sign-in preferred).
Photos should be stored securely in raw format.
Storage should be scaled with usage.
Uploading and downloading photos should be easy.
Web view for preview would be a plus.
Should have almost no operations headache and solution should be as cheap as possible 😉.

Selecting Tech Stack

To avoid operational headaches with servers going down, scaling, application crashes, and overall monitoring, I opted for a serverless solution with AWS. AWS S3 is infinitely scalable storage, and you only pay for the amount of storage you use. On top of that, you can choose an S3 storage class that is efficient and cost-effective for your access pattern.

– Infrastructure Stack
1. AWS API Gateway (HTTP API)
2. AWS Lambda (for processing images and API Gateway queries)
3. DynamoDB (for storing image metadata)
4. AWS Cognito (for authentication)
5. AWS S3 Bucket (for storage and web application hosting)
6. AWS Certificate Manager (to use an SSL certificate for a custom domain with API Gateway)

– Software Stack

1. Node.js
2. ReactJS and Material-UI (front-end framework and UI)
3. AWS Amplify (for simplifying the auth flow with Cognito)
4. Sharp (high-speed Node.js library for converting images)
5. Express and serverless-http
6. Infinite Scroller (for gallery view)
7. Serverless Framework (for ease of deployment and Infrastructure as Code)

Create S3 Buckets:

We will create three S3 buckets. The first one is for hosting the frontend application (refer to the architecture diagram; more on this is discussed later in the build and hosting part). The second one is for temporarily uploading images. The third one is for the actual backup and storage (enable server-side encryption on this bucket). Images uploaded to the temporary bucket will be picked up for processing.

During pre-processing, we will resize the original image into two different sizes. One is for thumbnail purposes (400px width); the other is for viewing purposes, but with reduced quality (webp format). Once the images are resized, upload all three (raw, thumbnail, and webview) to the third S3 bucket and create a record in DynamoDB. Set up an object expiry policy of 1 day on the temporary bucket. This way, uploaded objects are automatically deleted from the temporary bucket.
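
To make the three outputs concrete, here is a sketch of how the processing step might plan its destination keys before doing any resizing. The key layout `<identityId>/<variant>/<file>` and the name `planOutputs` are assumptions made for illustration, as is keeping the thumbnail in its original format; only the 400px thumbnail width and the webp webview come from the description above.

```javascript
const path = require("path");

// Assumed layout (not from the original project): <identityId>/<variant>/<file>
function planOutputs(identityId, uploadedFileName) {
  const ext = path.posix.extname(uploadedFileName);        // ".jpg" or ".jpeg"
  const base = path.posix.basename(uploadedFileName, ext); // file name without extension
  return [
    // Untouched original, stored as-is.
    { variant: "raw", key: `${identityId}/raw/${base}${ext}`, widthPx: null, format: "original" },
    // Small preview, resized to 400px width.
    { variant: "thumbnail", key: `${identityId}/thumbnail/${base}${ext}`, widthPx: 400, format: "original" },
    // Reduced-quality copy for the web view, converted to webp.
    { variant: "webview", key: `${identityId}/webview/${base}.webp`, widthPx: null, format: "webp" },
  ];
}

console.log(planOutputs("example-identity-id", "IMG_0001.jpg"));
```

Keeping the plan as plain data makes it easy to check without S3 or an image library; the actual resize and upload calls would consume this list.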

Setup trigger on the temporary bucket for uploaded images:

We need to set up an S3 PUT event, which will trigger our Lambda function to download and process images. We will filter on the suffixes .jpg and .jpeg for the event trigger, meaning that any file with the extension .jpg or .jpeg uploaded to our temporary bucket will automatically invoke the Lambda function with the event payload. With the help of the event payload, the Lambda function will download the uploaded file and process it. Your serverless function definition would look like:

functions:
 lambda:
   handler: index.handler
   memorySize: 512
   timeout: 60
   layers:
     - {Ref: PhotoParserLibsLambdaLayer}
   events:
     - s3:
         bucket: your-temporary-bucket-name
         event: s3:ObjectCreated:*
         rules:
           - suffix: .jpg
         existing: true
     - s3:
         bucket: your-temporary-bucket-name
         event: s3:ObjectCreated:*
         rules:
           - suffix: .jpeg
         existing: true

Notice that in the YAML events section, we set “existing: true”. This tells the framework to attach the trigger to an existing bucket instead of creating the bucket during the serverless deployment. However, if you don’t plan to create your S3 bucket manually, you can let the framework create the bucket for you.
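
Inside the function, the first job is to read the bucket name and object key out of the event payload. The sketch below shows that step on a hand-written sample event that contains only the fields used here; `parseS3Event` is an illustrative name.

```javascript
// Extract bucket and object key from each record of an S3 event payload.
function parseS3Event(event) {
  return (event.Records || []).map((record) => ({
    bucket: record.s3.bucket.name,
    // Keys arrive URL-encoded, with spaces encoded as "+".
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
    size: record.s3.object.size,
  }));
}

// Hand-written sample payload with only the fields used above.
const sampleEvent = {
  Records: [
    {
      eventName: "ObjectCreated:Put",
      s3: {
        bucket: { name: "your-temporary-bucket-name" },
        object: { key: "user-123/holiday+photo%281%29.jpg", size: 4194304 },
      },
    },
  ],
};

console.log(parseS3Event(sampleEvent));
```

Decoding the key matters in practice: file names with spaces or brackets would otherwise produce "not found" errors when the function tries to download the object.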

DynamoDB as metadatadb:

AWS DynamoDB is a key-value and document database that is suitable for our use case. DynamoDB will help us retrieve the list of photos in time order. DynamoDB uses a primary key to uniquely identify each record. A primary key can be composed of a hash key and a range key (also called a sort key); the range key is optional. We will use the federated identity ID (discussed in Setup Authorization) as the hash key (partition key) and name the attribute username, with type string. We will use an attribute named timestamp, with type number, as the range key. The range key will help us query results as a time series (Unix epoch). We could also use DynamoDB secondary indexes to sort results more specifically. However, to keep the application simple, we’re going to opt out of this feature for now. Your serverless resource definition would look like:

resources:
 Resources:
   MetaDataDB:
     Type: AWS::DynamoDB::Table
     Properties:
       TableName: your-dynamodb-table-name
       AttributeDefinitions:
         - AttributeName: username
           AttributeType: S
         - AttributeName: timestamp
           AttributeType: N
       KeySchema:
         - AttributeName: username
           KeyType: HASH
         - AttributeName: timestamp
           KeyType: RANGE
       BillingMode: PAY_PER_REQUEST
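
With this key schema, writing a record and reading one user's photos for a time window could look roughly like the sketch below. It only builds plain parameter objects in the shape a document-style DynamoDB client accepts. The non-key attributes (`rawKey`, `thumbnailKey`, `webviewKey`) and both function names are hypothetical.

```javascript
const TABLE_NAME = "your-dynamodb-table-name";

// Item written after an image is processed. Attributes other than the two
// key attributes (username, timestamp) are hypothetical.
function buildPhotoItem(identityId, takenAtMs, keys) {
  return {
    username: identityId,                    // partition (hash) key
    timestamp: Math.floor(takenAtMs / 1000), // sort (range) key, Unix epoch seconds
    rawKey: keys.raw,
    thumbnailKey: keys.thumbnail,
    webviewKey: keys.webview,
  };
}

// Query parameters for one user's photos in a time window, newest first.
function buildTimeRangeQuery(identityId, fromEpoch, toEpoch, limit = 50) {
  return {
    TableName: TABLE_NAME,
    KeyConditionExpression: "#u = :u AND #ts BETWEEN :from AND :to",
    // Attribute names are aliased to avoid clashes with DynamoDB reserved words.
    ExpressionAttributeNames: { "#u": "username", "#ts": "timestamp" },
    ExpressionAttributeValues: { ":u": identityId, ":from": fromEpoch, ":to": toEpoch },
    ScanIndexForward: false, // descending sort-key order
    Limit: limit,
  };
}

console.log(buildTimeRangeQuery("example-identity-id", 1609459200, 1612137600));
```

Because the timestamp is the range key, the time window is resolved by the key condition itself and no secondary index is needed for a chronological gallery.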

Finally, you also need to set up the IAM role so that the image-processing Lambda function has access to the S3 buckets and DynamoDB. Here is the serverless definition for the IAM role.

# you can add statements to the Lambda function's IAM Role here
 iam:
   role:
     statements:
     - Effect: "Allow"
       Action:
         - "s3:ListBucket"
       Resource:
         - arn:aws:s3:::your-temporary-bucket-name
         - arn:aws:s3:::your-actual-photo-bucket-name
     - Effect: "Allow"
       Action:
         - "s3:GetObject"
         - "s3:DeleteObject"
       Resource: arn:aws:s3:::your-temporary-bucket-name/*
     - Effect: "Allow"
       Action:
         - "s3:PutObject"
       Resource: arn:aws:s3:::your-actual-photo-bucket-name/*
     - Effect: "Allow"
       Action:
         - "dynamodb:PutItem"
       Resource:
         - Fn::GetAtt: [ MetaDataDB, Arn ]

Setup Authentication:

Okay, to set up a Cognito user pool, head to the Cognito console and create a user pool with below config:

1. Pool Name: photobucket-users

2. How do you want your end-users to sign in?

Select: Email Address or Phone Number
Select: Allow Email Addresses
Check: (Recommended) Enable case insensitivity for username input

3. Which standard attributes are required?

4. Keep the defaults for “Policies”

5. MFA and Verification:

I opted to manually reset the password for each user (since this is internal app)
Disabled user verification

6. Keep the default for Message Customizations, tags, and devices.

7. App Clients :

App client name: myappclient
Let the refresh token, access token, and id token be default
Check all “Auth flow configurations”
Check enable token revocation

8. Skip Triggers

9. Review and create the pool

Once created, go to App integration -> Domain name. Create a Cognito subdomain of your choice and note it down. Next, I plan to use the Google sign-in feature with Cognito federated identity providers. Use this guide to set up a Google social identity with Cognito.

Setup Authorization:

Once the user identity is verified, we need to allow them to access the s3 bucket with limited permissions. Head to the Cognito console, select federated identities, and create a new identity pool. Follow these steps to configure:

1. Identity pool name: photobucket_auth

2. Keep Unauthenticated and Authentication flow settings unchecked.

3. Authentication providers:

User Pool ID: Enter the user pool ID obtained during authentication setup
App Client ID: Enter the app client ID generated during the authentication setup. (Cognito user pool -> App Clients -> App client ID)

4. Setup permissions:

Expand view details (Role Summary)
For authenticated identities: edit policy document and use the below JSON policy and skip unauthenticated identities with the default configuration.

{
   "Version": "2012-10-17",
   "Statement": [
       {
           "Effect": "Allow",
           "Action": [
               "mobileanalytics:PutEvents",
               "cognito-sync:*",
               "cognito-identity:*"
           ],
           "Resource": [
               "*"
           ]
       },
       {
           "Sid": "ListYourObjects",
           "Effect": "Allow",
           "Action": "s3:ListBucket",
           "Resource": [
               "arn:aws:s3:::your-actual-photo-bucket-name"
           ],
           "Condition": {
               "StringLike": {
                   "s3:prefix": [
                       "${cognito-identity.amazonaws.com:sub}/",
                       "${cognito-identity.amazonaws.com:sub}/*"
                   ]
               }
           }
       },
       {
           "Sid": "ReadYourObjects",
           "Effect": "Allow",
           "Action": [
               "s3:GetObject"
           ],
           "Resource": [
               "arn:aws:s3:::your-actual-photo-bucket-name/${cognito-identity.amazonaws.com:sub}",
               "arn:aws:s3:::your-actual-photo-bucket-name/${cognito-identity.amazonaws.com:sub}/*"
           ]
       }
   ]
}

${cognito-identity.amazonaws.com:sub} is an IAM policy variable. When a user authenticates through a federated identity, Cognito assigns that user a unique identity ID, and the variable resolves to that ID when the policy is evaluated. The policy above therefore gives every authenticated user access only to objects prefixed with their own identity ID. This is how we confine each user to a limited area within the S3 bucket.
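To make the prefix rule concrete, here is a minimal sketch of the same check in plain JavaScript. The helper names objectKeyFor and canReadObject are hypothetical and not part of the application; they only mirror the two Resource patterns of the ReadYourObjects statement, with identityId standing in for the resolved policy variable.

```javascript
// Hypothetical helpers that mirror the ReadYourObjects statement.
// identityId stands in for ${cognito-identity.amazonaws.com:sub}.

// Build the object key under which a user's photo would be stored.
function objectKeyFor(identityId, fileName) {
  return `${identityId}/${fileName}`;
}

// True when the key matches "<identityId>" or "<identityId>/*",
// the two Resource patterns allowed by the policy.
function canReadObject(identityId, objectKey) {
  return objectKey === identityId || objectKey.startsWith(`${identityId}/`);
}

const me = "us-east-1:1234"; // made-up identity ID for illustration
console.log(canReadObject(me, objectKeyFor(me, "cat.jpg")));   // own object
console.log(canReadObject(me, "us-east-1:9999/cat.jpg"));      // someone else's object
```

The real enforcement happens in IAM, not in application code; the sketch only shows which keys the policy is expected to allow.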

Copy the Identity Pool ID (from the sample code section). You will need it in your backend to look up the identity ID of the authenticated user from the JWT token.

Amplify configuration for the frontend UI sign-in:

This object helps you set up the minimal configuration for your application. This is all that we need to sign in via Cognito and access the S3 photo bucket.

const awsconfig = {
   Auth : {
       identityPoolId: "identity pool id created during authorization setup",
       region : "your aws region",
       identityPoolRegion: "same as above if cognito is in same region",
       userPoolId : "cognito user pool id created during authentication setup",
       userPoolWebClientId : "cognito app client id",
       cookieStorage : {
           domain : "https://your-app-domain-name", //this is very important
           secure: true
       },
       oauth: {
           domain : "{cognito domain name}.auth.{cognito region name}.amazoncognito.com",
           scope : ["profile","email","openid"],
           redirectSignIn: 'https://your-app-domain-name',
           redirectSignOut: 'https://your-app-domain-name',
           responseType : "token"
       }
   },
   Storage: {
       AWSS3 : {
           bucket: "your-actual-bucket-name",
           region: "region-of-your-bucket"
       }
   }
};
export default awsconfig;

You can then use the below code to configure and sign in via social authentication.

import Amplify, {Auth} from 'aws-amplify';
import awsconfig from './aws-config';
Amplify.configure(awsconfig);
//once Amplify is configured, you can call Auth.federatedSignIn from the onClick event of a button or any other visual component to sign in.
//Example
<Button startIcon={<img alt="Sign in with Google" src={logo} />} fullWidth variant="outlined" color="primary" onClick={() => Auth.federatedSignIn({provider: 'Google'})}>
   Sign in with Google
</Button>

Gallery View:

When the application is loaded, we use the PhotoGallery component to load photos and show thumbnails on the page. The PhotoGallery component is a wrapper around the InfiniteScroller component, which keeps loading images as the user scrolls. The idea here is that we query a maximum of 10 images in one go. Our backend returns a list of 10 images (only the mapping to objects in the S3 bucket and their metadata). We must then load these images from the S3 bucket and show the thumbnails on screen as a gallery view. When the user reaches the bottom of the screen, or there is empty space left, the InfiniteScroller component loads 10 more images. This continues until our backend replies with a stop marker.

The key point here is that we need to send the JWT token as a header to our backend service via an AJAX call. The JWT token is obtained from the Amplify framework after sign-in. An example of obtaining a JWT token and sending it with a request:

let authsession = await Auth.currentSession();
let jwtToken = authsession.getIdToken().jwtToken;
let photoList = await axios.get(url,{
   headers : {
       Authorization: jwtToken
   },
   responseType : "json"
});

An example of the infinite scroller component usage is given below. Note that “gallery” is an array of photo thumbnails composed in JSX. The “loadMore” method calls our AJAX function against the server-side backend, updates the “gallery” variable, and sets the “hasMore” variable to true or false so that the infinite scroller component can stop querying when there are no photos left to display on the screen.

<InfiniteScroll
   loadMore={this.fetchPhotos}
   hasMore={this.state.hasMore}
   loader={<div style={{padding:"70px"}} key={0}><LinearProgress color="secondary" /></div>}
>
   <div style={{ marginTop: "80px", position: "relative", textAlign: "center" }}>
       <div className="image-grid" style={{ marginTop: "30px" }}>
           {gallery}
       </div>
       {this.state.openLightBox ?
       <LightBox src={this.state.lightBoxImg} callback={this.closeLightBox} />
       : null}
   </div>
</InfiniteScroll>
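The paging logic behind “loadMore” and “hasMore” can be sketched independently of React. This is only an illustration under assumptions: createPhotoPager is a hypothetical name, and fetchPage(cursor, limit) is assumed to resolve to an object of the form { photos, next }, where next === null plays the role of the backend's stop marker. The real response shape may differ.

```javascript
// Hypothetical pager illustrating the loadMore/hasMore contract.
// Assumption: fetchPage(cursor, limit) resolves to { photos: [...], next: cursor|null },
// where next === null is the stop marker.
const PAGE_SIZE = 10; // the article queries at most 10 images per call

function createPhotoPager(fetchPage) {
  const state = { photos: [], cursor: undefined, hasMore: true };

  async function loadMore() {
    // Once the stop marker has been seen, never call the backend again.
    if (!state.hasMore) return state;
    const page = await fetchPage(state.cursor, PAGE_SIZE);
    state.photos = state.photos.concat(page.photos);
    state.cursor = page.next;
    state.hasMore = page.next !== null;
    return state;
  }

  return { state, loadMore };
}
```

In a component, loadMore would be the function handed to the scroller, and state.hasMore would be copied into component state after each call.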

The LightBox component gives a zoom effect to the thumbnail. When a thumbnail is clicked, a higher-resolution picture (the webp version) is downloaded from the S3 bucket and shown on the screen. We use the Storage object from the Amplify library. The downloaded content is a blob and must be converted into image data. To do so, we use the native JavaScript method createObjectURL. Below is the sample code that downloads the object from the S3 bucket and then converts it into a viewable image for the HTML IMG tag.

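thumbClick = async (index) => {
   const urlCreator = window.URL || window.webkitURL;
   //open the lightbox right away so that the user gets feedback while the download runs
   this.setState({
       openLightBox: true
   });
   try {
       //download the full-size object as a blob; awaiting here lets the catch block see download errors
       const data = await Storage.get(this.state.photoList[index].src,{download: true});
       //convert the blob into a URL that can be used as the src of an IMG tag
       this.setState({
           lightBoxImg : urlCreator.createObjectURL(data.Body)
       });
   } catch (error) {
       console.log(error);
       this.setState({
           openLightBox: false,
           lightBoxImg : null
       });
   }
};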
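One detail worth considering with createObjectURL is that each call allocates a URL that stays alive until it is revoked. The sketch below shows one possible way to release the previous lightbox image before creating the next one. swapObjectUrl is a hypothetical helper, not part of the application, and urlApi is a parameter introduced only so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: release the previous object URL, then create a new one.
// urlApi defaults to the global URL object; it is injectable for testing.
function swapObjectUrl(previousUrl, blob, urlApi = globalThis.URL) {
  if (previousUrl) {
    // Frees the memory held for the blob behind the old URL.
    urlApi.revokeObjectURL(previousUrl);
  }
  return urlApi.createObjectURL(blob);
}
```

Inside thumbClick this could be used as swapObjectUrl(this.state.lightBoxImg, data.Body) before updating the state.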

Uploading Photos:

The S3 SDK lets you generate a pre-signed POST URL. Anyone who gets this URL will be able to upload objects to the S3 bucket directly without needing credentials. Of course, we can actually set up some boundaries, like a max object size, key of the uploaded object, etc. Refer to this AWS blog for more on pre-signed URLs. Here is the sample code to generate a pre-signed URL.

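const S3 = require("aws-sdk/clients/s3"); //AWS SDK for JavaScript v2
let s3Params = {
   Bucket: "your-temporary-bucket-name",
   Conditions : [
       ["content-length-range",1,31457280] //allowed object size: 1 byte to 30 MiB
   ],
   Fields : {
       key: "path/to/your/object"
   },
   Expires: 300 //in seconds
};
const s3 = new S3({region : process.env.AWSREGION });
//returns the POST url and the form fields that must be submitted along with the file
//(calling without a callback requires credentials that are already resolved)
const presignedPost = s3.createPresignedPost(s3Params);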

For a better UX, we can allow our users to upload more than one photo at a time. However, a pre-signed URL lets you upload only a single object. To overcome this, we generate multiple pre-signed URLs. Once the user selects photos to upload, the frontend sends a request to our backend with the expected keys of the photos. Our backend then generates one pre-signed URL per photo. Our frontend React app uploads each photo to its own URL while giving the illusion that all photos are being uploaded as a whole.
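The frontend half of this idea can be sketched as follows. It is an illustration under assumptions rather than the application's actual code: uploadAll and defaultPost are hypothetical names, and each entry of targets is assumed to be the { url, fields } object that createPresignedPost produced for the file at the same index.

```javascript
// Hypothetical sketch: upload every selected file to its own pre-signed POST target.
// Assumption: targets[i] is the { url, fields } object generated for files[i].
async function uploadAll(files, targets, post = defaultPost) {
  // allSettled lets one failed upload be reported without aborting the others.
  const results = await Promise.allSettled(
    files.map((file, i) => post(targets[i], file))
  );
  return results.map((r, i) => ({ name: files[i].name, ok: r.status === "fulfilled" }));
}

// Default transport using the browser's FormData and fetch.
async function defaultPost(target, file) {
  const form = new FormData();
  // The signed fields must come before the file part in the form.
  Object.entries(target.fields).forEach(([k, v]) => form.append(k, v));
  form.append("file", file);
  const res = await fetch(target.url, { method: "POST", body: form });
  if (!res.ok) throw new Error(`upload failed with status ${res.status}`);
}
```

The returned list of { name, ok } pairs is enough to drive a single combined progress indicator and to retry only the uploads that failed.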

When the upload is successful, the S3 PUT event is triggered, which we discussed earlier. The complete flow of the application is given in a sequence diagram. You can find the complete source code here in my GitHub repository.

React Build Steps and Hosting:

The usual way to build the React app is to run npm run build. However, we take a slightly different approach. We are not using an S3 static website to serve the frontend UI, for one reason: S3 static websites are non-SSL unless we use CloudFront. Therefore, we will make the API gateway our application’s entry point, and the UI will also be served from the API gateway. However, we want to reduce the number of calls made to the API gateway. For this reason, we will deliver only the index.html file through API gateway/Lambda, and serve the rest of the static files (the supporting React JS files) from an S3 bucket.

Your index.html should have all its reference paths pointing to the S3 bucket. The build must explicitly specify that static files are located somewhere other than a path relative to the index.html file. Your S3 bucket needs to be public, with the right bucket policy and CORS set, so that the end user can only retrieve files and not upload nasty objects. Those who are confused about how an S3 static website and a public S3 bucket differ may refer to here. Below are the React build step, bucket policy, and CORS configuration.

PUBLIC_URL=https://{your-static-bucket-name}.s3.{aws_region}.amazonaws.com/ npm run build
//Bucket Policy
{
   "Version": "2012-10-17",
   "Id": "http referer from your domain only",
   "Statement": [
       {
           "Sid": "Allow get requests originating from",
           "Effect": "Allow",
           "Principal": "*",
           "Action": "s3:GetObject",
           "Resource": "arn:aws:s3:::{your-static-bucket-name}/static/*",
           "Condition": {
               "StringLike": {
                   "aws:Referer": [
                       "https://your-app-domain-name"
                   ]
               }
           }
       }
   ]
}
//CORS
[
   {
       "AllowedHeaders": [
           "*"
       ],
       "AllowedMethods": [
           "GET"
       ],
       "AllowedOrigins": [
           "https://your-app-domain-name"
       ],
       "ExposeHeaders": []
   }
]

Once a build is complete, upload index.html to a lambda that serves your UI. Run the below shell commands to compress static contents and host them on our static S3 bucket.

#assuming you are in your react app directory
mkdir -p /tmp/s3uploads
cp -ar build/static /tmp/s3uploads/
cd /tmp/s3uploads
#add gzip encoding to all the files (find -exec copes with spaces in file names)
find . -type f -exec gzip -9 {} +
#remove .gz extension from compressed files
find . -type f -name '*.gz' | while IFS= read -r i
do
   mv "$i" "${i%.gz}"
done
#sync your files to s3 static bucket and mention that these files are compressed with gzip encoding
#so that browser will not treat them as regular files
aws s3 --region "$AWSREGION" sync . "s3://${S3_STATIC_BUCKET}/static/" --content-encoding gzip --delete --sse
cd -
rm -rf /tmp/s3uploads

Our backend uses the Node.js Express framework. Since this is a serverless application, we need to wrap Express with the serverless-http library to make it work with Lambda. Sample source code is given below, along with the serverless framework resource definition. Notice that, except for the UI home endpoint ( “/” ), the API endpoints are authenticated with Cognito on the API gateway itself.

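const serverless = require("serverless-http");
const express = require("express");
const path = require("path"); //needed for path.join below
const app = express();
.
.
.
.
.
.
//serve the React entry point; this route is not behind the Cognito authorizer
app.get("/",(req,res)=> {
 res.sendFile(path.join(__dirname, "index.html"));
});
//wrap the express app so that lambda can invoke it
module.exports.uihome = serverless(app);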

provider:
 name: aws
 runtime: nodejs12.x
 lambdaHashingVersion: 20201221
 httpApi:
   authorizers:
     cognitoJWTAuth:
       identitySource: $request.header.Authorization
       issuerUrl: https://cognito-idp.{AWS_REGION}.amazonaws.com/{COGNITO_USER_POOL_ID}
       audience:
         - COGNITO_APP_CLIENT_ID
.
.
.
.
.
.
.
functions:
 react-serve-ui:
   handler: handler.uihome
   memorySize: 256
   timeout: 29
   layers:
     - {Ref: CommonLibsLambdaLayer}
   events:
     - httpApi:
         path: /prep/photoupload
         method: post
         authorizer:
           name: cognitoJWTAuth
     - httpApi:
         path: /list/photos
         method: get
         authorizer:
           name: cognitoJWTAuth
     - httpApi:
         path: /
         method: get
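When a route is protected by the JWT authorizer, the HTTP API passes the validated token claims to the Lambda function in the event's request context. A minimal sketch of reading them is shown below. getAuthenticatedUser is a hypothetical helper name; the requestContext.authorizer.jwt.claims path follows the HTTP API payload format for JWT authorizers, and which claims are present (for example email) depends on the scopes configured for the app client.

```javascript
// Hypothetical helper: extract the caller's identity from an HTTP API event.
// Claims are only present on routes that use the JWT authorizer.
function getAuthenticatedUser(event) {
  const ctx = event && event.requestContext;
  const jwt = ctx && ctx.authorizer && ctx.authorizer.jwt;
  const claims = jwt && jwt.claims;
  if (!claims || !claims.sub) return null;
  // "sub" is the user's unique id in the Cognito user pool.
  return { sub: claims.sub, email: claims.email || null };
}
```

For the unauthenticated “/” route the helper returns null, which makes it easy to tell the two kinds of routes apart inside a shared handler.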

Final Steps:

Lastly, we will set up a custom domain, along with a certificate for it, so that we don’t need to use the gibberish domain name generated by the API gateway. You don’t need to use Route 53 for this part. If you have an existing domain, you can create a subdomain and point it to the API gateway. First things first: head to the AWS ACM console and request a certificate for the domain name. Once the request is created, you need to validate your domain by creating the DNS record (a CNAME record) shown in the ACM console. Public certificates from ACM are free. Domain verification may take from a few minutes to several hours. Once you have the certificate ready, head back to the API gateway console. Navigate to “custom domain names” and click create.

Enter your application domain name
Check TLS 1.2 as TLS version
Select Endpoint type as Regional
Select ACM certificate from dropdown list
Create domain name

Select the newly created custom domain. Note the API gateway domain name from Domain Details -> Configuration tab. You will need this to map a CNAME/ALIAS record with your DNS provider. Click on the API mappings tab. Click configure API mappings. From the dropdown, select your API gateway, select stage as default, and click save. You are done here.

Future Scope and Improvements:

To improve application latency, we can use CloudFront as a CDN. This way, our entry point could be S3, and we would no longer need the API gateway regional endpoint for serving the UI. We can also add AWS WAF in front of our API gateway as an extra layer of security to inspect incoming requests and payloads. We can also use DynamoDB secondary indexes so that we can search metadata in the table efficiently. We can add a lifecycle rule so that raw photos more than a year old are transitioned to the S3 Glacier storage class. You can further add a transition to Glacier Deep Archive to save more on storage costs.
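As a rough sketch of the lifecycle idea, the object below has the shape accepted by the putBucketLifecycleConfiguration call of the AWS SDK for JavaScript v2. Every value in it is an assumption made for illustration: the bucket name is a placeholder, the "raw/" prefix is hypothetical, and the 365 and 730 day thresholds are example numbers rather than a recommendation.

```javascript
// Illustrative lifecycle configuration; all values are assumptions.
const lifecycleParams = {
  Bucket: "your-actual-photo-bucket-name",
  LifecycleConfiguration: {
    Rules: [
      {
        ID: "archive-raw-photos",
        Status: "Enabled",
        Filter: { Prefix: "raw/" }, // hypothetical prefix for raw uploads
        Transitions: [
          { Days: 365, StorageClass: "GLACIER" },      // after one year
          { Days: 730, StorageClass: "DEEP_ARCHIVE" }  // after two years
        ]
      }
    ]
  }
};

console.log(JSON.stringify(lifecycleParams, null, 2));
```

Lifecycle transitions are driven by object age, so the rule applies to every object under the prefix once it is old enough, regardless of how recently it was read.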

December 12, 2022
