Tag: Cloud

  • From Connected to Intelligent: The Evolution of Smart Homes

    Overview:

    From futuristic speculation to everyday reality, smart homes can go way beyond connected devices – they can become intelligent, collaborative, reactive and adaptable environments. This can be achieved using Multi-agent AI Systems (MAS) to unify IoT devices and lay a solid foundation for innovation, for more seamless and secure living.  

    This remarkable growth of smart homes brings both opportunities and challenges. In this whitepaper, we’ll explore both, moving from the general – market overview and predictions, to specific – blueprint architecture and use cases, using AWS Harmony.  

    Here’s a breakdown of the whitepaper:

    • The Smart Homes market landscape: what is the current state and changes to expect
    • Multi-Agent AI Systems (MAS): how they work and why they’re transforming Smart Homes
    • The technology behind MAS: capabilities, practical applications and benefits
    • Smart Homes on AWS Harmony: blueprint of Agentic AI as the foundation for next-gen experiences
    • Use case for sustainable living: a hybrid Edge + Cloud IoT high-level architecture to implement for energy saving

  • Transforming Infrastructure at Scale with Azure Cloud

    • Infrastructure Costs cut by 30-34% monthly, optimizing resource utilization and generating substantial savings.
    • Customer Onboarding Time reduced from 50 to 4 days, significantly accelerating the client’s ability to onboard new customers.
    • Site Provisioning Time for existing customers reduced from weeks to a few hours, streamlining operations and improving customer satisfaction.
    • Downtime affecting customers was reduced to under 30 minutes, with critical issues resolved within 1 hour and most proactively addressed before customer notification.

  • Modern Data Stack: The What, Why and How?

    This post will provide you with a comprehensive overview of the modern data stack (MDS), including its benefits, how its components differ from its predecessors’, and what its future holds.

    “Modern” has the connotation of being up-to-date, of being better. This is true for MDS, but how exactly is MDS better than what was before?

    What was the data stack like?…

    A couple of decades back, the MapReduce breakthrough made it possible to efficiently process large amounts of data in parallel on multiple machines.

    It provided the backbone of a standard pipeline that looked like:

    It was common to see HDFS used for storage, Spark for compute, and Hive for running SQL queries on top.

    To run this, we had people handling the deployment and maintenance of Hadoop on their own.

    This core attribute of the setup eventually became a pain point and made it complex and inefficient in the long run.

    Being on-prem while facing growing heavier loads meant scalability became a huge concern.

    Hence, unlike today, the process was much more manual. Adding more RAM, increasing storage, and rolling out updates by hand all reduced productivity.

    Moreover,

    • The pipeline wasn’t modular; components were tightly coupled, causing failures when deciding to shift to something new.
    • Teams committed to specific vendors and found themselves locked in, by design, for years.
    • Setup was complex, and the infrastructure was not resilient. Random surges in data crashed the systems. (This randomness in demand has only increased since the early decades of the internet, due to social media-triggered virality.)
    • Self-service was non-existent. If you wanted to do anything with your data, you needed data engineers.
    • Observability was a myth. Your pipeline is failing, but you’re unaware, and then you don’t know why, where, or how. Your customers become your testers, knowing more about your system’s issues than you do.
    • Data protection laws weren’t as formalized, and organizations often lacked internal data policies.

    These issues made the traditional setup inefficient in solving modern problems, and as we all know…

    For an upgraded modern setup, we needed something that is scalable, has a smaller learning curve, and is feasible for both a seed-stage startup and a Fortune 500 company.

    Standing on the shoulders of tech innovations from the 2000s, data engineers started building a blueprint for MDS tooling with three core attributes: 

    Cloud Native (or the ocean)

    Arguably the definitive change of the MDS era, the cloud removes the hassle of running on-prem infrastructure and makes horizontal or vertical auto-scaling readily available, at a time when virality and traffic spikes are everyday technical requirements.

    Modularity

    The M in MDS could stand for modular.

    You can integrate any MDS tool into your existing stack, like LEGO blocks.

    You can test out multiple tools, whether they’re open source or managed, choose the best fit, and iteratively build out your data infrastructure.

    This mindset helps instill a habit of avoiding vendor lock-in by continuously upgrading your architecture with relative ease.

    By moving away from the ancient, one-size-fits-all model, MDS recognizes the uniqueness of each company’s budget, domain, data types, and maturity—and provides the correct solution for a given use case.

    Ease of Use

    MDS tools are easier to set up. You can start playing with these tools within a day.

    Importantly, the ease of use is not limited to technical engineers.

    Owing to the rise of self-serve and no-code tools like Tableau, data is finally democratized for all kinds of consumers. SQL remains crucial, but for basic metric calculations, PMs, Sales, Marketing, etc. can use a simple drag and drop in the UI (sometimes even simpler than Excel pivot tables).

    MDS also enables one to experiment with different architectural frameworks for their use case. For example, ELT vs. ETL (explained under Data Transformation).

    But, one might think such improvements mean MDS is the v1.1 of Data Stack, a tech upgrade that ultimately uses data to solve similar problems.

    Fortunately, that’s far from the case.

    MDS enables data to solve more human problems across the org—problems that employees have long been facing but could never systematically solve for, helping generate much more value from the data.

    Employees also want transparency and visibility into how any metric was calculated and which data source in Snowflake was used to build which specific Tableau dashboard.

    Critically, with compliance finally being focused on, orgs need solutions for giving the right people the right access at the right time.

    Lastly, as opposed to previous eras, these days even startups have varied infrastructure components holding data. If you’re a PM tasked with bringing insights, how do you know where to start? What data assets does the organization have?

    Besides these problem statements being tackled, MDS builds a culture of upskilling employees in various data concepts.

    Data security, governance, and data lineage are important irrespective of department or persona in the organization.

    From designers to support executives, the need for a data-driven culture is a given.

    You’re probably bored of hearing how good the MDS is and want to deconstruct it into its components.

    Let’s dive in.

    SOURCES

    In our modern era, every product is inevitably becoming a tech product.

    From a smart bulb to an orbiting satellite, each generates data in its own unique flavor of frequency of generation, data format, data size, etc.

    Social media, microservices, IoT devices, smart devices, DBs, CRMs, ERPs, flat files, and a lot more…

    INGESTION

    Post creation of data, how does one “ingest” or take in that data for actual usage? (the whole point of investing).

    Roughly, there are three categories to help describe the ingestion solutions:

    Generic tools allow us to connect various data sources with data storages.

    E.g.: we can connect Google Ads or Salesforce to dump data into BigQuery or S3.

    These generic tools highlight the modularity and low or no code barrier aspect in MDS.

    Things are as easy as drag and drop, and one doesn’t need to be fluent in scripting.

    Programmable – then we have programmable tools, where we get more control over how we ingest data through code.

    For example, we can write Apache Airflow DAGs in Python to load data from S3 and dump it to Redshift.

    Intermediary – these tools cater to a specific use case or are coupled with the source itself.

    E.g. – Snowpipe, a part of Snowflake itself, allows us to load data from files as soon as they’re available at the source.

    DATA STORAGE

    Where do you ingest data into?

    Here, we’ve expanded from HDFS & SQL DBs to a wider variety of formats (NoSQL, document DB).

    Depending on the use case and the way you interact with data, you can choose from a data warehouse (DW), database (DB), data lake (DL), object store, etc.

    You might need a standard relational DB for transactions in finance, or you might be collecting logs. You might be experimenting with your product at an early stage and be fine with NoSQL without worrying about prescribing schemas.

    One key feature to note is that most are cloud-based. So, we no longer worry about scalability, and we pay only for what we use.

    PS: Do stick around till the end for new concepts of Lake House and reverse ETL (already prevalent in the industry).

    DATA TRANSFORMATION

    The stored raw data must be cleaned and restructured into the shape we deem best for actual usage. This slicing and dicing is different for every kind of data.

    For example, we have tools for the E-T-L way, which can be categorized into SaaS and Frameworks, e.g., Fivetran and Spark respectively.

    Interestingly, the cloud era has given storage computational capability such that we don’t even need an external system for transformation, sometimes.

    With this rise of E-LT, we leverage the processing capabilities of cloud data warehouses or lakehouses. Using tools like dbt, we write templated SQL queries to transform our data in the warehouse or lakehouse itself.

    This enables analysts to do much of the heavy lifting that traditionally fell to data engineers.
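
    As a rough illustration of what “templated SQL” means in the ELT approach, here is a minimal sketch. The renderSql helper, the template, and the table names are all hypothetical; tools like dbt use a much richer templating engine (Jinja), so treat this only as a picture of the idea, not as dbt’s API.

```javascript
// Hypothetical, minimal template renderer: replaces {{ name }} placeholders
// with values from `vars`. This is NOT dbt's API, only a sketch of the idea.
function renderSql(template, vars) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) => {
    if (!(name in vars)) {
      throw new Error(`Missing template variable: ${name}`);
    }
    return String(vars[name]);
  });
}

// The transformation is plain SQL that runs inside the warehouse
// (the "T" happens after the "L").
const dailyOrdersTemplate = [
  'CREATE TABLE {{ target }} AS',
  'SELECT order_date, COUNT(*) AS orders',
  'FROM {{ source }}',
  'GROUP BY order_date',
].join('\n');

const sql = renderSql(dailyOrdersTemplate, {
  source: 'raw.orders',             // assumed raw table loaded by the ingestion step
  target: 'analytics.daily_orders', // assumed output table
});

console.log(sql);
```

    The rendered statement would then be submitted to the warehouse, which does the actual computation; no external processing cluster is involved.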

    We also see stream processing where we work with applications where “micro” data is processed in real time (analyzed as soon as it’s produced, as opposed to large batches).

    DATA VISUALIZATION

    The ability to visually learn from data has only improved in the MDS era with advanced design, methodology, and integration.

    With Embedded analytics, one can integrate analytical capabilities and data visualizations into the software application itself.

    External analytics tools, on the other hand, sit outside your application and build on your processed data. You choose your source, create a chart, and let it run.

    DATA SCIENCE, MACHINE LEARNING, MLOps

    Source: https://medium.com/vertexventures/thinking-data-the-modern-data-stack-d7d59e81e8c6

    In the last decade, we have moved beyond ad-hoc insight generation in Jupyter notebooks to production-ready, real-time ML workflows, like recommendation systems and price predictions. Any startup can and does integrate ML into its products.

    Most cloud service providers offer machine learning models and automated model building as a service.

    MDS concepts like data observability are used to build tools for ML practitioners, whether it’s feature stores (a feature store is a central repository that provides entity values as of a certain time) or model monitoring (checking data drift, tracking model performance, and improving model accuracy).

    This is extremely important, as statisticians can focus on the business problem rather than on infrastructure.

    This is an ever-expanding field where concepts such as MLOps (DevOps for ML pipelines: optimizing workflows, efficient transformations) and synthetic media (using AI to generate content itself) arrive and quickly become mainstream.

    ChatGPT is the current buzz, but by the time you’re reading this, I’m sure there’s going to be an updated one—such is the pace of development.

    DATA ORCHESTRATION

    With a higher number of modularized tools and source systems comes greater complexity.

    More steps, processes, connections, settings, and synchronization are required.

    Data orchestration in MDS needs to be Cron on steroids.

    Using a wide variety of products, MDS tools help bring the right data for the right purposes based on complex logic.
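
    To make the “cron on steroids” idea concrete: unlike cron, an orchestrator has to respect dependencies between tasks. The sketch below is only an illustration under that assumption; the task names are hypothetical, and it shows just the ordering step (a depth-first topological sort), not scheduling, retries, or anything a real orchestration tool provides.

```javascript
// Hypothetical pipeline: each task lists the tasks it depends on.
const pipeline = {
  ingest_orders: [],
  ingest_customers: [],
  transform_sales: ['ingest_orders', 'ingest_customers'],
  refresh_dashboard: ['transform_sales'],
};

// Returns task names in an order where every task comes after its dependencies
// (a depth-first topological sort). Throws if the dependencies form a cycle.
function executionOrder(tasks) {
  const order = [];
  const state = {}; // undefined = unvisited, 'visiting' = in progress, 'done' = finished

  function visit(name) {
    if (state[name] === 'done') return;
    if (state[name] === 'visiting') {
      throw new Error(`Dependency cycle detected at task: ${name}`);
    }
    if (!Object.prototype.hasOwnProperty.call(tasks, name)) {
      throw new Error(`Unknown task: ${name}`);
    }
    state[name] = 'visiting';
    tasks[name].forEach((dependency) => visit(dependency));
    state[name] = 'done';
    order.push(name);
  }

  Object.keys(tasks).forEach((name) => visit(name));
  return order;
}

console.log(executionOrder(pipeline));
```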

     

    DATA OBSERVABILITY

    Data observability is the ability to monitor and understand the state and behavior of data as it flows through an organization’s systems.

    In a traditional data stack, organizations often rely on reactive approaches to data management, only addressing issues as they arise. In contrast, data observability in an MDS involves adopting a proactive mindset, where organizations actively monitor and understand the state of their data pipelines to identify potential issues before they become critical.

    Monitoring – a dashboard that provides an operational view of your pipeline or system

    Alerting – both for expected events and anomalies 

    Tracking – ability to set and track specific events

    Analysis – automated issue detection that adapts to your pipeline and data health

    Logging – a record of an event in a standardized format for faster resolution

    SLA Tracking – Measure data quality against predefined standards (cost, performance, reliability)

    Data Lineage – graph representation of data assets showing upstream/downstream steps.
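
    The data lineage item above can be pictured as a small directed graph. The following is a minimal sketch, assuming made-up asset names and a plain adjacency list; real observability tools derive this graph automatically from query logs or pipeline metadata.

```javascript
// Hypothetical lineage edges: each asset maps to the assets built directly from it.
const lineage = {
  'raw.orders': ['staging.orders'],
  'staging.orders': ['analytics.daily_orders', 'analytics.revenue'],
  'analytics.daily_orders': ['dashboard.sales_overview'],
  'analytics.revenue': ['dashboard.sales_overview'],
  'dashboard.sales_overview': [],
};

// Breadth-first walk that collects every asset downstream of `start`.
function downstreamOf(graph, start) {
  const seen = new Set();
  const queue = [...(graph[start] || [])];
  while (queue.length > 0) {
    const asset = queue.shift();
    if (seen.has(asset)) continue;
    seen.add(asset);
    queue.push(...(graph[asset] || []));
  }
  return [...seen];
}

console.log(downstreamOf(lineage, 'staging.orders'));
```

    Answering “what breaks if this table is wrong?” is then a single call on the affected asset.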

    DATA GOVERNANCE & SECURITY

    Data security is a critical consideration for organizations of all sizes and industries and needs to be prioritized to protect sensitive information, ensure compliance, and preserve business continuity. 

    The arrival of stricter data protection regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), created a huge need in the market for MDS tools that help organizations govern and secure their data efficiently and painlessly.

    DATA CATALOG

    Now that we have all the components of MDS, from ingestion to BI, we have so many sources, as well as dashboards, reports, views, and other metadata, that we need a Google-like search engine just to navigate our components.

    This is where a data catalog helps; it allows people to stitch together the metadata (data about your data: the number of rows in your table, the column names, types, etc.) across sources.

    This is necessary to help efficiently discover, understand, trust, and collaborate on data assets.

    We don’t want PMs & GTM to look at different dashboards for adoption data.

    Take Netflix as an example. Previously, the sole purpose of its original data pipeline was to aggregate and upload events to Hadoop/Hive for batch processing. Chukwa collected events and wrote them to S3 in Hadoop sequence file format. In those days, end-to-end latency was up to 10 minutes. That was sufficient for batch jobs, which usually scan data at daily or hourly frequency.

    With the emergence of Kafka and Elasticsearch over the last decade, there has been a growing demand for real-time analytics at Netflix. By real-time, we mean sub-minute latency. Instead of starting from scratch, Netflix was able to iteratively grow its MDS as market requirements changed.

    Source: https://blog.transform.co/data-talks/the-metric-layer-why-you-need-it-examples-and-how-it-fits-into-your-modern-data-stack/

     

    This is a snapshot of the MDS that a data-mature company like Netflix had some years back, where instead of a few all-in-one tools, each data category was solved by a specialized tool.

    FUTURE COMPONENTS OF MDS?

    DATA MESH

    Source: https://martinfowler.com/articles/data-monolith-to-mesh.html

    The top picture shows how teams currently operate, where no matter the feature or product on the Y axis, the data pipeline’s journey remains the same moving along the X. But in an ideal world of data mesh, those who know the data should own its journey.

    As decentralization is the name of the game, data mesh is MDS’s response to this demand for an architecture shift where domain owners use self-service infrastructure to shape how their data is consumed.

    DATA LAKEHOUSE

    Source: https://www.altexsoft.com/blog/data-lakehouse/

    We have talked about data warehouses and data lakes being used for data storage.

    Initially, when we only needed structured data, data warehouses were used. Later, with big data, we started getting all kinds of data, structured and unstructured.

    So, we started using Data Lakes, where we just dumped everything.

    The lakehouse tries to combine the best of both worlds by adding an intelligent metadata layer on top of the data lake. This layer basically classifies and categorizes data such that it can be interpreted in a structured manner.

    Also, all the data in the lakehouse is open, meaning that it can be used by all kinds of tools. Lakehouses are generally built on top of open data formats like Parquet so that any tool can easily access the data.

    End users can simply run their SQL queries as if they were querying a DWH.

    REVERSE ETL

    Suppose you’re a salesperson using Salesforce and want to know if a lead you just got is warm or cold (warm indicating a higher chance of conversion).

    The attributes of your lead, like salary and age, are fetched from your OLTP system into a DWH and analyzed, and then the flag “warm” is sent back to the Salesforce UI, ready to be used in live operations.
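
    The flow above can be sketched in a few lines. Everything here is an assumption made for illustration: the scoring thresholds are invented, and leadId and leadTemperature are hypothetical field names rather than real Salesforce fields. The point is only the direction of travel: warehouse rows in, operational updates out.

```javascript
// Hypothetical scoring rule evaluated on warehouse data; the thresholds are made up.
function scoreLead(lead) {
  const isWarm = lead.salary >= 80000 && lead.age >= 25 && lead.age <= 55;
  return isWarm ? 'warm' : 'cold';
}

// Turns warehouse rows into the updates a sync job would push back to the CRM.
// `leadId` and `leadTemperature` are assumed field names, not real Salesforce fields.
function buildCrmUpdates(warehouseRows) {
  return warehouseRows.map((row) => ({
    leadId: row.leadId,
    leadTemperature: scoreLead(row),
  }));
}

const warehouseRows = [
  { leadId: 'L-001', salary: 95000, age: 34 },
  { leadId: 'L-002', salary: 40000, age: 22 },
];

console.log(buildCrmUpdates(warehouseRows));
```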

    METRICS LAYER

    The metrics layer is all about consistency, accessibility, and trust in how metrics are calculated.

    Earlier, metrics lived in Excel files versioned as v1, v1.1, and so on, with logic scattered around.

    Currently, in the modern data stack world, each team’s calculation is isolated in the tool they are used to. For example, BI would store metrics in Tableau dashboards while DEs would use code.

    A metrics layer exists to make each metric globally accessible to every other tool in the data stack.

    For example, the dbt metrics layer helps define these in the warehouse, somewhere accessible to both BI and engineers. Similarly, Looker, Mode, and others have their own approaches to it.
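
    As a simplified picture of the idea (not how dbt, Looker, or Mode actually implement it), a metrics layer boils down to one shared definition that every consumer evaluates instead of re-implementing. The registry, metric name, and sample rows below are hypothetical.

```javascript
// One shared, hypothetical metric definition that every consumer reads.
const metrics = {
  conversion_rate: {
    description: 'Share of leads that became customers',
    compute: (rows) => {
      if (rows.length === 0) return 0;
      const converted = rows.filter((row) => row.converted).length;
      return converted / rows.length;
    },
  },
};

// Both a dashboard and a pipeline would call this instead of re-implementing the logic.
function evaluateMetric(name, rows) {
  const metric = metrics[name];
  if (!metric) throw new Error(`Unknown metric: ${name}`);
  return metric.compute(rows);
}

const leads = [
  { converted: true },
  { converted: false },
  { converted: false },
  { converted: true },
];

console.log(evaluateMetric('conversion_rate', leads)); // 0.5
```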

    In summary, this blog post discussed the modern data stack and its advantages over older approaches. We examined the components of the modern data stack, including data sources, ingestion, transformation, and more, and how they work together to create an efficient and effective system for data management and analysis. We also highlighted the benefits of the modern data stack, including increased efficiency, scalability, and flexibility. 

    As technology continues to advance, the modern data stack will evolve and incorporate new components and capabilities.

  • An Introduction To Cloudflare Workers And Cloudflare KV store

    Cloudflare Workers

    This post gives a brief introduction to Cloudflare Workers and Cloudflare KV store. They address a fairly common set of problems around scaling an application globally. There are standard ways of doing this but they usually require a considerable amount of upfront engineering work and developers have to be aware of the ‘scalability’ issues to some degree. Serverless application tools target easy scalability and quick response times around the globe while keeping the developers focused on the application logic rather than infra nitty-gritties.

    Global responsiveness

    When an application is expected to be accessed around the globe, requests from users sitting in different time-zones should take a similar amount of time. There can be multiple ways of achieving this depending upon how data intensive the requests are and what those requests actually do.

    Data-intensive requests are harder and more expensive to globalize, but not all requests are the same. Static requests, like getting a documentation page or a blog post, can be globalized by generating markup at build time and deploying it on a CDN.

    And there are semi-dynamic requests. They render static content either with some small amount of data, or their content changes based on the timezone the request came from.

    The above is a loose classification of requests, and there are exceptions; for example, not all static requests are presentational.

    Serverless frameworks are particularly useful in scaling static and semi-dynamic requests.
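
    A semi-dynamic request, as described above, can be sketched as a static template plus one small piece that depends on the caller’s timezone. This is only an illustration: the template, the greeting rules, and the function names are hypothetical, and the timezone is passed in explicitly rather than read from any particular platform.

```javascript
// Static part: the same markup for every visitor (could be served from a CDN).
const pageTemplate = '<h1>{{greeting}}, welcome to the docs</h1>';

// Dynamic part: picks a greeting from the local hour in the caller's time zone.
// `timeZone` is an IANA name such as "Asia/Tokyo"; `now` is injectable for testing.
function greetingFor(timeZone, now = new Date()) {
  const parts = new Intl.DateTimeFormat('en-US', {
    hour: 'numeric',
    hourCycle: 'h23',
    timeZone,
  }).formatToParts(now);
  const hour = Number(parts.find((part) => part.type === 'hour').value);
  if (hour < 12) return 'Good morning';
  if (hour < 18) return 'Good afternoon';
  return 'Good evening';
}

function renderPage(timeZone, now) {
  return pageTemplate.replace('{{greeting}}', greetingFor(timeZone, now));
}

const instant = new Date('2024-01-01T08:00:00Z');
console.log(renderPage('UTC', instant));        // 08:00 local time
console.log(renderPage('Asia/Tokyo', instant)); // 17:00 local time
```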

    Cloudflare Workers Overview

    Cloudflare Workers is essentially a function deployment service. It provides a serverless execution environment which can be used to develop and deploy small (although not necessarily) and modular cloud functions with minimal effort.

    It is easy to get started with Workers. First, let’s install wrangler, a tool for managing Cloudflare Worker projects.

    npm i @cloudflare/wrangler -g

    Wrangler handles all the standard stuff for you like project generation from templates, build, config, publishing among other things.

    A worker primarily contains 2 parts: an event listener that invokes a worker and an event handler that returns a response object. Creating a worker is as easy as adding an event listener to a button.

    addEventListener('fetch', event => {
        event.respondWith(handleRequest(event.request))
    })
    
    async function handleRequest(request) {
        return new Response("hello world")
    }

    Above is a simple hello world example. Wrangler can be used to build and get a live preview of your worker.

    wrangler build

    will build your worker. And 

    wrangler preview 

    can be used to take a live preview in the browser. The preview is only meant to be used for testing (either by you or others). If you want the workers to be triggered by your own domain or a workers.dev subdomain, you need to publish it.

    Publishing is fairly straightforward and requires very little configuration in both wrangler and your project.

    Wrangler Configuration

    Create an account on Cloudflare and get an API key. To configure wrangler, run:

    wrangler config

    It will ask for the registered email and API key, and you are good to go.

    To publish your worker on a workers.dev subdomain, just fill your account ID in the wrangler.toml and hit wrangler publish. The worker will be deployed and live at a generated workers.dev subdomain.

    Regarding Routes

    When you publish on a {script-name}.{subdomain}.workers.dev domain, the script or project associated with script-name will be invoked. There is no way to call a script just from {subdomain}.workers.dev.

    Workers KV

    Workers alone can’t be used to build anything complex without persistent storage; that’s where Workers KV comes into the picture. Workers KV, as the name suggests, is a low-latency, high-volume key-value store that is designed for efficient reads.

    It optimizes read latency by dynamically spreading the most frequently read entries to the edge (replicated in several regions) and storing less frequently read entries centrally.

    Newly added keys (a CREATE) are immediately reflected in every region, while a change to the value of an existing key (an UPDATE) may take as long as 60 seconds to propagate, depending upon the region.

    At the time of writing, Workers KV is only available to paid users of Cloudflare.

    Writing Data in Workers KV

    curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces" \
    -X POST \
    -H "X-Auth-Email: $CLOUDFLARE_EMAIL" \
    -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" \
    -H "Content-Type: application/json" \
    --data '{"title": "Requests"}'

    The above HTTP request will create a namespace named Requests. The response should look something like this:

    {
        "result": {
            "id": "30b52f55aafb41d88546d01d5f69440a",
            "title": "Requests",
            "supports_url_encoding": true
        },
        "success": true,
        "errors": [],
        "messages": []
    }

    Now we can write KV pairs in this namespace. The following HTTP requests will do the same:

    curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/values/first-key" \
    -X PUT \
    -H "X-Auth-Email: $CLOUDFLARE_EMAIL" \
    -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" \
    --data 'My first value!'

    Here the NAMESPACE_ID is the ID that we received in the response to the previous request, first-key is the key name, and ‘My first value!’ is the value.

    Let’s complicate things a little

    The overview above introduces Workers with a ‘hello world’ app and the basics of Workers KV; now let’s make something more complicated. We will make an app that tells you how many requests have been made from your country so far. For example, if you ping the worker from the US, it will return the number of requests made so far from the US.

    We will need:

    • Some place to store the count of requests for each country.
    • A way to find the country from which the Worker was invoked.

    For the first part, we will use Workers KV to store the request count for every country.

    Let’s start

    First, we will create a new project using wrangler: wrangler generate request-count.

    We will be making HTTP calls to write values in the Workers KV, so let’s add ‘node-fetch’ to the project:

    npm install node-fetch

    Now, how do we find which country each request is coming from? The answer is the cf object that is provided with each request to a worker.

    The cf object is a special object that is passed with each request and can be accessed with request.cf. It mainly contains region-specific information along with TLS and Auth information. The details of what cf provides can be found here.

    As we can see from the documentation, we can get country from

    request.cf.country.

    The cf object is not correctly populated in the wrangler preview, so you will need to publish your worker in order to test cf’s usage. An open issue mentioning the same can be found here.

    Now, the logic is pretty straightforward. When we get a request from a country for which we don’t have an entry in Workers KV, we create an entry with the value 1; otherwise, we increment the value of the country key.

    To use Workers KV, we need to create a namespace. A namespace is just a collection of key-value pairs where all the keys have to be unique.

    A namespace can be created under the KV tab in the Cloudflare web UI by giving it a name, or by using the API call above. You can also browse all of your namespaces from the web UI. The following API call can be used to read the value of a key from a namespace:

    curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/values/first-key" \
    -H "X-Auth-Email: $CLOUDFLARE_EMAIL" \
    -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY"

    But it is neither the fastest nor the easiest way. Cloudflare provides a better and faster way to read data from your namespaces, called binding. Each KV namespace can be bound to a worker script, which makes it available in the script under a variable name. Any namespace can be bound to any worker. A KV namespace can be bound to a worker from the worker’s editing menu in the Cloudflare UI.

    Following steps show you how to bind a namespace to a worker:

    Go to the edit page of the worker in Cloudflare web UI and click on the KV tab:

    Then add a binding by clicking the ‘Add binding’ button.

    You can select the namespace name and the variable name by which it will be bound. More details can be found here. A binding that I’ve made can be seen in the above image.

    That’s all we need to get this to work. Following is the relevant part of the script:

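    const fetch = require('node-fetch')

    addEventListener('fetch', event => {
        event.respondWith(handleRequest(event.request))
    })

    /**
     * Count a request against the caller's country and return the running total
     * @param {Request} request
     */
    async function handleRequest(request) {
        // Two-letter country code that Cloudflare attaches to each request
        const country = request.cf.country

        // REST endpoint for this key; replace account-id and namespace-id with your own
        const url = `https://api.cloudflare.com/client/v4/accounts/account-id/storage/kv/namespaces/namespace-id/values/${country}`

        // Read the current count through the bound namespace (null if the key is missing)
        let count = await requests.get(country)

        if (!count) {
            count = 1
        } else {
            count = parseInt(count, 10) + 1
        }

        try {
            // Write the new count back through the REST API
            const response = await fetch(url, {
                method: 'PUT',
                headers: {"X-Auth-Email": "email", "X-Auth-Key": "auth-key"},
                body: `${count}`
            })
            if (!response.ok) {
                return new Response(`KV write failed with status ${response.status}`, { status: 500 })
            }
        } catch (error) {
            return new Response(error.message, { status: 500 })
        }

        return new Response(`${country}: ${count}`, { status: 200 })
    }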

    In the above code, I bound the Requests namespace that we created to the requests variable, which is resolved dynamically when we publish.
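
    As a possible variation, the counter logic can be written against the binding alone, since a KV namespace binding exposes put as well as get. The sketch below is not the author’s implementation: incrementCountry and createMemoryKv are hypothetical helpers, and the in-memory store only stands in for a real namespace so the logic can be tried locally.

```javascript
// Increments the per-country counter using only the KV binding.
// `kv` is any object with async get(key) and put(key, value), which is the
// shape a Workers KV namespace binding exposes.
async function incrementCountry(kv, country) {
  const current = await kv.get(country); // null when the key does not exist yet
  const count = current === null ? 1 : parseInt(current, 10) + 1;
  await kv.put(country, String(count));
  return count;
}

// In-memory stand-in for a KV namespace, used here only to try the logic locally.
function createMemoryKv() {
  const store = new Map();
  return {
    async get(key) {
      return store.has(key) ? store.get(key) : null;
    },
    async put(key, value) {
      store.set(key, value);
    },
  };
}

// Local usage: two requests from the US, one from India.
(async () => {
  const kv = createMemoryKv();
  await incrementCountry(kv, 'US');
  console.log('US:', await incrementCountry(kv, 'US'));
  console.log('IN:', await incrementCountry(kv, 'IN'));
})();
```

    Inside a worker, the call would be incrementCountry(requests, request.cf.country). Note that this read-then-write sequence is not atomic, so with the propagation delay described earlier, concurrent requests can lose increments; the count should be treated as approximate.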

    The full source of this can be found here.

    This small application also demonstrates some of the practical aspects of Workers. For example, you will notice that updates take some time to be reflected and that the response time of the workers is quick, especially when they are deployed on a .workers.dev subdomain here.

    Side note: You will have to recreate the namespace-worker binding every time you deploy the worker with wrangler publish.

    Workers vs. AWS Lambda

    AWS Lambda has been a major player in the serverless market for a while now. So, how does Cloudflare Workers compare to it? Let’s see.

    Architecture:

    Cloudflare Workers uses `Isolates` instead of a container-based underlying architecture. `Isolates` is the technology that allows V8 (Google Chrome’s JavaScript engine) to run thousands of processes on a single server in an efficient and secure manner. This effectively translates into faster code execution and lower memory usage. More details can be found here.

    Price:

    The above mentioned architectural difference allows Workers to be significantly cheaper than Lambda. While a Worker offering 50 milliseconds of CPU costs $0.50 per million requests, the equivalent Lambda costs $1.84 per million. A more detailed price comparison can be found here.
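
    To put the per-million prices quoted above into perspective, here is a small worked calculation. The traffic figure of 10 million requests per month is an assumption chosen for illustration, and the calculation ignores free tiers and any other charges.

```javascript
// Per-million-request prices quoted above, expressed in cents to avoid float rounding.
const CENTS_PER_MILLION = { workers: 50, lambda: 184 };

// Cost in dollars for a given number of requests on the given platform.
function costInDollars(platform, requests) {
  const cents = (CENTS_PER_MILLION[platform] * requests) / 1e6;
  return cents / 100;
}

const monthlyRequests = 10e6; // assumed traffic: 10 million requests per month
console.log(costInDollars('workers', monthlyRequests)); // 5
console.log(costInDollars('lambda', monthlyRequests));  // 18.4
```

    At that assumed volume the difference is $13.40 per month, and it scales linearly with traffic.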

    Speed:

    Workers also show significantly better performance numbers than Lambda and Lambda@Edge. According to tests run by Cloudflare, they are 441% faster than Lambda and 192% faster than Lambda@Edge. A detailed performance comparison can be found here.

    This better performance is also confirmed by serverless-benchmark.

    Wrapping Up:

    As we have seen, Cloudflare Workers along with the KV store make it very easy to get started with a serverless application. They provide strong performance at a lower cost, along with intuitive deployment. These properties make them a good fit for building globally accessible serverless applications.

  • How Much Do You Really Know About Simplified Cloud Deployments?

    Is your EC2/VM bill giving you sleepless nights?

    Are your EC2 instances under-utilized? Have you been wondering if there was an easy way to maximize the EC2/VM usage?

    Are you investing too much in your Control Plane and wish you could divert some of that investment towards developing more features in your applications (business logic)?

    Is your Configuration Management system overwhelming you, and does it seem to have taken on a life of its own?

    Do you have legacy applications that do not need Docker at all?

    Would you like to simplify your deployment toolchain to streamline your workflows?

    Have you been advised to use Kubernetes as a solution to all your woes, but you aren’t sure if Kubernetes is actually going to help you?

    Do you feel you are moving towards Docker, just so that Kubernetes can be used?

    If you answered “Yes” to any of the questions above, do read on, this article is just what you might need.

    There are steps to create a simple setup on your laptop at the end of the article.

    Introduction

    In this article, we will present the typical components of a multi-tier application and how it is set up and deployed.

    We shall further go on to see how the same application deployment can be remodeled for scale using any Cloud Infrastructure. (The same software toolchain can be used to deploy the application on your On-Premise Infrastructure as well)

    The tools that we propose are Nomad and Consul. We shall focus more on how to use these tools, rather than deep-dive into the specifics of the tools. We will briefly see the features of the software which would help us achieve our goals.

    • Nomad is a distributed workload manager, not only for Docker containers, but also for various other types of workloads like legacy applications, Java, LXC, etc.

    More about Nomad Drivers here: Nomadproject.io, application delivery with HashiCorp, introduction to HashiCorp Nomad.

    • Consul is a distributed service mesh, with features like service registry and a key-value store, among others.

    Using these tools, the application/startup workflow would be as follows:

    Nomad will be responsible for starting the service.

    Nomad will publish the service information in Consul. The service information will include details like:

    • Where is the application running (IP:PORT)?
    • What “service-name” is used to identify the application?
    • What “tags” (metadata) does this application have?
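
    To make these three pieces of information concrete, here is a minimal sketch that prints them in the shape of a Consul service definition. Nomad performs the registration for you, so you would not normally write this by hand; the function name `service_definition` and all the values passed to it are hypothetical and used only for illustration.

```shell
# service_definition NAME ADDRESS PORT TAG
# Prints a Consul-style service definition (JSON) carrying the
# service name, the IP:PORT where it runs, and one tag.
# The helper and its arguments are illustrative assumptions.
service_definition() {
    printf '{"service": {"name": "%s", "address": "%s", "port": %s, "tags": ["%s"]}}\n' \
        "$1" "$2" "$3" "$4"
}

# Example: a service named "service-a" running on 192.168.1.201:23382.
service_definition "service-a" "192.168.1.201" 23382 "primary"
```

    Anything that can read the Consul catalog can use the same fields to find out where the service is currently running.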

    A Typical Application

    A typical application deployment consists of a fixed set of processes, usually coupled with a database and a set of a few (or many) peripheral services.

    These services could be primary (must-have) or support (optional) features of the application.

    Note: We are aware of what a proper “service-oriented-architecture” should look like, though we will skip that discussion for now. We will rather focus on how real-world applications are set up and deployed.

    Simple Multi-tier Application

    In this section, let’s see the components of a multi-tier application along with typical access patterns from outside the system and within the system.

    • Load Balancer/Web/Front End Tier
    • Application Services Tier
    • Database Tier
    • Utility (or Helper Servers): To run background, cron, or queued jobs.

    Using a proxy/loadbalancer, the services (Service-A, Service-B, Service-C) could be accessed using distinct hostnames:

    • a.example.tld
    • b.example.tld
    • c.example.tld

    For an equivalent path-based routing approach, the setup would be similar. Instead of distinct hostnames, the communication mechanism would be:

    • common-proxy.example.tld/path-a/
    • common-proxy.example.tld/path-b/
    • common-proxy.example.tld/path-c/
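
    The two routing approaches above can be sketched as two small lookup functions. This is only an illustration of the decision a proxy makes, not a real proxy configuration; the function names `route_by_host` and `route_by_path` are hypothetical.

```shell
# route_by_host HOSTNAME -> name of the backend service
route_by_host() {
    case "$1" in
        a.example.tld) echo "Service-A" ;;
        b.example.tld) echo "Service-B" ;;
        c.example.tld) echo "Service-C" ;;
        *)             echo "no-route"  ;;
    esac
}

# route_by_path REQUEST_PATH -> name of the backend service
route_by_path() {
    case "$1" in
        /path-a/*) echo "Service-A" ;;
        /path-b/*) echo "Service-B" ;;
        /path-c/*) echo "Service-C" ;;
        *)         echo "no-route"  ;;
    esac
}

route_by_host "b.example.tld"
route_by_path "/path-c/index.html"
```

    In both cases the mapping is static and lives with the proxy, which is the part that becomes painful once the services start moving around.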

    Problem Scenario 1

    Some of the basic problems with the deployment of the simple multi-tier application are:

    • What if the service process crashes during its runtime?
    • What if the host on which the services run shuts down, reboots or terminates?

    This is where Nomad’s feature of always keeping the service running would be useful.

    In spite of this auto-restart feature, there could be issues if the service restarts on a different machine (i.e. different IP address).

    In case of Docker and ephemeral ports, the service could start on a different port as well.

    To solve this, we will use the service discovery feature provided by Consul, combined with a Consul-aware load-balancer/proxy to redirect traffic to the appropriate service.

    The order of the operations within the Nomad job will thus be:

    • Nomad will launch the job/task.
    • Nomad will register the task details as a service definition in Consul.
      (These steps will be re-executed if/when the application is restarted due to a crash/fail-over)
    • The Consul-aware load-balancer will route the traffic to the service (IP:PORT)

    Multi-tier Application With Load Balancer

    Using the Consul-aware load-balancer, the diagram will now look like:

    The details of the setup now are:

    • A Consul-aware load-balancer/proxy; the application will access the services via the load-balancer.
    • 3 (three) instances of service A; A1, A2, A3
    • 3 (three) instances of service B; B1, B2, B3

    The Routing Question

    At this moment, you could be wondering, “Why/How would the load-balancer know that it has to route traffic for service-A to A1/A2/A3 and route traffic for service-B to B1/B2/B3?”

    The answer lies in the Consul tags which will be published as part of the service definition (when Nomad registers the service in Consul).

    The appropriate Consul tags will tell the load-balancer to route traffic of a particular service to the appropriate backend. (+++)

    Let’s read that statement again (very slowly, just to be sure): the Consul tags, which are part of the service definition, will tell (advertise to) the load-balancer how to route traffic to the appropriate backend.

    It is important to dwell upon this distinction, as this is different from how classic load-balancer/proxy software like HAProxy or NGINX is configured. For HAProxy/NGINX, the backend routing information resides with the load-balancer instance and is not “advertised” by the backend.

    Traditional load-balancers like NGINX/HAProxy do not natively support dynamic reloading of the backends (when the backends stop, start or move around). The heavy lifting of regenerating the configuration file and reloading the service is left up to an external entity like Consul-Template.

    The use of a Consul-aware load-balancer, instead of a traditional load-balancer, eliminates the need for external workarounds.

    The setup can thus be termed a zero-configuration setup; you don’t have to re-configure the load-balancer, as it will discover the changing backend services based on the information available from Consul.
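
    As a concrete illustration, Fabio (the load-balancer used in the demo later in this article) looks for service tags that begin with `urlprefix-` and builds its routes from them. The sketch below imitates that idea in a few lines of shell. It is a simplified illustration and not Fabio’s implementation; the helper name `route_for_tag` and the tag values are assumptions, and the tags in the demo job files may differ.

```shell
# route_for_tag SERVICE ADDR_PORT TAG
# Prints a routing-table line for a tag that starts with "urlprefix-";
# prints nothing for any other tag. Hypothetical helper for illustration.
route_for_tag() {
    case "$3" in
        urlprefix-*) echo "route add $1 ${3#urlprefix-} http://$2/" ;;
    esac
}

# Service "foo" advertises one routing tag and one unrelated tag;
# only the routing tag produces a route.
for tag in "urlprefix-/foo" "team-web"; do
    route_for_tag "foo" "192.168.1.201:23382" "$tag"
done
```

    The point to notice is that the routing rule travels with the service: when the service restarts on another IP:PORT, the same tag is registered again and the route follows it.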

    Problem Scenario 2

    So far we have achieved a method to “automatically” discover the backends, but isn’t the Load-Balancer itself a single-point-of-failure (SPOF)?

    It absolutely is, and you should always have redundant load-balancer instances (which is what any cloud-provided load-balancer has).

    As there is a certain cost associated with using a “cloud-provided load-balancer”, we would create the load-balancers ourselves and not use cloud-provided load-balancers.

    To provide redundancy to the load-balancer instances, you should configure them using an Auto Scaling Group (AWS), VM Scale Sets (Azure), etc.

    The same redundancy strategy should also be used for the worker nodes, where the actual services reside, by using AutoScaling Groups/VMSS for the worker nodes.

    The Complete Picture

    Installation and Configuration

    Given that nowadays laptops are pretty powerful, you can easily create a test setup on your laptop using VirtualBox, VMware Workstation Player, VMware Workstation, etc.

    As a prerequisite, you will need a few virtual machines which can communicate with each other.

    NOTE: Create the VMs with networking set to bridged mode.

    The machines needed for the simple setup/demo would be:

    • 1 Linux VM to act as a server (srv1)
    • 1 Linux VM to act as a load-balancer (lb1)
    • 2 Linux VMs to act as worker machines (client1, client2)

    *** Each machine can have 2 CPUs and 1 GB of memory.

    The configuration files and scripts needed for the demo, which will help you set up the Nomad and Consul cluster, are available here.

    Set Up the Server

    Install the binaries on the server

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Server Configuration

    ### On the server machine ...
    
    ### Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/server.hcl -O /etc/consul/server.hcl
    
    ### Edit Consul's server.hcl file and set the fields 'encrypt' and 'retry_join' as per your cluster.
    sudo vim /etc/consul/server.hcl
    
    ### Nomad
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/server.hcl -O /etc/nomad/server.hcl
    
    ### Edit Nomad's server.hcl file and set the fields 'encrypt' and 'retry_join' as per your cluster.
    sudo vim /etc/nomad/server.hcl
    
    ### After you are done with the edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members
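
    The `encrypt` field mentioned in the comments above holds a shared, base64-encoded gossip key. Consul ships a `consul keygen` command for generating one, and Nomad has `nomad operator keygen`. As a minimal sketch of what such a key looks like, the helper below produces a random 32-byte key by hand; the name `gossip_key` is hypothetical, and the 32-byte length is an assumption that matches recent Consul releases, so check which key length your Nomad version expects.

```shell
# gossip_key -> prints a random 32-byte key, base64-encoded.
# Hypothetical helper; `consul keygen` is the usual way to do this.
gossip_key() {
    head -c 32 /dev/urandom | base64
}

KEY="$(gossip_key)"
# Print the line in the form used by the HCL configuration file.
echo "encrypt = \"$KEY\""
```

    Every agent in the same cluster must be configured with the same key, so generate it once and reuse it on the server, the load-balancer and the workers.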

    Set Up the Load-Balancer

    Install the binaries on the load-balancer

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Load-Balancer Configuration

    ### On the load-balancer machine ...
    
    ### for Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/client.hcl -O /etc/consul/client.hcl
    
    ### Edit Consul's client.hcl file and set the fields 'name', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/consul/client.hcl
    
    ### for Nomad ...
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/client.hcl -O /etc/nomad/client.hcl
    
    ### Edit Nomad's client.hcl file and set the fields 'name', 'node_class', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/nomad/client.hcl
    
    ### After you are done with the edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members
    sudo nomad node status -verbose
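
    The fixed `sleep 10` above is simple, but on a slow machine the agents may need longer to come up. One alternative is to poll until the command succeeds. The sketch below is a generic retry helper; the name `wait_for` and the attempt/delay values in the usage comments are assumptions, not part of the demo scripts.

```shell
# wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on success, 1 if every attempt failed.
wait_for() {
    max="$1"; delay="$2"; shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Usage on a cluster node (left commented because it needs running agents):
# wait_for 10 2 consul members && consul members
# wait_for 10 2 nomad node status && nomad node status -verbose
```

    This relies on the commands returning a non-zero exit status while the local agent is still unreachable.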

    Set Up the Client (Worker) Machines

    Install the binaries on the client (worker) machines

    # install the Consul binary
    wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip -O consul.zip
    unzip -o consul.zip
    sudo chown root:root consul
    sudo mv -fv consul /usr/sbin/
    
    # install the Nomad binary
    wget https://releases.hashicorp.com/nomad/0.11.3/nomad_0.11.3_linux_amd64.zip -O nomad.zip
    unzip -o nomad.zip
    sudo chown root:root nomad
    sudo mv -fv nomad /usr/sbin/
    
    # install Consul's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/consul.service -O consul.service
    sudo chown root:root consul.service
    sudo mv -fv consul.service /etc/systemd/system/consul.service
    
    # install Nomad's service file
    wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/systemd/nomad.service -O nomad.service
    sudo chown root:root nomad.service
    sudo mv -fv nomad.service /etc/systemd/system/nomad.service

    Create the Worker Configuration

    ### On the client (worker) machine ...
    
    ### Consul 
    sudo mkdir -p /etc/consul/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/consul/client.hcl -O /etc/consul/client.hcl
    
    ### Edit Consul's client.hcl file and set the fields 'name', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/consul/client.hcl
    
    ### Nomad
    sudo mkdir -p /etc/nomad/
    sudo wget https://raw.githubusercontent.com/shantanugadgil/hashistack/master/config/nomad/client.hcl -O /etc/nomad/client.hcl
    
    ### Edit Nomad's client.hcl file and set the fields 'name', 'node_class', 'encrypt', 'retry_join' as per your cluster.
    sudo vim /etc/nomad/client.hcl
    
    ### After you are sure about your edits ...
    
    sudo systemctl daemon-reload
    sudo systemctl enable consul nomad
    sudo systemctl restart consul nomad
    sleep 10
    sudo consul members
    sudo nomad server members
    sudo nomad node status -verbose

    Test the Setup

    For the sake of simplicity, we shall assume the following IP addresses for the machines. (You can adapt the IPs as per your actual cluster configuration)

    srv1: 192.168.1.11

    lb1: 192.168.1.101

    client1: 192.168.1.201

    client2: 192.168.1.202

    You can access the web GUI for Consul and Nomad at the following URLs:

    Consul: http://192.168.1.11:8500

    Nomad: http://192.168.1.11:4646

    Log in to the server and start the following watch command:

    # watch -n 5 "consul members; echo; nomad server members; echo; nomad node status -verbose; echo; nomad job status"

    Output:

    Node     Address             Status  Type    Build  Protocol  DC   Segment
    srv1     192.168.1.11:8301   alive   server  1.5.1  2         dc1  <all>
    client1  192.168.1.201:8301  alive   client  1.5.1  2         dc1  <default>
    client2  192.168.1.202:8301  alive   client  1.5.1  2         dc1  <default>
    lb1      192.168.1.101:8301  alive   client  1.5.1  2         dc1  <default>
    
    Name         Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
    srv1.global  192.168.1.11  4648  alive   true    2         0.9.3  dc1         global
    
    ID           DC   Name     Class   Address        Version Drain  Eligibility  Status
    37daf354...  dc1  client2  worker  192.168.1.202  0.9.3  false  eligible     ready
    9bab72b1...  dc1  client1  worker  192.168.1.201  0.9.3  false  eligible     ready
    621f4411...  dc1  lb1      lb      192.168.1.101  0.9.3  false  eligible     ready

    Submit Jobs

    Log in to the server (srv1) and download the sample jobs

    Run the load-balancer job

    # nomad run fabio_docker.nomad

    Output:

    ==> Monitoring evaluation "bb140467"
        Evaluation triggered by job "fabio_docker"
        Allocation "1a6a5587" created: node "621f4411", group "fabio"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "bb140467" finished with status "complete"

    Check the status of the load-balancer

    # nomad alloc status 1a6a5587

    Output:

    ID                  = 1a6a5587
    Eval ID             = bb140467
    Name                = fabio_docker.fabio[0]
    Node ID             = 621f4411
    Node Name           = lb1
    Job ID              = fabio_docker
    Job Version         = 0
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 1m9s ago
    Modified            = 1m3s ago
    
    Task "fabio" is "running"
    Task Resources
    CPU        Memory          Disk     Addresses
    5/200 MHz  10 MiB/128 MiB  300 MiB  lb: 192.168.1.101:9999
                                        ui: 192.168.1.101:9998
    
    Task Events:
    Started At     = 2019-06-13T19:15:17Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-13T19:15:17Z  Started     Task started by client
    2019-06-13T19:15:12Z  Driver      Downloading image
    2019-06-13T19:15:12Z  Task Setup  Building Task Directory
    2019-06-13T19:15:12Z  Received    Task received by client

    Run the service ‘foo’

    # nomad run foo_docker.nomad

    Output:

    ==> Monitoring evaluation "a994bbf0"
        Evaluation triggered by job "foo_docker"
        Allocation "7794b538" created: node "9bab72b1", group "gowebhello"
        Allocation "eecceffc" modified: node "37daf354", group "gowebhello"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "a994bbf0" finished with status "complete"

    Check the status of service ‘foo’

    # nomad alloc status 7794b538

    Output:

    ID                  = 7794b538
    Eval ID             = a994bbf0
    Name                = foo_docker.gowebhello[1]
    Node ID             = 9bab72b1
    Node Name           = client1
    Job ID              = foo_docker
    Job Version         = 1
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 9s ago
    Modified            = 7s ago
    
    Task "gowebhello" is "running"
    Task Resources
    CPU        Memory           Disk     Addresses
    0/500 MHz  4.2 MiB/256 MiB  300 MiB  http: 192.168.1.201:23382
    
    Task Events:
    Started At     = 2019-06-13T19:27:17Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-13T19:27:17Z  Started     Task started by client
    2019-06-13T19:27:16Z  Task Setup  Building Task Directory
    2019-06-13T19:27:15Z  Received    Task received by client

    Run the service ‘bar’

    # nomad run bar_docker.nomad

    Output:

    ==> Monitoring evaluation "075076bc"
        Evaluation triggered by job "bar_docker"
        Allocation "9f16354b" created: node "9bab72b1", group "gowebhello"
        Allocation "b86d8946" created: node "37daf354", group "gowebhello"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "075076bc" finished with status "complete"

    Check the status of service ‘bar’

    # nomad alloc status 9f16354b

    Output:

    ID                  = 9f16354b
    Eval ID             = 075076bc
    Name                = bar_docker.gowebhello[1]
    Node ID             = 9bab72b1
    Node Name           = client1
    Job ID              = bar_docker
    Job Version         = 0
    Client Status       = running
    Client Description  = Tasks are running
    Desired Status      = run
    Desired Description = <none>
    Created             = 4m28s ago
    Modified            = 4m16s ago
    
    Task "gowebhello" is "running"
    Task Resources
    CPU        Memory           Disk     Addresses
    0/500 MHz  6.2 MiB/256 MiB  300 MiB  http: 192.168.1.201:23646
    
    Task Events:
    Started At     = 2019-06-14T06:49:36Z
    Finished At    = N/A
    Total Restarts = 0
    Last Restart   = N/A
    
    Recent Events:
    Time                  Type        Description
    2019-06-14T06:49:36Z  Started     Task started by client
    2019-06-14T06:49:35Z  Task Setup  Building Task Directory
    2019-06-14T06:49:35Z  Received    Task received by client

    Check the Fabio Routes

    http://192.168.1.101:9998/routes

    Connect to the Services

    The services “foo” and “bar” are available at:

    http://192.168.1.101:9999/foo

    http://192.168.1.101:9999/bar

    Output:

    gowebhello root page
    
    https://github.com/udhos/gowebhello is a simple golang replacement for 'python -m SimpleHTTPServer'.
    Welcome!
    gowebhello version 0.7 runtime go1.12.5 os=linux arch=amd64
    Keepalive: true
    Application banner: Welcome to FOO
    ...
    ...

    Pressing F5 to refresh the browser should keep changing the backend service that you are eventually connected to.

    Conclusion

    This article should give you a fair idea about the common problems of a distributed application and how they can be solved.

    Remodeling an existing application deployment as it scales can be quite a challenge. Hopefully the sample/demo setup will help you to explore, design and optimize the deployment workflows of your application, be it on-premise or in any cloud environment.