Category: Services

  • Building A Scalable API Testing Framework With Jest And SuperTest

    Focus on API testing

    Before starting off, here are the reasons why API testing should be encouraged:

    • Identifies bugs before they reach the UI
    • Enables effective testing at a lower level instead of high-level, broad-stack testing
    • Reduces the future effort of fixing defects
    • Saves time

    QA practices are becoming more automation-centric as requirements evolve, but identifying the appropriate approach is the first and most essential step. This means choosing a framework or tool to develop a test setup that is:

    • Scalable 
    • Modular
    • Maintainable
    • Able to provide maximum test coverage
    • Extensible
    • Able to generate test reports
    • Easy to integrate with source control tool and CI pipeline

    To attain this goal, why not develop your own asset rather than relying on ready-made tools like Postman or JMeter? Let’s have a look at why you should choose writing your own code over depending on the API testing tools available in the market:

    1. Customizable
    2. Saves you from the trap of limitations of a ready-made tool
    3. Freedom to add configurations and libraries as required and not really depend on the specific supported plugins of the tool
    4. No limit on the usage and no question of cost
    5. No repetitive export-and-run cycle. Take Postman, for example: if we go with Newman (the CLI of Postman), the effort is likely to grow with changing requirements. Adding a new test requires editing the request in Postman, saving it to the collection, exporting the collection again and running the entire collection.json through Newman. Isn’t it tedious to repeat the same process every time?

    We can overcome this annoyance and meet our purpose with a self-built framework using Jest and SuperTest. Come on, let’s dive in!


    Why Jest?

    Jest is pretty impressive. 

    • High performance
    • Easy and minimal setup
    • Provides in-built assertion library and mocking support
    • Several in-built testing features without any additional configuration
    • Snapshot testing
    • Brilliant test coverage
    • Allows interactive watch mode (jest --watch or jest --watchAll)

    Hold on. Before moving forward, let’s quickly visit Jest configurations, Jest CLI commands, Jest globals and JavaScript async/await for a better understanding of what follows.
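    As a quick refresher, here is a minimal, hedged sketch of a Jest test using the describe/test/expect globals together with async/await (fetchGreeting is a made-up helper used purely for illustration, not part of the framework we are building):

    // A made-up async helper used only to illustrate the async/await style
    const fetchGreeting = async () => 'hello';

    describe('async/await with Jest globals', () => {
      test('resolves to the expected value', async () => {
        const greeting = await fetchGreeting();
        expect(greeting).toBe('hello'); // in-built assertion, no extra library needed
      });
    });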

    Ready, set, go!

    Create a node project jest-supertest locally and run npm init. In the workspace, we will install Jest, jest-stare (for generating custom test reports) and jest-serial-runner (to disable parallel execution, since our tests might be interdependent), and save these as dev dependencies.

    npm install jest jest-stare jest-serial-runner --save-dev

    Add the following tags to the scripts block in our package.json:

    
    "scripts": {
        "test": "NODE_TLS_REJECT_UNAUTHORIZED=0 jest --reporters default jest-stare --coverage --detectOpenHandles --runInBand --testTimeout=60000",
        "test:watch": "jest --verbose --watchAll"
      }

    npm run test command will invoke the test script with the following:

    • NODE_TLS_REJECT_UNAUTHORIZED=0: skips SSL certificate validation
    • jest: runs the framework with the configurations defined under the jest block
    • --reporters default jest-stare: uses the default reporter along with jest-stare
    • --coverage: invokes test coverage
    • --detectOpenHandles: for debugging
    • --runInBand: serial execution of Jest tests
    • --forceExit: to shut down cleanly
    • --testTimeout=60000: custom timeout (the default is 5000 milliseconds)

    Jest configurations:

    [Note: This is customizable as per requirements]

    "jest": {
        "verbose": true,
        "testSequencer": "/home/abc/jest-supertest/testSequencer.js",
        "coverageDirectory": "/home/abc/jest-supertest/coverage/my_reports/",
        "coverageReporters": ["html","text"],
        "coverageThreshold": {
          "global": {
            "branches": 100,
            "functions": 100,
            "lines": 100,
            "statements": 100
          }
        }
      }

    testSequencer: to invoke testSequencer.js in the workspace to customize the order of running our test files

    touch testSequencer.js

    The code below in testSequencer.js will run our test files in alphabetical order of their paths.

    const Sequencer = require('@jest/test-sequencer').default;
    
    class CustomSequencer extends Sequencer {
      sort(tests) {
        // Test structure information
        // https://github.com/facebook/jest/blob/6b8b1404a1d9254e7d5d90a8934087a9c9899dab/packages/jest-runner/src/types.ts#L17-L21
        const copyTests = Array.from(tests);
        return copyTests.sort((testA, testB) => (testA.path > testB.path ? 1 : -1));
      }
    }
    
    module.exports = CustomSequencer;

    • verbose: to display individual test results
    • coverageDirectory: creates a custom directory for coverage reports
    • coverageReporters: format of reports generated
    • coverageThreshold: minimum coverage thresholds enforced on the coverage results

    Testing endpoints with SuperTest

    SuperTest is a SuperAgent-driven Node library for extensively testing RESTful web services. It hits the HTTP server to send requests (GET, POST, PATCH, PUT, DELETE) and fetch responses.

    Install SuperTest and save it as a dependency.

    npm install supertest --save-dev

    "devDependencies": {
        "jest": "^25.5.4",
        "jest-serial-runner": "^1.1.0",
        "jest-stare": "^2.0.1",
        "supertest": "^4.0.2"
      }

    All the required dependencies are installed and our package.json looks like:

    {
      "name": "supertestjest",
      "version": "1.0.0",
      "description": "",
      "main": "index.js",
      "jest": {
        "verbose": true,
        "testSequencer": "/home/abc/jest-supertest/testSequencer.js",
        "coverageDirectory": "/home/abc/jest-supertest/coverage/my_reports/",
        "coverageReporters": ["html","text"],
        "coverageThreshold": {
          "global": {
            "branches": 100,
            "functions": 100,
            "lines": 100,
            "statements": 100
          }
        }
      },
      "scripts": {
        "test": "NODE_TLS_REJECT_UNAUTHORIZED=0 jest --reporters default jest-stare --coverage --detectOpenHandles --runInBand --testTimeout=60000",
        "test:watch": "jest --verbose --watchAll"
      },
      "author": "",
      "license": "ISC",
      "devDependencies": {
        "jest": "^25.5.4",
        "jest-serial-runner": "^1.1.0",
        "jest-stare": "^2.0.1",
        "supertest": "^4.0.2"
      }
    }

    Now we are ready to create our Jest tests with some defined conventions:

    • a describe block assembles multiple tests (its)
    • a test block (commonly aliased as it) holds a single test
    • expect() performs assertions

    Jest recognizes the test files in the __test__/ folder

    • with .test.js extension
    • with .spec.js extension

    We will use https://reqres.in as the reference app for our API tests.

    Let’s write commonTests.js, which will be required by every test file. It hits the app through SuperTest, logs in (if required) and saves the authorization token. The aliases are exported from here to be used in all the tests.

    [Note: commonTests.js, be created or not, will vary as per the test requirements]

    touch commonTests.js

    var supertest = require('supertest'); //require supertest
    const request = supertest('https://reqres.in/'); //supertest hits the HTTP server (your app)
    
    /*
    This piece of code is for getting the authorization token after login to your app.
    let token;
    test("Login to the application", function(){
        return request.post(``).then((response)=>{
            token = response.body.token  //to save the login token for further requests
        })
    }); 
    */
    
    module.exports = 
    {
        request
            //, token     -- export if token is generated
    }

    Moving forward, let’s write tests for POST, GET, PUT and DELETE requests to get a basic understanding of the setup. We will create two test files, which will also let us see whether the sequencer works.

    mkdir __test__/
    touch __test__/postAndGet.test.js __test__/putAndDelete.test.js

    Sticking to the Jest conventions mentioned above, we have our tests written.

    postAndGet.test.js test file:

    • requires commonTests.js into the ‘request’ alias
    • POSTs to the api/users endpoint, calling supertest.post()
    • GETs from the api/users endpoint, calling supertest.get()
    • uses the file system to write globals and read them across all the tests
    • validates the responses returned on hitting the HTTP endpoints

    const request = require('../commonTests');
    const fs = require('fs');
    let userID;

    //Create a new user
    describe("POST request", () => {

      try {
        let userDetails;
        beforeEach(function () {
          console.log("Input user details!")
          userDetails = {
            "name": "morpheus",
            "job": "leader"
          }; //new user details to be created
        });

        afterEach(function () {
          console.log("User is created with ID : ", userID)
        });

        it("Create user data", async done => {

          return request.request.post(`api/users`) //post() of supertest
            //.set('Authorization', `Token ${request.token}`) //Authorization token
            .send(userDetails) //Request body
            .expect(201) //response status should be 201
            .then((res) => {
              expect(res.body).toBeDefined(); //test if response body is defined
              //expect(res.body.status).toBe("success")
              userID = res.body.id;
              let jsonContent = JSON.stringify({userId: res.body.id}); //create a json
              fs.writeFile("data.json", jsonContent, 'utf8', function (err) { //write the user id into a global json file for later tests
                if (err) {
                  return console.log(err);
                }
                console.log("POST response body : ", res.body)
                done();
              });
            })
        })
      }
      catch (err) {
        console.log("Exception : ", err)
      }
    });
    
    //GET all users
    describe("GET all user details", () => {

      try {
        beforeEach(function () {
          console.log("GET all users details ")
        });

        afterEach(function () {
          console.log("All users' details are retrieved")
        });

        test("GET user output", async done => {
          await request.request.get(`api/users`) //get() of supertest
            //.set('Authorization', `Token ${request.token}`)
            .expect(200).then((response) => {
              console.log("GET RESPONSE : ", response.body);
              done();
            })
        })
      }
      catch (err) {
        console.log("Exception : ", err)
      }
    });

    putAndDelete.test.js file:

    • requires commonTests.js into the ‘request’ alias
    • requires data.json into the ‘data’ alias, the file our previous test created to hold global variables
    • PUTs to the api/users/${data.userId} endpoint, calling supertest.put()
    • DELETEs at the api/users/${data.userId} endpoint, calling supertest.delete()
    • validates the responses returned by the endpoints
    • removes data.json (similar to unsetting global variables) after all the tests are done

    const request = require('../commonTests');
    const fs = require('fs'); //file system
    const data = require('../data.json'); //data.json containing the global variables
    
    //Update user data
    describe("PUT user details", () => {
    
        try{
            let newDetails;
            beforeEach(function () {
                console.log("Input updated user's details");
                newDetails = {
                    "name": "morpheus",
                    "job": "zion resident"
                }; // details to be updated
      
            });
            afterEach(function () {
                console.log("user details are updated");
            });
      
            test("Update user now", async done =>{
    
                console.log("User to be updated : ", data.userId)
    
                const response = await request.request.put(`api/users/${data.userId}`).send(newDetails) //call put() of supertest
                                    //.set('Authorization', `Token ${request.token}`) 
                                            .expect(200)
                expect(response.body.updatedAt).toBeDefined();
                console.log("UPDATED RESPONSE : ", response.body);
                done();
        })
      }
        catch(err){
            console.log("ERROR : ", err)
        }
    });
    
    //DELETE the user
    describe("DELETE user details", () =>{
        try{
            beforeAll(function (){
                console.log("To delete user : ", data.userId)
            });
    
            test("Delete request", async done =>{
                const response = await request.request.delete(`api/users/${data.userId}`) //invoke delete() of supertest
                                            .expect(204) 
                console.log("DELETE RESPONSE : ", response.body);
                done(); 
            });
    
            afterAll(function (){
                console.log("user is deleted!!")
                fs.unlinkSync('data.json'); //remove data.json after all tests are run
            });
        }
    
        catch(err){
            console.log("EXCEPTION : ", err);
        }
    });

    And we are done setting up a decent framework, just one command away!

    npm test

    Once complete, the test results will be immediately visible on the terminal.

    An HTML report of the test results is also generated as index.html under jest-stare/.

    And test coverage details are created under coverage/my_reports/ in the workspace.

    Similarly, other HTTP methods can also be tested: OPTIONS with supertest.options(), which helps when dealing with CORS, PATCH with supertest.patch(), HEAD with supertest.head(), and many more, as sketched below.
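    For instance, here is a hedged sketch of PATCH and HEAD tests reusing the request alias from commonTests.js; the expected status codes are assumptions about the reqres.in endpoints and may need adjusting for your own API:

    const request = require('../commonTests');

    describe("Other HTTP methods", () => {
      test("PATCH partially updates a user", async () => {
        const res = await request.request.patch(`api/users/2`) //patch() of supertest
          .send({ "job": "developer" })
          .expect(200);
        expect(res.body.updatedAt).toBeDefined();
      });

      test("HEAD returns headers only", async () => {
        const res = await request.request.head(`api/users`); //head() of supertest
        expect(res.status).toBe(200);
      });
    });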

    Wasn’t it a convenient and successful journey?

    Conclusion

    So, wrapping it up with a note that API testing needs attention. As QA engineers, let’s abide by the concept of the testing pyramid, which is really a tester’s mindset: combat issues at the lower levels to avoid chaos at the upper levels, i.e. the UI.

    Testing Pyramid

    I hope you had a good read. Kindly spread the word. Happy coding!

  • Unit Testing Data at Scale using Deequ and Apache Spark

    Everyone knows the importance of knowledge and how critical it is to progress. In today’s world, data is knowledge. But that’s only when the data is “good” and correctly interpreted. Let’s focus on the “good” part. What do we really mean by “good data”?

    Its definition can change from use case to use case but, in general terms, good data can be defined by its accuracy, legitimacy, reliability, consistency, completeness, and availability.

    Bad data can lead to failures in production systems, unexpected outputs, and wrong inferences, leading to poor business decisions.

    It’s important to have something in place that can tell us about the quality of the data we have, how close it is to our expectations, and whether we can rely on it.

    This is basically the problem we’re trying to solve.

    The Problem and the Potential Solutions

    A manual approach to data quality testing is definitely one of the solutions and can work well.

    We’ll need to write code for computing various statistical measures, run it manually on different columns, maybe draw some plots, and then conduct some spot checks to see if anything is off or unexpected. The overall process can get tedious and time-consuming if we need to do it on a daily basis.

    Certain tools can make life easier for us here.

    In this blog, we’ll be focusing on Amazon Deequ.

    Amazon Deequ

    Amazon Deequ is an open-source tool developed and used at Amazon. It’s built on top of Apache Spark, so it’s great at handling big data. Deequ computes data quality metrics regularly, based on the checks and validations set, and generates relevant reports.

    Deequ provides a lot of interesting features, and we’ll be discussing them in detail. Its main components are metrics computation, constraint verification and automated constraint suggestion (source: AWS).

    Prerequisites

    Working with Deequ requires having Apache Spark up and running with Deequ as one of the dependencies.

    As of this blog, the latest version of Deequ, 1.1.0, supports Spark 2.2.x to 2.4.x and Spark 3.0.x.
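    For reference, a minimal sketch of the build setup might look like the following; the exact Deequ artifact version string depends on your Spark and Scala combination, so treat the version numbers below as placeholders and check Maven Central:

    // build.sbt (a hedged sketch; adjust versions to your environment)
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql" % "2.4.7" % "provided",
      "com.amazon.deequ" %  "deequ"     % "1.1.0"
    )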

    Sample Dataset

    For learning more about Deequ and its features, we’ll be using an open-source IMDb dataset which has the following schema: 

    root
     |-- tconst: string (nullable = true)
     |-- titleType: string (nullable = true)
     |-- primaryTitle: string (nullable = true)
     |-- originalTitle: string (nullable = true)
     |-- isAdult: integer (nullable = true)
     |-- startYear: string (nullable = true)
     |-- endYear: string (nullable = true)
     |-- runtimeMinutes: string (nullable = true)
     |-- genres: string (nullable = true)
     |-- averageRating: double (nullable = true)
     |-- numVotes: integer (nullable = true)

    Here, tconst is the primary key, and the rest of the columns are pretty much self-explanatory.
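    As a hedged sketch, the data DataFrame used throughout this article could be built by joining the public IMDb title.basics and title.ratings TSV dumps on tconst; the file paths below are assumptions, so adjust them to wherever you downloaded the files:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("deequ-imdb").getOrCreate()

    // IMDb dumps are tab-separated and use \N for missing values
    val tsvOptions = Map("sep" -> "\t", "header" -> "true", "nullValue" -> "\\N", "inferSchema" -> "true")

    val basics  = spark.read.options(tsvOptions).csv("title.basics.tsv")   // assumed local path
    val ratings = spark.read.options(tsvOptions).csv("title.ratings.tsv")  // assumed local path

    val data = basics.join(ratings, Seq("tconst"), "left_outer")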

    Data Analysis and Validation

    Before we start defining checks on the data, Deequ gives us an easy way to compute some basic statistics on the dataset. These are called metrics.

    Deequ provides support for the following metrics:

    ApproxCountDistinct, ApproxQuantile, ApproxQuantiles, Completeness, Compliance, Correlation, CountDistinct, DataType, Distance, Distinctness, Entropy, Histogram, Maximum, MaxLength, Mean, Minimum, MinLength, MutualInformation, PatternMatch, Size, StandardDeviation, Sum, UniqueValueRatio, Uniqueness

    Let’s go ahead and apply some metrics to our dataset.

    val runAnalyzer: AnalyzerContext = { AnalysisRunner
      .onData(data)
      .addAnalyzer(Size())
      .addAnalyzer(Completeness("averageRating"))
      .addAnalyzer(Uniqueness("tconst"))
      .addAnalyzer(Mean("averageRating"))
      .addAnalyzer(StandardDeviation("averageRating"))
      .addAnalyzer(Compliance("top rating", "averageRating >= 7.0"))
      .addAnalyzer(Correlation("numVotes", "averageRating"))
      .addAnalyzer(Distinctness("tconst"))
      .addAnalyzer(Maximum("averageRating"))
      .addAnalyzer(Minimum("averageRating"))
      .run()
    }
    
    val metricsResult = successMetricsAsDataFrame(spark, runAnalyzer)
    metricsResult.show(false)

    We get the following output by running the code above:

    +-----------+----------------------+-----------------+--------------------+
    |entity     |instance              |name             |value               |
    +-----------+----------------------+-----------------+--------------------+
    |Mutlicolumn|numVotes,averageRating|Correlation      |0.013454113877394851|
    |Column     |tconst                |Uniqueness       |1.0                 |
    |Column     |tconst                |Distinctness     |1.0                 |
    |Dataset    |*                     |Size             |7339583.0           |
    |Column     |averageRating         |Completeness     |0.14858528066240276 |
    |Column     |averageRating         |Mean             |6.886130810579155   |
    |Column     |averageRating         |StandardDeviation|1.3982924856469208  |
    |Column     |averageRating         |Maximum          |10.0                |
    |Column     |averageRating         |Minimum          |1.0                 |
    |Column     |top rating            |Compliance       |0.080230443609671   |
    +-----------+----------------------+-----------------+--------------------+

    Let’s try to quickly understand what this tells us.

    • The dataset has 7,339,583 rows.
    • The distinctness and uniqueness of the tconst column are both 1.0, which means all the values in the column are distinct and unique, as expected for the primary key column.
    • The averageRating column has a min of 1 and a max of 10 with a mean of 6.88 and a standard deviation of 1.39, which tells us about the variation in the average rating values across the data.
    • The completeness of the averageRating column is 0.148, which tells us that we have an average rating available for around 15% of the dataset’s records.
    • Then, we checked whether there is any correlation between the numVotes and averageRating columns. This metric calculates the Pearson correlation coefficient, which comes out to about 0.01, meaning there is essentially no correlation between the two columns, as expected.

    This feature of Deequ can be really helpful if we want to quickly do some basic analysis on a dataset.

    Let’s move on to defining and running tests and checks on the data.

    Data Validation

    For writing tests for our dataset, we use Deequ’s VerificationSuite and add checks on attributes of the dataset.

    Deequ has a big handy list of validators available to use, which are:

    hasSize, isComplete, hasCompleteness, isUnique, isPrimaryKey, hasUniqueness, hasDistinctness, hasUniqueValueRatio, hasNumberOfDistinctValues, hasHistogramValues, hasEntropy, hasMutualInformation, hasApproxQuantile, hasMinLength, hasMaxLength, hasMin, hasMax, hasMean, hasSum, hasStandardDeviation, hasApproxCountDistinct, hasCorrelation, satisfies, hasPattern, containsCreditCardNumber, containsEmail, containsURL, containsSocialSecurityNumber, hasDataType, isNonNegative, isPositive, isLessThan, isLessThanOrEqualTo, isGreaterThan, isGreaterThanOrEqualTo, isContainedIn

    Let’s apply some checks to our dataset.

    val validationResult: VerificationResult = { VerificationSuite()
      .onData(data)
      .addCheck(
        Check(CheckLevel.Error, "Review Check") 
          .hasSize(_ >= 100000) // check if the data has at least 100k records
          .hasMin("averageRating", _ > 0.0) // min rating should not be less than 0
          .hasMax("averageRating", _ < 9.0) // max rating should not be greater than 9
          .containsURL("titleType") // verify that titleType column has URLs
          .isComplete("primaryTitle") // primaryTitle should never be NULL
          .isNonNegative("numVotes") // should not contain negative values
          .isPrimaryKey("tconst") // verify that tconst is the primary key column
          .hasDataType("isAdult", ConstrainableDataTypes.Integral) 
          //column contains Integer values only, expected as values this col has are 0 or 1
          )
      .run()
    }
    
    val results = checkResultsAsDataFrame(spark, validationResult)
    results.select("constraint","constraint_status","constraint_message").show(false)

    We have added some checks to our dataset, and the details about the check can be seen as comments in the above code.

    We expect all checks to pass for our dataset except the containsURL and hasMax ones.

    That’s because the titleType column doesn’t have URLs, and we know that the max rating is 10.0, but we are checking against 9.0.

    We can see the output below:

    +--------------------------------------------------------------------------------------------+-----------------+-----------------------------------------------------+
    |constraint                                                                                  |constraint_status|constraint_message                                   |
    +--------------------------------------------------------------------------------------------+-----------------+-----------------------------------------------------+
    |SizeConstraint(Size(None))                                                                  |Success          |                                                     |
    |MinimumConstraint(Minimum(averageRating,None))                                              |Success          |                                                     |
    |MaximumConstraint(Maximum(averageRating,None))                                              |Failure          |Value: 10.0 does not meet the constraint requirement!|
    |containsURL(titleType)                                                                      |Failure          |Value: 0.0 does not meet the constraint requirement! |
    |CompletenessConstraint(Completeness(primaryTitle,None))                                     |Success          |                                                     |
    |ComplianceConstraint(Compliance(numVotes is non-negative,COALESCE(numVotes, 0.0) >= 0,None))|Success          |                                                     |
    |UniquenessConstraint(Uniqueness(List(tconst),None))                                         |Success          |                                                     |
    |AnalysisBasedConstraint(DataType(isAdult,None),<function1>,Some(<function1>),None)          |Success          |                                                     |
    +--------------------------------------------------------------------------------------------+-----------------+-----------------------------------------------------+

    In order to perform these checks, Deequ calculated, behind the scenes, the same kind of metrics we saw in the previous section.

    To look at the metrics Deequ computed for the checks we defined, we can use: 

    VerificationResult.successMetricsAsDataFrame(spark,validationResult)
                      .show(truncate=false)

    Automated Constraint Suggestion

    Automated constraint suggestion is a really interesting and useful feature provided by Deequ.

    Adding validation checks on a dataset with hundreds of columns, or on a large number of datasets, can be challenging. With this feature, Deequ tries to make our task easier: it analyses the data distribution and, based on that, suggests potentially useful constraints that can be used as validation checks.

    Let’s see how this works.

    This piece of code can automatically generate constraint suggestions for us:

    val constraintResult = { ConstraintSuggestionRunner()
      .onData(data)
      .addConstraintRules(Rules.DEFAULT)
      .run()
    }
    
    val suggestionsDF = constraintResult.constraintSuggestions.flatMap { 
      case (column, suggestions) => 
        suggestions.map { constraint =>
          (column, constraint.description, constraint.codeForConstraint)
        } 
    }.toSeq.toDS()
    
    suggestionsDF.select("_1","_2").show(false)

    Let’s look at constraint suggestions generated by Deequ:

    column          suggested constraint
    --------------  -----------------------------------------------------------------------------------------------
    runtimeMinutes  'runtimeMinutes' has less than 72% missing values
    tconst          'tconst' is not null
    titleType       'titleType' is not null
    titleType       'titleType' has value range 'tvEpisode', 'short', 'movie', 'video', 'tvSeries', 'tvMovie', 'tvMiniSeries', 'tvSpecial', 'videoGame', 'tvShort'
    titleType       'titleType' has value range 'tvEpisode', 'short', 'movie' for at least 90.0% of values
    averageRating   'averageRating' has no negative values
    originalTitle   'originalTitle' is not null
    startYear       'startYear' has less than 9% missing values
    startYear       'startYear' has type Integral
    startYear       'startYear' has no negative values
    endYear         'endYear' has type Integral
    endYear         'endYear' has value range '2017', '2018', '2019', '2016', '2015', '2020', '2014', '2013', '2012', '2011', '2010',......
    endYear         'endYear' has value range '' for at least 99.0% of values
    endYear         'endYear' has no negative values
    numVotes        'numVotes' has no negative values
    primaryTitle    'primaryTitle' is not null
    isAdult         'isAdult' is not null
    isAdult         'isAdult' has no negative values
    genres          'genres' has less than 7% missing values

    We shouldn’t expect the constraint suggestions generated by Deequ to always make sense; they should always be verified before use.

    This is because the algorithm that generates the constraint suggestions just works on the data distribution and isn’t exactly “intelligent.”

    We can see that most of the suggestions generated make sense even though they might be really trivial.

    For the endYear column, one of the suggestions is that endYear should be contained in a list of years, which indeed is true for our dataset. However, it can’t be generalized as every passing year, the value for endYear continues to increase.

    But on the other hand, the suggestion that titleType can take the following values: ‘tvEpisode,’ ‘short,’ ‘movie,’ ‘video,’ ‘tvSeries,’ ‘tvMovie,’ ‘tvMiniSeries,’ ‘tvSpecial,’ ‘videoGame,’ and ‘tvShort’ makes sense and can be generalized, which makes it a great suggestion.

    And this is why we should not blindly use the constraints suggested by Deequ and always cross-check them.

    Something we can do to improve the constraint suggestions is to use the useTrainTestSplitWithTestsetRatio method in ConstraintSuggestionRunner. It makes a lot of sense to use this on large datasets.

    How does this work? If we use the config useTrainTestSplitWithTestsetRatio(0.1), Deequ would compute constraint suggestions on 90% of the data and evaluate the suggested constraints on the remaining 10%, which would improve the quality of the suggested constraints.
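    A hedged sketch of what that configuration could look like, reusing the same data and setup as the suggestion example above:

    val constraintResult = { ConstraintSuggestionRunner()
      .onData(data)
      .useTrainTestSplitWithTestsetRatio(0.1) // suggest on 90% of the data, evaluate on the remaining 10%
      .addConstraintRules(Rules.DEFAULT)
      .run()
    }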

    Anomaly Detection

    Deequ also supports anomaly detection for data quality metrics.

    The idea behind Deequ’s anomaly detection is that we often have a sense of how much change to expect in certain metrics of our data. Say we are getting new data every day, and we know that the number of records we receive on a daily basis is around 8k to 12k. If on a random day we get 40k records, we know something went wrong with the data ingestion job or some other upstream process.

    Deequ will regularly store the metrics of our data in a MetricsRepository. Once that’s done, anomaly detection checks can be run. These compare the current values of the metrics to the historical values stored in the MetricsRepository, and that helps Deequ to detect anomalous changes that are a red flag.

    One of Deequ’s anomaly detection strategies is the RateOfChangeStrategy, which limits the maximum change in the metrics by some numerical factor that can be passed as a parameter.

    Deequ supports other strategies as well; these, along with runnable code examples for anomaly detection, can be found in the awslabs/deequ GitHub repository.
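    The sketch below is adapted from the patterns in Deequ’s documentation and is only illustrative; yesterdaysDataset and todaysDataset are placeholder DataFrames, and the exact class names and signatures should be checked against the Deequ version you use:

    import com.amazon.deequ.VerificationSuite
    import com.amazon.deequ.analyzers.Size
    import com.amazon.deequ.anomalydetection.RateOfChangeStrategy
    import com.amazon.deequ.checks.CheckStatus
    import com.amazon.deequ.repository.ResultKey
    import com.amazon.deequ.repository.memory.InMemoryMetricsRepository

    val metricsRepository = new InMemoryMetricsRepository()

    // Store yesterday's Size metric as the historical baseline
    VerificationSuite()
      .onData(yesterdaysDataset) // placeholder DataFrame
      .useRepository(metricsRepository)
      .saveOrAppendResult(ResultKey(System.currentTimeMillis() - 24 * 60 * 60 * 1000))
      .addAnomalyCheck(RateOfChangeStrategy(maxRateIncrease = Some(2.0)), Size())
      .run()

    // Today's run fails if the row count more than doubles compared to yesterday
    val todaysResult = VerificationSuite()
      .onData(todaysDataset) // placeholder DataFrame
      .useRepository(metricsRepository)
      .saveOrAppendResult(ResultKey(System.currentTimeMillis()))
      .addAnomalyCheck(RateOfChangeStrategy(maxRateIncrease = Some(2.0)), Size())
      .run()

    if (todaysResult.status != CheckStatus.Success) {
      println("Anomaly detected in the Size() metric!")
    }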

    Conclusion

    We learned about the main features and capabilities of AWS Labs’ Deequ.

    It might feel a little daunting to people unfamiliar with Scala or Spark, but using Deequ is very easy and straightforward. Someone with a basic understanding of Scala or Spark should be able to work with Deequ’s primary features without any friction.

    For someone who rarely deals with data quality checks, manual test runs might be a good enough option. However, for someone dealing with new datasets frequently, as in multiple times in a day or a week, using a tool like Deequ to perform automated data quality testing makes a lot of sense in terms of time and effort.

    We hope this article helped you get a deep dive into data quality testing and using Deequ for these types of engineering practices.

  • How to Make Asynchronous Calls in Redux Without Middlewares

    Redux has greatly helped in reducing the complexities of state management. Its one-way data flow is easier to reason about, and it also provides a powerful mechanism to include middlewares, which can be chained together to do our bidding. One of the most common use cases for middleware is making async calls in the application. Middlewares like redux-thunk, redux-saga, redux-observable, etc. are a few examples. All of these come with their own learning curve and are best suited for tackling different scenarios.

    But what if our use case is simple enough and we don’t want the added complexity that implementing a middleware brings? Can we somehow implement the most common use case of making async API calls using only Redux and JavaScript?

    The answer is yes! This blog will explain how to implement async action calls in Redux without the use of any middlewares.

    So let us start by making a simple React project using create-react-app:

    npx create-react-app async-redux-without-middlewares
    cd async-redux-without-middlewares
    npm start

    We will also be using react-redux in addition to redux to make our life a little easier, and to mock the APIs we will use https://jsonplaceholder.typicode.com/.
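    Assuming we pull both packages in with npm:

    npm install redux react-redux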

    We will implement just two API calls so as not to overcomplicate things.

    Create a new file called api.js. This is the file in which we will keep the fetch calls to the endpoints.

    export const getPostsById = id => fetch(`https://jsonplaceholder.typicode.com/posts/${id}`);
     
    export const getPostsBulk = () => fetch("https://jsonplaceholder.typicode.com/posts");

    Each API call has three base actions associated with it: REQUEST, SUCCESS and FAIL. Each of our APIs will be in one of these three states at any given time, and depending on these states we can decide how to render our UI. For example, in the REQUEST state we can have the UI show a loader, and in the FAIL state we can show a custom UI telling the user that something went wrong.

    So we create three constants of REQUEST, SUCCESS and FAIL for each API call which we will be making. In our case the constants.js file will look something like this:

    export const GET_POSTS_BY_ID_REQUEST = "getpostsbyidrequest";
    export const GET_POSTS_BY_ID_SUCCESS = "getpostsbyidsuccess";
    export const GET_POSTS_BY_ID_FAIL = "getpostsbyidfail";
     
    export const GET_POSTS_BULK_REQUEST = "getpostsbulkrequest";
    export const GET_POSTS_BULK_SUCCESS = "getpostsbulksuccess";
    export const GET_POSTS_BULK_FAIL = "getpostsbulkfail";

    The store.js file and the initialState of our application is as follows:

    import { createStore } from 'redux'
    import reducer from './reducers';
     
    const initialState = {
        byId: {
            isLoading: null,
            error: null,
            data: null
        },
        byBulk: {
            isLoading: null,
            error: null,
            data: null
        }
    };
     
    const store = createStore(reducer, initialState, window.__REDUX_DEVTOOLS_EXTENSION__ && window.__REDUX_DEVTOOLS_EXTENSION__());
     
    export default store;

    As can be seen from the above code, each API’s data lives in its own object inside the state object. The isLoading key tells us whether the API is in the REQUEST state.

    Now that we have our store defined, let us see how we manipulate the state across the different phases that an API call can be in. Below is our reducers.js file.

    import {
        GET_POSTS_BY_ID_REQUEST,
        GET_POSTS_BY_ID_SUCCESS,
        GET_POSTS_BY_ID_FAIL,
     
        GET_POSTS_BULK_REQUEST,
        GET_POSTS_BULK_SUCCESS,
        GET_POSTS_BULK_FAIL
     
    } from "./constants";
     
    const reducer = (state, action) => {
        switch (action.type) {
            case GET_POSTS_BY_ID_REQUEST:
                return {
                    ...state,
                    byId: {
                        isLoading: true,
                        error: null,
                        data: null
                    }
                }
            case GET_POSTS_BY_ID_SUCCESS:
                return {
                    ...state,
                    byId: {
                        isLoading: false,
                        error: false,
                        data: action.payload
                    }
                }
            case GET_POSTS_BY_ID_FAIL:
                return {
                    ...state,
                    byId: {
                        isLoading: false,
                        error: action.payload,
                        data: false
                    }
                }
     
            case GET_POSTS_BULK_REQUEST:
                return {
                    ...state,
                    byBulk: {
                        isLoading: true,
                        error: null,
                        data: null
                    }
                }
            case GET_POSTS_BULK_SUCCESS:
                return {
                    ...state,
                    byBulk: {
                        isLoading: false,
                        error: false,
                        data: action.payload
                    }
                }
            case GET_POSTS_BULK_FAIL:
                return {
                    ...state,
                    byBulk: {
                        isLoading: false,
                        error: action.payload,
                        data: false
                    }
                }
            default: return state;
        }
    }
     
    export default reducer;

    By giving each individual API call its own variable to denote the loading phase, we can now easily implement things like multiple loaders on the same screen depending on which API call is in which phase, as the sketch below illustrates.
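    For instance, a hypothetical connected component (not part of the demo app we build here) could branch on these flags:

    import React from 'react';
    import { connect } from 'react-redux';

    const PostById = ({ byId }) => {
      if (byId.isLoading) return <div>Loading…</div>;            // REQUEST phase
      if (byId.error) return <div>Something went wrong.</div>;   // FAIL phase
      return <pre>{JSON.stringify(byId.data, null, 2)}</pre>;    // SUCCESS phase
    };

    const mapStateToProps = state => ({ byId: state.byId });

    export default connect(mapStateToProps)(PostById);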

    Now, to actually implement the async behaviour in the actions, we just need a normal JavaScript function that receives dispatch as its first argument. We pass dispatch to the function because it needs to dispatch actions to the store: normally a connected component has access to dispatch, but since we want an external function to take control of dispatching, we have to hand it over.

    const getPostById = async (dispatch, id) => {
        dispatch({ type: GET_POSTS_BY_ID_REQUEST });
     
        try {
            const response = await getPostsById(id);
            const res = await response.json();
            dispatch({ type: GET_POSTS_BY_ID_SUCCESS, payload: res });
        } catch (e) {
            dispatch({ type: GET_POSTS_BY_ID_FAIL, payload: e });
        }
    };

    And a function to bring dispatch into the above function’s scope:

    export const getPostByIdFunc = dispatch => {
        return id => getPostById(dispatch, id);
    }

    So now our complete actions.js file looks like this:

    import {
        GET_POSTS_BY_ID_REQUEST,
        GET_POSTS_BY_ID_SUCCESS,
        GET_POSTS_BY_ID_FAIL,
     
        GET_POSTS_BULK_REQUEST,
        GET_POSTS_BULK_SUCCESS,
        GET_POSTS_BULK_FAIL
     
    } from "./constants";
     
    import {
        getPostsById,
        getPostsBulk
    } from "./api";
     
    const getPostById = async (dispatch, id) => {
        dispatch({ type: GET_POSTS_BY_ID_REQUEST });
     
        try {
            const response = await getPostsById(id);
            const res = await response.json();
            dispatch({ type: GET_POSTS_BY_ID_SUCCESS, payload: res });
        } catch (e) {
            dispatch({ type: GET_POSTS_BY_ID_FAIL, payload: e });
        }
    };
     
    const getPostBulk = async dispatch => {
        dispatch({ type: GET_POSTS_BULK_REQUEST });
     
        try {
            const response = await getPostsBulk();
            const res = await response.json();
            dispatch({ type: GET_POSTS_BULK_SUCCESS, payload: res });
        } catch (e) {
            dispatch({ type: GET_POSTS_BULK_FAIL, payload: e });
        }
    };
     
    export const getPostByIdFunc = dispatch => {
        return id => getPostById(dispatch, id);
    }
     
    export const getPostsBulkFunc = dispatch => {
        return () => getPostBulk(dispatch);
    }

    Once this is done, all that is left to do is to pass these functions in mapDispatchToProps of our connected component.

    const mapDispatchToProps = dispatch => {
      return {
        getPostById: getPostByIdFunc(dispatch),
        getPostBulk: getPostsBulkFunc(dispatch)
      }
    };

    Our App.js file looks like the one below:

    import React, { Component } from 'react';
    import './App.css';
     
    import { connect } from 'react-redux';
    import { getPostByIdFunc, getPostsBulkFunc } from './actions';
     
    class App extends Component {
      render() {
        console.log(this.props);
        return (
          <div className="App">
            <button onClick={() => {
              this.props.getPostById(1);
            }}>By Id</button>
            <button onClick={() => {
              this.props.getPostBulk();
            }}>In bulk</button>
          </div>
        );
      }
    }
     
    const mapStateToProps = state => {
      return {
        state
      };
    }
     
    const mapDispatchToProps = dispatch => {
      return {
        getPostById: getPostByIdFunc(dispatch),
        getPostBulk: getPostsBulkFunc(dispatch)
      }
    };

    export default connect(mapStateToProps, mapDispatchToProps)(App);

    This is how we make async calls in Redux without middlewares. It is a much simpler approach than using a middleware, without the learning curve associated with one. If this approach covers all your use cases, then by all means use it.

    Conclusion

    This type of approach really shines when you have to build a simple application, like a demo of sorts, where API calls are the only side effects you need. In larger and more complicated applications there are a few inconveniences: first, we have to pass dispatch around, which feels kind of yucky; second, we have to remember which calls require dispatch and which do not.

    The full code can be found here.

  • The Ultimate Beginner’s Guide to Jupyter Notebooks

    Jupyter Notebooks offer a great way to write and iterate on your Python code. They are a powerful tool for developing data science projects interactively. A Jupyter Notebook lets you showcase source code and its corresponding output in a single place, combining narrative text, visualizations and other rich media. The intuitive workflow promotes iterative and rapid development, making notebooks the first choice for data scientists. Creating Jupyter Notebooks is completely free, as they fall under Project Jupyter, which is completely open source.

    Project Jupyter is the successor to an earlier project, IPython Notebook, which was first published as a prototype in 2010. Jupyter Notebook is built on top of IPython, an interactive tool for executing Python code in the terminal using the REPL (Read-Eval-Print Loop) model. The IPython kernel executes the Python code and communicates with the Jupyter Notebook front-end interface. By extending IPython, Jupyter Notebooks add features like storing your code and output and keeping markdown notes.

    Although Jupyter Notebooks support using various programming languages, we will focus on Python and its application in this article.

    Getting Started with Jupyter Notebooks!

    Installation

    Prerequisites

    As you may have surmised from the abstract above, you need to have Python installed on your machine. Either Python 2.7 or Python 3.x will do.

    Install Using Anaconda

    The simplest way to get started with Jupyter Notebooks is by installing it using Anaconda. Anaconda installs both Python3 and Jupyter and also includes quite a lot of packages commonly used in the data science and machine learning community. You can follow the latest guidelines from here.

    Install Using Pip

    If, for some reason, you decide not to use Anaconda, you can install Jupyter manually using Python's pip package manager:

    pip install jupyter

    Launching First Notebook

    Open your terminal and navigate to the directory where you would like to store your notebooks. Then type the command below; it will instantiate a local server at http://localhost:8888/tree.

    jupyter notebook

    A new window with the Jupyter Notebook interface will open in your internet browser. As you might have already noticed Jupyter starts up a local Python server to serve web apps in your browser, where you can access the Dashboard and work with the Jupyter Notebooks. The Jupyter Notebooks are platform independent which makes it easier to collaborate and share with others.
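    If you ever need to see which notebook servers are currently running on your machine (and at which URLs), the classic notebook package exposes this through its Python API. Below is a minimal sketch under the assumption that the notebook package is installed; the dictionary keys shown are the commonly available ones.

    from notebook import notebookapp

    # List the Jupyter Notebook servers currently running on this machine.
    for server in notebookapp.list_running_servers():
        print(server["url"], server["notebook_dir"])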

    The list of all files is displayed under the Files tab, while all running processes can be viewed under the Running tab. The third tab, Clusters, comes from IPython parallel, IPython's parallel computing framework, and lets you control multiple engines extended from the IPython kernel.

    Let's start by making a new notebook. We can easily do this by clicking on the New drop-down list in the top-right corner of the dashboard. You will see options to create a Python 3 notebook as well as a regular text file, a folder, and a terminal. Please select the Python 3 notebook option.

    Your Jupyter Notebook will open in a new tab as shown in below image.

    Each notebook is opened in a new tab so that you can work with multiple notebooks simultaneously. If you go back to the dashboard tab, you will see the new file Untitled.ipynb with a green icon to its left, which indicates that your new notebook is running.

     

    Why a .ipynb file?

    .ipynb is the standard file format for storing Jupyter Notebooks, hence the file name Untitled.ipynb. Let’s begin by first understanding what an .ipynb file is and what it might contain. Each .ipynb file is a text file that describes the content of your notebook in a JSON format. The content of each cell, whether it is text, code or image attachments that have been converted into strings, along with some additional metadata is stored in the .ipynb file. You can also edit the metadata by selecting “Edit > Edit Notebook Metadata” from the menu options in the notebook.
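    To see this structure for yourself, you can open the file with any JSON parser. Below is a minimal sketch in Python; it assumes a notebook named Untitled.ipynb exists in the current directory (the file name is just an example).

    import json

    # An .ipynb file is plain JSON: load it and inspect its structure.
    with open("Untitled.ipynb") as f:
        nb = json.load(f)

    print(nb["nbformat"], nb["nbformat_minor"])    # notebook format version
    print(nb["metadata"].get("kernelspec", {}))    # kernel information stored as metadata

    # Every cell records its type and source; code cells also record their outputs.
    for cell in nb["cells"]:
        print(cell["cell_type"], "".join(cell["source"])[:40])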

    You can also view the content of your notebook files by selecting "Edit" from the controls on the dashboard, though there's no reason to do so unless you really want to edit the raw file manually.

    Understanding the Notebook Interface

    Now that you have created a notebook, let's have a look at the various menu options and functions which are readily available. Take some time to scroll through the list of commands that opens up when you click on the keyboard icon (or press Ctrl + Shift + P).

    There are two prominent terms that you should learn about: cells and kernels. They are key both to understanding Jupyter and to what makes it more than just a content-writing tool. Fortunately, these concepts are not difficult to understand.

    • A kernel is a program that interprets and executes the user’s code. The Jupyter Notebook App has an inbuilt kernel for Python code, but there are also kernels available for other programming languages.
    • A cell is a container which holds executable code or plain text.

    Cells

    Cells form the body of a notebook. If you look at the screenshot above for a new notebook (Untitled.ipynb), the text box with the green border is an empty cell. There are 4 types of cells:

    • Code – This is where you type your code and when executed the kernel will display its output below the cell.
    • Markdown – This is where you type your text formatted using Markdown and the output is displayed in place when it is run.
    • Raw NBConvert – A raw cell whose content is passed through unmodified when the notebook is converted to another format (like HTML, PDF etc.) with the nbconvert tool.
    • Heading – This is where you add Headings to separate sections and make your notebook look tidy and neat. This has now been merged into the Markdown option itself. Adding a ‘#’ at the beginning ensures that whatever you type after that will be taken as a heading.

    Let’s test out how the cells work with a basic “hello world” example. Type print(‘Hello World!’) in the cell and press Ctrl + Enter or click on the Run button in the toolbar at the top.

    print("Hello World!")

    Hello World!

    When you run the cell, its output is displayed below it and the label to its left changes from In [ ] to In [1]. While the cell is still executing, Jupyter shows the label as In [*].

    Additionally, it is important to note that the output of a code cell comes from any print statements in the cell as well as from the value of the last line, whether that is a variable, a function call or some other expression.
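    For example, running a cell like the one below displays both the printed line and the value of the last expression (the numbers are only illustrative):

    print("printed output")     # appears in the output area via print
    a = 21
    a * 2                       # the value of this last line is also shown, e.g. Out[2]: 42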

    Markdown

    Markdown is a lightweight markup language for formatting plain text. Its syntax has a one-to-one correspondence with HTML tags. As this article was written in a Jupyter notebook, all of the narrative text and images you see here are written in Markdown. Let's go through the basics with the following example.

    # This is a level 1 heading 
    ### This is a level 3 heading
    This is how you write some plain text that would form a paragraph.
    You can emphasize the text by enclosing the text like "**" or "__" to make it bold and enclosing the text in "*" or "_" to make it italic. 
    Paragraphs are separated by an empty line.
    * We can include lists.
      * And also indent them.
    
    1. Getting Numbered lists is
    2. Also easy.
    
    [To include hyperlinks enclose the text with square braces and then add the link url in round braces](http://www.example.com)
    
    Inline code uses single backticks: `foo()`, and code blocks use triple backticks:
    
    ``` 
    foo()
    ```
    
    Or can be indented by 4 spaces: 
    
        foo()
        
    And finally, adding images is easy: ![Online Image](https://www.example.com/image.jpg) or ![Local Image](img/image.jpg) or ![Image Attachment](attachment:image.jpg)

    There are 3 different ways to attach images:

    • Link the URL of an image from the web.
    • Use relative path of an image present locally
    • Add an attachment to the notebook using the "Edit > Insert Image" option; this method converts the image into a string and stores it inside your notebook.

    Note that adding an image as an attachment will make the .ipynb file much larger because it is stored inside the notebook in a string format.

    There are a lot more features available in Markdown. To learn more about markdown, you can refer to the official guide from the creator, John Gruber, on his website.

    Kernels

    Every notebook runs on top of a kernel. Whenever you execute a code cell, the content of the cell is executed within the kernel and any output is returned back to the cell for display. The kernel’s state applies to the document as a whole and not individual cells and is persisted over time.

    For example, if you declare a variable or import some libraries in a cell, they will be accessible in other cells. Now let’s understand this with the help of an example. First we’ll import a Python package and then define a function.

    import os, binascii
    def sum(x,y):
      return x+y

    Once the cell above is executed, we can reference os, binascii and sum in any other cell.

    rand_hex_string = binascii.b2a_hex(os.urandom(15)) 
    print(rand_hex_string)
    x = 1
    y = 2
    z = sum(x,y)
    print('Sum of %d and %d is %d' % (x, y, z))

    The output should look something like this:

    c84766ca4a3ce52c3602bbf02ad1f7
    Sum of 1 and 2 is 3

    The execution flow of a notebook is generally top-to-bottom, but it's common to go back and make changes. The execution order shown to the left of each cell, such as In [2], lets you know whether any of your cells have stale output. Additionally, there are several options in the Kernel menu which often come in very handy.

    • Restart: restarts the kernel, thus clearing all the variables etc that were defined.
    • Restart & Clear Output: same as above but will also wipe the output displayed below your code cells.
    • Restart & Run All: same as above but will also run all your cells in order from top-to-bottom.
    • Interrupt: If your kernel is ever stuck on a computation and you wish to stop it, you can choose the Interrupt option.

    Naming Your Notebooks

    It is always a best practice to give a meaningful name to your notebooks. You can rename your notebooks from the notebook app itself by double-clicking on the existing name at the top left corner. You can also use the dashboard or the file browser to rename the notebook file. We’ll head back to the dashboard to rename the file we created earlier, which will have the default notebook file name Untitled.ipynb.

    Now that you are back on the dashboard, you can simply select your notebook and click “Rename” in the dashboard controls

    Jupyter notebook - Rename

    Shutting Down your Notebooks

    We can shutdown a running notebook by selecting “File > Close and Halt” from the notebook menu. However, we can also shutdown the kernel either by selecting the notebook in the dashboard and clicking “Shutdown” or by going to “Kernel > Shutdown” from within the notebook app (see images below).

    Shutdown the kernel from Notebook App:

     

    Shutdown the kernel from Dashboard:

     

     

    Sharing Your Notebooks

    When we talk about sharing a notebook, there are two things that might come to our mind. In most cases, we would want to share the end-result of the work, i.e. sharing non-interactive, pre-rendered version of the notebook, very much similar to this article; however, in some cases we might want to share the code and collaborate with others on notebooks with the aid of version control systems such as Git which is also possible.

    Before You Start Sharing

    The state of the shared notebook, including the output of any code cells, is preserved when it is exported to a file. Hence, to ensure that the notebook is share-ready, we should follow the steps below before sharing.

    1. Click “Cell > All Output > Clear”
    2. Click “Kernel > Restart & Run All”
    3. After the code cells have finished executing, validate the output. 

    This ensures that your notebooks don’t have a stale state or contain intermediary output.

    Exporting Your Notebooks

    Jupyter has built-in support for exporting to HTML, Markdown and PDF as well as several other formats, which you can find in the menu under "File > Download as". This is a very convenient way to share results with others. If sharing exported files isn't suitable for you, there are also some popular ways of sharing notebooks directly on the web.
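    Before looking at those, note that the export step itself can also be scripted through the nbconvert Python API. The following is a minimal sketch under the assumption that the nbconvert package is installed and that a file named notebook.ipynb exists in the current directory; the file names are just illustrative.

    from nbconvert import HTMLExporter

    # Convert the notebook into a standalone HTML document.
    exporter = HTMLExporter()
    body, resources = exporter.from_filename("notebook.ipynb")

    with open("notebook.html", "w") as f:
        f.write(body)

    # The command line equivalent is: jupyter nbconvert --to html notebook.ipynb

    As for sharing directly on the web, two popular options are: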

    • GitHub
      Home to over 2 million notebooks, GitHub is the most popular place for sharing Jupyter projects with the world. GitHub has integrated support for rendering .ipynb files directly, both in repositories and in gists, on its website. You can follow the GitHub guides to get started on your own.
    • Nbviewer
      NBViewer is one of the most prominent notebook renderers on the web. It renders notebooks from GitHub and other code storage platforms and provides a shareable URL along with them. nbviewer.jupyter.org provides a free rendering service as part of Project Jupyter.

    Data Analysis in a Jupyter Notebook

    Now that we’ve looked at what a Jupyter Notebook is, it’s time to look at how they’re used in practice, which should give you a clearer understanding of why they are so popular. As we walk through the sample analysis, you will be able to see how the flow of a notebook makes the task intuitive to work through ourselves, as well as for others to understand when we share it with them. We also hope to learn some of the more advanced features of Jupyter notebooks along the way. So let’s get started, shall we?

    Analyzing the Revenue and Profit Trends of Fortune 500 US companies from 1955-2013

    So, let’s say you’ve been tasked with finding out how the revenues and profits of the largest companies in the US changed historically over the past 60 years. We shall begin by gathering the data to analyze.

    Gathering the DataSet

    The data set that we will be using to analyze the revenue and profit trends of fortune 500 companies has been sourced from Fortune 500 Archives and Top Foreign Stocks. For your ease we have compiled the data from both the sources and created a CSV for you.

    Importing the Required Dependencies

    Let’s start off with a code cell specifically for imports and initial setup, so that if we need to add or change anything at a later point in time, we can simply edit and re-run the cell without having to change the other cells. We can start by importing Pandas to work with our data, Matplotlib to plot the charts and Seaborn to make our charts prettier.

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import sys

    Set the design styles for the charts

    sns.set(style="darkgrid")

    Load the Input Data to be Analyzed

    As we plan on using pandas to aid in our analysis, let’s begin by importing our input data set into the most widely used pandas data-structure, DataFrame.

    df = pd.read_csv('../data/fortune500_1955_2013.csv')

    Now that we are done loading our input dataset, let us see what it looks like!

    df.head()

    Looking good. Each row corresponds to a single company per year and all the columns we need are present.

    Exploring the Dataset

    Next, let’s begin by exploring our data set. We will primarily look into the number of records imported and the data types for each of the different columns that were imported.

    As we have 500 data points per year and since the data set has records between 1955 and 2012, the total number of records in the dataset looks good!

    Now, let's move on to the individual data types of each column.

    df.columns = ['year', 'rank', 'company', 'revenue', 'profit']
    len(df)

    df.dtypes

    As we can see from the output of the above command the data types for the columns revenue and profit are being shown as object whereas the expected data type should be float. It indicates that there may be some non-numeric values in the revenue and profit columns.

    So let’s first look at the details of imported values for revenue.

    non_numeric_revenues = df.revenue.str.contains('[^0-9.-]')
    df.loc[non_numeric_revenues].head()

    print("Number of Non-numeric revenue values: ", len(df.loc[non_numeric_revenues]))

    Number of Non-numeric revenue values:	1

    print("List of distinct Non-numeric revenue values: ", set(df.revenue[non_numeric_revenues]))

    List of distinct Non-numeric revenue values:	{'N.A.'}

    The number of non-numeric revenue values is negligible compared to the total size of our data set, so it is easiest to simply remove those rows.

    df = df.loc[~non_numeric_revenues]
    df.revenue = df.revenue.apply(pd.to_numeric)
    eval(In[6])  # re-run an earlier input cell to confirm the conversion worked

    Now that the data type issue for column revenue is resolved, let’s move on to values in column profit.

    non_numeric_profits = df.profit.str.contains('[^0-9.-]')
    df.loc[non_numeric_profits].head()

    print("Number of Non-numeric profit values: ", len(df.loc[non_numeric_profits]))

    Number of Non-numeric profit values:	374

    print("List of distinct Non-numeric profit values: ", set(df.profit[non_numeric_profits]))

    List of distinct Non-numeric profit values:	{'N.A.'}

    The number of non-numeric profit values is around 1.5% of our data set, which is small but not completely inconsequential. Let's take a quick look at how these values are distributed; if the rows with N.A. values are roughly uniformly distributed over the years, it would be reasonable to just remove them.

    bin_sizes, _, _ = plt.hist(df.year[non_numeric_profits], bins=range(1955, 2013))

    As observed from the histogram above, the number of invalid values in any single year is fewer than 25; since there are 500 data points per year, removing these rows accounts for less than 4% of the data. Also, other than a surge around 1990, most years have fewer than 10 values missing. Let's assume that this is acceptable and move ahead with removing these rows.
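    As a quick sanity check on that estimate, the overall fraction of rows about to be dropped can be computed in one line (a small sketch, assuming df and non_numeric_profits are defined as above):

    # Fraction of rows flagged as non-numeric profit values; expected to be roughly 1.5%.
    print(len(df.loc[non_numeric_profits]) / len(df))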

    df = df.loc[~non_numeric_profits]
    df.profit = df.profit.apply(pd.to_numeric)

    We should validate if that worked!

    eval(In[6])

    Hurray! Our dataset has been cleaned up.

    Time to Plot the graphs

    Let's begin by defining a function to plot the graph, set the title and add labels for the x-axis and y-axis.

    # function to plot the graphs for average revenues or profits of the fortune 500 companies against year
    def plot(x, y, ax, title, y_label):
        ax.set_title(title)
        ax.set_ylabel(y_label)
        ax.plot(x, y)
        ax.margins(x=0, y=0)
        
    # function to plot the graphs with superimposed standard deviation    
    def plot_with_std(x, y, stds, ax, title, y_label):
        ax.fill_between(x, y - stds, y + stds, alpha=0.2)
        plot(x, y, ax, title, y_label)

    Let’s plot the average profit by year and average revenue by year using Matplotlib.

    group_by_year = df.loc[:, ['year', 'revenue', 'profit']].groupby('year')
    avgs = group_by_year.mean()
    x = avgs.index
    y = avgs.profit
    
    fig, ax = plt.subplots()
    plot(x, y, ax, 'Increase in mean Fortune 500 company profits from 1955 to 2013', 'Profit (millions)')

    y2 = avgs.revenue
    fig, ax = plt.subplots()
    plot(x, y2, ax, 'Increase in mean Fortune 500 company revenues from 1955 to 2013', 'Revenue (millions)')

    Woah! The chart for profits has some huge ups and downs. They seem to correspond to the early 1990s recession, the dot-com bubble in the early 2000s and the Great Recession in 2008.

    On the other hand, the revenues are growing steadily and are comparatively stable. This also helps explain how the average profits recovered so quickly after the staggering drops caused by the recessions.

    Let’s also take a look at how the average profits and revenues compare to their standard deviations.

    fig, (ax1, ax2) = plt.subplots(ncols=2)
    title = 'Increase in mean and std Fortune 500 company %s from 1955 to 2013'
    stds1 = group_by_year.std().profit.values
    stds2 = group_by_year.std().revenue.values
    plot_with_std(x, y.values, stds1, ax1, title % 'profits', 'Profit (millions)')
    plot_with_std(x, y2.values, stds2, ax2, title % 'revenues', 'Revenue (millions)')
    fig.set_size_inches(14, 4)
    fig.tight_layout()

     

    That's astonishing: the standard deviations are huge. Some companies make billions while others lose just as much, and the risk has certainly increased along with rising profits and revenues over the years. Although we could keep playing around with our data set and plot plenty more charts, it is time to bring this article to a close.

    Conclusion

    As part of this article we have seen various features of the Jupyter notebooks, from basics like installation, creating, and running code cells to more advanced features like plotting graphs. The power of Jupyter Notebooks to promote a productive working experience and provide an ease of use is evident from the above example, and I do hope that you feel confident to begin using Jupyter Notebooks in your own work and start exploring more advanced features. You can read more about data analytics using Pandas here.

    If you’d like to further explore and want to look at more examples, Jupyter has put together A Gallery of Interesting Jupyter Notebooks that you may find helpful and the Nbviewer homepage provides a lot of examples for further references. Find the entire code here on Github.

  • Continuous Deployment with Azure Kubernetes Service, Azure Container Registry & Jenkins

    Introduction

    Containerization has taken the application development world by storm. Kubernetes has become the standard way of deploying new containerized distributed applications, used by the largest enterprises in a wide range of industries for mission-critical tasks, and it has become one of the biggest open-source success stories.

    Although Google Cloud has been providing Kubernetes as a service since November 2014 (Note it started with a beta project), Microsoft with AKS (Azure Kubernetes Service) and Amazon with EKS (Elastic Kubernetes Service)  have jumped on to the scene in the second half of 2017.

    There were wrapper tools available prior to these managed services which would help a user create a Kubernetes cluster, for example:

    • AWS had KOPS
    • Azure had Azure Container Service

    However, the management and the maintenance of such clusters (like monitoring and upgrades) needed manual effort.

    Azure Container Registry:

    With container demand growing, there is a clear need in the market for storing and protecting container images. Microsoft provides a geo-replicated private registry as a service, named Azure Container Registry.

    Azure Container Registry is a registry offering from Microsoft for hosting container images privately. It integrates well with orchestrators like Azure Container Service, including Docker Swarm, DC/OS, and the new Azure Kubernetes service. Moreover, ACR  provides capabilities such as Azure Active Directory-based authentication, webhook support, and delete operations.

    The coolest feature provided is geo-replication. It creates multiple copies of your image and distributes them across the globe, and a container, when spawned, pulls the image from the nearest replica.

    Although Microsoft has good documentation on how to set up ACR  in your Azure Subscription, we did encounter some issues and hence decided to write a blog on the precautions and steps required to configure the Registry in the correct manner.

    Note: We tried this using a free trial account. You can set it up by referring to the following link.

    Prerequisites:

    • Make sure you have resource groups created in the supported region.
      Supported Regions: eastus, westeurope, centralus, canada central, canadaeast
    • If you are using Azure CLI for operations please make sure you use the version: 2.0.23 or 2.0.25 (This was the latest version at the time of writing this blog)

    Steps to install Azure CLI 2.0.23 or 2.0.25 (ubuntu 16.04 workstation):

    echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ wheezy main" |            
    sudo tee /etc/apt/sources.list.d/azure-cli.list
    sudo apt-key adv --keyserver packages.microsoft.com --recv-keys 52E16F86FEE04B979B07E28DB02C46DF417A0893
    sudo apt-get install apt-transport-httpssudo apt-get update && sudo apt-get install azure-cli
    
    Install a specific version:
    
    sudo apt install azure-cli=2.0.23-1
    sudo apt install azure-cli=2.0.25.1

    Steps for Container Registry Setup:

    • Login to your Azure Account:
    az login --username <USERNAME> --password <PASSWORD>

    • Create a resource group:
    az group create --name <RESOURCE-GROUP-NAME>  --location eastus
    Example : az group create --name acr-rg  --location eastus

    • Create a Container Registry:
    az acr create --resource-group <RESOURCE-GROUP-NAME> --name <CONTAINER-REGISTRY-NAME> --sku Basic --admin-enabled true
    Example : az acr create --resource-group acr-rg --name testacr --sku Basic --admin-enabled true

    Note: SKU defines the storage available for the registry for type Basic the storage available is 10GB, 1 WebHook and the billing amount is 11 Rs/day.

    For detailed information on the different SKU available visit the following link

    • Login to the registry :
    az acr login --name <CONTAINER-REGISTRY-NAME>
    Example :az acr login --name testacr

    • Sample Dockerfile for a Node.js application:
    FROM node:carbon
    # Create app directory
    WORKDIR /usr/src/app
    # Install app dependencies
    COPY package*.json ./
    RUN npm install
    # Bundle app source
    COPY . .
    EXPOSE 8080
    CMD [ "npm", "start" ]

    • Build the docker image :
    docker build -t <image-name>:<image-tag> .
    Example : docker build -t base:node8 .

    • Get the login server value for your ACR :
    az acr list --resource-group acr-rg --query "[].{acrLoginServer:loginServer}" --output table
    Output  :testacr.azurecr.io

    • Tag the image with the Login Server Value:
      Note: Get the image ID from docker images command

    Example:

    docker tag image-id testacr.azurecr.io/base:node8

    • Push the image to the Azure Container Registry:

    Example:

    docker push testacr.azurecr.io/base:node8

    Microsoft does provide a GUI option to create the ACR.

    • List Images in the Registry:

    Example:

    az acr repository list --name testacr --output table

    • List tags for the Images:

    Example:

    az acr repository show-tags --name testacr --repository <name> --output table

    • How to use the ACR image in Kubernetes deployment: Use the login Server Name + the image name

    Example :

    containers:
    - name: demo
      image: testacr.azurecr.io/base:node8

    Azure Kubernetes Service

    Microsoft released the public preview of Managed Kubernetes for Azure Container Service (AKS) on October 24, 2017. This service simplifies the deployment, management, and operations of Kubernetes. It features an Azure-hosted control plane, automated upgrades, self-healing, easy scaling.

    Similar to Google GKE and Amazon EKS, this new service gives users access to the worker nodes only; the master is managed by the cloud provider. For more information visit the following link.

    Let’s now get our hands dirty and deploy an AKS infrastructure to play with:

    • Enable AKS preview for your Azure Subscription: At the time of writing this blog, AKS is in preview mode, it requires a feature flag on your subscription.
    az provider register -n Microsoft.ContainerService

    • Kubernetes Cluster Creation Command: Note: A new, separate resource group should be created for the Kubernetes service. Since the service is in preview, it is available only in certain regions.

    Make sure you create a resource group under the following regions.

    eastus, westeurope, centralus, canadacentral, canadaeast
    az  group create  --name  <RESOURCE-GROUP>   --location eastus
    Example : az group create --name aks-rg --location eastus
    az aks create --resource-group <RESOURCE-GROUP-NAME> --name <CLUSTER-NAME>   --node-count 2 --generate-ssh-keys
    Example : az aks create --resource-group aks-rg --name akscluster  --node-count 2 --generate-ssh-keys

    Example with different arguments :

    Create a Kubernetes cluster with a specific version.

    az aks create -g MyResourceGroup -n MyManagedCluster --kubernetes-version 1.8.1

    Create a Kubernetes cluster with a larger node pool.

    az aks create -g MyResourceGroup -n MyManagedCluster --node-count 7

    Install the Kubectl CLI :

    To connect to the Kubernetes cluster from a client computer, the kubectl command line client is required.

    sudo az aks install-cli

    Note: If you're using Azure Cloud Shell, kubectl is already installed. If you want to install it locally, run the above command.

    • To configure kubectl to connect to your Kubernetes cluster :
    az aks get-credentials --resource-group=<RESOURCE-GROUP-NAME> --name=<CLUSTER-NAME>

    Example :

    https://gist.github.com/velotiotech/ac40b6014a435271f49ca0e3779e800f

    • Verify the connection to the cluster :
    kubectl get nodes -o wide 

    • For all the command line features available for Azure check the link: https://docs.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest

    We had encountered a few issues while setting up the AKS cluster at the time of writing this blog. Listing them along with the workaround/fix:

    az aks create --resource-group aks-rg --name akscluster  --node-count 2 --generate-ssh-keys

    Error: Operation failed with status: ‘Bad Request’.

    Details: Resource provider registrations for Microsoft.Compute, Microsoft.Storage and Microsoft.Network are needed, so we need to enable them.

    Fix: If you are using the trial account, click on subscriptions and check whether the following providers are registered or not :

    • Microsoft.Compute
    • Microsoft.Storage
    • Microsoft.Network
    • Microsoft.ContainerRegistry
    • Microsoft.ContainerService

    Error: We had encountered the following mentioned open issues at the time of writing this blog.

    1. Issue-1
    2. Issue-2
    3. Issue-3

    Jenkins setup for CI/CD with ACR, AKS

    Microsoft provides a solution template which will install the latest stable Jenkins version on a Linux (Ubuntu 14.04 LTS) VM along with tools and plugins configured to work with Azure. This includes:

    • git for source control
    • Azure Credentials plugin for connecting securely
    • Azure VM Agents plugin for elastic build, test and continuous integration
    • Azure Storage plugin for storing artifacts
    • Azure CLI to deploy apps using scripts

    Refer to the link below to bring up the instance.

    Pipeline plan for Spinning up a Nodejs Application using ACR – AKS – Jenkins

    What the pipeline accomplishes :

    Stage 1:

    The code gets pushed to GitHub. The Jenkins job gets triggered automatically, and the Dockerfile is checked out from GitHub.

    Stage 2:

    Docker builds an image from the Dockerfile, and the image is tagged with the build number. Additionally, the latest tag is also attached to the image for the containers to use.

    Stage 3:

    We have default deployment and service YAML files stored on the Jenkins server. Jenkins makes a copy of the default YAML files, makes the necessary changes according to the build and puts them in a separate folder.

    Stage 4:

    kubectl was initially configured on the Jenkins server at the time of setting up AKS. The YAML files are fed to the kubectl utility, which in turn creates the pods and services.

    Sample Jenkins pipeline code :

    node {
        // Checkout the Dockerfile from GitHub
        stage('Checkout the Dockerfile from GitHub') {
            git branch: 'docker-file', credentialsId: 'git_credentials', url: 'https://gitlab.com/demo.git'
        }
        // Build the image and push it to ACR
        stage('Build the Image and Push to Azure Container Registry') {
            app = docker.build('testacr.azurecr.io/demo')
            withDockerRegistry([credentialsId: 'acr_credentials', url: 'https://testacr.azurecr.io']) {
                app.push("${env.BUILD_NUMBER}")
                app.push('latest')
            }
        }
        stage('Build the Kubernetes YAML Files for New App') {
            // <The code here will differ depending on the YAMLs used for the application>
        }
        stage('Deploying the App on Azure Kubernetes Service') {
            app = docker.image('testacr.azurecr.io/demo:latest')
            withDockerRegistry([credentialsId: 'acr_credentials', url: 'https://testacr.azurecr.io']) {
                app.pull()
                sh "kubectl create -f ."
            }
        }
    }

    What we achieved:

    • We managed to create a private Docker registry on Azure using the ACR feature using az-cli 2.0.25.
    • Secondly, we were able to spin up a private Kubernetes cluster on Azure with 2 nodes.
    • Set up Jenkins using a pre-cooked template which had all the plugins necessary for communication with ACR and AKS.
    • Orchestrated a Continuous Deployment pipeline in Jenkins which uses Docker features.
  • Extending Kubernetes APIs with Custom Resource Definitions (CRDs)

    Introduction:

    Custom Resource Definitions (CRDs) are a powerful feature introduced in Kubernetes 1.7 which enables users to add their own custom objects to the Kubernetes cluster and use them like any other native Kubernetes objects. In this blog post, we will see how we can add a custom resource to a Kubernetes cluster using the command line as well as using the Golang client library, thus also learning how to programmatically interact with a Kubernetes cluster.

    What is a Custom Resource Definition (CRD)?

    In the Kubernetes API, every resource is an endpoint that stores API objects of a certain kind. For example, the built-in service resource contains a collection of Service objects. The standard Kubernetes distribution ships with many inbuilt API objects/resources. CRDs come into the picture when we want to introduce our own object into the Kubernetes cluster to fulfill our requirements. Once we create a CRD in Kubernetes, we can use it like any other native Kubernetes object, thus leveraging all the features of Kubernetes like its CLI, security, API services, RBAC etc.

    The custom resources created are also stored in the etcd cluster with proper replication and lifecycle management. CRDs let us use all the functionality provided by a Kubernetes cluster for our custom objects and save us the overhead of implementing it on our own.

    How to register a CRD using command line interface (CLI)

    Step-1: Create a CRD definition file sslconfig-crd.yaml

    apiVersion: "apiextensions.k8s.io/v1beta1"
    kind: "CustomResourceDefinition"
    metadata:
      name: "sslconfigs.blog.velotio.com"
    spec:
      group: "blog.velotio.com"
      version: "v1alpha1"
      scope: "Namespaced"
      names:
        plural: "sslconfigs"
        singular: "sslconfig"
        kind: "SslConfig"
      validation:
        openAPIV3Schema:
          required: ["spec"]
          properties:
            spec:
              required: ["cert","key","domain"]
              properties:
                cert:
                  type: "string"
                  minimum: 1
                key:
                  type: "string"
                  minimum: 1
                domain:
                  type: "string"
                  minimum: 1 

    Here we are creating a custom resource definition for an object of kind SslConfig, which allows us to store the SSL configuration information for a domain. As we can see under the validation section, specifying the cert, key and domain is mandatory for creating objects of this kind; along with this, we can store other information like the provider of the certificate. The name metadata that we specify must be spec.names.plural + "." + spec.group.

    An API group (blog.velotio.com here) is a collection of API objects which are logically related to each other. We have also specified a version for our custom objects (spec.version); as the definition of the object is expected to change/evolve in the future, it's better to start with alpha so that the users of the object know that the definition might change later. In the scope, we have specified Namespaced; by default a custom resource is cluster-scoped.

    # kubectl create -f sslconfig-crd.yaml
    # kubectl get crd
    NAME                          AGE
    sslconfigs.blog.velotio.com   5s

    Step-2:  Create objects using the definition we created above

    apiVersion: "blog.velotio.com/v1alpha1"
    kind: "SslConfig"
    metadata:
      name: "sslconfig-velotio.com"
    spec:
      cert: "my cert file"
      key : "my private  key"
      domain: "*.velotio.com"
      provider: "digicert"

    # kubectl create -f crd-obj.yaml
    # kubectl get sslconfig
    NAME                    AGE
    sslconfig-velotio.com   12s

    Along with the mandatory fields cert, key and domain, we have also stored the information of the provider ( certifying authority ) of the cert.

    How to register a CRD programmatically using client-go

    The client-go project provides packages with which we can easily create a Go client and access the Kubernetes cluster. To create a client, we first need to establish a connection with the API server. How we connect depends on whether our code runs within the cluster (in the Kubernetes cluster itself) or outside the cluster (locally).

    If the code is running outside the cluster, then we need to provide either the path of the kubeconfig file or the URL of a kubectl proxy server running against the cluster.

    kubeconfig := filepath.Join(
        os.Getenv("HOME"), ".kube", "config",
    )
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
        log.Fatal(err)
    }

    OR

    var (
        // Set during build
        version string

        proxyURL = flag.String("proxy", "",
            `If specified, it is assumed that a kubectl proxy server is running on the
            given url and creates a proxy client. In case it is not given InCluster
            kubernetes setup will be used`)
    )

    if *proxyURL != "" {
        config, err = clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
            &clientcmd.ClientConfigLoadingRules{},
            &clientcmd.ConfigOverrides{
                ClusterInfo: clientcmdapi.Cluster{
                    Server: *proxyURL,
                },
            }).ClientConfig()
        if err != nil {
            glog.Fatalf("error creating client configuration: %v", err)
        }
    }

    When the code is to be run as a part of the cluster then we can simply use

    import "k8s.io/client-go/rest"  ...  rest.InClusterConfig() 

    Once the connection is established, we can use it to create a clientset. For accessing Kubernetes objects, the clientset from the client-go project is generally used, but for CRD-related operations we need to use the clientset from the apiextensions-apiserver project:

    apiextension "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"

    kubeClient, err := apiextension.NewForConfig(config)
    if err != nil {
        glog.Fatalf("Failed to create client: %v.", err)
    }

    Now we can use the client to make the API call which will create the CRD for us.

    package v1alpha1
    
    import (
    	"reflect"
    
    	apiextensionv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"
    	apiextension "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
    	apierrors "k8s.io/apimachinery/pkg/api/errors"
    	meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )
    
    const (
    	CRDPlural   string = "sslconfigs"
    	CRDGroup    string = "blog.velotio.com"
    	CRDVersion  string = "v1alpha1"
    	FullCRDName string = CRDPlural + "." + CRDGroup
    )
    
    func CreateCRD(clientset apiextension.Interface) error {
    	crd := &apiextensionv1beta1.CustomResourceDefinition{
    		ObjectMeta: meta_v1.ObjectMeta{Name: FullCRDName},
    		Spec: apiextensionv1beta1.CustomResourceDefinitionSpec{
    			Group:   CRDGroup,
    			Version: CRDVersion,
    			Scope:   apiextensionv1beta1.NamespaceScoped,
    			Names: apiextensionv1beta1.CustomResourceDefinitionNames{
    				Plural: CRDPlural,
    				Kind:   reflect.TypeOf(SslConfig{}).Name(),
    			},
    		},
    	}
    
    	_, err := clientset.ApiextensionsV1beta1().CustomResourceDefinitions().Create(crd)
    	if err != nil && apierrors.IsAlreadyExists(err) {
    		return nil
    	}
    	return err
    }

    In the CreateCRD function, we first build the definition of our custom object and then pass it to the Create method, which creates it in our cluster. Just like we did while creating our definition using the CLI, here too we set parameters like version, group, kind etc.

    Once our definition is ready we can create objects of its type just like we did earlier using the CLI. First we need to define our object.

    package v1alpha1
    
    import meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    
    type SslConfig struct {
    	meta_v1.TypeMeta   `json:",inline"`
    	meta_v1.ObjectMeta `json:"metadata"`
    	Spec               SslConfigSpec   `json:"spec"`
    	Status             SslConfigStatus `json:"status,omitempty"`
    }
    type SslConfigSpec struct {
    	Cert   string `json:"cert"`
    	Key    string `json:"key"`
    	Domain string `json:"domain"`
    }
    
    type SslConfigStatus struct {
    	State   string `json:"state,omitempty"`
    	Message string `json:"message,omitempty"`
    }
    
    type SslConfigList struct {
    	meta_v1.TypeMeta `json:",inline"`
    	meta_v1.ListMeta `json:"metadata"`
    	Items            []SslConfig `json:"items"`
    }

    Kubernetes API conventions suggest that each object must have two nested object fields that govern the object's configuration: the object spec and the object status. Objects must also have metadata associated with them. The custom objects that we define here comply with these standards. It is also recommended to create a list type for every type, so we have also created an SslConfigList struct.

    Now we need to write a function which will create a custom client which is aware of the new resource that we have created.

    package v1alpha1
    
    import (
    	meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/apimachinery/pkg/runtime"
    	"k8s.io/apimachinery/pkg/runtime/schema"
    	"k8s.io/apimachinery/pkg/runtime/serializer"
    	"k8s.io/client-go/rest"
    )
    
    var SchemeGroupVersion = schema.GroupVersion{Group: CRDGroup, Version: CRDVersion}
    
    func addKnownTypes(scheme *runtime.Scheme) error {
    	scheme.AddKnownTypes(SchemeGroupVersion,
    		&SslConfig{},
    		&SslConfigList{},
    	)
    	meta_v1.AddToGroupVersion(scheme, SchemeGroupVersion)
    	return nil
    }
    
    func NewClient(cfg *rest.Config) (*SslConfigV1Alpha1Client, error) {
    	scheme := runtime.NewScheme()
    	SchemeBuilder := runtime.NewSchemeBuilder(addKnownTypes)
    	if err := SchemeBuilder.AddToScheme(scheme); err != nil {
    		return nil, err
    	}
    	config := *cfg
    	config.GroupVersion = &SchemeGroupVersion
    	config.APIPath = "/apis"
    	config.ContentType = runtime.ContentTypeJSON
    	config.NegotiatedSerializer = serializer.DirectCodecFactory{CodecFactory: serializer.NewCodecFactory(scheme)}
    	client, err := rest.RESTClientFor(&config)
    	if err != nil {
    		return nil, err
    	}
    	return &SslConfigV1Alpha1Client{restClient: client}, nil
    }

    Building the custom client library

    Once we have registered our custom resource definition with the Kubernetes cluster, we can create objects of its type using the Kubernetes CLI as we did earlier. But to write controllers for these objects, or to develop custom functionality around them, we also need to build a client library through which we can access them from the Go API. For native Kubernetes objects, this type of library is provided for each object.

    package v1alpha1
    
    import (
    	meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/client-go/rest"
    )
    
    func (c *SslConfigV1Alpha1Client) SslConfigs(namespace string) SslConfigInterface {
    	return &sslConfigclient{
    		client: c.restClient,
    		ns:     namespace,
    	}
    }
    
    type SslConfigV1Alpha1Client struct {
    	restClient rest.Interface
    }
    
    type SslConfigInterface interface {
    	Create(obj *SslConfig) (*SslConfig, error)
    	Update(obj *SslConfig) (*SslConfig, error)
    	Delete(name string, options *meta_v1.DeleteOptions) error
    	Get(name string) (*SslConfig, error)
    }
    
    type sslConfigclient struct {
    	client rest.Interface
    	ns     string
    }
    
    func (c *sslConfigclient) Create(obj *SslConfig) (*SslConfig, error) {
    	result := &SslConfig{}
    	err := c.client.Post().
    		Namespace(c.ns).Resource("sslconfigs").
    		Body(obj).Do().Into(result)
    	return result, err
    }
    
    func (c *sslConfigclient) Update(obj *SslConfig) (*SslConfig, error) {
    	result := &SslConfig{}
    	err := c.client.Put().
    		Namespace(c.ns).Resource("sslconfigs").
    		Body(obj).Do().Into(result)
    	return result, err
    }
    
    func (c *sslConfigclient) Delete(name string, options *meta_v1.DeleteOptions) error {
    	return c.client.Delete().
    		Namespace(c.ns).Resource("sslconfigs").
    		Name(name).Body(options).Do().
    		Error()
    }
    
    func (c *sslConfigclient) Get(name string) (*SslConfig, error) {
    	result := &SslConfig{}
    	err := c.client.Get().
    		Namespace(c.ns).Resource("sslconfigs").
    		Name(name).Do().Into(result)
    	return result, err
    }

    We can add more methods like watch, update status etc. Their implementation will also be similar to the methods we have defined above. For looking at the methods available for various Kubernetes objects like pod, node etc. we can refer to the v1 package.

    Putting all things together

    Now in our main function we will get all the things together.

    package main
    
    import (
    	"flag"
    	"fmt"
    	"time"
    
    	"blog.velotio.com/crd-blog/v1alpha1"
    	"github.com/golang/glog"
    	apiextension "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
    	meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/client-go/rest"
    	"k8s.io/client-go/tools/clientcmd"
    	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
    )
    
    var (
    	// Set during build
    	version string
    
    	proxyURL = flag.String("proxy", "",
    		`If specified, it is assumed that a kubectl proxy server is running on the
    		given url and creates a proxy client. In case it is not given InCluster
    		kubernetes setup will be used`)
    )
    
    func main() {
    
    	flag.Parse()
    	var err error
    
    	var config *rest.Config
    	if *proxyURL != "" {
    		config, err = clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
    			&clientcmd.ClientConfigLoadingRules{},
    			&clientcmd.ConfigOverrides{
    				ClusterInfo: clientcmdapi.Cluster{
    					Server: *proxyURL,
    				},
    			}).ClientConfig()
    		if err != nil {
    			glog.Fatalf("error creating client configuration: %v", err)
    		}
    	} else {
    		if config, err = rest.InClusterConfig(); err != nil {
    			glog.Fatalf("error creating client configuration: %v", err)
    		}
    	}
    
    	kubeClient, err := apiextension.NewForConfig(config)
    	if err != nil {
    		glog.Fatalf("Failed to create client: %v", err)
    	}
    	// Create the CRD
    	err = v1alpha1.CreateCRD(kubeClient)
    	if err != nil {
    		glog.Fatalf("Failed to create crd: %v", err)
    	}
    
    	// Wait for the CRD to be created before we use it.
    	time.Sleep(5 * time.Second)
    
    	// Create a new clientset which include our CRD schema
    	crdclient, err := v1alpha1.NewClient(config)
    	if err != nil {
    		panic(err)
    	}
    
    	// Create a new SslConfig object
    
    	SslConfig := &v1alpha1.SslConfig{
    		ObjectMeta: meta_v1.ObjectMeta{
    			Name:   "sslconfigobj",
    			Labels: map[string]string{"mylabel": "test"},
    		},
    		Spec: v1alpha1.SslConfigSpec{
    			Cert:   "my-cert",
    			Key:    "my-key",
    			Domain: "*.velotio.com",
    		},
    		Status: v1alpha1.SslConfigStatus{
    			State:   "created",
    			Message: "Created, not processed yet",
    		},
    	}
    	// Create the SslConfig object we create above in the k8s cluster
    	resp, err := crdclient.SslConfigs("default").Create(SslConfig)
    	if err != nil {
    		fmt.Printf("error while creating object: %v\n", err)
    	} else {
    		fmt.Printf("object created: %v\n", resp)
    	}
    
    	obj, err := crdclient.SslConfigs("default").Get(SslConfig.ObjectMeta.Name)
    	if err != nil {
    		glog.Infof("error while getting the object %v\n", err)
    	}
    	fmt.Printf("SslConfig Objects Found:\n%v\n", obj)
    	select {}
    }

    Now if we run our code, our custom resource definition gets created in the Kubernetes cluster, along with an object of its type, just like with the CLI. The Docker image akash125/crdblog is built using the code discussed above; it can be pulled directly from Docker Hub and run in a Kubernetes cluster. After the image runs successfully, the CRD definition discussed above will be created in the cluster along with an object of its type. We can verify this using the CLI the way we did earlier, and we can also check the logs of the pod running the Docker image. The complete code is available here.

    Conclusion and future work

    We learned how to create a custom resource definition and objects of its type using the Kubernetes command line interface as well as the Golang client. We also learned how to programmatically access a Kubernetes cluster, which lets us build some really cool things on Kubernetes; for example, we can now create custom controllers for our resources which continuously watch the cluster for life cycle events of our objects and take the desired action accordingly. To read more about CRDs, refer to the following links:

  • Tutorial: Developing Complex Plugins for Jenkins

    Introduction

    Recently, I needed to develop a complex Jenkins plug-in for a customer in the containers & DevOps space. In the process, I realized that there is a lack of good documentation on Jenkins plugin development and that good information is very hard to find. That's why I decided to write this blog to share my knowledge of Jenkins plugin development.

    Topics covered in this Blog

    1. Setting up the development environment
    2. Jenkins plugin architecture: Plugin classes and understanding of the source code.
    3. Complex tasks: Tasks like the integration of REST API in the plugin and exposing environment variables through source code.
    4. Plugin debugging and deployment

    So let’s start, shall we?

    1. Setting up the development environment

    I have used Ubuntu 16.04 for this environment, but the steps remain identical for other flavors. The only difference will be in the commands used for each operating system.

    Let me give you a brief list of the requirements:

    1. Compatible JDK: Jenkins plugin development is done in Java. Thus a compatible JDK is what you need first. JDK 6 and above are supported as per the Jenkins documentation.
    2. Maven: Installation guide. I know many of us don’t like to use Maven, as it downloads stuff over the Internet at runtime but it’s required. Check this to understand why using Maven is a good idea.
    3. Jenkins: Check this Installation Guide. Obviously, you would need a Jenkins setup; it can be local or hosted on a server/VM.
    4. IDE for development: An IDE like Netbeans, Eclipse or IntelliJ IDEA is preferred. I have used Netbeans 8.1 for this project.

    Before going forward, please ensure that you have the above prerequisites installed on your system. Jenkins does have official documentation for setting up the environment – Check this. If you would like to use an IDE besides Netbeans, the above document covers that too.

    Let’s start with the creation of your project. I will explain with Maven commands and with use of the IDE as well.

    First, let’s start with the approach of using commands.

    It may be helpful to add the following to your ~/.m2/settings.xml (Windows users will find it at %USERPROFILE%\.m2\settings.xml):

    <settings>
     <pluginGroups>
       <pluginGroup>org.jenkins-ci.tools</pluginGroup>
     </pluginGroups>
    
    <profiles>
       <!-- Give access to Jenkins plugins -->
       <profile>
         <id>jenkins</id>
         <activation>
           <activeByDefault>true</activeByDefault> <!-- change this to false, if you don't like to have it on per default -->
         </activation>
    
         <repositories>
           <repository>
             <id>repo.jenkins-ci.org</id>
             <url>http://repo.jenkins-ci.org/public/</url>
           </repository>
         </repositories>
         
         <pluginRepositories>
           <pluginRepository>
             <id>repo.jenkins-ci.org</id>
             <url>http://repo.jenkins-ci.org/public/</url>
           </pluginRepository>
         </pluginRepositories>
       </profile>
     </profiles>
     
     <mirrors>
       <mirror>
         <id>repo.jenkins-ci.org</id>
         <url>http://repo.jenkins-ci.org/public/</url>
         <mirrorOf>m.g.o-public</mirrorOf>
       </mirror>
     </mirrors>
    </settings>

    This basically lets you use short names in commands e.g. instead of org.jenkins-ci.tools:maven-hpi-plugin:1.61:create, you can use hpi:create. hpi is the packaging style used to deploy the plugins.

    Create the plugin

    $ mvn -U org.jenkins-ci.tools:maven-hpi-plugin:create


    This will ask you a few questions, like the groupId (the Maven jargon for the package name) and the artifactId (the Maven jargon for your project name), then create a skeleton plugin from which you can start. This command should create the sample HelloWorldBuilder plugin.

    Command Explanation:

    • -U: Maven needs to update the relevant Maven plugins (check plugin updates).
    • hpi: this prefix specifies that the Jenkins HPI Plugin is being invoked, a plugin that supports the development of Jenkins plugins.
    • create is the goal which creates the directory layout and the POM for your new Jenkins plugin and it adds it to the module list.

    Source code tree would be like this:

    your-project-name/
      pom.xml
      src/
        main/
          java/
            <package folders, usually groupId + artifactId>/
              HelloWorldBuilder.java
          resources/
            <package folder>/HelloWorldBuilder/  (jelly files)

    Run “mvn package” which compiles all sources, runs the tests and creates a package – when used by the HPI plugin it will create an *.hpi file.

    Building the Plugin:

    Run mvn install in the directory where pom.xml resides. This is similar to the mvn package command, but at the end it will create your plugin's .hpi file which you can deploy. Simply copy the created .hpi file into the /plugins folder of your Jenkins setup. Restart Jenkins and you should see the plugin listed.

    Now let’s see how this can be done with IDE.

    With Netbeans IDE:

    I have used Netbeans for development (Download). Check your JDK version; the latest Netbeans version, 8.2, works with JDK 8. Once you install Netbeans, install the NetBeans plugin for Jenkins/Stapler development.

    You can now create plugin via New Project » Maven » Jenkins Plugin.

    This is the same as “mvn -U org.jenkins-ci.tools:maven-hpi-plugin:create” command which should create the simple “HelloWorldBuilder” application.

    Netbeans comes with Maven built in, so even if you don't have Maven installed on your system this should work. However, you may face errors accessing the Jenkins repo. Remember, we added some configuration in settings.xml in the very first step. If you have added that already, you shouldn't face any problem; if you haven't, you can add it to the Netbeans Maven settings.xml, which you can find at: netbeans_installation_path/java/maven/conf/settings.xml

    Now you have your "HelloWorldBuilder" application ready. This is shown as a TODO plugin in Netbeans. Simply run it (F6). This creates a Jenkins instance and runs it on port 8080. If you already have a local Jenkins setup, you need to stop it first, otherwise this will throw an exception. Go to localhost:8080/jenkins and create a simple job. In "Add Build Step" you should see the "Say Hello World" plugin already there.

    Now how it got there and the source code explanation is next.

    2. Jenkins plugin architecture and understanding

    Now that we have our sample HelloWorldBuilder plugin ready,  let’s see its components.

As you may know, a Jenkins plugin has two parts: the Build Step and the Post Build Step. This sample application is designed for the Build Step, which is why you see the “Say Hello World” plugin in the Build Step section. I am going to cover the Build Step itself.

Do you want to develop a Post Build plugin? Don’t worry, as the two don’t differ much. The difference is only in the classes we extend: for the Build step we extend “hudson.tasks.Builder” and for Post Build “hudson.tasks.Recorder”, and the Descriptor class extends “BuildStepDescriptor<Builder>” for the Build step and “BuildStepDescriptor<Publisher>” for Post Build.

    We will go through these classes in detail below:

    hudson.tasks.Builder Class:

In brief, this simply tells Jenkins that you are writing a Build Step plugin. A full explanation is here. You will see the “perform” method once you extend this class.

    @Override
    public boolean perform(AbstractBuild build, Launcher launcher, BuildListener listener)

Note that we are not implementing the “SimpleBuildStep” interface, which is in the HelloWorldBuilder source code. The perform method for that interface is a bit different from the one given above. My explanation is based on this perform method.

The perform method is called when you run your build. Through the parameters passed to it, you have full control over the configured build, and you can log to the Jenkins console using the listener object. What you should do here is access the values set by the user in the UI and perform the plugin’s work. Note that this method returns a boolean: true means the build succeeded, false means it failed.

    Understanding the Descriptor Class:  

You will notice there is a static inner class in your main class named DescriptorImpl. This class is used for handling the configuration of your plugin. When you click the “Configure” link in Jenkins, this class is used to load the configured data.

    You can perform validations here, save the global configuration and many things. We will see these in detail as when required. Now there is an overridden method:

@Override
public String getDisplayName() {
    return "Say Hello World";
}

    That’s why we see “Say Hello World” in the Build Step. You can rename it to what your plugin does.

@Override
public boolean configure(StaplerRequest req, JSONObject formData) throws FormException {
    // To persist global configuration information,
    // set that to properties and call save().
    useFrench = formData.getBoolean("useFrench");
    // ^ Can also use req.bindJSON(this, formData);
    // (easier when there are many fields; need set* methods for this, like setUseFrench)
    save();
    return super.configure(req, formData);
}

This method saves your configuration. You can also read global data here, like the “useFrench” attribute, which can be set from the Jenkins global configuration. If you would like to expose any global parameter, place it in the global.jelly file.

    Understanding Action class and jelly files:

To understand the main Action class and what its purpose is, let’s first understand the jelly files.

There are two main jelly files: config.jelly and global.jelly. The global.jelly file is used to set global parameters, while config.jelly is used for local parameter configuration. Jenkins uses these jelly files to show the parameters or fields on the UI. So anything you write in config.jelly will show up on the job’s configuration page as configurable.

    <f:entry title="Name" field="name">
    <f:textbox />
    </f:entry>

This is what is there in our HelloWorldBuilder application. It simply renders a textbox for entering a name.

    Jelly has its own syntax and supports HTML and Javascript as well. It has radio buttons, checkboxes, dropdown lists and so on.

How does Jenkins manage to pull the data set by the user? This is where our Action class comes into the picture. If you look at the structure of the sample application, it has a private field named name and a constructor.

@DataBoundConstructor
public HelloWorldBuilder(String name) {
    this.name = name;
}

This DataBoundConstructor annotation tells Jenkins to bind the values of the jelly fields. Notice that there is a field named “name” in the jelly file, and the same name is used here to receive the data. Whatever name you set in the field attribute of the jelly file must be used here as well, as they are tightly coupled.

    Also, add getters for these fields so that Jenkins can access the values.

@Override
public DescriptorImpl getDescriptor() {
    return (DescriptorImpl) super.getDescriptor();
}

This method gives you the instance of the Descriptor class. So if you want to access methods or properties of the Descriptor class from your Action class, you can use this.

    3. Complex tasks:

We now have a good idea of how the Jenkins plugin structure works. Now let’s start with some complex stuff.

On the internet, there are examples of how to render a selection box (drop-down) with static data. What if you want to load it in a dynamic manner? I came up with the solution below. We will use Amazon’s publicly available REST API for getting the coupons and load that data into the selection box.

Here, the objective is to load the data into the selection box. The response from the REST API is as below:

    "offers" : {    
      "AmazonChimeDialin" : {      
        "offerCode" : "AmazonChimeDialin",      
        "versionIndexUrl" : "/offers/v1.0/aws/AmazonChimeDialin/index.json",      
        "currentVersionUrl" : "/offers/v1.0/aws/AmazonChimeDialin/current/index.json",     
        "currentRegionIndexUrl" : "/offers/v1.0/aws/AmazonChimeDialin/current/region_index.json"    
       },    
       "mobileanalytics" : {      
        "offerCode" : "mobileanalytics",      
        "versionIndexUrl" : "/offers/v1.0/aws/mobileanalytics/index.json",      
        "currentVersionUrl" : "/offers/v1.0/aws/mobileanalytics/current/index.json",      
        "currentRegionIndexUrl" : "/offers/v1.0/aws/mobileanalytics/current/region_index.json"    
        }
        }

I have taken all these offers, created one dictionary, and rendered it on the UI. Thus the user will see the list of coupon codes and can choose any one of them.

    Let’s understand how to create the selection box and load the data into it.

    <f:entry title="select Offer From Amazon" field="getOffer">   
     <f:select id="offer-${editorId}" onfocus="getOffers(this.id)"/>  
     </f:entry>

This is the code that will generate the selection box on the configuration page. The “getOffer” field here means there is a field with the same name in the Action class.

When you create any selection box, Jenkins needs a doFill{fieldname}Items method in your Descriptor class. As we have seen, the Descriptor class is the configuration class; Jenkins tries to load the data from this method when you open the job’s configuration. So in this case, a “doFillGetOfferItems” method is required.

    After this, selection box should pop up on the configuration page of your plugin.

As we need dynamic behaviour here, we will perform some action and then load the data.

As an example, we will click a button and load the data into the selection box.

<f:validateButton title="Get Amazon Offers" progress="Fetching Offers..." method="getAmazonOffers"/>

Above is the code to create a button. In the method attribute, specify the backend method, which should be present in your Descriptor class. So when you click this button, the “getAmazonOffers” method will be called on the backend, and it will fetch the data from the API.

Now, when we click on the selection box, we need to show the contents. As I said earlier, Jelly does support HTML and JavaScript, so if you want dynamic behaviour, simply use JavaScript. In the selection box code of the jelly file, I have used the onfocus attribute, which points to the getOffers() JavaScript function.

Now you need to define this function inside a script tag, like this.

    <script> 
    function getOffers(){ 
    }
    </script>

Now, here we get the data from the backend and load it into the selection box. To do this, we need to understand some Jenkins objects.

1. Descriptor: As you now know, this object points to the Descriptor (configuration) class. So from jelly, at any point, you can call a method from your Descriptor class.
2. Instance: This is the object currently being configured on the configuration page (null if it’s a newly added instance). By using this, you can call the methods of your Action class, like the getters for its fields.

Now, how do we use these objects? To use them, you first need to bind them.

    <st:bind var="backend" value="${descriptor}"/>

Here you are binding the descriptor object to the backend variable, and this variable is now ready for use anywhere in config.jelly. Similarly, for the instance: <st:bind var="backend" value="${instance}"/> (if you bind both, use a different variable name for each).

To make a call, use backend.{backend method name}() and it will call your backend method.

But if you are calling this from JavaScript, then you need to use the @JavaScriptMethod annotation on the method being called.

We can now get the REST data from the backend function in JavaScript, and to load the data into the element we can use JavaScript’s document object.

E.g. var selection = document.getElementById("element-id"); This part is normal JavaScript.
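Putting this together, here is a hedged sketch of how the getOffers function could fill the selection box. It assumes the descriptor has been bound to the backend variable with st:bind (as above) and that a method on the Descriptor, here called getAmazonOffers, is annotated with @JavaScriptMethod and returns the list of offer codes:

<script>
// Sketch only: `backend` is the variable bound with st:bind above;
// getAmazonOffers is assumed to be a @JavaScriptMethod returning offer codes.
function getOffers(selectId) {
  backend.getAmazonOffers(function (response) {
    var offers = response.responseObject();   // unwrap the Stapler return value
    var selection = document.getElementById(selectId);
    selection.innerHTML = "";                  // clear any stale options
    for (var i = 0; i < offers.length; i++) {
      var option = document.createElement("option");
      option.value = offers[i];
      option.text = offers[i];
      selection.appendChild(option);
    }
  });
}
</script>

The selectId parameter matches the onfocus="getOffers(this.id)" call in the jelly snippet above, so each select element fills itself.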

So after clicking on the “Get Amazon Offers” button and then clicking on the selection box, it should now load the data.

Multiple Plugin Instances: If your plugin can be added as a Build Step multiple times, then multiple instances of it can be configured in the same job. If you try to do what we have done up till now, it will fail to load the data in the second instance. This is because an element with the same id already exists on the UI, so JavaScript gets confused about where to put the data. We need a mechanism to create different ids for the same fields.

I thought of one approach for this: get an index from the backend while the fields are being configured and add it as a suffix to the id attribute.

@JavaScriptMethod
public synchronized String createEditorId() {
    return String.valueOf(lastEditorId++);
}

This method simply returns the last id + 1 each time it is called. You now know how to call backend methods from jelly.

    <j:set var="editorId" value="${descriptor.createEditorId()}" />

In this manner, we set the ID value in the variable “editorId”, and this can be used while creating the fields.

(Check out the selection box creation code above. I have appended this variable to the id attribute.)

Now create as many instances as you want on the configuration page, and it should work fine.

    Exposing Environment Variables:

    Environment variables are needed quite often in Jenkins. Your plugin may require the support of some environment variables or the use of the built-in environment variables provided by Jenkins.

First, you need to create the EnvVars object.

EnvVars envVars = new EnvVars();
// Assign it to the build environment.
envVars = build.getEnvironment(listener);
// Put the values which you want to expose as environment variables.
envVars.put("offer", getOffer);

If you print this, you will get all the default Jenkins environment variables as well as the variables you have exposed. Using this, you can even use third-party plugins like the “Parameterized Trigger Plugin” to export the current build’s environment variables to different jobs. You can also read the value of any environment variable this way.

    4. Plugin Debugging and Deployment:

You have now got an idea of how to write a plugin in Jenkins; next, we will see how to debug issues and deploy the plugin. If you are using an IDE, then debugging is the same as for any Java program: set up breakpoints and run the project in debug mode.

If you want to perform any validation on fields, you need a doCheck{fieldname} method in the configuration (Descriptor) class, which returns a FormValidation object. In this example, we are validating the “name” field from our sample “HelloWorldBuilder” application.

public FormValidation doCheckName(@QueryParameter String value)
        throws IOException, ServletException {
    if (value.length() == 0)
        return FormValidation.error("Please set a name");
    if (value.length() < 4)
        return FormValidation.warning("Isn't the name too short?");
    return FormValidation.ok();
}

    Plugin deployment:  

We have now created the plugin; how are we going to deploy it? We created the plugin using the NetBeans IDE, and as I said earlier, if you want to deploy it on your local Jenkins setup you need to run mvn install and copy the .hpi file to the plugins folder.

But what if you want to publish it to the official Jenkins plugin repository? Well, it’s a pretty long process, and thankfully Jenkins has good documentation for it.

In short, you need a jenkins-ci.org account, and your public Git repo should contain the plugin source code. Raise an issue on JIRA to get space in their repository; as part of this process, they will fork your Git repo. Finally, release the plugin using Maven. The document above explains well what exactly needs to be done.

    Conclusion:

    We went through the basics of Jenkins plugin development such as classes, configuration, and some complex tasks.

Jenkins plugin development is not difficult, but I feel the poor documentation is what makes the task challenging. I have tried to cover the understanding I gained while developing the plugin; however, it is advisable to create a plugin only if the required functionality does not already exist.

    Below are some important links on plugin development:

    1. Jenkins post build plugin development: This is a very good blog which covers things like setting up the environment, plugin classes and developing Post build action.
    2. Basic guide to use jelly: This covers how to use jelly files in Jenkins and attributes of jelly. 

    You can check the code of the sample application discussed in this blog here. I hope this helps you to build interesting Jenkins plugins. Happy Coding!!

  • Node.js vs Deno: Is Deno Really The Node.js Alternative We All Didn’t Know We Needed?

Ryan Dahl gave an interesting talk at JSConf EU in 2018 on the 10 regrets he had after creating Node.js. He spoke about the flaws that developers don’t usually think about, such as how the entire package management was an afterthought. In addition, he was also not completely comfortable with the association with npm for package management, or with how he might have jumped to async/await too early, ignoring some potential advantages of promises.

    However, the thing that caught most people’s attention was his pet project (Deno), which he started to answer most of these issues. The project came as no surprise since no one discusses problems at length unless they are planning to solve them.

What first appeared to be a clever play on the word “node” turned out to be much more than that. Dahl was trying to make a secure V8 runtime with TypeScript and module management more in line with what we already have on the front-end. This goal was his original focus, but fast-forward to May 2020, and Deno 1.0 launched with many improvements. Let’s see what it is all about.

    What is Deno?

    Let’s start with a simple explanation: Deno executes TypeScript on your system like Node.js executes JavaScript. Just like Node.js, you use it to code async desktop apps and servers. The first visible difference is that you will be coding in TypeScript. However, if you can easily integrate the TypeScript compiler into Node.js for static type-checking, then why should you use Deno? First of all, Deno isn’t just a combination of Node.js and TypeScript—it’s an entirely new system designed from scratch. Below is a high-level comparison of Node.js and Deno. Bear in mind that these are just paper specs.

    As you can see, on paper, Deno seems promising and future-proof, but let’s take a more in-depth look.

    Let’s Install Deno

    As Deno is new, it intends to avoid a lot of things that add complexity to the Node ecosystem. You just need to run the following command to install; these examples were done in Linux:

    curl -fsSL https://deno.land/x/install/install.sh | sh

    This simple shell script downloads the Deno binary to a .deno directory in your home directory. That’s all. It’s a single binary without any dependencies, and it includes the V8 JavaScript engine, the TypeScript compiler, Rust binding crates, etc. After that, add the ‘deno’ binary to your PATH variable using:

    echo 'export DENO_INSTALL="/home/siddharth/.deno"' >> ~/.zshrc
    echo 'export PATH="$DENO_INSTALL/bin:$PATH"' >> ~/.zshrc

    Let’s Play

    Now that we’ve finished our installation, we can start executing TypeScript with it. Just use ‘deno run’ to execute a script.

    So, here is the “hello world” program as per the tradition:

echo 'console.log("Hello Deno")' > script.ts && deno run script.ts

It can also run a script from a URL. For instance, there is a “hello world” example in the standard library hosted here. You can run it directly by entering:

    deno run https://deno.land/std/examples/welcome.ts

Similar to an HTML script tag, Deno can fetch and execute scripts from anywhere, and you can do it in code, too. There is no concept of a “module” like there is with npm modules: Deno just needs a URL that points to a valid TypeScript file, and it will run/import it. You will also see import statements like:

    import { serve } from "https://deno.land/std@0.57.0/http/server.ts";

    This may seem counterintuitive and chaotic at first, but it makes a lot of sense. Since libraries/modules are just TypeScript files over the Internet, you don’t need a module system like npm to handle your dependencies. You don’t need a package.json file, either. Your project will not suddenly blow up if something goes wrong with npm’s registry.

    Going Further

    Let’s do something more meaningful. Here is a basic server using the HTTP server available with the standard library.

    import { serve } from "https://deno.land/std@0.57.0/http/server.ts";
    const server = serve({ port: 5000 });
    console.log("http://localhost:5000/");
    for await (const request of server) {  
  request.respond({ body: "Hello World\n" });
    }

    Take notice of the URL import that we were talking about. Next, make a file named server.ts, input this code, and try to run it with:

    deno run server.ts

    We get this:

    error: Uncaught PermissionDenied: network access to "0.0.0.0:5000", run again with the --allow-net flag
    at unwrapResponse ($deno$/ops/dispatch_json.ts:43:11)
    at Object.sendSync ($deno$/ops/dispatch_json.ts:72:10)
    at Object.listen ($deno$/ops/net.ts:51:10)
    at listen ($deno$/net.ts:154:22)
    at serve (https://deno.land/std@0.57.0/http/server.ts:260:20)
    at file:///home/siddharth/server.ts:2:11

This seems like a permission issue, which we should be able to solve by using ‘sudo’, right? Well, not exactly. As the message says, you have to run it with the --allow-net flag to explicitly permit it to access the network. Let’s try again with:

    deno run --allow-net server.ts

    Now it runs. So, what’s happening here? 

    Security

    This is another aspect that is completely missing in Node.js. Deno, by default, does not allow access to system resources like network, disk, etc. for any script. You have to explicitly give it permission for these resources which adds a layer of security and consent.

    If you are using a lesser-known library from a small developer, which we often do, you can always limit its scope to ensure that nothing shady is happening in the background. This also complements the “free and distributed” nature of Deno when it comes to adding dependencies. As there is no centralized authority to watch over and audit all the modules, everyone needs to have their own security tools.

There are also different flags available, which provide granular control over the system’s resources, like --allow-env for accessing environment variables. But if you trust the script entirely, or it is something you have written from scratch, you can use the -A flag to give it access to all resources.
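As a quick illustration, here is a minimal sketch (check_env.ts is just an example file name) showing that even reading an environment variable needs its own permission:

// check_env.ts
// Reading an environment variable needs the --allow-env permission:
//   deno run --allow-env check_env.ts
// Without the flag, Deno throws PermissionDenied instead of printing HOME.
const home = Deno.env.get("HOME");
console.log(`HOME is ${home}`);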

    Plugins and Experimental Features

    Let’s look at an example of interacting with a database. Connecting a MongoDB instance with Deno is another common thing that developers usually do.

    For a MongoDB instance, let’s spin up a basic MongoDB container with:

    docker run --name some-mongo -p 27017:27017 -d mongo

    Now, when we make a file named database.ts and put the following code into it, it will create a simple document into a new collection named “cities”:

    import { MongoClient } from "https://deno.land/x/mongo@v0.8.0/mod.ts";
    
    const client = new MongoClient();
    client.connectWithUri("mongodb://localhost:27017");
    
    const db = client.database("test");
    const users = db.collection("cities");
    
    // insert
    const insertId = await users.insertOne({
      name: "Delhi",
      country: "India",
      population: 10000
    });
    console.log(`Successfully inserted with the id: ${JSON.stringify(insertId)}`)

    Now, as you can see, this looks pretty similar to Node.js code. In fact, most of the programming style remains the same, and you can follow similar patterns. Next, let’s run this to insert that document into our MongoDB container:

    deno run -A database.ts

    What happens is that you get an error that looks something like this:

    INFO load deno plugin "deno_mongo" from local "/home/siddharth/.deno_plugins/deno_mongo_2970fbc7cebff869aa12ecd5b8a1e7e4.so"
    error: Uncaught TypeError: Deno.openPlugin is not a function
    return Deno.openPlugin(localPath);
    ^
    at prepare (https://deno.land/x/plugin_prepare@v0.6.0/mod.ts:64:15)
    at async init (https://deno.land/x/mongo@v0.8.0/ts/util.ts:41:3)
    at async https://deno.land/x/mongo@v0.8.0/mod.ts:13:1

This happened even though we gave the -A flag, which allows access to all kinds of resources.

Let’s rerun this with the --unstable flag:

    deno run -A --unstable database.ts

    Now, this seems to run, and the output should look something like:

    INFO load deno plugin "deno_mongo" from local "/home/siddharth/.deno_plugins/deno_mongo_2970fbc7cebff869aa12ecd5b8a1e7e4.so"
    Successfully inserted with the id: {"$oid":"5ef0ba4a000d214a00c4367f"}

This happens because the MongoDB driver uses some extra capabilities (“ops”, to be precise). They are not present in Deno’s runtime, so it adds a plugin. While Deno has a plugin system, the interface itself is not finalized and is hidden behind the --unstable flag. By default, Deno doesn’t allow scripts to use unstable APIs, but again, this flag forces it.

    The Bigger Picture

    Why do we need a different take? Are the problems with Node so big that we need a new system? Well, no. Many people won’t even consider them to be problems, but there is a central idea behind Deno that makes its existence reasonable and design choices understandable:

    Node deviates significantly from the browser’s way of doing things.

    For example, take the permissions for when a website wants to record audio; the browser will ask the user to give consent. These kinds of permissions were absent from Node, but Deno brings them back.

    Also, regarding dependencies, a browser doesn’t understand a Node module; it just understands scripts that can be linked from anywhere around the web. Node.js is different. You have to make and publish a module so that it can be imported and reused globally. Take the fetch API, for example. To use fetch in Node, there is a different node-fetch module. Deno goes back to simple scripts and tries its best to do things similar to a browser.
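For instance, here is a minimal sketch (the GitHub API URL is just an illustrative public JSON endpoint) of using the browser-style fetch global in Deno, with no extra module required; run it with deno run --allow-net fetch_example.ts:

// fetch_example.ts
// fetch is available globally in Deno, exactly as in the browser.
const response = await fetch("https://api.github.com/repos/denoland/deno");
const repo = await response.json();
console.log(`Stars: ${repo.stargazers_count}`);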

    This is the overall theme even with the implementational details. Deno tries to be as close to the browser as possible so that there is minimal friction while porting libraries from front-end to back-end or vice-versa. This can be better in the long term.

    This All Looks Great, but I Have Several Questions

    Like every new take on an already-established system, Deno also raises several questions. Here are some answers to some common ones:

    If it runs TypeScript natively, then what about the speed? Node is fast because of V8.

The important question is whether Deno actually ‘runs’ TypeScript.

    Well, yes, but actually no.

Deno executes TypeScript, but it also uses V8 to run the code. All the type checks are done beforehand, and at runtime it’s only JavaScript. Everything is abstracted away from the developer, and you don’t have to install and configure tsc.

    So yes, it’s fast because it also runs on V8, and there are no runtime types.

    The URL imports look ugly and fragile. What happens when the website of one of the dependencies goes down?

The first thing is that Deno downloads and caches every dependency, and the Deno team recommends checking these cached dependencies into version control along with your project so that they are always available.

    And if you don’t want to see URL imports in your code, you can do two things:

1. Re-export the dependencies locally: To export the standard HTTP server locally, you can make a file named ‘local_http.ts’ with the line ‘export { serve } from "https://deno.land/std@0.57.0/http/server.ts"’ and then import from this file in the original code (see the sketch after the import-map example below).

    2. Use an import map: Create a JSON file that maps the URLs to the name you want to use in code. So, create a file named ‘importmap.json’ and add the following content to it:

    {
      "imports": {
         "http/": "https://deno.land/std/http/"
      }
    }

    Now, you just need to provide this as the importmap to use when you run the script:

    deno run -A --importmap=importmap.json script.ts

    And you can import the serve function from the HTTP name like:

    import { serve } from "http/server.ts";
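And for the first option, a minimal sketch of the re-export approach (using the hypothetical local_http.ts file name mentioned above) looks like this:

// local_http.ts - keep the versioned URL in one place
export { serve } from "https://deno.land/std@0.57.0/http/server.ts";

// server.ts - the rest of the code imports from the local file instead:
// import { serve } from "./local_http.ts";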

    Is it safe to rely on the URLs for versioning? What happens if the developer pushes the latest build on the same URL and not a new one?

    Well, then it’s the developer’s fault, and this can also happen with the npm module system. But if you are still unsure whether you have cached the latest dependencies, then there is an option to reload some or all of them. 

    Conclusion

Deno is an interesting project, to say the least. We only have the first stable version, and it has a long way to go. For instance, they are actively working on improving the performance of the TypeScript compiler. Also, there are a good number of APIs hidden behind the --unstable flag; these may change in upcoming releases. Ideas like TypeScript first and browser-compatible modules are certainly appealing, which makes Deno worth keeping an eye on.

  • Scalable Real-time Communication With Pusher

    What and why?

    Pusher is a hosted API service which makes adding real-time data and functionality to web and mobile applications seamless. 

Pusher works as a real-time communication layer between the server and the client. It maintains persistent connections to clients using WebSockets, so that as and when new data is added to your server, it can be pushed to clients instantly. It is highly flexible, scalable, and easy to integrate. Pusher exposes over 40 SDKs that support almost all tech stacks.

In the context of delivering real-time data, there are other hosted and self-hosted services available. What you need depends on the use case, for example whether you need to broadcast data to all users or do something more complex with specific target groups. In our use case, Pusher was well-suited; the decision was based on ease of use, scalability, private and public channels, webhooks, and event-based automation. Other options we considered were Socket.IO, Firebase, Ably, etc.

Pusher is categorically well-suited for communication and collaboration features using WebSockets. The key difference with Pusher is that it’s a hosted service/API. It takes less work to get started compared to alternatives where you need to manage the deployment yourself, and once the setup is done, Pusher handles the scaling, which reduces future effort.

    Some of the most common use cases of Pusher are:

    1. Notification: Pusher can inform users if there is any relevant change.  Notifications can also be thought of as a form of signaling, where there is no representation of the notification in the UI. Still, it triggers a reaction within an application.

    2. Activity streams: Stream of activities which are published when something changes on the server or someone publishes it across all channels.

    3. Live Data Visualizations: Pusher allows you to broadcast continuously changing data when needed.

    4. Chats: You can use Pusher for peer to peer or peer to multichannel communication.

    In this blog, we will be focusing on using Channels, which is an alias for Pub/Sub messaging API for a JavaScript-based application. Pusher also comes with Chatkit and Beams (Push Notification) SDK/APIs.

    • Chatkit is designed to make chat integration to your app as simple as possible. It allows you to add group chat and 1 to 1 chat feature to your app. It also allows you to add file attachments and online indicators.
    • Beams are used for adding Push Notification in your Mobile App. It includes SDKs to seamlessly manage push token and send notifications.

    Step 1: Getting Started

    Setup your account on the Pusher dashboard and get your free API keys.

    Image Source: Pusher

    1. Click on Channels
    2. Create an App. Add details based on the project and the environment
    3. Click on the App Keys tab to get the app keys.
    4. You can also check the getting started page. It will give code snippets to get you started.

    Add Pusher to your project:


    CODE: https://gist.github.com/velotiotech/f09f14363bacd51446d5318e5050d628.js

    or using npm

    npm i pusher

    CODE: https://gist.github.com/velotiotech/423115d0943c1b882c913e437c529d11.js

    Step 2: Subscribing to Channels

    There are three types of channels in Pusher: Public, Private, and Presence.

• Public channels: These channels are public in nature, so anyone who knows the channel name can subscribe to it and start receiving messages. Public channels are commonly used to broadcast general/public information that does not contain any secure or user-specific data.
• Private channels: These channels have an access control mechanism that allows the server to control who can subscribe to the channel and receive data from it. All private channel names must be prefixed with private-. They are commonly used when the server needs to know who can subscribe to the channel and validate the subscribers.
• Presence channels: These are an extension of private channels. In addition to the properties that private channels have, they let the server ‘register’ user information on subscription to the channel and enable other members to identify who is online.

In your application, you can create a subscription and start listening to events like this:

// Here my-channel is the channel name
// all the events published to this channel would be available
// once you subscribe to the channel and start listening to it.
    
    var channel = pusher.subscribe('my-channel');
    
    channel.bind('my-event', function(data) {
      alert('An event was triggered with message: ' + data.message);
    });

    CODE: https://gist.github.com/velotiotech/d8c27960e2fac408a8db57b92f1e846d.js
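The snippet above assumes a client-side Channels object already exists. A hedged sketch of creating one with the pusher-js client library (APP_KEY and APP_CLUSTER are placeholders from your dashboard, and the channel names are examples) and subscribing to the three channel types could look like this:

import Pusher from 'pusher-js';

// authEndpoint is only needed for private- and presence- channels (see Step 4).
var pusher = new Pusher('APP_KEY', {
  cluster: 'APP_CLUSTER',
  authEndpoint: '/pusher/auth'
});

// The channel name prefix decides the channel type.
var publicChannel = pusher.subscribe('my-channel');                // public
var privateChannel = pusher.subscribe('private-orders');           // needs the auth endpoint
var presenceChannel = pusher.subscribe('presence-collaboration');  // auth + user info

// Presence channels also tell you who is currently subscribed.
presenceChannel.bind('pusher:subscription_succeeded', function(members) {
  console.log('Members online: ' + members.count);
});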

    Step 3: Creating Channels

    For creating channels, you can use the dashboard or integrate it with your server. For more details on how to integrate Pusher with your server, you can read (Server API). You need to create an app on your Pusher dashboard and can use it to further trigger events to your app.

    or 

    Integrate Pusher with your server. Here is a sample snippet from our node App:

    var Pusher = require('pusher');
    
    var pusher = new Pusher({
      appId: 'APP_ID',
      key: 'APP_KEY',
      secret: 'APP_SECRET',
      cluster: 'APP_CLUSTER'
    });
    
    // Logic which will then trigger events to a channel
    function trigger(){
    ...
    ...
    pusher.trigger('my-channel', 'my-event', {"message": "hello world"});
    ...
    ...
    }

    CODE: https://gist.github.com/velotiotech/6f5b0f6407c0a74a0bce4b398a849410.js

    Step 4: Adding Security

    As a default behavior, anyone who knows your public app key can open a connection to your channels app. This behavior does not add any security risk, as connections can only access data on channels. 

For more advanced use cases, you need to use the “Authorized Connections” feature. It authorizes every single connection to your Channels app and hence avoids unwanted/unauthorized connections. To enable authorization, set up an auth endpoint, then modify your client code to look like this.

    const channels = new Pusher(APP_KEY, {
      cluster: APP_CLUSTER,
      authEndpoint: '/your_auth_endpoint'
    });
    
    const channel = channels.subscribe('private-<channel-name>');

    CODE: https://gist.github.com/velotiotech/9369051e5661a95352f08b1fdd8bf9ed.js

For more details on how to create an auth endpoint for your server, read this. Here is a snippet from a Node.js app:

var express = require('express');
var bodyParser = require('body-parser');
var Pusher = require('pusher');

// The endpoint needs a configured Pusher instance (credentials from your dashboard).
var pusher = new Pusher({
  appId: 'APP_ID',
  key: 'APP_KEY',
  secret: 'APP_SECRET',
  cluster: 'APP_CLUSTER'
});

var app = express();
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: false }));

app.post('/pusher/auth', function(req, res) {
  var socketId = req.body.socket_id;
  var channel = req.body.channel_name;
  var auth = pusher.authenticate(socketId, channel);
  res.send(auth);
});

var port = process.env.PORT || 5000;
app.listen(port);

    CODE: https://gist.github.com/velotiotech/fb67d5efe3029174abc6991089a910e1.js

    Step 5: Scale as you grow

     

Pusher comes with a wide range of plans which you can subscribe to based on your usage, so you can scale your application as it grows. Here is a snippet of the available plans; for more details, you can refer to this.

    Image Source: Pusher

    Conclusion

This article has covered a brief description of Pusher, its use cases, and how you can use it to build a scalable real-time application. How you use Pusher may vary across use cases, and there is no single right choice of tool. Pusher’s approach is simple and API-based, and it enables developers to add real-time functionality to any application in very little time.

    If you want to get hands-on tutorials/blogs, please visit here.

  • The Ultimate Cheat Sheet on Splitting Dynamic Redux Reducers

This post is specific to the need for code-splitting in React/Redux projects. While exploring ways to optimise an application, a common problem occurs with reducers. This article specifically focuses on how we can split reducers so that they can be delivered in chunks.

    What are the benefits of splitting reducers in chunks?

1) True code splitting is possible

2) A good architecture can be maintained by keeping page/component-level reducers isolated from the rest of the application, minimising cross-dependencies.

    Why Do We Need to Split Reducers?

    1. For fast page loads

Splitting reducers has the advantage of loading only the required part of the web application, which in turn improves the rendering time of the main pages.

    2. Organization of code

Splitting reducers at the page or component level gives better code organization than putting all reducers in one place. Since a reducer is loaded only when its page/component is loaded, pages become standalone and are not dependent on other parts of the application. That makes development seamless, since it avoids cross-references between reducers and the complexity they bring.

3. One page/component, one reducer

This is the design pattern we get: things are better written, read, and understood when they are modular, and dynamic reducers make it possible to achieve this.

    4. SEO

SEO is a vast topic, but rankings are hit very hard if your website has huge response times, which happens when code is not split. With reducer-level code splitting, reducers can be split at the component level, which reduces the loading time of the website and thereby improves SEO rankings.

    What Exists Today?

A little googling around the topic shows us some options; various approaches have been discussed here.

Dan Abramov’s answer is what we are following in this post, and we will be writing a simple abstraction to get dynamic reducers with more functionality.

A lot of solutions already exist, so why do we need to create our own? The answer is simple and straightforward:

   1) Ease of use

Every library out there is a little quirky in some way. Some have complex APIs, while others require too much boilerplate code. We will target an API close to react-redux’s.

   2) The limitation of adding reducers at the top level only

This is a very common problem that a lot of existing libraries have today. That is what we will target to solve in this post, and it opens new possibilities for code splitting at the component level.

    A quick recap of redux facts:

1) Redux gives us the following methods:
– “getState”
– “dispatch(action)”
– “subscribe(listener)”
– “replaceReducer(nextReducer)”

2) Reducers are plain functions returning the next state of the application

3) “replaceReducer” requires the entire root reducer.

    What we are going to do?

We will be writing an abstraction around “replaceReducer” to develop an API that allows us to inject a reducer at a given key dynamically.

    A simple Redux store definition goes like the following:
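A minimal sketch of such a conventional definition (with a placeholder static reducer for illustration) could be:

import { createStore, combineReducers } from 'redux';

// The root reducer is fixed at store creation time.
const rootReducer = combineReducers({
  app: (state = {}, action) => state, // placeholder static reducer
});

const store = createStore(rootReducer);
export default store;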

    Let’s simplify the store creation wrapper as:
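A hedged sketch of such a wrapper, using the asyncReducers and attachReducer names discussed below (attachReducer is intentionally left partial here, only top-level keys are handled, and dynamicActionGenerator is omitted), could be:

import { createStore as createReduxStore } from 'redux';

// Only plain functions are accepted as reducers (see isValidReducer below).
const isValidReducer = (reducer) => typeof reducer === 'function';

export function createStore(rootReducer, initialState, enhancer) {
  const store = createReduxStore(rootReducer, initialState, enhancer);

  // Mapping of dynamically added reducers, keyed by state slice name.
  store.asyncReducers = {};

  // Partial for now: it only records the reducer; rebuilding the root
  // reducer via replaceReducer is completed further below.
  store.attachReducer = (key, reducer) => {
    if (!isValidReducer(reducer)) return;
    store.asyncReducers[key] = reducer;
  };

  return store;
}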

    What it Does?

“dynamicActionGenerator” and “isValidReducer” are helper functions used to determine whether a given reducer is valid or not.

    For e.g.

    CODE:

isValidReducer(() => { return {} }) // should return true
isValidReducer(1) // should return false
isValidReducer(true) // should return false
isValidReducer("example") // should return false

This is an essential check to ensure that all inputs to our abstraction layer over createStore are valid reducers.

“createStore” takes the initial root reducer, the initial state, and the enhancers that will be applied to the created store.

In addition to that, we are maintaining “asyncReducers” and “attachReducer” on the store object.

“asyncReducers” keeps the mapping of dynamically added reducers.

“attachReducer” is partial in the above implementation, and we will see the complete implementation below. The basic use of “attachReducer” is to add a reducer from any part of the web application.

Given that, our store object now looks as follows:

    Store:

    CODE:

    - getState: Func
    - dispatch(action): Func
    - subscribe(listener): Func
    - replaceReducer(RootReducer): Func
    - attachReducer(reducer): Func
    - asyncReducers: JSONObject

Now here is an interesting problem: replaceReducer requires a final root reducer function. That means we would have to recreate the root reducer every time.
So we will create a dynamicRootReducer function to simplify the process.

    So now our store object becomes as follows:
    Store:

    CODE:

    - getState: Func
    - dispatch(action) : Func
    - subscribe(listener) : Func
    - replaceReducer(RootReducer) : Func
    - attachReducer(reducer) : Func

What does dynamicRootReducer do?
1) It processes the initial root reducer passed to it
2) It executes the dynamic reducers to get the next state.
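A hedged sketch of a dynamicRootReducer along those lines (again handling top-level keys only) could be:

// Runs the initial root reducer first, then each dynamically attached
// reducer on its own slice of state.
const dynamicRootReducer = (rootReducer, asyncReducers) => (state, action) => {
  // 1) process the initial root reducer passed to it
  let nextState = rootReducer(state, action);
  // 2) execute the dynamic reducers to get the next state
  Object.keys(asyncReducers).forEach((key) => {
    nextState = { ...nextState, [key]: asyncReducers[key](nextState[key], action) };
  });
  return nextState;
};

// Inside attachReducer, the store wrapper would then call:
//   store.replaceReducer(dynamicRootReducer(rootReducer, store.asyncReducers));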

So we now have an API exposed as:

store.attachReducer("home", (state = {}, action) => { return state }); // Will add a dynamic reducer after the store has been created

store.attachReducer("home.grid", (state = {}, action) => { return state }); // Will add a dynamic reducer at a given nested key in the store.
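As a usage sketch, a lazily loaded page bundle can attach its own reducer when it is first loaded (./store and ./homeReducer are hypothetical module paths):

import store from './store';

export async function loadHomePage() {
  // The page reducer ships in its own chunk and is attached on demand.
  const { default: homeReducer } = await import('./homeReducer');
  store.attachReducer('home', homeReducer);
}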

    Final Implementation:

    Working Example:

    Further implementations based on simplified code:

Based on this, I have simplified the implementation into two libraries:

    Conclusion

In this way, we can achieve code splitting with reducers, which is a very common need in almost every react-redux application. With the above solution you can do code splitting at the page level or the component level, and you can also create reusable stateful components that use Redux state. The simplified approach will reduce your application boilerplate. Moreover, common complex components like a grid, or even whole pages like login, can be exported and imported from one project to another, making development faster than ever!