More often than not, organizations need to apply various kinds of policies to the environments where they run their applications. These policies might be required to meet compliance requirements, achieve a higher degree of security, standardize across multiple environments, and so on. This calls for an automated, declarative way to define and enforce policies, and policy engines like OPA help us do exactly that.
Motivation behind Open Policy Agent (OPA)
When we run an application, it generally comprises multiple subsystems. Even in the simplest of cases, we will have an API gateway/load balancer, one or two applications, and a database. All of these subsystems typically authorize requests differently: the application might use JWT tokens, while the database uses grants; the application may also access third-party APIs or cloud services, each with yet another way of authorizing requests. Add to this your CI/CD servers, your log server, and so on, and you can see how many different authorization mechanisms can exist even in a small system.
The existence of so many authorization models makes life difficult when we need to meet compliance, information security, or even self-imposed organizational requirements. For example, to adhere to a new compliance requirement, we need to understand and update every component in our system that performs authorization.
“The main motivation behind OPA is to achieve unified policy enforcement across the stack.”
What are Open Policy Agent (OPA) and OPA Gatekeeper
OPA is an open-source, general-purpose policy engine that can be used to enforce policies on various types of software systems like microservices, CI/CD pipelines, gateways, Kubernetes, etc. OPA was developed by Styra and is currently a part of the CNCF.
OPA provides REST APIs that our system can call to check whether the policies are met for a given request payload. It also provides a high-level declarative language, Rego, which lets us specify the policies we want to enforce as code. This gives us a lot of flexibility when defining policies.
The above image shows the architecture of OPA. OPA exposes APIs that any service needing an authorization or policy decision can call (a policy query); OPA then makes a decision based on the Rego code for the policy and returns it to the service, which processes the request accordingly. Enforcement is done by the service itself; OPA is responsible only for making the decision. This is what makes OPA a general-purpose policy engine that supports a large number of services.
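For example, a policy query against OPA's Data API and the decision it returns might look like this (the policy path and payload are illustrative, not from the post):

```
POST /v1/data/httpapi/authz/allow
{
  "input": {
    "user": { "role": "admin" },
    "method": "GET",
    "path": "/finance/salary"
  }
}

Response:
{ "result": true }
```

The calling service is free to interpret `result` however it likes; OPA only evaluates the Rego policy against the supplied input.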
The Gatekeeper project is a Kubernetes-specific implementation of OPA. Gatekeeper allows us to use OPA in a Kubernetes-native way to enforce the desired policies.
How Gatekeeper enforces policies
On a Kubernetes cluster, Gatekeeper is installed as a ValidatingAdmissionWebhook. Admission controllers can intercept requests after they have been authenticated and authorized by the K8s API server, but before they are persisted in the database. If any admission controller rejects a request, the overall request is rejected. The limitation of built-in admission controllers is that they need to be compiled into the kube-apiserver and can be enabled only when the apiserver starts up.
To overcome this rigidity, admission webhooks were introduced. Once admission webhooks are enabled in our cluster, the corresponding controllers can send admission requests to external HTTP callbacks and receive admission responses. Admission webhooks come in two types: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. The difference between the two is that mutating webhooks can modify the objects they receive while validating webhooks cannot. The below image roughly shows the flow of an API request once both mutating and validating admission controllers are enabled.
The role of Gatekeeper is to simply check if the request meets the defined policy or not, that is why it is installed as a validating webhook.
Now we have Gatekeeper up and running in our cluster. The above installation also created a CRD named `constrainttemplates.templates.gatekeeper.sh`. This CRD allows us to create constraint templates for the policies we want to enforce. In a constraint template, we define the constraint logic as Rego code along with its schema. Once the constraint template is created, we can create constraints, which are instances of the constraint template created for specific resources. Think of it as functions and function calls: constraint templates are like functions, and constraints invoke them with different parameter values (resource kind and other values).
To get a better understanding, let’s go ahead and create a constraint template and constraints.
The policy we want to enforce is to prevent developers from creating a Service of type LoadBalancer in the `dev` namespace of the cluster, which they use to verify their code. Creating LoadBalancer Services in the dev environment adds unnecessary cost.
In the constraint template spec, we define a new object kind that we will use while creating constraints; then, in the target, we specify the Rego code that verifies whether a request meets the policy. In the Rego code, we define a violation rule that denies the request if it creates a Service of type LoadBalancer.
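A constraint template along these lines might look like the following sketch (the kind name, template name, and message are assumptions for illustration, not the exact manifest from the post):

```yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sdenylbsvc
spec:
  crd:
    spec:
      names:
        # this kind becomes available for creating constraints
        kind: K8sDenyLBSvc
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdenylbsvc

        # deny the request if it creates a Service of type LoadBalancer
        violation[{"msg": msg}] {
          input.review.object.kind == "Service"
          input.review.object.spec.type == "LoadBalancer"
          msg := "Services of type LoadBalancer are not allowed in this namespace"
        }
```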
Using the above template, we can now define constraints:
Here we have specified the kind of the Kubernetes object (Service) on which we want to apply the constraint and we have specified the namespace as dev because we want the constraint to be enforced only on the dev namespace.
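A matching constraint might be sketched like this (the constraint name matches the one referenced later in this post; the kind name is the one assumed for the template):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyLBSvc
metadata:
  name: deny-lb-type-svc-dev-ns
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Service"]
    # enforce only in the dev namespace
    namespaces: ["dev"]
```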
Let’s go ahead and create the constraint template and constraint:
Note: After creating the constraint template, please check whether its status is true, otherwise you will get an error while creating the constraints. It is also advisable to verify Rego code snippets before using them in a constraint template.
Now let’s try to create a service of type LoadBalancer in the dev namespace:
When we tried to create a service of type LoadBalancer in the dev namespace, we got an error saying it was denied by the admission webhook due to the `deny-lb-type-svc-dev-ns` constraint; when we created the same service in the default namespace, however, we were able to do so.
Here we are not passing any parameters to the Rego policy from our constraints, but we can certainly do so to make our policy more generic. For example, we can add a field named servicetype to the constraint template and, in the policy code, deny all requests where the type of the requested Service matches the servicetype value defined in the constraint. With this, we can deny Services of types other than LoadBalancer as well, in any namespace of our cluster.
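A rough sketch of that parameterized setup (the kind name and schema fields are assumptions):

```yaml
# In the ConstraintTemplate, declare a parameter schema under
#   spec.crd.spec.validation.openAPIV3Schema.properties.servicetype (type: string)
# and compare against it in the Rego code:
#   input.review.object.spec.type == input.parameters.servicetype
---
# The constraint then supplies the parameter value:
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenySvcType
metadata:
  name: deny-lb-type-svc
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Service"]
  parameters:
    servicetype: LoadBalancer
```

Changing `servicetype` (or the `namespaces` match) would now reuse the same template to deny other Service types elsewhere in the cluster.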
Gatekeeper also provides auditing for resources that were created before the constraint was applied. The information is available in the status of the constraint objects. This helps us in identifying which objects in our cluster are not compliant with our constraints.
Conclusion
OPA allows us to apply fine-grained policies in our Kubernetes clusters and can be instrumental in improving their overall security, which has always been a concern for organizations adopting or migrating to Kubernetes. It also makes meeting compliance and audit requirements much simpler. There is a learning curve, as we need to get familiar with Rego to code our policies, but the language is quite simple and there are plenty of good examples to help in getting started.
If you landed here directly and want to know how to set up a Jenkins master-slave architecture, please visit this post on Setting up the Jenkins Master-Slave Architecture.
The source code that we are using here is also a continuation of the code that was written in this GitHub Packer-Terraform-Jenkins repository.
Creating Jenkinsfile
We will create some Jenkinsfiles to execute a job from our Jenkins master.
Here I will create two Jenkinsfiles. Ideally, your Jenkinsfile should live in the source code repo, but it can be passed directly in the job as well.
There are two ways of writing a Jenkinsfile – Scripted and Declarative. You can find numerous comparisons of the two online. We will write one of each to do a build so that we can get the hang of both.
Jenkinsfile for Angular App (Scripted)
As mentioned before, we will be highlighting both formats of writing a Jenkinsfile. For the Angular app, we will write a scripted one, but it could just as easily be written in the declarative format.
We will be running this inside a Docker container. Thus, the tests are also going to be executed in a headless manner.
Here is the Jenkinsfile for reference.
Here we are leveraging Docker volumes to keep the source code updated on the host machine while using Docker containers for the build environments.
Dissecting Node App’s Jenkinsfile
We are using cleanWs() to clean the workspace.
Next is the Main build in which we define our complete build process.
We are pulling the required images.
Highlighting the steps that we will be executing.
Checkout SCM: Checking out our code from Git
We are now starting the node container inside of which we will be running npm install and npm run lint.
Get test dependency: Here we are downloading chrome.json which will be used in the next step when starting the container.
Here we test our app. Specific changes for running the test are mentioned below.
Build: Finally we build the app.
Deploy: Once CI is completed, we need to start with CD. CD could be a blog post of its own, but we wanted to highlight what a basic deployment would do.
Here we are using an Nginx container to host our application.
If the container does not exist, the pipeline will create one and use the “dist” folder for deployment.
If an Nginx container already exists, it will ask for user input on whether to recreate the container.
If you choose not to recreate it, don’t worry: since we are using Nginx, it will hot-reload the new changes.
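The steps above can be sketched as a scripted Jenkinsfile along these lines (the image tags, npm commands, container name, and port are assumptions for illustration, not the exact file from the repo):

```groovy
node {
    // clean the workspace before the main build
    stage('Clean workspace') {
        cleanWs()
    }
    stage('Checkout SCM') {
        checkout scm
    }
    stage('Install and lint') {
        // run npm inside a Node container; the workspace is mounted as a volume
        docker.image('node:12').inside {
            sh 'npm install'
            sh 'npm run lint'
        }
    }
    stage('Test') {
        // headless Chrome tests inside the container
        docker.image('node:12').inside {
            sh 'npm run test -- --watch=false --browsers=ChromeHeadless'
        }
    }
    stage('Build') {
        docker.image('node:12').inside {
            sh 'npm run build -- --prod'
        }
    }
    stage('Deploy') {
        // serve the dist folder from an Nginx container on the host
        sh 'docker run -d --name angular-app -p 80:80 -v "$WORKSPACE/dist:/usr/share/nginx/html" nginx:alpine'
    }
}
```

A real Deploy stage would also need the exists/recreate logic described above, for example by checking `docker ps` before starting a new container.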
The Angular application used here was created using the standard generate command provided by the CLI itself. Although the build and install give no trouble on bare metal, some tweaks are required for running tests in a container.
In karma.conf.js, update `browsers` with `ChromeHeadless`.
Next, in protractor.conf.js, update `browserName` with `chrome` and add:
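The two tweaks might look like this (only the relevant fragments of each config file; the exact Chrome flags are assumptions for a typical headless setup):

```javascript
// karma.conf.js — run unit tests in headless Chrome
browsers: ['ChromeHeadless'],

// protractor.conf.js — configure headless Chrome for e2e tests
capabilities: {
  browserName: 'chrome',
  chromeOptions: {
    args: ['--headless', '--disable-gpu', '--no-sandbox']
  }
}
```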
That’s it! We now have our CI pipeline set up for an Angular-based application.
Jenkinsfile for .Net App (Declarative)
For a .Net application, we have to set up MSBuild and MSDeploy. In the blog post mentioned above, we have already set up MSBuild, and we will shortly discuss how to set up MSDeploy.
To do the Windows deployment we have two options: either set up MSBuild in Jenkins Global Tool Configuration, or use the full path of MSBuild on the slave machine.
Passing the path is fairly simple, so here we will discuss how to use the Global Tool Configuration in a Jenkinsfile.
First, get the path of MSBuild from your server. If it is not the latest version, the path is different and is found in the Current directory; otherwise it is always in a <version> directory.
As we are using MSBuild 2017, our MSBuild path is:
Now you have your configuration ready to be used in Jenkinsfile.
Jenkinsfile to build and test the app is given below.
As seen above, the structure of the declarative syntax is almost the same as that of the scripted one. You should opt for whichever syntax you find easier to read.
Dissecting Dotnet App’s Jenkinsfile
In this case too we are cleaning the workspace as the first step.
Checkout: This is also the same as before.
Nuget Restore: We are downloading the required NuGet packages for both PrimeService and PrimeService.Tests.
Build: Building the Dotnet app using MSBuild tool which we had configured earlier before writing the Jenkinsfile.
UnitTest: Here we have used dotnet test, although we could have used MSTest as well; we just wanted to highlight how easy the dotnet utility makes it. We could even use dotnet build for the build step.
Deploy: Deploying on the IIS server. We cover the creation of the IIS server below.
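Putting the stages above together, a declarative sketch might look like this (the tool name, solution name, and msdeploy arguments are assumptions, not the exact file from the repo):

```groovy
pipeline {
    agent any
    stages {
        stage('Clean workspace') {
            steps { cleanWs() }
        }
        stage('Checkout') {
            steps { checkout scm }
        }
        stage('Nuget Restore') {
            // restore packages for PrimeService and PrimeService.Tests
            steps { bat 'nuget restore PrimeService.sln' }
        }
        stage('Build') {
            steps {
                // resolve MSBuild from Global Tool Configuration (tool name assumed)
                bat "\"${tool 'MSBuild_2017'}\" PrimeService.sln /p:Configuration=Release"
            }
        }
        stage('UnitTest') {
            steps { bat 'dotnet test PrimeService.Tests' }
        }
        stage('Deploy') {
            steps {
                // msdeploy arguments depend on your IIS site; shown schematically
                bat 'msdeploy -verb:sync -source:iisApp="PrimeService" -dest:auto'
            }
        }
    }
}
```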
The above examples should give you a good idea of what a Jenkinsfile looks like and how it can be used to create jobs. These files highlight basic job creation, but they can be extended to everything that old-style job creation could do.
Creating IIS Server
Unlike our Angular application, where we just had to pull another image and we were good to go, here we will have to use Packer to create our IIS server. We will automate the creation process and use the server to host applications.
Here is a Powershell script for IIS for reference.
We won’t be deploying any application on it, as we have created a sample PrimeNumber app. But in the real world you might be deploying a web-based application, and you will need IIS. We have covered the basic idea of how to install IIS along with any dependencies that might be required.
Conclusion
In this post, we have covered deploying Windows and Linux based applications using Jenkinsfile in both scripted and declarative format.
Kubernetes, the awesome container orchestration tool, is changing the way applications are developed and deployed. You can specify the resources you want and have them available without worrying about the underlying infrastructure. Kubernetes is way ahead in terms of high availability, scaling, and managing your application, but the storage story in Kubernetes is still evolving. Many storage integrations are being added, and several are production-ready.
People prefer clustered applications for storing data. But what about non-clustered applications? Where do these applications store data to remain highly available? With these questions in mind, let’s go through Ceph storage and its integration with Kubernetes.
What is Ceph Storage?
Ceph is open-source, software-defined storage maintained by Red Hat. It is capable of block, object, and file storage. Ceph clusters are designed to run on any hardware with the help of an algorithm called CRUSH (Controlled Replication Under Scalable Hashing), which ensures that data is properly distributed across the cluster and can be accessed quickly without bottlenecks. Replication, thin provisioning, and snapshots are the key features of Ceph storage.
There are other good storage solutions like Gluster and Swift, but we are going with Ceph for the following reasons:
File, Block, and Object storage in the same wrapper.
Better transfer speed and lower latency
Easily accessible storage that can quickly scale up or down
We are going to use two types of storage in this blog to integrate with Kubernetes.
Deploying a highly available Ceph cluster is pretty straightforward and easy. I am assuming that you are familiar with setting up a Ceph cluster; if not, refer to the official documentation here.
If you check the status, you should see something like:
Here, notice that my Ceph monitor IPs are 10.0.1.118, 10.0.1.227 and 10.0.1.172.
K8s Integration
After setting up the Ceph cluster, we would consume it with Kubernetes. I am assuming that your Kubernetes cluster is up and running. We will be using Ceph-RBD and CephFS as storage in Kubernetes.
Ceph-RBD and Kubernetes
We need a Ceph RBD client to enable interaction between the Kubernetes cluster and Ceph-RBD. This client is not included in the official kube-controller-manager container, so let’s create an external storage plugin (provisioner) for Ceph.
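Once the external RBD provisioner is deployed, a storage class pointing at it might look like the following sketch (the monitor IPs come from the cluster status above; the pool, user, and secret names are assumptions):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
# external rbd provisioner, deployed separately from kube-controller-manager
provisioner: ceph.com/rbd
parameters:
  monitors: 10.0.1.118:6789,10.0.1.227:6789,10.0.1.172:6789
  pool: kube
  adminId: admin
  adminSecretName: ceph-admin-secret
  adminSecretNamespace: kube-system
  userId: kube
  userSecretName: ceph-user-secret
  imageFormat: "2"
  imageFeatures: layering
```

With this in place, any PVC that references the `ceph-rbd` storage class gets an RBD image carved out and bound dynamically.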
So far we have seen how to use block-based storage, i.e. Ceph-RBD, with Kubernetes by creating a dynamic storage provisioner. Now let’s go through the process of setting up storage using file-system-based storage, i.e. CephFS.
CephFS and Kubernetes
Let’s create the provisioner and storage class for CephFS. First, create a dedicated namespace for CephFS:
# kubectl create ns cephfs
Create the Kubernetes secret using the Ceph admin auth token:
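One way to create that secret is with a manifest like this (the secret name is an assumption; the key value is the base64-encoded output of `ceph auth get-key client.admin` and is left as a placeholder):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-admin-secret
  namespace: cephfs
type: Opaque
data:
  # base64-encoded Ceph admin key, e.g. from:
  #   ceph auth get-key client.admin | base64
  key: <base64-encoded-admin-key>
```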
We are all set now. CephFS provisioner is created. Let’s wait till it gets into running state.
# kubectl get pods -n cephfs
NAME                                 READY   STATUS    RESTARTS   AGE
cephfs-provisioner-8d957f95f-s7mdq   1/1     Running   0          1m
Once the CephFS provisioner is up, try creating a persistent volume claim. In this step, the storage class takes care of creating the persistent volume dynamically.
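The claim might look like this (the claim name, size, and access mode match the output shown next; the storage class name `cephfs` is assumed from the provisioner setup):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1
spec:
  storageClassName: cephfs
  accessModes:
    - ReadWriteMany   # RWX, so multiple pods can mount the same volume
  resources:
    requests:
      storage: 1Gi
```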
# kubectl get pvc
NAME     STATUS   VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
claim1   Bound    pvc-a7db18a7-9641-11e9-ab86-12e154d66096   1Gi        RWX           cephfs         2m
# kubectl get pv
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
pvc-a7db18a7-9641-11e9-ab86-12e154d66096   1Gi        RWX           Delete          Bound    default/claim1   cephfs                  2m
Conclusion
We have seen how to integrate Ceph storage with Kubernetes, covering both Ceph-RBD and CephFS. This approach is highly useful when your application is not a clustered application and you are looking to make it highly available.
In this blog post, we will explore the basics of cross platform mobile application development using Flutter, compare it with existing cross-platform solutions and create a simple to-do application to demonstrate how quickly we can build apps with Flutter.
Brief introduction
Flutter is a free and open source UI toolkit for building natively compiled applications for mobile platforms like Android and iOS, and for the web and desktop as well. Some of the prominent features are native performance, single codebase for multiple platforms, quick development, and a wide range of beautifully designed widgets.
Flutter apps are written in the Dart programming language, which is a very intuitive language with a C-like syntax. Dart is optimized for performance and developer friendliness. Apps written in Dart can be as fast as native applications because Dart code compiles down to machine instructions for ARM and x64 processors, and to JavaScript for the web platform. This, along with the Flutter engine, makes Flutter apps platform agnostic.
Other interesting Dart features used in Flutter apps are the just-in-time (JIT) compiler, used during development and debugging, which powers the hot reload functionality, and the ahead-of-time (AOT) compiler, used when building applications for target platforms such as Android or iOS, resulting in native performance.
Everything composed on the screen with Flutter is a widget including stuff like padding, alignment or opacity. The Flutter engine draws and controls each pixel on the screen using its own graphics engine called Skia.
Flutter vs React-Native
Flutter apps are truly native and hence offer great performance, whereas apps built with React Native require a JavaScript bridge to interact with OEM widgets. Flutter apps are much faster to develop because of a wide range of built-in widgets, a good amount of documentation, hot reload, and several other developer-friendly choices made by Google while building Dart and Flutter.
React Native, on the other hand, has the advantage of being older and hence has a large community of businesses and developers who have experience in building react-native apps. It also has more third party libraries and packages as compared to Flutter. That said, Flutter is catching up and rapidly gaining momentum as evident from Stackoverflow’s 2019 developer survey, where it scored 75.4% under “Most Loved Framework, Libraries and Tools”.
All in all, Flutter is a great tool to have in our arsenal as mobile developers in 2020.
Getting started with a sample application
Flutter’s official docs are really well written and include getting started guides for different OS platforms, API documentation, widget catalogue along with several cookbooks and codelabs that one can follow along to learn more about Flutter.
To get started with development, we will follow the official guide, which is available here. Flutter requires the Flutter SDK as well as native build tools to be installed on the machine. To write apps, one may use Android Studio or VS Code, or any text editor together with Flutter’s command line tools. A good rule of thumb, though, is to install Android Studio because it offers better support for managing the Android SDK, build tools and virtual devices, and it includes several built-in tools such as the icons and assets editor.
Once done with the setup, we will start by creating a project. Open VS Code and create a new Flutter project:
We should see the main file main.dart with some sample code (the counter application). We will start editing this file to create our to-do app.
Some of the features we will add to our to-do app:
Display a list of to-do items
Mark to-do items as completed
Add new item to the list
Let’s start by creating a widget to hold our list of to-do items. This is going to be a StatefulWidget, which is a type of widget with some state. Flutter tracks changes to the state and redraws the widget when a new change in the state is detected.
After creating the ToDoList widget, our main.dart file looks like this:
/// imports widgets from the material design
import 'package:flutter/material.dart';

void main() => runApp(TodoApp());

/// Stateless widgets must implement the build() method and return a widget.
/// The first parameter passed to the build function is the context in which this widget is built
class TodoApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'TODO',
      theme: ThemeData(
        primarySwatch: Colors.blue,
      ),
      home: TodoList(),
    );
  }
}

/// Stateful widgets must implement the createState() method
class TodoList extends StatefulWidget {
  @override
  State<StatefulWidget> createState() => TodoListState();
}

/// State of a stateful widget again has a build() method with context
class TodoListState extends State<TodoList> {
  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text('Todo'),
      ),
      body: Text('Todo List'),
    );
  }
}
The ToDoApp class here extends Stateless widget i.e. a widget without any state whereas ToDoList extends StatefulWidget. All Flutter apps are a combination of these two types of widgets. StatelessWidgets must implement the build() method whereas Stateful widgets must implement the createState() method.
Some built-in widgets used here are the MaterialApp widget, the Scaffold widget and AppBar and Text widgets. These all are imported from Flutter’s implementation of material design, available in the material.dart package. Similarly, to use native looking iOS widgets in applications, we can import widgets from the flutter/cupertino.dart package.
Next, let’s create a model class that represents an individual to-do item. We will keep this simple i.e. only store label and completed status of the to-do item.
class Todo {
  final String label;
  bool completed;

  Todo(this.label, this.completed);
}
The constructor we wrote in the code above is implemented using one of Dart’s syntactic sugar to assign a constructor argument to the instance variable. For more such interesting tidbits, take the Dart language tour.
Now let’s modify the ToDoListState class to store a list of to-do items in its state and also display it in a list. We will use ListView.builder to create a dynamic list of to-do items. We will also use Checkbox and Text widget to display to-do items.
/// State is composed of all the variables declared in the State implementation of a Stateful widget
class TodoListState extends State<TodoList> {
  final List<Todo> todos = List<Todo>();

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text('Todo'),
      ),
      body: Padding(
        padding: EdgeInsets.all(16.0),
        child: todos.length > 0
            ? ListView.builder(
                itemCount: todos.length,
                itemBuilder: _buildRow,
              )
            : Text('There is nothing here yet. Start by adding some Todos'),
      ),
    );
  }

  /// build a single row of the list
  Widget _buildRow(context, index) => Row(
        children: <Widget>[
          Checkbox(
              value: todos[index].completed,
              onChanged: (value) => _changeTodo(index, value)),
          Text(todos[index].label,
              style: TextStyle(
                  decoration: todos[index].completed
                      ? TextDecoration.lineThrough
                      : null))
        ],
      );

  /// toggle the completed state of a todo item
  _changeTodo(int index, bool value) =>
      setState(() => todos[index].completed = value);
}
A few things to note here are: private functions start with an underscore, functions with a single line of body can be written using fat arrows (=>) and most importantly, to change the state of any variable contained in a Stateful widget, one must call the setState method.
The ListView.builder constructor allows us to work with very large lists, since list items are created only when they are scrolled into view.
Another takeaway here is the fact that Dart is such an intuitive language that it is quite easy to understand and you can start writing Dart code immediately.
Everything on a screen, like padding, alignment or opacity, is a widget. Notice in the code above, we have used Padding as a widget that wraps the list or a text widget depending on the number of to-do items. If there’s nothing in the list, a text widget is displayed with some default message.
Also note how we haven’t used the new keyword when creating instances of a class, say Text. That’s because using the new keyword is optional in Dart and discouraged, according to the effective Dart guidelines.
Running the application
At this point, let’s run the code and see how the app looks on a device. Press F5, then select a virtual device and wait for the app to get installed. If you haven’t created a virtual device yet, refer to the getting started guide.
Once the virtual device launches, we should see the following screen in a while. During development, the first launch always takes a while because the entire app gets built and installed on the virtual device, but subsequent changes to code are instantly reflected on the device, thanks to Flutter’s amazing hot reload feature. This reduces development time and also allows developers and designers to experiment more frequently with the interface changes.
As we can see, there are no to-dos here yet. Now let’s add a floating action button that opens a dialog which we will use to add new to-do items.
Adding the FAB is as easy as passing floatingActionButton parameter to the scaffold widget.
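A sketch of the change (the `_promptDialog` helper is the one declared in the next snippet; icon choice is an assumption):

```dart
return Scaffold(
  appBar: AppBar(title: Text('Todo')),
  // body stays the same as before (the Padding + ListView)
  floatingActionButton: FloatingActionButton(
    child: Icon(Icons.add),
    // opens the dialog declared below in _promptDialog
    onPressed: () => _promptDialog(context),
  ),
);
```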
And declare a function inside ToDoListState that displays a popup (AlertDialog) with a text input box.
/// display a dialog that accepts text
_promptDialog(BuildContext context) {
  String _todoLabel = '';
  return showDialog(
      context: context,
      builder: (context) {
        return AlertDialog(
          title: Text('Enter TODO item'),
          content: TextField(
              onChanged: (value) => _todoLabel = value,
              decoration: InputDecoration(hintText: 'Add new TODO item')),
          actions: <Widget>[
            FlatButton(
              child: Text('CANCEL'),
              onPressed: () => Navigator.of(context).pop(),
            ),
            FlatButton(
              child: Text('ADD'),
              onPressed: () {
                setState(() => todos.add(Todo(_todoLabel, false)));
                /// dismisses the alert dialog
                Navigator.of(context).pop();
              },
            )
          ],
        );
      });
}
At this point, saving changes to the file should result in the application getting updated on the virtual device (hot reload), so we can just click on the new floating action button that appeared on the bottom right of the screen and start testing how the dialog looks.
We used a few more built-in widgets here:
AlertDialog: a dialog prompt that opens up when clicking on the FAB
TextField: text input field for accepting user input
InputDecoration: a widget that adds style to the input field
FlatButton: a variation of button with no border or shadow
FloatingActionButton: a floating icon button, used to trigger primary action on the screen
Here’s a quick preview of how the application should look and function at this point:
And just like that, in less than 100 lines of code, we’ve built the user interface of a simple, cross platform to-do application.
The source code for this application is available here.
To conclude, Flutter is an extremely powerful toolkit to build cross platform applications that have native performance and are beautiful to look at. Dart, the language behind Flutter, is designed considering the nuances of user interface development and Flutter offers a wide range of built-in widgets. This makes development fun and development cycles shorter; something that we experienced while building the to-do app. With Flutter, time to market is also greatly reduced which enables teams to experiment more often, collect more feedback and ship applications faster. And finally, Flutter has a very enthusiastic and thriving community of designers and developers who are always experimenting and adding to the Flutter ecosystem.
AWS Lambda was launched in 2014, and with it serverless computing (Function as a Service) came into existence. We have been using Lambda in our projects for the last couple of years to build complete end-to-end web applications, using AWS Lambda with API Gateway (REST APIs), CloudWatch (logs), S3 (website hosting & data storage), and so on.
Google and Azure also provide serverless technologies like AWS does, and there is one more popular open-source solution, OpenWhisk. While implementing serverless applications on AWS, we have learned a lot about running a website on Lambda. Like every other technology, serverless has its own set of benefits and drawbacks, which we will discuss here.
What Does Serverless Mean?
Serverless is a dynamic cloud computing execution model where the server is run by a cloud provider such as AWS, Google, or Azure. The code still runs on servers, but “serverless” means that the servers are abstracted away from the users and provided to them as a service.
The Serverless World
There’s a lot of excitement about serverless in the industry, but there are issues that sometimes outweigh the pros of a serverless architecture and need complex workarounds. AWS charges for each Lambda invocation in 100 ms increments. When thousands of incoming requests hit EC2 servers, we need to scale the servers up to handle them, but Lambda does this on its own: we don’t need to create auto-scaling groups or load balancers. But how much does it cost to use Lambda? Let’s compare below.
Let’s say we have a serverless application with only one Lambda and one API Gateway:
API Gateway: $3.50 per million API calls * 200 million API requests/month = 700 USD
Lambda: $0.00001667 per GB-second * (200 million requests * 0.3 seconds per execution * 1 GB memory - 400k free-tier GB-seconds for a new account) = ~994 USD
Total = ~1694 USD (this is a lot)
Now let’s see the example of the EC2 server,
3 highly available EC2 servers (m5.xlarge: 16 GB RAM, 4 vCPUs) = 416 USD
Application Load Balancer = 39 USD
Total = 455 USD/Month
So, if you compare the above pricing at this scale, classic servers are cheaper than serverless.
At low traffic, however, the economics flip, which can be really useful for startups in their early stages where every rupee counts. In that case, Lambda or serverless will be very useful, as it charges only for the number of hits coming to your server, meaning less management and more development time for the team. For example, instead of setting up new servers for your development environment, you can go with serverless development.
Loss Of Control
One of the biggest disadvantages of serverless is that you don’t have control over your services. We use a lot of services managed by third-party cloud providers, like CloudWatch for logs and DynamoDB for databases. As your project grows, more and more functions need to be managed, and everything is handled by the cloud provider. You lose portability as soon as you integrate Lambda with other services like SNS, DynamoDB, or Kinesis, and this results in vendor lock-in: it becomes difficult to change vendors later.
On the other hand, in the non-serverless world we can manage our language versions, queues, and DB queries ourselves. Basically, we have all the code in one place and don’t need to manage multiple functions. But every technology has its pros and cons, as we said earlier: in serverless, the loss of control is the price for focusing less on managing infrastructure and more on adding business value to the product.
Choosing serverless or non-serverless completely depends on the product type. If you have a simple application, like selling cakes online, and you need a simple implementation or authentication, you can go with serverless. But if your application is really complex, you need to add complex algorithms, or you need control over your code, security, and authentication, you should go with non-serverless.
Security Issues
The biggest risk in serverless, or in using cloud services generally, is poorly configured functions, services, or applications. Bad configuration can lead to multiple issues in your application, whether security-related or infrastructure-related. It doesn’t matter which cloud provider you are using, AWS, GCP, or Azure: it’s important to configure your functions and services correctly, with only the permissions they need to access other services, and to manage access controls. Otherwise, it can lead to permission issues or security breaches. Also, if you are connecting any third-party APIs to your provider, make sure the connections are safe and the data is encrypted in the right format.
Correct configuration is the most important thing in both serverless and non-serverless applications. If you are strict about it when using cloud services, you will run into far fewer security breaches or permission issues.
Testing & Debugging
Serverless applications are hard to test. Normally, developers test the code locally and then deploy it. But in the serverless world, testing locally is complicated, as tooling for mocking cloud services in a local environment is limited. So, we need to perform a decent amount of integration testing before moving forward. Currently, you can test and debug the code using console/print statements, which will be visible in your CloudWatch logs, as in the Node.js snippet below.
const https = require('https')
let url = "https://google.com"

exports.handler = async function (event) {
  const promise = new Promise(function (resolve, reject) {
    console.log("Processing URL: " + url)
    https.get(url, (res) => {
      resolve(res.statusCode)
      // console for debugging / testing purposes
      console.info("Request was successful!")
    }).on('error', (e) => {
      reject(Error(e))
      // console for errors
      console.error("Error while processing: " + e)
    })
  })
  return promise
}
For serverless applications, it is important to give some time & effort upfront to architect your application correctly and create good integration tests over cloud infrastructure.
It is difficult to test or debug applications in serverless. In non-serverless applications, we debug the code itself, but in serverless, we need to debug the end-to-end integration with the multiple services we use. Lambdas are so short-lived that by the time you go looking for the logs, the execution environment is gone. In this situation, we can use AWS CloudWatch or Google Stackdriver, which are meant for exactly that.
Cold Start
An issue remains an issue until you trace it, and some technical issues are hard to find until you know about or face them. Lambda has one such drawback, known as cold start. Lambda code runs on servers managed by Amazon, and to make this feasible, Amazon doesn’t keep everyone’s code warm. So, if your particular function hasn’t run in a while, a request has to wait for Lambda to spin up an execution environment and then invoke the code, which adds noticeable latency before Lambda returns a result for that request.
But wait, how long? I was using Node.js, and it took around 4 seconds to respond. This is not good for the end-user experience, and it can impact your business. This kind of delay is hard to tolerate in today’s world, where requests need to respond fast to provide a good user experience.
The problem is manageable for a handful of Lambdas, but what if the number of Lambdas increases? Say there are 50-100 Lambdas; warming up every one of them gets annoying. You have to call Lambdas before the user calls them again. But there isn’t really any solution other than warming them. I used the Serverless Framework for my serverless implementation, and it helped me solve most of the problems with Lambdas and the other resources we used to build serverless applications.
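One common way to keep functions warm with the Serverless Framework is the community serverless-plugin-warmup plugin, which invokes your functions on a schedule. A rough sketch of the configuration (the exact keys vary between plugin versions, so treat this as illustrative, not definitive):

```yaml
plugins:
  - serverless-plugin-warmup

custom:
  warmup:
    default:
      enabled: true                    # warm all functions by default
      events:
        - schedule: rate(5 minutes)    # ping every 5 minutes to avoid cold starts
```

The tradeoff is that the warm-up invocations themselves are billed, so warming dozens of Lambdas erodes part of the pay-per-use saving.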
Conclusion
Serverless has many problems, I agree, but which tech doesn’t? When you choose either serverless or non-serverless, make sure you study and analyze your requirements to decide which direction to take. If you want to implement quickly (small applications with strict deadlines and a small budget), go with serverless; otherwise, choose EC2 servers. It mainly depends on the requirements. If you are using serverless, some frameworks will help you a lot. You can also compare the pricing here.
If you are new to serverless and want to implement it from scratch, you can have a look at the following link.
Currently, serverless has its downsides, but hopefully Amazon and the other cloud providers will come up with good solutions to make it more efficient. We look forward to learning as the technology evolves.
With the test framework that Python and Django provide, a lot of boilerplate is required, and maintainability and duplication issues arise as your projects grow. It’s also not a very pythonic way of writing tests.
Pytest provides a simple and more elegant way to write tests.
It provides the ability to write tests as plain functions, which removes a lot of boilerplate code and makes your tests more readable and easier to maintain. Pytest also provides useful functionality for test discovery and for defining and using fixtures.
Why Pytest Fixtures?
When writing tests, it’s very common that the test will need objects, and those objects may be needed by multiple tests. There might be a complicated process for the creation of these objects. It will be difficult to add that complex process in each of the test cases, and on any model changes, we will need to update our logic in all places. This will create issues of code duplication and its maintainability.
To avoid all of this, we can use the fixtures provided by pytest: we define a fixture in one place, and then we can inject that fixture into any of the tests in a much simpler way.
Briefly, if we have to understand fixtures, in the literal sense, they are where we prepare everything for our test. They’re everything that the test needs to do its thing.
We are going to explore how to make effective use of fixtures with Django models in a way that is readable and easy to maintain. These are the fixtures provided by pytest, not to be confused with Django fixtures.
Installation and Setup
For this blog, we will set up a basic e-commerce application and set up the test suite for pytest.
Creating Django App
Before we begin testing, let’s create a basic e-commerce application and add a few models on which we can perform tests later.
To create a Django app, go to the folder you want to work in, open the terminal, and run the below commands:
$ django-admin startproject e_commerce_app
$ cd e_commerce_app
$ python manage.py startapp product
Once the app is created, go to the settings.py and add the newly created product app to the INSTALLED_APPS.
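The model definitions might look like the following minimal sketch. The exact fields here are assumptions, but a unique Category name, a 6-character Product SKU, and a Retailer with a many-to-many relation to Product line up with the tests written later in this post:

```python
# product/models.py -- a minimal sketch; exact fields are assumptions
from django.db import models


class Category(models.Model):
    # unique=True means creating the same category name twice raises IntegrityError
    name = models.CharField(max_length=100, unique=True)


class Product(models.Model):
    name = models.CharField(max_length=100)
    sku = models.CharField(max_length=6)
    category = models.ForeignKey(Category, on_delete=models.CASCADE, null=True)


class Retailer(models.Model):
    name = models.CharField(max_length=100)
    # reverse accessor product.retailers is used in later examples
    products = models.ManyToManyField(Product, related_name="retailers")
```

Remember to run `python manage.py makemigrations` and `python manage.py migrate` after adding the models.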
The models and database are now ready, and we can move on to writing test cases for these models.
Let’s set up the pytest in our Django app first.
For testing our Django applications with pytest, we will use the plugin pytest-django, which provides a set of useful tools for testing Django apps and projects. Let’s start with the installation and configuration of the plugin.
Installing pytest
Pytest can be installed with pip:
$ pip install pytest-django
Installing pytest-django will also automatically install the latest version of pytest. Once installed, we need to tell pytest-django where our settings.py file is located.
The easiest way to do this is to create a pytest configuration file with this information.
Create a file called pytest.ini in your project directory and add this content:
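A minimal pytest.ini looks like the following sketch. Adjust DJANGO_SETTINGS_MODULE to your own project module; the sample assumes the e_commerce_app project created above (the transcripts later in this post show the author’s own module name instead):

```ini
[pytest]
DJANGO_SETTINGS_MODULE = e_commerce_app.settings
python_files = tests.py test_*.py *_tests.py
```

The python_files line is optional; it simply widens the default test-file discovery patterns.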
Django and pytest automatically detect and run your test cases in files whose name starts with ‘test’.
In the product app folder, create a new module named tests. Then add a file called test_models.py in which we will write all the model test cases for this app.
$ cd product
$ mkdir tests
$ cd tests && touch test_models.py
Running your Test Suite
Tests are invoked directly with the pytest command:
$ pytest
$ pytest tests      # test a directory
$ pytest test.py    # test a file
For now, we are configured and ready for writing the first test with pytest and Django.
Writing Tests with Pytest
Here, we will write a few test cases to test the models we have written in the models.py file. To start with, let’s create a simple test case to test the category creation.
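Based on the failure trace shown just below, the first test looks something like this (the assertion line is an assumption; the create call is taken from the traceback):

```python
# product/tests/test_models.py
from product.models import Category


def test_create_category():
    category = Category.objects.create(name="Books")
    assert category.name == "Books"
```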
Now, try to execute this test from your command line:
$ pytest
============================= test session starts ==============================
platform linux -- Python 3.7.0, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
django: settings: pytest_fixtures.settings (from ini)
rootdir: /home/suraj/PycharmProjects/e_commerce_app, configfile: pytest.ini
plugins: django-4.1.0
collected 1 item

product/tests/test_models.py F                                           [100%]

=================================== FAILURES ===================================
_____________________________ test_create_category _____________________________

    def test_create_category():
>       category = Category.objects.create(name="Books")

product/tests/test_models.py:5:
...
E       RuntimeError: Database access not allowed, use the "django_db" mark, or
        the "db" or "transactional_db" fixtures to enable it.

venv/lib/python3.7/site-packages/django/db/backends/base/base.py:235: RuntimeError
=========================== short test summary info ============================
FAILED product/tests/test_models.py::test_create_category - RuntimeError: Dat...
============================== 1 failed in 0.21s ===============================
The test failed. If you look at the error, it has something to do with the database. The pytest-django docs say:
pytest-django takes a conservative approach to enabling database access. By default your tests will fail if they try to access the database. Only if you explicitly request database access will this be allowed. This encourages you to keep database-needing tests to a minimum which makes it very clear what code uses the database.
This means we need to explicitly provide database access to our test cases. For this, we need to use [pytest marks](<https://docs.pytest.org/en/stable/mark.html#mark>) to tell pytest-django your test needs database access.
Alternatively, there is one more way we can access the database in the test cases, i.e., using the db helper fixture provided by the pytest-django. This fixture will ensure the Django database is set up. It’s only required for fixtures that want to use the database themselves.
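The two variants side by side might look like the following sketch, first with the django_db mark and then with the db fixture:

```python
import pytest

from product.models import Category


# Variant 1: the django_db mark
@pytest.mark.django_db
def test_create_category_with_mark():
    category = Category.objects.create(name="Books")
    assert category.name == "Books"


# Variant 2: the db helper fixture provided by pytest-django
def test_create_category_with_db_fixture(db):
    category = Category.objects.create(name="Books")
    assert category.name == "Books"
```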
If you look at both the test cases, one thing you can observe is that both the test cases do not test Category creation logic, and the Category instance is also getting created twice, once per test case. Once the project becomes large, we might have many test cases that will need the Category instance. If every test is creating its own category, then you might face trouble if any changes to the Category model happen.
This is where fixtures come to the rescue. It promotes code reusability in your test cases. To reuse an object in many test cases, you can create a test fixture:
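A minimal sketch of such a fixture and a test that injects it:

```python
import pytest

from product.models import Category


@pytest.fixture
def category(db):
    # db is the pytest-django fixture that enables database access
    return Category.objects.create(name="Books")


def test_create_category(category):
    # the fixture is injected simply by naming it as an argument
    assert category.name == "Books"
```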
Here, we have created a simple function called category and decorated it with @pytest.fixture to mark it as a fixture. It can now be injected into the test cases just like we injected the fixture db.
Now, if a new requirement comes in that every category should have a description and a small icon to represent the category, we don’t need to now go to each test case and update the category to create logic. We just need to update the fixture, i.e., only one place. And it will take effect in every test case.
Using fixtures, you can avoid code duplication and make tests more maintainable.
Parametrizing fixtures
It is recommended to have a single fixture function that can be executed across different input values. This can be achieved via parameterized pytest fixtures.
Let’s write the fixture for the product and consider we will need to create a SKU product number that has 6 characters and contains only alphanumeric characters.
import pytest
from product.models import Category, Product

@pytest.fixture
def product_one(db):
    return Product.objects.create(name="Book 1", sku="ABC123")

def test_product_sku(product_one):
    assert all(letter.isalnum() for letter in product_one.sku)
    assert len(product_one.sku) == 6
We now want to test the case against multiple sku cases and make sure for all types of inputs the test is validated. We can flag the fixture to create three different product_one fixture instances. The fixture function gets access to each parameter through the special request object:
import pytest
from product.models import Product

@pytest.fixture(params=("ABC123", "123456", "ABCDEF"))
def product_one(db, request):
    return Product.objects.create(name="Book 1", sku=request.param)

def test_product_sku(product_one):
    assert all(letter.isalnum() for letter in product_one.sku)
    assert len(product_one.sku) == 6
Fixture functions can be parametrized in which case they will be called multiple times, each time executing the set of dependent tests, i.e., the tests that depend on this fixture.
Test functions usually do not need to be aware of their re-running. Fixture parametrization helps to write exhaustive functional tests for components that can be configured in multiple ways.
Open the terminal and run the test:
$ pytest
============================= test session starts ==============================
platform linux -- Python 3.7.0, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
django: settings: pytest_fixtures.settings (from ini)
rootdir: /home/suraj/PycharmProjects/e_commerce_app, configfile: pytest.ini
plugins: django-4.1.0
collected 3 items

product/tests/test_models.py ...                                         [100%]

============================== 3 passed in 0.27s ===============================
We can see that our test_product_sku function ran thrice.
Injecting Fixtures into Other Fixtures
We will often come across cases where the object we need depends on some other object. Let’s try to create a few products under the category “Books”.
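A first (flawed) attempt might look like this sketch, where each product fixture creates its own “Books” category; the error trace below names the test function, and the field names are assumptions:

```python
import pytest

from product.models import Category, Product


@pytest.fixture
def product_one(db):
    category = Category.objects.create(name="Books")
    return Product.objects.create(name="Book 1", sku="ABC123", category=category)


@pytest.fixture
def product_two(db):
    # creates "Books" a second time -- this is the bug
    category = Category.objects.create(name="Books")
    return Product.objects.create(name="Book 2", sku="DEF456", category=category)


def test_two_different_books_create(product_one, product_two):
    assert product_one.category.name == product_two.category.name
```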
If we try to test this in the terminal, we will encounter an error:
$ pytest
============================= test session starts ==============================
platform linux -- Python 3.7.0, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
django: settings: pytest_fixtures.settings (from ini)
rootdir: /home/suraj/PycharmProjects/e_commerce_app, configfile: pytest.ini
plugins: django-4.1.0
collected 1 item

product/tests/test_models.py E                                           [100%]

==================================== ERRORS ====================================
______________ ERROR at setup of test_two_different_books_create _______________
...
query = 'INSERT INTO "product_category" ("name") VALUES (?)', params = ['Books']
...
E       django.db.utils.IntegrityError: UNIQUE constraint failed: product_category.name

venv/lib/python3.7/site-packages/django/db/backends/sqlite3/base.py:413: IntegrityError
=========================== short test summary info ============================
ERROR product/tests/test_models.py::test_two_different_books_create - django....
=============================== 1 error in 0.44s ===============================
The test case throws an IntegrityError, saying we tried to create the “Books” category twice. And if you look at the code, we have created the category in both product_one and product_two fixtures. What could we have done better?
If you look carefully, we have injected db in both the product_one and product_two fixtures, and db is just another fixture. So that means fixtures can be injected into other fixtures.
One of pytest’s greatest strengths is its extremely flexible fixture system. It allows us to boil down complex requirements for tests into more simple and organized functions, where we only need to have each one describe the things they are dependent on.
You can use this feature to address the IntegrityError above. Create the category fixture and inject it into both the product fixtures.
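The fixed version might look like this sketch: the category is created exactly once, in its own fixture, and both product fixtures inject it instead of the raw db fixture:

```python
import pytest

from product.models import Category, Product


@pytest.fixture
def category(db):
    return Category.objects.create(name="Books")


@pytest.fixture
def product_one(category):
    # category transitively pulls in db, so we don't need it here
    return Product.objects.create(name="Book 1", sku="ABC123", category=category)


@pytest.fixture
def product_two(category):
    return Product.objects.create(name="Book 2", sku="DEF456", category=category)


def test_two_different_books_create(product_one, product_two):
    # both products now share the single "Books" instance
    assert product_one.category == product_two.category
```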
If we try to run the test now, it should run successfully.
$ pytest
============================= test session starts ==============================
platform linux -- Python 3.7.0, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
django: settings: pytest_fixtures.settings (from ini)
rootdir: /home/suraj/PycharmProjects/e_commerce_app, configfile: pytest.ini
plugins: django-4.1.0
collected 1 item

product/tests/test_models.py .                                           [100%]

============================== 1 passed in 0.20s ===============================
By restructuring the fixtures this way, we have made code easier to maintain. By simply injecting fixtures, we can maintain a lot of complex model fixtures in a much simpler way.
Let’s say we need to add an example where product one and product two will be sold by retail shop “ABC”. This can be easily achieved by injecting retailer fixtures into the product fixture.
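A sketch of that retailer fixture (the Retailer model and the retailers many-to-many relation on Product are assumptions):

```python
import pytest

from product.models import Product, Retailer


@pytest.fixture
def retailer_abc(db):
    return Retailer.objects.create(name="ABC")


@pytest.fixture
def product_one(db, retailer_abc):
    product = Product.objects.create(name="Book 1", sku="ABC123")
    product.retailers.add(retailer_abc)
    return product


def test_product_retailer(product_one, retailer_abc):
    assert retailer_abc in product_one.retailers.all()
```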
Autouse Fixtures
Sometimes, you may want to have a fixture (or even several) that you know all your tests will depend on. “Autouse” fixtures are a convenient way of making all tests automatically request them. This can cut out a lot of redundant requests, and can even provide more advanced fixture usage.
We can make a fixture an autouse fixture by passing in autouse=True to the fixture’s decorator. Here’s a simple example of how they can be used:
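A sketch of such a fixture, reconstructed around the append_retailers and test_product_retailer names discussed below (the Retailer model is an assumption):

```python
import pytest

from product.models import Product, Retailer


@pytest.fixture
def product_one(db):
    return Product.objects.create(name="Book 1", sku="ABC123")


@pytest.fixture(autouse=True)
def append_retailers(product_one):
    # runs before every test in this module, without being requested
    retailer = Retailer.objects.create(name="ABC")
    product_one.retailers.add(retailer)


def test_product_retailer(product_one):
    # append_retailers already ran, so the retailer is attached
    assert product_one.retailers.count() == 1
```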
In this example, the append_retailers fixture is an autouse fixture. Because it happens automatically, test_product_retailer is affected by it, even though the test did not request it. That doesn’t mean they can’t be requested though; just that it isn’t necessary.
Factories as Fixtures
So far, we have created objects with a small number of arguments. In practice, however, models are a bit more complex and may require more inputs. Let’s say we need to store sku, mrp, and weight information along with name and category.
If we decide to provide every input to the product fixture, then the logic inside the product fixtures will get a little complicated.
Product creation has a somewhat complex logic of managing retailers and generating unique SKU. And the product creation logic will grow as we keep adding requirements. There may be some extra logic needed if we consider discounts and coupon code complexity for every retailer. There may also be a lot of versions of the product instance we may want to test against, and you have already learned how difficult it is to maintain such a complex code.
The “factory as fixture” pattern can help in cases where instances of the same class are needed, with variations, across different tests. Instead of returning an instance directly, the fixture returns a function; calling that function gives you the instance you want to test.
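A sketch of the pattern, matching the breakdown that follows (the mrp and weight fields and the Retailer model are assumptions carried over from the requirement above):

```python
import pytest

from product.models import Category, Product, Retailer


@pytest.fixture
def category(db):
    return Category.objects.create(name="Books")


@pytest.fixture
def retailer_abc(db):
    return Retailer.objects.create(name="ABC")


@pytest.fixture
def product_factory(category, retailer_abc):
    # the wrapper: callers receive a function, not a ready-made instance
    def create_product(name, sku, mrp=0, weight=0):
        product = Product.objects.create(
            name=name, sku=sku, mrp=mrp, weight=weight, category=category
        )
        product.retailers.add(retailer_abc)
        return product

    return create_product


@pytest.fixture
def product_one(product_factory):
    # inject the factory into another fixture and call it
    return product_factory(name="Book 1", sku="ABC123", mrp=200, weight=100)
```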
This is not far from what you’ve already done, so let’s break it down:
The category and retailer_abc fixture remains the same.
A new product_factory fixture is added, and it is injected with the category and retailer_abc fixture.
The fixture product_factory creates a wrapper and returns an inner function called create_product.
Inject product_factory into another fixture and use it to create a product instance
The factory fixture works similarly to how decorators work in Python.
Sharing Fixtures Using Scopes
Fixtures requiring network or database access depend on connectivity and are usually expensive to create. In the previous examples, every time a test requests a fixture, the fixture function runs, generates an instance, and passes it to the test. So if we have written n tests, and every test requests the same fixture, that fixture instance will be created n times during the run.
This is mainly happening because fixtures are created when first requested by a test, and are destroyed based on their scope:
Function: the default scope, the fixture is destroyed at the end of the test.
Class: the fixture is destroyed during the teardown of the last test in the class.
Module: the fixture is destroyed during teardown of the last test in the module.
Package: the fixture is destroyed during teardown of the last test in the package.
Session: the fixture is destroyed at the end of the test session.
In the previous example, we can add scope="module" so that the category, retailer_abc, product_one, and product_two instances will only be created once per test module.
Multiple test functions in a test module will thus each receive the same category, retailer_abc, product_one, and product_two fixture instance, thus saving time.
This is how we can add scope to the fixtures, and you can do it for all the fixtures.
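For instance, scoping the category fixture to the module might look like this sketch (note that it still injects the function-scoped db fixture, which is about to cause trouble):

```python
import pytest

from product.models import Category


@pytest.fixture(scope="module")
def category(db):
    # created once per module, shared by every test in it
    return Category.objects.create(name="Books")
```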
But if we try to test this in the terminal, we will encounter an error:
$ pytest
============================= test session starts ==============================
platform linux -- Python 3.7.0, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
django: settings: pytest_fixtures.settings (from ini)
rootdir: /home/suraj/PycharmProjects/e_commerce_app, configfile: pytest.ini
plugins: django-4.1.0
collected 2 items

product/tests/test_models.py EE                                          [100%]

==================================== ERRORS ====================================
___________________ ERROR at setup of test_product_retailer ____________________
ScopeMismatch: You tried to access the 'function' scoped fixture 'db'
with a 'module' scoped request object, involved factories
product/tests/test_models.py:13:  def retailer_abc(db)
venv/lib/python3.7/site-packages/pytest_django/fixtures.py:193:  def db(request, django_db_setup, django_db_blocker)
______________________ ERROR at setup of test_product_one ______________________
ScopeMismatch: You tried to access the 'function' scoped fixture 'db'
with a 'module' scoped request object, involved factories
...
=============================== 2 errors in 0.24s ===============================
The reason for this error is that the db fixture is function-scoped for a reason: rolling back the transaction at the end of each test ensures the database is left in the same state it was in when the test started. Nevertheless, you can get session- or module-scoped access to the database in a fixture by using the django_db_blocker fixture:
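A sketch of a module-scoped fixture using django_db_blocker (both django_db_setup and django_db_blocker are fixtures provided by pytest-django):

```python
import pytest

from product.models import Category


@pytest.fixture(scope="module")
def category(django_db_setup, django_db_blocker):
    # unblock database access explicitly instead of using the
    # function-scoped db fixture
    with django_db_blocker.unblock():
        return Category.objects.create(name="Books")
```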
Now, if we go to the terminal and run the tests, it will run successfully.
$ pytest
============================= test session starts ==============================
platform linux -- Python 3.7.0, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
django: settings: pytest_fixtures.settings (from ini)
rootdir: /home/suraj/PycharmProjects/e_commerce_app, configfile: pytest.ini
plugins: django-4.1.0
collected 2 items

product/tests/test_models.py ..                                          [100%]

============================== 2 passed in 0.22s ===============================
Warning: Beware that when unlocking the database in session scope, you’re on your own if you alter the database in other fixtures or tests.
Conclusion
We have explored the various features pytest fixtures provide, how to benefit from them for code reuse, and how to keep the code in our tests maintainable. Dependency management and arranging your test data become easy with the help of fixtures.
This was a blog about how you can use fixtures and the various features it provides along with the Django models. You can check more on fixtures by referring to the official documentation.
With the evolving architectural design of web applications, microservices have become a successful trend in architecting the application landscape. Along with the advancements in application architecture, transport protocols such as REST and gRPC are becoming faster and more efficient. Containerizing microservice applications also helps greatly with agile development and high-speed delivery.
In this blog, I will try to showcase how simple it is to build a cloud-native application on the microservices architecture using Go.
We will break the solution into multiple steps. We will learn how to:
1) Build a set of containerized microservices, each with a very specific set of independent tasks, related only to its own logical component.
2) Use go-kit as the framework for developing and structuring the components of each service.
3) Build APIs that will use HTTP (REST) and Protobuf (gRPC) as the transport mechanisms, PostgreSQL for databases and finally deploy it on Azure stack for API management and CI/CD.
Note: Deployment, setting up the CI-CD and API-Management on Azure or any other cloud is not in the scope of the current blog.
Prerequisites:
A beginner’s level of understanding of web services, Rest APIs and gRPC
GoLand/ VS Code
Properly installed and configured Go. If not, check it out here
Set up a new project directory under the GOPATH
Understanding of the standard Golang project. For reference, visit here
PostgreSQL client installed
Go kit
What are we going to do?
We will develop a simple web application working on the following problem statement:
A global publishing company that publishes books and journals wants to develop a service to watermark their documents. A document (books, journals) has a title, author and a watermark property
The watermark operation can be in Started, InProgress and Finished status
The specific set of users should be able to do the watermark on a document
Once the watermark is done, the document can never be re-marked
Example of a document:
{ "content": "book", "title": "The Dark Code", "author": "Bruce Wayne", "topic": "Science" }
For a detailed understanding of the requirement, please refer to this.
Architecture:
In this project, we will have 3 microservices: Authentication Service, Database Service and the Watermark Service. We have a PostgreSQL database server and an API-Gateway.
Authentication Service:
The application is supposed to have a role-based and user-based access control mechanism. This service will authenticate the user according to their specific role and return HTTP status codes only: 200 when the user is authorized and 401 for unauthorized users.
APIs:
/user/access, Method: GET, Secured: True, payload: { user: <name> }. It will take the user name as input, and the auth service will return the roles and privileges assigned to it
/authenticate, Method: GET, Secured: True, payload: { user: <name>, operation: <op> }. It will authenticate the user for the passed operation if it is accessible to the user's role
/healthz, Method: GET, Secured: True. It will return the status of the service
Database Service:
We will need databases for our application to store the user, their roles and the access privileges to that role. Also, the documents will be stored in the database without the watermark. It is a requirement that any document cannot have a watermark at the time of creation. A document is said to be created successfully only when the data inputs are valid and the database service returns the success status.
We will be using two databases for two different services for them to be consumed. This design is not necessary, but just to follow the “Single Database per Service” rule under the microservice architecture.
APIs:
/get, Method: GET, Secured: True, payload: filters: []filter{"field-name": "value"}. It will return the list of documents matching the passed filters
/update, Method: POST, Secured: True, payload: title: <id>, document: {"field": "value", …}. It will update the document for the given title ID
/add, Method: POST, Secured: True, payload: document: {"field": "value", …}. It will add the document and return the title ID
/remove, Method: POST, Secured: True, payload: title: <id>. It will remove the document entry for the passed title ID
/healthz, Method: GET, Secured: True. It will return the status of the service
Watermark Service:
This is the main service, which performs the API calls to watermark a given document. Every time a user needs to watermark a document, they pass the TicketID in the watermark API request along with the appropriate Mark. The service internally calls the database Update API with the provided request and returns the status of the watermark process: initially “Started”, then after some time “InProgress”, and finally “Finished” if the call was valid, or “Error” if the request was not valid.
APIs:
/get, Method: GET, Secured: True, payload: filters: []filter{"field-name": "value"}. It will return the list of documents matching the passed filters
/status, Method: GET, Secured: True, payload: ticket: <id>. It will return the watermark status of the document for the passed ticket ID
/addDocument, Method: POST, Secured: True, payload: document: {"field": "value", …}. It will add the document and return the ticket ID
/watermark, Method: POST, Secured: True, payload: ticket: <id>, mark: "string". It is the main watermark operation API, which accepts the mark string
/healthz, Method: GET, Secured: True. It will return the status of the service
Operations and Flow:
Watermark Service APIs are the only ones that will be used by the user/actor to request watermark or add the document. Authentication and Database service APIs are the private ones that will be called by other services internally. The only URL accessible to the user is the API Gateway URL.
The user will access the API Gateway URL with the required user name, the ticket-id and the mark with which the user wants the document to apply watermark
The user should not know about the authentication or database services
Once the request is made by the user, it will be accepted by the API Gateway. The gateway will validate the request along with the payload
An API forwarding rule, which routes the traffic of a specific request to a service, should be defined in the gateway. Once validated, the request will be forwarded to the service according to that rule.
We will define an API forwarding rule where the request made for any watermark will be first forwarded to the authentication service which will authenticate the request, check for authorized users and return the appropriate status code.
The authorization service will look up the user making the request in the user database, along with that user's roles and permissions, and will send the response accordingly
Once the request has been authorized by the service, it will be forwarded back to the actual watermark service
The watermark service then performs the appropriate operation: putting the watermark on the document, adding a new document entry, or any other requested action
The operation from the watermark service of Get, Watermark or AddDocument will be performed by calling the database CRUD APIs and forwarded to the user
If the request is to AddDocument then the service should return the “TicketID” or if it is for watermark then it should return the status of the operation
Note:
Each user will have some specific roles, based on which the access controls will be identified for the user. For the sake of simplicity, the roles will be based on the type of document only, not the specific name of the book or journal
Getting Started:
Let’s start by creating a folder for our application in the $GOPATH. This will be the root folder containing our set of services.
Project Layout:
The project will follow the standard Golang project layout. If you want the full working code, please refer here
api: Stores the versions of the APIs swagger files and also the proto and pb files for the gRPC protobuf interface.
cmd: This will contain the entry point (main.go) files for all the services and also any other container images if any
docs: This will contain the documentation for the project
config: All the sample files or any specific configuration files should be stored here
deploy: This directory will contain the deployment files used to deploy the application
internal: This package is the conventional internal package identified by the Go compiler. It contains all the packages which need to be private and imported by its child directories and immediate parent directory. All the packages from this directory are common across the project
pkg: This directory will have the complete executing code of all the services in separate packages.
tests: It will have all the integration and E2E tests
vendor: This directory stores all the third-party dependencies locally so that dependency versions don’t mismatch later
We are going to use the Go kit framework for developing the set of services. The official Go kit examples of services are very good, though the documentation is not that great.
Watermark Service:
1. Under the Go kit framework, a service should always be represented by an interface.
Create a package named watermark in the pkg folder. Create a new service.go file in that package. This file is the blueprint of our service.
package watermark

import (
	"context"

	"github.com/velotiotech/watermark-service/internal"
)

type Service interface {
	// Get the list of all documents
	Get(ctx context.Context, filters ...internal.Filter) ([]internal.Document, error)
	Status(ctx context.Context, ticketID string) (internal.Status, error)
	Watermark(ctx context.Context, ticketID, mark string) (int, error)
	AddDocument(ctx context.Context, doc *internal.Document) (string, error)
	ServiceStatus(ctx context.Context) (int, error)
}
2. As per the functions defined in the interface, we will need five endpoints to handle the requests for the above methods. If you are wondering why we are using the context package, please refer here. Contexts enable microservices to handle multiple concurrent requests; although we don't lean on them heavily in this blog, passing a context is the idiomatic way to write service methods.
3. Implementing our service:
package watermark

import (
	"context"
	"net/http"
	"os"

	"github.com/velotiotech/watermark-service/internal"

	"github.com/go-kit/kit/log"
	"github.com/lithammer/shortuuid/v3"
)

type watermarkService struct{}

func NewService() Service { return &watermarkService{} }

func (w *watermarkService) Get(_ context.Context, filters ...internal.Filter) ([]internal.Document, error) {
	// query the database using the filters and return the list of documents
	// return error if the filter (key) is invalid and also return error if no item found
	doc := internal.Document{
		Content: "book",
		Title:   "Harry Potter and Half Blood Prince",
		Author:  "J.K. Rowling",
		Topic:   "Fiction and Magic",
	}
	return []internal.Document{doc}, nil
}

func (w *watermarkService) Status(_ context.Context, ticketID string) (internal.Status, error) {
	// query database using the ticketID and return the document info
	// return err if the ticketID is invalid or no Document exists for that ticketID
	return internal.InProgress, nil
}

func (w *watermarkService) Watermark(_ context.Context, ticketID, mark string) (int, error) {
	// update the database entry with watermark field as non empty
	// first check if the watermark status is not already in InProgress, Started or Finished state
	// If yes, then return invalid request
	// return error if no item found using the ticketID
	return http.StatusOK, nil
}

func (w *watermarkService) AddDocument(_ context.Context, doc *internal.Document) (string, error) {
	// add the document entry in the database by calling the database service
	// return error if the doc is invalid and/or the database invalid entry error
	newTicketID := shortuuid.New()
	return newTicketID, nil
}

func (w *watermarkService) ServiceStatus(_ context.Context) (int, error) {
	logger.Log("Checking the Service health...")
	return http.StatusOK, nil
}

var logger log.Logger

func init() {
	logger = log.NewLogfmtLogger(log.NewSyncWriter(os.Stderr))
	logger = log.With(logger, "ts", log.DefaultTimestampUTC)
}
We have defined a new type, watermarkService, an empty struct that implements the Service interface defined above. The struct implementation is hidden from the rest of the world.
NewService() is the constructor of our “object”. It is the only function available outside this package to instantiate the service.
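The pattern boils down to an exported interface, an unexported struct, and a constructor. Here is a stripped-down, self-contained sketch of that pattern (method signatures simplified from the real service):

```go
package main

import "fmt"

// Service is the public contract; callers program against this interface.
type Service interface {
	ServiceStatus() int
}

// watermarkService is unexported, so code outside this package
// can only obtain one through NewService.
type watermarkService struct{}

// NewService is the constructor of our "object" and the only way
// to instantiate the service from outside the package.
func NewService() Service { return &watermarkService{} }

func (w *watermarkService) ServiceStatus() int { return 200 }

func main() {
	svc := NewService()
	fmt.Println(svc.ServiceStatus()) // 200
}
```

Because only the interface escapes the package, the implementation can change freely, and tests can swap in fakes that satisfy the same interface.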
4. Now we will create the endpoints package which will contain two files. One is where we will store all types of requests and responses. The other file will be endpoints which will have the actual implementation of the requests parsing and calling the appropriate service function.
– Create a file named reqJSONMap.go. In this file, we will define all the request and response structs, such as GetRequest, GetResponse, StatusRequest, StatusResponse, etc. Add to these structs the fields we want as input in a request or want to pass as output in the response.
In this file, we have a struct Set which is the collection of all the endpoints, together with a constructor for it. We also have internal constructor functions, such as MakeGetEndpoint(), MakeStatusEndpoint(), etc., which return objects implementing Go kit's generic endpoint.Endpoint type.
In order to expose the Get, Status, Watermark, ServiceStatus and AddDocument APIs, we need to create endpoints for each of them. These functions handle the incoming requests and call the specific service methods.
5. Adding the transport methods to expose the services. Our services will be exposed over HTTP as REST APIs and over gRPC using protobuf.
Create a separate transport package in the watermark directory. This package will hold all the handlers, decoders and encoders for each transport mechanism.
6. Create a file http.go: This file will have the transport functions and handlers for HTTP with a separate path as the API routes.
This file maps the JSON payloads to their requests and responses. It contains the HTTP handler constructor, which registers the API routes to their specific handler functions (endpoints) along with the decoders and encoders of the requests and responses into a server object per route. The decoders and encoders exist simply to translate requests and responses into the desired form for processing; in our case, we convert them to and from the appropriate request and response structs using the JSON encoder and decoder.
We have a generic encoder for the response output, which is a simple JSON encoder.
7. Create another file in the same transport package with the name grpc.go. As its name suggests, it maps the protobuf payloads to their requests and responses. We create a gRPC handler constructor which creates the set of grpcServers and registers the appropriate endpoint with the decoders and encoders of the requests and responses.
– Before moving on to the implementation, we have to create a proto file that acts as the definition of our service interface and the request/response structs, so that the protobuf (.pb) files can be generated and used as the interface over which the services communicate.
– Create package pb in the api/v1 package path. Create a new file watermarksvc.proto. Firstly, we will create our service interface, which represents the remote functions to be called by the client. Refer to this for syntax and deep understanding of the protobuf.
We translate the Go service interface into a service definition in the proto file. We also recreate the request and response structs in the proto file so they can be understood by the RPCs defined in the service.
Note: Creating the proto files and generating the pb files using protoc is not the scope of this blog. We have assumed that you already know how to create a proto file and generate a pb file from it. If not, please refer protobuf and protoc gen
I have also created a script to generate the pb file, which just needs the path with the name of the proto file.
#!/usr/bin/env sh

# Install proto3 from source
# brew install autoconf automake libtool
# git clone https://github.com/google/protobuf
# ./autogen.sh ; ./configure ; make ; make install
#
# Update protoc Go bindings via
# go get -u github.com/golang/protobuf/{proto,protoc-gen-go}
#
# See also
# https://github.com/grpc/grpc-go/tree/master/examples

REPO_ROOT="${REPO_ROOT:-$(cd "$(dirname "$0")/../.." && pwd)}"
PB_PATH="${REPO_ROOT}/api/v1/pb"
PROTO_FILE=${1:-"watermarksvc.proto"}

echo "Generating pb files for ${PROTO_FILE} service"
protoc -I="${PB_PATH}" "${PB_PATH}/${PROTO_FILE}" --go_out=plugins=grpc:"${PB_PATH}"
8. Now, once the pb file is generated in the api/v1/pb/watermark package, we will create a new struct grpcServer, grouping all the endpoints for gRPC. This struct should implement pb.WatermarkServer, which is the server interface referred to by the services.
To implement these services, we define functions such as func (g *grpcServer) Get(ctx context.Context, r *pb.GetRequest) (*pb.GetReply, error). This function takes the request param, runs the ServeGRPC() function, and returns the response. We implement the ServeGRPC() calls for the rest of the methods similarly.
These functions are the actual Remote Procedures to be called by the service.
We will also need to add the decode and encode functions for the request and response structs from protobuf structs. These functions will map the proto Request/Response struct to the endpoint req/resp structs. For example: func decodeGRPCGetRequest(_ context.Context, grpcReq interface{}) (interface{}, error). This will assert the grpcReq to pb.GetRequest and use its fields to fill the new struct of type endpoints.GetRequest{}. The decoding and encoding functions should be implemented similarly for the other requests and responses.
9. Finally, we just have to create the entry point (main) files in cmd for each service. Since we have already mapped the routes to the endpoints that call the service functions, and mapped the proto service server to the endpoints through the ServeGRPC() functions, all that remains is to call the HTTP and gRPC server constructors here and start them.
Create a package watermark in the cmd directory and create a file watermark.go which will hold the code to start and stop the HTTP and gRPC servers for the service.
package main

import (
	"fmt"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"

	pb "github.com/velotiotech/watermark-service/api/v1/pb/watermark"
	"github.com/velotiotech/watermark-service/pkg/watermark"
	"github.com/velotiotech/watermark-service/pkg/watermark/endpoints"
	"github.com/velotiotech/watermark-service/pkg/watermark/transport"

	"github.com/go-kit/kit/log"
	kitgrpc "github.com/go-kit/kit/transport/grpc"
	"github.com/oklog/oklog/pkg/group"
	"google.golang.org/grpc"
)

const (
	defaultHTTPPort = "8081"
	defaultGRPCPort = "8082"
)

func main() {
	var (
		logger   log.Logger
		httpAddr = net.JoinHostPort("localhost", envString("HTTP_PORT", defaultHTTPPort))
		grpcAddr = net.JoinHostPort("localhost", envString("GRPC_PORT", defaultGRPCPort))
	)

	logger = log.NewLogfmtLogger(log.NewSyncWriter(os.Stderr))
	logger = log.With(logger, "ts", log.DefaultTimestampUTC)

	var (
		service     = watermark.NewService()
		eps         = endpoints.NewEndpointSet(service)
		httpHandler = transport.NewHTTPHandler(eps)
		grpcServer  = transport.NewGRPCServer(eps)
	)

	var g group.Group
	{
		// The HTTP listener mounts the Go kit HTTP handler we created.
		httpListener, err := net.Listen("tcp", httpAddr)
		if err != nil {
			logger.Log("transport", "HTTP", "during", "Listen", "err", err)
			os.Exit(1)
		}
		g.Add(func() error {
			logger.Log("transport", "HTTP", "addr", httpAddr)
			return http.Serve(httpListener, httpHandler)
		}, func(error) {
			httpListener.Close()
		})
	}
	{
		// The gRPC listener mounts the Go kit gRPC server we created.
		grpcListener, err := net.Listen("tcp", grpcAddr)
		if err != nil {
			logger.Log("transport", "gRPC", "during", "Listen", "err", err)
			os.Exit(1)
		}
		g.Add(func() error {
			logger.Log("transport", "gRPC", "addr", grpcAddr)
			// we add the Go kit gRPC Interceptor to our gRPC service as it is used by
			// the here demonstrated zipkin tracing middleware.
			baseServer := grpc.NewServer(grpc.UnaryInterceptor(kitgrpc.Interceptor))
			pb.RegisterWatermarkServer(baseServer, grpcServer)
			return baseServer.Serve(grpcListener)
		}, func(error) {
			grpcListener.Close()
		})
	}
	{
		// This function just sits and waits for ctrl-C.
		cancelInterrupt := make(chan struct{})
		g.Add(func() error {
			c := make(chan os.Signal, 1)
			signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)
			select {
			case sig := <-c:
				return fmt.Errorf("received signal %s", sig)
			case <-cancelInterrupt:
				return nil
			}
		}, func(error) {
			close(cancelInterrupt)
		})
	}
	logger.Log("exit", g.Run())
}

func envString(env, fallback string) string {
	e := os.Getenv(env)
	if e == "" {
		return fallback
	}
	return e
}
Let's walk through the above code. First, we use fixed ports for the servers to listen on: 8081 for the HTTP server and 8082 for the gRPC server. Then, in these code stubs, we create the service, the endpoints backed by it, and the HTTP and gRPC servers.
service = watermark.NewService()
eps = endpoints.NewEndpointSet(service)
grpcServer = transport.NewGRPCServer(eps)
httpHandler = transport.NewHTTPHandler(eps)
Now the next step is interesting. We create a variable of type group.Group from the oklog package. If you are new to this term, please refer here. Group helps you elegantly manage a group of goroutines. We create three goroutines: one for the HTTP server, one for the gRPC server, and a last one watching for cancel interrupts, as shown in the main function above.
In the same way, we start the gRPC server and the cancel-interrupt watcher. Great!! We are done here. Now, let's run the service.
go run ./cmd/watermark/watermark.go
The server has started locally. Now, open Postman or run curl against one of the endpoints, for example the HTTP service-status endpoint.
We have successfully created a service and exercised its endpoints.
Further:
I always like to round out a project with the maintenance pieces around it: a proper README, .gitignore, .dockerignore, Makefile, Dockerfiles, golangci-lint config files, CI/CD config files, etc.
I have created a separate Dockerfile for each of the three services in path /images/.
I have created a multi-stage Dockerfile to build the service binary and run it. We copy the appropriate code directories into a builder image, build the binary there, and then, in the same file, create a new image and copy just the binary into it from the previous one. The Dockerfiles for the other services are created similarly.
In the Dockerfile, we have given the CMD as go run watermark. This command will be the entry point of the container. I have also created a Makefile which has two main targets: build-image and build-push. The first builds the image and the second pushes it.
Note: I am keeping this blog concise as it is difficult to cover all the things. The code in the repo that I have shared in the beginning covers most of the important concepts around services. I am still working and continue committing improvements and features.
Let’s see how we can deploy:
We will see how to deploy all these services on a container orchestration platform (e.g., Kubernetes). This assumes you have at least a beginner's understanding of Kubernetes.
In the deploy directory, create a sample deployment having three containers: auth, watermark and database. Since the entry point commands for each container are already defined in the Dockerfiles, we don't need to pass any args or cmd in the deployment.
We will also need a Service to route external request traffic, via another load-balancer service or a NodePort-type service. For now, creating a NodePort-type service is enough to expose the watermark-service and get it running.
Another important and very interesting part is deploying the API gateway. This requires at least some knowledge of a cloud provider's stack. I have used the Azure stack to deploy an API gateway using the resource called “API Management” in Azure. Refer to the rules config files for the Azure APIM API gateway:
Further, only a proper CI/CD setup is remaining which is one of the most essential parts of a project after development. I would definitely like to discuss all the above deployment-related stuff in more detail but that is not in the scope of my current blog. Maybe I will post another blog for the same.
Wrapping up:
We have learned how to build a complete project with three microservices in Golang using one of the best distributed-system development frameworks: Go kit. We have also used the PostgreSQL database via GORM, which is used heavily in the Go community. We did not stop at development; we also covered, at a high level, the deployment lifecycle of the project by understanding what, how and where to deploy.
We created one microservice completely from scratch. Go kit makes it very simple to write the relationship between endpoints, service implementations and the communication/transport mechanisms. Now, go and try to create other services from the problem statement.
ClickHouse is an open-source column-oriented data warehouse for online analytical processing of queries (OLAP). It is fast, scalable, flexible, cost-efficient, and easy to run. It delivers best-in-industry query performance while significantly reducing storage requirements through its innovative use of columnar storage and compression.
ClickHouse’s performance exceeds comparable column-oriented database management systems that are available on the market. ClickHouse is a database management system, not a single database. ClickHouse allows creating tables and databases at runtime, loading data, and running queries without reconfiguring and restarting the server.
ClickHouse processes from hundreds of millions to over a billion rows of data across clusters of hundreds of nodes. It utilizes all available hardware to process queries as fast as possible. The peak processing performance for a single query stands at more than two terabytes per second.
What makes ClickHouse unique?
Data Storage & Compression: ClickHouse is designed to work on regular hard drives but uses SSDs and additional RAM when available. Data compression plays a crucial role in ClickHouse's excellent performance. It provides general-purpose compression codecs and some specialized codecs for specific kinds of data. These codecs trade CPU consumption against disk space in different ways and help ClickHouse outperform other databases.
High Performance: Using vector computation, the engine processes data in vectors (chunks of columns), achieving high CPU efficiency. It supports parallel processing across multiple cores, so large queries are parallelized naturally. ClickHouse also supports distributed query processing: data resides across shards, which are used for parallel execution of a query.
Primary & Secondary Indexes: Data is physically sorted by the primary key, allowing low-latency extraction of specific values or ranges. A secondary index in ClickHouse enables the database to know that a query's filtering conditions let it skip some data parts entirely; these are therefore also called data-skipping indexes.
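The sparse primary index can be illustrated with a toy model: ClickHouse keeps one index "mark" per granule (by default every 8192 rows) and binary-searches the marks to decide which granules to read, skipping the rest. The Go sketch below, with the granule size shrunk to 4 rows for readability, mimics that idea; it is an illustration of the concept, not ClickHouse's actual code:

```go
package main

import (
	"fmt"
	"sort"
)

const granuleRows = 4 // tiny granule so the example stays readable

// buildMarks records the first primary-key value of each granule —
// one entry per granule, not one per row, hence "sparse" index.
func buildMarks(sortedKeys []int) []int {
	var marks []int
	for i := 0; i < len(sortedKeys); i += granuleRows {
		marks = append(marks, sortedKeys[i])
	}
	return marks
}

// granulesFor returns the half-open range [first, last) of granules
// that may contain keys in [lo, hi]; everything outside it is skipped
// without ever being read from disk.
func granulesFor(marks []int, lo, hi int) (int, int) {
	first := sort.Search(len(marks), func(i int) bool { return marks[i] > lo }) - 1
	if first < 0 {
		first = 0
	}
	last := sort.Search(len(marks), func(i int) bool { return marks[i] > hi })
	return first, last
}

func main() {
	keys := []int{1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23} // sorted by primary key
	marks := buildMarks(keys)                                // [1 9 17]
	fmt.Println(granulesFor(marks, 10, 14))                  // 1 2: only the middle granule is read
}
```

This is also why point lookups are comparatively expensive: even for a single key, ClickHouse still reads a whole granule rather than one row.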
Support for Approximated Calculations: ClickHouse trades accuracy for performance via approximated calculations. It provides aggregate functions for approximate estimates of the number of distinct values, medians, and quantiles. It can also retrieve proportionally less data from disk by running queries on a sample of the data to get approximated results.
Data Replication and Data Integrity Support: The system keeps identical data on several replicas. After data is written to any available replica, the remaining replicas retrieve their copies in the background. Most failures are recovered automatically, or semi-automatically in complex scenarios.
But it can’t be all good, can it? There are some disadvantages to ClickHouse as well:
No full-fledged transactions.
Inability to efficiently and precisely change or remove previously inserted data. For example, to comply with GDPR, data can be cleaned up or modified only through batch deletes and updates.
ClickHouse is less efficient for point queries that retrieve individual rows by their keys due to the sparse index.
ClickHouse against its contemporaries
So, with all these distinctive features, how does ClickHouse compare with other industry-leading data storage tools? ClickHouse, being general-purpose, has a variety of use cases, with its own pros and cons, so here's a high-level comparison against the best tools in their respective domains. Each tool has unique traits depending on the use case, and a comparison around those would not be fair, but what we care about most is performance, scalability, cost, and other key attributes that can be compared irrespective of the domain. So here we go:
ClickHouse vs Snowflake:
With its decoupled storage & compute approach, Snowflake is able to segregate workloads and enhance performance. The search optimization service in Snowflake further enhances the performance for point lookups but has additional costs attached with it. ClickHouse, on the other hand, with local runtime and inherent support for multiple forms of indexing, drastically improves query performance.
Regarding scalability, ClickHouse being on-prem makes it slightly more challenging to scale compared to Snowflake, which is cloud-based. Managing hardware manually by provisioning clusters and migrating is doable but tedious. One possible solution is to deploy ClickHouse in the cloud, a very good option that is cheaper and, frankly, the most viable.
ClickHouse vs Redshift:
Redshift is a managed, scalable cloud data warehouse offering both provisioned and serverless options. Its RA3 nodes scale compute and cache the necessary data. Still, even with that, Redshift does not isolate different workloads running on the same data, putting it on the lower end of the decoupled compute-and-storage cloud architectures. ClickHouse's local runtime, meanwhile, is one of the fastest.
Both Redshift and ClickHouse are columnar and keep data sorted, allowing queries to read only the specific data they need. But deploying ClickHouse is cheaper, and although Redshift is tailored to be a ready-to-use tool, ClickHouse is the better option if you're not entirely dependent on Redshift's features like managed configuration, backup & monitoring.
ClickHouse vs InfluxDB:
InfluxDB, an open-source NoSQL database written in Go, is one of the most popular choices for dealing with time-series data and analysis. Despite being a general-purpose analytical DB, ClickHouse provides competitive write performance.
ClickHouse's data structures, like AggregatingMergeTree, allow real-time data to be stored in a pre-aggregated format, which puts its performance on par with TSDBs. It is significantly faster for heavy queries and comparable for light queries.
ClickHouse vs PostgreSQL:
Postgres is another very versatile DB and is thus widely used for various use cases, just like ClickHouse. Postgres, however, is an OLTP DB, so unlike ClickHouse, analytics is not its primary aim, though it is still used for analytics to a certain extent.
For transactional data, ClickHouse's columnar nature puts it below Postgres; but for analytical capabilities, ClickHouse stays ahead even after Postgres is tuned to its maximum potential, e.g., by using materialized views, indexing, cache sizes, buffers, etc.
ClickHouse vs Apache Druid:
Apache Druid is an open-source data store that is primarily used for OLAP. Both Druid & ClickHouse are very similar in terms of their approaches and use cases but differ in terms of their architecture. Druid is mainly used for real-time analytics with heavy ingestions and high uptime.
Unlike Druid, ClickHouse has a much simpler deployment. ClickHouse can be deployed on a single server, while a Druid setup needs multiple types of nodes (master, broker, ingestion, etc.). ClickHouse, with its SQL-like query support, provides better flexibility, and it is more performant when the deployment is small.
To summarize the differences between ClickHouse and other data warehouses:
ClickHouse Engines
Depending on the type of your table (internal or external), ClickHouse provides an array of engines that help us connect to different data storages and also determine how data is stored and accessed, and what other interactions it supports.
These engines are mainly categorized into two types:
Database Engines:
These allow us to work with different databases & tables. ClickHouse uses the Atomic database engine to provide configurable table engines and dialects. The popular ones are PostgreSQL, MySQL, and so on.
Table Engines:
These determine
how and where data is stored
where to read/write it from/to
which queries it supports
use of indexes
concurrent data access and so on.
These engines are further classified into families based on the above parameters:
MergeTree Engines:
This is the most universal and functional family of table engines for high-load tasks. The engines of this family support quick data insertion with subsequent background data processing. These engines also support data replication, partitioning, secondary data-skipping indexes and some other features. Following are some of the popular engines in this family:
MergeTree
SummingMergeTree
AggregatingMergeTree
MergeTree engines with indexing and partitioning support allow data to be processed at a tremendous speed. These can also be leveraged to form materialized views that store aggregated data further improving the performance.
Log Engines:
These are lightweight engines with minimum functionality. These work the best when the requirement is to quickly write into many small tables and read them later as a whole. This family consists of:
Log
StripeLog
TinyLog
These engines append data to the disk in a sequential fashion and support concurrent reading. They do not support indexing, updating, or deleting and hence are only useful when the data is small, sequential, and immutable.
Integration Engines:
These are used for communicating with other data storage and processing systems. These support:
JDBC
MongoDB
HDFS
S3
Kafka and so on.
Using these engines, we can import and export data from external sources. With engines like Kafka, we can ingest data directly from a topic into a ClickHouse table, and with the S3 engine, we can work directly with S3 objects.
Special Engines:
ClickHouse offers some special engines that are specific to the use case. For example:
MaterializedView
Distributed
Merge
File and so on.
These special engines have their own quirks; for example, with the File engine we can export data to a file, update data in the table by updating the file, etc.
Summary
We learned that ClickHouse is a very powerful and versatile tool: one that has stellar performance, is feature-packed, very cost-efficient, and open-source. We saw a high-level comparison of ClickHouse with some of the best choices across an array of use cases. Although it ultimately comes down to how specific and intense your use case is, ClickHouse and its generic nature measure up pretty well on multiple occasions.
ClickHouse's applicability in web analytics, network management, log analysis, time-series analysis, asset valuation in financial markets, and security threat identification makes it tremendously versatile. By consistently solving business problems with low-latency responses over petabytes of data, ClickHouse is indeed one of the fastest data warehouses out there.
The concept of operators was introduced by CoreOS in the last quarter of 2016, and since the introduction of the Operator Framework last year, operators have rapidly become the standard way of managing applications on Kubernetes, especially those that are stateful in nature. In this blog post, we will learn what an operator is, why operators are needed, and what problems they solve. We will also create a Helm-based operator as an example.
This is the first part of our Kubernetes Operator Series. In the second part, getting started with Kubernetes operators (Ansible based), and the third part, getting started with Kubernetes operators (Golang based), you can learn how to build Ansible and Golang based operators.
What is an Operator?
Whenever we deploy our application on Kubernetes we leverage multiple Kubernetes objects like deployment, service, role, ingress, config map, etc. As our application gets complex and our requirements become non-generic, managing our application only with the help of native Kubernetes objects becomes difficult and we often need to introduce manual intervention or some other form of automation to make up for it.
Operators solve this problem by making our application a first-class Kubernetes object: instead of deploying the application as a set of native Kubernetes objects, we deploy a custom object/resource of its own kind, with a more domain-specific schema, and then bake the "operational intelligence" or "domain-specific knowledge" into the controller responsible for maintaining the desired state of this object. For example, the etcd operator makes etcd-cluster a first-class object, and we deploy a cluster by creating an object of the EtcdCluster kind. With operators, we can extend Kubernetes for custom use cases and manage our applications in a Kubernetes-native way, leveraging the Kubernetes APIs and kubectl tooling.
Operators combine CRDs (custom resource definitions) and custom controllers, and aim to eliminate the need for manual intervention (a human operator) when performing tasks like upgrades, failure recovery, and scaling of complex (often stateful) applications, making them more resilient and self-sufficient.
How to Build Operators ?
For building and managing operators, we mostly leverage the Operator Framework, an open-source toolkit allowing us to build operators in a highly automated, scalable and effective way. The Operator Framework comprises three subcomponents:
Operator SDK: The Operator SDK is the most important component of the Operator Framework. It allows us to bootstrap our operator project in minutes. It exposes higher-level APIs and abstractions, saving developers the time to dig deep into Kubernetes APIs so they can focus on building the operational logic. It performs common tasks, like getting the controller to watch the custom resource (CR) for changes, as part of the project setup process.
Operator Lifecycle Manager: Operators also run on the same kubernetes clusters in which they manage applications and more often than not we create multiple operators for multiple applications. Operator lifecycle manager (OLM) provides us a declarative way to install, upgrade and manage all the operators and their dependencies in our cluster.
Operator Metering: Operator metering is currently an alpha project. It records historical cluster usage and can generate usage reports showing usage breakdown by pod or namespace over arbitrary time periods.
Types of Operators
Currently, there are three different types of operators we can build:
Helm based operators: Helm based operators allow us to use our existing Helm charts and build operators from them. Helm based operators are quite easy to build and are preferred for deploying stateless applications using the operator pattern.
Ansible based operators: Ansible based operators allow us to use our existing Ansible playbooks and roles and build operators from them. These are also easy to build and generally preferred for stateless applications.
Go based operators: Go based operators are built to solve the most complex use cases and are generally preferred for stateful applications. In the case of a Golang based operator, we build the controller logic ourselves, providing it with all our custom requirements. This type of operator is also relatively complex to build.
Building a Helm based operator
1. Let’s first install the operator sdk
go get -d github.com/operator-framework/operator-sdk
cd $GOPATH/src/github.com/operator-framework/operator-sdk
git checkout master
make dep
make install
Now we will have the operator-sdk binary in the $GOPATH/bin folder.
2. Setup the project
For building a Helm based operator, we can use an existing Helm chart. We will be using the book-store Helm chart, which deploys a simple Python app and MongoDB instances. The app allows us to perform CRUD operations via REST endpoints.
Now we will use the operator-sdk to create our Helm based bookstore-operator project.
operator-sdk new bookstore-operator --api-version=velotio.com/v1alpha1 --kind=BookStore --type=helm --helm-chart=book-store --helm-chart-repo=https://akash-gautam.github.io/helmcharts/
In the above command, bookstore-operator is the name of our operator/project, --kind is used to specify the kind of objects this operator will watch, and --api-version is used for versioning of this object. The operator-sdk takes only this much information and creates the custom resource definition (CRD) and also the custom resource (CR) of its type for us (remember we talked about the high-level abstraction the operator-sdk provides). The above command bootstraps a project with the below folder structure.
bookstore-operator/
|- build/        # Contains the Dockerfile to build the operator image
|- deploy/       # Contains the crd, cr and manifest files for deploying the operator
|- helm-charts/  # Contains the helm chart we used while creating the project
|- watches.yaml  # Specifies the resource the operator watches (maintains the state of)
We discussed earlier that the operator-sdk automates setting up operator projects, and that is exactly what we can observe here. Under the build folder, we have the Dockerfile to build our operator image. Under the deploy folder, we have a crds folder containing both the crd and the cr. The deploy folder also has the operator.yaml file, using which we will run the operator in our cluster, along with the manifest files for the role, rolebinding and service account to be used while deploying the operator. We have our book-store helm chart under helm-charts.
In the watches.yaml file, we can see that the bookstore-operator watches events related to objects of the BookStore kind and executes the helm chart specified.
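For reference, the generated watches.yaml for a Helm based operator has roughly this shape (the values below are inferred from the operator-sdk command above; the chart path inside your generated file may differ):

```yaml
- version: v1alpha1
  group: velotio.com
  kind: BookStore
  chart: /opt/helm/helm-charts/book-store
```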
If we take a look at the CR file (velotio_v1alpha1_bookstore_cr.yaml) under the deploy/crds folder, we can see that it looks just like the values.yaml file of our book-store helm chart.
apiVersion: velotio.com/v1alpha1
kind: BookStore
metadata:
  name: example-bookstore
spec:
  # Default values copied from <project_dir>/helm-charts/book-store/values.yaml
  # Default values for book-store.
  # This is a YAML-formatted file.
  # Declare variables to be passed into your templates.
  replicaCount: 1
  image:
    app:
      repository: akash125/pyapp
      tag: latest
      pullPolicy: IfNotPresent
    mongodb:
      repository: mongo
      tag: latest
      pullPolicy: IfNotPresent
  service:
    app:
      type: LoadBalancer
      port: 80
      targetPort: 3000
    mongodb:
      type: ClusterIP
      port: 27017
      targetPort: 27017
  resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi
  nodeSelector: {}
  tolerations: []
  affinity: {}
In the case of Helm charts, we use the values.yaml file to pass parameters to our Helm releases; a Helm based operator converts all these configurable parameters into the spec of our custom resource. This allows us to express the values.yaml as a custom resource (CR) which, as a native Kubernetes object, gains the benefits of RBAC and an audit trail. Now when we want to update our deployed app, we can simply modify the CR and apply it, and the operator will ensure that the changes we made are reflected in our app.
For each object of `BookStore` kind the bookstore-operator will perform the following actions:
Create the bookstore app deployment if it doesn't exist.
Create the bookstore app service if it doesn't exist.
Create the mongodb deployment if it doesn't exist.
Create the mongodb service if it doesn't exist.
Ensure the deployments and services match their desired configurations, like the replica count, image tag, service port, etc.
3. Build the Bookstore-operator Image
The Dockerfile for building the operator image is already in our build folder; we need to run the below command from the root folder of our operator project to build the image.
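With the v0.x operator-sdk this post is based on, building the image is a single command; the image name below is only an example, substitute your own registry and repository:

```shell
# Build the operator image (run from the project root)
operator-sdk build <your-registry>/bookstore-operator:v0.0.1
# Push it so your cluster can pull it
docker push <your-registry>/bookstore-operator:v0.0.1
```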
4. Deploy the Bookstore-operator
As we have our operator image ready, we can now go ahead and run it. The deployment file for the operator (operator.yaml under the deploy folder) was created as part of our project setup; we just need to set the image for this deployment to the one we built in the previous step.
After updating the image in the operator.yaml we are ready to deploy the operator.
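Assuming the default scaffolding file names under deploy/ (service_account.yaml, role.yaml, role_binding.yaml and operator.yaml), deployment is a series of kubectl apply commands:

```shell
kubectl apply -f deploy/service_account.yaml
kubectl apply -f deploy/role.yaml
kubectl apply -f deploy/role_binding.yaml
kubectl apply -f deploy/operator.yaml
```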
Note: The role created might have more permissions than actually required for the operator, so it is always a good idea to review it and trim down the permissions in production setups.
Verify that the operator pod is in running state.
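A quick check looks like this (the pod name suffix will differ in your cluster):

```shell
kubectl get pods
# Expect a pod named bookstore-operator-<hash> with STATUS "Running"
```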
5. Deploy the Bookstore App
Now that we have the bookstore-operator running in our cluster, we just need to create the custom resource for deploying our bookstore app.
First, before we can create the bookstore CR, we need to register its CRD.
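Assuming the default file names generated under deploy/crds (the CR file name is the one mentioned above; the CRD file name is my guess at the matching pattern):

```shell
kubectl apply -f deploy/crds/velotio_v1alpha1_bookstore_crd.yaml
kubectl apply -f deploy/crds/velotio_v1alpha1_bookstore_cr.yaml
```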
Since its early days, Kubernetes has been seen as a great tool for managing stateless applications, but managing stateful applications on Kubernetes was always considered difficult. Operators are a big leap towards managing stateful applications, and other complex distributed, multi (poly) cloud workloads, with the same ease with which we manage stateless applications. In this blog post, we learned the basics of Kubernetes operators and built a simple Helm based operator. In the next installment of this blog series, we will build an Ansible based Kubernetes operator, and in the last blog we will build a full-fledged Golang based operator for managing stateful workloads.
When working with servers or command-line-based applications, we spend most of our time on the command line. A good-looking and productive terminal is better in many aspects than a GUI (Graphical User Interface) environment since the command line takes less time for most use cases. Today, we’ll look at some of the features that make a terminal cool and productive.
You can use the following steps on Ubuntu 20.04. If you are using a different operating system, your commands will likely differ. If you're using Windows, you can choose between Cygwin, WSL, and Git Bash.
Prerequisites
Let’s upgrade the system and install some basic tools needed.
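On Ubuntu, that boils down to something like the following; the exact tool list is my suggestion, adjust it to your needs:

```shell
sudo apt update && sudo apt upgrade -y
sudo apt install -y git curl wget
```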
Zsh is an extended Bourne shell with many improvements, including some features of Bash and other shells.
Let’s install Z-Shell:
sudo apt install zsh
Make it our default shell for our terminal:
chsh -s $(which zsh)
Now restart the system and open the terminal again to be welcomed by ZSH. Unlike other shells like Bash, ZSH requires some initial configuration, so it asks for some configuration options the first time we start it and saves them in a file called .zshrc in the home directory (/home/user) where the user is the current system user.
For now, we’ll skip the manual work and get a head start with the default configuration. Press 2, and ZSH will populate the .zshrc file with some default options. We can change these later.
The initial configuration setup can be run again later if you want to revisit these choices.
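One way to rerun the wizard is the zsh-newuser-install function that ships with zsh (run this from inside a zsh session):

```shell
autoload -Uz zsh-newuser-install
zsh-newuser-install -f
```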
Oh-My-ZSH
Oh-My-ZSH is a community-driven, open-source framework to manage your ZSH configuration. It comes with many plugins and helpers. It can be installed with one single command as below.
Installation
sh -c "$(wget https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh -O -)"
It takes a backup of our existing .zshrc in a file called .zshrc.pre-oh-my-zsh, so if you ever uninstall Oh-My-ZSH, the backup will be restored automatically.
Font
A good terminal needs some good fonts. We'll use the Terminess nerd font to make our terminal look awesome, which can be downloaded here. Once downloaded, extract the fonts and move them to ~/.local/share/fonts to make them available for the current user, or to /usr/share/fonts to make them available for all users.
unzip Terminess.zip
mv *.ttf ~/.local/share/fonts
Once the font is installed, select it in your terminal's profile settings to apply it.
Among all the things Oh-My-ZSH provides, two are community favorites: plugins and themes.
Theme
My go-to ZSH theme is powerlevel10k because it’s flexible, provides everything out of the box, and is easy to install with one command as shown below:
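The usual install for Oh-My-ZSH users clones the theme into the custom themes directory and then sets it in .zshrc:

```shell
git clone --depth=1 https://github.com/romkatv/powerlevel10k.git \
  ${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}/themes/powerlevel10k
# Then set the theme in ~/.zshrc:
# ZSH_THEME="powerlevel10k/powerlevel10k"
```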
Close the terminal and start it again. Powerlevel10k will welcome you with its initial setup; go through it, choosing the options you want. You can run this setup again by executing the below command:
p10k configure
Tools and plugins we can’t live without
Plugins can be added to the plugins array in the .zshrc file. For every plugin you want to use from the list below, add it to the plugins array in the .zshrc file.
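For example, enabling the git plugin plus one plugin from the list below would look like this in .zshrc:

```shell
plugins=(git zsh-syntax-highlighting)
```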
ZSH-Syntax-Highlighting
This enables the highlighting of commands as you type and helps you catch syntax errors before you execute them.
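The plugin is typically installed by cloning it into Oh-My-ZSH's custom plugins directory (and then adding it to the plugins array):

```shell
git clone https://github.com/zsh-users/zsh-syntax-highlighting.git \
  ${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}/plugins/zsh-syntax-highlighting
```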
With the plugin enabled, a valid command such as ls is highlighted in green, while a mistyped command such as lss shows up in red.
Autojump
Autojump is a faster way of navigating the file system; it works by maintaining a database of the directories you visit the most. More details can be found here.
sudo apt install autojump
You can also use the plugin Z as an alternative if you're unable to install autojump for any reason.
Internal Plugins
Some plugins come bundled with oh-my-zsh, and they can be included directly in the .zshrc file without any installation.
copyfile
It copies the content of a file to the clipboard.
copyfile test.txt
copypath
It copies the absolute path of the current directory to the clipboard.
copybuffer
This plugin copies the command that is currently typed in the command prompt to the clipboard. It works with the keyboard shortcut CTRL + o.
sudo
Sometimes, we forget to prefix a command with sudo, but that can be done in just a second with this plugin. When you hit the ESC key twice, it will prefix the command you’ve typed in the terminal with sudo.
web-search
This adds some aliases for searching with Google, Wikipedia, etc. For example, if you want to web-search with Google, you can execute the below command:
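For instance, the plugin defines a google alias, so a Google search from the terminal is just:

```shell
google how to exit vim
```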
Remember, you'd have to add each of these plugins to the plugins array in the .zshrc file as well.
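Putting it all together, with every plugin from this article enabled, the plugins array might look like this (keep only the ones you actually installed):

```shell
plugins=(
  git
  zsh-syntax-highlighting
  autojump
  copyfile
  copypath
  copybuffer
  sudo
  web-search
)
```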
You can add more plugins, like docker, heroku, kubectl, npm, jsontools, etc., if you’re a developer. There are plugins for system admins as well or for anything else you need. You can explore them here.
Enhancd
Enhancd is a next-gen way to navigate the file system from the CLI. It works with a fuzzy finder, so we'll install fzf for this purpose.
sudo apt install fzf
Enhancd can be installed with the zplug plugin manager for ZSH, so first we’ll install zplug with the below command:
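zplug's documented installer is a one-liner that fetches and runs its install script:

```shell
curl -sL --proto-redir -all,https https://raw.githubusercontent.com/zplug/installer/master/installer.zsh | zsh
```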
Now close your terminal, open it again, and use zplug to install enhancd by adding the below line to your .zshrc and then running zplug install:
zplug "b4b4r07/enhancd", use:init.sh
Aliases
As a developer, I need to execute git commands many times a day, and typing each command in full every time is too cumbersome, so we can use aliases for them. Aliases need to be added to .zshrc; here's how we can add them.
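For example, a few common git aliases might look like this in .zshrc (the short names below are popular conventions, not prescribed by any tool; pick whatever you like):

```shell
# Example git aliases; the short names are just common conventions
alias gs='git status'
alias ga='git add'
alias gcm='git commit -m'
alias gco='git checkout'
alias gp='git push'
```

After restarting the terminal (or running source ~/.zshrc), typing gs runs git status.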
Colorls
Colorls is a Ruby gem that beautifies the terminal's ls output with colors and icons; it can be installed with gem install colorls. Now, restart your terminal and execute the command colorls to see the magic!
Bonus: we can add some aliases as well if we want the same output as Colorls when we execute the command ls. Note that we're adding another alias, cl, to keep plain ls available as well.
alias cl='ls'
alias ls='colorls'
alias la='colorls -a'
alias ll='colorls -l'
alias lla='colorls -la'
These are the tools and plugins I can't live without now. Let me know if I've missed anything.
Automation
Would you want to repeat this whole process manually if, say, you bought a new laptop and wanted the same setup? If your answer is no, you can automate all of this, and that's why I've created Project Automator. This project does a lot more than just setting up a terminal: it works with Arch Linux as of now, but you can take the parts you need and make it work with almost any *nix system you like.
Explaining how it works is beyond the scope of this article, so I’ll have to leave you guys here to explore it on your own.
Conclusion
We need to perform many tasks on our systems, and using a GUI (Graphical User Interface) tool for a task can consume a lot of your time, especially if you repeat the same task on a daily basis, like converting a media stream or setting up tools on a system.
Using a command-line tool can save you a lot of time, and you can automate repetitive tasks with scripting. The command line can be a great tool in your arsenal.