Devops

Introduction

In this blog post you will get familiar with the most important Kubernetes interview questions asked in the DevOps space. I have tried to cover everything from the most basic questions to the more advanced ones. Even if you are an application developer rather than a DevOps engineer, this post should give you adequate knowledge of Kubernetes.

What is Kubernetes?

Kubernetes is one of the most popular open-source container orchestration tools. We have tools like Docker to create containers, but in order to manage those containers we need a container orchestration tool. Such tools automate tasks such as the management, monitoring, scaling and deployment of containers.

What is K8s?

K8s is an alias for Kubernetes. The 8 stands for the number of letters between ‘K’ and ‘s’ in the word Kubernetes.
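
The abbreviation rule can be sketched in a few lines of Python. The function name `numeronym` is just an illustrative choice, not something from the Kubernetes project.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) <= 2:
        return word  # nothing to abbreviate
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("Kubernetes"))  # K8s
```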

What are the features of Kubernetes?

Horizontal Scaling – This feature uses a HorizontalPodAutoscaler, which automatically increases or decreases the number of pods in a deployment, replication controller, replica set, or stateful set based on CPU utilization. Scaling can be controlled via configuration as well.

Self Healing – Whenever a container fails, Kubernetes tries to restart it automatically. Whenever a node dies, Kubernetes reschedules or replaces the containers that were running on it. It also kills containers that do not respond to health checks and removes them from the network so that they stop receiving user requests. All these decisions are based on health checks that Kubernetes performs internally.

Load balancing & service discovery – This is one of the key features: Kubernetes assigns an IP address and a DNS name to a set of containers. Kubernetes can also take care of load balancing and service discovery, like other cloud platforms.

Automated Rollouts and Rollbacks – Kubernetes rolls out changes progressively. If there are any issues in the deployment, Kubernetes will automatically roll the deployment back.

Storage Orchestration – Kubernetes can automatically mount the storage system of your choice, be it local storage or cloud storage such as AWS. Kubernetes provides an essential feature called ‘persistent storage’ for storing data, so that the data is not lost after a pod is killed or rescheduled. Kubernetes supports various storage systems, such as Google Compute Engine’s Persistent Disks (GCE PD) or Amazon Elastic Block Store (EBS). It also supports distributed file systems such as NFS or GFS.

Batch Execution – Everything that runs on Kubernetes is a workload. A workload can be a single component or several working together. There are several built-in workload resources available. Kubernetes provides two workload resources for batch work, i.e. the Job object and the CronJob object.

Configuration Management – Kubernetes provides a feature to change the configuration without rebuilding the image, so there is no need to redeploy the application when we change the configuration. Configuration will be outside the image.
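
To make the Horizontal Scaling feature above more concrete, here is a minimal Python sketch of the proportional rule described in the Kubernetes autoscaling documentation: desired replicas = ceil(current replicas × current metric / target metric). The function name, the default bounds and the sample numbers are assumptions for illustration, and the real autoscaler also applies a tolerance and stabilization window that this sketch leaves out.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_pct: float,
                     target_cpu_pct: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Scale proportionally to how far the metric is from its target."""
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    # Clamp to the configured bounds (hypothetical defaults above).
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(4, 90, 60))  # 6: pods are busier than the target
print(desired_replicas(4, 30, 60))  # 2: pods are mostly idle
```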

What is container orchestration in Devops

Orchestration means coordinating multiple services and stringing them together in order to execute a workflow or a process. Think of an orchestra, where multiple people each play their own instrument: orchestration is the amalgamation of all those instruments playing together in harmony. Similarly, container orchestration means all the services in individual containers working together to fulfill the needs of a single server.

What is the need for Container Orchestration

Container orchestration is needed to achieve the complete workflow of an application. Let’s say we have 6-7 application Docker containers, and these applications need to communicate with each other. In order to manage these containers we need a container orchestration tool. These tools automate tasks such as the management, monitoring, scaling and deployment of containers.

Difference between Kubernetes and Docker

From the trends it may seem that these are competing technologies, but they are not alternatives to each other. In fact they go hand in hand. Docker can run without Kubernetes and Kubernetes can run without Docker. Good projects and architectures use both wisely, as they complement each other.

Docker	Kubernetes
Docker is a container technology which creates an isolated environment for the application.	Kubernetes is an infrastructure for managing multiple containers.
Docker is mostly used in the CI/CD process for automating, building and deploying applications.	Kubernetes is used for automated scheduling and management of the deployed application containers.
Docker is a container platform for configuring, building and distributing containers.	Kubernetes is an ecosystem for managing a cluster of Docker containers.

Difference between Kubernetes and Docker Swarm

Kubernetes and Docker Swarm are both container orchestration tools and alternatives to each other.

Kubernetes	Docker Swarm
Kubernetes is a portable, open-source, cloud-native infrastructure tool initially designed by Google to manage their clusters. Being a container orchestration tool, it automates the scaling, deployment, and management of containerized applications.	Docker Swarm is an open-source container orchestration platform that is native to Docker. It supports orchestrating clusters of Docker engines.
Complex installation.	Easier installation.
More complex with a steep learning curve, but more powerful.	More lightweight and easier to use, but with limited functionality.
Supports auto scaling.	Supports manual scaling.
Kubernetes has built-in monitoring.	Docker Swarm needs third-party tools for monitoring.
Manual setup of a load balancer is needed.	Docker Swarm supports auto load balancing.
Kubernetes has the kubectl CLI tool.	Docker Swarm integrates with the Docker CLI.

What is a node in Kubernetes

A node in Kubernetes is a single machine in a cluster, which can be a physical machine in a data center or a virtual machine from a cloud provider. There are two types of nodes in Kubernetes, i.e. the master node and worker nodes. The master controls the worker nodes, which run the containers. Each machine can substitute for any other machine in a Kubernetes cluster.

The master node has 4 processes running inside it, i.e. the API server, scheduler, controller manager, and etcd.

What is a pod in Kubernetes

A Pod is the basic and smallest unit of Kubernetes.
A Pod is an abstraction over a container.
A Pod is like a layer or running environment for the container.
Usually only one application runs inside a single Pod, and each Pod gets its own IP address.
Pods communicate with each other using this IP address. If a Pod dies for some reason and a new Pod is created in its place, the new Pod is assigned a new IP address.
The main reason Kubernetes introduced Pods is to be independent of the container technology, so that it can be replaced if required. Instead of working with containers directly, you interact with Pods.

What is a service in Kubernetes

In Kubernetes, every pod has its own IP address, and that address changes whenever a pod dies and a new pod is created. A service stands in front of the pods and has a stable IP address, so clients can call the service’s IP address instead of calling the pods directly, whose IP addresses keep changing. One of the best things about a Kubernetes service is that the lifecycles of the service and the pod are not connected: even if the pod dies, the service and its IP address stay.

A service can also act as a load balancer and forward a request to one of the pods when there are multiple pods of the same application.
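
The idea of a stable address in front of changing pod addresses can be modelled in a few lines of Python. This is only a toy model: the `Service` class, the `replace_pod` method and all IP addresses are made up for illustration and are not part of any Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """A stable front for a changing set of pod IPs (hypothetical model)."""
    cluster_ip: str
    endpoints: list = field(default_factory=list)

    def replace_pod(self, old_ip: str, new_ip: str) -> None:
        # A replacement pod comes up with a different IP address.
        self.endpoints = [new_ip if ip == old_ip else ip for ip in self.endpoints]


svc = Service(cluster_ip="10.96.0.10", endpoints=["10.244.1.5", "10.244.2.7"])
svc.replace_pod("10.244.1.5", "10.244.3.9")

print(svc.cluster_ip)  # unchanged: 10.96.0.10
print(svc.endpoints)   # ['10.244.3.9', '10.244.2.7']
```

Clients keep calling `10.96.0.10` throughout; only the list behind it changes.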

What are the various Service types available in Kubernetes?

ClusterIP – ClusterIP is the default service type. It has a static IP address and is accessible only within the cluster.
NodePort – A NodePort service lets external traffic reach the service through a fixed port on each worker node. Because it opens a port on every node, it is not considered secure and is generally not recommended for production use.
LoadBalancer – LoadBalancer can be thought of as an extension of the NodePort service type, just as NodePort is an extension of the ClusterIP service type. Each cloud provider has its own native load balancer implementation. Whenever a LoadBalancer service is created, the NodePort and ClusterIP services are created automatically.
Headless – Used when clients or internal applications want to communicate directly with specific pods (such as stateful database pods).

What is ingress in Kubernetes

An ingress allows users to access Kubernetes services from outside the Kubernetes cluster. We can configure the access rules that define which inbound connections reach which services.

To access a Kubernetes service from outside, we use the IP address of the node we want to talk to, so in the browser it looks like http://123.78.101.5:8081. Obviously, for the end product you don’t want users to type an IP address to access the application; we need a proper domain name such as https://your-app.com. For this purpose Kubernetes has a component called Ingress. Instead of going to the service directly, the request first goes to the ingress, which then forwards it to the service.
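
Conceptually, an ingress is a table of rules that map a host and path to a backend service. The Python sketch below illustrates that lookup; the rule table and the service names `api-service` and `web-service` are hypothetical, and the plain string-prefix match is a simplification of how a real ingress controller matches paths.

```python
# Hypothetical routing rules: (host, path prefix, backend service name).
RULES = [
    ("your-app.com", "/api", "api-service"),
    ("your-app.com", "/", "web-service"),
]


def route(host: str, path: str):
    """Return the backend service for a request, or None if no rule matches."""
    matches = [(prefix, svc) for h, prefix, svc in RULES
               if h == host and path.startswith(prefix)]
    if not matches:
        return None
    # Prefer the longest matching prefix, so /api wins over /.
    return max(matches, key=lambda m: len(m[0]))[1]


print(route("your-app.com", "/api/users"))   # api-service
print(route("your-app.com", "/index.html"))  # web-service
print(route("other.com", "/"))               # None
```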

What is the difference between Deploying applications on host and container

A host has an operating system with a kernel, and the various libraries an application needs are installed on that operating system. In this host-based architecture there can be any number of applications, and they all share the libraries present in the operating system.

In a container-based architecture, there is only one common component, i.e. the kernel. Every application is containerized and isolated from other applications. Each application has its necessary libraries and binaries isolated from the rest of the system, where no other application can interfere with them.

What are the different components of Kubernetes architecture

See the separate blog post on Kubernetes architecture for the details.

What is MiniKube

Minikube is used to run a single-node Kubernetes cluster on a personal computer running Windows, macOS, etc. If you want to try out Kubernetes for learning purposes or for daily development work, you can install Minikube.

What is ConfigMap and Secret

A ConfigMap is used to store configuration information, such as the URLs of other services that your service depends on. It can also be used to store database configuration. We just need to connect the ConfigMap to the pod, and the pod gets the required configuration. The traditional way of storing such information was in application.properties, in application.yaml, or in the form of environment variables. The properties or yaml files would be part of the application build. One drawback of this approach was that if any value changed, we had to rebuild the application and redo the other steps for the change to take effect. With a ConfigMap there is no need to rebuild the image when a value changes: we just change the ConfigMap and the change is reflected.

A Secret is just like a ConfigMap. The only difference is that a Secret is used to store secret data such as credentials, certificates, etc. It would be risky to store passwords in a ConfigMap, hence Secrets are the right choice. The data is stored in base64-encoded format; note that base64 is an encoding, not encryption. Just like a ConfigMap, a Secret is connected to the Pod so that it can read the information. Inside your application pod you can read the information from a ConfigMap or Secret through environment variables or even as a properties file.
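
To see what "base64 encoded" means in practice, the short Python sketch below encodes two made-up credentials the way values appear under a Secret’s `data` field, then decodes them again. The key names and values are assumptions for illustration only.

```python
import base64

# Hypothetical credentials, encoded the way values appear under a Secret's `data` field.
secret_data = {
    "username": base64.b64encode(b"admin").decode(),
    "password": base64.b64encode(b"s3cr3t").decode(),
}
print(secret_data["username"])  # YWRtaW4=

# Anyone who can read the Secret can decode it: base64 is not encryption.
decoded = {k: base64.b64decode(v).decode() for k, v in secret_data.items()}
print(decoded["password"])  # s3cr3t
```

Because decoding is this easy, access to Secrets is usually restricted separately, for example with RBAC.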

What are Volumes in Kubernetes

Volumes in Kubernetes are used to persist data, for example the data in a database, reliably and for the long term. A volume basically attaches physical storage to your pod. The storage can be on the local machine where the pod is running, or it can be remote storage outside your Kubernetes cluster, either cloud storage or on-premise storage.

Without Kubernetes volumes, if the application, the database container or the pod were restarted, the data would be gone, which would be problematic and inconvenient. With volumes, your data is safe even if your pod or container is restarted.

Kubernetes does not manage data persistence or data backup; you as a Kubernetes user have to manage it.

What is deployment in kubernetes

A Deployment in Kubernetes is like an abstraction over pods, which makes it more convenient to interact with the pods, replicate them and do other configuration. In practice you mostly work with deployments and not with pods.

What is statefulset in kubernetes

A StatefulSet is meant for stateful applications, such as the databases MongoDB, Postgres, etc. If you want to replicate databases, you need to use a StatefulSet rather than a deployment. A StatefulSet, just like a deployment, takes care of replicating the pods and scaling them up and down, but it also gives each pod a stable identity and its own storage, which replicated databases rely on to keep reads and writes consistent. However, deploying databases with a StatefulSet in a Kubernetes cluster can be somewhat tedious. A common practice is to host the database outside the Kubernetes cluster.

What is etcd in kubernetes

Etcd in Kubernetes is one of the processes present in the master node. Etcd is a key-value store which contains the cluster state information. You can think of it as the cluster brain. Every change in the cluster, e.g. when a new pod gets created or when a pod dies, is saved or updated in this key-value store. The other processes of the master node, such as the API server, scheduler and controller manager, depend on the information present in etcd.

Etcd contains information such as whether the cluster is healthy, how many resources are available, etc.

What is ClusterIP in kubernetes

The ClusterIP is the default Kubernetes service that provides a service inside a cluster (with no external access) that other apps inside your cluster can access.

What is a LoadBalancer in Kubernetes

The Load Balancer service is used to expose services in the cluster to the internet. A Network load balancer creates a single IP address that forwards all traffic to your service.

As shown in the figure below, all the incoming traffic from the internet arrives at a single IP address on the load balancer, which routes it to a particular pod (via the service) using an algorithm known as round-robin. If any of the pods goes down, the load balancer is notified so that traffic is not routed to the unavailable pod. Thus load balancers in Kubernetes are responsible for distributing incoming traffic to the pods.
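
The round-robin idea described above can be sketched in Python as follows. The function and the pod names are hypothetical; a real load balancer tracks health through probes rather than a set passed in by the caller.

```python
from itertools import cycle


def round_robin(pods, healthy, requests):
    """Assign each request to the next healthy pod in rotation."""
    if not any(p in healthy for p in pods):
        raise RuntimeError("no healthy pods available")
    rotation = cycle(pods)
    assignments = []
    for _ in range(requests):
        # Skip pods that failed their health check.
        pod = next(p for p in rotation if p in healthy)
        assignments.append(pod)
    return assignments


pods = ["pod-a", "pod-b", "pod-c"]
print(round_robin(pods, healthy={"pod-a", "pod-b", "pod-c"}, requests=4))
# ['pod-a', 'pod-b', 'pod-c', 'pod-a']
print(round_robin(pods, healthy={"pod-a", "pod-c"}, requests=4))
# ['pod-a', 'pod-c', 'pod-a', 'pod-c']
```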

What is Kubelet in Kubernetes

Kubelet is one of the processes in Kubernetes. It runs on each worker node and enables the worker node to communicate with the master. Kubelet works from the container configuration in the PodSpec and makes sure that the containers described there are healthy and running.

Kubelet interacts with both the container and the node.

What is Kubectl in Kubernetes

Kubectl is the command-line tool through which you pass commands to the cluster. It provides a CLI for running commands against the Kubernetes cluster, with various ways to create and manage Kubernetes components.

What is Heapster

Heapster runs like any other pod in the Kubernetes cluster and is used to query usage data from all the nodes in the cluster. It acts as an aggregator that monitors performance and collects metrics from the data gathered by the Kubelet. Note that Heapster has since been deprecated in favor of metrics-server.

What is a Kube-Proxy in Kubernetes

Kube-proxy is used for directing traffic to the right container based on IP and the port number of incoming requests. Kube-proxy is an implementation of a load balancer and network proxy used to support service abstraction with other networking operations. Kube-proxy must be installed on every worker node.

To summarize, Kube-Proxy runs on every node and is responsible for the communication of pods within the cluster and with the outside network. It maintains the network rules that apply when your pod establishes a network connection.

What is Kube-apiserver

Kube-apiserver is the master API service, which acts as the entry point to the K8s cluster. The API server is the heart of the entire cluster. It exposes RESTful APIs so that we can interact with the master node. The usual way to make these calls is with kubectl, a command-line utility that takes the yml or json files we provide and hands them over to the API server. The API server takes the request, validates it and processes it by performing the task against the worker nodes. Under the hood kubectl simply makes HTTP calls to the API server, much as you could do by hand with curl.
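
Because the API is RESTful, every resource lives at a predictable URL path. The Python sketch below builds those paths for namespaced resources; the helper name `resource_path` is an assumption for illustration, while the `/api/v1` and `/apis/<group>/<version>` layout follows the Kubernetes API conventions.

```python
def resource_path(resource: str, namespace: str, group: str = "", version: str = "v1") -> str:
    """Build the REST path for a namespaced resource collection."""
    # The core group lives under /api; named groups live under /apis/<group>.
    prefix = f"/apis/{group}/{version}" if group else f"/api/{version}"
    return f"{prefix}/namespaces/{namespace}/{resource}"


print(resource_path("pods", "default"))
# /api/v1/namespaces/default/pods
print(resource_path("deployments", "default", group="apps"))
# /apis/apps/v1/namespaces/default/deployments
```

A command such as `kubectl get pods` ends up as an HTTP GET against the first of these paths on the API server.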

What does node status contain

Address, Condition, Capacity, and Info are the main components of node status.

What is a kube-scheduler in Kubernetes

The kube-scheduler is responsible for assigning newly created pods to nodes. It schedules pods according to the resources available on the worker nodes.
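
A heavily simplified Python sketch of this decision follows: first filter out nodes that cannot fit the pod, then score the remaining ones. The node names, resource numbers and the "most free CPU" scoring rule are all assumptions for illustration; the real scheduler uses many more filters and scoring plugins.

```python
def schedule(pod_request, nodes):
    """Pick a node for a pod: filter nodes that fit, then score the rest."""
    # Filtering: keep nodes with enough free CPU (millicores) and memory (MiB).
    feasible = [
        name for name, free in nodes.items()
        if free["cpu"] >= pod_request["cpu"] and free["memory"] >= pod_request["memory"]
    ]
    if not feasible:
        return None  # the pod stays pending until a node has room
    # Scoring (hypothetical rule): prefer the node with the most free CPU.
    return max(feasible, key=lambda name: nodes[name]["cpu"])


nodes = {
    "node-1": {"cpu": 500, "memory": 1024},
    "node-2": {"cpu": 2000, "memory": 4096},
    "node-3": {"cpu": 3000, "memory": 256},
}
print(schedule({"cpu": 1000, "memory": 512}, nodes))  # node-2
print(schedule({"cpu": 4000, "memory": 512}, nodes))  # None
```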

What do you mean by namespace in Kubernetes

Namespaces allow applications to be organized into groups that fall under the same domain, such as one namespace for all finance applications, another for monitoring applications and another for all security applications.

Namespaces can also be used to divide cluster resources between multiple users. They are meant for environments where there are multiple users spanning multiple projects or teams and provide a scope of resources.

What do you mean by Kubernetes controller manager

The Kubernetes controller manager runs multiple controller processes on the master node. They are logically separate processes, but they are compiled together to run as a single process.

The controller manager runs in the background and is responsible for making sure the cluster is in the desired state. For example, if we want 5 pods to be up and running, each with 2 containers, then it is the responsibility of the controller manager to make sure that the cluster stays in that desired state.
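
The "desired state" idea boils down to a loop that compares what should exist with what does exist and acts on the difference. Below is a minimal Python sketch of one such comparison; the function name and the action strings are made up for illustration and do not correspond to a Kubernetes API.

```python
def reconcile(desired_replicas: int, running_pods: list) -> list:
    """Return the actions needed to move the actual state to the desired state."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        return ["create pod"] * diff
    if diff < 0:
        # Too many pods: remove the surplus (here, simply the last ones listed).
        return [f"delete {name}" for name in running_pods[diff:]]
    return []  # already in the desired state


print(reconcile(5, ["pod-1", "pod-2", "pod-3"]))  # ['create pod', 'create pod']
print(reconcile(2, ["pod-1", "pod-2", "pod-3"]))  # ['delete pod-3']
print(reconcile(3, ["pod-1", "pod-2", "pod-3"]))  # []
```

A controller runs this kind of check continuously, so a pod that dies simply shows up as a new difference to correct.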

What are the types of controller managers

The primary controller managers that can run on the master node are the service accounts controller, namespace controller, endpoints controller, token controller, node controller, and replication controller.

What do you mean by headless service

A headless service enables you to access the pods directly without a proxy. It is similar to a normal service; the only difference is that it does not have a cluster IP.

What is the difference between Replication controller and Replica set?

A Replication Controller monitors the pods and automatically restarts them if there is any failure. If a node fails, the controller respawns all the pods of that node on another node. It is also responsible for updating or deleting multiple pods with a single command.

Replication Controller and Replica Set are almost the same thing; a Replica Set can be described as an advanced version of the Replication Controller. The only difference is in the selectors used to pick the pods to replicate: a Replica Set supports set-based selectors, while a Replication Controller uses only equality-based selectors.
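
The difference between the two selector styles is easier to see side by side. The Python sketch below is a simplified model of label matching; the function names and the sample labels are assumptions, while the operator names `In`, `NotIn`, `Exists` and `DoesNotExist` are the ones Kubernetes uses for set-based expressions.

```python
def matches_equality(labels: dict, selector: dict) -> bool:
    """Equality-based: every key must equal exactly one value."""
    return all(labels.get(k) == v for k, v in selector.items())


def matches_set(labels: dict, expressions: list) -> bool:
    """Set-based: each expression tests a key against a set of values."""
    for key, operator, values in expressions:
        if operator == "In" and labels.get(key) not in values:
            return False
        if operator == "NotIn" and labels.get(key) in values:
            return False
        if operator == "Exists" and key not in labels:
            return False
        if operator == "DoesNotExist" and key in labels:
            return False
    return True


pod_labels = {"app": "web", "env": "staging"}
print(matches_equality(pod_labels, {"app": "web", "env": "production"}))    # False
print(matches_set(pod_labels, [("env", "In", {"staging", "production"})]))  # True
```

With equality-based selectors, matching both staging and production pods would need two separate selectors; a single set-based expression covers both.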

What are the ways to establish security in Kubernetes

Introduce RBAC (role-based access control).
Use namespaces to establish security boundaries.
Set admission control policies to avoid running privileged containers.
Enable audit logging.

What is Blue Green deployment

September 1, 2022 by Ravikiran Kada

Introduction

One thing every developer wishes for is hassle-free application deployment, so that they don’t have sleepless nights due to deployment failures 😋😂. Especially in large teams and business units, where there are hundreds of microservices, deployment has to be carried out very cautiously. If the deployment fails for any reason, be it a wrong application configuration or any other issue, we should have a rollback strategy in place so that the application can be brought back to a stable state.

Traditional application Deployment

In a traditional deployment, if the deployment failed, the previous stable version of the application was deployed again. Sometimes rollback is not easy because we need to perform all the activities from scratch. In the worst case the application is unavailable for long periods of time.

In order to avoid such havoc, we need to have a proper deployment strategy in place. In this post we will discuss one of the many deployment strategies, i.e. Blue Green Deployment.

Blue Green Deployment

A Blue Green deployment is a deployment strategy built on two identical production environments that are live at the same time. In this strategy, the role of live production can be played by either Blue or Green. Let’s consider Blue to be the application version currently in production, and Green the new version that is to be deployed. The main goal of this strategy is to reduce the downtime of the application so that no business user is impacted.

The Blue environment runs the live production application. After adding new features to your application, you deploy it to the Green environment. In order to test that the existing features as well as the new ones work fine, you route some part of the application’s traffic (1%, or any criterion from your business domain, say user ID) to the Green environment. After successful testing, either gradually increase the traffic or switch the Green environment over to be the live production environment. In case of any failure you can roll back to the previous stable version, i.e. Blue.

When you have a new set of features ready for your next release, you switch from Green to Blue just as you did from Blue to Green earlier. That way the Green and Blue environments alternate between the roles of live and rollback version.
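
The traffic switch at the centre of this strategy can be modelled with a small Python sketch. The `Router` class and its methods are hypothetical and stand in for whatever router or load balancer sits in front of the two environments.

```python
class Router:
    """Hypothetical traffic router in front of two identical environments."""

    def __init__(self):
        self.weights = {"blue": 100, "green": 0}  # percent of traffic

    def shift_to(self, target: str, percent: int) -> None:
        other = "green" if target == "blue" else "blue"
        self.weights[target] = percent
        self.weights[other] = 100 - percent

    @property
    def live(self) -> str:
        # The environment receiving most of the traffic is the live one.
        return max(self.weights, key=self.weights.get)


router = Router()
router.shift_to("green", 1)    # send 1% of traffic to green for testing
router.shift_to("green", 100)  # tests passed: green becomes live
print(router.live)             # green
router.shift_to("blue", 100)   # failure found: roll back to blue
print(router.live)             # blue
```

Rolling back is just another weight change, which is why the previous environment is kept running until the new one has proven itself.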

From the database standpoint, this deployment strategy can be tricky. Priority should be given to the database schema changes (if any), which must support both the Blue and the Green version. The rollback version should be tested first; if everything is fine, deploy the new version of the application. If testing with the new production version is successful, the next task is to remove the database support for the old version of the application.

Blue Green deployment on Pivotal Cloud Foundry(PCF)

Let’s talk about blue green deployment in Pivotal Cloud Foundry. The CF Router plays the major role of switching traffic between the Blue and Green versions of the application. All we have to do is map or unmap the route (URL) to Green or Blue, based on the verification done for your application flows.

For more detailed information, you can go over the official documentation by Pivotal.

Blue Green deployment on Amazon Web Services (AWS)

In AWS, there are several components that work together to achieve the Blue Green deployment strategy. For detailed and well-explained documentation, you can go over the official AWS blue green deployment guide.

I hope you liked this post. Please do share across the community so that everyone will have a good time learning. Please feel free to comment if you find anything to be corrected in the post.

What Are Kubernetes Persistent Volumes

February 14, 2022 by Ravikiran Kada

In the early days of computers, people used the term “volume” a lot. It referred to the computer’s hard drive, where all your data is stored. Let me set the context with a simple example. When working on Spring Boot with the H2 (in-memory) database in your local IDE, whatever CRUD operations you perform against the database, the data is wiped out when you restart your Spring Boot application. Obviously, in a production environment we use other databases such as Oracle, MySQL or PostgreSQL, and we ensure that the system is highly available.

In Kubernetes terms, let’s say you have a MySQL pod where the data is stored; the data will be wiped out when you restart the pod. In the Kubernetes architecture, you have to take care of your application’s data storage requirements. In short, the data storage should depend neither on the pod lifecycle nor on the cluster state in the event of a crash.

Data storage requirements can be anything from database to storing files in a directory.

To sum up in a single line, a Kubernetes Persistent Volume is a component of the Kubernetes architecture, a cluster resource that takes care of the storage requirements of the containers in a pod. Just imagine if volumes didn’t exist: the data that containers created would be lost when the containers died 😶 This is why a separate component, i.e. Persistent Volumes, was needed in the Kubernetes cluster. Now that you have some idea about Persistent Volumes, let’s explore each aspect.

Characteristics of Volumes

Since data is an essential component that your applications rely on, volumes should be highly available.
Even if the cluster dies or crashes completely, the Persistent Volumes need to stay intact.
Volumes should be available to all the nodes.

Persistent Volumes (PV)

Persistent volumes are just an abstract component or an interface; the actual storage is taken care of by a storage backend, such as a cloud provider like AWS or Azure, the local hard disk drive of a cluster node, or NFS servers outside the cluster. Persistent volumes are created using a Kubernetes yml file, in which we can specify various properties such as the storage provider, the storage capacity, etc.

You can take a look at the list of Persistent Volume types that Kubernetes supports. Let’s look at one example of a Persistent Volume yaml file configuration.
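
As a minimal sketch of what such a configuration contains, the manifest below is built as a Python dictionary and printed as JSON, which kubectl accepts alongside YAML. The field names follow the PersistentVolume API, but the volume name, capacity, storage class and host path are all illustrative assumptions.

```python
import json

# All names and values below are illustrative assumptions.
persistent_volume = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "example-pv"},
    "spec": {
        "capacity": {"storage": "5Gi"},
        "accessModes": ["ReadWriteOnce"],
        "persistentVolumeReclaimPolicy": "Retain",
        "storageClassName": "manual",
        # hostPath uses a directory on the node; suitable for single-node testing only.
        "hostPath": {"path": "/mnt/data"},
    },
}

print(json.dumps(persistent_volume, indent=2))
```

In a real cluster the `hostPath` entry would typically be replaced by the section for your actual storage backend, such as NFS or a cloud disk.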

I hope you liked the post. Happy Learning 😊

What is Infrastructure as Code (IaC)

February 9, 2022 by Ravikiran Kada

Introduction

As the name says, Infrastructure as Code means creating your infrastructure with the help of code. Now you might think: if I can create resources and design my infrastructure by signing in to my cloud provider’s account (AWS, Google, etc.), which is super easy and interactive, why do I have to write code for this as well? As we go along in this post, we will uncover the real benefits of creating your infra via code vs the web UI 😊

Let’s take AWS as the example in this post. Amazon created AWS to support its growing business in various areas, and as days went by AWS became one of the go-to places for many major organisations looking for cloud solutions. At the time of writing this post, AWS stands as the most profitable business for Amazon. But here comes a twist: at least in my experience, after logging in to my AWS account I get somewhat lost as to where to start. AWS provides so many options and such a wide variety of services that you tend to keep searching for things. After much struggle, let’s say you manage to set up your development environment by spinning up a bunch of EC2 instances and setting up queues, a database, S3 buckets, etc. As days pass by, you now have to set up your new UAT environment 🥺. I have also noticed that AWS sometimes changes its UI and moves some options here and there.

Nowadays, instead of creating the infra through the web UI, people have started creating the infrastructure via code. Instead of searching through so many options to create an S3 bucket or an EC2 instance through the UI, writing code is much simpler.

Benefits of Infrastructure as code (IaC)

Configuration Consistency and Reuse

We can create an environment using the same set of configuration used for the initial environment setup. The chance of human error during environment setup is lower than when creating it via the web UI, where you tend to forget some setup steps. Imagine there was no code, and the engineer who set up the complex environment left the organization some day: how difficult it would be for the new engineer to figure out how it was set up.

Infrastructure Automation

We are automating the whole process of infrastructure setup rather than using manual steps to configure it via the web UI. This reduces the risk of human error.

GitOps

Not only are we creating infra via code, but this code is also maintained in a version control system, where it can be leveraged or improved by multiple people. Once we push new changes, a build pipeline runs and provisions the infra for you, so provisioning becomes part of the build process. The same process of code reviews and pull requests that we follow for our application code base can be followed here as well. If there are any issues with the infra, we can move back and forth between the maintained versions.

Terraform

Terraform is an Infrastructure as Code tool that you can use to work with any of the currently available cloud provider platforms. Terraform is open source and uses the HashiCorp Configuration Language (HCL) to define resources. Whatever code you write, Terraform makes sure to call the corresponding cloud provider’s APIs and provision those resources. Sometimes the properties differ, because each cloud provider expects its own unique set of values to provision a resource. We can write code for multiple cloud providers in the same file, and Terraform is responsible for creating the resources with the right cloud providers.
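
The core idea behind this kind of tool is to compare the resources you declared with the resources that already exist and work out what has to change. The Python sketch below illustrates that comparison conceptually; it is not Terraform’s API or HCL, and the resource names and settings are made up for illustration.

```python
def plan(desired: dict, current: dict) -> list:
    """Compare declared resources with existing ones and list the changes."""
    actions = []
    for name, config in desired.items():
        if name not in current:
            actions.append(("create", name))
        elif current[name] != config:
            actions.append(("update", name))
    for name in current:
        if name not in desired:
            actions.append(("delete", name))
    return actions


desired = {"bucket-logs": {"versioning": True}, "vm-web": {"size": "small"}}
current = {"vm-web": {"size": "large"}, "queue-old": {}}
print(plan(desired, current))
# [('create', 'bucket-logs'), ('update', 'vm-web'), ('delete', 'queue-old')]
```

Because the declared configuration lives in version control, the same comparison can be rerun for a new environment such as UAT without repeating any manual steps.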

I hope you liked this post. Happy Learning 😊