Design API Rate Limiter

August 5, 2022 by Ravikiran Kada

What is a Rate Limiter?

A rate limiter restricts the number of client requests that a particular API endpoint accepts within a specific time frame. If the number of client requests exceeds the set threshold, the excess requests are rejected or dropped. We can also define a rate limiter as the mechanism that controls the rate at which consumers can access the APIs. Below are some example use cases for a rate limiter.

Twitter allowing only 100 tweets per hour. Tweets beyond that count are blocked or discarded with a user-friendly error such as “Only 100 tweets allowed per hour”.
LinkedIn allowing only 20 profile views from a specific IP address without login. On the 21st attempt, users are required to log in.
An online PDF editing service allowing only 10 PDFs per day from an IP address on its free tier.

Why is Rate Limiting used?

Preventing DoS Attack

A denial-of-service (DoS) attack is one of the reasons rate limiting is required. In a DoS attack, a particular server or resource is bombarded with a huge number of requests. This can crash the server or the system, making the application inaccessible to legitimate users. The requests seem to come from legitimate users, but in reality they can come from bots triggering them intentionally.

Preventing excess server load

Some organizations have limited resources. With low-capacity servers, the load on each server has to be kept within limits. A rate limiter filters out the excess requests.

Cost Reduction

When you use third-party APIs that charge for every call you make, you want to limit the number of requests made to those paid APIs.

Functional Requirements

Let's say we want to limit requests to 20 per minute.
The user should get an error message once the threshold of 20 requests is exceeded.

Non-Functional Requirements

The system should be highly available and should rate limit users accurately.
Introducing the rate limiter should not slow the system down or otherwise hurt performance.
The rate limiter should work correctly across multiple servers.
The entire system should not go down when there are issues in the API rate limiter.

High Level System Design for Rate Limiter

There are 3 places where we can keep the API rate limiter:

Server side rate limiter

Rate Limiter middleware

Rate limiter within gateway

Algorithms for Rate Limiter

In this section we will discuss some of the most popular algorithms used to implement a rate limiter.

What is Token Bucket algorithm

Token bucket is one of the most widely used rate limiting algorithms. Large companies like Amazon and Stripe use this algorithm. Let's understand how the token bucket works.

Every incoming request consumes a token from the bucket. The bucket has a capacity of ‘n’ tokens, e.g. 10. Once the incoming requests have consumed all the tokens and the bucket is empty, it is refilled with another 10 tokens when the next time interval starts. So for every time interval, say 60 seconds, only 10 requests can be accepted by consuming 10 tokens from the bucket. Any further requests in that interval will be dropped since the quota of 10 requests has been served.
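The idea can be sketched in a few lines of Python. This is only an illustration, not a production implementation: the class name `TokenBucket`, the `allow` method and the injectable `clock` parameter are assumptions made for this example. It also uses a common variant in which tokens trickle back continuously at a fixed rate instead of being refilled all at once at the start of the next interval.

```python
import time


class TokenBucket:
    """Token bucket that refills continuously up to a fixed capacity (hypothetical sketch)."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock              # injectable so the sketch is easy to test
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = clock()

    def allow(self):
        """Consume one token and return True, or return False if no token is left."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Add the tokens earned since the last call, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(capacity=10, refill_per_second=10 / 60)`, a burst of up to 10 requests is accepted immediately and the long-run rate stays at roughly 10 requests per minute.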

What is Leaky Bucket algorithm

The leaky bucket algorithm is similar to the token bucket algorithm. The difference is that here we have a queue instead of a bucket of tokens, and the requests are processed at a fixed rate.

There are two parameters in the leaky bucket algorithm:
Bucket size – Bucket size is the size of the queue which holds the requests.
Outflow rate – It defines how many requests can be processed per unit of time (e.g. per second).

When a new request arrives, it is added to the queue to wait for processing, as long as the queue has room.
In order to process a request, it is pulled from the queue.
If the queue is full, new requests are dropped.
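The queue behaviour described above can be sketched as follows. The names `LeakyBucket`, `add` and `tick` are made up for this illustration, and the sketch assumes that something else (a timer or scheduler) calls `tick()` once per time unit.

```python
from collections import deque


class LeakyBucket:
    """Bounded FIFO queue drained at a fixed rate (hypothetical sketch)."""

    def __init__(self, bucket_size, outflow_rate):
        self.bucket_size = bucket_size      # maximum number of queued requests
        self.outflow_rate = outflow_rate    # requests processed per tick
        self.queue = deque()

    def add(self, request):
        """Queue the request, or return False (drop it) if the queue is full."""
        if len(self.queue) >= self.bucket_size:
            return False
        self.queue.append(request)
        return True

    def tick(self):
        """Called once per time unit: pull up to outflow_rate requests in arrival order."""
        count = min(self.outflow_rate, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]
```

Because requests leave the queue at a constant pace, a burst of traffic is smoothed out rather than passed straight on to the backend.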

What is Fixed window counter algorithm

In the fixed window counter algorithm, we allow only ‘n’ requests in a fixed duration of time. ‘n’ is the predefined threshold and the fixed duration can be a second or a minute, e.g. ‘n’ requests per minute. To keep track of the number of requests processed, we have a counter which is incremented as requests arrive, and only requests within the threshold are allowed in that window. Once the counter reaches the threshold, the excess requests are dropped until the next window starts.
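A minimal sketch of this counter, assuming a single process and a single client, could look like the following. The class name `FixedWindowCounter` and its `clock` parameter are assumptions for illustration only.

```python
import time


class FixedWindowCounter:
    """Allow at most `limit` requests per fixed window (hypothetical sketch)."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock              # injectable so the sketch is easy to test
        self.current_window = None
        self.count = 0

    def allow(self):
        # Windows are numbered 0, 1, 2, ... from the clock's epoch.
        window = int(self.clock() // self.window_seconds)
        if window != self.current_window:
            # A new window has started, so the counter is reset.
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

`FixedWindowCounter(limit=20, window_seconds=60)` matches the "20 requests per minute" requirement stated earlier. One known weakness of this approach is that a client can send a full quota at the end of one window and another at the start of the next.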

Design Rate Limiter : Interview question

Candidates who are new to system design or have not worked extensively on large applications are often unaware of what a rate limiter is and how to design one. This post should be enough to get a good grasp of rate limiter system design. We have discussed three rate limiting algorithms in this post; in an interview you can explain all three, or any two would be fine.

What is Infrastructure as Code (IaC)

February 9, 2022 by Ravikiran Kada

Introduction

As the name says, Infrastructure as Code means creating your infrastructure with the help of code. Now you might think: if I can create the resources and design my infrastructure by signing in to my cloud provider's account, such as AWS or Google, which is super easy and interactive, then why do I have to write code for this as well? As we go along the post we will uncover the real benefits of creating your infra via code vs the web UI 😊

Let's take AWS as an example in this post. Amazon created AWS to support its growing business in various areas, and as days went by, AWS became one of the go-to places for many major organisations for cloud solutions. At the time of writing this post, AWS stands as the most profitable business for Amazon. But here comes a twist: at least what I feel after logging into my AWS account is that I get lost as to where to start. AWS provides so many options and such a wide variety of services that you tend to keep searching for things. After so much struggle, let's say you manage to set up your development environment by spinning up a bunch of EC2 instances, setting up queues, databases, S3 buckets etc. As days pass by, you now have to set up your new UAT environment 🥺. As of writing this post, I have also noticed that AWS sometimes changes its UI and moves some options here and there.

Nowadays, instead of creating the infra through the web UI, people have started creating the infrastructure via code. Instead of searching through so many options to create an S3 bucket or an EC2 instance through the UI, writing code is much simpler.

Benefits of Infrastructure as code (IaC)

Configuration Consistency and Reuse

We can create a new environment using the same configuration that was used for the initial environment setup. The chance of human error is lower than when creating the environment via the web UI, where you tend to forget some setup steps. Imagine there was no code: an engineer sets up a complex environment and leaves the organization some day. Imagine how difficult it would be for the new engineer to figure out how it was set up.

Infrastructure Automation

We are automating the whole process of provisioning infrastructure rather than using manual steps in the web UI. This reduces the risk of human error.

GitOps

Not only are we creating infra via code, but this code is maintained in a version control system, where it can be used and improved by multiple people. Once we push new changes, a build pipeline runs and provisions the infra for you, so this becomes part of the build process. The same process of code reviews and pull requests that we follow for our application code base can be followed here as well. If there are any issues with the infra, we can move back and forth between the maintained versions.

Terraform

Terraform is an Infrastructure as Code tool that you can use with any of the currently available cloud provider platforms. Terraform is open source and uses the HashiCorp Configuration Language (HCL) to define resources. Terraform takes the code you write, calls the corresponding cloud provider's APIs and provisions those resources. The properties can differ between providers because each cloud provider expects its own set of values to provision a resource. We can write code for multiple cloud providers in the same file, and Terraform is responsible for creating the resources with the right cloud provider.

I hope you liked this post. Happy Learning 😊

What is Chaos Engineering?

December 23, 2021 by Ravikiran Kada

Use Case

Just consider this for real 😎: you have a website or a service worth millions or billions of dollars and all of a sudden it goes down for a significant amount of time. You have no idea what impact or loss this will cause. Until that day you were confident that you had used the best possible technologies out there and that experienced developers/architects had worked on it. But now you are on the radar of the management and the customers 🤐.

I have personally faced this situation while I was working for a renowned bank. The fact of the matter is that we are all afraid of this situation and always want to avoid any service outage or stoppage in production. So we should all be ready for such situations by simulating such scenarios every now and then.

Introduction to Chaos Engineering

Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. It directly addresses system availability and affects your culture. We should always strive to ensure nothing breaks in production, but even if something breaks, we should know how our application will behave and handle it. Along with our applications, the platforms and infrastructure should also help in tackling this situation. Chaos engineering is about deliberately causing, in production, failures that no one expects to happen, so that we can identify how the application or system will behave when such a situation arises. It enables us to find the loopholes in the system and fix them accordingly.

It is similar to the mock fire drills we perform in our offices. Hope this was close to what I am trying to explain. 🙆‍♂️ Let me know in the comments.

Principles

Build a Hypothesis around Steady State Behavior

Consider a banking application that we assume is in a steady state, i.e. it is working absolutely fine. If some day there is a sudden outage while a customer is transferring money from one account to another, how will the system behave? Will the money get deducted from the customer's account and not be deposited into the destination account? Where will the funds go?

This is one of the use cases that can happen. In the same way, we need to list all possible negative scenarios, and our goal should be to work on handling or fixing those missing pieces.

Vary Real-world Events

Chaos variables reflect real-world events. Here we need to think of real-world events like a sudden increase in traffic, server damage, hardware failure or any other unexpected event that will disrupt the steady state of the application.

Run Experiments in Production

Systems behave differently depending on environment and traffic. An application feature may work fine in the development environment, but may time out or respond slowly in production. You have to run your experiments in the real production environment. Unless you run them in production, you won't be able to know the impact.

Automate Experiments to Run Continuously

Chaos engineering builds automation into the system to drive both orchestration and analysis. The experiments have to be carried out every now and then to see if any new issue surfaces. Experiments can be run on a monthly or quarterly basis.

Minimize Blast Radius

The goal is to identify the failure while limiting its impact, without causing much damage to live customers. Experimenting in production has the potential to cause unnecessary customer pain. While there must be an allowance for some short-term negative impact, it is the responsibility and obligation of the Chaos Engineer to ensure the fallout from experiments is minimized and contained.

Tools Available

Netflix has a Simian Army. It is a group of tools used to test chaos situations in production every now and then. Below are some of the tools by Netflix:

Latency Monkey purposefully delays requests to see what happens to them.
Chaos Monkey randomly kills a microservice instance to see what happens to the flow or behavior.
Chaos Gorilla takes out an entire availability zone. In the case of AWS, an availability zone is, for example, us-west-2a.
Chaos Kong takes out an entire region, for example us-east-1 or us-west-2 on AWS, to check the behavior of the whole system.

Similar to Netflix, every large organization has its own strategy to handle these situations. Facebook has something called Facebook Storm.

I hope you liked the post 😉 Keep visiting and learn new things.

The 12-Factor App Principles

December 18, 2021 by Ravikiran Kada

Codebase

Every app should have its own repository, i.e. if you have 3 different apps then you will have 3 different repositories. There is always a one-to-one correlation between the codebase and the app. Multiple apps sharing the same code is a violation of twelve-factor. The solution here is to factor shared code into libraries which can be included through the dependency manager.

There is only one codebase per app, but there will be many deploys of the app. A deploy is a running instance of the app. Every developer has a copy of the app running in their local development environment, each of which also qualifies as a deploy.

The codebase is the same across all deploys, although different versions may be active in each deploy. For example, a developer has some commits not yet deployed to staging; staging has some commits not yet deployed to production. But they all share the same codebase, thus making them identifiable as different deploys of the same app.

Dependencies

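Dependency management should be done by the application itself: the app declares all of its dependencies explicitly instead of relying on packages that happen to be installed on the system.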

Config

Usually the code you deploy across environments like development, staging and production is the same, and the only thing that varies is the config. The config usually contains the following:

Resource handles to the database, Memcached, and other backing services.
Credentials to external services such as Amazon S3 or Twitter
Per-deploy values such as the canonical hostname for the deploy

Config should be kept out of the repository, i.e. the code and the config should be completely separate. If there is a change in the config, you don't have to change or rebuild the code. We can store config in environment variables, which are easy to change between deploys without changing any code.

Storing config as constants in the code is a violation of the 12-factor principles.
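As a rough illustration, config can be read from the environment at startup. The function name `load_config` and the variable names `DATABASE_URL`, `CACHE_URL` and `APP_HOSTNAME` are made up for this sketch; a real app would use whatever names its deploys define.

```python
import os


def load_config(env=os.environ):
    """Build the app's config from environment variables (names are illustrative)."""
    return {
        # Required value: a missing variable raises KeyError so the app fails fast.
        "database_url": env["DATABASE_URL"],
        # Optional values fall back to local development defaults.
        "cache_url": env.get("CACHE_URL", "memcached://localhost:11211"),
        "hostname": env.get("APP_HOSTNAME", "localhost"),
    }
```

The same code then runs unchanged in development, staging and production; only the environment variables set for each deploy differ.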

Backing services

A backing service is a service that your application consumes over the network as part of its normal operation. Examples are databases such as MySQL or PostgreSQL, messaging/queueing systems such as RabbitMQ or ActiveMQ, SMTP services for outbound email such as Postfix, and caching systems such as Memcached. Each backing service that your application consumes is also called a resource.

A deploy of the twelve-factor app should be able to swap out a local MySQL database with one managed by a third party (such as Amazon RDS) without any changes to the app’s code.

Resources can be attached to and detached from deploys at will. For example, if the app’s database is misbehaving due to a hardware issue, the app’s administrator might spin up a new database server restored from a recent backup. The current production database could be detached, and the new database attached – all without any code changes.

Build, Release and Run

Your application should follow a strict sequence, i.e. Build, Release and then Run. You should never release the application without building it.

Build – The build stage transforms your code into an executable unit called a build.
Release – In this stage the build produced by the build stage is combined with the deploy’s current config. The resulting release contains both the build and the config and is ready for immediate execution in the execution environment.
Run – This stage runs the app in the execution environment.

Usually every release has a unique release ID, such as the timestamp 2021-12-19-20:32:17. Any change made to the code thereafter should get a new release ID. The deployment or release management tools should be able to roll back to a previous release in case there are any issues with the current release.

Stateless processing

Your application should not maintain any state on the server. It should be stateless and share-nothing. Any data that needs to persist must be stored in a stateful backing service, typically a database.

The memory space or filesystem of the process can be used as a brief, single-transaction cache. For example, downloading a large file, operating on it, and storing the results of the operation in the database. The twelve-factor app never assumes that anything cached in memory or on disk will be available on a future request or job – with many processes of each type running, chances are high that a future request will be served by a different process. Even when running only one process, a restart (triggered by code deploy, config change, or the execution environment relocating the process to a different physical location) will usually wipe out all local (e.g., memory and filesystem) state.

Some web systems rely on sticky sessions, that is, caching user session data in the memory of the app’s process and expecting future requests from the same visitor to be routed to the same process. Sticky sessions are a violation of twelve-factor and should never be used or relied upon. Session state data is a good candidate for a datastore that offers time-expiration, such as Memcached or Redis.

Port Binding

The application should be self-contained and responsible for binding to a port and serving requests on it. In that way, one app can become the backing service for another app, by providing the URL to the backing app as a resource handle in the config for the consuming app.
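A small sketch of port binding using Python's standard library is shown below. The `PORT` variable name, the default of 8000 and the helper names `resolve_port` and `make_server` are assumptions for this example, not part of the twelve-factor text.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny handler so the sketch has something to serve."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def resolve_port(env=os.environ):
    """Read the port from the environment (variable name and default are assumptions)."""
    return int(env.get("PORT", "8000"))


def make_server(env=os.environ):
    """Create an HTTP server bound to the configured port on all interfaces."""
    return HTTPServer(("0.0.0.0", resolve_port(env)), HelloHandler)
```

Calling `make_server().serve_forever()` starts the app. Another app can then reach it through a URL such as `http://hostname:port/` stored in its own config.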

Concurrency

When we scale the application, we should scale out by running more processes, and each type of work should be handled by an appropriate type of process.

Disposability

The app should start up quickly and shut down quickly and gracefully, so that its processes can be started or stopped at a moment's notice.

DEV/PROD parity

Whenever you deploy an application to DEV, it should be the same application that you deploy to PROD. There should not be any gap between the environments.

So the main motto here is that we should keep development, staging and production as similar as possible.

Logs

The application logs allow you to view what’s going on in your running application.

Logs are the stream of aggregated, time-ordered events collected from the output streams of all running processes and backing services. Logs in their raw form are typically a text format with one event per line (though backtraces from exceptions may span multiple lines). Logs have no fixed beginning or end, but flow continuously as long as the app is operating.

The event stream for an app can be routed to a file, but in the cloud the logs are spread across multiple machines, so they should be streamed and watched in real time using a terminal, Splunk or any other tool.
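One way to treat logs as an event stream in Python is to write one line per event to standard output and let the execution environment capture and route it. The helper name `build_logger` and the log format below are illustrative assumptions.

```python
import logging
import sys


def build_logger(name, stream=sys.stdout):
    """Return a logger that writes one event per line to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False      # avoid duplicate lines via the root logger
    if not logger.handlers:       # configure only once per logger name
        handler = logging.StreamHandler(stream)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

For example, `build_logger("money-transfer").info("transfer accepted")` prints a single timestamped line. The app itself does not decide where the stream ends up, so the same code works whether the output is tailed in a terminal or shipped to a tool like Splunk.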

Admin Processes

When we perform a one-time activity, for example running a one-off script or a database migration, it should run as a separate process rather than being combined with other processes. Admin code or scripts must ship with the application code to avoid synchronization issues.

What is load balancing and algorithms used for load balancing

March 29, 2021 by Ravikiran Kada

Introduction

Load balancers in the context of microservices are used to distribute the incoming traffic to the instances that are running on various hosts, and these hosts sit behind the load balancer.

As an example, consider a Money transfer microservice with instances on 3 hosts. Whenever customers invoke the web URL to transfer their money, the request has to pass through the load balancer instead of going directly to a microservice.

Some of the salient features of load balancers are as below.

Load balancers distribute calls sent to them to one or more instances based on some algorithm.
Reliability and high availability are maintained by routing requests only to the servers which are available.
We can easily add or remove servers in the network based on demand. E.g. you must have seen some giant eCommerce websites launching their biggest sale, where most of the items are sold at big discounts. The website traffic will obviously increase, and a load balancer gives us the flexibility to increase the number of servers running the application. In this case the load balancer takes on the job of distributing the traffic evenly across the servers.

We will discuss 3 common types of load balancing algorithms.

Round Robin

In this algorithm the incoming requests are distributed in round robin fashion. Let’s say there are 3 instances of the application running. The first request will go to the 1st instance, the second will go to the 2nd instance and the third will go to the 3rd instance. If 5 more requests come in, they will go to the 1st, 2nd, 3rd, 1st and 2nd instances, and so on.
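A minimal sketch of round robin selection, with the class name `RoundRobinBalancer` and the host names chosen purely for illustration:

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a repeating, fixed order (hypothetical sketch)."""

    def __init__(self, servers):
        self._servers = cycle(servers)   # endless iterator over the server list

    def next_server(self):
        return next(self._servers)
```

With three servers `["host-1", "host-2", "host-3"]`, successive calls to `next_server()` return host-1, host-2, host-3, host-1, host-2 and so on.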

Least connection

In this algorithm the incoming request is routed to the server that has the least number of active requests. The load balancer therefore has the additional overhead of tracking the available servers and the number of requests each one is processing, since it makes its decision based on that data.
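The bookkeeping this requires can be sketched as below. The class name `LeastConnectionsBalancer` and its `acquire`/`release` methods are assumptions for this example; it assumes the caller reports when a request finishes.

```python
class LeastConnectionsBalancer:
    """Route each request to the server with the fewest active requests (sketch)."""

    def __init__(self, servers):
        # Active request count per server; ties go to the server listed first.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        """Pick the least-loaded server and count the new request against it."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a request finishes so the server's count drops again."""
        self.active[server] -= 1
```

Unlike round robin, this adapts when some requests take much longer than others, at the cost of keeping the counters up to date.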

IP hashing

The load balancer routes the request based on the IP address of the client from which the request originated. All requests from Client A will always go to the same server as long as that server is alive.
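One simple way to sketch this is to hash the client IP and take the result modulo the number of servers. The function name `pick_server` is made up for this example, and SHA-256 is just one reasonable choice of hash. Python's built-in `hash()` is avoided because for strings it varies between processes.

```python
import hashlib


def pick_server(client_ip, servers):
    """Map a client IP to one server; the same IP always maps to the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Note that with this plain modulo scheme, adding or removing a server changes the mapping for many clients. Handling a dead server gracefully would need extra logic that is not shown here.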