General

Shrink a VM boot disk size on GCP without losing any data

shrink a disk

If, for some reason, you increased a GCP VM boot disk size following the standard procedure and then decided you didn’t need all that space, you will find out that shrinking the disk is, well… not as easy as increasing it, but it is possible.

It is a tough and time-consuming process… 

TLDR

In short, we will create a temporary VM, attach the disk to it, shrink the filesystem and partition, create a new disk with the desired size, and clone the contents over.

Important assumptions:

  • We will resize a 500G disk to 200G.
  • We have already freed up space in the disk. 
  • For this particular tutorial, we are using Debian 9.

Step by Step Process

  • First, stop the original VM
  • Create a snapshot of the disk (for backup purposes)
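If you prefer the CLI over the console, a rough gcloud equivalent (disk, snapshot, and zone names are placeholders):
gcloud compute disks snapshot big-disk --snapshot-names=big-disk-backup --zone=us-central1-a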
  • Create a new VM. We will name it the temp VM and use it to mount the disk and resize it.
  • Detach the big disk from the original VM. Attach it to our new temp VM.
  • Mount it
mkdir -p /mnt/disks/bigdisk
mount -o discard,defaults /dev/sdb1 /mnt/disks/bigdisk
ls -la /mnt/disks/bigdisk
  • Check the available disk size
df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs  2.0G     0  2.0G   0% /dev
tmpfs          tmpfs     393M  456K  393M   1% /run
/dev/sda1      ext4      9.7G  2.0G  7.2G  22% /
tmpfs          tmpfs     2.0G     0  2.0G   0% /dev/shm
tmpfs          tmpfs     5.0M     0  5.0M   0% /run/lock
/dev/sda15     vfat      124M   12M  113M  10% /boot/efi
tmpfs          tmpfs     393M     0  393M   0% /run/user/1002
/dev/sdb1      ext4      493G  133G  340G  29% /mnt/disks/bigdisk

Note that the filesystem is only using 133G of space, so we can shrink the disk to 200G and still leave some room to grow.

NOTE: to avoid any issues and make things run faster, we will shrink the filesystem to 150G and the partition to 160G, then expand everything back to 200G on the new disk.

  • Run resize2fs to resize the filesystem
umount /dev/sdb1
e2fsck -f /dev/sdb1
resize2fs -p /dev/sdb1 150G
e2fsck -fy /dev/sdb1

The process will take some time, approximately 10 minutes in this case.

  • After resizing the filesystem, we prepare to resize the partition

You can use fdisk or cfdisk. Note that to resize the partitions, we point to the disk (sdb) instead of the partition (sdb1).

  • If fdisk is not installed, apt install fdisk
  • Check the partitions and note the start sector of the partition:
fdisk -l /dev/sdb
Disk /dev/sdb: 500 GiB, 536870912000 bytes, 1048576000 sectors
Disk model: PersistentDisk  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x96c8eef6

Device     Boot Start       End   Sectors  Size Id Type
/dev/sdb1  *     4096 419434495 419430400  500G 83 Linux

The important number is the start sector, 4096. We will use it later.

  • Resize the partition 

The keyboard inputs are the values typed after each prompt.

fdisk /dev/sdb

Command (m for help): p
Disk /dev/sdb: 500 GiB, 536870912000 bytes, 1048576000 sectors
Disk model: PersistentDisk  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x96c8eef6

Device     Boot Start       End   Sectors  Size Id Type
/dev/sdb1  *     2048 419432447 419430400  200G 83 Linux

Command (m for help): d
Selected partition 1
Partition 1 has been deleted.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): *Enter*
First sector (2048-1048575999, default 2048): 4096
Last sector, +/-sectors or +/-size{K,M,G,T,P} (4096-1048575999, default 1048575999): +160G

Created a new partition 1 of type 'Linux' and of size 200 GiB.
Partition #1 contains a ext4 signature.

Do you want to remove the signature? [Y]es/[N]o: N

Command (m for help): a
Selected partition 1
The bootable flag on partition 1 is enabled now.

Command (m for help): p
Disk /dev/sdb: 500 GiB, 536870912000 bytes, 1048576000 sectors
Disk model: PersistentDisk  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x96c8eef6

Device     Boot Start       End   Sectors  Size Id Type
/dev/sdb1  *     4096 419434495 419430400  160G 83 Linux

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
  • Now we attach a new empty 200G disk to the temp VM. (it will be /dev/sdc)
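If you prefer doing it with gcloud instead of the console, a rough sketch (disk name, VM name, and zone are placeholders):
gcloud compute disks create small-disk --size=200GB --zone=us-central1-a
gcloud compute instances attach-disk temp-vm --disk=small-disk --zone=us-central1-a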
  • Clone the disk to the new one:
dd if=/dev/sdb of=/dev/sdc bs=8M status=progress

This operation will take some time; it took another 10 minutes in this case.

After copying 200G (the size of the new disk), dd will fail, saying the destination disk is full (which is true!):

dd: error writing '/dev/sdc': No space left on device
25601+0 records in
25600+0 records out
214748364800 bytes (215 GB, 200 GiB) copied, 1028.88 s, 209 MB/s
  • Mount to verify it is working, and run e2fsck just in case
mkdir -p /mnt/disks/shrinkeddisk
mount -o discard,defaults /dev/sdc1 /mnt/disks/shrinkeddisk
ls -la /mnt/disks/shrinkeddisk
umount /dev/sdc1
e2fsck -f /dev/sdc1
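Remember that, per the note above, the filesystem and partition on the new disk are still at 150G and 160G. While the disk is still attached to the temp VM and unmounted, you can grow them to fill the whole 200G; a rough sketch, assuming growpart from the cloud-guest-utils package:
apt install cloud-guest-utils   # provides growpart, if it is not already installed
growpart /dev/sdc 1             # grow partition 1 to fill the 200G disk
e2fsck -f /dev/sdc1
resize2fs /dev/sdc1             # grow the filesystem to fill the partition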
  • Detach the new small disk from the temp VM and attach it to the original VM

And we are done!!!

How to access a Private EKS Cluster from your local machine without a VPN

There are many ways to access a private Kubernetes cluster. My preferred one is the GKE approach, where the API endpoint is public but you whitelist your machine's IP so it can connect to it.
You can do it with a simple gcloud command and that is it. You can even have your pipelines do it automatically, then remove the whitelisted IP when they are done.
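For reference, the whitelisting on GKE is roughly a one-liner like this (cluster name, zone, and IP are placeholders):
gcloud container clusters update my-cluster --zone us-central1-a --enable-master-authorized-networks --master-authorized-networks 203.0.113.10/32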

Azure and AWS do not work this way: when you create a private cluster inside a private network, it is really private… and it becomes painful. Hey, in Azure I think not even their own Azure Portal can see the workloads. WTF.

The way most preferred by infosec teams is to use a VPN. Which, yes, works, but it is yet another thing to take care of, and managed VPNs are not cheap.

The generally popular idea is to set up a Bastion Host (usually a VM) exposed to the internet (hey, this sounds great!) so you can SSH into the Bastion and run kubectl commands from there:

PLEASE, DON’T DO THIS. Why would you create a private cluster if you are going to put a freaking VM with port 22 open in front of it…
::insert-facepalm-emoji-here:: It is nonsense!!!

Yet many people do it.

There are options like gototeleport.io which let you securely SSH into the Bastion host with MFA, or even run kubectl directly against the cluster, skipping the Bastion altogether. But it is really expensive. (I’ll save you the Google search: $100/m per cluster)

Couldn’t be worse, eh?

It gets even worse. I’ve seen some architectures that look like this:

So now you have two SSH jumps. One into the “public” JumpBox, then from there, you SSH into a Bastion host located inside the private network, from which you can run kubectl commands.

This architecture is bad for many reasons. First, you are still exposing a VM with port 22 open. It also promotes using the JumpBox and the Bastion as “trash boxes” where your Kubernetes YAML files will lie forever.
Finally, I certainly believe this architecture should not exist. The proper way of having a truly private cluster is to make it private first, then only access it by using pipelines. If you really need to kubectl into it, then it is a DEVELOPMENT cluster, and whitelisting IPs or using a VPN are the best options.

How to jump over the crap

Now, let’s return to the issue: the architecture exists, it is a development cluster, it is real, and it is my problem now. I want to run kubectl from my machine, where I have my IDE and my stuff.

THE problem

SSH and ~/.ssh/config to the rescue!

With SSH you can create a proxy and forward all connections through a certain host, but in this case we have a double SSH jump to do, so to keep things easy and readable we can use the ~/.ssh/config file like this:

Host jumpbox
  # x.x.x.x is the public IP of the JumpBox
  HostName x.x.x.x
  User myuser
  IdentityFile ~/.ssh/my-key.key

Host bastion
  # a.b.c.d is the private IP of the Bastion; the user is ec2-user in my case
  HostName a.b.c.d
  User mybastionuser
  ProxyJump jumpbox

Now you can run ssh bastion and you should be able to log in to the Bastion server.

If you get a Permission denied (publickey) message, make sure your machine’s public key is in the authorized_keys of the Bastion, not only on the JumpBox.

How to make kubectl work over an SSH proxy?

In order to make kubectl work over a proxy, we need to do a few quick things:

  • Copy the ~/.kube/config data you need from the Bastion to your local machine.
  • If you are using AWS RBAC, make sure the aws cli is set up on your local machine, then follow this tutorial.
  • Create an SSH proxy with ssh -D 1080 -f bastion -N
    • -f will send ssh to the background
  • Force your current session to use that proxy for HTTPS with export HTTPS_PROXY=socks5://127.0.0.1:1080

Now, you can run kubectl get namespaces in your terminal and it will work!!!!
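Putting it all together, a minimal sketch of a full session (the bastion alias and port 1080 come from the config above; the aws eks update-kubeconfig line is optional and the cluster name is a placeholder):
aws eks update-kubeconfig --name my-private-cluster   # or copy ~/.kube/config from the Bastion
ssh -D 1080 -f -N bastion                             # open the SOCKS proxy in the background
export HTTPS_PROXY=socks5://127.0.0.1:1080            # route kubectl (HTTPS) traffic through it
kubectl get namespaces
unset HTTPS_PROXY                                     # clean up when you are done
pkill -f "ssh -D 1080"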

I know, it is not the easiest solution, but it is quite simple and transparent. I hope this will help someone!

Multi-tenant cloud architectures. How to make a decision?

In this post, I want to show you two different cloud architectures for multi-tenancy. It doesn’t matter if you are planning to have a SaaS product with millions of tenants or just a handful of them, it is important to think about the pros and cons of the different models.

We will discuss the simplified cloud infrastructure architecture, not the code. Should you have separate databases? Replicate workloads? Use a different load balancer?
Let’s answer all those questions.

Multi-User vs Multi-Tenant. What do we mean by multi-tenancy?

A quick Google search shows a lot of different answers to that question, so let’s clarify what we will be covering in this post.
When we say multi-tenant, we refer to a software architecture designed in such a way that it can serve multiple users grouped into tenants, not just a multi-user SaaS app.
For example, software whose customers are companies, and whose users are those companies’ employees.

Multi-User vs Multi-Tenant

In this post, we will be focusing on the multi-tenant architectures, not the multi-user ones. Therefore, the analysis is going to focus on the following items:

  • Ingress
  • Backend
  • Data
  • Cloud Infra and Security
  • Product compatibility. When to choose each one?

Architecture #1: tenant aware app

Tenant aware app (single database)

When we say tenant aware, we mean that the application contains the logic for splitting the traffic/data between tenants. For example, based on subdomains or on a login form with a “tenant Id” field (any parameter that identifies the tenant), the application contains code that knows which tenant each request comes from.

Ingress, backend and data layers

For this approach, the ingress layer and the backend layer are the same for all tenants. The important separation has to be done in the code (as we discussed above) and for the data storage we have mainly 2 options:

  • Single database – shared schema or multiple schemas
  • Multiple databases
Tenant aware app (multiple databases)

Cloud Infra and Security

Multi-tenancy is achieved in a hybrid manner: part of the work is done by the infrastructure and the rest by the application itself. The cloud part is easily achievable here; you just need a scalable ingress, backend, and database.

It is a good option when your application is going to be built mostly on serverless services, like cloud functions/lambdas or App Engine/Beanstalk. In this case, the ingress and backend infrastructure is managed by your cloud provider. For the data layer, whichever option you choose (single DB vs multiple DBs), you will have to come up with some automation to create new tenants.

Product compatibility. When to choose it?

The main infrastructure is unique and shared between tenants, therefore it is cheap. It allows you to scale down to the lowest amount of resources possible, thus saving money.

It’s a great choice for products that will have thousands, millions of tenants. Generally, those products are not expensive for the tenants. Let me give you an example… imagine a cloud email app. Thousands of companies will pay a little money for it.

When using the “tenant-aware” architecture, making sure your code is solid and your tenants are properly isolated becomes a priority. By no means do you want problems with mixed data or cross-tenant issues.

Architecture #2: replicated workloads

Replicated Workloads (multiple ingresses)

For this approach, the multi-tenancy is resolved completely at the infrastructure level, therefore, the code for each tenant is not aware of any multi-tenant capabilities.

Ingress, backend and data layers

The ingress layer can consist of multiple ingresses (image above) or a single one (image below) that reroutes the traffic based on subdomains, URL paths, or any other tenant identifier that a “smart” load balancer can detect; for example, the AWS Application Load Balancer can route traffic based on a header value.

Replicated Workloads (single ingress)
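As an illustration of that ALB capability, a header-based rule created with the AWS CLI could look roughly like this (the ARNs, header name, and priority are placeholders):
aws elbv2 create-rule --listener-arn <listener-arn> --priority 10 \
  --conditions '[{"Field":"http-header","HttpHeaderConfig":{"HttpHeaderName":"X-Tenant-Id","Values":["tenant-a"]}}]' \
  --actions Type=forward,TargetGroupArn=<tenant-a-target-group-arn>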

The decision between a single ingress and multiple ingresses comes with some tradeoffs too.
If you have a single ingress, depending on the type of ingress you design, you may face a scalability issue. For example, if you are using the Google Cloud HTTPS Load Balancer as a single entry point and you have one subdomain per tenant, you will need to double-check your SSL certificates, because there is a hard limit on the number of certificates the load balancer can hold.
If you go with the multiple-ingresses approach, make sure you have everything automated, because managing a lot of ingresses becomes an issue as the number of tenants grows.

Cloud Infra and Security

There is a backend and a data layer completely dedicated to each tenant, so we have fully isolated, and therefore more secure, environments.

The negative part is that we use more resources. Instead of grouping compute resources together, you will have idle resources lying around everywhere. Fortunately, Kubernetes comes to help us here! This is a good approach when you plan to use microservices and/or dockerized workloads. Replicating namespaces and workloads is not “that expensive” with Kubernetes. Of course, you need to think about the data and ingress layers too.
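As a rough illustration of how cheap that replication is in Kubernetes (the namespace and manifest path are placeholders):
kubectl create namespace tenant-a
kubectl apply -n tenant-a -f ./tenant-workloads/   # same manifests, applied once per tenant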

Product compatibility. When to choose it?

The tradeoff is clear: with this approach your infrastructure costs more, but you probably do not need to spend a lot of time making your code “tenant-aware”.

This is a good option when your product is expensive for the customers so you can afford to have a more expensive infrastructure.
Again, if security and isolation are really important, this approach is a must for complying with the standards.

Conclusion

Each architecture has benefits and tradeoffs, but the important takeaway is that this is not only a technical decision; it has to be aligned with the expectations of the product side of your SaaS application. How much will it cost to acquire new customers? How much will your customers be willing to pay for your product? Is isolation key?

I hope this post helps you to trigger the right discussions. 🙂

Microservices architecture – orchestrator, choreography, hybrid… which approach to use?

So you want to build microservices. You have identified properly isolated domains and boundaries for each one of them, and now it’s time to figure out how to make them interact with each other… This is when you ask yourself: should we use an orchestrator approach? A choreography approach? Mmmm, maybe a hybrid solution?

In this post, I’m going to talk about the different microservices architectures, and I will try to answer those questions, so when you have to make the call you have at least an idea of what are the tradeoffs of each one.

The orchestrator architecture

Orchestrator architecture

In the orchestrator architecture, our users will probably hit an API gateway, which will then trigger an event on the orchestrator. The orchestrator service is in charge of executing the business logic by making requests to the other microservices while keeping track of the event status. It is a centralized controller, and the requests are often synchronous.

Let’s dive into the pros and cons of the orchestrator approach:

Pros

  • Business logic is “hardcoded” and tangible
  • Easy request flow and status tracking

Cons

  • Tightly coupled services. Dependencies
  • Single point of failure

The choreography architecture

Choreography architecture

The choreography architecture, often called message-broker or reactive architecture, is reactive and loosely coupled. Here, a message/event broker receives the requests and the microservices subscribe to them. Each one of them triggers when a certain message appears in the broker. If needed, after a service is triggered by a message it can also talk to other services directly, to chain different flows. Messaging is asynchronous here, following a publisher/subscriber pattern.

It has some advantages over the orchestrator architecture, but it comes with other problems, let’s dive into them:

Pros

  • Loosely coupled services
  • Isolated, independent packages of services
  • No single point of failure

Cons

  • Monitoring and observability are quite complex
  • Difficult request flow control (timeouts, retries, errors)
  • End-to-end knowledge of the system is scattered: where is the business logic?

Benefits and tradeoffs of both approaches

A typical tradeoff of the orchestrator approach is that you end up with tightly coupled services; on this topic, the choreography approach is the clear winner.

The single-point-of-failure vs no-point-of-failure would score another point for the choreography architecture, yet it is something that can be solved with the underlying cloud architecture and some good programming skills. So, no clear winner here.

Monitoring, observability, and flow control are big benefits of the orchestrator approach, so it is the clear winner there. Something as simple as setting up a timeout for a request is a pain to do in the choreography architecture, and don’t try to find out where your system is failing in a choreography architecture: it is a nightmare if you haven’t spent a lot of time properly monitoring the system.

However, there are other benefits and tradeoffs that are almost never mentioned…
In an orchestrator architecture, the orchestrator itself contains the business logic. It is there. You can read the actual code. In the choreography architecture, on the other hand, you have to read a document or look at a diagram, because only a few people really know the whole end-to-end architecture. This becomes an issue when, for example, someone leaves the company; understanding complex end-to-end choreographies can be a real problem.

The choreography approach is often a popular choice among “cloud-native architects”. This is because most cloud providers have really good event/messaging systems, and there are a lot of tools with buzzword names that support it. However, for developers (yes, the ones who will actually write the code), reasoning about a system asynchronously is quite difficult and requires an important mind shift.

The hybrid approach

With all these benefits and tradeoffs of both architectures, there is no clear standard path to choose. Both have their own pros and cons, and the tradeoffs of each approach are different. But if you look closer, they kind of complement each other. If we could have a mix of the benefits from the orchestrator approach and the choreography approach, it would be ideal!

Enter the hybrid approach. There are many ways you can mix both architectures, but let’s focus on the following example:

Hybrid architecture

In the hybrid architecture, we combine the two things: an orchestrator and an event tracker. Requests are received by both, and the orchestrator executes the business logic while keeping the request status in sync with the event tracker. This allows microservices B, C, and D to do decoupled work based on the event subscriptions/status, while the orchestrator still orchestrates the whole flow and directly controls microservices E and F.

With this mix, we can have tightly coupled services when sequentiality is needed, and loosely coupled services when we need to do parallel things “in the background”

When to use each one?

The orchestrator approach is good when you have a business-critical sequential flow. There, the orchestrator can handle the “what if” cases. For example, if one microservice needs to return successfully before calling a second microservice.

The choreography approach is good when you need to do a lot of things in parallel, it allows you to have faster processing times and it scales easily. For example, job processing in the background

The hybrid approach is more flexible, here you can have parallel asynchronous processing while still keeping track of the business logic when it is needed. For example, job processing in the background, but only after a logic sequence of services has been addressed.

Conclusion

Whatever approach you decide to follow, take it as a guide, but do not be limited by it.
There are use cases that fit each approach perfectly; none is strictly better than the others. Personally, I prefer the hybrid approach because of its flexibility.

Handling batch operations with REST APIs

So, you created your REST API following the best practices, named your endpoints accordingly, used the correct HTTP verbs and everything is working well.

For example, you create users by making a POST /users call, get a list of them using GET /users, or get a single one by doing GET /users/:userId

Awesome!

Developers are happy, customers using the API are happy, what a beautiful world!

Until someone comes and says: “Hey, I need to import 10000 users”. Ouch.

Your first thought might be, “well… just do a for loop and make a POST for each one of them, I don’t care if it takes half an hour”

That may be a solution in most cases because, as I said, you built your API following the best practices and your underlying cloud infrastructure is horizontally scalable: you can create as many instances as your credit card allows, with almost unlimited computing resources like CPU or RAM.
But there is something that doesn’t scale that well: networking.

Why is networking an issue?

Networking, or rather the number of calls you need to make, is a bottleneck: each network call needs to negotiate a complicated protocol like TCP and find its way through an unreliable global network of routers and switches.

Some clients may face the issue of having a hard limit on outbound connections. (As I did, for an API I built myself, which triggered this post.)
Usually, an outbound connection from within a system uses SNAT, which means one TCP port is used per request. Poorly built or complex systems may have a really low number of available (allocatable) TCP ports.

How can we fix it?

The solution for this networking issue is to have some way of sending multiple items in a single call, so we can make fewer requests with more data, as opposed to making a single request per user.

But this means we need to make changes to our beautifully designed API. Not only that, it also means you need to start asking yourself a lot of new questions. For example: what happens if I (as the API) receive an array of users and processing one of them fails due to a missing required property? What should the response code be? Do I return 200? 400? An array of response codes?

That is totally the opposite of best-practices. So we need to find a better way.

What options do we have?

We have several options to fix this issue, let’s enumerate them:

  • Change your contract to accept arrays in the body
  • Change your server-side code to accept multiple body formats
  • Rename your endpoints
  • Create a new endpoint for arrays
  • Create an endpoint for receiving batches (for each entity)
  • Create a new batch endpoint

Now, let’s take a deeper dive into each one of the options, and let’s talk about why you should use it or not:

Change your contract to accept arrays in the body

This might be the first thing that comes to your mind: hey, let’s just accept an array instead of an object as we do today.

So instead of:

POST /users
{
  "username": "Diego",
  "password": "123456"
}

You would do:

POST /users
[
  {
    "username": "Diego",
    "password": "123456"
  }
]

That sounds nice, but it is an anti-pattern and, worse, if your API is already published with customers using it, it is a breaking change. It is not backward-compatible; it is a contract change.

You could still use this approach if you change the version, like POST /v2/users

Change your server-side code to accept multiple body formats

If you created your API using… let’s say node, express, and swagger/openAPI, you might be tempted to fork the swagger library and modify its code.

This is the worst option of all (I believe). It means changing the libraries you used to create the API and making them “smarter” so they don’t explode and can route the traffic differently when an array is received instead of an object.

This is, again, an anti-pattern. DON’T DO THIS PLEASE.

Rename your endpoints

What if, instead of users, we rename the endpoint to user, and then make users accept an array and user accept a single object…

This is another anti-pattern: entity names should be plural (not to mention that this is also a breaking change).

Create a new endpoint for arrays

Ok, now things are taking more shape… what if we do POST /usersArray

It is a new endpoint; at first glance it is not an anti-pattern, but it looks fishy. I wouldn’t recommend this approach since it can become inconsistent quickly. Although it may be a good “quick fix”, it is not a long-term solution.

Create an endpoint for receiving batches (for each entity)

We can do POST /users/batch

Hey, this kind of looks nice. You are exposing a new endpoint for the POST method, and you will need to add logic in your controller to run a batch job with the received array, but it is not an anti-pattern and it is one of the recommended ways to go.

Create a new batch endpoint

This takes the previous approach to the next level, providing a good long-term re-usable solution.

Instead of POST /users/batch you can do POST /batch/users

This slight order change makes a huge difference in the backend. If your API is using microservices, it means a completely new batch microservice. If not, it means a completely new controller, like the users one, but called batch.

The purpose of this controller is to run batch jobs against the API endpoints.

So we can do:

POST /batch/users
[
  {
    "method": "POST",
    "body": {
              "username": "Diego",
              "password": "123456"
             }
  },
  {
    "method": "POST",
    "body": {
              "username": "Diego2"
             }
  }
]

And the response code of the batch endpoint would almost always be 200, but the response body contains an array with each individual response code:

[
  {
    "responseCode": "200",
    "responseBody": {
              "userId": "12839"
              "username": "Diego"
             }
  },
  {
    "responseCode": "400",
    "responseBody": {
              "error": "Missing password"
             }
  }
]

You can take this even further by making the jobs asynchronous: assign an ID to each batch job, then query that ID to get the results.
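As a usage illustration, a client could call such a batch endpoint with curl like this; the host is made up, and the /batch/jobs resource for the async variant is hypothetical:
curl -X POST https://api.example.com/batch/users -H "Content-Type: application/json" \
  -d '[{"method":"POST","body":{"username":"Diego","password":"123456"}}]'
# async variant: the POST returns a job id that you poll later
curl https://api.example.com/batch/jobs/<jobId>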

Conclusion

Of all the options, only three are not anti-patterns: you can change your contract and accept arrays in a newer version of your API (if you are willing to accept the risk of an inconsistent experience), or you can use either of the last two options to build a batch endpoint.

I tend to choose the last one (build a batch microservice/controller) as the “best” option, but it really depends on your API, the business context, and some other factors.

Service Mesh comparison – Istio vs LinkerD

What is a service mesh?

A service mesh is in charge of managing the network traffic between your services. It does so in a more automated and scalable way than what would otherwise require a lot of manual work (and we know that manual work is not good 🙂).

In other words, the service mesh sits on top of Kubernetes, takes over your services’ networking, and makes the communications safe and reliable. It allows you to connect, secure, and monitor your services.

You can focus on writing your microservices and delegate the observability, networking, and security policies to the service mesh.

Key capabilities of a service mesh

We can split the key capabilities into three groups:

  • Traffic Management
    • mTLS
    • Fault Injection
    • Circuit Breaking
    • Deployment Strategies
  • Metrics/Observability
    • Out-of-the-box monitoring and tracing tools
  • Security/Policies
    • Enforce policies and isolation

A service mesh enables “intelligent route control” capabilities, along with end-to-end encryption in the communications between the services. In addition, it also enables fault injection and advanced deployment strategies like blue/green, canaries, rolling upgrades, and more.

A service mesh becomes the “dashboard of your microservices“, a place where you can view and control what’s going on inside your cluster.

Service mesh comparison

When it comes to service mesh for Kubernetes, there are two big players, Istio and LinkerD. Both have amazing features and work in a very similar way so it is often a complicated choice.

Istio

  • Istio is a Kubernetes native service mesh, but it supports other orchestration tools like Consul and even VMs.
  • It was created by Google, IBM, and Lyft. Today Anthos comes with “Istio” service mesh capabilities.
  • It relies on a control plane and sidecar proxies to work
  • We can say it is the most popular Kubernetes Service Mesh
  • It is very customizable
  • It is complex
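To give an idea of the sidecar model in practice, a quick sketch assuming istioctl is installed and you are playing on a test cluster (the namespace is a placeholder):
istioctl install --set profile=demo
kubectl label namespace my-app istio-injection=enabled   # new pods in my-app get the Envoy sidecar injected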

LinkerD

  • Mirrors Istio’s architecture closely (sidecars plus a control plane)
  • It is part of the Cloud Native Computing Foundation (CNCF)
  • LinkerD focuses on simplicity instead of flexibility
  • It is a Kubernetes-only solution
  • We can say it is the second most used Kubernetes service mesh

Comparison Table

Here is a summary comparison of their features:

Mesh features

  • Encrypted traffic. Istio: yes, automatic mTLS. LinkerD: yes, but not for TCP (e.g. a Mongo connection).
  • Inter-microservice traffic management. Istio: yes, with policies. LinkerD: limited.
  • Microservice metrics for automatic scalability. Istio: yes, Prometheus. LinkerD: yes, Prometheus.
  • Real-time mesh status observability. Istio: yes, Kiali (more graphical). LinkerD: yes, its own simpler dashboard.
  • Microservice rate limiting. Istio: yes. LinkerD: no.
  • Microservice testing features (fault injection, delays, % balancing). Istio: yes. LinkerD: limited.
  • Microservice circuit breaking. Istio: yes. LinkerD: no.
  • Microservice release mechanisms (canary, green/blue, etc.). Istio: yes. LinkerD: yes.
  • Monitoring integration and metrics standards (OpenTelemetry). Istio: yes, Grafana and Jaeger (tracing). LinkerD: yes.
  • Access log generation. Istio: yes. LinkerD: not out of the box (possible with plugins).
  • Inter-cluster (multi-cluster) communication. Istio: yes. LinkerD: no.

General

  • Complexity. Istio: high. LinkerD: low.
  • Ease of use. Istio: complex. LinkerD: medium.
  • Companies behind it. Istio: Google, Lyft, IBM. LinkerD: Buoyant.
  • Market acceptance/maturity and community size. Istio: large. LinkerD: medium.
  • Support model. Istio: community, plus Google on Anthos. LinkerD: community, plus Buoyant.
  • License/cost. Istio: Apache 2.0. LinkerD: Apache 2.0.
  • Pros. Istio: many features, can be extended. LinkerD: easy to use.
  • Cons. Istio: complex. LinkerD: deeply integrated with Kubernetes, cannot be extended.

Comparison Conclusion

In conclusion, Istio is very flexible, but it is also very complex, so the learning curve is steep, yet it allows you to do more. LinkerD, on the other hand, focuses on simplicity: it is easier to use but less customizable.

Resource management strategy for docker containers on Kubernetes (nodejs + express)

It seems simple, but it is not.

How do you properly assign resource requests and limits to a dockerized HTTP microservice running on Kubernetes?

Good question, right?

Well, as you can imagine, there is no single answer to it. But there is a strategy you can follow. This is not a “book definition”; what I mean is that there might be other ways of sizing resource requirements, but so far I’ve been using this method successfully.

It is a delicate balance between the hardware size on which your containers will run (ie: k8s nodes) and how many requests the container itself can handle.

The idea is to find that delicate balance so our pods can be scheduled on the nodes, without wasting precious resources that can be used for other workloads.

So, how do we do that?

A simple method:

Let’s take a simple microservice as an example: a nodejs express REST API which talks to a MongoDB.
We will be doing: GET /example

Run the container on your local machine or a k8s cluster, measure the idle resource usage:

  • CPU: 1%
  • Memory: 80Mb


Using a load testing app like Apache JMeter, we fire up 100 concurrent requests in 1 second… measure the resource spike using docker stats or similar:

  • CPU: 140%
  • Memory: 145Mb

We exceeded one CPU core, and that is not good, so we reduce the number of parallel requests to 70

  • CPU: 93%
  • Memory: 135Mb

Now we have something to work with. We can assume that a single instance of our app can handle up to 70 requests per second.

Setting Kubernetes resources:

According to how resource limits and requests are set in Kubernetes, we can say that it is safe to consider that our resource requests can be:

  • CPU: 0.1
  • Memory: 100Mb

And we can limit the resources to:

  • CPU: 1
  • Memory: 150Mb

That will ensure that we can handle at least 70 requests per second per replica.
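In Kubernetes terms, a minimal sketch of applying those numbers to an existing deployment (the deployment name is a placeholder):
kubectl set resources deployment example-api --requests=cpu=100m,memory=100Mi --limits=cpu=1,memory=150Mi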

Disclaimer: as I said before, this is a simple approach; we are not considering several things, for example node resources or MongoDB capacity…

Setting up a Reverse Proxy with Apache

A proxy is a service that acts as an intermediary in a client-server communication.

While a regular proxy (forward proxy) keeps the client anonymous, since it is the proxy and not the client that contacts the server, a reverse proxy keeps the server anonymous to its clients…

These images make it very clear:

Forward Proxy:

Forward Proxy

Reverse Proxy:

Reverse Proxy

(Thanks, Wikipedia, for the images)

 

What can we use a Reverse Proxy for?

There are many uses, but the classic one is the following:

Suppose we install a service on our server that listens on port 8080, that is, we access it through http://www.example.com:8080.

But we want to reach the service through http://servicio.example.com because it is easier for the users.

Obviously, we cannot configure the service to use port 80, because we already have an Apache (or similar) running on that port…

So, we “mask” the real URL with the Reverse Proxy…

 

How to configure a Reverse Proxy in Apache:

First, we install a package that we will need:

apt-get install libapache2-mod-proxy-html

Then, we enable the Apache modules:

a2enmod proxy 

a2enmod proxy_html

service apache2 restart
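Note: depending on your Apache version, proxying to an http:// backend also needs mod_proxy_http, and the RequestHeader directive used further below needs mod_headers; if they are not enabled yet:

a2enmod proxy_http headers

service apache2 restart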

With that, we are ready to create the proxy.

It is very simple:

NameVirtualHost *:80
<VirtualHost *:80>
	ServerName service.example.com

	ProxyPreserveHost On
	ProxyRequests Off	

	ProxyPass / http://www.example.com:8080/
	ProxyPassReverse / http://www.example.com:8080/
</VirtualHost>

Those lines inside an Apache Virtual Host are enough for the proxy to work.

However, if the service behind the proxy uses a lot of Ajax/JavaScript/CSS, we will notice very poor performance. In other words, if we access the site through the original URL, www.example.com:8080, we get a normal response, but if we access it through the masked URL, service.example.com, we can see delays of 10 to 20 seconds!! It can even cause time-outs…

That is, a website behind the Reverse Proxy will feel very, very slow. Much slower than when accessed through the original URL.

This happens because, by default, mod_proxy only re-maps the URLs in the headers, not in the page content. In addition, mod_proxy_html does not parse CSS or JavaScript either, so the URLs inside those files and inside the page content are not re-mapped, causing a series of lookups that take a long time.

 

So, the final configuration, with the above issues solved, would be the following:

NameVirtualHost *:80
<VirtualHost *:80>
	ServerName sub.example.com

	ProxyPreserveHost On
	ProxyRequests Off	

	ProxyPass / http://127.0.0.1:8040/
	ProxyPassReverse / http://127.0.0.1:8040/
	ProxyHTMLURLMap http://127.0.0.1 /

	SetOutputFilter  proxy-html
	RequestHeader    unset  Accept-Encoding

</VirtualHost>

 

Finally, we do one last:

service apache2 restart

And we are ready…

 

With that, we should be able to access http://service.example.com exactly as if we were using the original URL!!!

 

Cheers!!

MySQL: Analyzing the performance of a SELECT

I needed to know exactly how long a SELECT of a single row takes when the column we search by is not an index, when it is a secondary index, and when it is a primary index.

To explain the situation better, consider the following example:

We have the following Users table:

Id Name UserName Telephone Notes

 

The configuration that first comes to mind is to make the “Id” column a primary index. But what happens if, for some reason, we always have to run a SELECT whose WHERE clause uses the UserName column? For example, if we have to log the user in and the data we have is the username, the query would be:

SELECT * FROM Users WHERE UserName='pepe';

On any system, even on a server with scarce resources, this query does not take long, and it hardly matters whether it takes a few milliseconds more or less… but in certain situations, with tables of thousands of rows, and depending on the environment, those milliseconds start to matter; especially if we do not have much processing power, timing becomes very important.

Modern databases, to summarize it in a brutally basic way, store the information in data structures (advanced trees) ordered by indexes, which makes us think that searching by the “Id” column is faster than searching by the “UserName” column, since “Id” is indexed.

So what can we do?

Let’s make UserName an index!!! Very nice, but when the database searches, comparing an INTEGER (“Id”) is much faster than comparing a VARCHAR (“UserName”) of about 20 characters, so every operation on the table becomes a bit slower…
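(For reference, making UserName a secondary index on the example table would be something along these lines; the database name is a placeholder:)

mysql -u root -p -e "ALTER TABLE Users ADD INDEX idx_username (UserName);" users_db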

So, what is more convenient? Using an index that makes the table slower in general but returns a single row much faster, or using a fast index but then having delays when looking up a single row by another column?

The answer to this last question depends 100% on the environment the system runs in…
On a system with few users it might not be a problem, but when we have a server with scarce resources, such as a Raspberry Pi or similar, and a difference of milliseconds matters for our application, it becomes an issue.

I came up with a few options to measure the performance and run some MySQL benchmarks on low resources…

  • Compare the results using:

    • a)  The Id column as primary index and the rest as plain columns (the usual setup)
    • b)  The Id column as primary index and the UserName column as secondary index
    • c)  The UserName column as primary index and the Id column as secondary index
  • Compare the behavior of the previous configurations with different numbers of rows:

    • a)  100 (one hundred) rows
    • b)  1,000 (one thousand) rows
    • c)  10,000 (ten thousand) rows
    • d)  100,000 (one hundred thousand) rows
    • e)  1,000,000 (one million) rows
  • Since we are benchmarking anyway, let’s compare the previous results with the two most popular MySQL engines:

    • a)  MyISAM
    • b)  InnoDB

How did I run these benchmarks?

To do it, I wrote a few lines of PHP that allowed me to fill a test database with random data tailored for the tests… yes, in total I generated more than 6,500,000 rows! Almost 1GB of random records. (It took several minutes.)
In case anyone is interested, at the end of the post you can download the PHP file used to generate the random data, the database structure without records, and the database full of data.

Test conditions:

  • The queries are run in phpMyAdmin.
  • We look up a record in the middle of the table.
  • The time is the one reported by phpMyAdmin.
  • The MySQL server runs on a Raspberry Pi.
  • The operating system is Raspbian ‘Wheezy’.
  • The Apache/PHP/MySQL configurations are the defaults after installation.
  • The MySQL cache is reset before each test.
  • The UserName column is a VARCHAR of 25 characters.
Example queries:
SELECT * FROM  `innodb_100_usuarios_noindex` WHERE id=50;
/*
(1 total, Query took 0.0061 sec)
id	name	           username	                 telephone	      notes
50	7Rge46fiCLEp8Au	84cdde86a4560c10000000050	vjA0WXmydgFTLf1	84cdde86a4560c10000000050 - 50
*/

SELECT * FROM  `innodb_100_usuarios_noindex` WHERE username='84cdde86a4560c10000000050'
/*
(1 total, Query took 0.0073 sec)
id	name	           username	                 telephone	      notes
50	7Rge46fiCLEp8Au	84cdde86a4560c10000000050	vjA0WXmydgFTLf1	84cdde86a4560c10000000050 - 50
*/

Enough words!!!   I want to see the numbers!

All time values are expressed in ms (milliseconds).

Here we have two tables, one with the MyISAM results and the other with the InnoDB results. In both cases we compare the three index configurations mentioned above (a, b, c) with different numbers of rows, first searching by the Id column and then by the UserName column.

MyISAM

[table id=1 /]

InnoDB

[table id=2 /]

Results:

At first glance, we can see the difference between InnoDB and MyISAM in terms of SELECT performance. If we take the first column as a reference, where Id is a primary index, looking up an indexed value among 1,000,000 records is 35% slower in InnoDB, and if we look up a non-indexed record, InnoDB is 440% slower than MyISAM.

So, we discard InnoDB for the rest of the comparisons.

Evidently, looking up a record by a non-indexed column among one million rows takes 8158.8 milliseconds; that is more than 8 seconds!!! In fact, with 10,000 rows it takes almost 100 milliseconds, which in some cases can cause problems. Obviously, and as we could imagine, if time is key, this is the worst configuration.

That leaves us with two options: using the UserName column as a secondary index, or as a primary index.

As you can see in the table, if we are after the best speed, we should make it a primary index; in exchange, searching by Id takes a couple of milliseconds longer.

Final conclusion:

As a final conclusion, I will choose the “UserName as a secondary index” configuration. Why? Because, on average, it seems to have the best times: no query goes above 6 ms, regardless of whether we search by the primary or the secondary index. Really good timings.

Start and stop a VirtualBox virtual machine automatically with the host OS (Ubuntu)

The best way to start a VirtualBox virtual machine when the host operating system boots, and to shut it down when the host shuts down, is with an init.d script.

We create a file to hold the script:

sudo nano /etc/init.d/VM

Copy the following content into the file, then save and exit.

#! /bin/sh
# /etc/init.d/VM
#

# Edit the following variables
VMUSER=vbox
VMNAME="NombreDeLaVM"

case "$1" in
  start)
    echo "Starting VirtualBox VM..."
    sudo -H -b -u $VMUSER vboxmanage startvm "$VMNAME" --type headless 
    ;; 
  stop) 
    echo "Saving state of Virtualbox VM..." 
    sudo -H -b -u $VMUSER vboxmanage controlvm "$VMNAME" savestate
    ;; 
  *) 
    echo "Usage: /etc/init.d/VM {start|stop}" 
    exit 1 
    ;; 
  esac 

exit 0

Note: the stop action of the script does not power off the VM; it saves its state and closes it.

We give the script execute permissions:

sudo chmod +x /etc/init.d/VM
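You can test it manually before wiring it into the boot sequence:

sudo /etc/init.d/VM start
sudo /etc/init.d/VM stop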

Now we tell the system to start the virtual machine last during boot and stop it first during shutdown:

sudo update-rc.d VM defaults 99 01

PS: I found this script here, but I made a couple of modifications to improve it.