My first brush with open-source, enterprise-grade software was during my 12-month internship at WSO2. It was the company’s IoT Server middleware that gave me a taste of what it’s like to work alongside a community of developers on a mature product. This experience went on to fuel my final-year thesis and my research publications, and eventually led me to work on Docker and, in time, GSoC.
Before I started working on this project, I had absolutely no clue as to what an ESB was or how one worked. As far as I was concerned, it was software that ran on a secluded server somewhere, shuttling data and requests between different web services.
Little did I know that in a few weeks I’d be hip-deep in programming one.
At the beginning of any project, a list of objectives is created that one hopes to achieve by its conclusion. GSoC is no different.
Since most participants are still getting acquainted with the software by the time the objectives are being formulated, the support of the allocated mentor is also sought.
These objectives, however, are not set in stone; rather, they evolve and grow based on the progress of the task at hand and what has been achieved.
The biggest update to my objectives list came after my 1st evaluation, which took place about 1.5 months after the start of the project.
As you might’ve noticed, the objectives went from being very broad to extremely specific.
This blog post will be an exploration of the decisions taken to achieve those objectives and will be organized as such.
But before that… a commercial break with some explanations…
…Enterprise Service Bus (ESB)
An ESB is a piece of software that acts as a central hub, helping wire different services together. This middleman role allows ESBs to perform tasks like proxying, analytics, translation, logging and more.
They’re especially useful when it comes to working with heterogeneous services and also bundling services into atomic units.
In the days before microservices and containers, ESBs reigned supreme. However, they are not without merit in the post-Kubernetes world either.
“Just like VMs in practice, completely unlike them in performance.”
This is the mantra used by devs to explain containers.
Built on Linux kernel features like cgroups and namespaces, they deserve much of the credit for creating the quick-response web landscape we see today.
Before containers, VMs were the name of the game, and before them, bare-metal hardware. Enabled by advances in virtualization, VMs allowed multiple applications to run concurrently on the same hardware without hindering performance. However, they were quite bulky, cumbersome and slow to start.
The magic of containers is that they work just like VMs but with less heft.
Out of the multitude of products available, Docker has emerged as the de facto standard for the technology, so much so that it has become its poster boy.
Several other products have since been built on its foundations.
Originating at Google and released as an open-source project in 2014, Kubernetes is what is called a “container orchestration tool”.
What this means is that Kubernetes manages a lot of containers deployed concurrently, so that it can do a lot of cool stuff like having zero service interruptions and auto-scaling based on demand.
It’s a common misconception that Kubernetes and Docker are mutually exclusive, with the two often pitted against each other. While Docker does have its own built-in container orchestration system, Docker Swarm, Kubernetes has since emerged as the industry standard, and it still runs containers built with Docker.
Having said that, Kubernetes has certain concepts and techniques that differ quite significantly from Docker, so much so that moving from one to the other feels like moving from a mountain bike to a motorbike.
While Kubernetes would usually run on server infrastructure, for the purposes of testing our implementation we used the single-node local runtime, minikube.
With that out of the way, we can now get on with the regular program.
The main goal of this project was to take the preliminary steps in bringing Synapse to a cloud-native form.
Accordingly, the Twelve-Factor App methodology was used as a blueprint for our work, specifically the 3rd factor: Config.
In a nutshell, it states that ‘apps’ or cloud-native services should be environment agnostic. This means that the exact same image can be launched locally or in the cloud with zero configuration changes. Any changes that do exist are present in the environment itself, ready to be injected at runtime.
This was established when defining the objectives at the very onset of the project and it was what guided our decisions moving forward.
Accordingly, the objectives we defined and the decisions we took are as follows:
1. Parameter injection
ESBs like Synapse are pretty much useless without one key component: a configuration file. This is what ‘instructs’ Synapse on exactly what to do with the requests it’s getting.
Configuration files themselves have been around for quite a while in the middleware space, and not just in ESBs either. Even build automation and package management tools like Maven and npm use them to catalogue the multitude of dependencies used by a particular project.
While newer tools tend to opt for formats like JSON and YAML, Synapse instead uses the somewhat dated XML format, a legacy of its Apache Axis2 descent.
In a classic deployment, the config file is static and requires the support of an automated build and deployment process to be ‘mounted’ onto a Synapse server. However, this arrangement would not be practical in a cloud-native environment as a new image would be needed for every minute configuration change, particularly when it comes to changes involving URI endpoints.
What this objective aimed to do was pass the endpoint address to the config file as a variable at runtime. This also had to be done while accommodating the fact that Synapse would be running within a container controlled by Kubernetes.
We figured that the ideal way to achieve this was through environment variables.
For this to happen, 2 things needed to be implemented:
- Create a keyword that can be detected when included within a config file.
- Set up a keyword detection system that would then replace the detected keyword with the value of the preset environment variable.
For the 1st part, we took some inspiration from a fork of Synapse, WSO2 EI, and defined an ‘injection keyword’ named $SYSTEM. This is then suffixed by the variable name, with a colon separating the two (for example, $SYSTEM:VARIABLE_NAME).
An example config file snippet is as follows…
To begin with, we decided to focus on injecting parameters to address URIs for SOAP and WSDL requests.
Accordingly, to detect and populate the config file, code changes were made here…
SOAP service code changes · n-jay/synapse
Apache Synapse is a lightweight and high-performance Enterprise Service Bus (ESB) - synapse/AddressEndpointFactory.java…
WSDL service code changes · n-jay/synapse
Apache Synapse is a lightweight and high-performance Enterprise Service Bus (ESB) - synapse/WSDLEndpointFactory.java at…
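To make the idea concrete, here is a minimal Python sketch of the detect-and-populate step. It is not the Java code from the changes linked above, only an illustration of the same technique. The function name, the SOAP_ENDPOINT variable and the decision to raise an error when a variable is missing are all assumptions made for this example.

```python
import os

# The injection keyword described above, followed by the colon separator.
INJECTION_PREFIX = "$SYSTEM:"


def resolve_injected_value(raw, env=None):
    """Return the environment value for a '$SYSTEM:NAME' string.

    Strings without the injection keyword are returned unchanged, so
    hard-coded URIs keep working. `env` defaults to the process environment;
    passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    if not raw.startswith(INJECTION_PREFIX):
        return raw
    var_name = raw[len(INJECTION_PREFIX):]
    if var_name not in env:
        # Assumption: failing loudly is preferable to using an empty URI.
        raise KeyError(f"environment variable {var_name!r} is not set")
    return env[var_name]


# Hypothetical variable name and URI, used purely for illustration.
example_env = {"SOAP_ENDPOINT": "http://backend:9000/services/Echo"}
print(resolve_injected_value("$SYSTEM:SOAP_ENDPOINT", example_env))
print(resolve_injected_value("http://localhost:9000/services/Echo", example_env))
```

Keeping the check to a simple prefix match means that existing config files with literal URIs behave exactly as before.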
However, none of these changes would’ve made a smidge of difference if the environment variables couldn’t actually be set within the container.
Luckily, Kubernetes came to our rescue with a handy little feature called ConfigMaps.
ConfigMaps are built for the express purpose of passing configurations to pods, enabling the portability of containerized applications. They’re created in the exact same way one would create a pod or a deployment: by using a .yml file. Within this .yml file, we define our environment variables as key-value pairs under the data field.
Thereafter, it’s simply a matter of connecting them to the deployment in its own .yml file, under the env field.
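As a rough sketch of how the two pieces fit together, the Python below builds a ConfigMap manifest and the matching env entries for a container, then prints them as JSON (which kubectl accepts alongside YAML). The names synapse-config and SOAP_ENDPOINT, the URI and the image tag are placeholders chosen for this example, not the values used in the project.

```python
import json


def config_map(name, data):
    """Build a ConfigMap manifest holding key-value pairs under `data`."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(data),
    }


def env_from_config_map(config_map_name, keys):
    """Build container `env` entries that read each key from the ConfigMap."""
    return [
        {
            "name": key,
            "valueFrom": {
                "configMapKeyRef": {"name": config_map_name, "key": key}
            },
        }
        for key in keys
    ]


# Placeholder names and values, for illustration only.
cm = config_map("synapse-config", {"SOAP_ENDPOINT": "http://backend:9000/services/Echo"})

# This fragment belongs under spec.template.spec.containers in the deployment.
container = {
    "name": "synapse",
    "image": "docker-synapse:latest",
    "env": env_from_config_map(cm["metadata"]["name"], cm["data"]),
}

print(json.dumps(cm, indent=2))
print(json.dumps(container, indent=2))
```

With this wiring, changing an endpoint only means editing the ConfigMap and restarting the pods; the image itself stays untouched.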
And with that, the first objective was complete.
2. Configuration hot-swapping
Coming from the same line of thought as parameter injection, the other objective we were mulling over was whether it’d be possible to hot-swap the entire configuration file itself.
Imagine if the Synapse ESB were a SEGA Genesis, a fourth-generation home gaming console that required a cartridge to hold the game data.
If you pull out a cartridge and insert a new one, the game stored on the new cartridge starts running.
But with Synapse, instead of cartridges and games, it’s config files and mediation logic.
In a standard runtime, this process is fairly straightforward.
You simply SSH into the machine Synapse is running on, swap the config file and restart the server. But this is not the case with cloud-native Synapse.
This is where PersistentVolumes come in. PersistentVolume is a feature provided by Kubernetes that allows developers to ‘allocate’ a bit of disk space to be used by a running pod or deployment. While pods and deployments are ephemeral, PersistentVolumes (as the name might suggest) are permanent.
Getting one set up, especially on the first try, was a little bit tricky.
The process starts by literally defining a PersistentVolume. This is what creates the storage area within the disk space based on the instructions given in its .yml file.
The capacity field defines the amount of storage (in this instance, in megabytes) and the hostPath field defines the directory where the shared files are stored, similar to an NFS (Network File System) share. On minikube, this directory has to be created (if it doesn’t exist) via the minikube ssh command. On a remote cloud deployment of Kubernetes, this would differ depending on the type of persistent storage service you are using (AWS Elastic Block Store, Azure Disk etc.).
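For reference, a PersistentVolume manifest of this shape can be sketched in Python as below and printed as JSON for kubectl. The volume name, the 100Mi size, the /mnt/synapse-config path, the manual storage class and the ReadWriteOnce access mode are assumptions picked for the example rather than the project’s actual settings.

```python
import json


def persistent_volume(name, size, host_path):
    """Build a hostPath-backed PersistentVolume manifest."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            # Assumed storage class; a claim must request the same class to bind.
            "storageClassName": "manual",
            # Kubernetes quantity: "100Mi" is 100 mebibytes, "100M" is 100 megabytes.
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            # Directory on the node (the minikube VM) that backs the volume.
            "hostPath": {"path": host_path},
        },
    }


# Placeholder values, for illustration only.
pv = persistent_volume("synapse-pv", "100Mi", "/mnt/synapse-config")
print(json.dumps(pv, indent=2))
```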
Once the PersistentVolume has been created, we move on to creating the PersistentVolumeClaim.
This is similar to a subscription that a deployment makes to a PersistentVolume. Kubernetes includes an intermediary ‘claim’ system, rather than allowing containers to access the PV directly, so that multiple pods or deployments running within a node can share the same PersistentVolume.
After that, it’s simply a matter of ‘connecting’ the deployment to the PV claim...
and then mounting the PersistentVolume space onto a directory within the container.
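The claim and the two wiring steps can be sketched in Python as follows. As before, this is an illustration under assumed names: synapse-pvc, synapse-storage, the manual storage class and the mount path /opt/synapse/repository/conf are hypothetical and would need to match your own PersistentVolume and Synapse layout.

```python
import copy
import json


def persistent_volume_claim(name, size):
    """Build a PersistentVolumeClaim manifest requesting `size` of storage."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Assumed class; it has to match the PersistentVolume's class.
            "storageClassName": "manual",
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


def mount_claim(pod_spec, claim_name, mount_path, volume_name="synapse-storage"):
    """Return a copy of a pod spec with the claim attached and mounted.

    Step 1 'connects' the pod to the claim via `volumes`; step 2 mounts that
    volume onto `mount_path` inside every container via `volumeMounts`.
    """
    spec = copy.deepcopy(pod_spec)
    spec.setdefault("volumes", []).append(
        {"name": volume_name, "persistentVolumeClaim": {"claimName": claim_name}}
    )
    for container in spec["containers"]:
        container.setdefault("volumeMounts", []).append(
            {"name": volume_name, "mountPath": mount_path}
        )
    return spec


# Placeholder values, for illustration only.
pvc = persistent_volume_claim("synapse-pvc", "100Mi")
base_pod_spec = {"containers": [{"name": "synapse", "image": "docker-synapse:latest"}]}
pod_spec = mount_claim(base_pod_spec, "synapse-pvc", "/opt/synapse/repository/conf")

print(json.dumps(pvc, indent=2))
print(json.dumps(pod_spec, indent=2))
```

The resulting pod spec sits under spec.template.spec in the deployment’s .yml file.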
And with that, the 2nd objective was also completed.
3. Leaner image
Last but not least, we wanted to make the image as lean as possible.
With all code changes finalized, the finished Docker image came to a whopping half a gigabyte in size. This was because we were using the OpenJDK 8 image as a base, which was already chonky on its own.
This was not acceptable, as it’s advised to keep Docker images as small as possible for better portability and performance.
However, trimming the fat is sometimes not so straightforward.
My first approach was to use a tool to get the job done.
Unfortunately, the tool failed to read the Synapse image correctly and crashed with an error.
My 2nd approach was to address the base image itself.
The unzipped Synapse directory on its own was around 60 MB. It was therefore logical to assume that most of the extra weight was coming from the base image.
With that in mind, I went for the smallest image I could find, Alpine Linux, and amended my Dockerfile to use it as the base.
Needless to say, the results (shown in the screenshot below as the v9 tag of the docker-synapse image) speak for themselves.
The change in base image reduced the image to approximately one-fifth of its initial size.
While this is still too large for comfort, it’s a start nonetheless and a huge improvement over what came before.
And with that, the 3rd objective was also achieved, bringing the project to a close.
This project took a total of 10 weeks to complete, and I can confidently say that what I learnt within that period is vast and fascinating.
While I am nowhere near as proficient as I’d like to be in any of these technologies, the taste I got while working on this project has encouraged me to keep on going.
For those of you who are interested, the complete GitHub repo with all the containerization related code can be found here…
GitHub - n-jay/gsoc-2021: Repo for GSoC project to dockerize Apache Synapse ESB
Project dedicated to containerizing the Apache Synapse ESB. Client - Project Directory for simple HTTP client (Written…
and a video demonstration of the final output is linked below.