Collaborators: Daniel Hallmark, Aaron Jackson & Ryan Johnston
We recently published a blog entry outlining the results of a set of performance tests we ran against Flowable in a high-powered AWS environment (link). After concluding that effort, we were excited to run a similar set of performance tests against Camunda in the same environment... and we're excited to be able to share those results now!
Camunda's Job Executor is one of the unique and differentiating features in the platform. Its role is to handle autonomous and asynchronous continuations in process instances; in other words, in situations where process instances reach wait states where external intervention isn't required, the Job Executor steps in. These autonomous, asynchronous continuations are called jobs, and they're managed in database tables (as queues) that can be used by clustered Job Executors, allowing for nearly seamless scaling based on processing requirements.
To accomplish this, the Job Executor uses a table - the ACT_RU_JOB table - as a queue in its relational database. Jobs are "acquired" through job acquisition and then executed using a configured thread pool in each Job Executor instance. Here's a quick diagram from Camunda's online documentation that illustrates how that works:
Since clustered Job Executor instances all pull from the centralized job table referenced above, there is contention in a clustered environment when attempting to pull those jobs for processing, and that contention does cause some inefficiency. Camunda added a specific feature years ago to improve job acquisition at scale (in a cluster) and to address that inefficiency. That feature is referred to as "exponential backoff", and it's discussed in this blog entry: https://camunda.com/blog/2015/09/scaling-camunda-bpm-in-cluster-job/.
Understanding that one size does not fit all, Camunda has provided a set of configuration parameters that can be used to tune Job Executor performance to align with any user's/customer's specific performance needs. A full listing of those configuration settings can be found here.
So - with that background - how does Camunda Platform (Community Edition) perform at massive scale? And does the exponential backoff feature really offer any performance improvements in high-volume, clustered scenarios? Let's find out.
These tests were run in the same environment that we used for our Flowable performance testing in our earlier blog post, meaning that it was an environment with some serious horsepower. Each node in the 8-node cluster we employed was a c5.2xlarge node with 8 virtual CPUs and 16 GB of RAM. The database was a single db.m6g.8xlarge node with 32 CPUs and 128 GB of RAM (!). Sweet! Here's a diagram depicting that environment:
All tests were run with 1,000,000 instances of either "no-op" jobs (jobs that don't perform any specific operations) or jobs that executed for a random duration based on a configured maximum duration. Note also that we're trying to replicate "real world" scenarios to the fullest extent possible; as a result, we kept the history level set at "full", which is what most customers use due to strict reporting and/or compliance requirements.
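To make the timed workload concrete, here is a rough Python model of how such a job duration could be drawn. It assumes a uniform draw in whole milliseconds below the configured maximum, which is our reading of "random duration" rather than a statement about the test harness itself. The function names are hypothetical.

```python
import random

def random_job_duration_ms(max_duration_ms, rng):
    """Draw a whole-millisecond duration uniformly from 0 .. max_duration_ms - 1."""
    return rng.randrange(max_duration_ms)

def expected_duration_ms(max_duration_ms):
    """Exact mean of a uniform draw over 0 .. max_duration_ms - 1."""
    return (max_duration_ms - 1) / 2

# A seeded generator keeps the sample reproducible.
rng = random.Random(42)
sample = [random_job_duration_ms(100, rng) for _ in range(10_000)]

print(expected_duration_ms(100))   # 49.5
print(expected_duration_ms(1000))  # 499.5
print(min(sample), max(sample))    # always within 0 .. 99
```

Under that assumption, a maximum of 100 ms gives durations of 0-99 ms averaging 49.5 ms, and a maximum of 1,000 ms gives 0-999 ms averaging 499.5 ms.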
First, let's take a look at how Camunda handles 1M no-op job instances with 8 job execution threads per node:
2,480 jobs per second equates to 8.9M jobs per hour, a staggering amount of throughput. However, remember that these jobs weren't actually doing anything. Let's take a look at how Camunda did in processing 1M job instances where each job was configured to take between 0-99ms (with a ~50ms average), still with 8 job execution threads per node:
These numbers are more representative of what we might see in a typical environment, as an average job execution time of ~50ms is probably pretty close to what we would normally see in a user/customer implementation. 1,110 jobs/second is still blazing fast, equating to just under 4M jobs per hour.
Let's raise the bar again, this time processing jobs with an average execution time of ~500ms (i.e. a randomly-distributed set of wait times between 0-999ms). Let's also allocate varying numbers of job execution threads to each node:
With job execution times averaging ~500ms, job acquisition won't happen as often, meaning there is less contention when attempting to acquire jobs. However, the number of job execution threads becomes much more important, as those threads stay busy with the longer-running jobs. As a result, throughput improves markedly as we allocate more threads to the thread pool.
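A back-of-the-envelope calculation shows why the thread count matters so much here. The sketch below assumes each job occupies one execution thread for its full duration and ignores acquisition and database overhead, so it gives an upper bound rather than a prediction. The function names and the 1,000 jobs/second target are hypothetical.

```python
def max_jobs_per_second(nodes, threads_per_node, avg_job_ms):
    """Upper bound on throughput if every thread runs jobs back to back."""
    total_threads = nodes * threads_per_node
    return total_threads * 1000 / avg_job_ms

def threads_per_node_needed(target_jobs_per_second, avg_job_ms, nodes):
    """Busy threads per node required to sustain a target rate."""
    return target_jobs_per_second * avg_job_ms / 1000 / nodes

print(max_jobs_per_second(8, 8, 50))          # 1280.0
print(max_jobs_per_second(8, 8, 500))         # 128.0
print(threads_per_node_needed(1000, 500, 8))  # 62.5
```

With 8 nodes and 8 threads per node, ~50ms jobs cap out at 1,280 jobs/second under these assumptions, and the 1,110 jobs/second we observed sits just under that ceiling. The same 64 threads can complete at most 128 jobs/second when jobs average ~500ms, which is why larger thread pools pay off for longer-running jobs.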
Folks, this is very, very fast throughput with these extremely long-running jobs, which are much longer-running than most jobs we would expect to see at customer sites. 1,990 jobs/second equates to over 7M jobs per hour... with jobs that are taking half a second on average!
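For anyone who wants to check the hourly figures quoted so far, the conversion is simply the per-second rate multiplied by 3,600. A minimal sketch (the helper name is ours):

```python
SECONDS_PER_HOUR = 3600

def jobs_per_hour(jobs_per_second):
    """Convert a sustained per-second rate into jobs per hour."""
    return jobs_per_second * SECONDS_PER_HOUR

for rate in (2480, 1110, 1990):
    print(f"{rate:,} jobs/s -> {jobs_per_hour(rate):,} jobs/h")
# 2,480 jobs/s -> 8,928,000 jobs/h
# 1,110 jobs/s -> 3,996,000 jobs/h
# 1,990 jobs/s -> 7,164,000 jobs/h
```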
Before we wrap up, let's take a look at Camunda's performance with their exponential backoff feature enabled.
When exponential backoff is enabled, a Job Executor node/instance that attempts to acquire and lock jobs and fails to do so - meaning that another Job Executor is contending for those jobs - will then wait for a specified amount of time before trying to acquire jobs again. By exponentially increasing the wait time if & when this recurs, the theory is that the job acquisition cycles will overlap less and less over time, thus eventually resulting in interleaved job acquisition operations and the reduction of any inefficiencies associated with acquisition contention. Does it work? Let's see.
For these tests, we used the following exponential backoff settings:
- maxBackoff: 450ms
- backoffTimeInMillis: 90ms
- waitIncreaseFactor: 2.0
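To get a feel for what these settings mean, the sketch below computes the wait after each consecutive failed acquisition under a simplified model: start at backoffTimeInMillis, multiply by waitIncreaseFactor on each further failure, and cap at maxBackoff. This is an illustration only, not Camunda's implementation, which may differ in details such as jitter or how the backoff level is reduced again. The function name is hypothetical.

```python
def backoff_schedule(base_ms, factor, max_ms, failures):
    """Wait (in ms) after each of `failures` consecutive failed acquisitions."""
    waits = []
    for level in range(failures):
        wait = base_ms * factor ** level    # exponential growth per failure
        waits.append(int(min(wait, max_ms)))  # never exceed the configured cap
    return waits

print(backoff_schedule(90, 2.0, 450, 5))  # [90, 180, 360, 450, 450]
```

Under this model, the wait reaches the 450ms cap on the fourth consecutive failure and stays there.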
Here are our observed results with 1M no-op jobs:
It works! Job throughput almost doubles, as the contention for jobs is reduced. This makes sense, as we have eight nodes contending for 1M jobs that take essentially no time to complete. Here, exponential backoff really shines, but will it shine in a more real-world scenario, i.e. with job execution times averaging ~50ms? Let's see:
Exponential backoff shines once again, in this case offering about a 61.5% improvement in throughput in our scenario with an average job execution time of ~50ms.
We didn't notice any statistically significant differences in throughput with exponential backoff enabled for longer-running jobs (an average of ~500ms), or for jobs averaging ~50ms when fewer job execution threads were configured. This makes sense, of course, as there is less contention for jobs in those scenarios and thus less opportunity for exponential backoff to improve efficiency by reducing job acquisition contention.
Finally, of note is the fact that Camunda efficiently utilized the super high-performance database we employed for these tests. The performance was stable without any observed deadlocks, and Camunda consistently used over 50% of the CPU power in that huge database(!).
We're very impressed with Camunda's performance in these tests, and we believe that these results should inspire confidence among users or customers who are looking to utilize Camunda in any high-throughput scenarios. Moreover, we were surprised at just how effective Camunda's exponential backoff feature was in improving throughput in situations where job acquisition contention exists. As such, we would encourage any user running high volumes of jobs in a clustered configuration to test the exponential backoff settings in their environment to see if the feature can help them to achieve significant performance improvements.
Please don't hesitate to contact us at firstname.lastname@example.org if we can answer any questions for you or if you need assistance with scaling or performance testing your Camunda Community Edition or Enterprise Edition environments.