In our last blog entry, "An Overwhelming Set of Workflow Options"*, we took a look at three categories of workflow solutions - BPM suites, open-source BPM platforms and lightweight workflow platforms - and reached the following conclusions:
BPM suites (e.g. Appian, IBM BPM and Pegasystems) are most appropriate when users expect additional capabilities, such as advanced user interface generation and reporting, from their workflow solution,
Lightweight workflow platforms (e.g. Netflix Conductor and Zeebe) are unproven but promising tools if massive scalability (>= ~1B workflow instances per day) is required from a workflow solution and
Open-source workflow platforms are most appropriate in situations where flexibility and developer-friendliness are required.
Since it may be difficult for some users to choose between the much newer "lightweight" workflow platforms and the older (legacy? :) ) open-source workflow platforms, let's dig into them a bit more and examine the relative capabilities of each.
Background
Camunda BPM is one of a set of open-source workflow tools that share a common heritage. Joram Barrez and Paul Holmes-Higgin of Flowable authored a wonderful blog entry that covers that heritage in detail here. In short, there are four related platforms in the open-source BPM platform market: jBPM (link), Activiti (link), Camunda BPM (link) and Flowable BPM (link). jBPM is the spiritual ancestor of all of those platforms, and its creators developed Activiti, an evolution of the concepts in jBPM at that time. Camunda BPM and Flowable BPM have both been forked from Activiti. Throughout the remainder of this blog entry, we'll use Camunda BPM as the representative for this set of related products, but the other products in this category have similar features and capabilities.
Netflix Conductor and Camunda's Zeebe are two of the new lightweight workflow engines that have been introduced over the last several years. In Conductor's case, it was built to "orchestrate microservices based process flows" (link); in Zeebe's case, it was designed to "solve the microservices orchestration problem" (link). Moreover, both products are designed to scale to massive numbers of concurrent instances. In other words, they have very similar stated goals.
Netflix and Camunda - with Conductor and Zeebe - have focused on building very fast and scalable state machines, i.e. workflow engines that manage the states of each instance very effectively. Both generally expect external clients to complete the work in each step of each workflow; while this definitely has its advantages, it has tended to limit the features and functionality in those tools.
Relative Capabilities
Camunda BPM provides nearly full support for BPMN (Business Process Model and Notation) 2.0, an independent standard in the workflow community. It supports:
Tasks: User Tasks, Service Tasks, Script Tasks, Business Rule Tasks etc.
Events: Timer Events, Message Events, Signal Events etc.
Gateways: Exclusive Gateways, Parallel Gateways, Inclusive Gateways and Event-Based Gateways
Other: Pools, Lanes, Call Activities (for calling separate workflows), Embedded Subprocesses etc.
The lightweight workflow tools only provide subsets of this functionality. For instance, Zeebe only supports:
Tasks: Service Tasks
Events: Timer Events and Message Events
Gateways: Exclusive Gateways
Other: Embedded Subprocesses
Netflix Conductor doesn't support BPMN 2.0 and has its own modeling language that uses JSON. For comparison purposes, we can say that it supports analogs of the following (with the Netflix Conductor names for each in parentheses):
Tasks: Service Tasks (called Simple Tasks) and Script Tasks (Lambda Tasks)
Events: Message Events (Events and Waits)
Gateways: Exclusive Gateways (Decisions) and Parallel Gateways (Forks)
Other: Call Activities (Sub Workflows)
While Zeebe effectively only allows logic to be executed by external clients, Netflix Conductor provides some functionality to allow work to be done within the engine itself, e.g. with Lambda Tasks.
We believe that these platforms will evolve to provide a more complete set of workflow capabilities, but they're currently somewhat limited.
Use Case Comparison
Let's implement a simple workflow/state machine use case in all three platforms and in so doing provide a technical comparison of the three tools.
First, here's the process model that's used for our use case in both Camunda BPM and Zeebe:
Figure 1: Automated Invoice Workflow in Camunda BPM and Zeebe
And here's the workflow definition that's used in Netflix Conductor:
Figure 2: Automated Invoice Workflow in Netflix Conductor
Let's talk briefly about the diagrams. Since both Zeebe and Camunda BPM leverage the BPMN 2.0 standard, their process models look the same visually. In contrast, since Netflix Conductor uses its own previously mentioned JSON-based modeling language and a custom renderer, its model looks different. Hidden behind these visual representations are significant differences:
1. Camunda BPM supports the vast majority of the BPMN 2.0 standard, so much more of the BPMN 2.0 feature set is available than is shown here. The diagram is necessarily limited to what Zeebe and Netflix Conductor can support.
2. Zeebe's support is limited essentially to what is shown here; the only currently available features that aren't illustrated are Timer Events, Message Events and Embedded Subprocesses.
3. Netflix Conductor's support - when filtering out the BPMN 2.0-specific terminology - is similar to Zeebe's, although Netflix Conductor also offers the ability to incorporate JavaScript logic inline, to call Web services inline, to fork execution and to reference a separate workflow definition from within a "parent" workflow definition.
You may have noticed the term "inline" in #3 above, and that gives us a nice opportunity to transition to a very important design tenet of both Netflix Conductor and Zeebe, which is also shared by Camunda BPM when using External Tasks. Since these tools are acting simply as state machines here and are expecting external clients to execute the task work, they essentially operate queues that maintain sets of work that are subscribed to and completed by entirely external applications. Zeebe uses a term that sums it up perfectly: Job Workers. These external clients are expected to subscribe to task types and to complete the work for those task types, letting the engines/state machines know once each unit of work has been completed.
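To make the queue-and-subscribe idea concrete, here is a minimal in-memory sketch in C#. It is not the API of any of the three products: ToyEngine, Job, FetchAndLock and Complete are hypothetical names that we made up for illustration. The sketch only shows the division of labor, in which the engine tracks which jobs are waiting, locked or completed, while the worker does the actual work.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an engine's job queue; not any product's real API.
public sealed class Job
{
    public long Key { get; }
    public string Type { get; }
    public Job(long key, string type) { Key = key; Type = type; }
}

public sealed class ToyEngine
{
    private readonly Dictionary<string, Queue<Job>> _queues = new Dictionary<string, Queue<Job>>();
    private readonly HashSet<long> _locked = new HashSet<long>();
    private readonly List<long> _completed = new List<long>();
    private long _nextKey = 1;

    public IReadOnlyList<long> Completed => _completed;

    // The engine only records that a unit of work of a given type is waiting.
    public long CreateJob(string type)
    {
        if (!_queues.TryGetValue(type, out var queue))
        {
            queue = new Queue<Job>();
            _queues[type] = queue;
        }
        var job = new Job(_nextKey++, type);
        queue.Enqueue(job);
        return job.Key;
    }

    // A worker asks for up to maxJobs of one type; the returned jobs are locked.
    public List<Job> FetchAndLock(string type, int maxJobs)
    {
        var result = new List<Job>();
        if (!_queues.TryGetValue(type, out var queue)) return result;
        while (result.Count < maxJobs && queue.Count > 0)
        {
            var job = queue.Dequeue();
            _locked.Add(job.Key);
            result.Add(job);
        }
        return result;
    }

    // The worker reports back; only locked jobs can be completed.
    public bool Complete(long jobKey)
    {
        if (!_locked.Remove(jobKey)) return false;
        _completed.Add(jobKey);
        return true;
    }
}

public static class WorkerDemo
{
    public static void Main()
    {
        var engine = new ToyEngine();
        engine.CreateJob("determine-approver-groups");
        engine.CreateJob("determine-approver-groups");
        engine.CreateJob("some-other-type");

        // The "job worker": subscribes to one type, does the work, reports back.
        foreach (var job in engine.FetchAndLock("determine-approver-groups", 5))
        {
            Console.WriteLine("Working on job " + job.Key);
            engine.Complete(job.Key);
        }
        Console.WriteLine("Completed jobs: " + engine.Completed.Count);
    }
}
```

The real engines add persistence, lock timeouts and retries on top of this basic cycle, but the contract with the client is essentially the same: fetch, work, report completion.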
In an effort to illustrate the agnostic approach each of these tools takes toward the choice of technology for their clients or Job Workers (using Zeebe's terminology), we've used C# to write the clients that are used in this technical comparison. Some important details of how the clients were written:
For Zeebe, we've used their community-contributed C# client (link), which uses gRPC (link) to connect to Zeebe. NOTE: Since Netflix Conductor also offers a gRPC interface, we could likely adapt this client - which is licensed under the Apache License, Version 2.0 (link) - for use with Netflix Conductor, although its calls would have to be rebuilt against Conductor's own gRPC service definitions.
For Netflix Conductor and Camunda BPM, we've used their respective REST APIs to build client applications. In C#, we used the System.Net.Http namespace (link) to connect to the REST API endpoints. NOTE: Connecting to Zeebe via a REST API was not an option, as it currently doesn't offer one.
Now, let's delve just a bit more deeply into how we built the clients for each of these platforms.
Netflix Conductor
We used the following three REST API endpoints to poll for and complete tasks in Netflix Conductor:
GET /tasks/poll/batch/{taskType}: This endpoint allows a client to "long poll" for tasks of the given type, i.e. instruct the server to keep the connection open for a specified period of time or until at least one task arrives that meets the criteria supplied.
POST /tasks/{taskId}/ack: This endpoint allows a client to "acknowledge" a task, essentially locking it for completion to keep other clients from working on the same task simultaneously.
POST /tasks: Finally, this endpoint allows a client to complete a task, providing any new data to the specific workflow instance.
Here is an example line of code, in this case the code that uses RESTUtility - our utility class - to long poll for available tasks:
// Long poll for up to 5 "Determine Approver Groups" tasks on behalf of this worker.
Object lpResponse = RESTUtility.get(CONDUCTOR_BASE_URL,
    "poll/batch/Determine%20Approver%20Groups?count=5&timeout=5000&workerId=" + WORKER_ID,
    typeof(List<PollResponse>));
In this case, lpResponse is declared as Object because it can represent either a valid response or an error, and we handle it accordingly. More specifics are beyond the scope of this blog post.
Everything worked exactly as expected; we were able to complete the workflow instances as intended.
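For readers who want to see how the three endpoints fit together, here is a rough sketch of one poll/acknowledge/complete pass using System.Net.Http and System.Text.Json directly. It is not our actual client, which relies on the RESTUtility class mentioned above. The base URL, the worker ID and the helper names (BuildPollPath, BuildCompletionBody, PollAckCompleteOnceAsync) are assumptions made for illustration, and the field names in the completion body (workflowInstanceId, taskId, status, outputData) reflect our understanding of Conductor's task result format, so please verify them against the version you run.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class ConductorWorkerSketch
{
    // Assumed values for illustration; adjust to your own installation.
    private const string BaseUrl = "http://localhost:8080/api/";
    private const string WorkerId = "csharp-worker-1";

    // Builds the relative URL for the batch long-poll call.
    public static string BuildPollPath(string taskType, int count, int timeoutMs, string workerId)
    {
        return "tasks/poll/batch/" + Uri.EscapeDataString(taskType)
            + "?count=" + count
            + "&timeout=" + timeoutMs
            + "&workerId=" + Uri.EscapeDataString(workerId);
    }

    // Builds the JSON body for POST /tasks that reports a task as completed.
    public static string BuildCompletionBody(string workflowInstanceId, string taskId,
        Dictionary<string, object> outputData)
    {
        var body = new Dictionary<string, object>
        {
            ["workflowInstanceId"] = workflowInstanceId,
            ["taskId"] = taskId,
            ["status"] = "COMPLETED",
            ["outputData"] = outputData
        };
        return JsonSerializer.Serialize(body);
    }

    // One pass of the cycle: long poll, then acknowledge and complete each task.
    public static async Task PollAckCompleteOnceAsync(HttpClient http)
    {
        var pollJson = await http.GetStringAsync(
            BuildPollPath("Determine Approver Groups", 5, 5000, WorkerId));
        if (string.IsNullOrWhiteSpace(pollJson)) return; // no tasks available

        using var tasks = JsonDocument.Parse(pollJson);
        foreach (var task in tasks.RootElement.EnumerateArray())
        {
            var taskId = task.GetProperty("taskId").GetString();
            var workflowId = task.GetProperty("workflowInstanceId").GetString();

            // Acknowledge (lock) the task so that other workers do not pick it up.
            var ack = await http.PostAsync("tasks/" + taskId + "/ack", null);
            ack.EnsureSuccessStatusCode();

            // The actual business logic would run here.
            var output = new Dictionary<string, object> { ["approverGroups"] = "accounting" };

            var content = new StringContent(
                BuildCompletionBody(workflowId, taskId, output), Encoding.UTF8, "application/json");
            var done = await http.PostAsync("tasks", content);
            done.EnsureSuccessStatusCode();
        }
    }

    public static async Task Main()
    {
        using var http = new HttpClient { BaseAddress = new Uri(BaseUrl) };
        try
        {
            await PollAckCompleteOnceAsync(http);
        }
        catch (HttpRequestException e)
        {
            Console.WriteLine("Could not reach Conductor: " + e.Message);
        }
    }
}
```

A real worker would run this pass in a loop and would report failures back to the engine rather than simply throwing; both are left out here to keep the sketch short.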
Zeebe
Since Zeebe's community has provided an elegant C# client, writing the client code for each Zeebe task was very easy. Here's an example of a call that opens a subscription (a job worker) for a task type:
client.NewWorker()
    .JobType("determine-approver-groups")
    .Handler(HandleDetermineApproverGroups)
    .MaxJobsActive(5)
    .Name(WORKER_NAME)
    .AutoCompletion()
    .PollInterval(TimeSpan.FromSeconds(1))
    .Timeout(TimeSpan.FromSeconds(10))
    .Open();
As it refers to a function called HandleDetermineApproverGroups, here is that code for your review:
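private static void HandleDetermineApproverGroups(IJobClient jobClient, IJob job)
{
    var jobKey = job.Key;
    Console.WriteLine("Handling \"determine-approver-groups\" job: " + job);

    // Complete the job and hand the new variable back to the workflow instance.
    // Blocking on the returned Task surfaces any error instead of silently dropping it.
    jobClient.NewCompleteJobCommand(jobKey)
        .Variables("{\"approverGroups\":\"accounting\"}")
        .Send()
        .GetAwaiter()
        .GetResult();
}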
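As indicated above, since this is technically a gRPC client, the same approach should carry over to Netflix Conductor, which also supports the use of gRPC for its clients, although the calls themselves would have to target Conductor's own gRPC service definitions rather than Zeebe's.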
It was also very easy to open multiple subscriptions within the same application. Although we'd like Zeebe to offer a REST API, we found the C# client very easy to use. It was actually easier than writing REST API clients in C# for Netflix Conductor and Camunda BPM.
Camunda BPM
Since Camunda BPM distills the process of long polling and acknowledging/locking tasks into one REST API call, it was slightly easier to interact with via REST than Netflix Conductor was. As a result, only two endpoints were used:
POST /external-task/fetchAndLock: Allows for long polling and the fetching and locking of tasks from multiple topics (task types) in one call.
POST /external-task/{id}/complete: Allows for the completion of a task, including providing any new variables for the workflow/process instance.
The code used here was similar to the code used for the Netflix Conductor client, including the use of the RESTUtility class. As with Conductor and Zeebe, no issues were encountered in subscribing to and completing the tasks.
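As a rough illustration of those two calls, here is a sketch that builds the request bodies and performs one fetch-and-complete pass. Again, this is not our actual client. The base URL, worker ID, topic name, timeout values and helper names are assumptions made for illustration, and the body fields (workerId, maxTasks, asyncResponseTimeout, topics, variables) follow our reading of Camunda's REST API documentation, so please check them against your Camunda version.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class CamundaWorkerSketch
{
    // Assumed values for illustration; adjust to your own installation.
    private const string BaseUrl = "http://localhost:8080/engine-rest/";
    private const string WorkerId = "csharp-worker-1";
    private const string Topic = "determine-approver-groups";

    // Body for POST /external-task/fetchAndLock: long poll and lock in one call.
    public static string BuildFetchAndLockBody(string workerId, string topic, int maxTasks)
    {
        var body = new
        {
            workerId,
            maxTasks,
            asyncResponseTimeout = 5000, // long-poll wait in milliseconds
            topics = new[] { new { topicName = topic, lockDuration = 10000 } }
        };
        return JsonSerializer.Serialize(body);
    }

    // Body for POST /external-task/{id}/complete with one new string variable.
    public static string BuildCompleteBody(string workerId, string variableName, string value)
    {
        var body = new
        {
            workerId,
            variables = new Dictionary<string, object> { [variableName] = new { value } }
        };
        return JsonSerializer.Serialize(body);
    }

    private static StringContent Json(string body) =>
        new StringContent(body, Encoding.UTF8, "application/json");

    // One pass: fetch and lock up to five tasks, then complete each of them.
    public static async Task FetchAndCompleteOnceAsync(HttpClient http)
    {
        var fetch = await http.PostAsync("external-task/fetchAndLock",
            Json(BuildFetchAndLockBody(WorkerId, Topic, 5)));
        fetch.EnsureSuccessStatusCode();

        using var tasks = JsonDocument.Parse(await fetch.Content.ReadAsStringAsync());
        foreach (var task in tasks.RootElement.EnumerateArray())
        {
            var id = task.GetProperty("id").GetString();

            // The actual business logic would run here.
            var done = await http.PostAsync("external-task/" + id + "/complete",
                Json(BuildCompleteBody(WorkerId, "approverGroups", "accounting")));
            done.EnsureSuccessStatusCode();
        }
    }

    public static async Task Main()
    {
        using var http = new HttpClient { BaseAddress = new Uri(BaseUrl) };
        try
        {
            await FetchAndCompleteOnceAsync(http);
        }
        catch (HttpRequestException e)
        {
            Console.WriteLine("Could not reach Camunda: " + e.Message);
        }
    }
}
```

Compared with the Conductor flow, the separate acknowledge step disappears because the lock is taken as part of the fetch.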
Conclusion
All three of these products were able to provide the state machine capabilities and simple branching capabilities that we needed for this exercise. Although Zeebe's C# client - contributed by their community - was very elegant and easy to use, Zeebe's lack of a REST API means you're limited to the clients they provide or to interfacing directly via gRPC to the engine. On the flip side, although Netflix Conductor and Camunda BPM didn't offer C# clients, their available REST APIs provide greater flexibility in client choice.
From a scalability perspective, Netflix Conductor and Zeebe feature theoretically superior architectural design characteristics that should allow for much more effective scalability in building enterprise applications. However, as discussed in our previous blog entry, "An Overwhelming Set of Workflow Options"*, we don't yet have real-world performance data showing that these new platforms scale beyond what we've seen in the open-source workflow platforms.
The difference that stands out to us is the more complete feature set that Camunda BPM offers. While it can provide the same state machine capabilities and support the use of external clients as the others do, its more complete feature set gives its users many more options for building out workflow definitions. For that reason, it is likely a better option in most cases than are the other two, with the exception being cases where massive scalability is required.
If you'd like more information regarding the capabilities of these platforms and/or are interested in speaking with us about your own evaluation process, please let us know at info@summit58.co.
* - "An Overwhelming Set of Workflow Options" (Summit58 blog) (link)