Tweaking the process definition cache in Activiti 5.12

Introduction

Generally, the Activiti engine works in a stateless way. All the state about process instances, tasks and whatnot is stored in the database and only retrieved when necessary. As such, a process instance can be ‘dormant’ for a very long time, without any impact (resource-wise) on running processes.

This approach has two major benefits:

  • It keeps the memory footprint of Activiti low. Anything that is not directly needed is offloaded to the database and only fetched when it is absolutely necessary.
  • It makes it easy to run Activiti in a clustered setup (with a shared database, of course), as no node in the cluster stores any state that could be needed by another node. The database is the one and only ‘authentic source’: when a node in the cluster needs some process state, it fetches that state from the database, uses it, and writes it back.

There is one exception to that rule: while an operation of an Activiti service is executing, MyBatis (the underlying ORM framework) keeps a ‘session cache’ to avoid fetching the same data twice within the same transaction. However, this cache is very short-lived and does not endanger the ability to run Activiti on a cluster.
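The idea of a cache that only lives for a single operation can be pictured with the Java sketch below. It is a hypothetical illustration of the concept, not MyBatis’s actual API: the SessionCache and SessionCacheDemo names are made up, and a simple function stands in for a real database query. Each operation creates a fresh cache and discards it at the end, so nothing survives beyond the transaction or is shared between cluster nodes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical transaction-scoped ("session") cache. NOT MyBatis's API;
// it only mirrors the idea of not fetching the same data twice per operation.
final class SessionCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> database; // stands in for a real database query
    private int databaseHits = 0;

    SessionCache(Function<K, V> database) {
        this.database = database;
    }

    // Returns the row if this session already fetched it, otherwise queries the "database".
    V select(K id) {
        return entries.computeIfAbsent(id, key -> {
            databaseHits++;
            return database.apply(key);
        });
    }

    int databaseHits() {
        return databaseHits;
    }
}

final class SessionCacheDemo {
    // One "operation": a fresh cache is created here and dropped when the method returns.
    static int runOperation() {
        SessionCache<String, String> session = new SessionCache<>(id -> "task-row-" + id);
        session.select("42");
        session.select("42"); // served from the session cache, no second query
        session.select("43");
        return session.databaseHits();
    }

    public static void main(String[] args) {
        System.out.println(runOperation()); // prints 2
        System.out.println(runOperation()); // prints 2 again: the previous session's cache is gone
    }
}
```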

Static data

As said, Activiti operates in a stateless way. But there is of course data that never changes, which makes it a prime candidate for caching.

A process definition is an example of such ‘static data’. You provide the Activiti engine with a BPMN 2.0 XML file, the engine parses it into something it can execute (more specifically: a tree of POJOs corresponding to the structure of the process definition) and stores the XML and some data such as the description, key, etc. in the database. It’s also very important to understand that Activiti does not store the parsed version of the process definition in the database. Such a process definition never changes: once it’s in the database, the stored data remains the same until the process definition is deleted.

On top of that, parsing a BPMN 2.0 XML file into something executable is quite a costly operation (XML parsing always is). This is why the Activiti engine internally uses a process definition cache to store the parsed version of the BPMN 2.0 XML. As shown in the picture below, the mechanism is pretty simple: whenever the process definition is needed, e.g. to start a process instance or to complete a task, the cache is checked first, and the process definition is only loaded and parsed when it is not cached yet.

[Image: ProcessDefinitionCache]
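The lookup described above is a classic cache-aside pattern. The following Java sketch illustrates it under simplifying assumptions: ParsedProcessDefinition, ProcessDefinitionLookup and LookupDemo are hypothetical names, the ‘database’ is just a map from process definition id to stored XML, and parse() only counts its calls instead of running a real BPMN parser. It is not Activiti’s actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the parsed, executable tree of POJOs.
final class ParsedProcessDefinition {
    final String id;
    ParsedProcessDefinition(String id) { this.id = id; }
}

final class ProcessDefinitionLookup {
    // Pre-5.12 style: a plain, unbounded map from process definition id to parsed definition.
    private final Map<String, ParsedProcessDefinition> cache = new HashMap<>();
    private final Map<String, String> database; // hypothetical: id -> stored BPMN 2.0 XML
    int parseCount = 0;

    ProcessDefinitionLookup(Map<String, String> database) { this.database = database; }

    ParsedProcessDefinition get(String processDefinitionId) {
        ParsedProcessDefinition cached = cache.get(processDefinitionId);
        if (cached != null) {
            return cached; // cache hit: no database access, no XML parsing
        }
        String xml = database.get(processDefinitionId); // cache miss: load the stored XML...
        if (xml == null) {
            throw new IllegalArgumentException("No process definition " + processDefinitionId);
        }
        ParsedProcessDefinition parsed = parse(processDefinitionId, xml); // ...parse it (the costly step)...
        cache.put(processDefinitionId, parsed); // ...and keep the result for the next request
        return parsed;
    }

    private ParsedProcessDefinition parse(String id, String xml) {
        parseCount++; // a real engine would run a BPMN 2.0 parser on the XML here
        return new ParsedProcessDefinition(id);
    }
}

final class LookupDemo {
    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("invoice:1", "<definitions/>");
        ProcessDefinitionLookup lookup = new ProcessDefinitionLookup(db);
        lookup.get("invoice:1"); // e.g. starting a process instance
        lookup.get("invoice:1"); // e.g. completing a task
        System.out.println(lookup.parseCount); // prints 1
    }
}
```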

Tweaking the process definition cache

The process definition cache has been part of Activiti since its first release and hasn’t seen any major changes since then. Unfortunately, the cache wasn’t very sophisticated: just a plain dumb hashmap of all process definitions the engine ever touched during its whole uptime. In most cases, this isn’t really a problem: you would need an awful lot of process definitions before it becomes a bottleneck.

However, my developer heart bled whenever I passed this class. So last week, I finally decided to spend some time on tweaking the process definition cache. You can find the git commit here:

https://github.com/Activiti/Activiti/commit/bc6fe9ef4cf9e98210a75fd59faf61ac5dc790f3

Basically, it changes two things:

  •  It allows setting a limit on the process definition cache. If such a limit is set, the default hashmap is replaced by an LRU cache with the provided hard limit.
  •  It makes the process definition cache (and also the rules cache) pluggable. I’ll explain this bit in a later blog post.
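Both changes can be sketched in Java as a small cache contract with two implementations. The DefinitionCache interface and the UnboundedCache, LruCache and CacheFactory classes are hypothetical illustrations, not the classes from the commit, and treating a limit of zero or less as ‘no limit’ is an assumption. The LRU variant relies on java.util.LinkedHashMap in access order, which drops the least recently used entry as soon as removeEldestEntry reports that the hard limit is exceeded.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pluggable cache contract (illustrative names, not Activiti's actual interface).
interface DefinitionCache<T> {
    T get(String id);
    void add(String id, T value);
    void remove(String id);
    void clear();
}

// No limit configured: an unbounded map that never evicts anything.
final class UnboundedCache<T> implements DefinitionCache<T> {
    private final Map<String, T> map = new HashMap<>();
    public T get(String id) { return map.get(id); }
    public void add(String id, T value) { map.put(id, value); }
    public void remove(String id) { map.remove(id); }
    public void clear() { map.clear(); }
}

// Limit configured: evicts the least recently used entry once the hard limit is exceeded.
final class LruCache<T> implements DefinitionCache<T> {
    private final Map<String, T> map;

    LruCache(int limit) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        this.map = new LinkedHashMap<String, T>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, T> eldest) {
                return size() > limit; // called after each put; true evicts the eldest entry
            }
        };
    }

    public T get(String id) { return map.get(id); }
    public void add(String id, T value) { map.put(id, value); }
    public void remove(String id) { map.remove(id); }
    public void clear() { map.clear(); }
    int size() { return map.size(); }
}

final class CacheFactory {
    // Assumed rule: no positive limit -> plain map; positive limit -> LRU with that hard limit.
    static <T> DefinitionCache<T> create(int limit) {
        if (limit <= 0) {
            return new UnboundedCache<>();
        }
        return new LruCache<>(limit);
    }
}
```

Because callers only depend on the interface, swapping the unbounded map for the LRU variant (or for a custom implementation) needs no changes in the code that looks up process definitions, which is what makes such a cache pluggable.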

Limiting the process definition cache

To demonstrate the use of the process definition cache limit, I knocked together a small command line example which you can find on github:

https://github.com/jbarrez/Activiti-process-definition-cache-pluggability

Note that the project already contains some code I will use in my next blog post. Try to act surprised when you see it then 😉 To build the demo jar, run mvn clean package shade:shade. Then go to the target folder and run the demo:

java -jar activiti-procdefcache-demo.jar default

This will boot an in-memory database, boot the process engine and create all the Activiti tables. The process engine is configured with a limit for the process definition cache in the activiti.cfg.xml configuration file:


<bean id="processEngineConfiguration" class="org.activiti.engine.impl.cfg.StandaloneProcessEngineConfiguration">

  ...

  <property name="processDefinitionCacheLimit" value="10" />

</bean>

The application will now ask you how many process definitions you want to deploy. You can fill in any number here. All the process definitions will be the same, except that the process definition id is changed, so that every process definition needs its own cache entry:

[Screenshot: demo01]

After all the process definitions have been deployed, the application will ask you how many process instances you want to start:

[Screenshot: demo02]

Again, fill in any number. Don’t expect anything fancy: you’ll just see a lot of logging passing by (the log level is set very low, because the process definition cache logs at a low level and you wouldn’t see its messages otherwise). But in that logging output, you will often see that the cache limit is hit. The reason is that I set the limit pretty low (10, see the configuration above) for demo purposes:

[Screenshot: demo03]
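Why the limit is hit so often can be reproduced without Activiti at all. The Java sketch below is a hypothetical simulation, not the demo’s code: the CacheLimitSimulation class and the demoProcess-N ids are made up, and it assumes that process instances are started round-robin over the deployed definitions (the demo’s actual access pattern may differ). It replays that pattern against an LRU map with the same limit of 10 as the configuration above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation: replays a round-robin start pattern against an LRU map.
// It does not use Activiti; it only counts how often a definition must be (re)parsed.
final class CacheLimitSimulation {

    static int countMisses(int definitions, int instances, int limit) {
        Map<String, Boolean> cache = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > limit; // the moment the cache limit is hit and an entry is evicted
            }
        };
        int misses = 0;
        for (int i = 0; i < instances; i++) {
            String id = "demoProcess-" + (i % definitions); // assumed round-robin over definitions
            if (cache.get(id) == null) {
                misses++;                   // cache miss: load from the database and parse again
                cache.put(id, Boolean.TRUE);
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        System.out.println(countMisses(5, 100, 10));  // 5: every definition fits in the cache
        System.out.println(countMisses(20, 100, 10)); // 100: every start is a cache miss
    }
}
```

As long as all definitions fit in the cache, each one is parsed only once. With more definitions than the limit and a cyclic access pattern, every definition has already been evicted by the time it is needed again, so each start triggers a reload; this is the worst case for an LRU policy, and a reason to size the limit with the number of actively used process definitions in mind.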

And that’s it! That’s all there is to setting the cache limit, using an LRU cache and tweaking the number of process definitions cached in your Activiti engine.

I want to try it!

This feature will be part of the Activiti 5.12 release. If you want to try it today, you can build a snapshot version of 5.12 until we release the real deal: clone Activiti from GitHub and run ‘mvn clean install’:

https://github.com/Activiti/Activiti

Sidenote: do not use a snapshot version of Activiti on a production database. Chances are high your database will be screwed when upgrading to the actual non-snapshot version.

8 Comments

  1. Amr April 18, 2013

    Hi
    Thanks for your blog. I just wanted to ask another question: can Activiti run on multiple duplicated databases, such as in a multi-data-center environment? The issue I’m facing is how to ‘propagate’ the database lock across the multiple sites, so that a process waiting for a timer, for example, fires only once and not in each of the data centers.
    Thanks
    Amr

  2. Joram Barrez April 18, 2013

    @Amr: I don’t quite get the use case. You mean Activiti goes to one database and that database is then duplicated? Why would you want a timer to fire multiple times, if the databases are duplicated?

  3. Mike Brown August 17, 2013

    Hi.

    What happens if the business process definition is updated?
    Will the cache refresh or do you have to bounce the process engine?

  4. Joram Barrez August 17, 2013

    @Mike: updating a process definition is not possible in Activiti, but you can deploy a new version of the same business process definition. The caching system will just work in this case.

  5. tommy November 8, 2013

    I am wondering: is there a way to reload processname.bpmn20.xml into the cache without deploying a new one or restarting the application?
    I mean, if I change the BPMN 2.0 XML model file of the process definition online, is there a way to delete the parsed one from the cache and reparse the new one into the cache?

  6. Joram Barrez November 12, 2013

    @Tommy: you will probably need to get the ProcessEngineConfiguration from the ProcessEngine and cast it to ProcessEngineConfigurationImpl (or one of its subclasses). That class has access to the processDefinitionCache, which has a clear() method.

  7. rajkumar November 12, 2014

    Hi

    When does the process definition get refreshed? I believe it happens when we restart the server. We have Activiti running on two instances and we do a rolling deployment. In that case, will the cache be cleared, or will it be maintained the same way?
    We have an issue where a recurring deployment keeps creating instances after being suspended. The process definition says it is suspended, but it still creates them.

  8. Joram Barrez November 14, 2014

    @rajkumar: the process definition is put in the cache when it isn’t there and you need to start a process instance of it/get a task/etc. It stays there until the cache impl evicts it.

    If you restart the server, the cache will be empty again. It will be filled again when the process definition is needed. So nothing is needed when you do rolling deployment.

    About your issue: probably better to discuss this on the forum, way more people see it there.
