Using a distributed cache as process definition cache in Activiti 5.12

In my last post, I described the general working of the process definition cache and how to limit the amount of data stored in it. If you haven’t read it yet, I would (of course) welcome you to read it first.

So what if the default cache implementations for some reason don’t cut it for you? Well, don’t worry, we made sure the cache is pluggable and it is very easy to inject your home-brew version. And in this post I’ll show you how to do it.

A distributed cache, you say?

As you could have guessed from the title, we’re going to swap the default process definition cache with a distributed one. Simply put, a distributed cache is generally a key-value store whose data is distributed across multiple nodes in a networked cluster. There are a few reasons why you might decide to do this:

  •  You are running Activiti in a cluster and you have an awful lot of process definitions. Most of them are used very frequently. Storing all these process definitions in the cache takes too much memory. But you also don’t want to introduce a hard cache limit because you want to avoid hitting the database too much on a cache miss.
  • You are running on off-the-shelf hardware with limited memory. You want to distribute the memory usage.
  • For some reason, database access is slow and you want to load every process definition only once for the whole cluster.
  • It is just plain cool.

There are plenty of distributed cache implementations: Infinispan, Hazelcast, GridGain, Ehcache and many, many others.

I chose Infinispan for the simple reason that I already knew its API. Besides personal preference, it also has some nice ‘extras’ beyond the distributability, such as support for JTA transactions when accessing the cache, or guarding against out-of-memory errors by evicting entries from the cache automatically. But the point of this post is to show you how easily you could swap this implementation with your personal preference.

Show me some code!

The first thing you need to do is to make your process engine aware of your process definition cache implementation. Add the following property to your activiti.cfg.xml:


<bean id="processEngineConfiguration" class="org.activiti.engine.impl.cfg.StandaloneProcessEngineConfiguration">
    ...

    <property name="processDefinitionCache">
        <bean class="org.activiti.cache.DistributedCache" />
    </property>

</bean>

The referenced class must implement the org.activiti.engine.impl.persistence.deploy.DeploymentCache interface, which looks as follows:


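public interface DeploymentCache<T> {

    // Returns the cached object for the given id, or null if it is not cached
    T get(String id);

    // Stores the object under the given id
    void add(String id, T object);

    // Removes the entry with the given id
    void remove(String id);

    // Removes all entries
    void clear();

}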

As you can see, this is a pretty generic interface, which makes it easy to plug in any kind of cache implementation. The Infinispan implementation looks as follows:


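import org.activiti.engine.impl.persistence.deploy.DeploymentCache;
import org.activiti.engine.impl.persistence.entity.ProcessDefinitionEntity;
import org.infinispan.Cache;
import org.infinispan.manager.CacheContainer;
import org.infinispan.manager.DefaultCacheManager;

public class DistributedCache implements DeploymentCache<ProcessDefinitionEntity> {

    protected Cache<String, ProcessDefinitionEntity> cache;

    public DistributedCache() {
        try {
            // Start an Infinispan cache manager from the XML configuration file
            CacheContainer manager = new DefaultCacheManager("inifispan-cfg.xml");
            this.cache = manager.getCache();
            // Listener used only to log the contents of the cache
            this.cache.addListener(new CacheListener());
        } catch (Exception e) {
            // Fail fast: with a null cache, every later call would throw a NullPointerException
            throw new IllegalStateException("Could not start the distributed process definition cache", e);
        }
    }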

    public ProcessDefinitionEntity get(String id) {
        return cache.get(id);
    }

    public void add(String id, ProcessDefinitionEntity processDefinitionEntity) {
        cache.put(id, processDefinitionEntity);
    }

    public void remove(String id) {
        cache.remove(id);
    }

    public void clear() {
        cache.clear();
    }

}

The real meat here is in the constructor. All the other methods work exactly as they would with a regular HashMap. In the constructor, a distributed cache is created using a specific configuration file. The cache listener is added there just for logging purposes, so you can see the contents of the cache in the logs. The actual Infinispan config is pretty simple (also, kudos to the Infinispan team, really good docs!):


<infinispan>
    <global>
        <transport>
            <properties>
                <property name="configurationFile" value="jgroups-tcp.xml" />
            </properties>
        </transport>
    </global>
    <default>
        <!-- Configure a synchronous distributed cache -->
        <clustering mode="distribution">
            <sync />
            <hash numOwners="2" />
        </clustering>
    </default>
</infinispan>

For the actual details, I kindly refer you to the Infinispan documentation. Basically, this config uses JGroups to handle the communication between the nodes over TCP. The clustering section sets up a synchronous, distributed cache, and numOwners="2" states that we want every entry to be stored on two nodes in the cluster (which means a node can fail without data being lost).
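To get a feeling for what "two owners per entry" means, here is a toy sketch in plain Java. It is not how Infinispan does it internally (Infinispan has its own consistent hashing); the class name ToyOwnerSelector, the node names and the process definition ids are all made up for illustration. It only shows the idea: the hash of the key picks a starting node, and the entry is also copied to the next node, so losing any single node still leaves one copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: a stand-in for the idea behind numOwners,
// not Infinispan's actual hashing algorithm.
class ToyOwnerSelector {

    private final List<String> nodes;
    private final int numOwners;

    ToyOwnerSelector(List<String> nodes, int numOwners) {
        this.nodes = new ArrayList<String>(nodes);
        // A key cannot have more owners than there are nodes
        this.numOwners = Math.min(numOwners, nodes.size());
    }

    // Picks numOwners consecutive nodes, starting at a position derived from the key's hash
    List<String> ownersOf(String key) {
        List<String> owners = new ArrayList<String>();
        if (nodes.isEmpty()) {
            return owners;
        }
        // Mask the sign bit so the index is never negative
        int start = (key.hashCode() & Integer.MAX_VALUE) % nodes.size();
        for (int i = 0; i < numOwners; i++) {
            owners.add(nodes.get((start + i) % nodes.size()));
        }
        return owners;
    }
}

class ToyOwnerSelectorDemo {
    public static void main(String[] args) {
        List<String> cluster = Arrays.asList("node-A", "node-B", "node-C");
        ToyOwnerSelector selector = new ToyOwnerSelector(cluster, 2);
        // Hypothetical process definition ids
        for (String key : new String[] {"invoice:1:4", "holiday:3:12", "order:2:9"}) {
            System.out.println(key + " -> " + selector.ownersOf(key));
        }
    }
}
```

Because the two owners of a key are always different nodes, shutting down one of them never removes the last copy. A real implementation also has to move entries around when nodes join or leave, which this sketch leaves out.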

I want to try it myself!

To demonstrate the use of the distributed cache, I knocked together a small command line example which you can find on github:

https://github.com/jbarrez/Activiti-process-definition-cache-pluggability

To build the demo jar, run mvn clean package shade:shade. Go to the target folder and run the demo:

java -jar activiti-procdefcache-demo.jar distributed

This will boot an in-memory database, boot the process engine and create all the Activiti tables. The application will ask you for a number of process definitions to generate. Fill in anything you like, but you can make it pretty big, because the data will be distributed anyway. Once all process definitions are deployed, you can start process instances. Open a few new terminals and execute the command above again in each of them. When you start new process instances now, the logging will show you that the cached process definitions are spread across the nodes. You will also see that some nodes have more entries than others.

This is what it looks like when I fire up 9 nodes, with 1000 process definitions in the database:

cache_screenshot01

And when you shut down some nodes again (here only three survive), you will see that Infinispan takes care of distributing all the cached entries nicely across the remaining nodes:

cache_screenshot02

All the cached process definitions are now nicely spread across all the nodes in the cluster. Isn’t that pretty?

So that’s all it takes to plug in your own cache implementation. It doesn’t need to be a distributed cache of course, any Java implementation will do. The only limit is your imagination … ;-)
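To illustrate that last point, here is a minimal non-distributed sketch backed by a ConcurrentHashMap. So that it compiles on its own, it declares a local stand-in with the same four methods as Activiti's DeploymentCache; in a real project you would implement the engine's interface instead. The class name InMemoryDeploymentCache, the extra size() method and the sample id are my own inventions, not part of the demo project.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Local stand-in mirroring the four methods of Activiti's DeploymentCache,
// so this sketch compiles without the engine on the classpath.
interface DeploymentCache<T> {
    T get(String id);
    void add(String id, T object);
    void remove(String id);
    void clear();
}

// Hypothetical single-JVM implementation backed by a thread-safe map
class InMemoryDeploymentCache<T> implements DeploymentCache<T> {

    private final ConcurrentMap<String, T> entries = new ConcurrentHashMap<String, T>();

    public T get(String id) {
        return entries.get(id);
    }

    public void add(String id, T object) {
        entries.put(id, object);
    }

    public void remove(String id) {
        entries.remove(id);
    }

    public void clear() {
        entries.clear();
    }

    // Not part of the interface; handy for logging and tests
    public int size() {
        return entries.size();
    }
}

class InMemoryDeploymentCacheDemo {
    public static void main(String[] args) {
        InMemoryDeploymentCache<String> cache = new InMemoryDeploymentCache<String>();
        // A String stands in for the cached process definition
        cache.add("invoice:1:4", "invoice process, version 1");
        System.out.println(cache.get("invoice:1:4"));
        cache.remove("invoice:1:4");
        System.out.println(cache.get("invoice:1:4")); // null once the entry is removed
    }
}
```

Typed as DeploymentCache<ProcessDefinitionEntity> against the real interface, a class like this could be referenced from the processDefinitionCache property in the same way as the distributed one.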

One Comment

  1. Larry July 17, 2013

    Today I learnt about a new concept.You have explained it in detail. Thanks!!!
