Posts in Category: Alfresco

Upcoming Webinar: Process Driven Spring Applications with Activiti – Sept 23rd


Next week, I’ll be doing a webinar together with my friend Josh Long (he’s a Spring Developer Advocate and a committer to many open source projects, including of course Activiti). I will show some of the new Activiti tooling we’ve been working on recently, while Josh will demonstrate with live coding how easy it is to use Activiti with Spring Boot (spoiler: really easy).

You can register for the webinar for free here: https://spring.io/blog/2014/07/29/webinar-process-driven-spring-applications-with-activiti-sept-23rd

One day later, the Alfresco Summit kicks off in San Francisco, where I’m joining two talks.

For those who can’t make it to San Francisco: don’t worry, we’ll be landing in London two weeks later, Oct 7-9!

Execute Custom queries in Activiti

(This will probably end up in the user guide of the Activiti 5.15 release, but I wanted to share it already)

The Activiti API allows interacting with the database through a high-level API. For example, for retrieving data, the Query API and the Native Query API are quite powerful. However, for some use cases they might not be flexible enough. The following section describes how a completely custom SQL statement (selects, inserts, updates and deletes are all possible) can be executed against the Activiti data store, yet completely within the configured process engine (thus leveraging its transaction setup, for example).

To define custom SQL statements, the Activiti engine leverages the capabilities of its underlying framework, MyBatis. The first thing to do when using custom SQL is to create a MyBatis mapper class (more info can be found in the MyBatis user guide). For example, suppose that for some use case not the whole task data is needed, but only a small subset of it. A mapper that does this looks as follows:

import java.util.List;
import java.util.Map;

import org.apache.ibatis.annotations.Select;

public interface MyTestMapper {

  @Select("SELECT ID_ as id, NAME_ as name, CREATE_TIME_ as createTime FROM ACT_RU_TASK")
  List<Map<String, Object>> selectTasks();

}

This mapper must be provided to the Process Engine configuration as follows:

...
<property name="customMybatisMappers">
  <set>
    <value>org.activiti.standalone.cfg.MyTestMapper</value>
  </set>
</property>
...

Notice that this is an interface. The underlying MyBatis framework will create an instance of it that can be used at runtime. Also notice that the return value of the method is not typed, but a list of maps (which corresponds to a list of rows with column values). Typing is possible with MyBatis mappers if wanted, as sketched below.
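
For instance, a typed variant could look like the sketch below. This is an illustration rather than code from the Activiti codebase: MyBatis can map the aliased columns onto a simple bean by property name, and the TaskSummary class is hypothetical.

// TaskSummary.java (illustrative DTO; getters and setters omitted for brevity,
// but MyBatis needs setters to populate the bean)
public class TaskSummary {

  private String id;
  private String name;
  private java.util.Date createTime;

}

// MyTypedMapper.java (illustrative typed mapper)
public interface MyTypedMapper {

  @Select("SELECT ID_ as id, NAME_ as name, CREATE_TIME_ as createTime FROM ACT_RU_TASK")
  List<TaskSummary> selectTasks();

}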

To execute the query above, the managementService.executeCustomSql method must be used. This method takes a CustomSqlExecution instance: a wrapper that hides the internal bits of the engine otherwise needed to make it work.

Unfortunately, Java generics make it a bit less readable than it could have been. The two generic types below are the mapper class and the return type. The actual logic simply calls the mapper method and returns its result (if applicable).

CustomSqlExecution<MyTestMapper, List<Map<String, Object>>> customSqlExecution =
    new AbstractCustomSqlExecution<MyTestMapper, List<Map<String, Object>>>(MyTestMapper.class) {

  public List<Map<String, Object>> execute(MyTestMapper customMapper) {
    return customMapper.selectTasks();
  }

};

List<Map<String, Object>> results = managementService.executeCustomSql(customSqlExecution);

The Map entries in the resulting list will only contain the id, name and create time in this case, not the full task object.

Any SQL is possible using this approach. Another, more complex example:

  @Select({
    "SELECT task.ID_ as taskId, variable.LONG_ as variableValue FROM ACT_RU_VARIABLE variable",
    "inner join ACT_RU_TASK task on variable.TASK_ID_ = task.ID_",
    "where variable.NAME_ = #{variableName}"
  })
  List<Map<String, Object>> selectTaskWithSpecificVariable(String variableName);

Using this method, the task table is joined with the variables table. Only rows where the variable has the given name are retained, and the task id with the corresponding numerical value is returned.
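
Invoking this mapper method goes through the same CustomSqlExecution wrapper as before. A minimal sketch, where the variable name ‘amount’ is purely illustrative:

List<Map<String, Object>> results = managementService.executeCustomSql(
    new AbstractCustomSqlExecution<MyTestMapper, List<Map<String, Object>>>(MyTestMapper.class) {

  public List<Map<String, Object>> execute(MyTestMapper customMapper) {
    // the argument is bound to the #{variableName} placeholder in the query
    return customMapper.selectTaskWithSpecificVariable("amount");
  }

});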

This will be available in Activiti 5.15. However, the code (more specifically the Command implementation and the wrapper interface) can be used in any older version of Activiti.

Reporting capabilities in Activiti 5.12

In the Activiti 5.12 release, we added reporting capabilities on top of the Activiti engine, demonstrating the concepts through the Activiti Explorer web application (but of course usable everywhere).

Now, don’t be fooled: for a very long time already, the Activiti engine has had the capability of gathering historical or audit data when you execute business processes. All this data is stored in the historical database tables and can thus be easily queried. This means that any reporting tool such as JasperReports, BIRT, Crystal Reports, etc. can simply use the historical tables as a data source to produce reports in any format you’d like (Word, PDF, …) and give insight into how your business is executing its business processes. I’ll probably blog such an example pretty soon.

Eating our own dogfood

But the thing I’d like to focus on today is the web side of things: web charts/reports which can be combined into a dashboard, for example. The first thing we must be able to do is expose the historical data in a way that lets us create these charts. But where do you put the logic (the queries and data manipulation) that generates a dataset for the chart and/or the report? Do you embed the SQL in your UI layer? Of course not. What if multiple applications want to use the data? What if we want to store the generated dataset to get a snapshot of the data at a certain point in time?

When we thought about this problem, we first approached it in the traditional way: a new service with reporting capabilities, probably using some sort of DSL to define the dataset generation, stored in some kind of data store. In short, a whole set of new things and concepts to learn and master. Not to mention the extra implementation and maintenance.

But then it hit us. Everything we needed is already available in the Activiti engine. If we use a process to define the logic that creates the dataset for the report, we can leverage all the facilities of the engine. The only requirement for this process is that it generates JSON data following a fixed format. Some benefits:

  • The process has direct access to the internals of the Activiti engine, including the database used by the engine.
  • The generated dataset can be stored in the historical tables of Activiti if wanted. So we get a ‘save report data’ mechanism for free.
  • The job executor can be used as for any other process. This means that you can generate the report asynchronously, or execute only certain steps asynchronously. It also means you can use timers, e.g. to generate the report data at certain points in time.
  • Creating a new report can be done with known tools and known concepts; no new concepts, services or applications are needed. Deploying or uploading a new report is the same as deploying a new process. Generating a report is the same as running a process instance.
  • It allows using BPMN 2.0 constructs. This means that parallel steps, branching based on data, or even requesting user input during the generation are all possible out-of-the-box.


A dash of Javascript

Since the generation of the dataset is done by a process, everything possible in a process can be used. So you can use Java delegate classes or whatever you fancy.

But since the kool kids nowadays are using Javascript, we added some example processes to the demo data of Activiti Explorer that use the scripting functionality of the engine. The nice thing about Javascript is that JSON is native to it, so creating a JSON object is really easy. As said above, the only requirement for such a process is that it generates the JSON dataset following the predefined format.


For example, to generate an overview of all past process instances, we could have a process like this:


<?xml version="1.0" encoding="UTF-8"?>
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:activiti="http://activiti.org/bpmn"
 xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:omgdc="http://www.omg.org/spec/DD/20100524/DC"
 xmlns:omgdi="http://www.omg.org/spec/DD/20100524/DI" typeLanguage="http://www.w3.org/2001/XMLSchema"
 expressionLanguage="http://www.w3.org/1999/XPath"
 targetNamespace="activiti-report">

<process id="process-instance-overview-report" name="Process Instance Overview" isExecutable="true">

 <startEvent id="startevent1" name="Start" />
 <sequenceFlow id="flow1" sourceRef="startevent1" targetRef="generateDataset" />

 <scriptTask id="generateDataset" name="Execute script" scriptFormat="JavaScript" activiti:autoStoreVariables="false">
 <script><![CDATA[

  importPackage(java.sql);
  importPackage(java.lang);
  importPackage(org.activiti.explorer.reporting);

  var result = ReportingUtil.executeSelectSqlQuery("SELECT PD.NAME_, PD.VERSION_, count(*) FROM ACT_HI_PROCINST PI"
       + " inner join ACT_RE_PROCDEF PD on PI.PROC_DEF_ID_ = PD.ID_ group by PROC_DEF_ID_");

  var reportData = {};
  reportData.datasets = [];
  
  // Native json usage
  var dataset = {};
  dataset.type = "pieChart";
  dataset.description = "Process instance overview (" + new java.util.Date() + ")";
  dataset.data = {};

  while (result.next()) { // process results one row at a time
    var name = result.getString(1);
    var version = result.getLong(2);
    var count = result.getLong(3);
    dataset.data[name + " (v" + version + ")"] = count;
  }
  reportData.datasets.push(dataset);

  // Storing the json as process variable
  execution.setVariable("reportData", new java.lang.String(JSON.stringify(reportData)).getBytes("UTF-8"));
 ]]></script>
 </scriptTask>
 <sequenceFlow id="flow3" sourceRef="generateDataset" targetRef="theEnd" />

 <endEvent id="theEnd" />

 </process>

</definitions>

The script is pretty easy to understand. All it does is query the database, create a JSON object with the data and store it as a process variable. Since it is stored as a variable, it is basically a snapshot of the dataset. So at any point in the future you can simply fetch the JSON data and look at the report as it was at that moment in time.
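
A minimal sketch of fetching such a snapshot back later, assuming the HistoricVariableInstanceQuery API of the HistoryService is available in your version (the reportProcessInstanceId variable is illustrative):

HistoricVariableInstance reportVariable = historyService
    .createHistoricVariableInstanceQuery()
    .processInstanceId(reportProcessInstanceId)
    .variableName("reportData")
    .singleResult();

// the script stored the dataset as UTF-8 bytes, so decode it back into a JSON string
byte[] reportDataBytes = (byte[]) reportVariable.getValue();
String reportJson = new String(reportDataBytes, java.nio.charset.StandardCharsets.UTF_8);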

The JSON produced by the process above can then easily be used to generate charts, as demonstrated by the Activiti Explorer application:

[Screenshot: the generated charts as shown in Activiti Explorer]

It is also easy to see how a dashboard app could be created with the same approach. But that’ll be for a next blog post.

Thanks for reading!

The Death of Google Reader: Why Open Source Matters in a Cloud Era

This morning at the breakfast table I read the news that Google is shutting down Google Reader in July. I was completely flabbergasted. Every day, sipping my morning coffee, I go to Google Reader to see what has happened in the software world the past day. That has been my routine since 2008, regardless of which employer I had or which project I was doing. I can’t think of any other service I have used for that long and for that amount of time per day. Or maybe I can … GMail. And it got me thinking.


But let me take a step back first. Google proclaims it has seen a decline in Reader’s usage. Yet when I look at my Twitter feed and my RSS feeds (in Reader), all I can see is the Google Reader news. Yes, I’m probably biased, since I’m a software developer and I tend to talk with and follow fellow geeks. This is how Mashable.com puts it:

Hear that clunking sound? That’s thousands of jaws dropping at the news that Google Reader is going to be retired come July 1, 2013. That whooshing sound is “Google Reader” shooting to the top of Twitter’s worldwide trends, even on a day when a new pope was picked.

And that giant “NOOOOOOOO” sound is the Internet’s reaction to Google’s most unpopular decision in — well, as far back as I can remember.

I would gladly pay to use Google Reader. According to my stats, I’ve read about 35,000 blog posts through Reader since I started using it. It is my single source for keeping up to date with the industry, and Twitter or any other social medium by no means comes close. The noise on there is just too large. Somewhere I read: “Google Reader is to Twitter as a filing cabinet is to a bag of cats”.

I also hope that Google thought very well about the people they are pissing off now: a vocal and influential group (if only because they install and maintain the computers of their other family members ;-) ). Kyle Wild, CEO at Keen_IO, put it clearly.

Why Open Source Matters

But let’s quit whining. We all knew this could happen one day, right? After all, Reader is in the hands of one colossal company that is publicly traded on the stock market. They aren’t doing this for charity.

But I, like many others, rely on Google every day. It runs my life: e-mail, calendar, navigation … heck, when my internet is down I check www.google.com, because it is always there. And in the back of our heads we know there is one company behind all these things, and yes, we know that this is a bad thing … but Google is not evil, right?

The facts, however, are plain and simple: if you don’t control it, the company owning it might pull the plug any day. They have every right to do so. And that brings me to the title of this post (by the way, I wonder what the impact on my visitor numbers will be once Reader goes down). I’m an open source guy. And this move by Google really reinforced my belief in open source software (again).

You see, if we decided to pull the plug on Activiti or Alfresco today, it would be bad news for sure. But because the software we write is open source, it would only mean that the people currently writing code for the Activiti and Alfresco projects are gone. Activiti and Alfresco would still exist. The code and the documentation would still be there. You could still open the code and patch it. Other committers would still be there. Somebody, or some other company, could take up leadership and continue.

The point is: you’re not at the mercy of one single company. And in this era, where everything is becoming cloud-based and closed, we should really think about what brought us to open source software in the first place. Do we want to put our businesses in the hands of other mega-corporations who care very little about us? Do we want a vendor/service lock-in for our critical businesses?

I’m very happy that the company I work for, Alfresco, has the right mindset on this, and not only because they let me work on open source software. Yes, we do have Alfresco Cloud. But it is built on the same codebase as the one we ship as the community edition. If for some reason the plug were pulled, anybody could take the code, tweak it and run it. Even build their own cloud version. With Alfresco and Activiti, we’ve got nothing to hide. Our code is right there. That is our strength. It’s not only about a kick-ass product. It’s about openness, and the freedom to be in control of your own path, regardless of what happens.

Don’t get me wrong though, I’m not saying everything should be open source. Companies still need to make money. But the core, the foundations, need to be. Take GitHub for example: it offers services on top of an open source version control system (Git). If GitHub pulls the plug, I can happily keep coding away. Sure, it would hurt a bit in the beginning, but I’m not locked in.

That being said … does anybody have a decent alternative for Google Reader? Preferably open source.

Try Activiti Explorer 5.12 now on CloudBees for free!

Running Activiti in the cloud is really easy. All you need is a simple database, plus a web container if you’d like to run the Activiti Explorer UI. Since Activiti uses very little memory and scales/clusters horizontally out-of-the-box, it is a perfect fit for a cloud deployment.

Doing such a cloud deployment gets easier every day. Of course you can set up your own stack on Amazon, but nowadays there are many PaaS solutions that make your life much easier by offering dedicated platforms. At least from a click-and-run point of view, these are much easier to work with. CloudBees is such a PaaS, specifically tailored to developers and Java applications.

For framework builders like us, CloudBees offers a very interesting concept called a ‘clickstart’. If you’re interested, here are the full details. But basically, you just have to put a simple JSON file online (I chose to use GitHub) that tells CloudBees where to find your war file and what kind of configuration you want. If you want, you can also specify a repository and CloudBees will build and deploy it for you (they hired the lead developer of Jenkins, after all). The CloudBees platform then allows you to go to a special URL, passing the URL of your JSON file as a parameter, which boots up a cloud instance with your app.

So, since we released Activiti 5.12 two days ago (at midnight, at the bar, no less), I thought it was a good idea to create a CloudBees clickstart for Activiti Explorer 5.12. Simply click the button below (it goes to that special URL). You will have to create a CloudBees account (don’t worry: it’s free, and you don’t need to provide a credit card like on Amazon) to run your own personal instance of Activiti Explorer 5.12 on the CloudBees cloud.

If that is too much hassle (trust me, it isn’t), you can also try out my demo instance. I’m assuming you’ll see some delays once people start hitting it … so it’s best to try it on your own account.

Update: some people think it is funny to change the user passwords (I do too, in some way ;-) ) … so if you can’t log in to my instance, you’ll have to wait until I restart it … or run your own instance :-0

What?!? Activiti needs how much memory?

A while ago, somebody proclaimed that their application sometimes went out of memory due to Activiti. I don’t need to tell you that this hurt my developer heart. We know our architecture is sound and very resource-friendly. But without hard numbers, anybody can blurt out that Activiti is a memory hog without us being able to counter it.

So I decided to do the sane thing. I measured … and boy, was I surprised with the results!

Setup

I cobbled together something which would mimic typical Activiti usage. The code is open and available on

https://github.com/jbarrez/Activiti-Memory-Usage-Test

This program does the following:

  • Has a thread that starts a new process instance every few milliseconds
  • Has a thread that fetches counts from the history table and prints them on the screen
  • Has a group of threads mimicking users that fetch and complete tasks (a simplified sketch of such a ‘user’ thread is shown below)
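
To give an idea, this is a simplified sketch (not the actual code from the repository above) of what such a ‘user’ thread could look like:

import java.util.List;
import java.util.Random;

import org.activiti.engine.TaskService;
import org.activiti.engine.task.Task;

// Simplified sketch of a benchmark 'user': fetch a page of open tasks,
// complete them, then sleep for a random amount of time. A real implementation
// would also guard against two users completing the same task concurrently.
public class UserThread extends Thread {

  private final TaskService taskService;
  private final Random random = new Random();

  public UserThread(TaskService taskService) {
    this.taskService = taskService;
  }

  public void run() {
    while (!isInterrupted()) {
      List<Task> tasks = taskService.createTaskQuery().listPage(0, 10);
      for (Task task : tasks) {
        taskService.complete(task.getId());
      }
      try {
        Thread.sleep(random.nextInt(20000)); // 0-20 seconds, see the settings below
      } catch (InterruptedException e) {
        return;
      }
    }
  }

}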

The processes that are started are randomly chosen from five deployed processes (in the order of the picture below):

  • A simple process with four user tasks
  • A process with five user tasks and a parallel gateway, where all the user tasks are asynchronous
  • A process with a simple script
  • A process with an exclusive choice with four branches
  • A process with a subprocess with a timer

[Image: the five process definitions]

Settings

I decided to use the following parameters for running the test. Note that I did not try to play around with these settings. It could very well be that a much higher throughput is possible, but I believe the numbers chosen here adequately represent a company doing a fair amount of business process management.

  • 50 users. This means there will be 50 threads asking every x seconds for tasks and completing them
  • Run for 30 minutes
  • Start 120 processes per minute (i.e. 2 per second)
  • Have the user threads sleep for a random amount of time between 0 and 20 seconds.

Again, I didn’t check what the limit of these numbers was. It could very well be that you can start 500 processes per second. But the point here is memory usage.

Also, I’m using a standard MySQL installation (just installed, nothing tweaked) as the database.

A trip down memory lane

To make it a bit more interesting, I decided to start low and build up from there. So I ran the benchmark using 32 MB of heap space:

java -jar -Xms32M -Xmx32M -XX:+UseG1GC activiti-memory-usage.jar

Note that I’m using the new G1 garbage collector, which is supposed to do well in cases where memory usage is more than 50% of the max heap. I also attached the YourKit profiler to get insight into the memory usage. I let the benchmark run for 30 minutes. When I came back, the following statistics were shown:

[Screenshot: benchmark statistics after the 32 MB run]

So, to my surprise, 32 MB was enough to finish the benchmark! In those 30 minutes, 3157 process instances were finished and 12802 tasks were completed! What’s more, the profiler showed me that it wasn’t even using all of the available memory (see the first chart)!

[Screenshot: heap memory usage chart]

You can also see that when the garbage collector passes by, only 13 MB is actually in use:

[Screenshot: heap usage right after garbage collection]

And the CPU was really bored during the benchmark: it never really goes above 10% usage.

[Screenshot: CPU usage chart]

Also, there was quite a bit of garbage collection going on (28 seconds out of 30 minutes), which is to be expected:

[Screenshot: garbage collection times]

How low can you go?

The first test taught us that 32 MB is more than enough to run this ‘BPM platform’. And like I said, I believe that this load isn’t that different from that of a typical company using a BPM solution. But how low can we go?

So I reran my tests using less memory. I quickly learned that with less than 32 MB of RAM, I couldn’t complete the benchmark with the YourKit profiler attached; the profiler agent probably also steals some memory. So I ran the next benchmarks without the profiler:

java -jar -XmsXXXM -XmxXXXM -XX:+UseG1GC activiti-memory-usage.jar

I tried 24 MB. Success!

I went down to 16 MB. Success!

I went down to 14 MB. Dang! Out of heap space. But no worries: the exception occurred when the BPMN diagram was generated during process deployment. This takes quite a bit of memory, as Java2D is involved and the PNG is built up in memory. So I configured the engine not to generate this diagram (setting ‘createDiagramOnDeploy’ to false, as sketched below). And yes: success!
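
Programmatically, that setting looks something like the sketch below (it can also be set as a property in activiti.cfg.xml; the exact property name is best verified against your Activiti version):

ProcessEngineConfiguration configuration =
    ProcessEngineConfiguration.createStandaloneInMemProcessEngineConfiguration();
// skip generating the process diagram image on deployment, saving memory
configuration.setCreateDiagramOnDeploy(false);
ProcessEngine processEngine = configuration.buildProcessEngine();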

I went down to 12 MB. Success!

And 12 MB of RAM was the lowest I could go. With less memory, you quickly get an ‘out of heap space’ exception. The statistics for the 12 MB run are actually quite similar to those of the 32 MB version.

[Screenshot: statistics for the 12 MB run]

Let me rephrase that: a measly twelve megabytes of RAM!! Twelve!!

Conclusion

Activiti (or at least my approximation of a typical Activiti load) needs 12 MB of memory to run. Probably even less, because the fifty user threads also take up some memory here. To put this in perspective:

  • An iPhone 5 has 85 times more RAM (1 GB).
  • A Raspberry Pi ($25 version) has 21 times more RAM (256 MB). The $35 version has 42 times more RAM (512 MB).
  • An Amazon micro instance has 51 times more RAM (613 MB).
  • The ‘biggest’ Amazon machine you can get at the moment has 2560 times more RAM (30 GB).

Edit 5 Feb 2013: see the comments below. Andreas has succeeded in running it with 9 MB!

Of course, in a ‘real’ application you’d also need a web container, servlets, a REST layer, etc. Also, I didn’t touch the PermGen settings. But that holds for all Java programs. The point remains the same: Activiti is REALLY memory-friendly! And we learned earlier that Activiti is also really fast.

So why even bother looking at the competition?

Using a distributed cache as process definition cache in Activiti 5.12

In my last post, I described the general working of the process definition cache and how to limit the amount of data stored in it. If you haven’t read it yet, I would (of course) welcome you to read it first.

So what if the default cache implementations for some reason don’t cut it for you? Well, don’t worry: we made sure the cache is pluggable, and it is very easy to inject your home-brewed version. In this post I’ll show you how to do it.

A distributed cache, you say?

As you could have guessed from the title, we’re going to swap the default process definition cache with a distributed one. Simply put, a distributed cache is generally a key-value store whose data is distributed across multiple nodes in a networked cluster. There are a few reasons why you might decide to do this:

  • You are running Activiti in a cluster and you have an awful lot of process definitions, most of which are used very frequently. Storing all these process definitions in the cache takes too much memory, but you also don’t want to introduce a hard cache limit, because you want to avoid hitting the database too often on a cache miss.
  • You are running on off-the-shelf hardware with limited memory and want to distribute the memory usage.
  • For some reason, database access is slow and you want to load every process definition only once for the whole cluster.
  • It is just plain cool.

There are plenty of distributed cache implementations: Infinispan, Hazelcast, GridGain, Ehcache and many, many others.

I chose Infinispan for the simple reason that I already knew its API. Besides personal preference, it also has some nice extras beyond being distributed, such as support for JTA transactions when accessing the cache, and dealing with out-of-memory situations by automatically evicting entries from the cache. But the point of this post is to show you how easily you could swap this implementation with your personal favourite.

Show me some code!

The first thing you need to do is make your process engine aware of your process definition cache implementation. Add the following property to your activiti.cfg.xml:


<bean id="processEngineConfiguration" class="org.activiti.engine.impl.cfg.StandaloneProcessEngineConfiguration">
    ...

    <property name="processDefinitionCache">
        <bean class="org.activiti.cache.DistributedCache" />
    </property>

</bean>

The referenced class must implement the org.activiti.engine.impl.persistence.deploy.DeploymentCache interface, which looks as follows:


public interface DeploymentCache<T> {

    T get(String id);

    void add(String id, T object);

    void remove(String id);

    void clear();

}

As you can see, this is a pretty generic interface, which makes it easy to plug in any kind of cache implementation. The Infinispan implementation looks as follows:


import org.activiti.engine.impl.persistence.deploy.DeploymentCache;
import org.activiti.engine.impl.persistence.entity.ProcessDefinitionEntity;
import org.infinispan.Cache;
import org.infinispan.manager.CacheContainer;
import org.infinispan.manager.DefaultCacheManager;

public class DistributedCache implements DeploymentCache<ProcessDefinitionEntity> {

    protected Cache<String, ProcessDefinitionEntity> cache;

    public DistributedCache() {
        try {
            CacheContainer manager = new DefaultCacheManager("infinispan-cfg.xml");
            this.cache = manager.getCache();
            this.cache.addListener(new CacheListener());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public ProcessDefinitionEntity get(String id) {
        return cache.get(id);
    }

    public void add(String id, ProcessDefinitionEntity processDefinitionEntity) {
        cache.put(id, processDefinitionEntity);
    }

    public void remove(String id) {
        cache.remove(id);
    }

    public void clear() {
        cache.clear();
    }

}

The real meat here is in the constructor; all the other methods work exactly like a regular HashMap. In the constructor, a distributed cache is created using a specific configuration file. The cache listener is added there purely for logging purposes, so you can see the contents of the cache in the logs. The actual Infinispan config is pretty simple (also, kudos to the Infinispan team: really good docs!):


<infinispan>
    <global>
        <transport>
            <properties>
                <property name="configurationFile" value="jgroups-tcp.xml" />
            </properties>
        </transport>
    </global>
    <default>
        <!-- Configure a synchronous replication cache -->
        <clustering mode="distribution">
            <sync />
            <hash numOwners="2" />
        </clustering>
    </default>
 </infinispan>

For the actual details, I kindly refer you to the Infinispan documentation. Basically, this config uses JGroups to facilitate the communication over TCP. The numOwners="2" setting states that we want the data to be held by at least two nodes in the cluster (which means a node can fail without data being lost).

I want to try it myself!

To demonstrate the use of the distributed cache, I knocked together a small command line example which you can find on github:

https://github.com/jbarrez/Activiti-process-definition-cache-pluggability

To build the demo jar, run mvn clean package shade:shade. Go to the target folder and run the demo:

java -jar activiti-procdefcache-demo.jar distributed

This will boot an in-memory database, start the process engine and create all the Activiti tables. The application will ask you for a number of process definitions to generate. Fill in anything you like; you can make it pretty big, because the data will be distributed anyway. Once all process definitions are deployed, you can start process instances. Open a few new terminals and execute the command above again there. When you start new process instances now, the logging will show you that the cached process definitions are spread across the nodes. You will also see that some nodes have more entries than others.

This is how it looks when I fire up 9 nodes, with 1000 process definitions in the database:

[Screenshot: cache entries spread across 9 nodes]

And when you shut down some nodes again (here only three survive), you will see that Infinispan takes care of nicely redistributing all the cached entries across the remaining nodes:

[Screenshot: cache entries redistributed across the three remaining nodes]

All the cached process definitions are now nicely spread across all the nodes in the cluster. Isn’t that pretty?

So that’s all it takes to plug in your own cache implementation. It doesn’t need to be a distributed cache, of course: any Java implementation will do, as the sketch below shows. The only limit is your imagination … ;-)
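
For instance, a minimal non-distributed implementation could be as simple as the following sketch: a plain thread-safe map, with none of Infinispan’s replication.

import java.util.concurrent.ConcurrentHashMap;

import org.activiti.engine.impl.persistence.deploy.DeploymentCache;
import org.activiti.engine.impl.persistence.entity.ProcessDefinitionEntity;

// Hedged sketch of the simplest possible pluggable cache implementation
public class SimpleMapCache implements DeploymentCache<ProcessDefinitionEntity> {

    protected ConcurrentHashMap<String, ProcessDefinitionEntity> cache =
        new ConcurrentHashMap<String, ProcessDefinitionEntity>();

    public ProcessDefinitionEntity get(String id) {
        return cache.get(id);
    }

    public void add(String id, ProcessDefinitionEntity processDefinition) {
        cache.put(id, processDefinition);
    }

    public void remove(String id) {
        cache.remove(id);
    }

    public void clear() {
        cache.clear();
    }

}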

Tweaking the process definition cache in Activiti 5.12

Introduction

Generally, the Activiti engine works in a stateless way. All state concerning process instances, tasks and whatnot is stored in the database and only retrieved when necessary. As such, a process instance can be ‘dormant’ for a very long time, without any resource-wise impact on running processes.

This approach has two major benefits:

  • It keeps the memory footprint of Activiti low. Anything that is not directly needed is offloaded to the database and only fetched when it is absolutely necessary.
  • It allows Activiti to run easily in a clustered setup (with a shared database, of course), as no node in the cluster stores state that could be needed by another node. The database is the one and only ‘authentic source’, and when a node in the cluster needs some process state, it is fetched from the database, used, and written back.

There is one exception to that rule: during the execution of an operation of an Activiti service, MyBatis (the underlying ORM framework) keeps a ‘session cache’ to avoid fetching the same data twice within the same transaction. However, this cache is very short-lived and does not endanger the ability to run Activiti in a cluster.

Static data

As said, Activiti operates in a stateless way. But there is of course data that will never change, which makes it a prime candidate for caching.

A process definition is an example of such ‘static data’. You provide the Activiti engine with a BPMN 2.0 XML file, the engine parses it into something it can execute (more specifically: a tree of POJOs corresponding to the structure of the process definition) and stores the XML plus some data such as the description, business key, etc. in the database. (It’s also very important to understand that Activiti does not store the parsed version of the process definition in the database.) Such a process definition will never change: once it’s in the database, the stored data will remain the same until the process definition is deleted.

On top of that, parsing a BPMN 2.0 XML file into something executable is quite a costly operation (XML parsing always is). This is why the Activiti engine internally uses a process definition cache to store the parsed version of the BPMN 2.0 XML. As shown in the picture below, the mechanism is pretty easy: whenever the process definition is needed, e.g. to start a process instance or to complete a task, the cache is checked, and the process definition is only loaded and parsed when needed.
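
In pseudo-Java, the lookup pattern boils down to the sketch below. This is illustrative only, not the engine’s actual code, and fetchAndParseFromDatabase is a hypothetical helper:

ProcessDefinitionEntity findProcessDefinition(String processDefinitionId) {
  ProcessDefinitionEntity processDefinition = cache.get(processDefinitionId);
  if (processDefinition == null) {
    // cache miss: fetch the BPMN 2.0 XML from the database and parse it
    processDefinition = fetchAndParseFromDatabase(processDefinitionId);
    cache.add(processDefinitionId, processDefinition);
  }
  return processDefinition;
}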

[Diagram: the process definition cache mechanism]

Tweaking the process definition cache

The process definition cache has been part of Activiti since the very first release, and hasn’t seen any major changes since then. Unfortunately, the cache wasn’t very sophisticated: just a plain dumb hashmap holding all process definitions the engine ever touched during the whole uptime of the process engine. For most cases this isn’t really a problem; you would need an awful lot of process definitions before it becomes a bottleneck.

However, my developer heart bled whenever I passed this class. And last week I finally decided to spend some time tweaking the process definition cache. You can find the git commit here:

https://github.com/Activiti/Activiti/commit/bc6fe9ef4cf9e98210a75fd59faf61ac5dc790f3

Basically, it changes two things:

  • It allows setting a limit on the process definition cache. If such a limit is set, the default hashmap of before is swapped with an LRU cache honoring that hard limit (see the sketch after this list).
  • It makes the process definition cache (and also the rules cache) pluggable. I’ll explain this bit in a later blog post.
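
To illustrate what such an LRU cache with a hard limit means, here is a generic sketch (not the actual class from the commit) using a LinkedHashMap in access order:

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative LRU cache: the eldest (least recently accessed) entry is
// evicted once the hard limit is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {

  private final int limit;

  public LruCache(int limit) {
    super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
    this.limit = limit;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > limit;
  }

}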

Limiting the process definition cache

To demonstrate the use of the process definition cache limit, I knocked together a small command line example which you can find on github:

https://github.com/jbarrez/Activiti-process-definition-cache-pluggability

Note that the project already contains some code I will use in my next blog post. Try to act surprised when you see it then ;-) To build the demo jar, run mvn clean package shade:shade. Go to the target folder and run the demo:

java -jar activiti-procdefcache-demo.jar default

This will boot an in-memory database, start the process engine and create all the Activiti tables. The process engine will have a limit for the process definition cache, set in the activiti.cfg.xml configuration file:


<bean id="processEngineConfiguration" class="org.activiti.engine.impl.cfg.StandaloneProcessEngineConfiguration">

  ...

  <property name="processDefinitionCacheLimit" value="10" />

</bean>

The application will now ask you how many process definitions you want to deploy. You can fill in any number here. All the process definitions will be the same; only the process definition id changes, such that a new cache entry is needed for every process definition:

[Screenshot: deploying the process definitions]

After all the process definitions have been deployed, the application will ask you how many process instances you want to start:

[Screenshot: starting process instances]

Again, fill in any number. Don’t expect anything fancy; you’ll just see a lot of logging passing by (the log level is set very low, because the process definition cache logs at a low level and otherwise you wouldn’t see it). But in those loggings you will often see that the cache limit is hit. The reason is that I set the limit pretty low (10, see the configuration above) for demo purposes:

[Screenshot: the cache limit being hit in the logs]

And that’s it! That’s all there is to it to set the cache limit, use an LRU cache and tweak the number of process definitions cached in your Activiti engine.

I want to try it!

This feature will be part of the Activiti 5.12 release. If you already want to try it today, you can build a snapshot version of 5.12 until we release the real deal. Clone Activiti from GitHub and run ‘mvn clean install’:

https://github.com/Activiti/Activiti

Side note: do not use a snapshot version of Activiti on a production database. Chances are high your database will be screwed up when upgrading to the actual non-snapshot version.

Screencast: Suspending a Process With Activiti 5.11

In Activiti 5.11, we’ve added the capability to suspend (and re-activate, of course) process definitions and process instances. The groundwork for this was done in a previous release by Daniel, so (again) many thanks for that.

The following methods were added to the RepositoryService:


 void suspendProcessDefinitionById(String processDefinitionId);
 void suspendProcessDefinitionById(String processDefinitionId, boolean suspendProcessInstances, Date suspensionDate);
 void suspendProcessDefinitionByKey(String processDefinitionKey);
 void suspendProcessDefinitionByKey(String processDefinitionKey, boolean suspendProcessInstances, Date suspensionDate);

 void activateProcessDefinitionById(String processDefinitionId);
 void activateProcessDefinitionById(String processDefinitionId, boolean activateProcessInstances, Date activationDate);
 void activateProcessDefinitionByKey(String processDefinitionKey);
 void activateProcessDefinitionByKey(String processDefinitionKey, boolean activateProcessInstances, Date activationDate);

When you suspend a process definition, you won’t be able to start new process instances for that process definition (an exception will be thrown). As you can see in the methods above, you can also suspend all process instances related to the process definition at once. Read on to learn what that means. Lastly, both suspend and activate take an optional date. When a date is provided, it will be the point in time at which the actual suspension/activation happens. Up to that point, the process definition stays in its current state.
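
For example, a minimal sketch using the API above, where ‘vacationRequest’ is an illustrative process definition key:

Calendar calendar = Calendar.getInstance();
calendar.add(Calendar.DAY_OF_YEAR, 3);
Date threeDaysFromNow = calendar.getTime();

// three days from now, no new 'vacationRequest' instances can be started and
// all running instances are suspended too (second parameter = true)
repositoryService.suspendProcessDefinitionByKey("vacationRequest", true, threeDaysFromNow);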

This makes for some cool use cases, like for example an election process that is only valid for a certain period of time, or phasing out a process by not allowing new process instances to be created.

You can also suspend/activate process instances individually, using the following methods on the RuntimeService:


 void suspendProcessInstanceById(String processInstanceId);
 void activateProcessInstanceById(String processInstanceId);

When a process instance is suspended, it cannot be continued: no tasks can be completed, no variables can be set and no jobs (timers and async steps) will be executed. The process instance is simply halted where it is and cannot be continued in any way.
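
A short sketch of what that means in practice (the processInstanceId and taskId variables are assumed to exist; the exact exception message differs per version, but an ActivitiException is thrown):

runtimeService.suspendProcessInstanceById(processInstanceId);

try {
  // completing a task of a suspended process instance fails
  taskService.complete(taskId);
} catch (ActivitiException e) {
  // expected: the process instance is suspended
}

// re-activating makes the process instance continuable again
runtimeService.activateProcessInstanceById(processInstanceId);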

As usual, it’s very easy to include these APIs in your own application. For demo purposes, we’ve enhanced the Activiti Explorer web application to show this functionality. The screencast below gives you an impression of it. Again, don’t forget to press the little quality icon in the bottom right to get the HD version of the video (YouTube doesn’t allow embedding HD directly).

Activiti 5.11 released!

We’ve just released version 5.11 of Activiti. It’s a big release, containing many improvements and bug fixes.

Read all about it on Tijs’s blog.