Posts in Category: Alfresco

How to create an Activiti pull request

Every once in a while, the question of how to create a pull request for Activiti comes up in our Forum. It isn’t hard to do, but especially for people who don’t know Git it can be daunting.

So, I created a short movie that shows you how easy it is and what you need to do:

 

Looking forward to your awesome fixes and features in a pull request!

 

Edit: Zoltan Alfattar mentioned to me on Skype that there is a better way, using branches:

altfatterz
@jbarrez Nice, however creating separate branch would be the better https://t.co/nco8rRZlk3
05/11/14 15:06

I fully agree with that. However, for people who are new to Git, it might be too complex to grasp. But do have a look at the link if you are interested in doing it the ‘proper’ way.

Model your BPMN 2.0 processes in the Activiti Cloud

Last month, we released the very first version of our cloud offering around Activiti. If you missed it, there’s a short press release here (scroll to the section about Activiti). Doing it as a cloud-first release has given us very good feedback, not only on the deployment and setup side, but also from partners and potential customers who can easily play around with it.

As mentioned in the press release, all of this will be available on premise in a couple of weeks from now (if all goes according to plan and we keep appeasing the software development gods, which up till now seems to be the case!).

One of the cool things you can do in the Activiti Cloud is model your BPMN 2.0 processes in our completely rewritten web modeler. There are obviously many other cool things to do, but let me get back to those in later blogposts. The only thing you need to do to create BPMN 2.0 processes, which are fully ready to run on the Activiti engine, is to create an account on https://activiti.alfresco.com. With such an account, you can model BPMN 2.0 processes without any limitation. Anyone can do it, no strings attached.

I uploaded a movie to YouTube showing how easy it is:

Of course, as one needs bread on the table, there are also options if you want something more (e.g. the analytics, options for apps, etc.), if you want this on-premise (with cool stuff such as LDAP integration and everything else you’d want on premise), or if you are looking for support on the Activiti Engine only. Anyway, contact Alfresco Sales if you think this is something you want!

 

My Five Rules for Remote Working

A couple of weeks ago, there was a stir (again) about remote working and its success and/or failure: it was reported that Reddit, the website where many people lose countless hours, was forcing all its employees to move to SF. After a similar thing happened at Yahoo last year, it made me think about why remote work is such a huge success for us at Activiti and Alfresco.

You see, I’ve been a remote worker for more than five years now. First at Red Hat and then at Alfresco. I worked a couple of years as a Java consultant before that, so I’ve seen my share of office environments (checking my LinkedIn, it comes down to about 10 different office environments). I had to go to those offices each day.

Comparing those experiences, I can – without exaggeration – say that I’m way more productive nowadays, working from home. Many people (both in and outside IT) ask me how I do it. They say “they couldn’t do it”. Maybe that’s true. Maybe some people need a lot of people around them. But for the kind of job I am into – developing software – I believe having a lot of people around me doesn’t aid me in writing higher quality software faster.

Anyway, like I said, I did some thinking about it and came up with the following ‘rules’, which I have been following all these years and which I believe are crucial (at least for me!) to making remote working a success.

(comic from http://www.dilbert.com/)

Rule 1: The Door

Having a separate space to work in is crucial if you want to do serious remote working. Mentally, it is important that you can close “The Door” of your office space when you finish working. It brings some kind of closure to the working day.

Many people, when they work from home, put their laptop on, let’s say, the kitchen table. That doesn’t work. It is not a space that encourages work. There are distractions everywhere (kids coming home, food very close by, …). But most importantly, there is no distinction between when you are working and when you are not.

My wife and kids know and understand that when The Door is closed, I’m at work. I can’t be disturbed until that Door opens. But when I close The Door in the evening and come downstairs, they also know that I’m fully available for them.

My door!

Rule 2: The Gear

The second rule is related to the first one: what to put in that room. The answer is simple: only the best. A huge desk, a big-ass 27″ monitor (or bigger), a comfortable chair (your ass spends a lot of time on it), the fastest internet you can buy, some quality speakers, a couple of cool posters and family pictures on the wall, ….

This is the room where you spend most of your time during the week, so you need to make it a place you love to go to.

My setup

Often I hear from people whose company allows remote work that the company should pay for all of this. I think that’s wrong. It’s a two-way street: your company gives you the choice, privilege and trust to work from home, so you, from your side, must make sure your home office isn’t a step down from the gear you have at the office. Internet connection, chair and computer monitor are probably the most important bits here. If you try to be cheap on any of those, you’ll pay for it in decreased productivity.

Rule 3: The Partner

Your partner is of utmost importance in making remote work a success. Don’t be fooled by its third place here: when your partner is not into it, all the other points are useless.

It’s pretty simple and comes down to one core agreement you need to make when working from home: when you are working from home, you are not “at home”. When you work, there is no time for cleaning the house, doing the dishes, mowing the grass, etc. You are at work, and that needs to be seen as a full-time, serious thing. Your partner needs to understand that doing any of those things would be bad for your career.

Many people think this is easy, but I’ve seen many fail. A lot of people still see working from home as something that is not the same as “regular work”. They think you’ve got all the time in the world now. Wrong. Talk it through with your partner. If he/she doesn’t see it (or is jealous), don’t do it.

Rule 4: Communicate, communicate, communicate

Even more than a team in an office, a remote team needs to communicate. If you don’t communicate, you simply don’t exist.

At Activiti, we are on Skype a lot during the day. We all know exactly what the other team members are currently doing. We have an informal agreement that we typically don’t announce a call: you just press the ‘call’ button and the other side picks it up and responds. It’s the only way remote work can work. Communicate often.

Also important: when you are away from your laptop, say so in a common chat window. There is nothing as damaging for remote workers as not picking up Skype or the phone for no reason.

Rule 5: Trust People

The last rule is crucial. Working remotely is based on trust. Unlike in the office, there is no physical proof that you are actually working (although being physically in an office is not correlated with being productive!). You need to trust people to do their job. But at the same time, don’t be afraid to check up on people’s work (for us, those are the commits) and ask why something is taking longer than expected. Trust grows both ways.

The second part of this trust story is that there needs to be trust from the company in the team. If that trust is missing, your team won’t be working remotely for long. At Activiti, we are very lucky to have Paul Holmes Higgin as our manager. He is often in the Alfresco office and makes sure that whatever we are doing is known to the company and vice versa. He attends many of the (online) meetings that happen company-wide so that we are free to code. There is nothing as bad for a remote team as working in isolation.


Conclusion

So those are the five (personal!) rules I follow when working from home. With all this bad press from the likes of Reddit and Yahoo, I thought it was time for some positive feedback. Remote work is perfect for me: it allows me to be very productive, while still being able to see my family a lot. Even though I put in a lot of hours every week, I still see my kids grow up every single day and I am there for them when they need me. And that is priceless.

Does this sound cool to you? Well, at Alfresco we are still hiring people to work on Activiti!

Upcoming Webinar: Process Driven Spring Applications with Activiti – Sept 23rd


Next week, I’ll be doing a webinar together with my friend Josh Long (he’s a Spring Developer Advocate, committer to many open source projects and of course Activiti). I will show some of the new Activiti tooling we’ve been working on recently, while Josh will demonstrate with live coding how easy it is to use Activiti in Spring Boot (spoiler: really easy).

You can register for the webinar for free here: https://spring.io/blog/2014/07/29/webinar-process-driven-spring-applications-with-activiti-sept-23rd

One day later, the Alfresco Summit will be kicked off in San Francisco. I’m joining two talks there:

For those who can’t make it to San Francisco: don’t worry, we’ll be landing in London two weeks later, Oct 7-9!

Execute Custom queries in Activiti

(This will probably end up in the user guide of the Activiti 5.15 release, but I wanted to share it already)

The Activiti API allows for interacting with the database using a high-level API. For example, for retrieving data, the Query API and the Native Query API are powerful and easy to use. However, for some use cases they might not be flexible enough. The following section describes how a completely custom SQL statement (selects, inserts, updates and deletes are possible) can be executed against the Activiti data store, yet completely within the configured Process Engine (and thus leveraging its transaction setup, for example).
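
As a quick reminder, here is a minimal sketch of what the Query API and Native Query API look like (the assignee "kermit" is a made-up example value, and taskService is the engine's TaskService):

TaskService taskService = processEngine.getTaskService();

// Query API: type-safe builder methods
List<Task> tasks = taskService.createTaskQuery()
    .taskAssignee("kermit")
    .orderByTaskCreateTime().desc()
    .list();

// Native Query API: free-form SQL, results still mapped to Task objects
List<Task> sameTasks = taskService.createNativeTaskQuery()
    .sql("SELECT * FROM ACT_RU_TASK T WHERE T.ASSIGNEE_ = #{assignee}")
    .parameter("assignee", "kermit")
    .list();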

To define custom SQL statements, the Activiti engine leverages the capabilities of its underlying framework, MyBatis. The first thing to do when using custom SQL is to create a MyBatis mapper class. More info can be found in the MyBatis user guide. For example, suppose that for some use case not the whole task data is needed, but only a small subset of it. A mapper that does this looks as follows:

import java.util.List;
import java.util.Map;

import org.apache.ibatis.annotations.Select;

public interface MyTestMapper {

  @Select("SELECT ID_ as id, NAME_ as name, CREATE_TIME_ as createTime FROM ACT_RU_TASK")
  List<Map<String, Object>> selectTasks();

}

This mapper must be provided to the Process Engine configuration as follows:

...
<property name="customMybatisMappers">
  <set>
    <value>org.activiti.standalone.cfg.MyTestMapper</value>
  </set>
</property>
...

Notice that this is an interface. The underlying MyBatis framework will make an instance of it that can be used at runtime. Also notice that the return value of the method is not typed, but a list of maps (which corresponds to the list of rows with column values). Typing is possible with the MyBatis mappers if wanted.

To execute the query above, the managementService.executeCustomSql method must be used. This method takes in a CustomSqlExecution instance. This is a wrapper that hides the internal bits of the engine otherwise needed to make it work.

Unfortunately, Java generics make it a bit less readable than it could have been. The two generic types below are the mapper class and the return type class. However, the actual logic is simply to call the mapper method and return its results (if applicable).

CustomSqlExecution<MyTestMapper, List<Map<String, Object>>> customSqlExecution =
    new AbstractCustomSqlExecution<MyTestMapper, List<Map<String, Object>>>(MyTestMapper.class) {

  public List<Map<String, Object>> execute(MyTestMapper customMapper) {
    return customMapper.selectTasks();
  }

};

List<Map<String, Object>> results = managementService.executeCustomSql(customSqlExecution);

The Map entries in the list above will only contain id, name and create time in this case and not the full task object.

Any SQL is possible when using the approach above. Another more complex example:

  @Select({
    "SELECT task.ID_ as taskId, variable.LONG_ as variableValue FROM ACT_RU_VARIABLE variable",
    "inner join ACT_RU_TASK task on variable.TASK_ID_ = task.ID_",
    "where variable.NAME_ = #{variableName}"
  })
  List<Map<String, Object>> selectTaskWithSpecificVariable(String variableName);

Using this method, the task table is joined with the variables table. Only rows where the variable has the given name are retained, and the task id and the corresponding numerical value are returned.
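
Executing this mapper method works exactly the same way as the first example; a sketch, where the variable name "orderId" is just an illustration:

List<Map<String, Object>> taskVariableResults = managementService.executeCustomSql(
    new AbstractCustomSqlExecution<MyTestMapper, List<Map<String, Object>>>(MyTestMapper.class) {

      public List<Map<String, Object>> execute(MyTestMapper customMapper) {
        // "orderId" is only an example: pass in whatever variable name you are interested in
        return customMapper.selectTaskWithSpecificVariable("orderId");
      }

    });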

This will be possible in Activiti 5.15. However, the code (and more specifically the Command implementation and the wrapper interface) can be used in any older version of Activiti.

Reporting capabilities in Activiti 5.12

In the Activiti 5.12 release, we added reporting capabilities on top of the Activiti engine, demonstrating the concepts through the Activiti Explorer web application (but of course usable everywhere).

Now, don’t be fooled: for a very long time, the Activiti engine has had the capability of gathering historical and audit data when you execute business processes. All this data is stored in the historical database tables and can thus be easily queried. This means that any reporting tool such as JasperReports, BIRT, Crystal Reports etc. can just use the historical tables as a datasource to produce reports in any format you’d like (Word, PDF, …) and get insight into how your business is executing its business processes. I’ll probably blog such an example pretty soon.
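
For example, inside the engine the same historical data can be queried through the standard HistoryService; a small sketch (the process definition key "invoiceProcess" and the assignee "kermit" are made-up values):

HistoryService historyService = processEngine.getHistoryService();

// How many instances of a given process definition have finished?
long finishedInstances = historyService.createHistoricProcessInstanceQuery()
    .processDefinitionKey("invoiceProcess")
    .finished()
    .count();

// All tasks a given user has completed
List<HistoricTaskInstance> completedTasks = historyService.createHistoricTaskInstanceQuery()
    .taskAssignee("kermit")
    .finished()
    .list();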

Eating our own dogfood

But the thing I’d like to focus on today is the web side of things: web charts/reports which can be combined into a dashboard, for example. The first thing we must be able to do is expose the historical data in a way we can use to create these charts. But where do you put the logic (the queries and data manipulation) to generate a dataset for the chart and/or the report? Do you embed the SQL in your UI layer? Of course not. What if multiple applications want to use the data? What if we want to store the generated dataset to get a snapshot of the data at a certain point in time?

When we thought about this problem, we first approached it in the traditional way: a new service with reporting capabilities, probably using some sort of DSL to define the dataset generation, stored in some kind of data store. Anyway, a whole new set of things and concepts to learn and master. Not to mention the extra implementation and maintenance.

But then it hit us. Everything we needed is already available in the Activiti engine. If we use a process to define the logic that creates the dataset for the report, we can leverage all the facilities of the engine. The only requirement for this process is that it generates JSON data that follows a fixed format. Some benefits:

  • The process has straight access to the internals of the Activiti engine. It has direct access to the database used by the engine.
  • The dataset that is generated can be stored in the historical tables of Activiti if wanted. So we have a ‘save report data’ mechanism for free.
  • The job executor can be used as for any other process. This means that you can generate the report asynchronously or only execute certain steps asynchronously. It also means you can use timers, e.g. to generate the report data at certain points in time.
  • Creating a new report can be done with known tools and known concepts. Also, no new concepts, services or applications are needed. Deploying or uploading a new report is the same as deploying a new process. Generating a report is the same as running a process instance.
  • It allows the use of BPMN 2.0 constructs. This means that things like parallel steps, branching based on data, or even requesting user input during the generation are possible out-of-the-box.


A dash of Javascript

Since the generation of the dataset is done by a process, everything possible in a process can be used. So you can use Java delegate classes or whatever you fancy.

But since the kool kids nowadays are using JavaScript, we added some example processes to the demo data of Activiti Explorer that use the scripting functionality of the engine. The nice thing about JavaScript is that JSON is native to it, so creating a JSON object is really easy. As said above, the only requirement for such a process is that it must generate the JSON dataset following the predefined format.


For example to generate an overview of all past process instances we could have a process like this:


<?xml version="1.0" encoding="UTF-8"?>
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:activiti="http://activiti.org/bpmn"
 xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:omgdc="http://www.omg.org/spec/DD/20100524/DC"
 xmlns:omgdi="http://www.omg.org/spec/DD/20100524/DI" typeLanguage="http://www.w3.org/2001/XMLSchema"
 expressionLanguage="http://www.w3.org/1999/XPath"
 targetNamespace="activiti-report">

<process id="process-instance-overview-report" name="Process Instance Overview" isExecutable="true">

 <startEvent id="startevent1" name="Start" />
 <sequenceFlow id="flow1" sourceRef="startevent1" targetRef="generateDataset" />

 <scriptTask id="generateDataset" name="Execute script" scriptFormat="JavaScript" activiti:autoStoreVariables="false">
 <script><![CDATA[

  importPackage(java.sql);
  importPackage(java.lang);
  importPackage(org.activiti.explorer.reporting);

  var result = ReportingUtil.executeSelectSqlQuery("SELECT PD.NAME_, PD.VERSION_ , count(*) FROM ACT_HI_PROCINST PI"
       + " inner join ACT_RE_PROCDEF PD on PI.PROC_DEF_ID_ = PD.ID_ group by PROC_DEF_ID_");

  var reportData = {};
  reportData.datasets = [];
  
  // Native json usage
  var dataset = {};
  dataset.type = "pieChart";
  dataset.description = "Process instance overview (" + new java.util.Date() + ")";
  dataset.data = {};

  while (result.next()) { // process results one row at a time
    var name = result.getString(1);
    var version = result.getLong(2);
    var count = result.getLong(3);
    dataset.data[name + " (v" + version + ")"] = count;
  }
  reportData.datasets.push(dataset);

  // Storing the json as process variable
  execution.setVariable("reportData", new java.lang.String(JSON.stringify(reportData)).getBytes("UTF-8"));
 ]]></script>
 </scriptTask>
 <sequenceFlow id="flow3" sourceRef="generateDataset" targetRef="theEnd" />

 <endEvent id="theEnd" />

 </process>

</definitions>

The script is pretty easy to understand. All it does is query the database, create a JSON object with the data and store it as a process variable. Since it is stored as a variable, it is basically a snapshot of the dataset. So at any time in the future you can just fetch the JSON data and look at the report as it was at that point in time.
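
Fetching such a snapshot afterwards can be done with the standard historic variable query; a sketch (reportProcessInstanceId is the id of whatever process instance generated the report):

HistoricVariableInstance reportVariable = historyService.createHistoricVariableInstanceQuery()
    .processInstanceId(reportProcessInstanceId)
    .variableName("reportData")
    .singleResult();

// The script stored the JSON as UTF-8 bytes, so turn it back into a String
String reportJson = new String((byte[]) reportVariable.getValue(), java.nio.charset.StandardCharsets.UTF_8);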

The json produced by the process above can then easily be used to generate charts, as demonstrated by the Activiti Explorer application:

Screen Shot 2013-03-22 at 13.30.45

 

It is also easy to see how a dashboard app could be created with the same approach. But that’ll be for a next blogpost.

Thanks for reading!

 

 

The Death of Google Reader: Why Open Source Matters in a Cloud Era

This morning at the breakfast table, I read the news that Google is shutting down Google Reader in July. I was completely flabbergasted. Every day, sipping my morning coffee, I go to Google Reader to see what has happened in the software world the past day. That has been my routine since 2008, regardless of which employer I had or which project I was doing. I can’t think of any other service I have used for that long and for that amount of time per day. Or maybe I can … GMail. And it got me thinking.


But let me take a step back first. Google proclaims it has seen a decline in usage. Yet when I look at my Twitter feed and my RSS feeds (in Reader), all I can see is the Google Reader news. Yes, I’m probably biased, since I’m a software developer and I tend to talk with and follow fellow geeks. This is how Mashable.com puts it:

Hear that clunking sound? That’s thousands of jaws dropping at the news that Google Reader is going to be retired come July 1, 2013. That whooshing sound is “Google Reader” shooting to the top of Twitter’s worldwide trends, even on a day when a new pope was picked.

And that giant “NOOOOOOOO” sound is the Internet’s reaction to Google’s most unpopular decision in — well, as far back as I can remember.

I would gladly pay to use Google Reader. According to my stats, I’ve read about 35,000 blog posts through Reader since I started using it. It is my single source for keeping up to date with the industry; Twitter or any other social media by no means comes close. The noise on there is just too large. Somewhere I read “Google Reader to Twitter is like a filing cabinet to a bag of cats”.

I also hope that Google thought long and hard about the people they are pissing off now: a vocal and influential group (if only because they install and maintain the computers of the rest of the family ;-) ). Kyle Wild, CEO at Keen_IO, states it clearly:

Why Open Source Matters

But let’s quit whining. We all knew this could happen one day, right? After all, Reader is in the hands of one colossal company and is publicly traded on the stock market. They aren’t doing this for charity.

But I, and many others, rely on Google every day. It runs my life: e-mail, calendar, navigation, … heck, when my internet is down I check www.google.com because it is always there. And in the back of our heads we know that there is one company behind all these things, and yes, we know that this is a bad thing … but Google is not evil, right?

The facts, however, are plain and simple: if you don’t control it, the company owning it might pull the plug any day. They have every right to do so. And that brings me to the title of this post (by the way, I wonder what the impact on my visitor numbers will be once Reader goes down). I’m an open source guy. And this move by Google really reinforced my belief in open source software (again).

You see, if we decided to pull the plug on Activiti or Alfresco today, it would be bad news for sure. But because the software we write is open source, it would only mean that the people currently writing code for the Activiti and Alfresco projects are gone. Activiti and Alfresco would still exist. The code and the documentation would still be there. You could still open the code and patch it. Other committers would still be there. Somebody or some other company could take leadership and continue.

The point is: you’re not at the mercy of one single company. And in this era, where everything is becoming cloud-based and closed, we should really think about what brought us to open source software in the first place. Do we want to put our businesses in the hands of mega-corporations who care very little about us? Do we want a vendor/service lock-in for our critical businesses?

I’m very happy that the company I work for, Alfresco, has the right mindset on this, and not only because it lets me work on open source software. Yes, we do have Alfresco Cloud. But it is built on the same codebase as the one we’re shipping as the community edition. If for some reason the plug were pulled, anybody could take the code, tweak it and run it. Even build their own cloud version. With Alfresco and Activiti, we’ve got nothing to hide. Our code is right there. That is our strength. It’s not only about a kick-ass product. It’s about openness and the freedom to be in control of your own path, regardless of what happens.

Don’t get me wrong though, I’m not saying everything should be open source. Companies still need to make money. But the core, the foundation, needs to be. Take GitHub for example: it offers services on top of an open source version control system (Git). If GitHub pulls the plug, I can happily keep coding away. Sure, it will hurt a bit in the beginning, but I’m not locked in.

That being said … does anybody have a decent alternative for Google Reader? Preferably open source.

Try Activiti Explorer 5.12 now on CloudBees for free!

Running Activiti on the cloud is really easy. All you need is a simple database and a web container if you’d like to run the Activiti Explorer UI. Since Activiti uses very little memory and scales/clusters horizontally out-of-the-box, it is a perfect fit for a cloud deployment.

Doing such a deployment in the cloud gets easier every day. Of course you can set up your own stack on Amazon, but nowadays there are many PaaS solutions that make your life much easier by offering dedicated platforms. At least from a click-and-run point of view, these are much easier to work with. CloudBees is such a PaaS, specifically tailored to developers and Java applications.

For framework builders like us, CloudBees offers a very interesting concept called a ‘clickstart’. If you’re interested, here are the full details. But basically, you just have to put a simple JSON file online (I chose to use GitHub) that tells CloudBees where to find your WAR file and what kind of configuration you want. If you want, you can also specify a repository and CloudBees will build and deploy it for you (they hired the lead developer of Jenkins, after all). The CloudBees platform then lets you go to a special URL, passing the URL of your JSON file as a parameter, which boots up a cloud instance with your app.

So, as we released Activiti 5.12 two days ago (at midnight, at the bar, no less), I thought it was a good idea to create a CloudBees clickstart for Activiti Explorer 5.12. Simply click the button below (it goes to that special URL). You will have to create a CloudBees account (don’t worry, it’s free and you don’t need to provide a credit card like on Amazon) to run your own personal instance of Activiti Explorer 5.12 on the CloudBees cloud.

If that is too much hassle (trust me, it isn’t), you can also try out my demo instance. I’m assuming you’ll see some delays once people start hitting it … so it’s best to try it on your own account.

Update: some people think it is funny to change the user passwords (I do too, in some way ;-) ) … so if you can’t log in to my instance, you’ll have to wait until I restart it … or run your own instance :-)

What?!? Activiti needs how much memory?

A while ago, somebody proclaimed that their application was sometimes running out of memory due to Activiti. I don’t need to tell you that this hurt my developer heart. We know our architecture is sound and very resource-friendly. But without hard numbers, anybody can just blurt out that Activiti is a memory hog without us being able to counter it.

So I decided to do the sane thing. I measured … and boy, was I surprised with the results!

Setup

I cobbled together something which would mimic typical Activiti usage. The code is open and available on

https://github.com/jbarrez/Activiti-Memory-Usage-Test

This program does the following:

  • Has a thread that starts a new process instance every few milliseconds
  • Has a thread that fetches counts from the history table and prints it on the screen
  • Has a group of threads mimicking users that fetch and complete tasks
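
The repository has the actual code; roughly, each simulated user boils down to a loop like this sketch (simplified, not the repository's exact code; taskService comes from the shared process engine):

final TaskService taskService = processEngine.getTaskService();
final Random random = new Random();

// One simulated user: complete a page of open tasks, then sleep a bit
Runnable user = new Runnable() {
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      for (Task task : taskService.createTaskQuery().listPage(0, 10)) {
        try {
          taskService.complete(task.getId());
        } catch (ActivitiException e) {
          // another simulated user probably completed it first, ignore
        }
      }
      try {
        Thread.sleep(random.nextInt(20000)); // sleep 0-20 seconds, cf. the settings below
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
};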

The processes that are started are randomly chosen from five deployed processes (in the order of the picture below):

  • A simple four user tasks process
  • A five user tasks process with a parallel gateway where all the user tasks are asynchronous
  • A process with a simple script
  • A Process with an exclusive choice with four branches
  • A process with a subprocess with timer

processes

Settings

I decided to use the following parameters for running the test. Note that I did not try to play around with these settings. It could very well be that a much higher throughput is possible, but I believe the numbers I chose are an adequate representation of a company doing a fair amount of business process management.

  • 50 users. This means there will be 50 threads asking every x seconds for tasks and completing them
  • Run for 30 minutes
  • Start 120 processes per minute (ie. 2 per second)
  • Have the user threads sleep for a random amount of seconds between 0 and 20.

Again, I didn’t check what the limit of these numbers was. It could very well be you can start 500 processes per second. But the point here is memory usage.

Also, I’m using a standard MySQL installation (just installed, nothing tweaked) as the database.
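
For reference, a standalone engine pointing at such a MySQL database can be configured along these lines (a sketch with a made-up database name and credentials, not necessarily the exact setup of the test code):

ProcessEngine processEngine = ProcessEngineConfiguration
    .createStandaloneProcessEngineConfiguration()
    .setJdbcDriver("com.mysql.jdbc.Driver")
    .setJdbcUrl("jdbc:mysql://localhost:3306/activiti")  // made-up database name
    .setJdbcUsername("activiti")                         // made-up credentials
    .setJdbcPassword("activiti")
    .setDatabaseSchemaUpdate(ProcessEngineConfiguration.DB_SCHEMA_UPDATE_TRUE)
    .buildProcessEngine();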

A trip down memory lane

To make it a bit interesting, I decided to start low and build up from there. So I ran the benchmark using 32 MB of heap space:

java -jar -Xms32M -Xmx32M -XX:+UseG1GC  activiti-memory-usage.jar

Note that I’m using the new G1 garbage collector, which is supposed to do well in cases where memory usage is more than 50% of the max heap. I also attached the YourKit profiler to get insight into the memory usage. I let the benchmark run for 30 minutes. When I came back, the following statistics were shown:

Screen Shot 2013-02-04 at 12.31.55

So, to my surprise, 32 MB was enough to finish the benchmark! In those 30 minutes it finished 3157 process instances and completed 12802 tasks! And even more, the profiler showed me that it wasn’t even using all of the available memory (see the first chart)!

Screen Shot 2013-02-04 at 12.33.41

You can also see that when the garbage collector passes by, only 13MB is being used:

Screen Shot 2013-02-04 at 12.35.25

And the CPU was really bored during the benchmark: it never really goes above 10% usage.

Screen Shot 2013-02-04 at 12.35.49

Also, there was quite a bit of garbage collecting going on (28 seconds out of 30 minutes), which is to be expected:

Screen Shot 2013-02-04 at 12.37.57

How low can you go?

The first test taught us that 32 MB is more than enough to run this ‘BPM platform’. And like I said, I believe the load isn’t that different from that of a typical company using a BPM solution. But how low can we go?

So I reran my tests using less memory. I quickly learned that when throwing less than 32 MB of RAM at it, I couldn’t complete the benchmark with the Yourkit profiler attached. Probably the profiler agent also steals some memory. So I ran the benchmarks using less memory:

java -jar -XmsXXXM -XmxXXXM -XX:+UseG1GC activiti-memory-usage.jar

I tried 24 MB. Success!

I went down to 16 MB. Success!

I went down to 14 MB. Dang! Out of heap space. But no worries: the exception occurred when the BPMN diagram was generated during process deployment. This takes quite a bit of memory, as Java2D is involved and the PNG is built up in memory. So I configured the engine to not generate this diagram (setting ‘createDiagramOnDeploy’ to false). And yes. Success!
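
In code, disabling the diagram generation is a one-liner on the process engine configuration (the XML equivalent is the createDiagramOnDeploy property in activiti.cfg.xml):

// Skip generating the process diagram image (PNG) at deployment time
processEngineConfiguration.setCreateDiagramOnDeploy(false);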

I went down to 12 MB. Success!

And 12 MB of RAM was the lowest I could go. With less memory, you quickly get ‘out of heap space’ exceptions. The statistics for the 12 MB run are actually quite similar to those of the 32 MB run.

Screen Shot 2013-02-04 at 12.44.34

Let me rephrase that: a measly Twelve Megabytes of RAM!! Twelve!!

Conclusion

Activiti (or at least my approximation of a typical Activiti load) needs 12 MB of memory to run. Probably even less, because the fifty user threads also take up some memory here. To put this in perspective:

  • An iPhone 5 has 85 times more RAM (1 GB).
  • A Raspberry Pi ($25 version) has 21 times more RAM (256 MB). The $35 version has 42 times more RAM (512 MB).
  • An Amazon Micro instance has 51 times more RAM (613 MB).
  • The ‘biggest’ Amazon machine you can get at the moment has 2560 times more memory (30 GB).

Edit 5 Feb 2013: see the comments below. Andreas has succeeded in running it on 9 MB!

Of course, in a ‘real’ application you’d also need a web container, servlets, a REST layer, etc. Also, I didn’t touch the permgen settings. But that is equal for all Java programs. The point remains the same: Activiti is REALLY memory friendly! And we learned earlier that Activiti is also really fast.

So why even bother looking at the competition?

Using a distributed cache as process definition cache in Activiti 5.12

In my last post, I described the general working of the process definition cache and how to limit the amount of data stored in it. If you haven’t read it yet, I would (of course) welcome you to read it first.

So what if the default cache implementations for some reason don’t cut it for you? Well, don’t worry, we made sure the cache is pluggable and it is very easy to inject your home-brew version. And in this post I’ll show you how to do it.

A distributed cache, you say?

As you could have guessed from the title, we’re going to swap the default process definition cache with a distributed one. Simply put, a distributed cache is generally a key-value store whose data is distributed across multiple nodes in a networked cluster. There are a few reasons why you might decide to do this:

  •  You are running Activiti in a cluster and you have an awful lot of process definitions. Most of them are used very frequently. Storing all these process definitions in the cache takes too much memory. But you also don’t want to introduce a hard cache limit because you want to avoid hitting the database too much on a cache miss.
  • You are running on off-the-shelf hardware with limited memory. You want to distribute the memory usage.
  • For some reason, database access is slow and you want to load every process definition only once for the whole cluster.
  • It is just plain cool.

There are plenty of distributed cache implementations: Infinispan, HazelCast, GridGain, EHCache and many, many others.

I chose Infinispan for the simple reason that I already knew its API. Besides personal preference, it also has some nice ‘extras’ beyond the distribution itself, such as support for JTA transactions when accessing the cache, or dealing with out-of-memory situations by automatically evicting entries from the cache. But the point of this post is to show you how easily you could swap this implementation with your personal preference.

Show me some code!

The first thing you need to do is to make your process engine aware of your process definition cache implementation. Add the following property to your activiti.cfg.xml:


<bean id="processEngineConfiguration" class="org.activiti.engine.impl.cfg.StandaloneProcessEngineConfiguration">
    ...

    <property name="processDefinitionCache">
        <bean class="org.activiti.cache.DistributedCache" />
    </property>

</bean>

The referenced class must implement the org.activiti.engine.impl.persistence.deploy.DeploymentCache interface, which looks as follows:


public interface DeploymentCache <T> {

    T get(String s);

    void add(String s, T t);

    void remove(String s);

    void clear();

 }

As you can see, this is a pretty generic interface, which makes it easy to plug in any kind of cache implementation. The Infinispan implementation looks as follows:


import org.activiti.engine.impl.persistence.deploy.DeploymentCache;
import org.activiti.engine.impl.persistence.entity.ProcessDefinitionEntity;
import org.infinispan.Cache;
import org.infinispan.manager.CacheContainer;
import org.infinispan.manager.DefaultCacheManager;

public class DistributedCache implements DeploymentCache<ProcessDefinitionEntity> {

    protected Cache<String, ProcessDefinitionEntity> cache;

    public DistributedCache() {
        try {
            CacheContainer manager = new DefaultCacheManager("inifispan-cfg.xml");
            this.cache = manager.getCache();
            this.cache.addListener(new CacheListener());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public ProcessDefinitionEntity get(String id) {
        return cache.get(id);
    }

    public void add(String id, ProcessDefinitionEntity processDefinitionEntity) {
        cache.put(id, processDefinitionEntity);
    }

    public void remove(String id) {
        cache.remove(id);
    }

    public void clear() {
        cache.clear();
    }

}

The real meat here is in the constructor. All the other methods are used exactly as you would use a regular HashMap. In the constructor, a distributed cache is created using a specific configuration file. The cache listener is added there just for logging purposes, so you can see the contents of the cache in the logs. The actual Infinispan config is pretty simple (also, kudos to the Infinispan team, really good docs!):


<infinispan>
    <global>
        <transport>
            <properties>
                <property name="configurationFile" value="jgroups-tcp.xml" />
            </properties>
        </transport>
    </global>
    <default>
        <!-- Configure a synchronous replication cache -->
        <clustering mode="distribution">
            <sync />
            <hash numOwners="2" />
        </clustering>
    </default>
 </infinispan>

For the actual details, I kindly refer you to the Infinispan documentation. Basically, this config uses JGroups to facilitate the communication over TCP. The numOwners setting states that we want at least two nodes in the cluster to hold each entry (which means a node can fail without data being lost).

I want to try it myself!

To demonstrate the use of the distributed cache, I knocked together a small command-line example which you can find on GitHub:

https://github.com/jbarrez/Activiti-process-definition-cache-pluggability

To build the demo jar, run mvn clean package shade:shade. Go to the target folder and run the demo:

java -jar activiti-procdefcache-demo.jar distributed

This will boot an in-memory database, boot the process engine and create all the Activiti tables. The application will ask you for a number of process definitions to generate. Fill in anything you like; you can make it pretty big, because the data will be distributed anyway. Once all process definitions are deployed, you can start process instances. Open a few new terminals and execute the command above again in there. When you now start new process instances, the logging will show you that the cached process definitions are spread across the nodes. You will also see that some nodes have more entries than others.

This is what it looks like when I fire up 9 nodes, with 1000 process definitions in the database:

cache_screenshot01

And when you shut down some nodes again (here only three survive), you will see that Infinispan takes care of nicely redistributing all the cached entries across the remaining nodes:

cache_screenshot02

All the cached process definitions are now nicely spread across all the nodes in the cluster. Isn’t that pretty?

So that’s all it takes to plug in your own cache implementation. It doesn’t need to be a distributed cache of course, any Java implementation will do. The only limit is your imagination … ;-)
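
For example, a minimal non-distributed alternative that simply caps the number of cached process definitions with an LRU map could look like this sketch (just an illustration against the same DeploymentCache interface, not something shipped with Activiti):

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

import org.activiti.engine.impl.persistence.deploy.DeploymentCache;
import org.activiti.engine.impl.persistence.entity.ProcessDefinitionEntity;

public class LruProcessDefinitionCache implements DeploymentCache<ProcessDefinitionEntity> {

    protected Map<String, ProcessDefinitionEntity> cache;

    public LruProcessDefinitionCache(final int maxSize) {
        // Access-ordered LinkedHashMap that evicts the eldest entry when maxSize is exceeded
        this.cache = Collections.synchronizedMap(
            new LinkedHashMap<String, ProcessDefinitionEntity>(maxSize + 1, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<String, ProcessDefinitionEntity> eldest) {
                    return size() > maxSize;
                }
            });
    }

    public ProcessDefinitionEntity get(String id) {
        return cache.get(id);
    }

    public void add(String id, ProcessDefinitionEntity processDefinitionEntity) {
        cache.put(id, processDefinitionEntity);
    }

    public void remove(String id) {
        cache.remove(id);
    }

    public void clear() {
        cache.clear();
    }
}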