Posts in Category: Uncategorized

How to use transient variables in Activiti

A feature that has been requested quite a bit – transient variables – has landed in Beta3 of Activiti v6, which we released yesterday. In this post, I’ll show you an example of how transient variables can be used to cover some advanced use cases that weren’t possible (or optimal) before.

So far, all variables in Activiti have been persistent: the variable and its value are stored in the data store, and historical audit data is kept. Transient variables, on the other hand, act and behave like regular variables, but they are not persisted. Beyond not being persisted, the following is special to transient variables:

  • a transient variable only survives until the next ‘wait state’, when the state of the process instance is persisted to the database.
  • a transient variable shadows a persistent variable with the same name.

More detailed information about transient variables and the APIs can be found in the documentation.
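To make the two rules above concrete, here is a minimal sketch (the variable names are made up for illustration) of setting and reading variables from within a JavaDelegate, using the same APIs that appear later in this post:

import org.activiti.engine.delegate.DelegateExecution;
import org.activiti.engine.delegate.JavaDelegate;

public class ShadowingDemoDelegate implements JavaDelegate {

    public void execute(DelegateExecution execution) {
        // A persistent and a transient variable with the same name
        execution.setVariable("result", "persistent-value");
        execution.setTransientVariable("result", "transient-value");

        // The transient variable shadows the persistent one with the same name
        execution.getVariable("result");          // returns "transient-value"

        // getTransientVariable only consults the transient set of variables
        execution.getTransientVariable("result"); // returns "transient-value"

        // After the next 'wait state', only "persistent-value" survives
    }
}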

Example

The process definition that we’ll use to demo some bits of the transient variables is shown below. It’s a fairly simple process: the idea is that we ask the user for some input, like a keyword and a language, and use it to make a GitHub API call. If successful, the results are shown to the user. It’s easy to write a UI for this (or use the new forms in the Beta3 AngularJS app), but in this post we’ll focus on the code only.

The BPMN 2.0 XML and code can be found in this GitHub repo: https://github.com/jbarrez/transient-vars-example

[Process diagram: start, ‘Execute HTTP call’ service task, exclusive gateway, multi-instance subprocess with ‘Review result’ user tasks, end]

Let’s walk through the process together. The process starts with the user providing some input about what should be searched for (usually this would be done using a start form).

repositoryService.createDeployment().addClasspathResource("process.bpmn20.xml").deploy();

Map<String, Object> variables = new HashMap<String, Object>();
variables.put("keyWord", "workflow");
variables.put("language", "java");
ProcessInstance processInstance = runtimeService.startProcessInstanceByKey("githubsearch", variables);

The variables we pass when starting the process instance are regular variables. They are persisted and audit history will be kept, as there is no reason why this shouldn’t be the case.

The first step that is executed is the ‘execute HTTP call’ step, which is a Service Task with a Java delegate:

<serviceTask name="Execute HTTP call" activiti:class="org.activiti.ExecuteHttpCallDelegate"></serviceTask>

Java code:

public class ExecuteHttpCallDelegate implements JavaDelegate {

    public void execute(DelegateExecution execution) {

        String keyword = (String) execution.getVariable("keyWord");
        String language = (String) execution.getVariable("language");

        String url = "https://api.github.com/search/repositories?q=%s+language:%s&sort=stars&order=desc";
        url = String.format(url, keyword, language);
        HttpGet httpget = new HttpGet(url);

        // Try-with-resources makes sure both the client and the response are closed
        try (CloseableHttpClient httpclient = HttpClients.createDefault();
             CloseableHttpResponse response = httpclient.execute(httpget)) {

            // Store the raw response and status code as transient variables:
            // they behave like regular variables, but won't be persisted
            execution.setTransientVariable("response", IOUtils.toString(response.getEntity().getContent(), "UTF-8"));
            execution.setTransientVariable("responseStatus", response.getStatusLine().getStatusCode());

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

}

Here, we’re doing a simple HTTP GET against the GitHub API, using the ‘keyWord’ and ‘language’ variables we passed on process instance start. Special here is that we store the response and response status in transient variables (note the setTransientVariable() calls). The reasons for choosing transient variables here are:

  • The JSON response from the GitHub API is very large. It can of course be stored in a persistent way, but that won’t do performance any good.
  • From an audit point of view, the whole response matters very little. We’ll extract the important bits from that response later, and those will be stored in the historical data.

After getting the response and storing it in a transient variable, we pass the exclusive gateway. The sequence flow looks like this:

<sequenceFlow ... >
  <extensionElements>
    <activiti:executionListener event="take" class="org.activiti.ProcessResponseExecutionListener"></activiti:executionListener>
  </extensionElements>
  <conditionExpression xsi:type="tFormalExpression"><![CDATA[${responseStatus == 200}]]></conditionExpression>
</sequenceFlow>

Note that for the sequence flow condition there is no difference between using a transient or a non-transient variable: a regular getVariable will also return the transient variable with that name, if set (this is the shadowing behavior from the docs mentioned above). A getTransientVariable also exists, for when only the transient set of variables should be consulted. Anyway: for the condition, no difference at all.

You can also see that the sequence flow has an execution listener (hidden in the diagram). The execution listener will parse the JSON response, select the relevant bits and store these in a transient ArrayList. This is important, for reasons explained below the code.

public class ProcessResponseExecutionListener implements ExecutionListener {

    private ObjectMapper objectMapper = new ObjectMapper();

    public void notify(DelegateExecution execution) {

        List<String> searchResults = new ArrayList<String>();

        String response = (String) execution.getVariable("response");
        try {
            JsonNode jsonNode = objectMapper.readTree(response);
            JsonNode itemsNode = jsonNode.get("items");
            if (itemsNode != null && itemsNode.isArray()) {
                for (JsonNode itemNode : (ArrayNode) itemsNode) {
                    String url = itemNode.get("html_url").asText();
                    searchResults.add(url);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        execution.setTransientVariable("searchResults", searchResults);
    }

}

The reason for storing the list as a transient variable is an important one. As you can see in the diagram, a multi-instance subprocess follows. A multi-instance subprocess takes a collection variable to create the instances. So far, this had to be a persistent variable, in the form of a Java-serialized ArrayList, which has always bothered me (I used delegateExpressions with a bean whenever I had to do this before). Now, the ArrayList is transient and won’t be stored in the data store:

  <subProcess name="subProcess">
    <multiInstanceLoopCharacteristics isSequential="false" 
          activiti:collection="searchResults" activiti:elementVariable="searchResult" />   

Do note that the ‘searchResult’ variable above will be a persistent variable.

Note that the transient variables will be there until the user task is reached and the state is stored in the data store. It’s also possible to pass transient variables when starting a process instance (which could have been an option here, but I think storing the user input is something you’d want in your audit data).
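For completeness, a sketch of what that could look like. The builder-style API below is an assumption for illustration only; the exact way to pass transient variables at start is described in the documentation linked above:

// Hypothetical sketch: passing a transient variable at process instance start.
// Check the Activiti v6 documentation for the exact API.
ProcessInstance processInstance = runtimeService.createProcessInstanceBuilder()
    .processDefinitionKey("githubsearch")
    .variable("keyWord", "workflow")         // persistent, kept in the audit data
    .transientVariable("language", "java")   // gone after the first wait state
    .start();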

If you’d run the process instance like this for example:

Map<String, Object> variables = new HashMap<String, Object>();
variables.put("keyWord", "workflow");
variables.put("language", "java");
ProcessInstance processInstance = runtimeService.startProcessInstanceByKey("githubsearch", variables);

List<Task> tasks = taskService.createTaskQuery().processInstanceId(processInstance.getId()).list();
for (Task task : tasks) {
  System.out.println("Current task : " + task.getName());
}

This gives, for example, the following output (limited to 5 results):

Current task : Review result https://github.com/Activiti/Activiti
Current task : Review result https://github.com/twitter/ambrose
Current task : Review result https://github.com/azkaban/azkaban
Current task : Review result https://github.com/romannurik/AndroidDesignPreview
Current task : Review result https://github.com/spring-projects/spring-xd

The user could now look into the details of each of the results and continue the process.

Last Words

As you can imagine, there are quite a few use cases for transient variables. I know that for many people this was an important feature, so I’m glad it’s out there now. Feedback and comments are, as usual, more than welcome!

How the Secure Scripting in Activiti works

One of the prominent features of the recent Activiti 5.21.0 release is ‘secure scripting’. The way to enable and use this feature is documented in detail in the Activiti user guide. In this post, I’ll show you how we came to its final implementation and what it’s doing under the hood. And of course, as is my usual style, we’ll also have a bit of a look at the performance.

The Problem

The Activiti engine has supported scripting for script tasks (and task/execution listeners) for a long time. The scripts that are used are defined in the process definition, and they can be executed directly after deploying the process definition, which is something many people like. This is a big difference from Java delegate classes or delegate expressions, which generally require putting the actual logic on the classpath. That in itself already introduces some sort of ‘protection’, as generally only a power user can do that.

However, with scripts, no such ‘extra step’ is needed. If you give the power of script tasks to end users (and we know from some of our users that companies do have this use case), all bets are pretty much off. You can shut down the JVM or do malicious things simply by executing a process instance.

A second problem is that it’s quite easy to write a script that loops infinitely and never ends. A third problem is that a script can easily use a lot of memory when executed and hog a lot of system resources.

Let’s look at the first problem for starters. First of all, let’s add the latest and greatest Activiti engine dependency and the H2 in-memory database library:

<dependencies>
  <dependency>
    <groupId>org.activiti</groupId>
    <artifactId>activiti-engine</artifactId>
    <version>5.21.0</version>
  </dependency>
  <dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <version>1.3.176</version>
  </dependency>
</dependencies>

The process we’ll use here is trivially simple: just a start event, a script task and an end event. The process is not really the point here, the script execution is.

The first script we’ll try does two things: it will get and display my machine’s current network configuration (there are obviously more dangerous applications of this idea) and then shut down the whole JVM. Of course, in a proper setup, some of this will be mitigated by making sure that the user running the logic does not have any rights that matter on the machine (but that doesn’t solve the resource-hogging issue). I think this demonstrates pretty well why giving the power of scripts to just about anyone is really bad security-wise.

<scriptTask id="myScriptTask" scriptFormat="javascript">
  <script>
    var s = new java.util.Scanner(java.lang.Runtime.getRuntime().exec("ifconfig").getInputStream()).useDelimiter("\\A");
    var output = s.hasNext() ? s.next() : "";
    java.lang.System.out.println("--- output = " + output);
    java.lang.System.exit(1);
  </script>
</scriptTask>

Let’s deploy the process definition and execute a process instance:

public class Demo1 {

    public static void main (String[] args) {

        // Build engine and deploy
        ProcessEngine processEngine = new StandaloneInMemProcessEngineConfiguration().buildProcessEngine();
        RepositoryService repositoryService = processEngine.getRepositoryService();
        repositoryService.createDeployment().addClasspathResource("process.bpmn20.xml").deploy();

        // Start process instance
        RuntimeService runtimeService = processEngine.getRuntimeService();
        runtimeService.startProcessInstanceByKey("myProcess");
    }
}

This gives the following output (shortened here):

— output = eth0 Link encap:Ethernet
inet addr:192.168.0.114 Bcast:192.168.0.255 Mask:255.255.255.0

Process finished with exit code 1

It outputs information about all my network interfaces and then shuts down the whole JVM. Yikes. That’s scary.

Trying Nashorn

The solution to our first problem is that we need to whitelist what we want to expose in a script, and have everything blacklisted by default. This way, users won’t be able to run any class or method that can do something malicious.

In Activiti, when a JavaScript script task is part of a process definition, we hand this script to the JavaScript engine that is embedded in the JDK, using the ScriptEngine class. In JDK 6/7 this was Rhino; in JDK 8 it is Nashorn. I first did some serious googling to find a solution for Nashorn (as this would be more future-proof). Nashorn does have a ‘class filter’ concept to effectively implement whitelisting. However, the ScriptEngine abstraction does not have any facilities to actually tweak or configure the Nashorn engine, so we’ll have to do some low-level magic to get it working.

Instead of using the default Nashorn scripting engine, we instantiate the Nashorn scripting engine ourselves in a ‘SecureScriptTask’ (which is a regular JavaDelegate). Note the use of the jdk.nashorn.* package (not really nice). We follow the docs from https://docs.oracle.com/javase/8/docs/technotes/guides/scripting/nashorn/api.html to make the script execution more secure by adding a ‘ClassFilter’ to the Nashorn engine. This effectively acts as a whitelist of approved classes that can be used in the script.

public class SafeScriptTaskDemo2 implements JavaDelegate {

    private Expression script;

    public void execute(DelegateExecution execution) throws Exception {
        NashornScriptEngineFactory factory = new NashornScriptEngineFactory();
        ScriptEngine scriptEngine = factory.getScriptEngine(new SafeClassFilter());

        ScriptingEngines scriptingEngines = Context
                .getProcessEngineConfiguration()
                .getScriptingEngines();

        Bindings bindings = scriptingEngines.getScriptBindingsFactory().createBindings(execution, false);
        scriptEngine.eval((String) script.getValue(execution), bindings);

        System.out.println("Java delegate done");
    }

    public static class SafeClassFilter implements ClassFilter {

        public boolean exposeToScripts(String s) {
            // Expose nothing: every class is blocked by default
            return false;
        }
        }

    }

}

When executed, the script above won’t run; an exception is thrown stating ‘Exception in thread “main” java.lang.RuntimeException: java.lang.ClassNotFoundException: java.lang.System.out.println’.

Note that the ClassFilter is only available from JDK 1.8.0_40 (quite recent!).
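The filter above simply blocks everything. Turning it into an actual whitelist is a matter of checking the incoming class name against a set of approved classes. A minimal sketch (the whitelisted classes are just examples):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import jdk.nashorn.api.scripting.ClassFilter;

// Sketch: only the listed classes are exposed to scripts,
// everything else stays blocked by default.
public class WhitelistClassFilter implements ClassFilter {

    private final Set<String> whitelist = new HashSet<String>(
            Arrays.asList("java.util.ArrayList", "java.util.HashMap"));

    public boolean exposeToScripts(String className) {
        return whitelist.contains(className);
    }
}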

However, this doesn’t solve our second problem with infinite loops. Let’s execute a simple script:

while (true) {
  print("Hello");
}

You can guess what this’ll do: it runs forever. If you’re lucky, a transaction timeout will happen, as the script task is executed in a transaction. But that’s far from a decent solution, as it hogs CPU resources for a while doing nothing.

The third problem, using a lot of memory, is also easy to demonstrate:

var array = []
for(var i = 0; i < 2147483647; ++i) {
  array.push(i);
  java.lang.System.out.println(array.length);
}

When starting the process instance, the memory quickly fills up (starting from only a couple of MB) and eventually ends with an OutOfMemoryError:

Exception in thread “main” java.lang.OutOfMemoryError: GC overhead limit exceeded

Switching to Rhino

Between the previous example and the following one, a lot of time was spent trying to make Nashorn somehow intercept or cope with the infinite loop/memory usage. However, after extensive searching and experimenting, it seems those features simply are not (yet?) in Nashorn. A quick search will teach you that we’re not the only ones looking for a solution to this. Often, it is mentioned that Rhino did have features on board to solve this.

For example, in JDK < 8, the Rhino JavaScript engine had the ‘instructionCount’ callback mechanism, which is not present in Nashorn. It basically gives you a way to execute logic in a callback that is automatically called every x instructions (bytecode instructions!). I first tried (and lost a lot of time) to mimic the instructionCount idea with Nashorn, for example by prettifying the script first (because people could write the whole script on one line) and then injecting a line of code into the script that triggers a callback. However, that was 1) not very straightforward to do and 2) one would still be able to write a single-line instruction that runs infinitely or uses a lot of memory.

Being stuck there, the search led us to the Rhino engine from Mozilla. Since its inclusion in the JDK a long time ago, it has actually evolved further on its own, while the version in the JDK wasn’t updated with those changes! After reading up on the (quite sparse) Rhino docs, it became clear that Rhino has a far richer feature set with regard to our use case.

The ClassFilter from Nashorn matched the ‘ClassShutter’ concept in Rhino. The CPU and memory problems were solved using the callback mechanism of Rhino: you can define a callback that is called every x instructions. This means that one line could be hundreds of bytecode instructions and we still get a callback every x instructions, which makes it an excellent candidate for monitoring CPU and memory usage while executing the script.
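To give an idea of what this looks like in code, here is a simplified sketch of the Rhino mechanism (not the actual Activiti implementation, which is linked below; the threshold and timeout values are arbitrary examples): subclass Rhino’s ContextFactory and abort a script once it runs too long.

import org.mozilla.javascript.Context;
import org.mozilla.javascript.ContextFactory;

// Sketch of Rhino's instruction-count callback: abort any script that runs
// longer than a maximum execution time.
public class TimeLimitedContextFactory extends ContextFactory {

    private static final int INSTRUCTION_THRESHOLD = 10;
    private static final long MAX_EXECUTION_TIME_MS = 3000L;

    @Override
    protected Context makeContext() {
        Context context = super.makeContext();
        // Rhino will call observeInstructionCount every N bytecode instructions
        context.setInstructionObserverThreshold(INSTRUCTION_THRESHOLD);
        // Record a start time when the context is created (good enough for a sketch)
        context.putThreadLocal("startTime", System.currentTimeMillis());
        return context;
    }

    @Override
    protected void observeInstructionCount(Context context, int instructionCount) {
        long start = (Long) context.getThreadLocal("startTime");
        if (System.currentTimeMillis() - start > MAX_EXECUTION_TIME_MS) {
            // Throw an Error (not an Exception) so scripts can't catch it
            throw new Error("Maximum script execution time exceeded");
        }
    }
}

A similar check on memory usage in the same callback, plus a ClassShutter for the whitelisting, completes the picture.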

If you are interested in our implementation of these ideas in the code, have a look here.

This does mean that whatever JDK version you are using, you will not be using the embedded javascript engine, but always Rhino.

Trying it out

To use the new secure scripting feature, add the following dependency:

<dependency>
  <groupId>org.activiti</groupId>
  <artifactId>activiti-secure-javascript</artifactId>
  <version>5.21.0</version>
</dependency>

This will transitively include the Rhino engine. It also enables the SecureJavascriptConfigurator, which needs to be configured before creating the process engine:

SecureJavascriptConfigurator configurator = new SecureJavascriptConfigurator()
  .setWhiteListedClasses(new HashSet<String>(Arrays.asList("java.util.ArrayList")))
  .setMaxStackDepth(10)
  .setMaxScriptExecutionTime(3000L)
  .setMaxMemoryUsed(3145728L)
  .setNrOfInstructionsBeforeStateCheckCallback(10);

ProcessEngine processEngine = new StandaloneInMemProcessEngineConfiguration()
  .addConfigurator(configurator)
  .buildProcessEngine();

This will configure the secure scripting to:

  • Check the CPU execution time and memory usage every 10 instructions
  • Give the script 3 seconds and 3 MB to execute
  • Limit the stack depth to 10 (to avoid endless recursion)
  • Expose java.util.ArrayList as a class that is safe to use in the scripts

Running the script from above that tries to read the ifconfig and shut down the JVM leads to:

TypeError: Cannot call property getRuntime in object [JavaPackage java.lang.Runtime]. It is not a function, it is “object”.

Running the infinite loop script from above gives

Exception in thread “main” java.lang.Error: Maximum variableScope time of 3000 ms exceeded

And running the memory usage script from above gives

Exception in thread “main” java.lang.Error: Memory limit of 3145728 bytes reached

And hurray! The problems defined above are solved 🙂

Performance

I did a very unscientific quick check, and I almost didn’t dare to share it, as the results go against what I assumed would happen.

I created a quick main method that runs a process instance with a script task 10,000 times:

public class PerformanceUnsecure {

    public static void main (String[] args) {

        ProcessEngine processEngine = new StandaloneInMemProcessEngineConfiguration().buildProcessEngine();

        RepositoryService repositoryService = processEngine.getRepositoryService();
        repositoryService.createDeployment().addClasspathResource("performance.bpmn20.xml").deploy();

        Random random = new Random();

        RuntimeService runtimeService = processEngine.getRuntimeService();

        int nrOfRuns = 10000;
        long total = 0;

        for (int i=0; i<nrOfRuns; i++) {
            Map<String, Object> variables = new HashMap<String, Object>();
            variables.put("a", random.nextInt());
            variables.put("b", random.nextInt());
            long start = System.currentTimeMillis();
            runtimeService.startProcessInstanceByKey("myProcess", variables);
            long end = System.currentTimeMillis();
            total += (end - start);
        }
        System.out.println("Finished process instances : " + processEngine.getHistoryService().createHistoricProcessInstanceQuery().count());
        System.out.println("Total time = " + total + " ms");
        System.out.println("Avg time/process instance = " + ((double)total/(double)nrOfRuns) + " ms");
    }

}

The process definition is just a start -> script task -> end. The script task simply adds two variables and saves the result in a third variable:

<scriptTask id="myScriptTask" scriptFormat="javascript">
  <script>
    var c = a + b;
    execution.setVariable('c', c);
  </script>
</scriptTask>

I ran this five times, and got an average of 2.57 ms / process instance. This is on a recent JDK 8 (so Nashorn).

Then I switched the first couple of lines above to use the new secure scripting, thus switching to Rhino plus the security features enabled:

SecureJavascriptConfigurator configurator = new SecureJavascriptConfigurator()
  .addWhiteListedClass("org.activiti.engine.impl.persistence.entity.ExecutionEntity")
  .setMaxStackDepth(10)
  .setMaxScriptExecutionTime(3000L)
  .setMaxMemoryUsed(3145728L)
  .setNrOfInstructionsBeforeStateCheckCallback(1);

ProcessEngine processEngine = new StandaloneInMemProcessEngineConfiguration()
  .addConfigurator(configurator)
  .buildProcessEngine();

I again did five runs … and got 1.07 ms / process instance, which is more than twice as fast for the same thing.

Of course, this is not a real test. I assumed the Rhino execution would be slower, what with the class whitelist checking and the callbacks … but no such thing. Maybe this particular case is simply one that is better suited for Rhino. If anyone can explain it, please leave a comment. But it is an interesting result nonetheless.

Conclusion

If you are using scripts in your process definition, do read up on this new secure scripting feature in the engine. As this is a new feature, feedback and improvements are more than welcome!

Activiti 5.19.0.2 released

We just released Activiti 5.19.0.2.

This is a bugfix release with the following changes:

  • Security fix around using commons-collection (more info see the commit)
  • Upgrade to MyBatis 3.3.0
  • Enhancing AsyncExecutor for high load scenarios (especially when the job queue is full)
  • A few small bug fixes

Multi-Tenancy with separate database schemas in Activiti

One feature request we’ve heard in the past is that of running the Activiti engine in a multi-tenant way where the data of one tenant is isolated from the others. In certain cloud/SaaS environments this is certainly a must.

A couple of months ago I was approached by Raphael Gielen, a student at the University of Bonn working on a master’s thesis about multi-tenancy in Activiti. We got together in a co-working coffee bar a couple of weeks ago, bounced ideas and hacked together a first prototype with database schema isolation for tenants. Very fun :-).

Anyway, we’ve been refining and polishing that code and committed it to the Activiti codebase. Let’s have a look at the existing ways of doing multi-tenancy with Activiti in the first two sections below. In the third section, we’ll dive into the new multi-tenant multi-schema feature, sprinkled with some real working code examples!

Shared Database Multi-tenancy

Activiti has been multi-tenant capable for a while now (since version 5.15). The approach taken was that of a shared database: there are one or more Activiti engines and they all go to the same database. Each entry in the database tables has a tenant identifier, which is best understood as a sort of tag for that data. The Activiti engine and APIs then read and use that tenant identifier to perform their various operations in the context of a tenant.

For example, as shown in the picture below, two different tenants can have a process definition with the same key. The engine and APIs make sure there is no mix-up of data.

[Diagram: two tenants sharing one database, each with a process definition with the same key]

The benefit of this approach is the simplicity of deployment, as there is no difference from setting up a ‘regular’ Activiti engine. The downside is that you have to remember to use the right API calls (i.e. those that take the tenant identifier into account). Also, it has the same problem as any system with shared resources: there will always be competition for the resources between tenants. In most use cases this is fine, but there are use cases that can’t be done in this way, like giving certain tenants more or fewer system resources.

Multi-Engine Multi-Tenancy

Another approach, possible since the very first version of Activiti, is simply having one engine instance for each tenant:

[Diagram: one engine and one database per tenant]

In this setup, each tenant can have a different resource configuration or even run on a different physical server. Each engine in this picture can of course be multiple engines, for more performance/failover/etc. The benefit now is that the resources are tailored to the tenant. The downside is the more complex setup (multiple database schemas, a different configuration file for each tenant, etc.). Each engine instance will take up memory (but that’s very low with Activiti). Also, you’d need to write some routing component that somehow knows the current tenant context and routes to the correct engine:

[Diagram: a routing component dispatching each request to the engine of the correct tenant]

Multi-Schema Multi-Tenancy

The latest addition to the Activiti multi-tenancy story landed two weeks ago (here’s the commit), simultaneously in version 5 and 6. Here, there is a database (schema) for each tenant, but only one engine instance. Again, in practice there might be multiple instances for performance/failover/etc., but the concept is the same:

[Diagram: a single engine instance with a separate database schema per tenant]

The benefit is obvious: there is but one engine instance to manage and configure, and the APIs are exactly the same as with a non-multi-tenant engine. But foremost, the data of a tenant is completely separated from the data of other tenants. The downside (similar to the multi-engine multi-tenant approach) is that someone needs to manage and configure different databases. But the complex engine management is gone.

The commit I linked to above also contains a unit test showing how the Multi-Schema Multi-Tenant engine works.

Building the process engine is easy, as there is a MultiSchemaMultiTenantProcessEngineConfiguration that abstracts away most of the details:


config = new MultiSchemaMultiTenantProcessEngineConfiguration(tenantInfoHolder);

config.setDatabaseType(MultiSchemaMultiTenantProcessEngineConfiguration.DATABASE_TYPE_H2);
config.setDatabaseSchemaUpdate(MultiSchemaMultiTenantProcessEngineConfiguration.DB_SCHEMA_UPDATE_DROP_CREATE);
    
config.registerTenant("alfresco", createDataSource("jdbc:h2:mem:activiti-mt-alfresco;DB_CLOSE_DELAY=1000", "sa", ""));
config.registerTenant("acme", createDataSource("jdbc:h2:mem:activiti-mt-acme;DB_CLOSE_DELAY=1000", "sa", ""));
config.registerTenant("starkindustries", createDataSource("jdbc:h2:mem:activiti-mt-stark;DB_CLOSE_DELAY=1000", "sa", ""));
    
processEngine = config.buildProcessEngine();

This looks quite similar to booting up a regular Activiti process engine instance. The main difference is that we’re registering tenants with the engine. Each tenant needs to be added with its unique tenant identifier and a DataSource implementation. The DataSource implementation of course needs to have its own connection pooling. This means you can effectively give certain tenants a different connection pool configuration, depending on their use case. The Activiti engine will make sure each database schema has been either created or validated to be correct.
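The createDataSource helper used above is part of the unit test and not shown here. Any pooled javax.sql.DataSource will do; a minimal sketch using HikariCP (an arbitrary choice for illustration) could look like this:

import javax.sql.DataSource;
import com.zaxxer.hikari.HikariDataSource;

// Sketch: build a pooled DataSource for one tenant. The pool settings can
// differ per tenant, e.g. to give one tenant more database connections.
private static DataSource createDataSource(String jdbcUrl, String username, String password) {
    HikariDataSource dataSource = new HikariDataSource();
    dataSource.setJdbcUrl(jdbcUrl);
    dataSource.setUsername(username);
    dataSource.setPassword(password);
    dataSource.setMaximumPoolSize(10); // tune per tenant
    return dataSource;
}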

The magic that makes this all work is the TenantAwareDataSource. This is a javax.sql.DataSource implementation that delegates to the correct datasource depending on the current tenant identifier. The idea for this class was heavily influenced by Spring’s AbstractRoutingDataSource (standing on the shoulders of other open source projects!).

The routing to the correct datasource is done by getting the current tenant identifier from the TenantInfoHolder instance. As you can see in the code snippet above, this is also a mandatory argument when constructing a MultiSchemaMultiTenantProcessEngineConfiguration. The TenantInfoHolder is an interface you need to implement, depending on how users and tenants are managed in your environment. Typically you’d use a ThreadLocal to store the current user/tenant information (much like Spring Security does) that gets filled by some security filter. This class effectively acts as the ‘routing component’ in the picture below:

[Diagram: the TenantInfoHolder routing each call to the datasource of the current tenant]
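As an illustration, a bare-bones ThreadLocal-based implementation could look like the sketch below. The method names are assumptions based on how the interface is described here; check the actual TenantInfoHolder interface for the exact signatures:

import java.util.Collection;
import java.util.concurrent.CopyOnWriteArraySet;

// Sketch of a ThreadLocal-based TenantInfoHolder: a security filter sets the
// tenant on the request thread before any Activiti API is called.
public class ThreadLocalTenantInfoHolder implements TenantInfoHolder {

    private final Collection<String> tenants = new CopyOnWriteArraySet<String>();
    private final ThreadLocal<String> currentTenant = new ThreadLocal<String>();

    public void addTenant(String tenantId) {
        tenants.add(tenantId);
    }

    public Collection<String> getAllTenants() {
        return tenants;
    }

    public void setCurrentTenantId(String tenantId) {
        currentTenant.set(tenantId);
    }

    public String getCurrentTenantId() {
        return currentTenant.get();
    }

    public void clearCurrentTenantId() {
        currentTenant.remove();
    }
}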

In the unit test example, we indeed use a ThreadLocal to store the current tenant identifier, and fill it up with some demo data:

 private void setupTenantInfoHolder() {
    DummyTenantInfoHolder tenantInfoHolder = new DummyTenantInfoHolder();
    
    tenantInfoHolder.addTenant("alfresco");
    tenantInfoHolder.addUser("alfresco", "joram");
    tenantInfoHolder.addUser("alfresco", "tijs");
    tenantInfoHolder.addUser("alfresco", "paul");
    tenantInfoHolder.addUser("alfresco", "yvo");
    
    tenantInfoHolder.addTenant("acme");
    tenantInfoHolder.addUser("acme", "raphael");
    tenantInfoHolder.addUser("acme", "john");
    
    tenantInfoHolder.addTenant("starkindustries");
    tenantInfoHolder.addUser("starkindustries", "tony");
    
    this.tenantInfoHolder = tenantInfoHolder;
  }

We now start some process instances, while also switching the current tenant identifier. In practice, you have to imagine that multiple threads come in with requests and set the current tenant identifier based on the logged-in user:

startProcessInstances("joram");
startProcessInstances("joram");
startProcessInstances("raphael");
completeTasks("raphael");

The startProcessInstances method above will set the current user and tenant identifier and start a few process instances, using the standard Activiti API as if there was no multi-tenancy at all (the completeTasks method similarly completes a few tasks).
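A sketch of what such a method might look like, using the hypothetical ThreadLocalTenantInfoHolder from above (the real method lives in the linked unit test; the process key and tenant lookup here are made up):

// Sketch: switch the thread's tenant context for the user, then use the
// standard Activiti API as if there were no multi-tenancy at all.
private void startProcessInstances(String userId) {
    String tenantId = lookupTenantForUser(userId); // hypothetical lookup
    tenantInfoHolder.setCurrentTenantId(tenantId);
    try {
        runtimeService.startProcessInstanceByKey("oneTaskProcess");
    } finally {
        tenantInfoHolder.clearCurrentTenantId(); // don't leak the tenant context
    }
}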

Also pretty cool is that you can dynamically register (and delete) new tenants, using the same method that was used when building the process engine. The Activiti engine will make sure the database schema is either created or validated.

config.registerTenant("dailyplanet", createDataSource("jdbc:h2:mem:activiti-mt-daily;DB_CLOSE_DELAY=1000", "sa", ""));

Here’s a movie showing the unit test being run and the data effectively being isolated.

Multi-Tenant Job Executor

The last piece of the puzzle is the job executor. Regular Activiti API calls ‘borrow’ the current thread to execute their operations and thus can use any user/tenant context that has been set on the thread beforehand.

The job executor, however, runs using a background threadpool and has no such context. Since the AsyncExecutor in Activiti is an interface, it isn’t hard to implement a multi-schema multi-tenant job executor. Currently, we’ve added two implementations. The first implementation is called the SharedExecutorServiceAsyncExecutor:

config.setAsyncExecutorEnabled(true);
config.setAsyncExecutorActivate(true);
config.setAsyncExecutor(new SharedExecutorServiceAsyncExecutor(tenantInfoHolder));

This implementation (as the name implies) uses one threadpool for all tenants. Each tenant does have its own job acquisition threads, but once a job is acquired, it is put on the shared threadpool. The benefit of this system is that the number of threads being used by Activiti is constrained.

The second implementation is called the ExecutorPerTenantAsyncExecutor:

config.setAsyncExecutorEnabled(true);
config.setAsyncExecutorActivate(true);
config.setAsyncExecutor(new ExecutorPerTenantAsyncExecutor(tenantInfoHolder));

As the name implies, this class acts as a ‘proxy’ AsyncExecutor. For each registered tenant, a complete default AsyncExecutor is booted, each with its own acquisition threads and execution threadpool. The ‘proxy’ simply delegates to the right AsyncExecutor instance. The benefit of this approach is that each tenant can have a fine-grained job executor configuration, tailored to the needs of that tenant.

Conclusion

As always, all feedback is more than welcome. Do give the multi-schema multi-tenancy a go and let us know what you think and what could be improved for the future!

Well worth reading

I hardly ever post links to other blogs here, but I felt that this deserved more attention than a regular tweet:

Future of Programming – Rise of the Scientific Programmer (and fall of the craftsman)

Many of the ideas written there resonate very well with me and are similar to what I’ve been pondering and saying recently.

We’re spending our time bickering about frameworks and cool development things, while we hardly spend time on what we could do with the (huge amount of) data our software produces and how it could be used to improve the day-to-day work of people.

Some nice quotes from the article:

 I think TDD and agile has given us a safety net that as a tightrope walker, instead of focusing on our walking technique, we improve the safety net. As long as we do the motions, we are safe. Unit tests, coverage, planning poker, retrospective, definition of done, Story, task, creating tickets, moving tickets. How many bad programmers have you seen that are masters of agile?

We got to grow up and go back to school, relearn all about Maths, statistics, and generally scientific reasoning. We need to man up and re-learn that being a good coder has nothing to do with the number of stickers you have at the back of your Mac.

Food for thought.

Five years of blogging

I recently realized that I’ve had this blog for five years already. More precisely, on March 31, 2008, I published my first post here.

It was my project lead at the time who encouraged me to write down some stuff I had done with jBPM. He is now a really good friend of mine, and it is funny to look back at how different things were five years ago. In that same period of time I got a house, got married, had two kids, got a dog, was hired by JBoss to work on jBPM and later joined Alfresco to build Activiti.

I remember that my first posts on this blog caused quite a stir in the company I was working at back then. Something about ‘company secrets’ or something. Yeah, it was five years ago, I told you. It’s good to see they have changed their attitude towards blogging, seeing how my ex-colleagues freely blog now.

Over the past five years, my blog has grown really nicely. Below you can see an overview of the unique visitors per month. There’s a jump in June 2009, when I joined JBoss. For a long time, I’ve been floating around a steady 5K line, but in recent months traffic is surely on the rise, as shown by the slope on the right (8,850 unique visits last month). It really amazes me that a simple blog like this can attract this amount of people. Really humbling. As such, it really motivates me to continue posting about what I’m working on from day to day.

[Chart: unique visitors per month over the past five years]

In those five years, I wrote 115 posts (that’s 23 posts on average each year), and 223,211 people have visited this blog, of which 128,222 were unique visitors. If you think about that number of people … that’s just amazing. That is more people than the number of inhabitants of Bruges, one of the more famous cities in Belgium. But more importantly, over those five years the average time on the blog is 2:21 minutes. So people really take their time to read what I write! In the age of Twitter that means a lot.

The image below shows where all the visits are coming from.

[Map: where the visits come from]

The top 10 looks as follows:

  1. USA (most traffic from California, New York and Texas)
  2. Germany
  3. India (most traffic from Karnataka, which is the state where Bangalore is)
  4. Belgium (yay!)
  5. China (most traffic from Beijing, Shanghai and Shenzhen)
  6. France
  7. UK
  8. Poland
  9. Italy
  10. Spain

The top 3 is no surprise, as those are also the countries where we see most requests from in the Activiti community.

I can say nothing more than that I’m really humbled that you all take the time to read what I write down here.

Thank you. Thank you, all 128,222 of you!

Activiti performance showdown update

A few weeks ago, I wrote a lengthy post on the benchmark results of Activiti, called ‘The Activiti Performance Showdown‘. In that post, I asked the community whether someone could help me with testing on a ‘real’ Oracle database system, as I only had an Oracle XE installation available.

And so it happened. I was contacted by Jure Grom, who had an Oracle installation at his disposal (managed by a real DBA). He ran the benchmark in the same way as I did in my original article. Of course, the machines are different, so comparing them is hard. But you can definitely learn something from the patterns and the graphs that were produced.

Interestingly, the benchmark was done on two machines: one dedicated DB machine (two six-core AMD Opterons with 32GB RAM) and one for running the processes (a quad-core Intel Xeon 3.07GHz with 4GB RAM). In my original benchmark, there was no network latency, as both ran on the same machine. Not to keep you in suspense any longer, here are the numbers:

[Charts: Activiti 5.10 – real Oracle DB by Jure Grom – default config, no history]

Like I said before, it’s hard to compare with the numbers of the original article due to the different machines and the network latency. The single-threaded tests are slower than on my setup, but I think that is due to the network here. Which proves that even with network latency, Activiti performs very, very well!

But if you compare the numbers with my Oracle results, you can see that my suspicion about the crippled Oracle XE was right. Once you go up to a ‘realistic’ number of threads (around 8, it seems here), the results of the real Oracle installation start to blow away the other ones. The throughput is much, much higher, while the average timings still stay very acceptable.

Jure also extended the tests to run up to 20 threads (my tests only went to 10), and the numbers prove that a real Oracle installation scales nicely when hit with many threads at the same time. Take, for example, a look at the numbers for the last run, with 20 threads:

| Process                   | Nr of executions | Total time (ms) | Average (ms) | Throughput/sec | Throughput/hour |
|---------------------------|------------------|-----------------|--------------|----------------|-----------------|
| process-multi-instance-01 | 2500             | 33429           | 252.41       | 74.79          | 269,227.32      |
| process-usertask-01       | 2500             | 5005            | 31.38        | 499.50         | 1,798,201.80    |
| process-usertask-02       | 2500             | 22431           | 145.99       | 111.45         | 401,230.44      |
| process-usertask-03       | 2500             | 9056            | 64.93        | 276.06         | 993,816.25      |
| process01                 | 2500             | 1353            | 10.84        | 1847.75        | 6,651,884.70    |
| process02                 | 2500             | 135             | 1.06         | 18518.52       | 66,666,666.67   |
| process03                 | 2500             | 882             | 7.03         | 2834.47        | 10,204,081.63   |
| process04                 | 2500             | 1948            | 15.54        | 1283.37        | 4,620,123.20    |
| process05                 | 2500             | 134             | 1.06         | 18656.72       | 67,164,179.10   |
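To read the table: the throughput columns follow directly from the number of executions and the total wall time. For the first row, for example, 2500 executions in 33,429 ms gives 2500 / 33.429 s ≈ 74.79 process instances per second, and 74.79 × 3600 ≈ 269,227 per hour.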

The throughput numbers are better than any of the MySQL results at this high concurrency. Of course, I know, you can’t compare the machines at all. But the main conclusion of the original article remains: Activiti performs and scales very well, and is extremely capable of coping with high-concurrency scenarios.

Many, many thanks to Jure Grom for his efforts and time!

‘Activiti in Action’ book by Tijs Rademakers is out!

Activiti enthusiasts all over the globe: rejoice!

After many months of hard work, my colleague and fellow Activiti core developer Tijs Rademakers has finished his magnum opus: the book ‘Activiti in Action: Executable business processes in BPMN 2.0’ is out!

Check the website of Manning to buy the ebook or the print version. Or both. Because I can personally guarantee you that it is worth every single cent. I’ve been proofreading the book since the very beginning, and I’ve witnessed it grow into a book of a level of quality that very few can match. Very clearly written, with great example code everywhere, this book will make you an Activiti master.

I’m also very grateful that Tijs thought me capable of writing the foreword for the book. I’m going to paste an excerpt of the full foreword here, because it is exactly my opinion of the book:

‘Tijs does an outstanding job of covering every facet of Activiti in great detail, and I’m excited and thankful that he put so much time into this book project. Software and open source frameworks in general rise or fall with the available documentation, and it’s my belief that this is a superb book that provides much-needed, detailed information. There currently is no better source of knowledge on Activiti and BPMN 2.0. Period.’

Here’s the link again. Don’t hesitate 😉

Maven and Activiti users: repository url has changed!

If you are an Activiti and Maven user, this will most definitely concern you.

As of this morning, the Powers That Be have decided to upgrade our Maven repository. As such, the old URL (which apparently was an internal URL not meant to be spread…) will not work anymore. You now have to use the following URL for the repository:


<repositories>
  <repository>
    <id>Alfresco</id>
    <name>Alfresco</name>
    <url>http://maven.alfresco.com/nexus/content/groups/public</url>
  </repository>
</repositories>

Sorry for the inconvenience.

More info/comments: see the post on the Activiti forum.

Does a bigger monitor make me more productive?

Two monitors = productivity++

It’s a known fact that multiple monitors make developers more productive: IDE on one screen, browser with docs/webapp result on the other. It’s fairly easy to understand why this makes you more productive as a developer.

I’ve been using two monitors non-stop for the past four years now (before that, as a consultant, not every customer allowed it) and I would really have a hard time switching back to a single monitor. On a typical day, the monitor in front of me has the IDE and a few terminal windows open all day, while the one on the right has Skype/email/browser open.

Since I started working from my home office, I’ve been using the same setup: MacBook on the right and a Dell 22″ in front of me. I really liked the monitor (I actually bought another one for a different machine), but recently I started doing more iOS development. The 22-incher has a resolution of 1680×1050, which is just not enough for Xcode.

The choice

After some internet research I ended up with two candidates: the Apple Thunderbolt Display and the Dell U2711. Both are 27″ displays and both have a resolution of 2560×1440. I really love everything Apple, and I love the display I have on my iMac (which is the older version of the Thunderbolt). However, in the end I decided to go against my nature and buy the Dell.

Two main reasons got me to the other side: first of all, I was able to buy the Dell for about half the price it’s listed at on the Dell website. Secondly, the Thunderbolt Display only has one input: Thunderbolt. The MacBook I use for work doesn’t have a Thunderbolt output. That would mean I had to buy yet another converter, the one thing I really hate about Apple (who hasn’t lost their DisplayPort-to-VGA converter?). The Dell has every input imaginable: DisplayPort, HDMI, VGA, DVI, composite and some others. The Thunderbolt Display is a LED screen with vibrant colors, while the Dell is a high-quality H-IPS LCD screen. The Thunderbolt’s colors are just awesome. But 90% of my time is spent working in an IDE.

Point is, when I decommission this screen in the future, I will have plenty of ‘markets’ to get rid of it, not only the Thunderbolt owners. And knowing Apple, by then Thunderbolt will again have been replaced by the next ‘standard’ they invent.

Bigger monitors == productivity++ ?

[Photo: my current home office setup]

Anyway, I’ve been using the Dell U2711 for about three months now, so I thought it was time to evaluate whether the bigger screen actually makes me more productive.

The first days, I had a bit of an issue with the display coating (as blogged about by some). It is indeed a negative point of the Dell U2711. However, I got used to it quickly, and by now I don’t see it anymore. If I now switch to my iMac, I actually dislike seeing myself in the non-coated reflection :-).

But back to the productivity point. For starters: 2560×1440 is freaking huge. You can have multiple windows open beside each other without having to sacrifice any usability, whereas they would each be full-screen at a ‘regular’ resolution. If two screens make you more productive because you can just turn your head a bit to see the email/browser/whatever, then more screen real estate should also make you more productive, right?

Sadly, I’m not ‘feeling’ the same productivity gain I felt going from one to two monitors. I think the two physical displays maybe allow you to make the ‘context switch’ more easily, compared to switching windows on the same monitor. It did change my window-positioning habits: now I have the IDE + webapp (or rather, the iPad simulator nowadays) on the big screen in front of me, while I use my smaller screen for Skype, email and Twitter.

However, although I’m not feeling more productive, I do feel more ‘comfortable’. When working in Xcode/AppCode, which tend to be greedy for screen estate, the huge resolution does miracles. No dragging borders to expand a certain region, no hiding stuff to get more code lines on the screen, etc. Everything can be fully expanded and it still feels zen. It’s very hard to explain, but I do feel calmer than before. Do note that this is totally unscientifically measured.

Although I can’t say it makes me x% more productive (I would really like to say that!), it probably does make me a little bit more productive, but definitely not with the impact of switching to two displays. But man, the feeling you get from coding on such a display is awesome. Yes, I know some people will have remarks similar to those about people driving big cars 😉

I don’t regret buying the bigger screen one bit, and I would recommend it to any developer out there. Having more screen estate is always a good thing. Period. But don’t expect to now perform 50% more in the same amount of time 😉

When you think about it: when I did my first coding (15-ish years ago), I did it on a 640×480 CRT monitor. The screen flickered because it couldn’t be set to 60Hz (or more). Damn… times sure have changed. My iPhone now has a better screen and resolution! We sure do live in interesting times…