How the Secure Scripting in Activiti works

One of the prominent features of the recent Activiti 5.21.0 release is ‘secure scripting’. The way to enable and use this feature is documented in detail in the Activiti user guide. In this post, I’ll show the details of the implementation and what it’s doing under the hood.

The Problem

The Activiti engine has supported scripting for script tasks (and task/execution listeners) for a long time. The scripts are defined in the process definition and can be executed directly after deploying it, which is something many people like. This is a big difference from Java delegate classes or delegate expressions, which generally require putting the actual logic on the classpath. That in itself already introduces some sort of ‘protection’, as generally only a power user can do this.

However, with scripts, no such ‘extra step’ is needed. If you give the power of script tasks to end users (and we know from some of our users some companies do have this use case), all bets are pretty much off. You can shut down the JVM or do malicious things by executing a process instance.

A second problem is that it’s quite easy to write a script that does an infinite loop and never ends. A third problem is that a script can easily use a lot of memory when executed and hog a lot of system resources.

Let’s look at the first problem for starters. First of all, let’s add the latest and greatest Activiti engine dependency and the H2 in-memory database library:

<dependencies>
  <dependency>
    <groupId>org.activiti</groupId>
    <artifactId>activiti-engine</artifactId>
    <version>5.21.0</version>
  </dependency>
  <dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <version>1.3.176</version>
  </dependency>
</dependencies>

The process we’ll use here is trivially simple: just a start event, a script task and an end event. The process is not really the point here; the script execution is.

The first script we’ll try does two things: it gets and displays my machine’s current network configuration (there are obviously more dangerous applications of this idea) and then shuts down the whole JVM. Of course, in a proper setup, some of this is mitigated by making sure that the user running the logic has no rights that matter on the machine (though that doesn’t solve the resource-hogging issue). But I think this demonstrates pretty well why giving the power of scripts to just about anyone is really bad security-wise.

<scriptTask id="myScriptTask" scriptFormat="javascript">
  <script>
    var s = new java.util.Scanner(java.lang.Runtime.getRuntime().exec("ifconfig").getInputStream()).useDelimiter("\\A");
    var output = s.hasNext() ? s.next() : "";
    java.lang.System.out.println("--- output = " + output);
    java.lang.System.exit(1);
  </script>
</scriptTask>

Let’s deploy the process definition and execute a process instance:

import org.activiti.engine.ProcessEngine;
import org.activiti.engine.RepositoryService;
import org.activiti.engine.RuntimeService;
import org.activiti.engine.impl.cfg.StandaloneInMemProcessEngineConfiguration;

public class Demo1 {

    public static void main (String[] args) {

        // Build engine and deploy
        ProcessEngine processEngine = new StandaloneInMemProcessEngineConfiguration().buildProcessEngine();
        RepositoryService repositoryService = processEngine.getRepositoryService();
        repositoryService.createDeployment().addClasspathResource("process.bpmn20.xml").deploy();

        // Start process instance
        RuntimeService runtimeService = processEngine.getRuntimeService();
        runtimeService.startProcessInstanceByKey("myProcess");
    }
}

This gives the following output (shortened here):

--- output = eth0 Link encap:Ethernet
inet addr:192.168.0.114 Bcast:192.168.0.255 Mask:255.255.255.0

Process finished with exit code 1

It outputs information about all my network interfaces and then shuts down the whole JVM. Yikes. That’s scary.

Trying Nashorn

The solution to our first problem is that we need to whitelist what we want to expose in a script, and have everything blacklisted by default. This way, users won’t be able to run any class or method that can do something malicious.

In Activiti, when a javascript script task is part of a process definition, we give this script to the javascript engine that is embedded in the JDK, using the ScriptEngine class in the JDK. In JDK 6/7 this was Rhino, in JDK 8 this is Nashorn. I first did some serious googling to find a solution for Nashorn (as this would be more future-proof). Nashorn does have a ‘class filter’ concept to effectively implement white-listing. However, the ScriptEngine abstraction does not have any facilities to actually tweak or configure the Nashorn engine. We’ll have to do some low-level magic to get it working.

Instead of using the default Nashorn scripting engine, we instantiate the Nashorn scripting engine ourselves in a ‘SecureScriptTask’ (which is a regular JavaDelegate). Note the use of the jdk.nashorn.* package, which is not really nice. We follow the docs from https://docs.oracle.com/javase/8/docs/technotes/guides/scripting/nashorn/api.html to make the script execution more secure by adding a ‘ClassFilter’ to the Nashorn engine. This effectively acts as a white-list of approved classes that can be used in the script.

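import javax.script.Bindings;
import javax.script.ScriptEngine;

import jdk.nashorn.api.scripting.ClassFilter;
import jdk.nashorn.api.scripting.NashornScriptEngineFactory;

import org.activiti.engine.delegate.DelegateExecution;
import org.activiti.engine.delegate.Expression;
import org.activiti.engine.delegate.JavaDelegate;
import org.activiti.engine.impl.context.Context;
import org.activiti.engine.impl.scripting.ScriptingEngines;

public class SafeScriptTaskDemo2 implements JavaDelegate {

    // Injected from the process definition; holds the script source
    private Expression script;

    public void execute(DelegateExecution execution) throws Exception {
        // Create the Nashorn engine directly so a ClassFilter can be passed in
        NashornScriptEngineFactory factory = new NashornScriptEngineFactory();
        ScriptEngine scriptEngine = factory.getScriptEngine(new SafeClassFilter());

        ScriptingEngines scriptingEngines = Context
                .getProcessEngineConfiguration()
                .getScriptingEngines();

        // Reuse the Activiti bindings so the script can see process variables
        Bindings bindings = scriptingEngines.getScriptBindingsFactory().createBindings(execution, false);
        scriptEngine.eval((String) script.getValue(execution), bindings);

        System.out.println("Java delegate done");
    }

    public static class SafeClassFilter implements ClassFilter {

        // Expose no Java classes at all to the script
        public boolean exposeToScripts(String s) {
            return false;
        }

    }

}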

When this delegate runs, the script is not executed; instead, an exception is thrown stating ‘Exception in thread “main” java.lang.RuntimeException: java.lang.ClassNotFoundException: java.lang.System.out.println’.

Note that the ClassFilter is only available from JDK 1.8.0_40 (quite recent!).
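The filter above rejects everything, which is the safest starting point. As a rough sketch of where this goes next, a default-deny whitelist could look like the code below. It is plain Java so that it runs on any JDK; the class name WhitelistClassFilter is made up for illustration. To plug it into Nashorn it would additionally have to declare that it implements jdk.nashorn.api.scripting.ClassFilter, whose single method has the same signature.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical default-deny filter: a class is visible to scripts only if
// its fully qualified name was explicitly whitelisted.
class WhitelistClassFilter {

    private final Set<String> allowed;

    WhitelistClassFilter(String... classNames) {
        this.allowed = Collections.unmodifiableSet(
                new HashSet<String>(Arrays.asList(classNames)));
    }

    // Same signature as Nashorn's ClassFilter.exposeToScripts(String).
    public boolean exposeToScripts(String className) {
        return allowed.contains(className);
    }

    public static void main(String[] args) {
        WhitelistClassFilter filter = new WhitelistClassFilter("java.util.ArrayList");
        System.out.println(filter.exposeToScripts("java.util.ArrayList")); // true
        System.out.println(filter.exposeToScripts("java.lang.Runtime"));   // false
        System.out.println(filter.exposeToScripts("java.lang.System"));    // false
    }
}
```

Matching on exact class names keeps the rule easy to audit; anything not listed stays hidden.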

However, this doesn’t solve our second problem with infinite loops. Let’s execute a simple script:

while (true) {
  print("Hello");
}

You can guess what this’ll do. This will run forever. If you’re lucky, a transaction timeout will happen as the script task is executed in a transaction. But that’s far from a decent solution, as it hogs CPU resources for a while doing nothing.

The third problem, using a lot of memory, is also easy to demonstrate:

var array = []
for(var i = 0; i < 2147483647; ++i) {
  array.push(i);
  java.lang.System.out.println(array.length);
}

When starting the process instance, the memory quickly fills up (starting with only a couple of MB) and the run eventually ends with an OutOfMemoryError: Exception in thread “main” java.lang.OutOfMemoryError: GC overhead limit exceeded

Switching to Rhino


Between the previous example and the next one, a lot of time was spent trying to make Nashorn somehow intercept or cope with the infinite loop and memory usage. However, after extensive searching and experimenting, it seems the features simply are not (yet?) in Nashorn. A quick search will teach you that we're not the only ones looking for a solution to this. Often, it is mentioned that Rhino did have features on board to solve this.

For example, in JDK < 8, the Rhino javascript engine had the 'instructionCount' callback mechanism, which is not present in Nashorn. It basically gives you a way to execute logic in a callback that is automatically called every x instructions (bytecode instructions!). I first tried (and lost a lot of time) to mimic the instructionCount idea with Nashorn, for example by prettifying the script first (because people could write the whole script on one line) and then injecting a line of code in the script that triggers a callback. However, 1) that was not very straightforward to do, and 2) one would still be able to write a single-line instruction that runs infinitely or uses a lot of memory.
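To make the second limitation concrete, here is a rough sketch of the line-based injection idea. It is not the code that was actually tried: the NaiveInstrumenter class and the __check() hook name are invented for illustration, and the prettifying step is left out.

```java
// Hypothetical sketch of line-based instrumentation: append a call to a
// made-up __check() hook after every non-blank line of the script.
class NaiveInstrumenter {

    static final String HOOK = "__check();";

    static String instrument(String script) {
        StringBuilder out = new StringBuilder();
        for (String line : script.split("\n")) {
            out.append(line).append('\n');
            if (!line.trim().isEmpty()) {
                out.append(HOOK).append('\n');
            }
        }
        return out.toString();
    }

    // Counts how many hook calls ended up in the instrumented script.
    static int hookCount(String instrumented) {
        int count = 0;
        int idx = 0;
        while ((idx = instrumented.indexOf(HOOK, idx)) != -1) {
            count++;
            idx += HOOK.length();
        }
        return count;
    }

    public static void main(String[] args) {
        String multiLine = "var i = 0;\nvar j = 1;\nvar k = i + j;";
        String oneLiner = "while (true) { }";
        System.out.println(hookCount(instrument(multiLine))); // 3
        // The hook lands after the loop, so it is never reached:
        System.out.print(instrument(oneLiner));
    }
}
```

For the one-line loop the hook is emitted on the line after the loop, so the callback never fires while the loop spins. Blindly appending a statement after every line can also change the meaning of constructs such as a brace-less if, which is part of why this was not straightforward.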

Being stuck there, the search led us to the Rhino engine from Mozilla. Since its inclusion in the JDK a long time ago it actually evolved further on its own, while the version in the JDK wasn't updated with those changes! After reading up on the (quite sparse) Rhino docs, it became clear that Rhino had a far richer feature set with regard to our use case.

The ClassFilter from Nashorn matched the 'ClassShutter' concept in Rhino. The CPU and memory problems were solved using the callback mechanism of Rhino: you can define a callback that is called every x instructions. This means that one line could be hundreds of bytecode instructions and we get a callback every x instructions, which makes it an excellent candidate for monitoring our CPU and memory usage when executing the script.
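The callback idea can be sketched without Rhino on the classpath. The code below is a simplified stand-in, not the actual Activiti implementation: ExecutionGuard and all of its parameters are hypothetical, and the loop in main only simulates an interpreter that reports each executed instruction. In Rhino itself the equivalent hook is, as far as I know, ContextFactory.observeInstructionCount together with Context.setInstructionObserverThreshold.

```java
import java.util.function.LongSupplier;

// Hypothetical guard: every `checkInterval` instructions it compares the
// elapsed time and the memory growth against fixed limits and aborts.
class ExecutionGuard {

    private final long maxMillis;
    private final long maxBytes;
    private final int checkInterval;
    private final LongSupplier clockMillis; // injectable clock
    private final LongSupplier usedBytes;   // injectable memory probe
    private final long startMillis;
    private final long startBytes;
    private int sinceLastCheck;

    ExecutionGuard(long maxMillis, long maxBytes, int checkInterval,
                   LongSupplier clockMillis, LongSupplier usedBytes) {
        this.maxMillis = maxMillis;
        this.maxBytes = maxBytes;
        this.checkInterval = checkInterval;
        this.clockMillis = clockMillis;
        this.usedBytes = usedBytes;
        this.startMillis = clockMillis.getAsLong();
        this.startBytes = usedBytes.getAsLong();
    }

    // Called by the (simulated) interpreter once per executed instruction.
    void onInstruction() {
        if (++sinceLastCheck < checkInterval) {
            return;
        }
        sinceLastCheck = 0;
        if (clockMillis.getAsLong() - startMillis > maxMillis) {
            throw new IllegalStateException(
                    "Maximum execution time of " + maxMillis + " ms exceeded");
        }
        if (usedBytes.getAsLong() - startBytes > maxBytes) {
            throw new IllegalStateException(
                    "Memory limit of " + maxBytes + " bytes reached");
        }
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        ExecutionGuard guard = new ExecutionGuard(200L, 64L * 1024 * 1024, 10,
                System::currentTimeMillis,
                () -> rt.totalMemory() - rt.freeMemory()); // crude whole-heap probe
        try {
            while (true) {           // simulated infinite script loop
                guard.onInstruction();
            }
        } catch (IllegalStateException e) {
            System.out.println("Aborted: " + e.getMessage());
        }
    }
}
```

The whole-heap probe used in main is deliberately crude. A per-thread measurement, for instance via com.sun.management.ThreadMXBean.getThreadAllocatedBytes on HotSpot JVMs, would attribute memory to the script more precisely.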

If you are interested in our implementation of these ideas in the code, have a look here.

This does mean that whatever JDK version you are using, you will not be using the embedded javascript engine, but always Rhino.

Trying it out


To use the new secure scripting feature, add the following dependency:

<dependency>
  <groupId>org.activiti</groupId>
  <artifactId>activiti-secure-javascript</artifactId>
  <version>5.21.0</version>
</dependency>

This will transitively include the Rhino engine. This also enables the SecureJavascriptConfigurator, which needs to be configured before creating the process engine:

SecureJavascriptConfigurator configurator = new SecureJavascriptConfigurator()
  .setWhiteListedClasses(new HashSet<String>(Arrays.asList("java.util.ArrayList")))
  .setMaxStackDepth(10)
  .setMaxScriptExecutionTime(3000L)
  .setMaxMemoryUsed(3145728L)
  .setNrOfInstructionsBeforeStateCheckCallback(10);

ProcessEngine processEngine = new StandaloneInMemProcessEngineConfiguration()
  .addConfigurator(configurator)
  .buildProcessEngine();

This configures secure scripting to:

  • Check the CPU execution time and memory usage every 10 instructions
  • Give the script 3 seconds and 3 MB (3145728 bytes) to execute
  • Limit the stack depth to 10 (to avoid deep recursion)
  • Expose java.util.ArrayList as a class that is safe to use in the scripts

Running the script from above that tries to read the ifconfig and shut down the JVM leads to:

TypeError: Cannot call property getRuntime in object [JavaPackage java.lang.Runtime]. It is not a function, it is "object".

Running the infinite loop script from above gives

Exception in thread “main” java.lang.Error: Maximum variableScope time of 3000 ms exceeded

And running the memory usage script from above gives

Exception in thread “main” java.lang.Error: Memory limit of 3145728 bytes reached

And hurray! The problems defined above are solved 🙂

Performance

I did a very unscientific quick check … and I almost didn’t dare to share it, as the results go against what I assumed would happen. I created a quick main that runs a process instance with a script task 10000 times:

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

import org.activiti.engine.ProcessEngine;
import org.activiti.engine.RepositoryService;
import org.activiti.engine.RuntimeService;
import org.activiti.engine.impl.cfg.StandaloneInMemProcessEngineConfiguration;

public class PerformanceUnsecure {

    public static void main (String[] args) {

        // Build engine and deploy
        ProcessEngine processEngine = new StandaloneInMemProcessEngineConfiguration().buildProcessEngine();
        RepositoryService repositoryService = processEngine.getRepositoryService();
        repositoryService.createDeployment().addClasspathResource("performance.bpmn20.xml").deploy();

        Random random = new Random();
        RuntimeService runtimeService = processEngine.getRuntimeService();

        int nrOfRuns = 10000;
        long total = 0;

        for (int i=0; i<nrOfRuns; i++) {
            Map<String, Object> variables = new HashMap<String, Object>();
            variables.put("a", random.nextInt());
            variables.put("b", random.nextInt());

            // Time only the process instance execution
            long start = System.currentTimeMillis();
            runtimeService.startProcessInstanceByKey("myProcess", variables);
            long end = System.currentTimeMillis();
            total += (end - start);
        }

        System.out.println("Finished process instances : " + processEngine.getHistoryService().createHistoricProcessInstanceQuery().count());
        System.out.println("Total time = " + total + " ms");
        System.out.println("Avg time/process instance = " + ((double)total/(double)nrOfRuns) + " ms");
    }
}

The process definition is just a start -> script task -> end. The script task simply adds two variables and saves the result in a third variable.

<scriptTask id="myScriptTask" scriptFormat="javascript">
  <script>
    var c = a + b;
    execution.setVariable('c', c);
  </script>
</scriptTask>

I ran this five times, and got an average of 2.57 ms / process instance. This is on a recent JDK 8 (so Nashorn).

Then I switched the first couple of lines above to use the new secure scripting, thus switching to Rhino plus the security features enabled:

SecureJavascriptConfigurator configurator = new SecureJavascriptConfigurator()
  .addWhiteListedClass("org.activiti.engine.impl.persistence.entity.ExecutionEntity")
  .setMaxStackDepth(10)
  .setMaxScriptExecutionTime(3000L)
  .setMaxMemoryUsed(3145728L)
  .setNrOfInstructionsBeforeStateCheckCallback(1);

ProcessEngine processEngine = new StandaloneInMemProcessEngineConfiguration()
  .addConfigurator(configurator)
  .buildProcessEngine();

I again did five runs … and got 1.07 ms / process instance. That is more than twice as fast for the same thing.

Of course, this is not a real test. I assumed the Rhino execution would be slower, with the class whitelisting checking and the callbacks … but no such thing. Maybe this particular case is one that is simply better suited for Rhino … If anyone can explain it, please leave a comment. But it is an interesting result nonetheless.
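One thing that may skew a quick check like this is that the first iterations pay for engine start-up and JIT compilation. As a hedged sketch, a harness could discard a number of warm-up runs before averaging. TimingHarness, averageMillis and the iteration counts below are illustrative only and were not used for the numbers above; the workload is a placeholder for the startProcessInstanceByKey call.

```java
class TimingHarness {

    // Runs `task` warmupRuns times without timing, then measuredRuns times
    // with timing, and returns the average duration in milliseconds.
    static double averageMillis(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long totalNanos = 0L;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return (totalNanos / 1_000_000.0) / measuredRuns;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for one process instance run.
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("unexpected empty result");
            }
        };
        System.out.println("Avg time = " + averageMillis(task, 1000, 10000) + " ms");
    }
}
```

Comparing the averages with and without warm-up for both engines would show how much of the difference comes from start-up cost rather than steady-state script execution.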

 

4 Comments

  1. Axel Faust June 15, 2016

    Nashorn has a more significant startup cost than Rhino and you appear to always create a new Nashorn engine instance. This means that all the internals need to be re-created and – judging from the bit of code I see – none of the internal bytecode caches may be used to improve the next execution of the same script. Additionally – though I don’t believe this to be the case in your simple test – using a custom bindings object per invocation can end up in quite a bit of re-linking and even re-compilation of script fragments, i.e. when optimistic assumptions don’t hold true.

    Unfortunately there aren’t official performance guides / blog posts out there to outline the DOs and DON’Ts in one article. My information is from my own experience writing a Nashorn engine for Alfresco (https://github.com/AFaust/alfresco-nashorn-script-engine) and running my own tests with various configurations. Nashorn typically outperforms Rhino beginning somewhere between the 10th and 25th execution (script-dependent) when you re-use the engine (same approach for both), and the specific difference is dramatically influenced by re-use of bindings and avoidance of (excessive) re-linking.

    Proper use of internal script / bytecode caches is also essential. Doing an “eval()” with an anonymous scriptlet is the WORST thing you can do (in both Rhino and Nashorn), especially if all the scripts you execute are executed by “eval()”. Rhino is affected less by this because it either runs in interpreter mode or – when it’s not – its bytecode generation mechanism is not as complex (not as tightly integrated into JVM as well).

    Using a ClassFilter / ClassShutter alone doesn’t necessarily prevent the issues you outlined. There are some very simple ways this can be circumvented, especially with reflection. In Nashorn you have support for using the Java security manager facility to restrict what scripts can do, and can technically just deny the permission to execute some of the sensitive operations and even prevent use of reflection.

    There are lots of things to consider for both engines. Unfortunately Rhino does have the advantage in documentation and pre-existing community to answer questions / problems quickly. Due to the still evolving nature of Nashorn you sometimes need to take a look into code or seek confirmation / input from developers (they are quite active / attentive on Twitter and StackOverflow, and very helpful).

  2. Joram Barrez June 15, 2016

    @Axel: thanks for your (as always) detailed comment. Let me try to respond to them one by one.

    – I’m actually very interested in making it work on Nashorn. And for that matter, Groovy too.

    – The startup cost of the Nashorn engine could indeed be what is playing here. We’re using the ScriptEngine class from the JDK and it doesn’t support caching for Nashorn last time I checked. We could go around this of course by working with Nashorn directly (as we do with Rhino here). However, I couldn’t find anything that would show that this would be thread-safe, as in that scripts for multiple concurrent process instances are actually isolated from each other. If you have more details on how this caching works such that the scripts are thread-safe, please let me know. I’ll also look into the github repo you pasted above.

    – I did tinker with reflection in combination with the class shutter, but I couldn’t find something that would break it (but I did not dive too deep in it). I’m very interested if you have a sample or ideas how this could be circumvented. The security manager does sound good.

    – If Nashorn had an instruction callback mechanism, a lot of the downsides would already be mitigated.

    – The documentation bit is indeed saddening … for both engines 🙁

  3. Axel Faust June 15, 2016

    Thread-safety with regard to the Nashorn engine revolves primarily around the thread-safety of the execution scope / binding. Any other part of the engine must already be thread-safe or you couldn’t (shouldn’t be able to) execute different scripts with different bindings in parallel. It is basically the same as with the Rhino engine and its execution scope (default Alfresco, for instance, uses a shared-scope object as the prototype of the execution scope, to avoid concurrency issues).

    In Nashorn you can share the engine scope as long as you prevent its modification by scripts and use means other than binding attributes to pass input data. In my engine I prevent modification of a (pre-populated) engine scope by loading additional scripts in their own, isolated scope within my AMD module loader (amd.js L399), and I avoid using binding attributes for data input by using function parameters via an extracted interface backed by JavaScript code (NashornScriptProcessor.java L360) and then passing that input along to other components (amd-script-runner.js L7).

    Even though my script engine is very complex and makes use of AMD module loading concepts, it still supports simple scripts that expect input to be available in the global scope. This is achieved by using a global __noSuchProperty__ hook (noSuchProperty.js) that looks up potential root objects that developers expect from the legacy Alfresco script engine (Rhino).

    Caching of script sources at runtime is transparently done within the script engine even if you don’t re-use it for multiple executions – class/bytecode caching is likely to be another matter, depending on the configuration of the engine (an engine may use an isolated class loader for scripts). There also is support for persistent code caching via the “-pcc” option you can pass to the Nashorn-specific script engine factory (service-context.xml L8). I have not been able to reliably measure the impact of this after a JVM restart, but it certainly does not mean the 1st execution after start is as fast as the last before shutdown, due to normal Java JIT / hotspot optimization not having kicked in yet.

    In order for the cache to work efficiently, the script should not be anonymous as the name/URL of the script – along with its actual source for “eval()”-ed scripts – is part of the lookup key. Scripts loaded from a URL-addressed resource additionally benefit from lastModified checks to avoid recompilation and potentially expensive loads of remote source.

    The script name can be provided via an execution scope attribute (NashornScriptProcessor.java L630). If you use the Nashorn load() function or use a URLReader as script source, this is transparently handled without the need to modify a shared data structure.

    P.S.: Spam protection prevented me from posting real links for all the source samples in my GitHub repository.

  4. Axel Faust June 15, 2016

    As for the “instruction count” / “instruction callback”: this is certainly a limitation that results from the compilation to pure Java bytecode. You can’t easily control resource utilization of 3rd-party, pure Java code and terminate it accordingly either…
    It might change when an interpreter mode may be added to Nashorn sometime in JDK 9 (or later), but at the moment the only way I see to get a reliable instruction count / callback would involve either modifying the resulting bytecode or hooking into the invokeDynamic bootstrap handling to add a guard handle for per-instruction counting with optional callback. Alternatively, rewriting the JavaScript code may be the best option, but instead of doing a simple String parse+replace/insert you could use the Nashorn parser utilities (https://docs.oracle.com/javase/8/docs/jdk/api/nashorn/jdk/nashorn/api/scripting/ScriptUtils.html#parse-java.lang.String-java.lang.String-boolean-), operate on the AST and create an “instrumented” script. For performance reasons you would need to add a custom script cache in front of Nashorn to avoid parse-overhead unless the script changed (or the cache entry was removed due to resource/time constraints).
