
Posted to user@commons.apache.org by "R.C. Hoekstra" <r....@erasmusmc.nl> on 2014/05/09 11:40:13 UTC

our project setup: any tips specifically on performance/speed?

Hi list,

As written before, we're a university team of scientists working on multi-agent simulations of tropical diseases for a World Health Organization project. A disease can be considered a state machine, with the patient going through various states and transitions, each triggering new events.

We've managed to make a working example of an XML file in which a patient goes through various stages of the disease, including treatments with medicine. Our most important concern at the moment is: how efficient is it? Our aim is a multi-agent simulation with possibly a few hundred thousand instances of a state machine engine (SCXMLExecutor). I'd like to share the code setup with you, and maybe you can give some clues on how efficient it will be in terms of performance/speed, and some hints on whether an alternative approach would be better.

General setup
We have a Population object (a wrapped list) containing all agent objects. Each agent is assigned an SCXMLExecutor as the engine, so there are many instances of SCXMLExecutor. We use the default JexlEvaluator, and each SCXMLExecutor gets the agent it belongs to assigned to the rootContext, so the agent's properties can be accessed from the scxml file.

Transitions
Our transitions are usually of a special type: a patient usually stays x days in a certain state, after which the transition takes place. The number of days x is determined by drawing a random number from a statistical distribution. There is usually more than one possible transition, each with a different probability.
So the scxml file must contain the following information:
* distribution name and parameters to determine the time until next transition.
* A number coupled to each possible transition indicating the likelihood that it happens.

We solved this in the following way:
* The distribution name, the mean and variance parameters, and the chances are defined in the datamodel as single variables: <data id="distr">
In each state's onentry we set these variables to the values specific to that state, via the assign tag. The chances variable is defined as an array:
<assign name="chances" expr="[0.05d, 0.10d, 0.20d]" />
* The state's onentry also contains a send tag. Send passes the agent's id, the aforementioned variables and the event concerned.
The send message is captured by our own implementation of EventDispatcher. This does two things:
** It draws the random time based on the passed distribution parameters and schedules it in our own discrete event manager. When the desired time has passed, the discrete event manager passes the correct event back to the correct SCXMLExecutor instance.
** It determines which transition will be chosen by drawing a random number based on the chances array. This results in the index number of the transition to be chosen, which is passed as payload to the event. The scxml file checks this index number in the cond attribute of the transitions.
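The two things the dispatcher does can be sketched in plain Java, independent of Commons SCXML. This is only an illustration under assumptions: the class and method names (TransitionSampler, pickIndex, drawDelayDays) are hypothetical, the chances are treated as relative weights and normalised by their sum, and a normal distribution truncated at zero stands in for whatever distribution the "distr" variable names.

```java
import java.util.Random;

/** Hypothetical helper; none of these names come from Commons SCXML. */
final class TransitionSampler {

    /**
     * Maps a uniform draw u in [0, 1) to a transition index.
     * Assumption: the chances are relative weights, normalised by their sum.
     */
    static int pickIndex(double[] chances, double u) {
        double total = 0.0;
        for (double c : chances) {
            total += c;
        }
        if (chances.length == 0 || total <= 0.0) {
            throw new IllegalArgumentException("chances must contain a positive weight");
        }
        double target = u * total;
        double cumulative = 0.0;
        for (int i = 0; i < chances.length; i++) {
            cumulative += chances[i];
            if (target < cumulative) {
                return i;
            }
        }
        // Only reachable through floating-point round-off.
        return chances.length - 1;
    }

    /**
     * Draws a waiting time in days.
     * Assumption: normal distribution with the given mean and variance,
     * truncated at zero so the delay is never negative.
     */
    static double drawDelayDays(double mean, double variance, Random rng) {
        double days = mean + Math.sqrt(variance) * rng.nextGaussian();
        return Math.max(0.0, days);
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        double[] chances = {0.05, 0.10, 0.20};
        int index = pickIndex(chances, rng.nextDouble());
        double delay = drawDelayDays(10.0, 4.0, rng);
        System.out.println("transition " + index + " after " + delay + " days");
    }
}
```

The index returned by pickIndex is what would travel as the event payload and be compared in the cond attribute of each transition.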

Agent properties:
Each disease state also has its effect on the agent's properties, for example the infectivity of the agent, or its fitness. The agent was passed to the rootContext, so the onentry of each state contains code to set the agent's properties specific to that state:
<script>
agent.infectivity = 1
</script>

This is our overall approach. I'd be happy to receive any comments; specifically tips regarding the expected speed/performance.

best regards,
Rinke

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org

Re: our project setup: any tips specifically on performance/speed?

Posted by Woonsan Ko <wo...@yahoo.com>.

Hi Rinke,

I think you can measure a rough range of how many times SCXMLExecutor is invoked (#go(), #triggerEvent() for instance) in a simulation on N instances. Then you'll probably be able to get an estimate of the total execution time.
If the estimate for the pure SCXML executions can meet your requirements, then I think the next thing to consider might be how to reduce I/O if you have to (de)serialize those instances. In one of our projects, we initialize a root context and an executor every time before triggering events. So the SCXML definition is responsible for initializing itself (moving to the right current state) from the given root context (in script blocks). I think this pattern could help reduce the amount of (de)serialized data, and so reduce I/O.
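The estimate described above could be made along these lines. This is a rough sketch, not a proper benchmark: ThroughputEstimate and its methods are hypothetical names, the Runnable is a stand-in for one real executor call such as #triggerEvent(), and the figures in main are made up for illustration. JIT warm-up and garbage collection will distort short runs.

```java
/** Hypothetical timing helper; not part of Commons SCXML. */
final class ThroughputEstimate {

    /** Times `calls` invocations of a step and returns the mean nanoseconds per call. */
    static double meanNanosPerCall(Runnable step, int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            step.run();
        }
        return (System.nanoTime() - start) / (double) calls;
    }

    /** Extrapolates the measured cost to a whole simulation run, in seconds. */
    static double estimateSeconds(double nanosPerCall, long instances, long eventsPerInstance) {
        return nanosPerCall * instances * eventsPerInstance / 1e9;
    }

    public static void main(String[] args) {
        // Stand-in workload; in a real measurement this would trigger
        // one event on one SCXMLExecutor instance.
        int[] counter = {0};
        Runnable step = () -> counter[0]++;

        meanNanosPerCall(step, 100_000);                 // warm-up pass, result discarded
        double nanos = meanNanosPerCall(step, 1_000_000);

        // Illustrative figures only: 100,000 agents, 50 events each.
        double seconds = estimateSeconds(nanos, 100_000L, 50L);
        System.out.println("estimated total: " + seconds + " s");
    }
}
```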

Regards,

Woonsan


On Monday, May 12, 2014 12:58 AM, R.C. Hoekstra <r....@erasmusmc.nl> wrote:
 

> [...]

Re: [SCXML] our project setup: any tips specifically on performance/speed?

Posted by Ate Douma <at...@douma.nu>.

Hi Rinke,

First of all, I've prefixed my reply with [SCXML] which is the convention on the 
Apache Commons lists so we can easily group and identify specific component 
related messages.

I've added more specific comments inline below.

On 09-05-14 11:40, R.C. Hoekstra wrote:
> Hi list,
>
> As written before, we're a university team of scientists working on
> multi-agent simulations of tropical diseases for a World Health Organization
> project. A disease can be considered a state machine, with the patient
> going through various states and transitions, each triggering new events.
>
> We've managed to make a working example of an XML file in which a patient
> goes through various stages of the disease, including treatments with
> medicine. Our most important concern at the moment is: how efficient is it?
> Our aim is a multi-agent simulation with possibly a few hundred thousand
> instances of a state machine engine (SCXMLExecutor). I'd like to share the
> code setup with you, and maybe you can give some clues on how efficient it
> will be in terms of performance/speed, and some hints on whether an
> alternative approach would be better.

Maybe, but it's a bit difficult in the abstract without more concrete
information on how you set up the project and certain usages.

If you have the code publicly viewable it will definitely be helpful if you can 
share that.

Also very important (but that should become clear if we can view the code) is
which version of Commons SCXML you've set this up with. As you know, the
current SCXML trunk is a major rewrite compared to the old and outdated 0.9
release.
If you are currently still using the 0.9 release, it might be difficult
(certainly for me) to provide concrete feedback and help, as I'm only focusing
on the trunk 2.0 version.

>
> General setup: we have a Population object (a wrapped list) containing all
> agent objects. Each agent is assigned an SCXMLExecutor as the engine, so
> there are many instances of SCXMLExecutor. We use the default JexlEvaluator,
> and each SCXMLExecutor gets the agent it belongs to assigned to the
> rootContext, so the agent's properties can be accessed from the scxml file.
>
> Transitions: our transitions are usually of a special type: a patient usually
> stays x days in a certain state, after which the transition takes place. The
> number of days x is determined by drawing a random number from a statistical
> distribution. There is usually more than one possible transition, each with
> a different probability. So the scxml file must contain the following
> information:
> * distribution name and parameters to determine the time until next
>   transition.
> * A number coupled to each possible transition indicating the likelihood
>   that it happens.

With potentially 100K+ agents/SCXML instances running concurrently for x
number of days, I can imagine memory becoming an issue. Or maybe not. Are you
intending to use some level of SCXML state serialization/de-serialization to
keep the memory footprint under control, or is everything expected to be kept
running in memory? What are your environment (hardware) conditions/constraints?
>
> We solved this in the following way:
> * The distribution name, the mean and variance parameters, and the chances
>   are defined in the datamodel as single variables: <data id="distr">
>   In each state's onentry we set these variables to the values specific to
>   that state, via the assign tag. The chances variable is defined as an
>   array: <assign name="chances" expr="[0.05d, 0.10d, 0.20d]" />
> * The state's onentry also contains a send tag. Send passes the agent's id,
>   the aforementioned variables and the event concerned. The send message is
>   captured by our own implementation of EventDispatcher. This does two
>   things:
> ** It draws the random time based on the passed distribution parameters and
>    schedules it in our own discrete event manager. When the desired time has
>    passed, the discrete event manager passes the correct event back to the
>    correct SCXMLExecutor instance.
> ** It determines which transition will be chosen by drawing a random number
>    based on the chances array. This results in the index number of the
>    transition to be chosen, which is passed as payload to the event. The
>    scxml file checks this index number in the cond attribute of the
>    transitions.

Performance-wise, I don't think the SCXML engine itself is likely to become an
issue, but your custom EventDispatcher/event-manager interaction might,
certainly if these (all?) have to run on separate threads.
You probably need, or already have implemented, some custom instance-to-event
mapping solution for this?
Using 100K+ separate threads isn't likely to work ;)
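
To illustrate the point about threads: a single-threaded event loop with a time-ordered queue can serve any number of agents without a thread per instance. A minimal Java sketch under assumptions follows; DiscreteEventManager and everything in it are hypothetical names, and each handler stands in for the code that would hand the event to that agent's SCXMLExecutor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.Consumer;

/** Hypothetical single-threaded scheduler; not part of Commons SCXML. */
final class DiscreteEventManager {

    /** A pending event: when it fires, which agent it targets, and its name. */
    private static final class Scheduled {
        final double time;
        final int agentId;
        final String eventName;

        Scheduled(double time, int agentId, String eventName) {
            this.time = time;
            this.agentId = agentId;
            this.eventName = eventName;
        }
    }

    // Earliest event first.
    private final PriorityQueue<Scheduled> queue =
            new PriorityQueue<>((a, b) -> Double.compare(a.time, b.time));
    // Instance-to-event mapping: agent id -> callback for that agent's engine.
    private final Map<Integer, Consumer<String>> handlers = new HashMap<>();
    private double now = 0.0;

    void register(int agentId, Consumer<String> handler) {
        handlers.put(agentId, handler);
    }

    /** Schedules an event `delay` time units after the current simulation time. */
    void schedule(double delay, int agentId, String eventName) {
        queue.add(new Scheduled(now + delay, agentId, eventName));
    }

    /** Delivers events in time order on the calling thread, up to the horizon. */
    void runUntil(double horizon) {
        while (!queue.isEmpty() && queue.peek().time <= horizon) {
            Scheduled next = queue.poll();
            now = next.time;
            Consumer<String> handler = handlers.get(next.agentId);
            if (handler != null) {
                handler.accept(next.eventName); // may schedule follow-up events
            }
        }
    }

    double now() {
        return now;
    }

    public static void main(String[] args) {
        DiscreteEventManager manager = new DiscreteEventManager();
        List<String> log = new ArrayList<>();
        manager.register(1, name -> log.add("1:" + name));
        manager.register(2, name -> log.add("2:" + name));
        manager.schedule(5.0, 1, "recover");
        manager.schedule(2.0, 2, "infect");
        manager.runUntil(10.0);
        System.out.println(log);
    }
}
```

Because handlers run one at a time, no locking is needed around the executors, at the cost of using a single core.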

>
> Agent properties: Each disease state also has its effect on the agent's
> properties, for example the infectivity of the agent, or its fitness. The
> agent was passed to the rootContext, so the onentry of each state contains
> code to set the agent's properties specific to that state:
> <script>
> agent.infectivity = 1
> </script>
>
> This is our overall approach. I'd be happy to receive any comments;
> specifically tips regarding the expected speed/performance.
>
Persistence of your data would be the next thing I'd need more information
about. Are you 'just' using SCXML serialization to write out and save your data
and results, or do you have (a need for) some custom database storage solution?
I expect you'll need to analyze the results, and doing that on 100K+ SCXML
documents seems a bit verbose and highly inefficient to me :)

The current (trunk) SCXML datamodel handling and implementation definitely needs
improvements, which are already on the roadmap.
And in your use case, using (only) the datamodel to store the data without a
separate/secondary backend storage might be cause for some concern.

Looking forward to more detailed information: your project is definitely very
interesting and at a scale at which I'd love to see Commons SCXML being used.

I can't really say if Commons SCXML *today* will be able to perform and scale
well enough for this, but if not yet, I'm definitely willing to help improve
and fix it if feasible.

Regards, Ate

> best regards, Rinke
>
>


