
Posted to users@groovy.apache.org by Paul Moore <p....@gmail.com> on 2017/03/28 20:25:28 UTC

Optimising a Groovy script

I'm very much a newbie with Groovy, so I apologise in advance if this
is not the right place for questions like this. I couldn't find
anywhere else that looked like a better option - if there is somewhere
I should have asked, feel free to redirect me.

I want to write a simulation script using Groovy. This is something
of a hobby challenge for me: a friend has done a similar task in C++,
and I'm looking for a more user-friendly language to write the code in
without losing too much performance compared with the C++ version.

The code is basically to simulate "a game". The particular game is
defined by the user, as a function that generates scores. The program
runs the game many times, and summarises the distribution of the
results. Basically, a Monte Carlo simulation. My current code in
Groovy for this, using "roll 3 dice and add up the results" as the
target game, looks as follows:

@Grapes(
    @Grab(group='org.apache.commons', module='commons-math3', version='3.6.1')
)
import org.apache.commons.math3.random.MersenneTwister

def benchmark = { closure ->
  start = System.currentTimeMillis()
  closure.call()
  now = System.currentTimeMillis()
  now - start
}

rng = new MersenneTwister()

int roll() {
    rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
}

int N = 1000000
def results = [:]

def time = benchmark {
    N.times {
        int n = roll()
        results[n] = results.containsKey(n) ? results[n] + 1 : 1
    }
}

println "Took ${time/1000} sec"  // time is in milliseconds, so divide by 1000
for (e in results.sort()) {
    println "$e.key: $e.value"
}

This does exactly what I want, but takes about 5 seconds to run the
simulation on my PC. My friend's C++ code runs a similar simulation in
about 0.1 second. That's a massive penalty for Groovy, and likely
means that for more realistic simulations (which would be a lot more
complex than 3d6!) I wouldn't even be close to competitive.

The code above is completely unoptimised. I know that Groovy's dynamic
programming features can introduce some overhead, but I also get the
impression from the documentation that careful use of exact types and
similar techniques can speed this up a lot (the docs claim potentially
better than C performance in some cases).

What should I be looking at to optimise the above code? The areas I
can think of are:

1. The RNG. I assume that the apache commons code is pretty efficient,
though. I do want a reasonably decent RNG, and I'd heard that the JVM
RNG is not sufficient for simulation. For now I'm assuming that this
is sufficiently good.
2. The roll() function. This is the core of the inner loop, and likely
the big bottleneck. I've declared the type, which I guess is the first
step, but I don't know what else I can do here. I tried a
CompileStatic annotation, but that gave me errors about referencing
the rng variable. I'm not sure what that implies - is my code doing
something wrong in how it references the global rng variable?
3. Collecting the results in a map is likely not ideal. Is there a
better data structure I should be using? I basically want to be able
to count how many times each result appears - results will be integers
(in certain cases I might want non-integers but I can handle them
exceptionally) but I don't necessarily know the range in advance (so
I'd avoid a static array unless it gives significant performance
benefits - when I did a quick test, I got about a second faster
runtime, noticeable, but not enough to get me anywhere near my sub-1
second target)

I tried using the GProf profiler to see if that shed any light on what
I should do. When I ran with a realistic number of iterations it took
forever and then failed with an out of memory error. Dropping it to
10000 iterations, I got

   %  cumulative     self            self    total    self   total    self   total
time     seconds  seconds  calls  ms/call  ms/call  min ms  min ms  max ms  max ms  name
34.5        0.04     0.04  10000     0.00     0.00    0.00    0.00    0.91    1.13  blackpool$_run_closure2$_closure3.roll
18.5        0.06     0.02  39984     0.00     0.00    0.00    0.00    0.53    0.53  java.lang.Integer.plus
15.9        0.08     0.02  10000     0.00     0.01    0.00    0.00    0.28    1.26  blackpool$_run_closure2$_closure3.doCall
11.0        0.10     0.01  30000     0.00     0.00    0.00    0.00    0.36    0.36  org.apache.commons.math3.random.MersenneTwister.nextInt
 5.1        0.10     0.00  10000     0.00     0.00    0.00    0.00    0.19    0.19  java.util.LinkedHashMap.containsKey
 4.4        0.11     0.00  10000     0.00     0.00    0.00    0.00    0.02    0.02  java.util.LinkedHashMap.putAt
 4.0        0.12     0.00      1     5.25   125.62    5.25  125.62    5.25  125.62  java.lang.Integer.times
 3.9        0.12     0.00   9984     0.00     0.00    0.00    0.00    0.02    0.02  java.util.LinkedHashMap.getAt
 2.2        0.12     0.00      1     2.85   128.48    2.85  128.48    2.85  128.48  blackpool$_run_closure2.doCall

But I don't really know how to interpret that. (Also, am I somehow
using GProf wrongly? It seems like it shouldn't run out of memory
profiling a 5-second program run...)

Can anyone offer any advice on what I should be looking at here?

Thanks,
Paul

Re: Optimising a Groovy script

Posted by obesga <ob...@tirant.com>.

If you run compiled code rather than an interpreted script:

- Use Java 8 with the indy (invokedynamic) libraries and compiler
- Use the @CompileStatic annotation on all or part of your code, though you may lose some of the advantages of dynamic code

http://stackoverflow.com/questions/14762094/should-i-use-groovys-compilestatic-if-im-also-using-java-7
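
A minimal sketch of how @CompileStatic can be applied to a script method that needs a shared generator. It assumes java.util.Random as a stand-in for MersenneTwister (so it runs without @Grab) and uses @Field so the static compiler can resolve rng as a field; whether this fits the original script is something to try rather than a given.

```groovy
import groovy.transform.CompileStatic
import groovy.transform.Field

// Assumption: java.util.Random stands in for MersenneTwister here.
// @Field turns the declaration into a field of the script class, which a
// statically compiled method can resolve (a binding variable cannot be).
@Field Random rng = new Random(42)

@CompileStatic
int roll() {
    rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
}

int total = roll()
println "rolled $total"
```

Without @Field (plain `rng = ...`), the static type checker has no declaration to check against, which is one likely source of the errors about referencing rng.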

Hope it helps

Oscar Besga Arcauz


Re: Optimising a Groovy script

Posted by Jochen Theodorou <bl...@gmx.org>.

On 29.03.2017 10:19, Paul Moore wrote:
> On 28 March 2017 at 22:08, Nelson, Erick <Er...@hdsupply.com> wrote:
>> Try this...
>
> Thanks for the suggestion - there were some nice improvements in here.
>
>> def rng = new MersenneTwister()
>>
>> def roll = {
>>         rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
>> }
>
> You changed my definitions to use "def" here. This seems to be the
> thing that makes the most difference in performance. I'm really
> struggling to find a good explanation as to the effect of using or not
> using "def".

Basically, it depends on whether Groovy thinks a cast is needed here,
and on whether the primitive optimizations can kick in.

> I had imagined that using "int roll() {..." would be
> better, as it explicitly states the types which would help the
> compiler avoid the need for generic code. Obviously I was wrong, but
> I'm not at all clear why.

Yes, it is better, but only if the primitive optimizations are working,
or if you compile statically. I haven't looked into why, but in

>         int n = roll()
>         results[n] = results.containsKey(n) ? results[n] + 1 : 1

results is a variable from outside the closure, and I think that alone
is enough to keep the primitive optimizations from being used here. That
means you pay the boxing cost from int to Integer and back to int on
every call to roll, and then you box again for results.containsKey(n),
results[n] = ... and results[n] + 1. In that case it really is much
better to just use Integer from the start, so def has an advantage here.

> Also, if I use "def rng" but keep "int roll()", I get an error "No
> such property: rng". I'm not clear why that is.

you wrote a script, not a class, thus

rng = new MersenneTwister()

will set a variable called rng in the binding, and

int roll() {
     rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
}

this will use rng from the binding.

def rng = new MersenneTwister()

will make a local variable and leave rng unset in the binding, so roll
cannot find it and fails
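
The difference can be seen in a small script. This is only a sketch: java.util.Random stands in for MersenneTwister, and the names boundRng, localRng, rollBound and rollLocal are made up for the illustration.

```groovy
// Assumption: java.util.Random stands in for MersenneTwister.
boundRng = new Random(1)        // no def: stored in the script binding
def localRng = new Random(1)    // def: local variable of the implicit run() method

int rollBound() { boundRng.nextInt(6) + 1 }   // looked up in the binding at run time
int rollLocal() { localRng.nextInt(6) + 1 }   // compiles, but fails when called

assert (1..6).contains(rollBound())

String missing = null
try {
    rollLocal()
} catch (MissingPropertyException e) {
    missing = e.property        // name of the property that could not be found
}
assert missing == 'localRng'
```

The exception caught here is the same kind that produces the "No such property: rng" message.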

> Do you know of a good resource that explains the difference between
> using def and not doing so?

hm... I think the page I wrote back then got lost in the codehaus move :(
bye Jochen

Re: Optimising a Groovy script

Posted by Paul Moore <p....@gmail.com>.

On 29 March 2017 at 15:34, Paul Moore <p....@gmail.com> wrote:
> So the big slowdown comes from removing def from roll, with removing
> def from rng having a smaller but detectable effect.

I just tried to generalise the script, by making a simulate function
that takes the action as a closure. I won't post all the code here,
but basically:

def simulate = { N, cl ->

    def results = ...
    N.times {
        int n = cl()
        results[n]++
    }

    // Report the results
}

simulate(1000000) {
    ... body of roll() here
}

This took 20 seconds to run.

It's certainly possible I've made a stupid mistake here, but I thought
that doing this was essentially a simple refactoring of the original
code, and I'm pretty surprised to see a 40x increase in runtime.
Before I spend ages hunting for my mistake, is there any obvious
reason why this *isn't* just a refactoring, and I should have expected
it to run a lot slower?
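
For anyone wanting to reproduce the refactoring, a runnable version might look like the sketch below. The elided parts are filled with assumptions: java.util.Random replaces MersenneTwister, results is taken to be a withDefault map as used earlier in the thread, and the reporting step simply returns the counts sorted by score. It is not necessarily the code that produced the 20-second figure.

```groovy
// Assumptions: java.util.Random replaces MersenneTwister, and the elided
// "results" initialiser is taken to be a withDefault map.
def simulate = { int n, Closure game ->
    def results = [:].withDefault { 0 }
    n.times {
        int score = game.call()
        results[score]++
    }
    new TreeMap(results)        // return the counts ordered by score
}

def rng = new Random(3)
def counts = simulate(100000) {
    rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
}
counts.each { score, count -> println "$score: $count" }
```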

(If nothing else, this is a great learning exercise for me :-))

Thanks,
Paul

Re: Optimising a Groovy script

Posted by Paul Moore <p....@gmail.com>.

On 29 March 2017 at 14:56, Nelson, Erick <Er...@hdsupply.com> wrote:
> I'm not sure using or not using def would cause performance differences.

There definitely *seems* to be a difference.

def rng = new MersenneTwister()

def roll = {
    rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
}

Both def: 0.4 to 0.5 sec
No def on rng, def on roll: 0.5 to 0.8 sec - spread seems greater, too
Def on rng, not on roll: 1.2 to 1.6 sec, with a couple of 1.9s.
Def on neither: 1.4 to 1.9 sec.

int roll() and no def on rng looks similar to def on roll and not on
rng. But def on rng isn't allowed when I'm using int roll().

All results from multiple runs on my PC, summarising the average/spread by eye.
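
Rather than judging the spread by eye, the runs can be repeated inside the script. A minimal sketch, assuming a made-up helper called timeRuns and java.util.Random in place of MersenneTwister:

```groovy
// Hypothetical helper: runs a closure several times and returns the
// elapsed time of each run in milliseconds.
def timeRuns = { int runs, Closure work ->
    (1..runs).collect {
        long start = System.nanoTime()
        work.call()
        (System.nanoTime() - start) / 1000000   // nanoseconds to milliseconds
    }
}

def rng = new Random(5)
def roll = { rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3 }

def timings = timeRuns(5) {
    100000.times { roll() }
}
println "min ${timings.min()} ms, max ${timings.max()} ms"
```

The first run or two will usually be slower while the JVM warms up, so the later runs are the ones worth comparing.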

So the big slowdown comes from removing def from roll, with removing
def from rng having a smaller but detectable effect.

> It just affects variable scope.
>
> http://mrhaki.blogspot.com/2009/11/groovy-goodness-variable-scope-in.html

I still find this confusing. That article says that def or type are
equivalent (in terms of scope). But int roll() doesn't allow def rng,
whereas def roll does.

I assume that the issue here is that

    def roll = { ... }

is making roll an attribute whose value is a closure. Whereas

    int roll() { ... }

is making roll a method? So that implies that you can't reference a
def from within a method, but you can reference it from within a
closure? OK, as a rule I can accept that might be the case, but I
don't understand why. Furthermore, I can't even refer to rng as
"this.rng" from within roll, which is very confusing, as I thought a
script was compiled as an implicit class, in which case isn't this the
instance of that class, and hence this.attr should be a means of
accessing any attribute of the class?

Looking at the details in the manual ("Program Structure" section 3
"Scripts vs Classes") I think I start to understand:

* def roll = { } is declaring a local variable in the implied run()
method, whose value is a closure
* roll = { } declares a variable in the "script binding" whose value
is a closure
* int roll() { } is a method of the script class
* def rng = ... is a local variable of the run method, which the docs
explicitly state is not available to methods
* rng = ... is a variable in the script binding which *is* visible from methods
* @Field def rng = ... is a field, which is what I'd need to use for
this.rng to work as I'm expecting
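
The cases in this list can be checked directly from a script. A small sketch, with java.util.Random standing in for MersenneTwister and all variable names invented for the illustration:

```groovy
import groovy.transform.Field

def localRng = new Random()              // local variable of the implicit run() method
boundRng = new Random()                  // variable in the script binding
@Field Random fieldRng = new Random()    // field of the generated script class

def roll = { localRng.nextInt(6) + 1 }             // a closure can capture the local variable
int rollField() { this.fieldRng.nextInt(6) + 1 }   // a method can reach the field via this

assert !binding.hasVariable('localRng')
assert binding.hasVariable('boundRng')
assert !binding.hasVariable('fieldRng')
assert (1..6).contains(roll())
assert (1..6).contains(rollField())
```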

I'll have to look some more into the script binding. I don't really
understand the visibility rules for values in the binding, apart from
the few cases explicitly noted (e.g. "they are available from
methods").

Wow, this is more complicated than it looks at first glance!!!

On the other hand, fields, methods, closures and script bindings are
all very different things, so I can easily imagine them having
different performance characteristics. I don't know how I'd develop
an intuition about which is the right one to use in any specific
situation, though. "Try them all and see which works better" feels like
the only practical option at the moment :-(

Thanks,
Paul

Re: Optimising a Groovy script

Posted by "Nelson, Erick" <Er...@hdsupply.com>.

I'm not sure using or not using def would cause performance differences.
It just affects variable scope.

http://mrhaki.blogspot.com/2009/11/groovy-goodness-variable-scope-in.html


Re: Optimising a Groovy script

Posted by Paul Moore <p....@gmail.com>.

On 28 March 2017 at 22:08, Nelson, Erick <Er...@hdsupply.com> wrote:
> Try this...

Thanks for the suggestion - there were some nice improvements in here.

> def rng = new MersenneTwister()
>
> def roll = {
>         rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
> }

You changed my definitions to use "def" here. This seems to be the
thing that makes the most difference in performance. I'm really
struggling to find a good explanation as to the effect of using or not
using "def". I had imagined that using "int roll() {..." would be
better, as it explicitly states the types which would help the
compiler avoid the need for generic code. Obviously I was wrong, but
I'm not at all clear why.

Also, if I use "def rng" but keep "int roll()", I get an error "No
such property: rng". I'm not clear why that is.

Do you know of a good resource that explains the difference between
using def and not doing so? I'm currently working my way through
"Groovy in Action" and while the subject has been discussed, I didn't
really follow it. I've also looked at the online docs and they haven't
helped a lot. It's quite possible that my confusion comes from the
fact that I only really have a casual knowledge of Java, so the
precise way classes reference properties and variables isn't clear to
me - if there's some background reading in Java that would help
clarify, that would be useful too.

> int N = 1000000
> def results = [:].withDefault{0}

I never knew about withDefault - that's a really nice feature, thanks!
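
For reference, a minimal sketch of what withDefault does when used for counting (the sample scores are made up):

```groovy
def counts = [:].withDefault { 0 }   // missing keys read as 0

[7, 11, 7, 10, 7].each { score -> counts[score]++ }

assert counts[7] == 3
assert counts[11] == 1
assert counts[4] == 0          // never seen, so the default is returned
assert counts.containsKey(4)   // and that lookup has now stored the key
```

The last line is worth keeping in mind: merely reading a missing key adds it to the map.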

Thanks for your help,
Paul

Re: Optimising a Groovy script

Posted by "Nelson, Erick" <Er...@hdsupply.com>.

Try this...


@Grapes(
	@Grab(group='org.apache.commons', module='commons-math3', version='3.6.1')
)
import org.apache.commons.math3.random.MersenneTwister

import groovy.time.*

def benchmark = { closure ->
	def start = new Date()
	closure.call()
	TimeCategory.minus(new Date(), start)
}

def rng = new MersenneTwister()

def roll = {
	rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
}

int N = 1000000
def results = [:].withDefault{0}

def time = benchmark {
	(1..N).each {
		int n = roll()
		results[n]++
	}
}

println time
results.each { key, value -> println "$key: $value" }

When I run it, I get this output:

0.565 seconds
12: 115149
14: 69665
18: 4546
6: 46159
10: 125497
11: 125188
9: 116157
8: 97236
15: 46510
7: 69281
13: 96961
5: 27663
16: 27731
3: 4572
17: 13827
4: 13858


Re: Optimising a Groovy script

Posted by Paul Moore <p....@gmail.com>.

On 29 March 2017 at 20:54, Keith Suderman <su...@anc.org> wrote:
> Two optimizations I have not seen mentioned so far; don't be so Groovy ;-)

:-) My background is Python, so I tend to think in terms of highly
dynamic code. That of course is why I like Groovy ;-)

> 1. Replace the Map<> with an array of primitive ints. Why use an integer
> as a key into a hash map when it can be used as an array index?
> int[] results = new int[19] // since we need to index values 3..18

Two reasons, which I did mention, but only in passing. I don't
necessarily know the range of outputs (so a fixed upper bound may not
be appropriate), and in some cases the result may not even be an int.
I could special-case those situations, but I'd prefer to get as much
out of a general design as I can before going down that route.

> 2. Replace the N.times{} or (1..N).each{} loops with a good old fashioned
> for(int i=0;i<N;i++) loop. I was actually surprised how much of an
> improvement this made if @CompileStatic was not used.  For statically
> compiled code this didn't really make a difference, but for dynamic code the
> speed up is huge.

I'll definitely try that!

The roll() function will ultimately (in my final version) be supplied
by the user, so I don't want to impose too many restrictions on what's
allowed there. I'm OK with making the user responsible for a certain
level of tuning the code, but a key goal for me is to make the way the
user specifies the function to simulate as straightforward as
possible.

> Of course, the big winner is using @CompileStatic, which when combined with
> using an array results in a  ~10x speed up.

Yep, those figures are impressive. I'll take a good look at
@CompileStatic - I've not really looked at it much yet. From what I've
read, it imposes some restrictions on the code, but I don't think the
restrictions will be a problem for my situation. One thought, though -
if the function being simulated is supplied by the user, would it have
to be the user's responsibility to include the @CompileStatic
annotation? I assume it's not possible for my simulation function to
take the user's code as input, and apply @CompileStatic to it before
running it? Even if I don't want a solution that reads code at
runtime, I'd still probably like a function that took the user's code
as a closure - something like "simulate(1000000) { user code here }"
but again I assume that means I can't use @CompileStatic?
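
One arrangement that could be tried is to statically compile only the driver and leave the user's closure dynamic. The sketch below assumes a driver method named simulate with this particular signature, and java.util.Random in place of MersenneTwister; how much speed it actually recovers would have to be measured.

```groovy
import groovy.transform.CompileStatic

// Only the driver loop is statically compiled; the closure passed in by
// the caller is ordinary dynamic Groovy.
@CompileStatic
Map<Integer, Integer> simulate(int n, Closure<Integer> game) {
    Map<Integer, Integer> counts = new TreeMap<Integer, Integer>()
    for (int i = 0; i < n; i++) {
        Integer score = game.call()
        Integer seen = counts.get(score)
        counts.put(score, seen == null ? 1 : seen + 1)
    }
    counts
}

def rng = new Random(7)   // assumption: stands in for MersenneTwister
def counts = simulate(10000) {
    rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
}
counts.each { score, count -> println "$score: $count" }
```

The user writes nothing beyond the closure body, which keeps the "simulate(1000000) { user code here }" shape.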

The array definitely seems like a win, and as it's something I can do
in the driver code without affecting the user code, I may look again
at that. Maybe I'll try a hybrid approach that uses an array for most
values with a map to hold "outliers" (numbers bigger than I expect and
non-numbers). That may lose the benefit, though - the only way to know
for sure is to measure.
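
A rough sketch of that hybrid idea, using a hypothetical HybridTally class: an int[] covers scores from 0 up to a chosen limit, and a map catches everything else.

```groovy
// Hypothetical tally: an int[] for integer scores in 0..<limit,
// and a map for larger values and non-integers.
class HybridTally {
    final int[] dense
    final Map<Object, Integer> outliers = [:]

    HybridTally(int limit) { dense = new int[limit] }

    void add(Object score) {
        if (score instanceof Integer && score >= 0 && score < dense.length) {
            dense[(int) score]++
        } else {
            outliers[score] = (outliers[score] ?: 0) + 1
        }
    }

    int count(Object score) {
        if (score instanceof Integer && score >= 0 && score < dense.length) {
            return dense[(int) score]
        }
        outliers[score] ?: 0
    }
}

def tally = new HybridTally(19)
[3, 18, 18, 42, 'bust'].each { tally.add(it) }
assert tally.count(18) == 2
assert tally.count(42) == 1      // too large for the array, held in the map
assert tally.count('bust') == 1  // non-number, also held in the map
```

The instanceof test on every add is exactly the kind of overhead that might cancel out the gain from the array.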

Thanks for the suggestions.
Paul

Re: Optimising a Groovy script

Posted by Keith Suderman <su...@anc.org>.

Two optimizations I have not seen mentioned so far; don't be so Groovy ;-)

1. Replace the Map<> with an array of primitive ints. Why use an integer as a key into a hash map when it can be used as an array index?
	int[] results = new int[19] // since we need to index values 3..18

2. Replace the N.times{} or (1..N).each{} loops with a good old-fashioned for(int i=0;i<N;i++) loop. I was actually surprised how much of an improvement this made when @CompileStatic was not used. For statically compiled code it didn't really make a difference, but for dynamic code the speedup is huge.

Of course, the big winner is using @CompileStatic, which when combined with an array results in a ~10x speedup.

For my tests I basically took the code posted by John and James and ran 50 rounds of 1M iterations.  Here are some representative total times on my oldish quad core MacBook Pro:

@CompileDynamic
Map + N.times = 17 seconds
Map + for-loop = 7 seconds
Array + N.times = 13 seconds
Array + for-loop = 3 seconds

@CompileStatic
Map + N.times = 3.4 seconds
Map + for-loop = 3.0 seconds
Array + N.times = 1.4 seconds
Array + for-loop = 1.4 seconds


Interestingly, dynamically compiled code using a primitive array and a for-loop performed just as well as statically compiled code doing it the "Groovy way".
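
For illustration, the "Array + for-loop" variant under @CompileStatic might look something like this. It is a sketch rather than the code used for the timings above; the method name tallyRolls is made up and java.util.Random stands in for MersenneTwister.

```groovy
import groovy.transform.CompileStatic

@CompileStatic
int[] tallyRolls(int n, Random rng) {
    int[] results = new int[19]          // only indices 3..18 are used
    for (int i = 0; i < n; i++) {
        int score = rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
        results[score]++
    }
    results
}

int[] results = tallyRolls(1000000, new Random(11))
(3..18).each { println "$it: ${results[it]}" }
```

Removing the @CompileStatic line gives the dynamic "Array + for-loop" case for comparison.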

- Keith


----------------------
Keith Suderman
Research Associate
Department of Computer Science
Vassar College, Poughkeepsie NY
suderman@cs.vassar.edu

Re: Optimising a Groovy script

Posted by James Kleeh <ja...@gmail.com>.

This version runs in around 0.10 seconds.

@Grapes(
    @Grab(group='org.apache.commons', module='commons-math3', version='3.6.1')
)
import org.apache.commons.math3.random.MersenneTwister

@groovy.transform.CompileStatic
class Benchmark {

    int benchmark(Closure closure) {
      def start = System.currentTimeMillis()
      closure.call()
      System.currentTimeMillis() - start
    }

    def run() {
        int N = 1000000
        Map<Integer,Integer> results
        
        int time = benchmark {
            results = Twister.rolls(N)
        }
        
        println "Took ${time/1000} sec"
        for (e in results.sort()) {
            println "$e.key: $e.value"
        }
    }

}

@groovy.transform.CompileStatic
class Twister {
    static MersenneTwister rng = new MersenneTwister()
    
    static int roll() {
        rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
    }
    
    static Map<Integer,Integer> rolls(int num) {
        Map<Integer,Integer> results = [:]
        num.times {
            int n = roll()
            results[n] = results.containsKey(n) ? results[n] + 1 : 1
        }
        return results
    }
}

new Benchmark().run()

> On Mar 28, 2017, at 5:41 PM, John Wagenleitner <jo...@gmail.com> wrote:
> 
> Hi Paul,
> 
> The milliseconds to seconds conversion was off, so that puts the real time at ~0.5 seconds.
>  
> println "Took ${time/100} sec"
> 
> Using the following I get somewhere close to 0.15.  Using an int array may be worth it for higher values of N to avoid the boxing/unboxing of the ints.
> 
> @Grapes(
>     @Grab(group='org.apache.commons', module='commons-math3', version='3.6.1')
> )
> import org.apache.commons.math3.random.MersenneTwister
> 
> def benchmark = { closure ->
>   start = System.currentTimeMillis()
>   closure.call()
>   now = System.currentTimeMillis()
>   now - start
> }
> 
> @groovy.transform.CompileStatic
> class Twister {
>     static MersenneTwister rng = new MersenneTwister()
>     
>     static int roll() {
>         rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
>     }
>     
>     static Map<Integer,Integer> rolls(int num) {
>         Map<Integer,Integer> results = [:]
>         num.times {
>             int n = roll()
>             results[n] = results.containsKey(n) ? results[n] + 1 : 1
>         }
>         return results
>     }
> }
> 
> int N = 1000000
> def results
> 
> def time = benchmark {
>     results = Twister.rolls(N)
> }
> 
> println "Took ${time/1000} sec"
> for (e in results.sort()) {
>     println "$e.key: $e.value"
> }
> 

Re: Optimising a Groovy script

Posted by Paul Moore <p....@gmail.com>.

On 28 March 2017 at 22:41, John Wagenleitner
<jo...@gmail.com> wrote:
> Hi Paul,
>
> The milliseconds to seconds conversion was off, so that puts the real time
> at ~0.5 seconds.
>
>>
>> println "Took ${time/100} sec"

*bangs head against desk*...

It's always the stupid mistakes that trip you up!

Regardless, though, thanks to Erick and James for their suggestions,
there are a number of things in them that look a lot cleaner and add
further timing improvements.

Paul

Re: Optimising a Groovy script

Posted by John Wagenleitner <jo...@gmail.com>.

Hi Paul,

The milliseconds-to-seconds conversion was off (the script divides by 100
rather than 1000), so that puts the real time at ~0.5 seconds.


> println "Took ${time/100} sec"


Using the following I get somewhere close to 0.15 seconds.  Using an int array
may be worth it for higher values of N to avoid the boxing/unboxing of the ints.

@Grapes(
    @Grab(group='org.apache.commons', module='commons-math3', version='3.6.1')
)
import org.apache.commons.math3.random.MersenneTwister

def benchmark = { closure ->
  start = System.currentTimeMillis()
  closure.call()
  now = System.currentTimeMillis()
  now - start
}

@groovy.transform.CompileStatic
class Twister {
    static MersenneTwister rng = new MersenneTwister()

    static int roll() {
        rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
    }

    static Map<Integer,Integer> rolls(int num) {
        Map<Integer,Integer> results = [:]
        num.times {
            int n = roll()
            results[n] = results.containsKey(n) ? results[n] + 1 : 1
        }
        return results
    }
}

int N = 1000000
def results

def time = benchmark {
    results = Twister.rolls(N)
}

println "Took ${time/1000} sec"
for (e in results.sort()) {
    println "$e.key: $e.value"
}


