Posted to user@uima.apache.org by Peter Thygesen <pt...@gmail.com> on 2012/03/23 00:34:05 UTC

How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

I am trying to build a token model with the UIMA CAS Editor, but I get an
out-of-memory exception. The run configuration is a UIMA Analysis Engine
configuration and does not have an arguments tab (like plain Java apps) where
I can set the command-line arguments for assigning more memory.

Yes I know.. probably a newbie question :-{

/PEter

Mar 23, 2012 12:27:04 AM opennlp.uima.tokenize.TokenizerTrainer collectionProcessComplete(203)
INFO: Collected 930 token samples.
Indexing events using cutoff of 5
Computing event counts...  done. 1369984 events
Indexing...  Exception in thread "Poller SunPKCS11-Darwin" java.lang.OutOfMemoryError: Java heap space
at sun.security.pkcs11.wrapper.PKCS11.C_GetSlotInfo(Native Method)
at sun.security.pkcs11.SunPKCS11.initToken(SunPKCS11.java:767)
at sun.security.pkcs11.SunPKCS11.access$100(SunPKCS11.java:42)
at sun.security.pkcs11.SunPKCS11$TokenPoller.run(SunPKCS11.java:700)
at java.lang.Thread.run(Thread.java:680)

Re: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Posted by Jörn Kottmann <ko...@gmail.com>.
No, we added an Eclipse launcher to start AEs, so that you do
not have to use a script.

Jörn

On 03/23/2012 02:06 AM, Fitch, Britt wrote:
> I assume you are running one of the scripts in the bin directory. If so, you can add the java -Xmx parameter where java gets called. If you look in runUimaClass.[bat|sh], there is a variable for JVM_OPTS.
>
> Hope that helps.


RE: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Posted by "Fitch, Britt" <Br...@hms.harvard.edu>.
I assume you are running one of the scripts in the bin directory. If so, you can add the java -Xmx parameter where java gets called. If you look in runUimaClass.[bat|sh], there is a variable for JVM_OPTS.

Hope that helps.


________________________________________
From: Peter Thygesen [pt.activemq@gmail.com]
Sent: Thursday, March 22, 2012 7:34 PM
To: user@uima.apache.org
Subject: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

I am trying to build a token model with the UIMA CAS Editor, but I get an
out-of-memory exception. The run configuration is a UIMA Analysis Engine
configuration and does not have an arguments tab (like plain Java apps) where
I can set the command-line arguments for assigning more memory.

Yes I know.. probably a newbie question :-{

/PEter

Mar 23, 2012 12:27:04 AM opennlp.uima.tokenize.TokenizerTrainer
collectionProcessComplete(203)

INFO: Collected 930 token samples.

Indexing events using cutoff of 5


 Computing event counts...  done. 1369984 events

Indexing...  Exception in thread "Poller SunPKCS11-Darwin"
java.lang.OutOfMemoryError: Java heap space

at sun.security.pkcs11.wrapper.PKCS11.C_GetSlotInfo(Native Method)

at sun.security.pkcs11.SunPKCS11.initToken(SunPKCS11.java:767)

at sun.security.pkcs11.SunPKCS11.access$100(SunPKCS11.java:42)

at sun.security.pkcs11.SunPKCS11$TokenPoller.run(SunPKCS11.java:700)

at java.lang.Thread.run(Thread.java:680)

Re: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Posted by Peter Thygesen <pt...@gmail.com>.
Yes! Now it works. Apparently I was missing the -xmi option in the
arguments passed to the program. The reason is that when you
suggested I use the RunAE class to build my token model, I read the
documentation written above the class (in the source code). There you (the UIMA
team) forgot to describe the -xmi option. :-/ I found it when I tried to
run it from the terminal.

Thanks for your help, Jörn.

Peter Thygesen
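
For anyone landing on this thread later, here is a minimal sketch of how the
working launch described above could be assembled as a command line. The main
class, the -s2 and -xmi options, the descriptor path, the corpus directory and
the -Xmx1000m setting are all taken from this thread. The classpath value is a
placeholder, and the position of -xmi relative to -s2 is an assumption (both
are placed before the descriptor, as -s2 was in the original arguments).

```java
import java.util.ArrayList;
import java.util.List;

public class RunAeCommand {

    // Builds the argument list for a child JVM that runs RunAE with a larger heap.
    // maxHeap is the value appended to -Xmx, e.g. "1000m".
    static List<String> buildCommand(String maxHeap, String classpath,
                                     String descriptor, String inputDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx" + maxHeap);                     // VM argument: maximum heap size
        cmd.add("-cp");
        cmd.add(classpath);                            // must contain uimaj-examples.jar and its dependencies
        cmd.add("org.apache.uima.examples.RunAE");     // main class named in this thread
        cmd.add("-xmi");                               // read the input files as XMI
        cmd.add("-s2");                                // option used in the original arguments
        cmd.add(descriptor);
        cmd.add(inputDir);
        return cmd;
    }

    public static void main(String[] args) {
        // "path/to/classpath" is a placeholder, not a real location.
        List<String> cmd = buildCommand("1000m", "path/to/classpath",
                "descriptors/TokenizerTrainer.xml", "corpus");
        System.out.println(String.join(" ", cmd));
        // To actually start the process, pass the list to new ProcessBuilder(cmd).
    }
}
```

In an Eclipse "Java Application" launcher the same values go into the
"Program arguments" and "VM arguments" fields instead.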

On 13 Apr 2012 at 11:47, Jörn Kottmann <ko...@gmail.com> wrote:

> On 04/12/2012 11:32 AM, Peter Thygesen wrote:
>
>> Strange. Still problems. I reduced the corpus to 10 files. Running
>> with RunAE still doesn't produce any events, but when I run it with the UIMA
>> Analysis Engine configuration it works.
>>
>
> That sounds strange, because it should not make a difference at all.
> The trivial explanation is that something really is different,
> e.g. you are not consuming the same CASes, or you are using another XML
> descriptor for the training. I suggest double-checking that.
>
> Or you are just hitting some kind of bug. To figure that out, we should
> improve the log output of the OpenNLP Tokenizer Trainer AE so that
> it actually tells us what is wrong.
> Would you mind building a trunk version of OpenNLP and testing with that one
> instead?
>
> Jörn
>
>

Re: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Posted by Jörn Kottmann <ko...@gmail.com>.
On 04/12/2012 11:32 AM, Peter Thygesen wrote:
> Strange. Still problems. I reduced the corpus to 10 files. Running
> with RunAE still doesn't produce any events, but when I run it with the UIMA
> Analysis Engine configuration it works.

That sounds strange, because it should not make a difference at all.
The trivial explanation is that something really is different,
e.g. you are not consuming the same CASes, or you are using another XML
descriptor for the training. I suggest double-checking that.

Or you are just hitting some kind of bug. To figure that out, we should
improve the log output of the OpenNLP Tokenizer Trainer AE so that
it actually tells us what is wrong.
Would you mind building a trunk version of OpenNLP and testing with that one
instead?

Jörn


Re: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Posted by Peter Thygesen <pt...@gmail.com>.
Strange. Still problems. I reduced the corpus to 10 files. Running
with RunAE still doesn't produce any events, but when I run it with the UIMA
Analysis Engine configuration it works.
I'm stuck.
:(

On 30 Mar 2012 at 10:21, Jörn Kottmann <ko...@gmail.com> wrote:

> On 03/29/2012 11:22 AM, Peter Thygesen wrote:
>
>> Using RunAE;
>> Must be doing something wrong. No model is created and I don't see any
>> scores being generated...
>>
>> main class: org.apache.uima.examples.RunAE
>>
>> arguments: -s2 descriptors/TokenizerTrainer.xml corpus
>>
>> VM args: -Xmx1000m
>>
>>
>>
>> CONSOLE OUTPUT:
>> --------------------------------------
>> Processed Document aaaaaaa.xmi
>>
>> .....
>>
>> Processed Document zzzzzzz.xmi
>>
>> Mar 29, 2012 11:17:00 AM opennlp.uima.tokenize.TokenizerTrainer
>> collectionProcessComplete(203)
>>
>> INFO: Collected 929 token samples.
>>
>
> It was able to find 929 sentences, but maybe they do not
> contain tokens?
>
> You should check the sentence and token types in your Tokenizer Trainer
> descriptor. Do the types specified there match the annotations
> in the CAS?
>
>
>> Indexing events using cutoff of 5
>>
>>  Computing event counts...  done. 0 events
>>
>
> It should be able to generate a couple of thousand events
> here, so it is strange that it's zero.
>
> Anyway we might want to enhance the log output a bit so we can
> find problems.
>
> Jörn
>

Re: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Posted by Jörn Kottmann <ko...@gmail.com>.
On 03/29/2012 11:22 AM, Peter Thygesen wrote:
> Using RunAE;
> Must be doing something wrong. No model is created and I don't see any
> scores being generated...
>
> main class: org.apache.uima.examples.RunAE
>
> arguments: -s2 descriptors/TokenizerTrainer.xml corpus
>
> VM args: -Xmx1000m
>
>
>
> CONSOLE OUTPUT:
> --------------------------------------
> Processed Document aaaaaaa.xmi
>
> .....
>
> Processed Document zzzzzzz.xmi
>
> Mar 29, 2012 11:17:00 AM opennlp.uima.tokenize.TokenizerTrainer
> collectionProcessComplete(203)
>
> INFO: Collected 929 token samples.

It was able to find 929 sentences, but maybe they do not
contain tokens?

You should check the sentence and token types in your Tokenizer Trainer
descriptor. Do the types specified there match the annotations
in the CAS?

> Indexing events using cutoff of 5
>
>
>   Computing event counts...  done. 0 events
>
>

It should be able to generate a couple of thousand events
here, so it is strange that it's zero.

Anyway we might want to enhance the log output a bit so we can
find problems.

Jörn
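
One rough way to do the check suggested above without writing any UIMA code is
to count the element names in one of the corpus .xmi files and compare them
with the sentence and token type names configured in the trainer descriptor.
The sketch below uses only the JDK's SAX parser and assumes that each
annotation is serialized as one XML element named prefix:ShortTypeName. The
class name XmiTypeCounter is made up for this example.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class XmiTypeCounter {

    // Counts how often each qualified element name occurs in the XML stream.
    static Map<String, Integer> countElements(InputStream in) throws Exception {
        final Map<String, Integer> counts = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The default parser is not namespace-aware, so qName holds
                // the name exactly as written in the file, prefix included.
                counts.merge(qName, 1, Integer::sum);
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java XmiTypeCounter <file.xmi>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            for (Map.Entry<String, Integer> e : countElements(in).entrySet()) {
                System.out.println(e.getValue() + "\t" + e.getKey());
            }
        }
    }
}
```

If the token type named in the descriptor never shows up in the output, or
shows up under a different name, that would be consistent with the trainer
reporting 0 events.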

Re: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Posted by Peter Thygesen <pt...@gmail.com>.
Using RunAE;
Must be doing something wrong. No model is created and I don't see any
scores being generated...

main class: org.apache.uima.examples.RunAE

arguments: -s2 descriptors/TokenizerTrainer.xml corpus

VM args: -Xmx1000m



CONSOLE OUTPUT:
--------------------------------------
Processed Document aaaaaaa.xmi

.....

Processed Document zzzzzzz.xmi

Mar 29, 2012 11:17:00 AM opennlp.uima.tokenize.TokenizerTrainer
collectionProcessComplete(203)

INFO: Collected 929 token samples.

Indexing events using cutoff of 5


 Computing event counts...  done. 0 events

Indexing...  done.

Sorting and merging events... Done indexing.

Incorporating indexed data for training...

PERFORMANCE STATS

-----------------

Component Name: File System Collection Reader

Event Type: Process

Duration: 614ms (79.64%)

Result: success

Component Name: UserAE

Event Type: Analysis

Duration: 92ms (11.93%)

Component Name: UserAE

Event Type: End of Batch

Duration: 65ms (8.43%)



Total Analysis Engine Time: 0ms

Analysis: <10ms

Framework Overhead: <10ms


On 29 Mar 2012 at 10:22, Peter Thygesen <pt...@gmail.com> wrote:

> Hi Jörn,
> Sorry for the late reply. Caught a cold and had to spend some days home :(.
>
> Couldn't we log it as a feature request in your Jira? Perhaps someone else
> will encounter this problem one day.
>
> thx
> Peter
>
>
>
> On 23 Mar 2012 at 10:01, Jörn Kottmann <ko...@gmail.com> wrote:
>
> On 03/23/2012 12:34 AM, Peter Thygesen wrote:
>>
>>> I am trying to build a token model with the UIMA CAS Editor, but I get an
>>> out-of-memory exception. The run configuration is a UIMA Analysis Engine
>>> configuration and does not have an arguments tab (like plain Java apps) where
>>> I can set the command-line arguments for assigning more memory.
>>>
>>
>> There is currently no tab where you can specify VM arguments like
>> it is possible for Java apps. We need to add one.
>>
>> Do you want to open a jira?
>>
>> In the meantime I suggest that you use the RunAE class to do the
>> training.
>>
>> It can be found here in our distribution:
>> apache-uima/examples/src/org/apache/uima/examples/RunAE.java
>>
>> Or just add the uimaj-examples.jar to your class path, create a new
>> Eclipse launcher and then give it more memory.
>>
>> Jörn
>>
>
>

Re: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Posted by Jörn Kottmann <ko...@gmail.com>.
On 03/29/2012 10:22 AM, Peter Thygesen wrote:
> Hi Jörn,
> Sorry for the late reply. Caught a cold and had to spend some days home :(.
>
> Couldn't we log it as a feature request in your Jira?

+1, please open a jira issue.

Jörn

Re: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Posted by Peter Thygesen <pt...@gmail.com>.
Hi Jörn,
Sorry for the late reply. Caught a cold and had to spend some days home :(.

Couldn't we log it as a feature request in your Jira? Perhaps someone else
will encounter this problem one day.

thx
Peter



On 23 Mar 2012 at 10:01, Jörn Kottmann <ko...@gmail.com> wrote:

> On 03/23/2012 12:34 AM, Peter Thygesen wrote:
>
>> I am trying to build a token model with the UIMA CAS Editor, but I get an
>> out-of-memory exception. The run configuration is a UIMA Analysis Engine
>> configuration and does not have an arguments tab (like plain Java apps) where
>> I can set the command-line arguments for assigning more memory.
>>
>
> There is currently no tab where you can specify VM arguments like
> it is possible for Java apps. We need to add one.
>
> Do you want to open a jira?
>
> In the meantime I suggest that you use the RunAE class to do the
> training.
>
> It can be found here in our distribution:
> apache-uima/examples/src/org/apache/uima/examples/RunAE.java
>
> Or just add the uimaj-examples.jar to your class path, create a new Eclipse
> launcher and then give it more memory.
>
> Jörn
>

Re: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Posted by Jörn Kottmann <ko...@gmail.com>.
On 03/23/2012 12:34 AM, Peter Thygesen wrote:
> I am trying to build a token model with the UIMA CAS Editor, but I get an
> out-of-memory exception. The run configuration is a UIMA Analysis Engine
> configuration and does not have an arguments tab (like plain Java apps) where
> I can set the command-line arguments for assigning more memory.

There is currently no tab where you can specify VM arguments like
it is possible for Java apps. We need to add one.

Do you want to open a jira?

In the meantime I suggest that you use the RunAE class to do the
training.

It can be found here in our distribution:
apache-uima/examples/src/org/apache/uima/examples/RunAE.java

Or just add the uimaj-examples.jar to your class path, create a new Eclipse
launcher and then give it more memory.

Jörn
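
When a new launcher is set up as suggested above, it can be useful to confirm
that the heap setting was actually picked up before starting a long training
run. The sketch below is a standalone check, not part of UIMA or OpenNLP. The
class name HeapCheck is made up for this example. Run it from a launcher that
uses the same VM arguments (for example -Xmx1000m) as the one used for RunAE.

```java
public class HeapCheck {

    // Converts a byte count to whole mebibytes (1 MB = 1024 * 1024 bytes here).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap this JVM will try to use,
        // which reflects the -Xmx value if one was given.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap available to this JVM: "
                + toMegabytes(maxBytes) + " MB");
    }
}
```

The reported number is often somewhat lower than the -Xmx value, depending on
the JVM and garbage collector, so treat it as an approximate confirmation.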