Posted to user@uima.apache.org by Peter Thygesen <pt...@gmail.com> on 2012/04/12 11:32:46 UTC

Re: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Strange, I still have problems. I reduced the corpus to 10 files. Running
with RunAE still doesn't produce any events, but when I run it with a UIMA
Analysis Engine configuration it works.
I'm stuck.
:(

On 30 Mar 2012 at 10:21, Jörn Kottmann <ko...@gmail.com> wrote:

> On 03/29/2012 11:22 AM, Peter Thygesen wrote:
>
>> Using RunAE:
>> I must be doing something wrong. No model is created and I don't see any
>> scores being generated...
>>
>> main class: org.apache.uima.examples.RunAE
>>
>> arguments: -s2 descriptors/TokenizerTrainer.xml corpus
>>
>> VM args: -Xmx1000m
>>
>>
>>
>> CONSOLE OUTPUT:
>> --------------------------------------
>> Processed Document aaaaaaa.xmi
>>
>> .....
>>
>> Processed Document zzzzzzz.xmi
>>
>> Mar 29, 2012 11:17:00 AM opennlp.uima.tokenize.TokenizerTrainer
>> collectionProcessComplete(203)
>>
>> INFO: Collected 929 token samples.
>>
>
> It was able to find 929 sentences, but maybe they do not
> contain tokens?
>
> You should check the sentence and token types in your Tokenizer Trainer
> descriptor. Do the types specified there match the annotations
> in the CAS?
>
>
>> Indexing events using cutoff of 5
>>
>>
>> Computing event counts...  done. 0 events
>>
>>
>>
> It should be able to generate a couple of thousand events
> here, so it is strange that it's zero.
>
> Anyway, we might want to enhance the log output a bit so we can
> find such problems.
>
> Jörn
>

Re: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Posted by Peter Thygesen <pt...@gmail.com>.
Yes! Now it works. Apparently I was missing the -xmi option in the
arguments passed to the program. The reason is that when you suggested
I use the RunAE class to build my token model, I read the documentation
written above the class (in the source code), and there you (the UIMA
team) forgot to describe the -xmi option. :-/ I found it when I tried to
run it from the terminal.
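
For reference, the working invocation would look roughly like the sketch
below. It reuses the main class, arguments and VM args quoted earlier in
the thread and adds the -xmi option. The classpath value and the
runae_cmd helper are hypothetical placeholders, not something from the
thread; the script only prints the command so it can be checked before
running.

```shell
#!/bin/sh
# Hypothetical placeholder: adjust to wherever your UIMA and OpenNLP jars live.
UIMA_CLASSPATH='lib/*'

# Prints the RunAE command line instead of running it, so it can be reviewed first.
#   -Xmx1000m  heap size, taken from the VM args quoted in the thread
#   -xmi       the option that was missing from the original arguments
#   -s2        kept as in the original arguments
runae_cmd() {
    printf '%s ' java -Xmx1000m -cp "'$UIMA_CLASSPATH'" \
        org.apache.uima.examples.RunAE \
        -xmi -s2 descriptors/TokenizerTrainer.xml corpus
    printf '\n'
}

runae_cmd
```

Once the classpath is right, pasting the printed line into a terminal
should start the training run.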

Thanks for your help, Jörn.

Peter Thygesen

On 13 Apr 2012 at 11:47, Jörn Kottmann <ko...@gmail.com> wrote:

> On 04/12/2012 11:32 AM, Peter Thygesen wrote:
>
>> Strange, I still have problems. I reduced the corpus to 10 files. Running
>> with RunAE still doesn't produce any events, but when I run it with a UIMA
>> Analysis Engine configuration it works.
>>
>
> That sounds strange, because it should not make a difference at all.
> The trivial explanation is that something really is different, e.g.
> you are not consuming the same CASes, or you are using another XML
> descriptor for the training. I suggest double-checking that.
>
> Or you are just hitting some kind of bug. To figure that out we should
> improve the log output of the OpenNLP Tokenizer Trainer AE so that
> it actually tells us what is wrong.
> Would you mind building a trunk version of OpenNLP and testing with that
> one instead?
>
> Jörn
>
>

Re: How do I assign more heap memory to CAS Editor Analysis Engine when running Training?

Posted by Jörn Kottmann <ko...@gmail.com>.
On 04/12/2012 11:32 AM, Peter Thygesen wrote:
> Strange, I still have problems. I reduced the corpus to 10 files. Running
> with RunAE still doesn't produce any events, but when I run it with a UIMA
> Analysis Engine configuration it works.

That sounds strange, because it should not make a difference at all.
The trivial explanation is that something really is different, e.g.
you are not consuming the same CASes, or you are using another XML
descriptor for the training. I suggest double-checking that.

Or you are just hitting some kind of bug. To figure that out we should
improve the log output of the OpenNLP Tokenizer Trainer AE so that
it actually tells us what is wrong.
Would you mind building a trunk version of OpenNLP and testing with that
one instead?

Jörn