Posted to user@uima.apache.org by Ar...@bka.bund.de on 2015/01/22 09:20:38 UTC

RUTA and shared resources

Hello!

This is a very short and simple gazetteer using RUTA.

Document{->GREEDYANCHORING(true)};
%s*{->MARKFAST(%s,'%s')};

where the first %s is replaced using String.format() by the name of the source type, the second %s is replaced by the target type name, and the third %s is replaced by the URL of a word list. That makes it a little bit more flexible. This is done once in CasAnnotator_ImplBase.initialize().

Then the script is executed with Ruta.apply(cas, script) in process(). But that means that the word list is read again for every CAS processed. Is there any way to have RUTA use the word list as a SharedResourceObject, so that it is read only once?
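
For readers following along, the template substitution described above could look roughly like the sketch below. The class name GazetteerScript, the method build(), and any type names or URLs passed to it are assumptions made for illustration; only the two rule lines come from the message.

```java
/** Sketch: fills the two-rule gazetteer template with String.format(). */
final class GazetteerScript {

    // The three %s placeholders are, in order: source type, target type,
    // and word list URL. The rule text itself is taken from the message.
    private static final String TEMPLATE =
            "Document{->GREEDYANCHORING(true)};\n"
          + "%s*{->MARKFAST(%s,'%s')};";

    private GazetteerScript() { }

    /** Returns the script text; intended to be called once, e.g. in initialize(). */
    static String build(String sourceType, String targetType, String wordListUrl) {
        return String.format(TEMPLATE, sourceType, targetType, wordListUrl);
    }
}
```

Calling GazetteerScript.build("Token", "City", "file:/tmp/cities.txt") with these placeholder values yields the first rule unchanged and a second rule with the three values filled in.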

Regards,
Armin

Re: RUTA and shared resources

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
Hi,

We are currently waiting for the uimaj release, and there are still some
open issues. End of February could be realistic.

Best,

Peter

On 27.01.2015 at 13:40, Armin.Wegner@bka.bund.de wrote:
> Hi!
>
> Looks good, but it is not part of the current release. It's not urgent enough to deviate from the current stable release. Any idea when 2.3.0 will be released?
>
> Thanks,
> Armin


Re: RUTA and shared resources

Posted by Ar...@bka.bund.de.
Hi!

Looks good, but it is not part of the current release. It's not urgent enough to deviate from the current stable release. Any idea when 2.3.0 will be released?

Thanks,
Armin

-----Original Message-----
From: Silvestre Losada [mailto:silvestre.losada@gmail.com]
Sent: Sunday, 25 January 2015 08:42
To: user@uima.apache.org
Subject: Re: RUTA and shared resources

Hi Armin,

Apologies for the late response. I was able to load a data table as an
external resource, and I think the example shown in the comment on
UIMA-4062 is self-explanatory. If you have any issues loading it, please
contact me.

Kind regards.


Re: RUTA and shared resources

Posted by Silvestre Losada <si...@gmail.com>.
Hi Armin,

Apologies for the late response. I was able to load a data table as an
external resource, and I think the example shown in the comment on
UIMA-4062 is self-explanatory. If you have any issues loading it, please
contact me.

Kind regards.


Re: RUTA and shared resources

Posted by Ar...@bka.bund.de.
Hi Peter!

Thanks for your help. I will look at it.
At least for now, greedy anchoring and MARKFAST work as expected. But so far I've only used short word lists with simple entries.

Cheers,
Armin







Re: RUTA and shared resources

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
Hi,

On 22.01.2015 at 09:20, Armin.Wegner@bka.bund.de wrote:
> Hello!
>
> This is a very short and simple gazetteer using RUTA.
>
> Document{->GREEDYANCHORING(true)};
> %s*{->MARKFAST(%s,'%s')};

First of all, I am sorry that I was not yet able to implement the greedy
matching for the gazetteers/wordlists. I have not forgotten it.
Just curious: does the rule perform as you expect/intend? I mean the
combination of greedy anchoring and the windowed stream caused by the
matching condition.


>
> where the first %s is replaced using String.format() by the name of
> the source type, the second %s is replaced by the target type name, and
> the third %s is replaced by the URL of a word list. That makes it a
> little bit more flexible. This is done once in
> CasAnnotator_ImplBase.initialize().
>
> Then the script is executed with Ruta.apply(cas, script) in process().
> But that means that the word list is read again for every CAS processed.
> Is there any way to have RUTA use the word list as a
> SharedResourceObject, so that it is read only once?

The problem is that Ruta.apply() creates a new descriptor and a new
analysis engine on every call. You could instead keep the Ruta analysis
engine as a field of your own analysis engine, set it up in your
initialize(), and call its process() from your process() method. Then
the word lists should not be reloaded for each process().
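
The difference between the two approaches can be illustrated with a small stand-in, shown here only as a sketch. None of the classes below are UIMA or Ruta classes: WordListEngine, RebuildingAnnotator, and ReusingAnnotator are hypothetical names, and the load counter merely simulates reading a word list. The real Ruta engine setup is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive engine; NOT a UIMA or Ruta class. */
final class WordListEngine {
    // Counts how often a word list is "read" (one read per construction).
    static final AtomicInteger LOADS = new AtomicInteger();
    private final List<String> words;

    WordListEngine(List<String> source) {
        LOADS.incrementAndGet();
        this.words = new ArrayList<>(source);
    }

    boolean matches(String token) {
        return words.contains(token);
    }
}

/** Builds a fresh engine on every process() call, like a one-shot helper. */
final class RebuildingAnnotator {
    private final List<String> source;

    RebuildingAnnotator(List<String> source) {
        this.source = source;
    }

    boolean process(String token) {
        return new WordListEngine(source).matches(token);
    }
}

/** Builds the engine once in initialize() and keeps it as a field. */
final class ReusingAnnotator {
    private WordListEngine engine;

    void initialize(List<String> source) {
        engine = new WordListEngine(source);
    }

    boolean process(String token) {
        return engine.matches(token);
    }
}
```

In this stand-in, RebuildingAnnotator increments the load counter once per process() call, while ReusingAnnotator increments it only once, in initialize(), no matter how many inputs are processed.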

As for SharedResourceObject: This should be done, but it was never at
the top of my todo list. I hope I will find the time sometime.

You may want to take a look at UIMA-4062 and UIMA-4074, especially
Silvestre's comment on UIMA-4062 (29/Oct/14 19:12), where he loads a
table using external resources. That could work for you as well. Maybe
Silvestre can share his experiences?

Best,

Peter

>
> Regards,
> Armin