Posted to user@uima.apache.org by Ar...@bka.bund.de on 2013/05/21 15:49:38 UTC

Ruta - MARKFAST

Hello!

Is there any way to match strings like

nC.
v. Chr.

with MARKFAST?

Cheers,
Armin


Re: AW: Ruta - MARKFAST

Posted by Marshall Schor <ms...@schor.com>.
On 5/23/2013 9:03 AM, Armin.Wegner@bka.bund.de wrote:
> Hello Jörn,
>
> absolutely right. But for now I'm still a nooby. That's why I'm asking so much.

Sometimes, noobies make better contributions, because they write for other
noobies :-).  I would encourage you to contribute anyway.  You can mark up
your contribution with little tags like <?> to indicate where you're not sure,
so whoever integrates your patch knows to pay more attention there.

-Marshall



AW: Ruta - MARKFAST

Posted by Ar...@bka.bund.de.
Hello Jörn,

absolutely right. But for now I'm still a nooby. That's why I'm asking so much.

Cheers,
Armin





Re: Ruta - MARKFAST

Posted by Jörn Kottmann <ko...@gmail.com>.
On 05/23/2013 01:19 PM, Peter Klügl wrote:
> That is the official documentation. An up-to-date version that describes
> the new features since 2.0.0 can be found in the trunk.
>
> I know that there are many passages and sections that need to be added or
> improved, but it is hard to find enough time for it.

Another way to improve the documentation is to contribute patches for it.
If you use a specific feature of Ruta and know it well enough, just take
10 minutes, write some documentation, open a Jira issue, and attach the
patch to it.

Jörn

Re: AW: AW: Ruta - MARKFAST

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
Hi,

On 23.05.2013 13:06, Armin.Wegner@bka.bund.de wrote:
> Hello Peter,
>
> Now that I understand it, it's a nice feature.
>
> By the way, where can I find good documentation for Ruta? I only know of http://people.apache.org/~pkluegl/site/textmarker-current/tools.textmarker.book.html

That is the official documentation. An up-to-date version that describes
the new features since 2.0.0 can be found in the trunk.

I know that there are many passages and sections that need to be added or
improved, but it is hard to find enough time for it.

There is ongoing work by others to improve the description of the Java
integration for use cases in part-of-speech tagging, and we are
planning to provide screencasts for the Ruta Workbench.

Are there any specific passages that should be improved or added? I also
easily forget to add important information, since I implemented it myself.

> and http://tmwiki.informatik.uni-wuerzburg.de/. A more detailed description would be appreciated.

This wiki describes the old version hosted at SourceForge and should no
longer be used as a reference.

Best,

Peter



AW: AW: Ruta - MARKFAST

Posted by Ar...@bka.bund.de.
Hello Peter,

Now that I understand it, it's a nice feature.

By the way, where can I find good documentation for Ruta? I only know of http://people.apache.org/~pkluegl/site/textmarker-current/tools.textmarker.book.html and http://tmwiki.informatik.uni-wuerzburg.de/. A more detailed description would be appreciated.

Thanks,
Armin




Re: AW: Ruta - MARKFAST

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
Hi,

Yes, this example won't work without changes, because the word list is
sensitive to whitespace: it distinguishes between "n.C." and "n. C.".
I know this sounds like a bug, but it is actually a feature.

To solve your problem, you could (1) remove all spaces in your word
list, (2) add "n.Chr." and "v.Chr." (without space) to your word list,
or (3) retain the spaces before calling MARKFAST
(Document{-> RETAINTYPE(SPACE)};).

The short explanation is that, with the default filtering settings, the
action and the word list don't see any spaces, so they check a candidate
like "n.Chr". However, in the trie, the path without a space before the
"C" (the one built from entries like "n.C.") contains no "h" after the "C".
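The behavior described above can be reproduced with a toy Java model. It is a hypothetical sketch, not Ruta's actual word-list code, and it rests on three assumptions: entries are stored in the trie exactly as written (spaces included), the candidate arrives with its spaces already filtered out, and the lookup is greedy, preferring a child that equals the next character and stepping over a stored space only when no such child exists, with no backtracking. The class and method names (GreedyTrieDemo, of, matches, misses, normalize) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical; NOT Ruta's real word-list code). Entries are stored in a
// character trie exactly as written, spaces included, but lookups use a candidate
// whose spaces were already filtered out.
class GreedyTrieDemo {

    static final List<String> PETER_LIST = List.of("nC.", "v. Chr.");
    static final List<String> PETER_DOC = List.of("nC.", "v. Chr.", "n C .", "v . Chr.");
    static final List<String> ARMIN = List.of(
        "nach Christus", "nach der Zeitenwende", "n. C.", "n.C.", "nC.", "n. Chr.",
        "n. d. Z.", "n.d.Z.", "unserer Zeit", "unserer Zeitrechnung", "u. Z.", "u.Z.",
        "v. C.", "v.C.", "vC.", "v. Chr.", "v. d. Z.", "v.d.Z.", "vor Christus",
        "vor der Zeitenwende", "vor unserer Zeitrechnung", "v. u. Z.", "v.u.Z.");

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEntry;
    }

    private final Node root = new Node();

    static GreedyTrieDemo of(List<String> entries) {
        GreedyTrieDemo trie = new GreedyTrieDemo();
        for (String entry : entries) {
            Node node = trie.root;
            for (char c : entry.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.isEntry = true;
        }
        return trie;
    }

    // Greedy lookup without backtracking: follow the child equal to the next candidate
    // character; only if there is none, step over a space stored in the trie.
    boolean matches(String candidate) {
        Node node = root;
        int i = 0;
        while (i < candidate.length()) {
            Node next = node.children.get(candidate.charAt(i));
            if (next != null) {
                node = next;
                i++;
            } else if (node.children.containsKey(' ')) {
                node = node.children.get(' ');
            } else {
                return false; // dead end; no backtracking to an earlier space branch
            }
        }
        return node.isEntry;
    }

    // The candidate as seen with whitespace filtered (assumption: all spaces dropped).
    static String stripSpaces(String s) {
        return s.replace(" ", "");
    }

    // Document lines whose filtered form is not found in the trie.
    static List<String> misses(GreedyTrieDemo trie, List<String> document) {
        List<String> missed = new ArrayList<>();
        for (String line : document) {
            if (!trie.matches(stripSpaces(line))) {
                missed.add(line);
            }
        }
        return missed;
    }

    // Workaround (1) from the mail: remove the spaces from the word list, too.
    static List<String> normalize(List<String> entries) {
        List<String> result = new ArrayList<>();
        for (String entry : entries) {
            result.add(stripSpaces(entry));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(misses(of(PETER_LIST), PETER_DOC));   // Peter's test
        System.out.println(misses(of(ARMIN), ARMIN));            // Armin's list
        System.out.println(misses(of(normalize(ARMIN)), ARMIN)); // normalized list
    }
}
```

Run as is, the model prints [], then [n. Chr., v. Chr.], then []: all four of Peter's lines match, exactly the two "Chr." entries from Armin's list are missed (after "n." and "v." the greedy lookup takes the "C" branch coming from "n.C." and "v.C." and then finds no "h"), and stripping the spaces from the word list removes both misses. Adding "n.Chr." and "v.Chr." as extra entries also fixes them in this model, because it creates the missing "h" branch.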

Best,

Peter



AW: Ruta - MARKFAST

Posted by Ar...@bka.bund.de.
Hi Peter,

your example works perfectly fine. But try this as both the word list and the input document:

nach Christus
nach der Zeitenwende
n. C.
n.C.
nC.
n. Chr.
n. d. Z.
n.d.Z.
unserer Zeit
unserer Zeitrechnung
u. Z.
u.Z.
v. C.
v.C.
vC.
v. Chr.
v. d. Z.
v.d.Z.
vor Christus
vor der Zeitenwende
vor unserer Zeitrechnung
v. u. Z.
v.u.Z.

"n. Chr." and "v. Chr." are not recognized. Do you have the same result?

Cheers,
Armin





Re: Ruta - MARKFAST

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
Hi,

On 21.05.2013 15:49, Armin.Wegner@bka.bund.de wrote:
> Hello!
>
> Is there any possibility to match strings like
>
> nC.
> v. Chr.
>
> with MARKFAST?

Yes. Did you observe any problems? I just tested it with:

Wordlist:
nC.
v. Chr.

Input document:
nC.
v. Chr.
n C .
v . Chr.

Script:
PACKAGE uima.ruta.tests;
WORDLIST testList = 'test.txt';
DECLARE Test;
Document{->MARKFAST(Test, testList)};

... creates four annotations of type Test.
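To make the filtering step concrete, here is a small Java sketch. The token kinds, the tokenizer, and the names FilteredView, tokenize, and visibleText are hypothetical simplifications, not Ruta's actual seeding or filter implementation; the only assumption taken from this thread is that whitespace tokens are invisible to the rule by default. Under that assumption, the four document lines collapse to just two candidates, "nC." and "v.Chr.", which is consistent with all four being annotated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified sketch with hypothetical token kinds (not Ruta's real seeding or
// filtering code): split a line into tokens, drop the filtered kinds, join the rest.
class FilteredView {

    enum Kind { WORD, PUNCT, SPACE }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    static List<Token> tokenize(String line) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                // a run of letters/digits becomes one WORD token
                while (i < line.length() && Character.isLetterOrDigit(line.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.WORD, line.substring(start, i)));
            } else if (Character.isWhitespace(c)) {
                // a run of whitespace becomes one SPACE token
                while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.SPACE, line.substring(start, i)));
            } else {
                // every other character (e.g. '.') is its own PUNCT token
                tokens.add(new Token(Kind.PUNCT, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    // The text a rule effectively sees when the given token kinds are filtered.
    static String visibleText(String line, Set<Kind> filtered) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokenize(line)) {
            if (!filtered.contains(t.kind)) {
                sb.append(t.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Set<Kind> defaults = Set.of(Kind.SPACE); // assumption: whitespace filtered by default
        for (String line : List.of("nC.", "v. Chr.", "n C .", "v . Chr.")) {
            System.out.println("\"" + line + "\" -> \"" + visibleText(line, defaults) + "\"");
        }
    }
}
```

Calling visibleText with an empty filter set keeps the space, which is roughly what retaining SPACE with RETAINTYPE before MARKFAST does; the candidate "v. Chr." then equals the word-list entry character for character.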

Best,

Peter



> Cheers,
> Armin