Posted to user@uima.apache.org by Nikolai Krot <ta...@gmail.com> on 2019/06/04 09:29:45 UTC

Text traversal order

Hi all,

I have an example of rules that don't quite work, which made me realize
that I don't understand how text is traversed in Ruta and how rules are
applied.

Below is a simplified example of what I'm doing.

Say I have a text that has "words" like this:

1 aa+bb
2 aa / aa+bb
3 aa /aa /aa+bb

I want to annotate the tokens as follows

1 FOUND
2 FOUND / FOUND
3 FOUND / FOUND / FOUND

and there can be longer sequences separated by a slash.

These are my rules:

"aa" "+" "bb"  {->MARK(FOUND,1,3)};
"aa" "/" FOUND {->MARK(FOUND, 1)};

In other words: the rightmost token of the sequence is annotated as FOUND
first, and this becomes evidence for annotating the preceding tokens as
FOUND as well.

The thing is that only cases 1 and 2 are fully annotated. Case 3 is
annotated only partially:

1 FOUND
2 FOUND / FOUND
3 aa / FOUND / FOUND

It seems that the second rule is applied only once, though I expected it
to be applied repeatedly, in a loop, as long as there is a match. Case 3
should work as soon as case 2 has been annotated, because case 3 is an
extension of case 2.

Case 3 starts to work when the second rule is duplicated, which is not a
good solution in my opinion. My question is: is the above by design (rule
matching does not restart after a match), or is it a bug in Ruta? Or is
there maybe a configuration option to choose the behaviour?

Thank you in advance and best regards,
Nikolai

Re: Text traversal order

Posted by Peter Klügl <pe...@averbis.com>.
Hi,

yes, it is intentional: for several reasons, the action of a rule does
not automatically restart the matching process.


The rule matching is sensitive to the consequences of a rule during the
matching process (also across rule matches within one rule application),
but not with respect to the anchors of the rule matching. Slightly
simplified, the rule matching behaves like an iterator over each
anchoring matching condition (the first rule element in most cases),
followed by further iterators for the sequential patterns, which create
new rule matches (matching alternatives). This means that a failed match
on the first "aa" consumes that position, and it is not investigated
again, even if it could succeed with new facts/annotations created by
later rule matches.
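
To make the iterator picture concrete, here is a rough toy model in
Python. It is only a sketch of the behaviour described above, not Ruta's
actual implementation: the function names (rule_plus, rule_slash, render)
and the dict that maps a start token to an annotation length are
assumptions made for this illustration.

```python
def rule_plus(tokens, found):
    """Toy version of: "aa" "+" "bb" {-> MARK(FOUND, 1, 3)};"""
    for i in range(len(tokens) - 2):
        if tokens[i:i + 3] == ["aa", "+", "bb"]:
            found[i] = 3  # FOUND starts at token i and spans three tokens


def rule_slash(tokens, found):
    """Toy version of: "aa" "/" FOUND {-> MARK(FOUND, 1)};"""
    # Each anchor position is visited exactly once, left to right.
    for i in range(len(tokens) - 2):
        if tokens[i] == "aa" and tokens[i + 1] == "/" and (i + 2) in found:
            found[i] = 1  # visible to later anchors, never to earlier ones


def render(tokens, found):
    """Print tokens, replacing every annotated span with FOUND."""
    out, i = [], 0
    while i < len(tokens):
        if i in found:
            out.append("FOUND")
            i += found[i]
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


tokens = "aa / aa / aa + bb".split()  # case 3 from the question
found = {}
rule_plus(tokens, found)
rule_slash(tokens, found)
single_pass = render(tokens, found)
print(single_pass)  # aa / FOUND / FOUND

rule_slash(tokens, found)  # corresponds to duplicating the second rule
second_pass = render(tokens, found)
print(second_pass)  # FOUND / FOUND / FOUND
```

In this model the leftmost "aa" is tried first, fails because no FOUND
follows it yet, and is not revisited within the same pass. Only a second
application of the rule picks it up, which matches the workaround of
duplicating the rule.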


This does not mean that you cannot specify such patterns. You could use
different rules to achieve the desired result. My first guess would be
something like:


("aa" "+" "bb"){-> FOUND};
("aa"{-> FOUND} "/")[1,10] @FOUND;
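
The idea behind the second rule is that matching starts at the FOUND
annotation (marked with @) and then extends to the left over up to ten
"aa" "/" pairs, so no anchor position is lost. A rough Python toy model
of that idea, not of Ruta's internals, could look as follows; the
function names and the max_repeats parameter are assumptions for this
sketch only.

```python
def mark_plus(tokens, found):
    """Toy version of: ("aa" "+" "bb"){-> FOUND};"""
    for i in range(len(tokens) - 2):
        if tokens[i:i + 3] == ["aa", "+", "bb"]:
            found[i] = 3  # FOUND starts at token i and spans three tokens


def extend_left(tokens, found, max_repeats=10):
    """Toy version of: ("aa"{-> FOUND} "/")[1,10] @FOUND;"""
    # Anchor on every FOUND that exists before this rule runs.
    for start in sorted(found):
        i, repeats = start, 0
        # Walk leftwards over "aa" "/" pairs, marking each "aa".
        while (repeats < max_repeats and i >= 2
               and tokens[i - 2] == "aa" and tokens[i - 1] == "/"):
            i -= 2
            found[i] = 1
            repeats += 1


def render(tokens, found):
    """Print tokens, replacing every annotated span with FOUND."""
    out, i = [], 0
    while i < len(tokens):
        if i in found:
            out.append("FOUND")
            i += found[i]
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


tokens = "aa / aa / aa / aa + bb".split()  # one step longer than case 3
found = {}
mark_plus(tokens, found)
extend_left(tokens, found)
result = render(tokens, found)
print(result)  # FOUND / FOUND / FOUND / FOUND
```

Because the walk starts from the annotation that already exists, the
length of the chain does not matter (up to the quantifier limit).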

I think there are also other ways to solve it.

Best,

Peter

