Posted to user@uima.apache.org by Daya Chinthana Wimalasuriya <ch...@gmail.com> on 2016/01/07 22:21:01 UTC

Problem with optional rule elements

This looks like a bug to me.

I can get the optional rule elements to work when followed by COMMA, as follows.

DECLARE Feedback;
(W{REGEXP("change|modify|alter")} W? W? W? COMMA) {-> MARK(Feedback)};

This matches the following:
change the current rubric,
change the rubric,
change rubric,

But if the element following the optional elements is a W, or a W with a
REGEXP condition, it doesn't work.

DECLARE Feedback;
(W{REGEXP("change|modify|alter")} 
W? W? W? W{REGEXP("rubric")}) {-> MARK(Feedback)};

It doesn't match the following (it should):
change the current rubric
change the rubric

But it does match "change the the the rubric"
(i.e., when a word is present for every optional element).

I've seen that there was a similar issue some time ago:
https://issues.apache.org/jira/browse/UIMA-3338
I'm not sure whether this issue is related to that bug.

I'm using UIMA RUTA workbench 2.3.1 on Eclipse Luna (4.4.2)


Re: Problem with optional rule elements

Posted by Daya Chinthana Wimalasuriya <ch...@gmail.com>.
Hi Peter,

W[0,3]? solves it. Thanks a lot! 

Regards,
Daya


Re: Problem with optional rule elements

Posted by Peter Klügl <pe...@averbis.com>.
Hi,

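this is not really a bug. A single question mark in Ruta is greedy, so the
optional rule element consumes any annotation it can match without looking
at the next rule element. In "change the rubric", the optional W elements
consume "the" and "rubric", which leaves nothing for W{REGEXP("rubric")}.
With COMMA this does not happen, because the optional W elements cannot
match the comma. The reluctant version of an optional rule element can be
specified with two question marks (an additional question mark always
denotes the reluctant version of a quantifier in Ruta). This would look
like: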

(W{REGEXP("change|modify|alter")}
    W?? W?? W?? W{REGEXP("rubric")}) {-> MARK(Feedback)};

A reluctant rule element first evaluates the next rule element and matches
only if the next one is not able to. However, there is another problem:
stacked reluctant quantifiers are currently not supported in Ruta (you
could call it a known bug).
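To make the greedy behaviour concrete, here is a toy Python model. It is an illustrative assumption, not Ruta's actual matching engine: the text is a list of (type, text) tokens, a rule is a list of (predicate, optional) pairs, and the hypothetical helpers `tokenize`, `w` and `match_greedy` stand in for W, W{REGEXP(...)}, COMMA and the greedy W? element. An optional element takes a token whenever it can and never gives it back, which reproduces the results reported above.

```python
import re

# Toy model (assumption, NOT Ruta's implementation): tokens are (type, text)
# pairs; a rule is a list of (predicate, optional) pairs, matched from token 0.

def tokenize(text):
    """Split text into W (word) and COMMA tokens; other characters are ignored."""
    return [("COMMA", t) if t == "," else ("W", t) for t in re.findall(r"\w+|,", text)]

def w(pattern=None):
    """Predicate for W, or for W{REGEXP(pattern)} when a pattern is given."""
    return lambda tok: tok[0] == "W" and (
        pattern is None or re.fullmatch(pattern, tok[1]) is not None)

def comma(tok):
    return tok[0] == "COMMA"

def match_greedy(rule, tokens):
    """Greedy optional elements: consume a token whenever possible, without
    looking at the next element, and never backtrack."""
    i = 0
    for pred, optional in rule:
        if i < len(tokens) and pred(tokens[i]):
            i += 1            # consume the token
        elif not optional:
            return False      # required element failed, no backtracking
    return True

CHANGE = (w("change|modify|alter"), False)
OPT_W = (w(), True)  # models W?
RULE_COMMA = [CHANGE, OPT_W, OPT_W, OPT_W, (comma, False)]
RULE_RUBRIC = [CHANGE, OPT_W, OPT_W, OPT_W, (w("rubric"), False)]

print(match_greedy(RULE_COMMA, tokenize("change the rubric,")))          # True
print(match_greedy(RULE_RUBRIC, tokenize("change the rubric")))          # False
print(match_greedy(RULE_RUBRIC, tokenize("change the the the rubric")))  # True
```

In "change the rubric" the second W? swallows "rubric", so the final element has nothing left; with COMMA as the last element the optional W elements simply skip the comma.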

Is there a particular reason why you want to use three separate optional
rule elements? You could use the reluctant min/max quantifier instead:

(W{REGEXP("change|modify|alter")}
   W[0,3]? W{REGEXP("rubric")}) {-> MARK(Feedback)};
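A similar toy Python model, again an assumption rather than Ruta's real engine, sketches how a reluctant min/max quantifier like W[0,3]? behaves: before consuming another W, it first checks whether the following rule element matches and stops as soon as it does. The helpers `tokenize`, `w` and `match_reluctant` are hypothetical names used only for illustration.

```python
import re

# Toy model (assumption, NOT Ruta's implementation) of
#   first filler[lo,hi]? last
# where the filler is reluctant: the next element is tried before each filler.

def tokenize(text):
    """Split text into W (word) and COMMA tokens; other characters are ignored."""
    return [("COMMA", t) if t == "," else ("W", t) for t in re.findall(r"\w+|,", text)]

def w(pattern=None):
    """Predicate for W, or for W{REGEXP(pattern)} when a pattern is given."""
    return lambda tok: tok[0] == "W" and (
        pattern is None or re.fullmatch(pattern, tok[1]) is not None)

def match_reluctant(first, filler, lo, hi, last, tokens):
    """Match first filler[lo,hi]? last, starting at tokens[0]."""
    if not tokens or not first(tokens[0]):
        return False
    i, count = 1, 0
    while True:
        # Reluctant: give the following element a chance first.
        if count >= lo and i < len(tokens) and last(tokens[i]):
            return True
        # Otherwise take one more filler token, if allowed.
        if count < hi and i < len(tokens) and filler(tokens[i]):
            i += 1
            count += 1
        else:
            return False

CHANGE, RUBRIC = w("change|modify|alter"), w("rubric")

print(match_reluctant(CHANGE, w(), 0, 3, RUBRIC, tokenize("change the rubric")))      # True
print(match_reluctant(CHANGE, w(), 0, 3, RUBRIC, tokenize("change a b c d rubric")))  # False
```

Because the last element is checked before each extra W, "rubric" is never swallowed by the filler, while the upper bound of 3 still rejects spans with four or more words in between.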

If you do not want to use reluctant quantifiers, you can always duplicate
the condition of the following element, negated, on each optional element,
but that gets really ugly for more than one rule element:

(W{REGEXP("change|modify|alter")}
    W?{-REGEXP("rubric")} W?{-REGEXP("rubric")} W?{-REGEXP("rubric")}
    W{REGEXP("rubric")}) {-> MARK(Feedback)};

You can also use the wildcard with additional conditions:

(W{REGEXP("change|modify|alter")}
    #{CONTAINS(W, 0, 3), CONTAINS(W, 100, 100, true)}
    W{REGEXP("rubric")}) {-> MARK(Feedback)};

Wildcards are normally faster, but with additional conditions they can
be slower.

Best,

Peter
