Posted to dev@uima.apache.org by "Peter Klügl (JIRA)" <de...@uima.apache.org> on 2013/07/11 16:45:48 UTC

[jira] [Created] (UIMA-3071) Break up sequential matching in Ruta rules

Peter Klügl created UIMA-3071:
---------------------------------

             Summary: Break up sequential matching in Ruta rules
                 Key: UIMA-3071
                 URL: https://issues.apache.org/jira/browse/UIMA-3071
             Project: UIMA
          Issue Type: New Feature
          Components: ruta
    Affects Versions: 2.0.2ruta
            Reporter: Peter Klügl
            Assignee: Peter Klügl
             Fix For: 2.0.2ruta


Break up sequential matching in Ruta rules: Right now, a list of rule elements specifies a sequential pattern. In some use cases, however, the sequence of annotations is not as important as their existence. An example: a rule should fire if some complex annotation patterns occur within a sentence, whereas their location or order is not important. Two use cases can be distinguished: "and" and "or". A disjunctive matcher is already implemented, but it only supports simple matching conditions, not complex patterns (e.g., additional conditions) for the alternatives. I am still thinking about the best syntax for this. Right now, my favorite is a special character that separates the rule elements. An example:

{noformat}
BLOCK(b) Sentence{}{
  CW PERIOD & SW COLON;
}
{noformat}
... a sentence that contains a capitalized word followed by a period AND a small written word followed by a colon, regardless of where they occur in the sentence.
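
To make the intended "and" semantics concrete, here is a minimal Python sketch. It is not Ruta code and not the planned implementation. It assumes a sentence is reduced to a flat list of annotation type names, and the helper names find_sequence and matches_all are made up for illustration. Whether the two sub-patterns may overlap is left open in this issue; the sketch allows it.

```python
from typing import List, Sequence


def find_sequence(tokens: Sequence[str], pattern: Sequence[str]) -> List[int]:
    """Return every start index at which `pattern` occurs contiguously in `tokens`."""
    n = len(pattern)
    return [
        i
        for i in range(len(tokens) - n + 1)
        if list(tokens[i:i + n]) == list(pattern)
    ]


def matches_all(tokens: Sequence[str], patterns: Sequence[Sequence[str]]) -> bool:
    """Conjunctive reading: every sub-pattern occurs somewhere, in any order."""
    return all(find_sequence(tokens, p) for p in patterns)


# Hypothetical sentence: the "SW COLON" part comes before the "CW PERIOD" part.
sentence = ["SW", "COLON", "SW", "CW", "PERIOD"]

# Order-independent matching: both sub-patterns are present.
print(matches_all(sentence, [["CW", "PERIOD"], ["SW", "COLON"]]))  # True

# Purely sequential matching of the four elements finds nothing here.
print(find_sequence(sentence, ["CW", "PERIOD", "SW", "COLON"]))  # []
```

The sequential reading fails on this sentence because the colon part precedes the period part, which is exactly the case the separator is meant to cover.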

Maybe something like the following is then also possible:

{noformat}
NUM (CW{REGEXP("A") -> MARK(NUMA,1,2)} | CW{REGEXP("B") -> MARK(NUMB,1,2)});
{noformat}
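
The intended effect of such a rule can be sketched in Python as follows. This is only an illustration under assumptions, not Ruta code: tokens are modeled as (type, text) pairs, the function name mark_num_alternatives is hypothetical, the regular expression is assumed to have to match the whole covered text, and the first alternative that matches is assumed to win. Each result spans both rule elements, mirroring the indexes 1 and 2 in the MARK actions.

```python
import re
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (annotation type, covered text)


def mark_num_alternatives(
    tokens: Sequence[Token],
    alternatives: Sequence[Tuple[str, str]],
) -> List[Tuple[str, int, int]]:
    """For each NUM directly followed by a CW, try the (regex, label) alternatives.

    Returns (label, start_index, end_index) triples covering both tokens.
    """
    marks = []
    for i in range(len(tokens) - 1):
        (first_type, _), (second_type, second_text) = tokens[i], tokens[i + 1]
        if first_type != "NUM" or second_type != "CW":
            continue
        for pattern, label in alternatives:
            # Assumption: the condition must match the whole covered text.
            if re.fullmatch(pattern, second_text):
                marks.append((label, i, i + 1))
                break  # assumption: the first matching alternative wins
    return marks


tokens = [
    ("NUM", "1"), ("CW", "A"), ("SW", "and"),
    ("NUM", "2"), ("CW", "B"),
    ("NUM", "3"), ("CW", "C"),
]
print(mark_num_alternatives(tokens, [("A", "NUMA"), ("B", "NUMB")]))
# [('NUMA', 0, 1), ('NUMB', 3, 4)]
```

The pair "3 C" produces no mark because neither alternative's condition holds, which is the behavior one would expect from a disjunction whose branches carry their own conditions and actions.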


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira