Posted to commits@uima.apache.org by pk...@apache.org on 2017/07/10 14:25:14 UTC
svn commit: r1801474 -
/uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml
Author: pkluegl
Date: Mon Jul 10 14:25:14 2017
New Revision: 1801474
URL: http://svn.apache.org/viewvc?rev=1801474&view=rev
Log:
UIMA-5458 - added doc
Modified:
uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml
Modified: uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml
URL: http://svn.apache.org/viewvc/uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml?rev=1801474&r1=1801473&r2=1801474&view=diff
==============================================================================
--- uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml (original)
+++ uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml Mon Jul 10 14:25:14 2017
@@ -135,8 +135,64 @@ Dr.JoachimBaumeister
</para>
</section>
+
+ <section id="ugr.tools.ruta.language.wildcard">
+ <title>Wildcard #</title>
+ <para>
+ The wildcard <code>#</code> is a special matching condition of a rule element,
+ which does not match itself but uses the next rule element to determine its match.
+ Its behavior is similar to a generic rule element with a reluctant, unrestricted quantifier like
+ <code>ANY+?</code>, but it is much more efficient since no additional annotations have to be matched.
+ The functionality of the wildcard is illustrated with the following examples:
+
+ <programlisting><![CDATA[PERIOD #{-> Sentence} PERIOD;]]></programlisting>
+
+ In this example, everything in between two periods is annotated with an annotation of the type
+ <code>Sentence</code>. This rule is much more efficient than a rule like
+ <code>PERIOD ANY+{-PARTOF(PERIOD)} PERIOD;</code> since it only navigates in the index of PERIOD annotations
+ and does not match on all tokens.
+
+ The wildcard is a normal matching condition and can be used like any other matching condition. If the sentence
+ should include the period, the rule would look like:
+
+ <programlisting><![CDATA[PERIOD (# PERIOD){-> Sentence};]]></programlisting>
+
+ This rule creates annotations only after a period. If the wildcard is used as an anchor of the rule,
+ e.g., if it is the first rule element and no manual anchor is specified, then it starts to match at the beginning
+ of the document or current window.
+
+ <programlisting><![CDATA[(# PERIOD){-> Sentence};]]></programlisting>
+
+ This rule creates a Sentence annotation starting at the beginning of the document and ending with the first period.
+ If the rule elements are switched, the result is quite different because of the starting anchor of the rule:
+
+ <programlisting><![CDATA[(PERIOD #){-> Sentence};]]></programlisting>
+
+ Here, one annotation of the type Sentence is created for each PERIOD annotation, starting with the period and
+ ending at the end of the document.
+
+ Currently, rule elements declared optional after a wildcard are not treated as optional.
+ </para>
+ </section>
-
+ <section id="ugr.tools.ruta.language.labels">
+ <title>Label expressions</title>
+ <para>
+ Rule elements can be extended with labels, which introduce a new local variable storing one or
+ multiple annotations - the annotations matched by the matching condition of the rule element.
+ The name of the variable is the short identifier before the colon in front of the matching condition, e.g.,
+ in <code>sw:SW</code>, <code>SW</code> is the matching condition and <code>sw</code> is the name of the local variable.
+ The variable will be assigned when the rule element tries to match (even if the match fails after all)
+ and can be utilized in all other language elements afterwards.
+ The functionality of the label expressions is illustrated with the following example:
+
+ <programlisting><![CDATA[sw1:SW sw2:SW{sw1.end == sw2.begin};]]></programlisting>
+
+ This rule matches two consecutive small-written words (<code>SW</code>), but only if there is no space between them.
+
+ Label expressions can also be used across <xref linkend='ugr.tools.ruta.language.inlined' />.
+ </para>
+ </section>
<section id="ugr.tools.ruta.language.blocks">
<title>Blocks</title>