Posted to commits@uima.apache.org by pk...@apache.org on 2017/07/10 14:25:14 UTC

svn commit: r1801474 - /uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml

Author: pkluegl
Date: Mon Jul 10 14:25:14 2017
New Revision: 1801474

URL: http://svn.apache.org/viewvc?rev=1801474&view=rev
Log:
UIMA-5458 - added doc

Modified:
    uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml

Modified: uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml
URL: http://svn.apache.org/viewvc/uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml?rev=1801474&r1=1801473&r2=1801474&view=diff
==============================================================================
--- uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml (original)
+++ uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml Mon Jul 10 14:25:14 2017
@@ -135,8 +135,64 @@ Dr.JoachimBaumeister
     </para>
 
   </section>
+
+  <section id="ugr.tools.ruta.language.wildcard">
+    <title>Wildcard #</title>
+    <para>
+      The wildcard <code>#</code> is a special matching condition of a rule element, 
+      which does not match itself but uses the next rule element to determine its match.
+      Its behavior is similar to a generic rule element with a reluctant, unrestricted quantifier like
+      <code>ANY+?</code>, but it is much more efficient since no additional annotations have to be matched.
+      The functionality of the wildcard is illustrated with the following examples:
+      
+      <programlisting><![CDATA[PERIOD #{-> Sentence} PERIOD;]]></programlisting>
+      
+      In this example, everything in between two periods is annotated with an annotation of the type
+      <code>Sentence</code>. This rule is much more efficient than a rule like
+      <code>PERIOD ANY+{-PARTOF(PERIOD)} PERIOD;</code> since it only navigates the index of PERIOD annotations
+      and does not match on all tokens.
+      
+      The wildcard is a normal matching condition and can be used like any other matching condition. If the sentence
+      should include the period, the rule would look like:
+      
+      <programlisting><![CDATA[PERIOD (# PERIOD){-> Sentence};]]></programlisting>
+      
+      This rule only creates annotations after a period. If the wildcard is used as the anchor of the rule,
+      e.g., if it is the first rule element and no manual anchor is specified, then it starts to match at the beginning
+      of the document or current window.
+      
+      <programlisting><![CDATA[(# PERIOD){-> Sentence};]]></programlisting>
+      
+      This rule creates a Sentence annotation starting at the beginning of the document and ending with the first period.
+      If the rule elements are switched, the result is quite different because of the starting anchor of the rule:
+      
+      <programlisting><![CDATA[(PERIOD #){-> Sentence};]]></programlisting>
+      
+      Here, one annotation of the type Sentence is created for each PERIOD annotation, starting with the period and
+      ending at the end of the document.
+      
+      Currently, rule elements marked as optional are not treated as optional when they follow a wildcard.
+    </para>
+  </section>
   
-  
+  <section id="ugr.tools.ruta.language.labels">
+    <title>Label expressions</title>
+    <para>
+      Rule elements can be extended with labels, which introduce a new local variable storing one or 
+      multiple annotations - the annotations matched by the matching condition of the rule element. 
+      The name of the variable is the short identifier before the colon in front of the matching condition, e.g., 
+      in <code>sw:SW</code>, <code>SW</code> is the matching condition and <code>sw</code> is the name of the local variable.
+      The variable is assigned when the rule element tries to match (even if the match fails after all)
+      and can be utilized in all other language elements afterwards.
+      The functionality of label expressions is illustrated with the following example:
+      
+      <programlisting><![CDATA[sw1:SW sw2:SW{sw1.end=sw2.begin};]]></programlisting>
+      
+      This rule matches on two consecutive small-written (lowercase) words, but only if there is no space in between them.
+      
+      Label expressions can also be used across <xref linkend='ugr.tools.ruta.language.inlined' />.
+    </para>
+  </section>
   
   <section id="ugr.tools.ruta.language.blocks">
     <title>Blocks</title>