Posted to commits@uima.apache.org by pk...@apache.org on 2013/08/19 15:28:13 UTC

svn commit: r1515405 - in /uima/sandbox/ruta/trunk/ruta-docbook/src/docbook: tools.ruta.language.syntax.xml tools.ruta.language.xml tools.ruta.overview.xml

Author: pkluegl
Date: Mon Aug 19 13:28:13 2013
New Revision: 1515405

URL: http://svn.apache.org/r1515405
Log:
UIMA-3115
- added some documentation about inlined rules

Modified:
    uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml
    uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml
    uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml

Modified: uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml?rev=1515405&r1=1515404&r2=1515405&view=diff
==============================================================================
--- uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml (original)
+++ uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml Mon Aug 19 13:28:13 2013
@@ -93,20 +93,17 @@ GroupAssignment        -> TypeExpression
 RuleElements           -> RuleElement+
 RuleElement            -> RuleElementType | RuleElementLiteral
                         | RuleElementComposed | RuleElementWildCard
-RuleElementType        ->  TypeExpression QuantifierPart?
-                                         ("{" Conditions?  Actions? "}")?
-RuleElementWithCA      ->  TypeExpression QuantifierPart?
-                                            "{" Conditions?  Actions? "}"
-RuleElementLiteral     ->  SimpleStringExpression QuantifierPart?
-                                          ("{" Conditions?  Actions? "}")?
-RuleElementComposed    -> ( RuleElement ("&" RuleElement)+
-                          | RuleElement ("|" RuleElement)+
-                          | "(" RuleElements ")") 
-                          QuantifierPart? ("{" Conditions?  Actions? "}")?
-RuleElementDisjunctive -> "(" (TypeExpression | SimpleStringExpression)
-                        ("|" (TypeExpression | SimpleStringExpression) )+
-                        (")" QuantifierPart? "{" Conditions?  Actions? }")?
-RuleElementWildCard    -> "#"("{" Conditions?  Actions? }")?
+RuleElementType        ->  TypeExpression OptionalRuleElementPart
+RuleElementWithCA      ->  TypeExpression OptionalRuleElementPart
+RuleElementLiteral     ->  SimpleStringExpression OptionalRuleElementPart
+RuleElementComposed    -> ( "(" RuleElement ("&" RuleElement)+ ")"
+                          | "(" RuleElement ("|" RuleElement)+ ")"
+                          | "(" RuleElements ")" )
+                          OptionalRuleElementPart
+OptionalRuleElementPart-> QuantifierPart? ("{" Conditions?  Actions? "}")?
+                          InlinedRules?
+InlinedRules           -> ( "<-" | "->" ) "{" SimpleStatement+ "}"
+RuleElementWildCard    -> "#" ("{" Conditions?  Actions? "}")? InlinedRules?
 QuantifierPart         -> "*" | "*?" | "+" | "+?" | "?" | "??"
                         | "[" NumberExpression "," NumberExpression "]"
                         | "[" NumberExpression "," NumberExpression "]?"

Modified: uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml?rev=1515405&r1=1515404&r2=1515405&view=diff
==============================================================================
--- uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml (original)
+++ uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.xml Mon Aug 19 13:28:13 2013
@@ -343,6 +343,47 @@ Document{->CALL(MyScript.countNumberOfTy
     </section>
 
   </section>
+  
+  <section id="ugr.tools.ruta.language.inlined">
+    <title>Inlined rules</title>
+    <para>
+      A rule element can have a few optional parts, e.g., the quantifier or the curly brackets with conditions and actions.
+      After the part with the conditions and actions, the rule element can also contain an optional part with inlined rules.
+      These rules are applied in the context of the rule element similar to the rules within a block construct: The rules 
+      will try to match within the window specified by the current match of the rule element. There are two types of inlined rules.
+      If the curly brackets start with the symbol <quote>-></quote>, the inlined rules will only be applied for successful matches of the surrounding rule.
+      This behavior is very similar to the block construct. However, there are also some differences, e.g., inlined rules do not specify a 
+      namespace, may not contain declarations and cannot be called by other rules.
+      If the curly brackets start with the symbol <quote><![CDATA[<-]]></quote>,
+      then the inlined rules are interpreted as conditions. The surrounding rule will only match if at least one of the inlined rules was successfully applied.
+      The functionality introduced by inlined rules is illustrated with a few examples:
+    </para>
+    <programlisting><![CDATA[Sentence{} -> {NUM{-> NumBeforeWord} W;};
+Sentence{-> SentenceWithNumBeforeWord} <- {NUM W;};
+]]></programlisting>
+    <para>
+      The first rule in this example matches on each <quote>Sentence</quote> annotation and applies the inlined rule within each matched sentence. The inlined rule 
+      matches on numbers followed by a word and annotates the number with an annotation of the type <quote>NumBeforeWord</quote>. The second rule matches on each sentence 
+      and applies the inlined rule within each sentence. Note that the inlined rule contains no actions. The surrounding rule only matches a sentence if the inlined rule was
+      successfully applied. In this case, the sentence is only annotated with an annotation of the type <quote>SentenceWithNumBeforeWord</quote> if the 
+      sentence contains a number followed by a word.
+    </para>
+
+    <programlisting><![CDATA[Document.language == "en"{} -> {
+  PERIOD #{} <- {
+      COLON COLON % COMMA COMMA;
+    }
+    PERIOD{-> SpecialPeriod};
+};
+]]></programlisting>
+    <para>
+      This example combines both types of inlined rules. First, the rule matches on document annotations with the language feature set to <quote>en</quote>. Only for those documents,
+      the inner rule is applied. The inner rule matches on everything between two periods, but only if the text span between the periods fulfills two conditions: There must be two 
+      successive colons and two successive commas within the window of the matched part of the wildcard. Only if these constraints are fulfilled is the last period annotated with the type 
+      <quote>SpecialPeriod</quote>.
+    </para>  
+  </section>
+  
   <section id="ugr.tools.ruta.language.score">
     <title>Heuristic extraction using scoring rules</title>
     <para>

Modified: uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml?rev=1515405&r1=1515404&r2=1515405&view=diff
==============================================================================
--- uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml (original)
+++ uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml Mon Aug 19 13:28:13 2013
@@ -406,8 +406,9 @@ NUM{PARSE(moneyAmount)} SPECIAL{REGEXP("
     </para>
     
     <programlisting><![CDATA[DECLARE LessThan;
-    MoneyAmount.currency=="€"{-> MoneyAmount.currency="Euro"};
-    MoneyAmount{(MoneyAmount.amount<=100), MoneyAmount.currency=="Euro" -> LessThan};]]></programlisting>
+MoneyAmount.currency=="€"{-> MoneyAmount.currency="Euro"};
+MoneyAmount{(MoneyAmount.amount<=100), 
+    MoneyAmount.currency=="Euro" -> LessThan};]]></programlisting>
 
     <para>
       UIMA Ruta script files with many rules can quickly confuse the reader. The UIMA Ruta language, therefore, allows to import other script files in order to increase
@@ -484,6 +485,27 @@ BLOCK(ForEach) Sentence{} {
     </para>
 
     <para>
+      There are two more language constructs (<quote><![CDATA[->]]></quote> and <quote><![CDATA[<-]]></quote>) that allow rules to be applied within a certain context. These rules are added to an arbitrary rule element 
+      and are called inlined rules. The first construct (<quote><![CDATA[->]]></quote>) interprets the inlined rules as actions. They are executed if the surrounding rule was able to match, 
+      which makes this construct very similar to the block statement.
+    </para>
+
+    <programlisting><![CDATA[DECLARE SentenceWithNoLeadingNP;
+Sentence{}->{
+    Document{-STARTSWITH(NP) -> SentenceWithNoLeadingNP};
+};
+]]></programlisting>
+
+    <para>
+      The second one (<quote><![CDATA[<-]]></quote>) interprets the inlined rules as conditions. The surrounding rule can only match if at least one inlined rule was successfully applied.
+      In the following example, a sentence is annotated with the type SentenceWithNPNP if there are two successive NP annotations within this sentence.
+    </para>
+    <programlisting><![CDATA[DECLARE SentenceWithNPNP;
+Sentence{-> SentenceWithNPNP}<-{
+    NP NP;
+};
+]]></programlisting>
+    <para>
       Let us take a closer look on what exactly the UIMA Ruta rules match. The following rule matches on a word followed by another word:
     </para>
     <programlisting><![CDATA[W W;]]></programlisting>
@@ -858,7 +880,7 @@ ae.process(cas);]]></programlisting>
                   <entry>
                     <link linkend='ugr.tools.ruta.ae.basic.parameter.createdBy'>createdBy</link>
                   </entry>
-                  <entry>Option to add additional information, which rule created a annotation.
+                  <entry>Option to add additional information, which rule created an annotation.
                   </entry>
                   <entry>Single Boolean</entry>
                 </row>
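
Note on the inlined-rule semantics documented in this commit: the two forms described in tools.ruta.language.xml can be modeled outside Ruta roughly as follows. This is a minimal Python sketch, not Ruta code and not the UIMA Ruta implementation. The token representation (a list of (kind, text) pairs per sentence) and the function names match_num_w, inlined_as_actions and inlined_as_condition are assumptions made only for illustration. It mirrors the two documented rules: with "->" the inner rule runs inside every matched sentence, while with "<-" the outer rule matches only if the inner rule matched.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # hypothetical (kind, text) pair, e.g. ("NUM", "3")


def match_num_w(window: List[Token]) -> List[int]:
    """Model of the inner rule `NUM W;`: start indexes where a NUM
    token is directly followed by a W token inside the given window."""
    return [i for i in range(len(window) - 1)
            if window[i][0] == "NUM" and window[i + 1][0] == "W"]


def inlined_as_actions(sentences: List[List[Token]]):
    """Model of `Sentence{} -> {NUM{-> NumBeforeWord} W;};`.
    The outer rule matches every sentence; the inner rule is then
    applied within the window of that sentence and annotates the NUM."""
    annotations = []
    for s_idx, sentence in enumerate(sentences):
        for i in match_num_w(sentence):
            annotations.append(("NumBeforeWord", s_idx, i))
    return annotations


def inlined_as_condition(sentences: List[List[Token]]):
    """Model of `Sentence{-> SentenceWithNumBeforeWord} <- {NUM W;};`.
    The outer rule matches a sentence only if the inner rule matched
    at least once within that sentence."""
    return [("SentenceWithNumBeforeWord", s_idx)
            for s_idx, sentence in enumerate(sentences)
            if match_num_w(sentence)]


sentences = [
    [("W", "I"), ("W", "saw"), ("NUM", "3"), ("W", "cats")],
    [("W", "No"), ("W", "numbers"), ("W", "here")],
    [("W", "Room"), ("NUM", "42")],
]

print(inlined_as_actions(sentences))
print(inlined_as_condition(sentences))
```

In this model only the first sentence contains a number directly followed by a word, so it is the only one that receives either annotation. The third sentence ends with a number and therefore does not satisfy the inner rule.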