You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@uima.apache.org by pk...@apache.org on 2015/04/29 14:24:44 UTC
svn commit: r1676730 - in /uima/ruta/trunk/ruta-docbook/src/docbook: tools.ruta.book.xml tools.ruta.howtos.xml tools.ruta.overview.xml tools.ruta.workbench.textruler.xml

Author: pkluegl
Date: Wed Apr 29 12:24:35 2015
New Revision: 1676730

URL: http://svn.apache.org/r1676730
Log:
UIMA-3863 UIMA-4320 UIMA-3650
- added new chapter for howtos
- moved some section there
- added documentation for ruta maven plugin

Added:
    uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.howtos.xml   (with props)
Modified:
    uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.book.xml
    uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml
    uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.workbench.textruler.xml

Modified: uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.book.xml
URL: http://svn.apache.org/viewvc/uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.book.xml?rev=1676730&r1=1676729&r2=1676730&view=diff
==============================================================================
--- uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.book.xml (original)
+++ uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.book.xml Wed Apr 29 12:24:35 2015
@@ -31,4 +31,6 @@ under the License.
   
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.ruta.workbench.xml"/>
   
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.ruta.howtos.xml"/>
+  
 </book>

Added: uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.howtos.xml
URL: http://svn.apache.org/viewvc/uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.howtos.xml?rev=1676730&view=auto
==============================================================================
--- uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.howtos.xml (added)
+++ uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.howtos.xml Wed Apr 29 12:24:35 2015
@@ -0,0 +1,521 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY imgroot "images/tools/tools.ruta/" >
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  
+%uimaents;
+]>
+<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor 
+	license agreements. See the NOTICE file distributed with this work for additional 
+	information regarding copyright ownership. The ASF licenses this file to 
+	you under the Apache License, Version 2.0 (the "License"); you may not use 
+	this file except in compliance with the License. You may obtain a copy of 
+	the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required 
+	by applicable law or agreed to in writing, software distributed under the 
+	License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS 
+	OF ANY KIND, either express or implied. See the License for the specific 
+	language governing permissions and limitations under the License. -->
+<chapter id="ugr.tools.ruta.howtos">
+	<title>Apache UIMA Ruta HowTos</title>
+	<para>This chapter contains a selection of some use cases and HowTos
+		for UIMA Ruta.
+	</para>
+	<section id="ugr.tools.ruta.ae.basic.apply">
+		<title>Apply UIMA Ruta Analysis Engine in plain Java</title>
+		<para>
+			Let us assume that the reader wrote the UIMA Ruta rules using
+			the UIMA
+			Ruta Workbench, which already creates correctly configured
+			descriptors.
+			In this case, the following java code can be used to
+			apply the UIMA
+			Ruta script.
+		</para>
+		<programlisting><![CDATA[File specFile = new File("pathToMyWorkspace/MyProject/descriptor/"+
+    "my/package/MyScriptEngine.xml");
+XMLInputSource in = new XMLInputSource(specFile);
+ResourceSpecifier specifier = UIMAFramework.getXMLParser().
+    parseResourceSpecifier(in);
+// for import by name... set the datapath in the ResourceManager
+AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);
+CAS cas = ae.newCAS();
+cas.setDocumentText("This is my document.");
+ae.process(cas);]]></programlisting>
+		<note>
+			<para>
+				The UIMA Ruta Analysis Engine utilizes type priorities. If the
+				CAS
+				object is
+				not created using the UIMA Ruta Analysis Engine
+				descriptor by other
+				means, then please
+				provide the necessary type
+				priorities for a valid execution of the UIMA
+				Ruta rules.
+			</para>
+		</note>
+		<para>
+			If the UIMA Ruta script was written, for example, with a common text
+			editor and no configured descriptors are yet available,
+			then the
+			following java code can be used, which, however, is only
+			applicable
+			for executing single script files that do not import
+			additional
+			components or scripts. In this case the other parameters,
+			e.g.,
+			<quote>additionalScripts</quote>
+			, need to be configured correctly.
+		</para>
+		<programlisting><![CDATA[URL aedesc = RutaEngine.class.getResource("BasicEngine.xml");
+XMLInputSource inae = new XMLInputSource(aedesc);
+ResourceSpecifier specifier = UIMAFramework.getXMLParser().
+    parseResourceSpecifier(inae);
+ResourceManager resMgr = UIMAFramework.newDefaultResourceManager();
+AnalysisEngineDescription aed = (AnalysisEngineDescription) specifier;
+TypeSystemDescription basicTypeSystem = aed.getAnalysisEngineMetaData().
+    getTypeSystem();
+
+Collection<TypeSystemDescription> tsds = 
+    new ArrayList<TypeSystemDescription>();
+tsds.add(basicTypeSystem);
+// add some other type system descriptors 
+// that are needed by your script file   
+TypeSystemDescription mergeTypeSystems = CasCreationUtils.
+    mergeTypeSystems(tsds);
+aed.getAnalysisEngineMetaData().setTypeSystem(mergeTypeSystems);
+aed.resolveImports(resMgr);
+        
+AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(aed, 
+    resMgr, null);
+File scriptFile = new File("path/to/file/MyScript.ruta");
+ae.setConfigParameterValue(RutaEngine.PARAM_SCRIPT_PATHS, 
+    new String[] { scriptFile.getParentFile().getAbsolutePath() });
+String name = scriptFile.getName().substring(0, 
+    scriptFile.getName().length() - 5);
+ae.setConfigParameterValue(RutaEngine.PARAM_MAIN_SCRIPT, name);
+ae.reconfigure();
+CAS cas = ae.newCAS();
+cas.setDocumentText("This is my document.");
+ae.process(cas);]]></programlisting>
+		<para>
+			There is also a convenience implementation for applying simple
+			scripts, which do not introduce new types. The following java code
+			applies a simple rule
+			<quote>T1 SW{-> MARK(T2)};</quote>
+			on the given CAS. Note that the types need to be already defined in
+			the type system of the CAS.
+		</para>
+		<programlisting><![CDATA[Ruta.apply(cas, "T1 SW{-> MARK(T2)};");]]></programlisting>
+	</section>
+	<section id="ugr.tools.ruta.integration">
+		<title>Integrating UIMA Ruta in an existing UIMA Annotator</title>
+		<para>This section provides a walk-through tutorial on integrating
+			Ruta in an existing UIMA annotator. In our artificial example
+			we will
+			use Ruta rules to post-process the output of a
+			Part-of-Speech tagger.
+			The POS tagger is a UIMA annotator that iterates over
+			sentences
+			and
+			tokens and updates the posTag field of each Token with a part of
+			speech. For example, given this text...
+		</para>
+		<programlisting>The quick brown fox receives many bets.
+The fox places many bets.
+The fox gets up early.
+The rabbit made up this story.</programlisting>
+		<para>...it assigns the posTag JJ (adjective) to the token
+			&quot;brown&quot; , the posTag NN (common noun) to the
+			token
+			&quot;fox&quot; and the tag VBZ (verb, 3rd person
+			singular present) to
+			the token &quot;receives&quot; in the first sentence.
+		</para>
+		<para>We have noticed that the tagger sometimens fails to disambiguate
+			NNS (common noun plural) and
+			VBZ tags, as in the second sentence. The
+			word &quot;up&quot; also seems
+			to confuse the tagger,
+			which always
+			assigns it an RB (adverb) tag, even when it is a particle
+			(RP)
+			following a verb, as in
+			the third and fourth sentences:
+		</para>
+		<programlisting>The|DT quick|JJ brown|JJ fox|NN receives|VBZ many|JJ bets|NNS .|.
+The|DT fox|NN places|NNS many|JJ bets|NNS .|.
+The|DT fox|NN gets|VBZ up|RB early|RB .|.
+The|DT rabbit|NN made|VBD up|RB this|DT story|NN .|.</programlisting>
+		<para>Let&apos;s imagine that after applying every possible approach
+			available in the POS tagging literature, our tagger still generates
+			these and some other errors. We decide to write a few Ruta
+			rules to
+			post-process the output of the tagger.
+		</para>
+		<section id="ugr.tools.ruta.ae.integration.mvn">
+			<title>Adding Ruta to our Annotator</title>
+			<para>The POS tagger is being developed as a Maven-based project.
+				Since Ruta maven artifacts are available on Maven Central, we
+				add the
+				following dependency to the project&apos;s pom.xml. The
+				functionalities described in this section
+				require a version of Ruta
+				equal to or greater than 2.1.0.
+			</para>
+			<programlisting>&lt;dependency&gt;
+  &lt;groupId&gt;org.apache.uima&lt;/groupId&gt;
+  &lt;artifactId&gt;ruta-core&lt;/artifactId&gt;
+  &lt;version&gt;[2.0.2,)&lt;/version&gt;
+&lt;/dependency&gt;</programlisting>
+			<para>We also take care that the Ruta basic typesystem is loaded when
+				our annotator is initialized. The Ruta typesystem descriptors are
+				available from
+				ruta-core/src/main/resources/org/apache/uima/ruta/engine/
+			</para>
+		</section>
+		<section id="ugr.tools.ruta.ae.integration.loading">
+			<title>Developing Ruta rules and applying them from inside Java code
+			</title>
+			<para>We are now ready to write some rules. The ones we develop for
+				fixing the two errors look like this:
+			</para>
+			<programlisting>Token.posTag =="NN" Token.posTag=="NNS"{-> Token.posTag="VBZ"}
+    Token.posTag=="JJ";
+Token{REGEXP(Token.posTag, "VB(.?)")} 
+    Token.posTag=="RB"{REGEXP("up")-> Token.posTag="RP"};  </programlisting>
+			<para>That
+				is, we change a Token's NNS tag to VBZ, if it is surrounded by a
+				Token tagged as NN and a Token tagged as JJ. We also change an RB
+				tag for an &quot;up&quot; token to RP, if &quot;up&quot; is preceded
+				by any verbal tag (VB, VBZ, etc.) matched with the help of the
+				<link linkend='ugr.tools.ruta.language.conditions.regexp'>REGEXP</link>
+				condition.
+			</para>
+			<para>We test our rules in the Ruta Workbench and see that they
+				indeed fix most of our problems. We save those and some more rules
+				in a text file
+				src/main/resources/ruta.txt.
+			</para>
+			<para>We declare the file with our rules as an external resource and
+				we load it during initialization.
+				Here's a way to do it using
+				uimaFIT:
+			</para>
+			<programlisting>/**
+ * Ruta rules for post-processing the tagger's output
+ */
+public static final String RUTA_RULES_PARA = "RutaRules";
+ExternalResource(key = RUTA_RULES_PARA, mandatory=false)
+...
+File rutaRulesF = new File((String) 
+    aContext.getConfigParameterValue(RUTA_RULES_PARA));
+</programlisting>
+			<para>After our CAS has been populated with posTag annotations from
+				the main algorithm, we post-process the CAS using
+				Ruta.apply():
+			</para>
+			<programlisting>String rutaRules = org.apache.commons.io.FileUtils.readFileToString(
+    rutaRulesF, "UTF-8");
+Ruta.apply(cas,  rutaRules);    
+</programlisting>
+			<para>We are now happy to see that the final output of our annotator
+				now
+				looks much better:
+			</para>
+			<programlisting>The|DT quick|JJ brown|JJ fox|NN receives|VBZ many|JJ bets|NNS .|.
+The|DT fox|NN places|VBZ many|JJ bets|NNS .|.
+The|DT fox|NN gets|VBZ up|RP early|RB .|.
+The|DT rabbit|NN made|VBD up|RP this|DT story|NN .|.</programlisting>
+		</section>
+	</section>
+
+
+	<section id="ugr.tools.ruta.maven">
+		<title>UIMA Ruta Maven Plugin</title>
+		<para>UIMA Ruta provides a maven plugin for building analysis engine
+			and type system descriptors for rule scripts.
+			Additionally, this maven plugin is able able to compile word list (gazetteers) to
+			the more efficient structures, tree word list and multi tree word
+			list. The usage and configuration is shortly summarized in the
+			following. An exemplary maven project for UIMA Ruta is given here:
+			<code>https://svn.apache.org/repos/asf/uima/ruta/trunk/example-projects/ruta-maven-example</code>
+		</para>
+		<section>
+		<title>generate goal</title>
+		<para>
+		The generate goal can be utilized to create xml descriptors for the UIMA Ruta script files. 
+		Its usage and configuration is summarized in the following example:
+		</para>
+		<programlisting><![CDATA[<plugin>
+<groupId>org.apache.uima</groupId>
+<artifactId>ruta-maven-plugin</artifactId>
+<version>2.3.0</version>
+<configuration>
+
+ <!-- This is a exemplary configuration, which explicitly specifies the 
+  default configuration values if not mentioned otherwise. -->
+
+ <!-- The directory where the generated type system descriptors will 
+  be written stored. -->
+ <!-- default value: ${project.build.directory}/generated-sources/
+   ruta/descriptor -->
+ <typeSystemOutputDirectory>${project.build.directory}/generated-sources/
+   ruta/descriptor</typeSystemOutputDirectory>
+
+ <!-- The directory where the generated analysis engine descriptors will 
+  be stored. -->
+ <!-- default value: ${project.build.directory}/generated-sources/ruta/
+   descriptor -->
+ <analysisEngineOutputDirectory>${project.build.directory}/
+  generated-sources/ruta/descriptor</analysisEngineOutputDirectory>
+
+ <!-- The template descriptor for the generated type system. By default the 
+   descriptor of the maven dependency is loaded. -->
+ <!-- default value: none -->
+ <!-- not used in this example <typeSystemTemplate>...
+   </typeSystemTemplate> -->
+
+ <!-- The template descriptor for the generated analysis engine. 
+   By default the descriptor of the maven dependency is loaded. -->
+ <!-- default value: none -->
+ <!-- not used in this example <analysisEngineTemplate>...
+   </analysisEngineTemplate> -->
+
+ <!-- Script paths of the generated analysis engine descriptor. -->
+ <!-- default value: none -->
+ <scriptPaths>
+  <scriptPath>${basedir}/src/main/ruta/</scriptPath>
+ </scriptPaths>
+
+ <!-- Descriptor paths of the generated analysis engine descriptor. -->
+ <!-- default value: none -->
+ <descriptorPaths>
+  <descriptorPath>${project.build.directory}/generated-sources/ruta/
+   descriptor</descriptorPath>
+ </descriptorPaths>
+
+ <!-- Resource paths of the generated analysis engine descriptor. -->
+ <!-- default value: none -->
+ <resourcePaths>
+  <resourcePath>${basedir}/src/main/resources/</resourcePath>
+  <resourcePath>${project.build.directory}/generated-sources/ruta/
+   resources/</resourcePath>
+ </resourcePaths>
+
+ <!-- Suffix used for the generated type system descriptors. -->
+ <!-- default value: Engine -->
+ <analysisEngineSuffix>Engine</analysisEngineSuffix>
+
+ <!-- Suffix used for the generated analysis engine descriptors. -->
+ <!-- default value: TypeSystem -->
+ <typeSystemSuffix>TypeSystem</typeSystemSuffix>
+
+ <!-- Source file encoding. -->
+ <!-- default value: ${project.build.sourceEncoding} -->
+ <encoding>UTF-8</encoding>
+
+ <!-- Type of type system imports. false = import by location. -->
+ <!-- default value: false -->
+ <importByName>false</importByName>
+
+ <!-- Option to resolve imports while building. -->
+ <!-- default value: false -->
+ <resolveImports>false</resolveImports>
+
+ <!-- Amount of retries for building dependent descriptors. Default value 
+  -1 leads to three retires for each script. -->
+  <!-- default value: -1 -->
+ <maxBuildRetries>-1</maxBuildRetries>
+
+ <!-- List of packages with language extensions -->
+ <!-- default value: none -->
+ <extensionPackages>
+  <extensionPackage>org.apache.uima.ruta</extensionPackage>
+ </extensionPackages>
+
+ <!-- Add UIMA Ruta nature to .project -->
+ <!-- default value: false -->
+ <addRutaNature>true</addRutaNature>
+
+
+ <!-- Buildpath of the UIMA Ruta Workbench (IDE) for this project -->
+ <!-- default value: none -->
+ <buildPaths>
+  <buildPath>script:src/main/ruta/</buildPath>
+  <buildPath>descriptor:target/generated-sources/ruta/descriptor/
+  </buildPath>
+  <buildPath>resources:src/main/resources/</buildPath>
+ </buildPaths>
+
+</configuration>
+<executions>
+ <execution>
+  <id>default</id>
+  <phase>process-classes</phase>
+  <goals>
+   <goal>generate</goal>
+  </goals>
+ </execution>
+</executions>
+</plugin>
+]]>		</programlisting>
+    <para>The configuration parameters for this goal either define the build behavior, 
+    e.g., where the generated descriptor should be placed or which suffix the files should get,
+    or the configuration of the generated analysis engine descriptor, e.g., 
+    the values of the configuration parameter scriptPaths.
+    However, there are also other parameters: addRutaNature and buildPaths. 
+    Both can be utilzed to configure the current Eclipse project (due to the missing m2e connector).
+    This is required if the functionality of the UIMA Ruta Workbench, e.g., syntax checking or auto-completeion,
+     should be available in the maven project. If the parameter addRutaNature is set to true, then
+     the UIMA Ruta Workbench will recognize the project as a script project. Only then, 
+     the buildpath of the UIMA Ruta project can be configured using the buildPaths parameter, which specifies 
+     the three important source folders of the UIMA Ruta project. In normal UIMA Ruta Workbnech projects, 
+     these are script, descriptor and resources.
+    </para>
+		</section>
+		
+		<section>
+		<title>twl goal</title>
+    <para>
+    The twl goal can be utilized to create .twl files from .txt files.
+    Its usage and configuration is summarized in the following example:
+    </para>
+    <programlisting><![CDATA[<plugin>
+<groupId>org.apache.uima</groupId>
+<artifactId>ruta-maven-plugin</artifactId>
+<version>2.3.0</version>
+<configuration></configuration>
+<executions>
+<execution>
+ <id>default</id>
+ <phase>process-classes</phase>
+ <goals>
+  <goal>twl</goal>
+ </goals>
+ <configuration>
+  <!-- This is a exemplary configuration, which explicitly specifies 
+   the default configuration values if not mentioned otherwise. -->
+
+  <!-- Compress resulting tree word list. -->
+  <!-- default value: true -->
+  <compress>true</compress>
+
+  <!-- The source files for the tree word list. -->
+  <!-- default value: none -->
+  <inputFiles>
+   <directory>${basedir}/src/main/resources</directory>
+   <includes>
+    <include>*.txt</include>
+   </includes>
+  </inputFiles>
+
+  <!-- The directory where the generated tree word lists will be 
+    written to.-->
+  <!-- default value: ${project.build.directory}/generated-sources/
+    ruta/resources/ -->
+  <outputDirectory>${project.build.directory}/generated-sources/ruta/
+    resources/</outputDirectory>
+
+  <!-- Source file encoding. -->
+  <!-- default value: ${project.build.sourceEncoding} -->
+  <encoding>UTF-8</encoding>
+
+ </configuration>
+</execution>
+</executions>
+</plugin>
+    ]]>   </programlisting>
+		</section>
+		
+		<section>
+    <title>mtwl goal</title>
+    <para>
+    The mtwl goal can be utilized to create a .mtwl file from multiple .txt files.
+    Its usage and configuration is summarized in the following example:
+    </para>
+    <programlisting><![CDATA[<plugin>
+<groupId>org.apache.uima</groupId>
+<artifactId>ruta-maven-plugin</artifactId>
+<version>2.3.0</version>
+<configuration></configuration>
+<executions>
+<execution>
+ <id>default</id>
+ <phase>process-classes</phase>
+ <goals>
+  <goal>mtwl</goal>
+ </goals>
+ <configuration>
+  <!-- This is a exemplary configuration, which explicitly specifies 
+   the default configuration values if not mentioned otherwise. -->
+
+  <!-- Compress resulting tree word list. -->
+  <!-- default value: true -->
+  <compress>true</compress>
+
+  <!-- The source files for the multi tree word list. -->
+  <!-- default value: none -->
+  <inputFiles>
+   <directory>${basedir}/src/main/resources</directory>
+   <includes>
+    <include>*.txt</include>
+   </includes>
+  </inputFiles>
+
+  <!-- The directory where the generated tree word list will be 
+    written to. -->
+  <!-- default value: ${project.build.directory}/generated-sources/ruta/
+    resources/generated.mtwl -->
+  <outputFile>${project.build.directory}/generated-sources/ruta/resources/
+    generated.mtwl</outputFile>
+
+  <!-- Source file encoding. -->
+  <!-- default value: ${project.build.sourceEncoding} -->
+  <encoding>UTF-8</encoding>
+  
+ </configuration>
+</execution>
+</executions>
+</plugin>
+    ]]>   </programlisting>
+    </section>
+		
+	</section>
+
+<section id="section.tools.ruta.workbench.textruler.example">
+   <title>Induce rules with the TextRuler framework</title>
+      <para> 
+      This section gives a short example how the TextRuler framework is applied in order to induce annotation rules. We refer to the screenshot in <xref linkend="figure.tools.ruta.workbench.textruler.main"/>
+      for the configuration and are using the exemplary UIMA Ruta project <quote>TextRulerExample</quote>, which is part of the source release of UIMA Ruta.
+      After importing the project into your workspace, please rebuild all UIMA Ruta scripts in order to create the descriptors, e.g., by cleaning the project.
+      </para>
+      <para> 
+        In this example, we are using the <quote>KEP</quote> algorithm for learning annotation rules for identifying Bibtex entries in the reference section of scientific publications:
+        <orderedlist>
+        <listitem>
+          <para>Select the folder <quote>single</quote> and drag and drop it to the <quote>Training Data</quote> text field. This folder contains one file with 
+          correct annotations and serves as gold standard data in our example.</para>
+        </listitem>
+        <listitem>
+          <para>Select the file <quote>Feature.ruta</quote> and drag and drop it to the <quote>Preprocess Script</quote> text field. This UIMA Ruta script knows all necessary types, especially the types
+          of the annotations we try the learn rules for, and additionally it contains rules that create useful annotations, which can be used by the algorithm in order to learn better rules.</para>
+        </listitem>
+        <listitem>
+          <para>Select the file <quote>InfoTypes.txt</quote> and drag and drop it to the <quote>Information Types</quote> list. This specifies the goal of the learning process, 
+          which types of annotations should be annotated by the induced rules, respectively.</para>
+        </listitem>
+        <listitem>
+          <para>Check the checkbox of the <quote>KEP</quote> algorithm and press the start button in the toolbar fo the view.</para>
+        </listitem>
+        <listitem>
+          <para>The algorithm now tries to induce rules for the targeted types. The current result is displayed in the view <quote>KEP Results</quote> in the right part of the perspective.</para>
+        </listitem>
+        <listitem>
+          <para>After the algorithms finished the learning process, create a new UIMA Ruta file in the <quote>uima.ruta.example</quote> package and copy the content of the result view
+          to the new file. Now, the induced rules can be applied as a normal UIMA Ruta script file.</para>
+        </listitem>
+      </orderedlist>
+      </para>
+    </section>
+</chapter>

Propchange: uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.howtos.xml
------------------------------------------------------------------------------
    svn:eol-style = native

Modified: uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml
URL: http://svn.apache.org/viewvc/uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml?rev=1676730&r1=1676729&r2=1676730&view=diff
==============================================================================
--- uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml (original)
+++ uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml Wed Apr 29 12:24:35 2015
@@ -600,69 +600,7 @@ Document{-> EXEC(MyAnalysisEngine, {MyTy
          are generated in the descriptor folder corresponding to the package namespace of the script file.
          The available configuration parameters of the UIMA Ruta Analysis Engine are described in the following.   
       </para>
-      <section id="ugr.tools.ruta.ae.basic.apply">
-      <title>Apply UIMA Ruta Analysis Engine in plain Java</title>
-        <para>
-          Let us assume that the reader wrote the UIMA Ruta rules using the UIMA Ruta Workbench, which already creates correctly configured descriptors.
-          In this case, the following java code can be used to apply the UIMA Ruta script.
-        </para>
-          <programlisting><![CDATA[File specFile = new File("pathToMyWorkspace/MyProject/descriptor/"+
-    "my/package/MyScriptEngine.xml");
-XMLInputSource in = new XMLInputSource(specFile);
-ResourceSpecifier specifier = UIMAFramework.getXMLParser().
-    parseResourceSpecifier(in);
-// for import by name... set the datapath in the ResourceManager
-AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);
-CAS cas = ae.newCAS();
-cas.setDocumentText("This is my document.");
-ae.process(cas);]]></programlisting>
-      <note><para>
-        The UIMA Ruta Analysis Engine utilizes type priorities. If the CAS object is 
-        not created using the UIMA Ruta Analysis Engine descriptor by other means, then please 
-        provide the necessary type priorities for a valid execution of the UIMA Ruta rules.
-      </para></note>
-      <para>
-        If the UIMA Ruta script was written, for example, with a common text editor and no configured descriptors are yet available, 
-        then the following java code can be used, which, however, is only applicable for executing single script files that do not import 
-        additional components or scripts. In this case the other parameters, e.g., <quote>additionalScripts</quote>, need to be configured correctly.
-      </para>
-      <programlisting><![CDATA[URL aedesc = RutaEngine.class.getResource("BasicEngine.xml");
-XMLInputSource inae = new XMLInputSource(aedesc);
-ResourceSpecifier specifier = UIMAFramework.getXMLParser().
-    parseResourceSpecifier(inae);
-ResourceManager resMgr = UIMAFramework.newDefaultResourceManager();
-AnalysisEngineDescription aed = (AnalysisEngineDescription) specifier;
-TypeSystemDescription basicTypeSystem = aed.getAnalysisEngineMetaData().
-    getTypeSystem();
-
-Collection<TypeSystemDescription> tsds = 
-    new ArrayList<TypeSystemDescription>();
-tsds.add(basicTypeSystem);
-// add some other type system descriptors 
-// that are needed by your script file   
-TypeSystemDescription mergeTypeSystems = CasCreationUtils.
-    mergeTypeSystems(tsds);
-aed.getAnalysisEngineMetaData().setTypeSystem(mergeTypeSystems);
-aed.resolveImports(resMgr);
-        
-AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(aed, 
-    resMgr, null);
-File scriptFile = new File("path/to/file/MyScript.ruta");
-ae.setConfigParameterValue(RutaEngine.PARAM_SCRIPT_PATHS, 
-    new String[] { scriptFile.getParentFile().getAbsolutePath() });
-String name = scriptFile.getName().substring(0, 
-    scriptFile.getName().length() - 5);
-ae.setConfigParameterValue(RutaEngine.PARAM_MAIN_SCRIPT, name);
-ae.reconfigure();
-CAS cas = ae.newCAS();
-cas.setDocumentText("This is my document.");
-ae.process(cas);]]></programlisting>
-      <para>
-        There is also a convenience implementation for applying simple scripts, which do not introduce new types. The following java code
-        applies a simple rule <quote>T1 SW{-> MARK(T2)};</quote> on the given CAS. Note that the types need to be already defined in the type system of the CAS.
-      </para>
-      <programlisting><![CDATA[Ruta.apply(cas, "T1 SW{-> MARK(T2)};");]]></programlisting>
-      </section>
+      
       <section id="ugr.tools.ruta.ae.basic.parameter">
         <title>Configuration Parameters</title>
         <para>
@@ -1461,82 +1399,4 @@ ae.process(cas);]]></programlisting>
     </section>
     
   </section>
-	<section id="ugr.tools.ruta.integration">
-		<title>Integrating Ruta in an existing UIMA Annotator</title>
-		<para>This section provides a walk-through tutorial on integrating 
-		Ruta in an existing UIMA annotator. In our artificial example 
-		we will use Ruta rules to post-process the output of a 
-		Part-of-Speech tagger. The POS tagger is a UIMA annotator that iterates over sentences 
-		and tokens and updates the posTag field of each Token with a part of 
-		speech. For example, given this text...</para>
-<programlisting>The quick brown fox receives many bets.
-The fox places many bets.
-The fox gets up early.
-The rabbit made up this story.</programlisting>
-		<para>...it assigns the posTag JJ (adjective) to the token &quot;brown&quot; , the posTag NN (common noun) to the 
-		token &quot;fox&quot; and the tag VBZ (verb, 3rd person 
-		singular present) to the token &quot;receives&quot; in the first sentence.</para>
-		<para>We have noticed that the tagger sometimens fails to disambiguate NNS (common noun plural) and 
-		VBZ tags, as in the second sentence. The word &quot;up&quot; also seems to confuse the tagger, 
-		which always assigns it an RB (adverb) tag, even when it is a particle (RP) following a verb, as in
-		the third and fourth sentences:</para>
-<programlisting>The|DT quick|JJ brown|JJ fox|NN receives|VBZ many|JJ bets|NNS .|.
-The|DT fox|NN places|NNS many|JJ bets|NNS .|.
-The|DT fox|NN gets|VBZ up|RB early|RB .|.
-The|DT rabbit|NN made|VBD up|RB this|DT story|NN .|.</programlisting>
-		<para>Let&apos;s imagine that after applying every possible approach 
-		available in the POS tagging literature, our tagger still generates 
-		these and some other errors. We decide to write a few Ruta 
-		rules to post-process the output of the tagger.</para>
-		<section id="ugr.tools.ruta.ae.integration.mvn">
-			<title>Adding Ruta to our Annotator</title>
-			<para>The POS tagger is being developed as a Maven-based project. 
-			Since Ruta maven artifacts are available on Maven Central, we 
-			add the following dependency to the project&apos;s pom.xml. The functionalities described in this section 
-			require a version of Ruta equal to or greater than 2.1.0.</para>
-			<programlisting>&lt;dependency&gt;
-	&lt;groupId&gt;org.apache.uima&lt;/groupId&gt;
-	&lt;artifactId&gt;ruta-core&lt;/artifactId&gt;
-	&lt;version&gt;[2.0.2,)&lt;/version&gt;
-&lt;/dependency&gt;</programlisting>
-                <para>We  also take care that the Ruta basic typesystem is loaded when our annotator is initialized. The Ruta typesystem descriptors are available from ruta-core/src/main/resources/org/apache/uima/ruta/engine/</para>
-		</section>
-		<section id="ugr.tools.ruta.ae.integration.loading">
-			<title>Developing Ruta rules and applying them from inside Java code</title>
-			<para>We are now ready to write some rules. The ones we develop for fixing the two errors look like this:</para>
-<programlisting>Token.posTag =="NN" Token.posTag=="NNS"{-> Token.posTag="VBZ"}
-    Token.posTag=="JJ";
-Token{REGEXP(Token.posTag, "VB(.?)")} 
-    Token.posTag=="RB"{REGEXP("up")-> Token.posTag="RP"};  </programlisting>
-                        <para>That is, we change a Token's NNS tag to VBZ, if it is surrounded by a Token tagged as NN and a Token tagged as JJ. We also change an RB tag for an &quot;up&quot; token to RP, if &quot;up&quot; is preceded by any verbal tag (VB, VBZ, etc.) matched with the help of the <link linkend='ugr.tools.ruta.language.conditions.regexp'>REGEXP</link> condition. 
-			</para>
-			<para>We test our rules in the Ruta Workbench and see that they 
-			indeed fix most of our problems. We save those and some more rules in a text file  
-			src/main/resources/ruta.txt.</para>
-			<para>We declare the file with our rules as an external resource and we load it during initialization. 
-			Here's a way to do it using uimaFIT:</para>
-<programlisting>/**
- * Ruta rules for post-processing the tagger's output
- */
-public static final String RUTA_RULES_PARA = "RutaRules";
-ExternalResource(key = RUTA_RULES_PARA, mandatory=false)
-...
-File rutaRulesF = new File((String) 
-    aContext.getConfigParameterValue(RUTA_RULES_PARA));
-</programlisting>
-			<para>After our CAS has been populated with posTag annotations from 
-			the main algorithm, we post-process the CAS using 
-			Ruta.apply():</para>
-<programlisting>String rutaRules = org.apache.commons.io.FileUtils.readFileToString(
-    rutaRulesF, "UTF-8");
-Ruta.apply(cas,  rutaRules);		
-</programlisting>
-			<para>We are now happy to see that the final output of our annotator now 
-			looks much better:</para>
-			<programlisting>The|DT quick|JJ brown|JJ fox|NN receives|VBZ many|JJ bets|NNS .|.
-The|DT fox|NN places|VBZ many|JJ bets|NNS .|.
-The|DT fox|NN gets|VBZ up|RP early|RB .|.
-The|DT rabbit|NN made|VBD up|RP this|DT story|NN .|.</programlisting>
-		</section>
-	</section>
 </chapter>

Modified: uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.workbench.textruler.xml
URL: http://svn.apache.org/viewvc/uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.workbench.textruler.xml?rev=1676730&r1=1676729&r2=1676730&view=diff
==============================================================================
--- uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.workbench.textruler.xml (original)
+++ uima/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.workbench.textruler.xml Wed Apr 29 12:24:35 2015
@@ -251,40 +251,4 @@ under the License.
       </mediaobject>
     </figure>
    </section>
-   
-   <section id="section.tools.ruta.workbench.textruler.example">
-   <title>Example</title>
-      <para> 
-      This section gives a short example how the TextRuler framework is applied in order to induce annotation rules. We refer to the screenshot in <xref linkend="figure.tools.ruta.workbench.textruler.main"/>
-      for the configuration and are using the exemplary UIMA Ruta project <quote>TextRulerExample</quote>, which is part of the source release of UIMA Ruta.
-      After importing the project into your workspace, please rebuild all UIMA Ruta scripts in order to create the descriptors, e.g., by cleaning the project.
-      </para>
-      <para> 
-        In this example, we are using the <quote>KEP</quote> algorithm for learning annotation rules for identifying Bibtex entries in the reference section of scientific publications:
-        <orderedlist>
-        <listitem>
-          <para>Select the folder <quote>single</quote> and drag and drop it to the <quote>Training Data</quote> text field. This folder contains one file with 
-          correct annotations and serves as gold standard data in our example.</para>
-        </listitem>
-        <listitem>
-          <para>Select the file <quote>Feature.ruta</quote> and drag and drop it to the <quote>Preprocess Script</quote> text field. This UIMA Ruta script knows all necessary types, especially the types
-          of the annotations we try the learn rules for, and additionally it contains rules that create useful annotations, which can be used by the algorithm in order to learn better rules.</para>
-        </listitem>
-        <listitem>
-          <para>Select the file <quote>InfoTypes.txt</quote> and drag and drop it to the <quote>Information Types</quote> list. This specifies the goal of the learning process, 
-          which types of annotations should be annotated by the induced rules, respectively.</para>
-        </listitem>
-        <listitem>
-          <para>Check the checkbox of the <quote>KEP</quote> algorithm and press the start button in the toolbar fo the view.</para>
-        </listitem>
-        <listitem>
-          <para>The algorithm now tries to induce rules for the targeted types. The current result is displayed in the view <quote>KEP Results</quote> in the right part of the perspective.</para>
-        </listitem>
-        <listitem>
-          <para>After the algorithms finished the learning process, create a new UIMA Ruta file in the <quote>uima.ruta.example</quote> package and copy the content of the result view
-          to the new file. Now, the induced rules can be applied as a normal UIMA Ruta script file.</para>
-        </listitem>
-      </orderedlist>
-      </para>
-    </section>
 </section>