Posted to commits@uima.apache.org by re...@apache.org on 2014/04/18 21:13:13 UTC

svn commit: r1588543 - in /uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook: tools.uimafit.book.xml tools.uimafit.packaging.xml tools.uimafit.typesystem.xml

Author: rec
Date: Fri Apr 18 19:13:13 2014
New Revision: 1588543

URL: http://svn.apache.org/r1588543
Log:
[UIMA-3385] Make meta data discovery compatible with fat jars

Added:
    uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.packaging.xml
      - copied, changed from r1573368, uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.pipelines.xml
Modified:
    uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.book.xml
    uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.typesystem.xml

Modified: uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.book.xml
URL: http://svn.apache.org/viewvc/uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.book.xml?rev=1588543&r1=1588542&r2=1588543&view=diff
==============================================================================
--- uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.book.xml (original)
+++ uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.book.xml Fri Apr 18 19:13:13 2014
@@ -41,6 +41,8 @@ under the License.
 
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.uimafit.typesystem.xml"/>
 
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.uimafit.packaging.xml"/>
+
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.uimafit.maven.xml"/>
 
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.uimafit.migration.xml"/>

Copied: uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.packaging.xml (from r1573368, uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.pipelines.xml)
URL: http://svn.apache.org/viewvc/uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.packaging.xml?p2=uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.packaging.xml&p1=uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.pipelines.xml&r1=1573368&r2=1588543&rev=1588543&view=diff
==============================================================================
--- uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.pipelines.xml (original)
+++ uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.packaging.xml Fri Apr 18 19:13:13 2014
@@ -16,58 +16,82 @@
 	specific language governing permissions and limitations
 	under the License.
 -->
-<chapter id="ugr.tools.uimafit.pipelines">
-  <title>Pipelines</title>
-  <para>UIMA is a component-based architecture that allows composing various processing components
-    into a complex processing pipeline. A pipeline typically involves a <emphasis>collection
-      reader</emphasis> which ingests documents and <emphasis>analysis engines</emphasis> that do
-    the actual processing.</para>
-  <para>Normally, you would run a pipeline using a UIMA Collection Processing Engine or using UIMA
-    AS. uimaFIT offers a third alternative that is much simpler to use and well suited for embedding
-    UIMA pipelines into applications or for writing tests.</para>
-  <para>As uimaFIT does not supply any readers or processing components, we just assume that we have
-    written three components:</para>
-  <itemizedlist>
-    <listitem>
-      <para><classname>TextReader</classname> - reads text files from a directory</para>
-    </listitem>
-    <listitem>
-      <para><classname>Tokenizer</classname> - annotates tokens</para>
-    </listitem>
-    <listitem>
-      <para><classname>TokenFrequencyWriter</classname> - writes a list of tokens and their
-        frequency to a file</para>
-    </listitem>
-  </itemizedlist>
-  <para>We create descriptors for all components and run them as a pipeline:</para>
-  <programlisting>CollectionReaderDescription reader = 
-  CollectionReaderFactory.createReaderDescription(
-    TextReader.class, 
-    TextReader.PARAM_INPUT, "/home/uimafit/documents");
-
-AnalysisEngineDescription tokenizer = 
-  AnalysisEngineFactory.createEngineDescription(
-    Tokenizer.class);
-
-AnalysisEngineDescription tokenFrequencyWriter = 
-  AnalysisEngineFactory.createEngineDescription(
-    TokenFrequencyWriter.class, 
-    TokenFrequencyWriter.PARAM_OUTPUT, "counts.txt");
-
-SimplePipeline.runPipeline(reader, tokenizer, writer);</programlisting>
-  <para>Instead of running the full pipeline end-to-end, we can also process one document at a time
-    and inspect the analysis results:</para>
-  <programlisting>CollectionReaderDescription reader = 
-  CollectionReaderFactory.createReaderDescription(
-    TextReader.class, 
-    TextReader.PARAM_INPUT, "/home/uimafit/documents");
-
-AnalysisEngineDescription tokenizer = 
-  AnalysisEngineFactory.createEngineDescription(
-    Tokenizer.class);
-
-for (JCas jcas : SimplePipeline.iteratePipeline(reader, tokenizer)) {
-  System.out.printf("Found %d tokens%n", 
-    JCasUtil.select(jcas, Token.class).size());
-}</programlisting>
+<chapter id="ugr.tools.uimafit.packaging">
+  <title>Building an executable JAR</title>
+  <para>Building an executable JAR that includes uimaFIT components typically requires extra care. By
+    convention, uimaFIT expects certain information in specific locations on the classpath, e.g. the
+      <filename>types.txt</filename> file that controls the <link
+      linkend="ugr.tools.uimafit.packaging">automatic type system detection</link> mechanism must
+    reside at <filename>META-INF/org.apache.uima.fit/types.txt</filename>. A project often has
+    several dependencies, each supplying its own configuration files at these standard
+    locations. This causes a problem for naive approaches that create an executable
+      <emphasis>fat-jar</emphasis> by merging all dependencies into a single JAR file. Without extra
+    care, the files supplied by the different dependencies overwrite each other during the packaging
+    process and only one file <emphasis>wins</emphasis> in the end. As a consequence, the types
+    configured in the other files cannot be detected at runtime. Such a naive approach is taken,
+    for example, by the Maven Assembly Plugin.</para>
+  <para>The Maven Shade Plugin provides a convenient alternative for the creation of executable
+    fat-jars, as it provides a mechanism to concatenate the configuration files from different
+    dependencies while creating the fat-jar. To use the Maven Shade Plugin with uimaFIT, use the
+    following configuration section in your POM file and make sure to change the
+      <parameter>mainClass</parameter> as required for your project:</para>
+  <programlisting>&lt;build>
+  &lt;plugins>
+    &lt;plugin>
+      &lt;groupId>org.apache.maven.plugins&lt;/groupId>
+      &lt;artifactId>maven-shade-plugin&lt;/artifactId>
+      &lt;version>2.2&lt;/version>
+      &lt;executions>
+        &lt;execution>
+          &lt;phase>package&lt;/phase>
+          &lt;goals>&lt;goal>shade&lt;/goal>&lt;/goals>
+          &lt;configuration>
+            &lt;transformers>
+              &lt;!-- Set the main class of the executable JAR -->
+              &lt;transformer
+                implementation="org.apache.maven.plugins.shade.\
+                                resource.ManifestResourceTransformer">
+                &lt;mainClass>org.apache.uima.fit.example.Main&lt;/mainClass>
+              &lt;/transformer>
+              &lt;!-- Merge the uimaFIT configuration files -->
+              &lt;transformer
+                implementation="org.apache.maven.plugins.shade.\
+                                resource.AppendingTransformer">
+                &lt;resource>\
+                  META-INF/org.apache.uima.fit/fsindexes.txt\
+                &lt;/resource>
+              &lt;/transformer>
+              &lt;transformer
+                implementation="org.apache.maven.plugins.shade.\
+                                resource.AppendingTransformer">
+                &lt;resource>\
+                  META-INF/org.apache.uima.fit/types.txt\
+                &lt;/resource>
+              &lt;/transformer>
+              &lt;transformer
+                implementation="org.apache.maven.plugins.shade.\
+                                resource.AppendingTransformer">
+                &lt;resource>\
+                  META-INF/org.apache.uima.fit/typepriorities.txt\
+                &lt;/resource>
+              &lt;/transformer>
+            &lt;/transformers>
+          &lt;/configuration>
+        &lt;/execution>
+      &lt;/executions>
+    &lt;/plugin>
+  &lt;/plugins>
+&lt;/build></programlisting>
+  <note>
+    <para>Due to formatting constraints in the PDF version of this manual, the example above uses
+        <code>\</code> to indicate a line continuation. Remove these and join the lines when you
+      copy/paste this example.</para>
+  </note>
+  <note>
+    <para>You may also want to merge additional files, such as LICENSE, NOTICE, or
+      DEPENDENCY files, configuration files for the Java Service Locator API, or files used by
+      other frameworks that use similar conventions for configuration file locations. Check the
+      documentation of the Maven Shade Plugin, as different kinds of configuration files require
+      different specialized transformers.</para>
+  </note>
 </chapter>

Modified: uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.typesystem.xml
URL: http://svn.apache.org/viewvc/uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.typesystem.xml?rev=1588543&r1=1588542&r2=1588543&view=diff
==============================================================================
--- uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.typesystem.xml (original)
+++ uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.typesystem.xml Fri Apr 18 19:13:13 2014
@@ -35,6 +35,13 @@
         <filename>org/apache/uima/fit/type/Token.xml</filename>, then the file should have the
       following contents:</para>
     <programlisting>classpath*:org/apache/uima/fit/type/Token.xml</programlisting>
+    <note>
+      <para>Note that the file <filename>types.txt</filename> must be located in
+          <filename>META-INF/org.apache.uima.fit</filename> where
+          <filename>org.apache.uima.fit</filename> is the name of a sub-directory inside
+          <filename>META-INF</filename>. <emphasis>We are not using the Java package notation
+          here!</emphasis></para>
+    </note>
     <para>To specify multiple TSDs, add additional lines to the file. If you have a large number of
       TSDs, you may prefer to add a pattern. Assume that we have a large number of TSDs under
         <filename>org/apache/uima/fit/type</filename>, we can use the following pattern which
@@ -46,12 +53,13 @@
     <para>If it is not possible or inconvenient to add the `types.txt` file, patterns can also be
       specified using the system property
         <parameter>org.apache.uima.fit.type.import_pattern</parameter>. Multiple patterns may be
-      specified separated by semicolon<footnote>
-        <para>The <literal>\</literal> in the example is used as a line-continuation indicator. It
-          and all spaces following it should be ommitted.</para>
-      </footnote>:</para>
+      specified separated by semicolon:</para>
     <programlisting>-Dorg.apache.uima.fit.type.import_pattern=\
   classpath*:org/apache/uima/fit/type/**/*.xml</programlisting>
+    <note>
+      <para>The <literal>\</literal> in the example is used as a line-continuation indicator. It
+        and all spaces following it should be omitted.</para>
+    </note>
   </section>
   <section>
     <title>Using type auto-detection </title>