Posted to commits@uima.apache.org by re...@apache.org on 2014/04/18 21:13:13 UTC
svn commit: r1588543 - in
/uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook:
tools.uimafit.book.xml tools.uimafit.packaging.xml
tools.uimafit.typesystem.xml
Author: rec
Date: Fri Apr 18 19:13:13 2014
New Revision: 1588543
URL: http://svn.apache.org/r1588543
Log:
[UIMA-3385] Make meta data discovery compatible with fat jars
Added:
uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.packaging.xml
- copied, changed from r1573368, uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.pipelines.xml
Modified:
uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.book.xml
uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.typesystem.xml
Modified: uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.book.xml
URL: http://svn.apache.org/viewvc/uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.book.xml?rev=1588543&r1=1588542&r2=1588543&view=diff
==============================================================================
--- uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.book.xml (original)
+++ uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.book.xml Fri Apr 18 19:13:13 2014
@@ -41,6 +41,8 @@ under the License.
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.uimafit.typesystem.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.uimafit.packaging.xml"/>
+
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.uimafit.maven.xml"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.uimafit.migration.xml"/>
Copied: uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.packaging.xml (from r1573368, uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.pipelines.xml)
URL: http://svn.apache.org/viewvc/uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.packaging.xml?p2=uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.packaging.xml&p1=uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.pipelines.xml&r1=1573368&r2=1588543&rev=1588543&view=diff
==============================================================================
--- uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.pipelines.xml (original)
+++ uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.packaging.xml Fri Apr 18 19:13:13 2014
@@ -16,58 +16,82 @@
specific language governing permissions and limitations
under the License.
-->
-<chapter id="ugr.tools.uimafit.pipelines">
- <title>Pipelines</title>
- <para>UIMA is a component-based architecture that allows composing various processing components
- into a complex processing pipeline. A pipeline typically involves a <emphasis>collection
- reader</emphasis> which ingests documents and <emphasis>analysis engines</emphasis> that do
- the actual processing.</para>
- <para>Normally, you would run a pipeline using a UIMA Collection Processing Engine or using UIMA
- AS. uimaFIT offers a third alternative that is much simpler to use and well suited for embedding
- UIMA pipelines into applications or for writing tests.</para>
- <para>As uimaFIT does not supply any readers or processing components, we just assume that we have
- written three components:</para>
- <itemizedlist>
- <listitem>
- <para><classname>TextReader</classname> - reads text files from a directory</para>
- </listitem>
- <listitem>
- <para><classname>Tokenizer</classname> - annotates tokens</para>
- </listitem>
- <listitem>
- <para><classname>TokenFrequencyWriter</classname> - writes a list of tokens and their
- frequency to a file</para>
- </listitem>
- </itemizedlist>
- <para>We create descriptors for all components and run them as a pipeline:</para>
- <programlisting>CollectionReaderDescription reader =
- CollectionReaderFactory.createReaderDescription(
- TextReader.class,
- TextReader.PARAM_INPUT, "/home/uimafit/documents");
-
-AnalysisEngineDescription tokenizer =
- AnalysisEngineFactory.createEngineDescription(
- Tokenizer.class);
-
-AnalysisEngineDescription tokenFrequencyWriter =
- AnalysisEngineFactory.createEngineDescription(
- TokenFrequencyWriter.class,
- TokenFrequencyWriter.PARAM_OUTPUT, "counts.txt");
-
-SimplePipeline.runPipeline(reader, tokenizer, writer);</programlisting>
- <para>Instead of running the full pipeline end-to-end, we can also process one document at a time
- and inspect the analysis results:</para>
- <programlisting>CollectionReaderDescription reader =
- CollectionReaderFactory.createReaderDescription(
- TextReader.class,
- TextReader.PARAM_INPUT, "/home/uimafit/documents");
-
-AnalysisEngineDescription tokenizer =
- AnalysisEngineFactory.createEngineDescription(
- Tokenizer.class);
-
-for (JCas jcas : SimplePipeline.iteratePipeline(reader, tokenizer)) {
- System.out.printf("Found %d tokens%n",
- JCasUtil.select(jcas, Token.class).size());
-}</programlisting>
+<chapter id="ugr.tools.uimafit.packaging">
+ <title>Building an executable JAR</title>
+ <para>Building an executable JAR including uimaFIT components typically requires extra care. Per
+ convention, uimaFIT expects certain information in specific locations on the classpath, e.g. the
+ <filename>types.txt</filename> file that controls the <link
+ linkend="ugr.tools.uimafit.packaging">automatic type system detection</link> mechanism must
+ reside at <filename>META-INF/org.apache.uima.fit/types.txt</filename>. It often occurs that a
+ project has several dependencies, each supplying its own configuration files at these standard
+ locations. However, this causes a problem with naive approaches to creating an executable
+ <emphasis>fat-jar</emphasis> merging all dependencies into a single JAR file. Without extra
+ care, the files supplied by the different dependencies overwrite each other during the packaging
+ process and only one file <emphasis>wins</emphasis> in the end. As a consequence, the types
+ configured in the other files cannot be detected at runtime. Such a naive approach is taken,
+ for example, by the Maven Assembly Plugin.</para>
+ <para>The Maven Shade Plugin provides a convenient alternative for the creation of executable
+ fat-jars, as it provides a mechanism to concatenate the configuration files from different
+ dependencies while creating the fat-jar. To use the Maven Shade Plugin with uimaFIT, use the
+ following configuration section in your POM file and make sure to change the
+ <parameter>mainClass</parameter> as required for your project:</para>
+ <programlisting><build>
+ <plugins>
+ <plugin>
+ <groupId>org.apache.maven.plugins</groupId>
+ <artifactId>maven-shade-plugin</artifactId>
+ <version>2.2</version>
+ <executions>
+ <execution>
+ <phase>package</phase>
+ <goals><goal>shade</goal></goals>
+ <configuration>
+ <transformers>
+ <!-- Set the main class of the executable JAR -->
+ <transformer
+ implementation="org.apache.maven.plugins.shade.\
+ resource.ManifestResourceTransformer">
+ <mainClass>org.apache.uima.fit.example.Main</mainClass>
+ </transformer>
+ <!-- Merge the uimaFIT configuration files -->
+ <transformer
+ implementation="org.apache.maven.plugins.shade.\
+ resource.AppendingTransformer">
+ <resource>\
+ META-INF/org.apache.uima.fit/fsindexes.txt\
+ </resource>
+ </transformer>
+ <transformer
+ implementation="org.apache.maven.plugins.shade.\
+ resource.AppendingTransformer">
+ <resource>\
+ META-INF/org.apache.uima.fit/types.txt\
+ </resource>
+ </transformer>
+ <transformer
+ implementation="org.apache.maven.plugins.shade.\
+ resource.AppendingTransformer">
+ <resource>\
+ META-INF/org.apache.uima.fit/typepriorities.txt\
+ </resource>
+ </transformer>
+ </transformers>
+ </configuration>
+ </execution>
+ </executions>
+ </plugin>
+ </plugins>
+</build></programlisting>
+ <note>
+ <para>Due to formatting constraints in the PDF version of this manual, the example above uses
+ <code>\</code> to indicate a line continuation. Remove these and join the lines when you
+ copy/paste this example.</para>
+ </note>
+ <note>
+ <para>You might want to consider also merging additional files, such as LICENSE, NOTICE, or
+ DEPENDENCY files, configuration files for the Java Service Locator API, or files used by
+ other frameworks that use similar conventions for configuration file locations. Check the
+ documentation of the Maven Shade Plugin, as different kinds of configuration files require
+ different specialized transformers.</para>
+ </note>
</chapter>
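Note on the packaging chapter above: the difference between overwriting and appending same-named files can be illustrated with a minimal Java sketch. It is not uimaFIT or Maven code. The class name FatJarMergeSketch and the org/example import patterns are hypothetical, each JAR is modelled as a plain map from entry path to file content, and the sketch assumes that the last JAR processed wins when entries are overwritten (which file wins in a real build depends on the packaging tool).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: each "JAR" is a map from entry path to text content. */
public class FatJarMergeSketch {
    public static final String TYPES = "META-INF/org.apache.uima.fit/types.txt";

    /** Naive packaging: a later entry with the same path replaces the earlier one. */
    public static Map<String, String> overwrite(List<Map<String, String>> jars) {
        Map<String, String> fatJar = new LinkedHashMap<>();
        for (Map<String, String> jar : jars) {
            fatJar.putAll(jar);
        }
        return fatJar;
    }

    /** Appending: entries with the same path are concatenated, one after the other. */
    public static Map<String, String> append(List<Map<String, String>> jars) {
        Map<String, String> fatJar = new LinkedHashMap<>();
        for (Map<String, String> jar : jars) {
            for (Map.Entry<String, String> entry : jar.entrySet()) {
                fatJar.merge(entry.getKey(), entry.getValue(),
                        (existing, added) -> existing + "\n" + added);
            }
        }
        return fatJar;
    }

    public static void main(String[] args) {
        // Two dependencies, each shipping its own types.txt at the same location.
        List<Map<String, String>> jars = new ArrayList<>();
        jars.add(Collections.singletonMap(TYPES, "classpath*:org/example/a/type/*.xml"));
        jars.add(Collections.singletonMap(TYPES, "classpath*:org/example/b/type/*.xml"));

        System.out.println("overwrite:\n" + overwrite(jars).get(TYPES));
        System.out.println("append:\n" + append(jars).get(TYPES));
    }
}
```

With overwriting, only one import pattern survives in the merged file, so the types of the other dependency would not be found at runtime. With appending, both patterns end up in a single types.txt, which is the outcome the AppendingTransformer configuration is meant to produce.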
Modified: uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.typesystem.xml
URL: http://svn.apache.org/viewvc/uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.typesystem.xml?rev=1588543&r1=1588542&r2=1588543&view=diff
==============================================================================
--- uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.typesystem.xml (original)
+++ uima/uimafit/branches/2.0.x/uimafit-docbook/src/docbook/tools.uimafit.typesystem.xml Fri Apr 18 19:13:13 2014
@@ -35,6 +35,13 @@
<filename>org/apache/uima/fit/type/Token.xml</filename>, then the file should have the
following contents:</para>
<programlisting>classpath*:org/apache/uima/fit/type/Token.xml</programlisting>
+ <note>
+ <para>Mind that the file <filename>types.txt</filename> must be located in
+ <filename>META-INF/org.apache.uima.fit</filename> where
+ <filename>org.apache.uima.fit</filename> is the name of a sub-directory inside
+ <filename>META-INF</filename>. <emphasis>We are not using the Java package notation
+ here!</emphasis></para>
+ </note>
<para>To specify multiple TSDs, add additional lines to the file. If you have a large number of
TSDs, you may prefer to add a pattern. Assume that we have a large number of TSDs under
<filename>org/apache/uima/fit/type</filename>, we can use the following pattern which
@@ -46,12 +53,13 @@
<para>If it is not possible or inconvenient to add the `types.txt` file, patterns can also be
specified using the system property
<parameter>org.apache.uima.fit.type.import_pattern</parameter>. Multiple patterns may be
- specified separated by semicolon<footnote>
- <para>The <literal>\</literal> in the example is used as a line-continuation indicator. It
- and all spaces following it should be ommitted.</para>
- </footnote>:</para>
+ specified separated by semicolon:</para>
<programlisting>-Dorg.apache.uima.fit.type.import_pattern=\
classpath*:org/apache/uima/fit/type/**/*.xml</programlisting>
+ <note>
+ <para>The <literal>\</literal> in the example is used as a line-continuation indicator. It
+ and all spaces following it should be omitted.</para>
+ </note>
</section>
<section>
<title>Using type auto-detection </title>