Posted to commits@uima.apache.org by al...@apache.org on 2007/01/31 21:48:01 UTC
svn commit: r501983 [2/3] - in
/incubator/uima/uimaj/trunk/uima-docbooks/src: docbook/references/
docbook/tutorials_and_users_guides/ olink/references/
olink/tutorials_and_users_guides/
Modified: incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.application.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.application.xml?view=diff&rev=501983&r1=501982&r2=501983
==============================================================================
--- incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.application.xml (original)
+++ incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.application.xml Wed Jan 31 12:48:00 2007
@@ -26,86 +26,95 @@
<chapter id="ugr.tug.application">
<title>Application Developer's Guide</title>
- <para>This chapter describes how to develop an application using the Unstructured
- Information Management Architecture (UIMA). The term
- <emphasis>application</emphasis> describes a program that provides end-user
- functionality. A UIMA application incorporates one or more UIMA components such as
- Analysis Engines, Collection Processing Engines, a Search Engine, and/or a Document
- Store and adds application-specific logic and user interfaces.</para>
+ <para>This chapter describes how to develop an application using the Unstructured Information Management
+ Architecture (UIMA). The term <emphasis>application</emphasis> describes a program that provides end-user
+ functionality. A UIMA application incorporates one or more UIMA components such as Analysis Engines,
+ Collection Processing Engines, a Search Engine, and/or a Document Store and adds application-specific logic
+ and user interfaces.</para>
<section id="ugr.tug.appication.uimaframework_class">
<title>The UIMAFramework Class</title>
- <para>An application developer's starting point for accessing UIMA framework
- functionality is the <literal>org.apache.uima.UIMAFramework</literal> class.
- The following is a short introduction to some important methods on this class. Several
- of these methods are used in examples in the rest of this chapter. For more details, see
- the JavaDocs (in the docs/api directory of the UIMA SDK).
-
- <itemizedlist><listitem><para>UIMAFramework.getXMLParser(): Returns an
- instance of the UIMA XML Parser class, which then can be used to parse the various types
- of UIMA component descriptors. Examples of this can be found in the remainder of this
- chapter.</para></listitem>
-
- <listitem><para>UIMAFramework.produceXXX(ResourceSpecifier): There are
- various produce methods that are used to create different types of UIMA components
- from their descriptors. The argument type, ResourceSpecifier, is the base
- interface that subsumes all types of component descriptors in UIMA. You can get a
- ResourceSpecifier from the XMLParser. Examples of produce methods are:
-
- <itemizedlist>
- <listitem><para>produceAnalysisEngine</para></listitem>
- <listitem><para>produceCasConsumer</para></listitem>
- <listitem><para>produceCasInitializer</para></listitem>
- <listitem><para>produceCollectionProcessingEngine</para></listitem>
- <listitem><para>produceCollectionReader</para></listitem>
- </itemizedlist>
-
- There are other variations of each of these methods that take additional, optional
- arguments. See the JavaDocs for details. </para></listitem>
-
- <listitem><para>UIMAFramework.getLogger(&lt;optional-logger-name&gt;):
- Gets a reference to the UIMA Logger, to which you can write log messages. If no logger
- name is passed, the name of the returned logger instance is
- <quote>org.apache.uima</quote>.</para></listitem>
-
- <listitem><para>UIMAFramework.getVersionString(): Gets the number of the UIMA
- version you are using.</para></listitem>
-
- <listitem><para>UIMAFramework.newDefaultResourceManager(): Gets an instance
- of the UIMA ResourceManager. The key method on ResourceManager is setDataPath,
- which allows you to specify the location where UIMA components will go to look for
- their external resource files. Once you've obtained and initialized a
- ResourceManager, you can pass it to any of the produceXXX methods. </para>
- </listitem></itemizedlist></para>
+ <para>An application developer's starting point for accessing UIMA framework functionality is the
+ <literal>org.apache.uima.UIMAFramework</literal> class. The following is a short introduction to some
+ important methods on this class. Several of these methods are used in examples in the rest of this chapter. For
+ more details, see the JavaDocs (in the docs/api directory of the UIMA SDK).
+
+ <itemizedlist>
+ <listitem>
+ <para>UIMAFramework.getXMLParser(): Returns an instance of the UIMA XML Parser class, which then can be
+ used to parse the various types of UIMA component descriptors. Examples of this can be found in the
+ remainder of this chapter.</para>
+ </listitem>
+
+ <listitem>
+ <para>UIMAFramework.produceXXX(ResourceSpecifier): There are various produce methods that are used
+ to create different types of UIMA components from their descriptors. The argument type,
+ ResourceSpecifier, is the base interface that subsumes all types of component descriptors in UIMA. You
+ can get a ResourceSpecifier from the XMLParser. Examples of produce methods are:
+
+ <itemizedlist>
+ <listitem>
+ <para>produceAnalysisEngine</para>
+ </listitem>
+ <listitem>
+ <para>produceCasConsumer</para>
+ </listitem>
+ <listitem>
+ <para>produceCasInitializer</para>
+ </listitem>
+ <listitem>
+ <para>produceCollectionProcessingEngine</para>
+ </listitem>
+ <listitem>
+ <para>produceCollectionReader</para>
+ </listitem>
+ </itemizedlist>
+ There are other variations of each of these methods that take additional, optional arguments. See the
+ JavaDocs for details. </para>
+ </listitem>
+
+ <listitem>
+ <para>UIMAFramework.getLogger(&lt;optional-logger-name&gt;): Gets a reference to the UIMA Logger,
+ to which you can write log messages. If no logger name is passed, the name of the returned logger instance
+ is <quote>org.apache.uima</quote>.</para>
+ </listitem>
+
+ <listitem>
+ <para>UIMAFramework.getVersionString(): Gets the number of the UIMA version you are using.</para>
+ </listitem>
+
+ <listitem>
+ <para>UIMAFramework.newDefaultResourceManager(): Gets an instance of the UIMA ResourceManager. The
+ key method on ResourceManager is setDataPath, which allows you to specify the location where UIMA
+ components will go to look for their external resource files. Once you've obtained and initialized a
+ ResourceManager, you can pass it to any of the produceXXX methods. </para>
+ </listitem>
+ </itemizedlist></para>
</section>
<section id="ugr.tug.application.using_aes">
<title>Using Analysis Engines</title>
- <para>This section describes how to add analysis capability to your application by using
- Analysis Engines developed using the UIMA SDK. An <emphasis>Analysis Engine
- (AE)</emphasis> is a component that analyzes artifacts (e.g. documents) and infers
- information about them.</para>
-
- <para>An Analysis Engine consists of two parts - Java classes (typically packaged as one
- or more JAR files) and <emphasis>AE descriptors</emphasis> (one or more XML files).
- You must put the Java classes in your application's class path, but thereafter you
- will not need to directly interact with them. The UIMA framework insulates you from this
- by providing a standard AnalysisEngine interfaces.</para>
-
- <para>The term <emphasis>Text Analysis Engine (TAE)</emphasis> is sometimes used to
- describe an Analysis Engine that analyzes a text document. In the UIMA SDK v1.x, there
- was a TextAnalysisEngine interface that was commonly used. However, as of the UIMA SDK
- v2.0, this interface has been deprecated and all applications should switch to using
- the standard AnalysisEngine interface.</para>
-
- <para>The AE descriptor XML files contain the configuration settings for the Analysis
- Engine as well as a description of the AE's input and output requirements. You may
- need to edit these files in order to configure the AE appropriately for your application
- - the supplier of the AE may have provided documentation (or comments in the XML
- descriptor itself) about how to do this.</para>
+ <para>This section describes how to add analysis capability to your application by using Analysis Engines
+ developed using the UIMA SDK. An <emphasis>Analysis Engine (AE)</emphasis> is a component that analyzes
+ artifacts (e.g. documents) and infers information about them.</para>
+
+ <para>An Analysis Engine consists of two parts - Java classes (typically packaged as one or more JAR files) and
+ <emphasis>AE descriptors</emphasis> (one or more XML files). You must put the Java classes in your
+ application's class path, but thereafter you will not need to directly interact with them. The UIMA
+ framework insulates you from this by providing a standard AnalysisEngine interface.</para>
+
+ <para>The term <emphasis>Text Analysis Engine (TAE)</emphasis> is sometimes used to describe an Analysis
+ Engine that analyzes a text document. In the UIMA SDK v1.x, there was a TextAnalysisEngine interface that was
+ commonly used. However, as of the UIMA SDK v2.0, this interface has been deprecated and all applications should
+ switch to using the standard AnalysisEngine interface.</para>
+
+ <para>The AE descriptor XML files contain the configuration settings for the Analysis Engine as well as a
+ description of the AE's input and output requirements. You may need to edit these files in order to
+ configure the AE appropriately for your application - the supplier of the AE may have provided documentation
+ (or comments in the XML descriptor itself) about how to do this.</para>
<section id="ugr.tug.application.instantiating_an_ae">
<title>Instantiating an Analysis Engine</title>
@@ -122,18 +131,14 @@
AnalysisEngine ae =
UIMAFramework.produceAnalysisEngine(specifier);</programlisting></para>
- <para>The first two lines parse the XML descriptor (for AEs with multiple descriptor
- files, one of them is the <quote>main</quote> descriptor - the AE documentation
- should indicate which it is). The result of the parse is a
- <literal>ResourceSpecifier</literal> object. The third line of code invokes a
- static factory method
- <literal>UIMAFramework.produceAnalysisEngine</literal>, which takes the
- specifier and instantiates an <literal>AnalysisEngine</literal>
- object.</para>
-
- <para>There is one caveat to using this approach - the Analysis Engine instance that you
- create will not support multiple threads running through it concurrently. If you
- need to support this, see <xref
+ <para>The first two lines parse the XML descriptor (for AEs with multiple descriptor files, one of them is the
+ <quote>main</quote> descriptor - the AE documentation should indicate which it is). The result of the parse
+ is a <literal>ResourceSpecifier</literal> object. The third line of code invokes a static factory method
+ <literal>UIMAFramework.produceAnalysisEngine</literal>, which takes the specifier and instantiates
+ an <literal>AnalysisEngine</literal> object.</para>
+
+ <para>There is one caveat to using this approach - the Analysis Engine instance that you create will not support
+ multiple threads running through it concurrently. If you need to support this, see <xref
linkend="ugr.tug.applications.multi_threaded"/>.</para>
</section>
@@ -141,14 +146,13 @@
<section id="ugr.tug.application.analyzing_text_documents">
<title>Analyzing Text Documents</title>
- <para>There are two ways to use the AE interface to analyze documents. You can either use
- the <emphasis>JCas</emphasis> interface, which is described in detail by <olink
- targetdoc="&uima_docs_ref;" targetptr="ugr.ref.jcas"/> or you can directly
- use the <emphasis>CAS</emphasis> interface, which is described in detail in <olink
- targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas"/>. Besides text
- documents, other kinds of artifacts can also be analyzed; see <olink
- targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/> for
- more information.</para>
+ <para>There are two ways to use the AE interface to analyze documents. You can either use the
+ <emphasis>JCas</emphasis> interface, which is described in detail by <olink
+ targetdoc="&uima_docs_ref;" targetptr="ugr.ref.jcas"/> or you can directly use the
+ <emphasis>CAS</emphasis> interface, which is described in detail in <olink
+ targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas"/>. Besides text documents, other kinds of
+ artifacts can also be analyzed; see <olink targetdoc="&uima_docs_tutorial_guides;"
+ targetptr="ugr.tug.aas"/> for more information.</para>
<para>The basic structure of your application will look similar in both cases:</para>
@@ -194,59 +198,67 @@
//done
ae.destroy();</programlisting></para>
- <para>First, you create the CAS or JCas that you will use. Then, you repeat the following
- four steps for each document:</para>
+ <para>First, you create the CAS or JCas that you will use. Then, you repeat the following four steps for each
+ document:</para>
- <orderedlist spacing="compact"><listitem><para>Put the document text into the CAS or JCas.</para>
+ <orderedlist spacing="compact">
+ <listitem>
+ <para>Put the document text into the CAS or JCas.</para>
</listitem>
- <listitem><para>Call the AE's process method, passing the CAS or JCas as an
- argument</para></listitem>
+ <listitem>
+ <para>Call the AE's process method, passing the CAS or JCas as an argument.</para>
+ </listitem>
- <listitem><para>Do something with the results that the AE has added to the CAS or
- JCas</para></listitem>
+ <listitem>
+ <para>Do something with the results that the AE has added to the CAS or JCas.</para>
+ </listitem>
- <listitem><para>Call the CAS's or JCas's reset() method to prepare for another
- analysis </para></listitem></orderedlist>
+ <listitem>
+ <para>Call the CAS's or JCas's reset() method to prepare for another analysis </para>
+ </listitem>
+ </orderedlist>
</section>
<section id="ugr.tug.applications.analyzing_non_text_artifacts">
<title>Analyzing Non-Text Artifacts</title>
- <para>Analyzing non-text artifacts is similar to analyzing text documents. The main
- difference is that instead of using the <literal>setDocumentText</literal>
- method, you need to use the Sofa APIs to set the artifact into the CAS. See <olink
- targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/> for
- details.</para>
+ <para>Analyzing non-text artifacts is similar to analyzing text documents. The main difference is that
+ instead of using the <literal>setDocumentText</literal> method, you need to use the Sofa APIs to set the
+ artifact into the CAS. See <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/>
+ for details.</para>
</section>
<section id="ugr.tug.applications.accessing_analysis_results">
<title>Accessing Analysis Results</title>
- <para>Annotators (and applications) access the results of analysis via the CAS, using
- the CAS or JCas interfaces. These results are accessed using the CAS Indexes. There is
- one built-in index for instances of the built-in type
- <literal>uima.tcas.Annotation</literal> that can be used to retrieve instances
- of <literal>Annotation</literal> or any subtype of Annotation. You can also define
- additional indexes over other types. </para>
- <para>Indexes provide a method to obtain an iterators over their contents; the
- iterator returns the matching elements one at time from the CAS.</para>
+ <para>Annotators (and applications) access the results of analysis via the CAS, using the CAS or JCas
+ interfaces. These results are accessed using the CAS Indexes. There is one built-in index for instances of
+ the built-in type <literal>uima.tcas.Annotation</literal> that can be used to retrieve instances of
+ <literal>Annotation</literal> or any subtype of Annotation. You can also define additional indexes over
+ other types. </para>
+ <para>Indexes provide a method to obtain an iterator over their contents; the iterator returns the matching
+ elements one at a time from the CAS.</para>
<section id="ugr.tug.applications.accessing_results_using_jcas">
<title>Accessing Analysis Results using the JCas</title>
<para>See:</para>
- <itemizedlist><listitem><para> <olink
- targetdoc="&uima_docs_tutorial_guides;"
- targetptr="ugr.tug.aae.reading_results_previous_annotators"/> </para>
+ <itemizedlist>
+ <listitem>
+ <para> <olink targetdoc="&uima_docs_tutorial_guides;"
+ targetptr="ugr.tug.aae.reading_results_previous_annotators"/> </para>
</listitem>
- <listitem><para> <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.jcas"/></para></listitem>
+ <listitem>
+ <para> <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.jcas"/></para>
+ </listitem>
- <listitem><para>The JavaDocs for
- <literal>org.apache.uima.jcas.JCas</literal>. </para></listitem>
- </itemizedlist>
+ <listitem>
+ <para>The JavaDocs for <literal>org.apache.uima.jcas.JCas</literal>. </para>
+ </listitem>
+ </itemizedlist>
</section>
@@ -255,26 +267,29 @@
<para>See:</para>
- <itemizedlist><listitem><para> <olink targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.cas"/></para></listitem>
+ <itemizedlist>
+ <listitem>
+ <para> <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas"/></para>
+ </listitem>
+
+ <listitem>
+ <para> The source code for <literal>org.apache.uima.examples.PrintAnnotations</literal>, which
+ is in <literal>examples\src.</literal></para>
+ </listitem>
- <listitem><para> The source code for
- <literal>org.apache.uima.examples.PrintAnnotations</literal>, which is
- in <literal>examples\src.</literal></para></listitem>
-
- <listitem><para>The JavaDocs for the
- <literal>org.apache.uima.cas</literal> and
- <literal>org.apache.uima.cas.text</literal> packages. </para>
- </listitem></itemizedlist>
+ <listitem>
+ <para>The JavaDocs for the <literal>org.apache.uima.cas</literal> and
+ <literal>org.apache.uima.cas.text</literal> packages. </para>
+ </listitem>
+ </itemizedlist>
</section>
</section>
<section id="ugr.tug.applications.multi_threaded">
<title>Multi-threaded Applications</title>
- <para>The simplest way to use an AE in a multi-threaded environment is to use the Java
- synchronized keyword to ensure that only one thread is using an AE at any given time.
- For example:
+ <para>The simplest way to use an AE in a multi-threaded environment is to use the Java synchronized keyword to
+ ensure that only one thread is using an AE at any given time. For example:
<programlisting>public class MyApplication {
@@ -305,30 +320,26 @@
...
}</programlisting></para>
- <para>Without the synchronized keyword, this application would not be thread-safe.
- If multiple threads called the analyzeDocument method simultaneously, they would
- both use the same CAS and clobber each others' results. The synchronized keyword
- ensures that no more than one thread is executing this method at any given time. For
- more information on thread synchronization in Java, see <ulink
+ <para>Without the synchronized keyword, this application would not be thread-safe. If multiple threads
+ called the analyzeDocument method simultaneously, they would all use the same CAS and clobber each other's
+ results. The synchronized keyword ensures that no more than one thread is executing this method at any given
+ time. For more information on thread synchronization in Java, see <ulink
url="http://java.sun.com/docs/books/tutorial/essential/threads/multithreaded.html"/>
.</para>
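The effect of the synchronized keyword can be tried out without any UIMA classes. The following is a minimal sketch under stated assumptions: `SingleBufferAnalyzer` and `SingleBufferDemo` are hypothetical names, and a plain `StringBuilder` stands in for the single shared CAS. It is not the chapter's `MyApplication` example, only an illustration of the same locking pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in: one reusable, non-thread-safe work area guarded by synchronized. */
class SingleBufferAnalyzer {
    // Plays the role of the single CAS that every request reuses.
    private final StringBuilder buffer = new StringBuilder();

    // synchronized: at most one thread at a time runs this method on a given instance.
    synchronized String analyze(String text) {
        buffer.setLength(0);                 // comparable to resetting the CAS
        buffer.append(text);                 // comparable to setting the document text
        return buffer.reverse().toString();  // placeholder for the real analysis
    }
}

class SingleBufferDemo {
    /** Runs several threads against one analyzer and counts results that do not match. */
    static int countMismatches(int threadCount, int rounds) {
        SingleBufferAnalyzer analyzer = new SingleBufferAnalyzer();
        AtomicInteger mismatches = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            String input = "doc" + t;
            String expected = new StringBuilder(input).reverse().toString();
            Thread thread = new Thread(() -> {
                for (int i = 0; i < rounds; i++) {
                    if (!expected.equals(analyzer.analyze(input))) {
                        mismatches.incrementAndGet();
                    }
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        System.out.println(countMismatches(4, 10_000)); // prints 0
    }
}
```

Removing `synchronized` from `analyze` lets threads interleave on the shared buffer, so mismatches (or exceptions inside the worker threads) become possible; that is the clobbering described above.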
- <para>The synchronized keyword ensures thread-safety, but does not allow you to
- process more than one document at a time. If you need to process multiple documents
- simultaneously (for example, to make use of a multiprocessor machine), you'll
- need to use more than one CAS instance.</para>
-
- <para>Because CAS instances use memory and can take some time to construct, you don't
- want to create a new CAS instance for each request. Instead, you should use a feature of
- the UIMA SDK called the <emphasis>CAS Pool</emphasis>, implemented by the type
- <literal>CasPool</literal>.</para>
-
- <para>A CAS Pool contains some number of CAS instances (you specify how many when you
- create the pool). When a thread wants to use a CAS, it <emphasis>checks
- out</emphasis> an instance from the pool. When the thread is done using the CAS, it
- must <emphasis>release</emphasis> the CAS instance back into the pool. If all
- instances are checked out, additional threads will block and wait for an instance to
- become available. Here is some example code:
+ <para>The synchronized keyword ensures thread-safety, but does not allow you to process more than one
+ document at a time. If you need to process multiple documents simultaneously (for example, to make use of a
+ multiprocessor machine), you'll need to use more than one CAS instance.</para>
+
+ <para>Because CAS instances use memory and can take some time to construct, you don't want to create a new CAS
+ instance for each request. Instead, you should use a feature of the UIMA SDK called the <emphasis>CAS
+ Pool</emphasis>, implemented by the type <literal>CasPool</literal>.</para>
+
+ <para>A CAS Pool contains some number of CAS instances (you specify how many when you create the pool). When a
+ thread wants to use a CAS, it <emphasis>checks out</emphasis> an instance from the pool. When the thread is
+ done using the CAS, it must <emphasis>release</emphasis> the CAS instance back into the pool. If all
+ instances are checked out, additional threads will block and wait for an instance to become available. Here
+ is some example code:
<programlisting>public class MyApplication {
@@ -367,37 +378,33 @@
...
}</programlisting></para>
- <para>There is not much more code required here than in the previous example. First,
- there is one additional parameter to the AnalysisEngine producer, specifying the
- number of annotator instances to create<footnote><para> Both the UIMA Collection
- Processing Manager framework and the remote deployment services framework have
- implementations which use CAS pools in this manner, and thereby relieve the
- annotator developer of the necessity to make their annotators thread-safe.</para>
- </footnote>. Then, instead of creating a single CAS in the constructor, we now create
- a CasPool containing 3 instances. In the analyze method, we check out a CAS, use it, and
- then release it.</para>
- <note><para>Frequently, the two numbers (number of CASes, and the number of AEs) will
- be the same. It would not make sense to have the number of CASes less than the number of AEs
- – the extra AE instances would always block waiting for a CAS from the pool. It
- could make sense to have additional CASes, though – if you had other
- multi-threaded processes that were using the CASes, other than the AEs. </para>
- </note>
-
- <para>The getCAS() method returns a CAS which is not specialized to any particular
- subject of analysis. To process things other than this, please refer to <olink
- targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/>
- .</para>
-
- <para>Note the use of the try...finally block. This is very important, as it ensures
- that the CAS we have checked out will be released back into the pool, even if the
- analysis code throws an exception. You should always use try...finally when using
- the CAS pool; if you do not, you risk exhausting the pool and causing deadlock.</para>
-
- <para>The parameter 0 passed to the CasPool.getCas() method is a timeout value. If this
- is set to a positive integer, it is the maximum number of milliseconds that the thread
- will wait for an instance to become available in the pool. If this time elapses, the
- getCas method will return null, and the application can do something intelligent,
- like ask the user to try again later. A value of 0 will cause the thread to wait for an
+ <para>There is not much more code required here than in the previous example. First, there is one additional
+ parameter to the AnalysisEngine producer, specifying the number of annotator instances to
+ create<footnote>
+ <para> Both the UIMA Collection Processing Manager framework and the remote deployment services framework
+ have implementations which use CAS pools in this manner, and thereby relieve the annotator developer of
+ the necessity to make their annotators thread-safe.</para> </footnote>. Then, instead of creating a
+ single CAS in the constructor, we now create a CasPool containing 3 instances. In the analyze method, we check
+ out a CAS, use it, and then release it.</para> <note>
+ <para>Frequently, the two numbers (number of CASes, and the number of AEs) will be the same. It would not make
+ sense to have the number of CASes less than the number of AEs
+ – the extra AE instances would always block waiting for a CAS from the pool. It could make sense to have
+ additional CASes, though – if you had other multi-threaded processes that were using the CASes, other
+ than the AEs. </para> </note>
+
+ <para>The getCas() method returns a CAS which is not specialized to any particular subject of analysis. To
+ process things other than this, please refer to <olink targetdoc="&uima_docs_tutorial_guides;"
+ targetptr="ugr.tug.aas"/> .</para>
+
+ <para>Note the use of the try...finally block. This is very important, as it ensures that the CAS we have checked
+ out will be released back into the pool, even if the analysis code throws an exception. You should always use
+ try...finally when using the CAS pool; if you do not, you risk exhausting the pool and causing
+ deadlock.</para>
+
+ <para>The parameter 0 passed to the CasPool.getCas() method is a timeout value. If this is set to a positive
+ integer, it is the maximum number of milliseconds that the thread will wait for an instance to become
+ available in the pool. If this time elapses, the getCas method will return null, and the application can do
+ something intelligent, like ask the user to try again later. A value of 0 will cause the thread to wait for an
available CAS, potentially forever.</para>
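The check-out, timeout, and release contract described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the UIMA `CasPool` implementation: `TimedPool`, `TimedPoolDemo`, `checkOut`, and `release` are hypothetical names, and a `StringBuilder` stands in for a CAS. It shows a timeout of 0 waiting indefinitely, a positive timeout yielding null when no instance frees up, and try...finally returning the instance to the pool.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical stand-in for a CAS pool; illustrates the contract only. */
class TimedPool<T> {
    private final BlockingQueue<T> idle;

    TimedPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // create all instances up front
        }
    }

    /**
     * A timeout of 0 waits until an instance is free; any other value waits at most
     * that many milliseconds and then returns null. This sketch also returns null
     * if the waiting thread is interrupted.
     */
    T checkOut(long timeoutMillis) {
        try {
            if (timeoutMillis == 0) {
                return idle.take();
            }
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        }
    }

    void release(T item) {
        idle.add(item);
    }
}

class TimedPoolDemo {
    /** Returns the result, or null if no instance became available in time. */
    static String analyze(TimedPool<StringBuilder> pool, String text, long timeoutMillis) {
        StringBuilder work = pool.checkOut(timeoutMillis);
        if (work == null) {
            return null; // pool exhausted: the caller can ask the user to retry later
        }
        try {
            work.setLength(0);
            work.append(text);
            return work.reverse().toString(); // placeholder for the real analysis
        } finally {
            pool.release(work); // runs even if the analysis throws
        }
    }

    public static void main(String[] args) {
        TimedPool<StringBuilder> pool = new TimedPool<>(1, StringBuilder::new);
        System.out.println(analyze(pool, "abc", 100)); // prints cba
        StringBuilder held = pool.checkOut(0);         // take the only instance
        System.out.println(analyze(pool, "abc", 100)); // prints null after the timeout
        pool.release(held);
    }
}
```

Without the finally block, any exception thrown during analysis would leave the instance checked out, and a pool of size 1 would then block every later caller that passes a timeout of 0.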
</section>
@@ -405,23 +412,19 @@
<title>Using Multiple Analysis Engines and Creating Shared CASes</title>
<titleabbrev>Multiple AEs &amp; Creating Shared CASes</titleabbrev>
- <para>In most cases, the easiest way to use multiple Analysis Engines from within an
- application is to combine them into an aggregate AE. For instructions, see <olink
- targetdoc="&uima_docs_tutorial_guides;"
- targetptr="ugr.tug.aae.building_aggregates"/>. Be sure that you
- understand this method before deciding to use the more advanced feature described in
- this section.</para>
-
- <para>If you decide that your application does need to instantiate multiple AEs and
- have those AEs share a single CAS, then you will no longer be able to use the various
- methods on the <literal>AnalysisEngine</literal> class that create CASes (or
- JCases) to create your CAS. This is because these methods create a CAS with a data model
- specific to a single AE and which therefore cannot be shared by other AEs. Instead, you
- create a CAS as follows:</para>
-
- <para>Suppose you have two analysis engines, and one CAS Consumer, and you want to
- create one type system from the merge of all of their type specifications. Then you can
- do the following:</para>
+ <para>In most cases, the easiest way to use multiple Analysis Engines from within an application is to combine
+ them into an aggregate AE. For instructions, see <olink targetdoc="&uima_docs_tutorial_guides;"
+ targetptr="ugr.tug.aae.building_aggregates"/>. Be sure that you understand this method before
+ deciding to use the more advanced feature described in this section.</para>
+
+ <para>If you decide that your application does need to instantiate multiple AEs and have those AEs share a
+ single CAS, then you will no longer be able to use the various methods on the
+ <literal>AnalysisEngine</literal> class that create CASes (or JCases) to create your CAS. This is because
+ these methods create a CAS with a data model specific to a single AE and which therefore cannot be shared by
+ other AEs. Instead, you create a CAS as follows:</para>
+
+ <para>Suppose you have two analysis engines, and one CAS Consumer, and you want to create one type system from
+ the merge of all of their type specifications. Then you can do the following:</para>
<programlisting>AnalysisEngineDescription aeDesc1 =
@@ -444,76 +447,66 @@
// (optional, if using the JCas interface)
JCas jcas = cas.getJCas();</programlisting>
- <para>The CasCreationUtils class takes care of the work of merging the AEs' type
- systems and producing a CAS for the combined type system. If the type systems are not
- compatible, an exception will be thrown.</para>
+ <para>The CasCreationUtils class takes care of the work of merging the AEs' type systems and producing a
+ CAS for the combined type system. If the type systems are not compatible, an exception will be thrown.</para>
</section>
<section id="ugr.tug.application.saving_cases_to_file_systems">
<title>Saving CASes to file systems</title>
- <para>The UIMA framework provides APIs to save and restore the contents of a CAS to
- streams. The CASes are stored in an XML format. There are two forms of this format. The
- preferred form is the XMI form (see <olink
+ <para>The UIMA framework provides APIs to save and restore the contents of a CAS to streams. The CASes are stored
+ in an XML format. There are two forms of this format. The preferred form is the XMI form (see <olink
targetdoc="&uima_docs_tutorial_guides;"
- targetptr="ugr.tug.xmi_emf.using_xmi_cas_serialization"/>). An older
- format is also available, called XCAS.</para>
+ targetptr="ugr.tug.xmi_emf.using_xmi_cas_serialization"/>). An older format is also available,
+ called XCAS.</para>
- <para>To save an XMI representation of a CAS, use the <literal>serialize</literal>
- method of the class <literal>org.apache.uima.util.XmlCasSerializer</literal>.
- To save an XCAS representation of a CAS, use the class
- <literal>org.apache.uima.cas.impl.XCASSerializer</literal> instead; see
- the JavaDocs for details.</para>
-
- <para>Both of these external forms can be read back in, using the
- <literal>deserialize</literal> method of the class
- <literal>org.apache.uima.util.XmlCasDeserializer</literal>. This
- method deserializes into a pre-existing CAS, which you must create ahead of time,
- pre-set-up with the proper type system. See the JavaDocs for details.</para>
+ <para>To save an XMI representation of a CAS, use the <literal>serialize</literal> method of the class
+ <literal>org.apache.uima.util.XmlCasSerializer</literal>. To save an XCAS representation of a CAS,
+ use the class <literal>org.apache.uima.cas.impl.XCASSerializer</literal> instead; see the JavaDocs
+ for details.</para>
+
+ <para>Both of these external forms can be read back in, using the <literal>deserialize</literal> method of
+ the class <literal>org.apache.uima.util.XmlCasDeserializer</literal>. This method deserializes
+ into a pre-existing CAS, which you must create ahead of time, pre-set-up with the proper type system. See the
+ JavaDocs for details.</para>
</section>
</section>
<section id="ugr.tug.application.using_cpes">
<title>Using Collection Processing Engines</title>
- <para>A <emphasis>Collection Processing Engine (CPE)</emphasis> processes
- collections of artifacts (documents) through the combination of the following
- components: a Collection Reader, an optional CAS Initializer, Analysis Engines, and
- CAS Consumers. Collection Processing Engines and their components are described in
- <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cpe"/>
- .</para>
-
- <para>Like Analysis Engines, CPEs consist of a set of Java classes and a set of
- descriptors. You need to make sure the Java classes are in your classpath, but otherwise
- you only deal with descriptors.</para>
+ <para>A <emphasis>Collection Processing Engine (CPE)</emphasis> processes collections of artifacts
+ (documents) through the combination of the following components: a Collection Reader, an optional CAS
+ Initializer, Analysis Engines, and CAS Consumers. Collection Processing Engines and their components are
+ described in <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cpe"/>.</para>
+
+ <para>Like Analysis Engines, CPEs consist of a set of Java classes and a set of descriptors. You need to make sure
+ the Java classes are in your classpath, but otherwise you only deal with descriptors.</para>
<section id="ugr.tug.application.running_a_cpe_from_a_descriptor">
<title>Running a Collection Processing Engine from a Descriptor</title>
<titleabbrev>Running a CPE from a Descriptor</titleabbrev>
<para><olink targetdoc="&uima_docs_tutorial_guides;"
- targetptr="ugr.tug.cpe.running_cpe_from_application"/> describes how to
- use the APIs to read a CPE descriptor and run it from an application.</para>
+ targetptr="ugr.tug.cpe.running_cpe_from_application"/> describes how to use the APIs to read a CPE
+ descriptor and run it from an application.</para>
</section>
- <section
- id="ugr.tug.application.configuring_a_cpe_descriptor_programmatically">
+ <section id="ugr.tug.application.configuring_a_cpe_descriptor_programmatically">
<title>Configuring a Collection Processing Engine Descriptor Programmatically</title>
<titleabbrev>Configuring a CPE Descriptor Programmatically</titleabbrev>
- <para>For the finest level of control over the CPE descriptor settings, the CPE offers
- programmatic access to the descriptor via an API. With this API, a developer can
- create a complete descriptor and then save the result to a file. This also can be used to
- read in a descriptor (using XMLParser.parseCpeDescription as shown in the previous
- section), modify it, and write it back out again. The CPE Descriptor API allows a
- developer to redefine default behavior related to error handling for each
- component, turn-on check-pointing, change performance characteristics of the
- CPE, and plug-in a custom timer.</para>
+ <para>For the finest level of control over the CPE descriptor settings, the CPE offers programmatic access to
+ the descriptor via an API. With this API, a developer can create a complete descriptor and then save the result
+ to a file. The API can also be used to read in a descriptor (using XMLParser.parseCpeDescription as shown in the
+ previous section), modify it, and write it back out again. The CPE Descriptor API allows a developer to
+ redefine default behavior related to error handling for each component, turn on checkpointing, change
+ performance characteristics of the CPE, and plug in a custom timer.</para>
- <para>Below is some example code that illustrates how this works. See the JavaDocs for
- package org.apache.uima.collection.metadata for more details.</para>
+ <para>Below is some example code that illustrates how this works. See the JavaDocs for package
+ org.apache.uima.collection.metadata for more details.</para>
<programlisting>//Creates descriptor with default settings
@@ -640,29 +633,31 @@
<section id="ugr.tug.application.setting_configuration_parameters">
<title>Setting Configuration Parameters</title>
- <para>Configuration parameters can be set using APIs as well as configured using the XML
- descriptor metadata specification (see <olink
- targetdoc="&uima_docs_tutorial_guides;"
+ <para>Configuration parameters can be set using APIs as well as configured using the XML descriptor metadata
+ specification (see <olink targetdoc="&uima_docs_tutorial_guides;"
targetptr="ugr.tug.aae.configuration_parameters"/>).</para>
<para>There are two different places you can set the parameters via the APIs.</para>
- <itemizedlist spacing="compact"><listitem><para>After reading the XML descriptor
- for a component, but before you produce the component itself, and</para></listitem>
-
- <listitem><para>After the component has been produced. </para></listitem>
- </itemizedlist>
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>After reading the XML descriptor for a component, but before you produce the component itself,
+ and</para>
+ </listitem>
+
+ <listitem>
+ <para>After the component has been produced. </para>
+ </listitem>
+ </itemizedlist>
<para>Setting the parameters before you produce the component is done using the
- ConfigurationParameterSettings object. You get an instance of this for a particular
- component by accessing that component description's metadata. For instance, if
- you produced a component description by using
- <literal>UIMAFramework.getXMLParser().parse...</literal> method, you can use
- that component description's getMetaData() method to get the metadata, and then
- the metadata's getConfigurationParameterSettings method to get the
- ConfigurationParameterSettings object. Using that object, you can set individual
- parameters using the setParameterValue method. Here's an example, for a CAS
- Consumer component:
+ ConfigurationParameterSettings object. You get an instance of this for a particular component by accessing
+ that component description's metadata. For instance, if you produced a component description by using a
+ <literal>UIMAFramework.getXMLParser().parse...</literal> method, you can use that component
+ description's getMetaData() method to get the metadata, and then the metadata's
+ getConfigurationParameterSettings method to get the ConfigurationParameterSettings object. Using that
+ object, you can set individual parameters using the setParameterValue method. Here's an example, for a
+ CAS Consumer component:
<programlisting>// Create a description object by reading the XML for the descriptor
@@ -686,69 +681,58 @@
<programlisting>CasConsumer component =
UIMAFramework.produceCasConsumer(casConsumerDesc);</programlisting></para>
- <para>A side effect of producing a component is calling the component's
- <quote>initialize</quote> method, allowing it to read its configuration
- parameters. If you want to change parameters after this, use
+ <para>A side effect of producing a component is calling the component's <quote>initialize</quote> method,
+ allowing it to read its configuration parameters. If you want to change parameters after this, use
<programlisting>component.setConfigParameterValue(
<quote><parameter-name></quote>,
<quote><parameter-value></quote>);</programlisting>
- and then signal the component to re-read its configuration by calling the component's
- reconfigure method:
+ and then signal the component to re-read its configuration by calling the component's reconfigure method:
<programlisting>component.reconfigure();</programlisting></para>
- <para>Although these examples are for a CAS Consumer component, the parameter APIs also
- work for other kinds of components.</para>
+ <para>Although these examples are for a CAS Consumer component, the parameter APIs also work for other kinds of
+ components.</para>
</section>
<section id="ugr.tug.application.integrating_text_analysis_and_search">
<title>Integrating Text Analysis and Search</title>
- <para>The UIMA SDK on IBM's alphaWorks <ulink
- url="http://www.alphaworks.ibm.com/tech/uima"/> includes a semantic search
- engine that you can use to build a search index that includes the results of the analysis
- done by your AE.
- This combination of AEs with a search engine capable of indexing both words and
- annotations over spans of text enables what UIMA refers to as <emphasis>semantic
- search</emphasis>. Over time we expect to provide additional information on
- integrating other open source search engines.</para>
-
- <para>Semantic search is a search where the semantic intent of the query is specified
- using one or more entity or relation specifiers. For example, one could specify that
- they are looking for a person (named) <quote>Bush.</quote> Such a query would then not
- return results about the kind of bushes that grow in your garden.</para>
+ <para>The UIMA SDK on IBM's alphaWorks <ulink url="http://www.alphaworks.ibm.com/tech/uima"/> includes a
+ semantic search engine that you can use to build a search index that includes the results of the analysis done by
+ your AE. This combination of AEs with a search engine capable of indexing both words and annotations over spans
+ of text enables what UIMA refers to as <emphasis>semantic search</emphasis>. Over time we expect to provide
+ additional information on integrating other open source search engines.</para>
+
+ <para>Semantic search is a search where the semantic intent of the query is specified using one or more entity or
+ relation specifiers. For example, one could specify that they are looking for a person named
+ <quote>Bush</quote>. Such a query would then not return results about the kind of bushes that grow in your
+ garden.</para>
<section id="ugr.tug.application.building_an_index">
<title>Building an Index</title>
- <para>To build a semantic search index using the UIMA SDK, you run a Collection
- Processing Engine that includes your AE along with a CAS Consumer which takes the
- tokens and annotatitions, together with sentence boundaries, and feeds them to a
- semantic searcher's index term input. The alphaWorks semantic search component
- includes a CAS Consumer called the <emphasis>Semantic Search CAS
- Indexer</emphasis> that does this; this component is available from the alphaWorks
- site. Your AE must include an annotator that produces Tokens and Sentence
- annotations, along with any <quote>semantic</quote> annotations, because the
- Indexer requires this. The Semantic Search CAS Indexer's descriptor is located
- here:
- <literal>examples/descriptors/cas_consumer/SemanticSearchCasIndexer.xml</literal>
- .</para>
+ <para>To build a semantic search index using the UIMA SDK, you run a Collection Processing Engine that includes
+ your AE along with a CAS Consumer which takes the tokens and annotations, together with sentence
+ boundaries, and feeds them to a semantic searcher's index term input. The alphaWorks semantic search
+ component includes a CAS Consumer called the <emphasis>Semantic Search CAS Indexer</emphasis> that does
+ this; this component is available from the alphaWorks site. Your AE must include an annotator that produces
+ Tokens and Sentence annotations, along with any <quote>semantic</quote> annotations, because the
+ Indexer requires this. The Semantic Search CAS Indexer's descriptor is located here:
+ <literal>examples/descriptors/cas_consumer/SemanticSearchCasIndexer.xml</literal>.</para>
<section id="ugr.tug.application.search.configuring_indexer">
<title>Configuring the Semantic Search CAS Indexer</title>
- <para>Since there are several ways you might want to build a search index from the
- information in the CAS produced by your AE, you need to supply the Semantic Search
- CAS Consumer – Indexer with configuration information in the form of an
- <emphasis>Index Build Specification</emphasis> file. Apache UIMA includes
- code for parsing Index Build Specification files (see the Javadocs for details).
- An example of an Indexing specification tailored to the AE from the tutorial in the
- <olink targetdoc="&uima_docs_tutorial_guides;"
- targetptr="ugr.tug.aae"/> is located in
- <literal>examples/descriptors/tutorial/search/MeetingIndexBuildSpec.xml</literal>
- . It looks like this:
+ <para>Since there are several ways you might want to build a search index from the information in the CAS
+ produced by your AE, you need to supply the Semantic Search CAS Consumer – Indexer with
+ configuration information in the form of an <emphasis>Index Build Specification</emphasis> file.
+ Apache UIMA includes code for parsing Index Build Specification files (see the Javadocs for details). An
+ example of an Indexing specification tailored to the AE from the tutorial in the <olink
+ targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae"/> is located in
+ <literal>examples/descriptors/tutorial/search/MeetingIndexBuildSpec.xml</literal>. It looks
+ like this:
<programlisting><![CDATA[<indexBuildSpecification>
@@ -797,161 +781,156 @@
</indexBuildItem>
</indexBuildSpecification>]]></programlisting></para>
- <para>The index build specification is a series of index build items, each of which
- identifies a CAS annotation type (a subtype of
- <literal>uima.tcas.Annotation</literal> – see <olink
- targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas"/>) and a
- style.</para>
+ <para>The index build specification is a series of index build items, each of which identifies a CAS
+ annotation type (a subtype of <literal>uima.tcas.Annotation</literal> – see <olink
+ targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas"/>) and a style.</para>
<para>The first item in this example specifies that the annotation type
- <literal>org.apache.uima.examples.tokenizer.Token</literal> should be
- indexed with the <quote>Term</quote> style. This means that each span of text
- annotated by a Token will be considered a single token for standard text search
- purposes.</para>
+ <literal>org.apache.uima.examples.tokenizer.Token</literal> should be indexed with the
+ <quote>Term</quote> style. This means that each span of text annotated by a Token will be considered a
+ single token for standard text search purposes.</para>
<para>The second item in this example specifies that the annotation type
- <literal>org.apache.uima.examples.tokenizer.Sentence</literal> should be
- indexed with the <quote>Breaking</quote> style. This means that each span of text
- annotated by a Sentence will be considered a single sentence, which can affect that
- search engine's algorithm for matching queries. The semantic search engine
- available from alphaWorks always requires tokens and sentences in order to index a
- document.</para>
- <note><para>Requirements for Term and Breaking rules: The Semantic Search indexer
- from alphaWorks requires that the items to be indexed as words be designated using the
- Term rule. </para></note>
-
- <para>The remaining items all use the <quote>Annotation</quote> style. This
- indicates that each annotation of the specified types will be stored in the index as
- a searchable span, with a name equal to the annotation name (without the
- namespace).</para>
+ <literal>org.apache.uima.examples.tokenizer.Sentence</literal> should be indexed with the
+ <quote>Breaking</quote> style. This means that each span of text annotated by a Sentence will be
+ considered a single sentence, which can affect the search engine's algorithm for matching queries. The
+ semantic search engine available from alphaWorks always requires tokens and sentences in order to index a
+ document.</para> <note>
+ <para>Requirements for Term and Breaking rules: The Semantic Search indexer from alphaWorks requires that
+ the items to be indexed as words be designated using the Term rule. </para></note>
+
+ <para>The remaining items all use the <quote>Annotation</quote> style. This indicates that each
+ annotation of the specified types will be stored in the index as a searchable span, with a name equal to the
+ annotation name (without the namespace).</para>
<para>Also, features of annotations can be indexed using the
- <literal><attributeMappings></literal> subelement. In the example
- index build specification, we declare that the <literal>building</literal>
- feature of the type <literal>org.apache.uima.tutorial.RoomNumber</literal>
- should be indexed. The <literal><indexName></literal> element can be
- used to map the feature name to a different name in the index, but in this example we
- have opted to use the same name, <literal>building</literal>. </para>
-
- <para> At the end of the batch or collection, the Semantic Search CAS Indexer
- builds the index. This index can be queried with simple tokens or with XML
- tags.</para>
+ <literal><attributeMappings></literal> subelement. In the example index build
+ specification, we declare that the <literal>building</literal> feature of the type
+ <literal>org.apache.uima.tutorial.RoomNumber</literal> should be indexed. The
+ <literal><indexName></literal> element can be used to map the feature name to a different name in
+ the index, but in this example we have opted to use the same name, <literal>building</literal>. </para>
+
+ <para> At the end of the batch or collection, the Semantic Search CAS Indexer builds the index. This index can
+ be queried with simple tokens or with XML tags.</para>
<para>Examples:
- <itemizedlist spacing="compact"><listitem><para>A query on the word
- <quote>UIMA</quote> will retrieve all documents that have the occurrence of
- the word. But a query of the type
- <literal><Meeting>UIMA</Meeting></literal> will retrieve
- only those documents that contain a Meeting annotation (produced by our
- MeetingDetector TAE, for example), where that Meeting annotation contains the
- word <quote>UIMA</quote>.</para></listitem>
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>A query on the word <quote>UIMA</quote> will retrieve all documents in which that word
+ occurs. But a query of the type <literal><Meeting>UIMA</Meeting></literal>
+ will retrieve only those documents that contain a Meeting annotation (produced by our
+ MeetingDetector TAE, for example), where that Meeting annotation contains the word
+ <quote>UIMA</quote>.</para>
+ </listitem>
- <listitem><para>A query for <literal><RoomNumber
- building="Yorktown"/></literal> will return documents that have a
- RoomNumber annotation whose <literal>building</literal> feature
- contains the term <quote>Yorktown</quote>. </para></listitem>
- </itemizedlist></para>
-
- <para>More information on the syntax of these kinds of queries, called XML
- Fragments, can be found in documentation for the semantic search engine
- component
- on <ulink url="http://www.alphaworks.ibm.com/tech/uima"/>. For more information on the Index
- Build Specification format, see the UIMA JavaDocs for class
- <literal>org.apache.uima.search.IndexBuildSpecification</literal>.
- Accessing the JavaDocs is described <olink targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.javadocs"/>.</para>
+ <listitem>
+ <para>A query for <literal><RoomNumber building="Yorktown"/></literal> will return
+ documents that have a RoomNumber annotation whose <literal>building</literal> feature
+ contains the term <quote>Yorktown</quote>. </para>
+ </listitem>
+ </itemizedlist></para>
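The difference between a plain word query and the two XML Fragment queries above can be illustrated with a small, self-contained Java sketch. It is not the alphaWorks semantic search engine and uses none of its APIs: the class ToySemanticIndex, its methods, and the sample documents are all hypothetical, invented here only to show how typed spans and indexed features narrow a search.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory index (hypothetical; NOT the alphaWorks engine or any UIMA API). */
public class ToySemanticIndex {

    /** A typed span over a document's text, with optional indexed features. */
    static final class Span {
        final String type;
        final int begin;
        final int end;
        final Map<String, String> features;

        Span(String type, int begin, int end, Map<String, String> features) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.features = features;
        }
    }

    /** One indexed document: its text plus the annotation spans stored with it. */
    static final class Doc {
        final String id;
        final String text;
        final List<Span> spans = new ArrayList<>();

        Doc(String id, String text) {
            this.id = id;
            this.text = text;
        }

        /** Adds a span over the first occurrence of 'covered'; features are name/value pairs. */
        Doc annotate(String type, String covered, String... featurePairs) {
            int begin = text.indexOf(covered);
            if (begin < 0) {
                throw new IllegalArgumentException("text not found: " + covered);
            }
            Map<String, String> features = new HashMap<>();
            for (int i = 0; i + 1 < featurePairs.length; i += 2) {
                features.put(featurePairs[i], featurePairs[i + 1]);
            }
            spans.add(new Span(type, begin, begin + covered.length(), features));
            return this;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void add(Doc doc) {
        docs.add(doc);
    }

    /** Plain word query: ids of documents whose text contains the word. */
    List<String> findWord(String word) {
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (containsWord(d.text, word)) {
                hits.add(d.id);
            }
        }
        return hits;
    }

    /** Analogous to <Type>word</Type>: the word must occur inside a span of that type. */
    List<String> findWordInSpan(String type, String word) {
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            for (Span s : d.spans) {
                if (s.type.equals(type) && containsWord(d.text.substring(s.begin, s.end), word)) {
                    hits.add(d.id);
                    break; // report each document once
                }
            }
        }
        return hits;
    }

    /** Analogous to <Type feature="value"/>: a span of that type whose feature contains the term. */
    List<String> findByFeature(String type, String feature, String value) {
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            for (Span s : d.spans) {
                String actual = s.features.get(feature);
                if (s.type.equals(type) && actual != null && containsWord(actual, value)) {
                    hits.add(d.id);
                    break;
                }
            }
        }
        return hits;
    }

    /** Case-insensitive whole-word match after splitting on non-word characters. */
    private static boolean containsWord(String text, String word) {
        for (String token : text.split("\\W+")) {
            if (token.equalsIgnoreCase(word)) {
                return true;
            }
        }
        return false;
    }

    /** Three made-up documents used only for this illustration. */
    static ToySemanticIndex sample() {
        ToySemanticIndex index = new ToySemanticIndex();
        index.add(new Doc("d1", "UIMA tutorial meeting in room 20-001 on Monday")
                .annotate("Meeting", "UIMA tutorial meeting in room 20-001")
                .annotate("RoomNumber", "20-001", "building", "Yorktown"));
        index.add(new Doc("d2", "The UIMA SDK was released"));
        index.add(new Doc("d3", "Budget meeting in room GN-K35")
                .annotate("Meeting", "Budget meeting in room GN-K35")
                .annotate("RoomNumber", "GN-K35", "building", "Hawthorne"));
        return index;
    }

    public static void main(String[] args) {
        ToySemanticIndex index = sample();
        System.out.println(index.findWord("UIMA"));                                // [d1, d2]
        System.out.println(index.findWordInSpan("Meeting", "UIMA"));               // [d1]
        System.out.println(index.findByFeature("RoomNumber", "building", "Yorktown")); // [d1]
    }
}
```

In this sketch the plain word query matches both documents that mention UIMA, while the span-restricted and feature-restricted queries match only the document whose annotations satisfy them. A real engine would use an inverted index rather than scanning every document.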
+
+ <para>More information on the syntax of these kinds of queries, called XML Fragments, can be found in
+ documentation for the semantic search engine component on <ulink
+ url="http://www.alphaworks.ibm.com/tech/uima"/>. For more information on the Index Build
+ Specification format, see the UIMA JavaDocs for class
+ <literal>org.apache.uima.search.IndexBuildSpecification</literal>. Accessing the JavaDocs is
+ described in <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.javadocs"/>.</para>
</section>
- <section
- id="ugr.tug.application.search.cpe_with_semantic_search_cas_consumer">
+ <section id="ugr.tug.application.search.cpe_with_semantic_search_cas_consumer">
<title>Building and Running a CPE including the Semantic Search CAS Indexer</title>
<titleabbrev>Using Semantic Search CAS Indexer</titleabbrev>
- <para>The following steps illustrate how to build and run a CPE that uses the UIMA
- Meeting Detector TAE and the Simple Token and Sentence Annotator, discussed in the
- <olink targetdoc="&uima_docs_tutorial_guides;"
- targetptr="ugr.tug.aae"/> along with a CAS Consumer called the
- Semantic Search CAS Indexer, to build an index that allows you to query for
- documents based not only on textual content but also on whether they contain
- mentions of Meetings detected by the TAE.</para>
-
- <para>Run the CPE Configurator tool by executing the <literal>cpeGui</literal>
- shell script in the <literal>bin</literal> directory of the UIMA SDK. (For
- instructions on using this tool, see the <olink targetdoc="&uima_docs_tools;"
- targetptr="ugr.tools.cpe"/>.)</para>
+ <para>The following steps illustrate how to build and run a CPE that uses the UIMA Meeting Detector TAE and the
+ Simple Token and Sentence Annotator, discussed in the <olink
+ targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae"/> along with a CAS Consumer
+ called the Semantic Search CAS Indexer, to build an index that allows you to query for documents based not
+ only on textual content but also on whether they contain mentions of Meetings detected by the TAE.</para>
+
+ <para>Run the CPE Configurator tool by executing the <literal>cpeGui</literal> shell script in the
+ <literal>bin</literal> directory of the UIMA SDK. (For instructions on using this tool, see the <olink
+ targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.)</para>
- <para>In the CPE Configurator tool, select the following components by browsing to
- their descriptors:</para>
+ <para>In the CPE Configurator tool, select the following components by browsing to their
+ descriptors:</para>
<itemizedlist spacing="compact">
- <listitem><para>Collection Reader:
- <literal>%UIMA_HOME%/examples/descriptors/collectionReader/
- FileSystemCollectionReader.xml</literal></para></listitem>
-
- <listitem><para>Analysis Engine: include both of these; one produces
- tokens/sentences, required by the indexer in all cases and the other produces
- the meeting annotations of interest.
-
- <literallayout>%UIMA_HOME%/examples/descriptors/analysis_engine/
+ <listitem>
+ <para>Collection Reader: <literal>%UIMA_HOME%/examples/descriptors/collectionReader/
+ FileSystemCollectionReader.xml</literal></para>
+ </listitem>
+
+ <listitem>
+ <para>Analysis Engine: include both of these; one produces tokens/sentences, required by the indexer
+ in all cases and the other produces the meeting annotations of interest.
+
+
+ <literallayout>%UIMA_HOME%/examples/descriptors/analysis_engine/
SimpleTokenAndSentenceAnnotator.xml</literallayout></para>
- </listitem>
+ </listitem>
- <listitem><para><literal> and
- %UIMA_HOME%/examples/descriptors/tutorial/ex6/
- UIMAMeetingDetectorTAE.xml</literal></para></listitem>
+ <listitem>
+ <para><literal> and %UIMA_HOME%/examples/descriptors/tutorial/ex6/
+ UIMAMeetingDetectorTAE.xml</literal></para>
+ </listitem>
- <listitem><para>Two CAS Consumers:
-
-
- <literallayout>%UIMA_HOME%/examples/descriptors/cas_consumer/
+ <listitem>
+ <para>Two CAS Consumers:
+
+
+ <literallayout>%UIMA_HOME%/examples/descriptors/cas_consumer/
SemanticSearchCasIndexer.xml
%UIMA_HOME%/examples/descriptors/cas_consumer/
XmiWriterCasConsumer.xml</literallayout></para>
- </listitem></itemizedlist>
+ </listitem>
+ </itemizedlist>
<para>Set up parameters:</para>
- <itemizedlist spacing="compact"><listitem><para> Set the File System
- Collection Reader's <quote>Input Directory</quote> parameter to point to the
- <literal>%UIMA_HOME%/examples/data</literal> directory.</para>
- </listitem>
-
- <listitem><para>Set the Semantic Search CAS Indexer's <quote>Indexing
- Specification Descriptor</quote> parameter to point to
- <literal>%UIMA_HOME%/examples/descriptors/tutorial/search/
- MeetingIndexBuildSpec.xml</literal></para></listitem>
-
- <listitem><para>Set the Semantic Search CAS Indexer's <quote>Index
- Dir</quote> parameter to whatever directory into which you want the indexer to
- write its index files. <warning><para>The Indexer
- <emphasis>erases</emphasis> old versions of the files it creates in this
- directory. </para></warning> </para></listitem>
-
- <listitem><para>Set the XMI Writer CAS Consumer's <quote>Output
- Directory</quote> parameter to whatever directory into which you want to store
- the XMI files containing the results of your analysis for each document.
- </para></listitem></itemizedlist>
-
- <para>Click on the Run Button. Once the run completes, a statistics dialog should
- appear, in which you can see how much time was spent in each of the components
- involved in the run.</para>
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para> Set the File System Collection Reader's <quote>Input Directory</quote> parameter to point to
+ the <literal>%UIMA_HOME%/examples/data</literal> directory.</para>
+ </listitem>
+
+ <listitem>
+ <para>Set the Semantic Search CAS Indexer's <quote>Indexing Specification Descriptor</quote>
+ parameter to point to <literal>%UIMA_HOME%/examples/descriptors/tutorial/search/
+ MeetingIndexBuildSpec.xml</literal></para>
+ </listitem>
+
+ <listitem>
+ <para>Set the Semantic Search CAS Indexer's <quote>Index Dir</quote> parameter to the
+ directory where you want the indexer to write its index files. <warning>
+ <para>The Indexer <emphasis>erases</emphasis> old versions of the files it creates in this
+ directory. </para></warning> </para>
+ </listitem>
+
+ <listitem>
+ <para>Set the XMI Writer CAS Consumer's <quote>Output Directory</quote> parameter to the
+ directory where you want to store the XMI files containing the results of your analysis for each
+ document. </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>Click the Run button. Once the run completes, a statistics dialog should appear, in which you can see
+ how much time was spent in each of the components involved in the run.</para>
</section>
</section>
<section id="ugr.tug.application.search.query_tool">
<title>Semantic Search Query Tool</title>
- <para>The Semantic Search component from UIMA on alphaWorks contains a simple tool for
- running queries against a semantic search index. After building an index as
- described in the previous section, you can launch this tool by running the shell
- script: semanticSearch, found in the <literal>/bin</literal> subdirectory of the
- Semantic Search UIMA install, at the command prompt. If you are using Eclipse, and
- have installed the UIMA examples, there will be a Run configuration you can use to
- conveniently launch this, called <literal>UIMA Semantic Search</literal>. This
- will display the following screen:
+ <para>The Semantic Search component from UIMA on alphaWorks contains a simple tool for running queries
+ against a semantic search index. After building an index as described in the previous section, you can launch
+ this tool from the command prompt by running the <literal>semanticSearch</literal> shell script, found in the
+ <literal>/bin</literal> subdirectory of the Semantic Search UIMA install. If you are using Eclipse and have installed the
+ UIMA examples, there will be a Run configuration you can use to conveniently launch this, called
+ <literal>UIMA Semantic Search</literal>. This will display the following screen:
<screenshot>
@@ -966,150 +945,164 @@
<para>Configure the fields on this screen as follows:
- <itemizedlist spacing="compact"><listitem><para>Set the <quote>Index
- Directory</quote> to the directory where you built your index. This is the same
- value that you supplied for the <quote>Index Dir</quote> parameter of the
- Semantic Search CAS Indexer in the CPE Configurator.</para>
- </listitem>
-
- <listitem><para>Set the <quote>XMI/XCAS Directory</quote> to the
- directory where you stored the results of your analysis. This is the same value
- that you supplied for the <quote>Output Directory</quote> parameter of XMI
- Writer CAS Consumer in the CPE Configurator.</para></listitem>
-
- <listitem><para>Optionally, set the <quote>Original Documents Directory</quote> to
- the directory containing the original plain text documents that were analyzed
- and indexed. This is only needed for the "View Original Document" button.</para></listitem>
-
- <listitem><para> Set the <quote>Type System Descriptor</quote> to the location
- of the descriptor that describes your type system. For this example, this will be
- <literal>%UIMA_HOME%/examples/
- descriptors/tutorial/ex4/TutorialTypeSystem.xml</literal> </para>
- </listitem></itemizedlist></para>
-
- <para>Now, in the <quote>XML Fragments</quote> field, you can type in single words or
- XML queries where the XML tags correspond to the labels in the index build
- specification file (e.g.
- <literal><Meeting>UIMA</Meeting></literal>). XML Fragments are
- described in the documentation for the semantic search engine
- component
- on <ulink url="http://www.alphaworks.ibm.com/tech/uima"/>.</para>
-
- <para>After you enter a query and click the <quote>Search</quote> button, a list of
- hits will appear. Select one of the documents and click <quote>View
- Analysis</quote> to view the document in the UIMA Annotation Viewer.</para>
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>Set the <quote>Index Directory</quote> to the directory where you built your index. This is the
+ same value that you supplied for the <quote>Index Dir</quote> parameter of the Semantic Search CAS
+ Indexer in the CPE Configurator.</para>
+ </listitem>
+
+ <listitem>
+ <para>Set the <quote>XMI/XCAS Directory</quote> to the directory where you stored the results of your
+ analysis. This is the same value that you supplied for the <quote>Output Directory</quote>
+ parameter of XMI Writer CAS Consumer in the CPE Configurator.</para>
+ </listitem>
+
+ <listitem>
+ <para>Optionally, set the <quote>Original Documents Directory</quote> to the directory containing
+ the original plain text documents that were analyzed and indexed. This is only needed for the
+ <quote>View Original Document</quote> button.</para>
+ </listitem>
+
+ <listitem>
+ <para> Set the <quote>Type System Descriptor</quote> to the location of the descriptor that describes
+ your type system. For this example, this will be <literal>%UIMA_HOME%/examples/
+ descriptors/tutorial/ex4/TutorialTypeSystem.xml</literal> </para>
+ </listitem>
+ </itemizedlist></para>
+
+ <para>Now, in the <quote>XML Fragments</quote> field, you can type in single words or XML queries where the XML
+ tags correspond to the labels in the index build specification file (e.g.
+ <literal><Meeting>UIMA</Meeting></literal>). XML Fragments are described in the
+ documentation for the semantic search engine component on <ulink
+ url="http://www.alphaworks.ibm.com/tech/uima"/>.</para>
+
+ <para>After you enter a query and click the <quote>Search</quote> button, a list of hits will appear. Select
+ one of the documents and click <quote>View Analysis</quote> to view the document in the UIMA Annotation
+ Viewer.</para>
<para>The source code for the Semantic Search query program is in
- <literal>examples/src/com/ibm/uima/examples/search/SemanticSearchGUI.java</literal>
- . A simple command-line query program is also provided in
- <literal>examples/src/com/ibm/uima/examples/search/SemanticSearch.java</literal>
- . Using these as a model, you can build a query interface from your own application. For
- details on the Semantic Search Engine query language and interface, see
- the documentation for the semantic search engine
- component
- on <ulink url="http://www.alphaworks.ibm.com/tech/uima"/>.</para>
+ <literal>examples/src/com/ibm/uima/examples/search/SemanticSearchGUI.java</literal>. A simple
+ command-line query program is also provided in
+ <literal>examples/src/com/ibm/uima/examples/search/SemanticSearch.java</literal>. Using these
+ as a model, you can build a query interface from your own application. For details on the Semantic Search
+ Engine query language and interface, see the documentation for the semantic search engine component on
+ <ulink url="http://www.alphaworks.ibm.com/tech/uima"/>.</para>
</section>
</section>
<section id="ugr.tug.application.remote_services">
<title>Working with Remote Services</title>
- <para>The UIMA SDK allows you to easily take any Analysis Engine or CAS Consumer and deploy
- it as a service. That Analysis Engine or CAS Consumer can then be called from a remote
- machine using various network protocols.</para>
+ <para>The UIMA SDK allows you to easily take any Analysis Engine or CAS Consumer and deploy it as a service. That
+ Analysis Engine or CAS Consumer can then be called from a remote machine using various network
+ protocols.</para>
<para>The UIMA SDK provides support for two communications protocols:
- <itemizedlist spacing="compact"><listitem><para>SOAP, the standard Web Services
- protocol</para></listitem>
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>SOAP, the standard Web Services protocol</para>
+ </listitem>
- <listitem><para>Vinci, a lightweight version of SOAP, included as a part of Apache
- UIMA. </para></listitem></itemizedlist></para>
+ <listitem>
+ <para>Vinci, a lightweight version of SOAP, included as a part of Apache UIMA. </para>
+ </listitem>
+ </itemizedlist></para>
<para>The UIMA framework can make use of these services in two different ways:
- <orderedlist><listitem><para>An Analysis Engine can create a proxy to a remote
- service; this proxy acts like a local component, but connects to the remote. The proxy
- has limited error handling and retry capabilities. Both Vinci and SOAP are
- supported.</para></listitem>
-
- <listitem><para>A Collection Processing Engine can specify non-Integrated mode
- (see <olink targetdoc="&uima_docs_tutorial_guides;"
- targetptr="ugr.tug.cpe.deploying_a_cpe"/>. The CPE provides more
- extensive error recovery capabilities. This mode only supports the Vinci
- communications protocol. </para></listitem></orderedlist></para>
+ <orderedlist>
+ <listitem>
+ <para>An Analysis Engine can create a proxy to a remote service; this proxy acts like a local component, but
+ connects to the remote service. The proxy has limited error handling and retry capabilities. Both Vinci and SOAP
+ are supported.</para>
+ </listitem>
+
+ <listitem>
+ <para>A Collection Processing Engine can specify non-Integrated mode (see <olink
+ targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cpe.deploying_a_cpe"/>). The
+ CPE provides more extensive error recovery capabilities. This mode only supports the Vinci
+ communications protocol. </para>
+ </listitem>
+ </orderedlist></para>
<section id="ugr.tug.application.how_to_deploy_as_soap">
<title>Deploying a UIMA Component as a SOAP Service</title>
<titleabbrev>Deploying as SOAP Service</titleabbrev>
- <para>To deploy a UIMA component as a SOAP Web Service, you need to first install the
- following software components:
+ <para>To deploy a UIMA component as a SOAP Web Service, you need to first install the following software
+ components:
- <itemizedlist spacing="compact"><listitem><para>Apache Tomcat 5.0 or 5.5 (
- <ulink url="http://jakarta.apache.org/tomcat/"/>) </para></listitem>
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>Apache Tomcat 5.0 or 5.5 (<ulink url="http://jakarta.apache.org/tomcat/"/>) </para>
+ </listitem>
- <listitem><para>Apache Axis 1.1 or 1.3 (<ulink
- url="http://ws.apache.org/axis/"/>) </para></listitem>
- </itemizedlist></para>
+ <listitem>
+ <para>Apache Axis 1.1 or 1.3 (<ulink url="http://ws.apache.org/axis/"/>) </para>
+ </listitem>
+ </itemizedlist></para>
- <para>Later versions of these components will likely also work, but have not been
- tested.</para>
+ <para>Later versions of these components will likely also work, but have not been tested.</para>
<para>Next, you need to do the following setup steps:
- <itemizedlist><listitem><para>Set the CATALINA_HOME environment variable
- to the location where Tomcat is installed.</para></listitem>
+ <itemizedlist>
+ <listitem>
+ <para>Set the CATALINA_HOME environment variable to the location where Tomcat is installed.</para>
+ </listitem>
+
+ <listitem>
+ <para>Copy all of the JAR files from <literal>%UIMA_HOME%/lib</literal> to the
+ <literal>%CATALINA_HOME%/webapps/axis/WEB-INF/lib</literal> directory in your installation.</para>
+ </listitem>
+
+ <listitem>
+ <para>Copy the JAR files for the UIMA components that you wish to deploy to the
+ <literal>%CATALINA_HOME%/webapps/axis/WEB-INF/lib</literal> directory in your installation.</para>
+ </listitem>
+
+ <listitem>
+ <para><emphasis role="bold-italic">IMPORTANT</emphasis>: any time you add JAR files to Tomcat (for
+ instance, in the above 2 steps), you must shut down and restart Tomcat before it
+ <quote>notices</quote> them. So now, please shut down and restart Tomcat.</para>
+ </listitem>
- <listitem><para>Copy all of the JAR files from
- <literal>%UIMA_HOME%/lib</literal> to the
- <literal>%CATALINA_HOME%/webapps/axis/WEB-INF/lib</literal> in your
- installation.</para></listitem>
-
- <listitem><para>Copy your JAR files for the UIMA components that you wish to
- <literal>%CATALINA_HOME%/webapps/axis/WEB-INF/lib</literal> in your
- installation.</para></listitem>
-
- <listitem><para><emphasis role="bold-italic">IMPORTANT</emphasis>: any
- time you add JAR files to Tomcat (for instance, in the above 2 steps), you must
- shutdown and restart Tomcat before it <quote>notices</quote> this. So now,
- please shutdown and restart Tomcat.</para></listitem>
-
- <listitem><para>All the Java classes for the UIMA Examples are packaged in the
- <literal>uimaj-examples.jar</literal> file which is included in the
- <literal>%UIMA_HOME%/lib</literal> folder.</para></listitem>
-
- <listitem><para>In addition, if an annotator needs to locate resource files in
- the classpath, those resources must be available in the Axis classpath, so copy
- these also to
- <literal>%CATALINA_HOME%/webapps/axis/WEB-INF/classes</literal>
- .</para>
+ <listitem>
+ <para>All the Java classes for the UIMA Examples are packaged in the
+ <literal>uimaj-examples.jar</literal> file which is included in the
+ <literal>%UIMA_HOME%/lib</literal> folder.</para>
+ </listitem>
+
+ <listitem>
+ <para>In addition, if an annotator needs to locate resource files in the classpath, those resources
+ must be available in the Axis classpath, so copy these as well to
+ <literal>%CATALINA_HOME%/webapps/axis/WEB-INF/classes</literal>.</para>
- <para>As an example, if you are deploying the GovernmentTitleRecognizer
- (found in <literal>examples/descriptors/analysis_engine/
- GovernmentOfficialRecognizer_RegEx_TAE</literal>) as a SOAP service,
- you need to copy the file
- <literal>examples/resources/GovernmentTitlePatterns.dat</literal>
- into <literal>.../WEB-INF/classes</literal>. </para></listitem>
- </itemizedlist></para>
+ <para>As an example, if you are deploying the GovernmentTitleRecognizer (found in
+ <literal>examples/descriptors/analysis_engine/GovernmentOfficialRecognizer_RegEx_TAE</literal>)
+ as a SOAP service, you need to copy the file
+ <literal>examples/resources/GovernmentTitlePatterns.dat</literal> into
+ <literal>.../WEB-INF/classes</literal>. </para>
+ </listitem>
+ </itemizedlist></para>
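
The JAR-copying steps in the list above can be scripted. The following is a minimal sketch in Java and is not part of the UIMA SDK: the class name `AxisLibSetup` and the method `copyJars` are hypothetical, and it assumes that the `UIMA_HOME` and `CATALINA_HOME` environment variables are set and that Axis is already installed under Tomcat's `webapps` directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the UIMA SDK.
public class AxisLibSetup {

    /**
     * Copies every *.jar file found directly in sourceDir into targetDir,
     * overwriting files of the same name. Returns the number of files copied.
     * The target directory must already exist (that is, Axis must be installed).
     */
    public static int copyJars(Path sourceDir, Path targetDir) throws IOException {
        int copied = 0;
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(sourceDir, "*.jar")) {
            for (Path jar : jars) {
                Files.copy(jar, targetDir.resolve(jar.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        String uimaHome = System.getenv("UIMA_HOME");
        String catalinaHome = System.getenv("CATALINA_HOME");
        if (uimaHome == null || catalinaHome == null) {
            System.err.println("Set UIMA_HOME and CATALINA_HOME first.");
            return;
        }
        Path uimaLib = Paths.get(uimaHome, "lib");
        Path axisLib = Paths.get(catalinaHome, "webapps", "axis", "WEB-INF", "lib");
        int copied = copyJars(uimaLib, axisLib);
        // Tomcat only picks up new JAR files after a restart.
        System.out.println("Copied " + copied + " JAR files; now shut down and restart Tomcat.");
    }
}
```

The same `copyJars` call can be pointed at the directory holding your own component JARs. Resource files that annotators load from the classpath still need to be copied to `WEB-INF/classes` separately.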
<para>Test your installation of Tomcat and Axis by starting Tomcat and going to
- <literal>http://localhost:8080/axis/happyaxis.jsp</literal> in your
- browser. Check to be sure that this reports that all of the required Axis libraries are
- present. One common missing file may be activation.jar, which you can get from
- java.sun.com.</para>
-
- <para>After completing these setup instructions, you can deploy Analysis Engines or
- CAS Consumers as SOAP web services by using the <literal>deploytool</literal>
- utility, with is located in the <literal>/bin</literal> directory of the UIMA SDK.
- <literal>deploytool</literal> is a command line program utility that takes as an
- argument a web services deployment descriptors (WSDD file); example WSDD files are
- provided in the <literal>examples/deploy/soap</literal> directory of the UIMA
- SDK. Deployment Descriptors have been provided for deploying and undeploying some
- of the example Analysis Engines that come with the SDK.</para>
+ <literal>http://localhost:8080/axis/happyaxis.jsp</literal> in your browser. Check that this page
+ reports that all of the required Axis libraries are present. A commonly missing file is
+ activation.jar, which you can get from java.sun.com.</para>
+
+ <para>After completing these setup instructions, you can deploy Analysis Engines or CAS Consumers as SOAP web
+ services by using the <literal>deploytool</literal> utility, which is located in the
+ <literal>/bin</literal> directory of the UIMA SDK. <literal>deploytool</literal> is a command-line
+ utility that takes as an argument a web services deployment descriptor (WSDD file); example WSDD
+ files are provided in the <literal>examples/deploy/soap</literal> directory of the UIMA SDK. Deployment
+ Descriptors have been provided for deploying and undeploying some of the example Analysis Engines that come
+ with the SDK.</para>
- <para>As an example, the WSDD file for deploying the example Person Title annotator
- looks like this (important parts are in bold italics):
+ <para>As an example, the WSDD file for deploying the example Person Title annotator looks like this (important
+ parts are in bold italics):
<programlisting><deployment name="<emphasis role="bold-italic">PersonTitleAnnotator</emphasis>"
@@ -1143,40 +1136,36 @@
</deployment></programlisting></para>
- <para>To modify this WSDD file to deploy your own Analysis Engine or CAS Consumer, just
- replace the areas indicated in bold italics (deployment name, service name, and
- resource specifier path) with values appropriate for your component.</para>
-
- <para>The <literal>numInstances</literal> parameter specifies how many instances of
- your Analysis Engine or CAS Consumer will be created. This allows your service to
- support multiple clients concurrently. When a new request comes in, if all of the
- instances are busy, the new request will wait until an instance becomes available.</para>
-
+ <para>To modify this WSDD file to deploy your own Analysis Engine or CAS Consumer, just replace the areas
+ indicated in bold italics (deployment name, service name, and resource specifier path) with values
+ appropriate for your component.</para>
+
+ <para>The <literal>numInstances</literal> parameter specifies how many instances of your Analysis Engine
+ or CAS Consumer will be created. This allows your service to support multiple clients concurrently. When a
+ new request comes in, if all of the instances are busy, the new request will wait until an instance becomes
+ available.</para>
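
The waiting behavior described for `numInstances` can be pictured as a fixed-size pool of instances. The sketch below only illustrates that idea under stated assumptions; it is not the code the UIMA SOAP service uses, and the class `InstancePool` and its methods are hypothetical names introduced for this example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration of a fixed-size instance pool; not UIMA code.
public class InstancePool<T> {
    private final BlockingQueue<T> idle;

    /** Creates numInstances instances up front, as the numInstances parameter describes. */
    public InstancePool(int numInstances, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(numInstances);
        for (int i = 0; i < numInstances; i++) {
            idle.add(factory.get());
        }
    }

    /**
     * Runs one request against an idle instance. If every instance is busy,
     * the calling thread waits in take() until one is returned to the pool.
     */
    public <R> R process(Function<T, R> request) throws InterruptedException {
        T instance = idle.take();
        try {
            return request.apply(instance);
        } finally {
            idle.add(instance); // hand the instance back, even if the request failed
        }
    }

    /** Number of instances that are currently not serving a request. */
    public int idleCount() {
        return idle.size();
    }
}
```

With a pool of size 2, at most two requests are served at the same moment; a third caller blocks until one of the first two finishes. A larger `numInstances` raises concurrency, and each additional instance presumably also uses more memory.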
+
<para>To deploy the Person Title annotator service, issue the following command:
<programlisting>C:/Program Files/apache/uima/bin>deploytool
../examples/deploy/soap/Deploy_PersonTitleAnnotator.wsdd</programlisting></para>
- <para>Test if the deployment was successful by starting up a browser, pointing it to
- your Tomcat installation's <quote>axis</quote> webpage (e.g.,
- <literal>http://localhost:8080/axis</literal>) and clicking on the List link.
- This should bring up a page which shows the deployed services, where you should
+ <para>Test if the deployment was successful by starting up a browser, pointing it to your Tomcat
+ installation's <quote>axis</quote> webpage (e.g., <literal>http://localhost:8080/axis</literal>)
+ and clicking on the List link. This should bring up a page which shows the deployed services, where you should
see the service you just deployed.</para>
<para>The other components can be deployed by replacing
- <literal>Deploy_PersonTitleAnnotator.wsdd</literal> with one of the other
- Deploy descriptors in the deploy directory. The deploytool utility can also
- undeploy services when passed one of the Undeploy descriptors.</para>
- <note><para>The <literal>deploytool</literal> shell script assumes that the web
- services are to be installed at <literal>http://localhost:8080/axis</literal>. If
- this is not the case, you will need to update the shell script appropriately.</para>
- </note>
-
- <para>Once you have deployed your component as a web service, you may call it from a
- remote machine. See <xref
- linkend="ugr.tug.application.how_to_call_a_uima_service"/> for
- instructions.</para>
+ <literal>Deploy_PersonTitleAnnotator.wsdd</literal> with one of the other Deploy descriptors in the
+ deploy directory. The deploytool utility can also undeploy services when passed one of the Undeploy
+ descriptors.</para> <note>
+ <para>The <literal>deploytool</literal> shell script assumes that the web services are to be installed at
+ <literal>http://localhost:8080/axis</literal>. If this is not the case, you will need to update the shell
+ script appropriately.</para> </note>
+
+ <para>Once you have deployed your component as a web service, you may call it from a remote machine. See <xref
+ linkend="ugr.tug.application.how_to_call_a_uima_service"/> for instructions.</para>
</section>
@@ -1184,17 +1173,15 @@
<title>Deploying a UIMA Component as a Vinci Service</title>
<titleabbrev>Deploying as a Vinci Service</titleabbrev>
- <para>There are no software prerequisites for deploying a Vinci service. The
- necessary libraries are part of the UIMA SDK. However, before you can use Vinci
- services you need to deploy the Vinci Naming Service (VNS), as described in section
- <xref linkend="ugr.tug.application.vns"/>.</para>
-
- <para>To deploy a service, you have to insure any components you want to include can be
- found on the class path. One way to do this is to set the environment variable
- UIMA_CLASSPATH to the set of class paths you need for any included components. Then
- run the <literal>startVinciService</literal> shell script, which is located in
- the <literal>bin</literal> directory, and pass it the path to a Vinci deployment
- descriptor, for example: <literal>C:UIMA>bin/startVinciService
+ <para>There are no software prerequisites for deploying a Vinci service. The necessary libraries are part of
+ the UIMA SDK. However, before you can use Vinci services you need to deploy the Vinci Naming Service (VNS), as
+ described in section <xref linkend="ugr.tug.application.vns"/>.</para>
+
+ <para>To deploy a service, you have to ensure any components you want to include can be found on the class path.
[... 732 lines stripped ...]