Posted to commits@uima.apache.org by sc...@apache.org on 2008/08/28 23:28:16 UTC
svn commit: r689997 [12/32] - in /incubator/uima/uimaj/trunk/uima-docbooks:
./ src/ src/docbook/overview_and_setup/ src/docbook/references/
src/docbook/tools/ src/docbook/tutorials_and_users_guides/
src/docbook/uima/organization/ src/olink/references/
Modified: incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.xml.component_descriptor.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.xml.component_descriptor.xml?rev=689997&r1=689996&r2=689997&view=diff
==============================================================================
--- incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.xml.component_descriptor.xml (original)
+++ incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.xml.component_descriptor.xml Thu Aug 28 14:28:14 2008
@@ -1,2235 +1,2235 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
-"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
-<!ENTITY % uimaents SYSTEM "../entities.ent" >
-<!ENTITY tp "ugr.ref.xml.component_descriptor.">
-%uimaents;
-]>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-<chapter id="ugr.ref.xml.component_descriptor">
- <title>Component Descriptor Reference</title>
-
- <para>This chapter is the reference guide for the UIMA SDK's Component Descriptor XML
- schema. A <emphasis>Component Descriptor</emphasis> (also sometimes called a
- <emphasis>Resource Specifier</emphasis> in the code) is an XML file that either (a)
- completely describes a component, including all information needed to construct the
- component and interact with it, or (b) specifies how to connect to and interact with an
- existing component that has been published as a remote service.
- <emphasis>Component</emphasis> (also called <emphasis>Resource</emphasis>) is a
- general term for modules produced by UIMA developers and used by UIMA applications. The
- types of Components are: Analysis Engines, Collection Readers, CAS
- Initializers<footnote><para>This component is deprecated and should not be used in new
- development.</para></footnote>, CAS Consumers, and Collection Processing Engines.
- However, Collection Processing Engine Descriptors are significantly different in
- format and are covered in a separate chapter, <olink targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.xml.cpe_descriptor"/>.</para>
-
- <para><xref linkend="&tp;notation"/> describes the notation used in this
- chapter.</para>
-
- <para><xref linkend="&tp;imports"/> describes the UIMA SDK's
- <emphasis>import</emphasis> syntax, used to allow XML descriptors to import
- information from other XML files, to allow sharing of information between several XML
- descriptors.</para>
-
- <para><xref linkend="&tp;aes"/> describes the XML format for <emphasis>Analysis Engine
- Descriptors</emphasis>. These are descriptors that completely describe Analysis
- Engines, including all information needed to construct and interact with them.</para>
-
- <para><xref linkend="&tp;collection_processing_parts"/> describes the XML format for
- <emphasis>Collection Processing Component Descriptors</emphasis>. This includes
- Collection Iterator, CAS Initializer, and CAS Consumer Descriptors.</para>
-
- <para><xref linkend="&tp;service_client"/> describes the XML format for
- <emphasis>Service Client Descriptors</emphasis>, which specify how to connect to and
- interact with resources deployed as remote services.</para>
-
- <para><xref linkend="&tp;custom_resource_specifiers"/> describes the XML format for
- <emphasis>Custom Resource Specifiers</emphasis>, which allow you to plug in your
- own Java class as a UIMA Resource.</para>
-
- <section id="&tp;notation">
- <title>Notation</title>
-
- <para>This chapter uses an informal notation to specify the syntax of Component
- Descriptors. The formal syntax is defined by an XML schema definition, which is
- contained in the file <literal>resourceSpecifierSchema.xsd</literal>,
- located in the <literal>uima-core.jar</literal> file.</para>
-
- <para>The notation used in this chapter is:</para>
-
- <itemizedlist><listitem><para>An ellipsis (...) inside an element body indicates
- that the substructure of that element has been omitted (to be described in another
- section of this chapter). An example of this would be:
-
-
- <programlisting><analysisEngineMetaData>
-...
-</analysisEngineMetaData></programlisting>
- An ellipsis immediately after an element indicates that the element type may be
- repeated arbitrarily many times. For example:
-
-
- <programlisting><parameter>[String]</parameter>
-<parameter>[String]</parameter>
-...</programlisting>
- indicates that there may be arbitrarily many parameter elements in this
- context.</para></listitem>
-
- <listitem><para>Bracketed expressions (e.g. <literal>[String]</literal>)
- indicate the type of value that may be used at that location.</para></listitem>
-
- <listitem><para>A vertical bar, as in <literal>true|false</literal>, indicates
- alternatives. This can be applied to literal values, bracketed type names, and
- elements.</para></listitem>
-
- <listitem><para>Which elements are optional and which are required is specified in
- prose, not in the syntax definition. </para></listitem></itemizedlist>
- </section>
-
- <section id="&tp;imports">
- <title>Imports</title>
-
- <para>The UIMA SDK defines a particular syntax for XML descriptors to import information
- from other XML files. When one of the following appears in an XML descriptor:
-
-
- <programlisting><import location="[URL]" /> or
-<import name="[Name]" /></programlisting>
- it indicates that information from a separate XML file is being imported. Note that
- imports are allowed only in certain places in the descriptor. In the remainder of this
- chapter, it will be indicated at which points imports are allowed.</para>
-
- <para>If an import specifies a <literal>location</literal> attribute, the value of
- that attribute specifies the URL at which the XML file to import will be found. This can be
- a relative URL, which will be resolved relative to the descriptor containing the
- <literal>import</literal> element, or an absolute URL. Relative URLs can be written
- without a protocol/scheme (e.g., <quote>file:</quote>), and without a host machine
- name. In this case the relative URL might look something like
- <literal>org/apache/myproj/MyTypeSystem.xml</literal>.</para>
-
- <para>An absolute URL is written with one of the following prefixes, followed by a path
- such as <literal>org/apache/myproj/MyTypeSystem.xml</literal>:
-
- <itemizedlist spacing="compact"><listitem><para>file:/ ← has no network
- address</para></listitem>
- <listitem><para>file:/// ← has an empty network address</para></listitem>
- <listitem><para>file://some.network.address/</para></listitem>
- </itemizedlist></para>
-
- <para>For more information about URLs, please read the javadoc information for the Java
- class <quote>URL</quote>.</para>
-
- <para>If an import specifies a <literal>name</literal> attribute, the value of that
- attribute should take the form of a Java-style dotted name (e.g.
- <literal>org.apache.myproj.MyTypeSystem</literal>). An .xml file with this name
- will be searched for in the classpath or datapath (described below). As in Java, the dots
- in the name will be converted to file path separators. So an import specifying the
- example name in this paragraph will result in a search for
- <literal>org/apache/myproj/MyTypeSystem.xml</literal> in the classpath or
- datapath.</para>
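-
- <para>As an illustration only, the fragment below shows both import forms side by side
- inside the <literal>imports</literal> element of a type system descriptor. The file name
- <literal>BaseTypes.xml</literal> is hypothetical, and the dotted name is the example
- name used in this section; substitute names that exist in your own project.
-
-
- <programlisting><![CDATA[]]></programlisting></para>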
-
- <para id="&tp;datapath">The datapath works similarly to the classpath but can be set programmatically
- through the resource manager API. Application developers can specify a datapath
- during initialization, using the following code:
-
-
- <programlisting>
-// desc is an already-parsed descriptor; yourPathString is the datapath to use
-ResourceManager resMgr = UIMAFramework.newDefaultResourceManager();
-resMgr.setDataPath(yourPathString);
-AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(desc, resMgr, null);
-</programlisting></para>
-
- <para>The default datapath for the entire JVM can be set via the
- <literal>uima.datapath</literal> Java system property, but this feature should
- only be used for standalone applications that don't need to run in the same JVM as
- other code that may need a different datapath.</para>
- <para>Previous versions of UIMA also supported XInclude. That support didn't work in
- many situations, and it is no longer supported. To include other files, please use
- <import>.</para>
- <!--
- <para>The UIMA SDK also supports XInclude, a W3C candidate recommendation,
- to include XML files within other XML files. However, it is recommended that the import syntax be used instead, as it
- is more flexible and better supports tool developers.</para>
-
- <note><para>UIMA tools for editing XML
- descriptors do not support the use of xi:include because they cannot correctly
- determine what parts of a descriptor are updatable, and what parts are included
- from other files. They do support the
- use of <import>.
- </para></note>
-
- <para>To use XInclude, you first must include the XInclude
- namespace in your document's root element, e.g.:</para>
-
- <programlisting><analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier" xmlns:xi="http://www.w3.org/2001/XInclude"></programlisting>
-
- <para>Then, you can include a file using the syntax <literal><xi:include
- href="[URL]"/></literal></para>
-
- <para>where [URL] can be any relative or absolute URL referring
- to another XML document. The referred-to
- document must be a valid XML document, meaning that it must consist of exactly
- one root element and must define all of the namespace prefixes that it uses. The default namespace (generally <literal>http://uima.apache.org/resourceSpecifier</literal>) will be
- inherited from the parent document. When UIMA parses the XML document, it will automatically replace the <literal><xi:include> </literal>element with the entire XML document
- referred to by the href. For more
- information on XInclude see
- <a href="http://www.w3.org/TR/xinclude/">http://www.w3.org/TR/xinclude/</a>.</para>
- -->
-
- </section>
-
- <section id="&tp;type_system">
- <title>Type System Descriptors</title>
-
- <para>A Type System Descriptor is used to define the types and features that can be
- represented in the CAS. A Type System Descriptor can be imported into an Analysis Engine
- or Collection Processing Component Descriptor.</para>
-
- <para>The basic structure of a Type System Descriptor is as follows:
-
-
- <programlisting><![CDATA[<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
-
- <name> [String] </name>
- <description>[String]</description>
- <version>[String]</version>
- <vendor>[String]</vendor>
-
- <imports>
- <import ...>
- ...
- </imports>
-
- <types>
- <typeDescription>
- ...
- </typeDescription>
-
- ...
-
- </types>
-
-</typeSystemDescription>]]></programlisting></para>
-
- <para>All of the subelements are optional.</para>
-
- <section id="&tp;type_system.imports">
- <title>Imports</title>
-
- <para>The <literal>imports</literal> section allows this descriptor to import
- types from other type system descriptors. The import syntax is described in <xref
- linkend="&tp;imports"/>. A type system may import any number of other type
- systems and then define additional types which refer to imported types. Circular
- imports are allowed.</para>
- </section>
-
- <section id="&tp;type_system.types">
- <title>Types</title>
-
- <para>The <literal>types</literal> element contains zero or more
- <literal>typeDescription</literal> elements. Each
- <literal>typeDescription</literal> has the form:
-
-
- <programlisting><![CDATA[<typeDescription>
- <name>[TypeName]</name>
- <description>[String]</description>
- <supertypeName>[TypeName]</supertypeName>
- <features>
- ...
- </features>
-</typeDescription>]]></programlisting></para>
-
- <para>The name element contains the name of the type. A
- <literal>[TypeName]</literal> is a dot-separated list of names, where each name
- consists of a letter followed by any number of letters, digits, or underscores.
- <literal>TypeNames</literal> are case sensitive. Letter and digit are as defined
- by Java; therefore, any Unicode letter or digit may be used (subject to the character
- encoding defined by the descriptor file's XML header). The name following the
- final dot is considered to be the <quote>short name</quote> of the type; the
- preceding portion is the namespace (analogous to the package.class syntax used in
- Java). Namespaces beginning with uima are reserved and should not be used. Examples
- of valid type names are:</para>
-
- <itemizedlist spacing="compact"><listitem><para>test.TokenAnnotation</para>
- </listitem>
-
- <listitem><para>org.myorg.TokenAnnotation</para></listitem>
-
- <listitem><para>com.my_company.proj123.TokenAnnotation </para></listitem>
- </itemizedlist>
-
- <para>These would all be considered distinct types since they have different
- namespaces. Best practice here is to follow the normal Java naming conventions of
- having namespaces be all lowercase, with the short type names having an initial
- capital, but this is not mandated, so <literal>ABC.mYtyPE</literal> is an allowed
- type name. Type names without namespaces (e.g.
- <literal>TokenAnnotation</literal> alone) are allowed but discouraged, because
- naming conflicts can then result when combining annotators that use different
- type systems.</para>
-
- <para>The <literal>description</literal> element contains a textual description
- of the type. The <literal>supertypeName</literal> element contains the name of the
- type from which it inherits (this can be set to the name of another user-defined type,
- or it may be set to any built-in type which may be subclassed, such as
- <literal>uima.tcas.Annotation</literal> for a new annotation
- type or <literal>uima.cas.TOP</literal> for a new type that is not
- an annotation). All three of these elements are required.</para>
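-
- <para>A filled-in declaration might look like the following sketch. The type name
- <literal>org.myorg.TokenAnnotation</literal> is borrowed from the list of valid names
- above; the description text is invented for illustration. The
- <literal>features</literal> element is left out because this example type introduces
- no new features.
-
-
- <programlisting><![CDATA[<typeDescription>
-  <name>org.myorg.TokenAnnotation</name>
-  <description>A single token in the document text.</description>
-  <!-- built-in supertype to use for a new annotation type -->
-  <supertypeName>uima.tcas.Annotation</supertypeName>
-</typeDescription>]]></programlisting></para>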
-
- </section>
-
- <section id="&tp;type_system.features">
- <title>Features</title>
-
- <para>The <literal>features</literal> element of a
- <literal>typeDescription</literal> is required only if the type we are specifying
- introduces new features. If the <literal>features</literal> element is present,
- it contains zero or more <literal>featureDescription</literal> elements, each of
- which has the form:</para>
-
-
- <programlisting><![CDATA[<featureDescription>
- <name>[Name]</name>
- <description>[String]</description>
- <rangeTypeName>[Name]</rangeTypeName>
- <elementType>[Name]</elementType>
- <multipleReferencesAllowed>true|false</multipleReferencesAllowed>
-</featureDescription>]]></programlisting>
-
- <para>A feature's name follows the same rules as a type short name – a letter
- followed by any number of letters, digits, or underscores. Feature names are case
- sensitive.</para>
-
- <para>The feature's <literal>rangeTypeName</literal> specifies the type of
- value that the feature can take. This may be the name of any type defined in your type
- system, or one of the predefined types. All of the predefined types have names that are
- prefixed with <literal>uima.cas</literal> or <literal>uima.tcas</literal>,
- for example:
-
-
- <programlisting>uima.cas.TOP
-uima.cas.String
-uima.cas.Long
-uima.cas.FSArray
-uima.cas.StringList
-uima.tcas.Annotation</programlisting>
- For a complete list of predefined types, see the CAS API documentation.</para>
-
- <para>The <literal>elementType</literal> of a feature is optional, and applies only
- when the <literal>rangeTypeName</literal> is
- <literal>uima.cas.FSArray</literal> or <literal>uima.cas.FSList</literal>.
- The <literal>elementType</literal> specifies what type of value can be assigned as
- an element of the array or list. This must be the name of a non-primitive type. If
- omitted, it defaults to <literal>uima.cas.TOP</literal>, meaning that any
- FeatureStructure can be assigned as an element of the array or list. Note: depending on
- the CAS Interface that you use in your code, this constraint may or may not be
- enforced.</para>
-
- <para>The <literal>multipleReferencesAllowed</literal> feature is optional, and
- applies only when the <literal>rangeTypeName</literal> is an array or list type (it
- applies to arrays and lists of primitive as well as non-primitive types). Setting
- this to false (the default) indicates that this feature has exclusive ownership of
- the array or list, so changes to the array or list are localized. Setting this to true
- indicates that the array or list may be shared, so changes to it may affect other
- objects in the CAS. Note: there is currently no guarantee that the framework will
- enforce this restriction. However, this setting may affect how the CAS is
- serialized.</para>
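-
- <para>As a sketch, a feature that holds an array of annotations could be declared as
- shown here. The feature name <literal>tokens</literal> and the element type
- <literal>org.myorg.TokenAnnotation</literal> are assumptions made for this example
- and are not defined by UIMA.
-
-
- <programlisting><![CDATA[<featureDescription>
-  <name>tokens</name>
-  <description>The tokens covered by this annotation.</description>
-  <rangeTypeName>uima.cas.FSArray</rangeTypeName>
-  <!-- applies only because the range type is an FSArray or FSList -->
-  <elementType>org.myorg.TokenAnnotation</elementType>
-  <!-- false (the default): this feature has exclusive ownership of the array -->
-  <multipleReferencesAllowed>false</multipleReferencesAllowed>
-</featureDescription>]]></programlisting></para>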
-
- </section>
-
- <section id="&tp;type_system.string_subtypes">
- <title>String Subtypes</title>
-
- <para>There is one other special type that you can declare – a subset of the String
- type that specifies a restricted set of allowed values. This is useful for features
- that can have only certain String values, such as parts of speech. Here is an example of
- how to declare such a type:</para>
-
-
- <programlisting><![CDATA[<typeDescription>
- <name>PartOfSpeech</name>
- <description>A part of speech.</description>
- <supertypeName>uima.cas.String</supertypeName>
- <allowedValues>
- <value>
- <string>NN</string>
- <description>Noun, singular or mass.</description>
- </value>
- <value>
- <string>NNS</string>
- <description>Noun, plural.</description>
- </value>
- <value>
- <string>VB</string>
- <description>Verb, base form.</description>
- </value>
- ...
- </allowedValues>
-</typeDescription>]]></programlisting>
-
- </section>
- </section>
-
- <section id="&tp;aes">
- <title>Analysis Engine Descriptors</title>
-
- <para>Analysis Engine (AE) descriptors completely describe Analysis Engines. There
- are two basic types of Analysis Engines – <emphasis>Primitive</emphasis> and
- <emphasis>Aggregate</emphasis>. A <emphasis>Primitive</emphasis> Analysis
- Engine is a container for a single <emphasis>annotator</emphasis>, whereas an
- <emphasis>Aggregate</emphasis> Analysis Engine is composed of a collection of other
- Analysis Engines. (For more information on this and other terminology, see <olink
- targetdoc="&uima_docs_overview;" targetptr="ugr.ovv.conceptual"/>).</para>
-
- <para>Both Primitive and Aggregate Analysis Engines have descriptors, and the two types
- of descriptors have some similarities and some differences. <xref linkend="&tp;aes.primitive"/>
- discusses Primitive Analysis Engine descriptors. <xref linkend="&tp;aes.aggregate"/> then
- describes how Aggregate Analysis Engine descriptors are different.</para>
-
- <section id="&tp;aes.primitive">
- <title>Primitive Analysis Engine Descriptors</title>
-
- <section id="&tp;aes.primitive.basic">
- <title>Basic Structure</title>
-
-
- <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
-<analysisEngineDescription
- xmlns="http://uima.apache.org/resourceSpecifier">
- <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
-
- <primitive>true</primitive>
- <annotatorImplementationName> [String] </annotatorImplementationName>
-
- <analysisEngineMetaData>
- ...
- </analysisEngineMetaData>
-
- <externalResourceDependencies>
- ...
- </externalResourceDependencies>
-
- <resourceManagerConfiguration>
- ...
- </resourceManagerConfiguration>
-
-</analysisEngineDescription>]]></programlisting>
-
- <para>The document begins with a standard XML header. The recommended root tag is
- <literal><analysisEngineDescription></literal>, although
- <literal><taeDescription></literal> is also allowed for backwards
- compatibility.</para>
-
- <para>Within the root element we declare that we are using the XML namespace
- <literal>http://uima.apache.org/resourceSpecifier</literal>. It is
- required that this namespace be used; otherwise, the descriptor will not be able to
- be validated for errors.</para>
-
- <para> The first subelement,
- <literal><frameworkImplementation></literal>, currently must have
- the value <literal>org.apache.uima.java</literal>, or
- <literal>org.apache.uima.cpp</literal>. In future versions, there may be
- other framework implementations, or perhaps implementations produced by other
- vendors.</para>
-
- <para>The second subelement, <literal><primitive></literal>, contains
- the Boolean value <literal>true</literal>, indicating that this XML document
- describes a <emphasis>Primitive</emphasis> Analysis Engine.</para>
-
- <para>The next subelement,
- <literal><annotatorImplementationName></literal>, is how the UIMA framework
- determines which annotator class to use. This should contain a fully-qualified
- Java class name for Java implementations, or the name of a .dll or .so file for C++
- implementations.</para>
-
- <para>The <literal><analysisEngineMetaData></literal> object contains
- descriptive information about the analysis engine and what it does. It is
- described in <xref linkend="&tp;aes.metadata"/>.</para>
-
- <para>The <literal><externalResourceDependencies></literal> and
- <literal><resourceManagerConfiguration></literal> elements declare
- the external resource files that the analysis engine relies
- upon. They are optional and are described in <xref
- linkend="&tp;aes.primitive.external_resource_dependencies"/> and <xref
- linkend="&tp;aes.primitive.resource_manager_configuration"/>.</para>
-
- </section>
-
- <section id="&tp;aes.metadata">
- <title>Analysis Engine MetaData</title>
-
-
- <programlisting><![CDATA[<analysisEngineMetaData>
- <name> [String] </name>
- <description>[String]</description>
- <version>[String]</version>
- <vendor>[String]</vendor>
-
- <configurationParameters> ... </configurationParameters>
-
- <configurationParameterSettings>
- ...
- </configurationParameterSettings>
-
- <typeSystemDescription> ... </typeSystemDescription>
-
- <typePriorities> ... </typePriorities>
-
- <fsIndexCollection> ... </fsIndexCollection>
-
- <capabilities> ... </capabilities>
-
- <operationalProperties> ... </operationalProperties>
-
-</analysisEngineMetaData>]]></programlisting>
-
- <para>The <literal>analysisEngineMetaData</literal> element contains four
- simple string fields – <literal>name</literal>,
- <literal>description</literal>, <literal>version</literal>, and
- <literal>vendor</literal>. Only the <literal>name</literal> field is
- required, but providing values for the other fields is recommended. The
- <literal>name</literal> field is just a descriptive name meant to be read by
- users; it does not need to be unique across all Analysis Engines.</para>
-
- <para>The other sub-elements –
- <literal>configurationParameters</literal>,
- <literal>configurationParameterSettings</literal>,
- <literal>typeSystemDescription</literal>,
- <literal>typePriorities</literal>, <literal>fsIndexCollection</literal>,
- <literal>capabilities</literal> and
- <literal>operationalProperties</literal> are described in the following
- sections. The only one of these that is required is
- <literal>capabilities</literal>; the others are optional.</para>
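-
- <para>For illustration, a small metadata block that supplies the four string fields
- and the required <literal>capabilities</literal> element might look like this. The
- name, description, version, and vendor values are invented, and the content of
- <literal>capabilities</literal> is elided using the notation of this chapter.
-
-
- <programlisting><![CDATA[<analysisEngineMetaData>
-  <!-- name is the only required string field; it need not be unique -->
-  <name>Example Tokenizer</name>
-  <description>Splits document text into tokens.</description>
-  <version>1.0</version>
-  <vendor>Example Vendor</vendor>
-
-  <!-- the only required element among the remaining sub-elements -->
-  <capabilities> ... </capabilities>
-</analysisEngineMetaData>]]></programlisting></para>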
-
- </section>
-
- <section id="&tp;aes.configuration_parameter_declaration">
- <title>Configuration Parameter Declaration</title>
-
- <para>Configuration Parameters are made available to annotator
- implementations and applications by the following interfaces:
- <literal>AnnotatorContext</literal> <footnote><para>Deprecated; use
- UimaContext instead.</para></footnote> (passed as an argument to the
- initialize() method of a version 1 annotator),
- <literal>ConfigurableResource</literal> (every Analysis Engine
- implements this interface), and <literal>UimaContext</literal> (passed
- as an argument to the initialize() method of a version 2 annotator; you can also
- obtain it from any resource, including Analysis Engines, by calling
- <literal>getUimaContext()</literal>).</para>
-
- <para>Use AnnotatorContext within version 1 annotators and UimaContext for
- version 2 annotators and outside of annotators (for instance, in CasConsumers,
- or the containing application) to access configuration parameters.</para>
-
- <para>Configuration parameters are set from the corresponding elements in the
- XML descriptor for the application. If you need to programmatically change
- parameter settings within an application, you can use methods in
- ConfigurableResource; if you do this, you need to call reconfigure()
- afterwards to have the UIMA framework notify all the contained analysis
- components that the parameter configuration has changed (the analysis
- engine's reinitialize() methods will be called). Note that in the current
- implementation, only integrated deployment components have configuration
- parameters passed to them; remote components obtain their parameters from
- their remote startup environment. This will likely change in the
- future.</para>
-
- <para>There are two ways to specify the
- <literal><configurationParameters></literal> section – as a
- list of configuration parameters or a list of groups. A list of parameters, which
- are not part of any group, looks like this:
-
-
- <programlisting><![CDATA[<configurationParameters>
- <configurationParameter>
- <name>[String]</name>
- <description>[String]</description>
- <type>String|Integer|Float|Boolean</type>
- <multiValued>true|false</multiValued>
- <mandatory>true|false</mandatory>
- <overrides>
- <parameter>[String]</parameter>
- <parameter>[String]</parameter>
- ...
- </overrides>
- </configurationParameter>
- <configurationParameter>
- ...
- </configurationParameter>
- ...
-</configurationParameters>]]></programlisting></para>
-
- <para>For each configuration parameter, the following are specified:</para>
-
- <itemizedlist><listitem><para><emphasis role="bold">name</emphasis>
- – the name by which the annotator code refers to the parameter
- (required). All parameters declared in an analysis engine descriptor must have
- distinct names. The name is composed of normal Java identifier characters.</para>
- </listitem>
-
- <listitem><para><emphasis role="bold">description</emphasis> – a
- natural language description of the intent of the parameter
- (optional)</para></listitem>
-
- <listitem><para><emphasis role="bold">type</emphasis> – the data
- type of the parameter's value – must be one of
- <literal>String</literal>, <literal>Integer</literal>,
- <literal>Float</literal>, or <literal>Boolean</literal>
- (required).</para></listitem>
-
- <listitem><para><emphasis role="bold">multiValued</emphasis> –
- <literal>true</literal> if the parameter can take multiple values (an
- array), <literal>false</literal> if the parameter takes only a single value
- (optional, defaults to false).</para></listitem>
-
- <listitem><para><emphasis role="bold">mandatory</emphasis> –
- <literal>true</literal> if a value must be provided for the parameter
- (optional, defaults to false).</para></listitem>
-
- <listitem><para><emphasis role="bold">overrides</emphasis> – this
- is used only in aggregate Analysis Engines, but is included here for
- completeness. See <xref
- linkend="&tp;aes.aggregate.configuration_parameter_overrides"/>
- for a discussion of configuration parameter overriding in aggregate
- Analysis Engines. (optional) </para></listitem></itemizedlist>
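-
- <para>Putting these fields together, a declaration of one ungrouped parameter could
- look like the sketch below. The parameter name <literal>MaxTokenLength</literal> and
- its description are hypothetical. The <literal>overrides</literal> element is omitted
- because it is used only in aggregate Analysis Engines.
-
-
- <programlisting><![CDATA[<configurationParameters>
-  <configurationParameter>
-    <!-- the name the annotator code uses to look up the value -->
-    <name>MaxTokenLength</name>
-    <description>Tokens longer than this are discarded.</description>
-    <type>Integer</type>
-    <!-- single value, not an array -->
-    <multiValued>false</multiValued>
-    <!-- a value must be provided for this parameter -->
-    <mandatory>true</mandatory>
-  </configurationParameter>
-</configurationParameters>]]></programlisting></para>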
-
- <para>A list of groups looks like this:
-
-
- <programlisting><![CDATA[<configurationParameters defaultGroup="[String]"
- searchStrategy="none|default_fallback|language_fallback" >
-
- <commonParameters>
- [zero or more parameters]
- </commonParameters>
-
- <configurationGroup names="name1 name2 name3 ...">
- [zero or more parameters]
- </configurationGroup>
-
- <configurationGroup names="name4 name5 ...">
- [zero or more parameters]
- </configurationGroup>
-
- ...
-
-</configurationParameters>]]></programlisting></para>
-
- <para>Both the <literal><commonParameters></literal> and
- <literal><configurationGroup></literal> elements contain zero or
- more <literal><configurationParameter></literal> elements, with
- the same syntax described above.</para>
-
- <para>The <literal><commonParameters></literal> element declares
- parameters that exist in all groups. Each
- <literal><configurationGroup></literal> element has a names
- attribute, which contains a list of group names separated by whitespace (space
- or tab characters). Names consist of any number of non-whitespace characters;
- however the Component Descriptor Editor tool restricts this to be normal Java
- identifiers, including the period (.) and the dash (-). One configuration group
- will be created for each name, and all of the groups will contain the same set of
- parameters.</para>
-
- <para>The <literal>defaultGroup</literal> attribute specifies the name of the
- group to be used in the case where an annotator does a lookup for a configuration
- parameter without specifying a group name. It may also be used as a fallback if the
- annotator specifies a group that does not exist – see below.</para>
-
- <para>The <literal>searchStrategy</literal> attribute determines the action
- to be taken when the context is queried for the value of a parameter belonging to a
- particular configuration group, if that group does not exist or does not contain
- a value for the requested parameter. There are currently three possible values:
-
- <itemizedlist><listitem><para><emphasis role="bold">none</emphasis>
- – there is no fallback; return null if there is no value in the exact group
- specified by the user.</para></listitem>
-
- <listitem><para><emphasis role="bold">default_fallback</emphasis>
- – if there is no value found in the specified group, look in the default
- group (as defined by the <literal>defaultGroup</literal> attribute).</para>
- </listitem>
-
- <listitem><para><emphasis role="bold">language_fallback</emphasis>
- – this setting allows for a specific use of configuration parameter
- groups where the group names correspond to ISO language and country codes
- (for an example, see below). The fallback sequence is:
- <literal><lang>_<country>_<region> →
- <lang>_<country> → <lang> →
- <default></literal>.</para></listitem></itemizedlist>
- </para>
-
- <section id="&tp;aes.configuration_parameter_declaration.example">
- <title>Example</title>
-
-
- <programlisting><![CDATA[<configurationParameters defaultGroup="en"
- searchStrategy="language_fallback">
-
- <commonParameters>
- <configurationParameter>
- <name>DictionaryFile</name>
- <description>Location of dictionary for this
- language</description>
- <type>String</type>
- <multiValued>false</multiValued>
- <mandatory>false</mandatory>
- </configurationParameter>
- </commonParameters>
-
- <configurationGroup names="en de en-US"/>
-
- <configurationGroup names="zh">
- <configurationParameter>
- <name>DBC_Strategy</name>
- <description>Strategy for dealing with double-byte
- characters.</description>
- <type>String</type>
- <multiValued>false</multiValued>
- <mandatory>false</mandatory>
- </configurationParameter>
- </configurationGroup>
-
-</configurationParameters>]]></programlisting>
-
- <para>In this example, we are declaring a <literal>DictionaryFile</literal>
- parameter that can have a different value for each of the languages that our AE
- supports
- – English (general), German, U.S. English, and Chinese. For Chinese
- only, we also declare a <literal>DBC_Strategy</literal>
- parameter.</para>
-
- <para>We are using the <literal>language_fallback</literal> search
- strategy, so if an annotator requests the dictionary file for the
- <literal>en-GB</literal> (British English) group, we will fall back to the
- more general <literal>en</literal> group.</para>
-
- <para>Since we have defined <literal>en</literal> as the default group, this
- value will be returned if the context is queried for the
- <literal>DictionaryFile</literal> parameter without specifying any
- group name, or if a nonexistent group name is specified.</para>
- </section>
- </section>
-
- <section id="&tp;aes.configuration_parameter_settings">
- <title>Configuration Parameter Settings</title>
-
- <para>If no configuration groups were declared, the
- <literal><configurationParameterSettings></literal> element
- looks like this:
-
-
- <programlisting><![CDATA[<configurationParameterSettings>
- <nameValuePair>
- <name>[String]</name>
- <value>
- <string>[String]</string> |
- <integer>[Integer]</integer> |
- <float>[Float]</float> |
- <boolean>true|false</boolean> |
- <array> ... </array>
- </value>
- </nameValuePair>
-
- <nameValuePair>
- ...
- </nameValuePair>
- ...
-</configurationParameterSettings>]]></programlisting></para>
-
- <para>There are zero or more <literal>nameValuePair</literal> elements. Each
- <literal>nameValuePair</literal> contains a name (which refers to one of the
- configuration parameters) and a value for that parameter.</para>
-
- <para>The <literal>value</literal> element contains an element that matches
- the type of the parameter. For single-valued parameters, this is either
- <literal><string></literal>, <literal><integer></literal>
- , <literal><float></literal>, or
- <literal><boolean></literal>. For multi-valued parameters, this is
- an <literal><array></literal> element, which then contains zero or
- more instances of the appropriate type of primitive value, e.g.:
-
-
- <programlisting><array><string>One</string><string>Two</string></array></programlisting></para>
-
- <para>If configuration groups were declared, then the
- <literal><configurationParameterSettings></literal> element
- looks like this:
-
-
- <programlisting><![CDATA[<configurationParameterSettings>
-
- <settingsForGroup name="[String]">
- [one or more <nameValuePair> elements]
- </settingsForGroup>
-
- <settingsForGroup name="[String]">
- [one or more <nameValuePair> elements]
- </settingsForGroup>
-
-...
-
-</configurationParameterSettings>]]></programlisting>
- where each <literal><settingsForGroup></literal> element has a name
- that matches one of the configuration groups declared under the
- <literal><configurationParameters></literal> element and contains
- the parameter settings for that group.</para>
-
- <section id="&tp;aes.configuration_parameter_settings.example">
- <title>Example</title>
-
- <para>Here are the settings that correspond to the parameter declarations in
- the previous example:
-
-
- <programlisting><![CDATA[<configurationParameterSettings>
-
- <settingsForGroup name="en">
- <nameValuePair>
- <name>DictionaryFile</name>
- <value><string>resourcesEnglishdictionary.dat</string></value>
- </nameValuePair>
- </settingsForGroup>
-
- <settingsForGroup name="en-US">
- <nameValuePair>
- <name>DictionaryFile</name>
- <value><string>resourcesEnglish_USdictionary.dat</string></value>
- </nameValuePair>
- </settingsForGroup>
-
- <settingsForGroup name="de">
- <nameValuePair>
- <name>DictionaryFile</name>
- <value><string>resourcesDeutschdictionary.dat</string></value>
- </nameValuePair>
- </settingsForGroup>
-
- <settingsForGroup name="zh">
- <nameValuePair>
- <name>DictionaryFile</name>
- <value><string>resourcesChinesedictionary.dat</string></value>
- </nameValuePair>
-
- <nameValuePair>
- <name>DBC_Strategy</name>
- <value><string>default</string></value>
- </nameValuePair>
-
- </settingsForGroup>
-
-</configurationParameterSettings>]]></programlisting></para>
- </section>
- </section>
-
- <section id="&tp;aes.type_system">
- <title>Type System Definition</title>
-
-
- <programlisting><![CDATA[<typeSystemDescription>
-
- <name> [String] </name>
- <description>[String]</description>
- <version>[String]</version>
- <vendor>[String]</vendor>
-
- <imports>
- <import ...>
- ...
- </imports>
-
- <types>
- <typeDescription>
- ...
- </typeDescription>
-
- ...
-
- </types>
-
-</typeSystemDescription>]]></programlisting>
-
- <para>A <literal>typeSystemDescription</literal> element defines a type
- system for an Analysis Engine. The syntax for the element is described in <xref
- linkend="&tp;type_system"/>.</para>
-
- <para>The recommended usage is to <literal>import</literal> an external type
- system, using the import syntax described in <xref linkend="&tp;imports"/>
- of this chapter. For example:
-
-
- <programlisting><typeSystemDescription>
- <imports>
- <import location="MySharedTypeSystem.xml"/>
- </imports>
-</typeSystemDescription></programlisting></para>
-
- <para>This allows several AEs to share a single type system definition. The file
- <literal>MySharedTypeSystem.xml</literal> would then contain the full
- type system information, including the <literal>name</literal>,
- <literal>description</literal>, <literal>vendor</literal>,
- <literal>version</literal>, and <literal>types</literal>.</para>
-
- </section>
- <section id="&tp;aes.type_priority">
- <title>Type Priority Definition</title>
-
-
- <programlisting><![CDATA[<typePriorities>
- <name> [String] </name>
- <description>[String]</description>
- <version>[String]</version>
- <vendor>[String]</vendor>
-
- <imports>
- <import ...>
- ...
- </imports>
-
- <priorityLists>
- <priorityList>
- <type>[TypeName]</type>
- <type>[TypeName]</type>
- ...
- </priorityList>
-
- ...
-
- </priorityLists>
-</typePriorities>]]></programlisting>
-
- <para>The <literal><typePriorities></literal> element contains
- zero or more <literal><priorityList></literal> elements; each
- <literal><priorityList></literal> contains zero or more types.
- Like a type system, a type priorities definition may also declare a name,
- description, version, and vendor, and may import other type priorities. See
- <xref linkend="&tp;imports"/> for the import syntax.</para>
-
- <para>Type priority is used when iterating over feature structures in the CAS.
- For example, if the CAS contains a <literal>Sentence</literal> annotation
- and a <literal>Paragraph</literal> annotation with the same span of text
- (i.e. a one-sentence paragraph), which annotation should be returned first
- by an iterator? Probably the Paragraph, since it is conceptually
- <quote>bigger,</quote> but the framework does not know that and must be
- explicitly told that the Paragraph annotation has priority over the Sentence
- annotation, like this:
-
-
- <programlisting><typePriorities>
- <priorityList>
- <type>org.myorg.Paragraph</type>
- <type>org.myorg.Sentence</type>
- </priorityList>
-</typePriorities></programlisting></para>
-
- <para>All of the <literal><priorityList></literal> elements defined
- in the descriptor (and in all component descriptors of an aggregate analysis
- engine descriptor) are merged to produce a single priority list.</para>
-
- <para>Subtypes of types specified here are also ordered, unless overridden by
- another user-specified type ordering. For example, if you specify type A
- comes before type B, then subtypes of A will come before subtypes of B, unless
- there is an overriding specification which declares some subtype of B comes
- before some subtype of A.</para>
-
- <para>If there are inconsistencies between the priority list (type A declared
- before type B in one priority list, and type B declared before type A in
- another), the framework will throw an exception.</para>
-
- <para>User defined indexes may declare if they wish to use the type priority or
- not; see the next section.</para>
- </section>
-
- <section id="&tp;aes.index">
- <title>Index Definition</title>
-
-
- <programlisting><![CDATA[<fsIndexCollection>
-
- <name>[String]</name>
- <description>[String]</description>
- <version>[String]</version>
- <vendor>[String]</vendor>
-
- <imports>
- <import ...>
- ...
- </imports>
-
- <fsIndexes>
-
- <fsIndexDescription>
- ...
- </fsIndexDescription>
-
- <fsIndexDescription>
- ...
- </fsIndexDescription>
-
- </fsIndexes>
-
-</fsIndexCollection>]]></programlisting>
-
- <para>The <literal>fsIndexCollection</literal> element declares <emphasis>Feature Structure
- Indexes</emphasis>, each of which defines an index that holds feature structures of a given type.
- Information in the CAS is always accessed through an index. There is a built-in default annotation
- index declared which can be used to access instances of type
- <literal>uima.tcas.Annotation</literal> (or its subtypes), sorted based on their
- <literal>begin</literal> and <literal>end</literal> features. For all other types, there is a
- default, unsorted (bag) index. If there is a need for a specialized index it must be declared in this
- element of the descriptor. See <olink targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.cas.indexes_and_iterators"/> for details on FS indexes.</para>
-
- <para>Like type systems and type priorities, an
- <literal>fsIndexCollection</literal> can declare a
- <literal>name</literal>, <literal>description</literal>,
- <literal>vendor</literal>, and <literal>version</literal>, and may
- import other <literal>fsIndexCollection</literal>s. The import syntax is
- described in <xref linkend="&tp;imports"/>.</para>
-
- <para>An <literal>fsIndexCollection</literal> may also define zero or more
- <literal>fsIndexDescription</literal> elements, each of which defines a
- single index. Each <literal>fsIndexDescription</literal> has the form:
-
-
- <programlisting><![CDATA[<fsIndexDescription>
-
- <label>[String]</label>
- <typeName>[TypeName]</typeName>
- <kind>sorted|bag|set</kind>
-
- <keys>
-
- <fsIndexKey>
- <featureName>[Name]</featureName>
- <comparator>standard|reverse</comparator>
- </fsIndexKey>
-
- <fsIndexKey>
- <typePriority/>
- </fsIndexKey>
-
- ...
-
- </keys>
-</fsIndexDescription>]]></programlisting></para>
-
- <para>The <literal>label</literal> element defines the name by which
- applications and annotators refer to this index. The
- <literal>typeName</literal> element contains the name of the type that will
- be contained in this index. This must match one of the type names defined in the
- <literal><typeSystemDescription></literal>.</para>
-
- <para>There are three possible values for the
- <literal><kind></literal> of index. Sorted indexes enforce an
- ordering of feature structures, and may contain duplicates. Bag indexes do
- not enforce ordering, and also may contain duplicates. Set indexes do not
- enforce ordering and may not contain duplicates. If the <literal><kind></literal> element is omitted, it will default to
- sorted, which is the most common type of index.</para>
-
- <note><para>There is usually no need to explicitly declare a Bag index in your descriptor.
- As of UIMA v2.1, if you do not declare any index for a type (or any of its
- supertypes), a Bag index will be automatically created.</para></note>
-
- <para>An index may define one or more <emphasis>keys</emphasis>. These keys
- determine the sort order of the feature structures within a sorted index, and
- determine equality for set indexes. Bag indexes do not use keys. Keys are
- ordered by precedence – the first key is evaluated first, and
- subsequent keys are evaluated only if necessary.</para>
-
- <para>Each key is represented by an <literal>fsIndexKey</literal> element.
- Most <literal>fsIndexKey</literal> elements contain a
- <literal>featureName</literal> and a <literal>comparator</literal>.
- The <literal>featureName</literal> must match the name of one of the
- features for the type specified in the
- <literal><typeName></literal> element for this index. The
- comparator defines how the features will be compared – a value of
- <literal>standard</literal> means that features will be compared using the
- standard comparison for their data type (e.g. for numerical types, smaller
- values precede larger values, and for string types, Unicode string
- comparison is performed). A value of <literal>reverse</literal> means that
- features will be compared using the reverse of the standard comparison (e.g.
- for numerical types, larger values precede smaller values, etc.). For Set
- indexes, the comparator direction is ignored – the keys are only used
- for the equality testing.</para>
-
- <para>Each key used in comparisons must refer to a feature whose range type is
- String, Float, or Integer.</para>
-
- <para>There is a second type of key, one which contains only the
- <literal><typePriority/></literal>. When this key is used, it
- indicates that Feature Structures will be compared using the type priorities
- declared in the <literal><typePriorities></literal> section of the
- descriptor.</para>
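-
- <para>As an illustration, the following sketch declares a sorted index over
- the <literal>org.myorg.TokenAnnotation</literal> type. The label
- <literal>org.myorg.TokenIndex</literal> is a hypothetical name chosen for this
- example, and the type is assumed to inherit <literal>begin</literal> and
- <literal>end</literal> from <literal>uima.tcas.Annotation</literal>. The keys
- order feature structures by ascending <literal>begin</literal>, then by
- descending <literal>end</literal>, and finally by type priority:
-
-
- <programlisting><![CDATA[<fsIndexDescription>
-
-   <!-- hypothetical label; annotators look the index up by this name -->
-   <label>org.myorg.TokenIndex</label>
-   <typeName>org.myorg.TokenAnnotation</typeName>
-   <kind>sorted</kind>
-
-   <keys>
-     <!-- first key: smaller begin offsets come first -->
-     <fsIndexKey>
-       <featureName>begin</featureName>
-       <comparator>standard</comparator>
-     </fsIndexKey>
-
-     <!-- second key, consulted only when begin is equal: larger end first -->
-     <fsIndexKey>
-       <featureName>end</featureName>
-       <comparator>reverse</comparator>
-     </fsIndexKey>
-
-     <!-- last key: fall back to the declared type priorities -->
-     <fsIndexKey>
-       <typePriority/>
-     </fsIndexKey>
-   </keys>
- </fsIndexDescription>]]></programlisting></para>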
-
- </section>
-
- <section id="&tp;aes.capabilities">
- <title>Capabilities</title>
-
-
- <programlisting><![CDATA[<capabilities>
- <capability>
-
- <inputs>
- <type allAnnotatorFeatures="true|false">[TypeName]</type>
- ...
- <feature>[TypeName]:[Name]</feature>
- ...
- </inputs>
-
- <outputs>
- <type allAnnotatorFeatures="true|false">[TypeName]</type>
- ...
- <feature>[TypeName]:[Name]</feature>
- ...
- </outputs>
-
- <languagesSupported>
- <language>[ISO Language ID]</language>
- ...
- </languagesSupported>
-
- <inputSofas>
- <sofaName>[name]</sofaName>
- ...
- </inputSofas>
-
- <outputSofas>
- <sofaName>[name]</sofaName>
- ...
- </outputSofas>
- </capability>
-
- <capability>
- ...
- </capability>
-
- ...
-
-</capabilities>]]></programlisting>
-
- <para>The capabilities definition is used by the UIMA Framework in several
- ways, including setting up the Results Specification for process calls,
- routing control for aggregates based on language, and as part of the Sofa
- mapping function.</para>
-
- <para>The <literal>capabilities</literal> element contains one or more
- <literal>capability</literal> elements. In Version 2 and onwards, only one
- capability set should be used (multiple sets will continue to work for a while,
- but they are not supported in a logically consistent way).
- <!-- Because you can therefore
- declare multiple capability sets, you can use this to model component behavior
-
- that for a given set of inputs, produces a particular set of outputs. --></para>
-
- <para>Each <literal>capability</literal> contains
- <literal>inputs</literal>, <literal>outputs</literal>,
- <literal>languagesSupported</literal>, <literal>inputSofas</literal>, and
- <literal>outputSofas</literal>. The <literal>inputs</literal> and
- <literal>outputs</literal> elements are required (though they may be empty);
- <literal><languagesSupported></literal>, <literal><inputSofas></literal>,
- and <literal><outputSofas></literal> are optional.</para>
-
- <para>Both inputs and outputs may contain a mixture of type and feature
- elements.</para>
-
- <para><literal><type...></literal> elements contain the name of one
- of the types defined in the type system or one of the built in types. Declaring a
- type as an input means that this component expects instances of this type to be
- in the CAS when it receives it to process. Declaring a type as an output means
- that this component creates new instances of this type in the CAS.</para>
-
- <para>There is an optional attribute
- <literal>allAnnotatorFeatures</literal>, which defaults to false if
- omitted. The Component Descriptor Editor tool defaults this to true when a new
- type is added to the list of inputs and/or outputs. When this attribute is true,
- it specifies that all of the type's features are also declared as input or
- output. Otherwise, the features that are required as inputs or populated as
- outputs must be explicitly specified in feature elements.</para>
-
- <para><literal><feature...></literal> elements contain the
- <quote>fully-qualified</quote> feature name, which is the type name
- followed by a colon, followed by the feature name, e.g.
- <literal>org.myorg.TokenAnnotation:lemma</literal>.
- <literal><feature...></literal> elements in the
- <literal><inputs></literal> section must also have a corresponding
- type declared as an input. In output sections, this is not required. If the type
- is not specified as an output, but a feature for that type is, this means that
- existing instances of the type have the values of the specified features
- updated. Any type mentioned in a <literal><feature></literal>
- element must be either specified as an input or an output or both.</para>
-
- <para><literal>language</literal> elements contain one of the ISO language
- identifiers, such as <literal>en</literal> for English, or
- <literal>en-US</literal> for the United States dialect of English.</para>
-
- <para>The list of language codes can be found here: <ulink
- url="http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt"/>
- and the country codes here:
- <ulink
- url="http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html"/>
- </para>
-
- <para><literal><inputSofas></literal> and
- <literal><outputSofas></literal> declare sofa names used by this
- component. All Sofa names must be unique within a particular capability set. A
- Sofa name must be an input or an output, and cannot be both. It is an error to have a
- Sofa name declared as an input in one capability set, and also have it declared
- as an output in another capability set.</para>
-
- <para>A <literal><sofaName></literal> is written as a simple
- Java-style identifier, without any periods in the name, except that it may be
- written to end in <quote><literal>.*</literal></quote>. If written in this
- manner, it specifies a set of Sofa names, all of which start with the base name
- (the part before the .*) followed by a period and then an arbitrary Java
- identifier (without periods). This form is used to specify in the descriptor
- that the component could generate an arbitrary number of Sofas, the exact
- names and numbers of which are unknown before the component is run.</para>
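-
- <para>To make these rules concrete, here is a minimal sketch of a single
- capability set. The type and feature names reuse the
- <literal>org.myorg</literal> examples from this chapter and are purely
- illustrative. The input feature is accompanied by its type, as required for
- inputs, and the output type declares all of its features at once:
-
-
- <programlisting><![CDATA[<capabilities>
-   <capability>
-
-     <inputs>
-       <!-- an input feature must have its type declared as an input too -->
-       <type>org.myorg.TokenAnnotation</type>
-       <feature>org.myorg.TokenAnnotation:lemma</feature>
-     </inputs>
-
-     <outputs>
-       <!-- true: every feature of the type is declared as an output -->
-       <type allAnnotatorFeatures="true">org.myorg.Sentence</type>
-     </outputs>
-
-     <!-- optional; inputSofas and outputSofas are omitted here -->
-     <languagesSupported>
-       <language>en</language>
-       <language>de</language>
-     </languagesSupported>
-
-   </capability>
- </capabilities>]]></programlisting></para>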
-
- </section>
-
- <section id="&tp;aes.operational_properties">
- <title>OperationalProperties</title>
-
- <para>Components can declare operational properties that can be
- useful in deployment. The following are available:</para>
-
-
- <programlisting><![CDATA[<operationalProperties>
- <modifiesCas> true|false </modifiesCas>
- <multipleDeploymentAllowed> true|false </multipleDeploymentAllowed>
- <outputsNewCASes> true|false </outputsNewCASes>
-</operationalProperties>]]></programlisting>
-
- <para><literal>modifiesCas</literal>, if false, indicates that this
- component does not modify the CAS. If it is not specified, the default value is
- true except for CAS Consumer components.</para>
-
- <para><literal>multipleDeploymentAllowed</literal>, if true, allows the
- component to be deployed multiple times to increase performance through
- scale-out techniques. If it is not specified, the default value is true,
- except for CAS Consumer and Collection Reader components.</para>
-
- <note><para>If you wrap one or more CAS Consumers inside an aggregate as the only
- components, you must explicitly specify in the aggregate the
- <literal>multipleDeploymentAllowed</literal> property as false (assuming the CAS Consumer
- components take the default here); otherwise the framework will complain about inconsistent
- settings for these.</para></note>
-
- <para><literal>outputsNewCASes</literal>, if true, allows the component to
- create new CASes during processing, for example to break a large artifact into
- smaller pieces. See <olink targetdoc="&uima_docs_tutorial_guides;"
- targetptr="ugr.tug.cm"/> for details.</para>
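-
- <para>For instance, an aggregate that wraps only CAS Consumers, as described
- in the note above, might declare its properties along these lines. This is a
- sketch that assumes the wrapped CAS Consumers keep their default settings and
- create no new CASes:
-
-
- <programlisting><![CDATA[<operationalProperties>
-   <!-- assumed: the wrapped CAS Consumers only read the CAS -->
-   <modifiesCas>false</modifiesCas>
-   <!-- stated explicitly so the aggregate agrees with its CAS Consumers -->
-   <multipleDeploymentAllowed>false</multipleDeploymentAllowed>
-   <outputsNewCASes>false</outputsNewCASes>
- </operationalProperties>]]></programlisting></para>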
- </section>
-
- <section id="&tp;aes.primitive.external_resource_dependencies">
- <title>External Resource Dependencies</title>
-
-
- <programlisting><![CDATA[<externalResourceDependencies>
- <externalResourceDependency>
- <key>[String]</key>
- <description>[String] </description>
- <interfaceName>[String]</interfaceName>
- <optional>true|false</optional>
- </externalResourceDependency>
-
- <externalResourceDependency>
- ...
- </externalResourceDependency>
-
- ...
-
-</externalResourceDependencies>]]></programlisting>
-
- <para>A primitive annotator may declare zero or more
- <literal><externalResourceDependency></literal> elements. Each
- dependency has the following elements:
-
- <itemizedlist><listitem><para><literal>key</literal> – the
- string by which the annotator code will attempt to access the resource. Must
- be unique within this annotator.</para></listitem>
-
- <listitem><para><literal>description</literal> – a textual
- description of the dependency</para></listitem>
-
- <listitem><para><literal>interfaceName</literal> – the
- fully-qualified name of the Java interface through which the annotator
- will access the data. This is optional. If not specified, the annotator
- can only get an InputStream to the data.</para></listitem>
-
- <listitem><para><literal>optional</literal> – whether the
- resource is optional. If false, an exception will be thrown if no resource
- is assigned to satisfy this dependency. Defaults to false. </para>
- </listitem></itemizedlist></para>
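-
- <para>A dependency declaration might look like the following sketch. The key
- <literal>DictionaryResource</literal> and the interface
- <literal>org.myorg.mycomponent.Dictionary</literal> are hypothetical names
- used only for illustration:
-
-
- <programlisting><![CDATA[<externalResourceDependencies>
-   <externalResourceDependency>
-     <!-- hypothetical key; the annotator code asks for the resource by it -->
-     <key>DictionaryResource</key>
-     <description>Dictionary consulted by the annotator</description>
-     <!-- hypothetical interface; omit to receive only an InputStream -->
-     <interfaceName>org.myorg.mycomponent.Dictionary</interfaceName>
-     <!-- false: an exception is thrown if nothing is bound to the key -->
-     <optional>false</optional>
-   </externalResourceDependency>
- </externalResourceDependencies>]]></programlisting></para>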
-
- </section>
-
- <section id="&tp;aes.primitive.resource_manager_configuration">
- <title>Resource Manager Configuration</title>
-
-
- <programlisting><![CDATA[<resourceManagerConfiguration>
-
- <name>[String]</name>
- <description>[String]</description>
- <version>[String]</version>
- <vendor>[String]</vendor>
-
- <imports>
- <import ...>
- ...
- </imports>
-
- <externalResources>
-
- <externalResource>
- <name>[String]</name>
- <description>[String]</description>
- <fileResourceSpecifier>
- <fileUrl>[URL]</fileUrl>
- </fileResourceSpecifier>
- <implementationName>[String]</implementationName>
- </externalResource>
- ...
- </externalResources>
-
- <externalResourceBindings>
- <externalResourceBinding>
- <key>[String]</key>
- <resourceName>[String]</resourceName>
- </externalResourceBinding>
- ...
- </externalResourceBindings>
-
-</resourceManagerConfiguration>]]></programlisting>
-
- <para>This element declares external resources and binds them to
- annotators' external resource dependencies.</para>
-
- <para>The <literal>resourceManagerConfiguration</literal> element may
- optionally contain an <literal>import</literal>, which allows resource
- definitions to be stored in a separate (shareable) file. See <xref
- linkend="&tp;imports"/> for details.</para>
-
- <para>The <literal>externalResources</literal> element contains zero or
- more <literal>externalResource</literal> elements, each of which
- consists of:
-
- <itemizedlist><listitem><para><literal>name</literal> – the
- name of the resource. This name is referred to in the bindings (see below).
- Resource names need to be unique within any Aggregate Analysis Engine or
- Collection Processing Engine, so the Java-like
- <literal>org.myorg.mycomponent.MyResource</literal> syntax is
- recommended.</para></listitem>
-
- <listitem><para><literal>description</literal> – English
- description of the resource</para></listitem>
-
- <listitem><para>Resource Specifier –
- Declares the location of the resource. There are different
- possibilities for how this is done (see below).</para></listitem>
-
- <listitem><para><literal>implementationName</literal> – The
- fully-qualified name of the Java class that will be instantiated from the
- resource data. This is optional; if not specified, the resource will be
- accessible as an input stream to the raw data. If specified, the Java class
- must implement the <literal>interfaceName</literal> that is
- specified in the External Resource Dependency to which it is bound.
- </para></listitem></itemizedlist></para>
-
- <para>One possibility for the resource specifier is a
- <literal><fileResourceSpecifier></literal>, as shown above. This
- simply declares a URL to the resource data. This support is built on the Java
- class URL and its method URL.openStream(); it supports the protocols
- <quote>file</quote>, <quote>http</quote> and <quote>jar</quote> (for
- referring to files in jars) by default, and you can plug in handlers for other
- protocols. The URL has to start with file: (or some other protocol). It is
- relative to either the classpath or the <quote>data path</quote>. The data
- path works like the classpath but can be set programmatically via
- <literal>ResourceManager.setDataPath()</literal>. Setting the Java
- System property <literal>uima.datapath</literal> also works.</para>
-
- <para><literal>file:org/apache/d.txt</literal> is a relative path;
- relative paths for resources are resolved using the classpath and/or the
- datapath. For the file protocol, URLs starting with file:/ or file:/// are
- absolute. Note that <literal>file://org/apache/d.txt</literal> is NOT an
- absolute path starting with <quote>org</quote>. The <quote>//</quote>
- indicates that what follows is a host name. Therefore, if you try to use this URL,
- the attempt will fail because it cannot connect to the host <quote>org</quote>.
- </para>
-
- <para>Another option is a
- <literal><fileLanguageResourceSpecifier></literal>, which is
- intended to support resources, such as dictionaries, that depend on the
- language of the document being processed. Instead of a single URL, a prefix and
- suffix are specified, like this:
-
-
- <programlisting><![CDATA[<fileLanguageResourceSpecifier>
- <fileUrlPrefix>file:FileLanguageResource_implTest_data_</fileUrlPrefix>
- <fileUrlSuffix>.dat</fileUrlSuffix>
-</fileLanguageResourceSpecifier>]]></programlisting></para>
-
- <para>The URL of the actual resource is then formed by concatenating the prefix,
- the language of the document (as an ISO language code, e.g.
- <literal>en</literal> or <literal>en-US</literal>
- – see <xref linkend="&tp;aes.capabilities"/> for more
- information), and the suffix.</para>
-
- <para>A third option is a <literal>customResourceSpecifier</literal>, which allows
- you to plug in an arbitrary Java class. See <xref linkend="&tp;custom_resource_specifiers"/>
- for more information.</para>
-
- <para>The <literal>externalResourceBindings</literal> element declares
- which resources are bound to which dependencies. Each
- <literal>externalResourceBinding</literal> consists of:
-
- <itemizedlist><listitem><para><literal>key</literal> –
- identifies the dependency. For a binding declared in a primitive analysis
- engine descriptor, this must match the value of the
- <literal>key</literal> element of one of the
- <literal>externalResourceDependency</literal> elements. Bindings
- may also be specified in aggregate analysis engine descriptors, in which
- case a compound key is used
- – see <xref
- linkend="&tp;aes.aggregate.external_resource_bindings"/>
- .</para></listitem>
-
- <listitem><para><literal>resourceName</literal> – the name of
- the resource satisfying the dependency. This must match the value of the
- <literal>name</literal> element of one of the
- <literal>externalResource</literal> declarations. </para>
- </listitem></itemizedlist></para>
-
- <para>A given resource dependency may only be bound to one external resource;
- one external resource may be bound to many dependencies – to allow
- resource sharing.</para>
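-
- <para>Putting the pieces together, the following sketch declares one resource
- and binds it to a dependency. All names are hypothetical: the binding assumes
- that the annotator declares a dependency with the key
- <literal>DictionaryResource</literal>, and that the class
- <literal>org.myorg.mycomponent.DictionaryImpl</literal> implements the
- interface named in that dependency. The relative file URL is resolved against
- the classpath or the data path:
-
-
- <programlisting><![CDATA[<resourceManagerConfiguration>
-
-   <externalResources>
-     <externalResource>
-       <!-- hypothetical resource name, referred to by the binding below -->
-       <name>org.myorg.mycomponent.MyResource</name>
-       <description>Dictionary data shared by several annotators</description>
-       <fileResourceSpecifier>
-         <!-- relative URL: resolved via the classpath or the data path -->
-         <fileUrl>file:org/myorg/mycomponent/dictionary.dat</fileUrl>
-       </fileResourceSpecifier>
-       <!-- hypothetical class; omit to get the raw data as an input stream -->
-       <implementationName>org.myorg.mycomponent.DictionaryImpl</implementationName>
-     </externalResource>
-   </externalResources>
-
-   <externalResourceBindings>
-     <externalResourceBinding>
-       <!-- must match the key of an externalResourceDependency -->
-       <key>DictionaryResource</key>
-       <!-- must match the name of an externalResource declared above -->
-       <resourceName>org.myorg.mycomponent.MyResource</resourceName>
-     </externalResourceBinding>
-   </externalResourceBindings>
-
- </resourceManagerConfiguration>]]></programlisting></para>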
- </section>
-
- <section id="&tp;aes.environment_variable_references">
- <title>Environment Variable References</title>
-
- <para>In several places throughout the descriptor, it is possible to reference
- environment variables. In Java, these are actually references to Java system
- properties. To reference system environment variables from a Java analysis
- engine you must pass the environment variables into the Java virtual machine
- by using the <literal>-D</literal> option on the <literal>java</literal>
- command line.</para>
-
- <para>The syntax for environment variable references is
- <literal><envVarRef>[VariableName]</envVarRef></literal>
- , where [VariableName] is any valid Java system property name. Environment
- variable references are valid in the following places:
-
- <itemizedlist spacing="compact"><listitem><para>The value of a
- configuration parameter (String-valued parameters only)</para>
- </listitem>
-
- <listitem><para>The
- <literal><annotatorImplementationName></literal> element
- of a primitive AE descriptor</para></listitem>
-
- <listitem><para>The <literal><name></literal> element within
- <literal><analysisEngineMetaData></literal></para>
- </listitem>
-
- <listitem><para>Within a
- <literal><fileResourceSpecifier></literal> or
- <literal><fileLanguageResourceSpecifier></literal>
- </para></listitem></itemizedlist></para>
-
- <para>For example, if the value of a configuration parameter were specified as:
- <literal><string><envVarRef>TEMP_DIR</envVarRef>/temp.dat</string></literal>
- , and the value of the <literal>TEMP_DIR</literal> Java System property were
- <literal>c:/temp</literal>, then the configuration parameter's
- value would evaluate to <literal>c:/temp/temp.dat</literal>.</para>
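-
- <para>In context, such a reference sits inside an ordinary
- <literal>nameValuePair</literal>. In the sketch below the parameter name
- <literal>TempFile</literal> is hypothetical, and the JVM is assumed to have
- been started with <literal>-DTEMP_DIR=c:/temp</literal>:
-
-
- <programlisting><![CDATA[<configurationParameterSettings>
-   <nameValuePair>
-     <!-- hypothetical String-valued parameter -->
-     <name>TempFile</name>
-     <value>
-       <!-- TEMP_DIR is read from the Java system properties -->
-       <string><envVarRef>TEMP_DIR</envVarRef>/temp.dat</string>
-     </value>
-   </nameValuePair>
- </configurationParameterSettings>]]></programlisting></para>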
-
- </section>
- </section>
- <section id="&tp;aes.aggregate">
- <title>Aggregate Analysis Engine Descriptors</title>
-
- <para>Aggregate Analysis Engines do not contain an annotator, but instead
- contain one or more component (also called <emphasis>delegate</emphasis>)
- analysis engines.</para>
-
- <para>Aggregate Analysis Engine Descriptors maintain most of the same structure
- as Primitive Analysis Engine Descriptors. The differences are:</para>
-
- <itemizedlist><listitem><para>An Aggregate Analysis Engine Descriptor
- contains the element
- <literal><primitive>false</primitive></literal> rather
- than <literal><primitive>true</primitive></literal>.
- </para></listitem>
-
- <listitem><para>An Aggregate Analysis Engine Descriptor must not include a
- <literal><annotatorImplementationName></literal>
- element.</para></listitem>
-
- <listitem><para>In place of the
- <literal><annotatorImplementationName></literal>, an Aggregate
- Analysis Engine Descriptor must have a
- <literal><delegateAnalysisEngineSpecifiers></literal>
- element. See <xref linkend="&tp;aes.aggregate.delegates"/>.</para>
- </listitem>
-
- <listitem><para>An Aggregate Analysis Engine Descriptor may provide a
- <literal><flowController></literal> element immediately
- following the
- <literal><delegateAnalysisEngineSpecifiers></literal>. See <xref
- linkend="&tp;aes.aggregate.flow_controller"/>.</para></listitem>
-
- <listitem><para>Under the analysisEngineMetaData element, an Aggregate
- Analysis Engine Descriptor may specify an additional element --
- <literal><flowConstraints></literal>. See <xref
- linkend="&tp;aes.aggregate.flow_constraints"/>. Typically only one
- of <literal><flowController></literal> and
- <literal><flowConstraints></literal> is specified. If both are
- specified, the <literal><flowController></literal> takes
- precedence, and the flow controller implementation can use the information
- specified in the <literal><flowConstraints></literal> as part of
- its configuration input.</para></listitem>
-
- <listitem><para>An Aggregate Analysis Engine Descriptor must not contain a
- <literal><typeSystemDescription></literal> element. The Type
- System of the Aggregate Analysis Engine is derived by merging the Type Systems
- of the Analysis Engines that the aggregate contains.</para></listitem>
-
- <listitem><para>Within aggregate Analysis Engine Descriptors,
- <literal><configurationParameter></literal> elements may define
- <literal><overrides></literal>. See <xref
- linkend="&tp;aes.aggregate.configuration_parameter_overrides"/>
- .</para></listitem>
-
- <listitem><para>External Resource Bindings can bind resources to
- dependencies declared by any delegate AE within the aggregate. See <xref
- linkend="&tp;aes.aggregate.external_resource_bindings"/>.</para>
- </listitem>
-
- <listitem><para>An additional optional element,
- <literal><sofaMappings></literal>, may be included. </para>
- </listitem></itemizedlist>
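-
- <para>As a rough orientation, the rules above imply a skeleton like the
- following for an aggregate descriptor. This is only a sketch: the delegate
- keys, the import locations, and the flow are hypothetical, and the namespace
- declaration and most metadata elements are omitted.</para>
-
- <programlisting><![CDATA[<analysisEngineDescription>
-  <primitive>false</primitive>
-
-  <!-- takes the place of annotatorImplementationName -->
-  <delegateAnalysisEngineSpecifiers>
-    <delegateAnalysisEngine key="annotator1">
-      <import location="Annotator1.xml"/>
-    </delegateAnalysisEngine>
-    <delegateAnalysisEngine key="annotator2">
-      <import location="Annotator2.xml"/>
-    </delegateAnalysisEngine>
-  </delegateAnalysisEngineSpecifiers>
-
-  <!-- an optional flowController element would go here -->
-
-  <analysisEngineMetaData>
-    <name>Hypothetical Aggregate</name>
-    ...
-    <flowConstraints>
-      <fixedFlow>
-        <node>annotator1</node>
-        <node>annotator2</node>
-      </fixedFlow>
-    </flowConstraints>
-    ...
-  </analysisEngineMetaData>
-</analysisEngineDescription>]]></programlisting>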
-
- <section id="&tp;aes.aggregate.delegates">
- <title>Delegate Analysis Engine Specifiers</title>
-
-
- <programlisting><![CDATA[<delegateAnalysisEngineSpecifiers>
-
- <delegateAnalysisEngine key="[String]">
- <analysisEngineDescription>...</analysisEngineDescription> |
- <import .../>
- </delegateAnalysisEngine>
-
- <delegateAnalysisEngine key="[String]">
- ...
- </delegateAnalysisEngine>
-
- ...
-
-</delegateAnalysisEngineSpecifiers>]]></programlisting>
-
- <para>The <literal>delegateAnalysisEngineSpecifiers</literal> element
- contains one or more <literal>delegateAnalysisEngine</literal>
- elements. Each of these must have a unique key, and must contain
- either:</para>
-
- <itemizedlist><listitem><para>A complete
- <literal>analysisEngineDescription</literal> element describing the
- delegate analysis engine <emphasis role="bold">OR</emphasis></para>
- </listitem>
-
- <listitem><para>An <literal>import</literal> element giving the name or
- location of the XML descriptor for the delegate analysis engine (see <xref
- linkend="&tp;imports"/>).</para></listitem></itemizedlist>
-
- <para>The latter is the much more common usage, and is the only form supported by
- the Component Descriptor Editor tool.</para>
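-
- <para>As an illustration of the import form, the following sketch declares
- two delegates, one imported by location and one by name. The keys, the
- file path, and the dotted name are hypothetical and only show the
- shape of the element.</para>
-
- <programlisting><![CDATA[<delegateAnalysisEngineSpecifiers>
-
-  <!-- imported by location (path is a placeholder) -->
-  <delegateAnalysisEngine key="annotator1">
-    <import location="descriptors/Annotator1.xml"/>
-  </delegateAnalysisEngine>
-
-  <!-- imported by name (name is a placeholder) -->
-  <delegateAnalysisEngine key="annotator2">
-    <import name="org.myorg.descriptors.Annotator2"/>
-  </delegateAnalysisEngine>
-
-</delegateAnalysisEngineSpecifiers>]]></programlisting>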
- </section>
- <section id="&tp;aes.aggregate.flow_controller">
- <title>FlowController</title>
-
-
- <programlisting><![CDATA[<flowController key="[String]">
- <flowControllerDescription>...</flowControllerDescription> |
- <import .../>
- </flowController>]]></programlisting>
-
- <para>The optional <literal>flowController</literal> element identifies
- the descriptor of the FlowController component that will be used to determine
- the order in which delegate Analysis Engines are called.</para>
-
- <para>The <literal>key</literal> attribute is optional, but recommended; it
- assigns the FlowController an identifier that can be used for configuration
- parameter overrides, Sofa mappings, or external resource bindings. The key
- must not be the same as any of the delegate analysis engine keys.</para>
-
- <para>As with the <literal>delegateAnalysisEngine</literal> element, the
- <literal>flowController</literal> element may contain either a complete
- <literal>flowControllerDescription</literal> or an
- <literal>import</literal>, but the import is recommended. The Component
- Descriptor Editor tool only supports imports here.</para>
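-
- <para>A minimal sketch of the recommended import form follows. The key
- <literal>myFlowController</literal> and the descriptor location are
- hypothetical; the key is chosen so that it does not collide with any
- delegate analysis engine key.</para>
-
- <programlisting><![CDATA[<flowController key="myFlowController">
-  <!-- location is a placeholder for your FlowController descriptor -->
-  <import location="descriptors/MyFlowController.xml"/>
-</flowController>]]></programlisting>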
-
- </section>
- <section id="&tp;aes.aggregate.flow_constraints">
- <title>FlowConstraints</title>
-
- <para>If a <literal><flowController></literal> is not specified, the
- order in which delegate Analysis Engines are called within the aggregate
- Analysis Engine is specified using the
- <literal><flowConstraints></literal> element, which must occur
- immediately following the
- <literal>configurationParameterSettings</literal> element. If a
- <literal><flowController></literal> is specified, then the
- <literal><flowConstraints></literal> are optional. They can be
- used to pass an ordering of delegate keys to the
- <literal><flowController></literal>.</para>
-
- <para>There are two options for flow constraints --
- <literal><fixedFlow></literal> or
- <literal><capabilityLanguageFlow></literal>. Each is discussed
- in a separate section below.</para>
-
- <section id="&tp;aes.aggregate.flow_constraints.fixed_flow">
- <title>Fixed Flow</title>
-
-
- <programlisting><![CDATA[<flowConstraints>
- <fixedFlow>
- <node>[String]</node>
- <node>[String]</node>
- ...
- </fixedFlow>
-</flowConstraints>]]></programlisting>
-
- <para>The <literal>flowConstraints</literal> element must be included
- immediately following the
- <literal>configurationParameterSettings</literal> element.</para>
-
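- <para>In this form, the <literal>flowConstraints</literal> element
- contains a single <literal>fixedFlow</literal> element. The alternative,
- <literal>capabilityLanguageFlow</literal>, is described in <xref
- linkend="&tp;aes.aggregate.flow_constraints.capability_language_flow"/>.</para>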
-
- <para>The <literal>fixedFlow</literal> element contains one or more
- <literal>node</literal> elements, each of which contains an identifier
- which must match the key of a delegate analysis engine specified in the
- <literal>delegateAnalysisEngineSpecifiers</literal>
- element.</para>
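-
- <para>For instance, assuming an aggregate whose delegates were declared
- with the hypothetical keys <literal>annotator1</literal> and
- <literal>annotator2</literal>, a fixed flow that calls them in that order
- could look like this:</para>
-
- <programlisting><![CDATA[<flowConstraints>
-  <fixedFlow>
-    <!-- each node must match a delegateAnalysisEngine key -->
-    <node>annotator1</node>
-    <node>annotator2</node>
-  </fixedFlow>
-</flowConstraints>]]></programlisting>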
-
- </section>
- <section
- id="&tp;aes.aggregate.flow_constraints.capability_language_flow">
- <title>Capability Language Flow</title>
-
-
- <programlisting><![CDATA[<flowConstraints>
- <capabilityLanguageFlow>
- <node>[String]</node>
- <node>[String]</node>
- ...
- </capabilityLanguageFlow>
-</flowConstraints>]]></programlisting>
-
- <para>If you use <literal><capabilityLanguageFlow></literal>,
- the delegate Analysis Engines named by the
- <literal><node></literal> elements are called in the given order,
- except that a delegate Analysis Engine is skipped if either of the following is
- true (according to that Analysis Engine's declared output
- capabilities):</para>
-
- <itemizedlist><listitem><para>It cannot produce any of the aggregate
- Analysis Engine's output capabilities for the language of the
- current document.</para></listitem>
-
- <listitem><para>All of the output capabilities have already been
- produced by an earlier Analysis Engine in the flow. </para></listitem>
- </itemizedlist>
-
- <para>For example, if two annotators produce
- <literal>org.myorg.TokenAnnotation</literal> feature structures for
- the same language, these feature structures will only be produced by the
- first annotator in the list.</para>
-
- <note><para>The flow analysis uses the specific types that are specified in the
- output capabilities, without any expansion for subtypes. So, if you expect
- a type TT and another type SubTT (which is a subtype of TT) in the output, you
- must include both of them in the output capabilities.</para></note>
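-
- <para>To make the note above concrete, the following sketch shows a
- capability declaration that lists both a type and its subtype as outputs.
- The type names <literal>org.myorg.TT</literal> and
- <literal>org.myorg.SubTT</literal> and the language <literal>en</literal>
- are hypothetical.</para>
-
- <programlisting><![CDATA[<capabilities>
-  <capability>
-    <inputs/>
-    <outputs>
-      <!-- subtypes are not implied; list each expected type -->
-      <type>org.myorg.TT</type>
-      <type>org.myorg.SubTT</type>
-    </outputs>
-    <languagesSupported>
-      <language>en</language>
-    </languagesSupported>
-  </capability>
-</capabilities>]]></programlisting>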
- </section>
- </section>
-
- <section id="&tp;aes.aggregate.configuration_parameter_overrides">
- <title>Configuration Parameter Overrides</title>
-
- <para>In an aggregate Analysis Engine Descriptor, each
- <literal><configurationParameter></literal> element should
- contain an <literal><overrides></literal> element, with the
- following syntax:</para>
-
-
- <programlisting><![CDATA[<overrides>
-
- <parameter>
- [delegateAnalysisEngineKey]/[parameterName]
- </parameter>
-
- <parameter>
- [delegateAnalysisEngineKey]/[parameterName]
- </parameter>
- ...
-
-</overrides>]]></programlisting>
-
- <para>Since aggregate Analysis Engines have no code associated with them, the
- only way in which their configuration parameters can affect their processing
- is by overriding the parameter values of one or more delegate analysis
- engines. The <literal><overrides></literal> element determines
- which parameters, in which delegate Analysis Engines, are overridden by this
- configuration parameter.</para>
-
- <para>For example, consider an aggregate Analysis Engine Descriptor that
- contains delegate Analysis Engines with keys
- <literal>annotator1</literal> and <literal>annotator2</literal> (as
- declared in the <literal><delegateAnalysisEngine></literal> element – see <xref
- linkend="&tp;aes.aggregate.delegates"/>) and also declares a
- configuration parameter as follows:
-
-
- <programlisting><![CDATA[<configurationParameter>
- <name>AggregateParam</name>
- <type>String</type>
- <overrides>
- <parameter>annotator1/param1</parameter>
- <parameter>annotator2/param2</parameter>
- </overrides>
-</configurationParameter>]]></programlisting></para>
-
- <para>The value of the <literal>AggregateParam</literal> parameter
- (whether assigned in the aggregate descriptor or at runtime by an
- application) will override the value of parameter
- <literal>param1</literal> in <literal>annotator1</literal> and also
- override the value of parameter <literal>param2</literal> in
- <literal>annotator2</literal>. No other parameters will be
- affected.</para>
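-
- <para>As a sketch of the "assigned in the aggregate descriptor" case, the
- aggregate could set <literal>AggregateParam</literal> as shown below. The
- value <literal>someValue</literal> is a placeholder; under the overrides
- declared above, it would be the value seen for <literal>param1</literal> in
- <literal>annotator1</literal> and for <literal>param2</literal> in
- <literal>annotator2</literal>.</para>
-
- <programlisting><![CDATA[<configurationParameterSettings>
-  <nameValuePair>
-    <name>AggregateParam</name>
-    <value>
-      <!-- placeholder value -->
-      <string>someValue</string>
-    </value>
-  </nameValuePair>
-</configurationParameterSettings>]]></programlisting>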
-
- <para>For historical reasons only, if an aggregate Analysis Engine descriptor
- declares a configuration parameter with no explicit overrides, that
- parameter will override any parameters having the same name within any
- delegate analysis engine. This usage is strongly discouraged. The UIMA SDK
- currently supports this usage but logs a warning message to the log file. This
- support may be dropped in future versions.</para>
-
- </section>
-
- <section id="&tp;aes.aggregate.external_resource_bindings">
- <title>External Resource Bindings</title>
-
- <para>Aggregate analysis engine descriptors can declare resource bindings
- that bind resources to dependencies declared in any of the delegate analysis
- engines (or their subcomponents, recursively) within that aggregate. This
- allows resource sharing. Any binding at this level overrides (supersedes)
- any binding specified by a contained component or its subcomponents,
- recursively.</para>
-
- <para>For example, consider an aggregate Analysis Engine Descriptor that
- contains delegate Analysis Engines with keys
- <literal>annotator1</literal> and <literal>annotator2</literal> (as
- declared in the <literal><delegateAnalysisEngine></literal>
- element – see <xref linkend="&tp;aes.aggregate.delegates"/>),
- where <literal>annotator1</literal> declares a resource dependency with
- key <literal>myResource</literal> and <literal>annotator2</literal>
- declares a resource dependency with key <literal>someResource</literal>.</para>
-
[... 2712 lines stripped ...]