Posted to commits@uima.apache.org by al...@apache.org on 2007/03/14 23:12:10 UTC
svn commit: r518354 [11/21] - in /incubator/uima/site/trunk/uima-website:
docs/ docs/downloads/releaseDocs/
docs/downloads/releaseDocs/2.1.0-incubating/
docs/downloads/releaseDocs/2.1.0-incubating/docs/
docs/downloads/releaseDocs/2.1.0-incubating/docs/...
Added: incubator/uima/site/trunk/uima-website/docs/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html
URL: http://svn.apache.org/viewvc/incubator/uima/site/trunk/uima-website/docs/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html?view=auto&rev=518354
==============================================================================
--- incubator/uima/site/trunk/uima-website/docs/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html (added)
+++ incubator/uima/site/trunk/uima-website/docs/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html Wed Mar 14 15:11:54 2007
@@ -0,0 +1,4143 @@
+<html><head>
+ <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
+ <title>UIMA Tutorial and Developers' Guides</title><link rel="stylesheet" href="css/stylesheet.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.70.0"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="book" lang="en" id="d0e2"><div class="titlepage"><div><div><h1 class="title"><a name="d0e2"></a>UIMA Tutorial and Developers' Guides</h1></div><div><div class="authorgroup"><h3 class="corpauthor">Authors: The Apache UIMA Development Community</h3></div></div><div><p class="releaseinfo">Version 2.1</p></div><div><p class="copyright">Copyright © 2006, 2007 The Apache Software Foundation</p></div><div><p class="copyright">Copyright © 2004, 2006 International Business Machines Corporation</p></div><div><div class="legalnotice"><a name="d0e15"></a><p> </p><p><b>Incubation Notice and Disclaimer. </b>Apache UIMA is an effort undergoing incubation at the Apache Software Foundation (ASF).
+ Incubation is required of all newly accepted projects until a further review indicates that
+ the infrastructure, communications, and decision making process have stabilized in a manner
+ consistent with other successful ASF projects. While incubation status is not necessarily
+ a reflection of the completeness or stability of the code,
+ it does indicate that the project has yet to be fully endorsed by the ASF.</p><p> </p><p> </p><p><b>License and Disclaimer. </b>The ASF licenses this documentation
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this documentation except in compliance
+ with the License. You may obtain a copy of the License at
+
+ </p><div class="blockquote"><blockquote class="blockquote"><a href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a></blockquote></div><p>
+
+ Unless required by applicable law or agreed to in writing,
+ this documentation and its contents are distributed under the License
+ on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ </p><p> </p><p> </p><p><b>Trademarks. </b>All terms mentioned in the text that are known to be trademarks or
+ service marks have been appropriately capitalized. Use of such terms
+ in this book should not be regarded as affecting the validity of the
+ trademark or service mark.
+ </p></div></div><div><p class="pubdate">February, 2007</p></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="chapter"><a href="#ugr.tug.aae">1. Annotator & AE Developer's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aae.getting_started">1.1. Getting Started</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aae.defining_types">1.1.1. Defining Types</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.generating_jcas_sources">1.1.2. Generating Java Source Files for CAS Types</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.developing_annotator_code">1.1.3. Developing Your Annotator Code</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.creating_xml_descriptor">1.1.4. Creating the XML Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.testing_your_annotator">1.1.5. Testing Your Annotator</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aae.configuration_logging">1.2. Configuration and Logging</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aae.configuration_parameters">1.2.1. Configuration Parameters</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.logging">1.2.2. Logging</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aae.building_aggregates">1.3. Building Aggregate Analysis Engines</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aae.combining_annotators">1.3.1. Combining Annotators</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.aaes_can_contain_cas_consumers">1.3.2. AEs can also contain CAS Consumers</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.reading_results_previous_annotators">1.3.3. Reading the Results of Previous Annotators</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aae.other_examples">1.4. Other examples</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.additional_topics">1.5. Additional Topics</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aae.contract_for_annotator_methods">1.5.1. Annotator Methods</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.reporting_errors_from_annotators">1.5.2. Reporting errors from Annotators</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.throwing_exceptions_from_annotators">1.5.3. Throwing Exceptions from Annotators</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.accessing_external_resource_files">1.5.4. Accessing External Resource Files</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.result_specification_setting">1.5.5. Result Specifications</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.classpath_when_using_jcas">1.5.6. Class path setup when using JCas</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.using_shell_scripts">1.5.7. Using the Shell Scripts</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aae.common_pitfalls">1.6. Common Pitfalls</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.viewing_UIMA_objects_in_eclipse_debugger">1.7. UIMA Objects in Eclipse Debugger</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.xml_intro_ae_descriptor">1.8. Analysis Engine XML Descriptor</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aae.header_annotator_class_identification">1.8.1. Header and Annotator Class Identification</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.xml_intro_simple_metadata_attributes">1.8.2. Simple Metadata Attributes</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.xml_intro_type_system_definition">1.8.3. Type System Definition</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.xml_intro_capabilities">1.8.4. Capabilities</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.xml_intro.configuration_parameters">1.8.5. Configuration Parameters (Optional)</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#ugr.tug.cpe">2. CPE Developer's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cpe.concepts">2.1. CPE Concepts</a></span></dt><dt><span class="section"><a href="#ugr.tug.cpe.configurator_and_viewer">2.2. CPE Configurator and CAS viewer</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cpe.cpe_configurator">2.2.1. Using the CPE Configurator</a></span></dt><dt><span class="section"><a href="#ugr.tug.cpe.running_cpe_configurator_from_eclipse">2.2.2. Running the CPE Configurator from Eclipse</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cpe.running_cpe_from_application">2.3. Running a CPE from Your Own Java Application</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cpe.using_listeners">2.3.1. Using Listeners</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cpe.developing_collection_processing_components">2.4. Developing Collection Processing Components</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cpe.collection_reader.developing">2.4.1. Developing Collection Readers</a></span></dt><dt><span class="section"><a href="#ugr.tug.cpe.cas_initializer.developing">2.4.2. Developing CAS
+ Initializers</a></span></dt><dt><span class="section"><a href="#ugr.tug.cpe.cas_consumer.developing">2.4.3. Developing CAS
+ Consumers</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cpe.deploying_a_cpe">2.5. Deploying a CPE</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cpe.managed_deployment">2.5.1. Deploying Managed CAS Processors</a></span></dt><dt><span class="section"><a href="#ugr.tug.cpe.deploying_nonmanaged_cas_processors">2.5.2. Deploying Non-managed CAS Processors</a></span></dt><dt><span class="section"><a href="#ugr.tug.cpe.integrated_deployment">2.5.3. Deploying Integrated CAS Processors</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cpe.collection_processing_examples">2.6. Collection Processing Examples</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tug.application">3. Application Developer's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.appication.uimaframework_class">3.1. The UIMAFramework Class</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.using_aes">3.2. Using Analysis Engines</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.application.instantiating_an_ae">3.2.1. Instantiating an Analysis Engine</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.analyzing_text_documents">3.2.2. Analyzing Text Documents</a></span></dt><dt><span class="section"><a href="#ugr.tug.applications.analyzing_non_text_artifacts">3.2.3. Analyzing Non-Text Artifacts</a></span></dt><dt><span class="section"><a href="#ugr.tug.applications.accessing_analysis_results">3.2.4. Accessing Analysis Results</a></span></dt><dt><span class="section"><a href="#ugr.tug.applications.multi_threaded">3.2.5. Multi-threaded Applications</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.using_multiple_aes">3.2.6. Multiple AEs & Creating Shared CASes</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.saving_cases_to_file_systems">3.2.7. Saving CASes to file systems</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.application.using_cpes">3.3. Using Collection Processing Engines</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.application.running_a_cpe_from_a_descriptor">3.3.1. Running a CPE from a Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.configuring_a_cpe_descriptor_programmatically">3.3.2. Configuring a CPE Descriptor Programmatically</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.application.setting_configuration_parameters">3.4. Setting Configuration Parameters</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.integrating_text_analysis_and_search">3.5. Integrating Text Analysis and Search</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.application.building_an_index">3.5.1. Building an Index</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.search.query_tool">3.5.2. Semantic Search Query Tool</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.application.remote_services">3.6. Working with Remote Services</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.application.how_to_deploy_as_soap">3.6.1. Deploying as SOAP Service</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.how_to_deploy_a_vinci_service">3.6.2. Deploying as a Vinci Service</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.how_to_call_a_uima_service">3.6.3. Calling a UIMA Service</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.restrictions_on_remotely_deployed_services">3.6.4. Restrictions on remotely deployed services</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.vns">3.6.5. The Vinci Naming Services (VNS)</a></span></dt><dt><span class="section"><a href="#ugr.tug.configuring_timeout_settings">3.6.6. Configuring Timeout Settings</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.application.increasing_performance_using_parallelism">3.7. Increasing performance using parallelism</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.jmx">3.8. Monitoring AE Performance using JMX</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tug.fc">4. Flow Controller Developer's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.fc.developing_fc_code">4.1. Developing the Flow Controller Code</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.fc.fc_interface_overview">4.1.1. Flow Controller Interface Overview</a></span></dt><dt><span class="section"><a href="#ugr.tug.fc.example_code">4.1.2. Example Code</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.fc.creating_fc_descriptor">4.2. Creating the Flow Controller Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.fc.adding_fc_to_aggregate">4.3. Adding Flow Controller to an Aggregate</a></span></dt><dt><span class="section"><a href="#ugr.tug.fc.adding_fc_to_cpe">4.4. Adding Flow Controller to CPE</a></span></dt><dt><span class="section"><a href="#ugr.tug.fc.using_fc_with_cas_multipliers">4.5. Using Flow Controllers with CAS Multipliers</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tug.aas">5. Annotations, Artifacts & Sofas</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aas.terminology">5.1. Terminology</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aas.artifact">5.1.1. Artifact</a></span></dt><dt><span class="section"><a href="#ugr.tug.aas.sofa">5.1.2. Subject of Analysis — Sofa</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aas.sofa_data_formats">5.2. Formats of Sofa Data</a></span></dt><dt><span class="section"><a href="#ugr.tug.aas.setting_accessing_sofa_data">5.3. Setting and Accessing Sofa Data</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aas.setting_sofa_data">5.3.1. Setting Sofa Data</a></span></dt><dt><span class="section"><a href="#ugr.tug.aas.accessing_sofa_data">5.3.2. Accessing Sofa Data</a></span></dt><dt><span class="section"><a href="#ugr.tug.aas.accessing_sofa_data_using_java_stream">5.3.3. Accessing Sofa Data using a Java Stream</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aas.sofa_fs">5.4. The Sofa Feature Structure</a></span></dt><dt><span class="section"><a href="#ugr.tug.aas.annotations">5.5. Annotations</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aas.built_in_annotation_types">5.5.1. Built-in Annotation types</a></span></dt><dt><span class="section"><a href="#ugr.tug.aas.annotations_associated_sofa">5.5.2. Annotations have an associated Sofa</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aas.annotationbase">5.6. AnnotationBase</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tug.mvs">6. Multiple CAS Views</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.mvs.cas_views_and_sofas">6.1. CAS Views and Sofas</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.mvs.naming_views_sofas">6.1.1. Naming CAS Views and Sofas</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.multi_view_and_single_view">6.1.2. Multi/Single View parts in Applications</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.mvs.multi_view_components">6.2. Multi-View Components</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.mvs.deciding_multi_view">6.2.1. Deciding: Multi-View</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.additional_capabilities">6.2.2. Multi-View: additional capabilities</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.component_xml_metadata">6.2.3. Component XML metadata</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.mvs.sofa_capabilities_and_apis_for_apps">6.3. Sofa Capabilities & APIs for Apps</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.sofa_name_mapping">6.4. Sofa Name Mapping</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.mvs.name_mapping_aggregate">6.4.1. Name Mapping in an Aggregate Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.name_mapping_cpe">6.4.2. Name Mapping in a CPE
+ Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.specifying_cas_view_for_single_view">6.4.3. CAS View for Single-View Parts</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.name_mapping_application">6.4.4. Name Mapping in a UIMA Application</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.name_mapping_remote_services">6.4.5. Name Mapping for Remote Services</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.mvs.jcas_extensions_for_multi_views">6.5. JCas extensions for Multiple Views</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.sample_application">6.6. Sample Multi-View Application</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.mvs.sample_application.descriptor">6.6.1. Annotator Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.sample_application.setup">6.6.2. Application Setup</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.sample_application.annotator_processing">6.6.3. Annotator Processing</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.sample_application.accessing_results">6.6.4. Accessing the results of analysis</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.mvs.views_api_summary">6.7. Views API Summary</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.sofa_incompatibilities_v1_v2">6.8. Sofa Incompatibilities: V1 and V2</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tug.cm">7. CAS Multiplier</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cm.developing_multiplier_code">7.1. Developing the CAS Multiplier Code</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cm.cm_interface_overview">7.1.1. CAS Multiplier Interface Overview</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.how_to_get_empty_cas_instance">7.1.2. Getting an empty CAS Instance</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.example_code">7.1.3. Example Code</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cm.creating_cm_descriptor">7.2. CAS Multiplier Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.using_cm_in_aae">7.3. Using CAS Multipliers in Aggregates</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cm.adding_cm_to_aggregate">7.3.1. Aggregate: Adding the CAS Multiplier</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.cm_and_fc">7.3.2. CAS Multipliers and Flow Control</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.aggregate_cms">7.3.3. Aggregate CAS Multipliers</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cm.using_cm_in_cpe">7.4. CAS Multipliers in CPE's</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.calling_cm_from_app">7.5. Applications: Calling CAS Multipliers</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cm.retrieving_output_cases">7.5.1. Output CASes</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.using_cm_with_other_aes">7.5.2. CAS Multipliers with other AEs</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cm.using_cm_to_merge_cases">7.6. Merging with CAS Multipliers</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cm.overview_of_how_to_merge_cases">7.6.1. CAS Merging Overview</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.example_cas_merger">7.6.2. Example CAS Merger</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.using_the_simple_text_merger_in_an_aggregate_ae">7.6.3. SimpleTextMerger in an Aggregate</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#ugr.tug.xmi_emf">8. XMI & EMF</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.xmi_emf.overview">8.1. Overview</a></span></dt><dt><span class="section"><a href="#ugr.tug.xmi_emf.converting_ecore_to_from_uima_type_system">8.2. Converting an Ecore Model to or from a UIMA Type System</a></span></dt><dt><span class="section"><a href="#ugr.tug.xmi_emf.using_xmi_cas_serialization">8.3. Using XMI CAS Serialization</a></span></dt></dl></dd></dl></div><div class="chapter" lang="en" id="ugr.tug.aae"><div class="titlepage"><div><div><h2 class="title"><a name="ugr.tug.aae"></a>Chapter 1. Annotator and Analysis Engine Developer's Guide</h2></div></div></div><p>This chapter describes how to develop UIMA <span class="emphasis"><em>type systems</em></span>,
+ <span class="emphasis"><em>Annotators</em></span> and <span class="emphasis"><em>Analysis Engines</em></span> using
+ the UIMA SDK. It is helpful to read the UIMA Conceptual Overview chapter for a review of
+ these concepts.</p><p>An <span class="emphasis"><em>Analysis Engine (AE)</em></span> is a program that analyzes artifacts
+ (e.g. documents) and infers information from them.</p><p>Analysis Engines are constructed from building blocks called
+ <span class="emphasis"><em>Annotators</em></span>. An annotator is a component that contains analysis
+ logic. Annotators analyze an artifact (for example, a text document) and create
+ additional data (metadata) about that artifact. It is a goal of UIMA that annotators need
+ not be concerned with anything other than their analysis logic – for example the
+ details of their deployment or their interaction with other annotators.</p><p>An Analysis Engine (AE) may contain a single annotator (this is referred to as a
+ <span class="emphasis"><em>Primitive AE</em></span>), or it may be a composition of others and therefore
+ contain multiple annotators (this is referred to as an <span class="emphasis"><em>Aggregate
+ AE</em></span>). Primitive and aggregate AEs implement the same interface and can be used
+ interchangeably by applications.</p><p>Annotators produce their analysis results in the form of typed <span class="emphasis"><em>Feature
+ Structures</em></span>, which are simply data structures that have a type and a set of
+ (attribute, value) pairs. An <span class="emphasis"><em>annotation</em></span> is a particular type of
+ Feature Structure that is attached to a region of the artifact being analyzed (a span of
+ text in a document, for example).</p><p>For example, an annotator may produce an Annotation over the span of text
+ <code class="literal">President Bush</code>, where the type of the Annotation is
+ <code class="literal">Person</code> and the attribute <code class="literal">fullName</code> has the
+ value <code class="literal">George W. Bush</code>, and its position in the artifact is character
+ position 12 through character position 26.</p><p>It is also possible for annotators to record information associated with the entire
+ document rather than a particular span (these are considered Feature Structures but not
+ Annotations).</p><p>All feature structures, including annotations, are represented in the UIMA
+ <span class="emphasis"><em>Common Analysis Structure (CAS)</em></span>. The CAS is the central data
+ structure through which all UIMA components communicate. Included with the UIMA SDK is an
+ easy-to-use, native Java interface to the CAS called the <span class="emphasis"><em>JCas</em></span>.
+ The JCas represents each feature structure as a Java object; the example feature
+ structure from the previous paragraph would be an instance of a Java class Person with
+ getFullName() and setFullName() methods. Though the examples in this guide all use the
+ JCas, it is also possible to directly access the underlying CAS system; for more
+ information see <a href="../references/references.html#ugr.ref.cas" class="olink">Chapter 4, CAS Reference
+ </a> in <span class="olinkdocname">UIMA References</span>
+ .</p><p>The remainder of this chapter will refer to the analysis of text documents and the
+ creation of annotations that are attached to spans of text in those documents. Keep in mind
+ that the CAS can represent arbitrary types of feature structures, and feature structures
+ can refer to other feature structures. For example, you can use the CAS to represent a parse
+ tree for a document. Also, the artifact that you are analyzing need not be a text
+ document.</p><p>This guide is organized as follows:</p><div class="itemizedlist"><ul type="disc"><li><p><span class="bold-italic"><a href="#ugr.tug.aae.getting_started" title="1.1. Getting Started">Section 1.1, “Getting Started”</a></span> is a
+ tutorial with step-by-step instructions for how to develop and test a simple UIMA annotator.</p></li><li><p><span class="bold-italic"><a href="#ugr.tug.aae.configuration_logging" title="1.2. Configuration and Logging">Section 1.2, “Configuration and Logging”</a>
+ </span> discusses how to make your UIMA annotator configurable, and how it can write messages to the UIMA
+ log file.</p></li><li><p> <span class="bold-italic"><a href="#ugr.tug.aae.building_aggregates" title="1.3. Building Aggregate Analysis Engines">Section 1.3, “Building Aggregate Analysis Engines”</a></span>
+ describes how annotators can be combined into aggregate analysis engines. It also describes how one
+ annotator can make use of the analysis results produced by an annotator that has run previously.</p></li><li><p><span class="bold-italic"><a href="#ugr.tug.aae.other_examples" title="1.4. Other examples">Section 1.4, “Other examples”</a></span>
+ describes several other examples you may find interesting, including</p><div class="itemizedlist"><ul type="circle" compact><li><p>SimpleTokenAndSentenceAnnotator
+ – a simple tokenizer and sentence annotator.</p></li><li><p>PersonTitleDBWriterCasConsumer – a sample CAS Consumer which populates a relational
+ database with some annotations. It uses JDBC and, in this example, connects to the open source Apache
+ Derby database. </p></li></ul></div></li><li><p><span class="bold-italic"><a href="#ugr.tug.aae.additional_topics" title="1.5. Additional Topics">Section 1.5, “Additional Topics”</a></span>
+ describes additional features of the UIMA SDK that may help you in building your own annotators and analysis
+ engines.</p></li><li><p><span class="bold-italic"><a href="#ugr.tug.aae.common_pitfalls" title="1.6. Common Pitfalls">Section 1.6, “Common Pitfalls”</a> </span>
+ contains some useful guidelines to help you ensure that your annotators will work correctly in any UIMA
+ application.</p></li></ul></div><p>This guide does not discuss how to build UIMA Applications, which are programs that
+ use Analysis Engines, along with other components, e.g. a search engine, document store,
+ and user interface, to deliver a complete package of functionality to an end-user. For
+ information on application development, see <a href="tutorials_and_users_guides.html#ugr.tug.application" class="olink">Chapter 3: “Application Developer's Guide”</a>
+ .</p><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="ugr.tug.aae.getting_started"></a>1.1. Getting Started</h2></div></div></div><p>This section is a step-by-step tutorial that will get you started developing UIMA
+ annotators. All of the files referred to by the examples in this chapter are in the
+ <code class="literal">examples</code> directory of the UIMA SDK. This directory is designed to
+ be imported into your Eclipse workspace; see <a href="../overview_and_setup/overview_and_setup.html#ugr.ovv.eclipse_setup.example_code" class="olink">Section 3.2, “Setting up Eclipse to view Example Code”</a> in <span class="olinkdocname">Overview & Setup</span> for instructions on how to do
+ this. Also you may wish to refer to the UIMA SDK JavaDocs located in the <a href="file:api/index.html" target="_top">docs/api</a> directory.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>In Eclipse 3.1, if you highlight a UIMA class or method defined in the UIMA SDK
+ JavaDocs, you can conveniently have Eclipse open the corresponding JavaDoc for that
+ class or method in a browser, by pressing Shift + F2.</p></div><p>The example annotator that we are going to walk through will detect room numbers for
+ rooms where the room numbering scheme follows some simple conventions. In our example,
+ there are two kinds of patterns we want to find; here are some examples, together with
+ their corresponding regular expression patterns:
+ </p><div class="variablelist"><dl><dt><span class="term">Yorktown patterns:</span></dt><dd><p>20-001, 31-206, 04-123 (Regular Expression Pattern:
+ ##-[0-2]##, where # stands for any single digit)</p></dd><dt><span class="term">Hawthorne patterns:</span></dt><dd><p>GN-K35, 1S-L07, 4N-B21 (Regular Expression Pattern:
+ [G1-4][NS]-[A-Z]##)</p></dd></dl></div><p> </p><p>There are several steps to develop and test a simple UIMA annotator.</p><div class="orderedlist"><ol type="1" compact><li><p>Define the CAS types that the
+ annotator will use.</p></li><li><p>Generate the Java classes for these types.</p></li><li><p>Write the actual annotator Java code.</p></li><li><p>Create the Analysis Engine descriptor.</p></li><li><p>Test the annotator. </p></li></ol></div><p>These steps are discussed in the next sections.</p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.defining_types"></a>1.1.1. Defining Types</h3></div></div></div><p>The first step in developing an annotator is to define the CAS Feature Structure
+ types that it creates. This is done in an XML file called a <span class="emphasis"><em>Type System
+ Descriptor</em></span>. UIMA defines basic primitive types such as
+ Boolean, Byte, Short, Integer, Long, Float, and Double, as well as Arrays of these primitive
+ types. UIMA also defines the built-in types <code class="literal">TOP</code>, which is the root
+ of the type system, analogous to Object in Java; <code class="literal">FSArray</code>, which is
+ an array of Feature Structures (i.e. an array of instances of TOP); and
+ <code class="literal">Annotation</code>, which we will discuss in more detail in this section.</p><p>UIMA includes an Eclipse plug-in that will help you edit Type System
+ Descriptors, so if you are using Eclipse you will not need to worry about the details of
+ the XML syntax. See <a href="../overview_and_setup/overview_and_setup.html#ugr.ovv.eclipse_setup" class="olink">Chapter 3, Setting up the Eclipse IDE to work with UIMA
+ </a> in <span class="olinkdocname">Overview & Setup</span> for instructions on setting up Eclipse and
+ installing the plug-in.</p><p>The Type System Descriptor for our annotator is located in the file
+ <code class="literal">descriptors/tutorial/ex1/TutorialTypeSystem.xml</code>. (This
+ and all other examples are located in the <code class="literal">examples</code> directory of
+ the installation of the UIMA SDK, which can be imported into an Eclipse project for
+ your convenience, as described in <a href="../overview_and_setup/overview_and_setup.html#ugr.ovv.eclipse_setup.example_code" class="olink">Section 3.2, “Setting up Eclipse to view Example Code”</a> in <span class="olinkdocname">Overview & Setup</span>.)</p><p>In Eclipse, expand the <code class="literal">uimaj-examples</code> project in the
+ Package Explorer view, and browse to the file
+ <code class="literal">descriptors/tutorial/ex1/TutorialTypeSystem.xml</code>.
+ Right-click on the file in the navigator and select Open With <span class="symbol">→</span> Component
+ Descriptor Editor. Once the editor opens, click on the “<span class="quote">Type System</span>”
+ tab at the bottom of the editor window. You should see a view such as the
+ following:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="768"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image002.jpg" width="768" alt="Screenshot of editor for Type System Definitions"></td></tr></table></div></div><p>Our annotator will need only one type –
+ <code class="literal">org.apache.uima.tutorial.RoomNumber</code>. (We use the same
+ namespace conventions as are used for Java classes.) Just as in Java, types have
+ supertypes. The supertype is listed in the second column of the left table. In this
+ case our RoomNumber annotation extends from the built-in type
+ <code class="literal">uima.tcas.Annotation</code>.</p><p>Descriptions can be included with types and features. In this example, there is a
+ description associated with the <code class="literal">building</code> feature. To see it,
+ hover the mouse over the feature.</p><p>The bottom tab labeled “<span class="quote">Source</span>” will show you the XML source file
+ associated with this descriptor.</p><p>The built-in Annotation type declares three fields (called
+ <span class="emphasis"><em>Features</em></span> in CAS terminology). The features <code class="literal">begin</code>
+ and <code class="literal">end</code> store the character offsets of the span of text to which the
+ annotation refers. The feature <code class="literal">sofa</code> (Subject of Analysis) indicates
+ which document the begin and end offsets point into. The <code class="literal">sofa</code> feature
+ can be ignored for now since we assume in this tutorial that the CAS contains only one
+ subject of analysis (document).</p><p>Our RoomNumber type will inherit these three features from
+ <code class="literal">uima.tcas.Annotation</code>, its supertype; they are not visible in
+ this view because inherited features are not shown. One additional feature,
+ <code class="literal">building</code>, is declared. It takes a String as its value. Instead
+ of String, we could have declared the range-type of our feature to be any other CAS type
+ (defined or built-in).</p><p>If you are not using Eclipse and need to edit the type system, you can do so directly with any XML
+ or text editor. The following is the actual XML representation of the Type
+ System displayed above in the editor:</p><pre class="programlisting"><?xml version="1.0" encoding="UTF-8" ?>
+ <typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
+   <name>TutorialTypeSystem</name>
+   <description>Type System Definition for the tutorial examples -
+     as of Exercise 1</description>
+   <vendor>Apache Software Foundation</vendor>
+   <version>1.0</version>
+   <types>
+     <typeDescription>
+       <name>org.apache.uima.tutorial.RoomNumber</name>
+       <description></description>
+       <supertypeName>uima.tcas.Annotation</supertypeName>
+       <features>
+         <featureDescription>
+           <name>building</name>
+           <description>Building containing this room</description>
+           <rangeTypeName>uima.cas.String</rangeTypeName>
+         </featureDescription>
+       </features>
+     </typeDescription>
+   </types>
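+   <!-- Illustration only, not part of the shipped TutorialTypeSystem.xml:
+        a further feature would be declared as another featureDescription
+        inside the features element above, and its range may be any built-in
+        or user-defined CAS type. The feature name "floor" is hypothetical.
+        <featureDescription>
+          <name>floor</name>
+          <description>Floor on which this room is located</description>
+          <rangeTypeName>uima.cas.Integer</rangeTypeName>
+        </featureDescription>
+   -->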
+ </typeSystemDescription></pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.generating_jcas_sources"></a>1.1.2. Generating Java Source Files for CAS Types</h3></div></div></div><p>When you save a descriptor that you have modified, the Component Descriptor
+ Editor will automatically generate Java classes corresponding to the types that are
+ defined in that descriptor (unless this has been disabled), using a utility called
+ JCasGen. These Java classes will have the same name (including package) as the CAS
+ types, and will have get and set methods for each of the features that you have
+ defined.</p><p>This feature is enabled/disabled using the UIMA menu pulldown (or the Eclipse
+ Preferences <span class="symbol">→</span> UIMA). If automatic running of JCasGen is not happening, please
+ make sure the option is checked:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="575"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image004.jpg" width="575" alt="Screenshot of enabling automatic running of JCasGen"></td></tr></table></div></div><p>The Java class for the example org.apache.uima.tutorial.RoomNumber type can
+ be found in <code class="literal">src/org/apache/uima/tutorial/RoomNumber.java</code>.
+ You will see how to use these generated classes in the next section.</p><p>If you are not using the Component Descriptor Editor, you will need to generate
+ these Java classes by using the <span class="emphasis"><em>JCasGen</em></span> tool. JCasGen reads a
+ Type System Descriptor XML file and generates the corresponding Java classes that
+ you can then use in your annotator code. To launch JCasGen, run the jcasgen shell
+ script located in the <code class="literal">/bin</code> directory of the UIMA SDK
+ installation. This should launch a GUI that looks something like this:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="532"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image006.jpg" width="532" alt="Screenshot of JCasGen"></td></tr></table></div></div><p>Use the “<span class="quote">Browse</span>” buttons to select your input file
+ (TutorialTypeSystem.xml) and output directory (the root of the source tree into
+ which you want the generated files placed). Then click the “<span class="quote">Go</span>”
+ button. If the Type System Descriptor has no errors, new Java source files will be
+ generated under the specified output directory.</p><p>There are some additional options to choose from when running JCasGen; please
+ refer to the <a href="../tools/tools.html#ugr.tools.jcasgen" class="olink">Chapter 6, JCasGen User's Guide
+ </a> in <span class="olinkdocname">UIMA Tools Guide and Reference</span> for details.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.developing_annotator_code"></a>1.1.3. Developing Your Annotator Code</h3></div></div></div><p>Annotator implementations all implement a standard interface (AnalysisComponent), having several
+ methods, the most important of which are:
+
+ </p><div class="itemizedlist"><ul type="disc" compact><li><p><code class="literal">initialize</code>, </p></li><li><p><code class="literal">process</code>, and </p></li><li><p><code class="literal">destroy</code>. </p></li></ul></div><p><code class="literal">initialize</code> is called by the framework once when it first creates an instance of the
+ annotator class. <code class="literal">process</code> is called once per item being processed.
+ <code class="literal">destroy</code> may be called by the application when it is done using your annotator. There is a
+ default implementation of this interface for annotators using the JCas, called JCasAnnotator_ImplBase, which
+ has implementations of all required methods except for the process method.</p><p>Our annotator class extends the JCasAnnotator_ImplBase; most annotators that use the JCas will extend
+ from this class, so they only have to implement the process method. This class is not restricted to handling
+ just text; see <a href="tutorials_and_users_guides.html#ugr.tug.aas" class="olink">Chapter 5, Annotations, Artifacts, and Sofas
+ </a>.</p><p>Annotators are not required to extend from the JCasAnnotator_ImplBase class; they may instead
+ directly implement the AnalysisComponent interface, and provide all method implementations themselves.
+ <sup>[<a name="d0e414" href="#ftn.d0e414">1</a>]</sup> This allows you to have
+ your annotator inherit from some other superclass if necessary. If you would like to do this, see the JavaDocs
+ for JCasAnnotator for descriptions of the methods you must implement.</p><p>Annotator classes need to be public, cannot be declared abstract, and must have public, 0-argument
+ constructors, so that they can be instantiated by the framework. <sup>[<a name="d0e432" href="#ftn.d0e432">2</a>]</sup></p><p>The class definition for our RoomNumberAnnotator implements the process method, and is shown here. You
+ can find the source for this in
+ <code class="literal">uimaj-examples/src/org/apache/uima/tutorial/ex1/RoomNumberAnnotator.java</code>.
+ </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>In Eclipse, in the “<span class="quote">Package Explorer</span>” view, this will appear by default in the project
+ <code class="literal">uimaj-examples</code>, in the folder <code class="literal">src</code>, in the package
+ <code class="literal">org.apache.uima.tutorial.ex1</code>.</p></div><p>In Eclipse, open
+ RoomNumberAnnotator.java in the uimaj-examples project, under the src directory.</p><pre class="programlisting">package org.apache.uima.tutorial.ex1;
+
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
+import org.apache.uima.jcas.JCas;
+import org.apache.uima.tutorial.RoomNumber;
+
+/**
+ * Example annotator that detects room numbers using
+ * Java 1.4 regular expressions.
+ */
+public class RoomNumberAnnotator extends JCasAnnotator_ImplBase {
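+  // Yorktown rooms: a digit 0-4, any digit, a hyphen, a digit 0-2, then two digits
+  private Pattern mYorktownPattern =
+      Pattern.compile("\\b[0-4]\\d-[0-2]\\d\\d\\b");
+
+  // Hawthorne rooms such as GN-K35: G or 1-4, N or S, a hyphen, a capital letter, two digits
+  private Pattern mHawthornePattern =
+      Pattern.compile("\\b[G1-4][NS]-[A-Z]\\d\\d\\b");
+
+  public void process(JCas aJCas) {
+    // Implementation is shown and discussed below
+  }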
+}</pre><p>The two Java class fields, mYorktownPattern and mHawthornePattern, hold regular expressions that
+ will be used in the process method. Note that these two fields are part of the Java implementation of the
+ annotator code, and not a part of the CAS type system. We are using the regular expression facility that is
+ built into Java 1.4. It is not critical that you know the details of how this works, but if you are curious the
+ details can be found in the Java API docs for the java.util.regex package.</p><p>The only method that we are required to implement is <code class="literal">process</code>. This method is typically
+ called once for each document that is being analyzed. This method takes one argument, which is a JCas instance;
+ this holds the document to be analyzed and all of the analysis results. <sup>[<a name="d0e466" href="#ftn.d0e466">3</a>]</sup></p><pre class="programlisting">public void process(JCas aJCas) {
+  // get document text
+  String docText = aJCas.getDocumentText();
+  // search for Yorktown room numbers
+  Matcher matcher = mYorktownPattern.matcher(docText);
+  int pos = 0;
+  while (matcher.find(pos)) {
+    // found one - create annotation
+    RoomNumber annotation = new RoomNumber(aJCas);
+    annotation.setBegin(matcher.start());
+    annotation.setEnd(matcher.end());
+    annotation.setBuilding("Yorktown");
+    annotation.addToIndexes();
+    pos = matcher.end();
+  }
+  // search for Hawthorne room numbers
+  matcher = mHawthornePattern.matcher(docText);
+  pos = 0;
+  while (matcher.find(pos)) {
+    // found one - create annotation
+    RoomNumber annotation = new RoomNumber(aJCas);
+    annotation.setBegin(matcher.start());
+    annotation.setEnd(matcher.end());
+    annotation.setBuilding("Hawthorne");
+    annotation.addToIndexes();
+    pos = matcher.end();
+  }
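+
+  // Illustration only (not part of the shipped example): once indexed, these
+  // annotations can be read back by this or any later component through the
+  // default annotation index. A minimal sketch, assuming an import of
+  // org.apache.uima.cas.FSIterator:
+  //
+  //   FSIterator rooms = aJCas.getAnnotationIndex(RoomNumber.type).iterator();
+  //   while (rooms.hasNext()) {
+  //     RoomNumber room = (RoomNumber) rooms.next();
+  //     System.out.println(room.getCoveredText() + " -> " + room.getBuilding());
+  //   }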
+}</pre><p>The Matcher class is part of the java.util.regex package and is used to find the room numbers in the
+ document text. When we find one, recording the annotation is as simple as creating a new Java object and
+ calling some set methods:</p><pre class="programlisting">RoomNumber annotation = new RoomNumber(aJCas);
+annotation.setBegin(matcher.start());
+annotation.setEnd(matcher.end());
+annotation.setBuilding("Yorktown");</pre><p>The <code class="literal">RoomNumber</code> class was generated from the type system description by the
+ Component Descriptor Editor or the JCasGen tool, as discussed in the previous section.</p><p>Finally, we call <code class="literal">annotation.addToIndexes()</code> to add the new annotation to the
+ indexes maintained in the CAS. By default, the CAS implementation used for analysis of text documents keeps
+ an index of all annotations in their order from beginning to end of the document. Subsequent annotators or
+ applications use the indexes to iterate over the annotations. </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>If you don't add the instance to the indexes, downstream annotators cannot retrieve it
+ through the indexes.</p></div><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>You can also call <code class="literal">addToIndexes()</code> on Feature Structures that are not subtypes of
+ <code class="literal">uima.tcas.Annotation</code>, but these will not be sorted in any particular way. If you want
+ to specify a sort order, you can define your own custom indexes in the CAS: see <a href="../references/references.html#ugr.ref.cas" class="olink">Chapter 4, CAS Reference
+ </a> in <span class="olinkdocname">UIMA References</span> and <a href="../references/references.html#ugr.ref.xml.component_descriptor.aes.index" class="olink">Section 2.4.1.7, “Index Definition”</a> in <span class="olinkdocname">UIMA References</span> for details.</p></div><p>We're almost ready to test the RoomNumberAnnotator. There is just one more step
+ remaining.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.creating_xml_descriptor"></a>1.1.4. Creating the XML Descriptor</h3></div></div></div><p>The UIMA architecture requires that descriptive information about an
+ annotator be represented in an XML file and provided along with the annotator class
+ file(s) to the UIMA framework at run time. This XML file is called an
+ <span class="emphasis"><em>Analysis Engine Descriptor</em></span>. The descriptor includes:
+
+ </p><div class="itemizedlist"><ul type="disc"><li><p>Name, description, version, and vendor</p></li><li><p>The annotator's inputs and outputs, defined in terms of
+ the types in a Type System Descriptor</p></li><li><p>Declaration of the configuration parameters that the
+ annotator accepts </p></li></ul></div><p> </p><p>The <span class="emphasis"><em>Component Descriptor Editor</em></span> plugin, which we
+ previously used to edit the Type System descriptor, can also be used to edit Analysis
+ Engine Descriptors.</p><p>A descriptor for our RoomNumberAnnotator is provided with the UIMA
+ distribution under the name
+ <code class="literal">descriptors/tutorial/ex1/RoomNumberAnnotator.xml</code>. To
+ edit it in Eclipse, right-click on that file in the navigator and select Open With
+ <span class="symbol">→</span> Component Descriptor Editor.</p><div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Tip</h3><p>In Eclipse, you can double
+ click on the tab at the top of the Component Descriptor Editor's window
+ identifying the currently selected editor, and the window will
+ “<span class="quote">Maximize</span>”. Double click it again to restore the original size.</p></div><p>If you are not using Eclipse, you will need to edit Analysis Engine descriptors
+ manually. See <a href="#ugr.tug.aae.xml_intro_ae_descriptor" title="1.8. Introduction to Analysis Engine Descriptor XML Syntax">Section 1.8, “Analysis Engine XML Descriptor”</a> for an
+ introduction to the Analysis Engine descriptor XML syntax. The remainder of this
+ section assumes you are using the Component Descriptor Editor plug-in to edit the
+ Analysis Engine descriptor.</p><p>The Component Descriptor Editor consists of several tabbed pages; we will only
+ need to use a few of them here. For more information on using this editor, see <a href="../tools/tools.html#ugr.tools.cde" class="olink">Chapter 1, Component Descriptor Editor User's Guide
+ </a> in <span class="olinkdocname">UIMA Tools Guide and Reference</span>.</p><p>The initial page of the Component Descriptor Editor is the Overview page, which
+ appears as follows:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image008.jpg" width="574" alt="Screenshot of Component Descriptor Editor overview page"></td></tr></table></div></div><p>This presents an overview of the RoomNumberAnnotator Analysis Engine (AE). The
+ left side of the page shows that this descriptor is for a
+ <span class="emphasis"><em>Primitive</em></span> AE (meaning it consists of a single annotator),
+ and that the annotator code is developed in Java. Also, it specifies the Java class
+ that implements our logic (the code which was discussed in the previous section).
+ Finally, on the right side of the page are listed some descriptive attributes of our
+ annotator.</p><p>The other two pages that need to be filled out are the Type System page and the
+ Capabilities page. You can switch to these pages using the tabs at the bottom of the
+ Component Descriptor Editor. In the tutorial, these are already filled out for
+ you.</p><p>The RoomNumberAnnotator will be using the TutorialTypeSystem we looked at in
+ <a href="#ugr.tug.aae.defining_types" title="1.1.1. Defining Types">Section 1.1.1, “Defining Types”</a>. To specify this, we add
+ this type system to the Analysis Engine's list of Imported Type Systems, using
+ the Type System page's right side panel, as shown here:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="576"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image010.jpg" width="576" alt="Screenshot of CDE Type System page"></td></tr></table></div></div><p>On the Capabilities page, we define our annotator's inputs and outputs, in
+ terms of the types in the type system. The Capabilities page is shown below:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="534"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image012.jpg" width="534" alt="Screenshot of CDE Capabilities page"></td></tr></table></div></div><p>Although capabilities come in sets, having multiple sets is deprecated; here
+ we're just using one set. The RoomNumberAnnotator is very simple. It requires
+ no input types, as it operates directly on the document text -- which is supplied as a
+ part of the CAS initialization (and which is always assumed to be present). It
+ produces only one output type (RoomNumber), and it sets the value of the
+ <code class="literal">building</code> feature on that type. This is all represented on the
+ Capabilities page.</p><p>The Capabilities page has two other parts for specifying languages and Sofas.
+ The languages section allows you to specify which languages your Analysis Engine
+ supports. The RoomNumberAnnotator happens to be language-independent, so we can
+ leave this blank. The Sofas section allows you to specify the names of additional
+ subjects of analysis. This capability and the Sofa Mappings at the bottom are
+ advanced topics, described in <a href="tutorials_and_users_guides.html#ugr.tug.aas" class="olink">Chapter 5, Annotations, Artifacts, and Sofas
+ </a>. </p><p>This is all of the information we need to provide for a simple annotator. If you
+ want to peek at the XML that this tool saves you from having to write, click on the
+ “<span class="quote">Source</span>” tab at the bottom to view the generated XML.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.testing_your_annotator"></a>1.1.5. Testing Your Annotator</h3></div></div></div><p>Having developed an annotator, we need a way to try it out on some example
+ documents. The UIMA SDK includes a tool called the Document Analyzer that will allow
+ us to do this. To run the Document Analyzer, execute the documentAnalyzer shell
+ script that is in the <code class="literal">bin</code> directory of your UIMA SDK
+ installation, or, if you are using the example Eclipse project, execute the
+ “<span class="quote">UIMA Document Analyzer</span>” run configuration supplied with that
+ project. (To do this, select Run <span class="symbol">→</span> Run ... from the menu bar and, under Java
+ Applications in the left box, click on UIMA Document Analyzer.)</p><p>You should see a screen that looks like this:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image014.jpg" width="574" alt="Screenshot of UIMA Document Analyzer GUI"></td></tr></table></div></div><p>There are six options on this screen:</p><div class="orderedlist"><ol type="1"><li><p>Directory containing documents to analyze</p></li><li><p>Directory where analysis results will be written</p></li><li><p>The XML descriptor for the Analysis Engine (AE) you want to
+ run</p></li><li><p>(Optional) an XML tag, within the input documents, that contains
+ the text to be analyzed. For example, the value TEXT would cause the AE to only
+ analyze the portion of the document enclosed within
+ <TEXT>...</TEXT> tags.</p></li><li><p>Language of the document </p></li><li><p>Character encoding </p></li></ol></div><p>Use the Browse button next to the third item to set the “<span class="quote">Location of AE XML
+ Descriptor</span>” field to the descriptor we've just been discussing:
+ <code class="literal"><where-you-installed-uima-e.g.UIMA_HOME>/examples/descriptors/tutorial/ex1/RoomNumberAnnotator.xml</code>.
+ Set the other fields to the values shown in the screen shot above (which should be the
+ default values if this is the first time you've run the Document Analyzer). Then
+ click the “<span class="quote">Run</span>” button to start processing.</p><p>When processing completes, an “<span class="quote">Analysis Results</span>” window should
+ appear.</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="332"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image016.jpg" width="332" alt="Screenshot of UIMA Document Analyzer Results GUI"></td></tr></table></div></div><p>Make sure “<span class="quote">Java Viewer</span>” is selected as the Results Display
+ Format, and <span class="bold"><strong>double-click</strong></span> on the document
+ UIMASummerSchool2003.txt to view the annotations that were discovered. The view
+ should look something like this:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="510"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image018.jpg" width="510" alt="Screenshot of UIMA CAS Annotation Viewer GUI"></td></tr></table></div></div><p>You can click the mouse on one of the highlighted annotations to see a list of all
+ its features in the frame on the right.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The legend will only show
+ those types which have at least one instance in the CAS, and are declared as outputs in the
+ capabilities section of the descriptor (see <a href="#ugr.tug.aae.creating_xml_descriptor" title="1.1.4. Creating the XML Descriptor">Section 1.1.4, “Creating the XML Descriptor”</a>). </p></div><p>You can use the Document Analyzer to test any UIMA annotator;
+ just make sure that the annotator's classes are in the class
+ path.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="ugr.tug.aae.configuration_logging"></a>1.2. Configuration and Logging</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.configuration_parameters"></a>1.2.1. Configuration Parameters</h3></div></div></div><p>The example RoomNumberAnnotator from the previous section used hardcoded
+ regular expressions and location names, which is not very flexible: supporting a
+ new room-numbering convention means editing and recompiling the annotator's Java
+ code. A better solution is to supply the patterns and their locations through
+ configuration parameters.</p><p>UIMA allows annotators to declare configuration parameters in their
+ descriptors. The descriptor also specifies default values for the parameters,
+ though these can be overridden at runtime.</p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.declaring_parameters_in_the_descriptor"></a>1.2.1.1. Declaring Parameters in the Descriptor</h4></div></div></div><p>The example descriptor
+ <code class="literal">descriptors/tutorial/ex2/RoomNumberAnnotator.xml</code> is
+ the same as the descriptor from the previous section except that information has
+ been filled in for the Parameters and Parameter Settings pages of the Component
+ Descriptor Editor.</p><p>First, in Eclipse, open example two's RoomNumberAnnotator in the
+ Component Descriptor Editor, and then go to the Parameters page (click on the
+ Parameters tab at the bottom of the window), which is shown below:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="538"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image020.jpg" width="538" alt="Screenshot of UIMA Component Descriptor Editor (CDE) Parameters page"></td></tr></table></div></div><p>Two parameters, Patterns and Locations, have been declared. In this
+ screen shot, the mouse (not shown) is hovering over Patterns to show its
+ description in the small popup window. Every parameter has the following
+ information associated with it:</p><div class="itemizedlist"><ul type="disc"><li><p>name – the name by which the annotator code
+ refers to the parameter</p></li><li><p>description – a natural language description of the
+ intent of the parameter</p></li><li><p>type – the data type of the parameter's value
+ – must be one of String, Integer, Float, or Boolean.</p></li><li><p>multiValued – true if the parameter can take
+ multiple-values (an array), false if the parameter takes only a single value.
+ Shown above as <code class="literal">Multi</code>.</p></li><li><p>mandatory – true if a value must be provided for the
+ parameter. Shown above as <code class="literal">Req</code> (for required). </p></li></ul></div><p>Both of our parameters are mandatory and accept an array of Strings as their
+ value.</p><p>Next, default values are assigned to the parameters on the Parameter Settings
+ page:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="538"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image022.jpg" width="538" alt="Screenshot of UIMA Component Descriptor Editor (CDE) Parameter Settings page"></td></tr></table></div></div><p>Here the “<span class="quote">Patterns</span>” parameter is selected, and the right pane
+ shows the list of values for this parameter, in this case the regular expressions
+ that match particular room numbering conventions. Notice the third pattern is
+ new, for matching the style of room numbers in the third building, which has room
+ numbers such as <code class="literal">J2-A11</code>.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.accessing_parameter_values_from_annotator"></a>1.2.1.2. Accessing Parameter Values from the Annotator Code</h4></div></div></div><p>The class
+ <code class="literal">org.apache.uima.tutorial.ex2.RoomNumberAnnotator</code> has
+ overridden the initialize method. The initialize method is called by the UIMA
+ framework when the annotator is instantiated, so it is a good place to read
+ configuration parameter values. The default initialize method does nothing with
+ configuration parameters, so you have to override it. To see the code in Eclipse,
+ switch to the src folder, and open
+ <code class="literal">org.apache.uima.tutorial.ex2</code>. Here is the method
+ body:</p><pre class="programlisting">/**
+ * @see AnalysisComponent#initialize(UimaContext)
+ */
+public void initialize(UimaContext aContext)
+    throws ResourceInitializationException {
+  super.initialize(aContext);
+
+  // Get config. parameter values
+  String[] patternStrings =
+      (String[]) aContext.getConfigParameterValue("Patterns");
+  mLocations =
+      (String[]) aContext.getConfigParameterValue("Locations");
+
+  // compile regular expressions
+  mPatterns = new Pattern[patternStrings.length];
+  for (int i = 0; i < patternStrings.length; i++) {
+    mPatterns[i] = Pattern.compile(patternStrings[i]);
+  }
+}</pre><p>Configuration parameter values are accessed through the UimaContext. As you
+ will see in subsequent sections of this chapter, the UimaContext is the
+ annotator's access point for all of the facilities provided by the UIMA
+ framework – for example logging and external resource access.</p><p>The UimaContext's <code class="literal">getConfigParameterValue</code>
+ method takes the name of the parameter as an argument; this must match one of the
+ parameters declared in the descriptor. The return value of this method is a Java
+ Object, whose type corresponds to the declared type of the parameter. It is up to the
+ annotator to cast it to the appropriate type, String[] in this case.</p><p>If there is a problem retrieving the parameter values, the framework throws an
+ exception. Generally annotators don't handle these, and just let them
+ propagate up.</p><p>To see the configuration parameters working, run the Document Analyzer
+ application and select the descriptor
+ <code class="literal">examples/descriptors/tutorial/ex2/RoomNumberAnnotator.xml</code>.
+ In the example document <code class="literal">WatsonConferenceRooms.txt</code>, you
+ should see some examples of Hawthorne II room numbers that would not have been
+ detected by the ex1 version of RoomNumberAnnotator.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.supporting_reconfiguration"></a>1.2.1.3. Supporting Reconfiguration</h4></div></div></div><p>If you take a look at the JavaDocs (located in the <a href="api/index.html" target="_top">docs/api</a> directory) for
+ <code class="literal">org.apache.uima.analysis_component.AnalysisComponent</code>
+ (which our annotator implements indirectly through JCasAnnotator_ImplBase),
+ you will see that there is a reconfigure() method, which is called by the containing
+ application through the UIMA framework, if the configuration parameter values
+ are changed.</p><p>The AnalysisComponent_ImplBase class provides a default implementation
+ that just calls the annotator's destroy method followed by its initialize
+ method. This works fine for our annotator. The only situation in which you might
+ want to override the default reconfigure() is if your annotator has very expensive
+ initialization logic, and you don't want to reinitialize everything if just
+ one configuration parameter has changed. In that case, you can provide a more
+ intelligent implementation of reconfigure() for your annotator.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.configuration_parameter_groups"></a>1.2.1.4. Configuration Parameter Groups</h4></div></div></div><p>For annotators with many sets of configuration parameters, UIMA supports
+ organizing them into groups. It is possible to define a parameter with the same name
+ in multiple groups; one common use for this is for annotators that can process
+ documents in several languages and which want to have different parameter
+ settings for the different languages.</p><p>The syntax for defining parameter groups in your descriptor is fairly
+ straightforward – see <a href="../references/references.html#ugr.ref.xml.component_descriptor" class="olink">Chapter 2, Component Descriptor Reference
+ </a> in <span class="olinkdocname">UIMA References</span> for details. Values of
+ parameters defined within groups are accessed through the two-argument version
+ of <code class="literal">UimaContext.getConfigParameterValue</code>, which takes
+ both the group name and the parameter name as its arguments.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.logging"></a>1.2.2. Logging</h3></div></div></div><p>The UIMA SDK provides a logging facility, which is very similar to the
+ java.util.logging.Logger class that was introduced in Java 1.4.</p><p>In the Java architecture, each logger instance is associated with a name. By
+ convention, this name is often the fully qualified class name of the component
+ issuing the logging call. The name can be referenced in a configuration file when
+ specifying which kinds of log messages to actually log, and where they should
+ go.</p><p>The UIMA framework supports this convention using the
+ <code class="literal">UimaContext</code> object. If you access a logger instance using
+ <code class="literal">getContext().getLogger()</code> within an Annotator, the logger
+ name will be the fully qualified name of the Annotator implementation class.</p><p>Here is an example from the process method of
+ <code class="literal">org.apache.uima.tutorial.ex2.RoomNumberAnnotator</code>:
+
+
+ </p><pre class="programlisting">getContext().getLogger().log(Level.FINEST,"Found: " + annotation);</pre><p>
+ </p><p>The first argument to the log method is the level of the log output. Here, a value of
+ FINEST indicates that this is a highly-detailed tracing message. While useful for
+ debugging, it is likely that real applications will not output log messages at this
+ level, in order to improve their performance. Other defined levels, from lowest to
+ highest importance, are FINER, FINE, CONFIG, INFO, WARNING, and SEVERE.</p><p>If no logging configuration file is provided (see next section), the Java
+ Virtual Machine defaults are used, which typically log messages at level INFO and
+ higher, and direct output to the console.</p><p>If you specify the standard UIMA SDK <code class="literal">Logger.properties</code>,
+ the output will be directed to a file named uima.log, in the current working directory
+ (often the “<span class="quote">project</span>” directory when running from Eclipse, for
+ instance).</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>When using Eclipse, the uima.log file, if written
+ into the Eclipse workspace in the project uimaj-examples, for example, may not appear
+ in the Eclipse package explorer view until you right-click the uimaj-examples project
+ with the mouse, and select “<span class="quote">Refresh</span>”. This operation refreshes the
+ Eclipse display to conform to what may have changed on the file system. Also, you can set
+ the Eclipse preferences for the workspace to automatically refresh (Window <span class="symbol">→</span>
+ Preferences <span class="symbol">→</span> General <span class="symbol">→</span> Workspace, then click the “<span class="quote">refresh
+ automatically</span>” checkbox).</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.logging.configuring"></a>1.2.2.1. Specifying the Logging Configuration</h4></div></div></div><p>The standard UIMA logger uses the underlying Java 1.4 logging mechanism. You
+ can use the APIs that come with that to configure the logging. In addition, the
+ standard Java 1.4 logging initialization mechanisms will look for a Java System
+ Property named <code class="literal">java.util.logging.config.file</code> and if
+ found, will use the value of this property as the name of a standard
+ “<span class="quote">properties</span>” file, for setting the logging level. Please refer to
+ the Java 1.4 documentation for more information on the format and use of this
+ file.</p><p>Two sample logging specification property files can be found in the UIMA_HOME
+ directory where the UIMA SDK is installed:
+ <code class="literal">config/Logger.properties</code>, and
+ <code class="literal">config/FileConsoleLogger.properties</code>. These specify the same
+ logging, except the first logs just to a file, while the second logs both to a file and
+ to the console. You can edit these files, or create additional ones, as described
+ below, to change the logging behavior.</p><p>When running your own Java application, you can specify the location of the
+ logging configuration file on your Java command line by setting the Java system
+ property <code class="literal">java.util.logging.config.file</code> to be the logging
+ configuration filename. This file specification can be either absolute or
+ relative to the working directory. For example:
+
+
+ </p><pre class="programlisting">java "-Djava.util.logging.config.file=C:/Program Files/apache-uima/config/Logger.properties"</pre><p>
+ </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>In a shell script, you can use environment variables such as
+ UIMA_HOME if convenient.</p></div><p> </p><p>If you are using Eclipse to launch your application, you can set this property
+ in the VM arguments section of the Arguments tab of the run configuration screen. If
+ you've set an environment variable UIMA_HOME, you could, for example, use the
+ string:
+ <code class="literal">"-Djava.util.logging.config.file=${env_var:UIMA_HOME}/config/Logger.properties"</code>.
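+ </p><p>The same properties format can also be loaded through the standard
+ java.util.logging API, which may be convenient in tests or small launcher programs.
+ The following is a minimal sketch that uses only JDK classes. The class name
+ LoggingConfigSketch, the logger names demo.annotator and demo.other, and the
+ in-memory property text are illustrative assumptions and are not part of the UIMA SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class LoggingConfigSketch {

    // Illustrative property text (an assumption for this sketch). A real setup
    // would normally point at a file such as config/Logger.properties.
    static final String CONFIG =
          "handlers = java.util.logging.ConsoleHandler\n"
        + ".level = INFO\n"
        + "java.util.logging.ConsoleHandler.level = ALL\n"
        + "demo.annotator.level = FINEST\n";

    // Loads the given properties text into the JVM-wide LogManager.
    static void configure(String properties) {
        try (InputStream in = new ByteArrayInputStream(
                properties.getBytes(StandardCharsets.ISO_8859_1))) {
            LogManager.getLogManager().readConfiguration(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        configure(CONFIG);

        // Hypothetical logger names; the first has its own level in CONFIG,
        // the second inherits the global INFO level.
        Logger annotatorLog = Logger.getLogger("demo.annotator");
        Logger otherLog = Logger.getLogger("demo.other");

        annotatorLog.log(Level.FINEST, "tracing detail from demo.annotator");
        otherLog.log(Level.FINE, "filtered out: below the global INFO level");
        otherLog.log(Level.INFO, "informational message from demo.other");
    }
}
```

+ In this sketch the ConsoleHandler level is set to ALL so that the handler does not
+ filter out the FINEST message that the demo.annotator logger lets through.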
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.logging.setting_logging_levels"></a>1.2.2.2. Setting Logging Levels</h4></div></div></div><p>Within the logging control file, the default global logging level specifies
+ which kinds of events are logged across all loggers. For any given facility this
+ global level can be overridden by a facility specific level. Multiple handlers are
+ supported. This allows messages to be directed to a log file, as well as to a
+ “<span class="quote">console</span>”. Note that the ConsoleHandler also has a separate level
+ setting to limit messages printed to the console. For example: <code class="literal">.level=
+ INFO</code> </p><p>The properties file can change where the log is written, as well.</p><p>Facility specific properties allow different logging for each class, as
+ well. For example, to set the com.xyz.foo logger to only log SEVERE messages:
+ <code class="literal">com.xyz.foo.level = SEVERE</code></p><p>If you have a sample annotator class named
+ <code class="literal">org.apache.uima.SampleAnnotator</code>, you can set its log level
+ by specifying: <code class="literal">org.apache.uima.SampleAnnotator.level =
+ ALL</code></p><p>There are other logging controls; for a full discussion, please read the
+ contents of the <code class="literal">Logger.properties</code> file and the Java
+ specification for logging in Java 1.4.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.logging.output_format"></a>1.2.2.3. Format of logging output</h4></div></div></div><p>The logging output is formatted by handlers specified in the properties file
+ for configuring logging, described above. The default formatter that comes with
+ the UIMA SDK formats logging output as follows:</p><p><code class="literal">Timestamp - threadID: sourceInfo: Message level:
+ message</code></p><p> Here's an example:</p><p><code class="literal">7/12/04 2:15:35 PM - 10:
+ org.apache.uima.util.TestClass.main(62): INFO: You are not logged
+ in!</code></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.logging.meaning_of_severity_levels"></a>1.2.2.4. Meaning of the logging severity levels</h4></div></div></div><p>These levels are defined by the Java logging framework, which was
+ incorporated into Java as of the 1.4 release level. The levels are defined in the
+ JavaDocs for java.util.logging.Level, and include both logging and tracing
+ levels:
+ </p><div class="itemizedlist"><ul type="disc" compact><li><p>OFF is a special level that can be used to turn off
+ logging.</p></li><li><p>ALL indicates that all messages should be logged. </p></li><li><p>CONFIG is a message level for configuration messages. These
+ would typically occur once (during configuration) in methods like
+ <code class="literal">initialize()</code>. </p></li><li><p>INFO is a message level for informational messages, for
+ example, connected to server IP: 192.168.120.12 </p></li><li><p>WARNING is a message level indicating a potential
+ problem.</p></li><li><p>SEVERE is a message level indicating a serious
+ failure.</p></li></ul></div><p> Tracing levels, typically used for debugging:
+ </p><div class="itemizedlist"><ul type="disc"><li><p>FINE is a message level providing tracing information,
+ typically at a collection level (messages occurring once per collection).
+ </p></li><li><p>FINER indicates a fairly detailed tracing message,
+ typically at a document level (once per document).</p></li><li><p>FINEST indicates a highly detailed tracing message. </p></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.logging.using_outside_of_an_annotator"></a>1.2.2.5. Using the logger outside of an annotator</h4></div></div></div><p>An application using UIMA may want to log its messages using the same logging
+ framework. This can be done by getting a reference to the UIMA logger, as follows:
+
+
+ </p><pre class="programlisting">Logger logger = UIMAFramework.getLogger(TestClass.class);</pre><p>
+ </p><p>The optional class argument allows filtering by class (if the log handler
+ supports this). If not specified, the name of the returned logger instance is
+ “<span class="quote">org.apache.uima</span>”.</p></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="ugr.tug.aae.building_aggregates"></a>1.3. Building Aggregate Analysis Engines</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.combining_annotators"></a>1.3.1. Combining Annotators</h3></div></div></div><p>The UIMA SDK makes it very easy to combine any sequence of Analysis Engines to
+ form an <span class="emphasis"><em>Aggregate Analysis Engine</em></span>. This is done through an
+ XML descriptor; no Java code is required!</p><p>If you go to the <code class="literal">examples/descriptors/tutorial/ex3</code>
+ folder (in Eclipse, it's in your uimaj-examples project, under the
+ <code class="literal">descriptors/tutorial/ex3</code> folder), you will find a
+ descriptor for a TutorialDateTime annotator. This annotator detects dates and
+ times (and also sentences and words). To see what this annotator can do, try it out
+ using the Document Analyzer. If you are curious as to how this annotator works, the
+ source code is included, but it is not necessary to understand the code at this
+ time.</p><p>We are going to combine the TutorialDateTime annotator with the
+ RoomNumberAnnotator to create an aggregate Analysis Engine. This is illustrated
+ in the following figure:
+
+ </p><div class="figure"><a name="ugr.tug.aae.fig.combining_annotators"></a><div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="560"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image024.png" width="560" alt="Combining Annotators to form an Aggregate Analysis Engine"></td></tr></table></div></div><p class="title"><b>Figure 1.1. Combining Annotators to form an Aggregate Analysis Engine</b></p></div><p><br class="figure-break"> </p><p>The descriptor that does this is named
+ <code class="literal">RoomNumberAndDateTime.xml</code>, which you can open in the
+ Component Descriptor Editor plug-in. This is in the uimaj-examples project in the
+ folder <code class="literal">descriptors/tutorial/ex3</code>. </p><p>The “<span class="quote">Aggregate</span>” page of the Component Descriptor Editor is
+ used to define which components make up the aggregate. A screen shot is shown below.
+ (If you are not using Eclipse, see <a href="#ugr.tug.aae.xml_intro_ae_descriptor" title="1.8. Introduction to Analysis Engine Descriptor XML Syntax">Section 1.8, “Analysis Engine XML Descriptor”</a> for the actual XML syntax
+ for Aggregate Analysis Engine Descriptors.)</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="575"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image026.jpg" width="575" alt="Aggregate page of the Component Descriptor Editor (CDE)"></td></tr></table></div></div><p>On the left side of the screen is the list of component engines that make up the
+ aggregate – in this case, the TutorialDateTime annotator and the
+ RoomNumberAnnotator. To add a component, you can click the “<span class="quote">Add</span>”
+ button and browse to its descriptor. You can also click the “<span class="quote">Find AE</span>”
+ button and search for an Analysis Engine in your Eclipse workspace.
+ </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The “<span class="quote">AddRemote</span>” button is used for adding components
+ which run remotely (for example, on another machine using a remote networking
+ connection). This capability is described in <a href="tutorials_and_users_guides.html#ugr.tug.application.how_to_call_a_uima_service" class="olink">Section 3.6.3, “Calling a UIMA Service”</a>.</p></div><p> </p><p>The order of the components in the left pane does not imply an order of
+ execution. The order of execution, or “<span class="quote">flow</span>” is determined in the
+ “<span class="quote">Component Engine Flow</span>” section on the right. UIMA supports
+ different types of algorithms (including user-definable) for determining the
+ flow. Here we pick the simplest: <code class="literal">FixedFlow</code>. We have chosen to
+ have the RoomNumberAnnotator execute first, although in this case it
+ doesn't really matter, since the RoomNumber and DateTime annotators do not
+ have any dependencies on one another.</p><p>If you look at the “<span class="quote">Type System</span>” page of the Component
+ Descriptor Editor, you will see that it displays the type system but is not
+ editable. The Type System of an Aggregate Analysis Engine is automatically
+ computed by merging the Type Systems of all of its components.</p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>If the components have different definitions for the same type name,
+ the Component Descriptor Editor will show a warning. It is possible to continue past
+ this warning, in which case your aggregate's type system will have the correct
+ “<span class="quote">merged</span>”
+ type definition that contains all of the features defined on that type by all of your
+ components. However, it is not recommended to use this feature in conjunction with JCas,
+ since the JCas Java class definitions cannot be so easily merged. See
+ <a href="../references/references.html#ugr.ref.jcas.merging_types_from_other_specs" class="olink">Section 5.5, “Merging Types”</a> in <span class="olinkdocname">UIMA References</span> for more information.
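+ </p><p>The idea of a merged type definition can be pictured with a small, purely
+ conceptual sketch. It is not the framework's implementation: it models a type's
+ features as a map from feature name to range type name and keeps the features from
+ both definitions. The class TypeMergeSketch, the feature name floor, and the choice
+ to reject a feature declared with two different range types are assumptions made
+ for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypeMergeSketch {

    // Merges two feature maps (feature name -> range type name) that were
    // declared for the same type name. Features from both sides are kept.
    // Assumption of this sketch: the same feature declared with two different
    // range types is treated as a conflict and rejected.
    static Map<String, String> mergeFeatures(Map<String, String> first,
                                             Map<String, String> second) {
        Map<String, String> merged = new LinkedHashMap<>(first);
        for (Map.Entry<String, String> e : second.entrySet()) {
            String existing = merged.putIfAbsent(e.getKey(), e.getValue());
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalStateException("Conflicting range types for feature "
                        + e.getKey() + ": " + existing + " vs " + e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two hypothetical definitions of the same type name.
        Map<String, String> fromFirstComponent = new LinkedHashMap<>();
        fromFirstComponent.put("building", "uima.cas.String");

        Map<String, String> fromSecondComponent = new LinkedHashMap<>();
        fromSecondComponent.put("building", "uima.cas.String");
        fromSecondComponent.put("floor", "uima.cas.Integer");

        // The merged definition carries the features of both components.
        System.out.println(mergeFeatures(fromFirstComponent, fromSecondComponent));
    }
}
```

+ A generated Java class that only knows about one component's features has no
+ accessor for the features contributed by the other component, which is one way to
+ see why hand-merging the Java classes is harder than merging the descriptors.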
+ </p></div><p>The Capabilities page is where you explicitly declare the aggregate Analysis
+ Engine's inputs and outputs. Sofas and Languages are described later.
+
+
+ </p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="565"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image028.jpg" width="565" alt="Screen shot of the Capabilities page of the Component Descriptor Editor"></td></tr></table></div></div><p>
+ </p><p>Note that it is not automatically assumed that all outputs of each component
+ Analysis Engine (AE) are passed through as outputs of the aggregate AE. In this
+ case, for example, we have decided to suppress the Word and Sentence annotations
+ that are produced by the TutorialDateTime annotator.</p><p>You can run this AE using the Document Analyzer in the same way that you run any
+ other AE. Just select the <code class="literal">examples/descriptors/tutorial/ex3/RoomNumberAndDateTime.xml</code>
+ descriptor and click the Run button. You
+ should see that RoomNumbers, Dates, and Times are all shown but that Words and
+ Sentences are not:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="525"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image030.jpg" width="525" alt="Screen shot results of running the Document Analyzer"></td></tr></table></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.aaes_can_contain_cas_consumers"></a>1.3.2. AEs can also contain CAS Consumers</h3></div></div></div><p>In addition to aggregating Analysis Engines, Aggregates can also contain CAS
+ Consumers (see <a href="tutorials_and_users_guides.html#ugr.tug.cpe" class="olink">Chapter 2, Collection Processing Engine Developer's Guide
+ </a>), or even a mixture of these components with regular
+ Analysis Engines. The UIMA examples include an Aggregate that contains
+ both an analysis engine and a CAS consumer, in
+ <code class="literal">examples/descriptors/MixedAggregate.xml</code>.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.reading_results_previous_annotators"></a>1.3.3. Reading the Results of Previous Annotators</h3></div></div></div><p>So far, we have been looking at annotators that look directly at the document text. However, annotators
+ can also use the results of other annotators. One useful thing we can do at this point is look for the
+ co-occurrence of a Date, a RoomNumber, and two Times – and annotate that as a Meeting.</p><p>The CAS maintains <span class="emphasis"><em>indexes</em></span> of annotations, and from an index you can obtain an
+ iterator that allows you to step through all annotations of a particular type. Here's some example code
+ that would iterate over all of the TimeAnnot annotations in the JCas:
+
+
+ </p><pre class="programlisting">FSIndex timeIndex = aJCas.getAnnotationIndex(TimeAnnot.type);
+Iterator timeIter = timeIndex.iterator();
+while (timeIter.hasNext()) {
+ TimeAnnot time = (TimeAnnot)timeIter.next();
+
+ //do something
+}</pre><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>You can also use the method
+ <code class="literal">JCAS.getJFSIndexRepository().getAllIndexedFS(YourClass.type)</code>, which returns an iterator
[... 3539 lines stripped ...]