Posted to commits@uima.apache.org by al...@apache.org on 2007/03/14 23:12:10 UTC

svn commit: r518354 [11/21] - in /incubator/uima/site/trunk/uima-website: docs/ docs/downloads/releaseDocs/ docs/downloads/releaseDocs/2.1.0-incubating/ docs/downloads/releaseDocs/2.1.0-incubating/docs/ docs/downloads/releaseDocs/2.1.0-incubating/docs/...

Added: incubator/uima/site/trunk/uima-website/docs/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html
URL: http://svn.apache.org/viewvc/incubator/uima/site/trunk/uima-website/docs/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html?view=auto&rev=518354
==============================================================================
--- incubator/uima/site/trunk/uima-website/docs/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html (added)
+++ incubator/uima/site/trunk/uima-website/docs/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html Wed Mar 14 15:11:54 2007
@@ -0,0 +1,4143 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
+   <title>UIMA Tutorial and Developers' Guides</title><link rel="stylesheet" href="css/stylesheet.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.70.0"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="book" lang="en" id="d0e2"><div class="titlepage"><div><div><h1 class="title"><a name="d0e2"></a>UIMA Tutorial and Developers' Guides</h1></div><div><div class="authorgroup"><h3 class="corpauthor">Authors: The Apache UIMA Development Community</h3></div></div><div><p class="releaseinfo">Version 2.1</p></div><div><p class="copyright">Copyright &copy; 2006, 2007 The Apache Software Foundation</p></div><div><p class="copyright">Copyright &copy; 2004, 2006 International Business Machines Corporation</p></div><div><div class="legalnotice"><a name="d0e15"></a><p> </p><p><b>Incubation Notice and Disclaimer.&nbsp;</b>Apache UIMA is an effort undergoing incubation at the Apache Software Foundation (ASF). 
+          Incubation is required of all newly accepted projects until a further review indicates that 
+          the infrastructure, communications, and decision making process have stabilized in a manner 
+          consistent with other successful ASF projects. While incubation status is not necessarily 
+          a reflection of the completeness or stability of the code, 
+          it does indicate that the project has yet to be fully endorsed by the ASF.</p><p> </p><p> </p><p><b>License and Disclaimer.&nbsp;</b>The ASF licenses this documentation
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this documentation except in compliance
+           with the License.  You may obtain a copy of the License at
+         
+         </p><div class="blockquote"><blockquote class="blockquote"><a href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a></blockquote></div><p>
+         
+           Unless required by applicable law or agreed to in writing,
+           this documentation and its contents are distributed under the License 
+           on an 
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+         </p><p> </p><p> </p><p><b>Trademarks.&nbsp;</b>All terms mentioned in the text that are known to be trademarks or 
+        service marks have been appropriately capitalized.  Use of such terms
+        in this book should not be regarded as affecting the validity of the
+        trademark or service mark.
+        </p></div></div><div><p class="pubdate">February, 2007</p></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="chapter"><a href="#ugr.tug.aae">1. Annotator &amp; AE Developer's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aae.getting_started">1.1. Getting Started</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aae.defining_types">1.1.1. Defining Types</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.generating_jcas_sources">1.1.2. Generating Java Source Files for CAS Types</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.developing_annotator_code">1.1.3. Developing Your Annotator Code</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.creating_xml_descriptor">1.1.4. Creating the XML Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.testing_your_annotator">1.1.5. Testing Your Annotator</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aae.configuration_logging">1.2. Configuration and Logging</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aae.configuration_parameters">1.2.1. Configuration Parameters</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.logging">1.2.2. Logging</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aae.building_aggregates">1.3. Building Aggregate Analysis Engines</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aae.combining_annotators">1.3.1. Combining Annotators</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.aaes_can_contain_cas_consumers">1.3.2. AEs can also contain CAS Consumers</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.reading_results_previous_annotators">1.3.3. Reading the Results of Previous Annotators</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aae.other_examples">1.4. Other examples</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.additional_topics">1.5. Additional Topics</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aae.contract_for_annotator_methods">1.5.1. Annotator Methods</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.reporting_errors_from_annotators">1.5.2. Reporting errors from Annotators</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.throwing_exceptions_from_annotators">1.5.3. Throwing Exceptions from Annotators</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.accessing_external_resource_files">1.5.4. Accessing External Resource Files</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.result_specification_setting">1.5.5. Result Specifications</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.classpath_when_using_jcas">1.5.6. Class path setup when using JCas</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.using_shell_scripts">1.5.7. Using the Shell Scripts</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aae.common_pitfalls">1.6. Common Pitfalls</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.viewing_UIMA_objects_in_eclipse_debugger">1.7. UIMA Objects in Eclipse Debugger</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.xml_intro_ae_descriptor">1.8. Analysis Engine XML Descriptor</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aae.header_annotator_class_identification">1.8.1. Header and Annotator Class Identification</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.xml_intro_simple_metadata_attributes">1.8.2. Simple Metadata Attributes</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.xml_intro_type_system_definition">1.8.3. Type System Definition</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.xml_intro_capabilities">1.8.4. Capabilities</a></span></dt><dt><span class="section"><a href="#ugr.tug.aae.xml_intro.configuration_parameters">1.8.5. Configuration Parameters (Optional)</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#ugr.tug.cpe">2. CPE Developer's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cpe.concepts">2.1. CPE Concepts</a></span></dt><dt><span class="section"><a href="#ugr.tug.cpe.configurator_and_viewer">2.2. CPE Configurator and CAS viewer</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cpe.cpe_configurator">2.2.1. Using the CPE Configurator</a></span></dt><dt><span class="section"><a href="#ugr.tug.cpe.running_cpe_configurator_from_eclipse">2.2.2. Running the CPE Configurator from Eclipse</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cpe.running_cpe_from_application">2.3. Running a CPE from Your Own Java Application</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cpe.using_listeners">2.3.1. Using Listeners</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cpe.developing_collection_processing_components">2.4. Developing Collection Processing Components</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cpe.collection_reader.developing">2.4.1. Developing Collection Readers</a></span></dt><dt><span class="section"><a href="#ugr.tug.cpe.cas_initializer.developing">2.4.2. Developing CAS
+      Initializers</a></span></dt><dt><span class="section"><a href="#ugr.tug.cpe.cas_consumer.developing">2.4.3. Developing CAS
+      Consumers</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cpe.deploying_a_cpe">2.5. Deploying a CPE</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cpe.managed_deployment">2.5.1. Deploying Managed CAS Processors</a></span></dt><dt><span class="section"><a href="#ugr.tug.cpe.deploying_nonmanaged_cas_processors">2.5.2. Deploying Non-managed CAS Processors</a></span></dt><dt><span class="section"><a href="#ugr.tug.cpe.integrated_deployment">2.5.3. Deploying Integrated CAS Processors</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cpe.collection_processing_examples">2.6. Collection Processing Examples</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tug.application">3. Application Developer's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.appication.uimaframework_class">3.1. The UIMAFramework Class</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.using_aes">3.2. Using Analysis Engines</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.application.instantiating_an_ae">3.2.1. Instantiating an Analysis Engine</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.analyzing_text_documents">3.2.2. Analyzing Text Documents</a></span></dt><dt><span class="section"><a href="#ugr.tug.applications.analyzing_non_text_artifacts">3.2.3. Analyzing Non-Text Artifacts</a></span></dt><dt><span class="section"><a href="#ugr.tug.applications.accessing_analysis_results">3.2.4. Accessing Analysis Results</a></span></dt><dt><span class="section"><a href="#ugr.tug.applications.multi_threaded">3.2.5. Multi-threaded Applications</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.using_multiple_aes">3.2.6. Multiple AEs &amp; Creating Shared CASes</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.saving_cases_to_file_systems">3.2.7. Saving CASes to file systems</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.application.using_cpes">3.3. Using Collection Processing Engines</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.application.running_a_cpe_from_a_descriptor">3.3.1. Running a CPE from a Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.configuring_a_cpe_descriptor_programmatically">3.3.2. Configuring a CPE Descriptor Programmatically</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.application.setting_configuration_parameters">3.4. Setting Configuration Parameters</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.integrating_text_analysis_and_search">3.5. Integrating Text Analysis and Search</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.application.building_an_index">3.5.1. Building an Index</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.search.query_tool">3.5.2. Semantic Search Query Tool</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.application.remote_services">3.6. Working with Remote Services</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.application.how_to_deploy_as_soap">3.6.1. Deploying as SOAP Service</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.how_to_deploy_a_vinci_service">3.6.2. Deploying as a Vinci Service</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.how_to_call_a_uima_service">3.6.3. Calling a UIMA Service</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.restrictions_on_remotely_deployed_services">3.6.4. Restrictions on remotely deployed services</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.vns">3.6.5. The Vinci Naming Services (VNS)</a></span></dt><dt><span class="section"><a href="#ugr.tug.configuring_timeout_settings">3.6.6. Configuring Timeout Settings</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.application.increasing_performance_using_parallelism">3.7. Increasing performance using parallelism</a></span></dt><dt><span class="section"><a href="#ugr.tug.application.jmx">3.8. Monitoring AE Performance using JMX</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tug.fc">4. Flow Controller Developer's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.fc.developing_fc_code">4.1. Developing the Flow Controller Code</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.fc.fc_interface_overview">4.1.1. Flow Controller Interface Overview</a></span></dt><dt><span class="section"><a href="#ugr.tug.fc.example_code">4.1.2. Example Code</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.fc.creating_fc_descriptor">4.2. Creating the Flow Controller Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.fc.adding_fc_to_aggregate">4.3. Adding Flow Controller to an Aggregate</a></span></dt><dt><span class="section"><a href="#ugr.tug.fc.adding_fc_to_cpe">4.4. Adding Flow Controller to CPE</a></span></dt><dt><span class="section"><a href="#ugr.tug.fc.using_fc_with_cas_multipliers">4.5. Using Flow Controllers with CAS Multipliers</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tug.aas">5. Annotations, Artifacts &amp; Sofas</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aas.terminology">5.1. Terminology</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aas.artifact">5.1.1. Artifact</a></span></dt><dt><span class="section"><a href="#ugr.tug.aas.sofa">5.1.2. Subject of Analysis &#8212; Sofa</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aas.sofa_data_formats">5.2. Formats of Sofa Data</a></span></dt><dt><span class="section"><a href="#ugr.tug.aas.setting_accessing_sofa_data">5.3. Setting and Accessing Sofa Data</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aas.setting_sofa_data">5.3.1. Setting Sofa Data</a></span></dt><dt><span class="section"><a href="#ugr.tug.aas.accessing_sofa_data">5.3.2. Accessing Sofa Data</a></span></dt><dt><span class="section"><a href="#ugr.tug.aas.accessing_sofa_data_using_java_stream">5.3.3. Accessing Sofa Data using a Java Stream</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aas.sofa_fs">5.4. The Sofa Feature Structure</a></span></dt><dt><span class="section"><a href="#ugr.tug.aas.annotations">5.5. Annotations</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.aas.built_in_annotation_types">5.5.1. Built-in Annotation types</a></span></dt><dt><span class="section"><a href="#ugr.tug.aas.annotations_associated_sofa">5.5.2. Annotations have an associated Sofa</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.aas.annotationbase">5.6. AnnotationBase</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tug.mvs">6. Multiple CAS Views</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.mvs.cas_views_and_sofas">6.1. CAS Views and Sofas</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.mvs.naming_views_sofas">6.1.1. Naming CAS Views and Sofas</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.multi_view_and_single_view">6.1.2. Multi/Single View parts in Applications</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.mvs.multi_view_components">6.2. Multi-View Components</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.mvs.deciding_multi_view">6.2.1. Deciding: Multi-View</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.additional_capabilities">6.2.2. Multi-View: additional capabilities</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.component_xml_metadata">6.2.3. Component XML metadata</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.mvs.sofa_capabilities_and_apis_for_apps">6.3. Sofa Capabilities &amp; APIs for Apps</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.sofa_name_mapping">6.4. Sofa Name Mapping</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.mvs.name_mapping_aggregate">6.4.1. Name Mapping in an Aggregate Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.name_mapping_cpe">6.4.2. Name Mapping in a CPE
+      Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.specifying_cas_view_for_single_view">6.4.3. CAS View for Single-View Parts</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.name_mapping_application">6.4.4. Name Mapping in a UIMA Application</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.name_mapping_remote_services">6.4.5. Name Mapping for Remote Services</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.mvs.jcas_extensions_for_multi_views">6.5. JCas extensions for Multiple Views</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.sample_application">6.6. Sample Multi-View Application</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.mvs.sample_application.descriptor">6.6.1. Annotator Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.sample_application.setup">6.6.2. Application Setup</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.sample_application.annotator_processing">6.6.3. Annotator Processing</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.sample_application.accessing_results">6.6.4. Accessing the results of analysis</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.mvs.views_api_summary">6.7. Views API Summary</a></span></dt><dt><span class="section"><a href="#ugr.tug.mvs.sofa_incompatibilities_v1_v2">6.8. Sofa Incompatibilities: V1 and V2</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tug.cm">7. CAS Multiplier</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cm.developing_multiplier_code">7.1. Developing the CAS Multiplier Code</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cm.cm_interface_overview">7.1.1. CAS Multiplier Interface Overview</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.how_to_get_empty_cas_instance">7.1.2. Getting an empty CAS Instance</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.example_code">7.1.3. Example Code</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cm.creating_cm_descriptor">7.2. CAS Multiplier Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.using_cm_in_aae">7.3. Using CAS Multipliers in Aggregates</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cm.adding_cm_to_aggregate">7.3.1. Aggregate: Adding the CAS Multiplier</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.cm_and_fc">7.3.2. CAS Multipliers and Flow Control</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.aggregate_cms">7.3.3. Aggregate CAS Multipliers</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cm.using_cm_in_cpe">7.4. CAS Multipliers in CPE's</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.calling_cm_from_app">7.5. Applications: Calling CAS Multipliers</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cm.retrieving_output_cases">7.5.1. Output CASes</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.using_cm_with_other_aes">7.5.2. CAS Multipliers with other AEs</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tug.cm.using_cm_to_merge_cases">7.6. Merging with CAS Multipliers</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.cm.overview_of_how_to_merge_cases">7.6.1. CAS Merging Overview</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.example_cas_merger">7.6.2. Example CAS Merger</a></span></dt><dt><span class="section"><a href="#ugr.tug.cm.using_the_simple_text_merger_in_an_aggregate_ae">7.6.3. SimpleTextMerger in an Aggregate</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#ugr.tug.xmi_emf">8. XMI &amp; EMF</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tug.xmi_emf.overview">8.1. Overview</a></span></dt><dt><span class="section"><a href="#ugr.tug.xmi_emf.converting_ecore_to_from_uima_type_system">8.2. Converting an Ecore Model to or from a UIMA Type System</a></span></dt><dt><span class="section"><a href="#ugr.tug.xmi_emf.using_xmi_cas_serialization">8.3. Using XMI CAS Serialization</a></span></dt></dl></dd></dl></div><div class="chapter" lang="en" id="ugr.tug.aae"><div class="titlepage"><div><div><h2 class="title"><a name="ugr.tug.aae"></a>Chapter&nbsp;1.&nbsp;Annotator and Analysis Engine Developer's Guide</h2></div></div></div><p>This chapter describes how to develop UIMA <span class="emphasis"><em>type systems</em></span>,
+    <span class="emphasis"><em>Annotators</em></span> and <span class="emphasis"><em>Analysis Engines</em></span> using
+    the UIMA SDK. It is helpful to read the UIMA Conceptual Overview chapter for a review of
+    these concepts.</p><p>An <span class="emphasis"><em>Analysis Engine (AE)</em></span> is a program that analyzes artifacts
+    (e.g. documents) and infers information from them.</p><p>Analysis Engines are constructed from building blocks called
+    <span class="emphasis"><em>Annotators</em></span>. An annotator is a component that contains analysis
+    logic. Annotators analyze an artifact (for example, a text document) and create
+    additional data (metadata) about that artifact. It is a goal of UIMA that annotators need
+    not be concerned with anything other than their analysis logic &#8211; for example the
+    details of their deployment or their interaction with other annotators.</p><p>An Analysis Engine (AE) may contain a single annotator (this is referred to as a
+    <span class="emphasis"><em>Primitive AE</em></span>), or it may be a composition of others and therefore
+    contain multiple annotators (this is referred to as an <span class="emphasis"><em>Aggregate
+    AE</em></span>). Primitive and aggregate AEs implement the same interface and can be used
+    interchangeably by applications.</p><p>Annotators produce their analysis results in the form of typed <span class="emphasis"><em>Feature
+    Structures</em></span>, which are simply data structures that have a type and a set of
+    (attribute, value) pairs. An <span class="emphasis"><em>annotation</em></span> is a particular type of
+    Feature Structure that is attached to a region of the artifact being analyzed (a span of
+    text in a document, for example).</p><p>For example, an annotator may produce an Annotation over the span of text
+    <code class="literal">President Bush</code>, where the type of the Annotation is
+    <code class="literal">Person</code> and the attribute <code class="literal">fullName</code> has the
+    value <code class="literal">George W. Bush</code>, and its position in the artifact is character
+    position 12 through character position 26.</p><p>It is also possible for annotators to record information associated with the entire
+    document rather than a particular span (these are considered Feature Structures but not
+    Annotations).</p><p>All feature structures, including annotations, are represented in the UIMA
+    <span class="emphasis"><em>Common Analysis Structure (CAS)</em></span>. The CAS is the central data
+    structure through which all UIMA components communicate. Included with the UIMA SDK is an
+    easy-to-use, native Java interface to the CAS called the <span class="emphasis"><em>JCas</em></span>.
+    The JCas represents each feature structure as a Java object; the example feature
+    structure from the previous paragraph would be an instance of a Java class Person with
+    getFullName() and setFullName() methods. Though the examples in this guide all use the
+    JCas, it is also possible to directly access the underlying CAS system; for more
+    information see <a href="../references/references.html#ugr.ref.cas" class="olink">Chapter&nbsp;4, CAS Reference
+      </a> in <span class="olinkdocname">UIMA References</span>
+    .</p><p>The remainder of this chapter will refer to the analysis of text documents and the
+    creation of annotations that are attached to spans of text in those documents. Keep in mind
+    that the CAS can represent arbitrary types of feature structures, and feature structures
+    can refer to other feature structures. For example, you can use the CAS to represent a parse
+    tree for a document. Also, the artifact that you are analyzing need not be a text
+    document.</p><p>This guide is organized as follows:</p><div class="itemizedlist"><ul type="disc"><li><p><span class="bold-italic"><a href="#ugr.tug.aae.getting_started" title="1.1.&nbsp;Getting Started">Section&nbsp;1.1, &#8220;Getting Started&#8221;</a></span> is a
+        tutorial with step-by-step instructions for how to develop and test a simple UIMA annotator.</p></li><li><p><span class="bold-italic"><a href="#ugr.tug.aae.configuration_logging" title="1.2.&nbsp;Configuration and Logging">Section&nbsp;1.2, &#8220;Configuration and Logging&#8221;</a>
+        </span> discusses how to make your UIMA annotator configurable, and how it can write messages to the UIMA
+        log file.</p></li><li><p> <span class="bold-italic"><a href="#ugr.tug.aae.building_aggregates" title="1.3.&nbsp;Building Aggregate Analysis Engines">Section&nbsp;1.3, &#8220;Building Aggregate Analysis Engines&#8221;</a></span>
+        describes how annotators can be combined into aggregate analysis engines. It also describes how one
+        annotator can make use of the analysis results produced by an annotator that has run previously.</p></li><li><p><span class="bold-italic"><a href="#ugr.tug.aae.other_examples" title="1.4.&nbsp;Other examples">Section&nbsp;1.4, &#8220;Other examples&#8221;</a></span>
+        describes several other examples you may find interesting, including:</p><div class="itemizedlist"><ul type="circle" compact><li><p>SimpleTokenAndSentenceAnnotator
+            &#8211; a simple tokenizer and sentence annotator.</p></li><li><p>PersonTitleDBWriterCasConsumer &#8211; a sample CAS Consumer which populates a relational
+            database with some annotations. It uses JDBC and, in this example, connects to the open-source Apache
+            Derby database. </p></li></ul></div></li><li><p><span class="bold-italic"><a href="#ugr.tug.aae.additional_topics" title="1.5.&nbsp;Additional Topics">Section&nbsp;1.5, &#8220;Additional Topics&#8221;</a></span>
+        describes additional features of the UIMA SDK that may help you in building your own annotators and analysis
+        engines.</p></li><li><p><span class="bold-italic"><a href="#ugr.tug.aae.common_pitfalls" title="1.6.&nbsp;Common Pitfalls">Section&nbsp;1.6, &#8220;Common Pitfalls&#8221;</a> </span>
+        contains some useful guidelines to help you ensure that your annotators will work correctly in any UIMA
+        application.</p></li></ul></div><p>This guide does not discuss how to build UIMA Applications, which are programs that
+    use Analysis Engines, along with other components, e.g. a search engine, document store,
+    and user interface, to deliver a complete package of functionality to an end-user. For
+    information on application development, see <a href="tutorials_and_users_guides.html#ugr.tug.application" class="olink">Chapter&nbsp;3: &#8220;Application Developer's Guide&#8221;</a>
+    .</p><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="ugr.tug.aae.getting_started"></a>1.1.&nbsp;Getting Started</h2></div></div></div><p>This section is a step-by-step tutorial that will get you started developing UIMA
+      annotators. All of the files referred to by the examples in this chapter are in the
+      <code class="literal">examples</code> directory of the UIMA SDK. This directory is designed to
+      be imported into your Eclipse workspace; see <a href="../overview_and_setup/overview_and_setup.html#ugr.ovv.eclipse_setup.example_code" class="olink">Section&nbsp;3.2, &#8220;Setting up Eclipse to view Example Code&#8221;</a> in <span class="olinkdocname">Overview &amp; Setup</span> for instructions on how to do
+      this. You may also wish to refer to the UIMA SDK JavaDocs located in the <a href="file:api/index.html" target="_top">docs/api</a> directory.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>In Eclipse 3.1, if you highlight a UIMA class or method defined in the UIMA SDK
+    JavaDocs, you can conveniently have Eclipse open the corresponding JavaDoc for that
+    class or method in a browser, by pressing Shift + F2.</p></div><p>The example annotator that we are going to walk through will detect room numbers for
+      rooms where the room numbering scheme follows some simple conventions. In our example,
+      there are two kinds of patterns we want to find; here are some examples, together with
+      their corresponding regular expression patterns:
+      </p><div class="variablelist"><dl><dt><span class="term">Yorktown patterns:</span></dt><dd><p>20-001, 31-206, 04-123 (Regular Expression Pattern:
+            ##-[0-2]##)</p></dd><dt><span class="term">Hawthorne patterns:</span></dt><dd><p>GN-K35, 1S-L07, 4N-B21 (Regular Expression Pattern:
+            [G1-4][NS]-[A-Z]##)</p></dd></dl></div><p> </p><p>There are several steps to develop and test a simple UIMA annotator.</p><div class="orderedlist"><ol type="1" compact><li><p>Define the CAS types that the
+      annotator will use.</p></li><li><p>Generate the Java classes for these types.</p></li><li><p>Write the actual annotator Java code.</p></li><li><p>Create the Analysis Engine descriptor.</p></li><li><p>Test the annotator. </p></li></ol></div><p>These steps are discussed in the next sections.</p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.defining_types"></a>1.1.1.&nbsp;Defining Types</h3></div></div></div><p>The first step in developing an annotator is to define the CAS Feature Structure
+        types that it creates. This is done in an XML file called a <span class="emphasis"><em>Type System
+        Descriptor</em></span>. UIMA defines basic primitive types such as
+        Boolean, Byte, Short, Integer, Long, Float, and Double, as well as Arrays of these primitive
+        types.  UIMA also defines the built-in types <code class="literal">TOP</code>, which is the root 
+        of the type system, analogous to Object in Java; <code class="literal">FSArray</code>, which is 
+        an array of Feature Structures (i.e. an array of instances of TOP); and
+        <code class="literal">Annotation</code>, which we will discuss in more detail in this section.</p><p>UIMA includes an Eclipse plug-in that will help you edit Type System
+        Descriptors, so if you are using Eclipse you will not need to worry about the details of
+        the XML syntax. See <a href="../overview_and_setup/overview_and_setup.html#ugr.ovv.eclipse_setup" class="olink">Chapter&nbsp;3, Setting up the Eclipse IDE to work with UIMA
+      </a> in <span class="olinkdocname">Overview &amp; Setup</span> for instructions on setting up Eclipse and
+        installing the plugin.</p><p>The Type System Descriptor for our annotator is located in the file
+        <code class="literal">descriptors/tutorial/ex1/TutorialTypeSystem.xml</code>. (This
+        and all other examples are located in the <code class="literal">examples</code> directory of
+        the installation of the UIMA SDK, which can be imported into an Eclipse project for
+        your convenience, as described in <a href="../overview_and_setup/overview_and_setup.html#ugr.ovv.eclipse_setup.example_code" class="olink">Section&nbsp;3.2, &#8220;Setting up Eclipse to view Example Code&#8221;</a> in <span class="olinkdocname">Overview &amp; Setup</span>.)</p><p>In Eclipse, expand the <code class="literal">uimaj-examples</code> project in the
+        Package Explorer view, and browse to the file
+        <code class="literal">descriptors/tutorial/ex1/TutorialTypeSystem.xml</code>.
+        Right-click on the file in the navigator and select Open With <span class="symbol">&#8594;</span> Component
+        Descriptor Editor. Once the editor opens, click on the &#8220;<span class="quote">Type System</span>&#8221;
+        tab at the bottom of the editor window. You should see a view such as the
+        following:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="768"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image002.jpg" width="768" alt="Screenshot of editor for Type System Definitions"></td></tr></table></div></div><p>Our annotator will need only one type &#8211;
+        <code class="literal">org.apache.uima.tutorial.RoomNumber</code>. (We use the same
+        namespace conventions as are used for Java classes.) Just as in Java, types have
+        supertypes. The supertype is listed in the second column of the left table. In this
+        case our RoomNumber annotation extends from the built-in type
+        <code class="literal">uima.tcas.Annotation</code>.</p><p>Descriptions can be included with types and features. In this example, there is a
+        description associated with the <code class="literal">building</code> feature. To see it,
+        hover the mouse over the feature.</p><p>The bottom tab labeled &#8220;<span class="quote">Source</span>&#8221; will show you the XML source file
+        associated with this descriptor.</p><p>The built-in Annotation type declares three fields (called
+        <span class="emphasis"><em>Features</em></span> in CAS terminology).  The features <code class="literal">begin</code>
+        and <code class="literal">end</code> store the character offsets of the span of text to which the 
+        annotation refers.  The feature <code class="literal">sofa</code> (Subject of Analysis) indicates
+        which document the begin and end offsets point into.  The <code class="literal">sofa</code> feature
+        can be ignored for now since we assume in this tutorial that the CAS contains only one
+        subject of analysis (document).</p><p>Our RoomNumber type will inherit these three features from
+        <code class="literal">uima.tcas.Annotation</code>, its supertype; they are not visible in
+        this view because inherited features are not shown. One additional feature,
+        <code class="literal">building</code>, is declared. It takes a String as its value. Instead
+        of String, we could have declared the range-type of our feature to be any other CAS type
+        (defined or built-in).</p><p>If you are not using Eclipse and need to edit the type system, you can do so directly in any XML
+        or text editor. The following is the actual XML representation of the Type
+        System displayed above in the editor:</p><pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
+  &lt;typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier"&gt;
+    &lt;name&gt;TutorialTypeSystem&lt;/name&gt;
+    &lt;description&gt;Type System Definition for the tutorial examples - 
+        as of Exercise 1&lt;/description&gt;
+    &lt;vendor&gt;Apache Software Foundation&lt;/vendor&gt;
+    &lt;version&gt;1.0&lt;/version&gt;
+    &lt;types&gt;
+      &lt;typeDescription&gt;
+        &lt;name&gt;org.apache.uima.tutorial.RoomNumber&lt;/name&gt;
+        &lt;description&gt;&lt;/description&gt;
+        &lt;supertypeName&gt;uima.tcas.Annotation&lt;/supertypeName&gt;
+        &lt;features&gt;
+          &lt;featureDescription&gt;
+            &lt;name&gt;building&lt;/name&gt;
+            &lt;description&gt;Building containing this room&lt;/description&gt;
+            &lt;rangeTypeName&gt;uima.cas.String&lt;/rangeTypeName&gt;
+          &lt;/featureDescription&gt;
+        &lt;/features&gt;
+      &lt;/typeDescription&gt;
+    &lt;/types&gt;
+  &lt;/typeSystemDescription&gt;</pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.generating_jcas_sources"></a>1.1.2.&nbsp;Generating Java Source Files for CAS Types</h3></div></div></div><p>When you save a descriptor that you have modified, the Component Descriptor
+        Editor will automatically generate Java classes corresponding to the types that are
+        defined in that descriptor (unless this has been disabled), using a utility called
+        JCasGen. These Java classes will have the same name (including package) as the CAS
+        types, and will have get and set methods for each of the features that you have
+        defined.</p><p>This feature is enabled/disabled using the UIMA menu pulldown (or the Eclipse
+        Preferences <span class="symbol">&#8594;</span> UIMA). If automatic running of JCasGen is not happening, please
+        make sure the option is checked:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="575"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image004.jpg" width="575" alt="Screenshot of enabling automatic running of JCasGen"></td></tr></table></div></div><p>The Java class for the example org.apache.uima.tutorial.RoomNumber type can
+        be found in <code class="literal">src/org/apache/uima/tutorial/RoomNumber.java</code>.
+        You will see how to use these generated classes in the next section.</p><p>If you are not using the Component Descriptor Editor, you will need to generate
+        these Java classes by using the <span class="emphasis"><em>JCasGen</em></span> tool. JCasGen reads a
+        Type System Descriptor XML file and generates the corresponding Java classes that
+        you can then use in your annotator code. To launch JCasGen, run the jcasgen shell
+        script located in the <code class="literal">/bin</code> directory of the UIMA SDK
+        installation. This should launch a GUI that looks something like this:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="532"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image006.jpg" width="532" alt="Screenshot of JCasGen"></td></tr></table></div></div><p>Use the &#8220;<span class="quote">Browse</span>&#8221; buttons to select your input file
+        (TutorialTypeSystem.xml) and output directory (the root of the source tree into
+        which you want the generated files placed). Then click the &#8220;<span class="quote">Go</span>&#8221;
+        button. If the Type System Descriptor has no errors, new Java source files will be
+        generated under the specified output directory.</p><p>There are some additional options to choose from when running JCasGen; please
+        refer to the <a href="../tools/tools.html#ugr.tools.jcasgen" class="olink">Chapter&nbsp;6, JCasGen User's Guide
+      </a> in <span class="olinkdocname">UIMA Tools Guide and Reference</span> for details.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.developing_annotator_code"></a>1.1.3.&nbsp;Developing Your Annotator Code</h3></div></div></div><p>Annotator implementations all implement a standard interface (AnalysisComponent), having several
+        methods, the most important of which are:
+        
+        </p><div class="itemizedlist"><ul type="disc" compact><li><p><code class="literal">initialize</code>, </p></li><li><p><code class="literal">process</code>, and </p></li><li><p><code class="literal">destroy</code>. </p></li></ul></div><p><code class="literal">initialize</code> is called by the framework once when it first creates an instance of the
+        annotator class. <code class="literal">process</code> is called once per item being processed.
+        <code class="literal">destroy</code> may be called by the application when it is done using your annotator. There is a 
+        default implementation of this interface for annotators using the JCas, called JCasAnnotator_ImplBase, which 
+        has implementations of all required methods except for the process method.</p><p>Our annotator class extends the JCasAnnotator_ImplBase; most annotators that use the JCas will extend
+        from this class, so they only have to implement the process method. This class is not restricted to handling
+        just text; see <a href="tutorials_and_users_guides.html#ugr.tug.aas" class="olink">Chapter&nbsp;5, Annotations, Artifacts, and Sofas
+      </a>.</p><p>Annotators are not required to extend from the JCasAnnotator_ImplBase class; they may instead
+        directly implement the AnalysisComponent interface, and provide all method implementations themselves.
+        <sup>[<a name="d0e414" href="#ftn.d0e414">1</a>]</sup> This allows you to have
+        your annotator inherit from some other superclass if necessary. If you would like to do this, see the JavaDocs
+        for JCasAnnotator for descriptions of the methods you must implement.</p><p>Annotator classes need to be public, cannot be declared abstract, and must have public, 0-argument 
+        constructors, so that they can be instantiated by the framework. <sup>[<a name="d0e432" href="#ftn.d0e432">2</a>]</sup></p><p>The class definition for our RoomNumberAnnotator implements the process method, and is shown here. You
+        can find the source for this in the
+        <code class="literal">uimaj-examples/src/org/apache/uima/tutorial/ex1/RoomNumberAnnotator.java</code>.
+        </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>In Eclipse, in the &#8220;<span class="quote">Package Explorer</span>&#8221; view, this will appear by default in the project
+          <code class="literal">uimaj-examples</code>, in the folder <code class="literal">src</code>, in the package
+          <code class="literal">org.apache.uima.tutorial.ex1</code>.</p></div><p>In Eclipse, open
+        RoomNumberAnnotator.java in the uimaj-examples project, under the src directory.</p><pre class="programlisting">package org.apache.uima.tutorial.ex1;
+
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
+import org.apache.uima.jcas.JCas;
+import org.apache.uima.tutorial.RoomNumber;
+
+/**
+ * Example annotator that detects room numbers using 
+ * Java 1.4 regular expressions.
+ */
+public class RoomNumberAnnotator extends JCasAnnotator_ImplBase {
+  private Pattern mYorktownPattern = 
+        Pattern.compile("\\b[0-4]\\d-[0-2]\\d\\d\\b");
+
+  private Pattern mHawthornePattern = 
+        Pattern.compile("\\b[G1-4][NS]-[A-Z]\\d\\d\\b");
+
+  public void process(JCas aJCas) {
+    // Discussed Later
+  }
+}</pre><p>The two Java class fields, mYorktownPattern and mHawthornePattern, hold regular expressions that
+        will be used in the process method. Note that these two fields are part of the Java implementation of the
+        annotator code, and not a part of the CAS type system. We are using the regular expression facility that is
+        built into Java 1.4. It is not critical that you know the details of how this works, but if you are curious the
+        details can be found in the Java API docs for the java.util.regex package.</p><p>The only method that we are required to implement is <code class="literal">process</code>. This method is typically 
+        called once for each document that is being analyzed. This method takes one argument, which is a JCas instance; 
+        this holds the document to be analyzed and all of the analysis results. <sup>[<a name="d0e466" href="#ftn.d0e466">3</a>]</sup></p><pre class="programlisting">public void process(JCas aJCas) {
+  // get document text
+  String docText = aJCas.getDocumentText();
+  // search for Yorktown room numbers
+  Matcher matcher = mYorktownPattern.matcher(docText);
+  int pos = 0;
+  while (matcher.find(pos)) {
+    // found one - create annotation
+    RoomNumber annotation = new RoomNumber(aJCas);
+    annotation.setBegin(matcher.start());
+    annotation.setEnd(matcher.end());
+    annotation.setBuilding("Yorktown");
+    annotation.addToIndexes();
+    pos = matcher.end();
+  }
+  // search for Hawthorne room numbers
+  matcher = mHawthornePattern.matcher(docText);
+  pos = 0;
+  while (matcher.find(pos)) {
+    // found one - create annotation
+    RoomNumber annotation = new RoomNumber(aJCas);
+    annotation.setBegin(matcher.start());
+    annotation.setEnd(matcher.end());
+    annotation.setBuilding("Hawthorne");
+    annotation.addToIndexes();
+    pos = matcher.end();
+  }
+}</pre><p>The Matcher class is part of the java.util.regex package and is used to find the room numbers in the
+        document text. When we find one, recording the annotation is as simple as creating a new Java object and
+        calling some set methods:</p><pre class="programlisting">RoomNumber annotation = new RoomNumber(aJCas);
+annotation.setBegin(matcher.start());
+annotation.setEnd(matcher.end());
+annotation.setBuilding("Yorktown");</pre><p>The <code class="literal">RoomNumber</code> class was generated from the type system description by the
+        Component Descriptor Editor or the JCasGen tool, as discussed in the previous section.</p><p>Finally, we call <code class="literal">annotation.addToIndexes()</code> to add the new annotation to the
+        indexes maintained in the CAS. By default, the CAS implementation used for analysis of text documents keeps
+        an index of all annotations in their order from beginning to end of the document. Subsequent annotators or
+        applications use the indexes to iterate over the annotations. </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p> If you don't add the instance to the indexes, it cannot be retrieved by down-stream annotators,
+        using the indexes. </p></div><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>You can also call <code class="literal">addToIndexes()</code> on Feature Structures that are not subtypes of
+        <code class="literal">uima.tcas.Annotation</code>, but these will not be sorted in any particular way. If you want
+        to specify a sort order, you can define your own custom indexes in the CAS: see <a href="../references/references.html#ugr.ref.cas" class="olink">Chapter&nbsp;4, CAS Reference
+      </a> in <span class="olinkdocname">UIMA References</span> and <a href="../references/references.html#ugr.ref.xml.component_descriptor.aes.index" class="olink">Section&nbsp;2.4.1.7, &#8220;Index Definition&#8221;</a> in <span class="olinkdocname">UIMA References</span> for details.</p></div><p>We're almost ready to test the RoomNumberAnnotator. There is just one more step
+        remaining.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.creating_xml_descriptor"></a>1.1.4.&nbsp;Creating the XML Descriptor</h3></div></div></div><p>The UIMA architecture requires that descriptive information about an
+        annotator be represented in an XML file and provided along with the annotator class
+        file(s) to the UIMA framework at run time. This XML file is called an
+        <span class="emphasis"><em>Analysis Engine Descriptor</em></span>. The descriptor includes:
+        
+        </p><div class="itemizedlist"><ul type="disc"><li><p>Name, description, version, and vendor</p></li><li><p>The annotator's inputs and outputs, defined in terms of
+            the types in a Type System Descriptor</p></li><li><p>Declaration of the configuration parameters that the
+            annotator accepts </p></li></ul></div><p> </p><p>The <span class="emphasis"><em>Component Descriptor Editor</em></span> plugin, which we
+        previously used to edit the Type System descriptor, can also be used to edit Analysis
+        Engine Descriptors.</p><p>A descriptor for our RoomNumberAnnotator is provided with the UIMA
+        distribution under the name
+        <code class="literal">descriptors/tutorial/ex1/RoomNumberAnnotator.xml.</code> To
+        edit it in Eclipse, right-click on that file in the navigator and select Open With
+        <span class="symbol">&#8594;</span> Component Descriptor Editor.</p><div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Tip</h3><p>In Eclipse, you can double
+      click on the tab at the top of the Component Descriptor Editor's window
+      identifying the currently selected editor, and the window will
+      &#8220;<span class="quote">Maximize</span>&#8221;. Double click it again to restore the original size.</p></div><p>If you are not using Eclipse, you will need to edit Analysis Engine descriptors
+        manually. See <a href="#ugr.tug.aae.xml_intro_ae_descriptor" title="1.8.&nbsp;Introduction to Analysis Engine Descriptor XML Syntax">Section&nbsp;1.8, &#8220;Analysis Engine XML Descriptor&#8221;</a> for an
+        introduction to the Analysis Engine descriptor XML syntax. The remainder of this
+        section assumes you are using the Component Descriptor Editor plug-in to edit the
+        Analysis Engine descriptor.</p><p>The Component Descriptor Editor consists of several tabbed pages; we will only
+        need to use a few of them here. For more information on using this editor, see <a href="../tools/tools.html#ugr.tools.cde" class="olink">Chapter&nbsp;1, Component Descriptor Editor User's Guide
+      </a> in <span class="olinkdocname">UIMA Tools Guide and Reference</span>.</p><p>The initial page of the Component Descriptor Editor is the Overview page, which
+        appears as follows:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image008.jpg" width="574" alt="Screenshot of Component Descriptor Editor overview page"></td></tr></table></div></div><p>This presents an overview of the RoomNumberAnnotator Analysis Engine (AE). The
+        left side of the page shows that this descriptor is for a
+        <span class="emphasis"><em>Primitive</em></span> AE (meaning it consists of a single annotator),
+        and that the annotator code is developed in Java. Also, it specifies the Java class
+        that implements our logic (the code which was discussed in the previous section).
+        Finally, on the right side of the page are listed some descriptive attributes of our
+        annotator.</p><p>The other two pages that need to be filled out are the Type System page and the
+        Capabilities page. You can switch to these pages using the tabs at the bottom of the
+        Component Descriptor Editor. In the tutorial, these are already filled out for
+        you.</p><p>The RoomNumberAnnotator will be using the TutorialTypeSystem we looked at in
+        <a href="#ugr.tug.aae.defining_types" title="1.1.1.&nbsp;Defining Types">Section&nbsp;1.1.1, &#8220;Defining Types&#8221;</a>. To specify this, we add
+        this type system to the Analysis Engine's list of Imported Type Systems, using
+        the Type System page's right side panel, as shown here:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="576"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image010.jpg" width="576" alt="Screenshot of CDE Type System page"></td></tr></table></div></div><p>On the Capabilities page, we define our annotator's inputs and outputs, in
+        terms of the types in the type system. The Capabilities page is shown below:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="534"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image012.jpg" width="534" alt="Screenshot of CDE Capabilities page"></td></tr></table></div></div><p>Although capabilities come in sets, having multiple sets is deprecated; here
+        we're just using one set. The RoomNumberAnnotator is very simple. It requires
+        no input types, as it operates directly on the document text -- which is supplied as a
+        part of the CAS initialization (and which is always assumed to be present). It
+        produces only one output type (RoomNumber), and it sets the value of the
+        <code class="literal">building</code> feature on that type. This is all represented on the
+        Capabilities page.</p><p>The Capabilities page has two other parts for specifying languages and Sofas.
+        The languages section allows you to specify which languages your Analysis Engine
+        supports. The RoomNumberAnnotator happens to be language-independent, so we can
+        leave this blank. The Sofas section allows you to specify the names of additional
+        subjects of analysis. This capability and the Sofa Mappings at the bottom are
+        advanced topics, described in <a href="tutorials_and_users_guides.html#ugr.tug.aas" class="olink">Chapter&nbsp;5, Annotations, Artifacts, and Sofas
+      </a>. </p><p>This is all of the information we need to provide for a simple annotator. If you
+        want to peek at the XML that this tool saves you from having to write, click on the
+        &#8220;<span class="quote">Source</span>&#8221; tab at the bottom to view the generated XML.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.testing_your_annotator"></a>1.1.5.&nbsp;Testing Your Annotator</h3></div></div></div><p>Having developed an annotator, we need a way to try it out on some example
+        documents. The UIMA SDK includes a tool called the Document Analyzer that will allow
+        us to do this. To run the Document Analyzer, execute the documentAnalyzer shell
+        script that is in the <code class="literal">bin</code> directory of your UIMA SDK
+        installation, or, if you are using the example Eclipse project, execute the
+        &#8220;<span class="quote">UIMA Document Analyzer</span>&#8221; run configuration supplied with that
+        project. (To do this, select Run <span class="symbol">&#8594;</span> Run ... from the menu bar, and under Java
+        Applications in the left box, click on UIMA Document Analyzer.)</p><p>You should see a screen that looks like this:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image014.jpg" width="574" alt="Screenshot of UIMA Document Analyzer GUI"></td></tr></table></div></div><p>There are six options on this screen:</p><div class="orderedlist"><ol type="1"><li><p>Directory containing documents to analyze</p></li><li><p>Directory where analysis results will be written</p></li><li><p>The XML descriptor for the Analysis Engine (AE) you want to
+          run</p></li><li><p>(Optional) an XML tag, within the input documents, that contains
+          the text to be analyzed. For example, the value TEXT would cause the AE to only
+          analyze the portion of the document enclosed within
+          &lt;TEXT&gt;...&lt;/TEXT&gt; tags.</p></li><li><p>Language of the document </p></li><li><p>Character encoding </p></li></ol></div><p>Use the Browse button next to the third item to set the &#8220;<span class="quote">Location of AE XML
+        Descriptor</span>&#8221; field to the descriptor we've just been discussing
+        &#8212;
+        <code class="literal">&lt;where-you-installed-uima-e.g.UIMA_HOME&gt;/examples/descriptors/tutorial/ex1/RoomNumberAnnotator.xml</code>.
+        Set the other fields to the values shown in the screen shot above (which should be the
+        default values if this is the first time you've run the Document Analyzer). Then
+        click the &#8220;<span class="quote">Run</span>&#8221; button to start processing.</p><p>When processing completes, an &#8220;<span class="quote">Analysis Results</span>&#8221; window should
+        appear.</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="332"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image016.jpg" width="332" alt="Screenshot of UIMA Document Analyzer Results GUI"></td></tr></table></div></div><p>Make sure &#8220;<span class="quote">Java Viewer</span>&#8221; is selected as the Results Display
+        Format, and <span class="bold"><strong>double-click</strong></span> on the document
+        UIMASummerSchool2003.txt to view the annotations that were discovered. The view
+        should look something like this:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="510"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image018.jpg" width="510" alt="Screenshot of UIMA CAS Annotation Viewer GUI"></td></tr></table></div></div><p>You can click the mouse on one of the highlighted annotations to see a list of all
+        its features in the frame on the right.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The legend will only show
+      those types which have at least one instance in the CAS, and are declared as outputs in the
+      capabilities section of the descriptor (see <a href="#ugr.tug.aae.creating_xml_descriptor" title="1.1.4.&nbsp;Creating the XML Descriptor">Section&nbsp;1.1.4, &#8220;Creating the XML Descriptor&#8221;</a>).</p></div><p>You can use the Document Analyzer to test any UIMA annotator
+        &#8212; just make sure that the annotator's classes are in the class
+        path.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="ugr.tug.aae.configuration_logging"></a>1.2.&nbsp;Configuration and Logging</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.configuration_parameters"></a>1.2.1.&nbsp;Configuration Parameters</h3></div></div></div><p>The example RoomNumberAnnotator from the previous section used hardcoded
+        regular expressions and location names, which is not very flexible: supporting a new
+        room numbering scheme would mean changing the annotator's Java code.
+        A better solution is to supply the patterns and their locations through
+        configuration parameters.</p><p>UIMA allows annotators to declare configuration parameters in their
+        descriptors. The descriptor also specifies default values for the parameters,
+        though these can be overridden at runtime.</p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.declaring_parameters_in_the_descriptor"></a>1.2.1.1.&nbsp;Declaring Parameters in the Descriptor</h4></div></div></div><p>The example descriptor
+          <code class="literal">descriptors/tutorial/ex2/RoomNumberAnnotator.xml</code> is
+          the same as the descriptor from the previous section except that information has
+          been filled in for the Parameters and Parameter Settings pages of the Component
+          Descriptor Editor.</p><p>First, in Eclipse, open example two's RoomNumberAnnotator in the
+          Component Descriptor Editor, and then go to the Parameters page (click on the
+          parameters tab at the bottom of the window), which is shown below:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="538"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image020.jpg" width="538" alt="Screenshot of UIMA Component Descriptor Editor (CDE) Parameters page"></td></tr></table></div></div><p>Two parameters &#8211; Patterns and Locations &#8211; have been declared. In this
+          screen shot, the mouse (not shown) is hovering over Patterns to show its
+          description in the small popup window. Every parameter has the following
+          information associated with it:</p><div class="itemizedlist"><ul type="disc"><li><p>name &#8211; the name by which the annotator code
+          refers to the parameter</p></li><li><p>description &#8211; a natural language description of the
+            intent of the parameter</p></li><li><p>type &#8211; the data type of the parameter's value
+            &#8211; must be one of String, Integer, Float, or Boolean.</p></li><li><p>multiValued &#8211; true if the parameter can take
+            multiple values (an array), false if the parameter takes only a single value.
+            Shown above as <code class="literal">Multi</code>.</p></li><li><p>mandatory &#8211; true if a value must be provided for the
+            parameter. Shown above as <code class="literal">Req</code> (for required). </p></li></ul></div><p>Both of our parameters are mandatory and accept an array of Strings as their
+          value.</p><p>Next, default values are assigned to the parameters on the Parameter Settings
+          page:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="538"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image022.jpg" width="538" alt="Screenshot of UIMA Component Descriptor Editor (CDE) Parameter Settings page"></td></tr></table></div></div><p>Here the &#8220;<span class="quote">Patterns</span>&#8221; parameter is selected, and the right pane
+          shows the list of values for this parameter, in this case the regular expressions
+          that match particular room numbering conventions. Notice the third pattern is
+          new, for matching the style of room numbers in the third building, which has room
+          numbers such as <code class="literal">J2-A11</code>.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.accessing_parameter_values_from_annotator"></a>1.2.1.2.&nbsp;Accessing Parameter Values from the Annotator Code</h4></div></div></div><p>The class
+          <code class="literal">org.apache.uima.tutorial.ex2.RoomNumberAnnotator</code> has
+          overridden the initialize method. The initialize method is called by the UIMA
+          framework when the annotator is instantiated, so it is a good place to read
+          configuration parameter values. The default initialize method does nothing with
+          configuration parameters, so you have to override it. To see the code in Eclipse,
+          switch to the src folder, and open
+          <code class="literal">org.apache.uima.tutorial.ex2</code>. Here is the method
+          body:</p><pre class="programlisting">/**
+* @see AnalysisComponent#initialize(UimaContext)
+*/
+public void initialize(UimaContext aContext) 
+        throws ResourceInitializationException {
+  super.initialize(aContext);
+  
+  // Get config. parameter values  
+  String[] patternStrings = 
+        (String[]) aContext.getConfigParameterValue("Patterns");
+  mLocations = 
+        (String[]) aContext.getConfigParameterValue("Locations");
+
+  // compile regular expressions
+  mPatterns = new Pattern[patternStrings.length];
+  for (int i = 0; i &lt; patternStrings.length; i++) {
+    mPatterns[i] = Pattern.compile(patternStrings[i]);
+  }
+}</pre><p>Configuration parameter values are accessed through the UimaContext. As you
+          will see in subsequent sections of this chapter, the UimaContext is the
+          annotator's access point for all of the facilities provided by the UIMA
+          framework &#8211; for example logging and external resource access.</p><p>The UimaContext's <code class="literal">getConfigParameterValue</code>
+          method takes the name of the parameter as an argument; this must match one of the
+          parameters declared in the descriptor. The return value of this method is a Java
+          Object, whose type corresponds to the declared type of the parameter. It is up to the
+          annotator to cast it to the appropriate type, String[] in this case.</p><p>If there is a problem retrieving the parameter values, the framework throws an
+          exception. Generally annotators don't handle these, and just let them
+          propagate up.</p><p>To see the configuration parameters working, run the Document Analyzer
+          application and select the descriptor
+          <code class="literal">examples/descriptors/tutorial/ex2/RoomNumberAnnotator.xml</code>.
+          In the example document <code class="literal">WatsonConferenceRooms.txt</code>, you
+          should see some examples of Hawthorne II room numbers that would not have been
+          detected by the ex1 version of RoomNumberAnnotator.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.supporting_reconfiguration"></a>1.2.1.3.&nbsp;Supporting Reconfiguration</h4></div></div></div><p>If you take a look at the JavaDocs (located in the <a href="api/index.html" target="_top">docs/api</a> directory) for
+          <code class="literal">org.apache.uima.analysis_component.AnalysisComponent</code>
+          (which our annotator implements indirectly through JCasAnnotator_ImplBase),
+          you will see that there is a reconfigure() method, which is called by the containing
+          application through the UIMA framework, if the configuration parameter values
+          are changed.</p><p>The AnalysisComponent_ImplBase class provides a default implementation
+          that just calls the annotator's destroy method followed by its initialize
+          method. This works fine for our annotator. The only situation in which you might
+          want to override the default reconfigure() is if your annotator has very expensive
+          initialization logic, and you don't want to reinitialize everything if just
+          one configuration parameter has changed. In that case, you can provide a more
+          intelligent implementation of reconfigure() for your annotator.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.configuration_parameter_groups"></a>1.2.1.4.&nbsp;Configuration Parameter Groups</h4></div></div></div><p>For annotators with many sets of configuration parameters, UIMA supports
+          organizing them into groups. It is possible to define a parameter with the same name
+          in multiple groups; one common use for this is for annotators that can process
+          documents in several languages and which want to have different parameter
+          settings for the different languages.</p><p>The syntax for defining parameter groups in your descriptor is fairly
+          straightforward &#8211; see <a href="../references/references.html#ugr.ref.xml.component_descriptor" class="olink">Chapter&nbsp;2, Component Descriptor Reference
+      </a> in <span class="olinkdocname">UIMA References</span> for details. Values of
+          parameters defined within groups are accessed through the two-argument version
+          of <code class="literal">UimaContext.getConfigParameterValue</code>, which takes
+          both the group name and the parameter name as its arguments.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.logging"></a>1.2.2.&nbsp;Logging</h3></div></div></div><p>The UIMA SDK provides a logging facility, which is very similar to the
+        java.util.logging.Logger class that was introduced in Java 1.4.</p><p>In the Java architecture, each logger instance is associated with a name. By
+        convention, this name is often the fully qualified class name of the component
+        issuing the logging call. The name can be referenced in a configuration file when
+        specifying which kinds of log messages to actually log, and where they should
+        go.</p><p>The UIMA framework supports this convention using the
+        <code class="literal">UimaContext</code> object. If you access a logger instance using
+        <code class="literal">getContext().getLogger()</code> within an Annotator, the logger
+        name will be the fully qualified name of the Annotator implementation class.</p><p>Here is an example from the process method of
+        <code class="literal">org.apache.uima.tutorial.ex2.RoomNumberAnnotator</code>:
+        
+        
+        </p><pre class="programlisting">getContext().getLogger().log(Level.FINEST,"Found: " + annotation);</pre><p>
+        </p><p>The first argument to the log method is the level of the log output. Here, a value of
+        FINEST indicates that this is a highly-detailed tracing message. While useful for
+        debugging, it is likely that real applications will not output log messages at this
+        level, in order to improve their performance. Other defined levels, from lowest to
+        highest importance, are FINER, FINE, CONFIG, INFO, WARNING, and SEVERE.</p><p>If no logging configuration file is provided (see next section), the Java
+        Virtual Machine defaults are used; these typically log messages at level INFO and
+        higher, and direct output to the console.</p><p>If you specify the standard UIMA SDK <code class="literal">Logger.properties</code>,
+        the output will be directed to a file named uima.log, in the current working directory
+        (often the &#8220;<span class="quote">project</span>&#8221; directory when running from Eclipse, for
+        instance).</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>When using Eclipse, the uima.log file, if written
+      into the Eclipse workspace in the project uimaj-examples, for example, may not appear
+      in the Eclipse package explorer view until you right-click the uimaj-examples project
+      with the mouse, and select &#8220;<span class="quote">Refresh</span>&#8221;. This operation refreshes the
+      Eclipse display to conform to what may have changed on the file system. Also, you can set
+      the Eclipse preferences for the workspace to automatically refresh (Window <span class="symbol">&#8594;</span>
+      Preferences <span class="symbol">&#8594;</span> General <span class="symbol">&#8594;</span> Workspace, then click the &#8220;<span class="quote">refresh
+      automatically</span>&#8221; checkbox).</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.logging.configuring"></a>1.2.2.1.&nbsp;Specifying the Logging Configuration</h4></div></div></div><p>The standard UIMA logger uses the underlying Java 1.4 logging mechanism. You
+          can use the APIs that come with that to configure the logging. In addition, the
+          standard Java 1.4 logging initialization mechanisms will look for a Java System
+          Property named <code class="literal">java.util.logging.config.file</code> and if
+          found, will use the value of this property as the name of a standard
+          &#8220;<span class="quote">properties</span>&#8221; file, for setting the logging level. Please refer to
+          the Java 1.4 documentation for more information on the format and use of this
+          file.</p><p>Two sample logging specification property files can be found in the UIMA_HOME
+          directory where the UIMA SDK is installed:
+          <code class="literal">config/Logger.properties</code>, and
+          <code class="literal">config/FileConsoleLogger.properties</code>. These specify the same
+          logging, except the first logs just to a file, while the second logs both to a file and
+          to the console. You can edit these files, or create additional ones, as described
+          below, to change the logging behavior.</p><p>When running your own Java application, you can specify the location of the
+          logging configuration file on your Java command line by setting the Java system
+          property <code class="literal">java.util.logging.config.file</code> to be the logging
+          configuration filename. This file specification can be either absolute or
+          relative to the working directory. For example:
+          
+          
+          </p><pre class="programlisting">java "-Djava.util.logging.config.file=C:/Program Files/apache-uima/config/Logger.properties"</pre><p>
+          </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>In a shell script, you can use environment variables such as
+          UIMA_HOME if convenient.</p></div><p> </p><p>If you are using Eclipse to launch your application, you can set this property
+          in the VM arguments section of the Arguments tab of the run configuration screen. If
+          you've set an environment variable UIMA_HOME, you could, for example, use the
+          string:
+          <code class="literal">"-Djava.util.logging.config.file=${env_var:UIMA_HOME}/config/Logger.properties"</code>.
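+          </p><p>The same kind of configuration can also be applied from Java code, using the
+          <code class="literal">java.util.logging</code> APIs mentioned at the start of this section. The
+          following is a minimal sketch and is not part of the UIMA SDK: the class name
+          <code class="literal">LoggingConfigSketch</code> and the in-memory properties text are
+          illustrative assumptions, and the two property keys it uses are the ones described in the
+          next section.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

/** Illustrative sketch: applies logging properties through the java.util.logging API. */
final class LoggingConfigSketch {

  /** Replaces the current java.util.logging configuration with the given properties text. */
  static void applyConfiguration(String propertiesText) {
    // Properties streams are read as ISO-8859-1, so encode the text accordingly.
    byte[] bytes = propertiesText.getBytes(StandardCharsets.ISO_8859_1);
    try {
      LogManager.getLogManager().readConfiguration(new ByteArrayInputStream(bytes));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical settings: INFO globally, SEVERE only for the com.xyz.foo logger.
    applyConfiguration(".level = INFO\ncom.xyz.foo.level = SEVERE\n");

    Logger foo = Logger.getLogger("com.xyz.foo");
    Logger bar = Logger.getLogger("com.xyz.bar"); // no specific level, inherits the global one
    System.out.println("foo logs WARNING: " + foo.isLoggable(Level.WARNING));
    System.out.println("foo logs SEVERE:  " + foo.isLoggable(Level.SEVERE));
    System.out.println("bar logs INFO:    " + bar.isLoggable(Level.INFO));
    System.out.println("bar logs FINE:    " + bar.isLoggable(Level.FINE));
  }
}
```

+          <p>In a real application the stream would more likely be opened on a file such as
+          <code class="literal">config/Logger.properties</code> than on a string literal.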
+          </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.logging.setting_logging_levels"></a>1.2.2.2.&nbsp;Setting Logging Levels</h4></div></div></div><p>Within the logging control file, the default global logging level specifies
+          which kinds of events are logged across all loggers. For any given facility this
+          global level can be overridden by a facility-specific level. Multiple handlers are
+          supported. This allows messages to be directed to a log file, as well as to a
+          &#8220;<span class="quote">console</span>&#8221;. Note that the ConsoleHandler also has a separate level
+          setting to limit messages printed to the console. For example, the global level is set with: <code class="literal">.level = INFO</code>
+          </p><p>The properties file can change where the log is written, as well.</p><p>Facility-specific properties allow different logging for each class, as
+          well. For example, to set the com.xyz.foo logger to only log SEVERE messages:
+          <code class="literal">com.xyz.foo.level = SEVERE</code></p><p>If you have a sample annotator class named
+          <code class="literal">org.apache.uima.SampleAnnotator</code>, you can set its log level
+          by specifying: <code class="literal">org.apache.uima.SampleAnnotator.level = ALL</code></p><p>There
+          are other logging controls; for a full discussion, please read the
+          contents of the <code class="literal">Logger.properties</code> file and the Java
+          specification for logging in Java 1.4.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.logging.output_format"></a>1.2.2.3.&nbsp;Format of logging output</h4></div></div></div><p>The logging output is formatted by handlers specified in the properties file
+          for configuring logging, described above. The default formatter that comes with
+          the UIMA SDK formats logging output as follows:</p><p><code class="literal">Timestamp - threadID: sourceInfo: Message level:
+          message</code></p><p> Here's an example:</p><p><code class="literal">7/12/04 2:15:35 PM - 10:
+          org.apache.uima.util.TestClass.main(62): INFO: You are not logged
+          in!</code></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.logging.meaning_of_severity_levels"></a>1.2.2.4.&nbsp;Meaning of the logging severity levels</h4></div></div></div><p>These levels are defined by the Java logging framework, which was
+          incorporated into Java as of the 1.4 release level. The levels are defined in the
+          JavaDocs for java.util.logging.Level, and include both logging and tracing
+          levels:
+          </p><div class="itemizedlist"><ul type="disc" compact><li><p>OFF is a special level that can be used to turn off
+              logging.</p></li><li><p>ALL indicates that all messages should be logged. </p></li><li><p>CONFIG is a message level for configuration messages. These
+              would typically occur once (during configuration) in methods like
+              <code class="literal">initialize()</code>. </p></li><li><p>INFO is a message level for informational messages, for
+              example, connected to server IP: 192.168.120.12 </p></li><li><p>WARNING is a message level indicating a potential
+              problem.</p></li><li><p>SEVERE is a message level indicating a serious
+              failure.</p></li></ul></div><p> Tracing levels, typically used for debugging:
+          </p><div class="itemizedlist"><ul type="disc"><li><p>FINE is a message level providing tracing information,
+              typically at a collection level (messages occurring once per collection).
+              </p></li><li><p>FINER indicates a fairly detailed tracing message,
+              typically at a document level (once per document).</p></li><li><p>FINEST indicates a highly detailed tracing message. </p></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ugr.tug.aae.logging.using_outside_of_an_annotator"></a>1.2.2.5.&nbsp;Using the logger outside of an annotator</h4></div></div></div><p>An application using UIMA may want to log its messages using the same logging
+          framework. This can be done by getting a reference to the UIMA logger, as follows:
+          
+          
+          </p><pre class="programlisting">Logger logger = UIMAFramework.getLogger(TestClass.class);</pre><p>
+          </p><p>The optional class argument allows filtering by class (if the log handler
+          supports this). If not specified, the name of the returned logger instance is
+          &#8220;<span class="quote">org.apache.uima</span>&#8221;.</p></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="ugr.tug.aae.building_aggregates"></a>1.3.&nbsp;Building Aggregate Analysis Engines</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.combining_annotators"></a>1.3.1.&nbsp;Combining Annotators</h3></div></div></div><p>The UIMA SDK makes it very easy to combine any sequence of Analysis Engines to
+        form an <span class="emphasis"><em>Aggregate Analysis Engine</em></span>. This is done through an
+        XML descriptor; no Java code is required!</p><p>If you go to the <code class="literal">examples/descriptors/tutorial/ex3</code>
+        folder (in Eclipse, it's in your uimaj-examples project, under the
+        <code class="literal">descriptors/tutorial/ex3</code> folder), you will find a
+        descriptor for a TutorialDateTime annotator. This annotator detects dates and
+        times (and also sentences and words). To see what this annotator can do, try it out
+        using the Document Analyzer. If you are curious as to how this annotator works, the
+        source code is included, but it is not necessary to understand the code at this
+        time.</p><p>We are going to combine the TutorialDateTime annotator with the
+        RoomNumberAnnotator to create an aggregate Analysis Engine. This is illustrated
+        in the following figure:
+        
+        </p><div class="figure"><a name="ugr.tug.aae.fig.combining_annotators"></a><div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="560"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image024.png" width="560" alt="Combining Annotators to form an Aggregate Analysis Engine"></td></tr></table></div></div><p class="title"><b>Figure&nbsp;1.1.&nbsp;Combining Annotators to form an Aggregate Analysis Engine</b></p></div><p><br class="figure-break"> </p><p>The descriptor that does this is named
+        <code class="literal">RoomNumberAndDateTime.xml</code>, which you can open in the
+        Component Descriptor Editor plug-in. This is in the uimaj-examples project in the
+        folder <code class="literal">descriptors/tutorial/ex3</code>. </p><p>The &#8220;<span class="quote">Aggregate</span>&#8221; page of the Component Descriptor Editor is
+        used to define which components make up the aggregate. A screen shot is shown below.
+        (If you are not using Eclipse, see <a href="#ugr.tug.aae.xml_intro_ae_descriptor" title="1.8.&nbsp;Introduction to Analysis Engine Descriptor XML Syntax">Section&nbsp;1.8, &#8220;Analysis Engine XML Descriptor&#8221;</a> for the actual XML syntax
+        for Aggregate Analysis Engine Descriptors.)</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="575"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image026.jpg" width="575" alt="Aggregate page of the Component Descriptor Editor (CDE)"></td></tr></table></div></div><p>On the left side of the screen is the list of component engines that make up the
+        aggregate &#8211; in this case, the TutorialDateTime annotator and the
+        RoomNumberAnnotator. To add a component, you can click the &#8220;<span class="quote">Add</span>&#8221;
+        button and browse to its descriptor. You can also click the &#8220;<span class="quote">Find AE</span>&#8221;
+        button and search for an Analysis Engine in your Eclipse workspace.
+        </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The &#8220;<span class="quote">AddRemote</span>&#8221; button is used for adding components
+        which run remotely (for example, on another machine using a remote networking
+        connection). This capability is described in <a href="tutorials_and_users_guides.html#ugr.tug.application.how_to_call_a_uima_service" class="olink">Section&nbsp;3.6.3, &#8220;Calling a UIMA Service&#8221;</a>.</p></div><p> </p><p>The order of the components in the left pane does not imply an order of
+        execution. The order of execution, or &#8220;<span class="quote">flow</span>&#8221;, is determined in the
+        &#8220;<span class="quote">Component Engine Flow</span>&#8221; section on the right. UIMA supports
+        different types of algorithms (including user-definable) for determining the
+        flow. Here we pick the simplest: <code class="literal">FixedFlow</code>. We have chosen to
+        have the RoomNumberAnnotator execute first, although in this case it
+        doesn't really matter, since the RoomNumber and DateTime annotators do not
+        have any dependencies on one another.</p><p>If you look at the &#8220;<span class="quote">Type System</span>&#8221; page of the Component
+        Descriptor Editor, you will see that it displays the type system but is not
+        editable. The Type System of an Aggregate Analysis Engine is automatically
+        computed by merging the Type Systems of all of its components.</p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>If the components have different definitions for the same type name,
+        the Component Descriptor Editor will show a warning.  It is possible to continue past
+        this warning, in which case your aggregate's type system will have the correct
+        &#8220;<span class="quote">merged</span>&#8221;
+        type definition that contains all of the features defined on that type by all of your
+        components.  However, it is not recommended to use this feature in conjunction with JCas,
+        since the JCas Java class definitions cannot be so easily merged.  See
+        <a href="../references/references.html#ugr.ref.jcas.merging_types_from_other_specs" class="olink">Section&nbsp;5.5, &#8220;Merging Types&#8221;</a> in <span class="olinkdocname">UIMA References</span> for more information.
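+        </p><p>To make the idea of a merged definition concrete, here is a small sketch that
+        models each component's type system as a map from type name to feature names and takes the
+        union of the features for each type. It is an illustration only and does not show how the
+        UIMA framework implements merging; the class <code class="literal">TypeMergeSketch</code>,
+        the type name <code class="literal">example.RoomNumber</code>, and the feature names are
+        hypothetical.</p>

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only: a type system is modeled as type name -> feature names. */
final class TypeMergeSketch {

  /** Returns a new map in which each type carries the union of its features. */
  static Map<String, Set<String>> merge(List<Map<String, Set<String>>> typeSystems) {
    Map<String, Set<String>> merged = new LinkedHashMap<>();
    for (Map<String, Set<String>> typeSystem : typeSystems) {
      for (Map.Entry<String, Set<String>> type : typeSystem.entrySet()) {
        merged.computeIfAbsent(type.getKey(), name -> new LinkedHashSet<>())
            .addAll(type.getValue());
      }
    }
    return merged;
  }

  /** Convenience helper that builds a type system containing a single type. */
  static Map<String, Set<String>> typeSystem(String typeName, String... features) {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    result.put(typeName, new LinkedHashSet<>(Arrays.asList(features)));
    return result;
  }

  public static void main(String[] args) {
    // Two hypothetical components define the same type name with different features.
    Map<String, Set<String>> first = typeSystem("example.RoomNumber", "building");
    Map<String, Set<String>> second = typeSystem("example.RoomNumber", "floor");
    System.out.println(merge(Arrays.asList(first, second)));
  }
}
```

+        <p>The sketch deliberately ignores supertypes and feature range types, which it makes no
+        attempt to model.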
+      </p></div><p>The Capabilities page is where you explicitly declare the aggregate Analysis
+        Engine's inputs and outputs. Sofas and Languages are described later.
+        
+          
+          </p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="565"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image028.jpg" width="565" alt="Screen shot of the Capabilities page of the Component Descriptor Editor"></td></tr></table></div></div><p>
+          </p><p>Note that it is not automatically assumed that all outputs of each component
+          Analysis Engine (AE) are passed through as outputs of the aggregate AE. In this
+          case, for example, we have decided to suppress the Word and Sentence annotations
+          that are produced by the TutorialDateTime annotator.</p><p>You can run this AE using the Document Analyzer in the same way that you run any
+          other AE. Just select the
+          <code class="literal">examples/descriptors/tutorial/ex3/RoomNumberAndDateTime.xml</code> descriptor and click the Run button. You
+          should see that RoomNumbers, Dates, and Times are all shown but that Words and
+          Sentences are not:</p><div class="screenshot"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="525"><tr><td><img src="../images/tutorials_and_users_guides/tug.aae/image030.jpg" width="525" alt="Screen shot results of running the Document Analyzer"></td></tr></table></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.aaes_can_contain_cas_consumers"></a>1.3.2.&nbsp;AEs can also contain CAS Consumers</h3></div></div></div><p>In addition to aggregating Analysis Engines, Aggregates can also contain CAS
+        Consumers (see <a href="tutorials_and_users_guides.html#ugr.tug.cpe" class="olink">Chapter&nbsp;2, Collection Processing Engine Developer's Guide
+      </a>), or even a mixture of these components with regular
+        Analysis Engines. The UIMA examples include an Aggregate that contains
+        both an analysis engine and a CAS consumer, in
+        <code class="literal">examples/descriptors/MixedAggregate.xml</code>.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ugr.tug.aae.reading_results_previous_annotators"></a>1.3.3.&nbsp;Reading the Results of Previous Annotators</h3></div></div></div><p>So far, we have been looking at annotators that look directly at the document text. However, annotators
+        can also use the results of other annotators. One useful thing we can do at this point is look for the
+        co-occurrence of a Date, a RoomNumber, and two Times &#8211; and annotate that as a Meeting.</p><p>The CAS maintains <span class="emphasis"><em>indexes</em></span> of annotations, and from an index you can obtain an
+        iterator that allows you to step through all annotations of a particular type. Here's some example code
+        that would iterate over all of the TimeAnnot annotations in the JCas:
+        
+        
+        </p><pre class="programlisting">FSIndex timeIndex = aJCas.getAnnotationIndex(TimeAnnot.type);
+Iterator timeIter = timeIndex.iterator();   
+while (timeIter.hasNext()) {
+  TimeAnnot time = (TimeAnnot)timeIter.next();
+
+  //do something
+}</pre><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>You can also use the method
+        <code class="literal">JCas.getJFSIndexRepository().getAllIndexedFS(YourClass.type)</code>, which returns an iterator

[... 3539 lines stripped ...]