Posted to commits@uima.apache.org by sc...@apache.org on 2010/05/06 16:00:16 UTC

svn commit: r941736 [2/3] - in /uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup: ./ src/ src/docbook/ src/docbook/images/ src/docbook/images/overview-and-setup/ src/docbook/images/overview-and-setup/conceptual_overview_files/ src/docbook...

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/faqs.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/faqs.xml?rev=941736&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/faqs.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/faqs.xml Thu May  6 14:00:16 2010
@@ -0,0 +1,411 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.faqs">
+  <title>UIMA Frequently Asked Questions (FAQ&apos;s)</title>
+  <titleabbrev>UIMA FAQ&apos;s</titleabbrev>
+
+  <variablelist>
+    <varlistentry id="ugr.faqs.what_is_uima">
+    <term><emphasis role="bold">What is UIMA?</emphasis></term>
+        <listitem><para>UIMA stands for Unstructured Information Management
+          Architecture. It is a component software architecture for the development,
+          discovery, composition and deployment of multi-modal analytics for the analysis
+          of unstructured information.</para>
+          <para>UIMA processing occurs through a series of modules called 
+            <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link>. The result of analysis is an assignment of semantics to the elements of
+            unstructured data, for example, the indication that the phrase
+            <quote>Washington</quote> refers to a person&apos;s name or that it refers to a
+            place.</para>
+          
+          <para>The output of analysis engines can be saved in conventional structures,
+            for example, relational databases or search engine indices, where the content
+            of the original unstructured information may be efficiently accessed
+            according to its inferred semantics. </para>
+          
+          <para>UIMA supports developers in creating,
+            integrating, and deploying components across platforms and among dispersed
+            teams working to develop unstructured information management
+            applications.</para>
+        </listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.pronounce">
+        <term><emphasis role="bold">How do you pronounce UIMA?</emphasis></term>
+        <listitem><para>You &ndash; eee &ndash; muh. 
+        <!-- Or, in IPA notation, /juːiːmə/ (which does not
+        display correctly in our PDF documentation, so it's commented out). --></para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.difference_apache_uima">
+        <term><emphasis role="bold">What&apos;s the difference between UIMA and the Apache UIMA?</emphasis></term>
+        <listitem><para>UIMA is an architecture which specifies component interfaces,
+          design patterns, data representations and development roles.</para>
+          
+          <para>Apache UIMA is an open source, Apache-licensed software project,
+            currently undergoing incubation at Apache.org.  It includes run-time
+            frameworks in Java and C++, APIs and tools for implementing, composing, packaging
+            and deploying UIMA components.</para>
+          
+          <para>The UIMA run-time framework allows developers to plug in their components
+            and applications and run them on different platforms and according to different
+            deployment options that range from tightly-coupled (running in the same
+            process space) to loosely-coupled (distributed across different processes or
+            machines for greater scale, flexibility and recoverability).</para>
+        </listitem>
+      </varlistentry>
+     
+      <varlistentry id="ugr.faqs.include_semantic_search">
+        <term><emphasis role="bold">
+          Does UIMA include a semantic search engine?
+        </emphasis></term>
+        <listitem><para>
+          The Apache UIMA project does not itself include a semantic search engine.  
+          It can interface with the semantic search engine
+            component (available from <ulink
+              url="http://www.alphaworks.ibm.com/tech/uima"/>) for indexing and querying over
+            the results of analysis. Over time, we expect that additional search engines will
+            add support for semantic searching.
+          </para>
+        </listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.what_is_an_annotation">
+        
+        <term><emphasis role="bold">What is an Annotation?</emphasis></term>
+        <listitem><para>An annotation is metadata that is associated with a region of a
+          document. It is often a label, typically represented as a string of characters. The
+          region may be the whole document. </para>
+          
+          <para>An example is the label <quote>Person</quote> associated with the span of
+            text <quote>George Washington</quote>. We say that <quote>Person</quote>
+            annotates <quote>George Washington</quote> in the sentence <quote>George
+            Washington was the first president of the United States</quote>. The
+            association of the label
+            <quote>Person</quote> with a particular span of text is an annotation. Another
+            example may have an annotation represent a topic, like <quote>American
+            Presidents</quote> and be used to label an entire document.</para>
+          
+          <para>Annotations are not limited to regions of text. An annotation may annotate
+            a region of an image or a segment of audio. The same concepts apply.</para>
+        </listitem>
+      </varlistentry>
+ 
+  
+      <varlistentry id="ugr.faqs.what_is_the_cas">
+        <term><emphasis role="bold">What is the CAS?</emphasis></term>
+        <listitem><para>CAS stands for Common Analysis Structure. It provides
+          cooperating UIMA components with a common representation and mechanism for
+          shared access to the artifact being analyzed (e.g., a document, audio file, video
+          stream etc.) and the current analysis results.</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.what_does_the_cas_contain">
+        <term><emphasis role="bold">What does the CAS contain?</emphasis></term>
+        <listitem><para>The CAS is a data structure for which UIMA provides multiple
+          interfaces. It contains and provides the analysis algorithm or application
+          developer with access to</para>
+          
+          <itemizedlist spacing="compact">
+            
+            <listitem><para>the subject of analysis (the artifact being analyzed, like
+              the document),</para></listitem>
+            
+            <listitem><para>the analysis results or metadata (e.g., annotations, parse
+              trees, relations, entities etc.),</para></listitem>
+            
+            <listitem><para>indices to the analysis results, and</para></listitem>
+            
+            <listitem><para>the type system (a schema for the analysis results).</para>
+            </listitem>
+          </itemizedlist>
+          
+          <para>A CAS can hold multiple versions of the artifact being analyzed (for
+            instance, a raw HTML document, and a detagged version, or an English version and a
+            corresponding German version, or an audio sample, and the text that
+            corresponds, etc.). For each version there is a separate instance of the results
+            indices.</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.only_annotations">
+        <term><emphasis role="bold">Does the CAS only contain Annotations?</emphasis></term>
+        <listitem><para>No. The CAS contains the artifact being analyzed plus the analysis
+          results. Analysis results are those metadata recorded by <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link> in the
+          CAS. The most common form of analysis result is the addition of an annotation. But an
+          analysis engine may write any structure that conforms to the CAS&apos;s type
+          system into the CAS. These structures need not be annotations; they may be other
+          things, for example links between annotations and properties of objects
+          associated with annotations.</para>
+          <para>The CAS may have multiple representations of the artifact being analyzed, each one
+            represented in the CAS as a particular Subject of Analysis, or <link linkend="ugr.faqs.what_is_a_sofa">Sofa</link>.</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.just_xml">
+        <term><emphasis role="bold">Is the CAS just XML?</emphasis></term>
+        <listitem><para>No, in fact there are many possible representations of the CAS. If all
+          of the <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link> are running in the same process, an efficient, in-memory
+          data object is used. If a CAS must be sent to an analysis engine on a remote machine, it
+          can be done via an XML or a binary serialization of the CAS. </para>
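+          <para>As an illustration, an abbreviated XMI serialization of a CAS holding the
+            <quote>Person</quote> annotation over <quote>George Washington</quote> might look
+            like the following. The <code>org.example.Person</code> type and the
+            <code>xmi:id</code> values are hypothetical, and a real serialization typically
+            contains additional elements, such as the document annotation.</para>
+          <programlisting><![CDATA[<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
+    xmlns:cas="http:///uima/cas.ecore"
+    xmlns:example="http:///org/example.ecore" xmi:version="2.0">
+  <cas:NULL xmi:id="0"/>
+  <!-- the Sofa holds the text being analyzed -->
+  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
+      sofaString="George Washington was the first president of the United States"/>
+  <!-- characters 0 to 17 of the Sofa: "George Washington" -->
+  <example:Person xmi:id="13" sofa="1" begin="0" end="17"/>
+  <!-- the view lists the feature structures indexed in it -->
+  <cas:View sofa="1" members="13"/>
+</xmi:XMI>]]></programlisting>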
+          
+          <para>The UIMA framework provides serialization and de-serialization methods
+            for a particular XML representation of the CAS called XMI (XML Metadata Interchange).</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.what_is_a_type_system">
+        <term><emphasis role="bold">What is a Type System?</emphasis></term>
+        <listitem><para>Think of a type system as a schema or class model for the <link linkend="ugr.faqs.what_is_the_cas">CAS</link>. It defines
+          the types of objects and their properties (or features) that may be instantiated in
+          a CAS. A specific CAS conforms to a particular type system. UIMA components declare
+          their input and output with respect to a type system. </para>
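+          <para>For illustration, a type system descriptor declaring a hypothetical
+            <code>org.example.Person</code> annotation type with one string-valued feature
+            might look like the following. The type and feature names are made up for this
+            sketch; <code>uima.tcas.Annotation</code> and <code>uima.cas.String</code> are
+            built-in UIMA types.</para>
+          <programlisting><![CDATA[<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <name>ExampleTypeSystem</name>
+  <types>
+    <typeDescription>
+      <!-- hypothetical type; inherits begin/end offsets from Annotation -->
+      <name>org.example.Person</name>
+      <description>A mention of a person's name</description>
+      <supertypeName>uima.tcas.Annotation</supertypeName>
+      <features>
+        <featureDescription>
+          <name>role</name>
+          <description>Hypothetical feature, e.g. "president"</description>
+          <!-- the range type restricts the values this feature may hold -->
+          <rangeTypeName>uima.cas.String</rangeTypeName>
+        </featureDescription>
+      </features>
+    </typeDescription>
+  </types>
+</typeSystemDescription>]]></programlisting>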
+          
+          <para>Type Systems include the definitions of types, their properties, range
+            types (these can restrict the value of properties to other types) and a
+            single-inheritance hierarchy of types.</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.what_is_a_sofa">
+        <term><emphasis role="bold">What is a Sofa?</emphasis></term>
+        <listitem><para>Sofa stands for <quote>Subject of Analysis</quote>. A <link linkend="ugr.faqs.what_is_the_cas">CAS</link> is
+          associated with a single artifact being analyzed by a collection of UIMA analysis
+          engines. But a single artifact may have multiple independent views, each of which
+          may be analyzed separately by a different set of <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link>. For example,
+          a document may have several translations, each of which is associated
+          with the original document but may be analyzed by different engines. A
+          CAS may have multiple Views, each containing a different Subject of Analysis
+          corresponding to some version of the original artifact. This feature is ideal for
+          multi-modal analysis where, for example, one view of a video stream may be the video
+          frames and the other the closed captions.</para></listitem>
+      </varlistentry>
+
+      
+      <varlistentry id="ugr.faqs.annotator_versus_ae">
+        <term><emphasis role="bold">What's the difference between an Annotator and an Analysis
+          Engine?</emphasis></term>
+        <listitem><para>In the terminology of UIMA, an annotator is simply some code that
+          analyzes documents and outputs <link linkend="ugr.faqs.what_is_an_annotation">annotations</link> on the content of the documents. The
+          UIMA framework takes the annotator, together with metadata describing such
+          things as the input requirements and output types of the annotator, and produces
+          an analysis engine. </para>
+          
+          <para>Analysis Engines contain the framework-provided infrastructure that
+            allows them to be easily combined with other analysis engines in different flows
+            and according to different deployment options (collocated or as web services,
+            for example). </para>
+          
+          <para>Analysis Engines are the framework-generated objects that an Application
+            interacts with. An Annotator is a user-written class that implements one of
+            the supported Annotator interfaces.</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.web_services">
+        <term><emphasis role="bold">Are UIMA analysis engines web services?</emphasis></term>
+        <listitem><para>They can be deployed as such. Deploying an analysis engine as a web
+          service is one of the deployment options supported by the UIMA framework.</para>
+        </listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.stateless_aes">
+        <term><emphasis role="bold">Do Analysis Engines have to be
+          &quot;stateless&quot;?</emphasis></term>
+        <listitem><para>This is a user-specifiable option. The XML metadata for the
+          component includes an
+          <code>operationalProperties</code> element which specifies whether multiple
+          deployment is allowed. If true, then a particular instance of an Engine might not
+          see all the CASes being processed. If false, then that component will see all of the
+          CASes being processed; in this case, it can accumulate state information across all
+          the CASes. Typically, Analysis Engines in the main analysis pipeline have
+          <code>multipleDeploymentAllowed</code> set to true. The CAS Consumer component, on the other hand,
+          defaults to having this property set to false, and is typically associated with
+          some resource like a database or search engine that aggregates analysis results
+          across an entire collection.</para>
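+          <para>As a sketch, the relevant fragment of a component descriptor might look like
+            the following. The values shown are illustrative; in an analysis engine
+            descriptor this element appears inside the
+            <code>analysisEngineMetaData</code> element.</para>
+          <programlisting><![CDATA[<operationalProperties>
+  <modifiesCas>true</modifiesCas>
+  <!-- false: a single instance sees every CAS, so it may safely keep
+       collection-level state (for example, running totals) -->
+  <multipleDeploymentAllowed>false</multipleDeploymentAllowed>
+  <outputsNewCASes>false</outputsNewCASes>
+</operationalProperties>]]></programlisting>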
+          
+          <para>Analysis Engine developers are encouraged not to maintain state between
+            documents that would prevent their engine from working as advertised if
+            operated in a parallelized environment.</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.uddi">
+        <term><emphasis role="bold">Is engine meta-data compatible with web services and
+          UDDI?</emphasis></term>
+        <listitem><para>All UIMA component implementations are associated with Component
+          Descriptors, which represent metadata describing various properties of the
+          component to support discovery, reuse, validation, automatic composition and
+          development tooling. In principle, UIMA component descriptors are compatible
+          with web services and UDDI. However, the UIMA framework currently uses its own XML
+          representation for component metadata. It would not be difficult to convert
+          between UIMA&apos;s XML representation and the WSDL and UDDI standards.</para>
+        </listitem>
+      </varlistentry>
+
+      
+      <varlistentry id="ugr.faqs.scaling">
+        <term><emphasis role="bold">How do you scale a UIMA application?</emphasis></term>
+        <listitem><para>The UIMA framework allows components such as <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link> and
+          CAS Consumers to be easily deployed as services or in other containers and managed
+          by systems middleware designed to scale. UIMA applications tend to scale out
+          naturally across documents, allowing many documents to be analyzed in
+          parallel.</para>
+          <para>A component in the UIMA framework called the CPM (Collection Processing
+            Manager) has a host of features and configuration settings for scaling an
+            application to increase its throughput and recoverability.</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.embedding">
+        <term><emphasis role="bold">What does it mean to embed UIMA in systems middleware?</emphasis></term>
+        <listitem><para>An example of an embedding would be the deployment of a UIMA analysis
+          engine as an Enterprise Java Bean inside an application server such as IBM
+          WebSphere. Such an embedding allows the deployer to take advantage of the features
+          and tools provided by WebSphere for achieving scalability, service management,
+          recoverability etc. UIMA is independent of any particular systems middleware, so
+          <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link> could be deployed on other application servers as well.</para>
+        </listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.cpm_versus_cpe">
+        <term><emphasis role="bold">How is the CPM different from a CPE?</emphasis></term>
+        <listitem><para>These name complementary aspects of collection processing. The CPM
+          (Collection Processing <emphasis role="bold">Manager</emphasis>) is the part of
+          the UIMA framework that manages the execution of a workflow of UIMA
+          components orchestrated to analyze a large collection of documents. The UIMA
+          developer does not implement or describe a CPM. It is a piece of infrastructure code
+          that handles CAS transport, instance management, batching, check-pointing,
+          statistics collection and failure recovery in the execution of a collection
+          processing workflow.</para>
+          
+          <para>A Collection Processing Engine (CPE) is a component created by the framework
+            from a specific CPE descriptor. A CPE descriptor refers to a series of UIMA
+            components including a Collection Reader, CAS Initializer, Analysis
+            Engine(s) and CAS Consumers. These components are organized in a workflow and
+            define a collection analysis job or CPE. A CPE acquires documents from a source
+            collection, initializes CASes with document content, performs document
+            analysis and then produces collection-level results (e.g., a search engine
+            index or a database). The CPM is the execution engine for a CPE.</para>
+        </listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.semantic_search">
+        <term><emphasis role="bold">What is Semantic Search and what is its relationship to
+          UIMA?</emphasis></term>
+        <listitem><para>Semantic Search refers to a document search paradigm that allows
+          users to search based not just on the keywords contained in the documents, but also
+          on the semantics associated with the text by <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link>. UIMA applications
+          perform analysis on text documents and generate semantics in the form of
+          <link linkend="ugr.faqs.what_is_an_annotation">annotations</link> on regions of text. For example, a UIMA analysis engine may discover
+          the text <quote>First Financial Bank</quote> to refer to an organization and
+          annotate it as such. With traditional keyword search, the query
+          <command>first</command> will return all documents that contain that word.
+          <command>First</command> is a frequent and ambiguous term &ndash; it occurs a lot
+          and can mean different things in different places. If the user is looking for
+          organizations that contain that word <command>first</command> in their names,
+          s/he will likely have to sift through lots of documents containing the word
+          <quote>first</quote> used in different ways. Semantic Search exploits the
+          results of analysis to allow more precise queries. For example, the semantic
+          search query <emphasis>&lt;organization&gt; first
+          &lt;/organization&gt;</emphasis> gives the highest rank to documents that contain the
+          word <quote>first</quote> as part of the name of an organization. The UIMA SDK
+          documentation demonstrates how UIMA applications can be built using semantic
+          search. It provides details about the XML Fragment Query language. This is the
+          particular query language used by the semantic search engine that comes with the
+          SDK.</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.xml_fragment_not_xml">
+        <term><emphasis role="bold">Is an XML Fragment Query valid XML?</emphasis></term>
+        <listitem><para>Not necessarily. The XML Fragment Query syntax is used to formulate
+          queries interpreted by the semantic search engine that ships with the UIMA SDK.
+          This query language relies on basic XML syntax as an intuitive way to describe
+          hierarchical patterns of annotations that may occur in a <link linkend="ugr.faqs.what_is_the_cas">CAS</link>. The language
+          deviates from valid XML in order to support queries over
+          <quote>overlapping</quote> or <quote>cross-over</quote> annotations and
+          other features that affect the interpretation of the query by the query processor.
+          For example, it admits notations in the query to indicate whether a keyword or an
+          annotation is optional or required to match a document.</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.modalities_other_than_text">
+        <term><emphasis role="bold">Does UIMA support modalities other than text?</emphasis></term>
+        <listitem><para>The UIMA architecture supports the development, discovery,
+          composition and deployment of multi-modal analytics including text, audio and
+          video. Applications that process text, speech and video have been developed using
+          UIMA. This release of the SDK, however, does not include examples of these
+          multi-modal applications. </para>
+          
+          <para>It does, however, include documentation and programming examples for using
+            the key feature required for building multi-modal applications. UIMA supports
+            multiple subjects of analysis or <link linkend="ugr.faqs.what_is_a_sofa">Sofas</link>. These allow multiple views of a single
+            artifact to be associated with a <link linkend="ugr.faqs.what_is_the_cas">CAS</link>. For example, if an artifact is a video
+            stream, one Sofa could be associated with the video frames and another with the
+            closed-captions text. UIMA&apos;s multiple Sofa feature is included and
+            described in this release of the SDK.</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.compare">
+        <term><emphasis role="bold">How does UIMA compare to other similar work?</emphasis></term>
+        <listitem><para>A number of different frameworks for NLP have preceded UIMA. Two of
+          them were developed at IBM Research and represent UIMA&apos;s early roots. For
+          details please refer to the UIMA article that appears in the IBM Systems Journal
+          Vol. 43, No. 3 (<ulink
+            url="http://www.research.ibm.com/journal/sj/433/ferrucci.html"/>
+          ).</para>
+          
+          <para>UIMA has advanced the state of the art along a number of dimensions
+            including: support for distributed deployments in different middleware
+            environments, easy framework embedding in different software product
+            platforms (key for commercial applications), broader architectural coverage
+            with its collection processing architecture, support for
+            multiple-modalities, support for efficient integration across programming
+            languages, support for a modern software engineering discipline calling out
+            different roles in the use of UIMA to develop applications, the extensive use of
+            descriptive component metadata to support development tooling, component
+            discovery and composition. (Please note that not all of these features are
+            available in this release of the SDK.)</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.open_source">
+        <term><emphasis role="bold">Is UIMA Open Source?</emphasis></term>
+        <listitem><para>Yes. As of version 2, UIMA development has moved to Apache and is being
+          developed within the Apache open source processes. It is licensed under the Apache
+          version 2 license. Previous versions are available on the IBM alphaWorks site
+            (<ulink url="http://www.alphaworks.ibm.com/tech/uima"/>), and the source
+          code for previous versions of the UIMA framework is available on SourceForge
+            (<ulink url="http://uima-framework.sourceforge.net/"/>).</para>
+        </listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.levels_required">
+        <term><emphasis role="bold">What Java level and OS are required for the UIMA SDK?</emphasis></term>
+        <listitem><para>As of release 2.2.1, the UIMA SDK requires a Java 1.5 level (or later).  Releases prior to 2.2.1
+          require as a minimum the Java 1.4 level; they will not run on 1.3 (or earlier levels). 
+          The release has been tested with Java 5 and 6. 
+          It has been tested mainly on Windows XP and Linux Intel 32-bit platforms, with some
+          testing on Mac OS X. Other
+          platforms and JDK implementations will likely work, but have
+          not been tested as extensively.</para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.building_apps_on_top_of_uima">
+        <term><emphasis role="bold">Can I build my UIM application on top of UIMA?</emphasis></term>
+        <listitem><para>Yes. Apache UIMA is licensed under the Apache version 2 license,
+          enabling you to build and distribute applications which include the framework.
+          </para></listitem>
+      </varlistentry>
+      <varlistentry id="ugr.faqs.commercial_products">
+        <term><emphasis role="bold">Do any commercial products support the UIMA framework or include
+          it as part of their product?</emphasis></term>
+        <listitem><para>Yes. IBM's WebSphere Information Integration Omnifind Edition
+          product (<ulink
+            url="http://www.ibm.com/developerworks/db2/zones/db2ii"/> or <ulink
+            url="http://www-306.ibm.com/software/data/integration/db2ii/editions_womnifind.html"/>
+          ) has UIMA <quote>inside</quote> and supports adding UIMA annotators to the
+          processing pipeline. We are actively seeking other product embeddings. </para>
+        </listitem>
+      </varlistentry>
+      <!--
+      <varlistentry>
+      <term><emphasis role="bold"></emphasis></term>
+      <listitem><para></para></listitem>
+      </varlistentry>
+      -->
+ </variablelist>
+</chapter>

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/glossary.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/glossary.xml?rev=941736&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/glossary.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/glossary.xml Thu May  6 14:00:16 2010
@@ -0,0 +1,559 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<glossary id="ugr.glossary">
+  <title>Glossary: Key Terms &amp; Concepts</title>
+  <titleabbrev>Glossary</titleabbrev>
+ <!-- 
+  <para></para>
+  <glossary id="ugr.glossary.glossary">
+   -->
+    <!--
+    <glossentry id="ugr.glossary.">
+      <glossterm></glossterm>
+      <glosssee otherterm="ugr.glossary."></glosssee>
+      <glossdef>
+        <para></para>
+        <glossseealso otherterm="ugr.glossary."/>
+      </glossdef>
+    </glossentry>
+      -->
+       <glossentry id="ugr.glossary.aggregate">
+      <glossterm>Aggregate &ae;</glossterm>
+      <glossdef>
+        <para>An <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>
+ made up of multiple subcomponent
+&ae;s arranged in a flow.  The
+flow can be one of the two built-in flows, or a custom flow provided by the user.</para>
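+        <para>As a sketch, an aggregate descriptor that uses the built-in fixed flow might
+          look like the following. The delegate keys and the imported descriptor file names
+          are hypothetical.</para>
+        <programlisting><![CDATA[<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <!-- false marks this descriptor as an aggregate -->
+  <primitive>false</primitive>
+  <delegateAnalysisEngineSpecifiers>
+    <delegateAnalysisEngine key="Tokenizer">
+      <import location="Tokenizer.xml"/>
+    </delegateAnalysisEngine>
+    <delegateAnalysisEngine key="PersonFinder">
+      <import location="PersonFinder.xml"/>
+    </delegateAnalysisEngine>
+  </delegateAnalysisEngineSpecifiers>
+  <analysisEngineMetaData>
+    <name>ExampleAggregate</name>
+    <flowConstraints>
+      <!-- delegates run in the listed order -->
+      <fixedFlow>
+        <node>Tokenizer</node>
+        <node>PersonFinder</node>
+      </fixedFlow>
+    </flowConstraints>
+  </analysisEngineMetaData>
+</analysisEngineDescription>]]></programlisting>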
+      </glossdef>
+    </glossentry> 
+    
+  <glossentry id="ugr.glossary.analysis_engine">
+    <glossterm>&ae;</glossterm>
+    <glossdef><para>A program that analyzes artifacts (e.g. documents) and infers information about
+them, and which implements the UIMA &ae; interface specification. It
+does not matter how the program is built, with what framework or whether or not
+it contains component (<quote>sub</quote>) &ae;s.</para>
+    </glossdef>
+  </glossentry>
+
+  
+    <glossentry id="ugr.glossary.annotation">
+      <glossterm>Annotation</glossterm>
+      <glossdef>
+        <para>The association of metadata, such as a label, with a region of text (or other
+type of artifact). For example, the label <quote>Person</quote> associated with a
+region of text <quote>John Doe</quote> constitutes an annotation. We say
+<quote>Person</quote> annotates the span of text from X to Y containing exactly
+<quote>John Doe</quote>. An annotation is represented as a special
+          <glossterm linkend="ugr.glossary.type">type</glossterm> 
+
+in a UIMA <glossterm linkend="ugr.glossary.type_system">type system</glossterm>.
+           It is the type used to record
+the labeling of regions of a <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm></para>
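+        <para>As an illustration only, the following plain-Java sketch models an annotation as a type label attached to a character span of the document text. The <literal>Annotation</literal> record is hypothetical and is not part of the UIMA API. It assumes half-open offsets [begin, end), so the covered text is simply a substring of the document.</para>
+        <programlisting><![CDATA[
```java
/**
 * Hypothetical, simplified model of an annotation: a type label attached to a
 * character span of a document. This is NOT the UIMA API.
 */
public class AnnotationSketch {

    /** A label ("type") over the half-open character range [begin, end). */
    record Annotation(String type, int begin, int end) {
        Annotation {
            // Reject spans that cannot describe a region of text.
            if (begin < 0 || end < begin) {
                throw new IllegalArgumentException("invalid span: " + begin + ".." + end);
            }
        }

        /** Returns the region of the document text that this annotation labels. */
        String coveredText(String documentText) {
            return documentText.substring(begin, end);
        }
    }

    public static void main(String[] args) {
        String text = "Yesterday John Doe visited Paris.";
        // "John Doe" starts at offset 10; with an exclusive end, the span ends at 18.
        Annotation person = new Annotation("Person", 10, 18);
        System.out.println(person.type() + " -> " + person.coveredText(text));
    }
}
```
+]]></programlisting>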
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.annotator">
+      <glossterm>Annotator</glossterm>
+      <glossdef>
+        <para>A software
+component that implements the UIMA annotator interface. Annotators are
+implemented to produce and record annotations over regions of an artifact
+(e.g., text document, audio, and video).</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.application">
+      <glossterm>Application</glossterm>
+      <glossdef>
+        <para>An application is the outermost code that calls the UIMA framework
+        to instantiate an 
+        <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm> or a
+        <glossterm linkend="ugr.glossary.cpe">Collection Processing Engine</glossterm> from a particular 
+        descriptor and then runs it.</para>
+      </glossdef>
+    </glossentry>
+
+      <glossentry id="ugr.glossary.apache_uima_java_framework">
+      <glossterm>Apache UIMA Java Framework</glossterm>
+      <glossdef>
+        <para>A Java-based implementation of the <glossterm linkend="ugr.glossary.uima">UIMA</glossterm>
+         architecture.  It provides a run-time environment in which developers can plug in and run their UIMA component 
+         implementations and with which they can build and deploy UIM applications.  The framework is the
+         core part of the <glossterm linkend="ugr.glossary.apache_uima_sdk">Apache UIMA SDK</glossterm>.</para>
+      </glossdef>
+    </glossentry>
+
+    <glossentry id="ugr.glossary.apache_uima_sdk">
+      <glossterm>Apache UIMA Software Development Kit (SDK)</glossterm>
+      <glossdef>
+        <para>The SDK for which you are now reading the documentation.  The SDK includes the framework
+          plus additional components such as tooling and examples.  Some of the tooling is Eclipse-based 
+          (<ulink url="http://www.eclipse.org/"/>). The Apache UIMA SDK is being developed at the 
+          <ulink url="http://incubator.apache.org">Apache Incubator</ulink>.</para>
+      </glossdef>
+    </glossentry>
+    
+      <glossentry id="ugr.glossary.cas">
+      <glossterm>CAS</glossterm>
+      <glossdef>
+        <para>The UIMA Common Analysis Structure is
+the primary data structure which UIMA analysis components use to represent and
+share analysis results.  It contains:</para>
+
+<itemizedlist><listitem><para>The artifact. This is the object
+being analyzed, such as a text document or an audio or video stream. The CAS
+projects one or more views of the artifact. Each view is referred to as a 
+  <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.</para></listitem>
+
+
+<listitem><para>A type system description &ndash;
+indicating the types, subtypes, and their features. </para></listitem>
+
+
+<listitem><para>Analysis metadata &ndash; <quote>standoff</quote>
+annotations describing the artifact or a region of the artifact </para></listitem>
+
+
+<listitem><para>An index repository to support
+efficient access to and iteration over the results of analysis.
+</para></listitem></itemizedlist>
+
+<para>UIMA&apos;s primary interface to this structure is provided by
+a class called the Common Analysis System. We use <quote>CAS</quote> to refer to
+both the structure and system. Where the common analysis structure is used
+through a different interface, the particular implementation of the structure
+is indicated. For example, the <glossterm linkend="ugr.glossary.jcas">JCas</glossterm> is a native Java object
+representation of the contents of the common analysis structure.</para>
+
+<para>A CAS can have multiple views; each view has a unique
+representation of the artifact, and has its own index repository, representing
+results of analysis for that representation of the artifact.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.cas_consumer">
+      <glossterm>CAS Consumer</glossterm>
+      <glossdef>
+        <para>A component that
+receives each CAS in the collection, usually after it has been processed by an 
+          <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>. It is responsible for taking the results from
+the CAS and using them for some purpose, perhaps storing selected results into
+a database, for instance.  The CAS
+Consumer may also perform collection-level analysis, saving these results in an
+application-specific, aggregate data structure.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.cas_initializer">
+      <glossterm>CAS Initializer (deprecated)</glossterm>
+      <glossdef>
+        <para>Prior to version 2, this was the component that took input in some 
+          arbitrary format and produced a particular <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.
+          In version 2, it has been replaced by the use of any <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>
+          that takes a particular <glossterm linkend="ugr.glossary.cas_view">CAS View</glossterm> and creates a
+          new output Sofa.  For example, if the document is HTML, an &ae; might 
+          create a Sofa which is a detagged version of an input CAS View, perhaps also
+creating annotations derived from the tags. For example, &lt;p&gt; tags
+might be translated into Paragraph annotations in the CAS.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.cas_multiplier">
+      <glossterm>CAS Multiplier</glossterm>
+      <glossdef>
+        <para>A component, implemented by a UIMA developer,
+that takes a CAS as input and produces 0 or more new CASes as output.  Common use cases for a CAS Multiplier
+          include creating alternative versions of an input <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm> 
+          (see <glossterm linkend="ugr.glossary.cas_initializer">CAS Initializer</glossterm>), and breaking 
+          a large input CAS into smaller pieces, each of which is emitted as a
+separate output CAS.  There are other
+uses, however, such as aggregating input CASes into a single output CAS.</para>
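+        <para>The following plain-Java analogy sketches the use case of breaking a large input into pieces: one input goes in, and zero or more outputs come out one at a time. Strings stand in for CASes. The class and its <literal>process</literal>, <literal>hasNext</literal>, and <literal>next</literal> methods are hypothetical and are not the UIMA API.</para>
+        <programlisting><![CDATA[
```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Plain-Java analogy of a segmenting CAS Multiplier: one large input is split into
 * fixed-size pieces, each handed out as a separate output. Strings stand in for
 * CASes; this is NOT the UIMA API.
 */
public class SegmentingMultiplierSketch {

    private final int maxChars;
    private final Deque<String> pending = new ArrayDeque<>();

    SegmentingMultiplierSketch(int maxChars) {
        if (maxChars <= 0) throw new IllegalArgumentException("maxChars must be positive");
        this.maxChars = maxChars;
    }

    /** Accepts one input and queues zero or more output pieces (zero for empty input). */
    void process(String input) {
        for (int start = 0; start < input.length(); start += maxChars) {
            pending.add(input.substring(start, Math.min(input.length(), start + maxChars)));
        }
    }

    /** True while queued outputs remain. */
    boolean hasNext() { return !pending.isEmpty(); }

    /** Returns the next output piece; throws NoSuchElementException if none remain. */
    String next() { return pending.remove(); }

    public static void main(String[] args) {
        SegmentingMultiplierSketch m = new SegmentingMultiplierSketch(4);
        m.process("abcdefghij");
        List<String> outputs = new ArrayList<>();
        while (m.hasNext()) outputs.add(m.next());
        System.out.println(outputs); // [abcd, efgh, ij]
    }
}
```
+]]></programlisting>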
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.cas_processor">
+      <glossterm>CAS Processor</glossterm>
+      <glossdef>
+        <para>A component of a Collection Processing Engine (CPE) that
+takes a CAS as input and returns a CAS as output. There are two types of CAS
+Processors: <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s and 
+          <glossterm linkend="ugr.glossary.cas_consumer">CAS Consumer</glossterm>s.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.cas_view">
+      <glossterm>CAS View</glossterm>
+      <glossdef>
+        <para>A CAS Object which shares the base CAS and type system
+definition and index specifications, but has a unique index repository and a
+particular <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.   Views are named, and applications and
+annotators can dynamically create additional views whenever they are needed.
+Annotations are made with respect to one view.  Feature structures can have references to feature structures 
+          indexed in other views, as needed.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.cde">
+      <glossterm>CDE</glossterm>
+      <glossdef>
+        <para>The Component Descriptor Editor. This
+is the Eclipse tool that lets you conveniently edit the UIMA descriptors; 
+          see <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde"/>.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.cpe">
+      <glossterm>Collection Processing Engine (CPE)</glossterm>
+      <glossdef>
+        <para>Performs Collection Processing
+through the combination of a 
+          <glossterm linkend="ugr.glossary.collection_reader">Collection Reader</glossterm>,
+          zero or more <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s,
+ and zero or more <glossterm linkend="ugr.glossary.cas_consumer">CAS Consumer</glossterm>s.
+The Collection Processing Manager (CPM) manages the execution of the engine.</para>
+        <para>The term CPE also refers to the XML specification of a Collection Processing
+        Engine.  The CPM reads a CPE specification, instantiates a CPE from it,
+        and runs it.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.cpm">
+      <glossterm>Collection Processing Manager (CPM)</glossterm>
+      <glossdef>
+        <para>The part of the framework that
+manages the execution of collection processing, routing CASs from the 
+          <glossterm linkend="ugr.glossary.collection_reader">Collection Reader</glossterm>
+to zero or more <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s
+and then to zero or more <glossterm linkend="ugr.glossary.cas_consumer">CAS Consumer</glossterm>s. The CPM
+provides feedback such as performance statistics and error reporting and supports
+other features such as parallelization and error handling.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.collection_reader">
+      <glossterm>Collection Reader</glossterm>
+      <glossdef>
+        <para>A component
+that reads documents from some source, for example a file system or database.
+The collection reader initializes a CAS with this document.  
+          Each document is returned as a CAS that may then be processed by 
+          one or more <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s. If the task of populating a CAS
+from the document is complex, you may use an arbitrarily complex chain of 
+          <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s and have the last one
+          create and initialize a new <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.fact_search">
+      <glossterm>Fact Search</glossterm>
+      <glossdef>
+        <para>A search that, given a fact pattern, returns the facts matching that
+pattern, where the facts have been extracted from a collection of documents by a set of &ae;s.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.feature">
+      <glossterm>Feature</glossterm>
+      <glossdef>
+        <para>A data member or attribute of a type.  Each feature itself has an
+associated range type, the type of the value that it can hold.  In the
+database analogy where types are tables, features are columns.
+        In the world of structured data types, each feature is a <quote>field</quote>,
+        or data member.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.flow_controller">
+      <glossterm>Flow Controller</glossterm>
+      <glossdef>
+        <para>A component which implements the interfaces needed
+to specify a custom flow within an <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.hybrid_analysis_engine">
+      <glossterm>Hybrid &ae;</glossterm>
+      <glossdef>
+        <para>An <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm> 
+          where more than one of its component &ae;s are deployed in
+the same address space and one or more are deployed remotely (part tightly and
+part loosely coupled).</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.index">
+      <glossterm>Index</glossterm>
+      <glossdef>
+        <para>Data in the CAS can only be retrieved using Indexes.  
+          Indexes are analogous to the indexes that are
+specified on tables of a database.  Indexes belong to Index Repositories;
+there is one Repository for each
+view of the CAS.  Indexes are specified
+to retrieve instances of some CAS Type (including its subtypes), and can
+optionally be sorted in a user-definable way. 
+          For example, all types derived from the UIMA
+built-in type <literal>uima.tcas.Annotation</literal> contain begin
+and end features, which mark the begin and end offsets in the text where this
+annotation occurs.  There is a built-in index of Annotations that specifies that
+annotations are retrieved sequentially by sorting first on the value of the begin 
+feature (ascending) and then by the value of the end feature (descending).
+As a result, iterating over this index returns annotations in the order in which
+they occur in the text; when two annotations start at the same offset, the longer
+one comes first.  Users can define their own indexes as well.</para>
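+        <para>The ordering of the built-in annotation index can be sketched in plain Java as a comparator that sorts by begin offset (ascending) and then by end offset (descending). The <literal>Span</literal> record is a hypothetical stand-in rather than a UIMA class. The sketch ignores any further tie-breaking the framework may apply.</para>
+        <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Plain-Java sketch of the built-in annotation index ordering:
 * begin ascending, then end descending (longer spans first on ties).
 * Span is a hypothetical stand-in, not a UIMA class.
 */
public class AnnotationIndexOrder {

    record Span(String label, int begin, int end) {}

    /** Sort key: begin ascending, then end descending. */
    static final Comparator<Span> BUILT_IN_ORDER =
            Comparator.comparingInt(Span::begin)
                      .thenComparing(Comparator.comparingInt(Span::end).reversed());

    /** Returns a sorted copy, leaving the input list unchanged. */
    static List<Span> sorted(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(BUILT_IN_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Span> spans = List.of(
                new Span("Token", 5, 8),
                new Span("Token", 0, 4),
                new Span("Sentence", 0, 20),
                new Span("Token", 0, 2));
        // Prints Sentence[0,20], Token[0,4], Token[0,2], Token[5,8], one per line.
        for (Span s : sorted(spans)) {
            System.out.println(s.label() + "[" + s.begin() + "," + s.end() + "]");
        }
    }
}
```
+]]></programlisting>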
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.jcas">
+      <glossterm>JCas</glossterm>
+      <glossdef>
+        <para>A Java object interface to the contents of the CAS.  
+          This interface uses additional generated Java classes, where each type in the CAS
+is represented as a Java class with the same name, each feature is represented with
+a getter and setter method, and each instance of a type is represented as a
+Java object of the corresponding Java class.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.keyword_search">
+      <glossterm>Keyword Search</glossterm>
+      <glossdef>
+        <para>The standard search method where one supplies words (or <quote>keywords</quote>)
+and candidate documents are returned.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.knowledge_base">
+      <glossterm>Knowledge Base</glossterm>
+      <glossdef>
+        <para>A collection of data that may be interpreted as a
+set of facts and rules considered true in a possible world.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.loosely_coupled_analysis_engine">
+      <glossterm>Loosely-Coupled &ae;</glossterm>
+      <glossdef>
+        <para>An <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>
+         where no two of its component &ae;s run in the
+same address space but where each is remote with respect to the others that
+make up the aggregate. Loosely coupled engines are ideal for using 
+          remote &ae; services that are
+not locally available, or for quickly assembling and testing functionality in
+cross-language, cross-platform distributed environments. They also better enable
+distributed, scalable implementations where quick recoverability may have a
+greater impact on overall throughput than analysis speed.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.ontology">
+      <glossterm>Ontology</glossterm>
+      <glossdef>
+        <para>The part of a knowledge base that defines the semantics of the data
+axiomatically.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.pear">
+      <glossterm>PEAR</glossterm>
+      <glossdef>
+        <para>An archive file that packages up a UIMA component with its code,
+descriptor files and other resources required to install and run it in another
+environment. You can generate PEAR files using utilities that come with the
+UIMA SDK.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.primitive_analysis_engine">
+      <glossterm>Primitive &ae;</glossterm>
+      <glossdef>
+        <para>An <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm> 
+          that is composed of a single 
+          <glossterm linkend="ugr.glossary.annotator">Annotator</glossterm>; one that has
+no component (or <quote>sub</quote>) &ae;s inside of it; 
+contrast with
+          <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.semantic_search">
+      <glossterm>Semantic Search</glossterm>
+      <glossdef>
+        <para>A search in which the semantic intent of the query is
+specified using one or more entity or relation specifiers.  For example,
+one could specify that one is looking for a person named <quote>Bush</quote>.
+Such a query would then return results only about persons named Bush, not
+about the kind of bushes that grow in your garden.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.structured_information">
+      <glossterm>Structured Information</glossterm>
+      <glossdef>
+        <para>Items stored in structured resources such as
+search engine indices, databases or knowledge bases. The canonical example of
+structured information is the database table. Each element of information in
+the database is associated with a precisely defined schema where each table
+column heading indicates its precise semantics, defining exactly how the
+information should be interpreted by a computer program or end-user.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.sofa">
+      <glossterm>Subject of Analysis (Sofa)</glossterm>
+      <glossdef>
+        <para>A piece of
+data (e.g., text document, image, audio segment, or video segment), which is intended
+for analysis by UIMA analysis components.  It belongs to a 
+          <glossterm linkend="ugr.glossary.cas_view">CAS View</glossterm> which has the same name; there
+          is a one-to-one correspondence between these.  There can be multiple Sofas contained within
+one CAS, each one representing a different view of the original artifact &ndash; for example,
+an audio file could be the original artifact, and also be one Sofa, and another
+could be the output of a voice-recognition component, where the Sofa would be
+the corresponding text document. Sofas may be analyzed independently or
+simultaneously; they all co-exist within the CAS.  </para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.tightly_coupled_analysis_engine">
+      <glossterm>Tightly-Coupled &ae;</glossterm>
+      <glossdef>
+        <para>An <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>
+ where all of its component &ae;s run in the same address space.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.type">
+      <glossterm>Type</glossterm>
+      <glossdef>
+        <para>A specification of an object in the
+          <glossterm linkend="ugr.glossary.cas">CAS</glossterm> used to store the results of
+analysis.  Types are defined using inheritance, so some types may be
+defined purely for the sake of defining other types, and are in this sense <quote>abstract
+types.</quote>  Types usually contain 
+          <glossterm linkend="ugr.glossary.feature">Feature</glossterm>s, which are attributes, or
+properties of the type.  A type is roughly equivalent to a class in an
+object-oriented programming language, or a table in a database.  Instances of types in the CAS
+          may be indexed for retrieval.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.type_system">
+      <glossterm>Type System</glossterm>
+      <glossdef>
+        <para>A collection of related <glossterm linkend="ugr.glossary.type">types</glossterm>.
+          All components that can access the CAS,
+including <glossterm linkend="ugr.glossary.application">Applications</glossterm>,
+          <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s,
+          <glossterm linkend="ugr.glossary.collection_reader">Collection Readers</glossterm>,
+          <glossterm linkend="ugr.glossary.flow_controller">Flow Controllers</glossterm>, and
+          <glossterm linkend="ugr.glossary.cas_consumer">CAS Consumers</glossterm>,
+declare the type system that they use. Type systems are shared across &ae;s, allowing the outputs 
+          of one &ae; to be read as input by another &ae;.
+A type system is roughly analogous to a set of related classes in object-oriented
+programming, or a set of related tables in a database.  The type
+system / type / feature terminology comes from computational linguistics.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.unstructured_information">
+      <glossterm>Unstructured Information</glossterm>
+      <glossdef>
+        <para>The canonical example of unstructured
+information is the natural language text document. The intended meaning of a
+document's content is only implicit and its precise interpretation by a
+computer program requires some degree of analysis to explicate the document's
+semantics. Other examples include audio, video and images. Contrast with
+<glossterm linkend="ugr.glossary.structured_information">Structured Information</glossterm>.
+        </para>          
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.uima">
+      <glossterm>UIMA</glossterm>
+      <glossdef>
+        <para>UIMA is an acronym that stands for Unstructured Information Management Architecture; 
+          it is a software architecture which specifies component interfaces, design patterns
+and development roles for creating, describing, discovering, composing and
+deploying multi-modal analysis capabilities.  The UIMA specification is being developed by a 
+        technical committee at <ulink url="http://www.oasis-open.org/committees/uima">OASIS</ulink>.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.uima_java_framework">
+      <glossterm>UIMA Java Framework</glossterm>
+      <glossdef>
+        <para>See <glossterm linkend="ugr.glossary.apache_uima_java_framework">Apache UIMA Java Framework</glossterm>.</para>
+        <para/>
+      </glossdef>
+    </glossentry>
+
+    <glossentry id="ugr.glossary.uima_sdk">
+      <glossterm>UIMA SDK</glossterm>
+      <glossdef>
+        <para>See <glossterm linkend="ugr.glossary.apache_uima_sdk">Apache UIMA SDK</glossterm>.</para>
+        <para/>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.xcas">
+      <glossterm>XCAS</glossterm>
+      <glossdef>
+        <para>An XML representation of the CAS. The XCAS can be used for saving
+and restoring CASs to and from streams. The UIMA SDK provides XCAS serialization and
+de-serialization methods for CASes.  This is an older serialization format and
+new UIMA code should use the standard <glossterm linkend="ugr.glossary.xmi">XMI</glossterm>
+format instead.</para>
+      </glossdef>
+    </glossentry>
+  
+    <glossentry id="ugr.glossary.xmi">
+      <glossterm>XML Metadata Interchange (XMI)</glossterm>
+      <glossdef>
+        <para>An OMG standard for representing
+object graphs in XML, which UIMA uses to serialize analysis results from the
+CAS to an XML representation.  The UIMA SDK provides XMI serialization and
+de-serialization methods for CASes.</para>
+      </glossdef>
+    </glossentry>
+
+
+  <!--  
+    <glossentry id="ugr.glossary.">
+      <glossterm></glossterm>
+      <glossdef>
+        <para></para>
+      </glossdef>
+    </glossentry>
+  -->
+  
+  </glossary>
+
+ <!-- 
+</chapter>
+   -->

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image002.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image002.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.

Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image002.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image004.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image004.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.

Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image004.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image006.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image006.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.

Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image006.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image008.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image008.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.

Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image008.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image010.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image010.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.

Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image010.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image012.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image012.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.

Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image012.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image014.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image014.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.

Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image014.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image002.jpg
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image002.jpg?rev=941736&view=auto
==============================================================================
Binary file - no diff available.

Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image002.jpg
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image004.jpg
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image004.jpg?rev=941736&view=auto
==============================================================================
Binary file - no diff available.

Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image004.jpg
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/known_issues.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/known_issues.xml?rev=941736&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/known_issues.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/known_issues.xml Thu May  6 14:00:16 2010
@@ -0,0 +1,68 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.issues">
+  <title>Known Issues</title>
+  <titleabbrev>Known Issues</titleabbrev>
+
+  <variablelist>
+    <varlistentry id="ugr.issues.cr_to_xml">
+      <term><emphasis role="bold">Sun Java 1.4.2_12 doesn't serialize CR characters to XML</emphasis></term>
+      <listitem>
+        <para>(Note: Apache UIMA now requires Java 1.5, so this issue no longer applies.) The XML
+        serialization support in Sun Java 1.4.2_12 does not serialize CR characters to XML. As a
+        result, if the document text contains CR characters, XCAS or XMI serialization loses them,
+        which makes the annotation offsets incorrect. This problem is visible in the DocumentAnalyzer,
+        which highlights annotations incorrectly if the input document contains CR characters.</para>
+      </listitem>
+    </varlistentry>
+    <varlistentry id="ugr.issues.jcasgen_java_1.4">
+      <term><emphasis role="bold">JCasGen merge facility supports only Java 1.4 or earlier</emphasis></term>
+      <listitem>
+        <para>JCasGen has a facility to merge user (hand-coded) changes into the code generated
+        by JCasGen.  This merging supports only Java 1.4 constructs.  JCasGen generates Java 1.4
+        compliant code, so as long as your changes to that code also use only Java 1.4 constructs,
+        the merge works, even if you are running Java 5 or later.
+        If your changes use syntax specific to Java 5 or later (such as generics or annotations),
+        the merge operation will likely fail.</para>
+      </listitem>
+    </varlistentry>
+    <varlistentry id="ugr.issues.libgcj.4.1.2">
+      <term><emphasis role="bold">Descriptor editor in Eclipse tooling does not work with libgcj 4.1.2</emphasis></term>
+      <listitem>
+        <para>The descriptor editor in the Eclipse tooling does not work with libgcj 4.1.2, and
+        possibly not with other versions of libgcj.  This appears to be caused by a bug in libgcj's
+        XML library implementation, which results in a class cast error.  libgcj is used as the
+        default JVM for Eclipse on Ubuntu (and possibly on other Linux distributions).  The workaround
+        is to start Eclipse with a different JVM, for example by using the Eclipse <literal>-vm</literal> option.</para>
+      </listitem>
+    </varlistentry>
+      <!--
+      <varlistentry>
+      <term><emphasis role="bold"></emphasis></term>
+      <listitem><para></para></listitem>
+      </varlistentry>
+      -->
+ </variablelist>
+</chapter>

Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/overview-and-setup.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/overview-and-setup.xml?rev=941736&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/overview-and-setup.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/overview-and-setup.xml Thu May  6 14:00:16 2010
@@ -0,0 +1,34 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<book lang="en" >
+  <title>UIMA Overview &amp; SDK Setup</title>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../target/docbook-shared/common_book_info_ibm_c.xml"/>
+
+  <toc/>
+
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="project_overview.xml" />
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="conceptual_overview.xml"/>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="eclipse_setup.xml"/>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="faqs.xml"/>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="known_issues.xml"/>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="glossary.xml"/>
+</book>