Posted to commits@uima.apache.org by sc...@apache.org on 2010/05/06 16:00:16 UTC
svn commit: r941736 [2/3] - in
/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup: ./ src/
src/docbook/ src/docbook/images/ src/docbook/images/overview-and-setup/
src/docbook/images/overview-and-setup/conceptual_overview_files/
src/docbook...
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/faqs.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/faqs.xml?rev=941736&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/faqs.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/faqs.xml Thu May 6 14:00:16 2010
@@ -0,0 +1,411 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.faqs">
+ <title>UIMA Frequently Asked Questions (FAQs)</title>
+ <titleabbrev>UIMA FAQs</titleabbrev>
+
+ <variablelist>
+ <varlistentry id="ugr.faqs.what_is_uima">
+ <term><emphasis role="bold">What is UIMA?</emphasis></term>
+ <listitem><para>UIMA stands for Unstructured Information Management
+ Architecture. It is a component software architecture for the development,
+ discovery, composition and deployment of multi-modal analytics for the analysis
+ of unstructured information.</para>
+ <para>UIMA processing occurs through a series of modules called
+ <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link>. The result of analysis is an assignment of semantics to the elements of
+ unstructured data, for example, the indication that the phrase
+ <quote>Washington</quote> refers to a person's name or that it refers to a
+ place.</para>
+
+ <para>An analysis engine's output can be saved in conventional structures,
+ for example, relational databases or search engine indices, where the content
+ of the original unstructured information may be efficiently accessed
+ according to its inferred semantics. </para>
+
+ <para>UIMA supports developers in creating,
+ integrating, and deploying components across platforms and among dispersed
+ teams working to develop unstructured information management
+ applications.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.pronounce">
+ <term><emphasis role="bold">How do you pronounce UIMA?</emphasis></term>
+ <listitem><para>You – eee – muh.
+ <!-- Or, in IPA notation, /juːiːmə/ (which does not
+ display correctly in our PDF documentation, so it's commented out). --></para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.difference_apache_uima">
+ <term><emphasis role="bold">What's the difference between UIMA and Apache UIMA?</emphasis></term>
+ <listitem><para>UIMA is an architecture which specifies component interfaces,
+ design patterns, data representations and development roles.</para>
+
+ <para>Apache UIMA is an open source, Apache-licensed software project,
+ currently undergoing incubation at Apache.org. It includes run-time
+ frameworks in Java and C++, APIs and tools for implementing, composing, packaging
+ and deploying UIMA components.</para>
+
+ <para>The UIMA run-time framework allows developers to plug in their components
+ and applications and run them on different platforms and according to different
+ deployment options that range from tightly-coupled (running in the same
+ process space) to loosely-coupled (distributed across different processes or
+ machines for greater scale, flexibility and recoverability).</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="ugr.faqs.include_semantic_search">
+ <term><emphasis role="bold">
+ Does UIMA include a semantic search engine?
+ </emphasis></term>
+ <listitem><para>
+ The Apache UIMA project does not itself include a semantic search engine.
+ It can interface with the semantic search engine
+ component (available from <ulink
+ url="http://www.alphaworks.ibm.com/tech/uima"/>) for indexing and querying over
+ the results of analysis. Over time, we expect that additional search engines will
+ add support for semantic searching.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.what_is_an_annotation">
+
+ <term><emphasis role="bold">What is an Annotation?</emphasis></term>
+ <listitem><para>An annotation is metadata that is associated with a region of a
+ document. It is often a label, typically represented as a string of characters. The
+ region may be the whole document. </para>
+
+ <para>An example is the label <quote>Person</quote> associated with the span of
+ text <quote>George Washington</quote>. We say that <quote>Person</quote>
+ annotates <quote>George Washington</quote> in the sentence <quote>George
+ Washington was the first president of the United States</quote>. The
+ association of the label
+ <quote>Person</quote> with a particular span of text is an annotation. Another
+ example may have an annotation represent a topic, like <quote>American
+ Presidents</quote> and be used to label an entire document.</para>
+
+ <para>Annotations are not limited to regions of texts. An annotation may annotate
+ a region of an image or a segment of audio. The same concepts apply.</para>
+ </listitem>
+ </varlistentry>
+
+
+ <varlistentry id="ugr.faqs.what_is_the_cas">
+ <term><emphasis role="bold">What is the CAS?</emphasis></term>
+ <listitem><para>The CAS stands for Common Analysis Structure. It provides
+ cooperating UIMA components with a common representation and mechanism for
+ shared access to the artifact being analyzed (e.g., a document, audio file, video
+ stream etc.) and the current analysis results.</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.what_does_the_cas_contain">
+ <term><emphasis role="bold">What does the CAS contain?</emphasis></term>
+ <listitem><para>The CAS is a data structure for which UIMA provides multiple
+ interfaces. It contains and provides the analysis algorithm or application
+ developer with access to</para>
+
+ <itemizedlist spacing="compact">
+
+ <listitem><para>the subject of analysis (the artifact being analyzed, like
+ the document),</para></listitem>
+
+ <listitem><para>the analysis results or metadata (e.g., annotations, parse
+ trees, relations, entities etc.),</para></listitem>
+
+ <listitem><para>indices to the analysis results, and</para></listitem>
+
+ <listitem><para>the type system (a schema for the analysis results).</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>A CAS can hold multiple versions of the artifact being analyzed (for
+ instance, a raw HTML document, and a detagged version, or an English version and a
+ corresponding German version, or an audio sample, and the text that
+ corresponds, etc.). For each version there is a separate instance of the results
+ indices.</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.only_annotations">
+ <term><emphasis role="bold">Does the CAS only contain Annotations?</emphasis></term>
+ <listitem><para>No. The CAS contains the artifact being analyzed plus the analysis
+ results. Analysis results are those metadata recorded by <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link> in the
+ CAS. The most common form of analysis result is the addition of an annotation. But an
+ analysis engine may write any structure that conforms to the CAS's type
+ system into the CAS. These may not be annotations but may be other things, for
+ example links between annotations and properties of objects associated with
+ annotations.</para>
+ <para>The CAS may have multiple representations of the artifact being analyzed, each one
+ represented in the CAS as a particular Subject of Analysis, or <link linkend="ugr.faqs.what_is_a_sofa">Sofa</link>.</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.just_xml">
+ <term><emphasis role="bold">Is the CAS just XML?</emphasis></term>
+ <listitem><para>No, in fact there are many possible representations of the CAS. If all
+ of the <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link> are running in the same process, an efficient, in-memory
+ data object is used. If a CAS must be sent to an analysis engine on a remote machine, it
+ can be done via an XML or a binary serialization of the CAS. </para>
+
+ <para>The UIMA framework provides serialization and de-serialization methods
+ for a particular XML representation of the CAS named the XMI.</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.what_is_a_type_system">
+ <term><emphasis role="bold">What is a Type System?</emphasis></term>
+ <listitem><para>Think of a type system as a schema or class model for the <link linkend="ugr.faqs.what_is_the_cas">CAS</link>. It defines
+ the types of objects and their properties (or features) that may be instantiated in
+ a CAS. A specific CAS conforms to a particular type system. UIMA components declare
+ their input and output with respect to a type system. </para>
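+
+ <para>As a rough illustration, a type system descriptor that declares a single annotation
+ type might look like the sketch below. The type name <code>org.example.Person</code> and its
+ <code>title</code> feature are hypothetical and chosen only for this example;
+ <code>uima.tcas.Annotation</code> and <code>uima.cas.String</code> are built-in types. The
+ supertype element places the new type in the inheritance hierarchy, and the range type
+ restricts the values the feature may hold.</para>
+
+ <programlisting><![CDATA[<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <types>
+    <typeDescription>
+      <!-- hypothetical type, used only for illustration -->
+      <name>org.example.Person</name>
+      <description>A mention of a person in the text.</description>
+      <supertypeName>uima.tcas.Annotation</supertypeName>
+      <features>
+        <featureDescription>
+          <!-- hypothetical feature -->
+          <name>title</name>
+          <description>An honorific such as "President".</description>
+          <rangeTypeName>uima.cas.String</rangeTypeName>
+        </featureDescription>
+      </features>
+    </typeDescription>
+  </types>
+</typeSystemDescription>]]></programlisting>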
+
+ <para>Type Systems include the definitions of types, their properties, range
+ types (these can restrict the value of properties to other types) and
+ single-inheritance hierarchy of types.</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.what_is_a_sofa">
+ <term><emphasis role="bold">What is a Sofa?</emphasis></term>
+ <listitem><para>Sofa stands for <quote>Subject of Analysis</quote>. A <link linkend="ugr.faqs.what_is_the_cas">CAS</link> is
+ associated with a single artifact being analyzed by a collection of UIMA analysis
+ engines. But a single artifact may have multiple independent views, each of which
+ may be analyzed separately by a different set of <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link>. For example,
+ a given document may have different translations, each of which is associated
+ with the original document but each potentially analyzed by different engines. A
+ CAS may have multiple Views, each containing a different Subject of Analysis
+ corresponding to some version of the original artifact. This feature is ideal for
+ multi-modal analysis, where, for example, one view of a video stream may be the video
+ frames and the other the closed captions.</para></listitem>
+ </varlistentry>
+
+
+ <varlistentry id="ugr.faqs.annotator_versus_ae">
+ <term><emphasis role="bold">What's the difference between an Annotator and an Analysis
+ Engine?</emphasis></term>
+ <listitem><para>In the terminology of UIMA, an annotator is simply some code that
+ analyzes documents and outputs <link linkend="ugr.faqs.what_is_an_annotation">annotations</link> on the content of the documents. The
+ UIMA framework takes the annotator, together with metadata describing such
+ things as the input requirements and output types of the annotator, and produces
+ an analysis engine. </para>
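+
+ <para>A minimal sketch of the metadata that accompanies an annotator is shown below, in the
+ form of a primitive analysis engine descriptor. The class name
+ <code>org.example.PersonAnnotator</code>, the engine name and the output type
+ <code>org.example.Person</code> are hypothetical; a real descriptor would normally also
+ declare or import the type system it relies on.</para>
+
+ <programlisting><![CDATA[<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <primitive>true</primitive>
+  <!-- the user-written annotator class (hypothetical name) -->
+  <annotatorImplementationName>org.example.PersonAnnotator</annotatorImplementationName>
+  <analysisEngineMetaData>
+    <name>Person Annotator</name>
+    <capabilities>
+      <capability>
+        <!-- no input types required; one output type produced -->
+        <inputs/>
+        <outputs>
+          <type>org.example.Person</type>
+        </outputs>
+      </capability>
+    </capabilities>
+  </analysisEngineMetaData>
+</analysisEngineDescription>]]></programlisting>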
+
+ <para>Analysis Engines contain the framework-provided infrastructure that
+ allows them to be easily combined with other analysis engines in different flows
+ and according to different deployment options (collocated or as web services,
+ for example). </para>
+
+ <para>Analysis Engines are the framework-generated objects that an Application
+ interacts with. An Annotator is a user-written class that implements one of
+ the supported Annotator interfaces.</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.web_services">
+ <term><emphasis role="bold">Are UIMA analysis engines web services?</emphasis></term>
+ <listitem><para>They can be deployed as such. Deploying an analysis engine as a web
+ service is one of the deployment options supported by the UIMA framework.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.stateless_aes">
+ <term><emphasis role="bold">Do Analysis Engines have to be
+ "stateless"?</emphasis></term>
+ <listitem><para>This is a user-specifiable option. The XML metadata for the
+ component includes an
+ <code>operationalProperties</code> element which can specify if multiple
+ deployment is allowed. If true, then a particular instance of an Engine might not
+ see all the CASes being processed. If false, then that component will see all of the
+ CASes being processed. In this case, it can accumulate state information among all
+ the CASes. Typically, Analysis Engines in the main analysis pipeline are marked
+ multipleDeploymentAllowed = true. The CAS Consumer component, on the other hand,
+ defaults to having this property set to false, and is typically associated with
+ some resource like a database or search engine that aggregates analysis results
+ across an entire collection.</para>
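+
+ <para>As a sketch, the descriptor fragment for a component that may be replicated could
+ read as follows. The fragment belongs inside the component's metadata element (for an
+ analysis engine, <code>analysisEngineMetaData</code>); the values shown are illustrative
+ rather than recommended settings.</para>
+
+ <programlisting><![CDATA[<operationalProperties>
+  <modifiesCas>true</modifiesCas>
+  <!-- true: the framework may run several instances, each seeing only some CASes -->
+  <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
+  <outputsNewCASes>false</outputsNewCASes>
+</operationalProperties>]]></programlisting>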
+
+ <para>Analysis Engine developers are encouraged not to maintain state between
+ documents that would prevent their engine from working as advertised if
+ operated in a parallelized environment.</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.uddi">
+ <term><emphasis role="bold">Is engine meta-data compatible with web services and
+ UDDI?</emphasis></term>
+ <listitem><para>All UIMA component implementations are associated with Component
+ Descriptors, which represent metadata describing various properties of the
+ component to support discovery, reuse, validation, automatic composition and
+ development tooling. In principle, UIMA component descriptors are compatible
+ with web services and UDDI. However, the UIMA framework currently uses its own XML
+ representation for component metadata. It would not be difficult to convert
+ between UIMA's XML representation and the WSDL and UDDI standards.</para>
+ </listitem>
+ </varlistentry>
+
+
+ <varlistentry id="ugr.faqs.scaling">
+ <term><emphasis role="bold">How do you scale a UIMA application?</emphasis></term>
+ <listitem><para>The UIMA framework allows components such as <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link> and
+ CAS Consumers to be easily deployed as services or in other containers and managed
+ by systems middleware designed to scale. UIMA applications tend to naturally
+ scale-out across documents allowing many documents to be analyzed in
+ parallel.</para>
+ <para>A component in the UIMA framework called the CPM (Collection Processing
+ Manager) has a host of features and configuration settings for scaling an
+ application to increase its throughput and recoverability.</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.embedding">
+ <term><emphasis role="bold">What does it mean to embed UIMA in systems middleware?</emphasis></term>
+ <listitem><para>An example of an embedding would be the deployment of a UIMA analysis
+ engine as an Enterprise Java Bean inside an application server such as IBM
+ WebSphere. Such an embedding allows the deployer to take advantage of the features
+ and tools provided by WebSphere for achieving scalability, service management,
+ recoverability etc. UIMA is independent of any particular systems middleware, so
+ <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link> could be deployed on other application servers as well.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.cpm_versus_cpe">
+ <term><emphasis role="bold">How is the CPM different from a CPE?</emphasis></term>
+ <listitem><para>These name complementary aspects of collection processing. The CPM
+ (Collection Processing <emphasis role="bold">Manager</emphasis>) is the part of
+ the UIMA framework that manages the execution of a workflow of UIMA
+ components orchestrated to analyze a large collection of documents. The UIMA
+ developer does not implement or describe a CPM. It is a piece of infrastructure code
+ that handles CAS transport, instance management, batching, check-pointing,
+ statistics collection and failure recovery in the execution of a collection
+ processing workflow.</para>
+
+ <para>A Collection Processing Engine (CPE) is a component created by the framework
+ from a specific CPE descriptor. A CPE descriptor refers to a series of UIMA
+ components including a Collection Reader, CAS Initializer, Analysis
+ Engine(s) and CAS Consumers. These components are organized in a work flow and
+ define a collection analysis job or CPE. A CPE acquires documents from a source
+ collection, initializes CASs with document content, performs document
+ analysis and then produces collection level results (e.g., search engine
+ index, database etc). The CPM is the execution engine for a CPE.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.semantic_search">
+ <term><emphasis role="bold">What is Semantic Search and what is its relationship to
+ UIMA?</emphasis></term>
+ <listitem><para>Semantic Search refers to a document search paradigm that allows
+ users to search based not just on the keywords contained in the documents, but also
+ on the semantics associated with the text by <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link>. UIMA applications
+ perform analysis on text documents and generate semantics in the form of
+ <link linkend="ugr.faqs.what_is_an_annotation">annotations</link> on regions of text. For example, a UIMA analysis engine may discover
+ the text <quote>First Financial Bank</quote> to refer to an organization and
+ annotate it as such. With traditional keyword search, the query
+ <command>first</command> will return all documents that contain that word.
+ <command>First</command> is a frequent and ambiguous term – it occurs a lot
+ and can mean different things in different places. If the user is looking for
+ organizations that contain the word <command>first</command> in their names,
+ s/he will likely have to sift through lots of documents containing the word
+ <quote>first</quote> used in different ways. Semantic Search exploits the
+ results of analysis to allow more precise queries. For example, the semantic
+ search query <emphasis>&lt;organization&gt; first
+ &lt;/organization&gt;</emphasis> will rank first documents that contain the
+ word <quote>first</quote> as part of the name of an organization. The UIMA SDK
+ documentation demonstrates how UIMA applications can be built using semantic
+ search. It provides details about the XML Fragment Query language. This is the
+ particular query language used by the semantic search engine that comes with the
+ SDK.</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.xml_fragment_not_xml">
+ <term><emphasis role="bold">Is an XML Fragment Query valid XML?</emphasis></term>
+ <listitem><para>Not necessarily. The XML Fragment Query syntax is used to formulate
+ queries interpreted by the semantic search engine that ships with the UIMA SDK.
+ This query language relies on basic XML syntax as an intuitive way to describe
+ hierarchical patterns of annotations that may occur in a <link linkend="ugr.faqs.what_is_the_cas">CAS</link>. The language
+ deviates from valid XML in order to support queries over
+ <quote>overlapping</quote> or <quote>cross-over</quote> annotations and
+ other features that affect the interpretation of the query by the query processor.
+ For example, it admits notations in the query to indicate whether a keyword or an
+ annotation is optional or required to match a document.</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.modalities_other_than_text">
+ <term><emphasis role="bold">Does UIMA support modalities other than text?</emphasis></term>
+ <listitem><para>The UIMA architecture supports the development, discovery,
+ composition and deployment of multi-modal analytics including text, audio and
+ video. Applications that process text, speech and video have been developed using
+ UIMA. This release of the SDK, however, does not include examples of these
+ multi-modal applications. </para>
+
+ <para>It does however include documentation and programming examples for using
+ the key feature required for building multi-modal applications. UIMA supports
+ multiple subjects of analysis or <link linkend="ugr.faqs.what_is_a_sofa">Sofas</link>. These allow multiple views of a single
+ artifact to be associated with a <link linkend="ugr.faqs.what_is_the_cas">CAS</link>. For example, if an artifact is a video
+ stream, one Sofa could be associated with the video frames and another with the
+ closed-captions text. UIMA's multiple Sofa feature is included and
+ described in this release of the SDK.</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.compare">
+ <term><emphasis role="bold">How does UIMA compare to other similar work?</emphasis></term>
+ <listitem><para>A number of different frameworks for NLP have preceded UIMA. Two of
+ them were developed at IBM Research and represent UIMA's early roots. For
+ details please refer to the UIMA article that appears in the IBM Systems Journal
+ Vol. 43, No. 3 (<ulink
+ url="http://www.research.ibm.com/journal/sj/433/ferrucci.html"/>
+ ).</para>
+
+ <para>UIMA has advanced the state of the art along a number of dimensions
+ including: support for distributed deployments in different middleware
+ environments, easy framework embedding in different software product
+ platforms (key for commercial applications), broader architectural coverage
+ with its collection processing architecture, support for
+ multiple-modalities, support for efficient integration across programming
+ languages, support for a modern software engineering discipline calling out
+ different roles in the use of UIMA to develop applications, the extensive use of
+ descriptive component metadata to support development tooling, component
+ discovery and composition. (Please note that not all of these features are
+ available in this release of the SDK.)</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.open_source">
+ <term><emphasis role="bold">Is UIMA Open Source?</emphasis></term>
+ <listitem><para>Yes. As of version 2, UIMA development has moved to Apache and is being
+ developed within the Apache open source processes. It is licensed under the Apache
+ version 2 license. Previous versions are available on the IBM alphaWorks site (
+ <ulink url="http://www.alphaworks.ibm.com/tech/uima"/>) and the source
+ code for previous versions of the UIMA framework is available on SourceForge (
+ <ulink url="http://uima-framework.sourceforge.net/"/>).</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.levels_required">
+ <term><emphasis role="bold">What Java level and OS are required for the UIMA SDK?</emphasis></term>
+ <listitem><para>As of release 2.2.1, the UIMA SDK requires a Java 1.5 level (or later). Releases prior to 2.2.1
+ require as a minimum the Java 1.4 level; they will not run on 1.3 (or earlier levels).
+ The release has been tested with Java 5 and 6.
+ It has been tested mainly on Windows XP and Linux Intel 32-bit platforms, with some
+ testing on Mac OS X. Other
+ platforms and JDK implementations will likely work, but have
+ not been as significantly tested.</para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.building_apps_on_top_of_uima">
+ <term><emphasis role="bold">Can I build my UIM application on top of UIMA?</emphasis></term>
+ <listitem><para>Yes. Apache UIMA is licensed under the Apache version 2 license,
+ enabling you to build and distribute applications which include the framework.
+ </para></listitem>
+ </varlistentry>
+ <varlistentry id="ugr.faqs.commercial_products">
+ <term><emphasis role="bold">Do any commercial products support the UIMA framework or include
+ it as part of their product?</emphasis></term>
+ <listitem><para>Yes. IBM's WebSphere Information Integration Omnifind Edition
+ product (<ulink
+ url="http://www.ibm.com/developerworks/db2/zones/db2ii"/> or <ulink
+ url="http://www-306.ibm.com/software/data/integration/db2ii/editions_womnifind.html"/>
+ ) has UIMA <quote>inside</quote> and supports adding UIMA annotators to the
+ processing pipeline. We are actively seeking other product embeddings. </para>
+ </listitem>
+ </varlistentry>
+ <!--
+ <varlistentry>
+ <term><emphasis role="bold"></emphasis></term>
+ <listitem><para></para></listitem>
+ </varlistentry>
+ -->
+ </variablelist>
+</chapter>
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/glossary.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/glossary.xml?rev=941736&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/glossary.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/glossary.xml Thu May 6 14:00:16 2010
@@ -0,0 +1,559 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<glossary id="ugr.glossary">
+ <title>Glossary: Key Terms &amp; Concepts</title>
+ <titleabbrev>Glossary</titleabbrev>
+ <!--
+ <para></para>
+ <glossary id="ugr.glossary.glossary">
+ -->
+ <!--
+ <glossentry id="ugr.glossary.">
+ <glossterm></glossterm>
+ <glosssee otherterm="ugr.glossary."></glosssee>
+ <glossdef>
+ <para></para>
+ <glossseealso otherterm="ugr.glossary."/>
+ </glossdef>
+ </glossentry>
+ -->
+ <glossentry id="ugr.glossary.aggregate">
+ <glossterm>Aggregate &ae;</glossterm>
+ <glossdef>
+ <para>An <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>
+ made up of multiple subcomponent
+&ae;s arranged in a flow. The
+flow can be one of the two built-in flows, or a custom flow provided by the user.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.analysis_engine">
+ <glossterm>&ae;</glossterm>
+ <glossdef><para>A program that analyzes artifacts (e.g. documents) and infers information about
+them, and which implements the UIMA &ae; interface specification. It
+does not matter how the program is built, with what framework or whether or not
+it contains component (<quote>sub</quote>) &ae;s.</para>
+ </glossdef>
+ </glossentry>
+
+
+ <glossentry id="ugr.glossary.annotation">
+ <glossterm>Annotation</glossterm>
+ <glossdef>
+ <para>The association of metadata, such as a label, with a region of text (or other
+type of artifact). For example, the label <quote>Person</quote> associated with a
+region of text <quote>John Doe</quote> constitutes an annotation. We say
+<quote>Person</quote> annotates the span of text from X to Y containing exactly
+<quote>John Doe</quote>. An annotation is represented as a special
+ <glossterm linkend="ugr.glossary.type">type</glossterm>
+
+in a UIMA <glossterm linkend="ugr.glossary.type_system">type system</glossterm>.
+ It is the type used to record
+the labeling of regions of a <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.annotator">
+ <glossterm>Annotator</glossterm>
+ <glossdef>
+ <para>A software
+component that implements the UIMA annotator interface. Annotators are
+implemented to produce and record annotations over regions of an artifact
+(e.g., text document, audio, and video).</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.application">
+ <glossterm>Application</glossterm>
+ <glossdef>
+ <para>An application is the outer containing code that invokes
+ the UIMA framework functions to instantiate an
+ <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm> or a
+ <glossterm linkend="ugr.glossary.cpe">Collection Processing Engine</glossterm> from a particular
+ descriptor, and run it.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.apache_uima_java_framework">
+ <glossterm>Apache UIMA Java Framework</glossterm>
+ <glossdef>
+ <para>A Java-based implementation of the <glossterm linkend="ugr.glossary.uima">UIMA</glossterm>
+ architecture. It provides a run-time environment in which developers can plug in and run their UIMA component
+ implementations and with which they can build and deploy UIM applications. The framework is the
+ core part of the <glossterm linkend="ugr.glossary.apache_uima_sdk">Apache UIMA SDK</glossterm>.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.apache_uima_sdk">
+ <glossterm>Apache UIMA Software Development Kit (SDK)</glossterm>
+ <glossdef>
+ <para>The SDK for which you are now reading the documentation. The SDK includes the framework
+ plus additional components such as tooling and examples. Some of the tooling is Eclipse-based
+ (<ulink url="http://www.eclipse.org/"/>). The Apache UIMA SDK is being developed at the
+ <ulink url="http://incubator.apache.org">Apache Incubator</ulink>.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.cas">
+ <glossterm>CAS</glossterm>
+ <glossdef>
+ <para>The UIMA Common Analysis Structure is
+the primary data structure which UIMA analysis components use to represent and
+share analysis results. It contains:</para>
+
+<itemizedlist><listitem><para>The artifact. This is the object
+being analyzed such as a text document or audio or video stream. The CAS
+projects one or more views of the artifact. Each view is referred to as a
+ <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.</para></listitem>
+
+
+<listitem><para>A type system description –
+indicating the types, subtypes, and their features. </para></listitem>
+
+
+<listitem><para>Analysis metadata – <quote>standoff</quote>
+annotations describing the artifact or a region of the artifact </para></listitem>
+
+
+<listitem><para>An index repository to support
+efficient access to and iteration over the results of analysis.
+</para></listitem></itemizedlist>
+
+<para>UIMA's primary interface to this structure is provided by
+a class called the Common Analysis System. We use <quote>CAS</quote> to refer to
+both the structure and system. Where the common analysis structure is used
+through a different interface, the particular implementation of the structure
+is indicated. For example, the <glossterm linkend="ugr.glossary.jcas">JCas</glossterm> is a native Java object
+representation of the contents of the common analysis structure.</para>
+
+<para>A CAS can have multiple views; each view has a unique
+representation of the artifact, and has its own index repository, representing
+results of analysis for that representation of the artifact.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.cas_consumer">
+ <glossterm>CAS Consumer</glossterm>
+ <glossdef>
+ <para>A component that
+receives each CAS in the collection, usually after it has been processed by an
+ <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>. It is responsible for taking the results from
+the CAS and using them for some purpose, perhaps storing selected results into
+a database, for instance. The CAS
+Consumer may also perform collection-level analysis, saving these results in an
+application-specific, aggregate data structure.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.cas_initializer">
+ <glossterm>CAS Initializer (deprecated)</glossterm>
+ <glossdef>
+ <para>Prior to version 2, this was the component that took an
+ undefined input form and produced a particular <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.
+ In version 2, this is replaced by any <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>
+ which takes a particular <glossterm linkend="ugr.glossary.cas_view">CAS View</glossterm> and creates a
+ new output Sofa. For example, if the document is HTML, an &ae; might
+ create a Sofa which is a detagged version of an input CAS View, perhaps also
+creating annotations derived from the tags. For example, &lt;p&gt; tags
+might be translated into Paragraph annotations in the CAS.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.cas_multiplier">
+ <glossterm>CAS Multiplier</glossterm>
+ <glossdef>
+ <para>A component, implemented by a UIMA developer,
+that takes a CAS as input and produces 0 or more new CASes as output. Common use cases for a CAS Multiplier
+ include creating alternative versions of an input <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>
+ (see <glossterm linkend="ugr.glossary.cas_initializer">CAS Initializer</glossterm>), and breaking
+ a large input CAS into smaller pieces, each of which is emitted as a
+separate output CAS. There are other
+uses, however, such as aggregating input CASes into a single output CAS.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.cas_processor">
+ <glossterm>CAS Processor</glossterm>
+ <glossdef>
+ <para>A component of a Collection Processing Engine (CPE) that
+takes a CAS as input and returns a CAS as output. There are two types of CAS
+Processors: <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s and
+ <glossterm linkend="ugr.glossary.cas_consumer">CAS Consumer</glossterm>s.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.cas_view">
+ <glossterm>CAS View</glossterm>
+ <glossdef>
+ <para>A CAS Object which shares the base CAS and type system
+definition and index specifications, but has a unique index repository and a
+particular <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>. Views are named, and applications and
+annotators can dynamically create additional views whenever they are needed.
+Annotations are made with respect to one view. Feature structures can have references to feature structures
+ indexed in other views, as needed.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.cde">
+ <glossterm>CDE</glossterm>
+ <glossdef>
+ <para>The Component Descriptor Editor. This
+is the Eclipse tool that lets you conveniently edit the UIMA descriptors;
+ see <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde"/>.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.cpe">
+ <glossterm>Collection Processing Engine (CPE)</glossterm>
+ <glossdef>
+ <para>Performs Collection Processing
+through the combination of a
+ <glossterm linkend="ugr.glossary.collection_reader">Collection Reader</glossterm>,
+ zero or more <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s,
+ and zero or more <glossterm linkend="ugr.glossary.cas_consumer">CAS Consumer</glossterm>s.
+The Collection Processing Manager (CPM) manages the execution of the engine.</para>
+ <para>The CPE also refers to the XML specification of the Collection Processing
+ engine. The CPM reads a CPE specification and instantiates a CPE instance from it,
+ and runs it.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.cpm">
+ <glossterm>Collection Processing Manager (CPM)</glossterm>
+ <glossdef>
+ <para>The part of the framework that
+manages the execution of collection processing, routing CASs from the
+ <glossterm linkend="ugr.glossary.collection_reader">Collection Reader</glossterm>
+
+to zero or more <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s
+and then to zero or more <glossterm linkend="ugr.glossary.cas_consumer">CAS Consumer</glossterm>s. The CPM
+provides feedback such as performance statistics and error reporting and supports
+other features such as parallelization and error handling.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.collection_reader">
+ <glossterm>Collection Reader</glossterm>
+ <glossdef>
+ <para>A component
+that reads documents from some source, for example a file system or database.
+The collection reader initializes a CAS with this document.
+ Each document is returned as a CAS that may then be processed by
+ one or more <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s. If the task of populating a CAS
+from the document is complex, you may use an arbitrarily complex chain of
+ <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s and have the last one
+ create and initialize a new <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.fact_search">
+ <glossterm>Fact Search</glossterm>
+ <glossdef>
+ <para>A search that, given a fact pattern, returns the facts
+that match the pattern; the facts are extracted from a collection of
+documents by a set of &ae;s.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.feature">
+ <glossterm>Feature</glossterm>
+ <glossdef>
+ <para>A data member or attribute of a type. Each feature itself has an
+associated range type, the type of the value that it can hold. In the
+database analogy where types are tables, features are columns.
+ In the world of structured data types, each feature is a <quote>field</quote>,
+ or data member.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.flow_controller">
+ <glossterm>Flow Controller</glossterm>
+ <glossdef>
+ <para>A component which implements the interfaces needed
+to specify a custom flow within an <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.hybrid_analysis_engine">
+ <glossterm>Hybrid &ae;</glossterm>
+ <glossdef>
+ <para>An <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>
+ where more than one of its component &ae;s are deployed
+in the same address space and one or more are deployed remotely (part tightly and
+part loosely-coupled).</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.index">
+ <glossterm>Index</glossterm>
+ <glossdef>
+ <para>Data in the CAS can only be retrieved using Indexes.
+ Indexes are analogous to the indexes that are
+specified on tables of a database. Indexes belong to Index Repositories;
+there is one Repository for each
+view of the CAS. Indexes are specified
+to retrieve instances of some CAS Type (including its subtypes), and can be
+optionally sorted in a user-definable way.
+ For example, all types derived from the UIMA
+built-in type <literal>uima.tcas.Annotation</literal> contain begin
+and end features, which mark the begin and end offsets in the text where this
+annotation occurs. There is a built-in index of Annotations that specifies that
+annotations are retrieved sequentially by sorting first on the value of the begin
+feature (ascending) and then by the value of the end feature (descending).
+In this case, iterating over the annotations, one first obtains annotations that
+come sequentially first in the text, while favoring longer annotations, in the case
+where two annotations start at the same offset. Users can define their own indexes
+as well.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.jcas">
+ <glossterm>JCas</glossterm>
+ <glossdef>
+ <para>A Java object interface to the contents of the CAS.
+ This interface uses additional generated Java classes, where each type in the CAS
+is represented as a Java class with the same name, each feature is represented with
+a getter and setter method, and each instance of a type is represented as a
+Java object of the corresponding Java class.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.keyword_search">
+ <glossterm>Keyword Search</glossterm>
+ <glossdef>
+ <para>The standard search method where one supplies words (or <quote>keywords</quote>)
+and candidate documents are returned.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.knowledge_base">
+ <glossterm>Knowledge Base</glossterm>
+ <glossdef>
+ <para>A collection of data that may be interpreted as a
+set of facts and rules considered true in a possible world.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.loosely_coupled_analysis_engine">
+ <glossterm>Loosely-Coupled &ae;</glossterm>
+ <glossdef>
+ <para>An <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>
+ where no two of its component &ae;s run in the
+same address space but where each is remote with respect to the others that
+make up the aggregate. Loosely coupled engines are ideal for using
+ remote &ae; services that are
+not locally available, or for quickly assembling and testing functionality in
+cross-language, cross-platform distributed environments. They also better enable
+distributed scalable implementations where quick recoverability may have a
+greater impact on overall throughput than analysis speed.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.ontology">
+ <glossterm>Ontology</glossterm>
+ <glossdef>
+ <para>The part of a knowledge base that defines the semantics of the data
+axiomatically.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.pear">
+ <glossterm>PEAR</glossterm>
+ <glossdef>
+ <para>An archive file that packages up a UIMA component with its code,
+descriptor files and other resources required to install and run it in another
+environment. You can generate PEAR files using utilities that come with the
+UIMA SDK.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.primitive_analysis_engine">
+ <glossterm>Primitive &ae;</glossterm>
+ <glossdef>
+ <para>An <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>
+ that is composed of a single
+ <glossterm linkend="ugr.glossary.annotator">Annotator</glossterm>; one that has
+no component (or <quote>sub</quote>) &ae;s inside of it;
+contrast with
+ <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.semantic_search">
+ <glossterm>Semantic Search</glossterm>
+ <glossdef>
+ <para>A search where the semantic intent of the query is
+specified using one or more entity or relation specifiers. For example,
+one could specify that they are looking for a person (named) <quote>Bush.</quote>
+Such a query would then not return results about the kind of bushes that grow
+in your garden but rather just persons named Bush.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.structured_information">
+ <glossterm>Structured Information</glossterm>
+ <glossdef>
+ <para>Items stored in structured resources such as
+search engine indices, databases or knowledge bases. The canonical example of
+structured information is the database table. Each element of information in
+the database is associated with a precisely defined schema where each table
+column heading indicates its precise semantics, defining exactly how the
+information should be interpreted by a computer program or end-user.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.sofa">
+ <glossterm>Subject of Analysis (Sofa)</glossterm>
+ <glossdef>
+ <para>A piece of
+data (e.g., text document, image, audio segment, or video segment), which is intended
+for analysis by UIMA analysis components. It belongs to a
+ <glossterm linkend="ugr.glossary.cas_view">CAS View</glossterm> which has the same name; there
+ is a one-to-one correspondence between these. There can be multiple Sofas contained within
+one CAS, each one representing a different view of the original artifact – for example,
+an audio file could be the original artifact, and also be one Sofa, and another
+could be the output of a voice-recognition component, where the Sofa would be
+the corresponding text document. Sofas may be analyzed independently or
+simultaneously; they all co-exist within the CAS. </para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.tightly_coupled_analysis_engine">
+ <glossterm>Tightly-Coupled &ae;</glossterm>
+ <glossdef>
+ <para>An <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>
+ where all of its component &ae;s run in the same address space.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.type">
+ <glossterm>Type</glossterm>
+ <glossdef>
+ <para>A specification of an object in the
+ <glossterm linkend="ugr.glossary.cas">CAS</glossterm> used to store the results of
+analysis. Types are defined using inheritance, so some types may be
+defined purely for the sake of defining other types, and are in this sense <quote>abstract
+types.</quote> Types usually contain
+ <glossterm linkend="ugr.glossary.feature">Feature</glossterm>s, which are attributes, or
+properties of the type. A type is roughly equivalent to a class in an
+object oriented programming language, or a table in a database. Instances of types in the CAS
+ may be indexed for retrieval.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.type_system">
+ <glossterm>Type System</glossterm>
+ <glossdef>
+ <para>A collection of related <glossterm linkend="ugr.glossary.type">types</glossterm>.
+ All components that can access the CAS,
+including <glossterm linkend="ugr.glossary.application">Applications</glossterm>,
+ <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s,
+ <glossterm linkend="ugr.glossary.collection_reader">Collection Readers</glossterm>,
+ <glossterm linkend="ugr.glossary.flow_controller">Flow Controllers</glossterm>, or
+ <glossterm linkend="ugr.glossary.cas_consumer">CAS Consumers</glossterm>
+declare the type system that they use. Type systems are shared across &ae;s, allowing the outputs
+ of one &ae; to be read as input by another &ae;.
+A type system is roughly analogous to a set of related classes in object
+oriented programming, or a set of related tables in a database. The type
+system / type / feature terminology comes from computational linguistics.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.unstructured_information">
+ <glossterm>Unstructured Information</glossterm>
+ <glossdef>
+ <para>The canonical example of unstructured
+information is the natural language text document. The intended meaning of a
+document's content is only implicit and its precise interpretation by a
+computer program requires some degree of analysis to explicate the document's
+semantics. Other examples include audio, video and images. Contrast with
+<glossterm linkend="ugr.glossary.structured_information">Structured Information</glossterm>.
+ </para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.uima">
+ <glossterm>UIMA</glossterm>
+ <glossdef>
+ <para>UIMA is an acronym that stands for Unstructured Information Management Architecture;
+ it is a software architecture which specifies component interfaces, design patterns
+and development roles for creating, describing, discovering, composing and
+deploying multi-modal analysis capabilities. The UIMA specification is being developed by a
+ technical committee at <ulink url="http://www.oasis-open.org/committees/uima">OASIS</ulink>.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.uima_java_framework">
+ <glossterm>UIMA Java Framework</glossterm>
+ <glossdef>
+ <para>See <glossterm linkend="ugr.glossary.apache_uima_java_framework">Apache UIMA Java Framework</glossterm>.</para>
+ <para/>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.uima_sdk">
+ <glossterm>UIMA SDK</glossterm>
+ <glossdef>
+ <para>See <glossterm linkend="ugr.glossary.apache_uima_sdk">Apache UIMA SDK</glossterm>.</para>
+ <para/>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.xcas">
+ <glossterm>XCAS</glossterm>
+ <glossdef>
+ <para>An XML representation of the CAS. The XCAS can be used for saving
+and restoring CASs to and from streams. The UIMA SDK provides XCAS serialization and
+de-serialization methods for CASes. This is an older serialization format and
+new UIMA code should use the standard <glossterm linkend="ugr.glossary.xmi">XMI</glossterm>
+format instead.</para>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="ugr.glossary.xmi">
+ <glossterm>XML Metadata Interchange (XMI)</glossterm>
+ <glossdef>
+ <para>An OMG standard for representing
+object graphs in XML, which UIMA uses to serialize analysis results from the
+CAS to an XML representation. The UIMA SDK provides XMI serialization and
+de-serialization methods for CASes.</para>
+ </glossdef>
+ </glossentry>
+
+
+ <!--
+ <glossentry id="ugr.glossary.">
+ <glossterm></glossterm>
+ <glossdef>
+ <para></para>
+ </glossdef>
+ </glossentry>
+ -->
+
+ </glossary>
+
+ <!--
+</chapter>
+ -->
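Note on the Index glossary entry in the diff above: the built-in annotation ordering it describes (begin offset ascending, then end offset descending) can be illustrated with a minimal plain-Java sketch. The Span and AnnotationOrderSketch classes are hypothetical stand-ins written for this note; they are not part of the UIMA API and model only the sort order, not the CAS index repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for an annotation: begin/end offsets plus a label. */
final class Span {
    final int begin;
    final int end;
    final String label;

    Span(int begin, int end, String label) {
        this.begin = begin;
        this.end = end;
        this.label = label;
    }
}

/** Sketch of the ordering described in the glossary; not UIMA code. */
final class AnnotationOrderSketch {

    /** Sort by begin ascending; on ties, by end descending (longer span first). */
    static final Comparator<Span> INDEX_ORDER =
            Comparator.comparingInt((Span s) -> s.begin)
                    .thenComparing(Comparator.comparingInt((Span s) -> s.end).reversed());

    /** Returns the labels of the given spans in index order. */
    static List<String> sortedLabels(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(INDEX_ORDER);
        List<String> labels = new ArrayList<>();
        for (Span s : copy) {
            labels.add(s.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
                new Span(6, 11, "World"),
                new Span(0, 5, "Hello"),
                new Span(0, 11, "Sentence"));
        // Prints [Sentence, Hello, World]
        System.out.println(sortedLabels(spans));
    }
}
```

With this ordering, the sentence spanning offsets 0 to 11 is visited before the shorter token that starts at the same offset, which matches the "favoring longer annotations" behaviour the entry describes.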
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image002.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image002.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image002.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image004.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image004.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image004.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image006.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image006.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image006.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image008.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image008.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image008.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image010.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image010.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image010.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image012.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image012.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image012.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image014.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image014.png?rev=941736&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image014.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image002.jpg
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image002.jpg?rev=941736&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image002.jpg
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image004.jpg
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image004.jpg?rev=941736&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image004.jpg
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/known_issues.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/known_issues.xml?rev=941736&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/known_issues.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/known_issues.xml Thu May 6 14:00:16 2010
@@ -0,0 +1,68 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.issues">
+ <title>Known Issues</title>
+ <titleabbrev>Known Issues</titleabbrev>
+
+ <variablelist>
+ <varlistentry id="ugr.issues.cr_to_xml">
+ <term><emphasis role="bold">Sun Java 1.4.2_12 doesn't serialize CR characters to XML</emphasis></term>
+ <listitem>
+ <para>(Note: Apache UIMA now requires Java 1.5, so this issue is moot.) The XML serialization support in Sun Java 1.4.2_12 doesn't serialize CR characters to
+ XML. As a result, if the document text contains CR characters, XCAS or XMI serialization
+ will cause them to be lost, resulting in incorrect annotation offsets. This is exposed in
+ the DocumentAnalyzer, with the highlighting being incorrect if the input document contains
+ CR characters. </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry id="ugr.issues.jcasgen_java_1.4">
+ <term><emphasis role="bold">JCasGen merge facility only supports Java levels 1.4 or earlier</emphasis></term>
+ <listitem>
+ <para>JCasGen has a facility to merge in user (hand-coded) changes with the code generated
+ by JCasGen. This merging supports Java 1.4 constructs only. JCasGen generates Java 1.4
+ compliant code, so as long as any code you change here also only uses Java 1.4 constructs, the
+ merge will work, even if you're using Java 5 or later.
+ If you use syntactic structures particular to Java 5 or later, the merge
+ operation will likely fail to merge properly.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry id="ugr.issues.libgcj.4.1.2">
+ <term><emphasis role="bold">Descriptor editor in Eclipse tooling does not work with libgcj 4.1.2</emphasis></term>
+ <listitem>
+ <para>The descriptor editor in the Eclipse tooling does not work with libgcj 4.1.2, and
+ possibly other versions of libgcj. This is apparently due to a bug in the implementation of
+ their XML library, which results in a class cast error. libgcj is used as the default
+ JVM for Eclipse in Ubuntu (and possibly in other Linux distributions). The workaround is to use a
+ different JVM to start Eclipse.</para>
+ </listitem>
+ </varlistentry>
+ <!--
+ <varlistentry>
+ <term><emphasis role="bold"></emphasis></term>
+ <listitem><para></para></listitem>
+ </varlistentry>
+ -->
+ </variablelist>
+</chapter>
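Note on the first known issue in the diff above: the reason lost CR characters produce incorrect annotation offsets can be shown with a small plain-Java sketch. It does not call any UIMA or XML serialization API; the CrOffsetSketch class and its methods are hypothetical, and the lossy round trip is simulated simply by deleting every CR from the text.

```java
/** Sketch of how dropping CR characters shifts standoff offsets; not UIMA code. */
final class CrOffsetSketch {

    /** Simulates the lossy serialization round trip by removing every CR. */
    static String dropCarriageReturns(String text) {
        return text.replace("\r", "");
    }

    /** Returns the text covered by a standoff annotation [begin, end). */
    static String coveredText(String text, int begin, int end) {
        return text.substring(begin, end);
    }

    public static void main(String[] args) {
        String original = "Hello\r\nWorld again";
        int begin = 7;   // offset of 'W' in the original text
        int end = 12;    // offset just past 'd'

        // Prints "World": the offsets match the original text.
        System.out.println(coveredText(original, begin, end));

        // Prints "orld ": one CR was lost before the annotation,
        // so the same offsets now point one character too far right.
        String roundTripped = dropCarriageReturns(original);
        System.out.println(coveredText(roundTripped, begin, end));
    }
}
```

Each CR dropped before an annotation shifts that annotation by one character, which is consistent with the wrong highlighting described for the DocumentAnalyzer.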
Added: uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/overview-and-setup.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/overview-and-setup.xml?rev=941736&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/overview-and-setup.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-overview-and-setup/src/docbook/overview-and-setup.xml Thu May 6 14:00:16 2010
@@ -0,0 +1,34 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<book lang="en" >
+ <title>UIMA Overview &amp; SDK Setup</title>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../target/docbook-shared/common_book_info_ibm_c.xml"/>
+
+ <toc/>
+
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="project_overview.xml" />
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="conceptual_overview.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="eclipse_setup.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="faqs.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="known_issues.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="glossary.xml"/>
+</book>