Posted to commits@uima.apache.org by sc...@apache.org on 2008/08/28 23:28:16 UTC
svn commit: r689997 [16/32] - in /incubator/uima/uimaj/trunk/uima-docbooks:
./ src/ src/docbook/overview_and_setup/ src/docbook/references/
src/docbook/tools/ src/docbook/tutorials_and_users_guides/
src/docbook/uima/organization/ src/olink/references/
Modified: incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tools/tools.cde.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tools/tools.cde.xml?rev=689997&r1=689996&r2=689997&view=diff
==============================================================================
--- incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tools/tools.cde.xml (original)
+++ incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tools/tools.cde.xml Thu Aug 28 14:28:14 2008
@@ -1,1385 +1,1385 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
-"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
-<!ENTITY imgroot "../images/tools/tools.cde/" >
-<!ENTITY % uimaents SYSTEM "../entities.ent" >
-%uimaents;
-]>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-<chapter id="ugr.tools.cde">
- <title>Component Descriptor Editor User's Guide</title>
- <titleabbrev>CDE User's Guide</titleabbrev>
-
- <para>The Component Descriptor Editor is an Eclipse plug-in that provides a forms-based
- interface for creating and editing UIMA XML descriptors. It supports most of the
- descriptor formats, except the Collection Processing Engine descriptor, the PEAR
- package descriptor and some remote deployment descriptors.</para>
-
- <section id="ugr.tools.cde.launching">
- <title>Launching the Component Descriptor Editor</title>
-
- <para>Here's how to launch this tool on a descriptor contained in the examples. This
- presumes you have installed the examples as described in the SDK Installation and Setup
- chapter.</para>
-
- <itemizedlist spacing="compact"><listitem><para>Expand the uimaj-examples
- project in the Eclipse Navigator or Package Explorer view</para></listitem>
-
- <listitem><para>Within this project, browse to the file
- descriptors/tutorial/ex1/RoomNumberAnnotator.xml.</para></listitem>
-
- <listitem><para>Right-click on this file and select Open With → Component
- Descriptor Editor. (If this option is not present, check that you installed
- the plug-ins as described in <olink targetdoc="&uima_docs_overview;"
- targetptr="ugr.ovv.eclipse_setup.installation"/>. The EMF plug-in is also
- required.)</para></listitem>
-
- <listitem><para>This should open a graphical editor and display the contents of the
- RoomNumberAnnotator descriptor. </para></listitem></itemizedlist>
-
- </section>
-
- <section id="ugr.tools.cde.creating_new_ae_descriptor">
- <title>Creating a New AE Descriptor</title>
-
- <para>A new AE descriptor file may be created by selecting the File → New →
- Other... menu. This brings up the following dialog:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/>
- </imageobject>
- <textobject><phrase>Screenshot of selecting new UIMA component in Eclipse</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>If the user then selects UIMA and Analysis Engine Descriptor File, and clicks the
- Next > button, the following dialog is displayed. We will cover creating other kinds
- of components later in the documentation.
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="3.2in" format="JPG" fileref="&imgroot;image004.jpg"/>
- </imageobject>
- <textobject><phrase>Screenshot of selecting new UIMA component in Eclipse
- after pushing Next</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>After entering the appropriate parent folder and file name, and clicking Finish,
- an initial AE descriptor file is created with the given name, and the descriptor is
- opened up within the Component Descriptor Editor.</para>
-
- <para>At this point, the display inside the Component Descriptor Editor is the same
- whether one started by creating a new AE descriptor, as in the preceding paragraph, or
- one merely opened a previously created AE descriptor from, say, the Package Explorer
- view. We show a previously created AE in the figure below:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.7in" format="JPG" fileref="&imgroot;image006.jpg"/>
- </imageobject>
- <textobject><phrase>Screenshot of CDE showing overview page</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>To see all the information shown in the main editor pane with less scrolling, double
- click the title tab to toggle between the <quote>full screen</quote> and normal
- views.</para>
-
- <para>It is possible to set the Component Descriptor Editor as the default editor for all
- .xml files by going to Window → Preferences, and then selecting File Associations
- on the left, and *.xml on the right, and finally by clicking on Component Descriptor
- Editor, the Default button and then OK. If AE and Type System descriptors are not the
- primary .xml files you work with within the Eclipse environment, we recommend not
- setting the Component Descriptor Editor as your default editor for all .xml files. To
- open an .xml file using the Component Descriptor Editor, if the Component Descriptor
- Editor is not set as your default editor, right click on the file in the Package Explorer,
- or other navigational view, and select Open With → Component Descriptor Editor.
- This choice is remembered by Eclipse for subsequent open operations.</para>
-
- </section>
-
- <section id="ugr.tools.cde.pages_within_the_editor">
- <title>Pages within the Editor</title>
-
- <para>The Component Descriptor Editor follows a standard Eclipse paradigm for these
- kinds of editors. There are several pages in the editor; each one can be selected, one at a
- time, by clicking on the bottom tabs. The last page contains the actual XML source file
- being edited, and is displayed as plain text.</para>
-
- <para>The same set of tabs appears at the bottom of each page in the Component Descriptor
- Editor. The Component Descriptor Editor uses this <quote>multi-page editor</quote>
- paradigm to give the user a view of conceptually distinct portions of the Descriptor
- metadata in separate pages. At any point in time the user may click on the Source tab to
- view the actual XML source. The Component Descriptor Editor is, in a way, just a fancy GUI
- for editing the XML. The tabs provide quick access to the following pages: Overview,
- Aggregate, Parameters, Parameter Settings, Type System, Capabilities, Indexes,
- Resources, and Source. We discuss each of these pages in turn.</para>
-
- <section id="ugr.tools.cde.adjusting_display_of_pages">
- <title>Adjusting the display of pages</title>
-
- <para>Most pages in the editor have a <quote>sash</quote> bar. This is a light gray bar
- which separates sub-sections of the page. This bar can be dragged with the mouse to
- adjust how the display area is split between the two sash panes. You can also change the
- orientation of the Sash so it splits vertically, instead of horizontally, by
- clicking on the small icons at the top right of the page that look like this:
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width=".7in" format="JPG" fileref="&imgroot;image008.jpg"/>
- </imageobject>
- <textobject><phrase>Changing orientation of two window split</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>All of the sections on a page have subtitles, with an indicator to the left which
- you can click to collapse or expand that particular section. Collapsing sections can
- sometimes be useful to free up screen area for other sections.</para>
-
- </section>
- </section>
-
- <section id="ugr.tools.cde.overview_page">
- <title>Overview Page</title>
-
- <para>Normally, the first page displayed in the Component Descriptor Editor is the
- Overview page (the name of the page is shown in the GUI panel at the top left). If there is an
- error reading and parsing the source, the Source page is shown instead, giving you the
- opportunity to correct the problem. For many components, the Overview page contains
- three sections: Implementation Details, Runtime Information and overall
- Identification Information.</para>
-
- <section id="ugr.tools.cde.overview_page.implementation_details">
- <title>Implementation Details</title>
-
- <para>In the Implementation Details section you specify the Implementation Language
- and Engine Type. There are two kinds of Engines: Aggregate, and non-Aggregate (also
- called Primitive). An Aggregate engine is one which is composed of additional
- component engines and contains no code itself. Several of the pages in the Component
- Descriptor Editor have different formats, depending on the engine type.</para>
-
- </section>
- <section id="ugr.tools.cde.overview_page.runtime_info">
- <title>Runtime Information</title>
-
- <para>Runtime information is only applicable for primitive engines and is disabled
- for aggregates and other kinds of descriptors. This is where you specify the class name of the annotator
- implementation, if you are doing a Java implementation, or the C++ shared object or dll name,
- if you are doing a C++ implementation. Most Analysis Engines will specify that
- they update the CAS, and that they may be replicated (for performance reasons) when deployed. If
- a particular Analysis Engine must see every CAS (for instance, if it is counting the
- number of CASes), then uncheck the <quote>multiple deployment allowed</quote>
- box. If the Analysis Engine doesn't update the CAS, uncheck the <quote>updates
- the CAS</quote> box. (Most CAS Consumers do not update the CAS, and this parameter
- defaults to unchecked for new CAS Consumer descriptors).</para>
-
- <para>Analysis engines that are written using the CAS Multiplier APIs
- (see <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm"/>)
- can create additional CASes for analysis. To specify that they
- do this, check the <quote>returns new artifacts</quote> box.</para>
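- <para>As a rough illustration, the settings described in this section might appear on
- the Source page of a primitive Java descriptor roughly as follows. This is a sketch
- only: the class name and the values shown are assumptions chosen for illustration, and
- unrelated elements are omitted.</para>

- <programlisting><![CDATA[<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
-   <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
-   <primitive>true</primitive>
-   <!-- Illustrative class name; use your own annotator class here -->
-   <annotatorImplementationName>org.apache.uima.tutorial.ex1.RoomNumberAnnotator</annotatorImplementationName>
-   <analysisEngineMetaData>
-     <name>Room Number Annotator</name>
-     <operationalProperties>
-       <!-- "updates the CAS" check box -->
-       <modifiesCas>true</modifiesCas>
-       <!-- "multiple deployment allowed" check box -->
-       <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
-       <!-- "returns new artifacts" check box (CAS Multipliers) -->
-       <outputsNewCASes>false</outputsNewCASes>
-     </operationalProperties>
-   </analysisEngineMetaData>
- </analysisEngineDescription>]]></programlisting>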
-
- </section>
-
- <section id="ugr.tools.cde.overview_page.overall_id_info">
- <title>Overall Identification Information</title>
-
- <para>The Name should be a human-readable name that describes this component. The
- Version, Vendor, and Description fields are optional, and are arbitrary
- strings.</para>
-
- </section>
- </section>
-
- <section id="ugr.tools.cde.aggregate_page">
- <title>Aggregate Page</title>
-
- <para>For primitive Analysis Engines, Flow Controllers or Collection Processing
- components, the Aggregate page is not used. For aggregate engines, the page looks like
- this:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.7in" format="JPG" fileref="&imgroot;image010.jpg"/>
- </imageobject>
- <textobject><phrase>CDE Aggregate page</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>On the left we see a list of component engines, and on the right information about the
- flow. If you hover the mouse over an item in the list of component engines, that
- engine's description metadata will be shown. If you right-click on one of these
- items, you get an option to open that delegate descriptor in another editor instance.
- Any changes you make, however, won't be seen until you close and reopen the editor
- on the importing file.</para>
-
- <para>Engines can be added to the list on the left by clicking the Add button at the bottom of
- the Component Engine section. This brings up one of the following two dialogs:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="3.875in" format="JPG" fileref="&imgroot;import-by-location.jpg"/>
- </imageobject>
- <textobject><phrase>Adding an Analysis Engine to an Aggregate, by location</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>This dialog lets you select
- a descriptor from your workspace, or browse the file system to select a descriptor.
- </para>
-
- <para>Or, if you have selected to import by name, this dialog is shown:
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.296875in" format="JPG" fileref="&imgroot;import-by-name.jpg"/>
- </imageobject>
- <textobject><phrase>Adding an Analysis Engine to an Aggregate, by name</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>You can specify that the import should be by Name (the name is looked up using both the
- Project's class path, and DataPath), or by location. If it is by name,
- the dialog shows the available xml files on the class path, to pick from. If the
- one you want isn't showing, this means it isn't on the enclosing Eclipse Java Project's
- classpath, nor on the datapath, and one of those needs to be updated to include the
- path to the resource. If the name picked is
- <literal>com/company/prod/xyz.xml</literal>, the name in
- the descriptor will be <quote><literal>com.company.prod.xyz</literal></quote>.
- The "Browse the file system..." button is disabled when import by name is checked, because
- the file system is not the source of the imports; rather, the resources on the
- classpath or datapath are.</para>
-
- <para>
- If it is by location, the file reference is converted to a relative reference if
- possible, in the descriptor.</para>
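- <para>To make the difference concrete, the two kinds of import might look like this in
- the aggregate's XML. This is a minimal sketch: the key names and the relative path are
- hypothetical, and only the <literal>com.company.prod.xyz</literal> name is taken from
- the example above.</para>

- <programlisting><![CDATA[<delegateAnalysisEngineSpecifiers>
-   <!-- Import by name: resolved through the classpath or datapath -->
-   <delegateAnalysisEngine key="Xyz">
-     <import name="com.company.prod.xyz"/>
-   </delegateAnalysisEngine>
-   <!-- Import by location: a (preferably relative) file reference; path is hypothetical -->
-   <delegateAnalysisEngine key="Tokenizer">
-     <import location="../tokenizer/Tokenizer.xml"/>
-   </delegateAnalysisEngine>
- </delegateAnalysisEngineSpecifiers>]]></programlisting>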
-
- <para>The final selection at the bottom tells whether or not the selected engine(s)
- should automatically be added to the end of the flow section (the right section on the
- Aggregate page). The OK button does not become activated until a descriptor
- file is selected.</para>
-
- <para>To remove an analysis engine from the component engine list simply select an engine
- and click the Remove button, or press the delete key. If the engine is already in the flow
- list you will be warned that deletion will also delete the specified engine from this
- list.</para>
-
- <section id="ugr.tools.cde.aggregate_page.adding_components_more_than_once">
- <title>Adding components more than once</title>
-
- <para>Components may be added to the left panel more than once. Each of these components
- will be given a key which is unique. A typical reason this might be done is to use a
- component in a flow several times, but have each use be associated with different
- configuration parameters (different configuration parameters can be associated
- with each instance).</para>
- </section>
-
- <section
- id="ugr.tools.cde.aggregate_page.adding_removing_components_from_flow">
- <title>Adding or Removing components in a flow</title>
-
- <para>The button in-between the Component Engines and the Flow List, labeled
- <literal>&gt;&gt;</literal>, adds a chosen engine to the flow list and the button
- labeled <literal>&lt;&lt;</literal> removes an engine from the flow list. To add an
- engine to the flow list you must first select an engine from the left hand list, and then
- press the <literal>&gt;&gt;</literal> button. Engines may appear any number of
- times in the flow list. To remove an engine from the flow list, select an engine from the
- right hand list and press the <literal>&lt;&lt;</literal> button.</para>
- </section>
-
- <section id="ugr.tools.cde.aggregate_page.adding_remote_aes">
- <title>Adding remote Analysis Engines</title>
-
- <para>There are two ways to add remote engines: add an existing descriptor, which
- specifies a remote engine (just as if you were adding a non-remote engine) or use the
- Add Remote button which will create a remote descriptor, save it, and then import it,
- all in one operation. The Add Remote button enables you to easily specify the
- information needed to create a Service Client descriptor for a remote AE - one that
- runs on a different computer connected over the network. The Service Client
- descriptor is described in <olink targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.xml.component_descriptor.service_client"/>. The Add
- Remote button creates this descriptor, saves it as a file in the workspace, and
- imports it into the aggregate.</para>
-
- <para>Of course, if you already have a Service Client descriptor, you can add it to the
- set of delegates, just like adding other kinds of analysis engines.</para>
-
- <para>After clicking on Add Remote, the following dialog is displayed:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.7in" format="JPG" fileref="&imgroot;image014.jpg"/>
- </imageobject>
- <textobject><phrase>Adding a remote client to an aggregate</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>To define a remote service you specify the Service Kind, Protocol Service Type,
- URI and Key. You can also specify a Timeout in milliseconds, used by the SOAP service,
- and a VNS Host and Port used by the Vinci Service. Just like when one adds an engine from
- the file system, you have the option of adding the engine to the end of the flow. The
- Component Descriptor Editor currently only supports Vinci and SOAP services using
- this dialog.</para>
-
- <para>Remote engines are added to the descriptor using the
- &lt;import ... &gt; syntax. The information you specify here is saved in the Eclipse
- project as a file, using a generated name, &lt;key-name&gt;.xml, where
- &lt;key-name&gt; is the name you listed as the Key. Because of this, the key-name must
- be a valid file name. If you want a different name, you can change the path information
- in the dialog box.</para>
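- <para>For orientation, a Service Client descriptor for a Vinci service might look
- roughly like the following. This is a sketch under assumptions: the service name, host
- and port are hypothetical placeholders, and the file the Add Remote button actually
- generates may differ in detail.</para>

- <programlisting><![CDATA[<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
-   <resourceType>AnalysisEngine</resourceType>
-   <!-- Hypothetical service name registered with the Vinci Name Server -->
-   <uri>org.example.MyRemoteAnnotator</uri>
-   <protocol>Vinci</protocol>
-   <parameters>
-     <!-- Hypothetical VNS location; may instead be set in the workspace preferences -->
-     <parameter name="VNS_HOST" value="localhost"/>
-     <parameter name="VNS_PORT" value="9000"/>
-   </parameters>
- </uriSpecifier>]]></programlisting>

- <para>A SOAP service would instead use <literal>SOAP</literal> as the protocol, a
- service URL as the URI, and optionally a <literal>timeout</literal> element giving the
- timeout in milliseconds.</para>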
- </section>
-
- <section id="ugr.tools.cde.aggregate_page.connecting_to_remote_services">
- <title>Connecting to Remote Services</title>
-
- <para>If you are using the Vinci protocol, it requires that you specify the location of
- the Vinci Name Server (an IP address and a Port number). You can specify these in the
- service descriptor, or globally, for your Eclipse workspace, using the Eclipse menu
- item: Window → Preferences... → UIMA Preferences. If the remote service
- is available (up and running), additional operations become possible. For
- instance, hovering the mouse over the remote descriptor will show the description
- metadata from the remote service.</para>
- </section>
-
- <section id="ugr.tools.cde.aggregate_page.finding_aes_by_searching">
- <title>Finding Analysis Engines by searching</title>
-
- <para>The next button that appears between the component engine list and the flow list
- is the Find AE button. When this button is pressed the following dialog is displayed,
- which allows one to search for AEs by name, by input or output types, or by a combination
- of these criteria. This function searches the existing Eclipse workspace for
- matching *.xml descriptor source files; it does not look inside Jar files.
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.3in" format="JPG" fileref="&imgroot;image016.jpg"/>
- </imageobject>
- <textobject><phrase>Searching for an AE to add to an aggregate</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>The search automatically adds a <quote>match any characters</quote> - style
- (*) wildcard at the beginning and end of anything entered. Thus, if person is
- specified for an output type, a <quote>*person*</quote> search is performed. Such a
- search would match such things as <quote>my.namespace.person</quote> and
- <quote>person.governmentOfficial.</quote> One can search in all projects or one
- particular project. The search does an implicit <emphasis>and</emphasis> on all
- fields which are left non-blank.</para>
- </section>
-
- <section id="ugr.tools.cde.aggregate_page.component_engine_flow">
- <title>Component Engine Flow</title>
-
- <para>The UIMA SDK currently supports three kinds of sequencing flows: Fixed,
- CapabilityLanguageFlow (see <olink targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints.capability_language_flow"/>
- ), and user-defined. The first two require specification of a linear flow sequence;
- this linear flow sequence can also be read by a user-defined flow controller (what use
- is made of it is up to the user-defined flow controller). The Component Engine Flow
- section allows specification of these items.</para>
-
- <para>The pull-down labeled Flow Kind picks between the three flow models. When the
- user-defined flow is selected, the Browse and Search buttons become enabled to let
- you pick the flow controller XML descriptor to import.
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="3.8in" format="JPG" fileref="&imgroot;image018.jpg"/>
- </imageobject>
- <textobject><phrase>Specifying flow control</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>The key name value is set automatically from the XML descriptor being imported,
- and enables parameters to be overridden for that descriptor (see following
- sections).</para>
-
- <para>The Up and Down buttons to the right in the Flow section are activated when an
- engine in the flow is selected. The Up button moves the selected engine up one place in
- the execution order, and the Down button moves the selected engine down one place in the
- execution order. Remember that engines can appear multiple times in the flow (or not
- at all).</para>
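- <para>As a sketch of what the Flow section edits, a fixed flow in which one delegate
- runs twice might be stored as follows. The key names are hypothetical; each
- <literal>node</literal> refers to the key of a delegate listed in the Component Engines
- section.</para>

- <programlisting><![CDATA[<flowConstraints>
-   <fixedFlow>
-     <!-- Nodes run in the order listed; Up and Down reorder these entries -->
-     <node>Tokenizer</node>
-     <node>NameRecognizer</node>
-     <!-- The same delegate key may appear more than once -->
-     <node>Tokenizer</node>
-   </fixedFlow>
- </flowConstraints>]]></programlisting>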
-
- </section>
- </section>
-
- <section id="ugr.tools.cde.parm_definition">
- <title>Parameters Definition Page</title>
-
- <para>There are two pages for parameters: the first one is where parameters are defined,
- and the second one is where the parameter settings are configured. The first page is the
- Parameter Definition page and has two alternatives, depending on whether the
- descriptor is an Aggregate. We start with a description of parameter definitions
- for Primitive engines, CAS Consumers, Collection Readers, CAS Initializers, and Flow
- Controllers. Here is an example:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.2in" format="JPG" fileref="&imgroot;image020.jpg"/>
- </imageobject>
- <textobject><phrase>Parameter Definitions - not Aggregate</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>The first checkbox at the top simplifies things if you are not using Parameter
- Groups (see the following section for a discussion of groups). In this case, leave the
- check box unchecked. The main area shows a list of parameter definitions. Each
- parameter has a name, which must be unique for this Analysis Engine. The other three
- attributes specify whether the parameter can have a single or multiple values (an array
- of values), whether it is Optional or Mandatory, and which value type it can hold
- (String, Integer, Float, or Boolean).</para>
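- <para>To illustrate, a single parameter definition with these attributes might appear
- in the XML roughly as follows. The parameter name and description are made up for this
- sketch.</para>

- <programlisting><![CDATA[<configurationParameters>
-   <configurationParameter>
-     <!-- Hypothetical parameter; the name must be unique within this descriptor -->
-     <name>Patterns</name>
-     <description>Regular expressions to match (illustrative)</description>
-     <!-- One of String, Integer, Float, Boolean -->
-     <type>String</type>
-     <!-- true = an array of values, false = a single value -->
-     <multiValued>true</multiValued>
-     <!-- true = Mandatory, false = Optional -->
-     <mandatory>true</mandatory>
-   </configurationParameter>
- </configurationParameters>]]></programlisting>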
-
- <para>In addition to using the buttons on the right to edit this information, you can
- double-click a parameter to edit it, or remove (delete) a selected parameter by
- pressing the delete key. Use the Add button to add a new parameter to the list.</para>
-
- <para>Parameters have an additional description field, which you can specify when you
- add or edit a parameter. To see the value of the description, hover the mouse over the
- item, as shown in the picture below:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.5in" format="JPG" fileref="&imgroot;image022.jpg"/>
- </imageobject>
- <textobject><phrase>Parameter description shown in a hover message</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <section id="ugr.tools.cde.parm_definition.using_groups">
- <title>Using groups</title>
-
- <para>The group concept for parameters arose from the observation that sets of
- parameters were sometimes associated with different configuration needs. As an
- example, you might have an Analysis Engine which needed different configuration
- based on the language of a document.</para>
-
- <para>To use groups, you check the <quote>Use Parameter Groups</quote> box. When you
- do this, you get the ability to add groups, and to define parameters within these
- groups. You also get a capability to define <quote>Common</quote> parameters,
- which are parameters which are defined for all groups. Here is a screen shot showing
- some parameter groups in use:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.2in" format="JPG" fileref="&imgroot;image024.jpg"/>
- </imageobject>
- <textobject><phrase>Using parameter groups</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>You can see the <quote>&lt;Common&gt;</quote> parameters as well as two
- different sets of groups.</para>
-
- <para>The Default Group is an optional specification of what Group to use if the
- parameter is not available for the group requested.</para>
-
- <para>The Search strategy specifies what to do when a parameter is not available for the
- group requested. It can have the values of None, language_fallback, or
- default_fallback. These are more fully described in the section <olink
- targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.xml.component_descriptor.aes.configuration_parameter_declaration"/>
- .</para>
-
- <para>Groups are added using the Add Group button. Once added, they can be edited or
- removed, using the buttons to the right, or the standard gestures for editing
- (double-clicking the item) and removing (pressing the delete key after an item is
- selected). Removing a group removes all the parameter definitions in the group. If
- you try to remove the <quote>&lt;Common&gt;</quote> group, it just removes the
- parameters in the group.</para>
-
- <para>Each entry for a group in the table specifies one or more group names. For example,
- the highlighted entry above specifies two groups: <quote>myNewGroup2</quote>
- and <quote>mg3</quote>. The parameter definition underneath is considered to be in
- both groups.</para>
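- <para>A grouped declaration of this kind might be stored roughly as shown below. This
- is a sketch: the group names come from the example above, while the parameter names,
- the default group and the search strategy are assumptions chosen for illustration.</para>

- <programlisting><![CDATA[<configurationParameters defaultGroup="myNewGroup2"
-     searchStrategy="default_fallback">
-   <!-- Parameters defined for all groups -->
-   <commonParameters>
-     <configurationParameter>
-       <name>CommonParam</name>
-       <type>String</type>
-       <multiValued>false</multiValued>
-       <mandatory>false</mandatory>
-     </configurationParameter>
-   </commonParameters>
-   <!-- One entry naming two groups; its parameters belong to both -->
-   <configurationGroup names="myNewGroup2 mg3">
-     <configurationParameter>
-       <name>GroupParam</name>
-       <type>Integer</type>
-       <multiValued>false</multiValued>
-       <mandatory>false</mandatory>
-     </configurationParameter>
-   </configurationGroup>
- </configurationParameters>]]></programlisting>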
-
- </section>
-
- <section id="ugr.tools.cde.parm_definition.aggregates">
- <title>Parameter declarations for Aggregates</title>
-
- <para>Aggregates declare parameters, each of which must override a parameter setting
- for a component making up the aggregate. They do this using the version of this page
- which is shown when the descriptor is an Aggregate; here's an example:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.7in" format="JPG" fileref="&imgroot;image026.jpg"/>
- </imageobject>
- <textobject><phrase>Aggregate parameters</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>There is an additional panel shown (on the right) which lists all of the
- components by their key names, and shows for each of them their defined parameters. To
- add a new override for one or more of these parameters to the aggregate, select the
- component parameter you wish to override and push the Create Override button (or, you
- can just double-click the component parameter). This will automatically add a
- parameter of the same name (by default – you can change the name if you like) to
- the aggregate, putting it into the same group(s) (if groups are being used in the
- component – this is required), and setting the properties of the parameter to
- match those of the component (this is required).</para>
- <note><para>If the name of the parameter being added already is in use in the aggregate,
- and the parameters are not compatible, a new parameter name is generated by suffixing
- the name with a number. If the parameters are compatible, the selected component
- parameter is added to the existing aggregate parameter, as an additional override. If
- you don't want this behavior, but want to have a new name generated in this case,
- push the Create non-shared Override button instead, or hold down the
- <quote>shift</quote> key when double clicking the component parameter.</para>
-
- <para>The required / optional setting in the aggregate parameter is set to match that of
- the parameter being overridden. You may want to make an optional delegate parameter
- required. You can do this by changing that value manually in the source editor view.
- </para></note>
-
- <para>In the above example, the user has just double-clicked the
- <quote>TypeNames</quote> parameter in the <quote>NameRecognizer</quote>
- component. This added that parameter to this aggregate under the <quote>&lt;Not in
- any group&gt;</quote> section – since it wasn't part of a group.</para>
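- <para>In the XML, an override of this kind might look roughly like the following. The
- component key and parameter name are taken from the example above; the type,
- multi-valued and mandatory values are assumptions, since they must simply match
- whatever the component declares.</para>

- <programlisting><![CDATA[<configurationParameter>
-   <name>TypeNames</name>
-   <!-- Assumed properties; these must match the delegate's declaration -->
-   <type>String</type>
-   <multiValued>true</multiValued>
-   <mandatory>false</mandatory>
-   <overrides>
-     <!-- delegate key / parameter name in that delegate -->
-     <parameter>NameRecognizer/TypeNames</parameter>
-   </overrides>
- </configurationParameter>]]></programlisting>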
-
- <para>Once you have added a parameter definition to the aggregate, you can use the
- buttons on the right side of the left panel to add additional overrides or remove
- parameters or their overrides. <phrase
- id="ugr.tools.cde.parm_definition.removing_groups"> You can also remove
- groups; removing a group is like removing all the parameter definitions in the
- group.</phrase></para>
-
- <para>In addition to adding one parameter at a time from a component, you can also add all
- the parameters for a group within a component, or all the parameters in the component,
- by selecting those items.</para>
-
- <para>If you double-click (or push Create Override) the
- <quote>&lt;Common&gt;</quote> group or a parameter in the &lt;Common&gt; group in
- a component, a special group is created in the Aggregate consisting of all of the
- groups in that component, and the overriding parameter (or parameters) are added to
- that. This is done because each component can have different groups belonging to the
- Common group notion; the Common group for a component is just shorthand for all the
- groups in that component.</para>
-
- <para>The Aggregate's specification of the default group and search strategy
- overrides any specifications contained in the components.</para>
-
- </section>
- </section>
-
- <section id="ugr.tools.cde.parameter_settings">
- <title>Parameter Settings Page</title>
-
- <para>The Parameter Settings page is rather straightforward; it is where the user
- defines parameter settings for their engines. An example of such a page is given below:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.7in" format="JPG" fileref="&imgroot;image028.jpg"/>
- </imageobject>
- <textobject><phrase>Parameter settings page</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>For single-valued parameters, the user simply types the default value into the
- Value box on the right hand side. For multi-valued parameters the user should use the
- Add, Edit and Remove buttons to manage the list of multiple parameter values.</para>
-
- <para>Values within groups are shown with each group separately displayed, to allow
- configuring different values for each group.</para>
-
- <para>Values are checked for validity. For Boolean values in a list, use the words
- <literal>true</literal> or <literal>false</literal>.</para>
- <note><para>If you specify a value in a single-valued parameter, and then delete all the
- characters in the value, the CDE will treat this as if you wanted to not specify any setting
- for this parameter. In order to specify a 0 length string setting for a String-valued
- parameter, you will have to manually edit the XML using the <quote>Source</quote> tab.
- </para>
- <para> For array valued parameters, if you remove all of the entries for a particular array
- parameter setting, the XML will reflect a 0-length array. To change this to an
- unspecified parameter setting, you will have to manually edit the XML using the
- <quote>Source</quote> tab. </para></note>
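-
- <para>As a rough illustration of the two cases in the note above, the hand-edited XML on the
- Source page might look like the following sketch. The parameter names
- <literal>Prefix</literal> and <literal>Patterns</literal> are hypothetical; only the element
- structure is meant to be indicative.</para>
-
- <programlisting><![CDATA[<configurationParameterSettings>
-   <!-- Hypothetical String parameter explicitly set to a 0-length string -->
-   <nameValuePair>
-     <name>Prefix</name>
-     <value>
-       <string></string>
-     </value>
-   </nameValuePair>
-   <!-- Hypothetical array parameter set to a 0-length array; delete the
-        whole nameValuePair to leave the parameter unspecified instead -->
-   <nameValuePair>
-     <name>Patterns</name>
-     <value>
-       <array/>
-     </value>
-   </nameValuePair>
- </configurationParameterSettings>]]></programlisting>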
-
- </section>
-
- <section id="ugr.tools.cde.type_system">
- <title>Type System Page</title>
-
- <para>This page declares the type system used by the annotator. For aggregates it is
- derived by merging the type systems of all constituent AEs. The types used by the AE
- constitute the language in which the inputs and outputs are described in the
- Capabilities page and also affect the choice of indexes on the Indexes page. The Type
- System page looks like the following:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.7in" format="JPG" fileref="&imgroot;image030.jpg"/>
- </imageobject>
- <textobject><phrase>Type System declaration page</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>Before discussing this page in detail, it is important to note that there are two
- settings that affect the operation of this page. These are accessed by selecting
- UIMA → Settings (or by going to the Eclipse Window → Preferences → UIMA
- Preferences) and checking or unchecking one of the following: <quote>Auto generate
- .java files when defining types</quote> and <quote>Display fully qualified type
- names.</quote></para>
-
- <para id="ugr.tools.cde.auto_jcasgen">When the Auto generate option is checked and the development language for the AE is
- Java, any time a change is made to a type and the change is saved, the corresponding .java
- files are generated using the JCasGen tool. The results are stored in the primary source
- directory defined for the project. The primary source directory is that listed first
- when you right click on your project and select Properties → Java Build Path, click
- on the Source tab and look in the list box under the text that reads: <quote>Source folder
- on build path.</quote> If no source folders are defined, you will get a warning that you
- have no source folders defined and JCasGen will not be run. (For information on JCasGen
- see <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.jcasgen"/>).
- When JCasGen is run, you can monitor the progress of the generation by observing the
- status on the Eclipse status line (normally at the bottom of the Eclipse window).
- JCasGen runs on the fully-merged type system, consisting of the type specification
- plus any imported type system, plus (for aggregates) the merged type systems of all the
- components in an aggregate.</para>
-
- <warning><para>If the components of the aggregate have different definitions for the same
- type name, the CDE will show a warning. It is possible to continue past this warning,
- in which case the CDE will produce the correct
- Java source files representing the merged types (that is, the
- type definition that contains all of the features defined on that type by all of your
- components). However, it is not recommended to use this feature
- (of having different definitions for the same type name) since it can make it
- difficult to combine/package your annotator with others. See <olink
- targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.jcas.merging_types_from_other_specs"/> for more information.
- </para></warning>
-
- <note><para>In addition to running automatically, you can manually run JCasGen on the
- fully merged type system by clicking the JCasGen button, or by selecting Run JCasGen from
- the UIMA pulldown menu: </para></note>
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.2in" format="JPG" fileref="&imgroot;image032.jpg"/>
- </imageobject>
- <textobject><phrase>Setting JCasGen options</phrase>
- </textobject>
- </mediaobject>
- </screenshot>
-
- <para>When <quote>Display fully qualified type names</quote> is left unchecked, the
- namespace of types is not displayed, i.e. if a fully qualified type name is
- my.namespace.person, only the abbreviated type name person will be displayed. In the
- Type page diagram shown above, <quote>Display fully qualified type names</quote> is
- in fact unchecked.</para>
-
- <para>To add, edit, or remove types, use the buttons in the top left section. When
- adding or editing types, fully qualified type names should be used,
- regardless of whether <quote>Display fully qualified type names</quote> is
- checked. Removing or editing a type has a cascading effect: the
- removal or edit also affects any inputs, outputs, indexes and type priorities that refer
- to that type.</para>
-
- <para>When a type is added, this dialog is shown:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="4.2in" format="JPG" fileref="&imgroot;image034.jpg"/>
- </imageobject>
- <textobject><phrase>Adding a type</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>Type names should be specified using a namespace. The namespace is like a Java
- package name, and serves to ensure type names are unique. It also serves as the package
- name for the generated JCas classes. The namespace name is the set of names up to the last
- period in the string.</para>
-
- <para>The supertype must be picked from an existing type. The entry field for the
- supertype supports Eclipse-style content assist. To use it, put the cursor in the
- supertype field, and type a letter or two of the supertype name (lower case is fine),
- either starting with the namespace, or just with the type name (without the namespace),
- then hold down the Control key and press the spacebar. When you do this, you can see a
- list of suitable matching types. You can then type more letters to narrow down your
- choices, or pick the right entry with the mouse.</para>
-
- <para>To see the available types and pick one, press the Browse button. This will show the
- available types, and as you type letters for the type name (in lower case –
- capitalization is ignored), the available types that match are narrowed. When
- you've typed enough to specify the type you want, press Enter. Or you can use the
- list of matching type names and pick the one you want with the mouse.</para>
-
- <para>Once you've added the type, you can add features to it by highlighting the
- type, and pressing the Add button.</para>
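-
- <para>In the descriptor source, a type with one feature might look roughly like the
- following sketch. The type name <literal>my.namespace.Person</literal>, the choice of
- supertype, and the feature <literal>age</literal> are illustrative assumptions.</para>
-
- <programlisting><![CDATA[<typeSystemDescription>
-   <types>
-     <typeDescription>
-       <!-- The namespace is everything up to the last period: my.namespace -->
-       <name>my.namespace.Person</name>
-       <description>Hypothetical example type</description>
-       <supertypeName>uima.tcas.Annotation</supertypeName>
-       <features>
-         <featureDescription>
-           <name>age</name>
-           <description>Hypothetical example feature</description>
-           <rangeTypeName>uima.cas.Integer</rangeTypeName>
-         </featureDescription>
-       </features>
-     </typeDescription>
-   </types>
- </typeSystemDescription>]]></programlisting>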
-
- <para>If the type being defined is a subtype of uima.cas.String, the Add button allows you
- to add allowed values for the string, instead of adding features.</para>
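-
- <para>A String subtype with allowed values might then appear in the source along these
- lines. This is only a sketch; the type name <literal>my.namespace.Color</literal> and its
- values are made up for illustration.</para>
-
- <programlisting><![CDATA[<typeDescription>
-   <name>my.namespace.Color</name>
-   <description>Hypothetical enumeration-like String subtype</description>
-   <supertypeName>uima.cas.String</supertypeName>
-   <!-- Allowed values take the place of features for String subtypes -->
-   <allowedValues>
-     <value>
-       <string>red</string>
-     </value>
-     <value>
-       <string>green</string>
-     </value>
-   </allowedValues>
- </typeDescription>]]></programlisting>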
-
- <para>To edit a type or feature, you can double click the entry, or highlight the entry and
- press the Edit button. To delete a type or feature, you highlight the entry to be deleted,
- and click the delete button or push the delete key.</para>
-
- <para>If the range of a feature is an array or one of the built-in list types, an additional
- specification allows you to specify if multiple references to the object referenced by
- this feature are allowed. If they are not allowed then the XMI serialization of
- instances of this type use a more efficient format.</para>
-
- <para>If the range of a feature is an array of Feature Structures, then it is possible to
- specify an element type for the array. This information is used in the XMI serialization
- and also by the JCas generation routines to generate more efficient code.
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="4.2in" format="JPG" fileref="&imgroot;image036.jpg"/>
- </imageobject>
- <textobject><phrase>Specifying a Feature Structure</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
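-
- <para>As a sketch of how these two array-related settings might show up in the source,
- consider the feature below. The feature name <literal>children</literal> and the element
- type <literal>my.namespace.Person</literal> are hypothetical.</para>
-
- <programlisting><![CDATA[<featureDescription>
-   <name>children</name>
-   <description>Hypothetical array-valued feature</description>
-   <rangeTypeName>uima.cas.FSArray</rangeTypeName>
-   <!-- Element type of the array of Feature Structures -->
-   <elementType>my.namespace.Person</elementType>
-   <!-- false: the array is not shared, allowing the more compact XMI form -->
-   <multipleReferencesAllowed>false</multipleReferencesAllowed>
- </featureDescription>]]></programlisting>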
-
- <para>It is also possible to import type systems for inclusion in your descriptor. To do
- this, use the Type Import panel's <literal>Add...</literal> button. This
- allows you to import a type system descriptor.</para>
-
- <para>When importing by name, the name is resolved using the class path for the Eclipse
- project containing the descriptor file being edited, or by looking up this name in the
- UIMA DataPath. The DataPath can be set by pushing the Set DataPath button. It will be
- remembered for this Eclipse project, as a project Property, so you only have to set it
- once (per project). The value of the DataPath setting is written just like a class path,
- and can include directories or JAR files, just as is true for class paths.</para>
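-
- <para>For orientation, the two styles of import might look like this in the descriptor
- source. The name <literal>org.example.types.CommonTypes</literal> and the relative path are
- assumptions chosen for the example.</para>
-
- <programlisting><![CDATA[<typeSystemDescription>
-   
- </typeSystemDescription>]]></programlisting>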
-
- <para>The following dialog allows you to pick one or more files from the Eclipse
- workspace, or one file (at a time) from the file system:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="3.5in" format="JPG" fileref="&imgroot;import-chooser.jpg"/>
- </imageobject>
- <textobject><phrase>Picking files for importing</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>This is essentially the same dialog as was used to add component engines to an
- aggregate. To import from a type system descriptor that is not part of your Eclipse
- workspace, click the <quote>Browse the file system...</quote> button.</para>
-
- <para>Imported types are validated, and if OK, they are added to the list in the Imported
- Type Systems section of the Type System page. Any types they define are merged with the
- existing type system.</para>
-
- <para>Imported types and features which are only defined in imports are shown in the Type
- System section, but in a grayed-out font; these types cannot be edited here. To change
- them, open up the imported type system descriptor, and change them there.</para>
-
- <para>If you hover the mouse over an import specification, it will show more information
- about the import. If you right-click, it will bring up a context menu that allows opening
- the imported file in the Editor, if the imported file is part of the Eclipse workspace.
- Changes you make, however, won't be seen until you close and reopen the editor on
- the importing file.</para>
-
- <para>It is not possible to define types for an aggregate analysis engine. In this case the
- type system is computed from the component AEs. The Type System information is shown in a
- grayed-out font.</para>
-
- <section id="ugr.tools.cde.type_system.exporting">
- <title>Exporting</title>
-
- <para>In addition to importing type specifications, you can export as well. When you
- push the Export... button, the editor will create a new importable XML descriptor for
- the types in this type system, and change the existing descriptor to import that newly
- created one.
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="3.75in" format="JPG" fileref="&imgroot;image040.jpg"/>
- </imageobject>
- <textobject><phrase>Exporting a type system</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>The base file name you type is inserted into the path in the line below
- automatically. You can change the path where the generated part descriptor is stored
- by overtyping the lower text box. When you click OK, the new part descriptor will be
- generated, and the current descriptor will be changed to import that part.</para>
-
- </section>
- </section>
-
- <section id="ugr.tools.cde.capabilities">
- <title>Capabilities Page</title>
-
- <para>Capabilities come in <quote>sets</quote>. You can have multiple sets of
- capabilities; each one specifies languages supported, plus inputs and outputs of the
- Analysis Engine. The idea behind having multiple sets is the concept that different
- inputs can result in different outputs. Many Analysis Engines, though, will probably
- define just one set of capabilities. A sample Capabilities page is given below:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.2in" format="JPG" fileref="&imgroot;image042.jpg"/>
- </imageobject>
- <textobject><phrase>Capabilities page</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>When defining the capabilities of a primitive analysis engine, input and output
- types can be any type defined in the type system. When defining the capabilities of an
- aggregate the inputs must be a subset of the union of the inputs in the constituent
- analysis engines and the outputs must be a subset of the union of the outputs of the
- constituent analysis engines.</para>
-
- <para>To add a type, first select something in the set you wish to add the type to, and press
- Add Type. The following dialog appears presenting the user with a list of types which are
- candidates for additional inputs:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="4.4in" format="JPG" fileref="&imgroot;image044.jpg"/>
- </imageobject>
- <textobject><phrase>Adding a type to the capabilities page</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>Follow the instructions to mark the types as input and / or output (a type can be
- both). By default, the &lt;all features&gt; flag is set to true. If you want to specify a
- subset of features of a type, read on.</para>
-
- <para>When types have features, you can specify what features are input and / or output. A
- type doesn't have to be an output to have an output feature. For example, an
- Analysis Engine might be passed as input a type Token, and it adds (outputs) a feature to
- the existing Token types. If no new Token instances were created, it would not be an
- output Type, but it would have features which are output.</para>
-
- <para>To specify features as input and / or output (they can be both), select a type, and
- press Add. The following dialog box appears:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="4in" format="JPG" fileref="&imgroot;image046.jpg"/>
- </imageobject>
- <textobject><phrase>Specifying features as input or output</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>To mark a feature as being input and / or output, click the mouse in the input and / or
- output column for the feature. If you select &lt;all features&gt;, it unmarks any
- individual feature you selected, since &lt;all features&gt; subsumes all the
- features.</para>
-
- <para>The Languages part of the capability is where you specify what languages are
- supported by the Analysis Engine. Supported languages should be listed using either a
- two-letter ISO-639 language code, or an ISO-639 language code followed by a hyphen and then a two-letter
- ISO-3166 country code. Add a language by selecting Languages and pressing the Add
- button. The dialog for adding languages is given below.
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="4in" format="JPG" fileref="&imgroot;image048.jpg"/>
- </imageobject>
- <textobject><phrase>Specifying a language</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>The Sofa part of the capability is optional; it allows defining Sofa names that this
- component uses, and whether they are input (meaning they are created outside of this
- component, and passed into it), or output (meaning that they are created by this
- component). Note that a Sofa can be either input or output, but can't be
- both.</para>
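-
- <para>Taken together, a single capability set might be written in the source roughly as
- follows. This is a sketch under assumptions: the type <literal>my.namespace.Token</literal>,
- its feature <literal>partOfSpeech</literal>, and the Sofa name
- <literal>MyInputSofa</literal> are examples only.</para>
-
- <programlisting><![CDATA[<capabilities>
-   <capability>
-     <inputs>
-       <!-- Token is an input, with all of its features -->
-       <type allAnnotatorFeatures="true">my.namespace.Token</type>
-     </inputs>
-     <outputs>
-       <!-- No new Tokens are created; only a feature is output -->
-       <feature>my.namespace.Token:partOfSpeech</feature>
-     </outputs>
-     <inputSofas>
-       <sofaName>MyInputSofa</sofaName>
-     </inputSofas>
-     <languagesSupported>
-       <language>en</language>
-       <language>en-US</language>
-     </languagesSupported>
-   </capability>
- </capabilities>]]></programlisting>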
-
- <para>To add a Sofa name (which is synonymous with the view name), press the Add Sofa
- button, and this dialog appears:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="4.2in" format="JPG" fileref="&imgroot;image050.jpg"/>
- </imageobject>
- <textobject><phrase>Specifying a Sofa name</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <section id="ugr.tools.cde.capabilities.sofa_name_mapping">
- <title>Sofa (and view) name mappings</title>
-
- <para>Sofa names, once created, are used in Sofa Mappings. These are optional
- mappings, done in an aggregate, that specify which Sofas are the same ones but with
- different names. The Sofa Mappings section is minimized unless you are editing an
- Aggregate descriptor, and have one or more Sofa names defined for the aggregate. In
- that case, the Sofa Mappings section will look like this:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.4in" format="JPG" fileref="&imgroot;image052.jpg"/>
- </imageobject>
- <textobject><phrase>Sofa mappings</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>Here the aggregate has defined two input Sofas, named
- <quote>MyInputSofa</quote>, and <quote>AnotherSofa</quote>. Any named sofas in
- the aggregate's capabilities will appear in the Sofa Mapping section, listed
- either under Inputs or Outputs. Each name in the Mappings has 0 or more delegate
- (component) sofa names mapped to it. A delegate may have multiple Sofas, as in this
- example, where the GovernmentOfficialRecognizer delegate has Sofas named
- <quote>so1</quote> and <quote>so2</quote>.</para>
-
- <para>Delegate components may be written as Single-View components. In this case,
- they have one implicit, default Sofa (<quote>_InitialView</quote>), and to map to
- it you use the form shown for the <quote>NameRecognizer</quote> – you map to
- the delegate's key name in the aggregate, without specifying a Sofa name. You
- can also specify the sofa name explicitly, e.g.,
- NameRecognizer/_InitialView.</para>
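-
- <para>In the aggregate's source, mappings of this kind might look like the sketch below.
- The delegate and Sofa names are taken from the example above, but which delegate Sofa is
- mapped to which aggregate Sofa is an assumption made for illustration.</para>
-
- <programlisting><![CDATA[<sofaMappings>
-   <!-- Single-View delegate: no componentSofaName, so its default Sofa is mapped -->
-   <sofaMapping>
-     <componentKey>NameRecognizer</componentKey>
-     <aggregateSofaName>MyInputSofa</aggregateSofaName>
-   </sofaMapping>
-   <!-- Multi-View delegate: the delegate's Sofa is named explicitly -->
-   <sofaMapping>
-     <componentKey>GovernmentOfficialRecognizer</componentKey>
-     <componentSofaName>so1</componentSofaName>
-     <aggregateSofaName>AnotherSofa</aggregateSofaName>
-   </sofaMapping>
- </sofaMappings>]]></programlisting>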
-
- <para>To add a new mapping, select the Aggregate Sofa name you wish to add the mapping
- for, and press the Add button. This brings up a window like this, showing all available
- delegates and their Sofas; select one or more (use the normal multi-select methods)
- of these and press OK to add them.
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.7in" format="JPG" fileref="&imgroot;image054.jpg"/>
- </imageobject>
- <textobject><phrase>Adding a Sofa mapping</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>To edit an existing mapping, select the mapping and press Edit. This will show the
- existing mapping with all mapped items <quote>selected</quote>, and other
- available items unselected. Change the items selected to match what you want,
- deselecting some, and perhaps selecting others, and press OK.</para>
-
- </section>
- </section>
-
- <section id="ugr.tools.cde.indexes">
- <title>Indexes Page</title>
-
- <para>The Indexes page is where the user declares what indexes and type priority lists are
- used by the analysis engine. Indexes are used to determine which Feature
- Structures of a particular type are fetched, using an iterator in the UIMA API. An
- unpopulated Indexes page is displayed below:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.5in" format="JPG" fileref="&imgroot;image056.jpg"/>
- </imageobject>
- <textobject><phrase>Index page</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>Both indexes and type priority lists can have imports. These imports work just like
- the type system imports, described above. Both indexes and type priority lists can be
- exported to new component descriptors, using the Export... button, just like the type
- system export operation described above.</para>
-
- <para>The built-in Annotation Index is always present. It is based on the built-in type
- <literal>uima.tcas.Annotation</literal> and has keys begin (Ascending), end
- (Descending) and TYPE_PRIORITY. There are no built-in type priorities, so this last
- sort item does not play a role in the index unless type priorities are specified.</para>
-
- <para>Type priority may be combined with other keys. Type priorities are defined in the
- Priority Lists section, using one or more priority lists. A given priority list gives an
- ordering among a group of types. Types that appear higher in the priority list are given
- higher priority; in other words, they sort first when TYPE_PRIORITY is specified as the
- index key. Subtypes of these types are also ordered in a consistent manner, unless
- overridden by another specific type priority specification. To get the ordering used
- among all the types, all of the type priority lists are merged. This gives a partial
- ordering among the types. Ties are resolved in an unspecified fashion. The Component
- Descriptor Editor checks for incompatible orderings, and informs the user if they
- exist, so they can be corrected.</para>
-
- <para>To create a new index, use the Add Index button in the top left section. This brings up
- this dialog:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="4in" format="JPG" fileref="&imgroot;image058.jpg"/>
- </imageobject>
- <textobject><phrase>Adding a new index</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>Each index needs a globally unique index name. Every index indexes one CAS type (including
- its subtypes). If you're using Eclipse 3.2 or later, the entry field for this
- has content assist (start typing the type name
- and press Control – Spacebar to get help, or press the Browse button to pick a
- type).</para>
-
- <para>Indexes can be sorted, in which case you need to specify one or more keys to sort on.
- Sort keys are selected from features whose range type is Integer, Float, or String. Some
- elements will be disabled if they are not relevant. For instance, if the index kind is
- <quote>bag</quote>, you cannot provide sort keys. The order of sort keys can be
- adjusted using the up and down buttons, if necessary.</para>
-
-
- <note><para>There is usually no need to explicitly declare a Bag index in your descriptor.
- As of UIMA v2.1, if you do not declare any index for a type (or any of its
- supertypes), a Bag index will be automatically created. This index is
- accessed using the <literal>getAllIndexedFS(...)</literal> method defined on the index repository.</para></note>
-
-
- <para>A set index will contain no duplicates of the same type, where a duplicate is defined
- by the indexing comparator. That is, if you commit two feature structures of the same
- type that are equal with respect to the indexing comparator, only the first one will be
- entered into the index. Note that you can still have duplicates with respect to the
- indexing order, if they are of a different type. A set index is not guaranteed to be
- sorted. If no keys are specified for a set index, then all instances are considered by
- default to be equal, so only the first instance (for a particular type or subtype of the
- type being indexed) is indexed. On the other hand, <quote>bag</quote> indicates that
- all annotation instances are indexed, including duplicates.</para>
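-
- <para>A sorted index declared through this dialog might end up in the source looking
- something like the following sketch. The label <literal>TokenByBegin</literal> and the type
- <literal>my.namespace.Token</literal> are hypothetical.</para>
-
- <programlisting><![CDATA[<fsIndexCollection>
-   <fsIndexes>
-     <fsIndexDescription>
-       <label>TokenByBegin</label>
-       <typeName>my.namespace.Token</typeName>
-       <!-- kind is one of: sorted, set, bag -->
-       <kind>sorted</kind>
-       <keys>
-         <fsIndexKey>
-           <featureName>begin</featureName>
-           <comparator>standard</comparator>
-         </fsIndexKey>
-         <!-- Ties on begin are broken by type priority, if any is defined -->
-         <fsIndexKey>
-           <typePriority/>
-         </fsIndexKey>
-       </keys>
-     </fsIndexDescription>
-   </fsIndexes>
- </fsIndexCollection>]]></programlisting>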
-
- <para>The Priority Lists section of the Indexes page is used to specify Priority Lists of
- types. Priority Lists are unnamed ordered sets of type names. Add a new priority list by
- clicking the Add Set button. Add a type to an existing priority list by first selecting
- the set, and then clicking Add. You can use the up and down buttons to adjust the order as
- necessary; these buttons move the selected item up or down.</para>
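-
- <para>A priority list created this way might be stored as shown in this sketch, in which
- the hypothetical type <literal>my.namespace.Sentence</literal> sorts before
- <literal>my.namespace.Token</literal>.</para>
-
- <programlisting><![CDATA[<typePriorities>
-   <priorityList>
-     <!-- Earlier entries have higher priority -->
-     <type>my.namespace.Sentence</type>
-     <type>my.namespace.Token</type>
-   </priorityList>
- </typePriorities>]]></programlisting>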
-
- <para>Although it is possible to import self-contained index and type priority files,
- the creation of such files is not yet supported by the Component Descriptor Editor. If
- you create these files using another editor, they can be imported using the
- corresponding Import panels, shown on the right. Imports are specified in the same
- manner as they are for Type System imports.</para>
-
- </section>
-
- <section id="ugr.tools.cde.resources">
- <title>Resources Page</title>
-
- <para>The Resources page describes resource dependencies (for primitive Analysis
- Engines) and external resource specifications, together with their bindings to the resource
- dependencies.</para>
-
- <para>Only primitive Analysis Engines define resource dependencies. Primitive and
- Aggregate Analysis Engines can define external resources and connect them (bind them)
- to resource dependencies.</para>
-
- <para>When an Aggregate is providing an external resource to be bound to a dependency, the
- binding is specified using a possibly multi-level path. The path starts at the Aggregate and
- names a component (by its key name); if that component is, in turn, an
- Aggregate, it names the next component (again by its key name), and so on until it reaches a
- primitive. The sequence of key names is made into the binding specification by joining
- the parts with a <quote>/</quote> character. All of this is done for you by the Component
- Descriptor Editor.</para>
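-
- <para>For example, a binding written by the editor for a two-level aggregate might look
- like the sketch below. The key names <literal>InnerAggregate</literal> and
- <literal>Tokenizer</literal>, the dependency key <literal>DictionaryFile</literal>, and the
- resource name <literal>MyDictionary</literal> are all hypothetical.</para>
-
- <programlisting><![CDATA[<externalResourceBinding>
-   <!-- delegate key names joined with "/", ending with the dependency key -->
-   <key>InnerAggregate/Tokenizer/DictionaryFile</key>
-   <resourceName>MyDictionary</resourceName>
- </externalResourceBinding>]]></programlisting>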
-
- <para>Any external resource provided by an Aggregate will override any binding provided
- by any lower level component for the same resource dependency.</para>
-
- <para>There are two views of the Resources page, depending on whether the Analysis Engine
- is an Aggregate or Primitive. Here's the view for a Primitive:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5in" format="JPG" fileref="&imgroot;image060.jpg"/>
- </imageobject>
- <textobject><phrase>Resources page for a primitive</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>To declare a resource dependency, click the Add button in the right hand panel. This
- puts up the dialog:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="4in" format="JPG" fileref="&imgroot;image062.jpg"/>
- </imageobject>
- <textobject><phrase>Specifying a resource dependency</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>The Key must be unique within the descriptor declaring it. The Interface, if
- present, is the name of a Java interface the Analysis Engine uses to access the
- resource.</para>
-
- <para>Declare the actual external resources on the left side of the page. Clicking
- <quote>Add</quote> brings up this dialog:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="4.2in" format="JPG" fileref="&imgroot;image064.jpg"/>
- </imageobject>
- <textobject><phrase>Specifying an External Resource</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>The Name must be unique within this Analysis Engine. The URL identifies a file
- resource. If both the URL and URL suffix are used, the file resource is formed by
- combining the first URL part with the language-identifier, followed by the URL suffix;
- see <olink targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration"/>
- . URLs may be written as <quote>relative</quote> URLs; in this case they are resolved by
- looking them up relative to the classpath and/or datapath. A relative URL has the path
- part starting without an initial <quote>/</quote>; for example:
- file:my/directory/file. An absolute URL starts with file:/ or file:/// or
- file://some.network.address/. For more information about URLs, please read the
- Javadoc information for the Java class <quote>URL</quote>.</para>
-
- <para>The Implementation is optional, and if given, must be a Java class that implements
- the interface specified in any Resource Dependencies this resource is bound
- to.</para>
-
- <section id="ugr.tools.cde.resources.binding">
- <title>Binding</title>
-
- <para>Once you have an external resource definition, and a Resource Dependency, you
- can bind them together. To do this, you select the two things (an external resource
- definition, and a Resource Dependency) that you want to bind together, and click
- Bind.</para>
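-
- <para>For a primitive, the result of declaring a dependency, declaring an external
- resource, and binding the two might look roughly like this in the source. All names here
- (the key <literal>DictionaryFile</literal>, the resource <literal>MyDictionary</literal>,
- and the <literal>org.example</literal> interface and class) are assumptions made for the
- sketch.</para>
-
- <programlisting><![CDATA[<externalResourceDependencies>
-   <externalResourceDependency>
-     <key>DictionaryFile</key>
-     <description>Hypothetical dictionary dependency</description>
-     <interfaceName>org.example.Dictionary</interfaceName>
-   </externalResourceDependency>
- </externalResourceDependencies>
-
- <resourceManagerConfiguration>
-   <externalResources>
-     <externalResource>
-       <name>MyDictionary</name>
-       <description>Hypothetical file resource</description>
-       <fileResourceSpecifier>
-         <!-- Relative URL: looked up on the classpath and/or datapath -->
-         <fileUrl>file:my/directory/file</fileUrl>
-       </fileResourceSpecifier>
-       <!-- Optional; must implement the interface named in the dependency -->
-       <implementationName>org.example.DictionaryImpl</implementationName>
-     </externalResource>
-   </externalResources>
-   <externalResourceBindings>
-     <externalResourceBinding>
-       <key>DictionaryFile</key>
-       <resourceName>MyDictionary</resourceName>
-     </externalResourceBinding>
-   </externalResourceBindings>
- </resourceManagerConfiguration>]]></programlisting>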
-
- </section>
-
- <section id="ugr.tools.cde.resources.aggregates">
-
- <title>Resources with Aggregates</title>
-
- <para>When editing an Aggregate Descriptor, the Resource definitions panel will show
- all the resources at the primitive level, with paths down through the components
- (multiple levels, if needed) to get to the primitives. The Aggregate can define
- external resources, and bind them to one or more uses by the primitives.</para>
-
- </section>
-
- <section id="ugr.tools.cde.resources.imports_exports">
- <title>Imports and Exports</title>
-
- <para>Resource definitions and their bindings can be imported, just like other
- imports. Existing Resource definitions and their bindings can be exported to a new
- importable part, and replaced with an import for that importable part, using the
- <quote>Export...</quote> button, just like the similar function on the Type System
- page.</para>
-
- </section>
- </section>
-
- <section id="ugr.tools.cde.source">
- <title>Source Page</title>
-
- <para>The Source page is a text view of the XML content of the Analysis Engine or Type System
- being configured. An example of this page is displayed below:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="5.7in" format="JPG" fileref="&imgroot;image066.jpg"/>
- </imageobject>
- <textobject><phrase>Source page</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>Changes made in the GUI are immediately reflected in the XML source, and changes
- made in the XML source are immediately reflected back in the GUI. The thought here is that
- the GUI view and the Source view are just two ways of looking at the same data. When the data
- is in an unsaved state the file name is prefaced with an asterisk in the currently
- selected file tab in the editor pane inside Eclipse (as in the example above).</para>
-
- <para>You may accidentally create invalid descriptors or XML by editing directly in the
- Source view. If you do this, when you try to save or when you switch to a different view,
- the error will be detected and reported. In the case of saving, the file will be saved,
- even if it is in an error state.</para>
-
- <section id="ugr.tools.cde.source.formatting">
- <title>Source formatting – indentation</title>
-
- <para>The XML is indented using an indentation amount saved as a global UIMA
- preference. To change this preference, use the Eclipse menu item: Window →
- Preferences → UIMA Preferences.</para>
-
- </section>
- </section>
-
- <section id="ugr.tools.cde.creating_self_contained_type_system">
- <title>Creating a Self-Contained Type System</title>
-
- <para>It is also possible to use the Component Descriptor Editor to create or edit
- self-contained type systems. To create a self-contained type system, select the menu
- item File → New → Other and then select Type System Descriptor File. From the
- next page of the selection wizard specify a Parent Folder and File name and click Finish.
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="3.5in" format="JPG" fileref="&imgroot;image068.jpg"/>
- </imageobject>
- <textobject><phrase>Working with a self-contained type system</phrase>
- </textobject>
- </mediaobject>
- </screenshot>
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="3.5in" format="JPG" fileref="&imgroot;image070.jpg"/>
- </imageobject>
- <textobject><phrase></phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>This will take you to a version of the Component Descriptor Editor for editing a type
- system file which contains just three pages: an overview page, a type system page, and a
- source page. The overview page is a bit more spartan than in the case of an AE. It looks like
- the following:
-
-
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata width="3.7in" format="JPG" fileref="&imgroot;image072.jpg"/>
- </imageobject>
- <textobject><phrase>Editing a type system object</phrase>
- </textobject>
- </mediaobject>
- </screenshot></para>
-
- <para>Just like an AE has an associated name, version, vendor and description, the same is
- true of a self-contained type system. The Type System page is identical to that in an AE
- descriptor file, as is the Source page. Note that a self-contained type system can
- import type systems just like the type system associated with an AE.</para>
-
- <para>A type system component can also be created from an existing descriptor which
- contains a type system definition section, by clicking on the Export... button on the
- Type System page.</para>
-
- </section>
-
- <section id="ugr.tools.cde.creating_other_descriptor_components">
- <title>Creating Other Descriptor Components</title>
-
- <para>The new wizard can create several other kinds of components: Collection
- Processing Management (CPM) components, flow controllers, and importable parts
- (in addition to Type Systems, described above, these are Indexes, Type Priorities, and
- Resource Manager Configuration imports).</para>
-
- <para>The CPM components supported by this editor include the Collection Reader, CAS
- Initializer, and CAS Consumer descriptors. Each of these is basically treated just
- like a primitive AE descriptor, with small changes to accommodate the different
- semantics. For instance, a CAS Consumer can't declare in its capabilities
- section that it outputs types or features.</para>
-
- <para>Flow controllers are components that control the flow of CASes within an
- aggregate, and are edited in much the same way as a primitive Analysis Engine.</para>
-
- <para>The importable part support requires context information to enable the editor to
- work, because much of the power of this editor comes from extensive checking that
- requires additional information, other than what is available in just the importable
- part. For instance, when you create or edit an Indexes import, the facility for adding
- new indexes needs the type information, which is not present in this part when it is
- edited alone. </para>
-
- <para>To overcome this, when you edit these descriptors, you will be asked to
- specify a context descriptor, usually a descriptor which would import the part being
- edited, which would have the additional information needed. </para>
-
- <para>Various methods are used
- to guess what the context descriptor should be - and if the guess is correct, you can just
- press the Enter key to confirm. The last successful context file is remembered and will
- be suggested as the context file to use at the next edit session.</para>
- </section>
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
+<!ENTITY imgroot "../images/tools/tools.cde/" >
+<!ENTITY % uimaents SYSTEM "../entities.ent" >
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tools.cde">
+ <title>Component Descriptor Editor User's Guide</title>
+ <titleabbrev>CDE User's Guide</titleabbrev>
+
+ <para>The Component Descriptor Editor is an Eclipse plug-in that provides a forms-based
+ interface for creating and editing UIMA XML descriptors. It supports most of the
+ descriptor formats, except the Collection Processing Engine descriptor, the PEAR
+ package descriptor and some remote deployment descriptors.</para>
+
+ <section id="ugr.tools.cde.launching">
+ <title>Launching the Component Descriptor Editor</title>
+
+ <para>Here's how to launch this tool on a descriptor contained in the examples. This
+ presumes you have installed the examples as described in the SDK Installation and Setup
+ chapter.</para>
+
+ <itemizedlist spacing="compact"><listitem><para>Expand the uimaj-examples
+ project in the Eclipse Navigator or Package Explorer view</para></listitem>
+
+ <listitem><para>Within this project, browse to the file
+ descriptors/tutorial/ex1/RoomNumberAnnotator.xml.</para></listitem>
+
+ <listitem><para>Right-click on this file and select Open With → Component
+ Descriptor Editor. (If this option is not present, check to make sure you installed
+ the plug-ins as described in <olink targetdoc="&uima_docs_overview;"
+ targetptr="ugr.ovv.eclipse_setup.installation"/>. The EMF plug-in is also
+ required.)</para></listitem>
+
+ <listitem><para>This should open a graphical editor and display the contents of the
+ RoomNumberAnnotator descriptor. </para></listitem></itemizedlist>
+
+ </section>
+
+ <section id="ugr.tools.cde.creating_new_ae_descriptor">
+ <title>Creating a New AE Descriptor</title>
+
+ <para>A new AE descriptor file may be created by selecting the File → New →
+ Other... menu. This brings up the following dialog:
+
+
+ <screenshot>
+ <mediaobject>
+ <imageobject>
+ <imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/>
+ </imageobject>
+ <textobject><phrase>Screenshot of selecting new UIMA component in Eclipse</phrase>
+ </textobject>
+ </mediaobject>
+ </screenshot></para>
+
+ <para>If the user then selects UIMA and Analysis Engine Descriptor File, and clicks the
+ Next > button, the following dialog is displayed. We will cover creating other kinds
+ of components later in the documentation.
+
+
+ <screenshot>
+ <mediaobject>
+ <imageobject>
+ <imagedata width="3.2in" format="JPG" fileref="&imgroot;image004.jpg"/>
+ </imageobject>
+ <textobject><phrase>Screenshot of selecting new UIMA component in Eclipse
+ after pushing Next</phrase>
+ </textobject>
+ </mediaobject>
+ </screenshot></para>
+
+ <para>After entering the appropriate parent folder and file name, and clicking Finish,
+ an initial AE descriptor file is created with the given name, and the descriptor is
+ opened up within the Component Descriptor Editor.</para>
+
+ <para>At this point, the display inside the Component Descriptor Editor is the same
+ whether one started by creating a new AE descriptor, as in the preceding paragraph, or
+ one merely opened a previously created AE descriptor from, say, the Package Explorer
+ view. We show a previously created AE in the figure below:
+
+
+ <screenshot>
+ <mediaobject>
+ <imageobject>
+ <imagedata width="5.7in" format="JPG" fileref="&imgroot;image006.jpg"/>
+ </imageobject>
+ <textobject><phrase>Screenshot of CDE showing overview page</phrase>
+ </textobject>
+ </mediaobject>
+ </screenshot></para>
+
+ <para>To see all the information shown in the main editor pane with less scrolling, double
+ click the title tab to toggle between the <quote>full screen</quote> and normal
+ views.</para>
+
+ <para>It is possible to set the Component Descriptor Editor as the default editor for all
+ .xml files by going to Window → Preferences, and then selecting File Associations
+ on the left, and *.xml on the right, and finally by clicking on Component Descriptor
+ Editor, the Default button and then OK. If AE and Type System descriptors are not the
+ primary .xml files you work with within the Eclipse environment, we recommend not
+ setting the Component Descriptor Editor as your default editor for all .xml files. To
+ open an .xml file using the Component Descriptor Editor, if the Component Descriptor
+ Editor is not set as your default editor, right click on the file in the Package Explorer,
+ or other navigational view, and select Open With → Component Descriptor Editor.
+ This choice is remembered by Eclipse for subsequent open operations.</para>
+
+ </section>
+
+ <section id="ugr.tools.cde.pages_within_the_editor">
+ <title>Pages within the Editor</title>
+
+ <para>The Component Descriptor Editor follows a standard Eclipse paradigm for these
+ kinds of editors. There are several pages in the editor; each one can be selected, one at a
+ time, by clicking on the bottom tabs. The last page contains the actual XML source file
+ being edited, and is displayed as plain text.</para>
+
+ <para>The same set of tabs appears at the bottom of each page in the Component Descriptor
+ Editor. The Component Descriptor Editor uses this <quote>multi-page editor</quote>
+ paradigm to give the user a view of conceptually distinct portions of the Descriptor
+ metadata in separate pages. At any point in time the user may click on the Source tab to
+ view the actual XML source. The Component Descriptor Editor is, in a way, just a fancy GUI
+ for editing the XML. The tabs provide quick access to the following pages: Overview,
+ Aggregate, Parameters, Parameter Settings, Type System, Capabilities, Indexes,
+ Resources, and Source. We discuss each of these pages in turn.</para>
+
+ <section id="ugr.tools.cde.adjusting_display_of_pages">
+ <title>Adjusting the display of pages</title>
+
+ <para>Most pages in the editor have a <quote>sash</quote> bar. This is a light gray bar
+ which separates sub-sections of the page. This bar can be dragged with the mouse to
+ adjust how the display area is split between the two sash panes. You can also change the
+ orientation of the Sash so it splits vertically, instead of horizontally, by
+ clicking on the small icons at the top right of the page that look like this:
+
+ <screenshot>
+ <mediaobject>
+ <imageobject>
+ <imagedata width=".7in" format="JPG" fileref="&imgroot;image008.jpg"/>
+ </imageobject>
+ <textobject><phrase>Changing orientation of two window split</phrase>
+ </textobject>
+ </mediaobject>
+ </screenshot></para>
+
+ <para>All of the sections on a page have subtitles, with an indicator to the left which
+ you can click to collapse or expand that particular section. Collapsing sections can
+ sometimes be useful to free up screen area for other sections.</para>
+
+ </section>
+ </section>
+
+ <section id="ugr.tools.cde.overview_page">
+ <title>Overview Page</title>
+
+ <para>Normally, the first page displayed in the Component Descriptor Editor is the
+ Overview page (the name of the page is shown in the GUI panel at the top left). If there is an
+ error reading and parsing the source, the Source page is shown instead, giving you the
+ opportunity to correct the problem. For many components, the Overview page contains
+ three sections: Implementation Details, Runtime Information and overall
+ Identification Information.</para>
+
+ <section id="ugr.tools.cde.overview_page.implementation_details">
+ <title>Implementation Details</title>
+
+ <para>In the Implementation Details section you specify the Implementation Language
+ and Engine Type. There are two kinds of Engines: Aggregate, and non-Aggregate (also
+ called Primitive). An Aggregate engine is one which is composed of other
+ component engines and itself contains no code. Several of the pages in the Component
+ Descriptor Editor have different formats, depending on the engine type.</para>
+
+ </section>
+ <section id="ugr.tools.cde.overview_page.runtime_info">
+ <title>Runtime Information</title>
+
+ <para>Runtime information is only applicable for primitive engines and is disabled
+ for aggregates and other kinds of descriptors. This is where you specify the class name of the annotator
+ implementation, if you are doing a Java implementation, or the C++ shared object or dll name,
+ if you are doing a C++ implementation. Most Analysis Engines will specify that
+ they update the CAS, and that they may be replicated (for performance reasons) when deployed. If
+ a particular Analysis Engine must see every CAS (for instance, if it is counting the
+ number of CASes), then uncheck the <quote>multiple deployment allowed</quote>
+ box. If the Analysis Engine doesn't update the CAS, uncheck the <quote>updates
+ the CAS</quote> box. (Most CAS Consumers do not update the CAS, and this parameter
+ defaults to unchecked for new CAS Consumer descriptors).</para>
+
+ <para>Analysis engines written using the CAS Multiplier APIs
+ (see <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm"/>)
+ can create additional CASes for analysis. To specify that they
+ do this, check the <quote>returns new artifacts</quote> box.</para>
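+
+ <para>As a rough sketch, the settings in this section correspond to descriptor elements
+ like the following for a primitive Java engine. The class name and component name are
+ hypothetical placeholders, and the exact output of the editor may differ; the Source page
+ shows what is actually written.
+
+ <programlisting><![CDATA[<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <primitive>true</primitive>
+  <!-- hypothetical annotator class, for illustration only -->
+  <annotatorImplementationName>com.example.MyAnnotator</annotatorImplementationName>
+  <analysisEngineMetaData>
+    <name>My Annotator</name>
+    <operationalProperties>
+      <!-- "updates the CAS" check box -->
+      <modifiesCas>true</modifiesCas>
+      <!-- "multiple deployment allowed" check box -->
+      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
+      <!-- "returns new artifacts" check box -->
+      <outputsNewCASes>false</outputsNewCASes>
+    </operationalProperties>
+  </analysisEngineMetaData>
+</analysisEngineDescription>
+]]></programlisting></para>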
+
+ </section>
+
+ <section id="ugr.tools.cde.overview_page.overall_id_info">
+ <title>Overall Identification Information</title>
+
+ <para>The Name should be a human-readable name that describes this component. The
+ Version, Vendor, and Description fields are optional, and are arbitrary
+ strings.</para>
+
+ </section>
+ </section>
+
+ <section id="ugr.tools.cde.aggregate_page">
+ <title>Aggregate Page</title>
+
+ <para>For primitive Analysis Engines, Flow Controllers or Collection Processing
+ components, the Aggregate page is not used. For aggregate engines, the page looks like
+ this:
+
+
+ <screenshot>
+ <mediaobject>
+ <imageobject>
+ <imagedata width="5.7in" format="JPG" fileref="&imgroot;image010.jpg"/>
+ </imageobject>
+ <textobject><phrase>CDE Aggregate page</phrase>
+ </textobject>
+ </mediaobject>
+ </screenshot></para>
+
+ <para>On the left we see a list of component engines, and on the right information about the
+ flow. If you hover the mouse over an item in the list of component engines, that
+ engine's description meta data will be shown. If you right-click on one of these
+ items, you get an option to open that delegate descriptor in another editor instance.
+ Any changes you make, however, won't be seen until you close and reopen the editor
+ on the importing file.</para>
+
+ <para>Engines can be added to the list on the left by clicking the Add button at the bottom of
+ the Component Engine section. This brings up one of the following two dialogs:
+
+
+ <screenshot>
+ <mediaobject>
+ <imageobject>
+ <imagedata width="3.875in" format="JPG" fileref="&imgroot;import-by-location.jpg"/>
+ </imageobject>
+ <textobject><phrase>Adding an Analysis Engine to an Aggregate, by location</phrase>
+ </textobject>
+ </mediaobject>
+ </screenshot></para>
+
+ <para>This dialog lets you select
+ a descriptor from your workspace, or browse the file system to select a descriptor.
+ </para>
+
+ <para>Or, if you have selected to import by name, this dialog is shown:
+ <screenshot>
+ <mediaobject>
+ <imageobject>
+ <imagedata width="5.296875in" format="JPG" fileref="&imgroot;import-by-name.jpg"/>
+ </imageobject>
+ <textobject><phrase>Adding an Analysis Engine to an Aggregate, by name</phrase>
+ </textobject>
+ </mediaobject>
+ </screenshot></para>
+
+ <para>You can specify that the import should be by Name (the name is looked up using both the
+ Project's class path, and DataPath), or by location. If it is by name,
+ the dialog shows the available xml files on the class path, to pick from. If the
+ one you want isn't showing, this means it isn't on the enclosing Eclipse Java Project's
+ classpath, nor on the datapath, and one of those needs to be updated to include the
+ path to the resource. If the name picked is
+ <literal>com/company/prod/xyz.xml</literal>, the name in
+ the descriptor will be <quote><literal>com.company.prod.xyz</literal></quote>.
+ The "Browse the file system..." button is disabled when import by name is checked, because
+ the file system is not the source of the imports; rather, the imports come from the
+ resources on the classpath or datapath.</para>
+
+ <para>
+ If it is by location, the file reference is converted to a relative reference if
+ possible, in the descriptor.</para>
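+
+ <para>For illustration, the two kinds of import might appear as follows in the aggregate's
+ descriptor. The delegate keys and the relative path are made up for this sketch; only the
+ name <literal>com.company.prod.xyz</literal> is taken from the example above.
+
+ <programlisting><![CDATA[<delegateAnalysisEngineSpecifiers>
+  <!-- import by name: resolved through the classpath or datapath -->
+  <delegateAnalysisEngine key="Xyz">
+    <import name="com.company.prod.xyz"/>
+  </delegateAnalysisEngine>
+  <!-- import by location: a reference relative to the importing descriptor (hypothetical path) -->
+  <delegateAnalysisEngine key="RoomNumberAnnotator">
+    <import location="../ex1/RoomNumberAnnotator.xml"/>
+  </delegateAnalysisEngine>
+</delegateAnalysisEngineSpecifiers>
+]]></programlisting></para>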
+
+ <para>The final selection at the bottom tells whether or not the selected engine(s)
+ should automatically be added to the end of the flow section (the right section on the
+ Aggregate page). The OK button does not become activated until a descriptor
+ file is selected.</para>
+
+ <para>To remove an analysis engine from the component engine list simply select an engine
+ and click the Remove button, or press the delete key. If the engine is already in the flow
+ list you will be warned that deletion will also delete the specified engine from this
+ list.</para>
+
+ <section id="ugr.tools.cde.aggregate_page.adding_components_more_than_once">
+ <title>Adding components more than once</title>
+
+ <para>Components may be added to the left panel more than once. Each of these components
[... 1081 lines stripped ...]