Posted to commits@uima.apache.org by sc...@apache.org on 2009/09/22 21:32:36 UTC
svn commit: r817790 [2/3] - in
/incubator/uima/sandbox/trunk/ConfigurableFeatureExtractor: ./ docs/
docs/html/ docs/html/CFE_UG/ docs/html/CFE_UG/css/ docs/html/images/
docs/html/images/CFE_UG/ docs/html/images/callouts/ docs/pdf/
Added: incubator/uima/sandbox/trunk/ConfigurableFeatureExtractor/docs/html/CFE_UG/CFE_UG.html
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/ConfigurableFeatureExtractor/docs/html/CFE_UG/CFE_UG.html?rev=817790&view=auto
==============================================================================
--- incubator/uima/sandbox/trunk/ConfigurableFeatureExtractor/docs/html/CFE_UG/CFE_UG.html (added)
+++ incubator/uima/sandbox/trunk/ConfigurableFeatureExtractor/docs/html/CFE_UG/CFE_UG.html Tue Sep 22 19:32:20 2009
@@ -0,0 +1,945 @@
+<html><head>
+ <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
+ <title>CFE User Guide</title><link rel="stylesheet" href="css/stylesheet-html.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.72.0"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="book" lang="en" id="d0e2"><div class="titlepage"><div><div><h1 class="title"><a name="d0e2"></a>CFE User Guide</h1></div><div><div class="authorgroup"><h3 class="corpauthor">Authors: The Apache UIMA Development Community</h3></div></div><div><span class="productname">Apache UIMA Sandbox<br></span></div><div><p class="releaseinfo">Version 2.3.0</p></div><div><p class="copyright">Copyright © 2008, 2009 The Apache Software Foundation</p></div><div><div class="legalnotice"><a name="d0e15"></a><p> </p><p><b>Incubation Notice and Disclaimer. </b>Apache UIMA is an effort undergoing incubation at the Apache Software Foundation (ASF).
+ Incubation is required of all newly accepted projects until a further review indicates that
+ the infrastructure, communications, and decision making process have stabilized in a manner
+ consistent with other successful ASF projects. While incubation status is not necessarily
+ a reflection of the completeness or stability of the code,
+ it does indicate that the project has yet to be fully endorsed by the ASF.</p><p> </p><p> </p><p><b>License and Disclaimer. </b>The ASF licenses this documentation
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this documentation except in compliance
+ with the License. You may obtain a copy of the License at
+
+ </p><div class="blockquote"><blockquote class="blockquote"><p>
+ <a xmlns:xlink="http://www.w3.org/1999/xlink" href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a>
+ </p></blockquote></div><p>
+
+ Unless required by applicable law or agreed to in writing,
+ this documentation and its contents are distributed under the License
+ on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ </p><p> </p><p> </p><p><b>Trademarks. </b>All terms mentioned in the text that are known to be trademarks or
+ service marks have been appropriately capitalized. Use of such terms
+ in this book should not be regarded as affecting the validity of the
+ trademark or service mark.
+ </p></div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="chapter"><a href="#_Overview">1.
+ Overview
+ </a></span></dt><dd><dl><dt><span class="section"><a href="#_Motivation">1.1.
+ Motivation
+ </a></span></dt><dt><span class="section"><a href="#_Approaches_to_feature_extraction">1.2.
+ Approaches to feature extraction
+ </a></span></dt><dd><dl><dt><span class="section"><a href="#_Custom_CAS_Consumers">1.2.1.
+ Custom CAS Consumers
+ </a></span></dt><dt><span class="section"><a href="#_CFE_approach">1.2.2.
+ CFE approach
+ </a></span></dt></dl></dd><dt><span class="section"><a href="#_CFE_Basics">1.3.
+ CFE Basics
+ </a></span></dt></dl></dd><dt><span class="chapter"><a href="#_Components">2.
+ Components
+ </a></span></dt><dd><dl><dt><span class="section"><a href="#_FESL_XSD">2.1.
+ FESL XSD
+ </a></span></dt><dt><span class="section"><a href="#_Source_Code">2.2.
+ Source Code
+ </a></span></dt><dt><span class="section"><a href="#_Descriptors">2.3.
+ Descriptors
+ </a></span></dt></dl></dd><dt><span class="chapter"><a href="#_Configuration_Files">3.
+ Configuration Files
+ </a></span></dt><dd><dl><dt><span class="section"><a href="#_Common_notations_and_tags">3.1.
+ Common notations and tags
+ </a></span></dt><dd><dl><dt><span class="section"><a href="#_Feature_path">3.1.1.
+ Feature path
+ </a></span></dt><dt><span class="section"><a href="#_Full_path_and_partial_path">3.1.2.
+ Full path and partial path
+ </a></span></dt><dt><span class="section"><a href="#_TAM_and_FAM">3.1.3.
+ TAM and FAM
+ </a></span></dt><dt><span class="section"><a href="#_Arrays">3.1.4.
+ Arrays
+ </a></span></dt><dt><span class="section"><a href="#_Parent_tag">3.1.5.
+ Parent tag
+ </a></span></dt><dt><span class="section"><a href="#_Null_values">3.1.6.
+ Null values
+ </a></span></dt><dt><span class="section"><a href="#_Implicit_TA_exclusion">3.1.7.
+ Implicit TA exclusion
+ </a></span></dt></dl></dd><dt><span class="section"><a href="#_FESL_Elements">3.2.
+ FESL Elements
+ </a></span></dt><dd><dl><dt><span class="section"><a href="#_BitsetFeaturaValuesXML">3.2.1.
+ BitsetFeatureValuesXML
+ </a></span></dt><dt><span class="section"><a href="#_EnumFeatureValuesXML">3.2.2.
+ EnumFeatureValuesXML
+ </a></span></dt><dt><span class="section"><a href="#_ObjectPathFeatureValue">3.2.3.
+ ObjectPathFeatureValuesXML
+ </a></span></dt><dt><span class="section"><a href="#_PatternFeatureValuesXM">3.2.4.
+ PatternFeatureValuesXML
+ </a></span></dt><dt><span class="section"><a href="#_RangeFeatureValuesXML">3.2.5.
+ RangeFeatureValuesXML
+ </a></span></dt><dt><span class="section"><a href="#_SingleFeatureMatcherXML">3.2.6.
+ SingleFeatureMatcherXML
+ </a></span></dt><dt><span class="section"><a href="#_GroupFeatureMatcherXML">3.2.7.
+ GroupFeatureMatcherXML
+ </a></span></dt><dt><span class="section"><a href="#_PartialObjectMatcherXML">3.2.8.
+ PartialObjectMatcherXML
+ </a></span></dt><dt><span class="section"><a href="#_FeatureObjectMatcherXML">3.2.9.
+ FeatureObjectMatcherXML
+ </a></span></dt><dt><span class="section"><a href="#_TargetAnnotationXML">3.2.10.
+ TargetAnnotationXML
+ </a></span></dt></dl></dd><dt><span class="section"><a href="#_Configuration_file_sample">3.3.
+ Configuration file sample
+ </a></span></dt><dd><dl><dt><span class="section"><a href="#_Task_definition">3.3.1.
+ Task definition
+ </a></span></dt><dt><span class="section"><a href="#_Implementation">3.3.2.
+ Implementation
+ </a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#_Using_CFE_for_evaluation">4.
+ Using CFE for evaluation
+ </a></span></dt></dl></div><div class="chapter" lang="en" id="_Overview"><div class="titlepage"><div><div><h2 class="title"><a name="_Overview"></a>Chapter 1.
+ Overview
+ </h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Motivation"></a>1.1.
+ Motivation
+ </h2></div></div></div><p class="Normal">Feature extraction, the extraction of
+ information from data sources, is a common task frequently required
+ to be performed by many different types of applications, such as
+ machine learning, performance evaluation, and statistical analysis.
+ This guide describes a tool that can be used to facilitate this
+ extraction process, in conjunction with the Unstructured Information
+ Management Architecture (UIMA), particularly focusing on text
+ processing applications. UIMA provides a mechanism for executing
+ modules called Analysis Engines that analyze artifacts (text
+ documents in our case) and store the results of the analysis in a
+ data structure called the Common Analysis Structure (CAS). These
+ results are stored as Feature Structures, which are simply data
+ structures that have an associated type and a set of properties in
+ the form of attribute/value pairs. Feature Structures that are
+ attached to a particular span of a text document are called
+ Annotations. They usually represent a concept that the analysis
+ engine computes based on the text. The attributes are called
+ Features in UIMA terminology. This sense of feature will always be
+ referred to as <code class="code">UIMA feature</code> in this document, so as not to be
+ confused with the general sense of <code class="code">feature</code> when discussing
+ <code class="code">feature extraction</code>, referring to the process of extracting values
+ from data sources (in our case, the CAS). Values that are extracted
+ are not required to be values of attributes (i.e., UIMA Features) of
+ Annotations, but can be computed by other methods, as will be shown
+ later. The terms features and feature values in this document refer
+ to any value extracted from the CAS, regardless of the particular
+ source.
+ </p><p class="Normal"></p><p class="Normal">As an example, Figure 1 depicts annotation objects
+ of the type Token that are associated with individual words, each
+ having attributes Index and POS (part of speech). A feature
+ extraction task could be "extract token indexes for the words that
+ are nouns". Such a task is translated to the following execution
+ steps:
+ </p><div class="orderedlist"><ol type="1"><li><p class="Normal">find an annotation of a type Token</p></li><li><p class="Normal">examine a value of POS attribute</p></li><li><p class="Normal">extract the value of Index attribute only if
+ the value of POS attribute is <code class="code">NN</code>
+ </p></li></ol></div><p class="Normal">The expression "word that is a noun" defines a
+ concept whose realization (an annotation object) has to be found in the
+ CAS. <code class="code">Token index</code> is the information (i.e., <code class="code">feature</code>) to be
+ extracted. The resulting values for the task will be values 3 and 9,
+ which are the values of the attribute Index for the words <code class="code">car</code> and
+ <code class="code">finish</code>.
+ </p><p>
+ <span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-1.jpg"></span>
+ </p><p class="LREC Caption">
+ Figure 1: Annotated text sample
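+ </p><p class="Normal">
+ The three execution steps listed above can be sketched in plain Java. This is only an illustration of the logic a hand-written extractor would contain: Token and NounIndexExtractor are hypothetical stand-ins, not UIMA or CFE classes, and every token value other than Index 3 for car and Index 9 for finish is made up.
+ </p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a Token annotation; not a UIMA class. */
final class Token {
    final String text;
    final int index;
    final String pos;

    Token(String text, int index, String pos) {
        this.text = text;
        this.index = index;
        this.pos = pos;
    }
}

final class NounIndexExtractor {
    /** Visits every Token, examines POS, and collects Index only when POS is NN. */
    static List<Integer> nounIndexes(List<Token> tokens) {
        List<Integer> result = new ArrayList<Integer>();
        for (Token t : tokens) {          // step 1: find annotations of type Token
            if ("NN".equals(t.pos)) {     // step 2: examine the POS attribute
                result.add(t.index);      // step 3: extract the Index attribute
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative data; only car=3 and finish=9 come from the guide.
        List<Token> tokens = Arrays.asList(
            new Token("car", 3, "NN"),
            new Token("the", 8, "DT"),
            new Token("finish", 9, "NN"));
        System.out.println(nounIndexes(tokens));
    }
}
```

+ <p class="Normal">
+ Every new extraction task of this kind would need another loop like the one above, which is the maintenance burden discussed in the following sections.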
+ </p><p class="Normal">While Figure 1 shows a fairly simple example of
+ annotation types associated with some text, real world applications
+ could have quite sophisticated annotation types, storing various
+ kinds of computed information. Consider an annotation type Car that
+ has, for illustration purposes, just two attributes: Color and
+ Engine. While the attribute Color is of type string, the Engine
+ attribute is a complex annotation type with attributes Cylinders and
+ Size. This is represented by a UML diagram in Figure 2, illustrating
+ a class hierarchy on the left and a sample instance of this class
+ structure on the right.
+ </p><p>
+ <span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-3.jpg"></span>
+ </p><p class="LREC Caption">
+ Figure 2: Composite object sample
+ </p><p class="Normal">
+ If a requirement is to extract the number of cylinders of the car's
+ engine, then the application needs to find any object(s) that represent
+ the concept of a car (CarAnnotation in this case) and traverse the
+ object's structure to access the Cylinders attribute of EngineAnnotation.
+ Once the attribute's value is accessed, the application outputs it to the
+ desired destination, such as a text file or a database.
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Approaches_to_feature_extraction"></a>1.2.
+ Approaches to feature extraction
+ </h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Custom_CAS_Consumers"></a>1.2.1.
+ Custom CAS Consumers
+ </h3></div></div></div><p class="Normal">
+ When working with UIMA, feature extraction is usually implemented by
+ writing a special UIMA component called a CAS Consumer that contains
+ custom code for accessing the annotations and their attributes,
+ outputting them to a file, memory or database as required. The CAS
+ consumer contains explicit logic for traversing the object's structure
+ and examining values of specific attributes. Also, the CAS consumer would
+ likely have code for outputting the accessed values to a particular
+ destination, as required by the application. Writing CAS consumers can be
+ labor intensive and requires Java programming. While this approach allows
+ powerful control and customization to an application's needs, supporting
+ the code can become problematic, especially as application requirements
+ change. This can have a negative effect on many different aspects of code
+ support, such as maintenance, evolution, bug fixing, reusability etc.
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_CFE_approach"></a>1.2.2.
+ CFE approach
+ </h3></div></div></div><p class="Normal"></p><p class="Normal">
+ CFE is a multipurpose tool that enables feature extraction from a UIMA
+ CAS in a very generalized and application independent way. The extraction
+ process is performed according to rules expressed using the Feature
+ Extraction Specification Language (FESL) that are stored in configuration
+ files. Using CFE eliminates the need for creating customized CAS
+ consumers and writing Java code for every application. Instead, by using
+ FESL rules in XML format, users can customize the information extraction
+ process to suit their application. FESL's rule semantics allow the
+ precise identification of the information that is required to be
+ extracted by specifying precise multi-parameter criteria. The FESL syntax
+ and semantics are defined further in this guide.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_CFE_Basics"></a>1.3.
+ CFE Basics
+ </h2></div></div></div><p class="Normal">The feature extraction process involves three
+ major steps:</p><div class="orderedlist"><ol type="1"><li><p class="Normal">
+ locating a concept of interest that is represented by a UIMA annotation
+ object; examples of such concepts could be "word that is a noun" or "a
+ car that has a six cylinder engine" etc. The annotation object that
+ represents such a concept is referred to as the Target Annotation (TA)
+ </p></li><li><p class="Normal">
+ locating concepts, relative to the TAs, specifying the information to
+ extract. These are also represented by UIMA annotations, that are within
+ some context of the TAs. Some examples of context could be "to the left
+ of the TA" or "within the TA" etc. The annotation object that corresponds
+ to such a concept is referred to as the Feature Annotation (FA).
+ In relation to Figure 1, an example FA could be the expression "two words
+ to the left from word finish that is a noun", assuming that "word finish
+ that is a noun", describes the TA. The result of such a specification
+ will be tokens <code class="code">at</code> and <code class="code">the</code>.
+ </p></li><li><p class="Normal">extraction of the specified information
+ from FAs
+ </p></li></ol></div><p class="Normal">
+ <a name="FA"></a>
+ Just to illustrate the process, suppose the requirement is "to
+ extract indexes of two words to the left of the word finish that is
+ a noun". In such a scenario, in the first step, CFE locates a TA
+ that is represented by an annotation object corresponding to a word
+ <code class="code">finish</code> and also has its POS attribute equal to <code class="code">NN</code>. For the
+ second step, FAs that correspond to two words to the left of the TA
+ are located. In the third step, values of the Index attribute for
+ each of FAs that were found are extracted. It is possible, however,
+ that the requirement is to extract the value of the Index attribute
+ from the annotation for the word <code class="code">finish</code> itself. In such a case,
+ the TA and FA are represented by the same UIMA annotation object.
+ This is usually the case when extracting features for evaluation or
+ testing. A TA or FA can be specified by
+ complex multi-parameter conditions that are also expressed using
+ FESL, as will be shown later.
+ </p></div></div><div class="chapter" lang="en" id="_Components"><div class="titlepage"><div><div><h2 class="title"><a name="_Components"></a>Chapter 2.
+ Components
+ </h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_FESL_XSD"></a>2.1.
+ FESL XSD
+ </h2></div></div></div><p class="Normal">
+ The specification for FESL is written in XSD format and stored in the
+ file <UIMA_HOME>/src/org/apache/uima/cfe/CFEConfig.xsd. Using this
+ XSD in conjunction with an XML editor that provides syntax validation can
+ help to provide more efficient editing of FESL configuration files.
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Source_Code"></a>2.2.
+ Source Code
+ </h2></div></div></div><p class="Normal">CFE is implemented in Java 5.0 for Apache UIMA, and
+ resides in the org.apache.uima.cfe package. CFE is dependent on
+ Eclipse EMF, Apache UIMA, and the Apache XMLBeans and JXPath
+ libraries. The source code contains the complete implementation of
+ CFE, including auxiliary utility classes that wrap some UIMA
+ functionality (located in the org.apache.uima.cfe.support package).
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Descriptors"></a>2.3.
+ Descriptors
+ </h2></div></div></div><p class="Normal">
+ A sample descriptor file that defines a type system for machine learning
+ processing is located in
+ <UIMA_HOME>/src/org/apache/uima/cfe/AppliedSenseAnnotation.xml
+ </p><p class="Normal">
+ A sample descriptor that uses CFE in a CAS Consumer is located in
+ <UIMA_HOME>/src/org/apache/uima/cfe/UIMAFeatureConsumer.xml
+ </p></div></div><div class="chapter" lang="en" id="_Configuration_Files"><div class="titlepage"><div><div><h2 class="title"><a name="_Configuration_Files"></a>Chapter 3.
+ Configuration Files
+ </h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Common_notations_and_tags"></a>3.1.
+ Common notations and tags
+ </h2></div></div></div><p class="Normal">
+ CFE configuration files are written using FESL semantic rules, as defined
+ in CFEConfig.xsd. These rules describe the information extraction process
+ and are independent of the application from which the information is to
+ be extracted. There are several common notations and tags that are used
+ in different elements of FESL.
+ </p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Feature_path"></a>3.1.1.
+ Feature path
+ </h3></div></div></div><p class="Normal">
+ A "feature path" is a mechanism used by FESL to identify a particular
+ feature (not necessarily a UIMA feature) of an annotation. The value
+ associated with the feature, indicated by the feature path, can be either
+ evaluated to match certain criteria or extracted to the final output or
+ both. The syntax of a feature path is an indexed sequence of
+ attribute/method names separated by the colon character. Such a sequence
+ mimics the sequence of Java method calls required to extract the feature
+ value. For example, a value of the EngineAnnotation attribute <code class="code">Cylinders</code>
+ from Figure 2 can be written as <code class="code">CarAnnotation:Engine:Cylinders</code>, where
+ Engine is an attribute of CarAnnotation. The intermediate results of each
+ step of the call sequence can be referred to from different FESL structural
+ elements by their zero-based index. For instance, the Parent Tag notation
+ (see below) uses the index to access intermediate values. The feature
+ path can be used to identify feature values that are either primitives or
+ complex object types.
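+ </p><p class="Normal">
+ To make the analogy with Java method calls concrete, the sketch below resolves a colon-separated path against a previously located object by invoking one accessor per path element through reflection. It is not CFE's implementation: Car, Engine, FeaturePathSketch, and the convention of trying a get-prefixed method before the bare name are assumptions made for illustration, as is the cylinder count.
+ </p>

```java
import java.lang.reflect.Method;

/** Hypothetical stand-in for EngineAnnotation. */
final class Engine {
    private final int cylinders;
    Engine(int cylinders) { this.cylinders = cylinders; }
    public int getCylinders() { return cylinders; }
}

/** Hypothetical stand-in for CarAnnotation. */
final class Car {
    private final Engine engine;
    Car(Engine engine) { this.engine = engine; }
    public Engine getEngine() { return engine; }
}

final class FeaturePathSketch {
    /** Walks the path left to right, calling one no-argument accessor per element. */
    static Object resolve(Object start, String path) {
        Object current = start;
        for (String step : path.split(":")) {
            if (current == null) {
                return null; // nothing left to traverse
            }
            try {
                Method accessor = findAccessor(current.getClass(), step);
                accessor.setAccessible(true);
                current = accessor.invoke(current);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot resolve step '" + step + "'", e);
            }
        }
        return current;
    }

    /** Tries getX() first, then a method literally named x (for example size). */
    private static Method findAccessor(Class<?> type, String step) throws NoSuchMethodException {
        try {
            return type.getMethod("get" + step);
        } catch (NoSuchMethodException e) {
            return type.getMethod(step);
        }
    }

    public static void main(String[] args) {
        Car car = new Car(new Engine(6)); // illustrative value
        System.out.println(resolve(car, "Engine:Cylinders"));
    }
}
```

+ <p class="Normal">
+ Here the path Engine:Cylinders, evaluated against a Car object, corresponds to the call sequence getEngine() followed by getCylinders().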
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Full_path_and_partial_path"></a>3.1.2.
+ Full path and partial path
+ </h3></div></div></div><p class="Normal">
+ There are two different ways of using feature path notation to identify
+ an object: full path and partial path. The object can be one of the
+ following:
+ </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">an annotation</p></li><li style="list-style-type: disc"><p class="Normal">value of an annotation's attribute</p></li><li style="list-style-type: disc"><p class="Normal">
+ value of a result of an annotation's method; only get-style methods
+ (methods that return a value and take no parameters) are supported.
+ </p></li></ul></div><p class="Normal">
+ A full path specifies a path to an object starting from its type. For
+ instance, if EngineAnnotation is specified as a full path, it would refer
+ to all instances of annotations of that type. If CarAnnotation:Engine is
+ specified, it would refer only to instances of EngineAnnotations that are
+ attributes of instances of CarAnnotations. Full path notation is usually
+ used for TA or FA identification.
+ </p><p class="Normal">
+ A partial path specifies a path to an object starting from a previously
+ located annotation object (whether TA or FA). For example, if an instance
+ of CarAnnotation is located as a TA, then the size of its engine can be
+ specified as Engine:Size. Partial path notation is usually used for
+ specification of feature values that are being examined or extracted.
+ The distinction between "full path" and "partial path" is very similar to
+ the concepts of "absolute path" and "relative path" when discussing a
+ computer's file system.
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_TAM_and_FAM"></a>3.1.3.
+ TAM and FAM
+ </h3></div></div></div><p class="Normal">
+ Each FESL rule is represented by an XML element with the tag
+ <span class="emphasis"><em>targetAnnotation</em></span>
+ , as specified in the XSD by the
+ <a href="#_TargetAnnotationXML" title="3.2.10. TargetAntotationXML">
+ <span class="Hyperlink2">TargetAnnotationXML</span>
+ </a>
+ type. Each element of this type is a composition of:
+ </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
+ a single target annotation matcher (
+ <span class="emphasis"><em>TAM</em></span>
+ ) that is denoted by an XML element with the tag
+ <span class="emphasis"><em>targetAnnotationMatcher</em></span>
+ , of the type
+ <a href="#_PartialObjectMatcherXML" title="3.2.8. PartialObjectMatcherXML">
+ <span class="emphasis"><em>PartialObjectMatcherXML
+ </em></span>
+ </a>
+ </p></li><li style="list-style-type: disc"><p class="Normal">
+ optional feature annotation matchers (
+ <span class="emphasis"><em>FAM</em></span>
+ ) denoted by XML elements with the tag featureAnnotationMatchers,
+ of the type
+ <a href="#_FeatureObjectMatcherXML" title="3.2.9. FeatureObjectMatcherXML">
+ <span class="emphasis"><em>FeatureObjectMatcherXML</em></span>
+ </a>
+ </p></li></ul></div><p class="Normal">
+ The
+ <span class="emphasis"><em>TAM</em></span>
+ specifies search criteria for locating Target Annotations (
+ <span class="emphasis"><em>TA</em></span>
+ s), while
+ <span class="emphasis"><em>FAM</em></span>
+ s contain criteria for locating Feature Annotations (
+ <span class="emphasis"><em>FA</em></span>
+ s) and the specification of features for extraction from the
+ <span class="emphasis"><em>FA</em></span>
+ s. The criteria for the search and the features to be extracted are
+ specified using the
+ <a href="#_Feature_path" title="3.1.1. Feature path">
+ <span class="Hyperlink1">feature path</span>
+ </a>
+ notation, as explained earlier. The XML tags representing the
+ matchers are detailed below.
+ <span class="system1"> </span>
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Arrays"></a>3.1.4.
+ Arrays
+ </h3></div></div></div><p class="Normal">
+ Since UIMA annotations may have arrays as attributes, FESL provides the
+ ability to perform feature extraction from array objects. In particular,
+ going back to Figure 2, if the implementation for the Wheels attribute is
+ a UIMA FSArray type, then using feature path notation:
+ </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
+ the feature value for the
+ <span class="emphasis"><em>Wheels</em></span>
+ attribute of
+ <span class="emphasis"><em>FSArray</em></span>
+ type can be specified as CarAnnotation:Wheels.
+ </p></li><li style="list-style-type: disc"><p class="Normal">
+ the feature value for the number of elements in the
+ <span class="emphasis"><em>FSArray</em></span>
+ can be specified as CarAnnotation:Wheels:size, where size is a
+ method of
+ <span class="emphasis"><em>FSArray</em></span>
+ ; such value corresponds to a concept of how many wheels the car
+ has.
+ </p></li><li style="list-style-type: disc"><p class="Normal">the feature values for individual elements of
+ Wheels attribute of type WheelAnnotation can be accessed as
+ CarAnnotation:Wheels:toArray. It should be noted that toArray is a
+ name of a method of the FSArray type rather than a name of an
+ attribute.</p></li><li style="list-style-type: disc"><p class="Normal">the feature values for Diameter attribute of each
+ WheelAnnotation can be specified as
+ CarAnnotation:Wheels:toArray:Diameter</p></li></ul></div><p class="Normal">
+ The result of using toArray as an accessor is an array of values. FESL
+ also provides syntax for accessing individual elements of arrays by index.
+ </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
+ the feature for the diameter of the first wheel can be specified as
+ CarAnnotation:Wheels:toArray[0]:Diameter
+ </p></li><li style="list-style-type: disc"><p class="Normal">
+ the feature for the diameter of the first and second wheels can be
+ specified as CarAnnotation:Wheels:toArray[0][1]:Diameter
+ </p></li><li style="list-style-type: disc"><p class="Normal">
+ the feature for the diameter of first three wheels can be specified
+ as CarAnnotation:Wheels:toArray[0-2]:Diameter
+ </p></li></ul></div><p class="Normal">
+ The specification of individual elements can be mixed. For example,
+ CarAnnotation:Wheels:toArray[0][2-3]:Diameter refers to the first, third,
+ and fourth elements of the Wheels attribute, that is, all but the second
+ wheel of a four-wheeled car. If the index specified falls outside
+ the range of the matched data, a null value will be assigned.
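+ </p><p class="Normal">
+ The index notation described above can be illustrated with a small Java sketch that expands a bracketed specification such as [0][2-3] into the selected elements and substitutes null for out-of-range positions. The class IndexSpecSketch and its select method are hypothetical and are not part of CFE; the sketch handles only numeric indexes and ranges, not other bracketed keywords.
+ </p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IndexSpecSketch {
    // Matches one bracketed group: either [n] or [n-m].
    private static final Pattern GROUP = Pattern.compile("\\[(\\d+)(?:-(\\d+))?\\]");

    /** Returns the elements named by spec, in order; out-of-range positions yield null. */
    static <T> List<T> select(List<T> values, String spec) {
        List<T> selected = new ArrayList<T>();
        Matcher m = GROUP.matcher(spec);
        while (m.find()) {
            int first = Integer.parseInt(m.group(1));
            int last = (m.group(2) == null) ? first : Integer.parseInt(m.group(2));
            for (int i = first; i <= last; i++) {
                selected.add(i < values.size() ? values.get(i) : null);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> wheels = Arrays.asList("w0", "w1", "w2", "w3");
        System.out.println(select(wheels, "[0][2-3]"));
    }
}
```

+ <p class="Normal">
+ For a four-element list, the specification [0][2-3] selects the elements at positions 0, 2, and 3 and skips position 1.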
+ </p><p class="Normal">
+ If required, FESL allows sorting extracted features by an offset in the
+ text of the annotations that these features are extracted from. For
+ instance, CarAnnotation:Wheels:toArray[sort]:Diameter would ensure such
+ an order.
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Parent_tag"></a>3.1.5.
+ Parent tag
+ </h3></div></div></div><p class="Normal">
+ The parent tag is used to access a specific element of a feature path of
+ a TA or FA by index. If a parent tag is used within a TAM specification,
+ it is applied to the full path of the corresponding TA. Likewise, parent
+ tags contained in FAMs are applied to the full path of the
+ corresponding FA. The tag consists of <code class="code">__p</code> prefix followed by the index
+ of an element that is being accessed. For instance, <code class="code">__p0</code> addresses the
+ first element of a feature path. The tag can be a part of a feature path.
+ For example, if a TA is specified as CarAnnotation:Wheels:toArray,
+ corresponding to a concept of "wheels of a car" then the value of the
+ Color attribute of a CarAnnotation object can be accessed by specifying
+ <code class="code">__p0:Color</code>. Such a specification can be used when it is required to
+ examine/extract features of a containing annotation along with features
+ of contained annotations. Samples of using parent tags are provided in
+ the sections that detail FESL syntax, below.
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Null_values"></a>3.1.6.
+ Null values
+ </h3></div></div></div><p class="Normal">
+ CFE allows comparing feature values for equality to null. The root XML
+ element CFEConfig has a string attribute nullValueImage that sets a
+ literal representation of a null value. If an extracted feature value is
+ null, it will be converted to the string assigned to the
+ nullValueImage attribute. The example below illustrates the usage of this
+ attribute.
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Implicit_TA_exclusion"></a>3.1.7.
+ Implicit TA exclusion
+ </h3></div></div></div><p class="Normal">
+ While all FAM specifications for a single TAM are independent from
+ each other, there is an implicit dependency between TAMs. In
+ particular, they are dependent on the order in which they are
+ specified in a configuration file. Annotations corresponding to
+ certain concepts that were identified by a TAM that appear earlier in
+ the configuration file will be excluded from further processing by
+ FESL. This rule only applies to TAMs that use the
+ <span class="emphasis"><em>fullPath</em></span>
+ attribute in their specification (see
+ <a href="#_PartialObjectMatcherXML" title="3.2.8. PartialObjectMatcherXML">
+ <span class="Hyperlink1">
+ <span class="emphasis"><em>PartialObjectMatcherXML</em></span>
+ </span>
+ </a>
+ ). Having the implicit exclusion helps to separate the processing of
+ same type annotations in the case when these annotations have
+ different semantic meaning. For instance, the set of features that is
+ required to be extracted from annotations of type
+ <span class="emphasis"><em>EngineAnnotation</em></span>
+ that are attributes of
+ <span class="emphasis"><em>CarAnnotation</em></span>
+ objects can be different than a set of features that is required to
+ be extracted from annotations of the same
+ <span class="emphasis"><em>EngineAnnotation</em></span>
+ type that are attributes of some other type or are not attached to
+ any annotations of other types. To implement such a behavior in FESL,
+ the first
+ <span class="emphasis"><em>TAM</em></span>
+ would contain criteria for locating
+ <span class="emphasis"><em>EngineAnnotation</em></span>
+ objects that are attached to objects of the
+ <span class="emphasis"><em>CarAnnotation</em></span>
+ type, while the second
+ <span class="emphasis"><em>TAM</em></span>
+ would not specify any restriction on containment of objects of the
+ <span class="emphasis"><em>EngineAnnotation</em></span>
+ type. If such a specification is given, all
+ <span class="emphasis"><em>EngineAnnotation</em></span>
+ objects located according to the rule in the first
+ <span class="emphasis"><em>TAM</em></span>
+ will be excluded from further processing and, hence, will not be
+ available for processing by rules given in the second
+ <span class="emphasis"><em>TAM</em></span>.
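+ </p><p class="Normal">
+ The ordering effect described above can be pictured with the following Java sketch, in which each matcher, taken in configuration-file order, claims the annotations it matches and later matchers see only what is left. This models the described behavior only; it is not CFE code. ExclusionSketch and assign are hypothetical names, plain predicates stand in for TAMs, and the sketch ignores the condition that the rule applies only to TAMs using the fullPath attribute.
+ </p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class ExclusionSketch {
    /** For each matcher, in order, returns the annotations it claims; claimed ones are hidden from later matchers. */
    static <A> List<List<A>> assign(List<A> annotations, List<Predicate<A>> matchersInFileOrder) {
        List<List<A>> perMatcher = new ArrayList<>();
        // Identity-based set: an annotation object is excluded, not everything equal to it.
        Set<A> claimed = Collections.newSetFromMap(new IdentityHashMap<A, Boolean>());
        for (Predicate<A> matcher : matchersInFileOrder) {
            List<A> hits = new ArrayList<>();
            for (A annotation : annotations) {
                if (!claimed.contains(annotation) && matcher.test(annotation)) {
                    hits.add(annotation);
                }
            }
            claimed.addAll(hits);
            perMatcher.add(hits);
        }
        return perMatcher;
    }

    public static void main(String[] args) {
        // Strings stand in for EngineAnnotation objects; the prefix names the hypothetical owner.
        List<String> engines = Arrays.asList("car-engine", "boat-engine");
        List<Predicate<String>> matchers = new ArrayList<>();
        matchers.add(s -> s.startsWith("car"));   // first matcher: engines attached to a car
        matchers.add(s -> s.endsWith("engine"));  // second matcher: any engine
        System.out.println(assign(engines, matchers));
    }
}
```

+ <p class="Normal">
+ Although the second matcher in this sketch accepts every engine, it receives only the engine that the first matcher did not claim.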
+ </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_FESL_Elements"></a>3.2.
+ FESL Elements
+ </h2></div></div></div><p class="Normal">
+ FESL's XSD defines several elements that allow specifying rules for feature
+ extraction. These elements may contain attributes and other elements in
+ their definition.
+ </p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_BitsetFeaturaValuesXML"></a>3.2.1.
+ BitsetFeatureValuesXML
+ </h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: bitmask[1]: Integer</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: exact_match[0..1]: boolean: default false</p></li></ul></div><p>
+ <span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-7.jpg" align="middle"></span>
+ </p><p class="Normal">
+ The specification enables comparing a feature value to an integer
+ bitmask. The feature value is considered to be matched if it is of an
+ Integer type and:
+ </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
+ if the exact_match attribute is set to true and all "1" bits specified in
+ bitmask are also set in the feature value
+ </p></li><li style="list-style-type: disc"><p class="Normal">
+ if the exact_match attribute is set to false and any of the "1" bits
+ specified in bitmask is also set in the feature value
+ </p></li></ul></div><p class="Normal">Example:</p><p class="Normal"><bitsetFeatureValues bitmask="3" exact_match="false" /></p><p class="Normal"><bitsetFeatureValues bitmask="3" exact_match="true" /></p><p class="Normal">
+ The first line of the example specifies a test of whether either of the two
+ least significant bits of a feature value is set. To be successful, the
+ test specified by the second line requires both of these bits to be set.
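+ </p><p class="Normal">
+ As a worked illustration of these rules (the feature values are chosen only
+ for the example), consider the bitmask 3, which is 011 in binary. A feature
+ value of 1 (001) passes the first test but fails the second, a value of 7 (111)
+ passes both, and a value of 4 (100) fails both. A mask that selects other bits
+ works the same way; the hypothetical fragment below requires the second and
+ third least significant bits to be set, so it matches 6 (110) and 7 (111) but
+ not 2 (010) or 4 (100):
+ </p><p class="Normal"><bitsetFeatureValues bitmask="6" exact_match="true" />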
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_EnumFeatureValuesXML"></a>3.2.2.
+ EnumFeatureValuesXML
+ </h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: caseSensitive[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Element: values[0..*]: String</p></li></ul></div><p>
+ <span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-8.jpg" align="middle"></span>
+ </p><p class="Normal">
+ The EnumFeatureValuesXML element allows testing whether a feature value belongs to
+ a finite set of values. According to the EnumFeatureValuesXML specification,
+ if a feature value is equal to any one of the elements of values, then
+ the feature is considered to be successfully evaluated. The caseSensitive
+ attribute indicates whether the comparison between the feature value and
+ members of the values element is case sensitive. The FESL fragment below
+ shows how to specify such a comparison:
+ </p><p class="Normal"><enumFeatureValues caseSensitive="true"></p><p class="Normal"><values>red</values></p><p class="Normal"><values>green</values></p><p class="Normal"><values>blue</values></p><p class="Normal"></enumFeatureValues></p><p class="Normal">
+ This fragment specifies a case sensitive comparison of a feature value to
+ a set of strings: <code class="code">red</code>, <code class="code">green</code> and <code class="code">blue</code>.
+ </p><p class="Normal">
+ Special processing occurs when the array has only a single element that
+ starts with <code class="code">file://</code>, enabling the use of external dictionaries for
+ comparison. In this case, the text within the
+ <span class="emphasis"><em>values</em></span>
+ element is treated as a URI. The contents of the file referenced by the
+ URI will be loaded and used as a set of values against which the feature
+ value is going to be tested. The file should contain one dictionary entry
+ per line; lines starting with the <code class="code">#</code> character are treated as
+ comments and are not loaded. The dictionary handling is
+ implemented in org.apache.uima.cfe.EnumeratedValueDictionary. The default
+ implementation supports single token (whitespace separated) dictionary
+ entries. If a more sophisticated dictionary format is desired, then
+ either the constructor's parameters can be changed or methods for
+ initializing and loading the dictionary from a file can be overridden.
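+ </p><p class="Normal">
+ As a sketch of the dictionary form, the fragment below refers to an external
+ file instead of listing the values inline. The file location and name are
+ hypothetical and should be replaced with a real path:
+ </p><p class="Normal"><enumFeatureValues caseSensitive="false"></p><p class="Normal"><values>file:///path/to/colors.txt</values></p><p class="Normal"></enumFeatureValues></p><p class="Normal">
+ Under the format described above, such a file could look as follows, with the
+ first line skipped as a comment:
+ </p><p class="Normal"># car colors</p><p class="Normal">red</p><p class="Normal">green</p><p class="Normal">blue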
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_ObjectPathFeatureValue"></a>3.2.3.
+ ObjectPathFeatureValuesXML
+ </h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: objectPath[1]: String</p></li></ul></div><p>
+ <span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-9.jpg" align="middle"></span>
+ </p><p class="Normal">
+ According to ObjectPathFeatureValuesXML specification, the
+ <a href="#_CFE_Basics" title="1.3. CFE Basics">TA</a>
+ or
+ <a href="#_CFE_Basics" title="1.3. CFE Basics">
+ <span class="Hyperlink1">FA</span>
+ </a>
+ itself (depending on whether this element is in
+ <a href="#_TAM_and_FAM" title="3.1.3. TAM and FAM">
+ <span class="Hyperlink1">TAM</span>
+ </a>
+ or in
+ <a href="#_TAM_and_FAM" title="3.1.3. TAM and FAM">
+ <span class="Hyperlink1">FAM</span>
+ </a>)
+ is tested whether it is at the location defined by the objectPath. This
+ ability to evaluate whether a feature belongs to some CAS object is
+ useful specifically in the cases where a particular feature value is the
+ property of several different objects. For instance, this element can be
+ used when features from annotations should be extracted only if they are
+ attributes of other annotations. The FESL fragment below specifies a test
+ that checks if an object's full path is
+ org.apache.uima.cfe.sample.CarAnnotation:Wheels:toArray. Such a test, for
+ instance, can be used to check if an instance of a WheelAnnotation
+ belongs to an instance of CarAnnotation:
+ </p><p class="Normal">
+ <objectPathFeatureValues objectPath="org.apache.uima.cfe.sample.CarAnnotation:Wheels:toArray"/>
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_PatternFeatureValuesXM"></a>3.2.4.
+ PatternFeatureValuesXML
+ </h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: pattern[1]: String</p></li></ul></div><p>
+ <span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-10.jpg" align="middle"></span>
+ </p><p class="Normal">
+ The PatternFeatureValuesXML element enables comparing a feature value
+ against a regular expression specified by the pattern attribute using
+ Java Regular Expression syntax. The feature is considered to be successfully
+ evaluated if the value matches the pattern.
+ </p><p class="Normal">
+ The FESL fragment below defines a test that checks if a feature value
+ conforms to the hex number format:
+ </p><p class="Normal"><patternFeatureValues pattern="(0[Xx][0-9A-Fa-f]+)" /></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_RangeFeatureValuesXML"></a>3.2.5.
+ RangeFeatureValuesXML
+ </h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: lowerBoundary[0..1]: Comparable: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: lowerBoundaryInclusive[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: upperBoundary[0..1]: Comparable: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: upperBoundaryInclusive[0..1]: boolean: default false</p></li></ul></div><div class="mediaobject"><span></span></div><p class="Normal">
+ According to the RangeFeatureValuesXML specification, the feature value is
+ tested for being of a Comparable type and belonging to the interval
+ specified by the attributes lowerBoundary and upperBoundary. The
+ attributes lowerBoundaryInclusive and upperBoundaryInclusive indicate
+ whether the corresponding boundaries should be included in the range for
+ comparison. The FESL fragment below specifies a test that checks if a feature
+ value is in the numeric range between 1.8 and 3.0, excluding 1.8 and including
+ 3.0:
+ </p><p class="Normal">
+ <rangeFeatureValues lowerBoundary="1.8" upperBoundaryInclusive="true" upperBoundary="3.0" /></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_SingleFeatureMatcherXML"></a>3.2.6.
+ SingleFeatureMatcherXML
+ </h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: featurePath[1]: String</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: featureTypeName[0..1]: String: no default value</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: exclude[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: quiet[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Element: featureValues one of: </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">bitsetFeatureValues: BitsetFeatureValuesXML</p></li><li style="list-style-type: disc"><p class="Normal">enumFeatureValues: EnumFeatureValuesXML</p></li><li style="list-style-type: disc"><p class="Normal">objectPathFeatureValues: ObjectPathFeatureValuesXML</p></li>
+ <li style="list-style-type: disc"><p class="Normal">patternFeatureValues: PatternFeatureValuesXML</p></li><li style="list-style-type: disc"><p class="Normal">rangeFeatureValues: RangeFeatureValuesXML</p></li></ul></div></li></ul></div><p>
+ <span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-12.jpg" align="middle"></span>
+ </p><p class="Normal">
+ The SingleFeatureMatcherXML defines rules for matching a feature value
+ to the featureValues element. The featureValues can be one of the
+ elements in the bullet list above. The previous sections detailed the rules
+ for matching a feature value to each of these elements. According to the
+ specification for matching a single feature value, first, the value of the
+ feature denoted by the required featurePath attribute is located. For
+ features that have arrays in their featurePath, multiple values can be
+ found. If such value(s) are found and the optional featureTypeName attribute
+ specifies a type name of the feature value, every found feature value is
+ tested to be of that type. If the test is successful, then the feature values
+ are evaluated according to the specification given in featureValues. After
+ the evaluation is performed, a single feature is considered to be
+ successfully evaluated if:
+ </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
+ the exclude attribute value is set to false and at least one
+ feature value is matched to featureValues specification.
+ </p></li><li style="list-style-type: disc"><p class="Normal">
+ the exclude attribute value is set to true and none of the
+ feature values is matched to featureValues specification.
+ </p></li></ul></div><p class="Normal">
+ For SingleFeatureMatcherXML elements that are part of a TAM element, only
+ evaluation of feature values is performed. If a SingleFeatureMatcherXML
+ element is part of a FAM, then the feature value is output only if the
+ quiet attribute is set to false. If the value of the quiet attribute is
+ set to true, then, even if the feature is matched, only an evaluation is
+ performed, but no value is written into the final output. The featurePath
+ attribute uses the feature path notation explained earlier.
+ </p><p class="Normal">
+ The FESL fragment below defines a test that checks if the value of the Size
+ attribute is in the range defined by the rangeFeatureValues element:
+ </p><p class="Normal"><featureMatchers featurePath="Size" featureTypeName="java.lang.Float"></p><p class="Normal"><rangeFeatureValues lowerBoundary="1.8" upperBoundaryInclusive="true" upperBoundary="3.0"/></p><p class="Normal"></featureMatchers></p><p class="Normal">
+ In addition it is allowed to use the parent tag (see
+ <a href="#_Parent_tag" title="3.1.5. Parent tag">
+ <span class="Hyperlink1">Parent tag</span>
+ </a>)
+ in the featurePath attribute. A sample in the PartialObjectMatcherXML
+ section shows how to use the parent tag notation.
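+ </p><p class="Normal">
+ As a further sketch, the exclude and quiet attributes can be combined with any
+ of the value specifications above. The fragment below is illustrative only: the
+ feature name Code is assumed for the example, and the pattern is the hex pattern
+ from the PatternFeatureValuesXML section. It succeeds when the Code value is not
+ a hex number and, inside a FAM, writes nothing to the output:
+ </p><p class="Normal"><featureMatchers featurePath="Code" exclude="true" quiet="true"></p><p class="Normal"> <patternFeatureValues pattern="(0[Xx][0-9A-Fa-f]+)" /></p><p class="Normal"></featureMatchers>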
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_GroupFeatureMatcherXML"></a>3.2.7.
+ GroupFeatureMatcherXML
+ </h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: exclude[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Element: featureMatchers[1..*]: SingleFeatureMatcherXML</p></li></ul></div><p>
+ <span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-13.jpg" align="middle"></span>
+ </p><p class="Normal">
+ This is a specification for matching a group of features. It can be applied
+ to both types of annotations, TAs and FAs. Each element in featureMatchers is
+ evaluated against either a TA or a FA annotation. The group is considered to
+ be matched if:
+ </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
+ the exclude attribute value is set to false and all elements in
+ featureMatchers have been successfully evaluated.
+ </p></li><li style="list-style-type: disc"><p class="Normal">
+ the exclude attribute value is set to true and evaluation of at least one
+ of the elements in featureMatchers is unsuccessful.
+ </p></li></ul></div><p class="Normal">
+ The FESL fragment below defines a group with the two features Color and
+ Wheels:Size to be matched. The entire group is successfully evaluated
+ if both features are matched. The first feature is successfully evaluated if
+ its value is one of the values listed by its enumFeatureValues element and
+ the second feature is matched if its value is not in the set contained in its
+ enumFeatureValues element, as specified by its <code class="code">exclude</code> attribute. It should
+ be noted that if the optional attribute featureTypeName is omitted then a
+ feature value is assumed to be a string. Otherwise the feature value's type
+ is checked to be the same as or derived from the type specified by the
+ featureTypeName attribute. Assuming the groupFeatureMatcher is specified for
+ the CarAnnotation type, the test defined by the FESL fragment below is
+ successful if a car is either red, green or blue and it does not have 1 or 3
+ wheels:
+ </p><p class="Normal"><groupFeatureMatchers></p><p class="Normal"> <featureMatchers featurePath="Color" featureTypeName="java.lang.String"> </p><p class="Normal"> <enumFeatureValues caseSensitive="true"> </p><p class="Normal"> <values>red</values> </p><p class="Normal"> <values>green</values></p><p class="Normal"> <values>blue</values></p><p class="Normal"> </enumFeatureValues></p><p class="Normal"> </featureMatchers></p><p class="Normal"> <featureMatchers featurePath="Wheels:Size" exclude="true"></p><p class="Normal"> <enumFeatureValues caseSensitive="true"></p><p class="Normal"> <values>1</values></p><p class="Normal"> <values>3</values></p><p class="Normal"> </enumFeatureValues></p><p class="Normal"> </featureMatchers></p><p class="Normal"></groupFeatureMatchers></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3
class="title"><a name="_PartialObjectMatcherXML"></a>3.2.8.
+ PartialObjectMatcherXML
+ </h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: annotationTypeName[1]: String</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: fullPath[0..1]: String: no default value</p></li><li style="list-style-type: disc"><p class="Normal">
+ Element: groupFeatureMatchers[0..*]: GroupFeatureMatcherXML
+ </p></li></ul></div><p>
+ <span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-14.jpg" align="middle"></span>
+ </p><p class="Normal">
+ This is a base specification for an annotation matcher that will search for
+ annotations of a type specified by annotationTypeName located on a path
+ specified by fullPath. If fullPath is omitted or just contains the type
+ name of an annotation (same as the annotationTypeName attribute), then all
+ instances of that type are considered for further feature value
+ evaluation. If fullPath contains a path to an object from an attribute of
+ a different object, then only instances of annotationTypeName that are
+ located on that path will be considered for further evaluation. Once an
+ annotation is successfully evaluated to match a type/path, its features
+ are evaluated according to the specifications given in all elements of
+ groupFeatureMatchers. If evaluation of any groupFeatureMatchers is
+ successful or if no groupFeatureMatchers is given, then the annotation is
+ considered to be successfully evaluated. The fullPath attribute should be
+ specified using the syntax described in the
+ <a href="#_Feature_path" title="3.1.1. Feature path">
+ <span class="Hyperlink2">feature path</span>
+ </a>
+ section above, with the exception that it cannot contain any parent tags.
+ For instance, a specification where the value of the fullPath attribute is
+ CarAnnotation:Engine and the value of the annotationTypeName is
+ EngineAnnotation would address only engines that are car engines.
+ PartialObjectMatcherXML is used to specify search rules in TAM
+ specifications. To illustrate the use of the parent tag notation, let's
+ consider an example where it is required to identify engines of red, green or blue
+ cars that have a size of more than 1.8 l but not greater than 3.0 l.
+ According to the class diagram in Figure 2, the FESL fragment below defines
+ rules for the task. It should be noted that the second feature matcher
+ uses the
+ <a href="#_Parent_tag" title="3.1.5. Parent tag">
+ <span class="Hyperlink2">parent tag</span>
+ </a> notation to access a value of the CarAnnotation's attribute Color:
+ </p><p class="Normal"><targetAnnotationMatcher annotationTypeName="EngineAnnotation" fullPath="CarAnnotation:EngineAnnotation" ></p><p class="Normal"> <groupFeatureMatchers></p><p class="Normal"> <featureMatchers featurePath="Size" featureTypeName="java.lang.Float"></p><p class="Normal"> <rangeFeatureValues lowerBoundary="1.8" upperBoundaryInclusive="true" upperBoundary="3.0"/></p><p class="Normal"> </featureMatchers></p><p class="Normal"> <featureMatchers featurePath="__p0:Color" featureTypeName="java.lang.String"></p><p class="Normal"> <enumFeatureValues caseSensitive="true"></p><p class="Normal"> <values>red</values></p><p class="Normal"> <values>green</values></p><p class="Normal"> <values>blue</values></p><p class="Normal"> </enumFeatureValues></p><p class="Normal"> </featureMatchers></p><p class="Normal"> </groupFeatureMatchers></p>
+ <p class="Normal"></targetAnnotationMatcher></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_FeatureObjectMatcherXML"></a>3.2.9.
+ FeatureObjectMatcherXML
+ </h3></div></div></div><p class="Normal">extends PartialObjectMatcherXML<span class="emphasis"><em> </em></span></p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: windowsizeLeft[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: windowsizeInside[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: windowsizeRight[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: windowsizeEnclosed[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: windowFlags[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: orientation[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: distance[0..1]: boolean: default false</p></li></ul></div><p>
+ <span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-15.jpg" align="middle"></span>
+ </p><p class="Normal">
+ The FeatureObjectMatcherXML element contains rules that specify how
+ FeatureAnnotations (FA) should be located and which features should be
+ extracted from them. It inherits its properties from
+ PartialObjectMatcherXML. In addition it has semantics for specifying:
+ </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">a size of a search window</p></li><li style="list-style-type: disc"><p class="Normal">
+ a direction for the search relative to a corresponding Target Annotation (TA).
+ </p></li></ul></div><p class="Normal">
+ This is done by using the integer attributes windowsizeLeft, windowsizeInside,
+ windowsizeRight, windowsizeEnclosed and the bitmask windowFlags attribute
+ that indicate the FA's search rules:
+ </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">windowsizeLeft - a size of the search window to the left from TA</p></li><li style="list-style-type: disc"><p class="Normal">windowsizeRight - a size of the search window to the right from TA</p></li><li style="list-style-type: disc"><p class="Normal">windowsizeInside - a size of the search window within TA boundaries; if the value of this attribute is 1, then the TA is considered to be an FA at the same time</p></li><li style="list-style-type: disc"><p class="Normal">windowFlags - more precise criteria for the search window; the value of this attribute is a bitmask with a combination of the following values:</p><div class="orderedlist"><ol type="a"><li><p class="Normal">1 - FA starts to the left from the TA and ends to the left from the TA</p></li><li><p class="Normal">2 - FA starts to the left from the TA and ends inside of TA boundaries</p></li>
+ <li><p class="Normal">4 - FA starts to the left from the TA and ends to the right from the TA</p></li><li><p class="Normal">8 - FA starts inside of the TA and ends inside of the TA boundaries</p></li><li><p class="Normal">16 - FA starts inside of the TA boundaries and ends to the right from the TA</p></li><li><p class="Normal">32 - FA starts to the right from the TA and ends to the right from the TA</p></li></ol></div></li></ul></div><p class="Normal">
+ The location of an FA is included in the generated output according to the
+ optional orientation and distance attributes. For example, if the values of
+ both of these attributes are set to true and the FA is the first annotation
+ of the required type to the left of the TA, then the generated feature value
+ will start with the prefix <code class="code">L1</code>. If the values are set to false, then the
+ feature value's prefix will be <code class="code">X0</code>. This allows generating unique
+ feature names for model building and evaluation for machine learning.
+ </p><p class="Normal">
+ FeatureObjectMatcherXML is used to specify search rules in FAM
+ specifications.
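+ </p><p class="Normal">
+ As a sketch of how these attributes combine, the fragment below searches for
+ Token FAs in a window of size 5 on each side of the TA and reports their
+ position. The type name is borrowed from the sample type system used later in
+ this guide, the window sizes are illustrative, and the element is named as in
+ the TargetAnnotationXML element list. The windowFlags value 33 is the sum of 1
+ and 32, that is, only FAs that lie entirely to the left or entirely to the right
+ of the TA; how CFE treats a flag whose corresponding window size is 0 is not
+ covered here and is left as an assumption:
+ </p><p class="Normal"><featureAnnotationMatchers annotationTypeName="org.apache.uima.cfe.sample.Token" windowsizeLeft="5" windowsizeRight="5" windowFlags="33" orientation="true" distance="true" />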
+ </p><p class="Normal">
+ The FESL fragment below adds rules to the previous sample to extract the
+ number of cylinders from engines of cars whose wheel diameter is at
+ least 20.0":
+ </p><p class="Normal"><targetAnnotationMatcher annotationTypeName="EngineAnnotation" fullPath="CarAnnotation:EngineAnnotation" ></p><p class="Normal"> <groupFeatureMatchers></p><p class="Normal"> <featureMatchers featurePath="Size" featureTypeName="java.lang.Float"></p><p class="Normal"> <rangeFeatureValues lowerBoundary="1.8" upperBoundaryInclusive="true" upperBoundary="3.0"/></p><p class="Normal"> </featureMatchers></p><p class="Normal"> <featureMatchers featurePath="__p0:Color" featureTypeName="java.lang.String"></p><p class="Normal"> <enumFeatureValues caseSensitive="true"></p><p class="Normal"> <values>red</values></p><p class="Normal"> <values>green</values></p><p class="Normal"> <values>blue</values></p><p class="Normal"> </enumFeatureValues></p><p class="Normal"> </featureMatchers></p><p class="Normal"> </groupFeatureMatchers></p>
+ <p class="Normal"></targetAnnotationMatcher></p><p class="Normal"><featureAnnotationMatcher annotationTypeName="EngineAnnotation" fullPath="CarAnnotation:EngineAnnotation" windowsizeInside="1" ></p><p class="Normal"> <groupFeatureMatchers></p><p class="Normal"> <featureMatchers featurePath="__p0:Wheels:toArray:Diameter" featureTypeName="java.lang.Float" quiet="true" ></p><p class="Normal"> <rangeFeatureValues lowerBoundary="20.0" lowerBoundaryInclusive="true"/></p><p class="Normal"> </featureMatchers></p><p class="Normal"> <featureMatchers featurePath="Cylinders" featureTypeName="java.lang.Float" /></p><p class="Normal"> </groupFeatureMatchers></p><p class="Normal"></featureAnnotationMatcher></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_TargetAnnotationXML"></a>3.2.10.
+ TargetAnnotationXML
+ </h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: className[1]: String</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: enclosingAnnotation[1]: String</p></li><li style="list-style-type: disc"><p class="Normal">Element targetAnnotationMatcher[1..1]: PartialObjectMatcherXML</p></li><li style="list-style-type: disc"><p class="Normal">
+ Element featureAnnotationMatchers[0..*]: FeatureObjectMatcherXML
+ </p></li></ul></div><p>
+ <span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-16.jpg" align="middle"></span>
+ </p><p class="Normal">
+ This is a root specification for a class (group) of annotations, all
+ extracted instances of which are assigned the same label (className) in the
+ final output. The label can be a literal string, a feature path in
+ curly brackets, or a combination of the two (e.g.
+ <code class="code">SomeText_{__p0:SomeProperty}</code>). If a feature path is used in a class name
+ label, it is required to use the parent tag notation. In such a case the
+ parent tag refers to the TA specified by the targetAnnotationMatcher
+ element. Annotations that belong to the group are searched within a span
+ of enclosingAnnotation according to the specification given in the
+ targetAnnotationMatcher (TAM) and features from matched annotations are
+ extracted according to the specification given in featureAnnotationMatchers
+ (FAM). In general, the annotation that features are extracted from could
+ be different from annotations that are matched during the search. This is
+ useful when extracting features for machine learning model building and
+ evaluation where features are selected from annotations that could be
+ located in a specific location relative to the annotation that satisfies
+ the search criteria, for instance, POS tags of 5 words to the left and
+ right of a specific word. Further feature extraction is allowed, and the
+ rules specified by the corresponding FAMs are executed, only if an annotation
+ is successfully evaluated (matched) by a TAM.
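+ </p><p class="Normal">
+ A skeleton of such a specification might look like the fragment below. It is
+ only a sketch: the name of the enclosing element (targetAnnotations) is an
+ assumption that is not defined in this section, the type names come from the
+ sample type system in the next section, and the matchers are left without
+ groupFeatureMatchers for brevity:
+ </p><p class="Normal"><targetAnnotations className="CMBegin_{__p0:Code}" enclosingAnnotation="org.apache.uima.cfe.sample.Sentence"></p><p class="Normal"> <targetAnnotationMatcher annotationTypeName="org.apache.uima.cfe.sample.Token" fullPath="org.apache.uima.cfe.sample.NamedEntity:Tokens:toArray[0]" /></p><p class="Normal"> <featureAnnotationMatchers annotationTypeName="org.apache.uima.cfe.sample.Token" windowsizeInside="1" orientation="true" distance="true" /></p><p class="Normal"></targetAnnotations></p><p class="Normal">
+ Here every first token of a NamedEntity found within a Sentence is a TA, its
+ label is built from the Code attribute of the NamedEntity on the TA's path, and
+ the TA itself serves as the FA.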
+ </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Configuration_file_sample"></a>3.3.
+ Configuration file sample
+ </h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Task_definition"></a>3.3.1.
+ Task definition
+ </h3></div></div></div><p class="Normal">
+ The sample configuration file below has been created for extracting
+ features in order to build models for a machine learning application. The
+ type system for this sample defines several UIMA annotation types:
+ </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">org.apache.uima.cfe.sample.Sentence - type that marks a sentence</p></li><li style="list-style-type: disc"><p class="Normal">org.apache.uima.cfe.sample.Token - type that marks a token with features:</p></li></ul></div><p class="Normal">pennTag: String - POS tag of a token</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">org.apache.uima.cfe.sample.NamedEntity - named entity type with features:</p></li></ul></div><p class="Normal">Code: String - specific code assigned to a named entity</p><p class="Normal">SemanticClass: String - semantic class of a named entity</p><p class="Normal">Tokens: FSArray - array of org.apache.uima.cfe.sample.Token annotations, ordered by their offset, that are included in the named entity</p>
+ <p class="Normal">The classification task is defined as follows:</p><div class="orderedlist"><ol type="a"><li><p class="Normal">
+ classify the first token of each named entity that has the semantic
+ class <code class="code">Car Maker</code> with a class label that is a composite of
+ the string <code class="code">CMBegin</code> and the value of the <code class="code">Code</code> attribute of that
+ named entity
+ </p></li><li><p class="Normal">
+ classify all other tokens of named entities of the semantic class
+ <code class="code">Car Maker</code> with a class label that is a composite of the string
+ <code class="code">CMInside</code> and the value of the <code class="code">Code</code> attribute of that named entity
+ </p></li><li><p class="Normal">classify all other tokens with a class label <code class="code">Other_Token</code></p></li></ol></div><p class="Normal">
+ To build a model for machine learning it is required to extract
+ features from surrounding tokens for all classes listed above.
+ In particular the following features are required to be extracted:
+ </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">a string literal of the token to which the class label is assigned (<code class="code">class token</code>)</p></li><li style="list-style-type: disc"><p class="Normal">
+ a string literal of each token that is located within a window of 5
+ tokens from the <code class="code">class token</code> with the exception of prepositions (POS tag
+ is IN), conjunctions (CC), determiners (DT), punctuation (POS tag is not
+ defined - null) and numbers (CD)
+ </p></li><li style="list-style-type: disc"><p class="Normal">
+ all extracted features have to be made unique by including their position
+ relative to the location of the <code class="code">class token</code>.
+ </p></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Implementation"></a>3.3.2.
+ Implementation
+ </h3></div></div></div><p class="Normal">Line 1 - a standard XML declaration that defines the XML version of the document and its encoding</p><p class="Normal">Line 2, 87 - FESL root element that references the schema and defines global variables, such as nullValueImage (see
+ <a href="#_Null_values" title="3.1.6. Null values">
+ <span class="Hyperlink1">Null values</span>
+ </a>)
+ </p><p class="Normal">Line 3-32 - rules for extracting features for first tokens of named entities.</p><p class="Normal">Line 3 - extracted features for those tokens are assigned a composite label that includes the prefix <code class="code">CMBegin_</code> plus the value of the <code class="code">Code</code> attribute of the first element of the TA's path. The search for FA is going to be performed within the boundaries of the enclosing org.apache.uima.cfe.sample.Sentence annotation</p><p class="Normal">Line 4-12 - TAM that defines rules for identifying the first TA</p><p class="Normal">Line 4 - defines the TA's type (org.apache.uima.cfe.sample.Token) and a full path to it (org.apache.uima.cfe.sample.NamedEntity:Tokens:toArray[0]). According to this path notation, the CFE will:</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">search for annotations of type org.apache.uima.cfe.sample.NamedEntity</p></li>
+ <li style="list-style-type: disc"><p class="Normal">
+ for each annotation found, access the value of its attribute
+ Tokens and, if the value is not null, call the method toArray to
+ convert the value to an array
+ </p></li><li style="list-style-type: disc"><p class="Normal">if the resulting array is not empty, its first element will be considered to be a TA </p></li></ul></div><p class="Normal">Line 5-11 - defines rules for matching a group of features for TA</p><p class="Normal">Line 6-10 - defines rules for matching a feature for this group</p><p class="Normal">Line 6 - defines that the feature value is of the type
+ java.lang.String and has the feature path __p0:SemanticClass, which
+ translates to the value of the attribute SemanticClass of the first element of
+ the TA's path (org.apache.uima.cfe.sample.NamedEntity)
+ </p><p class="Normal">Line 7-9 - defines an explicit list of values that the feature value should be in</p><p class="Normal">Line 8 - defines the value <code class="code">Car Maker</code> as the only possible value for the feature </p><p class="Normal">Line 13-17 - FAM that defines rules for identifying first FA and its feature extraction</p><p class="Normal">Line 13 - defines FA's type to be org.apache.uima.cfe.sample.Token;
+ the attribute windowsizeInside with the value 1 tells CFE to extract features from TA
+ itself (TA=FA) and setting orientation and distance attributes to true tells CFE to
+ include position information into the generated feature value
+ </p><p class="Normal">Line 14-16 - defines rules for matching a group of features for the first FA.</p><p class="Normal">Line 15 - defines rules for matching the only feature for
+ this group of the type java.lang.String and with feature path coveredText that
+ eventually will be translated by CFE to a method call of a org.apache.uima.cfe.sample.Token
+ annotation object; according to this specification the feature value will be
+ unconditionally extracted
+ </p><p class="Normal">Line 18-31 - FAM that defines rules for identifying second type of FA and its feature extraction</p><p class="Normal">Line 18 - defines FA's type to be org.apache.uima.cfe.sample.Token;
+ the attributes windowsizeLeft and windowsizeRight with the values 5 tell CFE
+ to extract features from 5 nearest annotations of this type to the left and
+ to the right from TA and having orientation and distance attributes set to
+ true tells CFE to include position information into the generated feature
+ value.
+ </p><p class="Normal">Line 19-30 - defines rules for matching a group of features for the second FA.</p><p class="Normal">Line 20 - defines rules for matching the first feature of
+ the group to be of the type java.lang.String and with the feature path
+ coveredText that eventually will be translated by CFE to a method call of a
+ org.apache.uima.cfe.sample.Token annotation object; according to this
+ specification the feature value will be unconditionally extracted
+ </p><p class="Normal">Line 21-29 - defines rules for matching the second feature of the group</p><p class="Normal">Line 21 - defines rules for matching the second feature
+ of the group to be of the type java.lang.String and with the feature path
+ pennTag that eventually will be translated by CFE to getPennTag method call
+ of a org.apache.uima.cfe.sample.Token annotation object; according to this
+ specification the feature will be evaluated against
+ <span class="Hyperlink1">enumFeatureValues</span>
+ and, as the exclude attribute is set to true:
+ </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
+ if the evaluation is successful, the feature matcher will cause the
+ parent group to be unmatched and since it is the only group in the
+ FAM, no output for this FA will be produced
+ </p></li><li style="list-style-type: disc"><p class="Normal">
+ if the evaluation is unsuccessful, this feature matcher will not affect
+ matching status of the group, so the output for FA will be generated as
+ the first matcher of the group unconditionally produces output
+ </p></li></ul></div><p class="Normal">As the
+ <span class="Hyperlink1">quiet</span>
+ attribute is set to true, the feature value extracted by the second
+ matcher will not be added to the output generated for this FA </p><p class="Normal">Line 22-28 - defines an explicit list of values that the
+ value of the second feature should be in
+ </p><p class="Normal">Line 23-27 - defines values <code class="code">IN</code>, <code class="code">CC</code>, <code class="code">DT</code>, <code class="code">CD</code>, <code class="code">null</code>
+ as possible values for the second feature; if the feature value is equal
+ to one of these values, evaluation of the enclosing feature matcher is
+ successful; if the feature value is null it will be converted to the
+ string defined by
+ <a href="#_Null_values" title="3.1.6. Null values">
+ <span class="Hyperlink1">nullValueImage</span>
+ </a>
+ (<code class="code">null</code> as set in line 2 of this sample) and as <code class="code">null</code> is one of the
+ list's elements, it will be successfully evaluated.
+ </p><p class="Normal">Line 34-63 - rules for extracting features for all tokens
+ of named entities except the first. These rules are the same as the rules
+ defined for first tokens of named entities (lines 3-32) with the following
+ exceptions:
+ </p><p class="Normal">Line 34 - defines that TAs matched by these rules will
+ be assigned a composite label that includes prefix <code class="code">CMInside_</code> plus a
+ value of the <code class="code">Code</code> attribute of the first element of the TA's path
+ </p><p class="Normal">Line 35 - sets the fullPath attribute to
+ org.apache.uima.cfe.sample.NamedEntity:Tokens:toArray that can be
+ translated as <code class="code">any token of a named entity</code>, but because of
+ <a href="#_Implicit_TA_exclusion" title="3.1.7. Implicit TA exclusion">
+ <span class="Hyperlink1">implicit TA exclusion</span>
+ </a>
+ , the TAs that were matched for first tokens of named entities by the
+ rules for previous TAM are not included into the set of TAs that will be
+ evaluated by rules for this TAM
+ </p><p class="Normal">Line 65-86 - rules for extracting features for all tokens
+ other than tokens of named entities. These rules are the same as the rules
+ defined for previous categories with the following exceptions:
+ </p><p class="Normal">Line 65 - defines that TAs matched by the enclosed
+ rules will be assigned the string label <code class="code">Other_token</code>
+ </p><p class="Normal">Line 66 - only defines a type of TAs that should be
+ processed by the corresponding TAM without fullPath attribute. Such a
+ notation can be translated as <code class="code">all tokens</code>, but because of the
+ <a href="#_Implicit_TA_exclusion" title="3.1.7. Implicit TA exclusion">
+ <span class="Hyperlink1">implicit TA exclusion</span>
+ </a>
+ , the TAs, which were matched for tokens of named entities by rules
+ defined by the previous TAMs, are not included into the set of TAs that
+ will be evaluated by rules for this TAM. So, the actual translation will
+ be <code class="code">all tokens other than tokens of named entities</code>
+ </p><div class="orderedlist"><ol type="1" compact><li><?xml version="1.0" encoding="UTF-8"?></li><li><tns:CFEConfig nullValueImage="null"
+ xmlns:tns="http://www.apache.org/uima/cfe/config"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.apache.org/uima/cfe/config CFEConfig.xsd ">
+ </li><li> <tns:targetAnnotations className="CMBegin_{__p0:Code}"
+ enclosingAnnotation="org.apache.uima.cfe.sample.Sentence">
+ </li><li> <tns:targetAnnotationMatcher
+ annotationTypeName="org.apache.uima.cfe.sample.Token"
+ fullPath="org.apache.uima.cfe.sample.NamedEntity:Tokens:toArray[0]">
+ </li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers featurePath="__p0:SemanticClass"
+ featureTypeName="java.lang.String"></li><li> <tns:enumFeatureValues></li><li> <tns:values>Car Maker</tns:values></li><li> </tns:enumFeatureValues></li><li> </tns:featureMatchers></li><li> </tns:groupFeatureMatchers></li><li> </tns:targetAnnotationMatcher></li><li> <tns:featureAnnotationMatchers annotationTypeName=
+ "org.apache.uima.cfe.sample.Token" windowsizeInside="1"
+ orientation="true" distance="true">
+ </li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers featurePath="coveredText"
+ featureTypeName="java.lang.String"/></li><li> </tns:groupFeatureMatchers></li><li> </tns:featureAnnotationMatchers></li><li> <tns:featureAnnotationMatchers annotationTypeName=
+ "org.apache.uima.cfe.sample.Token" windowsizeLeft="5"
+ windowsizeRight="5" orientation="true" distance="true">
+ </li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers
+ featurePath="coveredText" featureTypeName="java.lang.String"/>
+ </li><li> <tns:featureMatchers featurePath="pennTag"
+ featureTypeName="java.lang.String" exclude="true" quiet="true">
+ </li><li> <tns:enumFeatureValues caseSensitive="true"></li><li> <tns:values>IN</tns:values></li><li> <tns:values>CC</tns:values></li><li> <tns:values>DT</tns:values></li><li> <tns:values>CD</tns:values></li><li> <tns:values>null</tns:values></li><li> </tns:enumFeatureValues></li><li> </tns:featureMatchers></li><li> </tns:groupFeatureMatchers></li><li> </tns:featureAnnotationMatchers></li><li> </tns:targetAnnotations></li><li></li><li> <tns:targetAnnotations className="CMInside_{__p0:Code}"
+ enclosingAnnotation="org.apache.uima.cfe.sample.Sentence">
+ </li><li> <tns:targetAnnotationMatcher
+ annotationTypeName="org.apache.uima.cfe.sample.Token"
+ fullPath="org.apache.uima.cfe.sample.NamedEntity:Tokens:toArray">
+ </li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers featurePath="__p0:SemanticClass"
+ featureTypeName="java.lang.String">
+ </li><li> <tns:enumFeatureValues></li><li> <tns:values>Car Maker</tns:values></li><li> </tns:enumFeatureValues></li><li> </tns:featureMatchers></li><li> </tns:groupFeatureMatchers></li><li> </tns:targetAnnotationMatcher></li><li> <tns:featureAnnotationMatchers
+ annotationTypeName="org.apache.uima.cfe.sample.Token"
+ windowsizeInside="1" orientation="true" distance="true">
+ </li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers
+ featurePath="coveredText" featureTypeName="java.lang.String"/>
+ </li><li> </tns:groupFeatureMatchers></li><li> </tns:featureAnnotationMatchers></li><li> <tns:featureAnnotationMatchers
+ annotationTypeName="org.apache.uima.cfe.sample.Token" windowsizeLeft="5"
+ windowsizeRight="5" orientation="true" distance="true">
+ </li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers
+ featurePath="coveredText" featureTypeName="java.lang.String"/>
+ </li><li> <tns:featureMatchers
+ featurePath="pennTag" featureTypeName="java.lang.String" exclude="true" quiet="true">
+ </li><li> <tns:enumFeatureValues caseSensitive="true"></li><li> <tns:values>IN</tns:values></li><li> <tns:values>CC</tns:values></li><li> <tns:values>DT</tns:values></li><li> <tns:values>CD</tns:values></li><li> <tns:values>null</tns:values></li><li> </tns:enumFeatureValues></li><li> </tns:featureMatchers></li><li> </tns:groupFeatureMatchers></li><li> </tns:featureAnnotationMatchers></li><li> </tns:targetAnnotations></li><li></li><li> <tns:targetAnnotations className="Other_token"
+ enclosingAnnotation="org.apache.uima.cfe.sample.Sentence">
+ </li><li> <tns:targetAnnotationMatcher
+ annotationTypeName="org.apache.uima.cfe.sample.Token"/>
+ </li><li> <tns:featureAnnotationMatchers
+ annotationTypeName="org.apache.uima.cfe.sample.Token"
+ windowsizeInside="1" orientation="true" distance="true">
+ </li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers featurePath="coveredText"
+ featureTypeName="java.lang.String"/></li><li> </tns:groupFeatureMatchers></li><li> </tns:featureAnnotationMatchers></li><li> <tns:featureAnnotationMatchers
+ annotationTypeName="org.apache.uima.cfe.sample.Token"
+ windowsizeLeft="5" windowsizeRight="5" orientation="true" distance="true">
+ </li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers featurePath="coveredText"
+ featureTypeName="java.lang.String"/>
+ </li><li> <tns:featureMatchers featurePath="pennTag"
+ featureTypeName="java.lang.String" exclude="true" quiet="true">
+ </li><li> <tns:enumFeatureValues caseSensitive="true"></li><li> <tns:values>IN</tns:values></li><li> <tns:values>CC</tns:values></li><li> <tns:values>DT</tns:values></li><li> <tns:values>CD</tns:values></li><li> <tns:values>null</tns:values></li><li> </tns:enumFeatureValues></li><li> </tns:featureMatchers></li><li> </tns:groupFeatureMatchers></li><li> </tns:featureAnnotationMatchers></li><li> </tns:targetAnnotations></li><li></tns:CFEConfig></li></ol></div></div></div></div><div class="chapter" lang="en" id="_Using_CFE_for_evaluation"><div class="titlepage"><div><div><h2 class="title"><a name="_Using_CFE_for_evaluation"></a>Chapter 4.
+ Using CFE for evaluation
+ </h2></div></div></div><p class="Normal">
+ Comparison of results produced by a pipeline of UIMA annotators to a
+ <code class="code">gold standard</code> or results of two different NLP systems is a frequent
+ task. With CFE this task can be automated.
+ </p><p class="Normal">
+ The paper "CFE a system for testing, evaluation and machine learning of
+ UIMA based applications" by Sominsky, Coden and Tanenblatt describes the
+ evaluation process in detail.
+ </p></div></div></body></html>
\ No newline at end of file
Added: incubator/uima/sandbox/trunk/ConfigurableFeatureExtractor/docs/html/CFE_UG/css/stylesheet-html.css
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/ConfigurableFeatureExtractor/docs/html/CFE_UG/css/stylesheet-html.css?rev=817790&view=auto
==============================================================================
--- incubator/uima/sandbox/trunk/ConfigurableFeatureExtractor/docs/html/CFE_UG/css/stylesheet-html.css (added)
+++ incubator/uima/sandbox/trunk/ConfigurableFeatureExtractor/docs/html/CFE_UG/css/stylesheet-html.css Tue Sep 22 19:32:20 2009
@@ -0,0 +1,302 @@
+/*
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+*/
+
+html {
+ padding: 0pt;
+ margin: 0pt;
+}
+
+body {
+ margin-top: 1em;
+ margin-bottom: 1em;
+ margin-left: 16%;
+ margin-right: 8%;
+ font-size: 10.5pt;
+ line-height: 1.3;
+ font-family: "Palatino Linotype", "Times New Roman", Times, serif;
+}
+
+div {
+ margin: 0pt;
+}
+
+p {
+ text-align: left;
+ margin-bottom: .6em;
+ line-height: 1.4;
+}
+
+td { line-height: 1.2;
+ padding: .3em;
+ }
+
+hr {
+ margin-top: .6em;
+ margin-bottom: .6em;
+ margin-left: 0pt;
+ margin-right: 0pt;
+ border: 1px solid gray;
+ background: gray;
+}
+
+h2,h3,h4,h5 {
+ margin: 0 0 0.5em 0;
+ page-break-after: avoid;
+ font-family: Helvetica, Arial, sans-serif;
+ font-weight: bold;
+ color: #525D76;
+}
+
+h2 {
+ margin-left: -10%; }
+
+h2, h3, h4 { margin-top: 1em; }
+
+/* later rules of same specificity override earlier ones */
+/* cant use ">" because IE doesn't recognize */
+
+div.chapter div.titlepage h2.title {
+ margin-bottom: 1.5em;
+ font-size: 1.6em;
+ letter-spacing: -0.07ex;
+ border-top:solid black 2.25pt;
+}
+
+/* this one comes after and is therefore more specific */
+
+div.section div.titlepage h2.title { /* h2 */
+ font-size: 1.3em;
+ border-top:solid black 1.00pt;
+}
+
+h3 {
+ margin-left: -5%;
+ font-size: 1.2em;
+ border-top:solid black .75pt;
+}
+
+div.note h3, div.tip h3 {
+ margin-left: 0;
+ font-size: 1.2em;
+ border-top: none;
+ margin-top: 0em;
+}
+
+h4 {
+ font-size: 1.1em;
+}
+
+a {
+ text-decoration: underline;
+ /*color: black;*/
+}
+
+a:hover {
+ text-decoration: underline;
+ color: black;
+}
+
+h3,h4,h5 {
+ line-height: 1.3;
+ margin-top: 1.5em;
+ font-family: Arial, Sans-serif;
+}
+
+h1.title {
+ text-align: left;
+
+ margin-top: 2em;
+ margin-bottom: 2em;
+ margin-left: 0pt;
+ margin-right: 0pt;
+}
+
+h2.subtitle, h3.subtitle {
+ text-align: left;
+ margin-top: 2em;
+ margin-bottom: 2em;
+ text-transform: uppercase;
+}
+
+h3.author, p.othercredit {
+ font-size: 0.9em;
+ font-weight: normal;
+ font-style: oblique;
+ text-align: left;
+ color: #525D76;
+}
+
+td.tableSubhead {
+ font-weight: bold;
+ background-color: silver;
+}
+
+div.titlepage {
+}
+
+div.section {
+}
+
+
+div.authorgroup
+{
+ text-align: left;
+ margin-bottom: 3em;
+ display: block;
+}
+
+div.toc, div.list-of-examples, div.list-of-figures {
+
+ margin-bottom: 3em;
+}
+
+
+div.itemizedlist {
+ margin-top: 0.5em;
+ margin-bottom: 0.5em;
+}
+
+ol,ul {
+}
+
+li {
+}
+
+pre {
+ margin: .75em 0;
+ line-height: 1.25;
+ color: black;
+}
+
+pre.programlisting {
+ font-size: 9pt;
+ padding: 5pt 2pt;
+ border: 1pt solid black;
+ background: #eeeeee;
+}
+
+div.table {
+ margin: 1em;
+ padding: 0.5em;
+ text-align: center;
+}
+
+div.table table {
+ /* display: block; */ /* in firefox, breaks centering */
+ margin-left: auto; /* see http://theodorakis.net/tablecentertest.html */
+ margin-right: auto;
+}
+
+div.table td {
+ padding-right: 5px;
+ padding-left: 5px;
+}
+
+div.table p.title {
+ text-align: center;
+ margin-left: 5%;
+ margin-right: 5%;
+}
+
+p.releaseinfo, .copyright {
+ font-size: 0.9em;
+ text-align: left;
+ margin: 0px;
+ padding: 0px;
+}
+
+div.note, div.important, div.example, div.informalexample, div.tip, div.caution {
+ margin: 1em;
+ padding: 0.5em;
+ border: 1px solid gray;
+ background-color: #f8f8e0;
+}
+
+div.important th, div.note th, div.tip th {
+ text-align: left;
+ border-bottom: solid 1px gray;
+}
+
+div.navheader, div.navheader table {
+ font-family: sans-serif;
+ font-size: 12px;
+}
+
+div.navfooter, div.navfooter table {
+ font-family: sans-serif;
+ font-size: 12px;
+}
+
+div.figure, div.screenshot {
+ text-align: center; /* needed for ms5 */
+ margin-top: 1em;
+ margin-bottom: 1em;
+}
+
+div.figure table, div.screenshot table { /* see http://theodorakis.net/tablecentertest.html */
+ margin-left: auto;
+ margin-right: auto;
+}
+
+div.figure p.title {
+ text-align: center;
+ margin-left: 15%;
+ margin-right: 15%;
+}
+
+div.example p.title {
+ margin-top: 0em;
+ margin-bottom: 0.6em;
+ text-align: left;
+ padding-bottom: 0.4em;
+ border-bottom: solid 1px gray;
+}
+
+div.figure img {
+ border: 1px solid gray;
+ padding: 0.5em;
+ margin: 0.5em;
+}
+
+div.revhistory {
+ font-size: 0.8em;
+ width: 90%;
+ margin-left: 5%;
+ margin-top: 3em;
+ margin-bottom: 3em;
+}
+
+div.revhistory table {
+ font-family: sans-serif;
+ font-size: 12px;
+ border-collapse: collapse;
+}
+
+div.revhistory table tr {
+ border: solid 1px gray;
+}
+
+div.revhistory table th {
+ border: none;
+}
+
+span.bold-italic {
+ font-weight: bold;
+ font-style: italic;
+}