Posted to commits@uima.apache.org by sc...@apache.org on 2009/09/22 21:07:16 UTC
svn commit: r817775 [1/2] - in
/incubator/uima/sandbox/trunk/RegularExpressionAnnotator: ./ docs/
docs/html/ docs/html/RegexAnnotatorUserGuide/
docs/html/RegexAnnotatorUserGuide/css/ docs/html/images/
docs/html/images/RegexAnnotatorUserGuide/ docs/html...
Author: schor
Date: Tue Sep 22 19:07:01 2009
New Revision: 817775
URL: http://svn.apache.org/viewvc?rev=817775&view=rev
Log:
UIMA-1583 add new property needed for doc build, save docs for website, update pom to share more from parents, change dependencies for some 3rd party Jars
Added:
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/RegexAnnotatorUserGuide.html
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/css/
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/css/stylesheet-html.css
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/RegexAnnotatorUserGuide/
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/blank.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/10.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/10.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/11.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/11.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/12.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/12.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/13.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/13.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/14.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/14.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/15.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/15.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/2.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/2.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/3.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/3.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/4.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/4.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/5.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/5.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/6.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/6.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/7.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/7.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/8.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/8.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/9.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/9.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/caution.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/caution.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/caution.svg
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/caution.tif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/draft.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/home.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/home.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/home.svg
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/important.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/important.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/important.svg
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/important.tif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/next.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/next.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/next.svg
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/note.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/note.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/note.svg
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/note.tif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/prev.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/prev.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/prev.svg
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/tip.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/tip.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/tip.svg
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/tip.tif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/toc-blank.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/toc-minus.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/toc-plus.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/up.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/up.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/up.svg
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/warning.gif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/warning.png (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/warning.svg
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/warning.tif (with props)
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/pdf/
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/pdf/RegexAnnotatorUserGuide.pdf (with props)
Modified:
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/build_documentation.xml
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/pom.xml
Modified: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/build_documentation.xml
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/build_documentation.xml?rev=817775&r1=817774&r2=817775&view=diff
==============================================================================
--- incubator/uima/sandbox/trunk/RegularExpressionAnnotator/build_documentation.xml (original)
+++ incubator/uima/sandbox/trunk/RegularExpressionAnnotator/build_documentation.xml Tue Sep 22 19:07:01 2009
@@ -24,6 +24,7 @@
<project name="Apache UIMA Regular Expression Annotator Documentation" default="all" basedir=".">
<property name="book_name" value="RegexAnnotatorUserGuide"/>
+ <property name="artifactId" value="RegularExpressionAnnotator"/>
<import file="${basedir}/../SandboxDocs/sandbox_build.xml"/>
Added: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/RegexAnnotatorUserGuide.html
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/RegexAnnotatorUserGuide.html?rev=817775&view=auto
==============================================================================
--- incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/RegexAnnotatorUserGuide.html (added)
+++ incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/RegexAnnotatorUserGuide.html Tue Sep 22 19:07:01 2009
@@ -0,0 +1,1193 @@
+<html><head>
+ <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
+ <title>Apache UIMA Regular Expression Annotator Documentation</title><link rel="stylesheet" href="css/stylesheet-html.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.72.0"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="book" lang="en" id="d0e2"><div class="titlepage"><div><div><h1 class="title"><a name="d0e2"></a>
+ Apache UIMA Regular Expression Annotator Documentation
+ </h1></div><div><div class="authorgroup"><h3 class="corpauthor">Authors: The Apache UIMA Development Community</h3></div></div><div><span class="productname">Apache UIMA Sandbox<br></span></div><div><p class="releaseinfo">Version 2.3.0</p></div><div><p class="copyright">Copyright © 2008, 2009 The Apache Software Foundation</p></div><div><div class="legalnotice"><a name="d0e15"></a><p> </p><p><b>Incubation Notice and Disclaimer. </b>Apache UIMA is an effort undergoing incubation at the Apache Software Foundation (ASF).
+ Incubation is required of all newly accepted projects until a further review indicates that
+ the infrastructure, communications, and decision making process have stabilized in a manner
+ consistent with other successful ASF projects. While incubation status is not necessarily
+ a reflection of the completeness or stability of the code,
+ it does indicate that the project has yet to be fully endorsed by the ASF.</p><p> </p><p> </p><p><b>License and Disclaimer. </b>The ASF licenses this documentation
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this documentation except in compliance
+ with the License. You may obtain a copy of the License at
+
+ </p><div class="blockquote"><blockquote class="blockquote"><p>
+ <a xmlns:xlink="http://www.w3.org/1999/xlink" href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a>
+ </p></blockquote></div><p>
+
+ Unless required by applicable law or agreed to in writing,
+ this documentation and its contents are distributed under the License
+ on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ </p><p> </p><p> </p><p><b>Trademarks. </b>All terms mentioned in the text that are known to be trademarks or
+ service marks have been appropriately capitalized. Use of such terms
+ in this book should not be regarded as affecting the validity of the
+ trademark or service mark.
+ </p></div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="preface"><a href="#d0e54">Introduction</a></span></dt><dt><span class="chapter"><a href="#sandbox.regexAnnotator.processingOverview">1. Processing Overview</a></span></dt><dt><span class="chapter"><a href="#sandbox.regexAnnotator.conceptsFile">2. Concepts Configuration File</a></span></dt><dd><dl><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.rules">2.1. RuleSet definition</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.concepts">2.2. Concept definition</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.regexVariables">2.3. Regex Variables</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.rulesDefinition">2.4. Rule Definition</a></span></dt><dd><dl>
+<dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.rulesDefinition.filter">2.4.1. Match Type Filter</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.rulesDefinition.update">2.4.2. Update Match Type Annotations With Additional Features</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.rulesDefinition.exception">2.4.3. Rule exception</a></span></dt></dl></dd><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation">2.5. Annotation Creation</a></span></dt><dd><dl><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.boundaries">2.5.1. Annotation Boundaries</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.validation">2.5.2. Annotation Validation</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.features">2.5.3. Annotation Features</a></span></dt></dl></dd></dl></dd>
+<dt><span class="chapter"><a href="#sandbox.regexAnnotator.annotatorDescriptor">3. Annotator Descriptor</a></span></dt><dd><dl><dt><span class="section"><a href="#sandbox.regexAnnotator.annotatorDescriptor.configParam">3.1. Configuration Parameters</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.annotatorDescriptor.capabilities">3.2. Capabilities</a></span></dt></dl></dd><dt><span class="appendix"><a href="#sandbox.regexAnnotator.xsd">A. Concept File Schema</a></span></dt><dt><span class="appendix"><a href="#sandbox.regexAnnotator.Validation">B. Validation Interface</a></span></dt><dt><span class="appendix"><a href="#sandbox.regexAnnotator.Normalization">C. Normalization Interface</a></span></dt></dl></div><div class="preface" lang="en" id="d0e54"><div class="titlepage"><div><div><h2 class="title"><a name="d0e54"></a>Introduction</h2></div></div></div><p>
+ The Regular Expression Annotator (RegexAnnotator) is an
+ Apache UIMA analysis engine that detects entities such as
+ email addresses, URLs, phone numbers, zip codes or any other
+ entity that can be specified using a regular expression. For
+ each detected entity, a new annotation can be
+ created or an already existing annotation can be updated
+ with new features.
+
+ To detect more difficult and complex entities as well, the
+ annotator provides some advanced filter capabilities and a
+ rule definition syntax that can combine rules into a concept,
+ with a confidence value for each of the concept's rules.
+ </p></div><div class="chapter" lang="en" id="sandbox.regexAnnotator.processingOverview"><div class="titlepage"><div><div><h2 class="title"><a name="sandbox.regexAnnotator.processingOverview"></a>Chapter 1. Processing Overview</h2></div></div></div><p>
+ To detect any kind of entity, the RegexAnnotator must be
+ configured using an external XML file. We call this file the
+ "concept file" since it contains the regular expressions and
+ concepts that the annotator uses during its processing to
+ detect entities. In addition to the rules, the concept file
+ also contains the "entity result processing" that is performed
+ when an entity is detected. The "entity result processing" can
+ either be the creation of new annotations or an update of an
+ existing annotation with additional features. The types and
+ features that are used to create new annotations have to be
+ available in the UIMA type system.
+ </p><p>
+ After the concept file is created, the annotator XML
+ descriptor has to be updated with the capabilities and, if
+ necessary, with the type system information from the concept
+ file. The capability update is necessary so that the UIMA
+ framework can also call the annotator in complex annotator
+ flows, where the annotator is assembled with others into an
+ analysis bundle. The UIMA type system update is only
+ necessary if the types used are not yet available in the UIMA
+ type system definition.
+ </p><p>
+ Once the descriptor updates are complete, the
+ RegexAnnotator is ready to use. During initialization,
+ the annotator reads the concept
+ file and checks that all rules and concepts are valid and that
+ all annotation types are defined in the UIMA type system.
+ For each document that is processed, the rules and concepts
+ are executed in exactly the same order as defined in the
+ concept file. The results and annotations created by a
+ preceding rule are available to the following ones since they are
+ stored in the CAS.
+ </p></div><div class="chapter" lang="en" id="sandbox.regexAnnotator.conceptsFile"><div class="titlepage"><div><div><h2 class="title"><a name="sandbox.regexAnnotator.conceptsFile"></a>Chapter 2. Concepts Configuration File</h2></div></div></div><p>
+ The RegexAnnotator can be configured using two levels of
+ complexity.
+ </p><p>
+ The RuleSet definition is the easier way to define rules.
+ Such a definition consists of a regular expression pattern
+ and of annotations that should be created if the rule matches
+ an entity.
+ </p><p>
+ The Concept definition is the more complex way to define
+ rules. Such a definition can consist of more than one
+ regular expression rule, combined with
+ a set of annotations that should be created if one of the
+ rules has matched an entity.
+ </p><p>
+ The syntax for both definitions is the same, so you don't
+ need to learn two configuration formats. The RuleSet
+ definition simply offers an easier and faster
+ way to configure the annotator for simple tasks. A RuleSet
+ definition can also be extended step by step with
+ more features until it becomes a full Concept
+ definition.
+ </p><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.conceptsFile.rules"></a>2.1. RuleSet definition</h2></div></div></div><p>
+ The syntax of a simple RuleSet definition to detect email addresses
+ is shown in the listing below:
+ </p><p>
+ </p><pre class="programlisting"><conceptSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:noNamespaceSchemaLocation="concept.xsd">
+
+ <concept name="emailAddressDetection">
+ <rules>
+ <rule regEx="([a-zA-Z0-9!#$%*+'/=?^_-`{|}~.\x26]+)@
+ ([a-zA-Z0-9._-]+[a-zA-Z]{2,4})"
+ matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"/>
+ </rules>
+ <createAnnotations>
+ <annotation id="emailAnnot" type="org.apache.uima.EmailAddress">
+ <begin group="0"/>
+ <end group="0"/>
+ </annotation>
+ </createAnnotations>
+ </concept>
+
+</conceptSet>
+</pre><p>
+ </p><p>
+ The definition above defines a simple concept
+ with the name <code class="code">emailAddressDetection</code>. The
+ defined rule uses <code class="code">([a-zA-Z0-9!#$%*+'/=?^_-`{|}~.\x26]+)@([a-zA-Z0-9._-]+[a-zA-Z]{2,4})</code> as
+ the regular expression pattern, which is matched against the
+ covered text of the match type <code class="code">uima.tcas.DocumentAnnotation</code>.
+ The match strategy is <code class="code">matchAll</code>, which means that all
+ matches for the pattern are used to create the
+ annotations defined in the
+ <code class="code"><createAnnotations></code>
+ element. So for each match, an
+ <code class="code">org.apache.uima.EmailAddress</code> annotation is created that
+ covers the match in the document text.
+ </p><p>
+ For additional annotation creation possibilities such as adding
+ features to a created annotation, please refer to
+ <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation" title="2.5. Annotation Creation">Section 2.5, “Annotation Creation”</a>
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.conceptsFile.concepts"></a>2.2. Concept definition</h2></div></div></div><p>The syntax of a complex Concept definition to detect credit card numbers for the
+ RegexAnnotator is shown in the listing below:</p><p>
+
+ </p><pre class="programlisting"><conceptSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:noNamespaceSchemaLocation="concept.xsd">
+
+ <concept name="creditCardNumberDetection" processAllRules="true">
+ <rules>
+ <rule ruleId="AmericanExpress"
+ regEx="(((34|37)\d{2}[- ]?)(\d{6}[- ]?)\d{5})"
+ matchStrategy="matchAll"
+ matchType="uima.tcas.DocumentAnnotation"
+ confidence="1.0"/>
+ <rule ruleId="Visa"
+ regEx="((4\d{3}[- ]?)(\d{4}[- ]?){2}\d{4})"
+ matchStrategy="matchAll"
+ matchType="uima.tcas.DocumentAnnotation"
+ confidence="1.0"/>
+ <rule ruleId="MasterCard"
+ regEx="((5[1-5]\d{2}[- ]?)(\d{4}[- ]?){2}\d{4})"
+ matchStrategy="matchAll"
+ matchType="uima.tcas.DocumentAnnotation"
+ confidence="1.0"/>
+ <rule ruleId="unknownCardType"
+ regEx="(([1-6]\d{3}[- ])(\d{4}[- ]){2}\d{4})|
+ ([1-6]\d{13,18})|([1-6]\d{3}[- ]\d{6}[- ]\d{5})"
+ matchStrategy="matchAll"
+ matchType="uima.tcas.DocumentAnnotation"
+ confidence="1.0"/>
+ </rules>
+ <createAnnotations>
+ <annotation id="creditCardNumber"
+ type="org.apache.uima.CreditCardNumber"
+ validate="org.apache.uima.annotator.regex.
+ extension.impl.CreditCardNumberValidator">
+ <begin group="0"/>
+ <end group="0"/>
+ <setFeature name="confidence" type="Confidence"/>
+ <setFeature name="cardType" type="RuleId"/>
+ </annotation>
+ </createAnnotations>
+ </concept>
+
+</conceptSet>
+</pre><p>
+
+ </p><p>
+ As you can see, the Concept definition is a more complex
+ RuleSet definition. The main differences are some additional
+ features defined at the rule and the combination of rules
+ within one concept.
+ The new features for a rule are <code class="code">ruleId</code>
+ and <code class="code">confidence</code>. If these features
+ are specified, their values can
+ later be assigned to a feature of a created annotation.
+ In the listing above, this means that when the
+ <code class="code">org.apache.uima.CreditCardNumber</code> annotation is created, the value of the
+ <code class="code">confidence</code> feature of the rule that matched the document text
+ is assigned to the annotation feature called <code class="code">confidence</code>.
+ In the same way, the value of the <code class="code">ruleId</code> feature is assigned to the annotation feature <code class="code">cardType</code>.
+ This lets you check the confidence of an annotation later and see
+ which rule was responsible for its creation.
+ </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
+ The annotation features for <code class="code">Confidence</code>
+ and <code class="code">RuleId</code>
+ have to be created manually in the UIMA type system.
+ This makes it possible to assign the <code class="code">confidence</code> and <code class="code">ruleId</code>
+ values to any annotation feature you have defined
+ in the UIMA type system. Confidence features have to be of type
+ <code class="code">uima.cas.Float</code> and RuleId features have to be of
+ type <code class="code">uima.cas.String</code>.
+ </p></div><p>
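+ As a rough sketch of what such a manual definition might look like,
+ the fragment below declares both features in a UIMA type system
+ descriptor. The feature names <code class="code">confidence</code> and
+ <code class="code">cardType</code> are taken from the
+ <code class="code"><setFeature></code> elements of the listing above; the
+ supertype <code class="code">uima.tcas.Annotation</code> is an assumption
+ made for this example.
+ </p><pre class="programlisting"><typeDescription>
+ <name>org.apache.uima.CreditCardNumber</name>
+ <!-- supertype is an assumption for this sketch -->
+ <supertypeName>uima.tcas.Annotation</supertypeName>
+ <features>
+ <!-- receives the rule's confidence value -->
+ <featureDescription>
+ <name>confidence</name>
+ <rangeTypeName>uima.cas.Float</rangeTypeName>
+ </featureDescription>
+ <!-- receives the rule's ruleId value -->
+ <featureDescription>
+ <name>cardType</name>
+ <rangeTypeName>uima.cas.String</rangeTypeName>
+ </featureDescription>
+ </features>
+</typeDescription>
+</pre><p>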
+ The processing of a concept definition depends on the rule processing.
+ The feature that controls the rule processing is called
+ <code class="code">processAllRules</code> and is specified at the <code class="code"><concept></code> element.
+ By default this optional feature is set to <code class="code">false</code>.
+ This means that the concept processing
+ starts with the first rule and continues with the next one
+ until a match is found. In this processing mode, it is possible that only the first rule
+ of a concept is evaluated, namely if it already produces a match. The other rules
+ of this concept are ignored in that case.
+ This strategy should be used, for example, if your first concept
+ rule has a strict pattern with a confidence of 1.0 and your
+ second rule has a more lenient pattern with a confidence
+ of 0.5. If the <code class="code">processAllRules</code> feature
+ is set to <code class="code">true</code>, all rules of a concept are processed
+ regardless of whether a previous rule matched.
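+ </p><p>
+ The strict-then-lenient strategy could be sketched as follows. The
+ concept name, the rule IDs, both patterns, the annotation type
+ <code class="code">org.apache.uima.PhoneNumber</code> and its feature names
+ are hypothetical and only serve as an illustration.
+ </p><pre class="programlisting"><!-- processAllRules is omitted, so the default "false" applies -->
+<concept name="phoneNumberDetection">
+ <rules>
+ <!-- strict pattern, e.g. (555) 123-4567 -->
+ <rule ruleId="strictPattern"
+ regEx="\(\d{3}\) \d{3}-\d{4}"
+ matchStrategy="matchAll"
+ matchType="uima.tcas.DocumentAnnotation"
+ confidence="1.0"/>
+ <!-- lenient pattern, e.g. 555 123 4567 or 5551234567 -->
+ <rule ruleId="lenientPattern"
+ regEx="\d{3}[- ]?\d{3}[- ]?\d{4}"
+ matchStrategy="matchAll"
+ matchType="uima.tcas.DocumentAnnotation"
+ confidence="0.5"/>
+ </rules>
+ <createAnnotations>
+ <annotation id="phoneNumber" type="org.apache.uima.PhoneNumber">
+ <begin group="0"/>
+ <end group="0"/>
+ <setFeature name="confidence" type="Confidence"/>
+ <setFeature name="ruleId" type="RuleId"/>
+ </annotation>
+ </createAnnotations>
+</concept>
+</pre><p>
+ With this configuration the lenient rule is only evaluated if the
+ strict rule finds no match, so an annotation with a confidence of
+ 0.5 signals that only the weaker pattern applied.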
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.conceptsFile.regexVariables"></a>2.3. Regex Variables</h2></div></div></div><p>
+ Regex variables allow you to externalize parts of a regular expression
+ to shorten it and make it easier to read. The externalized part of the
+ expression is replaced with a regex variable. The variable syntax looks like
+ <code class="code">\v{weekdays}</code>, where <code class="code">weekdays</code> is the variable name.
+ The main use of regex variables is to separate enumerations from a
+ regular expression to make them easier to understand and maintain.
+ The short example below shows how this works.
+ </p><p>
+ A simple regular expression for a date like <code class="code">Wednesday, November 28, 2007</code>
+ can look like:
+ </p><p>
+ </p><pre class="programlisting"><span class="emphasis"><em><concept name="Date" processAllRules="true">
+ <rules>
+ <rule regEx="(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday),
+ (January|February|March|April|May|June|July|August|September|October|
+ November|December) (0[1-9]|[12][0-9]|3[01]), ((19|20)\d\d)"
+ matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"/>
+ </rules>
+ <createAnnotations>
+ <annotation type="org.apache.uima.Date">
+ <begin group="0" />
+ <end group="0" />
+ </annotation>
+ </createAnnotations>
+</concept>
+</em></span></pre><p>
+ </p><p>
+ When using regex variables to externalize the weekdays and the months in this
+ regular expression, it looks like:
+ </p><p>
+ </p><pre class="programlisting"><span class="emphasis"><em><conceptSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xmlns="http://incubator.apache.org/uima/regex">
+
+<variables>
+ <variable name="weekdays"
+ value="Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"/>
+
+ <variable name="months"
+ value="January|February|March|April|May|June|July|August|September|
+ October|November|December"/>
+</variables>
+
+
+<concept name="Date" processAllRules="true">
+ <rules>
+ <rule regEx="(\v{weekdays}), (\v{months}) (0[1-9]|[12][0-9]|3[01]),
+ ((19|20)\d\d)"
+ matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"/>
+ </rules>
+ <createAnnotations>
+ <annotation type="org.apache.uima.Date">
+ <begin group="0" />
+ <end group="0" />
+ </annotation>
+ </createAnnotations>
+</concept>
+
+</conceptSet>
+</em></span></pre><p>
+ </p><p>
+ The regex variables must be defined at the beginning of the concept file,
+ directly inside the <code class="code"><conceptSet></code> element and before the concepts are
+ defined. The variables can be used in all concept definitions within the
+ same file.
+ </p><p>
+ The regex variable name can contain any of the following characters
+ <code class="code">[a-zA-Z_0-9]</code>. Other characters are not allowed.
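+ </p><p>
+ As an illustration of both points, the sketch below defines one
+ variable and uses it in two concepts of the same file. The variable
+ name <code class="code">months_short</code>, the concept names and the
+ patterns are made up for this example.
+ </p><pre class="programlisting"><conceptSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xmlns="http://incubator.apache.org/uima/regex">
+
+<variables>
+ <!-- hypothetical variable; the name uses only [a-zA-Z_0-9] -->
+ <variable name="months_short"
+ value="Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"/>
+</variables>
+
+<!-- matches e.g. "Nov 28, 2007" -->
+<concept name="ShortDate">
+ <rules>
+ <rule regEx="(\v{months_short}) (0[1-9]|[12][0-9]|3[01]), ((19|20)\d\d)"
+ matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"/>
+ </rules>
+ <createAnnotations>
+ <annotation type="org.apache.uima.Date">
+ <begin group="0" />
+ <end group="0" />
+ </annotation>
+ </createAnnotations>
+</concept>
+
+<!-- matches e.g. "Nov 2007" and reuses the same variable -->
+<concept name="MonthAndYear">
+ <rules>
+ <rule regEx="(\v{months_short}) ((19|20)\d\d)"
+ matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"/>
+ </rules>
+ <createAnnotations>
+ <annotation type="org.apache.uima.Date">
+ <begin group="0" />
+ <end group="0" />
+ </annotation>
+ </createAnnotations>
+</concept>
+
+</conceptSet>
+</pre><p>
+ Keeping the enumeration in one place means that a change to the
+ list of months only has to be made once.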
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.conceptsFile.rulesDefinition"></a>2.4. Rule Definition</h2></div></div></div><p>
+ This section shows in detail how to define a rule for a
+ RuleSet or Concept definition and describes some advanced
+ configuration options for the rule processing.
+ </p><p>
+ The listing below shows an abstract rule definition with
+ all possible sub-elements and attributes. Please refer to
+ the subsections for details about the sub-elements.
+ </p><p>
+</p><pre class="programlisting"><span class="emphasis"><em><rule ruleId="ID1" regEx="TestRegex" matchStrategy="matchAll"
+ matchType="uima.tcas.DocumentAnnotation" featurePath="my/feature/path"
+ confidence="1.0">
+
+ <matchTypeFilter>
+ <feature featurePath="language">en</feature>
+ </matchTypeFilter>
+
+ <updateMatchTypeAnnotation>
+ <setFeature name="language" type="String">$0</setFeature>
+ </updateMatchTypeAnnotation>
+
+ <ruleExceptions>
+ <exception matchType="uima.tcas.DocumentAnnotation">
+ ExceptionExpression
+ </exception>
+ </ruleExceptions>
+
+</rule>
+</em></span></pre><p>
+ </p><p>
+ For each rule that should be added, a <code class="code"><rule></code> element
+ has to be created. The <code class="code"><rule></code> element definition has three
+ mandatory features:
+ </p><p>
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code">regEx</code>
+ - The regular expression pattern that
+ is used for this rule. As pattern, everything supported
+ by the Java regular expression syntax is allowed.
+ </p></li><li><p>
+ <code class="code">matchStrategy</code>
+ - The match strategy that is used
+ for this rule. Possible values are
+ <code class="code">matchAll</code>
+ to get all matches,
+ <code class="code">matchFirst</code>
+ to get the first match only and
+ <code class="code">matchComplete</code>
+ to get only matches where the whole input
+ text matches the regular expression pattern.
+ </p></li><li><p>
+ <code class="code">matchType</code>
+ - The annotation type that is used
+ to match the regular expression pattern.
+ The covered text of the annotation
+ is used as input text for the match, unless an additional <code class="code">featurePath</code>
+ feature is specified.
+ </p></li></ul></div><p>
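+ To make the difference between the three match strategies more
+ concrete, the sketch below applies the same pattern with each
+ strategy. The five-digit pattern is only an illustration, and the
+ third rule assumes that a <code class="code">uima.TokenAnnotation</code>
+ type is available in the type system.
+ </p><pre class="programlisting"><!-- every five-digit sequence in the document text is a match -->
+<rule regEx="\d{5}" matchStrategy="matchAll"
+ matchType="uima.tcas.DocumentAnnotation"/>
+
+<!-- only the first five-digit sequence in the document text is a match -->
+<rule regEx="\d{5}" matchStrategy="matchFirst"
+ matchType="uima.tcas.DocumentAnnotation"/>
+
+<!-- a token is a match only if its whole covered text is five digits -->
+<rule regEx="\d{5}" matchStrategy="matchComplete"
+ matchType="uima.TokenAnnotation"/>
+</pre><p>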
+ </p><p>
+ In addition to the mandatory features, the <code class="code"><rule></code>
+ element definition also has some optional features:
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code">ruleId</code>
+ - Specifies the ID for this rule. The
+ ID can later be assigned as the
+ value of an annotation feature (see
+ <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.features" title="2.5.3. Annotation Features">Section 2.5.3, “Annotation Features”</a>).
+ </p></li><li><p>
+ <code class="code">confidence</code>
+ - Specifies the confidence value of this
+ rule. If you have more than one rule that describes
+ the same complex entity, you can rank the rules with
+ a confidence value. This confidence value
+ can later be assigned as the value of an
+ annotation feature (see
+ <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.features" title="2.5.3. Annotation Features">Section 2.5.3, “Annotation Features”</a>).
+ </p></li><li><p>
+ <code class="code">featurePath</code>
+ - Specifies the feature path that should be used to match the regular expression pattern.
+ If a feature path is specified, the feature path value is used to match against the
+ regular expression instead of the match type annotation span.
+ The defined feature path must be valid for the specified match type annotation type.
+ The feature path elements are separated by "/".
+ </p><p>
+ The listing below shows how to match a regular expression on the <code class="code">normalizedText</code>
+ feature of a <code class="code">uima.TokenAnnotation</code>. In this case the regular expression is matched
+ against the <code class="code">normalizedText</code> feature value of the annotation, not against the
+ covered text of the <code class="code">uima.TokenAnnotation</code>. The <code class="code">normalizedText</code>
+ feature must be defined in the UIMA type system as a feature of the type <code class="code">uima.TokenAnnotation</code>.
+ </p><p>
+ </p><pre class="programlisting"><span class="emphasis"><em><rule regEx="TestRegex" matchStrategy="matchAll"
+ matchType="uima.TokenAnnotation" featurePath="normalizedText">
+</rule>
+</em></span></pre><p>
+ </p></li></ul></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sandbox.regexAnnotator.conceptsFile.rulesDefinition.filter"></a>2.4.1. Match Type Filter</h3></div></div></div><p>
+ </p><pre class="programlisting"><span class="emphasis"><em><matchTypeFilter>
+ <feature featurePath="language">en</feature>
+</matchTypeFilter>
+</em></span></pre><p>
+
+
+ </p><p>
+ Match type filters can be used to filter the match type
+ annotations that are used for matching the regular expression
+ pattern, for example to apply a rule only when the document language
+ is English, as shown in the example above.
+ Match type filters always relate to the <code class="code">matchType</code>
+ that was specified for the rule.
+ </p><p>
+ The <code class="code"><matchTypeFilter></code>
+ element can contain any number of
+ <code class="code"><feature></code>
+ elements that contain the filter information. All specified <code class="code"><feature></code>
+ elements have to be valid for the <code class="code">matchType</code> annotation
+ of the rule.
+ </p><p>
+ The feature path that should be used as a
+ filter is specified using the <code class="code">featurePath</code> feature of the
+ <code class="code"><feature></code> element. Feature path elements are separated by "/", e.g.
+ my/feature/path. The specified feature path must be valid for the <code class="code">matchType</code> annotation.
+ The content of the
+ <code class="code"><feature></code> element contains the regular expression pattern
+ that is used as the filter. To pass the filter, this pattern
+ has to match the feature path value that is resolved using the match type annotation.
+ In the example above the match type annotation has a UIMA feature called
+ <code class="code">language</code> that has to have the content <code class="code">en</code>. If it
+ does, the annotation passes the filter condition.
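The filter check described above can be sketched in a few lines of Java. This is a hedged illustration only: the class MatchTypeFilterSketch is hypothetical, a plain Map stands in for the feature values of a real match type annotation, and the sketch assumes that every filter must pass, that an unset feature fails the filter, and that the pattern has to cover the whole feature value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration of a match type filter check.
final class MatchTypeFilterSketch {

    // featureValues: resolved feature path -> value (stand-in for a real annotation).
    // filters: feature path -> regular expression from the <feature> element content.
    static boolean passesFilter(Map<String, String> featureValues,
                                Map<String, String> filters) {
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            String value = featureValues.get(filter.getKey());
            // Assumption: an unset feature fails the filter, and the
            // pattern must match the complete feature value.
            if (value == null || !Pattern.matches(filter.getValue(), value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> annotation = new HashMap<>();
        annotation.put("language", "en");

        Map<String, String> filters = new HashMap<>();
        filters.put("language", "en");

        System.out.println(passesFilter(annotation, filters));
    }
}
```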
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sandbox.regexAnnotator.conceptsFile.rulesDefinition.update"></a>2.4.2. Update Match Type Annotations With Additional Features</h3></div></div></div><p>
+ </p><pre class="programlisting"><span class="emphasis"><em><updateMatchTypeAnnotation>
+ <setFeature name="language" type="String">$0</setFeature>
+</updateMatchTypeAnnotation>
+</em></span></pre><p>
+ </p><p>
+ With the
+ <code class="code"><updateMatchTypeAnnotation></code>
+ construct it is possible to update or set a UIMA feature value
+ for the match type annotation when a rule match
+ is found. The
+ <code class="code"><updateMatchTypeAnnotation></code> element
+ can have any number of
+ <code class="code"><setFeature></code> elements that contain
+ the feature information that should be updated.
+ </p><p>
+ The <code class="code"><setFeature></code> element has two
+ mandatory features:
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code">name</code>
+ - Specifies the UIMA feature name that
+ should be set. The feature has to be defined
+ on the <code class="code">matchType</code> annotation
+ of the rule.
+ </p></li><li><p>
+ <code class="code">type</code>
+ - Specifies the UIMA feature type that is
+ defined in the UIMA type system for this feature.
+ Currently supported feature types are <code class="code">String</code>,
+ <code class="code">Integer</code> and <code class="code">Float</code>.
+ </p></li></ul></div><p>
+ The optional features are:
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code">normalization</code>
+ - Specifies the normalization that should be performed before the feature value
+ is assigned to the match type annotation. For a list of all built-in
+ normalization functions please refer to
+ <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.featureNormalization" title="2.5.3.2. Features Value Normalization">Section 2.5.3.2, “Features Value Normalization”</a>.
+ </p></li><li><p>
+ <code class="code">class</code>
+ - Specifies the custom normalization class that should be used to normalize the
+ feature value before it is assigned to the match type annotation. Custom normalization
+ classes are used if the <code class="code">normalization</code> feature has the value
+ <code class="code">Custom</code>. The normalization class has to implement the
+ <code class="code">org.apache.uima.annotator.regex.extension.Normalization</code> interface.
+ For details about the feature normalization please refer to
+ <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.featureNormalization" title="2.5.3.2. Features Value Normalization">Section 2.5.3.2, “Features Value Normalization”</a>.
+ </p></li></ul></div><p>
+ The content of the <code class="code"><setFeature></code>
+ element definition contains the feature value that should be set.
+ This can either be a literal value or a regular
+ expression capturing group as shown in the example
+ above. A combination of capturing groups and literals
+ is also possible.
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sandbox.regexAnnotator.conceptsFile.rulesDefinition.exception"></a>2.4.3. Rule exception</h3></div></div></div><p>
+
+ </p><pre class="programlisting"><span class="emphasis"><em><ruleExceptions>
+ <exception matchType="uima.tcas.DocumentAnnotation">
+ ExceptionPattern
+ </exception>
+</ruleExceptions>
+</em></span></pre><p>
+
+ </p><p>
+ With the
+ <code class="code"><ruleExceptions></code>
+ construct it is possible to configure exceptions to prevent matches for the rule.
+ An exception is similar to a filter, but works on a higher level. For
+ example, take the scenario where you have several token annotations that
+ are covered by a sentence annotation. You have written a rule that can detect
+ car brands. The text you analyze has the sentence "Henry Ford was born 1863".
+ When analyzing the text you will get a car brand annotation since "Ford" is
+ a car brand. But is this the correct behavior? To work around that issue
+ you can create an exception that looks like
+ </p><pre class="programlisting"><span class="emphasis"><em><ruleExceptions>
+ <exception matchType="uima.SentenceAnnotation">Henry</exception>
+</ruleExceptions>
+</em></span></pre><p>
+ and add it to your car brand rule. After adding this, car brand annotations
+ are only created if the sentence annotation that covers the token annotation
+ does not contain the word "Henry".
+ </p><p>
+ The <code class="code"><ruleExceptions></code> element can have
+ any number of <code class="code"><exception></code>
+ elements to specify rule exceptions.
+ </p><p>
+ The <code class="code"><exception></code>
+ element has one mandatory feature called
+ <code class="code">matchType</code>. The <code class="code">matchType</code> feature
+ specifies the annotation type the exception is based on.
+ At runtime, the concrete exception annotation is determined for each
+ match type annotation that is used to match a rule: the exception
+ annotation is always the annotation of the exception match type that covers
+ the current match type annotation.
+ If no covering annotation instance of the exception match type
+ is found, the exception is not evaluated.
+ </p><p>
+ The content of the <code class="code"><exception></code>
+ element specifies the regular expression that is used to evaluate the exception.
+ </p><p>
+ If the exception pattern matches, the
+ current match type annotation is filtered out and is
+ not used to create any matches or annotations.
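The car brand scenario above can be sketched as follows. This is only an illustration under stated assumptions: the class RuleExceptionSketch is hypothetical, plain strings stand in for the token annotation and its covering sentence annotation, and a null sentence models the case where no covering annotation exists, so the exception is not evaluated.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of a rule exception on a covering annotation.
final class RuleExceptionSketch {

    // Returns true if the token would still produce an annotation.
    static boolean createsAnnotation(String token, String coveringSentence,
                                     String ruleRegex, String exceptionRegex) {
        // The exception is evaluated on the covering annotation's text,
        // but only if such a covering annotation exists.
        if (coveringSentence != null
                && Pattern.compile(exceptionRegex).matcher(coveringSentence).find()) {
            return false; // exception matched: the token is filtered out
        }
        return Pattern.compile(ruleRegex).matcher(token).find();
    }

    public static void main(String[] args) {
        String rule = "Ford|Opel";
        System.out.println(
            createsAnnotation("Ford", "Henry Ford was born 1863", rule, "Henry"));
        System.out.println(
            createsAnnotation("Ford", "She drives a Ford", rule, "Henry"));
    }
}
```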
+ </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.conceptsFile.annotationCreation"></a>2.5. Annotation Creation</h2></div></div></div><p>
+ This section explains in detail how to create annotations when a rule has matched some input text.
+ An annotation creation example with all possible settings is shown in the listing below.
+ </p><p>
+ </p><pre class="programlisting"><span class="emphasis"><em><annotation id="testannot" type="org.apache.uima.TestAnnot"
+ validate="CustomValidatorClass">
+ <begin group="0" location="start"/>
+ <end group="0" location="end"/>
+ <setFeature name="testFeature1" type="String">$0</setFeature>
+ <setFeature name="testFeature2" type="String"
+ normalization="ToLowerCase">$0</setFeature>
+ <setFeature name="testFeature3" type="Integer">$1</setFeature>
+ <setFeature name="testFeature4" type="Float">$2</setFeature>
+ <setFeature name="testFeature5" type="Reference">testannot1</setFeature>
+ <setFeature name="confidenceValue" type="Confidence"/>
+ <setFeature name="ruleId" type="RuleId"/>
+ <setFeature name="normalizedText" type="String"
+ normalization="Custom"
+ class="org.apache.CustomNormalizer">$0</setFeature>
+</annotation></em></span></pre><p>
+ </p><p>
+ The <code class="code"><annotation></code> element has two mandatory features:
+ </p><p>
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code">id</code>
+ - Specifies the annotation id for this annotation. If the annotation id is specified,
+ it must be unique within the same concept. An annotation id is required if the
+ annotation is referred to by another annotation or if the annotation itself refers
+ to other annotations using a <code class="code">Reference</code> feature.
+ </p></li><li><p>
+ <code class="code">type</code>
+ - Specifies the UIMA annotation type that is used if an annotation is created.
+ The type has to be defined in the UIMA type system.
+ </p></li></ul></div><p>
+ </p><p>
+ The optional features are:
+ </p><p>
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code">validate</code>
+ - Specifies the custom validator class that is used to validate matches before
+ they are added as annotation to the CAS. For more details about the custom
+ annotation validation, please refer to
+ <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.validation" title="2.5.2. Annotation Validation">Section 2.5.2, “Annotation Validation”</a>.
+ </p></li></ul></div><p>
+ </p><p>
+ The mandatory sub elements of the <code class="code"><annotation></code> element are:
+ </p><p>
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code"><begin></code>
+ - Specifies the begin position of the annotation that is created.
+ For details about the <code class="code"><begin></code> element, please refer
+ to <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.boundaries" title="2.5.1. Annotation Boundaries">Section 2.5.1, “Annotation Boundaries”</a>.
+ </p></li><li><p>
+ <code class="code"><end></code>
+ - Specifies the end position of the annotation that is created.
+ For details about the <code class="code"><end></code> element, please refer
+ to <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.boundaries" title="2.5.1. Annotation Boundaries">Section 2.5.1, “Annotation Boundaries”</a>.
+ </p></li></ul></div><p>
+ </p><p>
+ The optional sub elements of the <code class="code"><annotation></code> element are:
+ </p><p>
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code"><setFeature></code>
+ - Sets a UIMA feature for the created annotation.
+ For details about the <code class="code"><setFeature></code> element, please refer
+ to <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.features" title="2.5.3. Annotation Features">Section 2.5.3, “Annotation Features”</a>.
+ </p></li></ul></div><p>
+ </p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sandbox.regexAnnotator.conceptsFile.annotationCreation.boundaries"></a>2.5.1. Annotation Boundaries</h3></div></div></div><p>
+ When creating an annotation with the <code class="code"><annotation></code> element it is also
+ necessary to define the annotation's boundaries. The annotation boundaries are defined using the
+ sub elements <code class="code"><begin></code> and <code class="code"><end></code>. The start position of
+ the annotation is defined using the <code class="code"><begin></code> element, the end position using
+ the <code class="code"><end></code> element. Both elements have the same features as shown below:
+ </p><p>
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code">group</code>
+ - identifies the capturing group number within the regular expression pattern for the
+ current rule. The value is a non-negative number where 0 denotes
+ the whole match, 1 the first capturing group, 2 the second one, and so on.
+ </p></li><li><p>
+ <code class="code">location</code>
+ - indicates a position inside the capturing group, which can either be the position
+ of the left parenthesis in case of a value <code class="code">start</code>, or the right parenthesis in
+ case of a value <code class="code">end</code>. The <code class="code">location</code> feature is optional. By default
+ the <code class="code"><begin></code> element is set to <code class="code">location="start"</code> and the
+ <code class="code"><end></code> element to <code class="code">location="end"</code>.
+ </p></li></ul></div><p>
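The group and location features map naturally onto the start and end offsets that java.util.regex reports for a capturing group. The sketch below is illustrative only: the class BoundarySketch and its methods are hypothetical, and it assumes that location "start" corresponds to Matcher.start(group) and location "end" to Matcher.end(group).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of resolving <begin> and <end> boundaries.
final class BoundarySketch {

    // Resolve one boundary from a capturing group number and a location.
    static int resolve(Matcher m, int group, String location) {
        return "end".equals(location) ? m.end(group) : m.start(group);
    }

    // Returns {begin, end} for the first match, or null if nothing matches.
    static int[] span(String regex, String text,
                      int beginGroup, String beginLocation,
                      int endGroup, String endLocation) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        return new int[] {
            resolve(m, beginGroup, beginLocation),
            resolve(m, endGroup, endLocation)
        };
    }

    public static void main(String[] args) {
        String text = "mail bob@example now";
        // Whole match: group 0 from start to end.
        System.out.println(Arrays.toString(
            span("(\\w+)@(\\w+)", text, 0, "start", 0, "end")));
        // Only the second capturing group.
        System.out.println(Arrays.toString(
            span("(\\w+)@(\\w+)", text, 2, "start", 2, "end")));
    }
}
```

Combining different groups is also possible in this sketch, for example beginning at the end of group 1 and ending at the start of group 2, which covers only the text between the two groups.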
+ </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
+ When the rule definition defines a <code class="code">featurePath</code> for a <code class="code">matchType</code>,
+ the annotation boundaries for the created annotation are automatically set to
+ the annotation boundaries of the match input annotation. This is necessary because
+ a match against an annotation's feature value has no relation to the document text; the only
+ link to the text is the annotation on which the feature is defined.
+ </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sandbox.regexAnnotator.conceptsFile.annotationCreation.validation"></a>2.5.2. Annotation Validation</h3></div></div></div><p>
+ Custom annotation validation can be used to validate a regular expression match with
+ Java code before the match is added as an annotation to the CAS. For example, if your regular expression
+ detects an ISBN number, you can use custom validation code to check whether it really is an ISBN number,
+ by calculating the final check digit, or just a phone number.
+ </p><p>
+ To use the custom annotation validation you have to specify the validation class in the <code class="code">validate</code>
+ feature of the <code class="code"><annotation></code> element. The validation class must implement the
+ <code class="code">org.apache.uima.annotator.regex.extension.Validation</code> interface
+ (<a href="#sandbox.regexAnnotator.Validation" title="Appendix B. Validation Interface">Appendix B, <i xmlns:xlink="http://www.w3.org/1999/xlink">Validation Interface</i></a>). The interface defines one
+ method called <code class="code">validate(String coveredText, String ruleID)</code>. The validate method is called by the annotator
+ before the match is added as annotation to the CAS. Annotations are only added if the validate method
+ returns <code class="code">true</code>, otherwise the match is skipped. The <code class="code">coveredText</code> parameter contains
+ the text that matches the regular expression.
+ The <code class="code">ruleID</code> parameter contains the ruleId of the rule that creates the match. This can also be null
+ if no ruleId was specified. The listing below shows a sample implementation of the validation interface.
+ </p><p>
+ </p><pre class="programlisting">package org.apache.uima.annotator.regex;
+
+public class SampleValidator implements
+ org.apache.uima.annotator.regex.extension.Validation {
+
+ /* (non-Javadoc)
+ * @see org.apache.uima.annotator.regex.extension.Validation
+ * #validate(java.lang.String, java.lang.String)
+ */
+ public boolean validate(String coveredText, String ruleID)
+ throws Exception {
+
+ //implement your custom validation, e.g. to validate ISBN numbers
+ return validateISBNNumbers(coveredText);
+ }
+}</pre><p>
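The sample above delegates to a validateISBNNumbers helper that is not shown. One possible check is sketched here for ISBN-10 only, using the standard weighted checksum (weights 10 down to 1, sum divisible by 11). The class IsbnCheckSketch and the method isValidIsbn10 are hypothetical names and not part of the RegexAnnotator; ISBN-13 would need a different checksum.

```java
// Hypothetical helper: validates the ISBN-10 check digit of a matched text.
final class IsbnCheckSketch {

    static boolean isValidIsbn10(String coveredText) {
        // Ignore the usual separators before checking.
        String digits = coveredText.replace("-", "").replace(" ", "");
        if (digits.length() != 10) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            char c = digits.charAt(i);
            int value;
            if (c >= '0' && c <= '9') {
                value = c - '0';
            } else if (i == 9 && (c == 'X' || c == 'x')) {
                value = 10; // 'X' is only allowed as the check digit
            } else {
                return false;
            }
            // Weights run from 10 for the first character to 1 for the last.
            sum += (10 - i) * value;
        }
        return sum % 11 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidIsbn10("0-306-40615-2"));
        System.out.println(isValidIsbn10("555-1234"));
    }
}
```

A validator's validate method could return the result of such a check, so that matches that merely look like an ISBN are skipped.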
+ </p><p>
+ The configuration for this example looks like:
+ </p><p>
+ </p><pre class="programlisting"><span class="emphasis"><em><annotation id="isbnNumber" type="org.apache.uima.ISBNNumber"
+ validate="org.apache.uima.annotator.regex.SampleValidator">
+ <begin group="0"/>
+ <end group="0"/>
+</annotation></em></span></pre><p>
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sandbox.regexAnnotator.conceptsFile.annotationCreation.features"></a>2.5.3. Annotation Features</h3></div></div></div><p>
+ With the <code class="code"><setFeature></code> element of <code class="code"><annotation></code> definition it is
+ possible to set UIMA features for the created annotation. The mandatory features
+ for the <code class="code"><setFeature></code> element are:
+ </p><p>
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code">name</code>
+ - Specifies the UIMA feature name that should be set. The feature has to
+ be a valid UIMA feature for this annotation and has to be defined in the
+ UIMA type system.
+ </p></li><li><p>
+ <code class="code">type</code>
+ - Specifies the type of the UIMA feature. For a list of all
+ possible feature types please refer to
+ <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.featureTypes" title="2.5.3.1. Features Types">Section 2.5.3.1, “Features Types”</a>.
+ </p></li></ul></div><p>
+ </p><p>
+ The optional features are:
+ </p><p>
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code">normalization</code>
+ - Specifies the normalization that should be performed before the feature value
+ is assigned to the UIMA annotation. For a list of all built-in
+ normalization functions please refer to
+ <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.featureNormalization" title="2.5.3.2. Features Value Normalization">Section 2.5.3.2, “Features Value Normalization”</a>.
+ </p></li><li><p>
+ <code class="code">class</code>
+ - Specifies the custom normalization class that should be used to normalize the
+ feature value before it is assigned to the UIMA annotation. Custom normalization
+ classes are used if the <code class="code">normalization</code> feature has the value
+ <code class="code">Custom</code>. The normalization class has to implement the
+ <code class="code">org.apache.uima.annotator.regex.extension.Normalization</code> interface.
+ For details about the feature normalization please refer to
+ <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.featureNormalization" title="2.5.3.2. Features Value Normalization">Section 2.5.3.2, “Features Value Normalization”</a>.
+ </p></li></ul></div><p>
+ </p><p>
+ The content of the <code class="code"><setFeature></code> element specifies the value of the
+ UIMA feature that is set. The value can be a literal, a capturing group or a combination of
+ both.
+ There are two ways to add the value of a capturing group.
+ The first notation is <code class="code">$</code> followed by the capturing group number from 0 to 9,
+ e.g. $0 for capturing group 0 or $7 for capturing group 7.
+ The second notation uses capturing group names.
+ If the rule contains named capturing groups, these groups can be accessed
+ with <code class="code">${matchGroupName}</code>. To access capturing
+ groups greater than 9, capturing group names must be used. An example of capturing group names is
+ shown below:
+ </p><p>
+ To add a name to a capturing group just add the following fragment <code class="code">\m{groupname}</code>
+ in front of the capturing group start parenthesis.
+ </p><pre class="programlisting"><span class="emphasis"><em><concept name="capturingGroupNames">
+ <rules>
+ <rule ruleId="ID1"
+ regEx="My \m{groupName}(named capturing group) example"
+ matchStrategy="matchAll"
+ matchType="uima.tcas.DocumentAnnotation"/>
+ </rules>
+ <createAnnotations>
+ <annotation type="org.apache.uima.TestAnnot">
+ <begin group="0"/>
+ <end group="0"/>
+ <setFeature name="testFeature0" type="String">
+ ${groupName}
+ </setFeature>
+ </annotation>
+ </createAnnotations>
+</concept>
+</em></span></pre><p>
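How a feature value that mixes literals with $n and ${name} placeholders could be resolved is sketched below. This is not the annotator's code: the class FeatureValueSketch is hypothetical, and the sketch uses the named group syntax of java.util.regex, (?<name>...), in place of the \m{groupname} notation of the concept file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of resolving $n and ${name} in a feature value.
final class FeatureValueSketch {

    // Matches "$" plus one digit, or "${" name "}".
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$(\\d)|\\$\\{(\\w+)\\}");

    // Replaces each placeholder in the template with the group's text.
    static String resolve(String template, Matcher match) {
        Matcher p = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (p.find()) {
            out.append(template, last, p.start()); // literal text before the placeholder
            String value = p.group(1) != null
                ? match.group(Integer.parseInt(p.group(1))) // numbered group
                : match.group(p.group(2));                  // named group
            out.append(value == null ? "" : value);
            last = p.end();
        }
        out.append(template, last, template.length());
        return out.toString();
    }

    // Applies the rule to the text and resolves the template for the first match.
    static String apply(String regex, String text, String template) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? resolve(template, m) : null;
    }

    public static void main(String[] args) {
        System.out.println(apply(
            "My (?<groupName>named capturing group) example",
            "My named capturing group example",
            "${groupName}"));
        System.out.println(apply(
            "(\\d+)-(\\d+)", "call 555-1234 now", "ext $2 / area $1"));
    }
}
```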
+ </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sandbox.regexAnnotator.conceptsFile.annotationCreation.featureTypes"></a>2.5.3.1. Features Types</h4></div></div></div><p>
+ When setting a UIMA feature for an annotation using the <code class="code"><setFeature></code> element,
+ the feature type has to be specified according to the UIMA type system definition.
+ This is done with the <code class="code">type</code> feature of the <code class="code"><setFeature></code> element.
+ The list below shows all currently supported feature types:
+ </p><p>
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code">String</code>
+ - for <code class="code">uima.cas.String</code> based UIMA features.
+ </p></li><li><p>
+ <code class="code">Integer</code>
+ - for <code class="code">uima.cas.Integer</code> based UIMA features.
+ </p></li><li><p>
+ <code class="code">Float</code>
+ - for <code class="code">uima.cas.Float</code> based UIMA features.
+ </p></li><li><p>
+ <code class="code">Reference</code>
+ - to link a UIMA feature to another annotation. In this case the
+ UIMA feature type has to be the same as the referred annotation type.
+ To reference another annotation instance, the <code class="code"><setFeature></code>
+ content must be the annotation <code class="code">id</code> of the referred
+ annotation. The referred annotation instance is the created annotation of
+ the current match.
+ </p></li><li><p>
+ <code class="code">Confidence</code>
+ - to add the value of the <code class="code">confidence</code> feature defined
+ at the <code class="code"><rule></code> element to this feature. The UIMA feature has to
+ be of type <code class="code">uima.cas.Float</code>.
+ </p></li><li><p>
+ <code class="code">RuleId</code>
+ - to add the value of the <code class="code">ruleId</code> feature defined
+ at the <code class="code"><rule></code> element to this feature. The UIMA feature has to
+ be of type <code class="code">uima.cas.String</code>.
+ </p></li></ul></div><p>
+ </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
+ Float and Integer based feature values are converted using the Java NumberFormat for the
+ current Java default locale. If the feature value cannot be converted the feature value is not
+ set and a warning is written to the log. To prevent these warnings it may be useful
+ to do a custom normalization of the numbers before they are added to the feature.
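Why the locale matters can be seen with java.text.NumberFormat directly. The sketch below is an assumption-laden illustration, not the annotator's conversion code: the class LocaleNumberSketch is hypothetical, and it passes the locale explicitly, whereas the note above refers to the Java default locale.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical illustration of locale-dependent number parsing.
final class LocaleNumberSketch {

    // Returns null when the text cannot be parsed as a number.
    // Note: NumberFormat.parse may stop before the end of the text.
    static Float parseFloat(String text, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(text).floatValue();
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The same characters can be read differently depending on the locale.
        System.out.println(parseFloat("1,5", Locale.GERMANY));
        System.out.println(parseFloat("1,5", Locale.US));
        System.out.println(parseFloat("abc", Locale.US));
    }
}
```

A custom normalization that rewrites matched numbers into the notation of the locale in use is one way to keep such conversions predictable.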
+ </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sandbox.regexAnnotator.conceptsFile.annotationCreation.featureNormalization"></a>2.5.3.2. Features Value Normalization</h4></div></div></div><p>
+ Before assigning a feature value to an annotation it is possible to
+ do a normalization on the feature value. This normalization can be useful for example to normalize
+ a detected email address to lower case before it is added to the annotation.
+ To normalize a feature value the <code class="code">normalization</code> feature of the
+ <code class="code"><setFeature></code> element is used. The built-in normalization functions
+ are listed below. Additionally the RegexAnnotator provides an extension point that can be
+ implemented to add a custom normalization.
+ </p><p>
+ The possible built-in functions that are specified as feature value of
+ the <code class="code">normalization</code> feature are listed below:
+ </p><p>
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code">ToLowerCase</code>
+ - normalize the feature value to lower case before it is assigned to the annotation.
+ </p></li><li><p>
+ <code class="code">ToUpperCase</code>
+ - normalize the feature value to upper case before it is assigned to the annotation.
+ </p></li><li><p>
+ <code class="code">Trim</code>
+ - remove all leading and trailing whitespace characters from the feature value before
+ it is assigned to the annotation.
+ </p></li></ul></div><p>
+ Built-in normalization configuration:
+ </p><pre class="programlisting"><span class="emphasis"><em><setFeature name="normalizedFeature" type="String"
+ normalization="ToLowerCase">$0</setFeature></em></span></pre><p>
+ </p><p>
+ In case of a custom normalization, the <code class="code">normalization</code> feature must have the value
+ <code class="code">Custom</code>, and an additional feature of the <code class="code"><setFeature></code> element
+ called <code class="code">class</code> has to be specified, containing the fully qualified class name of the
+ custom normalization implementation. The custom normalization implementation has to implement
+ the interface <code class="code">org.apache.uima.annotator.regex.extension.Normalization</code>
+ (<a href="#sandbox.regexAnnotator.Normalization" title="Appendix C. Normalization Interface">Appendix C, <i xmlns:xlink="http://www.w3.org/1999/xlink">Normalization Interface</i></a>) which defines the
+ <code class="code">normalize</code> method to normalize the feature values. A sample implementation with
+ the corresponding configuration is shown below.
+ </p><p>
+ Custom normalization implementation:
+ </p><pre class="programlisting">package org.apache.uima;
+
+public class CustomNormalizer
+ implements org.apache.uima.annotator.regex.extension.Normalization {
+
+ /* (non-Javadoc)
+ * @see org.apache.uima.annotator.regex.extension.Normalization
+ * #normalize(java.lang.String, java.lang.String)
+ */
+ public String normalize(String input, String ruleId) {
+
+ //implement your custom normalization here; as a placeholder,
+ //the input is returned unchanged
+ String result = input;
+ return result;
+ }
+}</pre><p>
+ </p><p>
+ Custom normalization configuration:
+ </p><pre class="programlisting"><span class="emphasis"><em><setFeature name="normalizedFeature" type="String"
+ normalization="Custom" class="org.apache.uima.CustomNormalizer">
+ $0
+</setFeature></em></span></pre><p>
+ </p></div></div></div></div><div class="chapter" lang="en" id="sandbox.regexAnnotator.annotatorDescriptor"><div class="titlepage"><div><div><h2 class="title"><a name="sandbox.regexAnnotator.annotatorDescriptor"></a>Chapter 3. Annotator Descriptor</h2></div></div></div><p>The RegexAnnotator analysis engine descriptor contains some processing information for
+ the annotator. The processing information is specified as configuration parameters.
+ This chapter explains in detail the possible descriptor settings.
+ </p><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.annotatorDescriptor.configParam"></a>3.1. Configuration Parameters</h2></div></div></div><p>
+ The RegexAnnotator has the following configuration parameters:
+ </p><p>
+ </p><div class="itemizedlist"><ul type="disc"><li><p>
+ <code class="code">ConceptFiles</code>
+ - This parameter is modeled as array of Strings and contains
+ the concept files the annotator should use. The concept files
+ must be specified using a relative path that is available in the
+ UIMA datapath or in the classpath.
+ </p><pre class="programlisting"><span class="emphasis"><em><nameValuePair>
+ <name>ConceptFiles</name>
+ <value>
+ <array>
+ <string>subdir/myConcepts.xml</string>
+ <string>SampleConcept.xml</string>
+ </array>
+ </value>
+</nameValuePair></em></span></pre><p>
+ </p></li></ul></div><p>
+ </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.annotatorDescriptor.capabilities"></a>3.2. Capabilities</h2></div></div></div><p>
+ In the capabilities section of the RegexAnnotator descriptor the input and output
+ capabilities and the supported languages have to be defined.
+ </p><p>
+ The input capabilities defined
+ in the descriptor have to comply with the match types used in the concept rule file
+ that is used. For example, the <code class="code">uima.SentenceAnnotation</code> type used in the rule
+ below has to be added to the input capability section in the RegexAnnotator descriptor.
+ </p><p>
+ </p><pre class="programlisting"><span class="emphasis"><em><rules>
+ <rule regEx="SampleRegex" matchStrategy="matchAll"
+ matchType="uima.SentenceAnnotation"/>
+</rules>
+</em></span></pre><p>
+ </p><p>
+ In the output section, all of the annotation types and features created by
+ the RegexAnnotator have to be specified. These have to match the
+ output types and features declared in the <code class="code"><annotation></code> elements of the concept file.
+ For example the <code class="code">org.apache.uima.TestAnnot</code> annotation and the
+ <code class="code">org.apache.uima.TestAnnot:testFeature</code> feature used below have to
+ be added to the output capability section in the RegexAnnotator descriptor.
+ </p><p>
+ </p><pre class="programlisting"><span class="emphasis"><em><createAnnotations>
+ <annotation type="org.apache.uima.TestAnnot">
+ <begin group="0"/>
+ <end group="0"/>
+ <setFeature name="testFeature" type="String">$0</setFeature>
+ </annotation>
+</createAnnotations>
+</em></span></pre><p>
+ </p><p>
+ If there are any language dependent rules in the concept file, the language abbreviations
+ have to be specified in the <code class="code"><languagesSupported></code> element. If there are no
+ language dependent rules, you can specify <code class="code">x-unspecified</code> as language. That means
+ that the annotator can work on all languages.
+ </p><p>
+ For the short examples used above the capabilities section in the RegexAnnotator
+ descriptor looks like:
+ </p><p>
+ </p><pre class="programlisting"><span class="emphasis"><em><capabilities>
+ <capability>
+ <inputs>
+ <type>uima.SentenceAnnotation</type>
+ </inputs>
+ <outputs>
+ <type>org.apache.uima.TestAnnot</type>
+ <feature>org.apache.uima.TestAnnot:testFeature</feature>
+ </outputs>
+ <languagesSupported>
+ <language>x-unspecified</language>
+ </languagesSupported>
+ </capability>
+</capabilities>
+</em></span></pre><p>
+ </p></div></div><div class="appendix" lang="en" id="sandbox.regexAnnotator.xsd"><div class="titlepage"><div><div><h2 class="title"><a name="sandbox.regexAnnotator.xsd"></a>Appendix A. Concept File Schema</h2></div></div></div><p>The concept file schema that is used to define the concept file looks like:
+ </p><p>
+ </p><pre class="programlisting"><?xml version="1.0" encoding="UTF-8"?>
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+ targetNamespace="http://incubator.apache.org/uima/regex"
+ xmlns="http://incubator.apache.org/uima/regex"
+ elementFormDefault="qualified">
+ <!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ -->
+
+ <xs:element name="conceptSet">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="concept" minOccurs="0" maxOccurs="unbounded"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="concept">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="rules" minOccurs="1" maxOccurs="1"/>
+ <xs:element ref="createAnnotations" minOccurs="1" maxOccurs="1"/>
+ </xs:sequence>
+ <xs:attribute name="name" type="xs:string" use="optional"/>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="createAnnotations">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="annotation" minOccurs="1" maxOccurs="unbounded"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="rules">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="rule" minOccurs="1" maxOccurs="unbounded"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="rule">
+ <xs:complexType>
+ <xs:all>
+ <xs:element ref="matchTypeFilter" minOccurs="0" maxOccurs="1"/>
+ <xs:element ref="updateMatchTypeAnnotation" minOccurs="0" maxOccurs="1"/>
+ <xs:element ref="ruleExceptions" minOccurs="0" maxOccurs="1"/>
+ </xs:all>
+ <xs:attribute name="regEx" type="xs:string" use="required"/>
+ <xs:attribute name="matchStrategy" use="required">
+ <xs:simpleType>
+ <xs:restriction base="xs:string">
+ <xs:enumeration value="matchFirst"/>
+ <xs:enumeration value="matchAll"/>
+ <xs:enumeration value="matchComplete"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ <xs:attribute name="matchType" type="xs:string" use="required"/>
+ <xs:attribute name="featurePath" type="xs:string" use="optional" />
+ <xs:attribute name="ruleId" type="xs:string" use="optional"/>
+ <xs:attribute name="confidence" type="xs:decimal" use="optional"/>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="matchTypeFilter">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="feature" minOccurs="0" maxOccurs="unbounded"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="ruleExceptions">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="exception" minOccurs="0" maxOccurs="unbounded"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="exception">
+ <xs:complexType>
+ <xs:simpleContent>
+ <xs:extension base="xs:string">
+ <xs:attribute name="matchType" type="xs:string" use="required"/>
+ </xs:extension>
+ </xs:simpleContent>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="feature">
+ <xs:complexType>
+ <xs:simpleContent>
+ <xs:extension base="xs:string">
+ <xs:attribute name="featurePath" type="xs:string" use="required"/>
+ </xs:extension>
+ </xs:simpleContent>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="annotation">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="begin" minOccurs="1" maxOccurs="1"/>
+ <xs:element ref="end" minOccurs="1" maxOccurs="1"/>
+ <xs:element ref="setFeature" minOccurs="0" maxOccurs="unbounded"/>
+ </xs:sequence>
+ <xs:attribute name="id" type="xs:string" use="optional"/>
+ <xs:attribute name="type" type="xs:string" use="required"/>
+ <xs:attribute name="validate" type="xs:string" use="optional" />
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="updateMatchTypeAnnotation">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="setFeature" minOccurs="0" maxOccurs="unbounded"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="begin">
+ <xs:complexType>
+ <xs:attribute name="group" use="required" type="xs:integer"/>
+ <xs:attribute name="location" use="optional" default="start">
+ <xs:simpleType>
+ <xs:restriction base="xs:string">
+ <xs:enumeration value="start"/>
+ <xs:enumeration value="end"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="end">
+ <xs:complexType>
+ <xs:attribute name="group" use="required" type="xs:integer"/>
+ <xs:attribute name="location" use="optional" default="end">
+ <xs:simpleType>
+ <xs:restriction base="xs:string">
+ <xs:enumeration value="start"/>
+ <xs:enumeration value="end"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="setFeature">
+ <xs:complexType>
+ <xs:simpleContent>
+ <xs:extension base="xs:string">
+ <xs:attribute name="name" type="xs:string" use="required"/>
+ <xs:attribute name="type" use="required">
+ <xs:simpleType>
+ <xs:restriction base="xs:string">
+ <xs:enumeration value="String"/>
+ <xs:enumeration value="Integer"/>
+ <xs:enumeration value="Float"/>
+ <xs:enumeration value="Reference"/>
+ <xs:enumeration value="Confidence"/>
+ <xs:enumeration value="RuleId"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ <xs:attribute name="normalization" use="optional">
+ <xs:simpleType>
+ <xs:restriction base="xs:string">
+ <xs:enumeration value="Custom" />
+ <xs:enumeration value="ToLowerCase" />
+ <xs:enumeration value="ToUpperCase" />
+ <xs:enumeration value="Trim" />
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ <xs:attribute name="class" type="xs:string" use="optional" />
+ </xs:extension>
+ </xs:simpleContent>
+ </xs:complexType>
+ </xs:element>
+</xs:schema>
+</pre><p>
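A schema like the one above can be used to check a concept file before it is handed to the annotator. The following is a minimal sketch using the standard javax.xml.validation API; the class name ConceptFileCheck and the method isValid are hypothetical and not part of the RegexAnnotator, and the schema and document are passed as strings only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical helper (not part of the RegexAnnotator): validates XML text against XSD text. */
class ConceptFileCheck {

  /** Returns true if xmlText is valid against the schema in xsdText, false otherwise. */
  static boolean isValid(String xsdText, String xmlText) {
    SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    try {
      // Compile the schema, then validate the document against it.
      Schema schema = factory.newSchema(new StreamSource(new StringReader(xsdText)));
      schema.newValidator().validate(new StreamSource(new StringReader(xmlText)));
      return true;
    } catch (SAXException e) {
      // Without a custom ErrorHandler, validation errors surface as SAXException.
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In practice the two arguments would hold the contents of the schema from this appendix and of the concept file under test.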
+
+ </p></div><div class="appendix" lang="en" id="sandbox.regexAnnotator.Validation"><div class="titlepage"><div><div><h2 class="title"><a name="sandbox.regexAnnotator.Validation"></a>Appendix B. Validation Interface</h2></div></div></div><p>
+ </p><pre class="programlisting">/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.uima.annotator.regex.extension;
+
+
+/**
+ * The Validation interface is provided to implement a custom validator
+ * that can be used to validate regular expression matches before
+ * they are added as annotations.
+ */
+public interface Validation {
+
+/**
+ * The validate method validates the covered text of an annotation and
+ * returns true if the annotation is correct and false otherwise.
+ * The validate method is called between a rule match and the
+ * annotation creation. The annotation is only created if the method
+ * returns true.
+ *
+ * @param coveredText covered text of the annotation that should be
+ * validated
+ * @param ruleID ruleID of the rule which created the match
+ *
+ * @return true if the annotation is valid or false if the annotation
+ * is invalid
+ *
+ * @throws Exception throws an exception if a validation error occurred
+ */
+public boolean validate(String coveredText, String ruleID)
+ throws Exception;
+
+}</pre><p>
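A custom validator implementing the interface above could look like the following minimal sketch. The class LuhnValidator is hypothetical and only illustrates the contract; the interface is redeclared locally so that the example compiles on its own, whereas a real implementation would import org.apache.uima.annotator.regex.extension.Validation. How the class is registered is assumed here to go through the validate attribute of the annotation element declared in the schema.

```java
// Local copy of the interface shown above, so the sketch compiles standalone.
interface Validation {
  boolean validate(String coveredText, String ruleID) throws Exception;
}

/**
 * Hypothetical validator: accepts a match only if its digits pass the
 * Luhn checksum (as used for credit card numbers). Spaces and hyphens
 * are ignored; any other non-digit character makes the match invalid.
 */
class LuhnValidator implements Validation {

  // An implementation may omit the throws clause if it never throws.
  public boolean validate(String coveredText, String ruleID) {
    int sum = 0;
    int digits = 0;
    boolean doubleIt = false; // every second digit from the right is doubled
    for (int i = coveredText.length() - 1; i >= 0; i--) {
      char c = coveredText.charAt(i);
      if (c == ' ' || c == '-') {
        continue;
      }
      if (c < '0' || c > '9') {
        return false;
      }
      int d = c - '0';
      if (doubleIt) {
        d *= 2;
        if (d > 9) {
          d -= 9;
        }
      }
      sum += d;
      digits++;
      doubleIt = !doubleIt;
    }
    // ruleID is not used in this sketch; it could select per-rule checks.
    return digits > 0 && sum % 10 == 0;
  }
}
```

With such a validator, a match is only turned into an annotation if validate returns true, so a regular expression that matches any 16-digit sequence would no longer annotate sequences with a wrong check digit.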
+ </p></div><div class="appendix" lang="en" id="sandbox.regexAnnotator.Normalization"><div class="titlepage"><div><div><h2 class="title"><a name="sandbox.regexAnnotator.Normalization"></a>Appendix C. Normalization Interface</h2></div></div></div><p>
+ </p><pre class="programlisting">/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.uima.annotator.regex.extension;
+
+
+/**
+ * The Normalization interface is provided to implement a custom normalization
+ * for feature values before they are assigned to an annotation.
+ */
+public interface Normalization {
+
+/**
+ * Custom feature value normalization. This interface must be implemented
+ * to perform a custom normalization on the given input string.
+ *
+ * @param input input string which should be normalized
+ *
+ * @param ruleID rule ID of the matching rule
+ *
+ * @return String - normalized input string
+ */
+public String normalize(String input, String ruleID) throws Exception;
+}</pre><p>
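A custom normalizer implementing the interface above could look like the following minimal sketch. The class DigitsOnlyNormalizer is hypothetical; the interface is redeclared locally so that the example compiles on its own, whereas a real implementation would import org.apache.uima.annotator.regex.extension.Normalization. It is assumed here that the class is selected through the normalization="Custom" and class attributes of the setFeature element declared in the schema.

```java
// Local copy of the interface shown above, so the sketch compiles standalone.
interface Normalization {
  String normalize(String input, String ruleID) throws Exception;
}

/**
 * Hypothetical normalizer: keeps only the ASCII digits of the input,
 * for example to store a phone number without punctuation.
 */
class DigitsOnlyNormalizer implements Normalization {

  // An implementation may omit the throws clause if it never throws.
  public String normalize(String input, String ruleID) {
    StringBuilder result = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); i++) {
      char c = input.charAt(i);
      if (c >= '0' && c <= '9') {
        result.append(c);
      }
    }
    // ruleID is not used in this sketch; it could select per-rule behavior.
    return result.toString();
  }
}
```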
+ </p></div></div></body></html>
\ No newline at end of file
Added: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/css/stylesheet-html.css
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/css/stylesheet-html.css?rev=817775&view=auto
==============================================================================
--- incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/css/stylesheet-html.css (added)
+++ incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/css/stylesheet-html.css Tue Sep 22 19:07:01 2009
@@ -0,0 +1,302 @@
+/*
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+*/
+
+html {
+ padding: 0pt;
+ margin: 0pt;
+}
+
+body {
+ margin-top: 1em;
+ margin-bottom: 1em;
+ margin-left: 16%;
+ margin-right: 8%;
+ font-size: 10.5pt;
+ line-height: 1.3;
+ font-family: "Palatino Linotype", "Times New Roman", Times, serif;
+}
+
+div {
+ margin: 0pt;
+}
+
+p {
+ text-align: left;
+ margin-bottom: .6em;
+ line-height: 1.4;
+}
+
+td { line-height: 1.2;
+ padding: .3em;
+ }
+
+hr {
+ margin-top: .6em;
+ margin-bottom: .6em;
+ margin-left: 0pt;
+ margin-right: 0pt;
+ border: 1px solid gray;
+ background: gray;
+}
+
+h2,h3,h4,h5 {
+ margin: 0 0 0.5em 0;
+ page-break-after: avoid;
+ font-family: Helvetica, Arial, sans-serif;
+ font-weight: bold;
+ color: #525D76;
+}
+
+h2 {
+ margin-left: -10%; }
+
+h2, h3, h4 { margin-top: 1em; }
+
+/* later rules of same specificity override earlier ones */
+/* can't use ">" because IE doesn't recognize it */
+
+div.chapter div.titlepage h2.title {
+ margin-bottom: 1.5em;
+ font-size: 1.6em;
+ letter-spacing: -0.07ex;
+ border-top:solid black 2.25pt;
+}
+
+/* this one comes after and therefore takes precedence (same specificity) */
+
+div.section div.titlepage h2.title { /* h2 */
+ font-size: 1.3em;
+ border-top:solid black 1.00pt;
+}
+
+h3 {
+ margin-left: -5%;
+ font-size: 1.2em;
+ border-top:solid black .75pt;
+}
+
+div.note h3, div.tip h3 {
+ margin-left: 0;
+ font-size: 1.2em;
+ border-top: none;
+ margin-top: 0em;
+}
+
+h4 {
+ font-size: 1.1em;
+}
+
+a {
+ text-decoration: underline;
+ /*color: black;*/
+}
+
+a:hover {
+ text-decoration: underline;
+ color: black;
+}
+
+h3,h4,h5 {
+ line-height: 1.3;
+ margin-top: 1.5em;
+ font-family: Arial, Sans-serif;
+}
+
+h1.title {
+ text-align: left;
+
+ margin-top: 2em;
+ margin-bottom: 2em;
+ margin-left: 0pt;
+ margin-right: 0pt;
+}
+
+h2.subtitle, h3.subtitle {
+ text-align: left;
+ margin-top: 2em;
+ margin-bottom: 2em;
+ text-transform: uppercase;
+}
+
+h3.author, p.othercredit {
+ font-size: 0.9em;
+ font-weight: normal;
+ font-style: oblique;
+ text-align: left;
+ color: #525D76;
+}
+
+td.tableSubhead {
+ font-weight: bold;
+ background-color: silver;
+}
+
+div.titlepage {
+}
+
+div.section {
+}
+
+
+div.authorgroup
+{
+ text-align: left;
+ margin-bottom: 3em;
+ display: block;
+}
+
+div.toc, div.list-of-examples, div.list-of-figures {
+
+ margin-bottom: 3em;
+}
+
+
+div.itemizedlist {
+ margin-top: 0.5em;
+ margin-bottom: 0.5em;
+}
+
+ol,ul {
+}
+
+li {
+}
+
+pre {
+ margin: .75em 0;
+ line-height: 1.25;
+ color: black;
+}
+
+pre.programlisting {
+ font-size: 9pt;
+ padding: 5pt 2pt;
+ border: 1pt solid black;
+ background: #eeeeee;
+}
+
+div.table {
+ margin: 1em;
+ padding: 0.5em;
+ text-align: center;
+}
+
+div.table table {
+ /* display: block; */ /* in firefox, breaks centering */
+ margin-left: auto; /* see http://theodorakis.net/tablecentertest.html */
+ margin-right: auto;
+}
+
+div.table td {
+ padding-right: 5px;
+ padding-left: 5px;
+}
+
+div.table p.title {
+ text-align: center;
+ margin-left: 5%;
+ margin-right: 5%;
+}
+
+p.releaseinfo, .copyright {
+ font-size: 0.9em;
+ text-align: left;
+ margin: 0px;
+ padding: 0px;
+}
+
+div.note, div.important, div.example, div.informalexample, div.tip, div.caution {
+ margin: 1em;
+ padding: 0.5em;
+ border: 1px solid gray;
+ background-color: #f8f8e0;
+}
+
+div.important th, div.note th, div.tip th {
+ text-align: left;
+ border-bottom: solid 1px gray;
+}
+
+div.navheader, div.navheader table {
+ font-family: sans-serif;
+ font-size: 12px;
+}
+
+div.navfooter, div.navfooter table {
+ font-family: sans-serif;
+ font-size: 12px;
+}
+
+div.figure, div.screenshot {
+ text-align: center; /* needed for ms5 */
+ margin-top: 1em;
+ margin-bottom: 1em;
+}
+
+div.figure table, div.screenshot table { /* see http://theodorakis.net/tablecentertest.html */
+ margin-left: auto;
+ margin-right: auto;
+}
+
+div.figure p.title {
+ text-align: center;
+ margin-left: 15%;
+ margin-right: 15%;
+}
+
+div.example p.title {
+ margin-top: 0em;
+ margin-bottom: 0.6em;
+ text-align: left;
+ padding-bottom: 0.4em;
+ border-bottom: solid 1px gray;
+}
+
+div.figure img {
+ border: 1px solid gray;
+ padding: 0.5em;
+ margin: 0.5em;
+}
+
+div.revhistory {
+ font-size: 0.8em;
+ width: 90%;
+ margin-left: 5%;
+ margin-top: 3em;
+ margin-bottom: 3em;
+}
+
+div.revhistory table {
+ font-family: sans-serif;
+ font-size: 12px;
+ border-collapse: collapse;
+}
+
+div.revhistory table tr {
+ border: solid 1px gray;
+}
+
+div.revhistory table th {
+ border: none;
+}
+
+span.bold-italic {
+ font-weight: bold;
+ font-style: italic;
+}
Added: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/blank.png
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/blank.png?rev=817775&view=auto
==============================================================================
Binary file - no diff available.
Propchange: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/blank.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.gif
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.gif?rev=817775&view=auto
==============================================================================
Binary file - no diff available.
Propchange: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.gif
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.png
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.png?rev=817775&view=auto
==============================================================================
Binary file - no diff available.
Propchange: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/10.gif
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/10.gif?rev=817775&view=auto
==============================================================================
Binary file - no diff available.