Posted to commits@uima.apache.org by sc...@apache.org on 2009/09/22 21:07:16 UTC

svn commit: r817775 [1/2] - in /incubator/uima/sandbox/trunk/RegularExpressionAnnotator: ./ docs/ docs/html/ docs/html/RegexAnnotatorUserGuide/ docs/html/RegexAnnotatorUserGuide/css/ docs/html/images/ docs/html/images/RegexAnnotatorUserGuide/ docs/html...

Author: schor
Date: Tue Sep 22 19:07:01 2009
New Revision: 817775

URL: http://svn.apache.org/viewvc?rev=817775&view=rev
Log:
UIMA-1583 add new property needed for doc build, save docs for website, update pom to share more from parents, change dependencies for some 3rd party Jars

Added:
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/RegexAnnotatorUserGuide.html
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/css/
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/css/stylesheet-html.css
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/RegexAnnotatorUserGuide/
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/blank.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/10.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/10.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/11.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/11.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/12.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/12.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/13.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/13.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/14.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/14.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/15.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/15.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/2.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/2.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/3.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/3.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/4.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/4.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/5.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/5.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/6.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/6.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/7.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/7.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/8.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/8.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/9.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/9.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/caution.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/caution.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/caution.svg
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/caution.tif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/draft.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/home.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/home.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/home.svg
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/important.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/important.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/important.svg
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/important.tif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/next.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/next.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/next.svg
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/note.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/note.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/note.svg
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/note.tif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/prev.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/prev.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/prev.svg
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/tip.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/tip.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/tip.svg
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/tip.tif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/toc-blank.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/toc-minus.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/toc-plus.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/up.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/up.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/up.svg
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/warning.gif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/warning.png   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/warning.svg
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/warning.tif   (with props)
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/pdf/
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/pdf/RegexAnnotatorUserGuide.pdf   (with props)
Modified:
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/build_documentation.xml
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/pom.xml

Modified: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/build_documentation.xml
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/build_documentation.xml?rev=817775&r1=817774&r2=817775&view=diff
==============================================================================
--- incubator/uima/sandbox/trunk/RegularExpressionAnnotator/build_documentation.xml (original)
+++ incubator/uima/sandbox/trunk/RegularExpressionAnnotator/build_documentation.xml Tue Sep 22 19:07:01 2009
@@ -24,6 +24,7 @@
 <project name="Apache UIMA Regular Expression Annotator Documentation" default="all" basedir=".">
   
   <property name="book_name" value="RegexAnnotatorUserGuide"/>
+  <property name="artifactId" value="RegularExpressionAnnotator"/>
 	
   <import file="${basedir}/../SandboxDocs/sandbox_build.xml"/>  
   

Added: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/RegexAnnotatorUserGuide.html
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/RegexAnnotatorUserGuide.html?rev=817775&view=auto
==============================================================================
--- incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/RegexAnnotatorUserGuide.html (added)
+++ incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/RegexAnnotatorUserGuide.html Tue Sep 22 19:07:01 2009
@@ -0,0 +1,1193 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
+   <title>Apache UIMA Regular Expression Annotator Documentation</title><link rel="stylesheet" href="css/stylesheet-html.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.72.0"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="book" lang="en" id="d0e2"><div class="titlepage"><div><div><h1 class="title"><a name="d0e2"></a>
+		Apache UIMA Regular Expression Annotator Documentation
+	</h1></div><div><div class="authorgroup"><h3 class="corpauthor">Authors: The Apache UIMA Development Community</h3></div></div><div><span class="productname">Apache UIMA Sandbox<br></span></div><div><p class="releaseinfo">Version 2.3.0</p></div><div><p class="copyright">Copyright &copy; 2008, 2009 The Apache Software Foundation</p></div><div><div class="legalnotice"><a name="d0e15"></a><p> </p><p><b>Incubation Notice and Disclaimer.&nbsp;</b>Apache UIMA is an effort undergoing incubation at the Apache Software Foundation (ASF). 
+          Incubation is required of all newly accepted projects until a further review indicates that 
+          the infrastructure, communications, and decision making process have stabilized in a manner 
+          consistent with other successful ASF projects. While incubation status is not necessarily 
+          a reflection of the completeness or stability of the code, 
+          it does indicate that the project has yet to be fully endorsed by the ASF.</p><p> </p><p> </p><p><b>License and Disclaimer.&nbsp;</b>The ASF licenses this documentation
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this documentation except in compliance
+           with the License.  You may obtain a copy of the License at
+         
+         </p><div class="blockquote"><blockquote class="blockquote"><p>
+             <a xmlns:xlink="http://www.w3.org/1999/xlink" href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a>
+           </p></blockquote></div><p>
+         
+           Unless required by applicable law or agreed to in writing,
+           this documentation and its contents are distributed under the License 
+           on an 
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+         </p><p> </p><p> </p><p><b>Trademarks.&nbsp;</b>All terms mentioned in the text that are known to be trademarks or 
+        service marks have been appropriately capitalized.  Use of such terms
+        in this book should not be regarded as affecting the validity of the
+        trademark or service mark.
+        </p></div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="preface"><a href="#d0e54">Introduction</a></span></dt><dt><span class="chapter"><a href="#sandbox.regexAnnotator.processingOverview">1. Processing Overview</a></span></dt><dt><span class="chapter"><a href="#sandbox.regexAnnotator.conceptsFile">2. Concepts Configuration File</a></span></dt><dd><dl><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.rules">2.1. RuleSet definition</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.concepts">2.2. Concept definition</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.regexVariables">2.3. Regex Variables</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.rulesDefinition">2.4. Rule Definition</a></span></dt><dd><dl><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.rulesDefinition.filter">2.4.1. Match Type Filter</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.rulesDefinition.update">2.4.2. Update Match Type Annotations With Additional Features</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.rulesDefinition.exception">2.4.3. Rule exception</a></span></dt></dl></dd><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation">2.5. Annotation Creation</a></span></dt><dd><dl><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.boundaries">2.5.1. Annotation Boundaries</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.validation">2.5.2. Annotation Validation</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.features">2.5.3. Annotation Features</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#sandbox.regexAnnotator.annotatorDescriptor">3. Annotator Descriptor</a></span></dt><dd><dl><dt><span class="section"><a href="#sandbox.regexAnnotator.annotatorDescriptor.configParam">3.1. Configuration Parameters</a></span></dt><dt><span class="section"><a href="#sandbox.regexAnnotator.annotatorDescriptor.capabilities">3.2. Capabilities</a></span></dt></dl></dd><dt><span class="appendix"><a href="#sandbox.regexAnnotator.xsd">A. Concept File Schema</a></span></dt><dt><span class="appendix"><a href="#sandbox.regexAnnotator.Validation">B. Validation Interface</a></span></dt><dt><span class="appendix"><a href="#sandbox.regexAnnotator.Normalization">C. Normalization Interface</a></span></dt></dl></div><div class="preface" lang="en" id="d0e54"><div class="titlepage"><div><div><h2 class="title"><a name="d0e54"></a>Introduction</h2></div></div></div><p>
+			The Regular Expression Annotator (RegexAnnotator) is an
+			Apache UIMA analysis engine that detects entities such as
+			email addresses, URLs, phone numbers, zip codes or any other
+			entity that can be specified using a regular expression. For
+			each detected entity, a new annotation can be created or an
+			existing annotation can be updated with new features.
+
+			To detect more complex entities as well, the annotator
+			provides advanced filter capabilities and a rule definition
+			syntax that combines rules into a concept, with a confidence
+			value for each of the concept's rules.
+		</p></div><div class="chapter" lang="en" id="sandbox.regexAnnotator.processingOverview"><div class="titlepage"><div><div><h2 class="title"><a name="sandbox.regexAnnotator.processingOverview"></a>Chapter&nbsp;1.&nbsp;Processing Overview</h2></div></div></div><p>
+			To detect any kind of entity, the RegexAnnotator must be
+			configured using an external XML file. We call this file the
+			"concept file" since it contains the regular expressions and
+			concepts that the annotator uses during its processing to
+			detect entities. In addition to the rules, the concept file
+			also defines the "entity result processing" that is performed
+			when an entity is detected. The "entity result processing" can
+			either be the creation of new annotations or an update of an
+			existing annotation with additional features. The types and
+			features that are used to create new annotations have to be
+			available in the UIMA type system.
+		</p><p>
+			After the concept file is created, the annotator XML
+			descriptor has to be updated with the capabilities and,
+			if necessary, with the type system information from the
+			concept file. The capability update is necessary so that the
+			UIMA framework can also call the annotator in complex
+			annotator flows when it is assembled with other annotators
+			into an aggregate analysis engine. The UIMA type system
+			update is only necessary if the used types are not yet
+			available in the UIMA type system definition.
+		</p><p>
+			Once the descriptor updates are complete, the
+			RegexAnnotator is ready to use. During initialization,
+			the annotator reads the concept file and checks that all
+			rules and concepts are valid and that all annotation
+			types are defined in the UIMA type system.
+			For each document that is processed, the rules and concepts
+			are executed in exactly the same order as defined in the
+			concept file. The results and annotations created by a
+			preceding rule can be used by the following ones since they
+			are stored in the CAS.
+		</p></div><div class="chapter" lang="en" id="sandbox.regexAnnotator.conceptsFile"><div class="titlepage"><div><div><h2 class="title"><a name="sandbox.regexAnnotator.conceptsFile"></a>Chapter&nbsp;2.&nbsp;Concepts Configuration File</h2></div></div></div><p>
+			The RegexAnnotator can be configured using two levels of
+			complexity.
+		</p><p>
+			The RuleSet definition is the easier way to define rules.
+			Such a definition consists of a regular expression pattern
+			and of the annotations that should be created if the rule
+			matches an entity.
+		</p><p>
+			The Concept definition is the more complex way to define
+			rules. Such a definition can consist of more than one
+			regular expression rule, combined together, and of
+			a set of annotations that should be created if one of the
+			rules has matched an entity.
+		</p><p>
+			The syntax for both definitions is the same, so you don't
+			need to learn two configuration possibilities. The RuleSet
+			definition is just available to have an easier and faster
+			way to configure the annotator for simple tasks. If you have
+			a RuleSet definition it is also possible to extend it with
+			more and more features so that it becomes a real Concept
+			definition.
+		</p><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.conceptsFile.rules"></a>2.1.&nbsp;RuleSet definition</h2></div></div></div><p>
+				The syntax of a simple RuleSet definition to detect email addresses 
+				is shown in the listing below:
+			</p><p>
+				</p><pre class="programlisting">&lt;conceptSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
+  xsi:noNamespaceSchemaLocation="concept.xsd"&gt;
+
+  &lt;concept name="emailAddressDetection"&gt;
+    &lt;rules&gt;
+      &lt;rule regEx="([a-zA-Z0-9!#$%*+'/=?^_-`{|}~.\x26]+)@
+      			([a-zA-Z0-9._-]+[a-zA-Z]{2,4})" 
+        matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"/&gt;		
+    &lt;/rules&gt;
+    &lt;createAnnotations&gt;	
+      &lt;annotation id="emailAnnot" type="org.apache.uima.EmailAddress"&gt;
+        &lt;begin group="0"/&gt;
+        &lt;end group="0"/&gt;
+      &lt;/annotation&gt;
+    &lt;/createAnnotations&gt;
+  &lt;/concept&gt;
+
+&lt;/conceptSet&gt;
+</pre><p>
+			</p><p>
+				The definition above defines a simple concept
+				with the name <code class="code">emailAddressDetection</code>. The 
+				defined rule uses <code class="code">([a-zA-Z0-9!#$%*+'/=?^_-`{|}~.\x26]+)@([a-zA-Z0-9._-]+[a-zA-Z]{2,4})</code> as 
+				the regular expression pattern, which is matched against the
+				covered text of the match type <code class="code">uima.tcas.DocumentAnnotation</code>. 
+				The match strategy is <code class="code">matchAll</code>, which means that all
+				matches for the pattern are used to create the
+				annotations defined in the
+				<code class="code">&lt;createAnnotations&gt;</code>
+				element. So for each match, an
+				<code class="code">org.apache.uima.EmailAddress</code> annotation is created that
+				covers the match in the document text.
+			</p><p>
+				For additional annotation creation possibilities such as adding
+				features to a created annotation, please refer to 
+				<a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation" title="2.5.&nbsp;Annotation Creation">Section&nbsp;2.5, &#8220;Annotation Creation&#8221;</a>.
+			</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.conceptsFile.concepts"></a>2.2.&nbsp;Concept definition</h2></div></div></div><p>The syntax of a complex Concept definition to detect credit card numbers for the 
+			  RegexAnnotator is shown in the listing below:</p><p>
+			
+			</p><pre class="programlisting">&lt;conceptSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
+    xsi:noNamespaceSchemaLocation="concept.xsd"&gt;
+
+    &lt;concept name="creditCardNumberDetection" processAllRules="true"&gt;
+      &lt;rules&gt;
+        &lt;rule ruleId="AmericanExpress" 
+              regEx="(((34|37)\d{2}[- ]?)(\d{6}[- ]?)\d{5})" 
+              matchStrategy="matchAll" 
+              matchType="uima.tcas.DocumentAnnotation" 
+              confidence="1.0"/&gt;
+        &lt;rule ruleId="Visa" 
+              regEx="((4\d{3}[- ]?)(\d{4}[- ]?){2}\d{4})" 
+              matchStrategy="matchAll" 
+              matchType="uima.tcas.DocumentAnnotation" 
+              confidence="1.0"/&gt;
+        &lt;rule ruleId="MasterCard" 
+              regEx="((5[1-5]\d{2}[- ]?)(\d{4}[- ]?){2}\d{4})" 
+              matchStrategy="matchAll" 
+              matchType="uima.tcas.DocumentAnnotation" 
+              confidence="1.0"/&gt;
+        &lt;rule ruleId="unknownCardType" 
+              regEx="(([1-6]\d{3}[- ])(\d{4}[- ]){2}\d{4})|
+                 ([1-6]\d{13,18})|([1-6]\d{3}[- ]\d{6}[- ]\d{5})" 
+              matchStrategy="matchAll" 
+              matchType="uima.tcas.DocumentAnnotation" 
+              confidence="1.0"/&gt;		
+      &lt;/rules&gt; 
+      &lt;createAnnotations&gt;	
+        &lt;annotation	id="creditCardNumber" 
+            		type="org.apache.uima.CreditCardNumber" 
+            		validate="org.apache.uima.annotator.regex.
+            		    extension.impl.CreditCardNumberValidator"&gt;
+          &lt;begin group="0"/&gt;
+          &lt;end group="0"/&gt;
+          &lt;setFeature name="confidence" type="Confidence"/&gt;
+          &lt;setFeature name="cardType" type="RuleId"/&gt;
+        &lt;/annotation&gt;
+      &lt;/createAnnotations&gt;
+    &lt;/concept&gt;
+
+&lt;/conceptSet&gt;
+</pre><p>
+				
+			</p><p>
+				As you can see, the Concept definition is a more complex
+				RuleSet definition. The main differences are some additional
+				features defined on the rule and the combination of several rules 
+				within one concept. 
+				The new features for a rule are <code class="code">ruleId</code>
+				and <code class="code">confidence</code>. If these features
+				are specified, their values can 
+				later be assigned to a feature of a created annotation. 
+				In the listing above, this means that when the 
+				<code class="code">org.apache.uima.CreditCardNumber</code> annotation is created, the value of the
+				<code class="code">confidence</code> feature of the rule that matched the document text 
+				is assigned to the annotation feature called <code class="code">confidence</code>.
+				In the same way, the value of the <code class="code">ruleId</code> feature is assigned to the annotation feature <code class="code">cardType</code>.
+				With that you can later check the confidence of an annotation and see 
+				which rule was responsible for its creation.
+			</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
+					The annotation features for <code class="code">Confidence</code>
+					and <code class="code">RuleId</code>
+					have to be created manually in the UIMA type system.
+					The <code class="code">confidence</code> and <code class="code">ruleId</code> 
+					values can be assigned to any annotation feature you have defined 
+					in the UIMA type system, provided the types are compatible: Confidence features have to be of type
+					<code class="code">uima.cas.Float</code> and RuleId features have to be of
+					type <code class="code">uima.cas.String</code>.
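+				</p><p>
+					As a minimal sketch, the fragment below shows how a UIMA type system
+					descriptor might declare the <code class="code">org.apache.uima.CreditCardNumber</code>
+					type so that the <code class="code">confidence</code> and <code class="code">cardType</code>
+					features set in the credit card listing exist. The supertype and the
+					empty descriptions are assumptions made for illustration; adapt them
+					to your own type system.
+				</p><pre class="programlisting">&lt;typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier"&gt;
+  &lt;types&gt;
+    &lt;typeDescription&gt;
+      &lt;name&gt;org.apache.uima.CreditCardNumber&lt;/name&gt;
+      &lt;description/&gt;
+      &lt;!-- assumption: a plain text annotation is sufficient here --&gt;
+      &lt;supertypeName&gt;uima.tcas.Annotation&lt;/supertypeName&gt;
+      &lt;features&gt;
+        &lt;!-- receives the rule's confidence value --&gt;
+        &lt;featureDescription&gt;
+          &lt;name&gt;confidence&lt;/name&gt;
+          &lt;description/&gt;
+          &lt;rangeTypeName&gt;uima.cas.Float&lt;/rangeTypeName&gt;
+        &lt;/featureDescription&gt;
+        &lt;!-- receives the rule's ruleId value --&gt;
+        &lt;featureDescription&gt;
+          &lt;name&gt;cardType&lt;/name&gt;
+          &lt;description/&gt;
+          &lt;rangeTypeName&gt;uima.cas.String&lt;/rangeTypeName&gt;
+        &lt;/featureDescription&gt;
+      &lt;/features&gt;
+    &lt;/typeDescription&gt;
+  &lt;/types&gt;
+&lt;/typeSystemDescription&gt;
+</pre><p>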
+				</p></div><p>
+				The processing of a concept definition depends on the rule processing.
+				The feature that controls the rule processing is called
+				<code class="code">processAllRules</code> and is specified on the <code class="code">&lt;concept&gt;</code> element.
+				By default this optional feature is set to <code class="code">false</code>. 
+				This means that the concept processing 
+				starts with the first rule and continues with the next one 
+				only until a match is found. So in this processing mode, possibly only the first rule
+				of a concept is evaluated: if it produces a match, the other rules
+				of this concept are ignored.
+				This strategy is useful, for example, if your first concept 
+				rule has a strict pattern with a confidence of 1.0 and your 
+				second rule has a more lenient pattern with a confidence
+				of 0.5. If the <code class="code">processAllRules</code> feature
+				is set to <code class="code">true</code>, all rules of a concept are processed 
+				independently of the matches for a previous rule.
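+			</p><p>
+				The strict-then-lenient strategy can be sketched as follows. The concept
+				name, the rule IDs, both regular expressions and the
+				<code class="code">org.apache.uima.PhoneNumber</code> type with its
+				<code class="code">confidence</code> feature are hypothetical and chosen
+				for illustration only; they are not part of the annotator.
+			</p><pre class="programlisting">&lt;concept name="phoneNumberDetection" processAllRules="false"&gt;
+  &lt;rules&gt;
+    &lt;!-- strict pattern, evaluated first --&gt;
+    &lt;rule ruleId="strictPattern" 
+          regEx="\(\d{3}\) \d{3}-\d{4}" 
+          matchStrategy="matchAll" 
+          matchType="uima.tcas.DocumentAnnotation" 
+          confidence="1.0"/&gt;
+    &lt;!-- lenient pattern, only evaluated if the strict rule found no match --&gt;
+    &lt;rule ruleId="lenientPattern" 
+          regEx="\d{3}[- .]?\d{3}[- .]?\d{4}" 
+          matchStrategy="matchAll" 
+          matchType="uima.tcas.DocumentAnnotation" 
+          confidence="0.5"/&gt;
+  &lt;/rules&gt;
+  &lt;createAnnotations&gt;
+    &lt;annotation id="phoneNumber" type="org.apache.uima.PhoneNumber"&gt;
+      &lt;begin group="0"/&gt;
+      &lt;end group="0"/&gt;
+      &lt;setFeature name="confidence" type="Confidence"/&gt;
+    &lt;/annotation&gt;
+  &lt;/createAnnotations&gt;
+&lt;/concept&gt;
+</pre><p>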
+			</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.conceptsFile.regexVariables"></a>2.3.&nbsp;Regex Variables</h2></div></div></div><p>
+				Regex variables allow you to externalize parts of a regular expression
+				to shorten it and make it easier to read. The externalized part of the
+				expression is replaced with a regex variable. The variable syntax looks like
+				<code class="code">\v{weekdays}</code>, where <code class="code">weekdays</code> is the variable name.
+				Regex variables are mainly used to separate enumerations from a 
+				regular expression to make them easier to understand and maintain.
+				The short example below shows how this works.
+			</p><p>
+			    A simple regular expression for a date like <code class="code">Wednesday, November 28, 2007</code>
+			    can look like: 
+			</p><p>
+			   </p><pre class="programlisting"><span class="emphasis"><em>&lt;concept name="Date" processAllRules="true"&gt;
+ &lt;rules&gt;
+  &lt;rule regEx="(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday), 
+   (January|February|March|April|May|June|July|August|September|October|
+   November|December) (0[1-9]|[12][0-9]|3[01]), ((19|20)\d\d)"
+   matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"/&gt;
+ &lt;/rules&gt;
+ &lt;createAnnotations&gt;
+  &lt;annotation type="org.apache.uima.Date"&gt;
+   &lt;begin group="0" /&gt;
+   &lt;end group="0" /&gt;
+  &lt;/annotation&gt;
+ &lt;/createAnnotations&gt;
+&lt;/concept&gt;		   
+</em></span></pre><p>
+			</p><p>
+			   When using regex variables to externalize the weekdays and the months in this 
+			   regular expression, it looks like: 
+			</p><p>
+			   </p><pre class="programlisting"><span class="emphasis"><em>&lt;conceptSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xmlns="http://incubator.apache.org/uima/regex"&gt;
+
+&lt;variables&gt;
+ &lt;variable name="weekdays" 
+   value="Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"/&gt;
+ 
+ &lt;variable name="months" 
+   value="January|February|March|April|May|June|July|August|September|
+     October|November|December"/&gt;
+&lt;/variables&gt;
+			   
+
+&lt;concept name="Date" processAllRules="true"&gt;
+ &lt;rules&gt;
+  &lt;rule regEx="(\v{weekdays}), (\v{months}) (0[1-9]|[12][0-9]|3[01]), 
+     ((19|20)\d\d)"
+     matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"/&gt;
+  &lt;/rules&gt;
+  &lt;createAnnotations&gt;
+   &lt;annotation type="org.apache.uima.Date"&gt;
+    &lt;begin group="0" /&gt;
+    &lt;end group="0" /&gt;
+   &lt;/annotation&gt;
+ &lt;/createAnnotations&gt;
+&lt;/concept&gt;		
+
+&lt;/conceptSet&gt;   
+</em></span></pre><p>
+			</p><p>
+			  The regex variables must be defined at the beginning of the concept file, 
+			  directly inside the <code class="code">&lt;conceptSet&gt;</code> element and before the concepts are 
+			  defined. The variables can be used in all concept definitions within the
+			  same file. 
+			</p><p>
+			  The regex variable name can contain only the following characters: 
+			  <code class="code">[a-zA-Z_0-9]</code>. Other characters are not allowed.
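+			</p><p>
+			  As a sketch of this reuse, the concept file below defines one variable and
+			  references it from two concepts. The <code class="code">MonthYear</code> concept and the
+			  <code class="code">org.apache.uima.MonthYear</code> type are hypothetical and only serve
+			  as an illustration.
+			</p><pre class="programlisting">&lt;conceptSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xmlns="http://incubator.apache.org/uima/regex"&gt;
+
+&lt;variables&gt;
+ &lt;!-- defined once, before any concept --&gt;
+ &lt;variable name="months" 
+   value="January|February|March|April|May|June|July|August|September|October|November|December"/&gt;
+&lt;/variables&gt;
+
+&lt;concept name="Date"&gt;
+ &lt;rules&gt;
+  &lt;rule regEx="(\v{months}) (0[1-9]|[12][0-9]|3[01]), ((19|20)\d\d)"
+     matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"/&gt;
+ &lt;/rules&gt;
+ &lt;createAnnotations&gt;
+  &lt;annotation type="org.apache.uima.Date"&gt;
+   &lt;begin group="0" /&gt;
+   &lt;end group="0" /&gt;
+  &lt;/annotation&gt;
+ &lt;/createAnnotations&gt;
+&lt;/concept&gt;
+
+&lt;!-- hypothetical second concept reusing the same variable --&gt;
+&lt;concept name="MonthYear"&gt;
+ &lt;rules&gt;
+  &lt;rule regEx="(\v{months}) ((19|20)\d\d)"
+     matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"/&gt;
+ &lt;/rules&gt;
+ &lt;createAnnotations&gt;
+  &lt;annotation type="org.apache.uima.MonthYear"&gt;
+   &lt;begin group="0" /&gt;
+   &lt;end group="0" /&gt;
+  &lt;/annotation&gt;
+ &lt;/createAnnotations&gt;
+&lt;/concept&gt;
+
+&lt;/conceptSet&gt;
+</pre><p>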
+			</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.conceptsFile.rulesDefinition"></a>2.4.&nbsp;Rule Definition</h2></div></div></div><p>
+				This section shows in detail how to define a rule for a 
+				RuleSet or Concept definition and describes some advanced 
+				configuration possibilities for the rule processing.
+			</p><p>
+				The listing below shows an abstract rule definition with
+				all possible sub elements and attributes. Please refer to
+				the sub sections for details about the sub elements.
+			</p><p> 
+</p><pre class="programlisting"><span class="emphasis"><em>&lt;rule ruleId="ID1" regEx="TestRegex" matchStrategy="matchAll" 
+    matchType="uima.tcas.DocumentAnnotation" featurePath="my/feature/path" 
+    confidence="1.0"&gt;
+
+  &lt;matchTypeFilter&gt;
+    &lt;feature name="language"&gt;en&lt;/feature&gt;
+  &lt;/matchTypeFilter&gt;
+
+  &lt;updateMatchTypeAnnotation&gt;
+    &lt;setFeature name="language" type="String"&gt;$0&lt;/setFeature&gt;
+  &lt;/updateMatchTypeAnnotation&gt;	
+
+  &lt;ruleExceptions&gt;	
+    &lt;exception matchType="uima.tcas.DocumentAnnotation"&gt;
+        ExceptionExpression
+    &lt;/exception&gt;
+  &lt;/ruleExceptions&gt;
+
+&lt;/rule&gt;
+</em></span></pre><p>
+			</p><p>
+				For each rule that should be added, a <code class="code">&lt;rule&gt;</code> element
+				has to be created. The <code class="code">&lt;rule&gt;</code> element definition has three 
+				mandatory features:
+			</p><p>
+					</p><div class="itemizedlist"><ul type="disc"><li><p>
+								<code class="code">regEx</code>
+								- The regular expression pattern that
+								is used for this rule. As pattern, everything supported 
+								by the Java regular expression syntax is allowed.
+							</p></li><li><p>
+								<code class="code">matchStrategy</code>
+								- The match strategy that is used
+								for this rule. Possible values are
+								<code class="code">matchAll</code>
+								to get all matches,
+								<code class="code">matchFirst</code>
+								to get the first match only and
+								<code class="code">matchComplete</code>
+								to get matches where the whole input
+								text matches the regular expression pattern.
+							</p></li><li><p>
+								<code class="code">matchType</code>
+								- The annotation type that is used 
+								to match the regular expression pattern.
+								As input text for the match, the annotation span 
+								is used, but only if no additional <code class="code">featurePath</code>
+								feature is specified.
+							</p></li></ul></div><p>
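+					The three match strategies can be compared with a small sketch. The rule below is
+					an illustrative assumption and not taken from a shipped concept file: it looks for 
+					four consecutive digits in the document text. With <code class="code">matchAll</code>, 
+					every occurrence found in the text would be reported as a match; with 
+					<code class="code">matchFirst</code>, as shown, only the first occurrence; and with 
+					<code class="code">matchComplete</code>, a match would be reported only if the document text 
+					as a whole consists of exactly four digits.
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;rule regEx="[0-9]{4}" matchStrategy="matchFirst" 
+    matchType="uima.tcas.DocumentAnnotation"/&gt;
+</em></span></pre><p>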
+				</p><p>
+					In addition to the mandatory features the <code class="code">&lt;rule&gt;</code>
+					element definition also has some optional features that can
+					be used, these are:
+				</p><div class="itemizedlist"><ul type="disc"><li><p>
+							<code class="code">ruleId</code>
+							- Specifies the ID for this rule. The
+							ID can later be used to add it as
+							value to an annotation feature (see
+							<a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.features" title="2.5.3.&nbsp;Annotation Features">Section&nbsp;2.5.3, &#8220;Annotation Features&#8221;</a>).
+						</p></li><li><p>
+							<code class="code">confidence</code>
+							- Specifies the confidence value of this
+							rule. If you have more than one rule that describes 
+							the same complex entity you can classify the rules with
+							a confidence value. This confidence value
+							can later be used to add it as value to an
+							annotation feature (see
+							<a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.features" title="2.5.3.&nbsp;Annotation Features">Section&nbsp;2.5.3, &#8220;Annotation Features&#8221;</a>).
+						</p></li><li><p>
+							<code class="code">featurePath</code>
+							- Specifies the feature path that should be used to match the regular expression pattern. 
+							If a feature path is specified, the feature path value is used to match against the
+							regular expression instead of the match type annotation span.
+							The defined feature path must be valid for the specified match type annotation type. 
+							The feature path elements are separated by "/".
+						</p><p>
+						    The listing below shows how to match a regular expression on the <code class="code">normalizedText</code> 
+						    feature of a <code class="code">uima.TokenAnnotation</code>. So in this case, not the covered text of the
+						    <code class="code">uima.TokenAnnotation</code> is used to match the regular expression but the 
+						    <code class="code">normalizedText</code> feature value of the annotation. The <code class="code">normalizedText</code> 
+						    feature must be defined in the UIMA type system as feature of type <code class="code">uima.TokenAnnotation</code>.
+						</p><p>
+						    </p><pre class="programlisting"><span class="emphasis"><em>&lt;rule regEx="TestRegex" matchStrategy="matchAll" 
+    matchType="uima.TokenAnnotation" featurePath="normalizedText"&gt;
+&lt;/rule&gt;
+</em></span></pre><p>
+						</p></li></ul></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sandbox.regexAnnotator.conceptsFile.rulesDefinition.filter"></a>2.4.1.&nbsp;Match Type Filter</h3></div></div></div><p>
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;matchTypeFilter&gt;
+  &lt;feature featurePath="language"&gt;en&lt;/feature&gt;
+&lt;/matchTypeFilter&gt;
+</em></span></pre><p>
+					
+
+				</p><p>
+					Match type filters can be used to filter the match type
+					annotations that are used for matching the regular expression
+					pattern, for example to apply a rule only when the document language 
+					is English, as shown in the example above. 
+					Match type filters always relate to the <code class="code">matchType</code>
+					that was specified for the rule.
+				</p><p>
+					The <code class="code">&lt;matchTypeFilter&gt;</code>
+					element can contain any number of
+					<code class="code">&lt;feature&gt;</code>
+					elements that contain the filter information. All specified <code class="code">&lt;feature&gt;</code>
+					elements have to be valid for the <code class="code">matchType</code> annotation
+					of the rule.
+				</p><p>
+					The feature path that should be used as 
+					filter is specified using the <code class="code">featurePath</code> feature of the
+					<code class="code">&lt;feature&gt;</code> element. Feature path elements are separated by "/" e.g.
+					my/feature/path. The specified feature path must be valid for the <code class="code">matchType</code> annotation.
+					The content of the
+					<code class="code">&lt;feature&gt;</code> element contains the regular expression pattern 
+					that is used as filter. To pass the filter, this pattern 
+					has to match the feature path value that is resolved using the match type annotation. 
+					In the example above, the match type annotation has a UIMA feature called 
+					<code class="code">language</code> that must have the content <code class="code">en</code>. If that
+					is the case, the annotation passes the filter condition.
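+				</p><p>
+					Because the element content is a regular expression and not a plain literal, a filter
+					can accept more than one value. The following sketch is only an illustration under that 
+					assumption: it lets annotations pass whose <code class="code">language</code> 
+					feature is either <code class="code">en</code> or <code class="code">de</code>. Whether the 
+					pattern has to cover the complete feature value or only a part of it is not specified here, 
+					so it is safest to write patterns that describe the whole value.
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;matchTypeFilter&gt;
+  &lt;feature featurePath="language"&gt;en|de&lt;/feature&gt;
+&lt;/matchTypeFilter&gt;
+</em></span></pre><p>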
+				</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sandbox.regexAnnotator.conceptsFile.rulesDefinition.update"></a>2.4.2.&nbsp;Update Match Type Annotations With Additional Features</h3></div></div></div><p>	
+					</p><pre class="programlisting"><span class="emphasis"><em>&lt;updateMatchTypeAnnotation&gt;
+  &lt;setFeature name="language" type="String"&gt;$0&lt;/setFeature&gt;
+&lt;/updateMatchTypeAnnotation&gt;
+</em></span></pre><p>
+				</p><p>
+					With the
+					<code class="code">&lt;updateMatchTypeAnnotation&gt;</code>
+					construct it is possible to update or set a UIMA feature value
+					for the match type annotation in case a rule match
+					was found. The
+					<code class="code">&lt;updateMatchTypeAnnotation&gt;</code> element
+					can have any number of
+					<code class="code">&lt;setFeature&gt;</code> elements that contain
+					the feature information that should be updated.
+				</p><p>
+					The	<code class="code">&lt;setFeature&gt;</code> element has two 
+					mandatory features, these are:
+				</p><div class="itemizedlist"><ul type="disc"><li><p>
+							<code class="code">name</code>
+							- Specifies the UIMA feature name that
+							should be set. The feature has to be available
+							on the <code class="code">matchType</code> annotation
+							of the rule.
+						</p></li><li><p>
+							<code class="code">type</code>
+							- Specifies the UIMA feature type that is
+							defined in the UIMA type system for this feature. 
+							Currently supported feature types are <code class="code">String</code>,
+							<code class="code">Integer</code> and <code class="code">Float</code>.
+						</p></li></ul></div><p>
+					The	optional features are:
+				</p><div class="itemizedlist"><ul type="disc"><li><p>
+							<code class="code">normalization</code>
+							- Specifies the normalization that should be performed before the feature value
+							is assigned to the match type annotation. For a list of all built-in
+							normalization functions please refer to 
+							<a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.featureNormalization" title="2.5.3.2.&nbsp;Features Value Normalization">Section&nbsp;2.5.3.2, &#8220;Features Value Normalization&#8221;</a>. 
+						</p></li><li><p>
+							<code class="code">class</code>
+							- Specifies the custom normalization class that should be used to normalize the
+							feature value before it is assigned to the match type annotation. Custom normalization
+							classes are used if the <code class="code">normalization</code> feature has the value 
+							<code class="code">Custom</code>. The normalization class has to implement the
+							<code class="code">org.apache.uima.annotator.regex.extension.Normalization</code> interface.
+							For details about the feature normalization please refer to 
+							<a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.featureNormalization" title="2.5.3.2.&nbsp;Features Value Normalization">Section&nbsp;2.5.3.2, &#8220;Features Value Normalization&#8221;</a>. 
+						</p></li></ul></div><p>
+					The content of the	<code class="code">&lt;setFeature&gt;</code>
+					element definition contains the feature value that should be set. 
+					This can either be a literal value or a regular
+					expression capturing group as shown in the example
+					above. A combination of capturing groups and literals
+					is also possible.
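+				</p><p>
+					A minimal sketch of such a combination is shown below. It assumes a rule whose regular 
+					expression defines at least one capturing group and a match type that has a String feature 
+					named <code class="code">label</code>; both the feature name and the literal prefix 
+					<code class="code">code-</code> are hypothetical and only serve as illustration.
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;updateMatchTypeAnnotation&gt;
+  &lt;setFeature name="label" type="String"&gt;code-$1&lt;/setFeature&gt;
+&lt;/updateMatchTypeAnnotation&gt;
+</em></span></pre><p>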
+				</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sandbox.regexAnnotator.conceptsFile.rulesDefinition.exception"></a>2.4.3.&nbsp;Rule exception</h3></div></div></div><p>
+					 
+					</p><pre class="programlisting"><span class="emphasis"><em>&lt;ruleExceptions&gt;	
+  &lt;exception matchType="uima.tcas.DocumentAnnotation"&gt;
+      ExceptionPattern
+  &lt;/exception&gt;
+&lt;/ruleExceptions&gt;
+</em></span></pre><p>			
+
+				</p><p>
+					With the
+					<code class="code">&lt;ruleExceptions&gt;</code>
+					construct it is possible to configure exceptions to prevent matches for the rule. 
+					An exception is similar to a filter, but works on a higher level. For
+					example, take the scenario where you have several token annotations that
+					are covered by a sentence annotation. You have written a rule that can detect
+					car brands. The text you analyze contains the sentence "Henry Ford was born 1863". 
+					When analyzing the text you will get a car brand annotation since "Ford" is
+					a car brand, although here it is a person's name. To work around that issue,
+					you can create an exception that looks like
+					 </p><pre class="programlisting"><span class="emphasis"><em>&lt;ruleExceptions&gt;	
+  &lt;exception matchType="uima.SentenceAnnotation"&gt;Henry&lt;/exception&gt;
+&lt;/ruleExceptions&gt;
+</em></span></pre><p>
+					and add it to your car brand rule. After adding this, car brand annotations
+					are only created if the sentence annotation that covers the token annotation
+					does not contain the word "Henry". 					
+				</p><p>
+					The <code class="code">&lt;ruleExceptions&gt;</code> element can have 
+					any number of <code class="code">&lt;exception&gt;</code>
+					elements to specify rule exceptions.
+				</p><p>
+					The <code class="code">&lt;exception&gt;</code>
+					element has one mandatory feature called
+					<code class="code">matchType</code>. The <code class="code">matchType</code> feature
+					specifies the annotation type the exception is based on. 
+					The concrete exception match type annotation that is used 
+					at runtime is determined for each
+					match type annotation that is used to match a rule. The
+					exception annotation is always the annotation that covers
+					the current match type annotation. 
+					If no covering annotation instance of the exception match type 
+					is found, the exception is not evaluated.
+				</p><p>
+					The content of the <code class="code">&lt;exception&gt;</code>
+					element specifies the regular expression that is used to evaluate the exception.
+				</p><p>
+					If the exception match is true, the
+					current match type annotation is filtered out and is
+					not used to create any matches and annotations.
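+				</p><p>
+					To show where the exception is placed, the car brand scenario described above could be 
+					written as the complete rule sketched below. The rule ID and the list of brand names in the 
+					regular expression are made up for this illustration; the annotation types are the ones 
+					used in the scenario.
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;rule ruleId="carBrand" regEx="Ford|Opel|Volvo" matchStrategy="matchAll" 
+    matchType="uima.TokenAnnotation"&gt;
+
+  &lt;ruleExceptions&gt;
+    &lt;exception matchType="uima.SentenceAnnotation"&gt;Henry&lt;/exception&gt;
+  &lt;/ruleExceptions&gt;
+
+&lt;/rule&gt;
+</em></span></pre><p>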
+				</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.conceptsFile.annotationCreation"></a>2.5.&nbsp;Annotation Creation</h2></div></div></div><p>
+				  This section explains in detail how annotations are created when a rule has matched some input text.
+				  An annotation creation example with all possible settings is shown in the listing below.
+				</p><p>
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;annotation id="testannot" type="org.apache.uima.TestAnnot" 
+	validate="CustomValidatorClass"&gt;
+	&lt;begin group="0" location="start"/&gt;
+	&lt;end group="0" location="end"/&gt;
+	&lt;setFeature name="testFeature1" type="String"&gt;$0&lt;/setFeature&gt;
+	&lt;setFeature name="testFeature2" type="String" 
+		normalization="ToLowerCase"&gt;$0&lt;/setFeature&gt;
+	&lt;setFeature name="testFeature3" type="Integer"&gt;$1&lt;/setFeature&gt;
+	&lt;setFeature name="testFeature4" type="Float"&gt;$2&lt;/setFeature&gt;		
+	&lt;setFeature name="testFeature5" type="Reference"&gt;testannot1&lt;/setFeature&gt;
+	&lt;setFeature name="confidenceValue" type="Confidence"/&gt;
+	&lt;setFeature name="ruleId" type="RuleId"/&gt;
+	&lt;setFeature name="normalizedText" type="String" 
+		normalization="Custom" 
+		class="org.apache.CustomNormalizer"&gt;$0&lt;/setFeature&gt;
+&lt;/annotation&gt;</em></span></pre><p>
+				</p><p>
+				  The <code class="code">&lt;annotation&gt;</code> element has two mandatory features, these are:
+				</p><p>
+				</p><div class="itemizedlist"><ul type="disc"><li><p>
+							<code class="code">id</code>
+							- Specifies the annotation id for this annotation. If the annotation id is specified, 
+							it must be unique within the same concept. An annotation id is required if the
+							annotation is referred to by another annotation or if the annotation itself refers 
+							to other annotations using a <code class="code">Reference</code> feature. 
+						</p></li><li><p>
+							<code class="code">type</code>
+							- Specifies the UIMA annotation type that is used if an annotation is created. 
+							The type has to be defined in the UIMA type system.
+						</p></li></ul></div><p>
+				</p><p>
+				  The optional features are:
+				</p><p>
+				</p><div class="itemizedlist"><ul type="disc"><li><p>
+							<code class="code">validate</code>
+							- Specifies the custom validator class that is used to validate matches before
+							they are added as annotation to the CAS. For more details about the custom 
+							annotation validation, please refer to 
+							<a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.validation" title="2.5.2.&nbsp;Annotation Validation">Section&nbsp;2.5.2, &#8220;Annotation Validation&#8221;</a>.
+						</p></li></ul></div><p>
+				</p><p>
+				  The mandatory sub elements of the <code class="code">&lt;annotation&gt;</code> element are:
+				</p><p>
+				</p><div class="itemizedlist"><ul type="disc"><li><p>
+							<code class="code">&lt;begin&gt;</code>
+							- Specifies the begin position of the annotation that is created.
+							For details about the <code class="code">&lt;begin&gt;</code> element, please refer
+							to <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.boundaries" title="2.5.1.&nbsp;Annotation Boundaries">Section&nbsp;2.5.1, &#8220;Annotation Boundaries&#8221;</a>.
+						</p></li><li><p>
+							<code class="code">&lt;end&gt;</code>
+							- Specifies the end position of the annotation that is created.
+							For details about the <code class="code">&lt;end&gt;</code> element, please refer
+							to <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.boundaries" title="2.5.1.&nbsp;Annotation Boundaries">Section&nbsp;2.5.1, &#8220;Annotation Boundaries&#8221;</a>.
+						</p></li></ul></div><p>
+				</p><p>
+				  The optional sub elements of the <code class="code">&lt;annotation&gt;</code> element are:
+				</p><p>
+				</p><div class="itemizedlist"><ul type="disc"><li><p>
+							<code class="code">&lt;setFeature&gt;</code>
+							- Sets a UIMA feature for the created annotation.
+							For details about the <code class="code">&lt;setFeature&gt;</code> element, please refer
+							to <a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.features" title="2.5.3.&nbsp;Annotation Features">Section&nbsp;2.5.3, &#8220;Annotation Features&#8221;</a>.
+						</p></li></ul></div><p>
+				</p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sandbox.regexAnnotator.conceptsFile.annotationCreation.boundaries"></a>2.5.1.&nbsp;Annotation Boundaries</h3></div></div></div><p>
+				  When creating an annotation with the <code class="code">&lt;annotation&gt;</code> element it is also
+				  necessary to define the annotation's boundaries. The annotation boundaries are defined using the
+				  sub elements <code class="code">&lt;begin&gt;</code> and <code class="code">&lt;end&gt;</code>. The start position of
+				  the annotation is defined using the <code class="code">&lt;begin&gt;</code> element, the end position using
+				  the <code class="code">&lt;end&gt;</code> element. Both elements have the same features as shown below:
+				</p><p>
+				</p><div class="itemizedlist"><ul type="disc"><li><p>
+							<code class="code">group</code>
+							- identifies the capturing group number within the regular expression pattern for the 
+							current rule. The value is a non-negative number where 0 denotes 
+							the whole match, 1 the first capturing group, 2 the second one, and so on.
+						</p></li><li><p>
+							<code class="code">location</code>
+							- indicates a position inside the capturing group, which can either be the position 
+							of the left parenthesis in case of a value <code class="code">start</code>, or the right parenthesis in 
+							case of a value <code class="code">end</code>. The <code class="code">location</code> feature is optional. By default
+							the <code class="code">&lt;begin&gt;</code> element is set to <code class="code">location="start"</code> and the 
+							<code class="code">&lt;end&gt;</code> element to <code class="code">location="end"</code>.
+						</p></li></ul></div><p>
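+				  The following sketch illustrates how the two features interact. The regular expression, 
+				  the annotation type <code class="code">org.example.Amount</code> and the sample text are 
+				  assumptions made for this illustration only. For the input text "Price: 42 EUR", capturing group 1 
+				  covers "42" and capturing group 2 covers "EUR". Since the annotation begins at the start of 
+				  group 1 and ends at the end of group 2, the created annotation would cover "42 EUR" and not 
+				  the leading "Price: ".
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;rule regEx="Price: ([0-9]+) (EUR|USD)" matchStrategy="matchAll" 
+    matchType="uima.tcas.DocumentAnnotation"/&gt;
+
+&lt;annotation type="org.example.Amount"&gt;
+   &lt;begin group="1" location="start"/&gt;
+   &lt;end group="2" location="end"/&gt;
+&lt;/annotation&gt;
+</em></span></pre><p>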
+				</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
+					When the rule definition defines a <code class="code">featurePath</code> for a <code class="code">matchType</code>, 
+					the annotation boundaries for the created annotation are automatically set to 
+					the annotation boundaries of the match input annotation. This must be done since
+					the matching with a feature value of an annotation has no relation to the document text, so the only
+					relation is the annotation where the feature is defined. 
+					</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sandbox.regexAnnotator.conceptsFile.annotationCreation.validation"></a>2.5.2.&nbsp;Annotation Validation</h3></div></div></div><p>
+				  Custom annotation validation can be used to validate a regular expression match with 
+				  Java code before the match is added as an annotation to the CAS. For example, if your regular expression
+				  detects an ISBN number, you can use custom validation code to check, by calculating the check digit, 
+				  whether it is really an ISBN number or just a phone number.
+				</p><p>
+				  To use the custom annotation validation you have to specify the validation class at the <code class="code">validate</code>
+				  feature of the <code class="code">&lt;annotation&gt;</code> element. The validation class must implement the 
+				  <code class="code">org.apache.uima.annotator.regex.extension.Validation</code> interface 
+				  (<a href="#sandbox.regexAnnotator.Validation" title="Appendix&nbsp;B.&nbsp;Validation Interface">Appendix&nbsp;B, <i xmlns:xlink="http://www.w3.org/1999/xlink">Validation Interface</i></a>). The interface defines one 
+				  method called <code class="code">validate(String coveredText, String ruleID)</code>. The validate method is called by the annotator
+				  before the match is added as annotation to the CAS. Annotations are only added if the validate method 
+				  returns <code class="code">true</code>, otherwise the match is skipped. The <code class="code">coveredText</code> parameter contains 
+				  the text that matches the regular expression.
+				  The <code class="code">ruleID</code> parameter contains the <code class="code">ruleId</code> of the rule that created the match. It can be 
+				  <code class="code">null</code> if no <code class="code">ruleId</code> was specified for the rule. The listing below shows a sample implementation of the validation interface.  
+				</p><p>
+				</p><pre class="programlisting">package org.apache.uima.annotator.regex;
+
+public class SampleValidator implements 
+	org.apache.uima.annotator.regex.extension.Validation {
+
+   /* (non-Javadoc)
+    * @see org.apache.uima.annotator.regex.extension.Validation
+    *      #validate(java.lang.String, java.lang.String)
+    */
+   public boolean validate(String coveredText, String ruleID) 
+      throws Exception {
+      
+      //implement your custom validation, e.g. to validate ISBN numbers
+      return validateISBNNumbers(coveredText);
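+   }
+
+   // Illustrative helper, not part of the annotator: ISBN-10 checksum test
+   private boolean validateISBNNumbers(String text) {
+      String s = text.replaceAll("[- ]", "");
+      if (!s.matches("[0-9]{9}[0-9Xx]")) return false;
+      int sum = 0;
+      for (int i = 0; i &lt; 10; i++) {
+         char c = s.charAt(i);
+         sum += (10 - i) * (Character.isDigit(c) ? c - '0' : 10);
+      }
+      return sum % 11 == 0;
+   }
+}</pre><p>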
+				</p><p>
+				  The configuration for this example looks like: 
+				</p><p>
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;annotation id="isbnNumber" type="org.apache.uima.ISBNNumber" 
+    validate="org.apache.uima.annotator.regex.SampleValidator"&gt;
+	&lt;begin group="0"/&gt;
+	&lt;end group="0"/&gt;
+&lt;/annotation&gt;</em></span></pre><p>
+				</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sandbox.regexAnnotator.conceptsFile.annotationCreation.features"></a>2.5.3.&nbsp;Annotation Features</h3></div></div></div><p>
+				  With the <code class="code">&lt;setFeature&gt;</code> element of <code class="code">&lt;annotation&gt;</code> definition it is 
+				  possible to set UIMA features for the created annotation. The mandatory features
+				  for the <code class="code">&lt;setFeature&gt;</code> element are: 
+				</p><p>
+				</p><div class="itemizedlist"><ul type="disc"><li><p>
+							<code class="code">name</code>
+							- Specifies the UIMA feature name that should be set. The feature name has to 
+							be a valid UIMA feature for this annotation and has to be defined in the
+							UIMA type system.
+						</p></li><li><p>
+							<code class="code">type</code>
+							- Specifies the type of the UIMA feature. For a list of all
+							possible feature types please refer to 
+							<a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.featureTypes" title="2.5.3.1.&nbsp;Features Types">Section&nbsp;2.5.3.1, &#8220;Features Types&#8221;</a>.
+						</p></li></ul></div><p>
+				</p><p>
+				  The optional features are:
+				</p><p>
+				</p><div class="itemizedlist"><ul type="disc"><li><p>
+							<code class="code">normalization</code>
+							- Specifies the normalization that should be performed before the feature value
+							is assigned to the UIMA annotation. For a list of all built-in
+							normalization functions please refer to 
+							<a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.featureNormalization" title="2.5.3.2.&nbsp;Features Value Normalization">Section&nbsp;2.5.3.2, &#8220;Features Value Normalization&#8221;</a>. 
+						</p></li><li><p>
+							<code class="code">class</code>
+							- Specifies the custom normalization class that should be used to normalize the
+							feature value before it is assigned to the UIMA annotation. Custom normalization
+							classes are used if the <code class="code">normalization</code> feature has the value 
+							<code class="code">Custom</code>. The normalization class has to implement the
+							<code class="code">org.apache.uima.annotator.regex.extension.Normalization</code> interface.
+							For details about the feature normalization please refer to 
+							<a href="#sandbox.regexAnnotator.conceptsFile.annotationCreation.featureNormalization" title="2.5.3.2.&nbsp;Features Value Normalization">Section&nbsp;2.5.3.2, &#8220;Features Value Normalization&#8221;</a>. 
+						</p></li></ul></div><p>
+				</p><p>
+				  The content of the <code class="code">&lt;setFeature&gt;</code> element specifies the value of the
+				  UIMA feature that is set. The value can be a literal, a capturing group or a combination of
+				  both.
+				  There are two ways to add the value of a capturing group.
+				  The first notation is <code class="code">$</code> followed by the capturing group number from 0 to 9,
+				  e.g. <code class="code">$0</code> for capturing group 0 or <code class="code">$7</code> for capturing group 7.
+				  The second notation uses capturing group names. 
+				  If the rule contains named capturing groups, these groups can be accessed 
+				  with <code class="code">${matchGroupName}</code>. To access capturing
+				  groups with a number greater than 9, capturing group names must be used. 
+				</p><p>
+				To add a name to a capturing group, add the fragment <code class="code">\m{groupname}</code>
+				in front of the opening parenthesis of the capturing group. An example for capturing group names is 
+				shown below:
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;concept name="capturingGroupNames"&gt;
+   &lt;rules&gt;
+      &lt;rule ruleId="ID1" 
+         regEx="My \m{groupName}(named capturing group) example" 
+         matchStrategy="matchAll" 
+         matchType="uima.tcas.DocumentAnnotation"/&gt;
+   &lt;/rules&gt;
+   &lt;createAnnotations&gt;	
+      &lt;annotation type="org.apache.uima.TestAnnot"&gt;
+         &lt;begin group="0"/&gt;
+         &lt;end group="0"/&gt;
+         &lt;setFeature name="testFeature0" type="String"&gt;
+            ${groupName}
+         &lt;/setFeature&gt;		
+      &lt;/annotation&gt;
+   &lt;/createAnnotations&gt;
+&lt;/concept&gt;    
+</em></span></pre><p>
+				</p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sandbox.regexAnnotator.conceptsFile.annotationCreation.featureTypes"></a>2.5.3.1.&nbsp;Features Types</h4></div></div></div><p>
+				  When setting a UIMA feature for an annotation using the <code class="code">&lt;setFeature&gt;</code> element, 
+				  the feature type has to be specified according to the UIMA type system definition.
+				  This is done with the <code class="code">type</code> feature of the <code class="code">&lt;setFeature&gt;</code> element.
+				  The list below shows all currently supported feature types: 
+				</p><p>
+				</p><div class="itemizedlist"><ul type="disc"><li><p>
+							<code class="code">String</code>
+							- for <code class="code">uima.cas.String</code> based UIMA features.
+						</p></li><li><p>
+							<code class="code">Integer</code>
+							- for <code class="code">uima.cas.Integer</code> based UIMA features.
+						</p></li><li><p>
+							<code class="code">Float</code>
+							- for <code class="code">uima.cas.Float</code> based UIMA features.
+						</p></li><li><p>
+							<code class="code">Reference</code>
+							- to link a UIMA feature to another annotation. In this case the
+							UIMA feature type has to be the same as the referred annotation type.
+							To reference another annotation instance, the <code class="code">&lt;setFeature&gt;</code>
+							content must be the <code class="code">id</code> of the referred 
+							annotation. The referred annotation instance is the annotation created for
+							the current match.
+						</p></li><li><p>
+							<code class="code">Confidence</code>
+							- to add the value of the <code class="code">confidence</code> feature defined
+							at the <code class="code">&lt;rule&gt;</code> element to this feature. The UIMA feature have to
+							be of type <code class="code">uima.cas.Float</code>.
+						</p></li><li><p>
+							<code class="code">RuleId</code>
+							- to add the value of the <code class="code">ruleId</code> feature defined
+							at the <code class="code">&lt;rule&gt;</code> element to this feature. The UIMA feature have to
+							be of type <code class="code">uima.cas.String</code>.
+						</p></li></ul></div><p>
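+				  A <code class="code">Reference</code> feature could be configured as in the following sketch. 
+				  The annotation types <code class="code">org.example.Title</code> and 
+				  <code class="code">org.example.Person</code>, the feature name <code class="code">title</code> 
+				  and the annotation ids are hypothetical. The sketch assumes that the rule of the concept defines 
+				  a first capturing group and that the type system declares the <code class="code">title</code> 
+				  feature of <code class="code">org.example.Person</code> with the range type 
+				  <code class="code">org.example.Title</code>.
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;createAnnotations&gt;
+   &lt;annotation id="title" type="org.example.Title"&gt;
+      &lt;begin group="1"/&gt;
+      &lt;end group="1"/&gt;
+   &lt;/annotation&gt;
+   &lt;annotation id="person" type="org.example.Person"&gt;
+      &lt;begin group="0"/&gt;
+      &lt;end group="0"/&gt;
+      &lt;setFeature name="title" type="Reference"&gt;title&lt;/setFeature&gt;
+   &lt;/annotation&gt;
+&lt;/createAnnotations&gt;
+</em></span></pre><p>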
+				</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
+					Float and Integer based feature values are converted using the Java NumberFormat for the
+					current Java default locale. If the feature value cannot be converted the feature value is not
+					set and a warning is written to the log. To prevent these warnings it may be useful 
+					to do a custom normalization of the numbers before they are added to the feature.
+					</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sandbox.regexAnnotator.conceptsFile.annotationCreation.featureNormalization"></a>2.5.3.2.&nbsp;Features Value Normalization</h4></div></div></div><p>
+					  Before assigning a feature value to an annotation it is possible to 
+					  normalize the feature value. This can be useful, for example, to convert 
+					  a detected email address to lower case before it is added to the annotation. 
+					  To normalize a feature value the <code class="code">normalization</code> feature of the 
+					  <code class="code">&lt;setFeature&gt;</code> element is used. The built-in normalization functions 
+					  are listed below. Additionally the RegexAnnotator provides an extension point that can be 
+					  implemented to add a custom normalization. 
+				    </p><p>
+				      The built-in functions that can be specified as value of 
+				      the <code class="code">normalization</code> feature are listed below: 
+					</p><p>
+						</p><div class="itemizedlist"><ul type="disc"><li><p>
+									<code class="code">ToLowerCase</code>
+									- normalize the feature value to lower case before it is assigned to the annotation.
+								</p></li><li><p>
+									<code class="code">ToUpperCase</code>
+									- normalize the feature value to upper case before it is assigned to the annotation.
+								</p></li><li><p>
+									<code class="code">Trim</code>
+									- remove all leading and trailing whitespace characters from the feature value before 
+									it is assigned to the annotation.
+								</p></li></ul></div><p>
+						Built-in normalization configuration:
+						</p><pre class="programlisting"><span class="emphasis"><em>&lt;setFeature name="normalizedFeature" type="String" 
+	normalization="ToLowerCase"&gt;$0&lt;/setFeature&gt;</em></span></pre><p>
+   					</p><p>
+					  In case of a custom normalization, the <code class="code">normalization</code> feature must have the value
+					  <code class="code">Custom</code>, and an additional feature of the <code class="code">&lt;setFeature&gt;</code> element 
+					  called <code class="code">class</code> has to be specified, containing the fully qualified class name of the 
+					  custom normalization implementation. The custom normalization implementation has to implement 
+					  the interface <code class="code">org.apache.uima.annotator.regex.extension.Normalization</code> 
+					  (<a href="#sandbox.regexAnnotator.Normalization" title="Appendix&nbsp;C.&nbsp;Normalization Interface">Appendix&nbsp;C, <i xmlns:xlink="http://www.w3.org/1999/xlink">Normalization Interface</i></a>) which defines the 
+					  <code class="code">normalize</code> method to normalize the feature values. A sample implementation with 
+					  the corresponding configuration is shown below.
+					</p><p> 
+					  Custom normalization implementation:
+					  </p><pre class="programlisting">package org.apache.uima;
+					  
+public class CustomNormalizer 
+  implements org.apache.uima.annotator.regex.extension.Normalization {
+
+   /* (non-Javadoc)
+    * @see org.apache.uima.annotator.regex.extension.Normalization
+    *		#normalize(java.lang.String, java.lang.String)
+    */
+   public String normalize(String input, String ruleId) {
+      
+      //implement your custom normalization
+      String result = ...
+      return result;
+   }
+}</pre><p>
+   					</p><p>
+   					  Custom normalization configuration:
+   					  </p><pre class="programlisting"><span class="emphasis"><em>&lt;setFeature name="normalizedFeature" type="String" 
+	normalization="Custom" class="org.apache.uima.CustomNormalizer"&gt;
+  $0
+&lt;/setFeature&gt;</em></span></pre><p>
+   					</p></div></div></div></div><div class="chapter" lang="en" id="sandbox.regexAnnotator.annotatorDescriptor"><div class="titlepage"><div><div><h2 class="title"><a name="sandbox.regexAnnotator.annotatorDescriptor"></a>Chapter&nbsp;3.&nbsp;Annotator Descriptor</h2></div></div></div><p>The RegexAnnotator analysis engine descriptor contains some processing information for 
+			the annotator. The processing information is specified as configuration parameters. 
+			This chapter explains the possible descriptor settings in detail.
+			</p><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.annotatorDescriptor.configParam"></a>3.1.&nbsp;Configuration Parameters</h2></div></div></div><p>
+				  The RegexAnnotator has the following configuration parameters: 
+				</p><p>
+					</p><div class="itemizedlist"><ul type="disc"><li><p>
+								<code class="code">ConceptFiles</code>
+								- This parameter is an array of Strings that lists 
+								the concept files the annotator should use. The concept files
+								must be specified using a relative path that can be resolved via the
+								UIMA datapath or the classpath.
+								</p><pre class="programlisting"><span class="emphasis"><em>&lt;nameValuePair&gt;
+  &lt;name&gt;ConceptFiles&lt;/name&gt;
+  &lt;value&gt;
+    &lt;array&gt;
+      &lt;string&gt;subdir/myConcepts.xml&lt;/string&gt;
+      &lt;string&gt;SampleConcept.xml&lt;/string&gt; 
+    &lt;/array&gt;
+  &lt;/value&gt;
+&lt;/nameValuePair&gt;</em></span></pre><p>
+							</p></li></ul></div><p>
+				</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sandbox.regexAnnotator.annotatorDescriptor.capabilities"></a>3.2.&nbsp;Capabilities</h2></div></div></div><p>
+				  In the capabilities section of the RegexAnnotator descriptor the input and output 
+				  capabilities and the supported languages have to be defined. 
+				</p><p>
+				  The input capabilities defined
+				  in the descriptor have to comply with the match types used in the concept rule file 
+				  that is used. For example, the <code class="code">uima.SentenceAnnotation</code> type used in the rule
+				  below has to be added to the input capability section in the RegexAnnotator descriptor.
+				</p><p>
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;rules&gt;
+  &lt;rule regEx="SampleRegex" matchStrategy="matchAll" 
+      matchType="uima.SentenceAnnotation"/&gt;
+&lt;/rules&gt;
+</em></span></pre><p>
+				</p><p>
+				  In the output section, all of the annotation types and features created by 
+				  the RegexAnnotator have to be specified. These have to match the 
+				  output types and features declared in the <code class="code">&lt;annotation&gt;</code> elements of the concept file.
+				  For example the <code class="code">org.apache.uima.TestAnnot</code> annotation and the 
+				  <code class="code">org.apache.uima.TestAnnot:testFeature</code> feature used below have to
+				  be added to the output capability section in the RegexAnnotator descriptor. 
+				</p><p>
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;createAnnotations&gt;
+  &lt;annotation type="org.apache.uima.TestAnnot"&gt;
+    &lt;begin group="0"/&gt;
+    &lt;end group="0"/&gt;
+    &lt;setFeature name="testFeature" type="String"&gt;$0&lt;/setFeature&gt;
+  &lt;/annotation&gt;
+&lt;/createAnnotations&gt;
+</em></span></pre><p>
+				</p><p>
+				  If there are any language-dependent rules in the concept file, the language abbreviations 
+				  have to be specified in the <code class="code">&lt;languagesSupported&gt;</code> element. If there are no 
+				  language-dependent rules, you can specify <code class="code">x-unspecified</code> as the language, which means
+				  that the annotator can work on all languages.   
+				</p><p>
+				  For the short examples used above, the capabilities section in the RegexAnnotator 
+				  descriptor looks like this:
+				</p><p>
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;capabilities&gt;
+  &lt;capability&gt;
+    &lt;inputs&gt;
+      &lt;type&gt;uima.SentenceAnnotation&lt;/type&gt;
+    &lt;/inputs&gt;
+    &lt;outputs&gt;
+      &lt;type&gt;org.apache.uima.TestAnnot&lt;/type&gt;
+      &lt;feature&gt;org.apache.uima.TestAnnot:testFeature&lt;/feature&gt;
+    &lt;/outputs&gt;
+    &lt;languagesSupported&gt;
+      &lt;language&gt;x-unspecified&lt;/language&gt;
+    &lt;/languagesSupported&gt;
+  &lt;/capability&gt;
+&lt;/capabilities&gt;
+</em></span></pre><p>
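+				  If the concept file did contain language-dependent rules, the 
+				  <code class="code">&lt;languagesSupported&gt;</code> element would list the corresponding 
+				  language abbreviations instead. A minimal sketch, assuming rules for English and German 
+				  (the codes <code class="code">en</code> and <code class="code">de</code> are illustrative only):
+				</p><pre class="programlisting"><span class="emphasis"><em>&lt;languagesSupported&gt;
+  &lt;language&gt;en&lt;/language&gt;
+  &lt;language&gt;de&lt;/language&gt;
+&lt;/languagesSupported&gt;
+</em></span></pre><p>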
+				</p></div></div><div class="appendix" lang="en" id="sandbox.regexAnnotator.xsd"><div class="titlepage"><div><div><h2 class="title"><a name="sandbox.regexAnnotator.xsd"></a>Appendix&nbsp;A.&nbsp;Concept File Schema</h2></div></div></div><p>The concept file schema that is used to define the concept file looks like:
+			</p><p>
+				</p><pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
+&lt;xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
+   targetNamespace="http://incubator.apache.org/uima/regex"
+   xmlns="http://incubator.apache.org/uima/regex" 
+   elementFormDefault="qualified"&gt;
+	&lt;!--
+		* Licensed to the Apache Software Foundation (ASF) under one
+		* or more contributor license agreements.  See the NOTICE file
+		* distributed with this work for additional information
+		* regarding copyright ownership.  The ASF licenses this file
+		* to you under the Apache License, Version 2.0 (the
+		* "License"); you may not use this file except in compliance
+		* with the License.  You may obtain a copy of the License at
+		* 
+		*   http://www.apache.org/licenses/LICENSE-2.0
+		* 
+		* Unless required by applicable law or agreed to in writing,
+		* software distributed under the License is distributed on an
+		* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+		* KIND, either express or implied.  See the License for the
+		* specific language governing permissions and limitations
+		* under the License.
+	--&gt;
+
+  &lt;xs:element name="conceptSet"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:sequence&gt;
+		&lt;xs:element ref="concept" minOccurs="0"	maxOccurs="unbounded"/&gt;
+	  &lt;/xs:sequence&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="concept"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:sequence&gt;
+		&lt;xs:element ref="rules" minOccurs="1" maxOccurs="1"/&gt;
+		&lt;xs:element ref="createAnnotations" minOccurs="1" maxOccurs="1"/&gt;
+	  &lt;/xs:sequence&gt;
+	  &lt;xs:attribute name="name" type="xs:string" use="optional"/&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="createAnnotations"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:sequence&gt;
+		&lt;xs:element ref="annotation" minOccurs="1" maxOccurs="unbounded"/&gt;
+	  &lt;/xs:sequence&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="rules"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:sequence&gt;
+		&lt;xs:element ref="rule" minOccurs="1" maxOccurs="unbounded"/&gt;
+	  &lt;/xs:sequence&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="rule"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:all&gt;
+		&lt;xs:element ref="matchTypeFilter" minOccurs="0"	maxOccurs="1"/&gt;
+		&lt;xs:element ref="updateMatchTypeAnnotation" minOccurs="0" maxOccurs="1"/&gt;
+		&lt;xs:element ref="ruleExceptions" minOccurs="0" maxOccurs="1"/&gt;
+	  &lt;/xs:all&gt;
+	  &lt;xs:attribute name="regEx" type="xs:string" use="required"/&gt;
+	  &lt;xs:attribute name="matchStrategy" use="required"&gt;
+	    &lt;xs:simpleType&gt;
+		  &lt;xs:restriction base="xs:string"&gt;
+		    &lt;xs:enumeration value="matchFirst"/&gt;
+			&lt;xs:enumeration value="matchAll"/&gt;
+			&lt;xs:enumeration value="matchComplete"/&gt;
+		  &lt;/xs:restriction&gt;
+		&lt;/xs:simpleType&gt;
+	  &lt;/xs:attribute&gt;
+	  &lt;xs:attribute name="matchType" type="xs:string" use="required"/&gt;
+	  &lt;xs:attribute name="featurePath" type="xs:string" use="optional" /&gt;
+	  &lt;xs:attribute name="ruleId" type="xs:string" use="optional"/&gt;
+	  &lt;xs:attribute name="confidence" type="xs:decimal"	use="optional"/&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="matchTypeFilter"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:sequence&gt;
+		&lt;xs:element ref="feature" minOccurs="0"	maxOccurs="unbounded"/&gt;
+	  &lt;/xs:sequence&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="ruleExceptions"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:sequence&gt;
+	    &lt;xs:element ref="exception" minOccurs="0" maxOccurs="unbounded"/&gt;
+	  &lt;/xs:sequence&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="exception"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:simpleContent&gt;
+		&lt;xs:extension base="xs:string"&gt;
+		  &lt;xs:attribute name="matchType" type="xs:string" use="required"/&gt;
+		&lt;/xs:extension&gt;
+	  &lt;/xs:simpleContent&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="feature"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:simpleContent&gt;
+		&lt;xs:extension base="xs:string"&gt;
+		  &lt;xs:attribute name="featurePath" type="xs:string" use="required"/&gt;
+		&lt;/xs:extension&gt;
+	  &lt;/xs:simpleContent&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="annotation"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:sequence&gt;
+		&lt;xs:element ref="begin" minOccurs="1" maxOccurs="1"/&gt;
+		&lt;xs:element ref="end" minOccurs="1" maxOccurs="1"/&gt;
+		&lt;xs:element ref="setFeature" minOccurs="0" maxOccurs="unbounded"/&gt;
+	  &lt;/xs:sequence&gt;
+	  &lt;xs:attribute name="id" type="xs:string" use="optional"/&gt;
+	  &lt;xs:attribute name="type" type="xs:string" use="required"/&gt;
+	  &lt;xs:attribute name="validate" type="xs:string" use="optional" /&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="updateMatchTypeAnnotation"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:sequence&gt;
+	    &lt;xs:element ref="setFeature" minOccurs="0" maxOccurs="unbounded"/&gt;
+	  &lt;/xs:sequence&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="begin"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:attribute name="group" use="required" type="xs:integer"/&gt;
+	  &lt;xs:attribute name="location" use="optional" default="start"&gt;
+	    &lt;xs:simpleType&gt;
+	      &lt;xs:restriction base="xs:string"&gt;
+		    &lt;xs:enumeration value="start"/&gt;
+		    &lt;xs:enumeration value="end"/&gt;
+		  &lt;/xs:restriction&gt;
+	    &lt;/xs:simpleType&gt;
+	  &lt;/xs:attribute&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="end"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:attribute name="group" use="required" type="xs:integer"/&gt;
+	  &lt;xs:attribute name="location" use="optional" default="end"&gt;
+		&lt;xs:simpleType&gt;
+		  &lt;xs:restriction base="xs:string"&gt;
+		    &lt;xs:enumeration value="start"/&gt;
+			&lt;xs:enumeration value="end"/&gt;
+		  &lt;/xs:restriction&gt;
+		&lt;/xs:simpleType&gt;
+	  &lt;/xs:attribute&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+
+  &lt;xs:element name="setFeature"&gt;
+	&lt;xs:complexType&gt;
+	  &lt;xs:simpleContent&gt;
+		&lt;xs:extension base="xs:string"&gt;
+		  &lt;xs:attribute name="name" type="xs:string" use="required"/&gt;
+		  &lt;xs:attribute name="type" use="required"&gt;
+		    &lt;xs:simpleType&gt;
+			  &lt;xs:restriction base="xs:string"&gt;
+			    &lt;xs:enumeration value="String"/&gt;
+				&lt;xs:enumeration value="Integer"/&gt;
+				&lt;xs:enumeration value="Float"/&gt;
+				&lt;xs:enumeration value="Reference"/&gt;
+				&lt;xs:enumeration value="Confidence"/&gt;
+				&lt;xs:enumeration value="RuleId"/&gt;
+			  &lt;/xs:restriction&gt;
+			&lt;/xs:simpleType&gt;
+		  &lt;/xs:attribute&gt;
+		  &lt;xs:attribute name="normalization" use="optional"&gt;
+		    &lt;xs:simpleType&gt;
+			  &lt;xs:restriction base="xs:string"&gt;
+			    &lt;xs:enumeration value="Custom" /&gt;
+				&lt;xs:enumeration value="ToLowerCase" /&gt;
+				&lt;xs:enumeration value="ToUpperCase" /&gt;
+				&lt;xs:enumeration value="Trim" /&gt;
+			  &lt;/xs:restriction&gt;
+			&lt;/xs:simpleType&gt;
+		  &lt;/xs:attribute&gt;
+		  &lt;xs:attribute name="class" type="xs:string" use="optional" /&gt;
+		&lt;/xs:extension&gt;
+	  &lt;/xs:simpleContent&gt;
+	&lt;/xs:complexType&gt;
+  &lt;/xs:element&gt;
+&lt;/xs:schema&gt;
+</pre><p>
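+			  As an illustration, a minimal concept file that should conform to this schema can be 
+			  assembled from the rule and annotation fragments used in Section&nbsp;3.2. This is only 
+			  a sketch: the concept name <code class="code">sampleConcept</code> is a placeholder, and 
+			  only the required elements and attributes are shown.
+			</p><pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
+&lt;conceptSet xmlns="http://incubator.apache.org/uima/regex"&gt;
+  &lt;concept name="sampleConcept"&gt;
+    &lt;rules&gt;
+      &lt;rule regEx="SampleRegex" matchStrategy="matchAll" 
+          matchType="uima.SentenceAnnotation"/&gt;
+    &lt;/rules&gt;
+    &lt;createAnnotations&gt;
+      &lt;annotation type="org.apache.uima.TestAnnot"&gt;
+        &lt;begin group="0"/&gt;
+        &lt;end group="0"/&gt;
+        &lt;setFeature name="testFeature" type="String"&gt;$0&lt;/setFeature&gt;
+      &lt;/annotation&gt;
+    &lt;/createAnnotations&gt;
+  &lt;/concept&gt;
+&lt;/conceptSet&gt;
+</pre><p>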
+			  
+			</p></div><div class="appendix" lang="en" id="sandbox.regexAnnotator.Validation"><div class="titlepage"><div><div><h2 class="title"><a name="sandbox.regexAnnotator.Validation"></a>Appendix&nbsp;B.&nbsp;Validation Interface</h2></div></div></div><p>
+		</p><pre class="programlisting">/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.uima.annotator.regex.extension;
+
+
+/**
+ * The Validation interface is provided to implement a custom validator
+ * that can be used to validate regular expression matches before 
+ * they are added as annotations. 
+ */
+public interface Validation {
+	
+/**
+ * The validate method validates the covered text of an annotation and 
+ * returns true if the annotation is correct and false otherwise. 
+ * The validate method is called between a rule match and the 
+ * annotation creation. The annotation is only created if the method
+ * returns true. 
+ * 
+ * @param coveredText covered text of the annotation that should be 
+ *        validated 
+ * @param ruleID ruleID of the rule which created the match
+ * 
+ * @return true if the annotation is valid or false if the annotation 
+ *         is invalid
+ * 
+ * @throws Exception throws an exception if a validation error occurred
+ */
+public boolean validate(String coveredText, String ruleID) 
+   throws Exception;
+
+}</pre><p>
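+		An implementation of this interface is referenced from the concept file. The schema in 
+		Appendix&nbsp;A defines an optional <code class="code">validate</code> attribute on the 
+		<code class="code">&lt;annotation&gt;</code> element. The sketch below assumes that this attribute 
+		takes the fully qualified class name of the implementation, in the same way as the 
+		<code class="code">class</code> attribute does for a custom normalization. The class name 
+		<code class="code">org.apache.uima.CustomValidator</code> is a placeholder.
+		</p><pre class="programlisting"><span class="emphasis"><em>&lt;annotation type="org.apache.uima.TestAnnot" 
+    validate="org.apache.uima.CustomValidator"&gt;
+  &lt;begin group="0"/&gt;
+  &lt;end group="0"/&gt;
+&lt;/annotation&gt;</em></span></pre><p>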
+	</p></div><div class="appendix" lang="en" id="sandbox.regexAnnotator.Normalization"><div class="titlepage"><div><div><h2 class="title"><a name="sandbox.regexAnnotator.Normalization"></a>Appendix&nbsp;C.&nbsp;Normalization Interface</h2></div></div></div><p>
+		</p><pre class="programlisting">/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.uima.annotator.regex.extension;
+
+
+/**
+ * The Normalization interface is provided to implement a custom normalization
+ * for feature values before they are assigned to an annotation. 
+ */
+public interface Normalization {
+	
+/**
+ * Custom feature value normalization. This interface must be implemented 
+ * to perform a custom normalization on the given input string.
+ * 
+ * @param input input string which should be normalized
+ *
+ * @param ruleID rule ID of the matching rule
+ * 
+ * @return String - normalized input string 
+ */
+public String normalize(String input, String ruleID) throws Exception;
+}</pre><p>
+	</p></div></div></body></html>
\ No newline at end of file

Added: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/css/stylesheet-html.css
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/css/stylesheet-html.css?rev=817775&view=auto
==============================================================================
--- incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/css/stylesheet-html.css (added)
+++ incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/RegexAnnotatorUserGuide/css/stylesheet-html.css Tue Sep 22 19:07:01 2009
@@ -0,0 +1,302 @@
+/*
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+*/
+
+html {
+    padding:        0pt;
+    margin:         0pt;
+}
+
+body {
+    margin-top:     1em;
+    margin-bottom:  1em;
+    margin-left:    16%;
+    margin-right:   8%;
+    font-size: 10.5pt;
+    line-height: 1.3;
+    font-family:     "Palatino Linotype", "Times New Roman", Times, serif;
+}
+
+div {
+    margin:         0pt;
+}
+
+p {
+    text-align:     left;
+    margin-bottom:  .6em;
+    line-height:    1.4;
+}
+
+td {   line-height:    1.2;
+       padding: .3em;
+       }
+
+hr {
+    margin-top:     .6em;
+    margin-bottom:  .6em;
+    margin-left:    0pt;
+    margin-right:   0pt;
+    border:         1px solid gray;
+    background:     gray;
+}
+
+h2,h3,h4,h5 {
+  margin: 0 0 0.5em 0;
+  page-break-after: avoid;
+  font-family: Helvetica, Arial, sans-serif;
+  font-weight: bold;
+    color:          #525D76;
+}
+
+h2 {
+  margin-left: -10%; }
+
+h2, h3, h4 { margin-top: 1em; }
+
+/* later rules of same specificity override earlier ones */
+/* can't use ">" because IE doesn't recognize it */
+
+div.chapter div.titlepage h2.title {
+  margin-bottom: 1.5em;
+  font-size: 1.6em;
+  letter-spacing: -0.07ex;
+  border-top:solid black 2.25pt;
+}
+
+/* this one has the same specificity but comes after, so it takes precedence */
+
+div.section div.titlepage h2.title {  /* h2 */
+  font-size: 1.3em;
+  border-top:solid black 1.00pt;
+}
+
+h3 {
+  margin-left: -5%;
+  font-size: 1.2em;
+  border-top:solid black .75pt;
+}
+
+div.note h3, div.tip h3 {
+    margin-left: 0;
+    font-size: 1.2em;
+    border-top: none;
+    margin-top: 0em;
+}
+
+h4 {
+  font-size: 1.1em;
+}
+
+a {
+   text-decoration: underline;
+   /*color: black;*/
+}
+
+a:hover {
+   text-decoration: underline;
+   color: black;
+}
+
+h3,h4,h5 {
+    line-height:    1.3;
+    margin-top:     1.5em;
+    font-family:    Arial, Sans-serif;
+}
+
+h1.title {
+    text-align:     left;
+
+    margin-top:     2em;
+    margin-bottom:  2em;
+    margin-left:    0pt;
+    margin-right:   0pt;
+}
+
+h2.subtitle, h3.subtitle {
+    text-align:     left;
+    margin-top:     2em;
+    margin-bottom:  2em;
+    text-transform: uppercase;
+}
+
+h3.author, p.othercredit {
+    font-size:      0.9em;
+    font-weight:    normal;
+    font-style:     oblique;
+    text-align:     left;
+    color:          #525D76;
+}
+
+td.tableSubhead {
+    font-weight:    bold;
+    background-color: silver;
+}
+
+div.titlepage {
+}
+
+div.section {
+}
+
+
+div.authorgroup
+{
+    text-align:     left;
+    margin-bottom:  3em;
+    display:        block;
+}
+
+div.toc, div.list-of-examples, div.list-of-figures {
+
+    margin-bottom:  3em;
+}
+
+
+div.itemizedlist {
+    margin-top:     0.5em;
+    margin-bottom:  0.5em;
+}
+
+ol,ul {
+}
+
+li {
+}
+
+pre {
+    margin:         .75em 0;
+    line-height:    1.25;
+    color:          black;
+}
+
+pre.programlisting {
+    font-size:      9pt;
+    padding:        5pt 2pt;
+    border:         1pt solid black;
+    background:     #eeeeee;
+}
+
+div.table {
+    margin:         1em;
+    padding:        0.5em;
+    text-align:     center;
+}
+
+div.table table {
+ /*    display:        block; */   /* in firefox, breaks centering */
+    margin-left: auto;  /* see http://theodorakis.net/tablecentertest.html */
+    margin-right: auto;
+}
+
+div.table td {
+    padding-right:  5px;
+    padding-left:   5px;
+}
+
+div.table p.title {
+    text-align:     center;
+    margin-left:    5%;
+    margin-right:   5%;
+}
+
+p.releaseinfo, .copyright {
+    font-size:      0.9em;
+    text-align:     left;
+    margin:         0px;
+    padding:        0px;
+}
+
+div.note, div.important, div.example, div.informalexample, div.tip, div.caution {
+    margin:         1em;
+    padding:        0.5em;
+    border:         1px solid gray;
+    background-color: #f8f8e0;
+}
+
+div.important th, div.note th, div.tip th {
+    text-align:     left;
+    border-bottom:  solid 1px gray;
+}
+
+div.navheader, div.navheader table {
+    font-family:    sans-serif;
+    font-size:      12px;
+}
+
+div.navfooter, div.navfooter table {
+    font-family:    sans-serif;
+    font-size:      12px;
+}
+
+div.figure, div.screenshot {
+    text-align:     center;  /* needed for ms5 */
+    margin-top:     1em;
+    margin-bottom:  1em;
+}
+
+div.figure table, div.screenshot table {    /* see http://theodorakis.net/tablecentertest.html */
+    margin-left: auto;
+    margin-right: auto;
+}
+
+div.figure p.title {
+    text-align:     center; 
+    margin-left:    15%;
+    margin-right:   15%;
+}
+
+div.example p.title {
+    margin-top:     0em;
+    margin-bottom:  0.6em;
+    text-align:     left;
+    padding-bottom: 0.4em;
+    border-bottom:  solid 1px gray;
+}
+
+div.figure img {
+    border:         1px solid gray;
+    padding:        0.5em;
+    margin:         0.5em;
+}
+
+div.revhistory {
+    font-size:      0.8em;
+    width:          90%;
+    margin-left:    5%;
+    margin-top:     3em;
+    margin-bottom:  3em;
+}
+
+div.revhistory table {
+    font-family:    sans-serif;
+    font-size:      12px;
+  border-collapse: collapse;
+}
+
+div.revhistory table tr {
+  border:         solid 1px gray;
+}
+
+div.revhistory table th {
+  border: none;
+}
+
+span.bold-italic {
+  font-weight: bold;
+  font-style: italic;
+}

Added: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/blank.png
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/blank.png?rev=817775&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/blank.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.gif
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.gif?rev=817775&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.gif
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.png
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.png?rev=817775&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/1.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/10.gif
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docs/html/images/callouts/10.gif?rev=817775&view=auto
==============================================================================
Binary file - no diff available.