Posted to commits@uima.apache.org by mb...@apache.org on 2007/09/14 17:12:12 UTC

svn commit: r575717 - /incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docbook/RegexAnnotatorUserGuide/regexAnnotatorUserGuide.xml

Author: mbaessler
Date: Fri Sep 14 08:12:12 2007
New Revision: 575717

URL: http://svn.apache.org/viewvc?rev=575717&view=rev
Log:
UIMA-555

update RegexAnnotator documentation

https://issues.apache.org/jira/browse/UIMA-555

Modified:
    incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docbook/RegexAnnotatorUserGuide/regexAnnotatorUserGuide.xml

Modified: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docbook/RegexAnnotatorUserGuide/regexAnnotatorUserGuide.xml
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docbook/RegexAnnotatorUserGuide/regexAnnotatorUserGuide.xml?rev=575717&r1=575716&r2=575717&view=diff
==============================================================================
--- incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docbook/RegexAnnotatorUserGuide/regexAnnotatorUserGuide.xml (original)
+++ incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docbook/RegexAnnotatorUserGuide/regexAnnotatorUserGuide.xml Fri Sep 14 08:12:12 2007
@@ -26,51 +26,67 @@
 
 <book lang="en">
 
-<title>Apache UIMA RegexAnnotator Documentation</title>
+	<title>
+		Apache UIMA Regular Expression Annotator Documentation
+	</title>
 
-<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../../SandboxDocs/src/docbook/book_info.xml"/>	
+	<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
+		href="../../../SandboxDocs/src/docbook/book_info.xml" />
 
-<chapter id="sandbox.regexAnnotator">
-	<title>Regular Expression Annotator</title>
-
-	<para>
-		The Regular Expression Annotator (RegexAnnotator) is an Apache
-		UIMA analysis engine that detects entities based on regular
-		expressions or concepts. A regular expression describe precise
-		patterns that are looked for in the document text. A concepts in
-		the current sense is a set of regular expressions that work
-		together to detect a more complex entity. The defined regular
-		expressions or concepts are used to detect entities like
-		numbers, email addresses or URLs and create annotations for
-		them.
-	</para>
+	<preface>
+		<title>Introduction</title>
+		<para>
+			The Regular Expression Annotator (RegexAnnotator) is an
+			Apache UIMA analysis engine that detects entities such as
+			email addresses, URLs, phone numbers, zip codes or any other
+			entity that can be specified using a regular expression. For
+			each detected entity, a new annotation can be created or
+			an already existing annotation can be updated with new
+			features.
+
+			To detect more difficult and complex entities as well, the
+			annotator provides advanced filter capabilities and a rule
+			definition syntax that can combine rules into a concept,
+			with a confidence value for each of the concept's rules.
+		</para>
+	</preface>
 
-	<section id="sandbox.regexAnnotator.processingOverview">
+	<chapter id="sandbox.regexAnnotator.processingOverview">
 		<title>Processing Overview</title>
 		<para>
-			To detect entities the RegexAnnotator must be configured
-			using an external XML file. We call this file concepts file
-			since it contains the concepts and regular expression rules
-			that the annotator use to detect the entities. This
-			configuration contains additional to the rules and concepts
-			also the annotations that should be created if an entity was
-			found in the document text. The types and features used to
-			create the annotations must be defined in the UIMA type
-			system.
+			To detect any kind of entity the RegexAnnotator must be
+			configured using an external XML file. We call this file the
+			"concepts file" since it contains the regular expressions
+			and concepts that the annotator uses during its processing to
+			detect entities. In addition to the rules, the concept file
+			also defines the result processing that is done when an
+			entity is detected. The result processing can be the
+			creation of new annotations or an update of an existing
+			annotation with additional features. The types and features
+			used to create new annotations must be defined in the UIMA
+			type system.
 		</para>
 		<para>
-			After the configuration is done, the RegexAnnotator is ready
-			to use. During is initialization is reads the concepts file
-			and checks if all rules and concepts are valid and if all
-			annotations types are defined. If no error exists the
-			processing can start. During the processing the rules are
-			processed in the same order as defined in the concepts xml
-			document. The results of a preceding rule can be used for
-			the following one.
+			After the concept file is created, the annotator XML
+			descriptor must be updated with the capabilities and type
+			system information from the concept file. This update is
+			necessary so that the UIMA framework can also call the
+			annotator in complex annotator flows when it is assembled
+			with other annotators into an analysis bundle.
 		</para>
-	</section>
-
-	<section id="sandbox.regexAnnotator.conceptsFile">
+		<para>
+			Now the RegexAnnotator is ready to use. During its
+			initialization the annotator reads the concept file and
+			checks whether all rules and concepts are valid and whether
+			all annotation types are defined in the UIMA type system. If
+			no error occurs, document processing can start. For each
+			processed document the rules are executed in the same order
+			as defined in the concept file. The results and annotations
+			created for a preceding rule are available to the following
+			one.
+		</para>
+	</chapter>
+	<chapter id="sandbox.regexAnnotator.conceptsFile">
 		<title>Concepts Configuration File</title>
 		<para>
 			The RegexAnnotator can be configured using two levels of
@@ -691,7 +707,8 @@
 				</section>
 			</section>			
 		</section>
-		<section id="sandbox.regexAnnotator.annotatorDescriptor">
+</chapter>
+<chapter id="sandbox.regexAnnotator.annotatorDescriptor">
 			<title>Annotator Descriptor</title>
 			<para>The RegexAnnotator analysis engine descriptor contains some processing information about 
 			the annotator. These processing information are specified as parameters and external resource dependencies. 
@@ -811,8 +828,8 @@
 ]]></programlisting>
 				</para>
 			</section>
-		</section>
-		<section id="sandbox.regexAnnotator.xsd">
+</chapter>
+<appendix id="sandbox.regexAnnotator.xsd">
 			<title>Concept File Schema</title>
 			<para>The concept file schema looks like:
 			</para>
@@ -1019,9 +1036,7 @@
 ]]></programlisting>
 			  
 			</para>
-		</section>
-	</section>
 
-</chapter>
+</appendix>
 
 </book>