Posted to commits@uima.apache.org by mb...@apache.org on 2007/09/11 17:50:19 UTC
svn commit: r574632 - /incubator/uima/sandbox/trunk/SandboxDocs/src/docbook/users_guide_and_reference/regexAnnotator.xml
Author: mbaessler
Date: Tue Sep 11 08:50:18 2007
New Revision: 574632
URL: http://svn.apache.org/viewvc?rev=574632&view=rev
Log:
UIMA-555
update RegexAnnotator documentation
https://issues.apache.org/jira/browse/UIMA-555
Modified:
incubator/uima/sandbox/trunk/SandboxDocs/src/docbook/users_guide_and_reference/regexAnnotator.xml
Modified: incubator/uima/sandbox/trunk/SandboxDocs/src/docbook/users_guide_and_reference/regexAnnotator.xml
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/SandboxDocs/src/docbook/users_guide_and_reference/regexAnnotator.xml?rev=574632&r1=574631&r2=574632&view=diff
==============================================================================
--- incubator/uima/sandbox/trunk/SandboxDocs/src/docbook/users_guide_and_reference/regexAnnotator.xml (original)
+++ incubator/uima/sandbox/trunk/SandboxDocs/src/docbook/users_guide_and_reference/regexAnnotator.xml Tue Sep 11 08:50:18 2007
@@ -680,8 +680,336 @@
</para>
</section>
- </section>
+ </section>
+ </section>
+ <section id="sandbox.regexAnnotator.annotatorDescriptor">
+ <title>Annotator Descriptor</title>
+ <para>The RegexAnnotator analysis engine descriptor contains processing information
+ for the annotator. This information is specified as configuration parameters and
+ external resource dependencies. This section describes the descriptor settings in detail.
+ </para>
+ <section id="sandbox.regexAnnotator.annotatorDescriptor.configParam">
+ <title>Configuration Parameters</title>
+ <para>
+ The following configuration parameter of the RegexAnnotator affects how the concept rules are processed:
+ </para>
+ <para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ <code>ProcessAllConceptRules</code>
+ - If this parameter is set to true, all rules of a concept are processed.
+ If it is set to false, the rules of a concept are processed in order of confidence
+ (highest confidence value first), and processing stops after the first
+ rule that produces matches.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
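+ <para>
+ A sketch of how this parameter could be declared and set in the descriptor is shown
+ below, using the standard UIMA <code>&lt;configurationParameters&gt;</code> and
+ <code>&lt;configurationParameterSettings&gt;</code> elements. The description text, the
+ <code>mandatory</code> flag, and the value <code>false</code> are assumptions and may
+ differ from the descriptor distributed with the RegexAnnotator.
+ </para>
+ <para>
+ <programlisting><![CDATA[
+<configurationParameters>
+  <configurationParameter>
+    <name>ProcessAllConceptRules</name>
+    <!-- illustrative description, not taken from the distributed descriptor -->
+    <description>Process all rules of a concept</description>
+    <type>Boolean</type>
+    <multiValued>false</multiValued>
+    <mandatory>true</mandatory>
+  </configurationParameter>
+</configurationParameters>
+<configurationParameterSettings>
+  <nameValuePair>
+    <name>ProcessAllConceptRules</name>
+    <value>
+      <!-- assumed setting: stop after the first rule (by confidence) with matches -->
+      <boolean>false</boolean>
+    </value>
+  </nameValuePair>
+</configurationParameterSettings>
+]]></programlisting>
+ </para>
+ <para>
+ Following the behavior described above, with this setting a concept that has one rule with
+ <code>confidence="1.0"</code> and one with <code>confidence="0.5"</code> would evaluate
+ the second rule only if the first one produces no matches.
+ </para>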
+ </section>
+ <section id="sandbox.regexAnnotator.annotatorDescriptor.externalResource">
+ <title>External Resources</title>
+ <para>
+ The concept file, which contains all the concepts and rules the
+ RegexAnnotator should process, is specified through an external resource binding.
+ The relevant section of the descriptor, where the external resource
+ is declared, is shown below.
+ </para>
+ <para>
+ <programlisting><![CDATA[
+<externalResources>
+ <externalResource>
+ <name>RegexConceptsFile</name>
+ <description>Regex Concepts file</description>
+ <fileResourceSpecifier>
+ <fileUrl>file:concepts.xml</fileUrl>
+ </fileResourceSpecifier>
+ <implementationName>org.apache.uima.annotator.regex.impl.FileResource_impl</implementationName>
+ </externalResource>
+</externalResources>
+]]></programlisting>
+ </para>
+ <para>
+ The <code>&lt;fileUrl&gt;</code> element contains the file URL of the concept file.
+ The file referenced by this URL must be available in the UIMA datapath or in the classpath.
+ </para>
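+ <para>
+ The declared resource is connected to the annotator through an
+ <code>&lt;externalResourceDependency&gt;</code> and an
+ <code>&lt;externalResourceBinding&gt;</code>. The sketch below shows both, using standard UIMA
+ descriptor elements. The dependency key <code>RegexConceptsFile</code> and the relative URL
+ <code>file:concepts/myConcepts.xml</code> are hypothetical; check the key that the
+ RegexAnnotator implementation actually looks up before reusing this fragment.
+ </para>
+ <para>
+ <programlisting><![CDATA[
+<!-- direct child of <analysisEngineDescription>, after <analysisEngineMetaData> -->
+<externalResourceDependencies>
+  <externalResourceDependency>
+    <!-- assumed key: must equal the key the annotator code looks up -->
+    <key>RegexConceptsFile</key>
+    <description>Regex Concepts file</description>
+    <optional>false</optional>
+  </externalResourceDependency>
+</externalResourceDependencies>
+<resourceManagerConfiguration>
+  <externalResources>
+    <externalResource>
+      <name>RegexConceptsFile</name>
+      <description>Regex Concepts file</description>
+      <fileResourceSpecifier>
+        <!-- hypothetical relative URL, looked up in the UIMA datapath or classpath -->
+        <fileUrl>file:concepts/myConcepts.xml</fileUrl>
+      </fileResourceSpecifier>
+      <implementationName>org.apache.uima.annotator.regex.impl.FileResource_impl</implementationName>
+    </externalResource>
+  </externalResources>
+  <externalResourceBindings>
+    <!-- binds the dependency key to the resource declared above -->
+    <externalResourceBinding>
+      <key>RegexConceptsFile</key>
+      <resourceName>RegexConceptsFile</resourceName>
+    </externalResourceBinding>
+  </externalResourceBindings>
+</resourceManagerConfiguration>
+]]></programlisting>
+ </para>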
+ </section>
+ <section id="sandbox.regexAnnotator.annotatorDescriptor.capabilities">
+ <title>Capabilities</title>
+ <para>
+ The capabilities section of the RegexAnnotator descriptor must define the input and output
+ capabilities and the supported languages.
+ </para>
+ <para>
+ The input capabilities defined
+ in the descriptor must include all match types used in the rules of the concept file.
+ For example, the <code>uima.SentenceAnnotation</code> type used in the rule
+ below must be added to the input capabilities section of the RegexAnnotator descriptor.
+ </para>
+ <para>
+ <programlisting><![CDATA[
+<rules>
+ <rule regEx="RestRegex" matchStrategy="matchAll" matchType="uima.SentenceAnnotation"/>
+</rules>
+]]></programlisting>
+ </para>
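+ <para>
+ As a further sketch, a rule whose match type is the built-in
+ <code>uima.tcas.DocumentAnnotation</code> type would require that type as an input
+ capability. The regular expression is a placeholder, and the two fragments belong to
+ different files: the first to the concept file, the second to the descriptor.
+ </para>
+ <para>
+ <programlisting><![CDATA[
+<!-- concept file: rule using DocumentAnnotation as match type -->
+<rules>
+  <rule regEx="[0-9]+" matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"/>
+</rules>
+
+<!-- RegexAnnotator descriptor: the same type declared as input -->
+<inputs>
+  <type>uima.tcas.DocumentAnnotation</type>
+</inputs>
+]]></programlisting>
+ </para>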
+ <para>
+ In the output section, all annotation types and features created by
+ the RegexAnnotator must be specified. They must match the
+ types and features declared in the <code>&lt;annotation&gt;</code> elements of the concept file.
+ For example, the <code>org.apache.uima.TestAnnot</code> annotation type and the
+ <code>org.apache.uima.TestAnnot:testFeature</code> feature used below must
+ be added to the output capabilities section of the RegexAnnotator descriptor.
+ </para>
+ <para>
+ <programlisting><![CDATA[
+<createAnnotations>
+ <annotation id="testannotation" type="org.apache.uima.TestAnnot">
+ <begin group="0"/>
+ <end group="0"/>
+ <setFeature name="testFeature" type="String">$0</setFeature>
+ </annotation>
+</createAnnotations>
+]]></programlisting>
+ </para>
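+ <para>
+ When an annotation sets several features, each of them needs its own
+ <code>&lt;feature&gt;</code> entry in the output section. In the sketch below, the type
+ <code>org.example.Number</code> and its features <code>value</code> and <code>text</code>
+ are hypothetical, and <code>$1</code> is assumed to refer to the first capturing group of
+ the rule's regular expression, analogous to <code>$0</code> above.
+ </para>
+ <para>
+ <programlisting><![CDATA[
+<!-- concept file -->
+<createAnnotations>
+  <annotation id="numberAnnot" type="org.example.Number">
+    <begin group="1"/>
+    <end group="1"/>
+    <setFeature name="value" type="Integer">$1</setFeature>
+    <setFeature name="text" type="String">$0</setFeature>
+  </annotation>
+</createAnnotations>
+
+<!-- RegexAnnotator descriptor: one <feature> entry per feature set above -->
+<outputs>
+  <type>org.example.Number</type>
+  <feature>org.example.Number:value</feature>
+  <feature>org.example.Number:text</feature>
+</outputs>
+]]></programlisting>
+ </para>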
+ <para>
+ If the concept file contains language-dependent rules, the abbreviations of the supported languages
+ must be specified in the <code>&lt;languagesSupported&gt;</code> element. If there are no
+ language-dependent rules, you can specify <code>x-unspecified</code> as the language, which means
+ that the annotator can work on documents in any language.
+ </para>
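+ <para>
+ For instance, if the concept file contained language-dependent rules for English and German
+ only, the element could list both language codes instead of <code>x-unspecified</code>.
+ The languages chosen here are purely illustrative.
+ </para>
+ <para>
+ <programlisting><![CDATA[
+<languagesSupported>
+  <language>en</language>
+  <language>de</language>
+</languagesSupported>
+]]></programlisting>
+ </para>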
+ <para>
+ For the short examples above, the capabilities section of the RegexAnnotator
+ descriptor looks like this:
+ </para>
+ <para>
+ <programlisting><![CDATA[
+<capabilities>
+ <capability>
+ <inputs>
+ <type>uima.SentenceAnnotation</type>
+ </inputs>
+ <outputs>
+ <type>org.apache.uima.TestAnnot</type>
+ <feature>org.apache.uima.TestAnnot:testFeature</feature>
+ </outputs>
+ <languagesSupported>
+ <language>x-unspecified</language>
+ </languagesSupported>
+ </capability>
+</capabilities>
+]]></programlisting>
+ </para>
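+ <para>
+ For completeness, a minimal concept file that combines the two concept file fragments above
+ into one document might look as follows. The concept name <code>testConcept</code> is
+ hypothetical, and <code>RestRegex</code> is kept from the rule fragment as a placeholder
+ regular expression; the element nesting follows the concept file schema.
+ </para>
+ <para>
+ <programlisting><![CDATA[
+<?xml version="1.0" encoding="UTF-8"?>
+<conceptSet>
+  <concept name="testConcept">
+    <rules>
+      <!-- match type must appear in the descriptor's <inputs> -->
+      <rule regEx="RestRegex" matchStrategy="matchAll" matchType="uima.SentenceAnnotation"/>
+    </rules>
+    <createAnnotations>
+      <!-- type and feature must appear in the descriptor's <outputs> -->
+      <annotation id="testannotation" type="org.apache.uima.TestAnnot">
+        <begin group="0"/>
+        <end group="0"/>
+        <setFeature name="testFeature" type="String">$0</setFeature>
+      </annotation>
+    </createAnnotations>
+  </concept>
+</conceptSet>
+]]></programlisting>
+ </para>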
+ </section>
+ </section>
+ <section id="sandbox.regexAnnotator.xsd">
+ <title>Concept File Schema</title>
+ <para>The concept file is defined by the following XML schema:
+ </para>
+ <para>
+ <programlisting><![CDATA[
+<?xml version="1.0" encoding="UTF-8"?>
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
+ <!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ -->
+
+ <xs:element name="conceptSet">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="concept" minOccurs="1" maxOccurs="unbounded" />
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="concept">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="rules" minOccurs="1" maxOccurs="1" />
+ <xs:element ref="createAnnotations" minOccurs="1" maxOccurs="1" />
+ </xs:sequence>
+ <xs:attribute name="name" type="xs:ID" use="optional" />
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="createAnnotations">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="annotation" minOccurs="1" maxOccurs="unbounded" />
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="rules">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="rule" minOccurs="1" maxOccurs="unbounded" />
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="rule">
+ <xs:complexType>
+ <xs:all>
+ <xs:element ref="matchTypeFilter" minOccurs="0" maxOccurs="1" />
+ <xs:element ref="updateMatchTypeAnnotation" minOccurs="0" maxOccurs="1" />
+ <xs:element ref="ruleExceptions" minOccurs="0" maxOccurs="1" />
+ </xs:all>
+ <xs:attribute name="regEx" type="xs:string" use="required" />
+ <xs:attribute name="matchStrategy" use="required">
+ <xs:simpleType>
+ <xs:restriction base="xs:string">
+ <xs:enumeration value="matchFirst" />
+ <xs:enumeration value="matchAll" />
+ <xs:enumeration value="matchComplete" />
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ <xs:attribute name="matchType" type="xs:string" use="required" />
+ <xs:attribute name="ruleId" type="xs:ID" use="optional" />
+ <xs:attribute name="confidence" type="xs:decimal" use="optional" />
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="matchTypeFilter">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="feature" minOccurs="0" maxOccurs="unbounded" />
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="ruleExceptions">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="exception" minOccurs="0" maxOccurs="unbounded" />
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="exception">
+ <xs:complexType>
+ <xs:simpleContent>
+ <xs:extension base="xs:string">
+ <xs:attribute name="matchType" type="xs:string" use="required" />
+ </xs:extension>
+ </xs:simpleContent>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="feature">
+ <xs:complexType>
+ <xs:simpleContent>
+ <xs:extension base="xs:string">
+ <xs:attribute name="name" type="xs:string" use="required" />
+ </xs:extension>
+ </xs:simpleContent>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="annotation">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="begin" minOccurs="1" maxOccurs="1" />
+ <xs:element ref="end" minOccurs="1" maxOccurs="1" />
+ <xs:element ref="setFeature" minOccurs="0" maxOccurs="unbounded" />
+ </xs:sequence>
+ <xs:attribute name="id" type="xs:ID" use="required" />
+ <xs:attribute name="type" type="xs:string" use="required" />
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="updateMatchTypeAnnotation">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="setFeature" minOccurs="0" maxOccurs="unbounded" />
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="begin">
+ <xs:complexType>
+ <xs:attribute name="group" use="required">
+ <xs:simpleType>
+ <xs:restriction base="xs:integer">
+ <xs:minInclusive value="0" />
+ <xs:maxInclusive value="9" />
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ <xs:attribute name="location" use="optional" default="start">
+ <xs:simpleType>
+ <xs:restriction base="xs:string">
+ <xs:enumeration value="start" />
+ <xs:enumeration value="end" />
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="end">
+ <xs:complexType>
+ <xs:attribute name="group" use="required">
+ <xs:simpleType>
+ <xs:restriction base="xs:integer">
+ <xs:minInclusive value="0" />
+ <xs:maxInclusive value="9" />
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ <xs:attribute name="location" use="optional" default="end">
+ <xs:simpleType>
+ <xs:restriction base="xs:string">
+ <xs:enumeration value="start" />
+ <xs:enumeration value="end" />
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:element name="setFeature">
+ <xs:complexType>
+ <xs:simpleContent>
+ <xs:extension base="xs:string">
+ <xs:attribute name="name" type="xs:string" use="required" />
+ <xs:attribute name="type" use="required">
+ <xs:simpleType>
+ <xs:restriction base="xs:string">
+ <xs:enumeration value="String" />
+ <xs:enumeration value="Integer" />
+ <xs:enumeration value="Float" />
+ <xs:enumeration value="Reference" />
+ <xs:enumeration value="Confidence" />
+ <xs:enumeration value="RuleId" />
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ </xs:extension>
+ </xs:simpleContent>
+ </xs:complexType>
+ </xs:element>
+</xs:schema>
+]]></programlisting>
+
+ </para>
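+ <para>
+ To illustrate the optional parts of the schema, the following sketch shows two rules with
+ <code>ruleId</code> and <code>confidence</code> attributes, a <code>&lt;ruleExceptions&gt;</code>
+ element, and <code>&lt;begin&gt;</code>/<code>&lt;end&gt;</code> elements with an explicit
+ <code>location</code> attribute. It is intended to be valid against the schema above; the type
+ <code>org.example.Price</code>, the regular expressions, and the exception value are
+ hypothetical, and the matching semantics of exceptions are not restated here.
+ </para>
+ <para>
+ <programlisting><![CDATA[
+<?xml version="1.0" encoding="UTF-8"?>
+<conceptSet>
+  <concept name="priceConcept">
+    <rules>
+      <!-- higher confidence rule: amount followed by a currency code -->
+      <rule ruleId="priceWithCurrency" regEx="[0-9]+\.[0-9]{2} (EUR|USD)"
+            matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"
+            confidence="1.0">
+        <ruleExceptions>
+          <!-- placeholder exception value -->
+          <exception matchType="uima.SentenceAnnotation">discount</exception>
+        </ruleExceptions>
+      </rule>
+      <!-- lower confidence fallback rule: amount only -->
+      <rule ruleId="priceAmountOnly" regEx="[0-9]+\.[0-9]{2}"
+            matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation"
+            confidence="0.5"/>
+    </rules>
+    <createAnnotations>
+      <annotation id="priceAnnot" type="org.example.Price">
+        <!-- location values written out explicitly; they equal the schema defaults -->
+        <begin group="0" location="start"/>
+        <end group="0" location="end"/>
+      </annotation>
+    </createAnnotations>
+  </concept>
+</conceptSet>
+]]></programlisting>
+ </para>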
</section>
</section>