Posted to commits@uima.apache.org by mb...@apache.org on 2007/09/18 16:05:13 UTC
svn commit: r576926 -
/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docbook/RegexAnnotatorUserGuide/regexAnnotatorUserGuide.xml
Author: mbaessler
Date: Tue Sep 18 07:05:13 2007
New Revision: 576926
URL: http://svn.apache.org/viewvc?rev=576926&view=rev
Log:
UIMA-555
update RegexAnnotator documentation
https://issues.apache.org/jira/browse/UIMA-555
Modified:
incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docbook/RegexAnnotatorUserGuide/regexAnnotatorUserGuide.xml
Modified: incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docbook/RegexAnnotatorUserGuide/regexAnnotatorUserGuide.xml
URL: http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docbook/RegexAnnotatorUserGuide/regexAnnotatorUserGuide.xml?rev=576926&r1=576925&r2=576926&view=diff
==============================================================================
--- incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docbook/RegexAnnotatorUserGuide/regexAnnotatorUserGuide.xml (original)
+++ incubator/uima/sandbox/trunk/RegularExpressionAnnotator/docbook/RegexAnnotatorUserGuide/regexAnnotatorUserGuide.xml Tue Sep 18 07:05:13 2007
@@ -56,36 +56,38 @@
<para>
To detect any kind of entity the RegexAnnotator must be
configured using an external XML file. We call this file
- "concept file" since it contains the regular expressions
- and concepts that the annotator use during its processing to
+ "concept file" since it contains the regular expressions and
+ concepts that the annotator uses during its processing to
detect entities. In addition to the rules the concept file
- also contains the "entity result processing" that is done if an
- entity was detected. The "entity result processing" can either be the
- creation of new annotations or an update of an existing
- annotation with additional features. The types and features that are
- used to create new annotations have to be available in the UIMA
- type system.
+ also contains the "entity result processing" that is done if
+ an entity was detected. The "entity result processing" can
+ either be the creation of new annotations or an update of an
+ existing annotation with additional features. The types and
+ features that are used to create new annotations have to be
+ available in the UIMA type system.
</para>
<para>
After the concept file is created, the annotator XML
- descriptor have to be updated with the capabilities and maybe with the type
- system information from the concept file. The capability update is
- necessary that the UIMA framework can call the annotator
- also in complex annotator flows if the annotator is
- assembled with others to an analysis bundle. The UIMA type system
- update is only necessary if the used types are not available in
- the UIMA type system definition.
+ descriptor has to be updated with the capabilities and
+ maybe with the type system information from the concept
+ file. The capability update is necessary so that the UIMA
+ framework can also call the annotator in complex annotator
+ flows when the annotator is assembled with others into an
+ analysis bundle. The UIMA type system update is only
+ necessary if the used types are not available in the UIMA
+ type system definition.
</para>
<para>
- With the completion of the descriptor updates,
- the RegexAnnotator is ready to use. When starting the annotator,
- during the initialization the annotator reads the concept file and
- checks if all rules and concepts are valid and if all
- annotations types are defined in the UIMA type system.
- For each document that is processed the rules and concepts are executed in
- exactly the same order as defined in the concept file. The results
- and annotations created for a preceding rule are used by the
- following one since they are stored in the CAS.
+ With the completion of the descriptor updates, the
+ RegexAnnotator is ready to use. When starting the annotator,
+ during the initialization the annotator reads the concept
+ file and checks if all rules and concepts are valid and if
+ all annotation types are defined in the UIMA type system.
+ For each document that is processed the rules and concepts
+ are executed in exactly the same order as defined in the
+ concept file. The results and annotations created for a
+ preceding rule are used by the following one since they are
+ stored in the CAS.
</para>
</chapter>
<chapter id="sandbox.regexAnnotator.conceptsFile">
@@ -96,31 +98,33 @@
</para>
<para>
The RuleSet definition is the easier way to define rules.
- Such a definition consists of a regular expression pattern and of
- annotations that should be created if the rule match an
- entity.
+ Such a definition consists of a regular expression pattern
+ and of annotations that should be created if the rule matches
+ an entity.
</para>
<para>
The Concept definition is the more complex way to define
- rules. Such a definition can consists of more than one regular
- expression rule that can be combined together and of a set
- of annotations that should be created if one of the
+ rules. Such a definition can consist of more than one
+ regular expression rule that can be combined together and of
+ a set of annotations that should be created if one of the
rules has matched an entity.
</para>
<para>
The syntax for both definitions is the same, so you don't
- need to learn two configuration possibilities. The RuleSet
- definition is just available to have an easier and faster way to
- configure the annotator for simple tasks.
- If you have a RuleSet definition it is also possible to extend it
- with more and more features so that it becomes a
- real Concept definition.
+ need to learn two configuration possibilities. The RuleSet
+ definition is just available to have an easier and faster
+ way to configure the annotator for simple tasks. If you have
+ a RuleSet definition it is also possible to extend it with
+ more and more features so that it becomes a real Concept
+ definition.
</para>
<section id="sandbox.regexAnnotator.conceptsFile.rules">
<title>RuleSet definition</title>
- <para>The syntax of a simple RuleSet definition for the
- RegexAnnotator is shown in the listing below:</para>
+ <para>
+ The syntax of a simple RuleSet definition for the
+ RegexAnnotator is shown in the listing below:
+ </para>
<para>
<programlisting><![CDATA[
@@ -192,7 +196,7 @@
</para>
<para>
- As you can see the Concept definition is a complex
+ As you can see the Concept definition is a more complex
RuleSet definition. The main differences are some additional
features defined at the rule and the combination of rules
within one concept.
@@ -229,7 +233,7 @@
By default this parameter is set to <code>false</code>.
This means that the concept processing
starts with the first rule and goes on with the next one
- until a match was found. So in this processing maybe only the first rule
+ until a match is found. In this processing mode, possibly only the first rule
of a concept is evaluated, namely when it already produces a match. The other rules
of this concept will be ignored in that case.
This strategy should be used for example if your first concept
@@ -246,18 +250,17 @@
id="sandbox.regexAnnotator.conceptsFile.rulesDefinition">
<title>Rule Definition</title>
<para>
- This paragraph shows in details how a rule is defined
- and what are the advanced configuration possibilities
- for the rule processing.
+ This paragraph shows in detail how to define a rule for a
+ RuleSet or Concept definition and gives you some advanced
+ configuration possibilities for the rule processing.
</para>
<para>
The listing below shows a complex rule definition with
- all the possible features and details. Please refer to
- the sub sections for some details.
+ all the possible sub elements and details. Please refer to
+ the sub sections for the details about the sub elements.
</para>
- <para>
-
- <programlisting><![CDATA[
+ <para>
+ <programlisting><![CDATA[
<rule ruleId="ID1" regEx="TestRegex" matchStrategy="matchAll" matchType="uima.tcas.DocumentAnnotation" confidence="1.0">
<matchTypeFilter>
@@ -274,90 +277,82 @@
</rule>
]]></programlisting>
-
</para>
- <section
- id="sandbox.regexAnnotator.conceptsFile.rulesDefinition.rule">
- <title>Rule Definition Details</title>
- <para>
- The
- <code><rule></code>
- definition has three mandatory features, these are:
- </para>
+
+ <para>
+ For each rule that should be added, a <code><rule></code> element
+ has to be created. The <code><rule></code> element definition has three
+ mandatory features, these are:
+ </para>
<para>
<itemizedlist>
<listitem>
<para>
<code>regEx</code>
- The regular expression pattern that
- should be used for this rule using the
- Java regular expression syntax.
+ is used for this rule. As pattern, everything supported
+ by the Java regular expression syntax is allowed.
</para>
</listitem>
<listitem>
<para>
<code>matchStrategy</code>
- - The match strategy that should be used
+ - The match strategy that is used
for this rule. Possible values are
<code>matchAll</code>
to get all matches,
<code>matchFirst</code>
- to get the first match and
+ to get the first match only and
<code>matchComplete</code>
- to get only matches if the whole input
- text matches the regEx pattern.
+ to get matches where the whole input
+ text matches the regular expression pattern.
</para>
</listitem>
<listitem>
<para>
<code>matchType</code>
- - As match type the annotation type have
- to be specified where the covered text
- should be used as input text for the
- regEx pattern.
+ - The annotation type that is used
+ to match the regular expression pattern.
+ As input text for the match, the annotation span
+ is used.
</para>
</listitem>
</itemizedlist>
</para>
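<para>
As a minimal sketch, a rule that uses only the three mandatory
features could look like the fragment below. The pattern is an
illustrative assumption; only the feature names, the
<code>matchFirst</code> value and the match type are taken from
this guide.
</para>
<para>
<programlisting><![CDATA[
<!-- hypothetical rule: report only the first four-digit number
     found in the text covered by the document annotation -->
<rule regEx="\d{4}" matchStrategy="matchFirst"
      matchType="uima.tcas.DocumentAnnotation">
</rule>
]]></programlisting>
</para>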
<para>
- Additionally the
- <code><rule></code>
- definition also has some optional features that can
- be set, these are:
+ In addition to the mandatory features the <code><rule></code>
+ element definition also has some optional features that can
+ be used, these are:
</para>
<itemizedlist>
<listitem>
<para>
<code>ruleId</code>
- Specifies a unique ID for the rule. This
- ID value can later be used to add it as
+ ID can later be used to add it as
value to an annotation feature (see
- <code><setFeature></code>
- ).
+ <xref linkend="sandbox.regexAnnotator.conceptsFile.annotationCreation.features"/>).
</para>
</listitem>
<listitem>
<para>
<code>confidence</code>
- Specifies the confidence value of this
- rule. Maybe you have more than one rule and
- use different patterns to describe the same
- entity, so you can classify the rules with
+ rule. If you have more than one rule that describes
+ the same complex entity you can classify the rules with
a confidence value. This confidence value
can later be used to add it as value to an
annotation feature (see
- <code><setFeature></code>
- ).
+ <xref linkend="sandbox.regexAnnotator.conceptsFile.annotationCreation.features"/>).
</para>
</listitem>
</itemizedlist>
- </section>
+
<section
id="sandbox.regexAnnotator.conceptsFile.rulesDefinition.filter">
<title>Match Type Filter</title>
<para>
-
- <programlisting><![CDATA[
+ <programlisting><![CDATA[
<matchTypeFilter>
<feature name="language">en</feature>
</matchTypeFilter>
@@ -366,87 +361,84 @@
</para>
<para>
- The match type filter construct can be used to
- filter the match type annotations before they are
- used for the evaluation. The
- <code><matchTypeFilter></code>
- element can contain one or more
- <code><feature></code>
- elements that contains filter information.
+ Match type filters can be used to filter the match type
+ annotations that are used for matching the regular expression
+ pattern. For example, a rule can be restricted to documents whose
+ language is English, as shown in the example above.
+ Match type filters always relate to the <code>matchType</code>
+ that was specified for the rule.
</para>
<para>
- The name of the UIMA feature is specified using the
- <code>name</code>
- feature of the
- <code><feature></code>
- element. The content of the
+ The <code><matchTypeFilter></code>
+ element can contain an arbitrary number of
<code><feature></code>
- element contains the regular expression pattern that
- have to match the UIMA feature value. In the example
- above the match type annotation has a feature
- "language" that have to have the content "en". If that
- is true, the annotation is pass the filter
- condition.
+ elements that contain the filter information. All specified features
+ have to be valid for the <code>matchType</code> annotation
+ of the rule.
</para>
- </section>
- <section
- id="sandbox.regexAnnotator.conceptsFile.rulesDefinition.update">
- <title>Update Match Type Annotation</title>
<para>
-
+ The name of the UIMA feature that should be used as
+ filter is specified using the <code>name</code> feature of the
+ <code><feature></code> element. The content of the
+ <code><feature></code> element contains the regular expression pattern
+ that is used as filter. This pattern
+ has to match the UIMA feature value of the match type annotation
+ for the annotation to pass the filter. In the example
+ above the match type annotation has a UIMA feature called
+ <code>language</code> that has to have the content <code>en</code>. If that
+ is true, the annotation passes the filter condition.
+ </para>
+ </section>
+ <section id="sandbox.regexAnnotator.conceptsFile.rulesDefinition.update">
+ <title>Update Match Type Annotations With Additional Features</title>
+ <para>
<programlisting><![CDATA[
<updateMatchTypeAnnotation>
<setFeature name="language" type="String">$0</setFeature>
</updateMatchTypeAnnotation>
]]></programlisting>
-
-
</para>
<para>
With the
<code><updateMatchTypeAnnotation></code>
- construct you can configure to update a UIMA feature
- value at the match type annotation if a rule match
+ construct it is possible to update or set a UIMA feature value
+ for the match type annotation in case a rule match
was found. The
- <code><updateMatchTypeAnnotation></code>
- can have one or more
- <code><setFeature></code>
- elements.
+ <code><updateMatchTypeAnnotation></code> element
+ can have an arbitrary number of
+ <code><setFeature></code> elements that contain
+ the feature information that should be updated.
</para>
<para>
- The
- <code><setFeature></code>
- element has the two mandatory features, these are:
+ The <code><setFeature></code> element has two
+ mandatory features, these are:
</para>
<itemizedlist>
<listitem>
<para>
<code>name</code>
- Specifies the UIMA feature name that
- should be set at the match type annotation.
+ should be set. The feature has to be available
+ at the <code>matchType</code> annotation
+ of the rule.
</para>
</listitem>
<listitem>
<para>
<code>type</code>
- Specifies the UIMA feature type that is
- defined in the UIMA type system. Possible
- values are
- <code>String</code>
- ,
- <code>Integer</code>
- and
- <code>Float</code>
+ defined in the UIMA type system for this feature.
+ Currently supported feature types are <code>String</code>,
+ <code>Integer</code> and <code>Float</code>.
</para>
</listitem>
</itemizedlist>
<para>
- The content of the
- <code><setFeature></code>
- element contains the value that should be set. This
- can either be a literal value or it can be a regular
- expression matching group as shown in the example
- above. A combination of matching groups and literals
+ The content of the <code><setFeature></code>
+ element definition contains the feature value that should be set.
+ This can either be a literal value or a regular
+ expression capturing group as shown in the example
+ above. A combination of capturing groups and literals
is also possible.
</para>
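<para>
A sketch of such a combination is shown below. It assumes that
capturing groups other than <code>$0</code> are referenced with
the same <code>$n</code> notation and that the rule pattern
defines a first capturing group. The feature name
<code>normalizedValue</code> is hypothetical and would have to be
defined for the match type annotation.
</para>
<para>
<programlisting><![CDATA[
<updateMatchTypeAnnotation>
  <!-- literal prefix "id-" followed by the text of capturing group 1 -->
  <setFeature name="normalizedValue" type="String">id-$1</setFeature>
</updateMatchTypeAnnotation>
]]></programlisting>
</para>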
</section>
@@ -466,14 +458,13 @@
<para>
With the
<code><ruleExceptions></code>
- construct you can configure exceptions to prevent matches for the current rule.
- An exception is something
- similar to a filter, but on the higher level. For
+ construct it is possible to configure exceptions to prevent matches for the rule.
+ An exception is something similar to a filter, but on a higher level. For
example take the scenario where you have several token annotations that
- are all covered by a sentence annotation. You have written a rule that can detect
+ are covered by a sentence annotation. You have written a rule that can detect
car brands. The text you analyze has the sentence "Henry Ford was born 1863".
When analyzing the text you will get a car brand annotation since "Ford" is
- a car brand. But is this behavior correct? The work around that issue
+ a car brand. But is this the correct behavior? To work around that issue
you can create an exception that looks like
<programlisting><![CDATA[
<ruleExceptions>
@@ -485,28 +476,26 @@
does not contain the word "Henry".
</para>
<para>
- The
- <code><ruleExceptions></code>
- element can have one or more exceptions specified with the
- <code><exception></code>
- elements.
+ The <code><ruleExceptions></code> element can have
+ an arbitrary number of <code><exception></code>
+ elements to specify rule exceptions.
</para>
<para>
- The
- <code><exception></code>
+ The <code><exception></code>
element has one mandatory feature called
<code>matchType</code>. The <code>matchType</code> feature
specifies the annotation type the exception is based on.
- The exception annotation instance that is used during the runtime is evaluated for each
+ The concrete exception match type annotation that is used
+ during the runtime is evaluated for each
match type annotation that is used to match a rule. As
- exception annotation instance always the covering annotation
- of the match type annotation is searched.
- If no covering annotation was found the exception is not evaluated.
+ exception annotation, the covering annotation
+ of the current match type annotation is always used.
+ If no covering annotation instance of the exception match type
+ is found, the exception is not evaluated.
</para>
<para>
- The content of the
- <code><exception></code>
- element specify the regular expression that is used to evaluate the exception.
+ The content of the <code><exception></code>
+ element specifies the regular expression that is used to evaluate the exception.
</para>
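<para>
For the car brand scenario described earlier, an exception of this
kind might be written as in the following sketch. The type name
<code>org.example.Sentence</code> is a placeholder for whatever
sentence annotation type is available in your type system; it is
not defined by the RegexAnnotator.
</para>
<para>
<programlisting><![CDATA[
<ruleExceptions>
  <!-- hypothetical type name; the exception pattern is evaluated on the
       sentence annotation that covers the current match type annotation -->
  <exception matchType="org.example.Sentence">Henry</exception>
</ruleExceptions>
]]></programlisting>
</para>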
<para>
If the exception match is true, the
@@ -518,8 +507,8 @@
<section id="sandbox.regexAnnotator.conceptsFile.annotationCreation">
<title>Annotation Creation</title>
<para>
- This paragraph explain with all the details how to create annotations if a rule has matched.
- The listing below shows the definition of an annotation with all possible settings.
+ This paragraph explains in detail how to create annotations if a rule has matched some input text.
+ An annotation creation example with all possible settings is shown in the listing below.
</para>
<para>
<programlisting><![CDATA[
@@ -535,10 +524,9 @@
</annotation>
]]></programlisting>
</para>
- <section id="sandbox.regexAnnotator.conceptsFile.annotationDefinition.details">
- <title>Annotation Definition Details</title>
+
<para>
- The <code><annotation></code> definition has two mandatory features, these are:
+ The <code><annotation></code> element has two mandatory features, these are:
</para>
<para>
<itemizedlist>
@@ -552,51 +540,57 @@
<listitem>
<para>
<code>type</code>
- - Specifies the UIMA annotation type that should be used if a match was found
- to create the annotation. The used type have to be specified in the UIMA type system.
+ - Specifies the UIMA annotation type that is used if an annotation is created.
+ The used type has to be defined in the UIMA type system.
</para>
</listitem>
</itemizedlist>
</para>
<para>
- The mandatory sub elements of <code><annotation></code> are:
+ The mandatory sub elements of the <code><annotation></code> element are:
</para>
<para>
<itemizedlist>
<listitem>
<para>
<code><begin></code>
- - Specifies the begin position of the annotation.
+ - Specifies the begin position of the annotation that is created.
+ For details about the <code><begin></code> element, please refer
+ to the <xref linkend="sandbox.regexAnnotator.conceptsFile.annotationCreation.boundaries"/>.
</para>
</listitem>
<listitem>
<para>
<code><end></code>
- - Specifies the end position of the annotation.
+ - Specifies the end position of the annotation that is created.
+ For details about the <code><end></code> element, please refer
+ to the <xref linkend="sandbox.regexAnnotator.conceptsFile.annotationCreation.boundaries"/>.
</para>
</listitem>
</itemizedlist>
</para>
<para>
- The optional sub elements of <code><annotation></code> are:
+ The optional sub elements of the <code><annotation></code> element are:
</para>
<para>
<itemizedlist>
<listitem>
<para>
<code><code><setFeature></code></code>
- - set a UIMA feature at the created annotation.
+ - sets a UIMA feature for the created annotation.
+ For details about the <code><setFeature></code> element, please refer
+ to the <xref linkend="sandbox.regexAnnotator.conceptsFile.annotationCreation.features"/>.
</para>
</listitem>
</itemizedlist>
</para>
- </section>
- <section id="sandbox.regexAnnotator.conceptsFile.annotationDefinition.boundaries">
+ <section id="sandbox.regexAnnotator.conceptsFile.annotationCreation.boundaries">
<title>Annotation Boundaries</title>
<para>
- The <code><annotation></code> element defines the annotations boundaries using the
+ When creating an annotation with the <code><annotation></code> element it is also
+ necessary to define the annotation boundaries. The annotation boundaries are defined using the
sub elements <code><begin></code> and <code><end></code>. The start position of
- an annotation is defined using the <code><begin></code> element. The end position using
+ the annotation is defined using the <code><begin></code> element, the end position using
the <code><end></code> element. Both elements have the same features as shown below:
</para>
<para>
@@ -604,44 +598,47 @@
<listitem>
<para>
<code>group</code>
- - identifies a capturing group within the regular expression pattern of the
- current rule. It can be assigned a single number from 0 to 9, where 0 denotes
- the whole match, 1 the first match group, 2 the second, and so on.
+ - identifies the capturing group number within the regular expression pattern for the
+ current rule. The value can be a single number from 0 to 9, where 0 denotes
+ the whole match, 1 the first capturing group, 2 the second one, and so on.
</para>
</listitem>
<listitem>
<para>
<code>location</code>
- - indicates a position inside the match group, which can either be the position
- of the left parenthesis in case of a value "start", or the right parenthesis in
- case of a value "end". The <code>location</code> feature is optional. By default
- the <code><begin></code> element set <code>location="start"</code> and the
- <code><end></code> element <code>location="end"</code>.
+ - indicates a position inside the capturing group, which can either be the position
+ of the left parenthesis in case of a value <code>start</code>, or the right parenthesis in
+ case of a value <code>end</code>. The <code>location</code> feature is optional. By default
+ the <code><begin></code> element is set to <code>location="start"</code> and the
+ <code><end></code> element to <code>location="end"</code>.
</para>
</listitem>
</itemizedlist>
</para>
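<para>
The following sketch illustrates how <code>group</code> and
<code>location</code> can be combined. It assumes a rule pattern
with two capturing groups, for example
<code>(\d{4})-(\d{2})</code>, and that the features are written
as attributes of empty <code><begin></code> and
<code><end></code> elements. The <code>id</code> values and the
type name <code>org.example.YearMonth</code> are hypothetical.
</para>
<para>
<programlisting><![CDATA[
<!-- from the start of group 1 to the end of group 2;
     for the input "2007-09" this would span the whole match -->
<annotation id="yearMonth" type="org.example.YearMonth">
  <begin group="1" location="start"/>
  <end group="2" location="end"/>
</annotation>

<!-- only the first capturing group, relying on the default
     location values of <begin> and <end> -->
<annotation id="yearOnly" type="org.example.YearMonth">
  <begin group="1"/>
  <end group="1"/>
</annotation>
]]></programlisting>
</para>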
</section>
- <section id="sandbox.regexAnnotator.conceptsFile.annotationDefinition.features">
+ <section id="sandbox.regexAnnotator.conceptsFile.annotationCreation.features">
<title>Annotation Features</title>
<para>
- With the <code><setFeature></code> element of <code><annotation></code> it is
- possible to set UIMA features at the created annotation. The mandatory features
- that have to be set are:
+ With the <code><setFeature></code> element of the <code><annotation></code> definition it is
+ possible to set UIMA features for the created annotation. The mandatory features
+ for the <code><setFeature></code> element are:
</para>
<para>
<itemizedlist>
<listitem>
<para>
<code>name</code>
- - specifies the UIMA feature name that should be set.
+ - specifies the UIMA feature name that should be set. The feature name has to
+ be a valid UIMA feature for this annotation and has to be defined in the
+ UIMA type system.
</para>
</listitem>
<listitem>
<para>
<code>type</code>
- specifies the type of the UIMA feature. For a list of all
- possible type values please refer to the feature types section below.
+ possible feature types please refer to
+ <xref linkend="sandbox.regexAnnotator.conceptsFile.annotationCreation.featureTypes"/>.
</para>
</listitem>
</itemizedlist>
@@ -649,13 +646,15 @@
<para>
The content of the <code><setFeature></code> element specifies the value of the
UIMA feature that is set. As value a literal, a capturing group or a combination of
- both can be specified.
+ both can be used.
</para>
- <section id="sandbox.regexAnnotator.conceptsFile.annotationDefinition.featureTypes">
+ <section id="sandbox.regexAnnotator.conceptsFile.annotationCreation.featureTypes">
<title>Features types</title>
<para>
- The <code><setFeature></code> element has a feature called <code>type</code>
- to specify the UIMA feature type. The possible feature types are listed below:
+ When setting a UIMA feature for an annotation using the <code><setFeature></code> element
+ the feature type has to be specified according to the UIMA type system definition.
+ The feature of the <code><setFeature></code> element used for this is called <code>type</code>.
+ The list below shows all currently supported feature types:
</para>
<para>
<itemizedlist>
@@ -683,13 +682,14 @@
- to link a UIMA feature to another annotation. In this case the
UIMA feature type has to be the same as the referred annotation type.
To reference another annotation the <code><setFeature></code>
- content have to contain the annotation id of the referred annotation.
+ content has to contain the annotation <code>id</code> of the referred
+ annotation.
</para>
</listitem>
<listitem>
<para>
<code>Confidence</code>
- - add if available the value of the <code>confidence</code> feature defined
+ - to add the value of the <code>confidence</code> feature defined
at the <code><rule></code> element to this feature. The UIMA feature has to
be of type <code>uima.cas.Float</code>.
</para>
@@ -697,7 +697,7 @@
<listitem>
<para>
<code>RuleId</code>
- - add if available the value of the <code>ruleId</code> feature defined
+ - to add the value of the <code>ruleId</code> feature defined
at the <code><rule></code> element to this feature. The UIMA feature has to
be of type <code>uima.cas.String</code>.
</para>
@@ -712,8 +712,8 @@
<chapter id="sandbox.regexAnnotator.annotatorDescriptor">
<title>Annotator Descriptor</title>
<para>The RegexAnnotator analysis engine descriptor contains some processing information about
- the annotator. These processing information are specified as parameters and external resource dependencies.
- In this chapter we will look in detail at the descriptor settings.
+ the annotator. This processing information is specified as parameters and external resource dependencies.
+ This chapter explains in detail the possible descriptor settings.
</para>
<section id="sandbox.regexAnnotator.annotatorDescriptor.configParam">
<title>Configuration Parameters</title>
@@ -725,10 +725,15 @@
<listitem>
<para>
<code>ProcessAllConceptRules</code>
- - If this parameter is set to true, all rules of a concept are processed.
- If this parameter is set to false, the rules are processed by confidence
- (highest confidence value first) and the processing stops after the first
- rule where matches are available.
+ - By default this parameter is set to <code>false</code>.
+ This means that the concept processing
+ starts with the first rule (highest confidence) and goes on with the next one
+ until a match is found. In this processing mode, possibly only the first rule
+ of a concept is evaluated, namely when it already produces a match.
+ The other rules of this concept will be ignored in that case.
+ If the <code>ProcessAllConceptRules</code> parameter
+ is set to <code>true</code>, all rules of a concept are processed
+ regardless of whether a previous rule has matched.
</para>
</listitem>
</itemizedlist>
@@ -737,8 +742,8 @@
<section id="sandbox.regexAnnotator.annotatorDescriptor.externalResource">
<title>External Resources</title>
<para>
- To specify the concept file that contains all the concepts and rules the
- RegexAnnotator should process an external resource binding is used.
+ To specify the concept file the RegexAnnotator should use,
+ an external resource binding is used.
The important section in the descriptor where the external resource
is specified is shown below.
</para>
@@ -801,7 +806,7 @@
]]></programlisting>
</para>
<para>
- If there are any language dependent rules in the concept file the supported languages abbreviations
+ If there are any language dependent rules in the concept file the language abbreviations
have to be specified in the <code><languagesSupported></code> element. If there are no
language dependent rules available you can specify <code>x-unspecified</code> as language. That means
that the annotator can work on all languages.
@@ -832,7 +837,7 @@
</chapter>
<appendix id="sandbox.regexAnnotator.xsd">
<title>Concept File Schema</title>
- <para>The concept file schema looks like:
+ <para>The concept file schema that is used to define the concept file looks like:
</para>
<para>
<programlisting><![CDATA[