Posted to commits@uima.apache.org by pk...@apache.org on 2013/01/07 18:10:55 UTC
svn commit: r1429901 [1/2] - /uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/
Author: pkluegl
Date: Mon Jan 7 17:10:54 2013
New Revision: 1429901
URL: http://svn.apache.org/viewvc?rev=1429901&view=rev
Log:
UIMA-2285
- fixed some typos
Modified:
uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.actions.xml
uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.basic_annotations.xml
uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.conditions.xml
uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.declarations.xml
uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.expressions.xml
uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.quantifier.xml
uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.syntax.xml
uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.xml
uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.overview.xml
Modified: uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.actions.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.actions.xml?rev=1429901&r1=1429900&r2=1429901&view=diff
==============================================================================
--- uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.actions.xml (original)
+++ uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.actions.xml Mon Jan 7 17:10:54 2013
@@ -31,8 +31,8 @@ under the License.
<title>ADD</title>
<para>
The ADD action adds all the elements of the passed
- TextMarkerExpressions to a given list. For example this expressions
- could be a string, an integer variable or a list itself. For a
+ TextMarkerExpressions to a given list. For example, this expressions
+ could be a string, an integer variable or a list. For a
complete overview on Textmarker expressions see
<xref linkend='ugr.tools.tm.language.expressions' />.
</para>
@@ -89,7 +89,7 @@ under the License.
<programlisting><![CDATA[Document{->ASSIGN(amount, (amount/2))};]]></programlisting>
</para>
<para>
- In this example, the value of the variable 'amount' is halved.
+ In this example, the value of the variable 'amount' is divided in half.
</para>
</section>
</section>
@@ -98,7 +98,7 @@ under the License.
<title>CALL</title>
<para>
The CALL action initiates the execution of a different script
- file or script block. Currently only complete script files are
+ file or script block. Currently, only complete script files are
supported.
</para>
<section>
@@ -130,7 +130,7 @@ under the License.
<title>CLEAR</title>
<para>
The CLEAR action removes all elements of the given list. If the list was initialized as it was declared,
- then it is reseted to its initial value.
+ then it is reset to its initial value.
</para>
<section>
<title>
@@ -157,15 +157,12 @@ under the License.
<title>COLOR</title>
<para>
The COLOR action sets the color of an annotation type in the
- modified view if the rule is fired. The background color is passed as
+ modified view, if the rule has fired. The background color is passed as
the second parameter. The font color can be changed by passing a
- further color as third parameter. By default annotations are not
- automatically selected when opening the modified view. This can be
- changed for the matched annotations by passing true as fourth
- parameter. By default The supported colors are: black, silver, gray,
+ further color as a third parameter. The supported colors are: black, silver, gray,
white, maroon, red, purple, fuchsia, green, lime, olive, yellow,
navy, blue, aqua, lightblue, lightgreen, orange, pink, salmon, cyan,
- violet, tan, brown, white, mediumpurple.
+ violet, tan, brown, white and mediumpurple.
</para>
<section>
<title>
@@ -185,7 +182,7 @@ under the License.
</para>
<para>
This rule colors all Headline annotations in the modified view.
- Thereby background color is set to red, font color is set to green
+ Thereby, the background color is set to red, font color is set to green
and all 'Headline' annotations are selected when opening the
modified view.
</para>
@@ -252,7 +249,7 @@ Document{->CONFIGURE(HtmlAnnotator, "onl
This rule counts the number of tokens of type ANY in a
Paragraph annotation and assigns the counted value to the int
variable 'cnt'. If the counted number is between 0 and 10000, a
- Headline annotation is created for this Paragraph. Moreover the
+ Headline annotation is created for this Paragraph. Moreover, the
feature named 'size' of Headline is set to the value of 'cnt'.
</para>
</section>
@@ -311,7 +308,7 @@ Document{->CONFIGURE(HtmlAnnotator, "onl
<programlisting><![CDATA[Document{->DYNAMICANCHORING(true)};]]></programlisting>
</para>
<para>
- The above example activates dynamic anchoring.
+ The above mentioned example activates dynamic anchoring.
</para>
</section>
</section>
@@ -320,10 +317,10 @@ Document{->CONFIGURE(HtmlAnnotator, "onl
<title>EXEC</title>
<para>
The EXEC action initiates the execution of a different script
- file or analysis engine on the complete input document independent of
+ file or analysis engine on the complete input document, independent from
the matched text and the current filtering settings. If the argument
refers to another script file, a new view on the document is created:
- the complete text of the original CAS and with the default filtering
+ the complete text of the original CAS with the default filtering
settings of the TextMarker analysis engine.
</para>
<section>
@@ -372,7 +369,7 @@ Document{->EXEC(NamedEntities)};]]></pro
</para>
<para>
Here, the number of tokens within an Headline annotation is
- counted an stored in variable 'tokenCount'. If the number of tokens
+ counted and stored in variable 'tokenCount'. If the number of tokens
is within the interval [0;10000], the FILL action fills the
Headline's feature 'size' with the value of 'tokenCount'.
</para>
@@ -403,7 +400,7 @@ Document{->EXEC(NamedEntities)};]]></pro
</para>
<para>
This rule filters all small written words in the input
- document. This means they are further ignored by any rules.
+ document. They are further ignored by every rule.
</para>
</section>
</section>
@@ -411,7 +408,7 @@ Document{->EXEC(NamedEntities)};]]></pro
<section id="ugr.tools.tm.language.actions.gather">
<title>GATHER</title>
<para>
- This action creates a complex structure, a annotation with
+ This action creates a complex structure: an annotation with
features. The optionally passed indexes (NumberExpressions after the
TypeExpression) can be used to create an annotation that spanns the
matched information of several rule elements. The features are
@@ -583,7 +580,7 @@ A B{-> GATHER(C, 1, 2, "a" = 1, "b" = 2)
<section id="ugr.tools.tm.language.actions.log">
<title>LOG</title>
<para>
- The LOG action simply writes a log message.
+ The LOG action writes a log message.
</para>
<section>
<title>
@@ -644,19 +641,19 @@ A B{-> GATHER(C, 1, 2, "a" = 1, "b" = 2)
<title>MARKFAST</title>
<para>
The MARKFAST action creates annotations of the given type (first
- parameter) if an element of the passed list (second parameter) occurs
- within the window of the matched annotation. Thereby the created
- annotation doesn't cover the whole matched annotation. Instead it
+ parameter), if an element of the passed list (second parameter) occurs
+ within the window of the matched annotation. Thereby, the created
+ annotation does not cover the whole matched annotation. Instead, it
only covers the text of the found occurence. The third parameter is
- optional. It defines if the MARKFAST action should ignore the case,
+ optional. It defines, whether the MARKFAST action should ignore the case,
whereby its default value is false. The optional fourth parameter
specifies a character threshold for the ignorence of the case. It is
- only relevant if the ignore-case value is set to true. The last
+ only relevant, if the ignore-case value is set to true. The last
parameter is set to true by default and specifies whether whitespaces
in the entries of the dictionary should be ignored. For more
information on lists see
- <xref linkend='ugr.tools.tm.language.declarations.ressource' />
- . Additionally to external word lists, string lists variables can be
+ <xref linkend='ugr.tools.tm.language.declarations.ressource' />.
+ Additionally to external word lists, string lists variables can be
used.
</para>
<section>
@@ -681,7 +678,7 @@ Document{-> MARKFAST(FirstName, FirstNam
</para>
<para>
This rule annotates all first names listed in the list
- 'FirstNameList' within the document and ignores the case if the
+ 'FirstNameList' within the document and ignores the case, if the
length of the word
is greater than 2.
</para>
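Note on the MARKFAST hunk above: the ignore-case threshold can be pictured with the following Python sketch. It is not TextMarker code and only models the comparison step. The function name `entry_matches` and its parameters are hypothetical, and it assumes that case is ignored only for words longer than the threshold, as the example paragraph ("greater than 2") suggests.

```python
def entry_matches(word: str, entry: str, ignore_case: bool = False,
                  min_length: int = 0) -> bool:
    """Return True if `word` matches the dictionary `entry`.

    Assumption: case is ignored only when `ignore_case` is set and the
    word is longer than `min_length` characters.
    """
    if ignore_case and len(word) > min_length:
        return word.lower() == entry.lower()
    # Otherwise fall back to an exact, case-sensitive comparison.
    return word == entry


print(entry_matches("PETER", "Peter", ignore_case=True, min_length=2))  # True
print(entry_matches("AL", "Al", ignore_case=True, min_length=2))        # False
```

Short entries such as "AL" stay case-sensitive here, which is presumably the point of the threshold: short strings are more likely to collide with unrelated tokens.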
@@ -720,7 +717,7 @@ Document{-> MARKFAST(FirstName, FirstNam
<title>MARKONCE</title>
<para>
The MARKONCE action has the same functionality as the MARK
- action, but creates a new annotation only if it does not yet exist.
+ action, but creates a new annotation only, if it is not yet existing.
</para>
<section>
<title>
@@ -739,7 +736,7 @@ Document{-> MARKFAST(FirstName, FirstNam
</para>
<para>
This rule matches on a free line followed by a Paragraph and
- annotates both in a single ParagraphAfterFreeline annotation if it
+ annotates both in a single ParagraphAfterFreeline annotation, if it
is not already annotated with ParagraphAfterFreeline annotation. The
two numerical expressions at the end of the MARKONCE action state
that the matched text of the first and the second rule elements are
@@ -752,10 +749,10 @@ Document{-> MARKFAST(FirstName, FirstNam
<title>MARKSCORE</title>
<para>
The MARKSCORE action is similar to the MARK action. It also creates a
- new annotation of the given type, but only if it does not yet exist.
+ new annotation of the given type, but only if it is not yet existing.
The optionally passed indexes (parameters after the TypeExpression)
can be used to create an annotation that spanns the matched
- information of several rule elements. Additionally a score value
+ information of several rule elements. Additionally, a score value
(first parameter) is added to the heuristic score value of the
annotation. For more information on heuristic scores see
<xref linkend='ugr.tools.tm.language.score' />
@@ -781,7 +778,7 @@ Document{-> MARKFAST(FirstName, FirstNam
annotates both in a single ParagraphAfterFreeline annotation. The
two number expressions at the end of the mark action indicate that
the matched text of the first and the second rule elements are
- joined to create the boundaries of the new annotation. Additionally
+ joined to create the boundaries of the new annotation. Additionally,
the score '10' is added to the heuristic threshold of this
annotation.
</para>
@@ -792,15 +789,15 @@ Document{-> MARKFAST(FirstName, FirstNam
<title>MARKTABLE</title>
<para>
The MARKTABLE action creates annotations of the given type (first
- parameter) if an element of the given column (second parameter) of a
+ parameter), if an element of the given column (second parameter) of a
passed table (third parameter) occures within the window of the
- matched annotation. Thereby the created annotation doesn't cover the
- whole matched annotation. Instead it only covers the text of the
+ matched annotation. Thereby, the created annotation does not cover the
+ whole matched annotation. Instead, it only covers the text of the
found occurence. Optionally the MARKTABLE action is able to assign
entries of the given table to features of the created annotation.
For
more information on tables see
- <xref linkend='ugr.tools.tm.language.declarations.ressource' />. Additionally several configuration parameters are possible. (See example.)
+ <xref linkend='ugr.tools.tm.language.declarations.ressource' />. Additionally, several configuration parameters are possible. (See example.)
</para>
<section>
<title>
@@ -826,11 +823,11 @@ Document{-> MARKTABLE(Struct, 1, TestTab
<para>
In this example, the whole document is searched for all
occurences of the entries of the first column of the given table
- 'TestTable'. For each occurence an annotation of the type Struct is
+ 'TestTable'. For each occurence, an annotation of the type Struct is
created and its feature 'first' is filled with the entry of the
- second column. Moreover the case of the word is ignored if the
- length of the word exceeds 4. Additionally the chars '.', ',' and
- '-' are ignored, but at maximum two of them.
+ second column. Moreover, the case of the word is ignored if the
+ length of the word exceeds 4. Additionally, the chars '.', ',' and
+ '-' are ignored, but maximally two of them.
</para>
</section>
</section>
@@ -869,7 +866,7 @@ Document{-> MARKTABLE(Struct, 1, TestTab
<title>MERGE</title>
<para>
The MERGE action merges a number of given lists. The first
- parameter defines if the merge is done as intersection (false) or as
+ parameter defines, if the merge is done as intersection (false) or as
union (true). The second parameter is the list variable that will
contain the result.
</para>
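Note on the MERGE hunk above: the effect of the first parameter (false for intersection, true for union) can be sketched in Python as follows. The helper `merge` is hypothetical and is not the TextMarker implementation. In particular, keeping first-seen order and dropping duplicates are assumptions that the documentation does not state.

```python
def merge(union: bool, *lists):
    """Merge lists as a union (True) or as an intersection (False).

    Assumption: the result keeps first-seen order and drops duplicates.
    """
    if not lists:
        return []
    if union:
        # Union: every element of every list is a candidate.
        candidates = [item for lst in lists for item in lst]
    else:
        # Intersection: keep elements of the first list found in all others.
        candidates = [item for item in lists[0]
                      if all(item in other for other in lists[1:])]
    result = []
    for item in candidates:
        if item not in result:
            result.append(item)
    return result


print(merge(True, [1, 2], [2, 3]))   # [1, 2, 3]
print(merge(False, [1, 2], [2, 3]))  # [2]
```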
@@ -899,7 +896,7 @@ Document{-> MARKTABLE(Struct, 1, TestTab
<title>REMOVE</title>
<para>
The REMOVE action removes lists or single values from a given
- list
+ list.
</para>
<section>
<title>
@@ -944,7 +941,7 @@ Document{-> MARKTABLE(Struct, 1, TestTab
<programlisting><![CDATA[Document{->REMOVEDUPLICATE(list)};]]></programlisting>
</para>
<para>
- Here, all duplicates in list 'list' are removed.
+ Here, all duplicates within the list 'list' are removed.
</para>
</section>
</section>
@@ -1002,7 +999,7 @@ Document{-> MARKTABLE(Struct, 1, TestTab
<programlisting><![CDATA[Document{->RETAINTYPE(SPACE)};]]></programlisting>
</para>
<para>
- All spaces are retained and can be matched by rules.
+ Here, all spaces are retained and can be matched by rules.
</para>
</section>
</section>
@@ -1040,7 +1037,7 @@ Document{-> MARKTABLE(Struct, 1, TestTab
<title>SHIFT</title>
<para>
The SHIFT action can be used to change the offsets of an annotation. The optional number expression,
- which point the the rule elements of the rule, specify the new offsets of the annotation.
+ which points the the rule elements of the rule, specify the new offsets of the annotation.
</para>
<section>
<title>
@@ -1086,7 +1083,7 @@ Document{-> MARKTABLE(Struct, 1, TestTab
<programlisting><![CDATA[Document{->TRANSFER(LanguageStorage)};]]></programlisting>
</para>
<para>
- Here, a new feature structure LanguageStorage is created and
+ Here, a new feature structure <quote>LanguageStorage</quote> is created and
the compatible features of the Document annotation are copied. E.g.,
if LanguageStorage defined a feature named 'language', then the
feature value of the Document annotation is copied.
@@ -1124,7 +1121,7 @@ Document{-> MARKTABLE(Struct, 1, TestTab
words previously contained in the file 'FirstNames.txt' are
annotated with the type FirstName and the words in the file
'Companies.txt' with the type Company. The case of the word is
- ignored if the length of the word exceeds 4. The edit distance is
+ ignored, if the length of the word exceeds 4. The edit distance is
deactivated. The cost of an edit operation can currently not be
configured by an argument. The last argument additionally defines
several chars that will be ignored.
@@ -1155,7 +1152,7 @@ Document{-> MARKTABLE(Struct, 1, TestTab
<programlisting><![CDATA[Keyword{-> TRIM(SPACE)};]]></programlisting>
</para>
<para>
- This rule removes all spaces in at the beginning and at the end of Keyword annotations and
+ This rule removes all spaces at the beginning and at the end of Keyword annotations and
thus changes the offsets of the matched annotations.
</para>
</section>
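Note on the TRIM hunk above: the statement that trimming "changes the offsets of the matched annotations" can be illustrated with a small Python sketch. It is not TextMarker code. The function `trim_span` is hypothetical, annotations are modeled as half-open (begin, end) character offsets, and the SPACE type is approximated by the plain space character.

```python
def trim_span(text: str, begin: int, end: int, chars: str = " "):
    """Shrink the span [begin, end) so it neither starts nor ends with `chars`."""
    while begin < end and text[begin] in chars:
        begin += 1
    while end > begin and text[end - 1] in chars:
        end -= 1
    return begin, end


text = "a  key word  b"
begin, end = trim_span(text, 1, 13)  # original span covers "  key word  "
print(begin, end)        # 3 11
print(text[begin:end])   # key word
```

Only leading and trailing spaces are dropped; the space inside "key word" stays part of the span.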
@@ -1168,7 +1165,7 @@ Document{-> MARKTABLE(Struct, 1, TestTab
overlapping the matched annotation. There are two additional configurations: If additional
indexes are given, then the span of the specified rule elements are applied, similar the the MARK action.
If instead a boolean is given as an additional argument, then all annotations of the given type are removed
- that start at the macthed position.
+ that start at the matched position.
</para>
<section>
<title>
@@ -1188,19 +1185,19 @@ Document{-> MARKTABLE(Struct, 1, TestTab
<programlisting><![CDATA[Headline{->UNMARK(Headline)};]]></programlisting>
</para>
<para>
- Here, the headline annotation is removed.
+ Here, the Headline annotation is removed.
</para>
<para>
<programlisting><![CDATA[CW ANY+? QUESTION{->UNMARK(Headline,1,3)};]]></programlisting>
</para>
<para>
- Here, all headline annotations are removed that start with a capitalized word and end with a question mark.
+ Here, all Headline annotations are removed that start with a capitalized word and end with a question mark.
</para>
<para>
<programlisting><![CDATA[CW{->UNMARK(Headline,true)};]]></programlisting>
</para>
<para>
- Here, all headline annotations are removed that start with a capitalized word.
+ Here, all Headline annotations are removed that start with a capitalized word.
</para>
</section>
</section>
@@ -1228,7 +1225,7 @@ Document{-> MARKTABLE(Struct, 1, TestTab
<programlisting><![CDATA[Annotation{->UNMARKALL(Annotation, {Headline})};]]></programlisting>
</para>
<para>
- Here, all annotations but headlines are removed.
+ Here, all annotations except from headlines are removed.
</para>
</section>
Modified: uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.basic_annotations.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.basic_annotations.xml?rev=1429901&r1=1429900&r2=1429901&view=diff
==============================================================================
--- uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.basic_annotations.xml (original)
+++ uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.basic_annotations.xml Mon Jan 7 17:10:54 2013
@@ -20,15 +20,14 @@
<title>Basic annotations and tokens</title>
<para>
The TextMarker system uses a JFlex lexer to initially create a
- seed of basic token annotations. These tokens build a hierarchy
- which is shown in <xref linkend='figure.ugr.tools.tm.language.seeding.basic_token' />. The
+ seed of basic token annotations. These tokens build a hierarchy shown in <xref linkend='figure.ugr.tools.tm.language.seeding.basic_token' />. The
<quote>ALL</quote> (green) annotation is the root of the hierarchy. ALL and the red
- marked annotation types are abstract. This means that they are not
- actually created by the lexer. An overview of these abstract types can
+ marked annotation types are abstract. This means that they are actually not
+ created by the lexer. An overview of these abstract types can
be found in <xref linkend='table.ugr.tools.tm.language.seeding.basic_token.abstract' />. The leafs of the hierarchy (blue) are created by the lexer. Each
- leaf is an own type but also inherits the types of the abstract
+ leaf is an own type, but also inherits the types of the abstract
annotation types further up in the hierarchy. The leaf types are
- described in more detail in <xref linkend='table.ugr.tools.tm.language.seeding.basic_token.created' />
+ described in more detail in <xref linkend='table.ugr.tools.tm.language.seeding.basic_token.created' />.
Each text unit within an input document belongs to exactly one of these
annotation types.
</para>
@@ -77,7 +76,7 @@
<row>
<entry>ANY</entry>
<entry>ALL</entry>
- <entry>all token but markup</entry>
+ <entry>all tokens except for markup</entry>
</row>
<row>
<entry>W</entry>
Modified: uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.conditions.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.conditions.xml?rev=1429901&r1=1429900&r2=1429901&view=diff
==============================================================================
--- uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.conditions.xml (original)
+++ uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.conditions.xml Mon Jan 7 17:10:54 2013
@@ -22,7 +22,7 @@
<section id="ugr.tools.tm.language.conditions.after">
<title>AFTER</title>
<para>
- The AFTER condition evaluates true if the matched annotation
+ The AFTER condition evaluates true, if the matched annotation
starts after the beginning of an arbitrary annotation of the passed
type. If a list of types is passed, this has to be true for at least
one of them.
@@ -43,7 +43,7 @@
<programlisting><![CDATA[CW{AFTER(SW)};]]></programlisting>
</para>
<para>
- Here, the rule matches on a capitalized word if there is any
+ Here, the rule matches on a capitalized word, if there is any
small written word previously.
</para>
</section>
@@ -52,7 +52,7 @@
<section id="ugr.tools.tm.language.conditions.and">
<title>AND</title>
<para>
- The AND condition is a composed condition and evaluates true if
+ The AND condition is a composed condition and evaluates true, if
all contained conditions evaluate true.
</para>
<section>
@@ -72,8 +72,8 @@
->MARK(ImportantHeadline)};]]></programlisting>
</para>
<para>
- In this example a Paragraph is annotated with an
- ImportantHeadline annotation if it is part of a Headline and
+ In this example, a paragraph is annotated with an
+ ImportantHeadline annotation, if it is part of a Headline and
contains a Keyword annotation.
</para>
</section>
@@ -82,7 +82,7 @@
<section id="ugr.tools.tm.language.conditions.before">
<title>BEFORE</title>
<para>
- The BEFORE condition evaluates true if the matched annotation
+ The BEFORE condition evaluates true, if the matched annotation
starts before the beginning of an arbitrary annotation of the passed
type. If a list of types is passed, this has to be true for at least
one of them.
@@ -103,7 +103,7 @@
<programlisting><![CDATA[CW{BEFORE(SW)};]]></programlisting>
</para>
<para>
- Here, the rule matches on a capitalized word if there is any
+ Here, the rule matches on a capitalized word, if there is any
small written word afterwards.
</para>
</section>
@@ -112,14 +112,14 @@
<section id="ugr.tools.tm.language.conditions.contains">
<title>CONTAINS</title>
<para>
- The CONTAINS condition evaluates true on a matched annotation
+ The CONTAINS condition evaluates true on a matched annotation,
if
the frequency of the passed type lies within an optionally passed
interval. The limits of the passed interval are per default
interpreted as absolute numeral values. By passing a further boolean
parameter set to true the limits are interpreted as percental
values.
- If no interval parameters are passed at all the condition
+ If no interval parameters are passed at all, then the condition
checks
whether the matched annotation contains at least one
occurrence of the
@@ -141,26 +141,25 @@
<programlisting><![CDATA[Paragraph{CONTAINS(Keyword)->MARK(KeywordParagraph)};]]></programlisting>
</para>
<para>
- A Paragraph is annotated with a KeywordParagraph annotation if
+ A Paragraph is annotated with a KeywordParagraph annotation, if
it contains a Keyword annotation.
</para>
<para>
<programlisting><![CDATA[Paragraph{CONTAINS(Keyword,2,4)->MARK(KeywordParagraph)};]]></programlisting>
</para>
<para>
- A Paragraph is annotated with a KeywordParagraph annotation if
+ A Paragraph is annotated with a KeywordParagraph annotation, if
it contains between two and four Keyword annotations.
</para>
<para>
<programlisting><![CDATA[Paragraph{CONTAINS(Keyword,50,100,true)->MARK(KeywordParagraph)};]]></programlisting>
</para>
<para>
- A Paragraph is annotated with a KeywordParagraph annotation if it
+ A Paragraph is annotated with a KeywordParagraph annotation, if it
contains between 50% and 100% Keyword annotations. This is
calculated based on the tokens of the Paragraph. If the Paragraph
contains six basic annotations (see
- <xref linkend='ugr.tools.tm.language.seeding' />
- ), two of them are part of one Keyword annotation and one basic
+ <xref linkend='ugr.tools.tm.language.seeding' />), two of them are part of one Keyword annotation, and if one basic
annotation is also annotated with a Keyword annotation, then the
percentage of the contained Keywords is 50%.
</para>
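Note on the CONTAINS hunk above: the 50% figure in the last example can be recomputed with the following Python sketch. It is not TextMarker code. The function `covered_percentage` is hypothetical, annotations are modeled as half-open (begin, end) offsets, and it assumes the percentage is the share of basic annotations that lie inside at least one Keyword annotation.

```python
def covered_percentage(token_spans, keyword_spans):
    """Percentage of tokens lying inside at least one keyword span."""
    if not token_spans:
        return 0.0
    covered = sum(
        1 for (tb, te) in token_spans
        if any(kb <= tb and te <= ke for (kb, ke) in keyword_spans)
    )
    return 100.0 * covered / len(token_spans)


# Six basic annotations; one Keyword spans the first two, another the fourth.
tokens = [(0, 3), (4, 7), (8, 11), (12, 15), (16, 19), (20, 23)]
keywords = [(0, 7), (12, 15)]
pct = covered_percentage(tokens, keywords)
print(pct)               # 50.0
print(50 <= pct <= 100)  # True
```

Three of the six tokens are covered, so the interval check for CONTAINS(Keyword,50,100,true) would succeed under these assumptions.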
@@ -197,7 +196,7 @@
<para>
Here, the position of the matched Keyword annotation within a
Paragraph annotation is calculated and stored in the variable 'var'.
- If the counted value lies within the interval [2,3] the matched
+ If the counted value lies within the interval [2,3], then the matched
Keyword is annotated with the SecondOrThirdKeywordInParagraph
annotation.
</para>
@@ -210,14 +209,14 @@
The COUNT condition can be used in two different ways. In the
first case (see first definition), it counts the number of
annotations of the passed type within the window of the matched
- annotation and stores the amount in a numerical variable if such a
+ annotation and stores the amount in a numerical variable, if such a
variable is passed. The condition evaluates true if the counted
amount is within a specified interval. If no interval is passed, the
condition always evaluates true. In the second case (see second
definition), it counts the number of occurrences of the passed
VariableExpression (second parameter) within the passed list (first
- parameter) and stores the amount in a numerical variable if such a
- variable is passed. Again the condition evaluates true if the counted
+ parameter) and stores the amount in a numerical variable, if such a
+ variable is passed. Again, the condition evaluates true if the counted
amount is within a specified interval. If no interval is passed, the
condition always evaluates true.
</para>
@@ -243,7 +242,7 @@
<para>
Here, the amount of Keyword annotations within a Paragraph is
calculated and stored in the variable 'var'. If one to ten Keywords
- were counted, the Paragraph is marked with a KeywordParagraph
+ were counted, the paragraph is marked with a KeywordParagraph
annotation.
</para>
<para>
@@ -263,8 +262,8 @@
<para>
The CURRENTCOUNT condition numbers all occurences of the matched
type within the whole document consecutively, thus assigning an index
- to each occurence. Additionally it stores the index of the matched
- annotation in a numerical variable if one is passed. The condition
+ to each occurence. Additionally, it stores the index of the matched
+ annotation in a numerical variable, if one is passed. The condition
evaluates true if the index of the matched annotation is within a
specified interval. If no interval is passed, the condition always
evaluates true.
@@ -285,8 +284,8 @@
<programlisting><![CDATA[Paragraph{CURRENTCOUNT(Keyword,3,3,var)->MARK(ParagraphWithThirdKeyword)};]]></programlisting>
</para>
<para>
- Here, the Paragraph which contains the third Keyword of the
- whole document is annotated with the ParagraphWithThirdKeyword
+ Here, the Paragraph, which contains the third Keyword of the
+ whole document, is annotated with the ParagraphWithThirdKeyword
annotation. The index is stored in the variable 'var'.
</para>
</section>
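Note on the CURRENTCOUNT hunk above: the consecutive numbering of occurrences can be pictured with the following Python sketch. It is not TextMarker code. The function `current_count` is hypothetical, and the 1-based index is an assumption inferred from the example, where the interval 3,3 selects the third Keyword.

```python
def current_count(occurrences, matched, minimum=None, maximum=None):
    """Return (index, ok) for `matched` among all occurrences in the document.

    Assumptions: occurrences are (begin, end) offsets, numbering follows
    document order, and the index is 1-based.
    """
    ordered = sorted(occurrences)
    index = ordered.index(matched) + 1
    if minimum is None or maximum is None:
        # Without an interval the condition always evaluates true.
        return index, True
    return index, minimum <= index <= maximum


keywords = [(40, 47), (5, 12), (20, 27)]
print(current_count(keywords, (40, 47), 3, 3))  # (3, True)
print(current_count(keywords, (20, 27), 3, 3))  # (2, False)
```

The returned index corresponds to the value that the rule would store in the variable 'var'.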
@@ -295,7 +294,7 @@
<section id="ugr.tools.tm.language.conditions.endswith">
<title>ENDSWITH</title>
<para>
- The ENDSWITH condition evaluates true if an annotation of the
+ The ENDSWITH condition evaluates true, if an annotation of the
given type ends exactly at the same position as the matched
annotation. If a list of types is passed, this has to be true for at
least one of them.
@@ -316,7 +315,7 @@
<programlisting><![CDATA[Paragraph{ENDSWITH(SW)};]]></programlisting>
</para>
<para>
- Here, the rule matches on a Paragraph annotation if it ends
+ Here, the rule matches on a Paragraph annotation, if it ends
with a small written word.
</para>
</section>
@@ -344,7 +343,7 @@
<programlisting><![CDATA[Document{FEATURE("language",targetLanguage)}]]></programlisting>
</para>
<para>
- This rule matches if the feature named 'language' of the
+ This rule matches, if the feature named 'language' of the
document annotation equals the value of the variable
'targetLanguage'.
</para>
@@ -354,8 +353,8 @@
<section id="ugr.tools.tm.language.conditions.if">
<title>IF</title>
<para>
- The IF condition evaluates true if the contained boolean
- expression does.
+ The IF condition evaluates true, if the contained boolean
+ expression evaluates true.
</para>
<section>
<title>
@@ -374,7 +373,7 @@
</para>
<para>
A Paragraph annotation is annotated with a KeywordParagraph
- annotation if the value of the variable 'keywordAmount' is greater
+ annotation, if the value of the variable 'keywordAmount' is greater
than five.
</para>
</section>
@@ -383,10 +382,9 @@
<section id="ugr.tools.tm.language.conditions.inlist">
<title>INLIST</title>
<para>
- The INLIST condition is fulfilled if the matched annotation is listed
+ The INLIST condition is fulfilled, if the matched annotation is listed
in a given word or string list. The (relative) edit distance
- is
- currently disabled.
+ is currently disabled.
<!-- ATTENTION: it seems the edit distance is still disabled? what does
this mean? -->
</para>
@@ -409,7 +407,7 @@
<programlisting><![CDATA[Keyword{INLIST(specialKeywords.txt)->MARK(SpecialKeyword)};]]></programlisting>
</para>
<para>
- A Keyword is annotated with the type SpecialKeyword if the text
+ A Keyword is annotated with the type SpecialKeyword, if the text
of the Keyword annotation is listed in the word list
'specialKeywords.txt'.
</para>
@@ -419,11 +417,11 @@
<section id="ugr.tools.tm.language.conditions.is">
<title>IS</title>
<para>
- The IS condition evaluates true if there is an annotation of the
+ The IS condition evaluates true, if there is an annotation of the
given type with the same beginning and ending offsets as the
matched
annotation. If a list of types is given, the condition
- evaluates true
+ evaluates true,
if at least one of them fulfills the former condition.
</para>
<section>
@@ -451,7 +449,7 @@
<section id="ugr.tools.tm.language.conditions.last">
<title>LAST</title>
<para>
- The LAST condition evaluates true if the type of the last token
+ The LAST condition evaluates true, if the type of the last token
within the window of the matched annotation is of the given type.
</para>
<section>
@@ -470,7 +468,7 @@
<programlisting><![CDATA[Document{LAST(CW)};]]></programlisting>
</para>
<para>
- This rule fires if the last token of the document is a
+ This rule fires, if the last token of the document is a
capitalized word.
</para>
</section>
@@ -500,7 +498,7 @@
->MARK(HeadlineXORKeywords)};]]></programlisting>
</para>
<para>
- A Paragraph is marked as a HeadlineXORKeywords if the matched
+ A Paragraph is marked as a HeadlineXORKeywords, if the matched
text is either part of a Headline annotation or contains Keyword
annotations.
</para>
@@ -510,10 +508,10 @@
<section id="ugr.tools.tm.language.conditions.near">
<title>NEAR</title>
<para>
- The NEAR condition is fulfilled if the distance of the matched
+ The NEAR condition is fulfilled, if the distance of the matched
annotation to an annotation of the given type is within a given
interval. The direction is defined by a boolean parameter, whose
- default value is true, therefore searching forward. By default this
+ default value is set to true, therefore searching forward. By default this
condition works on an unfiltered index. An optional fifth boolean
parameter can be set to true to get the condition being evaluated on
a filtered index.
@@ -572,7 +570,7 @@
<section id="ugr.tools.tm.language.conditions.or">
<title>OR</title>
<para>
- The OR Condition is a composed condition and evaluates true if
+ The OR Condition is a composed condition and evaluates true, if
at least one contained condition is evaluated true.
</para>
<section>
@@ -593,7 +591,7 @@
</para>
<para>
In this example a Paragraph is annotated with the
- ImportantParagraph annotation if it is a Headline or contains
+ ImportantParagraph annotation, if it is a Headline or contains
Keyword annotations.
</para>
</section>
@@ -602,7 +600,7 @@
<section id="ugr.tools.tm.language.conditions.parse">
<title>PARSE</title>
<para>
- The PARSE condition is fulfilled if the text covered by the
+ The PARSE condition is fulfilled, if the text covered by the
matched annotation can be transformed into a value of the given
variable's type. If this is possible, the parsed value is
additionally assigned to the passed variable.
@@ -632,11 +630,11 @@
<section id="ugr.tools.tm.language.conditions.partof">
<title>PARTOF</title>
<para>
- The PARTOF condition is fulfilled if the matched annotation is
- part of an annotation of the given type. However it is not necessary
+ The PARTOF condition is fulfilled, if the matched annotation is
+ part of an annotation of the given type. However, it is not necessary
that the matched annotation is smaller than the annotation of the
- given type. Use the (much slower) PARTOFNEQ condition instead if this
- is needed. If a type list is given, the condition evaluates true if
+ given type. Use the (much slower) PARTOFNEQ condition instead, if this
+ is needed. If a type list is given, the condition evaluates true, if
the former described condition for a single type is fulfilled for at
least one of the types in the list.
</para>
@@ -656,7 +654,7 @@
<programlisting><![CDATA[Paragraph{PARTOF(Headline) -> MARK(ImportantParagraph)};]]></programlisting>
</para>
<para>
- A Paragraph is an ImportantParagraph if the matched text is
+ A Paragraph is an ImportantParagraph, if the matched text is
part of a Headline annotation.
</para>
</section>
@@ -688,7 +686,7 @@
<programlisting><![CDATA[W{PARTOFNEQ(Headline) -> MARK(ImportantWord)};]]></programlisting>
</para>
<para>
- A word is an ImportantWord if it is part of a headline.
+ A word is an <quote>ImportantWord</quote>, if it is part of a headline.
</para>
</section>
</section>
@@ -696,11 +694,11 @@
<section id="ugr.tools.tm.language.conditions.position">
<title>POSITION</title>
<para>
- The POSITION condition is fulfilled if the matched type is the
+ The POSITION condition is fulfilled, if the matched type is the
k-th occurrence of this type within the window of an annotation of the
passed type, whereby k is defined by the value of the passed
NumberExpression. If the additional boolean parameter is set to false,
- then k count the occurences of of the minimal annotations.
+ then k counts the occurrences of the minimal annotations.
</para>
<section>
<title>
@@ -736,19 +734,19 @@
<section id="ugr.tools.tm.language.conditions.regexp">
<title>REGEXP</title>
<para>
- The REGEXP condition is fulfilled if the given pattern matches on the
+ The REGEXP condition is fulfilled, if the given pattern matches on the
matched annotation. However, if a string variable is given as the
first
argument, then the pattern is evaluated on the value of the
variable.
For more details on the syntax of regular
- expressions, have a
+ expressions, take a
look at
the
<ulink
url="http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html">Java API</ulink>
- . By default the REGEXP condition is case-sensitive. To change this
- add an optional boolean parameter set to true.
+ . By default the REGEXP condition is case-sensitive. To change this,
+ add an optional boolean parameter, which is set to true.
</para>
<section>
<title>
@@ -778,9 +776,9 @@
The SCORE condition evaluates the heuristic score of the matched
annotation. This score is set or changed by the MARK action.
The
- condition is fulfilled if the score of the matched annotation is
+ condition is fulfilled, if the score of the matched annotation is
in a
- given interval. Optionally the score can be stored in a
+ given interval. Optionally, the score can be stored in a
variable.
</para>
<section>
@@ -799,8 +797,8 @@
<programlisting><![CDATA[MaybeHeadline{SCORE(40,100)->MARK(Headline)};]]></programlisting>
</para>
<para>
- A annotation of the type MaybeHeadline is annotated with
- Headline if its score is between 40 and 100.
+ An annotation of the type MaybeHeadline is annotated with
+ Headline, if its score is between 40 and 100.
</para>
</section>
</section>
@@ -809,8 +807,8 @@
<title>SIZE</title>
<para>
The SIZE condition counts the number of elements in the given
- list. By default this condition always evaluates true. If an interval
- is passed, it evaluates true if the counted number of list elements
+ list. By default, this condition always evaluates true. When an interval
+ is passed, it evaluates true, if the counted number of list elements
is within the interval. The counted number can be stored in an
optionally passed numeral variable.
</para>
@@ -830,9 +828,9 @@
<programlisting><![CDATA[Document{SIZE(list,4,10,var)};]]></programlisting>
</para>
<para>
- This rule fires if the given list contains between 4 and 10
+ This rule fires, if the given list contains between 4 and 10
elements. Additionally, the exact amount is stored in the variable
- var.
+ <quote>var</quote>.
</para>
</section>
</section>
@@ -840,9 +838,9 @@
<section id="ugr.tools.tm.language.conditions.startswith">
<title>STARTSWITH</title>
<para>
- The STARTSWITH condition evaluates true if an annotation of the
+ The STARTSWITH condition evaluates true, if an annotation of the
given type starts exactly at the same position as the matched
- annotation. If a type list is given, the condition evaluates true if
+ annotation. If a type list is given, the condition evaluates true, if
the former is true for at least one of the given types in the list.
</para>
<section>
@@ -861,7 +859,7 @@
<programlisting><![CDATA[Paragraph{STARTSWITH(SW)};]]></programlisting>
</para>
<para>
- Here, the rule matches on a Paragraph annotation if it starts
+ Here, the rule matches on a Paragraph annotation, if it starts
with a small written word.
</para>
</section>
@@ -872,7 +870,7 @@
<para>
The TOTALCOUNT condition counts the annotations of the passed
type within the whole document and stores the amount in an optionally
- passed numerical variable. The condition evaluates true if the
+ passed numerical variable. The condition evaluates true, if the
amount
is within the passed interval. If no interval is passed, the
condition always evaluates true.
@@ -905,9 +903,9 @@
<title>VOTE</title>
<para>
The VOTE condition counts the annotations of the given two types
- within the window of the matched annotation and evaluates true
+ within the window of the matched annotation and evaluates true,
if it
- found more annotations of the first type.
+ finds more annotations of the first type.
</para>
<section>
<title>
@@ -925,7 +923,7 @@
<programlisting><![CDATA[Paragraph{VOTE(FirstName,LastName)};]]></programlisting>
</para>
<para>
- Here, this rule fires if a paragraph contains more firstnames
+ Here, this rule fires, if a paragraph contains more firstnames
than lastnames.
</para>
</section>
Modified: uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.declarations.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.declarations.xml?rev=1429901&r1=1429900&r2=1429901&view=diff
==============================================================================
--- uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.declarations.xml (original)
+++ uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.declarations.xml Mon Jan 7 17:10:54 2013
@@ -26,10 +26,11 @@
such as type systems and analysis engines.
</para>
<section id="ugr.tools.tm.language.declarations.type">
- <title>Type</title>
+ <title>Types</title>
<para>
Type declarations define new kinds of annotation types and
- optionally its features.
+ optionally their features.
+ </para>
<section>
<title>
<emphasis role="bold">Example:</emphasis>
@@ -41,20 +42,15 @@ DECLARE ParentType NewType (SomeType fea
// a new type "NewType" with parent type "ParentType" and two features]]></programlisting>
</para>
<para>
- Attention: Types with features need
- a parent type in its
- declaration. If no
- special parent type is
- requested, just use type
- Annotation as
- default parent
- type.
+ Attention: Types with features need a parent type in their
+ declarations. If no special parent type is
+ requested, just use type Annotation as
+ default parent type.
</para>
</section>
- </para>
</section>
<section id="ugr.tools.tm.language.declarations.variable">
- <title>Variable</title>
+ <title>Variables</title>
<para>
Variable declarations define new variables. There are 12 kinds of
variables:
@@ -74,7 +70,7 @@ DECLARE ParentType NewType (SomeType fea
</listitem>
<listitem>
<para>
- Integer variable: A variable that represents a integer.
+ Integer variable: A variable that represents an integer.
</para>
</listitem>
<listitem>
@@ -133,6 +129,7 @@ DECLARE ParentType NewType (SomeType fea
</para>
</listitem>
</itemizedlist>
+ </para>
<section>
<title>
<emphasis role="bold">Example:</emphasis>
@@ -152,12 +149,11 @@ BOOLEAN newBooleanVariable;
BOOLEANLIST newBooleanList;]]></programlisting>
</para>
</section>
- </para>
</section>
<section id="ugr.tools.tm.language.declarations.ressource">
<title>Resources</title>
<para>
- There are two kinds of resource declaration, that make external
+ There are two kinds of resource declarations that make external
resources available in the TextMarker system:
<itemizedlist mark='opencircle'>
<listitem>
@@ -171,11 +167,12 @@ BOOLEANLIST newBooleanList;]]></programl
</listitem>
<listitem>
<para>
- Table: A table represents comma separated
+ Table: A table represents a comma separated
file.
</para>
</listitem>
</itemizedlist>
+ </para>
<section>
<title>
<emphasis role="bold">Example:</emphasis>
@@ -185,16 +182,15 @@ BOOLEANLIST newBooleanList;]]></programl
WORDTABLE tableName = 'someTable.csv';]]></programlisting>
</para>
</section>
- </para>
</section>
<section id="ugr.tools.tm.language.declarations.scripts">
<title>Scripts</title>
<para>
Additional scripts can be imported and reused with the CALL action.
- The types of the imported rules are then also available, so that it
- is
- not necessary to import the Type System of the additional rule
+ The types of the imported rules are also available so that it
+ is not necessary to import the Type System of the additional rule
script.
+ </para>
<section>
<title>
<emphasis role="bold">Example:</emphasis>
@@ -205,12 +201,11 @@ WORDTABLE tableName = 'someTable.csv';]]
Document{->CALL(AnotherScript)}; // <- rule executes "AnotherScript.tm"]]></programlisting>
</para>
</section>
- </para>
</section>
<section id="ugr.tools.tm.language.declarations.components">
<title>Components</title>
<para>
- There are two kind of UIMA components that can be imported in a
+ There are two kinds of UIMA components that can be imported in a
TextMarker script:
<itemizedlist mark='opencircle'>
<listitem>
@@ -233,6 +228,7 @@ Document{->CALL(AnotherScript)}; // <- r
</para>
</listitem>
</itemizedlist>
+ </para>
<section>
<title>
<emphasis role="bold">Example:</emphasis>
@@ -246,6 +242,5 @@ Document{->RETAINTYPE(SPACE,BREAK),CALL(
// calls ExternalEngine, but retains white spaces]]></programlisting>
</para>
</section>
- </para>
</section>
</section>
\ No newline at end of file
Modified: uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.expressions.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.expressions.xml?rev=1429901&r1=1429900&r2=1429901&view=diff
==============================================================================
--- uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.expressions.xml (original)
+++ uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.expressions.xml Mon Jan 7 17:10:54 2013
@@ -20,7 +20,7 @@
<title>Expressions</title>
<para>
TextMarker provides five different kinds of expressions. These are
- type expressions, number expressions, string expressions and
+ type expressions, number expressions, string expressions,
boolean expressions and list expressions.
</para>
<para>
@@ -43,7 +43,7 @@
Type variables
(see
<xref linkend='ugr.tools.tm.language.declarations.variable' />
- )
+ ).
</listitem>
</orderedlist>
<section>
@@ -211,7 +211,7 @@ BooleanLiteral -> "true" | "false"]]
<programlisting><![CDATA[Document{->ASSIGN(boolVar, typeVar == Author)};]]></programlisting>
</para>
<para>
- If type variable typeVar represents annotation type Author,
+ If the type variable typeVar represents annotation type Author,
the
boolean
type expression evaluates to true, otherwise it evaluates
Modified: uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.quantifier.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.quantifier.xml?rev=1429901&r1=1429900&r2=1429901&view=diff
==============================================================================
--- uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.quantifier.xml (original)
+++ uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.quantifier.xml Mon Jan 7 17:10:54 2013
@@ -24,8 +24,8 @@
<title>* Star Greedy</title>
<para>
The Star Greedy quantifier matches on any amount of annotations and
- evaluates always true. Please mind, that a rule element with a Star
- Greedy quantifier needs to match on different annotations than the
+ evaluates always true. Please mind that a rule element with a Star
+ Greedy quantifier needs to match on different annotations as the
next rule element.
Examples:
@@ -57,7 +57,7 @@ Matched: Big]]></programlisting>
<para>
The Plus Greedy quantifier needs to match on at least one
annotation.
- Please mind, that a rule element after a rule element
+ Please mind that a rule element after a rule element
with a Plus
Greedy quantifier matches and evaluates on different
conditions.
@@ -101,8 +101,8 @@ Matched: small Big small]]></programlis
<title>?? Question Reluctant</title>
<para>
The Question Reluctant quantifier matches optionally on an
- annotation
- if the next rule element can not match on the same
+ annotation,
+ if the next rule element does not match on the same
annotation and
therefore always evaluates true.
@@ -129,11 +129,8 @@ Matched: small Big small]]></programlis
<para>
The Min Max Greedy quantifier has to match at least x and at most y
annotations of its rule element to evaluate true, but stops to
- match
- on additional annotations if the next rule element is able to
- match
- on
- this annotation.
+ match on additional annotations, if the next rule element is able to
+ match on this annotation.
Examples:
<programlisting><![CDATA[Input: 123 456 small Big Big Big small Big
Modified: uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.syntax.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.syntax.xml?rev=1429901&r1=1429900&r2=1429901&view=diff
==============================================================================
--- uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.syntax.xml (original)
+++ uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.syntax.xml Mon Jan 7 17:10:54 2013
@@ -82,7 +82,7 @@ BasicAnnotationType -> ('COLON'| 'SW' |
BlockDeclaration -> "BLOCK" "(" Identifier ")" RuleElementWithCA
"{" Statements "}"]]></programlisting>
- Syntax of statements and rule elements
+ Syntax of statements and rule elements:
<programlisting><![CDATA[SimpleStatement -> RuleElements ";"
RuleElements -> RuleElement+
RuleElement -> RuleElementType | RuleElementLiteral
@@ -105,15 +105,15 @@ Conditions -> Condition ( ",
Actions -> "->" Action ( "," Action)*
]]></programlisting>
Since each condition and each action has its own syntax, conditions
- and actions are described in their own section. (For conditions see
+ and actions are described in their own section. For conditions see
<xref linkend='ugr.tools.tm.language.conditions' />
, for actions see
- <xref linkend='ugr.tools.tm.language.actions' />
+ <xref linkend='ugr.tools.tm.language.actions' />.
The syntax of expressions is explained in
- <xref linkend='ugr.tools.tm.language.expressions' />
+ <xref linkend='ugr.tools.tm.language.expressions' />.
</para>
<para>
- Identifier
+ Identifier:
<programlisting><![CDATA[DottedIdentifier -> Identifier ("." Identifier)*
DottedIdentifier2 -> Identifier (("."|"-") Identifier)*
Identifier -> letter (letter|digit)*
Modified: uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.xml?rev=1429901&r1=1429900&r2=1429901&view=diff
==============================================================================
--- uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.xml (original)
+++ uima/sandbox/TextMarker/trunk/uima-docbook-textmarker/src/docbook/tools.textmarker.language.xml Mon Jan 7 17:10:54 2013
@@ -72,7 +72,7 @@ under the License.
the default filtering
configuration ignores whitespaces and markup.
Look at the following rule:
- <programlisting><![CDATA["Dr" PERIOD CW CW
+ <programlisting><![CDATA["Dr" PERIOD CW CW;
]]></programlisting>
Using the default
setting, this rule matches on all four lines
@@ -85,31 +85,31 @@ Dr.JoachimBaumeister
]]></programlisting>
</para>
<para>
- To change the default setting use the
+ To change the default setting, use the
<quote>FILTERTYPE</quote>
or
<quote>RETAINTYPE</quote>
action. For example if markups should no longer be ignored, try
- the following example on the above input document:
+ the following example on the above mentioned input document:
<programlisting><![CDATA[Document{->RETAINTYPE(MARKUP)};
-"Dr" PERIOD CW CW
+"Dr" PERIOD CW CW;
]]></programlisting>
You will see that the third line of the previous input example
will no longer be matched.
</para>
<para>
- To filter types try the following on the input document:
+ To filter types, try the following rules on the input document:
<programlisting><![CDATA[Document{->FILTERTYPE(PERIOD)};
-"Dr" CW CW
+"Dr" CW CW;
]]></programlisting>
- Since periods are ignored now, the rule will match on all four
+ Since periods are ignored here, the rule will match on all four
lines of the example.
</para>
<para>
Notice that using a filtered annotation type within a
- rule, prevents this rule from being executed. Try the following:
+ rule prevents this rule from being executed. Try the following:
<programlisting><![CDATA[Document{->FILTERTYPE(PERIOD)};
-"Dr" PERIOD CW CW
+"Dr" PERIOD CW CW;
]]></programlisting>
You will see that this matches on no line of the input document
since the second rule uses the filtered type PERIOD and is therefore not
@@ -237,34 +237,34 @@ BLOCK(ForEach) Sentence{} {
</para>
<para>
This construction is especially useful, if you have a set of
- rules
+ rules,
which has to be executed continuously on the same part of an input
- document. Lets assume you have already annotated your document
+ document. Let us assume that you have already annotated your document
with
Paragraph annotations. Now you want to count the number of words
- within each paragraph and if the number of words is bigger than 500
- annotate it as BigParagraph. Therefore you wrote the following
+ within each paragraph and, if the number of words exceeds 500,
+ annotate it as BigParagraph. Therefore, you wrote the following
rules:
<programlisting><![CDATA[DECLARE BigParagraph;
INT numberOfWords;
Paragraph{COUNT(W,numberOfWords)};
Paragraph{IF(numberOfWords > 500) -> MARK(BigParagraph)};
]]></programlisting>
- This will not work. The reason is that the rule which counts the
+ This will not work. The reason for this is that the rule, which counts the
number of words within a Paragraph is executed on all Paragraphs
before the last rule which marks the Paragraph as BigParagraph
is
- even executed once. Therefore when reaching the last rule in this
+ even executed once. When reaching the last rule in this
example, the variable
<literal>numberOfWords</literal>
holds the
number of words of the last Paragraph in the input
document,
- thus annotating all Paragraphs either as BigParagraph or
+ thus, annotating all Paragraphs either as BigParagraph or
not.
</para>
<para>
- To solve this, use a block to tie the
+ To solve this problem, use a block to tie the
execution of these rules
together for each Paragraph:
<programlisting><![CDATA[DECLARE BigParagraph;
@@ -276,19 +276,19 @@ BLOCK(IsBig) Paragraph{} {
]]></programlisting>
Since the scope of the Document is limited to a Paragraph within
the
- block, the rule which counts the words is only executed once
+ block, the rule, which counts the words is only executed once
before
- the second rule decides if the Paragraph is a BigParagraph.
- Of course
+ the second rule decides, if the Paragraph is a BigParagraph.
+ Of course,
this is done for every Paragraph in the whole document.
</para>
</section>
<section id="ugr.tools.tm.language.blocks.procedure">
<title>Procedures</title>
<para>
- Blocks can be used to introduce procedures into TextMarker
- language.
- To do this declare a block as before. Lets assume you want to
+ Blocks can be used to introduce procedures to the TextMarker
+ scripts.
+ To do this, declare a block as before. Let us assume, you want to
simulate a procedure
<programlisting><![CDATA[public int countAmountOfTypesInDocument(Type type){
int amount = 0;
@@ -304,9 +304,9 @@ public static void main() {
int amount = countAmountOfTypesInDocument(Paragraph);
}
]]></programlisting>
- which counts the number of the passed type wihtin the document
+ which counts the number of the passed type within the document
and
- gives back the counted number. This can be done in the following
+ returns the counted number. This can be done in the following
way:
<programlisting><![CDATA[BOOLEAN executeProcedure = false;
TYPE type;
@@ -395,7 +395,7 @@ Headline{SCORE(5,10)->LOG("Maybe a headl
rules finally execute their actions, if the score of a
headline
annotation exceeds ten points, or lies in the interval of
- five and ten
+ five to ten
points, respectively.
</para>
</section>
@@ -404,13 +404,13 @@ Headline{SCORE(5,10)->LOG("Maybe a headl
<para>
There are different actions that can modify the input document,
like DEL,
- COLOR and REPLACE. But the input document itself can not be
+ COLOR and REPLACE. However, the input document itself can not be
modified
directly. A separate engine, the Modifier.xml, has to be
called in
- order to create another cas view with the name "modified".
+ order to create another CAS view with the name "modified".
In that
- document all modifications are executed.
+ document, all modifications are executed.
</para>
<para>
The following example shows how to import and call the
@@ -427,25 +427,13 @@ Document{-> COLOR(Headline, "green")};
Document{-> EXEC(Modifier)};
]]></programlisting>
- <para>
- To get to the modified view of an input document
- <quote>file1.txt</quote>
- open the output document
- <quote>file1.txt.xmi</quote>
- .
- In editor do right-click and choose
- <quote>CAS Views →
- modified
- </quote>
- .
- </para>
</section>
<section id="ugr.tools.tm.language.external_resources">
<title>External resources</title>
<para>
- Imagine you have a set of documents containing lots of different
- first names. (As example we use a short list, containing the first
+ Imagine you have a set of documents containing many different
+ first names. (as example we use a short list, containing the first
names
<quote>Frank</quote>
,
@@ -454,28 +442,28 @@ Document{-> EXEC(Modifier)};
<quote>Jochen</quote>
and
<quote>Martin</quote>
- .)
- If you would like to annotate all of them with a
+ )
+ If you like to annotate all of them with a
<quote>FirstName</quote>
- annotation you could write a script using the rule
+ annotation, then you could write a script using the rule
<literal>("Frank" | "Peter" | "Jochen" |
"Martin"){->MARK(FirstName)};</literal>.
- This does exactly what you want. But in fact it is not very handy.
+ This does exactly what you want, but it is not very handy.
If you like to add new first names to the list of recognized first
- names you have to change the rule itself every time. Moreover writing
+ names you have to change the rule itself every time. Moreover, writing
rules with possibly hundreds of first names
- is not really practically realizable and definitely not efficient if you have
+ is not really practically realizable and definitely not efficient, if you have
the list of first names already as a simple text file. Using this text file directly
- would much reduce the effort.
+ would reduce the effort.
</para>
<para>
- Therefore TextMarker provides two kinds of external resources to
+ TextMarker provides, therefore, two kinds of external resources to
solve such tasks more easily: WORDLISTs and WORDTABLEs.
</para>
<section>
<title>WORDLISTs</title>
<para>
- A WORDLIST is simply a list of text items. There are three
+ A WORDLIST is a list of text items. There are three
different possibilities of how to provide a WORDLIST to the TextMarker system.
</para>
<para>
@@ -493,9 +481,9 @@ Martin
by using
<literal>Document{->MARKFAST(FirstName, "FirstNames.txt")};</literal>, assuming
an already declared type FirstName. To make this rule
- recognizing more first names just add
+ recognize more first names, add
them to the external list.
- You could also use a WORLIST variable to do the same thing like this:
+ You could also use a WORDLIST variable to do the same thing as follows:
<programlisting><![CDATA[WORDLIST FirstNameList = "FirstNames.txt";
DECLARE FirstName;
Document{->MARKFAST(FirstName, FirstNameList)};
@@ -548,7 +536,7 @@ Document{->TRIE("FirstNames.txt" = FistN
WORDLISTS have been used to annotate all occurrences of any list
item in a document with a certain type. Imagine now that each annotation
has features that should be filled with values dependent on the list item
- that matched. This can be achieved with WORDTABLEs. For example, lets
+ that matched. This can be achieved with WORDTABLEs. For example, let us
assume we want to annotate all US presidents within a document.
Moreover each annotation should contain the party of the president as well as the
year of his inauguration. Therefore we use an annotation type