You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@uima.apache.org by pk...@apache.org on 2012/05/29 19:30:12 UTC
svn commit: r1343865 [2/2] - /uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.xml

Modified: uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.xml?rev=1343865&r1=1343864&r2=1343865&view=diff
==============================================================================
--- uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.xml (original)
+++ uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.xml Tue May 29 17:30:10 2012
@@ -25,488 +25,488 @@ under the License.
 -->
 
 <chapter id="ugr.tools.tm">
-	<title>TextMarker User&apos;s Guide</title>
-	<titleabbrev>TextMarker User&apos;s Guide</titleabbrev>
+    <title>TextMarker User&apos;s Guide</title>
+    <titleabbrev>TextMarker User&apos;s Guide</titleabbrev>
 
-	<section id="ugr.tools.tm.introduction">
-		<title>TextMarker</title>
-		<para>The TextMarker system is an open source tool
-			for the development
-			of rule-based information extraction applications.
-			The development
-			environment is based on the DLTK framework. It
-			supports the knowledge
-			engineer with a full-featured rule editor,
-			components for the
-			explanation of the rule inference and a build
-			process for generic UIMA
-			Analysis Engines and Type Systems.
-			Therefore TextMarker components can
-			be easily created and combined
-			with other UIMA components in different
-			information extraction
-			pipelines rather flexibly.
-
-			TextMarker applies a
-			specialized rule representation language for the effective
-			knowledge
-			formalization:
-			The rules of the TextMarker language are composed of a
-			list of rule
-			elements that themselves consists of four parts: The
-			mandatory
-			matching condition establishes a connection to the input
-			document by
-			referring to an already existing concept, respectively
-			annotation.
-			The
-			optional quantifier defines the usage of the matching
-			condition
-			similar to regular expressions. Then, additional conditions
-			add
-			constraints to the matched text fragment and additional actions
-			determine the consequences of the rule. Therefore, TextMarker rules
-			match on a pattern of given annotations and, if the additional
-			conditions evaluate true, then they execute their actions, e.g.
-			create
-			a new annotation. If no initial annotations exist, for example,
-			created by another component, a scanner is used to seed simple token
-			annotations contained in a taxonomy.
-
-			The TextMarker system provides
-			unique functionality that is usually not
-			found in similar systems. The
-			actions are able to modify the document
-			either by replacing or
-			deleting text fragments or by filtering the
-			view on the document. In
-			this case, the rules ignore some
-			annotations,
-			e.g. HTML markup, or are
-			executed only on the remaining text passages.
-			The knowledge engineer
-			is able to add heuristic knowledge by using
-			scoring rules.
-			Additionally, several language elements common to
-			scripting languages
-			like conditioned statements, loops, procedures,
-			recursion, variables
-			and expressions increase the expressiveness of
-			the language. Rules are
-			able to directly invoke external rule sets or
-			arbitrary UIMA Analysis
-			Engines and foreign libraries can be
-			integrated with the extension
-			mechanism for new language elements.
-
-		</para>
-		<section id="ugr.tools.tm.introduction.metaphor">
-			<title>Introduction</title>
-			<para>
-				In manual information extraction humans often apply a strategy
-				according to a highlighter metaphor: First relevant headlines are
-				considered and classified according to their content by coloring
-				them
-				with different highlighters. The paragraphs of the annotated
-				headlines
-				are then considered further. Relevant text fragments or
-				single words
-				in the context of that headline can then be colored. In
-				this way, a
-				top-down analysis and extraction strategy is implemented.
-				Necessary
-				additional information can then be added that either refers
-				to other
-				text segments or contains valuable domain specific
-				information.
-				Finally the colored text can be easily analyzed
-				concerning the
-				relevant information.
-
-				The TextMarker system (textmarker
-				is a common german word for a
-				highlighter) tries to imitate this
-				manual extraction method by
-				formalizing the appropriate actions using
-				matching rules: The rules
-				mark sequences of words, extract text
-				segments or modify the input
-				document depending on textual
-				features.The default input for the
-				TextMarker system is
-				semi-structured text, but it can also process
-				structured or free
-				text. Technically, HTML is often the input
-				format,
-				since most word
-				processing documents can be converted to HTML.
-				Additionally, the
-				TextMarker systems offers the possibility to
-				create
-				a modified output
-				document.
-			</para>
-		</section>
-		<section id="ugr.tools.tm.introduction.concepts">
-			<title>Core Concepts</title>
-			<para>
-				As a first step in the extraction process the TextMarker system uses
-				a
-				tokenizer (scanner) to tokenize the input document and to create a
-				stream of basic symbols. The types and valid annotations of the
-				possible tokens are predefined by a taxonomy of annotation types.
-				Annotations simply refer to a section of the input document and
-				assign a type or concept to the respective text fragment. The figure
-				on the right shows an excerpt of a basic annotation taxonomy: CW
-				describes all tokens, for example, that contains a single word
-				starting with a capital letter, MARKUP corresponds to HTML or XML
-				tags, and PM refers to all kinds of punctuations marks. Take a look
-				at [basic annotations|BasicAnnotationList] for a complete list of
-				initial annotations.
-
-
-				<screenshot>
-					<mediaobject>
-						<imageobject>
-							<imagedata scale="80" format="PNG" fileref="&imgroot;symboltaxo.png" />
-						</imageobject>
-						<textobject>
-							<phrase>Part of a taxonomy for basic annotation types.</phrase>
-						</textobject>
-					</mediaobject>
-				</screenshot>
-
-				By using (and extending) the taxonomy, the knowledge engineer is
-				able
-				to choose the most adequate types and concepts when defining new
-				matching rules, i.e., TextMarker rules for matching a text fragment
-				given by a set of symbols to an annotation. If the capitalization of
-				a word, for example, is of no importance, then the annotation type W
-				that describes words of any kind can be used. The initial scanner
-				creates a set of basic annotations that may be used by the matching
-				rules of the TextMarker language. However, most information
-				extraction applications require domain specific concepts and
-				annotations. Therefore, the knowledge engineer is able to extend the
-				set of annotations, and to define new annotation types tuned to the
-				requirements of the given domain. These types can be flexibly
-				integrated in the taxonomy of annotation types.
-
-				One of the goals in
-				developing a new information extraction language
-				was
-				to maintain an
-				easily readable syntax while still providing a
-				scalable
-				expressiveness of the language. Basically, the TextMarker
-				language
-				contains expressions for the definition of new annotation
-				types and
-				for defining new matching rules. The rules are defined by a
-				list of
-				rule elements.
-				Each rule element contains at least a basic matching
-				condition referring
-				to text fragments or already specified
-				annotations. Additionally a
-				list of conditions and actions may be
-				specified for a rule element.
-				Whereas the conditions describe
-				necessary attributes of the matched
-				text fragment, the actions point
-				to operations and assignments on
-				the
-				current fragments. These actions
-				will then only be executed if all
-				basic conditions matched on a text
-				fragment or the annotation and the
-				related conditions are fulfilled.
-			</para>
-		</section>
-		<section id="ugr.tools.tm.introduction.examples">
-			<title>Examples</title>
-			<para>
-				The usage of the language and its readability can be demonstrated by
-				simple examples:
-
-				<programlisting><![CDATA[
-					CW{INLIST('animals.txt') -> MARK(Animal)};
-					Animal "and" Animal{-> MARK(Animalpair, 1, 2, 3)};
+    <section id="ugr.tools.tm.introduction">
+        <title>TextMarker</title>
+        <para>The TextMarker system is an open source tool
+            for the development
+            of rule-based information extraction applications.
+            The development
+            environment is based on the DLTK framework. It
+            supports the knowledge
+            engineer with a full-featured rule editor,
+            components for the
+            explanation of the rule inference and a build
+            process for generic UIMA
+            Analysis Engines and Type Systems.
+            Therefore TextMarker components can
+            be easily created and combined
+            with other UIMA components in different
+            information extraction
+            pipelines rather flexibly.
+
+            TextMarker applies a
+            specialized rule representation language for the effective
+            knowledge
+            formalization:
+            The rules of the TextMarker language are composed of a
+            list of rule
+            elements that themselves consists of four parts: The
+            mandatory
+            matching condition establishes a connection to the input
+            document by
+            referring to an already existing concept, respectively
+            annotation.
+            The
+            optional quantifier defines the usage of the matching
+            condition
+            similar to regular expressions. Then, additional conditions
+            add
+            constraints to the matched text fragment and additional actions
+            determine the consequences of the rule. Therefore, TextMarker rules
+            match on a pattern of given annotations and, if the additional
+            conditions evaluate true, then they execute their actions, e.g.
+            create
+            a new annotation. If no initial annotations exist, for example,
+            created by another component, a scanner is used to seed simple token
+            annotations contained in a taxonomy.
+
+            The TextMarker system provides
+            unique functionality that is usually not
+            found in similar systems. The
+            actions are able to modify the document
+            either by replacing or
+            deleting text fragments or by filtering the
+            view on the document. In
+            this case, the rules ignore some
+            annotations,
+            e.g. HTML markup, or are
+            executed only on the remaining text passages.
+            The knowledge engineer
+            is able to add heuristic knowledge by using
+            scoring rules.
+            Additionally, several language elements common to
+            scripting languages
+            like conditioned statements, loops, procedures,
+            recursion, variables
+            and expressions increase the expressiveness of
+            the language. Rules are
+            able to directly invoke external rule sets or
+            arbitrary UIMA Analysis
+            Engines and foreign libraries can be
+            integrated with the extension
+            mechanism for new language elements.
+
+        </para>
+        <section id="ugr.tools.tm.introduction.metaphor">
+            <title>Introduction</title>
+            <para>
+                In manual information extraction humans often apply a strategy
+                according to a highlighter metaphor: First relevant headlines are
+                considered and classified according to their content by coloring
+                them
+                with different highlighters. The paragraphs of the annotated
+                headlines
+                are then considered further. Relevant text fragments or
+                single words
+                in the context of that headline can then be colored. In
+                this way, a
+                top-down analysis and extraction strategy is implemented.
+                Necessary
+                additional information can then be added that either refers
+                to other
+                text segments or contains valuable domain specific
+                information.
+                Finally the colored text can be easily analyzed
+                concerning the
+                relevant information.
+
+                The TextMarker system (textmarker
+                is a common german word for a
+                highlighter) tries to imitate this
+                manual extraction method by
+                formalizing the appropriate actions using
+                matching rules: The rules
+                mark sequences of words, extract text
+                segments or modify the input
+                document depending on textual
+                features.The default input for the
+                TextMarker system is
+                semi-structured text, but it can also process
+                structured or free
+                text. Technically, HTML is often the input
+                format,
+                since most word
+                processing documents can be converted to HTML.
+                Additionally, the
+                TextMarker systems offers the possibility to
+                create
+                a modified output
+                document.
+            </para>
+        </section>
+        <section id="ugr.tools.tm.introduction.concepts">
+            <title>Core Concepts</title>
+            <para>
+                As a first step in the extraction process the TextMarker system uses
+                a
+                tokenizer (scanner) to tokenize the input document and to create a
+                stream of basic symbols. The types and valid annotations of the
+                possible tokens are predefined by a taxonomy of annotation types.
+                Annotations simply refer to a section of the input document and
+                assign a type or concept to the respective text fragment. The figure
+                on the right shows an excerpt of a basic annotation taxonomy: CW
+                describes all tokens, for example, that contains a single word
+                starting with a capital letter, MARKUP corresponds to HTML or XML
+                tags, and PM refers to all kinds of punctuations marks. Take a look
+                at [basic annotations|BasicAnnotationList] for a complete list of
+                initial annotations.
+
+
+                <screenshot>
+                    <mediaobject>
+                        <imageobject>
+                            <imagedata scale="80" format="PNG" fileref="&imgroot;symboltaxo.png" />
+                        </imageobject>
+                        <textobject>
+                            <phrase>Part of a taxonomy for basic annotation types.</phrase>
+                        </textobject>
+                    </mediaobject>
+                </screenshot>
+
+                By using (and extending) the taxonomy, the knowledge engineer is
+                able
+                to choose the most adequate types and concepts when defining new
+                matching rules, i.e., TextMarker rules for matching a text fragment
+                given by a set of symbols to an annotation. If the capitalization of
+                a word, for example, is of no importance, then the annotation type W
+                that describes words of any kind can be used. The initial scanner
+                creates a set of basic annotations that may be used by the matching
+                rules of the TextMarker language. However, most information
+                extraction applications require domain specific concepts and
+                annotations. Therefore, the knowledge engineer is able to extend the
+                set of annotations, and to define new annotation types tuned to the
+                requirements of the given domain. These types can be flexibly
+                integrated in the taxonomy of annotation types.
+
+                One of the goals in
+                developing a new information extraction language
+                was
+                to maintain an
+                easily readable syntax while still providing a
+                scalable
+                expressiveness of the language. Basically, the TextMarker
+                language
+                contains expressions for the definition of new annotation
+                types and
+                for defining new matching rules. The rules are defined by a
+                list of
+                rule elements.
+                Each rule element contains at least a basic matching
+                condition referring
+                to text fragments or already specified
+                annotations. Additionally a
+                list of conditions and actions may be
+                specified for a rule element.
+                Whereas the conditions describe
+                necessary attributes of the matched
+                text fragment, the actions point
+                to operations and assignments on
+                the
+                current fragments. These actions
+                will then only be executed if all
+                basic conditions matched on a text
+                fragment or the annotation and the
+                related conditions are fulfilled.
+            </para>
+        </section>
+        <section id="ugr.tools.tm.introduction.examples">
+            <title>Examples</title>
+            <para>
+                The usage of the language and its readability can be demonstrated by
+                simple examples:
+
+                <programlisting><![CDATA[
+                    CW{INLIST('animals.txt') -> MARK(Animal)};
+                    Animal "and" Animal{-> MARK(Animalpair, 1, 2, 3)};
         ]]></programlisting>
 
-				The first rule looks at all capitalized words that are listed in an
-				external document animals.txt and creates a new annotation of the
-				type
-				animal using the boundaries of the matched word. The second rule
-				searches for an annotation of the type animal followed by the
-				literal
-				and and a second animal annotation. Then it will create a new
-				annotation animalpair covering the text segment that matched the
-				three
-				rule elements (the digit parameters refer to the number of
-				matched
-				rule element).
-
-				<programlisting><![CDATA[
-					Document{-> MARKFAST(Firstname, 'firstnames.txt')};
-					Firstname CW{-> MARK(Lastname)};
-					Paragraph{VOTE(Firstname, Lastname) -> LOG("Found more Firstnames than Lastnames")};
-      	]]></programlisting>
-
-				In this example, the first rule annotates all words that occur in
-				the
-				external document firstnames.txt with the type firstname. The
-				second
-				rule creates a lastname annotation for all capitalized word
-				that
-				follow a firstname annotation. The last rule finally processes
-				all
-				paragraph} annotations. If the VOTE condition counts more
-				firstname
-				than lastname annotations, then the rule writes a log entry
-				with a
-				predefined message.
-
-
-				<programlisting><![CDATA[
-					ANY+{PARTOF(Paragraph), CONTAINS(Delete, 50, 100, true) -> MARK(Delete)};
-					Firstname{-> MARK(Delete,1 , 2)} Lastname;
-					Delete{-> DEL};
-				]]></programlisting>
-
-				Here, the first rule looks for sequences of any kind of tokens
-				except
-				markup and creates one annotation of the type delete for each
-				sequence, if the tokens are part of a paragraph annotation and
-				contains together already more than 50% of delete annoations. The +
-				signs indicate this greedy processing. The second rule annotates
-				first
-				names followed by last names with the type delete and the third
-				rule
-				simply deletes all text segments that are associated with that
-				delete
-				annotation.
-
-			</para>
-		</section>
-		<section id="ugr.tools.tm.introduction.features">
-			<title>Special Features</title>
-			<para>
-				The TextMarker language features some special characteristics
-				that are
-				usually not found in other rule-based information extraction
-				systems
-				or even shift it towards scripting languages. The possibility
-				of
-				creating new annotation types and integrating them into the
-				taxonomy
-				facilitates an even more modular development of information
-				extraction systems.
-
-				Read more about robust extraction using
-				filtering, complex control
-				structures and heuristic extraction using
-				scoring rules.
-			</para>
-		</section>
-		<section id="ugr.tools.tm.introduction.getstarted">
-			<title>Get started</title>
-			<para>
-				This section page gives you a short, technical introduction on
-				how to
-				get
-				started with TextMarker system and mostly just links the
-				information
-				of the other wiki pages. Some knowledge about the usage
-				of Eclipse and
-				central concepts of UIMA are useful. TextMarker
-				consists of the
-				TextMarker rule language (and of course the rule
-				inference) and the
-				TextMarker workbench. Additionally, the CEV plugin
-				is used to edit
-				and
-				visualize annotated text. The TextRuler system
-				with implementations of
-				well known rule learning methods and
-				development extension with
-				support for test-driven development are
-				already integrated.
-			</para>
-			<section id="ugr.tools.tm.introduction.getstarted.running">
-				<title>Up and running</title>
-				<para>
-					First of all, install the Workbench and read the introduction
-					and its
-					examples. In order to verify if the Workbench is correctly
-					installed,
-					take a look at Help-About Eclipse-Installation Details
-					and compare
-					the installed plugins with the plugins you copied into
-					the plugins
-					folder of your Eclipse application. Normally most of the
-					plugins do
-					not cause any troubles, but the CEV does because of the
-					XPCom and
-					XULRunner dependencies. You should at least get the XPCom
-					plugin up
-					and running. However, you cannot use the additional HTML
-					functionality without the XULRunner plugin. If the plugins of the
-					installation guide do not work properly and a google search for a
-					suiteable plugin is not successful, then write a mail to the user
-					list and we will try to solve the problem. If all plugins are
-					correctly installed, then start the Eclipse application and switch
-					to
-					the TextMarker perspective (Window-Open Perspective-Other...)
-				</para>
-			</section>
-			<section id="ugr.tools.tm.introduction.getstarted.example">
-				<title>Learn by example</title>
-				<para>
-					Having a running Workbench download the example project and
-					import/copy
-					this TextMarker project into your workspace. The project
-					contains
-					some simple rules for extraction the author, title and year
-					of
-					reference strings. Next, take a look at the project structure and
-					the
-					syntax and compare it with the example project and its contents.
-					Open
-					the Main.tm TextMarker script in the folder
-					script/de.uniwue.example
-					and press the Run button in the Eclipse
-					toolbar. The docments in
-					the
-					input folder will then be processed by
-					the Main.tm file and the
-					result of the information extraction task
-					is placed in the output
-					folder. As you can see, there are four
-					files: an xmiCAS for each
-					input file and a HTML file (the
-					modifed/colored result). Open one of
-					the .xmi files with the CAS
-					Editor plugin (-popup menu-Open with) and
-					select some checkboxes in
-					the Annotation Browser view.
-				</para>
-			</section>
-			<section id="ugr.tools.tm.introduction.getstarted.doit">
-				<title>Do it yourself</title>
-				<para>
-					Try to write some rules yourself. Read the description of the
-					available
-					language constructs, e.g., conditions and actions and use
-					the
-					explanation component in order to take a closer look at the rule
-					inference. Then finally, read the rest of this document.
-				</para>
-			</section>
-		</section>
-	</section>
-	<section id="ugr.tools.tm.language">
-		<title>TextMarker Language</title>
-		<para>
-
-		</para>
-
-		<section id="ugr.tools.tm.seeding">
-			<title>Basic Annotations and tokens</title>
-			<para>
-				The TextMarker system uses a JFlex lexer to initially create a
-				seed of
-				basic, token annotations.
-			</para>
-		</section>
-		<section id="ugr.tools.tm.syntax">
-			<title>Syntax</title>
-			<para>
-				Structure
-				<programlisting><![CDATA[<![CDATA[
-			script                 -> packageDeclaration globalStatements statements
-			packageDeclaration     -> "PACKAGE" DottedIdentifier ";"
-			globalStatments        -> globalStatment*   
-			globalStatment         -> ("TYPESYSTEM" | "SCRIPT" | "ENGINE") DottedIdentifier ";"
-			statements             -> statement*
-			statement              -> typeDeclaration | resourceDeclaration | variableDeclaration 
-			                          | blockDeclaration | simpleStatement
-			]]></programlisting>
-
-			Declarations
-			<programlisting><![CDATA[
-			typeDeclaration        -> "DECLARE" (AnnotationType)? Identifier ("," Identifier )*
-			                          | "DECLARE" AnnotationType Identifier ( "(" featureDeclaration ")" )?
-			featureDeclaration     -> ( (AnnotationType | "STRING" | "INT" | "DOUBLE" | "BOOLEAN") Identifier)+
-			resourceDeclaration    -> ("WORDLIST" Identifier = listExpression | "WORDTABLE" Identifier = tableExpression)  ";"
-			variableDeclaration    -> ("TYPE" | "STRING" | "INT" | "DOUBLE" | "BOOLEAN") Identifier ";"
-			]]></programlisting>
-				More information about Declarations.
-
-				Statements
-				<programlisting><![CDATA[
-			blockDeclaration       -> "BLOCK" "(" Identifier ")" ruleElementWithType "{" statements "}"
-			simpleStatement        -> ruleElements ";"
-			ruleElements           -> ( ruleElementWithLiteral  | ruleElementWithType )+
-			ruleElementWithLiteral -> simpleStringExpression quantifierPart? conditionActionPart?
-			ruleElementWithType    -> typeExpression quantifierPart? conditionActionPart?
-			quantifierPart         -> "*" | "*?" | "+" | "+?" | "?" | "??" 
-			                          | "[" numberExpression "," numberExpression "]"
-			                          | "[" numberExpression "," numberExpression "]?"
-			                          
-			conditionActionPart    -> "{" (condition ( "," condition )*)? ( "->" (action( "," action)*))? "}"        
-			condition              -> ConditionName ("(" argument ("," argument)* ")")?
-			action                 -> ActionName ("(" argument ("," argument)* ")")?
-			]]></programlisting>
-				More information about Quantifiers,
-				Conditions, Actions and Blocks.
-				The ruleElementWithType of a BLOCK declaration must have opening
-				and
-				closing curly brackets (e.g., BLOCK(name) Document{} {...})
-
-				Expressions
-				<programlisting><![CDATA[
-			argument                   -> typeExpression | numberExpression | stringExpression | booleanExpression
-			typeExpression             -> AnnotationType | TypeVariable
-			numberExpression           -> additiveExpression
-			additiveExpression         -> multiplicativeExpression
-			multiplicativeExpression   -> simpleNumberExpression ( ( "*" | "/" | "%" ) simpleNumberExpression )*
-			                              | ( "EXP" | "LOGN" | "SIN" | "COS" | "TAN" ) numberExpressionInPar
-			numberExpressionInPar      -> "(" additiveExpression ")"
-			simpleNumberExpression     -> "-"? ( DecimalLiteral | FloatingPointLiteral | NumberVariable)
-			                              | numberExpressionInPar      
-			stringExpression           -> simpleStringExpression ( "+" simpleSEOrNE )*                   
-			simpleStringExpression     -> StringLiteral | StringVariable
-			simpleSEOrNE               -> simpleStringExpression | numberExpressionInPar
-			booleanExpression          -> booleanNumberExpression | BooleanVariable | BooleanLiteral
-			booleanNumberExpression    -> "(" numberExpression ( "<" | "<=" | ">" | ">=" | "==" | "!=" ) numberExpression ")"
-			listExpression             -> Identifier | ResourceLiteral
-			tableExpression            -> Identifier | ResourceLiteral
-			]]></programlisting>
-				More information about Expressions. A ResourceLiteral
-				is something
-				like 'folder/file.txt' (yes, with single quotes).
-			</para>
-		</section>
-		<section id="ugr.tools.tm.inference">
-			<title>Syntax</title>
-			<para>
-				The inference relies on a complete, disjunctive partition of the
-				document. A basic (minimal) annotation for each element of the
-				partition is assigned to a type of a hierarchy. These basic
-				annotations are enriched for performance reasons with information
-				about annotations that start at the same offset or overlap with the
-				basic annotation. Normally, a scanner creates a basic annotation for
-				each token, punctuation or whitespace, but can also be replaced with
-				a different annotation seeding strategy. Unlike other rule-based
-				information extraction language, the rules are executed in an
-				imperative way. Experience has shown that the dependencies between
-				rules, e.g., the same annotation types in the action and in the
-				condition of a different rule, often form tree-like and not
-				graph-like structures. Therefore, the sequencing and imperative
-				processing did not cause disadvantages, but instead obvious
-				advantages, e.g., the improved understandability of large rule sets.
-				The following algorithm summarizes the rule inference:
-				<programlisting><![CDATA[
+                The first rule looks at all capitalized words that are listed in an
+                external document animals.txt and creates a new annotation of the
+                type
+                animal using the boundaries of the matched word. The second rule
+                searches for an annotation of the type animal followed by the
+                literal
+                and and a second animal annotation. Then it will create a new
+                annotation animalpair covering the text segment that matched the
+                three
+                rule elements (the digit parameters refer to the number of
+                matched
+                rule element).
+
+                <programlisting><![CDATA[
+                    Document{-> MARKFAST(Firstname, 'firstnames.txt')};
+                    Firstname CW{-> MARK(Lastname)};
+                    Paragraph{VOTE(Firstname, Lastname) -> LOG("Found more Firstnames than Lastnames")};
+        ]]></programlisting>
+
+                In this example, the first rule annotates all words that occur in
+                the
+                external document firstnames.txt with the type firstname. The
+                second
+                rule creates a lastname annotation for all capitalized word
+                that
+                follow a firstname annotation. The last rule finally processes
+                all
+                paragraph} annotations. If the VOTE condition counts more
+                firstname
+                than lastname annotations, then the rule writes a log entry
+                with a
+                predefined message.
+
+
+                <programlisting><![CDATA[
+                    ANY+{PARTOF(Paragraph), CONTAINS(Delete, 50, 100, true) -> MARK(Delete)};
+                    Firstname{-> MARK(Delete,1 , 2)} Lastname;
+                    Delete{-> DEL};
+                ]]></programlisting>
+
+                Here, the first rule looks for sequences of any kind of tokens
+                except
+                markup and creates one annotation of the type delete for each
+                sequence, if the tokens are part of a paragraph annotation and
+                contains together already more than 50% of delete annoations. The +
+                signs indicate this greedy processing. The second rule annotates
+                first
+                names followed by last names with the type delete and the third
+                rule
+                simply deletes all text segments that are associated with that
+                delete
+                annotation.
+
+            </para>
+        </section>
+        <section id="ugr.tools.tm.introduction.features">
+            <title>Special Features</title>
+            <para>
+                The TextMarker language features some special characteristics
+                that are
+                usually not found in other rule-based information extraction
+                systems
+                or even shift it towards scripting languages. The possibility
+                of
+                creating new annotation types and integrating them into the
+                taxonomy
+                facilitates an even more modular development of information
+                extraction systems.
+
+                Read more about robust extraction using
+                filtering, complex control
+                structures and heuristic extraction using
+                scoring rules.
+            </para>
+        </section>
+        <section id="ugr.tools.tm.introduction.getstarted">
+            <title>Get started</title>
+            <para>
+                This section page gives you a short, technical introduction on
+                how to
+                get
+                started with TextMarker system and mostly just links the
+                information
+                of the other wiki pages. Some knowledge about the usage
+                of Eclipse and
+                central concepts of UIMA are useful. TextMarker
+                consists of the
+                TextMarker rule language (and of course the rule
+                inference) and the
+                TextMarker workbench. Additionally, the CEV plugin
+                is used to edit
+                and
+                visualize annotated text. The TextRuler system
+                with implementations of
+                well known rule learning methods and
+                development extension with
+                support for test-driven development are
+                already integrated.
+            </para>
+            <section id="ugr.tools.tm.introduction.getstarted.running">
+                <title>Up and running</title>
+                <para>
+                    First of all, install the Workbench and read the introduction
+                    and its
+                    examples. In order to verify if the Workbench is correctly
+                    installed,
+                    take a look at Help-About Eclipse-Installation Details
+                    and compare
+                    the installed plugins with the plugins you copied into
+                    the plugins
+                    folder of your Eclipse application. Normally most of the
+                    plugins do
+                    not cause any troubles, but the CEV does because of the
+                    XPCom and
+                    XULRunner dependencies. You should at least get the XPCom
+                    plugin up
+                    and running. However, you cannot use the additional HTML
+                    functionality without the XULRunner plugin. If the plugins of the
+                    installation guide do not work properly and a google search for a
+                    suiteable plugin is not successful, then write a mail to the user
+                    list and we will try to solve the problem. If all plugins are
+                    correctly installed, then start the Eclipse application and switch
+                    to
+                    the TextMarker perspective (Window-Open Perspective-Other...)
+                </para>
+            </section>
+            <section id="ugr.tools.tm.introduction.getstarted.example">
+                <title>Learn by example</title>
+                <para>
+                    Having a running Workbench download the example project and
+                    import/copy
+                    this TextMarker project into your workspace. The project
+                    contains
+                    some simple rules for extraction the author, title and year
+                    of
+                    reference strings. Next, take a look at the project structure and
+                    the
+                    syntax and compare it with the example project and its contents.
+                    Open
+                    the Main.tm TextMarker script in the folder
+                    script/de.uniwue.example
+                    and press the Run button in the Eclipse
+                    toolbar. The docments in
+                    the
+                    input folder will then be processed by
+                    the Main.tm file and the
+                    result of the information extraction task
+                    is placed in the output
+                    folder. As you can see, there are four
+                    files: an xmiCAS for each
+                    input file and a HTML file (the
+                    modifed/colored result). Open one of
+                    the .xmi files with the CAS
+                    Editor plugin (-popup menu-Open with) and
+                    select some checkboxes in
+                    the Annotation Browser view.
+                </para>
+            </section>
+            <section id="ugr.tools.tm.introduction.getstarted.doit">
+                <title>Do it yourself</title>
+                <para>
+                    Try to write some rules yourself. Read the description of the
+                    available
+                    language constructs, e.g., conditions and actions and use
+                    the
+                    explanation component in order to take a closer look at the rule
+                    inference. Then finally, read the rest of this document.
+                </para>
+            </section>
+        </section>
+    </section>
+    <section id="ugr.tools.tm.language">
+        <title>TextMarker Language</title>
+        <para>
+
+        </para>
+
+        <section id="ugr.tools.tm.seeding">
+            <title>Basic Annotations and tokens</title>
+            <para>
+                The TextMarker system uses a JFlex lexer to initially create a
+                seed of
+                basic, token annotations.
+            </para>
+        </section>
+        <section id="ugr.tools.tm.syntax">
+            <title>Syntax</title>
+            <para>
+                Structure
+                <programlisting><![CDATA[<![CDATA[
+            script                 -> packageDeclaration globalStatements statements
+            packageDeclaration     -> "PACKAGE" DottedIdentifier ";"
+            globalStatments        -> globalStatment*   
+            globalStatment         -> ("TYPESYSTEM" | "SCRIPT" | "ENGINE") DottedIdentifier ";"
+            statements             -> statement*
+            statement              -> typeDeclaration | resourceDeclaration | variableDeclaration 
+                                      | blockDeclaration | simpleStatement
+            ]]></programlisting>
+
+            Declarations
+            <programlisting><![CDATA[
+            typeDeclaration        -> "DECLARE" (AnnotationType)? Identifier ("," Identifier )*
+                                      | "DECLARE" AnnotationType Identifier ( "(" featureDeclaration ")" )?
+            featureDeclaration     -> ( (AnnotationType | "STRING" | "INT" | "DOUBLE" | "BOOLEAN") Identifier)+
+            resourceDeclaration    -> ("WORDLIST" Identifier = listExpression | "WORDTABLE" Identifier = tableExpression)  ";"
+            variableDeclaration    -> ("TYPE" | "STRING" | "INT" | "DOUBLE" | "BOOLEAN") Identifier ";"
+            ]]></programlisting>
+                More information about Declarations.
+
+                Statements
+                <programlisting><![CDATA[
+            blockDeclaration       -> "BLOCK" "(" Identifier ")" ruleElementWithType "{" statements "}"
+            simpleStatement        -> ruleElements ";"
+            ruleElements           -> ( ruleElementWithLiteral  | ruleElementWithType )+
+            ruleElementWithLiteral -> simpleStringExpression quantifierPart? conditionActionPart?
+            ruleElementWithType    -> typeExpression quantifierPart? conditionActionPart?
+            quantifierPart         -> "*" | "*?" | "+" | "+?" | "?" | "??" 
+                                      | "[" numberExpression "," numberExpression "]"
+                                      | "[" numberExpression "," numberExpression "]?"
+                                      
+            conditionActionPart    -> "{" (condition ( "," condition )*)? ( "->" (action( "," action)*))? "}"        
+            condition              -> ConditionName ("(" argument ("," argument)* ")")?
+            action                 -> ActionName ("(" argument ("," argument)* ")")?
+            ]]></programlisting>
+                More information about Quantifiers,
+                Conditions, Actions and Blocks.
+                The ruleElementWithType of a BLOCK declaration must have opening
+                and
+                closing curly brackets (e.g., BLOCK(name) Document{} {...})
+
+                Expressions
+                <programlisting><![CDATA[
+            argument                   -> typeExpression | numberExpression | stringExpression | booleanExpression
+            typeExpression             -> AnnotationType | TypeVariable
+            numberExpression           -> additiveExpression
+            additiveExpression         -> multiplicativeExpression
+            multiplicativeExpression   -> simpleNumberExpression ( ( "*" | "/" | "%" ) simpleNumberExpression )*
+                                          | ( "EXP" | "LOGN" | "SIN" | "COS" | "TAN" ) numberExpressionInPar
+            numberExpressionInPar      -> "(" additiveExpression ")"
+            simpleNumberExpression     -> "-"? ( DecimalLiteral | FloatingPointLiteral | NumberVariable)
+                                          | numberExpressionInPar      
+            stringExpression           -> simpleStringExpression ( "+" simpleSEOrNE )*                   
+            simpleStringExpression     -> StringLiteral | StringVariable
+            simpleSEOrNE               -> simpleStringExpression | numberExpressionInPar
+            booleanExpression          -> booleanNumberExpression | BooleanVariable | BooleanLiteral
+            booleanNumberExpression    -> "(" numberExpression ( "<" | "<=" | ">" | ">=" | "==" | "!=" ) numberExpression ")"
+            listExpression             -> Identifier | ResourceLiteral
+            tableExpression            -> Identifier | ResourceLiteral
+            ]]></programlisting>
+                More information about Expressions. A ResourceLiteral
+                is something
+                like 'folder/file.txt' (yes, with single quotes).
+            </para>
+        </section>
+        <section id="ugr.tools.tm.inference">
+            <title>Syntax</title>
+            <para>
+                The inference relies on a complete, disjunctive partition of the
+                document. A basic (minimal) annotation for each element of the
+                partition is assigned to a type of a hierarchy. These basic
+                annotations are enriched for performance reasons with information
+                about annotations that start at the same offset or overlap with the
+                basic annotation. Normally, a scanner creates a basic annotation for
+                each token, punctuation or whitespace, but can also be replaced with
+                a different annotation seeding strategy. Unlike other rule-based
+                information extraction language, the rules are executed in an
+                imperative way. Experience has shown that the dependencies between
+                rules, e.g., the same annotation types in the action and in the
+                condition of a different rule, often form tree-like and not
+                graph-like structures. Therefore, the sequencing and imperative
+                processing did not cause disadvantages, but instead obvious
+                advantages, e.g., the improved understandability of large rule sets.
+                The following algorithm summarizes the rule inference:
+                <programlisting><![CDATA[
 collect all basic annotations that fulfill the first matching condition
   for all collected basic annotations do
     for all rule elements of current rule do
@@ -526,1765 +526,1765 @@ collect all basic annotations that fulfi
   if all rule elements matched then
     execute the actions of all rule elements
 ]]></programlisting>
-				The rule elements can of course match on all kinds of annotations.
-				Therefore the determination of the next basic annotation returns the
-				first basic annotation after the last basic annotation of the
-				complete, matched annotation.
-
-			</para>
-		</section>
-		<section id="ugr.tools.tm.declarations">
-			<title>Declarations</title>
-			<para>
-
-				There are three different kinds declaration in the TextMarker
-				system:
-				Declarations of types with optional feature definitions of
-				that type,
-				declaration of variables and declarations for importing
-				external
-				resources, scripts of UIMA components.
-			</para>
-			<section id="ugr.tools.tm.declarations.type">
-				<title>Type</title>
-				<para>
-					Type declarations define new kinds of annotations types and
-					optionally its features.
-
-					Examples:
-					<programlisting><![CDATA[
-			DECLARE SimpleType1, SimpleType2; // <- two new types with the parent type "Annotation"
-			DECLARE ParentType NewType (SomeType feature1, INT feature2); // <- defines a new type "NewType" 
-			    // with parent type "ParentType" and two features
-			]]></programlisting>
-
-					If the parent type is not defined in the same namepace, then the
-					complete namespace has to be used, e.g., DECLARE
-					my.other.package.Parent NewType;
-				</para>
-			</section>
-			<section id="ugr.tools.tm.declarations.variable">
-				<title>Variable</title>
-				<para>
-					Variable declarations define new variables. There are five kinds of
-					variables:
-					* Type variable: A variable that represents an annotation
-					type.
-					* Integer variable: A variable that represents a integer.
-					*
-					Double variable: A variable that represents a floating-point
-					number.
-					* String variable: A variable that represents a string.
-					*
-					Boolean
-					variable: A variable that represents a boolean.
-
-					Examples:
-					<programlisting><![CDATA[
-				TYPE newTypeVariable;
-				INT newIntegerVariable;
-				DOUBLE newDoubleVariable;
-				STRING newStringVariable;
-				BOOLEAN newBooleanVariable;
-				]]></programlisting>
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.declarations.ressource">
-				<title>Resources</title>
-				<para>
-
-					There are two kinds of resource declaration, that make external
-					resources available in hte TextMarker system:
-					* List: A list
-					represents a normal text file with an entry per line
-					or a compiled
-					tree of a word list.
-					* Table: A table represents comma separated
-					file.
-
-					Examples:
-					<programlisting><![CDATA[
-				LIST Name = 'someWordList.txt';
-				TABLE Name = 'someTable.csv';
-				]]></programlisting>
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.declarations.scripts">
-				<title>Scripts</title>
-				<para>
-
-					Additional scripts can be imported and reused with the CALL action.
-					The types of the imported rules are then also available, so that it
-					is not neccessary to import the Type System of the additional rule
-					script.
-
-					Examples:
-					<programlisting><![CDATA[
-				SCRIPT my.package.AnotherScript; // <- "AnotherScript.tm" in the "my.package" package
-				Document{->CALL(AnotherScript)}; // <- rule executes "AnotherScript.tm"
-				]]></programlisting>
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.declarations.components">
-				<title>Components</title>
-				<para>
-
-					There are two kind of UIMA components that can be imported in a
-					TextMarker script:
-					* Type System: includes the types defined in an
-					external type system.
-					* Analysis Engine: makes an external analysis
-					engine available. The
-					type system needed for the analysis engine has
-					to be imported
-					seperately. Please mind the filtering setting when
-					calling an
-					external analysis engine.
-
-					Examples:
-					<programlisting><![CDATA[
-				ENINGE my.package.ExternalEngine; // <- "ExternalEngine.xml" in the 
-				    // "my.package" package (in the descriptor folder)
-				TYPESYSTEM my.package.ExternalTypeSystem; // <- "ExternalTypeSystem.xml" 
-				    // in the "my.package" package (in the descriptor folder)
-				Document{->RETAINTYPE(SPACE,BREAK),CALL(ExternalEngine)}; 
-				    // calls ExternalEngine, but retains white spaces
-				]]></programlisting>
-
-				</para>
-			</section>
-		</section>
-		<section id="ugr.tools.tm.quantifier">
-			<title>Quantifiers</title>
-			<para>
-			</para>
-			<section id="ugr.tools.tm.quantifier.sg">
-				<title>* Star Greedy</title>
-				<para>
-					The Star Greedy quantifier matches on any amount of annotations and
-					evaluates always true. Please mind, that a rule element with a Star
-					Greedy quantifier needs to match on different annotations than the
-					next rule element.
-
-					Examples:
-					<programlisting><![CDATA[
-				Input:    small Big Big Big small
-				Rule:     CW*
-				Matched:  Big Big Big  
-				Matched:  Big Big 
-				Matched:  Big
-				]]></programlisting>
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.quantifier.sr">
-				<title>*? Star Reluctant</title>
-				<para>
-					The Star Reluctant quantifier matches on any amount of annotations
-					and evaluates always true, but stops to match on new annotations,
-					when the next rule element matches and evaluates true on this
-					annotation.
-
-					Examples:
-					<programlisting><![CDATA[
-				Input:    123 456 small small Big 
-				Rule:     W*? CW
-				Matched:  small small Big
-				Matched:  small Big
-				Matched:  Big
-				]]></programlisting>
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.quantifier.pg">
-				<title>+ Plus Greedy</title>
-				<para>
-					The Plus Greedy quantifier needs to match on at least one
-					annotation. Please mind, that a rule element after a rule element
-					with a Plus Greedy quantifier matches and evaluates on different
-					conditions.
-
-					Examples:
-
-					<programlisting><![CDATA[
-				Input:    123 456 small small Big 
-				Rule:     SW+ 
-				Matched:  small small
-				Matched:  small 
-				]]></programlisting>
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.quantifier.pr">
-				<title>+? Plus Reluctant</title>
-				<para>
-					The Plus Reluctant quantifier has to match on at least one
-					annotation in order to evaluate true, but stops when the next rule
-					element is able to match on this annotation.
-
-					Examples:
-					<programlisting><![CDATA[
-				Input:    123 456 small small Big 
-				Rule:     W+? CW
-				Matched:  small small Big
-				]]></programlisting>
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.quantifier.qg">
-				<title>? Question Greedy</title>
-				<para>
-					The Question Greedy quantifier matches optionally on an annotation
-					and therefore always evaluates true.
-
-					Examples:
-					<programlisting><![CDATA[
-				Input:    123 456 small Big small Big 
-				Rule:     SW CW? SW
-				Matched:  small Big small
-				]]></programlisting>
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.quantifier.qr">
-				<title>?? Question Reluctant</title>
-				<para>
-					The Question Reluctant quantifier matches optionally on an
-					annotation if the next rule element can not match on the same
-					annotation and therefore always evaluates true.
-
-					Examples:
-					<programlisting><![CDATA[
-				Input:    123 456 small Big small Big 
-				Rule:     SW CW?? SW
-				Matched:  small Big small
-				]]></programlisting>
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.quantifier.mmg">
-				<title>[x,y] Min Max Greedy</title>
-				<para>
-					The Min Max Greedy quantifier has to match at least x and at most y
-					annotations of its rule element to elaluate true.
-
-					Examples:
-					<programlisting><![CDATA[
-				Input:    123 456 small Big small Big 
-				Rule:     SW CW[1,2] SW
-				Matched:  small Big small
-				]]></programlisting>
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.quantifier.mmr">
-				<title>[x,y]? Min Max Reluctant</title>
-				<para>
-					The Min Max Greedy quantifier has to match at least x and at most y
-					annotations of its rule element to elaluate true, but stops to
-					match
-					on additional annotations if the next rule element is able to
-					match
-					on this annotation.
-
-					Examples:
-					<programlisting><![CDATA[
-				Input:    123 456 small Big Big Big small Big 
-				Rule:     SW CW[2,100]? SW
-				Matched:  small Big Big Big small
-				]]></programlisting>
-				</para>
-			</section>
-		</section>
-		<section id="ugr.tools.tm.condition">
-			<title>Conditions</title>
-			<para>
-			</para>
-			<section id="ugr.tools.tm.condition.after">
-				<title>AFTER</title>
-				<para>
-
-					The AFTER condition evaluates true, if an annotation of the given
-					type preceeds the matched annotations.
-
-					Definition
-					<programlisting><![CDATA[AFTER(Type|TypeListExpression)  ]]></programlisting>
-
-					Example
-
-					<programlisting><![CDATA[CW{AFTER(SW)}; ]]></programlisting>
-					Here, the rule matches on a capitalized word, if there is any small
-					written word previously.
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.and">
-				<title>AND</title>
-				<para>
-					The AND Condition is a composed condition and evaluates true, if
-					all
-					contained conditions are evaluated true.
-
-					Definition
-
-					<programlisting><![CDATA[AND(Condition1,...,ConditionN)  ]]></programlisting>
-
-					Example
-
-					<programlisting><![CDATA[Paragraph{AND(PARTOF(Headline),CONTAINS(Keyword))->MARK(ImportantHeadline)}; ]]></programlisting>
-
-					In this example a Paragraph is annotated with the ImportantHealine
-					annotation, if it is a Headline and contains Keyword.
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.before">
-				<title>BEFORE</title>
-				<para>
-
-					The BEFORE condition evaluates true, if the matched annotations
-					prceeds an annotation of the given type.
-
-					Definition
-					<programlisting><![CDATA[BEFORE (Type|TypeListExpression)  ]]></programlisting>
-
-					Example
-
-					<programlisting><![CDATA[CW{BEFORE(SW)}; ]]></programlisting>
-					Here, the rule matches on a capitalized word, if there is any small
-					written word afterwards.
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.contains">
-				<title>CONTAINS</title>
-				<para>
-
-					The CONTAINS condition evaluates true if the amount or percentage
-					of
-					certain types in the window of the matched annotation is in a
-					predefined interval.
-
-					Definition
-
-					<programlisting><![CDATA[CONTAINS(Type(,NumbericalExpression,NumbericalExpression(,BooleanExpression)?)?) ]]></programlisting>
-
-					Example
-
-					<programlisting><![CDATA[Paragraph{CONTAINS(Keyword)->MARK(KeywordParagraph)}; ]]></programlisting>
-
-					A Pararaph is annotated with a KeywordParagraph annotation, if it
-					contains a Keyword annotation.
-
-					<programlisting><![CDATA[Paragraph{CONTAINS(Keyword,2,4)->MARK(KeywordParagraph)}; ]]></programlisting>
-
-					A Pararaph is annotated with a KeywordParagraph annotation, if it
-					contains between two and four Keyword annotations.
-
-					<programlisting><![CDATA[Paragraph{CONTAINS(Keyword,50,100,true)->MARK(KeywordParagraph)}; ]]></programlisting>
-
-					A Pararaph is annotated with a KeywordParagraph annotation, if it
-					contains between 50% and 100% Keyword annotations. This is
-					calculated based on the tokens of the Paragraph. If the Paragraph
-					contains six basic annatotions, two of them are part of one Keyword
-					annotation and one basic annotation is also annotated with a
-					Keyword
-					annotation, then the percantage of the contained Keywords
-					is
-					50%.
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.contextcount">
-				<title>CONTEXTCOUNT</title>
-				<para>
-
-					The CONTEXTCOUNT condition counts the annotations of the matched
-					type and stores the amount in a optional numerical variable.
-					Additionally the condition evaluates true, if the amount is in a
-					predefined interval.
-
-					Definition
-
-					<programlisting><![CDATA[CONTEXTCOUNT(Type(,NumbericalExpression,NumbericalExpression(,Variable)?)?) ]]></programlisting>
-
-					Example
-
-					<programlisting><![CDATA[Keyword{CONTEXTCOUNT(Paragraph,0,1000,var)->MARK(KeywordParagraph)}; ]]></programlisting>
-
-					Here, the position in a Paragraph of the matched Keyword annotation
-					is calculated and stored in the variable var.
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.count">
-				<title>COUNT</title>
-				<para>
-
-					The COUNT condition counts the annotations of a given type and
-					stores the amount in a optional numerical variable. Additionally
-					the
-					condition evaluates true, if the amount is in a predefined
-					interval.
-
-					Definition
-
-					<programlisting><![CDATA[COUNT(Type(,NumbericalExpression,NumbericalExpression)?(,NumberVariable)?) ]]></programlisting>
-					<programlisting><![CDATA[COUNT(ListExpression(,NumbericalExpression,NumbericalExpression)?(,NumberVariable)?) ]]></programlisting>
-
-					Example
-
-					<programlisting><![CDATA[Paragraph{COUNT(Keyword,1,10,var)->MARK(KeywordParagraph)}; ]]></programlisting>
-
-					Here, the amount of Keyword annotations in a Paragraph is
-					calculated
-					and stored in the variable var. The action of the rule
-					will be
-					executed if one to ten Keywords were counted.
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.currentcount">
-				<title>CURRENTCOUNT</title>
-				<para>
-
-
-					Definition
-
-					<programlisting><![CDATA[CURRENTCOUNT(Type(,NumbericalExpression,NumbericalExpression(,Variable)?)?) ]]></programlisting>
-
-					Example
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.endswith">
-				<title>ENDSWITH</title>
-				<para>
-
-					The ENDSWITH condition evaluates true, if an annotation of the
-					given
-					type ends exactly at the same position as the matched
-					annotation.
-
-					Definition
-					<programlisting><![CDATA[ENDSWITH(Type|TypeListExpression)  ]]></programlisting>
-
-					Example
-
-					<programlisting><![CDATA[Paragraph{ENDSWITH(SW)}; ]]></programlisting>
-					Here, the rule matches on a Paragraph annotation, if it ends with
-					small written word.
-
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.feature">
-				<title>FEATURE</title>
-				<para>
-
-
-					The FEATURE condition compares a feature of the matched annotation
-					with the the second argument.
+                The rule elements can of course match on all kinds of annotations.
+                Therefore the determination of the next basic annotation returns the
+                first basic annotation after the last basic annotation of the
+                complete, matched annotation.
+
+            </para>
+        </section>
+        <section id="ugr.tools.tm.declarations">
+            <title>Declarations</title>
+            <para>
+
+                There are three different kinds declaration in the TextMarker
+                system:
+                Declarations of types with optional feature definitions of
+                that type,
+                declaration of variables and declarations for importing
+                external
+                resources, scripts of UIMA components.
+            </para>
+            <section id="ugr.tools.tm.declarations.type">
+                <title>Type</title>
+                <para>
+                    Type declarations define new kinds of annotations types and
+                    optionally its features.
+
+                    Examples:
+                    <programlisting><![CDATA[
+            DECLARE SimpleType1, SimpleType2; // <- two new types with the parent type "Annotation"
+            DECLARE ParentType NewType (SomeType feature1, INT feature2); // <- defines a new type "NewType" 
+                // with parent type "ParentType" and two features
+            ]]></programlisting>
+
+                    If the parent type is not defined in the same namepace, then the
+                    complete namespace has to be used, e.g., DECLARE
+                    my.other.package.Parent NewType;
+                </para>
+            </section>
+            <section id="ugr.tools.tm.declarations.variable">
+                <title>Variable</title>
+                <para>
+                    Variable declarations define new variables. There are five kinds of
+                    variables:
+                    * Type variable: A variable that represents an annotation
+                    type.
+                    * Integer variable: A variable that represents a integer.
+                    *
+                    Double variable: A variable that represents a floating-point
+                    number.
+                    * String variable: A variable that represents a string.
+                    *
+                    Boolean
+                    variable: A variable that represents a boolean.
+
+                    Examples:
+                    <programlisting><![CDATA[
+                TYPE newTypeVariable;
+                INT newIntegerVariable;
+                DOUBLE newDoubleVariable;
+                STRING newStringVariable;
+                BOOLEAN newBooleanVariable;
+                ]]></programlisting>
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.declarations.ressource">
+                <title>Resources</title>
+                <para>
+
+                    There are two kinds of resource declaration, that make external
+                    resources available in hte TextMarker system:
+                    * List: A list
+                    represents a normal text file with an entry per line
+                    or a compiled
+                    tree of a word list.
+                    * Table: A table represents comma separated
+                    file.
+
+                    Examples:
+                    <programlisting><![CDATA[
+                LIST Name = 'someWordList.txt';
+                TABLE Name = 'someTable.csv';
+                ]]></programlisting>
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.declarations.scripts">
+                <title>Scripts</title>
+                <para>
+
+                    Additional scripts can be imported and reused with the CALL action.
+                    The types of the imported rules are then also available, so that it
+                    is not neccessary to import the Type System of the additional rule
+                    script.
+
+                    Examples:
+                    <programlisting><![CDATA[
+                SCRIPT my.package.AnotherScript; // <- "AnotherScript.tm" in the "my.package" package
+                Document{->CALL(AnotherScript)}; // <- rule executes "AnotherScript.tm"
+                ]]></programlisting>
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.declarations.components">
+                <title>Components</title>
+                <para>
+
+                    There are two kind of UIMA components that can be imported in a
+                    TextMarker script:
+                    * Type System: includes the types defined in an
+                    external type system.
+                    * Analysis Engine: makes an external analysis
+                    engine available. The
+                    type system needed for the analysis engine has
+                    to be imported
+                    seperately. Please mind the filtering setting when
+                    calling an
+                    external analysis engine.
+
+                    Examples:
+                    <programlisting><![CDATA[
+                ENINGE my.package.ExternalEngine; // <- "ExternalEngine.xml" in the 
+                    // "my.package" package (in the descriptor folder)
+                TYPESYSTEM my.package.ExternalTypeSystem; // <- "ExternalTypeSystem.xml" 
+                    // in the "my.package" package (in the descriptor folder)
+                Document{->RETAINTYPE(SPACE,BREAK),CALL(ExternalEngine)}; 
+                    // calls ExternalEngine, but retains white spaces
+                ]]></programlisting>
+
+                </para>
+            </section>
+        </section>
+        <section id="ugr.tools.tm.quantifier">
+            <title>Quantifiers</title>
+            <para>
+            </para>
+            <section id="ugr.tools.tm.quantifier.sg">
+                <title>* Star Greedy</title>
+                <para>
+                    The Star Greedy quantifier matches on any amount of annotations and
+                    evaluates always true. Please mind, that a rule element with a Star
+                    Greedy quantifier needs to match on different annotations than the
+                    next rule element.
+
+                    Examples:
+                    <programlisting><![CDATA[
+                Input:    small Big Big Big small
+                Rule:     CW*
+                Matched:  Big Big Big  
+                Matched:  Big Big 
+                Matched:  Big
+                ]]></programlisting>
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.quantifier.sr">
+                <title>*? Star Reluctant</title>
+                <para>
+                    The Star Reluctant quantifier matches on any amount of annotations
+                    and evaluates always true, but stops to match on new annotations,
+                    when the next rule element matches and evaluates true on this
+                    annotation.
+
+                    Examples:
+                    <programlisting><![CDATA[
+                Input:    123 456 small small Big 
+                Rule:     W*? CW
+                Matched:  small small Big
+                Matched:  small Big
+                Matched:  Big
+                ]]></programlisting>
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.quantifier.pg">
+                <title>+ Plus Greedy</title>
+                <para>
+                    The Plus Greedy quantifier needs to match on at least one
+                    annotation. Please mind, that a rule element after a rule element
+                    with a Plus Greedy quantifier matches and evaluates on different
+                    conditions.
+
+                    Examples:
+
+                    <programlisting><![CDATA[
+                Input:    123 456 small small Big 
+                Rule:     SW+ 
+                Matched:  small small
+                Matched:  small 
+                ]]></programlisting>
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.quantifier.pr">
+                <title>+? Plus Reluctant</title>
+                <para>
+                    The Plus Reluctant quantifier has to match on at least one
+                    annotation in order to evaluate true, but stops when the next rule
+                    element is able to match on this annotation.
+
+                    Examples:
+                    <programlisting><![CDATA[
+                Input:    123 456 small small Big 
+                Rule:     W+? CW
+                Matched:  small small Big
+                ]]></programlisting>
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.quantifier.qg">
+                <title>? Question Greedy</title>
+                <para>
+                    The Question Greedy quantifier matches optionally on an annotation
+                    and therefore always evaluates true.
+
+                    Examples:
+                    <programlisting><![CDATA[
+                Input:    123 456 small Big small Big 
+                Rule:     SW CW? SW
+                Matched:  small Big small
+                ]]></programlisting>
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.quantifier.qr">
+                <title>?? Question Reluctant</title>
+                <para>
+                    The Question Reluctant quantifier matches optionally on an
+                    annotation if the next rule element can not match on the same
+                    annotation and therefore always evaluates true.
+
+                    Examples:
+                    <programlisting><![CDATA[
+                Input:    123 456 small Big small Big 
+                Rule:     SW CW?? SW
+                Matched:  small Big small
+                ]]></programlisting>
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.quantifier.mmg">
+                <title>[x,y] Min Max Greedy</title>
+                <para>
+                    The Min Max Greedy quantifier has to match at least x and at most y
+                    annotations of its rule element to elaluate true.
+
+                    Examples:
+                    <programlisting><![CDATA[
+                Input:    123 456 small Big small Big 
+                Rule:     SW CW[1,2] SW
+                Matched:  small Big small
+                ]]></programlisting>
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.quantifier.mmr">
+                <title>[x,y]? Min Max Reluctant</title>
+                <para>
+                    The Min Max Greedy quantifier has to match at least x and at most y
+                    annotations of its rule element to elaluate true, but stops to
+                    match
+                    on additional annotations if the next rule element is able to
+                    match
+                    on this annotation.
+
+                    Examples:
+                    <programlisting><![CDATA[
+                Input:    123 456 small Big Big Big small Big 
+                Rule:     SW CW[2,100]? SW
+                Matched:  small Big Big Big small
+                ]]></programlisting>
+                </para>
+            </section>
+        </section>
+        <section id="ugr.tools.tm.condition">
+            <title>Conditions</title>
+            <para>
+            </para>
+            <section id="ugr.tools.tm.condition.after">
+                <title>AFTER</title>
+                <para>
+
+                    The AFTER condition evaluates true, if an annotation of the given
+                    type preceeds the matched annotations.
+
+                    Definition
+                    <programlisting><![CDATA[AFTER(Type|TypeListExpression)  ]]></programlisting>
+
+                    Example
+
+                    <programlisting><![CDATA[CW{AFTER(SW)}; ]]></programlisting>
+                    Here, the rule matches on a capitalized word, if there is any small
+                    written word previously.
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.and">
+                <title>AND</title>
+                <para>
+                    The AND Condition is a composed condition and evaluates true, if
+                    all
+                    contained conditions are evaluated true.
+
+                    Definition
+
+                    <programlisting><![CDATA[AND(Condition1,...,ConditionN)  ]]></programlisting>
+
+                    Example
+
+                    <programlisting><![CDATA[Paragraph{AND(PARTOF(Headline),CONTAINS(Keyword))->MARK(ImportantHeadline)}; ]]></programlisting>
+
+                    In this example a Paragraph is annotated with the ImportantHealine
+                    annotation, if it is a Headline and contains Keyword.
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.before">
+                <title>BEFORE</title>
+                <para>
+
+                    The BEFORE condition evaluates true, if the matched annotations
+                    prceeds an annotation of the given type.
+
+                    Definition
+                    <programlisting><![CDATA[BEFORE (Type|TypeListExpression)  ]]></programlisting>
+
+                    Example
+
+                    <programlisting><![CDATA[CW{BEFORE(SW)}; ]]></programlisting>
+                    Here, the rule matches on a capitalized word, if there is any small
+                    written word afterwards.
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.contains">
+                <title>CONTAINS</title>
+                <para>
+
+                    The CONTAINS condition evaluates true if the amount or percentage
+                    of
+                    certain types in the window of the matched annotation is in a
+                    predefined interval.
+
+                    Definition
+
+                    <programlisting><![CDATA[CONTAINS(Type(,NumbericalExpression,NumbericalExpression(,BooleanExpression)?)?) ]]></programlisting>
+
+                    Example
+
+                    <programlisting><![CDATA[Paragraph{CONTAINS(Keyword)->MARK(KeywordParagraph)}; ]]></programlisting>
+
+                    A Pararaph is annotated with a KeywordParagraph annotation, if it
+                    contains a Keyword annotation.
+
+                    <programlisting><![CDATA[Paragraph{CONTAINS(Keyword,2,4)->MARK(KeywordParagraph)}; ]]></programlisting>
+
+                    A Pararaph is annotated with a KeywordParagraph annotation, if it
+                    contains between two and four Keyword annotations.
+
+                    <programlisting><![CDATA[Paragraph{CONTAINS(Keyword,50,100,true)->MARK(KeywordParagraph)}; ]]></programlisting>
+
+                    A Pararaph is annotated with a KeywordParagraph annotation, if it
+                    contains between 50% and 100% Keyword annotations. This is
+                    calculated based on the tokens of the Paragraph. If the Paragraph
+                    contains six basic annatotions, two of them are part of one Keyword
+                    annotation and one basic annotation is also annotated with a
+                    Keyword
+                    annotation, then the percantage of the contained Keywords
+                    is
+                    50%.
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.contextcount">
+                <title>CONTEXTCOUNT</title>
+                <para>
+
+                    The CONTEXTCOUNT condition counts the annotations of the matched
+                    type and stores the amount in a optional numerical variable.
+                    Additionally the condition evaluates true, if the amount is in a
+                    predefined interval.
+
+                    Definition
+
+                    <programlisting><![CDATA[CONTEXTCOUNT(Type(,NumbericalExpression,NumbericalExpression(,Variable)?)?) ]]></programlisting>
+
+                    Example
+
+                    <programlisting><![CDATA[Keyword{CONTEXTCOUNT(Paragraph,0,1000,var)->MARK(KeywordParagraph)}; ]]></programlisting>
+
+                    Here, the position in a Paragraph of the matched Keyword annotation
+                    is calculated and stored in the variable var.
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.count">
+                <title>COUNT</title>
+                <para>
+
+                    The COUNT condition counts the annotations of a given type and
+                    stores the amount in a optional numerical variable. Additionally
+                    the
+                    condition evaluates true, if the amount is in a predefined
+                    interval.
+
+                    Definition
+
+                    <programlisting><![CDATA[COUNT(Type(,NumbericalExpression,NumbericalExpression)?(,NumberVariable)?) ]]></programlisting>
+                    <programlisting><![CDATA[COUNT(ListExpression(,NumbericalExpression,NumbericalExpression)?(,NumberVariable)?) ]]></programlisting>
+
+                    Example
+
+                    <programlisting><![CDATA[Paragraph{COUNT(Keyword,1,10,var)->MARK(KeywordParagraph)}; ]]></programlisting>
+
+                    Here, the amount of Keyword annotations in a Paragraph is
+                    calculated
+                    and stored in the variable var. The action of the rule
+                    will be
+                    executed if one to ten Keywords were counted.
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.currentcount">
+                <title>CURRENTCOUNT</title>
+                <para>
+
+
+                    Definition
+
+                    <programlisting><![CDATA[CURRENTCOUNT(Type(,NumbericalExpression,NumbericalExpression(,Variable)?)?) ]]></programlisting>
+
+                    Example
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.endswith">
+                <title>ENDSWITH</title>
+                <para>
+
+                    The ENDSWITH condition evaluates true, if an annotation of the
+                    given
+                    type ends exactly at the same position as the matched
+                    annotation.
+
+                    Definition
+                    <programlisting><![CDATA[ENDSWITH(Type|TypeListExpression)  ]]></programlisting>
+
+                    Example
+
+                    <programlisting><![CDATA[Paragraph{ENDSWITH(SW)}; ]]></programlisting>
+                    Here, the rule matches on a Paragraph annotation, if it ends with
+                    small written word.
+
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.feature">
+                <title>FEATURE</title>
+                <para>
+
+
+                    The FEATURE condition compares a feature of the matched annotation
+                    with the the second argument.
 
-					Definition
+                    Definition
 
-					<programlisting><![CDATA[FEATURE(StringExpression,Expression) ]]></programlisting>
+                    <programlisting><![CDATA[FEATURE(StringExpression,Expression) ]]></programlisting>
 
-					Example
+                    Example
 
-					<programlisting><![CDATA[
+                    <programlisting><![CDATA[
 Document{FEATURE("language",targetLanguage)}
 ]]></programlisting>
-					Here, this rule matched, if the feature with the name "language" of
-					the document annotation equals the value of the variable
-					targetLanguage.
+                    Here, this rule matched, if the feature with the name "language" of
+                    the document annotation equals the value of the variable
+                    targetLanguage.
 
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.if">
-				<title>IF</title>
-				<para>
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.if">
+                <title>IF</title>
+                <para>
 
-					The IF evaluates true if the contained BooleanExpression does.
+                    The IF evaluates true if the contained BooleanExpression does.
 
-					Definition
+                    Definition
 
-					<programlisting><![CDATA[IF(BooleanExpression) ]]></programlisting>
+                    <programlisting><![CDATA[IF(BooleanExpression) ]]></programlisting>
 
-					Example
+                    Example
 
-					<programlisting><![CDATA[Paragraph{IF(keywordAmount > 5)->MARK(KeywordParagraph)}; ]]></programlisting>
+                    <programlisting><![CDATA[Paragraph{IF(keywordAmount > 5)->MARK(KeywordParagraph)}; ]]></programlisting>
 
-					A Paragraph annotation is annotated with a KeywordParagraph
-					annotation, if the value of the variable keywordAmount is greater
-					than five.
+                    A Paragraph annotation is annotated with a KeywordParagraph
+                    annotation, if the value of the variable keywordAmount is greater
+                    than five.
 
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.inlist">
-				<title>INLIST</title>
-				<para>
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.inlist">
+                <title>INLIST</title>
+                <para>
 
-					The INLIST condition is fulfilled, if the matched annotation is
-					listed in a given word list. The (relative) edit distance is
-					currently disabled.
+                    The INLIST condition is fulfilled, if the matched annotation is
+                    listed in a given word list. The (relative) edit distance is
+                    currently disabled.
 
-					Definition
+                    Definition
 
-					<programlisting><![CDATA[INLIST(WordList(,NumberExpression,(BooleanExpression)?)?) ]]></programlisting>
-					<programlisting><![CDATA[INLIST(StringList(,NumberExpression,(BooleanExpression)?)?) ]]></programlisting>
+                    <programlisting><![CDATA[INLIST(WordList(,NumberExpression,(BooleanExpression)?)?) ]]></programlisting>
+                    <programlisting><![CDATA[INLIST(StringList(,NumberExpression,(BooleanExpression)?)?) ]]></programlisting>
 
-					Example
+                    Example
 
-					<programlisting><![CDATA[Keyword{INLIST(names.txt)->MARK(SpecialKeyword)}; ]]></programlisting>
+                    <programlisting><![CDATA[Keyword{INLIST(names.txt)->MARK(SpecialKeyword)}; ]]></programlisting>
 
-					A Keyword is annotated with the type SpecialKeyword, if the text of
-					the Keyword annotation is listed in the word list names.txt.
+                    A Keyword is annotated with the type SpecialKeyword, if the text of
+                    the Keyword annotation is listed in the word list names.txt.
 
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.is">
-				<title>IS</title>
-				<para>
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.is">
+                <title>IS</title>
+                <para>
 
 
-					The IS conditions evaluates true, if there is an annotation of the
-					given type with the same offsets as the matched annotations
+                    The IS conditions evaluates true, if there is an annotation of the
+                    given type with the same offsets as the matched annotations
 
-					Definition
+                    Definition
 
 
-					<programlisting><![CDATA[IS(Type) ]]></programlisting>
+                    <programlisting><![CDATA[IS(Type) ]]></programlisting>
 
-					Example
+                    Example
 
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.isintag">
-				<title>ISINTAG</title>
-				<para>
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.isintag">
+                <title>ISINTAG</title>
+                <para>
 
-					The ISINTAG condition evaluates true, if the matched annotation is
-					in the given HTML tag. Attributes are currently disabled.
+                    The ISINTAG condition evaluates true, if the matched annotation is
+                    in the given HTML tag. Attributes are currently disabled.
 
-					Definition
+                    Definition
 
-					<programlisting><![CDATA[ISINTAG(StringExpression(,StringExpression '=' StringExpression)?) ]]></programlisting>
+                    <programlisting><![CDATA[ISINTAG(StringExpression(,StringExpression '=' StringExpression)?) ]]></programlisting>
 
-					Example
+                    Example
 
-					<programlisting><![CDATA[Paragraph{ISINTAG("h1")->MARK(Headline)}; ]]></programlisting>
+                    <programlisting><![CDATA[Paragraph{ISINTAG("h1")->MARK(Headline)}; ]]></programlisting>
 
-					A Paragraph is marked as a Headline, if the matched text is in a h1
-					HTML tag.
+                    A Paragraph is marked as a Headline, if the matched text is in a h1
+                    HTML tag.
 
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.last">
-				<title>LAST</title>
-				<para>
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.last">
+                <title>LAST</title>
+                <para>
 
-					The LAST condition evaluates true, if the type of the last token of
-					the matched annotation is subsumed by the given type.
+                    The LAST condition evaluates true, if the type of the last token of
+                    the matched annotation is subsumed by the given type.
 
-					Definition
+                    Definition
 
-					<programlisting><![CDATA[LAST(TypeExpression) ]]></programlisting>
+                    <programlisting><![CDATA[LAST(TypeExpression) ]]></programlisting>
 
-					Example
+                    Example
 
-					<programlisting><![CDATA[Document{LAST(CW)}; ]]></programlisting>
+                    <programlisting><![CDATA[Document{LAST(CW)}; ]]></programlisting>
 
-					This rule fires, if the last token of the document is a capitalized
-					word.
+                    This rule fires, if the last token of the document is a capitalized
+                    word.
 
-				</para>
-			</section>
-			<section id="ugr.tools.tm.condition.mofn">
-				<title>MOFN</title>
-				<para>
+                </para>
+            </section>
+            <section id="ugr.tools.tm.condition.mofn">
+                <title>MOFN</title>
+                <para>
 
-					The MOFN condition is a composed condition and evaluates true, if

[... 4973 lines stripped ...]