Posted to commits@uima.apache.org by pk...@apache.org on 2012/11/05 17:56:05 UTC

svn commit: r1405876 - /uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.overview.xml

Author: pkluegl
Date: Mon Nov  5 16:56:04 2012
New Revision: 1405876

URL: http://svn.apache.org/viewvc?rev=1405876&view=rev
Log:
UIMA-2285
- added some more information about configuration parameters
- fixed formatting

Modified:
    uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.overview.xml

Modified: uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.overview.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.overview.xml?rev=1405876&r1=1405875&r2=1405876&view=diff
==============================================================================
--- uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.overview.xml (original)
+++ uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.overview.xml Mon Nov  5 16:56:04 2012
@@ -5,814 +5,603 @@
 <!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  
 %uimaents;
 ]>
-<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor 
-	license agreements. See the NOTICE file distributed with this work for additional 
-	information regarding copyright ownership. The ASF licenses this file to 
-	you under the Apache License, Version 2.0 (the "License"); you may not use 
-	this file except in compliance with the License. You may obtain a copy of 
-	the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required 
-	by applicable law or agreed to in writing, software distributed under the 
-	License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS 
-	OF ANY KIND, either express or implied. See the License for the specific 
-	language governing permissions and limitations under the License. -->
-
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
 <chapter id="ugr.tools.tm.introduction">
-	<title>TextMarker</title>
-	<para>The TextMarker system is an open source tool
-		for the development
-		of rule-based information extraction applications.
-		The development
-		environment is based on the DLTK framework. It
-		supports the knowledge
-		engineer with a full-featured rule editor,
-		components for the
-		explanation of the rule inference and a build
-		process for generic UIMA
-		Analysis Engines and Type Systems.
-		Therefore TextMarker components can
-		be easily created and combined
-		with other UIMA components in different
-		information extraction
-		pipelines rather flexibly.
-
-		TextMarker applies a
-		specialized rule representation language for the effective
-		knowledge
-		formalization:
-		The rules of the TextMarker language are composed of a
-		list of rule
-		elements that themselves consists of four parts: The
-		mandatory
-		matching condition establishes a connection to the input
-		document by
-		referring to an already existing concept, respectively
-		annotation.
-		The
-		optional quantifier defines the usage of the matching
-		condition
-		similar to regular expressions. Then, additional conditions
-		add
-		constraints to the matched text fragment and additional actions
-		determine the consequences of the rule. Therefore, TextMarker rules
-		match on a pattern of given annotations and, if the additional
-		conditions evaluate true, then they execute their actions, e.g.
-		create
-		a new annotation. If no initial annotations exist, for example,
-		created by another component, a scanner is used to seed simple token
-		annotations contained in a taxonomy.
-
-		The TextMarker system provides
-		unique functionality that is usually not
-		found in similar systems. The
-		actions are able to modify the document
-		either by replacing or
-		deleting
-		text fragments or by filtering the
-		view on the document. In
-		this case,
-		the rules ignore some
-		annotations,
-		e.g. HTML markup, or are
-		executed only
-		on the remaining text passages.
-		The knowledge engineer
-		is able to add
-		heuristic knowledge by using
-		scoring rules.
-		Additionally, several
-		language elements common to
-		scripting languages
-		like conditioned
-		statements, loops, procedures,
-		recursion, variables
-		and expressions
-		increase the expressiveness of
-		the language. Rules are
-		able to directly
-		invoke external rule sets or
-		arbitrary UIMA Analysis
-		Engines and foreign
-		libraries can be
-		integrated with the extension
-		mechanism for new
-		language elements.
-
-	</para>
-	<section id="ugr.tools.tm.introduction.metaphor">
-		<title>Introduction</title>
-		<para>
-			In manual information extraction humans often apply a strategy
-			according to a highlighter metaphor: First relevant headlines are
-			considered and classified according to their content by coloring
-			them
-			with different highlighters. The paragraphs of the annotated
-			headlines
-			are then considered further. Relevant text fragments or
-			single words
-			in the context of that headline can then be colored. In
-			this way, a
-			top-down analysis and extraction strategy is implemented.
-			Necessary
-			additional information can then be added that either refers
-			to other
-			text segments or contains valuable domain specific
-			information.
-			Finally the colored text can be easily analyzed
-			concerning the
-			relevant information.
-
-			The TextMarker system (textmarker
-			is a common german word for a
-			highlighter) tries to imitate this
-			manual extraction method by
-			formalizing the appropriate actions using
-			matching rules: The rules
-			mark sequences of words, extract text
-			segments or modify the input
-			document depending on textual
-			features.The default input for the
-			TextMarker system is
-			semi-structured text, but it can also process
-			structured or free
-			text.
-			Technically, HTML is often the input
-			format,
-			since most word
-			processing
-			documents can be converted to HTML.
-			Additionally, the
-			TextMarker
-			systems offers the possibility to
-			create
-			a modified output
-			document.
-		</para>
-	</section>
-	<section id="ugr.tools.tm.introduction.concepts">
-		<title>Core Concepts</title>
-		<para>
-			As a first step in the extraction process the TextMarker system uses
-			a
-			tokenizer (scanner) to tokenize the input document and to create a
-			stream of basic symbols. The types and valid annotations of the
-			possible tokens are predefined by a taxonomy of annotation types.
-			Annotations simply refer to a section of the input document and
-			assign a type or concept to the respective text fragment. The figure
-			on the right shows an excerpt of a basic annotation taxonomy: CW
-			describes all tokens, for example, that contains a single word
-			starting with a capital letter, MARKUP corresponds to HTML or XML
-			tags, and PM refers to all kinds of punctuations marks. Take a look
-			at [basic annotations|BasicAnnotationList] for a complete list of
-			initial annotations.
-
-
-			<screenshot>
-				<mediaobject>
-					<imageobject>
-						<imagedata scale="80" format="PNG" fileref="&imgroot;symboltaxo.png" />
-					</imageobject>
-					<textobject>
-						<phrase>Part of a taxonomy for basic annotation types.</phrase>
-					</textobject>
-				</mediaobject>
-			</screenshot>
-
-			By using (and extending) the taxonomy, the knowledge engineer is
-			able
-			to choose the most adequate types and concepts when defining new
-			matching rules, i.e., TextMarker rules for matching a text fragment
-			given by a set of symbols to an annotation. If the capitalization of
-			a word, for example, is of no importance, then the annotation type W
-			that describes words of any kind can be used. The initial scanner
-			creates a set of basic annotations that may be used by the matching
-			rules of the TextMarker language. However, most information
-			extraction applications require domain specific concepts and
-			annotations. Therefore, the knowledge engineer is able to extend the
-			set of annotations, and to define new annotation types tuned to the
-			requirements of the given domain. These types can be flexibly
-			integrated in the taxonomy of annotation types.
-
-			One of the goals in
-			developing a new information extraction language
-			was
-			to maintain an
-			easily readable syntax while still providing a
-			scalable
-			expressiveness
-			of the language. Basically, the TextMarker
-			language
-			contains
-			expressions for the definition of new annotation
-			types and
-			for defining
-			new matching rules. The rules are defined by a
-			list of
-			rule elements.
-			Each rule element contains at least a basic matching
-			condition
-			referring
-			to text fragments or already specified
-			annotations.
-			Additionally a
-			list of conditions and actions may be
-			specified for a
-			rule element.
-			Whereas the conditions describe
-			necessary attributes of
-			the matched
-			text fragment, the actions point
-			to operations and
-			assignments on
-			the
-			current fragments. These actions
-			will then only be
-			executed if all
-			basic conditions matched on a text
-			fragment or the
-			annotation and the
-			related conditions are fulfilled.
-		</para>
-	</section>
-	<section id="ugr.tools.tm.introduction.examples">
-		<title>Examples</title>
-		<para>
-			The usage of the language and its readability can be demonstrated by
-			simple examples:
-
-			<programlisting><![CDATA[
-                CW{INLIST('animals.txt') -> MARK(Animal)};
-                Animal "and" Animal{-> MARK(Animalpair, 1, 2, 3)};
-    ]]></programlisting>
-
-			The first rule looks at all capitalized words that are listed in an
-			external document animals.txt and creates a new annotation of the
-			type
-			animal using the boundaries of the matched word. The second rule
-			searches for an annotation of the type animal followed by the
-			literal
-			and and a second animal annotation. Then it will create a new
-			annotation animalpair covering the text segment that matched the
-			three
-			rule elements (the digit parameters refer to the number of
-			matched
-			rule element).
-
-			<programlisting><![CDATA[
-                Document{-> MARKFAST(Firstname, 'firstnames.txt')};
-                Firstname CW{-> MARK(Lastname)};
-                Paragraph{VOTE(Firstname, Lastname) -> LOG("Found more Firstnames than Lastnames")};
-    ]]></programlisting>
-
-			In this example, the first rule annotates all words that occur in
-			the
-			external document firstnames.txt with the type firstname. The
-			second
-			rule creates a lastname annotation for all capitalized word
-			that
-			follow a firstname annotation. The last rule finally processes
-			all
-			paragraph} annotations. If the VOTE condition counts more
-			firstname
-			than lastname annotations, then the rule writes a log entry
-			with a
-			predefined message.
-
-
-			<programlisting><![CDATA[
-                ANY+{PARTOF(Paragraph), CONTAINS(Delete, 50, 100, true) -> MARK(Delete)};
-                Firstname{-> MARK(Delete,1 , 2)} Lastname;
-                Delete{-> DEL};
-            ]]></programlisting>
-
-			Here, the first rule looks for sequences of any kind of tokens
-			except
-			markup and creates one annotation of the type delete for each
-			sequence, if the tokens are part of a paragraph annotation and
-			contains together already more than 50% of delete annoations. The +
-			signs indicate this greedy processing. The second rule annotates
-			first
-			names followed by last names with the type delete and the third
-			rule
-			simply deletes all text segments that are associated with that
-			delete
-			annotation.
-
-		</para>
-	</section>
-	<section id="ugr.tools.tm.introduction.features">
-		<title>Special Features</title>
-		<para>
-			The TextMarker language features some special characteristics
-			that are
-			usually not found in other rule-based information extraction
-			systems
-			or even shift it towards scripting languages. The possibility
-			of
-			creating new annotation types and integrating them into the
-			taxonomy
-			facilitates an even more modular development of information
-			extraction systems.
-
-			Read more about robust extraction using
-			filtering,
-			complex control
-			structures and heuristic extraction using
-			scoring
-			rules.
-		</para>
-	</section>
-	<section id="ugr.tools.tm.introduction.getstarted">
-		<title>Get started</title>
-		<para>
-			This section page gives you a short, technical introduction on
-			how to
-			get
-			started with TextMarker system and mostly just links the
-			information
-			of the other wiki pages. Some knowledge about the usage
-			of
-			Eclipse and
-			central concepts of UIMA are useful. TextMarker
-			consists of
-			the
-			TextMarker rule language (and of course the rule
-			inference) and the
-			TextMarker workbench. Additionally, the CEV plugin
-			is used to edit
-			and
-			visualize annotated text. The TextRuler system
-			with implementations of
-			well known rule learning methods and
-			development extension with
-			support for test-driven development are
-			already integrated.
-		</para>
-		<section id="ugr.tools.tm.introduction.getstarted.running">
-			<title>Up and running</title>
-			<para>
-				First of all, install the Workbench and read the introduction
-				and its
-				examples. In order to verify if the Workbench is correctly
-				installed,
-				take a look at Help-About Eclipse-Installation Details
-				and
-				compare
-				the installed plugins with the plugins you copied into
-				the
-				plugins
-				folder of your Eclipse application. Normally most of the
-				plugins do
-				not cause any troubles, but the CEV does because of the
-				XPCom and
-				XULRunner dependencies. You should at least get the XPCom
-				plugin up
-				and running. However, you cannot use the additional HTML
-				functionality without the XULRunner plugin. If the plugins of the
-				installation guide do not work properly and a google search for a
-				suiteable plugin is not successful, then write a mail to the user
-				list and we will try to solve the problem. If all plugins are
-				correctly installed, then start the Eclipse application and switch
-				to
-				the TextMarker perspective (Window-Open Perspective-Other...)
-			</para>
-		</section>
-		<section id="ugr.tools.tm.introduction.getstarted.example">
-			<title>Learn by example</title>
-			<para>
-				Having a running Workbench download the example project and
-				import/copy
-				this TextMarker project into your workspace. The project
-				contains
-				some simple rules for extraction the author, title and year
-				of
-				reference strings. Next, take a look at the project structure and
-				the
-				syntax and compare it with the example project and its contents.
-				Open
-				the Main.tm TextMarker script in the folder
-				script/de.uniwue.example
-				and press the Run button in the Eclipse
-				toolbar. The docments in
-				the
-				input folder will then be processed by
-				the Main.tm file and the
-				result of the information extraction task
-				is
-				placed in the output
-				folder. As you can see, there are four
-				files: an
-				xmiCAS for each
-				input file and a HTML file (the
-				modifed/colored
-				result). Open one of
-				the .xmi files with the CAS
-				Editor plugin (-popup
-				menu-Open with) and
-				select some checkboxes in
-				the Annotation Browser
-				view.
-			</para>
-		</section>
-		<section id="ugr.tools.tm.introduction.getstarted.doit">
-			<title>Do it yourself</title>
-			<para>
-				Try to write some rules yourself. Read the description of the
-				available
-				language constructs, e.g., conditions and actions and use
-				the
-				explanation component in order to take a closer look at the rule
-				inference. Then finally, read the rest of this document.
-			</para>
-		</section>
-	</section>
-	<section id="ugr.tools.tm.ae">
-		<title>TextMarker Analysis Engine</title>
-		<para>
-			- TextMarker in UIMA, only a AE which is parameterized and
-			controlled
-			by that.
-		</para>
-		<section id="ugr.tools.tm.ae.parameter">
-			<title>Configuration Parameters</title>
-			<para>
-				The configuration parameters of the TextMarker analysis engines can
-				be separated into three different groups: parameters for the setup
-				of
-				the environment (
-				<xref linkend='ugr.tools.tm.ae.parameter.mainScript' />
-				to
-				<xref linkend='ugr.tools.tm.ae.parameter.additionalExtensions' />
-				), parameters that change the behavior of the analysis engine (
-				<xref linkend='ugr.tools.tm.ae.parameter.reloadScript' />
-				to
-				<xref linkend='ugr.tools.tm.ae.parameter.simpleGreedyForComposed' />
-				) and parameters for creating additional information how the rules
-				were executed (
-				<xref linkend='ugr.tools.tm.ae.parameter.debug' />
-				to
-				<xref linkend='ugr.tools.tm.ae.parameter.createdBy' />
-				). First, a short overview of the configuration parameters is given
-				in
-				<ref linkend='table.ugr.tools.tm.ae.parameter' />
-				. Then all parameters are described in detail with examples.
-
-				<table id="table.ugr.tools.tm.ae.parameter" frame="all">
-					<title>Configuration parameters of the TextMarker Analysis Engine
-					</title>
-					<tgroup cols="3" colsep="1" rowsep="1">
-						<colspec colname="c1" colwidth="1.2*" />
-						<colspec colname="c2" colwidth="2*" />
-						<colspec colname="c3" colwidth="0.8*" />
-						<thead>
-							<row>
-								<entry align="center">Name</entry>
-								<entry align="center">Short description</entry>
-								<entry align="center">Type</entry>
-							</row>
-						</thead>
-						<tbody>
-							<row>
-								<entry>
-									mainScript
-									<ref linkend='ugr.tools.tm.ae.parameter.mainScript' />
-								</entry>
-								<entry>
-									Name with complete namespace of the script which will be
-									interpreted and executed by the analysis engine.
-								</entry>
-								<entry>
-									Single String
-								</entry>
-							</row>
-							<row>
-								<entry>scriptEncoding</entry>
-								<entry>
-									Encoding of all TextMarker script files.
-								</entry>
-								<entry>
-									Single String
-								</entry>
-							</row>
-							<row>
-								<entry>scriptPaths</entry>
-								<entry>
-									List of absolute locations, which contain the neccessary
-									script files like the main script.
-								</entry>
-								<entry>
-									Multi String
-								</entry>
-							</row>
-							<row>
-								<entry>descriptorPaths</entry>
-								<entry>
-									List of absolute locations, which contain the neccessary
-									descriptor files like type systems.
-								</entry>
-								<entry>
-									Multi String
-								</entry>
-							</row>
-							<row>
-								<entry>resourcePaths</entry>
-								<entry>
-									List of absolute locations, which contain the neccessary
-									resource files like word lists.
-								</entry>
-								<entry>
-									Multi String
-								</entry>
-							</row>
-							<row>
-								<entry>additionalScripts</entry>
-								<entry>
-									List of names with complete namespace of additional
-									scripts, which can be referred to.
-								</entry>
-								<entry>
-									Multi String
-								</entry>
-							</row>
-							<row>
-								<entry>additionalEngines</entry>
-								<entry>
-									List of names with complete namespace of additional
-									analysis engines, which can be called by TextMarker rules.
-								</entry>
-								<entry>
-									Multi String
-								</entry>
-							</row>
-							<row>
-								<entry>additionalEngineLoaders</entry>
-								<entry>
-									List of class names of implementations that are able to
-									perform additional task when loading external analysis engines.
-								</entry>
-								<entry>
-									Multi String
-								</entry>
-							</row>
-							<row>
-								<entry>additionalExtensions</entry>
-								<entry>
-									List of factory classes for additional extensions of the
-									TextMarker language like proprietary conditions.
-								</entry>
-								<entry>
-									Multi String
-								</entry>
-							</row>
-
-							<row>
-								<entry>reloadScript</entry>
-								<entry>
-									Option to initialize the rule script each time the
-									analysis engine
-									processes a CAS.
-								</entry>
-								<entry>
-									Single Boolean
-								</entry>
-							</row>
-							<row>
-								<entry>seeders</entry>
-								<entry>
-									List of class names that provide additional annoations
-									before the rules are executed.
-								</entry>
-								<entry>
-									Multi String
-								</entry>
-							</row>
-							<row>
-								<entry>defaultFilteredTypes</entry>
-								<entry>
-									List of complete type names of annoations that are
-									invisible by default.
-								</entry>
-								<entry>
-									Multi String
-								</entry>
-							</row>
-							<row>
-								<entry>removeBasics</entry>
-								<entry>
-									Option to remove all inference annoations after execution
-									of the rule script.
-								</entry>
-								<entry>
-									Single Boolean
-								</entry>
-							</row>
-							<row>
-								<entry>dynamicAnchoring</entry>
-								<entry>
-									Option to allow rule matches to start at any rule
-									element.
-								</entry>
-								<entry>
-									Single Boolean
-								</entry>
-							</row>
-							<row>
-								<entry>lowMemoryProfile</entry>
-								<entry>
-									Option to decrease the memory consumption when processing
-									a large CAS.
-								</entry>
-								<entry>
-									Single Boolean
-								</entry>
-							</row>
-							<row>
-								<entry>simpleGreedyForComposed</entry>
-								<entry>
-									Option to activate a different inferencer for composed
-									rule elements.
-								</entry>
-								<entry>
-									Single Boolean
-								</entry>
-							</row>
-
-
-							<row>
-								<entry>debug</entry>
-								<entry>
-									Option to add debug information to the CAS.
-								</entry>
-								<entry>
-									Single Boolean
-								</entry>
-							</row>
-							<row>
-								<entry>debugWithMatches</entry>
-								<entry>
-									Option to add information about the rule matches to the
-									CAS.
-								</entry>
-								<entry>
-									Single Boolean
-								</entry>
-							</row>
-							<row>
-								<entry>debugOnlyFor</entry>
-								<entry>
-									List of rule ids. If provided, then debug information is
-									only created for those rules.
-								</entry>
-								<entry>
-									Multi String
-								</entry>
-							</row>
-							<row>
-								<entry>profile</entry>
-								<entry>
-									Option to add profile information to the CAS.
-								</entry>
-								<entry>
-									Single Boolean
-								</entry>
-							</row>
-							<row>
-								<entry>statistics</entry>
-								<entry>
-									Option to add statistics of conditions and actions to the
-									CAS.
-								</entry>
-								<entry>
-									Single Boolean
-								</entry>
-							</row>
-							<row>
-								<entry>createdBy</entry>
-								<entry>
-									Option to add additional information, which rule created
-									a annotation.
-								</entry>
-								<entry>
-									Single Boolean
-								</entry>
-							</row>
-
-
-
-						</tbody>
-					</tgroup>
-				</table>
-			</para>
-			<section id="ugr.tools.tm.ae.parameter.mainScript">
-				<title>mainScript</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.scriptEncoding">
-				<title>scriptEncoding</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.scriptPaths">
-				<title>scriptPaths</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.descriptorPaths">
-				<title>descriptorPaths</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.resourcePaths">
-				<title>resourcePaths</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.additionalScripts">
-				<title>additionalScripts</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.additionalEngines">
-				<title>additionalEngines</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.additionalEngineLoaders">
-				<title>additionalEngineLoaders</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.additionalExtensions">
-				<title>additionalExtensions</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.reloadScript">
-				<title>reloadScript</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.seeders">
-				<title>seeders</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.defaultFilteredTypes">
-				<title>defaultFilteredTypes</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.removeBasics">
-				<title>removeBasics</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.dynamicAnchoring">
-				<title>dynamicAnchoring</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.lowMemoryProfile">
-				<title>lowMemoryProfile</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.simpleGreedyForComposed">
-				<title>simpleGreedyForComposed</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.debug">
-				<title>debug</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.debugWithMatches">
-				<title>debugWithMatches</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.debugOnlyFor">
-				<title>debugOnlyFor</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.profile">
-				<title>profile</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.statistics">
-				<title>statistics</title>
-				<para>
-				</para>
-			</section>
-			<section id="ugr.tools.tm.ae.parameter.createdBy">
-				<title>createdBy</title>
-				<para>
-				</para>
-			</section>
-
-		</section>
-	</section>
+  <title>TextMarker</title>
+  <para>The TextMarker system is an open source tool for the development of rule-based information
+    extraction applications. The development environment is based on the DLTK framework. It supports
+    the knowledge engineer with a full-featured rule editor, components for the explanation of the
+    rule inference, and a build process for generic UIMA Analysis Engines and Type Systems. As a
+    result, TextMarker components can easily be created and flexibly combined with other UIMA
+    components in different information extraction pipelines.
+  </para>
+  <para>TextMarker applies a specialized rule representation language for effective knowledge
+    formalization. The rules of the TextMarker language are composed of a list of rule elements,
+    each of which consists of four parts: The mandatory matching condition establishes a connection
+    to the input document by referring to an already existing concept or annotation. The optional
+    quantifier defines how the matching condition is applied, similar to quantifiers in regular
+    expressions. Additional conditions add constraints to the matched text fragment, and additional
+    actions determine the consequences of the rule. Thus, TextMarker rules match on a pattern of
+    given annotations and, if the additional conditions evaluate to true, execute their actions,
+    e.g., create a new annotation. If no initial annotations exist (for example, ones created by
+    another component), a scanner is used to seed simple token annotations whose types are
+    organized in a taxonomy.
+  </para>
+  <para>The TextMarker system provides functionality that is usually not found in similar systems.
+    The actions can modify the document either by replacing or deleting text fragments or by
+    filtering the view on the document. In the latter case, the rules ignore some annotations,
+    e.g., HTML markup, or are executed only on the remaining text passages. The knowledge engineer
+    can add heuristic knowledge by using scoring rules. Additionally, several language elements
+    common to scripting languages, like conditional statements, loops, procedures, recursion,
+    variables and expressions, increase the expressiveness of the language. Rules can directly
+    invoke external rule sets or arbitrary UIMA Analysis Engines, and foreign libraries can be
+    integrated using the extension mechanism for new language elements.
+  </para>
+  <section id="ugr.tools.tm.introduction.metaphor">
+    <title>Introduction</title>
+    <para> In manual information extraction, humans often apply a strategy according to a
+      highlighter metaphor: First, relevant headlines are considered and classified according to
+      their content by coloring them with different highlighters. The paragraphs of the annotated
+      headlines are then considered further. Relevant text fragments or single words in the context
+      of that headline can then be colored. In this way, a top-down analysis and extraction strategy
+      is implemented. Necessary additional information can then be added that either refers to
+      other text segments or contains valuable domain-specific information. Finally, the colored
+      text can easily be analyzed for the relevant information.
+    </para>
+    <para>The TextMarker system (Textmarker is a common German word for a highlighter) tries to
+      imitate this manual extraction method by formalizing the appropriate actions using matching
+      rules: The rules mark sequences of words, extract text segments, or modify the input document
+      depending on textual features. The default input for the TextMarker system is semi-structured
+      text, but it can also process structured or free text. Technically, HTML is often the input
+      format, since most word processing documents can be converted to HTML. Additionally, the
+      TextMarker system offers the possibility to create a modified output document.
+    </para>
+  </section>
+  <section id="ugr.tools.tm.introduction.concepts">
+    <title>Core Concepts</title>
+    <para>
+      As a first step in the extraction process, the TextMarker system uses a tokenizer (scanner) to
+      tokenize the input document and to create a stream of basic symbols. The types and valid
+      annotations of the possible tokens are predefined by a taxonomy of annotation types.
+      Annotations simply refer to a section of the input document and assign a type or concept to
+      the respective text fragment. The following figure shows an excerpt of a basic annotation
+      taxonomy: For example, CW describes all tokens that contain a single word starting with a
+      capital letter, MARKUP corresponds to HTML or XML tags, and PM refers to all kinds of
+      punctuation marks. See the list of basic annotations for a complete list of initial
+      annotations.
+      <screenshot>
+        <mediaobject>
+          <imageobject>
+            <imagedata scale="80" format="PNG" fileref="&imgroot;symboltaxo.png" />
+          </imageobject>
+          <textobject>
+            <phrase>Part of a taxonomy for basic annotation types.</phrase>
+          </textobject>
+        </mediaobject>
+      </screenshot>
+      By using (and extending) the taxonomy, the knowledge engineer is able to choose the most
+      adequate types and concepts when defining new matching rules, i.e., TextMarker rules for
+      matching a text fragment given by a set of symbols to an annotation. If, for example, the
+      capitalization of a word is of no importance, then the annotation type W, which describes
+      words of any kind, can be used. The initial scanner creates a set of basic annotations that
+      may be used by the matching rules of the TextMarker language. However, most information
+      extraction applications require domain-specific concepts and annotations. Therefore, the
+      knowledge engineer is able to extend the set of annotations and to define new annotation types
+      tuned to the requirements of the given domain. These types can be flexibly integrated into the
+      taxonomy of annotation types.
+    </para>
+    <para>
+      One of the goals in developing a new information extraction language was to maintain an
+      easily readable syntax while still providing a scalable expressiveness of the language.
+      Basically, the TextMarker language contains expressions for the definition of new annotation
+      types and for defining new matching rules. A rule is defined by a list of rule elements. Each
+      rule element contains at least a basic matching condition referring to text fragments or
+      already specified annotations. Additionally, a list of conditions and actions may be
+      specified for a rule element. Whereas the conditions describe necessary attributes of the
+      matched text fragment, the actions specify operations and assignments on the current
+      fragments. The actions are only executed if the basic matching conditions of all rule
+      elements match on a text fragment or annotation and all related conditions are fulfilled.
+    </para>
+  </section>
+  <section id="ugr.tools.tm.introduction.examples">
+    <title>Examples</title>
+    <para>
+      The usage of the language and its readability can be demonstrated by simple examples:
+      <programlisting><![CDATA[CW{INLIST('animals.txt') -> MARK(Animal)};
+Animal "and" Animal{-> MARK(Animalpair, 1, 2, 3)};]]></programlisting>
+      The first rule looks at all capitalized words that are listed in the external document
+      animals.txt and creates a new annotation of the type Animal using the boundaries of the
+      matched word. The second rule searches for an Animal annotation followed by the literal "and"
+      and a second Animal annotation. It then creates a new annotation of the type Animalpair
+      covering the text segment that matched all three rule elements (the numerical parameters refer
+      to the indexes of the matched rule elements).
+      <programlisting><![CDATA[Document{-> MARKFAST(Firstname, 'firstnames.txt')};
+Firstname CW{-> MARK(Lastname)};
+Paragraph{VOTE(Firstname, Lastname) -> LOG("Found more Firstnames than Lastnames")};]]></programlisting>
+      In this example, the first rule annotates all words that occur in the external document
+      firstnames.txt with the type Firstname. The second rule creates a Lastname annotation for all
+      capitalized words that follow a Firstname annotation. Finally, the last rule processes all
+      Paragraph annotations. If the VOTE condition counts more Firstname than Lastname annotations,
+      then the rule writes a log entry with a predefined message.
+      <programlisting><![CDATA[ANY+{PARTOF(Paragraph), CONTAINS(Delete, 50, 100, true) -> MARK(Delete)};
+Firstname{-> MARK(Delete, 1, 2)} Lastname;
+Delete{-> DEL};]]></programlisting>
+      Here, the first rule looks for sequences of any kind of tokens except markup and creates one
+      annotation of the type Delete for each sequence, if the tokens are part of a Paragraph
+      annotation and together already contain at least 50% Delete annotations. The + sign
+      indicates this greedy processing. The second rule annotates first names followed by last names
+      with the type Delete, and the third rule simply deletes all text segments that are associated
+      with a Delete annotation.
+    </para>
+  </section>
+  <section id="ugr.tools.tm.introduction.features">
+    <title>Special Features</title>
+    <para> The TextMarker language features some special characteristics that are usually not found
+      in other rule-based information extraction systems and that even shift it towards scripting
+      languages. The possibility of creating new annotation types and integrating them into the
+      taxonomy facilitates an even more modular development of information extraction systems.
+      Further special features are robust extraction using filtering, complex control structures,
+      and heuristic extraction using scoring rules.
+    </para>
+  </section>
+  <section id="ugr.tools.tm.introduction.getstarted">
+    <title>Get started</title>
+    <para> This section gives a short, technical introduction on how to get started with the
+      TextMarker system. Some knowledge about the usage of Eclipse and the central concepts of UIMA
+      is useful. TextMarker consists of the TextMarker rule language (and of course the rule
+      inference) and the TextMarker Workbench. Additionally, the CEV plugin is used to edit and
+      visualize annotated text. The TextRuler system, which implements well-known rule learning
+      methods, and a development extension supporting test-driven development are already
+      integrated.
+    </para>
+    <section id="ugr.tools.tm.introduction.getstarted.running">
+      <title>Up and running</title>
+      <para> First of all, install the Workbench and read the introduction and its examples. In
+        order to verify that the Workbench is correctly installed, take a look at Help &gt; About
+        Eclipse &gt; Installation Details and compare the installed plugins with the plugins you
+        copied into the plugins folder of your Eclipse application. Normally, most of the plugins do
+        not cause any trouble, but the CEV plugin does, because of its XPCom and XULRunner
+        dependencies. You should at least get the XPCom plugin up and running. However, you cannot
+        use the additional HTML functionality without the XULRunner plugin. If the plugins of the
+        installation guide do not work properly and a web search for a suitable plugin is not
+        successful, then write a mail to the user mailing list and we will try to solve the problem.
+        If all plugins are correctly installed, then start the Eclipse application and switch to the
+        TextMarker perspective (Window &gt; Open Perspective &gt; Other...).
+      </para>
+    </section>
+    <section id="ugr.tools.tm.introduction.getstarted.example">
+      <title>Learn by example</title>
+      <para> Once the Workbench is running, download the example project and import or copy this
+        TextMarker project into your workspace. The project contains some simple rules for extracting
+        the author, title and year of reference strings. Next, take a look at the project structure
+        and the syntax, and compare them with the example project and its contents. Open the Main.tm
+        TextMarker script in the folder script/de.uniwue.example and press the Run button in the
+        Eclipse toolbar. The documents in the input folder are then processed by the Main.tm file,
+        and the result of the information extraction task is placed in the output folder. As you can
+        see, there are four files: an XMI CAS for each input file and an HTML file (the
+        modified/colored result). Open one of the .xmi files with the CAS Editor plugin (popup menu
+        &gt; Open With) and select some checkboxes in the Annotation Browser view.
+      </para>
+    </section>
+    <section id="ugr.tools.tm.introduction.getstarted.doit">
+      <title>Do it yourself</title>
+      <para> Try to write some rules yourself. Read the description of the available language
+        constructs, e.g., conditions and actions, and use the explanation component in order to take
+        a closer look at the rule inference. Finally, read the rest of this document.
+      </para>
+    </section>
+  </section>
+  <section id="ugr.tools.tm.ae">
+    <title>TextMarker Analysis Engine</title>
+    <para> This section describes the TextMarker Analysis Engine and other Analysis Engines.</para>
+    <section id="ugr.tools.tm.ae.parameter">
+      <title>Configuration Parameters</title>
+      <para>
+        The configuration parameters of the TextMarker analysis engines can be separated into three
+        different groups: parameters for the setup of the environment
+        (<link linkend='ugr.tools.tm.ae.parameter.mainScript'>mainScript</link> to
+        <link linkend='ugr.tools.tm.ae.parameter.additionalExtensions'>additionalExtensions</link>),
+        parameters that change the behavior of the analysis engine
+        (<link linkend='ugr.tools.tm.ae.parameter.reloadScript'>reloadScript</link> to
+        <link linkend='ugr.tools.tm.ae.parameter.simpleGreedyForComposed'>simpleGreedyForComposed</link>),
+        and parameters for creating additional information about how the rules were executed
+        (<link linkend='ugr.tools.tm.ae.parameter.debug'>debug</link> to
+        <link linkend='ugr.tools.tm.ae.parameter.createdBy'>createdBy</link>). First, a short
+        overview of the configuration parameters is given in
+        <xref linkend='table.ugr.tools.tm.ae.parameter' />. Then, all parameters are described in
+        detail with examples.
+        <table id="table.ugr.tools.tm.ae.parameter" frame="all">
+          <title>Configuration parameters of the TextMarker Analysis Engine</title>
+          <tgroup cols="3" colsep="1" rowsep="1">
+            <colspec colname="c1" colwidth="1.2*" />
+            <colspec colname="c2" colwidth="2*" />
+            <colspec colname="c3" colwidth="0.8*" />
+            <thead>
+              <row>
+                <entry align="center">Name</entry>
+                <entry align="center">Short description</entry>
+                <entry align="center">Type</entry>
+              </row>
+            </thead>
+            <tbody>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.mainScript'>mainScript</link>
+                </entry>
+                <entry>Name with complete namespace of the script which will be interpreted and
+                  executed by the analysis engine.
+                </entry>
+                <entry>Single String</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.scriptEncoding'>scriptEncoding</link>
+                </entry>
+                <entry>Encoding of all TextMarker script files.</entry>
+                <entry>Single String</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.scriptPaths'>scriptPaths</link>
+                </entry>
+                <entry>List of absolute locations, which contain the necessary script files like
+                  the main script.
+                </entry>
+                <entry>Multi String</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.descriptorPaths'>descriptorPaths</link>
+                </entry>
+                <entry>List of absolute locations, which contain the necessary descriptor files
+                  like type systems.
+                </entry>
+                <entry>Multi String</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.resourcePaths'>resourcePaths</link>
+                </entry>
+                <entry>List of absolute locations, which contain the necessary resource files like
+                  word lists.
+                </entry>
+                <entry>Multi String</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.additionalScripts'>additionalScripts</link>
+                </entry>
+                <entry>List of names with complete namespace of additional scripts, which can be
+                  referred to.
+                </entry>
+                <entry>Multi String</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.additionalEngines'>additionalEngines</link>
+                </entry>
+                <entry>List of names with complete namespace of additional analysis engines, which
+                  can be called by TextMarker rules.
+                </entry>
+                <entry>Multi String</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.additionalEngineLoaders'>additionalEngineLoaders</link>
+                </entry>
+                <entry>List of class names of implementations that are able to perform additional
+                  tasks when loading external analysis engines.
+                </entry>
+                <entry>Multi String</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.additionalExtensions'>additionalExtensions</link>
+                </entry>
+                <entry>List of factory classes for additional extensions of the TextMarker language
+                  like proprietary conditions.
+                </entry>
+                <entry>Multi String</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.reloadScript'>reloadScript</link>
+                </entry>
+                <entry>Option to initialize the rule script each time the analysis engine processes
+                  a CAS.
+                </entry>
+                <entry>Single Boolean</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.seeders'>seeders</link>
+                </entry>
+                <entry>List of class names that provide additional annotations before the rules are
+                  executed.
+                </entry>
+                <entry>Multi String</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.defaultFilteredTypes'>defaultFilteredTypes</link>
+                </entry>
+                <entry>List of complete type names of annotations that are invisible by default.
+                </entry>
+                <entry>Multi String</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.removeBasics'>removeBasics</link>
+                </entry>
+                <entry>Option to remove all inference annotations after execution of the rule script.
+                </entry>
+                <entry>Single Boolean</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.dynamicAnchoring'>dynamicAnchoring</link>
+                </entry>
+                <entry>Option to allow rule matches to start at any rule element.</entry>
+                <entry>Single Boolean</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.lowMemoryProfile'>lowMemoryProfile</link>
+                </entry>
+                <entry>Option to decrease the memory consumption when processing a large CAS.
+                </entry>
+                <entry>Single Boolean</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.simpleGreedyForComposed'>simpleGreedyForComposed</link>
+                </entry>
+                <entry>Option to activate a different inferencer for composed rule elements.</entry>
+                <entry>Single Boolean</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.debug'>debug</link>
+                </entry>
+                <entry>Option to add debug information to the CAS.</entry>
+                <entry>Single Boolean</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.debugWithMatches'>debugWithMatches</link>
+                </entry>
+                <entry>Option to add information about the rule matches to the CAS.</entry>
+                <entry>Single Boolean</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.debugOnlyFor'>debugOnlyFor</link>
+                </entry>
+                <entry>List of rule ids. If provided, then debug information is only created for
+                  those rules.
+                </entry>
+                <entry>Multi String</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.profile'>profile</link>
+                </entry>
+                <entry>Option to add profile information to the CAS.</entry>
+                <entry>Single Boolean</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.statistics'>statistics</link>
+                </entry>
+                <entry>Option to add statistics of conditions and actions to the CAS.</entry>
+                <entry>Single Boolean</entry>
+              </row>
+              <row>
+                <entry>
+                  <link linkend='ugr.tools.tm.ae.parameter.createdBy'>createdBy</link>
+                </entry>
+                <entry>Option to add additional information about which rule created an annotation.
+                </entry>
+                <entry>Single Boolean</entry>
+              </row>
+
+            </tbody>
+          </tgroup>
+        </table>
+      </para>
+      <section id="ugr.tools.tm.ae.parameter.mainScript">
+        <title>mainScript</title>
+        <para>
+          This parameter specifies the rule file that will be executed by the analysis engine and is
+          therefore one of the most important ones. The exact name of the script is given by the complete namespace of the file, which corresponds to its location
+          relative to the given parameter <link linkend='ugr.tools.tm.ae.parameter.scriptPaths'>scriptPaths</link>.
+          The single names of packages (or folders) are separated by periods. An exemplary value for this parameter could be "org.apache.uima.Main",
+          where "Main" specifies the file containing the rules and "org.apache.uima" its package.
+          In this case, the analysis engine loads the script file "Main.tm", which is located in the folder structure "org/apache/uima/".
+          This parameter has no default value and has to be provided, although it is not specified as mandatory.
+        </para>
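+        <para>
+          The following fragment sketches how the example value above could be set in the
+          configurationParameterSettings element of a UIMA analysis engine descriptor. It is only a
+          minimal illustration under the assumption that the parameter is configured in the
+          descriptor. All other elements of the descriptor are omitted.
+          <programlisting><![CDATA[<nameValuePair>
+  <name>mainScript</name>
+  <value>
+    <!-- resolved to org/apache/uima/Main.tm below a location given in scriptPaths -->
+    <string>org.apache.uima.Main</string>
+  </value>
+</nameValuePair>]]></programlisting>
+        </para>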
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.scriptEncoding">
+        <title>scriptEncoding</title>
+        <para>
+          This parameter specifies the encoding of the rule files. Its default value is "UTF-8".
+        </para>
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.scriptPaths">
+        <title>scriptPaths</title>
+        <para>
+          The parameter scriptPaths refers to a list of String values, which specify the possible locations of script files.
+          The given locations are absolute paths. A typical value for this parameter is "C:/TextMarker/MyProject/script/".
+          If the parameter <link linkend='ugr.tools.tm.ae.parameter.mainScript'>mainScript</link> is set to org.apache.uima.Main,
+          then the absolute path of the script file has to be "C:/TextMarker/MyProject/script/org/apache/uima/Main.tm".
+          This parameter can contain multiple values, since the main script can refer to multiple projects, similar to a class path in Java.
+        </para>
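+        <para>
+          Multi-valued parameters like scriptPaths are given as an array in the analysis engine
+          descriptor. The following sketch combines the example location above with a second,
+          hypothetical project location (C:/TextMarker/OtherProject/script/). This second location is
+          only an assumption that illustrates the class-path-like usage.
+          <programlisting><![CDATA[<nameValuePair>
+  <name>scriptPaths</name>
+  <value>
+    <array>
+      <!-- contains org/apache/uima/Main.tm for mainScript = org.apache.uima.Main -->
+      <string>C:/TextMarker/MyProject/script/</string>
+      <!-- hypothetical second project referenced by the main script -->
+      <string>C:/TextMarker/OtherProject/script/</string>
+    </array>
+  </value>
+</nameValuePair>]]></programlisting>
+        </para>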
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.descriptorPaths">
+        <title>descriptorPaths</title>
+        <para>
+          This parameter specifies the possible locations for descriptors like analysis engines or type systems, similar to the parameter
+          <link linkend='ugr.tools.tm.ae.parameter.scriptPaths'>scriptPaths</link> for the script files. A typical value for this parameter
+          is "C:/TextMarker/MyProject/descriptor/".
+          The relative values of the parameter <link linkend='ugr.tools.tm.ae.parameter.additionalEngines'>additionalEngines</link> are
+          resolved against these absolute locations.
+          This parameter can contain multiple values, since the main script can refer to multiple projects, similar to a class path in Java.
+        </para>
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.resourcePaths">
+        <title>resourcePaths</title>
+        <para>
+          This parameter specifies the possible locations of additional resources like word lists or CSV tables. The string values have to contain absolute
+          locations, for example, "C:/TextMarker/MyProject/resources/".
+        </para>
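+        <para>
+          A corresponding descriptor setting could look like the following sketch. The comment in the
+          fragment assumes that word lists referenced by rules, such as animals.txt in the introductory
+          examples, are looked up in these locations.
+          <programlisting><![CDATA[<nameValuePair>
+  <name>resourcePaths</name>
+  <value>
+    <array>
+      <!-- assumed location of word lists like animals.txt used by INLIST -->
+      <string>C:/TextMarker/MyProject/resources/</string>
+    </array>
+  </value>
+</nameValuePair>]]></programlisting>
+        </para>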
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.additionalScripts">
+        <title>additionalScripts</title>
+        <para>
+          The parameter additionalScripts is defined as a list of string values and contains script files, 
+          which are additionally loaded by the analysis engine. These script files are specified by their 
+          complete namespace, exactly like the value of the parameter <link linkend='ugr.tools.tm.ae.parameter.mainScript'>mainScript</link> 
+          and can be referred to by language elements, e.g., by executing the rules they contain. An exemplary 
+          value of this parameter is "org.apache.uima.SecondaryScript". In this example, the main script could import 
+          this script file by the declaration "SCRIPT org.apache.uima.SecondaryScript;" and then could execute it with the rule 
+          "Document{-> CALL(SecondaryScript)};". 
+        </para>
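+        <para>
+          As a sketch, the exemplary value above could be declared in the analysis engine descriptor
+          as follows, assuming the parameter is configured in the descriptor. The secondary script is
+          then resolved like the main script, relative to the locations in scriptPaths.
+          <programlisting><![CDATA[<nameValuePair>
+  <name>additionalScripts</name>
+  <value>
+    <array>
+      <!-- imported in the main script with: SCRIPT org.apache.uima.SecondaryScript; -->
+      <string>org.apache.uima.SecondaryScript</string>
+    </array>
+  </value>
+</nameValuePair>]]></programlisting>
+        </para>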
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.additionalEngines">
+        <title>additionalEngines</title>
+        <para>
+          This parameter contains a list of additional analysis engines, which can be executed by the TextMarker rules. The single values
+          are given by the name of the analysis engine with its complete namespace and have to be located relative to one value of the parameter
+          <link linkend='ugr.tools.tm.ae.parameter.descriptorPaths'>descriptorPaths</link>, the locations where the analysis engine searches for the descriptor file.
+          An example for one value of the parameter is "utils.HtmlAnnotator", which points to the descriptor "HtmlAnnotator.xml" in the folder "utils".
+        </para>
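+        <para>
+          The interplay of descriptorPaths and additionalEngines can be sketched with the example
+          values of both parameters. This is a minimal descriptor fragment, and the comment only
+          restates the resolution described above.
+          <programlisting><![CDATA[<nameValuePair>
+  <name>descriptorPaths</name>
+  <value>
+    <array>
+      <string>C:/TextMarker/MyProject/descriptor/</string>
+    </array>
+  </value>
+</nameValuePair>
+<nameValuePair>
+  <name>additionalEngines</name>
+  <value>
+    <array>
+      <!-- resolved to C:/TextMarker/MyProject/descriptor/utils/HtmlAnnotator.xml -->
+      <string>utils.HtmlAnnotator</string>
+    </array>
+  </value>
+</nameValuePair>]]></programlisting>
+        </para>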
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.additionalEngineLoaders">
+        <title>additionalEngineLoaders</title>
+        <para>
+          The parameter "additionalEngineLoaders" specifies a list of optional implementations of the interface 
+          "org.apache.uima.textmarker.extensions.IEngineLoader", which can be used for application-specific configurations of
+          additional analysis engines.
+        </para>
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.additionalExtensions">
+        <title>additionalExtensions</title>
+        <para>
+          This parameter specifies optional extensions of the TextMarker language. The elements of the string list must be names of classes that implement the interface 
+          "org.apache.uima.textmarker.extensions.ITextMarkerExtension". With these extensions, application-specific conditions and actions can be
+          added to the set of provided ones.
+        </para>
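+        <para>
+          Both additionalEngineLoaders and additionalExtensions take fully qualified class names. In
+          the following sketch, the classes com.example.textmarker.MyEngineLoader and
+          com.example.textmarker.MyExtension are hypothetical placeholders for application-specific
+          implementations of the interfaces named above. They are not part of TextMarker.
+          <programlisting><![CDATA[<nameValuePair>
+  <name>additionalEngineLoaders</name>
+  <value>
+    <array>
+      <!-- hypothetical implementation of org.apache.uima.textmarker.extensions.IEngineLoader -->
+      <string>com.example.textmarker.MyEngineLoader</string>
+    </array>
+  </value>
+</nameValuePair>
+<nameValuePair>
+  <name>additionalExtensions</name>
+  <value>
+    <array>
+      <!-- hypothetical implementation of org.apache.uima.textmarker.extensions.ITextMarkerExtension -->
+      <string>com.example.textmarker.MyExtension</string>
+    </array>
+  </value>
+</nameValuePair>]]></programlisting>
+        </para>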
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.reloadScript">
+        <title>reloadScript</title>
+        <para>
+          This boolean parameter indicates whether the script or resource files should be reloaded when processing a CAS. The default value is set to false. 
+          In this case, the script files are loaded when the analysis engine is initialized. If script files or resource files are extended while a collection
+          of documents is being processed, e.g., if a dictionary is filled, then the parameter needs to be set to true in order to include the changes.
+        </para>
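+        <para>
+          For the dictionary scenario described above, the parameter could be enabled in the descriptor
+          as in this minimal sketch. Because the files are reloaded for each CAS, this setting
+          presumably only makes sense while the resources are actually changing.
+          <programlisting><![CDATA[<nameValuePair>
+  <name>reloadScript</name>
+  <value>
+    <!-- reload script and resource files each time a CAS is processed -->
+    <boolean>true</boolean>
+  </value>
+</nameValuePair>]]></programlisting>
+        </para>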
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.seeders">
+        <title>seeders</title>
+        <para>
+          This list of string values refers to implementations of the interface "org.apache.uima.textmarker.seed.TextMarkerAnnotationSeeder", 
+          which can be used to automatically add annotations to the CAS. The default value of the parameter is a single seeder, namely "org.apache.uima.textmarker.seed.DefaultSeeder",
+          which adds annotations for token classes like CW, MARKUP or SEMICOLON. Remember that additional annotations can also be added with 
+          an additional engine that is executed by a TextMarker rule.
+        </para>
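+        <para>
+          The following sketch adds a second seeder. It lists the default seeder explicitly, assuming
+          that a configured value replaces the default rather than extending it. The class
+          com.example.textmarker.MySeeder is a hypothetical placeholder.
+          <programlisting><![CDATA[<nameValuePair>
+  <name>seeders</name>
+  <value>
+    <array>
+      <!-- default seeder: token classes like CW, MARKUP or SEMICOLON -->
+      <string>org.apache.uima.textmarker.seed.DefaultSeeder</string>
+      <!-- hypothetical implementation of
+           org.apache.uima.textmarker.seed.TextMarkerAnnotationSeeder -->
+      <string>com.example.textmarker.MySeeder</string>
+    </array>
+  </value>
+</nameValuePair>]]></programlisting>
+        </para>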
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.defaultFilteredTypes">
+        <title>defaultFilteredTypes</title>
+        <para>
+          This parameter specifies a list of types, which are filtered by default when executing a script file. Using the default values of this parameter,
+          whitespaces, line breaks and markup elements are not visible to TextMarker rules. The visibility of annotations and therefore the covered text can be changed
+          using the actions <link linkend='ugr.tools.tm.language.actions.filtertype'>FILTERTYPE</link> and 
+          <link linkend='ugr.tools.tm.language.actions.retaintype'>RETAINTYPE</link>.
+        </para>
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.removeBasics">
+        <title>removeBasics</title>
+        <para>
+          This parameter specifies whether the inference annotations created by the analysis engine should be removed after processing the CAS.
+          The default value is set to false.
+        </para>
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.dynamicAnchoring">
+        <title>dynamicAnchoring</title>
+        <para>
+          If this parameter is set to true, then the TextMarker rules are not forced to start to match with the first rule element. 
+          Rather, the rule element referring to the rarest type is chosen. Therefore, this option can be utilized to optimize the performance.
+          Note that the matching result can vary in some cases when greedy rule elements are applied.
+          The default value is set to false.
+        </para>
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.lowMemoryProfile">
+        <title>lowMemoryProfile</title>
+        <para>
+          This parameter specifies whether the memory consumption should be reduced. This parameter should be set to true for 
+          very large CAS documents (e.g., > 500k tokens), but it also reduces the performance. The default value is set to false.
+        </para>
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.simpleGreedyForComposed">
+        <title>simpleGreedyForComposed</title>
+        <para>
+          This parameter specifies whether a different inference strategy for composed rule elements should be applied. This option is only necessary
+          if the composed rule element is expected to match very often, e.g., a rule element like (ANY ANY).
+          The default value of this parameter is set to false.
+        </para>
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.debug">
+        <title>debug</title>
+        <para>
+          If this parameter is set to true, then additional information about the execution of a rule script is added to the CAS.
+          The actual information is specified by the following parameters.
+          The default value of this parameter is set to false.
+        </para>
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.debugWithMatches">
+        <title>debugWithMatches</title>
+        <para>
+          This parameter specifies whether the match information (covered text) of the rules should be stored in the CAS.
+          The default value of this parameter is set to false.
+        </para>
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.debugOnlyFor">
+        <title>debugOnlyFor</title>
+        <para>
+          This parameter specifies a list of rule ids that enumerate the rules for which debug information should be created. 
+          No specific ids are given by default.
+        </para>
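+        <para>
+          The debug parameters can be combined as in the following sketch, which restricts the debug
+          information, including the rule matches, to selected rules. The rule ids 5 and 12 are
+          hypothetical placeholders. The actual ids depend on the executed script.
+          <programlisting><![CDATA[<nameValuePair>
+  <name>debug</name>
+  <value><boolean>true</boolean></value>
+</nameValuePair>
+<nameValuePair>
+  <!-- also store the covered text of the rule matches -->
+  <name>debugWithMatches</name>
+  <value><boolean>true</boolean></value>
+</nameValuePair>
+<nameValuePair>
+  <name>debugOnlyFor</name>
+  <value>
+    <array>
+      <!-- hypothetical rule ids; debug information is only created for these rules -->
+      <string>5</string>
+      <string>12</string>
+    </array>
+  </value>
+</nameValuePair>]]></programlisting>
+        </para>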
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.profile">
+        <title>profile</title>
+        <para>
+          If this parameter is set to true, then additional information about the runtime of applied rules is added to the CAS.
+          The default value of this parameter is set to false.
+        </para>
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.statistics">
+        <title>statistics</title>
+        <para>
+          If this parameter is set to true, then additional information about the runtime of TextMarker language elements like conditions and actions
+          is added to the CAS.
+          The default value of this parameter is set to false.
+        </para>
+      </section>
+      <section id="ugr.tools.tm.ae.parameter.createdBy">
+        <title>createdBy</title>
+        <para>
+          If this parameter is set to true, then additional information is added to the CAS about what annotation was created by which rule.
+          The default value of this parameter is set to false.
+        </para>
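+        <para>
+          The parameters profile, statistics and createdBy could be enabled together as in the
+          following sketch. The fragment also enables debug. This assumes that these options refine
+          the debug information and therefore require it, as the description of the parameter debug
+          suggests.
+          <programlisting><![CDATA[<nameValuePair>
+  <!-- assumed to be required by the following options -->
+  <name>debug</name>
+  <value><boolean>true</boolean></value>
+</nameValuePair>
+<nameValuePair>
+  <!-- runtime of applied rules -->
+  <name>profile</name>
+  <value><boolean>true</boolean></value>
+</nameValuePair>
+<nameValuePair>
+  <!-- runtime of conditions and actions -->
+  <name>statistics</name>
+  <value><boolean>true</boolean></value>
+</nameValuePair>
+<nameValuePair>
+  <!-- which rule created which annotation -->
+  <name>createdBy</name>
+  <value><boolean>true</boolean></value>
+</nameValuePair>]]></programlisting>
+        </para>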
+      </section>
+    </section>
+  </section>
 </chapter>
\ No newline at end of file