Posted to commits@uima.apache.org by pk...@apache.org on 2013/03/05 16:27:47 UTC
svn commit: r1452847 [3/4] - in /uima/site/trunk/uima-website/docs/d:
textmarker-2.0.0/ textmarker-2.0.0/issuesFixed/
textmarker-2.0.0/issuesFixed/css/ textmarker-2.0.0/issuesFixed/images/
textmarker-2.0.0/issuesFixed/images/logos/ textmarker-current/ ...
Added: uima/site/trunk/uima-website/docs/d/textmarker-current/tools.textmarker.book.html
URL: http://svn.apache.org/viewvc/uima/site/trunk/uima-website/docs/d/textmarker-current/tools.textmarker.book.html?rev=1452847&view=auto
==============================================================================
--- uima/site/trunk/uima-website/docs/d/textmarker-current/tools.textmarker.book.html (added)
+++ uima/site/trunk/uima-website/docs/d/textmarker-current/tools.textmarker.book.html Tue Mar 5 15:27:45 2013
@@ -0,0 +1,5783 @@
+<html><head>
+ <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
+ <title>Apache UIMA TextMarker Guide and Reference</title><link rel="stylesheet" type="text/css" href="css/stylesheet-html.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div lang="en" class="book" title="Apache UIMA TextMarker Guide and Reference" id="d5e1"><div xmlns:d="http://docbook.org/ns/docbook" class="titlepage"><div><div><h1 class="title">Apache UIMA TextMarker Guide and Reference</h1></div><div><div class="authorgroup">
+ <h3 class="corpauthor">Written and maintained by the Apache UIMA Development Community</h3>
+ </div></div><div><p class="releaseinfo">Version 2.0.0</p></div><div><p class="copyright">Copyright © 2006, 2013 The Apache Software Foundation</p></div><div><div class="legalnotice" title="Legal Notice"><a name="d5e8"></a>
+ <p> </p>
+ <p title="License and Disclaimer">
+ <b>License and Disclaimer. </b>
+
+ The ASF licenses this documentation
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this documentation except in compliance
+ with the License. You may obtain a copy of the License at
+
+ </p><div class="blockquote"><blockquote class="blockquote">
+ <a class="ulink" href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a>
+ </blockquote></div><p title="License and Disclaimer">
+
+ Unless required by applicable law or agreed to in writing,
+ this documentation and its contents are distributed under the License
+ on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+ </p>
+ <p> </p>
+ <p> </p>
+ <p title="Trademarks">
+ <b>Trademarks. </b>
+ All terms mentioned in the text that are known to be trademarks or
+ service marks have been appropriately capitalized. Use of such terms
+ in this book should not be regarded as affecting the validity of
+ the trademark or service mark.
+
+ </p>
+ </div></div><div><p class="pubdate">February, 2013</p></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="chapter"><a href="#ugr.tools.tm.overview">1. Apache UIMA TextMarker Overview</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.tm.overview.intro">1.1. What is Apache UIMA TextMarker?</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.overview.gettingstarted">1.2. Getting started</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.overview.coreconcepts">1.3. Core Concepts</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.overview.examples">1.4. Learning by Example</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.ae">1.5. UIMA Analysis Engines</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.tm.ae.basic">1.5.1. TextMarker Engine</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.ae.annotationwriter">1.5.2. Annotation Writer</a
></span></dt><dt><span class="section"><a href="#ugr.tools.tm.ae.plaintext">1.5.3. Plain Text Annotator</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.ae.modifier">1.5.4. Modifier</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.ae.html">1.5.5. HMTL Annotator</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.ae.stylemap">1.5.6. Style Map Creator</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.ae.xmi">1.5.7. XMI Writer</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#ugr.tools.tm.language.language">2. Apache UIMA TextMarker Language</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.tm.language.syntax">2.1. Syntax</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.seeding">2.2. Basic annotations and tokens</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.quantifier">2.3. Quantifiers</a></span></dt><dd><dl><dt><span class="section"
><a href="#ugr.tools.tm.language.quantifier.sg">2.3.1. * Star Greedy</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.quantifier.sr">2.3.2. *? Star Reluctant</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.quantifier.pg">2.3.3. + Plus Greedy</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.quantifier.pr">2.3.4. +? Plus Reluctant</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.quantifier.qg">2.3.5. ? Question Greedy</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.quantifier.qr">2.3.6. ?? Question Reluctant</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.quantifier.mmg">2.3.7. [x,y] Min Max Greedy</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.quantifier.mmr">2.3.8. [x,y]? Min Max Reluctant</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.tm.language.declarations">2.4. Declarations</a
></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.tm.language.declarations.type">2.4.1. Types</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.declarations.variable">2.4.2. Variables</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.declarations.ressource">2.4.3. Resources</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.declarations.scripts">2.4.4. Scripts</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.declarations.components">2.4.5. Components</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.tm.language.expressions">2.5. Expressions</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.tm.language.expressions.type">2.5.1. Type Expressions</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.expressions.number">2.5.2. Number Expressions</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.langua
ge.expressions.string">2.5.3. String Expressions</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.expressions.boolean">2.5.4. Boolean Expressions</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.expressions.lists">2.5.5. List Expressions</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.tm.language.conditions">2.6. Conditions</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.after">2.6.1. AFTER</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.and">2.6.2. AND</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.before">2.6.3. BEFORE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.contains">2.6.4. CONTAINS</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.contextcount">2.6.5. CONTEXTCOUNT</a></span></dt><dt><span class="section"><a hr
ef="#ugr.tools.tm.language.conditions.count">2.6.6. COUNT</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.currentcount">2.6.7. CURRENTCOUNT</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.endswith">2.6.8. ENDSWITH</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.feature">2.6.9. FEATURE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.if">2.6.10. IF</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.inlist">2.6.11. INLIST</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.is">2.6.12. IS</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.last">2.6.13. LAST</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.mofn">2.6.14. MOFN</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.
near">2.6.15. NEAR</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.not">2.6.16. NOT</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.or">2.6.17. OR</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.parse">2.6.18. PARSE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.partof">2.6.19. PARTOF</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.partofneq">2.6.20. PARTOFNEQ</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.position">2.6.21. POSITION</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.regexp">2.6.22. REGEXP</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.score">2.6.23. SCORE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.size">2.6.24. SIZE</a></span></dt><dt>
<span class="section"><a href="#ugr.tools.tm.language.conditions.startswith">2.6.25. STARTSWITH</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.totalcount">2.6.26. TOTALCOUNT</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.conditions.vote">2.6.27. VOTE</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.tm.language.actions">2.7. Actions</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.tm.language.actions.add">2.7.1. ADD</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.addfiltertype">2.7.2. ADDFILTERTYPE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.addretaintype">2.7.3. ADDRETAINTYPE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.assign">2.7.4. ASSIGN</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.call">2.7.5. CALL</a></span></dt><dt><span clas
s="section"><a href="#ugr.tools.tm.language.actions.clear">2.7.6. CLEAR</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.color">2.7.7. COLOR</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.configure">2.7.8. CONFIGURE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.create">2.7.9. CREATE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.del">2.7.10. DEL</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.dynamicanchoring">2.7.11. DYNAMICANCHORING</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.exec">2.7.12. EXEC</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.fill">2.7.13. FILL</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.filtertype">2.7.14. FILTERTYPE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.langua
ge.actions.gather">2.7.15. GATHER</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.get">2.7.16. GET</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.getfeature">2.7.17. GETFEATURE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.getlist">2.7.18. GETLIST</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.log">2.7.19. LOG</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.mark">2.7.20. MARK</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.markfast">2.7.21. MARKFAST</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.marklast">2.7.22. MARKLAST</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.markonce">2.7.23. MARKONCE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.markscore">2.7.24. MARKSCORE</a></sp
an></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.marktable">2.7.25. MARKTABLE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.matchedtext">2.7.26. MATCHEDTEXT</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.merge">2.7.27. MERGE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.remove">2.7.28. REMOVE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.removeduplicate">2.7.29. REMOVEDUPLICATE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.removefiltertype">2.7.30. REMOVEFILTERTYPE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.removeretaintype">2.7.31. REMOVERETAINTYPE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.replace">2.7.32. REPLACE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.retaintyp
e">2.7.33. RETAINTYPE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.setfeature">2.7.34. SETFEATURE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.shift">2.7.35. SHIFT</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.transfer">2.7.36. TRANSFER</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.trie">2.7.37. TRIE</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.trim">2.7.38. TRIM</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.unmark">2.7.39. UNMARK</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.actions.unmarkall">2.7.40. UNMARKALL</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.tm.language.filtering">2.8. Robust extraction using filtering</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.blocks">2.9. Blocks</a></
span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.tm.language.blocks.condition">2.9.1. Conditioned statements</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.blocks.foreach">2.9.2.
+ <span class="quote">“<span class="quote">Foreach</span>”</span>
+ -Loops
+ </a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.blocks.procedure">2.9.3. Procedures</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.tm.language.score">2.10. Heuristic extraction using scoring rules</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.modification">2.11. Modification</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.language.external_resources">2.12. External resources</a></span></dt><dd><dl><dt><span class="section"><a href="#d5e2077">2.12.1. WORDLISTs</a></span></dt><dt><span class="section"><a href="#d5e2097">2.12.2. WORDTABLEs</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#ugr.tools.tm.workbench">3. Apache UIMA TextMarker Workbench</a></span></dt><dd><dl><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.install">3.1. Installation</a></span></dt><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.overview">3.2. Te
xtMarker Workbench Overview</a></span></dt><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.projects">3.3. TextMarker Projects</a></span></dt><dd><dl><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.projects.create_projects">3.3.1. TextMarker create project wizard</a></span></dt></dl></dd><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.tm_perspective">3.4. TextMarker Perspective</a></span></dt><dd><dl><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.tm_perspective.annotation_browser">3.4.1. Annotation Browser</a></span></dt><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.tm_perspective.selection">3.4.2. Selection</a></span></dt></dl></dd><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.explain_perspective">3.5. Explain Perspective</a></span></dt><dd><dl><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.explain_perspective.applied_rules">3.5.1. Applie
d Rules</a></span></dt><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.explain_perspective.matched_and_failed_rules">3.5.2. Matched Rules and Failed Rules</a></span></dt><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.explain_perspective.rule_elements">3.5.3. Rule Elements</a></span></dt><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.explain_perspective.covering_rules">3.5.4. Covering Rules</a></span></dt><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.explain_perspective.rule_list">3.5.5. Rule List</a></span></dt><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.explain_perspective.created_by">3.5.6. Created By</a></span></dt><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.explain_perspective.statistics">3.5.7. Statistics</a></span></dt></dl></dd><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.tm_query">3.6. Query View</a></span></dt><dt><span
class="section"><a href="#section.ugr.tools.tm.workbench.testing">3.7. Testing</a></span></dt><dd><dl><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.testing.usage">3.7.1. Usage</a></span></dt><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.testing.evaluators">3.7.2. Evaluators</a></span></dt></dl></dd><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.textruler">3.8. TextRuler</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.tm.textruler.learner">3.8.1. Available Learners</a></span></dt></dl></dd><dt><span class="section"><a href="#section.ugr.tools.tm.workbench.create_dictionaries">3.9. Creation of Tree Word Lists</a></span></dt><dt><span class="section"><a href="#ugr.tools.tm.workbench.apply">3.10. Apply a TextMarker script to a folder</a></span></dt></dl></dd></dl></div>
+
+
+
+
+
+ <div class="chapter" title="Chapter 1. Apache UIMA TextMarker Overview" id="ugr.tools.tm.overview"><div class="titlepage"><div><div><h2 class="title">Chapter 1. Apache UIMA TextMarker Overview</h2></div></div></div>
+
+ <p>
+
+ </p>
+ <div class="section" title="1.1. What is Apache UIMA TextMarker?"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.tm.overview.intro">1.1. What is Apache UIMA TextMarker?</h2></div></div></div>
+
+ <p>
+ Apache UIMA™ TextMarker is a rule-based script language supported by Eclipse-based tooling.
+ The language is designed to enable rapid development of text processing applications within UIMA.
+ A special focus lies on the intuitive and flexible domain-specific language for defining
+ patterns of annotations. Writing rules for information extraction or other text processing
+ applications is a tedious process. The Eclipse-based tooling for TextMarker, called the Apache UIMA TextMarker Workbench,
+ was created to support the user and to facilitate every step when writing TextMarker rules. Both the
+ TextMarker rule language and the TextMarker Workbench integrate smoothly with Apache UIMA.
+ </p>
+ </div>
+
+ <div class="section" title="1.2. Getting started"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.tm.overview.gettingstarted">1.2. Getting started</h2></div></div></div>
+
+ <p>
+ This section gives a short roadmap for reading the documentation and some recommendations on how to
+ start developing TextMarker-based applications. This documentation assumes that the reader knows about
+ the core concepts of Apache UIMA. Knowledge of the meaning and usage of the terms <span class="quote">“<span class="quote">CAS</span>”</span>,
+ <span class="quote">“<span class="quote">Feature Structure</span>”</span>, <span class="quote">“<span class="quote">Annotation</span>”</span>, <span class="quote">“<span class="quote">Type</span>”</span>, <span class="quote">“<span class="quote">Type System</span>”</span>
+ and <span class="quote">“<span class="quote">Analysis Engine</span>”</span> is required. Please refer to the documentation of Apache UIMA for an introduction.
+ </p>
+ <p>
+ Inexperienced users who want to learn about TextMarker can start with the next two sections:
+ <a class="xref" href="#ugr.tools.tm.overview.coreconcepts" title="1.3. Core Concepts">Section 1.3, “Core Concepts”</a>
+ gives a short overview of the core ideas and features of the TextMarker language and Workbench.
+ It explains how TextMarker rules
+ are composed and applied, and discusses the advantages of the TextMarker system.
+ The following <a class="xref" href="#ugr.tools.tm.overview.examples" title="1.4. Learning by Example">Section 1.4, “Learning by Example”</a> approaches the TextMarker language from a different
+ perspective. Here, the language is introduced by examples. The first example starts by explaining what a simple rule
+ looks like, and each following example extends the syntax or semantics of the TextMarker language.
+ After reading these two sections, the reader is expected to have gained enough
+ knowledge to start writing her first TextMarker-based application.
+ </p>
+ <p>
+ The TextMarker Workbench was created to support the user and to facilitate the development process. It is strongly recommended to
+ use this Eclipse-based IDE since it, for example, automatically configures the component descriptors and provides editing support like
+ syntax checking. <a class="xref" href="#section.ugr.tools.tm.workbench.install" title="3.1. Installation">Section 3.1, “Installation”</a> describes how the TextMarker Workbench is installed.
+ TextMarker rules can also be applied to a CAS without using the TextMarker Workbench.
+ <a class="xref" href="#ugr.tools.tm.ae.basic.apply" title="1.5.1.1. Apply TextMarker Analysis Engine in plain Java">Section 1.5.1.1, “Apply TextMarker Analysis Engine in plain Java”</a> contains examples of how to execute TextMarker rules in plain Java.
+ A good way to get started with TextMarker is to play around with an example TextMarker project, e.g.,
+ <code class="uri">https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk/example-projects/ExampleProject</code>. This TextMarker project
+ contains some simple rules for processing citation metadata.
+ </p>
+ <p>
+ <a class="xref" href="#ugr.tools.tm.language.language" title="Chapter 2. Apache UIMA TextMarker Language">Chapter 2, <i>Apache UIMA TextMarker Language</i></a> and <a class="xref" href="#ugr.tools.tm.workbench" title="Chapter 3. Apache UIMA TextMarker Workbench">Chapter 3, <i>Apache UIMA TextMarker Workbench</i></a> provide
+ more detailed descriptions and can be referred to in order to gain knowledge of specific parts
+ of the TextMarker language or the TextMarker Workbench.
+ </p>
+ </div>
+
+ <div class="section" title="1.3. Core Concepts"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.tm.overview.coreconcepts">1.3. Core Concepts</h2></div></div></div>
+
+ <p>
+ The TextMarker language is an imperative rule language extended with scripting elements. A TextMarker rule defines a
+ pattern of annotations with additional conditions. If this pattern applies, then the actions of the rule are performed
+ on the matched annotations. A rule is composed of a sequence of rule elements, and a rule element essentially consists of four parts:
+ a matching condition, an optional quantifier, a list of conditions and a list of actions.
+ The matching condition is typically an annotation type, in which case the rule element matches on the covered text of an annotation of that type.
+ The quantifier specifies whether the rule element has to match successfully and how often it may match.
+ The list of conditions specifies additional constraints that the matched text or annotations need to fulfill. The list of actions defines
+ the consequences of the rule and often creates new annotations or modifies existing annotations.
+ The actions are only applied if all rule elements of the rule have successfully matched. Examples of TextMarker rules can be found in
+ <a class="xref" href="#ugr.tools.tm.overview.examples" title="1.4. Learning by Example">Section 1.4, “Learning by Example”</a>.
+ </p>
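+ <p>
+ As a minimal sketch of this structure (not one of the original examples), the following rule consists of three rule elements.
+ It assumes that the default seeding provides the types CW for capitalized words and PERIOD for periods.
+ The type Name is hypothetical and only declared for illustration, and the package declaration is omitted.
+ </p>
+
+ <pre class="programlisting">DECLARE Name;
+// Element 1: matching condition CW, one condition (REGEXP), no quantifier, no action.
+// Element 2: matching condition PERIOD with the quantifier "?", so it may be missing.
+// Element 3: matching condition CW with the quantifier "+" and one action (MARK).
+CW{REGEXP("Mr|Mrs")} PERIOD? CW+{-> MARK(Name)};</pre>
+
+ <p>
+ The MARK action belongs to the third rule element, so the new Name annotation is intended to cover only the capitalized words
+ matched by that element. It is performed only if all three rule elements have matched.
+ </p>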
+ <p>
+ When TextMarker rules are applied to a document, or more precisely to a CAS, they are always grouped in a script file. However, a TextMarker
+ script file does not only contain rules, but also other statements. First of all, each script file starts with a package declaration followed by
+ a list of optional imports. Then, common statements like rules, type declarations or blocks build the body and functionality of a script.
+ <a class="xref" href="#ugr.tools.tm.ae.basic.apply" title="1.5.1.1. Apply TextMarker Analysis Engine in plain Java">Section 1.5.1.1, “Apply TextMarker Analysis Engine in plain Java”</a> gives an example of how TextMarker scripts can be applied in plain Java.
+ TextMarker script files are naturally organized in TextMarker projects, which are a concept of the TextMarker Workbench.
+ The structure of a TextMarker project is described in <a class="xref" href="#section.ugr.tools.tm.workbench.projects" title="3.3. TextMarker Projects">Section 3.3, “TextMarker Projects”</a>.
+ </p>
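+ <p>
+ The layout of a script file described above can be sketched as follows. The package name is the one used elsewhere in this guide.
+ The imported type system name is a placeholder invented for this sketch, and the import line can be dropped if no external types are needed.
+ </p>
+
+ <pre class="programlisting">// 1. Package declaration, always the first statement of the script.
+PACKAGE uima.textmarker.example;
+
+// 2. Optional imports (hypothetical type system name).
+TYPESYSTEM my.package.ExampleTypeSystem;
+
+// 3. Body of the script: declarations, rules and blocks.
+DECLARE Animal;
+W{REGEXP("dog") -> MARK(Animal)};</pre>
+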
+ <p>
+ The inference of TextMarker rules, that is, the way the rules are applied, can be described as imperative depth-first matching.
+ In contrast to similar rule-based systems, TextMarker rules are applied in the order they are defined in the script.
+ The imperative execution of the matching rules may have disadvantages, but also has many advantages, such as an increased rate of development and
+ an easier explanation. The second main property of the TextMarker inference is the depth-first matching. When a rule matches on a pattern of annotations,
+ an alternative is always tracked until it has matched or failed before the next alternative is considered. The behavior of a rule may change if
+ it has already matched on an early alternative and thus has performed an action that influences some constraints of the rule.
+ Examples of how TextMarker rules are applied are given in <a class="xref" href="#ugr.tools.tm.overview.examples" title="1.4. Learning by Example">Section 1.4, “Learning by Example”</a>.
+ </p>
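+ <p>
+ The effect of the imperative rule order can be illustrated with a small sketch. The types Animal and Pet are hypothetical
+ and only declared for this illustration, and the package declaration is omitted.
+ </p>
+
+ <pre class="programlisting">DECLARE Animal;
+DECLARE Pet;
+// Applied first: creates Animal annotations on the word "dog".
+W{REGEXP("dog") -> MARK(Animal)};
+// Applied second: can match because the Animal annotations already exist.
+Animal{-> MARK(Pet)};</pre>
+
+ <p>
+ If the two rules were swapped, the rule matching on Animal would be applied before any Animal annotation has been created by
+ this script, so it would find nothing to match and no Pet annotation would be added.
+ </p>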
+ <p>
+ The TextMarker language provides the possibility to approach an annotation problem in different ways. Let us distinguish
+ some approaches as an example.
+ It is common in the TextMarker language to create many annotations of different types. These annotations are probably not the targeted annotation of the domain,
+ but can be helpful to incrementally approximate the annotation of interest. This enables the user to work <span class="quote">“<span class="quote">bottom-up</span>”</span> and <span class="quote">“<span class="quote">top-down</span>”</span>.
+ In the former approach, the rules incrementally add more complex annotations using simple ones until the target annotation can be created.
+ In the latter approach, the rules get more specific while partitioning the document into smaller segments, which eventually results in the targeted annotation.
+ By using many <span class="quote">“<span class="quote">helper</span>”</span>-annotations, the engineering task becomes easier and more comprehensible.
+ The TextMarker language provides distinctive language elements for different tasks. There are, for example, actions
+ that are able to create new annotations, actions that are able to remove annotations and actions that are able to modify the
+ offsets of annotations. This enables, amongst other things, a transformation-based approach. The user starts by creating general rules that are able to
+ annotate most of the text fragments of interest. Then, instead of making these rules more complex by adding more conditions for situations where they fail,
+ additional rules are defined that correct the mistakes of the general rules, e.g., by deleting false positive annotations.
+ <a class="xref" href="#ugr.tools.tm.overview.examples" title="1.4. Learning by Example">Section 1.4, “Learning by Example”</a> provides some examples of how TextMarker rules can be engineered.
+ </p>
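+ <p>
+ A transformation-based approach of this kind might look like the following sketch. The type Animal and the
+ <span class="quote">“<span class="quote">hot dog</span>”</span> scenario are invented for illustration, and the package declaration is omitted.
+ The sketch relies on the MARK and UNMARK actions described in the Actions section.
+ </p>
+
+ <pre class="programlisting">DECLARE Animal;
+// General rule: annotate every word "dog" or "cat" as Animal.
+W{REGEXP("dog|cat") -> MARK(Animal)};
+// Correction rule: an Animal directly after the word "hot" is a false positive
+// (as in "hot dog"), so the annotation is removed again.
+W{REGEXP("hot")} Animal{-> UNMARK(Animal)};</pre>
+
+ <p>
+ The general rule stays simple, and each newly discovered kind of mistake is handled by a separate correction rule placed after it.
+ </p>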
+ <p>
+ Writing rules manually is a tedious and error-prone process. The <a class="link" href="#ugr.tools.tm.workbench" title="Chapter 3. Apache UIMA TextMarker Workbench">TextMarker Workbench</a>
+ was developed to facilitate writing rules by providing as much tooling support as possible. This includes, for example, syntax checking and auto completion, which
+ make the development less error-prone. The user can annotate documents and use these documents as unit tests for test-driven development or
+ quality maintenance. Sometimes, it is necessary to debug the rules because they do not match as expected. In this case, the explanation perspective provides views
+ that explain every detail of the matching process. Finally, the TextMarker language can also be used by the tooling, for example, by the <span class="quote">“<span class="quote">Query</span>”</span> view.
+ Here, TextMarker rules can be used as query statements in order to investigate annotated documents.
+ </p>
+ <p>
+ TextMarker smoothly integrates with Apache UIMA. First of all, the TextMarker rules are applied using a generic Analysis Engine and thus TextMarker scripts can
+ easily be added to Apache UIMA pipelines. TextMarker also provides the functionality to import and use other UIMA components like Analysis Engines and Type Systems.
+ TextMarker rules can refer to every type defined in an imported type system, and the TextMarker Workbench generates a type system descriptor file containing all
+ types that were defined in a script file. Any Analysis Engine can be executed by rules as long as its implementation is available in the classpath. Therefore,
+ functionality outsourced in an arbitrary Analysis Engine can be added and used within TextMarker.
+ </p>
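+ <p>
+ The following sketch indicates how such an integration might look in a script. All imported names (the type system, the
+ Analysis Engine and the type Person) are hypothetical. It is also an assumption of this sketch that the EXEC action accepts the
+ short name of an imported engine; the Components and EXEC sections are the authoritative reference for the exact syntax.
+ </p>
+
+ <pre class="programlisting">PACKAGE uima.textmarker.example;
+
+// Hypothetical imports: a type system and an Analysis Engine descriptor.
+TYPESYSTEM my.package.PersonTypeSystem;
+ENGINE my.package.PersonTagger;
+
+DECLARE Greeting;
+// Execute the imported Analysis Engine on the document.
+Document{-> EXEC(PersonTagger)};
+// Refer to a type of the imported type system (Person) in a rule.
+CW{REGEXP("Hello")} Person{-> MARK(Greeting, 1, 2)};</pre>
+
+ <p>
+ The two numbers in the MARK action refer to the first and second rule element, so the Greeting annotation is intended to span both of them.
+ </p>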
+ </div>
+
+ <div class="section" title="1.4. Learning by Example"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.tm.overview.examples">1.4. Learning by Example</h2></div></div></div>
+
+ <p>
+ This section gives an introduction to the TextMarker language by explaining the rule syntax
+ and inference with some simplified examples. It is recommended to use the TextMarker Workbench to write TextMarker rules
+ in order to gain advantages like syntax checking. A short description of how to install the TextMarker Workbench
+ is given <a class="link" href="#section.ugr.tools.tm.workbench.install" title="3.1. Installation">here</a>. The following examples make use of the
+ annotations added by the default seeding of the TextMarker Analysis Engine. Their meaning is explained along with the examples.
+ </p>
+ <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
+ The examples in this section are not valid script files as they are missing at least a package declaration.
+ In order to obtain a valid script file, please ensure that all used types are imported or declared and
+ that a package declaration like <span class="quote">“<span class="quote">PACKAGE uima.textmarker.example;</span>”</span> is added in the first line of the script.
+ </p></div>
+ <p>
+ The first example consists of a declaration of a type followed by a simple rule. Type declarations always start with the keyword
+ <span class="quote">“<span class="quote">DECLARE</span>”</span> followed by the short name of the new type. The namespace of the type is equal to the package declaration of the script file.
+ There is also the possibility to create more complex types with features or specific parent types, but this will be neglected for now.
+ In the example, a simple annotation type with the short name <span class="quote">“<span class="quote">Animal</span>”</span> is defined.
+ After the declaration of the type, a rule with one rule element is given.
+ TextMarker rules in general can consist of a sequence of rule elements. Simple rule elements themselves consist of four parts: A matching condition,
+ an optional quantifier, an optional list of conditions and an optional list of actions. The rule element in the
+ following example has a matching condition <span class="quote">“<span class="quote">W</span>”</span>, an annotation type standing for normal words.
+ Statements like declarations and rules always end with a semicolon.
+ </p>
+
+ <pre class="programlisting">DECLARE Animal;
+W{REGEXP("dog") -> MARK(Animal)};</pre>
+
+ <p>
+ The rule element also contains one condition and one action, both surrounded by curly brackets. In order to distinguish conditions from actions
+ they are separated by <span class="quote">“<span class="quote">-></span>”</span>. The condition <span class="quote">“<span class="quote">REGEXP("dog")</span>”</span> indicates that the matched
+ word must match the regular expression <span class="quote">“<span class="quote">dog</span>”</span>. If the matching condition and the additional regular expression are fulfilled, then the action
+ is executed, which creates a new annotation of the type <span class="quote">“<span class="quote">Animal</span>”</span> with the same offsets as the matched token.
+ The default seeder does not actually add annotations of the type <span class="quote">“<span class="quote">W</span>”</span>, but annotations of the types <span class="quote">“<span class="quote">SW</span>”</span> and
+ <span class="quote">“<span class="quote">CW</span>”</span> for lowercase words and capitalized words, which both have the parent type <span class="quote">“<span class="quote">W</span>”</span>.
+ </p>
+
+ <p>
+ Since it is tedious to create Animal annotations by matching on different regular expressions, we apply an external dictionary in the next example.
+ The first line defines a word list named <span class="quote">“<span class="quote">AnimalsList</span>”</span>, which is located in the resource folder (the file <span class="quote">“<span class="quote">Animals.txt</span>”</span>
+ contains one animal name in each line). After the declaration of the type, a rule uses this word list to find all occurrences of animals
+ in the complete document.
+ </p>
+
+ <pre class="programlisting">WORDLIST AnimalsList = 'Animals.txt';
+DECLARE Animal;
+Document{-> MARKFAST(Animal, AnimalsList)};
+</pre>
+
+ <p>
+ The matching condition of the rule element refers to the complete document, or, more specifically, to the annotation of the type
+ <span class="quote">“<span class="quote">DocumentAnnotation</span>”</span>, which covers the whole document.
+ The action <span class="quote">“<span class="quote">MARKFAST</span>”</span> of this rule element creates an annotation of the type <span class="quote">“<span class="quote">Animal</span>”</span> for each found
+ entry of the dictionary <span class="quote">“<span class="quote">AnimalsList</span>”</span>.
+ </p>
+
+ <p>
+ The next example introduces rules with more than one rule element, whereby one of them is a composed rule element. The following rule tries to
+ annotate occurrences of animals separated by commas, e.g., <span class="quote">“<span class="quote">dog, cat, bird</span>”</span>.
+ </p>
+
+ <pre class="programlisting">DECLARE AnimalEnum;
+(Animal COMMA)+{-> MARK(AnimalEnum,1,2)} Animal;</pre>
+
+ <p>
+ The rule consists of two rule elements, with <span class="quote">“<span class="quote">(Animal COMMA)+{-> MARK(AnimalEnum,1,2)}</span>”</span> being the first rule element and
+ <span class="quote">“<span class="quote">Animal</span>”</span> the second one. Let us take a closer look at the first rule element. This rule element is actually composed of two normal rule elements,
+ namely <span class="quote">“<span class="quote">Animal</span>”</span> and <span class="quote">“<span class="quote">COMMA</span>”</span>, and contains a greedy quantifier and one action. This rule element, therefore, matches on
+ one Animal annotation and a following comma. This is repeated until one of the inner rule elements does not match anymore. Then, there has to be
+ another Animal annotation afterwards, specified by the second rule element of the rule. In this case, the rule matches and its action is executed:
+ The MARK action creates a new annotation of the type <span class="quote">“<span class="quote">AnimalEnum</span>”</span>. However, in contrast to the previous examples, this action also
+ contains two numbers. These numbers refer to the rule elements that should be used to calculate the span of the created annotation. The numbers
+ <span class="quote">“<span class="quote">1, 2</span>”</span> state that the new annotation should start with the first rule element, the composed one, and should end with the second rule element.
+ </p>
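+ <p>
+ The span calculation described above can be sketched in plain Java. This is only an illustration and not TextMarker code: the class
+ <span class="quote">“<span class="quote">MarkSpanSketch</span>”</span>, the method <span class="quote">“<span class="quote">span</span>”</span> and the character offsets are hypothetical and
+ assume the example text <span class="quote">“<span class="quote">dog, cat, bird</span>”</span>, where the composed rule element covers <span class="quote">“<span class="quote">dog, cat,</span>”</span>
+ and the second rule element covers <span class="quote">“<span class="quote">bird</span>”</span>.
+ </p>

```java
import java.util.Arrays;

// Illustrative sketch only; none of these names belong to the TextMarker API.
public class MarkSpanSketch {

    /**
     * Computes the offsets of a new annotation from the spans matched by the
     * rule elements. Each entry of elementSpans is {begin, end}; first and last
     * are 1-based rule element indexes, as in MARK(AnimalEnum, 1, 2).
     */
    static int[] span(int[][] elementSpans, int first, int last) {
        int begin = elementSpans[first - 1][0]; // begin of the first referenced element
        int end = elementSpans[last - 1][1];    // end of the last referenced element
        return new int[] { begin, end };
    }

    public static void main(String[] args) {
        // Assumed offsets for "dog, cat, bird":
        // element 1 "(Animal COMMA)+" covers "dog, cat," (0..9), element 2 covers "bird" (10..14).
        int[][] elementSpans = { { 0, 9 }, { 10, 14 } };
        System.out.println(Arrays.toString(span(elementSpans, 1, 2)));
    }
}
```

+ <p>
+ With these assumed offsets, the new annotation starts at the begin offset of the first rule element and ends at the end offset of the second one,
+ so it covers the complete enumeration.
+ </p>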
+
+ <p>
+ Let us make the composed rule element more complex. The following rule also matches on lists of animals that are
+ separated by semicolons. A disjunctive rule element is therefore added, indicated by the symbol <span class="quote">“<span class="quote">|</span>”</span>, which matches on
+ annotations of the type <span class="quote">“<span class="quote">COMMA</span>”</span> or <span class="quote">“<span class="quote">SEMICOLON</span>”</span>.
+ </p>
+ <pre class="programlisting">(Animal (COMMA | SEMICOLON))+{-> MARK(AnimalEnum,1,2)} Animal;</pre>
+
+ <p>
+ Rule elements can contain more than one condition. The rule in the next example tries to identify headlines, which are bold,
+ underlined and end with a colon.
+ </p>
+
+ <pre class="programlisting">DECLARE Headline;
+Paragraph{CONTAINS(Bold, 90, 100, true),
+ CONTAINS(Underlined, 90, 100, true), ENDSWITH(COLON)
+ -> MARK(Headline)};</pre>
+
+ <p>
+ The matching condition of this rule element is given with the type <span class="quote">“<span class="quote">Paragraph</span>”</span>, thus the rule takes a look at all Paragraph annotations.
+ The rule matches only if the three conditions, separated by commas, are fulfilled. The first condition <span class="quote">“<span class="quote">CONTAINS(Bold, 90, 100, true)</span>”</span> states that
+ 90%-100% of the matched paragraph annotation should also be annotated with annotations of the type <span class="quote">“<span class="quote">Bold</span>”</span>. The boolean parameter <span class="quote">“<span class="quote">true</span>”</span>
+ indicates that the amount of Bold annotations should be calculated relative to the matched annotation. The two numbers <span class="quote">“<span class="quote">90,100</span>”</span> are, therefore, interpreted as
+ percentages. The exact calculation of the coverage depends on the tokenization of the document and is neglected for now. The second condition
+ <span class="quote">“<span class="quote">CONTAINS(Underlined, 90, 100, true)</span>”</span> consequently states that at least 90% of the paragraph should also be covered by annotations of the type <span class="quote">“<span class="quote">Underlined</span>”</span>.
+ The third condition <span class="quote">“<span class="quote">ENDSWITH(COLON)</span>”</span> finally forces the Paragraph annotation to end with a colon. It is only fulfilled if there is an annotation of the type
+ <span class="quote">“<span class="quote">COLON</span>”</span> whose end offset is equal to the end offset of the matched Paragraph annotation.
+ </p>
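+ <p>
+ The percentage check of the CONTAINS condition can be illustrated with a small Java sketch. Since the exact calculation depends on the tokenization,
+ the sketch simply assumes that coverage is counted per token; this counting scheme, the class <span class="quote">“<span class="quote">ContainsSketch</span>”</span> and its methods
+ are assumptions made for illustration and are not taken from the TextMarker implementation.
+ </p>

```java
// Illustrative sketch only; the per-token counting is an assumption,
// not the documented TextMarker calculation.
public class ContainsSketch {

    /** Percentage of tokens inside the matched annotation that carry the given type. */
    static double coveragePercent(boolean[] tokenHasType) {
        if (tokenHasType.length == 0) {
            return 0.0;
        }
        int covered = 0;
        for (boolean hasType : tokenHasType) {
            if (hasType) {
                covered++;
            }
        }
        return 100.0 * covered / tokenHasType.length;
    }

    /** Mirrors CONTAINS(Type, min, max, true): the coverage must lie within [min, max] percent. */
    static boolean contains(boolean[] tokenHasType, double min, double max) {
        double percent = coveragePercent(tokenHasType);
        return percent >= min && percent <= max;
    }

    public static void main(String[] args) {
        // A paragraph of ten tokens, nine of which are bold.
        boolean[] bold = { true, true, true, true, true, true, true, true, true, false };
        System.out.println(contains(bold, 90, 100)); // prints "true"
    }
}
```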
+
+ <p>
+ The readability and maintainability of rules do not improve if more conditions are added.
+ One of the strengths of the TextMarker language is that it provides different approaches to solve an annotation task. The next two examples
+ introduce actions for transformation-based rules.
+ </p>
+
+ <pre class="programlisting">Headline{-CONTAINS(W) -> UNMARK(Headline)};</pre>
+
+ <p>
+ This rule consists of one condition and one action. The condition <span class="quote">“<span class="quote">-CONTAINS(W)</span>”</span> is negated (indicated by the character <span class="quote">“<span class="quote">-</span>”</span>),
+ and is therefore only fulfilled if there are no annotations of the type <span class="quote">“<span class="quote">W</span>”</span> within the bounds of the matched Headline annotation.
+ The action <span class="quote">“<span class="quote">UNMARK(Headline)</span>”</span> removes the matched Headline annotation. Put into simple words, headlines that contain no words at all are not headlines.
+ </p>
+
+ <p>
+ The next rule does not remove an annotation, but changes its offsets dependent on the context.
+ </p>
+
+ <pre class="programlisting">Headline{-> SHIFT(Headline, 1, 2)} COLON;</pre>
+
+ <p>
+ Here, the action <span class="quote">“<span class="quote">SHIFT(Headline, 1, 2)</span>”</span> expands the matched Headline annotation to the next colon, if that Headline annotation
+ is followed by a COLON annotation.
+ </p>
+
+ <p>
+ TextMarker rules can contain arbitrary conditions and actions, which is illustrated by the next example.
+ </p>
+
+ <pre class="programlisting">DECLARE Month, Year, Date;
+ANY{INLIST(MonthsList) -> MARK(Month), MARK(Date,1,3)}
+ PERIOD? NUM{REGEXP(".{2,4}") -> MARK(Year)};</pre>
+
+ <p>
+ This rule consists of three rule elements. The first one matches on every token whose covered text occurs in a word list named <span class="quote">“<span class="quote">MonthsList</span>”</span>.
+ The second rule element is optional and does not need to be fulfilled, which is indicated by the quantifier <span class="quote">“<span class="quote">?</span>”</span>. The last rule element matches
+ on numbers that fulfill the regular expression <span class="quote">“<span class="quote">REGEXP(".{2,4}")</span>”</span> and are therefore between two and four characters long.
+ If this rule successfully matches on a text passage, then its three actions are executed: An annotation of the type <span class="quote">“<span class="quote">Month</span>”</span> is created for the first rule element,
+ an annotation of the type <span class="quote">“<span class="quote">Year</span>”</span> is created for the last rule element and an annotation of the type <span class="quote">“<span class="quote">Date</span>”</span>
+ is created for the span of all three rule elements. If the word list contains the correct entries, then this rule matches on strings like
+ <span class="quote">“<span class="quote">Dec. 2004</span>”</span>, <span class="quote">“<span class="quote">July 85</span>”</span> or <span class="quote">“<span class="quote">11.2008</span>”</span> and creates the corresponding annotations.
+ </p>
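+ <p>
+ The effect of the regular expression on the last rule element can be checked in plain Java. The sketch assumes that the REGEXP condition has to match
+ the complete covered text of the token, which is what the length restriction described above suggests. The class <span class="quote">“<span class="quote">YearRegexSketch</span>”</span>
+ and the method <span class="quote">“<span class="quote">fulfillsRegexp</span>”</span> are hypothetical names.
+ </p>

```java
import java.util.regex.Pattern;

// Illustrative sketch only; assumes REGEXP is evaluated as a full match on the covered text.
public class YearRegexSketch {

    // Same expression as in the rule: any two to four characters.
    private static final Pattern YEAR = Pattern.compile(".{2,4}");

    /** Returns true if the complete covered text matches the expression. */
    static boolean fulfillsRegexp(String coveredText) {
        return YEAR.matcher(coveredText).matches();
    }

    public static void main(String[] args) {
        for (String text : new String[] { "2004", "85", "7", "12345" }) {
            System.out.println(text + " -> " + fulfillsRegexp(text));
        }
    }
}
```

+ <p>
+ Under this assumption, <span class="quote">“<span class="quote">2004</span>”</span> and <span class="quote">“<span class="quote">85</span>”</span> are accepted, whereas
+ <span class="quote">“<span class="quote">7</span>”</span> and <span class="quote">“<span class="quote">12345</span>”</span> are rejected.
+ </p>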
+
+ <p>
+ After introducing the composition of rule elements, the default matching strategy is examined. The two rules in the next example create an annotation
+ for a sequence of arbitrary tokens with the only difference of one condition.
+ </p>
+
+ <pre class="programlisting">DECLARE Text1, Text2;
+ANY+{ -> MARK(Text1)};
+ANY+{-PARTOF(Text2) -> MARK(Text2)};</pre>
+
+ <p>
+ The first rule matches on each occurrence of an arbitrary token and continues this until the end of the document is reached.
+ This is caused by the greedy quantifier <span class="quote">“<span class="quote">+</span>”</span>. Note that this rule considers each occurrence of a token and is therefore
+ executed for each token, resulting in many overlapping annotations. This behavior is illustrated with an example:
+ When applied on the document <span class="quote">“<span class="quote">Peter works for Frank</span>”</span>, the rule creates four annotations with the covered texts
+ <span class="quote">“<span class="quote">Peter works for Frank</span>”</span>, <span class="quote">“<span class="quote">works for Frank</span>”</span>, <span class="quote">“<span class="quote">for Frank</span>”</span> and <span class="quote">“<span class="quote">Frank</span>”</span>.
+ The rule first tries to match on the token <span class="quote">“<span class="quote">Peter</span>”</span> and continues its matching. Then, it tries to match on the token <span class="quote">“<span class="quote">works</span>”</span> and
+ continues its matching, and so on.
+ </p>
+ <p>
+ In this example, the second rule only returns one annotation, which covers the complete document. This is caused by the additional
+ condition <span class="quote">“<span class="quote">-PARTOF(Text2)</span>”</span>. The PARTOF condition is fulfilled, if the matched annotation is located within an annotation of the given type, or
+ put in simple words, if the matched annotation is part of an annotation of the type <span class="quote">“<span class="quote">Text2</span>”</span>. When applied on the
+ document <span class="quote">“<span class="quote">Peter works for Frank</span>”</span>, the rule matches on the first token <span class="quote">“<span class="quote">Peter</span>”</span>, continues its match and
+ creates an annotation of the type <span class="quote">“<span class="quote">Text2</span>”</span> for the complete document. Then it tries to match on the second token <span class="quote">“<span class="quote">works</span>”</span>, but fails,
+ because this token is already part of a Text2 annotation.
+ </p>
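+ <p>
+ The difference between the two rules can be reproduced with a simplified Java simulation. It is a sketch under strong assumptions: tokens are plain strings,
+ the greedy quantifier always extends the match to the last token, and the negated PARTOF condition is only checked at the start position of a match.
+ The class <span class="quote">“<span class="quote">GreedyMatchSketch</span>”</span> and the method <span class="quote">“<span class="quote">markAll</span>”</span> are hypothetical.
+ </p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative simulation only; it does not use or mirror the TextMarker implementation.
public class GreedyMatchSketch {

    /**
     * Tries to start a match at every token and greedily extends it to the end.
     * If skipIfPartOf is true, start positions that already lie inside a created
     * annotation are skipped, similar to the condition -PARTOF(Text2).
     */
    static List<String> markAll(String[] tokens, boolean skipIfPartOf) {
        List<String> created = new ArrayList<>();
        int coveredUntil = 0; // tokens with an index below this value are inside a created annotation
        for (int start = 0; start < tokens.length; start++) {
            if (skipIfPartOf && start < coveredUntil) {
                continue; // the start token is already part of an annotation
            }
            created.add(String.join(" ", Arrays.copyOfRange(tokens, start, tokens.length)));
            coveredUntil = tokens.length;
        }
        return created;
    }

    public static void main(String[] args) {
        String[] tokens = { "Peter", "works", "for", "Frank" };
        System.out.println(markAll(tokens, false)); // behaves like the Text1 rule
        System.out.println(markAll(tokens, true));  // behaves like the Text2 rule
    }
}
```

+ <p>
+ For the document <span class="quote">“<span class="quote">Peter works for Frank</span>”</span>, the first call yields the four overlapping texts listed above,
+ and the second call yields only the text of the complete document.
+ </p>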
+
+ <p>
+ TextMarker rules can not only be used to create or modify annotations, but also to create features for annotations. The next example defines
+ and assigns a relation of employment, by storing the given annotations as feature values.
+ </p>
+
+ <pre class="programlisting">DECLARE Annotation EmplRelation
+ (Employee employeeRef, Employer employerRef);
+Sentence{CONTAINS(EmploymentIndicator) -> CREATE(EmplRelation,
+ "employeeRef" = Employee, "employerRef" = Employer)};</pre>
+
+ <p>
+ The first statement of this example is a declaration that defines a new type of annotation named <span class="quote">“<span class="quote">EmplRelation</span>”</span>.
+ This annotation has two features:
+ One feature with the name <span class="quote">“<span class="quote">employeeRef</span>”</span> of the type <span class="quote">“<span class="quote">Employee</span>”</span> and
+ one feature with the name <span class="quote">“<span class="quote">employerRef</span>”</span> of the type <span class="quote">“<span class="quote">Employer</span>”</span>.
+ The second statement of this example, which is a simple rule, creates one annotation of the type <span class="quote">“<span class="quote">EmplRelation</span>”</span> for
+ each Sentence annotation that contains at least one annotation of the type <span class="quote">“<span class="quote">EmploymentIndicator</span>”</span>. In addition to creating an annotation,
+ the CREATE action also assigns an annotation of the type <span class="quote">“<span class="quote">Employee</span>”</span>, which needs to be located within the span of the matched sentence,
+ to the feature <span class="quote">“<span class="quote">employeeRef</span>”</span> and an Employer annotation to the feature <span class="quote">“<span class="quote">employerRef</span>”</span>. The annotations mentioned in this
+ example need to be present in advance.
+ </p>
+
+ <p>
+ In the last example, the values of features were defined as annotation types. However, also primitive
+ types can be used, as will be shown in the next example, together with a short introduction of variables.
+ </p>
+
+ <pre class="programlisting">DECLARE Annotation MoneyAmount(STRING currency, INT amount);
+INT moneyAmount;
+STRING moneyCurrency;
+NUM{PARSE(moneyAmount)} SPECIAL{REGEXP("€") -> MATCHEDTEXT(moneyCurrency),
+ CREATE(MoneyAmount, 1, 2, "amount" = moneyAmount,
+ "currency" = moneyCurrency)};</pre>
+
+ <p>
+ First, a new annotation type with the name <span class="quote">“<span class="quote">MoneyAmount</span>”</span> and two features is defined, one string feature and one integer feature.
+ Then, two TextMarker variables are declared, one integer variable and one string variable. The rule matches on a number, whose value is stored
+ in the variable <span class="quote">“<span class="quote">moneyAmount</span>”</span>, followed by a special token that needs to be equal to the string <span class="quote">“<span class="quote">€</span>”</span>. Then,
+ the covered text of the special annotation is stored in the string variable <span class="quote">“<span class="quote">moneyCurrency</span>”</span> and an annotation of the
+ type <span class="quote">“<span class="quote">MoneyAmount</span>”</span> spanning both rule elements is created. Additionally, the variables are assigned as feature values.
+ </p>
+
+ <p>
+ TextMarker script files with many rules can quickly confuse the reader. The TextMarker language, therefore, allows importing other script files in order to increase
+ the modularity of a project or to create rule libraries. The next example imports the rules together with all known types of another script file
+ and executes that script file.
+ </p>
+
+ <pre class="programlisting">SCRIPT uima.textmarker.example.SecondaryScript;
+Document{-> CALL(SecondaryScript)};</pre>
+
+ <p>
+ The script file with the name <span class="quote">“<span class="quote">SecondaryScript.tm</span>”</span>, which is located in the package <span class="quote">“<span class="quote">uima/textmarker/example</span>”</span>, is imported and executed
+ by the CALL action on the complete document. The script needs to be located in the folder specified by the parameter
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.scriptPaths" title="scriptPaths">scriptPaths</a>. It is also possible to import script files of other TextMarker projects, e.g.,
+ by adapting the configuration parameters of the TextMarker Analysis Engine or
+ by setting a project reference in the project properties of a TextMarker project.
+ </p>
+
+ <p>
+ The types of important annotations of the application are often defined in a separate type system. The next example shows how to import those types.
+ </p>
+
+ <pre class="programlisting">TYPESYSTEM my.package.NamedEntityTypeSystem;
+Person{PARTOF(Organization) -> UNMARK(Person)};
+</pre>
+
+ <p>
+ The type system descriptor file with the name <span class="quote">“<span class="quote">NamedEntityTypeSystem.xml</span>”</span> located in the package <span class="quote">“<span class="quote">my/package</span>”</span> is imported.
+ The descriptor needs to be located in a folder specified by the parameter
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.descriptorPaths" title="descriptorPaths">descriptorPaths</a>.
+ </p>
+
+
+ <p>
+ It is sometimes easier to express functionality with control structures known by programming languages rather than to engineer all functionality
+ only with matching rules. The TextMarker language provides the BLOCK element for some of these use cases.
+ The TextMarker BLOCK element starts with the keyword <span class="quote">“<span class="quote">BLOCK</span>”</span> followed by its name in parentheses. The name of a block has two purposes:
+ On the one hand, it is easier to distinguish blocks if they have different names, e.g., in the
+ <a class="link" href="#section.ugr.tools.tm.workbench.explain_perspective" title="3.5. Explain Perspective">explain perspective</a> of the TextMarker Workbench. On the other hand,
+ the name can be used to execute this block using the CALL action. Hereby, it is possible to access only specific sets of rules of other script files,
+ or to implement a recursive call of rules. After the name of the block, a single rule element is given, which has curly brackets,
+ even if no conditions or actions are specified. Then, the body of the block is framed by curly brackets.
+ </p>
+
+ <pre class="programlisting">BLOCK(English) Document{FEATURE("language", "en")} {
+ // rules for english documents
+}
+BLOCK(German) Document{FEATURE("language", "de")} {
+ // rules for german documents
+}</pre>
+
+ <p>
+ This example contains two simple BLOCK statements. The rules defined within the block are only executed, if the condition in the head of the block is fulfilled.
+ The rules of the first block are only considered if the feature <span class="quote">“<span class="quote">language</span>”</span> of the document annotation has the value <span class="quote">“<span class="quote">en</span>”</span>.
+ Following this, the rules of the second block are only considered for German documents.
+ </p>
+
+ <p>
+ The rule element of the block definition can also refer to other annotation types than <span class="quote">“<span class="quote">Document</span>”</span>. While the last example implemented something similar
+ to an if-statement, the next example provides a showcase for something similar to a for-each-statement.
+ </p>
+
+ <pre class="programlisting">DECLARE SentenceWithNoLeadingNP;
+BLOCK(ForEach) Sentence{} {
+ Document{-STARTSWITH(NP) -> MARK(SentenceWithNoLeadingNP)};
+}
+</pre>
+
+ <p>
+ Here, the rule in the block statement is performed for each occurrence of an annotation of the type <span class="quote">“<span class="quote">Sentence</span>”</span>.
+ The rule within the block matches on the complete document, which is the current sentence in the context of the block statement.
+ As a consequence, this example creates an annotation of the type <span class="quote">“<span class="quote">SentenceWithNoLeadingNP</span>”</span> for each sentence
+ that does not start with an NP annotation.
+ </p>
+
+ <p>
+ Let us take a closer look at what exactly the TextMarker rules match. The following rule matches on a word followed by another word:
+ </p>
+ <pre class="programlisting">W W;</pre>
+
+ <p>
+ To be more precise, this rule matches on all documents like <span class="quote">“<span class="quote">Apache UIMA</span>”</span>, <span class="quote">“<span class="quote">Apache UIMA</span>”</span>, <span class="quote">“<span class="quote">ApacheUIMA</span>”</span>,
+ <span class="quote">“<span class="quote">Apache <b>UIMA</b></span>”</span>. There are two main reasons for this: First of all, it depends on how the available annotations are defined. The default seeder
+ for the initial annotations creates an annotation for all characters until an upper case character occurs. Thus, the string <span class="quote">“<span class="quote">ApacheUIMA</span>”</span> consists of
+ two tokens.
+ More importantly, however, the TextMarker language provides a concept of visibility of the annotations. By default, all annotations of the types
+ <span class="quote">“<span class="quote">SPACE</span>”</span>, <span class="quote">“<span class="quote">NBSP</span>”</span>, <span class="quote">“<span class="quote">BREAK</span>”</span> and <span class="quote">“<span class="quote">MARKUP</span>”</span> (whitespace and XML elements) are filtered and not visible. This holds of course for
+ their covered text, too. The rule elements skip all positions of the
+ document where those annotations occur. The rule in the last example matches on all examples. Without the default filtering settings,
+ with all annotations set to visible, the rule matches only on the document <span class="quote">“<span class="quote">ApacheUIMA</span>”</span> since it is the only one that contains two word annotations without
+ any whitespace between them.
+ </p>
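+ <p>
+ The concept of visibility can be sketched as a filter that is applied to the token sequence before the rule elements are matched. The following Java code
+ is a simplified illustration, not the TextMarker implementation: the enum <span class="quote">“<span class="quote">Type</span>”</span>, the class
+ <span class="quote">“<span class="quote">VisibilitySketch</span>”</span> and the method <span class="quote">“<span class="quote">matchesTwoWords</span>”</span> are hypothetical,
+ and only the types W, SPACE and MARKUP are modeled.
+ </p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; the names are not part of the TextMarker API.
public class VisibilitySketch {

    enum Type { W, SPACE, MARKUP }

    /** Checks whether the rule "W W;" would match once the filtered types are made invisible. */
    static boolean matchesTwoWords(List<Type> tokens, Set<Type> filtered) {
        // Step 1: drop all tokens whose type is filtered.
        List<Type> visible = new ArrayList<>();
        for (Type type : tokens) {
            if (!filtered.contains(type)) {
                visible.add(type);
            }
        }
        // Step 2: look for two directly adjacent visible words.
        for (int i = 0; i + 1 < visible.size(); i++) {
            if (visible.get(i) == Type.W && visible.get(i + 1) == Type.W) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Type> apacheSpaceUima = Arrays.asList(Type.W, Type.SPACE, Type.W); // "Apache UIMA"
        List<Type> apacheUima = Arrays.asList(Type.W, Type.W);                  // "ApacheUIMA"
        Set<Type> defaultFilter = EnumSet.of(Type.SPACE, Type.MARKUP);
        Set<Type> nothingFiltered = EnumSet.noneOf(Type.class);

        System.out.println(matchesTwoWords(apacheSpaceUima, defaultFilter));   // prints "true"
        System.out.println(matchesTwoWords(apacheSpaceUima, nothingFiltered)); // prints "false"
        System.out.println(matchesTwoWords(apacheUima, nothingFiltered));      // prints "true"
    }
}
```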
+
+ <p>
+ The filtering setting can also be modified by the TextMarker rules themselves. The next example provides rules that extend and limit
+ the amount of visible text of the document.
+ </p>
+
+ <pre class="programlisting">Sentence;
+Document{-> RETAINTYPE(SPACE)};
+Sentence;
+Document{-> FILTERTYPE(CW)};
+Sentence;
+Document{-> RETAINTYPE, FILTERTYPE};</pre>
+
+ <p>
+ The first rule matches on sentences, which do not start with any filtered type. Sentences that start with whitespace or markup,
+ for example, are not considered.
+ The next rule retains all text that is covered by annotations of the type <span class="quote">“<span class="quote">SPACE</span>”</span> meaning
+ that the rule elements are now sensitive to whitespace. The following rule will, therefore, match on sentences that start with whitespace.
+ The next rule then filters the type <span class="quote">“<span class="quote">CW</span>”</span> with the consequence that all capitalized words are invisible.
+ If the following rule now wants to match on sentences, then this is only possible for Sentence annotations that do not start with a capitalized word.
+ The last rule finally resets the filtering setting to the default configuration in the TextMarker Analysis Engine.
+ </p>
+
+ <p>
+ The next example gives a showcase for importing external Analysis Engines and for modifying the documents by creating a new view called <span class="quote">“<span class="quote">modified</span>”</span>.
+ Additional Analysis Engines can be imported with the keyword <span class="quote">“<span class="quote">ENGINE</span>”</span> followed by the name of the descriptor. These imported Analysis Engines can be
+ executed with the actions <span class="quote">“<span class="quote">CALL</span>”</span> or <span class="quote">“<span class="quote">EXEC</span>”</span>. If the executed Analysis Engine adds, removes or modifies annotations, then their types need
+ to be mentioned when calling the descriptor, or else these annotations will not be correctly processed by the following TextMarker rules.
+ </p>
+
+ <pre class="programlisting">ENGINE utils.Modifier;
+Date{-> DEL};
+MoneyAmount{-> REPLACE("<MoneyAmount/>")};
+Document{-> COLOR(Headline, "green")};
+Document{-> EXEC(Modifier)};
+</pre>
+
+ <p>
+ In this example, we first import an Analysis Engine defined by the descriptor <span class="quote">“<span class="quote">Modifier.xml</span>”</span> located in the folder <span class="quote">“<span class="quote">utils</span>”</span>.
+ The descriptor needs to be located in the folder specified by the parameter <a class="link" href="#ugr.tools.tm.ae.basic.parameter.descriptorPaths" title="descriptorPaths">descriptorPaths</a>.
+ The first rule deletes all text covered by annotations of the type <span class="quote">“<span class="quote">Date</span>”</span>. The second rule replaces the text of all annotations of the type <span class="quote">“<span class="quote">MoneyAmount</span>”</span>
+ with the string <span class="quote">“<span class="quote"><MoneyAmount/></span>”</span>. The third rule registers that the background color of text in Headline annotations should be set to green. The last rule
+ finally performs all of these changes in an additional view called <span class="quote">“<span class="quote">modified</span>”</span>, which is specified in the configuration parameters of the analysis engine.
+ <a class="xref" href="#ugr.tools.tm.ae.modifier" title="1.5.4. Modifier">Section 1.5.4, “Modifier”</a> and <a class="xref" href="#ugr.tools.tm.language.modification" title="2.11. Modification">Section 2.11, “Modification”</a> provide a more detailed description.
+ </p>
+
+ </div>
+
+ <div class="section" title="1.5. UIMA Analysis Engines"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.tm.ae">1.5. UIMA Analysis Engines</h2></div></div></div>
+
+ <p>This section gives an overview of the UIMA Analysis Engines shipped with TextMarker. The most
+ important one is <span class="quote">“<span class="quote">TextMarkerEngine</span>”</span>, a generic analysis engine, which is able to interpret
+ and execute script files. The other analysis engines provide support for some additional functionality or
+ add certain types of annotations.
+ </p>
+ <div class="section" title="1.5.1. TextMarker Engine"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.tm.ae.basic">1.5.1. TextMarker Engine</h3></div></div></div>
+
+ <p>
+ This generic Analysis Engine is the most important one for the TextMarker language since it is
+ responsible for applying the TextMarker rules on a CAS. Its functionality is configured by the configuration parameters,
+ which, for example, specify the rule file that should be executed. In the TextMarker IDE, a basic template named <span class="quote">“<span class="quote">BasicEngine.xml</span>”</span>
+ is given in the descriptor folder of a TextMarker project and correctly configured descriptors typically named <span class="quote">“<span class="quote">MyScriptEngine.xml</span>”</span>
+ are generated in the descriptor folder corresponding to the package namespace of the script file.
+ The available configuration parameters of the TextMarker Analysis Engine are described in the following.
+ </p>
+ <div class="section" title="1.5.1.1. Apply TextMarker Analysis Engine in plain Java"><div class="titlepage"><div><div><h4 class="title" id="ugr.tools.tm.ae.basic.apply">1.5.1.1. Apply TextMarker Analysis Engine in plain Java</h4></div></div></div>
+
+ <p>
+ Let us assume that the reader wrote the TextMarker rules using the TextMarker Workbench, which already creates correctly configured descriptors.
+ In this case, the following Java code can be used to apply the TextMarker script.
+ </p>
+ <pre class="programlisting">File specFile = new File("pathToMyWorkspace/MyProject/descriptor/"+
+ "my/package/MyScriptEngine.xml");
+XMLInputSource in = new XMLInputSource(specFile);
+ResourceSpecifier specifier = UIMAFramework.getXMLParser().
+ parseResourceSpecifier(in);
+// for import by name... set the datapath in the ResourceManager
+AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);
+CAS cas = ae.newCAS();
+cas.setDocumentText("This is my document.");
+ae.process(cas);</pre>
+ <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
+ The TextMarker Analysis Engine utilizes type priorities. If the CAS object is
+ created by other means than the TextMarker Analysis Engine descriptor, then please
+ provide the necessary type priorities for a valid execution of the TextMarker rules.
+ </p></div>
+ <p>
+ If the TextMarker script was written, for example, with a common text editor and no configured descriptors are yet available,
+ then the following Java code can be used. As written, it is only applicable for executing single script files that do not import
+ additional components or scripts. If the script does import them, the other parameters, e.g., <span class="quote">“<span class="quote">additionalScripts</span>”</span>, need to be configured correctly.
+ </p>
+ <pre class="programlisting">URL aedesc = TextMarkerEngine.class.getResource("BasicEngine.xml");
+XMLInputSource inae = new XMLInputSource(aedesc);
+ResourceSpecifier specifier = UIMAFramework.getXMLParser().
+ parseResourceSpecifier(inae);
+ResourceManager resMgr = UIMAFramework.newDefaultResourceManager();
+AnalysisEngineDescription aed = (AnalysisEngineDescription) specifier;
+TypeSystemDescription basicTypeSystem = aed.getAnalysisEngineMetaData().
+ getTypeSystem();
+
+Collection<TypeSystemDescription> tsds =
+ new ArrayList<TypeSystemDescription>();
+tsds.add(basicTypeSystem);
+// add some other type system descriptors
+// that are needed by your script file
+TypeSystemDescription mergeTypeSystems = CasCreationUtils.
+ mergeTypeSystems(tsds);
+aed.getAnalysisEngineMetaData().setTypeSystem(mergeTypeSystems);
+aed.resolveImports(resMgr);
+
+AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(aed,
+ resMgr, null);
+File scriptFile = new File("path/to/file/MyScript.tm");
+ae.setConfigParameterValue(TextMarkerEngine.SCRIPT_PATHS,
+ new String[] { scriptFile.getParentFile().getAbsolutePath() });
+String name = scriptFile.getName().substring(0,
+ scriptFile.getName().length() - 3);
+ae.setConfigParameterValue(TextMarkerEngine.MAIN_SCRIPT, name);
+ae.reconfigure();
+CAS cas = ae.newCAS();
+cas.setDocumentText("This is my document.");
+ae.process(cas);</pre>
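+ <p>
+ The code above derives the script name by cutting the last three characters off the file name, which presumes a ".tm" extension
+ and a script that is located directly in the script folder. The following is a minimal sketch, using only JDK classes, of how the name with
+ complete namespace could be derived from a script file below a known root folder. The class ScriptNames and its method toMainScript
+ are hypothetical helpers for illustration and not part of TextMarker.
+ </p>

```java
import java.nio.file.Path;

// Hypothetical helper for illustration only, not part of the TextMarker API.
final class ScriptNames {

    private static final String EXTENSION = ".tm";

    private ScriptNames() {
    }

    // Converts a script file below scriptRoot into a period-separated name,
    // e.g. root "script" and file "script/org/apache/uima/Main.tm"
    // become "org.apache.uima.Main".
    static String toMainScript(Path scriptRoot, Path scriptFile) {
        Path root = scriptRoot.toAbsolutePath().normalize();
        Path file = scriptFile.toAbsolutePath().normalize();
        if (!file.startsWith(root) || file.equals(root)) {
            throw new IllegalArgumentException(scriptFile + " is not located below " + scriptRoot);
        }
        Path relative = root.relativize(file);
        String fileName = relative.getFileName().toString();
        if (!fileName.endsWith(EXTENSION)) {
            throw new IllegalArgumentException("Not a TextMarker script: " + scriptFile);
        }
        StringBuilder name = new StringBuilder();
        // All path elements except the last one are treated as package names.
        for (int i = 0; i < relative.getNameCount() - 1; i++) {
            name.append(relative.getName(i)).append('.');
        }
        // Append the file name without its extension.
        name.append(fileName, 0, fileName.length() - EXTENSION.length());
        return name.toString();
    }
}
```

+ <p>
+ Under these assumptions, the root folder would be passed as the value of the parameter scriptPaths and the returned name as the
+ value of the parameter mainScript.
+ </p>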
+
+ </div>
+ <div class="section" title="1.5.1.2. Configuration Parameters"><div class="titlepage"><div><div><h4 class="title" id="ugr.tools.tm.ae.basic.parameter">1.5.1.2. Configuration Parameters</h4></div></div></div>
+
+ <p>
+ The configuration parameters of the TextMarker Analysis Engine can be subdivided into three
+ different groups: parameters for the setup of the environment (<a class="link" href="#ugr.tools.tm.ae.basic.parameter.mainScript" title="mainScript">mainScript</a>
+ to <a class="link" href="#ugr.tools.tm.ae.basic.parameter.additionalExtensions" title="additionalExtensions">additionalExtensions</a>),
+ parameters that change the behavior of the analysis engine (<a class="link" href="#ugr.tools.tm.ae.basic.parameter.reloadScript" title="reloadScript">reloadScript</a>
+ to <a class="link" href="#ugr.tools.tm.ae.basic.parameter.simpleGreedyForComposed" title="simpleGreedyForComposed">simpleGreedyForComposed</a>)
+ and parameters for creating additional information about how the rules were executed
+ (<a class="link" href="#ugr.tools.tm.ae.basic.parameter.debug" title="debug">debug</a>
+ to <a class="link" href="#ugr.tools.tm.ae.basic.parameter.createdBy" title="createdBy">createdBy</a>). First, a short overview of the configuration parameters is given in
+ <a class="xref" href="#table.ugr.tools.tm.ae.parameter" title="Table 1.1. Configuration parameters of the TextMarker Analysis Engine">Table 1.1, “Configuration parameters of the TextMarker Analysis Engine ”</a>. Afterwards, all parameters are described in detail with examples.
+ </p>
+ <p>
+ To change the value of any configuration parameter within a TextMarker script, the CONFIGURE action (see <a class="xref" href="#ugr.tools.tm.language.actions.configure" title="2.7.8. CONFIGURE">Section 2.7.8, “CONFIGURE”</a>)
+ can be used. To change the behavior of <a class="link" href="#ugr.tools.tm.ae.basic.parameter.dynamicAnchoring" title="dynamicAnchoring">dynamicAnchoring</a>, the DYNAMICANCHORING action
+ (see <a class="xref" href="#ugr.tools.tm.language.actions.dynamicanchoring" title="2.7.11. DYNAMICANCHORING">Section 2.7.11, “DYNAMICANCHORING”</a>) is recommended.
+ </p>
+ <p>
+ </p><div class="table"><a name="table.ugr.tools.tm.ae.parameter"></a><p class="title"><b>Table 1.1. Configuration parameters of the TextMarker Analysis Engine </b></p><div class="table-contents">
+
+ <table summary="Configuration parameters of the TextMarker Analysis Engine " style="border-collapse: collapse;border-top: 0.5pt solid black; border-bottom: 0.5pt solid black; border-left: 0.5pt solid black; border-right: 0.5pt solid black; "><colgroup><col class="c1"><col class="c2"><col class="c3"></colgroup><thead><tr><th style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="center">Name</th><th style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="center">Short description</th><th style="border-bottom: 0.5pt solid black; " align="center">Type</th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.mainScript" title="mainScript">mainScript</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Name with complete namespace of the script which will be interpreted and
+ executed by the analysis engine.
+ </td><td style="border-bottom: 0.5pt solid black; ">Single String</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.scriptEncoding" title="scriptEncoding">scriptEncoding</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Encoding of all TextMarker script files.</td><td style="border-bottom: 0.5pt solid black; ">Single String</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.scriptPaths" title="scriptPaths">scriptPaths</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">List of absolute locations, which contain the necessary script files like
+ the main script.
+ </td><td style="border-bottom: 0.5pt solid black; ">Multi String</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.descriptorPaths" title="descriptorPaths">descriptorPaths</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">List of absolute locations, which contain the necessary descriptor files
+ like type systems.
+ </td><td style="border-bottom: 0.5pt solid black; ">Multi String</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.resourcePaths" title="resourcePaths">resourcePaths</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">List of absolute locations, which contain the necessary resource files like
+ word lists.
+ </td><td style="border-bottom: 0.5pt solid black; ">Multi String</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.additionalScripts" title="additionalScripts">additionalScripts</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">List of names with complete namespace of additional scripts, which can be
+ referred to.
+ </td><td style="border-bottom: 0.5pt solid black; ">Multi String</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.additionalEngines" title="additionalEngines">additionalEngines</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">List of names with complete namespace of additional analysis engines, which
+ can be called by TextMarker rules.
+ </td><td style="border-bottom: 0.5pt solid black; ">Multi String</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.additionalEngineLoaders" title="additionalEngineLoaders">additionalEngineLoaders</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">List of class names of implementations that are able to perform additional
+ tasks when loading external analysis engines.
+ </td><td style="border-bottom: 0.5pt solid black; ">Multi String</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.additionalExtensions" title="additionalExtensions">additionalExtensions</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">List of factory classes for additional extensions of the TextMarker language
+ like proprietary conditions.
+ </td><td style="border-bottom: 0.5pt solid black; ">Multi String</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.reloadScript" title="reloadScript">reloadScript</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Option to initialize the rule script each time the analysis engine processes
+ a CAS.
+ </td><td style="border-bottom: 0.5pt solid black; ">Single Boolean</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.seeders" title="seeders">seeders</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">List of class names that provide additional annotations before the rules are
+ executed.
+ </td><td style="border-bottom: 0.5pt solid black; ">Multi String</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.defaultFilteredTypes" title="defaultFilteredTypes">defaultFilteredTypes</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">List of complete type names of annotations that are invisible by default.
+ </td><td style="border-bottom: 0.5pt solid black; ">Multi String</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.removeBasics" title="removeBasics">removeBasics</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Option to remove all inference annotations after execution of the rule script.
+ </td><td style="border-bottom: 0.5pt solid black; ">Single Boolean</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.dynamicAnchoring" title="dynamicAnchoring">dynamicAnchoring</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Option to allow rule matches to start at any rule element.</td><td style="border-bottom: 0.5pt solid black; ">Single Boolean</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.lowMemoryProfile" title="lowMemoryProfile">lowMemoryProfile</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Option to decrease the memory consumption when processing a large CAS.</td><td style="border-bottom: 0.5pt solid black; ">Single Boolean</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.simpleGreedyForComposed" title="simpleGreedyForComposed">simpleGreedyForComposed</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Option to activate a different inferencer for composed rule elements.</td><td style="border-bottom: 0.5pt solid black; ">Single Boolean</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.debug" title="debug">debug</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Option to add debug information to the CAS.</td><td style="border-bottom: 0.5pt solid black; ">Single Boolean</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.debugWithMatches" title="debugWithMatches">debugWithMatches</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Option to add information about the rule matches to the CAS.</td><td style="border-bottom: 0.5pt solid black; ">Single Boolean</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.debugOnlyFor" title="debugOnlyFor">debugOnlyFor</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">List of rule ids. If provided, then debug information is only created for
+ those rules.
+ </td><td style="border-bottom: 0.5pt solid black; ">Multi String</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.profile" title="profile">profile</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Option to add profile information to the CAS.</td><td style="border-bottom: 0.5pt solid black; ">Single Boolean</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.statistics" title="statistics">statistics</a>
+ </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Option to add statistics of conditions and actions to the CAS.</td><td style="border-bottom: 0.5pt solid black; ">Single Boolean</td></tr><tr><td style="border-right: 0.5pt solid black; ">
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.createdBy" title="createdBy">createdBy</a>
+ </td><td style="border-right: 0.5pt solid black; ">Option to add additional information about which rule created an annotation.
+ </td><td style="">Single Boolean</td></tr></tbody></table>
+ </div></div><p><br class="table-break">
+ </p>
+ <div class="section" title="mainScript"><div class="titlepage"><div><div><h5 class="title" id="ugr.tools.tm.ae.basic.parameter.mainScript">mainScript</h5></div></div></div>
+
+ <p>
+ This parameter specifies the rule file that will be executed by the analysis engine and is,
+ therefore, one of the most important ones. The exact name of the script is given by the complete namespace of the file, which corresponds to its location
+ relative to the given parameter <a class="link" href="#ugr.tools.tm.ae.basic.parameter.scriptPaths" title="scriptPaths">scriptPaths</a>.
+ The individual names of packages (or folders) are separated by periods. An exemplary value for this parameter could be "org.apache.uima.Main",
+ where "Main" specifies the file containing the rules and "org.apache.uima" its package.
+ In this case, the analysis engine loads the script file "Main.tm", which is located in the folder structure "org/apache/uima/".
+ This parameter has no default value and has to be provided, although it is not specified as mandatory.
+ </p>
+ </div>
+ <div class="section" title="scriptEncoding"><div class="titlepage"><div><div><h5 class="title" id="ugr.tools.tm.ae.basic.parameter.scriptEncoding">scriptEncoding</h5></div></div></div>
+
+ <p>
+ This parameter specifies the encoding of the rule files. Its default value is "UTF-8".
+ </p>
+ </div>
+ <div class="section" title="scriptPaths"><div class="titlepage"><div><div><h5 class="title" id="ugr.tools.tm.ae.basic.parameter.scriptPaths">scriptPaths</h5></div></div></div>
+
+ <p>
+ The parameter scriptPaths refers to a list of String values, which specify the possible locations of script files.
+ The given locations are absolute paths. A typical value for this parameter is, for example, "C:/TextMarker/MyProject/script/".
+ If the parameter <a class="link" href="#ugr.tools.tm.ae.basic.parameter.mainScript" title="mainScript">mainScript</a> is set to "org.apache.uima.Main",
+ then the absolute path of the script file has to be "C:/TextMarker/MyProject/script/org/apache/uima/Main.tm".
+ This parameter can contain multiple values, as the main script can refer to multiple projects similar to a class path in Java.
+ </p>
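+ <p>
+ The relation between the parameters mainScript and scriptPaths can be sketched with plain JDK classes as follows. The class
+ ScriptLocator and its methods are hypothetical and only illustrate the mapping described above. Trying the entries in the given
+ order and returning the first match is an assumption of this sketch; the actual lookup of the analysis engine may differ.
+ </p>

```java
import java.io.File;

// Hypothetical helper for illustration only, not part of the TextMarker API.
final class ScriptLocator {

    private ScriptLocator() {
    }

    // Builds the candidate file for one entry of scriptPaths:
    // periods in the script name become folder separators and ".tm" is appended.
    static File candidate(String scriptPath, String mainScript) {
        String relative = mainScript.replace('.', '/') + ".tm";
        return new File(scriptPath, relative);
    }

    // Returns the first existing candidate, or null if no entry contains the script.
    // The "first match wins" order is an assumption of this sketch.
    static File locate(String[] scriptPaths, String mainScript) {
        for (String scriptPath : scriptPaths) {
            File file = candidate(scriptPath, mainScript);
            if (file.isFile()) {
                return file;
            }
        }
        return null;
    }
}
```

+ <p>
+ For the values used above, candidate("C:/TextMarker/MyProject/script/", "org.apache.uima.Main") refers to the file
+ "org/apache/uima/Main.tm" below the given script folder.
+ </p>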
+ </div>
+ <div class="section" title="descriptorPaths"><div class="titlepage"><div><div><h5 class="title" id="ugr.tools.tm.ae.basic.parameter.descriptorPaths">descriptorPaths</h5></div></div></div>
+
+ <p>
+ This parameter specifies the possible locations for descriptors like analysis engines or type systems, similar to the parameter
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.scriptPaths" title="scriptPaths">scriptPaths</a> for the script files. A typical value for this parameter
+ is, for example, "C:/TextMarker/MyProject/descriptor/".
+ The relative values of the parameter <a class="link" href="#ugr.tools.tm.ae.basic.parameter.additionalEngines" title="additionalEngines">additionalEngines</a> are
+ resolved to these absolute locations.
+ This parameter can contain multiple values, as the main script can refer to multiple projects similar to a class path in Java.
+ </p>
+ </div>
+ <div class="section" title="resourcePaths"><div class="titlepage"><div><div><h5 class="title" id="ugr.tools.tm.ae.basic.parameter.resourcePaths">resourcePaths</h5></div></div></div>
+
+ <p>
+ This parameter specifies the possible locations of additional resources like word lists or CSV tables. The string values have to contain absolute
+ locations, for example, "C:/TextMarker/MyProject/resources/".
+ </p>
+ </div>
+ <div class="section" title="additionalScripts"><div class="titlepage"><div><div><h5 class="title" id="ugr.tools.tm.ae.basic.parameter.additionalScripts">additionalScripts</h5></div></div></div>
+
+ <p>
+ The parameter additionalScripts is defined as a list of string values and contains script files, which are additionally loaded by the analysis engine. These script files are specified by their
+ complete namespace, exactly like the value of the parameter <a class="link" href="#ugr.tools.tm.ae.basic.parameter.mainScript" title="mainScript">mainScript</a>
+ and can be referred to by language elements, e.g., by executing the contained rules. An exemplary value of this parameter is "org.apache.uima.SecondaryScript". In this example, the main script could import
+ this script file by the declaration "SCRIPT org.apache.uima.SecondaryScript;" and then could execute it with the rule
+ "Document{-> CALL(SecondaryScript)};".
+ </p>
+ </div>
+ <div class="section" title="additionalEngines"><div class="titlepage"><div><div><h5 class="title" id="ugr.tools.tm.ae.basic.parameter.additionalEngines">additionalEngines</h5></div></div></div>
+
+ <p>
+ This parameter contains a list of additional analysis engines, which can be executed by the TextMarker rules. Each value
+ is given by the name of the analysis engine with its complete namespace and has to be located relative to one value of the parameter
+ <a class="link" href="#ugr.tools.tm.ae.basic.parameter.descriptorPaths" title="descriptorPaths">descriptorPaths</a>, the location where the analysis engine searches for the descriptor file.
+ An example for one value of the parameter is "utils.HtmlAnnotator", which points to the descriptor "HtmlAnnotator.xml" in the folder "utils".
+ </p>
+ </div>
+ <div class="section" title="additionalEngineLoaders"><div class="titlepage"><div><div><h5 class="title" id="ugr.tools.tm.ae.basic.parameter.additionalEngineLoaders">additionalEngineLoaders</h5></div></div></div>
+
+ <p>
+ The parameter "additionalEngineLoaders" specifies a list of optional implementations of the interface
+ "org.apache.uima.textmarker.extensions.IEngineLoader", which can be used for application-specific configurations of
+ additional analysis engines.
+ </p>
+ </div>
+ <div class="section" title="additionalExtensions"><div class="titlepage"><div><div><h5 class="title" id="ugr.tools.tm.ae.basic.parameter.additionalExtensions">additionalExtensions</h5></div></div></div>
+
+ <p>
[... 5020 lines stripped ...]