Posted to commits@uima.apache.org by pk...@apache.org on 2012/10/15 18:22:24 UTC
svn commit: r1398363 [2/3] - in
/uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook: ./
images/tools/tm/ images/tools/tm/workbench/
images/tools/tm/workbench/explain/ images/tools/tm/workbench/overview/
images/tools/tm/workbench/projects...
Modified: uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.workbench.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.workbench.xml?rev=1398363&r1=1398362&r2=1398363&view=diff
==============================================================================
--- uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.workbench.xml (original)
+++ uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/tools.textmarker.workbench.xml Mon Oct 15 16:22:23 2012
@@ -5,1479 +5,40 @@
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
%uimaents;
]>
-<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor
- license agreements. See the NOTICE file distributed with this work for additional
- information regarding copyright ownership. The ASF licenses this file to
- you under the Apache License, Version 2.0 (the "License"); you may not use
- this file except in compliance with the License. You may obtain a copy of
- the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required
- by applicable law or agreed to in writing, software distributed under the
- License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
- OF ANY KIND, either express or implied. See the License for the specific
- language governing permissions and limitations under the License. -->
-
+<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements.
+ See the NOTICE file distributed with this work for additional information regarding copyright ownership.
+ The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not
+ use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
+ Unless required by applicable law or agreed to in writing, software distributed under the License is
+ distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and limitations under the License. -->
+
<chapter id="ugr.tools.tm.workbench">
- <title>TextMarker Workbench</title>
- <para>
- </para>
-
- <section id="ugr.tools.tm.install">
- <title>Installation</title>
- <para>
-    # Download, install, and start Eclipse 3.5 or 3.6.
-    # Add the Apache UIMA update site
-    (http://www.apache.org/dist/uima/eclipse-update-site/) and the
-    TextMarker update site
-    (http://ki.informatik.uni-wuerzburg.de/~pkluegl/updatesite/) to the
-    available software sites of your Eclipse installation. This can be
-    achieved in the "Install New Software" dialog in the Help menu of
-    Eclipse.
-    # Eclipse 3.6: TextMarker is currently based on DLTK 1.0.
-    Therefore, adding the DLTK 1.0 update site
-    (http://download.eclipse.org/technology/dltk/updates-dev/1.0/) is
-    required, since the Eclipse 3.6 update site only provides DLTK 2.0.
-    # Select "Install New Software" in the Help menu of Eclipse, if
-    not done yet.
-    # Select the TextMarker update site at "Work with", deselect
-    "Group items by category" and select "Contact all update sites
-    during install to find required software".
-    # Select the TextMarker feature and continue the dialog. The CEV
-    feature is already contained in the TextMarker feature. Eclipse
-    will automatically install the Apache UIMA (version 2.3) plugins
-    and the DLTK Core Framework (version 1.x) plugins.
-    # ''(OPTIONAL)'' If additional HTML visualizations are desired,
-    then also install the CEV HTML feature. However, you need to
-    install the XPCom and XULRunner features beforehand, for example
-    by using an appropriate update site
-    (http://ftp.mozilla.org/pub/mozilla.org/xulrunner/eclipse/). Please
-    refer to the [CEV installation instruction|CEVInstall] for details.
-    # After the successful installation, switch to the TextMarker
-    perspective.
-
-    You can also download the TextMarker plugins from
-    [SourceForge.net|https://sourceforge.net/projects/textmarker/] and
-    install the plugins mentioned above manually.
- </para>
- </section>
- <section id="ugr.tools.tm.project">
- <title>TextMarker Projects</title>
- <para>
- Similar to Java projects in Eclipse, the TextMarker workbench
- provides the possibility to create TextMarker projects. TextMarker
- projects require a certain folder structure that is created with the
- project. The most important folders are the script folder that
- contains the TextMarker rule files in a package and the descriptor
- folder that contains the generated UIMA components. The input folder
-    contains the text files or xmiCAS files that will be processed
-    when a TextMarker script is launched. The results are placed in
-    the output folder.
-
- <programlisting><![CDATA[
- ||Project element|| Used for
- | Project | the TextMarker project
- | - script | source folder with TextMarker scripts
- | -- my.package | the package, resulting in several folders
- | --- Script.tm | a TextMarker script
- | - descriptor | build folder for UIMA components
- | -- my/package | the folder structure for the components
- | --- ScriptEngine.xml | the analysis engine of the Script.tm script
- | --- ScriptTypeSystem.xml | the type system of the Script.tm script
- | -- BasicEngine.xml | the analysis engine template for all generated engines in this project
- | -- BasicTypeSystem.xml | the type system template for all generated type systems in this project
- | -- InternalTypeSystem.xml | a type system with TextMarker types
- | -- Modifier.xml | the analysis engine of the optional modifier that creates the ''modified'' view
- | - input | folder that contains the files that will be processed when launching a TextMarker script
- | -- test.html | an input file containing html
- | -- test.xmi | an input file containing text and annotations
- | - output | folder that contains the files that were processed by a TextMarker script
- | -- test.html.modified.html | the result of the modifier: replaced text and colored html
- | -- test.html.xmi | the result CAS with optional information
- | -- test.xmi.modified.html | the result of the modifier: replaced text and colored html
- | -- test.xmi.xmi | the result CAS with optional information
- | - resources | default folder for word lists and dictionaries
- | -- Dictionary.mtwl | a dictionary in the "multi tree word list" format
- | -- FirstNames.txt | a simple word list with first names: one first name per line
- | - test | test-driven development is still under construction
-]]></programlisting>
-
- </para>
-
- </section>
- <section id="ugr.tools.tm.explain">
- <title>Explanation</title>
- <para>
-    Handcrafting rules is laborious, especially if newly written rules
-    do not behave as expected. The TextMarker system is able to record
-    the application of each single rule and block in order to provide
-    an explanation of the rule inference and a minimal debug
-    functionality.
-
-    The explanation component is built upon the CEV plugin. The
-    information about the application of the rules is stored in the
-    result xmiCAS, if the parameters of the executed engine are
-    configured correctly. The simplest way to generate this
-    information is to open a TextMarker file and click on the common
-    "Debug" button (looks like a green bug) in your Eclipse. The
-    current TextMarker file will then be executed on the text files in
-    the input directory, and xmiCAS files are created in the output
-    directory containing the additional UIMA feature structures that
-    describe the rule inference. The resulting xmiCAS needs to be
-    opened with the CEV plugin. However, only additional views are
-    capable of displaying the debug information. In order to open the
-    necessary views, you can either open the "Explain" perspective or
-    open the views separately and arrange them as you like.
-
-    There are currently seven views that display information about the
-    execution of the rules: Applied Rules, Selected Rules, Rule List,
-    Matched Rules, Failed Rules, Rule Elements and Basic Stream.
-
- </para>
-
- </section>
- <section id="ugr.tools.tm.dictionaries">
-    <title>Dictionaries</title>
- <para>
-
-    The TextMarker system currently supports the usage of dictionaries
-    in four different ways. The files are always encoded in UTF-8. The
-    generated analysis engines provide a parameter "resourceLocation"
-    that specifies the folder containing the external dictionary
-    files. The parameter is initially set to the resource folder of
-    the current TextMarker project. In order to use a different
-    folder, set the value of the parameter accordingly and rebuild all
-    TextMarker rule files in the project in order to update all
-    analysis engines.
-
-    The algorithm for the detection of the entries of a dictionary:
-
- <programlisting><![CDATA[
-for all basic annotations of the matched annotation do
- set current candidate to current basic
- loop
- if the dictionary contains current candidate then
- remember candidate
- else if an entry of the dictionary starts with the current candidate then
- add next basic annotation to the current candidate
- continue loop
- else
- stop loop
-]]></programlisting>
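The pseudocode above can be sketched in plain Python. This is a minimal illustration only: the function name, the set-based entry lookup, and the whitespace join between basic annotations are assumptions for the sketch, not part of the TextMarker implementation.

```python
def find_dictionary_matches(basics, entries):
    """Longest-match lookup of dictionary entries over a sequence of
    basic annotations (modeled here simply as token strings).
    Returns (start_index, matched_text) pairs.
    Note: joining candidates with a single space is an assumption of
    this sketch; TextMarker works on the actual document text."""
    matches = []
    for start in range(len(basics)):
        candidate = basics[start]
        end = start
        best = None
        while True:
            if candidate in entries:
                best = candidate  # remember candidate
            # does some longer entry start with the current candidate?
            if any(e != candidate and e.startswith(candidate) for e in entries):
                end += 1
                if end >= len(basics):
                    break
                candidate = candidate + " " + basics[end]  # add next basic
            else:
                break
        if best is not None:
            matches.append((start, best))
    return matches
```

For example, with the basics ["New", "York", "City"] and the entries {"New York", "New York City"}, the sketch returns the longest match starting at index 0.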
-
-
-
-
-    Word List (.txt)
-    Word lists are simple text files that contain one term or string
-    per line. The strings may include white spaces and are separated
-    by line breaks.
-
-    Usage:
-    Content of a file named FirstNames.txt
-    (located in the resource folder of a TextMarker project):
- <programlisting><![CDATA[
-Peter
-Jochen
-Joachim
-Martin
-]]></programlisting>
-
-    Exemplary rules:
- <programlisting><![CDATA[
-LIST FirstNameList = 'FirstNames.txt';
-DECLARE FirstName;
-Document{-> MARKFAST(FirstName, FirstNameList)};
-]]></programlisting>
-
- In this example, all first names in the given text file are
- annotated in the input document with the type FirstName.
-
-    Tree Word List (.twl)
-    A tree word list is a compiled word list similar to a trie. A .twl
-    file is an XML file that contains a tree-like structure with a
-    node for each character. The nodes themselves refer to child nodes
-    that represent all characters succeeding the character of the
-    parent node. For single word entries, this results in a complexity
-    of O(m*log(n)) instead of a complexity of O(m*n) (simple .txt
-    file), where m is the number of basic annotations in the document
-    and n is the number of entries in the dictionary.
-
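The tree-like structure described above can be sketched as a minimal prefix tree. The class and method names are illustrative, not the .twl file format: the point is only that a lookup walks one node per character of the candidate instead of scanning all n entries.

```python
class Trie:
    """Minimal prefix tree sketching the idea behind compiled word
    lists: one node per character, children keyed by the next
    character. Lookup cost grows with the candidate's length, not
    with the number of stored entries."""

    def __init__(self):
        self.children = {}    # next character -> child node
        self.is_entry = False # True if a dictionary entry ends here

    def add(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_entry = True

    def contains(self, word):
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_entry
```

For instance, after adding "Peter", "Jochen", "Joachim" and "Martin", `contains("Jochen")` is true while the mere prefix "Jo" is not an entry.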
-    Usage:
-    A .twl file is generated using the popup menu. Select one or more
-    .txt files (or a folder containing .txt files), click the right
-    mouse button and choose ''Convert to TWL''. Then, one or more .twl
-    files are generated with the according file names.
-
-    Exemplary rules:
-
- <programlisting><![CDATA[
-LIST FirstNameList = 'FirstNames.twl';
-DECLARE FirstName;
-Document{-> MARKFAST(FirstName, FirstNameList)};
-]]></programlisting>
-
- In this example, all first names in the given text file are again
- annotated in the input document with the type FirstName.
-
-    Multi Tree Word List (.mtwl)
-    A multi tree word list is generated from multiple .txt files and
-    contains special nodes: its nodes provide additional information
-    about the original file. The .mtwl files are useful if several
-    different dictionaries are used in a TextMarker file. For five
-    dictionaries, for example, also five MARKFAST rules are necessary.
-    Therefore, the matched text is searched five times and the
-    complexity is 5 * O(m*log(n)). Using a .mtwl file reduces the
-    complexity to about O(m*log(5*n)).
-
-    Usage:
-    A .mtwl file is generated using the popup menu. Select one or more
-    .txt files (or a folder containing .txt files), click the right
-    mouse button and choose ''Convert to MTWL''. A .mtwl file named
-    "generated.mtwl" is then generated that contains the word lists of
-    all selected .txt files. Renaming the .mtwl file is recommended.
-
-
-    If, for example, word lists named "FirstNames.txt",
-    "Companies.txt" and so on are given and the generated .mtwl file
-    is renamed to "Dictionary.mtwl", then the following rule annotates
-    all companies and first names in the complete document.
-
-    Exemplary rules:
-
- <programlisting><![CDATA[
-LIST Dictionary = 'Dictionary.mtwl';
-DECLARE FirstName, Company;
-Document{-> TRIE("FirstNames.txt" = FirstName, "Companies.txt" = Company, Dictionary, false, 0, false, 0, "")};
-]]></programlisting>
-
-
-
-
-    Table (.csv)
-    The TextMarker system also supports .csv files, i.e., simple
-    tables.
-
- Usage:
- Content of a file named TestTable.csv
- (located in the resource folder of a
- TextMarker project):
- <programlisting><![CDATA[
-Peter;P;
-Jochen;J;
-Joba;J;
-]]></programlisting>
-
-    Exemplary rules:
- <programlisting><![CDATA[
-PACKAGE de.uniwue.tm;
-TABLE TestTable = 'TestTable.csv';
-DECLARE Annotation Struct (STRING first);
-Document{-> MARKTABLE(Struct, 1, TestTable, "first" = 2)};
-]]></programlisting>
-    In this example, the document is searched for all occurrences of
-    the entries in the first column of the given table; an annotation
-    of the type Struct is created and its feature "first" is filled
-    with the entry of the second column.
-
-    For an input document with the content "Peter", the result is a
-    single annotation of the type Struct with "P" assigned to its
-    feature "first".
-
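The MARKTABLE behaviour described above can be sketched in Python. The function name and the returned dictionary shape are illustrative assumptions of this sketch, not the TextMarker API; only the column semantics (match column 1, fill the feature from column 2) follow the example.

```python
import csv
import io

def marktable(text, csv_content, match_col=0, feature_col=1):
    """Sketch of the MARKTABLE idea: find each entry of one CSV column
    in the text and record the corresponding value of another column
    as a feature. Returns simple dicts instead of UIMA annotations."""
    matches = []
    for row in csv.reader(io.StringIO(csv_content), delimiter=";"):
        entry, feature = row[match_col], row[feature_col]
        start = text.find(entry)  # first occurrence only, for brevity
        if start != -1:
            matches.append({"begin": start,
                            "end": start + len(entry),
                            "first": feature})
    return matches
```

Applied to the document "Peter" and the table content above, the sketch yields one match covering "Peter" with "P" as the value of "first".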
- </para>
-
- </section>
- <section id="ugr.tools.tm.parameters">
- <title>Parameters</title>
- <para>
- <itemizedlist>
- <listitem>
-          <para>mainScript (String): This is the TextMarker script
-          that will be loaded and executed by the generated engine.
-          The string references the name of the file without file
-          extension but with its complete namespace, e.g.,
-          my.package.Main.
- </para>
- </listitem>
-
- <listitem>
- <para>scriptPaths (Multiple Strings): The given strings
- specify the
- folders that contain TextMarker script files, the
- main script file
- and the additional script files in particular.
- Currently, there is
- only one folder supported in the TextMarker
- workbench (script).
- </para>
- </listitem>
-
- <listitem>
- <para>enginePaths (Multiple Strings): The given strings
- specify the
- folders that contain additional analysis engines that
- are called
- from within a script file. Currently, there is only
- one folder
- supported in the TextMarker workbench (descriptor).
- </para>
- </listitem>
-
- <listitem>
- <para>resourcePaths (Multiple Strings): The given strings
- specify
- the folders that contain the word lists and dictionaries.
- Currently, there is only one folder supported in the TextMarker
- workbench (resources).
-
- </para>
- </listitem>
-
- <listitem>
-          <para>additionalScripts (Multiple Strings): This parameter
-          contains a list of all known script files, referenced with
-          their complete namespace, e.g., my.package.AnotherOne.
- </para>
- </listitem>
-
- <listitem>
- <para>additionalEngines (Multiple Strings): This parameter
- contains a list of all known analysis engines.
- </para>
- </listitem>
-
- <listitem>
- <para>additionalEngineLoaders (Multiple Strings): This
- parameter
- contains the class names of the implementations that
- help to load
- more complex analysis engines.
-
- </para>
- </listitem>
-
- <listitem>
- <para>scriptEncoding (String): The encoding of the script
- files.
- Not yet supported, please use UTF-8.
- </para>
- </listitem>
-
- <listitem>
- <para>defaultFilteredTypes (Multiple Strings): The complete
- names
- of the types that are filtered by default.
- </para>
- </listitem>
-
- <listitem>
- <para>defaultFilteredMarkups (Multiple Strings): The names of
- the
- markups that are filtered by default.
-
- </para>
- </listitem>
-
- <listitem>
- <para>seeders (Multiple Strings):
- </para>
- </listitem>
-
- <listitem>
- <para>useBasics (String):
- </para>
- </listitem>
-
- <listitem>
- <para>removeBasics (Boolean):
-
- </para>
- </listitem>
-
- <listitem>
- <para>debug (Boolean):
- </para>
- </listitem>
-
- <listitem>
- <para>profile (Boolean):
- </para>
- </listitem>
-
- <listitem>
- <para>debugWithMatches (Boolean):
- </para>
- </listitem>
-
- <listitem>
- <para>statistics (Boolean):
- </para>
- </listitem>
-
- <listitem>
- <para>debugOnlyFor (Multiple Strings):
-
- </para>
- </listitem>
-
- <listitem>
- <para>style (Boolean):
- </para>
- </listitem>
-
- <listitem>
- <para>styleMapLocation (String):
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- </section>
- <section id="ugr.tools.tm.query">
- <title>Query</title>
- <para>
-    The query view can be used to write queries on several documents
-    within a folder using the TextMarker language.
-
-    A short example of how to use the Query view:
- <itemizedlist>
- <listitem>
-        <para> In the first field ''Query Data'', the folder in which
-        the query is executed is added, for example with drag and drop
-        from the script explorer. If the checkbox is activated, then
-        all subfolders will be included in the query.
- </para>
- </listitem>
- <listitem>
- <para> The next field ''Type System'' must contain a type system
- or a TextMarker script that specifies all types that are used in
- the query.
- </para>
- </listitem>
- <listitem>
-        <para> The query, in the form of one or more TextMarker
-        rules, is specified in the text field in the middle of the
-        view. In the example of the screenshot, all ''Author''
-        annotations are selected that contain a ''FalsePositive'' or
-        ''FalseNegative''
- </para>
- </listitem>
- <listitem>
-        <para> If the start button near the tab of the view in the
-        upper right corner is pressed, then the results are displayed.
- </para>
- </listitem>
- </itemizedlist>
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata scale="80" format="PNG" fileref="&imgroot;Query.png" />
- </imageobject>
- <textobject>
- <phrase>Query View</phrase>
- </textobject>
- </mediaobject>
- </screenshot>
-
- </para>
- </section>
- <section id="ugr.tools.tm.views">
- <title>Views</title>
- <para>
-
- </para>
- <section id="ugr.tools.tm.views.browser">
- <title>Annotation Browser</title>
- <para>
- </para>
- </section>
- <section id="ugr.tools.tm.views.editor">
- <title>Annotation Editor</title>
- <para>
- </para>
- </section>
- <section id="ugr.tools.tm.views.palette">
- <title>Marker Palette</title>
- <para>
- </para>
- </section>
- <section id="ugr.tools.tm.views.selection">
- <title>Selection</title>
- <para>
- </para>
- </section>
-
- <section id="ugr.tools.tm.views.stream">
- <title>Basic Stream</title>
- <para>
-    The basic stream contains a listing of the complete disjoint
-    partition of the document by the TextMarkerBasic annotations that
-    are used for the inference and the annotation seeding.
- </para>
- </section>
-
- <section id="ugr.tools.tm.views.applied">
- <title>Applied Rules</title>
- <para>
-    The Applied Rules view displays how often a rule tried to apply
-    and how often it succeeded. Additionally, some profiling
-    information is added after a short verbalization of the rule. The
-    information is structured: if BLOCK constructs were used in the
-    executed TextMarker file, the rules contained in such a block are
-    represented as child nodes in the tree of the view. Each
-    TextMarker file is itself a BLOCK construct named after the file.
-    Therefore, the root node of the view is always a BLOCK containing
-    the rules of the executed TextMarker script. Additionally, if a
-    rule calls a different TextMarker file, then the root block of
-    that file is the child of that rule. Selecting a rule in this view
-    directly changes the information visualized in the other views.
-
- </para>
- </section>
- <section id="ugr.tools.tm.views.selected">
- <title>Selected Rules</title>
- <para>
-    This view is very similar to the Applied Rules view, but displays
-    only rules and blocks under a given selection. If the user clicks
-    on the document, then an Applied Rules view is generated
-    containing only elements that affect that position in the
-    document. The Rule Elements view then only contains match
-    information of that position, but the result of the rule element
-    match is still displayed.
- </para>
- </section>
-
- <section id="ugr.tools.tm.views.rulelist">
- <title>Rule List</title>
- <para>
-    This view is very similar to the Applied Rules view and the
-    Selected Rules view, but displays only rules and NO blocks under
-    a given selection. If the user clicks on the document, then a list
-    of rules is generated that matched or tried to match on that
-    position in the document. The Rule Elements view then only
-    contains match information of that position, but the result of
-    the rule element match is still displayed. Additionally, this view
-    provides a text field for filtering the rules. Only those rules
-    remain that contain the entered text in their verbalization.
- </para>
- </section>
-
- <section id="ugr.tools.tm.views.matched">
- <title>Matched Rules</title>
- <para>
-    If a rule is selected in the Applied Rules view, then this view
-    displays the instances (text passages) where this rule matched.
- </para>
- </section>
-
- <section id="ugr.tools.tm.views.failed">
- <title>Failed Rules</title>
- <para>
-    If a rule is selected in the Applied Rules view, then this view
-    displays the instances (text passages) where this rule failed to
-    match.
- </para>
- </section>
-
- <section id="ugr.tools.tm.views.elements">
- <title>Rule Elements</title>
- <para>
-    If a successful or failed rule match is selected in the Matched
-    Rules view or Failed Rules view, then this view contains a listing
-    of the rule elements and their conditions. There is detailed
-    information available on what text each rule element matched and
-    which conditions evaluated to true.
- </para>
- </section>
-
- <section id="ugr.tools.tm.views.statistics">
- <title>Statistics</title>
- <para>
-    This view displays the used conditions and actions of the
-    TextMarker language. Three numbers are given for each element: the
-    total execution time, the number of executions, and the time per
-    execution.
- </para>
- </section>
- <section id="ugr.tools.tm.views.fp">
- <title>False Positive</title>
- <para>
- </para>
- </section>
-
- <section id="ugr.tools.tm.views.fn">
- <title>False Negative</title>
- <para>
- </para>
- </section>
-
- <section id="ugr.tools.tm.views.tp">
- <title>True Positive</title>
- <para>
-
- </para>
- </section>
- </section>
- <section id="ugr.tools.tm.testing">
- <title>Testing</title>
- <para>
-    The TextMarker software comes bundled with its own testing
-    environment that allows you to test and evaluate TextMarker
-    scripts. It provides full back-end testing capabilities and allows
-    you to examine test results in detail. As a product of a testing
-    run, a new document file is created, and detailed information on
-    how well the script performed in the test is added to this
-    document.
- </para>
- <section id="ugr.tools.tm.testing.overview">
- <title>Overview</title>
- <para>
-    The testing procedure compares a previously annotated gold
-    standard file with the result of the selected TextMarker script
-    using an evaluator. The evaluators compare the offsets of
-    annotations in both documents and, depending on the evaluator,
-    mark a result document with true positive, false positive or
-    false negative annotations. Afterwards, the F1 score is calculated
-    for the whole set of tests, for each test file and for each type
-    in the test file. The testing environment contains the following
-    parts:
- <itemizedlist>
- <listitem>
- <para>Main view</para>
- </listitem>
- <listitem>
- <para>Result views : true positive, false positive, false
- negative view
- </para>
- </listitem>
- <listitem>
- <para>Preference page</para>
- </listitem>
- </itemizedlist>
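As a reminder of how the reported scores follow from the tp, fp and fn counts, a minimal sketch (the function name is illustrative and not part of the testing plug-in):

```python
def scores(tp, fp, fn):
    """Compute precision, recall and F1 score from true positive,
    false positive and false negative counts, guarding against
    division by zero for empty result sets."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, 8 true positives with 2 false positives and 2 false negatives give precision, recall and F1 of 0.8 each.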
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata scale="80" format="PNG"
- fileref="&imgroot;Screenshot_main.png" />
- </imageobject>
- <textobject>
- <phrase>Eclipse with open TextMarker and testing environment.
- </phrase>
- </textobject>
- </mediaobject>
- </screenshot>
-    All control elements that are needed for the interaction with the
-    testing environment are located in the main view. This is also
-    where test files can be selected and where information on how well
-    the script performed is displayed. During the testing process, a
-    result CAS file is produced that contains new annotation types
-    like true positives (tp), false positives (fp) and false negatives
-    (fn). While displaying the result .xmi file in the script editor,
-    additional views allow easy navigation through the new
-    annotations. Additional tree views, like the true positive view,
-    display the corresponding annotations in a hierarchic structure.
-    This allows an easy tracing of the results inside the testing
-    document. A preference page allows customization of the behavior
-    of the testing plug-in.
- </para>
- <section id="ugr.tools.tm.testing.overview.main">
- <title>Main View</title>
- <para>
-    The following picture shows a close-up view of the testing
-    environment's main view. The toolbar contains all buttons needed
-    to operate the plug-in. The first line shows the name of the
-    script that is going to be tested and a combo box where the view
-    that should be tested is selected. On the right follow fields
-    that show some basic information on the results of the test run.
-    Below, on the left, the test list is located. This list contains
-    the different test files. Right beside it, you will find a table
-    with statistical information. It shows total tp, fp and fn
-    counts, as well as precision, recall and F1 score for every test
-    file and for every type in each file.
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata scale="80" format="PNG"
- fileref="&imgroot;Screenshot_testing_desc_3_resize.png" />
- </imageobject>
- <textobject>
- <phrase>The main view of the testing environment.</phrase>
- </textobject>
- </mediaobject>
- </screenshot>
- </para>
- </section>
- <section id="ugr.tools.tm.testing.overview.result">
- <title>Result Views</title>
- <para>
-    These views add additional information to the CAS view, once a
-    result file is opened. Each view displays one of the following
-    annotation types in a hierarchic tree structure: true positives,
-    false positives and false negatives. Adding a check mark to one
-    of the annotations in a result view will highlight the annotation
-    in the CAS Editor.
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata scale="80" format="PNG"
- fileref="&imgroot;Screenshot_result.png" />
- </imageobject>
- <textobject>
-              <phrase>A result view of the testing environment.</phrase>
- </textobject>
- </mediaobject>
- </screenshot>
- </para>
- </section>
- <section id="ugr.tools.tm.testing.overview.preferences">
- <title>Preference Page</title>
- <para>
-    The preference page offers a few options that modify the
-    plug-in's general behavior. For example, the preloading of
-    previously collected result data can be turned off, should it
-    produce too long a loading time. An important option on the
-    preference page is the evaluator. By default, the "exact
-    evaluator" is selected, which compares the offsets of the
-    annotations contained in the file produced by the selected script
-    with the annotations in the test file. Other evaluators compare
-    annotations in different ways.
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata scale="80" format="PNG"
- fileref="&imgroot;Screenshot_preferences.png" />
- </imageobject>
- <textobject>
- <phrase>The preference page of the testing environment.
- </phrase>
- </textobject>
- </mediaobject>
- </screenshot>
- </para>
- </section>
- <section id="ugr.tools.tm.testing.overview.project">
- <title>The TextMarker Project Structure</title>
- <para>
-    The picture shows TextMarker's script explorer. Every TextMarker
-    project contains a folder called "test". This folder is the
-    default location for the test files. In this folder, each script
-    file has its own sub-folder with a relative path equal to the
-    script's package path in the "script" folder. This folder
-    contains the test files. In every script's test folder you will
-    also find a result folder with the results of the tests. Should
-    you use test files from another location in the file system, the
-    results will be saved in the "temp" sub-folder of the project's
-    "test" folder. All files in the "temp" folder are deleted once
-    Eclipse is closed.
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata scale="80" format="PNG"
- fileref="&imgroot;folder_struc_sep_desc_cut.png" />
- </imageobject>
- <textobject>
- <phrase>Script Explorer with the test folder expanded.</phrase>
- </textobject>
- </mediaobject>
- </screenshot>
- </para>
- </section>
- </section>
- <section id="ugr.tools.tm.testing.usage">
- <title>Usage</title>
- <para>
- This section will demonstrate how to use the testing
- environment.
- It will show the basic actions needed to perform a test
- run.
- </para>
- <para>
- Preparing Eclipse:
- The testing environment provides its own
- perspective called
- "TextMarker Testing". It will display the main
- view as well as the
- different result views on the right hand side.
- It is encouraged to
- use this perspective, especially when working
- with the testing
- environment for the first time.
- </para>
- <para>
-    Selecting a script for testing:
-    TextMarker will always test the script that is currently open in
-    the script editor. Should another editor be open, for example a
-    Java editor with some Java class being displayed, the testing
-    view is not available.
- </para>
- <para>
-    Creating a test file:
-    A test file is a previously annotated .xmi file that can be used
-    as a gold standard for the test. No additional tools are provided
-    to create such a file; the TextMarker system itself already
-    provides the necessary tooling.
- </para>
- <para>
-    Selecting a test file:
-    Test files can be added to the test list by simply dragging them
-    from the Script Explorer into the test file list. Depending on
-    the settings on the preference page, test files from a script's
-    "test" folder might already be loaded into the list. A different
-    way to add test files is to use the "Add files from folder"
-    button, which can be used to add all .xmi files from a selected
-    folder. The "del" key can be used to remove files from the test
-    list.
- </para>
- <para>
-    Selecting a CAS view to test:
-    TextMarker supports different views that allow you to operate on
-    different levels of a document. The InitialView is selected by
-    default; however, you can also switch the evaluation to another
-    view by typing the view's name into the list or by selecting the
-    view you wish to use from the list.
- </para>
- <para>
-    Selecting the evaluator:
-    The testing environment supports different evaluators that allow
-    a sophisticated analysis of the behavior of a TextMarker script.
-    The evaluator can be chosen on the testing environment's
-    preference page. The preference page can be opened either through
-    the menu or by clicking the blue preference button in the testing
-    view's toolbar. The default evaluator is the "Exact CAS
-    Evaluator", which compares the offsets of the annotations between
-    the test file and the file annotated by the tested script.
- </para>
- <para>
- Excluding Types:
- During a test run it might be convenient to disable testing for
- specific types like punctuation or tags. The "exclude types" button
- opens a dialog where all types can be selected that should not be
- considered in the test.
- </para>
- <para>
- Running the test:
- A test-run can be started by clicking on the
- green start button in
- the toolbar.
- </para>
- <para>
- Result Overview:
- After every test run, the testing main view displays information on
- how well the script performed. It shows the overall number of true
- positive, false positive and false negative annotations of all result
- files, as well as an overall F1 score. Furthermore, a table is
- displayed that contains the overall statistics of the selected test
- file as well as statistics for every single type in the test file.
- The displayed information comprises true positives, false positives,
- false negatives, precision, recall and F1 measure.
- </para>
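The reported scores follow the standard definitions of precision, recall and F1. A minimal sketch of how they derive from the true positive (tp), false positive (fp) and false negative (fn) counts; this is an illustration only, not the workbench's actual implementation:

```python
def scores(tp, fp, fn):
    """Compute precision, recall and F1 from annotation counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: 8 correct annotations, 2 spurious, 2 missing
p, r, f = scores(tp=8, fp=2, fn=2)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.8 0.8 0.8
```

The per-type rows of the statistics table are computed the same way, just restricted to annotations of one type.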
- <para>
- The testing environment also supports the export of the overall data
- in the form of a comma-separated table. Clicking the export
- evaluation data button opens a dialog window that contains this
- table. The text in this table can be copied and easily imported into
- OpenOffice.org or MS Excel.
- </para>
- <para>
- Result Files:
- When running a test, the evaluator creates a new result .xmi file and
- adds new true positive, false positive and false negative
- annotations. By clicking on a file in the test-file list, you can
- open the corresponding result .xmi file in the TextMarker script
- editor. When a result file is opened in the script explorer,
- additional views open that allow easy access to and browsing of the
- additional debugging annotations.
- <screenshot>
- <mediaobject>
- <imageobject>
- <imagedata scale="80" format="PNG"
- fileref="&imgroot;Screenshot_Result_TP_desc_close_cut.png" />
- </imageobject>
- <textobject>
- <phrase>Open result file and selected true positive annotation
- in the true positive view.
- </phrase>
- </textobject>
- </mediaobject>
- </screenshot>
- </para>
- </section>
- <section id="ugr.tools.tm.testing.evaluators">
- <title>Evaluators</title>
- <para>
- When testing a CAS file, the system compares the offsets of the
- annotations of a previously annotated gold standard file with the
- offsets of the annotations of the result file the script produced.
- Evaluators are responsible for comparing the annotations in the two
- CAS files. These evaluators implement different methods and
- strategies for comparing the annotations. An extension point is also
- provided that allows the easy implementation of new evaluators.
- </para>
- <para>
- Exact Match Evaluator:
- The Exact Match Evaluator compares the offsets of the annotations in
- the result and the gold standard file. Any difference is marked with
- either a false positive or a false negative annotation.
- </para>
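An exact offset comparison can be sketched as a set comparison over (type, begin, end) triples. This is only an illustration of the idea, not the evaluator's actual code:

```python
def exact_match(gold, result):
    """Compare annotations given as (type, begin, end) triples.

    gold and result are iterables of such triples; returns the sets
    (true_positives, false_positives, false_negatives)."""
    gold_set, result_set = set(gold), set(result)
    tp = gold_set & result_set   # identical type and offsets
    fp = result_set - gold_set   # spurious annotations
    fn = gold_set - result_set   # missed annotations
    return tp, fp, fn

gold = [("Author", 0, 14), ("Year", 16, 20)]
result = [("Author", 0, 14), ("Year", 15, 20)]  # begin off by one
tp, fp, fn = exact_match(gold, result)
print(len(tp), len(fp), len(fn))  # 1 1 1
```

Note how a single one-character offset difference produces both a false positive and a false negative, which is exactly the strictness the other evaluators relax.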
- <para>
- Partial Match Evaluator:
- The Partial Match Evaluator compares the offsets of the annotations
- in the result and the gold standard file, but allows differences at
- the beginning or the end of an annotation. For example,
- "corresponding" and "corresponding " will not be counted as an error.
- </para>
- <para>
- Core Match Evaluator:
- The Core Match Evaluator accepts annotations that share a core
- expression. In this context, a core expression is at least four
- characters long and starts with a capitalized letter. For example,
- the two annotations "L404-123-421" and "L404-321-412" would be
- considered a true positive match, because "L404" is a core expression
- contained in both annotations.
- </para>
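The core-expression idea can be illustrated roughly as follows; the regular expression is a hypothetical approximation, and the real evaluator's tokenization rules may differ:

```python
import re

# Assumed approximation: a core expression is a capitalized token
# of at least four characters.
CORE = re.compile(r"[A-Z][A-Za-z0-9]{3,}")

def core_expressions(text):
    """All core expressions occurring in the covered text."""
    return set(CORE.findall(text))

def core_match(gold_text, result_text):
    """True if the two covered texts share at least one core expression."""
    return bool(core_expressions(gold_text) & core_expressions(result_text))

print(core_match("L404-123-421", "L404-321-412"))  # True
print(core_match("L404-123-421", "X999-321-412"))  # False
```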
- <para>
- Word Accuracy Evaluator:
- This evaluator compares the labels of all words/numbers in an
- annotation, where the label equals the type of the annotation. As a
- consequence, each word or number that is not part of the annotation
- is counted as a single false negative. For example, take the
- sentence: "Christmas is on the 24.12 every year." The script labels
- "Christmas is on the 12" as a single sentence, while the test file
- correctly labels the whole sentence with a single sentence
- annotation. Whereas, for example, the Exact CAS Evaluator would only
- assign a single false negative annotation, the Word Accuracy
- Evaluator will mark every word or number as a single false negative.
- </para>
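Word-level evaluation can be sketched as comparing a per-token label sequence instead of whole spans. The tokenization below is a simplified assumption for illustration:

```python
def token_labels(tokens, annotations):
    """Label each (begin, end) token with the type of the annotation
    covering it, or None if no annotation covers it.
    annotations is a list of (type, begin, end) triples."""
    labels = []
    for t_begin, t_end in tokens:
        label = None
        for a_type, a_begin, a_end in annotations:
            if a_begin <= t_begin and t_end <= a_end:
                label = a_type
                break
        labels.append(label)
    return labels

tokens = [(0, 9), (10, 12), (13, 15), (16, 19), (20, 25)]
gold = [("Sentence", 0, 25)]
result = [("Sentence", 0, 19)]  # last token not covered by the script
gold_l = token_labels(tokens, gold)
res_l = token_labels(tokens, result)
# every token whose gold label is missing in the result is one FN
fn = sum(1 for g, r in zip(gold_l, res_l) if g is not None and g != r)
print(fn)  # 1
```

With a larger mismatch, every uncovered token would contribute its own false negative, which is the behavior described above.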
- <para>
- Template Only Evaluator:
- This evaluator compares the offsets of the annotations and the
- features that have been created by the script. For example, the text
- "Alan Mathison Turing" is marked with the author annotation, and
- "author" contains two features: "FirstName" and "LastName". If the
- script now creates an author annotation with only one feature, the
- annotation will be marked as a false positive.
- </para>
- <para>
- Template on Word Level Evaluator:
- The Template on Word Level Evaluator compares the offsets of the
- annotations. In addition, it also compares the features and feature
- structures and the values stored in the features. For example, the
- annotation "author" might have features like "FirstName" and
- "LastName". The author's name is "Alan Mathison Turing" and the
- script correctly assigns the author annotation. The features
- assigned by the script are "FirstName : Alan" and "LastName :
- Mathison", while the correct feature values would be "FirstName :
- Alan" and "LastName : Turing". In this case the evaluator will mark
- the annotation as a false positive, since the feature values differ.
- </para>
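Feature-level comparison amounts to checking both the span and a feature/value mapping. A minimal sketch (the real evaluators operate on UIMA feature structures, not plain dicts):

```python
def features_match(gold_ann, result_ann):
    """gold_ann and result_ann are (type, begin, end, features) tuples,
    where features is a dict of feature name to value. An annotation
    only counts as a true positive when span, type and all feature
    values agree."""
    g_type, g_begin, g_end, g_feats = gold_ann
    r_type, r_begin, r_end, r_feats = result_ann
    return ((g_type, g_begin, g_end) == (r_type, r_begin, r_end)
            and g_feats == r_feats)

gold = ("Author", 0, 20, {"FirstName": "Alan", "LastName": "Turing"})
result = ("Author", 0, 20, {"FirstName": "Alan", "LastName": "Mathison"})
print(features_match(gold, gold))    # True
print(features_match(gold, result))  # False, counted as false positive
```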
- </section>
-
- </section>
- <section id="ugr.tools.tm.textruler">
- <title>TextRuler</title>
- <para>
- Using the knowledge engineering approach, a knowledge engineer
- normally writes handcrafted rules to create a domain-dependent
- information extraction application, often supported by a gold
- standard. When starting the engineering process for the acquisition
- of the extraction knowledge for a possibly new slot or, more
- generally, for new concepts, machine learning methods are often able
- to offer support in an iterative engineering process. This section
- gives a conceptual overview of the process model for the
- semi-automatic development of rule-based information extraction
- applications.
- </para>
- <para>
- First, a suitable set of documents that contain the text
- fragments with
- interesting patterns needs to be selected and
- annotated with the
- target concepts. Then, the knowledge engineer
- chooses and configures
- the methods for automatic rule acquisition to
- the best of his
- knowledge for the learning task: Lambda expressions
- based on tokens
- and linguistic features, for example, differ in their
- application
- domain from wrappers that process generated HTML pages.
- </para>
- <para>
- Furthermore, parameters like the window size defining relevant
- features need to be set to an appropriate level. Before the annotated
- training documents form the input of the learning task, they are
- enriched with features generated by the partial rule set of the
- developed application. The results of the methods, that is, the
- learned rules, are proposed to the knowledge engineer for the
- extraction of the target concept.
- </para>
- <para>
- The knowledge engineer has different options to proceed: If the
- quality, amount or generality of the presented rules is not
- sufficient, then additional training documents need to be annotated
- or additional rules have to be handcrafted to provide more features
- in general or more appropriate features. Rules or rule sets of high
- quality can be modified, combined or generalized and transferred to
- the rule set of the application in order to support the extraction
- task of the target concept. In the case that the methods did not
- learn reasonable rules at all, the knowledge engineer proceeds with
- writing handcrafted rules.
- </para>
- <para>
- Having gathered enough extraction knowledge for the current
- concept, the
- semi-automatic process is iterated and the focus is
- moved to the
- next concept until the development of the application is
- completed.
- </para>
- <section id="ugr.tools.tm.textruler.learner">
- <title>Available Learners</title>
- <para>
- Overview
-
- || Name       || Strategy                   || Document     || Slots            || Status       ||
- | BWI (1)     | Boosting, Top Down         | Struct, Semi  | Single, Boundary | Planning     |
- | LP2 (2)     | Bottom Up Cover            | All           | Single, Boundary | Prototype    |
- | RAPIER (3)  | Top Down/Bottom Up Compr.  | Semi          | Single           | Experimental |
- | WHISK (4)   | Top Down Cover             | All           | Multi            | Prototype    |
- | WIEN (5)    | CSP                        | Struct        | Multi, Rows      | Prototype    |
- </para>
- <para>
- * Strategy: The strategies used by the learning methods are commonly
- covering algorithms.
- * Document: The type of the document may be ''free'' like in
- newspapers, ''semi'' or ''struct'' like HTML pages.
- * Slots: The slots refer to a single annotation that represents the
- goal of the learning task. Some rules are able to create several
- annotations at once in the same context (multi-slot). However, only
- single slots are supported by the current implementations.
- * Status: The current status of the implementation in the TextRuler
- framework.
- </para>
- <para>
- Publications
- </para>
- <para>
- (1) Dayne Freitag and Nicholas Kushmerick. Boosted Wrapper Induction.
- In AAAI/IAAI, pages 577&#8211;583, 2000.
- </para>
- <para>
- (2) F. Ciravegna. (LP)2, Rule Induction for Information
- Extraction
- Using Linguistic Constraints. Technical Report CS-03-07,
- Department
- of Computer Science, University of Sheffield, Sheffield,
- 2003.
- </para>
- <para>
- (3) Mary Elaine Califf and Raymond J. Mooney. Bottom-up Relational
- Learning of Pattern Matching Rules for Information Extraction.
- Journal of Machine Learning Research, 4:177&#8211;210, 2003.
- </para>
- <para>
- (4) Stephen Soderland, Claire Cardie, and Raymond Mooney. Learning
- Information Extraction Rules for Semi-Structured and Free Text. In
- Machine Learning, volume 34, pages 233&#8211;272, 1999.
- </para>
- <para>
- (5) N. Kushmerick, D. Weld, and B. Doorenbos. Wrapper
- Induction for
- Information Extraction. In Proc. IJC Artificial
- Intelligence, 1997.
- </para>
- <para>
- BWI
- BWI (Boosted Wrapper Induction) uses boosting techniques to improve
- the performance of simple pattern-matching single-slot boundary
- wrappers (boundary detectors). Two sets of detectors are learned:
- the "fore" and the "aft" detectors. Weighted by their confidences
- and combined with a slot length histogram derived from the training
- data, they can classify a given pair of boundaries within a
- document. BWI can be used for structured, semi-structured and free
- text. The patterns are token-based with special wildcards for more
- general rules.
- </para>
- <para>
- Implementations
- No implementations are yet available.
- </para>
- <para>
- Parameters
- No parameters are yet available.
-
- </para>
- <para>
- LP2
- This method operates on all three kinds of documents. It learns
- separate rules for the beginning and the end of a single slot.
- So-called tagging rules insert boundary SGML tags, and additionally
- induced correction rules shift misplaced tags to their correct
- positions in order to improve precision. The learning strategy is a
- bottom-up covering algorithm. It starts by creating a specific seed
- instance with a window of w tokens to the left and right of the
- target boundary and searches for the best generalization. Other
- linguistic NLP features can be used in order to generalize over the
- flat word sequence.
- </para>
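The bottom-up covering idea can be sketched generically: start from a maximally specific seed, try generalizations, and keep rules that cover positives without errors. This toy version uses hypothetical token-window patterns in which None acts as a wildcard; it only illustrates the control flow, not the actual LP2 implementation:

```python
def matches(pattern, window):
    """A pattern matches a token window if every non-wildcard
    position agrees."""
    return all(p is None or p == t for p, t in zip(pattern, window))

def generalizations(seed):
    """All patterns obtained by wildcarding one position of the seed."""
    for i in range(len(seed)):
        yield tuple(None if j == i else t for j, t in enumerate(seed))

def covering(positives, negatives):
    """positives/negatives: token windows (tuples of tokens).
    Returns a list of learned rules covering all positives."""
    rules, uncovered = [], list(positives)
    while uncovered:
        best = uncovered[0]  # most specific seed rule
        for cand in generalizations(best):
            pos = sum(matches(cand, w) for w in positives)
            neg = sum(matches(cand, w) for w in negatives)
            # accept a generalization only if it is error-free
            # and covers more positives than the current best
            if neg == 0 and pos > sum(matches(best, w) for w in positives):
                best = cand
        rules.append(best)
        uncovered = [w for w in uncovered if not matches(best, w)]
    return rules

rules = covering(
    positives=[("born", "in", "1912"), ("born", "in", "1954")],
    negatives=[("page", "in", "1999")],
)
print(rules)  # [('born', 'in', None)]
```

The real learner additionally exploits linguistic features per token and keeps a list of best and contextual rules, which is what the parameters below configure.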
- <para>
- Implementations
- LP2 (naive):
- LP2 (optimized):
- </para>
- <para>
- Parameters
- Context Window Size (to the left and right):
- Best Rules List Size:
- Minimum Covered Positives per Rule:
- Maximum Error Threshold:
- Contextual Rules List Size:
- </para>
- <para>
- RAPIER
- RAPIER induces single-slot extraction rules for semi-structured
- documents. The rules consist of three patterns: a pre-filler, a
- filler and a post-filler pattern. Each can hold several constraints
- on tokens and their corresponding POS tag and semantic information.
- The algorithm uses a bottom-up compression strategy, starting with a
- most specific seed rule for each training instance. This initial
- rule base is compressed by randomly selecting rule pairs and
- searching for the best generalization. Considering two rules, the
- least general generalization (LGG) of the slot fillers is created
- and specialized by adding rule items to the pre- and post-filler
- until the new rules operate well on the training set. The best of
- the k rules (k-beam search) is added to the rule base and all
- empirically subsumed rules are removed.
- </para>
- <para>
- Implementations
- RAPIER:
- </para>
- <para>
- Parameters
- Maximum Compression Fail Count:
- Internal Rules List Size:
- Rule Pairs for Generalizing:
- Maximum 'No improvement' Count:
- Maximum Noise Threshold:
- Minimum Covered Positives Per Rule:
- PosTag Root Type:
- Use All 3 GenSets at Specialization:
- </para>
- <para>
- WHISK
- WHISK is a multi-slot method that operates on all three kinds of
- documents and learns single- or multi-slot rules that look similar
- to regular expressions. The top-down covering algorithm begins with
- the most general rule and specializes it by adding single rule terms
- until the rule makes no errors on the training set. Domain-specific
- classes or linguistic information obtained by a syntactic analyzer
- can be used as additional features. The exact definition of a rule
- term (e.g. a token) and of a problem instance (e.g. a whole document
- or a single sentence) depends on the operating domain and document
- type.
- </para>
- <para>
- Implementations
- WHISK (token):
- WHISK (generic):
- </para>
- <para>
- Parameters
- Window Size:
- Maximum Error Threshold:
- PosTag Root Type:
- </para>
- <para>
- WIEN
- WIEN is the only method listed here that operates on highly
- structured texts only. It induces so-called wrappers that anchor the
- slots by the structured context around them. The HLRT (head left
- right tail) wrapper class, for example, can determine and extract
- several multi-slot templates by first separating the important
- information block from unimportant head and tail portions and then
- extracting multiple data rows from table-like data structures in the
- remaining document. Inducing a wrapper is done by solving a CSP for
- all possible pattern combinations from the training data.
- </para>
- <para>
- Implementations
- WIEN:
- </para>
- <para>
- Parameters
- No parameters are available.
-
- </para>
- </section>
- </section>
-
+ <title>TextMarker Workbench</title>
+ <para> The TextMarker workbench, which is made available as an Eclipse plugin, offers a powerful
+ environment for creating and working on TextMarker projects. It provides two main perspectives
+ and several views to develop, run, debug, test and evaluate TextMarker rules in a comfortable
+ way, supporting many well-known Eclipse features, e.g. auto-completion. Moreover, it eases the
+ creation of dictionaries like tree word lists and supports machine learning methods that can
+ be used within a knowledge engineering process. The following chapter starts with the
+ installation of the workbench, followed by a description of all its features.
+ </para>
+
+
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./workbench/tools.textmarker.workbench.install.xml" />
+
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./workbench/tools.textmarker.workbench.overview.xml" />
+
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./workbench/tools.textmarker.workbench.projects.xml" />
+
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./workbench/tools.textmarker.workbench.tm_perspective.xml" />
+
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./workbench/tools.textmarker.workbench.explain_perspective.xml" />
+
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./workbench/tools.textmarker.workbench.query.xml" />
+
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./workbench/tools.textmarker.workbench.testing.xml" />
+
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./workbench/tools.textmarker.workbench.textruler.xml" />
+
</chapter>
\ No newline at end of file
Added: uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/workbench/tools.textmarker.workbench.explain_perspective.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/workbench/tools.textmarker.workbench.explain_perspective.xml?rev=1398363&view=auto
==============================================================================
--- uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/workbench/tools.textmarker.workbench.explain_perspective.xml (added)
+++ uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/workbench/tools.textmarker.workbench.explain_perspective.xml Mon Oct 15 16:22:23 2012
@@ -0,0 +1,415 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY imgroot "images/tools/tm/workbench/" >
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
+%uimaents;
+]>
+<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor
+ license agreements. See the NOTICE file distributed with this work for additional
+ information regarding copyright ownership. The ASF licenses this file to
+ you under the Apache License, Version 2.0 (the "License"); you may not use
+ this file except in compliance with the License. You may obtain a copy of
+ the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required
+ by applicable law or agreed to in writing, software distributed under the
+ License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
+ OF ANY KIND, either express or implied. See the License for the specific
+ language governing permissions and limitations under the License. -->
+
+<section id="section.ugr.tools.tm.workbench.explain_perspective">
+ <title>Explain Perspective</title>
+ <para>
+ Writing new rules is laborious, especially if the newly written rules do not behave as
+ expected. The TextMarker system is able to log the application of each single rule and
+ block in order to provide an explanation of the rule inference and a minimal debugging
+ functionality. The information about the application of the rules is stored in the
+ resulting xmiCAS output file if the parameters of the executed engine are configured
+ accordingly. The simplest way to generate this explanation information is to click on
+ the common 'Debug' button (which looks like a green bug) while the TextMarker script
+ file you want to debug is active in your editor. The current TextMarker file is then
+ executed on the text files in the input directory, and xmiCAS files are created in the
+ output directory containing the additional UIMA feature structures that describe the
+ rule inference. To display the newly created execution information, you can either open
+ the Explain perspective or open the necessary views separately and arrange them as you
+ like. There are eight views that display information about the execution of the rules:
+ Applied Rules, Covering Rules, Created By, Failed Rules, Matched Rules, Rule Elements,
+ Rule List and Statistics. All of these views are explained in detail below, using the
+ TextMarker example project for examples.
+ </para>
+
+ <para>
+ To make it possible to reproduce all of the examples used below,
+ switch to the TextMarker Explain perspective within your Eclipse
+ workbench.
+ Import the TextMarker example project and open the main
+ TextMarker script file 'Main.tm'. Now press the 'Debug' button
+ and wait
+ for the end of execution. Open the resulting xmiCAS file
+ 'Test1.txt.xmi', which you can find in the output folder.
+ </para>
+
+ <section
+ id="section.ugr.tools.tm.workbench.explain_perspective.applied_rules">
+ <title>Applied Rules</title>
+ <para>
+ The Applied Rules view displays structured information about all
+ rules that tried to apply to the input documents.
+ </para>
+ <para>
+ The structure is as
+ follows: if BLOCK constructs were used in the
+ executed TextMarker
+ file, the rules contained in that block will be
+ represented as child
+ node in the tree of the view. Each TextMarker
+ file is itself a BLOCK
+ construct named after the file. Therefore the
+ root node of the view
+ is always a BLOCK containing the rules of the
+ executed TextMarker
+ script. Additionally, if a rule calls a different
+ TextMarker file,
+ then the root block of that file is the child of the
+ calling rule.
+ </para>
+ <para>
+ If you double-click on one of the rules, the related script file
+ is opened within the editor and the rule itself is selected.
+ </para>
+ <para>
+ <xref linkend='section.ugr.tools.tm.workbench.explain_perspective' />
+ shows the whole rule hierarchy resulting from the TextMarker example
+ project. The root of the whole hierarchy is the BLOCK associated to
+ the 'Main.tm' script. On the next level, the rules called by the
+ 'Main.tm' script are listed. Since there is a call to each of the
+ script files 'Year.tm',
+ 'Author.tm' and 'Title.tm', these are included
+ into the hierarchy, each forming their own block.
+ </para>
+ <para>
+ The following image shows the TextMarker Applied Rules view.
+ <figure
+ id="figure.ugr.tools.tm.workbench.explain_perspective.applied_rules">
+ <title> Applied Rules view
+ </title>
+ <mediaobject>
+ <imageobject role="html">
+ <imagedata width="576px" format="PNG" align="center"
+ fileref="&imgroot;explain/applied_rules_view.png" />
+ </imageobject>
+ <imageobject role="fo">
+ <imagedata width="5.5in" format="PNG" align="center"
+ fileref="&imgroot;explain/applied_rules_view.png" />
+ </imageobject>
+ <textobject>
+ <phrase>
+ Applied Rules view.
+ </phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ </para>
+ <para>
+ Besides the hierarchy, the view shows how often a rule tried to match
+ in total and how often it succeeded. This is shown in brackets at the
+ beginning of each rule entry. E.g., the Applied Rules view tells us
+ that the rule
+ <literal>NUM{REGEXP("19..|20..") -> MARK(Year)};</literal>
+ within script 'Year.tm' tried to match twice but only succeeded once.
+ </para>
+ <para>
+ After this information the rule itself is given. Notice that
+ each rule
+ is given with all the parameters it has been executed.
+ E.g., have a
+ look at rule entry
+ <literal>[1/1]Document{->MARKFAST(FirstName,FirstNameList,false,0,true)}
+ </literal>
+ within BLOCK Author. The rule obviously has been executed with five
+ parameters. If you double-click on this rule, you will get to the
+ rule in the script file 'Author.tm'. It shows the rule as follows:
+ <literal>Document{-> MARKFAST(FirstName, FirstNameList)};
+ </literal>
+ . This means the last three parameters have been default values used
+ to execute the rule.
+ </para>
+ <para>
+ Additionally some profiling information, giving details about
+ the absolute time and the percentage of total execution time the rule
+ needed, is added at the end of each rule entry.
+ </para>
+ <para>
+ The selection (single-click) of a rule in this view will
+ directly change the information visualized in the views Failed Rules
+ and Matched Rules.
+ </para>
+ </section>
+
+ <section
+ id="section.ugr.tools.tm.workbench.explain_perspective.matched_and_failed_rules">
+ <title>Matched Rules and Failed Rules</title>
+ <para>
+ If a rule is selected (single-click) in the Applied Rules view,
+ then the Matched Rules view displays all instances (text passages) on
+ which the rule matched. On the contrary, the Failed Rules view shows
+ the instances on which the rule tried but failed to match.
+ </para>
+ <para>
+ E.g. select rule
+ <literal>[2/3]Name{-PARTOF(NameListPart)} NameLinker[1,2]{->
+ MARK(NameListPart,1,2)}
+ </literal>
+ within BLOCK Author.
+ <xref
+ linkend='figure.ugr.tools.tm.workbench.explain_perspective.matched_and_failed_rules' />
+ shows the text passages this rule tried to match on. One did not
+ succeed. Therefore it is displayed within the Failed Rules view. Two
+ succeeded and are shown in the Matched Rules view.
+ </para>
+ <para>
+ The following image shows the Matched Rules and Failed Rules views.
+ <figure
+ id="figure.ugr.tools.tm.workbench.explain_perspective.matched_and_failed_rules">
+ <title> The views Matched Rules and Failed Rules
+ </title>
+ <mediaobject>
+ <imageobject role="html">
+ <imagedata width="576px" format="PNG" align="center"
+ fileref="&imgroot;explain/matched_and_failed.png" />
+ </imageobject>
+ <imageobject role="fo">
+ <imagedata width="5.5in" format="PNG" align="center"
+ fileref="&imgroot;explain/matched_and_failed.png" />
+ </imageobject>
+ <textobject>
+ <phrase>
+ The views Matched Rules and Failed Rules.
+ </phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ </para>
+ <para>
+ The selection (single-click) of one of the text passages in
+ either Matched Rules view or Failed Rules view will directly change
+ the information visualized in the Rule Elements view.
+ </para>
+ </section>
+
+ <section
+ id="section.ugr.tools.tm.workbench.explain_perspective.rule_elements">
+ <title>Rule Elements</title>
+ <para>
+ If you select one of the listed instances in the Matched or
+ Failed Rules view,
+ then the Rule Elements view contains a listing
+ of
+ the rule elements and their
+ conditions belonging to the related rule
+ used on the specific text passage. There is detailed
+ information
+ available on what text
+ passage each rule element did or
+ did not match
+ and which condition did
+ or did not evaluate true.
+ </para>
+ <para>
+ Within the Rule Elements view, each rule element generates its
+ own explanation hierarchy. On the root level the rule element itself
+ is given. An apostrophe at the beginning of the rule element
+ indicates that this rule was the anchor for the rule execution. On
+ the next level, the text passage on which the rule element tried to
+ match is given. The last level then explains why the rule element
+ did or did not match. The first entry on this level tells whether the text
+ passage is of the requested annotation type. If it is, a green check mark
+ is shown in front of the requested type. Otherwise a red cross is
+ shown. In the following the rule conditions and their evaluation on
+ the given text passage are shown.
+ </para>
+ <para>
+ In the previous example, select the listed instance
+ <literal>Bethard, S.</literal>
+ . The Rule Elements view then shows the related explanation displayed
+ in
+ <xref
+ linkend='figure.ugr.tools.tm.workbench.explain_perspective.rule_elements' />
+ .
+ </para>
+ <para>
+ The following image shows the TextMarker Rule Elements view.
+ <figure
+ id="figure.ugr.tools.tm.workbench.explain_perspective.rule_elements">
+ <title>The Rule Elements view
+ </title>
+ <mediaobject>
+ <imageobject role="html">
+ <imagedata width="250px" format="PNG" align="center"
+ fileref="&imgroot;explain/rule_elements.png" />
+ </imageobject>
+ <imageobject role="fo">
+ <imagedata width="3.5in" format="PNG" align="center"
+ fileref="&imgroot;explain/rule_elements.png" />
+ </imageobject>
+ <textobject>
+ <phrase>
+ The Rule Elements view.
+ </phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ </para>
+ <para>
+ As you can see, the first rule element
+ <literal>Name{-PARTOF(NameListPart)}</literal>
+ matched on the text passage
+ <literal>Bethard, S.</literal>
+ since it is firstly annotated with a
+ <quote>Name</quote>
+ annotation and secondly it is not part of an annotation
+ <quote>NameListPart</quote>
+ . But as this first text passage is not followed by a
+ <quote>NameLinker</quote>
+ annotation the whole rule fails.
+ </para>
+ </section>
+
+ <section
+ id="section.ugr.tools.tm.workbench.explain_perspective.covering_rules">
+ <title>Covering Rules</title>
+ <para>
+ This view is very similar to the Applied Rules view, but displays only the rules and
+ blocks that affect a given selection. If the user clicks on any position in the xmiCAS
+ document, the Covering Rules view is updated to contain only rule elements that affect
+ that position in the document. The Matched Rules, Failed Rules and Rule Elements views
+ then only contain match information for that position.
+ </para>
+ </section>
+
+ <section id="section.ugr.tools.tm.workbench.explain_perspective.rule_list">
+ <title>Rule List</title>
+ <para>
+ This view is very similar to the Applied Rules view and the Covering Rules view, but
+ displays only rules and NO blocks for a given selection. If the user clicks on any
+ position in the xmiCAS document, a list of the rules that matched or tried to match at
+ that position in the document is generated within the Rule List view. The Matched
+ Rules, Failed Rules and Rule Elements views then only contain match information for
+ that position. Additionally, this view provides a text field for filtering the rules:
+ only those rules remain that contain the entered text.
+ </para>
+ </section>
+
+ <section
+ id="section.ugr.tools.tm.workbench.explain_perspective.created_by">
+ <title>Created By</title>
+ <para>
+ The Created By view tells you which rule created a specific
+ annotation. To get this information just select an annotation in the
+ Annotation Browser. After doing this the Created By view shows the
+ related information.
+ </para>
+ <para>
+ To see how this works, use the example project and go to the
+ Annotation view. Select the
+ <quote>d.u.e.Year</quote>
+ annotation
+ <quote>(2008)</quote>
+ . The Created By view displays the information shown in
+ <xref linkend='figure.ugr.tools.tm.workbench.explain_perspective.created_by' />
+ . You can double-click on the shown rule to jump to the related
+ document
+ <quote>Year.tm</quote>
+ .
+ </para>
+ <para>
+ The following image shows the TextMarker Created By view.
+ <figure
+ id="figure.ugr.tools.tm.workbench.explain_perspective.created_by">
+ <title> The Created By view
+ </title>
+ <mediaobject>
+ <imageobject role="html">
+ <imagedata width="560px" format="PNG" align="center"
+ fileref="&imgroot;explain/created_by.png" />
+ </imageobject>
+ <imageobject role="fo">
+ <imagedata width="5.5in" format="PNG" align="center"
+ fileref="&imgroot;explain/created_by.png" />
+ </imageobject>
+ <textobject>
+ <phrase>
+ The Created By view.
+ </phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ </para>
+ </section>
+
+ <section
+ id="section.ugr.tools.tm.workbench.explain_perspective.statistics">
+ <title>Statistics</title>
+ <para>
+ The Statistics view displays profiling information for the conditions and actions of
+ the TextMarker language that were used. Three numbers are given for each element: the
+ total time of execution, the number of executions and the average time per execution.
+ </para>
+ <para>
+ The following image shows the TextMarker Statistics view generated
+ from the TextMarker example project.
+ <figure
+ id="figure.ugr.tools.tm.workbench.explain_perspective.statistics">
+ <title> The Statistics view
+ </title>
+ <mediaobject>
+ <imageobject role="html">
+ <imagedata width="300px" format="PNG" align="center"
+ fileref="&imgroot;explain/statistics.png" />
+ </imageobject>
+ <imageobject role="fo">
+ <imagedata width="4.0in" format="PNG" align="center"
+ fileref="&imgroot;explain/statistics.png" />
+ </imageobject>
+ <textobject>
+ <phrase>
+ The Statistics view.
+ </phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ </para>
+ </section>
+
+</section>
\ No newline at end of file
Added: uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/workbench/tools.textmarker.workbench.install.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/workbench/tools.textmarker.workbench.install.xml?rev=1398363&view=auto
==============================================================================
--- uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/workbench/tools.textmarker.workbench.install.xml (added)
+++ uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/workbench/tools.textmarker.workbench.install.xml Mon Oct 15 16:22:23 2012
@@ -0,0 +1,67 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY imgroot "images/tools/tm/workbench/" >
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
+%uimaents;
+]>
+<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements.
+ See the NOTICE file distributed with this work for additional information regarding copyright ownership.
+ The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not
+ use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
+ Unless required by applicable law or agreed to in writing, software distributed under the License is
+ distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and limitations under the License. -->
+
+<section id="section.ugr.tools.tm.workbench.install">
+ <title>Installation</title>
+ <para>
+    Install the TextMarker Workbench as follows:
+ <orderedlist numeration="arabic">
+ <listitem>
+ <para>
+          Download, install, and start Eclipse 3.7 (Indigo). Eclipse can be obtained
+          from the
+          <ulink url="http://www.eclipse.org/downloads/packages/release/indigo/sr2">eclipse.org</ulink>
+          download site.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Add the Apache UIMA update site (
+ <ulink url="http://www.apache.org/dist/uima/eclipse-update-site/">http://www.apache.org/dist/uima/eclipse-update-site/</ulink>
+ ) and the TextMarker update site
+ (
+ <ulink url="http://ki.informatik.uni-wuerzburg.de/~pkluegl/updatesite/">http://ki.informatik.uni-wuerzburg.de/~pkluegl/updatesite/</ulink>
+ ) to the available
+ software sites in your Eclipse installation. Click
+ <quote>Help →
+ Install New Software →
+ Add...
+ </quote>
+ and add each site.
+ </para>
+ </listitem>
+ <listitem>
+        <para>
+          At
+          <quote>Work with</quote>
+          , select the TextMarker update site, deselect
+          <quote>Group items by category</quote>
+          , and select
+          <quote>Contact all update sites during install to find required software</quote>
+          .
+        </para>
+ </listitem>
+ <listitem>
+        <para> Select the TextMarker feature and complete the dialog. The CEV feature is already
+          included in the TextMarker feature. Eclipse automatically installs the Apache UIMA
+          plugins (version 2.4.1) and the DLTK Core Framework plugins (version 3.0).
+        </para>
+ </listitem>
+ </orderedlist>
+ </para>
+ <para>
+    After a successful installation, switch to the TextMarker perspective. For an overview,
+    see
+    <xref linkend='section.ugr.tools.tm.workbench.overview' />
+    . Alternatively, you can download the TextMarker plugin from the
+    <ulink url="https://sourceforge.net/projects/textmarker/">TextMarker project on SourceForge.net</ulink> and install it
+    manually.
+ </para>
+</section>
\ No newline at end of file
Added: uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/workbench/tools.textmarker.workbench.overview.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/workbench/tools.textmarker.workbench.overview.xml?rev=1398363&view=auto
==============================================================================
--- uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/workbench/tools.textmarker.workbench.overview.xml (added)
+++ uima/sandbox/trunk/TextMarker/uima-docbook-textmarker/src/docbook/workbench/tools.textmarker.workbench.overview.xml Mon Oct 15 16:22:23 2012
@@ -0,0 +1,227 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY imgroot "images/tools/tm/workbench/" >
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
+%uimaents;
+]>
+<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements.
+ See the NOTICE file distributed with this work for additional information regarding copyright ownership.
+ The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not
+ use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
+ Unless required by applicable law or agreed to in writing, software distributed under the License is
+ distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and limitations under the License. -->
+
+<section id="section.ugr.tools.tm.workbench.overview">
+ <title>TextMarker Workbench Overview</title>
+ <para>
+ The TextMarker workbench provides two main perspectives.
+ <orderedlist numeration="arabic">
+ <listitem>
+ <para>
+          The
+          <quote>TextMarker perspective</quote>
+          , which provides the main functionality for working on TextMarker projects. See
+          <xref linkend='section.ugr.tools.tm.workbench.tm_perspective' />
+          for detailed information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+          The
+          <quote>Explain perspective</quote>
+          , which provides functionality primarily used to explain how a set of rules
+          behaved on a set of input documents. See
+          <xref linkend='section.ugr.tools.tm.workbench.explain_perspective' />
+          .
+ </para>
+ </listitem>
+ </orderedlist>
+ </para>
+ <para>
+ The following image shows the TextMarker perspective.
+ <figure id="figure.ugr.tools.tm.workbench.overview.tm_perspective">
+ <title> The TextMarker perspective.
+ </title>
+ <mediaobject>
+ <imageobject role="html">
+ <imagedata width="576px" format="PNG" align="center"
+ fileref="&imgroot;overview/screenshot_tm_perspective_.png" />
+ </imageobject>
+ <imageobject role="fo">
+ <imagedata width="5.5in" format="PNG" align="center"
+ fileref="&imgroot;overview/screenshot_tm_perspective_.png" />
+ </imageobject>
+ <textobject>
+ <phrase>
+ The TextMarker perspective.
+ </phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+      As you can see, the TextMarker perspective provides an editor for editing documents,
+      e.g., TextMarker scripts, and several views for other tasks. The Script Explorer, for
+      example, helps to manage your TextMarker projects.
+ </para>
+ <para>
+ The following
+ <xref linkend='table.ugr.tools.tm.workbench.overview.views' />
+ lists all available TextMarker views:
+ <table id="table.ugr.tools.tm.workbench.overview.views" frame="all">
+ <title>TextMarker views</title>
+ <tgroup cols="2" colsep="1" rowsep="1">
+ <colspec colname="c1" colwidth="1*" />
+ <colspec colname="c2" colwidth="1*" />
+ <thead>
+ <row>
+ <entry align="center">View</entry>
+ <entry align="center">Detailed description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Annotation Test</entry>
+ <entry>
+ See
+ <xref linkend='ugr.tools.tm.testing' />
+ </entry>
+ </row>
+ <row>
+ <entry>Applied Rules</entry>
+ <entry>
+ See
+ <xref linkend='section.ugr.tools.tm.workbench.explain_perspective.applied_rules' />
+ </entry>
+ </row>
+ <row>
+ <entry>Covering Rules</entry>
+ <entry>
+ See
+ <xref linkend='section.ugr.tools.tm.workbench.explain_perspective.covering_rules' />
+ </entry>
+ </row>
+ <row>
+ <entry>Created By</entry>
+ <entry>
+ See
+ <xref linkend='section.ugr.tools.tm.workbench.explain_perspective.created_by' />
+ </entry>
+ </row>
+ <row>
+ <entry>Failed Rules</entry>
+ <entry>
+ See
+ <xref linkend='section.ugr.tools.tm.workbench.explain_perspective.matched_and_failed_rules' />
+ </entry>
+ </row>
+ <row>
+ <entry>False Negative</entry>
+ <entry>
+ See
+ <xref linkend='ugr.tools.tm.testing' />
+ </entry>
+ </row>
+ <row>
+ <entry>False Positive</entry>
+ <entry>
+ See
+ <xref linkend='ugr.tools.tm.testing' />
+ </entry>
+ </row>
+ <row>
+ <entry>Matched Rules</entry>
+ <entry>
+ See
+            <xref linkend='section.ugr.tools.tm.workbench.explain_perspective.matched_and_failed_rules' />
+ </entry>
+ </row>
+ <row>
+ <entry>Rule Elements</entry>
+ <entry>
+ See
+ <xref linkend='section.ugr.tools.tm.workbench.explain_perspective.rule_elements' />
+ </entry>
+ </row>
+ <row>
+ <entry>Rule List</entry>
+ <entry>
+ See
+ <xref linkend='section.ugr.tools.tm.workbench.explain_perspective.rule_list' />
+ </entry>
+ </row>
+ <row>
+ <entry>Statistics</entry>
+ <entry>
+ See
+ <xref linkend='section.ugr.tools.tm.workbench.explain_perspective.statistics' />
+ </entry>
+ </row>
+ <row>
+ <entry>TextMarker Documentation</entry>
+ <entry>
+ ---
+ </entry>
+ </row>
+ <row>
+ <entry>TextMarker Query</entry>
+ <entry>
+ See
+ <xref linkend='section.ugr.tools.tm.workbench.tm_query' />
+ </entry>
+ </row>
+ <row>
+ <entry>TextRuler</entry>
+ <entry>
+ See
+ <xref linkend='section.ugr.tools.tm.workbench.textruler' />
+ </entry>
+ </row>
+ <row>
+ <entry>TextRuler Results</entry>
+ <entry>
+ See
+ <xref linkend='section.ugr.tools.tm.workbench.textruler' />
+ </entry>
+ </row>
+ <row>
+ <entry>True Positive</entry>
+ <entry>
+ See
+ <xref linkend='ugr.tools.tm.testing' />
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+ <para>
+ The following
+ <xref linkend='table.ugr.tools.tm.workbench.overview.wizards' />
+ lists all TextMarker wizards:
+ <table id="table.ugr.tools.tm.workbench.overview.wizards" frame="all">
+ <title>TextMarker wizards</title>
+ <tgroup cols="2" colsep="1" rowsep="1">
+ <colspec colname="c1" colwidth="1*" />
+ <colspec colname="c2" colwidth="1*" />
+ <thead>
+ <row>
+ <entry align="center">Wizard</entry>
+ <entry align="center">Detailed description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Create TextMarker project</entry>
+ <entry>
+ See
+ <xref linkend='section.ugr.tools.tm.workbench.projects.create_projects' />
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+</section>
\ No newline at end of file