Posted to commits@uima.apache.org by pk...@apache.org on 2013/03/06 13:29:56 UTC

svn commit: r1453311 - in /uima/site/trunk/uima-website: docs/images/textmarker/ docs/images/textmarker/textmarker_workbench.png docs/textmarker.html xdocs/textmarker.xml

Author: pkluegl
Date: Wed Mar  6 12:29:55 2013
New Revision: 1453311

URL: http://svn.apache.org/r1453311
Log:
[UIMA-2699] added first version of textmarker page

Added:
    uima/site/trunk/uima-website/docs/images/textmarker/
    uima/site/trunk/uima-website/docs/images/textmarker/textmarker_workbench.png   (with props)
Modified:
    uima/site/trunk/uima-website/docs/textmarker.html
    uima/site/trunk/uima-website/xdocs/textmarker.xml

Added: uima/site/trunk/uima-website/docs/images/textmarker/textmarker_workbench.png
URL: http://svn.apache.org/viewvc/uima/site/trunk/uima-website/docs/images/textmarker/textmarker_workbench.png?rev=1453311&view=auto
==============================================================================
Binary file - no diff available.

Propchange: uima/site/trunk/uima-website/docs/images/textmarker/textmarker_workbench.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: uima/site/trunk/uima-website/docs/textmarker.html
URL: http://svn.apache.org/viewvc/uima/site/trunk/uima-website/docs/textmarker.html?rev=1453311&r1=1453310&r2=1453311&view=diff
==============================================================================
--- uima/site/trunk/uima-website/docs/textmarker.html (original)
+++ uima/site/trunk/uima-website/docs/textmarker.html Wed Mar  6 12:29:55 2013
@@ -14,7 +14,7 @@
                         
             
                         
-                        <title>Apache UIMA - Apache UIMA Addons and Sandbox</title>
+                        <title>Apache UIMA - Apache UIMA TextMarker</title>
         </head>
 
         <body>
@@ -28,7 +28,7 @@
                             </a>
                     </td>
                     <td align='CENTER'>
-                          <div class="pageBanner">Apache UIMA Addons and Sandbox</div>
+                          <div class="pageBanner">Apache UIMA TextMarker</div>
                     </td>
                     <td align='RIGHT'>
                                   <a href="http://www.apache.org">
@@ -180,7 +180,196 @@
       </td></tr>
       <tr><td>
         <blockquote class="sectionBody">
-                </blockquote>
+                                          <ul>
+          <li><a href='#Overview'>
+                  Overview
+        
+                </a></li>
+          <li><a href='#Rule Language'>
+                  Rule Language
+        
+                </a></li>
+          <li><a href='#Workbench'>
+                  Workbench
+        
+                </a></li>
+          <li><a href='#Developer Information'>
+                  Developer Information
+        
+                </a></li>
+        </ul>
+                                                        <table class="subsectionTable" id='textmarker.overview'>
+        <tr><td>
+       
+       
+       
+          <a name="Overview">
+            <h2>Overview
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>
+    Apache UIMA&trade; TextMarker consists of two major parts: an Analysis Engine, which interprets 
+    and executes the TextMarker rule-based scripting language, and the Eclipse-based tooling (TextMarker Workbench), 
+    which supports the development of TextMarker rules.
+  </p>
+                                                <ul>
+    <li>
+      <p>
+        This page only contains a short overview. A more detailed introduction can be found in the documentation 
+        (<a href="d/textmarker-current/tools.textmarker.book.html">html</a>, 
+        <a href="d/textmarker-current/tools.textmarker.book.pdf">pdf</a>).
+      </p>    
+    </li> 
+    <li>
+      <p>
+        UIMA TextMarker Workbench can be installed via our Eclipse update site:
+        <a href="http://www.apache.org/dist/uima/eclipse-update-site/">http://www.apache.org/dist/uima/eclipse-update-site/</a>
+      </p>    
+    </li>     
+  </ul>
+                            </blockquote>
+        </td></tr>
+    </table>
+                                                      <table class="subsectionTable" id='textmarker.language'>
+        <tr><td>
+       
+       
+       
+          <a name="Rule Language">
+            <h2>Rule Language
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>
+    The TextMarker language is an imperative rule language extended with scripting elements. A TextMarker rule defines a
+    pattern of annotations with additional conditions. If this pattern applies, then the actions of the rule are performed 
+    on the matched annotations. A rule is composed of a sequence of rule elements, and a rule element essentially consists of four parts: 
+    a matching condition, an optional quantifier, a list of conditions and a list of actions.
+    The matching condition is typically an annotation type; the rule element then matches on the covered text of an annotation of that type.
+    The quantifier specifies whether the rule element must match successfully and how often it may match.
+    The list of conditions specifies additional constraints that the matched text or annotations need to fulfill. The list of actions defines
+    the consequences of the rule and often creates new annotations or modifies existing annotations.
+  </p>
+                                                <p>
+    The following example rule consists of three rule elements. The first one (<code>ANY...</code>) matches on every token whose covered text occurs in a word list named <code>MonthsList</code>.
+    The second rule element (<code>PERIOD?</code>) is optional and does not need to be fulfilled, which is indicated by the quantifier <code>?</code>. The last rule element (<code>NUM...</code>) matches
+    on numbers that fulfill the regular expression <code>REGEXP(".{2,4}")</code> and are therefore between two and four characters long.
+    If this rule successfully matches on a text passage, then its three actions are executed: An annotation of the type <code>Month</code> is created for the first rule element,
+    an annotation of the type <code>Year</code> is created for the last rule element and an annotation of the type <code>Date</code> 
+    is created for the span of all three rule elements. If the word list contains the correct entries, then this rule matches on strings like 
+    <code>Dec. 2004</code>, <code>July 85</code> or <code>11.2008</code> and creates the corresponding annotations.
+  
+    <pre>ANY{INLIST(MonthsList) -&gt; MARK(Month), MARK(Date,1,3)} 
+    PERIOD? NUM{REGEXP(".{2,4}") -&gt; MARK(Year)};</pre>
+  </p>
+                                                <p>
+     Here is a short overview of additional features of the TextMarker language: 
+  </p>
+                                                <ul>
+    <li>Expressions and variables</li>
+    <li>Import and execution of external components</li> 
+    <li>Flexible matching with filtering</li> 
+    <li>Modularization in different files or blocks</li>
+    <li>Control structures, e.g., for windowing</li>
+    <li>Score-based extraction</li>
+    <li>Modification</li>
+    <li>HTML support</li> 
+    <li>Dictionaries</li>
+    <li>Extensible language definition</li>  
+  </ul>
+                            </blockquote>
+        </td></tr>
+    </table>
+                                                      <table class="subsectionTable" id='textmarker.workbench'>
+        <tr><td>
+       
+       
+       
+          <a name="Workbench">
+            <h2>Workbench
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>
+   The TextMarker Workbench was created to facilitate all steps in creating Analysis Engines based on the TextMarker language.
+   Here is a short overview of included features: 
+  </p>
+                                                <ul>
+    <li>
+      <p>
+        <b>Editing support:</b> The full-featured editor for the TextMarker language provides syntax and semantic highlighting, 
+        syntax checking, context-sensitive auto-completion, template-based completion, open declaration and more.
+      </p>    
+    </li> 
+    <li>
+      <p>
+        <b>Rule Explanation:</b> Each step in the matching process can be explained: This includes how often a rule was applied, 
+        which condition was not fulfilled, or by which rule a specific annotation was created. Additionally, profile information 
+        about the runtime performance can be accessed.
+      </p>    
+    </li> 
+    <li>
+      <p>
+        <b>Test-driven development:</b> TextMarker scripts can be automatically tested against a set of annotated documents. 
+      </p>    
+    </li>
+    <li>
+      <p>
+        <b>Rule learning:</b> The supervised learning algorithms of the included TextRuler framework are able to induce TextMarker rules 
+        and, therefore, enable semi-automatic development of rule-based components.
+      </p>    
+    </li>
+    <li>
+      <p>
+        <b>Query:</b> TextMarker rules can be used as query statements in order to investigate annotated documents.
+      </p>    
+    </li>     
+  </ul>
+                                                <img style="width: 75%; height: 75%" src="./images/textmarker/textmarker_workbench.png" alt="UIMA TextMarker Workbench" />
+                            </blockquote>
+        </td></tr>
+    </table>
+                                                      <table class="subsectionTable" id='textmarker.developer'>
+        <tr><td>
+       
+       
+       
+          <a name="Developer Information">
+            <h2>Developer Information
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>The latest version of UIMA TextMarker is available via <a href="http://search.maven.org/#search%7Cga%7C1%7Ctextmarker">Maven Central</a>. 
+  If you use Maven as your build tool, then you can add the basic UIMA TextMarker functionality as a dependency 
+  in your pom.xml file (in addition to other UIMA dependencies):</p>
+                                                <pre>
+&lt;dependency&gt;
+  &lt;groupId&gt;org.apache.uima&lt;/groupId&gt;
+  &lt;artifactId&gt;textmarker-core&lt;/artifactId&gt;
+  &lt;version&gt;2.0.0&lt;/version&gt;
+&lt;/dependency&gt;
+  </pre>
+                                                <subsubsection>
+    For building the UIMA TextMarker projects from sources, follow the instructions for <a href="building-uima.html">building UIMA</a>, 
+    but use the following SVN checkout command instead:<br />
+    <code>svn checkout https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk c:/myWorkingDirectoryForTextMarker</code>
+  </subsubsection>
+                                                <p>
+    The sources of the current release are available at the <a href="downloads.html">download page</a>. 
+  </p>
+                            </blockquote>
+        </td></tr>
+    </table>
+                            </blockquote>
         </p>
       </td></tr>
     </table>

Modified: uima/site/trunk/uima-website/xdocs/textmarker.xml
URL: http://svn.apache.org/viewvc/uima/site/trunk/uima-website/xdocs/textmarker.xml?rev=1453311&r1=1453310&r2=1453311&view=diff
==============================================================================
--- uima/site/trunk/uima-website/xdocs/textmarker.xml (original)
+++ uima/site/trunk/uima-website/xdocs/textmarker.xml Wed Mar  6 12:29:55 2013
@@ -22,7 +22,7 @@ under the License.
 <document>
 
 <properties>
-<title>Apache UIMA Addons and Sandbox</title>
+<title>Apache UIMA TextMarker</title>
 <author email="dev@uima.apache.org">
  Apache UIMA Documentation Team</author>
 </properties>
@@ -31,8 +31,137 @@ under the License.
 
 <section name="Apache UIMA TextMarker">
 
+<subsectionToc/>
 
-
+<subsection name='Overview' id="textmarker.overview">
+  <p>
+    Apache UIMA&#0153; TextMarker consists of two major parts: an Analysis Engine, which interprets 
+    and executes the TextMarker rule-based scripting language, and the Eclipse-based tooling (TextMarker Workbench), 
+    which supports the development of TextMarker rules.
+  </p>
+  <ul>
+    <li>
+      <p>
+        This page only contains a short overview. A more detailed introduction can be found in the documentation 
+        (<a href="d/textmarker-current/tools.textmarker.book.html">html</a>, 
+        <a href="d/textmarker-current/tools.textmarker.book.pdf">pdf</a>).
+      </p>    
+    </li> 
+    <li>
+      <p>
+        UIMA TextMarker Workbench can be installed via our Eclipse update site:
+        <a href="http://www.apache.org/dist/uima/eclipse-update-site/">http://www.apache.org/dist/uima/eclipse-update-site/</a>
+      </p>    
+    </li>     
+  </ul>
+</subsection>
+
+<subsection name='Rule Language' id="textmarker.language">
+  <p>
+    The TextMarker language is an imperative rule language extended with scripting elements. A TextMarker rule defines a
+    pattern of annotations with additional conditions. If this pattern applies, then the actions of the rule are performed 
+    on the matched annotations. A rule is composed of a sequence of rule elements, and a rule element essentially consists of four parts: 
+    a matching condition, an optional quantifier, a list of conditions and a list of actions.
+    The matching condition is typically an annotation type; the rule element then matches on the covered text of an annotation of that type.
+    The quantifier specifies whether the rule element must match successfully and how often it may match.
+    The list of conditions specifies additional constraints that the matched text or annotations need to fulfill. The list of actions defines
+    the consequences of the rule and often creates new annotations or modifies existing annotations.
+  </p>
+  <p>
+    The following example rule consists of three rule elements. The first one (<code>ANY...</code>) matches on every token whose covered text occurs in a word list named <code>MonthsList</code>.
+    The second rule element (<code>PERIOD?</code>) is optional and does not need to be fulfilled, which is indicated by the quantifier <code>?</code>. The last rule element (<code>NUM...</code>) matches
+    on numbers that fulfill the regular expression <code>REGEXP(".{2,4}")</code> and are therefore between two and four characters long.
+    If this rule successfully matches on a text passage, then its three actions are executed: An annotation of the type <code>Month</code> is created for the first rule element,
+    an annotation of the type <code>Year</code> is created for the last rule element and an annotation of the type <code>Date</code> 
+    is created for the span of all three rule elements. If the word list contains the correct entries, then this rule matches on strings like 
+    <code>Dec. 2004</code>, <code>July 85</code> or <code>11.2008</code> and creates the corresponding annotations.
+  
+    <pre>ANY{INLIST(MonthsList) -> MARK(Month), MARK(Date,1,3)} 
+    PERIOD? NUM{REGEXP(".{2,4}") -> MARK(Year)};</pre>
+  </p>
+  
+   <p>
+     Here is a short overview of additional features of the TextMarker language: 
+  </p>
+  <ul>
+    <li>Expressions and variables</li>
+    <li>Import and execution of external components</li> 
+    <li>Flexible matching with filtering</li> 
+    <li>Modularization in different files or blocks</li>
+    <li>Control structures, e.g., for windowing</li>
+    <li>Score-based extraction</li>
+    <li>Modification</li>
+    <li>HTML support</li> 
+    <li>Dictionaries</li>
+    <li>Extensible language definition</li>  
+  </ul>
+  
+  
+  
+</subsection>
+
+<subsection name='Workbench' id="textmarker.workbench">
+  <p>
+   The TextMarker Workbench was created to facilitate all steps in creating Analysis Engines based on the TextMarker language.
+   Here is a short overview of included features: 
+  </p>
+  <ul>
+    <li>
+      <p>
+        <b>Editing support:</b> The full-featured editor for the TextMarker language provides syntax and semantic highlighting, 
+        syntax checking, context-sensitive auto-completion, template-based completion, open declaration and more.
+      </p>    
+    </li> 
+    <li>
+      <p>
+        <b>Rule Explanation:</b> Each step in the matching process can be explained: This includes how often a rule was applied, 
+        which condition was not fulfilled, or by which rule a specific annotation was created. Additionally, profile information 
+        about the runtime performance can be accessed.
+      </p>    
+    </li> 
+    <li>
+      <p>
+        <b>Test-driven development:</b> TextMarker scripts can be automatically tested against a set of annotated documents. 
+      </p>    
+    </li>
+    <li>
+      <p>
+        <b>Rule learning:</b> The supervised learning algorithms of the included TextRuler framework are able to induce TextMarker rules 
+        and, therefore, enable semi-automatic development of rule-based components.
+      </p>    
+    </li>
+    <li>
+      <p>
+        <b>Query:</b> TextMarker rules can be used as query statements in order to investigate annotated documents.
+      </p>    
+    </li>     
+  </ul>
+  <img style="width: 75%; height: 75%" src="./images/textmarker/textmarker_workbench.png" alt="UIMA TextMarker Workbench"/>     
+</subsection>
+
+<subsection name='Developer Information' id="textmarker.developer">
+  <p>The latest version of UIMA TextMarker is available via <a href="http://search.maven.org/#search%7Cga%7C1%7Ctextmarker">Maven Central</a>. 
+  If you use Maven as your build tool, then you can add the basic UIMA TextMarker functionality as a dependency 
+  in your pom.xml file (in addition to other UIMA dependencies):</p>
+  <pre>
+&lt;dependency>
+  &lt;groupId>org.apache.uima&lt;/groupId>
+  &lt;artifactId>textmarker-core&lt;/artifactId>
+  &lt;version>2.0.0&lt;/version>
+&lt;/dependency>
+  </pre>
+  <subsubsection>
+    For building the UIMA TextMarker projects from sources, follow the instructions for <a href="building-uima.html">building UIMA</a>, 
+    but use the following SVN checkout command instead:<br/>
+    <code>svn checkout https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk c:/myWorkingDirectoryForTextMarker</code>
+  </subsubsection>
+   
+  <p>
+    The sources of the current release are available at the <a href="downloads.html">download page</a>. 
+  </p>
+  
+</subsection>
+ 
 </section>
 
 </body>