Posted to commits@stanbol.apache.org by bu...@apache.org on 2012/11/23 14:16:43 UTC
svn commit: r839315 - in /websites/staging/stanbol/trunk/content: ./ docs/trunk/components/enhancer/nlp/nlpannotations docs/trunk/components/enhancer/nlp/nlpannotations.html

Author: buildbot
Date: Fri Nov 23 13:16:43 2012
New Revision: 839315

Log:
Staging update by buildbot for stanbol

Added:
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/nlp/nlpannotations.html
Removed:
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/nlp/nlpannotations
Modified:
    websites/staging/stanbol/trunk/content/   (props changed)

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Nov 23 13:16:43 2012
@@ -1 +1 @@
-1412874
+1412876

Added: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/nlp/nlpannotations.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/nlp/nlpannotations.html (added)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/nlp/nlpannotations.html Fri Nov 23 13:16:43 2012
@@ -0,0 +1,301 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - NLP Annotations</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="logo"> <!-- do not scroll the logo -->
+  <a href="/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/images/stanbol-logo/stanbol-2010-12-14.png"/></a></div>
+  <div id="navigation"> <!-- but auto scroll the menu -->
+      <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/index.html">Home</a></li>
+<li><a href="/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/docs/trunk/">Documentation</a><ul>
+<li><a href="/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/docs/trunk/components">Components</a></li>
+</ul>
+</li>
+<li><a href="/development/">Development</a></li>
+<li><a href="/production/">Production</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/downloads/">Overview</a><ul>
+<li><a href="/downloads/releases.html">Releases</a></li>
+<li><a href="/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="archive">Archive</h1>
+<ul>
+<li><a href="/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrumbs">
+      <ul> <li><a href="/">Home</a></li> <li class="item"><a href="/docs/">Docs</a></li> <li class="item"><a href="/docs/trunk/">Trunk</a></li> <li class="item"><a href="/docs/trunk/components/">Components</a></li> <li class="item"><a href="/docs/trunk/components/enhancer/">Enhancer</a></li> <li class="item"><a href="/docs/trunk/components/enhancer/nlp/">Nlp</a></li> </ul>
+    </div>
+    <h1 class="title">NLP Annotations</h1>
+    <p>While the <a href="analyzedtext">Analyzed Text</a> interface allows defining Sentences, Chunks and Tokens within the text and attaching annotations to them, this part of the Stanbol NLP processing module defines the Java domain model used for those annotations. This includes annotation models for Part of Speech (POS) tags, Chunks, recognized Named Entities (NER) as well as morphological analysis.</p>
+<h3 id="part-of-speech-pos-annotations">Part of Speech (POS) annotations</h3>
+<p>Part of Speech (POS) tagging is a token-level annotation. It assigns categories such as noun, verb, adjective or punctuation to tokens. These annotations are typically provided by a POS tagger that consumes Tokens and outputs tag(s) with confidence(s). Tags are usually string values that are members of a TagSet, a fixed list of tags used to annotate tokens. Tag sets are typically language specific and often even specific to the training corpus. This makes it hard to consume POS tags created by different POS taggers for different languages, as the consumer would need to know the meaning of every POS tag in every language.</p>
+<p>The POS annotation model defined by the Stanbol NLP module tries to solve this issue by providing means to align POS tag sets with formal categories defined by the <a href="http://nlp2rdf.lod2.eu/olia/">OLiA Ontology</a>. The following sub-section will provide details and usage examples.</p>
+<h4 id="olia-morphosyntacticcategories">OLiA MorphosyntacticCategories</h4>
+<p>The '<a href="http://nlp2rdf.lod2.eu/olia/">OLiA</a> Reference Model for Morphology and Morphosyntax, with experimental extension to Syntax' defines a set of ~150 formally defined and multi-lingual POS tags. Those types are defined as a non-cyclic multi-hierarchy with 'olia:MorphosyntacticCategory' as common root.</p>
+<p>For example, the POS 'olia:Gerund' is defined as an 'olia:NonFiniteVerb', which itself is an 'olia:Verb'. An example of the multi-hierarchy is 'olia:NominalQuantifier', which is both an 'olia:Noun' and an 'olia:Quantifier'.</p>
+<p>To integrate the formal definitions of the OLiA ontology with the Stanbol NLP annotations there are two Java enumerations:</p>
+<ul>
+<li><strong>LexicalCategory</strong>: This enumeration covers the 12 top level categories as defined by OLiA. This includes Noun, Verb, Adjective, Adposition, Adverb, Conjuction, Interjection, PronounOrDeterminer, Punctuation, Quantifier, Residual and Unique.</li>
+<li><strong>Pos</strong>: This enumeration covers all OLiA MorphosyntacticCategories from the second level downwards. Using the <em>Pos</em> enum one can, for example, distinguish between ProperNoun and CommonNoun or between FiniteVerb and NonFiniteVerb. The <em>Pos</em> enumeration fully supports the multi-hierarchy defined by OLiA. The Pos#categories() method returns the first-level parents of a <em>Pos</em> member. The Pos#hierarchy() method returns all of its parents from the second level downwards.</li>
+</ul>
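+<p>The split between first-level categories and the deeper multi-hierarchy can be sketched in a few lines of plain Java. The following is a simplified, self-contained illustration and not the Stanbol implementation: the enums <em>SketchCategory</em> and <em>SketchPos</em> are hypothetical stand-ins that contain only the examples mentioned above, and the sketch assumes that <em>hierarchy()</em> does not include the member itself.</p>

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the first-level categories (NOT the Stanbol enum).
enum SketchCategory { Noun, Verb, Quantifier }

// Hypothetical stand-in for the deeper levels (NOT the Stanbol Pos enum).
// Parents must be declared before their children.
enum SketchPos {
    NonFiniteVerb(EnumSet.of(SketchCategory.Verb)),
    Gerund(EnumSet.noneOf(SketchCategory.class), NonFiniteVerb),
    NominalQuantifier(EnumSet.of(SketchCategory.Noun, SketchCategory.Quantifier));

    private final Set<SketchCategory> categories;
    private final Set<SketchPos> hierarchy;

    SketchPos(Set<SketchCategory> directCategories, SketchPos... parents) {
        Set<SketchCategory> cats = new LinkedHashSet<>(directCategories);
        Set<SketchPos> ancestors = new LinkedHashSet<>();
        for (SketchPos parent : parents) {
            ancestors.add(parent);
            ancestors.addAll(parent.hierarchy); // transitive parents
            cats.addAll(parent.categories);     // inherited first-level categories
        }
        this.categories = Collections.unmodifiableSet(cats);
        this.hierarchy = Collections.unmodifiableSet(ancestors);
    }

    // First-level parents, including those inherited through the hierarchy.
    Set<SketchCategory> categories() { return categories; }

    // All deeper-level parents, excluding the member itself (assumption of this sketch).
    Set<SketchPos> hierarchy() { return hierarchy; }
}
```

+<p>In this sketch <em>SketchPos.Gerund.categories()</em> contains Verb because the category is inherited from NonFiniteVerb, while <em>SketchPos.NominalQuantifier.categories()</em> contains both Noun and Quantifier.</p>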
+<h4 id="postag-and-tagset">PosTag and TagSet</h4>
+<p>The PosTag represents a POS tag as used by a POS tagger. A PosTag supports the following features:</p>
+<ul>
+<li><strong>tag</strong> [1..1]::String - This is the string tag as used by the POS tagger.</li>
+<li><strong>category</strong> [0..*]::LexicalCategory - The assigned LexicalCategory enumeration members.</li>
+<li><strong>pos</strong> [0..*]::Pos - The assigned Pos enumeration members.</li>
+</ul>
+<p>An example of a PosTag representing an 'olia:ProperNoun' looks as follows:</p>
+<div class="codehilite"><pre><span class="n">PosTag</span> <span class="n">tag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PosTag</span><span class="o">(</span><span class="s">&quot;NP&quot;</span><span class="o">,</span> <span class="n">Pos</span><span class="o">.</span><span class="na">ProperNoun</span><span class="o">);</span>
+</pre></div>
+
+
+<p>The first parameter is the string POS tag used by the POS tagger and the second parameter is the mapping to the OLiA MorphosyntacticCategories for this tag. The next example shows a more sophisticated mapping for the "PWAV" tag (adverbial interrogative or relative pronoun) as used by the STTS tag set for the German language:</p>
+<div class="codehilite"><pre><span class="k">new</span> <span class="nf">PosTag</span><span class="o">(</span><span class="s">&quot;PWAV&quot;</span><span class="o">,</span> <span class="n">LexicalCategory</span><span class="o">.</span><span class="na">Adverb</span><span class="o">,</span> <span class="n">Pos</span><span class="o">.</span><span class="na">RelativePronoun</span><span class="o">,</span> <span class="n">Pos</span><span class="o">.</span><span class="na">InterrogativePronoun</span><span class="o">);</span>
+</pre></div>
+
+
+<p><em>TagSet</em> is the other important class, as it manages a set of PosTag instances. <em>TagSet</em> has two main functions: first, it allows the integrator of a POS tagger to define the mappings from the string POS tags used by the tagger to the LexicalCategory and Pos enumeration members preferably used by the Stanbol NLP chain. Second, it ensures that only a single PosTag instance is used to annotate all Tokens with the same type.</p>
+<p><em>TagSet</em>s are typically specified as static members of utility classes. The following code snippet shows an example:</p>
+<div class="codehilite"><pre><span class="c1">//Tagset is generically typed. We need a TagSet for PosTag&#39;s</span>
+<span class="kd">public</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">TagSet</span><span class="o">&lt;</span><span class="n">PosTag</span><span class="o">&gt;</span> <span class="n">STTS</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TagSet</span><span class="o">&lt;</span><span class="n">PosTag</span><span class="o">&gt;(</span>
+    <span class="s">&quot;STTS&quot;</span><span class="o">,</span> <span class="s">&quot;de&quot;</span><span class="o">);</span> <span class="c1">//define a name and the languages it supports</span>
+
+<span class="kd">static</span> <span class="o">{</span>
+    <span class="c1">//you can set properties to a TagSet. While supported this</span>
+    <span class="c1">//feature is currently not used by Stanbol</span>
+    <span class="n">STTS</span><span class="o">.</span><span class="na">getProperties</span><span class="o">().</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;olia.annotationModel&quot;</span><span class="o">,</span>
+        <span class="k">new</span> <span class="nf">UriRef</span><span class="o">(</span><span class="s">&quot;http://purl.org/olia/stts.owl&quot;</span><span class="o">));</span>
+    <span class="n">STTS</span><span class="o">.</span><span class="na">getProperties</span><span class="o">().</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;olia.linkingModel&quot;</span><span class="o">,</span>
+        <span class="k">new</span> <span class="nf">UriRef</span><span class="o">(</span><span class="s">&quot;http://purl.org/olia/stts-link.rdf&quot;</span><span class="o">));</span>
+    <span class="n">STTS</span><span class="o">.</span><span class="na">addTag</span><span class="o">(</span><span class="k">new</span> <span class="n">PosTag</span><span class="o">(</span><span class="s">&quot;ADJA&quot;</span><span class="o">,</span> <span class="n">Pos</span><span class="o">.</span><span class="na">AttributiveAdjective</span><span class="o">));</span>
+    <span class="n">STTS</span><span class="o">.</span><span class="na">addTag</span><span class="o">(</span><span class="k">new</span> <span class="n">PosTag</span><span class="o">(</span><span class="s">&quot;ADJD&quot;</span><span class="o">,</span> <span class="n">Pos</span><span class="o">.</span><span class="na">PredicativeAdjective</span><span class="o">));</span>
+    <span class="n">STTS</span><span class="o">.</span><span class="na">addTag</span><span class="o">(</span><span class="k">new</span> <span class="n">PosTag</span><span class="o">(</span><span class="s">&quot;ADV&quot;</span><span class="o">,</span> <span class="n">LexicalCategory</span><span class="o">.</span><span class="na">Adverb</span><span class="o">));</span>
+<span class="o">}</span>
+</pre></div>
+
+
+<p>The string tag (first parameter) of the <em>PosTag</em> is used as unique key by the <em>TagSet</em>. Adding a second <em>PosTag</em> with the same tag will override the first one. <em>PosTag</em>s that are added to a <em>TagSet</em> have the <em>Tag#getAnnotationModel()</em> property set to that <em>TagSet</em>.</p>
+<p>The final example shows the core part of a POS tagging engine that uses both the <a href="analyzedtext">AnalyzedText</a> and the <em>PosTag</em> and <em>TagSet</em> APIs.</p>
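+<p>The keying behaviour described above can be illustrated with a minimal registry. This is only a sketch under stated assumptions: <em>SketchTag</em> and <em>SketchTagSet</em> are hypothetical classes written for this illustration and are not the Stanbol <em>PosTag</em> and <em>TagSet</em> classes.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal tag (NOT the Stanbol PosTag class).
final class SketchTag {
    private final String tag;
    private final String mapping;

    SketchTag(String tag, String mapping) {
        this.tag = tag;
        this.mapping = mapping;
    }

    String getTag() { return tag; }
    String getMapping() { return mapping; }
}

// Hypothetical registry modelled on the behaviour described for TagSet.
final class SketchTagSet {
    private final Map<String, SketchTag> tags = new HashMap<>();

    // A later tag with the same string key replaces the earlier one.
    void addTag(SketchTag tag) { tags.put(tag.getTag(), tag); }

    // Returns the single registered instance, or null for an unmapped tag.
    SketchTag getTag(String tag) { return tags.get(tag); }
}
```

+<p>Because lookups always return the registered instance, every Token annotated with the same string tag shares one tag object, and an unmapped string tag yields <em>null</em> so that the caller can create an ad hoc tag.</p>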
+<div class="codehilite"><pre><span class="n">TagSet</span><span class="o">&lt;</span><span class="n">PosTag</span><span class="o">&gt;</span> <span class="n">tagSet</span><span class="o">;</span> <span class="c1">//the used TagSet</span>
+<span class="c1">//holds PosTags for tags returned by the POS tagger that</span>
+<span class="c1">//are missing in the TagSet</span>
+<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span><span class="n">PosTag</span><span class="o">&gt;</span> <span class="n">adhocTags</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span><span class="n">PosTag</span><span class="o">&gt;();</span>
+<span class="n">List</span><span class="o">&lt;</span><span class="n">Span</span><span class="o">&gt;</span> <span class="n">tokens</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Span</span><span class="o">&gt;(</span><span class="mi">64</span><span class="o">);</span>

+<span class="n">Iterator</span><span class="o">&lt;</span><span class="n">Section</span><span class="o">&gt;</span> <span class="n">sentences</span><span class="o">;</span> <span class="c1">//Iterator over the sentences</span>

+<span class="k">while</span><span class="o">(</span><span class="n">sentences</span><span class="o">.</span><span class="na">hasNext</span><span class="o">()){</span>
+    <span class="n">Section</span> <span class="n">sentence</span> <span class="o">=</span> <span class="n">sentences</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
+    <span class="c1">//get the tokens of the current sentence</span>
+    <span class="n">tokens</span><span class="o">.</span><span class="na">clear</span><span class="o">();</span>
+    <span class="n">AnalysedTextUtils</span><span class="o">.</span><span class="na">appandToList</span><span class="o">(</span>
+        <span class="n">sentence</span><span class="o">.</span><span class="na">getEnclosed</span><span class="o">(</span><span class="n">SpanTypeEnum</span><span class="o">.</span><span class="na">Token</span><span class="o">),</span>
+        <span class="n">tokens</span><span class="o">);</span>
+    <span class="c1">//typically one needs also to get the Strings</span>
+    <span class="c1">//of the tokens for the pos tagger</span>
+    <span class="n">String</span><span class="o">[]</span> <span class="n">tokenText</span> <span class="o">=</span> <span class="k">new</span> <span class="n">String</span><span class="o">[</span><span class="n">tokens</span><span class="o">.</span><span class="na">size</span><span class="o">()];</span>
+    <span class="k">for</span><span class="o">(</span><span class="kt">int</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="o">;</span><span class="n">i</span><span class="o">&lt;</span><span class="n">tokens</span><span class="o">.</span><span class="na">size</span><span class="o">();</span><span class="n">i</span><span class="o">++){</span>
+        <span class="n">tokenText</span><span class="o">[</span><span class="n">i</span><span class="o">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">i</span><span class="o">).</span><span class="na">getSpan</span><span class="o">();</span>
+    <span class="o">}</span>

+    <span class="c1">//now POS tag the sentence</span>
+    <span class="n">String</span><span class="o">[]</span> <span class="n">posTags</span> <span class="o">=</span> <span class="n">posTagger</span><span class="o">.</span><span class="na">tag</span><span class="o">(</span><span class="n">tokenText</span><span class="o">);</span>
+
+    <span class="c1">//finally apply the PosTags and save the annotation</span>
+    <span class="k">for</span><span class="o">(</span><span class="kt">int</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="o">;</span><span class="n">i</span><span class="o">&lt;</span><span class="n">tokens</span><span class="o">.</span><span class="na">size</span><span class="o">();</span><span class="n">i</span><span class="o">++){</span>
+        <span class="n">PosTag</span> <span class="n">tag</span> <span class="o">=</span> <span class="n">tagSet</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">posTags</span><span class="o">[</span><span class="n">i</span><span class="o">]);</span>
+        <span class="k">if</span><span class="o">(</span><span class="n">tag</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span> <span class="c1">//unmapped tag</span>
+            <span class="n">tag</span> <span class="o">=</span> <span class="n">adhocTags</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">posTags</span><span class="o">[</span><span class="n">i</span><span class="o">]);</span>
+        <span class="o">}</span>
+        <span class="k">if</span><span class="o">(</span><span class="n">tag</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span> <span class="c1">//unknown tag</span>
+            <span class="n">tag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PosTag</span><span class="o">(</span><span class="n">posTags</span><span class="o">[</span><span class="n">i</span><span class="o">]);</span>
+            <span class="n">adhocTags</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">posTags</span><span class="o">[</span><span class="n">i</span><span class="o">],</span><span class="n">tag</span><span class="o">);</span>
+        <span class="o">}</span>
+        <span class="c1">//add the annotation to the Token</span>
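+        <span class="n">tokens</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">i</span><span class="o">).</span><span class="na">addAnnotation</span><span class="o">(</span>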
+            <span class="n">NlpAnnotations</span><span class="o">.</span><span class="na">POS_ANNOTATION</span><span class="o">,</span>
+            <span class="n">Value</span><span class="o">.</span><span class="na">value</span><span class="o">(</span><span class="n">tag</span><span class="o">));</span>
+    <span class="o">}</span>
+<span class="o">}</span>
+</pre></div>
+
+
+<h3 id="phrase-annotations">Phrase annotations</h3>
+<p>Phrase annotations can be used to define the type of a <em>Chunk</em>. The <em>PhraseTag</em> class is used for phrase annotations. It defines a string tag and the phrase category. The <em>LexicalCategory</em> enumeration is used as the value for the category. As <em>PhraseTag</em> is a subclass of <em>Tag</em>, it can also be used in combination with the <em>TagSet</em> class as described in the <a href="#postag-and-tagset">PosTag and TagSet</a> section.</p>
+<p>The following code snippet shows how to create a PhraseTag for noun phrases:</p>
+<div class="codehilite"><pre><span class="n">PhraseTag</span> <span class="n">tag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PhraseTag</span><span class="o">(</span><span class="s">&quot;NP&quot;</span><span class="o">,</span> <span class="n">LexicalCategory</span><span class="o">.</span><span class="na">Noun</span><span class="o">);</span>
+</pre></div>
+
+
+<h3 id="name-entity-ner-annotations">Named Entity (NER) annotations</h3>
+<p>Named Entity annotations are created by NER modules. Before the Stanbol NLP chain was introduced, they were represented in Stanbol using '<a href="../enhancementstructure#fisetextannotation">fise:TextAnnotation</a>'s, and any Enhancement Engine that does NER should still support this. With the Stanbol NLP processing module it is now also possible to represent detected Named Entities as a <em>Chunk</em> with a <em>NerTag</em> added as annotation.</p>
+<p>A Named Entity represented as 'fise:TextAnnotation' includes the following information:</p>
+<div class="codehilite"><pre><span class="err">urn:namedEntity:1</span>
+    <span class="err">rdf:type</span> <span class="err">fise:TextAnnotation,</span> <span class="err">fise:Enhancement</span>
+    <span class="err">fise:selected-text</span> <span class="err">{named-entity-text}</span>
+    <span class="err">fise:start</span> <span class="err">{start-char-pos}</span>
+    <span class="err">fise:end</span> <span class="err">{end-char-pos}</span>
+    <span class="err">dc:type</span> <span class="err">{named-entity-type}</span>
+</pre></div>
+
+
+<p>where:</p>
+<ul>
+<li>{named-entity-text} is the text recognized as Named Entity. This is the same as returned by <em>Chunk#getSpan()</em></li>
+<li>{start-char-pos} is the start character position of the Named Entity relative to the start of the text. This is the same as <em>Chunk#getStart()</em></li>
+<li>{end-char-pos} is the end position and the same as <em>Chunk#getEnd()</em></li>
+<li>{named-entity-type} is the type of the recognized Named Entity as URI. The <em>NerTag</em> allows defining both the string tag used by the NER component and the URI this type is mapped to. In Stanbol it is preferred to use 'dbpedia:Person', 'dbpedia:Organisation' and 'dbpedia:Place' for the corresponding entity types.</li>
+</ul>
+<p>The <em>NerTag</em> class extends <em>Tag</em> and can therefore be also used with the <em>TagSet</em> class. This means that users of the API can use <em>TagSet</em> to manage the string tag to URI mappings for the supported Named Entity types.</p>
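+<p>The following code snippet shows how to add NER annotations to the AnalysedText:</p>
+<div class="codehilite"><pre><span class="n">AnalysedText</span> <span class="n">at</span><span class="o">;</span> <span class="c1">//The AnalysedText</span>
+<span class="n">TagSet</span><span class="o">&lt;</span><span class="n">NerTag</span><span class="o">&gt;</span> <span class="n">nerTags</span><span class="o">;</span> <span class="c1">//registered NER tags</span>
+<span class="n">Iterator</span><span class="o">&lt;</span><span class="n">Section</span><span class="o">&gt;</span> <span class="n">sections</span><span class="o">;</span> <span class="c1">//sections to iterate over</span>

+<span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">tokenTexts</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;(</span><span class="mi">64</span><span class="o">);</span>

+<span class="k">while</span><span class="o">(</span><span class="n">sections</span><span class="o">.</span><span class="na">hasNext</span><span class="o">()){</span>
+    <span class="n">Section</span> <span class="n">section</span> <span class="o">=</span> <span class="n">sections</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
+    <span class="c1">//NER tagger typically need String[] as input</span>
+    <span class="n">tokenTexts</span><span class="o">.</span><span class="na">clear</span><span class="o">();</span>
+    <span class="n">Iterator</span><span class="o">&lt;</span><span class="n">Token</span><span class="o">&gt;</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">section</span><span class="o">.</span><span class="na">getTokens</span><span class="o">();</span>
+    <span class="k">while</span><span class="o">(</span><span class="n">tokens</span><span class="o">.</span><span class="na">hasNext</span><span class="o">()){</span>
+        <span class="n">tokenTexts</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">tokens</span><span class="o">.</span><span class="na">next</span><span class="o">().</span><span class="na">getSpan</span><span class="o">());</span>
+    <span class="o">}</span>
+    <span class="c1">//Span -&gt; #start #end #type #probability</span>
+    <span class="n">Span</span><span class="o">[]</span> <span class="n">nerSpans</span> <span class="o">=</span> <span class="n">nerTagger</span><span class="o">.</span><span class="na">tag</span><span class="o">(</span>
+        <span class="n">tokenTexts</span><span class="o">.</span><span class="na">toArray</span><span class="o">(</span><span class="k">new</span> <span class="n">String</span><span class="o">[</span><span class="n">tokenTexts</span><span class="o">.</span><span class="na">size</span><span class="o">()]));</span>
+    <span class="k">for</span><span class="o">(</span><span class="kt">int</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">nerSpans</span><span class="o">.</span><span class="na">length</span><span class="o">;</span> <span class="n">i</span><span class="o">++){</span>
+        <span class="n">Chunk</span> <span class="n">namedEntity</span> <span class="o">=</span> <span class="n">at</span><span class="o">.</span><span class="na">addChunk</span><span class="o">(</span>
+            <span class="n">nerSpans</span><span class="o">[</span><span class="n">i</span><span class="o">].</span><span class="na">start</span><span class="o">,</span> <span class="n">nerSpans</span><span class="o">[</span><span class="n">i</span><span class="o">].</span><span class="na">end</span><span class="o">);</span>
+        <span class="n">NerTag</span> <span class="n">tag</span> <span class="o">=</span> <span class="n">nerTags</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">nerSpans</span><span class="o">[</span><span class="n">i</span><span class="o">].</span><span class="na">type</span><span class="o">);</span>
+        <span class="k">if</span><span class="o">(</span><span class="n">tag</span> <span class="o">==</span> <span class="kc">null</span><span class="o">){</span> <span class="c1">//unmapped NER</span>
+            <span class="n">tag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">NerTag</span><span class="o">(</span><span class="n">nerSpans</span><span class="o">[</span><span class="n">i</span><span class="o">].</span><span class="na">type</span><span class="o">);</span>
+        <span class="o">}</span>
+        <span class="n">namedEntity</span><span class="o">.</span><span class="na">addAnnotation</span><span class="o">(</span>
+            <span class="n">NlpAnnotations</span><span class="o">.</span><span class="na">NER_ANNOTATION</span><span class="o">,</span>
+            <span class="n">Value</span><span class="o">.</span><span class="na">value</span><span class="o">(</span><span class="n">tag</span><span class="o">,</span> <span class="n">nerSpans</span><span class="o">[</span><span class="n">i</span><span class="o">].</span><span class="na">probability</span><span class="o">));</span>
+    <span class="o">}</span>
+<span class="o">}</span>
+</pre></div>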
+<p>Note that the above code snippet only shows how to add the Named Entity to the AnalyzedText ContentPart. An actual NER engine implementation also needs to add this information to the metadata of the <a href="../contentitem">ContentItem</a>:</p>
+<div class="codehilite"><pre><span class="n">ContentItem</span> <span class="n">ci</span><span class="o">;</span> <span class="c1">//The processed ContentItem</span>
+<span class="n">Language</span> <span class="n">language</span><span class="o">;</span> <span class="c1">//The Language of the processed Text</span>
+<span class="n">MGraph</span> <span class="n">metadata</span> <span class="o">=</span> <span class="n">ci</span><span class="o">.</span><span class="na">getMetadata</span><span class="o">();</span>
+<span class="n">Section</span> <span class="n">section</span><span class="o">;</span> <span class="c1">//the current Section</span>
+<span class="n">Chunk</span> <span class="n">namedEntity</span><span class="o">;</span> <span class="c1">//the currently processed Named Entity</span>
+
+<span class="n">Value</span><span class="o">&lt;</span><span class="n">NerTag</span><span class="o">&gt;</span> <span class="n">nerAnnotation</span> <span class="o">=</span> <span class="n">namedEntity</span><span class="o">.</span><span class="na">getAnnotation</span><span class="o">(</span>
+    <span class="n">NlpAnnotations</span><span class="o">.</span><span class="na">NER_ANNOTATION</span><span class="o">);</span>
+
+<span class="n">UriRef</span> <span class="n">textAnnotation</span> <span class="o">=</span> <span class="n">EnhancementEngineHelper</span><span class="o">.</span><span class="na">createTextEnhancement</span><span class="o">(</span><span class="n">ci</span><span class="o">,</span> <span class="k">this</span><span class="o">);</span>
+<span class="n">metadata</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">TripleImpl</span><span class="o">(</span><span class="n">textAnnotation</span><span class="o">,</span> <span class="n">ENHANCER_SELECTED_TEXT</span><span class="o">,</span>
+    <span class="k">new</span> <span class="nf">PlainLiteralImpl</span><span class="o">(</span><span class="n">namedEntity</span><span class="o">.</span><span class="na">getSpan</span><span class="o">(),</span> <span class="n">language</span><span class="o">)));</span>
+<span class="n">metadata</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">TripleImpl</span><span class="o">(</span><span class="n">textAnnotation</span><span class="o">,</span> <span class="n">ENHANCER_SELECTION_CONTEXT</span><span class="o">,</span>
+    <span class="k">new</span> <span class="nf">PlainLiteralImpl</span><span class="o">(</span><span class="n">section</span><span class="o">.</span><span class="na">getSpan</span><span class="o">(),</span> <span class="n">language</span><span class="o">)));</span>
+<span class="k">if</span><span class="o">(</span><span class="n">nerAnnotation</span><span class="o">.</span><span class="na">value</span><span class="o">().</span><span class="na">getType</span><span class="o">()</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">){</span>
+    <span class="n">metadata</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">TripleImpl</span><span class="o">(</span><span class="n">textAnnotation</span><span class="o">,</span> <span class="n">DC_TYPE</span><span class="o">,</span>
+        <span class="n">nerAnnotation</span><span class="o">.</span><span class="na">value</span><span class="o">().</span><span class="na">getType</span><span class="o">()));</span>
+<span class="o">}</span> <span class="c1">//else do not add an dc:type for unmapped NamedEntities</span>
+<span class="n">metadata</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">TripleImpl</span><span class="o">(</span><span class="n">textAnnotation</span><span class="o">,</span> <span class="n">ENHANCER_CONFIDENCE</span><span class="o">,</span>
+    <span class="n">literalFactory</span><span class="o">.</span><span class="na">createTypedLiteral</span><span class="o">(</span><span class="n">nerAnnotation</span><span class="o">.</span><span class="na">probability</span><span class="o">())));</span>
+<span class="n">metadata</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">TripleImpl</span><span class="o">(</span><span class="n">textAnnotation</span><span class="o">,</span> <span class="n">ENHANCER_START</span><span class="o">,</span>
+    <span class="n">literalFactory</span><span class="o">.</span><span class="na">createTypedLiteral</span><span class="o">(</span><span class="n">namedEntity</span><span class="o">.</span><span class="na">getStart</span><span class="o">())));</span>
+<span class="n">metadata</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">TripleImpl</span><span class="o">(</span><span class="n">textAnnotation</span><span class="o">,</span> <span class="n">ENHANCER_END</span><span class="o">,</span>
+    <span class="n">literalFactory</span><span class="o">.</span><span class="na">createTypedLiteral</span><span class="o">(</span><span class="n">namedEntity</span><span class="o">.</span><span class="na">getEnd</span><span class="o">())));</span>
+</pre></div>
+
+
+<h3 id="morphological-analyses">Morphological Analyses</h3>
+<p><strong>NOTE:</strong> <em>This part of the Stanbol NLP annotations is still work in progress. So this part of the API might undergo heavy changes even in minor releases.</em></p>
+<p>The results of a morphological analysis are represented by the <em>MorphoFeatures</em> class and can be added to the analyzed word (<em>Token</em>) by using the <em>NlpAnnotations.MORPHO_ANNOTATION</em>. The <em>MorphoFeatures</em> class provides the following features:</p>
+<ul>
+<li><strong>Lemma</strong>: A String value representing the lemmatization of the annotated Token.</li>
+<li><strong>Case</strong>: The <em>Case</em> enumeration contains around 70 members defined based on concepts of the <a href="http://nlp2rdf.lod2.eu/olia/">OLiA Ontology</a>. The <em>CaseTag</em> allows to define cases and optionally map them to the cases defined by the enumeration.</li>
+<li><strong>Definitness</strong>: The <em>Definitness</em> enumeration has the members Definite and Indefinite also defined by Concepts in the <a href="http://nlp2rdf.lod2.eu/olia/">OLiA Ontology</a>.</li>
+<li><strong>Gender</strong>: The <em>Gender</em> enumeration contains the six genders defined by the <a href="http://nlp2rdf.lod2.eu/olia/">OLiA Ontology</a>. The <em>GenderTag</em> allows defining genders and optionally mapping them to the genders defined by the enumeration.</li>
+<li><strong>Number</strong>: The <em>NumberFeature</em> enumeration defines the eight number features defined by <a href="http://nlp2rdf.lod2.eu/olia/">OLiA</a>. The <em>NumberTag</em> can be used to define number features and map them to the members of the enumeration.</li>
+<li><strong>Person</strong>: The <em>Person</em> enumeration has the definitions for 'first', 'second' and 'third' with mappings to the corresponding concepts of the <a href="http://nlp2rdf.lod2.eu/olia/">OLiA Ontology</a>.</li>
+<li><strong>Tense</strong>: The <em>Tense</em> enumeration represents the tense hierarchy as defined by the <a href="http://nlp2rdf.lod2.eu/olia/">OLiA Ontology</a>. The <em>Tense#getParent()</em> method gives access to the direct parent of a <em>Tense</em>, while the <em>Tense#getTenses()</em> method can be used to obtain the transitive closure (including the <em>Tense</em> object itself). <em>TenseTag</em> is used for Tense annotations. It allows both providing a string tag representing the tense and defining a mapping to the tenses defined by the <em>Tense</em> enumeration.</li>
+<li><strong>Mood</strong>: The <em>VerbMood</em> enumeration currently defines members from different parts of the <a href="http://nlp2rdf.lod2.eu/olia/">OLiA Ontology</a>. OLiA does define the 'olia:MoodFeature' class, but its members did not match well with the verb moods used by the CELI/linguagrid.org service. For now the decision was to define the <em>VerbMood</em> enumeration closer to the usage of CELI, but this clearly needs to be validated as soon as implementations for other NLP frameworks are added. There is also a <em>VerbMoodTag</em> that allows defining verb moods by a string tag and a mapping to the <em>VerbMood</em> enumeration.</li>
+</ul>
+<p>The <em>MorphoFeatures</em> class supports multi-valued annotations for all of the above features. Getters for a single value always return the first added value.</p>
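+<p>The relation between multi-valued features and single-value getters can be sketched as follows. This is a hedged illustration only: <em>SketchFeatures</em> and its method names are hypothetical and are not the Stanbol <em>MorphoFeatures</em> API, and returning <em>null</em> when no value was added is an assumption of this sketch.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical container (NOT the Stanbol MorphoFeatures class).
final class SketchFeatures {
    private final List<String> lemmas = new ArrayList<>();

    // Values are kept in the order in which they were added.
    void addLemma(String lemma) { lemmas.add(lemma); }

    // Single-value getter: the first added value, or null if none was added.
    String getLemma() { return lemmas.isEmpty() ? null : lemmas.get(0); }

    // Multi-value getter: a read-only view of all values in insertion order.
    List<String> getLemmas() { return Collections.unmodifiableList(lemmas); }
}
```

+<p>With this design a consumer that only needs one value can use the single-value getter, while consumers that handle ambiguity can read the full list.</p>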
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>
+