Posted to commits@stanbol.apache.org by bu...@apache.org on 2012/04/11 10:30:50 UTC

svn commit: r812318 [3/10] - in /websites/staging/stanbol/trunk/content: ./ stanbol/docs/0.9.0-incubating/ stanbol/docs/0.9.0-incubating/cmsadapter/ stanbol/docs/0.9.0-incubating/contenthub/ stanbol/docs/0.9.0-incubating/enhancer/ stanbol/docs/0.9.0-in...

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/chains/weightedchain.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/chains/weightedchain.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/chains/weightedchain.html Wed Apr 11 08:30:47 2012
@@ -0,0 +1,100 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Weighted Chain</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Weighted Chain</h1>
+    <p>The WeightedChain takes a list of <a href="../engines">EnhancementEngine</a> names as input and uses the "org.apache.stanbol.enhancer.engine.order" metadata of the configured Engines to calculate an ExecutionPlan.</p>
+<p>This chain is designed for easy configuration - just a list of the engine names - but has limited possibilities to control the execution order.</p>
+<h2 id="configuration">Configuration</h2>
+<p>The property "stanbol.enhancer.chain.weighted.chain" is used to provide the list of engine names. Both arrays and collections are supported as values.</p>
+<p>In addition, engines can be marked as optional. An optional engine does not cause the enhancement process to fail when it is not active or when it fails while processing a content item.</p>
+<p>The syntax for marking an engine as optional is as follows <em>(both variants make the execution of the engine named &lt;name&gt; optional)</em>:</p>
+<div class="codehilite"><pre>&lt;name&gt;;optional
+&lt;name&gt;;optional=true
+</pre></div>
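+<p>The two variants shown above could be read with a small parser like the following. This is only a sketch: the class <code>ChainEntryParser</code> and its <code>Entry</code> type are hypothetical and not part of the Stanbol API, and treating <code>optional=false</code> as "required" is an assumption.</p>

```java
/** Hypothetical helper for illustration only; not part of the Stanbol API. */
class ChainEntryParser {

    /** One configured engine: its name and whether it was marked optional. */
    static final class Entry {
        final String name;
        final boolean optional;

        Entry(String name, boolean optional) {
            this.name = name;
            this.optional = optional;
        }
    }

    /** Parses a single configured value such as "metaxa;optional" or "langid". */
    static Entry parse(String value) {
        String[] tokens = value.trim().split(";");
        String name = tokens[0].trim();
        boolean optional = false;
        // Everything after the first ';' is treated as a parameter.
        for (int i = 1; i < tokens.length; i++) {
            String param = tokens[i].trim();
            if (param.equals("optional")) {
                optional = true;
            } else if (param.startsWith("optional=")) {
                // Assumption: any value other than "true" means "required".
                String flag = param.substring("optional=".length()).trim();
                optional = Boolean.parseBoolean(flag);
            }
        }
        return new Entry(name, optional);
    }

    public static void main(String[] args) {
        String[] configured = {"metaxa;optional", "langid", "keywords;optional=true"};
        for (String value : configured) {
            Entry e = parse(value);
            System.out.println(e.name + " optional=" + e.optional);
        }
    }
}
```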
+
+
+<p><img alt="Configuration dialog for the WeightedChain" src="enhancer-weightedchain-config.png" title="Screenshot of the configuration dialog for a WeightedChain with two required and one optional engine" /></p>
+<h2 id="calculation-of-the-executionplan">Calculation of the ExecutionPlan</h2>
+<p>It is important to note that the ordering of the list has no influence on the ExecutionPlan because the order of execution of the configured <a href="../engines">EnhancementEngines</a> is calculated only by using the value of the "org.apache.stanbol.enhancer.engine.order" property provided by the EnhancementEngine:</p>
+<ul>
+<li>Engines with a lower order are executed before engines with a higher value</li>
+<li>Engines with the same order may be executed simultaneously if the EnhancementJobManager and the EnhancementEngine do support this feature.</li>
+</ul>
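+<p>The two rules above can be sketched as follows: engines are grouped by their order value and the groups are processed in ascending order. The class name, the engine name <code>dbpediaLinking</code> and the numeric order values are made up for illustration; the real values come from the "org.apache.stanbol.enhancer.engine.order" property of each engine, and the real ExecutionPlan is an RDF graph rather than a list.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; this is not the Stanbol implementation. */
class WeightedOrderSketch {

    /**
     * Groups engine names by their order value. Groups are returned in
     * ascending order; engines within one group share the same order and
     * could therefore run simultaneously.
     */
    static List<List<String>> plan(Map<String, Integer> orderByEngine) {
        TreeMap<Integer, List<String>> groups = new TreeMap<Integer, List<String>>();
        for (Map.Entry<String, Integer> e : orderByEngine.entrySet()) {
            List<String> group = groups.get(e.getValue());
            if (group == null) {
                group = new ArrayList<String>();
                groups.put(e.getValue(), group);
            }
            group.add(e.getKey());
        }
        List<List<String>> plan = new ArrayList<List<String>>();
        for (List<String> group : groups.values()) {
            // Sort names so the result does not depend on the configured list order.
            Collections.sort(group);
            plan.add(group);
        }
        return plan;
    }

    public static void main(String[] args) {
        // The insertion order mimics an arbitrarily ordered configuration list.
        Map<String, Integer> orders = new LinkedHashMap<String, Integer>();
        orders.put("dbpediaLinking", 0);   // hypothetical name and value
        orders.put("langid", -50);         // hypothetical value
        orders.put("metaxa", -100);        // hypothetical value
        System.out.println(plan(orders));
    }
}
```

+<p>Because the result is derived from the order values alone, reordering the configured list leaves the plan unchanged.</p>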
+<p>The WeightedChain follows exactly the same algorithm that the WeightedJobManager uses to decide the execution order of all active EnhancementEngines. However, the WeightedChain only considers the configured engines and ignores all others.</p>
+<p>The following image shows the ExecutionPlan as calculated based on the above configuration.</p>
+<p><img alt="ExecutionPlan for the keyword chain" src="enhancer-weightedchain-allactive.png" title="The ExecutionPlan is calculated based on the 'order' information of the Enhancement Engines. In this case first 'metaxa' is used to convert any type of content to plain text; second the 'langid' engine is used to detect the language and third the words mentioned in the text are used to lookup entities in DBpedia.org" /></p>
+<p>If some of the Enhancement Engines are not available, this is visualized as follows. If you send content via the RESTful interface, similar information is available via the Execution Metadata included in the metadata of the enhanced content item.</p>
+<p><img alt="Optional Engine is inactive" src="enhancer-weightedchain-optionalinactive.png" title="The optional 'metaxa' engine is inactive. The engines can still be executed however content other than plain text will not get enhanced" /></p>
+<p>This shows that the optional engine 'metaxa' is currently not available. The chain can still be used; however, the functionality provided by this optional engine will not be available. In this case only requests for plain text files can be processed.</p>
+<p>The next figure shows a situation where a required engine is not active. Requests to this chain will fail until all required engines are active.</p>
+<p><img alt="Required Engine is inactive" src="enhancer-weightedchain-requiredinactive.png" title="The required 'langid' engine is not active. Because of this requests to this chain will fail." /></p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/contentitem.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/contentitem.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/contentitem.html Wed Apr 11 08:30:47 2012
@@ -0,0 +1,191 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Content Item</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Content Item</h1>
+    <p><span style="float:right"> <img alt="Content Item Overview" src="contentitemoverview.png" title="The ContentItem can contain several ContentParts and the Enhancement Metadata - an RDF Graph" /></span> </p>
+<p>The ContentItem is the object which represents the content to be enhanced by Apache Stanbol. It is created based on the data provided by the enhancement request and used throughout the enhancement process to store results. Therefore, after the enhancement process has finished, the ContentItem represents the result of the Apache Stanbol enhancement process.</p>
+<p>The following section describes the interface of the ContentItem in detail:</p>
+<h3 id="content-parts">Content Parts</h3>
+<p>Content parts are used to represent the original content as well as transformations of the original content (typically created by pre-processing <a href="engines/list.html">enhancement engines</a> such as the <a href="engines/metaxaengine.html">Metaxa engine</a>). </p>
+<p>The ContentItem provides the following API to work with content parts:</p>
+<div class="codehilite"><pre><span class="cm">/** Getter for the ContentPart based on the index */</span>
+<span class="n">getPart</span><span class="o">(</span><span class="kt">int</span> <span class="n">index</span><span class="o">,</span> <span class="n">Class</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">type</span><span class="o">)</span> <span class="o">:</span> <span class="n">T</span>
+<span class="cm">/** Getter for the ContentPart based on its ID */</span>
+<span class="n">getPart</span><span class="o">(</span><span class="n">UriRef</span> <span class="n">uri</span><span class="o">,</span> <span class="n">Class</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">type</span><span class="o">)</span> <span class="o">:</span> <span class="n">T</span>
+<span class="cm">/** Getter for the ID based on the index */</span>
+<span class="n">getPartUri</span><span class="o">(</span><span class="kt">int</span> <span class="n">index</span><span class="o">)</span> <span class="o">:</span> <span class="n">UriRef</span>
+<span class="cm">/** Adds a new ContentPart to the content item */</span>
+<span class="n">addPart</span><span class="o">(</span><span class="n">UriRef</span> <span class="n">uri</span><span class="o">,</span> <span class="n">Object</span> <span class="n">part</span><span class="o">)</span> <span class="o">:</span> <span class="n">Object</span>
+</pre></div>
+
+
+<p>Content parts are accessible by the index <em>and</em> by their URI formatted ID. Re-adding a content part will replace the old one. The index will not be changed by this operation.</p>
+<p>There are two types of content parts:</p>
+<ol>
+<li>Content parts which have additional metadata provided within the metadata of the content item. Such content parts are typically used to store transformed versions of the original content. This allows e.g. engines which can only process plain text versions to query for the content part containing this version of the parsed document.</li>
+<li>Content parts that are registered under a predefined URI. Such content parts are typically not mentioned within the metadata of the content item. This is used to share intermediate enhancement results between enhancement engines. An example would be tokens, sentences, POS tags and chunks that are extracted by some NLP engine. Engines which want to consume such data need to know the predefined URI of the content part holding this data. They will check within the <code>canEnhance(..)</code> method if a content part with an expected URI is present and if it has the correct type. </li>
+</ol>
+<h3 id="accessing-the-main-content-of-the-contentitem">Accessing the main content of the ContentItem</h3>
+<p>The main content of the ContentItem is the content passed in with the enhancement request (or downloaded from the URL provided by a request). The following methods are available for accessing this content:</p>
+<div class="codehilite"><pre><span class="cm">/** Getter for the InputStream of the content as parsed</span>
+<span class="cm">    for the ContentItem */</span>
+<span class="o">+</span> <span class="n">getStream</span><span class="o">()</span> <span class="o">:</span> <span class="n">InputStream</span>
+<span class="cm">/** Getter for the mime type of the content */</span>
+<span class="o">+</span> <span class="n">getMimeType</span><span class="o">()</span> <span class="o">:</span> <span class="n">String</span>
+<span class="cm">/** Getter for the Content as Blob */</span>
+<span class="o">+</span> <span class="n">getBlob</span><span class="o">()</span> <span class="o">:</span> <span class="n">Blob</span>
+</pre></div>
+
+
+<p>The <code>getStream()</code> and <code>getMimeType()</code> methods are shortcuts for the corresponding methods of the content item's blob object. Calling <code>contentItem.getBlob().getStream()</code> returns an InputStream over exactly the same content as calling <code>getStream()</code> directly on the content item. <em>Note that the blob interface also provides a <code>getParameter()</code> method for retrieving mime-type parameters such as the charset of textual content.</em></p>
+<p>The content passed in by the user is stored as a blob in the content part at index '0', under the URI of the content item. Therefore, calling</p>
+<div class="codehilite"><pre><span class="n">contentItem</span><span class="o">.</span><span class="na">getPart</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span><span class="n">Blob</span><span class="o">.</span><span class="na">class</span><span class="o">)</span>
+<span class="n">contentItem</span><span class="o">.</span><span class="na">getPart</span><span class="o">(</span><span class="n">contentItem</span><span class="o">.</span><span class="na">getUri</span><span class="o">(),</span><span class="n">Blob</span><span class="o">.</span><span class="na">class</span><span class="o">)</span>
+<span class="n">contentItem</span><span class="o">.</span><span class="na">getBlob</span><span class="o">()</span>
+</pre></div>
+
+
+<p>returns the same blob instance.</p>
+<h3 id="metadata-of-the-contentitem">Metadata of the ContentItem</h3>
+<p>The metadata of the ContentItem is managed by a lockable MGraph. This is basically a normal <code>java.util.Collection</code> of triples. The only RDF-specific feature is support for filtered iterators, which accept wildcards for subjects, predicates and objects.</p>
+<p>This graph is used to store all enhancement results as well as metadata about the content item (such as content parts) and the enhancement process (see <a href="executionmetadata.html">execution metadata</a>).</p>
+<h3 id="readwrite-locks">Read/Write locks</h3>
+<p>During the Apache Stanbol enhancement process, as executed by the <a href="enhancementjobmanager.html">enhancement job manager</a>, components running in multiple threads need to access the state of the <em>ContentItem</em>. For this reason the content item makes it possible to acquire locks.</p>
+<div class="codehilite"><pre><span class="cm">/** Getter for the ReadWriteLock of a ContentItem */</span>
+<span class="o">+</span> <span class="n">getLock</span><span class="o">()</span> <span class="o">:</span> <span class="n">java</span><span class="o">.</span><span class="na">util</span><span class="o">.</span><span class="na">concurrent</span><span class="o">.</span><span class="na">locks</span><span class="o">.</span><span class="na">ReadWriteLock</span>
+</pre></div>
+
+
+<p>Note also that</p>
+<div class="codehilite"><pre><span class="n">contentItem</span><span class="o">.</span><span class="na">getLock</span><span class="o">()</span>
+<span class="n">contentItem</span><span class="o">.</span><span class="na">getMetadata</span><span class="o">().</span><span class="na">getLock</span><span class="o">()</span>
+</pre></div>
+
+
+<p>will return the same <code>ReadWriteLock</code> instance.</p>
+<p>This lock can be used to request read/write locks on the content item. All methods of the content item, and of the <code>MGraph</code> holding the metadata, are themselves protected by this lock. This means that users who do not need to protect whole sections of code do not need to bother with locks. Typical examples are working with content parts, using final classes like <code>Blob</code>, or adding/removing a single triple from the metadata.</p>
+<p>However, whenever components need to ensure that the data are not changed by other threads while performing some calculation, read/write locks <em>must be</em> used. A typical example is iterating over data returned by the MGraph. In this case the code iterating over the results should be protected against concurrent changes as follows:</p>
+<div class="codehilite"><pre><span class="n">contentItem</span><span class="o">.</span><span class="na">getLock</span><span class="o">().</span><span class="na">readLock</span><span class="o">().</span><span class="na">lock</span><span class="o">();</span>
+<span class="k">try</span> <span class="o">{</span>
+    <span class="n">Iterator</span><span class="o">&lt;</span><span class="n">Triple</span><span class="o">&gt;</span> <span class="n">it</span> <span class="o">=</span> <span class="n">contentItem</span><span class="o">.</span><span class="na">getMetadata</span><span class="o">().</span>
+        <span class="n">filter</span><span class="o">(</span><span class="kc">null</span><span class="o">,</span><span class="n">RDF</span><span class="o">.</span><span class="na">TYPE</span><span class="o">,</span><span class="n">TechnicalClasses</span><span class="o">.</span><span class="na">ENHANCER_TEXTANNOTATION</span><span class="o">);</span>
+    <span class="k">while</span><span class="o">(</span><span class="n">it</span><span class="o">.</span><span class="na">hasNext</span><span class="o">()){</span>
+        <span class="n">log</span><span class="o">.</span><span class="na">debug</span><span class="o">(</span><span class="s">&quot;Process TextAnnotation: {}&quot;</span><span class="o">,</span> <span class="n">it</span><span class="o">.</span><span class="na">next</span><span class="o">().</span><span class="na">getSubject</span><span class="o">());</span>
+        <span class="c1">//read the needed information</span>
+    <span class="o">}</span>
+<span class="o">}</span> <span class="k">finally</span> <span class="o">{</span>
+    <span class="n">contentItem</span><span class="o">.</span><span class="na">getLock</span><span class="o">().</span><span class="na">readLock</span><span class="o">().</span><span class="na">unlock</span><span class="o">();</span>
+<span class="o">}</span>
+</pre></div>
+
+
+<p>There is one exception to this rule when accessing content items within an <a href="engines">enhancement engine</a>. If an engine declares that it only supports the <code>SYNCHRONOUS</code> enhancement mode, the <a href="enhancementjobmanager.html">enhancement job manager</a> must ensure that the engine has exclusive access to the <em>ContentItem</em>. In this case implementors of enhancement engines do not need to use read/write locks.</p>
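+<p>The locking pattern used for the metadata can be tried out in isolation with plain JDK classes. The following is a minimal sketch under stated assumptions: <code>LockedTripleStore</code> is a hypothetical stand-in for a lockable graph, and triples are represented as plain strings instead of Clerezza <code>Triple</code> objects.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a lockable graph; not a Stanbol or Clerezza class. */
class LockedTripleStore {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> triples = new ArrayList<String>();

    /** Exposes the lock so callers can protect whole sections of code. */
    ReadWriteLock getLock() {
        return lock;
    }

    /** A single modification acquires the write lock internally. */
    void add(String triple) {
        lock.writeLock().lock();
        try {
            triples.add(triple);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Holds the read lock for the whole iteration, so no writer can interfere. */
    int countContaining(String needle) {
        lock.readLock().lock();
        try {
            int count = 0;
            for (String triple : triples) {
                if (triple.contains(needle)) {
                    count++;
                }
            }
            return count;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        LockedTripleStore store = new LockedTripleStore();
        store.add("urn:a rdf:type TextAnnotation");
        store.add("urn:b rdf:type EntityAnnotation");
        store.add("urn:c rdf:type TextAnnotation");
        System.out.println(store.countContaining("TextAnnotation"));
    }
}
```

+<p>Releasing the lock in a <code>finally</code> block ensures it is freed even if the protected code throws an exception.</p>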
+<h3 id="multipart-mime-serialization">Multipart MIME serialization</h3>
+<p><span style="float:right"> <img alt="ContentItem Multipart MIME format" src="contentitemmultipartmime.png" title="This figure provides an overview how Content Items are serialized as MultiPart MIME" /></span></p>
+<p>Stanbol supports the serialization of content items as multipart MIME. This serialization is used by the RESTful API of the Stanbol Enhancer. This section provides details about how content items are represented using multipart MIME. For more information on how to send/receive multipart content items via the RESTful Services provided by the Stanbol Enhancer please see the documentation provided in the web interface (e.g. at http://localhost:8080/enhancer).</p>
+<p>The following figure provides an overview on how ContentItems are represented using MultiPart MIME.</p>
+<p><strong>ContentItem Container</strong></p>
+<ul>
+<li>ContentItems are contained within a "multipart/form-data" container</li>
+<li>Apache Stanbol uses "contentItem" as "boundary", but users may use any other as long as the "boundary" parameter in the "Content-Type" header is set correctly.</li>
+<li>Stanbol uses UTF-8 as charset, but users might use any supported encoding as long as the "charset" parameter in the "Content-Type" header is set accordingly.</li>
+</ul>
+<p>The default Content-Type for serialized ContentItems is therefore "multipart/form-data; boundary=contentItem; charset=UTF-8"</p>
+<p><strong>Enhancement Metadata</strong></p>
+<ul>
+<li>If present this MUST BE the first MIME part within the "multipart/form-data" container representing the ContentItem.</li>
+<li>The "name" parameter of the "Content-Disposition" header MUST BE "metadata"</li>
+<li>If the "fileName" parameter of the "Content-Disposition" header is present it MUST BE the URI of the ContentItem. Users typically need to set this parameter when they want to send existing metadata with enhancement requests. This is because in such cases it is important that the URI of the ContentItem created by the Stanbol Enhancer is equal to the URI used to describe the content within the provided metadata. The Stanbol Enhancer MUST set the "fileName" parameter of the metadata to the URI of the processed ContentItem.</li>
+<li>The "Content-Type" of the metadata can be any RDF serialization supported by Apache Stanbol. UTF-8 is used as default charset.</li>
+<li>The RDF data serialized in this MIME part represent the enhancement results.</li>
+</ul>
+<p><strong>Content</strong></p>
+<ul>
+<li>If present the MIME part representing the Content MUST directly follow the Metadata. If the Metadata are not present the Content MUST BE the first MIME part within the "multipart/form-data" container representing the ContentItem.</li>
+<li>Because multiple content variants can be included within a ContentItem a "multipart/alternate" container is used to represent the content.</li>
+<li>The "name" parameter of the "Content-Disposition" header MUST BE "content". The "fileName" parameter is not used and therefore not present/ignored. The Stanbol Enhancer uses "contentParts" as boundary but users may use any boundary as long as it is correctly set within the "Content-Type" header.</li>
+</ul>
+<p>The various content elements are contained within the "multipart/form-data" container. The ordering is important. For serialized ContentItems it is assumed that the first element is the original document for the ContentItem. All further MIME parts are considered alternate - e.g. transcoded/transformed - versions. For serialized ContentItems provided as response to requests to the Stanbol Enhancer the ordering of the MIME parts is the same as the indexes of the ContentParts in the ContentItem.</p>
+<ul>
+<li>the "name" parameter of the "Content-Disposition" is set to the URI of the ContentPart in the ContentItem.</li>
+<li>the "Content-Type" header must correspond to the media type of the content</li>
+</ul>
+<p>Note that users who want to send a single ContentPart AND Metadata to the Stanbol Enhancer can also add the content directly to the "multipart/form-data" container of the ContentItem. In this case the "name" parameter MUST still BE set to "content", but the "Content-Type" header needs to be set directly to the media type of the provided ContentPart. The Stanbol Enhancer does NOT use this option when serializing ContentItems. It will ALWAYS use a "multipart/alternate" container for the "content", even when only a single ContentPart is included in a response.</p>
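+<p>As a rough illustration of the single-ContentPart variant just described, the sketch below assembles such a request body as a string. It follows generic multipart syntax and the rules listed above; the class <code>MultipartSketch</code>, the example URI and the sample payloads are assumptions, and the exact bytes produced or accepted by the Stanbol Enhancer may differ.</p>

```java
/** Illustrative sketch only; not the serializer used by the Stanbol Enhancer. */
class MultipartSketch {
    /** Boundary taken from the default Content-Type given in the text. */
    static final String BOUNDARY = "contentItem";
    private static final String CRLF = "\r\n";

    /** Value of the Content-Type header of the outer container. */
    static String contentType() {
        return "multipart/form-data; boundary=" + BOUNDARY + "; charset=UTF-8";
    }

    /** Builds a body with the "metadata" part first, directly followed by "content". */
    static String body(String contentItemUri, String rdfMediaType, String rdf,
                       String contentMediaType, String content) {
        StringBuilder sb = new StringBuilder();
        // Metadata: must be the first part; fileName carries the ContentItem URI.
        sb.append("--").append(BOUNDARY).append(CRLF);
        sb.append("Content-Disposition: form-data; name=\"metadata\"; fileName=\"")
          .append(contentItemUri).append("\"").append(CRLF);
        sb.append("Content-Type: ").append(rdfMediaType).append("; charset=UTF-8")
          .append(CRLF).append(CRLF);
        sb.append(rdf).append(CRLF);
        // Content: a single part added directly, typed with its own media type.
        sb.append("--").append(BOUNDARY).append(CRLF);
        sb.append("Content-Disposition: form-data; name=\"content\"").append(CRLF);
        sb.append("Content-Type: ").append(contentMediaType).append(CRLF).append(CRLF);
        sb.append(content).append(CRLF);
        // Closing delimiter of the container.
        sb.append("--").append(BOUNDARY).append("--").append(CRLF);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Content-Type: " + contentType());
        System.out.println();
        // Hypothetical URI and payloads, chosen only for the example.
        System.out.print(body("urn:example:content-item-1", "text/turtle",
                "# existing metadata goes here", "text/plain", "Hello Stanbol"));
    }
}
```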
+<p><strong>Additional Metadata</strong></p>
+<p>The <a href="#content-parts">ContentPart API</a> of the Stanbol ContentItem allows content parts of any type to be registered. The MultiPart MIME serialization of ContentItems supports such additional parts as long as they are encoded as RDF graphs (compatible with the Clerezza TripleCollection class). Additional ContentParts that are not encoded as RDF data are currently not supported by the Multipart MIME serialization.</p>
+<ul>
+<li>MimeParts representing such ContentParts MUST BE added after the MIME parts for the "metadata" AND the "content"</li>
+<li>The "name" parameter of the "Content-Disposition" MUST BE set to the URI of the ContentPart in the ContentItem.</li>
+<li>the "Content-Type" header must correspond to the media type of the content. The Stanbol Enhancer will always use the same RDF serialization as for the "metadata" when serializing additional Metadata. Users are free to use any supported serialization as long as they set the "Content-Type" header accordingly.</li>
+<li>The ordering of parts representing additional Metadata is the same as the ordering (index) of the ContentParts in the ContentItem.</li>
+</ul>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/contentitemmultipartmime.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/contentitemmultipartmime.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/contentitemoverview.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/contentitemoverview.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/engines/enhancementenginemanager.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/engines/enhancementenginemanager.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/engines/enhancementenginemanager.html Wed Apr 11 08:30:47 2012
@@ -0,0 +1,121 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Enhancement Engine Manager</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Enhancement Engine Manager</h1>
+    <p>The EnhancementEngineManager provides name-based access to all active <a href="index.html">EnhancementEngine</a>s and their ServiceReferences. This interface is typically used by components that need to look up EnhancementEngines by name. Alternatively, the EnginesTracker implementation can be used to track specific EnhancementEngines.</p>
+<h3 id="enhancementenginemanager-interface">EnhancementEngineManager interface</h3>
+<p>This is the Java API that provides access to registered EnhancementEngines as described above. The interface includes the following methods:</p>
+<div class="codehilite"><pre><span class="cm">/** Getter for all names with active engines */</span>
+<span class="n">getActiveEngineNames</span><span class="o">()</span> <span class="o">:</span> <span class="n">Set</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span>
+<span class="cm">/** Getter for the ServiceReference to the engine </span>
+<span class="cm">    with a given name */</span>
+<span class="n">getReference</span><span class="o">(</span><span class="n">String</span> <span class="n">name</span><span class="o">)</span> <span class="o">:</span> <span class="n">ServiceReference</span>
+<span class="cm">/** Getter for all ServiceReferences to engines </span>
+<span class="cm">    with a given name sorted by service ranking */</span>
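+<span class="n">getReferences</span><span class="o">(</span><span class="n">String</span> <span class="n">name</span><span class="o">)</span> <span class="o">:</span> <span class="n">List</span><span class="o">&lt;</span><span class="n">ServiceReference</span><span class="o">&gt;</span>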
+<span class="cm">/** Getter for the engine with a given name */</span>
+<span class="o">+</span> <span class="n">getEngine</span><span class="o">(</span><span class="n">String</span> <span class="n">name</span><span class="o">)</span> <span class="o">:</span> <span class="n">EnhancementEngine</span>
+<span class="cm">/** Getter for all engines with a given name sorted </span>
+<span class="cm">    by service ranking */</span>
+<span class="o">+</span> <span class="n">getEngines</span><span class="o">(</span><span class="n">String</span> <span class="n">name</span><span class="o">)</span> <span class="o">:</span> <span class="n">List</span><span class="o">&lt;</span><span class="n">EnhancementEngine</span><span class="o">&gt;</span>
+<span class="cm">/** Getter for an engine based on a service reference */</span>
+<span class="o">+</span> <span class="n">getEngine</span><span class="o">(</span><span class="n">ServiceReference</span> <span class="n">ref</span><span class="o">)</span> <span class="o">:</span> <span class="n">EnhancementEngine</span>
+<span class="cm">/** Checks if there is an engine for the given name */</span>
+<span class="o">+</span> <span class="n">isEngine</span><span class="o">(</span><span class="n">String</span> <span class="n">name</span><span class="o">)</span> <span class="o">:</span> <span class="kt">boolean</span>
+</pre></div>
+
+
+<p>There are two implementations of this interface available:</p>
+<h4 id="enhancementenginemanager-service">EnhancementEngineManager Service</h4>
+<p>This implementation of the EnhancementEngineManager interface is registered as an OSGi service. It can be obtained, for example, with the @Reference annotation:</p>
+<div class="codehilite"><pre><span class="nd">@Reference</span>
+<span class="n">EnhancementEngineManager</span> <span class="n">engineManager</span><span class="o">;</span>
+</pre></div>
+
+
+<p>This service is provided by the "org.apache.stanbol.enhancer.enginemanager" module and is included in all Stanbol launchers.</p>
+<h4 id="enginestracker">EnginesTracker</h4>
+<p>This is a utility similar to the standard OSGi ServiceTracker that tracks some or all EnhancementEngines. It also supports a ServiceTrackerCustomizer, so that users of the utility can react directly to changes of tracked EnhancementEngines.</p>
+<div class="codehilite"><pre><span class="c1">//track only &quot;myEngine&quot; and &quot;otherEngine&quot;</span>
+<span class="n">EnginesTracker</span> <span class="n">tracker</span> <span class="o">=</span> <span class="k">new</span> <span class="n">EnginesTracker</span><span class="o">(</span>
+    <span class="n">context</span><span class="o">,</span> <span class="s">&quot;myEngine&quot;</span><span class="o">,</span><span class="s">&quot;otherEngine&quot;</span><span class="o">);</span>
+<span class="n">tracker</span><span class="o">.</span><span class="na">open</span><span class="o">();</span> <span class="c1">//start tracking</span>
+
+<span class="c1">//the tracker needs to be closed when it is no longer needed</span>
+<span class="n">tracker</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
+<span class="n">tracker</span> <span class="o">=</span> <span class="kc">null</span><span class="o">;</span>
+</pre></div>
+
+
+<p>For most users the EnhancementEngineManager service is sufficient and preferable. Direct use of the EnginesTracker is only recommended if one needs to track specific engines, and especially if one needs to be notified of changes to those engines.</p>
+<p>The implementation of the <code><a href="http://svn.apache.org/repos/asf/incubator/stanbol/trunk/enhancer/chain/weighted/src/main/java/org/apache/stanbol/enhancer/chain/weighted/impl/WeightedChain.java">WeightedChain</a></code> is a good example of the intended usage of the EnginesTracker.</p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/engines/geonamesengine.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/engines/geonamesengine.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/engines/geonamesengine.html Wed Apr 11 08:30:47 2012
@@ -0,0 +1,287 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - The Geonames Enhancement Engine</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">The Geonames Enhancement Engine</h1>
+    <p>This engine creates fise:EntityAnnotations based on the http://geonames.org dataset. It does not work directly on the parsed content, but processes named entities extracted by an NLP (natural language processing) engine. For each such named entity it creates EntityAnnotations for the matching Features found in the geonames.org dataset. In addition, it adds EntityAnnotations for the continent, country and administrative regions of entities with a high confidence level.</p>
+<h2 id="processed-annotations-input">Processed Annotations (Input)</h2>
+<p>This engine consumes fise:TextAnnotations of type dbpedia:Place. More concretely, it filters for enhancements that conform to the following requirements and consumes the text selected by the TextAnnotations:</p>
+<div class="codehilite"><pre>?textAnnotation rdf:type fise:TextAnnotation .
+?textAnnotation dc:type dbpedia:Place .
+?textAnnotation fise:selected-text ?text .
+</pre></div>
+
+
+<p>Here is an example of such a TextAnnotation, selecting the text "Vienna" from the content "The community Workshop will take place in Vienna".</p>
+<div class="codehilite"><pre>urn:enhancement:text-enhancement:id1
+    a       fise:TextAnnotation , fise:Enhancement ;
+    dc:type
+         dbpedia:Place ;
+    fise:selected-text
+         &quot;Vienna&quot;^^xsd:string ;
+    fise:selection-context
+         &quot;The community Workshop will take place in Vienna&quot;^^xsd:string ;
+    fise:start
+         &quot;46&quot;^^xsd:int ;
+    fise:end
+         &quot;52&quot;^^xsd:int ;
+    fise:confidence
+         &quot;0.9773640902587215&quot;^^xsd:double ;
+    fise:extracted-from
+         urn:content-item:id1 .
+</pre></div>
+
+
+<p>Typically such enhancements are created by engines that provide named entity extraction based on some natural language processing framework.</p>
+<h2 id="created-enhancements-output">Created Enhancements (Output)</h2>
+<p>The LocationEnhancementEngine creates two types of EntityAnnotations. First, it suggests Entities for the processed TextAnnotations; second, it creates EntityAnnotations for the hierarchy of regions in which the suggested Entities are located. Suggested Entities are connected by the "dc:relation" attribute to the TextAnnotation they enhance. EntityAnnotations representing the hierarchy define a dc:requires attribute pointing to the EntityAnnotation of the suggested Entity.</p>
+<h3 id="entity-suggestions">Entity Suggestions</h3>
+<p>Entity suggestions are EntityAnnotations that suggest Features of the geonames.org dataset for a processed TextAnnotation. These suggestions are currently calculated only from the fise:selected-text of the TextAnnotation. </p>
+<p>The following example shows three EntityAnnotations for the TextAnnotation used in the above example. See the dc:relation statements at the end of each of the three EntityAnnotations.</p>
+<p>The first Entity found in the geonames.org dataset is the capital city of Austria, with a confidence level of 1.0:</p>
+<div class="codehilite"><pre>urn:enhancement:entity-enhancement:id1
+    a      fise:EntityAnnotation , fise:Enhancement ;
+    fise:confidence
+         &quot;1.0&quot;^^xsd:double ;
+    fise:entity-label
+         &quot;Vienna&quot;^^xsd:string ;
+    fise:entity-reference
+         http://sws.geonames.org/2761369/ ;
+    fise:entity-type
+         geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPLC ;
+    fise:extracted-from
+         urn:content-item:id1 ;
+    dc:relation
+         urn:enhancement:text-enhancement:id1 .
+</pre></div>
+
+
+<p>The geonames.org dataset also contains many other populated places named "Vienna"; these are suggested with lower confidence levels.</p>
+<div class="codehilite"><pre>urn:enhancement:entity-enhancement:id2
+    a      fise:EntityAnnotation , fise:Enhancement ;
+    fise:confidence
+         &quot;0.42163702845573425&quot;^^xsd:double ;
+    fise:entity-label
+         &quot;Vienna&quot;^^xsd:string ;
+    fise:entity-reference
+         http://sws.geonames.org/4496671/ ;
+    fise:entity-type
+         geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPL ;
+    fise:extracted-from
+         urn:content-item:id1 ;
+    dc:relation
+         urn:enhancement:text-enhancement:id1 .
+
+urn:enhancement:entity-enhancement:id3
+    a      fise:EntityAnnotation , fise:Enhancement ;
+    fise:confidence
+         &quot;0.42163702845573425&quot;^^xsd:double ;
+    fise:entity-label
+         &quot;Vienna&quot;^^xsd:string ;
+    fise:entity-reference
+         http://sws.geonames.org/4825976/ ;
+    fise:entity-type
+         geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPL ;
+    fise:extracted-from
+         urn:content-item:id1 ;
+    dc:relation
+         urn:enhancement:text-enhancement:id1 .
+</pre></div>
+
+
+<h2 id="entity-hierarchy-enhancements">Entity Hierarchy Enhancements</h2>
+<p>Entity Hierarchy Enhancements describe the regions that contain suggested Features based on the geonames.org dataset. Enhancements describing this hierarchy are added for all suggested entities with a confidence level above the value of "eu.iksproject.fise.engines.geonames.locationEnhancementEngine.min-hierarchy-score". </p>
+<p>The default value for this property is 0.7. The hierarchy web service provided by geonames.org is used to calculate the regions.
+The following example shows the entity hierarchy enhancements for the suggested entity for Vienna (Austria). <em>Please note the dc:requires relation to this EntityAnnotation at the end of each of the following enhancements.</em></p>
+<h3 id="continent-europe">Continent: Europe</h3>
+<p>First the enhancement for the continent Europe:</p>
+<div class="codehilite"><pre>urn:enhancement:entity-hierarchy-enhancement:id1
+    a      fise:EntityAnnotation , fise:Enhancement ;
+    fise:confidence
+         &quot;0.42163702845573425&quot;^^xsd:double ;
+    fise:entity-label
+         &quot;Europe&quot;^^xsd:string ;
+    fise:entity-reference
+         http://sws.geonames.org/6255148/ ;
+    fise:entity-type
+         geonames:Feature , dbpedia:Place, geonames:L.CONT ;
+    fise:extracted-from
+         urn:content-item:id1 ;
+    dc:requires
+         urn:enhancement:entity-enhancement:id1 .
+</pre></div>
+
+
+<h3 id="country-austria">Country: Austria</h3>
+<p>Next the enhancement for the country "Austria", classified as an independent political entity within geonames.org:</p>
+<div class="codehilite"><pre>urn:enhancement:entity-hierarchy-enhancement:id2
+    a      fise:EntityAnnotation , fise:Enhancement ;
+    fise:confidence
+         &quot;0.42163702845573425&quot;^^xsd:double ;
+    fise:entity-label
+         &quot;Austria&quot;^^xsd:string ;
+    fise:entity-reference
+         http://sws.geonames.org/2782113/ ;
+    fise:entity-type
+         geonames:Feature , dbpedia:Place, dbpedia:AdministrativeRegion, geonames:A.PCLI ;
+    fise:extracted-from
+         urn:content-item:id1 ;
+    dc:requires
+         urn:enhancement:entity-enhancement:id1 .
+</pre></div>
+
+
+<h3 id="aadm1-a-county">A.ADM1 - A county</h3>
+<p>Now three enhancements describing the different levels of administrative regions within Austria: first the "Bundesland", next the "Stadtteil" and last the "Gemeindebezirk".</p>
+<div class="codehilite"><pre>urn:enhancement:entity-hierarchy-enhancement:id3
+    a      fise:EntityAnnotation , fise:Enhancement ;
+    fise:confidence
+         &quot;0.42163702845573425&quot;^^xsd:double ;
+    fise:entity-label
+         &quot;Vienna&quot;^^xsd:string ;
+    fise:entity-reference
+         http://sws.geonames.org/2761367/ ;
+    fise:entity-type
+         geonames:Feature , dbpedia:Place, dbpedia:AdministrativeRegion, geonames:A.ADM1 ;
+    fise:extracted-from
+         urn:content-item:id1 ;
+    dc:requires
+         urn:enhancement:entity-enhancement:id1 .
+</pre></div>
+
+
+<h3 id="aadm2-a-city">A.ADM2 - A city</h3>
+<div class="codehilite"><pre>urn:enhancement:entity-hierarchy-enhancement:id4
+    a      fise:EntityAnnotation , fise:Enhancement ;
+    fise:confidence
+         &quot;0.42163702845573425&quot;^^xsd:double ;
+    fise:entity-label
+         &quot;Politischer Bezirk Wien (Stadt)&quot;^^xsd:string ;
+    fise:entity-reference
+         http://sws.geonames.org/2761333/ ;
+    fise:entity-type
+         geonames:Feature , dbpedia:Place, dbpedia:AdministrativeRegion, geonames:A.ADM2 ;
+    fise:extracted-from
+         urn:content-item:id1 ;
+    dc:requires
+         urn:enhancement:entity-enhancement:id1 .
+</pre></div>
+
+
+<h3 id="aadm3-a-village">A.ADM3 - A village</h3>
+<div class="codehilite"><pre>urn:enhancement:entity-hierarchy-enhancement:id5
+    a      fise:EntityAnnotation , fise:Enhancement ;
+    fise:confidence
+         &quot;0.42163702845573425&quot;^^xsd:double ;
+    fise:entity-label
+         &quot;Gemeindebezirk Innere Stadt&quot;^^xsd:string ;
+    fise:entity-reference
+         http://sws.geonames.org/2775259/ ;
+    fise:entity-type
+         geonames:Feature , dbpedia:Place, dbpedia:AdministrativeRegion, geonames:A.ADM3 ;
+    fise:extracted-from
+         urn:content-item:id1 ;
+    dc:requires
+         urn:enhancement:entity-enhancement:id1 .
+</pre></div>
+
+
+<p>The last two hierarchy levels are no longer valid for the meaning of "Vienna" as selected by the TextAnnotation. They are added because the geonames.org dataset locates the Feature of a city exactly at its center. If the TextAnnotation described a precise address, however, such hierarchy levels would make sense.</p>
+<h2 id="configuration">Configuration</h2>
+<p>The LocationEnhancementEngine provides six configuration options.</p>
+<p>The first three can be used to tune the behaviour of the Engine:</p>
+<ul>
+<li>Minimum score (default = 0.33): The minimum score (confidence) that is required for entity suggestions.</li>
+<li>Maximum Locations (default = 3): The maximum number of entity suggestions added, regardless of whether there are more results with a score &gt; min-score.</li>
+<li>Minimum hierarchy score (default = 0.7): The minimum score (confidence) a suggested entity needs for hierarchy enhancements to be added. To add hierarchy enhancements for all suggested entities, min-hierarchy-score needs to be set to a value less than or equal to min-score.</li>
+</ul>
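+<p>As a rough illustration of how the three tuning options interact, the following self-contained Java sketch filters a list of search results. The classes <code>Suggestion</code> and <code>SuggestionFilter</code> are hypothetical and not part of Stanbol, and the inclusive comparisons against the two minimum scores are an assumption; the actual engine may differ in such details.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of a single geonames.org search result (not a Stanbol class). */
final class Suggestion {
    final String label;
    final double score;

    Suggestion(String label, double score) {
        this.label = label;
        this.score = score;
    }
}

/** Hypothetical helper that applies the three tuning options to a result list. */
final class SuggestionFilter {
    final double minScore;          // "Minimum score", default 0.33
    final int maxLocations;         // "Maximum Locations", default 3
    final double minHierarchyScore; // min-hierarchy-score, default 0.7

    SuggestionFilter(double minScore, int maxLocations, double minHierarchyScore) {
        this.minScore = minScore;
        this.maxLocations = maxLocations;
        this.minHierarchyScore = minHierarchyScore;
    }

    /** Keeps the best-scoring results that reach minScore, but at most maxLocations of them. */
    List<Suggestion> select(List<Suggestion> results) {
        List<Suggestion> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((Suggestion s) -> s.score).reversed());
        List<Suggestion> selected = new ArrayList<>();
        for (Suggestion s : sorted) {
            // sorted in descending order, so the first score below minScore ends the loop
            if (s.score < minScore || selected.size() >= maxLocations) {
                break;
            }
            selected.add(s);
        }
        return selected;
    }

    /** Assumption: hierarchy enhancements are added when the score reaches minHierarchyScore. */
    boolean addsHierarchy(Suggestion s) {
        return s.score >= minHierarchyScore;
    }

    public static void main(String[] args) {
        SuggestionFilter filter = new SuggestionFilter(0.33, 3, 0.7);
        List<Suggestion> results = Arrays.asList(
                new Suggestion("first", 0.42), new Suggestion("second", 1.0),
                new Suggestion("third", 0.2), new Suggestion("fourth", 0.42),
                new Suggestion("fifth", 0.5));
        for (Suggestion s : filter.select(results)) {
            System.out.println(s.label + " score=" + s.score
                    + " hierarchy=" + filter.addsHierarchy(s));
        }
    }
}
```

+<p>With the default values, results scored 1.0, 0.5, 0.42, 0.42 and 0.2 would yield three suggestions (1.0, 0.5 and the first 0.42) in this sketch, and only the one scored 1.0 would receive hierarchy enhancements.</p>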
+<p>The other three are used to configure access to the geonames.org server:</p>
+<ul>
+<li>geonames.org Server: The URL of the geonames.org service. The default is the free geonames.org web server, which works without user authentication. There is a second free server at http://api.geonames.org/ that requires a free user account. Users with a premium account need to enter their own URL here.</li>
+<li>User Name: The name of the account (can be empty if the configured server does not require user authentication).</li>
+<li>Token: The token is usually the password of the user account.</li>
+</ul>
+<h3 id="howto-setup-a-free-user-account">HOWTO set up a free user account:</h3>
+<p>Such an account is required to be able to use the http://api.geonames.org/ server
+ that should support better performance and higher uptime than the default
+ free server available at http://ws.geonames.org/.</p>
+<p>To set up the free account:</p>
+<ol>
+<li>Go to www.geonames.org. In the top right corner you will find a "login" link that is also used to create new accounts.</li>
+<li>Choose a username and password. You will receive a confirmation mail at the email address you provide. When choosing the password, keep in mind that it is sent unencrypted (as the token) with every web service request. It is therefore strongly recommended not to use a password that is used for any other account!<br />
+</li>
+<li>Confirm the account.</li>
+<li>IMPORTANT: You need to activate the free web service for the account via http://www.geonames.org/manageaccount. Log in first, then go back to this page. At the bottom you should find the text "the account is not yet enabled to use the free web services. Click here to enable".</li>
+</ol>
+<p>If you do not complete step (4), requests with your account will result in IOExceptions with the message</p>
+<div class="codehilite"><pre><span class="s">&quot;user account not enabled to use the free webservice. Please enable it on your account page: http://www.geonames.org/manageaccount&quot;</span>
+</pre></div>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/engines/index.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/engines/index.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/engines/index.html Wed Apr 11 08:30:47 2012
@@ -0,0 +1,176 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Enhancement Engines</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Enhancement Engines</h1>
+    <p>Enhancement engines are the components responsible for enhancing <a href="../contentitem.html">content items</a>. They are called by the <a href="../enhancementjobmanager.html">Enhancement Job Manager</a>. Enhancement engines have full access to the parsed content items and are expected to modify the state of those content items.</p>
+<p>The RESTful interface of an enhancement engine can be accessed via</p>
+<div class="codehilite"><pre>http://{host}:{port}/{stanbol-root}/enhancer/engine/{engine-name}
+</pre></div>
+
+
+<p>For example, an enhancement engine with the name "ner" running on an Apache Stanbol instance on localhost with the default configuration is accessible at</p>
+<div class="codehilite"><pre>http://localhost:8080/enhancer/engine/ner
+</pre></div>
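+<p>The URL template above can also be filled in programmatically. The following Java sketch is only an illustration: the class <code>EngineEndpoint</code> and its method are hypothetical names, and it assumes that the default configuration uses an empty {stanbol-root}, as the example URL suggests.</p>

```java
/** Hypothetical helper that fills in the URL template of an enhancement engine. */
final class EngineEndpoint {

    /** Joins the parts of the template; stanbolRoot may be empty or carry slashes. */
    static String url(String host, int port, String stanbolRoot, String engineName) {
        // drop leading and trailing slashes so that the parts can be joined uniformly
        String root = stanbolRoot.replaceAll("^/+|/+$", "");
        String prefix = root.isEmpty() ? "" : "/" + root;
        return "http://" + host + ":" + port + prefix + "/enhancer/engine/" + engineName;
    }

    public static void main(String[] args) {
        // the "ner" engine of a local instance with an empty root path
        System.out.println(url("localhost", 8080, "", "ner"));
    }
}
```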
+
+
+<p>When using the Java API, enhancement engines can be looked up as OSGi services. The <a href="enhancementenginemanager.html">Enhancement Engine Manager</a> service is designed to ease this by providing an API that gives access to enhancement engines by name.</p>
+<h2 id="enhancement-engine-interface">Enhancement Engine Interface</h2>
+<p>The interface for enhancement engines contains the following three methods and four constants:</p>
+<div class="codehilite"><pre><span class="cm">/** Getter for the value of the &quot;stanbol.enhancer.engine.name&quot; property */</span>
+<span class="o">+</span> <span class="n">getName</span><span class="o">()</span> <span class="o">:</span> <span class="n">String</span>
+<span class="cm">/** Checks if this engine can enhance the parsed content item */</span>
+<span class="o">+</span> <span class="n">canEnhance</span><span class="o">(</span><span class="n">ContentItem</span> <span class="n">ci</span><span class="o">)</span> <span class="o">:</span> <span class="kt">int</span>
+<span class="cm">/** Enhances the parsed content item */</span>
+<span class="o">+</span> <span class="n">computeEnhancements</span><span class="o">(</span><span class="n">ContentItem</span> <span class="n">ci</span><span class="o">)</span>
+
+<span class="cm">/** The property used for the name of an engine */</span>
+<span class="n">PROPERTY_NAME</span> <span class="o">:</span> <span class="n">String</span>
+<span class="cm">/** Indicates that this engine cannot enhance a content item */</span>
+<span class="n">CANNOT_ENHANCE</span> <span class="o">:</span> <span class="kt">int</span>
+<span class="cm">/** Indicates support for synchronous enhancement */</span>
+<span class="n">ENHANCE_SYNCHRONOUS</span> <span class="o">:</span> <span class="kt">int</span>
+<span class="cm">/** Indicates support for asynchronous enhancement */</span>
+<span class="n">ENHANCE_ASYNC</span> <span class="o">:</span> <span class="kt">int</span>
+</pre></div>
+
+
+<p>Each enhancement engine has a name. This is typically provided by the engine configuration and MUST be set as the value of the property "stanbol.enhancer.engine.name" in the service registration of the enhancement engine. The getter for the name MUST return the same value as the one set for this property. Enhancement engine implementations will usually read the name from the ComponentContext passed to them on activation:</p>
+<div class="codehilite"><pre><span class="k">this</span><span class="o">.</span><span class="na">name</span> <span class="o">=</span> <span class="o">(</span><span class="n">String</span><span class="o">)</span> <span class="n">componentContext</span><span class="o">.</span><span class="na">getProperties</span><span class="o">().</span><span class="na">get</span><span class="o">(</span><span class="n">EnhancementEngine</span><span class="o">.</span><span class="na">PROPERTY_NAME</span><span class="o">);</span>
+</pre></div>
+
+
+<p>The <code>canEnhance(ContentItem ci)</code> method is used by the <a href="../enhancementjobmanager.html">Enhancement Job Manager</a> to check if an engine is able to process a <a href="../contentitem.html">Content Item</a>. Calling this method MUST NOT change the state of the content item and this method MUST also NOT acquire a write lock on the content item.</p>
+<p>The <code>computeEnhancements(ContentItem ci)</code> starts the processing of the parsed content item by the engine. It is expected to change the state of the parsed content item. Engines that support asynchronous processing need to take care to correctly apply read/write locks when reading/writing information from/to the content item. Engines that return <code>ENHANCE_SYNCHRONOUS</code> on calls to <code>canEnhance(..)</code> do not need to use locks. They can trust that they have exclusive read/write access to the content item.</p>
+<p>Enhancement engines have full access to the content item. Theoretically, they would even be allowed to delete all metadata as well as all content parts from the parsed content item. Typically, however, they only</p>
+<ul>
+<li>read existing content parts</li>
+<li>add new content parts</li>
+<li>add new enhancements to the metadata</li>
+<li>some engines might also need to update/delete existing metadata.</li>
+</ul>
+<p>Both the <code>canEnhance(..)</code> and <code>computeEnhancements(..)</code> methods MUST be called by the <a href="../enhancementjobmanager.html">Enhancement Job Manager</a> only after the executions of all enhancement engines this one depends on have completed. These dependencies are defined by the <a href="../chains/executionplan.html">Execution Plan</a> used by the enhancement job manager to enhance the content item. Implementors of enhancement engines can therefore trust that all metadata expected to be added by other enhancement engines is already present within the metadata of the parsed content item when <code>canEnhance(..)</code> or <code>computeEnhancements(..)</code> is called.</p>
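+<p>The calling sequence described above can be sketched with plain Java. All types in this example (<code>SketchContentItem</code>, <code>SketchEngine</code>, <code>PlainTextEngine</code>, <code>SketchJobManager</code>) are simplified, hypothetical stand-ins rather than the Stanbol API, the numeric constant values are assumptions, and the engines are assumed to be passed in an order that already satisfies the execution plan.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a content item; it only collects metadata strings. */
final class SketchContentItem {
    final String mimeType;
    final List<String> metadata = new ArrayList<>();

    SketchContentItem(String mimeType) {
        this.mimeType = mimeType;
    }
}

/** Reduced, hypothetical version of the engine interface described above. */
interface SketchEngine {
    int CANNOT_ENHANCE = 0;      // the numeric values are assumptions
    int ENHANCE_SYNCHRONOUS = 1;
    int ENHANCE_ASYNC = 2;

    String getName();

    int canEnhance(SketchContentItem ci);

    void computeEnhancements(SketchContentItem ci);
}

/** Example engine that only accepts plain text and processes it synchronously. */
final class PlainTextEngine implements SketchEngine {
    public String getName() {
        return "plain-text-demo";
    }

    public int canEnhance(SketchContentItem ci) {
        // read-only check: the content item is not modified here
        return "text/plain".equals(ci.mimeType) ? ENHANCE_SYNCHRONOUS : CANNOT_ENHANCE;
    }

    public void computeEnhancements(SketchContentItem ci) {
        // this is the only place where the engine changes the content item
        ci.metadata.add("enhanced-by:" + getName());
    }
}

/** Minimal job manager: asks each engine first, then lets it enhance. */
final class SketchJobManager {
    /** Returns the names of the engines that actually processed the item. */
    static List<String> enhance(SketchContentItem ci, List<SketchEngine> orderedEngines) {
        List<String> executed = new ArrayList<>();
        for (SketchEngine engine : orderedEngines) {
            if (engine.canEnhance(ci) != SketchEngine.CANNOT_ENHANCE) {
                engine.computeEnhancements(ci);
                executed.add(engine.getName());
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        SketchContentItem item = new SketchContentItem("text/plain");
        List<SketchEngine> engines = new ArrayList<>();
        engines.add(new PlainTextEngine());
        System.out.println(enhance(item, engines) + " -> " + item.metadata);
    }
}
```

+<p>The sketch ignores asynchronous processing; an engine returning <code>ENHANCE_ASYNC</code> would additionally have to apply read/write locks when accessing the content item.</p>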
+<h3 id="services-properties-interface">Services Properties Interface</h3>
+<p>This interface is implemented by most of the current enhancement engines. It allows engines to expose additional properties to other components. This interface defines a single method</p>
+<div class="codehilite"><pre><span class="cm">/** Getter for the ServiceProperties */</span>
+<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span><span class="n">Object</span><span class="o">&gt;</span> <span class="n">getServiceProperties</span><span class="o">();</span>
+</pre></div>
+
+
+<p>but also predefines the property <code>ENHANCEMENT_ENGINE_ORDERING = "org.apache.stanbol.enhancer.engine.order"</code> that can be used by enhancement engine implementations to specify their typical ordering within the enhancement process.</p>
+<h3 id="engine-ordering-information">Engine Ordering Information</h3>
+<p>By implementing the ServicesProperties interface, enhancement engines can expose additional metadata to other components. The services properties interface defines only a single method</p>
+<div class="codehilite"><pre><span class="cm">/** Getter for the ServiceProperties */</span>
+<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span><span class="n">Object</span><span class="o">&gt;</span> <span class="n">getServiceProperties</span><span class="o">();</span>
+</pre></div>
+
+
+<p>and is implemented by most of the current enhancement engines. Currently its only use is to provide information about the engine ordering within the enhancement process. This information is exposed under the key "org.apache.stanbol.enhancer.engine.order", which is available as the constant <code>ENHANCEMENT_ENGINE_ORDERING</code> defined directly by the interface. Values are expected to be integers within the following ranges:</p>
+<ul>
+<li><strong>ORDERING_PRE_PROCESSING</strong>: Values &gt;= 200 are intended for engines that preprocess the content. This includes the conversion of media formats (e.g. extracting plain text from HTML, keyframes from videos, or waveforms from MP3 files) and the extraction of metadata directly encoded within the parsed content, such as ID3 tags from MP3 files or RDFa and microdata from HTML content.</li>
+<li><strong>ORDERING_CONTENT_EXTRACTION</strong>: This range includes values &lt; 200 and &gt;= 100 and shall be used by enhancement engines that need to analyze the parsed content to extract additional metadata. Examples are language detection, natural language processing, named entity recognition, face detection in images, and speech to text.</li>
+<li><strong>ORDERING_EXTRACTION_ENHANCEMENT</strong>: This range includes values &lt; 100 and &gt;= 1 and shall be used by enhancement engines that provide semantic lifting of preexisting enhancements, such as linking named entities extracted by an NER engine with entities defined in a controlled vocabulary, or linking artist names and song titles extracted from MP3 files with the corresponding entities defined in a music database.</li>
+<li><strong>ORDERING_DEFAULT</strong>: This represents the value 0 and shall be used as the default value for all enhancement engines that do not provide ordering information or do not implement the <code>ServiceProperties</code> interface.</li>
+<li><strong>ORDERING_POST_PROCESSING</strong>: This range includes values &lt; 0 and &gt;= -100 and is intended for enhancement engines that post-process enhancement results, for example schema translation or filtering of enhancements.</li>
+</ul>
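+<p>The ranges above can be sketched as a small lookup. This is an illustration only: the constant names follow the list above, but treating each constant as the lower bound of its range, the class name, the <code>phaseOf</code> helper and the returned labels are assumptions made for this example. Values below -100 are not covered by the list and are reported as undefined here.</p>

```java
// Hypothetical helper that maps an ordering value to the range it falls into.
class EngineOrderingRanges {

    // Assumed lower bounds of the ranges described in the list above.
    static final int ORDERING_PRE_PROCESSING = 200;
    static final int ORDERING_CONTENT_EXTRACTION = 100;
    static final int ORDERING_EXTRACTION_ENHANCEMENT = 1;
    static final int ORDERING_DEFAULT = 0;
    static final int ORDERING_POST_PROCESSING = -100;

    /** Returns a label for the range that contains the given ordering value. */
    static String phaseOf(int order) {
        if (order >= ORDERING_PRE_PROCESSING) return "pre-processing";
        if (order >= ORDERING_CONTENT_EXTRACTION) return "content extraction";
        if (order >= ORDERING_EXTRACTION_ENHANCEMENT) return "extraction enhancement";
        if (order == ORDERING_DEFAULT) return "default";
        if (order >= ORDERING_POST_PROCESSING) return "post-processing";
        return "undefined"; // below -100: not covered by the documented ranges
    }

    public static void main(String[] args) {
        for (int order : new int[] {250, 150, 50, 0, -50}) {
            System.out.println(order + " -> " + phaseOf(order));
        }
    }
}
```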
+<p>The engine ordering information described here is used by the <a href="../chains/defaultchain.html">Default Chain</a> and the <a href="../chains/weightedchain.html">Weighted Chain</a> to calculate the <a href="../chains/executionplan.html">Execution Plan</a>.</p>
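+<p>To illustrate the idea, the following sketch sorts engine names by their ordering value so that higher values (pre-processing) come first and lower values (post-processing) come last. It is not the algorithm used by the chain implementations: the class, the method names and the alphabetical tie-break are assumptions for this example, and engines without ordering information are treated as having the default value 0.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: order engines by their advertised ordering value.
class OrderingSortSketch {

    static final String ENHANCEMENT_ENGINE_ORDERING = "org.apache.stanbol.enhancer.engine.order";

    /** Reads the ordering value from service properties; falls back to the default 0. */
    static int orderOf(Map<String, Object> serviceProperties) {
        Object value = serviceProperties == null
                ? null : serviceProperties.get(ENHANCEMENT_ENGINE_ORDERING);
        return value instanceof Integer ? (Integer) value : 0;
    }

    /** Returns engine names sorted by descending ordering value, ties by name. */
    static List<String> sortByOrdering(Map<String, Map<String, Object>> enginesByName) {
        List<String> names = new ArrayList<>(enginesByName.keySet());
        names.sort(Comparator
                .comparingInt((String name) -> orderOf(enginesByName.get(name)))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return names;
    }
}
```

+<p>With hypothetical engines advertising the values 200, 100, 1 and -100, plus one engine without any ordering information, the engine without information would be placed between the engines with values 1 and -100.</p>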
+<p>This feature allows the implementor of an enhancement engine to define the correct position of the engine within a typical enhancement chain. Users who add the engine to an enhancer installation can therefore use it immediately with the <a href="../chains/defaultchain.html">Default Chain</a>.</p>
+<p>However, the engine ordering is not the only way for users to control the execution order. Enhancement chain implementations such as the <a href="../chains/listchain.html">List Chain</a> and the <a href="../chains/graphchain.html">Graph Chain</a> allow the order of execution to be defined directly. For these chains the ordering information provided by enhancement engines is ignored.</p>
+<h2 id="enhancement-engine-management">Enhancement Engine Management</h2>
+<p>This section describes how enhancement engines are managed by the Apache Stanbol Enhancer and how they can be selected/accessed through the <a href="../enhancementjobmanager.html">Enhancement Job Manager</a> and executed in an <a href="../chains">Enhancement Chain</a>.</p>
+<p>Enhancement engines are registered as OSGi services and managed by using the following service properties:</p>
+<ul>
+<li><strong>Name:</strong> Defined by the value of the property "stanbol.enhancer.engine.name". It is used to access engines on the Stanbol RESTful interface.</li>
+<li><strong>Service Ranking:</strong> The service ranking property defined by OSGi is used to decide which engine to use if several active enhancement engines use the same name. In such cases only the engine with the highest ranking is used to enhance content items.</li>
+</ul>
+<!-- TODO: The Configuration is not yet defined 
+* __Configuration:__ Each EnhancementEngine MAY provide an RDF graph with its configuration. This graph will be returned on GET request on the URL of the enhancement engine. If no configuration is known for the engine this MUST at least return a single triple with the name for the engine.
+
+_TODO:_ To correctly construct this graph the Engine needs to know this URL. This could e.g. be provided by some OSGI environment parameter set by the JerseyApplication. As an alternative we could also parse this URI as an parameter to the getEngineConfig method.
+-->
+
+<p>Other components such as enhancement chains refer to engines by their name. The actual enhancement engine instance is only looked up shortly before execution.</p>
+<h3 id="enhancement-engine-name-conflicts">Enhancement Engine Name Conflicts</h3>
+<p>As enhancement engines are identified by the value of the "stanbol.enhancer.engine.name" property - the name - there might be cases where multiple enhancement engines are registered with the same name. In such cases the normal OSGi procedure for selecting the default service instance among several possible matches is used. Candidates are compared by</p>
+<ol>
+<li>the highest "service.ranking" and,</li>
+<li>if several candidates share that ranking, the lowest "service.id".</li>
+</ol>
+<p>The engine selected this way is used for requests for an enhancement engine with a given name. Requests on the RESTful service API will always be answered with the enhancement engine selected as default. When using the Java API there are also means to retrieve all enhancement engines for a given name via the <a href="enhancementenginemanager.html">Enhancement Engine Manager</a> interface.</p>
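+<p>The selection rule can be sketched with a comparator. This is an illustration, not the code used by Apache Stanbol or by an OSGi framework: the <code>Registration</code> class and the method names are hypothetical, and only the two service properties named above are modelled.</p>

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of selecting the default engine among same-named engines.
class EngineSelectionSketch {

    /** Minimal stand-in for a registered engine with its OSGi service properties. */
    static final class Registration {
        final String description;
        final int serviceRanking; // value of "service.ranking"
        final long serviceId;     // value of "service.id"

        Registration(String description, int serviceRanking, long serviceId) {
            this.description = description;
            this.serviceRanking = serviceRanking;
            this.serviceId = serviceId;
        }
    }

    // Highest ranking first; for equal rankings the lowest service id first.
    static final Comparator<Registration> DEFAULT_FIRST = Comparator
            .comparingInt((Registration r) -> r.serviceRanking)
            .reversed()
            .thenComparingLong(r -> r.serviceId);

    /** Returns the registration that would be used as default, if any exists. */
    static Optional<Registration> selectDefault(List<Registration> sameName) {
        return sameName.stream().min(DEFAULT_FIRST);
    }
}
```

+<p>If the engine with the highest ranking is unregistered, applying the same rule to the remaining registrations yields the next candidate.</p>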
+<p>From a user's perspective there is one major use case for configuring multiple enhancement engines with the same name: defining fallback engines in case the main one becomes unavailable. For example, assume that a user has a local cache of geonames.org loaded into the <a href="../../entityhub/">Entityhub</a> and configures a <a href="keywordlinkingengine.html">Named Entity Linking</a> engine to perform semantic lifting of extracted locations. Apache Stanbol also provides the <a href="geonamesengine.html">geonames.org Engine</a>, which offers similar functionality by directly accessing <a href="http://geonames.org">geonames.org</a>. By configuring both engines with the same name, but specifying a higher service ranking for the one using the local cache, one can ensure that the local cache is used for the enhancement under normal circumstances. If the local cache becomes unavailable, the other engine using the remote service will be used for enhancement.</p>
+<h3 id="enhancement-engine-manager-interface">Enhancement Engine Manager Interface</h3>
+<p>The <a href="enhancementenginemanager.html">Enhancement Engine Manager</a> is the management interface for enhancement engines. Components can use it to look up enhancement engines by name. There is also an OSGi ServiceTracker-like implementation that can be used to track only enhancement engines registered for a specific set of names.</p>
+<h2 id="enhancement-engine-implementations">Enhancement Engine Implementations</h2>
+<p>A list of enhancement engine implementations maintained directly by the Apache Stanbol community can be found <a href="list.html">here</a>.
+However, the enhancement engine interface is designed so that advanced Apache Stanbol users can implement their own enhancement engines to fulfill their special needs.</p>
+<p>The Apache Stanbol community would be very happy if users shared their thoughts about possible enhancement engines or contributed additional engines to the Apache Stanbol project.</p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>