Posted to commits@stanbol.apache.org by bu...@apache.org on 2012/02/10 17:38:41 UTC

svn commit: r804437 - in /websites/staging/stanbol/trunk/content/stanbol/docs/trunk: components.html enhancer/engines/index.html enhancer/engines/list.html enhancer/index.html factstore/index.html

Author: buildbot
Date: Fri Feb 10 16:38:40 2012
New Revision: 804437

Log:
Staging update by buildbot for stanbol

Added:
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/index.html
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/list.html
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/index.html
Modified:
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components.html
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/factstore/index.html

Modified: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components.html (original)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components.html Fri Feb 10 16:38:40 2012
@@ -66,10 +66,10 @@
 <p>We will shortly describe the components from top to bottom and link to their detailed descriptions.</p>
 <ul>
 <li>
-<p>The <a href="enhancer.html">Enhancer</a> component together with its <a href="engines.html">Enhancement Engines</a> provides you with the ability to post content to Apache Stanbol and get suggestions for possible entity annotation in return. The enhancements are provided via natural language processing, metadata extraction and linking named entities to public or private entity repositories. Furthermore, Apache Stanbol provides a machinery to further process this data and add additional knowledge and links via applying rules and reasoning. Technically, the enhancements are stored in a triple-graph that is maintained by <a href="http://incubator.apache.org/clerezza">Apache Clerezza</a>.</p>
+<p>The <a href="enhancer/">Enhancer</a> component together with its <a href="enhancer/engines">Enhancement Engines</a> provides you with the ability to post content to Apache Stanbol and get suggestions for possible entity annotation in return. The enhancements are provided via natural language processing, metadata extraction and linking named entities to public or private entity repositories. Furthermore, Apache Stanbol provides a machinery to further process this data and add additional knowledge and links via applying rules and reasoning. Technically, the enhancements are stored in a triple-graph that is maintained by <a href="http://incubator.apache.org/clerezza">Apache Clerezza</a>.</p>
 </li>
 <li>
-<p>The 'Sparql endpoint' gives access to the semantic enhancements from the Apache Stanbol <a href="enhancer.html">Enhancer</a>.</p>
+<p>The 'Sparql endpoint' gives access to the semantic enhancements from the Apache Stanbol <a href="enhancer/">Enhancer</a>.</p>
 </li>
 <li>
 <p>The 'EnhancerVIE' is a stateful interface to submit content to analyze and store the results on the server. It is then possible to browse the resulting enhanced content items.</p>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/index.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/index.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/index.html Fri Feb 10 16:38:40 2012
@@ -0,0 +1,174 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Enhancement Engines</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the_asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Enhancement Engines</h1>
+    <p>Enhancement engines are the components responsible for enhancing ContentItems. They are called by the <a href="../enhancementjobmanager.html">EnhancementJobManager</a>. Enhancement engines have full access to the passed <a href="../contentitem.html">ContentItem</a>s and are expected to modify the state of the content item.</p>
+<p>The RESTful interface of an EnhancementEngine can be accessed at</p>
+<div class="codehilite"><pre><span class="n">http:</span><span class="sr">//</span><span class="p">{</span><span class="n">host</span><span class="p">}:{</span><span class="n">port</span><span class="p">}</span><span class="sr">/{stanbol-root}/</span><span class="n">enhancer</span><span class="sr">/engine/</span><span class="p">{</span><span class="n">engine</span><span class="o">-</span><span class="n">name</span><span class="p">}</span>
+</pre></div>
+
+
+<p>For example, an EnhancementEngine with the name "ner" running on an Apache Stanbol instance on localhost with the default configuration will be accessible at</p>
+<div class="codehilite"><pre><span class="n">http:</span><span class="sr">//</span><span class="n">localhost:8080</span><span class="sr">/enhancer/</span><span class="n">engine</span><span class="o">/</span><span class="n">ner</span>
+</pre></div>
+
+
+<p>When using the Java API, enhancement engines can be looked up as OSGi services. The <a href="enhancementenginemanager.html">EnhancementEngineManager</a> service is designed to ease this by providing an API that allows access to enhancement engines by their name.</p>
+<h2 id="enhancement_engine_interface">Enhancement Engine Interface</h2>
+<p>The interface for enhancement engines contains the following three methods:</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the value of the &quot;stanbol.enhancer.engine.name&quot; property */</span>
+<span class="o">+</span> <span class="n">getName</span><span class="p">()</span> <span class="p">:</span> <span class="n">String</span>
+<span class="sr">/** Checks if this engine can enhance the parsed content item */</span>
+<span class="o">+</span> <span class="n">canEnhance</span><span class="p">(</span><span class="n">ContentItem</span> <span class="n">ci</span><span class="p">)</span> <span class="p">:</span> <span class="nb">int</span>
+<span class="sr">/** Enhances the parsed content item */</span>
+<span class="o">+</span> <span class="n">computeEnhancements</span><span class="p">(</span><span class="n">ContentItem</span> <span class="n">ci</span><span class="p">)</span>
+
+<span class="sr">/** The property used for the name of an engine */</span>
+<span class="n">PROPERTY_NAME</span> <span class="p">:</span> <span class="n">String</span>
+<span class="sr">/** Indicates that this engine cannot enhance a content item */</span>
+<span class="n">CANNOT_ENHANCE</span> <span class="p">:</span> <span class="nb">int</span>
+<span class="sr">/** Indicates support for synchronous enhancement */</span>
+<span class="n">ENHANCE_SYNCHRONOUS</span> <span class="p">:</span> <span class="nb">int</span>
+<span class="sr">/** Indicates support for asynchronous enhancement */</span>
+<span class="n">ENHANCE_ASYNC</span> <span class="p">:</span> <span class="nb">int</span>
+</pre></div>
+
+
+<p>Each enhancement engine has a name assigned. The name is typically provided by the engine configuration and MUST be set as the value of the property "stanbol.enhancer.engine.name" in the service registration of the enhancement engine. The getter for the name MUST return the same value as the one set for this property. Enhancement engine implementations will usually get the name by calling</p>
+<p>this.name = (String) context.getProperties().get(EnhancementEngine.PROPERTY_NAME);</p>
+<p>in the activate method, where "context" is the ComponentContext passed to that method.</p>
+<p>The "canEnhance(ContentItem ci)" method is used by the <a href="../enhancementjobmanager.html">EnhancementJobManager</a> to check if an engine is able to process a <a href="../contentitem.html">ContentItem</a>. Calling this method MUST NOT change the state of the ContentItem, and this method MUST NOT acquire a write lock on the content item.</p>
+<p>The "computeEnhancements(ContentItem ci)" method starts the processing of the passed ContentItem by the engine. It is expected to change the state of the passed ContentItem. Engines that support asynchronous processing need to take care to correctly apply read/write locks when reading information from or writing information to the content item. Engines that return ENHANCE_SYNCHRONOUS on calls to canEnhance(..) do not need to use locks. They can trust that they have exclusive read/write access to the content item.</p>
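+<p>The locking discipline described above can be illustrated with a minimal Java sketch. It assumes a hypothetical stand-in class, "LockedItemSketch", guarded by a ReadWriteLock from java.util.concurrent; the class and its methods are illustrative only and are not the Stanbol ContentItem API.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a content item; NOT the Stanbol ContentItem API. */
public class LockedItemSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> metadata = new ArrayList<>();
    private final String text;

    public LockedItemSketch(String text) {
        this.text = text;
    }

    /** Reads under the read lock; several readers may hold it at the same time. */
    public String readText() {
        lock.readLock().lock();
        try {
            return text;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Writes under the write lock, which excludes readers and other writers. */
    public void addEnhancement(String enhancement) {
        lock.writeLock().lock();
        try {
            metadata.add(enhancement);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Number of enhancements added so far, read under the read lock. */
    public int enhancementCount() {
        lock.readLock().lock();
        try {
            return metadata.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Sketch of an asynchronous engine: read, analyze without a lock, then write. */
    public static void computeEnhancements(LockedItemSketch ci) {
        String content = ci.readText();                  // short read-locked section
        String result = "length=" + content.length();    // analysis runs without holding a lock
        ci.addEnhancement(result);                       // short write-locked section
    }
}
```

+<p>In this sketch the locked sections are kept short, so no lock is held while the (possibly slow) analysis runs.</p>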
+<p>EnhancementEngines have full access to the ContentItem. Theoretically they are even allowed to delete all metadata as well as all content parts from the passed ContentItem. Typically, however, they only</p>
+<ul>
+<li>read existing ContentParts</li>
+<li>add new ContentParts</li>
+<li>add new Enhancements to the metadata</li>
+<li>some engines might also need to update/delete existing metadata.</li>
+</ul>
+<p>Both the "canEnhance(..)" and "computeEnhancements(..)" methods MUST be called by the <a href="../enhancementjobmanager.html">EnhancementJobManager</a> only after the execution of all enhancement engines this one depends on has completed. These dependencies are defined by the <a href="../chains/executionplan.html">ExecutionPlan</a> used by the EnhancementJobManager to enhance the ContentItem. Implementors of enhancement engines can therefore trust that all metadata expected to be added by other enhancement engines is already present within the metadata of the passed ContentItem when "canEnhance(..)" or "computeEnhancements(..)" is called.</p>
+<h3 id="servicesproperties_interface">ServicesProperties Interface</h3>
+<p>This interface is implemented by most of the current enhancement engines. It allows engines to expose additional properties to other components. This interface defines a single method</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the ServiceProperties */</span>
+<span class="n">Map</span><span class="sr">&lt;String,Object&gt;</span> <span class="n">getServiceProperties</span><span class="p">();</span>
+</pre></div>
+
+
+<p>but also predefines the property ENHANCEMENT_ENGINE_ORDERING = "org.apache.stanbol.enhancer.engine.order" that can be used by enhancement engine implementations to specify their typical ordering within the enhancement process.</p>
+<h3 id="engine_ordering_information">Engine Ordering Information</h3>
+<p>By implementing the ServicesProperties interface, enhancement engines can expose additional metadata to other components. The ServicesProperties interface defines only a single method</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the ServiceProperties */</span>
+<span class="n">Map</span><span class="sr">&lt;String,Object&gt;</span> <span class="n">getServiceProperties</span><span class="p">();</span>
+</pre></div>
+
+
+<p>and is implemented by most of the current enhancement engines. Its only current use is to provide information about the engine ordering within the enhancement process. This information is exposed under the key "org.apache.stanbol.enhancer.engine.order", which is the value of the constant ENHANCEMENT_ENGINE_ORDERING defined directly by the ServicesProperties interface. Values are expected to be integers within the following ranges:</p>
+<ul>
+<li><strong>ORDERING_PRE_PROCESSING</strong>: Values &gt;= 200 are intended for engines that preprocess the content. This includes converting media formats (e.g. extracting plain text from HTML, keyframes from videos or waveforms from MP3 files) and extracting metadata encoded directly within the passed content, such as ID3 tags from MP3 files or RDFa and microdata from HTML content.</li>
+<li><strong>ORDERING_CONTENT_EXTRACTION</strong>: This range covers values &lt; 200 and &gt;= 100 and shall be used by enhancement engines that need to analyze the passed content to extract additional metadata. Examples are language detection, natural language processing, named entity recognition, face detection in images and speech to text.</li>
+<li><strong>ORDERING_EXTRACTION_ENHANCEMENT</strong>: This range covers values &lt; 100 and &gt;= 1 and shall be used by enhancement engines that provide semantic lifting of preexisting enhancements, such as linking named entities extracted by an NER engine with entities defined in a controlled vocabulary, or linking artist names and song titles extracted from MP3 files with the corresponding entities defined in a music database.</li>
+<li><strong>ORDERING_DEFAULT</strong>: This represents the value 0 and shall be used as default value for all enhancement engines that do not provide ordering information or do not implement the ServicesProperties interface.</li>
+<li><strong>ORDERING_POST_PROCESSING</strong>: This range covers values &lt; 0 and &gt;= -100 and is intended for enhancement engines that post-process enhancement results, e.g. schema translation or filtering of enhancements.</li>
+</ul>
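+<p>The ranges listed above can be sketched as a small Java helper. The class "OrderingSketch", its method names and the "UNSPECIFIED" label for values below -100 are hypothetical and not part of the Stanbol API. The sort direction (higher values run earlier) is an assumption based on preprocessing having the highest values.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper illustrating the documented ordering ranges. */
public class OrderingSketch {

    /** Maps an ordering value to the name of the range it falls into. */
    public static String rangeOf(int order) {
        if (order >= 200) return "ORDERING_PRE_PROCESSING";
        if (order >= 100) return "ORDERING_CONTENT_EXTRACTION";
        if (order >= 1) return "ORDERING_EXTRACTION_ENHANCEMENT";
        if (order == 0) return "ORDERING_DEFAULT";
        if (order >= -100) return "ORDERING_POST_PROCESSING";
        return "UNSPECIFIED"; // values below -100 are not covered by the list above
    }

    /** Assumption: engines with higher ordering values are executed earlier. */
    public static List<Integer> executionOrder(List<Integer> orders) {
        List<Integer> sorted = new ArrayList<>(orders);
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }
}
```

+<p>Under these assumptions an engine reporting 250 would run before one reporting 100, which in turn would run before engines reporting 0 or -50.</p>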
+<p>The engine ordering information described here is used by the <a href="../chains/defaultchain.html">DefaultChain</a> and the <a href="../chains/weightedchain.html">WeightedChain</a> to calculate the <a href="../chains/executionplan.html">ExecutionPlan</a>.</p>
+<p>This feature allows the implementor of an enhancement engine to define the correct position of the engine within a typical enhancement chain. Users who add the engine to a Stanbol Enhancer installation can therefore use it immediately with the <a href="../chains/defaultchain.html">DefaultChain</a>.</p>
+<p>However, engine ordering is not the only way for users to control the execution order. Enhancement chain implementations such as the <a href="../chains/listchain.html">ListChain</a> and the <a href="../chains/graphchain.html">GraphChain</a> also allow the order of execution to be defined directly. For these chains the ordering information provided by EnhancementEngines is ignored.</p>
+<h2 id="enhancement_engine_management">Enhancement Engine Management</h2>
+<p>This section describes how enhancement engines are managed by the Stanbol Enhancer and how they can be selected and accessed by the <a href="../enhancementjobmanager.html">EnhancementJobManager</a> when executing a <a href="../chains/enhancementchain.html">Chain</a>.</p>
+<p>Enhancement engines are registered as OSGi services and managed by using the following service properties:</p>
+<ul>
+<li><strong>Name:</strong> Defined by the value of the property "stanbol.enhancer.engine.name" it will be used to access Engines on the Stanbol RESTful interface</li>
+<li><strong>Service Ranking:</strong> The service ranking property defined by OSGi will be used to decide which engine to use in case several active enhancement engines use the same name. In such cases only the engine with the highest ranking will be used to enhance ContentItems.</li>
+</ul>
+<!-- TODO: The Configuration is not yet defined 
+* __Configuration:__ Each EnhancementEngine MAY provide an RDF graph with its configuration. This graph will be returned on GET requests on the URL of the enhancement engine. If no configuration is known for the engine this MUST at least return a single triple with the name of the engine.
+
+_TODO:_ To correctly construct this graph the Engine needs to know this URL. This could e.g. be provided by some OSGi environment parameter set by the JerseyApplication. As an alternative we could also pass this URI as a parameter to the getEngineConfig method.
+-->
+
+<p>Other components such as enhancement Chains refer to engines by their name. The actual enhancement engine instance is only looked up shortly before execution.</p>
+<h3 id="enhancement_engine_name_conflicts">Enhancement Engine Name Conflicts</h3>
+<p>As enhancement engines are identified by the value of the "stanbol.enhancer.engine.name" property (the name), there might be cases where multiple enhancement engines are registered for the same name. In such cases the normal OSGi procedure for selecting the default service instance among several possible matches is used. This means that on requests for an enhancement engine with a given name</p>
+<ol>
+<li>the enhancement engine with the highest "service.ranking" is selected, and</li>
+<li>if several engines share the highest ranking, the one with the lowest "service.id" is selected.</li>
+</ol>
+<p>Requests on the RESTful service API will always be answered by the enhancement engine selected as default. When using the Java API there are also means to retrieve all enhancement engines for a given name via the <a href="enhancementenginemanager.html">Enhancement Engine Manager</a> interface.</p>
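+<p>The selection rule for engines sharing a name can be sketched in plain Java as follows. The "EngineSelectionSketch" class and its "Registration" type are hypothetical stand-ins for OSGi service registrations; in a real deployment the OSGi framework performs this selection itself.</p>

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration of default-service selection among engines with the same name. */
public class EngineSelectionSketch {

    /** Stand-in for a service registration: engine name plus the two relevant OSGi properties. */
    public static final class Registration {
        public final String name;
        public final int serviceRanking;
        public final long serviceId;

        public Registration(String name, int serviceRanking, long serviceId) {
            this.name = name;
            this.serviceRanking = serviceRanking;
            this.serviceId = serviceId;
        }
    }

    /** Picks the highest "service.ranking"; ties are broken by the lowest "service.id". */
    public static Optional<Registration> selectDefault(List<Registration> registered, String name) {
        Comparator<Registration> byPreference = (a, b) -> {
            int byRanking = Integer.compare(b.serviceRanking, a.serviceRanking); // higher ranking first
            return byRanking != 0 ? byRanking : Long.compare(a.serviceId, b.serviceId); // then lower id
        };
        return registered.stream()
                .filter(r -> r.name.equals(name))
                .min(byPreference);
    }
}
```

+<p>If the selected engine is unregistered, running the same selection over the remaining registrations yields the next candidate, which is what makes fallback engines possible.</p>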
+<p>From a user perspective there is one major use case for configuring multiple enhancement engines for the same name: defining fallback engines in case the main one becomes unavailable. For example, assume that a user has a local cache of geonames.org loaded into the Entityhub and configures a <a href="keywordlinkingengine.html">Named Entity Linking</a> engine to perform semantic lifting of extracted locations. Stanbol also provides the <a href="geonamesengine.html">geonames.org Engine</a>, which provides similar functionality by directly accessing <a href="http://geonames.org">geonames.org</a>. By configuring both engines for the same name, but specifying a higher service ranking for the one using the local cache, one can ensure that the local cache is used for enhancement under normal circumstances. If the local cache becomes unavailable, the other engine, which uses the remote service, will be used instead.</p>
+<h3 id="enhancement_engine_manager_interface">Enhancement Engine Manager Interface</h3>
+<p>The <a href="enhancementenginemanager.html">Enhancement Engine Manager</a> is the management interface for enhancement engines that can be used by components to look up enhancement engines based on their name. There is also an OSGi ServiceTracker-like implementation that can be used to track only enhancement engines registered for a specific set of names.</p>
+<h2 id="enhancement_engine_implementations">Enhancement Engine Implementations</h2>
+<p>A list of enhancement engine implementations maintained directly by the Apache Stanbol community can be found <a href="../../engines.html">here</a>.
+However, the enhancement engine interface is designed so that advanced Apache Stanbol users can implement their own enhancement engines to fulfill their special needs.</p>
+<p>The Stanbol community would be very happy if users decide to share thoughts about possible enhancement engines or would like to contribute additional engines to the Apache Stanbol project.</p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/list.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/list.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/list.html Fri Feb 10 16:38:40 2012
@@ -0,0 +1,149 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Enhancement Engines and their main features</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the_asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Enhancement Engines and their main features</h1>
+    <h2 id="preprocessing">Preprocessing</h2>
+<ul>
+<li><strong><a href="enhancer/engines/langidengine.html">Language Identification Engine</a></strong><ul>
+<li>language detection for textual content utilizing <a href="http://tika.apache.org/">Apache Tika</a></li>
+</ul>
+</li>
+<li>
+<p><strong><a href="enhancer/engines/metaxaengine.html">Metaxa Engine</a></strong></p>
+<ul>
+<li>text extraction from various document formats</li>
+<li>extraction of metadata from document formats</li>
+</ul>
+</li>
+</ul>
+<h2 id="natural_language_processing">Natural Language Processing</h2>
+<ul>
+<li><strong><a href="enhancer/engines/namedentityextractionengine.html">Named Entity Extraction Enhancement Engine</a></strong> <ul>
+<li>NLP processing using OpenNLP NER</li>
+<li>detects occurrences of persons, places and organizations only</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="enhancer/engines/keywordlinkingengine.html">KeywordLinkingEngine</a></strong></p>
+<ul>
+<li>NLP processing using OpenNLP</li>
+<li>supports multiple languages</li>
+<li>detects occurrences of untyped entities as concepts, takes local taxonomies as linking target</li>
+</ul>
+</li>
+<li>
+<p><em>Taxonomy Linking Engine</em> (deprecated, see KeywordLinkingEngine)</p>
+<ul>
+<li>NLP processing using OpenNLP POS</li>
+<li>detects occurrences of untyped entities as concepts, takes local taxonomies as linking target</li>
+</ul>
+</li>
+</ul>
+<h2 id="linking_suggestions">Linking Suggestions</h2>
+<ul>
+<li><strong><a href="enhancer/engines/namedentitytaggingengine.html">Named Entity Tagging Engine</a></strong><ul>
+<li>suggests links to several Linked Data Sources (e.g. DBpedia)</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="enhancer/engines/geonamesengine.html">Geonames Enhancement Engine</a></strong> </p>
+<ul>
+<li>suggests links to geonames.org</li>
+<li>provides hierarchical links for locations</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="enhancer/engines/opencalaisengine.html">OpenCalais Enhancement Engine</a></strong></p>
+<ul>
+<li>integrates the Open Calais services. (Note: You need to provide a key in order to use this engine)</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="enhancer/engines/zemantaengine.html">Zemanta Enhancement Engine</a></strong></p>
+<ul>
+<li>integrates the Zemanta services. (Note: You need to provide a key in order to use this engine)</li>
+</ul>
+</li>
+</ul>
+<h2 id="postprocessing__other">Postprocessing / Other</h2>
+<ul>
+<li><em>CachingDereferencerEngine</em> (deprecated, see dereferencing support of individual engines as well as  <a href="https://issues.apache.org/jira/browse/STANBOL-336">STANBOL-336</a>)<ul>
+<li>retrieves additional content for presenting the enhancement results.</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="enhancer/engines/refactorengine.html">Refactor Engine</a></strong></p>
+<ul>
+<li>transforms enhancements according to a target ontology, requires KRES launcher.</li>
+</ul>
+</li>
+</ul>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/index.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/index.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/index.html Fri Feb 10 16:38:40 2012
@@ -0,0 +1,124 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Enhancer</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the_asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Enhancer</h1>
+    <p>This stateless interface allows the caller to submit content to the Apache Stanbol <a href="engines/">enhancer engines</a> and get the resulting enhancements formatted as RDF at once without storing anything on the server-side.</p>
+<p>The content to analyze should be sent in a POST request with the mimetype specified in the Content-type header. The response will hold the RDF enhancement serialized in the format specified in the Accept header:</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span class="o">-</span><span class="n">X</span> <span class="n">POST</span> <span class="o">-</span><span class="n">H</span> <span class="s">&quot;Accept: text/turtle&quot;</span> <span class="o">-</span><span class="n">H</span> <span class="s">&quot;Content-type: text/plain&quot;</span> <span class="o">\</span>
+<span class="o">--</span><span class="n">data</span> <span class="s">&quot;John Smith was born in London.&quot;</span> <span class="n">http:</span><span class="sr">//</span><span class="n">localhost:8080</span><span class="o">/</span><span class="n">engines</span>
+</pre></div>
+
+
+<p>The list of mimetypes accepted as inputs depends on the deployed engines. By default only text/plain content will be analyzed.</p>
+<h2 id="list_of_available_enhancement_engines">List of Available Enhancement Engines</h2>
+<p>Apache Stanbol comes with a <a href="engines/list.html">list of predefined enhancement engines</a>. These engines are supported by the Apache Stanbol community. If you would like to implement your own enhancement engine, read on.</p>
+<h2 id="main_interfaces_and_utilities">Main Interfaces and Utilities</h2>
+<p>A <strong><a href="contentitem.html">Content Item</a></strong> is the unit of content that the Stanbol Enhancer works with. It gives access to the binary content that was registered and to the graph that represents its metadata (provided by the client and/or generated). The <strong><a href="engines/">Enhancement Engine</a></strong> provides the interface to internal or external semantic enhancement engines. There are usually several of them, and the EnhancementJobManager uses them to enhance content items. The <strong>Enhancement Job Manager</strong> accepts requests for enhancing ContentItems and processes them either synchronously or asynchronously (as decided by the enhancement engines or by configuration). The <strong>Enhancement Engine Helper</strong> provides the classes for the resulting enhancements, following the defined <strong>Enhancement Structure</strong>.</p>
+<h2 id="enhancement_structure">Enhancement Structure</h2>
+<p>The enhancement structure for Apache Stanbol is described in full <a href="http://wiki.iks-project.eu/index.php/EnhancementStructure">here</a>. It defines the types and properties used in the metadata graph that Apache Stanbol produces. <em>Note: There is a proposal and ongoing discussion to update this structure in the future.</em> Every <strong>Enhancement</strong> type is a description that contains the following important properties:</p>
+<ul>
+<li>creator: the specific enhancement engine creating this enhancement</li>
+<li>creation time: the local system time at which the annotation was created</li>
+<li>extracted-from: the content item for the enhancement. This links to the ID of the content item as assigned by Stanbol.</li>
+<li>type: the type of the enhancement (e.g. Person, Location, Concept ...).</li>
+<li>confidence: the level of confidence, in the range from 0 to 1</li>
+</ul>
+<p>A <strong>Text Annotation</strong> type provides metadata for the selected text. This is intended to be used in addition to the enhancement type if an enhancement is based on a part of the content.</p>
+<ul>
+<li>start: the character position of the start of the selection. If start is not defined, the selection is assumed to start at the beginning of the document.</li>
+<li>end: the character position of the end of the selection. If end is not defined, the selection is assumed to end at the end of the document.</li>
+<li>selected-text: the text selected by the enhancement (optional).</li>
+<li>selection-context: the context of the selected text. This makes it possible to specify the context used to extract entities such as persons, organizations, locations ... from natural language documents.</li>
+</ul>
+<p>An <strong>Entity Annotation</strong> refers to a named entity that has been recognized within the content. This type is intended to be used together with the FISE enhancement type.</p>
+<ul>
+<li>entity-reference: This refers to the URI identifying the Entity</li>
+<li>entity-label: The label(s) of the referred entity</li>
+<li>entity-type: This property can be used to specify the type of the entity (optional)</li>
+<li>The occurrences of the entity within the content (the exact positions within the text where this entity is referred to) are determined by outgoing dc:relation links.</li>
+</ul>
+<h2 id="response_in_rdf">Response in RDF</h2>
+<p>Apache Stanbol Enhancer is able to serialize the response in the following RDF formats:</p>
+<div class="codehilite"><pre><span class="n">application</span><span class="o">/</span><span class="n">json</span> <span class="p">(</span><span class="n">JSON</span><span class="o">-</span><span class="n">LD</span><span class="p">)</span>
+<span class="n">application</span><span class="sr">/rdf+xml (RDF/</span><span class="n">XML</span><span class="p">)</span>
+<span class="n">application</span><span class="sr">/rdf+json (RDF/</span><span class="n">JSON</span><span class="p">)</span>
+<span class="n">text</span><span class="o">/</span><span class="n">turtle</span> <span class="p">(</span><span class="n">Turtle</span><span class="p">)</span>
+<span class="n">text</span><span class="o">/</span><span class="n">rdf</span><span class="o">+</span><span class="n">nt</span> <span class="p">(</span><span class="n">N</span><span class="o">-</span><span class="n">TRIPLES</span><span class="p">)</span>
+</pre></div>
+
+
+<p>By default, the URI of the content item being enhanced is a local, non-dereferenceable URI built automatically from a hash digest of the binary content. Sometimes it is helpful to provide the URI of the content item that should be used in the enhancement RDF graph. This can be achieved by passing a uri request parameter as follows:</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span class="o">-</span><span class="n">X</span> <span class="n">POST</span> <span class="o">-</span><span class="n">H</span> <span class="s">&quot;Accept: text/turtle&quot;</span> <span class="o">-</span><span class="n">H</span> <span class="s">&quot;Content-type: text/plain&quot;</span> <span class="o">\</span>
+<span class="o">--</span><span class="n">data</span> <span class="s">&quot;John Smith was born in London.&quot;</span> <span class="o">\</span>
+<span class="s">&quot;http://localhost:8080/engines?uri=urn:fise-example-content-item&quot;</span>
+</pre></div>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Modified: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/factstore/index.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/factstore/index.html (original)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/factstore/index.html Fri Feb 10 16:38:40 2012
@@ -57,10 +57,8 @@
   
   <div id="content">
     <h1 class="title">Factstore</h1>
-    <p>The FactStore is a component that let's use store relations between entities identified by their URIs. A relation between two or more entities is called a <em>fact</em>. The FactStore let's you store N-ary facts according to a user defined fact schema. In consequence you can store relations between N participating entities.</p>
-<p>The FactStore only stores the relation and not the entities itself. It only uses references to entities by using the entities' URI. The entities itself should be handled by another component, e.g. the <a href="../entityhub.html">EntityHub</a>. A fact is defined by a fact schema which is defined over types of entities.</p>
-<p>A fact schema can be defined between an arbitrary number of entities. In most cases a fact schema is defined between two or three entities. For example, the fact schema 'works-for' can be defined as a relation between entities of type 'Person' and 'Organization'. The Fact Store interface allows the creation of custom fact schemata and to store facts according to these custom schemata.</p>
-<p>The Fact Store provides a simple way to define and store facts. This component is meant to be used in scenarios where a simple solution is sufficient and it is not required to define a complex ontology with reasoning support.</p>
+    <p>The FactStore is a component that lets you store relations between entities identified by their URIs. A relation between two or more entities is called a <em>fact</em>. The FactStore lets you store N-ary facts according to a user-defined fact schema, so you can store relations between N participating entities. The FactStore stores only the relations, not the entities themselves; it refers to entities by their URIs. The entities themselves should be handled by another component, e.g. the <a href="../entityhub.html">EntityHub</a>. A fact is defined by a fact schema, which in turn is defined over types of entities.</p>
+<p>A fact schema can be defined between an arbitrary number of entities. In most cases a fact schema is defined between two or three entities. For example, the fact schema 'works-for' can be defined as a relation between entities of type 'Person' and 'Organization'. The FactStore interface allows you to create custom fact schemata and to store facts according to them. The FactStore provides a simple way to define and store facts. It is meant for scenarios where a simple solution is sufficient and a complex ontology with reasoning support is not required.</p>
 <p>Read on and have a look at a concrete example or go to the <a href="specification.html">FactStore specification</a> page for more details. If you need some information about its realization, read the notes about its <a href="implementation.html">implementation concept</a>.</p>
 <h2 id="example">Example</h2>
 <p>Imagine you want to store the fact that the person named John Doe works for the company Winzigweich. John Doe is represented by the URI http://www.doe.com/john and the company by http://www.winzigweich.de. This fact is stored as a relation between the entity http://www.doe.com/john and http://www.winzigweich.de.</p>