Posted to commits@stanbol.apache.org by bu...@apache.org on 2012/07/16 15:02:48 UTC

svn commit: r825985 [5/12] - in /websites/staging/stanbol/trunk/content: ./ stanbol/docs/trunk/ stanbol/docs/trunk/cmsadapter/ stanbol/docs/trunk/components/ stanbol/docs/trunk/components/cmsadapter/ stanbol/docs/trunk/components/contenthub/ stanbol/do...

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/namedentitytaggingengine.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/namedentitytaggingengine.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/namedentitytaggingengine.html Mon Jul 16 13:02:45 2012
@@ -0,0 +1,159 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - The Named Entity Tagging Engine: linking text annotations to (external) datasets of entities</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="logo"> <!-- do not scroll the logo -->
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a></div>
+  <div id="navigation"> <!-- but auto scroll the menu -->
+      <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a><ul>
+<li><a href="/stanbol/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/stanbol/docs/trunk/components.html">Components</a></li>
+</ul>
+</li>
+<li><a href="/stanbol/development/">Development</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/stanbol/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/downloads/">Overview</a><ul>
+<li><a href="/stanbol/downloads/releases.html">Releases</a></li>
+<li><a href="/stanbol/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="archive">Archive</h1>
+<ul>
+<li><a href="/stanbol/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrump" style="font-size: 80%;">
+      <a href="/">Home</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/">Stanbol</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/">Trunk</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/components/">Components</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/">Enhancer</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/engines/">Engines</a>
+    </div>
+    <h1 class="title">The Named Entity Tagging Engine: linking text annotations to (external) datasets of entities</h1>
+    <p>The Named Entity Tagging Engine uses <em><a href="../../entityhub.html">Referenced Sites</a></em> to search for Entities based on given Text Annotations.</p>
+<h2 id="configuration">Configuration</h2>
+<p>The configuration determines which dataset is used as the linking target. The default value is "local", which refers to the default DBpedia index. You can also decide whether given types should restrict the set of possible links. For example, some organisations in DBpedia are not typed as such, so with a type restriction in place you will not get them from this engine even though you expect them from your dataset.</p>
+<ul>
+<li>Referenced Site {local, your referenced site}: The ID of the Entityhub Referenced Site used for semantic lifting of TextAnnotations.</li>
+<li>Persons {true, false}: Set to TRUE to enable semantic lifting of Persons.</li>
+<li>Person Type {&lt;empty&gt;, dbp-ont:Person}: The rdf:type used to search for Persons. If empty, Entities of any type are accepted.</li>
+<li>Organisations {true, false}: Set to TRUE to enable semantic lifting of Organisations.</li>
+<li>Organisation Type {&lt;empty&gt;, dbp-ont:Organisation}: The rdf:type used to search for Organisations. If empty, Entities of any type are accepted.</li>
+<li>Places {true, false}: Set to TRUE to enable semantic lifting of Places.</li>
+<li>Place Type {&lt;empty&gt;, dbp-ont:Place}: The rdf:type used to search for Places. If empty, Entities of any type are accepted.</li>
+<li>Label Field {&lt;empty&gt;, rdfs:label}: The field used to search for Entities with a label similar to the selected text of the Text Annotation. If empty, rdfs:label is used as the default.</li>
+</ul>
+<h2 id="example-result">Example Result</h2>
+<p>For the sentence "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.", you will get several EntityAnnotations for the terms "Paris" and "Bob Marley" from your linking target resource (in this case DBpedia) together with a confidence value, which can be used to sort the suggestions, e.g.:</p>
+<div class="codehilite"><pre>{
+  &quot;@subject&quot;: &quot;urn:enhancement-b98283ae-845d-6666-d68b-f649852a7959&quot;,
+  &quot;@type&quot;: [&quot;enhancer:Enhancement&quot;,&quot;enhancer:EntityAnnotation&quot;],
+  &quot;dc:created&quot;: &quot;2012-02-29T11:34:56.383Z&quot;,
+  &quot;dc:creator&quot;: &quot;org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine&quot;,
+  &quot;dc:relation&quot;: &quot;urn:enhancement-b3d4617d-1760-0374-f471-e0e746003f4e&quot;,
+  &quot;enhancer:confidence&quot;: 16641.191,
+  &quot;enhancer:entity-label&quot;: 
+     {
+       &quot;@literal&quot;: &quot;Bob Marley&quot;,
+       &quot;@language&quot;: &quot;en&quot;
+     },
+   &quot;enhancer:entity-reference&quot;: &quot;http://dbpedia.org/resource/Bob_Marley&quot;,
+   &quot;enhancer:entity-type&quot;: 
+      [&quot;dbp-ont:MusicalArtist&quot;, &quot;foaf:Person&quot;, &quot;dbp-ont:Artist&quot;,
+        &quot;dbp-ont:Person&quot;, &quot;owl:Thing&quot;],
+   &quot;enhancer:extracted-from&quot;: &quot;urn:content-item-sha1-37c8a8244041cf6113d4ee04b3a04d0a014f6e10&quot;
+  }
+</pre></div>
+
+
+<p>or </p>
+<div class="codehilite"><pre>{
+  &quot;@subject&quot;: &quot;urn:enhancement-785a4c4f-dc7d-aa46-91a2-aef840542ae2&quot;,
+  &quot;@type&quot;: [&quot;enhancer:Enhancement&quot;,&quot;enhancer:EntityAnnotation&quot;],
+  &quot;dc:created&quot;: &quot;2012-02-29T11:34:56.383Z&quot;,
+  &quot;dc:creator&quot;: &quot;org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine&quot;,
+  &quot;dc:relation&quot;: &quot;urn:enhancement-c176f1bf-a1dd-830e-df7d-deecdfdc8375&quot;,
+  &quot;enhancer:confidence&quot;: 1323049.5,
+  &quot;enhancer:entity-label&quot;: 
+     {
+       &quot;@literal&quot;: &quot;Paris&quot;,
+       &quot;@language&quot;: &quot;en&quot;
+     },
+   &quot;enhancer:entity-reference&quot;: &quot;http://dbpedia.org/resource/Paris&quot;,
+   &quot;enhancer:entity-type&quot;:
+      [&quot;dbp-ont:PopulatedPlace&quot;,&quot;dbp-ont:Settlement&quot;,
+      &quot;http://www.opengis.net/gml/_Feature&quot;,
+      &quot;dbp-ont:Place&quot;,&quot;owl:Thing&quot;],
+   &quot;enhancer:extracted-from&quot;: &quot;urn:content-item-sha1-37c8a8244041cf6113d4ee04b3a04d0a014f6e10&quot;
+ }
+</pre></div>
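+<p>As a minimal sketch of how the confidence value could be used on the client side, the following Python snippet groups EntityAnnotations by the enhancement they relate to (<tt>dc:relation</tt>) and sorts each group by <tt>enhancer:confidence</tt>. It assumes the annotations have already been collected into a JSON array; that wrapping, the abbreviated field set and the function name <tt>rank_by_text_annotation</tt> are illustrative assumptions, not part of Stanbol.</p>

```python
import json
from collections import defaultdict

# Abbreviated copies of the two EntityAnnotations shown above.
# Assumption: the annotations were collected into one JSON array.
SAMPLE = """
[
  {"dc:relation": "urn:enhancement-b3d4617d-1760-0374-f471-e0e746003f4e",
   "enhancer:confidence": 16641.191,
   "enhancer:entity-label": {"@literal": "Bob Marley", "@language": "en"},
   "enhancer:entity-reference": "http://dbpedia.org/resource/Bob_Marley"},
  {"dc:relation": "urn:enhancement-c176f1bf-a1dd-830e-df7d-deecdfdc8375",
   "enhancer:confidence": 1323049.5,
   "enhancer:entity-label": {"@literal": "Paris", "@language": "en"},
   "enhancer:entity-reference": "http://dbpedia.org/resource/Paris"}
]
"""


def rank_by_text_annotation(annotations):
    """Group suggestions by dc:relation and sort each group, best first."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation["dc:relation"]].append(annotation)
    return {
        relation: sorted(group, key=lambda a: a["enhancer:confidence"], reverse=True)
        for relation, group in grouped.items()
    }


ranked = rank_by_text_annotation(json.loads(SAMPLE))
for relation, suggestions in ranked.items():
    best = suggestions[0]
    print(relation, best["enhancer:entity-label"]["@literal"], best["enhancer:confidence"])
```

+<p>Grouping first keeps suggestions for "Paris" from being compared with suggestions for "Bob Marley"; the confidence values in the example differ by orders of magnitude, so they are probably only meaningful within one group.</p>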
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/opencalaisengine.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/opencalaisengine.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/opencalaisengine.html Mon Jul 16 13:02:45 2012
@@ -0,0 +1,174 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - The OpenCalais Enhancement Engine</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="logo"> <!-- do not scroll the logo -->
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a></div>
+  <div id="navigation"> <!-- but auto scroll the menu -->
+      <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a><ul>
+<li><a href="/stanbol/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/stanbol/docs/trunk/components.html">Components</a></li>
+</ul>
+</li>
+<li><a href="/stanbol/development/">Development</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/stanbol/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/downloads/">Overview</a><ul>
+<li><a href="/stanbol/downloads/releases.html">Releases</a></li>
+<li><a href="/stanbol/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="archive">Archive</h1>
+<ul>
+<li><a href="/stanbol/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrump" style="font-size: 80%;">
+      <a href="/">Home</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/">Stanbol</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/">Trunk</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/components/">Components</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/">Enhancer</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/engines/">Engines</a>
+    </div>
+    <h1 class="title">The OpenCalais Enhancement Engine</h1>
+    <p>The <strong>OpenCalais Enhancement Engine</strong> provides an interface to the <a href="http://www.opencalais.com/">OpenCalais
+Webservice</a> for Named Entity Recognition (NER).</p>
+<h2 id="technical-description">Technical description</h2>
+<p>The engine sends the text of the content item to the OpenCalais service and
+retrieves the NER annotations in RDF format. The OpenCalais annotations are
+added to the content item's metadata as specified by the Stanbol <a href="../enhancementstructure.html">Enhancement
+ Structures</a>.</p>
+<p>The engine natively supports the mime types <em>text/plain</em> and
+<em>text/html</em>. Additionally, it can process text that is provided in the content
+item's metadata as the value of the property</p>
+<div class="codehilite"><pre>http://www.semanticdesktop.org/ontologies/2007/01/19/nie#plainTextContent
+</pre></div>
+
+
+<p>Supported languages are</p>
+<ul>
+<li>English (en)</li>
+<li>French (fr)</li>
+<li>Spanish (es)</li>
+</ul>
+<h2 id="requirements-for-use-and-configuration-options">Requirements for use and configuration options</h2>
+<p>The use of this component requires an API key from OpenCalais. Without
+providing an API key, the engine will not do anything.  Such a key can be
+obtained from <a href="http://www.opencalais.com/APIkey">http://www.opencalais.com/APIkey</a>.</p>
+<p>In the OSGi configuration the key is set as value of the property</p>
+<div class="codehilite"><pre>org.apache.stanbol.enhancer.engines.opencalais.license
+</pre></div>
+
+
+<p>Also, the unit tests require the API key. Without the key some tests will be
+skipped. For Maven the key can be set as a system property on the command line:</p>
+<div class="codehilite"><pre>mvn -Dorg.apache.stanbol.enhancer.engines.opencalais.license<span class="o">=</span>YOUR_API_KEY <span class="o">[</span>install|test<span class="o">]</span>
+</pre></div>
+
+
+<p>The following configuration properties are defined:</p>
+<ul>
+<li><tt>org.apache.stanbol.enhancer.engines.opencalais.license</tt>: The OpenCalais license key that <strong>must</strong> be defined.</li>
+<li><tt>org.apache.stanbol.enhancer.engines.opencalais.url</tt>: The URL of the OpenCalais RESTful service. This only needs to be changed if OpenCalais changes its web service address.</li>
+<li>
+<p><tt>org.apache.stanbol.enhancer.engines.opencalais.typeMap</tt>: The value is the name of a file that maps the NER types from OpenCalais to other types. By default, a mapping to the DBpedia types is provided in order to achieve compatibility with the Stanbol OpenNLP NER engine. If no mapping is desired, an empty mapping file can be passed. Types for which no mapping is defined are passed to the metadata as is. The syntax of the mapping table is similar to that of Java property files. Each entry takes the form</p>
+<div class="codehilite"><pre>CalaisTypeURI=TargetTypeURI
+</pre></div>
+</li>
+<li>
+<p><tt>org.apache.stanbol.enhancer.engines.opencalais.NERonly</tt>: A Boolean property that controls whether the OpenCalais Linked Data references are included as entity references in addition to the NER enhancements. By default, these are omitted.</p>
+</li>
+</ul>
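+<p>To illustrate the <tt>CalaisTypeURI=TargetTypeURI</tt> entry format and the pass-through of unmapped types, here is a small Python sketch. It is not the engine's own parser: the function names and the example URIs are hypothetical, entries are split at the first "=" only, and the handling of blank lines and "#" comments is an assumption borrowed from Java property files.</p>

```python
def parse_type_map(text):
    """Parse lines of the form CalaisTypeURI=TargetTypeURI into a dict."""
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        source, separator, target = line.partition("=")
        if not separator:
            continue  # no '=' on this line, so it is not a mapping entry
        mapping[source.strip()] = target.strip()
    return mapping


def map_type(mapping, calais_type):
    """Return the mapped type, or the original type if no mapping is defined."""
    return mapping.get(calais_type, calais_type)


# Hypothetical URIs, for illustration only.
EXAMPLE = """
# source type = target type
http://example.org/calais/Person=http://example.org/target/Person
"""

type_map = parse_type_map(EXAMPLE)
print(map_type(type_map, "http://example.org/calais/Person"))
print(map_type(type_map, "http://example.org/calais/City"))
```

+<p>An empty mapping file yields an empty dictionary, so every type is passed through unchanged, which matches the behaviour described for the case where no mapping is desired.</p>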
+<h2 id="usage">Usage</h2>
+<p>Assuming that the Stanbol endpoint with the full launcher is running at</p>
+<div class="codehilite"><pre>http://localhost:8080
+</pre></div>
+
+
+<p>the license key has been defined and the engine is activated, commands like
+the following can be used from the command line to submit a text file as a content item:</p>
+<ul>
+<li>
+<p>stateless interface</p>
+<div class="codehilite"><pre>curl -i -X POST -H &quot;Content-Type:text/plain&quot; -T testfile.txt http://localhost:8080/engines
+</pre></div>
+</li>
+<li>
+<p>stateful interface</p>
+<div class="codehilite"><pre>curl -i -X PUT -H &quot;Content-Type:text/plain&quot; -T testfile.txt http://localhost:8080/contenthub/content/someFileId
+</pre></div>
+</li>
+</ul>
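+<p>The same two submissions can be sketched in Python using only the standard library. This is an illustrative equivalent of the curl commands, assuming the endpoint and paths given above; the helper name <tt>build_request</tt> and the sample text are hypothetical. The requests are only built here, and the commented lines show how one could be sent to a running instance.</p>

```python
import urllib.request

BASE_URL = "http://localhost:8080"  # endpoint assumed in the text above


def build_request(path, text, method):
    """Build (but do not send) a request that submits plain text as a content item."""
    return urllib.request.Request(
        BASE_URL + path,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method=method,
    )


sample_text = "Paris is the capital of France."  # hypothetical content

# Stateless interface: POST the text to the engines endpoint.
stateless = build_request("/engines", sample_text, "POST")

# Stateful interface: PUT the text under an ID in the content hub.
stateful = build_request("/contenthub/content/someFileId", sample_text, "PUT")

# To send a request against a running Stanbol instance:
# with urllib.request.urlopen(stateless) as response:
#     print(response.status, response.read().decode("utf-8"))
```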
+<p>Alternatively, the Stanbol web interface can be used for submitting documents
+and viewing the metadata at</p>
+<div class="codehilite"><pre>http://localhost:8080/contenthub
+</pre></div>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/refactorengine.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/refactorengine.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/refactorengine.html Mon Jul 16 13:02:45 2012
@@ -0,0 +1,113 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - The Refactor Engine</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="logo"> <!-- do not scroll the logo -->
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a></div>
+  <div id="navigation"> <!-- but auto scroll the menu -->
+      <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a><ul>
+<li><a href="/stanbol/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/stanbol/docs/trunk/components.html">Components</a></li>
+</ul>
+</li>
+<li><a href="/stanbol/development/">Development</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/stanbol/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/downloads/">Overview</a><ul>
+<li><a href="/stanbol/downloads/releases.html">Releases</a></li>
+<li><a href="/stanbol/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="archive">Archive</h1>
+<ul>
+<li><a href="/stanbol/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrump" style="font-size: 80%;">
+      <a href="/">Home</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/">Stanbol</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/">Trunk</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/components/">Components</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/">Enhancer</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/engines/">Engines</a>
+    </div>
+    <h1 class="title">The Refactor Engine</h1>
+    <p>The Refactor Engine refactors the RDF graphs of recognized entities to a target vocabulary. The engine is provided with a default set of rules (a recipe) for the refactoring, which produces an RDF graph according to the Google vocabulary. This default recipe makes it possible to produce Google rich
+snippets.</p>
+<h2 id="technical-description">Technical Description</h2>
+<p>This enhancement engine requires the following components running:</p>
+<ul>
+<li>Stanbol Entityhub</li>
+<li>Stanbol Refactor</li>
+<li>Stanbol OntoNet</li>
+</ul>
+<h2 id="example">Example</h2>
+<p>TODO</p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/tikaengine.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/tikaengine.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/tikaengine.html Mon Jul 16 13:02:45 2012
@@ -0,0 +1,163 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Tika Engine</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="logo"> <!-- do not scroll the logo -->
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a></div>
+  <div id="navigation"> <!-- but auto scroll the menu -->
+      <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a><ul>
+<li><a href="/stanbol/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/stanbol/docs/trunk/components.html">Components</a></li>
+</ul>
+</li>
+<li><a href="/stanbol/development/">Development</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/stanbol/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/downloads/">Overview</a><ul>
+<li><a href="/stanbol/downloads/releases.html">Releases</a></li>
+<li><a href="/stanbol/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="archive">Archive</h1>
+<ul>
+<li><a href="/stanbol/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrump" style="font-size: 80%;">
+      <a href="/">Home</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/">Stanbol</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/">Trunk</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/components/">Components</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/">Enhancer</a>&nbsp;&raquo;&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/engines/">Engines</a>
+    </div>
+    <h1 class="title">Tika Engine</h1>
+    <p>The Tika Engine is an Apache Stanbol Enhancement Engine based on Apache Tika. It has three main functions:</p>
+<ol>
+<li>To detect the content type of parsed content. This is only performed if no content type is parsed or the content type is set to "application/octet-stream". The detected content type is added to the metadata of the ContentItem.</li>
+<li>To extract the plain text (and XHTML) from parsed content and add it to the <a href="../contentitem.html">ContentItem</a> as content parts with the type Blob.</li>
+<li>To extract metadata from the parsed content and add it to the metadata of the <a href="../contentitem.html">ContentItem</a>.</li>
+</ol>
+<h2 id="supported-media-types">Supported Media Types</h2>
+<p>As this engine uses Apache Tika the supported media types are the same as stated on the <a href="http://tika.apache.org/1.0/formats.html">Tika Homepage</a>.</p>
+<h2 id="extracted-metadata">Extracted Metadata</h2>
+<p>Tika provides metadata as 'key:values' pairs. To use them efficiently within Stanbol, they need to be converted to valid RDF and aligned with existing Ontologies.</p>
+<p>The TikaEngine supports alignments to several different Ontologies. Such alignment rules can be activated/deactivated within the configuration of the TikaEngine.</p>
+<p>Supported Ontologies:</p>
+<ul>
+<li>
+<p><a href="http://www.w3.org/TR/mediaont-10/">Ontology for Media Resources</a>: This is the most complete mapping to a single Ontology. It includes mappings for all Dublin Core metadata, geo locations, some image-specific data and most of the audio and video related metadata.</p>
+</li>
+<li>
+<p><a href="http://dublincore.org/documents/dcmi-terms/">DC terms</a>: Provides good mappings for text documents (HTML, Office, OpenOffice, PDF ...)</p>
+</li>
+<li>
+<p><a href="http://www.semanticdesktop.org/ontologies/2007/05/10/nexif/">Nepomuk EXIF ontology</a>: Interesting for users that want to work with EXIF metadata extracted from images.</p>
+</li>
+<li>
+<p><a href="http://www.semanticdesktop.org/ontologies/2007/03/22/nmo/">Nepomuk Message Ontology</a>: Used for sender and receiver information of mail messages.</p>
+</li>
+<li>
+<p>SKOS: Allows mapping of labels and notes to <a href="http://www.w3.org/2009/08/skos-reference/skos.html">SKOS</a>. This is deactivated by default.</p>
+</li>
+<li>
+<p>RDFS: Allows mapping of labels and comments to "rdfs:label" and "rdfs:comment".</p>
+</li>
+</ul>
+<p>Note that the metadata extracted by the Tika engine are not covered by the Stanbol <a href="../enhancementstructure.html">Enhancement Structure</a> as they are outside of its scope.</p>
+<h3 id="contenttype">ContentType:</h3>
+<p>The detected content type for the parsed contentItem is added by using the following two properties:</p>
+<ul>
+<li>'http://purl.org/dc/terms/format': Dublin Core terms 'format'</li>
+<li>'http://www.w3.org/ns/ma-ont#hasFormat': Media Resource Ontology 'hasFormat'</li>
+</ul>
+<p>Note that these properties will only be present if the related Ontology is activated in the TikaEngine configuration.</p>
+<h2 id="sending-requests-directly-to-the-tika-engine">Sending Requests directly to the Tika Engine</h2>
+<p>The Stanbol Enhancer allows sending enhancement requests directly to a specific EnhancementEngine. This feature can be used in combination with the Tika Engine to request:</p>
+<ol>
+<li>the "text/plain" or "application/xhtml+xml" version of parsed content</li>
+<li>the extracted metadata as RDF aligned to the activated Ontologies</li>
+</ol>
+<p>The first example requests the plain text version of a PDF file with the name "test.pdf". </p>
+<div class="codehilite"><pre>curl -v -X POST -H <span class="s2">&quot;Accept: text/plain&quot;</span> -T test.pdf <span class="se">\</span>
+    <span class="s2">&quot;http://localhost:8080/enhancer/engine/tika?omitMetadata=true&quot;</span>
+</pre></div>
+
+
+<p>Note that:</p>
+<ul>
+<li>the 'Accept' header is set to the content type of the requested content, and</li>
+<li>the 'omitMetadata=true' query parameter tells the Enhancer not to return the RDF metadata.</li>
+</ul>
+<p>The second example returns the metadata extracted from the parsed "song.mp3".</p>
+<div class="codehilite"><pre>curl -v -X POST -H <span class="s2">&quot;Accept: application/rdf+xml&quot;</span> -T song.mp3 <span class="se">\</span>
+    <span class="s2">&quot;http://localhost:8080/enhancer/engine/tika&quot;</span>
+</pre></div>
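+<p>The same two requests can also be built from Java. The following is a minimal sketch, assuming Java 11 or later and the endpoint used in the curl examples; the class and method names are hypothetical and not part of Stanbol. It only builds the requests, so nothing is sent until they are handed to an HTTP client.</p>

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds (but does not send) the two requests shown with curl. */
final class TikaEngineRequests {

    // Endpoint taken from the curl examples; adjust host and port to your installation.
    static final String TIKA_ENGINE = "http://localhost:8080/enhancer/engine/tika";

    /** Request for a converted version of the content; the RDF metadata is omitted. */
    static HttpRequest contentRequest(byte[] content, String acceptType) {
        return HttpRequest.newBuilder(URI.create(TIKA_ENGINE + "?omitMetadata=true"))
                .header("Accept", acceptType)
                .POST(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }

    /** Request for the extracted metadata serialized as RDF/XML. */
    static HttpRequest metadataRequest(byte[] content) {
        return HttpRequest.newBuilder(URI.create(TIKA_ENGINE))
                .header("Accept", "application/rdf+xml")
                .POST(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }
}
```

+<p>A request built this way could be sent with "HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())", using the bytes of "test.pdf" or "song.mp3" as content.</p>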
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/zemantaengine.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/zemantaengine.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/engines/zemantaengine.html Mon Jul 16 13:02:45 2012
@@ -0,0 +1,123 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - The Zemanta enhancement engine</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="logo"> <!-- do not scroll the logo -->
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a></div>
+  <div id="navigation"> <!-- but auto scroll the menu -->
+      <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a><ul>
+<li><a href="/stanbol/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/stanbol/docs/trunk/components.html">Components</a></li>
+</ul>
+</li>
+<li><a href="/stanbol/development/">Development</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/stanbol/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/downloads/">Overview</a><ul>
+<li><a href="/stanbol/downloads/releases.html">Releases</a></li>
+<li><a href="/stanbol/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="archive">Archive</h1>
+<ul>
+<li><a href="/stanbol/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrump" style="font-size: 80%;">
+      <a href="/">Home</a>&nbsp;&raquo&nbsp;<a href="/stanbol/">Stanbol</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/">Trunk</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/components/">Components</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/">Enhancer</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/engines/">Engines</a>
+    </div>
+    <h1 class="title">The Zemanta enhancement engine</h1>
+    <p>Enhancement engine that uses the Zemanta API. You need a Zemanta API key to run this engine.</p>
+<h2 id="usage">Usage</h2>
+<p>If the Engine does not show up in the Components tab of the Apache Felix Web Console, you will first need to build and install this Engine in your OSGi environment:</p>
+<ol>
+<li>build ("mvn install") and deploy the Clerezza bundle org.apache.clerezza.rdf.jena.parser</li>
+<li>build the jar ("mvn install")</li>
+<li>import the jar into the OSGi runtime (all default</li>
+</ol>
+<p>To use this Enhancement Engine it is important to configure your Zemanta API key.</p>
+<ul>
+<li>In the OSGi web console, set the property "org.apache.stanbol.enhancer.engines.zemanta.key" with your API key</li>
+<li>restart the component in the OSGi console</li>
+<li>
+<p>Watch the console when you add text using commands such as:</p>
+<div class="codehilite"><pre>curl -T myText.txt -H Content-Type:text/plain http://localhost:8080/enhancer
+</pre></div>
+</li>
+</ul>
+<h2 id="enhancements">Enhancements</h2>
+<p>This engine supports Extracted Entities and Topic Classification. Occurrences of extracted entities are represented by '<a href="../enhancementstructure.html#fisetextannotation">fise:TextAnnotation</a>', while suggested Entities are represented as '<a href="../enhancementstructure.html#fiseentityannotation">fise:EntityAnnotation</a>' with a 'dc:relation' link to the 'fise:TextAnnotation'. Categories are represented as '<a href="../enhancementstructure.html#fisetopicannotation">fise:TopicAnnotation</a>'s.</p>
+<p>Enhancements created by the ZemantaEngine are compatible with those created by similar engines such as the <a href="namedentityextractionengine.html">Named Entity Extraction Enhancement Engine</a>, <a href="keywordlinkingengine.html">KeywordLinkingEngine</a> or <a href="namedentitytaggingengine.html">Named Entity Tagging Engine</a>. This ensures that Stanbol users can arbitrarily mix those engines with the Zemanta-based variant without the need to adapt client-side code.</p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementexample.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementexample.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementjobmanager.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementjobmanager.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementjobmanager.html Mon Jul 16 13:02:45 2012
@@ -0,0 +1,170 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - EnhancementJobManager</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="logo"> <!-- do not scroll the logo -->
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a></div>
+  <div id="navigation"> <!-- but auto scroll the menu -->
+      <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a><ul>
+<li><a href="/stanbol/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/stanbol/docs/trunk/components.html">Components</a></li>
+</ul>
+</li>
+<li><a href="/stanbol/development/">Development</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/stanbol/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/downloads/">Overview</a><ul>
+<li><a href="/stanbol/downloads/releases.html">Releases</a></li>
+<li><a href="/stanbol/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="archive">Archive</h1>
+<ul>
+<li><a href="/stanbol/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrump" style="font-size: 80%;">
+      <a href="/">Home</a>&nbsp;&raquo&nbsp;<a href="/stanbol/">Stanbol</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/">Trunk</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/components/">Components</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/">Enhancer</a>
+    </div>
+    <h1 class="title">EnhancementJobManager</h1>
+    <p>The EnhancementJobManager is the component responsible for the execution of the <a href="chains/executionplan.html">ExecutionPlan</a> as provided by the <a href="chains">Enhancement Chain</a> on the <a href="contentitem.html">ContentItem</a>.</p>
+<h2 id="enhancementjobmanager-interface">EnhancementJobManager interface</h2>
+<p>The interface of the EnhancementJobManager is very simple:</p>
+<div class="codehilite"><pre><span class="cm">/** Enhances the content item by using the default Chain */</span>
+<span class="o">+</span> <span class="n">enhanceContent</span><span class="o">(</span><span class="n">ContentItem</span> <span class="n">ci</span><span class="o">)</span>
+<span class="cm">/** Enhances the content item by using the parsed Chain */</span>
+<span class="o">+</span> <span class="n">enhanceContent</span><span class="o">(</span><span class="n">ContentItem</span> <span class="n">ci</span><span class="o">,</span> <span class="n">Chain</span> <span class="n">chain</span><span class="o">)</span>
+</pre></div>
+
+
+<p>Note: the parsed ContentItem will be changed during the enhancement process. <a href="engines">EnhancementEngines</a> will add extracted knowledge to the metadata of the content item. Also additional content parts may be added to the ContentItem.</p>
+<h2 id="enhancement-process">Enhancement Process</h2>
+<p><span style="float:right"> <img alt="Enhancement Job Manager Overview" src="enhancementjobmanageroverview.png" title="The Enhancement Job Manager executes the ExecutionPlan provided by the Enhancement Chain and records the ExecutionMetadata" /></span> </p>
+<p>While the <a href="chains/executionplan.html">ExecutionPlan</a> defines which EnhancementEngines are used and how they depend on each other, the EnhancementJobManager is responsible for the actual execution of the enhancement process based on this plan. This section provides detailed information about requirements and expectations that MUST be considered.</p>
+<p>The EnhancementJobManager is also responsible for creating and updating the <a href="executionmetadata.html">ExecutionMetadata</a> in the metadata of the processed <a href="contentitem.html">ContentItem</a>. Details about this are provided in the section "<a href="executionmetadata.html#creationmanagement_of_executionmetadata">Creation/Management of ExecutionMetadata</a>" of the ExecutionMetadata documentation.</p>
+<h3 id="initializing-the-enhancement-process">Initializing the Enhancement Process</h3>
+<p>Here one needs to distinguish two cases:</p>
+<ol>
+<li>Initialization of a new Enhancement process and</li>
+<li>Continuation of an existing Enhancement process.</li>
+</ol>
+<p>The EnhancementJobManager can easily distinguish the two cases by checking whether a content part with the URI "http://stanbol.apache.org/ontology/enhancer/executionMetadata#ChainExecution" is present within the parsed <a href="contentitem.html">ContentItem</a>.</p>
+<p>In the first case the <a href="chains/executionplan.html">ExecutionPlan</a> to be used by the enhancement process is provided by the Chain in a final graph that is guaranteed not to change. However, because the configuration of a Chain might be changed at any time, the EnhancementJobManager MUST retrieve the execution plan only once and use it during the entire enhancement process. In addition, the ExecutionPlan MUST also be added to the graph containing the <a href="executionmetadata.html">EnhancementMetadata</a>. When continuing a previously aborted enhancement process, the ExecutionPlan MUST be initialized from the ExecutionMetadata provided by the ContentItem.</p>
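+<p>The check described above can be sketched as follows. This is only an illustration: the content parts of a ContentItem are modelled as a plain map from part URI to part, and the class, enum and method names are hypothetical rather than part of the Stanbol API. Only the URI is taken from the text.</p>

```java
import java.util.Map;

final class EnhancementProcessInit {

    /** URI of the content part that holds the ChainExecution (taken from the text above). */
    static final String CHAIN_EXECUTION_PART =
            "http://stanbol.apache.org/ontology/enhancer/executionMetadata#ChainExecution";

    /** The two initialization cases a job manager has to distinguish. */
    enum Mode { NEW_PROCESS, CONTINUE_PROCESS }

    /**
     * Hypothetical stand-in: the content parts are passed as a map keyed by URI.
     * If the ChainExecution part is already present, an earlier enhancement
     * process is continued; otherwise a new one is started.
     */
    static Mode detect(Map<String, ?> contentParts) {
        return contentParts.containsKey(CHAIN_EXECUTION_PART)
                ? Mode.CONTINUE_PROCESS
                : Mode.NEW_PROCESS;
    }
}
```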
+<p>For details on how to initialize/load the execution metadata see the section "<a href="executionmetadata.html#creationmanagement_of_executionmetadata">Creation/Management of ExecutionMetadata</a>" of the ExecutionMetadata documentation.</p>
+<h3 id="engine-execution">Engine Execution</h3>
+<p>The ExecutionPlan provides the information needed to determine which <a href="engines">EnhancementEngines</a> can be executed in any given state. The following code shows how to determine executable engines. 
+The snippet is assumed to be called after the execution of an EnhancementEngine has completed. Note that in a multi-threaded environment, access to the collections of executed and running engines needs to be synchronized.</p>
+<div class="codehilite"><pre><span class="n">Collection</span><span class="o">&lt;</span><span class="n">NonLiteral</span><span class="o">&gt;</span> <span class="n">executed</span><span class="o">;</span> <span class="c1">//already executed Engines</span>
+<span class="n">Collection</span><span class="o">&lt;</span><span class="n">NonLiteral</span><span class="o">&gt;</span> <span class="n">running</span><span class="o">;</span> <span class="c1">//currently running Engines</span>
+
+<span class="n">Collection</span><span class="o">&lt;</span><span class="n">NonLiteral</span><span class="o">&gt;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">ExecutionPlanUtils</span><span class="o">.</span><span class="na">getExecuteable</span><span class="o">(</span><span class="n">plan</span><span class="o">,</span> <span class="n">executed</span><span class="o">);</span>
+<span class="k">for</span><span class="o">(</span><span class="n">NonLiteral</span> <span class="n">node</span> <span class="o">:</span> <span class="n">next</span><span class="o">){</span>
+    <span class="k">if</span><span class="o">(!</span><span class="n">running</span><span class="o">.</span><span class="na">contains</span><span class="o">(</span><span class="n">node</span><span class="o">)){</span>
+        <span class="n">String</span> <span class="n">engineName</span> <span class="o">=</span> <span class="n">EnhancementEngineHelper</span><span class="o">.</span><span class="na">getString</span><span class="o">(</span><span class="n">plan</span><span class="o">,</span><span class="n">node</span><span class="o">,</span> <span class="n">EX_ENGINE</span><span class="o">);</span>
+        <span class="n">EnhancementEngine</span> <span class="n">engine</span> <span class="o">=</span> <span class="n">tracker</span><span class="o">.</span><span class="na">getEngine</span><span class="o">(</span><span class="n">engineName</span><span class="o">);</span>
+        <span class="k">if</span><span class="o">(</span><span class="n">engine</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">){</span>
+            <span class="c1">// execute engine</span>
+        <span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
+           <span class="c1">//check if optional and throw error if not</span>
+        <span class="o">}</span>
+    <span class="o">}</span> <span class="c1">// else already running -&gt; ignore</span>
+<span class="o">}</span>
+</pre></div>
+
+
+<p><em>NOTE</em> that the NonLiterals contained in the two collections are 'ep:ExecutionNode' instances and NOT 'em:EngineExecution' instances. Each 'em:EngineExecution' instance in the ExecutionMetadata is linked by the 'em:executionNode' property to the corresponding 'ep:ExecutionNode' of the ExecutionPlan.</p>
+<p>Before executing an <a href="engines">EnhancementEngine</a>, the EnhancementJobManager needs to check if and how the engine can enhance a content item. This is indicated by the integer returned by the "canEnhance(ContentItem ci)" method:</p>
+<ul>
+<li><strong>CANNOT_ENHANCE</strong>: Indicates that this engine cannot process the parsed content item. In this case the EnhancementJobManager needs to skip this engine and mark the EngineExecution as skipped, with a status message stating that the EnhancementEngine was unable to process the content item. If this engine is marked as optional, the enhancement process can continue; if not, the execution MUST be marked as failed and a corresponding exception needs to be thrown.</li>
+<li><strong>ENHANCE_SYNCHRONOUS</strong>: Indicates that the engine needs exclusive access to the parsed content item. The EnhancementJobManager needs to ensure this, typically by calling the "computeEnhancement(ContentItem ci)" method within a write lock.</li>
+<li><strong>ENHANCE_ASYNC</strong>: Indicates that this engine supports asynchronous execution and itself takes care of acquiring read and write locks on the parsed content item. However, this does not require the JobManager to execute the engine asynchronously.</li>
+</ul>
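+<p>The three cases listed above could be handled along the following lines. This is a sketch under stated assumptions, not the code of any Stanbol job manager: the integer states are modelled as a local enum, the call to "computeEnhancement(ContentItem ci)" is modelled as a Runnable, and all class, enum and method names are hypothetical.</p>

```java
import java.util.concurrent.locks.ReadWriteLock;

final class CanEnhanceDispatch {

    /** Illustrative stand-in for the integer states returned by canEnhance(..). */
    enum CanEnhance { CANNOT_ENHANCE, ENHANCE_SYNCHRONOUS, ENHANCE_ASYNC }

    /** Outcome recorded for a single engine in this sketch. */
    enum Outcome { SKIPPED, COMPLETED }

    /**
     * @param state    what the engine reported for the content item
     * @param optional whether the execution plan marks the engine as optional
     * @param lock     the read/write lock guarding the content item
     * @param engine   the call to computeEnhancement(..), modelled as a Runnable
     */
    static Outcome execute(CanEnhance state, boolean optional,
                           ReadWriteLock lock, Runnable engine) {
        switch (state) {
            case CANNOT_ENHANCE:
                if (!optional) {
                    // a required engine that cannot run fails the whole process
                    throw new IllegalStateException("required engine cannot enhance the content item");
                }
                return Outcome.SKIPPED; // optional engine: skip it and continue
            case ENHANCE_SYNCHRONOUS:
                // the engine needs exclusive access, so run it inside the write lock
                lock.writeLock().lock();
                try {
                    engine.run();
                } finally {
                    lock.writeLock().unlock();
                }
                return Outcome.COMPLETED;
            case ENHANCE_ASYNC:
                // the engine acquires the locks it needs itself; calling it
                // in the current thread is still permitted
                engine.run();
                return Outcome.COMPLETED;
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }
}
```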
+<p>If the execution of an EnhancementEngine completes, the JobManager needs to set the state of the execution to completed and update the execution metadata accordingly.</p>
+<p>If a call to "computeEnhancement(ContentItem ci)" results in an Exception, the EnhancementJobManager must mark the execution of the engine as failed with a description of the exception that occurred. If the execution of the affected engine was optional, the enhancement process is continued. Otherwise the enhancement process needs to be stopped and the Error needs to be rethrown by the "enhanceContent(..)" method.</p>
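+<p>The failure handling described above might look like this in outline. The sketch assumes that the execution metadata can be represented by a simple list of status messages and that the engine call is a Runnable; the class and its members are hypothetical and only illustrate the control flow.</p>

```java
import java.util.ArrayList;
import java.util.List;

final class EngineFailureHandling {

    /** Status messages of failed engines (stand-in for the execution metadata). */
    final List<String> failures = new ArrayList<>();

    /**
     * Runs one engine. Returns true if it completed and false if an optional
     * engine failed; the failure of a required engine is rethrown to the caller.
     */
    boolean run(String engineName, boolean optional, Runnable computeEnhancement) {
        try {
            computeEnhancement.run();
            return true;
        } catch (RuntimeException e) {
            // mark the execution as failed with a description of the exception
            failures.add(engineName + ": " + e.getMessage());
            if (optional) {
                return false; // the process continues with the next engine
            }
            throw e; // stops the process; enhanceContent(..) would propagate this
        }
    }
}
```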
+<p>For all the details on how to reflect state changes in the Execution metadata see <a href="executionmetadata.html#execution_state_management">this section</a> of the documentation of the ExecutionMetadata.</p>
+<h3 id="multi-threaded-enhancement-processes">Multi Threaded enhancement processes</h3>
+<p>If the EnhancementJobManager supports calling <a href="engines">EnhancementEngines</a> for the same content item simultaneously in multiple threads, it is important to correctly use the ReadWriteLock provided by the ContentItem.getLock() method.</p>
+<p>There are many good examples on how to correctly use "java.util.concurrent.locks.ReadWriteLock" available on the web.</p>
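+<p>As one such example, the following sketch shows the usual lock/try/finally pattern with a ReadWriteLock. The class is hypothetical and holds its own lock and a plain list; in Stanbol the lock would be obtained from the ContentItem and the guarded data would be its metadata.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockedMetadata {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> statements = new ArrayList<>();

    /** Writers take the write lock, which excludes all readers and other writers. */
    void add(String statement) {
        lock.writeLock().lock();
        try {
            statements.add(statement);
        } finally {
            lock.writeLock().unlock(); // always release, even if the body throws
        }
    }

    /** Readers take the read lock; several readers may hold it at the same time. */
    List<String> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(statements); // copy while the lock is held
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

+<p>Releasing the lock in a finally block matters here: an engine that throws while holding the write lock would otherwise block every other engine working on the same content item.</p>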
+<h3 id="finalizing-the-enhancementprocess">Finalizing the EnhancementProcess</h3>
+<p>When the execution is completed (successfully or failed), the EnhancementJobManager needs to ensure that the 'em:status' and the 'em:completed' of the 'em:ChainExecution' instance are set. If the execution failed, the 'em:statusMessage' should also be available and contain a message that describes the problem.</p>
+<h2 id="enhancementjobmanager-implementations">EnhancementJobManager implementations</h2>
+<p>EnhancementJobManager implementations need to register themselves as OSGi services. By default the Stanbol Enhancer will use the implementation with the highest service ranking. The service ranking can be set by providing a configuration defining an integer value for the property "service.ranking".</p>
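+<p>The selection rule can be illustrated with a small sketch. It assumes the usual OSGi ordering (highest "service.ranking" first, ties resolved in favour of the lowest service id) and models registrations with a hypothetical class instead of real OSGi ServiceReference objects.</p>

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class ServiceRankingSelection {

    /** Hypothetical registration record; real code would inspect OSGi ServiceReferences. */
    static final class Registration {
        final String name;
        final int ranking;     // value of the "service.ranking" property
        final long serviceId;  // value of the "service.id" property

        Registration(String name, int ranking, long serviceId) {
            this.name = name;
            this.ranking = ranking;
            this.serviceId = serviceId;
        }
    }

    /** Picks the registration with the highest ranking; ties go to the lowest service id. */
    static Optional<Registration> select(List<Registration> registrations) {
        return registrations.stream().min(
                Comparator.comparingInt((Registration r) -> r.ranking).reversed()
                        .thenComparingLong(r -> r.serviceId));
    }
}
```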
+<h3 id="eventjobmanager">EventJobManager</h3>
+<p>This implementation is provided by the "org.apache.stanbol.enhancer.jobmanager.event" module and is currently used as default. It registers itself (by default) with a service ranking of '0'.</p>
+<p>This implementation supports an asynchronous enhancement process by using the <a href="http://www.osgi.org/javadoc/r4v42/org/osgi/service/event/package-summary.html">"org.osgi.service.event"</a> framework. </p>
+<h3 id="weightedjobmanager">WeightedJobManager</h3>
+<p>This JobManager was used as default before the introduction of EnhancementChains. It does not support EnhancementChains and will enhance parsed <a href="contentitem.html">ContentItems</a> by calling all currently active EnhancementEngines sequentially. It also does not support EnhancementMetadata.</p>
+<p>This implementation is provided by the "org.apache.stanbol.enhancer.jobmanager.weightedjobmanager" module and is no longer included within the Apache Stanbol launchers. This JobManager registers itself with a service ranking of "-1000". Users who want to use this job manager need to manually install this bundle and either deactivate other EnhancementJobManager implementations or reconfigure the service ranking of this one to a value &gt; 0.</p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementjobmanageroverview.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementjobmanageroverview.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementstructure.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementstructure.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementstructure.html Mon Jul 16 13:02:45 2012
@@ -0,0 +1,222 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Stanbol Enhancement Structure</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="logo"> <!-- do not scroll the logo -->
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a></div>
+  <div id="navigation"> <!-- but auto scroll the menu -->
+      <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a><ul>
+<li><a href="/stanbol/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/stanbol/docs/trunk/components.html">Components</a></li>
+</ul>
+</li>
+<li><a href="/stanbol/development/">Development</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/stanbol/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/downloads/">Overview</a><ul>
+<li><a href="/stanbol/downloads/releases.html">Releases</a></li>
+<li><a href="/stanbol/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="archive">Archive</h1>
+<ul>
+<li><a href="/stanbol/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrump" style="font-size: 80%;">
+      <a href="/">Home</a>&nbsp;&raquo&nbsp;<a href="/stanbol/">Stanbol</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/">Trunk</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/components/">Components</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/">Enhancer</a>
+    </div>
+    <h1 class="title">Stanbol Enhancement Structure</h1>
+    <p>This document specifies the structure used by the Stanbol Enhancer to encode features extracted from the parsed <a href="contentitem.html">ContentItem</a>. The Enhancement Structure is based on <a href="http://www.w3.org/TR/rdf-primer/">RDF</a> technology and defined as an <a href="http://www.w3.org/2004/OWL/">OWL</a> ontology. </p>
+<p>Its two main purposes are to facilitate the:</p>
+<ol>
+<li>Interoperability between EnhancementEngines: The design of the Stanbol Enhancer is based on the processing of a <a href="contentitem.html">ContentItem</a> by multiple <a href="engines">EnhancementEngine</a>s in an <a href="chains">EnhancementChain</a>. Together with the ContentItem API the EnhancementStructure is the key enabler for the cooperation of the different engines. It ensures that enhancements created by one engine can be consumed by the following engines (e.g. the first engine detects the language of the parsed text; the second consumes the language to select the correct NER (named entity recognition) model and creates enhancements describing Named Entities contained in the text; the third engine consumes those Named Entity annotations and creates suggestions for Entities that are part of a controlled vocabulary).</li>
+<li>Consumption of extracted Features: The knowledge structure standardized by this Ontology aims to allow users to consume/process the features extracted from the parsed content. This includes things like:<ul>
+<li>list all suggested Entities (accept/reject Tags)</li>
+<li>list all suggested Topics (content classification)</li>
+<li>group Entity suggestions based on detected "Named Entities" (disambiguation support)</li>
+<li>show the occurrence of detected Entities within the analyzed text (similar to spell checker UIs)</li>
+</ul>
+</li>
+</ol>
+<p>While this document focuses on the first purpose and provides details on how the Stanbol Enhancement Structure is an integral part of the Stanbol Enhancer, there is also a <a href="../enhancementusage.html">Usage Scenario</a> available that focuses on how the Enhancements can be consumed by Stanbol Enhancer users.</p>
+<h2 id="overview-on-the-stanbol-enhancement-structure">Overview on the Stanbol Enhancement Structure</h2>
+<p>The Stanbol Enhancement Structure is a central part of the <a href="index.html">Stanbol Enhancer</a> architecture as it represents the binding element between the <a href="contentitem.html">ContentItem</a> and the <a href="engines">EnhancementEngine</a>s that analyze it as configured by an <a href="chains">EnhancementChain</a>. Together with the <a href="contentitem.html#content-parts">ContentParts</a> it represents the state that is constantly updated during the enhancement process.</p>
+<p>The following graphic provides an overview of how the EnhancementStructure is used by the Stanbol Enhancer to formally represent the enhancement results.</p>
+<p><img alt="EnhancementStructure Overview" src="enhancementstructure.png" title="Overview of the Stanbol Enhancement Structure showing 'Bob Marley' recognized as Person within the parsed Text with two suggested Entities 'Bob Marley' the musician and 'Bob Marley' the comedian" /></p>
+<p>The above figure shows:</p>
+<ul>
+<li>A <a href="contentitem.html">ContentItem</a> with a single plain text <a href="contentitem.html#content-parts">ContentPart</a> containing the text "Apache Stanbol can detect famous entities such as Paris or Bob Marley!"</li>
+<li>Three Enhancements: One TextAnnotation describing "Bob Marley" as a Named Entity as extracted by the NER (NamedEntityRecognition) engine and two EntityAnnotations that suggest different Entities from <a href="http://dbpedia.org">DBpedia.org</a>.</li>
+<li>Two referenced Entities: Both <a href="http://dbpedia.org/resource/Bob_Marley">dbpedia:Bob_Marley</a> and <a href="http://dbpedia.org/resource/Bob_Marley_%28comedian%29">dbpedia:Bob_Marley_(comedian)</a> are part of <a href="http://dbpedia.org">DBpedia.org</a> and referenced by fise:EntityAnnotations created by an instance of the <a href="engines/namedentitytaggingengine.html">Named Entity Tagging Engine</a> configured to link with <a href="http://dbpedia.org">DBpedia.org</a></li>
+<li>An <a href="chains">EnhancementChain</a> with four <a href="engines">EnhancementEngine</a>s. However, only the enhancements of the latter two are shown in the figure.</li>
+</ul>
+<p>The bold relations within the figure are central as they show how the EnhancementStructure is used to formally specify that the mention "Bob Marley" within the analyzed text is believed to represent the Entity <a href="http://dbpedia.org/resource/Bob_Marley">dbpedia:Bob_Marley</a>. However, it is also stated that the mention is ambiguous and might instead refer to another person, <a href="http://dbpedia.org/resource/Bob_Marley_%28comedian%29">dbpedia:Bob_Marley_(comedian)</a>.</p>
+<p>The dashed relations are also important as they are used to formally describe the extraction context: which EnhancementEngine has extracted a feature from which ContentItem. If even more contextual information is needed, users can combine this information with the <a href="executionmetadata.html">ExecutionMetadata</a> collected during the enhancement process.</p>
+<h2 id="general-information">General Information</h2>
+<p><strong>Used Namespaces</strong></p>
+<p>The following namespaces are used/referenced by the Enhancement Structure:</p>
+<ul>
+<li><strong>fise</strong> (<em>http://fise.iks-project.eu/ontology/</em>): This is the main namespace of the currently used Enhancement Structure. All custom concepts and properties are defined using this namespace. (*)</li>
+<li><strong>enhancer</strong> (<em>http://stanbol.apache.org/ontology/enhancer/enhancer#</em>): This is the main namespace of the Stanbol Enhancer defining concepts such as ContentItem, EnhancementEngine, EnhancementChain …</li>
+<li>
+<dl>
+<dt><strong>entityhub</strong> (<em>http://stanbol.apache.org/ontology/entityhub/entityhub#</em>)</dt>
+<dd>This is the main namespace of the Stanbol Entityhub component. </dd>
+</dl>
+</li>
+<li><strong>dc</strong> (<em>http://purl.org/dc/terms/</em>): The Dublin Core terms standard is also heavily used by the Stanbol Enhancement Structure, especially to encode metadata, but also to encode relations between extracted information (fise:Enhancements)</li>
+<li><strong>dbpedia-ont</strong> (<em>http://dbpedia.org/ontology/</em>): Concepts of this Ontology are used to describe the types of "Named Entities" detected in parsed content.</li>
+<li><strong>skos</strong> (<em>http://www.w3.org/2004/02/skos/core#</em>): The SKOS standard is preferably used to describe entries of Thesauri or more generally any type of controlled vocabulary.</li>
+<li><strong>rdf</strong> (<em>http://www.w3.org/1999/02/22-rdf-syntax-ns#</em>)</li>
+<li>in addition <a href="engines">EnhancementEngine</a>s are free to add/use properties of any additional Ontology (e.g. when adding the rdf:type's of suggested Entities).</li>
+</ul>
+<p><em>(*) Historical side note: FISE was the name of the Stanbol Enhancer before its <a href="http://wiki.apache.org/incubator/StanbolProposal">incubation to Apache</a>. The Enhancement Structure does still use the original namespace for compatibility reasons.</em></p>
+<p><strong>About Expressiveness:</strong></p>
+<p>All Stanbol Ontologies are encoded using OWL but restrict themselves to basic features. Users need to be aware that not all rules defined in this documentation are formally expressed within the Ontology. However, all the stated rules are validated by the <a href="http://svn.apache.org/repos/asf/incubator/stanbol/trunk/enhancer/generic/test/src/main/java/org/apache/stanbol/enhancer/test/helper/EnhancementStructureHelper.java">EnhancementStructureHelper</a> UnitTest utility that is part of the "org.apache.stanbol.enhancer.test" module. This ensures that EnhancementEngine implementations that validate their enhancements using this utility comply with this specification.</p>
+<p><strong>About Reasoning:</strong></p>
+<p>Apache Stanbol assumes the users will have no reasoning support. Because of that EnhancementEngines are required to materialize information that would be otherwise only available by reasoning (e.g. it is required that they add both "fise:TextAnnotation" and "fise:Enhancement" as "rdf:type"s when writing a TextAnnotation).</p>
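+<p>The materialization rule can be sketched as follows. This is only an illustration: it assumes triples are modelled as plain Python tuples instead of a real RDF library, and the helper name <code>materialize_types</code> and the example URN are hypothetical.</p>

```python
# Namespaces taken from the "Used Namespaces" list above.
FISE = "http://fise.iks-project.eu/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def materialize_types(enhancement_uri, specific_type):
    """Return rdf:type triples for the specific type AND fise:Enhancement.

    Hypothetical helper: a triple is a (subject, predicate, object) tuple.
    """
    types = [FISE + specific_type, FISE + "Enhancement"]
    return [(enhancement_uri, RDF_TYPE, type_uri) for type_uri in types]


# The enhancement URN below is made up for this example.
triples = materialize_types("urn:enhancement:example-1", "TextAnnotation")
for triple in triples:
    print(triple)
```

+<p>A consumer without reasoning support can then find every enhancement by querying for "fise:Enhancement" alone.</p>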
+<h2 id="core-concepts">Core Concepts</h2>
+<p>The main concept of the Stanbol Enhancement Structure is the "fise:Enhancement". It is used as base concept for all annotation types and defines the generic properties every enhancement MUST provide (e.g. creator, creation date, extracted-from, confidence). On top of the "fise:Enhancement" three specific annotation types are defined:</p>
+<ul>
+<li>TextAnnotation: To describe features with their occurrence within the parsed Text</li>
+<li>EntityAnnotation: To suggest (linked) Entities for features detected within the content</li>
+<li>TopicAnnotation: To classify (link) the parsed content along topics</li>
+</ul>
+<h3 id="fiseenhancement">fise:Enhancement</h3>
+<p>Every feature extracted by an <a href="engines">EnhancementEngine</a> that is expressed using the Stanbol Enhancement Structure needs to be represented as an RDF resource with the "rdf:type" "fise:Enhancement".</p>
+<p>Enhancements use <a href="http://dublincore.org/documents/dcmi-terms/">Dublin Core terms</a> to provide metadata about their creation:</p>
+<ul>
+<li><strong>dc:creator</strong> <em>(required, single)</em>: The <a href="engines">EnhancementEngine</a> that created the Enhancement. Currently the fully qualified name of the Java class implementing the engine is used as a String value. In a future version this will change to the relative URL of the EnhancementEngine (e.g. "/enhancer/engine/{engine-name}")</li>
+<li><strong>dc:created</strong> <em>(required, single)</em>: The UTC date/time when the enhancement was created by the EnhancementEngine.</li>
+<li><strong>dc:contributor</strong> <em>(optional, multiple)</em>: Additional <a href="engines">EnhancementEngine</a>s that contributed to the Enhancement.</li>
+<li><strong>dc:modified</strong> <em>(optional, single)</em>: The date/time of the last change to a given enhancement.</li>
+</ul>
+<p>The following properties provide information about the enhancement:</p>
+<ul>
+<li><strong>fise:extracted-from</strong> <em>(required, single)</em>: The URI of the "enhancer:ContentItem" the feature was extracted from. EnhancementEngines need to use the UriRef returned by ContentItem#getUri() as value.</li>
+<li><strong>fise:confidence</strong> <em>(optional, single, range: 0 &lt;= confidence &lt;= 1)</em>: The confidence of the enhancement as a floating point number. NOTE that while this uses a floating point number as value, users should not treat values as being on a ratio scale - meaning that an enhancement with a confidence of 0.4 is NOT half as good as one with 0.8!</li>
+<li><strong>dc:relation</strong> <em>(optional, multiple)</em>: Specifies that the current fise:Enhancement has a relation to another fise:Enhancement. Values need to be resources of the "rdf:type" "fise:Enhancement".</li>
+<li><strong>dc:requires</strong> <em>(optional, multiple)</em>: Specifies that the current fise:Enhancement depends on another fise:Enhancement. This is a stronger version of "dc:relation" and indicates that if one of the required enhancements is declined/removed this also affects this one. Values need to be resources of the "rdf:type" "fise:Enhancement". NOTE also that Dublin Core terms defines dc:requires as a sub-property of dc:relation.</li>
+</ul>
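+<p>The cardinality and range rules listed above can be checked with a few lines of code. The following is a minimal sketch, not the EnhancementStructureHelper itself: it assumes an enhancement is given as a plain dictionary mapping prefixed property names to lists of values, and the function name, engine class name and URNs are hypothetical.</p>

```python
# Properties that are documented above as "required, single".
REQUIRED_SINGLE = ("dc:creator", "dc:created", "fise:extracted-from")


def check_enhancement(props):
    """Return a list of rule violations for {property name: [values]}."""
    problems = []
    for name in REQUIRED_SINGLE:
        if len(props.get(name, [])) != 1:
            problems.append(name + " must have exactly one value")
    confidence = props.get("fise:confidence", [])
    if len(confidence) > 1:
        problems.append("fise:confidence must have at most one value")
    elif confidence and not 0.0 <= confidence[0] <= 1.0:
        problems.append("fise:confidence must be within [0, 1]")
    return problems


# Made-up enhancement with an out-of-range confidence value.
example = {
    "dc:creator": ["org.example.HypotheticalEngine"],
    "dc:created": ["2012-07-16T13:02:45Z"],
    "fise:extracted-from": ["urn:content-item:example"],
    "fise:confidence": [1.4],
}
print(check_enhancement(example))
```
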
+<h3 id="fisetextannotation">fise:TextAnnotation</h3>
+<p>TextAnnotations are used to select portions of the parsed textual content by using the following properties:</p>
+<ul>
+<li><strong>fise:start</strong> <em>(optional, single)</em>: The start character position within the plain text version of the parsed content. Note that the plain text version can be retrieved by using the <a href="enhancerrest.html#multi-part-contentitem-support">multi-part content item support</a> of the Stanbol Enhancer RESTful API.</li>
+<li><strong>fise:end</strong> <em>(required if fise:start is present, single)</em>: The end character position. This MUST only be present if "fise:start" is also defined.</li>
+<li><strong>fise:selected-text</strong> <em>(optional, single)</em>: The text selected by the TextAnnotation. This MUST be the same as the text from index "fise:start" to "fise:end" within the plain text version of the parsed content.</li>
+<li><strong>fise:selection-context</strong> <em>(required if fise:selected-text is present, single)</em>: The selection context such as the current sentence or a fixed number of characters/words before and after the selected text. This MUST be present if "fise:selected-text" is defined.</li>
+<li><strong>dc:type</strong> <em>(optional, single)</em>: The nature of the selected part of the text (e.g. dbpedia-ont:Person, dbpedia-ont:Organisation, dbpedia-ont:Place for Named Entities; dc:LinguisticSystem for language annotations; skos:Concept for abstract things incl. categorizations). Note that dc:type values are just recommendations. Users are free to use types other than the recommended ones. As an example the <a href="engines/keywordlinkingengine.html">KeywordLinkingEngine</a> allows users to configure dc:type mappings.</li>
+</ul>
+<p>As hinted by the description of the above properties, their usage depends on the size of the selected part of the text.</p>
+<ul>
+<li>selection of the whole Document: This is the default and MUST BE assumed if none of the start/end/selected-text/selection-context properties is present</li>
+<li>selection of a part (e.g. chapter, sentence): The preferred way is to define start/end positions. selected-text and selection-context are inefficient for bigger sections as they would duplicate those sections of the content within the RDF graph as literals.</li>
+<li>Selection of words, word-phrases: In this case it is highly recommended to define start/end as well as selected-text/selection-context. Especially the selected-text and selection-context are important to calculate the exact position of an enhancement in non-plain-text content (e.g. HTML fragments).</li>
+</ul>
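+<p>The dependencies between these properties can be illustrated with a short consistency check. This is a sketch under stated assumptions: "fise:end" is treated as an exclusive index (which matches the positions 59 to 69 for "Bob Marley" in the example text of this document), and the function and parameter names are hypothetical.</p>

```python
def check_selection(plain_text, start=None, end=None,
                    selected_text=None, selection_context=None):
    """Return a list of rule violations for the selection properties."""
    problems = []
    if (start is None) != (end is None):
        problems.append("fise:start and fise:end must be used together")
    if selected_text is not None and selection_context is None:
        problems.append("fise:selected-text requires fise:selection-context")
    if start is not None and end is not None and selected_text is not None:
        # Assumption: the end index is exclusive, like Python slicing.
        if plain_text[start:end] != selected_text:
            problems.append("fise:selected-text does not match start/end")
    return problems


# Example text taken from the overview section of this document.
text = "Apache Stanbol can detect famous entities such as Paris or Bob Marley!"
print(check_selection(text, start=59, end=69,
                      selected_text="Bob Marley", selection_context=text))
```

+<p>If none of the four properties is given the function reports no problem, which corresponds to the selection of the whole document.</p>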
+<p>The following figure shows a fise:TextAnnotation used to mark the occurrence of the Named Entity "Bob Marley" from character 59 to 69 in the given Content.</p>
+<p><img alt="'fise:TextAnnotation'" src="es_textannotation.png" title="This figure shows a TextAnnotation describing the occurrence of &quot;Bob Marley&quot; located from character 59 to 69 in the given text" /></p>
+<p>NOTE: In a future version TextAnnotations might switch to a model that uses</p>
+<ul>
+<li>fise:selection-prefix: some words/characters before the selected section.</li>
+<li>fise:selection-head: the first few words/characters of the selected section within the text. Alternative to fise:selected-text in case bigger sections of the parsed content need to be selected.</li>
+<li>fise:selection-tail: the last few words/characters of a selected section. To be used together with fise:selection-head.</li>
+<li>fise:selection-suffix: some words/characters after the selected section.</li>
+</ul>
+<h3 id="fiseentityannotation">fise:EntityAnnotation</h3>
+<p>EntityAnnotations are used to suggest/link entities recognized within the Text. While fise:TextAnnotations are used for representing the recognition(s) (occurrence(s) within the content) the EntityAnnotation provides information about the referenced Entity.</p>
+<ul>
+<li><strong>fise:entity-reference</strong> <em>(required, single)</em>: The URI of the referenced entity. In case several URIs are defined as equal (e.g. by "owl:sameAs") EnhancementEngines need to choose one of the URIs and include the corresponding "owl:sameAs" statements in the enhancement results</li>
+<li><strong>fise:entity-label</strong> <em>(required, single)</em>: The label of the linked entity. While entities may define multiple labels (e.g. for different languages, alternate/preferred …) EnhancementEngines are required to only include a single - the best fitting - label.</li>
+<li><strong>fise:entity-type</strong> <em>(optional, multiple)</em>: The types of the linked entity. Usually this is the list of rdf:types. However there might be situations where other Resources are used as types. </li>
+<li><strong>dc:relation</strong> <em>(required, multiple)</em>: The dc:relation property is required for entity annotations. Typically the values are the "fise:TextAnnotation"s this EntityAnnotation is a suggestion for.</li>
+<li><strong>entityhub:site</strong> <em>(optional, single)</em>: The name of the Entityhub ReferencedSite managing the suggested Entity. If this property is present users can dereference the suggested Entity with a GET request to "{stanbol}/entityhub/site/{site-name}/entity?id={entity}" where {site-name} is the value of this property and {entity} is the value of the "fise:entity-reference" property. 
+    NOTE: the values "local" and "entityhub" need to be treated separately. In those cases the GET request needs to use "{stanbol}/entityhub/entity?id={entity}".</li>
+</ul>
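+<p>The dereferencing rule for "entityhub:site" can be sketched as a small URL builder. The function name, the base URL "http://localhost:8080" and the site name "dbpedia" are assumptions made for this example, and percent-encoding the entity URI in the query parameter is a choice of this sketch, not something stated above.</p>

```python
from urllib.parse import quote


def entity_url(stanbol_base, site, entity_uri):
    """Build the GET URL used to dereference a suggested Entity."""
    base = stanbol_base.rstrip("/")
    # Assumption of this sketch: the entity URI is percent-encoded.
    entity = quote(entity_uri, safe="")
    if site in ("local", "entityhub"):
        # Entities managed by the Entityhub itself use a different path.
        return base + "/entityhub/entity?id=" + entity
    return base + "/entityhub/site/" + site + "/entity?id=" + entity


# Hypothetical base URL and site name.
print(entity_url("http://localhost:8080", "dbpedia",
                 "http://dbpedia.org/resource/Bob_Marley"))
print(entity_url("http://localhost:8080", "local",
                 "http://dbpedia.org/resource/Bob_Marley"))
```
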
+<p>The following figure shows a fise:EntityAnnotation for the Entity <a href="http://dbpedia.org/resource/Bob_Marley">'dbpedia:Bob_Marley'</a>.</p>
+<p><img alt="'fise:EntityAnnotation' example" src="es_entityannotation.png" title="This example shows an EntityAnnotation that suggests the Entity 'dbpedia:Bob_Marley' for the TextAnnotation" /></p>
+<h3 id="fisetopicannotation">fise:TopicAnnotation</h3>
+<p>TopicAnnotations are used to categorize/classify the parsed content along some categorization system. This is done by suggesting/linking Topics of that categorization system for the parsed content (or possibly parts of it). A "fise:TextAnnotation" is used to select the part of the content where the linked topics apply.</p>
+<ul>
+<li><strong>fise:entity-reference</strong> <em>(required, single)</em>: The URI of the topic.</li>
+<li><strong>fise:entity-label</strong> <em>(required, single)</em>: The human readable label of the topic. While topics may define multiple labels (e.g. for different languages) EnhancementEngines are required to only include a single - the best fitting - label.</li>
+<li><strong>fise:entity-type</strong> <em>(optional, multiple)</em>: It is best practice to use <a href="http://www.w3.org/2004/02/skos/">SKOS</a> for modeling hierarchical classification systems. If this recommendation is followed then the value of fise:entity-type will be "skos:Concept". However, users are free to also use different types with "fise:TopicAnnotation"s. </li>
+<li><strong>dc:relation</strong> <em>(required, multiple)</em>: The dc:relation property is required for topic annotations. It refers to the fise:TextAnnotation specifying the part of the text this topic is applied to.</li>
+<li><strong>entityhub:site</strong> <em>(optional, single)</em>: The name of the Entityhub ReferencedSite managing the suggested Entity. If this property is present users can dereference the suggested Entity with a GET request to "{stanbol}/entityhub/site/{site-name}/entity?id={entity}" where {site-name} is the value of this property and {entity} is the value of the "fise:entity-reference" property. 
+    NOTE: the values "local" and "entityhub" need to be treated separately. In those cases the GET request needs to use "{stanbol}/entityhub/entity?id={entity}".</li>
+</ul>
+<p>The following figure shows a fise:TopicAnnotation suggesting the skos:Concept "Boxing" from the <a href="http://cv.iptc.org/newscodes/subjectcode/">IPTC Subject Codes</a>. The figure also shows that the Boxing category has Sport as a broader one.</p>
+<p><img alt="'fise:TopicAnnotation' example" src="es_topicannotation.png" title="This example shows a TopicAnnotation that suggests the Category 'iptc-subjectcode:15014000'" /></p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementstructure.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementstructure.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementstructureoverview.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancementstructureoverview.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancer-overview.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancer-overview.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhanceroverview-s.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhanceroverview-s.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhanceroverview.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhanceroverview.png
------------------------------------------------------------------------------
    svn:mime-type = image/png