Posted to commits@stanbol.apache.org by bu...@apache.org on 2012/04/11 10:30:50 UTC

svn commit: r812318 [6/10] - in /websites/staging/stanbol/trunk/content: ./ stanbol/docs/0.9.0-incubating/ stanbol/docs/0.9.0-incubating/cmsadapter/ stanbol/docs/0.9.0-incubating/contenthub/ stanbol/docs/0.9.0-incubating/enhancer/ stanbol/docs/0.9.0-in...

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/executionmetadata.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/executionmetadata.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/executionmetadata.html Wed Apr 11 08:30:47 2012
@@ -0,0 +1,325 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Execution Metadata</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Execution Metadata</h1>
+    <p>The execution metadata hold detailed information about an ongoing or completed enhancement process. They describe how the <a href="chains/executionplan.html">ExecutionPlan</a> provided by the <a href="chains">Chain</a> was executed by the <a href="enhancementjobmanager.html">EnhancementJobManager</a>. Both the ExecutionMetadata and the ExecutionPlan are provided with the ContentItem as a separate content part of the type MGraph with the URI "http://stanbol.apache.org/ontology/enhancer/executionmetadata#ChainExecution". For users of the Stanbol Enhancer, the Execution Metadata are of interest to:</p>
+<ul>
+<li>Check the progress of asynchronously started enhancement processes: metadata for all planned engine executions are created as soon as a ContentItem is passed to the EnhancementJobManager, and are updated whenever the execution of an engine starts, completes or fails.</li>
+<li>Monitor the performance of different EnhancementEngines: the Execution Metadata provide the start and completion times of each engine execution.</li>
+<li>Inspect the enhancement process: check whether optional EnhancementEngines were successfully executed, skipped or failed; validate the configured enhancement chain by checking the actual execution order of the EnhancementEngines.</li>
+</ul>
+<h2 id="execution-metadata-ontology">Execution Metadata Ontology</h2>
+<p>The RDFS schema used for the execution metadata is defined as follows:</p>
+<p><img alt="Execution Metadata" src="executionmetadata.png" title="Overview of the Execution Metadata Ontology" /></p>
+<ul>
+<li>Namespace: em : http://stanbol.apache.org/ontology/enhancer/executionmetadata#</li>
+<li><strong>em:Execution</strong> : Super class for all Executions<ul>
+<li><strong>em:executionPart</strong> (domain: em:Execution; range: em:ChainExecution): Defines that this execution was part of the execution of a chain</li>
+<li><strong>em:status</strong> (domain: em:Execution; range: em:ExecutionStatus): The status of an execution (used for both em:EngineExecution and em:ChainExecution)</li>
+<li><strong>em:started</strong> (domain: em:Execution; range: xsd:dateTime): Marks the start of the execution</li>
+<li><strong>em:completed</strong> (domain: em:Execution; range: xsd:dateTime): Marks the completion of the execution</li>
+<li><strong>em:statusMessage</strong> (domain: em:Execution; range: xsd:string): A natural language description providing further information about the status of this execution. Typically used to pass on error messages if the execution fails (em:status is set to em:StatusFailed).</li>
+</ul>
+</li>
+<li><strong>em:ChainExecution</strong> : Class used to describe the execution of an enhancement chain.<ul>
+<li><strong>em:defaultChain</strong> (domain: em:ChainExecution; range: xsd:boolean): If the executed chain is currently the default Chain of the Stanbol Enhancer.</li>
+<li><strong>em:executionPlan</strong> (domain: em:ChainExecution; range: ep:ExecutionPlan): Links to the execution plan as provided by the chain.</li>
+<li><strong>em:enhances</strong>(domain: em:ChainExecution; range: rdf:Resource) : links the em:ChainExecution with the URI of the processed content item. The range needs to be updated as soon as the Stanbol Enhancement Structure is defined.</li>
+<li><strong>em:enhancedBy</strong> (domain: rdf:Resource; range: em:ChainExecution) : links the URI of the content item with the metadata about the enhancement process. The range needs to be updated as soon as the Stanbol Enhancement Structure is defined.</li>
+</ul>
+</li>
+<li><strong>em:EngineExecution</strong> : Class used to describe the execution of an EnhancementEngine.<ul>
+<li><strong>em:executionNode</strong> (domain: em:EngineExecution; range: ep:ExecutionNode): The node within the ExecutionPlan</li>
+</ul>
+</li>
+<li><strong>em:ExecutionStatus</strong> : Class describing the status of an EngineExecution<ul>
+<li><strong>em:StatusScheduled</strong> : ExecutionStatus instance describing that an execution is scheduled but has not yet started</li>
+<li><strong>em:StatusInProgress</strong> : ExecutionStatus instance describing that the execution of the linked EngineExecution is in progress</li>
+<li><strong>em:StatusCompleted</strong> : ExecutionStatus instance describing that the execution has already completed successfully</li>
+<li><strong>em:StatusFailed</strong> : ExecutionStatus indicating that the execution has failed. Typically an em:statusMessage describing the reason for the failed execution is provided for em:Executions with this state.</li>
+<li><strong>em:StatusSkipped</strong> : ExecutionStatus indicating that the execution of an ep:ExecutionNode was skipped. This is only allowed for execution nodes that are marked as optional. Typically also an em:statusMessage with the reason should be provided.</li>
+</ul>
+</li>
+</ul>
+<h3 id="example">Example</h3>
+<p>The following example uses the same properties as used within the <a href="chains/executionplan.html">ExecutionPlan</a> section. To make it easier to see the relations between the execution metadata and the execution plan, the triples of the execution plan are included at the end of this example.</p>
+<p>This example describes the following situation:</p>
+<ul>
+<li>the execution of the content item with the URI 'urn:contentItem1' with the default chain</li>
+<li>the default chain is represented by a chain with the name "demoChain"; its ExecutionPlan has the URI 'urn:execPlan'</li>
+<li>the successful execution of the 'langid' engine (execution: 'urn:exec1', node: 'urn:node1')</li>
+<li>the failed execution of the 'ner' engine (execution: 'urn:exec2', node: 'urn:node2'): As reason for the failure a message is provided that the NER model for the language 'de' is not available</li>
+<li>the successful execution of the 'zemanta' engine (execution: 'urn:exec3', node: 'urn:node5'): This engine was started in parallel to the 'ner' engine - therefore before the chain failed.</li>
+<li>There is no execution of the dbpediaLinking (node: '') and geonamesLinking (node: '') engines because the chain failed before these engines were scheduled. This assumes that the EnhancementJobManager only adds em:EngineExecution resources when it starts processing an ep:ExecutionNode defined in the execution plan. However, the EnhancementJobManager can also create em:EngineExecution resources for all execution nodes up front. In that case there would also be em:EngineExecution resources for the dbpediaLinking and geonamesLinking engines with the em:status set to 'em:StatusScheduled'.</li>
+</ul>
+<p>The RDF graph with the Execution Metadata:</p>
+<div class="codehilite"><pre>urn:exec
+    rdf:type em:ChainExecution
+    em:executionPlan urn:execPlan
+    em:enhances urn:contentItem1
+    em:defaultChain &quot;true&quot;
+    em:started 2012-01-11T12.13.14.156
+    em:completed 2012-01-11T12.13.15.157
+    em:status em:StatusFailed
+    em:statusMessage &quot;Unable to execute EnhancementEngine &#39;ner&#39; \
+        (Message: No NER model for language &#39;de&#39; is available).&quot;
+    em:executionPart urn:exec1, urn:exec2, urn:exec3, urn:exec4, urn:exec5
+
+urn:exec1
+    rdf:type em:EngineExecution
+    em:executionPart urn:exec
+    em:executionNode urn:node1
+    em:status em:StatusCompleted
+    em:started 2012-01-11T12.13.14.160
+    em:completed 2012-01-11T12.13.14.250
+
+urn:exec2
+    rdf:type em:EngineExecution
+    em:executionPart urn:exec
+    em:executionNode urn:node2
+    em:status em:StatusFailed
+    em:statusMessage &quot;No NER model for language &#39;de&#39; is available&quot;
+    em:started 2012-01-11T12.13.14.253
+    em:completed 2012-01-11T12.13.14.289
+
+urn:exec3
+    rdf:type em:EngineExecution
+    em:executionPart urn:exec
+    em:executionNode urn:node5
+    em:status em:StatusCompleted
+    em:started 2012-01-11T12.13.14.253
+    em:completed 2012-01-11T12.13.15.150
+</pre></div>
+
+
+<p>The Execution Plan: (copy from the example provided in the ExecutionPlan section)</p>
+<div class="codehilite"><pre>urn:execPlan
+    rdf:type ep:ExecutionPlan
+    ep:hasExecutionNode urn:node1, urn:node2, urn:node3, urn:node4, urn:node5
+    ep:chain &quot;demoChain&quot;
+
+urn:node1
+    rdf:type ep:ExecutionNode
+    ep:inExecutionPlan urn:execPlan
+    ep:engine langId
+
+urn:node2
+    rdf:type ep:ExecutionNode
+    ep:inExecutionPlan urn:execPlan
+    ep:dependsOn urn:node1
+    ep:engine ner
+
+urn:node3
+    rdf:type ep:ExecutionNode
+    ep:inExecutionPlan urn:execPlan
+    ep:dependsOn urn:node1
+    ep:engine dbpediaLinking
+
+urn:node4
+    rdf:type ep:ExecutionNode
+    ep:inExecutionPlan urn:execPlan
+    ep:dependsOn urn:node1
+    ep:engine geonamesLinking
+
+urn:node5
+    rdf:type ep:ExecutionNode
+    ep:inExecutionPlan urn:execPlan
+    ep:engine zemanta
+    ep:optional &quot;true&quot;^^xsd:boolean
+</pre></div>
+
+
+<h2 id="creationmanagement-of-execution-metadata">Creation/Management of Execution Metadata</h2>
+<p>This section is primarily intended for implementors of an EnhancementJobManager. However, it may also be useful for users who need to monitor the state of enhancement processes, as it describes which information is added to the Execution Metadata and when.</p>
+<p>When the <a href="enhancementjobmanager.html">EnhancementJobManager</a> starts the enhancement of a ContentItem, it needs to check whether the <a href="contentitem.html">ContentItem</a> already contains ExecutionMetadata in the content part with the URI "http://stanbol.apache.org/ontology/enhancer/executionmetadata#ChainExecution". If so, it needs to initialize itself based on the pre-existing information. If no ExecutionMetadata are present, a new enhancement process needs to be created based on the passed Chain. The differences between these two cases are explained in the following two subsections.</p>
+<h3 id="initialization">Initialization</h3>
+<p>If no ExecutionMetadata are present within a passed ContentItem, a new enhancement process needs to be set up. This includes the following steps:</p>
+<ol>
+<li>Get the <a href="chains/executionplan.html">ExecutionPlan</a> for the passed enhancement <a href="chains">Chain</a>. If no chain is passed, the default chain needs to be acquired by using the <a href="chains/chainmanager.html">ChainManager</a>.</li>
+<li>Create the content part for the ExecutionMetadata in the <a href="contentitem.html">ContentItem</a> and add the information of the <a href="chains/executionplan.html">ExecutionPlan</a> to it.</li>
+<li>Create the initial ExecutionMetadata. This includes the 'em:ChainExecution' instance for the 'ep:ExecutionPlan' as well as 'em:EngineExecution' instances for all 'ep:ExecutionNode's defined by the execution plan. All such 'em:Execution' instances MUST be created with the 'em:ExecutionStatus' 'em:StatusScheduled'.</li>
+</ol>
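+<p>The effect of step 3 can be illustrated with a minimal, self-contained Java sketch. It does not use the Stanbol API: the class 'InitialMetadataSketch', its 'Status' enum and the 'CHAIN_KEY' constant are hypothetical stand-ins, and a plain map takes the place of the MGraph content part. It only shows that the chain execution and every execution node start out as scheduled.</p>

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; names are illustrative, not the Stanbol API.
final class InitialMetadataSketch {
    // Stand-ins for the em:ExecutionStatus instances.
    enum Status { SCHEDULED, IN_PROGRESS, COMPLETED, FAILED, SKIPPED }

    // Key under which the status of the chain execution itself is stored.
    static final String CHAIN_KEY = "chain";

    // Creates one entry for the chain execution and one per execution node,
    // all of them with the status SCHEDULED.
    static Map<String, Status> initialize(List<String> executionNodes) {
        Map<String, Status> metadata = new LinkedHashMap<>();
        metadata.put(CHAIN_KEY, Status.SCHEDULED);
        for (String node : executionNodes) {
            metadata.put(node, Status.SCHEDULED);
        }
        return metadata;
    }

    public static void main(String[] args) {
        System.out.println(initialize(Arrays.asList("urn:node1", "urn:node2", "urn:node5")));
    }
}
```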
+<p>The ExecutionMetadataHelper utility of the "org.apache.stanbol.enhancer.servicesapi" module contains utility methods for initializing execution metadata.</p>
+<h3 id="continuation">Continuation</h3>
+<p>If the passed ContentItem already contains ExecutionMetadata in the content part with the URI "http://stanbol.apache.org/ontology/enhancer/executionmetadata#ChainExecution", the EnhancementJobManager MUST take the following steps to continue an enhancement process:</p>
+<ol>
+<li>Check if the contained ExecutionMetadata are valid:<ul>
+<li>an 'em:ChainExecution' node is present that 'em:enhances' the passed ContentItem</li>
+<li>the ExecutionPlan is included and the value of the 'ep:chain' property of the 'ep:ExecutionPlan' resource corresponds to the name of the Chain passed in the request.</li>
+</ul>
+</li>
+<li>Check the status of all 'em:Execution' instances<ul>
+<li>reset the status of 'em:Execution's that are in-progress to scheduled.</li>
+<li>TODO: here we could also retry the execution of failed 'em:Execution's</li>
+</ul>
+</li>
+</ol>
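+<p>The status reset in step 2 can be sketched in plain Java as follows. This is an illustration under assumptions, not the Stanbol implementation: 'ContinuationSketch', its 'Status' enum and 'resetInProgress' are hypothetical names, and a map from execution URI to status stands in for the RDF graph. Executions that were in progress when the previous run stopped are set back to scheduled; all other states are kept.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model; names are illustrative, not the Stanbol API.
final class ContinuationSketch {
    // Stand-ins for the em:ExecutionStatus instances.
    enum Status { SCHEDULED, IN_PROGRESS, COMPLETED, FAILED, SKIPPED }

    // Returns a copy of the given executions in which every IN_PROGRESS
    // execution is reset to SCHEDULED; other states are left untouched.
    static Map<String, Status> resetInProgress(Map<String, Status> executions) {
        Map<String, Status> result = new LinkedHashMap<>(executions);
        result.replaceAll((execution, status) ->
                status == Status.IN_PROGRESS ? Status.SCHEDULED : status);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Status> executions = new LinkedHashMap<>();
        executions.put("urn:exec1", Status.COMPLETED);
        executions.put("urn:exec2", Status.IN_PROGRESS);
        System.out.println(resetInProgress(executions));
    }
}
```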
+<p>Note that on a continuation the ExecutionPlan MUST NOT be updated. The EnhancementJobManager also MUST NOT check whether a Chain with the name stored in the ExecutionMetadata is still present. Note also that configuration changes of EnhancementEngines will affect the continuation of the enhancement process.</p>
+<p>The ExecutionMetadataHelper utility of the "org.apache.stanbol.enhancer.servicesapi" module contains utility methods for reading and validating pre-existing execution metadata.</p>
+<h3 id="execution-state-management">Execution State Management</h3>
+<p>The EnhancementJobManager needs to update the metadata as follows when one of these events occurs:</p>
+<ul>
+<li>Enhancement process starts<ul>
+<li>set the 'em:status' of the 'em:ChainExecution' to 'em:StatusInProgress'</li>
+<li>set the 'em:started' to the current date time</li>
+</ul>
+</li>
+<li>EnhancementEngine execution starts:<ul>
+<li>set the 'em:status' of the 'em:EngineExecution' to 'em:StatusInProgress'</li>
+<li>set the 'em:started' to the current date time</li>
+</ul>
+</li>
+<li>EnhancementEngine completes<ul>
+<li>set the 'em:status' of the 'em:EngineExecution' to 'em:StatusCompleted'</li>
+<li>set the 'em:completed' to the current date time</li>
+</ul>
+</li>
+<li>Optional EnhancementEngine not available<ul>
+<li>set the 'em:status' of the 'em:EngineExecution' to 'em:StatusSkipped'</li>
+<li>set both 'em:started' and 'em:completed' to the current date time</li>
+</ul>
+</li>
+<li>Optional EnhancementEngine failed<ul>
+<li>set the 'em:status' of the 'em:EngineExecution' to 'em:StatusFailed'</li>
+<li>set the 'em:completed' to the current date time</li>
+</ul>
+</li>
+<li>Required EnhancementEngine failed or not available<ul>
+<li>set the 'em:status' of the 'em:EngineExecution' to 'em:StatusFailed'</li>
+<li>set the 'em:status' of the 'em:ChainExecution' to 'em:StatusFailed'</li>
+<li>set the 'em:completed' of both the engine and the chain execution to the current date time</li>
+</ul>
+</li>
+<li>Enhancement process completes<ul>
+<li>set the 'em:status' of the 'em:ChainExecution' to 'em:StatusCompleted'</li>
+<li>set the 'em:completed' to the current date time</li>
+</ul>
+</li>
+<li>Internal error in the EnhancementJobManager implementation<ul>
+<li>set the 'em:status' of the 'em:ChainExecution' to 'em:StatusFailed'</li>
+<li>do not set any 'em:EngineExecution' to failed.</li>
+<li>set the 'em:completed' value of the 'em:ChainExecution' to the current date time</li>
+</ul>
+</li>
+</ul>
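+<p>The transitions listed above can be summarized in a small, self-contained Java sketch. It is an illustration only and does not use the ExecutionMetadataHelper: the class 'StateManagementSketch', its 'Execution' and 'Status' types and all method names are hypothetical, and plain fields stand in for the 'em:status', 'em:started' and 'em:completed' properties. The main point is that a failing required engine also fails the chain execution, while a failing optional engine does not.</p>

```java
import java.time.Instant;

// Hypothetical in-memory model; names are illustrative, not the Stanbol API.
final class StateManagementSketch {
    // Stand-ins for the em:ExecutionStatus instances.
    enum Status { SCHEDULED, IN_PROGRESS, COMPLETED, FAILED, SKIPPED }

    // Stand-in for an em:Execution resource (chain or engine execution).
    static final class Execution {
        Status status = Status.SCHEDULED;
        Instant started;
        Instant completed;
    }

    // The enhancement process or an engine execution starts.
    static void start(Execution e, Instant now) {
        e.status = Status.IN_PROGRESS;
        e.started = now;
    }

    // The enhancement process or an engine execution completes successfully.
    static void complete(Execution e, Instant now) {
        e.status = Status.COMPLETED;
        e.completed = now;
    }

    // Optional engine not available: started and completed get the same time.
    static void skip(Execution engine, Instant now) {
        engine.status = Status.SKIPPED;
        engine.started = now;
        engine.completed = now;
    }

    // Engine failed: a required engine also fails the chain execution.
    static void fail(Execution engine, Execution chain, boolean optional, Instant now) {
        engine.status = Status.FAILED;
        engine.completed = now;
        if (!optional) {
            chain.status = Status.FAILED;
            chain.completed = now;
        }
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Execution chain = new Execution();
        Execution ner = new Execution();
        start(chain, now);
        start(ner, now);
        fail(ner, chain, false, now); // required engine fails
        System.out.println("engine=" + ner.status + ", chain=" + chain.status);
    }
}
```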
+<p>The ExecutionMetadataHelper utility of the "org.apache.stanbol.enhancer.servicesapi" module contains utility methods to perform state transitions on 'em:Execution' instances.</p>
+<h2 id="using-executionmetadata">Using ExecutionMetadata</h2>
+<p>This section provides some examples on how to access and retrieve information from the ExecutionMetadata.</p>
+<h3 id="accessing-executionmetadata">Accessing ExecutionMetadata</h3>
+<p>The ExecutionMetadata and the <a href="chains/executionplan.html">ExecutionPlan</a> are stored with the <a href="contentitem.html">ContentItem</a> in a content part with the URI "http://stanbol.apache.org/ontology/enhancer/executionmetadata#ChainExecution". The following code segment can be used to retrieve the RDF graph with the ExecutionMetadata:</p>
+<div class="codehilite"><pre><span class="n">ContentItem</span> <span class="n">ci</span><span class="o">;</span> <span class="c1">//the ContentItem</span>
+<span class="c1">//the URI is available as constant of the ExecutionMetadata class</span>
+<span class="n">UriRef</span> <span class="n">contentPartURI</span> <span class="o">=</span> <span class="n">ExecutionMetadata</span><span class="o">.</span><span class="na">CHAIN_EXECUTION</span><span class="o">;</span>
+
+<span class="n">MGraph</span> <span class="n">executionMetadata</span> <span class="o">=</span> <span class="n">ci</span><span class="o">.</span><span class="na">getPart</span><span class="o">(</span><span class="n">contentPartURI</span><span class="o">,</span><span class="n">MGraph</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
+</pre></div>
+
+
+<p>The ExecutionMetadata are stored as a readable and writable RDF graph. To pass a read-only version to other components, one can use the "getGraph()" method defined by MGraph.</p>
+<h3 id="getting-details-about-the-emchainexecution">Getting details about the em:ChainExecution</h3>
+<p>The following code segments show how to access information about the execution of the enhancement process for a <a href="contentitem.html">ContentItem</a>. All directly accessed methods in the examples below are static imports from one of the following utility classes, which are part of the "org.apache.stanbol.enhancer.servicesapi" module.</p>
+<ul>
+<li>ExecutionPlanHelper: Utility class that provides methods for reading and creating <a href="chains/executionplan.html">ExecutionPlans</a>.</li>
+<li>ExecutionMetadataHelper: Utility class for reading and manipulating the ExecutionMetadata</li>
+<li>EnhancementEngineHelper: Utility that contains general purpose RDF utilities.</li>
+</ul>
+<p>This code example first gets the ChainExecution, ExecutionPlan and Chain name for the enhanced content item. In a second step metadata of all executed EnhancementEngines are retrieved.</p>
+<div class="codehilite"><pre><span class="n">ContentItem</span> <span class="n">ci</span><span class="o">;</span> <span class="c1">//the ContentItem</span>
+<span class="n">MGraph</span> <span class="n">em</span><span class="o">;</span> <span class="c1">//the ExecutionMetadata</span>
+
+<span class="c1">//get the ChainExecution, ExecutionPlan and the name of the Chain</span>
+<span class="n">NonLiteral</span> <span class="n">ce</span> <span class="o">=</span> <span class="n">getChainExecution</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ci</span><span class="o">.</span><span class="na">getUri</span><span class="o">());</span>
+<span class="k">if</span><span class="o">(</span><span class="n">ce</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">){</span>
+    <span class="n">NonLiteral</span> <span class="n">ep</span> <span class="o">=</span> <span class="n">getExecutionPlan</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ce</span><span class="o">);</span>
+    <span class="n">String</span> <span class="n">chainName</span> <span class="o">=</span> <span class="n">getString</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ep</span><span class="o">,</span><span class="n">ExecutionPlan</span><span class="o">.</span><span class="na">CHAIN</span><span class="o">);</span>
+<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
+    <span class="n">log</span><span class="o">.</span><span class="na">warn</span><span class="o">(</span><span class="s">&quot;ExecutionMetadata do not contain information for &quot;</span>
+        <span class="o">+</span> <span class="s">&quot;ContentItem {}!&quot;</span><span class="o">,</span><span class="n">ci</span><span class="o">.</span><span class="na">getUri</span><span class="o">());</span>
+<span class="o">}</span>
+
+<span class="c1">//get the EngineExecutions and the name of the Engines</span>
+<span class="n">Set</span><span class="o">&lt;</span><span class="n">NonLiteral</span><span class="o">&gt;</span> <span class="n">executions</span> <span class="o">=</span> <span class="n">getExecutions</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ce</span><span class="o">);</span>
+<span class="k">for</span><span class="o">(</span><span class="n">NonLiteral</span> <span class="n">ex</span> <span class="o">:</span> <span class="n">executions</span><span class="o">){</span>
+    <span class="n">NonLiteral</span> <span class="n">en</span> <span class="o">=</span> <span class="n">getExecutionNode</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ex</span><span class="o">);</span>
+    <span class="k">if</span><span class="o">(</span><span class="n">en</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">){</span>
+        <span class="n">String</span> <span class="n">engineName</span> <span class="o">=</span> <span class="n">getEngine</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">en</span><span class="o">);</span>
+        <span class="kt">boolean</span> <span class="n">optional</span> <span class="o">=</span> <span class="n">isOptional</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">en</span><span class="o">);</span>
+    <span class="o">}</span> <span class="k">else</span> <span class="o">{</span> <span class="c1">//maybe a sub-chain execution</span>
+        <span class="c1">//currently not supported, but might be</span>
+        <span class="c1">//added in future versions</span>
+    <span class="o">}</span>
+    <span class="n">UriRef</span> <span class="n">status</span> <span class="o">=</span> <span class="n">getStatus</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ex</span><span class="o">);</span>
+    <span class="n">Date</span> <span class="n">started</span> <span class="o">=</span> <span class="n">getStarted</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ex</span><span class="o">);</span>
+    <span class="n">Date</span> <span class="n">completed</span> <span class="o">=</span> <span class="n">getCompleted</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ex</span><span class="o">);</span>
+<span class="o">}</span>
+</pre></div>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/executionmetadata.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/executionmetadata.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/index.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/index.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/index.html Wed Apr 11 08:30:47 2012
@@ -0,0 +1,176 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Stanbol Enhancer</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Stanbol Enhancer</h1>
+    <p>The Apache Stanbol Enhancer provides both a RESTful and a Java API that allow a caller to extract features from the content passed to it. In more detail, the passed content is processed by <a href="engines">Enhancement Engines</a> as defined by the called <a href="chains">Enhancement Chain</a>.</p>
+<h2 id="using-the-stanbol-enhancer">Using the Stanbol Enhancer</h2>
+<p>The figure below provides an overview of the RESTful as well as the Java API provided by the Stanbol Enhancer</p>
+<p><img alt="Stanbol Enhancer Overview" src="enhanceroverview-s.png" title="Overview of RESTful Services and Java API provided by the Stanbol Enhancer" /></p>
+<h3 id="restful-service">RESTful service</h3>
+<p>The content to be analyzed should be sent in a POST request with the mime-type specified in the Content-type header. The response will hold the RDF enhancement serialized in the format specified in the Accept header:</p>
+<div class="codehilite"><pre>curl -X POST -H <span class="s2">&quot;Accept: text/turtle&quot;</span> -H <span class="s2">&quot;Content-type: text/plain&quot;</span> <span class="se">\</span>
+    --data <span class="s2">&quot;The Stanbol enhancer can detect famous cities such as \</span>
+<span class="s2">            Paris and people such as Bob Marley.&quot;</span> <span class="se">\</span>
+    http://localhost:8080/enhancer
+</pre></div>
+
+
+<p>The RESTful interface also provides parameters that can be used to pass or request additional information. The following example shows a request that is answered with the text/plain version of the passed HTML content.</p>
+<div class="codehilite"><pre>curl -v -X POST -H <span class="s2">&quot;Accept: text/plain&quot;</span> <span class="se">\</span>
+    -H <span class="s2">&quot;Content-type: text/html; charset=UTF-8&quot;</span> <span class="se">\</span>
+    --data <span class="s2">&quot;&lt;html&gt;&lt;body&gt;&lt;p&gt;The Stanbol enhancer can detect famous cities \</span>
+<span class="s2">            such as Paris and people such as Bob Marley.&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;&quot;</span> <span class="se">\</span>
+    <span class="s2">&quot;http://localhost:8080/enhancer/chain/language?omitMetadata=true&quot;</span>
+</pre></div>
+
+
+<p>For detailed information please see the documentation of the <a href="enhancerrest.html">Stanbol Enhancer RESTful Services</a>. A short version is also provided under the REST API link of the Stanbol Web UI (e.g. <a href="http://localhost:8080/enhancer">http://localhost:8080/enhancer</a> assuming that Apache Stanbol runs on localhost:8080).</p>
+<h3 id="java-api">Java API</h3>
+<p>Using the Java API requires the following OSGi services:</p>
+<div class="codehilite"><pre><span class="nd">@Reference</span>
+<span class="n">EnhancementJobManager</span> <span class="n">jobManager</span><span class="o">;</span>
+<span class="nd">@Reference</span>
+<span class="n">ChainManager</span> <span class="n">chainManager</span><span class="o">;</span>
+</pre></div>
+
+
+<p>This code snippet shows how to enhance an HTML document:</p>
+<div class="codehilite"><pre><span class="n">InputStream</span> <span class="n">content</span><span class="o">;</span> <span class="c1">//the content (assuming an HTML document)</span>
+<span class="n">String</span> <span class="n">chainName</span><span class="o">;</span> <span class="c1">//the name of the chain or null to use the default</span>
+<span class="n">ContentItem</span> <span class="n">contentItem</span> <span class="o">=</span> <span class="k">new</span> <span class="n">InMemoryContentItem</span><span class="o">(</span>
+    <span class="n">IOUtils</span><span class="o">.</span><span class="na">toByteArray</span><span class="o">(</span><span class="n">content</span><span class="o">),</span> <span class="s">&quot;text/html; charset=UTF-8&quot;</span><span class="o">);</span>
+<span class="c1">//get the EnhancementChain</span>
+<span class="n">Chain</span> <span class="n">enhancementChain</span><span class="o">;</span>
+<span class="k">if</span><span class="o">(</span><span class="n">chainName</span> <span class="o">==</span> <span class="kc">null</span><span class="o">){</span>
+    <span class="n">enhancementChain</span> <span class="o">=</span> <span class="n">chainManager</span><span class="o">.</span><span class="na">getDefault</span><span class="o">();</span>
+<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
+    <span class="n">enhancementChain</span> <span class="o">=</span> <span class="n">chainManager</span><span class="o">.</span><span class="na">getChain</span><span class="o">(</span><span class="n">chainName</span><span class="o">);</span>
+<span class="o">}</span>
+<span class="k">try</span> <span class="o">{</span> <span class="c1">//enhance the content</span>
+    <span class="n">jobManager</span><span class="o">.</span><span class="na">enhanceContent</span><span class="o">(</span><span class="n">contentItem</span><span class="o">,</span> <span class="n">enhancementChain</span><span class="o">);</span>
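+<span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="n">EnhancementException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
+    <span class="c1">//handle or log the failure here; an empty catch block hides enhancement errors</span>
+<span class="o">}</span>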
+
+<span class="c1">//Get the enhancement Results</span>
+<span class="n">MGraph</span> <span class="n">enhancements</span> <span class="o">=</span> <span class="n">contentItem</span><span class="o">.</span><span class="na">getMetadata</span><span class="o">();</span>
+</pre></div>
+
+
+<p>After the enhancement process, ContentItems contain not only the metadata but also other information, such as converted versions of the submitted content. The following code snippet shows how to retrieve the text version of the submitted HTML content, as created by the <a href="engines/metaxaengine.html">Metaxa Engine</a>.</p>
+<div class="codehilite"><pre><span class="n">Entry</span><span class="o">&lt;</span><span class="n">UriRef</span><span class="o">,</span><span class="n">Blob</span><span class="o">&gt;</span> <span class="n">textContentPart</span> <span class="o">=</span> 
+        <span class="n">ContentItemHelper</span><span class="o">.</span><span class="na">getBlob</span><span class="o">(</span><span class="n">contentItem</span><span class="o">,</span> 
+            <span class="n">Collections</span><span class="o">.</span><span class="na">singleton</span><span class="o">(</span><span class="s">&quot;text/plain&quot;</span><span class="o">));</span>
+<span class="n">Blob</span> <span class="n">testBlob</span> <span class="o">=</span> <span class="n">textContentPart</span><span class="o">.</span><span class="na">getValue</span><span class="o">();</span>
+<span class="n">String</span> <span class="n">charset</span> <span class="o">=</span> <span class="n">testBlob</span><span class="o">.</span><span class="na">getParameter</span><span class="o">().</span><span class="na">get</span><span class="o">(</span><span class="s">&quot;charset&quot;</span><span class="o">);</span>
+<span class="n">String</span> <span class="n">plainText</span> <span class="o">=</span> <span class="n">IOUtils</span><span class="o">.</span><span class="na">toString</span><span class="o">(</span>
+    <span class="n">textContentPart</span><span class="o">.</span><span class="na">getValue</span><span class="o">().</span><span class="na">getStream</span><span class="o">(),</span>
+    <span class="n">charset</span> <span class="o">==</span> <span class="kc">null</span> <span class="o">?</span> <span class="s">&quot;UTF-8&quot;</span> <span class="o">:</span> <span class="n">charset</span><span class="o">);</span>
+</pre></div>
+
+
+<h2 id="list-of-available-enhancement-engines">List of Available Enhancement Engines</h2>
+<p>Apache Stanbol comes with a <a href="engines/list.html">list of enhancement engine implementations</a>. These engines are supported by the Apache Stanbol community. If you would like to implement your own enhancement engine, continue reading this documentation.</p>
+<h2 id="main-interfaces-and-utilities">Main Interfaces and Utilities</h2>
+<ul>
+<li><strong>ContentItem</strong>: A <a href="contentitem.html">content item</a> is the unit of content the Stanbol Enhancer can deal with. It gives access to the binary content that was registered, and the graph that represents its metadata (provided by client and/or generated). </li>
+<li><strong>EnhancementEngine</strong>: The <a href="engines">enhancement engine</a> provides the interface to internal or external semantic enhancement engines. Typically content items will be processed by several enhancement engines.</li>
+<li><strong>EnhancementChain</strong>: An <a href="chains">enhancement chain</a> represents a user-provided configuration which describes how <a href="contentitem.html">content items</a> passed to this chain should be processed by the Stanbol Enhancer. The chain defines a list of <a href="engines/list.html">available enhancement engines</a> and their order of execution.</li>
+<li><strong>EnhancementJobManager</strong>: The <a href="enhancementjobmanager.html">enhancement job manager</a> performs the execution of the enhancement process as described in the <a href="chains/executionplan.html">execution plan</a> provided by the <a href="chains">enhancement chain</a>. The enhancement job manager is also responsible for recording the <a href="executionmetadata.html">execution metadata</a>.</li>
+<li><strong>ChainManager</strong>: The <a href="chains/chainmanager.html">chain manager</a> allows looking up all configured enhancement chains. It also provides a getter for the default chain.</li>
+<li><strong>EnhancementEngineManager</strong>: The <a href="engines/enhancementenginemanager.html">enhancement engine manager</a> allows looking up active enhancement engines by their name.</li>
+</ul>
+<p><em>Note that the "org.apache.stanbol.enhancer.servicesapi" module also provides a set of "*Helper" utility classes (e.g. ContentItemHelper, EnhancementEngineHelper …). Users are strongly encouraged to use these helpers when working with the corresponding classes of the Stanbol Enhancer.</em></p>
+<h2 id="enhancement-structure">Enhancement Structure</h2>
+<p>The enhancement structure for Apache Stanbol is described in full <a href="http://wiki.iks-project.eu/index.php/EnhancementStructure">here</a>. It defines the types and properties used for the resulting metadata graph of Apache Stanbol.</p>
+<p><em>Note: The currently used Enhancement Structure was defined before the incubation at Apache. There is a proposal and an ongoing discussion to update this structure in the future; however, the decision was to keep the current structure until a first release.</em></p>
+<p>Each enhancement type description contains the following important properties:</p>
+<ul>
+<li>creator: the specific enhancement engine creating this enhancement</li>
+<li>creation time: the local system time, when the annotation was created</li>
+<li>extracted-from: the content item for the enhancement. This links to the ID of the content item as assigned by Apache Stanbol.</li>
+<li>type: the type of the enhancement (e.g. Location, Person, Concept ...).</li>
+<li>confidence: The level of confidence in the range from 0 to 1 </li>
+</ul>
+<p>A text annotation type provides metadata for the selected text. This is intended to be used in addition to the enhancement type if an enhancement is based on a part of the content.</p>
+<ul>
+<li>start: the character position of the start of the selection. If start is not defined, the selection is assumed to start at the beginning of the document.</li>
+<li>end: the character position of the end of the selection. If end is not defined, the selection is assumed to end at the end of the document.</li>
+<li>selected-text: The text selected by the enhancement. (optional).</li>
+<li>selection-context: The context of the selected text. This adds the possibility to specify the context used to extract entities such as persons, organizations, locations ... from natural language documents.</li>
+</ul>
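+<p>The defaults for the start and end properties described above can be illustrated as follows. This is a minimal sketch with hypothetical names; it assumes that end is an exclusive character offset, which the description above does not state.</p>

```java
// Hypothetical helper, not part of Apache Stanbol.
class TextSelectionSketch {

    // start and end are nullable, mirroring the optional properties:
    // a missing start means "beginning of the document",
    // a missing end means "end of the document".
    // Assumption: end is exclusive, as in String.substring.
    static String select(String content, Integer start, Integer end) {
        int from = (start == null) ? 0 : start;
        int to = (end == null) ? content.length() : end;
        if (from < 0 || to > content.length() || from > to) {
            throw new IllegalArgumentException(
                    "selection out of range: " + from + ".." + to);
        }
        return content.substring(from, to);
    }
}
```

+<p>With neither start nor end defined, the selection covers the whole document, which matches an enhancement that is about the content item as a whole.</p>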
+<p>The entity annotation type refers to named entities which have been recognized within the content. This type is intended to be used together with the FISE enhancement type.</p>
+<ul>
+<li>entity-reference: This refers to the URI identifying the Entity</li>
+<li>entity-label: The label(s) of the referred entity</li>
+<li>entity-type: This property can be used to specify the type of the entity (optional) </li>
+<li>The occurrences of the entity within the content (the exact positions within the text where this entity is referred) are determined by outgoing dc:relation links.</li>
+</ul>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/ses_annotationontology.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/ses_annotationontology.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/ses_annotationontology.html Wed Apr 11 08:30:47 2012
@@ -0,0 +1,191 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - The Stanbol Enhancement Structure (PROPOSAL)</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">The Stanbol Enhancement Structure (PROPOSAL)</h1>
+    <p>Please NOTE: This is a proposal for the future version of the Enhancement Structure used by the Stanbol Enhancer. This <strong>DOES NOT</strong> describe the Enhancement Structure used by the current version of the Stanbol Enhancer!</p>
+<h2 id="background">Background</h2>
+<p>This proposal aims to define the "Stanbol Enhancement Structure" intended to be used by future versions of the Stanbol Enhancer to encode knowledge extracted from analyzed documents.</p>
+<p>Currently the Stanbol Enhancer still uses the <a href="http://wiki.iks-project.eu/index.php/EnhancementStructure">FISE Enhancement Structure</a>, which dates back to before the incubation of Stanbol at Apache. This proposal suggests basing the "Stanbol Enhancement Structure" on the existing <a href="http://code.google.com/p/annotation-ontology/wiki/Homepage">Annotation-Ontology</a>.</p>
+<p>The following two sections provide a short overview of the currently used FISE Enhancement Structure and of the Annotation-Ontology, as this information is critical to understanding the suggestions made in the later parts of this document.</p>
+<h3 id="fise-enhancement-structure">FISE Enhancement Structure</h3>
+<p>The FISE Enhancement Structure defines three main Concepts:</p>
+<ol>
+<li><strong>FISE Enhancement</strong>: Defines Metadata about the creation process, type of the Enhancement as well as relations to other Enhancements.</li>
+<li><strong>FISE Text Annotation</strong>: Defines a selection within enhanced plain text. Annotations about other content types are not defined.</li>
+<li><strong>FISE Entity Annotation</strong>: Defines an annotation about an Entity.</li>
+</ol>
+<p>Each Annotation created by an Enhancement Engine MUST have the FISE Enhancement type as well as one of FISE Text Annotation or FISE Entity Annotation.</p>
+<p>The typical use is as follows:</p>
+<ul>
+<li>A Text Annotation is used to define the annotated part of the document. Text Annotations use the dc:type property to define the type of the extracted entity (e.g. as provided by Named Entity Recognition). </li>
+<li>An Entity Annotation is used to suggest Entities for a Text Annotation. </li>
+<li>Properties of the Enhancement are used to link the Text Annotation with the suggested Entity Annotations.</li>
+<li>Enhancement Engines may also add knowledge about suggested entities (dereferencing of entities).</li>
+</ul>
+<p>Annotations like Keywords, Categories ... were discussed but never formally defined for the FISE Enhancement Structure.</p>
+<h3 id="annotation-ontology">Annotation-Ontology</h3>
+<p>This Proposal describes how Stanbol can use the <a href="http://code.google.com/p/annotation-ontology/wiki/Homepage">Annotation-Ontology</a> for encoding Enhancements. </p>
+<p>From the Annotation-Ontology homepage:</p>
+<blockquote>
+<p>Annotation Ontology (AO) is a vocabulary designed to extensively reuse existing domain ontologies (entities annotations or semantic tags) and to provide several other kind of annotations - comments, textual annotation (classic tags), notes, examples, erratum... - on potentially any kind of document (text, images, audio...) and document fragments.</p>
+</blockquote>
+<p>The following figure gives an overview of the Annotation-Ontology. It shows a simple, tagging-like annotation of a whole document.</p>
+<blockquote>
+<p><img alt="Example of annotation on a whole document with AO" src="http://annotation-ontology.googlecode.com/svn/trunk/images/Document%20Annotation%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png" title="Example of annotation on a whole document with AO" /></p>
+<p>Image Credit: Annotation-Ontology <a href="http://annotation-ontology.googlecode.com/svn/trunk/images/Document%20Annotation%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png">Link</a></p>
+</blockquote>
+<h2 id="stanbol-enhancement-structure">Stanbol Enhancement Structure</h2>
+<p>The following sections describe how the Stanbol Enhancement Structure can utilize the Annotation-Ontology to encode knowledge extracted from analyzed Content Items.</p>
+<h3 id="contentitems">ContentItems</h3>
+<p>Within the FISE Enhancement Structure the enhanced ContentItems were only referenced by the <strong>fise:extracted-from</strong> property. There was no specification on how to further define properties of the ContentItem. The Annotation-Ontology defines a much richer vocabulary for that.</p>
+<p>First and most importantly, the Annotation-Ontology distinguishes between the:</p>
+<ul>
+<li><strong>Annotated Document</strong>: This is the Document that is annotated</li>
+<li><strong>Source Document</strong>: This is the Document version that was used for the annotation process.</li>
+</ul>
+<blockquote>
+<p><img alt="Source Documents" src="http://annotation-ontology.googlecode.com/svn/trunk/images/Source%20Document%202%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png" title="Document Annotations" /></p>
+<p>Image Credit: Annotation Ontology <a href="http://annotation-ontology.googlecode.com/svn/trunk/images/Source%20Document%202%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png">Link</a></p>
+</blockquote>
+<p>As an example: If a Web-Crawler crawls a site on the Web and stores a local copy for indexing, then the <strong>Annotated Document</strong> would use the URL of the document on the Web. The <strong>Source Document</strong> would be the ID of the locally cached version used for the enhancement process.</p>
+<h4 id="content-adapter-and-source-documents">Content Adapter and Source Documents:</h4>
+<p>The Content Adapter pattern was suggested to be used to convert parsed documents to different Content Formats such as extracting the Plain Text of parsed HTML or PDF documents.</p>
+<p>The possibility to distinguish between the <em>Annotated Document</em> and the <em>Source Document</em> nicely supports this: while Enhancement Engines can state that an Annotation is about the <em>Annotated Document</em>, they can still state the exact <em>Source Document</em> that was used for processing. This makes it possible, for example, to state clearly that the indexes of a text selection are based on the plain text version of the <em>Annotated Document</em>. </p>
+<h3 id="content-selectors">Content Selectors</h3>
+<p>The FISE Enhancement Structure defined a single "Content Selector", the <em>FISE Text Annotation</em>. The Annotation-Ontology uses a much richer structure that even allows extensions for defining specific selections on different content types.</p>
+<p>With the Annotation-Ontology each Selector can link to both the <em>Annotated Document</em> and the <em>Source Document</em>. The following figure shows an example of an image selection:</p>
+<blockquote>
+<p><img alt="Image Selector" src="http://annotation-ontology.googlecode.com/svn/trunk/images/Image%20InitEndCorner%20Selector%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png" title="Image Selector Example" /></p>
+<p>Image Credits: Annotation-Ontology <a href="http://annotation-ontology.googlecode.com/svn/trunk/images/Image%20InitEndCorner%20Selector%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png">Link</a>.</p>
+</blockquote>
+<h4 id="text-selectors">Text Selectors</h4>
+<p>The currently used FISE TextAnnotation differs from the text selectors of the Annotation-Ontology mainly in that it defines both the actual annotation AND the selection within the text. Therefore, when adopting the "Annotation -&gt; Selector" model of the Annotation-Ontology, all annotation-related properties of the FISE TextAnnotation must be separated from the properties describing the selection.</p>
+<p>The Annotation-Ontology defines two text selectors: (1) the "OffsetRangeSelector", which uses character offsets within the text to define a selection, and (2) the "PrefixPostfixSelector", which uses a prefix, a suffix and the selected text to define the selection based on its context. The Stanbol Enhancer currently uses both (context and offset) to define a selection. However, it currently uses only a single "context" property instead of the prefix, suffix model of the "PrefixPostfixSelector". In general, the prefix, suffix based context definition used by the Annotation-Ontology is better, because it allows the selected part of the text to be determined uniquely even if the selected text appears multiple times within a given context. With the currently used model this is not possible if the selected text appears several times in the provided context. </p>
+<p>The suggestion is to keep both the offset based and the context based definition of a text selection, but to switch to the prefix, suffix model for defining the context. Therefore stanbol:TextSelector will be defined as a sub-class of both "OffsetRangeSelector" and "PrefixPostfixSelector".</p>
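+<p>The advantage of the prefix, suffix model can be illustrated with a short sketch. The class and method names below are hypothetical, and the lookup is a simplification: it returns the first position where prefix, selected text and suffix occur in sequence.</p>

```java
// Hypothetical helper, not part of Apache Stanbol or the Annotation-Ontology.
class PrefixSuffixSketch {

    // Returns the character offset of the selected text that is framed by
    // the given prefix and suffix, or -1 if no such occurrence exists.
    static int locate(String text, String prefix, String exact, String suffix) {
        int at = text.indexOf(prefix + exact + suffix);
        return at < 0 ? -1 : at + prefix.length();
    }

    public static void main(String[] args) {
        String text = "Paris Hilton visited Paris last week.";
        // A single context string containing "Paris" twice cannot tell the
        // two occurrences apart; prefix and suffix select the second one.
        int offset = locate(text, "visited ", "Paris", " last");
        System.out.println(offset == text.lastIndexOf("Paris"));
    }
}
```

+<p>Combined with the character offsets of an "OffsetRangeSelector", such a selection can also be cross-checked: if the text found at the stored offsets does not match the selected text, the prefix and suffix can be used to locate it again.</p>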
+<h4 id="multi-media-selectors-and-the-media-fragments-standard">Multi Media Selectors and the Media Fragments Standard</h4>
+<p>The <a href="http://www.w3.org/2008/WebVideo/Fragments/">Media Fragments Working Group</a> of the W3C is currently working on a Recommendation on how to encode Fragments of Resources within so called <a href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/">Media Fragments URIs</a>.</p>
+<p>This specification defines how to encode the <a href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-time">Temporal</a>, <a href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-space">Spatial</a>, <a href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-track">Track</a> and <a href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-id">ID</a> dimensions within Document URIs but also defines processing rules (e.g. for Browsers) and the semantics.</p>
+<p>The proposal here is to use this specification for encoding selections within multimedia files within the Annotation-Ontology. This will most likely require the definition of a MediaFragmentSelector as an extension.</p>
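+<p>As an illustration, the temporal and spatial dimensions of the Media Fragments specification are encoded as <code>#t=start,end</code> (in seconds) and <code>#xywh=x,y,width,height</code> (in pixels). The following sketch builds such URIs; the class name is hypothetical, and it assumes that the media URI does not already carry a fragment.</p>

```java
import java.net.URI;

// Hypothetical helper; a MediaFragmentSelector is not defined yet.
class MediaFragmentSketch {

    // Temporal fragment: the interval from startSec to endSec, in seconds.
    static URI temporal(URI media, int startSec, int endSec) {
        return URI.create(media + "#t=" + startSec + "," + endSec);
    }

    // Spatial fragment: a rectangle given by its top-left corner (x, y)
    // and its width and height, in pixels.
    static URI spatial(URI media, int x, int y, int width, int height) {
        return URI.create(media + "#xywh=" + x + "," + y + "," + width + "," + height);
    }
}
```

+<p>A MediaFragmentSelector could then reference the resulting URI, for example <code>http://example.org/video.ogv#t=10,20</code> for the seconds 10 to 20 of a video.</p>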
+<h3 id="annotations">Annotations</h3>
+<p>The FISE Enhancement Structure uses properties of both FISE Enhancements and FISE TextAnnotation/EntityAnnotation to describe Annotations as defined by the Annotation-Ontology. On the other hand, some properties of the FISE TextAnnotation are part of the Selectors within the Annotation-Ontology. Because of that, the switch to the Annotation-Ontology will not only mean a change in the vocabulary used, but will also bring some structural changes. </p>
+<p>Annotations as defined by the Annotation-Ontology are structured as follows:</p>
+<ul>
+<li>An Annotation is represented by a Resource (called Annotation-Resource in the remaining document) with the rdf:type ao:Annotation. Special types of Annotations can be introduced by subclasses of ao:Annotation.</li>
+<li>The Annotation-Resource may be linked to a Selector with the <strong>ao:context</strong> property. If no such link is present, the Annotation-Resource is about the whole Document. It is also possible to link multiple Selectors with an annotation.</li>
+<li>Each Annotation-Resource MUST BE linked to the <em>Annotated Document</em> by using the <strong>ao:annotatesResource</strong> property. The <em>Source Document</em> can be referenced by using the <strong>ao:onSourceDocument</strong>. It is also possible to link multiple Documents with an annotation.</li>
+</ul>
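+<p>The structure described in the list above can be sketched as a small data class. This is only an illustration with hypothetical names: it models a single Annotated Document and a single Source Document, and it prints prefixed names without declaring the <code>ao:</code> prefix.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model class, not part of Apache Stanbol.
class AnnotationSketch {
    final String id;                 // URI of the Annotation-Resource
    final String annotatedDocument;  // ao:annotatesResource (required)
    final String sourceDocument;     // ao:onSourceDocument (optional, may be null)
    final List<String> selectors = new ArrayList<>(); // ao:context (optional)

    AnnotationSketch(String id, String annotatedDocument, String sourceDocument) {
        if (annotatedDocument == null) {
            throw new IllegalArgumentException("ao:annotatesResource is required");
        }
        this.id = id;
        this.annotatedDocument = annotatedDocument;
        this.sourceDocument = sourceDocument;
    }

    // Without any Selector the annotation is about the whole document.
    boolean aboutWholeDocument() {
        return selectors.isEmpty();
    }

    // Writes the annotation as one Turtle-like statement (prefixes omitted).
    String toTurtle() {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(id).append("> a ao:Annotation ;\n");
        sb.append("    ao:annotatesResource <").append(annotatedDocument).append('>');
        if (sourceDocument != null) {
            sb.append(" ;\n    ao:onSourceDocument <").append(sourceDocument).append('>');
        }
        for (String selector : selectors) {
            sb.append(" ;\n    ao:context <").append(selector).append('>');
        }
        return sb.append(" .\n").toString();
    }
}
```

+<p>In the Web-Crawler example, the Annotated Document would be the URL of the page on the Web and the Source Document the ID of the locally cached copy.</p>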
+<p>The following sub-sections provide an overview of how Text Annotations, Entity Annotations and Category Annotations as used by Stanbol can be expressed using the Annotation-Ontology.</p>
+<h4 id="text-annotations">Text Annotations</h4>
+<p>Text Annotations are Annotations as typically created by NER (Named Entity Recognition) engines. Such Annotations select a part of a Text and assign a type (Person, Organization, Place ...) to that.</p>
+<p>The text selection can be expressed by using a "PrefixPostfixSelector". The type and the confidence of the detected named entity need to be properties of the Annotation class.</p>
+<div class="codehilite"><pre><span class="err">stanbol:TextAnnotation</span> <span class="err">rdfs:subClassOf</span> <span class="err">ao:Annotation</span>
+<span class="err">stanbol:TextAnnotation</span> <span class="err">stanbol:named-entity-type</span> <span class="err">{schema:Person,</span> <span class="err">schema:Organization,</span> <span class="err">schema:Place,</span> <span class="err">…}</span>
+</pre></div>
+
+
+<h4 id="entity-annotations">Entity Annotations</h4>
+<p>Entity Annotations are similar to "Qualifier" annotations as defined by the Annotation-Ontology. The <em>ao:hasTopic</em> relation is used to link the annotation with the related topic.</p>
+<div class="codehilite"><pre><span class="err">stanbol:EntityAnnotation</span> <span class="err">rdfs:subClassOf</span> <span class="err">aot:Qualifier,</span> <span class="err">ao:Annotation</span>
+</pre></div>
+
+
+<h4 id="category-anotations">Category Annotations</h4>
+<p>Category Annotations are typically about the whole Document or a specific section of it. Normal Selectors can be used for defining the categorized section. If no Selector is present, the categorization applies to the whole document. The "Qualifier" annotation could also be used as a base class for categorizations.</p>
+<h3 id="annotation-sets">Annotation Sets</h3>
+<p>Within the Annotation-Ontology, Annotation Sets can be used to group several Annotations together. Although the FISE Enhancement Structure does not explicitly define a similar mechanism, the Stanbol Enhancer uses relations between FISE Enhancements for a similar purpose. Therefore the suggestion is to use this feature of the Annotation-Ontology to express sets of possible Categories and suggestions of Entities.</p>
+<p>The following figure shows an example of an Annotation Set with a single Annotation:</p>
+<blockquote>
+<p><img alt="Annotation sets" src="http://annotation-ontology.googlecode.com/svn/trunk/images/Annotation%20Set%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png" title="A simple Annotation Set with a single Annotation" /></p>
+<p>Image Credits: Annotation-Ontology <a href="http://annotation-ontology.googlecode.com/svn/trunk/images/Annotation%20Set%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png">Link</a></p>
+</blockquote>
+<p>This suggests the use of Annotation Sets to formally describe situations where the Stanbol Enhancer needs to group several Annotations in order to let users select from a predefined set of options. Assigning a unique ID - the URI of the AnnotationSet instance - to such a collection of Annotations also allows the consumer to provide explicit feedback to the Stanbol Enhancer (e.g. by accepting/rejecting Annotations that are part of the AnnotationSet, adding an additional Annotation to a set, ...).</p>
+<p>Note that a single Annotation might be part of several Annotation Sets. As an example, take a Text Annotation for which two sets of Entity suggestions are generated.</p>
+<p>The suggestion is to create subclasses for common types of Annotation Sets used by the Stanbol Enhancer.</p>
+<h4 id="entity-suggestions">Entity Suggestions</h4>
+<p>With the FISE Enhancement Structure this is expressed by a <em>fise:TextAnnotation</em> that is linked to several <em>fise:EntityAnnotation</em>s by the <em>dc:relation</em> property.</p>
+<p>Expressing the same based on the Annotation-Ontology would be possible with:</p>
+<ul>
+<li>An Annotation Set that links to the following Annotations (by the <em>ao:item</em> property):</li>
+<li>A TextAnnotation that uses a stanbol:TextSelector to define the position of the selected text within the document</li>
+<li>One EntityAnnotation (extends aot:Qualifier) per suggested Entity.</li>
+<li>In addition the Annotation Set also includes metadata such as the Engine that created the suggestions</li>
+</ul>
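+<p>How a consumer could use such an Annotation Set to give feedback can be sketched as follows. All names are hypothetical; the proposal does not define a feedback API, so the three feedback states are an assumption made for this illustration.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model class, not part of Apache Stanbol.
class EntitySuggestionSetSketch {
    // Assumed feedback states; not defined by the proposal.
    enum Feedback { UNDECIDED, ACCEPTED, REJECTED }

    final String setUri;            // URI of the AnnotationSet instance
    final String textAnnotationUri; // the TextAnnotation the suggestions are for
    final String creator;           // the Engine that created the suggestions
    // One entry per suggested EntityAnnotation, in insertion order.
    private final Map<String, Feedback> suggestions = new LinkedHashMap<>();

    EntitySuggestionSetSketch(String setUri, String textAnnotationUri, String creator) {
        this.setUri = setUri;
        this.textAnnotationUri = textAnnotationUri;
        this.creator = creator;
    }

    // Adds an EntityAnnotation to the set; existing feedback is kept.
    void addSuggestion(String entityAnnotationUri) {
        suggestions.putIfAbsent(entityAnnotationUri, Feedback.UNDECIDED);
    }

    // Records user feedback for an EntityAnnotation that is part of this set.
    void recordFeedback(String entityAnnotationUri, Feedback feedback) {
        if (!suggestions.containsKey(entityAnnotationUri)) {
            throw new IllegalArgumentException(
                    entityAnnotationUri + " is not part of " + setUri);
        }
        suggestions.put(entityAnnotationUri, feedback);
    }

    // Returns the recorded feedback, or null for unknown EntityAnnotations.
    Feedback feedbackFor(String entityAnnotationUri) {
        return suggestions.get(entityAnnotationUri);
    }
}
```

+<p>Because the feedback is addressed to the URI of the set, accepting a suggestion in one set does not affect the same Entity suggested in a different set.</p>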
+<p><strong>OPTIONS</strong></p>
+<ul>
+<li>Allow multiple TextAnnotations: This would allow suggesting the same set of Entities for all TextAnnotations. However, it would also make it more difficult to express that a user accepts a suggestion for one TextAnnotation but rejects the same suggestion for another. In addition, users might even accept different suggestions for different included TextAnnotations. (see also <em>Coreference Suggestions</em>)</li>
+</ul>
+<h4 id="category-suggestions">Category Suggestions</h4>
+<p>Typically categorizations can provide more than a single Category. Grouping such suggestions within an AnnotationSet gives users the possibility to accept/reject one or more of them. In addition, it would make it possible to distinguish sets of categorizations calculated based on disjoint sets of categories (e.g. a categorization based on a UserProfile from a categorization based on general topics or a spatial categorization).</p>
+<h4 id="coreference-suggestion">Coreference Suggestion</h4>
+<p>This would allow linking several Text Annotations to suggest a co-reference between them. This kind of AnnotationSet is expected to be used by NLP (Natural Language Processing) frameworks that can detect co-references. It might also be of interest for Engines that suggest Entities but keep an Annotation Context and therefore want to link persons referred to only by their given or family name to another occurrence that uses both.</p>
+<p>The type of the coreference could be captured by a special property of this annotation set type.</p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/setting-enhancement.owl
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/setting-enhancement.owl
------------------------------------------------------------------------------
    svn:mime-type = application/xml