Posted to commits@stanbol.apache.org by bu...@apache.org on 2012/07/16 15:02:48 UTC

svn commit: r825985 [6/12] - in /websites/staging/stanbol/trunk/content: ./ stanbol/docs/trunk/ stanbol/docs/trunk/cmsadapter/ stanbol/docs/trunk/components/ stanbol/docs/trunk/components/cmsadapter/ stanbol/docs/trunk/components/contenthub/ stanbol/do...

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancerrest.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancerrest.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/enhancerrest.html Mon Jul 16 13:02:45 2012
@@ -0,0 +1,442 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Stanbol Enhancer RESTful Services</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="logo"> <!-- do not scroll the logo -->
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a></div>
+  <div id="navigation"> <!-- but auto scroll the menue -->
+      <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a><ul>
+<li><a href="/stanbol/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/stanbol/docs/trunk/components.html">Components</a></li>
+</ul>
+</li>
+<li><a href="/stanbol/development/">Development</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/stanbol/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/downloads/">Overview</a><ul>
+<li><a href="/stanbol/downloads/releases.html">Releases</a></li>
+<li><a href="/stanbol/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="archive">Archive</h1>
+<ul>
+<li><a href="/stanbol/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrump" style="font-size: 80%;">
+      <a href="/">Home</a>&nbsp;&raquo&nbsp;<a href="/stanbol/">Stanbol</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/">Trunk</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/components/">Components</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/">Enhancer</a>
+    </div>
+    <h1 class="title">Stanbol Enhancer RESTful Services</h1>
+    <p>The RESTful service endpoint provided by the Stanbol Enhancer is a stateless interface that allows the caller to submit content and get the resulting enhancements formatted as RDF in a single request, without storing anything on the server side. More advanced options also allow callers to send pre-existing metadata, to send and request alternate content versions, and to request additional metadata created by the Enhancer or by specific Enhancement Engines.</p>
+<p>The RESTful interface described below is provided on several endpoints:</p>
+<ul>
+<li><strong>'/enhancer':</strong> The main endpoint of the Stanbol Enhancer. Submitted content is enhanced using the default enhancement chain.</li>
+<li><strong>'/enhancer/chain/{chain-name}'</strong>: The Stanbol Enhancer supports the configuration of multiple <a href="chains">Enhancement Chains</a>. Users can look up active chains by sending requests to the '/enhancer/chain' endpoint.</li>
+<li><strong>'/enhancer/engine/{engine-name}'</strong>: This can be used to enhance submitted content with a single <a href="engines">Enhancement Engine</a>. Note that the submitted content MUST be processable by the referenced engine. If the engine is not able to process the submitted content directly, you might need to send existing metadata as explained in the section <a href="#parsing-multiple-contentparts">Parsing multiple ContentParts</a>. This feature is useful, for example, to send an MP3 file directly to the <a href="engines/tikaengine.html">TikaEngine</a> to extract its metadata.</li>
+<li><strong>'/engines':</strong> Same as '/enhancer'; this endpoint ensures backward compatibility with older Stanbol versions.</li>
+</ul>
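+<p>As a minimal sketch of a single-engine request, the following command sends an MP3 file directly to one engine. It assumes a local Stanbol instance on port 8080, an engine registered under the name <code>tika</code>, and a hypothetical file <code>song.mp3</code> in the current directory; adjust all three to your setup.</p>
+<div class="codehilite"><pre># Assumptions: local Stanbol on port 8080, engine name &quot;tika&quot;, file song.mp3
+STANBOL=&quot;http://localhost:8080&quot;
+ENGINE=&quot;tika&quot;
+ENGINE_URL=&quot;${STANBOL}/enhancer/engine/${ENGINE}&quot;
+# --data-binary sends the file unmodified (plain --data would strip newlines)
+curl -X POST -H &quot;Accept: text/turtle&quot; -H &quot;Content-type: audio/mpeg&quot; \
+    --data-binary @song.mp3 \
+    &quot;${ENGINE_URL}&quot;
+</pre></div>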
+<h2 id="basic-enhancement-service">Basic Enhancement Service</h2>
+<p>This section describes how to send content to the Stanbol Enhancer for analysis. Results are sent back in the form of a serialized RDF graph.</p>
+<p>The content to analyze should be sent in a POST request with the mime-type specified in
+the <code>Content-type</code> header. The response will hold the RDF enhancement serialized in the format specified in the <code>Accept</code> header:</p>
+<div class="codehilite"><pre>curl -X POST -H <span class="s2">&quot;Accept: text/turtle&quot;</span> -H <span class="s2">&quot;Content-type: text/plain&quot;</span> <span class="se">\</span>
+    --data <span class="s2">&quot;The Stanbol enhancer can detect famous cities such as Paris \</span>
+<span class="s2">            and people such as Bob Marley.&quot;</span> <span class="se">\</span>
+    http://localhost:8080/enhancer
+</pre></div>
+
+
+<p>The list of mime-types accepted as input depends on the deployed engines. By default most Enhancement Engines can only process plain text content. However, Enhancement Engines like <a href="engines/metaxaengine.html">Metaxa</a> can be used to create 'text/plain' versions of submitted content. This also makes it possible to enhance content with mime-types such as HTML, PDF and MS Office documents (see the Metaxa documentation for details).</p>
+<p>Stanbol Enhancer is able to serialize the response in the following RDF formats:</p>
+<div class="codehilite"><pre>application/json (JSON-LD)
+application/rdf+xml (RDF/XML)
+application/rdf+json (RDF/JSON)
+text/turtle (Turtle)
+text/rdf+nt (N-TRIPLES)
+</pre></div>
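+<p>To pick one of these serializations, set the <code>Accept</code> header accordingly. The following sketch requests JSON-LD; the host, port and sample sentence are assumptions chosen for illustration.</p>
+<div class="codehilite"><pre># Assumption: local Stanbol on port 8080
+ACCEPT=&quot;application/json&quot;   # JSON-LD, as listed above
+curl -X POST -H &quot;Accept: ${ACCEPT}&quot; -H &quot;Content-type: text/plain&quot; \
+    --data &quot;Paris is the capital of France.&quot; \
+    &quot;http://localhost:8080/enhancer&quot;
+</pre></div>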
+
+
+<h3 id="additional-parameters">Additional Parameters</h3>
+<ul>
+<li><strong>uri={content-item-uri}:</strong> By default the URI of the content item being enhanced is a local, non-dereferenceable URI automatically built from a hash digest of the binary content. Sometimes it is helpful to provide the URI of the <a href="contentitem.html">ContentItem</a> to be used in the enhancements RDF graph.</li>
+<li><strong>executionmetadata=true/false:</strong> Allows the inclusion of <a href="executionmetadata.html">execution metadata</a> in the enhancement metadata of the response. This data also includes the <a href="chains/executionplan.html">execution plan</a> used to enhance the submitted content. This information is typically only useful to clients that want to know how the submitted content was processed by the enhancer. NOTE that the execution metadata can also be requested by using the multi-part content item API described below.</li>
+</ul>
+<p>The following example shows how to send an enhancement request with a custom content item URI that will include the execution metadata in the response.
+In addition, this request is directed to an <a href="chains">Enhancement Chain</a> with the name "dbpedia-keyword".</p>
+<div class="codehilite"><pre>curl -X POST -H <span class="s2">&quot;Accept: text/turtle&quot;</span> -H <span class="s2">&quot;Content-type: text/plain&quot;</span> <span class="se">\</span>
+    --data <span class="s2">&quot;The Stanbol enhancer can detect famous cities such as Paris \</span>
+<span class="s2">            and people such as Bob Marley.&quot;</span> <span class="se">\</span>
+    <span class="s2">&quot;http://localhost:8080/enhancer/chain/dbpedia-keyword?uri=urn:fise-example-content-item&amp;executionmetadata=true&quot;</span>
+</pre></div>
+
+
+<h2 id="enhancer-configuration">Enhancer Configuration</h2>
+<p>The Stanbol Enhancer supports several RESTful services to inspect the configuration. These services allow clients to retrieve the currently active <a href="engines">Enhancement Engines</a> and <a href="chains">Enhancement Chains</a>.</p>
+<ul>
+<li><strong>'/enhancer':</strong> GET requests to the main Stanbol Enhancer endpoint that use an 'Accept' header compatible with one of the supported RDF serializations will return the current configuration as RDF.</li>
+<li><strong>'/enhancer/engine':</strong> Same as above, but the response will only include the active enhancement engines.</li>
+<li><strong>'/enhancer/chain':</strong> Returns the currently active enhancement chains.</li>
+<li><strong>'/enhancer/sparql':</strong> SPARQL endpoint that allows to query the configuration.</li>
+</ul>
+<p>Example response, serialized as 'application/rdf+xml', for the default configuration of the Stanbol Enhancer.</p>
+<p>The request</p>
+<div class="codehilite"><pre>curl -v -X GET -H <span class="s2">&quot;Accept: application/rdf+xml&quot;</span> <span class="s2">&quot;http://localhost:8080/enhancer/ep&quot;</span>
+</pre></div>
+
+
+<p>returns the following results</p>
+<div class="codehilite"><pre><span class="nt">&lt;rdf:RDF</span>
+    <span class="na">xmlns:rdf=</span><span class="s">&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;</span>
+    <span class="na">xmlns:j.0=</span><span class="s">&quot;http://stanbol.apache.org/ontology/enhancer/enhancer#&quot;</span>
+    <span class="na">xmlns:rdfs=</span><span class="s">&quot;http://www.w3.org/2000/01/rdf-schema#&quot;</span> <span class="nt">&gt;</span> 
+  <span class="nt">&lt;rdf:Description</span> <span class="na">rdf:about=</span><span class="s">&quot;http://localhost:8080/enhancer/engine/langid&quot;</span><span class="nt">&gt;</span>
+    <span class="nt">&lt;rdfs:label&gt;</span>langid<span class="nt">&lt;/rdfs:label&gt;</span>
+    <span class="nt">&lt;rdf:type</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://stanbol.apache.org/ontology/enhancer/enhancer#EnhancementEngine&quot;</span><span class="nt">/&gt;</span>
+  <span class="nt">&lt;/rdf:Description&gt;</span>
+  <span class="nt">&lt;rdf:Description</span> <span class="na">rdf:about=</span><span class="s">&quot;http://localhost:8080/enhancer&quot;</span><span class="nt">&gt;</span>
+    <span class="nt">&lt;rdf:type</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://stanbol.apache.org/ontology/enhancer/enhancer#Enhancer&quot;</span><span class="nt">/&gt;</span>
+    <span class="nt">&lt;j.0:hasEngine</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://localhost:8080/enhancer/engine/dbpediaLinking&quot;</span><span class="nt">/&gt;</span>
+    <span class="nt">&lt;j.0:hasEngine</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://localhost:8080/enhancer/engine/langid&quot;</span><span class="nt">/&gt;</span>
+    <span class="nt">&lt;j.0:hasEngine</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://localhost:8080/enhancer/engine/entityhubLinking&quot;</span><span class="nt">/&gt;</span>
+    <span class="nt">&lt;j.0:hasEngine</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://localhost:8080/enhancer/engine/tika&quot;</span><span class="nt">/&gt;</span>
+    <span class="nt">&lt;j.0:hasEngine</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://localhost:8080/enhancer/engine/metaxa&quot;</span><span class="nt">/&gt;</span>
+    <span class="nt">&lt;j.0:hasEngine</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://localhost:8080/enhancer/engine/ner&quot;</span><span class="nt">/&gt;</span>
+    <span class="nt">&lt;j.0:hasChain</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://localhost:8080/enhancer/chain/default&quot;</span><span class="nt">/&gt;</span>
+    <span class="nt">&lt;j.0:hasDefaultChain</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://localhost:8080/enhancer/chain/default&quot;</span><span class="nt">/&gt;</span>
+    <span class="nt">&lt;j.0:hasChain</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://localhost:8080/enhancer/chain/language&quot;</span><span class="nt">/&gt;</span>
+  <span class="nt">&lt;/rdf:Description&gt;</span>
+  <span class="nt">&lt;rdf:Description</span> <span class="na">rdf:about=</span><span class="s">&quot;http://localhost:8080/enhancer/chain/language&quot;</span><span class="nt">&gt;</span>
+    <span class="nt">&lt;rdfs:label&gt;</span>language<span class="nt">&lt;/rdfs:label&gt;</span>
+    <span class="nt">&lt;rdf:type</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://stanbol.apache.org/ontology/enhancer/enhancer#EnhancementChain&quot;</span><span class="nt">/&gt;</span>
+  <span class="nt">&lt;/rdf:Description&gt;</span>
+  <span class="nt">&lt;rdf:Description</span> <span class="na">rdf:about=</span><span class="s">&quot;http://localhost:8080/enhancer/engine/ner&quot;</span><span class="nt">&gt;</span>
+    <span class="nt">&lt;rdf:type</span> <span class="na">rdf:resource=</span><span class="s">&quot;http://stanbol.apache.org/ontology/enhancer/enhancer#EnhancementEngine&quot;</span><span class="nt">/&gt;</span>
+    <span class="nt">&lt;rdfs:label&gt;</span>ner<span class="nt">&lt;/rdfs:label&gt;</span>
+  <span class="nt">&lt;/rdf:Description&gt;</span>
+[...]
+<span class="nt">&lt;/rdf:RDF&gt;</span>
+</pre></div>
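+<p>The narrower configuration endpoints can be queried in the same way. The loop below is a sketch that asks for the active chains and then the active engines as Turtle; host and port are assumptions.</p>
+<div class="codehilite"><pre># Assumption: local Stanbol on port 8080
+STANBOL=&quot;http://localhost:8080&quot;
+# Query the chain listing first, then the engine listing
+for path in enhancer/chain enhancer/engine; do
+    curl -X GET -H &quot;Accept: text/turtle&quot; &quot;${STANBOL}/${path}&quot;
+done
+</pre></div>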
+
+
+<h3 id="executionplan-of-enhancement-chains">Execution Plan of Enhancement Chains</h3>
+<p>The <a href="chains/executionplan.html">ExecutionPlan</a> can also be requested by sending a GET request with a supported RDF serialization as the 'Accept' header to</p>
+<ul>
+<li><strong>'/enhancer/ep'</strong></li>
+<li><strong>'/enhancer/chain/{chain-name}/ep'</strong></li>
+<li><strong>'/engines/ep'</strong></li>
+</ul>
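+<p>For instance, the execution plan of a single chain could be fetched as sketched below. The chain name <code>language</code> is taken from the example configuration above; whether it exists on a given installation, as well as the host and port, are assumptions.</p>
+<div class="codehilite"><pre># Assumptions: local Stanbol on port 8080, an active chain named &quot;language&quot;
+CHAIN=&quot;language&quot;
+EP_URL=&quot;http://localhost:8080/enhancer/chain/${CHAIN}/ep&quot;
+curl -X GET -H &quot;Accept: application/rdf+xml&quot; &quot;${EP_URL}&quot;
+</pre></div>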
+<h2 id="multi-part-contentitem-support">Multi-part ContentItem support</h2>
+<p>The multipart <code>ContentItem</code> extensions to the basic RESTful services are provided by the Stanbol Enhancer. They were introduced (by <a href="https://issues.apache.org/jira/browse/STANBOL-481">STANBOL-481</a>) to allow advanced usage scenarios. Users will want to use these extensions if they need to:</p>
+<ul>
+<li>send multiple versions of the content: Most CMSs already support converting content to plain text. This API allows sending both the original AND multiple transcoded versions of the content to the Enhancer.</li>
+<li>send pre-existing metadata: Typically a CMS already holds some metadata about the content sent to the Stanbol Enhancer (e.g. user-provided tags, categories …). The multi-part extensions allow sending such data in addition to the content. </li>
+<li>request transcoded versions of the submitted content: These API extensions allow including transcoded versions (e.g. the 'text/plain' version) of the submitted content in the response. They also allow requests that directly return transcoded content by omitting the extracted metadata.</li>
+<li>request additional metadata that is normally not included in the metadata of the Enhancement response: This can be used to request the <a href="executionmetadata.html">execution metadata</a> in its own RDF graph, but it can also be used to request metadata of specific enhancement engines (TODO: add example)</li>
+</ul>
+<h3 id="queryparameters">QueryParameters</h3>
+<p>The following QueryParameters are defined by the multi-part content item extension:</p>
+<ul>
+<li>
+<p><strong>outputContent=[mediaType]:</strong> Specifies the MIME types of the content included in the response of the Stanbol Enhancer. This parameter supports wildcards (e.g. '*/*' ... all, 'text/*' ... all text versions, 'text/plain' ... only the plain text version). This parameter can be used multiple times.</p>
+<p>Responses to requests with this parameter will be encoded as <code>multipart/form-data</code>. If the "Accept" header of the request is not compatible with <code>multipart/form-data</code>, the request is treated as a <code>400 BAD_REQUEST</code>. For details see the documentation of the <a href="contentitem.html#multipart_mime_serialization">Multipart MIME format for ContentItems</a>.</p>
+</li>
+<li>
+<p><strong>omitParsed=[true/false]:</strong> Only makes sense in combination with the <code>outputContent</code> parameter. It excludes all content that was included in the request from the response. A typical combination is <code>outputContent=*/*&amp;omitParsed=true</code>. The default value of this parameter is <code>false</code>.</p>
+</li>
+<li>
+<p><strong>outputContentPart=[uri/'*']:</strong> This parameter allows to explicitly include content parts with a specific URI in the response. Currently this only supports <a href="contentitem.html#content_parts">ContentParts</a> that are stored as RDF graphs. </p>
+<p>Responses to requests with this parameter will be encoded as <code>multipart/form-data</code>. If the "Accept" header of the request is not compatible with <code>multipart/form-data</code>, the request is treated as a <code>400 BAD_REQUEST</code>. The selected content parts will be included as MIME parts in the returned <a href="contentitem.html#multipart_mime_serialization">Multipart MIME formatted ContentItems</a>. The URI of the part will be used as its name. Such parts will be added after the "metadata" and the "content" (if present).</p>
+</li>
+<li>
+<p><strong>omitMetadata=[true/false]:</strong> This allows to enable/disable the inclusion of the metadata in the response. The default is <code>false</code>.</p>
+<p>Typically <code>omitMetadata=true</code> is used when users want to use the Stanbol Enhancer just to get one or more ContentParts as a response. Note that requests that use an <code>Accept: {mimeType}</code> header AND <code>omitMetadata=true</code> will directly return the content version of <code>{mimeType}</code> and NOT wrap the result as <code>multipart/form-data</code>. See also the example further down in this documentation.</p>
+</li>
+<li>
+<p><strong>rdfFormat=[rdfMimeType]:</strong> This allows for requests that result in <code>multipart/form-data</code> encoded responses to specify the used RDF serialization format. Supported formats and defaults are the same as for normal Enhancer Requests. </p>
+</li>
+</ul>
+<h3 id="parsing-multiple-contentparts">Parsing multiple ContentParts</h3>
+<p>Requests to the Stanbol Enhancer with the <code>Content-Type: multipart/form-data</code> are considered to contain a ContentItem serialized as MultiPart MIME. The exact specification of the <a href="contentitem.html#multipart_mime_serialization">MultiPart MIME format for ContentItems</a> is provided by the documentation of the ContentItem.</p>
+<p>Combining <code>multipart/form-data</code> encoded requests with the QueryParameters described above allows the <a href="contentitem.html#multipart_mime_serialization">MultiPart MIME format for ContentItems</a> to be used for both the request and the response.</p>
+<h2 id="using-the-multi-part-content-item-restful-api-extensions">Using the multi-part content item RESTful API extensions</h2>
+<p>The following examples show typical usage scenarios of the multi-part content item RESTful API. Note that for better readability the values of the query parameters are not URL-encoded.</p>
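+<p>When sending real requests, the query values should be percent-encoded. The sketch below shows an encoded query string for a request that returns all content versions, omits the submitted content and asks for RDF/XML metadata; host, port and sample content are assumptions. Encoding the <code>+</code> of <code>application/rdf+xml</code> matters because an unencoded <code>+</code> in a query string is commonly decoded as a space.</p>
+<div class="codehilite"><pre># Percent-encoding used here: '*' -&gt; %2A, '/' -&gt; %2F, '+' -&gt; %2B
+# Assumption: local Stanbol on port 8080
+QUERY=&quot;outputContent=%2A%2F%2A&amp;omitParsed=true&amp;rdfFormat=application%2Frdf%2Bxml&quot;
+curl -X POST -H &quot;Accept: multipart/form-data&quot; \
+    -H &quot;Content-type: text/plain&quot; \
+    --data &quot;Paris is the capital of France.&quot; \
+    &quot;http://localhost:8080/enhancer?${QUERY}&quot;
+</pre></div>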
+<h3 id="example-1-return-metadata-and-content">Example 1: Return metadata and content</h3>
+<p>The first example shows how users can request both the metadata and transcoded versions of the submitted content.
+This can be achieved relatively easily by using "<code>outputContent=*/*</code>" in combination with "<code>omitParsed=true</code>".</p>
+<div class="codehilite"><pre>curl -v -X POST -H <span class="s2">&quot;Accept: multipart/form-data&quot;</span> <span class="se">\</span>
+    -H <span class="s2">&quot;Content-type: text/html; charset=UTF-8&quot;</span>  <span class="se">\</span>
+    --data <span class="s2">&quot;&lt;html&gt;&lt;body&gt;&lt;p&gt;The Stanbol enhancer can detect famous cities \</span>
+<span class="s2">            such as Paris and people such as Bob Marley.&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;&quot;</span> <span class="se">\</span>
+    <span class="s2">&quot;http://localhost:8080/enhancer?outputContent=*/*&amp;omitParsed=true&amp;rdfFormat=application/rdf+xml&quot;</span>
+</pre></div>
+
+
+<p>This will result in a response with the header <code>"Content-Type: multipart/form-data; charset=UTF-8; boundary=contentItem"</code> that contains the metadata as well as the plain text version of the submitted HTML document as content.</p>
+<div class="codehilite"><pre>--contentItem
+Content-Disposition: form-data; name=&quot;metadata&quot;; filename=&quot;urn:content-item-sha1-76e44d4b51c626bbed38ce88370be88702de9341&quot;
+Content-Type: application/rdf+xml; charset=UTF-8;
+Content-Transfer-Encoding: 8bit
+
+&lt;rdf:RDF
+    xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;
+[..the metadata formatted as RDF+XML..]
+&lt;/rdf:RDF&gt;
+
+--contentItem
+Content-Disposition: form-data; name=&quot;content&quot;
+Content-Type: multipart/alternate; boundary=contentParts; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+--contentParts
+Content-Disposition: form-data; name=&quot;urn:metaxa:plain-text:2daba9dc-21f6-7ea1-70dd-a2b0d5c6cd08&quot;
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.
+--contentParts--
+
+--contentItem--
+</pre></div>
+
+
+<p>See also the formal specification of the <a href="contentitem.html#multipart_mime_serialization">MultiPart MIME format for ContentItems</a>.</p>
+<h3 id="example-2-directly-return-the-plain-text-version-of-parsed-content">Example 2: Directly return the plain text version of parsed content</h3>
+<p>Using '<code>omitMetadata=true</code>' together with an "Accept: {requested-content-type}" header, the multi-part content API allows clients to directly request the transcoded version of the content in the format {requested-content-type}. </p>
+<div class="codehilite"><pre>curl -v -X POST -H &quot;Accept: text/plain&quot; \
+    -H &quot;Content-type: text/html; charset=UTF-8&quot; \
+    --data &quot;<span class="nt">&lt;html&gt;&lt;body&gt;&lt;p&gt;</span>The Stanbol enhancer can detect famous cities \
+            such as Paris and people such as Bob Marley.<span class="nt">&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;</span>&quot; \
+    &quot;http://localhost:8080/enhancer?omitMetadata=true&quot;
+</pre></div>
+
+
+<p>The response will use <code>Content-Type: text/plain</code> and contain the string</p>
+<div class="codehilite"><pre>The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.
+</pre></div>
+
+
+<p>To make this work, the requested <a href="chains">Enhancement Chain</a> needs to include an engine (e.g. <a href="engines/metaxaengine.html">Metaxa</a>) that supports transcoding the submitted content. If no content of the requested type is available, the request will be answered with a "<code>404 NOT FOUND</code>". </p>
+<p>Note also that, because the metadata is omitted from responses to such requests, it is recommended to configure and use a chain that does no further processing on the transcoded content. </p>
+<h3 id="example-3-parse-multiple-content-versions">Example 3: Parse multiple content versions</h3>
+<p>This example uses the "httpmime" module of Apache HttpComponents to create the Multipart MIME content sent to the Stanbol Enhancer.</p>
+<div class="codehilite"><pre><span class="nt">&lt;dependency&gt;</span>
+    <span class="nt">&lt;groupId&gt;</span>org.apache.httpcomponents<span class="nt">&lt;/groupId&gt;</span>
+    <span class="nt">&lt;artifactId&gt;</span>httpmime<span class="nt">&lt;/artifactId&gt;</span>
+    <span class="nt">&lt;version&gt;</span>4.1.2<span class="nt">&lt;/version&gt;</span>
+<span class="nt">&lt;/dependency&gt;</span>
+</pre></div>
+
+
+<p>The created Multipart MIME content MUST follow the specifications as defined by the <a href="contentitem.html#multipart_mime_serialization">MultiPart MIME format for ContentItems</a>.</p>
+<div class="codehilite"><pre><span class="n">InputStream</span> <span class="n">wordIn</span><span class="o">;</span> <span class="c1">//The MS Word version of the Content</span>
+<span class="n">InputStream</span> <span class="n">plainIn</span><span class="o">;</span> <span class="c1">//The plain text version of the Content</span>
+<span class="n">HttpClient</span> <span class="n">httpClient</span><span class="o">;</span> <span class="c1">//The client used to execute the request</span>
+
+<span class="c1">//create the multipart/form-data container for the ContentItem</span>
+<span class="c1">//MultipartEntity also implements HttpEntity</span>
+<span class="n">MultipartEntity</span> <span class="n">contentItem</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MultipartEntity</span><span class="o">(</span><span class="kc">null</span><span class="o">,</span> <span class="kc">null</span> <span class="o">,</span><span class="n">UTF8</span><span class="o">);</span>
+<span class="c1">//The multipart/alternate container for the contents</span>
+<span class="n">HttpMultipart</span> <span class="n">content</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HttpMultipart</span><span class="o">(</span><span class="s">&quot;alternate&quot;</span><span class="o">,</span> <span class="n">UTF8</span> <span class="o">,</span><span class="s">&quot;contentParts&quot;</span><span class="o">);</span>
+
+<span class="c1">//now add the container for the content to the content item container</span>
+<span class="n">contentItem</span><span class="o">.</span><span class="na">addPart</span><span class="o">(</span>
+    <span class="s">&quot;content&quot;</span><span class="o">,</span> <span class="c1">//the name MUST BE &quot;content&quot;!</span>
+    <span class="k">new</span> <span class="nf">MultipartContentBody</span><span class="o">(</span><span class="n">content</span><span class="o">));</span>
+
+<span class="c1">//now add the MS word content at the first location</span>
+<span class="c1">//this will make it the &quot;original&quot; content</span>
+<span class="n">content</span><span class="o">.</span><span class="na">addBodyPart</span><span class="o">(</span><span class="k">new</span> <span class="n">FormBodyPart</span><span class="o">(</span>
+    <span class="s">&quot;http://www.example.com/example.docx&quot;</span><span class="o">,</span> <span class="c1">//the id of the content part</span>
+    <span class="k">new</span> <span class="nf">InputStreamBody</span><span class="o">(</span>
+        <span class="n">wordIn</span><span class="o">,</span> 
+        <span class="s">&quot;application/vnd.openxmlformats-officedocument.wordprocessingml.document&quot;</span><span class="o">,</span> 
+        <span class="s">&quot;example.docx&quot;</span><span class="o">)));</span>
+
+<span class="c1">//now add the alternate plain text version</span>
+<span class="n">content</span><span class="o">.</span><span class="na">addBodyPart</span><span class="o">(</span><span class="k">new</span> <span class="n">FormBodyPart</span><span class="o">(</span>
+    <span class="s">&quot;http://www.example.com/example.docx&quot;</span><span class="o">,</span> <span class="c1">//the id of the content part</span>
+    <span class="k">new</span> <span class="nf">StringBody</span><span class="o">(</span> <span class="c1">//use a StringBody to avoid binary encoding for text</span>
+        <span class="n">IOUtils</span><span class="o">.</span><span class="na">toString</span><span class="o">(</span><span class="n">plainIn</span><span class="o">),</span> <span class="c1">//apache commons IO utility</span>
+        <span class="s">&quot;text/plain&quot;</span><span class="o">,</span>
+        <span class="n">Charset</span><span class="o">.</span><span class="na">forName</span><span class="o">(</span><span class="s">&quot;UTF-8&quot;</span><span class="o">))));</span>
+
+<span class="c1">//now we are ready to create and execute the POST request to the</span>
+<span class="c1">//Stanbol Enhancer</span>
+<span class="n">HttpPost</span> <span class="n">request</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HttpPost</span><span class="o">(</span><span class="s">&quot;http://localhost:8080/enhancer&quot;</span><span class="o">);</span>
+<span class="n">request</span><span class="o">.</span><span class="na">setEntity</span><span class="o">(</span><span class="n">contentItem</span><span class="o">);</span>
+<span class="n">request</span><span class="o">.</span><span class="na">setHeader</span><span class="o">(</span><span class="s">&quot;Accept&quot;</span><span class="o">,</span><span class="s">&quot;application/rdf+xml&quot;</span><span class="o">);</span>
+<span class="n">HttpResponse</span> <span class="n">response</span> <span class="o">=</span> <span class="n">httpClient</span><span class="o">.</span><span class="na">execute</span><span class="o">(</span><span class="n">request</span><span class="o">);</span>
+</pre></div>
+
+
+<p>Note that for such requests <a href="engines/metaxaengine.html">Metaxa</a> will still try to extract metadata from the submitted MS Word document, but all other engines will process the plain text version sent with the request.</p>
+<h3 id="example-4-parse-existing-free-text-annotations">Example 4: Parse existing free text annotations</h3>
+<p>This example shows how the multi-part content item API can be used to send already existing tags for a content item to the Stanbol Enhancer. For this example it is important to understand that submitted metadata needs to conform to the Stanbol <a href="enhancementstructure.html">Enhancement Structure</a>. Because of that, this example consists of two main steps:</p>
+<ol>
+<li>Convert user tags to <code>TextAnnotation</code>s</li>
+<li>Send existing Metadata along with the Content to the Stanbol Enhancer</li>
+</ol>
+<p>Also note that the code snippets use utilities provided by the "org.apache.stanbol.enhancer.servicesapi" module. Apache Clerezza is used as the RDF framework. Both dependencies are easily replaceable.</p>
+<p>First, let's have a look at the required information:</p>
+<div class="codehilite"><pre><span class="n">MGraph</span> <span class="n">graph</span><span class="o">;</span> <span class="c1">//the RDF graph to store the metadata</span>
+<span class="n">UriRef</span> <span class="n">ciUri</span><span class="o">;</span> <span class="c1">//the URI for the contentItem</span>
+<span class="n">String</span> <span class="n">tag</span><span class="o">;</span> <span class="c1">// user provided tag</span>
+<span class="n">UriRef</span> <span class="n">tagType</span><span class="o">;</span> <span class="c1">//the type of the Tag</span>
+</pre></div>
+
+
+<p>Regarding the tag type: Stanbol natively supports the following types </p>
+<ul>
+<li><strong>Person</strong> (http://dbpedia.org/ontology/Person)</li>
+<li><strong>Organization</strong> (http://dbpedia.org/ontology/Organisation): NOTE the British spelling</li>
+<li><strong>Place</strong> (http://dbpedia.org/ontology/Place)</li>
+</ul>
+<p>The processing of submitted tags that use another type or no type depends on the <a href="engines">enhancement engines</a> used and their configurations. The configuration of the <a href="engines/namedentitytaggingengine.html">Named Entity Tagging Engine</a> is especially important in that respect.</p>
+<div class="codehilite"><pre><span class="n">Resource</span> <span class="n">user</span><span class="o">;</span> <span class="c1">//the user that has created the tag (optional)</span>
+<span class="c1">//in case of a name just use a literal</span>
+<span class="n">user</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PlainLiteralImpl</span><span class="o">(</span><span class="s">&quot;Rudolf Huber&quot;</span><span class="o">);</span>
+<span class="c1">//in case users have assigned URIs</span>
+<span class="n">user</span> <span class="o">=</span> <span class="k">new</span> <span class="n">UriRef</span><span class="o">(</span><span class="s">&quot;http://my.cms.org/users/rudolf.huber&quot;</span><span class="o">);</span>
+</pre></div>
+
+
+<p>Now we can convert the User Tags to <code>TextAnnotation</code>s</p>
+<div class="codehilite"><pre><span class="c1">//first create a URI for the text annotation. Here we use a random URN</span>
+<span class="c1">//If you can create a meaningful URI this would be better!</span>
+<span class="n">UriRef</span> <span class="n">ta</span> <span class="o">=</span> <span class="k">new</span> <span class="n">UriRef</span><span class="o">(</span><span class="s">&quot;urn:user-annotation:&quot;</span><span class="o">+</span><span class="n">EnhancementEngineHelper</span><span class="o">.</span><span class="na">randomUUID</span><span class="o">());</span>
+<span class="c1">//Add the &#39;rdf:type&#39;s</span>
+<span class="n">graph</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">TripleImpl</span><span class="o">(</span><span class="n">ta</span><span class="o">,</span> <span class="n">RDF</span><span class="o">.</span><span class="na">type</span><span class="o">,</span> <span class="n">TechnicalClasses</span><span class="o">.</span><span class="na">ENHANCER_TEXTANNOTATION</span><span class="o">));</span>
+<span class="n">graph</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">TripleImpl</span><span class="o">(</span><span class="n">ta</span><span class="o">,</span> <span class="n">RDF</span><span class="o">.</span><span class="na">type</span><span class="o">,</span> <span class="n">TechnicalClasses</span><span class="o">.</span><span class="na">ENHANCER_ENHANCEMENT</span><span class="o">));</span>
+
+<span class="c1">//this TextAnnotation is about the ContentItem</span>
+<span class="n">graph</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">TripleImpl</span><span class="o">(</span><span class="n">ta</span><span class="o">,</span> <span class="n">Properties</span><span class="o">.</span><span class="na">ENHANCER_EXTRACTED_FROM</span><span class="o">,</span> <span class="n">ciUri</span><span class="o">));</span>
+<span class="c1">//if the Tag uses a type add it</span>
+<span class="k">if</span><span class="o">(</span><span class="n">tagType</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">){</span>
+    <span class="n">graph</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">TripleImpl</span><span class="o">(</span><span class="n">ta</span><span class="o">,</span> <span class="n">Properties</span><span class="o">.</span><span class="na">DC_TYPE</span><span class="o">,</span> <span class="n">tagType</span><span class="o">));</span>
+<span class="o">}</span>
+<span class="c1">//add the value of the tag</span>
+<span class="n">graph</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">TripleImpl</span><span class="o">(</span><span class="n">ta</span><span class="o">,</span> <span class="n">Properties</span><span class="o">.</span><span class="na">ENHANCER_SELECTED_TEXT</span><span class="o">,</span> <span class="k">new</span> <span class="n">PlainLiteralImpl</span><span class="o">(</span><span class="n">tag</span><span class="o">)));</span>
+<span class="c1">//add the user</span>
+<span class="k">if</span><span class="o">(</span><span class="n">user</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">){</span>
+    <span class="n">graph</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">TripleImpl</span><span class="o">(</span><span class="n">ta</span><span class="o">,</span> <span class="n">Properties</span><span class="o">.</span><span class="na">DC_CREATOR</span><span class="o">,</span><span class="n">user</span><span class="o">));</span>
+<span class="o">}</span>
+</pre></div>
+
+
+<p>Now the 'graph' contains a valid <code>TextAnnotation</code> for the given user tag. This should be done for all tags of the current content.</p>
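+<p>To illustrate the "for all tags" step without any RDF framework, the following is a minimal sketch that writes the same triples as N-Triples lines using only the JDK. The class <code>UserTagAnnotations</code> and its methods are hypothetical, and the property URIs are assumptions about what the <code>Properties</code> and <code>TechnicalClasses</code> constants resolve to; verify them against the Enhancement Structure before relying on them.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper, not part of Stanbol or Clerezza.
class UserTagAnnotations {
    // Assumed namespaces: Stanbol enhancement structure and Dublin Core terms.
    static final String ENHANCER = "http://fise.iks-project.eu/ontology/";
    static final String DC = "http://purl.org/dc/terms/";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** Builds N-Triples lines with one TextAnnotation per entry (tag -> type URI or null). */
    static List<String> toTriples(String ciUri, Map<String, String> tagToType, String userName) {
        List<String> triples = new ArrayList<String>();
        for (Map.Entry<String, String> entry : tagToType.entrySet()) {
            // a random URN per annotation, as in the snippet above
            String ta = "<urn:user-annotation:" + UUID.randomUUID() + ">";
            triples.add(ta + " <" + RDF_TYPE + "> <" + ENHANCER + "TextAnnotation> .");
            triples.add(ta + " <" + RDF_TYPE + "> <" + ENHANCER + "Enhancement> .");
            triples.add(ta + " <" + ENHANCER + "extracted-from> <" + ciUri + "> .");
            if (entry.getValue() != null) { // the tag type is optional
                triples.add(ta + " <" + DC + "type> <" + entry.getValue() + "> .");
            }
            triples.add(ta + " <" + ENHANCER + "selected-text> " + literal(entry.getKey()) + " .");
            if (userName != null) { // the creator is optional
                triples.add(ta + " <" + DC + "creator> " + literal(userName) + " .");
            }
        }
        return triples;
    }

    /** Quotes a string as an N-Triples literal, escaping backslashes and quotes only. */
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<String, String>();
        tags.put("Paris", "http://dbpedia.org/ontology/Place");
        tags.put("holiday", null); // a tag without a type
        for (String line : toTriples("urn:content-item:example", tags, "Rudolf Huber")) {
            System.out.println(line);
        }
    }
}
```

+<p>Such output would have to be sent as a different RDF serialization than the "application/rdf+xml" used in the rest of this example; whether the Enhancer accepts it for the metadata part is not covered here.</p>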
+<p>In the next step we need to serialize the RDF data. Again, we use the Clerezza API here, but any RDF framework provides similar functionality.</p>
+<div class="codehilite"><pre><span class="n">ByteArrayOutputStream</span> <span class="n">out</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ByteArrayOutputStream</span><span class="o">();</span>
+<span class="c1">//this tells the Serializer to create &quot;application/rdf+xml&quot;</span>
+<span class="n">serializer</span><span class="o">.</span><span class="na">serialize</span><span class="o">(</span><span class="n">out</span><span class="o">,</span> <span class="n">graph</span><span class="o">,</span> <span class="n">SupportedFormat</span><span class="o">.</span><span class="na">RDF_XML</span><span class="o">);</span>
+<span class="n">String</span> <span class="n">rdfContent</span> <span class="o">=</span> <span class="k">new</span> <span class="n">String</span><span class="o">(</span><span class="n">out</span><span class="o">.</span><span class="na">toByteArray</span><span class="o">(),</span><span class="n">UTF8</span><span class="o">);</span>
+</pre></div>
+
+
+<p>Now we need to create the MultiPart MIME content item containing the metadata and the content</p>
+<div class="codehilite"><pre><span class="n">String</span> <span class="n">content</span><span class="o">;</span> <span class="c1">//the content we want to send to the Stanbol Enhancer</span>
+
+<span class="c1">//the container for the ContentItem</span>
+<span class="n">MultipartEntity</span> <span class="n">contentItem</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MultipartEntity</span><span class="o">(</span><span class="kc">null</span><span class="o">,</span> <span class="kc">null</span> <span class="o">,</span><span class="n">UTF8</span><span class="o">);</span>
+
+<span class="c1">//The Metadata MUST BE the first element</span>
+<span class="n">contentItem</span><span class="o">.</span><span class="na">addPart</span><span class="o">(</span>
+    <span class="s">&quot;metadata&quot;</span><span class="o">,</span> <span class="c1">//the name MUST BE &quot;metadata&quot; </span>
+    <span class="k">new</span> <span class="nf">StringBody</span><span class="o">(</span><span class="n">rdfContent</span><span class="o">,</span><span class="n">SupportedFormat</span><span class="o">.</span><span class="na">RDF_XML</span><span class="o">,</span><span class="n">UTF8</span><span class="o">){</span>
+        <span class="nd">@Override</span>
+        <span class="kd">public</span> <span class="n">String</span> <span class="nf">getFilename</span><span class="o">()</span> <span class="o">{</span> <span class="c1">//The filename MUST BE the</span>
+            <span class="k">return</span> <span class="n">ciUri</span><span class="o">.</span><span class="na">getUnicodeString</span><span class="o">();</span> <span class="c1">//uri of the ContentItem</span>
+        <span class="o">}</span>
+    <span class="o">});</span>
+</pre></div>
+
+
+<p>Note that because the <code>StringBody</code> class provided by the "httpmime" framework does not set a filename, we need to override this method and return the URI of the content item. This is essential because we need to ensure that the URI of the <code>ContentItem</code> is the same as the URI (variable '<code>ciUri</code>') used when creating the <code>TextAnnotation</code>s for the user tags.</p>
+<p>For the following code snippet note that we can add the content directly to the content item container. Only if we needed to send multiple alternate content versions (as shown in 'Example 3') would a <code>'multipart/alternate'</code> container be required.</p>
+<div class="codehilite"><pre><span class="c1">//Add the content as second mime part</span>
+<span class="n">contentItem</span><span class="o">.</span><span class="na">addPart</span><span class="o">(</span>
+    <span class="s">&quot;content&quot;</span><span class="o">,</span> <span class="c1">//the name MUST BE &quot;content&quot;</span>
+    <span class="k">new</span> <span class="nf">StringBody</span><span class="o">(</span><span class="n">content</span><span class="o">,</span><span class="s">&quot;text/plain&quot;</span><span class="o">,</span><span class="n">UTF8</span><span class="o">));</span>
+
+<span class="c1">//now we are ready to create and execute the POST request to the</span>
+<span class="c1">//Stanbol Enhancer</span>
+<span class="n">HttpPost</span> <span class="n">request</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HttpPost</span><span class="o">(</span><span class="s">&quot;http://localhost:8080/enhancer&quot;</span><span class="o">);</span>
+<span class="n">request</span><span class="o">.</span><span class="na">setEntity</span><span class="o">(</span><span class="n">contentItem</span><span class="o">);</span>
+<span class="n">request</span><span class="o">.</span><span class="na">setHeader</span><span class="o">(</span><span class="s">&quot;Accept&quot;</span><span class="o">,</span> <span class="n">SupportedFormat</span><span class="o">.</span><span class="na">RDF_XML</span><span class="o">);</span>
+<span class="n">HttpResponse</span> <span class="n">response</span> <span class="o">=</span> <span class="n">httpClient</span><span class="o">.</span><span class="na">execute</span><span class="o">(</span><span class="n">request</span><span class="o">);</span>
+</pre></div>
+
+
+<p>The response of the Enhancer will now contain entity suggestions for the free text user tags.</p>
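+<p>For clients that cannot use "httpmime", the layout of the request body described in this example can be sketched by hand. The class <code>MultipartContentItemSketch</code> and the boundary value are hypothetical, and the use of "form-data" content dispositions is an assumption based on what a multipart form entity typically produces. Only the constraints stated above (metadata first, part names "metadata" and "content", filename equal to the content item URI) are taken from this page.</p>

```java
// Hypothetical helper that only assembles the body text; it does not send anything.
class MultipartContentItemSketch {

    /** Builds a multipart body with the metadata part first and the content part second. */
    static String build(String boundary, String ciUri, String rdfXml, String text) {
        String crlf = "\r\n";
        StringBuilder body = new StringBuilder();
        // first part: the name MUST be "metadata", the filename MUST be the ContentItem URI
        body.append("--").append(boundary).append(crlf)
            .append("Content-Disposition: form-data; name=\"metadata\"; filename=\"")
            .append(ciUri).append("\"").append(crlf)
            .append("Content-Type: application/rdf+xml; charset=UTF-8").append(crlf)
            .append(crlf)
            .append(rdfXml).append(crlf);
        // second part: the name MUST be "content"
        body.append("--").append(boundary).append(crlf)
            .append("Content-Disposition: form-data; name=\"content\"").append(crlf)
            .append("Content-Type: text/plain; charset=UTF-8").append(crlf)
            .append(crlf)
            .append(text).append(crlf);
        // closing delimiter
        body.append("--").append(boundary).append("--").append(crlf);
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("example-boundary", "urn:content-item:example",
                "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/>",
                "Paris is the capital of France."));
    }
}
```

+<p>The request itself would additionally need a "Content-Type" header that names the same boundary, and the boundary must not occur inside any of the parts.</p>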
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/es_entityannotation.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/es_entityannotation.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/es_entitydisambiguation.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/es_entitydisambiguation.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/es_textannotation.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/es_textannotation.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/es_topicannotation.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/es_topicannotation.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/executionmetadata.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/executionmetadata.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/executionmetadata.html Mon Jul 16 13:02:45 2012
@@ -0,0 +1,353 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE- 2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Execution Metadata</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="logo"> <!-- do not scroll the logo -->
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a></div>
+  <div id="navigation"> <!-- but auto scroll the menu -->
+      <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a><ul>
+<li><a href="/stanbol/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/stanbol/docs/trunk/components.html">Components</a></li>
+</ul>
+</li>
+<li><a href="/stanbol/development/">Development</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/stanbol/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/downloads/">Overview</a><ul>
+<li><a href="/stanbol/downloads/releases.html">Releases</a></li>
+<li><a href="/stanbol/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="archive">Archive</h1>
+<ul>
+<li><a href="/stanbol/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrump" style="font-size: 80%;">
+      <a href="/">Home</a>&nbsp;&raquo&nbsp;<a href="/stanbol/">Stanbol</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/">Trunk</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/components/">Components</a>&nbsp;&raquo&nbsp;<a href="/stanbol/docs/trunk/components/enhancer/">Enhancer</a>
+    </div>
+    <h1 class="title">Execution Metadata</h1>
+    <p>The execution metadata hold detailed information about an ongoing or completed enhancement process. They describe how the <a href="chains/executionplan.html">ExecutionPlan</a> provided by the <a href="chains">Chain</a> was executed by the <a href="enhancementjobmanager.html">EnhancementJobManager</a>. Both the ExecutionMetadata and the ExecutionPlan are provided with the ContentItem as a separate content part of the type MGraph with the URI "http://stanbol.apache.org/ontology/enhancer/executionmetadata#ChainExecution". For users of the Stanbol Enhancer the Execution Metadata are of interest to:</p>
+<ul>
+<li>Check the progress of asynchronously started Enhancement Processes: Metadata for all planned engine executions are created as soon as a ContentItem is passed to the EnhancementJobManager and are updated as soon as the execution of an engine starts, completes or fails.</li>
+<li>Monitor the performance of different EnhancementEngines: The Execution Metadata provide detailed information about starting/completion time points for engine executions.</li>
+<li>Inspect the Enhancement Process: check if optional EnhancementEngines were successfully executed or skipped/failed; validate the configured EnhancementChain by checking the actual execution order of the EnhancementEngines.</li>
+</ul>
+<h2 id="execution-metadata-ontology">Execution Metadata Ontology</h2>
+<p>The RDFS schema used for the execution metadata is defined as follows:</p>
+<p><img alt="Execution Metadata" src="executionmetadata.png" title="Overview of the Execution Metadata Ontology" /></p>
+<ul>
+<li>Namespace: em : http://stanbol.apache.org/ontology/enhancer/executionmetadata#</li>
+<li><strong>em:Execution</strong> : Super class for all Executions<ul>
+<li><strong>em:executionPart</strong> (domain: em:Execution; range: em:ChainExecution): Defines that this execution was part of the execution of a chain</li>
+<li><strong>em:status</strong> (domain: em:Execution; range: em:ExecutionStatus): The status of an execution (used for both em:EngineExecution and em:ChainExecution)</li>
+<li><strong>em:started</strong> (domain: em:Execution; range: xsd:dateTime): Marks the start of the execution</li>
+<li><strong>em:completed</strong> (domain: em:Execution; range: xsd:dateTime): Marks the completion of the execution</li>
+<li><strong>em:statusMessage</strong> (domain: em:Execution; range: xsd:string): A natural language description providing further information about the status of this execution. Typically used to provide error messages if the execution fails (em:status is set to em:StatusFailed).</li>
+</ul>
+</li>
+<li><strong>em:ChainExecution</strong> : Class used to describe the execution of an enhancement chain.<ul>
+<li><strong>em:defaultChain</strong> (domain: em:ChainExecution; range: xsd:boolean): If the executed chain is currently the default Chain of the Stanbol Enhancer.</li>
+<li><strong>em:executionPlan</strong> (domain: em:ChainExecution; range: ep:ExecutionPlan): Links to the execution plan as provided by the chain.</li>
+<li><strong>em:enhances</strong>(domain: em:ChainExecution; range: rdf:Resource) : links the em:ChainExecution with the URI of the processed content item. The range needs to be updated as soon as the Stanbol Enhancement Structure is defined.</li>
+<li><strong>em:enhancedBy</strong> (domain: rdf:Resource; range: em:ChainExecution) : links the URI of the content item with the metadata about the enhancement process. The domain needs to be updated as soon as the Stanbol Enhancement Structure is defined.</li>
+</ul>
+</li>
+<li><strong>em:EngineExecution</strong> : Class used to describe the execution of an EnhancementEngine.<ul>
+<li><strong>em:executionNode</strong> (domain: em:EngineExecution; range: ep:ExecutionNode): The node within the ExecutionPlan</li>
+</ul>
+</li>
+<li><strong>em:ExecutionStatus</strong> : Class describing the status of an EngineExecution<ul>
+<li><strong>em:StatusScheduled</strong> : ExecutionStatus instance describing that an execution is scheduled but has not yet started</li>
+<li><strong>em:StatusInProgress</strong> : ExecutionStatus instance describing that the execution of the linked EngineExecution is in progress</li>
+<li><strong>em:StatusCompleted</strong> : ExecutionStatus instance describing that the execution has already completed successfully</li>
+<li><strong>em:StatusFailed</strong> : ExecutionStatus indicating that the execution has failed. Typically an em:statusMessage describing the reason for the failed execution is provided for em:Executions with this state.</li>
+<li><strong>em:StatusSkipped</strong> : ExecutionStatus indicating that the execution of an ep:ExecutionNode was skipped. This is only allowed for execution nodes that are marked as optional. Typically also an em:statusMessage with the reason should be provided.</li>
+</ul>
+</li>
+</ul>
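+<p>Client code that reads execution metadata often needs the five status instances as constants. A minimal sketch of such a mapping follows; the enum <code>ExecutionStatus</code> and its helper methods are hypothetical and are not the classes of the "servicesapi" module. Only the namespace and the local names are taken from the list above.</p>

```java
// Hypothetical enum mirroring the em:ExecutionStatus instances listed above.
enum ExecutionStatus {
    SCHEDULED("StatusScheduled"),
    IN_PROGRESS("StatusInProgress"),
    COMPLETED("StatusCompleted"),
    FAILED("StatusFailed"),
    SKIPPED("StatusSkipped");

    // the "em" namespace as given in the ontology description
    static final String NS = "http://stanbol.apache.org/ontology/enhancer/executionmetadata#";

    private final String localName;

    ExecutionStatus(String localName) {
        this.localName = localName;
    }

    /** The full URI of the status instance. */
    String uri() {
        return NS + localName;
    }

    /** True for states in which no further processing of the execution is expected. */
    boolean isFinished() {
        return this == COMPLETED || this == FAILED || this == SKIPPED;
    }

    /** Looks up a status by its URI; returns null for unknown URIs. */
    static ExecutionStatus fromUri(String uri) {
        for (ExecutionStatus status : values()) {
            if (status.uri().equals(uri)) {
                return status;
            }
        }
        return null;
    }
}
```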
+<h3 id="example">Example</h3>
+<p>The following example uses the same properties as used within the <a href="chains/executionplan.html">ExecutionPlan</a> section. To make it easier to see the relations between the execution metadata and the execution plan, the triples of the execution plan are included at the end of this example.</p>
+<p>This example describes the following situation:</p>
+<ul>
+<li>the enhancement of the content item with the URI 'urn:contentItem1' with the default chain</li>
+<li>the default chain is represented by a chain with the name "demoChain"; its ExecutionPlan has the URI 'urn:execPlan'</li>
+<li>the successful execution of the 'langid' engine (execution: 'urn:exec1', node: 'urn:node1')</li>
+<li>the failed execution of the 'ner' engine (execution: 'urn:exec2', node: 'urn:node2'): As reason for the failure a message is provided that the NER model for the language 'de' is not available</li>
+<li>the successful execution of the 'zemanta' engine (execution: 'urn:exec3', node: 'urn:node5'): This engine was started in parallel to the 'ner' engine - therefore before the chain failed.</li>
+<li>There is no execution of the dbpediaLinking (node: '') and geonamesLinking (node: '') engines because the chain failed before these engines were scheduled. This assumes that the EnhancementJobManager only adds em:EngineExecution resources when it starts processing an ep:ExecutionNode defined in the execution plan. However, the EnhancementJobManager can also create em:Execution resources for all execution nodes up front. In that case there would also be em:EngineExecution resources for the dbpediaLinking and geonamesLinking engines with the em:status set to 'em:StatusScheduled'.</li>
+</ul>
+<p>The RDF graph with the Execution Metadata:</p>
+<div class="codehilite"><pre>urn:exec
+    rdf:type em:ChainExecution
+    em:executionPlan urn:execPlan
+    em:enhances urn:contentItem1
+    em:defaultChain &quot;true&quot;
+    em:started 2012-01-11T12.13.14.156
+    em:completed 2012-01-11T12.13.15.157
+    em:status em:StatusFailed
+    em:statusMessage &quot;Unable to execute EnhancementEngine &#39;ner&#39; \
+        (Message: No NER model for language &#39;de&#39; is available).&quot;
+    em:executionPart urn:exec1, urn:exec2, urn:exec3, urn:exec4, urn:exec5
+
+urn:exec1
+    rdf:type em:EngineExecution
+    em:executionPart urn:exec
+    em:executionNode urn:node1
+    em:status em:StatusCompleted
+    em:started 2012-01-11T12.13.14.160
+    em:completed 2012-01-11T12.13.14.250
+
+urn:exec2
+    rdf:type em:EngineExecution
+    em:executionPart urn:exec
+    em:executionNode urn:node2
+    em:status em:StatusFailed
+    em:statusMessage &quot;No NER model for language &#39;de&#39; is available&quot;
+    em:started 2012-01-11T12.13.14.253
+    em:completed 2012-01-11T12.13.14.289
+
+urn:exec3
+    rdf:type em:EngineExecution
+    em:executionPart urn:exec
+    em:executionNode urn:node5
+    em:status em:StatusCompleted
+    em:started 2012-01-11T12.13.14.253
+    em:completed 2012-01-11T12.13.15.150
+</pre></div>
+
+
+<p>The Execution Plan: (copy from the example provided in the ExecutionPlan section)</p>
+<div class="codehilite"><pre>urn:execPlan
+    rdf:type ep:ExecutionPlan
+    ep:hasExecutionNode urn:node1, urn:node2, urn:node3, urn:node4, urn:node5
+    ep:chain &quot;demoChain&quot;
+
+urn:node1
+    rdf:type ep:ExecutionNode
+    ep:inExecutionPlan urn:execPlan
+    ep:engine langId
+
+urn:node2
+    rdf:type ep:ExecutionNode
+    ep:inExecutionPlan urn:execPlan
+    ep:dependsOn urn:node1
+    ep:engine ner
+
+urn:node3
+    rdf:type ep:ExecutionNode
+    ep:inExecutionPlan urn:execPlan
+    ep:dependsOn urn:node1
+    ep:engine dbpediaLinking
+
+urn:node4
+    rdf:type ep:ExecutionNode
+    ep:inExecutionPlan urn:execPlan
+    ep:dependsOn urn:node1
+    ep:engine geonamesLinking
+
+urn:node5
+    rdf:type ep:ExecutionNode
+    ep:inExecutionPlan urn:execPlan
+    ep:engine zemanta
+    ep:optional &quot;true&quot;^^xsd:boolean
+</pre></div>
+
+
+<h2 id="creationmanagement-of-execution-metadata">Creation/Management of Execution Metadata</h2>
+<p>This section is primarily intended for implementors of an EnhancementJobManager. However, it may also provide insights for users who need to monitor the state of enhancement processes, as it describes which information is added to the Execution Metadata and when.</p>
+<p>When the <a href="enhancementjobmanager.html">EnhancementJobManager</a> starts the enhancement of a ContentItem it needs to check if the <a href="contentitem.html">ContentItem</a> already contains ExecutionMetadata in the ContentPart with the URI "http://stanbol.apache.org/ontology/enhancer/executionmetadata#ChainExecution". If this is the case it needs to initialize itself based on the pre-existing information. If no ExecutionMetadata are present, a new EnhancementProcess needs to be created based on the Chain passed in the request. The differences between these two cases are explained in the following two subsections.</p>
+<h3 id="initialization">Initialization</h3>
+<p>If no ExecutionMetadata are present within the ContentItem passed to the EnhancementJobManager, a new EnhancementProcess needs to be set up. This includes the following steps:</p>
+<ol>
+<li>Get the <a href="chains/executionplan.html">ExecutionPlan</a> for the enhancement <a href="chains">Chain</a> passed in the request. If no chain is passed, the default chain needs to be acquired by using the <a href="chains/chainmanager.html">ChainManager</a>.</li>
+<li>Create the content part for the ExecutionMetadata with the <a href="contentitem.html">ContentItem</a> and add the information of the <a href="chains/executionplan.html">ExecutionPlan</a> to it.</li>
+<li>Create the initial ExecutionMetadata. This includes the 'em:ChainExecution' instance for the 'ep:ExecutionPlan' as well as 'em:EngineExecution' instances for all 'ep:ExecutionNode's defined by the execution plan. All such 'em:Execution' instances MUST BE created with the 'em:ExecutionStatus' 'em:StatusScheduled'.</li>
+</ol>
+<p>The ExecutionMetadataHelper utility of the "org.apache.stanbol.enhancer.servicesapi" module contains utility methods for initializing execution metadata.</p>
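+<p>The third step above can be sketched with plain JDK types. The class <code>ExecutionMetadataInitSketch</code>, its <code>init</code> method and the way engine execution URIs are derived from the chain execution URI are hypothetical and are not the API of the ExecutionMetadataHelper. Triples are represented as simple subject/predicate/object string arrays.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of creating the initial execution metadata.
class ExecutionMetadataInitSketch {
    static final String EM = "http://stanbol.apache.org/ontology/enhancer/executionmetadata#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** Creates the em:ChainExecution and one em:EngineExecution per node, all scheduled. */
    static List<String[]> init(String chainExec, String execPlan, String contentItem,
                               boolean defaultChain, List<String> nodes) {
        List<String[]> triples = new ArrayList<String[]>();
        triples.add(triple(chainExec, RDF_TYPE, EM + "ChainExecution"));
        triples.add(triple(chainExec, EM + "executionPlan", execPlan));
        triples.add(triple(chainExec, EM + "enhances", contentItem));
        triples.add(triple(chainExec, EM + "defaultChain", String.valueOf(defaultChain)));
        triples.add(triple(chainExec, EM + "status", EM + "StatusScheduled"));
        int index = 1;
        for (String node : nodes) {
            // assumed naming scheme: urn:exec1, urn:exec2, ... for chain execution urn:exec
            String engineExec = chainExec + index++;
            triples.add(triple(engineExec, RDF_TYPE, EM + "EngineExecution"));
            triples.add(triple(engineExec, EM + "executionPart", chainExec));
            triples.add(triple(engineExec, EM + "executionNode", node));
            triples.add(triple(engineExec, EM + "status", EM + "StatusScheduled"));
        }
        return triples;
    }

    static String[] triple(String subject, String predicate, String object) {
        return new String[] { subject, predicate, object };
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("urn:node1", "urn:node2", "urn:node5");
        for (String[] t : init("urn:exec", "urn:execPlan", "urn:contentItem1", true, nodes)) {
            System.out.println(t[0] + " " + t[1] + " " + t[2]);
        }
    }
}
```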
+<h3 id="continuation">Continuation</h3>
+<p>If the ContentItem passed to the EnhancementJobManager already contains ExecutionMetadata in the content part with the URI "http://stanbol.apache.org/ontology/enhancer/executionmetadata#ChainExecution", the EnhancementJobManager MUST perform the following steps to continue an EnhancementProcess.</p>
+<ol>
+<li>Check if the contained ExecutionMetadata are valid<ul>
+<li>If an 'em:ChainExecution' node is present that 'em:enhances' the ContentItem passed in the request</li>
+<li>If the ExecutionPlan is included and if the value of the 'ep:chain' property for the 'ep:ExecutionPlan' resource corresponds to the name of the Chain passed in the request.</li>
+</ul>
+</li>
+<li>Check the status of all 'em:Execution' instances<ul>
+<li>reset the status of 'em:Execution's that are in-progress to scheduled.</li>
+<li>TODO: here we could also retry the execution of failed 'em:Execution's</li>
+</ul>
+</li>
+</ol>
+<p>Note that for a continuation the ExecutionPlan MUST NOT be updated. It also MUST NOT be checked whether a Chain with the name stored in the ExecutionMetadata is still present. Note also that configuration changes of EnhancementEngines will affect the continuation of the enhancement process.</p>
+<p>The ExecutionMetadataHelper utility of the "org.apache.stanbol.enhancer.servicesapi" module contains utility methods for reading and validating pre-existing execution metadata.</p>
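+<p>The two continuation steps can be sketched as follows. The class <code>ContinuationSketch</code> and its methods are hypothetical; the metadata are reduced to the few values the checks need, namely the 'em:enhances' value, the 'ep:chain' name and a map from execution URI to status.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of validating and resuming pre-existing execution metadata.
class ContinuationSketch {
    enum Status { SCHEDULED, IN_PROGRESS, COMPLETED, FAILED, SKIPPED }

    /** Step 1: the metadata must enhance this content item and belong to the requested chain. */
    static boolean isValid(String enhances, String planChainName,
                           String contentItemUri, String requestedChainName) {
        return enhances != null && enhances.equals(contentItemUri)
                && planChainName != null && planChainName.equals(requestedChainName);
    }

    /** Step 2: resets in-progress executions to scheduled; returns how many were reset. */
    static int resetInProgress(Map<String, Status> executions) {
        int reset = 0;
        for (Map.Entry<String, Status> entry : executions.entrySet()) {
            if (entry.getValue() == Status.IN_PROGRESS) {
                entry.setValue(Status.SCHEDULED);
                reset++;
            }
        }
        return reset;
    }

    public static void main(String[] args) {
        Map<String, Status> executions = new LinkedHashMap<String, Status>();
        executions.put("urn:exec1", Status.COMPLETED);
        executions.put("urn:exec2", Status.IN_PROGRESS);
        if (isValid("urn:contentItem1", "demoChain", "urn:contentItem1", "demoChain")) {
            System.out.println("reset: " + resetInProgress(executions));
        }
    }
}
```

+<p>Failed executions are deliberately left untouched here, because retrying them is still marked as a TODO above.</p>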
+<h3 id="execution-state-management">Execution State Management</h3>
+<p>The following metadata need to be updated by the EnhancementJobManager when:</p>
+<ul>
+<li>Enhancement process starts<ul>
+<li>set the 'em:status' of the 'em:ChainExecution' to 'em:StatusInProgress'</li>
+<li>set the 'em:started' to the current date time</li>
+</ul>
+</li>
+<li>EnhancementEngine execution starts:<ul>
+<li>set the 'em:status' of the 'em:EngineExecution' to 'em:StatusInProgress'</li>
+<li>set the 'em:started' to the current date time</li>
+</ul>
+</li>
+<li>EnhancementEngine completes<ul>
+<li>set the 'em:status' of the 'em:EngineExecution' to 'em:StatusCompleted'</li>
+<li>set the 'em:completed' to the current date time</li>
+</ul>
+</li>
+<li>Optional EnhancementEngine not available<ul>
+<li>set the 'em:status' of the 'em:EngineExecution' to 'em:StatusSkipped'</li>
+<li>set both 'em:started' and 'em:completed' to the current date time</li>
+</ul>
+</li>
+<li>Optional EnhancementEngine failed<ul>
+<li>set the 'em:status' of the 'em:EngineExecution' to 'em:StatusFailed'</li>
+<li>set the 'em:completed' to the current date time</li>
+</ul>
+</li>
+<li>Required EnhancementEngine failed or not available<ul>
+<li>set the 'em:status' of the 'em:EngineExecution' to 'em:StatusFailed'</li>
+<li>set the 'em:status' of the 'em:ChainExecution' to 'em:StatusFailed'</li>
+<li>set the 'em:completed' of both the engine and the chain execution to the current date time</li>
+</ul>
+</li>
+<li>Enhancement process completes<ul>
+<li>set the 'em:status' of the 'em:ChainExecution' to 'em:StatusCompleted'</li>
+<li>set the 'em:completed' to the current date time</li>
+</ul>
+</li>
+<li>Internal error in the EnhancementJobManager implementation<ul>
+<li>set the 'em:status' of the 'em:ChainExecution' to 'em:StatusFailed'</li>
+<li>do not set any 'em:EngineExecution' to failed.</li>
+<li>set the 'em:completed' value of the 'em:ChainExecution' to the current date time</li>
+</ul>
+</li>
+</ul>
+<p>The ExecutionMetadataHelper utility of the "org.apache.stanbol.enhancer.servicesapi" module contains utility methods to perform state transitions on 'em:Execution' instances.</p>
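+<p>The state transitions listed above can be summarized in a small, self-contained model. The sketch below is only an illustration under simplifying assumptions: it keeps the status and timestamps in plain Java fields rather than in the RDF graph, and the names EmStatus, ExecutionState and StateTransitionSketch are hypothetical, not Stanbol classes.</p>

```java
import java.util.Date;

/** Hypothetical stand-in for the 'em:status' values. */
enum EmStatus { SCHEDULED, IN_PROGRESS, COMPLETED, SKIPPED, FAILED }

/** Hypothetical holder for 'em:status', 'em:started' and 'em:completed'. */
final class ExecutionState {
    EmStatus status = EmStatus.SCHEDULED;
    Date started;
    Date completed;
}

final class StateTransitionSketch {

    /** Chain or engine execution starts. */
    static void start(ExecutionState ex, Date now) {
        ex.status = EmStatus.IN_PROGRESS;
        ex.started = now;
    }

    /** Chain or engine execution completes successfully. */
    static void complete(ExecutionState ex, Date now) {
        ex.status = EmStatus.COMPLETED;
        ex.completed = now;
    }

    /** Optional engine is not available: started and completed are both set. */
    static void skip(ExecutionState engine, Date now) {
        engine.status = EmStatus.SKIPPED;
        engine.started = now;
        engine.completed = now;
    }

    /** Engine failed; a required engine also fails the whole chain. */
    static void engineFailed(ExecutionState engine, ExecutionState chain,
                             boolean optional, Date now) {
        engine.status = EmStatus.FAILED;
        engine.completed = now;
        if (!optional) {
            chain.status = EmStatus.FAILED;
            chain.completed = now;
        }
    }

    /** Internal error in the job manager: only the chain is marked failed. */
    static void internalError(ExecutionState chain, Date now) {
        chain.status = EmStatus.FAILED;
        chain.completed = now;
    }
}
```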
+<h2 id="using-executionmetadata">Using ExecutionMetadata</h2>
+<p>This section provides some examples on how to access and retrieve information from the ExecutionMetadata.</p>
+<h3 id="accessing-executionmetadata">Accessing ExecutionMetadata</h3>
+<p>The ExecutionMetadata and the <a href="chains/executionplan.html">ExecutionPlan</a> are stored in a content part of the <a href="contentitem.html">ContentItem</a> with the URI "http://stanbol.apache.org/ontology/enhancer/executionmetadata#ChainExecution". The following code segment can be used to retrieve the RDF graph with the ExecutionMetadata:</p>
+<div class="codehilite"><pre><span class="n">ContentItem</span> <span class="n">ci</span><span class="o">;</span> <span class="c1">//the ContentItem</span>
+<span class="c1">//the URI is available as constant of the ExecutionMetadata class</span>
+<span class="n">UriRef</span> <span class="n">contentPartURI</span> <span class="o">=</span> <span class="n">ExecutionMetadata</span><span class="o">.</span><span class="na">CHAIN_EXECUTION</span><span class="o">;</span>
+
+<span class="n">MGraph</span> <span class="n">executionMetadata</span> <span class="o">=</span> <span class="n">ci</span><span class="o">.</span><span class="na">getPart</span><span class="o">(</span><span class="n">contentPartURI</span><span class="o">,</span><span class="n">MGraph</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
+</pre></div>
+
+
+<p>The ExecutionMetadata are stored as a readable and writable RDF graph. To pass a read-only version to other components one can use the "getGraph()" method defined by MGraph.</p>
+<h3 id="getting-details-about-the-emchainexecution">Getting details about the em:ChainExecution</h3>
+<p>The following code segments show how to access information about the execution of the enhancement process for a <a href="contentitem.html">ContentItem</a>. All directly accessed methods in the examples below are static imports from one of the following three utility classes, all part of the "org.apache.stanbol.enhancer.servicesapi" module.</p>
+<ul>
+<li>ExecutionPlanHelper: Utility class that provides methods for reading and creating <a href="chains/executionplan.html">ExecutionPlans</a>.</li>
+<li>ExecutionMetadataHelper: Utility class for reading and manipulating the ExecutionMetadata.</li>
+<li>EnhancementEngineHelper: Utility that contains general purpose RDF utilities.</li>
+</ul>
+<p>This code example first gets the ChainExecution, ExecutionPlan and Chain name for the enhanced ContentItem. In a second step, the metadata of all executed EnhancementEngines are retrieved.</p>
+<div class="codehilite"><pre><span class="n">ContentItem</span> <span class="n">ci</span><span class="o">;</span> <span class="c1">//the ContentItem</span>
+<span class="n">MGraph</span> <span class="n">em</span><span class="o">;</span> <span class="c1">//the ExecutionMetadata</span>
+
+<span class="c1">//get the ChainExecution, ExecutionPlan and the name of the Chain</span>
+<span class="n">NonLiteral</span> <span class="n">ce</span> <span class="o">=</span> <span class="n">getChainExecution</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ci</span><span class="o">.</span><span class="na">getUri</span><span class="o">());</span>
+<span class="k">if</span><span class="o">(</span><span class="n">ce</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">){</span>
+    <span class="n">NonLiteral</span> <span class="n">ep</span> <span class="o">=</span> <span class="n">getExecutionPlan</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ce</span><span class="o">);</span>
+    <span class="n">String</span> <span class="n">chainName</span> <span class="o">=</span> <span class="n">getString</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ep</span><span class="o">,</span><span class="n">ExecutionPlan</span><span class="o">.</span><span class="na">CHAIN</span><span class="o">);</span>
+<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
+    <span class="n">log</span><span class="o">.</span><span class="na">warn</span><span class="o">(</span><span class="s">&quot;ExecutionMetadata do not contain information for &quot;</span>
+        <span class="o">+</span> <span class="s">&quot;ContentItem {}!&quot;</span><span class="o">,</span><span class="n">ci</span><span class="o">.</span><span class="na">getUri</span><span class="o">());</span>
+<span class="o">}</span>
+
+<span class="c1">//get the EngineExecutions and the name of the Engines</span>
+<span class="n">Set</span><span class="o">&lt;</span><span class="n">NonLiteral</span><span class="o">&gt;</span> <span class="n">executions</span> <span class="o">=</span> <span class="n">getExecutions</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ce</span><span class="o">);</span>
+<span class="k">for</span><span class="o">(</span><span class="n">NonLiteral</span> <span class="n">ex</span> <span class="o">:</span> <span class="n">executions</span><span class="o">){</span>
+    <span class="n">NonLiteral</span> <span class="n">en</span> <span class="o">=</span> <span class="n">getExecutionNode</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ex</span><span class="o">);</span>
+    <span class="k">if</span><span class="o">(</span><span class="n">en</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">){</span>
+        <span class="n">String</span> <span class="n">engineName</span> <span class="o">=</span> <span class="n">getEngine</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">en</span><span class="o">);</span>
+        <span class="kt">boolean</span> <span class="n">optional</span> <span class="o">=</span> <span class="n">isOptional</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">en</span><span class="o">);</span>
+    <span class="o">}</span> <span class="k">else</span> <span class="o">{</span> <span class="c1">//maybe a sub-chain execution</span>
+        <span class="c1">//currently not supported, but might be</span>
+        <span class="c1">//added in future versions</span>
+    <span class="o">}</span>
+    <span class="n">UriRef</span> <span class="n">status</span> <span class="o">=</span> <span class="n">getStatus</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ex</span><span class="o">);</span>
+    <span class="n">Date</span> <span class="n">started</span> <span class="o">=</span> <span class="n">getStarted</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ex</span><span class="o">);</span>
+    <span class="n">Date</span> <span class="n">completed</span> <span class="o">=</span> <span class="n">getCompleted</span><span class="o">(</span><span class="n">em</span><span class="o">,</span><span class="n">ex</span><span class="o">);</span>
+<span class="o">}</span>
+</pre></div>
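+<p>The 'started' and 'completed' dates retrieved in the loop above can be used, for example, to report how long an engine took. The following is a minimal sketch. It assumes that a missing timestamp is represented as null, which is an assumption made for this illustration and not confirmed behaviour of the helper methods; the class ExecutionDuration is hypothetical.</p>

```java
import java.util.Date;

/** Hypothetical helper, not part of the Stanbol API. */
final class ExecutionDuration {

    /**
     * Returns the elapsed time in milliseconds between the two dates,
     * or -1 if one of them is missing (assumed here to be null, for
     * example because the execution has not completed yet).
     */
    static long durationMillis(Date started, Date completed) {
        if (started == null || completed == null) {
            return -1L;
        }
        return completed.getTime() - started.getTime();
    }
}
```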
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/executionmetadata.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/executionmetadata.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/hallo-annotate_scrrenshot.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components/enhancer/hallo-annotate_scrrenshot.png
------------------------------------------------------------------------------
    svn:mime-type = image/png