Posted to commits@stanbol.apache.org by bu...@apache.org on 2012/04/11 10:30:50 UTC

svn commit: r812318 [7/10] - in /websites/staging/stanbol/trunk/content: ./ stanbol/docs/0.9.0-incubating/ stanbol/docs/0.9.0-incubating/cmsadapter/ stanbol/docs/0.9.0-incubating/contenthub/ stanbol/docs/0.9.0-incubating/enhancer/ stanbol/docs/0.9.0-in...

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/stanbolenhancementstructure.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/stanbolenhancementstructure.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/enhancer/stanbolenhancementstructure.html Wed Apr 11 08:30:47 2012
@@ -0,0 +1,723 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - The Stanbol Enhancement Structure (PROPOSAL)</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">The Stanbol Enhancement Structure (PROPOSAL)</h1>
+    <p>Please NOTE: This is a proposal for the future version of the Enhancement Structure used by the Stanbol Enhancer. This <strong>DOES NOT</strong> describe the Enhancement Structure used by the current version of the Stanbol Enhancer!</p>
+<p>This document describes the schema (ontology) used by the Apache Stanbol Enhancer to express features extracted from parsed content items. Its main purpose is to standardize the information created by EnhancementEngines, both to enable users to work easily with enhancement results and to support cooperation between different enhancement engines.</p>
+<h2 id="overview">Overview</h2>
+<p>The Stanbol Enhancement Structure is built around the following main concepts. Each of these concepts covers a specific aspect of the content enhancement process.</p>
+<p>The following list gives an overview of the concepts used by the Stanbol Enhancement Structure:</p>
+<p><img alt="Overview of the Stanbol Enhancement Structure" src="enhancementstructureoverview.png" title="Overview of the Stanbol Enhancement Structure" /></p>
+<ul>
+<li>
+<p><strong>ContentItem:</strong> This is the resource representing the parsed content. The URI of this resource depends on how the content was passed to the Stanbol Enhancer. If the request provides an absolute URI, then this URI is used. In all other cases the Stanbol Enhancer creates a URI based on the configured prefix or the URL of the service. The documentation of the RESTful service should provide more information about that.</p>
+</li>
+<li>
+<p><strong>sb:Content:</strong> Several content models distinguish between Content (the data) and the ContentItem (the interpretation of the data). The Enhancement Structure currently only defines ContentItem, because there is no need to describe the data for the purpose of the enhancement process. Other components (such as the /store endpoint) might need to formally describe the data. For such use cases the sioc:content property will be used to refer from the ContentItem to the Content. The URI representing the Content will be the same URI that is used to retrieve its data via a RESTful service.</p>
+</li>
+<li>
+<p><strong>sb:Enhancement:</strong> This provides metadata about extractions created by EnhancementEngines or present within the content. This includes the creator (usually an EnhancementEngine), the creation time, and relations to other enhancements. Users of the Stanbol Enhancer will typically not care about such data because, from their perspective, it is meta-metadata (metadata about the metadata). Every feature, suggestion or other piece of information extracted by any EnhancementEngine needs to carry the metadata defined for this concept.</p>
+</li>
+<li>
+<p><strong>sb:Annotation:</strong> An Annotation describes some piece of knowledge extracted from the parsed content and/or the metadata of the content. Information provided by Annotations includes the label, the type and the confidence. In addition, Annotations need to link to at least one Occurrence and may have one or more Suggestions. Annotations can also be related to or dependent on other Annotations. The EnhancementStructure defines only a small set of Annotation types. Implementors of EnhancementEngines that extract specific kinds of things (e.g. coreferences, events, …) may need to define their own Annotation types. Such extensions should be called "**Annotation" and be defined as rdfs:subClassOf any Annotation type defined by this Enhancement Structure.</p>
+</li>
+<li>
+<p><strong>sb:Suggestion:</strong> A Suggestion describes a Resource (Entity, Topic, Category …) that an EnhancementEngine suggests as a possible match for an Annotation. Suggestions are typically created by Engines that further process Annotations (semantic lifting). However, EnhancementEngines might also create both the Annotation and the Suggestions. Suggestions are always linked to a single Annotation (functional property). They define the label, the ID (typically the URI of the Resource), the type(s) of the suggested Resource and the confidence of the suggestion.</p>
+</li>
+<li>
+<p><strong>sb:Occurrence:</strong> An Occurrence describes the actual location of an extracted feature within the content. This location may be within the content or within parsed metadata. Occurrences are always linked to a single Annotation (functional property). Depending on the type of the content there will be different types of Occurrences. This EnhancementStructure currently focuses on two types of Occurrences: (1) TextOccurrence and (2) MetadataOccurrence. For details on the model of these Occurrence types see the corresponding sections. EnhancementEngines that support the extraction of features from content types that are not covered by this specification (e.g. pictures, sound, video) need to define their own Occurrence types. Such types should use the name "***Occurrence" and be defined as rdfs:subClassOf any of the Occurrence types defined in this specification.</p>
+</li>
+</ul>
+<p>Enhancements encoded based on this specification need to conform to the following rules:</p>
+<ul>
+<li>sb:Annotation and sb:Suggestion MUST also be of type sb:Enhancement and include the required metadata defined by sb:Enhancement.</li>
+<li>sb:Occurrence, sb:Annotation and sb:Suggestion instances MUST include rdf:type information for all parent types. For example, when adding an sb:TextOccurrence the rdf:type values MUST include sb:TextOccurrence AND sb:Occurrence. Consumers are NOT expected to use any kind of reasoner, so adding this additional information is the only way to ensure that queries for occurrences, annotations or suggestions return the expected results.</li>
+</ul>
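+<p>As a rough illustration of these two rules, and assuming the type names used in this proposal, a TextOccurrence and an Annotation would carry all of their type statements explicitly, so that a plain query for sb:Occurrence or sb:Enhancement finds them without any reasoning. The identifiers &lt;o&gt; and &lt;a&gt; are placeholders chosen for this sketch:</p>
+<div class="codehilite"><pre>&lt;o&gt; rdf:type sb:TextOccurrence
+&lt;o&gt; rdf:type sb:Occurrence
+
+&lt;a&gt; rdf:type sb:Annotation
+&lt;a&gt; rdf:type sb:Enhancement
+</pre></div>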
+<h2 id="specification">Specification</h2>
+<h3 id="namespaces-and-used-notations">Namespaces and used Notations</h3>
+<p>While the Stanbol Enhancement Structure defines some concepts and properties of its own, it also reuses many terms from other ontologies. To improve readability, this specification uses namespace prefixes plus local names instead of full URLs.</p>
+<p>All the namespace prefixes used within this specification are described by the following list: </p>
+<ul>
+<li>sb: represents Stanbol and refers to all properties and concepts defined by the Stanbol enhancement structure. This URL is not yet final, but one of the options is "http://stanbol.apache.org/ontology/".</li>
+<li>dc: the Dublin Core Terms (DCterms) ontology (http://dublincore.org/documents/dcmi-terms/)</li>
+<li>rdf: the Resource Description Framework (http://www.w3.org/RDF/)</li>
+<li>rdfs: the RDF schema  (http://www.w3.org/TR/rdf-schema/)</li>
+<li>sioc: SIOC (Semantically-Interlinked Online Communities) Core Ontology (http://rdfs.org/sioc/ns#)</li>
+</ul>
+<p>Notations used by this specification:</p>
+<ul>
+<li><strong>&lt;{code}&gt;</strong> elements refer to an instance identified by the URI {code}. To improve readability, {codes} that refer to instances of concepts defined by the Stanbol enhancement structure use short forms (&lt;ci&gt; for a ContentItem instance, &lt;a&gt; for an Annotation instance ...).</li>
+<li><strong>{prefix}:{localname}</strong> is used as the short form of &lt;{namespace+localname}&gt;. The namespace -&gt; prefix mappings are defined in the list above.</li>
+<li><strong>{value}^^dataType</strong> denotes the (xsd) dataType required for the value, e.g. xsd:float or xsd:anyURI. The default is xsd:string.</li>
+<li><strong>?{var}</strong> represents a resource that is unknown to the Stanbol Enhancer. Usually this is a resource of the user's knowledge model that is not necessarily passed to Stanbol.<br />
+</li>
+<li><strong>[{statement}]</strong> represents statements that are typically used in combination with the Stanbol Enhancement Structure but are neither required nor used by the enhancement process itself.</li>
+</ul>
+<p>A special NOTE on the usage of &lt;{code}&gt; in comparison to {value}^^xsd:anyURI:</p>
+<ul>
+<li>In both cases the value will be a URI.</li>
+<li>In the case of &lt;{code}&gt; the URI identifies a resource that is created/defined by the enhancement results, meaning that the returned knowledge contains all information about that resource.</li>
+<li>{value}^^xsd:anyURI indicates that the enhancement results will not provide additional knowledge about this resource. Consumers that need more information about such resources need to use other services to retrieve it, or pass special parameters to tell Stanbol to explicitly include such knowledge in the response.</li>
+</ul>
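+<p>A small sketch of this difference, using property names that appear later in this proposal. The suggestion &lt;s&gt; and the DBpedia URI are illustrative assumptions: the suggestion &lt;s&gt; is itself described by the enhancement results, whereas the entity it points to is only referenced by its URI.</p>
+<div class="codehilite"><pre>&lt;a&gt; sb:hasSuggestion &lt;s&gt;
+&lt;s&gt; rdf:type sb:Suggestion
+&lt;s&gt; sb:entity http://dbpedia.org/resource/Paris^^xsd:anyURI
+</pre></div>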
+<h3 id="contentitem-wzxhzdk26ci">ContentItem &lt;ci&gt;</h3>
+<p>The ContentItem &lt;ci&gt; represents content passed to the Stanbol Enhancer. It is the central resource used to link all the enhancements created by the EnhancementEngines.</p>
+<div class="codehilite"><pre>&lt;ci&gt; rdf:type sb:ContentItem
+[&lt;ci&gt; sb:embeds-knowledge {knowledgeGraphId}]
+[&lt;ci&gt; sb:has-section sb:ContentItem]
+[&lt;ci&gt; &lt;{metadatafield}&gt; {value(s)}]
+</pre></div>
+
+
+<p>The ContentItem itself defines only two fields:</p>
+<ul>
+<li><strong>sb:embeds-knowledge</strong>: Documents might contain explicit knowledge (e.g. MicroData, RDFa). If such information can be extracted, then it is stored in its own RDF graph. This property links to the ID of this RDF graph. Such knowledge is typically extracted during the pre-processing phase of the EnhancementProcess. Therefore EnhancementEngines have access to this information.</li>
+<li><strong>sb:has-section</strong>: A ContentItem may define different sections. The Stanbol EnhancementEngine will create a separate ContentItem with its own ID for each such section. The Stanbol Enhancer will first enhance the main content item and then all the sections. This feature is mainly intended to split up huge documents into parts that are feasible to enhance.</li>
+</ul>
+<p>In addition, metadata extracted from or passed along with the parsed content (e.g. Dublin Core, EXIF, ID3 ...) can also be added directly to the ContentItem &lt;ci&gt;. EnhancementEngines may use such information during the EnhancementProcess.</p>
+<p><strong>Example: Embedded Knowledge</strong></p>
+<p><em>TODO</em>: Move this to an own section about RDFa support!</p>
+<p>This example shows how SIOC (Semantically-Interlinked Online Communities) and RDFa can be used to embed knowledge to tell Stanbol how to process parsed HTML markup.</p>
+<div class="codehilite"><pre><span class="nt">&lt;body</span> <span class="na">about=</span><span class="s">&quot;http://www.examplenews.com/featuredNews&quot;</span><span class="nt">&gt;&lt;table&gt;&lt;tr&gt;</span>
+    <span class="nt">&lt;td&gt;</span><span class="c">&lt;!-- The menu: Not to be enhanced --&gt;</span> <span class="nt">&lt;/td&gt;</span>
+    <span class="nt">&lt;td&gt;&lt;span</span> <span class="na">property=</span><span class="s">&quot;sioc:content&quot;</span> <span class="na">about=</span><span class="s">&quot;http://www.examplenews.com/story123&quot;</span><span class="nt">&gt;</span> 
+        This is the Content of this page to be enhanced by the Stanbol enhancer
+    <span class="nt">&lt;/span&gt;&lt;span</span> <span class="na">property=</span><span class="s">&quot;sioc:content&quot;</span> <span class="na">about=</span><span class="s">&quot;http://www.examplenews.com/interview456&quot;</span><span class="nt">&gt;</span>
+        And there may be even more than one Sections within the document that need to be enhanced
+    <span class="nt">&lt;/span&gt;&lt;/td&gt;</span>
+    <span class="nt">&lt;td&gt;</span> <span class="c">&lt;!-- Advertisements: Not to be enhanced --&gt;</span> <span class="nt">&lt;/td&gt;</span>
+<span class="nt">&lt;/tr&gt;&lt;/table&gt;&lt;/body&gt;</span>
+</pre></div>
+
+
+<p>When this is passed as Content, the Stanbol Enhancer should create:</p>
+<ul>
+<li>A sb:ContentItem for "http://www.examplenews.com/featuredNews" with two sections but empty content.<ul>
+<li>The knowledge defined by the above RDFa markup is included in its own RDF graph and linked with the "sb:embeds-knowledge" property</li>
+</ul>
+</li>
+<li>A sb:ContentItem representing the section "http://www.examplenews.com/story123" <ul>
+<li>the HTML fragment enclosed by the corresponding span tag is the content</li>
+</ul>
+</li>
+<li>A sb:ContentItem representing the section "http://www.examplenews.com/interview456"<ul>
+<li>the HTML fragment enclosed by the corresponding span tag is the content</li>
+</ul>
+</li>
+</ul>
+<p>NOTE: This assumes the presence of </p>
+<ul>
+<li>a Component for extracting RDFa</li>
+<li>a Component that supports the creation of sb:ContentItems and fragments based on SIOC</li>
+</ul>
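+<p>Under these assumptions, and using the notation of this specification, the resulting ContentItems could be sketched roughly as follows. The {knowledgeGraphId} placeholder stands for whatever ID the RDFa extraction component assigns to the extracted graph; it is not defined by the example above.</p>
+<div class="codehilite"><pre>&lt;http://www.examplenews.com/featuredNews&gt; rdf:type sb:ContentItem
+&lt;http://www.examplenews.com/featuredNews&gt; sb:embeds-knowledge {knowledgeGraphId}
+&lt;http://www.examplenews.com/featuredNews&gt; sb:has-section &lt;http://www.examplenews.com/story123&gt;
+&lt;http://www.examplenews.com/featuredNews&gt; sb:has-section &lt;http://www.examplenews.com/interview456&gt;
+
+&lt;http://www.examplenews.com/story123&gt; rdf:type sb:ContentItem
+&lt;http://www.examplenews.com/interview456&gt; rdf:type sb:ContentItem
+</pre></div>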
+<h3 id="enhancement">Enhancement</h3>
+<p>The concept "Enhancement" defines properties that allow Stanbol EnhancementEngines to formally describe information about the enhancement process. This information is crucial for EnhancementEngines to cooperate with each other, but typical Stanbol users will not need to bother with it. In some situations such knowledge might nevertheless be useful on the client side, e.g. if someone wants to ignore all enhancements created by a specific enhancement engine, or to determine all enhancements affected by the removal of a part of the content.</p>
+<p>The following code segment shows the knowledge typically described by using the Enhancement concept:</p>
+<div class="codehilite"><pre>&lt;e&gt; rdf:type sb:Enhancement
+&lt;e&gt; dc:creator enhancementEngine^^xsd:anyURI
+&lt;e&gt; dc:contributor enhancementEngine^^xsd:anyURI
+&lt;e&gt; dc:created date^^xsd:dateTime
+&lt;e&gt; dc:modified date^^xsd:dateTime
+[&lt;e&gt; sb:relatedTo &lt;relatedEnhancement&gt;]
+[&lt;e&gt; sb:dependsOn &lt;dependsOnEnhancement&gt;]
+</pre></div>
+
+
+<p>The presence of the statement "&lt;e&gt; rdf:type sb:Enhancement" indicates that enhancement metadata are present for the resource &lt;e&gt;. This also means that if a configuration is set to exclude such information, then all the above properties MUST be removed from the results of the enhancement process.
+The metadata defined by sb:Enhancement MUST be added to all sb:Annotation and sb:Suggestion instances created by an EnhancementEngine. This also includes instances of any subclass (rdfs:subClassOf) of those two concepts.</p>
+<p>The following figure shows an example of an sb:Annotation and a sb:Suggestion for Paris with the according metadata as defined by the sb:Enhancement concept.</p>
+<p><img alt="Example: sb:Annotation and sb:Suggestion including sb:Enhancement metadata" src="enhancementexample.png" title="Example: sb:Annotation and sb:Suggestion including sb:Enhancement metadata" /></p>
+<p>Note that sb:Annotation and sb:Suggestion are not sub-classes of sb:Enhancement. EnhancementEngines need to add sb:Enhancement as an additional rdf:type to sb:Annotation and sb:Suggestion instances.</p>
+<p>Description of the properties defined/used by sb:Enhancement:</p>
+<ul>
+<li><strong>dc:creator</strong> and <strong>dc:contributor</strong> link to the EnhancementEngine(s) involved in creating the Enhancement.</li>
+<li><strong>dc:created</strong> and <strong>dc:modified</strong> are intended to help sort enhancements based on the activities performed during the enhancement process (something that might be useful especially if EnhancementEngines work asynchronously).</li>
+<li><strong>sb:relatedTo</strong> defines that an sb:Enhancement is related to another one. It also specifies that both enhancements remain valid if the other one is deleted.</li>
+<li><strong>sb:dependsOn</strong> defines that an sb:Enhancement depends on another one. If the other Enhancement is deleted (or rejected by a user), then all dependent sb:Enhancements MUST also be removed/rejected. The above figure shows that sb:hasSuggestion as defined by sb:Annotation is an inverse relation to sb:dependsOn, because suggestions depend on the annotation they are suggested for.</li>
+</ul>
+<p>In addition, EnhancementEngines might want or need to add further metadata to the sb:Annotation and sb:Suggestion instances they create. Implementors of such EnhancementEngines are free to define their own Enhancement types. Such types MUST be defined as rdfs:subClassOf sb:Enhancement and SHOULD use **Enhancement in their concept name. EnhancementEngines MUST also add both the specific type AND sb:Enhancement as rdf:type values.</p>
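+<p>As a sketch of such an extension, assume a hypothetical namespace prefix ex: and a hypothetical type ex:ReviewEnhancement (neither is defined by this proposal). An engine using this type would declare it as a subclass and attach both rdf:type values to each instance:</p>
+<div class="codehilite"><pre>ex:ReviewEnhancement rdfs:subClassOf sb:Enhancement
+
+&lt;e&gt; rdf:type ex:ReviewEnhancement
+&lt;e&gt; rdf:type sb:Enhancement
+&lt;e&gt; dc:creator enhancementEngine^^xsd:anyURI
+&lt;e&gt; dc:created date^^xsd:dateTime
+</pre></div>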
+<hr />
+<p>Sections below are not yet updated</p>
+<hr />
+<h3 id="annotations">Annotations</h3>
+<p>The concept "Annotation" provides metadata about the extracted feature. This information is important both for the enhancement process and for the users of the Stanbol Enhancer.
+The following code segment shows the knowledge typically provided by an Annotation &lt;a&gt;. A description of the properties is provided below:</p>
+<div class="codehilite"><pre>&lt;a&gt; rdf:type sb:Annotation
+[&lt;a&gt; rdf:type sb:Enhancement, sb:Occurrence]
+&lt;a&gt; sb:extracted-from &lt;ci&gt;
+&lt;a&gt; dc:title label  //TODO: maybe it is better to use rdfs:label
+&lt;a&gt; dc:role annotationRole^^xsd:anyURI
+&lt;a&gt; dc:type annotationType^^xsd:anyURI
+&lt;a&gt; sb:confidence value^^xsd:float
+&lt;a&gt; sb:entity entity^^xsd:anyURI
+&lt;a&gt; sb:entity-type entityType^^xsd:anyURI
+&lt;a&gt; sb:suggestion &lt;a1&gt;
+</pre></div>
+
+
+<p>The following properties are defined for Annotations &lt;a&gt;</p>
+<ul>
+<li><strong>rdf:type sb:Annotation</strong>: This states that someone can expect the resource to provide all the information as defined by this specification</li>
+<li><strong>sb:extracted-from</strong>: This links the annotation describing a feature with the content item this feature is extracted from.</li>
+<li><strong>dc:title</strong>: This is the human readable name (the label) of the extracted feature.</li>
+<li><strong>dc:role</strong>: Whether this Annotation is a Tag, Category, Suggestion ... There will be a controlled vocabulary describing the different roles used by the Stanbol Enhancer.</li>
+<li><strong>dc:type</strong>: The type of the feature described by this Annotation, e.g. Person, Organization, Location ... There will be a controlled vocabulary with the types used by the Stanbol Enhancer.</li>
+<li><strong>sb:confidence</strong>: The value describes the confidence of the EnhancementEngine. Values are on an ordinal scale. TODO: In the current implementation the values of different Enhancement Engines are not comparable, but that information might not be available to or processed by users and may therefore result in wrong interpretations (rwesten)</li>
+<li><strong>sb:entity</strong>: In case an annotation describes an Entity, this property provides the URI of the entity.</li>
+<li><strong>sb:entity-type</strong>: In case an annotation describes an Entity, this property provides the rdf:types of the linked entity.</li>
+<li><strong>sb:suggestion</strong>: Links to another annotation that provides a suggestion for this one. This indicates that the Stanbol Enhancer requests the client to decide between the provided options, e.g. by some user interaction.</li>
+<li><strong>sb:occurrence</strong>: Optionally links to one or more sb:Occurrences of this annotation within the parsed Content. Note that there are several types of Occurrences (TextOccurrence, ImageOccurrence, MetadataOccurrence …) defined. If this property is missing, then the Annotation is assumed to be about the whole content (as referred to by the sb:extracted-from property).</li>
+</ul>
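+<p>To make the property list more concrete, here is a rough sketch of an Annotation for a place name found in a text. All values (the label Paris, the confidence 0.8 and the occurrence &lt;o&gt;) are invented for illustration and are not prescribed by this proposal:</p>
+<div class="codehilite"><pre>&lt;a&gt; rdf:type sb:Annotation
+&lt;a&gt; sb:extracted-from &lt;ci&gt;
+&lt;a&gt; dc:title Paris
+&lt;a&gt; dc:type dbpedia-ont:Place
+&lt;a&gt; sb:confidence 0.8^^xsd:float
+&lt;a&gt; sb:occurrence &lt;o&gt;
+</pre></div>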
+<p><strong>Annotation Types</strong> describe the type of the annotated feature based on a terminology standardized by Stanbol. Current types include:</p>
+<ul>
+<li>dbpedia-ont:Place</li>
+<li>dbpedia-ont:Organisation</li>
+<li>dbpedia-ont:Person</li>
+<li>add some additional types describing Occurrents (Activities, Events), Conceptualizations </li>
+</ul>
+<p>This list should only contain some types that are useful for grouping Annotations in user interfaces. The exact types of entities can be added anyway by using the sb:entity-type property.</p>
+<p><em>TODO</em>: We need to decide whether we create our own controlled vocabulary within the Stanbol namespace or select some concepts defined in an external ontology (such as the dbpedia ontology that is currently used). </p>
+<p><strong>Annotation Roles</strong> describe the proposed role of the extracted feature in relation to the content. The following list shows the currently defined roles:</p>
+<ul>
+<li>sb:Tag: The feature can be suggested as tag for the parsed content.</li>
+<li>sb:Category: The feature provides a categorization for the parsed content.</li>
+<li>sb:Keyword: The feature describes a keyword within the parsed content TODO: describe the difference between keywords and tags</li>
+</ul>
+<p><em>NOTE</em>: Such roles should make it easier to support additional Annotation roles as suggested by <a href="https://issues.apache.org/jira/browse/STANBOL-48">STANBOL-48</a> and <a href="https://issues.apache.org/jira/browse/STANBOL-12">STANBOL-12</a>, which includes <a href="https://issues.apache.org/jira/browse/STANBOL-28">STANBOL-28</a> and <a href="https://issues.apache.org/jira/browse/STANBOL-29">STANBOL-29</a>.</p>
+<h3 id="sbsuggestion">sb:Suggestion</h3>
+<p>Suggestions are used by the Stanbol Enhancer to suggest possible values for the resolution of features extracted from the parsed content. 
+Currently two use cases are defined for Suggestions:</p>
+<ul>
+<li><strong>(1) Entity Resolution:</strong> Suggests entities for a feature extracted from the content. Typically such suggestions are calculated based on the name of the feature found within the content (e.g. the selected text of an sb:TextOccurrence).</li>
+<li><strong>(2) Field Value Suggestion:</strong> Suggests a value for a specific property. This kind of suggestion is useful if a relation between two extracted features is detected. A typical example would be a person "Steve Jobs" with the role "CEO" of the company "Apple Inc". Such relations can be detected by NLP tools. However, suggestions like this are also central to the semantic lifting of RDFa annotations, as shown in the example below.</li>
+</ul>
+<p>sb:Suggestion uses the following properties</p>
+<ul>
+<li><strong>sb:entity</strong>: The id of the suggested Entity</li>
+<li><strong>sb:entity-type</strong>: The type(s) of the suggested Entity</li>
+<li><strong>sb:confidence</strong>: Needed to sort in case of multiple suggestions</li>
+<li><strong>sb:field</strong>: Defines the property for which this suggestion should become the value if it is accepted by the user</li>
+</ul>
+<p>In addition, all sb:Suggestions are also of type sb:Enhancement to allow EnhancementEngines to provide enhancement metadata for them.</p>
+<p>For details on how they are used, please see the following example.</p>
+<p><strong>Example</strong></p>
+<p>As an example, let's assume that the following RDFa-annotated content is passed to the Stanbol Enhancer:</p>
+<div class="codehilite"><pre><span class="nt">&lt;span</span> <span class="na">typeof=</span><span class="s">&quot;cal:Vevent&quot;</span><span class="nt">&gt;</span>
+    <span class="nt">&lt;h3</span> <span class="na">property=</span><span class="s">&quot;dc:title&quot;</span><span class="nt">&gt;</span> Stanbol Teleconference <span class="nt">&lt;/h3&gt;</span>
+    <span class="nt">&lt;span</span> <span class="na">property=</span><span class="s">&quot;cal:summary&quot;</span><span class="nt">&gt;</span>
+        <span class="nt">&lt;p&gt;</span> Agenda: <span class="nt">&lt;/p&gt;</span>
+        <span class="nt">&lt;ul&gt;</span>
+            <span class="nt">&lt;li&gt;</span> ... <span class="nt">&lt;/li&gt;</span>
+        <span class="nt">&lt;/ul&gt;</span>
+        <span class="nt">&lt;p&gt;</span> Participants: <span class="nt">&lt;/p&gt;</span>
+        <span class="nt">&lt;ul&gt;</span>
+            <span class="nt">&lt;li</span> <span class="na">typeof=</span><span class="s">&quot;foaf:Person&quot;</span> <span class="na">property=</span><span class="s">&quot;foaf:name&quot;</span><span class="nt">&gt;</span>Rupert Westenthaler<span class="nt">&lt;/li&gt;</span>
+            <span class="nt">&lt;li</span> <span class="na">typeof=</span><span class="s">&quot;foaf:Person&quot;</span> <span class="na">property=</span><span class="s">&quot;foaf:name&quot;</span><span class="nt">&gt;</span>Olivier Grisel<span class="nt">&lt;/li&gt;</span>
+            <span class="nt">&lt;li&gt;</span> ... <span class="nt">&lt;/li&gt;</span>
+        <span class="nt">&lt;/ul&gt;</span>
+    <span class="nt">&lt;/span&gt;</span>
+<span class="nt">&lt;/span&gt;</span>
+</pre></div>
+
+
+<p>(1) Suggest the Entities for Rupert and Olivier<br />
+(2) Suggest linking Rupert and Olivier as values for "cal:attendee"</p>
+<p>For both Rupert Westenthaler and Olivier Grisel an EntityAnnotation would be present, in this case created by the RDFa extractor. In principle this could also work if the RDFa markup is missing; in such cases the EntityAnnotations could be created by an NLPEnhancementEngine.</p>
+<div class="codehilite"><pre>&lt;a1&gt; rdf:type sb:EntityAnnotation
+&lt;a1&gt; dc:title Rupert Westenthaler
+&lt;a1&gt; sb:entity-type foaf:Person
+&lt;a1&gt; sb:hasOccurrence &lt;o1&gt;
+&lt;a1&gt; sb:hasSuggestion &lt;s1&gt;
+
+&lt;a2&gt; rdf:type sb:EntityAnnotation
+&lt;a2&gt; dc:title Olivier Grisel
+&lt;a2&gt; sb:entity-type foaf:Person
+&lt;a2&gt; sb:hasOccurrence &lt;o2&gt;
+&lt;a2&gt; sb:hasSuggestion &lt;s2&gt;
+</pre></div>
+
+
+<p>Let's ignore the occurrences (how to create Occurrences for RDFa markup is a whole different story that still needs to be specified) and concentrate on the suggestions.</p>
+<div class="codehilite"><pre>&lt;s1&gt; rdf:type sb:Suggestion
+&lt;s1&gt; sb:entity &lt;http://www.example.com/person/Rupert_Westenthaler&gt;
+&lt;s1&gt; sb:entity-type foaf:Person, vCard:vCard, dbpedia-ont:Person
+&lt;s1&gt; sb:confidence 123,456
+
+&lt;s2&gt; rdf:type sb:Suggestion
+&lt;s2&gt; sb:entity &lt;http://www.example.com/person/Olivier_Grisel&gt;
+&lt;s2&gt; sb:entity-type foaf:Person, vCard:vCard, dbpedia-ont:Person
+&lt;s2&gt; sb:confidence 234,567
+</pre></div>
+
+
+<p>If the suggestion is accepted by the client the RDFa markup could be updated like this</p>
+<div class="codehilite"><pre><span class="nt">&lt;li</span> <span class="na">about=</span><span class="s">&quot;http://www.example.com/person/Rupert_Westenthaler&quot;</span>
+    <span class="na">typeof=</span><span class="s">&quot;foaf:Person&quot;</span> <span class="na">property=</span><span class="s">&quot;foaf:name&quot;</span><span class="nt">&gt;</span>Rupert Westenthaler<span class="nt">&lt;/li&gt;</span>
+<span class="nt">&lt;li</span> <span class="na">about=</span><span class="s">&quot;http://www.example.com/person/Olivier_Grisel&quot;</span>
+    <span class="na">typeof=</span><span class="s">&quot;foaf:Person&quot;</span> <span class="na">property=</span><span class="s">&quot;foaf:name&quot;</span><span class="nt">&gt;</span>Olivier Grisel<span class="nt">&lt;/li&gt;</span>
+</pre></div>
+
+
+<p>Now let's take a detailed look at the suggestions to add Rupert and Olivier as "cal:attendee" values to the meeting.
+First we need an EntityAnnotation for the meeting, which would be created by the RDFa extractor:</p>
+<div class="codehilite"><pre>&lt;a&gt; rdf:type sb:EntityAnnotation
+&lt;a&gt; dc:title &quot;Stanbol Teleconference&quot;
+&lt;a&gt; sb:entity-type cal:Vevent
+&lt;a&gt; sb:hasOccurrence &lt;o&gt;
+&lt;a&gt; sb:hasSuggestion &lt;s3&gt;
+&lt;a&gt; sb:hasSuggestion &lt;s4&gt;
+</pre></div>
+
+
+<p>Again, let's skip the occurrence and look at the two suggestions. The goal here is to suggest using the Annotations for Rupert (&lt;a1&gt;) and Olivier (&lt;a2&gt;) as values for the property "cal:attendee".</p>
+<p>It is important to suggest the annotations &lt;a1&gt; and &lt;a2&gt; as values here and NOT the suggested entities (e.g. <a href="http://www.example.com/person/Rupert_Westenthaler">http://www.example.com/person/Rupert_Westenthaler</a> in the case of &lt;a1&gt;), because the Stanbol Enhancer cannot assume that the user will accept the suggestions &lt;s1&gt; for &lt;a1&gt; and &lt;s2&gt; for &lt;a2&gt;.</p>
+<p>The following suggestions also use the sb:field property to tell the user that the suggestions are about values for the "cal:attendee" property.</p>
+<div class="codehilite"><pre>&lt;s3&gt; rdf:type sb:Suggestion
+&lt;s3&gt; sb:field cal:attendee
+&lt;s3&gt; sb:entity &lt;a1&gt;
+&lt;s3&gt; sb:entity-type sb:EntityAnnotation
+&lt;s3&gt; sb:confidence 12,34
+
+&lt;s4&gt; rdf:type sb:Suggestion
+&lt;s4&gt; sb:field cal:attendee
+&lt;s4&gt; sb:entity &lt;a2&gt;
+&lt;s4&gt; sb:entity-type sb:EntityAnnotation
+&lt;s4&gt; sb:confidence 12.34
+</pre></div>
+
+
+<p>NOTE:</p>
+<ul>
+<li>I am not sure whether it is a good idea to use "sb:entity" to link to an annotation created by the Stanbol Enhancer, because it might confuse users if the same property is used to link both external and internal resources. However, introducing an additional property such as "sb:value" does not seem better either.</li>
+</ul>
+<p>Here is the RDFa markup if the user accepts &lt;s3&gt; and &lt;s4&gt; but not &lt;s1&gt; and &lt;s2&gt;:</p>
+<div class="codehilite"><pre><span class="nt">&lt;span</span> <span class="na">typeof=</span><span class="s">&quot;cal:Vevent&quot;</span><span class="nt">&gt;</span>
+    [...]
+    <span class="nt">&lt;p&gt;</span> Participants: <span class="nt">&lt;/p&gt;</span>
+    <span class="nt">&lt;ul</span> <span class="na">property=</span><span class="s">&quot;cal:attendee&quot;</span><span class="nt">&gt;</span>
+        <span class="nt">&lt;li</span> <span class="na">typeof=</span><span class="s">&quot;foaf:Person&quot;</span> <span class="na">property=</span><span class="s">&quot;foaf:name&quot;</span><span class="nt">&gt;</span>Rupert Westenthaler<span class="nt">&lt;/li&gt;</span>
+        <span class="nt">&lt;li</span> <span class="na">typeof=</span><span class="s">&quot;foaf:Person&quot;</span> <span class="na">property=</span><span class="s">&quot;foaf:name&quot;</span><span class="nt">&gt;</span>Olivier Grisel<span class="nt">&lt;/li&gt;</span>
+        <span class="nt">&lt;li&gt;</span> ... <span class="nt">&lt;/li&gt;</span>
+    <span class="nt">&lt;/ul&gt;</span>
+<span class="nt">&lt;/span&gt;</span>
+</pre></div>
+
+
+<p>And finally the RDFa markup if all suggestions are accepted on the client side:</p>
+<div class="codehilite"><pre><span class="nt">&lt;span</span> <span class="na">typeof=</span><span class="s">&quot;cal:Vevent&quot;</span><span class="nt">&gt;</span>
+    [...]
+    <span class="nt">&lt;p&gt;</span> Participants: <span class="nt">&lt;/p&gt;</span>
+    <span class="nt">&lt;ul</span> <span class="na">property=</span><span class="s">&quot;cal:attendee&quot;</span><span class="nt">&gt;</span>
+        <span class="nt">&lt;li</span> <span class="na">about=</span><span class="s">&quot;http://www.example.com/person/Rupert_Westenthaler&quot;</span>
+            <span class="na">typeof=</span><span class="s">&quot;foaf:Person&quot;</span> <span class="na">property=</span><span class="s">&quot;foaf:name&quot;</span><span class="nt">&gt;</span>Rupert Westenthaler<span class="nt">&lt;/li&gt;</span>
+        <span class="nt">&lt;li</span> <span class="na">about=</span><span class="s">&quot;http://www.example.com/person/Olivier_Grisel&quot;</span>
+            <span class="na">typeof=</span><span class="s">&quot;foaf:Person&quot;</span> <span class="na">property=</span><span class="s">&quot;foaf:name&quot;</span><span class="nt">&gt;</span>Olivier Grisel<span class="nt">&lt;/li&gt;</span>
+    <span class="nt">&lt;/ul&gt;</span>
+<span class="nt">&lt;/span&gt;</span>
+</pre></div>
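+<p>The two RDFa variants above differ only in whether a list item carries an "about" attribute. A minimal Python sketch of that rendering step follows. The function name attendee_item and its parameters are hypothetical and not part of Stanbol.</p>

```python
from html import escape
from typing import Optional


def attendee_item(name: str, entity_uri: Optional[str] = None) -> str:
    """Render one attendee as an RDFa list item.

    entity_uri is only passed when the user accepted the entity suggestion;
    otherwise the person stays an anonymous foaf:Person.
    """
    about = ' about="{}"'.format(escape(entity_uri)) if entity_uri else ""
    return '<li{} typeof="foaf:Person" property="foaf:name">{}</li>'.format(
        about, escape(name))
```

+<p>Calling attendee_item with only a name corresponds to the case where the entity suggestion was rejected. Passing the entity URI as well corresponds to the case where it was accepted.</p>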
+
+
+<h3 id="occurrences">Occurrences</h3>
+<p>By default, detected features are considered to be extracted from the whole content. This assumption is appropriate for things like categorizations and keywords, but in many cases it is possible to specify the exact occurrence of a feature within the content and/or the metadata of the content. In such cases the sb:Annotation defines one or more values for the sb:occurrence property.</p>
+<p>Different Occurrence descriptions are needed to describe the position of a feature within different types of content or within the parsed metadata.</p>
+<p><strong>TextOccurrence:</strong> </p>
+<p>Describes the occurrence of a feature within textual content.</p>
+<div class="codehilite"><pre>&lt;o&gt; rdf:type sb:TextOccurrence
+    sb:TextOccurrence rdfs:subClassOf sb:Occurrence
+&lt;o&gt; rdf:type sb:Occurrence
+&lt;o&gt; sb:selected-text selectedText
+&lt;o&gt; sb:start startPosition^^xsd:long
+&lt;o&gt; sb:end endPosition^^xsd:long
+&lt;o&gt; sb:context selectionContext
+&lt;o&gt; sb:occurrence-within-context count^^xsd:int
+</pre></div>
+
+
+<ul>
+<li><strong>rdf:type sb:TextOccurrence, sb:Occurrence</strong>: It is required to add both types, to support queries for all Occurrences when no RDFS reasoner is present</li>
+<li><strong>sb:selected-text</strong>: The text selected by this Occurrence. Often the value of this property is the same as that of the dc:title property defined by sb:Annotation. However, this is not a requirement. Enhancement Engines may decide to use different values if appropriate.</li>
+<li><strong>sb:start</strong> and <strong>sb:end</strong>: The start and end position of the selected text relative to the start of the content</li>
+<li><strong>sb:context</strong>: The context (e.g. the sentence) used to extract the selected text.</li>
+<li><strong>sb:occurrence-within-context</strong>: Defines the n-th occurrence of the selected text within the context. Together with sb:context this can be used to locate the selected text even if the sb:start/sb:end positions are no longer valid (e.g. when the original content was transformed to another format).</li>
+</ul>
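+<p>The fallback described for sb:context and sb:occurrence-within-context can be sketched in Python as follows. This is only an illustration: the function name locate_selection is hypothetical, offsets are 0-based Python string offsets, and the occurrence count is assumed to be 1-based.</p>

```python
from typing import Optional


def locate_selection(content: str, selected_text: str, context: str,
                     occurrence_within_context: int) -> Optional[int]:
    """Return the 0-based offset of the selected text in content, or None.

    Assumption: occurrence_within_context is 1-based (1 = first occurrence).
    """
    if occurrence_within_context < 1:
        return None
    context_start = content.find(context)
    if context_start < 0:
        return None  # the context itself is no longer present
    offset = -1
    for _ in range(occurrence_within_context):
        # Search for the next occurrence after the previous match.
        offset = context.find(selected_text, offset + 1)
        if offset < 0:
            return None  # fewer occurrences than expected
    return context_start + offset
```

+<p>The sketch only uses the first match of the context within the content. A real engine would have to decide what to do if the context itself occurs more than once.</p>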
+<p><strong>MetadataOccurrence:</strong> </p>
+<p>Describes the occurrence of a feature within the metadata of the parsed content. This is extremely useful for linking entities to literal values provided by metadata standards, such as creator information in Dublin Core; artist, album and label information provided by ID3; or camera model information present in EXIF metadata. Enhancements that map a geo-point to a city, region or country could also be expressed with this type of occurrence. </p>
+<div class="codehilite"><pre>&lt;o&gt; rdf:type sb:MetadataOccurrence
+    sb:MetadataOccurrence rdfs:subClassOf sb:Occurrence
+&lt;o&gt; rdf:type sb:Occurrence
+&lt;o&gt; sb:field metadataProperty^^xsd:anyURI
+&lt;o&gt; sb:value value
+</pre></div>
+
+
+<ul>
+<li><strong>rdf:type sb:MetadataOccurrence, sb:Occurrence</strong>: It is required to add both types, to support queries for all Occurrences when no RDFS reasoner is present</li>
+<li><strong>sb:field</strong>: The field of the metadata standard used. Multiple values describe that the feature occurs in several fields</li>
+<li><strong>sb:value</strong>: The value that hints at the described feature. The property is related to the properties dc:title (in case the value is a literal) and sb:entity (in case the value is a URI) of sb:Annotation.</li>
+</ul>
+<p><strong>Other Occurrence Types</strong></p>
+<ul>
+<li>TimeBasedMediaOccurrence: This would define a temporal section within a time based media (e.g. a Sound File)</li>
+<li>VisualOccurrence: This would define a section within a media that can be presented on a screen</li>
+<li>VideoOccurrence: Would be the combination of a time based and a visual occurrence</li>
+</ul>
+<p>These kinds of occurrences are currently not defined, because there is no Stanbol EnhancementEngine that could make use of them.</p>
+<h2 id="use-cases-and-examples">Use Cases and Examples</h2>
+<p>This section describes use cases showing how the Stanbol Enhancement Structure is used to enhance documents. It also provides examples of how users can query for enhancements based on the returned knowledge.</p>
+<h3 id="simple-text-enhancement">Simple Text Enhancement</h3>
+<p>A user types the text "Next week I will travel to Paris" and would like to get general enhancements such as tags, keywords and categories.</p>
+<p>Let's assume that "Paris" was detected as a location and "travel" as a keyword. There are also two known entities with the name "Paris" and the type Location.
+This would result in an enhancement graph as follows: </p>
+<div class="codehilite"><pre># The content item 
+&lt;ci&gt; rdf:type sb:ContentItem
+
+# Paris as detected by the nlpEngine as location
+&lt;a1&gt; rdf:type sb:Enhancement
+&lt;a1&gt; rdf:type sb:Annotation
+&lt;a1&gt; rdf:type sb:Occurrence
+&lt;a1&gt; rdf:type sb:TextOccurrence
+# Properties for Enhancement
+&lt;a1&gt; sb:extracted-from &lt;ci&gt;
+&lt;a1&gt; dc:creator urn:stanbol.engines:nlpEngine
+&lt;a1&gt; dc:created &quot;2011-02-28T12:13:14Z&quot;
+# Properties for Annotation
+&lt;a1&gt; dc:title &quot;Paris&quot;
+&lt;a1&gt; dc:role sb:Tag
+&lt;a1&gt; dc:type dbpedia-ont:Place
+&lt;a1&gt; sb:suggestion &lt;a2&gt;, &lt;a3&gt;
+&lt;a1&gt; sb:confidence 0.85
+# Properties for TextOccurrence
+&lt;a1&gt; sb:selected-text &quot;Paris&quot;
+&lt;a1&gt; sb:start 28
+&lt;a1&gt; sb:end 32
+&lt;a1&gt; sb:context &quot;Next week I will travel to Paris&quot;
+&lt;a1&gt; sb:occurrence-within-context 1
+
+# dbpedia:Paris as suggested Entity
+&lt;a2&gt; rdf:type sb:Enhancement
+&lt;a2&gt; rdf:type sb:Annotation
+# Properties for Enhancement
+&lt;a2&gt; sb:extracted-from &lt;ci&gt;
+&lt;a2&gt; dc:requires &lt;a1&gt;
+&lt;a2&gt; dc:creator urn:stanbol.engines:entityTaggingEngine
+&lt;a2&gt; dc:created &quot;2011-02-28T12:13:18Z&quot;
+# Properties for Annotation
+&lt;a2&gt; dc:title &quot;Paris&quot;
+&lt;a2&gt; dc:role sb:Suggestion
+&lt;a2&gt; dc:type dbpedia-ont:Place
+&lt;a2&gt; sb:entity http://dbpedia.org/resource/Paris
+&lt;a2&gt; sb:entity-type dbpedia-ont:City, dbpedia-ont:Settlement, dbpedia-ont:PopulatedPlace, dbpedia-ont:Place
+&lt;a2&gt; sb:confidence 123.456
+
+# dbpedia:Paris,_Texas as suggested Entity
+&lt;a3&gt; rdf:type sb:Enhancement
+&lt;a3&gt; rdf:type sb:Annotation
+# Properties for Enhancement
+&lt;a3&gt; sb:extracted-from &lt;ci&gt;
+&lt;a3&gt; dc:requires &lt;a1&gt;
+&lt;a3&gt; dc:creator urn:stanbol.engines:entityTaggingEngine
+&lt;a3&gt; dc:created &quot;2011-02-28T12:13:19Z&quot;
+# Properties for Annotation
+&lt;a3&gt; dc:title &quot;Paris, Texas&quot;
+&lt;a3&gt; dc:role sb:Suggestion
+&lt;a3&gt; dc:type dbpedia-ont:Place
+&lt;a3&gt; sb:entity http://dbpedia.org/resource/Paris,_Texas
+&lt;a3&gt; sb:entity-type dbpedia-ont:City, dbpedia-ont:Settlement, dbpedia-ont:PopulatedPlace, dbpedia-ont:Place
+&lt;a3&gt; sb:confidence 12.34
+
+# travel as detected keyword
+&lt;a4&gt; rdf:type sb:Enhancement
+&lt;a4&gt; rdf:type sb:Annotation
+# Properties for Enhancement
+&lt;a4&gt; sb:extracted-from &lt;ci&gt;
+&lt;a4&gt; dc:creator urn:stanbol.engines:keywordExtractionEngine
+&lt;a4&gt; dc:created &quot;2011-02-28T12:13:22Z&quot;
+# Properties for Annotation
+&lt;a4&gt; dc:title &quot;travel&quot;
+&lt;a4&gt; dc:role sb:Keyword
+&lt;a4&gt; dc:type dbpedia-ont:Activity # can we expect this to be available -&gt; probably not
+</pre></div>
+
+
+<p>When consuming these enhancements, queries such as the following could be used:</p>
+<p>Getting all Tags (to get all Keywords or Categories, replace sb:Tag with sb:Keyword or sb:Category):</p>
+<div class="codehilite"><pre>PREFIX dc: &lt;http://purl.org/dc/terms/&gt;
+PREFIX sb: &lt;http://stanbol.apache.org/ontology/1.0/&gt;    
+SELECT ?id ?title ?type
+WHERE {
+    ?id dc:role sb:Tag .
+    ?id dc:title ?title .
+    OPTIONAL { ?id dc:type ?type }
+}
+</pre></div>
+
+
+<p>Getting suggestions for a known Annotation (e.g. urn:annotation1):</p>
+<div class="codehilite"><pre>PREFIX dc: &lt;http://purl.org/dc/terms/&gt;
+PREFIX sb: &lt;http://stanbol.apache.org/ontology/1.0/&gt;    
+SELECT ?entity ?title ?type ?score
+WHERE {
+    &lt;urn:annotation1&gt; sb:suggestion ?id .
+    ?id dc:title ?title .
+    ?id sb:entity ?entity .
+    OPTIONAL { ?id sb:entity-type ?type } .
+    OPTIONAL { ?id sb:confidence ?score }
+}
+</pre></div>
+
+
+<p>Getting all selected Entities within the Text</p>
+<div class="codehilite"><pre>PREFIX dc: &lt;http://purl.org/dc/terms/&gt;
+PREFIX sb: &lt;http://stanbol.apache.org/ontology/1.0/&gt;    
+SELECT ?id ?title ?start ?end ?type
+WHERE {
+    ?id dc:role sb:Tag .
+    ?id dc:title ?title .
+    ?id sb:start ?start .
+    ?id sb:end ?end .
+    OPTIONAL { ?id dc:type ?type }
+}
+</pre></div>
+
+
+<p>Getting all Locations and optionally the occurrences within the text</p>
+<div class="codehilite"><pre>PREFIX dc: &lt;http://purl.org/dc/terms/&gt;
+PREFIX sb: &lt;http://stanbol.apache.org/ontology/1.0/&gt;    
+PREFIX dbpedia-ont: &lt;http://dbpedia.org/ontology/&gt;  
+SELECT ?id ?title ?start ?end
+WHERE {
+    ?id dc:type dbpedia-ont:Place .
+    ?id dc:title ?title .
+    OPTIONAL {
+        ?id sb:start ?start .
+        ?id sb:end ?end
+    }
+}
+</pre></div>
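+<p>To make the matching behaviour of the "Getting all Tags" query explicit, the following Python sketch emulates it over a handful of triples copied from the example graph. It is plain Python rather than a SPARQL engine, and the helper names objects and annotations_with_role are hypothetical.</p>

```python
# Triples taken from the example graph above (only those the query touches).
TRIPLES = [
    ("a1", "dc:role", "sb:Tag"),
    ("a1", "dc:title", "Paris"),
    ("a1", "dc:type", "dbpedia-ont:Place"),
    ("a2", "dc:role", "sb:Suggestion"),
    ("a2", "dc:title", "Paris"),
    ("a4", "dc:role", "sb:Keyword"),
    ("a4", "dc:title", "travel"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def annotations_with_role(triples, role):
    """Emulate the query: dc:role and dc:title are required, dc:type is optional."""
    rows = []
    for s, p, o in triples:
        if p == "dc:role" and o == role:
            for title in objects(triples, s, "dc:title"):
                # OPTIONAL: keep the row even when no dc:type is present.
                types = objects(triples, s, "dc:type") or [None]
                rows.extend((s, title, t) for t in types)
    return rows
```

+<p>Passing "sb:Keyword" instead of "sb:Tag" mirrors the substitution described for the query, and an annotation without a dc:type still yields a row with an empty type column.</p>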
+
+
+<h3 id="enhancement-of-metadata">Enhancement of Metadata</h3>
+<p>This example shows how the Enhancement Structure allows enhancements to be created based on parsed metadata.</p>
+<p>Let's assume that a user parses a content item and an additional file providing Dublin Core metadata that includes (among others):</p>
+<ul>
+<li>dc:creator "Richard Cypher"</li>
+<li>dc:creator "Rachel Brandstone"</li>
+<li>dc:contributor "Richard Cypher"</li>
+</ul>
+<p>Further assume that both Richard and Rachel work for the company running the Stanbol Enhancer and that there is an EnhancementEngine that knows about company resources.
+This example uses the URIs "http://www.company.org/team/Richard_Cypher" and "http://www.company.org/team/Rachel_Brandstone" to identify the two example employees.</p>
+<div class="codehilite"><pre>#The content item
+&lt;ci&gt; rdf:type sb:ContentItem
+&lt;ci&gt; dc:creator &quot;Richard Cypher&quot;, &quot;Rachel Brandstone&quot;
+&lt;ci&gt; dc:contributor &quot;Richard Cypher&quot;
+&lt;ci&gt; {other Dublin Core metadata extracted from the parsed file}
+
+# Annotation describing the &quot;Richard Cypher&quot;
+# Assumed to be created by the dcAnnotationEngine with the help
+# of the entityTaggingEngine.
+&lt;a1&gt; rdf:type sb:Enhancement
+&lt;a1&gt; rdf:type sb:Annotation
+&lt;a1&gt; rdf:type sb:Occurrence
+&lt;a1&gt; rdf:type sb:MetadataOccurrence
+# Properties for Enhancement
+&lt;a1&gt; sb:extracted-from &lt;ci&gt;
+&lt;a1&gt; dc:creator urn:stanbol.engines:dcAnnotationEngine
+&lt;a1&gt; dc:contributor urn:stanbol.engines:entityTaggingEngine
+&lt;a1&gt; dc:created &quot;2011-02-28T13:14:15Z&quot;
+# Properties for Annotation
+&lt;a1&gt; dc:title &quot;Richard Cypher&quot;
+&lt;a1&gt; dc:role sb:Tag
+&lt;a1&gt; dc:type dbpedia-ont:Person
+&lt;a1&gt; sb:confidence 1.0
+&lt;a1&gt; sb:entity http://www.company.org/team/Richard_Cypher
+&lt;a1&gt; sb:entity-type foaf:Agent, foaf:Person, vCard:Contact
+# Properties for MetadataOccurrence
+&lt;a1&gt; sb:field dc:creator, dc:contributor
+&lt;a1&gt; sb:value &quot;Richard Cypher&quot;
+
+# Annotation describing the &quot;Rachel Brandstone&quot;
+&lt;a2&gt; rdf:type sb:Enhancement
+&lt;a2&gt; rdf:type sb:Annotation
+&lt;a2&gt; rdf:type sb:Occurrence
+&lt;a2&gt; rdf:type sb:MetadataOccurrence
+# Properties for Enhancement
+&lt;a2&gt; sb:extracted-from &lt;ci&gt;
+&lt;a2&gt; dc:creator urn:stanbol.engines:dcAnnotationEngine
+&lt;a2&gt; dc:contributor urn:stanbol.engines:entityTaggingEngine
+&lt;a2&gt; dc:created &quot;2011-02-28T13:14:22Z&quot;
+# Properties for Annotation
+&lt;a2&gt; dc:title &quot;Rachel Brandstone&quot;
+&lt;a2&gt; dc:role sb:Tag
+&lt;a2&gt; dc:type dbpedia-ont:Person
+&lt;a2&gt; sb:confidence 1.0
+&lt;a2&gt; sb:entity http://www.company.org/team/Rachel_Brandstone
+&lt;a2&gt; sb:entity-type foaf:Agent, foaf:Person, vCard:Contact
+# Properties for MetadataOccurrence
+&lt;a2&gt; sb:field dc:creator
+&lt;a2&gt; sb:value &quot;Rachel Brandstone&quot;
+</pre></div>
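+<p>The annotation for Richard lists two fields because the same literal appears under both dc:creator and dc:contributor. A minimal Python sketch of that grouping step follows. The function name metadata_occurrences and the dictionary layout are assumptions made for illustration and not the behaviour of an existing engine.</p>

```python
def metadata_occurrences(metadata):
    """Group (field, value) pairs into one occurrence per distinct value.

    Each occurrence collects every field in which the value was found,
    preserving the order in which fields were first seen.
    """
    grouped = {}
    for field, value in metadata:
        fields = grouped.setdefault(value, [])
        if field not in fields:
            fields.append(field)
    return [{"sb:value": value, "sb:field": fields}
            for value, fields in grouped.items()]
```

+<p>Applied to the three Dublin Core statements of this example, the sketch yields one occurrence for "Richard Cypher" with two fields and one for "Rachel Brandstone" with a single field.</p>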
+
+
+<p><em>NOTE</em>: One could also create two sb:Annotations each for Richard and Rachel, one describing the annotated value and a second suggesting the entity for the first. That seems like unnecessary complexity as long as there is only one person with this name in the company. Nonetheless this decision needs to be reviewed.
+The following shows the code for Richard when using this variant.</p>
+<div class="codehilite"><pre>#Annotation describing &quot;Richard Cypher&quot; as extracted from the DC description
+&lt;a1&gt; rdf:type sb:Enhancement
+&lt;a1&gt; rdf:type sb:Annotation
+&lt;a1&gt; rdf:type sb:Occurrence
+&lt;a1&gt; rdf:type sb:MetadataOccurrence
+# Properties for Enhancement
+&lt;a1&gt; sb:extracted-from &lt;ci&gt;
+&lt;a1&gt; dc:creator urn:stanbol.engines:dcAnnotationEngine
+&lt;a1&gt; dc:created &quot;2011-02-28T13:14:15Z&quot;
+# Properties for Annotation
+&lt;a1&gt; dc:title &quot;Richard Cypher&quot;
+&lt;a1&gt; dc:role sb:Tag
+&lt;a1&gt; dc:type dbpedia-ont:Person
+&lt;a1&gt; sb:confidence 1.0
+&lt;a1&gt; sb:suggestion &lt;a3&gt;
+# Properties for MetadataOccurrence
+&lt;a1&gt; sb:field dc:creator, dc:contributor
+&lt;a1&gt; sb:value &quot;Richard Cypher&quot;
+
+# Annotation describing the employee Richard Cypher
+&lt;a3&gt; rdf:type sb:Enhancement
+&lt;a3&gt; rdf:type sb:Annotation
+# Properties for Enhancement
+&lt;a3&gt; sb:extracted-from &lt;ci&gt;
+&lt;a3&gt; dc:requires &lt;a1&gt;
+&lt;a3&gt; dc:creator urn:stanbol.engines:entityTaggingEngine
+&lt;a3&gt; dc:created &quot;2011-02-28T13:14:18Z&quot;
+# Properties for Annotation
+&lt;a3&gt; dc:title &quot;Richard Cypher&quot;
+&lt;a3&gt; dc:role sb:Suggestion
+&lt;a3&gt; dc:type dbpedia-ont:Person
+&lt;a3&gt; sb:entity http://www.company.org/team/Richard_Cypher
+&lt;a3&gt; sb:entity-type foaf:Agent, foaf:Person, vCard:Contact
+&lt;a3&gt; sb:confidence 8.76
+</pre></div>
+
+
+<p>When consuming these enhancements, queries such as the following could be used:</p>
+<p>Getting all Annotations for the dc:creator field</p>
+<p>Version based on variant 1:</p>
+<div class="codehilite"><pre>PREFIX dc: &lt;http://purl.org/dc/terms/&gt;
+PREFIX sb: &lt;http://stanbol.apache.org/ontology/1.0/&gt;    
+SELECT ?id ?title ?creatorId
+WHERE {
+    ?id dc:title ?title .
+    ?id sb:entity ?creatorId .
+    ?id sb:field dc:creator.
+}
+</pre></div>
+
+
+<p>Version for variant 2:</p>
+<div class="codehilite"><pre>PREFIX dc: &lt;http://purl.org/dc/terms/&gt;
+PREFIX sb: &lt;http://stanbol.apache.org/ontology/1.0/&gt;    
+SELECT ?id ?title ?creatorId
+WHERE {
+    ?ma sb:field dc:creator .
+    ?ma sb:suggestion ?id . 
+    ?id dc:title ?title .
+    ?id sb:entity ?creatorId .
+}
+</pre></div>
+
+
+<p>Getting all Annotations created for DC properties</p>
+<p>Version based on variant 1:</p>
+<div class="codehilite"><pre>PREFIX dc: &lt;http://purl.org/dc/terms/&gt;
+PREFIX sb: &lt;http://stanbol.apache.org/ontology/1.0/&gt;    
+SELECT ?id ?title ?field ?entity
+WHERE {
+    ?id dc:title ?title .
+    ?id sb:entity ?entity .
+    ?id sb:field ?field.
+    FILTER(REGEX(STR(?field), &quot;^http://purl.org/dc/terms/&quot;))
+}
+</pre></div>
+
+
+<p>Version based on variant 2:</p>
+<div class="codehilite"><pre>PREFIX dc: &lt;http://purl.org/dc/terms/&gt;
+PREFIX sb: &lt;http://stanbol.apache.org/ontology/1.0/&gt;    
+SELECT ?id ?title ?field ?entity
+WHERE {
+    ?ma sb:field ?field .
+    ?ma sb:suggestion ?id . 
+    ?id dc:title ?title .
+    ?id sb:entity ?entity .
+    FILTER(REGEX(STR(?field), &quot;^http://purl.org/dc/terms/&quot;))
+}
+</pre></div>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/entityhub/entityhubandlinkeddata.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/entityhub/entityhubandlinkeddata.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/0.9.0-incubating/entityhub/entityhubandlinkeddata.html Wed Apr 11 08:30:47 2012
@@ -0,0 +1,289 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - </title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title"></h1>
+    <h1 id="adopting-linked-media-principles-for-stanbol-entityhub">Adopting Linked Media principles for Stanbol Entityhub</h1>
+<p><a href="http://linkeddata.org/">Linked Data</a> describes the idea of linking formerly unconnected bits of data over the web. Think about how hyperlinks are used to navigate within the Web of documents; Linked Data tries to do the same for the Web of Data. This basic idea is also central to most of the Apache Stanbol components. However, Stanbol is concerned not only with linking data but also with interlinking the web of documents with the web of data. Therefore <a href="http://lists.w3.org/Archives/Public/public-lod/2011May/0019.html">this proposal</a> to extend Linked Data principles to also support content, and not just data, seems like a natural fit for Apache Stanbol.</p>
+<p>This document first provides a short introduction to Linked Data and the proposed Linked Media extensions. The second part analyses the requirements of the Stanbol Entityhub related to Linked Data and Linked Media. The third section then goes into more detail on how Linked Media principles could be implemented by the Entityhub.</p>
+<h2 id="short-introduction-to-linked-data-and-proposed-linked-media-extensions">Short Introduction to Linked Data and proposed Linked Media extensions</h2>
+<p>From <a href="http://linkeddata.org/faq">linkeddata.org</a>: </p>
+<blockquote>
+<h3 id="what-is-linked-data">What is Linked Data?</h3>
+<p>The Web enables us to link related documents. Similarly it enables us to link related data. 
+The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. 
+Key technologies that support Linked Data are URIs (a generic means to identify entities or 
+concepts in the world), HTTP (a simple yet universal mechanism for retrieving resources, 
+or descriptions of resources), and RDF (a generic graph-based data model with which to 
+structure and link data that describes things in the world).</p>
+</blockquote>
+<p>The following terminology is often used with Linked Data:</p>
+<ul>
+<li>Resources: All items of interest that are to be published on the Web.</li>
+<li>Information Resources: All documents on the Web (text, images, videos ...)</li>
+<li>Non-Information Resources: Real-world objects that exist outside of the Web (Persons, Organizations, Places ...) but also social concepts (Categories, Terminologies …).</li>
+<li>Resource Identifiers: Linked Data recommends using only HTTP URIs as identifiers, because this allows information about the resource to be accessed directly over the web.</li>
+<li>Representation: A stream of bytes in a certain format that describes an Information Resource. Representations can be available in different formats.</li>
+<li>Dereferencing of HTTP URIs: For Information Resources the content is directly returned. For Non-Information Resources the HTTP status "303 See Other" with a link to the Information Resource describing the Non-Information resource is returned.</li>
+<li>Content Negotiation: Users can select the format (content type) of the returned Representation by setting the "Accept" header in requests. Linked Data recommends using different URIs for Representations of different content types to allow bookmarking. The "Accept" header of the request is therefore used to decide which URI is returned with a "303 See Other" response.</li>
+<li>URI Aliases: If different providers publish information about the same Non-Information Resource (e.g. a famous Person, a Country, ...) then "<a href="http://www.w3.org/TR/owl-ref/#sameAs-def">owl:sameAs</a>" relations are used to tell clients that two different Resource Identifiers (HTTP URIs) identify the same Resource.</li>
+</ul>
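+<p>The interplay of dereferencing and content negotiation described in the list above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name dereference, the suffix table and the "406" fallback are hypothetical, and quality values ("q=") in the Accept header are ignored.</p>

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from media type to the suffix of the representation URI.
SUFFIX_BY_TYPE: Dict[str, str] = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
}


def dereference(uri: str, is_information_resource: bool,
                accept: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header value or None)."""
    if is_information_resource:
        return 200, None  # the content itself is returned directly
    # Non-Information Resource: redirect to a describing Information Resource.
    for media_range in accept.split(","):
        media_type = media_range.split(";")[0].strip()
        if media_type in SUFFIX_BY_TYPE:
            return 303, uri + SUFFIX_BY_TYPE[media_type]
    return 406, None  # assumption: no acceptable representation available
```

+<p>Because each content type maps to its own URI, the redirect target can be bookmarked independently of the Accept header that was sent.</p>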
+<p>A more detailed overview is provided by the <a href="http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/">Linked Data Tutorial</a>.</p>
+<h3 id="linked-media">Linked Media</h3>
+<p>The <a href="http://lists.w3.org/Archives/Public/public-lod/2011May/0019.html">Linked Media proposal</a> tries to extend Linked Data by two features.</p>
+<ol>
+<li>Creating and updating of resources: Linked Data currently covers only retrieval of information, which is sufficient for sites like <a href="http://dbpedia.org">DBpedia</a> or <a href="http://www.geonames.org">Geonames</a> where users are only able to consume data. Interactive (web) applications, however, need to be able to create, update and remove information. These features are currently not covered by Linked Data, but are well defined for RESTful services. The Linked Media proposal therefore suggests using HTTP PUT, POST and DELETE requests for this purpose.</li>
+<li>Handling both content and metadata: Linked Data uses Content Negotiation to select suitable content types. In addition it provides means to redirect to Information Resources about Non-Information Resources. However, Linked Data does not differentiate between metadata and content. One cannot explicitly ask first for a GIF image and later for its metadata as RDF, or first for an HTML blog post and later for its metadata formatted as HTML. Such a differentiation is only supported for Non-Information Resources, e.g. for a famous painting (Non-Information Resource) and a photo (Information Resource). Linked Media proposes to use the "rel" parameter of the Accept header to allow users to explicitly ask for content ("Accept: type/subtype; rel=content") or metadata ("Accept: type/subtype; rel=meta").</li>
+</ol>
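+<p>How a server might read the proposed "rel" parameter from a single Accept value can be sketched in Python as follows. The function name parse_accept_rel is hypothetical, and treating a missing "rel" parameter as a request for content is an assumption of this sketch rather than something stated here.</p>

```python
from typing import Tuple


def parse_accept_rel(accept: str) -> Tuple[str, str]:
    """Split 'type/subtype; rel=meta' into (media type, rel)."""
    parts = [part.strip() for part in accept.split(";")]
    media_type = parts[0]
    rel = "content"  # assumed default when no rel parameter is given
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "rel":
            rel = value.strip().strip('"')
    return media_type, rel
```

+<p>With this split, the same URI can serve a GIF image for ("image/gif", "content") and its RDF description for ("application/rdf+xml", "meta").</p>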
+<p>For a more detailed description please follow the link to the <a href="http://lists.w3.org/Archives/Public/public-lod/2011May/0019.html">Linked Media proposal</a> [1] as posted by Sebastian Schaffert on the linked open data mailing list of W3C. You might also be interested in reading the following discussion. Note also <a href="http://code.google.com/p/kiwi/source/browse/kiwi-core/src/main/java/kiwi/core/webservices/resource/ResourceWebService.java">ResourceWebService</a> [2], a first implementation of the Linked Media proposal based on the <a href="http://code.google.com/p/kiwi/">Kiwi2/Linked Media Framework</a> [3][4].<br />
+</p>
+<h2 id="requirements-of-the-stanbol-entityhub">Requirements of the Stanbol Entityhub</h2>
+<p>This section tries to identify requirements of the Stanbol Entityhub related to Linked Data and Linked Media. The goal of this analysis is to identify where it makes sense to adopt Linked Data/Media principles for the RESTful interface of the Entityhub.</p>
+<p>The Entityhub fulfills two requirements: </p>
+<ol>
+<li>It allows a network of referenced sites to be defined and managed, from which information about entities is retrieved. In addition, the Entityhub supports the use of local caches to speed up access and to become independent of the availability of remote services. </li>
+<li>It manages its own (local) site that is used to manage local entities. Such entities can be created locally, but it is also possible to import them from any referenced site. Typical examples of locally managed entities are customers, employees, concepts of a company thesaurus, offices, meeting rooms ... </li>
+</ol>
+<h3 id="entity-model-of-the-entityhub">Entity Model of the Entityhub</h3>
+<p>Entities managed by the Entityhub first of all define a unique ID. If the referenced site follows Linked Data principles, this is the HTTP URI of the Non-Information Resource. However, it might be any valid URI (including URNs). The URI prefix of locally managed entities is configurable, so the URI type of locally managed entities depends on the configuration. The Entity itself represents a Non-Information Resource. Each Entity comes with a Representation, which holds all information known by the site about the entity. In Linked Data terminology, the Representation is the Information Resource to which a user is redirected when requesting the Entity (Non-Information Resource). Finally, an Entity also links to the ID of the (referenced) site managing it. This allows users to track who is providing the information for an Entity.</p>
+<p>Currently the Entityhub distinguishes three different types of Entities:</p>
+<ol>
+<li>Sign: All Entities managed by referenced sites</li>
+<li>Symbol: All locally managed Entities. Symbols hold additional metadata such as a preferred label and a state.</li>
+<li>EntityMapping: Mappings from Symbols to Signs. Linked Data typically uses owl:sameAs to define such mappings; however, in the case of the Entityhub such mappings need to hold additional meta information such as the state, expiry date of the mapping ...</li>
+</ol>
+<p>Metadata such as license, copyright statements and attributions, as well as information about the organization managing a referenced site, is managed with the referenced site and not with single entities.</p>
+<p>In terms of Linked Data principles, all the additional information provided by these three Entity types, as well as the additional metadata provided for referenced sites, is metadata about the Information Resource (the Representation) and not about the Non-Information Resource (the Entity).</p>
+<p>Therefore the Entityhub manages:</p>
+<ul>
+<li>Non-Information Resources: All the Entities of referenced Sites as well as locally managed Entities</li>
+<li>Content: All Representations about Entities</li>
+<li>Metadata: Additional information about Representations such as license, copyrights, attributions as well as mappings to other entities.</li>
+</ul>
+<h3 id="restful-services-of-the-entityhub">RESTful Services of the Entityhub</h3>
+<p>The Entityhub defines the following service endpoints:</p>
+<ol>
+<li>The (referenced) Site Manager: Provides retrieval and search over all referenced sites.</li>
+<li>The (referenced) Site Endpoint: Provides the same interface, but for a specific referenced site.</li>
+<li>The Entityhub Endpoint: Provides full read/write and retrieval access for locally managed Entities.</li>
+</ol>
+<p>Therefore the Entityhub needs to support read-only access for Entities managed by referenced sites and full read/write access (CRUD) for locally managed Entities.</p>
+<h3 id="summary">Summary</h3>
+<p>Consuming Linked Data:</p>
+<ul>
+<li>Consume Linked Data from remote sites</li>
+<li>Search resources on remote sites based on labels/language and type (by using SPARQL)</li>
+</ul>
+<p>Referenced Entities (Entities of Referenced Sites)</p>
+<ul>
+<li>Support local management of additional metadata for referenced entities (e.g. mappings to local entities)</li>
+<li>Support merging of remote metadata (e.g. defined by "foaf:primaryTopic") with local ones (e.g. mappings to local entities)</li>
+<li>Provide Content + Metadata - as proposed by Linked Media - even for referenced entities.</li>
+<li>Support Search for Entities based on labels/language and type</li>
+</ul>
+<p>Local Entities (Entities managed by the Entityhub)</p>
+<ul>
+<li>Provide local Entities as Linked Media (full CRUD support; management of Content and Metadata)</li>
+<li>Support creation of local entities based on referenced ones</li>
+<li>Support finding of additional mappings based on owl:sameAs relations</li>
+<li>Support importing of metadata for mapped entities (e.g. to correctly handle attribution requirements)</li>
+<li>Support Enabling/disabling the use of redirects</li>
+<li>Support Search for Entities based on labels/language and type</li>
+</ul>
+<p>Based on this evaluation of the model and the services provided by the Entityhub, the proposed Linked Media extension to the Linked Data principles would be sufficient to cover most of the functionality the Entityhub exposes as RESTful services. For referenced sites only the distinction between Metadata and Content is needed, whereas for locally managed Entities the ability to create, update and remove Entities, their Representations (content) and metadata is also of central importance. The main functionality not covered is the import of Entities from referenced sites. In addition, for functionality such as the creation of mappings and the management of the Entity workflow, special additions to the generic Linked Media/Linked Data API would be useful.</p>
+<h2 id="specific-considerations">Specific Considerations</h2>
+<p>This section contains Entityhub specific considerations about some of the principles defined for Linked Data and Linked Media. </p>
+<h3 id="resource-identifier">Resource Identifier</h3>
+<p>Linked Data defines the principle of using HTTP URIs as resource identifiers so that one can retrieve data by directly accessing the URI of a resource. This does not work for the Entityhub because it also needs to manage remote entities, and even for local entities it will not always be an option. The RESTful interface therefore also needs to support an alternative that allows passing the URI of an entity as a parameter. This is also required so that the IDs of entities are not affected when the Entityhub is deployed on a different host, or even accessed via localhost. In addition, it allows the use of other URI types (mainly URNs, but also other schemes such as LDAP) as identifiers for locally managed entities.</p>
+<h3 id="redirects-for-content-negotiation">Redirects for Content Negotiation</h3>
+<p>It is important to consider that Entities are Non-Information Resources and that, based on Linked Data principles, requests for Non-Information Resources need to be answered with redirects ("303 See Other") to the URI of the Information Resource. In practice such redirects are used for two things:</p>
+<ol>
+<li>
+<p>To allow users to directly access (and bookmark) URIs of a specific format and therefore bypass content negotiation. This is mainly because browsers do not allow users to define the "Accept" header. Without this indirection typical users would be unable to retrieve formats other than HTML.</p>
+<p>For the Entityhub, where most requests will be issued by clients that support the "Accept" header, the use of redirects seems unfavorable for two reasons. First, it doubles the number of requests and adds an additional RTT (round trip time). Second, browsers always issue a GET request when following a redirect, independent of the method of the initial request. This can cause problems when returning redirects for POST, PUT and DELETE requests. For the Entityhub it would therefore make sense to provide a way to activate/deactivate the use of redirects (e.g. via a configuration, a request property or even a header field).</p>
+</li>
+<li>
+<p>To attach metadata to the Information Resources. As an example take the <a href="http://data.nytimes.com">Linked Data endpoint of the New York Times</a>. It uses "http://data.nytimes.com/{uuid}" for Entities and "http://data.nytimes.com/{uuid}.rdf" for the RDF XML representations. When looking at the representations provided for Entities (e.g. take <a href="http://data.nytimes.com/N25800450843199534421">North Carolina</a>) one can see that triples using "http://data.nytimes.com/{uuid}" as subject are data about North Carolina, whereas triples that use "http://data.nytimes.com/{uuid}.rdf" as subject represent metadata. Note also that the metadata is connected to the representation of North Carolina by the <a href="http://xmlns.com/foaf/0.1/primaryTopic">foaf:primaryTopic</a> relation. </p>
+<p>When using the extensions proposed by Linked Media, it would be possible to refer directly to the metadata by setting the "rel" parameter of the "Accept" header to "meta". A request defining "Accept: application/rdf+xml; rel=meta" would therefore - assuming that redirects are deactivated - directly return the metadata for the requested entity (e.g. the license) encoded as RDF XML. If redirects are enabled it would return a "303 See Other" with the URI of the metadata.</p>
+<p>Note that - in principle - there are two kinds of redirects: (1) redirects between Resources, which include redirects from Entities to the Representation ("rel=content") as well as to the Metadata ("rel=meta"); (2) redirects used for Content Negotiation. It would therefore be possible to allow these two types to be enabled/disabled separately. </p>
+<p>Also note that in cases where several redirects would be needed to reach the final resource (e.g. when requesting information about a Non-Information Resource in "text/html": Non-Information Resource -&gt; Information Resource -&gt; HTML version), the request will directly return the final destination. </p>
+</li>
+</ol>
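+<p>The redirect behaviour discussed above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name <code>redirect_target</code> is hypothetical, the ".rep" and ".meta" suffixes are the ones proposed in the "URI scheme for Resources" section of this document, and limiting redirects to GET requests is one possible way to avoid the POST/PUT/DELETE problem described above, not a decision taken by this proposal.</p>

```python
from typing import Optional


def redirect_target(entity_uri: str, rel: str = "content",
                    redirects_enabled: bool = True,
                    method: str = "GET") -> Optional[str]:
    """Return the Location for a "303 See Other", or None to answer directly.

    Assumption: redirects are only issued for GET requests, because clients
    following a redirect would turn other methods into GET.
    """
    if not redirects_enabled or method.upper() != "GET":
        return None
    # "rel=meta" selects the metadata, anything else the representation.
    suffix = ".meta" if rel == "meta" else ".rep"
    return entity_uri + suffix
```

+<p>A return value of <code>None</code> means the server would encode the selected resource (content or metadata) directly in the response instead of redirecting.</p>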
+<h2 id="redesigning-the-entityhub">Redesigning the Entityhub</h2>
+<p>This section evaluates necessary changes to the Entityhub.</p>
+<h3 id="uri-scheme-for-resources">URI scheme for Resources</h3>
+<p>Supporting Linked Data requires the use of a local URI. This is in contrast to the parameter-based approach ("?id={remoteURI}") currently used by the Entityhub. The goal is that the Entityhub allows both variants:</p>
+<div class="codehilite"><pre><span class="n">http:</span><span class="sr">//</span><span class="p">{</span><span class="n">host</span><span class="p">}</span><span class="sr">/entityhub/</span><span class="p">{</span><span class="n">site</span><span class="p">}</span><span class="sr">/entity/</span><span class="p">{</span><span class="n">localname</span><span class="p">}</span> <span class="ow">and</span>
+<span class="n">http:</span><span class="sr">//</span><span class="p">{</span><span class="n">host</span><span class="p">}</span><span class="sr">/entityhub/</span><span class="p">{</span><span class="n">site</span><span class="p">}</span><span class="o">/</span><span class="n">entity</span><span class="p">?</span><span class="n">uri</span><span class="o">=</span><span class="p">{</span><span class="n">uri</span><span class="p">}</span>
+</pre></div>
+
+
+<p>to refer to an Entity. This requires that the Entityhub provides a local HTTP URI for any (local or remote) entity. The suggestion is to use the local name of the remote entity, or the MD5 of the whole URI in cases where this is not possible.</p>
+<p>To support the redirects as defined by Linked Data it is also necessary to generate separate URIs for Representations. To support the differentiation between Content and Metadata, a separate URI is also needed for the metadata.</p>
+<p>The proposal is to use file-extension-like additions to the local name of Entities:</p>
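+<p>The two variants can be illustrated with a short client-side sketch. The base URL, the function names and the rule used to extract a local name (the fragment, otherwise the last path segment, accepted only if it contains no characters such as "." or ":") are assumptions made for this example; the proposal itself only states that the local name or the MD5 of the whole URI should be used.</p>

```python
import hashlib
from urllib.parse import quote, urlsplit

BASE = "http://localhost:8080/entityhub"  # hypothetical host and port


def local_name(remote_uri: str) -> str:
    """Derive a local name; fall back to the MD5 of the whole URI."""
    parts = urlsplit(remote_uri)
    # Assumption: the local name is the fragment, else the last path segment.
    candidate = parts.fragment or parts.path.rsplit("/", 1)[-1]
    # Reject names containing characters such as "." or ":".
    if candidate and candidate.replace("-", "").replace("_", "").isalnum():
        return candidate
    return hashlib.md5(remote_uri.encode("utf-8")).hexdigest()


def path_uri(site: str, remote_uri: str) -> str:
    """Local HTTP URI: http://{host}/entityhub/{site}/entity/{localname}"""
    return f"{BASE}/{site}/entity/{local_name(remote_uri)}"


def param_uri(site: str, remote_uri: str) -> str:
    """Parameter variant: http://{host}/entityhub/{site}/entity?uri={uri}"""
    return f"{BASE}/{site}/entity?uri={quote(remote_uri, safe='')}"
```

+<p>Percent-encoding the whole remote URI in the parameter variant keeps characters such as "?", "#" and "&amp;" of the entity URI from being interpreted as part of the request URI.</p>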
+<div class="codehilite"><pre><span class="n">http:</span><span class="sr">//</span><span class="p">{</span><span class="n">host</span><span class="p">}</span><span class="sr">/entityhub/</span><span class="p">{</span><span class="n">site</span><span class="p">}</span><span class="sr">/entity/</span><span class="p">{</span><span class="n">localname</span><span class="p">}</span><span class="o">.</span><span class="n">rep</span>
+</pre></div>
+
+
+<p>is used to refer directly to the Representation of an Entity - in Linked Media terminology the Information Resource. Note that the local HTTP URI is used as the base for the ".rep" extension; "?uri={uri}.rep" will not be supported. Users of the Entityhub can therefore use the ".rep" extension to directly access the content for an Entity. Note that content negotiation will still be needed when requesting this kind of URI.</p>
+<p>Similar to the above the ".meta" extension will be used for constructing URIs for the metadata:</p>
+<div class="codehilite"><pre><span class="n">http:</span><span class="sr">//</span><span class="p">{</span><span class="n">host</span><span class="p">}</span><span class="sr">/entityhub/</span><span class="p">{</span><span class="n">site</span><span class="p">}</span><span class="sr">/entity/</span><span class="p">{</span><span class="n">localname</span><span class="p">}</span><span class="o">.</span><span class="n">meta</span>
+</pre></div>
+
+
+<p>For referenced entities such representations will be created by merging remote metadata with locally managed metadata. Remote metadata will be recognized as Resources with a <a href="http://xmlns.com/foaf/0.1/primaryTopic">foaf:primaryTopic</a> relation to the Entity. Local metadata can include information known for the referenced site (e.g. license, copyright, attributions, information about the managing organization ...) as well as mappings to other (locally managed) entities.</p>
+<p>For locally managed Entities the metadata will also include all the additional information as currently defined by the Symbol API (state, predecessors, successors).</p>
+<p>Note that the URIs for Representations and Metadata are optional and will be omitted based on HTTP request headers in case redirects are disabled. However even in case that redirects are disabled it is still possible to use such URIs for requests.</p>
+<h3 id="uri-scheme-for-content-negotiation">URI scheme for Content Negotiation</h3>
+<p>To conform with the Linked Data principles the Entityhub needs to provide a unique HTTP URI for every content type into which Information Resources (Content and Metadata Resources) can be serialized. As with the ".rep" and ".meta" extensions used to directly access Representations and their Metadata, the proposal is to use file extensions to indicate the media type. If users wish to pass the remote URI as a parameter, it is also possible to pass the extension or the media type as a parameter.</p>
+<div class="codehilite"><pre><span class="n">http:</span><span class="sr">//</span><span class="p">{</span><span class="n">host</span><span class="p">}</span><span class="sr">/entityhub/</span><span class="p">{</span><span class="n">site</span><span class="p">}</span><span class="sr">/entity/</span><span class="p">{</span><span class="n">localname</span><span class="p">}</span><span class="o">.</span><span class="p">{</span><span class="n">extension</span><span class="p">}</span> <span class="ow">or</span>
+<span class="n">http:</span><span class="sr">//</span><span class="p">{</span><span class="n">host</span><span class="p">}</span><span class="sr">/entityhub/</span><span class="p">{</span><span class="n">site</span><span class="p">}</span><span class="o">/</span><span class="n">entity</span><span class="p">?</span><span class="n">uri</span><span class="o">=</span><span class="p">{</span><span class="n">uri</span><span class="p">}</span><span class="o">&amp;</span><span class="nb">format</span><span class="o">=</span><span class="p">{</span><span class="n">extension</span><span class="p">}</span><span class="o">&amp;</span><span class="n">mediaType</span><span class="o">=</span><span class="p">{</span><span class="n">mediatype</span><span class="p">}</span>
+</pre></div>
+
+
+<p>This shows the case where the extension is added directly to the local URI of the entity. In this case the "rel" parameter of the Accept header would be used to determine whether the content (the representation) or the metadata needs to be encoded in the response. If not specified, the representation will be returned.</p>
+<p>To also allow the representation or the metadata to be addressed directly in a specific format, the Entityhub also supports the following two variants: </p>
+<div class="codehilite"><pre><span class="n">http:</span><span class="sr">//</span><span class="p">{</span><span class="n">host</span><span class="p">}</span><span class="sr">/entityhub/</span><span class="p">{</span><span class="n">site</span><span class="p">}</span><span class="sr">/entity/</span><span class="p">{</span><span class="n">localname</span><span class="p">}</span><span class="o">.</span><span class="n">rep</span><span class="o">.</span><span class="p">{</span><span class="n">extension</span><span class="p">}</span>
+<span class="n">http:</span><span class="sr">//</span><span class="p">{</span><span class="n">host</span><span class="p">}</span><span class="sr">/entityhub/</span><span class="p">{</span><span class="n">site</span><span class="p">}</span><span class="sr">/entity/</span><span class="p">{</span><span class="n">localname</span><span class="p">}</span><span class="o">.</span><span class="n">meta</span><span class="o">.</span><span class="p">{</span><span class="n">extension</span><span class="p">}</span>
+</pre></div>
+
+
+<p>Note that the URIs used for content negotiation are optional and will be omitted based on HTTP request headers in case redirects are disabled. However even in case that redirects are disabled it is still possible to use such URIs for requests.</p>
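+<p>How a server might take such a URI apart can be sketched as follows. The names <code>Target</code> and <code>parse_segment</code> are hypothetical, and the sketch assumes that the last dot-separated token is a format extension unless it is "rep" or "meta"; local names that themselves contain dots would need additional handling.</p>

```python
from typing import NamedTuple, Optional

RESOURCE_SUFFIXES = {"rep", "meta"}


class Target(NamedTuple):
    localname: str
    resource: Optional[str]   # "rep", "meta" or None (decided by Accept "rel")
    extension: Optional[str]  # e.g. "rdf"; None means use content negotiation


def parse_segment(segment: str) -> Target:
    """Split the last path segment of an entity URI into its parts."""
    parts = segment.split(".")
    extension = None
    resource = None
    # A trailing token that is not "rep"/"meta" is taken as format extension.
    if len(parts) > 1 and parts[-1] not in RESOURCE_SUFFIXES:
        extension = parts.pop()
    # The next trailing token may select the representation or the metadata.
    if len(parts) > 1 and parts[-1] in RESOURCE_SUFFIXES:
        resource = parts.pop()
    return Target(".".join(parts), resource, extension)
```

+<p>Mapping the extension to a concrete media type (for example "rdf" to "application/rdf+xml") is left out here because this document does not define such a table.</p>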
+<h3 id="http-requestresponse-headers-with-special-use">HTTP Request/Response Headers with special use</h3>
+<p>This section provides information about header fields that are given special treatment by the Entityhub. The normal evaluation of headers as specified by <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html">RFC2616 section 14</a> (e.g. the use of Content-Type to read data sent with PUT/POST requests) is not described.</p>
+<h4 id="accept-header"><a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1">Accept header</a></h4>
+<p>The Accept header lets the client specify the media type it expects in the response. The <a href="http://lists.w3.org/Archives/Public/public-lod/2011May/0019.html">Linked Media proposal</a> suggests using the "rel" parameter to specify whether the response should return the data or the metadata of the requested resource. The semantics of the "rel" parameter are defined for the Link header by <a href="http://www.ietf.org/rfc/rfc5988.txt">RFC5988</a>. A related example can be found on the <a href="http://www.w3.org/wiki/LinkHeader">LinkHeader</a> page on the W3C wiki.</p>
+<p>The pattern usable for the Accept header looks like:</p>
+<div class="codehilite"><pre><span class="n">Accept:</span> <span class="p">{</span><span class="n">media</span><span class="o">-</span><span class="n">type</span><span class="p">}[;</span> <span class="n">rel</span><span class="o">=</span><span class="n">meta</span><span class="p">]</span>
+</pre></div>
+
+
+<p>If no "rel" parameter is specified the Entityhub will return the data (the representation of the entity) by default. If users want to retrieve the metadata they need to add "rel=meta". The {media-type} is always applied to the information selected by the "rel" parameter. </p>
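+<p>A minimal sketch of how the "rel" parameter could be read from an Accept header is given below. The function name <code>parse_accept</code> is hypothetical, the default value "content" follows the Linked Media naming used in this document, and "q" values as well as other parameters are deliberately ignored to keep the example short.</p>

```python
from typing import List, Tuple


def parse_accept(value: str) -> List[Tuple[str, str]]:
    """Return (media_type, rel) pairs; rel defaults to "content"."""
    result = []
    for item in value.split(","):
        media_type, *params = [p.strip() for p in item.split(";")]
        rel = "content"
        for param in params:
            name, _, val = param.partition("=")
            if name.strip().lower() == "rel":
                rel = val.strip().strip('"')
        if media_type:
            result.append((media_type.lower(), rel))
    return result
```

+<p>A request sent with "Accept: application/rdf+xml; rel=meta" would then be resolved to the metadata serialized as RDF XML.</p>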
+<h4 id="cache-control"><a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9">Cache-Control</a></h4>
+<p>The Entityhub supports the following cache-request-directives to give clients some control over the local caching of entities managed by remote sites. Note that the Stanbol OFFLINE mode takes precedence over Cache-Control directives.<br />
+</p>
+<ul>
+<li>no-cache: Entities are retrieved from the remote site even if a local cache exists (if Stanbol is not in OFFLINE mode)</li>
+<li>no-store: Entities retrieved from a remote site are not cached locally (if Stanbol is not in OFFLINE mode)</li>
+<li>no-transform: The Entityhub may be configured to transform/filter information from the remote site. This directive can be used to bypass such transformations. If transformations are applied to the local cache, this directive has no effect while Stanbol operates in OFFLINE mode</li>
+<li>only-if-cached: Representations are only returned if they are available in the local cache.</li>
+</ul>
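+<p>The interplay of these directives with the OFFLINE mode can be sketched as follows. The names <code>CachePolicy</code> and <code>cache_policy</code> are hypothetical, and letting "only-if-cached" win over "no-cache" when both are present is an assumption of this sketch, since the combination is not covered above.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePolicy:
    use_cache: bool     # may the local cache answer the request?
    query_remote: bool  # may the remote site be contacted?
    store_result: bool  # may remote results be written to the local cache?
    transform: bool     # apply configured transformations/filters?


def cache_policy(cache_control: str, offline: bool) -> CachePolicy:
    directives = {d.strip().lower() for d in cache_control.split(",") if d.strip()}
    # OFFLINE mode takes precedence: it behaves like "only-if-cached".
    only_cached = offline or "only-if-cached" in directives
    return CachePolicy(
        use_cache=only_cached or "no-cache" not in directives,
        query_remote=not only_cached,
        store_result=not only_cached and "no-store" not in directives,
        # Request intent only: data already transformed in the cache
        # cannot be "untransformed" when no remote access is possible.
        transform="no-transform" not in directives,
    )
```
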
+<h4 id="link-header"><a href="http://www.ietf.org/rfc/rfc5988.txt">Link Header</a></h4>
+<p>The Link header is central to Linked Data and Linked Media because it is used to expose the relations defined between Resources (between Entities, but also between Entities and their Representations and Metadata).</p>
+<p>The general syntax of Link headers is as follows:</p>
+<div class="codehilite"><pre><span class="n">Link:</span> <span class="o">&lt;</span><span class="p">{</span><span class="n">uri</span><span class="p">}</span><span class="o">&gt;</span><span class="p">;</span> <span class="n">rel</span><span class="o">=</span><span class="s">&quot;{relation}&quot;</span><span class="p">;</span> <span class="n">type</span><span class="o">=</span><span class="s">&quot;{media-type}&quot;</span>
+</pre></div>
+
+
+<p>The relation parameter defines the type of the relation. <a href="http://www.iana.org/assignments/link-relations/link-relations.xml">Registered relation types</a> are mainly used to improve the navigation of users. The values "content" and "meta" as suggested by the Linked Media proposal are currently not registered. In such cases <a href="http://www.ietf.org/rfc/rfc5988.txt">RFC5988</a> requires the use of absolute URIs as {relation}. This document will use "content" and "meta" instead of the full URIs as required by RFC5988.</p>
+<p>Regardless of that, the values the Entityhub uses for the "rel" parameter within the "Link" header MUST be the same as the values it supports for the "rel" parameter in the "Accept" header of requests. A pragmatic solution would be to support both the short form and a full URI.<br />
+</p>
+<p>The Entityhub will add the following Links (if applicable):</p>
+<ul>
+<li>
+<p>A reference to the Non-Information resource for the Entity by using the relation type "self". This will always use the local URI used for the resource. In case of remote entities there is also a link to the original resource.</p>
+<p>Link: &lt;http://{host}/entityhub/{site}/entity/{localname}&gt;; rel=self</p>
+</li>
+<li>
+<p>A reference to the representation of the Entity by using the relation type "content". Currently it is not intended to provide separate links to all available media types for content.</p>
+<p>Link: &lt;http://{host}/entityhub/{site}/entity/{localname}.rep&gt;; rel=content</p>
+</li>
+<li>
+<p>A reference to the metadata about the representation of the Entity by using the relation type "meta". Currently it is not intended to provide separate links to all available media types for metadata.</p>
+<p>Link: &lt;http://{host}/entityhub/{site}/entity/{localname}.meta&gt;; rel=meta</p>
+</li>
+<li>
+<p>A reference to the source in case of referenced entities. This will be the URI of the entity on the referenced site.</p>
+<p>Link: &lt;{uri}&gt;; rel=via</p>
+</li>
+<li>
+<p>A link to the license for the entity, if present.</p>
+<p>Link: &lt;{licenseURI}&gt;; rel=license</p>
+</li>
+</ul>
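+<p>The list above can be summarized in a short sketch that assembles the Link header values for one Entity. The function name <code>link_headers</code> is hypothetical; the ".rep" and ".meta" suffixes follow the URI scheme proposed earlier in this document, and the short relation names "content" and "meta" are used instead of the absolute URIs that RFC5988 would require for unregistered relation types.</p>

```python
from typing import List, Optional


def link_headers(entity_uri: str,
                 source_uri: Optional[str] = None,
                 license_uri: Optional[str] = None) -> List[str]:
    """Build the values of the Link headers for a single Entity."""
    links = [
        (entity_uri, "self"),
        (entity_uri + ".rep", "content"),
        (entity_uri + ".meta", "meta"),
    ]
    if source_uri:   # only for Entities of referenced sites
        links.append((source_uri, "via"))
    if license_uri:  # only if a license is known
        links.append((license_uri, "license"))
    # RFC 5988 syntax: the target URI is enclosed in angle brackets.
    return [f'<{uri}>; rel="{rel}"' for uri, rel in links]
```
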
+<h3 id="entity-model">Entity Model</h3>
+<p>These changes to the RESTful API should also be reflected in the Java API. Currently there are three types of Entities at the API level: Sign, Symbol and EntityMapping. The only difference between these types is the set of metadata they carry. However, there is no plan to distinguish such types at the RESTful API level.</p>
+<p>To streamline the domain model and to bring it more in line with the RESTful API the proposal is to drop the different Entity types. The Sign, Symbol and EntityMapping interfaces will be replaced by a single Entity interface with the following methods:</p>
+<div class="codehilite"><pre><span class="n">Entity</span>
+    <span class="o">+</span> <span class="n">getId</span><span class="p">()</span> <span class="p">:</span> <span class="n">String</span>
+    <span class="o">+</span> <span class="n">getSite</span><span class="p">()</span> <span class="p">:</span> <span class="n">String</span>
+    <span class="o">+</span> <span class="n">getRepresentation</span><span class="p">()</span> <span class="p">:</span> <span class="n">Representation</span>
+    <span class="o">+</span> <span class="n">getMetadata</span><span class="p">()</span> <span class="p">:</span> <span class="n">Representation</span>
+</pre></div>
+
+
+<p>Using the Representation interface for the Metadata as well allows the same parsers and serializers to be used for both content and metadata. Functionality that currently depends on the special APIs of Sign, Symbol and EntityMapping needs to be adapted to retrieve the information via the Representation interface. This should be implemented by a utility class.</p>
+<h2 id="references">References</h2>
+<p>[1] http://lists.w3.org/Archives/Public/public-lod/2011May/0019.html</p>
+<p>[2] http://code.google.com/p/kiwi/source/browse/kiwi-core/src/main/java/kiwi/core/webservices/resource/ResourceWebService.java</p>
+<p>[3] Kiwi Project: http://www.kiwi-community.eu/ Blog: http://planet.kiwi-project.eu/</p>
+<p>[4] Kiwi Source Repository: http://code.google.com/p/kiwi/</p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>