Posted to commits@stanbol.apache.org by rw...@apache.org on 2011/02/28 22:31:56 UTC
svn commit: r1075542 - in /incubator/stanbol/site/trunk/content/stanbol/docs: ./ trunk/ trunk/enhancer/ trunk/enhancer/stanbolenhancementstructure.mdtext

Author: rwesten
Date: Mon Feb 28 21:31:56 2011
New Revision: 1075542

URL: http://svn.apache.org/viewvc?rev=1075542&view=rev
Log:
STANBOL-3 First Proposal for the Stanbol Enhancement Structure that will replace the FISE Enhancement Structure currently used by the Stanbol Enhancer.

Background:
Currently the Stanbol Enhancer still uses the FISE Enhancement Structure. Changing this is unavoidable but will break all the current clients.
Therefore the current plan is to keep using the current structure for some time and switch only to a new one as soon as we also implement new features that do require an extended Enhancement Structure (e.g. support for extracting metadata from parsed content)

As discussed with ogrisel: The Issues STANBOL-12 and STANBOL-48 can and will be resolved by extending the current FISE Enhancement Structure (and therefore without breaking existing clients)

Main Goals of this Proposal:
 - start the discussion early and give people time to contribute
 - inspire usage scenarios to catch as many requirements as possible
 - propose solutions for shortcomings and missing features of the FISE Enhancement Structure

As a reminder:

The biggest shortcoming of the current FISE Enhancement Structure is how complex it is to consume (understand/parse/query) on the client side. This can, to some extent, be improved by providing clients, but a good design of the Enhancement Structure will always be a central point for the ease of use of the Stanbol Enhancer component.

In my opinion the ease of use depends on a lot of things, including
 - human readable default serialisation (JSON-LD): A flat structure that uses fewer resources, each with a lot of properties, would help with that. Having small pieces of information that link to each other randomly distributed over the whole file is a disaster typical of much serialised RDF data and something we must aim to avoid.
 - easy to read/write and modify (SPARQL) queries
 - meaningful property and concept names
 - usage of well known and understood metadata standards such as Dublin Core
 
best
Rupert Westenthaler

Added:
    incubator/stanbol/site/trunk/content/stanbol/docs/
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext

Added: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext?rev=1075542&view=auto
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext (added)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext Mon Feb 28 21:31:56 2011
@@ -0,0 +1,531 @@
+Title: The Stanbol Enhancement Structure (PROPOSAL)
+
+Please NOTE: This is a proposal for the future version of the Enhancement Structure used by the Stanbol Enhancer. This **DOES NOT** describe the Enhancement Structure used by the current version of the Stanbol Enhancer!
+
+This document describes the schema (ontology) used by the Apache Stanbol Enhancer to express features extracted from parsed content items. Its main purpose is to standardize the information created by EnhancementEngines so that users can easily work with enhancement results, and to support cooperation between different enhancement engines.
+
+## Overview
+
+The Stanbol Enhancement Structure is built around the following main concepts. Each of these concepts covers a specific aspect of the enhancement process of content.
+
+The following list gives an overview of the concepts used by the Stanbol Enhancement Structure:
+
+* **ContentItem:** This is the resource representing the parsed content. The URI of this resource depends on how the content was parsed to the Stanbol Enhancer. If an absolute URI is provided by the request, then this URI is used. In all other cases the Stanbol Enhancer creates a URI based on the configured prefix or the URL of the service. The documentation of the RESTful service should provide more information about that.
+
+* **Content:** Several content models distinguish between Content (data) and the ContentItem (interpretation of the data). The Enhancement Structure currently only defines ContentItem, because there is no need to describe the data for the purpose of the enhancement process. Other components (such as the /store endpoint) might need to formally describe the data. For such use cases the sioc:content property will be used to refer from the ContentItem to the Content. The URI representing the Content will be the same one used to retrieve its data via a RESTful service.
+
+* **Enhancement:** This provides metadata about extractions created by EnhancementEngines or present within the content. This includes the creator (usually an EnhancementEngine), the creation time, as well as relations to other enhancements. Users of the Stanbol Enhancer will typically not care about such data because, from their perspective, it represents meta-metadata (metadata about the metadata).
+
+* **Annotation:** An annotation describes a feature present within the parsed content. Such a feature can have three sources: (1) it can originate from metadata present in the parsed content, (2) it can be extracted by analyzing the content itself, and (3) it can be based on further processing of Annotations of type (1) and (2). The Annotation provides the label, the type (e.g. Person, Organization, Location), the role (e.g. Tag, Category, Keyword), the confidence and (if available) the link to the entity representing the extracted feature. It is the central concept for users that need to present all the things extracted from the parsed content.
+
+* **Occurrence:** An Occurrence describes the actual location of the feature within the content or the metadata. Based on the type of the content there will be different types of Occurrences. A "text occurrence" will contain information such as the selected text, the start/end position of the selection and the surrounding text to provide some context. An "image occurrence" will provide the top left and the bottom right position of the selected rectangle. A "metadata occurrence" will describe the property used for the annotation (e.g. dc:creator), the standard used (e.g. DCterms) and the value.
+
+When using the Enhancement Structure one usually needs to combine several of the above concepts to create a meaningful statement.
+As an example take a natural language processing engine that needs to express that the word "Paris" found within a sentence like "I will travel to Paris next week" probably refers to a location.
+To express that, it will need to combine the concepts
+
+* Enhancement: to express that this feature was extracted by the Natural Language Processing Engine at a given time ...
+* Annotation: to express that "Paris" represents a "Location" and has the role "Tag"
+* Occurrence: to express where the selected text "Paris" is located within the analyzed content
+
+The same is true for consuming Enhancements. A client interested in presenting Tags, Categories and Keywords needs only the information provided by the Annotation concept. To be able to highlight the actual location of detected features within the content, one also needs to process the information provided by the Occurrence concept.
+
+
+## Specification
+
+### Namespaces and used Notations
+While the Stanbol Enhancement Structure defines some concepts and properties of its own, it also reuses many terms from other ontologies. To improve readability, this specification uses namespace prefixes plus local names instead of full URLs.
+
+All the namespace prefixes used within this specification are described by the following list: 
+
+* sb: represents Stanbol and refers to all properties and concepts defined by the Stanbol enhancement structure. This URL is not yet final, but one of the options is "http://stanbol.apache.org/ontology/".
+* dc: the Dublin Core Terms (DCterms) ontology (http://dublincore.org/documents/dcmi-terms/)
+* rdf: the Resource Description Framework (http://www.w3.org/RDF/)
+* rdfs: the RDF schema  (http://www.w3.org/TR/rdf-schema/)
+* sioc: SIOC (Semantically-Interlinked Online Communities) Core Ontology (http://rdfs.org/sioc/ns#)
+
+Notations used by this specification:
+
+* **<{code}>** elements refer to an instance identified by the URI {code}. To improve readability, {code}s that refer to instances of concepts defined by the Stanbol enhancement structure will use short forms (e.g. <ci> for a ContentItem instance, <a> for an Annotation instance ...)
+* **{prefix}:{localname}** is used as short form for <{namespace+localname}>. The namespace -> prefix mappings are defined in the above list
+* **{value}^^dataType** The (xsd) dataType required by the value, e.g. xsd:float, xsd:anyURI. The default is xsd:string
+* **?{var}** represents a resource that is unknown to the Stanbol Enhancer, usually a resource of the user's knowledge model that is not necessarily parsed to Stanbol
+* **[{statement}]** represents statements that are typically used in combination with the Stanbol Enhancement Structure but are neither required nor used by the enhancement process itself.
+
+A special NOTE on the usage of <{code}> in comparison to {value}^^xsd:anyURI:
+
+* In both cases the value will be a URI
+* In case of <{code}> the URI identifies a resource that is created/defined by the enhancement results, meaning that the returned knowledge contains all information about that resource
+* {value}^^xsd:anyURI indicates that enhancement results will not provide additional knowledge about this resource. If consumers need more information about such resources, they need to use other services to retrieve such knowledge or parse special parameters to tell Stanbol to explicitly include such knowledge in the response.
+
+### ContentItem <ci>
+
+The ContentItem <ci> represents content enhanced by the Stanbol Enhancer. It is the central resource used to link all the enhancements created by the EnhancementEngines.
+The Stanbol Enhancement Structure does not force clients to distinguish between content (data) and contentItem (interpretation of the data). Within the Stanbol Enhancer only the contentItem is needed, because the Content is accessed via the Java API. Clients are free to use markup to explicitly identify those parts of documents that need to be interpreted as content (e.g. an element in the DOM tree). An example is provided below.
+
+    <ci> rdf:type sb:ContentItem
+    [<ci> rdf:type sioc:Item]
+    [<ci> <{metadatafield}> {value(s)}]
+    [?parent sioc:content <ci>]
+
+The ContentItem itself does not define any properties; however, it is used as the domain (target type) of some properties within the Stanbol Enhancement Structure. Information extracted from metadata parsed with the content (e.g. Dublin Core, EXIF, ID3 ...) can be added directly to the ContentItem <ci>.
+
+*TODO:* Describe here how to deal with embedded knowledge (e.g. RDFa, MicroFormats ...). Last time the discussion was to write such knowledge into its own graph and not add it to the returned Enhancement Structure. However, this was only a suggestion and needs to be reviewed.
+
+The usage of SIOC (Semantically-Interlinked Online Communities) is optional and is usually added by the client to embed information (e.g. as RDFa) in the content itself. However, when parsing HTML with such markup to the Stanbol Enhancer, the markup MUST be used by default to determine those parts of the content that need to be enhanced.
+
+*TODO*: Move this to its own section about RDFa support!
+
+    <body about="http://www.examplenews.com/featuredNews"><table><tr>
+        <td><!-- The menu: Not to be enhanced --> </td>
+        <td><span property="sioc:content" about="http://www.examplenews.com/story123"> 
+            This is the content of this page to be enhanced by the Stanbol Enhancer
+        </span><span property="sioc:content" about="http://www.examplenews.com/interview456">
+            And there may be even more than one section within the document that needs to be enhanced
+        </span></td>
+        <td> <!-- Advertisements: Not to be enhanced --> </td>
+    </tr></table></body>
+
+By parsing this as Content the Stanbol Enhancer should gather the following knowledge
+
+    <http://www.examplenews.com/featuredNews> sioc:content <http://www.examplenews.com/story123>
+    <http://www.examplenews.com/featuredNews> sioc:content <http://www.examplenews.com/interview456>
+
+and enhance only the markup within the two span tags marked with sioc:content.
+
+*NOTE*: this would require support for multiple ContentItems (story123 and interview456 in that example)
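
As a rough illustration of the selection step described above, the following Python sketch collects the text of every element marked as content. It is only a sketch under assumptions: the class and function names are hypothetical, the property name `sioc:content` follows the sioc: prefix from the namespace list, and a real implementation would need a proper RDFa processor rather than this simplified tag matching.

```python
from html.parser import HTMLParser

# Tags without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ContentSectionParser(HTMLParser):
    """Collects the text of every element marked with property="sioc:content".

    Hypothetical helper, not part of Stanbol.
    """

    def __init__(self):
        super().__init__()
        self.sections = {}    # "about" URI -> text found inside that element
        self._current = None  # URI of the section currently being read, if any
        self._depth = 0       # open-tag depth inside the current section

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1
            return
        attributes = dict(attrs)
        if attributes.get("property") == "sioc:content" and "about" in attributes:
            self._current = attributes["about"]
            self._depth = 1
            self.sections[self._current] = ""

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.sections[self._current] += data


def extract_sections(html):
    """Return a dict mapping each section URI to its stripped text."""
    parser = ContentSectionParser()
    parser.feed(html)
    parser.close()
    return {uri: text.strip() for uri, text in parser.sections.items()}
```

Each key of the returned dictionary would correspond to one ContentItem, which is why the approach depends on support for multiple ContentItems per request.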
+
+### Enhancement
+
+The concept "Enhancement" defines properties that allow Stanbol EnhancementEngines to formally describe information about the enhancement process. This information is crucial for EnhancementEngines to cooperate with each other, but typical Stanbol users will not need to bother with it. In some situations such knowledge might still be useful on the client side, e.g. if someone wants to ignore all enhancements created by a specific enhancement engine, or to calculate all enhancements affected by the removal of a part of the content.
+
+The following code segment shows the knowledge typically described by using the Enhancement concept
+
+    <e> rdf:type sb:Enhancement
+    [<e> rdf:type sb:Annotation, sb:Occurrence]
+    <e> dc:creator enhancementEngine^^xsd:anyURI
+    <e> dc:contributor enhancementEngine^^xsd:anyURI
+    <e> dc:created date^^xsd:dateTime
+    <e> dc:modified date^^xsd:dateTime
+    <e> dc:relation <relatedEnhancement>
+    <e> dc:requires <dependsOnEnhancement>
+
+The presence of the statement "<e> rdf:type sb:Enhancement" indicates that enhancement metadata are present for the resource <e>. This also means that if there is some configuration set to exclude such information, then all the above properties MUST be removed from the results of the enhancement process.
+The optional rdf:type values sb:Annotation and sb:Occurrence only indicate that an enhancement resource <e> is typically also of type sb:Annotation and/or sb:Occurrence. See the corresponding sections and the usage examples for more information.
+
+All of the metadata used to describe the enhancement process do use the DCterms vocabulary. 
+
+* dc:creator and dc:contributor link to the EnhancementEngine(s) involved in creating the Enhancement. 
+* dc:created and dc:modified are intended to help sort enhancements based on enhancement activities performed during the enhancement process (something that might be useful especially in case EnhancementEngines work asynchronously). 
+* dc:relation and dc:requires are used to describe relations between enhancements. dc:relation is used to state that an enhancement is related to another one, but would still be valid if the other becomes invalid or is removed. dc:requires is used to state that an enhancement depends on another one and that cascading delete/invalidation should be applied. As an example, an enhancement suggesting the entity "http://dbpedia.org/resource/Paris" might depend on the word "Paris" found in the text and be related to another enhancement stating that the document is about "http://dbpedia.org/resource/France".
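
The cascading invalidation implied by dc:requires can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name and the dictionary representation of the dc:requires statements are hypothetical and not part of any Stanbol API.

```python
from collections import defaultdict


def cascade_remove(requires, removed):
    """Return the ids of all enhancements invalidated by removing `removed`.

    `requires` maps an enhancement id to the set of ids it depends on,
    i.e. one entry per subject of a dc:requires statement.
    dc:relation links are deliberately ignored: related enhancements stay valid.
    """
    # Invert the dependency map: for each enhancement, who requires it?
    dependents = defaultdict(set)
    for enhancement, targets in requires.items():
        for target in targets:
            dependents[target].add(enhancement)

    invalid = set()
    pending = [removed]
    while pending:
        current = pending.pop()
        if current in invalid:
            continue
        invalid.add(current)
        # Everything that requires the current enhancement becomes invalid too.
        pending.extend(dependents[current])
    return invalid


# Hypothetical ids: a2 and a3 both require a1, a4 is independent.
requires = {"a2": {"a1"}, "a3": {"a1"}, "a4": set()}
affected = cascade_remove(requires, "a1")
```

With this input, `affected` contains a1, a2 and a3, while a4 is untouched.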
+
+*NOTE*: With this version of the enhancement structure users are no longer expected to process dc:relation and dc:requires relations, as was the case with the FISE enhancement structure when querying for the EntityAnnotations of TextAnnotations.
+
+### Annotations
+
+The concept "Annotation" provides metadata about the extracted feature. This information is important both for the enhancement process and for the users of the Stanbol Enhancer.
+The following code segment shows the knowledge typically provided by an Annotation <a>. A description of the properties is provided below:
+
+    <a> rdf:type sb:Annotation
+    [<a> rdf:type sb:Enhancement, sb:Occurrence]
+    <a> sb:extracted-from <ci>
+    <a> dc:title label  //TODO: maybe it is better to use rdfs:label
+    <a> dc:role annotationRole^^xsd:anyURI
+    <a> dc:type annotationType^^xsd:anyURI
+    <a> sb:confidence value^^xsd:float
+    <a> sb:entity entity^^xsd:anyURI
+    <a> sb:entity-type entityType^^xsd:anyURI
+    <a> sb:suggestion <a1>
+
+The following properties are defined for Annotations <a>
+
+* **rdf:type sb:Annotation**: This states that someone can expect the resource to provide all the information as defined by this specification
+* **sb:extracted-from**: This links the annotation describing a feature with the content item this feature is extracted from.
+* **dc:title**: This is the human readable name - the label - of the extracted Feature
+* **dc:role**: Whether this Annotation is a Tag, Category, Suggestion ... There will be a controlled vocabulary describing the different roles used by the Stanbol Enhancer
+* **dc:type**: The type of the Feature described by this Annotation e.g. Person, Organization, Location ... There will be a controlled vocabulary with types used by the Stanbol Enhancer
+* **sb:confidence**: The value describes the confidence of the EnhancementEngine. Values are on an ordinal scale. TODO: In the current implementation values of different Enhancement Engines are not comparable, but that information might not be available/processed by users and therefore result in wrong interpretations (rwesten)
+* **sb:entity**: In case an annotation describes an Entity, this property provides the URI for the entity
+* **sb:entity-type**: In case an annotation describes an Entity, this property provides the rdf:types of the linked entity
+* **sb:suggestion**: Links to another annotation that provides a suggestion for this one. This indicates that the Stanbol Enhancer requests the client to decide between the provided options - e.g. by some user interaction.
+
+**Annotation Types** describe the type of the annotated feature based on a terminology standardized by Stanbol. Current types include
+
+* dbpedia-ont:Place
+* dbpedia-ont:Organisation
+* dbpedia-ont:Person
+* add some additional types describing Occurrents (Activities, Events), Conceptualizations 
+
+This list should only contain some types useful for grouping Annotations in user interfaces. The exact types of entities can be added anyway by using the sb:entity-type property.
+ 
+*TODO*: We need to decide if we create our own controlled vocabulary within the Stanbol namespace or if we select some concepts defined in an external ontology (such as the dbpedia ontology that is currently used). 
+
+**Annotation Roles** describe the proposed role of the extracted feature in relation to the content. The following list shows the currently defined roles:
+
+* sb:Tag: The feature can be suggested as tag for the parsed content.
+* sb:Category: The feature provides a categorization for the parsed content.
+* sb:Keyword: The feature describes a keyword within the parsed content TODO: describe the difference between keywords and tags
+* sb:Suggestion: The feature is a suggestion for another Annotation. 
+
+*NOTE*: Such roles should make it easier to support additional Annotation roles as suggested by [STANBOL-48](https://issues.apache.org/jira/browse/STANBOL-48) and [STANBOL-12](https://issues.apache.org/jira/browse/STANBOL-12) that includes [STANBOL-28](https://issues.apache.org/jira/browse/STANBOL-28) and [STANBOL-29](https://issues.apache.org/jira/browse/STANBOL-29).
+
+For **Suggestions** there are some additional constraints as defined by the following code block
+
+    <a> rdf:type sb:Annotation
+    <a> dc:role !sb:Suggestion
+    <a> sb:suggestion <a1>
+        <a1> rdf:type sb:Annotation
+        <a1> dc:role sb:Suggestion
+        <a1> sb:confidence ordering^^xsd:float 
+
+This means:
+
+* an Annotation may only define suggestions if it does not have the dc:role sb:Suggestion. This prohibits nested suggestions
+* an Annotation linked by sb:suggestion is considered to have the dc:role sb:Suggestion, even if it does not define this role explicitly.
+* Annotations used as suggestions MUST define some way to allow clients to show them in the right order
+* the confidence value of annotations used as suggestions should be used to order suggestions when presented to the user. However, applications need to consider that such values are on an ordinal scale, meaning that a value of "4" does NOT mean that it is twice as likely as a suggestion with a confidence of "2"!
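
A client following these constraints might order suggestions as in the Python sketch below. The function name and the dictionary layout are assumptions made for illustration; the two confidence values are taken from the Paris example later in this document. Because the scale is ordinal, the sketch uses the values only for sorting and never computes ratios or percentages from them.

```python
def order_suggestions(suggestions):
    """Sort suggestions from most to least confident.

    Confidence values are on an ordinal scale, so they are only compared
    with each other and never turned into ratios or percentages.
    """
    return sorted(suggestions, key=lambda s: s["confidence"], reverse=True)


# Hypothetical in-memory form of two suggestion annotations.
suggestions = [
    {"title": "Paris, Texas", "confidence": 12.34},
    {"title": "Paris", "confidence": 123.456},
]
ranked_titles = [s["title"] for s in order_suggestions(suggestions)]
```

Here `ranked_titles` lists "Paris" before "Paris, Texas"; the input list itself is left unchanged.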
+
+
+### Occurrences
+
+By default detected features are considered to be extracted from the whole content. While this assumption is appropriate for things like categorizations and keywords, in a lot of cases it is possible to specify the exact occurrence of features within the content and/or the metadata of the content.
+
+Typically Occurrences are used together with sb:Annotation and sb:Enhancement in cases where an EnhancementEngine wants to describe the position of the extracted feature within the analyzed content. The properties defined by these two concepts should therefore be kept in mind when reading this section.
+
+Different Occurrence descriptions are needed to describe the position of a feature within different types of content or within the parsed metadata.
+
+#### TextOccurrence: 
+
+Describes the occurrence of a feature within textual content.
+
+    <o> rdf:type sb:TextOccurrence
+        sb:TextOccurrence rdfs:subClassOf sb:Occurrence
+    <o> rdf:type sb:Occurrence
+    <o> sb:selected-text selectedText
+    <o> sb:start startPosition^^xsd:long
+    <o> sb:end endPosition^^xsd:long
+    <o> sb:context selectionContext
+    <o> sb:occurrence-within-context count^^xsd:int
+
+* **rdf:type sb:TextOccurrence, sb:Occurrence**: It is required to add both types, to support queries for all Occurrences when no RDFS reasoner is present
+* **sb:selected-text**: The text selected by this Occurrence. Often the value of this property is the same as of the dc:title property defined by sb:Annotation. However this is no requirement. Enhancement Engines may decide to use different values if appropriate.
+* **sb:start** and **sb:end**: The start and end position of the selected text relative to the start of the content
+* **sb:context**: The context (e.g. the sentence) used to extract the selected text.
+* **sb:occurrence-within-context**: Defines the n-th occurrence of the selected text within the context. Together with the sb:context this can be used to locate the selected text even if the sb:start/sb:end positions are no longer valid (e.g. when the original content was transformed to another format).
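
The fallback described for sb:context and sb:occurrence-within-context could look like the Python sketch below. The function names are hypothetical, and the sketch assumes that the occurrence count is 1-based and that the context string survives the transformation unchanged. It returns 0-based, end-exclusive offsets, which is a choice of this sketch and not something the specification defines.

```python
def find_nth(haystack, needle, n):
    """Return the 0-based index of the n-th (1-based) occurrence, or -1."""
    index = -1
    for _ in range(n):
        index = haystack.find(needle, index + 1)
        if index == -1:
            return -1
    return index


def locate(content, selected_text, context, occurrence):
    """Find the selected text again using only its context.

    Returns a (start, end) tuple of 0-based, end-exclusive offsets into
    `content`, or None if the context or the selection cannot be found.
    """
    context_start = content.find(context)
    if context_start == -1:
        return None
    offset = find_nth(context, selected_text, occurrence)
    if offset == -1:
        return None
    start = context_start + offset
    return start, start + len(selected_text)


# The stored sb:start/sb:end no longer match because text was prepended.
content = "Intro. Next week I will travel to Paris"
position = locate(content, "Paris", "Next week I will travel to Paris", 1)
```

If `position` is not None, slicing `content[position[0]:position[1]]` yields the selected text again.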
+
+#### MetadataOccurrence: 
+
+Describes the occurrence of a feature within the metadata of the parsed content. These are extremely useful for linking entities to literal values provided by metadata standards, such as creator information in Dublin Core; Artist, Album, Label ... information provided by ID3; or Camera Model information as present in EXIF metadata. Enhancements from a geo-point to a City, Region or Country could also be done by using this type of occurrence.
+
+    <o> rdf:type sb:MetadataOccurrence
+        sb:MetadataOccurrence rdfs:subClassOf sb:Occurrence
+    <o> rdf:type sb:Occurrence
+    <o> sb:field metadataProperty^^xsd:anyURI
+    <o> sb:value value
+
+* **rdf:type sb:MetadataOccurrence, sb:Occurrence**: It is required to add both types, to support queries for all Occurrences when no RDFS reasoner is present
+* **sb:field**: The field of the metadata standard used. Multiple values describe that the feature occurs in several fields
+* **sb:value**: The value that hints at the described feature. This property is related to the dc:title property (in case the value is a literal) and the sb:entity property (in case the value is a URI) of sb:Annotation.
+
+
+#### Other Occurrence Types
+
+* TimeBasedMediaOccurrence: This would define a temporal section within a time based media (e.g. a Sound File)
+* VisualOccurrence: This would define a section within a media that can be presented on a screen
+* VideoOccurrence: Would be the combination of a time based and a visual occurrence
+
+These kinds of occurrences are currently not defined, because there is no Stanbol EnhancementEngine that could make use of them.
+
+
+## Use Cases and Examples
+
+This section describes use cases showing how the Stanbol Enhancement Structure is used to enhance documents. It also provides examples of how users can use/query for enhancements based on the returned knowledge.
+
+
+### Simple Text Enhancement
+
+A user types the text "Next week I will travel to Paris" and would like to have general Enhancements like Tags, Keywords and Categories.
+
+Let's assume that "Paris" was detected to describe a location and "travel" to be a keyword. There are also two known Entities with the name "Paris" and the type Location.
+This would result in an enhancement graph as follows 
+
+    # The content item 
+    <ci> rdf:type sb:ContentItem
+    
+    # Paris as detected by the nlpEngine as location
+    <a1> rdf:type sb:Enhancement
+    <a1> rdf:type sb:Annotation
+    <a1> rdf:type sb:Occurrence
+    <a1> rdf:type sb:TextOccurrence
+    # Properties for Enhancement
+    <a1> sb:extracted-from <ci>
+    <a1> dc:creator urn:stanbol.engines:nlpEngine
+    <a1> dc:created "2011-02-28T12:13:14Z"
+    # Properties for Annotation
+    <a1> dc:title "Paris"
+    <a1> dc:role sb:Tag
+    <a1> dc:type dbpedia-ont:Place
+    <a1> sb:suggestion <a2>, <a3>
+    <a1> sb:confidence 0.85
+    # Properties for TextOccurrence
+    <a1> sb:selected-text "Paris"
+    <a1> sb:start 28
+    <a1> sb:end 32
+    <a1> sb:context "Next week I will travel to Paris"
+    <a1> sb:occurrence-within-context 1
+
+    # dbpedia:Paris as suggested Entity
+    <a2> rdf:type sb:Enhancement
+    <a2> rdf:type sb:Annotation
+    # Properties for Enhancement
+    <a2> sb:extracted-from <ci>
+    <a2> dc:requires <a1>
+    <a2> dc:creator urn:stanbol.engines:entityTaggingEngine
+    <a2> dc:created "2011-02-28T12:13:18Z"
+    # Properties for Annotation
+    <a2> dc:title "Paris"
+    <a2> dc:role sb:Suggestion
+    <a2> dc:type dbpedia-ont:Place
+    <a2> sb:entity http://dbpedia.org/resource/Paris
+    <a2> sb:entity-type dbpedia-ont:City, dbpedia-ont:Settlement, dbpedia-ont:PopulatedPlace, dbpedia-ont:Place
+    <a2> sb:confidence 123.456
+
+    # dbpedia:Paris,_Texas as suggested Entity
+    <a3> rdf:type sb:Enhancement
+    <a3> rdf:type sb:Annotation
+    # Properties for Enhancement
+    <a3> sb:extracted-from <ci>
+    <a3> dc:requires <a1>
+    <a3> dc:creator urn:stanbol.engines:entityTaggingEngine
+    <a3> dc:created "2011-02-28T12:13:19Z"
+    # Properties for Annotation
+    <a3> dc:title "Paris, Texas"
+    <a3> dc:role sb:Suggestion
+    <a3> dc:type dbpedia-ont:Place
+    <a3> sb:entity http://dbpedia.org/resource/Paris,_Texas
+    <a3> sb:entity-type dbpedia-ont:City, dbpedia-ont:Settlement, dbpedia-ont:PopulatedPlace, dbpedia-ont:Place
+    <a3> sb:confidence 12.34
+
+    # travel as detected keyword
+    <a4> rdf:type sb:Enhancement
+    <a4> rdf:type sb:Annotation
+    # Properties for Enhancement
+    <a4> sb:extracted-from <ci>
+    <a4> dc:creator urn:stanbol.engines:keywordExtractionEngine
+    <a4> dc:created "2011-02-28T12:13:22Z"
+    # Properties for Annotation
+    <a4> dc:title "travel"
+    <a4> dc:role sb:Keyword
+    <a4> dc:type dbpedia-ont:Activity //can we expect this to be available -> probably not
+    
+When consuming these enhancements, the following queries could be used:
+
+Getting all Tags (to get all Keywords/Categories, replace sb:Tag with sb:Keyword/sb:Category)
+
+    PREFIX dc: <http://purl.org/dc/terms/>
+    PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
+    SELECT ?id ?title ?type
+    WHERE {
+        ?id dc:role sb:Tag .
+        ?id dc:title ?title .
+        OPTIONAL { ?id dc:type ?type }
+    }
+
+Getting suggestions for a known Annotation (e.g. urn:annotation1)
+
+    PREFIX dc: <http://purl.org/dc/terms/>
+    PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
+    SELECT ?entity ?title ?type ?score
+    WHERE {
+        <urn:annotation1> sb:suggestion ?id .
+        ?id dc:title ?title .
+        ?id sb:entity ?entity .
+        OPTIONAL { ?id sb:entity-type ?type } .
+        OPTIONAL { ?id sb:confidence ?score }
+    }
+ 
+Getting all selected Entities within the Text
+
+    PREFIX dc: <http://purl.org/dc/terms/>
+    PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
+    SELECT ?id ?title ?start ?end ?type
+    WHERE {
+        ?id dc:role sb:Tag .
+        ?id dc:title ?title .
+        ?id sb:start ?start .
+        ?id sb:end ?end .
+        OPTIONAL { ?id dc:type ?type }
+    }
+
+Getting all Locations and optionally the occurrences within the text
+
+    PREFIX dc: <http://purl.org/dc/terms/>
+    PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
+    PREFIX dbpedia-ont: <http://dbpedia.org/ontology/>  
+    SELECT ?id ?title ?start ?end
+    WHERE {
+        ?id dc:type dbpedia-ont:Place .
+        ?id dc:title ?title .
+        OPTIONAL {
+            ?id sb:start ?start .
+            ?id sb:end ?end
+        }
+    }
+
+### Enhancement of Metadata
+
+This example shows how the Enhancement Structure allows creating enhancements based on parsed metadata.
+
+Let's assume that a user parses a content item and an additional file providing Dublin Core metadata that include (among others)
+
+* dc:creator "Richard Cypher"
+* dc:creator "Rachel Brandstone"
+* dc:contributor "Richard Cypher"
+
+Further assume that both Richard and Rachel work for the company running the Stanbol Enhancer and that there is an EnhancementEngine that knows about company resources.
+This example uses the URIs "http://www.company.org/team/Richard_Cypher" and "http://www.company.org/team/Rachel_Brandstone" to identify the two example employees.
+
+    #The content item
+    <ci> rdf:type sb:ContentItem
+    <ci> dc:creator "Richard Cypher", "Rachel Brandstone"
+    <ci> dc:contributor "Richard Cypher"
+    <ci> {other Dublin Core metadata extracted from the parsed file}
+
+    # Annotation describing the "Richard Cypher"
+    # Assumed to be created by the dcAnnotationEngine with the help
+    # of the entityTaggingEngine.
+    <a1> rdf:type sb:Enhancement
+    <a1> rdf:type sb:Annotation
+    <a1> rdf:type sb:Occurrence
+    <a1> rdf:type sb:MetadataOccurrence
+    # Properties for Enhancement
+    <a1> sb:extracted-from <ci>
+    <a1> dc:creator urn:stanbol.engines:dcAnnotationEngine
+    <a1> dc:contributor urn:stanbol.engines:entityTaggingEngine
+    <a1> dc:created "2011-02-28T13:14:15Z"
+    # Properties for Annotation
+    <a1> dc:title "Richard Cypher"
+    <a1> dc:role sb:Tag
+    <a1> dc:type dbpedia-ont:Person
+    <a1> sb:confidence 1.0
+    <a1> sb:entity http://www.company.org/team/Richard_Cypher
+    <a1> sb:entity-type foaf:Agent, foaf:Person, vCard:Contact
+    # Properties for MetadataOccurrence
+    <a1> sb:field dc:creator, dc:contributor
+    <a1> sb:value "Richard Cypher"
+    
+    # Annotation describing the "Rachel Brandstone"
+    <a2> rdf:type sb:Enhancement
+    <a2> rdf:type sb:Annotation
+    <a2> rdf:type sb:Occurrence
+    <a2> rdf:type sb:MetadataOccurrence
+    # Properties for Enhancement
+    <a2> sb:extracted-from <ci>
+    <a2> dc:creator urn:stanbol.engines:dcAnnotationEngine
+    <a2> dc:contributor urn:stanbol.engines:entityTaggingEngine
+    <a2> dc:created "2011-02-28T13:14:22Z"
+    # Properties for Annotation
+    <a2> dc:title "Rachel Brandstone"
+    <a2> dc:role sb:Tag
+    <a2> dc:type dbpedia-ont:Person
+    <a2> sb:confidence 1.0
+    <a2> sb:entity http://www.company.org/team/Rachel_Brandstone
+    <a2> sb:entity-type foaf:Agent, foaf:Person, vCard:Contact
+    # Properties for MetadataOccurrence
+    <a2> sb:field dc:creator
+    <a2> sb:value "Rachel Brandstone"
+
+*NOTE*: One could also create two sb:Annotations each for Richard and Rachel, one Annotation describing the annotated value and a second suggesting the entity for the first, but that seems like unnecessary complexity as long as there is only one person with this name in the company. Nonetheless this decision needs to be reviewed.
+For comparison, this is the code for Richard when using this second variant.
+
+    #Annotation describing "Richard Cypher" as extracted from the DC description
+    <a1> rdf:type sb:Enhancement
+    <a1> rdf:type sb:Annotation
+    <a1> rdf:type sb:Occurrence
+    <a1> rdf:type sb:MetadataOccurrence
+    # Properties for Enhancement
+    <a1> sb:extracted-from <ci>
+    <a1> dc:creator urn:stanbol.engines:dcAnnotationEngine
+    <a1> dc:created "2011-02-28T13:14:15Z"
+    # Properties for Annotation
+    <a1> dc:title "Richard Cypher"
+    <a1> dc:role sb:Tag
+    <a1> dc:type dbpedia-ont:Person
+    <a1> sb:confidence 1.0
+    <a1> sb:suggestion <a3>
+    # Properties for MetadataOccurrence
+    <a1> sb:field dc:creator, dc:contributor
+    <a1> sb:value "Richard Cypher"
+
+    # Annotation describing the employee Richard Cypher
+    <a3> rdf:type sb:Enhancement
+    <a3> rdf:type sb:Annotation
+    # Properties for Enhancement
+    <a3> sb:extracted-from <ci>
+    <a3> dc:requires <a1>
+    <a3> dc:creator urn:stanbol.engines:entityTaggingEngine
+    <a3> dc:created "2011-02-28T13:14:18Z"
+    # Properties for Annotation
+    <a3> dc:title "Richard Cypher"
+    <a3> dc:role sb:Suggestion
+    <a3> dc:type dbpedia-ont:Person
+    <a3> sb:entity http://www.company.org/team/Richard_Cypher
+    <a3> sb:entity-type foaf:Agent, foaf:Person, vCard:Contact
+    <a3> sb:confidence 8.76
+
+When consuming these enhancements, the following queries could be used:
+
+Getting all Annotations for the dc:creator field
+
+Version based on variant 1:
+
+    PREFIX dc: <http://purl.org/dc/terms/>
+    PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
+    SELECT ?id ?title ?creatorId
+    WHERE {
+        ?id dc:title ?title .
+        ?id sb:entity ?creatorId .
+        ?id sb:field dc:creator.
+    }
+
+Version for variant 2:
+
+    PREFIX dc: <http://purl.org/dc/terms/>
+    PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
+    SELECT ?id ?title ?creatorId
+    WHERE {
+        ?ma sb:field dc:creator .
+        ?ma sb:suggestion ?id . 
+        ?id dc:title ?title .
+        ?id sb:entity ?creatorId .
+    }
+
+
+Getting all Annotations created for DC properties
+
+Version based on variant 1:
+
+    PREFIX dc: <http://purl.org/dc/terms/>
+    PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
+    SELECT ?id ?title ?field ?entity
+    WHERE {
+        ?id dc:title ?title .
+        ?id sb:entity ?entity .
+        ?id sb:field ?field.
+        FILTER(REGEX(STR(?field), "^http://purl.org/dc/terms/"))
+    }
+
+Version based on variant 2:
+
+    PREFIX dc: <http://purl.org/dc/terms/>
+    PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
+    SELECT ?id ?title ?field ?entity
+    WHERE {
+        ?ma sb:field ?field.
+        ?ma sb:suggestion ?id . 
+        ?id dc:title ?title .
+        ?id sb:entity ?entity .
+        FILTER(REGEX(STR(?field), "^http://purl.org/dc/terms/"))
+    }
+