Posted to commits@stanbol.apache.org by rw...@apache.org on 2011/09/27 09:56:03 UTC

svn commit: r1176255 - in /incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer: EnhancementStructureOverview.png enhancementexample.png enhancementstructureoverview.png stanbolenhancementstructure.mdtext

Author: rwesten
Date: Tue Sep 27 07:56:02 2011
New Revision: 1176255

URL: http://svn.apache.org/viewvc?rev=1176255&view=rev
Log:
Updated the sections

* ContentItem
* Enhancement

Added a new figure showing an example of Enhancements

Added:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementexample.png   (with props)
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructureoverview.png   (contents, props changed)
      - copied, changed from r1174655, incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png
Removed:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png
Modified:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext

Added: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementexample.png
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementexample.png?rev=1176255&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementexample.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Copied: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructureoverview.png (from r1174655, incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png)
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructureoverview.png?p2=incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructureoverview.png&p1=incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png&r1=1174655&r2=1176255&rev=1176255&view=diff
==============================================================================
Binary files - no diff available.

Propchange: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructureoverview.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext?rev=1176255&r1=1176254&r2=1176255&view=diff
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext (original)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext Tue Sep 27 07:56:02 2011
@@ -11,7 +11,7 @@ The Stanbol Enhancement Structure is bui
 
 The following list gives an overview of the concepts used by the Stanbol Enhancement Structure:
 
-![Overview about the Stanbol Enhancement Structure](EnhancementStructureOverview.png "Overview of the Stanbol Enhancement Structure")
+![Overview of the Stanbol Enhancement Structure](enhancementstructureoverview.png "Overview of the Stanbol Enhancement Structure")
 
 * **ContentItem:** This is the resource representing the content passed to the Stanbol Enhancer. The URI of this resource depends on how the content was passed to the Stanbol Enhancer. If the request provides an absolute URI, then this URI is used. In all other cases the Stanbol Enhancer creates a URI based on the configured prefix or the URL of the service. The documentation of the RESTful service should provide more information about this.
 
@@ -30,11 +30,6 @@ Enhancements encoded based on this speci
 * sb:Annotation and sb:Suggestion MUST also be of type sb:Enhancement and include the required metadata defined by sb:Enhancement.
 * sb:Occurrence, sb:Annotation and sb:Suggestion instances MUST include rdf:type information for all parent types, e.g. when adding an sb:TextOccurrence the rdf:type values MUST include both sb:TextOccurrence AND sb:Occurrence. Consumers are expected NOT to use any kind of reasoner, so adding this additional type information is the only way to ensure that queries for occurrences, annotations or suggestions return the expected results.
 
----
-
-The parts below are currently under work
-
----
 
 ## Specification
 
@@ -65,22 +60,26 @@ A special NOTE to the usage of <{code
 
 ### ContentItem <ci>
 
-The ContentItem <ci> represents a content enhanced by the Stanbol Enhancer. It is the central resource used to link all the enhancements created by the EnhancementEngines.
-The Stanbol Enhancement Structure does not force client to distinguish between content (data) and contentItem (interpretation of the data). Within the Stanbol Enhancer only the contentItem is needed, because the Content is accessed via the Java API. Client are free to use markup to explicitly identify these parts of documents that need to be interpreted as content (e.g. an element in the DOM tree). An example is provided below. 
+The ContentItem <ci> represents content passed to the Stanbol Enhancer. It is the central resource used to link all the enhancements created by the EnhancementEngines.
 
     <ci> rdf:type sb:ContentItem
-    [<ci> rdf:type sioc:Item]
+    [<ci> sb:embeds-knowledge {knowledgeGraphId}]
+    [<ci> sb:has-section sb:ContentItem]
     [<ci> <{metadatafield}> {value(s)}]
-    [?parent sioc:content <ci>]
 
-The ContentItem itself does not define any properties however it is used as domain (target type) of some properties within the Stanbol Enhancement structure. Information extracted from metadata parsed with the content (e.g. Dublin Core, EXIF, ID3 ...) can be added directly to the ContentItem &lt;ci>.
+The ContentItem itself defines only two properties:
 
-*TODO:* Describe here how to deal with embedded knowledge (e.g. RDFa, MicroFormats …). Last time the discussion was to write such knowledge into an own Graph and do not add it to the returned Enhancement Structure. However this was only an suggestion and need to be reviewed.
+* **sb:embeds-knowledge**: Documents might contain explicit knowledge (e.g. MicroData, RDFa). If such information can be extracted, then it is stored in a separate RDF graph, and this property links to the ID of that graph. Such knowledge is typically extracted during the pre-processing phase of the EnhancementProcess, so EnhancementEngines have access to it.
+* **sb:has-section**: A ContentItem may define several sections. The Stanbol Enhancer creates a separate ContentItem with its own ID for each such section. It first enhances the main ContentItem and then all of its sections. This feature is mainly intended for splitting huge documents into parts of a manageable size for enhancement.
 
-The usage of SIOC (Semantically-Interlinked Online Communities) is optionally and usually added by the client to embed information (e.g. as RDFa) to the content itself. However when parsing HTML with such markup to the Stanbol Enhancer such markup MUST BE used as default to determine those parts of the content that need to be enhanced.
+In addition, metadata extracted from or passed with the content (e.g. Dublin Core, EXIF, ID3 ...) can also be added directly to the ContentItem &lt;ci>. EnhancementEngines may use such information during the EnhancementProcess.
+
+**Example: Embedded Knowledge**
 
 *TODO*: Move this to a separate section about RDFa support!
 
+This example shows how SIOC (Semantically-Interlinked Online Communities) and RDFa can be used to embed knowledge that tells Stanbol how to process the HTML markup passed to it.
+
     <body about="http://www.examplenews.com/featuredNews"><table><tr>
         <td><!-- The menu: Not to be enhanced --> </td>
         <td><span property="sioc:content" about="http://www.examplenews.com/story123">
@@ -91,14 +90,19 @@ The usage of SIOC (Semantically-Interlin
         <td> <!-- Advertisements: Not to be enhanced --> </td>
     </tr></table></body>
 
-By parsing this as Content the Stanbol Enhancer should gather the following knowledge
+When this markup is passed as content, the Stanbol Enhancer should create:
 
-    <http://www.examplenews.com/featuredNews> sic:content <http://www.examplenews.com/story123>
-    <http://www.examplenews.com/featuredNews> sic:content <http://www.examplenews.com/interview456>
+* An sb:ContentItem for "http://www.examplenews.com/featuredNews" with two sections but empty content.
+    * The knowledge defined by the above RDFa markup is stored in a separate RDF graph and linked via the "sb:embeds-knowledge" property.
+* An sb:ContentItem representing the section "http://www.examplenews.com/story123".
+    * Its content is the HTML fragment enclosed by the corresponding span tag.
+* An sb:ContentItem representing the section "http://www.examplenews.com/interview456".
+    * Its content is the HTML fragment enclosed by the corresponding span tag.
 
-and enhance only the markup within the two span tags marked with sic:content.
+NOTE: This assumes the presence of:
 
-*NOTE*: this would require support for multiple ContentItems (story123 and interview456 in that example)
+* a component for extracting RDFa
+* a component that supports the creation of sb:ContentItems and fragments based on SIOC
 
 ### Enhancement
 
@@ -107,24 +111,36 @@ The concept "Enhancement" defines proper
 The following code segment shows the knowledge typically described using the Enhancement concept:
 
     <e> rdf:type sb:Enhancement
-    [<e> rdf:type sb:Annotation, sb:Occurrence]
     <e> dc:creator enhancementEngine^^xsd:anyURI
     <e> dc:contributor enhancementEngine^^xsd:anyURI
     <e> dc:created date^^xsd:dateTime
     <e> dc:modified date^^xsd:dateTime
-    <e> dc:relation <relatedEnhancement>
-    <e> dc:requires <dependsOnEnhancement>
+    [<e> sb:relatedTo <relatedEnhancement>]
+    [<e> sb:dependsOn <dependsOnEnhancement>]
 
 The presence of the statement "&lt;e> rdf:type sb:Enhancement" indicates that enhancement metadata are present for the resource &lt;e>. This also means that if a configuration excludes such information, then all the above properties MUST be removed from the results of the enhancement process.
-The optional  rdf:types sb:Annotation and sb:Occurrent do only indicate, that typically any enhancement resource &lt;e> is also of type sb:Annotation and/or sb:Occurrent. See the according sections and the usage examples for more information.
+The metadata defined by sb:Enhancement MUST be added to all sb:Annotation and sb:Suggestion instances created by an EnhancementEngine. This also applies to instances of any subclass (rdfs:subClassOf) of those two concepts.
+
+The following figure shows an example of an sb:Annotation and an sb:Suggestion for Paris with the corresponding metadata defined by the sb:Enhancement concept.
 
-All of the metadata used to describe the enhancement process do use the DCterms vocabulary. 
+![Example: sb:Annotation and sb:Suggestion including sb:Enhancement metadata](enhancementexample.png "Example: sb:Annotation and sb:Suggestion including sb:Enhancement metadata")
 
-* dc:creator and dc:contributor link to the EnhancementEngine(s) involved in creating the Enhancement. 
-* dc:created and dc:modified are intended to help sort enhancement based on enhancement activities performed during the enhancement process (something that might be useful especially in case EnhancementEngines do work asynchronously). 
-* dc:relation and dc:requires are used to describe relations between enhancements. dc:relation is used to state that an enhancement is related to an other one, but would be still valid if the other gets invalid or is removed. dc:requires is used to state that an enhancement depends on an other one and cascading delete/invalidation should be applied. As example an enhancement suggesting the entity "http://dbpedia.org/resources/Paris" might depend on the Word "Paris" found in the Text and be related to an other enhancement stating that the document is about "http://dbpedai.org/resources/France".
+Note that sb:Annotation and sb:Suggestion are not subclasses of sb:Enhancement. EnhancementEngines therefore need to add sb:Enhancement as an additional rdf:type to sb:Annotation and sb:Suggestion instances.
 
-*NOTE*: With this version of the enhancement structure it is no longer expected from users to process dc:relation and dc:requires relations as it was the case with the FISE enhancement structure to query for EntityAnnotations for TextAnnotations.
+Description of the properties defined/used by sb:Enhancement:
+
+* **dc:creator** and **dc:contributor** link to the EnhancementEngine(s) involved in creating the Enhancement.
+* **dc:created** and **dc:modified** are intended to help sort enhancements by the enhancement activities performed during the enhancement process (which can be especially useful when EnhancementEngines work asynchronously).
+* **sb:relatedTo** states that an sb:Enhancement is related to another one, while both enhancements remain valid if the other one is deleted.
+* **sb:dependsOn** states that an sb:Enhancement depends on another one. If that Enhancement is deleted (or rejected by a user), then all dependent sb:Enhancements MUST also be removed/rejected. The above figure shows that sb:hasSuggestion, as defined by sb:Annotation, is the inverse of sb:dependsOn, because suggestions depend on the annotation they are suggested for.
+
+In addition, EnhancementEngines may want or need to add further metadata to the sb:Annotation and sb:Suggestion instances they create. Implementors of such EnhancementEngines are free to define their own Enhancement types. Such types MUST be defined as rdfs:subClassOf sb:Enhancement and SHOULD include "Enhancement" in their concept name. EnhancementEngines MUST also add both the specific type AND sb:Enhancement as rdf:type values.
+
+---
+
+The sections below have not been updated yet.
+
+---
 
 ### Annotations
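The sb:Enhancement rules introduced in this revision of stanbolenhancementstructure.mdtext (sb:Annotation and sb:Suggestion instances must carry sb:Enhancement as an extra rdf:type, and removing an enhancement cascades along sb:dependsOn but not along sb:relatedTo) can be illustrated with a small Python sketch. It is not Stanbol code: triples are assumed to be plain (subject, predicate, object) string tuples, prefixed names are compared as strings, no reasoner is used, and the function names and urn: identifiers are hypothetical.

```python
# Hypothetical sketch of two sb:Enhancement rules; not part of Stanbol.
# Triples are modelled as (subject, predicate, object) string tuples.

RDF_TYPE = "rdf:type"
SB_ENHANCEMENT = "sb:Enhancement"
SB_DEPENDS_ON = "sb:dependsOn"
SB_RELATED_TO = "sb:relatedTo"
# Types whose instances MUST also be typed as sb:Enhancement.
ENHANCEMENT_CARRIERS = {"sb:Annotation", "sb:Suggestion"}


def missing_enhancement_type(triples):
    """Return resources typed sb:Annotation/sb:Suggestion but not sb:Enhancement."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return {s for s, ts in types.items()
            if ts & ENHANCEMENT_CARRIERS and SB_ENHANCEMENT not in ts}


def remove_enhancement(triples, removed):
    """Remove `removed` plus every enhancement that (transitively) sb:dependsOn it.

    sb:relatedTo links are not followed, so related enhancements stay valid.
    Triples mentioning a removed resource as subject or object are dropped.
    Returns (remaining_triples, removed_resources).
    """
    to_remove = {removed}
    changed = True
    while changed:  # repeat until no new dependent enhancement is found
        changed = False
        for s, p, o in triples:
            if p == SB_DEPENDS_ON and o in to_remove and s not in to_remove:
                to_remove.add(s)
                changed = True
    remaining = [t for t in triples
                 if t[0] not in to_remove and t[2] not in to_remove]
    return remaining, to_remove


triples = [
    ("urn:ann-paris", RDF_TYPE, "sb:Annotation"),
    ("urn:ann-paris", RDF_TYPE, SB_ENHANCEMENT),
    ("urn:sug-paris", RDF_TYPE, "sb:Suggestion"),
    ("urn:sug-paris", RDF_TYPE, SB_ENHANCEMENT),
    ("urn:sug-paris", SB_DEPENDS_ON, "urn:ann-paris"),
    ("urn:ann-france", RDF_TYPE, "sb:Annotation"),  # lacks sb:Enhancement
    ("urn:ann-france", SB_RELATED_TO, "urn:ann-paris"),
]

print(missing_enhancement_type(triples))  # {'urn:ann-france'}
remaining, removed = remove_enhancement(triples, "urn:ann-paris")
print(sorted(removed))  # ['urn:ann-paris', 'urn:sug-paris']
```

Because subclasses of sb:Annotation and sb:Suggestion must also list their parent types explicitly, checking only the two parent types is enough for the type rule here. Translating sb:hasSuggestion links into sb:dependsOn is not modelled in this sketch.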