Posted to commits@stanbol.apache.org by rw...@apache.org on 2011/11/28 06:41:23 UTC

svn commit: r1206981 - /incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.mdtext

Author: rwesten
Date: Mon Nov 28 05:41:23 2011
New Revision: 1206981

URL: http://svn.apache.org/viewvc?rev=1206981&view=rev
Log:
first version of a proposal for the Stanbol Enhancement Structure based on the [Annotation-Ontology](http://code.google.com/p/annotation-ontology/)

Added:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.mdtext

Added: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.mdtext?rev=1206981&view=auto
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.mdtext (added)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.mdtext Mon Nov 28 05:41:23 2011
@@ -0,0 +1,157 @@
+Title: The Stanbol Enhancement Structure (PROPOSAL)
+
+Please NOTE: This is a proposal for the future version of the Enhancement Structure used by the Stanbol Enhancer. 
+
+**NOTES:** 
+
+* This **DOES NOT** describe the Enhancement Structure used by the current version of the Stanbol Enhancer! 
+* There is also an [older Proposal](stanbolenhancementstructure.html) that might still contain some information that is not yet included in this version.
+
+## Background
+
+This proposal aims to define the "Stanbol Enhancement Structure" intended to be used by future versions of the Stanbol Enhancer to encode knowledge extracted from analyzed Documents.
+
+Currently the Stanbol Enhancer still uses the [FISE Enhancement Structure](http://wiki.iks-project.eu/index.php/EnhancementStructure), which predates Stanbol's entry into the Apache Incubator. This proposal suggests basing the "Stanbol Enhancement Structure" on the existing [Annotation-Ontology](http://code.google.com/p/annotation-ontology/wiki/Homepage).
+
+The following two sections give a short overview of the currently used FISE Enhancement Structure and of the Annotation-Ontology, because this information is needed to understand the suggestions made in the later parts of this document.
+
+### FISE Enhancement Structure
+
+The FISE Enhancement Structure defines three main Concepts:
+
+1. **FISE Enhancement**: Defines Metadata about the creation process, type of the Enhancement as well as relations to other Enhancements.
+2. **FISE Text Annotation**: Defines a selection within enhanced plain Text. Annotations about other content types are not defined.
+3. **FISE Entity Annotation**: Defines an annotation about an Entity.
+
+Each Annotation created by an Enhancement Engine MUST have the FISE Enhancement type as well as one of FISE Text Annotation or FISE Entity Annotation.
+
+The typical use is as follows:
+
+* A Text Annotation is used to define the annotated part of the document. Text Annotations use the dc:type property to define the type of the extracted entity (e.g. as provided by Named Entity Recognition).
+* An Entity Annotation is used to suggest Entities for a Text Annotation.
+* Properties of the Enhancement are used to link the Text Annotation with the suggested Entity Annotations.
+* Enhancement Engines may also add knowledge about suggested entities (dereferencing of entities).
+
+Annotations such as Keywords or Categories were discussed but never formally defined for the FISE Enhancement Structure.
+
+### Annotation-Ontology
+
+This Proposal describes how Stanbol can use the [Annotation-Ontology](http://code.google.com/p/annotation-ontology/wiki/Homepage) for encoding Enhancements.
+
+From the Annotation-Ontology homepage:
+
+> Annotation Ontology (AO) is a vocabulary designed to extensively reuse existing domain ontologies (entities annotations or semantic tags) and to provide several other kind of annotations - comments, textual annotation (classic tags), notes, examples, erratum... - on potentially any kind of document (text, images, audio...) and document fragments.
+
+The following Figure gives an overview of the Annotation-Ontology. It shows a simple, tagging-like annotation of a whole document.
+
+> ![Example of annotation on a whole document with AO](http://annotation-ontology.googlecode.com/svn/trunk/images/Document%20Annotation%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png "Example of annotation on a whole document with AO")
+
+> Image Credit: Annotation-Ontology [Link](http://annotation-ontology.googlecode.com/svn/trunk/images/Document%20Annotation%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png)
+
+## Stanbol Enhancement Structure
+
+The following sections describe how the Stanbol Enhancement Structure can utilize the Annotation-Ontology to encode knowledge extracted from analyzed Content Items.
+
+### ContentItems
+
+Within the FISE Enhancement Structure, the enhanced ContentItems were only referenced by the **fise:extracted-from** property. There was no specification on how to further define properties of the ContentItem. The Annotation-Ontology defines a much richer vocabulary for that.
+
+First and most important, the Annotation-Ontology distinguishes between:
+
+* **Annotated Document**: This is the Document that is annotated
+* **Source Document**: This is the Document version that was used for the annotation process.
+
+> ![Source Documents](http://annotation-ontology.googlecode.com/svn/trunk/images/Source%20Document%202%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png "Document Annotations")
+
+> Image Credit: Annotation Ontology [Link](http://annotation-ontology.googlecode.com/svn/trunk/images/Source%20Document%202%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png)
+
+As an example: If a Web-Crawler crawls a site on the Web and stores a local copy for indexing, then the **Annotated Document** would use the URL of the document on the Web. The **Source Document** would be the ID of the locally cached version used for the enhancement process.
+
+#### Content Adapter and Source Documents:
+
+The Content Adapter pattern was suggested to be used to convert parsed documents to different Content Formats such as extracting the Plain Text of parsed HTML or PDF documents.
+
+The possibility to distinguish between the *Annotated Document* and the *Source Document* supports this well: Enhancement Engines can state that an Annotation is about the *Annotated Document* and still state the exact *Source Document* that was used for processing. This makes it possible, for example, to clearly state that the indexes of a text selection are based on the plain text version of the *Annotated Document*.
+
+### Content Selectors
+
+The FISE Enhancement Structure defined a single "Content Selector", the *FISE Text Annotation*. The Annotation-Ontology uses a much richer structure that can also be extended to define selectors specific to different content types.
+
+With the Annotation-Ontology, each Selector can link to both the *Annotated Document* and the *Source Document*. The following figure shows an example of an Image Selection:
+
+> ![Image Selector](http://annotation-ontology.googlecode.com/svn/trunk/images/Image%20InitEndCorner%20Selector%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png "Image Selector Example")
+
+> Image Credits: Annotation-Ontology [Link](http://annotation-ontology.googlecode.com/svn/trunk/images/Image%20InitEndCorner%20Selector%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png).
+
+#### Text Selectors
+
+The "PrefixPostfixSelector" as defined by the Annotation-Ontology differs from the currently used FISE Text Annotation: it does not define character indexes, and it uses a prefix and a postfix instead of the surrounding context.
+
+Regarding backward compatibility, the suggestion is to adopt the "PrefixPostfixSelector" but keep the start and end positions of the current Text Annotation. The prefix/postfix model of the "PrefixPostfixSelector" is better than the context used by the FISE Text Annotation, because it clearly identifies the selected text even if that text occurs several times in a given context.
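+
+A minimal Turtle sketch of such a combined selector might look as follows. Only the idea of keeping prefix, postfix, start and end together comes from this proposal; the `ex:` namespace, the class name and all property names are hypothetical placeholders, and the terms actually defined by the Annotation-Ontology should be used where they exist.
+
+```turtle
+@prefix ex: <http://example.org/stanbol/> .   # hypothetical namespace, for illustration only
+
+# Selects "Paris" in the plain text "Bob visited Paris last year."
+ex:selector-1 a ex:PrefixPostfixSelector ;    # hypothetical class name
+    ex:prefix  "visited " ;                   # text directly before the selection
+    ex:exact   "Paris" ;                      # the selected text itself
+    ex:postfix " last" ;                      # text directly after the selection
+    ex:start   12 ;                           # zero-based character offset (assumed convention)
+    ex:end     17 .                           # exclusive end offset (assumed convention)
+```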
+
+#### Multi Media Selectors and the Media Fragments Standard
+
+The [Media Fragments Working Group](http://www.w3.org/2008/WebVideo/Fragments/) of the W3C is currently working on a Recommendation on how to encode Fragments of Resources within so called [Media Fragments URIs](http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/).
+
+This specification defines how to encode the [Temporal](http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-time), [Spatial](http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-space), [Track](http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-track) and [ID](http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-id) dimensions within Document URIs but also defines processing rules (e.g. for Browsers) and the semantics.
+
+The proposal here is to use this specification for encoding selections within multimedia files in the Annotation-Ontology. This will most likely require the definition of a MediaFragmentSelector as an extension.
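+
+As a rough sketch, a temporal and a spatial selection could be expressed as shown below. The fragment syntax (`t=` for a time range in seconds, `xywh=` for a pixel rectangle given as x, y, width, height) follows the Media Fragments specification; the `ex:` namespace, the `ex:MediaFragmentSelector` class and the `ex:fragment` property are hypothetical, since no such extension is defined yet.
+
+```turtle
+@prefix ex: <http://example.org/stanbol/> .   # hypothetical namespace, for illustration only
+
+# Temporal selection: seconds 10 to 20 of a video
+ex:selector-2 a ex:MediaFragmentSelector ;    # hypothetical class
+    ex:fragment <http://example.org/video.ogv#t=10,20> .
+
+# Spatial selection: a 320x240 pixel region starting at x=160, y=120
+ex:selector-3 a ex:MediaFragmentSelector ;
+    ex:fragment <http://example.org/image.jpg#xywh=160,120,320,240> .
+```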
+
+### Annotations
+
+The FISE Enhancement Structure uses both properties of FISE Enhancements and FISE TextAnnotation/EntityAnnotation to describe Annotations as defined by the Annotation-Ontology. On the other hand, some properties of the FISE TextAnnotation are part of the Selectors within the Annotation-Ontology. Because of that, the switch to the Annotation-Ontology will not only mean a change in the vocabulary used, but will also bring some structural changes.
+
+Annotations as defined by the Annotation-Ontology are structured as follows:
+
+* An Annotation is represented by a Resource (called Annotation-Resource in the remaining document) with the rdf:type ao:Annotation. Special types of Annotations can be introduced by subclasses of ao:Annotation.
+* The Annotation-Resource may be linked to a Selector with the **ao:context** property. If no such link is present, the Annotation-Resource is about the whole Document. It is also possible to link multiple Selectors with an annotation.
+* Each Annotation-Resource MUST BE linked to the *Annotated Document* by using the **ao:annotatesResource** property. The *Source Document* can be referenced by using the **ao:onSourceDocument** property. It is also possible to link multiple Documents with an annotation.
+
+The following sub-sections provide an overview of how Text Annotations, Entity Annotations and Category Annotations as used by Stanbol can be expressed using the Annotation-Ontology.
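+
+The structure described in the list above could be serialized in Turtle roughly as follows. The three properties are the ones named in this section; the `ao:` namespace IRI is an assumption that should be checked against the ontology, and the `ex:` namespace and all resource names are made up for illustration.
+
+```turtle
+@prefix ao: <http://purl.org/ao/> .           # assumed namespace IRI
+@prefix ex: <http://example.org/stanbol/> .   # hypothetical namespace, for illustration only
+
+# An Annotation-Resource about a crawled web page
+ex:annotation-1 a ao:Annotation ;
+    ao:annotatesResource <http://example.org/page.html> ;  # the Annotated Document
+    ao:onSourceDocument  ex:cached-page-plain-text ;       # the Source Document used for processing
+    ao:context           ex:selector-1 .                   # leave out to annotate the whole Document
+```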
+
+#### Text Annotations
+
+Text Annotations are Annotations as typically created by NER (Named Entity Recognition) engines. Such Annotations select a part of a Text and assign a type (Person, Organization, Place ...) to it.
+
+The text selection can be expressed by using a "PrefixPostfixSelector". The type and the confidence of the detected named entity need to be properties of the Annotation class.
+
+#### Entity Annotations
+
+Entity Annotations are similar to "Qualifier" annotations as defined by the Annotation-Ontology. The *ao:hasTopic* relation is used to link the annotation with the related topic.
+
+#### Category Annotations
+
+Category Annotations are typically about the whole Document or a specific section of it. Normal Selectors can be used for defining the categorized Section. If no Selector is present the categorization applies to the whole document. The "Qualifier" annotation could also be used as a base class for categorizations.
+
+### Annotation Sets
+
+Within the Annotation-Ontology, Annotation Sets can be used to group several Annotations together. Although the FISE Enhancement Structure does not explicitly define a similar feature, the Stanbol Enhancer uses relations between FISE Enhancements for a similar purpose. Therefore the suggestion is to use this feature of the Annotation-Ontology to express sets of possible Categories or Entity suggestions.
+
+The following figure shows an example of an Annotation Set with a single Annotation:
+
+> ![Annotation sets](http://annotation-ontology.googlecode.com/svn/trunk/images/Annotation%20Set%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png "A simple Annotation Set with a single Annotation")
+
+> Image Credits: Annotation-Ontology [Link](http://annotation-ontology.googlecode.com/svn/trunk/images/Annotation%20Set%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png)
+
+This suggests the use of Annotation Sets to formally describe situations where the Stanbol Enhancer needs to group several Annotations in order to let users select from a predefined set of options. Assigning a unique ID - the URI of the AnnotationSet instance - to such a collection of Annotations also lets the consumer provide explicit feedback to the Stanbol Enhancer (e.g. by accepting/rejecting Annotations that are part of the AnnotationSet, adding an additional Annotation to a set, ...).
+
+Note that a single Annotation might be part of several Annotation Sets. As an example, take a Text Annotation for which two sets of Entity suggestions are generated.
+
+The suggestion is to create subclasses for common types of Annotation Sets used by the Stanbol Enhancer.
+
+#### Entity Suggestions
+
+With the FISE Enhancement Structure this is expressed by a *fise:TextAnnotation* that is linked to several *fise:EntityAnnotation*s by the *dc:relation* property.
+
+Expressing the same with the Annotation-Ontology would be possible as follows:
+
+* An Annotation Set links to the following Annotations (by the *ao:item* property):
+    * A TextAnnotation including the PrefixPostfixSelector that defines the actual position of the selected text within the document
+    * One EntityAnnotation (extends ao:Qualifier) per suggested Entity
+* In addition, the Annotation Set also includes metadata such as the Engine that created the suggestions
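+
+The list above could be sketched in Turtle as shown below. *ao:item* and *ao:hasTopic* are taken from this proposal; the `ao:` namespace IRI and the `ao:AnnotationSet` class name are assumptions to be checked against the ontology, while `ex:TextAnnotation`, `ex:EntityAnnotation`, `ex:createdBy` and all resource names are hypothetical placeholders for terms that would still have to be defined.
+
+```turtle
+@prefix ao: <http://purl.org/ao/> .           # assumed namespace IRI
+@prefix ex: <http://example.org/stanbol/> .   # hypothetical namespace, for illustration only
+
+# One set grouping a text selection with its entity suggestions
+ex:suggestions-1 a ao:AnnotationSet ;
+    ao:item ex:text-annotation-1 ,
+            ex:entity-annotation-1 ,
+            ex:entity-annotation-2 ;
+    ex:createdBy ex:some-engine .             # hypothetical metadata property
+
+ex:text-annotation-1 a ex:TextAnnotation ;    # hypothetical subclass of ao:Annotation
+    ao:context ex:selector-1 .
+
+ex:entity-annotation-1 a ex:EntityAnnotation ;   # hypothetical, would extend ao:Qualifier
+    ao:hasTopic <http://dbpedia.org/resource/Paris> .
+
+ex:entity-annotation-2 a ex:EntityAnnotation ;
+    ao:hasTopic <http://dbpedia.org/resource/Paris,_Texas> .
+```
+
+Giving the set its own URI is what would let a consumer refer to the whole group of suggestions when accepting or rejecting individual items.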
+
+#### Category Suggestions
+
+Typically a categorization provides more than a single Category. Grouping such suggestions within an AnnotationSet lets users accept or reject one or more of them. In addition, it would also make it possible to distinguish sets of categorizations calculated from disjoint sets of categories (e.g. a categorization based on a UserProfile versus a categorization based on general topics or a spatial categorization).
+
+