Posted to commits@stanbol.apache.org by rw...@apache.org on 2014/10/27 16:19:11 UTC

svn commit: r1634568 - in /stanbol/site/trunk/content/docs/trunk/components/enhancer/engines: list.mdtext nif20.mdtext nif20config.png

Author: rwesten
Date: Mon Oct 27 15:19:11 2014
New Revision: 1634568

URL: http://svn.apache.org/r1634568
Log:
added Enhancement Engine Documentation for STANBOL-1397

Added:
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png   (with props)
Modified:
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext?rev=1634568&r1=1634567&r2=1634568&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext Mon Oct 27 15:19:11 2014
@@ -213,9 +213,9 @@ Apache Stanbol provide a core implementa
 
 ### Others
 
-* _NLP 2 RDF Engine:_ __under development__ (see [STANBOL-741](https://issues.apache.org/jira/browse/STANBOL-741))
-	* converts NLP processing results stored in the [AnalyzedText](../nlp/analyzedtext) content part to RDF and adds them to the metadata of the [ContentItem](../contentitem)
-	* generated RDF uses the NIF (NLP Interchange Format)
+* __[NIF 2.0 Transformation Engine](nif20)__ serializes low-level NLP results as RDF
+    * [NIF 2.0](http://persistence.uni-leipzig.org/nlp2rdf/) stands for NLP Interchange Format. It defines an RDF schema for describing sentences, phrases, words and their NLP annotations.
+    * This engine exposes detailed information about NLP results that is typically only available through the Java API of the [Analysed Text](../nlp/analyzedtext) content part.
 
 
 ## Deprecated
@@ -227,6 +227,13 @@ Enhancement Engines listed below are no 
 	* supports multiple languages
 	* detects occurrences of untyped entities as concepts, takes local taxonomies as linking target	
 
+* _NLP 2 RDF Engine:_ __under development__ (see [STANBOL-741](https://issues.apache.org/jira/browse/STANBOL-741))
+    * replaced by the __[NIF 2.0 Transformation Engine](nif20)__, which supports version 2.0 of the NIF standard, while this engine is based on NIF 1.0
+	* converts NLP processing results stored in the [AnalyzedText](../nlp/analyzedtext) content part to RDF and adds them to the metadata of the [ContentItem](../contentitem)
+	* generated RDF uses the NIF (NLP Interchange Format)
+
+
+
 * _CachingDereferencerEngine_ __deprecated__ (see dereferencing support of individual engines as well as  [STANBOL-336](https://issues.apache.org/jira/browse/STANBOL-336))
 	* retrieves additional content for presenting the enhancement results.
 

Added: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext?rev=1634568&view=auto
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext (added)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext Mon Oct 27 15:19:11 2014
@@ -0,0 +1,193 @@
+Title: NIF 2.0 Transformation Engine
+
+Low-level NLP results are typically not included in the RDF enhancement results. This engine supports serializing such results using the [NIF 2.0](http://persistence.uni-leipzig.org/nlp2rdf/) (NLP Interchange Format) standard.
+
+## Processed Information (Input)
+
+Apache Stanbol manages NLP results in the [Analysed Text](../nlp/analyzedtext) content part, which provides a Java API for accessing them. This engine reads that information and transforms it according to the [NIF 2.0](http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html) core ontology.
+
+If a ContentItem does not contain this content part, it is not processed by this engine.
+
+## Created RDF
+
+The engine serializes the following information:
+
+* Segment URIs by using the [RFC 5147](http://tools.ietf.org/html/rfc5147) URI scheme
+* Selector information like `nif:beginIndex`, `nif:endIndex` as well as `nif:before`, `nif:anchorOf` and `nif:after`. For spans longer than 100 characters, the `nif:head` property is used instead of `nif:anchorOf`.
+* Context information: This includes `nif:referenceContext` links for all Strings as well as additional metadata for the context.
+* String hierarchies: `nif:sub-/nif:superWord`, `nif:sentence`
+* String navigation: `nif:next-/nif:previousSentence`, `nif:next-/nif:previousWord`
+* String annotations: `nif:oliaCategory`, `nif:oliaConf` and `nif:posTag`
+
+### Configuration
+
+The engine provides several switches for enabling or disabling the serialization of specific NIF information. Multiple engine instances, each with its own configuration, can be created. The following figure shows the configuration dialog:
+
+![NIF2.0 Engine Configuration](nif20config.png)
+
+* __Selector__ _(enhancer.engines.nlp2rdf.selector)_: Enables or disables the serialization of selector-related properties such as `nif:beginIndex`, `nif:endIndex`, `nif:before`, `nif:anchorOf` and `nif:after`. If disabled, clients can still parse the start/end indexes from the [RFC 5147](http://tools.ietf.org/html/rfc5147) encoded segment URI.
+* __Hierarchy__ _(enhancer.engines.nlp2rdf.hierarchy)_: Enables or disables writing hierarchical links. This includes the `nif:sentence`, `nif:superString` and `nif:subString` properties.
+* __Previous and Next Links__ _(enhancer.engines.nlp2rdf.previousNext)_: Enables or disables the serialization of links to the previous/next sentence and word.
+* __Context only URI Scheme__ _(enhancer.engines.nlp2rdf.cotextOnlyUriScheme)_: If enabled, the `nif:RFC5147String` type identifying the [RFC 5147](http://tools.ietf.org/html/rfc5147) URI scheme is added only to the `rdf:type` of the `nif:Context`. If disabled, the `nif:RFC5147String` `rdf:type` is added to all segments.
+* __String Type__ _(enhancer.engines.nlp2rdf.writeStringType)_: If enabled, the `nif:String` type is added to all serialized segments. If disabled, only more specific types like `nif:Sentence` or `nif:Word` are used.
+
+### Examples
+
+This section provides some examples of RDF generated by this engine. OpenNLP was used to create the serialized NLP annotations, and the sentence `The Apache Stanbol Enhancer can detect entities in text.` was used to generate the examples.
+
+    :::text
+    @prefix content: <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#> .
+    @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
+    @prefix olia: <http://purl.org/olia/olia.owl#> .
+    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+
+The first Turtle snippet shows the `nif:Context` instance. It is referenced by all segments and refers to the URI of the ContentItem via `nif:sourceUrl`.
+
+    :::text
+    content:char=0
+        a nif:Context ,  nif:RFC5147String ;
+        nif:anchorOf
+            "The Apache Stanbol Enhancer can detect entities in text."@en ;
+        nif:beginIndex
+            "0"^^xsd:int ;
+        nif:endIndex
+            "56"^^xsd:int ;
+        nif:sourceUrl
+            <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e> .
+
+Next, the segment describing the only sentence in the example text:
+
+    :::text
+    content:char=0,56
+        a nif:RFC5147String ,  nif:Sentence ;
+        nif:anchorOf
+            "The Apache Stanbol Enhancer can detect entities in text."@en ;
+        nif:beginIndex
+            "0"^^xsd:int ;
+        nif:endIndex
+            "56"^^xsd:int ;
+        nif:firstWord
+            content:char=0,3 ;
+        nif:referenceContext
+            content:char=0 .
+
+The following snippet shows the segments for the first three words of the sentence.
+
+    :::text
+    content:char=0,3
+        a nif:RFC5147String ,  nif:Word ;
+        nif:anchorOf
+            "The"@en ;
+        nif:beginIndex
+            "0"^^xsd:int ;
+        nif:endIndex
+            "3"^^xsd:int ;
+        nif:nextWord
+            content:char=4,10 ;
+        nif:oliaCategory
+             olia:Determiner ,  olia:PronounOrDeterminer ;
+        nif:oliaConf
+            "0.9662179110607207"^^xsd:double ;
+        nif:posTag
+            "DT"^^xsd:string ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:sentence
+            content:char=0,56 ;
+        nif:subString
+            content:char=0,10 .
+
+    content:char=4,10
+        a nif:RFC5147String ,  nif:Word ;
+        nif:anchorOf
+            "Apache"@en ;
+        nif:beginIndex
+            "4"^^xsd:int ;
+        nif:endIndex
+            "10"^^xsd:int ;
+        nif:nextWord
+            content:char=11,18 ;
+        nif:oliaCategory
+             olia:Noun ,  olia:PluralQuantifier ,  olia:ProperNoun ,  olia:Quantifier ;
+        nif:oliaConf
+            "0.7882547205652428"^^xsd:double ;
+        nif:posTag
+            "NNPS"^^xsd:string ;
+        nif:previousWord
+            content:char=0,3 ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:sentence
+            content:char=0,56 ;
+        nif:subString
+            content:char=0,10 .
+
+    content:char=11,18
+        a nif:RFC5147String ,  nif:Word ;
+        nif:anchorOf
+            "Stanbol"@en ;
+        nif:beginIndex
+            "11"^^xsd:int ;
+        nif:endIndex
+            "18"^^xsd:int ;
+        nif:nextWord
+            content:char=19,27 ;
+        nif:oliaCategory
+             olia:Noun ,  olia:ProperNoun ,  olia:Quantifier ,  olia:SingularQuantifier ;
+        nif:oliaConf
+            "0.701014272348203"^^xsd:double ;
+        nif:posTag
+            "NNP"^^xsd:string ;
+        nif:previousWord
+            content:char=4,10 ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:sentence
+            content:char=0,56 ;
+        nif:subString
+            content:char=11,27 .
+
+Phrases are also exported as RDF. The following example shows a verb phrase together with the segment for its verb, which links to the phrase using `nif:subString`.
+
+    :::text
+    content:char=28,38
+        a nif:Phrase ,  nif:RFC5147String ;
+        nif:anchorOf
+            "can detect"@en ;
+        nif:beginIndex
+            "28"^^xsd:int ;
+        nif:endIndex
+            "38"^^xsd:int ;
+        nif:oliaCategory
+             olia:VerbPhrase ;
+        nif:oliaConf
+            "0.9864510669287669"^^xsd:double ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:superString
+            content:char=0,56 .
+
+    content:char=32,38
+        a nif:RFC5147String ,  nif:Word ;
+        nif:anchorOf
+            "detect"@en ;
+        nif:beginIndex
+            "32"^^xsd:int ;
+        nif:endIndex
+            "38"^^xsd:int ;
+        nif:nextWord
+            content:char=39,47 ;
+        nif:oliaCategory
+             olia:Infinitive ,  olia:Verb ;
+        nif:oliaConf
+            "0.9930989756397197"^^xsd:double ;
+        nif:posTag
+            "VB"^^xsd:string ;
+        nif:previousWord
+            content:char=28,31 ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:sentence
+            content:char=0,56 ;
+        nif:subString
+            content:char=28,38 .
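
The NIF 2.0 documentation above notes that when the Selector switch is disabled, clients can still recover start/end offsets from the RFC 5147 encoded segment URIs. The following is a minimal, hypothetical client-side sketch in Python, not part of Stanbol. The function name `parse_char_fragment` is illustrative. It assumes the full segment URI (with the `content:` prefix expanded) is available, and it handles only the two forms that appear in the examples: `char=B,E` for segments and `char=B` for the context.

```python
import re
from typing import Optional, Tuple

# RFC 5147 "char" scheme as used in the example segment URIs:
# "char=B,E" for a range (segments) or "char=B" for a single position (the context).
# Other RFC 5147 forms (open ranges, "line=...", integrity checks) are not handled.
_CHAR_FRAGMENT = re.compile(r"char=(\d+)(?:,(\d+))?")


def parse_char_fragment(uri: str) -> Tuple[int, Optional[int]]:
    """Return (begin, end) from the fragment of a segment URI; end is None for a position."""
    if "#" not in uri:
        raise ValueError(f"URI has no fragment: {uri!r}")
    fragment = uri.rsplit("#", 1)[1]
    match = _CHAR_FRAGMENT.fullmatch(fragment)
    if match is None:
        raise ValueError(f"unsupported fragment: {fragment!r}")
    begin = int(match.group(1))
    end = int(match.group(2)) if match.group(2) is not None else None
    return begin, end


if __name__ == "__main__":
    text = "The Apache Stanbol Enhancer can detect entities in text."
    base = "urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#"
    for fragment in ("char=0,3", "char=4,10", "char=28,38"):
        begin, end = parse_char_fragment(base + fragment)
        # The recovered offsets select the same text as the segment's nif:anchorOf.
        print(fragment, "->", repr(text[begin:end]))
```

Slicing the context text with the parsed offsets yields the same strings as the `nif:anchorOf` values in the examples ("The", "Apache", "can detect"). This is why the selector properties can be omitted without losing positional information.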

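When the Previous and Next Links switch is enabled, a client can walk a sentence word by word. It starts at the sentence's `nif:firstWord` and follows `nif:nextWord`. The sketch below is hypothetical Python, not a Stanbol API. It assumes the triples have already been loaded with an RDF library (not shown) and reduced to a mapping between segment offsets. The data comes from the NIF 2.0 example snippets above, which show only the first three `nif:nextWord` links.

```python
from typing import Dict, List, Tuple

# A segment is identified here by its (begin, end) offsets, i.e. the
# "char=B,E" part of its URI, rather than by the full URI.
Span = Tuple[int, int]


def words_in_order(first_word: Span, next_word: Dict[Span, Span]) -> List[Span]:
    """Follow nif:nextWord links from a sentence's nif:firstWord until no successor exists."""
    ordered = [first_word]
    seen = {first_word}
    current = first_word
    while current in next_word:
        current = next_word[current]
        if current in seen:  # malformed data: stop instead of looping forever
            raise ValueError(f"cycle in nif:nextWord links at {current}")
        seen.add(current)
        ordered.append(current)
    return ordered


if __name__ == "__main__":
    text = "The Apache Stanbol Enhancer can detect entities in text."
    # nif:firstWord of the sentence content:char=0,56
    first = (0, 3)
    # nif:nextWord links of the three word segments shown in the examples
    links = {(0, 3): (4, 10), (4, 10): (11, 18), (11, 18): (19, 27)}
    print([text[b:e] for b, e in words_in_order(first, links)])
```

Because only three links are available here, the walk stops at "Enhancer". With the full engine output, it would continue to the end of the sentence.
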
Added: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png?rev=1634568&view=auto
==============================================================================
Binary file - no diff available.

Propchange: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream
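
The Context only URI Scheme and String Type switches described in the NIF 2.0 documentation above determine which `rdf:type` values a segment receives. The following hypothetical Python sketch models that documented behaviour for reasoning about the output; it is not the engine's Java implementation. It covers sentence, word and phrase segments only, not the `nif:Context`. The property names are the ones listed in the configuration section, and no default values are assumed.

```python
from typing import Dict, Set

# Configuration property names as listed in the engine documentation.
CONTEXT_ONLY_URI_SCHEME = "enhancer.engines.nlp2rdf.cotextOnlyUriScheme"
WRITE_STRING_TYPE = "enhancer.engines.nlp2rdf.writeStringType"


def segment_types(specific_type: str, config: Dict[str, bool]) -> Set[str]:
    """Model the rdf:type values of a segment (e.g. nif:Word) under the two switches."""
    types = {specific_type}
    if not config[CONTEXT_ONLY_URI_SCHEME]:
        # Disabled: the URI scheme type is written for every segment.
        types.add("nif:RFC5147String")
    if config[WRITE_STRING_TYPE]:
        # Enabled: the generic nif:String type is added as well.
        types.add("nif:String")
    return types


if __name__ == "__main__":
    both_disabled = {CONTEXT_ONLY_URI_SCHEME: False, WRITE_STRING_TYPE: False}
    print(sorted(segment_types("nif:Word", both_disabled)))
```

With both switches disabled, the model yields `nif:RFC5147String` and `nif:Word`, the same types as the word segments in the examples. This suggests those examples were generated with both switches disabled.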