Posted to commits@stanbol.apache.org by rw...@apache.org on 2014/10/27 16:19:11 UTC
svn commit: r1634568 - in
/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines:
list.mdtext nif20.mdtext nif20config.png
Author: rwesten
Date: Mon Oct 27 15:19:11 2014
New Revision: 1634568
URL: http://svn.apache.org/r1634568
Log:
added Enhancement Engine Documentation for STANBOL-1397
Added:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png (with props)
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext?rev=1634568&r1=1634567&r2=1634568&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext Mon Oct 27 15:19:11 2014
@@ -213,9 +213,9 @@ Apache Stanbol provide a core implementa
### Others
-* _NLP 2 RDF Engine:_ __under development__ (see [STANBOL-741](https://issues.apache.org/jira/browse/STANBOL-741))
- * converts NLP processing results stored in the [AnalyzedText](../nlp/analyzedtext) content part to RDF and adds them to the metadata of the [ContentItem](../contentitem)
- * generated RDF uses the NIF (NLP Interchange Format)
+* __[NIF 2.0 Transformation Engine](nif20)__ serializes low-level NLP results as RDF
+ * [NIF 2.0](http://persistence.uni-leipzig.org/nlp2rdf/) stands for NLP Interchange Format. It defines an RDF schema for describing sentences, phrases and words together with their NLP annotations.
+ * This engine gives access to detailed NLP results that are typically only available through the Java API of the [Analysed Text](../nlp/analyzedtext) content part.
## Deprecated
@@ -227,6 +227,13 @@ Enhancement Engines listed below are no
* supports multiple languages
* detects occurrences of untyped entities as concepts, takes local taxonomies as linking target
+* _NLP 2 RDF Engine:_ __under development__ (see [STANBOL-741](https://issues.apache.org/jira/browse/STANBOL-741))
+ * replaced by the __[NIF 2.0 Transformation Engine](nif20)__, which supports version 2.0 of the NIF standard, while this engine is based on NIF 1.0
+ * converts NLP processing results stored in the [AnalyzedText](../nlp/analyzedtext) content part to RDF and adds them to the metadata of the [ContentItem](../contentitem)
+ * generated RDF uses the NIF (NLP Interchange Format)
+
+
+
* _CachingDereferencerEngine_ __deprecated__ (see dereferencing support of individual engines as well as [STANBOL-336](https://issues.apache.org/jira/browse/STANBOL-336))
* retrieves additional content for presenting the enhancement results.
Added: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext?rev=1634568&view=auto
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext (added)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext Mon Oct 27 15:19:11 2014
@@ -0,0 +1,193 @@
+Title: NIF 2.0 Transformation Engine
+
+Low-level NLP results are typically not included in the RDF enhancement results. This engine serializes such results using the [NIF 2.0](http://persistence.uni-leipzig.org/nlp2rdf/) (NLP Interchange Format) standard.
+
+## Processed Information (Input)
+
+Apache Stanbol manages NLP results in the [Analysed Text](../nlp/analyzedtext) content part, which provides a Java API for accessing them. This engine reads that information and transforms it according to the [NIF 2.0](http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html) core ontology.
+
+If a ContentItem does not contain this content part, this engine does not process it.
+
+## Created RDF
+
+The engine serializes the following information:
+
+* Segment URIs by using the [RFC 5147](http://tools.ietf.org/html/rfc5147) URI scheme
+* Selector information such as `nif:beginIndex`, `nif:endIndex`, `nif:before`, `nif:anchorOf` and `nif:after`. For spans longer than 100 characters the `nif:head` property is used instead of `nif:anchorOf`.
+* Context information: This includes `nif:referenceContext` links for all Strings as well as additional metadata for the context.
+* String hierarchies: `nif:subString`, `nif:superString`, `nif:sentence`
+* String navigation: `nif:nextSentence`, `nif:previousSentence`, `nif:nextWord`, `nif:previousWord`
+* String annotations: `nif:oliaCategory`, `nif:oliaConf` and `nif:posTag`
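
As a minimal sketch of the segment URI scheme listed above, the following Python snippet builds a `char=begin,end` fragment identifier on top of a ContentItem URI. The helper name `segment_uri` is hypothetical and not part of Stanbol; the URI layout is an assumption inferred from the examples in this document.

```python
def segment_uri(content_item_uri: str, begin: int, end: int) -> str:
    """Build an RFC 5147 style segment URI (hypothetical helper, not Stanbol API)."""
    if begin < 0 or end < begin:
        raise ValueError("expected 0 <= begin <= end")
    # Offsets are character positions; 'end' is exclusive, like Python slices.
    return f"{content_item_uri}#char={begin},{end}"


text = "The Apache Stanbol Enhancer can detect entities in text."
base = "urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e"

# Locate the word "detect" and derive its segment URI from the offsets.
begin = text.index("detect")
end = begin + len("detect")
uri = segment_uri(base, begin, end)
```

The resulting offsets can be used directly as a slice (`text[begin:end]`) to recover the anchored string.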
+
+### Configuration
+
+The engine provides several switches that enable or disable the serialization of specific NIF information. Multiple engine instances with different configurations can be set up. The following figure shows the configuration dialog:
+
+![NIF2.0 Engine Configuration](nif20config.png)
+
+* __Selector__ _(enhancer.engines.nlp2rdf.selector)_: Enables/disables the serialization of selector related properties such as `nif:beginIndex`, `nif:endIndex`, `nif:before`, `nif:anchorOf` and `nif:after`. If disabled, clients can still parse the start/end indexes from the [RFC 5147](http://tools.ietf.org/html/rfc5147) encoded segment URI.
+* __Hierarchy__ _(enhancer.engines.nlp2rdf.hierarchy)_: Enables/disables the writing of hierarchical links. This includes the `nif:sentence`, `nif:superString` and `nif:subString` properties.
+* __Previous and Next Links__ _(enhancer.engines.nlp2rdf.previousNext)_: Enables/disables the serialization of links to the previous/next sentence/word.
+* __Context only URI Scheme__ _(enhancer.engines.nlp2rdf.cotextOnlyUriScheme)_: If enabled, the `nif:RFC5147String` type, which identifies the [RFC 5147](http://tools.ietf.org/html/rfc5147) URI scheme, is added only as `rdf:type` of the `nif:Context`. If disabled, the `nif:RFC5147String` `rdf:type` is added to all segments.
+* __String Type__ _(enhancer.engines.nlp2rdf.writeStringType)_: If enabled, the `nif:String` type is added to all serialized segments. If disabled, only more specific types like `nif:Sentence` or `nif:Word` are used.
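
When the selector properties are disabled, a client has to recover the offsets from the segment URI itself. The following Python snippet is a minimal sketch of that step, assuming the `#char=begin,end` fragment layout used in the examples of this document. The function name `parse_offsets` is hypothetical, and the sketch ignores optional parts of RFC 5147 such as integrity checks.

```python
import re

# Matches '#char=<begin>' optionally followed by ',<end>' at the end of the URI.
_CHAR_FRAGMENT = re.compile(r"#char=(\d+)(?:,(\d+))?$")


def parse_offsets(segment_uri: str):
    """Return (begin, end) parsed from the URI fragment; end is None if absent."""
    match = _CHAR_FRAGMENT.search(segment_uri)
    if match is None:
        raise ValueError(f"no char= fragment in {segment_uri!r}")
    begin = int(match.group(1))
    end = int(match.group(2)) if match.group(2) is not None else None
    return begin, end


offsets = parse_offsets(
    "urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#char=28,38"
)
```

A URI that carries only a start position, like the `char=0` context URI, yields `None` as the end offset.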
+
+### Examples
+
+This section provides some examples of RDF generated by this engine. OpenNLP was used to create the serialized NLP annotations. The sentence `The Apache Stanbol Enhancer can detect entities in text.` was used to generate these examples.
+
+ :::text
+ @prefix content: <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#> .
+ @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
+ @prefix olia: <http://purl.org/olia/olia.owl#> .
+ @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+
+The first Turtle snippet shows the `nif:Context` instance. It is referenced by all segments and refers to the URI of the ContentItem via `nif:sourceUrl`.
+
+ :::text
+ content:char=0
+ a nif:Context , nif:RFC5147String ;
+ nif:anchorOf
+ "The Apache Stanbol Enhancer can detect entities in text."@en ;
+ nif:beginIndex
+ "0"^^xsd:int ;
+ nif:endIndex
+ "56"^^xsd:int ;
+ nif:sourceUrl
+ <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e> .
+
+Next comes the segment describing the only sentence in the example text.
+
+ :::text
+ content:char=0,56
+ a nif:RFC5147String , nif:Sentence ;
+ nif:anchorOf
+ "The Apache Stanbol Enhancer can detect entities in text."@en ;
+ nif:beginIndex
+ "0"^^xsd:int ;
+ nif:endIndex
+ "56"^^xsd:int ;
+ nif:firstWord
+ content:char=0,3 ;
+ nif:referenceContext
+ content:char=0 .
+
+The following snippet shows the segments for the first three words of the Sentence.
+
+ :::text
+ content:char=0,3
+ a nif:RFC5147String , nif:Word ;
+ nif:anchorOf
+ "The"@en ;
+ nif:beginIndex
+ "0"^^xsd:int ;
+ nif:endIndex
+ "3"^^xsd:int ;
+ nif:nextWord
+ content:char=4,10 ;
+ nif:oliaCategory
+ olia:Determiner , olia:PronounOrDeterminer ;
+ nif:oliaConf
+ "0.9662179110607207"^^xsd:double ;
+ nif:posTag
+ "DT"^^xsd:string ;
+ nif:referenceContext
+ content:char=0 ;
+ nif:sentence
+ content:char=0,56 ;
+ nif:subString
+ content:char=0,10 .
+
+ content:char=4,10
+ a nif:RFC5147String , nif:Word ;
+ nif:anchorOf
+ "Apache"@en ;
+ nif:beginIndex
+ "4"^^xsd:int ;
+ nif:endIndex
+ "10"^^xsd:int ;
+ nif:nextWord
+ content:char=11,18 ;
+ nif:oliaCategory
+ olia:Noun , olia:PluralQuantifier , olia:ProperNoun , olia:Quantifier ;
+ nif:oliaConf
+ "0.7882547205652428"^^xsd:double ;
+ nif:posTag
+ "NNPS"^^xsd:string ;
+ nif:previousWord
+ content:char=0,3 ;
+ nif:referenceContext
+ content:char=0 ;
+ nif:sentence
+ content:char=0,56 ;
+ nif:subString
+ content:char=0,10 .
+
+ content:char=11,18
+ a nif:RFC5147String , nif:Word ;
+ nif:anchorOf
+ "Stanbol"@en ;
+ nif:beginIndex
+ "11"^^xsd:int ;
+ nif:endIndex
+ "18"^^xsd:int ;
+ nif:nextWord
+ content:char=19,27 ;
+ nif:oliaCategory
+ olia:Noun , olia:ProperNoun , olia:Quantifier , olia:SingularQuantifier ;
+ nif:oliaConf
+ "0.701014272348203"^^xsd:double ;
+ nif:posTag
+ "NNP"^^xsd:string ;
+ nif:previousWord
+ content:char=4,10 ;
+ nif:referenceContext
+ content:char=0 ;
+ nif:sentence
+ content:char=0,56 ;
+ nif:subString
+ content:char=11,27 .
+
+Phrases are also exported as RDF. Here is an example of a verb phrase. Also included is the segment for the verb, which links to the phrase using `nif:subString`.
+
+ :::text
+ content:char=28,38
+ a nif:Phrase , nif:RFC5147String ;
+ nif:anchorOf
+ "can detect"@en ;
+ nif:beginIndex
+ "28"^^xsd:int ;
+ nif:endIndex
+ "38"^^xsd:int ;
+ nif:oliaCategory
+ olia:VerbPhrase ;
+ nif:oliaConf
+ "0.9864510669287669"^^xsd:double ;
+ nif:referenceContext
+ content:char=0 ;
+ nif:superString
+ content:char=0,56 .
+
+ content:char=32,38
+ a nif:RFC5147String , nif:Word ;
+ nif:anchorOf
+ "detect"@en ;
+ nif:beginIndex
+ "32"^^xsd:int ;
+ nif:endIndex
+ "38"^^xsd:int ;
+ nif:nextWord
+ content:char=39,47 ;
+ nif:oliaCategory
+ olia:Infinitive , olia:Verb ;
+ nif:oliaConf
+ "0.9930989756397197"^^xsd:double ;
+ nif:posTag
+ "VB"^^xsd:string ;
+ nif:previousWord
+ content:char=28,31 ;
+ nif:referenceContext
+ content:char=0 ;
+ nif:sentence
+ content:char=0,56 ;
+ nif:subString
+ content:char=28,38 .
Added: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png?rev=1634568&view=auto
==============================================================================
Binary file - no diff available.
Propchange: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream