Posted to commits@stanbol.apache.org by bu...@apache.org on 2014/10/27 16:19:23 UTC
svn commit: r927044 - in /websites/staging/stanbol/trunk/content: ./
docs/trunk/components/enhancer/engines/list.html
docs/trunk/components/enhancer/engines/nif20.html
docs/trunk/components/enhancer/engines/nif20config.png
Author: buildbot
Date: Mon Oct 27 15:19:23 2014
New Revision: 927044
Log:
Staging update by buildbot for stanbol
Added:
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20.html
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png (with props)
Modified:
websites/staging/stanbol/trunk/content/ (props changed)
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html
Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Oct 27 15:19:23 2014
@@ -1 +1 @@
-1605679
+1634568
Modified: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html (original)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html Mon Oct 27 15:19:23 2014
@@ -393,9 +393,9 @@
</ul>
<h3 id="others">Others</h3>
<ul>
-<li><em>NLP 2 RDF Engine:</em> <strong>under development</strong> (see <a href="https://issues.apache.org/jira/browse/STANBOL-741">STANBOL-741</a>)<ul>
-<li>converts NLP processing results stored in the <a href="../nlp/analyzedtext">AnalyzedText</a> content part to RDF and adds them to the metadata of the <a href="../contentitem">ContentItem</a></li>
-<li>generated RDF uses the NIF (NLP Interchange Format)</li>
+<li><strong><a href="nif20">NIF 2.0 Transformation Engine</a></strong> serializes low-level NLP results as RDF<ul>
+<li><a href="http://persistence.uni-leipzig.org/nlp2rdf/">NIF 2.0</a> stands for NLP Interchange Format. It defines an RDF schema for describing sentences, phrases and words together with their NLP annotations.</li>
+<li>This engine exposes detailed NLP results that are typically only available through the Java API of the <a href="../nlp/analyzedtext">Analysed Text</a> content part.</li>
</ul>
</li>
</ul>
@@ -411,6 +411,14 @@
</ul>
</li>
<li>
+<p><em>NLP 2 RDF Engine:</em> <strong>under development</strong> (see <a href="https://issues.apache.org/jira/browse/STANBOL-741">STANBOL-741</a>)</p>
+<ul>
+<li>replaced by the <strong><a href="nif20">NIF 2.0 Transformation Engine</a></strong>, which supports version 2.0 of the NIF standard, whereas this engine is based on NIF 1.0</li>
+<li>converts NLP processing results stored in the <a href="../nlp/analyzedtext">AnalyzedText</a> content part to RDF and adds them to the metadata of the <a href="../contentitem">ContentItem</a></li>
+<li>generated RDF uses the NIF (NLP Interchange Format)</li>
+</ul>
+</li>
+<li>
<p><em>CachingDereferencerEngine</em> <strong>deprecated</strong> (see dereferencing support of individual engines as well as <a href="https://issues.apache.org/jira/browse/STANBOL-336">STANBOL-336</a>)</p>
<ul>
<li>retrieves additional content for presenting the enhancement results.</li>
Added: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20.html (added)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20.html Mon Oct 27 15:19:23 2014
@@ -0,0 +1,289 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+ <link href="/css/stanbol.css" rel="stylesheet" type="text/css">
+ <title>Apache Stanbol - NIF 2.0 Transformation Engine</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ <link title="doap" rel="meta" type="application/rdf+xml" href="/doap.rdf"/>
+ <link rel="icon" type="image/png" href="/images/stanbol-logo/stanbol-favicon.png"/>
+ <script type="text/javascript">
+ // Google Analytics Tracking Code
+ var _gaq = _gaq || [];
+ _gaq.push(['_setAccount', 'UA-32086816-1']);
+ _gaq.push(['_trackPageview']);
+
+ (function() {
+ var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+ ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+ var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+ })();
+ </script>
+</head>
+
+<body>
+ <div id="navigation"> <!-- but auto scroll the menue -->
+ <a href="/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/images/stanbol-logo/stanbol-2010-12-14.png"/></a><br />
+ <ul>
+<li><a href="/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/docs/trunk/">Documentation</a><ul>
+<li><a href="/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/docs/trunk/components/">Components</a></li>
+<li><a href="/docs/trunk/production-mode/">Production Mode</a></li>
+</ul>
+</li>
+<li><a href="/development/">Development</a><ul>
+<li><a href="/development/index.html#mailing_lists">Mailing Lists</a></li>
+<li><a href="/development/index.html#issue_tracker">Issue Tracker</a></li>
+<li><a href="/development/index.html#source_code">Source Code</a></li>
+<li><a href="/development/index.html#development_practices">Development Practices</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/downloads/">Overview</a><ul>
+<li><a href="/downloads/releases.html">Releases</a></li>
+<li><a href="/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/pmc/">PMC</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="archived-docs">Archived Docs</h1>
+<ul>
+<li><a href="/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+<p><br /><a href="/doap.rdf"><img style="margin-left: 1em;" border="0" alt="DOAP File" src="/images/doap.png"/></a></p>
+ </div>
+ <div id="content">
+ <div class="breadcrumbs">
+ <ul> <li><a href="/">Home</a></li> <li class="item"><a href="/docs/">Docs</a></li> <li class="item"><a href="/docs/trunk/">Trunk</a></li> <li class="item"><a href="/docs/trunk/components/">Components</a></li> <li class="item"><a href="/docs/trunk/components/enhancer/">Enhancer</a></li> <li class="item"><a href="/docs/trunk/components/enhancer/engines/">Engines</a></li> </ul>
+ </div>
+ <h1 class="title">NIF 2.0 Transformation Engine</h1>
+ <p>Low-level NLP results are typically not included in the RDF enhancement results. This engine supports the serialization of such results using the <a href="http://persistence.uni-leipzig.org/nlp2rdf/">NIF 2.0</a> (NLP Interchange Format) standard.</p>
+<h2 id="processed-information-input">Processed Information (Input)</h2>
+<p>Apache Stanbol manages NLP results in the <a href="../nlp/analyzedtext">Analysed Text</a> content part, which provides a Java API for accessing those results. This engine reads that information and transforms it according to the <a href="http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html">NIF 2.0</a> core ontology.</p>
+<p>If a ContentItem does not contain this content part, it will not be processed by this engine.</p>
+<h2 id="created-rdf">Created RDF</h2>
+<p>The engine serializes the following information:</p>
+<ul>
+<li>Segment URIs by using the <a href="http://tools.ietf.org/html/rfc5147">RFC 5147</a> URI scheme</li>
+<li>Selector information like <code>nif:beginIndex</code>, <code>nif:endIndex</code> as well as <code>nif:before</code>, <code>nif:anchorOf</code> and <code>nif:after</code>. For spans longer than 100 characters the <code>nif:head</code> property is used instead of <code>nif:anchorOf</code>.</li>
+<li>Context information: This includes <code>nif:referenceContext</code> links for all Strings as well as additional metadata for the context.</li>
+<li>String hierarchies: <code>nif:subString</code>, <code>nif:superString</code>, <code>nif:sentence</code></li>
+<li>String navigation: <code>nif:next-/nif:previousSentence</code>, <code>nif:next-/nif:previousWord</code></li>
+<li>String annotations: <code>nif:oliaCategory</code>, <code>nif:oliaConf</code> and <code>nif:posTag</code></li>
+</ul>
+<h3 id="configuration">Configuration</h3>
+<p>The engine supports several switches that enable or disable the serialization of specific NIF information. Multiple instances of the engine can be configured, each with a different configuration. The following figure shows the configuration dialog:</p>
+<p><img alt="NIF2.0 Engine Configuration" src="nif20config.png" /></p>
+<ul>
+<li><strong>Selector</strong> <em>(enhancer.engines.nlp2rdf.selector)</em>: Enables/disables the serialization of selector-related properties such as <code>nif:beginIndex</code>, <code>nif:endIndex</code>, <code>nif:before</code>, <code>nif:anchorOf</code> and <code>nif:after</code>. If disabled, clients can still parse the start/end indexes from the <a href="http://tools.ietf.org/html/rfc5147">RFC 5147</a> encoded segment URI.</li>
+<li><strong>Hierarchy</strong> <em>(enhancer.engines.nlp2rdf.hierarchy)</em>: Enables/disables the writing of hierarchical links. This includes the <code>nif:sentence</code>, <code>nif:superString</code> and <code>nif:subString</code> properties.</li>
+<li><strong>Previous and Next Links</strong> <em>(enhancer.engines.nlp2rdf.previousNext)</em>: Enables/disables the serialization of links to the previous/next sentence/word.</li>
+<li><strong>Context only URI Scheme</strong> <em>(enhancer.engines.nlp2rdf.cotextOnlyUriScheme)</em>: If enabled, the <code>rdf:type</code> that identifies the <a href="http://tools.ietf.org/html/rfc5147">RFC 5147</a> URI scheme is added only to the <code>nif:Context</code>. If disabled, the <code>nif:RFC5147String</code> <code>rdf:type</code> is added to all segments.</li>
+<li><strong>String Type</strong> <em>(enhancer.engines.nlp2rdf.writeStringType)</em>: If enabled, the <code>nif:String</code> type is added to all serialized segments. If disabled, only more specific types like <code>nif:Sentence</code> or <code>nif:Word</code> are used.</li>
+</ul>
+<h3 id="examples">Examples</h3>
+<p>This section provides some examples of RDF generated by this engine. OpenNLP was used to create the serialized NLP annotations. The sentence <code>The Apache Stanbol Enhancer can detect entities in text.</code> was used for generating these examples.</p>
+<div class="codehilite"><pre>@prefix content: <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#> .
+@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
+@prefix olia: <http://purl.org/olia/olia.owl#> .
+@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+</pre></div>
+
+
+<p>The first Turtle snippet shows the <code>nif:Context</code> instance. It is referenced by all segments and refers to the URI of the ContentItem via <code>nif:sourceUrl</code>.</p>
+<div class="codehilite"><pre>content:char=0
+ a nif:Context , nif:RFC5147String ;
+ nif:anchorOf
+ "The Apache Stanbol Enhancer can detect entities in text."@en ;
+ nif:beginIndex
+ "0"^^xsd:int ;
+ nif:endIndex
+ "56"^^xsd:int ;
+ nif:sourceUrl
+ <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e> .
+</pre></div>
+
+
+<p>Next is the segment describing the only sentence in the example text.</p>
+<div class="codehilite"><pre>content:char=0,56
+ a nif:RFC5147String , nif:Sentence ;
+ nif:anchorOf
+ "The Apache Stanbol Enhancer can detect entities in text."@en ;
+ nif:beginIndex
+ "0"^^xsd:int ;
+ nif:endIndex
+ "56"^^xsd:int ;
+ nif:firstWord
+ content:char=0,3 ;
+ nif:referenceContext
+ content:char=0 .
+</pre></div>
+
+
+<p>The following snippet shows the segments for the first three words of the Sentence.</p>
+<div class="codehilite"><pre>content:char=0,3
+ a nif:RFC5147String , nif:Word ;
+ nif:anchorOf
+ "The"@en ;
+ nif:beginIndex
+ "0"^^xsd:int ;
+ nif:endIndex
+ "3"^^xsd:int ;
+ nif:nextWord
+ content:char=4,10 ;
+ nif:oliaCategory
+ olia:Determiner , olia:PronounOrDeterminer ;
+ nif:oliaConf
+ "0.9662179110607207"^^xsd:double ;
+ nif:posTag
+ "DT"^^xsd:string ;
+ nif:referenceContext
+ content:char=0 ;
+ nif:sentence
+ content:char=0,56 ;
+ nif:subString
+ content:char=0,10 .
+
+content:char=4,10
+ a nif:RFC5147String , nif:Word ;
+ nif:anchorOf
+ "Apache"@en ;
+ nif:beginIndex
+ "4"^^xsd:int ;
+ nif:endIndex
+ "10"^^xsd:int ;
+ nif:nextWord
+ content:char=11,18 ;
+ nif:oliaCategory
+ olia:Noun , olia:PluralQuantifier , olia:ProperNoun , olia:Quantifier ;
+ nif:oliaConf
+ "0.7882547205652428"^^xsd:double ;
+ nif:posTag
+ "NNPS"^^xsd:string ;
+ nif:previousWord
+ content:char=0,3 ;
+ nif:referenceContext
+ content:char=0 ;
+ nif:sentence
+ content:char=0,56 ;
+ nif:subString
+ content:char=0,10 .
+
+content:char=11,18
+ a nif:RFC5147String , nif:Word ;
+ nif:anchorOf
+ "Stanbol"@en ;
+ nif:beginIndex
+ "11"^^xsd:int ;
+ nif:endIndex
+ "18"^^xsd:int ;
+ nif:nextWord
+ content:char=19,27 ;
+ nif:oliaCategory
+ olia:Noun , olia:ProperNoun , olia:Quantifier , olia:SingularQuantifier ;
+ nif:oliaConf
+ "0.701014272348203"^^xsd:double ;
+ nif:posTag
+ "NNP"^^xsd:string ;
+ nif:previousWord
+ content:char=4,10 ;
+ nif:referenceContext
+ content:char=0 ;
+ nif:sentence
+ content:char=0,56 ;
+ nif:subString
+ content:char=11,27 .
+</pre></div>
+
+
+<p>Phrases are also exported as RDF. Here is an example of a verb phrase, together with the segment for the verb, which links to the phrase using <code>nif:subString</code>.</p>
+<div class="codehilite"><pre>content:char=28,38
+ a nif:Phrase , nif:RFC5147String ;
+ nif:anchorOf
+ "can detect"@en ;
+ nif:beginIndex
+ "28"^^xsd:int ;
+ nif:endIndex
+ "38"^^xsd:int ;
+ nif:oliaCategory
+ olia:VerbPhrase ;
+ nif:oliaConf
+ "0.9864510669287669"^^xsd:double ;
+ nif:referenceContext
+ content:char=0 ;
+ nif:superString
+ content:char=0,56 .
+
+content:char=32,38
+ a nif:RFC5147String , nif:Word ;
+ nif:anchorOf
+ "detect"@en ;
+ nif:beginIndex
+ "32"^^xsd:int ;
+ nif:endIndex
+ "38"^^xsd:int ;
+ nif:nextWord
+ content:char=39,47 ;
+ nif:oliaCategory
+ olia:Infinitive , olia:Verb ;
+ nif:oliaConf
+ "0.9930989756397197"^^xsd:double ;
+ nif:posTag
+ "VB"^^xsd:string ;
+ nif:previousWord
+ content:char=28,31 ;
+ nif:referenceContext
+ content:char=0 ;
+ nif:sentence
+ content:char=0,56 ;
+ nif:subString
+ content:char=28,38 .
+</pre></div>
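+
+
+<p>The examples above were generated with both the <em>Context only URI Scheme</em> and the <em>String Type</em> switches disabled. The following hand-written sketch, derived only from the descriptions of those two switches and not taken from actual engine output, shows how the <code>rdf:type</code> statement of the first word might look if one of them is enabled. All other properties are omitted.</p>
+<div class="codehilite"><pre># Variant 1 (assumed): String Type enabled, nif:String is added to the segment
+content:char=0,3
+ a nif:RFC5147String , nif:String , nif:Word .
+
+# Variant 2 (assumed): Context only URI Scheme enabled, nif:RFC5147String is omitted
+content:char=0,3
+ a nif:Word .
+</pre></div>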
+ </div>
+
+ <div id="footer">
+ <div class="copyright">
+ <p>
+ Copyright © 2010 The Apache Software Foundation, Licensed under
+ the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+ <br />
+ Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+ </p>
+ </div>
+ </div>
+
+</body>
+</html>
+
Added: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
==============================================================================
Binary file - no diff available.
Propchange: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
------------------------------------------------------------------------------
svn:mime-type = image/png