Posted to commits@stanbol.apache.org by bu...@apache.org on 2014/10/27 16:19:23 UTC

svn commit: r927044 - in /websites/staging/stanbol/trunk/content: ./ docs/trunk/components/enhancer/engines/list.html docs/trunk/components/enhancer/engines/nif20.html docs/trunk/components/enhancer/engines/nif20config.png

Author: buildbot
Date: Mon Oct 27 15:19:23 2014
New Revision: 927044

Log:
Staging update by buildbot for stanbol

Added:
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20.html
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png   (with props)
Modified:
    websites/staging/stanbol/trunk/content/   (props changed)
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Oct 27 15:19:23 2014
@@ -1 +1 @@
-1605679
+1634568

Modified: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html (original)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html Mon Oct 27 15:19:23 2014
@@ -393,9 +393,9 @@
 </ul>
 <h3 id="others">Others</h3>
 <ul>
-<li><em>NLP 2 RDF Engine:</em> <strong>under development</strong> (see <a href="https://issues.apache.org/jira/browse/STANBOL-741">STANBOL-741</a>)<ul>
-<li>converts NLP processing results stored in the <a href="../nlp/analyzedtext">AnalyzedText</a> content part to RDF and adds them to the metadata of the <a href="../contentitem">ContentItem</a></li>
-<li>generated RDF uses the NIF (NLP Interchange Format)</li>
+<li><strong><a href="nif20">NIF 2.0 Transformation Engine</a></strong> serializes low-level NLP results as RDF<ul>
+<li><a href="http://persistence.uni-leipzig.org/nlp2rdf/">NIF 2.0</a> is version 2.0 of the NLP Interchange Format. It defines an RDF schema for describing sentences, phrases and words together with their NLP annotations.</li>
+<li>This engine exposes detailed NLP results that are typically only available through the Java API of the <a href="../nlp/analyzedtext">Analysed Text</a> content part.</li>
 </ul>
 </li>
 </ul>
@@ -411,6 +411,14 @@
 </ul>
 </li>
 <li>
+<p><em>NLP 2 RDF Engine:</em> <strong>under development</strong> (see <a href="https://issues.apache.org/jira/browse/STANBOL-741">STANBOL-741</a>)</p>
+<ul>
+<li>replaced by the <strong><a href="nif20">NIF 2.0 Transformation Engine</a></strong>, which supports version 2.0 of the NIF standard, whereas this engine is based on NIF 1.0</li>
+<li>converts NLP processing results stored in the <a href="../nlp/analyzedtext">AnalyzedText</a> content part to RDF and adds them to the metadata of the <a href="../contentitem">ContentItem</a></li>
+<li>generated RDF uses the NIF (NLP Interchange Format)</li>
+</ul>
+</li>
+<li>
 <p><em>CachingDereferencerEngine</em> <strong>deprecated</strong> (see dereferencing support of individual engines as well as  <a href="https://issues.apache.org/jira/browse/STANBOL-336">STANBOL-336</a>)</p>
 <ul>
 <li>retrieves additional content for presenting the enhancement results.</li>

Added: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20.html (added)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20.html Mon Oct 27 15:19:23 2014
@@ -0,0 +1,289 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - NIF 2.0 Transformation Engine</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link title="doap" rel="meta" type="application/rdf+xml" href="/doap.rdf"/>
+  <link rel="icon" type="image/png" href="/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="navigation"> <!-- but auto scroll the menu -->
+    <a href="/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/images/stanbol-logo/stanbol-2010-12-14.png"/></a><br />
+      <ul>
+<li><a href="/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/docs/trunk/">Documentation</a><ul>
+<li><a href="/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/docs/trunk/components/">Components</a></li>
+<li><a href="/docs/trunk/production-mode/">Production Mode</a></li>
+</ul>
+</li>
+<li><a href="/development/">Development</a><ul>
+<li><a href="/development/index.html#mailing_lists">Mailing Lists</a></li>
+<li><a href="/development/index.html#issue_tracker">Issue Tracker</a></li>
+<li><a href="/development/index.html#source_code">Source Code</a></li>
+<li><a href="/development/index.html#development_practices">Development Practices</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/downloads/">Overview</a><ul>
+<li><a href="/downloads/releases.html">Releases</a></li>
+<li><a href="/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/pmc/">PMC</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="archived-docs">Archived Docs</h1>
+<ul>
+<li><a href="/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+<p><br /><a href="/doap.rdf"><img style="margin-left: 1em;" border="0" alt="DOAP File" src="/images/doap.png"/></a></p>
+  </div>
+  <div id="content">
+    <div class="breadcrumbs">
+      <ul> <li><a href="/">Home</a></li> <li class="item"><a href="/docs/">Docs</a></li> <li class="item"><a href="/docs/trunk/">Trunk</a></li> <li class="item"><a href="/docs/trunk/components/">Components</a></li> <li class="item"><a href="/docs/trunk/components/enhancer/">Enhancer</a></li> <li class="item"><a href="/docs/trunk/components/enhancer/engines/">Engines</a></li> </ul>
+    </div>
+    <h1 class="title">NIF 2.0 Transformation Engine</h1>
+    <p>Low-level NLP results are typically not included in the RDF enhancement results. This engine serializes such results using the <a href="http://persistence.uni-leipzig.org/nlp2rdf/">NIF 2.0</a> (NLP Interchange Format) standard.</p>
+<h2 id="processed-information-input">Processed Information (Input)</h2>
+<p>Apache Stanbol manages NLP results in the <a href="../nlp/analyzedtext">Analysed Text</a> content part. This ContentPart provides a Java API for accessing those results. This engine reads that information and transforms it according to the <a href="http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html">NIF 2.0</a> core ontology.</p>
+<p>If a ContentItem does not contain this content part, it is not processed by this engine.</p>
+<h2 id="created-rdf">Created RDF</h2>
+<p>The engine serializes the following information:</p>
+<ul>
+<li>Segment URIs by using the <a href="http://tools.ietf.org/html/rfc5147">RFC 5147</a> URI scheme</li>
+<li>Selector information such as <code>nif:beginIndex</code> and <code>nif:endIndex</code>, as well as <code>nif:before</code>, <code>nif:anchorOf</code> and <code>nif:after</code>. For spans longer than 100 characters, the <code>nif:head</code> property is used instead of <code>nif:anchorOf</code>.</li>
+<li>Context information: This includes <code>nif:referenceContext</code> links for all Strings as well as additional metadata for the context.</li>
+<li>String hierarchies: <code>nif:subString</code>, <code>nif:superString</code>, <code>nif:sentence</code></li>
+<li>String navigation: <code>nif:nextSentence</code>, <code>nif:previousSentence</code>, <code>nif:nextWord</code>, <code>nif:previousWord</code></li>
+<li>String annotations: <code>nif:oliaCategory</code>, <code>nif:oliaConf</code> and <code>nif:posTag</code></li>
+</ul>
+<h3 id="configuration">Configuration</h3>
+<p>The engine provides several switches that enable or disable the serialization of specific NIF information. Multiple instances of the engine can be configured, each with a different configuration. The following figure shows the configuration dialog:</p>
+<p><img alt="NIF2.0 Engine Configuration" src="nif20config.png" /></p>
+<ul>
+<li><strong>Selector</strong> <em>(enhancer.engines.nlp2rdf.selector)</em>: Enables or disables the serialization of selector-related properties such as <code>nif:beginIndex</code>, <code>nif:endIndex</code>, <code>nif:before</code>, <code>nif:anchorOf</code> and <code>nif:after</code>. If disabled, clients can still parse the start/end indexes from the <a href="http://tools.ietf.org/html/rfc5147">RFC 5147</a> encoded segment URI.</li>
+<li><strong>Hierarchy</strong> <em>(enhancer.engines.nlp2rdf.hierarchy)</em>: Enables or disables the writing of hierarchical links. This includes the <code>nif:sentence</code>, <code>nif:superString</code> and <code>nif:subString</code> properties.</li>
+<li><strong>Previous and Next Links</strong> <em>(enhancer.engines.nlp2rdf.previousNext)</em>: Enables or disables the serialization of links to the previous/next sentence and word.</li>
+<li><strong>Context only URI Scheme</strong> <em>(enhancer.engines.nlp2rdf.cotextOnlyUriScheme)</em>: If enabled, the type denoting the <a href="http://tools.ietf.org/html/rfc5147">RFC 5147</a> URI scheme is added only as an <code>rdf:type</code> of the <code>nif:Context</code>. If disabled, the <code>nif:RFC5147String</code> <code>rdf:type</code> is added to all segments.</li>
+<li><strong>String Type</strong> <em>(enhancer.engines.nlp2rdf.writeStringType)</em>: If enabled, the <code>nif:String</code> type is added to all serialized segments. If disabled, only more specific types such as <code>nif:Sentence</code> or <code>nif:Word</code> are used.</li>
+</ul>
+<h3 id="examples">Examples</h3>
+<p>This section provides some examples of RDF generated by this engine. OpenNLP was used to create the serialized NLP annotations. The sentence <code>The Apache Stanbol Enhancer can detect entities in text</code> was used to generate these examples.</p>
+<div class="codehilite"><pre>@prefix content: &lt;urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#&gt; .
+@prefix nif: &lt;http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#&gt; .
+@prefix olia: &lt;http://purl.org/olia/olia.owl#&gt; .
+@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .
+</pre></div>
+
+
+<p>The first Turtle snippet shows the <code>nif:Context</code> instance. It is referenced by all segments and refers to the URI of the ContentItem via <code>nif:sourceUrl</code>.</p>
+<div class="codehilite"><pre>content:char=0
+    a nif:Context ,  nif:RFC5147String ;
+    nif:anchorOf
+        &quot;The Apache Stanbol Enhancer can detect entities in text.&quot;@en ;
+    nif:beginIndex
+        &quot;0&quot;^^xsd:int ;
+    nif:endIndex
+        &quot;56&quot;^^xsd:int ;
+    nif:sourceUrl
+        &lt;urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e&gt; .
+</pre></div>
+
+
+<p>Next comes the segment describing the only sentence in the example text.</p>
+<div class="codehilite"><pre>content:char=0,56
+    a nif:RFC5147String ,  nif:Sentence ;
+    nif:anchorOf
+        &quot;The Apache Stanbol Enhancer can detect entities in text.&quot;@en ;
+    nif:beginIndex
+        &quot;0&quot;^^xsd:int ;
+    nif:endIndex
+        &quot;56&quot;^^xsd:int ;
+    nif:firstWord
+        content:char=0,3 ;
+    nif:referenceContext
+        content:char=0 .
+</pre></div>
+
+
+<p>The following snippet shows the segments for the first three words of the Sentence.</p>
+<div class="codehilite"><pre>content:char=0,3
+    a nif:RFC5147String ,  nif:Word ;
+    nif:anchorOf
+        &quot;The&quot;@en ;
+    nif:beginIndex
+        &quot;0&quot;^^xsd:int ;
+    nif:endIndex
+        &quot;3&quot;^^xsd:int ;
+    nif:nextWord
+        content:char=4,10 ;
+    nif:oliaCategory
+         olia:Determiner ,  olia:PronounOrDeterminer ;
+    nif:oliaConf
+        &quot;0.9662179110607207&quot;^^xsd:double ;
+    nif:posTag
+        &quot;DT&quot;^^xsd:string ;
+    nif:referenceContext
+        content:char=0 ;
+    nif:sentence
+        content:char=0,56 ;
+    nif:subString
+        content:char=0,10 .
+
+content:char=4,10
+    a nif:RFC5147String ,  nif:Word ;
+    nif:anchorOf
+        &quot;Apache&quot;@en ;
+    nif:beginIndex
+        &quot;4&quot;^^xsd:int ;
+    nif:endIndex
+        &quot;10&quot;^^xsd:int ;
+    nif:nextWord
+        content:char=11,18 ;
+    nif:oliaCategory
+         olia:Noun ,  olia:PluralQuantifier ,  olia:ProperNoun ,  olia:Quantifier ;
+    nif:oliaConf
+        &quot;0.7882547205652428&quot;^^xsd:double ;
+    nif:posTag
+        &quot;NNPS&quot;^^xsd:string ;
+    nif:previousWord
+        content:char=0,3 ;
+    nif:referenceContext
+        content:char=0 ;
+    nif:sentence
+        content:char=0,56 ;
+    nif:subString
+        content:char=0,10 .
+
+content:char=11,18
+    a nif:RFC5147String ,  nif:Word ;
+    nif:anchorOf
+        &quot;Stanbol&quot;@en ;
+    nif:beginIndex
+        &quot;11&quot;^^xsd:int ;
+    nif:endIndex
+        &quot;18&quot;^^xsd:int ;
+    nif:nextWord
+        content:char=19,27 ;
+    nif:oliaCategory
+         olia:Noun ,  olia:ProperNoun ,  olia:Quantifier ,  olia:SingularQuantifier ;
+    nif:oliaConf
+        &quot;0.701014272348203&quot;^^xsd:double ;
+    nif:posTag
+        &quot;NNP&quot;^^xsd:string ;
+    nif:previousWord
+        content:char=4,10 ;
+    nif:referenceContext
+        content:char=0 ;
+    nif:sentence
+        content:char=0,56 ;
+    nif:subString
+        content:char=11,27 .
+</pre></div>
+
+
+<p>Phrases are also exported as RDF. The following example shows a verb phrase, together with the segment for the verb, which links to the phrase using <code>nif:subString</code>.</p>
+<div class="codehilite"><pre>content:char=28,38
+    a nif:Phrase ,  nif:RFC5147String ;
+    nif:anchorOf
+        &quot;can detect&quot;@en ;
+    nif:beginIndex
+        &quot;28&quot;^^xsd:int ;
+    nif:endIndex
+        &quot;38&quot;^^xsd:int ;
+    nif:oliaCategory
+         olia:VerbPhrase ;
+    nif:oliaConf
+        &quot;0.9864510669287669&quot;^^xsd:double ;
+    nif:referenceContext
+        content:char=0 ;
+    nif:superString
+        content:char=0,56 .
+
+content:char=32,38
+    a nif:RFC5147String ,  nif:Word ;
+    nif:anchorOf
+        &quot;detect&quot;@en ;
+    nif:beginIndex
+        &quot;32&quot;^^xsd:int ;
+    nif:endIndex
+        &quot;38&quot;^^xsd:int ;
+    nif:nextWord
+        content:char=39,47 ;
+    nif:oliaCategory
+         olia:Infinitive ,  olia:Verb ;
+    nif:oliaConf
+        &quot;0.9930989756397197&quot;^^xsd:double ;
+    nif:posTag
+        &quot;VB&quot;^^xsd:string ;
+    nif:previousWord
+        content:char=28,31 ;
+    nif:referenceContext
+        content:char=0 ;
+    nif:sentence
+        content:char=0,56 ;
+    nif:subString
+        content:char=28,38 .
+</pre></div>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>
+

Added: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
------------------------------------------------------------------------------
    svn:mime-type = image/png
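
Illustrative note on the nif20.html page added in this commit: the description of the Selector option says that clients can still parse the start/end indexes from the RFC 5147 encoded segment URI. The following is a minimal Python sketch of such a client. It is not part of Stanbol. The function name parse_char_fragment is hypothetical, and the sketch assumes segment URIs end in a fragment of the form char=<start>,<end> or char=<position>, as in the Turtle examples above. Other RFC 5147 features (the line= scheme, omitted positions, length/md5 integrity checks) are deliberately not handled.

```python
import re
from typing import Optional, Tuple

# Assumed fragment shapes, taken from the example URIs in nif20.html:
#   char=<start>,<end>   a character range (sentences, phrases, words)
#   char=<position>      a single position (used for the nif:Context)
_CHAR_FRAGMENT = re.compile(r"^char=(\d+)(?:,(\d+))?$")


def parse_char_fragment(segment_uri: str) -> Tuple[int, Optional[int]]:
    """Return (start, end) parsed from the fragment of a segment URI.

    end is None when the fragment only carries a single position.
    Raises ValueError if the URI has no fragment or the fragment does
    not match the simplified char= forms listed above.
    """
    _, sep, fragment = segment_uri.partition("#")
    if not sep:
        raise ValueError("segment URI has no fragment: %r" % segment_uri)
    match = _CHAR_FRAGMENT.match(fragment)
    if match is None:
        raise ValueError("unsupported fragment: %r" % fragment)
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) is not None else None
    return start, end


if __name__ == "__main__":
    # Content item URI and text taken from the examples in nif20.html.
    base = "urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e"
    text = "The Apache Stanbol Enhancer can detect entities in text."
    start, end = parse_char_fragment(base + "#char=4,10")
    print(start, end, text[start:end])  # prints: 4 10 Apache
```

With the indexes recovered this way, a client that also holds the original text can rebuild the value of nif:anchorOf by slicing the text, so the selector properties can be disabled without losing that information.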