Posted to commits@stanbol.apache.org by bu...@apache.org on 2013/03/20 16:25:16 UTC

svn commit: r855231 - in /websites/staging/stanbol/trunk/content: ./ docs/trunk/components/enhancer/engines/kuromojinlp.html docs/trunk/components/enhancer/engines/textannotationnewmodel.html

Author: buildbot
Date: Wed Mar 20 15:25:15 2013
New Revision: 855231

Log:
Staging update by buildbot for stanbol

Added:
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/kuromojinlp.html
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/textannotationnewmodel.html
Modified:
    websites/staging/stanbol/trunk/content/   (props changed)

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Wed Mar 20 15:25:15 2013
@@ -1 +1 @@
-1458883
+1458884

Added: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/kuromojinlp.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/kuromojinlp.html (added)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/kuromojinlp.html Wed Mar 20 15:25:15 2013
@@ -0,0 +1,123 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Kuromoji NLP Engine for Japanese</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="navigation"> <!-- but auto scroll the menue -->
+    <a href="/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/images/stanbol-logo/stanbol-2010-12-14.png"/></a><br />
+      <ul>
+<li><a href="/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/docs/trunk/">Documentation</a><ul>
+<li><a href="/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/docs/trunk/components/">Components</a></li>
+<li><a href="/docs/trunk/production-mode/">Production Mode</a></li>
+</ul>
+</li>
+<li><a href="/development/">Development</a><ul>
+<li><a href="/development/index.html#mailing_lists">Mailing Lists</a></li>
+<li><a href="/development/index.html#issue_tracker">Issue Tracker</a></li>
+<li><a href="/development/index.html#source_code">Source Code</a></li>
+<li><a href="/development/index.html#development_practices">Development Practices</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/downloads/">Overview</a><ul>
+<li><a href="/downloads/releases.html">Releases</a></li>
+<li><a href="/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/pmc/">PMC</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="archived-docs">Archived Docs</h1>
+<ul>
+<li><a href="/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrumbs">
+      <ul> <li><a href="/">Home</a></li> <li class="item"><a href="/docs/">Docs</a></li> <li class="item"><a href="/docs/trunk/">Trunk</a></li> <li class="item"><a href="/docs/trunk/components/">Components</a></li> <li class="item"><a href="/docs/trunk/components/enhancer/">Enhancer</a></li> <li class="item"><a href="/docs/trunk/components/enhancer/engines/">Engines</a></li> </ul>
+    </div>
+    <h1 class="title">Kuromoji NLP Engine for Japanese</h1>
+    <p><a href="http://www.atilika.org/">Kuromoji</a> is an NLP framework contributed to <a href="http://lucene.apache.org">Apache Lucene</a>. It is available starting with versions 3.6.2 and 4.1 of Solr/Lucene. In Stanbol it requires a version newer than <a href="http://svn.apache.org/r1458703">revision 1458703</a>, because it only works with the stanbol.commons.solr modules that are compatible with Solr 4.1.</p>
+<h2 id="consumed-information">Consumed information</h2>
+<ul>
+<li><strong>Language</strong> (required): The language of the text needs to be available. It is read from the metadata of the ContentItem as specified by <a href="https://issues.apache.org/jira/browse/STANBOL-613">STANBOL-613</a>. Effectively this means that a Stanbol Language Detection engine needs to be executed before the Kuromoji NLP Engine.</li>
+</ul>
+<h2 id="supported-modules">Supported modules</h2>
+<ul>
+<li><strong>Sentences</strong>: Kuromoji itself does not provide sentence detection, so sentences are detected from the POS tagging results: the POS tag '記号-句点' is used to split sentences. It is further assumed that each text starts and ends with a complete sentence.</li>
+<li><strong>Tokens</strong>: Kuromoji is configured to provide tokens for all words and punctuation. This is done by configuring an empty stop tag list and by setting the 'discardPunctuation' property to <code>false</code>.</li>
+<li><strong>POS tagging</strong>: The POS tag set used by Kuromoji is mapped to the LexicalCategories and POS types defined by the Stanbol NLP processing module. For the String tags the Japanese name is used (e.g. '名詞-代名詞-縮約' := Pos.Pronoun, Pos.Participle, description: noun-pronoun-contraction: Spoken language contraction made by combining a pronoun and the particle 'wa'. e.g. ありゃ, こりゃ, こりゃあ, そりゃ, そりゃあ).
+    POS tags are represented by adding <em>NlpAnnotations#POS_ANNOTATION</em>'s to the <em>Tokens</em> of the <em>AnalyzedText</em> content part. Kuromoji provides only a single POS tag per Token.</li>
+<li><strong>NER detection</strong>: The POS tag set used by Kuromoji defines POS tags describing named entities. Those POS tags are then combined to chunks and interpreted as named entities (e.g. '名詞-固有名詞-人名-姓' noun-proper-person-surname; '名詞-固有名詞-人名-名' noun-proper-person-given_name).
+    Named Entities are represented by adding <em>NlpAnnotations#NER_ANNOTATION</em>'s to the <em>Tokens</em> of the <em>AnalyzedText</em> content part. In addition, 'fise:TextAnnotations' are added to the metadata of the ContentItem.</li>
+</ul>
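+<p>The sentence rule described above can be illustrated with a minimal sketch. It is written in Python for brevity and does not use the Stanbol or Kuromoji API: the <code>split_sentences</code> function, the (surface, POS tag) pair representation and the hand-written sample tags are assumptions made for illustration. Only the use of '記号-句点' as the sentence boundary comes from the description above; treating trailing tokens as a final sentence is one possible reading of the assumption that a text ends with a complete sentence.</p>

```python
# Illustrative sketch only: tokens are (surface, pos_tag) pairs,
# not objects of the Stanbol AnalyzedText API.
SENTENCE_END_TAG = "記号-句点"  # POS tag used to split sentences


def split_sentences(tokens):
    """Group (surface, pos_tag) pairs into sentence strings."""
    sentences, current = [], []
    for surface, pos in tokens:
        current.append(surface)
        if pos == SENTENCE_END_TAG:
            # A full stop closes the current sentence.
            sentences.append("".join(current))
            current = []
    # Assumption: the text ends with a complete sentence, so tokens
    # left over without a final full stop form the last sentence.
    if current:
        sentences.append("".join(current))
    return sentences


# Hand-written sample input, not actual Kuromoji output.
tokens = [
    ("これ", "名詞-代名詞-一般"),
    ("は", "助詞-係助詞"),
    ("本", "名詞-一般"),
    ("です", "助動詞"),
    ("。", "記号-句点"),
    ("猫", "名詞-一般"),
    ("です", "助動詞"),
    ("。", "記号-句点"),
]
print(split_sentences(tokens))  # prints ['これは本です。', '猫です。']
```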
+<h3 id="confidence">Confidence</h3>
+<p>Kuromoji does not provide confidence values for results.</p>
+<h2 id="configuration">Configuration</h2>
+<p>The engine does not provide any custom configuration. However, it does support configuring the engine name.</p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>
+

Added: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/textannotationnewmodel.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/textannotationnewmodel.html (added)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/textannotationnewmodel.html Wed Mar 20 15:25:15 2013
@@ -0,0 +1,112 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - TextAnnotation new Model Converter Engine</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="navigation"> <!-- but auto scroll the menue -->
+    <a href="/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/images/stanbol-logo/stanbol-2010-12-14.png"/></a><br />
+      <ul>
+<li><a href="/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/docs/trunk/">Documentation</a><ul>
+<li><a href="/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/docs/trunk/components/">Components</a></li>
+<li><a href="/docs/trunk/production-mode/">Production Mode</a></li>
+</ul>
+</li>
+<li><a href="/development/">Development</a><ul>
+<li><a href="/development/index.html#mailing_lists">Mailing Lists</a></li>
+<li><a href="/development/index.html#issue_tracker">Issue Tracker</a></li>
+<li><a href="/development/index.html#source_code">Source Code</a></li>
+<li><a href="/development/index.html#development_practices">Development Practices</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/downloads/">Overview</a><ul>
+<li><a href="/downloads/releases.html">Releases</a></li>
+<li><a href="/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/pmc/">PMC</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="archived-docs">Archived Docs</h1>
+<ul>
+<li><a href="/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrumbs">
+      <ul> <li><a href="/">Home</a></li> <li class="item"><a href="/docs/">Docs</a></li> <li class="item"><a href="/docs/trunk/">Trunk</a></li> <li class="item"><a href="/docs/trunk/components/">Components</a></li> <li class="item"><a href="/docs/trunk/components/enhancer/">Enhancer</a></li> <li class="item"><a href="/docs/trunk/components/enhancer/engines/">Engines</a></li> </ul>
+    </div>
+    <h1 class="title">TextAnnotation new Model Converter Engine</h1>
+    <p>This Engine converts '<a href="../enhancementstructure#fisetextannotation">fise:TextAnnotation</a>' to include the 'fise:selection-prefix' and 'fise:selection-suffix' properties as introduced by <a href="https://issues.apache.org/jira/browse/STANBOL-987">STANBOL-987</a>.</p>
+<p>It processes all 'fise:TextAnnotation's that select a specific part of the text, meaning that they define a 'fise:start' and a 'fise:end' property. 'fise:TextAnnotation's that already define 'fise:selection-prefix' or 'fise:selection-suffix' properties are skipped.</p>
+<h2 id="configuration">Configuration:</h2>
+<p>Other than the configuration of the engine's name and ranking, this engine supports the following custom property:</p>
+<ul>
+<li><strong>Prefix Suffix Length</strong> <em>(enhancer.engines.textannotationnewmodel.prefixSuffixSize)</em>: Sets the length, in characters, of the prefixes and suffixes. The default is <code>10</code>. The minimum allowed value is <code>3</code>.</li>
+</ul>
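+<p>The effect of the prefix/suffix size can be illustrated with a minimal sketch. It is written in Python for brevity and is not the engine's implementation: the <code>selection_context</code> function is hypothetical, and the sketch assumes that the prefix consists of the characters immediately before 'fise:start' and the suffix of the characters immediately after 'fise:end'. Rejecting sizes below the minimum with an error is also an assumption; only the default of 10 and the minimum of 3 come from the configuration above.</p>

```python
MIN_SIZE = 3       # minimum allowed value, per the configuration above
DEFAULT_SIZE = 10  # default value, per the configuration above


def selection_context(text, start, end, size=DEFAULT_SIZE):
    """Return (prefix, suffix) around the selection text[start:end].

    Hypothetical helper: prefix = up to `size` characters before `start`,
    suffix = up to `size` characters after `end`.
    """
    if MIN_SIZE > size:
        # Assumed behaviour; how the engine treats smaller values is not documented here.
        raise ValueError("prefixSuffixSize must be at least 3")
    prefix = text[max(0, start - size):start]
    suffix = text[end:end + size]
    return prefix, suffix


text = "Apache Stanbol provides a set of reusable components."
start = text.index("Stanbol")
end = start + len("Stanbol")
print(selection_context(text, start, end))  # prints ('Apache ', ' provides ')
```

+<p>Near the beginning or end of the text, fewer characters than the configured size are available, so the sketch returns a shorter prefix or suffix.</p>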
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>
+