Posted to commits@stanbol.apache.org by bu...@apache.org on 2012/11/03 09:11:16 UTC

svn commit: r837121 - in /websites/staging/stanbol/trunk/content: ./ docs/trunk/components/enhancer/engines/customnermodelengine.html docs/trunk/components/enhancer/engines/customnermodelengineconfig.png docs/trunk/components/enhancer/engines/list.html

Author: buildbot
Date: Sat Nov  3 08:11:15 2012
New Revision: 837121

Log:
Staging update by buildbot for stanbol

Added:
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/customnermodelengine.html
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/customnermodelengineconfig.png   (with props)
Modified:
    websites/staging/stanbol/trunk/content/   (props changed)
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sat Nov  3 08:11:15 2012
@@ -1 +1 @@
-1402794
+1405305

Added: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/customnermodelengine.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/customnermodelengine.html (added)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/customnermodelengine.html Sat Nov  3 08:11:15 2012
@@ -0,0 +1,148 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - The Custom NER Model Extraction Engine</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="logo"> <!-- do not scroll the logo -->
+  <a href="/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/images/stanbol-logo/stanbol-2010-12-14.png"/></a></div>
+  <div id="navigation"> <!-- but auto scroll the menu -->
+      <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/index.html">Home</a></li>
+<li><a href="/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/docs/trunk/">Documentation</a><ul>
+<li><a href="/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/docs/trunk/components">Components</a></li>
+</ul>
+</li>
+<li><a href="/development/">Development</a></li>
+<li><a href="/production/">Production</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/downloads/">Overview</a><ul>
+<li><a href="/downloads/releases.html">Releases</a></li>
+<li><a href="/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="archive">Archive</h1>
+<ul>
+<li><a href="/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  <div id="content">
+    <div class="breadcrumbs">
+      <ul> <li><a href="/">Home</a></li> <li class="item"><a href="/docs/">Docs</a></li> <li class="item"><a href="/docs/trunk/">Trunk</a></li> <li class="item"><a href="/docs/trunk/components/">Components</a></li> <li class="item"><a href="/docs/trunk/components/enhancer/">Enhancer</a></li> <li class="item"><a href="/docs/trunk/components/enhancer/engines/">Engines</a></li> </ul>
+    </div>
+    <h1 class="title">The Custom NER Model Extraction Engine</h1>
+    <p>This engine allows the configuration of custom <a href="http://opennlp.apache.org">Apache OpenNLP</a> NameFinder models for NER of plain text content. </p>
+<h2 id="example-result">Example Result</h2>
+<p>This engine adds <a href="../enhancementstructure.html#fisetextannotation">fise:TextAnnotation</a>s for the processed plain text to the metadata of the content item. The following code listing shows a Named Entity of type DNA, detected by an OpenNLP NameFinder model trained on the <a href="http://www.nactem.ac.uk/tsujii/GENIA/ERtask/report.html">BioNLP2004</a> dataset:</p>
+<div class="codehilite"><pre><span class="p">{</span>
+    <span class="s">&quot;@subject&quot;</span><span class="p">:</span> <span class="s">&quot;urn:enhancement-0e31eb01-23c5-82b5-1372-5c5606c09960&quot;</span><span class="p">,</span>
+    <span class="s">&quot;@type&quot;</span><span class="p">:</span> <span class="p">[</span>
+        <span class="s">&quot;Enhancement&quot;</span><span class="p">,</span>
+        <span class="s">&quot;TextAnnotation&quot;</span>
+    <span class="p">],</span>
+    <span class="s">&quot;confidence&quot;</span><span class="p">:</span> <span class="mf">0.40148407</span><span class="p">,</span>
+    <span class="s">&quot;creator&quot;</span><span class="p">:</span> <span class="s">&quot;org.apache.stanbol.enhancer.engines.opennlp.impl.CustomNERModelEnhancementEngine&quot;</span><span class="p">,</span>
+    <span class="s">&quot;start&quot;</span><span class="p">:</span> <span class="mi">228</span><span class="p">,</span>
+    <span class="s">&quot;end&quot;</span><span class="p">:</span> <span class="mi">242</span><span class="p">,</span>
+    <span class="s">&quot;extracted-from&quot;</span><span class="p">:</span> <span class="s">&quot;urn:content-item-sha1-84a30aeeb073be543f7c54266e232aae572efac0&quot;</span><span class="p">,</span>
+    <span class="s">&quot;selected-text&quot;</span><span class="p">:</span> <span class="p">{</span>
+        <span class="s">&quot;@language&quot;</span><span class="p">:</span> <span class="s">&quot;en&quot;</span><span class="p">,</span>
+        <span class="s">&quot;@literal&quot;</span><span class="p">:</span> <span class="s">&quot;HIV-2 enhancer&quot;</span>
+    <span class="p">},</span>
+    <span class="s">&quot;selection-context&quot;</span><span class="p">:</span> <span class="p">{</span>
+        <span class="s">&quot;@language&quot;</span><span class="p">:</span> <span class="s">&quot;en&quot;</span><span class="p">,</span>
+        <span class="s">&quot;@literal&quot;</span><span class="p">:</span> <span class="s">&quot;activation of the HIV-2 enhancer in monocytes and T cells&quot;</span>
+    <span class="p">},</span>
+    <span class="s">&quot;type&quot;</span><span class="p">:</span> <span class="s">&quot;http://www.bootstrep.eu/ontology/GRO#DNA&quot;</span>
+<span class="p">},</span>
+</pre></div>
+
+
+<h2 id="configuration">Configuration</h2>
+<p>Using this engine requires creating a service configuration. Each configuration must specify at least one NameFinderModel name.</p>
+<h3 id="parameters">Parameters</h3>
+<ul>
+<li><strong>Name Finder Models</strong> <em>(stanbol.engines.opennlp-ner.nameFinderModels)</em>: The list of custom NameFinderModels used by this engine. The engine accepts Arrays, Vectors and comma-separated strings for this property. Values are the file names of the NameFinderModel files. Configured files are loaded by using the DataFileProvider service. This means that model files copied into the 'datafiles' folder (by default located at '{stanbol-working-dir}/stanbol/datafiles') are available to the engine.</li>
+<li><strong>Named Entity to 'dc:type' Mappings</strong> <em>(stanbol.engines.opennlp-ner.typeMappings)</em>: This configuration uses the syntax "{named-entity-type} &gt; {uri}". {named-entity-type} matches the string "name" used for the named entity type in the OpenNLP NameFinder model. {uri} MUST be a valid URI and is used as the dc:type value for fise:TextAnnotations created by the engine for extracted Named Entities. NOTE: TextAnnotations for unmapped Named Entity types will have no dc:type information.</li>
+</ul>
+<p>The following figure shows an engine configuration that covers all Named Entity types supported by the <a href="http://www.nactem.ac.uk/tsujii/GENIA/ERtask/report.html">BioNLP2004</a> dataset.</p>
+<p><img alt="'CustomNerModelEngine Configuration'" src="customnermodelengineconfig.png" title="This figure shows the configuration screen as presented by the Apache Felix WebConsole when creating a Component Configuration for the Custom NER Model Engine" /></p>
+<p>The same configuration can also be provided as an OSGi configuration file with the name 'org.apache.stanbol.enhancer.engines.opennlp.impl.CustomNERModelEnhancementEngine-ehealthner.config' and the contents:</p>
+<div class="codehilite"><pre>stanbol.enhancer.engine.name=&quot;ehealth-ner&quot;
+stanbol.engines.opennlp-ner.nameFinderModels=[&quot;bionlp2004-DNA-en.bin&quot;,&quot;bionlp2004-protein-en.bin&quot;,&quot;bionlp2004-cell_type-en.bin&quot;,&quot;bionlp2004-cell_line-en.bin&quot;,&quot;bionlp2004-RNA-en.bin&quot;]
+stanbol.engines.opennlp-ner.typeMappings=[&quot;DNA\ &gt;\ http://www.bootstrep.eu/ontology/GRO#DNA&quot;,&quot;RNA\ &gt;\ http://www.bootstrep.eu/ontology/GRO#RNA&quot;,&quot;protein\ &gt;\ http://www.bootstrep.eu/ontology/GRO#Protein&quot;,&quot;cell_type\ &gt;\ http://purl.bioontology.org/ontology/CL&quot;,&quot;cell_line\ &gt;\ http://purl.bioontology.org/ontology/MCCL&quot;]
+</pre></div>
+
+
+<p>NOTE: the '.config' format requires spaces to be escaped with '\'.</p>
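+<p>As a minimal sketch, assuming the comma-separated string form mentioned for the Name Finder Models parameter is written like any other string value in a '.config' file, a smaller configuration that loads only two of the models might look like this (the engine name 'dna-protein-ner' is a hypothetical example and not part of the original configuration):</p>
+<div class="codehilite"><pre>stanbol.enhancer.engine.name=&quot;dna-protein-ner&quot;
+stanbol.engines.opennlp-ner.nameFinderModels=&quot;bionlp2004-DNA-en.bin,bionlp2004-protein-en.bin&quot;
+stanbol.engines.opennlp-ner.typeMappings=[&quot;DNA\ &gt;\ http://www.bootstrep.eu/ontology/GRO#DNA&quot;,&quot;protein\ &gt;\ http://www.bootstrep.eu/ontology/GRO#Protein&quot;]
+</pre></div>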
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>
+

Added: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/customnermodelengineconfig.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/customnermodelengineconfig.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Modified: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html (original)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html Sat Nov  3 08:11:15 2012
@@ -127,6 +127,14 @@
 </ul>
 </li>
 <li>
+<p><strong><a href="customnermodelengine.html">Custom NER Model Extraction Enhancement Engine</a></strong> </p>
+<ul>
+<li>NLP processing using OpenNLP NER</li>
+<li>uses custom NameFinder models (user configured)</li>
+<li>supports custom Named Entity types (other than persons, places and organizations)</li>
+</ul>
+</li>
+<li>
 <p><strong><a href="keywordlinkingengine.html">KeywordLinkingEngine</a></strong></p>
 <ul>
 <li>NLP processing using OpenNLP</li>