Posted to commits@stanbol.apache.org by bu...@apache.org on 2013/06/10 07:43:00 UTC

svn commit: r865093 - in /websites/staging/stanbol/trunk/content: ./ docs/trunk/components/enhancer/engines/entitylinking.html

Author: buildbot
Date: Mon Jun 10 05:43:00 2013
New Revision: 865093

Log:
Staging update by buildbot for stanbol

Modified:
    websites/staging/stanbol/trunk/content/   (props changed)
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Jun 10 05:43:00 2013
@@ -1 +1 @@
-1491339
+1491341

Modified: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html (original)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html Mon Jun 10 05:43:00 2013
@@ -135,13 +135,12 @@
 
 
 <p>where:</p>
-<div class="codehilite"><pre><span class="o">*</span> <span class="p">{</span><span class="n">lt</span><span class="p">}</span> <span class="p">...</span> <span class="n">the</span> <span class="n">_Linkable</span> <span class="n">Token_</span> <span class="k">for</span> <span class="n">that</span> <span class="n">the</span> <span class="n">search</span> <span class="n">is</span> <span class="n">issued</span>
-<span class="o">*</span> <span class="p">{</span><span class="n">at</span><span class="p">}</span> <span class="p">...</span> <span class="n">additional</span> <span class="n">_Linkable</span><span class="o">-</span><span class="n">_</span> <span class="n">or</span> <span class="n">_Matchable</span> <span class="n">Tokens_</span> <span class="n">included</span> <span class="n">in</span> <span class="n">the</span> <span class="n">search</span>
-<span class="o">*</span> <span class="p">{</span><span class="n">lang</span><span class="p">}</span> <span class="p">...</span> <span class="n">the</span> <span class="n">language</span> <span class="n">of</span> <span class="n">the</span> <span class="n">text</span>
-<span class="o">*</span> <span class="p">{</span><span class="n">dl</span><span class="p">}</span> <span class="p">...</span> <span class="n">the</span> <span class="n">configured</span> <span class="n">_Default</span> <span class="n">Matching</span> <span class="n">Language_</span><span class="p">.</span> <span class="n">If</span> <span class="p">{</span><span class="n">df</span><span class="p">}</span> <span class="o">==</span> <span class="p">{</span><span class="n">lang</span><span class="p">}</span> <span class="n">than</span> <span class="n">the</span> <span class="n">or</span> <span class="n">term</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="k">for</span> <span class="n">the</span> <span class="p">{</span><span class="n">dl</span><span class="p">}</span> <span class="n">are</span> <span class="n">omitted</span>
-</pre></div>
-
-
+<ul>
+<li>{lt} ... the <em>Linkable Token</em> for which the search is issued</li>
+<li>{at} ... additional <em>Linkable-</em> or <em>Matchable Tokens</em> included in the search</li>
+<li>{lang} ... the language of the text</li>
+<li>{dl} ... the configured <em>Default Matching Language</em>. If '{dl} == {lang}' then the OR term(s) for the {dl} are omitted</li>
+</ul>
 <p>For results of those queries the labels in the {lang} and {dl} are matched against the text. However, {dl} labels are only considered if no match was found for labels in the language of the text. To match labels with the Tokens of the text, the engine needs to tokenize the labels. This is done by using the <em>LabelTokenizer</em> interface.</p>
 <p>The matching process distinguishes between matchable and non-matchable Tokens as well as non-alpha-numeric Tokens that are completely ignored. Matching starts at the position of the <em>Linkable Token</em> for which the search in the configured vocabulary was issued. From this position Tokens in the Label are matched with Tokens in the text until the first matchable or the second non-matchable token is not found. In a second round the same is done in the backward direction. The configured <em>Min Token Match Factor</em> determines how exactly tokens in the text must correspond to tokens in the label for a match to be considered. This is repeated for all labels of an Entity. The label match that covers the most tokens is then considered as the match for that Entity.</p>
 <p>There are various parameters that can be used to fine-tune the matching process. The most important decision, however, is whether to include suggestions where labels with two tokens match only a single <em>Matchable Token</em> in the Text (e.g. "Barack Obama" matching "Obama" but also 1000+ "Tom {something}" matching "Tom"). The default configuration of the Engine excludes those, but depending on the use case and the linked vocabulary users might want to change this. See the documentation of the <em>Min Matched Tokens</em> and <em>Min Label Score</em> parameters for details and examples. </p>
@@ -157,7 +156,7 @@
 <p>The EntityLinkingEngine is configured by passing a <em>TextProcessingConfig</em> and an <em>EntityLinkingConfig</em> to its constructor. Both configuration classes provide an API based configuration (via getters and setters) as well as an OSGi Dictionary based configuration (via a static method that creates a new instance from a passed configuration).</p>
 <p>The following two sections describe the "key, value" based configuration, because the API based version is already described by the JavaDoc.</p>
 <h3 id="text-processing-configuration">Text Processing Configuration</h3>
-<h4 id="proper-noun-linking-wzxhzdk16enhancerengineslinkingpropernounsstatewzxhzdk17">Proper Noun Linking <small><em>(enhancer.engines.linking.properNounsState)</em></small></h4>
+<h4 id="proper-noun-linking-wzxhzdk15enhancerengineslinkingpropernounsstatewzxhzdk16">Proper Noun Linking <small><em>(enhancer.engines.linking.properNounsState)</em></small></h4>
 <p>This is a high-level configuration option that lets users easily specify whether they want to do EntityLinking based on any Nouns ("Noun Linking") or only on ProperNouns ("Proper Noun Linking").
 Configuration-wise this pre-sets the defaults for the linkable <em>LexicalCategories</em> and <em>Pos</em> types.</p>
 <p>"Noun linking" is equivalent to the behavior of the <a href="keywordlinkingengine">KeywordLinkingEngine</a> while "Proper Noun Linking" is similar to using NER (Named Entity Recognition) with the <a href="namedentityextractionengine">NamedEntityLinking</a> engine. </p>
@@ -171,7 +170,7 @@ Configuration wise this will pre-set the
 </li>
 </ol>
 <p>Where the use case allows it, activating "Proper Noun Linking" is strongly recommended because it considerably improves performance: in typical text only around 1/10 of the Nouns are marked as Proper Nouns, so the number of vocabulary lookups decreases accordingly.</p>
-<h4 id="language-processing-configuration-wzxhzdk18enhancerengineslinkingprocessedlanguageswzxhzdk19">Language Processing configuration <small><em>(enhancer.engines.linking.processedLanguages)</em></small></h4>
+<h4 id="language-processing-configuration-wzxhzdk17enhancerengineslinkingprocessedlanguageswzxhzdk18">Language Processing configuration <small><em>(enhancer.engines.linking.processedLanguages)</em></small></h4>
 <p>This parameter is used for two things: (1) to specify which languages are processed and (2) to provide specific configurations for how languages are processed. For the second aspect there is also a default configuration that can be extended with language-specific settings.</p>
 <p><strong>1. Processed Languages Configuration:</strong></p>
 <p>For the configuration of the processed languages the following syntax is used:</p>