Posted to commits@stanbol.apache.org by bu...@apache.org on 2012/12/18 19:49:01 UTC

svn commit: r843005 - in /websites/staging/stanbol/trunk/content: ./ docs/trunk/components/enhancer/engines/entitylinking.html

Author: buildbot
Date: Tue Dec 18 18:49:00 2012
New Revision: 843005

Log:
Staging update by buildbot for stanbol

Modified:
    websites/staging/stanbol/trunk/content/   (props changed)
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Tue Dec 18 18:49:00 2012
@@ -1 +1 @@
-1423476
+1423575

Modified: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html (original)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html Tue Dec 18 18:49:00 2012
@@ -141,7 +141,7 @@
 
 <p>For the results of those queries, the labels in {lang} and {dl} are matched against the text. However, {dl} labels are only considered if no match was found for labels in the language of the text. To match labels against the Tokens of the text, the engine needs to tokenize the labels. This is done using the <em>LabelTokenizer</em> interface.</p>
 <p>The matching process distinguishes between matchable and non-matchable Tokens, as well as non-alphanumeric Tokens, which are ignored completely. Matching starts at the position of the <em>Linkable Token</em> for which the search in the configured vocabulary was issued. From this position, Tokens in the label are matched with Tokens in the text until the first matchable or second non-matchable token is not found. In a second round the same is done in the backward direction. The configured <em>Min Token Match Factor</em> determines how exactly tokens in the text must correspond to tokens in the label for a match to be considered. This is repeated for all labels of an Entity. The label match that covers the most tokens is then considered the match for that Entity.</p>
-<p>There are various parameters that can be used to fine tune the matching process. But the most important decision is if one want to include suggestions where labels with two tokens do only match a single <em>Matchable Token</em> in the Text (e.g. "Barack Obama" matching "Obama" but also 1000+ "Tom {something}" matching "Tom"). The default configuration of the Engine excludes those but depending on the use case and the linked vocabulary users might want to change this. See the documentation of the <em>Min Matched Tokens</em> and <em>Min Label Match Score</em> for details and examples. </p>
+<p>There are various parameters that can be used to fine-tune the matching process. The most important decision is whether to include suggestions where labels with two tokens match only a single <em>Matchable Token</em> in the text (e.g. "Barack Obama" matching "Obama", but also 1000+ "Tom {something}" matching "Tom"). The default configuration of the Engine excludes those, but depending on the use case and the linked vocabulary users might want to change this. See the documentation of <em>Min Matched Tokens</em> and <em>Min Label Score</em> for details and examples.</p>
 <h3 id="writing-enhancement-results">Writing Enhancement Results</h3>
 <p>This step covers the following steps:</p>
 <ul>
@@ -258,21 +258,21 @@ Configuration wise this will pre-set the
 <p>The parameters below are used to configure the matching process.</p>
 <ul>
 <li><strong>Minimum Token Match Score</strong> <em>(enhancer.engines.linking.minTokenScore)</em>: This defines how well single tokens of the text need to match single tokens in the label to be considered matching. This parameter configures the lower limit. However, the actual token match score also influences the overall matching scores of labels against the text, so inexact matches decrease the matching scores for the whole label.</li>
-<li><strong>Min Label Match Score</strong> <em>org.apache.stanbol.enhancer.engines.keywordextraction.minLabelMatchFactor</em> [0..1]::double: The "Label Score" [0..1] represents how much of the Label of an Entity matches with the Text. It compares the number of Tokens of the Label with the number of Tokens matched to the Text. Not exact matches for Tokens, or if the Tokens within the label do appear in an other order than in the text do also reduce this score. Entities are only considered if at least one of their labels cores higher than the minimum for all tree of <em>Min Label Match Score</em>, <em>Min Text Match Score</em> and <em>Min Match Score</em>.</li>
+<li><strong>Min Label Score</strong> <em>(enhancer.engines.linking.minLabelScore)</em> [0..1]::double: The "Label Score" [0..1] represents how much of the Label of an Entity matches the Text. It compares the number of Tokens of the Label with the number of Tokens matched to the Text. Inexact Token matches, or Tokens that appear in the label in a different order than in the text, also reduce this score. Entities are only considered if at least one of their labels scores higher than the minimum for all three of <em>Min Label Score</em>, <em>Min Text Score</em> and <em>Min Match Score</em>.</li>
 <li>
-<p><strong>Min Matched Tokens</strong> <em>org.apache.stanbol.enhancer.engines.keywordextraction.minFoundTokens</em> [1..*]::int: The minimum number of matching tokens. Only "matchable" tokens are counted. For full matches (where all tokens of the Label do match tokens in the text) this parameter is ignored.</p>
-<p>This parameter is strongly related with the <em>Min Label Match Score</em> Typical setting are</p>
+<p><strong>Min Matched Tokens</strong> <em>(enhancer.engines.linking.minFoundTokens)</em> [1..*]::int: The minimum number of matching tokens. Only "matchable" tokens are counted. For full matches (where all tokens of the Label do match tokens in the text) this parameter is ignored.</p>
+<p>This parameter is strongly related to the <em>Min Label Score</em>. Typical settings are:</p>
 <ol>
-<li><em>Min Matched Tokens</em>=1 and <em>Min Label Match Score</em> &gt; 0.5 (e.g. 0.75)</li>
-<li><em>Min Matched Tokens</em>=2 and <em>Min Label Match Score</em> &lt;= 0.5 (e.g. 0.5)</li>
+<li><em>Min Matched Tokens</em>=1 and <em>Min Label Score</em> &gt; 0.5 (e.g. 0.75)</li>
+<li><em>Min Matched Tokens</em>=2 and <em>Min Label Score</em> &lt;= 0.5 (e.g. 0.5)</li>
 </ol>
 <p>For Labels consisting of one or two words both options have the same result, but for longer labels (1) is more restrictive than (2). The important thing is that both options ensure that Labels with more than one token will not be considered if only a single token matches the text.</p>
-<p>If used in combination with an disambiguation Engine one might want to consider to suggest Entities where only a single token of multi-token labels do match. In such cases a configuration like <em>Min Matched Tokens</em>=1 and <em>Min Label Match Score</em> &lt;= 0.5 (e.g. 0.4) might be considered. With such scenarios users will also want to considerable increase the value for <em>Max Suggestions</em> (typically values &gt; 10).</p>
+<p>If used in combination with a disambiguation Engine, one might want to suggest Entities where only a single token of a multi-token label matches. In such cases a configuration like <em>Min Matched Tokens</em>=1 and <em>Min Label Score</em> &lt;= 0.5 (e.g. 0.4) might be considered. In such scenarios users will also want to considerably increase the value of <em>Max Suggestions</em> (typically values &gt; 10).</p>
 </li>
 <li>
-<p><strong>Min Text Match Score</strong> <em>org.apache.stanbol.enhancer.engines.keywordextraction.minTextMatchFactor</em> [0..1]::double: The "Text Score" [0..1] represents how well the Label of an Entity matches to the selected Span in the Text. It compares the number of matched {@link Token} from the label with the number of Tokens enclosed by the Span in the Text an Entity is suggested for. Not exact matches for Tokens, or if the Tokens within the label do appear in an other order than in the text do also reduce this score. Entities are only considered if at least one of their labels cores higher than the minimum for all tree of <em>Min Label Match Score</em>, <em>Min Text Match Score</em> and <em>Min Match Score</em>.</p>
+<p><strong>Min Text Score</strong> <em>(enhancer.engines.linking.minTextScore)</em> [0..1]::double: The "Text Score" [0..1] represents how well the Label of an Entity matches the selected Span in the Text. It compares the number of matched Tokens from the label with the number of Tokens enclosed by the Span in the Text that an Entity is suggested for. Inexact Token matches, or Tokens that appear in the label in a different order than in the text, also reduce this score. Entities are only considered if at least one of their labels scores higher than the minimum for all three of <em>Min Label Score</em>, <em>Min Text Score</em> and <em>Min Match Score</em>.</p>
 </li>
-<li><strong>Min Match Score</strong> <em>org.apache.stanbol.enhancer.engines.keywordextraction.minTextMatchFactor</em> [0..1]::double: Defined as the product of the "Text Score" with the "Label Score" - meaning that this value represents both how well the label matches the text and how much of the label is matched with the text. Entities are only considered if at least one of their labels cores higher than the minimum for all tree of <em>Min Label Match Score</em>, <em>Min Text Match Score</em> and <em>Min Match Score</em>. </li>
+<li><strong>Min Match Score</strong> <em>(enhancer.engines.linking.minMatchScore)</em> [0..1]::double: Defined as the product of the "Text Score" and the "Label Score", meaning that this value represents both how well the label matches the text and how much of the label is matched with the text. Entities are only considered if at least one of their labels scores higher than the minimum for all three of <em>Min Label Score</em>, <em>Min Text Score</em> and <em>Min Match Score</em>.</li>
 </ul>
 <h4 id="type-mappings-syntax">Type Mappings Syntax</h4>
 <p>The Type Mappings are used to determine the "dc:type" of the <a href="../enhancementstructure.html#fisetextannotation">TextAnnotation</a> based on the types of the suggested Entity. The field "Type Mappings" (property: <em>org.apache.stanbol.enhancer.engines.keywordextraction.typeMappings</em>) can be used to customize such mappings.</p>
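
Note on the matching parameters documented in the hunk above: the interplay of Min Matched Tokens, Min Label Score, Min Text Score and Min Match Score can be sketched roughly as below. This is a simplified illustration, not the engine's actual code. The class and method names are hypothetical, every token is treated as matchable, the penalties for inexact matches and token order are ignored, and the threshold comparison is assumed to be inclusive.

```java
/**
 * Sketch only: simplified threshold check for one label of an Entity.
 * MatchScoreSketch and its methods are hypothetical names, not Stanbol API.
 */
public final class MatchScoreSketch {
    private final double minLabelScore;
    private final double minTextScore;
    private final double minMatchScore;
    private final int minFoundTokens;

    public MatchScoreSketch(double minLabelScore, double minTextScore,
                            double minMatchScore, int minFoundTokens) {
        this.minLabelScore = minLabelScore;
        this.minTextScore = minTextScore;
        this.minMatchScore = minMatchScore;
        this.minFoundTokens = minFoundTokens;
    }

    /** Share of the label's tokens that were matched in the text. */
    public static double labelScore(int matched, int labelTokens) {
        return (double) matched / labelTokens;
    }

    /** Share of the tokens in the selected text span covered by the match. */
    public static double textScore(int matched, int spanTokens) {
        return (double) matched / spanTokens;
    }

    /** True if the label passes the token-count check and all three score minimums. */
    public boolean accept(int matched, int labelTokens, int spanTokens) {
        if (matched <= 0 || labelTokens <= 0 || spanTokens <= 0) {
            return false;
        }
        // Min Matched Tokens is ignored for full matches of the label.
        boolean fullMatch = matched == labelTokens;
        if (!fullMatch && matched < minFoundTokens) {
            return false;
        }
        double label = labelScore(matched, labelTokens);
        double text = textScore(matched, spanTokens);
        double match = label * text; // Match Score = Text Score * Label Score
        // Assumption: thresholds are inclusive lower limits.
        return label >= minLabelScore && text >= minTextScore && match >= minMatchScore;
    }
}
```

Under these assumptions, with the text and match minimums set to 0, both typical settings (1 token with 0.75, and 2 tokens with 0.5) reject "Obama" as a match for the two-token label "Barack Obama". A three-token label with two matched tokens is rejected by the first setting and accepted by the second. The disambiguation-oriented setting (1 token with 0.4) accepts the single-token match.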