Posted to commits@stanbol.apache.org by rw...@apache.org on 2012/12/18 19:48:55 UTC

svn commit: r1423575 - /stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext

Author: rwesten
Date: Tue Dec 18 18:48:55 2012
New Revision: 1423575

URL: http://svn.apache.org/viewvc?rev=1423575&view=rev
Log:
Corrected some outdated configuration keys of the EntityLinkingEngine

Modified:
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext?rev=1423575&r1=1423574&r2=1423575&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext Tue Dec 18 18:48:55 2012
@@ -69,7 +69,7 @@ For results of those queries the labels 
 
 The matching process distinguishes between matchable and non-matchable Tokens as well as non-alpha-numeric Tokens that are completely ignored. Matching starts at the position of the _Linkable Token_ for which the search in the configured vocabulary was issued. From this position Tokens in the Label are matched with Tokens in the text until the first matchable or 2nd non-matchable token is not found. In a second round the same is done in the backward direction. The configured _Min Token Match Factor_ determines how exactly tokens in the text must correspond to tokens in the label so that a match is considered. This is repeated for all labels of an Entity. The label match that covers the most tokens is then considered as the match for that Entity.
 
-There are various parameters that can be used to fine-tune the matching process. But the most important decision is whether one wants to include suggestions where labels with two tokens match only a single _Matchable Token_ in the Text (e.g. "Barack Obama" matching "Obama" but also 1000+ "Tom {something}" matching "Tom"). The default configuration of the Engine excludes those, but depending on the use case and the linked vocabulary users might want to change this. See the documentation of the _Min Matched Tokens_ and _Min Label Match Score_ for details and examples.
+There are various parameters that can be used to fine-tune the matching process. But the most important decision is whether one wants to include suggestions where labels with two tokens match only a single _Matchable Token_ in the Text (e.g. "Barack Obama" matching "Obama" but also 1000+ "Tom {something}" matching "Tom"). The default configuration of the Engine excludes those, but depending on the use case and the linked vocabulary users might want to change this. See the documentation of the _Min Matched Tokens_ and _Min Label Score_ for details and examples.
 
 
 ### Writing Enhancement Results
@@ -202,20 +202,20 @@ The following properties define how Link
 The parameters below are used to configure the matching process.
 
 * __Minimum Token Match Score__ _(enhancer.engines.linking.minTokenScore)_: This defines how well single tokens of the text need to match single tokens in the label so that they are considered as matching. This parameter configures the lower limit. However, the actual token match score also influences the overall matching scores for labels with the text, so non-exact matches will decrease matching scores for the whole label with the text.
-* __Min Label Match Score__ _org.apache.stanbol.enhancer.engines.keywordextraction.minLabelMatchFactor_ [0..1]::double: The "Label Score" [0..1] represents how much of the Label of an Entity matches with the Text. It compares the number of Tokens of the Label with the number of Tokens matched to the Text. Non-exact Token matches, or Tokens that appear in a different order in the label than in the text, also reduce this score. Entities are only considered if at least one of their labels scores higher than the minimum for all three of _Min Label Match Score_, _Min Text Match Score_ and _Min Match Score_.
-* __Min Matched Tokens__ _org.apache.stanbol.enhancer.engines.keywordextraction.minFoundTokens_ [1..*]::int: The minimum number of matching tokens. Only "matchable" tokens are counted. For full matches (where all tokens of the Label do match tokens in the text) this parameter is ignored.
+* __Min Label Score__ _(enhancer.engines.linking.minLabelScore)_ [0..1]::double: The "Label Score" [0..1] represents how much of the Label of an Entity matches with the Text. It compares the number of Tokens of the Label with the number of Tokens matched to the Text. Non-exact Token matches, or Tokens that appear in a different order in the label than in the text, also reduce this score. Entities are only considered if at least one of their labels scores higher than the minimum for all three of _Min Label Score_, _Min Text Score_ and _Min Match Score_.
+* __Min Matched Tokens__ _(enhancer.engines.linking.minFoundTokens)_ [1..*]::int: The minimum number of matching tokens. Only "matchable" tokens are counted. For full matches (where all tokens of the Label do match tokens in the text) this parameter is ignored.
 
-    This parameter is closely related to the _Min Label Match Score_. Typical settings are:
+    This parameter is closely related to the _Min Label Score_. Typical settings are:
 
-    1. _Min Matched Tokens_=1 and _Min Label Match Score_ > 0.5 (e.g. 0.75)
-    2. _Min Matched Tokens_=2 and _Min Label Match Score_ <= 0.5 (e.g. 0.5)
+    1. _Min Matched Tokens_=1 and _Min Label Score_ > 0.5 (e.g. 0.75)
+    2. _Min Matched Tokens_=2 and _Min Label Score_ <= 0.5 (e.g. 0.5)
 
     For Labels consisting of one or two words both options have the same result, but for longer labels (1) is more restrictive than (2). The important thing is that both options ensure that Labels with more than one token will not be considered if only a single token matches the text.
 
-    If used in combination with a disambiguation Engine one might want to consider suggesting Entities where only a single token of a multi-token label matches. In such cases a configuration like _Min Matched Tokens_=1 and _Min Label Match Score_ <= 0.5 (e.g. 0.4) might be considered. In such scenarios users will also want to considerably increase the value for _Max Suggestions_ (typically values > 10).
+    If used in combination with a disambiguation Engine one might want to consider suggesting Entities where only a single token of a multi-token label matches. In such cases a configuration like _Min Matched Tokens_=1 and _Min Label Score_ <= 0.5 (e.g. 0.4) might be considered. In such scenarios users will also want to considerably increase the value for _Max Suggestions_ (typically values > 10).
 
-* __Min Text Match Score__ _org.apache.stanbol.enhancer.engines.keywordextraction.minTextMatchFactor_ [0..1]::double: The "Text Score" [0..1] represents how well the Label of an Entity matches the selected Span in the Text. It compares the number of matched Tokens from the label with the number of Tokens enclosed by the Span in the Text an Entity is suggested for. Non-exact Token matches, or Tokens that appear in a different order in the label than in the text, also reduce this score. Entities are only considered if at least one of their labels scores higher than the minimum for all three of _Min Label Match Score_, _Min Text Match Score_ and _Min Match Score_.
-* __Min Match Score__ _org.apache.stanbol.enhancer.engines.keywordextraction.minTextMatchFactor_ [0..1]::double: Defined as the product of the "Text Score" with the "Label Score", meaning that this value represents both how well the label matches the text and how much of the label is matched with the text. Entities are only considered if at least one of their labels scores higher than the minimum for all three of _Min Label Match Score_, _Min Text Match Score_ and _Min Match Score_.
+* __Min Text Score__ _(enhancer.engines.linking.minTextScore)_ [0..1]::double: The "Text Score" [0..1] represents how well the Label of an Entity matches the selected Span in the Text. It compares the number of matched Tokens from the label with the number of Tokens enclosed by the Span in the Text an Entity is suggested for. Non-exact Token matches, or Tokens that appear in a different order in the label than in the text, also reduce this score. Entities are only considered if at least one of their labels scores higher than the minimum for all three of _Min Label Score_, _Min Text Score_ and _Min Match Score_.
+* __Min Match Score__ _(enhancer.engines.linking.minMatchScore)_ [0..1]::double: Defined as the product of the "Text Score" with the "Label Score", meaning that this value represents both how well the label matches the text and how much of the label is matched with the text. Entities are only considered if at least one of their labels scores higher than the minimum for all three of _Min Label Score_, _Min Text Score_ and _Min Match Score_.
 
 #### Type Mappings Syntax