Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2014/01/22 10:43:19 UTC

[jira] [Resolved] (STANBOL-1252) Add support for MIN_FOUND_TOKENS to the Lucene FST Linking Engine

     [ https://issues.apache.org/jira/browse/STANBOL-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-1252.
------------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.12.0

Implemented with http://svn.apache.org/r1557037 in trunk and merged back to 0.12 with http://svn.apache.org/r1557044

> Add support for MIN_FOUND_TOKENS to the Lucene FST Linking Engine
> -----------------------------------------------------------------
>
>                 Key: STANBOL-1252
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1252
>             Project: Stanbol
>          Issue Type: Improvement
>    Affects Versions: 0.12.0
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>             Fix For: 0.12.0
>
>
> The FST linking engine already allows configuring, as a percentage, how much of a processable chunk (typically a noun phrase) needs to match for a suggestion to be accepted. This is done with the "enhancer.engines.linking.minChunkMatchScore" property. The default is > 50%.
> While this kind of configuration works well for chunks created from NamedEntityAnnotations, it is not always well suited to detected noun phrases, as those may span larger sections of a sentence. E.g. "goalie Mathias Lange (Iserlohn Roosters)" will not match any Entity in a vocabulary: it contains 5 matchable tokens, but the player "Mathias Lange" and the team name "Iserlohn Roosters" each cover only two of them.
> In such cases, configuring a fixed lower limit on the number of (matchable) Tokens that need to match within a Chunk can be preferable.
> For this configuration the FST linking engine will use the "Min Matched Tokens (enhancer.engines.linking.minFoundTokens)" property of the EntityLinker configuration. The default will be "2".
> The FST linking engine will accept suggestions that satisfy either "enhancer.engines.linking.minChunkMatchScore" or "enhancer.engines.linking.minFoundTokens".
> NOTE: these configurations apply only to Tokens within a processable Chunk (typically a noun phrase).
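
The acceptance rule described above can be illustrated with a minimal sketch. This is not the Stanbol implementation: the class name ChunkMatchSketch, the method accepts and its parameters are hypothetical, and the comparison operators (strictly greater than for the score, at least for the token count) are assumptions read off the "> 50%" and "2" defaults quoted in the issue.

```java
// Hypothetical sketch of the rule described in STANBOL-1252; not Stanbol code.
final class ChunkMatchSketch {

    private ChunkMatchSketch() {}

    /**
     * Returns true if a suggestion inside a processable chunk would be accepted.
     *
     * @param matchedTokens       matchable tokens of the chunk covered by the suggestion
     * @param matchableTokens     all matchable tokens in the chunk
     * @param minChunkMatchScore  fraction of the chunk that must match (assumed exclusive bound)
     * @param minFoundTokens      fixed lower limit of matched tokens (assumed inclusive bound)
     */
    static boolean accepts(int matchedTokens, int matchableTokens,
                           double minChunkMatchScore, int minFoundTokens) {
        if (matchableTokens <= 0 || matchedTokens <= 0) {
            return false; // nothing to match against, or nothing matched
        }
        double score = (double) matchedTokens / matchableTokens;
        // Either condition is sufficient, as stated in the issue description.
        return score > minChunkMatchScore || matchedTokens >= minFoundTokens;
    }

    public static void main(String[] args) {
        // "goalie Mathias Lange (Iserlohn Roosters)": 5 matchable tokens,
        // "Mathias Lange" covers 2 of them (score 0.4).
        System.out.println(accepts(2, 5, 0.5, 2));
        // Same chunk with the token-count rule effectively disabled.
        System.out.println(accepts(2, 5, 0.5, Integer.MAX_VALUE));
    }
}
```

With the values from the example in the issue, "Mathias Lange" fails the percentage rule (2 of 5 tokens) and is accepted only through the fixed token limit, which is the case this improvement targets.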



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)