Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2014/01/09 14:40:57 UTC

[jira] [Created] (STANBOL-1252) Add support for MIN_FOUND_TOKENS to the Lucene FST Linking Engine

Rupert Westenthaler created STANBOL-1252:
--------------------------------------------

Summary: Add support for MIN_FOUND_TOKENS to the Lucene FST Linking Engine
Key: STANBOL-1252
URL: https://issues.apache.org/jira/browse/STANBOL-1252
Project: Stanbol
Issue Type: Improvement
Affects Versions: 0.12.0
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler

The FST linking engine already allows configuring, as a percentage, how much of a processable chunk (typically a noun phrase) needs to match for a suggestion to be accepted. This is done using the "enhancer.engines.linking.minChunkMatchScore" property. By default, more than 50% of the chunk must match.

While this kind of configuration works well for chunks created from NamedEntityAnnotations, it is not always well suited to detected noun phrases, as those may span larger sections of a sentence. E.g. "goalie Mathias Lange (Iserlohn Roosters)" will not match any Entity in a vocabulary: it contains 5 matchable tokens, but the player "Mathias Lange" and the team name "Iserlohn Roosters" each cover only two of them (40%, which is below the default threshold).
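
The arithmetic behind this example can be sketched as follows. This is only an illustration, not Stanbol code: the class and method names are hypothetical, and it assumes the chunk match score is simply the number of matched tokens divided by the number of matchable tokens in the chunk.

```java
// Hypothetical illustration; not part of the Stanbol code base.
final class ChunkMatchScoreExample {

    // Assumed definition: fraction of the chunk's matchable tokens covered by a match.
    static double chunkMatchScore(int matchedTokens, int matchableTokensInChunk) {
        if (matchableTokensInChunk <= 0) {
            throw new IllegalArgumentException("chunk must contain at least one matchable token");
        }
        return (double) matchedTokens / matchableTokensInChunk;
    }

    public static void main(String[] args) {
        // "goalie Mathias Lange (Iserlohn Roosters)" has 5 matchable tokens;
        // "Mathias Lange" and "Iserlohn Roosters" each cover 2 of them.
        double score = chunkMatchScore(2, 5);
        System.out.println(score);       // prints 0.4
        System.out.println(score > 0.5); // prints false: rejected by the > 50% default
    }
}
```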

In such cases, configuring a fixed lower limit on the number of (matchable) tokens that need to match within a chunk can be preferable.

For this configuration the FST linking engine will use the "Min Matched Tokens (enhancer.engines.linking.minFoundTokens)" property of the EntityLinker configuration.

The FST linking engine will accept suggestions that satisfy either "enhancer.engines.linking.minChunkMatchScore" or "enhancer.engines.linking.minFoundTokens".
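
A minimal sketch of this either/or rule, under stated assumptions: the class and method names are hypothetical and not taken from Stanbol, the score is assumed to be matched tokens divided by matchable tokens, the score threshold is treated as exclusive (matching the "> 50%" default described above), and the token limit is assumed to be inclusive.

```java
// Hypothetical illustration of the proposed acceptance rule; not Stanbol code.
final class SuggestionAcceptanceSketch {

    /**
     * @param matchedTokens          matchable tokens of the chunk covered by the suggestion
     * @param matchableTokensInChunk all matchable tokens in the chunk
     * @param minChunkMatchScore     value of enhancer.engines.linking.minChunkMatchScore (0..1)
     * @param minFoundTokens         value of enhancer.engines.linking.minFoundTokens
     */
    static boolean accept(int matchedTokens, int matchableTokensInChunk,
                          double minChunkMatchScore, int minFoundTokens) {
        double score = matchableTokensInChunk > 0
                ? (double) matchedTokens / matchableTokensInChunk
                : 0.0;
        boolean passesScore = score > minChunkMatchScore;      // assumed exclusive bound
        boolean passesCount = matchedTokens >= minFoundTokens; // assumed inclusive bound
        // A suggestion is accepted if either condition holds.
        return passesScore || passesCount;
    }

    public static void main(String[] args) {
        // "Mathias Lange" within the 5-token chunk: fails the score, passes with a limit of 2.
        System.out.println(accept(2, 5, 0.5, 2)); // prints true
        // A single-token match in the same chunk fails both conditions.
        System.out.println(accept(1, 5, 0.5, 2)); // prints false
    }
}
```

With a token limit of 2 (chosen here only for illustration), both "Mathias Lange" and "Iserlohn Roosters" would be accepted even though each covers only 40% of the chunk.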

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)