Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2014/07/10 10:22:09 UTC
[jira] [Updated] (STANBOL-1362) FST linking engine should use the matchable span to calculate dominant tag
[ https://issues.apache.org/jira/browse/STANBOL-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rupert Westenthaler updated STANBOL-1362:
-----------------------------------------
Description:
The FST linking engine uses the TagClusterReducer#LONGEST_DOMINANT_RIGHT to select the dominant Tag in an overlapping cluster of Tag suggestions.
While this algorithm is fine, the spans used as its input are not ideal, because non-matchable tokens are also considered. Especially when linking against DBpedia, this sometimes produces unexpected results, as several DBpedia entities have labels that include words such as prepositions or postpositions. Because of that, a text mentioning "in <location>" could get linked to an entity with this name (e.g. a book or a music album) without the <location> itself being suggested. This happens because the matching span for "in <location>" is the longest one and therefore dominant under LONGEST_DOMINANT_RIGHT, so the overlapping match for <location> is removed.
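The reduction described above can be illustrated with a minimal Java sketch. It is not the code used by Stanbol or by the TagClusterReducer implementation: the class names `Tag` and `LongestDominantRight`, the character-offset spans and the tie-breaking rule (the right-most of equally long tags wins) are assumptions made for illustration, and "Paris" simply stands in for <location>. Applied to the whole matching spans, the longer "in Paris" dominates and the overlapping "Paris" match is dropped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a "longest dominant right" reduction applied to whole matching spans. */
final class LongestDominantRight {

    /** Returns the surviving tags; every overlap is resolved in favour of the longer tag. */
    static List<Tag> reduce(List<Tag> cluster) {
        List<Tag> remaining = new ArrayList<>(cluster);
        // Stable sort by start offset, so that among equally long tags the right-most comes last.
        remaining.sort(Comparator.comparingInt(t -> t.start));
        List<Tag> kept = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Find the longest tag; ">=" lets a later (further right) tag win ties.
            Tag longest = remaining.get(0);
            for (Tag t : remaining) {
                if (t.length() >= longest.length()) {
                    longest = t;
                }
            }
            final Tag dominant = longest;
            kept.add(dominant);
            // Drop the dominant tag and every remaining tag that overlaps it.
            remaining.removeIf(t -> t == dominant || t.overlaps(dominant));
        }
        return kept;
    }

    public static void main(String[] args) {
        // Text "in Paris": the label "in Paris" covers [0,8), the label "Paris" covers [3,8).
        List<Tag> cluster = List.of(new Tag("in Paris", 0, 8), new Tag("Paris", 3, 8));
        System.out.println(reduce(cluster)); // prints [in Paris[0,8)]
    }
}

/** Hypothetical tag: a label suggestion covering the characters [start, end) of the text. */
final class Tag {
    final String label;
    final int start;
    final int end;

    Tag(String label, int start, int end) {
        this.label = label;
        this.start = start;
        this.end = end;
    }

    int length() {
        return end - start;
    }

    boolean overlaps(Tag other) {
        return start < other.end && other.start < end;
    }

    @Override
    public String toString() {
        return label + "[" + start + "," + end + ")";
    }
}
```

Because "in" is part of the whole span, the entity labelled "in Paris" always outranks "Paris" here, which is exactly the behaviour the issue reports.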
To fix this issue, a LONGEST_DOMINANT_RIGHT variant needs to be implemented that only considers the span of the enclosed matchable tokens instead of the whole matching span. With such a variant, only <location> is used as the matchable span of "in <location>", so both the <location> and the other entity matching "in <location>" are suggested.
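One possible shape for the proposed variant, again as a hypothetical Java sketch rather than the actual fix: each `MatchTag` carries both its whole matching span and the span of its matchable tokens, and dominance is decided on the matchable span only. The rule that all tags sharing exactly the dominant matchable span are kept is an assumption made here so that "in <location>" and "<location>" are both suggested, as the issue requires. Detecting which tokens are matchable is out of scope, so the offsets are supplied by hand.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of a LONGEST_DOMINANT_RIGHT variant that compares tags by the span
 * of their enclosed matchable tokens instead of their whole matching span.
 */
final class MatchableLongestDominantRight {

    static List<MatchTag> reduce(List<MatchTag> cluster) {
        List<MatchTag> remaining = new ArrayList<>(cluster);
        // Stable sort by matchable start offset, so ties on length are won by the right-most tag.
        remaining.sort(Comparator.comparingInt(t -> t.matchStart));
        List<MatchTag> kept = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Find the tag with the longest matchable span.
            MatchTag longest = remaining.get(0);
            for (MatchTag t : remaining) {
                if (t.matchLength() >= longest.matchLength()) {
                    longest = t;
                }
            }
            final MatchTag dominant = longest;
            // Assumption: tags sharing exactly the dominant matchable span are all kept,
            // e.g. "in <location>" and "<location>" when "in" is not matchable.
            for (MatchTag t : remaining) {
                if (t.sameMatchableSpan(dominant)) {
                    kept.add(t);
                }
            }
            // Drop those tags plus every tag whose matchable span overlaps the dominant one.
            remaining.removeIf(t -> t.sameMatchableSpan(dominant) || t.matchableOverlaps(dominant));
        }
        return kept;
    }

    public static void main(String[] args) {
        // Text "in New York": "in" is assumed not to be a matchable token.
        List<MatchTag> cluster = List.of(
                new MatchTag("in New York", 0, 11, 3, 11),
                new MatchTag("New York", 3, 11, 3, 11),
                new MatchTag("York", 7, 11, 7, 11));
        System.out.println(reduce(cluster)); // prints [in New York, New York]
    }
}

/** Hypothetical tag with its whole matching span and the span of its matchable tokens. */
final class MatchTag {
    final String label;
    final int start, end;           // whole matching span (not used by the reducer)
    final int matchStart, matchEnd; // span of the enclosed matchable tokens only

    MatchTag(String label, int start, int end, int matchStart, int matchEnd) {
        this.label = label;
        this.start = start;
        this.end = end;
        this.matchStart = matchStart;
        this.matchEnd = matchEnd;
    }

    int matchLength() {
        return matchEnd - matchStart;
    }

    boolean sameMatchableSpan(MatchTag other) {
        return matchStart == other.matchStart && matchEnd == other.matchEnd;
    }

    boolean matchableOverlaps(MatchTag other) {
        return matchStart < other.matchEnd && other.matchStart < matchEnd;
    }

    @Override
    public String toString() {
        return label;
    }
}
```

In this example "York" is still removed, because its matchable span is shorter than and overlaps that of "New York"; only the non-matchable "in" no longer lets "in New York" suppress "New York".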
> FST linking engine should use the matchable span to calculate dominant tag
> ---------------------------------------------------------------------------
>
> Key: STANBOL-1362
> URL: https://issues.apache.org/jira/browse/STANBOL-1362
> Project: Stanbol
> Issue Type: Improvement
> Components: Enhancement Engines
> Affects Versions: 0.12.0
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
> Priority: Minor
> Fix For: 1.0.0, 0.12.1
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)