Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2013/06/13 14:43:21 UTC

[jira] [Resolved] (STANBOL-1110) Use Term Proximity for Searching Entities in the EntityhubLinkingEngine

     [ https://issues.apache.org/jira/browse/STANBOL-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-1110.
------------------------------------------

    Resolution: Fixed

Starting with http://svn.apache.org/r1492611, the EntityhubLinkingEngine uses Term Proximity for searches.
                
> Use Term Proximity for Searching Entities in the EntityhubLinkingEngine
> -----------------------------------------------------------------------
>
>                 Key: STANBOL-1110
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1110
>             Project: Stanbol
>          Issue Type: Improvement
>          Components: Enhancement Engines
>    Affects Versions: enhancement-engines-0.10.0
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> The issue with the ranking of the results of the EntityLinkingEngine is that some Entities have matching labels in both the language of the text and the fallback language, while others have them in only one of the two. As background:
> The EntityLinkingEngine performs queries like
>     {lang1}:"{term1}" OR {lang1}:"{term2}" OR {lang2}:"{term1}" OR {lang2}:"{term2}"
> when linking Entities, where {lang1} is the language detected for the document and {lang2} is the default mapping language.
> When such queries are executed on the Entityhub-based EntitySearcher implementations of the EntityhubLinkingEngine, the results are ranked so that Entities matching only one of the parsed terms come before some that match both terms.
> The reason is that there are two ways in which two of the four query terms can match:
>  (a) both {term1} and {term2} do match in the same language
>  (b) a single term matches in {lang1} and {lang2}
> While (a) is the match users expect, (b) is not that unlikely, especially if the entity matched by (a) is not very famous and lacks translations of its labels into many languages, while {term1} and/or {term2} is present in more famous entities that do have such translations. Most often this happens with given names of persons.
> As the EntityLinkingEngine processes, for performance reasons, only the first few results (by default 2*maxSuggestions, but at least 10), this unintended ranking causes some Entities not to be linked.
> The new Proximity Ranking Feature (STANBOL-1105) can be used to solve this, as it ensures that Entities matching both terms in the same language (and therefore in the same label) will be ranked above those that match only a single term in two different languages.
> This issue will enable the use of this feature for the EntityhubLinkingEngine.
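
As a rough illustration of the description above, the following Python sketch builds the OR query, computes the candidate window, and compares two toy scoring functions. It is not the Stanbol or Solr implementation: every function name, the sample entities, and both scoring rules are assumptions made up for this example. Counting matched clauses stands in for the original ranking, and counting terms matched within a single label stands in for what proximity ranking rewards.

```python
def build_query(terms, languages):
    """Build the OR query from the description: one clause per (language, term)."""
    return " OR ".join(f'{lang}:"{term}"' for lang in languages for term in terms)


def candidate_limit(max_suggestions):
    """Results inspected per query: 2*maxSuggestions, but at least 10."""
    return max(2 * max_suggestions, 10)


def _tokens(labels, lang):
    # Hypothetical tokenizer: lower-case the label and split on whitespace.
    return labels.get(lang, "").lower().split()


def clause_count(labels, terms, languages):
    """Toy stand-in for the old ranking: number of OR clauses that match."""
    return sum(1 for lang in languages for term in terms
               if term in _tokens(labels, lang))


def same_label_count(labels, terms, languages):
    """Toy stand-in for proximity ranking: most terms matched in ONE label."""
    return max(sum(1 for term in terms if term in _tokens(labels, lang))
               for lang in languages)


# Hypothetical data: the document language is "de", the default language "en".
languages = ["de", "en"]
terms = ["john", "smith"]
entities = {
    # case (b): one term matches in both languages (well-translated entity)
    "famous": {"de": "John Lennon", "en": "John Lennon"},
    # case (a): both terms match in a single language (sparse translations)
    "obscure": {"en": "John Smith"},
}

by_clauses = {name: clause_count(labels, terms, languages)
              for name, labels in entities.items()}
by_label = {name: same_label_count(labels, terms, languages)
            for name, labels in entities.items()}
ranked = sorted(entities, key=lambda name: by_label[name], reverse=True)
```

In this toy data both entities match two clauses, so clause counting cannot tell them apart and any popularity-based tie-break could push the intended entity out of the candidate window. Scoring by terms matched within one label ranks the case (a) entity first.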

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira