Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2013/06/20 09:09:24 UTC

[jira] [Created] (STANBOL-1116) Filter Literals of suggested Entities based on Languages used for Lookups

Rupert Westenthaler created STANBOL-1116:
--------------------------------------------

             Summary: Filter Literals of suggested Entities based on Languages used for Lookups
                 Key: STANBOL-1116
                 URL: https://issues.apache.org/jira/browse/STANBOL-1116
             Project: Stanbol
          Issue Type: Sub-task
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


EntityLinking uses two languages to look up Entities:

(1) the language of the current document (as detected by language detection)
(2) the default mapping language (default: null, i.e. labels without a language tag)
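A minimal sketch of how these two lookup languages could be collected. It assumes the detected language is a plain language code (or None when detection found nothing) and represents "no language tag" as None. The helper name `lookup_languages` is hypothetical and not part of the Stanbol API.

```python
from typing import List, Optional


def lookup_languages(detected: Optional[str],
                     default_language: Optional[str] = None) -> List[Optional[str]]:
    """Return the languages used to look up entities (hypothetical helper).

    (1) the language of the current document, if language detection found one
    (2) the default mapping language, where None stands for labels without
        a language tag
    """
    def norm(lang: Optional[str]) -> Optional[str]:
        # Language tags are compared case-insensitively
        return lang.lower() if lang is not None else None

    languages: List[Optional[str]] = []
    if detected is not None:
        languages.append(norm(detected))
    default = norm(default_language)
    # Do not query the same language twice
    if default not in languages:
        languages.append(default)
    return languages


print(lookup_languages("de"))        # ['de', None]
print(lookup_languages("en", "EN"))  # ['en']
```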

In multilingual vocabularies (e.g. DBpedia or Freebase), entities may define literal values in many languages (in Freebase, some entities may have labels in more than 100 languages).

Currently, the EntityLinkingEngine includes labels in all languages in the Enhancement Results. This has two disadvantages:

(1) All values need to be provided by the EntitySearcher. This may require converting all of those values to Clerezza RDF (as in the case of the Solr-based EntitySearcher).

(2) If dereferencing is activated, many additional literals (and therefore triples) are added to the Enhancement Results. This has a negative impact on both performance and the size of the Enhancement Results.

This issue will adapt the EntitySearcher interface to allow specifying

* selected fields
* selected languages

with every request. The languages used for the query will always be added to the passed selected languages, and the label field, type field and redirect field will always be included in the selected fields, as this information is required by the linking process itself.
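The two inclusion rules can be sketched as a small helper that computes the effective selection a caller would pass along with a request. This is an illustration under assumptions, not the actual interface change: the function `effective_selection`, its parameters, and the prefixed field names in the usage example are hypothetical (in Stanbol the label, type and redirect fields come from the EntityLinking configuration). None in a language set stands for literals without a language tag.

```python
from typing import FrozenSet, Iterable, Optional, Tuple


def effective_selection(
        selected_fields: Iterable[str],
        selected_languages: Iterable[Optional[str]],
        query_languages: Iterable[Optional[str]],
        label_field: str,
        type_field: str,
        redirect_field: str,
) -> Tuple[FrozenSet[str], FrozenSet[Optional[str]]]:
    """Compute the fields and languages an EntitySearcher request should carry.

    Hypothetical helper: None in a language set means "literals without a
    language tag".
    """
    # The languages used for the query are always part of the selection
    languages = frozenset(selected_languages) | frozenset(query_languages)
    # The linking process itself always needs the label, type and redirect fields
    fields = frozenset(selected_fields) | {label_field, type_field, redirect_field}
    return fields, languages


# Example with placeholder field names
fields, languages = effective_selection(
    selected_fields=["foaf:depiction"],
    selected_languages=[],
    query_languages=["de", None],
    label_field="rdfs:label",
    type_field="rdf:type",
    redirect_field="rdfs:seeAlso",
)
print(sorted(fields))
print(languages)
```

Returning frozen sets keeps the result order-independent and prevents a searcher from accidentally modifying the caller's selection.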

EntitySearcher implementations may ignore these settings and return all values for the returned entities instead.
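Since a searcher may return all values anyway, one option (not stated in this issue) is for the caller to apply the same selection to the returned entities. A minimal sketch of such a filter, assuming an entity is a dict from field name to values, text literals carry an optional language tag, and other values (e.g. URI references) are kept unchanged. `Text`, `filter_entity` and the example data are hypothetical and do not reflect Stanbol's Representation API. Tags are matched exactly and case-insensitively, so "en" does not match "en-GB".

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Union


@dataclass(frozen=True)
class Text:
    """A literal value; language None means 'no language tag'."""
    value: str
    language: Optional[str] = None


# A field value is either a text literal or a reference (URI string)
Value = Union[Text, str]


def _norm(lang: Optional[str]) -> Optional[str]:
    # Language tags are compared case-insensitively
    return lang.lower() if lang is not None else None


def filter_entity(values: Dict[str, List[Value]],
                  selected_fields: Set[str],
                  selected_languages: Iterable[Optional[str]]) -> Dict[str, List[Value]]:
    """Drop unselected fields and literals in unselected languages."""
    langs = {_norm(lang) for lang in selected_languages}
    result: Dict[str, List[Value]] = {}
    for field, field_values in values.items():
        if field not in selected_fields:
            continue  # field was not requested
        # References (e.g. rdf:type values) carry no language and are kept
        kept = [v for v in field_values
                if not isinstance(v, Text) or _norm(v.language) in langs]
        if kept:
            result[field] = kept
    return result


paris = {
    "rdfs:label": [Text("Paris", "en"), Text("Parigi", "it"), Text("Paris")],
    "rdf:type": ["dbpedia-owl:City"],
    "dbpedia-owl:abstract": [Text("Paris is the capital of France.", "en")],
}
print(filter_entity(paris, {"rdfs:label", "rdf:type"}, ["en", None]))
```

Fields that end up with no values are omitted entirely, so an entity with labels only in unselected languages simply loses that field rather than carrying an empty list.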


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira