Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2013/12/04 08:16:35 UTC

[jira] [Created] (STANBOL-1230) Add Lookup Cache to EntityLinking Engine

Rupert Westenthaler created STANBOL-1230:
--------------------------------------------

             Summary: Add Lookup Cache to EntityLinking Engine
                 Key: STANBOL-1230
                 URL: https://issues.apache.org/jira/browse/STANBOL-1230
             Project: Stanbol
          Issue Type: Improvement
          Components: Enhancement Engines
    Affects Versions: 0.12.0
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler
             Fix For: 0.12.0


The EntityLinkingEngine should cache results of lookups on the EntitySearchers.

Entities often reoccur within an analyzed document. Caching the results for tokens that were already looked up should therefore provide a considerable performance improvement, as statistics show that ~90% of the processing time of the EntityLinking engine is spent on entity look-ups.

So if 20% of all Entity mentions refer to reoccurring Entities, the processing time should be reduced by about 18% (0.9 * 0.2 = 0.18, assuming a cache hit costs next to nothing).

The cache will use the list of search strings as key and the list of returned Entities as value. The cache will only hold look-up results for the currently analyzed document.

EntityLinking statistics will be updated to include the cache hit percentage.
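
The cache and the hit percentage described above could be sketched roughly as follows. This is only an illustration under assumptions, not the actual Stanbol implementation: the class name DocumentLookupCache, the generic type E (standing in for the Entity type), and the searcher function (standing in for a call to an EntitySearcher) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-document lookup cache (not an actual Stanbol class).
 * E stands in for whatever Entity type the searcher returns.
 */
class DocumentLookupCache<E> {

    // Key: the list of search strings. List.equals/hashCode compare
    // element-wise, so equal lists of strings map to the same entry.
    private final Map<List<String>, List<E>> cache = new HashMap<>();
    private int lookups;
    private int hits;

    /**
     * Returns the cached result for the search strings, or delegates to
     * the given searcher (an assumed stand-in for the real entity lookup)
     * and remembers its result. Empty results are cached as well.
     */
    List<E> lookup(List<String> searchStrings,
                   Function<List<String>, List<E>> searcher) {
        lookups++;
        List<E> cached = cache.get(searchStrings);
        if (cached != null) {
            hits++;
            return cached;
        }
        // Copy key and value so later changes made by the caller
        // cannot corrupt the cache.
        List<String> key =
                Collections.unmodifiableList(new ArrayList<>(searchStrings));
        List<E> result =
                Collections.unmodifiableList(new ArrayList<>(searcher.apply(searchStrings)));
        cache.put(key, result);
        return result;
    }

    /** Cache hits as a percentage of all lookups (0 if nothing was looked up). */
    double hitPercentage() {
        return lookups == 0 ? 0.0 : 100.0 * hits / lookups;
    }
}
```

In this sketch one cache instance would be created per analyzed document and discarded afterwards, which matches the restriction that only look-up results for the current document are collected.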

This issue affects both trunk (1.0.0-SNAPSHOT) and the stable 0.12 release branch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)