Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2013/12/04 08:16:35 UTC
[jira] [Created] (STANBOL-1230) Add Lookup Cache to EntityLinking Engine
Rupert Westenthaler created STANBOL-1230:
--------------------------------------------
Summary: Add Lookup Cache to EntityLinking Engine
Key: STANBOL-1230
URL: https://issues.apache.org/jira/browse/STANBOL-1230
Project: Stanbol
Issue Type: Improvement
Components: Enhancement Engines
Affects Versions: 0.12.0
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
Fix For: 0.12.0
The EntityLinkingEngine should cache results of lookups on the EntitySearchers.
Entities often recur within an analyzed document. Because of that, caching the results of look-ups for already seen tokens should provide a considerable performance improvement: statistics show that ~90% of the processing time of the EntityLinking engine is spent on entity look-ups.
So if 20% of all Entity mentions refer to recurring Entities, the processing time should be reduced by about 18% (20% of the ~90% of the time spent on look-ups).
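The 18% figure is simple arithmetic: the share of processing time spent on look-ups multiplied by the share of look-ups the cache can avoid. A minimal Java sketch of that estimate follows; it assumes that cache hits cost nothing and that every look-up costs about the same, and the class and method names are hypothetical, not part of Stanbol.

```java
/** Hypothetical helper: estimates the fraction of processing time a look-up cache saves. */
public final class CacheSavingsEstimate {

    /**
     * @param lookupShare fraction of total EntityLinking time spent on entity look-ups (~0.9 per the issue)
     * @param repeatShare fraction of Entity mentions that repeat an earlier mention in the same document
     * @return estimated fraction of total processing time saved
     */
    static double estimatedSaving(double lookupShare, double repeatShare) {
        // Assumption: a cache hit is free and all look-ups have the same cost.
        return lookupShare * repeatShare;
    }

    public static void main(String[] args) {
        double saving = estimatedSaving(0.9, 0.2);
        System.out.printf("Estimated reduction: %.0f%%%n", saving * 100); // prints "Estimated reduction: 18%"
    }
}
```

In practice the real saving will be somewhat lower, since cache look-ups are not entirely free and look-up costs vary with the search strings.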
The cache will use the list of search strings as key and the list of returned Entities as value. It will only collect look-up results for the currently analyzed document.
EntityLinking statistics will be updated to include the cache hit percentage.
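A minimal sketch of the cache described above, assuming one instance is created per analyzed document and discarded afterwards. The class `DocumentLookupCache`, its methods, and the `Function` standing in for the EntitySearcher look-up are all hypothetical illustrations, not the actual Stanbol implementation. It keys on the list of search strings, stores the list of returned Entities, and counts hits so the hit percentage can be reported in the statistics.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-document look-up cache (not the actual Stanbol class). */
public final class DocumentLookupCache<E> {
    private final Function<List<String>, List<E>> searcher; // stands in for the EntitySearcher look-up
    private final Map<List<String>, List<E>> cache = new HashMap<>();
    private int lookups;
    private int hits;

    public DocumentLookupCache(Function<List<String>, List<E>> searcher) {
        this.searcher = searcher;
    }

    /** Returns the Entities for the search strings, querying the searcher only on a cache miss. */
    public List<E> lookup(List<String> searchStrings) {
        lookups++;
        // Immutable copy as key, so later changes to the caller's list cannot corrupt the map.
        List<String> key = List.copyOf(searchStrings);
        List<E> cached = cache.get(key);
        if (cached != null) {
            hits++;
            return cached;
        }
        List<E> result = List.copyOf(searcher.apply(key));
        cache.put(key, result);
        return result;
    }

    /** Cache hit percentage for the statistics output (0 if nothing was looked up yet). */
    public double hitPercentage() {
        return lookups == 0 ? 0.0 : 100.0 * hits / lookups;
    }

    public static void main(String[] args) {
        int[] searcherCalls = {0};
        DocumentLookupCache<String> cache = new DocumentLookupCache<>(search -> {
            searcherCalls[0]++;
            return List.of("entity:" + String.join(" ", search));
        });
        cache.lookup(List.of("Paris"));
        cache.lookup(List.of("Barack", "Obama"));
        cache.lookup(List.of("Paris")); // served from the cache
        System.out.println("searcher calls: " + searcherCalls[0]); // prints 2
        System.out.printf("hit percentage: %.1f%%%n", cache.hitPercentage());
    }
}
```

Scoping the cache to a single document keeps memory bounded and avoids serving stale results if the underlying index changes between documents.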
This issue affects both trunk (1.0.0-SNAPSHOT) and the stable 0.12 release branch.
--
This message was sent by Atlassian JIRA
(v6.1#6144)