Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2013/04/02 15:19:15 UTC

[jira] [Created] (STANBOL-1013) Separate (Entity)Spotting and (Entity)Linking

Rupert Westenthaler created STANBOL-1013:
--------------------------------------------

             Summary: Separate (Entity)Spotting and (Entity)Linking
                 Key: STANBOL-1013
                 URL: https://issues.apache.org/jira/browse/STANBOL-1013
             Project: Stanbol
          Issue Type: Bug
          Components: Engine - Entity Linking, Enhancer
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


Currently the EntityLinking engine performs two major tasks:

(1) Spotting: detects the words in the analyzed Text that should be linked to the Controlled Vocabulary. For that, words are categorized as "linkable", "matchable" and "others". Chunks are also considered for this task.

(2) Linking: creates searches for "linkable" words while considering "matchable" words. Labels of suggested Entities are tokenized and matched against "linkable" and "matchable" words in the text. The EntityLinkingConfiguration is used to configure this task.


See the documentation of the EntityLinkingEngine [1] for details.


(1) is configured by the TextProcessingConfiguration and implemented by the ProcessingState class. (2) is configured by the EntityLinkingConfiguration and implemented by the EntityLinker class.
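
To make the two tasks concrete, here is a minimal, self-contained sketch. It is not the code of ProcessingState or EntityLinker: the class SpottingSketch, its Token type, the POS tag names and the rule "proper nouns are linkable, common nouns are matchable" are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: these types are hypothetical, not Stanbol classes. */
class SpottingSketch {

    /** The three word categories named in the description. */
    enum Category { LINKABLE, MATCHABLE, OTHER }

    /** A word plus a part-of-speech tag (the tag names are assumptions). */
    static final class Token {
        final String text;
        final String pos;

        Token(String text, String pos) {
            this.text = text;
            this.pos = pos;
        }
    }

    /** Assumed rule: proper nouns are linkable, common nouns matchable. */
    static Category categorize(Token token) {
        if ("PROPN".equals(token.pos)) {
            return Category.LINKABLE;
        }
        if ("NOUN".equals(token.pos)) {
            return Category.MATCHABLE;
        }
        return Category.OTHER;
    }

    /** Task (1): indexes of the tokens a search should be created for. */
    static List<Integer> spot(List<Token> tokens) {
        List<Integer> linkable = new ArrayList<Integer>();
        for (int i = 0; i < tokens.size(); i++) {
            if (categorize(tokens.get(i)) == Category.LINKABLE) {
                linkable.add(i);
            }
        }
        return linkable;
    }

    /** Task (2): number of label tokens found among linkable/matchable words. */
    static int matchLabel(String label, List<Token> tokens) {
        int matched = 0;
        for (String labelToken : label.trim().split("\\s+")) {
            for (Token token : tokens) {
                if (categorize(token) != Category.OTHER
                        && token.text.equalsIgnoreCase(labelToken)) {
                    matched++;
                    break; // count each label token at most once
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("University", "PROPN"),
                new Token("of", "ADP"),
                new Token("Salzburg", "PROPN"),
                new Token("campus", "NOUN"));
        System.out.println(spot(tokens));
        System.out.println(matchLabel("University of Salzburg", tokens));
    }
}
```

With the sample tokens in main, only the two proper nouns are spotted, and the label "University of Salzburg" matches two words because "of" falls into the "others" category.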

Proposed Workplan:
=====

1. Clean up and improve the internal APIs used by the EntityLinking engine

2. Define a public API for describing Entity Spotting results. Possibilities include:
    * using the metadata of the ContentItems (e.g. fise:TextAnnotations)
    * annotations in the AnalyzedText ContentPart
    * some additional ContentPart

3. Split up (1) and (2) into two separate EnhancementEngines:
    * (1) NlpSpottingEngine: spots potential Entities by using NLP processing results
    * (2) EntityLinkingEngine: links Entities of a Controlled Vocabulary based on Spotting results

4. Integrate Named Entity Linking into the new Spotting & Linking workflow
    * Allow Spotters to annotate spotted Entities with additional metadata (e.g. the type as suggested by NER)
    * Extend the EntityLinkingEngine to make use of this metadata when searching/matching Entities from linked Vocabularies
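
One possible shape for steps 3 and 4 is sketched below. It is only a rough illustration of the proposed separation, not a design decision and not existing Stanbol API: the Spotter and Linker interfaces, the Spot and Entity classes, the toy implementations and the urn:example: URIs are all hypothetical. The point it tries to show is that a spot can carry an optional type (as a NER-based Spotter might provide), which the Linker may use to narrow its suggestions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the proposed split; not Stanbol API. */
class SpotLinkSketch {

    /** A spotting result: a text span plus optional metadata. */
    static final class Spot {
        final int start;
        final int end;
        final String text;
        final String type; // null when the Spotter has no type hint

        Spot(int start, int end, String text, String type) {
            this.start = start;
            this.end = end;
            this.text = text;
            this.type = type;
        }
    }

    /** Toy vocabulary entry; labels, URIs and types are made-up data. */
    static final class Entity {
        final String label;
        final String uri;
        final String type;

        Entity(String label, String uri, String type) {
            this.label = label;
            this.uri = uri;
            this.type = type;
        }
    }

    /** Step (1): finds mentions, knows nothing about the vocabulary. */
    interface Spotter {
        List<Spot> spot(String text);
    }

    /** Step (2): resolves a spot against a controlled vocabulary. */
    interface Linker {
        List<String> link(Spot spot);
    }

    /** Toy Spotter: every capitalized word is a spot without a type. */
    static final class CapitalizedWordSpotter implements Spotter {
        private static final Pattern WORD = Pattern.compile("\\b[A-Z][a-z]+\\b");

        @Override
        public List<Spot> spot(String text) {
            List<Spot> spots = new ArrayList<Spot>();
            Matcher m = WORD.matcher(text);
            while (m.find()) {
                spots.add(new Spot(m.start(), m.end(), m.group(), null));
            }
            return spots;
        }
    }

    /** Toy Linker: label match, filtered by type if the spot has one. */
    static final class LabelLinker implements Linker {
        private final List<Entity> vocabulary;

        LabelLinker(List<Entity> vocabulary) {
            this.vocabulary = vocabulary;
        }

        @Override
        public List<String> link(Spot spot) {
            List<String> uris = new ArrayList<String>();
            for (Entity e : vocabulary) {
                boolean labelOk = e.label.equalsIgnoreCase(spot.text);
                boolean typeOk = spot.type == null || spot.type.equals(e.type);
                if (labelOk && typeOk) {
                    uris.add(e.uri);
                }
            }
            return uris;
        }
    }

    public static void main(String[] args) {
        Linker linker = new LabelLinker(Arrays.asList(
                new Entity("Paris", "urn:example:paris-city", "Place"),
                new Entity("Paris", "urn:example:paris-person", "Person")));
        for (Spot s : new CapitalizedWordSpotter().spot("Paris is nice")) {
            System.out.println(s.text + " -> " + linker.link(s));
        }
        // A spot that carries a type narrows the suggestions.
        System.out.println(linker.link(new Spot(0, 5, "Paris", "Place")));
    }
}
```

In this sketch the untyped spot for "Paris" is linked to both example entities, while the same spot annotated with the type "Place" is linked only to urn:example:paris-city. Whether such spots would live in the ContentItem metadata, in the AnalyzedText or in a separate ContentPart is left open, as in step 2.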

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira