Posted to commits@stanbol.apache.org by "Antonio David Pérez Morales (JIRA)" <ji...@apache.org> on 2013/09/13 14:46:52 UTC

[jira] [Commented] (STANBOL-1157) Freebase Disambiguation Algorithm

    [ https://issues.apache.org/jira/browse/STANBOL-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766445#comment-13766445 ] 

Antonio David Pérez Morales commented on STANBOL-1157:
------------------------------------------------------

The Freebase Disambiguation Engine implements the algorithm described in this issue, except for the first step (local score), which is not implemented in this version.

The algorithm builds a subgraph of the whole Freebase graph that contains only the entities returned by the NLP and entity linking process, together with the relations between them.
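
As an illustration of the subgraph step, here is a minimal sketch in Python. It is not the engine's code: the engine works against Freebase imported in Neo4j, while this sketch assumes a plain adjacency dictionary, and the function name and toy entity ids are hypothetical.

```python
def induced_subgraph(graph, candidates):
    """Keep only the candidate entities and the relations joining two of them.

    `graph` maps an entity id to the set of entity ids it is related to.
    This in-memory representation is an assumption made for the sketch.
    """
    keep = set(candidates)
    return {
        node: {nbr for nbr in graph.get(node, ()) if nbr in keep}
        for node in keep
    }


# Hypothetical toy graph standing in for Freebase.
toy_graph = {
    "paris_fr": {"france", "seine"},
    "paris_tx": {"texas"},
    "france": {"paris_fr", "europe"},
    "texas": {"paris_tx"},
    "seine": {"paris_fr"},
    "europe": {"france"},
}

# Entities returned by entity linking for the text (also hypothetical).
subgraph = induced_subgraph(toy_graph, ["paris_fr", "paris_tx", "france"])
```

In this toy case only the paris_fr / france relation survives, because it is the only relation whose two ends are both candidate entities.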

Using the Entity Annotations of each Text Annotation, it builds all the possible solutions for the text to enhance. That is, every tuple obtained by picking exactly one entity from the set of entity annotations of each text annotation.
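
The enumeration of possible solutions amounts to a Cartesian product over the candidate sets. A minimal sketch in Python follows, assuming the entity annotations are available as a dictionary from mention to candidate entity ids; the function name and the sample data are hypothetical, not taken from the engine.

```python
from itertools import product


def candidate_solutions(entity_annotations):
    """Return every tuple that picks one entity per text annotation.

    `entity_annotations` maps a mention (text annotation) to the list of
    candidate entity ids suggested for it.
    """
    mentions = list(entity_annotations)
    candidate_sets = [entity_annotations[m] for m in mentions]
    return [tuple(combo) for combo in product(*candidate_sets)]


# Hypothetical annotations: two mentions, three and two candidates.
annotations = {
    "Paris": ["paris_fr", "paris_tx", "paris_hilton"],
    "Texas": ["texas", "texas_band"],
}
solutions = candidate_solutions(annotations)
```

The number of tuples is the product of the candidate set sizes (3 * 2 = 6 here), so it grows quickly with the number of text annotations.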

The solution sought is the tuple that minimizes the distance in the graph between every pair of entities in the tuple. A smaller distance means a higher disambiguation score.
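
A minimal sketch of the selection step in Python, under several assumptions that are not stated above: pairwise distances are aggregated by summing them, relations are unweighted (so a breadth-first search is used, which gives the same result as Dijkstra with unit weights), and an unreachable pair is charged a fixed penalty. All names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations, product


def hop_distance(graph, source, target):
    """Number of relations on the shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


def total_distance(graph, solution, unreachable_penalty=1000):
    """Sum of pairwise distances (the sum and the penalty are assumptions)."""
    total = 0
    for a, b in combinations(solution, 2):
        d = hop_distance(graph, a, b)
        total += unreachable_penalty if d is None else d
    return total


def best_solution(graph, candidate_sets):
    """Pick the tuple (one entity per set) with the smallest total distance."""
    return min(product(*candidate_sets),
               key=lambda solution: total_distance(graph, solution))


# Hypothetical undirected toy graph: entity id -> related entity ids.
relations = {
    "paris_fr": {"france"},
    "france": {"paris_fr", "europe"},
    "europe": {"france"},
    "paris_tx": {"texas"},
    "texas": {"paris_tx"},
}
best = best_solution(relations, [["paris_fr", "paris_tx"], ["france"]])
```

Here paris_fr is one relation away from france while paris_tx cannot reach it at all, so the tuple containing paris_fr is selected.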

The engine can be downloaded from https://github.com/adperezmorales/gsoc-freebase-disambiguation-engine/tree/master/gsoc-freebase-disambiguation-engine
                
> Freebase Disambiguation Algorithm
> ---------------------------------
>
>                 Key: STANBOL-1157
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1157
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Enhancement Engines, Enhancer, Entityhub
>            Reporter: Rafa Haro
>             Fix For: 0.12.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The disambiguation algorithm should take into account a local disambiguation score (comparing in some way the document context with the contexts provided by the Wikilinks resource) and a global disambiguation score computed by a graph-based algorithm using the Freebase graph imported into a Neo4j database. Each disambiguation score would have a different weight in the final disambiguation score for each entity. The algorithm's steps, for each TextAnnotation, can be the following:
> 1. Local score: for each EntityAnnotation, retrieve from the Wikilinks database all the contexts associated with the referenced entity. Compare (similarity, distance, ...) the mention context (selected-context) with the Wikilinks contexts.
> 2. Global score: build a subgraph with all the possible entities and their relations in Freebase. Extract a set of possible solutions from that graph (note: a solution should include only one entity annotation for each text annotation). Compute the Dijkstra distance between each pair of entities belonging to a possible solution.
> 3. Weight normalization and confidence value refinement.
