Posted to commits@stanbol.apache.org by "Antonio David Pérez Morales (JIRA)" <ji...@apache.org> on 2013/09/25 15:36:04 UTC

[jira] [Comment Edited] (STANBOL-1157) Freebase Disambiguation Algorithm

    [ https://issues.apache.org/jira/browse/STANBOL-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777384#comment-13777384 ] 

Antonio David Pérez Morales edited comment on STANBOL-1157 at 9/25/13 1:35 PM:
-------------------------------------------------------------------------------

Both classes are used to calculate the shortest path between two entities in the graph. The CustomDijkstraDistance is a customization of Jung's original DijkstraDistance. It avoids using the Transformation class from an old version of the Guava library, because Stanbol provides a newer version of Guava. This class only adds a static interface Transformation, which is used when creating a new instance of the DijkstraDistance class to convert an Edge into a number for the shortest-path calculation. I had a problem adding the dependency to the project: I was getting a ClassNotFoundException because the Transformation class in the Guava version provided by Stanbol lives in a different package from the one DijkstraDistance uses. I think that updating the Jung package to use a Guava version greater than 13.0.1 (the version provided by Stanbol) would allow us to remove this class and use the plain DijkstraDistance implementation provided by Jung.
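To illustrate what such an edge-to-number interface is for, here is a minimal, self-contained Java sketch. It deliberately uses neither Jung nor Guava. `EdgeWeight`, `Edge` and `WeightedDijkstra` are hypothetical names. The code only shows the general idea of Dijkstra with a pluggable edge-weight function, not the actual CustomDijkstraDistance implementation.

```java
import java.util.*;

// Hypothetical stand-in for the static "Transformation" interface described above:
// it turns an edge into the (non-negative) number Dijkstra adds up along a path.
interface EdgeWeight<E> {
    Number transform(E edge);
}

// Minimal directed edge between two entity ids; illustrative only.
record Edge(String source, String target, String relation) {}

// Hypothetical sketch: plain Dijkstra whose edge costs come from a pluggable EdgeWeight.
final class WeightedDijkstra {
    private record QueueItem(String vertex, double distance) {}

    private final Map<String, List<Edge>> outEdges = new HashMap<>();
    private final EdgeWeight<Edge> weight;

    WeightedDijkstra(Collection<Edge> edges, EdgeWeight<Edge> weight) {
        this.weight = weight;
        for (Edge e : edges) {
            outEdges.computeIfAbsent(e.source(), k -> new ArrayList<>()).add(e);
        }
    }

    /** Shortest distance from source to target following out edges only, or null if unreachable. */
    Double getDistance(String source, String target) {
        Map<String, Double> dist = new HashMap<>();
        PriorityQueue<QueueItem> queue =
            new PriorityQueue<QueueItem>(Comparator.comparingDouble(QueueItem::distance));
        dist.put(source, 0.0);
        queue.add(new QueueItem(source, 0.0));
        while (!queue.isEmpty()) {
            QueueItem current = queue.poll();
            if (current.distance() > dist.get(current.vertex())) continue; // stale queue entry
            if (current.vertex().equals(target)) return current.distance();
            for (Edge e : outEdges.getOrDefault(current.vertex(), List.of())) {
                // The pluggable weight function decides what each edge "costs".
                double candidate = current.distance() + weight.transform(e).doubleValue();
                Double known = dist.get(e.target());
                if (known == null || candidate < known) {
                    dist.put(e.target(), candidate);
                    queue.add(new QueueItem(e.target(), candidate));
                }
            }
        }
        return null;
    }
}
```

In this sketch, `getDistance("C", "A")` on edges A→B, B→C returns null. Edges are only followed in their stored direction, which is the limitation the UndirectedGraphJung class addresses.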

The UndirectedGraphJung is an extension of the GraphJung class from Tinkerpop (needed because DijkstraDistance operates on a Jung graph instance) with a reimplementation of the getOutEdges (getEdgesForVertex) method. In Dijkstra, we have to obtain the out edges of a vertex to follow the possible paths to the target vertex. Because the Freebase graph is undirected (going from entityA to entityB is the same as going from entityB to entityA), this class redefines the method to return the edges of the vertex in both directions (in and out).
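The effect of returning both incoming and outgoing edges can be shown with another small, hypothetical sketch. It is plain Java with no Tinkerpop or Jung types. `DirectedEdge` and `UndirectedView` are illustrative names, not the real classes. To keep it short, it uses breadth-first search, which is what Dijkstra reduces to when every edge has weight 1.

```java
import java.util.*;

// Directed edge as it might be stored in the graph database; illustrative only.
record DirectedEdge(String source, String target) {}

// Hypothetical sketch of the idea behind UndirectedGraphJung: edges are stored
// with a direction, but the "out edges" of a vertex are its edges in both directions.
final class UndirectedView {
    private final List<DirectedEdge> edges;

    UndirectedView(List<DirectedEdge> edges) {
        this.edges = List.copyOf(edges);
    }

    /** Every edge touching the vertex, whether it points out of it or into it. */
    List<DirectedEdge> getOutEdges(String vertex) {
        List<DirectedEdge> result = new ArrayList<>();
        for (DirectedEdge e : edges) {
            if (e.source().equals(vertex) || e.target().equals(vertex)) {
                result.add(e);
            }
        }
        return result;
    }

    /** The vertex at the other end of the edge, seen from the given vertex. */
    static String opposite(DirectedEdge e, String vertex) {
        return e.source().equals(vertex) ? e.target() : e.source();
    }

    /** Hop count of the shortest path (BFS, i.e. Dijkstra with unit weights), or -1 if unreachable. */
    int hops(String from, String to) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String v = queue.poll();
            if (v.equals(to)) return dist.get(v);
            for (DirectedEdge e : getOutEdges(v)) {
                String next = opposite(e, v);
                if (!dist.containsKey(next)) {
                    dist.put(next, dist.get(v) + 1);
                    queue.add(next);
                }
            }
        }
        return -1;
    }
}
```

Take edges A→B and B→C, stored in one direction only. `hops("C", "A")` still returns 2, because `getOutEdges` lets the search walk each edge backwards as well.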
                
> Freebase Disambiguation Algorithm
> ---------------------------------
>
>                 Key: STANBOL-1157
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1157
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Enhancement Engines, Enhancer, Entityhub
>            Reporter: Rafa Haro
>         Attachments: gsoc-freebase-disambiguation-engine-1.0-SNAPSHOT.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The disambiguation algorithm should take into account a local disambiguation score (comparing in some way the document context with the contexts provided by the Wikilinks resource) and a global disambiguation score computed by a graph-based algorithm using the Freebase graph imported into a Neo4j database. Each disambiguation score would have a different weight in the final disambiguation score for each entity. The algorithm's steps for each TextAnnotation could be the following:
> 1. Local score: for each EntityAnnotation, retrieve all the contexts associated with the referenced entity from the Wikilinks database. Compare (similarity, distance, ...) the mention context (selected-context) with the Wikilinks contexts.
> 2. Global score: build a subgraph with all the possible entities and their relations in Freebase. Extract a set of possible solutions from this graph (note: a solution should include only one entity annotation for each text annotation). Compute the Dijkstra distance between each pair of entities belonging to a possible solution.
> 3. Weight normalization and confidence value refinement.
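The three steps above could be sketched in plain Java as follows. This is only an illustration under explicit assumptions:

- The description leaves the comparison measure open, so the sketch picks cosine similarity over term frequencies for the local score.
- The global score turns the summed pairwise distances of a solution into 1 / (1 + total).
- Step 3 is read as a weighted sum with normalized weights.

The class `DisambiguationSketch` and all its methods are hypothetical. The `distance` function stands in for the Dijkstra distance over the Freebase graph.

```java
import java.util.*;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of the three steps; not the engine's actual code.
final class DisambiguationSketch {

    /** Term-frequency vector of a lower-cased context, split on non-word characters. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    /** Cosine similarity between two term-frequency vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Step 1 (assumed measure): best cosine similarity between the mention and any Wikilinks context. */
    static double localScore(String mentionContext, List<String> wikilinksContexts) {
        Map<String, Integer> mention = termFrequencies(mentionContext);
        double best = 0;
        for (String ctx : wikilinksContexts) {
            best = Math.max(best, cosine(mention, termFrequencies(ctx)));
        }
        return best;
    }

    /** Step 2: one candidate per mention (cartesian product), scored by summed pairwise distance. */
    static Map<List<String>, Double> globalScores(List<List<String>> candidatesPerMention,
                                                  ToDoubleBiFunction<String, String> distance) {
        List<List<String>> solutions = new ArrayList<>();
        solutions.add(new ArrayList<>());
        for (List<String> candidates : candidatesPerMention) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> partial : solutions) {
                for (String candidate : candidates) {
                    List<String> next = new ArrayList<>(partial);
                    next.add(candidate);
                    extended.add(next);
                }
            }
            solutions = extended;
        }
        Map<List<String>, Double> scores = new LinkedHashMap<>();
        for (List<String> solution : solutions) {
            double total = 0;
            for (int i = 0; i < solution.size(); i++) {
                for (int j = i + 1; j < solution.size(); j++) {
                    total += distance.applyAsDouble(solution.get(i), solution.get(j));
                }
            }
            // Shorter paths give a higher score; an infinite distance yields 0.
            scores.put(solution, 1.0 / (1.0 + total));
        }
        return scores;
    }

    /** Step 3 (assumed): weighted combination, with the two weights normalized to sum to 1. */
    static double combine(double local, double global, double localWeight, double globalWeight) {
        double sum = localWeight + globalWeight;
        return (localWeight / sum) * local + (globalWeight / sum) * global;
    }
}
```

The sketch leaves open how the best solution's global score is mapped back to individual EntityAnnotations, as the description does.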
