Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2013/09/11 13:58:52 UTC

[jira] [Updated] (STANBOL-1141) Wikilinks Parser and TDB Generator

     [ https://issues.apache.org/jira/browse/STANBOL-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler updated STANBOL-1141:
-----------------------------------------

    Parent Issue: STANBOL-1156  (was: STANBOL-1037)
    
> Wikilinks Parser and TDB Generator
> ----------------------------------
>
>                 Key: STANBOL-1141
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1141
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Enhancer, Entityhub
>            Reporter: Antonio David Pérez Morales
>              Labels: freebase, jenatdb, wikilinks
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Cross-document coreference resolution is the task of grouping the entity mentions in a collection of documents into sets that each represent a distinct entity. It is central to knowledge base construction and also useful for joint inference with other NLP components.
> Wikilinks is one result of this task.
> The Wikilinks dataset comprises 40 million mentions over 3 million entities. The method is based on finding hyperlinks to Wikipedia in a web crawl and using the anchor text as mentions. In addition to providing large-scale labeled data without human effort, we are able to include many styles of text beyond newswire and many entity types beyond people.
> UMass has created expanded versions of the dataset containing the following extra features:
> * Complete webpage content (with cleaned DOM structure)
> * Extracted context for the mentions
> * Alignment to Freebase entities
> The expanded dataset can be downloaded from http://iesl.cs.umass.edu/downloads/wiki-link/context-only/
> A tool is needed to parse this information and store it in some type of storage, such as Jena TDB.
> Wikilinks provides information about documents that mention Freebase entities. This information can be used both for disambiguation and for merging with the Freebase information, in order to obtain a large set of valuable data.
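
As a rough illustration of the kind of tool described above, the following Python sketch parses mention records, groups anchor texts by the entity they link to, and serializes the result as N-Triples. Everything specific here is an assumption rather than part of the ticket: the tab-separated `URL` / `MENTION` line layout is a simplified stand-in for the real dataset format, the predicate URI `http://example.org/wikilinks#mentionedAs` is hypothetical, and Wikipedia URLs stand in for Freebase identifiers. It is a minimal sketch, not the implementation proposed in this issue.

```python
from collections import defaultdict

# Hypothetical predicate URI, used only for this illustration.
MENTIONED_AS = "http://example.org/wikilinks#mentionedAs"


def parse_mentions(lines):
    """Yield (page_url, anchor_text, entity_url) tuples from tab-separated records.

    Assumed layout (not taken from the ticket): a 'URL<TAB>page' line opens a
    document, followed by 'MENTION<TAB>anchor<TAB>offset<TAB>entity_url' lines.
    """
    page_url = None
    for raw in lines:
        fields = raw.rstrip("\n").split("\t")
        if fields[0] == "URL" and len(fields) >= 2:
            page_url = fields[1]
        elif fields[0] == "MENTION" and len(fields) >= 4 and page_url:
            yield page_url, fields[1], fields[3]


def group_by_entity(mentions):
    """Group anchor texts by the URL they link to: one set of mentions per entity."""
    groups = defaultdict(set)
    for _page, anchor, entity in mentions:
        groups[entity].add(anchor)
    return dict(groups)


def escape_literal(text):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(groups):
    """Serialize entity -> anchor texts as N-Triples lines, sorted for stable output.

    Simplification: entity URLs are written as IRIs without further escaping.
    """
    return [
        f'<{entity}> <{MENTIONED_AS}> "{escape_literal(anchor)}" .'
        for entity in sorted(groups)
        for anchor in sorted(groups[entity])
    ]


# Made-up sample records in the assumed layout.
SAMPLE = [
    "URL\thttp://example.org/page1\n",
    "MENTION\tNTFS\t120\thttp://en.wikipedia.org/wiki/NTFS\n",
    "MENTION\tNT file system\t340\thttp://en.wikipedia.org/wiki/NTFS\n",
    "URL\thttp://example.org/page2\n",
    "MENTION\tJena\t15\thttp://en.wikipedia.org/wiki/Apache_Jena\n",
]

if __name__ == "__main__":
    for triple in to_ntriples(group_by_entity(parse_mentions(SAMPLE))):
        print(triple)
```

Writing N-Triples keeps the sketch independent of any particular store; a file produced this way could then be bulk-loaded into a triple store, for example with the tdbloader command-line tool that ships with Jena.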
