Posted to issues@opennlp.apache.org by "Mark Giaconia (JIRA)" <ji...@apache.org> on 2013/06/02 17:06:21 UTC

[jira] [Commented] (OPENNLP-579) Framework to support Gazateer search in concert with NER for location entities.

    [ https://issues.apache.org/jira/browse/OPENNLP-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672555#comment-13672555 ] 

Mark Giaconia commented on OPENNLP-579:
---------------------------------------

Please take a look at the latest upload. At its core, it relies on these three interfaces:
/**
 * Allows for processing of a complete document. Ties into the EntityLinker framework and can optionally return custom Spans,
 * as per the EntityLinker configuration for each entity type.
 * @author Mark Giaconia
 */
public interface LinkableDocumentNameFinder {
  Document find(String[] sentences, Tokenizer tokenizer, List<TokenNameFinder> nameFinders, boolean linkable);
  Document find(String documentText, SentenceDetector sentenceDetector, Tokenizer tokenizer, List<TokenNameFinder> nameFinders, boolean linkable);
}
Notice the Document object: to keep this clean, I went with a more object-oriented approach after looking at the current DocumentNameFinder. A Document contains a List<Sentence>; see the domain package for the details of those objects.
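To make the document-level flow concrete, here is a minimal self-contained sketch of what the find(String documentText, ...) overload might do: split the text into sentences, tokenize each one, run every name finder over the tokens, and collect the results into a Document holding a List<Sentence>. The Sentence, Document and DocumentPipeline classes are hypothetical stand-ins, not the ones in the domain package, and the three java.util.function.Function parameters stand in for OpenNLP's SentenceDetector.sentDetect(String), Tokenizer.tokenize(String) and TokenNameFinder.find(String[]) so the sketch compiles without OpenNLP. Linking is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical domain objects; the real ones live in the attachment's "domain" package.
final class Sentence {
    final String text;
    final String[] tokens;
    final List<String> names = new ArrayList<>(); // entity strings found in this sentence

    Sentence(String text, String[] tokens) {
        this.text = text;
        this.tokens = tokens;
    }
}

final class Document {
    final List<Sentence> sentences = new ArrayList<>();
}

final class DocumentPipeline {
    private DocumentPipeline() {}

    // The Function parameters stand in for SentenceDetector, Tokenizer and TokenNameFinder;
    // a name finder here returns {start, end} token offsets (end exclusive) instead of Span[].
    static Document process(String text,
                            Function<String, String[]> sentenceDetector,
                            Function<String, String[]> tokenizer,
                            List<Function<String[], int[][]>> nameFinders) {
        Document doc = new Document();
        for (String sentenceText : sentenceDetector.apply(text)) {
            String[] tokens = tokenizer.apply(sentenceText);
            Sentence sentence = new Sentence(sentenceText, tokens);
            for (Function<String[], int[][]> finder : nameFinders) {
                for (int[] span : finder.apply(tokens)) {
                    // Rebuild the entity text from its tokens.
                    sentence.names.add(String.join(" ", Arrays.copyOfRange(tokens, span[0], span[1])));
                }
            }
            doc.sentences.add(sentence);
        }
        return doc;
    }
}
```

With the real OpenNLP types, each name finder returns Span[], and Span.spansToStrings(Span[], String[]) can recover the entity strings; an implementation would also typically call clearAdaptiveData() on each TokenNameFinder once the document is finished.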

A LinkableDocumentNameFinder instantiates a list of linkables for each entity type that it discovers via the EntityLinkerFactory (which works off of a Properties file). 
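A factory driven by a Properties file can be sketched as below. The key format linker.<entityType>=<fully.qualified.ClassName> and the SimpleLinkerFactory name are assumptions for illustration, not the format of the attached geonamefinder.properties; the factory looks up the class name configured for an entity type and instantiates it reflectively through its no-arg constructor.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical Properties-driven factory; the "linker.<entityType>" key format is an assumption.
final class SimpleLinkerFactory {
    private final Properties props;

    SimpleLinkerFactory(Properties props) {
        this.props = props;
    }

    // Parses properties text such as "linker.location=com.example.MyLinker".
    static SimpleLinkerFactory fromString(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new SimpleLinkerFactory(p);
    }

    // Instantiates the class configured for entityType via its no-arg constructor
    // and checks that it is an instance of expectedType.
    <T> T create(String entityType, Class<T> expectedType) {
        String className = props.getProperty("linker." + entityType);
        if (className == null) {
            throw new IllegalArgumentException("No linker configured for entity type: " + entityType);
        }
        try {
            Object instance = Class.forName(className.trim()).getDeclaredConstructor().newInstance();
            return expectedType.cast(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

Passing the expected interface as a Class object means a misconfigured entry fails fast with a ClassCastException instead of surfacing later as an unrelated error.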

public interface EntityLinker<T extends Set<? extends Span>> {
  T find(String[] tokens, Span[] spans, List<Class> linkables);
  T find(String[] tokens, Span[] spans);
}
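The bound T extends Set<? extends Span> lets an implementation return a Span subtype that carries its gazetteer hits. The sketch below shows that pattern in a self-contained way: TokSpan is a stand-in for opennlp.tools.util.Span, and LinkedSpan, SpanLinker and MapLinker are hypothetical names. An in-memory map replaces the database lookup so the example runs anywhere.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-in for opennlp.tools.util.Span so the sketch compiles on its own.
class TokSpan {
    final int start; // inclusive token index
    final int end;   // exclusive token index

    TokSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

// Hypothetical span subtype carrying the gazetteer entries linked to the entity.
class LinkedSpan extends TokSpan {
    final List<String> links;

    LinkedSpan(TokSpan span, List<String> links) {
        super(span.start, span.end);
        this.links = links;
    }
}

// Same shape as the two-argument EntityLinker.find, bound to the stand-in span type.
interface SpanLinker<T extends Set<? extends TokSpan>> {
    T find(String[] tokens, TokSpan[] spans);
}

// Toy linker: looks each entity's text up in a map instead of querying a database.
class MapLinker implements SpanLinker<Set<LinkedSpan>> {
    private final Map<String, List<String>> gazetteer;

    MapLinker(Map<String, List<String>> gazetteer) {
        this.gazetteer = gazetteer;
    }

    @Override
    public Set<LinkedSpan> find(String[] tokens, TokSpan[] spans) {
        Set<LinkedSpan> result = new LinkedHashSet<>(); // keeps the input span order
        for (TokSpan span : spans) {
            String text = String.join(" ", Arrays.copyOfRange(tokens, span.start, span.end));
            List<String> hits = gazetteer.getOrDefault(text, Collections.emptyList());
            result.add(new LinkedSpan(span, hits));
        }
        return result;
    }
}
```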

Each EntityLinker impl uses any number of Linkable impls, obtained via the LinkableFactory (which also works off of a Properties file):
public interface Linkable<T extends Set<? extends BaseLink>> extends Formatable {
  T find(String textToSearchFor);
  T find(String locationText, List<String> whereConditions);
  T getHierarchyFor(BaseLink entry);
}
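For a database-backed Linkable (the issue suggests PostGIS for the default implementation), find(locationText, whereConditions) would presumably end up as a SQL query. A minimal sketch of building one, assuming a hypothetical gazetteer table with id, name, country_code, latitude and longitude columns, and assuming the where conditions are trusted SQL fragments from configuration. The entity text itself stays a ? placeholder to be bound with PreparedStatement.setString rather than concatenated into the SQL.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical query builder; table and column names are assumptions.
final class GazetteerQueryBuilder {
    private GazetteerQueryBuilder() {}

    static String build(List<String> whereConditions) {
        StringJoiner where = new StringJoiner(" AND ", " WHERE ", "");
        where.add("name = ?"); // the formatted entity text is bound as a parameter
        for (String condition : whereConditions) {
            where.add("(" + condition + ")"); // assumed to come from trusted configuration
        }
        return "SELECT id, name, country_code, latitude, longitude FROM gazetteer" + where;
    }
}
```

The returned string would then go to Connection.prepareStatement, with the output of format(locationText) bound to parameter 1.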
The Formatable interface exists because different databases require the entity's text to be formatted differently before it is passed into a query:
public interface Formatable {
  String format(String entity);
}
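Two hypothetical Formatable implementations illustrate the point: one normalizes text for an exact-match lookup, the other turns the entity into a PostgreSQL to_tsquery-style string in which every term must match. The class names and normalization rules are assumptions, not taken from the attachment.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

// Same shape as the Formatable interface above, redeclared so the sketch compiles on its own.
interface Formatable {
    String format(String entity);
}

// Hypothetical exact-match formatter: trims, collapses whitespace runs, lower-cases.
final class ExactMatchFormatter implements Formatable {
    @Override
    public String format(String entity) {
        return entity.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}

// Hypothetical full-text formatter: strips punctuation and joins terms with " & ".
final class TsQueryFormatter implements Formatable {
    @Override
    public String format(String entity) {
        return Arrays.stream(entity.trim().split("\\s+"))
                .map(t -> t.replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase(Locale.ROOT))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.joining(" & "));
    }
}
```

For example, "St. Louis, MO" becomes "st & louis & mo" with the second formatter, while the first would keep the punctuation and only normalize case and spacing.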

Take a look at the Example and DefaultLinkableDocumentNameFinderImpl classes.


                
> Framework to support Gazateer search in concert with NER for location entities.
> -------------------------------------------------------------------------------
>
>                 Key: OPENNLP-579
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-579
>             Project: OpenNLP
>          Issue Type: Wish
>          Components: Name Finder
>    Affects Versions: 1.6.0
>         Environment: Any
>            Reporter: Mark Giaconia
>            Priority: Minor
>              Labels: features
>             Fix For: 1.6.0
>
>         Attachments: EntityLinker_30may2013.zip, geonamefinder.properties, geonamefind.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> An interface for defining a gazetteer and the methods to search it, an extended Span object, and a NameFinder that encapsulates a TokenNameFinder for locations. Commercial applications that do this are extremely expensive, and there are many free gazetteers one could use to build a solution with OpenNLP. The capability should provide a simple default implementation using the most popular open source geospatial database, PostGIS.
