Posted to issues@opennlp.apache.org by "Mark Giaconia (JIRA)" <ji...@apache.org> on 2013/06/02 17:06:21 UTC
[jira] [Commented] (OPENNLP-579) Framework to support Gazateer
search in concert with NER for location entities.
[ https://issues.apache.org/jira/browse/OPENNLP-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672555#comment-13672555 ]
Mark Giaconia commented on OPENNLP-579:
---------------------------------------
Please take a look at the latest upload. At its essence, it relies on these three interfaces:
/**
 * Allows for processing of a complete document. Ties into the EntityLinker framework and can optionally
 * return custom Spans as per the EntityLinker configuration for each entity type.
 *
 * @author Mark Giaconia
 */
public interface LinkableDocumentNameFinder {
    Document find(String[] sentences, Tokenizer tokenizer, List<TokenNameFinder> nameFinders, boolean linkable);
    Document find(String documentText, SentenceDetector sentenceDetector, Tokenizer tokenizer, List<TokenNameFinder> nameFinders, boolean linkable);
}
Notice the Document object. To keep this clean, I went with a more object-oriented approach after looking at the current DocumentNameFinder. The Document object contains a List<Sentence>; look in the domain package for the details of those objects.
A LinkableDocumentNameFinder instantiates a list of linkables for each entity type that it discovers via the EntityLinkerFactory (which works off of a Properties file).
public interface EntityLinker<T extends Set<? extends Span>> {
    T find(String[] tokens, Span[] spans, List<Class> linkables);
    T find(String[] tokens, Span[] spans);
}
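To illustrate what "works off of a Properties file" could look like for the EntityLinkerFactory, here is a minimal sketch. It is not the code in the upload: the property key scheme (linker.<entityType>), the SimpleLinker stand-in interface, and the GeoLinker, PersonLinker, and PropertiesLinkerFactory classes are all hypothetical names chosen for this example. It assumes each entity type maps to a comma-separated list of class names that are instantiated by reflection.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for a linker; the real EntityLinker has a richer contract.
interface SimpleLinker {
    String describe();
}

// Two hypothetical implementations that the configuration can refer to by class name.
class GeoLinker implements SimpleLinker {
    public String describe() { return "geo"; }
}

class PersonLinker implements SimpleLinker {
    public String describe() { return "person"; }
}

// Hypothetical factory: reads "linker.<entityType>=<class>,<class>,..." and builds instances.
final class PropertiesLinkerFactory {
    private final Properties config;

    PropertiesLinkerFactory(Properties config) {
        this.config = config;
    }

    // Convenience for the example: parse properties from a string instead of a file.
    static PropertiesLinkerFactory fromString(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new PropertiesLinkerFactory(props);
    }

    // Returns one new instance per class configured for the entity type (empty if none).
    List<SimpleLinker> linkersFor(String entityType) {
        List<SimpleLinker> linkers = new ArrayList<>();
        String value = config.getProperty("linker." + entityType, "");
        for (String className : value.split(",")) {
            String trimmed = className.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            try {
                Class<? extends SimpleLinker> cls =
                        Class.forName(trimmed).asSubclass(SimpleLinker.class);
                linkers.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot create linker " + trimmed, e);
            }
        }
        return linkers;
    }
}
```

With a configuration line such as linker.location=GeoLinker, calling linkersFor("location") would return a single GeoLinker instance, and an entity type with no entry yields an empty list. Keeping the mapping in configuration means new linkers can be added without touching the name finder code.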
Each EntityLinker implementation uses any number of Linkable implementations, obtained via the LinkableFactory (which also works off of a Properties file):
public interface Linkable<T extends Set<? extends BaseLink>> extends Formatable {
    T find(String textToSearchFor);
    T find(String locationText, List<String> whereConditions);
    T getHierarchyFor(BaseLink entry);
}
The Formatable interface exists because different databases require different formatting of the entity's text before it is passed into the query:
public interface Formatable {
    String format(String entity);
}
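As a rough illustration of why the formatting step is per-database, the sketch below shows two hypothetical Formatable implementations. Neither class is part of the upload. TsQueryFormatter assumes a backend that uses PostgreSQL full-text search, where to_tsquery expects terms joined by &. LikePrefixFormatter assumes a backend that does a SQL LIKE prefix match with backslash as the escape character. The Formatable interface is repeated (without the public modifier) only so the example compiles on its own.

```java
import java.util.Locale;

// Repeated from the comment above so that this example is self-contained.
interface Formatable {
    String format(String entity);
}

// Hypothetical: turns "St. Louis" into "st & louis" for a to_tsquery-style search.
class TsQueryFormatter implements Formatable {
    @Override
    public String format(String entity) {
        // Replace anything that is not a letter, digit, or whitespace with a space.
        String cleaned = entity.replaceAll("[^\\p{L}\\p{Nd}\\s]", " ")
                .trim()
                .toLowerCase(Locale.ROOT);
        if (cleaned.isEmpty()) {
            return "";
        }
        return String.join(" & ", cleaned.split("\\s+"));
    }
}

// Hypothetical: escapes LIKE wildcards and appends % for a prefix match.
class LikePrefixFormatter implements Formatable {
    @Override
    public String format(String entity) {
        String escaped = entity.trim()
                .replace("\\", "\\\\")   // escape the escape character first
                .replace("%", "\\%")
                .replace("_", "\\_");
        return escaped + "%";
    }
}
```

The same entity text therefore produces a different query argument depending on the backing store, which is the reason Linkable extends Formatable. The formatted value should still be bound as a query parameter rather than concatenated into the SQL string.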
Take a look at the Example class and the DefaultLinkableDocumentNameFinderImpl class.
> Framework to support Gazateer search in concert with NER for location entities.
> -------------------------------------------------------------------------------
>
> Key: OPENNLP-579
> URL: https://issues.apache.org/jira/browse/OPENNLP-579
> Project: OpenNLP
> Issue Type: Wish
> Components: Name Finder
> Affects Versions: 1.6.0
> Environment: Any
> Reporter: Mark Giaconia
> Priority: Minor
> Labels: features
> Fix For: 1.6.0
>
> Attachments: EntityLinker_30may2013.zip, geonamefinder.properties, geonamefind.zip
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> An interface for defining a gazetteer and the methods to search it, an extended Span object, and a name finder that encapsulates a TokenNameFinder for locations. Commercial applications that do this are extremely expensive, and there are many free gazetteers one could use to create a solution with OpenNLP. The capability should provide a simple default implementation using the most popular open source geospatial database, PostGIS.