Posted to dev@opennlp.apache.org by "Giaconia, Mark [USA]" <Gi...@bah.com> on 2013/06/02 22:59:38 UTC

EntityLinker Framework OPENNLP-579

I'm trying to articulate what I've done with OPENNLP-579 <https://issues.apache.org/jira/browse/OPENNLP-579> to get some feedback and iterate some more... fail early, fail often, I say.

I started to document it a bit; here is what I have so far as an explanation.

How to use the OpenNLP EntityLinker framework:

Purpose and Use Cases
The OpenNLP EntityLinker framework exists to associate extracted entities with external data sources. For instance, it can provide the means of associating a discovered location entity with N records in a geographic gazetteer. Another use case is associating a name with a database of person names using fuzzy matching.

Technical Overview
The framework consists of three main interfaces (EntityLinker, Linkable, and Formattable) and two factories:
EntityLinker and EntityLinkerFactory. This factory can return many EntityLinkers for a given entity type using one properties file.
Linkable and LinkableFactory. This factory can also return many Linkables for a given EntityLinker type from the same properties file.
The current framework assumes that all sentence detection, tokenization, and name finding happen outside the EntityLinker. If the LinkedDocumentNameFInder is used, NER and entity linking are encapsulated to a greater degree (see my other post).

Conceptual design
The concept is that an EntityLinker is associated with an entity type. Every EntityLinker can use multiple pluggable Linkables. For instance, an EntityLinker implementation called GeoEntityLinker can link to several gazetteer databases that are Linkable implementations, such as NGA Geonames and USGS placenames, or a Solr index of locations… the possibilities are endless.
The factory classes use reflection to instantiate EntityLinkers and their Linkables from entries configured in a properties file.
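
To make the reflection step concrete, here is a minimal sketch of how a factory could instantiate Linkables from a properties entry. It is not the actual LinkableFactory code: SimpleLinkable, GeonamesLinkable, UsgsLinkable, ReflectiveLinkableFactory, and the idea of a comma-separated list of class names under one key are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for the framework's Linkable interface.
interface SimpleLinkable {
    String name();
}

// Hypothetical Linkable backed by NGA Geonames.
class GeonamesLinkable implements SimpleLinkable {
    public String name() { return "geonames"; }
}

// Hypothetical Linkable backed by USGS placenames.
class UsgsLinkable implements SimpleLinkable {
    public String name() { return "usgs"; }
}

class ReflectiveLinkableFactory {
    // Reads a comma-separated list of class names stored under `key` and
    // instantiates each class through its no-argument constructor.
    static List<SimpleLinkable> load(Properties props, String key)
            throws ReflectiveOperationException {
        List<SimpleLinkable> result = new ArrayList<>();
        for (String className : props.getProperty(key, "").split(",")) {
            String trimmed = className.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a missing key or stray commas
            }
            Class<?> cls = Class.forName(trimmed);
            result.add((SimpleLinkable) cls.getDeclaredConstructor().newInstance());
        }
        return result;
    }
}
```

With a (hypothetical) entry such as linker.location.linkables=GeonamesLinkable, UsgsLinkable, calling ReflectiveLinkableFactory.load(props, "linker.location.linkables") would return one instance of each class, in the order listed. Adding a new data source then only requires a new class and a properties change.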

Here are the interface signatures:

EntityLinker

public interface EntityLinker<T extends Set<? extends Span>> {
  T find(String[] tokens, Span[] spans, List<Class> linkables); // not used currently
  T find(String[] tokens, Span[] spans);
}
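
The find(tokens, spans) signature suggests that each span indexes into the token array, as name finder output does. The sketch below shows one way an implementation might recover the entity text to look up. TokenSpan is a hypothetical stand-in for opennlp.tools.util.Span (start inclusive, end exclusive), and SpanText is an illustrative helper, not part of the framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for opennlp.tools.util.Span: token offsets,
// start inclusive and end exclusive.
class TokenSpan {
    final int start;
    final int end;

    TokenSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

class SpanText {
    // Joins the tokens covered by each span with single spaces, giving the
    // entity string that a linker could pass on to its Linkables.
    static List<String> entityTexts(String[] tokens, TokenSpan[] spans) {
        List<String> texts = new ArrayList<>();
        for (TokenSpan span : spans) {
            String[] covered = Arrays.copyOfRange(tokens, span.start, span.end);
            texts.add(String.join(" ", covered));
        }
        return texts;
    }
}
```

For the tokens "I flew to New York yesterday" and a span covering tokens 3 to 5, this yields the single string "New York".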

Linkable (an EntityLinker impl utilizes many Linkables)

public interface Linkable<T extends Set<? extends BaseLink>> extends Formattable {
  static LinkableFactory factory = LinkableFactory.getInstance();
  T find(String textToSearchFor);
  T find(String locationText, List<String> whereConditions);
  T getHierarchyFor(BaseLink entry);
}

// Formats an entity's text to prepare it for use as a search string for a particular system.
public interface Formattable {
  String format(String entity);
}
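
As an illustration of what a Formattable implementation might do, here is a minimal sketch. The Formattable interface is repeated (without the public modifier) so the example compiles on its own; NormalizingFormatter and its normalization rules are assumptions, since the right formatting depends on the target system.

```java
import java.util.Locale;

// Repeated from the post so this sketch is self-contained.
interface Formattable {
    String format(String entity);
}

// Hypothetical formatter: trims the entity text, collapses runs of
// whitespace to a single space, and lowercases the result.
class NormalizingFormatter implements Formattable {
    @Override
    public String format(String entity) {
        if (entity == null) {
            return "";
        }
        return entity.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}
```

A Linkable backed by a SQL database or a Solr index would likely need its own rules here, such as escaping quotes or query operators.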



Mark Giaconia