
Posted to dev@opennlp.apache.org by Jörn Kottmann <ko...@gmail.com> on 2013/10/22 21:04:06 UTC

Instantiation of an EntityLinker

Hi all,

The EntityLinker is created by the EntityLinkerFactory, which requires
an EntityLinkerProperties object that defines which EntityLinker
instance should be created.

The EntityLinkerProperties object can currently only be created from a
file. I suggest that we extend it so it can also be created from an
InputStream, similar to how our other models can be created from an
InputStream, a File, or a URL.
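A minimal sketch of what the proposed File/InputStream/URL overloads could look like, assuming the settings are stored in the standard java.util.Properties key=value format. `LinkerPropertiesSketch` and its `getProperty` method are hypothetical stand-ins, not the actual OpenNLP EntityLinkerProperties API.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Properties;

// HYPOTHETICAL stand-in for EntityLinkerProperties; not the OpenNLP class.
// All three overloads read the same java.util.Properties (key=value) format.
class LinkerPropertiesSketch {
  private final Properties props = new Properties();

  // Reads from a stream the caller opened; the caller is responsible for closing it.
  LinkerPropertiesSketch(InputStream in) throws IOException {
    props.load(in);
  }

  // Opens the file, reads it, and closes it again.
  LinkerPropertiesSketch(File file) throws IOException {
    try (InputStream in = new FileInputStream(file)) {
      props.load(in);
    }
  }

  // Opens the URL (file:, jar:, http:, ...), reads it, and closes it again.
  LinkerPropertiesSketch(URL url) throws IOException {
    try (InputStream in = url.openStream()) {
      props.load(in);
    }
  }

  // Looks up a setting, falling back to defaultValue when the key is absent.
  String getProperty(String key, String defaultValue) {
    return props.getProperty(key, defaultValue);
  }
}
```

Leaving the InputStream constructor open-stream-agnostic lets callers load the file from a classpath resource or a jar, while the File and URL overloads manage the stream themselves.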

Additionally, I propose that we have only one method, which creates one
EntityLinker at a time and takes the properties file as its only
parameter; all the settings can be stored inside that file. That would
make it easier for me to integrate the EntityLinker, because the
properties file is all that is needed to create it, with no further
configuration parameters.

For example:
EntityLinker createEntityLinker(EntityLinkerProperties) throws IOException
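One way such a single-argument factory could work, sketched under assumptions: the properties carry the implementation class name under a hypothetical key `linker`, the class is instantiated reflectively through a no-argument constructor, and the new instance then reads its own settings from the same properties. `LinkerSketch`, its `init` method, `EchoLinker`, and `LinkerFactorySketch` are all hypothetical, and plain java.util.Properties stands in for EntityLinkerProperties.

```java
import java.io.IOException;
import java.util.Properties;

// HYPOTHETICAL minimal linker contract; the real OpenNLP EntityLinker interface differs.
interface LinkerSketch {
  // Receives the full properties so the implementation can read its own settings.
  void init(Properties props) throws IOException;
}

// HYPOTHETICAL example implementation, used only for illustration.
class EchoLinker implements LinkerSketch {
  String endpoint;

  public void init(Properties props) {
    endpoint = props.getProperty("echo.endpoint", "none");
  }
}

final class LinkerFactorySketch {
  private LinkerFactorySketch() {}

  // One call, one linker: everything needed comes from the properties.
  static LinkerSketch createEntityLinker(Properties props) throws IOException {
    String className = props.getProperty("linker"); // hypothetical key
    if (className == null) {
      throw new IOException("No 'linker' entry in the properties");
    }
    try {
      Class<? extends LinkerSketch> cls =
          Class.forName(className.trim()).asSubclass(LinkerSketch.class);
      LinkerSketch linker = cls.getDeclaredConstructor().newInstance();
      linker.init(props);
      return linker;
    } catch (ReflectiveOperationException | ClassCastException e) {
      // Surface configuration problems as IOException, matching the proposed signature.
      throw new IOException("Cannot instantiate linker " + className, e);
    }
  }
}
```

Because the implementation receives the whole properties object, any extra configuration stays in the file rather than in additional factory parameters.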

Is there a good reason to create multiple EntityLinkers with the same call?

Any opinions?

Jörn

Re: Instantiation of an EntityLinker

Posted by Mark G <gi...@gmail.com>.

Thanks for the feedback, Joern. Here are my thoughts:
I totally agree about the InputStream and File overloads for
EntityLinkerProperties; I'll update that immediately.

As for the multiple linkers: initially I thought people might want to
link the same named entity to multiple external datasets using
different linker implementations. But, as I did in the GeoEntityLinker,
the linker implementation itself can orchestrate the different
connectors, so multiple linkers are unnecessary for one type. So I agree
that the factory should be simplified to return one linker. As for the
other parameter (the entity type), it currently determines which
property entry is used to instantiate the appropriate linker. For
instance, the properties file might look like this:

linker.location=org.apache.opennlp.tools.entitylinker.GeoEntityLinker
linker.person=my.project.class.MyPersonLinker
linker.organization=my.project.class.MyOrgLinker

The factory will return the appropriate linker for the entity type
passed in. Without that parameter we would need a separate properties
file for each type, and the EntityLinkerProperties object is currently
used for other properties as well. Essentially, if we took away the type
parameter, each EntityLinker implementation would need its own
properties file, and you would know which file to load based on the
type of entity you have. Part of the reason for a single file was to
make the BaseEntityLinker simple to extend. I am also used to working on
large clusters, where properties files can get out of control, so I
aimed to require only one from the beginning. One file per linker is
totally doable, though.
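A minimal sketch of the type-keyed lookup described above, assuming one shared java.util.Properties file with a `linker.<entityType>` entry per type, as in the example file. `TypedLinkerConfig`, `linkerClassFor`, and `configuredTypes` are hypothetical names, not part of OpenNLP; the sketch only resolves class names and leaves instantiation to the factory.

```java
import java.io.IOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// HYPOTHETICAL sketch: one shared properties file maps each entity type
// to a linker implementation via "linker.<entityType>" entries.
final class TypedLinkerConfig {
  private static final String PREFIX = "linker.";
  private final Properties props;

  TypedLinkerConfig(Properties props) {
    this.props = props;
  }

  // Returns the implementation class name configured for the given type,
  // e.g. "location" -> value of "linker.location".
  String linkerClassFor(String entityType) throws IOException {
    String className = props.getProperty(PREFIX + entityType);
    if (className == null) {
      throw new IOException("No linker configured for type: " + entityType);
    }
    // Properties.load keeps trailing whitespace in values, so trim it here.
    return className.trim();
  }

  // Lists every entity type that has a linker entry, in sorted order.
  Set<String> configuredTypes() {
    Set<String> types = new TreeSet<>();
    for (String key : props.stringPropertyNames()) {
      if (key.startsWith(PREFIX)) {
        types.add(key.substring(PREFIX.length()));
      }
    }
    return types;
  }
}
```

The per-linker alternative would drop the type argument entirely: each implementation gets its own file, and the caller picks the file instead of the key.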

MG
