Posted to users@opennlp.apache.org by Jeffrey Zemerick <jz...@apache.org> on 2016/01/24 16:25:18 UTC

Filtering spans based on probability

Hi,

After calls to NameFinderME.find() we remove spans from the list whose
probability falls under some set threshold. I am wondering if it is
currently possible for this filtering to take place earlier in the process,
perhaps by specifying a minimum probability value as an argument
somewhere. If it is not possible, and if the community thinks it would be
beneficial, I'll be happy to attempt to provide a patch.
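
For readers following along, here is a minimal sketch of the kind of post-filtering described above. It is an illustration under stated assumptions, not code from the thread: ScoredSpan is a hypothetical stand-in for opennlp.tools.util.Span so that the example runs without OpenNLP on the classpath, and filterByProbability and the 0.5 threshold are made-up names and values. With OpenNLP itself, the probabilities would presumably come from the spans returned by NameFinderME.find(), for example via NameFinderME.probs(Span[]).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProbabilityFilter {

    // Hypothetical stand-in for opennlp.tools.util.Span; it carries only
    // the fields this sketch needs.
    public static final class ScoredSpan {
        public final int start;   // index of the first token (inclusive)
        public final int end;     // index after the last token (exclusive)
        public final String type; // entity type, e.g. "person"
        public final double prob; // confidence assigned by the name finder

        public ScoredSpan(int start, int end, String type, double prob) {
            this.start = start;
            this.end = end;
            this.type = type;
            this.prob = prob;
        }
    }

    // Returns a new list holding only the spans whose probability is at
    // least minProb; the input list is left unchanged.
    public static List<ScoredSpan> filterByProbability(List<ScoredSpan> spans, double minProb) {
        List<ScoredSpan> kept = new ArrayList<>();
        for (ScoredSpan span : spans) {
            if (span.prob >= minProb) {
                kept.add(span);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredSpan> spans = Arrays.asList(
                new ScoredSpan(0, 2, "person", 0.95),
                new ScoredSpan(5, 6, "person", 0.40));
        // The threshold 0.5 is an arbitrary value chosen for illustration.
        for (ScoredSpan span : filterByProbability(spans, 0.5)) {
            System.out.println(span.type + " [" + span.start + ", " + span.end + ") prob=" + span.prob);
        }
    }
}
```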

Thanks,
Jeff

Re: Filtering spans based on probability

Posted by Joern Kottmann <ko...@gmail.com>.

Hello,

No, this is currently not possible as a built-in feature. On the other hand,
this is something that is really simple to do.

The question is: how can we provide extra value to a user by offering this as
a built-in feature?

The idea in OpenNLP is that a user just has a model and can run it without
doing any tricks after it is loaded.
That way an OpenNLP model can be run in all software solutions which
integrate OpenNLP.

To achieve that, the filtering threshold should be stored in the model and
set at training time.

Filtering by confidence could be one filter some users need. We use filters
for minimum and maximum name length (e.g. we don't want names which consist
only of punctuation or single chars, or names up to 2000 tokens long).
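
A length filter of the kind mentioned above might look roughly like the following sketch. It is not the filter referred to in this message: TokenSpan, isPlausibleName, and the rule of requiring at least two letters or digits are assumptions made for illustration, and the class is self-contained rather than built on OpenNLP types.

```java
import java.util.ArrayList;
import java.util.List;

public class NameLengthFilter {

    // Hypothetical stand-in for a token span: [start, end) indexes into the token array.
    public static final class TokenSpan {
        public final int start;
        public final int end;

        public TokenSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Accepts a span only if it covers between 1 and maxTokens tokens and
    // contains at least two letters or digits in total. The second rule is
    // one possible way to reject punctuation-only and single-character names.
    public static boolean isPlausibleName(String[] tokens, TokenSpan span, int maxTokens) {
        int length = span.end - span.start;
        if (length < 1 || length > maxTokens) {
            return false;
        }
        int alphanumeric = 0;
        for (int i = span.start; i < span.end; i++) {
            for (char c : tokens[i].toCharArray()) {
                if (Character.isLetterOrDigit(c)) {
                    alphanumeric++;
                }
            }
        }
        return alphanumeric >= 2;
    }

    // Returns a new list with only the spans that pass isPlausibleName.
    public static List<TokenSpan> filter(String[] tokens, List<TokenSpan> spans, int maxTokens) {
        List<TokenSpan> kept = new ArrayList<>();
        for (TokenSpan span : spans) {
            if (isPlausibleName(tokens, span, maxTokens)) {
                kept.add(span);
            }
        }
        return kept;
    }
}
```

Because the check needs only the tokens and the span boundaries, a threshold such as maxTokens could in principle be fixed at training time and stored alongside the model, in line with the idea described earlier in this message.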

It would be nice to see how we could fit in custom filters. Currently we
have sequence validation. Maybe that can be used for this purpose.

Jörn
