Posted to users@opennlp.apache.org by "Richard Head Jr." <hs...@yahoo.com.INVALID> on 2014/06/09 01:08:26 UTC
DictionaryNameFinder Not Finding Longest Match When Name Ends in a Number
Here's my dictionary:
<?xml version="1.0" encoding="UTF-8"?>
<dictionary case_sensitive="false">
<entry>
<token>vitamin</token>
<token>b12</token>
</entry>
<entry>
<token>vitamin</token>
<token>b</token>
</entry>
<entry>
<token>john</token>
<token>doe</token>
</entry>
<entry>
<token>john</token>
<token>d</token>
</entry>
</dictionary>
When run on this sentence using a DictionaryNameFinder: My name is john doe, aka john d. I like vitamin b12.
The following names are found: john doe, john d, vitamin b
As you can see, when the second token ends in a number, the longest match (vitamin b12) is discarded in favor of the shorter entry (vitamin b).
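To make the expected behavior concrete, here is a minimal, self-contained sketch of a longest-match dictionary lookup over a token array. It is not OpenNLP's implementation; the class name LongestMatchSketch and its methods are hypothetical. The tokenizer is not stated above, so both token arrays are assumptions: the first keeps "b12" as one token, the second splits it into "b" and "12".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT the OpenNLP DictionaryNameFinder.
class LongestMatchSketch {

    // The four entries from the dictionary above, one token list per <entry>.
    static final List<List<String>> DICTIONARY = List.of(
            List.of("vitamin", "b12"),
            List.of("vitamin", "b"),
            List.of("john", "doe"),
            List.of("john", "d"));

    // True if every token of the entry equals the sentence token at the same
    // offset from start, ignoring case (mirrors case_sensitive="false").
    static boolean matchesAt(List<String> entry, String[] tokens, int start) {
        if (start + entry.size() > tokens.length) {
            return false;
        }
        for (int k = 0; k < entry.size(); k++) {
            if (!entry.get(k).equalsIgnoreCase(tokens[start + k])) {
                return false;
            }
        }
        return true;
    }

    // Scans left to right; at each position keeps the entry with the most
    // tokens that matches there, then continues after the matched tokens.
    static List<String> find(List<List<String>> entries, String[] tokens) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int best = 0; // token count of the longest entry matching at i
            for (List<String> entry : entries) {
                if (entry.size() > best && matchesAt(entry, tokens, i)) {
                    best = entry.size();
                }
            }
            if (best > 0) {
                found.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + best)));
                i += best;
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumption: the tokenizer keeps "b12" as a single token.
        String[] whole = {"My", "name", "is", "john", "doe", ",", "aka", "john",
                "d", ".", "I", "like", "vitamin", "b12", "."};
        // Assumption: the tokenizer splits "b12" into "b" and "12".
        String[] split = {"My", "name", "is", "john", "doe", ",", "aka", "john",
                "d", ".", "I", "like", "vitamin", "b", "12", "."};

        System.out.println(find(DICTIONARY, whole)); // [john doe, john d, vitamin b12]
        System.out.println(find(DICTIONARY, split)); // [john doe, john d, vitamin b]
    }
}
```

In this sketch, the second token array reproduces the output reported above, because the entry "vitamin b12" can never match once "b12" has been split. Whether that is what happens inside OpenNLP depends on the tokenizer in use, so it is worth printing the token array passed to find() when reproducing the problem.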
Bug, or am I missing something?
Thanks
Re: DictionaryNameFinder Not Finding Longest Match When Name Ends in a Number
Posted by "Richard Head Jr." <hs...@yahoo.com.INVALID>.
Issue can be found here: https://issues.apache.org/jira/browse/OPENNLP-702
On Tuesday, June 10, 2014 3:38 AM, Jörn Kottmann <ko...@gmail.com> wrote:
Hello,
that looks like a bug. Please open a jira issue.
Thanks,
Jörn
Re: DictionaryNameFinder Not Finding Longest Match When Name Ends in a Number
Posted by Jörn Kottmann <ko...@gmail.com>.
Hello,
that looks like a bug. Please open a jira issue.
Thanks,
Jörn