
Posted to users@opennlp.apache.org by "Richard Head Jr." <hs...@yahoo.com.INVALID> on 2014/06/09 01:08:26 UTC

DictionaryNameFinder Not Finding Longest Match When Name Ends in a Number

Here's my dictionary:

<?xml version="1.0" encoding="UTF-8"?>
<dictionary case_sensitive="false">
  <entry>
    <token>vitamin</token>
    <token>b12</token>
  </entry>
  <entry>
    <token>vitamin</token>
    <token>b</token>
  </entry>
  <entry>
    <token>john</token>
    <token>doe</token>
  </entry>
  <entry>
    <token>john</token>
    <token>d</token>
  </entry>
</dictionary>

When run on this sentence using a DictionaryNameFinder: My name is john doe, aka john d. I like vitamin b12.

The following names are found: john doe, john d, vitamin b

As you can see, when the second token ends in a number, the longest match (vitamin b12) is discarded and the shorter entry (vitamin b) is returned instead.
Bug, or am I missing something?

Thanks
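To make the expected "longest match wins" behavior concrete, here is a minimal, self-contained sketch of greedy longest-match dictionary lookup over a token array. It is NOT OpenNLP's DictionaryNameFinder implementation. The class and method names (LongestMatchSketch, put, findLongest) are hypothetical. The sketch assumes the tokenizer delivers "b12" as a single token and lower-cases tokens to mimic case_sensitive="false".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of longest-match dictionary lookup; not OpenNLP code.
public class LongestMatchSketch {

    // Each entry is stored as its lower-cased tokens joined by single spaces
    // (mimics case_sensitive="false").
    private final Set<String> entries = new HashSet<>();
    private int maxEntryLength = 0;

    void put(String... tokens) {
        entries.add(key(tokens, 0, tokens.length));
        maxEntryLength = Math.max(maxEntryLength, tokens.length);
    }

    // Builds the lookup key for tokens[from, to).
    private static String key(String[] tokens, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (i > from) sb.append(' ');
            sb.append(tokens[i].toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }

    // At each start position, tries every span length up to the longest entry,
    // keeps the longest one found in the dictionary, then resumes after it.
    List<String> findLongest(String[] tokens) {
        List<String> names = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int bestEnd = -1;
            int limit = Math.min(tokens.length, start + maxEntryLength);
            for (int end = start + 1; end <= limit; end++) {
                if (entries.contains(key(tokens, start, end))) {
                    bestEnd = end; // a longer match overrides a shorter one
                }
            }
            if (bestEnd > 0) {
                names.add(String.join(" ", Arrays.copyOfRange(tokens, start, bestEnd)));
                start = bestEnd;
            } else {
                start++;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        LongestMatchSketch dict = new LongestMatchSketch();
        dict.put("vitamin", "b12");
        dict.put("vitamin", "b");
        dict.put("john", "doe");
        dict.put("john", "d");

        // Assumed tokenization: punctuation split off, "b12" kept as one token.
        String[] tokens = {"My", "name", "is", "john", "doe", ",", "aka", "john", "d", ".",
                           "I", "like", "vitamin", "b12", "."};
        System.out.println(dict.findLongest(tokens)); // prints [john doe, john d, vitamin b12]
    }
}
```

Because the inner loop tests every span length up to the longest entry, a match does not require each prefix (e.g. "vitamin" on its own) to be a dictionary entry. If the tokenizer instead splits "b12" into separate tokens, no span equals "vitamin b12", and only the shorter entry can match.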

Re: DictionaryNameFinder Not Finding Longest Match When Name Ends in a Number

Posted by "Richard Head Jr." <hs...@yahoo.com.INVALID>.

Issue can be found here: https://issues.apache.org/jira/browse/OPENNLP-702



On Tuesday, June 10, 2014 3:38 AM, Jörn Kottmann <ko...@gmail.com> wrote:
Hello,

that looks like a bug. Please open a jira issue.

Thanks,
Jörn

Re: DictionaryNameFinder Not Finding Longest Match When Name Ends in a Number

Posted by Jörn Kottmann <ko...@gmail.com>.

Hello,

that looks like a bug. Please open a jira issue.

Thanks,
Jörn
