
Posted to issues@opennlp.apache.org by "rhead (JIRA)" <ji...@apache.org> on 2014/06/11 06:35:01 UTC

[jira] [Created] (OPENNLP-702) DictionaryNameFinder Not Finding Longest Match When Name Ends in a Number

rhead created OPENNLP-702:
-----------------------------

             Summary: DictionaryNameFinder Not Finding Longest Match When Name Ends in a Number
                 Key: OPENNLP-702
                 URL: https://issues.apache.org/jira/browse/OPENNLP-702
             Project: OpenNLP
          Issue Type: Bug
          Components: Name Finder, Tokenizer
         Environment: Darwin Kernel Version 12.5.0
            Reporter: rhead


Here's my dictionary:

<?xml version="1.0" encoding="UTF-8"?>
<dictionary case_sensitive="false">
  <entry>
    <token>vitamin</token>
    <token>b12</token>
  </entry>
  <entry>
    <token>vitamin</token>
    <token>b</token>
  </entry>
  <entry>
    <token>john</token>
    <token>doe</token>
  </entry>
  <entry>
    <token>john</token>
    <token>d</token>
  </entry>
</dictionary>

When run on this sentence using a DictionaryNameFinder:

My name is john doe, aka john d. I like vitamin b12.

The following names are found: john doe, john d, vitamin b

As you can see, when the second token ends in a number, the longest match ("vitamin b12") is discarded in favor of the shorter one ("vitamin b").
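
To make the expected behavior concrete, here is a minimal, self-contained sketch of longest-match lookup over a token array. It does not use OpenNLP and is not the DictionaryNameFinder implementation; the class name LongestMatchSketch, the find method, and the hard-coded entries are all hypothetical stand-ins for the dictionary above. The second call in main illustrates one possible explanation, which is an assumption and not something confirmed here: if the tokenizer splits "b12" into "b" and "12", the two-token entry "vitamin b12" can never line up with the input, and only "vitamin b" can match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LongestMatchSketch {

    // Hypothetical stand-in for the dictionary in this report (case-insensitive).
    static final Set<List<String>> ENTRIES = new HashSet<>(Arrays.asList(
        Arrays.asList("vitamin", "b12"),
        Arrays.asList("vitamin", "b"),
        Arrays.asList("john", "doe"),
        Arrays.asList("john", "d")));

    // Longest entry in ENTRIES, measured in tokens.
    static final int MAX_ENTRY_LENGTH = 2;

    // At each position, try the longest candidate first and skip past a match.
    static List<String> find(String[] tokens) {
        List<String> names = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            for (int len = Math.min(MAX_ENTRY_LENGTH, tokens.length - i); len >= 1; len--) {
                List<String> candidate = new ArrayList<>();
                for (int k = i; k < i + len; k++) {
                    candidate.add(tokens[k].toLowerCase(Locale.ROOT));
                }
                if (ENTRIES.contains(candidate)) {
                    names.add(String.join(" ", candidate));
                    matched = len;
                    break;
                }
            }
            // Advance past the match, or by one token if nothing matched here.
            i += Math.max(matched, 1);
        }
        return names;
    }

    public static void main(String[] args) {
        // "b12" kept as one token: the longer entry can match.
        System.out.println(find(new String[] {"I", "like", "vitamin", "b12", "."}));
        // "b12" split into "b" and "12" (assumed tokenizer behavior):
        // only the shorter entry can match.
        System.out.println(find(new String[] {"I", "like", "vitamin", "b", "12", "."}));
    }
}
```

With "b12" as a single token the sketch returns "vitamin b12"; with the token split it returns "vitamin b", which is the result described above.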

(Originally from: http://mail-archives.apache.org/mod_mbox/opennlp-users/201406.mbox/%3C1402268906.31205.YahooMailNeo%40web121102.mail.ne1.yahoo.com%3E)



--
This message was sent by Atlassian JIRA
(v6.2#6252)