Posted to issues@opennlp.apache.org by "William Colen (JIRA)" <ji...@apache.org> on 2011/07/13 21:17:00 UTC

[jira] [Commented] (OPENNLP-225) Restore the abbreviation dictionary support in SentenceDetector

    [ https://issues.apache.org/jira/browse/OPENNLP-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064783#comment-13064783 ] 

William Colen commented on OPENNLP-225:
---------------------------------------

Question: should we create an AbbreviationDictionary class that wraps the Dictionary to reuse the parse/serialize mechanism? As I understand it, an abbreviation dictionary is much simpler than our Dictionary implementation, so reusing the same mechanism might be overkill.

Today the DefaultSDContextGenerator expects a Set<String> as abbreviationDictionary.
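To make the trade-off concrete, here is a minimal sketch of the "simpler" alternative: a hypothetical AbbreviationDictionary that keeps a plain Set<String> and reads/writes one abbreviation per line, instead of reusing the Dictionary parse/serialize mechanism. The class name, the method names (parse, serialize, asSet) and the file format (one entry per line, blank lines and '#' comments ignored) are assumptions for illustration, not existing OpenNLP API. The asSet() view is the kind of Set<String> that code such as DefaultSDContextGenerator expects; how it would be passed in is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: an abbreviation dictionary backed by a plain sorted set,
 * stored as one abbreviation per line. It does NOT wrap or reuse the OpenNLP
 * Dictionary class or its file format.
 */
final class AbbreviationDictionary {

    private final Set<String> entries;

    AbbreviationDictionary(Set<String> entries) {
        // Copy into a TreeSet so serialization order is deterministic.
        this.entries = new TreeSet<>(entries);
    }

    /** Reads one abbreviation per line, skipping blank lines and '#' comment lines. */
    static AbbreviationDictionary parse(Reader in) {
        Set<String> entries = new TreeSet<>();
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                String token = line.trim();
                if (!token.isEmpty() && !token.startsWith("#")) {
                    entries.add(token);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new AbbreviationDictionary(entries);
    }

    /** Writes the entries back out in sorted order, one per line. */
    void serialize(Writer out) {
        try {
            for (String entry : entries) {
                out.write(entry);
                out.write('\n');
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Read-only view for code that only needs a Set<String> of abbreviations. */
    Set<String> asSet() {
        return Collections.unmodifiableSet(entries);
    }
}
```

The point of the sketch is that abbreviations only need membership tests, so a set plus a line-oriented format is enough; wrapping Dictionary could still be worth it if the model-packaging code already knows how to store Dictionary artifacts, as it does for the POS Tagger.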

> Restore the abbreviation dictionary support in SentenceDetector
> ---------------------------------------------------------------
>
>                 Key: OPENNLP-225
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-225
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Sentence Detector
>    Affects Versions: tools-1.5.2-incubating
>            Reporter: William Colen
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.2-incubating
>
>
> Today the abbreviation dictionary features of SentenceDetector are only usable through the API. We should add a mechanism to allow training with an abbreviation dictionary from the command line, and also add the dictionary to the model as we do with the POS Tagger.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira