Posted to issues@opennlp.apache.org by "William Colen (JIRA)" <ji...@apache.org> on 2011/07/13 21:06:59 UTC

[jira] [Created] (OPENNLP-225) Restore the abbreviation dictionary support in SentenceDetector

Restore the abbreviation dictionary support in SentenceDetector
---------------------------------------------------------------

                 Key: OPENNLP-225
                 URL: https://issues.apache.org/jira/browse/OPENNLP-225
             Project: OpenNLP
          Issue Type: Improvement
          Components: Command Line Interface, Sentence Detector
    Affects Versions: tools-1.5.2-incubating
            Reporter: William Colen
            Assignee: William Colen
            Priority: Minor
             Fix For: tools-1.5.2-incubating


Today the abbreviation dictionary features of SentenceDetector are only usable through the API. We should add a mechanism to allow training with an abbreviation dictionary from the command line, and also add the dictionary to the model, as we do with the POS Tagger.
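Why the dictionary matters: a period that ends a known abbreviation such as "Mr." usually does not end the sentence. The following is a minimal, hypothetical Java sketch of that lookup against a Set<String>. The class and method names are invented for illustration, and this is not how OpenNLP's sentence detector actually computes its features.

```java
import java.util.Set;

/**
 * Hypothetical sketch: checks whether a period closes a token that is
 * listed in an abbreviation set. Not OpenNLP code.
 */
final class AbbreviationBoundarySketch {

  private AbbreviationBoundarySketch() {}

  /**
   * Returns true if the character at dotIndex is a period that ends a
   * whitespace-delimited token found in the abbreviation set, i.e. the
   * period is probably NOT a sentence boundary.
   */
  static boolean endsKnownAbbreviation(String text, int dotIndex, Set<String> abbreviations) {
    if (dotIndex < 0 || dotIndex >= text.length() || text.charAt(dotIndex) != '.') {
      return false;
    }
    // Walk back to the start of the token containing the period.
    int start = dotIndex;
    while (start > 0 && !Character.isWhitespace(text.charAt(start - 1))) {
      start--;
    }
    // Keep the period in the token, since entries like "Mr." include it.
    String token = text.substring(start, dotIndex + 1);
    return abbreviations.contains(token);
  }
}
```

For example, with the set {"Mr.", "Dr."}, the first period in "Mr. Smith arrived. He sat." is reported as ending an abbreviation, while the period after "arrived" is not.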

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-225) Restore the abbreviation dictionary support in SentenceDetector

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064796#comment-13064796 ] 

Jörn Kottmann commented on OPENNLP-225:
---------------------------------------

No, that is not necessary. We can reuse the dictionary parse/serialize mechanism without using the Dictionary class.
You can find it in the dictionary.serializer package. Examples of how to use it can be found in the Dictionary constructor and the Dictionary.serialize method.
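A rough illustration of the separation described above: the parser knows only the file format and hands each entry to a callback, so the caller decides which container (a Dictionary, a plain Set<String>, ...) receives it. This is a self-contained Java sketch, not the dictionary.serializer API. The XML element names (dictionary, entry, token) and the class name are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical format code that is independent of any container class. */
final class CallbackDictionaryIO {

  private CallbackDictionaryIO() {}

  /** Parses every <token> element and passes its text to the callback. */
  static void parse(InputStream in, Consumer<String> sink)
      throws IOException, SAXException, ParserConfigurationException {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    NodeList tokens = doc.getElementsByTagName("token");
    for (int i = 0; i < tokens.getLength(); i++) {
      sink.accept(tokens.item(i).getTextContent().trim());
    }
  }

  /** Writes the tokens as <entry><token>...</token></entry> elements. */
  static void serialize(Iterable<String> tokens, OutputStream out) throws IOException {
    Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
    w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<dictionary>\n");
    for (String t : tokens) {
      w.write("  <entry><token>" + escape(t) + "</token></entry>\n");
    }
    w.write("</dictionary>\n");
    w.flush(); // flush but do not close: the caller owns the stream
  }

  /** Escapes the characters that are not allowed raw in XML text. */
  private static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }
}
```

Passing `set::add` as the callback fills a Set<String> directly, without going through the Dictionary class.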


[jira] [Issue Comment Edited] (OPENNLP-225) Restore the abbreviation dictionary support in SentenceDetector

Posted by "William Colen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064783#comment-13064783 ] 

William Colen edited comment on OPENNLP-225 at 7/13/11 7:16 PM:
----------------------------------------------------------------

Question: should we create an AbbreviationDictionary class that wraps the Dictionary to reuse the parse/serialize mechanism? As I understand it, an abbreviation dictionary is much simpler than our Dictionary implementation, so using the same mechanism might be overkill.

Today the DefaultSDContextGenerator expects a Set<String> as abbreviationDictionary.
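One way to picture the wrapper being asked about: a small class that owns a Set<String> plus a case-sensitivity flag and can hand out a read-only Set<String> view. The Java sketch below is hypothetical; the name AbbreviationDictionarySketch is invented so it is not confused with any committed class.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical wrapper: a Set<String> of abbreviations plus a case flag. */
final class AbbreviationDictionarySketch {

  private final boolean caseSensitive;
  private final Set<String> entries = new HashSet<>();

  AbbreviationDictionarySketch(boolean caseSensitive) {
    this.caseSensitive = caseSensitive;
  }

  /** Lower-cases with a fixed locale when the dictionary is case-insensitive. */
  private String normalize(String s) {
    return caseSensitive ? s : s.toLowerCase(Locale.ROOT);
  }

  void add(String abbreviation) {
    entries.add(normalize(abbreviation));
  }

  boolean contains(String token) {
    return entries.contains(normalize(token));
  }

  boolean isCaseSensitive() {
    return caseSensitive;
  }

  /**
   * Read-only view for code that expects a plain Set<String>. For a
   * case-insensitive dictionary the view holds lower-cased entries, so
   * callers must lower-case their lookups the same way.
   */
  Set<String> asSet() {
    return Collections.unmodifiableSet(entries);
  }
}
```

The read-only view is the tricky part: code that only receives a bare Set<String> never sees the flag, so it has to normalize its lookups itself or the case-insensitivity is silently lost.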


[jira] [Closed] (OPENNLP-225) Restore the abbreviation dictionary support in SentenceDetector

Posted by "William Colen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Colen closed OPENNLP-225.
---------------------------------

    Resolution: Fixed


[jira] [Commented] (OPENNLP-225) Restore the abbreviation dictionary support in SentenceDetector

Posted by "William Colen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064783#comment-13064783 ] 

William Colen commented on OPENNLP-225:
---------------------------------------

Question: should we create an AbbreviationDictionary class that wraps the Dictionary to reuse the parse/serialize mechanism? As I understand it, an abbreviation dictionary is much simpler than our Dictionary implementation, so using the same mechanism might be overkill.

Today the DefaultSDContextGenerator expects a Set<String> as abbreviationDictionary.


[jira] [Commented] (OPENNLP-225) Restore the abbreviation dictionary support in SentenceDetector

Posted by "William Colen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068783#comment-13068783 ] 

William Colen commented on OPENNLP-225:
---------------------------------------

I committed the initial code for it, but there is one issue I could not figure out how to solve:
One can create an AbbreviationDictionary from a serialized file by passing the stream and a case-sensitivity flag. But how will this work when the dictionary is loaded from the model at runtime? The ArtifactSerializer.create method doesn't know which flag to use to restore a dictionary that was serialized to the model.
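One possible way around this, sketched under assumptions: write the flag into the serialized artifact itself, so a create method that receives only an InputStream can recover it. The "case_sensitive=" header line and the class below are hypothetical. They do not implement OpenNLP's ArtifactSerializer interface and only mirror the stream-in/stream-out shape discussed here; this is not necessarily the fix that was eventually committed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical abbreviation list that stores its own case flag. */
final class SelfDescribingAbbreviations {

  private static final String HEADER_PREFIX = "case_sensitive=";

  final boolean caseSensitive;
  final Set<String> entries;

  SelfDescribingAbbreviations(boolean caseSensitive, Set<String> entries) {
    this.caseSensitive = caseSensitive;
    this.entries = Collections.unmodifiableSet(new LinkedHashSet<>(entries));
  }

  /** Writes the flag as the first line, then one entry per line. */
  void serialize(OutputStream out) throws IOException {
    Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
    w.write(HEADER_PREFIX + caseSensitive + "\n");
    for (String e : entries) {
      w.write(e + "\n"); // assumes entries never contain line breaks
    }
    w.flush(); // leave closing to the caller, e.g. the model packager
  }

  /** Restores both the entries and the flag from the stream alone. */
  static SelfDescribingAbbreviations create(InputStream in) throws IOException {
    BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    String header = r.readLine();
    if (header == null || !header.startsWith(HEADER_PREFIX)) {
      throw new IOException("Missing case-sensitivity header");
    }
    boolean caseSensitive = Boolean.parseBoolean(header.substring(HEADER_PREFIX.length()));
    Set<String> entries = new LinkedHashSet<>();
    String line;
    while ((line = r.readLine()) != null) {
      if (!line.isEmpty()) {
        entries.add(line);
      }
    }
    return new SelfDescribingAbbreviations(caseSensitive, entries);
  }
}
```

Because the flag travels with the data, the same bytes restore identically whether they come from a standalone file or from an entry inside the model package.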
