Posted to issues@opennlp.apache.org by "William Colen (JIRA)" <ji...@apache.org> on 2011/07/13 21:06:59 UTC
[jira] [Created] (OPENNLP-225) Restore the abbreviation dictionary support in SentenceDetector
Restore the abbreviation dictionary support in SentenceDetector
---------------------------------------------------------------
Key: OPENNLP-225
URL: https://issues.apache.org/jira/browse/OPENNLP-225
Project: OpenNLP
Issue Type: Improvement
Components: Command Line Interface, Sentence Detector
Affects Versions: tools-1.5.2-incubating
Reporter: William Colen
Assignee: William Colen
Priority: Minor
Fix For: tools-1.5.2-incubating
Today the abbreviation dictionary features of SentenceDetector are only usable through the API. We should add a mechanism to allow training with an abbreviation dictionary from the command line, and also add the dictionary to the model, as we do with the POS Tagger.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (OPENNLP-225) Restore the abbreviation dictionary support in SentenceDetector
Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064796#comment-13064796 ]
Jörn Kottmann commented on OPENNLP-225:
---------------------------------------
No, that is not necessary. We can reuse the dictionary parse/serialize mechanism without using the Dictionary class.
You can find it in the dictionary.serializer package. A sample of how to use it can be found in the Dictionary constructor and the Dictionary.serialize method.
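The idea of reusing a parse mechanism without the Dictionary class could look roughly like the sketch below: the parser hands each entry to a callback, and the caller collects entries into a plain Set<String>. This is only an illustration under stated assumptions. The names EntryParserSketch, EntrySink, parse and loadAbbreviations are hypothetical and are not the classes in the dictionary.serializer package, and the one-entry-per-line format is assumed for brevity rather than taken from OpenNLP.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a parser that hands each entry to a callback, so the
// caller decides where entries are stored. None of these names are OpenNLP API.
final class EntryParserSketch {

    /** Callback invoked once per parsed entry (hypothetical interface). */
    interface EntrySink {
        void insert(String entry);
    }

    /** Reads one entry per line (assumed format) and forwards non-blank lines. */
    static void parse(Reader in, EntrySink sink) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                sink.insert(trimmed);
            }
        }
    }

    /** Collects abbreviations straight into a Set<String>; no Dictionary object is built. */
    static Set<String> loadAbbreviations(String text) {
        final Set<String> abbreviations = new HashSet<String>();
        try {
            parse(new StringReader(text), new EntrySink() {
                public void insert(String entry) {
                    abbreviations.add(entry);
                }
            });
        } catch (IOException e) {
            // Reading from an in-memory string is not expected to fail.
            throw new IllegalStateException("Unexpected I/O error", e);
        }
        return abbreviations;
    }
}
```

Because the storage is chosen by the callback, the same parsing code can serve both a full dictionary and a simple set of abbreviations.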
[jira] [Issue Comment Edited] (OPENNLP-225) Restore the abbreviation dictionary support in SentenceDetector
Posted by "William Colen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064783#comment-13064783 ]
William Colen edited comment on OPENNLP-225 at 7/13/11 7:16 PM:
----------------------------------------------------------------
Question: should we create an AbbreviationDictionary class that wraps the Dictionary to reuse the parse/serialize mechanism? As I understand it, an abbreviation dictionary is much simpler than our Dictionary implementation, so using the same mechanism may be overkill.
Today the DefaultSDContextGenerator expects a Set<String> as its abbreviationDictionary.
[jira] [Closed] (OPENNLP-225) Restore the abbreviation dictionary support in SentenceDetector
Posted by "William Colen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
William Colen closed OPENNLP-225.
---------------------------------
Resolution: Fixed
[jira] [Commented] (OPENNLP-225) Restore the abbreviation dictionary support in SentenceDetector
Posted by "William Colen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068783#comment-13068783 ]
William Colen commented on OPENNLP-225:
---------------------------------------
I committed the initial code for it, but there is one issue I could not figure out how to solve:
One can create an AbbreviationDictionary from a serialized file by passing the stream and a case-sensitivity flag. But how will that work when the dictionary is loaded from the model at runtime? The ArtifactSerializer.create method doesn't know which flag to use to restore a dictionary that was serialized to the model.
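One possible way around this, offered only as a sketch and not as what was eventually committed, is to write the case-sensitivity flag into the serialized data itself, so that a create method that receives nothing but a stream can still recover it. The class AbbreviationSetSketch and the case_sensitive header line below are hypothetical, and the line-based text format is an assumption made for brevity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an abbreviation dictionary; not the OpenNLP class.
final class AbbreviationSetSketch {
    private static final String HEADER = "case_sensitive="; // assumed header key

    private final boolean caseSensitive;
    private final Set<String> entries = new LinkedHashSet<String>();

    AbbreviationSetSketch(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // Lower-cases entries and lookups when the dictionary is case-insensitive.
    private String normalize(String s) {
        return caseSensitive ? s : s.toLowerCase(Locale.ROOT);
    }

    void add(String abbreviation) {
        entries.add(normalize(abbreviation));
    }

    boolean contains(String token) {
        return entries.contains(normalize(token));
    }

    boolean isCaseSensitive() {
        return caseSensitive;
    }

    /** Writes the flag as the first line, followed by one entry per line. */
    void serialize(OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(HEADER + caseSensitive + "\n");
        for (String entry : entries) {
            writer.write(entry + "\n");
        }
        writer.flush();
    }

    /** Restores the dictionary from the stream alone; the flag comes from the header. */
    static AbbreviationSetSketch create(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String header = reader.readLine();
        if (header == null || !header.startsWith(HEADER)) {
            throw new IOException("Missing " + HEADER + " header");
        }
        boolean flag = Boolean.parseBoolean(header.substring(HEADER.length()));
        AbbreviationSetSketch dictionary = new AbbreviationSetSketch(flag);
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.isEmpty()) {
                dictionary.add(line);
            }
        }
        return dictionary;
    }
}
```

With the flag stored next to the entries, the loader needs no extra argument, at the cost of making the flag part of the file format.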