Posted to issues@opennlp.apache.org by "Katrin Tomanek (Created) (JIRA)" <ji...@apache.org> on 2012/02/09 11:10:59 UTC
[jira] [Created] (OPENNLP-428) make EOS character set configurable
make EOS character set configurable
-----------------------------------
Key: OPENNLP-428
URL: https://issues.apache.org/jira/browse/OPENNLP-428
Project: OpenNLP
Issue Type: Improvement
Components: Sentence Detector
Reporter: Katrin Tomanek
Priority: Minor
Fix For: tools-1.5.3-incubating
Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory).
Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties.
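
To illustrate the request, here is a minimal sketch of storing EOS characters alongside a model's properties and reading them back at prediction time. It uses plain java.util.Properties as a stand-in for the model manifest; the class name EosManifestSketch and the key "eosCharacters" are hypothetical and are not taken from the OpenNLP code base.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

public class EosManifestSketch {

  // Hypothetical property key; the real manifest key is not given in this thread.
  static final String EOS_KEY = "eosCharacters";

  // Writes the EOS characters into a properties blob (stand-in for a model manifest).
  static byte[] writeManifest(char[] eosChars) {
    Properties manifest = new Properties();
    manifest.setProperty(EOS_KEY, new String(eosChars));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      manifest.store(out, null);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  // Reads the EOS characters back; returns null when the key is absent.
  static char[] readEosChars(byte[] manifestBytes) {
    Properties manifest = new Properties();
    try {
      manifest.load(new ByteArrayInputStream(manifestBytes));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    String value = manifest.getProperty(EOS_KEY);
    return value == null ? null : value.toCharArray();
  }

  public static void main(String[] args) {
    // Training side stores the characters, prediction side recovers the same set.
    byte[] stored = writeManifest(new char[] {'.', '!', '?'});
    System.out.println(new String(readEosChars(stored)));
  }
}
```

Because the characters travel with the serialized properties, training and prediction cannot silently disagree on the EOS set, which is the point of the request.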
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (OPENNLP-428) make EOS character set configurable
Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204690#comment-13204690 ]
William Colen commented on OPENNLP-428:
---------------------------------------
Thank you very much for the patch! Looks good. All tests are passing after applying it.
Just a few comments:
You are passing the EOS characters in a string and using a simple String.toCharArray() to get the chars. That is simple and should work for most languages, but it won't work for Thai, for example, because we can't pass a space and a new line using the command line tool.
SentenceDetectorME.java
- We should keep the old train method, we can't break backward compatibility. Create a new train method with the new argument.
- We don't need to modify the deprecated train methods.
BaseModel.java
- Remove the unnecessary sysout.
SDContextGenerator.java
- Keep the old constructor, we should be backward compatible. Add a new constructor with the new argument.
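
The backward-compatibility pattern requested above can be sketched as follows: the old signature stays and delegates to a new overload that takes the extra argument. The class ContextGeneratorSketch, its default character set, and its method names are hypothetical illustrations, not the actual SDContextGenerator or SentenceDetectorME API.

```java
public class ContextGeneratorSketch {

  // Assumed default set, for illustration only.
  static final char[] DEFAULT_EOS = {'.', '!', '?'};

  private final char[] eosChars;

  // Old constructor kept unchanged so existing callers still compile;
  // it simply delegates to the new one with the defaults.
  public ContextGeneratorSketch() {
    this(DEFAULT_EOS);
  }

  // New constructor carrying the additional argument.
  public ContextGeneratorSketch(char[] eosChars) {
    this.eosChars = eosChars.clone();
  }

  // Returns true when the given character is one of the configured EOS characters.
  public boolean isEos(char c) {
    for (char eos : eosChars) {
      if (eos == c) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(new ContextGeneratorSketch().isEos('.'));
    System.out.println(new ContextGeneratorSketch(new char[] {';'}).isEos('.'));
  }
}
```

The same delegation shape would apply to a train method: the existing signature remains and forwards to the new overload with a default value.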
> make EOS character set configurable
> -----------------------------------
>
> Key: OPENNLP-428
> URL: https://issues.apache.org/jira/browse/OPENNLP-428
> Project: OpenNLP
> Issue Type: Improvement
> Components: Sentence Detector
> Reporter: Katrin Tomanek
> Priority: Minor
> Fix For: tools-1.5.3-incubating
>
> Attachments: patch_eos_characters
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory).
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties.
[jira] [Commented] (OPENNLP-428) make EOS character set configurable
Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205290#comment-13205290 ]
Joern Kottmann commented on OPENNLP-428:
----------------------------------------
Please try to follow our code conventions. We always use two spaces to indent, and never four spaces or a tab.
There should be spaces around operators, e.g. 1 + 1 and not 1+1.
More information can be found here:
http://incubator.apache.org/opennlp/code-conventions.html
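
As a small illustration of the two rules mentioned above (two-space indentation, spaces around operators), here is a sketch; the class and method names are made up for the example and the linked conventions page remains the authoritative source.

```java
public class ConventionSketch {

  // Two-space indentation at every nesting level, no tabs.
  static int sumUpTo(int limit) {
    int total = 0;
    for (int i = 1; i <= limit; i++) {
      // Spaces around binary operators: "total + i", not "total+i".
      total = total + i;
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(sumUpTo(4));
  }
}
```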
I usually review the patch file itself or the commit mails. This patch removes and adds CmdLineTool.
Does someone know why this happens? The CmdLineTool class has the eol-style property set to native,
so that should be fine.
We also need support to pass in the eos chars via the cmd line trainer.
[jira] [Commented] (OPENNLP-428) make EOS character set configurable
Posted by "Katrin Tomanek (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204540#comment-13204540 ]
Katrin Tomanek commented on OPENNLP-428:
----------------------------------------
A short explanation of my patch: the sentence detector logic now has a new parameter, eosChars, which can be specified for the command line tools. It lets you specify the characters to be used as EOS symbols. eosChars is optional; if it is not specified, the default EOS symbols (as defined in the Factory, language-dependent) are used.
Note: specifying eosChars overrides the language-based logic in the Factory class.
I also modified the cross-validator, the evaluator, the trainer, etc. to use the EOS symbols.
EOS symbols, if provided, are stored in the model's manifest.
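
The fallback behaviour described above can be sketched roughly like this: an optional eosChars string wins when present, otherwise a language-dependent default is used. The class EosResolverSketch, the language code "demo", and the character sets are placeholders chosen for illustration; they are not the defaults defined in the OpenNLP Factory class.

```java
import java.util.HashMap;
import java.util.Map;

public class EosResolverSketch {

  // Placeholder fallback for languages without an entry (illustrative values).
  static final char[] FALLBACK = {'.', '!', '?'};

  // Placeholder per-language defaults; real values live in the Factory class.
  static final Map<String, char[]> BY_LANGUAGE = new HashMap<>();

  static {
    BY_LANGUAGE.put("demo", new char[] {'.', ';'});
  }

  // Returns the explicit eosChars when given, otherwise the language default.
  static char[] resolve(String languageCode, String eosCharsOption) {
    if (eosCharsOption != null) {
      // Every char of the option string becomes one EOS character.
      return eosCharsOption.toCharArray();
    }
    char[] byLanguage = BY_LANGUAGE.get(languageCode);
    return byLanguage != null ? byLanguage.clone() : FALLBACK.clone();
  }

  public static void main(String[] args) {
    System.out.println(new String(resolve("demo", null)));
    System.out.println(new String(resolve("demo", "?!")));
  }
}
```

Splitting the option with String.toCharArray() is what makes whitespace EOS characters awkward to pass on a command line, as raised in the review comments.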
[jira] [Commented] (OPENNLP-428) make EOS character set configurable
Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208501#comment-13208501 ]
William Colen commented on OPENNLP-428:
---------------------------------------
I finished reviewing the format. I reformatted only the lines that were changed by revision #1242761.
Again, thank you Katrin for the patch. Thank you Jörn for pointing out the problem.
> make EOS character set configurable
> -----------------------------------
>
> Key: OPENNLP-428
> URL: https://issues.apache.org/jira/browse/OPENNLP-428
> Project: OpenNLP
> Issue Type: Improvement
> Components: Sentence Detector
> Reporter: Katrin Tomanek
> Assignee: William Colen
> Priority: Minor
> Fix For: tools-1.5.3-incubating
>
> Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory).
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties.
[jira] [Closed] (OPENNLP-428) make EOS character set configurable
Posted by "Katrin Tomanek (Closed) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Katrin Tomanek closed OPENNLP-428.
----------------------------------
Resolution: Fixed
The patch works and solves the issue for me!
[jira] [Updated] (OPENNLP-428) make EOS character set configurable
Posted by "Katrin Tomanek (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Katrin Tomanek updated OPENNLP-428:
-----------------------------------
Attachment: patch_eos_characters
Patch for JIRA Issue: OPENNLP-428
adds eos characters as parameter to sentence detector logic
[jira] [Commented] (OPENNLP-428) make EOS character set configurable
Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209857#comment-13209857 ]
William Colen commented on OPENNLP-428:
---------------------------------------
While working on another issue, I noticed that I missed one thing while reviewing the patch. The constructors of SentenceModel changed with the addition of a new argument. We should create a new constructor instead, to stay backward compatible.
I fixed it.
[jira] [Commented] (OPENNLP-428) make EOS character set configurable
Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207765#comment-13207765 ]
William Colen commented on OPENNLP-428:
---------------------------------------
Katrin, could you check if the code at the trunk works? If it works you can close the issue. Thank you.
[jira] [Reopened] (OPENNLP-428) make EOS character set configurable
Posted by "Joern Kottmann (Reopened) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joern Kottmann reopened OPENNLP-428:
------------------------------------
Quite a few tab indents slipped in. I suggest that we fix them now.
[jira] [Resolved] (OPENNLP-428) make EOS character set configurable
Posted by "William Colen (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
William Colen resolved OPENNLP-428.
-----------------------------------
Resolution: Fixed
> make EOS character set configurable
> -----------------------------------
>
> Key: OPENNLP-428
> URL: https://issues.apache.org/jira/browse/OPENNLP-428
> Project: OpenNLP
> Issue Type: Improvement
> Components: Sentence Detector
> Reporter: Katrin Tomanek
> Assignee: William Colen
> Priority: Minor
> Fix For: tools-1.5.3
>
> Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory).
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties.
[jira] [Commented] (OPENNLP-428) make EOS character set configurable
Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269469#comment-13269469 ]
Joern Kottmann commented on OPENNLP-428:
----------------------------------------
Katrin, does it work for you? Could you please close this issue to confirm that?
[jira] [Assigned] (OPENNLP-428) make EOS character set configurable
Posted by "William Colen (Assigned) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
William Colen reassigned OPENNLP-428:
-------------------------------------
Assignee: William Colen
[jira] [Updated] (OPENNLP-428) make EOS character set configurable
Posted by "Katrin Tomanek (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Katrin Tomanek updated OPENNLP-428:
-----------------------------------
Attachment: patch_eos_characters2
I modified the patch according to Jörn's and William's comments:
- formatting (indentation)
- SentenceDetectorME: old train method kept, deprecated methods not modified
- no patch for CmdLineTool (I don't know why my eclipse wants to create a patch entry here --> nothing changed by me!)
- BaseModel.java: syso removed
> make EOS character set configurable
> -----------------------------------
>
> Key: OPENNLP-428
> URL: https://issues.apache.org/jira/browse/OPENNLP-428
> Project: OpenNLP
> Issue Type: Improvement
> Components: Sentence Detector
> Reporter: Katrin Tomanek
> Priority: Minor
> Fix For: tools-1.5.3-incubating
>
> Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory).
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties.
[jira] [Closed] (OPENNLP-428) make EOS character set configurable
Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joern Kottmann closed OPENNLP-428.
----------------------------------
[jira] [Commented] (OPENNLP-428) make EOS character set configurable
Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208488#comment-13208488 ]
William Colen commented on OPENNLP-428:
---------------------------------------
I am reviewing the format. Sorry, I should have done it before committing the patch.
[jira] [Commented] (OPENNLP-428) make EOS character set configurable
Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205456#comment-13205456 ]
William Colen commented on OPENNLP-428:
---------------------------------------
Thank you, Katrin. I reviewed and committed your patch.
[jira] [Commented] (OPENNLP-428) make EOS character set configurable
Posted by "Katrin Tomanek (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208280#comment-13208280 ]
Katrin Tomanek commented on OPENNLP-428:
----------------------------------------
Could you do that? I checked for the tabs but didn't find any (moreover, I didn't want to change my Eclipse coding convention and indentation defaults, since ours differ from OpenNLP's).
[jira] [Commented] (OPENNLP-428) make EOS character set configurable
Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208321#comment-13208321 ]
Joern Kottmann commented on OPENNLP-428:
----------------------------------------
You should import our eclipse formatter settings and then just use it for the opennlp projects in your workspace. Otherwise there will always be format issues in your patches.