Posted to issues@opennlp.apache.org by "Katrin Tomanek (Created) (JIRA)" <ji...@apache.org> on 2012/02/09 11:10:59 UTC

[jira] [Created] (OPENNLP-428) make EOS character set configurable

make EOS character set configurable
-----------------------------------

                 Key: OPENNLP-428
                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
             Project: OpenNLP
          Issue Type: Improvement
          Components: Sentence Detector
            Reporter: Katrin Tomanek
            Priority: Minor
             Fix For: tools-1.5.3-incubating


Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory).

Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-428) make EOS character set configurable

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204690#comment-13204690 ] 

William Colen commented on OPENNLP-428:
---------------------------------------

Thank you very much for the patch! Looks good. All tests are passing after applying it. 

Just a few comments:

You are passing the EOS characters in a string and using a simple String.toCharArray() to get the chars. That is simple and should work for most languages, but it won't work for Thai, for example, because we can't pass a space and a newline using the command line tool.
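To make the limitation concrete, here is a minimal sketch (not the code from the patch). It shows the plain String.toCharArray() approach next to one possible workaround that accepts escaped characters, so that a space or a newline could be written on a command line as a backslash-u code point or as \n. The class name EosChars and the method parseEscaped are hypothetical and exist only for this illustration.

```java
final class EosChars {

  private EosChars() {
  }

  // The simple approach: every char of the argument becomes an EOS char.
  static char[] parse(String eosChars) {
    return eosChars.toCharArray();
  }

  // Hypothetical alternative: also understands a backslash followed by 'n'
  // (newline) and a backslash followed by 'u' plus four hex digits.
  // Invalid hex digits make Integer.parseInt throw NumberFormatException.
  static char[] parseEscaped(String eosChars) {
    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i < eosChars.length()) {
      char c = eosChars.charAt(i);
      if (c == '\\' && i + 1 < eosChars.length()) {
        char next = eosChars.charAt(i + 1);
        if (next == 'n') {
          out.append('\n');
          i += 2;
          continue;
        }
        if (next == 'u' && i + 6 <= eosChars.length()) {
          String hex = eosChars.substring(i + 2, i + 6);
          out.append((char) Integer.parseInt(hex, 16));
          i += 6;
          continue;
        }
      }
      // Anything else is taken literally.
      out.append(c);
      i++;
    }
    return out.toString().toCharArray();
  }
}
```

With something like this, a shell argument could name whitespace characters without containing any literal whitespace. Whether the command line tool should support such escapes is a separate design decision.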

SentenceDetectorME.java
 - We should keep the old train method; we can't break backward compatibility. Create a new train method with the new argument.
 - We don't need to modify the deprecated train methods.

BaseModel.java
 - Remove the unnecessary sysout.

SDContextGenerator.java
 - Keep the old constructor; we should be backward compatible. Add a new constructor with the new argument.
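
The backward-compatibility pattern requested in these comments can be sketched roughly as follows. This is a simplified illustration with hypothetical names (Detector, DEFAULT_EOS, isEos), not the actual signatures of SentenceDetectorME or SDContextGenerator: the old constructor and the old train method stay in place and delegate to new overloads that take the additional argument.

```java
final class Detector {

  // Hypothetical defaults standing in for the language-dependent Factory values.
  static final char[] DEFAULT_EOS = {'.', '!', '?'};

  private final char[] eosChars;

  // Old constructor, kept so existing callers still compile.
  Detector() {
    this(DEFAULT_EOS);
  }

  // New constructor with the additional argument.
  Detector(char[] eosChars) {
    this.eosChars = eosChars.clone();
  }

  // Old train method, kept unchanged for callers; delegates with the defaults.
  static Detector train(String[] samples) {
    return train(samples, DEFAULT_EOS);
  }

  // New train method with the additional argument.
  static Detector train(String[] samples, char[] eosChars) {
    // Real training is omitted in this sketch; only the EOS set is carried over.
    return new Detector(eosChars);
  }

  // True if c is one of the configured EOS characters.
  boolean isEos(char c) {
    for (char eos : eosChars) {
      if (eos == c) {
        return true;
      }
    }
    return false;
  }
}
```

Existing code that calls the one-argument train method keeps its old behaviour, while new code can opt in to a custom EOS set.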

                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Commented] (OPENNLP-428) make EOS character set configurable

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205290#comment-13205290 ] 

Joern Kottmann commented on OPENNLP-428:
----------------------------------------

Please try to follow our code conventions. We always use two spaces to indent, and never four spaces or a tab.
There should be spaces around operators e.g. 1 + 1 and not 1+1.

More information can be found here:
http://incubator.apache.org/opennlp/code-conventions.html

I usually review the patch file itself or the commit mails. This patch removes and adds CmdLineTool.
Does someone know why this happens? The CmdLineTool class has the eol-style property set to native,
so that should be fine.

We also need support to pass in the eos chars via the cmd line trainer.

                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Commented] (OPENNLP-428) make EOS character set configurable

Posted by "Katrin Tomanek (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204540#comment-13204540 ] 

Katrin Tomanek commented on OPENNLP-428:
----------------------------------------

Short explanation of my patch: the sentence detector logic now has a new parameter, eosChars, which can be specified for the command line tools. It lets you specify the characters to be used as EOS symbols. eosChars is optional; if it is not specified, the default EOS symbols (as defined in the Factory, language-dependent) are used.

Note: specifying eosChars overrides the language-dependent logic in the Factory class.

I also modified the cross-validator, the evaluator, the trainer, etc. to use the EOS symbols.

EOS symbols, if provided, are stored in the model's manifest.
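
The behaviour described above can be sketched as follows, under stated assumptions. The class EosManifest, its method names, and the property key are hypothetical and are not taken from the patch; a java.util.Properties object stands in for the model's manifest. An explicit eosChars value wins over the language default and is stored, and at prediction time the stored value is read back so that training and prediction use the same symbols.

```java
final class EosManifest {

  // Hypothetical manifest key; the real property name may differ.
  static final String EOS_KEY = "eosCharacters";

  private EosManifest() {
  }

  // Training side: an explicit eosChars value overrides the language default
  // and is written to the manifest. Without one, nothing is stored.
  static char[] resolveForTraining(String eosChars, char[] languageDefault,
      java.util.Properties manifest) {
    if (eosChars == null || eosChars.isEmpty()) {
      return languageDefault.clone();
    }
    manifest.setProperty(EOS_KEY, eosChars);
    return eosChars.toCharArray();
  }

  // Prediction side: use the stored symbols if present, else the default.
  static char[] resolveForPrediction(java.util.Properties manifest,
      char[] languageDefault) {
    String stored = manifest.getProperty(EOS_KEY);
    return stored == null ? languageDefault.clone() : stored.toCharArray();
  }
}
```

Keeping the symbols in the manifest means a model trained with a custom set does not depend on the caller passing the same value again at prediction time.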
                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Commented] (OPENNLP-428) make EOS character set configurable

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208501#comment-13208501 ] 

William Colen commented on OPENNLP-428:
---------------------------------------

I finished reviewing the format. I reformatted only the lines that were changed by revision #1242761.

Again, thank you, Katrin, for the patch. Thank you, Jörn, for pointing out the problem.
                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


       

[jira] [Closed] (OPENNLP-428) make EOS character set configurable

Posted by "Katrin Tomanek (Closed) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Katrin Tomanek closed OPENNLP-428.
----------------------------------

    Resolution: Fixed

The patch works and solves the issue for me!
                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Updated] (OPENNLP-428) make EOS character set configurable

Posted by "Katrin Tomanek (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Katrin Tomanek updated OPENNLP-428:
-----------------------------------

    Attachment: patch_eos_characters

Patch for JIRA Issue: OPENNLP-428

adds eos characters as parameter to sentence detector logic
                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Commented] (OPENNLP-428) make EOS character set configurable

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209857#comment-13209857 ] 

William Colen commented on OPENNLP-428:
---------------------------------------

While working on another issue, I noticed that I missed something while reviewing the patch. The constructors of SentenceModel changed with the addition of a new argument. We should create a new constructor instead, to stay backward compatible.
I fixed it.
                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Commented] (OPENNLP-428) make EOS character set configurable

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207765#comment-13207765 ] 

William Colen commented on OPENNLP-428:
---------------------------------------

Katrin, could you check if the code at the trunk works? If it works you can close the issue. Thank you.
                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Reopened] (OPENNLP-428) make EOS character set configurable

Posted by "Joern Kottmann (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann reopened OPENNLP-428:
------------------------------------


Quite a few tab indents slipped in. I suggest that we fix them now.
                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Resolved] (OPENNLP-428) make EOS character set configurable

Posted by "William Colen (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Colen resolved OPENNLP-428.
-----------------------------------

    Resolution: Fixed
    
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Commented] (OPENNLP-428) make EOS character set configurable

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269469#comment-13269469 ] 

Joern Kottmann commented on OPENNLP-428:
----------------------------------------

Katrin, does it work for you? If so, can you please close this issue to confirm that?
                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Assigned] (OPENNLP-428) make EOS character set configurable

Posted by "William Colen (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Colen reassigned OPENNLP-428:
-------------------------------------

    Assignee: William Colen
    
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Updated] (OPENNLP-428) make EOS character set configurable

Posted by "Katrin Tomanek (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Katrin Tomanek updated OPENNLP-428:
-----------------------------------

    Attachment: patch_eos_characters2

I modified the patch in accordance with Jörn's and William's comments:
- formatting (indentation)
- SentenceDetectorME: old train method kept, deprecated methods not modified
- no patch for CmdLineTool (I don't know why my Eclipse wants to create a patch entry here; nothing was changed by me!)
- BaseModel.java: sysout removed



                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


       

[jira] [Closed] (OPENNLP-428) make EOS character set configurable

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann closed OPENNLP-428.
----------------------------------

    
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Commented] (OPENNLP-428) make EOS character set configurable

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208488#comment-13208488 ] 

William Colen commented on OPENNLP-428:
---------------------------------------

I am reviewing the format. Sorry, I should have done it before committing the patch.
                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Commented] (OPENNLP-428) make EOS character set configurable

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205456#comment-13205456 ] 

William Colen commented on OPENNLP-428:
---------------------------------------

Thank you, Katrin. I reviewed and committed your patch.
                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Commented] (OPENNLP-428) make EOS character set configurable

Posted by "Katrin Tomanek (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208280#comment-13208280 ] 

Katrin Tomanek commented on OPENNLP-428:
----------------------------------------

Could you do that? I checked for the tabs but didn't find any (moreover, I didn't want to change my Eclipse coding conventions and indentation defaults, since ours differ from OpenNLP's).
                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties


        

[jira] [Commented] (OPENNLP-428) make EOS character set configurable

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208321#comment-13208321 ] 

Joern Kottmann commented on OPENNLP-428:
----------------------------------------

You should import our Eclipse formatter settings and then use them only for the OpenNLP projects in your workspace. Otherwise there will always be formatting issues in your patches.
                
> make EOS character set configurable
> -----------------------------------
>
>                 Key: OPENNLP-428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-428
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>            Reporter: Katrin Tomanek
>            Assignee: William Colen
>            Priority: Minor
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: patch_eos_characters, patch_eos_characters2
>
>
> Currently, the EOS symbols to be used by the sentence detector cannot be configured (at the moment, a user would have to make changes in opennlp.tools.sentdetect.lang.Factory
> Since it is important to use the same EOS symbols during training and during testing/prediction, the EOS symbols should be stored with the model's properties
