Posted to issues@opennlp.apache.org by "James Kosin (Created) (JIRA)" <ji...@apache.org> on 2011/10/18 02:28:10 UTC

[jira] [Created] (OPENNLP-329) Input / Output file encoding needs validation

Input / Output file encoding needs validation
---------------------------------------------

                 Key: OPENNLP-329
                 URL: https://issues.apache.org/jira/browse/OPENNLP-329
             Project: OpenNLP
          Issue Type: Bug
          Components: Command Line Interface
         Environment: Windows / CMD line / Netbeans
            Reporter: James Kosin


The file input/output encoding seems to be broken for file redirection. I'm now getting different event counts again. The only thing I can currently attribute the change to is that, while experimenting with a possible issue from a previous email, I changed the code page with chcp several times and changed the cmd prompt font, following others who had encoding issues displaying the characters on the terminal.

We need to put forth some effort into (1) validating the input/output encoding at the file level and (2) writing a wrapper for the stdout (System.out.xxx) functions that outputs the correct encoding, based on the input file or on a parameter for the encoding used.
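One possible shape for the stdout wrapper in (2) is sketched below. It is an assumption, not existing OpenNLP code: the class `EncodedStdout` and its methods are hypothetical, and the charset would come from whatever encoding parameter the tools end up accepting. The idea is to wrap the raw stdout file descriptor in a `PrintStream` with an explicit charset, so the bytes written to a redirected file no longer depend on the JVM's platform default encoding.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical helper, not part of OpenNLP: prints text using an explicitly
// chosen charset instead of the JVM's platform default encoding.
public class EncodedStdout {

    // Wraps any byte stream in a PrintStream that encodes with the given
    // charset and flushes automatically on println.
    static PrintStream wrap(OutputStream out, Charset encoding) {
        try {
            // The (OutputStream, boolean, String) constructor works on old JDKs;
            // Java 10+ also offers an overload that takes the Charset directly.
            return new PrintStream(out, true, encoding.name());
        } catch (UnsupportedEncodingException e) {
            // Not expected: the Charset object was already resolved by the JVM.
            throw new IllegalStateException(e);
        }
    }

    // Wraps the process's real stdout file descriptor, bypassing the default
    // encoding that System.out was created with.
    static PrintStream stdout(Charset encoding) {
        return wrap(new FileOutputStream(FileDescriptor.out), encoding);
    }
}
```

A tool would then print its converted samples through `EncodedStdout.stdout(Charset.forName(...))` rather than `System.out`. Since both write to the same file descriptor, mixing the two without flushing could interleave output, so a tool would ideally use only one of them for data.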

Please comment...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-329) Converter Issue with output redirection

Posted by "James Kosin (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129438#comment-13129438 ] 

James Kosin commented on OPENNLP-329:
-------------------------------------

Verified the outputs are correct for training and the name finder.
                
> Converter Issue with output redirection
> ---------------------------------------
>
>                 Key: OPENNLP-329
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-329
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Command Line Interface
>         Environment: Windows / CMD line / Netbeans
>            Reporter: James Kosin
>            Assignee: James Kosin
>
> Output from the converters and scripts can get mixed in with converted text output causing training and validation issues.
> Just need to verify and comment the script files and possibly outputters for the converters to be sure no unwanted text is output to the System.out path.


        

[jira] [Resolved] (OPENNLP-329) Converter Issue with output redirection

Posted by "James Kosin (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kosin resolved OPENNLP-329.
---------------------------------

    Resolution: Fixed

This issue is closed; however, we may want to open a new issue about specifying the encoding of the input/output files for the tools. The encoding of the input may not be the same as the OS default, which may cause display issues and possibly errors when encoding/decoding characters on the stdio paths.
Will post to the development list on this.
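A sketch of the input side, again assuming a hypothetical encoding parameter for the tools: the snippet below decodes input with an explicitly chosen charset and a strict `CharsetDecoder`, so bytes that are not valid in that encoding raise an error instead of being silently replaced (the plain `InputStreamReader(InputStream, Charset)` constructor substitutes replacement characters). `StrictInput` is an illustrative name, not an OpenNLP class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: reads text in a user-specified
// encoding and fails loudly on bytes that are invalid in that encoding.
public class StrictInput {

    // Builds a reader whose decoder reports (throws on) malformed or
    // unmappable input instead of inserting replacement characters.
    static BufferedReader open(InputStream in, Charset encoding) {
        CharsetDecoder decoder = encoding.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new BufferedReader(new InputStreamReader(in, decoder));
    }

    // Reads all lines; throws a CharacterCodingException (an IOException)
    // if the input is not valid in the given encoding.
    static List<String> readLines(InputStream in, Charset encoding) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = open(in, encoding)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

For example, a file written in ISO-8859-1 (where "ü" is the single byte 0xFC) reads fine when the tool is told ISO-8859-1, but decoding it as UTF-8 throws a `MalformedInputException`, which surfaces the mismatch rather than silently changing the text.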
                
> Converter Issue with output redirection
> ---------------------------------------
>
>                 Key: OPENNLP-329
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-329
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Command Line Interface
>         Environment: Windows / CMD line / Netbeans
>            Reporter: James Kosin
>            Assignee: James Kosin
>
> Output from the converters and scripts can get mixed in with converted text output causing training and validation issues.
> Just need to verify and comment the script files and possibly outputters for the converters to be sure no unwanted text is output to the System.out path.


        

[jira] [Commented] (OPENNLP-329) Converter Issue with output redirection

Posted by "James Kosin (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129385#comment-13129385 ] 

James Kosin commented on OPENNLP-329:
-------------------------------------

I've traced the issue in my case to the opennlp.bat file I'm using, but it could also happen elsewhere.

Basically, if I redirect the output of the batch file with
     bin\opennlp TokenNameFinderConverter conll03 -data eng.train -lang en -types per > per-train.txt

and the Java application outputs anything other than the converted data, that output registers as events, since all output gets saved to the file.
We just need to be sure that the scripts and/or batch files don't output anything other than the real data we want to capture. We may also need to be sure that the converters never write anything to the plain System.out that shouldn't be in the output file.
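The rule described above (only real data on System.out) can be illustrated with the hypothetical converter skeleton below; it is not the actual OpenNLP converter code. The conversion routine receives separate streams for data and diagnostics, so in the command line tool the data stream would be `System.out`, which `> per-train.txt` captures, and the diagnostics would go to `System.err`, which stays on the console.

```java
import java.io.PrintStream;
import java.util.List;

// Hypothetical converter skeleton, not OpenNLP code: converted samples are
// written only to `data`; progress and warnings go only to `log`.
public class QuietConverter {

    // Returns the number of samples written to the data stream.
    static int convert(List<String> samples, PrintStream data, PrintStream log) {
        log.println("Converting " + samples.size() + " samples ...");
        int written = 0;
        for (String sample : samples) {
            if (sample.trim().isEmpty()) {
                // A diagnostic: it must never end up in the redirected data file.
                log.println("Skipped an empty sample");
                continue;
            }
            data.println(sample);
            written++;
        }
        log.println("Done: " + written + " samples written");
        return written;
    }
}
```

Calling `convert(samples, System.out, System.err)` from a tool's main method keeps progress messages visible on the console while the redirected file contains only the samples.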

                
> Converter Issue with output redirection
> ---------------------------------------
>
>                 Key: OPENNLP-329
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-329
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Command Line Interface
>         Environment: Windows / CMD line / Netbeans
>            Reporter: James Kosin
>
> The file input/output encoding seems to be broken for file redirection. I'm now getting different event counts again. The only thing I can currently attribute the change to is that, while experimenting with a possible issue from a previous email, I changed the code page with chcp several times and changed the cmd prompt font, following others who had encoding issues displaying the characters on the terminal.
> We need to put forth some effort into (1) validating the input/output encoding at the file level and (2) writing a wrapper for the stdout (System.out.xxx) functions that outputs the correct encoding, based on the input file or on a parameter for the encoding used.
> Please comment...


        

[jira] [Closed] (OPENNLP-329) Converter Issue with output redirection

Posted by "Joern Kottmann (Closed) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann closed OPENNLP-329.
----------------------------------

    
> Converter Issue with output redirection
> ---------------------------------------
>
>                 Key: OPENNLP-329
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-329
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Command Line Interface
>         Environment: Windows / CMD line / Netbeans
>            Reporter: James Kosin
>            Assignee: James Kosin
>
> Output from the converters and scripts can get mixed in with converted text output causing training and validation issues.
> Just need to verify and comment the script files and possibly outputters for the converters to be sure no unwanted text is output to the System.out path.


        

[jira] [Updated] (OPENNLP-329) Converter Issue with output redirection

Posted by "James Kosin (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kosin updated OPENNLP-329:
--------------------------------

    Description: 
Output from the converters and scripts can get mixed in with converted text output causing training and validation issues.

Just need to verify and comment the script files and possibly outputters for the converters to be sure no unwanted text is output to the System.out path.


  was:
The file input/output encoding seems to be broken for file redirection. I'm now getting different event counts again. The only thing I can currently attribute the change to is that, while experimenting with a possible issue from a previous email, I changed the code page with chcp several times and changed the cmd prompt font, following others who had encoding issues displaying the characters on the terminal.

We need to put forth some effort into (1) validating the input/output encoding at the file level and (2) writing a wrapper for the stdout (System.out.xxx) functions that outputs the correct encoding, based on the input file or on a parameter for the encoding used.

Please comment...

    
> Converter Issue with output redirection
> ---------------------------------------
>
>                 Key: OPENNLP-329
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-329
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Command Line Interface
>         Environment: Windows / CMD line / Netbeans
>            Reporter: James Kosin
>
> Output from the converters and scripts can get mixed in with converted text output causing training and validation issues.
> Just need to verify and comment the script files and possibly outputters for the converters to be sure no unwanted text is output to the System.out path.


        

[jira] [Updated] (OPENNLP-329) Converter Issue with output redirection

Posted by "James Kosin (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kosin updated OPENNLP-329:
--------------------------------

    Summary: Converter Issue with output redirection  (was: Input / Output file encoding needs validation)
    
> Converter Issue with output redirection
> ---------------------------------------
>
>                 Key: OPENNLP-329
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-329
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Command Line Interface
>         Environment: Windows / CMD line / Netbeans
>            Reporter: James Kosin
>
> The file input/output encoding seems to be broken for file redirection. I'm now getting different event counts again. The only thing I can currently attribute the change to is that, while experimenting with a possible issue from a previous email, I changed the code page with chcp several times and changed the cmd prompt font, following others who had encoding issues displaying the characters on the terminal.
> We need to put forth some effort into (1) validating the input/output encoding at the file level and (2) writing a wrapper for the stdout (System.out.xxx) functions that outputs the correct encoding, based on the input file or on a parameter for the encoding used.
> Please comment...


        

[jira] [Assigned] (OPENNLP-329) Converter Issue with output redirection

Posted by "James Kosin (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kosin reassigned OPENNLP-329:
-----------------------------------

    Assignee: James Kosin
    
> Converter Issue with output redirection
> ---------------------------------------
>
>                 Key: OPENNLP-329
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-329
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Command Line Interface
>         Environment: Windows / CMD line / Netbeans
>            Reporter: James Kosin
>            Assignee: James Kosin
>
> Output from the converters and scripts can get mixed in with converted text output causing training and validation issues.
> Just need to verify and comment the script files and possibly outputters for the converters to be sure no unwanted text is output to the System.out path.
