Posted to commits@tapestry.apache.org by "Robert Coie (Created) (JIRA)" <ji...@apache.org> on 2011/12/07 22:28:40 UTC

[jira] [Created] (TAP5-1778) Template parsing dependent on JVM default charset

Template parsing dependent on JVM default charset
-------------------------------------------------

                 Key: TAP5-1778
                 URL: https://issues.apache.org/jira/browse/TAP5-1778
             Project: Tapestry 5
          Issue Type: Bug
          Components: tapestry-core
    Affects Versions: 5.3
            Reporter: Robert Coie
            Priority: Minor


This is my first experience with JIRA, so apologies if it is not formatted properly. I raised this topic on the tapestry-users mailing list and was asked by a couple of people there to create an issue here.

internal.services.XMLTokenStream's openStream method contains the following lines:

InputStreamReader rawReader = new InputStreamReader(rawStream);
...
PrintWriter writer = new PrintWriter(bos);

Both of these implicitly rely on the default JVM charset. This poses a significant problem for non-ASCII text in templates on Google App Engine, where the default is "US-ASCII". In the interests of robustness, I think it would be nice if Tapestry were able to eliminate any reliance on default charsets. I am not confident enough in my understanding of Tapestry internals to know how to appropriately retrieve symbol properties (such as "tapestry.charset") via the IoC system in internal service implementations, but I have verified that explicitly specifying "UTF-8" as follows resolved my problem:

InputStreamReader rawReader = new InputStreamReader(rawStream, "UTF-8");
...
PrintWriter writer = new PrintWriter( new OutputStreamWriter(bos, "UTF-8") );
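For illustration only, here is a minimal self-contained sketch of the same idea outside Tapestry. The class name ExplicitCharsetDemo and the helper copyAsUtf8 are hypothetical and are not part of XMLTokenStream; the sketch also assumes Java 7 or later for StandardCharsets, whereas the lines above use the charset-name overloads. It decodes a stream as UTF-8 and re-encodes it as UTF-8, so the result does not depend on the JVM default charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitCharsetDemo {

    // Hypothetical helper: copies rawStream into a byte buffer, decoding and
    // encoding as UTF-8 explicitly instead of using the JVM default charset.
    static byte[] copyAsUtf8(InputStream rawStream) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (Reader rawReader = new InputStreamReader(rawStream, StandardCharsets.UTF_8);
             PrintWriter writer = new PrintWriter(new OutputStreamWriter(bos, StandardCharsets.UTF_8))) {
            char[] buffer = new char[1024];
            int count;
            while ((count = rawReader.read(buffer)) != -1) {
                writer.write(buffer, 0, count);
            }
        } // closing the writer flushes the remaining characters into bos
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A template fragment containing a non-ASCII character (e with acute accent).
        byte[] input = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        byte[] output = copyAsUtf8(new ByteArrayInputStream(input));
        // The bytes survive the round trip whatever -Dfile.encoding is set to.
        System.out.println(Arrays.equals(input, output));
    }
}
```

With the one-argument constructors, the same round trip would pass through the default charset and could lose the accented character on a JVM whose default is US-ASCII.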

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TAP5-1778) Template parsing dependent on JVM default charset

Posted by "Josh Canfield (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TAP5-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164768#comment-13164768 ] 

Josh Canfield commented on TAP5-1778:
-------------------------------------

> Perhaps a better, more generic solution is just to document that JVM's default encoding should be set to -Dfile.encoding=UTF-8.

I strongly disagree. Tapestry TML files are XML and the default charset for XML files is UTF-8.

I haven't looked at the code, but from the snippet it sounds like Tapestry does not respect the XML declaration for changing charset?
<?xml version="1.0" encoding="iso-8859-1" ?>
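To illustrate what respecting the declaration looks like, here is a small sketch using the JDK's own DOM parser. This is not Tapestry's XMLTokenStream, and it says nothing about what Tapestry actually does; the class name XmlDeclarationDemo and the helper parseText are made up for the example. When a parser is handed raw bytes rather than a Reader, it can pick the charset from the XML declaration itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlDeclarationDemo {

    // Hypothetical helper: parses raw XML bytes and returns the text of the
    // root element. The parser receives an InputStream, not a Reader, so it
    // detects the encoding from the XML declaration.
    static String parseText(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new ByteArrayInputStream(xmlBytes));
        return document.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\" ?><p>caf\u00e9</p>";
        // Encode the document the way its declaration says it is encoded.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        // The accented character is decoded correctly, independent of the JVM default.
        System.out.println("caf\u00e9".equals(parseText(bytes)));
    }
}
```

Wrapping the stream in an InputStreamReader first fixes the charset before the parser ever sees the declaration, which is why the snippet in the report raises this question.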

                

        

[jira] [Commented] (TAP5-1778) Template parsing dependent on JVM default charset

Posted by "Takeshi Sugita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TAP5-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294523#comment-13294523 ] 

Takeshi Sugita commented on TAP5-1778:
--------------------------------------

The JVM's default encoding is also used for log and console output.
Changing it through JVM settings would have too wide an effect.
                

        

[jira] [Commented] (TAP5-1778) Template parsing dependent on JVM default charset

Posted by "Kalle Korhonen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TAP5-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164740#comment-13164740 ] 

Kalle Korhonen commented on TAP5-1778:
--------------------------------------

I wonder if this creates more problems than it solves. You could do it for this instance, but then every read operation should explicitly specify UTF-8; otherwise you get mixed results. What if Tapestry or your own code depends on a library that doesn't specify the encoding but uses the platform default? Perhaps a better, more generic solution is just to document that JVM's default encoding should be set to -Dfile.encoding=UTF-8.
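As a rough sketch of the "mixed results" concern (the class MixedCharsetDemo and the helper roundTrip are hypothetical names, and Java 7 or later is assumed for StandardCharsets): text written with one charset and read back with another does not survive, which is what happens when one component pins UTF-8 and another falls back to a platform default such as US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MixedCharsetDemo {

    // Hypothetical helper: encodes text with one charset and decodes the
    // resulting bytes with another, as two disagreeing components would.
    static String roundTrip(String text, Charset writeCharset, Charset readCharset) {
        return new String(text.getBytes(writeCharset), readCharset);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";

        // The charset the one-argument InputStreamReader and PrintWriter
        // constructors use; it depends on the platform and on -Dfile.encoding.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Both sides agree on UTF-8: the text survives.
        System.out.println(text.equals(
                roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));

        // Writer uses UTF-8, reader uses US-ASCII: the accented character is lost.
        System.out.println(text.equals(
                roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.US_ASCII)));
    }
}
```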
                

        

[jira] [Updated] (TAP5-1778) Template parsing dependent on JVM default charset

Posted by "Takeshi Sugita (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TAP5-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Sugita updated TAP5-1778:
---------------------------------

    Attachment: charset_5_3_5.patch

patch for v5.3.5
                
