Posted to commits@tapestry.apache.org by "Robert Coie (Created) (JIRA)" <ji...@apache.org> on 2011/12/07 22:28:40 UTC
[jira] [Created] (TAP5-1778) Template parsing dependent on JVM default charset
Template parsing dependent on JVM default charset
-------------------------------------------------
Key: TAP5-1778
URL: https://issues.apache.org/jira/browse/TAP5-1778
Project: Tapestry 5
Issue Type: Bug
Components: tapestry-core
Affects Versions: 5.3
Reporter: Robert Coie
Priority: Minor
This is my first experience with JIRA, so apologies if it is not formatted properly. I raised this topic on the tapestry-users mailing list and was asked by a couple of people there to create an issue here.
internal.services.XMLTokenStream's openStream method contains the following lines:
InputStreamReader rawReader = new InputStreamReader(rawStream);
...
PrintWriter writer = new PrintWriter(bos);
Both of these implicitly rely on the default JVM charset. This poses a significant problem for non-ASCII text in templates on Google App Engine, where the default is "US-ASCII". In the interests of robustness, I think it would be nice if Tapestry were able to eliminate any reliance on default charsets. I am not confident enough in my understanding of Tapestry internals to know how to appropriately retrieve symbol properties (such as "tapestry.charset") via the IoC system in internal service implementations, but I have verified that explicitly specifying "UTF-8" as follows resolved my problem:
InputStreamReader rawReader = new InputStreamReader(rawStream, "UTF-8");
...
PrintWriter writer = new PrintWriter( new OutputStreamWriter(bos, "UTF-8") );
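The effect described above can be reproduced outside Tapestry. The following is a minimal sketch, not Tapestry's code: the class name CharsetDemo and the helpers decode and encode are hypothetical, and US-ASCII is passed explicitly to stand in for a JVM whose default charset is US-ASCII.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Hypothetical helper: decodes bytes through an InputStreamReader
    // that is given an explicit charset instead of the JVM default.
    public static String decode(byte[] bytes, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: encodes text through a PrintWriter that wraps
    // an OutputStreamWriter with an explicit charset.
    public static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(bos, charset));
        writer.print(text);
        writer.flush();
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        String template = "<p>caf\u00e9</p>";
        byte[] utf8 = encode(template, StandardCharsets.UTF_8);

        // Explicit UTF-8 on both sides round-trips the non-ASCII character.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(template));

        // Decoding the same bytes as US-ASCII replaces the two bytes of the
        // accented character with replacement characters.
        System.out.println(decode(utf8, StandardCharsets.US_ASCII).equals(template));
    }
}
```

Run as written, the first comparison prints true and the second prints false, independent of the JVM's own default charset.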
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TAP5-1778) Template parsing dependent on JVM default charset
Posted by "Josh Canfield (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TAP5-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164768#comment-13164768 ]
Josh Canfield commented on TAP5-1778:
-------------------------------------
> Perhaps a better, more generic solution is just to document that JVM's default encoding should be set to -Dfile.encoding=UTF-8.
I strongly disagree. Tapestry TML files are XML and the default charset for XML files is UTF-8.
I haven't looked at the code, but from the snippet it sounds like Tapestry does not respect the XML declaration for changing charset?
<?xml version="1.0" encoding="iso-8859-1" ?>
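For comparison, here is a sketch of what respecting the XML declaration looks like when a parser is handed raw bytes rather than a Reader. It uses the JDK's standard StAX API and makes no claim about how Tapestry's XMLTokenStream is wired; the class name XmlDeclarationDemo and the helper readText are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XmlDeclarationDemo {

    // Hypothetical helper: parses raw bytes and returns the concatenated
    // character data. Because the parser receives an InputStream, it picks
    // the encoding from the XML declaration (or falls back to UTF-8) itself.
    public static String readText(byte[] xmlBytes) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
            reader.close();
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Declared as ISO-8859-1 and actually encoded as ISO-8859-1.
        String declared = "<?xml version=\"1.0\" encoding=\"iso-8859-1\" ?><p>caf\u00e9</p>";
        System.out.println(readText(declared.getBytes(StandardCharsets.ISO_8859_1)));

        // No declaration: the XML default of UTF-8 applies.
        String undeclared = "<p>caf\u00e9</p>";
        System.out.println(readText(undeclared.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Wrapping the stream in an InputStreamReader before parsing, as in the snippet from the issue, bypasses this detection because the bytes are already decoded by the time the parser sees them.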
[jira] [Commented] (TAP5-1778) Template parsing dependent on JVM default charset
Posted by "Takeshi Sugita (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TAP5-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294523#comment-13294523 ]
Takeshi Sugita commented on TAP5-1778:
--------------------------------------
The JVM's default encoding is also used for log output and the console.
Changing the JVM setting would therefore have too broad an effect.
[jira] [Commented] (TAP5-1778) Template parsing dependent on JVM default charset
Posted by "Kalle Korhonen (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TAP5-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164740#comment-13164740 ]
Kalle Korhonen commented on TAP5-1778:
--------------------------------------
I wonder if this creates more problems than it solves. You could do it for this instance, but any read operation should then explicitly specify UTF-8; otherwise you get mixed results. What if Tapestry or your own code depends on a library that doesn't specify the encoding but uses the platform default? Perhaps a better, more generic solution is just to document that JVM's default encoding should be set to -Dfile.encoding=UTF-8.
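If the default is to be documented rather than avoided, a deployment can at least verify it at startup. A small sketch under that assumption follows; the class name DefaultCharsetCheck and its methods are hypothetical and not part of Tapestry.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {

    // Hypothetical helper: true when the running JVM's default charset is UTF-8.
    public static boolean isUtf8Default() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        // The default charset is fixed when the JVM starts, so the flag
        // -Dfile.encoding=UTF-8 has to be passed on the command line.
        System.out.println("file.encoding property: " + System.getProperty("file.encoding"));
        System.out.println("default charset:        " + Charset.defaultCharset());
        if (!isUtf8Default()) {
            System.err.println("Warning: default charset is not UTF-8");
        }
    }
}
```

Launching with java -Dfile.encoding=UTF-8 DefaultCharsetCheck shows whether the hosting environment honours the flag. A check like this only reports the setting; it does not help on platforms where the command line cannot be changed.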
[jira] [Updated] (TAP5-1778) Template parsing dependent on JVM default charset
Posted by "Takeshi Sugita (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TAP5-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takeshi Sugita updated TAP5-1778:
---------------------------------
Attachment: charset_5_3_5.patch
patch for v5.3.5