Posted to dev@cocoon.apache.org by "Nico Verwer (JIRA)" <ji...@apache.org> on 2010/08/13 11:20:16 UTC
[jira] Created: (COCOON-2297) Character encoding does not follow JTidy properties
Character encoding does not follow JTidy properties
---------------------------------------------------
Key: COCOON-2297
URL: https://issues.apache.org/jira/browse/COCOON-2297
Project: Cocoon
Issue Type: Bug
Components: Blocks: HTML
Affects Versions: 2.1.11
Reporter: Nico Verwer
The text that HTMLTransformer sends to JTidy is always encoded according to the platform default encoding, because the code calls text.getBytes() without an encoding parameter. JTidy does not follow the platform default encoding, but has its own default. It is possible to change JTidy's input encoding in the properties file.
The patch uses the encoding specified by JTidy's configuration.
The result is that HTMLTransformer handles UTF-8 or other encodings correctly, so you don't get Chinese characters where you expected a diacritical mark.
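To illustrate the mismatch described above, here is a minimal sketch in plain Java. It is not the code from the patch: the class EncodingDemo and the method encode are hypothetical, and the charset name passed in merely stands in for whatever input encoding the JTidy configuration specifies. It shows that bytes only decode back to the original text when the producer and the consumer agree on the encoding.

```java
import java.nio.charset.Charset;

public class EncodingDemo {

    /**
     * Encodes text with an explicitly named charset, instead of relying on
     * the platform default as text.getBytes() does.
     * (Hypothetical helper; the name is an assumption for illustration.)
     */
    public static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the e

        // Producer encodes as UTF-8: the accented letter takes two bytes.
        byte[] utf8 = encode(text, "UTF-8");
        System.out.println(utf8.length); // 5

        // Consumer assumes the same encoding: the text survives.
        String roundTrip = new String(utf8, Charset.forName("UTF-8"));
        System.out.println(roundTrip.equals(text)); // true

        // Consumer assumes a different encoding: the text is garbled.
        String garbled = new String(utf8, Charset.forName("ISO-8859-1"));
        System.out.println(garbled.equals(text)); // false
        System.out.println(garbled.length());     // 5
    }
}
```

Passing the charset explicitly makes the result independent of the machine the code runs on, which is the property the description above asks for.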
While I was changing the code, I also changed the logging settings. They now take the settings in the JTidy configuration into account.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (COCOON-2297) Character encoding does not follow JTidy properties
Posted by "Nico Verwer (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COCOON-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902345#action_12902345 ]
Nico Verwer commented on COCOON-2297:
-------------------------------------
I am afraid the patch is not correct. The line that now reads
tidy.setShowWarnings(getLogger().isWarnEnabled() && ((wChar == 'f') || (wChar == 'n') || (wChar == '0')));
should have been
tidy.setShowWarnings(getLogger().isWarnEnabled() && ((wChar == 't') || (wChar == 'y') || (wChar == '1')));
Otherwise, the meaning of show-warnings is inverted. Sorry for the confusion.
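The inversion can be checked with a small stand-alone sketch. This is not the HTMLTransformer code: the class ShowWarningsDemo and its methods are hypothetical, and it assumes that wChar is the first character of the show-warnings property value, which the comment above does not state explicitly. The boolean warnEnabled stands in for getLogger().isWarnEnabled().

```java
public class ShowWarningsDemo {

    /**
     * Returns true when the property value starts with 't', 'y' or '1',
     * i.e. when it reads as "true", "yes" or "1".
     * Assumption: wChar is the first character of the property value.
     */
    public static boolean isTruthy(String value) {
        if (value == null || value.isEmpty()) {
            return false;
        }
        char wChar = value.charAt(0);
        return (wChar == 't') || (wChar == 'y') || (wChar == '1');
    }

    /** Warnings are shown only if the logger allows it AND the property asks for it. */
    public static boolean showWarnings(boolean warnEnabled, String value) {
        return warnEnabled && isTruthy(value);
    }

    public static void main(String[] args) {
        System.out.println(showWarnings(true, "yes"));  // true
        System.out.println(showWarnings(true, "no"));   // false
        System.out.println(showWarnings(false, "yes")); // false
    }
}
```

With the 'f', 'n', '0' comparison instead, a value of "no" would switch warnings on and "yes" would switch them off, which is the inversion described above.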
Are there still committers for Cocoon 2.1 who are willing to pick this up?
[jira] Updated: (COCOON-2297) Character encoding does not follow JTidy properties
Posted by "Nico Verwer (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COCOON-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nico Verwer updated COCOON-2297:
--------------------------------
Attachment: HTMLTransformer.patch
The patch that fixes the issue described.