Posted to issues@maven.apache.org by "Lance Bader (JIRA)" <ji...@codehaus.org> on 2006/04/29 04:12:19 UTC

[jira] Created: (MAVEN-1760) xDoc plugin scrambles UTF-8 source files when generating UTF-8 HTML

xDoc plugin scrambles UTF-8 source files when generating UTF-8 HTML 
--------------------------------------------------------------------

         Key: MAVEN-1760
         URL: http://jira.codehaus.org/browse/MAVEN-1760
     Project: Maven
        Type: Bug

    Versions: 1.1-beta-2    
 Environment: Maven 1.1-beta-2 and maven-xdoc-plugin-1.9.2 on a Windows XP workstation with IBM Java SDK V1.4.2
    Reporter: Lance Bader
 Attachments: UTF8EncodingProblem.zip

When I attempt to build UTF-8 encoded HTML from UTF-8 XML source files, every special character is scrambled.

We are using the xDoc plugin to generate the HTML for our on-line user guide.  We sent the English source files and the default properties file to 9 translation centers.  The translators returned valid UTF-8 source, but xDoc will not generate valid UTF-8 HTML.

I have attached a very small subset of our product that demonstrates this problem.  See the README.txt file within the ZIP archive for information about how to use the supplied scripts to build the output for German, English, French, and Traditional Chinese and view the result.
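For readers unfamiliar with this kind of scrambling, the sketch below shows one common way it arises: UTF-8 bytes decoded with a single-byte charset. It is only an illustration, not a confirmed diagnosis of this issue. The class and method names are hypothetical, and it assumes a JDK newer than the 1.4.2 SDK in the report (Java 7 or later, for StandardCharsets).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeSketch {

    // Encode the text as UTF-8, then decode those bytes with another charset.
    // This mimics a tool that reads a UTF-8 file using the wrong encoding.
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        // German sample word containing u-umlaut and sharp s,
        // written with escapes so this source file stays pure ASCII.
        String original = "Gr\u00fc\u00dfe";
        String scrambled = misdecode(original, StandardCharsets.ISO_8859_1);

        // Each non-ASCII character takes two bytes in UTF-8, so the
        // misdecoded string has one extra character per special character.
        System.out.println("original length:  " + original.length());
        System.out.println("scrambled length: " + scrambled.length());
    }
}
```

If the generated HTML shows two odd characters in place of each accented letter, a mismatch of this kind somewhere in the read or write path is a plausible suspect.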

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (MPXDOC-195) xDoc plugin scrambles UTF-8 source files when generating UTF-8 HTML

Posted by "Lance Bader (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MPXDOC-195?page=comments#action_64446 ] 

Lance Bader commented on MPXDOC-195:
------------------------------------

I found an old Red Hat Linux system where I could run the supplied test case.  Specifically, it is Red Hat Enterprise Linux V4 update 3 for i386.  I installed Maven 1.1-beta-2 with maven-xdoc-plugin-1.9.2 and the attached test case.  I created a script that performs the same actions as build_de.bat, build_en.bat, build_fr.bat, and build_zh_TW.bat.

Except for the unrelated problem caused by poison properties files in src\i18nBundles (see the problem report in a previous comment), the HTML was generated CORRECTLY.

NOTE:  I did NOT have to modify the LANG or LC_CTYPE environment variables, as suggested in the xDoc plugin FAQ or in http://jira.codehaus.org/browse/MPXDOC-184 .  By default, LANG was already set to LANG="en_US.UTF-8".   I dumped the Java system properties and observed that file.encoding=UTF-8 by default.

So, that raises the question, "Why doesn't this work on a Windows XP system when you use -Dfile.encoding=UTF-8 to override the default file encoding?"  It's a mystery.
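The check of the Java system properties described above can be reproduced with a few lines of Java. This is a minimal sketch rather than the exact procedure used in the comment. The class name is hypothetical, and Charset.defaultCharset() assumes Java 5 or later.

```java
import java.nio.charset.Charset;

public class EncodingDump {

    // Report the file.encoding system property together with the charset
    // the JVM actually uses by default for byte/char conversions.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once normally and once with -Dfile.encoding=UTF-8 on the java command line shows whether the override is reflected in the JVM's default charset on a given platform and JDK.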

> xDoc plugin scrambles UTF-8 source files when generating UTF-8 HTML 
> --------------------------------------------------------------------
>
>          Key: MPXDOC-195
>          URL: http://jira.codehaus.org/browse/MPXDOC-195
>      Project: maven-xdoc-plugin
>         Type: Bug
>
>     Versions: 1.9.2
>  Environment: Maven 1.1-beta-2 and maven-xdoc-plugin-1.9.2 on a Windows XP workstation with IBM Java SDK V1.4.2
>     Reporter: Lance Bader
>  Attachments: UTF8EncodingProblem.zip
>
>
> When I attempt to build UTF-8 encoded HTML from UTF-8 XML source files, every special character is scrambled.
> We are using the xDoc plugin to generate the HTML for our on-line user guide.  We sent the English source files and the default properties file to 9 translation centers.  The translators returned valid UTF-8 source, but xDoc will not generate valid UTF-8 HTML.
> I have attached a very small subset of our product that demonstrates this problem.  See the README.txt file within the ZIP archive for information about how to use the supplied scripts to build the output for German, English, French, and Traditional Chinese and view the result.



[jira] Updated: (MPXDOC-195) xDoc plugin scrambles UTF-8 source files when generating UTF-8 HTML

Posted by "Lance Bader (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MPXDOC-195?page=all ]

Lance Bader updated MPXDOC-195:
-------------------------------

    Attachment: UTF8EncodingProblem.zip




[jira] Commented: (MPXDOC-195) xDoc plugin scrambles UTF-8 source files when generating UTF-8 HTML

Posted by "Lance Bader (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MPXDOC-195?page=comments#action_64444 ] 

Lance Bader commented on MPXDOC-195:
------------------------------------

NOTE:  Although it has no effect on this problem, I have discovered a defect in the test case I supplied.  The properties files in src\i18nBundles have not been converted to the required ASCII encoding.  I expected the translators to return ASCII encoded files, but they used some native format instead.  As a result, the navigation items, the section headers, and the subsection headers will appear wrong, even if the rest of the page is generated correctly.

I will attach an updated test case when I have converted the properties files correctly.  I first have to find out what encoding the translators used (it is obviously not UTF-8) and fix them with native2ascii.

Lance
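The conversion planned above can be sketched in Java. The code below approximates what native2ascii does once the source encoding is known: decode the bytes, then replace every non-ASCII character with a backslash-u escape. The class and method names are hypothetical, and ISO-8859-1 in main is only a placeholder, since the comment does not say which encoding the translators used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiEscapeSketch {

    // Replace each non-ASCII UTF-16 code unit with a six-character
    // backslash-u escape, leaving 7-bit ASCII characters untouched.
    static String toAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Decode raw file bytes using the (assumed) source encoding,
    // then escape the result for use in a .properties file.
    static String convert(byte[] raw, Charset sourceEncoding) {
        return toAsciiEscapes(new String(raw, sourceEncoding));
    }

    public static void main(String[] args) {
        // 0xFC is u-umlaut in ISO-8859-1; the charset is a placeholder.
        byte[] sample = { 'G', 'r', (byte) 0xFC, 'n' };
        System.out.println(convert(sample, StandardCharsets.ISO_8859_1));
    }
}
```

Choosing the wrong source encoding still produces well-formed escapes, just for the wrong characters, so the encoding has to be identified before converting.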



[jira] Commented: (MPXDOC-195) xDoc plugin scrambles UTF-8 source files when generating UTF-8 HTML

Posted by "Lance Bader (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MPXDOC-195?page=comments#action_64441 ] 

Lance Bader commented on MPXDOC-195:
------------------------------------

Since I opened the issue, I copied the same archive to a workstation using the Sun JDK V1.5 Update 6 and reproduced the same problem.  I think this makes it unlikely that the IBM JDK is causing the problem.

I have also modified the velocity-1.4.jar file in the .maven\repository\velocity\jars directory.  I replaced org\apache\velocity\runtime\defaults\velocity.properties after making the following changes.


#----------------------------------------------------------------------------
# T E M P L A T E  E N C O D I N G
#----------------------------------------------------------------------------

input.encoding=UTF-8
output.encoding=UTF-8

I reproduced the problem after making this change.  I would like to think that the problem is in Velocity, but this change did not affect the outcome.

Next I will move to Red Hat Linux and try the suggested workaround there.

Lance

