Posted to users@maven.apache.org by Lance Bader <ld...@gmail.com> on 2006/04/28 15:37:42 UTC

xDoc Scrambles UTF-8 source files when generating UTF-8 HTML

I am using Maven 1.1-beta-2 and maven-xdoc-plugin-1.9.2 on a Windows XP
workstation.  When I attempt to build UTF-8 encoded HTML from UTF-8 XML
source files, every special character is scrambled.  I haven't done the
analysis, but I would guess that every multi-byte character is being treated
like a group of single byte characters.
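
The guess above can be illustrated with a minimal Java sketch. It is not taken from the xDoc plug-in or from Velocity; the class and method names are hypothetical, and ISO-8859-1 is only an assumed stand-in for whatever single-byte code page the workstation uses by default. It needs Java 7 or later for StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Maven or the xDoc plug-in.
class MojibakeDemo {

    // Encode the text as UTF-8, then decode those bytes with the wrong
    // (single-byte) charset, as a misconfigured reader would.
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    // Render each char as U+XXXX so the result does not depend on the
    // console encoding.
    static String hexChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent: two bytes (C3 A9) in UTF-8
        String scrambled = misdecode(original, StandardCharsets.ISO_8859_1);
        System.out.println("source:    " + hexChars(original));
        System.out.println("scrambled: " + hexChars(scrambled));
    }
}
```

One source character becomes two output characters (U+00C3 U+00A9), which is the kind of scrambling that appears when each byte of a multi-byte sequence is read as its own character.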

We are using the Maven xDoc plug-in to generate our on-line user guide.  We
sent the English XML source and I18N properties file to be translated into 9
languages.  The returned files are UTF-8 encoded.

Each source file begins with:

<?xml version="1.0" encoding="UTF-8" ?>

I build each language tree separately and then combine the output trees into
a single web site.  In my project properties, I specify

maven.xdoc.includeProjectDocumentation=no
maven.xdoc.date=navigation-bottom
maven.xdoc.jsl=file:${basedir}/src/site.jsl
maven.docs.outputencoding=UTF-8

maven.docs.src=${basedir}/src/xdoc/en
maven.faq.src=${basedir}/src/xdoc/en/Faq

maven.xdoc.bundle.src=${basedir}/src/i18nBundles
maven.xdoc.bundle=wasce
maven.xdoc.locale.default=en
maven.docs.dest=${maven.build.dir}/docs/en

When I want to generate a site in a different language, I override the
properties on the maven command line like this:

-Dmaven.docs.src=${basedir}/src/xdoc/xx
-Dmaven.faq.src=${basedir}/src/xdoc/xx/Faq
-Dmaven.xdoc.locale.default=xx
-Dmaven.docs.dest=${maven.build.dir}/docs/xx

where xx is replaced with the language to be generated (de es fr it ko pt_BR
ru zh_CN zh_TW)

In my UTF-8 enabled editor, the source files appear to be properly encoded.
Firefox and Internet Explorer both detect the generated HTML as UTF-8
encoded, yet the characters in it are scrambled.

What am I doing wrong?

Lance

Re: xDoc Scrambles UTF-8 source files when generating UTF-8 HTML

Posted by Lance Bader <ld...@gmail.com>.
I found an old Red Hat Linux system where I could run the supplied test
case.  To be precise, it is Red Hat Enterprise Linux V4 update 3 for i386.  I
installed Maven 1.1-beta-2 with maven-xdoc-plugin-1.9.2 and the attached
test case.  I created a script that matches the actions in build_de.bat,
build_en.bat, build_fr.bat, and build_zh_TW.bat.

Except for the unrelated problem caused by poison properties files in
src\i18nBundles (see the comments in the JIRA issue), the HTML was generated
CORRECTLY.

NOTE:  I did NOT have to modify the LANG or LC_CTYPE environment variables,
as suggested in the xDoc plugin FAQ or in
http://jira.codehaus.org/browse/MPXDOC-184 .  By default, LANG was already
set to LANG="en_US.UTF-8".   I dumped the Java system properties and
observed that file.encoding=UTF-8 by default.

So, that raises the question, "Why doesn't this work on a Windows XP system
when you use -Dfile.encoding=UTF-8 to override the default file encoding?"
It's a mystery.
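
For anyone who wants to repeat the check of the Java system properties on another machine, here is a minimal sketch. The class name is hypothetical and it is not part of Maven. It prints the file.encoding property next to the charset the JVM reports as its default; whether a -Dfile.encoding override is honored by a particular JVM and operating system is not something this sketch assumes, which is why it prints both values.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of Maven or the xDoc plug-in.
class EncodingReport {

    // Build a one-line summary of the encoding settings the JVM sees.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once plainly and once with -Dfile.encoding=UTF-8 on the java command line shows whether the override reaches the default charset on that system.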

Re: xDoc Scrambles UTF-8 source files when generating UTF-8 HTML

Posted by Lance Bader <ld...@gmail.com>.
URL: http://jira.codehaus.org/browse/MAVEN-1760

I have opened the Jira issue you requested and provided an archive
containing enough of the product to generate the index.html page for 4
languages, German, English, French, and Traditional Chinese.

Since opening the issue, I copied the same archive to a workstation using
the Sun JDK V1.5 Update 6 and reproduced the problem there.  I think this
means that the IBM JDK is unlikely to be the cause.

I have also modified the velocity-1.4.jar file in the
.maven\repository\velocity\jars directory.  I replaced
org\apache\velocity\runtime\defaults\velocity.properties after making the
following changes.


#----------------------------------------------------------------------------
# T E M P L A T E  E N C O D I N G
#----------------------------------------------------------------------------

input.encoding=UTF-8
output.encoding=UTF-8

The problem persisted after this change.  I would like to think that the
problem is in Velocity, but this change did not affect the outcome.

Next I will move to Red Hat Linux and try the suggested workaround there.

Lance

Re: xDoc Scrambles UTF-8 source files when generating UTF-8 HTML

Posted by Lukas Theussl <lt...@apache.org>.
Something similar was reported here:

http://jira.codehaus.org/browse/MPXDOC-184

Maybe you will find a useful comment there. If you can't get it fixed, can
you trim it down to a simple test project and attach it to a JIRA issue?

Cheers,
-Lukas

