Posted to users@maven.apache.org by Lance Bader <ld...@gmail.com> on 2006/04/28 15:37:42 UTC
xDoc Scrambles UTF-8 source files when generating UTF-8 HTML
I am using Maven 1.1-beta-2 and maven-xdoc-plugin-1.9.2 on a Windows XP
workstation. When I attempt to build UTF-8 encoded HTML from UTF-8 XML
source files, every special character is scrambled. I haven't done the
analysis, but I would guess that every multi-byte character is being treated
like a group of single-byte characters.
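To make that guess concrete, here is a minimal sketch in Python (not taken from Maven or the xDoc plugin) of what happens when UTF-8 bytes are decoded with a single-byte code page. The choice of windows-1252 is an assumption based on the usual default for a Western-locale Windows XP system, and the helper name misread_utf8 is made up for illustration.

```python
def misread_utf8(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode the bytes with a single-byte code page."""
    return text.encode("utf-8").decode(wrong_encoding)


original = "Stra\u00dfe"            # contains U+00DF (sharp s), 6 characters
scrambled = misread_utf8(original)  # U+00DF turns into two separate characters

# Writing the scrambled text back out as UTF-8 still produces valid UTF-8,
# so a browser can report the page as UTF-8 even though the text is wrong.
rewritten = scrambled.encode("utf-8")

print(len(original), len(scrambled), len(rewritten))  # prints: 6 7 9
```

If something like this is happening in the build, it would be consistent with the symptom described here: the output is well-formed UTF-8, but every non-ASCII character shows up as two or three unrelated characters.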
We are using the Maven xDoc plug-in to generate our on-line user guide. We
sent the English XML source and I18N properties file to be translated into 9
languages. The returned files are UTF-8 encoded.
Each source file begins with:
<?xml version="1.0" encoding="UTF-8" ?>
I build each language tree separately and then combine the output trees into
a single web site. In my project properties, I specify
maven.xdoc.includeProjectDocumentation=no
maven.xdoc.date=navigation-bottom
maven.xdoc.jsl=file:${basedir}/src/site.jsl
maven.docs.outputencoding=UTF-8
maven.docs.src=${basedir}/src/xdoc/en
maven.faq.src=${basedir}/src/xdoc/en/Faq
maven.xdoc.bundle.src=${basedir}/src/i18nBundles
maven.xdoc.bundle=wasce
maven.xdoc.locale.default=en
maven.docs.dest=${maven.build.dir}/docs/en
When I want to generate a site in a different language, I override the
properties on the maven command line like this:
-Dmaven.docs.src=${basedir}/src/xdoc/xx
-Dmaven.faq.src=${basedir}/src/xdoc/xx/Faq
-Dmaven.xdoc.locale.default=xx
-Dmaven.docs.dest=${maven.build.dir}/docs/xx
where xx is replaced with the language to be generated (de, es, fr, it, ko,
pt_BR, ru, zh_CN, or zh_TW).
In my UTF-8 enabled editor, the source files appear to be properly encoded.
Firefox and Internet Explorer both agree that the generated HTML is UTF-8
encoded, yet the characters are scrambled.
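One way to check whether the generated HTML was double-encoded is to reverse the suspected mistake and see whether the result is clean. The following Python sketch assumes the wrong decoding step used windows-1252; the function name looks_double_encoded is hypothetical and is not part of any Maven tooling.

```python
def looks_double_encoded(text):
    """Return the repaired text if `text` looks like UTF-8 misread as
    windows-1252 and then re-encoded; otherwise return None."""
    try:
        # Undo the suspected mistake: map the characters back to the bytes
        # they came from, then decode those bytes as UTF-8.
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None  # the round trip is not possible, so not this kind of damage
    return repaired if repaired != text else None


# "caf" followed by U+00C3 U+00A9 is what U+00E9 (e acute) looks like after
# the misread; the check should recover the single original character.
print(looks_double_encoded("caf\u00c3\u00a9") == "caf\u00e9")  # prints: True
```

To apply it to a generated page, read the file as UTF-8 text and pass the contents to the function. A non-None result suggests the source bytes were decoded with the wrong charset somewhere before the HTML was written.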
What am I doing wrong?
Lance
Re: xDoc Scrambles UTF-8 source files when generating UTF-8 HTML
Posted by Lance Bader <ld...@gmail.com>.
I found an old Red Hat Linux system where I could run the supplied test
case. Precisely, it is Red Hat Enterprise Linux V4 update 3 for i386. I
installed Maven 1.1-beta-2 with maven-xdoc-plugin-1.9.2 and the attached
test case. I created a script that matches the actions in build_de.bat,
build_en.bat, build_fr.bat, and build_zh_TW.bat.
Except for the unrelated problem caused by poison properties files in
src\i18nBundles (see the comments in the JIRA issue), the HTML was generated
CORRECTLY.
NOTE: I did NOT have to modify the LANG or LC_CTYPE environment variables,
as suggested in the xDoc plugin FAQ or in
http://jira.codehaus.org/browse/MPXDOC-184 . By default, LANG was already
set to LANG="en_US.UTF-8". I dumped the Java system properties and
observed that file.encoding=UTF-8 by default.
So, that raises the question, "Why doesn't this work on a Windows XP system
when you use -Dfile.encoding=UTF-8 to override the default file encoding?"
It's a mystery.
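For anyone following along, the difference between the two systems can be pictured with a rough Python analogy (this is not Java and not the plugin's code). The same UTF-8 bytes are read back once with an explicit UTF-8 decoder, as a UTF-8 locale would do by default, and once with windows-1252, which is assumed here to stand in for the Windows XP default. The helper read_with is made up for the example.

```python
import locale
import os
import tempfile


def read_with(encoding, data):
    """Write raw bytes to a temporary file, then read them back as text
    using the given encoding."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)


utf8_source = "r\u00e9sum\u00e9".encode("utf-8")  # 6 characters, 8 bytes

# The platform default is what a reader uses when no encoding is given.
print("platform default:", locale.getpreferredencoding(False))
print(len(read_with("utf-8", utf8_source)))   # prints: 6
print(len(read_with("cp1252", utf8_source)))  # prints: 8
```

The second read returns eight characters because each two-byte sequence is split into two characters. Any component that opens a file without naming an encoding inherits the platform default, which may be why the Linux run works without any extra settings.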
Re: xDoc Scrambles UTF-8 source files when generating UTF-8 HTML
Posted by Lance Bader <ld...@gmail.com>.
URL: http://jira.codehaus.org/browse/MAVEN-1760
I have opened the JIRA issue you requested and provided an archive
containing enough of the product to generate the index.html page for 4
languages: German, English, French, and Traditional Chinese.
Since opening the issue, I copied the same archive to a workstation using
the Sun JDK V1.5 Update 6 and reproduced the problem there. I think this
means that it is unlikely that the IBM JDK is causing the problem.
I have also modified the velocity-1.4.jar file in the
.maven\repository\velocity\jars directory. I replaced
org\apache\velocity\runtime\defaults\velocity.properties after making the
following changes.
#----------------------------------------------------------------------------
# T E M P L A T E  E N C O D I N G
#----------------------------------------------------------------------------
input.encoding=UTF-8
output.encoding=UTF-8
The problem still occurred after making this change. I would like to think
that the problem is in Velocity, but this change did not affect the outcome.
Next I will move to Red Hat Linux and try the suggested workaround there.
Lance
Re: xDoc Scrambles UTF-8 source files when generating UTF-8 HTML
Posted by Lukas Theussl <lt...@apache.org>.
Something similar was reported here:
http://jira.codehaus.org/browse/MPXDOC-184
Maybe you will find a useful comment there. If that does not fix it, can
you trim this down to a simple test project and attach it to a JIRA issue?
Cheers,
-Lukas