Posted to user@velocity.apache.org by Brett Joseph Morgan <bj...@it.uts.EDU.AU> on 2002/11/07 01:12:25 UTC

Difference in behaviour between Solaris and Win32 regarding ShiftJIS templates

Hi all,

I have a cautionary tale about developing internationalized code on
Windows for deployment on Unix machines.

Generating web pages for Japan requires that we use the Shift-JIS character
set. Java's native character encoding is UTF-16. The official way, according
to Sun, is to always convert character sets on the way into, and out of,
Java. Since our templates are in Shift-JIS and we are generating web pages
in Shift-JIS, we should do the following: read the templates in, converting
Shift-JIS -> UTF-16; munch the internal content; then produce output,
converting on the fly UTF-16 -> Shift-JIS.

A problem: It appears, from testing, that Sun's character conversion
charts for Shift-JIS <=> UTF-16 are not perfect, with some characters
being converted into ?'s on the round trip.
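
One way to pin down which characters are affected is to run the round
trip in isolation, outside Velocity. The following is only a minimal
sketch, not the attached StreamTest.java: the class and method names are
made up for illustration, it assumes a current JDK, and it assumes the
JDK charset name "Shift_JIS". Passing "windows-31j" instead may be worth
trying, since the JDK treats the two as separate charsets.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper: decode bytes to a String, encode them again with the
// same charset, and report whether the bytes came back unchanged.
class RoundTripCheck {

    static boolean survivesRoundTrip(byte[] original, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        // bytes -> UTF-16 String; malformed input becomes U+FFFD
        String decoded = new String(original, charset);
        // UTF-16 String -> bytes; unmappable characters become '?'
        byte[] reencoded = decoded.getBytes(charset);
        return Arrays.equals(original, reencoded);
    }

    public static void main(String[] args) {
        // Shift-JIS bytes for the three characters U+65E5 U+672C U+8A9E
        byte[] sample = {
            (byte) 0x93, (byte) 0xFA, (byte) 0x96, 0x7B, (byte) 0x8C, (byte) 0xEA
        };
        System.out.println("Shift_JIS round trip intact: "
                + survivesRoundTrip(sample, "Shift_JIS"));
    }
}
```

Feeding the bytes of a real template through survivesRoundTrip, a line
at a time, would show which lines contain the characters that get lost.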

A workaround: Do not do any character conversions, treat the character
streams as byte streams, and hope for the best.

The cautionary tale: The above workaround works flawlessly on Solaris
using both jdk 1.3.1_01 and jdk1.2.2, but fails miserably on Windows
2000 using jdks 1.2.2, 1.3.1_06 and 1.4.1_01.

I have attached my test case (StreamTest.java), a boiled down velocity
template that contains a set of Shift-JIS encoded characters
(japanese-template.txt), and the output files generated on win2k (my
laptop) and Solaris 2.6 (cco-dev). 

These tests were carried out using velocity-dep-1.3.1-rc2.jar, but other
versions of velocity appear, at first glance, to behave the same.

Anyone who has insight into why this works on Solaris but not win32,
please speak up now :-)

brett

Re: Difference in behaviour between Solaris and Win32 regarding ShiftJIS templates

Posted by Daniel Dekany <dd...@freemail.hu>.
Thursday, November 7, 2002, 3:21:46 PM, Kent Johnson wrote:

[snip]
> You didn't say what the default locale is on the Win2k and Solaris
> systems. If Win2K is English and Solaris is Japanese that could 
> account for the difference.
[snip]

The default encoding probably differs even if both OSes are configured
for the same locale. E.g. for the locales where most UN*X systems use
ISO-8859-x encodings, Windows usually uses cp125x encodings. (Not to
mention that Windows also uses a so-called OEM encoding for terminal
output, which is usually cp85x. This often confuses Java programmers when
they print something to stdout to test something.)
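
A quick way to compare the two machines is to print what the JVM itself
considers the default. This is a small sketch with a made-up class name;
note that Charset.defaultCharset() only exists on JDKs newer than the
ones discussed in this thread (it was added in Java 5), while the
file.encoding system property can be read on older ones too.

```java
import java.nio.charset.Charset;

// Hypothetical helper that reports the JVM's default encoding settings.
class DefaultEncodingReport {

    static String describe() {
        return "default charset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // Run this on both machines and compare the two lines.
        System.out.println(describe());
    }
}
```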

-- 
Best regards,
 Daniel Dekany


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Difference in behaviour between Solaris and Win32 regarding ShiftJIS templates

Posted by Kent Johnson <ke...@skillsoft.com>.
You should specify the correct encoding for both input and output. 
For input, use input.encoding as noted below. For output, one way is 
to use an OutputStreamWriter wrapped around your FileOutputStream. 
For example:
final Writer writer = new OutputStreamWriter(
    new FileOutputStream(outputFilename), "Shift_JIS");
template.merge(context, writer);
writer.close();
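
The same idea can be tried without Velocity or the file system. The
sketch below is an illustration only: the class and method names are
invented, and it swaps the FileOutputStream for in-memory streams so it
runs on its own. The point is just that both the reader and the writer
name the charset explicitly instead of relying on the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical sketch: explicit charset on the way in and on the way out.
class ExplicitEncodingSketch {

    static final Charset SJIS = Charset.forName("Shift_JIS");

    // Input side: bytes -> characters, with the charset named explicitly.
    static String decode(byte[] data) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), SJIS)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Output side: characters -> bytes, again with the charset named explicitly.
    static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, SJIS)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and therefore flushed) before we read the bytes.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E"; // three Japanese characters
        byte[] bytes = encode(text);
        System.out.println("bytes written: " + bytes.length);
        System.out.println("same text read back: " + decode(bytes).equals(text));
    }
}
```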

You didn't say what the default locale is on the Win2k and Solaris 
systems. If Win2K is English and Solaris is Japanese that could 
account for the difference.

Kent

>My tip:
>I guess the cause of the problem is that you don't specify the charset
>when you call getTemplate, thus Velocity uses the input.encoding in
>velocity.properties. This is probably ISO-8859-1, which differs from
>the default encoding used by Windows with US locale (CP1252). And
>since you call String.getBytes(), you basically do
>ISO-8859-1 -> UTF-16 -> CP1252, which is often not possible without
>loss. (The default charset on Solaris (with US locale) is ISO-8859-1,
>so the problem does not occur there.)




Re: Difference in behaviour between Solaris and Win32 regarding ShiftJIS templates

Posted by Daniel Dekany <dd...@freemail.hu>.
Thursday, November 7, 2002, 1:12:25 AM, Brett Joseph Morgan wrote:

[snip]
> A problem: It appears, from testing, that Sun's character conversion
> charts for Shift-JIS <=> UTF-16 are not perfect, with some characters
> being converted into ?'s on the round trip.
[snip]

I don't know the Shift-JIS charset, but I guess that since it is an
important charset, it should be convertible to UCS and back without
loss. And if so, it's pathetic if the Sun implementation (AFAIK it uses
IBM's ICU) does this conversion wrongly. Are you really sure you tried
to do the round trip conversion the right way?

> A workaround: Do not do any character conversions, treat the character
> streams as byte streams, and hope for the best.
>
> The cautionary tale: The above workaround works flawlessly on Solaris
> using both jdk 1.3.1_01 and jdk1.2.2, but fails miserably on Windows
> 2000 using jdks 1.2.2, 1.3.1_06 and 1.4.1_01.
[snip]

My tip:
I guess the cause of the problem is that you don't specify the charset
when you call getTemplate, thus Velocity uses the input.encoding in
velocity.properties. This is probably ISO-8859-1, which differs from
the default encoding used by Windows with US locale (CP1252). And
since you call String.getBytes(), you basically do
ISO-8859-1 -> UTF-16 -> CP1252, which is often not possible without
loss. (The default charset on Solaris (with US locale) is ISO-8859-1,
so the problem does not occur there.)
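
The chain above can be reproduced in a few lines. This is a sketch under
my assumptions about the setup, with invented class and method names: the
template bytes are read as ISO-8859-1 and then written out once with
ISO-8859-1 (the Solaris case) and once with windows-1252 (the Windows
case).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the "treat characters as bytes" workaround.
class PassThroughSketch {

    // Decode with one charset, then encode with another.
    static byte[] passThrough(byte[] input, Charset readAs, Charset writeAs) {
        return new String(input, readAs).getBytes(writeAs);
    }

    static String hex(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            out.append(String.format("%02X ", b & 0xFF));
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // Shift-JIS bytes for three Japanese characters
        byte[] sjis = {
            (byte) 0x93, (byte) 0xFA, (byte) 0x96, 0x7B, (byte) 0x8C, (byte) 0xEA
        };
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset cp1252 = Charset.forName("windows-1252");

        byte[] solarisLike = passThrough(sjis, latin1, latin1);
        byte[] windowsLike = passThrough(sjis, latin1, cp1252);

        System.out.println("input:        " + hex(sjis));
        System.out.println("Latin-1 out:  " + hex(solarisLike)
                + " intact=" + Arrays.equals(sjis, solarisLike));
        System.out.println("CP1252 out:   " + hex(windowsLike)
                + " intact=" + Arrays.equals(sjis, windowsLike));
    }
}
```

ISO-8859-1 maps every byte to exactly one character and back, so the
bytes pass through untouched. Shift-JIS lead bytes such as 0x93 fall in
the 0x80-0x9F range, which ISO-8859-1 decodes to control characters that
windows-1252 cannot encode, so on the way out they are replaced.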

