Posted to dev@xalan.apache.org by Håvard Wigtil <ha...@stud.ntnu.no> on 2001/02/24 02:48:22 UTC

Converting between character sets

Hi all!

[I realize this is more of a user question, but as I can find no user-list
 I'll try my luck here. Feel free to ignore me. ;) ]

I'm writing an app that will output text to multiple platforms (for import
in QuarkXPress or similar programs), so I need to convert iso-8859-1 to
the local character set.

The problem is converting to Macintosh (MacRoman or MacTEC) encodings.
I started with Xalan 1.2.2, but I couldn't get it to output the proper
line terminators: they were always "CR/LF", even if the encoding was set
to MacRoman and the transformation was executed on a Mac.

I couldn't solve the problem with 1.2.2, so I upgraded to 2.0. The upgrade
solved the line terminator problem, but it introduced a new one:
conversion of entities doesn't work as it did with 1.2.2. All extended
characters, such as accented characters, the bullet symbol and the em dash,
show up as numeric entities instead of the native characters.

Example: <name>Håvard Wigtil</name> transforms to "Håvard Wigtil" on
1.2.2, but comes out as "H&#229;vard Wigtil" on 2.0.

For a small example using 1.2.2 and 2.0 see
http://www.stud.ntnu.no/~havardw/transform.jar
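
To make the expected behaviour concrete, here is a minimal sketch of the
kind of call involved. It uses the standard JAXP (javax.xml.transform)
API that Xalan 2.x implements. The class and method names
(EncodingSketch, serialize) are made up for illustration, and
ISO-8859-1 stands in for the Mac encoding, because whether a given
processor or JVM accepts "MacRoman" as an encoding name is an
assumption I have not verified.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of Xalan or of transform.jar.
final class EncodingSketch {

    // Runs an identity transformation and serializes the result as bytes
    // in the requested output encoding.
    static byte[] serialize(String xml, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Programmatic equivalent of <xsl:output encoding="..."/>.
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        // Leave out the XML declaration so only the element is emitted.
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // \u00e5 is the "å" in "Håvard".
        byte[] bytes = serialize("<name>H\u00e5vard Wigtil</name>", "ISO-8859-1");
        // Decode with the same encoding to inspect what was written.
        System.out.println(new String(bytes, "ISO-8859-1"));
    }
}
```

If the serializer considers a character representable in the requested
encoding, it should be written natively; a numeric reference such as
&#229; in the output suggests that the serializer does not know the
encoding's character repertoire and is falling back to escaping.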


   TIA, Håvard

mailto:havardw@stud.ntnu.no||http://www.stud.ntnu.no/~havardw||73 52 55 76
 All it takes to start an avalanche is a single snowflake||Or a snowboarder
        Oh! Un Fraggle! Regarde, maman! J'ai attrapé un Fraggle!