Posted to c-dev@xerces.apache.org by Sean Jensen_Grey <se...@speakeasy.org> on 2000/06/03 23:19:13 UTC

difference between ISO-8859-1 and ISO-8859-2 encodings?

I have a bunch of XML docs, some encoded as

ISO-8859-1 and some as
ISO-8859-2

Whenever my SAX program hits ISO-8859-2 it bails with

Fatal Error at (file foo,line 1, char 44): An exception occured! Type:RuntimeException, Message:Could not
create a converter for encoding: ISO-8859-2

How do I go about adding a converter for this encoding? Do I need to?

Can I reuse ./util/XML88591Transcoder.cpp?

Thanks, Sean.




Re: difference between ISO-8859-1 and ISO-8859-2 encodings?

Posted by Dean Roddey <dr...@charmedquark.com>.
If you are using the XML4C version, ICU handles it. Most likely, in the
upcoming release, the native Win32 transcoder will handle it also (though
that depends upon whether Win32 support for that particular encoding is
installed on the target machine.)

You can add your own transcoders if you want to take responsibility for
doing it. 8859-1 is a special case, so don't follow that one: it maps
straight to Unicode with just a widening cast. You'll want to base yours on
the 256-entry table transcoder. Look at the Win 1252 transcoder. It is based
on that underlying table transcoder, and 8859-2 is of that type. It uses two
tables, one to transcode to and one to transcode from. What you have to do
is set up those tables.
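
A minimal sketch of the two-table idea follows, in standalone C++ rather
than against the Xerces classes. Every name here (FromTable, ToTable,
makePartialLatin2FromTable, and so on) is hypothetical, and the table is
deliberately partial: only four entries above 0xA0 are filled in, so a real
transcoder would need all 256 entries from the published ISO-8859-2 mapping.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch; none of these names are Xerces APIs.
using FromTable = std::array<char16_t, 256>;                        // byte -> Unicode
using ToTable   = std::vector<std::pair<char16_t, std::uint8_t>>;  // Unicode -> byte, sorted

const char16_t kUnmapped = 0xFFFD;  // placeholder for entries this sketch leaves out

// Builds a PARTIAL ISO-8859-2 "from" table. Bytes 0x00-0xA0 map to the same
// Unicode value; only four of the remaining entries are filled in here.
FromTable makePartialLatin2FromTable() {
    FromTable t{};
    for (unsigned i = 0; i < 256; ++i)
        t[i] = (i <= 0xA0) ? static_cast<char16_t>(i) : kUnmapped;
    t[0xA1] = 0x0104;  // LATIN CAPITAL LETTER A WITH OGONEK
    t[0xA3] = 0x0141;  // LATIN CAPITAL LETTER L WITH STROKE
    t[0xB1] = 0x0105;  // LATIN SMALL LETTER A WITH OGONEK
    t[0xB3] = 0x0142;  // LATIN SMALL LETTER L WITH STROKE
    return t;
}

// Derives the "to" table by inverting the "from" table and sorting it by
// code point so it can be binary-searched.
ToTable makeToTable(const FromTable& from) {
    ToTable to;
    for (unsigned i = 0; i < 256; ++i)
        if (from[i] != kUnmapped)
            to.emplace_back(from[i], static_cast<std::uint8_t>(i));
    std::sort(to.begin(), to.end());
    // A code point listed twice would make the reverse mapping ambiguous.
    const bool hasDuplicate =
        std::adjacent_find(to.begin(), to.end(),
                           [](const ToTable::value_type& a, const ToTable::value_type& b) {
                               return a.first == b.first;
                           }) != to.end();
    assert(!hasDuplicate);
    return to;
}

// Byte string -> UTF-16: one table lookup per byte.
std::u16string transcodeFrom(const FromTable& from, const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) out.push_back(from[b]);
    return out;
}

// UTF-16 -> byte string: binary search per character. Returns false as soon
// as a character has no entry in the table.
bool transcodeTo(const ToTable& to, const std::u16string& text, std::string& out) {
    out.clear();
    for (char16_t ch : text) {
        auto it = std::lower_bound(to.begin(), to.end(),
                                   std::make_pair(ch, std::uint8_t{0}));
        if (it == to.end() || it->first != ch) return false;
        out.push_back(static_cast<char>(it->second));
    }
    return true;
}
```

Deriving the "to" table from the "from" table is a choice made for this
sketch so the two cannot disagree; hand-written tables would work as well.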

Right now, we don't allow you to stick your own transcoders into the
intrinsic transcoder table. So you'll have to either hack the
TransService.cpp file to stick yours in, or hack the particular transcoding
service you use: watch for the encoding you've provided a transcoder for,
new one up when you see that encoding, and otherwise fall through and let
the normal transcoder logic work.
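
The "watch for the encoding, otherwise fall through" approach might look
roughly like the sketch below. It is standalone C++, not a patch against
TransService.cpp: Transcoder, Latin2Transcoder, Factory and
makeTranscoderFor are all hypothetical stand-ins, and the normal lookup is
passed in as a callback because the real entry point is not shown in this
thread.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-ins; these are not the Xerces transcoding classes.
struct Transcoder {
    virtual ~Transcoder() = default;
    virtual std::string encodingName() const = 0;
};

struct Latin2Transcoder : Transcoder {
    std::string encodingName() const override { return "ISO-8859-2"; }
};

// Stand-in for whatever the existing lookup does for an encoding name.
using Factory = std::function<std::unique_ptr<Transcoder>(const std::string&)>;

// Encoding names in XML declarations are compared without regard to case.
bool sameEncodingName(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::toupper(static_cast<unsigned char>(a[i])) !=
            std::toupper(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Intercepts the one encoding we supply ourselves; everything else falls
// through to the normal lookup unchanged.
std::unique_ptr<Transcoder> makeTranscoderFor(const std::string& encoding,
                                              const Factory& normalLookup) {
    assert(normalLookup);  // the fall-through path must exist
    if (sameEncodingName(encoding, "ISO-8859-2"))
        return std::unique_ptr<Transcoder>(new Latin2Transcoder());
    return normalLookup(encoding);
}
```

Keeping the check ahead of the normal lookup means the stock behavior is
untouched for every other encoding.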

--------------------------
Dean Roddey
The CIDLib Class Libraries
Charmed Quark Software
droddey@charmedquark.com
http://www.charmedquark.com

"Give me immortality, or give me death"




Re: difference between ISO-8859-1 and ISO-8859-2 encodings?

Posted by Andy Heninger <an...@jtcsv.com>.
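The ISO-8859-2 encoding supports the characters of Eastern European
languages (Slavic, Albanian, Hungarian, Romanian). It is not compatible with
ISO-8859-1 (also known as Latin-1, which supports Western European
languages): the two agree on the ASCII range but assign different characters
to many of the byte values above 0x7F.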
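
To make the incompatibility concrete, here is a small standalone C++ sketch
that decodes the same byte under each encoding. decodeLatin1 and
decodeLatin2Sample are hypothetical helpers; the second covers only three
byte values above 0x7F and returns U+FFFD for the rest, so it is an
illustration and not a usable decoder.

```cpp
#include <cassert>

// ISO-8859-1 byte values are equal to their Unicode code points, so
// decoding is just a widening conversion.
char16_t decodeLatin1(unsigned char b) {
    return static_cast<char16_t>(b);
}

// Hypothetical helper covering only a few ISO-8859-2 byte values.
char16_t decodeLatin2Sample(unsigned char b) {
    switch (b) {
        case 0xA3: return 0x0141;  // LATIN CAPITAL LETTER L WITH STROKE
        case 0xB3: return 0x0142;  // LATIN SMALL LETTER L WITH STROKE
        case 0xE9: return 0x00E9;  // LATIN SMALL LETTER E WITH ACUTE
        default:
            // ASCII is shared; everything else is left out of this sketch.
            return b < 0x80 ? static_cast<char16_t>(b)
                            : static_cast<char16_t>(0xFFFD);
    }
}

void demo() {
    // Byte 0xA3 is the pound sign in Latin-1 but L-with-stroke in Latin-2.
    assert(decodeLatin1(0xA3) == 0x00A3);
    assert(decodeLatin2Sample(0xA3) == 0x0141);
    // Some upper-half bytes do coincide, such as 0xE9.
    assert(decodeLatin1(0xE9) == decodeLatin2Sample(0xE9));
}
```

A parser that read an ISO-8859-2 document as ISO-8859-1 would therefore not
fail outright; it would silently produce the wrong characters.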

The easiest answer is to use IBM's ICU transcoding support with Xerces.
XML4C from IBM is Xerces built with ICU; it is also possible to rebuild
the Xerces sources from Apache to use ICU transcoding, although we don't
have a good set of instructions for this available yet.

ICU is also open source.  See http://oss.software.ibm.com/icu/
XML4C is at  http://alphaworks.ibm.com/tech/xml4c


Andy Heninger
IBM XML Technology Group, Cupertino, CA
heninger@us.ibm.com


