Posted to c-dev@xerces.apache.org by Scott Paulinski <sc...@hotmail.com> on 2001/08/17 19:38:45 UTC
Xerces Internationalization
Hello,
I am currently trying to internationalize a product that uses Xerces 1.5.1.
The code is written in C++ and accesses Xerces through its xml4C COM
interface using the DOM. I am running into problems getting this to work
correctly on French and German Windows 95 systems. What I want is to be able
to parse and save an XML file on Windows 95 (without an IE upgrade) that
contains non-English European characters, such as accented e and i (for
example é or î) or u with an umlaut (ü). So far I have tried the following:
1) Including no XML declaration in the file being read. This results in
non-English characters being read in as ? characters.
2) Including the declaration <?xml version="1.0" encoding="iso-8859-1" ?>.
This causes the file not to be read at all. Looking at the Xerces code, I
traced one of the problems to the way Xerces detects codepages in
Win32TransService.cpp. Its constructor determines which codepages are
available on the machine by looking in the registry under
HKCR\MIME\Database\Codepage (and HKCR\MIME\Database\Charset), and these keys
do not exist on a base Windows 95 system.
I was able to add these registry keys by installing IE 4.01, but the
iso-8859-1 encoding still doesn't work for non-English characters: Xerces
ignores the entire file if it contains any such characters.
Unfortunately, the 1252 codepage (to which iso-8859-1 appears to be mapped)
seems to be the only one installed on this version of Windows 95. In the
registry it is named "Western European (Windows)", which sounds like the
character set I need, and the Xerces documentation lists iso-8859-1 as
supported under the name "ISO Latin 1". So I appear to be using the proper
codepage, but it still isn't working.
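A small byte-level sketch may help separate the two failures. Per the XML 1.0 specification, a document with no byte-order mark, no encoding declaration and no external encoding information must be UTF-8, and a Latin-1 byte such as 0xE9 (é) followed by ordinary text is not valid UTF-8, which is consistent with the ? characters in the first attempt. Declared as iso-8859-1, the same bytes decode trivially, because every ISO-8859-1 byte value maps to the Unicode code point with the same number. Windows-1252 agrees with ISO-8859-1 for all of 0xA0-0xFF and differs only in 0x80-0x9F, so mapping iso-8859-1 to codepage 1252 should not by itself lose é or ü. The helpers `isValidUtf8` and `latin1ToUtf8` are illustrative only and are not part of Xerces or its COM interface.

```cpp
#include <cstddef>
#include <string>

// Simplified UTF-8 check: validates lead/continuation byte structure only.
// It does not reject overlong 3- and 4-byte forms, surrogates, or code
// points above U+10FFFF.
bool isValidUtf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;  // continuation bytes required by this lead byte
        if (lead < 0x80) {
            extra = 0;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;
        } else {
            return false;  // 0x80-0xC1 and 0xF5-0xFF never start a sequence
        }
        if (n - i - 1 < extra) return false;  // truncated at end of input
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80) return false;  // not a 10xxxxxx byte
        }
        i += extra + 1;
    }
    return true;
}

// Decodes ISO-8859-1 bytes to UTF-8. Byte value b is code point U+00bb, so
// no lookup table is needed. For 0xA0-0xFF this also matches Windows-1252.
std::string latin1ToUtf8(const std::string& bytes) {
    std::string out;
    for (char ch : bytes) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out += static_cast<char>(b);                // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));   // 110000xx
            out += static_cast<char>(0x80 | (b & 0x3F)); // 10xxxxxx
        }
    }
    return out;
}
```

For example, the Latin-1 bytes of "café" (`63 61 66 E9`) are rejected by `isValidUtf8`, while `latin1ToUtf8` turns them into `63 61 66 C3 A9`, which it accepts.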
On a side note, I found that using iso-8859-3 (1254) does allow Xerces to
handle these non-English characters. However, this encoding is not installed
on these Windows 95 systems. If anyone knows an easy way to install it
(without installing a whole application like IE), that would be helpful as
well.
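Both the registry problem in the second attempt and the question of installing an encoding without IE come down to which entries exist in the MIME database. The sketch below is a hypothetical model of such a lookup, not the Win32TransService source: `CharsetEntry`, `CharsetTable` and `resolveCodepage` are invented for illustration. It assumes each subkey of HKCR\MIME\Database\Charset either names a codepage directly or is an alias for another charset name, and that a lookup simply fails when the name is missing, as on a base Windows 95 install.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Hypothetical model only: this is NOT the Win32TransService source.
// Each map entry stands in for one subkey of HKCR\MIME\Database\Charset.
struct CharsetEntry {
    std::string aliasFor;   // non-empty if this name is an alias for another charset
    unsigned codepage = 0;  // used only when aliasFor is empty
};

using CharsetTable = std::map<std::string, CharsetEntry>;

// Charset names are compared case-insensitively, so normalize to lower case.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Follows alias entries until a concrete codepage is found.
// Returns std::nullopt if the name, or any alias target, is missing.
std::optional<unsigned> resolveCodepage(const CharsetTable& table,
                                        const std::string& name) {
    std::string current = toLower(name);
    for (int hops = 0; hops < 8; ++hops) {  // bound the walk in case of alias cycles
        auto it = table.find(current);
        if (it == table.end()) return std::nullopt;
        if (it->second.aliasFor.empty()) return it->second.codepage;
        current = toLower(it->second.aliasFor);
    }
    return std::nullopt;
}
```

With an empty table, `resolveCodepage(table, "ISO-8859-1")` returns no value. With an `iso-8859-1` entry aliased to a `windows-1252` entry carrying codepage 1252 (an assumed layout, chosen to mirror the 1252 mapping described above), it returns 1252. In this model, making an encoding available amounts to adding the corresponding entries.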
Any help is greatly appreciated.
Sincerely,
Scott Paulinski