Posted to c-dev@xerces.apache.org by Scott Paulinski <sc...@hotmail.com> on 2001/08/17 19:38:45 UTC

Xerces Internationalization

Hello,

I am currently trying to internationalize a product that uses Xerces 1.5.1.  
The code is written in C++ and it accesses Xerces through its xml4C COM 
interface using DOM.  I am running into some problems getting this to work 
correctly in the case of Windows 95 on French and German systems.  What I am 
looking to achieve is to be able to parse and save an XML file on Windows 95 
(without an IE upgrade) that contains non-English European characters like 
the e and i with an accent or the u with two dots over it.  What I have 
tried so far is:

1) Including no header (XML declaration) in the XML file being read.  This 
results in non-English characters being read in as a ? character.

2) Including the header <?xml version="1.0" encoding="iso-8859-1" ?>.  This 
causes the file not to be read at all.  Looking at the Xerces code I was 
able to track down one of the problems to the way Xerces detects codepages 
in Win32TransService.cpp.  In the constructor it checks for the codepages on 
the machine by looking in the registry under HKCR\MIME\Database\Codepage 
(and Charset), which doesn't exist on a base Windows 95 system.
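
A note on attempt 1: the XML 1.0 specification says that a document with no 
encoding declaration and no byte order mark is to be treated as UTF-8.  A 
file saved in a single-byte Western European encoding stores an accented e as 
the lone byte 0xE9, which is not a well-formed UTF-8 sequence, so the parser 
cannot decode it.  The following is only a minimal sketch to illustrate that 
point; the function name looksLikeUtf8 is hypothetical and is not part of 
Xerces, and the check is deliberately simplified.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Xerces API): returns true if `bytes` has the
// structural shape of UTF-8, i.e. every lead byte is followed by the right
// number of 10xxxxxx continuation bytes. Simplified on purpose: it does not
// reject overlong forms or surrogate code points.
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        if (lead < 0x80) {
            extra = 0;                       // plain ASCII
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1;                       // 2-byte sequence
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2;                       // 3-byte sequence
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3;                       // 4-byte sequence
        } else {
            return false;                    // stray continuation or invalid lead
        }
        if (i + extra >= bytes.size()) {
            return false;                    // sequence is cut off at end of input
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;                // expected a continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

With this sketch, the single-byte form "caf\xE9" is rejected, while the same 
word encoded as UTF-8, "caf\xC3\xA9", is accepted.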

I was able to add this set of registry keys by installing IE 4.01, but the 
iso-8859-1 encoding still doesn't work for non-English characters.  In this 
case Xerces ignores the entire file if it contains such characters.  
Unfortunately, the 1252 codepage (which is what iso-8859-1 looks like it is 
mapped to) appears to be the only one installed on this version of Windows 
95.  The 1252 codepage is named "Western European (Windows)" in the registry, 
which sounds like the character set I am looking for.  According to the 
Xerces documentation, iso-8859-1 is supported as "ISO Latin 1", which sounds 
promising as well.  So it looks like I am using the proper codepage, but it 
just isn't working for some reason.
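
For reference, iso-8859-1 maps each byte value directly to the Unicode code 
point with the same number (0x00 to 0xFF), and the 1252 codepage differs from 
it only in the range 0x80 to 0x9F.  One possible workaround, assuming a 
preprocessing step is acceptable, would be to convert the file to UTF-8 
before handing it to the parser, so that no operating system codepage is 
needed.  The sketch below is an illustration only, not something taken from 
Xerces; the function name latin1ToUtf8 is hypothetical, and the XML 
declaration in the converted file would have to say encoding="UTF-8" (or be 
omitted).

```cpp
#include <string>

// Hypothetical helper (not a Xerces API): converts ISO-8859-1 bytes to UTF-8.
// Assumes the input really is ISO-8859-1; bytes 0x80-0x9F are treated as the
// C1 control code points, not as their Windows-1252 meanings.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // worst case: every byte becomes two
    for (char ch : latin1) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            // ASCII is identical in both encodings.
            utf8.push_back(static_cast<char>(c));
        } else {
            // Code points U+0080..U+00FF take two bytes: 110xxxxx 10xxxxxx.
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}
```

For example, the iso-8859-1 byte 0xE9 (e with an acute accent) becomes the 
two bytes 0xC3 0xA9, and 0xFC (u with two dots) becomes 0xC3 0xBC.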

On a side note, I found that using iso-8859-3 (1254) does allow Xerces to 
use these non-English characters, though this encoding is not installed on 
these Windows 95 systems.  If anyone knows an easy way to install this 
encoding (without installing a whole application like IE) that would be 
helpful as well.

Any help is greatly appreciated.

Sincerely,
Scott Paulinski


