Posted to c-dev@xerces.apache.org by Alexander Konovalov <ak...@esri.com> on 2000/06/17 01:07:38 UTC

Win2000 unicode locales

Did you by any chance try the parser on Win2000 with one of the pure 
Unicode locales? There are at least 7 of them available for now:

> http://www.microsoft.com/globaldev/faqs/locales.asp
> Which of the Windows 2000 locales do not have codepages?
> -----------------------------------------------------------
> These 7 locales do not have codepages, and are supported in Windows 2000 
> solely through Unicode: 
> 
> Armenian (Armenia) 
> Georgian (Georgia) 
> Hindi (India) 
> Tamil (India) 
> Marathi (India) 
> Sanskrit (India) 
> Konkani (India)  

How, for example, will the char* DOMString::transcode() method work on such
locales? Will it return a pointer to a UTF-8 encoded string?


thanks,
Alexander Konovalov
akonovalov@esri.com






Re: Win2000 unicode locales

Posted by Mike Pogue <mp...@apache.org>.
There's a difference between a LOCALE and an ENCODING.

A locale is a set of user preferences (language, date and number formats, and so on).
An encoding (as far as XML is concerned) is a mapping between some bytes
(a code page) and Unicode.
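
To make the "mapping between bytes and Unicode" concrete, here is a minimal
sketch in C++. The names CodePage, decodeByte and sameByteDifferentCharacters
are hypothetical and are not part of Xerces-C; only a small slice of each code
page is modelled. It shows that the same byte means different characters
depending on which code page is used to decode it.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not a Xerces-C API.
enum class CodePage { Latin1, Windows1251 };

// Decode one byte to a Unicode code point under the given code page.
// Only ASCII, all of ISO-8859-1, and the Cyrillic letter range
// 0xC0..0xFF of Windows-1251 are modelled; any other byte maps to
// U+FFFD (the replacement character).
char32_t decodeByte(CodePage cp, unsigned char b) {
    if (b < 0x80) return b;                 // ASCII is shared by both code pages
    switch (cp) {
    case CodePage::Latin1:
        return b;                           // ISO-8859-1 maps bytes 1:1 onto U+0080..U+00FF
    case CodePage::Windows1251:
        if (b >= 0xC0) return 0x0410 + (b - 0xC0);  // U+0410..U+044F, Cyrillic letters
        return 0xFFFD;                      // rest of the table omitted in this sketch
    }
    return 0xFFFD;
}

// The same byte, two code pages, two different characters.
void sameByteDifferentCharacters() {
    assert(decodeByte(CodePage::Latin1, 0xE0) == 0x00E0);       // LATIN SMALL LETTER A WITH GRAVE
    assert(decodeByte(CodePage::Windows1251, 0xE0) == 0x0430);  // CYRILLIC SMALL LETTER A
}
```

The locale never appears in this function: it may suggest a default code page,
but the byte-to-character mapping belongs to the encoding alone.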

On Win2K, what MS is saying is that some human languages (like Tamil) won't
have their own code page to encode Tamil letters.  Instead, Tamil will be expressed in
Unicode (which does handle Tamil characters!).
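
As an illustration of what "expressed in Unicode" looks like at the byte level,
the sketch below encodes a single code point as UTF-8. encodeUtf8 is a
hypothetical helper written for this example, not a Xerces-C function. The
Tamil letter A is U+0B85, which comes out as the three bytes E0 AE 85.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8.
// Illustrative helper only; not part of Xerces-C.
std::string encodeUtf8(char32_t cp) {
    // Precondition: a valid scalar value (in range, not a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {              // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, encodeUtf8(0x0B85) yields the string "\xE0\xAE\x85". No Tamil
code page is involved at any point.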

Note that Xerces-C and Xerces-J both handle Unicode characters, including Tamil characters
expressed in Unicode.

I don't think we've ever tested Xerces-C running on a Win2K Tamil system with
the Tamil locale selected!  But this would be an interesting test, to see whether
the char* returned is actually UTF-8 (and contains Tamil characters, besides!).
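
One way such a test could inspect the result is sketched below. It assumes the
NUL-terminated buffer was obtained elsewhere (for instance from
DOMString::transcode(), which this sketch does not call) and checks whether the
bytes are well-formed UTF-8 and whether any decoded code point falls in the
Unicode Tamil block, U+0B80..U+0BFF. Utf8Report and inspectUtf8 are
hypothetical names, not Xerces-C APIs.

```cpp
#include <cassert>

// Hypothetical result type for this sketch.
struct Utf8Report {
    bool wellFormed;  // every byte sequence is valid UTF-8
    bool hasTamil;    // at least one code point in U+0B80..U+0BFF
};

// Scan a NUL-terminated buffer and report on its UTF-8 well-formedness.
Utf8Report inspectUtf8(const char* s) {
    assert(s != nullptr);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    bool tamil = false;
    while (*p) {
        if (*p < 0x80) { ++p; continue; }   // ASCII byte
        int extra;                          // number of continuation bytes expected
        char32_t cp, min;                   // decoded value and smallest legal value
        if      ((*p & 0xE0) == 0xC0) { extra = 1; cp = *p & 0x1F; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { extra = 2; cp = *p & 0x0F; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { extra = 3; cp = *p & 0x07; min = 0x10000; }
        else return {false, tamil};         // stray continuation or invalid lead byte
        ++p;
        for (int i = 0; i < extra; ++i, ++p) {
            // A premature NUL also fails this check, so we never read past the end.
            if ((*p & 0xC0) != 0x80) return {false, tamil};
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF) return {false, tamil};       // overlong or out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return {false, tamil};    // surrogate
        if (cp >= 0x0B80 && cp <= 0x0BFF) tamil = true;
    }
    return {true, tamil};
}
```

Plain ASCII is also well-formed UTF-8, so a buffer of '?' substitution
characters would pass the first check. That is why the sketch reports the
presence of Tamil code points separately.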

Mike
