Posted to dev@xalan.apache.org by Brian Quinlan <br...@sweetapp.com> on 2003/01/25 22:45:38 UTC
Semantics of string types
I'd like to get my head around the various string types used in Xalan-C.
#ifdef XALAN_USE_NATIVE_WCHAR_T
XalanDOMChar is a wchar_t (is it interpreted as UTF-16 or as
UCS-2/UCS-4?)
#else
XalanDOMChar is a UTF-16 character
#endif
XMLCh is a wchar_t, how is that to be interpreted?
Cheers,
Brian
Re: Semantics of string types
Posted by David N Bertoni/Cambridge/IBM <da...@us.ibm.com>.
Hi Brian,
We end up doing what Xerces does, so we can consume each other's UTF-16
strings. Xerces is a bit funny in that it always uses unsigned short,
except on some platforms where the port was done by someone else.
Currently, Borland C++ is the only platform where wchar_t is used by
Xerces.
The algorithm really ought to be:
If wchar_t is known to be UTF-16, then use it. Otherwise, use unsigned
short.
So, all WIN32/64 compilers should use wchar_t, as should AIX 32-bit. I
don't know about any of the other Unix platforms. Linux is an entirely
different story, as wchar_t can even be EBCDIC, depending on the platform.
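As a rough illustration of the selection rule described above, here is a minimal sketch. The names SKETCH_WCHAR_T_IS_UTF16, SketchChar and sketchLength are made up for this example and are not Xalan or Xerces identifiers, and using _WIN32 as the "known to be UTF-16" test is only an assumption standing in for a real per-platform check.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical switch, not a real Xalan/Xerces macro: set it only on
// platforms where wchar_t is known to hold UTF-16 code units.
// Assumption for this sketch: Windows compilers qualify.
#if defined(_WIN32)
#define SKETCH_WCHAR_T_IS_UTF16 1
#endif

#if defined(SKETCH_WCHAR_T_IS_UTF16)
typedef wchar_t SketchChar;         // native wide character, treated as UTF-16
#else
typedef unsigned short SketchChar;  // fallback: plain 16-bit code unit
#endif

// Either way the type must be exactly one 16-bit code unit wide.
static_assert(sizeof(SketchChar) == 2, "SketchChar must be 16 bits");

// Counts UTF-16 code units (not code points) up to the terminating zero.
// A character outside the BMP is a surrogate pair and counts as two.
inline std::size_t sketchLength(const SketchChar* s)
{
    assert(s != 0);
    std::size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}
```

With this arrangement, code that takes SketchChar pointers compiles the same way on both branches, and only platforms where the switch is set can pass wide string literals directly.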
The only reason that ifdef exists in Xalan is because I started to do a
Borland port (but gave up), and had to do that first.
We never did figure out the whole URIResolver thing, did we? It's probably
too late to do that for the next release, but we ought to revive that
discussion and settle on something.
Dave