
Posted to dev@xalan.apache.org by Brian Quinlan <br...@sweetapp.com> on 2003/01/25 22:45:38 UTC

Semantics of string types

I'd like to get my head around the various string types used in Xalan-C.

#ifdef XALAN_USE_NATIVE_WCHAR_T
XalanDOMChar is a wchar_t (is it interpreted as UTF-16 or as
UCS-2/UCS-4?)
#else
XalanDOMChar is a UTF-16 character
#endif

XMLCh is a wchar_t, how is that to be interpreted?

Cheers,
Brian

Re: Semantics of string types

Posted by David N Bertoni/Cambridge/IBM <da...@us.ibm.com>.




Hi Brian,

We end up doing what Xerces does, so we can consume each other's UTF-16
strings.  Xerces is a bit funny in that they always use unsigned short,
except on some platforms where the port was done by someone else.
Currently, Borland C++ is the only platform where wchar_t is used by
Xerces.

The algorithm really ought to be:

   If wchar_t is known to be UTF-16, then use it.  Otherwise, use unsigned
   short.

So, all WIN32/64 compilers should use wchar_t, as should AIX 32-bit.  I
don't know about any of the other Unix platforms.  Linux is an entirely
different story, as wchar_t can even be EBCDIC, depending on the platform.
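The selection rule above can be sketched roughly as follows. This is only an illustration, not what Xalan or Xerces actually ship: SKETCH_WCHAR_T_IS_UTF16, SketchUTF16Char and sketchLength are hypothetical names, and the only platform check included is _WIN32, on the assumption stated above that Windows compilers use a UTF-16 wchar_t. Other platforms known to qualify would be added to the same test. The static_assert needs a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: defined only where wchar_t is known to hold UTF-16.
// Only Windows is listed here; other qualifying platforms would be added.
#if defined(_WIN32)
#  define SKETCH_WCHAR_T_IS_UTF16 1
#endif

// "If wchar_t is known to be UTF-16, then use it. Otherwise, use unsigned short."
#if defined(SKETCH_WCHAR_T_IS_UTF16)
typedef wchar_t SketchUTF16Char;
#else
typedef unsigned short SketchUTF16Char;
#endif

// Either way, the chosen type must be a 16-bit code unit.
static_assert(sizeof(SketchUTF16Char) == 2, "expected a 16-bit code unit");

// Counts UTF-16 code units (not characters) before the null terminator.
// A surrogate pair therefore counts as two.
inline std::size_t sketchLength(const SketchUTF16Char* s)
{
    assert(s != 0);
    std::size_t n = 0;
    while (s[n] != 0)
    {
        ++n;
    }
    return n;
}
```

With this arrangement, code written against SketchUTF16Char does not care which underlying type was picked, as long as both libraries make the same choice on a given platform.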

The only reason that ifdef exists in Xalan is because I started to do a
Borland port (but gave up), and had to do that first.

We never did figure out the whole URIResolver thing, did we?  It's probably
too late to do that for the next release, but we ought to revive that
discussion and settle on something.

Dave



                                                                                                                                      