Posted to java-dev@axis.apache.org by da...@opensource.lk on 2003/09/03 13:00:15 UTC

Axis C++ problem--FYI

Hi all,

The code in Axis C++ CVS compiles on both Windows and Linux, but it works
only on Windows.
We have identified the problem, which is explained in the Xerces mailing list
archive quoted below: wchar_t is 32 bits on Linux and 16 bits on Windows. We
made the mistake of assuming that wchar_t is 16 bits on every platform.
We hope to solve this problem immediately and apologise for any inconvenience
caused.
The problem is described on the Xerces mailing list as follows.
========================================================================
List:     xerces-c-dev
Subject:  RE: wchar_t and XMLCh
From:     "Nikko" <nikko () gmx ! fr>
Date:     2003-03-04 20:58:28


Believe me, wchar_t is evil. Define your own 16-bit string type or use a
string<unsigned short> instead, which you can convert to/from XMLCh* easily.
Even if, in theory, unsigned short is not necessarily two bytes long, relying
on that is far safer than relying on wchar_t being two bytes long.

Best


-----Original Message-----
From: David N Bertoni/Cambridge/IBM [mailto:david_n_bertoni@us.ibm.com]
Sent: Tuesday, March 4, 2003 20:27
To: xerces-c-dev@xml.apache.org
Subject: RE: wchar_t and XMLCh






> Thanks for your suggestion. That will probably work in every case except
> this one. The reason being we are building a wrapper library over Xerces
> and our interface exposes only std::wstring. We don't expose internal
> xerces types. In particular on Solaris, we want to link against STLport
> library. Is there any requirement in Xerces that will force XMLCh to be 2
> bytes? If all xerces code uses sizeof(XMLCh) then it should probably be
> ok, but if there is any hard coded value (which assumes 2 bytes), then the
> change won't work.

I suggest you typedef something which mirrors the Xerces XMLCh typedef and
use std::basic_string<OutXMLChTypedef>.  Otherwise, you risk some
incompatibility with Xerces now, or in the future.  You also inadvertently
encourage the use of wide-character functions that may not be prepared to
accept UTF-16 code points:

   // find the first newline character in the Xerces string.
   const wchar_t* const    newlineChar = wcschr(xercesStr.c_str(), 10);

Will this work?  Maybe, but who knows?  A particular compiler/platform has
a particular encoding for wchar_t and you should not attempt to force
improperly-encoded code points into it.
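
For comparison, the same search can be done without any wide-character C
function by keeping the text in a string of 16-bit code units. This is only a
sketch and not code from the thread: Utf16Unit, Utf16String and findNewline
are hypothetical names, and char16_t (C++11) is assumed here as a stand-in
for a typedef that mirrors XMLCh.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a typedef mirroring Xerces' XMLCh.
// char16_t needs C++11; it is an assumption of this sketch.
typedef char16_t Utf16Unit;
typedef std::basic_string<Utf16Unit> Utf16String;

// Return the index of the first newline (U+000A) in s,
// or Utf16String::npos if there is none.
std::size_t findNewline(const Utf16String& s) {
    return s.find(static_cast<Utf16Unit>(0x000A));
}
```

Searching code unit by code unit is safe for U+000A because that value can
never be one half of a surrogate pair, and the search never depends on how
the platform encodes wchar_t.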

Of course, you can always change the Xerces typedef to wchar_t and do what
you want, but that means you're on your own if there's a problem now or in
the future.  You also have to build a custom version of Xerces for every
platform and be prepared to support it.  It seems just a bit too scary for
me.

Dave




From:     qchen <qchen@micron.com>
To:       "'xerces-c-dev@xml.apache.org'" <xe...@xml.apache.org>
cc:       (bcc: David N Bertoni/Cambridge/IBM)
Subject:  RE: wchar_t and XMLCh
Date:     03/04/2003 09:41 AM
Please respond to xerces-c-dev




David,

Thanks for your suggestion. That will probably work in every case except
this one. The reason being we are building a wrapper library over Xerces
and our interface exposes only std::wstring. We don't expose internal
xerces types. In particular on Solaris, we want to link against STLport
library. Is there any requirement in Xerces that will force XMLCh to be 2
bytes? If all xerces code uses sizeof(XMLCh) then it should probably be
ok, but if there is any hard coded value (which assumes 2 bytes), then the
change won't work.


Qi Chen


-----Original Message-----
From: David N Bertoni/Cambridge/IBM [mailto:david_n_bertoni@us.ibm.com]
Sent: Tuesday, March 04, 2003 10:08 AM
To: xerces-c-dev@xml.apache.org
Subject: Re: wchar_t and XMLCh






> Basically I need to convert the XMLCh* to a std::wstring and vice versa.
> In Xerces, XMLCh is typedef-ed to unsigned short (2 bytes). Under win32,
> there is no need for conversion since wchar_t is also typedef-ed to
> unsigned short. In Solaris/Linux/VMS, however, wchar_t is typedef-ed to
> unsigned long (4 bytes), so the conversion seems to be inevitable.

There are several reasons there's no need for conversion on Win32. One is
that Visual C++ 6.0 does not implement wchar_t as a proper type, which
is not correct.  Most of the platforms to which you refer, depending on the
age of the compiler, _do_ implement wchar_t as a proper type, and not as a
typedef.  The other, and more important, reason is that Win32 uses
Unicode, so wide characters are known to be UCS-2/UTF-16 code points.
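
As a rough illustration of the size difference under discussion, the sketch
below reports how wide wchar_t is on the platform that compiles it. It is not
from the thread; kWcharBits and wcharIs16Bit are hypothetical names.

```cpp
#include <climits>
#include <cstddef>

// Width of wchar_t, in bits, on the platform compiling this file.
const std::size_t kWcharBits = sizeof(wchar_t) * CHAR_BIT;

// True where wchar_t has the same width as a 16-bit code unit
// (typically Windows); false on typical Linux/Solaris builds.
inline bool wcharIs16Bit() { return kWcharBits == 16; }
```

A matching width says nothing about the encoding, so a check like this can
only rule a platform out; it cannot show that wchar_t data is UTF-16.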

> My question is: Does the Xerces implementation require the size of XMLCh
> to be 2 bytes?  If I change the typedef of XMLCh to wchar_t and recompile
> xerces, would it work? I know the answer is probably no, but I just want
> to make sure. Of course the memory usage will be doubled if we change
> XMLCh to 4 bytes, but that is not a concern for me.

For any given operating system, the issue is not really the size of XMLCh,
it's whether the operating system assumes wide characters are UCS-2/UTF-16
code points.  If not, there's no point in making XMLCh and wchar_t
compatible, because the OS cannot process them.

You should re-examine why you're storing UTF-16 encoded characters, like
Xerces produces, in std::wstring.  std::basic_string<XMLCh> might be a
better choice.

Dave

========================================================================

Note: We can solve this immediately by reverting to XMLString::transcode,
but that is highly inefficient. We are working on an alternative.
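
One possible direction, which is not necessarily the one the Axis C++
developers took, is to convert between std::wstring and a string of 16-bit
code units at the boundary. The sketch below assumes that wchar_t holds
UTF-16 code units when it is 2 bytes wide and UTF-32 code points when it is
wider, which is typical but not guaranteed. Utf16Unit, Utf16String, toUtf16
and toWide are hypothetical names, and char16_t (C++11) stands in for XMLCh.

```cpp
#include <cstddef>
#include <string>

// Hypothetical 16-bit code unit type standing in for Xerces' XMLCh.
typedef char16_t Utf16Unit;
typedef std::basic_string<Utf16Unit> Utf16String;

// Convert a wide string to UTF-16 code units.
// Assumption: 2-byte wchar_t already holds UTF-16; wider wchar_t holds
// UTF-32. Input is not validated (e.g. values above U+10FFFF).
Utf16String toUtf16(const std::wstring& in) {
    Utf16String out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned long cp = static_cast<unsigned long>(in[i]);
        if (sizeof(wchar_t) == 2 || cp < 0x10000UL) {
            // Copy the code unit (or BMP code point) unchanged.
            out.push_back(static_cast<Utf16Unit>(cp));
        } else {
            // Split a supplementary code point into a surrogate pair.
            const unsigned long v = cp - 0x10000UL;
            out.push_back(static_cast<Utf16Unit>(0xD800UL + (v >> 10)));
            out.push_back(static_cast<Utf16Unit>(0xDC00UL + (v & 0x3FFUL)));
        }
    }
    return out;
}

// Convert UTF-16 code units back to a wide string under the same assumption.
std::wstring toWide(const Utf16String& in) {
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long u = in[i];
        // With a wide wchar_t, merge a valid surrogate pair into one value.
        if (sizeof(wchar_t) > 2 && u >= 0xD800UL && u <= 0xDBFFUL &&
            i + 1 < in.size()) {
            const unsigned long lo = in[i + 1];
            if (lo >= 0xDC00UL && lo <= 0xDFFFUL) {
                u = 0x10000UL + ((u - 0xD800UL) << 10) + (lo - 0xDC00UL);
                ++i;
            }
        }
        out.push_back(static_cast<wchar_t>(u));
    }
    return out;
}
```

Where wchar_t is 16 bits wide both functions reduce to a plain copy, so the
extra work is only done on platforms where the two types really differ.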

damitha

--
Lanka Software Foundation (http://www.opensource.lk)
Promoting Open-Source Development in Sri Lanka


Re: Axis C++ problem--FYI

Posted by Steve Loughran <st...@iseran.com>.
damitha@opensource.lk wrote:

> Hi all,
> 
> The code in Axis c++ cvs compiles both for windows and linux. But it works
> only for windows. 
> We have identified the problem as is clear from the following xerces mailing
> archive. wchar_t is 32bit in linux and is 16 bit in Windows. We made the
> mistake of assuming that the wchar_t is 16 bit in every platform.
> We hope to solve this problem immediately and apologise for any inconvenience
> caused.
> This problem is described in Xerces mailing list as follows.
> ========================================================================
> List:     xerces-c-dev
> Subject:  RE: wchar_t and XMLCh
> From:     "Nikko" <nikko () gmx ! fr>
> Date:     2003-03-04 20:58:28
> 
> 
> Believe me, wchar_t is evil. Redefine your own 16 bits string or use a
> string<unsigned short> instead, which you can convert to/from XMLCh* easily.
> Even if theoretically unsigned short is not necessarily two bytes long, it
> is far more reliable than wchar_t being 2 bytes long.
> 
> Best

Interesting. I never knew that.

Given you are building on VS6.0, you don't have the 'wchar_t is a unique
type' switch, which is the real benefit of using the type over a typedef.
Flip it and you can overload
foo(unsigned short), foo(unsigned long) and foo(wchar_t) without the
compiler complaining.
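
A minimal sketch of that overload set, assuming a compiler that treats
wchar_t as a distinct built-in type (in later Visual C++ releases this is
the /Zc:wchar_t option). The return values are only there to show which
overload was chosen.

```cpp
#include <string>

// The three overloads named above. They can only coexist when wchar_t is
// a distinct type; if it is a typedef for unsigned short, the first and
// third declarations collide.
std::string foo(unsigned short) { return "unsigned short"; }
std::string foo(unsigned long)  { return "unsigned long"; }
std::string foo(wchar_t)        { return "wchar_t"; }
```

With these in place, a call such as foo(L'a') selects the wchar_t overload
instead of being silently treated as an integer.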