Posted to java-dev@axis.apache.org by da...@opensource.lk on 2003/09/03 13:00:15 UTC
Axis C++ problem--FYI
Hi all,
The code in Axis C++ CVS compiles on both Windows and Linux, but it works
only on Windows.
We have identified the problem, which is explained in the following Xerces
mailing list archive: wchar_t is 32 bits on Linux and 16 bits on Windows. We
made the mistake of assuming that wchar_t is 16 bits on every platform.
We hope to solve this problem immediately and apologise for any inconvenience
caused.
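To make the size assumption visible at build time, here is a minimal sketch (not the actual Axis C++ code). The names kWcharBits and wcharIs16Bit are hypothetical and only illustrate the check; a C++11 compiler is assumed.

```cpp
#include <climits>
#include <cstddef>

// Width of wchar_t in bits on the platform this is compiled for.
// Typically 16 on Windows and 32 on Linux, as described above.
constexpr std::size_t kWcharBits = sizeof(wchar_t) * CHAR_BIT;

// Hypothetical helper: true only where a wchar_t buffer could be
// treated as a buffer of 16-bit code units without any conversion.
constexpr bool wcharIs16Bit() { return kWcharBits == 16; }
```

Code that relies on the 16-bit layout could guard itself with something like static_assert(wcharIs16Bit(), "...") so that a Linux build fails to compile instead of misbehaving at run time.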
This problem is described on the Xerces mailing list as follows.
========================================================================
List: xerces-c-dev
Subject: RE: wchar_t and XMLCh
From: "Nikko" <nikko () gmx ! fr>
Date: 2003-03-04 20:58:28
Believe me, wchar_t is evil. Redefine your own 16-bit string or use a
string<unsigned short> instead, which you can convert to/from XMLCh* easily.
Even if theoretically unsigned short is not necessarily two bytes long, it
is far more reliable than wchar_t being 2 bytes long.
Best
-----Original Message-----
From: David N Bertoni/Cambridge/IBM [mailto:david_n_bertoni@us.ibm.com]
Sent: Tuesday, 4 March 2003 20:27
To: xerces-c-dev@xml.apache.org
Subject: RE: wchar_t and XMLCh
> Thanks for your suggestion. That will probably work in every case except
> this one. The reason being we are building a wrapper library over Xerces
> and our interface exposes only std::wstring. We don't expose internal
> xerces types. In particular on Solaris, we want to link against STLport
> library. Is there any requirement in Xerces that will force XMLCh to be 2
> bytes? If all xerces code uses sizeof(XMLCh) then it should probably be
> ok, but if there is any hard coded value (which assumes 2 bytes), then the
> change won't work.
I suggest you typedef something which mirrors the Xerces XMLCh typedef and
use std::basic_string<OutXMLChTypedef>. Otherwise, you risk some
incompatibility with Xerces now, or in the future. You also inadvertently
encourage the use of wide-character functions that may not be prepared to
accept UTF-16 code points:
// find the first newline character in the Xerces string.
const wchar_t* const newlineChar = wcschr(xercesStr.c_str(), 10);
Will this work? Maybe, but who knows? A particular compiler/platform has
a particular encoding for wchar_t and you should not attempt to force
improperly-encoded code points into it.
Of course, you can always change the Xerces typedef to wchar_t and do what
you want, but that means you're on your own if there's a problem now or in
the future. You also have to build a custom version of Xerces for every
platform and be prepared to support it. It seems just a bit too scary for
me.
Dave
From: qchen <qchen@micron.com>
To: "'xerces-c-dev@xml.apache.org'" <xe...@xml.apache.org>
cc: (bcc: David N Bertoni/Cambridge/IBM)
Subject: RE: wchar_t and XMLCh
Date: 03/04/2003 09:41 AM
Please respond to xerces-c-dev
David,
Thanks for your suggestion. That will probably work in every case except
this one. The reason being we are building a wrapper library over Xerces
and our interface exposes only std::wstring. We don't expose internal
xerces types. In particular on Solaris, we want to link against STLport
library. Is there any requirement in Xerces that will force XMLCh to be 2
bytes? If all xerces code uses sizeof(XMLCh) then it should probably be
ok, but if there is any hard coded value (which assumes 2 bytes), then the
change won't work.
Qi Chen
-----Original Message-----
From: David N Bertoni/Cambridge/IBM [mailto:david_n_bertoni@us.ibm.com]
Sent: Tuesday, March 04, 2003 10:08 AM
To: xerces-c-dev@xml.apache.org
Subject: Re: wchar_t and XMLCh
> Basically I need to convert the XMLCh* to a std::wstring and vice versa.
> In Xerces, XMLCh is typedef-ed to unsigned short (2 bytes). Under win32,
> there is no need for conversion since wchar_t is also typedef-ed to
> unsigned short. In Solaris/Linux/VMS, however, wchar_t is typedef-ed to
> unsigned long (4 bytes), so the conversion seems to be inevitable.
There are two reasons there's no need for conversion on Win32. One is
that Visual C++ 6.0 does not implement wchar_t as a proper type, which
is not correct. Most of the platforms to which you refer, depending on the
age of the compiler, _do_ implement wchar_t as a proper type, and not as a
typedef. The other, and more important reason, is because Win32 uses
Unicode, so wide characters are known to be UCS-2/UTF-16 code points.
> My question is: Does the Xerces implementation require that the size of
> XMLCh be 2 bytes? If I change the typedef of XMLCh to wchar_t and
> recompile xerces, would it work? I know the answer is probably no, but I
> just want to make sure. Of course the memory usage will be doubled if we
> change XMLCh to 4 bytes, but that is not a concern for me.
For any given operating system, the issue is not really the size of XMLCh,
it's whether the operating system assumes wide characters are UCS-2/UTF-16
code points. If not, there's no point in making XMLCh and wchar_t
compatible, because the OS cannot process them.
You should re-examine why you're storing UTF-16 encoded characters, such as
Xerces produces, in std::wstring. std::basic_string<XMLCh> might be a
better choice.
Dave
========================================================================
Note: We can immediately solve this by reverting to
XMLString::transcode, but that is highly inefficient. We are working on an
alternative.
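For illustration only, and not the fix being worked on for Axis C++: a rough sketch of a conversion from 16-bit code units to std::wstring that does not depend on the width of wchar_t. It assumes a C++11 compiler and uses char16_t as a hypothetical stand-in for the Xerces XMLCh type. The names Utf16Unit, Utf16String, toWide and exampleUsage are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for XMLCh: one 16-bit UTF-16 code unit.
typedef char16_t Utf16Unit;
typedef std::basic_string<Utf16Unit> Utf16String;

// Convert UTF-16 code units to a std::wstring.
// Where wchar_t is at least 32 bits wide, a valid surrogate pair is
// combined into a single code point. Where wchar_t is 16 bits wide,
// the code units are copied one for one. Unpaired surrogates are
// copied through unchanged.
std::wstring toWide(const Utf16String& in) {
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cu = in[i];
        if (sizeof(wchar_t) >= 4 && cu >= 0xD800 && cu <= 0xDBFF &&
            i + 1 < in.size()) {
            unsigned long low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Combine high and low surrogate into one code point.
                cu = 0x10000 + ((cu - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        out.push_back(static_cast<wchar_t>(cu));
    }
    return out;
}

// Small usage example: plain ASCII text converts one unit per character.
void exampleUsage() {
    assert(toWide(u"abc") == L"abc");
}
```

The loop copies every character once, so it only shows where the 16-bit and 32-bit cases differ. Whether something along these lines is faster than XMLString::transcode would have to be measured.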
damitha
--
Lanka Software Foundation (http://www.opensource.lk)
Promoting Open-Source Development in Sri Lanka
Re: Axis C++ problem--FYI
Posted by Steve Loughran <st...@iseran.com>.
damitha@opensource.lk wrote:
> Hi all,
>
> The code in Axis c++ cvs compiles both for windows and linux. But it works
> only for windows.
> We have identified the problem as is clear from the following xerces mailing
> archive. wchar_t is 32bit in linux and is 16 bit in Windows. We made the
> mistake of assuming that the wchar_t is 16 bit in every platform.
> We hope to solve this problem immediately and apologise for any inconvenience
> caused.
> This problem is described on the Xerces mailing list as follows.
> ========================================================================
> List: xerces-c-dev
> Subject: RE: wchar_t and XMLCh
> From: "Nikko" <nikko () gmx ! fr>
> Date: 2003-03-04 20:58:28
>
>
> Believe me, wchar_t is evil. Redefine your own 16 bits string or use a
> string<unsigned short> instead, which you can convert to/from XMLCh* easily.
> Even if theoretically unsigned short is not necessarily two bytes long, it
> is far more reliable than wchar_t being 2 bytes long.
>
> Best
Interesting. I never knew that.
Given you are building on VS6.0, you don't have the 'wchar_t is a unique
type' switch, which is the real benefit of using the type over a typedef.
Flip it and you can overload
foo(unsigned short), foo(unsigned long) and foo(wchar_t) without the
compiler complaining.
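A small sketch of what that looks like, assuming a compiler that treats wchar_t as a distinct built-in type (standard C++ behaviour; on later Visual C++ releases this is what the /Zc:wchar_t option turns on). The function name foo comes from the sentence above. The return strings and exampleOverloads are just for illustration.

```cpp
#include <cassert>
#include <string>

// Three overloads that can only coexist when wchar_t is its own type.
// If wchar_t were a typedef for unsigned short, the third definition
// would be rejected as a redefinition of the first.
std::string foo(unsigned short) { return "unsigned short"; }
std::string foo(unsigned long)  { return "unsigned long"; }
std::string foo(wchar_t)        { return "wchar_t"; }

// Usage: each argument type selects its own overload.
void exampleOverloads() {
    assert(foo(L'x') == "wchar_t");
    assert(foo(static_cast<unsigned short>(1)) == "unsigned short");
    assert(foo(1UL) == "unsigned long");
}
```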