Posted to c-users@xerces.apache.org by Ma...@VerizonWireless.com on 2005/04/29 21:03:38 UTC

XMLCh endianness and conversion issues

I am writing a very simple C++ utility that performs schema validation on a
document. It needs to accept the schema location as a command-line argument,
because the path specified in the document itself is not always correct. To
this end, I read the Xerces API documentation, which says I should set the
fgXercesSchemaExternalSchemaLocation property to something like
"http://www.foo.com/bar/baz boo.xsd" (a namespace URI followed by a schema
location). The only catch is that this string has to be in XMLCh format.

Here is a code snippet, in which the return value of doStuff() stands for the
eventual XMLCh* UTF-16 string:

    // doStuff() is a placeholder for the missing ASCII-to-XMLCh conversion.
    SAX2XMLReader *parser = XMLReaderFactory::createXMLReader();
    std::string schemaLocation = "http://www.foo.com/bar/baz ";
    schemaLocation += argv[2];  // schema file given on the command line
    parser->setProperty(XMLUni::fgXercesSchemaExternalSchemaLocation,
                        doStuff(schemaLocation));

Now for the age-old problem, to which I never found a specific answer in the
list archives: I am working on a Solaris 2.9 UltraSPARC III system with
GCC 3.4.3, and my compiler unfortunately defines wchar_t as a 32-bit type
(64-bit mode here), while XMLCh is 16 bits wide. Therefore I cannot simply
cast a wstring's data pointer, or any other wchar_t*, to XMLCh*.
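Since a pointer cast cannot bridge 32-bit and 16-bit units, the conversion has to copy element by element. The following is a minimal sketch, not Xerces code. It assumes that each wchar_t holds a Unicode code point, which is true for UTF-32 wchar_t but is not guaranteed on every platform or locale (Solaris wchar_t contents can depend on the locale). `Utf16Unit` and `wideToUtf16` are hypothetical names, with `Utf16Unit` standing in for XMLCh.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for XMLCh: assumed to be a 16-bit unsigned integer
// type holding UTF-16 code units.
typedef std::uint16_t Utf16Unit;

// Copies a wide string into 16-bit UTF-16 code units, assuming each wchar_t
// is a Unicode code point. Code points above U+FFFF become surrogate pairs.
std::vector<Utf16Unit> wideToUtf16(const std::wstring& in) {
    std::vector<Utf16Unit> out;
    out.reserve(in.size() + 1);
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        // Reject values that are not Unicode scalar values.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("not a Unicode scalar value");
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<Utf16Unit>(cp));  // BMP: one unit
        } else {
            cp -= 0x10000;  // supplementary plane: high + low surrogate
            out.push_back(static_cast<Utf16Unit>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<Utf16Unit>(0xDC00 + (cp & 0x3FF)));
        }
    }
    out.push_back(0);  // NUL terminator, as XMLCh* strings are terminated
    return out;
}
```

For plain ASCII input every wchar_t maps to exactly one 16-bit unit, so `&result[0]` of the returned vector could be passed where a NUL-terminated 16-bit string is expected for as long as the vector lives.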

This means I need a very simple transcoder from ASCII to UTF-16. However, I
am on a big-endian system, and I do not know in which byte order I should
store the character codes. Is XMLCh* in UTF-16, UTF-16BE, or UTF-16LE form?
If it is plain UTF-16, I will need to insert the Unicode byte order mark
(BOM) at the start of the array I create [1].
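One way to frame the byte-order question: if XMLCh is a 16-bit integer type, as the Xerces-C documentation describes it, then code units are stored as integer values, and the host's native byte order applies automatically, with no BOM in an in-memory string. Byte order only becomes a choice when the units are serialized to bytes. The sketch below illustrates this under that assumption. `CodeUnit`, `asciiToUtf16`, and `toUtf16BEBytes` are hypothetical names, and the code does not use Xerces. Xerces-C's own `XMLString::transcode()` may make a hand-written converter unnecessary.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for XMLCh (assumed: 16-bit unsigned integer holding
// UTF-16 code units in the host's native byte order, with no BOM).
typedef std::uint16_t CodeUnit;

// ASCII -> UTF-16: each 7-bit ASCII byte maps to the code unit with the same
// value. Integer values are assigned, not bytes, so the compiler stores them
// in the host's byte order; no swapping is needed for an in-memory string.
std::vector<CodeUnit> asciiToUtf16(const std::string& in) {
    std::vector<CodeUnit> out;
    out.reserve(in.size() + 1);
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c > 0x7F) {
            throw std::invalid_argument("input is not 7-bit ASCII");
        }
        out.push_back(static_cast<CodeUnit>(c));
    }
    out.push_back(0);  // NUL terminator
    return out;
}

// Byte order matters only when code units are written out as bytes (a file,
// a socket). This emits UTF-16BE on any host, stopping at the first NUL unit.
std::vector<unsigned char> toUtf16BEBytes(const std::vector<CodeUnit>& units) {
    std::vector<unsigned char> bytes;
    for (std::size_t i = 0; i < units.size() && units[i] != 0; ++i) {
        bytes.push_back(static_cast<unsigned char>(units[i] >> 8));    // high
        bytes.push_back(static_cast<unsigned char>(units[i] & 0xFF));  // low
    }
    return bytes;
}
```

The same `asciiToUtf16` result would be correct on both big-endian SPARC and little-endian x86; only `toUtf16BEBytes` fixes a particular byte order.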

Can somebody please help me with this? I don't know what to do, and my work
is blocked on this, since the rest of the code is already done.

Side note: I also figured out how to build with GCC on Solaris instead of
Sun Forte. Once I have this command-line validator working, how can I
contribute it as a Xerces sample application, and how do I give the rest of
this potentially useful information back to Xerces? None of the existing
sample applications seemed to address the transcoding issue, so it would be
useful to add one to the tree. With whatever help you can give me, I'll
engineer a working implementation and give it back to the community.

Matthew Hall
Verizon Wireless Data Mediation

[1]: http://en.wikipedia.org/wiki/UTF-16
