Posted to c-users@xerces.apache.org by Debashis Tripathy <de...@gmail.com> on 2006/10/04 09:01:56 UTC

Query on Unicode Support for XERCES-C

Hi,

I need to use XERCES-C in an MFC project, and I need to know the extent of
Unicode support provided by XERCES. My question is: whenever I need to pass
a literal string (e.g. "my string") to one of the XERCES library methods,
can I pass a Unicode string (e.g. _T("my string") or L"my string") instead?

Also, wherever XERCES expects a parameter of type "const char*", is it OK to
pass a "const _TCHAR*" or "LPCTSTR" instead?

Any help will be highly appreciated.

Thanks & Regards,
Deba

Re: Query on Unicode Support for XERCES-C

Posted by Alberto Massari <am...@datadirect.com>.
At 10.58 05/10/2006 +0530, Debashis Tripathy wrote:
>Alberto, thanks.
>
>Jesse, could you please be a little more precise as to why the _T() macro
>strings will be broken? As per my understanding, it is always better to use
>_T("my string"). If _UNICODE is defined, then this will be taken as L"my
>string" at compile time. If _UNICODE is not defined (i.e. for ANSI or MBCS),
>it will be treated as the plain literal string "my string".
>
>My understanding (from the API documentation) is that most XERCES routines
>have a Unicode-compatible overloaded method. For example, in the case of the
>XMLString::catString() method, the available signatures are:
>
>static void XMLString::catString (char* const target, const char* source)
>static void XMLString::catString (XMLCh* const target, const XMLCh* source)
>
>Now if I declare one of the parameters (say the source string) as
>_T("...."), then for _UNICODE it will be a L"....", which will be treated as
>"const XMLCh*". This will invoke the second function. If _UNICODE is not
>defined, then it will map to "const char*" and the first one (of the two
>overloaded functions above) will be invoked.
>
>Please let me know if I am wrong in my assumption.

Your assumption is right, but it relies on *ALL* the methods having both
signatures, and on both arguments always being of the same type. If you write

    XMLCh buffer[200];
    ...
    XMLString::catString(buffer, _T("..."));

it will not compile unless the _UNICODE macro is defined, as there is no
overload that mixes XMLCh* and char*.
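
A minimal sketch of this failure, using hypothetical stand-ins rather than
the real Xerces-C headers. It assumes XMLCh is wchar_t (as on the Windows
builds discussed in this thread); the sketch namespace and the helper names
appendWide/appendNarrow are made up for illustration only.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Hypothetical stand-in, NOT the real Xerces-C declaration.
// Assumption: XMLCh is wchar_t, as on the Windows builds discussed here.
typedef wchar_t XMLCh;

namespace sketch {

// Narrow overload: both arguments are char strings.
inline void catString(char* const target, const char* source) {
    std::strcat(target, source);
}

// Wide overload: both arguments are XMLCh strings.
inline void catString(XMLCh* const target, const XMLCh* source) {
    std::wcscat(target, source);
}

// Wide buffer plus wide literal: always resolves to the wide overload.
inline bool appendWide() {
    XMLCh buffer[200] = L"abc";
    assert(std::wcslen(buffer) + std::wcslen(L"def") < 200);
    catString(buffer, L"def");
    // catString(buffer, "def");  // does not compile: no (XMLCh*, const char*) overload
    return std::wcscmp(buffer, L"abcdef") == 0;
}

// Narrow buffer plus narrow literal: always resolves to the narrow overload.
inline bool appendNarrow() {
    char buffer[200] = "abc";
    assert(std::strlen(buffer) + std::strlen("def") < 200);
    catString(buffer, "def");
    return std::strcmp(buffer, "abcdef") == 0;
}

} // namespace sketch
```

The commented-out call is what _T("...") turns into when _UNICODE is not
defined: a char literal passed next to an XMLCh buffer, for which neither
overload is viable.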

Alberto 


Re: Query on Unicode Support for XERCES-C

Posted by David Bertoni <db...@apache.org>.
Debashis Tripathy wrote:
> Alberto, thanks.
> 
> Jesse, could you please be a little more precise as to why the _T() macro
> strings will be broken? As per my understanding, it is always better to use
> _T("my string"). If _UNICODE is defined, then this will be taken as L"my
> string" at compile time. If _UNICODE is not defined (i.e. for ANSI or MBCS),
> it will be treated as the plain literal string "my string".
> 
> My understanding (from the API documentation) is that most XERCES routines
> have a Unicode-compatible overloaded method. For example, in the case of the
> XMLString::catString() method, the available signatures are:

Many, but not all.

> 
> static void XMLString::catString (char* const target, const char* source)
> static void XMLString::catString (XMLCh* const target, const XMLCh* source)

I can imagine a case where you would try to use catString() with a constant 
string from your application, and a wide string from Xerces-C.  If you 
didn't define _UNICODE, you would need an overload like this:

static void XMLString::catString (XMLCh* const target, const char* source)

Since that overload doesn't exist, your code would not compile without 
defining _UNICODE.

> 
> Now if I declare one of the parameters (say the source string) as
> _T("...."), then for _UNICODE it will be a L"....", which will be treated as
> "const XMLCh*". This will invoke the second function. If _UNICODE is not
> defined, then it will map to "const char*" and the first one (of the two
> overloaded functions above) will be invoked.
> 
> Please let me know if I am wrong in my assumption.
> 

Since Xerces-C always operates internally in Unicode, regardless of the
_UNICODE macro, you gain nothing by using the _T macro. In a build without
_UNICODE you also sacrifice performance, because the char* strings have to be
transcoded at run-time rather than being wide already at compile-time.

Also, because your application can run in a different code page from the
one that you compiled with, local code page transcoding may cause problems
unless you limit yourself to a small subset of invariant characters.
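
To make the compile-time versus run-time distinction concrete, here is a
rough sketch. It deliberately uses the standard C library function
std::mbstowcs as a stand-in for local code page transcoding; it is not the
Xerces-C transcoder, and the helper names wideLiteral and widenAtRunTime are
hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Compile-time route: the compiler emits the wide string directly.
inline std::wstring wideLiteral() {
    return L"my string";
}

// Run-time route: convert a narrow string using the current C locale.
// std::mbstowcs is a standard-library stand-in here, not a Xerces-C call.
inline std::wstring widenAtRunTime(const char* narrow) {
    std::wstring wide(std::strlen(narrow), L'\0');
    if (wide.empty()) {
        return wide;
    }
    const std::size_t n = std::mbstowcs(&wide[0], narrow, wide.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();  // byte sequence is invalid in this locale
    }
    wide.resize(n);
    return wide;
}
```

For plain ASCII text such as "my string" both routes should agree, e.g.
assert(widenAtRunTime("my string") == wideLiteral()). For bytes outside the
invariant subset, the run-time result depends on the locale in effect when
the program runs, which is the code page problem described above.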

Dave

RE: Query on Unicode Support for XERCES-C

Posted by Jesse Pelton <js...@PKC.com>.
I guess I'm a control freak (and a lazy one to boot) and overstated the
case, for which I apologize.

As long as there are two versions of the functions you want to use, it's
reasonable to use the _T() macro.  However, this means that as you write
your code, you either have to check that both versions exist or accept
the possibility that your code won't compile when you change the value
of the _UNICODE macro.  For me it's easier to choose a type and stick
with it, but your situation may be different.

Re: Query on Unicode Support for XERCES-C

Posted by Debashis Tripathy <de...@gmail.com>.
Alberto, thanks.

Jesse, could you please be a little more precise as to why the _T() macro
strings will be broken? As per my understanding, it is always better to use
_T("my string"). If _UNICODE is defined, then this will be taken as L"my
string" at compile time. If _UNICODE is not defined (i.e. for ANSI or MBCS),
it will be treated as the plain literal string "my string".

My understanding (from the API documentation) is that most XERCES routines
have a Unicode-compatible overloaded method. For example, in the case of the
XMLString::catString() method, the available signatures are:

static void XMLString::catString (char* const target, const char* source)
static void XMLString::catString (XMLCh* const target, const XMLCh* source)

Now if I declare one of the parameters (say the source string) as
_T("...."), then for _UNICODE it will be a L"....", which will be treated as
"const XMLCh*". This will invoke the second function. If _UNICODE is not
defined, then it will map to "const char*" and the first one (of the two
overloaded functions above) will be invoked.

Please let me know if I am wrong in my assumption.

Thanks & Regards,
Deba




-- 
Debashis Tripathy
+91 9937026725  (Mobile)
+91 674 2396071  (Home)
+91 674 2320032 * 42371 (Work)
-----------------------------------------------------------------
Anything written on paper can affect history,
not life. Life is a different history.

RE: Query on Unicode Support for XERCES-C

Posted by Jesse Pelton <js...@PKC.com>.
I'd recommend avoiding the _T() macro when calling Xerces routines
because, as Alberto points out, whether it generates the right kind of
string depends on the _UNICODE macro.  If you ever change how you
compile, everything that uses _T() will be broken.  You can future-proof
your code by using L"my string" where XMLCh* is required, and just "my
string" where char* is required. 



Re: Query on Unicode Support for XERCES-C

Posted by Alberto Massari <am...@datadirect.com>.
Hi Deba,

At 12.31 04/10/2006 +0530, Debashis Tripathy wrote:
>Hi,
>
>I need to use XERCES-C in an MFC project, and I need to know the extent of
>Unicode support provided by XERCES. My question is: whenever I need to pass
>a literal string (e.g. "my string") to one of the XERCES library methods,
>can I pass a Unicode string (e.g. _T("my string") or L"my string") instead?

On Windows platforms, wherever you see XMLCh* you can use L"my string",
or _T("my string") provided you have defined the _UNICODE macro.


>Also, wherever XERCES expects a parameter of type "const char*", is it OK to
>pass a "const _TCHAR*" or "LPCTSTR" instead?

In this case _T("my string")/LPCTSTR/TCHAR* can only be used if the 
_UNICODE macro is NOT defined.
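
As a rough illustration of why the answer flips with the macro, the sketch
below mimics the behaviour with hypothetical stand-ins: SKETCH_UNICODE plays
the role of _UNICODE and SKETCH_T the role of _T(). The real definitions live
in <tchar.h>; these names and the helpers kindOf/literalKind are assumptions
made up for this example.

```cpp
#include <cassert>
#include <cstring>

// SKETCH_UNICODE and SKETCH_T are hypothetical stand-ins for _UNICODE and
// _T() from <tchar.h>; change the define to 0 to see the literal type change.
#define SKETCH_UNICODE 1

#if SKETCH_UNICODE
  #define SKETCH_T(x) L##x      // pastes an L prefix: "s" becomes L"s"
  typedef wchar_t SketchTChar;  // plays the role of _TCHAR
#else
  #define SKETCH_T(x) x         // leaves the narrow literal untouched
  typedef char SketchTChar;
#endif

// The literal's element type always matches the TCHAR-like typedef.
static_assert(sizeof(SKETCH_T("a")[0]) == sizeof(SketchTChar),
              "literal and TCHAR stand-in must use the same character type");

// Overloads that report which character type the literal decayed to.
inline const char* kindOf(const char*)    { return "narrow"; }
inline const char* kindOf(const wchar_t*) { return "wide"; }

// Resolved at compile time from the macro setting above.
inline const char* literalKind() {
    return kindOf(SKETCH_T("my string"));
}
```

With SKETCH_UNICODE set to 1, a check such as
assert(std::strcmp(literalKind(), "wide") == 0) holds; with it set to 0 the
same source line selects the narrow overload instead. That is why the same
_T() literal fits an XMLCh* parameter in one build and a char* parameter in
the other, but never both.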

Hope this helps,
Alberto