Posted to c-dev@xerces.apache.org by "Mannion, Enda" <en...@hp.com> on 2006/08/10 18:50:24 UTC

Reading UNICODE string

Hi,

I am able to write a Unicode character to my XML file, but I am unable to
read it back.

I read it using this code, and xmlval comes back as "":

n = node->getFirstChild()->getNodeValue();
xmlval = xercesc::XMLString::transcode(n);

If the character(s) written are not Unicode, there are no problems.

Any ideas appreciated,

Thanks,

Enda


Re: Reading UNICODE string

Posted by Axel Weiß <aw...@informatik.hu-berlin.de>.
Mannion, Enda wrote:
> Can you suggest a solution to me so I can read and write as Unicode?

My local encoding is
LANG=de_DE.UTF-8

HTH,
			Axel

---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org


Re: Reading UNICODE string

Posted by sandip shahane <sa...@yahoo.com>.
I have a question along similar lines.
I want to know whether Xerces has any kind of wstring
support. My application has to use a wstring-based
class (this is a must, since the whole code base,
which was previously Windows-only, uses wstring, and now
I want to port it to UNIX platforms). It must also
support I18N, i.e. foreign-language character sets
such as Japanese. For this I plan to do the following:

I have seen XMLString examples in
some places that use ::transcode(...) to go back and forth
between XMLCh * and char *. If possible, my strategy is to
get a char * from Xerces's XMLCh * this way and
convert it to a wstring using mbstowcs (and wcstombs for
the reverse direction).
What does this transcode(...) do? Will I be able to
support non-English text (say, the Japanese character set) by
doing the above?

Has anybody done it this way? Are there any issues or
alternatives?
I ask because I think this is going to be too expensive.
Is there a better or
simpler way? If so, some sample code would be greatly
helpful.


Thanks,
Sandeep Shahane
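
One alternative to the round trip through char * is to convert between wstring and UTF-16 code units directly, which does not depend on the local code page. The following is only a minimal sketch under stated assumptions: std::u16string stands in for an XMLCh buffer (XMLCh holds UTF-16, as noted elsewhere in this thread), and the helper names wideToUtf16 and utf16ToWide are hypothetical, not part of Xerces-C. It assumes wchar_t is either 16 bits wide (Windows, already UTF-16) or 32 bits wide (most UNIX platforms, one code point per element).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts a std::wstring to UTF-16 code units.
// With a 16-bit wchar_t the units are copied as-is; with a 32-bit wchar_t,
// code points above U+FFFF are split into a surrogate pair.
std::u16string wideToUtf16(const std::wstring& in) {
    std::u16string out;
    for (wchar_t wc : in) {
        const char32_t cp = static_cast<char32_t>(wc);
        if (sizeof(wchar_t) > 2 && cp > 0xFFFF) {
            const char32_t v = cp - 0x10000;
            out += static_cast<char16_t>(0xD800 + (v >> 10));
            out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}

// Hypothetical helper: converts UTF-16 code units back to a std::wstring.
// With a 32-bit wchar_t, valid surrogate pairs are combined into one element;
// everything else is copied unit by unit.
std::wstring utf16ToWide(const std::u16string& in) {
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t unit = in[i];
        const bool isPair = sizeof(wchar_t) > 2 &&
            unit >= 0xD800 && unit <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
        if (isPair) {
            const char32_t cp = 0x10000 +
                ((static_cast<char32_t>(unit) - 0xD800) << 10) +
                (in[i + 1] - 0xDC00);
            out += static_cast<wchar_t>(cp);
            ++i;  // skip the low surrogate that was just consumed
        } else {
            out += static_cast<wchar_t>(unit);
        }
    }
    return out;
}
```

Because no local encoding is involved, Japanese text survives the conversion in both directions, whereas a detour through char * can only preserve characters that the local code page can represent.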

--- David Bertoni <db...@apache.org> wrote:

> Mannion, Enda wrote:
> > Thanks Again Axel,
> >
> > Can you suggest a solution to me so I can read and write as Unicode?
>
> Why don't you just write the UTF-16 data into the file?  As long as you
> are reading and writing the data on a single platform, or you remember
> whether it's big-endian or little-endian, it should work.
>
> If you want an encoding that's more compatible with non-Unicode
> applications, then you can transcode to UTF-8.
>
> Dave


Re: Reading UNICODE string

Posted by David Bertoni <db...@apache.org>.
Mannion, Enda wrote:
> Sorry for my lack of xerces knowledge but how do I transcode to UTF-8?

You create a UTF-8 transcoder.  There are previous emails in the archives 
that describe the process.  There is a global instance of XMLTransService 
that is available in XMLPlatformUtils that you can use for creating the 
transcoder:

http://xml.apache.org/xerces-c/apiDocs/classXMLPlatformUtils.html#z941_1

Here is the documentation for XMLTransService:

http://xml.apache.org/xerces-c/apiDocs/classXMLTransService.html

Dave
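
To illustrate what a UTF-8 transcoder has to produce, here is a minimal, library-independent sketch of the UTF-16 to UTF-8 conversion. It is not the XMLTransService API: std::u16string is an assumed stand-in for an XMLCh buffer, and utf16ToUtf8 is a hypothetical helper name. In real code the transcoder created through XMLTransService would do this work.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes UTF-16 code units as UTF-8 bytes.
// Throws std::invalid_argument on unpaired surrogates.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {                       // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Unlike a conversion to the local code page, UTF-8 can represent every Unicode character, so this conversion fails only on malformed input.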



RE: Reading UNICODE string

Posted by "Mannion, Enda" <en...@hp.com>.
Sorry for my lack of xerces knowledge but how do I transcode to UTF-8?

Enda

-----Original Message-----
From: David Bertoni [mailto:dbertoni@apache.org] 
Sent: 10 August 2006 19:21
To: c-dev@xerces.apache.org
Subject: Re: Reading UNICODE string

Mannion, Enda wrote:
> Thanks Again Axel,
> 
> Can you suggest a solution to me so I can read and write as Unicode?
> 

Why don't you just write the UTF-16 data into the file?  As long as you are
reading and writing the data on a single platform, or you remember whether
it's big-endian or little-endian, it should work.

If you want an encoding that's more compatible with non-Unicode 
applications, then you can transcode to UTF-8.

Dave



Re: Reading UNICODE string

Posted by David Bertoni <db...@apache.org>.
Mannion, Enda wrote:
> Thanks Again Axel,
> 
> Can you suggest a solution to me so I can read and write as Unicode?
> 

Why don't you just write the UTF-16 data into the file?  As long as you are 
reading and writing the data on a single platform, or you remember whether 
it's big-endian or little-endian, it should work.

If you want an encoding that's more compatible with non-Unicode 
applications, then you can transcode to UTF-8.

Dave
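
One common way to "remember" the byte order is to store a byte order mark (BOM, U+FEFF) in front of the data. The sketch below shows the idea with the standard library only. std::u16string is an assumed stand-in for XMLCh data, and toUtf16LeBytes and fromUtf16Bytes are hypothetical helper names, not Xerces-C functions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: serializes UTF-16 code units as little-endian bytes,
// prefixed with a byte order mark.
std::string toUtf16LeBytes(const std::u16string& text) {
    std::string bytes("\xFF\xFE", 2);  // U+FEFF in little-endian byte order
    for (char16_t unit : text) {
        bytes += static_cast<char>(unit & 0xFF);  // low byte first
        bytes += static_cast<char>(unit >> 8);    // then high byte
    }
    return bytes;
}

// Hypothetical helper: reads BOM-prefixed UTF-16 bytes back into code units,
// using the BOM to decide the byte order.
std::u16string fromUtf16Bytes(const std::string& bytes) {
    if (bytes.size() < 2 || bytes.size() % 2 != 0)
        throw std::invalid_argument("not a BOM-prefixed UTF-16 byte stream");
    const unsigned char b0 = bytes[0];
    const unsigned char b1 = bytes[1];
    bool littleEndian;
    if (b0 == 0xFF && b1 == 0xFE) {
        littleEndian = true;
    } else if (b0 == 0xFE && b1 == 0xFF) {
        littleEndian = false;
    } else {
        throw std::invalid_argument("missing byte order mark");
    }
    std::u16string text;
    for (std::size_t i = 2; i < bytes.size(); i += 2) {
        const unsigned char lo = bytes[littleEndian ? i : i + 1];
        const unsigned char hi = bytes[littleEndian ? i + 1 : i];
        text += static_cast<char16_t>((hi << 8) | lo);
    }
    return text;
}
```

Writing bytes in an explicit order, rather than dumping the in-memory representation, keeps the file readable on a platform with the opposite endianness.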



RE: Reading UNICODE string

Posted by "Mannion, Enda" <en...@hp.com>.
Thanks Again Axel,

Can you suggest a solution to me so I can read and write as Unicode?


Enda

-----Original Message-----
From: Axel Weiß [mailto:aweiss@informatik.hu-berlin.de] 
Sent: 10 August 2006 18:51
To: c-dev@xerces.apache.org
Subject: Re: Reading UNICODE string

Mannion, Enda wrote:
> I don't understand, if I can write this Unicode character then why can I
> not read it.

Enda, (sorry, overlooked the comma)

I wasn't talking about reading, but about transcoding. After reading, the 
character is held as XMLCh, the UTF-16 representation required by the 
standards. If this character has a representation in the local (libc) 
encoding, xercesc::XMLString::transcode delivers the local representation.

> I read using this code and xmlval = ""
>
> n = node->getFirstChild()->getNodeValue();
> xmlval = xercesc::XMLString::transcode(n);      

Did you check the value of 'n'?

HTH,
			Axel



Re: Reading UNICODE string

Posted by Axel Weiß <aw...@informatik.hu-berlin.de>.
Mannion, Enda wrote:
> I don't understand, if I can write this Unicode character then why can I
> not read it.

Enda, (sorry, overlooked the comma)

I wasn't talking about reading, but about transcoding. After reading, the 
character is held as XMLCh, the UTF-16 representation required by the 
standards. If this character has a representation in the local (libc) 
encoding, xercesc::XMLString::transcode delivers the local representation.

> I read using this code and xmlval = ""
>
> n = node->getFirstChild()->getNodeValue();
> xmlval = xercesc::XMLString::transcode(n);      

Did you check the value of 'n'?

HTH,
			Axel
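
The effect described above can be reproduced with the C library alone. The following is a sketch by analogy, not a description of Xerces-C internals: which mechanism XMLString::transcode uses depends on how the library was built, and representableInLocale is a hypothetical helper name. It asks the active C locale whether a single wide character has a multibyte representation.

```cpp
#include <climits>
#include <clocale>   // std::setlocale, used by callers to select the locale
#include <cstddef>
#include <cwchar>

// Hypothetical helper: returns true if the wide character can be converted
// to the multibyte encoding of the currently active C locale.
bool representableInLocale(wchar_t wc) {
    std::mbstate_t state = std::mbstate_t();
    char buf[MB_LEN_MAX];
    // wcrtomb returns (size_t)-1 when the character has no representation.
    return std::wcrtomb(buf, wc, &state) != static_cast<std::size_t>(-1);
}
```

A program starts in the "C" locale until it calls std::setlocale(LC_ALL, ""), so a character such as U+3042 is typically reported as not representable there, while it is representable after switching to a UTF-8 locale such as de_DE.UTF-8.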



RE: Reading UNICODE string

Posted by "Mannion, Enda" <en...@hp.com>.
Axel,

I don't understand: if I can write this Unicode character, why can I not read it?

Enda


Original Message
----------------

I am able to write a Unicode character to my XML file but I am unable to read it.

I read using this code and xmlval = ""

n = node->getFirstChild()->getNodeValue();
xmlval = xercesc::XMLString::transcode(n);      


If the character(s) written are not Unicode there are no problems.



-----Original Message-----
From: Axel Weiß [mailto:aweiss@informatik.hu-berlin.de] 
Sent: 10 August 2006 18:04
To: c-dev@xerces.apache.org
Cc: Mannion, Enda
Subject: Re: Reading UNICODE string

Mannion, Enda wrote:
> If the character(s) written are not Unicode there are no problems.

Mannion,

if the character has no representation in your local codepage, you cannot 
transcode it.

HTH,
			Axel



Re: Reading UNICODE string

Posted by Axel Weiß <aw...@informatik.hu-berlin.de>.
Mannion, Enda wrote:
> If the character(s) written are not Unicode there are no problems.

Mannion,

if the character has no representation in your local codepage, you cannot 
transcode it.

HTH,
			Axel
