Posted to c-dev@xerces.apache.org by Qian XIA <xi...@kick.gr.jp> on 2000/12/11 09:11:29 UTC

About DOMString.transcode()

Hi, I am back again asking for help. Thanks in advance.

I am processing XML data in the SJIS (Shift_JIS) encoding, without using ICU.
My approach is to convert the encoding beforehand, using the following
functions:

SJIStoUTF8(char *inSJIS, int inLen, char *outUTF8);   /* inLen = strlen(inSJIS) */
UTF8toSJIS(char *inUTF8, char *outSJIS);

When I run the functions on their own, i.e. first convert the Japanese text
to UTF-8 and then convert the UTF-8 back to SJIS, the returned Japanese text
is displayed correctly. (The main functions I use inside them are
MultiByteToWideChar() and WideCharToMultiByte().)

So I built and parsed the XML data this way:
1. Convert the text from SJIS to UTF-8.
2. Write the UTF-8 text as a DOM_Text node and insert it into the XML data.
3. Parse the XML data.

Up to this point there is no problem. Suppose the completed XML data is as
follows:
  <elemTag>UTF8txt</elemTag>

But when I try to convert the DOMString data to char data (using
transcode()) so that it can be passed to UTF8toSJIS(), nothing is returned.
So I tried printing the text data this way:
    cout << elemTag.getFirstChild().getNodeValue().transcode() << endl;

Nothing appears. If I test with "ABCD" instead of Japanese text, the result
is perfectly good. It seems DOMString.transcode() cannot recognize the UTF-8
text, so it ignores it and returns nothing.

What can I do to solve this problem? Why does DOMString.transcode() ignore
UTF-8 encoded text?

Thanks a lot,

_/_     Join those who intrigue you         _/_
_/_       and try to become intriguing too  _/_
 _/_ _______________________________________  _/_
  _/_    Name : Qian Xia                    _/_
   _/_   Email : xqianxia@ysb.nsd.co.jp _/_
    _/_ ____________________________________ _/_


Re: About DOMString.transcode()

Posted by Dean Roddey <dr...@charmedquark.com>.
If you want to transcode to something other than the local code page (LCP), create an XMLTranscoder object. Use the transcoding service object in XMLPlatformUtils to create a transcoder for the encoding you want, and use that transcoder to do the conversion.
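
As a rough illustration of what a UTF-8 transcoder does with the 16-bit characters a DOMString holds, here is a library-independent sketch in standard C++. It does not use the Xerces XMLTranscoder class. The helper names appendUtf8 and utf16ToUtf8 are hypothetical, and the input is assumed to be UTF-16 code units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces): append one Unicode code point
// to `out` as one to four UTF-8 bytes.
static void appendUtf8(unsigned long cp, std::string &out)
{
    assert(cp <= 0x10FFFFUL);
    if (cp < 0x80UL) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800UL) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000UL) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert `len` UTF-16 code units to a UTF-8 string.
// The buffer does not need to be null-terminated.
std::string utf16ToUtf8(const unsigned short *src, std::size_t len)
{
    std::string out;
    for (std::size_t i = 0; i < len; ++i) {
        unsigned long cp = src[i];
        if (cp >= 0xD800UL && cp <= 0xDBFFUL && i + 1 < len &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine the pair.
            cp = 0x10000UL + ((cp - 0xD800UL) << 10) + (src[i + 1] - 0xDC00UL);
            ++i;
        } else if (cp >= 0xD800UL && cp <= 0xDFFFUL) {
            // Unpaired surrogate: substitute U+FFFD REPLACEMENT CHARACTER.
            cp = 0xFFFDUL;
        }
        appendUtf8(cp, out);
    }
    return out;
}
```

With a result like this, the UTF-8 bytes could be handed to UTF8toSJIS() instead of the output of transcode(). How the 16-bit buffer is obtained from the DOMString (for example via rawBuffer() and length()) is an assumption to check against the Xerces version in use.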

--------------------------
Dean Roddey
The CIDLib C++ Frameworks
Charmed Quark Software
droddey@charmedquark.com
http://www.charmedquark.com

"It takes two buttocks to make friction"
    - African Proverb




Re: About DOMString.transcode()

Posted by Qian XIA <xi...@kick.gr.jp>.
To state my problem another way:
The API documentation for "char *DOMString::transcode()" says that it
returns a copy of the string, transcoded to the local code page. What I want
is simply to convert the DOMString object to a string of 8-bit Windows (ANSI)
characters, not to the local code page. That should be possible with Xerces,
shouldn't it?
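
To illustrate what "transcoded to the local code page" can imply, the sketch below converts wide characters with the standard wcstombs() function under the "C" locale. ASCII text converts, while Japanese characters have no representation there and the conversion fails. Whether transcode() behaves this way internally on a given platform is an assumption, not something confirmed in this thread; toLocalCodePage is a hypothetical helper.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: convert a null-terminated wide string to the
// multibyte encoding of the current C locale (the "local code page").
// Sets `ok` to false and returns an empty string when some character
// cannot be represented in that encoding.
std::string toLocalCodePage(const wchar_t *src, bool &ok)
{
    char buf[256];
    // wcstombs() returns (size_t)-1 if it meets an unconvertible character.
    const std::size_t n = std::wcstombs(buf, src, sizeof(buf) - 1);
    ok = (n != static_cast<std::size_t>(-1));
    if (!ok) {
        return std::string();
    }
    return std::string(buf, n);
}
```

Called after std::setlocale(LC_CTYPE, "C"), this returns "ABCD" for L"ABCD" but reports failure for L"\u65E5\u672C", which mirrors the symptom of ASCII text working while Japanese text yields nothing.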

Any hint would be helpful. Thanks a lot,

Qian