Posted to c-dev@xerces.apache.org by ma...@kodak.com on 2000/06/05 19:54:42 UTC

reading and writing in UTF-16 format


From: Mahesh Bhat

Hi,
I am writing XML data to a file in UTF-16 format, but when I try to read the
file using the DOMPrint example I get the following error:
"Fatal Error at file "E:\data8.xml", line 1, column 2
   Message: Invalid document structure"


Here is the code I am using to write the file on the Windows NT platform:

wstring p =L"\xFFFE\
<?xml version='1.0' encoding='UTF-16'?>\n\
<!DOCTYPE company [\n\
<!ELEMENT company     (product,category,developedAt)>\n\
<!ELEMENT product     (#PCDATA)>\n\
<!ELEMENT category    (#PCDATA)>\n\
<!ATTLIST category idea CDATA #IMPLIED>\n\
<!ELEMENT developedAt (#PCDATA)>\n\
]>\n\n\
<company>\n\
    <product>XML4C</product>\n\
    <category idea='great'>XML Parsing Tools</category>\n\
    <developedAt>\n\
      IBM Center for Java Technology, Silicon Valley, Cupertino, CA\n\
    </developedAt>\n\
</company>\
";
  FILE *stream ;
     stream = fopen("e:\\data8.xml", "w+" );

     //fwrite((void*)m,sizeof(char),2,stream);
     fwrite( p.c_str(), sizeof(wchar_t), p.size(), stream );
     fclose(stream);

What am I doing wrong?
Thanks in advance.



Re: reading and writing in UTF-16 format

Posted by Dean Roddey <dr...@charmedquark.com>.
I'm not sure what the primary problem is, but it is almost certainly in the
code you are using to write the data out and its relationship to the wide
string class. Try this:

Get rid of wstring and just declare a basic L"" string. Get its length,
multiply by the size of a wide character, and write that many bytes out. If
that works, then you are basically OK, and there is something about wstring
that you are misunderstanding.
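
A minimal sketch of this experiment, assuming a standard C++ compiler. The
helper name writeWideRaw and the file name in the usage line are hypothetical.
Opening the file in binary mode ("wb") is an assumption that goes beyond the
advice above: on Windows, text mode expands each 0x0A byte to 0x0D 0x0A, which
would shift the bytes of 16-bit data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cwchar>

// Hypothetical helper: writes a wide string to `path` as raw wchar_t units.
// Returns the number of bytes written, or -1 if the file cannot be opened.
long writeWideRaw(const char* path, const wchar_t* text)
{
    assert(path != nullptr && text != nullptr);

    // Binary mode is an assumption here; it prevents newline translation.
    std::FILE* stream = std::fopen(path, "wb");
    if (stream == nullptr)
        return -1;

    // Length in wide characters; fwrite multiplies by the element size.
    const std::size_t count = std::wcslen(text);
    const std::size_t written =
        std::fwrite(text, sizeof(wchar_t), count, stream);
    std::fclose(stream);

    return static_cast<long>(written * sizeof(wchar_t));
}
```

Calling writeWideRaw("data8.xml", L"<?xml version='1.0'?><company/>") and
comparing the returned byte count with the size of the file on disk is a quick
way to tell whether the output is what you expect.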

Some other points:

1. If you have the <?xml declaration (XMLDecl), then you don't really need a
BOM.
2. You also don't need the encoding="UTF-16". In fact, that is bound to break
if you move this code to a platform where the wide character is not UTF-16
(i.e. it is 32 bits).
3. If you are feeding the data straight from a memory buffer, use the magic
encoding name XMLUni::fgXMLChEncodingString. This is guaranteed to be
correct for a local L"" prefixed string and has the best performance.
However, it is not appropriate for external files unless you know for
sure that the file really is in the local L"" format (though in your case it
is).
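
The portability concern in point 2 can be sidestepped by producing the bytes
explicitly instead of relying on the in-memory layout of wchar_t. The sketch
below is one way to do that, not something taken from Xerces. It assumes a
C++11 compiler (std::u16string and u"" literals did not exist when this thread
was written), and the helper name toUtf16LeBytes is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: serialises UTF-16 code units as little-endian bytes,
// preceded by the byte order mark U+FEFF (stored as the bytes FF FE).
std::vector<unsigned char> toUtf16LeBytes(const std::u16string& text)
{
    std::vector<unsigned char> bytes;
    bytes.reserve(2 + 2 * text.size());

    // Byte order mark, low byte first.
    bytes.push_back(0xFF);
    bytes.push_back(0xFE);

    for (char16_t unit : text) {
        bytes.push_back(static_cast<unsigned char>(unit & 0xFF));        // low
        bytes.push_back(static_cast<unsigned char>((unit >> 8) & 0xFF)); // high
    }

    // Two bytes for the BOM plus two per code unit.
    assert(bytes.size() == 2 + 2 * text.size());
    return bytes;
}
```

The resulting buffer has the same layout on every platform, whatever the size
of wchar_t, and can be written with fwrite to a file opened in binary mode.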

--------------------------
Dean Roddey
The CIDLib Class Libraries
Charmed Quark Software
droddey@charmedquark.com
http://www.charmedquark.com

"Give me immortality, or give me death"



Re: reading and writing in UTF-16 format

Posted by Andy Heninger <an...@jtcsv.com>.
From: <ma...@kodak.com>
> I am writing  XML data to file in UTF-16 format but when I try to read the
> file using DOMPrint example I get following error
> "Fatal Error at file "E:\data8.xml", line 1, column 2
>    Message: Invalid document structure"


Two suggestions...

1.  Look at the actual bytes in your file and verify that they are really
    what you were trying for.  If you have Cygwin or equivalent tools,
    this command will do it:

       od -h yourfile

2.  Get the (free) Unipad Unicode editor, which can read, write
    and edit files in many encodings, including UTF-8, UTF-16 and
    UCS-4.  Create the equivalent file with it, and see how
    it differs from yours.

    Unipad can be downloaded from
    http://www.sharmahd.com/unipad/
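
If neither tool is at hand, the first check in suggestion 1 can be done in a
few lines of code. The sketch below names the encoding signalled by a leading
byte order mark; the helper name sniffBom and its return strings are
hypothetical, and it only looks at the first few bytes of a buffer you have
already read from the file in binary mode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: reports which encoding a leading BOM indicates.
std::string sniffBom(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);

    // Test the 4-byte marks first: the UTF-32LE mark FF FE 00 00 starts
    // with the same two bytes as the UTF-16LE mark.
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE &&
        data[2] == 0x00 && data[3] == 0x00)
        return "UTF-32LE";
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 &&
        data[2] == 0xFE && data[3] == 0xFF)
        return "UTF-32BE";
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "no BOM";
}
```

One thing worth looking for in the dump: the byte order mark is the code point
U+FEFF, which a little-endian machine stores as the bytes FF FE. A leading
L"\xFFFE" written from a little-endian wchar_t is stored as FE FF, which this
helper would report as UTF-16BE even though the rest of the data is
little-endian.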


Andy Heninger
IBM XML Technology Group, Cupertino, CA
heninger@us.ibm.com