Posted to j-users@xerces.apache.org by Mike Lepine <mi...@ezgov.com> on 2002/11/18 17:37:43 UTC

Parsing XML Containing Euro Sign

I searched the Xerces FAQ, tried to review the mailing list archives but they
appeared to be offline, and was not able to find much information on my
question. So, if this has been asked/answered before, I apologize for
reposting.

I am using Xerces version 1.4.2 to parse an XML document containing a Euro
sign character. I create a FileInputStream and pass it to the parser:

            // create input stream from XML file
            FileInputStream inputStream = new FileInputStream(new File(fileName));

            // parse XML
            parser.parse(new InputSource(inputStream));

When the XML document containing the Euro sign is parsed, the Euro sign is
misread and converted to a different character (in this case a question
mark). I tried to change the document encoding to UTF-16 instead of UTF-8,
but this generated an exception stating that UTF-16 was not supported.

In order to write the XML file (containing the Euro sign), I have to make
sure the data is written out as characters instead of bytes because when the
Euro sign is converted to a byte, it looks like the high order byte is
discarded, resulting in the wrong character being written out.
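The byte-truncation effect described above can be reproduced directly. A minimal sketch, assuming Java 7 or later (for try-with-resources and StandardCharsets, which Xerces 1.4.2-era Java did not have); the class and method names are hypothetical. Casting the char U+20AC to a byte keeps only its low 8 bits (0xAC), while writing through an OutputStreamWriter with an explicit charset produces the full UTF-8 sequence, with an XML declaration that names the same encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Xerces.
public class EuroWriteSketch {
    // Casting a char to a byte keeps only its low 8 bits: U+20AC becomes 0xAC.
    static int lowByte(char c) {
        return ((byte) c) & 0xFF;
    }

    // A Writer with an explicit charset emits the full multibyte sequence,
    // and the XML declaration names the same encoding the Writer uses.
    static byte[] writeUtf8(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            out.write("<price>" + text + "</price>\n");
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.printf("cast to byte: %02X%n", lowByte('\u20AC')); // AC
        for (byte b : "\u20AC".getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b & 0xFF);                     // E2 82 AC
        }
        System.out.println();
        System.out.println(new String(writeUtf8("\u20AC"), StandardCharsets.UTF_8));
    }
}
```

The point of routing everything through one Writer is that the bytes on disk and the encoding named in the declaration cannot drift apart.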

Finally, my question is whether I can use Xerces to parse an XML document
containing the Euro sign and if so, how do I do it?

I appreciate any help offered.

Thanks.

- Mike




Re: Parsing XML Containing Euro Sign

Posted by Mike Lepine <mi...@ezgov.com>.
Thanks for the tips. I was definitely not reading in the document using the
same encoding that was used to create it. I am now able to read in Euro
signs when I get the encoding right.
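Two ways of handing a document to the parser behave differently here. A minimal sketch, assuming the standard JAXP API (javax.xml.parsers) rather than the Xerces 1.4.2 `parser` object used above; class, field, and method names are hypothetical. With a byte stream, the parser reads the encoding declaration itself. With a character stream (a Reader), the Reader's charset decides, the declaration is ignored, and a mismatch silently corrupts the Euro sign.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Xerces.
public class EuroReadSketch {
    // A UTF-8 document whose declaration matches its actual bytes.
    static final byte[] DOC = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<price>\u20AC</price>").getBytes(StandardCharsets.UTF_8);

    // Byte stream: the parser detects the encoding from the declaration.
    static String parseBytes(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new ByteArrayInputStream(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // Character stream: the Reader's charset wins over the declaration.
    static String parseWithReader(byte[] xml, Charset charset) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource source = new InputSource(
                new InputStreamReader(new ByteArrayInputStream(xml), charset));
        Document doc = builder.parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseBytes(DOC).equals("\u20AC"));                                   // true
        System.out.println(parseWithReader(DOC, StandardCharsets.ISO_8859_1).equals("\u20AC")); // false
    }
}
```

Passing raw bytes and letting the parser honor the declaration is usually the safer default; a Reader only works if its charset matches how the file was written.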

The only outstanding issue is that line separators are being interpreted as
spaces. For example, if there is a block of XML that looks like this:

<tagx>Line 1.
Line 2.
Line 3.</tagx>

It is being read in as:

Line 1. Line 2. Line 3.

instead of:

Line 1.
Line 2.
Line 3.

as I want it to. I have to assume it is an encoding issue because the line
separators used to be retained after the XML was parsed.
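For reference, XML 1.0 itself does not turn line breaks in element content into spaces: it only normalizes CR LF (and lone CR) to LF. Line breaks become spaces in attribute values, under attribute-value normalization. A minimal check, assuming the JAXP DOM API and hypothetical class and method names, which can help narrow down where the spaces are introduced:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Xerces.
public class LineBreakSketch {
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = parse("<tagx note=\"a\nb\">Line 1.\r\nLine 2.\nLine 3.</tagx>");
        // Element text: line breaks are kept, with CR LF normalized to LF.
        System.out.println(root.getTextContent());
        // Attribute value: the newline is normalized to a single space.
        System.out.println(root.getAttribute("note"));
    }
}
```

If the element text comes back with its line breaks intact, the spaces are being introduced later, for example by code that stores the text in an attribute or by how the result is displayed.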

Once again, thank you for the help.

----- Original Message -----
From: "Joseph Kesselman" <ke...@us.ibm.com>
To: <xe...@xml.apache.org>
Sent: Tuesday, November 19, 2002 11:05 AM
Subject: Re: Parsing XML Containing Euro Sign


> Make sure the encoding declared in your document considers the Euro symbol
> an acceptable character. If it doesn't, (a) change encodings to one that
> does, or (b) use a numeric character escape or (c) if you're in UTF-8, use
> the multibyte sequence which represents that symbol or (d) define an
> entity which maps to (b) or (c), and reference that entity name.
>
> ______________________________________
> Joe Kesselman  / IBM Research




Re: Parsing XML Containing Euro Sign

Posted by Joseph Kesselman <ke...@us.ibm.com>.
Make sure the encoding declared in your document considers the Euro symbol 
an acceptable character. If it doesn't, (a) change encodings to one that 
does, or (b) use a numeric character escape or (c) if you're in UTF-8, use 
the multibyte sequence which represents that symbol or (d) define an 
entity which maps to (b) or (c), and reference that entity name.
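Options (b) and (d) can be checked in a document whose declared encoding has no Euro sign at all. A minimal sketch, assuming the JAXP DOM API; the class name and the entity name `euro` are hypothetical (unlike HTML, XML predefines no `euro` entity, so it must be declared). U+20AC is `&#x20AC;` in hex or `&#8364;` in decimal.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Xerces.
public class EuroEscapeSketch {
    // Parse an ASCII-only document (declared as ISO-8859-1, which lacks the Euro sign).
    static String text(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new ByteArrayInputStream(bytes)))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String decl = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>";
        // (b) numeric character references, hex and decimal
        System.out.println(text(decl + "<p>&#x20AC;&#8364;</p>").equals("\u20AC\u20AC"));
        // (d) a named entity declared in the internal DTD subset
        System.out.println(text(decl
                + "<!DOCTYPE p [<!ENTITY euro \"&#x20AC;\">]><p>&euro;</p>").equals("\u20AC"));
    }
}
```

Because the escapes are plain ASCII, they survive any encoding that covers ASCII, which makes them a robust fallback when the file's encoding cannot be controlled.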

______________________________________
Joe Kesselman  / IBM Research



Re: Elements vs Attributes Performance wise

Posted by Joseph Kesselman <ke...@us.ibm.com>.
Attributes have a bit of overhead for name-based lookup. On the other 
hand, that may be faster than linearly searching children by name, if 
that's what you want.

General advice is not to try to micro-optimize on this basis. Use elements 
for structure, or for things which must contain structure; use attributes 
for simple-valued annotations that help describe that structure.
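The two access patterns contrasted above look like this in DOM code. A minimal sketch, assuming the JAXP DOM API; the element and attribute names (`item`, `currency`, `price`) and the class name are hypothetical. It shows the shape of each lookup and measures nothing about performance.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Xerces.
public class AttrVsChildSketch {
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Attribute: a single name-based lookup on the element.
    static String viaAttribute(Element item) {
        return item.getAttribute("currency");
    }

    // Child element: walk the children linearly until the name matches.
    static String viaChild(Element item) {
        NodeList children = item.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals("currency")) {
                return n.getTextContent();
            }
        }
        return null; // no such child
    }

    public static void main(String[] args) throws Exception {
        Element a = parse("<item currency=\"EUR\"/>");
        Element b = parse("<item><price>12</price><currency>EUR</currency></item>");
        System.out.println(viaAttribute(a) + " " + viaChild(b)); // EUR EUR
    }
}
```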

______________________________________
Joe Kesselman  / IBM Research



Elements vs Attributes Performance wise

Posted by Rob Outar <ro...@ideorlando.org>.
Does anyone have any info on the performance impact of using elements vs.
attributes?

Thanks,

Rob

