Posted to j-dev@xerces.apache.org by Lisa Retief <li...@exinet.co.za> on 2000/08/30 13:56:15 UTC

SAX Parsing "&" or "&amp;"

Hi,

A colleague of mine who is not subscribed to the list asked me to forward
this query.

I am using Xerces 1.1.2 and am encountering the following problem...

When I SAX parse a document containing something like this:
    <tag>
        blah blah & yyy
    </tag>
I get the following error:
    The entity name must immediately follow the '&' in the entity reference

When I replace '&' with '&amp;', I only get 'yyy' out, without the 'blah blah
&amp;'.

please help!

Thanks, Lisa

Re: SAX Parsing "&" or "&amp;"

Posted by David Waite <ma...@ufl.edu>.

No problem - SAX is sending multiple characters(...) events because 
of the entity reference. You must escape the XML special characters (as 
you have already found out), and you must concatenate multiple 
characters(...) events to get the full text, i.e.

characters("blah blah ")
characters("&")
characters(" yyy")
(yes, these aren't the actual arguments, and the parser delivers the
entity already resolved to '&', but this is the gist of what is happening)
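
The concatenation described above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the class name TextCollector and its collect() helper are hypothetical, it uses the standard JAXP SAXParserFactory rather than any Xerces 1.1.2 specific class, and it assumes elements contain only text (no mixed content).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers every characters() chunk for an element
// and treats the text as complete only when endElement() arrives.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // start a fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append only the slice [start, start + length); the array may hold other data.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        texts.add(buffer.toString().trim()); // full text, however many chunks arrived
        buffer.setLength(0);
    }

    // Convenience entry point (an assumption for this sketch, not a Xerces API).
    static List<String> collect(String xml) {
        try {
            TextCollector handler = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.texts;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Because the text is only read in endElement(), the result is the same whether the parser delivers it in one characters() call or in several.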

-David Waite

Re: SAX Parsing "&" or "&amp;"

Posted by Elliotte Rusty Harold <el...@metalab.unc.edu>.

At 1:56 PM +0200 8/30/00, Lisa Retief wrote:
>Hi,
>
>A colleague of mine who is not subscribed to the list asked me to forward
>this query.
>
>I am using Xerces 1.1.2 and am encountering the following problem...
>
>When I SAX parse a document containing something like this:
>    <tag>
>        blah blah & yyy
>    </tag>
>I get the following error:
>    The entity name must immediately follow the '&' in the entity reference
>

That's the correct and expected behavior in this case. The document 
is not well-formed. An exception should be thrown.
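
A quick way to see this for yourself is to parse both variants and check which one the parser rejects. The following is a small sketch under assumptions: the class WellFormedCheck and its isWellFormed() method are hypothetical names, and it goes through the standard JAXP SAXParserFactory rather than Xerces-specific classes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for this sketch.
class WellFormedCheck {
    // Returns true if the parser accepts the input, false if it reports a fatal error.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // e.g. a bare '&' that is not followed by an entity name
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this helper, the document containing the bare '&' should be rejected, while the same document with '&amp;' should be accepted.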

>When I replace '&' with '&amp;', I only get 'yyy' out, without the 'blah blah
>&amp;'.
>

You did not provide enough details to conclusively diagnose your 
problem. However, it certainly looks like it may be a 
misunderstanding of how Xerces works. If you're using DOM, I suspect 
that Xerces is returning three nodes as the content of the tag 
element, and you're only looking at the first one. If you're using 
SAX, I suspect that Xerces is calling characters() three times and 
you're only looking at the first call. There's no guarantee that a 
single characters() call delivers the maximum contiguous run of text. 
But whatever you're doing, I doubt Xerces is at fault here.
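
For the DOM case, one possible approach is to walk all the children of the element and join the text nodes, rather than reading only the first child. This is a sketch under assumptions: DomText, textOf() and rootText() are hypothetical names, and it uses the standard JAXP DocumentBuilderFactory, so how many text nodes you actually get will depend on the parser and its settings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for this sketch.
class DomText {
    // Concatenates every Text/CDATA child of the element, not just the first one.
    static String textOf(Element element) {
        StringBuilder text = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString();
    }

    // Parses the XML string and returns the trimmed text of the root element.
    static String rootText(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return textOf(root).trim();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

The loop gives the same result whether the content arrives as one text node or as several adjacent ones.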

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+