Posted to general@xml.apache.org by James McCarthy <ja...@webxi.com> on 2000/01/28 16:52:39 UTC

SAX Handler Question

I am trying to parse an XML file and then reproduce it (or a portion of it)
from the handler events. How do you tell the difference between a tag
terminated with /> and an empty element with a separate end tag?

For example, how do I know the difference between "<MYTAG ATTRIBUTE="DATA"
/>" and "<MYTAG ATTRIBUTE="DATA"></MYTAG>" from the SAX events?

I would like to keep the exact content of the input file or portions of the
input file that I am sending on.

Thanks,

Jim McCarthy
WebXi, Inc.


Re: SAX Handler Question

Posted by Andy Clark <an...@apache.org>.
James McCarthy wrote:
> the answer is that slight modifications should be acceptable as long
> as they do not change the data integrity.

I agree.

And if your program knows what documents it will be writing,
then it can decide for each element type how to write it. But
if you're making a general XML writer, then you have to make
a decision. 
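
A minimal sketch of the "decide for each element type" idea, in plain Java. The class name EmptyElementWriter and the set of names passed to it are hypothetical and only illustrate a per-application policy; this is not part of SAX, Xerces, or the org.apache.xml.serialize package.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class EmptyElementWriter {

    // Hypothetical policy: element types this application writes as <name/>.
    private final Set<String> selfClosing;

    public EmptyElementWriter(String... selfClosingNames) {
        this.selfClosing = new HashSet<>(Arrays.asList(selfClosingNames));
    }

    // Serialize an element that has no content and no attributes.
    public String writeEmpty(String qName) {
        if (selfClosing.contains(qName)) {
            return "<" + qName + "/>";
        }
        return "<" + qName + "></" + qName + ">";
    }

    public static void main(String[] args) {
        EmptyElementWriter writer = new EmptyElementWriter("br", "hr");
        System.out.println(writer.writeEmpty("br"));
        System.out.println(writer.writeEmpty("textarea"));
    }
}
```

A general-purpose writer has no such list, which is why it has to pick one form for every empty element.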

Perhaps there is a serializer (+ settings) that will handle 
this for you. Check out the org.apache.xml.serialize package.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

RE: SAX Handler Question

Posted by James McCarthy <ja...@webxi.com>.

> -----Original Message-----
> From: Andy Clark [mailto:andyc@apache.org]
> Sent: Friday, January 28, 2000 2:08 PM
> To: general@xml.apache.org
> Subject: Re: SAX Handler Question
>
>
> James McCarthy wrote:
> > I would like to keep the exact content of the input file or
> > portions of the input file that I am sending on.
>
> Both standard APIs (DOM and SAX) are lossy. They do not
> communicate all of the information in the original source
> document. With SAX you can output an equivalent document
> but not an identical one.
>
> <tag/> and <tag></tag> produce the same set of DOM nodes
> and SAX callbacks. So what's the difference?

I agree that from a parser's point of view there is no difference. But since
XML was designed to be readable by users as well as parsers, the difference
may be unacceptable. I was hoping for an easy answer. I guess there isn't
one, or the answer is that slight modifications should be acceptable as long
as they do not change the data integrity.

>
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org


Re: SAX Handler Question

Posted by Andy Clark <an...@apache.org>.
James McCarthy wrote:
> I would like to keep the exact content of the input file or 
> portions of the input file that I am sending on.

Both standard APIs (DOM and SAX) are lossy. They do not 
communicate all of the information in the original source 
document. With SAX you can output an equivalent document
but not an identical one.

<tag/> and <tag></tag> produce the same set of DOM nodes
and SAX callbacks. So what's the difference?
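
To see this concretely, here is a small sketch that records the SAX callbacks for both spellings and compares them. It assumes the JAXP SAX parser that ships with the JDK; the class name SaxEventDemo and the helper events() are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventDemo {

    // Parse the XML text and return the SAX callbacks seen, in order.
    public static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                log.add("chars " + new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                log.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<tag/>"));
        System.out.println(events("<tag></tag>"));
    }
}
```

Both calls should yield the same list (one start event, one end event), so a handler that looks only at these callbacks has nothing to distinguish the two forms by.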

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: SAX Handler Question

Posted by Pierpaolo Fumagalli <pi...@apache.org>.
James McCarthy wrote:
> 
> I was not aware of the Locator.

Nobody is... It's basically used for reporting the right location in
error messages, but we can "tweak" its use (I personally discovered
it 12 seconds before sending that mail).

> When I read about it and tested what you
> said it worked.

Good :)

	Pier

-- 
--------------------------------------------------------------------
-          P              I              E              R          -
stable structure erected over water to allow the docking of seacraft
<ma...@betaversion.org>    <http://www.betaversion.org/~pier/>
--------------------------------------------------------------------
- ApacheCON Y2K: Come to the official Apache developers conference -
-------------------- <http://www.apachecon.com> --------------------

RE: SAX Handler Question

Posted by James McCarthy <ja...@webxi.com>.
I was not aware of the Locator. When I read about it and tested what you
said it worked.

Thanks for your help.

Jim

> -----Original Message-----
> From: Pierpaolo Fumagalli [mailto:pier@apache.org]
> Sent: Friday, January 28, 2000 8:44 PM
> To: general@xml.apache.org
> Subject: Re: SAX Handler Question
>
>
> James McCarthy wrote:
> >
> > I am trying to parse an XML file and then reproduce it (or a portion
> > of it) from the handler events. How do you tell the difference between
> > a tag terminated with /> and an empty element with a separate end tag?
> >
> > For example, how do I know the difference between
> > "<MYTAG ATTRIBUTE="DATA" />" and "<MYTAG ATTRIBUTE="DATA"></MYTAG>"
> > from the SAX events?
> >
> > I would like to keep the exact content of the input file or portions
> > of the input file that I am sending on.
>
> I think that, if you use SAX, you can use the Locator to differentiate
> those exactly. If you have <ELEM></ELEM>, the values of
> getColumnNumber() and getLineNumber() differ between the start and end
> events, while if you have <ELEM/> they (should) be the same...
> Check it out...
>
> 	Pier


Re: SAX Handler Question

Posted by Pierpaolo Fumagalli <pi...@apache.org>.
James McCarthy wrote:
> 
> I am trying to parse an XML file and then reproduce it (or a portion of it)
> from the handler events. How do you tell the difference between a tag
> terminated with /> and an empty element with a separate end tag?
> 
> For example, how do I know the difference between "<MYTAG ATTRIBUTE="DATA"
> />" and "<MYTAG ATTRIBUTE="DATA"></MYTAG>" from the SAX events?
> 
> I would like to keep the exact content of the input file or portions of the
> input file that I am sending on.

I think that, if you use SAX, you can use the Locator to differentiate
those exactly. If you have <ELEM></ELEM>, the values of
getColumnNumber() and getLineNumber() differ between the start and end
events, while if you have <ELEM/> they (should) be the same...
Check it out...
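
A rough sketch of that check, reading the suggestion as "compare the Locator position reported in startElement() with the one reported in endElement()". It assumes the JAXP parser bundled with the JDK; SAX does not guarantee this behaviour, so other parsers may report positions differently. The class name LocatorDemo, the helper describe(), and the labels "empty-tag" and "start-end" are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LocatorDemo {

    // For each element, report whether the Locator position was identical
    // at startElement() and endElement() (a heuristic, not a SAX guarantee).
    public static List<String> describe(String xml) {
        final List<String> report = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;
            // Positions recorded at startElement(), one per open element.
            private final Deque<int[]> starts = new ArrayDeque<>();

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                starts.push(position());
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                int[] start = starts.pop();
                int[] end = position();
                boolean same = start[0] == end[0] && start[1] == end[1];
                report.add(qName + (same ? "=empty-tag" : "=start-end"));
            }

            private int[] position() {
                if (locator == null) {
                    throw new IllegalStateException("parser supplied no Locator");
                }
                return new int[] {locator.getLineNumber(), locator.getColumnNumber()};
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return report;
    }

    public static void main(String[] args) {
        System.out.println(describe("<r><a/><b></b></r>"));
    }
}
```

Elements are reported in the order their end events arrive, so the root comes last. Treat the result as a hint and test it against the parser you actually deploy.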

	Pier

-- 
--------------------------------------------------------------------
-          P              I              E              R          -
stable structure erected over water to allow the docking of seacraft
<ma...@betaversion.org>    <http://www.betaversion.org/~pier/>
--------------------------------------------------------------------
- ApacheCON Y2K: Come to the official Apache developers conference -
-------------------- <http://www.apachecon.com> --------------------