Posted to user@commons.apache.org by Balazs Somogyi <ba...@FATHOMTECHNOLOGY.com> on 2003/02/26 15:34:36 UTC
digester + DOM
Hi,
Is it possible to feed digester with an already parsed XML document
(actually XHTML)?
I'm using JTidy to parse HTML and would like to extract some of its
elements but don't want to traverse the tree manually.
Thanks in advance for your help,
Balazs
Re: [OT] SAX & DOM was: digester + DOM
Posted by robert burrell donkin <ro...@blueyonder.co.uk>.
On Wednesday, February 26, 2003, at 04:08 PM, Erik Price wrote:
> James Strachan wrote:
>> Rather than using JTidy to parse HTML (which makes a DOM) you could use
>> NekoHTML which is-a SAX parser that can handle HTML. Then you don't need
>> to
>> use a DOM.
>
> Sorry to hijack a thread like this, but I was curious -- if you're
> building an in-memory representation of an XML document, is there still a
> compelling reason to use a SAX parser? Or should you just use DOM in
> that case?
james can probably give you a pretty definitive answer to this question
but here's my two penneth.
i think the answer depends on which in-memory representation you want.
DOM is a generic representation: different kinds of xml (eg. with different
schemas) are represented using the same objects. this may be good or bad
depending on the circumstances. if you're interested in general xml then a
general representation is best. but there's more than DOM out there. there
are several general representations (eg. dom4j) which offer more
java-friendly APIs.
even when you're dealing with general representations, SAX (and therefore
digester) can have advantages over DOM. with SAX it is easy to filter so
that only the part of the object model you're interested in is created.
digester has a rule that creates partial DOM object models which can be
used in this way.
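A minimal sketch of the filtering idea, using only the JDK's SAX and DOM
classes rather than Digester's own rule. PartialDomHandler and its extract
method are hypothetical names made up for this illustration. The handler
ignores every SAX event until it sees an element with the wanted name, then
builds DOM nodes for that subtree only.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of Digester): builds DOM fragments only
// for subtrees rooted at elements named `wanted`, and skips everything else.
class PartialDomHandler extends DefaultHandler {
    private final String wanted;
    private final Document doc;                        // used as a node factory only
    private final Deque<Element> open = new ArrayDeque<>();
    final List<Element> fragments = new ArrayList<>();

    PartialDomHandler(String wanted, Document doc) {
        this.wanted = wanted;
        this.doc = doc;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (open.isEmpty() && !qName.equals(wanted)) {
            return;                                    // outside any wanted subtree
        }
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (open.isEmpty()) {
            fragments.add(e);                          // root of a new fragment
        } else {
            open.peek().appendChild(e);
        }
        open.push(e);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) {
            open.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (!open.isEmpty()) {
            open.pop();
        }
    }

    // Parses `xml` and returns one detached DOM element per wanted subtree.
    static List<Element> extract(String xml, String wanted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            PartialDomHandler handler = new PartialDomHandler(wanted, doc);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.fragments;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

Calling PartialDomHandler.extract(xml, "p") would return one Element per
top-level <p> subtree; nothing is allocated for the rest of the document.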
on the other hand, a very common use case is having a particular object
model in mind which is represented by strongly typed java beans. in this
case, though the mapping is to an in-memory object model, there is a
considerable performance benefit (both speed and memory) in using SAX
rather than DOM. there are a number of technologies (eg. castor, JAXB,
betwixt) which do this - and digester is also commonly used for this
purpose.
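To make the bean-mapping case concrete, here is a rough hand-written sketch
using plain JDK SAX. It is not how digester, castor, JAXB or betwixt are
implemented; the Link bean and LinkHandler class are hypothetical names
chosen for this example. The point is only that the beans are populated
straight from the SAX events, with no intermediate DOM tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical strongly typed bean for an XHTML <a> element.
class Link {
    final String href;
    final StringBuilder text = new StringBuilder();

    Link(String href) {
        this.href = href;
    }
}

// Hypothetical handler: maps each <a> element to a Link bean.
class LinkHandler extends DefaultHandler {
    final List<Link> links = new ArrayList<>();
    private Link current;                              // non-null while inside an <a>

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (qName.equals("a")) {
            current = new Link(atts.getValue("href"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.text.append(ch, start, length);    // includes text of nested elements
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("a") && current != null) {
            links.add(current);
            current = null;
        }
    }

    // Parses `xhtml` and returns the links in document order.
    static List<Link> parse(String xhtml) {
        try {
            LinkHandler handler = new LinkHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.links;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

With digester the same mapping would be declared as rules rather than coded
by hand, but the event flow underneath is the same.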
- robert
[OT] SAX & DOM was: digester + DOM
Posted by Erik Price <ep...@ptc.com>.
James Strachan wrote:
> Rather than using JTidy to parse HTML (which makes a DOM) you could use
> NekoHTML which is-a SAX parser that can handle HTML. Then you don't need to
> use a DOM.
Sorry to hijack a thread like this, but I was curious -- if you're
building an in-memory representation of an XML document, is there still
a compelling reason to use a SAX parser? Or should you just use DOM in
that case?
I haven't really done much with XML parsing and was wondering about this.
Erik
Re: digester + DOM
Posted by James Strachan <ja...@yahoo.co.uk>.
Rather than using JTidy to parse HTML (which makes a DOM) you could use
NekoHTML which is-a SAX parser that can handle HTML. Then you don't need to
use a DOM.
NekoHTML plugs right into Digester allowing you to fire Digester rules
straight from the SAX events coming out of the HTML
http://www.apache.org/~andyc/neko/doc/html/
James
-------
http://radio.weblogs.com/0112098/
Re: digester + DOM
Posted by Janek Bogucki <ya...@studylink.com>.
Hi Balazs,
> From: "Balazs Somogyi" <ba...@FATHOMTECHNOLOGY.com>
> Reply-To: "Jakarta Commons Users List" <co...@jakarta.apache.org>
> Date: Wed, 26 Feb 2003 15:34:36 +0100
> To: <co...@jakarta.apache.org>
> Subject: digester + DOM
>
> Hi,
>
> Is it possible to feed digester with an already parsed XML document
> (actually XHTML)?
> I'm using JTidy to parse HTML and would like to extract some of its
> elements but don't want to traverse the tree manually.
>
> Thanks in advance for your help,
> Balazs
>
You could address the elements you want with XPath. This is likely to be a
better approach than serializing the XHTML object tree and having Digester
act on that.
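For comparison, if the Digester route is still wanted, the tree does not
have to be serialized to text first. A JAXP identity transform can replay an
existing DOM as SAX events into any ContentHandler. The sketch below uses
only JDK classes; DomReplay and its methods are hypothetical names, and the
demo handler just records element names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for this illustration.
class DomReplay {
    // Walks an existing DOM tree and fires the matching SAX events at `handler`.
    static void replay(Document doc, ContentHandler handler) {
        try {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new SAXResult(handler));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Demo: parse a string into a DOM, replay it, record element names in order.
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            replay(doc, new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    names.add(qName);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

Since Digester extends DefaultHandler, passing a configured Digester as the
handler should in principle fire its rules. That is an assumption, not
something tested here against any Digester release or against the DOM
implementation JTidy produces.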
Jakarta has an XPath implementation
http://jakarta.apache.org/commons/jxpath/index.html
There is also Jaxen (http://jaxen.sourceforge.net/) which can be used to
address W3C DOM, dom4j, JDOM and XOM object trees.
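As an illustration of the XPath approach, here is a short sketch that pulls
every link target out of a W3C DOM. Note that it uses the JDK's own
javax.xml.xpath package (available from Java 5 on) rather than JXPath or
Jaxen, so that it has no external dependencies. XPathExtract and its methods
are hypothetical names, and the DOM is built from a string only to keep the
example self-contained; with JTidy you would pass its Document instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for this illustration.
class XPathExtract {
    // Builds a (non-namespace-aware) DOM from a string, for the demo only.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates `expr` against `doc` and returns the text of each selected node.
    static List<String> select(Document doc, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, XPathExtract.select(doc, "//a/@href") returns the href value of
every <a> element that has one. If the tree was built namespace-aware and the
XHTML elements carry a namespace, the expression would need a prefix and a
NamespaceContext, which this sketch does not cover.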
-Janek