Posted to j-users@xerces.apache.org by Ramon F Herrera <ra...@patriot.net> on 2010/04/20 22:25:27 UTC
Question about performance for COM/SAX gurus
Subtitle 1: How to make SAX fly.
Subtitle 2: Should I use DOM instead?
My application retrieves several items (attributes and text) from large
XML files. Such items are used to create a spreadsheet. The app is based
on JAXP, and the code contains many lines like these:
cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@user");
cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@project");
cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@projectpath");
cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@title");
cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@notes");
cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@computer");
[...]
(note: the 'oneItemAtATime()' function implements an XPath query).
Performance is rather poor because the whole XML file is completely
scanned by SAX (internally used by the 'XPathDemo' app that comes with
the JAXP distro) for every item above. To make things worse, multiple
XML files may be opened sequentially, and the items above are retrieved
from each XML file.
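The body of `oneItemAtATime()` was not posted, so the version below is a hypothetical reconstruction that shows why the cost grows this way. It assumes the helper calls the standard `javax.xml.xpath.XPath.evaluate(String, InputSource)`. That method has no parsed tree to reuse, so (in the JDK implementation, as far as I know) each call reads and parses the file again. Six lookups therefore mean six full parses of the same file.

```java
import java.io.File;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class OneItemAtATime {
    // Hypothetical reconstruction; the real helper was not shown in the post.
    // evaluate(String, InputSource) must read and parse the file on each call.
    public static String oneItemAtATime(File xmlFile, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(xmlFile.toURI().toString());
        return xpath.evaluate(expression, source);
    }

    public static void main(String[] args) throws Exception {
        String[] attrs = {"user", "project", "projectpath", "title", "notes", "computer"};
        for (String path : args) {
            File xmlFile = new File(path);
            // One full parse per attribute: six parses for every file.
            for (String a : attrs) {
                System.out.println(a + " = " + oneItemAtATime(xmlFile, "//root/creator/@" + a));
            }
        }
    }
}
```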
I have considered two alternatives to improve performance, but would
like to request some advice.
(1) Rewrite the XPath query (how? details are most welcome!), basing it
on DOM. That way the file will be read only once from the filesystem.
IOW, remove SAX from the application.
(2) Stick with SAX, but in a smart way. As it reads the XML tree, all
the required items above are captured. Again, the question is how.
As they say, the devil is in the details.
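Option (2) can be sketched with a plain SAX `DefaultHandler` that records every wanted attribute of `<creator>` during one pass. This is a minimal sketch under several assumptions: `<root>` is the document element (so `//root/creator` is treated as `/root/creator`), no namespaces are involved, and only attributes are collected. Text items would also need a `characters()` override. The class name `CreatorHandler` and its `read` method are hypothetical.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CreatorHandler extends DefaultHandler {
    // Attributes wanted from /root/creator (taken from the queries in the post).
    private static final String[] WANTED =
        {"user", "project", "projectpath", "title", "notes", "computer"};

    private final Map<String, String> values = new HashMap<>();
    private int depth = 0;              // current element depth; 1 = document element
    private boolean underRoot = false;  // true if the document element is <root>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (depth == 1) {
            underRoot = qName.equals("root");
        } else if (depth == 2 && underRoot && qName.equals("creator")) {
            for (String name : WANTED) {
                String v = atts.getValue(name);
                // Keep the first occurrence, like the XPath string value would.
                if (v != null) values.putIfAbsent(name, v);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    // One pass over the file collects every wanted attribute.
    public static Map<String, String> read(File xmlFile) throws Exception {
        CreatorHandler handler = new CreatorHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(xmlFile, handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        for (String path : args) {
            System.out.println(path + ": " + read(new File(path)));
        }
    }
}
```

Each file is parsed exactly once, however many attributes are requested.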
TIA for sharing your expertise...
Regards,
-Ramon
---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org
Re: Question about performance for COM/SAX gurus
Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Elliotte Rusty Harold <el...@ibiblio.org> wrote on 04/20/2010 08:16:02 PM:
> On Tue, Apr 20, 2010 at 4:25 PM, Ramon F Herrera <ra...@patriot.net> wrote:
> >
> > Subtitle 1: How to make SAX fly.
> > Subtitle 2: Should I use DOM instead?
> >
> > My application retrieves several items (attributes and text) from large XML
> > files. Such items are used to create a spreadsheet. The app is based on
> > JAXP, and the code contains many lines like these:
> >
> > cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@user");
> > cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@project");
> > cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@projectpath");
> > cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@title");
> > cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@notes");
> > cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@computer");
> >
> > [...]
>
> I'm not sure what you're using, but nothing in this sample is SAX. It
> sounds like there's some higher level API sitting on top of SAX doing
> something ill-advised. Were you to rewrite this app to use real SAX,
> you could first use a single pass to grab all the values you need, and
> then fill the cells.
And if the queries were complex enough that you couldn't stream them, then
you could build a DOM and evaluate each XPath over the same instance. You
certainly don't need to parse the same document N times to accomplish this.
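A minimal sketch of that suggestion, which also covers the case from the original post where several files are processed in turn: the XPath expressions are compiled once, one `DocumentBuilder` is reused, and each file is parsed exactly once before every query runs against the in-memory `Document`. The class and method names are hypothetical; the `javax.xml.parsers` and `javax.xml.xpath` calls are standard JAXP.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class ParseOnceQueryMany {
    // Returns one row per file, holding the string value of each query.
    public static List<List<String>> readRows(List<File> files, String... queries)
            throws Exception {
        // Compile every expression once and reuse it for all files.
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<XPathExpression> compiled = new ArrayList<>();
        for (String q : queries) {
            compiled.add(xpath.compile(q));
        }

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();

        List<List<String>> rows = new ArrayList<>();
        for (File f : files) {
            builder.reset();                     // reuse the same builder for each file
            Document doc = builder.parse(f);     // the only parse of this file
            List<String> row = new ArrayList<>();
            for (XPathExpression e : compiled) {
                row.add(e.evaluate(doc));        // queries run over the in-memory tree
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        List<File> files = new ArrayList<>();
        for (String path : args) {
            files.add(new File(path));
        }
        for (List<String> row : readRows(files, "//root/creator/@user", "//root/creator/@title")) {
            System.out.println(row);
        }
    }
}
```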
Thanks.
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
Re: Question about performance for COM/SAX gurus
Posted by Elliotte Rusty Harold <el...@ibiblio.org>.
On Tue, Apr 20, 2010 at 4:25 PM, Ramon F Herrera <ra...@patriot.net> wrote:
>
> Subtitle 1: How to make SAX fly.
> Subtitle 2: Should I use DOM instead?
>
> My application retrieves several items (attributes and text) from large XML
> files. Such items are used to create a spreadsheet. The app is based on
> JAXP, and the code contains many lines like these:
>
> cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@user");
> cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@project");
> cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@projectpath");
> cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@title");
> cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@notes");
> cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@computer");
>
> [...]
I'm not sure what you're using, but nothing in this sample is SAX. It
sounds like there's some higher level API sitting on top of SAX doing
something ill-advised. Were you to rewrite this app to use real SAX,
you could first use a single pass to grab all the values you need, and
then fill the cells.
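A sketch of such a single pass in real SAX, assuming each large file has one `<creator>` element near its start. The handler copies that element's attributes and then throws a private `SAXException` subclass to stop the parser, a common SAX idiom for not scanning the rest of a document. Unlike the XPath queries, it only looks at the first `<creator>` it meets, wherever that is. The spreadsheet API was not shown, so filling the cells is left as a comment; `EarlyExit`, `Done` and `readCreator` are hypothetical names.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EarlyExit {
    // Marker exception used only to stop parsing once <creator> has been seen.
    private static final class Done extends SAXException {
        Done() { super("creator found"); }
    }

    // Single pass: grab every attribute of the first <creator>, then stop.
    public static Map<String, String> readCreator(File xmlFile) throws Exception {
        Map<String, String> values = new HashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) throws SAXException {
                if (qName.equals("creator")) {
                    for (int i = 0; i < atts.getLength(); i++) {
                        values.put(atts.getQName(i), atts.getValue(i));
                    }
                    throw new Done();
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(xmlFile, handler);
        } catch (Done expected) {
            // Normal early exit: the parser does not scan the rest of the document.
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        for (String path : args) {
            Map<String, String> values = readCreator(new File(path));
            // Fill the spreadsheet cells from 'values' here (API not shown in the post).
            System.out.println(path + ": " + values);
        }
    }
}
```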
--
Elliotte Rusty Harold
elharo@ibiblio.org