Posted to j-users@xerces.apache.org by Ramon F Herrera <ra...@patriot.net> on 2010/04/20 22:25:27 UTC

Question about performance for DOM/SAX gurus

Subtitle 1: How to make SAX fly.
Subtitle 2: Should I use DOM instead?

My application retrieves several items (attributes and text) from large 
XML files. Such items are used to create a spreadsheet. The app is based 
on JAXP, and the code contains many lines like these:

cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@user");
cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@project");
cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@projectpath");
cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@title");
cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@notes");
cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@computer");

[...]

(note: the 'oneItemAtATime()' function implements an XPath query).

Performance is rather poor because the whole XML file is completely 
scanned by SAX (internally used by the 'XPathDemo' app that comes with 
the JAXP distro) for every item above. To make things worse, multiple 
XML files may be opened sequentially, and the items above are retrieved 
from each XML file.

I have considered two alternatives to improve performance, but would 
like to request some advice.

(1) Rewrite (how? details are most welcome!) the XPath query, basing it 
on DOM. That way the file will be read only once from the filesystem. 
IOW, remove SAX from the application.

(2) Stick with SAX, but in a smart way. As it reads the XML tree, all 
the required items above are captured. Again, the question is how.

As they say, the devil is in the details.

TIA for sharing your expertise...

Regards,

-Ramon




---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: Question about performance for DOM/SAX gurus

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Elliotte Rusty Harold <el...@ibiblio.org> wrote on 04/20/2010 08:16:02 PM:

> On Tue, Apr 20, 2010 at 4:25 PM, Ramon F Herrera <ra...@patriot.net> wrote:
> >
> > Subtitle 1: How to make SAX fly.
> > Subtitle 2: Should I use DOM instead?
> >
> > My application retrieves several items (attributes and text) from large XML
> > files. Such items are used to create a spreadsheet. The app is based on
> > JAXP, and the code contains many lines like these:
> >
> > cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@user");
> > cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@project");
> > cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@projectpath");
> > cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@title");
> > cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@notes");
> > cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@computer");
> >
> > [...]
>
> I'm not sure what you're using, but nothing in this sample is SAX. It
> sounds like there's some higher level API sitting on top of SAX doing
> something ill-advised. Were you to rewrite this app to use real SAX,
> you could first use a single pass to grab all the values you need; and
> then fill the cells.

And if the queries were complex enough that you couldn't stream them, then
you could build a DOM once and evaluate each XPath over that same instance.
You certainly don't need to parse the same document N times to accomplish this.
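
As a rough illustration of the parse-once idea, here is a minimal sketch
using the standard JAXP DOM and XPath APIs. The class name ParseOnceXPath,
the extract() helper, and the sample XML are hypothetical and only stand in
for the poster's oneItemAtATime() code, which was not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical helper: not part of JAXP or of the original application.
class ParseOnceXPath {

    // Evaluate every query against one already-parsed DOM tree.
    // A query that matches nothing yields the empty string.
    static Map<String, String> extract(Document doc, String... queries) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> values = new LinkedHashMap<>();
        for (String query : queries) {
            values.put(query, xpath.evaluate(query, doc));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document; a real application would parse a File instead.
        String xml = "<root><creator user=\"ramon\" project=\"demo\" title=\"Report\"/></root>";

        // The document is parsed exactly once here.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> values = extract(doc,
                "//root/creator/@user",
                "//root/creator/@project",
                "//root/creator/@title");
        values.forEach((query, value) -> System.out.println(query + " = " + value));
    }
}
```

With this shape, each spreadsheet cell would be filled from the returned map
rather than from a fresh parse. If root is in fact the document element, the
leading "//" could probably be a single "/", which avoids searching the
whole tree for elements named root.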

> --
> Elliotte Rusty Harold
> elharo@ibiblio.org

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: Question about performance for DOM/SAX gurus

Posted by Elliotte Rusty Harold <el...@ibiblio.org>.
On Tue, Apr 20, 2010 at 4:25 PM, Ramon F Herrera <ra...@patriot.net> wrote:
>
> Subtitle 1: How to make SAX fly.
> Subtitle 2: Should I use DOM instead?
>
> My application retrieves several items (attributes and text) from large XML
> files. Such items are used to create a spreadsheet. The app is based on
> JAXP, and the code contains many lines like these:
>
> cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@user");
> cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@project");
> cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@projectpath");
> cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@title");
> cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@notes");
> cell.cellValue = oneItemAtATime(xmlFile, "//root/creator/@computer");
>
> [...]

I'm not sure what you're using, but nothing in this sample is SAX. It
sounds like there's some higher level API sitting on top of SAX doing
something ill-advised. Were you to rewrite this app to use real SAX,
you could first use a single pass to grab all the values you need; and
then fill the cells.
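
To make the single-pass suggestion concrete, here is a minimal sketch of a
SAX handler that collects all attributes of root/creator in one parse. The
class name CreatorHandler, its values map, and the sample XML are
hypothetical; only the SAX and JAXP calls themselves are standard.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers every attribute of <creator> elements whose
// parent is <root>, roughly what the //root/creator/@... queries select.
class CreatorHandler extends DefaultHandler {
    // Names of the elements currently open, outermost first.
    private final Deque<String> path = new ArrayDeque<>();
    // Attribute name -> value, filled during the single pass.
    final Map<String, String> values = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        if (("/" + String.join("/", path)).endsWith("/root/creator")) {
            for (int i = 0; i < atts.getLength(); i++) {
                // Keep the first value seen, as an XPath string result would.
                values.putIfAbsent(atts.getQName(i), atts.getValue(i));
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.removeLast();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document; a real application would parse a File instead.
        String xml = "<root><creator user=\"ramon\" project=\"demo\"/>"
                   + "<data>ignored</data></root>";
        CreatorHandler handler = new CreatorHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        // Fill the cells from handler.values after the pass completes.
        System.out.println(handler.values);
    }
}
```

This sketch only covers attributes. Capturing element text as well would
mean buffering characters() calls between startElement and endElement.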

-- 
Elliotte Rusty Harold
elharo@ibiblio.org
