Posted to j-dev@xerces.apache.org by Co...@trilogy.com on 2001/01/15 07:22:42 UTC

Raw parser speed...

I'm about to write a few small programs, along with a writeup, to give
a general parser-to-parser comparison of base parsing speed.  People often
ask me at work how the raw speed of parser X compares to parser Y (the
most common X and Y in my real-life case being XML4J 2.0.15 and
'whatever the most current version of Xerces is' at the time of the
question).

There is a good chance that a coworker and I will flesh it out for an XML
Journal article we're slated for.  Does anyone have any
suggestions or comments on things to check, common approaches, etc.?  We are
less concerned with tests that try to evenly exercise every aspect of the
spec (as David Brownell's SAX-based test driver attempts to do) and
much more with A) basic DOM parsing of pretty run-of-the-mill documents
and B) parsing of very, very large documents. (We have a soft spot in our
hearts for B.)
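
A minimal sketch of what a timing harness for case A might look like. It assumes the standard JAXP API (javax.xml.parsers) rather than any Xerces- or XML4J-specific class, so whichever parser is first on the classpath gets measured. The class name DomParseTimer, the synthetic <item> document, and the warm-up/timed run counts are all made up for illustration; real test documents would replace the generated one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical harness: times repeated DOM parses of one in-memory document.
class DomParseTimer {

    /** Builds a synthetic document with the given number of <item> elements. */
    static byte[] syntheticDocument(int items) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?><items>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id=\"").append(i).append("\">value ").append(i).append("</item>");
        }
        sb.append("</items>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Parses the bytes into a DOM tree and returns how many <item> elements it holds. */
    static int parseOnce(DocumentBuilder builder, byte[] xml) throws Exception {
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        return doc.getElementsByTagName("item").getLength();
    }

    /** Returns the lowest wall-clock parse time, in nanoseconds, over the timed runs. */
    static long bestParseNanos(byte[] xml, int warmupRuns, int timedRuns) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        for (int i = 0; i < warmupRuns; i++) {
            parseOnce(builder, xml);   // untimed runs so JIT compilation is not measured
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            parseOnce(builder, xml);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = syntheticDocument(10_000);
        long nanos = bestParseNanos(xml, 5, 20);
        System.out.printf("%d bytes, best parse %.2f ms%n", xml.length, nanos / 1e6);
    }
}
```

Parsing from a byte array keeps disk I/O out of the measurement, which seems closest to "raw" parser speed; reading from files would measure something different.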

Any input would be great!

---
Corbett J. Klempay
Trilogy
512.532.5176 (W) | 512.750.1372 (C)
corbett.klempay@trilogy.com 

RE: Raw parser speed...

Posted by Andrew Lunstad <aj...@eternalwarriors.com>.
To clarify:

We have seen a roughly 5x degradation in our code's performance on
systems that have had J++ installed. Uninstalling J++ did not fix the
problem on the NT4 machine where we tried it. The problem
exists on both Windows 2000 and NT.

I am not aware of any MS acknowledgement of this issue and have not
submitted a bug report (yet anyhow -- we discovered this recently). I have
searched the Knowledge Base for performance issues and have only come up
with a problem with debugging under Windows 2000, which they have fixed
(thank you MS), but no mention of a general performance degradation with J++.

Andrew

-----Original Message-----
From: xerces-j-dev-return-5111-ajl=intertechsys.com@xml.apache.org
[mailto:xerces-j-dev-return-5111-ajl=intertechsys.com@xml.apache.org]On
Behalf Of Brad O'Hearne
Sent: Wednesday, January 17, 2001 4:04 PM
To: xerces-j-dev@xml.apache.org
Subject: RE: Raw parser speed...


Can you elaborate a little more?  It appears that you are saying that the
mere presence of Visual J++ on your system, whether currently installed or
installed once and since uninstalled, causes a performance degradation?  Do
you have any documents that speak to this, or a link to an MS Knowledge Base
article about it?

Brad

-----Original Message-----
From: Andrew Lunstad [mailto:ajl@fenriswolf.com]
Sent: Wednesday, January 17, 2001 8:54 AM
To: xerces-j-dev@xml.apache.org; henrik.stahl@iconmedialab.se
Subject: RE: Raw parser speed...


One other thing to keep in mind if you are interested in performance and
using a Microsoft environment is that Visual J++ causes a significant
degradation in your JVM performance even when not debugging. Uninstalling
J++ does not restore the performance either; some other form of 'system
cleaning' is necessary. (I know reformatting the drive works ;->)

Andrew



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org




RE: Raw parser speed...

Posted by Brad O'Hearne <ca...@megapathdsl.net>.
...and I should have added that I am interested in exactly whether "the
presence" of the IDE is the crux here -- even if no use of J++ has been made
at all, and all Java code has been developed, debugged, etc. in a different
IDE.  I hope no one on the planet is still using J++.

Brad





RE: Raw parser speed...

Posted by Brad O'Hearne <ca...@megapathdsl.net>.
Can you elaborate a little more?  It appears that you are saying that the
mere presence of Visual J++ on your system, whether currently installed or
installed once and since uninstalled, causes a performance degradation?  Do
you have any documents that speak to this, or a link to an MS Knowledge Base
article about it?

Brad





RE: Raw parser speed...

Posted by Andrew Lunstad <aj...@fenriswolf.com>.
One other thing to keep in mind if you are interested in performance and
using a Microsoft environment is that Visual J++ causes a significant
degradation in your JVM performance even when not debugging. Uninstalling
J++ does not restore the performance either; some other form of 'system
cleaning' is necessary. (I know reformatting the drive works ;->)

Andrew





Re: Raw parser speed...

Posted by Henrik Stahl <he...@parallelconsulting.com>.
FYI

We did some testing on combinations of different parsers and different
JVMs a while ago in a project where fast parsing of small files (~2 kB)
was critical. In our case, Xerces 1.2.3 (the latest version) turned out
to be some 10-30% faster than previous versions and about the same speed
as the Oracle parser. One interesting discovery was that changing the JVM
to the HotSpot client VM 1.3 helped quite a bit, and that a specialized
server VM from a Swedish company (www.appeal.com) was 2-3 times faster
yet. Unfortunately, we could not use it in our application due to the
lack of Solaris support.

The benchmark we ran used 10-50 threads in parallel, all parsing a
small XML document.
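
A rough sketch of how a benchmark of that shape could be structured. This is not the code from the project described above; it assumes the standard JAXP API and java.util.concurrent, and the class name ParallelParseBench, the sample document, and the thread and iteration counts are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical benchmark: N threads each parse the same small document repeatedly.
class ParallelParseBench {

    /** Runs the workers and returns the total number of parses completed. */
    static int run(byte[] xml, int threads, int parsesPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Integer> worker = () -> {
                    // DocumentBuilder is not guaranteed to be thread-safe,
                    // so every worker creates its own instance.
                    DocumentBuilder builder =
                            DocumentBuilderFactory.newInstance().newDocumentBuilder();
                    int done = 0;
                    for (int i = 0; i < parsesPerThread; i++) {
                        builder.parse(new ByteArrayInputStream(xml));
                        done++;
                    }
                    return done;
                };
                results.add(pool.submit(worker));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();   // waits for the worker; rethrows any parse failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<order id=\"1\"><line sku=\"a\" qty=\"2\"/></order>"
                .getBytes(StandardCharsets.UTF_8);
        int threads = 10;
        int perThread = 1000;
        long start = System.nanoTime();
        int total = run(xml, threads, perThread);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d parses on %d threads in %.3f s%n", total, threads, seconds);
    }
}
```

Printing System.getProperty("java.vm.name") alongside the result would make runs on different JVMs easier to tell apart.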

Henrik Ståhl
Icon Medialab Parallel



RE: Raw parser speed...

Posted by Brad O'Hearne <ca...@megapathdsl.net>.
Corbett,

I have been watching the list for responses to this, but I am acutely
interested in this subject matter, and am interested in any more info or
tests you have run.  I don't mind you posting it to this list, but if you
would rather this go offline, email me at cabodog@megapathdsl.net.

For the past year I have been involved in a project where the file sizes I
am parsing are 200MB+, and these are now being switched over to XML.
Parsing speed is a must. In fact, if anyone on this list can direct me to
where I can download the Xerces source code, I would be interested in taking
a peek at the I/O and token parsing algorithm.
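
For files of that size, one common option is an event-based (SAX) parse, which hands each element to a callback instead of building a tree in memory. The sketch below is only an illustration of that approach, not anything taken from the Xerces source; it assumes the standard JAXP and SAX APIs, and the class name SaxElementCounter and the choice to simply count elements are made up.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts start tags while streaming through the input.
class SaxElementCounter extends DefaultHandler {
    private long elements;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        elements++;   // invoked once per start tag; no part of the document is kept
    }

    /** Streams the input through a SAX parser and returns the number of elements seen. */
    static long count(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxElementCounter handler = new SaxElementCounter();
        parser.parse(in, handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SaxElementCounter <file.xml>");
            return;
        }
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            long start = System.nanoTime();
            long n = count(in);
            System.out.printf("%d elements in %.2f s%n", n, (System.nanoTime() - start) / 1e9);
        }
    }
}
```

The handler itself keeps only a counter, so whatever memory the parse needs comes from the parser's own buffers rather than from a document tree.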

Thanks in advance,

Brad O'Hearne