Posted to dev@shindig.apache.org by Kevin Brown <et...@google.com> on 2008/09/19 14:59:43 UTC

[java] XML parsing performance.

I've noticed that the current performance of xml parsing is pretty bad.

I've got a patch ready to go to improve this substantially. On our internal
deployment it has cut down memory and CPU usage substantially and has
significantly improved overall response time.

Most of the changes that need to be made are compatible with fairly old XML
parsers, but fewer of those parsers work with pooled DocumentBuilders. For
instance, Xerces needs to be upgraded (I'm using 2.8.1, which is about 2
years old, and that works).

Does anyone have any strong objections to upgrading Xerces, or are there any
XML parsers in use that don't support DocumentBuilder.reset()?
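
For illustration only, pooling DocumentBuilders could look roughly like the
sketch below. This is not the patch itself: the class name PooledXmlParser and
the queue-based pool are assumptions made for the example. The only API it
relies on is the standard JAXP DocumentBuilder.reset(), whose default
implementation throws UnsupportedOperationException on parsers that predate
it, which is why older parsers cannot be pooled this way.

```java
import java.io.StringReader;
import java.util.concurrent.ConcurrentLinkedQueue;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: reuses DocumentBuilders instead of creating one per parse. */
final class PooledXmlParser {
  private static final DocumentBuilderFactory FACTORY = DocumentBuilderFactory.newInstance();
  private static final ConcurrentLinkedQueue<DocumentBuilder> POOL =
      new ConcurrentLinkedQueue<>();

  private PooledXmlParser() {}

  static Document parse(String xml) throws Exception {
    DocumentBuilder builder = POOL.poll();
    if (builder == null) {
      // The factory is not guaranteed to be thread-safe, so guard creation.
      synchronized (FACTORY) {
        builder = FACTORY.newDocumentBuilder();
      }
    }
    try {
      return builder.parse(new InputSource(new StringReader(xml)));
    } finally {
      try {
        builder.reset();       // restore the builder to its freshly created state
        POOL.offer(builder);   // only builders that reset cleanly are reused
      } catch (UnsupportedOperationException e) {
        // Parser does not implement reset(); drop the builder rather than pool it.
      }
    }
  }
}
```

Calling PooledXmlParser.parse(xml) repeatedly should then reuse a builder after
the first call, provided the underlying parser supports reset().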

Re: [java] XML parsing performance.

Posted by Louis Ryan <lr...@google.com>.

This is why I suggest we use an annotating binder on the existing POJOs we
have.

On Fri, Sep 19, 2008 at 10:27 AM, Kevin Brown <et...@google.com> wrote:

> On Fri, Sep 19, 2008 at 7:18 PM, Paul Lindner <pl...@hi5.com> wrote:
>
> > Using JAXB would make life much easier.
> >
> > We could use xjc to generate the pojo classes straight from the xsd file
> > and work with that.   Then you get a nice fast parser/generator for free.
>
>
> POJOs aren't that great of an option for gadget specs because we'd still
> have to maintain all the spec objects in addition to the pojos due to the
> large number of convoluted things we have to do to the specs when we parse
> them. We definitely don't need a generator, we just need something to
> produce the GadgetSpec / MessageBundle objects.

Re: [java] XML parsing performance.

Posted by Kevin Brown <et...@google.com>.

On Fri, Sep 19, 2008 at 7:18 PM, Paul Lindner <pl...@hi5.com> wrote:

> Using JAXB would make life much easier.
>
> We could use xjc to generate the pojo classes straight from the xsd file
> and work with that.   Then you get a nice fast parser/generator for free.


POJOs aren't a great option for gadget specs because we'd still have to
maintain all the spec objects in addition to the POJOs, due to the large
number of convoluted things we have to do to the specs when we parse them.
We definitely don't need a generator; we just need something to produce the
GadgetSpec / MessageBundle objects.



Re: [java] XML parsing performance.

Posted by Paul Lindner <pl...@hi5.com>.

Using JAXB would make life much easier.

We could use xjc to generate the POJO classes straight from the XSD file
and work with those. Then you get a nice fast parser/generator for free.

On Sep 19, 2008, at 10:03 AM, Louis Ryan wrote:

> Actually, I'm pretty sure we could switch to StAX and use one of the
> annotating binder impls out there, JAXB2 or somesuch. Probably a few days
> of coding.

Paul Lindner
plindner@hi5.com

Re: [java] XML parsing performance.

Posted by Louis Ryan <lr...@google.com>.

Actually, I'm pretty sure we could switch to StAX and use one of the
annotating binder impls out there, JAXB2 or somesuch. Probably a few days of
coding.
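
As a rough illustration of the StAX side of this suggestion, the sketch below
pulls values straight out of the event stream into a plain object without
building a DOM. It is hand-written pull parsing, not an annotating binder such
as JAXB2. The SimpleSpec class and the Module / ModulePrefs / Require element
names are assumptions made for the example, not the project's actual spec
classes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical, cut-down stand-in for a spec object. */
final class SimpleSpec {
  String title;
  final List<String> features = new ArrayList<>();
}

final class StaxSpecReader {
  private StaxSpecReader() {}

  static SimpleSpec read(String xml) throws XMLStreamException {
    SimpleSpec spec = new SimpleSpec();
    // A new factory per call keeps the sketch simple; a real deployment
    // would likely cache it.
    XMLStreamReader reader =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    try {
      while (reader.hasNext()) {
        // Only start tags matter here; skip text, end tags, comments, etc.
        if (reader.next() != XMLStreamConstants.START_ELEMENT) {
          continue;
        }
        String name = reader.getLocalName();
        if ("ModulePrefs".equals(name)) {
          spec.title = reader.getAttributeValue(null, "title");
        } else if ("Require".equals(name)) {
          spec.features.add(reader.getAttributeValue(null, "feature"));
        }
      }
    } finally {
      reader.close();
    }
    return spec;
  }
}
```

No intermediate node tree is allocated; the only objects kept are the ones the
caller actually wants.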

On Fri, Sep 19, 2008 at 9:01 AM, Kevin Brown <et...@google.com> wrote:

> Yeah, switching to sax would be a lot more efficient for the stuff that we
> do, but it would also require changing a lot more code (most likely we
> could write adapters for each object type to simplify it).

Re: [java] XML parsing performance.

Posted by Kevin Brown <et...@google.com>.

On Fri, Sep 19, 2008 at 5:24 PM, Ian Boston <ie...@tfd.co.uk> wrote:

> Two things.
>
> DOM nearly always generates excessive object creation, which under load
> stresses the GC of any JVM. I did some comparisons between DOM and SAX for
> smallish XML blocks (< 100 elements) a while back, and there was a
> difference of perhaps 3x on speed and 4x on memory; I can't remember the
> precise details. I understand that if it's node manipulation that's
> required, DOM is just easier and may be the same weight, but if it's
> XML -> object then it may be better to consider SAX or a push parser.
> Cocoon had some bad experience with DOM in the request cycle in v1 (a long
> time ago, when parsers were even more creaky).
>
>

Yeah, switching to SAX would be a lot more efficient for the stuff that we
do, but it would also require changing a lot more code (most likely we could
write adapters for each object type to simplify it).
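
A per-type adapter of the kind mentioned here might be sketched as a small SAX
handler that builds one kind of object directly from parser callbacks. The
handler name MessageMapHandler and the messagebundle / msg element names are
assumptions made for the example; the real spec objects would need their own
adapters and considerably more logic.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical adapter: turns <msg name="...">text</msg> entries into a map. */
final class MessageMapHandler extends DefaultHandler {
  private final Map<String, String> messages = new LinkedHashMap<>();
  private final StringBuilder text = new StringBuilder();
  private String currentName;   // non-null only while inside a <msg> element

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    if ("msg".equals(qName)) {
      currentName = atts.getValue("name");
      text.setLength(0);
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    // The parser may deliver text in several chunks, so accumulate it.
    if (currentName != null) {
      text.append(ch, start, length);
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if ("msg".equals(qName) && currentName != null) {
      messages.put(currentName, text.toString().trim());
      currentName = null;
    }
  }

  static Map<String, String> parse(String xml) throws Exception {
    MessageMapHandler handler = new MessageMapHandler();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler.messages;
  }
}
```

Each object type would get its own handler like this one, which is where the
extra code relative to the DOM approach comes from.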


>
>
> The other thing I have been told by IBMers is that their JVM makes it hard
> to replace the Xerces impl. I don't know if that's relevant.
>
> The last 2 comments here
> http://jira.sakaiproject.org:8081/jira/browse/SAK-14388
> give some insight into a similar problem.
>
>
> HTH
> Ian

Re: [java] XML parsing performance.

Posted by Ian Boston <ie...@tfd.co.uk>.

Two things.

DOM nearly always generates excessive object creation, which under load
stresses the GC of any JVM. I did some comparisons between DOM and SAX for
smallish XML blocks (< 100 elements) a while back, and there was a
difference of perhaps 3x on speed and 4x on memory; I can't remember the
precise details. I understand that if it's node manipulation that's
required, DOM is just easier and may be the same weight, but if it's
XML -> object then it may be better to consider SAX or a push parser.
Cocoon had some bad experience with DOM in the request cycle in v1 (a long
time ago, when parsers were even more creaky).

The other thing I have been told by IBMers is that their JVM makes it
hard to replace the Xerces impl. I don't know if that's relevant.

The last 2 comments here
http://jira.sakaiproject.org:8081/jira/browse/SAK-14388
give some insight into a similar problem.

HTH
Ian

On 19 Sep 2008, at 13:59, Kevin Brown wrote:

> I've noticed that the current performance of xml parsing is pretty bad.
>
> I've got a patch ready to go to improve this substantially. On our internal
> deployment it has cut down memory and CPU usage substantially and has
> significantly improved overall response time.
>
> Most of the changes that need to be made are compatible with fairly old xml
> parsers, but not so many work with pooling DocumentBuilders. For instance,
> xerces needs to be upgraded (I'm using 2.8.1, which is about 2 years old,
> and that works).
>
> Does anyone have any strong objections to upgrading xerces, or are there any
> instances of xml parsers being used that also don't support
> DocumentBuilder.reset ?