Posted to dev@cxf.apache.org by Benson Margulies <bi...@gmail.com> on 2008/03/19 22:57:22 UTC

More news from the performance front

At this point, the startup performance of the bus is entirely tangled up in
Xerces. I'm having a hard time believing that building a DOM from StAX is
going to beat Xerces, but if someone else thinks so, I guess I'm game.

Sadly, there are no Apache-compatible XML databases I can see out there, so
my idea of 'compiling' all the XML files to DOM trees in some sort of
persistent store seems impossible.

Re: More news from the performance front

Posted by Daniel Kulp <dk...@apache.org>.
On Wednesday 19 March 2008, Christian Vest Hansen wrote:
> Woodstox claim to be validating on their front page.
>
> From the news section it looks like they got W3C Schema validation in
> 3.9.0 23-Nov-2007.

That's a prerelease version that isn't in the maven repos yet.   

Dan



> Just FYI.
>
> On 3/19/08, Daniel Kulp <dk...@apache.org> wrote:
> >  The other issue with using woodstox is that I think you lose
> > validation entirely.
> >
> >  Dan
> >
> >  On Wednesday 19 March 2008, Dan Diephouse wrote:
> >  > Benson Margulies wrote:
> >  > > At this point, the startup performance of the bus is entirely
> >  > > tangled up in Xerces. I'm having a hard time believing that
> >  > > building a DOM from StaX is going to beat Xerces, but if
> >  > > someone else thinks so, I guess I'm game.
> >  > >
> >  > > Sadly, there are no Apache-compatible XML databases I can see
> >  > > out there, so my idea of 'compiling' all the XML files to DOM
> >  > > trees in some sort of persistent store seems impossible.
> >  >
> >  > Woodstox provides a significantly faster SAX implementation than
> >  > Xerces. But the question is - is the startup time in xerces
> >  > because of validation or because of parsing? I'm guess the
> >  > former.
> >  >
> >  > Its easy to test the Woodstox parser:
> >  >
> >  > java -Dorg.xml.sax.driver=com.ctc.wstx.sax.WstxSAXParser ....
> >  >
> >  > Dan
> >
> >  --
> >
> > J. Daniel Kulp
> >  Principal Engineer, IONA
> >  dkulp@apache.org
> >  http://www.dankulp.com/blog



-- 
J. Daniel Kulp
Principal Engineer, IONA
dkulp@apache.org
http://www.dankulp.com/blog

Re: More news from the performance front

Posted by Benson Margulies <bi...@gmail.com>.
I could easily use Woodstox only when we aren't validating, anyway.

On Wed, Mar 19, 2008 at 7:03 PM, Christian Vest Hansen <ka...@gmail.com>
wrote:

> Woodstox claim to be validating on their front page.
>
> From the news section it looks like they got W3C Schema validation in
> 3.9.0 23-Nov-2007.
>
> Just FYI.
>
> On 3/19/08, Daniel Kulp <dk...@apache.org> wrote:
> >
> >  The other issue with using woodstox is that I think you lose validation
> >  entirely.
> >
> >  Dan
> >
> >
> >
> >  On Wednesday 19 March 2008, Dan Diephouse wrote:
> >  > Benson Margulies wrote:
> >  > > At this point, the startup performance of the bus is entirely
> >  > > tangled up in Xerces. I'm having a hard time believing that
> building
> >  > > a DOM from StaX is going to beat Xerces, but if someone else thinks
> >  > > so, I guess I'm game.
> >  > >
> >  > > Sadly, there are no Apache-compatible XML databases I can see out
> >  > > there, so my idea of 'compiling' all the XML files to DOM trees in
> >  > > some sort of persistent store seems impossible.
> >  >
> >  > Woodstox provides a significantly faster SAX implementation than
> >  > Xerces. But the question is - is the startup time in xerces because
> of
> >  > validation or because of parsing? I'm guess the former.
> >  >
> >  > Its easy to test the Woodstox parser:
> >  >
> >  > java -Dorg.xml.sax.driver=com.ctc.wstx.sax.WstxSAXParser ....
> >  >
> >  > Dan
> >
> >
> >
> >  --
> >
> > J. Daniel Kulp
> >  Principal Engineer, IONA
> >  dkulp@apache.org
> >  http://www.dankulp.com/blog
> >
>
>
> --
> Venlig hilsen / Kind regards,
> Christian Vest Hansen.
>

Re: More news from the performance front

Posted by Christian Vest Hansen <ka...@gmail.com>.
Woodstox claims to be validating on their front page.

From the news section it looks like they got W3C Schema validation in
3.9.0 23-Nov-2007.

Just FYI.

On 3/19/08, Daniel Kulp <dk...@apache.org> wrote:
>
>  The other issue with using woodstox is that I think you lose validation
>  entirely.
>
>  Dan
>
>
>
>  On Wednesday 19 March 2008, Dan Diephouse wrote:
>  > Benson Margulies wrote:
>  > > At this point, the startup performance of the bus is entirely
>  > > tangled up in Xerces. I'm having a hard time believing that building
>  > > a DOM from StaX is going to beat Xerces, but if someone else thinks
>  > > so, I guess I'm game.
>  > >
>  > > Sadly, there are no Apache-compatible XML databases I can see out
>  > > there, so my idea of 'compiling' all the XML files to DOM trees in
>  > > some sort of persistent store seems impossible.
>  >
>  > Woodstox provides a significantly faster SAX implementation than
>  > Xerces. But the question is - is the startup time in xerces because of
>  > validation or because of parsing? I'm guess the former.
>  >
>  > Its easy to test the Woodstox parser:
>  >
>  > java -Dorg.xml.sax.driver=com.ctc.wstx.sax.WstxSAXParser ....
>  >
>  > Dan
>
>
>
>  --
>
> J. Daniel Kulp
>  Principal Engineer, IONA
>  dkulp@apache.org
>  http://www.dankulp.com/blog
>


-- 
Venlig hilsen / Kind regards,
Christian Vest Hansen.

Re: More news from the performance front

Posted by Daniel Kulp <dk...@apache.org>.
The other issue with using woodstox is that I think you lose validation 
entirely.

Dan


On Wednesday 19 March 2008, Dan Diephouse wrote:
> Benson Margulies wrote:
> > At this point, the startup performance of the bus is entirely
> > tangled up in Xerces. I'm having a hard time believing that building
> > a DOM from StaX is going to beat Xerces, but if someone else thinks
> > so, I guess I'm game.
> >
> > Sadly, there are no Apache-compatible XML databases I can see out
> > there, so my idea of 'compiling' all the XML files to DOM trees in
> > some sort of persistent store seems impossible.
>
> Woodstox provides a significantly faster SAX implementation than
> Xerces. But the question is - is the startup time in xerces because of
> validation or because of parsing? I'm guess the former.
>
> Its easy to test the Woodstox parser:
>
> java -Dorg.xml.sax.driver=com.ctc.wstx.sax.WstxSAXParser ....
>
> Dan



-- 
J. Daniel Kulp
Principal Engineer, IONA
dkulp@apache.org
http://www.dankulp.com/blog

Re: More news from the performance front

Posted by Benson Margulies <bi...@gmail.com>.
Isn't that what you get when you use REST?

On Thu, Mar 20, 2008 at 4:39 PM, Christian Vest Hansen <ka...@gmail.com>
wrote:

> Slap me if I'm going off topic, but how about JSON? I haven't
> researched the JSON capabilities in CXF so don't know how easy it
> would be to use in place of XML (and possiby SOAP?).
> In my case, I need interoperability with Ruby and Google indicates
> that no infoset library exists for ruby, making it a no-go tech in my
> eyes.
>
> On 3/20/08, Benson Margulies <bi...@gmail.com> wrote:
> > CXF already uses WoodStox as to read and write the wire traffic.
> >
> >  For a truly frightening possibility, imagine some WS-xxx that allowed
> client
> >  and server to agree to exchange FastInfoset blobs instead of XML at
> all.
> >
>
>
> --
> Venlig hilsen / Kind regards,
> Christian Vest Hansen.
>

Re: More news from the performance front

Posted by Dan Diephouse <da...@mulesource.com>.
Jackson is a great streaming JSON parser: 
http://www.cowtowncoder.com/hatchery/jackson/index.html. New page is 
going to be here eventually... http://jackson.codehaus.org. Looks like 
Tatu doesn't quite have the downloads up there yet though.

Dan

Christian Vest Hansen wrote:
> Slap me if I'm going off topic, but how about JSON? I haven't
> researched the JSON capabilities in CXF so don't know how easy it
> would be to use in place of XML (and possiby SOAP?).
> In my case, I need interoperability with Ruby and Google indicates
> that no infoset library exists for ruby, making it a no-go tech in my
> eyes.
>
> On 3/20/08, Benson Margulies <bi...@gmail.com> wrote:
>   
>> CXF already uses WoodStox as to read and write the wire traffic.
>>
>>  For a truly frightening possibility, imagine some WS-xxx that allowed client
>>  and server to agree to exchange FastInfoset blobs instead of XML at all.
>>
>>     
>
>
>   


-- 
Dan Diephouse
MuleSource
http://mulesource.com | http://netzooid.com 


Re: More news from the performance front

Posted by Christian Vest Hansen <ka...@gmail.com>.
Slap me if I'm going off topic, but how about JSON? I haven't
researched the JSON capabilities in CXF so don't know how easy it
would be to use in place of XML (and possibly SOAP?).
In my case, I need interoperability with Ruby and Google indicates
that no infoset library exists for ruby, making it a no-go tech in my
eyes.

On 3/20/08, Benson Margulies <bi...@gmail.com> wrote:
> CXF already uses WoodStox as to read and write the wire traffic.
>
>  For a truly frightening possibility, imagine some WS-xxx that allowed client
>  and server to agree to exchange FastInfoset blobs instead of XML at all.
>


-- 
Venlig hilsen / Kind regards,
Christian Vest Hansen.

Re: More news from the performance front

Posted by Benson Margulies <bi...@gmail.com>.
CXF already uses Woodstox to read and write the wire traffic.

For a truly frightening possibility, imagine some WS-xxx that allowed client
and server to agree to exchange FastInfoset blobs instead of XML at all.

Re: More news from the performance front

Posted by Christian Vest Hansen <ka...@gmail.com>.
Does using wstx also improve non-startup related performance, i.e.
parsing/marshalling of incoming and outgoing SOAP messages?

If so, then I would be interested in the details :)


On 3/20/08, Benson Margulies <bi...@gmail.com> wrote:
> Wow. wstx, as advertised, is a LOT faster. I'm submitting my first cut,
>  which passes a full test run, and then I'll look into tuning up the error /
>  resolution questions.
>


-- 
Venlig hilsen / Kind regards,
Christian Vest Hansen.

Re: More news from the performance front

Posted by Benson Margulies <bi...@gmail.com>.
Wow. wstx, as advertised, is a LOT faster. I'm submitting my first cut,
which passes a full test run, and then I'll look into tuning up the error /
resolution questions.

Re: More news from the performance front

Posted by Benson Margulies <bi...@gmail.com>.
oops, for #1 I meant 'copy STAX events'.
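
For readers following along, copying StAX events into a DOM might look
roughly like the sketch below. It is not the code that was committed to
CXF. It uses whatever StAX implementation the JVM provides (Woodstox would
be picked up the same way if it is on the classpath), the class name
StaxToDom is made up for illustration, and comments, processing
instructions and namespace-declaration attributes are deliberately skipped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: walks a StAX stream and mirrors it into a DOM tree.
final class StaxToDom {
    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    private static String nullIfEmpty(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    static Document toDom(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // The DocumentBuilder only supplies an empty Document; it parses nothing.
        Document doc = dbf.newDocumentBuilder().newDocument();

        XMLStreamReader in =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Node current = doc; // node that receives the next child
        while (in.hasNext()) {
            switch (in.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    Element e = doc.createElementNS(
                            nullIfEmpty(in.getNamespaceURI()),
                            qName(in.getPrefix(), in.getLocalName()));
                    for (int i = 0; i < in.getAttributeCount(); i++) {
                        e.setAttributeNS(
                                nullIfEmpty(in.getAttributeNamespace(i)),
                                qName(in.getAttributePrefix(i), in.getAttributeLocalName(i)),
                                in.getAttributeValue(i));
                    }
                    current.appendChild(e);
                    current = e;
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    current = current.getParentNode();
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    // Text directly under the Document node is not allowed in DOM.
                    if (current != doc) {
                        current.appendChild(doc.createTextNode(in.getText()));
                    }
                    break;
                default:
                    break; // comments, PIs, DTD etc. are ignored in this sketch
            }
        }
        in.close();
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = toDom("<beans><bean id=\"bus\">x</bean></beans>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Because the loop only sees what the stream reader reports, entity
resolution and error reporting follow the StAX implementation rather than
a SAX EntityResolver or ErrorHandler.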

Re: More news from the performance front

Posted by Benson Margulies <bi...@gmail.com>.
Looks like there are two approaches possible to connecting in StAX.

1) As per Dan's help last night, use the StAX DOM 'sink', and copy SAX
events. I have this coded. It will be subtly incompatible if any of our
users somehow manage to depend on the Spring entity resolver or fine
details of error handling. I don't see how the first is likely to be an
issue, and the second, well, ...

2) Use the Woodstox SAX parser class, and then use TrAX to turn its results
into a DOM. That has the advantage of being as compatible as Tatu cared to
make the entity and error management.

Even though I've invested effort in #1, I'm thinking that #2 would be the
more virtuous alternative.
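
A minimal sketch of what #2 could look like, using only JAXP classes: a SAX
XMLReader feeds an identity TrAX transform whose output is a DOMResult. The
class and method names (SaxToDom, toDom, defaultReader) are invented for
illustration, and defaultReader() simply returns whatever SAX parser JAXP
selects. With Woodstox on the classpath, an instance of the
com.ctc.wstx.sax.WstxSAXParser class named elsewhere in this thread would
presumably be passed in instead; that substitution is an assumption and is
not exercised here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: SAX parser in, DOM out, with TrAX doing the copying.
final class SaxToDom {
    /** Parses xml with the given SAX reader and lets an identity transform build the DOM. */
    static Document toDom(XMLReader reader, String xml) throws Exception {
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
        DOMResult result = new DOMResult();
        // newTransformer() without a stylesheet yields the identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(source, result);
        return (Document) result.getNode();
    }

    /** Stand-in reader: whichever SAX parser JAXP selects in this JVM. */
    static XMLReader defaultReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        Document doc = toDom(defaultReader(), "<beans><bean id=\"bus\"/></beans>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Since the XMLReader is an ordinary SAX object, an EntityResolver or
ErrorHandler can be set on it before the transform runs, which is where the
compatibility advantage of this route comes from.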

Re: More news from the performance front

Posted by Benson Margulies <bi...@gmail.com>.
If this is good, how would we get it into the code? I don't see a way to
tell the DocumentBuilderFactory to use a specific SAX driver.
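
Some context on the question: JAXP chooses the DocumentBuilderFactory
implementation through its own lookup (the
javax.xml.parsers.DocumentBuilderFactory system property, then service
discovery, then the built-in default), independently of the
org.xml.sax.driver property. A small sketch that prints which
implementations a given JVM and classpath end up with; the class name
JaxpLookup is made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;

// Hypothetical diagnostic: reports the JAXP implementations currently selected.
final class JaxpLookup {
    static String domFactory() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    static String saxFactory() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    static String staxFactory() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM:  " + domFactory());
        System.out.println("SAX:  " + saxFactory());
        System.out.println("StAX: " + staxFactory());
    }
}
```

Running it with and without an extra parser jar on the classpath shows
whether the DOM side changes at all when only the SAX or StAX side is
swapped.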

On Wed, Mar 19, 2008 at 6:34 PM, Dan Diephouse <da...@mulesource.com>
wrote:

> Benson Margulies wrote:
> > At this point, the startup performance of the bus is entirely tangled up
> in
> > Xerces. I'm having a hard time believing that building a DOM from StaX
> is
> > going to beat Xerces, but if someone else thinks so, I guess I'm game.
> >
> > Sadly, there are no Apache-compatible XML databases I can see out there,
> so
> > my idea of 'compiling' all the XML files to DOM trees in some sort of
> > persistent store seems impossible.
> >
> >
> Woodstox provides a significantly faster SAX implementation than Xerces.
> But the question is - is the startup time in xerces because of
> validation or because of parsing? I'm guess the former.
>
> Its easy to test the Woodstox parser:
>
> java -Dorg.xml.sax.driver=com.ctc.wstx.sax.WstxSAXParser ....
>
> Dan
>
> --
> Dan Diephouse
> MuleSource
> http://mulesource.com | http://netzooid.com
>
>

Re: More news from the performance front

Posted by Benson Margulies <bi...@gmail.com>.
It isn't validation. I have validation turned off.

I didn't combine, because the current profile shows all the time down in the
scanner, and that wouldn't be changed by combining except insofar as we
would only list all the namespace schema declarations once.

My current experiment concerns
http://apache.org/xml/features/dom/defer-node-expansion, since these are
small files. I'll try woodstox next.
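
For anyone wanting to repeat the experiment, a rough sketch of turning that
feature off through JAXP follows. It assumes the DocumentBuilderFactory in
use is Xerces or derived from it; other implementations may reject the
feature, so the sketch ignores that failure. The class name EagerDom is
invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: a non-validating DOM factory with deferred expansion disabled.
final class EagerDom {
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    static DocumentBuilderFactory newFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        try {
            // Ask for a fully expanded DOM instead of the lazy (deferred) one.
            factory.setFeature(DEFER_NODE_EXPANSION, false);
        } catch (ParserConfigurationException e) {
            // Feature not recognized by this implementation; keep its defaults.
        }
        return factory;
    }

    static Document parse(String xml) throws Exception {
        return newFactory().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<beans><bean id=\"bus\"/></beans>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Whether eager expansion actually helps for small files is exactly what the
experiment is meant to find out; the sketch only shows where the switch is.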


On Wed, Mar 19, 2008 at 6:34 PM, Dan Diephouse <da...@mulesource.com>
wrote:

> Benson Margulies wrote:
> > At this point, the startup performance of the bus is entirely tangled up
> in
> > Xerces. I'm having a hard time believing that building a DOM from StaX
> is
> > going to beat Xerces, but if someone else thinks so, I guess I'm game.
> >
> > Sadly, there are no Apache-compatible XML databases I can see out there,
> so
> > my idea of 'compiling' all the XML files to DOM trees in some sort of
> > persistent store seems impossible.
> >
> >
> Woodstox provides a significantly faster SAX implementation than Xerces.
> But the question is - is the startup time in xerces because of
> validation or because of parsing? I'm guess the former.
>
> Its easy to test the Woodstox parser:
>
> java -Dorg.xml.sax.driver=com.ctc.wstx.sax.WstxSAXParser ....
>
> Dan
>
> --
> Dan Diephouse
> MuleSource
> http://mulesource.com | http://netzooid.com
>
>

Re: More news from the performance front

Posted by Dan Diephouse <da...@mulesource.com>.
Benson Margulies wrote:
> At this point, the startup performance of the bus is entirely tangled up in
> Xerces. I'm having a hard time believing that building a DOM from StaX is
> going to beat Xerces, but if someone else thinks so, I guess I'm game.
>
> Sadly, there are no Apache-compatible XML databases I can see out there, so
> my idea of 'compiling' all the XML files to DOM trees in some sort of
> persistent store seems impossible.
>
>   
Woodstox provides a significantly faster SAX implementation than Xerces.
But the question is: is the startup time in Xerces because of
validation or because of parsing? I'm guessing the former.

It's easy to test the Woodstox parser:

java -Dorg.xml.sax.driver=com.ctc.wstx.sax.WstxSAXParser ....
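
The org.xml.sax.driver property is what
org.xml.sax.helpers.XMLReaderFactory.createXMLReader() consults, so the flag
only affects code that obtains its reader that way. A small probe along the
following lines could confirm which driver is in effect and give a crude
timing. The class name SaxDriverProbe and the sample document are made up,
and note that XMLReaderFactory is deprecated since Java 9.

```java
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical probe: shows which SAX driver is picked up and times a few parses.
final class SaxDriverProbe {
    @SuppressWarnings("deprecation")
    static XMLReader newReader() throws SAXException {
        // Honors -Dorg.xml.sax.driver=... when that property is set.
        return XMLReaderFactory.createXMLReader();
    }

    /** Parses xml and returns the number of start-element events seen. */
    static int countElements(XMLReader reader, String xml) throws Exception {
        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        System.out.println("SAX driver in use: " + reader.getClass().getName());
        String xml = "<beans><bean id=\"a\"/><bean id=\"b\"/></beans>";
        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            countElements(reader, xml);
        }
        System.out.println("1000 parses took "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

A loop like this is not a proper benchmark (no warm-up control, tiny
input), so the numbers are only good for spotting large differences.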

Dan

-- 
Dan Diephouse
MuleSource
http://mulesource.com | http://netzooid.com 


Re: More news from the performance front

Posted by Daniel Kulp <dk...@apache.org>.
Benson,

Did you try sticking all the cxf-extension* stuff into a single XML file 
and seeing if that helped?  I'm curious if parsing one big file would be 
significantly faster than a bunch of small files.
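
One way to get a first impression is a throwaway comparison like the sketch
below. It parses in-memory strings, so it leaves out the file and classpath
lookups that real cxf-extension files involve, and the document shape
(a beans root with bean children) and the class name CombineProbe are
invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical probe: many small documents versus one combined document.
final class CombineProbe {
    static String smallDoc(int i) {
        return "<beans><bean id=\"b" + i + "\"/></beans>";
    }

    static String combinedDoc(int n) {
        StringBuilder sb = new StringBuilder("<beans>");
        for (int i = 0; i < n; i++) {
            sb.append("<bean id=\"b").append(i).append("\"/>");
        }
        return sb.append("</beans>").toString();
    }

    /** Parses xml into a DOM and returns how many bean elements it holds. */
    static int parseAndCountBeans(DocumentBuilder db, String xml) throws Exception {
        return db.parse(new InputSource(new StringReader(xml)))
                .getElementsByTagName("bean").getLength();
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        int n = 50;

        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            parseAndCountBeans(db, smallDoc(i));
        }
        long many = System.nanoTime() - start;

        start = System.nanoTime();
        parseAndCountBeans(db, combinedDoc(n));
        long one = System.nanoTime() - start;

        System.out.println(n + " small docs: " + many / 1000 + " us");
        System.out.println("1 combined doc: " + one / 1000 + " us");
    }
}
```

The first parse in a fresh JVM pays for class loading, so the order of the
two measurements matters; swapping them is a cheap sanity check.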

Dan


On Wednesday 19 March 2008, Benson Margulies wrote:
> At this point, the startup performance of the bus is entirely tangled
> up in Xerces. I'm having a hard time believing that building a DOM
> from StaX is going to beat Xerces, but if someone else thinks so, I
> guess I'm game.
>
> Sadly, there are no Apache-compatible XML databases I can see out
> there, so my idea of 'compiling' all the XML files to DOM trees in
> some sort of persistent store seems impossible.





-- 
J. Daniel Kulp
Principal Engineer, IONA
dkulp@apache.org
http://www.dankulp.com/blog