Posted to j-dev@xerces.apache.org by Mike Pogue <mp...@apache.org> on 2000/03/31 18:36:22 UTC

Re: looking at Crimson merge -- three suggestions

Thanks to the folks from Sun for all their hard work in making this
happen! Now we have some work to do!   :-)

The xml-contrib area is designed so that people can look at the code,
try it out, etc.  The license issues have all been worked out, and the
code is now under the Apache 1.1 license, so feel free to look at it, 
play with it, figure it out, etc.

I have a couple of major suggestions (I'm sure that other people have
more):

1) It has been reported that the Crimson code is 50% faster than
Xerces-J when running on a Sparc Ultra-5, however Xerces-J is 40% faster
than Crimson on a Windows NT machine. It's not obvious to me why this
would be true!  We need to figure out WHY, so we can create a single
code base that is fast on BOTH.

2) Crimson has a DOM implementation that is particularly interesting. 
It has been reported that it "scales better" as the size of an XML
document goes up, but that is not my experience (but, I've been looking
only at Windows NT, so this could again be a Sparc/Windows difference).
This could be due to differences in memory consumption, or something
else altogether.  We should be able to figure out what's going on here,
and get the best of both worlds.  Because the Xerces DOM is pluggable,
we might need to end up with two DOMs that are optimized for two
different things:  a) the current deferred DOM is optimized for
performance, but maybe not for memory consumption, and b) perhaps the
Crimson DOM is optimized for memory consumption.  

3) Now that we can see the XHTML code, we should be able to compare
Assaf's HTML parser code, and the new Crimson code, so we can end up
with the best of both.  We routinely get requests for HTML parsing, and
this is a pretty self-contained area, so it's a great opportunity to
jump in and contribute!
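
For anyone who wants to reproduce the comparisons in points 1 and 2, here
is a minimal sketch of a timing harness.  It is not the harness that
produced the numbers above (those tests are not described in this thread).
It assumes both parsers are reached through the JAXP factories in
javax.xml.parsers; the class name ParseTimer, the makeDoc helper, and the
document sizes are made up for illustration.  The parser being timed is
whichever implementation the JAXP factory lookup finds, which can be
switched with the javax.xml.parsers.SAXParserFactory and
javax.xml.parsers.DocumentBuilderFactory system properties.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness: times SAX and DOM parsing of synthetic documents.
final class ParseTimer {

    /** Builds a synthetic document with n child elements (illustrative input only). */
    static byte[] makeDoc(int n) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < n; i++) {
            sb.append("<item id=\"").append(i).append("\">text</item>");
        }
        return sb.append("</root>").toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Average nanoseconds per SAX parse, using a handler that ignores all events. */
    static long timeSax(byte[] xml, int runs) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                factory.newSAXParser().parse(new ByteArrayInputStream(xml), new DefaultHandler());
            }
            return (System.nanoTime() - start) / runs;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
    }

    /** Average nanoseconds per DOM build (tree construction only, no traversal). */
    static long timeDom(byte[] xml, int runs) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            }
            return (System.nanoTime() - start) / runs;
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {1000, 10000, 100000}) {
            byte[] xml = makeDoc(n);
            // Warm-up runs so JIT compilation does not dominate the measurement.
            timeSax(xml, 3);
            timeDom(xml, 3);
            System.out.printf("%d items: SAX %d us, DOM %d us%n",
                    n, timeSax(xml, 10) / 1000, timeDom(xml, 10) / 1000);
        }
    }
}
```

Running the same class on both the Sparc and the NT machines, with the same
JVM version and heap settings, would at least rule out differences in the
harness itself.  Timing SAX and DOM separately also shows which of the two
accounts for a platform difference.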

All of these things are high on my list -- does anybody want to take a
crack at them? This is a great opportunity for some new people to jump
in, and check out all the code...

Mike

P.S.  Traffic is now moving to the xerces-j list...please adjust your
mailing list subscriptions accordingly!

Rajiv Mordani wrote:
> 
> Announcing the release of the code for Crimson XML Parsing Core Library..
> This code is based on Sun's Java Project X and is available via the cvs
> module xml-contrib/crimson for people to look at... Please read the README
> for directions on how to build the source. The list of features to be
> included into xerces is yet to be decided.
> 
> - Rajiv
> 
> --
> :wq

Re: looking at Crimson merge -- three suggestions

Posted by Edwin Goei <Ed...@eng.sun.com>.
Mike Pogue wrote:
>
> P.S.  Traffic is now moving to the xerces-j list...please adjust your
> mailing list subscriptions accordingly!

It looks like http://xml.apache.org/mail.html still refers to xerces-dev
and does not mention xerces-j-dev or xerces-c-dev.

Re: looking at Crimson merge -- three suggestions

Posted by Rajiv Mordani <Ra...@eng.sun.com>.
Hey Pier don't take all the credit ;)... It was **US** and not just you
;).. Pier ran the tests on Windows and I ran them on the ultra 5. ;)..

- Rajiv

--
:wq

On Sat, 1 Apr 2000, Pierpaolo Fumagalli wrote:

> Mike Pogue wrote:
> > 
> > 1) It has been reported that the Crimson code is 50% faster than
> > Xerces-J when running on a Sparc Ultra-5, however Xerces-J is 40% faster
> > than Crimson on a Windows NT machine. It's not obvious to me why this
> > would be true!  We need to figure out WHY, so we can create a single
> > code base that is fast on BOTH.
> 
> Ok... I hate secrecy... It was me running the tests... and, kids, that's
> _TOTALLY_ unbelievable :) :) :)
> 
> 	Pier
> 
> -- 
> ----------------------------------------------------------------------
> pier: stable structure erected over water to allow docking of seacraft
> <ma...@betaversion.org>      <http://www.betaversion.org/~pier/>
> ----------------------------------------------------------------------
> 


Re: looking at Crimson merge -- three suggestions

Posted by Pierpaolo Fumagalli <pi...@apache.org>.
Mike Pogue wrote:
> 
> 1) It has been reported that the Crimson code is 50% faster than
> Xerces-J when running on a Sparc Ultra-5, however Xerces-J is 40% faster
> than Crimson on a Windows NT machine. It's not obvious to me why this
> would be true!  We need to figure out WHY, so we can create a single
> code base that is fast on BOTH.

Ok... I hate secrecy... It was me running the tests... and, kids, that's
_TOTALLY_ unbelievable :) :) :)

	Pier

-- 
----------------------------------------------------------------------
pier: stable structure erected over water to allow docking of seacraft
<ma...@betaversion.org>      <http://www.betaversion.org/~pier/>
----------------------------------------------------------------------


Re: looking at Crimson merge -- three suggestions

Posted by Pierpaolo Fumagalli <pi...@apache.org>.
Arnaud Le Hors wrote:
> 
> I don't really have a problem with this but on the principle I find a
> little odd to do something like that over a week-end...

Arnaud, please... When replying please quote at least something of the
original message (otherwise I'll just freak out trying to understand :)

C'ya :)

	Pier

-- 
----------------------------------------------------------------------
pier: stable structure erected over water to allow docking of seacraft
<ma...@betaversion.org>      <http://www.betaversion.org/~pier/>
----------------------------------------------------------------------



Re: looking at Crimson merge -- three suggestions

Posted by Rajiv Mordani <Ra...@eng.sun.com>.
I am still making changes to the code to build it with Xerces. It still
doesn't build. Once I have that, I will do some more testing for memory
and speed performance and then send out the mail to the list. So wait for
some more mail to come by. Sorry about not discussing this in more detail
here.

- Rajiv

--
:wq

On Wed, 5 Apr 2000, Mike Pogue wrote:

> Rajiv,
> 
> 	No big deal here...I think the issue was that it was checked into a
> somewhat unusual place in the tree, with only a single +1 (that I could
> find), and no discussion beforehand of exactly where the code should
> really go in the tree (that I could find).
> Doing it on the weekend resulted in less discussion than is normal for a
> change this big....
> 
> 	But, this is the first time we've done this kind of "whiteboarding", so
> people will probably take some time to let everybody get used to it.
> 
> 	Just so everybody is aware -- the Crimson DOM code is in the main part
> of the tree now, so it can be evaluated, measured, poked, and prodded,
> so we can figure out how best to integrate it into the main body of
> code...
> 
> 	Duncan and I have talked specifically about measuring the memory
> consumption and performance, to see how different this new DOM is, and
> so we can best figure out what to do (should we replace the current lazy
> DOM with the Crimson one?  Make the Crimson one a second DOM, targeted
> at a different memory/performance point?  It's hard to decide anything,
> until we can actually measure what it does!)...
> 
> Mike
> 
> Rajiv Mordani wrote:
> > 
> > Could you be a little more specific why you have a problem with this? I
> > don't think that there is any such guideline that says code shouldn't be
> > checked in over the weekend. In fact the impression I have got is that a
> > lot of opensource development happens at night and over the weekends.
> > 
> > - Rajiv
> > 
> > --
> > :wq
> > 
> > On Mon, 3 Apr 2000, Arnaud Le Hors wrote:
> > 
> > > I don't really have a problem with this but on the principle I find a
> > > little odd to do something like that over a week-end...
> > > --
> > > Arnaud  Le Hors - IBM Cupertino, XML Technology Group
> > >
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> > For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
> 


Re: looking at Crimson merge -- three suggestions

Posted by Mike Pogue <mp...@apache.org>.
Rajiv,

	No big deal here...I think the issue was that it was checked into a
somewhat unusual place in the tree, with only a single +1 (that I could
find), and no discussion beforehand of exactly where the code should
really go in the tree (that I could find).
Doing it on the weekend resulted in less discussion than is normal for a
change this big....

	But, this is the first time we've done this kind of "whiteboarding", so
people will probably take some time to let everybody get used to it.

	Just so everybody is aware -- the Crimson DOM code is in the main part
of the tree now, so it can be evaluated, measured, poked, and prodded,
so we can figure out how best to integrate it into the main body of
code...

	Duncan and I have talked specifically about measuring the memory
consumption and performance, to see how different this new DOM is, and
so we can best figure out what to do (should we replace the current lazy
DOM with the Crimson one?  Make the Crimson one a second DOM, targeted
at a different memory/performance point?  It's hard to decide anything,
until we can actually measure what it does!)...

Mike

Rajiv Mordani wrote:
> 
> Could you be a little more specific why you have a problem with this? I
> don't think that there is any such guideline that says code shouldn't be
> checked in over the weekend. In fact the impression I have got is that a
> lot of opensource development happens at night and over the weekends.
> 
> - Rajiv
> 
> --
> :wq
> 
> On Mon, 3 Apr 2000, Arnaud Le Hors wrote:
> 
> > I don't really have a problem with this but on the principle I find a
> > little odd to do something like that over a week-end...
> > --
> > Arnaud  Le Hors - IBM Cupertino, XML Technology Group
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org

Re: looking at Crimson merge -- three suggestions

Posted by Rajiv Mordani <Ra...@eng.sun.com>.
Could you be a little more specific why you have a problem with this? I
don't think that there is any such guideline that says code shouldn't be
checked in over the weekend. In fact the impression I have got is that a
lot of opensource development happens at night and over the weekends.

- Rajiv

--
:wq

On Mon, 3 Apr 2000, Arnaud Le Hors wrote:

> I don't really have a problem with this but on the principle I find a
> little odd to do something like that over a week-end...
> -- 
> Arnaud  Le Hors - IBM Cupertino, XML Technology Group
> 


Re: looking at Crimson merge -- three suggestions

Posted by Arnaud Le Hors <le...@us.ibm.com>.
I don't really have a problem with this but on the principle I find a
little odd to do something like that over a week-end...
-- 
Arnaud  Le Hors - IBM Cupertino, XML Technology Group

Re: looking at Crimson merge -- three suggestions

Posted by Rajiv Mordani <Ra...@eng.sun.com>.
Since I don't have any vetoes so far I shall go ahead and create the
whiteboard.

- Rajiv

--
:wq

On Fri, 31 Mar 2000, Arkin wrote:

> +1
> 
> arkin
> 
> Rajiv Mordani wrote:
> > 
> > I have started looking at 1 and 2. One of the proposals that I had was to
> > create a whiteboard (won't be part of the std build for xerces so not to
> > worry) under xerces-j and make the crimson DOM implementation work with
> > xerces and see the outcome. Can we have a round of +1s for that.  This
> > would also integrate the ElementFactory that someone had asked for earlier
> > on the mailing list..
> > 
> > - Rajiv
> > 
> > --
> > :wq
> > 
> > On Fri, 31 Mar 2000, Mike Pogue wrote:
> > 
> > > Thanks to the folks from Sun for all their hard work in making this
> > > happen! Now we have some work to do!   :-)
> > >
> > > The xml-contrib area is designed so that people can look at the code,
> > > try it out, etc.  The license issues have all been worked out, and the
> > > code is now under the Apache 1.1 license, so feel free to look at it,
> > > play with it, figure it out, etc.
> > >
> > > I have a couple of major suggestions (I'm sure that other people have
> > > more):
> > >
> > > 1) It has been reported that the Crimson code is 50% faster than
> > > Xerces-J when running on a Sparc Ultra-5, however Xerces-J is 40% faster
> > > than Crimson on a Windows NT machine. It's not obvious to me why this
> > > would be true!  We need to figure out WHY, so we can create a single
> > > code base that is fast on BOTH.
> > >
> > > 2) Crimson has a DOM implementation that is particularly interesting.
> > > It has been reported that it "scales better" as the size of an XML
> > > document goes up, but that is not my experience (but, I've been looking
> > > only at Windows NT, so this could again be a Sparc/Windows difference).
> > > This could be due to differences in memory consumption, or something
> > > else altogether.  We should be able to figure out what's going on here,
> > > and get the best of both worlds.  Because the Xerces DOM is pluggable,
> > > > we might need to end up with two DOMs that are optimized for two
> > > different things:  a) the current deferred DOM is optimized for
> > > performance, but maybe not for memory consumption, and b) perhaps the
> > > Crimson DOM is optimized for memory consumption.
> > >
> > > 3) Now that we can see the XHTML code, we should be able to compare
> > > Assaf's HTML parser code, and the new Crimson code, so we can end up
> > > with the best of both.  We routinely get requests for HTML parsing, and
> > > this is a pretty self-contained area, so it's a great opportunity to
> > > jump in and contribute!
> > >
> > > All of these things are high on my list -- does anybody want to take a
> > > crack at them? This is a great opportunity for some new people to jump
> > > in, and check out all the code...
> > >
> > > Mike
> > >
> > > P.S.  Traffic is now moving to the xerces-j list...please adjust your
> > > mailing list subscriptions accordingly!
> > >
> > > Rajiv Mordani wrote:
> > > >
> > > > > Announcing the release of the code for Crimson XML Parsing Core Library..
> > > > This code is based on Sun's Java Project X and is available via the cvs
> > > > module xml-contrib/crimson for people to look at... Please read the README
> > > > for directions on how to build the source. The list of features to be
> > > > included into xerces is yet to be decided.
> > > >
> > > > - Rajiv
> > > >
> > > > --
> > > > :wq
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> > > For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
> > >
> 
> -- 
> ----------------------------------------------------------------------
> Assaf Arkin                                           www.exoffice.com
> CTO, Exoffice Technologies, Inc.                        www.exolab.org
> 
> 


Re: looking at Crimson merge -- three suggestions

Posted by Arkin <ar...@exoffice.com>.
+1

arkin

Rajiv Mordani wrote:
> 
> I have started looking at 1 and 2. One of the proposals that I had was to
> create a whiteboard (won't be part of the std build for xerces so not to
> worry) under xerces-j and make the crimson DOM implementation work with
> xerces and see the outcome. Can we have a round of +1s for that.  This
> would also integrate the ElementFactory that someone had asked for earlier
> on the mailing list..
> 
> - Rajiv
> 
> --
> :wq
> 
> On Fri, 31 Mar 2000, Mike Pogue wrote:
> 
> > Thanks to the folks from Sun for all their hard work in making this
> > happen! Now we have some work to do!   :-)
> >
> > The xml-contrib area is designed so that people can look at the code,
> > try it out, etc.  The license issues have all been worked out, and the
> > code is now under the Apache 1.1 license, so feel free to look at it,
> > play with it, figure it out, etc.
> >
> > I have a couple of major suggestions (I'm sure that other people have
> > more):
> >
> > 1) It has been reported that the Crimson code is 50% faster than
> > Xerces-J when running on a Sparc Ultra-5, however Xerces-J is 40% faster
> > than Crimson on a Windows NT machine. It's not obvious to me why this
> > would be true!  We need to figure out WHY, so we can create a single
> > code base that is fast on BOTH.
> >
> > 2) Crimson has a DOM implementation that is particularly interesting.
> > It has been reported that it "scales better" as the size of an XML
> > document goes up, but that is not my experience (but, I've been looking
> > only at Windows NT, so this could again be a Sparc/Windows difference).
> > This could be due to differences in memory consumption, or something
> > else altogether.  We should be able to figure out what's going on here,
> > and get the best of both worlds.  Because the Xerces DOM is pluggable,
> > > we might need to end up with two DOMs that are optimized for two
> > different things:  a) the current deferred DOM is optimized for
> > performance, but maybe not for memory consumption, and b) perhaps the
> > Crimson DOM is optimized for memory consumption.
> >
> > 3) Now that we can see the XHTML code, we should be able to compare
> > Assaf's HTML parser code, and the new Crimson code, so we can end up
> > with the best of both.  We routinely get requests for HTML parsing, and
> > this is a pretty self-contained area, so it's a great opportunity to
> > jump in and contribute!
> >
> > All of these things are high on my list -- does anybody want to take a
> > crack at them? This is a great opportunity for some new people to jump
> > in, and check out all the code...
> >
> > Mike
> >
> > P.S.  Traffic is now moving to the xerces-j list...please adjust your
> > mailing list subscriptions accordingly!
> >
> > Rajiv Mordani wrote:
> > >
> > > > Announcing the release of the code for Crimson XML Parsing Core Library..
> > > This code is based on Sun's Java Project X and is available via the cvs
> > > module xml-contrib/crimson for people to look at... Please read the README
> > > for directions on how to build the source. The list of features to be
> > > included into xerces is yet to be decided.
> > >
> > > - Rajiv
> > >
> > > --
> > > :wq
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> > For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
> >

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org



Re: looking at Crimson merge -- three suggestions

Posted by Rajiv Mordani <Ra...@eng.sun.com>.
I have started looking at 1 and 2. One of the proposals that I had was to
create a whiteboard (won't be part of the std build for xerces so not to
worry) under xerces-j and make the crimson DOM implementation work with
xerces and see the outcome. Can we have a round of +1s for that.  This
would also integrate the ElementFactory that someone had asked for earlier
on the mailing list..

- Rajiv

--
:wq

On Fri, 31 Mar 2000, Mike Pogue wrote:

> Thanks to the folks from Sun for all their hard work in making this
> happen! Now we have some work to do!   :-)
> 
> The xml-contrib area is designed so that people can look at the code,
> try it out, etc.  The license issues have all been worked out, and the
> code is now under the Apache 1.1 license, so feel free to look at it, 
> play with it, figure it out, etc.
> 
> I have a couple of major suggestions (I'm sure that other people have
> more):
> 
> 1) It has been reported that the Crimson code is 50% faster than
> Xerces-J when running on a Sparc Ultra-5, however Xerces-J is 40% faster
> than Crimson on a Windows NT machine. It's not obvious to me why this
> would be true!  We need to figure out WHY, so we can create a single
> code base that is fast on BOTH.
> 
> 2) Crimson has a DOM implementation that is particularly interesting. 
> It has been reported that it "scales better" as the size of an XML
> document goes up, but that is not my experience (but, I've been looking
> only at Windows NT, so this could again be a Sparc/Windows difference).
> This could be due to differences in memory consumption, or something
> else altogether.  We should be able to figure out what's going on here,
> and get the best of both worlds.  Because the Xerces DOM is pluggable,
> we might need to end up with two DOMs that are optimized for two
> different things:  a) the current deferred DOM is optimized for
> performance, but maybe not for memory consumption, and b) perhaps the
> Crimson DOM is optimized for memory consumption.  
> 
> 3) Now that we can see the XHTML code, we should be able to compare
> Assaf's HTML parser code, and the new Crimson code, so we can end up
> with the best of both.  We routinely get requests for HTML parsing, and
> this is a pretty self-contained area, so it's a great opportunity to
> jump in and contribute!
> 
> All of these things are high on my list -- does anybody want to take a
> crack at them? This is a great opportunity for some new people to jump
> in, and check out all the code...
> 
> Mike
> 
> P.S.  Traffic is now moving to the xerces-j list...please adjust your
> mailing list subscriptions accordingly!
> 
> Rajiv Mordani wrote:
> > 
> > Announcing the release of the code for Crimson XML Parsing Core Library..
> > This code is based on Sun's Java Project X and is available via the cvs
> > module xml-contrib/crimson for people to look at... Please read the README
> > for directions on how to build the source. The list of features to be
> > included into xerces is yet to be decided.
> > 
> > - Rajiv
> > 
> > --
> > :wq
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
> 


Re: looking at Crimson merge -- three suggestions

Posted by Mike Pogue <mp...@apache.org>.
Rajiv Mordani wrote:
> On Fri, 31 Mar 2000, Mike Pogue wrote:
> > 2) Crimson has a DOM implementation that is particularly interesting.
> > It has been reported that it "scales better" as the size of an XML
> > document goes up, but that is not my experience (but, I've been looking
> > only at Windows NT, so this could again be a Sparc/Windows difference).
> > This could be due to differences in memory consumption, or something
> > else altogether.  We should be able to figure out what's going on here,
> > and get the best of both worlds.  Because the Xerces DOM is pluggable,
> > we might need to end up with two DOM's that are optimized for two
> > different things:  a) the current deferred DOM is optimized for
> > performance, but maybe not for memory consumption, and b) perhaps the
> > Crimson DOM is optimized for memory consumption.
> 
> The differences were in the SAX parsing. In DOM there was no difference on
> windows / Sparc. On both the platforms the crimson dom parser scales
> better in both the time to create the DOM and the size of the files.
> 
> - Rajiv

That has not been my experience in our tests, where the Xerces-DOM was
better on initial parse, and on subsequent traversal/use of the tree.
We should compare tests! We might have missed something here... How much
memory do you give the thing when you run it? Maybe that's one
difference that we haven't accounted for...

Mike

P.S.  REMINDER: PLEASE MOVE TO THE XERCES-J-DEV MAILING LIST!!
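
One way to make the comparison discussed above concrete is a small harness
that parses synthetic documents of increasing size, walks the whole tree,
and reports the heap the JVM was given. This is only a minimal sketch, not
the test that either side actually ran. The class name DomScalingSketch,
the helper names, and the generated document shape are all made up for
illustration, and it uses whatever JAXP parser happens to be on the
classpath (selecting Xerces or Crimson specifically is left as an
assumption, e.g. via the javax.xml.parsers.DocumentBuilderFactory system
property).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical harness: names and document shape are illustrative only.
public class DomScalingSketch {

    // Build a synthetic document with the given number of <item> elements.
    static byte[] makeXml(int items) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id=\"").append(i).append("\">text ")
              .append(i).append("</item>");
        }
        return sb.append("</root>").toString().getBytes(StandardCharsets.UTF_8);
    }

    // Walk the whole tree so that a lazily built (deferred) DOM has to
    // materialize every node; returns the number of nodes visited.
    static int countNodes(Node node) {
        int count = 1;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countNodes(c);
        }
        return count;
    }

    // Rough estimate of heap currently in use.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Report the heap limit, since it may explain differing results.
        System.out.println("max heap (bytes): " + Runtime.getRuntime().maxMemory());

        for (int items = 1000; items <= 100000; items *= 10) {
            byte[] xml = makeXml(items);
            System.gc(); // only a hint; heap numbers stay approximate
            long heapBefore = usedHeap();

            long t0 = System.nanoTime();
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            long t1 = System.nanoTime();
            int nodes = countNodes(doc);
            long t2 = System.nanoTime();

            long heapAfter = usedHeap();
            System.out.printf(
                "%d items: parse %.1f ms, traverse %.1f ms, %d nodes, heap delta ~%d KB%n",
                items, (t1 - t0) / 1e6, (t2 - t1) / 1e6, nodes,
                (heapAfter - heapBefore) / 1024);
        }
    }
}
```

Running the same class with different heap limits (for example
java -Xmx64m DomScalingSketch) and on both platforms would show whether
the memory setting accounts for the differing results. Timing parse and
traversal separately matters because a deferred DOM can look fast on the
initial parse and pay the cost later, during traversal.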

Re: looking at Crimson merge -- three suggestions

Posted by Rajiv Mordani <Ra...@eng.sun.com>.

--
:wq

On Fri, 31 Mar 2000, Mike Pogue wrote:

> Thanks to the folks from Sun for all their hard work in making this
> happen! Now we have some work to do!   :-)
> 
> The xml-contrib area is designed so that people can look at the code,
> try it out, etc.  The license issues have all been worked out, and the
> code is now under the Apache 1.1 license, so feel free to look at it, 
> play with it, figure it out, etc.
> 
> I have a couple of major suggestions (I'm sure that other people have
> more):
> 
> 1) It has been reported that the Crimson code is 50% faster than
> Xerces-J when running on a Sparc Ultra-5, however Xerces-J is 40% faster
> than Crimson on a Windows NT machine. It's not obvious to me why this
> would be true!  We need to figure out WHY, so we can create a single
> code base that is fast on BOTH.
> 
> 2) Crimson has a DOM implementation that is particularly interesting. 
> It has been reported that it "scales better" as the size of an XML
> document goes up, but that is not my experience (but, I've been looking
> only at Windows NT, so this could again be a Sparc/Windows difference).
> This could be due to differences in memory consumption, or something
> else altogether.  We should be able to figure out what's going on here,
> and get the best of both worlds.  Because the Xerces DOM is pluggable,
> we might need to end up with two DOM's that are optimized for two
> different things:  a) the current deferred DOM is optimized for
> performance, but maybe not for memory consumption, and b) perhaps the
> Crimson DOM is optimized for memory consumption.  

The differences were in the SAX parsing. In DOM there was no difference on
windows / Sparc. On both the platforms the crimson dom parser scales
better in both the time to create the DOM and the size of the files. 

- Rajiv

> 
> 3) Now that we can see the XHTML code, we should be able to compare
> Assaf's HTML parser code, and the new Crimson code, so we can end up
> with the best of both.  We routinely get requests for HTML parsing, and
> this is a pretty self-contained area, so it's a great opportunity to
> jump in and contribute!
> 
> All of these things are high on my list -- does anybody want to take a
> crack at them? This is a great opportunity for some new people to jump
> in, and check out all the code...
> 
> Mike
> 
> P.S.  Traffic is now moving to the xerces-j list...please adjust your
> mailing list subscriptions accordingly!
> 
> Rajiv Mordani wrote:
> > 
> > Announcing the release of the code for Crimson XML Parsing Core Library..
> > This code is based on Sun's Java Project X and is available via the cvs
> > module xml-contrib/crimson for people to look at... Please read the README
> > for directions on how to build the source. The list of features to be
> > included into xerces is yet to be decided.
> > 
> > - Rajiv
> > 
> > --
> > :wq
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
> 

