
Posted to general@xml.apache.org by James Duncan Davidson <ja...@eng.sun.com> on 2000/07/08 07:31:16 UTC

[spinnaker] Announce

It's been a while since Xerces was launched onto the world. And more
recently we received Crimson to compare it to. From experience and this
comparison, we've found a few things to be evident.

    * Xerces is performant on JDK 1.1 VMs. Very much so. Admirably
      so in fact.

    * Crimson isn't so optimized, yet it runs about as fast as Xerces
      does on modern VMs such as HotSpot. The HotSpot team told us
      that code heavily optimized for 1.1 would not benefit under
      HotSpot. We have the proof now. In fact, there are cases where
      Xerces seems to slow down.

    * Xerces has a large memory consumption. And a large Jar size.
      Small size probably wasn't an original design goal, but there
      is a category of users we've talked to that has an issue with
      this.

    * Use of Xerces is widespread. Obviously people want a good, high
      quality parser from a free source.

    * Xerces is a great product. It stands well in the marketplace.

    * However, because Xerces was heavily pre-optimized, it's
      extremely complex to understand and delve into. I think
      that this is best reflected in the fact that most of the bits
      that go into Xerces come from IBM Cupertino.

    * In our analysis of the Xerces code base, we can't use it for
      future inclusion in the JDK. The pre-optimization is a killer.
      The code-complexity is a killer. And the memory consumption is
      a problem.

These are not unknown problems. Ted L. and I talked about the current Xerces
source base at length at ApacheCon (as we were working out details for
getting Crimson donated). Ted put forward the opinion that it might be best
to do a massive refactoring based on the lessons learned from both parsers:
in essence, to build a new parser from the ground up that has a heritage in
both existing parsers.

I've come to the conclusion that I agree with him. After quite a bit of
discussion, the rest of the XML team at Sun, the people who are responsible
for the parser that will ship in the core of future JDKs, agree as well. It
is important to stress that we want to ship an Apache based parser in the
JDK for all the reasons that you'd expect. Apache code tends to be good
code. The Apache process is one that we believe in.

So, in the best of Apache traditions, we're gonna do something about it. I'm
creating a tree in the xml-contrib area in which to do a lot of code work to
explore how such a new parser could come to be. It's called Spinnaker.

This is the Spinnaker project description based on the README that will get
checked in:
-=-------------------------------------------------------------------------

Spinnaker is an attempt to create a next generation Apache XML Parser based
on all the lessons learned from the current versions of Xerces and Crimson.

GOALS:

    * Simple to read, maintainable code. Above all, this is the primary goal
      for any openly developed project as without the ability to read the
      code, it's impossible for people to contribute and get involved.

    * Smallest possible size. This means small distribution size (JAR file)
      and small memory footprint.

    * Modular. It should be possible to build a parser as a set of Jar files
      so that a smaller parser can be assembled which fits the need of a
      particular implementation. For example, in TV sets do you really need
      validation?

    * Cleanly Optimized. This means optimized in a way that is compatible
      with modern virtual machines such as HotSpot. Optimizations that work
      well with JDK 1.1 style VMs can actually impact performance under
      more modern VMs. Optimizations that interfere with readability,
      modularity, or size will be shunned.

    * Collaboratively Developed. This means that we want *lots* of people
      from diverse backgrounds to participate in this barn raising.

PLAN OF RECORD:

In order to bootstrap what will essentially be a full refactoring of what an
XML parser is (based on our two existing ones), the following is a list of
possible checkpoints to hit.

    * First, factor out utility classes from both the Xerces and Crimson
      source bases. There is a lot of good work on things like the Xerces
      decoders which are faster than the JDK's. This is actually the start
      of an Apache wide common utility set (something that I'd like
      to see in the future as AUC -- Apache Utility Classes). We've talked
      about this before in other Apache projects, and there's a lot of
      good code that we can start it off with here.

    * Determine what the modular API looks like. What are the various
      pieces that can be factored out? How can we get to a point where it's
      easy to package a parser that doesn't include DOM or a particular
      validator? There's some work started on a branch, but it hasn't
      been touched in a month or so. This might serve as a starting place.

    * Refactor out a base parser. Once we see how those APIs should look (or
      at least get a start, they don't have to be perfect :) we start at
      the bottom and look at the code of the existing parsers to come up
      with a basic non-validating parser that can rip through XML.

    * Set SAX on top of this base parser. Of course.

    * Look at pluggable validation.

    * Factor in tree based producers. We'd like to see DOM and JDOM up
      front.

    * Stability. By this point, we should have something that is starting
      to work well. Stability will be a driving goal then.

It should be said up front that this won't happen overnight. It will be a
while before any fruit starts to grow.
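To make the "modular" goal and the "pluggable validation" checkpoint a little more concrete, here is a minimal Java sketch. It assumes a design in which the base parser only reports structural events, and each optional module, such as a validator, registers as one more listener. All type and method names (EventSink, Validator, BaseParser, NestingValidator) are hypothetical and do not come from Xerces, Crimson, or Spinnaker. The toy parser reads pre-tokenized tag names rather than real XML.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: none of these names come from Xerces, Crimson, or Spinnaker.

/** Receives structural events from the base parser, loosely in the spirit of SAX. */
interface EventSink {
    void startElement(String name);
    void endElement(String name);
}

/** An optional module: a validator is just one more sink plugged into the base parser. */
interface Validator extends EventSink {
    boolean isValid();
}

/** Toy non-validating "base parser": reads pre-tokenized tags ("a" opens, "/a" closes). */
class BaseParser {
    private final List<EventSink> sinks = new ArrayList<>();

    void addSink(EventSink sink) {
        sinks.add(sink);
    }

    void parse(List<String> tokens) {
        for (String token : tokens) {
            boolean closing = token.startsWith("/");
            String name = closing ? token.substring(1) : token;
            // Forward each event to every registered module, in registration order.
            for (EventSink sink : sinks) {
                if (closing) {
                    sink.endElement(name);
                } else {
                    sink.startElement(name);
                }
            }
        }
    }
}

/** Example pluggable validator: checks only that start and end tags nest correctly. */
class NestingValidator implements Validator {
    private final Deque<String> open = new ArrayDeque<>();
    private boolean ok = true;

    public void startElement(String name) {
        open.push(name);
    }

    public void endElement(String name) {
        // A close tag must match the most recently opened element.
        if (open.isEmpty() || !open.pop().equals(name)) {
            ok = false;
        }
    }

    public boolean isValid() {
        return ok && open.isEmpty();
    }
}
```

In this arrangement the base parser has no dependency on any validator. A build for a constrained device (the TV set example under GOALS) could leave the validator class out of its Jar files and never register it.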

-=-------------------------------------------------------------------------

So, to close a few thoughts...

Q. Isn't this a slam on the Xerces guys?
A. Nope. This is a natural thing that happens when people get an itch to
scratch in the Apache organization. It should be pointed out that Apache
Webserver 2.0 started out as a thought project, and that the next version of
Tomcat may very well be Catalina which was a similar refactoring of the
current Tomcat source base.

Q. When will this be ready?
A. Damn if I know. Not anytime immediately to be sure. There's a bit of work
to be done.

Q. Where's the repository gonna be?
A. $CVSROOT/xml-contrib/spinnaker

Q. When's the code going to go in?
A. Well, the initial little itty bit that I've done so far to set up a
directory structure and identify a few utility classes is going to be put in
in just a few minutes time after this email goes out. I'll be working on
more pieces throughout the weekend that will beef things up.

Q. Is this Xerces 2.0?
A. No. Not Yet. And maybe Never. It would take the acceptance of the
developer community to be so. For the time being, it's just a code base
where some of us are going to hang out and work. It should be said that
software darwinism could strike and this code base goes absolutely nowhere.
Or, as I hope, this is going to take off and really work out.

Q. Can I help?
A. Duh....

Oh, and by the way, to help keep discussion separate, please use [spinnaker]
in your subject lines. This has been a help on the Tomcat lists.

That's all for now... Let the code start flowing. ;)

.duncan

Re: design docs and diagrams

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/9/00 4:56 PM, Arved Sandstrom at Arved_37@chebucto.ns.ca wrote:

> This was very interesting, as I have used xfig myself (although not
> recently), and never thought of it. I went and did a Google search and
> located "Universal Modeling Language Library for Xfig"


.....

Note that I've started a design thread on xerces-j-dev... Since you guys are
interested in design, come on in.. :)

.duncan

Re: design docs and diagrams

Posted by Arved Sandstrom <Ar...@chebucto.ns.ca>.

This was very interesting, as I have used xfig myself (although not 
recently), and never thought of it. I went and did a Google search and 
located "Universal Modeling Language Library for Xfig"

http://epb1.lbl.gov/xfig/libraries/UML/index.html

which looks like it might fit the bill. I suspect that this would be our 
closest *UNIX counterpart to Visio.

At 12:55 PM 7/9/00 -0400, Guy Hulbert wrote:
> [ SNIPPAGE ]
>RJP> I think there are other factors which do more to discourage the use of
>RJP> these diagrams in open-source projects. After all, I use Rose, Visio,
>RJP> etc. to produce a diagram in a standard graphic format (GIF, EPS, ...)
>RJP> which I then use in documentation or communication.
>
>Unless you need anything more than templates and pictures (if all you need is
>a "diagram") then Xfig is fine.  I thought that some of these commercial tools
>also were useful for code generation etc.

I agree; for straight diagrams having templates is sufficient.

The commercial tools offer reverse and/or round-trip engineering, which I 
use with caution. However, this can be extremely useful. I recently imported 
the latest FOP into Together/J, and plan to start uploading the resulting 
class diagrams to FOP CVS. The alternative would have been painful. :-)

One thing that even Visio has is some built-in UML intelligence. Once 
classes are defined, it is useful to be able to call up a list of available 
methods when drawing a method invocation in a dynamic diagram, for example. 
This helps maintain consistency across diagrams. I suspect that Xfig with 
the UML library probably doesn't do this, so the onus is on the 
illustrator-developer.

>The fig format is pure text, extremely simple and well-documented and a free
>UML tool that used this format would probably not be too hard to create and
>would have the added bonus that the diagrams would be loaded into Xfig for
>editing etc.
>
>	<snip>
>
>----
>Guy Hulbert, Informatics Director	Bioinformatics Supercomputing Centre
>(416) 813-8876				555 University Avenue
>email: guy@bioinfo.sickkids.on.ca	The Hospital for Sick Children
>http:  www.bioinfo.sickkids.on.ca	Toronto, ON, M5G 1X8, CANADA.

Senior Developer
e-plicity.com (www.e-plicity.com)
Halifax, Nova Scotia
"B2B Wireless in Canada's Ocean Playground"

Re: design docs and diagrams

Posted by Guy Hulbert <gu...@bioinfo.sickkids.on.ca>.

On Sun, 9 Jul 2000, Randall J. Parr wrote:

	<snip>

RJP> I heartily agree that many (most) of the open source projects I've worked
RJP> with would benefit greatly from requirements and design documentation;
RJP> most especially some diagrams. I find that a dozen or so good diagrams

	<snip>

RJP> I also agree that part of the problem is free and/or affordable tools. I
RJP> have and use Visio, Rational Rose, Oracle Designer, and others. I too am
RJP> looking hopefully towards the next release of ArgoUML and Dia. I would
RJP> really like to be free of high-cost, proprietary, MS WINDOWS based tools.

Personally, I use xfig for "drawing".

RJP> 
RJP> I think there are other factors which do more to discourage the use of
RJP> these diagrams in open-source projects. After all, I use Rose, Visio,
RJP> etc. to produce a diagram in a standard graphic format (GIF, EPS, ...)
RJP> which I then use in documentation or communication.

Unless you need anything more than templates and pictures (if all you need is
a "diagram") then Xfig is fine.  I thought that some of these commercial tools
also were useful for code generation etc.

The fig format is pure text, extremely simple and well-documented and a free
UML tool that used this format would probably not be too hard to create and
would have the added bonus that the diagrams would be loaded into Xfig for
editing etc.

	<snip>

----
Guy Hulbert, Informatics Director	Bioinformatics Supercomputing Centre
(416) 813-8876				555 University Avenue
email: guy@bioinfo.sickkids.on.ca	The Hospital for Sick Children
http:  www.bioinfo.sickkids.on.ca	Toronto, ON, M5G 1X8, CANADA.

design docs and diagrams [was Re: [spinnaker] Announce]

Posted by "Randall J. Parr" <RP...@TemporalArts.COM>.

Arved Sandstrom wrote:

> At 10:38 PM 7/8/00 -0700, James Duncan Davidson wrote:
> >on 7/8/00 3:31 PM, Arved Sandstrom at Arved_37@chebucto.ns.ca wrote:
> >
> >> On a very related issue, as "developers" we don't spend most of our time on
> >> code. When we communicate, we communicate with design documents. UML,
> >> IEEE-compliant design descriptions, yada yada.
> >
> >I don't see a lot of UML on Open Source projects. Typically we communicate
> >with code and mail. But that's a personal thing. I can hold a discussion in
> >UML as well as anything -- so what's your design for a NG parser? :)
>
> I don't personally see _any_ UML on open source projects. :-) I wouldn't
> mind seeing some judicious use of the most expressive diagrams, and I plan
> to start sneaking them into FOP, which has some fairly intricate stuff
> happening, and could benefit from them.
>
> This is a completely unrelated tack, but maybe that's something we want to
> encourage - communication of designs with formal modelling notation (not
> getting completely anal, mind you), rather than code. I suppose the main
> problem right now is access to the tools - Visio 2000, or Together/J, or
> Rational Rose, are not cheap. I'm keeping an eye on ArgoUML, though (which
> is under the auspices of the Tigris project), and recommend it to anyone who
> wishes to get a start with UML. There isn't support for everything yet, but
> that's coming.

I heartily agree that many (most) of the open source projects I've worked with
would benefit greatly from requirements and design documentation; most especially
some diagrams. I find that a dozen or so good diagrams often do a better job of
conveying the design and, perhaps more importantly, the design approach or
philosophy, than written documents. More times than I can count, I have found
designers, programmers, and end-users working off a handful of diagrams
they've ripped from the design book.

I also agree that part of the problem is free and/or affordable tools. I have and
use Visio, Rational Rose, Oracle Designer, and others. I too am looking hopefully
towards the next release of ArgoUML and Dia. I would really like to be free of
high-cost, proprietary, MS WINDOWS based tools.

I think there are other factors which do more to discourage the use of these
diagrams in open-source projects. After all, I use Rose, Visio, etc. to produce a
diagram in a standard graphic format (GIF, EPS, ...) which I then use in
documentation or communication.

1) Use of any graphic formats, inclusions, attachments, etc. is very actively
discouraged in most (if not all) of the forums wherein the requirements and design
discussion occur.

2) Few projects seem to give a fraction of the thought to
developing/coordinating/versioning documentation source (design or other) that they
give to code source.

It strikes me that documentation and communication of the agreed-upon core design
should be one of the most important requirements of an open source project.

3) Use of diagrams and the like seems to be discouraged even in the design/program
documentation which is produced.

I could blather on (this is a pet peeve of mine) but I won't.

R.Parr
Temporal Arts

Re: [spinnaker] Announce

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/9/00 4:55 AM, Arved Sandstrom at Arved_37@chebucto.ns.ca wrote:

> I don't personally see _any_ UML on open source projects. :-) I wouldn't
> mind seeing some judicious use of the most expressive diagrams, and I plan
> to start sneaking them into FOP, which has some fairly intricate stuff
> happening, and could benefit from them.

That's the best way to do it. Just use 'em. I think that they are great for
talking about overall direction and code modularity. Sometimes I get worried
when people UML hashtables and other fine grained things, but... that's just
a personal thing. :)

> I'm keeping an eye on ArgoUML, though (which is under the auspices of the
> Tigris project), and recommend it to anyone who wishes to get a start with
> UML. There isn't support for everything yet, but that's coming.

One could always do it the very hard way in Gimp or something. :) Ok, maybe
not.. :)

>> No, new branches are experimental. And at Apache, we encourage
>> experimentation. To the point where on the Jakarta project we created a
>> manifesto for it that expresses some of the points in use from early days of
>> Apache. I'll try to dig that out of the archives and post it somewhere.
> 
> I'd like to see it.

I finally figured out how to dig it out of the archives... It took a special
mix of being an apmail user and a fancy 'find . -exec grep' command. I've
posted it:

http://www.x180.com/rules

> I don't think anyone is arguing that new branches cannot be created. :-) I
> think that what is being expressed is a sense of dismay at the lack of news.

Ok, so regard this as news. A call to action. A discussion starter.

> You know, I could, theoretically, start a new FOP branch today, and it could
> contain one file - the interface for a complete FO processor, somewhat
> different than what we have now. Then I could go away for a few weeks.

I'm not going to go away for a few weeks. I'm only going to step it up. :)

> This would cause consternation and probably some anger - what's this guy up
> to? what's the direction of the project? am I spinning my wheels doing any
> more work on the main branch?

That's always a problem with revolutionary branches. It *does* cause
consternation on the main branch. Especially with the active developer
community. I am painfully aware of what the current core Xerces developers
are feeling as I've been there before. It's one of the things that gives
pause before you decide to do something like this. It's one of the things
that you have to take into account before you do something.

> Yes, you are correct. I think the significant thing here is that one
> normally doesn't start up a potential competitor under the auspices of the
> original. There would have been much less hullabaloo if this refactoring
> were taking place elsewhere, I think. Failing that, consider HR and
> politics. Keep people informed.

I think it would have been *less* open and caused *more* consternation to do
this elsewhere. However, if people would feel more comfortable with me going
off somewhere else to do this, so be it. I wouldn't feel comfortable doing it
anywhere besides Apache, though. And I have to warn you that if we
started spinnaker-dev@apache mailing lists, the likelihood is very high that
we'd end up with 2 parsers. Is that a bad thing? No, probably not, as they
would be targeted at very different things. But it's something that would
probably also cause consternation.

If the current Xerces team would feel more comfortable with me doing this
elsewhere, however, I'll be happy to oblige. After all, I don't
really want to piss people off or cause consternation -- I want a better
parser.

> You said the magic word - "collaborate". To me that means "communicate
> design decisions and announce intentions".

See the subject line: Announce.

See the original message -- a start of a GOALS: section detailing our
intentions. Followed by a plan to get there. I submit that I *did* "announce
intentions" and *did* "communicate [initial] design decisions" :)

> We had a situation over with FOP when James Tauber gave the software to
> Apache, and because of real work, has essentially withdrawn. There was no
> formal handover process that would have ensured that James' ideas as to
> design and implementation were captured - no design documentation. I think
> that he himself would acknowledge that this is not good. FOP is just one
> among many OS projects in this regard.

Yes. However, I have been following what you are doing on FOP -- and out of
all the xml.apache.org projects, it is FOP and Cocoon that are doing best
from a community process -- you are very real development communities. And
that's a *good* thing.

> Anyhow, I think this is interesting. Apache is maturing, and I don't think
> that is synonymous with "ossifying". :-) Stuff like this has to occur, and
> be argued, and dealt with.

Yep. Somebody said many years ago that at Apache, we don't always take the
best way of getting there, we have our problems, but somehow we do get
things done and in retrospect everything turns out good. Or something like
that -- I'll have to go look up the original...

.duncan

Re: [spinnaker] Announce

Posted by Arved Sandstrom <Ar...@chebucto.ns.ca>.

At 10:38 PM 7/8/00 -0700, James Duncan Davidson wrote:
>on 7/8/00 3:31 PM, Arved Sandstrom at Arved_37@chebucto.ns.ca wrote:
>
>> On a very related issue, as "developers" we don't spend most of our time on
>> code. When we communicate, we communicate with design documents. UML,
>> IEEE-compliant design descriptions, yada yada.
>
>I don't see a lot of UML on Open Source projects. Typically we communicate
>with code and mail. But that's a personal thing. I can hold a discussion in
>UML as well as anything -- so what's your design for a NG parser? :)

I don't personally see _any_ UML on open source projects. :-) I wouldn't 
mind seeing some judicious use of the most expressive diagrams, and I plan 
to start sneaking them into FOP, which has some fairly intricate stuff 
happening, and could benefit from them.

This is a completely unrelated tack, but maybe that's something we want to 
encourage - communication of designs with formal modelling notation (not 
getting completely anal, mind you), rather than code. I suppose the main 
problem right now is access to the tools - Visio 2000, or Together/J, or 
Rational Rose, are not cheap. I'm keeping an eye on ArgoUML, though (which 
is under the auspices of the Tigris project), and recommend it to anyone who 
wishes to get a start with UML. There isn't support for everything yet, but 
that's coming.

>> I personally don't want to
>> receive a new source tree as the first deliverable.
>
>Take a look at the spinnaker tree. There's not much there. In fact, some of
>my friends were concerned that there should have been more. What it is is a
>README that points out some obvious goals, then a few utility classes that
>I've managed to pull out of Xerces so far (And I'm pulling out more as we
>speak). You'll note that in the README I actually outline talking about an
>API set, and moving forward from there. I would have actually felt bad about
>checking in too much code at once because then it *would* have been a fiat.
>
> [ SNIPPAGE ]

Well, I'm satisfied that the intentions were good. In any case, since I'm 
not a Xerces member I don't exactly have to be satisfied anyway. But as an 
Apache XML member I had some concerns.

>> Speaking for myself, since I'm the release coordinator for FOP, if I saw
>> that some group came in out of the blue and launched an entire new source
>> tree I'd be pretty fired up. On any open-source project there is at least
>> the principle of coordination. You can commit new source without
>> consultation if it doesn't impact existing API's; new branches are a cut
>> above that and require some feedback.
>
>No, new branches are experimental. And at Apache, we encourage
>experimentation. To the point where on the Jakarta project we created a
>manifesto for it that expresses some of the points in use from early days of
>Apache. I'll try to dig that out of the archives and post it somewhere.

I'd like to see it.

I don't think anyone is arguing that new branches cannot be created. :-) I 
think that what is being expressed is a sense of dismay at the lack of news. 
You know, I could, theoretically, start a new FOP branch today, and it could 
contain one file - the interface for a complete FO processor, somewhat 
different than what we have now. Then I could go away for a few weeks. This 
would cause consternation and probably some anger - what's this guy up to? 
what's the direction of the project? am I spinning my wheels doing any more 
work on the main branch?

>Basically it expresses that anybody has a right to go off and create
>something new. If people (developers and users) flock to it, then it's
>something worthwhile and should be considered, if people don't, that sends
>an equally strong message. At the end of the day, it only becomes the
>sanctioned next version when everyone is convinced that it's the right thing
>to do.

Yes, you are correct. I think the significant thing here is that one 
normally doesn't start up a potential competitor under the auspices of the 
original. There would have been much less hullabaloo if this refactoring
were taking place elsewhere, I think. Failing that, consider HR and 
politics. Keep people informed. 

>Changes to the main tree require more process, things on the periphery
>should require no process. This is one of the problems with the Jakarta and
>XML processes -- they have been influenced by corporate thinking and the
>notion of processes has crept in -- and some amount of fiefdom. Some
>companies feel that one project is theirs to own, and the other project
>isn't something that they feel welcome in so they will shun it. I've seen it
>in both my corp, and in other corps. And that's BS. These projects are owned
>by Apache. And we play by Apache rules here. And the number one Apache rule
>is to collaborate and produce code (in fact the +1/0/-1 rule is about all
>that we really do have in stone). There aren't many rules, and those that we
>have change over time -- it doesn't work beautifully, but it does manage to
>work.

You said the magic word - "collaborate". To me that means "communicate 
design decisions and announce intentions".

There is absolutely nothing wrong with process. Most open source projects 
are operating at CMM Level 1 - projects are successful because of the 
individuals. Getting most of the processes covered under Levels 2 & 3 
injected into OS would be a boon, I think. I suspect that maybe these are 
not the kinds of processes you had in mind?

We had a situation over with FOP when James Tauber gave the software to 
Apache, and because of real work, has essentially withdrawn. There was no 
formal handover process that would have ensured that James' ideas as to 
design and implementation were captured - no design documentation. I think 
that he himself would acknowledge that this is not good. FOP is just one 
among many OS projects in this regard.

Anyhow, I think this is interesting. Apache is maturing, and I don't think 
that is synonymous with "ossifying". :-) Stuff like this has to occur, and 
be argued, and dealt with.

Arved Sandstrom

Senior Developer
e-plicity.com (www.e-plicity.com)
Halifax, Nova Scotia
"B2B Wireless in Canada's Ocean Playground"

Re: [spinnaker] Announce

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/8/00 3:31 PM, Arved Sandstrom at Arved_37@chebucto.ns.ca wrote:

> I disagree. It's not quite "spot-on" to blindside people with stuff like
> this. Over on FOP we have no problems in letting people know about upcoming
> CVS branches and other developments.

Well, if you take a look at the source tree for spinnaker, there's probably
10 files checked in right now, if that. This *is* starting the discussion of
what a new parser should look like. It is my experience that if you
mumble-mumble what you want to do, it gets kinda lost in the noise. If you
actually start with some amount of commitment to it, then, well, things move
forward.

This is kinda what happened on tomcat-dev with Catalina. Craig looked at the
code in detail, reviewed it at length, said "I don't like this" and started up
Catalina. Several months later, the process of design and discussion has
wound its way forward to a good piece of code that looks pretty
interesting. At the time he announced, there were those of us that felt a
bit hurt (especially since I wrote the first version of Tomcat while it was
still just an infant project at sun before it went Open Source), but after a
day or two of discussion, and introspection and seeing that things can
*always* be done better, I could see that his goals were in the right place,
so it wouldn't hurt going off and doing something.

> On a very related issue, as "developers" we don't spend most of our time on
> code. When we communicate, we communicate with design documents. UML,
> IEEE-compliant design descriptions, yada yada.

I don't see a lot of UML on Open Source projects. Typically we communicate
with code and mail. But that's a personal thing. I can hold a discussion in
UML as well as anything -- so what's your design for a NG parser? :)

> I personally don't want to
> receive a new source tree as the first deliverable.

Take a look at the spinnaker tree. There's not much there. In fact, some of
my friends were concerned that there should have been more. What it is is a
README that points out some obvious goals, then a few utility classes that
I've managed to pull out of Xerces so far (And I'm pulling out more as we
speak). You'll note that in the README I actually outline talking about an
API set, and moving forward from there. I would have actually felt bad about
checking in too much code at once because then it *would* have been a fiat.

In open source development, things get done because people push for them to
be done. It's important to note that the Apache process doesn't mandate
design by committee. It mandates openness and a minimum threshold
meritocracy. There's a difference.

I'm *starting* another source tree. The message was a call for
help/discussion/etc. I apologize if that wasn't more clear -- so let me make
it clear now:

Spinnaker is something that I want to start here and now. I want *your*
help in making it something that can be good. It's wide open. No design
decisions have been made. You can affect it right now. SO, what would you
like to see in a parser today? :)

> I'm not trying to take the piss out of anyone here, as the British say. It's
> just that I've had a heightened level of sensitivity to all of this ever
> since I saw that Sun produced an XSL processor. Why did they have to bother,
> I ask? Were we actually lacking for decent XSLT implementations?

Quite honestly, I asked the same thing. Just so you know, it was produced by
a different part of Sun that I have no relation or connection to. It was an
engineer trying to take a new approach to the problem of XSLT.

And XSLT the spec is so new that we're not going to have a good answer for
what an XSLT processor looks like for a few years to come. I imagine that
the processor that we have in a few years won't look like Xalan 1.0 or XT or
the Sun one. What's the point of thinking that we have the perfect piece of
software today? That's never the case.

There have in fact been many discussions about releasing the XSLT compiler as
open source so that it can be further developed in the open arena and
compete on equal terms with the other parsers out there. If this happens,
may the best one succeed. Software isn't a cathedral where we're looking for
one perfect manifestation. Ideas come up, get compared, swapped around, and
later merge into something better. Not all software is destined for
greatness. But everybody should be given the chance to go off and do
something big.

> Mind you, jaxp is pretty good, but again, why jaxp when you've already got
> other better stuff? So when I see this I see a big corporate entity starting
> to operate by fiat. And it raises my hackles.

Do you mean JAXP the spec or JAXP the implementation (also known as
Crimson)? There are lots of people that we know that like Crimson better than
Xerces. Lots of folks that looked at the Xerces code base, saw problems and
didn't really see a way to make a change. If I really wanted to start a war,
I would have pushed for more exposure and development of Crimson. It's a
decent enough parser that has different enough characteristics from Xerces
to be really interesting.

However, 1) I didn't want that war, 2) I wanted the next parser to be a
collaborative design... Done from the ground up in the open. Once again,
take a look at the source tree -- there's literally *nothing* there. The
design is wide open.

> Speaking for myself, since I'm the release coordinator for FOP, if I saw
> that some group came in out of the blue and launched an entire new source
> tree I'd be pretty fired up. On any open-source project there is at least
> the principle of coordination. You can commit new source without
> consultation if it doesn't impact existing API's; new branches are a cut
> above that and require some feedback.

No, new branches are experimental. And at Apache, we encourage
experimentation. To the point where on the Jakarta project we created a
manifesto for it that expresses some of the points in use from early days of
Apache. I'll try to dig that out of the archives and post it somewhere.

Basically it expresses that anybody has a right to go off and create
something new. If people (developers and users) flock to it, then it's
something worthwhile and should be considered; if people don't, that sends
an equally strong message. At the end of the day, it only becomes the
sanctioned next version when everyone is convinced that it's the right thing
to do.

Changes to the main tree require more process, things on the periphery
should require no process. This is one of the problems with the Jakarta and
XML processes -- they have been influenced by corporate thinking and the
notion of processes has crept in -- and some amount of fiefdom. Some
companies feel that one project is theirs to own, and the other project
isn't something that they feel welcome in so they will shun it. I've seen it
in both my corp, and in other corps. And that's BS. These projects are owned
by Apache. And we play by Apache rules here. And the number one Apache rule
is to collaborate and produce code (in fact the +1/0/-1 rule is about all
that we really do have in stone). There aren't many rules, and those that we
have change over time -- it doesn't work beautifully, but it does manage to
work.

Httpd 2.0 started out as a couple of people going off and hacking, then
other developers took a look at it. Only when everybody accepted it did it
become the decided next version of Apache Webserver. It could have died.
The same thing applies here, except instead of grabbing a few people and
starting to code, I sent out some email intending to find some interested
people.

And if this code base doesn't gain any critical mass, I will be the first to
say it must die. That's a community process that doesn't need to be
expressed any more succinctly than that -- people use what they want to use,
people develop what they want to develop on. If a lot show up, then hey, the
barn raising moves forward. If nobody shows up, then there's a few
foundations that rot in the sun.

Apologies if the mail sounded like a done deal and there was a full working
parser checked in. Go take a look -- we're ready for the design discussions.

.duncan

Re: [spinnaker] Announce

Posted by Arved Sandstrom <Ar...@chebucto.ns.ca>.

At 02:46 PM 7/8/00 -0700, James Duncan Davidson wrote:
>on 7/8/00 12:22 AM, Andy Clark at andyc@apache.org wrote:
>
>> Is it possible that, in the future, we hear about submissions
>> to the tree *before* everyone goes home on Friday? I want us
>> all to work together on the future of the Xerces parser instead
>> of being surprised by a new source tree over a weekend.
>
><talking as an asf member>
>Pppht. This is open source, Apache style. People work whenever they work and
>that's the way this all works. Most Apache developers don't work on the main
>sources during the typical M-F 8-4 (local time) window. They work when they
>get time, or the muse is with them, or whatever. There are no limits, it's a
>24/7 shop and to be blunt, conformance with a corporate schedule isn't part
>of the mandate.

I agree. I'm pleased that this was expressed so bluntly. I have my real 
work, and that's usually M-F 8-5. My real work has a component of "using" 
XML, but it doesn't involve anything like Apache XML, in the sense of 
defining how the XML is processed at the fundamental level. I suspect most 
of us in open-source land fall into this group. What's this idea that 
*anything* I do has to appear before close of business on Friday? Hell, I 
_start_ working on FOP on the weekend. Let's keep in mind that most 
contributors to open-source actually have other real jobs.

>Secondly, you may ask why we didn't talk about this before we did it. Well,
>I've never believed in starting out too small. If you start out too small,
>you kind of get lost. If you start out strong, well, you can get momentum
>built. Just talking without setting up the code tree would have been, well,
>pointless. As developers we know code, we speak code.
>
I disagree. It's not quite "spot-on" to blindside people with stuff like 
this. Over on FOP we have no problems in letting people know about upcoming 
CVS branches and other developments.

On a very related issue, as "developers" we don't spend most of our time on 
code. When we communicate, we communicate with design documents. UML, 
IEEE-compliant design descriptions, yada yada. I personally don't want to 
receive a new source tree as the first deliverable. It doesn't express the 
requirements or design decisions; it only implements them. Leaving other 
collaborators out of that loop is tantamount to saying "you only need to 
validate what we decided to do".

As good developers, we know process, we speak process. Then we know 
requirements capture. Then we know design. Somewhere about 50% of the way in 
we know code. If everything else was OK we already know exactly what we have 
to write so we don't even dick around with the code too much, and move right 
on to testing.

I'm not trying to take the piss out of anyone here, as the British say. It's 
just that I've had a heightened level of sensitivity to all of this ever 
since I saw that Sun produced an XSL processor. Why did they have to bother, 
I ask? Were we actually lacking for decent XSLT implementations? I think 
not. It's not like Sun is well-known for producing the best implementations 
on the block - JSWDK is fair, and J2EE is lousy. Mind you, jaxp is pretty 
good, but again, why jaxp when you've already got other better stuff? So 
when I see this I see a big corporate entity starting to operate by fiat. 
And it raises my hackles.

Speaking for myself, since I'm the release coordinator for FOP, if I saw 
that some group came in out of the blue and launched an entire new source 
tree I'd be pretty fired up. On any open-source project there is at least 
the principle of coordination. You can commit new source without 
consultation if it doesn't impact existing API's; new branches are a cut 
above that and require some feedback.

Just my thoughts.

Arved Sandstrom

Senior Developer
e-plicity.com (www.e-plicity.com)
Halifax, Nova Scotia
"B2B Wireless in Canada's Ocean Playground"

Re: [spinnaker] Announce

Posted by Rajiv Mordani <ra...@eng.sun.com>.

Andy Clark wrote:
> 
> James Duncan Davidson wrote:
> > After quite a bit of discussion, the rest of the XML team at Sun,
> > the people who are responsible for the parser that will ship in
> > the core of future JDKs, agree as well.
> 
> I would like to know who the "XML team at Sun" is. I've checked
> the previous commit messages and only saw the initial checkin
> of the Crimson DOM and some metric test files. Are the checkins
> from the xml-contrib module going to the CVS mailing list? I
> must be overlooking something. There doesn't seem to be any
> commits on the main branch of the source code, though.
> 
> > So, in the best of Apache traditions, were gonna do something
> > about it. I'm creating a tree in the xml-contrib area in which
> > to do a lot of code work to explore how such a new parser could
> > come to be. It's called Spinnaker.
> 
> Is it possible that, in the future, we hear about submissions
> to the tree *before* everyone goes home on Friday? I want us
> all to work together on the future of the Xerces parser instead
> of being surprised by a new source tree over a weekend.

1. It is NOT a submission to the "tree". It is in the xml-contrib area
and that doesn't mean anything. It isn't an official project. What will
happen of it, only time will tell.

2. I don't see the problem with doing it over the weekend. On a prior
occasion I was asked this question by Arnaud when I checked in the
whiteboard directory in Xerces, and when I asked people what was wrong with
checking things in on a weekend there was no answer from anyone on the
"xerces" team. Anyway, I don't think there is anything wrong with doing it
over a weekend. Expecting people to work on this only Monday to Friday,
9-5 or something of the sort, isn't what this is meant to be. (In fact, I
don't think that is the case these days even in MOST corporations, for that
matter.)

- Rajiv

> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org

--
:wq

Re: Let's move on!

Posted by Edwin Goei <Ed...@eng.sun.com>.

> Again, I think all this could have been avoided, easily. But forget it.

Maybe this is just part of open source development :-) as evidenced by
gnuemacs vs. XEmacs, gcc vs. egcs, or just Tomcat vs. Catalina.

> As I've said, or tried to: we've never intended to make this project
> ours, we are all for an open discussion, we are all for a rearchitecture
> of the parser, and we welcome anybody to join the discussion and help on
> the next version of Xerces.

Great, glad to hear it.

-Edwin

Let's move on!

Posted by Arnaud Le Hors <le...@us.ibm.com>.

Alright guys,

Several messages in my mailbox this morning make me feel sick again. But
to try and put an end to this, I'm not going to answer any of them. Just
don't assume I agree with anything that has been said.

Again, I think all this could have been avoided, easily. But forget it.

As I've said, or tried to: we've never intended to make this project
ours, we are all for an open discussion, we are all for a rearchitecture
of the parser, and we welcome anybody to join the discussion and help on
the next version of Xerces.

So, please, let's move on!
-- 
Arnaud  Le Hors - IBM Cupertino, XML Technology Group

Re: [spinnaker] Announce

Posted by Stefano Mazzocchi <st...@apache.org>.

twleung@sauria.com wrote:
> 
> Let me make a few comments on this whole episode.
> 
> 1) If any Xerces developer at Sun or IBM wants to hassle someone
> for wanting a revolution in the Xerces codebase, then they should start
> with me, not Duncan.  I was the one who suggested that all the people
> working on Xerces take a deep breath, eat some humble pie, and admit
> that there are problems with both Xerces and Crimson.  So lay off of
> Duncan.   The current Xerces code base is based on a design rule which
> says, "write a Java program in an idiomatic style that mimics what you would
> do in C".  As a corollary, "If anything gets in the way of going fast, crush it".
> I remember specific instances where things that were interfaces got turned into
> classes solely because of the performance impact of method invocation through
> an interface as opposed to a class.
> 
> 2) As a former member of the IBM team in Cupertino, I have to agree that general
> visibility into the development state of Xerces is non-existent.  Sure, I can pick
> up the phone and call Mike, Andy, Jeff or Arnaud and get the exact details on
> what's going on.  But I shouldn't have to.  It's 9 months after the initial contribution
> of IBM code into Xerces, and the majority of the development is not happening out
> in the open.
> 
> 3) For whatever we do in the future, I would like to see the requirements clearly
> laid out, so that we can view both the Xerces and Crimson codebases against those
> requirements.  The creation of that requirements document must be a public community
> activity.  Similarly for a design document.
> 
> 4) For the record, the code base that is now called Xerces underwent at least 2 major
> refactorings and rewrites before it was donated to Apache.  There is a reason that it
> is the way that it is.  The primary criteria were "compliant" (whatever that means) and
> "fast" (which meant faster than whoever else happened to have an XML parser).  We
> understood some other requirements.  Some of them got trampled by the above 2.
> But there were lots of other interesting requirements.  I think that now is a good time for
> some of those to be considered.  At one point, we conceived of a family of parsers,
> tuned for different scenarios, but making use of a common pool of code.
> 
> 5) I'll go on the record as being in favor of controlled revolution.  For me one of the
> attractions of open source development is the possibility that we engineers might actually
> get to build something that we're proud of, having been released from the corporate mandates
> of schedule and feature creep.  If you look at what's happening in the Linux kernel, they are
> periodically having a revolution -- just watch the linux-kernel list and see how many people
> are screaming about 2 different VM implementations, or "last minute" upheavals in the device
> driver architecture.   All production code that I've ever worked on has eventually turned into
> a steaming pile of **** because it was impossible to throw it away.  I'd like to see a place
> where interested members of the community can fool around with ideas without being subjected
> to the pressure of "This is Xerces 2.0", or "this is the main trunk so don't break the build", or any
> other pressures.
> 
> 6)  I'm actually more interested in services API's around the parser than the parser itself.  That
> includes stuff like JDOM, or databinding, or other parser APIs.  I'm profoundly unhappy with
> the W3C DOM (sorry, Arnaud), and if it was up to me, we'd just heave the entire mess in the
> garbage and leave it there.   It takes more work than it should to build an XML producing or
> consuming application.
> 
> 7)  I suppose now I'll be chastised for posting this at 3:45AM when all sane people are sleeping, except
> for the ones reinstalling their operating systems.  Um.  Really, folks, this is getting to be an ugly place
> to be around.  If we can't make this a civil community and learn to work with each other, there isn't
> going to be a next-gen parser, and the IBM folks in Cupertino are going to be the only ones who
> work on Xerces.

Amen.

Ted is the man.

> P.S.  Duncan, the next time you want to have people come violently into agreement, could you just
> send an H-bomb, instead of e-mail?   Might be less mess.

Like I said, shit happens.

But I'm sure James learned the lesson very well ;-)

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------

Re: [spinnaker] Announce

Posted by tw...@sauria.com.

Let me make a few comments on this whole episode.

1) If any Xerces developer at Sun or IBM wants to hassle someone
for wanting a revolution in the Xerces codebase, then they should start
with me, not Duncan. I was the one who suggested that all the people
working on Xerces take a deep breath, eat some humble pie, and admit
that there are problems with both Xerces and Crimson. So lay off of
Duncan. The current Xerces code base is based on a design rule which
says, "write a Java program in an idiomatic style that mimics what you would
do in C". As a corollary, "If anything gets in the way of going fast, crush it".
I remember specific instances where things that were interfaces got turned into
classes solely because of the performance impact of method invocation through
an interface as opposed to a class.

2) As a former member of the IBM team in Cupertino, I have to agree that general
visibility into the development state of Xerces is non-existent. Sure, I can pick
up the phone and call Mike, Andy, Jeff or Arnaud and get the exact details on
what's going on. But I shouldn't have to. It's 9 months after the initial contribution
of IBM code into Xerces, and the majority of the development is not happening out
in the open.

3) For whatever we do in the future, I would like to see the requirements clearly
laid out, so that we can view both the Xerces and Crimson codebases against those
requirements. The creation of that requirements document must be a public community
activity. Similarly for a design document.

4) For the record, the code base that is now called Xerces underwent at least 2 major
refactorings and rewrites before it was donated to Apache. There is a reason that it
is the way that it is. The primary criteria were "compliant" (whatever that means) and
"fast" (which meant faster than whoever else happened to have an XML parser). We
understood some other requirements. Some of them got trampled by the above 2.
But there were lots of other interesting requirements. I think that now is a good time for
some of those to be considered. At one point, we conceived of a family of parsers,
tuned for different scenarios, but making use of a common pool of code.

5) I'll go on the record as being in favor of controlled revolution. For me one of the
attractions of open source development is the possibility that we engineers might actually
get to build something that we're proud of, having been released from the corporate mandates
of schedule and feature creep. If you look at what's happening in the Linux kernel, they are
periodically having a revolution -- just watch the linux-kernel list and see how many people
are screaming about 2 different VM implementations, or "last minute" upheavals in the device
driver architecture. All production code that I've ever worked on has eventually turned into
a steaming pile of **** because it was impossible to throw it away. I'd like to see a place
where interested members of the community can fool around with ideas without being subjected
to the pressure of "This is Xerces 2.0", or "this is the main trunk so don't break the build", or any
other pressures.

6) I'm actually more interested in services API's around the parser than the parser itself. That
includes stuff like JDOM, or databinding, or other parser APIs. I'm profoundly unhappy with
the W3C DOM (sorry, Arnaud), and if it was up to me, we'd just heave the entire mess in the
garbage and leave it there. It takes more work than it should to build an XML producing or
consuming application.

7) I suppose now I'll be chastised for posting this at 3:45AM when all sane people are sleeping, except
for the ones reinstalling their operating systems. Um. Really, folks, this is getting to be an ugly place
to be around. If we can't make this a civil community and learn to work with each other, there isn't
going to be a next-gen parser, and the IBM folks in Cupertino are going to be the only ones who
work on Xerces.

Ted

P.S. Duncan, the next time you want to have people come violently into agreement, could you just
send an H-bomb, instead of e-mail? Might be less mess.

Re: [spinnaker] Announce

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/8/00 12:22 AM, Andy Clark at andyc@apache.org wrote:

> I would like to know who the "XML team at Sun" is.

The people visible to the outside are:

Myself
Ed Goei
Rajiv Mordani
Costin Manolache

Then there are people behind the scenes that do QA, Docs, and all that other
stuff. The manager of the team is Jim Driscoll, whose boss is Connie Weiss,
whose boss is Jeff Jackson, whose boss is, well, that's to the director level
at least. :)

> I've checked the previous commit messages and only saw the initial checkin of
> the Crimson DOM and some metric test files.

<talking as a member of the Sun team>
Yep. We've been spending the last 6 months trying to figure out how the hell
to move forward with this codebase in a way that satisfies our constraints.
We played around with lots of different ideas that we could dive in and
start developing with. You don't see any checkins because we didn't have any
substantial things that could pan out for our uses. But Ed, Rajiv, and
Costin know both the Crimson and Xerces source bases and could give you
details of the positives and negatives of each.

It wasn't a small step in committing to looking far into the future instead
of the here and now, but you have to understand what we're looking for in a
parser. It *has* to run well on JDK 1.3/1.4. It has to have a reasonably
small memory runtime as any parser we put into the JDK will end up in TV
sets eventually. And our team has to feel comfortable with maintaining the
code. They have to understand what things do, or be able to figure it out immediately so
that issues can be resolved.

> Are the checkins from the xml-contrib module going to the CVS mailing list? I
> must be overlooking something.

Yes, the xml-contrib-cvs mailing list. It was set up as a different mailing
list by Dirk when he set up the contrib space to put Crimson into a while
ago. It's unfortunate that it goes off into a different mailing list.

> There doesn't seem to be any commits on the main branch of the source code,
> though.

<talking as myself>
There aren't. I put this over in xml-contrib to highlight the fact that this
is experimental. An incubator. A future piece of code that might not live to
see the light of day. There's a paper that I wrote called "Rules for
Revolutionaries" back when we were sorting out the painful process of how to
let new generations of codebases come along when they needed to rather than
just iterative generations. Turns out that all we had to do was look no
further than the httpd project which set out in a similar fashion a while
ago for the 2.0 server, and only now is starting to release betas of what
will be the next Apache web server. Lots of twists and turns. Lots of
experiments that shouldn't have happened in the main source tree (not even
on a branch -- branches will limit the amount of code refactoring that you
are willing to do).

> Is it possible that, in the future, we hear about submissions
> to the tree *before* everyone goes home on Friday? I want us
> all to work together on the future of the Xerces parser instead
> of being surprised by a new source tree over a weekend.

<talking as an asf member>
Pppht. This is open source, Apache style. People work whenever they work and
that's the way this all works. Most Apache developers don't work on the main
sources during the typical M-F 8-4 (local time) window. They work when they
get time, or the muse is with them, or whatever. There are no limits, it's a
24/7 shop and to be blunt, conformance with a corporate schedule isn't part
of the mandate.

Secondly, you may ask why we didn't talk about this before we did it. Well,
I've never believed in starting out too small. If you start out too small,
you kind of get lost. If you start out strong, well, you can get momentum
built. Just talking without setting up the code tree would have been, well,
pointless. As developers we know code, we speak code.

Third, if you take a look, it's hardly a new source tree. There are very few
utility items that I've moved into place. It'll build over time. Many
months. It's not going to happen instantaneously.

.duncan

Re: [spinnaker] Announce

Posted by Andy Clark <an...@apache.org>.

James Duncan Davidson wrote:
> After quite a bit of discussion, the rest of the XML team at Sun, 
> the people who are responsible for the parser that will ship in 
> the core of future JDKs, agree as well.

I would like to know who the "XML team at Sun" is. I've checked
the previous commit messages and only saw the initial checkin
of the Crimson DOM and some metric test files. Are the checkins
from the xml-contrib module going to the CVS mailing list? I
must be overlooking something. There doesn't seem to be any
commits on the main branch of the source code, though.

> So, in the best of Apache traditions, were gonna do something 
> about it. I'm creating a tree in the xml-contrib area in which 
> to do a lot of code work to explore how such a new parser could 
> come to be. It's called Spinnaker.

Is it possible that, in the future, we hear about submissions
to the tree *before* everyone goes home on Friday? I want us
all to work together on the future of the Xerces parser instead
of being surprised by a new source tree over a weekend.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: [spinnaker] Announce

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/9/00 8:36 AM, Stefano Mazzocchi at stefano@apache.org wrote:

> Duncan placed "the XML team at Sun" and all good intentions were sunk
> by corporate shit. I was honestly surprised to see such a baby mistake
> from one that taught me so much about human resource management and
> diplomatic skills, but shit happens.

Yeah, looking at it, it's so freaking obvious that one misplaced sentence
can so screw something up. Bah. I do hope I've made it quite clear that this
wasn't something that Sun dreamed up, it was something that I dreamed up.
The Sun folks weren't even sure that it was a good idea when I started
talking about it on Wednesday (it was something that crystalized while
driving along the Oregon coast over the 4th of July weekend). :)

> This is called "internal forking" and Duncan wrote an excellent paper
> about this called "Rules for Revolutionaries". (James, where is it?)

http://www.x180.com/rules -- now. :)

> Will Spinnaker be smaller, faster and more modular? Great, we'll use it.
> Will it sink? Great, we won't use it.

And you know what... I'm happy for the world to tell me in a while whether
or not we did the right thing. But let me have my chance.

> I've questioned myself many times: would have been possible to do JServ
> 1.0 without any revolution? I still don't believe so.

One thing I've noticed about the Apache process over the years, and of Open
Development in general is that there are critical times when talk is just
that: talk. At times, individuals just have to do -- and then see what the
outcome is. It gives focus.

.duncan

Re: [spinnaker] Announce

Posted by Stefano Mazzocchi <st...@apache.org>.

All right, 

Duncan placed "the XML team at Sun" and all good intentions were sunk
by corporate shit. I was honestly surprised to see such a baby mistake
from one that taught me so much about human resource management and
diplomatic skills, but shit happens.

Is Spinnaker a "coup d'état"?

This is the question, we'll answer this at the end.

I never worked in the Xerces team, nor did I make any code contribution
(well, besides a very early Ant build file), nor do I know the internals of
the code well, nor do I participate in any of the Xerces mailing lists.

So, since I don't consider myself entitled to speak for Xerces, I'll
make up an equivalent situation where I'd be entitled to speak.

Suppose that Cocoon2 is out, Stylebook is deprecated, but some people
don't like it. [This is a possibility I have to take into serious
consideration.]

So, since Cocoon2 is much more advanced than stylebook but not good
enough for their needs (whatever they are, it doesn't matter), they
decide to propose a new project that clones Cocoon2 functionality but
makes it more similar to what they are used to.

This is the exact equivalent of what is happening with Xerces/Spinnaker,
as well as Tomcat/Catalina, and what happened with JServ0.9/JServ1.0 and
Apache1.3/Apache2.0.

This is called "internal forking" and Duncan wrote an excellent paper
about this called "Rules for Revolutionaries". (James, where is it?)

But let's continue with the Cocoon2 story: will I be happy for this
"internal forking"?

Honestly, no, I wouldn't be: some guy believes some of the decisions we
have taken are bad and they can do better. Instead of helping directly,
they feel it's better to fork or start off from scratch.

But as my ego dissipates, I start thinking:

 1) they may have good points
 2) they will have less visibility (internal forks are not as visible as
main projects)
 3) they may come up with things I could reuse for the main project
 4) we may change their minds and incorporate the efforts later on
 5) or we may be blinded by our own ideas and die out

The question is: what is the most durable thing, the code or the
process?

The answer is obvious, especially in IT where things move so fast.

I normally don't like revolutions but I did one: JServ 1.0 (award
winning!) came out of Pier's and my intention to fork internally.
JServ0.9 was crap, we could do better, the history told us we did.

Do all internal forks terminate in a project take-over? No, almost
never. Catalina will become Tomcat 4.0, Cocoon2 (which is an internal
fork; in fact, I haven't written a single line of code yet on that
codebase) will be Cocoon-next, Apache2 will be Apache-next, and so on.

Will "codename Spinnaker" be Xerces-next? I don't know, nor care, to be
honest. The important part is that if somebody is not happy with what a
project is doing, the process gives them the right to fork internally,
and the ASF agrees to donate resources to continue their quest.

If they _happen_ to work for one company or another, or if this is their
day job or night hobby, we (the Apache community) don't give a shit: the
only thing that counts is the outcome, and that is what we'll judge.

Will Spinnaker be smaller, faster and more modular? Great, we'll use it.

Will it sink? Great, we won't use it.

Egos are a big part of open source development and most of the time
revolutions create lots of friction... but when they happen they
_rarely_ do harm in the long term, especially under Apache, where several
internal forks happened but no external forks.

Internal forks happen when the development community is neither responsive
enough nor willing to accept diversity.

When I proposed the JServ 1.0 internal fork, Ed Korthof (ed@apache.org),
at that time one of the JServ 0.9 project coordinators, judged the whole thing
with this sentence: "Enjoy your cathedral."

After JServ 1.0 became a successful project and fostered an even
more successful community, Ed publicly apologized for not having
understood my intentions.

I've asked myself many times: would it have been possible to do JServ
1.0 without a revolution? I still don't believe so.

The problem is never in the code, but in the developer community.
Sometimes dev communities become closed and selfish, and oppose
diversity. (Note: this has nothing to do with corporations or day
jobs; it just happens.)

An internal fork is a way to get around the obstacle: a way to
"challenge" the power of the dev community, and a way to emerge and influence
the project.

If a group of individuals is capable of bootstrapping an internal fork
and reaching the point where the community appreciates it more than the
original project, let it be.

Viva la diversitad!

Arnaud Le Hors wrote:
> 
> James Duncan Davidson wrote:
> >
> >     * Crimson isn't so optimized, yet it runs about as fast as Xerces
> >       does on modern VMs such as HotSpot. The HotSpot team told us
> >       that heavily optimized code for 1.1 would not benefit under
> >       HotSpot. We have the proof now. In fact, there's cases where
> >       it seems that Xerces slows down.
> 
> So far the only proof I've got is that Hotspot miserably fails on
> Xerces. This means to me that Hotspot has a problem, not xerces.

Bullshit. 

I used to optimize x86 assembly code by hand for the Pentium I's dual
pipeline; it worked great on the Pentium I but failed miserably on
Pentium II machines compared to what the C compiler produced
automatically.

If you use JVM-specific tricks to optimize your code and they stop working
when the JVM evolves, you are the one who is mistaken, not those who build a JVM
designed to optimize "normal code".

> >     * However, because Xerces was heavily pre-optimized, it's
> >       extremely complex to understand and delve into. I think
> >       that this is best reflected in that most of the bits that
> >       go into Xerces come from IBM Cupertino.
> 
> Not so. What you're referring to as "IBM Cupertino" is hardly a fixed set
> of people. We've actually had a lot of turnover and we keep getting new
> people involved in this project all the time. This hasn't prevented any
> of them from contributing significantly. The only reason most bits come from
> IBM is that nobody else has committed as many resources to this project.

I don't give a shit about who is paying whom to do anything as long as
what is being done is good for me. I'm happy with Xerces and I use it.
But if somebody is not, they have every right in the world to do
something about it and, if the community is not open enough to listen, to
prove their points by creating new code and letting the community decide.

This is _NOT_ an IBM vs. Sun thing, and I suggest everyone on this list
ignore any posts that go in that direction.

> >     * In our analysis of the Xerces code base, we can't use it for
> >       future inclusion in the JDK. The pre-optimization is a killer.
> >       The code-complexity is a killer. And the memory consumption is
> >       a problem.
> 
> There definitely are choices that have been made that could be
> revisited. But you make it sound like we never took into account memory
> consumption. It is hardly the case. As you know there is always a
> trade-off between memory consumption and performance. You may have
> different requirements here, but they'd have to be laid out and agreed
> upon.

Sometimes it's easier to write different code than to agree upon every
single bit. Especially when everything becomes a "Sun vs. IBM" war.

> > So, in the best of Apache traditions, we're gonna do something about it. I'm
> > creating a tree in the xml-contrib area in which to do a lot of code work to
> > explore how such a new parser could come to be. It's called Spinnaker.
> 
> Is it really in the Apache traditions to start new things like that over
> a week-end without having any discussion beforehand? Looking at Sun's
> record I guess I can see a trend for sure...

Get real. If this is your day job, that's good for you, but don't assume
everybody has the same status as you, or lives in the same part of the
planet.

Apache is a "worldwide community of software-developing volunteers".

If you don't feel you fit in this category, this is your problem, not
ours.

> > So, to close a few thoughts...
> >
> > Q. Isn't this a slam on the Xerces guys?
> 
> I say yes. Looks like a "coup d'etat" to me.

A "coup d'etat" would be if the ASF ruled the XML parser project out to
start off a new project. This is not so, nor will it _ever_ be.

Spinnaker is a way to make points with code and bootstrap a community
around a fresh codebase with the intention of improving what already
exists. A clean-room implementation that hopes to bring ideas and
experience from all parties, in the spirit of Apache. The kind of 2.0 version
that happens naturally within projects.

James took a few points for granted in his announcement:

1) Spinnaker is a "codename", not a project name. Spinnaker, if approved
by the community, will become Xerces 2.0.
2) The community rules: if users like Xerces more, Spinnaker will die.
3) Internal forks don't get "full project" visibility. They have to gain
their visibility with code and community interaction on the mailing lists.

On the other hand, while I don't have anything to say on the technical side of
things, I judge this move very _healthy_ for the overall stability of
the Apache XML parsing community of developers.

Of course, this is a medium/long-term vision and James is perfectly
aware of this. It might win or go nowhere, but if he has enough energy
to start this and back it up, we can only respect that, sit back, and
watch to see what they have to say.

The important thing is the quality of the process: quality in code then
happens naturally.

The Spinnaker effort will show whether Xerces is stopping innovation or
not. Unfortunately, this is easier to do through revolution than through
evolution, so some competitive friction will develop, but as long
as this is kept respectful and any corporate prejudice is left out of the
picture, it'll be a good thing.

This is why I suggest you all ignore any "this vs. that" corporate
bullshit.

[Note: I'm talking as a volunteer individual with no affiliation to
anything other than myself]

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------

Re: [spinnaker] Announce

Posted by Rajiv Mordani <ra...@eng.sun.com>.

Arnaud Le Hors wrote:
> 
> James Duncan Davidson wrote:
> >
> >     * Crimson isn't so optimized, yet it runs about as fast as Xerces
> >       does on modern VMs such as HotSpot. The HotSpot team told us
> >       that heavily optimized code for 1.1 would not benefit under
> >       HotSpot. We have the proof now. In fact, there's cases where
> >       it seems that Xerces slows down.
> 
> So far the only proof I've got is that Hotspot miserably fails on
> Xerces. This means to me that Hotspot has a problem, not xerces.

That isn't true. There is also a problem with running it on Solaris. The
results are different on Windows and on Solaris. Pier has seen this
himself. It isn't just HotSpot but also the OS and the environment that
it is run in. In fact, Arnaud, we also discussed this over lunch, if
you remember (James, Ed, Mike Pogue, you, and I).

> 
> >     * However, because Xerces was heavily pre-optimized, it's
> >       extremely complex to understand and delve into. I think
> >       that this is best reflected in that most of the bits that
> >       go into Xerces come from IBM Cupertino.
> 
> Not so. What you're referring to as "IBM Cupertino" is hardly a fixed set
> of people. We've actually had a lot of turnover and we keep getting new
> people involved in this project all the time. This hasn't prevented any
> of them from contributing significantly. The only reason most bits come from
> IBM is that nobody else has committed as many resources to this project.

That isn't the only point. The other thing is that since most of you
work in the same office, a lot of things happen in there
that should actually be done on the mailing lists. I don't remember
seeing a mail going out proposing the implementation of schemas, for example.
The only info that was sent out was that the repository would be a little
unstable for the next few days as schema support was being added to
Xerces, so please use the tagged version of the workspace.

[SNIP]

- Rajiv
--
:wq

Re: [spinnaker] Announce

Posted by Kevin Regan <ke...@valicert.com>.

On Sat, 8 Jul 2000, James Duncan Davidson wrote:

> on 7/8/00 10:56 AM, Arnaud Le Hors at lehors@us.ibm.com wrote:
> 
> > So far the only proof I've got is that Hotspot miserably fails on
> > Xerces. This means to me that Hotspot has a problem, not xerces.
> 
> Many many many programs work better on Hotspot than not. And the Hotspot
> team has been saying for *years* now (at least 3, more than half the
> lifetime of Java) what kind of optimizations would crap on them. When I was
> on the Java Web Server team, we hit almost every one. Cleaning up the code
> reversed this trend and performance picked up. Sure, people are going to
> get bit by this... Xerces wasn't the first, nor will it be the last.
> 

I agree with this wholeheartedly. The HotSpot folks have been
keeping us well informed about which optimizations we should not
include in our code because they would be taking care of it
for us (and, indeed, it would actually hurt more than help).
I, for one, am very pleased with what HotSpot has matured into...

Sincerely,
Kevin Regan

Re: [spinnaker] Announce

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/8/00 10:56 AM, Arnaud Le Hors at lehors@us.ibm.com wrote:

> So far the only proof I've got is that Hotspot miserably fails on
> Xerces. This means to me that Hotspot has a problem, not xerces.

Many many many programs work better on Hotspot than not. And the Hotspot
team has been saying for *years* now (at least 3, more than half the
lifetime of Java) what kind of optimizations would crap on them. When I was
on the Java Web Server team, we hit almost every one. Cleaning up the code
reversed this trend and performance picked up. Sure, people are going to
get bit by this... Xerces wasn't the first, nor will it be the last.

> Not so. What you're referring to as "IBM Cupertino" is hardly a fixed set
> of people. We've actually had a lot of turnover and we keep getting new
> people involved in this project all the time. This hasn't prevented any
> of them from contributing significantly. The only reason most bits come from
> IBM is that nobody else has committed as many resources to this project.

It's a center for development where everyone is getting paid to work on
the same codebase, meeting in meeting rooms, and communicating with and
helping each other out in person. As efficient as this is for product work,
it's not conducive to building a truly sustainable Apache project.

> There definitely are choices that have been made that could be
> revisited. But you make it sound like we never took into account memory
> consumption. It is hardly the case. As you know there is always a
> trade-off between memory consumption and performance. You may have
> different requirements here, but they'd have to be laid out and agreed
> upon.

Ok, so I'm the bad guy because I didn't enunciate every possible motive. I
took a look at a relevant set. All of us who work on software know that
there are trade-offs. But when writing code for cutting-edge VMs, the trade-offs
are different, and there are different advantages that can be put into play
that change the balance of the trade-off. I want to take a look at what can
happen when that balance shifts.

>> So, in the best of Apache traditions, we're gonna do something about it. I'm
>> creating a tree in the xml-contrib area in which to do a lot of code work to
>> explore how such a new parser could come to be. It's called Spinnaker.
> 
> Is it really in the Apache traditions to start new things like that over
> a week-end without having any discussion beforehand? Looking at Sun's
> record I guess I can see a trend for sure...

Bite me. This isn't about Sun vs. IBM. I'm here as an ASF member, developer,
and founder of a few other Apache codebases. I happen to work at Sun, yes.
I've brought out quite a bit of Sun code to Apache. But I am an ASF member
before being a Sun employee. I'm very unhappy that you've tried to play a
corporate piss match card here.

ASF developers check in code and send email around the clock from all over
the world. That's why we do this asynchronous email communication thing. And
sometimes it works better than other times. And I have to point out that the
first two comments here are from primary developers on a Saturday morning.

Yes, it's Apache tradition that when an itch wants to be scratched, it gets
scratched. Yes, entirely new source trees have been started up over less.
Software diversity is a good thing. To assume that we already have the
perfect piece of software is arrogant, and sometimes the best way to find
out is to experiment outside the confines of the current tree.

> These two requirements are in direct conflict.

I don't think so. But we'll find out. And we very well might hit a different
balance of things, or a better cleaner way to code it that will make a
difference.

>> So, to close a few thoughts...
>> 
>> Q. Isn't this a slam on the Xerces guys?
> 
> I say yes. Looks like a "coup d'etat" to me.

<sigh> If that's the way you're going to take it, then I'm sorry for that.

Like I said, this is an experimental code base. It doesn't mean squat yet. It
may never mean squat. If you don't like it, don't participate. It's as
simple as that.

.duncan

Re: [spinnaker] Announce

Posted by Arnaud Le Hors <le...@us.ibm.com>.

James Duncan Davidson wrote:
> 
>     * Crimson isn't so optimized, yet it runs about as fast as Xerces
>       does on modern VMs such as HotSpot. The HotSpot team told us
>       that heavily optimized code for 1.1 would not benefit under
>       HotSpot. We have the proof now. In fact, there's cases where
>       it seems that Xerces slows down.

So far the only proof I've got is that Hotspot miserably fails on
Xerces. This means to me that Hotspot has a problem, not xerces.

>     * However, because Xerces was heavily pre-optimized, it's
>       extremely complex to understand and delve into. I think
>       that this is best reflected in that most of the bits that
>       go into Xerces come from IBM Cupertino.

Not so. What you're referring to as "IBM Cupertino" is hardly a fixed set
of people. We've actually had a lot of turnover and we keep getting new
people involved in this project all the time. This hasn't prevented any
of them from contributing significantly. The only reason most bits come from
IBM is that nobody else has committed as many resources to this project.

>     * In our analysis of the Xerces code base, we can't use it for
>       future inclusion in the JDK. The pre-optimization is a killer.
>       The code-complexity is a killer. And the memory consumption is
>       a problem.

There definitely are choices that have been made that could be
revisited. But you make it sound like we never took into account memory
consumption. It is hardly the case. As you know there is always a
trade-off between memory consumption and performance. You may have
different requirements here, but they'd have to be laid out and agreed
upon.

> So, in the best of Apache traditions, we're gonna do something about it. I'm
> creating a tree in the xml-contrib area in which to do a lot of code work to
> explore how such a new parser could come to be. It's called Spinnaker.

Is it really in the Apache traditions to start new things like that over
a week-end without having any discussion beforehand? Looking at Sun's
record I guess I can see a trend for sure...

>     * Smallest possible size. This means small distribution size (JAR file)
>       and small memory footprint.

These two requirements are in direct conflict. Interestingly enough, the
DOM implementation used to be designed to produce the smallest byte code
possible. The complete reorg I have made to it (just a few months after
I got involved in this project btw) had a very different goal: making a
DOM instance smaller in memory. This led me to create many new classes
and sometimes duplicate some code.

> So, to close a few thoughts...
> 
> Q. Isn't this a slam on the Xerces guys?

I say yes. Looks like a "coup d'etat" to me.
-- 
Arnaud  Le Hors - IBM Cupertino, XML Technology Group

Re: [spinnaker] Announce

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/8/00 8:19 PM, burtonator at burton@relativity.yi.org wrote:

> Creating a new XML parser...
> 
> That is soooo 1999! :)

Yeah, well, in 1999 we were still really figuring out what this XML thing
meant. We're still far into the hype zone; it'll be another year before we
get too much out of that.

1.0 is always a learning experience. :)

.duncan

Re: [spinnaker] Announce

Posted by burtonator <bu...@relativity.yi.org>.

James Duncan Davidson wrote:
<snip>
> explore how such a new parser could come to be. It's called Spinnaker.

Creating a new XML parser...

That is soooo 1999! :)

Kevin

-- 
Kevin A Burton (e-mail: burton@apache.org, UIN: 73488596, ZKey: burtonator)
http://relativity.yi.org
Message to SUN:  "Please Open Source Java!"
To fight and conquer in all your battles is not supreme excellence;
supreme excellence consists in breaking the enemy's resistance without fighting.
    - Sun Tzu, 300 B.C.

Re: [spinnaker] Announce

Posted by Andy Clark <an...@apache.org>.

James Duncan Davidson wrote:
> After quite a bit of discussion, the rest of the XML team at Sun, 
> the people who are responsible for the parser that will ship in 
> the core of future JDKs, agree as well.

I would like to know who the "XML team at Sun" is. I've checked
the previous commit messages and only saw the initial checkin
of the Crimson DOM and some metric test files. Are the checkins
from the xml-contrib module going to the CVS mailing list? I
must be overlooking something. There don't seem to be any
commits on the main branch of the source code, though.

> So, in the best of Apache traditions, were gonna do something 
> about it. I'm creating a tree in the xml-contrib area in which 
> to do a lot of code work to explore how such a new parser could 
> come to be. It's called Spinnaker.

Is it possible that, in the future, we hear about submissions
to the tree *before* everyone goes home on Friday? I want us
all to work together on the future of the Xerces parser instead
of being surprised by a new source tree over a weekend.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: [spinnaker] Announce

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/7/00 10:31 PM, James Duncan Davidson at james.davidson@eng.sun.com
wrote:

> I've come to the conclusion that I agree with him. After quite a bit of
> discussion, the rest of the XML team at Sun,

Let me say that my description of the XML team at Sun isn't meant to imply that
this is a Sun venture... Reading back through it, I can see how this might have
colored the view of some people who were p.o.'d about seeing this. Starting up
the new code tree was *MY* idea. The people at Sun didn't know it was going to
happen until 2 or so days before I sprang it on the world.

I would have started this no matter where I worked. I think that the need is
that great. If you want to blame Sun for this and make it into a pissing
match, then fine, go ahead and feel that way, but it's really a "Duncan" idea
-- and if there is anybody to blame for firing it up, it's me and me alone.

I don't speak for my employer, my employer doesn't speak for me, and what I
do only coincidentally, and sometimes accidentally, happens to be in their
best interest. The only time I speak for my company is when I call it
out specifically. Otherwise, it's just me.

.duncan