Posted to xml-commons-dev@xerces.apache.org by ne...@ca.ibm.com on 2002/08/01 23:15:23 UTC

Re: [discuss] Versioning strategy for standards files: SAX/DOM/JAXP

Hi all,

Hadn't realized the reply-to field wasn't set on this list...
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com


----- Forwarded by Neil Graham/Toronto/IBM on 08/01/2002 05:14 PM -----
From:    Neil Graham/Toronto/IBM@IBMCA
Date:    08/01/2002 05:11 PM
To:      Shane Curcuru/Cambridge/IBM@Lotus
Subject: Re: [discuss] Versioning strategy for standards files: SAX/DOM/JAXP



Hi all,

I've been meaning to follow up on this excellent post from Shane for quite
a while now, but I got lost in the morass that is my in-box.  :-)

I've never posted to this forum before, so for those who don't know, I'm
one of the xerces-j committers.  I guess that makes me a prospective
customer--since, of course, Xerces-J does not yet make use of xml-commons.

In this post, I want to do a few things.  Firstly, I need to give some
background on the reasons why Xerces-J has not adopted xml-commons.  This
will also mean going into a bit of detail about some Xerces
requirements--requirements that might well resonate with other base
technologies like Xalan.  Then, I'll respond to some particular points in
Shane's note.  Finally, I'll try and share some ideas about how to move
forward.

Before going further, though, I should point out that I'm not any kind of
Xerces-J spokesperson; I'm just giving my perspective on where Xerces has
been and where it's at now.  I'm interested in this discussion because I
think it's overwhelmingly important for the users of low-level projects
like Xalan and Xerces (and SOAP and Axis too, I guess) that products are
shipped supporting API's as close to identical as possible.  The API side
of xml-commons interests me only in so far as it is a useful container for
identifying and grouping "standard" versions of common API's.

So far, it seems to me that Xerces has chosen to go its own way largely so
that we precisely control the API's that we implement.  We'd like to decide
when it's time, for example, for us to support DOM level 3 as a standard
part of our distribution; we'd like to deal directly with our customers to
assess when that point is reached, rather than having to work through the
intermediary of some other project with different dependencies.  Also,
there are times when it's proved necessary for us to deviate slightly from
"standard" API's:  For instance, at the moment we're signature-compatible
with SAX 2.0 but we've adopted many of the bugfixes that went into SAX
2.0.1--and one or two that didn't, IIRC.

Packaging has always been an issue for us as well.  We've always wanted to
avoid shipping API's--like the transform half of JAXP or DOM subpackages
such as views--that we don't implement, because we feel that this could be
a real source of confusion for users.  It's by no means obvious to a newbie
where an XML parser stops and an XSLT processor begins, for example, and
we've always felt it to be best if we ship precisely what we support.

It's also true that not everyone in Xerces-J land has completely accepted
the principle behind xml-commons.  There are certainly products that both
implement a certain API and rely on other implementations as well--I
believe Xalan has its own DOM implementation, but can also happily use a
Xerces DOM, for instance.  But for most products that *really* care about
API versions, implementation versions will matter quite as much.  So the
API might as well come from the product implementing it.  Indeed, now we
need to have discussions about versions of API's in versions of
xml-commons, which seems somehow ironic.

But xml-commons is used by a lot of folks, and clearly Xerces-J needs to
figure out how to deal with it.  And besides, it is unquestionably a good
forum for cross-project collaboration.  And since many customers use
several Apache projects, cross-project collaboration with respect to basic
things like API's just has to absorb more focus than it has heretofore.  So
let me respond to a few things Shane said:

>like work.  proposal/item[@num="2"] is designed to meet this need.  With
>just a little bit of effort and documentation, we can easily produce
>xml-commons distributions that provide this.  Once we better understand how
>the TCK's work, we could even make selected fixes and updates to various
>implementations of standards files (some SAX fixes come to mind) that will
>still pass the TCKs (which partly only test method signatures).

This is certainly what we've had to do in Xerces.  So I'd agree with
this--any TCK-compliant "branch" would probably have a few fixes in it.
Edwin's recent contribution to JAXP's FactoryFinders comes to mind:  This
is a beneficial fix that improves robustness and won't break
TCK-compatibility; it's something we'd very likely want to see in a
TCK-compliant API.  Migrating this fix to the SAX code, and communicating
its usefulness to the SAX maintainers hopefully for inclusion in subsequent
SAX releases would definitely be a winning strategy.

><issue num="1">Which 'branch' would be which?  Shane proposes to make a
>'TCK' branch that would meet proposal/item[@num="2"] now, and perhaps would
>be updated later to meet JAXP 1.2, etc.  Then the main trunk would be for
>proposal/item[@num="3"].  Why?  Because the trunk already has SAX 2.0.1
>checked in, and because for me the trunk should be the latest-and-greatest.
>But several others have argued the reverse, so we need to get
>consensus.</issue>

Guess I'd be one of those others.  :-)  If we take Gary's suggestion--which
is very interesting--then the question becomes mainly moot.  But my problem
with putting the TCK-compliant code on a branch is that it's not unlikely
we'll need to branch it at some point:  Say we have a JAXP 1.1-compliant
set of API's and subsequently version that to a JAXP 1.3-compliant set.  If
a fix needs to be applied to the JAXP 1.1 code, if we're on a branch we've
got difficulties; if we're on the main trunk, we can always branch.  But
back levels of the "latest and greatest" should never need branching,
because it's the "latest and greatest" and it should always be fine to
apply fixes to the head.  It's also much more likely there will be hard
(legal) dependencies on TCK-compliant code than on the "latest and greatest",
I'd think, so users of the "latest and greatest" should almost always be
able simply to grab the head.

><issue num="2">Better understanding and procedures for which TCK's we need
>to support, and exactly what they test.  I think that a number of the IBM
>committers on Xerces and Xalan may be able to help cover this issue, since
>some of us already have experience in this area (sadly, Shane is not one of
>them yet...)</issue>

Not at all sure I'd be sad about that if I were you!  :-)  But I'm
certainly willing to help here.

><issue num="3">Actual packaging of standards files.

I think there are concerns about how the current "monolithic" xml-apis (and
indeed xmlParserAPIs) interacts with the Java endorsed standards override
mechanism.  So this issue deserves further discussion both in xerces and
xml-commons; but it also deserves a separate thread...

><issue num="5">Coordination with various dependent Apache projects,
>especially for ones with GUMP runs and/or who would want to provide builds
>from both 'branches' of xml-commons (one backlevel TCK one, the other
>latest-and-greatest one).</issue>

I have to admit it isn't clear to me how many products will manage to
support *both* a latest-and-greatest build *and* a TCK-compliant build.  I
guess Xerces itself is moving somewhat in this direction, so perhaps this
is more of a general problem than might appear at first sight...

In terms of moving forward, I'd propose a few steps:

1.  Send an e-mail to general@xml and general@jakarta advising of this
discussion and asking interested people--especially from projects that
contemplate depending on xml-commons (or Xerces for that matter)--to
participate;

2.  Once we've given folks time to come on board, we need to conclude where
the "latest-and-greatest" and "TCK-compliant" code will live in
xml-commons;

3.  Then we need to define just what TCK-compliant means at this moment,
and figure out some way of versioning it so that it can evolve and yet
all stakeholders are kept involved.

Just what the latest-and-greatest means for the moment should probably get
decided too at some point, but there probably isn't quite as much of a rush
there.  No one's likely to be forced to delay shipping because they don't
have some specific version of the latest and greatest, but a project could
definitely get into trouble over TCK compliance.

Anyway, that's way more than enough raving for my first post!  :-)

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com