
Posted to dev@xalan.apache.org by Sc...@lotus.com on 2000/03/03 01:27:00 UTC

Xalan 2.0 Plans

Stefano Mazzocchi <st...@apache.org> wrote (in an offline note):
> Please, for Xalan 2.0, let's try to do a clean-room implementation.

Can you explain more what you mean by clean-room implementation?

The first order of business for 2.0 is that Rob and I will try hard to:

1) Untangle the
XMLParserLiaison/XPathSupport/XPathEnvSupport/XSLTEngineImpl spaghetti into
proper XPathContext and XSLTContext objects.  I want to get rid of the
concept of XMLParserLiaisons altogether, if possible, which I think I can
do by querying for DOM2 methods.  This entanglement happened in the first
place partly because of the division between XPath and XSLT... we've
worked hard to make XPath able to live in its own jar, which will become
especially important since Schema now requires the use of XPath, as does
XPointer.

2) Move all XSLT stylesheet construction stuff into an xslt.compiler
package, and do the same for XPaths.

3) Implement the TRaX interfaces.

4) There's a bunch of other stuff, like making the SAX input interface
drive a DTM-like incremental DOM instead of a Xerces DOM (the DTM is very
Xerces Parser specific), adding more tooling support, reworking the
extension stuff according to some emerging standards, etc.

5) Of course the performance battle will go on.  This will partly be moving
more and more stuff out of run-time into compile-time.

6) We need to do more raw code cleanup, like bottlenecking more common
functions, perhaps sharing more of the Xerces utility libraries.  I would
like to shrink the code down to about half the current size, before we
start adding more features.
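For point 1, a minimal sketch of what "querying for DOM2 methods" could look like. It is written against the later standard JAXP/DOM APIs rather than any Xalan 1.x code, and the class name Dom2Check is hypothetical: the engine asks the DOM implementation itself whether it supports DOM Level 2 Core, instead of going through a parser-specific liaison object.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

class Dom2Check {
    // Hypothetical helper: ask the DOM implementation for DOM Level 2 Core
    // support, rather than relying on a parser-specific liaison class.
    static boolean supportsDom2Core() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DOMImplementation impl = dbf.newDocumentBuilder().getDOMImplementation();
            // "Core" with version "2.0" is the DOM Level 2 Core feature string.
            return impl.hasFeature("Core", "2.0");
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("DOM Level 2 Core supported: " + supportsDom2Core());
    }
}
```

If the check fails, an engine could refuse the DOM or fall back to a slower path; which of those is right is a design decision, not something this sketch settles.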

We have already (as of last night) tagged all classes and methods with meta
tags that describe if they are for internal use or not, removed dead code,
made more stuff private, and Don has been working feverishly to improve the
documentation.

We would be glad for any constructive input, beyond what I mentioned, on how
to make the code more readable and understandable.  I've tried pretty hard
to keep up on the JavaDoc, and do my best to explain things.  It's a hard
juggling act for us between trying to make the code fast, supporting the
standard 100% (I think we're 99% there right now), supporting people who
are using it, implementing needed features, helping to design things like
TRaX, keeping up with the planning and design in the XSL WG for XSLT
version 2, etc.  Hopefully stabilizing a 1.0.0 version will give us a
chance to step back and just do some cleaning up and redesign.  We
understand fully that good clean code that people can easily understand is
part of the criteria for delivery of code to open source (and part of the
criteria to our own sanity), and will do our best to deliver.

For 2.0 we would *really* like to get more non-Lotus people involved in the
coding -- hopefully cleaning up the code will help this process.  Module
candidates that come to mind are:

1) We would love to have someone take over the extension mechanism.

2) We need to build an XPointer implementation on top of XPath.  Is anyone
interested in this?

3) We need tooling interfaces into XPath in particular... in some sense to
expose the XPath as a sort of DOM.

4) Two somewhat special-purpose "NodeLocators" need to be built: a) one
that takes advantage of a schema, and b) one that can build an incremental
structured index that can be cached and reused.
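To make candidate 4b a bit more concrete, here is one possible reading of a cached, incrementally built index, as a sketch only. IndexingLocator and its methods are hypothetical names, not the actual NodeLocator API, and "incremental" is interpreted here as: the index for an element name is built lazily on first request, then cached and reused.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical indexing locator: one tree walk per element name, on first
// request; later lookups for the same name reuse the cached result.
class IndexingLocator {
    private final Document doc;
    private final Map<String, List<Element>> cache = new HashMap<>();

    IndexingLocator(Document doc) { this.doc = doc; }

    List<Element> elementsNamed(String name) {
        return cache.computeIfAbsent(name, n -> {
            List<Element> hits = new ArrayList<>();
            collect(doc.getDocumentElement(), n, hits);
            return Collections.unmodifiableList(hits);
        });
    }

    boolean isCached(String name) { return cache.containsKey(name); }

    // Depth-first walk, so hits come back in document order.
    private static void collect(Node node, String name, List<Element> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE && name.equals(node.getNodeName())) {
            out.add((Element) node);
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, name, out);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        IndexingLocator loc = new IndexingLocator(parse("<a><b/><c><b/></c></a>"));
        System.out.println(loc.elementsNamed("b").size()); // 2
    }
}
```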

-scott

Re: Xalan 2.0 Plans

Posted by Stefano Mazzocchi <st...@apache.org>.

Scott_Boag@lotus.com wrote:
> 
> Stefano Mazzocchi <st...@apache.org> wrote (in an offline note):
> > Please, for Xalan 2.0, let's try to do a clean-room implementation.
> 
> Can you explain more what you mean by clean-room implementation?

I mean:

1) outline the needs/requirements/constraints
2) draw a schematic view of the project from a logical components point
of view
3) define the interfaces between these components (using existing ones
or proposing the ones that are missing)
4) recurse to point 1 until the granularity is an object.

Open source software must be _designed_ to be so from the beginning.
Look at Cocoon, for example, or Avalon, or James or even JMeter.

In all the projects I started, I tried to make the architecture solid
and the modularity clear. Yes, it might sound more complicated to
_bootstrap_ (in fact it is), but open source needs to be parallelized in
order to succeed.

Xalan is terrible at this today: you have to know _all_ of it to
understand how to patch/help. This is why you haven't had that many
contributions from external people.

True, Xalan is a very specific tool, and for that reason less
suited to modularization... but still, you guys should focus on
better object orientation.

Let's be honest: XT and XSLT are way better than Xalan when it comes to
code readability and object orientation.

Scott, don't take this badly: Cocoon 1.x has serious problems and we all
know that. I know you understand that Xalan 1.x has design
problems.

Clean-room means: forget you already have a working XSLT processor.
Let's design another one from scratch and let's do a better job this
time with the know-how we already have.

> The first order of business for 2.0 is that Rob and I will try hard to:
> 
> 1) Untangle the
> XMLParserLiaison/XPathSupport/XPathEnvSupport/XSLTEngineImpl spaghetti into
> proper XPathContext and XSLTContext objects.  I want to get rid of the
> concept of XMLParserLiaisons altogether, if possible, which I think I can
> do by querying for DOM2 methods.  This entanglement happened in the first
> place partly because of the division between XPath and XSLT... we've
> worked hard to make XPath able to live in its own jar, which will become
> especially important since Schema now requires the use of XPath, as does
> XPointer.

Great move, +1

Ok, so we already have a modularization:

  Xalan -> XSLT engine + XPath engine 

> 2) Move all XSLT stylesheet construction stuff into an xslt.compiler
> package, and do the same for XPaths.

ok, recursive step #2

         Xalan
           |
     +-----+-----+
     |           | 
     V           V
    XSLT ----> XPath
     |           | 
     V           V
    XSLT       XPath
   compiler   compiler
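The component picture above can be expressed as Java interfaces, so that the XSLT side depends only on an XPath compiler abstraction. This is a sketch under assumptions: CompiledXPath, XPathCompiler, JaxpXPathCompiler and ComponentDemo are hypothetical names, and the XPath engine behind the interface simply delegates to the standard javax.xml.xpath API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical boundary: the XSLT side sees XPath only through these types.
interface CompiledXPath {
    String evaluateAsString(Node context);
}

interface XPathCompiler {
    CompiledXPath compile(String expression);
}

// One possible XPath engine behind the boundary, delegating to javax.xml.xpath.
class JaxpXPathCompiler implements XPathCompiler {
    private final XPathFactory factory = XPathFactory.newInstance();

    @Override
    public CompiledXPath compile(String expression) {
        final XPathExpression expr;
        try {
            expr = factory.newXPath().compile(expression); // parsed once, at compile time
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
        return context -> { // evaluated many times, at run time
            try {
                return expr.evaluate(context);
            } catch (XPathExpressionException e) {
                throw new IllegalStateException(e);
            }
        };
    }
}

class ComponentDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        XPathCompiler compiler = new JaxpXPathCompiler();
        CompiledXPath title = compiler.compile("/doc/title");
        System.out.println(title.evaluateAsString(parse("<doc><title>Xalan</title></doc>")));
    }
}
```

The point of the split is that either side can be swapped out or worked on in isolation. Note that javax.xml.xpath expression objects are not thread-safe, so a real engine would have to deal with that.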

> 3) Implement the TRaX interfaces.

I still like SAXT more; it's easier to remember. But I agree: this is a
must... so here is another picture:

          TRaX
           |
     +-----+-----+
     |           | 
     V           V
    XSLT ----> XPath
     |           | 
     V           V
    XSLT       XPath
   compiler   compiler
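TRaX was later folded into JAXP as the javax.xml.transform package. A minimal sketch using that API shows the compile-once, run-many split that puts TRaX on top of the compilers. TraxDemo and its methods are hypothetical names. Templates is the compiled stylesheet, and each run gets its own Transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TraxDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>Hello, <xsl:value-of select='/name'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Compile once: a Templates object is the reusable, thread-safe compiled stylesheet.
    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Run many times: each run gets its own Transformer, which is not thread-safe.
    static String run(Templates templates, String xml) {
        try {
            StringWriter out = new StringWriter();
            templates.newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Templates t = compile(XSL);
        System.out.println(run(t, "<name>Xalan</name>")); // Hello, Xalan
        System.out.println(run(t, "<name>XT</name>"));    // Hello, XT
    }
}
```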

> 4) There's a bunch of other stuff, like making the SAX input interface
> drive a DTM-like incremental DOM instead of a Xerces DOM (the DTM is very
> Xerces Parser specific), adding more tooling support, reworking the
> extension stuff according to some emerging standards, etc.

The SAX behavior is clearly something you guys should focus on. XSLT
transformations need to be fast, and their processing loops tight, so that
HotSpot-based JVMs can optimize them well.
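A sketch of SAX events driving a transformation directly, using the SAX side of the later JAXP transform API. SaxPipeline is a hypothetical name. How the processor stores the incoming events internally, in a DTM-like structure or otherwise, is implementation-specific and not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SaxPipeline {
    static final String COUNT_ITEMS =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
        + "</xsl:stylesheet>";

    // The parser pushes SAX events straight into the transformer's handler;
    // this code never builds a DOM of its own.
    static String transform(String xsl, String xml) {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("factory does not accept SAX input");
        }
        try {
            TransformerHandler handler = ((SAXTransformerFactory) tf)
                .newTransformerHandler(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // transformers expect namespace-aware events
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(COUNT_ITEMS, "<list><item/><item/><item/></list>")); // 3
    }
}
```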

I strongly believe that part of XT's performance comes from its lack of
hand-optimized Java code and from its clean object orientation. It is
possible to write faster Java code by, for example, avoiding String
creation and all that usual stuff, but it's way more productive (and
cheaper!) to write good, clean Java code and let the JVM do its job.

I'm not saying that we should write tons of interfaces if we don't have
to... but, please, let's think Java-ish.

> 5) Of course the performance battle will go on.  This will partly be moving
> more and more stuff out of run-time into compile-time.

This is the way to go, but consider this: a common C optimization
technique is loop unrolling. In Java, by contrast, the good old for(;;)
is way faster, because execution analyzers can detect the "heat" of that
routine and optimize it for you on the fly.

In theory, since much more information is present at execution time,
on-the-fly native incremental compilation with read-ahead capabilities could
provide better performance than natively compiled code.
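To illustrate the loop-unrolling contrast, here are both styles side by side. LoopStyles is a hypothetical name. They compute the same sum. No timings are claimed here: HotSpot's JIT can unroll simple loops on its own, and any comparison should be measured with a proper benchmark harness such as JMH.

```java
class LoopStyles {
    // Plain loop: simple enough for the JIT to profile and optimize itself.
    static long sumPlain(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Manual 4-way unrolling, as one might write it in C; a tail loop
    // handles lengths that are not a multiple of 4.
    static long sumUnrolled(int[] a) {
        long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i = 0;
        int limit = a.length - (a.length % 4);
        for (; i < limit; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < a.length; i++) {
            s0 += a[i];
        }
        return s0 + s1 + s2 + s3;
    }

    public static void main(String[] args) {
        int[] data = new int[1003];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(sumPlain(data) + " " + sumUnrolled(data)); // 502503 502503
    }
}
```

The unrolled version is longer, harder to read, and easy to get wrong at the tail, which is the kind of cost this thread argues against paying by hand.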

This might not be today, but it will definitely happen tomorrow. I was
told that Xerces-J is already faster than Xerces-C on some platforms.
Let's make it so for Xalan too :)

> 6) We need to do more raw code cleanup, like bottlenecking more common
> functions, perhaps sharing more of the Xerces utility libraries.  I would
> like to shrink the code down to about half the current size, before we
> start adding more features.

+10

I might add: if a method is longer than your screen and your class has
more than 10 public methods, you're doing something wrong. Of course,
this is a flexible rule of thumb, especially where APIs are concerned,
but in general, Xalan is too C-like. JVMs have a really hard time
optimizing code that doesn't follow common OO techniques.

So, I suggest: let's write good, well-designed, clean Java code, and
performance will come by itself.

> We have already (as of last night) tagged all classes and methods with meta
> tags that describe if they are for internal use or not, removed dead code,
> made more stuff private, and Don has been working feverishly to improve the
> documentation.

You guys still have to tell me what that XalanDoc thing is... ;)

> We would be glad for any constructive input, beyond what I mentioned, on how
> to make the code more readable and understandable.  I've tried pretty hard
> to keep up on the JavaDoc, and do my best to explain things.  It's a hard
> juggling act for us between trying to make the code fast, supporting the
> standard 100% (I think we're 99% there right now), supporting people who
> are using it, implementing needed features, helping to design things like
> TRaX, keeping up with the planning and design in the XSL WG for XSLT
> version 2, etc.  Hopefully stabilizing a 1.0.0 version will give us a
> chance to step back and just do some cleaning up and redesign.  

Yes. Don't feel pushed, man. I totally understand your situation and I
think you guys are all doing a great job and know the weak points of
your project. This is good enough for me at this point.

> We
> understand fully that good clean code that people can easily understand is
> part of the criteria for delivery of code to open source (and part of the
> criteria to our own sanity), and will do our best to deliver.
> 
> For 2.0 we would *really* like to get more non-Lotus people involved in the
> coding -- hopefully cleaning up the code will help this process.  Module
> candidates that come to mind are:
> 
> 1) We would love to have someone take over the extension mechanism.

This won't happen overnight. We are trying to find a way to intermix XSP
and Xalan so that XSP provides the instructions for XSLT extensions...
still, I don't see the light on this, and since people use the extension
mechanism as it is and like it... there is no real need for this at this point.

> 2) We need to build an XPointer implementation on top of XPath.  Is anyone
> interested in this?

What do you mean?

> 3) We need tooling interfaces into XPath in particular... in some sense to
> expose the XPath as a sort of DOM.

What about including XPath capabilities in TRaX?

> 4) Two somewhat special-purpose "NodeLocators" need to be built: a) one
> that takes advantage of a schema, and b) one that can build an incremental
> structured index that can be cached and reused.

Suggestion: if you want somebody to take over some ideas of yours, you
need to explain it in detail. This is what I did with XSP and Cocoon2.
Note: Ricardo wrote XSP and Pier is writing Cocoon2. Exactly the way I
like it :)

Docs are the key, especially when your code is so complex as to be
unreadable.

Look at Mozilla. They released 1.5 million lines of C code to the
public... and people loved the idea but hated the code.

It took two years for them to come up with something "decent" from an
open source perspective... and have you seen the latest M14? It rocks!
It took two years to bootstrap the project and come up with something
useful, but if they release a final version of Mozilla, well, people,
this project is going to beat the hell out of any other browser... 

Why? Because of the well-designed internal architecture and great
modularity. No matter how hard M$ tries to add features, Mozilla's
architecture will allow more and more people to work in parallel on
cloning them, or on improving things where they're wrong.

Xalan was not designed with the people in mind. Other processors were
(XT, XSL:P).

Do you know what would _really_ make me happy? To see Scott, Keith, Mike
Kay, and James Clark working together on the greatest of the XSLT
processors.

If this means starting from XSL:P or XT code and improving on that... hey,
let's talk about it.

I know Keith was not happy when the Apache XML project decided to use
LotusXSL as the foundation of Xalan... but he agreed it was a necessary
choice.

Today that is no longer the case, so let's discuss where the future should
take us.

Please, no flames. :)

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Come to the first official Apache Software Foundation Conference!  
------------------------- http://ApacheCon.Com ---------------------

Re: Xalan 2.0 Plans

Posted by Pierpaolo Fumagalli <pi...@apache.org>.

Scott_Boag@lotus.com wrote:
> 
> Stefano Mazzocchi <st...@apache.org> wrote (in an offline note):

It's not nice to forward private emails to the list :)

> > Please, for Xalan 2.0, let's try to do a clean-room implementation.

Good... I was just about to write one of those long-long-long emails on
that. (And actually, I'm going to in a couple of hours!!!)

> Can you explain more what you mean by clean-room implementation?

Let's get a whiteboard with no content on it. Let's design the new
XALAN architecture with coloured whiteboard markers, and then, if
something from the old codebases (XALAN or whatever) can be recycled,
let's recycle it.

> The first order of business for 2.0 is that Rob and I will try hard to:
> 
> 1) [...] 2) [...] 3) [...] 4) [...] 5) [...] 6) [...] 

Those are all "patches" applied to the current codebase, IMVHO.
If we really want to start a collaborative effort between all the
different teams involved in XSLT engine development, I believe we
need to start with a new architecture that must be understood and
drafted by all of us (you) involved in writing the "next generation"
XSLT engine.

> We have already (as of last night) tagged all classes and methods with meta
> tags that describe if they are for internal use or not, removed dead code,
> made more stuff private, and Don has been working feverishly to improve the
> documentation.

Good... 

> We would be glad for any constructive input, beyond what I mentioned, on how
> to make the code more readable and understandable.  I've tried pretty hard
> to keep up on the JavaDoc, and do my best to explain things.  It's a hard
> juggling act for us between trying to make the code fast, supporting the
> standard 100% (I think we're 99% there right now), supporting people who
> are using it, implementing needed features, helping to design things like
> TRaX, keeping up with the planning and design in the XSL WG for XSLT
> version 2, etc.  Hopefully stabilizing a 1.0.0 version will give us a
> chance to step back and just do some cleaning up and redesign.  We
> understand fully that good clean code that people can easily understand is
> part of the criteria for delivery of code to open source (and part of the
> criteria to our own sanity), and will do our best to deliver.

That's really good...

> For 2.0 we would *really* like to get more non-Lotus people involved in the
> coding -- hopefully cleaning up the code will help this process.  Module
> candidates that come to mind are:
> 
> 1) We would love to have someone take over the extension mechanism.
> 
> 2) We need to build an XPointer implementation on top of XPath.  Is anyone
> interested in this?
> 
> 3) We need tooling interfaces into XPath in particular... in some sense to
> expose the XPath as a sort of DOM.
> 
> 4) Two somewhat special-purpose "NodeLocators" need to be built: a) one
> that takes advantage of a schema, and b) one that can build an incremental
> structured index that can be cached and reused.

I have another proposal on how to "draft" the XALAN 2.0 code, similar
to the way the Jakarta project was "revolutionized" by the
Catalina proposal. I'm writing a longer email on that.

	Pier

-- 
--------------------------------------------------------------------
-          P              I              E              R          -
stable structure erected over water to allow the docking of seacraft
<ma...@betaversion.org>    <http://www.betaversion.org/~pier/>
--------------------------------------------------------------------
- ApacheCON Y2K: Come to the official Apache developers conference -
-------------------- <http://www.apachecon.com> --------------------

Re: Xalan 2.0 Plans

Posted by Lionel Villard <Li...@inrialpes.fr>.

And what about a cleaner implementation of the stylesheet document? I mean,
do you plan to implement the XSLT stylesheet document on top of a generic
DOM 2 implementation?
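One way to read the question: treat the stylesheet as an ordinary namespace-aware DOM Level 2 document, and locate XSLT instructions by namespace URI rather than by prefix. A minimal sketch, where StylesheetOverDom and its methods are hypothetical names:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StylesheetOverDom {
    static final String XSLT_NS = "http://www.w3.org/1999/XSL/Transform";

    // Any DOM2-capable parser will do; namespace awareness is what matters.
    static Document parseNamespaceAware(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Finds xsl:template elements by namespace URI (so the prefix used in the
    // stylesheet does not matter) and returns their match patterns.
    static List<String> templateMatches(Document stylesheet) {
        NodeList templates = stylesheet.getElementsByTagNameNS(XSLT_NS, "template");
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < templates.getLength(); i++) {
            Element t = (Element) templates.item(i);
            if (t.hasAttribute("match")) { // named-only templates have no match
                matches.add(t.getAttribute("match"));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String xsl = "<x:stylesheet version='1.0' xmlns:x='http://www.w3.org/1999/XSL/Transform'>"
            + "<x:template match='/'/><x:template name='helper'/><x:template match='item'/>"
            + "</x:stylesheet>";
        System.out.println(templateMatches(parseNamespaceAware(xsl))); // [/, item]
    }
}
```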

Thanks.

Lionel

----- Original Message -----
From: <Sc...@lotus.com>
To: Stefano Mazzocchi <st...@apache.org>
Cc: <xa...@xml.apache.org>
Sent: Friday, March 03, 2000 1:27 AM
Subject: Xalan 2.0 Plans

